finished
Duration: 2007-03 – 2011-05
Duration: 2008-11 – 2011-05
Duration: 2010-04 – 2010-10
Duration: 2009-04 – 2010-09
Duration: 2009-01 – 2010-08
Duration: 2009-09 – 2010-06
Duration: 2009-05 – 2010-06
Duration: 2009-04 – 2010-06
Duration: 2009-05 – 2010-04
Duration: 2009-06 – 2010-04
Social networks like Twitter are the latest trend in the globalized world. Twitter is used in different scenarios by a broad set of different users. Mining their messages may reveal valuable information.
In this thesis, we propose a way to automatically classify Tweets (and thus the users) using a supervised machine-learning approach. Based on Support Vector Machines, we build an application to set up and train the classifier. Also, a sentiment detection module is implemented. After constructing the data set, an evaluation is carried out to measure the accuracy of the classifier.
The results are very promising: we achieve an accuracy of more than 80%. We also examine the impact of n-gram representation, the amount of training data, and the use of word stemming and word conversion. The empirical evaluation shows that word stemming and n-gram representations of the features do not improve the accuracy of the classifier, whereas word conversion (using regular expressions) does.
The outputs of this thesis are a web application that can be used to classify arbitrary Twitter users and the empirical finding that Support Vector Machines are well suited for classifying Twitter messages. The annotated dataset will be made available.
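The two preprocessing ideas named in the abstract, word conversion with regular expressions and n-gram features, can be sketched as follows. This is only an illustration under assumptions: the conversion patterns, the placeholder tokens `URL` and `USER`, and the function names `convert`, `ngrams` and `features` are hypothetical and are not taken from the thesis, which does not list its actual rules here.

```python
import re
from collections import Counter

# Hypothetical conversion rules (assumed for illustration, not from the thesis).
CONVERSIONS = [
    (re.compile(r"https?://\S+"), "URL"),    # links -> placeholder token
    (re.compile(r"@\w+"), "USER"),           # user mentions -> placeholder token
    (re.compile(r"#(\w+)"), r"\1"),          # hashtags -> plain word
    (re.compile(r"(\w)\1{2,}"), r"\1\1"),    # "sooooo" -> "soo"
]


def convert(tweet):
    """Lowercase the tweet and apply each regex conversion in order."""
    text = tweet.lower()
    for pattern, replacement in CONVERSIONS:
        text = pattern.sub(replacement, text)
    return text


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tweet, max_n=2):
    """Count all 1..max_n-grams of the converted tweet (bag of n-grams)."""
    tokens = re.findall(r"\w+", convert(tweet))
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    print(convert("Check http://t.co/abc @bob sooooo #cool"))
    print(features("Check http://t.co/abc @bob sooooo #cool"))
```

Feature counts of this kind would then be turned into vectors and passed to a Support Vector Machine; that training step is omitted here because it depends on the library used.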
View Thesis →
Duration: 2009-01 – 2010-04
View Thesis →
Duration: 2009-06 – 2010-03
Duration: 2009-03 – 2010-01
Duration: 2007-11 – 2009-03
Recent information extraction (IE) systems extract named entities using rules, machine learning approaches, or a mixture of both. The main drawback of a rule-based approach is that it requires the manual adaptation of rules to a particular dataset. A machine learning algorithm, on the other hand, typically needs to be trained on a dataset.
This study introduces mechanisms to support and improve the rule adaptation process by learning rules. An important detail of this rule learning process is the semi-automatic extension of the training dataset. If the quality of the learned rules is good enough in terms of precision and recall, the created set of rules can be reused to create multiple instances of an ontology.
The hybrid approach is evaluated by comparing it with state-of-the-art machine learning algorithms and pure rule-based information extraction systems.
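Precision and recall, the quality measures mentioned above, can be computed as in the following minimal sketch. It assumes that extracted entities and the gold annotation are both given as sets of (text, type) pairs and that only exact matches count; the function name `precision_recall` and the sample data are hypothetical and do not come from the study.

```python
def precision_recall(extracted, gold):
    """Compute precision and recall of extracted entities against a gold set.

    Both arguments are iterables of hashable items, here (text, type) pairs.
    An extracted entity counts as correct only if it matches a gold entity exactly.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted entities that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold entities that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example data for illustration only.
    extracted = {("Vienna", "LOC"), ("Alice", "PER"), ("Acme", "LOC")}
    gold = {("Vienna", "LOC"), ("Alice", "PER"), ("Acme", "ORG"), ("Bob", "PER")}
    print(precision_recall(extracted, gold))
```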
Duration: 2008-04 – 2009-03
Duration: 2007-08 – 2008-08
In recent years the amount of data has grown massively and keeps growing, so it has become necessary to develop new methods to cope with it. Besides improvements in search capability, one of the main forces in current data mining research is the need to expose and understand the knowledge underlying the data.
Encyclopedias are known as a reflection of a decade's knowledge. Just as encyclopedias have always been a great resource for people to gain common knowledge, there is a need to build such common knowledge for computer systems too. The primary objective of this thesis is to extract knowledge from the textual representation of those encyclopedias and to prepare the extracted knowledge for exploitation in different applications and domains.
As an implementation of this process, appropriate ontology learning methods are applied to the text to create taxonomies and concept hierarchies. These structures are combined in a computer-processable ontology. Furthermore, the extracted information is evaluated and refined using additional methods such as online validation and clustering.
This thesis shows that, provided suitable methods are used, the semi-automatic extraction of high-quality semantic information from an encyclopedic dataset to build a base ontology is possible.
View Thesis →
Duration: 2005-01 – 2005-03
A reaction to the ever-increasing flood of information, especially in the digital sector, is the growing desire to find better methods to organize and control it, whether that means documenting and coping with the general information appearing daily on billions of internet sites or dealing with the highly specialized information found within schools and universities. Digital libraries provide a very good method of collecting and logging information in a controlled manner. However, if the volume of data in such libraries exceeds a certain limit and it also contains highly confidential information, then the use of systems requiring access authorization becomes a must.
This thesis covers the possibilities currently available for constructing a DRM system for use in the field of digital libraries. The heterogeneity of existing DRM solutions has resulted in the individual standards not being compatible with each other. Therefore a DRM system is presented which, on the basis of an ontology, is to a large extent able to bridge these incompatibilities. Finally, using a prototype implementation of a digital Handapparat, the practical possibilities of a DRM system in connection with information retrieval are demonstrated.
View Thesis →
ongoing
Begin: 2011-10
Begin: 2010-07
Begin: 2010-05
Begin: 2011-05
Begin: 2008-01
Begin: 2010-05
Begin: 2009-12
Begin: 2009-06


