From NeOn Wiki
|Developed by||Johanna Völker|
|Current Version||[[current version:= <ask format="template" template="CurrentVersion" limit="1" searchlabel="" sort="version number" order="descending" default="no version available"> 1.x/Text2Onto *</ask>]]|
|Homepage||[http://ontoware.org/projects/text2onto/ 1.x/Text2Onto Website]|
Text2Onto is an ontology learning framework which has been developed to support the acquisition of ontologies from textual documents. Like its predecessor, TextToOnto, it provides an extensible set of methods for learning atomic classes, class subsumption and instantiation as well as object properties and disjointness axioms.
- Install Java 1.6
- Install GATE 4.0 (http://gate.ac.uk/download/index.html) to <GATE-DIR> (e.g. c:\GATE)
- Install WordNet 2.0 (http://wordnet.princeton.edu) to <WN-DIR> (e.g. c:\WordNet)
- Unzip org.neontoolkit.text2onto_x.x.x.jar into your Toolkit's plugin directory (e.g. <T2O-DIR>=c:\NeOnToolkit\plugins\org.neontoolkit.text2onto_x.x.x). The directory name must not contain any space characters.
- Edit <T2O-DIR>\lib\jwnl\file_properties.xml and replace <WN-DIR> with the path of your WordNet installation:
  <param name="file_manager" value="net.didion.jwnl.dictionary.file_manager.FileManagerImpl">
    <param name="file_type" value="net.didion.jwnl.princeton.file.PrincetonRandomAccessDictionaryFile"/>
    <param name="dictionary_path" value="<WN-DIR>\dict"/>
  </param>
- Edit NeOn_Toolkit.ini to increase the heap space available to the Java VM (e.g. by raising the -Xmx value)
- Start NeOn Toolkit and open Text2Onto perspective
- Set the preferences as described below
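The <WN-DIR> substitution in file_properties.xml can also be scripted. The following is a minimal sketch, not part of Text2Onto: the function names fill_wordnet_dir and patch_file are hypothetical, and the file is assumed to be readable as UTF-8.

```python
from pathlib import Path

# Placeholder used in <T2O-DIR>\lib\jwnl\file_properties.xml
PLACEHOLDER = "<WN-DIR>"


def fill_wordnet_dir(config_text: str, wn_dir: str) -> str:
    """Return config_text with every <WN-DIR> placeholder replaced by wn_dir.

    Trailing path separators are stripped from wn_dir so that
    '<WN-DIR>\\dict' does not end up with a doubled separator.
    """
    if PLACEHOLDER not in config_text:
        raise ValueError("no <WN-DIR> placeholder found; file may already be configured")
    return config_text.replace(PLACEHOLDER, wn_dir.rstrip("\\/"))


def patch_file(properties_file: str, wn_dir: str) -> None:
    """Rewrite properties_file in place (hypothetical helper, UTF-8 assumed)."""
    path = Path(properties_file)
    original = path.read_text(encoding="utf-8")
    path.write_text(fill_wordnet_dir(original, wn_dir), encoding="utf-8")
```

A call such as patch_file(r"c:\NeOnToolkit\plugins\org.neontoolkit.text2onto_x.x.x\lib\jwnl\file_properties.xml", r"c:\WordNet") would apply the change, with x.x.x standing for the installed plugin version. Keep a backup copy of the file before running it.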
Technical reports, papers, presentations and demo videos for the standard version of Text2Onto are available from http://www.aifb.uni-karlsruhe.de/WBS/jvo/text2onto/. Detailed information about this plugin can be found in NeOn D3.8.1.
The graphical user interface of the plugin is very similar to the original Swing-based GUI of Text2Onto. It is composed of different views for the configuration of the ontology learning process and the presentation of the results.
The upper left corner contains the workflow view, which is used to set up the ontology learning workflow. By right-clicking on the individual ontology learning tasks (e.g. "Concept" for concept extraction), the user can select one or more methods for each type of ontology element she wants to extract from the corpus.
In the bottom left corner, the user will find a corpus view, which allows her to set up a corpus, that is, a collection of text documents from which the ontology will be generated. The doc view (see hidden tab on the right) is used to display previews of selected documents. Text2Onto is able to analyse documents in plain text, PDF (Windows only) and HTML format. However, manually converting the documents to plain text beforehand is highly recommended for efficiency reasons.
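For HTML documents, the recommended conversion to plain text can be done with a few lines of Python before the files are added to the corpus. This is a rough sketch using only the standard library. The names html_to_text and _TextExtractor are made up for illustration and are not part of Text2Onto.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores the content of script/style elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join text chunks with spaces and collapse runs of whitespace.
        # Simplification: text split by inline tags is also space-separated.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string as a single line of plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The result can be written to a .txt file and placed in the corpus directory. PDF documents need a separate tool, which is outside the scope of this sketch.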
The POM view on the right shows the results of the most recently initiated ontology learning process. The view contains several tabs -- one for each type of ontology element that was extracted from the corpus -- showing a tabular listing of individual results. By clicking on the column headers the user can sort the ontology elements according to their associated labels or confidence values.
The preference page, which is accessible from the main menu at the top of the Text2Onto perspective ("Window" -> "Preferences..." -> "Text2Onto Preferences"), replaces the original configuration file of Text2Onto's API. It allows for setting the following parameters:
- Language: The language of the documents to be analysed. Text2Onto provides full support for learning ontologies from English and Spanish corpora as well as partial support for ontology extraction from German texts. For details with respect to the Spanish version of Text2Onto please refer to SEKT D3.3.3.
- Normalization: If this parameter is selected, Text2Onto will normalize all confidence values to the interval from 0.0 to 1.0.
- Default corpus: The default directory for populating the ontology learning corpus.
- Spanish tagger directory: The part-of-speech tagger to be used for the analysis of Spanish documents. In the current version of Text2Onto this parameter is expected to point to the TreeTagger (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/) installation directory.
- Spanish WordNet directory: In case the language is set to Spanish, this path should refer to a licensed version of Spanish WordNet (http://www.lsi.upc.edu/~nlp/web/index.php).
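To illustrate what the Normalization parameter means for the values shown in the POM view, the sketch below rescales a set of raw confidence values to the interval from 0.0 to 1.0. The scheme used here (dividing by the largest value, assuming non-negative scores) is an assumption made for illustration. This page does not describe how Text2Onto itself normalizes, and the function name normalize_confidences and the sample labels are hypothetical.

```python
from typing import Dict


def normalize_confidences(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative confidence values so that the largest becomes 1.0.

    Assumed scheme (division by the maximum); not taken from Text2Onto.
    """
    if not raw:
        return {}
    if min(raw.values()) < 0:
        raise ValueError("this sketch assumes non-negative confidence values")
    top = max(raw.values())
    if top == 0:
        # All values are zero; nothing to scale.
        return {label: 0.0 for label in raw}
    return {label: value / top for label, value in raw.items()}


if __name__ == "__main__":
    # Hypothetical raw scores for three extracted concepts.
    raw_scores = {"ontology": 4.0, "corpus": 2.0, "tagger": 1.0}
    normalized = normalize_confidences(raw_scores)
    # List results by descending confidence, similar to sorting a result tab.
    for label, conf in sorted(normalized.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{label}\t{conf:.2f}")
```

Dividing by the maximum keeps the relative ranking of the results unchanged, so sorting by confidence gives the same order with or without normalization.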