From NeOn Wiki

Text2Onto

[[screenshot:=text2onto.png|Screenshot]]

Developed by Johanna Völker
Review not available
Status not available
Last Update 15.11.2008
Current Version [[current version:= <ask format="template" template="CurrentVersion" limit="1" searchlabel="" sort="version number" order="descending" default="no version available"> 1.x/Text2Onto *</ask>]]
Homepage [http://ontoware.org/projects/text2onto/ 1.x/Text2Onto Website]
Activity
License EPL
Affiliation
NTKVersion 1.2.3

Text2Onto supports the automatic or semi-automatic generation of ontologies from natural language text.



Description

Text2Onto is an ontology learning framework which has been developed to support the acquisition of ontologies from textual documents. Like its predecessor, TextToOnto, it provides an extensible set of methods for learning atomic classes, class subsumption and instantiation as well as object properties and disjointness axioms.

Installation

  • Install Java 1.6
  • Unzip org.neontoolkit.text2onto_x.x.x.jar into your Toolkit's plugin directory (e.g. <T2O-DIR>=c:\NeOnToolkit\plugins\org.neontoolkit.text2onto_x.x.x). Note that the directory path must not contain any space characters.
  • Edit <T2O-DIR>\lib\jwnl\file_properties.xml and replace <WN-DIR> with the path to your WordNet installation directory:
   <param name="file_manager" value="net.didion.jwnl.dictionary.file_manager.FileManagerImpl">
       <param name="file_type" value="net.didion.jwnl.princeton.file.PrincetonRandomAccessDictionaryFile"/>
       <param name="dictionary_path" value="<WN-DIR>\dict"/>
   </param>
  • Edit NeOn_Toolkit.ini to increase the heap space
   -Xms1000m
   -Xmx1000m
  • Start the NeOn Toolkit and open the Text2Onto perspective
  • Set the preferences as described below
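
The two manual checks in the steps above (no spaces in the plugin path, and filling in the <WN-DIR> placeholder) can be scripted. The following is a minimal sketch and not part of Text2Onto: the function names are hypothetical, the file is assumed to be UTF-8 encoded, and <WN-DIR> is assumed to stand for the directory of a local WordNet installation.

```python
from pathlib import Path


def check_plugin_dir(plugin_dir: str) -> None:
    """Raise ValueError if the plugin path contains a space character."""
    if " " in plugin_dir:
        raise ValueError(f"plugin path must not contain spaces: {plugin_dir!r}")


def fill_wordnet_dir(xml_text: str, wordnet_dir: str) -> str:
    """Return xml_text with every <WN-DIR> placeholder replaced by wordnet_dir."""
    return xml_text.replace("<WN-DIR>", wordnet_dir)


def patch_properties_file(properties_file: Path, wordnet_dir: str) -> None:
    """Rewrite file_properties.xml in place with the placeholder filled in.

    Assumption: the file is UTF-8 encoded.
    """
    text = properties_file.read_text(encoding="utf-8")
    properties_file.write_text(fill_wordnet_dir(text, wordnet_dir), encoding="utf-8")
```

For example, calling patch_properties_file with the path to <T2O-DIR>\lib\jwnl\file_properties.xml and a WordNet directory of your choice rewrites the dictionary_path entry so that it points to the dict folder inside that directory.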

User Documentation

Technical reports, papers, presentations and demo videos for the standard version of Text2Onto are available from http://www.aifb.uni-karlsruhe.de/WBS/jvo/text2onto/. Detailed information about this plugin can be found in NeOn D3.8.1.

[Figure: Text2onto results.jpg]

The graphical user interface of the plugin is very similar to the original Swing-based GUI of Text2Onto. It is composed of different views for the configuration of the ontology learning process and the presentation of the results.

Workflow view

The upper left corner contains the workflow view, which is used to set up the ontology learning workflow. By right-clicking on the individual ontology learning tasks (e.g. "Concept" for concept extraction), the user can select one or more methods for each type of ontology element she wants to extract from the corpus.

Corpus view

In the bottom left corner, the user will find the corpus view, which allows her to set up a corpus, that is, a collection of text documents from which the ontology will be generated. The doc view (see hidden tab on the right) is used to display previews of selected documents. Text2Onto is able to analyse documents in plain text, PDF (Windows only) and HTML format. However, manually converting documents to plain text beforehand is highly recommended for efficiency reasons.
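
Before populating the corpus view, it can help to list which files in a directory are in one of the formats mentioned above. The following is a minimal sketch that is independent of the plugin: the function name and the set of file extensions are assumptions, and the plugin's own file filter may differ.

```python
from pathlib import Path

# Assumed extensions for plain text, PDF and HTML documents;
# the plugin's own file filter may differ.
SUPPORTED = {".txt", ".pdf", ".htm", ".html"}


def collect_corpus(directory):
    """Return a sorted list of files below directory with a supported extension."""
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```

The returned paths can then be inspected, or converted to plain text with a tool of your choice, before they are added to the corpus.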

POM view

The POM view on the right shows the results of the most recently initiated ontology learning process. The view contains several tabs -- one for each type of ontology element that was extracted from the corpus -- each showing a tabular listing of individual results. By clicking on the column headers, the user can sort the ontology elements according to their associated labels or confidence values.
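
The sorting behaviour of a results tab can be pictured as ordering rows by one of two columns. The following is a minimal sketch for illustration only: the Result structure, the function name and the sample values are hypothetical and do not reflect the plugin's internal data model.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One row of a results tab (hypothetical structure)."""
    label: str
    confidence: float


def sort_results(results, by="confidence", descending=True):
    """Return a new list sorted by the given column ('label' or 'confidence')."""
    if by not in ("label", "confidence"):
        raise ValueError(f"unknown column: {by!r}")
    return sorted(results, key=lambda r: getattr(r, by), reverse=descending)


# Made-up sample rows, not actual Text2Onto output.
rows = [Result("ontology", 0.82), Result("corpus", 0.35), Result("concept", 0.97)]
by_confidence = sort_results(rows)                           # highest confidence first
by_label = sort_results(rows, by="label", descending=False)  # alphabetical by label
```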

Preferences

[Figure: Text2onto configuration.png]

The preference page, which is accessible from the main menu at the top of the Text2Onto perspective ("Window" -> "Preferences..." -> "Text2Onto Preferences"), replaces the original configuration file of Text2Onto's API. It allows for setting the following parameters:

  • Language: The language of the documents to be analysed. Text2Onto provides full support for learning ontologies from English and Spanish corpora as well as partial support for ontology extraction from German texts. For details with respect to the Spanish version of Text2Onto please refer to SEKT D3.3.3.
  • Normalization: If this parameter is selected, Text2Onto normalizes all confidence values to the interval from 0.0 to 1.0.
  • Default corpus: The default directory for populating the ontology learning corpus.
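
To illustrate what normalizing confidence values to the range 0.0 to 1.0 can mean, the following minimal sketch applies min-max scaling. This is an assumption made for illustration: this page does not state which normalization method Text2Onto uses, and the function name and sample values are hypothetical.

```python
def normalize(values):
    """Min-max scale values into [0.0, 1.0]; constant input maps to 0.0."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw = [2.0, 5.0, 8.0]    # made-up raw confidence values
scaled = normalize(raw)  # [0.0, 0.5, 1.0]
```

With this scheme the lowest raw value maps to 0.0 and the highest to 1.0. Other schemes, such as dividing by the maximum, would also produce values in the same interval.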