To understand the design of the knowledge base component of the ELTK, it helps to know how data and ontologies are used in knowledge engineering, especially in the context of Description Logics and related languages such as the Web Ontology Language (OWL). A Description Logic knowledge base consists of two logically separate components:
- the ontology proper (called the TBox)
- the assertions about individuals, stated in terms of the ontology's concepts (called the ABox)
The standard description of such a knowledge base (KB) is therefore KB = <TBox, ABox>, where the TBox is a set of classes and relations and the ABox is a set of assertions about instances of those classes. For example, the statement that all syntactic words are syntactic units belongs to the TBox, whereas the assertion that a particular syntactic word has a particular prefix belongs to the ABox. Because we are dealing with Linked Data, the situation is more complex: the TBox-ABox distinction is blurred, since both sorts of statements can appear in a single Linked Data resource. To capture this mixing, we use the notion of a “KB component”, which can contain statements of either type, including TBox statements from different ontologies. This assumption is built into the RDF/Semantic Web architecture.
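The mixing of TBox and ABox statements in one KB component can be illustrated with plain triples. The following is a minimal, hypothetical sketch (not ELTK code): it stores a component as a single set of (subject, predicate, object) tuples and partitions it back into TBox and ABox parts. The `http://example.org/` URIs, the `hasPrefix` property, and the choice of which predicates count as "schema-level" are all illustrative assumptions.

```python
# Hypothetical sketch: a KB component as one set of RDF-style triples that
# mixes TBox (schema) and ABox (assertion) statements. URIs are illustrative.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"
EX = "http://example.org/"  # hypothetical namespace

# Predicates whose triples we (by assumption) treat as TBox statements.
TBOX_PREDICATES = {
    RDFS_SUBCLASS,
    "http://www.w3.org/2000/01/rdf-schema#subPropertyOf",
    "http://www.w3.org/2000/01/rdf-schema#domain",
    "http://www.w3.org/2000/01/rdf-schema#range",
}
# rdf:type objects that declare a class or property rather than an instance.
SCHEMA_TYPES = {OWL_CLASS, OWL_OBJECT_PROPERTY}


def is_tbox(triple):
    """True if the triple is a schema-level (TBox) statement."""
    s, p, o = triple
    return p in TBOX_PREDICATES or (p == RDF_TYPE and o in SCHEMA_TYPES)


def split_component(component):
    """Partition a mixed KB component into (TBox, ABox) sets."""
    tbox = {t for t in component if is_tbox(t)}
    return tbox, set(component) - tbox


component = {
    # TBox: every syntactic word is a syntactic unit
    (EX + "SyntacticWord", RDFS_SUBCLASS, EX + "SyntacticUnit"),
    # ABox: w1 is a syntactic word with a particular (hypothetical) prefix
    (EX + "w1", RDF_TYPE, EX + "SyntacticWord"),
    (EX + "w1", EX + "hasPrefix", EX + "prefix1"),
}
tbox, abox = split_component(component)
print(len(tbox), len(abox))  # 1 2
```

The split is recoverable only because the predicates themselves signal the statement type; nothing in the storage format keeps the two kinds apart.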
The KBComponent module is used to create part of a knowledge base, i.e., a set of TBox and/or ABox statements. It provides a way to build a conceptualization of (some part of) the linguistics domain, and is used together with the Meta module (which contains metaclasses) to bring an ontological conceptualization into Python's object-oriented framework. The implementation is a two-layer model: the OWL+RDFS+RDF data model (a graph model) and Python's OOP model.
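The idea of using metaclasses to bridge the two layers can be sketched in a few lines. This is a hypothetical illustration, not the actual Meta module: `OWLClassMeta`, `OWLThing`, the `registry` dictionary, and the way a class name is derived from a URI are all assumptions made for the example. Each OWL class becomes a Python class (an instance of the metaclass), and each individual becomes an ordinary Python object that keeps its RDF identifier.

```python
# Hypothetical sketch (not ELTK's Meta module): a metaclass that maps an OWL
# class URI onto a Python class, so individuals live in Python's OOP layer
# while still carrying their RDF identifiers.
class OWLClassMeta(type):
    """Metaclass whose instances are Python classes standing for OWL classes."""

    registry = {}  # OWL class URI -> Python class

    @classmethod
    def new(mcs, uri):
        # Derive a Python class name from the URI's local name.
        local = uri.rstrip("/#").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
        cls = mcs(local, (OWLThing,), {"uri": uri})
        mcs.registry[uri] = cls
        return cls


class OWLThing:
    """Base class for individuals; each one keeps its own identifier."""

    uri = "http://www.w3.org/2002/07/owl#Thing"

    def __init__(self, ident):
        self.ident = ident

    def __repr__(self):
        return f"{type(self).__name__}({self.ident!r})"


Word = OWLClassMeta.new("http://blah.org/Word")
w = Word("http://blah.org/w")
print(w)         # Word('http://blah.org/w')
print(Word.uri)  # http://blah.org/Word
```

Keeping a URI-to-class registry on the metaclass is one simple way to ensure that the same OWL class always maps to the same Python class.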
Here’s how to create a knowledge base component:
>>> from rdflib import URIRef
>>> from eltk.kb.KBComponent import KBComponent
>>> from eltk.kb.Meta import *
Create the KBComponent, then some classes, properties, and instances:
>>> mykb = KBComponent(URIRef('http://foo.org/myid'))
>>> Word = Meta.OWLClass.new(u'http://blah.org/Word')
>>> w = Word(u'w')
>>> hasConstituent = Meta.OWLObjectProperty.new(u'http://purl.org/linguistics/gold#hasConstituent')
>>> Morpheme = Meta.OWLClass.new(u'http://blah.org/Morpheme')
>>> m1 = Morpheme(u'http://blah.org/m1')
>>> m2 = Morpheme(u'http://purl.org/linguistics/gold#m2')
And add a statement to the KBComponent:
>>> mykb += (w, hasConstituent, m1)
The statement is a (subject, predicate, object) triple. The same statement can also be added by calling the property on its subject and object:
>>> mykb += hasConstituent(w, m1)
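Why the two forms are equivalent can be shown with a minimal, hypothetical re-implementation. `Resource`, `ObjectProperty`, and `MiniKB` are illustrative names, not ELTK classes; the sketch assumes only that calling a property returns the triple it asserts and that `+=` accepts any such triple.

```python
# Hypothetical sketch of the two equivalent ways of adding a statement:
# a raw (subject, predicate, object) tuple, or calling a property on its
# subject and object. Class names are illustrative, not ELTK's.
class Resource:
    def __init__(self, uri):
        self.uri = uri


class ObjectProperty(Resource):
    def __call__(self, subject, obj):
        # Calling a property yields the triple it asserts.
        return (subject, self, obj)


class MiniKB:
    def __init__(self, identifier):
        self.identifier = identifier
        self.triples = set()

    def __iadd__(self, triple):
        s, p, o = triple
        # Store URIs so both forms produce the same stored statement.
        self.triples.add((s.uri, p.uri, o.uri))
        return self


kb = MiniKB("http://foo.org/myid")
w = Resource("http://blah.org/w")
m1 = Resource("http://blah.org/m1")
hasConstituent = ObjectProperty("http://purl.org/linguistics/gold#hasConstituent")

kb += (w, hasConstituent, m1)
kb += hasConstituent(w, m1)  # same triple, so the set does not grow
print(len(kb.triples))  # 1
```

Because the component stores a set of triples, adding the same statement twice, in either form, leaves it unchanged.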
See Meta for an explanation of how particular instances of classes and properties are created.
Here are the methods associated with KBComponent.
getOWLClasses returns only the entities that are classes.
Return type: list
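As a rough illustration of the filtering getOWLClasses performs, here is a hypothetical sketch over plain triples. The function name `get_owl_classes` and the rule "a class is anything typed `owl:Class`" are assumptions; the real method works on ELTK's own objects.

```python
# Hypothetical sketch of getOWLClasses: from a mixed set of triples,
# keep only the entities declared to be OWL classes.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"


def get_owl_classes(triples):
    """Return the subjects typed as owl:Class, as a sorted list."""
    return sorted({s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS})


triples = [
    ("http://blah.org/Word", RDF_TYPE, OWL_CLASS),
    ("http://blah.org/Morpheme", RDF_TYPE, OWL_CLASS),
    ("http://blah.org/m1", RDF_TYPE, "http://blah.org/Morpheme"),  # an instance
]
print(get_owl_classes(triples))
# ['http://blah.org/Morpheme', 'http://blah.org/Word']
```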
buildRDFGraph creates an RDFLib graph object.
Parameter: identifier (unicode) – the identifier string for the graph
Return type: rdflib.Graph.Graph
Given an abbreviation (e.g., ‘PST’) or a full form (e.g., ‘past tense’), returns the GOLD URI denoted by that string.
Parameter: term_string (str) – the string representation (abbreviation or full form) of the term
Return type: rdflib.URIRef.URIRef
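The abbreviation/full-form lookup can be sketched as a case-insensitive dictionary. This is a hypothetical illustration: the function name `get_gold_uri` is invented here, the table entry (including the local name `PastTense`) is an assumption rather than data taken from GOLD, and the sketch returns a plain string instead of an rdflib `URIRef`.

```python
# Hypothetical sketch of the term lookup. The table entry is an illustrative
# assumption, not taken from the GOLD ontology itself.
GOLD = "http://purl.org/linguistics/gold#"

TERMS = {
    # (abbreviation, full form) -> assumed GOLD local name
    ("PST", "past tense"): "PastTense",
}

# Index both spellings case-insensitively so either form resolves.
_INDEX = {}
for (abbr, full), local in TERMS.items():
    _INDEX[abbr.lower()] = GOLD + local
    _INDEX[full.lower()] = GOLD + local


def get_gold_uri(term_string):
    """Return the GOLD URI string for an abbreviation or full form, or None."""
    return _INDEX.get(term_string.strip().lower())


print(get_gold_uri("PST"))         # http://purl.org/linguistics/gold#PastTense
print(get_gold_uri("Past Tense"))  # http://purl.org/linguistics/gold#PastTense
```

Indexing both spellings up front keeps each lookup a single dictionary access.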
renderJSON is a utility method that outputs a JSON representation of the KB.
Parameter: root (OWLClass) – the root class from which to generate the JSON
Return type: str
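A JSON rendering rooted at a given class might look like the following hypothetical sketch. It assumes the output is a nested class hierarchy built by following `rdfs:subClassOf` links downward from the root; the real renderJSON output format is not documented here, and short names stand in for full URIs for readability.

```python
# Hypothetical sketch of rendering a class hierarchy as JSON, starting from a
# root class and following rdfs:subClassOf links downward.
import json

RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def render_json(triples, root):
    """Return a JSON string nesting each class under its superclass."""
    children = {}
    for s, p, o in triples:
        if p == RDFS_SUBCLASS:
            children.setdefault(o, []).append(s)

    def build(cls, seen):
        # 'seen' holds the classes on the current path, guarding against cycles.
        kids = sorted(c for c in children.get(cls, []) if c not in seen)
        return {"name": cls, "children": [build(c, seen | {c}) for c in kids]}

    return json.dumps(build(root, {root}))


triples = [
    ("Word", RDFS_SUBCLASS, "SyntacticUnit"),
    ("Phrase", RDFS_SUBCLASS, "SyntacticUnit"),
]
print(render_json(triples, "SyntacticUnit"))
# {"name": "SyntacticUnit", "children": [{"name": "Phrase", "children": []}, {"name": "Word", "children": []}]}
```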
Which of these methods are applicable depends on the contents of a particular KBComponent.