Jena, Semantic Web framework, advantages and features
Jena is a Java framework for developing Semantic Web applications. It was originally developed at HP Labs and is an open source project. In essence, Jena provides a Java environment for working with RDF, RDFS, OWL, SPARQL and reasoning engines. The framework adds a layer of abstraction that translates the statements and constructs of the Semantic Web into Java artifacts such as classes, objects, methods and attributes. These artifacts reduce the effort needed to program Semantic Web applications. One of Jena's greatest strengths is its excellent documentation. The extensive resources available on the Web, including descriptions and tutorials, encourage programmers to build their Semantic Web applications on this framework.
As part of its RDF features, Jena supports managing RDF resources and writing them in the RDF/XML, N3 and N-Triples formats. Jena also supports RDF Schema by providing an API for all the vocabulary extensions it brings. Moreover, Jena covers OWL in any of its three variants: Full, DL (Description Logic) and Lite. The OWL API provides the ability to navigate through the graph, locate resources and retrieve them from the model. Regardless of the schema and data models used (which can be separate resources), Jena can work with multiple ontologies from different sources simultaneously. The API that comes with the framework makes knowledge sharing easy: because every resource is identified by its URI, Jena works well with knowledge shared across the (Semantic) Web. The framework also provides methods for validating an ontology and for derivation logging, which lets the developer see how Jena derives the answers to a query.
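To make the statement model concrete, here is a minimal sketch in plain Java of how a single RDF statement maps to a line of N-Triples. It deliberately does not use Jena: the `Triple` class, its factory methods and the example URIs are hypothetical stand-ins for illustration only, and a real application would rely on the framework's own model and writer classes.

```java
/** Hypothetical stand-in for an RDF statement; this is not a Jena class. */
final class Triple {
    final String subject;          // URI of the resource being described
    final String predicate;        // URI of the property
    final String object;           // URI or literal text
    final boolean objectIsLiteral; // true when the object is a plain literal

    private Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.objectIsLiteral = objectIsLiteral;
    }

    /** Statement whose object is another resource. */
    static Triple resource(String subject, String predicate, String objectUri) {
        return new Triple(subject, predicate, objectUri, false);
    }

    /** Statement whose object is a plain string literal. */
    static Triple literal(String subject, String predicate, String text) {
        return new Triple(subject, predicate, text, true);
    }

    /** Escapes the characters that N-Triples requires to be escaped in literals. */
    static String escape(String s) {
        return s.replace("\\", "\\\\")   // backslash first, so later escapes are not doubled
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t");
    }

    /** Renders the statement as one N-Triples line: subject, predicate, object, dot. */
    String toNTriples() {
        String obj = objectIsLiteral ? "\"" + escape(object) + "\"" : "<" + object + ">";
        return "<" + subject + "> <" + predicate + "> " + obj + " .";
    }

    public static void main(String[] args) {
        // Example URIs are made up for the illustration.
        Triple title = Triple.literal("http://example.org/jena",
                "http://purl.org/dc/elements/1.1/title", "Jena");
        Triple type = Triple.resource("http://example.org/jena",
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
                "http://example.org/Framework");
        System.out.println(title.toNTriples());
        System.out.println(type.toNTriples());
    }
}
```

Because every resource is just a URI, statements from different sources can be merged by simply collecting their lines, which is the property the paragraph above relies on for knowledge sharing.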
For persistent storage, Jena works well with files containing OWL or RDF data, and it also has an API for a database backend. Because its software design is highly generic, Jena can easily be bound to SQL databases from different vendors. All a developer needs is an appropriate driver for the particular SQL database.
Querying the knowledge graph is an important topic when discussing Semantic Web frameworks. Jena supports querying the model through the API or by directly constructing a SPARQL query to retrieve the results. The knowledge base can be attached to a web server designed especially for Jena, named Joseki (www.joseki.org). Joseki acts as a mediator: it accepts SPARQL queries submitted through the HTTP GET or POST methods and returns an RDF/XML response with the results, which can be further formatted with XSLT.
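As a rough illustration of the GET variant, the sketch below builds the request URL for a SPARQL query using only the Java standard library. The endpoint address `http://localhost:2020/sparql` and the class name `SparqlGetUrl` are assumptions made for the example, not values taken from a real deployment, and the code only constructs the URL rather than sending the request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SparqlGetUrl {

    /** Builds the GET URL for a SPARQL query against a caller-supplied endpoint. */
    static String build(String endpoint, String sparql) {
        try {
            // The SPARQL protocol carries the query text in the "query" parameter,
            // so it must be URL-encoded before being appended to the endpoint.
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is mandatory on every Java platform, so this should not occur.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String endpoint = "http://localhost:2020/sparql"; // hypothetical endpoint
        String query = "SELECT ?s WHERE { ?s ?p ?o }";
        System.out.println(build(endpoint, query));
    }
}
```

For long queries the POST method is usually the safer choice, since some servers and proxies limit URL length.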
Perhaps the most powerful component of the Jena framework is the Inference API. This API contains several reasoner types, which infer new relations in the knowledge graph. Among them are the RDF(S), OWL, Transitive and Generic reasoners. It is worth mentioning that Jena is also compatible with third-party reasoners, such as the Pellet reasoner. Each reasoner can be configured individually by creating a special resource that holds the desired configuration and then using it to perform the reasoning. For example, a reasoner can be configured to run in forward-chaining or backward-chaining mode, or an OWL reasoner can be instructed to use a Description Logic (or Full or Lite) memory model in favor of better reasoning performance.
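The idea behind a forward-chaining transitive reasoner can be sketched in a few lines of plain Java. This is not Jena's implementation: the `TransitiveClosureSketch` class and its `Edge` type are hypothetical, and the naive fixpoint loop is chosen for readability rather than speed. Each edge stands for a statement with one transitive property, such as a subclass relation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class TransitiveClosureSketch {

    /** A statement "from P to" for a single transitive property P. */
    static final class Edge {
        final String from;
        final String to;

        Edge(String from, String to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Edge
                    && ((Edge) o).from.equals(from)
                    && ((Edge) o).to.equals(to);
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, to);
        }

        @Override
        public String toString() {
            return from + " -> " + to;
        }
    }

    /**
     * Forward chaining: repeatedly apply the rule
     * (a -> b) and (b -> c) implies (a -> c)
     * until a full pass adds no new statement.
     */
    static Set<Edge> closure(Set<Edge> asserted) {
        Set<Edge> all = new HashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over a snapshot so that new edges can be added safely.
            List<Edge> snapshot = new ArrayList<>(all);
            for (Edge a : snapshot) {
                for (Edge b : snapshot) {
                    if (a.to.equals(b.from) && all.add(new Edge(a.from, b.to))) {
                        changed = true;
                    }
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        Set<Edge> asserted = new HashSet<>();
        asserted.add(new Edge("A", "B"));
        asserted.add(new Edge("B", "C"));
        asserted.add(new Edge("C", "D"));
        Set<Edge> all = closure(asserted);
        System.out.println(asserted.size() + " asserted, " + all.size() + " after inference");
    }
}
```

A chain of three asserted statements already yields six statements after inference. A backward-chaining reasoner would instead derive such statements on demand, when a query asks for them.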
Disadvantages
Despite its powerful abilities and the high level of abstraction it provides, Jena has some serious disadvantages. For instance, when retrieving data sets, the framework places all statements into main memory, often exhausting the heap of the Java Virtual Machine (JVM). The framework therefore needs a significant amount of memory, depending on the number of statements retrieved in the resulting data set. This holds even if one decides to use an SQL database for persistence.
The second disadvantage concerns threading. Jena is not thread safe, so consistency and concurrency issues can easily occur. The API provides methods for declaring critical regions, but it is up to the programmer to coordinate the threads that use the model.
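The general pattern behind such critical regions is a multiple-reader, single-writer lock. The sketch below shows that pattern with the standard library's `ReentrantReadWriteLock`; it does not call Jena, and the `GuardedStatements` class is a hypothetical stand-in for a shared model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a shared model that is not thread safe by itself. */
final class GuardedStatements {
    private final List<String> statements = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Write critical region: only one thread may modify the store at a time. */
    void add(String statement) {
        lock.writeLock().lock();
        try {
            statements.add(statement);
        } finally {
            lock.writeLock().unlock(); // always release, even if the body throws
        }
    }

    /** Read critical region: many readers may enter together, but no writer. */
    int size() {
        lock.readLock().lock();
        try {
            return statements.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Starts several writer threads and returns the store size once all have finished. */
    static int fillConcurrently(int threads, int perThread) {
        GuardedStatements store = new GuardedStatements();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.add("statement-" + id + "-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000));
    }
}
```

Without the write lock, concurrent calls to `add` on the underlying `ArrayList` could lose updates or throw, which is the kind of consistency problem described above.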
The third, and possibly the most relevant, disadvantage is the cost of the inference process. Inference is one of the basic features of a knowledge base, and also its most powerful one. Without inference, a knowledge base would not be much different from an ordinary database. As mentioned earlier, the reasoning process infers implicit statements in the knowledge graph. As a result, the number of edges in the graph increases rapidly, and more time is required to navigate the graph and locate a specific resource in it. Adding a large number of statements to the knowledge model is a time- and memory-consuming process. However, efforts are being made to decrease these high costs by using methods known as graph closure and graph reduction.
Summary
The Jena framework is an excellent tool for managing the resources needed by Semantic Web applications. Being developed in Java, it is applicable to various environments. In addition, it is open source and backed by solid documentation. Even though it has some significant disadvantages, it is still one of the most powerful frameworks for Semantic Web technologies and has the potential to become the de facto standard for developing such applications. Frameworks like Jena are worth investing in, since they might play a key role in the evolution of the WWW into the Semantic Web, as envisioned by Sir Tim Berners-Lee.