The Web 3.0's Pulse: Semantic Web Trends


Showing posts with label OWL.

Friday, October 30, 2009

Semantic Web and Peer-to-Peer Networks

Decentralized Knowledge Management and Information Exchange

The idea of the Semantic Web opens the door to revolutionary approaches in fields such as knowledge management. Knowledge Management (KM) has become one of the essential driving forces behind large communities. However, search capabilities over knowledge bases are still very limited. Semantic Web technologies promise concept alignment and powerful search features that could be incorporated into KM systems. But because most KM systems rely on a client-server architecture, physical bottlenecks are very likely to appear. An alternative architecture suited to large-scale systems is therefore needed. As you may already guess, this is where flat networks come into play: Peer-to-Peer (P2P) systems.

P2P systems are not only scalable; they can even improve their performance as new nodes join the network, because each new peer also contributes resources. This property makes them well suited to all kinds of file-sharing applications. However, search in these networks is usually limited to keywords, and different peers may name the same resource differently. Without shared semantics, it is hard to find and relate resources held by anonymous peers.

Combining Semantic Web technologies with P2P networks could create ideal conditions for advanced knowledge management techniques. Distributing knowledge transparently across different physical locations, while letting each peer keep its own view of the same data, could bring a tremendous change in the evolution of the Semantic Web. At first glance, it seems like an extraordinary idea: it would be wonderful if I could share a file with other peers who would understand what it is and how it relates to files that someone else owns. KM systems could also distribute their knowledge across different peers, possibly partitioning it by domain beforehand in order to boost performance.
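To make the idea of partitioning knowledge by domain more concrete, here is a minimal Python sketch under stated assumptions: each peer honestly advertises the domains it holds, and a query is routed only to the peers advertising the requested domain instead of being flooded to everyone. The `Peer` class, the domain labels and both search functions are hypothetical illustrations, not part of any real P2P system.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A peer that advertises the knowledge domains it holds (hypothetical)."""
    name: str
    domains: set
    resources: dict = field(default_factory=dict)  # resource id -> domain


def _matching(peers, domain):
    # Collect resource ids in the requested domain from the given peers.
    return [rid for p in peers for rid, d in p.resources.items() if d == domain]


def flood(peers, domain):
    """Ask every peer; return (matching resource ids, peers contacted)."""
    return _matching(peers, domain), len(peers)


def route_by_domain(peers, domain):
    """Ask only the peers that advertise the requested domain."""
    targets = [p for p in peers if domain in p.domains]
    return _matching(targets, domain), len(targets)


peers = [
    Peer("p1", {"medicine"}, {"r1": "medicine"}),
    Peer("p2", {"music"}, {"r2": "music"}),
    Peer("p3", {"medicine", "music"}, {"r3": "medicine", "r4": "music"}),
    Peer("p4", {"sports"}, {"r5": "sports"}),
]

print(flood(peers, "medicine"))            # (['r1', 'r3'], 4)
print(route_by_domain(peers, "medicine"))  # (['r1', 'r3'], 2)
```

Both strategies find the same resources, but routing contacts two peers instead of four. The saving depends entirely on the domain advertisements being accurate, which is exactly the kind of agreement a real semantic P2P network would have to maintain.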

But if peers are free to annotate the resources they share, how will knowledge (ontology) alignment occur? There must be some common annotation mechanism, so that alignment happens automatically and peer agents can identify how the shared resources relate to each other. One suggested approach is to use small vocabularies shared among peers, which the peer agents can then use to align their knowledge automatically. However, the performance of such a large-scale system remains unproven, as no real semantic P2P application has been deployed yet.
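The small-shared-vocabulary approach can be illustrated with a toy Python sketch. The peer names, local terms and the two-term vocabulary below are hypothetical, and each peer is assumed to publish a mapping from its own terms to the shared ones; real ontology alignment is far harder than a lookup table, but the sketch shows why plain keyword search misses resources that a concept-level search finds.

```python
# Hypothetical small vocabulary that all peers agree on.
SHARED_VOCABULARY = {"Song", "Article"}

# Each peer annotates its files with its own local terms and publishes
# a mapping from those terms to the shared vocabulary.
peers = {
    "alice": {
        "files": {"track01.mp3": "Tune"},
        "mapping": {"Tune": "Song"},
    },
    "bob": {
        "files": {"rock.ogg": "Music", "news.txt": "Story"},
        "mapping": {"Music": "Song", "Story": "Article"},
    },
}


def keyword_search(peers, term):
    """Plain P2P keyword search: match the peers' local annotations literally."""
    return sorted(
        name
        for peer in peers.values()
        for name, local in peer["files"].items()
        if local == term
    )


def concept_search(peers, concept):
    """Translate each peer's local term to the shared vocabulary, then match."""
    if concept not in SHARED_VOCABULARY:
        raise ValueError(f"{concept!r} is not in the shared vocabulary")
    return sorted(
        name
        for peer in peers.values()
        for name, local in peer["files"].items()
        if peer["mapping"].get(local) == concept
    )


print(keyword_search(peers, "Song"))  # []
print(concept_search(peers, "Song"))  # ['rock.ogg', 'track01.mp3']
```

A keyword search for `Song` finds nothing because neither peer used that word, while the concept search finds both music files through the peers' published mappings.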

Note: For anyone interested in this topic, I recommend the book: Semantic Web and Peer-to-Peer: Decentralized Management and Exchange of Knowledge and Information. I will continue the discussion once I finish reading this book.

Sunday, October 25, 2009

Jena, a Framework for developing Semantic Web Applications


Jena, Semantic Web framework, advantages and features

Jena is an open source Java framework for developing Semantic Web applications, developed by HP Labs. It provides a Java environment for working with RDF, RDFS, OWL, SPARQL and reasoning engines. The framework adds a layer of abstraction that translates the statements and constructs of the Semantic Web into Java artifacts such as classes, objects, methods and attributes, which reduces the effort needed to program Semantic Web applications. One of Jena's strongest points is its excellent documentation: the exhaustive descriptions and tutorials available on the Web encourage programmers to build their Semantic Web applications on this framework.
As part of its RDF support, Jena can manage RDF resources and write them in the RDF/XML, N3 and N-Triples formats. It also supports RDF Schema, providing an API for all the vocabulary extensions RDFS brings. Moreover, Jena covers OWL in any of its three variants: OWL Full, OWL DL (Description Logic) and OWL Lite. The OWL API makes it possible to navigate the graph, locate resources and retrieve them from the model. Regardless of the schema and data models used (which can be separate resources), Jena can work with multiple ontologies from different sources simultaneously. Because every resource is identified by its URI, the API makes knowledge sharing very easy, and Jena is well suited to working with knowledge shared across the (Semantic) Web. The framework also provides methods for validating an ontology and for derivation logging, which lets the developer see how Jena derived the answers to a query.
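N-Triples is the simplest of the formats mentioned above: one statement per line, IRIs in angle brackets, literals in double quotes, and a terminating dot. The following framework-agnostic Python sketch writes a few statements in that format. It is not Jena's writer; it handles only IRIs and plain literals (no language tags, datatypes or blank nodes), and the `IRI` and `Literal` classes and the `example.org` resources are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def term_to_nt(term):
    """Render one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape backslashes first, then quotes and line breaks.
    escaped = (
        term.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """One statement per line: subject, predicate, object, then ' .'"""
    return "".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} .\n" for s, p, o in triples
    )


RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")

triples = [
    (IRI("http://example.org/Jena"), RDF_TYPE, IRI("http://example.org/Framework")),
    (IRI("http://example.org/Jena"), RDFS_LABEL, Literal('Jena "Semantic Web" framework')),
]
print(to_ntriples(triples), end="")
```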
For persistent storage, Jena works well with files containing OWL or RDF data, but it also has an API for database back ends. Thanks to its generic software design, Jena can easily be bound to SQL databases from different vendors; all a developer needs is the appropriate JDBC driver for the particular database.
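As a rough illustration of binding RDF statements to an SQL database, here is a sketch of a single "triple table" using Python's built-in `sqlite3` module. This layout is an assumption made for illustration, not Jena's actual database schema, and the table and function names are hypothetical; the same idea carries over to other vendors' databases given their own drivers.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one row per RDF statement."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        " s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
        " PRIMARY KEY (s, p, o))"
    )
    return conn


def add(conn, s, p, o):
    # The primary key makes the table behave like a set of statements.
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))
    conn.commit()


def objects(conn, s, p):
    """All objects of statements matching the pattern (s, p, ?)."""
    rows = conn.execute(
        "SELECT o FROM triples WHERE s = ? AND p = ? ORDER BY o", (s, p)
    )
    return [row[0] for row in rows]


store = open_store()
add(store, "ex:Jena", "rdf:type", "ex:Framework")
add(store, "ex:Jena", "ex:supports", "ex:SPARQL")
add(store, "ex:Jena", "ex:supports", "ex:OWL")
add(store, "ex:Jena", "ex:supports", "ex:OWL")  # duplicate, ignored
print(objects(store, "ex:Jena", "ex:supports"))  # ['ex:OWL', 'ex:SPARQL']
```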
Querying the knowledge graph is an important topic when discussing Semantic Web frameworks. Jena supports querying the model either through its API or by writing SPARQL queries directly. The knowledge base can also be exposed through Joseki (www.joseki.org), a web server designed especially for Jena. Joseki accepts SPARQL queries sent with the HTTP GET or POST methods and returns the results as XML (RDF/XML for queries that return a graph), which can be further formatted with XSLT.
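A SPARQL endpoint like Joseki is reached with ordinary HTTP. Under the SPARQL protocol, the query text travels URL-encoded in a `query` parameter, either in the URL of a GET request or in a form-encoded POST body. The Python sketch below only builds such requests and does not send them; the endpoint address is a hypothetical local server, not a documented Joseki default, and the helper names are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint address; use the URL your Joseki server actually exposes.
ENDPOINT = "http://localhost:2020/sparql"

QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?label
WHERE { ?resource rdfs:label ?label }
LIMIT 10"""


def sparql_get_url(endpoint, query):
    """GET form: the query is URL-encoded into the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def sparql_post_request(endpoint, query):
    """POST form: the same parameter, sent as a form-encoded body."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def query_from_url(url):
    """Decode the query back out of a GET URL, as the server would."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = sparql_get_url(ENDPOINT, QUERY)
# Against a live endpoint, urllib.request.urlopen(url) would send the request.
```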
Perhaps the most powerful component of the Jena framework is the Inference API. It contains several reasoner types that efficiently infer new relations in the knowledge graph, among them RDFS, OWL, transitive and generic rule reasoners. It is worth mentioning that Jena is also compatible with third-party reasoners such as Pellet. Each reasoner can be configured individually by creating a special resource that holds the desired configuration and then using it to perform the reasoning. For example, a reasoner can be configured to run in forward-chaining or backward-chaining mode, or an OWL reasoner can be instructed to use a Description Logic (or Full, or Lite) memory model specifically in favor of better reasoning performance.
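Forward chaining means applying rules to the known statements repeatedly until no new statement appears. The Python sketch below does this for just two RDFS entailment rules: the transitivity of `rdfs:subClassOf` (rdfs11) and the propagation of `rdf:type` to superclasses (rdfs9). Statements are plain tuples with prefixed names, and the `ex:` resources are hypothetical. Jena's RDFS reasoner implements a much fuller rule set, so treat this only as an illustration of the mechanism, not of Jena's API.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def forward_chain(triples):
    """Apply rdfs9 and rdfs11 until a fixed point is reached."""
    closure = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new = set()
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS, d))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for s, p, o in closure:
            if p == TYPE:
                for a, b in subclass:
                    if o == a:
                        new.add((s, TYPE, b))
        new -= closure
        if not new:  # nothing new inferred: fixed point reached
            return closure
        closure |= new


data = {
    ("ex:Jena", TYPE, "ex:Framework"),
    ("ex:Framework", SUBCLASS, "ex:Software"),
    ("ex:Software", SUBCLASS, "ex:Artifact"),
}
for triple in sorted(forward_chain(data) - data):
    print(triple)
```

The three asserted statements yield three inferred ones, including `ex:Jena rdf:type ex:Artifact`, which needs two rounds because it depends on a statement inferred in the first round.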

Disadvantages

Despite its powerful abilities and high level of abstraction, Jena has some serious disadvantages. For instance, when retrieving datasets, the framework places all statements in main memory, which often causes the Java Virtual Machine (JVM) heap to overflow. The JVM therefore needs a significant amount of heap space, depending on the number of statements in the resulting dataset. This holds even if an SQL database is used for persistence.
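One general way to keep memory use flat, independent of Jena, is to stream statements and process them one at a time instead of materializing the whole result. The Python sketch below assumes well-formed N-Triples with one statement per line and single spaces between terms (a simplification, since the format also permits tabs); the function names and the `data.nt` file mentioned in the comment are hypothetical.

```python
def iter_triples(lines):
    """Yield (subject, predicate, object) one line at a time.

    Assumes well-formed N-Triples, one statement per line, with single
    spaces between terms; any iterable of lines works, e.g. an open file.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, predicate, rest = line.split(" ", 2)
        # Drop the terminating " ." from the object part.
        obj = rest[:-1].rstrip() if rest.endswith(".") else rest
        yield subject, predicate, obj


def count_with_predicate(lines, predicate):
    # sum() over a generator keeps only one statement in memory at a time.
    return sum(1 for _, p, _ in iter_triples(lines) if p == predicate)


SAMPLE = """# tiny sample
<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "A, with spaces" .
<http://example.org/b> <http://www.w3.org/2000/01/rdf-schema#label> "B" .
<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://example.org/b> .
"""

label = "<http://www.w3.org/2000/01/rdf-schema#label>"
print(count_with_predicate(SAMPLE.splitlines(), label))  # 2
# With a (hypothetical) file:
# with open("data.nt", encoding="utf-8") as f:
#     count_with_predicate(f, label)
```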
The second disadvantage concerns threading. Jena is not thread-safe, so consistency and concurrency issues can easily occur. The API provides methods for declaring critical regions, but it is up to the programmer to coordinate the threads that use the model.
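The critical-region idea can be sketched in Python with a lock around every access to a shared model. The `GuardedModel` class below is hypothetical and uses a single exclusive `threading.Lock`, whereas a read/write scheme would let several readers in at once. The point is that the check-then-add step and the counter update happen atomically, so the counter can never drift from the real number of stored statements.

```python
import threading


class GuardedModel:
    """Hypothetical in-memory statement set; every access goes through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._triples = set()
        self.added = 0  # must always equal the number of stored statements

    def add(self, triple):
        with self._lock:  # critical region: check-then-add plus counter update
            if triple not in self._triples:
                self._triples.add(triple)
                self.added += 1

    def size(self):
        with self._lock:  # readers take the same lock in this simple version
            return len(self._triples)


def worker(model, worker_id, n):
    for i in range(n):
        model.add((f"ex:item{worker_id}_{i}", "ex:p", "ex:o"))  # private data
        model.add((f"ex:shared{i}", "ex:p", "ex:o"))  # contended by all workers


model = GuardedModel()
threads = [threading.Thread(target=worker, args=(model, w, 500)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(model.size(), model.added)  # 2500 2500
```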
The third, and possibly the most relevant, disadvantage is the cost of the inference process. Inference is one of the most basic features of a knowledge base, and at the same time its most powerful one: without inference, a knowledge base would not be much different from an ordinary database. As mentioned earlier, reasoning infers implicit statements in the knowledge graph, so the number of edges in the graph increases rapidly, and navigating the graph to locate a specific resource takes more time. Adding a large number of statements to the knowledge model is a time- and memory-consuming process. However, efforts are being made to reduce these costs using methods known as graph closure and graph reduction.
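The growth in edges is easy to quantify for a transitive relation such as rdfs:subClassOf. In a chain of n classes only n - 1 edges are asserted, but the closure contains one edge per ordered pair (Ci, Cj) with i < j, that is n(n-1)/2 edges, so the inferred graph grows quadratically. The Python sketch below computes the closure and then a transitive reduction, one common form of graph reduction that drops every edge implied by a longer path. Whether this matches the specific methods the paragraph refers to is an assumption; the function names are hypothetical, and the reduction assumes an acyclic graph.

```python
def chain(n):
    """Asserted edges C0 -> C1 -> ... -> C(n-1), e.g. subClassOf links."""
    return {(f"C{i}", f"C{i + 1}") for i in range(n - 1)}


def transitive_closure(edges):
    """Add (a, d) whenever (a, b) and (b, d) exist, until nothing changes."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def transitive_reduction(edges):
    """Keep only edges not implied by a longer path (assumes an acyclic graph)."""
    closure = transitive_closure(edges)
    nodes = {node for edge in closure for node in edge}
    return {
        (a, b)
        for a, b in closure
        if not any((a, m) in closure and (m, b) in closure for m in nodes if m not in (a, b))
    }


for n in (5, 10, 20):
    asserted = chain(n)
    inferred = transitive_closure(asserted)
    print(n, len(asserted), len(inferred))  # n - 1 asserted versus n(n-1)/2 inferred
```

For n = 10 the 9 asserted edges grow to 45, and the reduction recovers exactly the original 9.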

Summary

The Jena framework is an excellent tool for managing the resources needed by Semantic Web applications. Being written in Java, it can be used in a wide range of environments. In addition, it is open source and backed by solid documentation. Even though it has some significant disadvantages, it is still one of the most powerful frameworks for Semantic Web technologies and has the potential to become the de facto standard for developing such programs. Frameworks like Jena are worth investing in, since they might play a key role in the evolution of the WWW into the Semantic Web predicted by Sir Tim Berners-Lee.

Saturday, October 24, 2009

Alchemy from Raw Text to Semantic Annotations

Alchemy API is a free web API that can be used to semantically annotate web resources, such as HTML documents or raw text. It uses NLP (Natural Language Processing) to extract the meaning of the input text. Alchemy can retrieve entities and keywords, and it also offers plain-text extraction, text categorization, language identification and probably some other features that I haven't explored yet. The API responds with ordinary XML (RDF is included as well), which is, of course, easy to parse and integrate with any platform. Unlike OpenCalais, which offers a typical SOAP web service (also a technology I personally admire; you can read my post about it), Orchestr8's Alchemy API is invoked through plain web requests to their server.

The API is well documented, with downloadable examples for a variety of technologies. Since I am an ASP.NET developer, I only downloaded the C#.NET SDK examples, and they seem really neat and simple to use. The other SDKs are very probably just as good. Please note that you need an API key before using this service.

In my opinion, Orchestr8's API has a lot of potential to dramatically decrease the effort needed to semantically annotate the world's web content, which is most likely the stepping stone for the real semantic evolution. An interesting thing I noticed in Alchemy's response is that it actually contains a kind of entity alignment. The API can disambiguate a given term, and the RDF response includes an OWL ontology alignment piece (powered by Linked Data, I suppose) that aligns the disambiguated entity with semantic resources from sources like DBpedia, Freebase, the CIA Factbook, GeoNames, etc. Really cool. That could boost semantic knowledge management in even more advanced semantic applications.
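Parsing such an XML response takes only a few lines with Python's built-in `xml.etree.ElementTree`. The sample response below is a hypothetical, simplified structure made up for illustration (its element names may not match the real Alchemy API schema), and the `parse_entities` helper is likewise hypothetical. It extracts each entity together with its DBpedia link, when one is present.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified response; the real Alchemy API schema may differ.
SAMPLE_RESPONSE = """<results>
  <status>OK</status>
  <entities>
    <entity>
      <type>Person</type>
      <text>Tim Berners-Lee</text>
      <relevance>0.91</relevance>
      <disambiguated>
        <dbpedia>http://dbpedia.org/resource/Tim_Berners-Lee</dbpedia>
      </disambiguated>
    </entity>
    <entity>
      <type>Company</type>
      <text>Orchestr8</text>
      <relevance>0.55</relevance>
    </entity>
  </entities>
</results>"""


def parse_entities(xml_text):
    """Return a list of dicts, one per <entity> element."""
    root = ET.fromstring(xml_text)
    if root.findtext("status") != "OK":
        raise ValueError("request failed")
    entities = []
    for entity in root.iter("entity"):
        entities.append({
            "type": entity.findtext("type"),
            "text": entity.findtext("text"),
            "relevance": float(entity.findtext("relevance", default="0")),
            # None when the entity was not disambiguated.
            "dbpedia": entity.findtext("disambiguated/dbpedia"),
        })
    return entities


for item in parse_entities(SAMPLE_RESPONSE):
    print(item["text"], item["dbpedia"])
```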
Orchestr8 offers the API in several packages. The basic package is free and allows 30,000 calls per day, which is enough for small to medium-sized applications. If a good idea for a semantic web application comes along, this quota certainly will not be the limiting factor.



Friday, October 9, 2009

What Would You Do With RDF-Organized Knowledge?

So, let's say that there is an ideal tool for creating RDF from your HTML pages. Now what? This question came to my mind... What is the first application you would write, knowing that many sites publish their knowledge in RDF/OWL format? The sad thing is, I could not answer immediately. So think about it: what would be the amazing benefit of publishing RDF?
If anyone has implemented such applications, feel free to share them here.