The Web 3.0's Pulse : Semantic Web Trends

Friday, October 30, 2009

Semantic Web and Peer-to-Peer Networks

Decentralized Knowledge Management and Information Exchange

The idea of the Semantic Web brings interesting, potentially revolutionary approaches to fields such as knowledge management. Knowledge Management (KM) has become one of the essential driving forces behind large communities. However, searching through knowledge bases is still very limited. Semantic Web technologies promise concept alignment and powerful search features that can be incorporated into knowledge management systems. But because KM systems most often rely on a client-server architecture, physical bottlenecks are very likely to appear. Therefore, an alternative architecture appropriate for large-scale systems is needed. As you may have already guessed, this is where flat networks come into play - the Peer-to-Peer (P2P) systems.

P2P systems are not only scalable, but can also improve their performance as new nodes join the network. This unique feature makes them suitable ground for all kinds of sharing applications. However, searching through these networks is limited to keyword search only, and different peers may name the same resources differently, so these networks clearly lack semantics when resources are shared with anonymous peers.

Combining Semantic Web technologies and P2P networks could possibly bring ideal conditions for advanced knowledge management techniques. Distributing knowledge across different, transparent physical locations, while allowing peers to use their own views of the same data, could bring tremendous change to the evolution of the Semantic Web. At first glance, it seems like an extraordinary idea - it would be wonderful if I could share a file with other peers who would be able to understand what it is and how it is related to other files someone else owns. Also, KMSs could distribute their knowledge across different peers, possibly after first partitioning the knowledge by domain in order to boost performance.

But if peers are free to annotate the resources they share with others, how will the knowledge (ontology) alignment occur? There must be some kind of common mechanism for annotation, so that alignment happens automatically and peer agents are able to identify how the shared resources are related. One suggested approach is to use small common vocabularies among peers. Then the peer agents will be able to align the knowledge automatically. But the performance of such a large-scale system is yet to be proven, as no real Semantic P2P application has been deployed.
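To make the "small common vocabulary" idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `shared:` concept names, the `Peer` class and the `align` function are hypothetical and do not come from any real Semantic P2P system. Each peer keeps its own local terms but maps them to a handful of shared concepts, and two local terms are considered aligned when they point to the same shared concept.

```python
# Hypothetical shared vocabulary: a few concept identifiers all peers agree on.
SHARED_VOCABULARY = {"shared:Author", "shared:Title", "shared:Topic"}


class Peer:
    """A peer that annotates resources with local terms mapped to shared concepts."""

    def __init__(self, name, local_to_shared):
        unknown = set(local_to_shared.values()) - SHARED_VOCABULARY
        if unknown:
            raise ValueError(f"{name} maps to unknown shared terms: {sorted(unknown)}")
        self.name = name
        self.local_to_shared = dict(local_to_shared)


def align(peer_a, peer_b):
    """Return (term_a, term_b, concept) for local terms that share a concept."""
    by_concept = {}
    for term, concept in peer_b.local_to_shared.items():
        by_concept.setdefault(concept, []).append(term)
    pairs = []
    for term_a, concept in peer_a.local_to_shared.items():
        for term_b in by_concept.get(concept, []):
            pairs.append((term_a, term_b, concept))
    return sorted(pairs)


# Two peers name the same concept differently, but both map it to shared:Author.
alice = Peer("alice", {"writer": "shared:Author", "heading": "shared:Title"})
bob = Peer("bob", {"creator": "shared:Author", "subject": "shared:Topic"})
alignment = align(alice, bob)
```

In this toy setup the only aligned pair is "writer" and "creator". The sketch says nothing about the open question raised above, which is how such alignment performs across a very large network.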

Note: For anyone interested in this topic, I recommend the book: Semantic Web and Peer-to-Peer: Decentralized Management and Exchange of Knowledge and Information. I will continue the discussion once I finish reading this book.

Sunday, October 25, 2009

Jena, a Framework for developing Semantic Web Applications


Jena, Semantic Web framework, advantages and features

Jena is a Java framework for developing Semantic Web applications. It has been developed by HP Labs and it is an open source project. Basically, Jena provides a Java environment for working with RDF, RDFS, OWL, SPARQL and reasoning engines. The Jena framework creates an additional layer of abstraction that translates the statements and constructs of the Semantic Web into Java artifacts, such as classes, objects, methods and attributes. These artifacts reduce the effort needed for programming Semantic Web applications. One of Jena's greatest strengths is its excellent documentation. The exhaustive resources that can be found on the Web, including descriptions and tutorials, encourage programmers to further develop their Semantic Web applications using this framework.
As part of its RDF features, Jena supports managing RDF resources and writing them in the RDF/XML, N3 and N-Triples formats. Jena also supports working with RDF Schema, by providing an API for all the vocabulary extensions it brings. Moreover, Jena covers the usage of OWL in any of its three variants: Full, Description Logic (DL), and Lite. The OWL API provides the ability to navigate through the graph, locate resources and retrieve them from the model. Regardless of the schema and the data models used (which can be separate resources), Jena can simultaneously work with multiple ontologies from different sources. The API that comes with the framework makes the knowledge sharing process extremely easy: since every resource comes with its URI, Jena is excellent at working with knowledge shared across the (Semantic) Web. The framework also covers methods for validating an ontology, as well as derivation logging, which enables the developer to see how Jena arrived at the answers to a query.
Regarding persistent storage, Jena works well with files containing OWL or RDF data, but also has an API for a database backend. Because of the high level of generality built into its software design, Jena can be easily bound to SQL databases from different vendors. All a developer needs is an appropriate driver for the particular SQL database.
Querying the knowledge graph is an important topic when discussing Semantic Web frameworks. Jena supports querying the model through the API, or by directly constructing a SPARQL query to retrieve the results. The knowledge base can be attached to a web server designed especially for Jena, named Joseki (www.joseki.org). Joseki acts as a mediator: it accepts a SPARQL query sent through the HTTP GET or POST methods and returns a response with the results in RDF/XML, which can be further formatted with XSLT.
Perhaps the most powerful component of the Jena Framework is the Inference API. This API contains several reasoner types, which efficiently infer new relations in the knowledge graph. Among them are the RDF(S), OWL, Transitive and Generic reasoners. It is worth mentioning that Jena is compatible with third party reasoners, such as the Pellet reasoner. All of the reasoners can be configured individually, by creating special resources that contain the desired configuration and then using them to perform the reasoning. For example, the reasoner can be configured to run in forward-chaining or backward-chaining mode, or an OWL reasoner can be instructed to use a Description Logic (or Full or Lite) memory model in favor of better reasoning performance.
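As a small illustration of the HTTP GET style of querying described above, here is a Python sketch that only builds the request URL. It assumes the endpoint follows the standard SPARQL protocol convention of passing the query text in a `query` parameter. The endpoint address `http://localhost:2020/sparql` and the helper name `build_get_url` are placeholders I made up, not something taken from the Joseki documentation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address; replace it with wherever your server listens.
ENDPOINT = "http://localhost:2020/sparql"

# A generic query that lists up to ten triples from the knowledge base.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def build_get_url(endpoint, query):
    """Encode the SPARQL query into the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_get_url(ENDPOINT, QUERY)

# Round trip: decoding the URL should give back exactly the original query.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The resulting `url` could then be fetched with `urllib.request.urlopen`; I left the network call out so the sketch does not depend on a running server.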
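To show what forward chaining means in practice, here is a toy reasoner in Python. It is not the Jena Inference API and does not try to imitate it. It is only a sketch, under the assumption that we apply two RDFS-style rules (subclass transitivity and type propagation) repeatedly until nothing new can be derived. The `ex:` names are invented for the example.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def forward_chain(triples):
    """Naive forward chaining: apply the rules until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # Both rules need a second triple of the form (o subClassOf o2).
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))  # subclass transitivity
                elif p == TYPE:
                    new.add((s, TYPE, o2))  # an instance is also typed by superclasses
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
}
closure = forward_chain(facts)
```

Starting from three explicit statements, the loop ends with six, including the implicit fact that `ex:rex` is an `ex:Animal`. A backward-chaining reasoner would instead start from the question asked and derive only what is needed to answer it.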

Disadvantages

Despite the powerful abilities and the high level of abstraction provided, Jena has some serious disadvantages. For instance, when retrieving datasets, the framework places all statements into main memory, often causing an overflow of the Java Virtual Machine (JVM) heap. Therefore, the framework needs a significant amount of memory, depending on the number of statements that are retrieved in the resulting data set. This is true even if one decides to use an SQL database for persistence.
The second disadvantage concerns threading. Namely, Jena is not thread safe, and consistency and concurrency issues can easily occur. The API provides methods for declaring critical regions, but it is up to the programmer to take care of the threads using the model.
The third, and possibly the most relevant, disadvantage is the cost of the inference process. Inference is one of the basic features of a knowledge base, and also the most powerful one. Without inference, a knowledge base would not be much different from an ordinary database. As mentioned earlier, the reasoning process infers implicit statements in the knowledge graph. Hence the number of edges in the graph rapidly increases, requiring more time to navigate the graph and locate a specific resource in it. Adding a large number of statements to the knowledge model is a time- and memory-consuming process. However, efforts are being made to decrease these high costs by using methods known as graph closure and graph reduction.
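The general pattern of guarding a shared model with a critical region can be sketched in a few lines of Python. This is not Jena code: `SharedModel` and `worker` are hypothetical names, and the lock simply stands in for whatever critical-region mechanism the framework offers.

```python
import threading


class SharedModel:
    """A toy statement store whose every access goes through one lock."""

    def __init__(self):
        self._statements = set()
        self._lock = threading.Lock()

    def add(self, triple):
        with self._lock:  # critical region: one writer at a time
            self._statements.add(triple)

    def size(self):
        with self._lock:
            return len(self._statements)


def worker(model, worker_id, count):
    """Each worker adds its own distinct statements to the shared model."""
    for i in range(count):
        model.add((f"ex:w{worker_id}", "ex:wrote", f"ex:doc{i}"))


model = SharedModel()
threads = [threading.Thread(target=worker, args=(model, n, 250)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = model.size()
```

The point is that the discipline lives in the calling code: nothing stops a careless thread from touching the statements without taking the lock first.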
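To get a feeling for how quickly inferred edges pile up, here is a small Python sketch. I am assuming that "graph closure" and "graph reduction" can be read as transitive closure and transitive reduction of a single transitive relation. That is my simplification, not a description of how any particular framework implements them.

```python
def transitive_closure(edges):
    """Add every edge (a, c) implied by a path a -> ... -> c."""
    closure = set(edges)
    while True:
        implied = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if implied <= closure:
            return closure
        closure |= implied


def transitive_reduction(edges):
    """For an acyclic graph, keep only edges not implied by a longer path."""
    closure = transitive_closure(edges)
    nodes = {node for edge in closure for node in edge}
    return {
        (a, b)
        for (a, b) in closure
        if not any((a, m) in closure and (m, b) in closure for m in nodes)
    }


# A chain n0 -> n1 -> ... -> n5 has 5 explicit edges.
chain = {(f"n{i}", f"n{i + 1}") for i in range(5)}
closure = transitive_closure(chain)
reduced = transitive_reduction(closure)
```

For a chain of 6 nodes the closure already holds 15 edges instead of 5, and in general a chain of n nodes grows to n(n-1)/2 edges. The reduction goes the other way and recovers the 5 original edges, which is the kind of saving the reduction idea aims for.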

Summary

The Jena Framework is an excellent tool for managing the resources needed for Semantic Web applications. Being developed in Java, it is applicable to various environments. In addition, it is open source and strongly backed by solid documentation. Even though it has some significant disadvantages, it is still one of the most powerful frameworks for Semantic Web technologies and holds the potential to become the de facto standard when it comes to developing such programs. Frameworks like Jena are worth investing in, since they might play a key role in the evolution of the WWW into the Semantic Web, as predicted by Sir Tim Berners-Lee.

Saturday, October 24, 2009

Alchemy from Raw Text to Semantic Annotations

Alchemy API is a free web API that can be used to semantically annotate web resources such as HTML documents or plain raw text. The Alchemy API uses NLP (Natural Language Processing) to extract the meaning of the input text. Alchemy is capable of entity and keyword retrieval, plain text extraction, text categorization, language identification and probably some other features that I haven't explored yet. The response of the API is ordinary XML (RDF included here as well), which, of course, is easy to parse and integrate with any platform. Unlike OpenCalais, which offers a typical SOAP web service (which is also a technology I personally admire, you can read my post about it), Orchestr8's Alchemy API uses web requests to their server to invoke their NLP technology. The API is well documented, with downloadable examples for a variety of technologies. Since I am an ASP.NET developer, I only downloaded the C#.NET SDK examples, and they seem really neat and simple to use. It is very probable that the other SDKs are as good as this one. Please note that you need an API key before using this service.
In my opinion, this API from Orchestr8 has a lot of potential to dramatically decrease the effort needed for semantic annotation of the world's web content, which is most likely the stepping stone for the real semantic evolution. An interesting thing that I noticed in Alchemy's response is that it actually contains some kind of entity alignment. For instance, it is capable of disambiguating a given term, so in the RDF response there is an OWL ontology alignment piece (powered by LinkedData, I suppose) that aligns the disambiguated entity with semantic resources from sources like DBPedia, Freebase, CIA Factbook, GeoNames etc. Really cool. That could boost semantic knowledge management in even more advanced semantic applications.
Orchestr8 offers the API in several packages. The basic package is free and allows 30 000 calls per day, which is enough for small to medium applications. If a good idea for a Semantic Web application is born, this certainly will not be the limit.



Thursday, October 15, 2009

Triplify: Expose Your Information as RDF

Yet another way of automating the process of semantic annotation of your web content. As mentioned many times, one of the biggest obstacles to the Semantic Web becoming fully implemented is that people (web masters) are forced to manually annotate every resource they have on the Web so that (future) applications can use that information. The trouble is the term future applications. There is no killer Semantic Web application yet, partly because there is no semantic data available through the Web.
So here Triplify comes into play. Triplify is a framework that works on top of your SQL database: the web master selects the columns that are of interest when constructing the triples, and Triplify automatically generates RDF using the pattern specified by the owner.
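The general idea of turning selected columns into triples can be sketched in Python with the standard `sqlite3` module. This is not Triplify itself (which is written in PHP) and does not reproduce its configuration format. The `posts` table, the `http://example.org/` base URI, and the `rows_to_triples` helper are all assumptions made up for this example; only the Dublin Core predicate URIs are real.

```python
import sqlite3

BASE_URI = "http://example.org/"  # hypothetical base URI for generated subjects
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_triples(conn, table, id_column, column_to_predicate):
    """Emit one N-Triples line per selected, non-NULL column value."""
    # Table and column names are interpolated into the SQL, so they must come
    # from trusted configuration written by the site owner, never user input.
    columns = ", ".join([id_column, *column_to_predicate])
    rows = conn.execute(f"SELECT {columns} FROM {table} ORDER BY {id_column}")
    triples = []
    for row in rows:
        subject = f"<{BASE_URI}{table}/{row[0]}>"
        for predicate, value in zip(column_to_predicate.values(), row[1:]):
            if value is not None:
                triples.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, "Hello RDF", "ana"), (2, 'Say "hi"', None)],
)
triples = rows_to_triples(
    conn, "posts", "id", {"title": DC + "title", "author": DC + "creator"}
)
```

Each row becomes a subject URI built from the table name and the primary key, and each chosen column becomes a predicate. NULL values are simply skipped, so the second post produces only a title triple.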




This is an illustration of how Triplify works:



Disadvantages
Despite the nice approach to generating RDF triples, some serious shortcomings are hidden under the hood.
First, it has performance issues and is aimed at small to medium web sites.
Secondly, it is still implemented only in PHP. The Semantic Web goes far beyond the limitations of any one technology, so more developers are needed for implementations on other platforms.

Conclusion 
As far as I have explored, it seems fairly easy to integrate, and it might be very helpful for exporting your data as RDF. But one question still remains.
What to do with the RDF? How does one use RDF to make a semantic killer application?

Monday, October 12, 2009

The "Prophet" Himself: Tim Berners-Lee Talks About the Semantic Web

Here is an interesting video for all of you who would like to see Tim Berners-Lee talking about the phenomenon he predicted about 10 years ago and the revolution which we are all anxiously waiting to be triggered. I think most of you have already read or heard about much of what he says in the video, but it is still a nice feeling to see him talking about the Semantic Web. He is so passionate when talking about the Web of Data; he really believes in it.




Sunday, October 11, 2009

Meet Sindice - the Semantic Web Index

Sindice is a Semantic Web index with a search engine (here is the link: sindice.com). Interestingly, one of its authors is Nova Spivack from Radar Networks, the same guy who runs twine.com. Is this the semantic search engine Twine talks about? Sindice claims to offer semantic search for terms and properties or triples, but personally, I can hardly see the benefit of using it - it is only capable of indexing things, it is not capable of extracting a term's relations with other terms, which is what the Semantic Web is about. Why would I need a semantic index, or a search through that index? Probably the best application of these services would be building semantic piped tasks. The Sigma search engine aggregates definitions of terms from different sources, and what I really like about it is that users can control the sources of the definitions. Sigma can export the search results in various formats, such as RDF or JSON, but I am really having a hard time seeing the benefit of that without getting the relations between the terms.

In general, Sindice has some fancy marketing, but what it really has under the hood remains to be seen. Personally, I don't think there is much to brag about.

Insights of the Semantic World: APIs, Freebase and Collaborative Semantic Web

The Semantic Web is all about collaboration. Collaboration between machines, collaboration between people, and collaboration between machines and people. Everything (r)evolves around this term. Though this blog does not intend to provide tutorial-like information about the Semantic Web, this presentation I came across on slideshare really drew my attention, so I decided to share it here. Very nice way of organizing the concepts about the Semantic Web and oh, the last slide is simply amazing :) . Not just because of the point where the arrow points to, but because of the curve we are about to climb on. When I see slideshows like this, I always get excited and say to myself: "Wow! What a brilliant idea! This is ... great! You know, this could change the world! It's such a wonderful feeling!". My suggestion is that you take a look at the presentation and see for yourself. 


Friday, October 9, 2009

What Would You Do With RDF Organized Knowledge ?

So, let's say that there is an ideal tool for creating RDF from your HTML pages. And now what? This question came to my mind... What is the first application you would write, knowing that many sites publish their knowledge in RDF / OWL format? The sad thing is, I could not answer immediately. So think about it: what would be the amazing benefit of publishing RDF?
If anyone has implemented such applications, feel free to share them here.

Open Calais: Automatic RDF Annotation of Raw Text

OpenCalais: Automatic Knowledge Extraction

Have you tried OpenCalais? It's a web service that automatically annotates raw text with semantic meaning - it generates an RDF file from it. Well, it's still far from perfect, but it clearly does a good job at semantic annotation of your text. Basically, one can copy & paste the text from one's blog and get RDF-structured knowledge based on that text. The OpenCalais team says they use NLP (Natural Language Processing) techniques to analyze the text and calculate the relevance of the concepts recognized in it. It can be very useful for web masters, since this approach can be a true time-saver and one of the easiest steps towards automatic knowledge publication. In my opinion, automating the process of semantic annotation of web documents on the one hand, and presenting the benefits of the Semantic Web to web masters on the other, are the two crucial steps that need to be taken in order to come closer to the true implementation of the Semantic Web itself. These two steps can end the vicious chicken-and-egg circle: webmasters refuse to put extra effort into embedding knowledge into their web pages, since there are no semantic applications that would use that knowledge and make web masters' lives easier. But because there is no semantic knowledge, no real semantic applications can be developed. And this circle goes on and on.
With OpenCalais, you can publish your knowledge via an API, so new custom applications can arise from your website, blog, wiki, e-commerce page or similar. I admit I still need to read about and play with Calais to explore its full features, but from what I have seen so far it looks excellent. This article does not aim to advertise OpenCalais in any way, nor am I affiliated with it, but I would like to emphasize the importance of its existence as a service that could be the stepping stone towards unleashing the power of the Semantic Web.
Moreover, OpenCalais has plugins for Wordpress (ohhh, none of them for Blogger :-( ) that automatically generate tags (Tagaroo). OpenCalais can also be integrated with Drupal. Seems like a nice application.
I believe that by using this service, the number of semantically annotated pages will rise rapidly. That will make good ground for the development of even more advanced Semantic Applications. Watch the video, visit the site, and tell me what you think.



Here is how applications can be built on top of it:



You can try the OpenCalais Document Viewer, to see how it generates the RDF output.

It only remains to be seen whether OpenCalais will fulfill its glorious mission. I really recommend OpenCalais to the Semantic Web community; its effort deserves attention.


Monday, September 21, 2009

Thomson Reuters Claims It Is Able to Extract Semantics From Free Text and Export It to an Oracle Database

According to the latest news, Thomson Reuters has reported that OpenCalais, a metatagging service, will be integrated with an Oracle database. Here is how it works: first, a number of raw text (unstructured) documents are identified in a database, in a filesystem or across a network; then OpenCalais is invoked via a web service, which returns a set of RDF triples that are then saved back into an RDF triple store.
This probably means the beginning of the end of the extra effort needed to semantically annotate the enormous number of web documents that are deployed all over the Internet. With such possibilities at their service, web masters could finally tag their web pages with a single click - and bother no more. Businesses will benefit from this too. Their scattered knowledge bases can now be easily integrated into a single entity - which could possess its own inference engine and further utilize the semantics it gets.
Currently OpenCalais claims to be processing between 3 and 5 million documents per day, and it will soon attract even more developers to use its service.
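The three steps (collect documents, call an annotation service, store the returned triples) can be sketched in Python as follows. Please read it as a toy: the `annotate` function is a local stand-in that only looks up a couple of hard-coded names, it does not call OpenCalais, and the `TripleStore` class is an in-memory set, not an Oracle database. All names and the `ex:` / `doc:` identifiers are invented for illustration.

```python
# Toy lookup table standing in for real entity recognition.
KNOWN_ENTITIES = {"Oracle": "ex:Company", "Reuters": "ex:Company"}


def annotate(doc_id, text):
    """Stand-in for the remote annotation call: returns triples for one document."""
    triples = []
    for name, entity_type in KNOWN_ENTITIES.items():
        if name in text:
            entity = f"ex:{name}"
            triples.append((entity, "rdf:type", entity_type))
            triples.append((doc_id, "ex:mentions", entity))
    return triples


class TripleStore:
    """Minimal in-memory triple store with pattern matching."""

    def __init__(self):
        self._triples = set()

    def add_all(self, triples):
        self._triples.update(triples)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def run_pipeline(documents, store):
    """Annotate every document and save the resulting triples in the store."""
    for doc_id, text in documents.items():
        store.add_all(annotate(doc_id, text))


documents = {
    "doc:1": "Oracle ships a new database.",
    "doc:2": "Reuters reports on Oracle.",
    "doc:3": "Nothing to see here.",
}
store = TripleStore()
run_pipeline(documents, store)
mentions_oracle = store.match(p="ex:mentions", o="ex:Oracle")
```

Once the triples are in the store, a question like "which documents mention Oracle?" becomes a simple pattern match, which here returns `doc:1` and `doc:2`.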

Will this be the trigger to catalyze the Semantic (r)evolution?

Saturday, September 19, 2009

The Semantic Search Engine : Dream 3.0 ?

Today I read about the latest attempt to fulfill the famous Web Dream: the Semantic Search Engine. Wouldn't it be nice to have such a wonderful tool, one that can actually understand you? You can ask it about anything, it is the Global Mind, it crunches data and comprehends the whole Web, the largest knowledge management application humanity has ever built. And the best thing is, it learns and gets smarter every day... by itself.
It sounds like a quote from a science fiction book, but is it that far off? It's been about 10 years since the publication of the famous paper in Scientific American by Tim Berners-Lee, and yet no (r)evolution has occurred. There is no single killer application fueled by the Semantic Technologies. But why?
The whole computer industry has existed for roughly 60 years, and the Internet era began in the 1990s, so a period of 10 years is a huge amount of time for the Web. We have the standards, we have the tools, we have the frameworks, the knowledge ...
I have read several articles and it seems there is a logical explanation for this phenomenon: it's the humans that are at fault... (again). The problem is not in making machines understand what we mean (personally, I think that is the most exciting part when telling someone what the Semantic Web is all about: computers will understand? Really? Like in the movies? Will I be able to ask them via voice control?). The trouble is that people are lazy. The WWW is the biggest and the fastest growing entity on the whole planet. It is enormous. Annotating all that data across the web requires extra effort from people, and it is tedious and time-consuming (hey, didn't we invent computers because of that?). So it's people who don't understand, not the machines. Machines are ready to learn. Another issue comes from the fact that humans are spoiled and selfish - people lie. Yes, they do. There is no rightful force to make webmasters embed true information about their web pages. (Remember the keyword stuffing problem?) How will anyone even make them want to start annotating their pages? I believe that most of the reasons for the stagnation of the Semantic Web and its applications lie here.

There are efforts to automate the process through Natural Language Processing (NLP), but I wonder whether it will ever reach the desired level of automation. Here is a good article about what Oracle does: Oracle & OpenCalais - Semantic Database. This really makes me happy, because it gives a burst of hope that the Semantic Web is not an e-Myth.
Back to the search engines. The team at Twine.com has been busy trying to achieve the unimaginable: produce a true Semantic Search Engine. Here is the original post I found: T2 - Twine's Semantic Search Engine. If this comes true, all the hype will disappear in the mist. I understand why people are sceptical, but personally, guys, you don't know what might happen. Maybe it is possible. The requirements are high - a volatile system, evolving every second, reasoning and comprehending, accurate, fast, robust ... but there is still a chance. The Semantic Search Engine is one of the most desired applications of the Semantic Web. It will be a major breakthrough - although many find it tough to believe. Will openness and sharing prevail in the end?