Get Semantic with DBpedia and ActiveRDF
I'm quite excited by the things that the Semantic Web will make possible. One very interesting project is DBpedia, which aims to extract structured data from Wikipedia, link it with other datasets, and put everything in an RDF triple store that you can either download or query via a 'SPARQL endpoint' on the web. I've been trying out ActiveRDF to make DBpedia queries and show the results in a Korundum KDE4 app.
This week the DBpedia team have improved the dataset with better extraction algorithms and bug fixes, and it is starting to get seriously useful. ActiveRDF is similar to Rails ActiveRecord, but instead of running relational database queries and turning the resulting rows into instances of Ruby classes, it retrieves RDF from triple stores.
To use the app, you enter the name of a resource such as 'The Beatles'. The resource is looked up in DBpedia, and an abstract of the Wikipedia article is retrieved along with any references. The results are shown in one KDE::HTMLPart with clickable links that allow you to look at the references in a second KDE::HTMLPart widget. And all in less than 100 lines of code:
require 'korundum4'
require 'active_rdf'

# Standard KDE application setup
about = KDE::AboutData.new(
  Qt::ByteArray.new("dbpedia"),
  Qt::ByteArray.new("DBpedia demo"),
  KDE::LocalizedString.new,
  Qt::ByteArray.new
)
KDE::CmdLineArgs.init(ARGV, about)
KDE::Application.new

kmainwindow = KDE::MainWindow.new(nil)
widget = Qt::Widget.new(kmainwindow)
kmainwindow.centralWidget = widget

title = Qt::Label.new("Search DBpedia for references to a resource") do |t|
  t.alignment = Qt::AlignCenter
end

edit = KDE::LineEdit.new
khtml1 = KDE::HTMLPart.new   # shows the abstract and the list of references
khtml2 = KDE::HTMLPart.new   # shows whichever reference was clicked

splitter = Qt::Splitter.new do |s|
  s.orientation = Qt::Vertical
  s.addWidget(khtml1.widget)
  s.addWidget(khtml2.widget)
end

widget.layout = Qt::VBoxLayout.new do |l|
  l.addWidget(title)
  Qt::HBoxLayout.new do |h|
    l.insertLayout 1, h
    h.addWidget(Qt::Label.new("Resource name"))
    h.addWidget(edit)
  end
  l.addWidget(splitter)
end

# Connect to the DBpedia SPARQL endpoint
pool = ConnectionPool.add_data_source :type => :sparql,
                                      :url => "http://dbpedia.org/sparql",
                                      :results => :sparql_xml
Namespace.register(:dbpedia, 'http://dbpedia.org/')

edit.connect SIGNAL(:returnPressed) do
  # DBpedia resource names use underscores in place of spaces
  resource_name = edit.text.gsub(/ /, '_')

  # Query 1: the references of the resource
  references = Query.new.distinct(:reference).
    where(RDFS::Resource.new("http://dbpedia.org/resource/#{resource_name}"),
          RDFS::Resource.new('http://dbpedia.org/property/reference'),
          :reference).
    execute

  if references.length == 0
    KDE::MessageBox.information(edit, "No resource found for '#{edit.text}'")
  else
    khtml1.begin
    khtml1.write("<h1>#{edit.text}</h1>")

    # Query 2: the article abstract, restricted to one language tag
    abstract = Query.new.select(:abstract).
      where(RDFS::Resource.new("http://dbpedia.org/resource/#{resource_name}"),
            DBPEDIA.abstract,
            :abstract).
      lang(:abstract, 'es').
      execute

    # Skip the paragraph when the query returned nothing
    unless abstract.nil? || abstract.empty?
      khtml1.write("<p>#{abstract[0]}</p><br />")
    end

    # Query 3: one label lookup per reference
    references.each_with_index do |ref, index|
      label = Query.new.select(:label).
        where(RDFS::Resource.new(ref.uri), RDFS::label, :label).
        execute
      khtml1.write("<a href='#{ref.uri}'>[#{index + 1}]</a> #{label[0]}<br />")
    end

    khtml1.end
  end
end

# Open clicked links in the second HTML part
khtml1.browserExtension.connect SIGNAL("openUrlRequest(KUrl)") do |url|
  khtml2.openUrl(url)
end

kmainwindow.resize(700, 600)
kmainwindow.show
$kapp.exec
You can install ActiveRDF as a gem with 'sudo gem install activerdf'. Here is the code to connect to the DBpedia SPARQL endpoint:
pool = ConnectionPool.add_data_source :type => :sparql,
                                      :url => "http://dbpedia.org/sparql",
                                      :results => :sparql_xml
Namespace.register(:dbpedia, 'http://dbpedia.org/')
Pretty simple! The 'Namespace.register()' method allows you to shorten RDF property and class names. Instead of RDFS::Resource.new('http://dbpedia.org/resource'), you can just use DBPEDIA.resource in a query. DBpedia uses a large number of namespaces, and here is the complete set for use in ActiveRDF queries:
Namespace.register(:atomrdf, 'http://atomowl.org/ontologies/atomrdf#')
Namespace.register(:common_sense_mapping, 'http://www.loa-cnr.it/ontologies/CommonSenseMapping.owl#')
Namespace.register(:dbpedia, 'http://dbpedia.org/')
Namespace.register(:dc, 'http://purl.org/dc/elements/1.1/')
Namespace.register(:dcterms, 'http://purl.org/dc/terms/')
Namespace.register(:dolce_lite, 'http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#')
Namespace.register(:event, 'http://purl.org/NET/c4dm/event.owl#')
Namespace.register(:extended_dns, 'http://www.loa-cnr.it/ontologies/ExtendedDnS.owl#')
Namespace.register(:foaf, 'http://xmlns.com/foaf/0.1/')
Namespace.register(:gforge_ont, 'http://swc.projects.semwebcentral.org/owl/gforge-ont#')
Namespace.register(:koala, 'http://protege.stanford.edu/plugins/owl/owl-library/koala.owl#')
Namespace.register(:mo, 'http://purl.org/ontology/mo/')
Namespace.register(:northwind, 'http://www.openlinksw.com/schemas/northwind#')
Namespace.register(:ontology, 'http://purl.org/ontology/')
Namespace.register(:periodic_table, 'http://www.daml.org/2003/01/periodictable/PeriodicTable#')
Namespace.register(:pim_contact, 'http://www.w3.org/2000/10/swap/pim/contact#')
Namespace.register(:relationship, 'http://purl.org/vocab/relationship/')
Namespace.register(:rss, 'http://purl.org/rss/1.0/modules/content/')
Namespace.register(:siocex, 'http://activerdf.org/sioc/')
Namespace.register(:sioc, 'http://rdfs.org/sioc/ns#')
Namespace.register(:sioc_types, 'http://rdfs.org/sioc/types#')
Namespace.register(:skos, 'http://www.w3.org/2004/02/skos/core#')
Namespace.register(:time, 'http://www.w3.org/2006/time#')
Namespace.register(:timeline, 'http://purl.org/NET/c4dm/timeline.owl#')
Namespace.register(:vocab_frbr_core, 'http://purl.org/vocab/frbr/core#')
Namespace.register(:vocab, 'http://purl.org/vocab/')
Namespace.register(:wgs84_pos, 'http://www.w3.org/2003/01/geo/wgs84_pos#')
Namespace.register(:wordnet, 'http://xmlns.com/wordnet/1.6/')
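To make the prefix idea concrete, here is a rough plain-Ruby sketch of how a registered prefix can expand a short name into a full URI. The TinyNamespace class and the *_SKETCH constants are made up for illustration; this is not how ActiveRDF itself is implemented, just the general technique of appending a local name to a base URI.

```ruby
# Hypothetical stand-in for a registered namespace; NOT ActiveRDF's real code.
class TinyNamespace
  def initialize(base_uri)
    @base_uri = base_uri
  end

  # Treat any unknown, argument-less method name as a local name
  # that gets appended to the base URI.
  def method_missing(name, *args)
    return super if !args.empty? || name.to_s.start_with?('to_')
    "#{@base_uri}#{name}"
  end

  # Keep respond_to? consistent with method_missing; conversion methods
  # such as to_ary or to_str are deliberately left alone.
  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?('to_') || super
  end
end

# Illustrative constants, named so they cannot clash with ActiveRDF's own.
DBPEDIA_SKETCH = TinyNamespace.new('http://dbpedia.org/')
FOAF_SKETCH = TinyNamespace.new('http://xmlns.com/foaf/0.1/')

puts DBPEDIA_SKETCH.resource   # prints http://dbpedia.org/resource
puts FOAF_SKETCH.homepage      # prints http://xmlns.com/foaf/0.1/homepage
```

The trade-off with this approach is that any name Ruby already defines on objects (such as 'class') is not expanded, which is one reason to fall back to spelling out the full URI now and then.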
One problem with DBpedia is that it is a bit slow, and you often need to split up complex queries into simple ones. That is why there are three different sorts of queries in the code above, rather than one big complex one. This is the initial query to retrieve the references from the resource (there are further queries to retrieve the text of the article abstract, and to get the labels of the references):
references = Query.new.distinct(:reference).
  where(RDFS::Resource.new("http://dbpedia.org/resource/#{resource_name}"),
        RDFS::Resource.new('http://dbpedia.org/property/reference'),
        :reference).
  execute

It translates to this SPARQL query:

SELECT DISTINCT ?reference
WHERE {
  <http://dbpedia.org/resource/The_Beatles> <http://dbpedia.org/property/reference> ?reference .
}
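If you want to see what goes over the wire without ActiveRDF in the way, the same references query can be assembled by hand. The sketch below uses only the Ruby standard library; the helper names (resource_uri, references_sparql, request_uri) are invented for this example, and it only builds the request URL rather than sending it. It assumes the endpoint follows the standard SPARQL protocol, where the query text travels in a 'query' parameter.

```ruby
require 'uri'

DBPEDIA_ENDPOINT = 'http://dbpedia.org/sparql'

# Turn a human-readable name such as 'The Beatles' into a DBpedia
# resource URI (spaces become underscores, as in the app above).
def resource_uri(name)
  "http://dbpedia.org/resource/#{name.strip.gsub(' ', '_')}"
end

# Hand-written equivalent of the references query.
def references_sparql(name)
  <<~SPARQL
    SELECT DISTINCT ?reference
    WHERE {
      <#{resource_uri(name)}> <http://dbpedia.org/property/reference> ?reference .
    }
  SPARQL
end

# Build the GET URL for the endpoint; 'query' is the parameter name
# defined by the SPARQL protocol.
def request_uri(sparql)
  uri = URI(DBPEDIA_ENDPOINT)
  uri.query = URI.encode_www_form(query: sparql)
  uri
end

sparql = references_sparql('The Beatles')
puts sparql
puts request_uri(sparql)
```

Pasting the printed URL into a browser is a quick way to check whether a slow response is down to the endpoint or to the client code.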
So the ActiveRDF Ruby DSL maps very nicely onto the SPARQL query language, and the results are returned in a Ruby Array. Any literals are returned as Strings, and URIs are returned as instances of RDFS::Resource, with the URI string accessible via a 'uri' method call. Recently there has been a lot of activity on the ActiveRDF mailing list, with people making suggestions, measuring performance and sending patches to tweak the query language, and it really seems to be a happening project. As the KDE4 Nepomuk semantic desktop uses the Redland RDF library, with a Berkeley DB based triple store, you can also use ActiveRDF to query that, and combine the results with web based SPARQL queries.
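Coming back to how results are mapped onto Ruby values, here is a small standalone sketch of the idea: URIs become objects with a 'uri' method, and literals become plain Strings. It is not ActiveRDF's code. ResourceSketch, to_ruby_value and values_for are made-up names, the sample data is hand-written rather than a real DBpedia response, and it assumes the standard SPARQL JSON results format instead of the :sparql_xml format requested by the app above, simply because JSON parsing ships with Ruby.

```ruby
require 'json'

# Hypothetical stand-in for RDFS::Resource: it just wraps a URI string.
ResourceSketch = Struct.new(:uri)

# Convert one cell of the SPARQL JSON results format into a Ruby value.
def to_ruby_value(cell)
  case cell['type']
  when 'uri' then ResourceSketch.new(cell['value'])
  else cell['value'] # literals (and anything else) become plain Strings
  end
end

# Collect the values bound to one variable across all result rows.
def values_for(json_text, variable)
  rows = JSON.parse(json_text).dig('results', 'bindings') || []
  rows.map { |row| row[variable] }.compact.map { |cell| to_ruby_value(cell) }
end

# Hand-written sample data, not a real DBpedia response.
SAMPLE_RESULTS = <<~DATA
  {
    "head": { "vars": ["reference"] },
    "results": { "bindings": [
      { "reference": { "type": "uri", "value": "http://example.org/a" } },
      { "reference": { "type": "literal", "value": "A printed source" } }
    ] }
  }
DATA

values_for(SAMPLE_RESULTS, 'reference').each do |value|
  # Resources answer to 'uri'; Strings are printed as they are.
  puts value.respond_to?(:uri) ? value.uri : value
end
```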
There you have it - a small, powerful application that combines the best of desktop and web technologies - it's the future, and it's pretty much here already, just not evenly distributed yet.