
Get Semantic with DBPedia and ActiveRDF

Sunday, 12 August 2007  |  richard dale

I'm quite excited by the things that the Semantic Web will make possible, and one very interesting project is DBpedia. It aims to extract structured data from Wikipedia, link it with other datasets, and put everything in an RDF triple store that you can either download or query through a 'SPARQL endpoint' on the web. I've been trying out ActiveRDF for making DBpedia queries, and showing the results in a Korundum KDE4 app.

This week the DBpedia team have improved the dataset with better extraction algorithms and bug fixes, and it is starting to get seriously useful. ActiveRDF is similar to Rails' ActiveRecord, but instead of running queries against a relational database and turning the result rows into instances of Ruby classes, it retrieves RDF data from triple stores.
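To make the idea of querying a triple store concrete without ActiveRDF or a network connection, here is a toy sketch in plain Ruby. It is not how ActiveRDF or any real triple store works: the triples are made-up sample data and the `match` helper is hypothetical. It only mimics the shape of an ActiveRDF `where(subject, predicate, :variable)` clause, in which a Symbol stands for an unknown to be filled in.

```ruby
# Made-up sample triples: [subject, predicate, object]
TRIPLES = [
  ['http://example.org/resource/A', 'http://example.org/property/reference', 'http://example.org/ref1'],
  ['http://example.org/resource/A', 'http://example.org/property/reference', 'http://example.org/ref2'],
  ['http://example.org/resource/B', 'http://example.org/property/reference', 'http://example.org/ref3']
].freeze

# Hypothetical helper: return one Hash of variable bindings per matching triple.
# Symbols in the pattern are variables; anything else must match exactly.
def match(triples, pattern)
  triples.filter_map do |triple|
    bindings = {}
    matched = pattern.zip(triple).all? do |term, value|
      if term.is_a?(Symbol)
        # Bind the variable on first use; a repeated variable must see the same value
        bindings.fetch(term) { bindings[term] = value } == value
      else
        term == value
      end
    end
    bindings if matched
  end
end

refs = match(TRIPLES, ['http://example.org/resource/A',
                       'http://example.org/property/reference',
                       :reference])
puts refs.map { |b| b[:reference] } # prints the ref1 and ref2 URIs
```

A real store indexes its triples rather than scanning them, but the binding logic is the same idea as a SPARQL query with a single triple pattern.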

To use the app, you enter the name of a resource such as 'The Beatles'; the resource is looked up in DBpedia, and an abstract of the Wikipedia article, along with any references, is retrieved. The results are shown in one KDE::HTMLPart with clickable links that let you view the references in a second KDE::HTMLPart widget. And all in less than 100 lines of code:

require 'korundum4'
require 'active_rdf'

about = KDE::AboutData.new( Qt::ByteArray.new("dbpedia"), 
                            Qt::ByteArray.new("DBpedia demo"), 
                            KDE::LocalizedString.new, 
                            Qt::ByteArray.new )
KDE::CmdLineArgs.init(ARGV, about)
KDE::Application.new

kmainwindow = KDE::MainWindow.new(nil)
widget = Qt::Widget.new(kmainwindow)
kmainwindow.centralWidget = widget

title = Qt::Label.new("Search DBpedia for references to a resource") do |t|
  t.alignment = Qt::AlignCenter
end

edit = KDE::LineEdit.new

khtml1 = KDE::HTMLPart.new
khtml2 = KDE::HTMLPart.new
splitter = Qt::Splitter.new do |s|
  s.orientation = Qt::Vertical
  s.addWidget(khtml1.widget)
  s.addWidget(khtml2.widget)
end

widget.layout = Qt::VBoxLayout.new do |l|
  l.addWidget(title)
  Qt::HBoxLayout.new do |h|
    l.insertLayout 1, h
    h.addWidget(Qt::Label.new("Resource name"))
    h.addWidget(edit)
  end
  l.addWidget(splitter)
end

pool = ConnectionPool.add_data_source :type => :sparql,
  :url => "http://dbpedia.org/sparql",
  :results => :sparql_xml

Namespace.register(:dbpedia, 'http://dbpedia.org/')

edit.connect SIGNAL(:returnPressed) do 
  resource_name = edit.text.gsub(/ /, '_')
  references = Query.new.distinct(:reference).
               where(RDFS::Resource.new("http://dbpedia.org/resource/#{resource_name}"), 
                     RDFS::Resource.new('http://dbpedia.org/property/reference'), 
                     :reference).
               execute

  if references.length == 0
    KDE::MessageBox.information(edit, "No resource found for '#{edit.text}'")
  else
    khtml1.begin
    khtml1.write("<h1>#{edit.text}</h1>")

    abstract = Query.new.select(:abstract).
               where(RDFS::Resource.new("http://dbpedia.org/resource/#{resource_name}"),
                     DBPEDIA.abstract,
                     :abstract).
               lang(:abstract, 'en').
               execute

    # execute returns an Array, so check for an empty result rather than nil
    unless abstract.empty?
      khtml1.write("<p>#{abstract[0]}</p><br />")
    end

    references.each_with_index do |ref, index|
      label = Query.new.select(:label).
              where(RDFS::Resource.new(ref.uri), RDFS::label, :label).
              execute
      khtml1.write("<a href='#{ref.uri}'>[#{index + 1}]</a>  #{label[0]}<br />")
    end
    khtml1.end
  end
end

khtml1.browserExtension.connect SIGNAL("openUrlRequest(KUrl)") do |url| 
  khtml2.openUrl(url)
end

kmainwindow.resize(700, 600)
kmainwindow.show
$kapp.exec

You can install ActiveRDF as a gem with 'sudo gem install activerdf'. Here is the code to connect to the DBpedia SPARQL endpoint:

pool = ConnectionPool.add_data_source :type => :sparql,
  :url => "http://dbpedia.org/sparql",
  :results => :sparql_xml

Namespace.register(:dbpedia, 'http://dbpedia.org/')
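Under the hood, a SPARQL endpoint is just an HTTP service. As a rough, stdlib-only sketch (not ActiveRDF's code), the request below puts the query text in the `query` parameter defined by the SPARQL Protocol and asks for the SPARQL XML results format through the `Accept` header; I'm assuming that is the format the `:sparql_xml` option refers to. The helper name `sparql_get_request` is hypothetical, and the request is only built here, not sent.

```ruby
require 'uri'
require 'net/http'

# Hypothetical helper: build a SPARQL Protocol GET request without sending it
def sparql_get_request(endpoint_url, query)
  uri = URI(endpoint_url)
  # The SPARQL Protocol passes the query text in the 'query' parameter
  uri.query = URI.encode_www_form(query: query)
  request = Net::HTTP::Get.new(uri)
  # Ask for the SPARQL Query Results XML Format
  request['Accept'] = 'application/sparql-results+xml'
  request
end

query = 'SELECT ?label WHERE { <http://dbpedia.org/resource/The_Beatles> ' \
        '<http://www.w3.org/2000/01/rdf-schema#label> ?label . }'
request = sparql_get_request('http://dbpedia.org/sparql', query)
puts request.path # the URL-encoded query string

# To actually send it:
# Net::HTTP.start(request.uri.host, request.uri.port) { |http| http.request(request) }
```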

Pretty simple! The 'Namespace.register()' method lets you abbreviate the URIs of RDF properties and classes. Instead of RDFS::Resource.new('http://dbpedia.org/resource'), you can just use DBPEDIA.resource in a query. DBpedia uses a large number of namespaces; here is the complete set for use in ActiveRDF queries:

Namespace.register(:atomrdf, 'http://atomowl.org/ontologies/atomrdf#')
Namespace.register(:common_sense_mapping, 'http://www.loa-cnr.it/ontologies/CommonSenseMapping.owl#')
Namespace.register(:dbpedia, 'http://dbpedia.org/')
Namespace.register(:dc, 'http://purl.org/dc/elements/1.1/')
Namespace.register(:dcterms, 'http://purl.org/dc/terms/')
Namespace.register(:dolce_lite, 'http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#')
Namespace.register(:event, 'http://purl.org/NET/c4dm/event.owl#')
Namespace.register(:extended_dns, 'http://www.loa-cnr.it/ontologies/ExtendedDnS.owl#')
Namespace.register(:foaf, 'http://xmlns.com/foaf/0.1/')
Namespace.register(:gforge_ont, 'http://swc.projects.semwebcentral.org/owl/gforge-ont#')
Namespace.register(:koala, 'http://protege.stanford.edu/plugins/owl/owl-library/koala.owl#')
Namespace.register(:mo, 'http://purl.org/ontology/mo/')
Namespace.register(:northwind, 'http://www.openlinksw.com/schemas/northwind#')
Namespace.register(:ontology, 'http://purl.org/ontology/')
Namespace.register(:periodic_table, 'http://www.daml.org/2003/01/periodictable/PeriodicTable#')
Namespace.register(:pim_contact, 'http://www.w3.org/2000/10/swap/pim/contact#')
Namespace.register(:relationship, 'http://purl.org/vocab/relationship/')
Namespace.register(:rss, 'http://purl.org/rss/1.0/modules/content/')
Namespace.register(:siocex, 'http://activerdf.org/sioc/')
Namespace.register(:sioc, 'http://rdfs.org/sioc/ns#')
Namespace.register(:sioc_types, 'http://rdfs.org/sioc/types#')
Namespace.register(:skos, 'http://www.w3.org/2004/02/skos/core#')
Namespace.register(:time, 'http://www.w3.org/2006/time#')
Namespace.register(:timeline, 'http://purl.org/NET/c4dm/timeline.owl#')
Namespace.register(:vocab_frbr_core, 'http://purl.org/vocab/frbr/core#')
Namespace.register(:vocab, 'http://purl.org/vocab/')
Namespace.register(:wgs84_pos, 'http://www.w3.org/2003/01/geo/wgs84_pos#')
Namespace.register(:wordnet, 'http://xmlns.com/wordnet/1.6/')
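The point of registering prefixes is simple string expansion: a short prefix plus a local name gives the full URI. The plain-Ruby sketch below shows only that idea. `PrefixTable` is a hypothetical class, not how ActiveRDF implements `Namespace` (which gives you objects such as `DBPEDIA` that you call methods on). It can also emit SPARQL `PREFIX` lines, which is handy when you try a query by hand against the endpoint.

```ruby
# Hypothetical prefix table: a stand-in showing what namespace registration buys you
class PrefixTable
  def initialize
    @prefixes = {}
  end

  def register(prefix, base_uri)
    @prefixes[prefix.to_sym] = base_uri
  end

  # Full URI = registered base URI + local name; an unknown prefix raises KeyError
  def expand(prefix, local_name)
    @prefixes.fetch(prefix.to_sym) + local_name.to_s
  end

  # One SPARQL PREFIX declaration per registered namespace
  def sparql_prefixes
    @prefixes.map { |prefix, uri| "PREFIX #{prefix}: <#{uri}>" }.join("\n")
  end
end

table = PrefixTable.new
table.register(:foaf, 'http://xmlns.com/foaf/0.1/')
table.register(:skos, 'http://www.w3.org/2004/02/skos/core#')
puts table.expand(:foaf, :name) # prints http://xmlns.com/foaf/0.1/name
puts table.sparql_prefixes
```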

One problem with DBpedia is that it is a bit slow, and you often need to split complex queries into simple ones. That is why there are three different sorts of queries in the code above, rather than one big complex one. This is the initial query to retrieve the references from the resource (further queries retrieve the text of the article abstract and the labels of the references):

  references = Query.new.distinct(:reference).
               where(RDFS::Resource.new("http://dbpedia.org/resource/#{resource_name}"), 
                     RDFS::Resource.new('http://dbpedia.org/property/reference'), 
                     :reference).
               execute
# It translates to this SPARQL query:
SELECT DISTINCT ?reference
WHERE {
  <http://dbpedia.org/resource/The_Beatles> <http://dbpedia.org/property/reference> ?reference .
}
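Each of the three small queries has the same shape: one triple pattern and one variable. As a hypothetical, stdlib-only sketch (ActiveRDF generates its own SPARQL, and its exact output may differ), the helper below writes that shape as a string, optionally with a language filter of the kind the abstract query's `lang(:abstract, ...)` call asks for. The names `resource_uri` and `single_pattern_query` are made up for illustration.

```ruby
RESOURCE_BASE = 'http://dbpedia.org/resource/'.freeze

# Hypothetical helper: same conversion as the app, spaces become underscores
def resource_uri(name)
  RESOURCE_BASE + name.gsub(' ', '_')
end

# Hypothetical helper: SELECT one variable from a single triple pattern,
# optionally DISTINCT and optionally restricted to one language tag
def single_pattern_query(subject, predicate, var, distinct: false, lang: nil)
  lines = ["SELECT #{'DISTINCT ' if distinct}?#{var}", 'WHERE {']
  lines << "  <#{subject}> <#{predicate}> ?#{var} ."
  lines << "  FILTER (lang(?#{var}) = \"#{lang}\")" if lang
  lines << '}'
  lines.join("\n")
end

puts single_pattern_query(resource_uri('The Beatles'),
                          'http://dbpedia.org/property/reference',
                          :reference, distinct: true)
```

In keeping with the advice above, each call produces one small query rather than a single large join.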

So the ActiveRDF Ruby DSL maps very nicely onto the SPARQL query language, and the results are returned in a Ruby Array. Literals are returned as Strings, and URIs as instances of RDFS::Resource, with the URI string available via the 'uri' method. Recently there has been a lot of activity on the ActiveRDF mailing list, with people making suggestions, measuring performance and sending patches to tweak the query language; it really seems to be a happening project. As the KDE4 Nepomuk semantic desktop uses the Redland RDF library with a Berkeley DB-based triple store, you can also use ActiveRDF to query that, and combine the results with web-based SPARQL queries.
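ActiveRDF does this conversion for you. To show the idea without extra gems, the sketch below converts results in the SPARQL JSON results format; the app itself asks for XML, and JSON is used here only because Ruby's standard library can parse it. `Resource` is a hypothetical stand-in for `RDFS::Resource`, `column` is a made-up helper, and the sample response is invented.

```ruby
require 'json'

# Hypothetical stand-in for RDFS::Resource: just wraps a URI string
Resource = Struct.new(:uri)

# Convert one result term: URIs become Resource objects, everything else a String
def convert_term(term)
  case term['type']
  when 'uri' then Resource.new(term['value'])
  else term['value'] # literals (and, in this simplification, blank nodes)
  end
end

# Hypothetical helper: the converted values of one variable, in result order
def column(json_text, var)
  JSON.parse(json_text).dig('results', 'bindings').map { |row| convert_term(row.fetch(var.to_s)) }
end

# Made-up sample response in the SPARQL JSON results format
SAMPLE = <<~JSON
  {
    "head": { "vars": ["x"] },
    "results": { "bindings": [
      { "x": { "type": "uri", "value": "http://example.org/ref1" } },
      { "x": { "type": "literal", "xml:lang": "en", "value": "An example label" } }
    ] }
  }
JSON

values = column(SAMPLE, :x)
values.each { |v| puts v.is_a?(Resource) ? "resource: #{v.uri}" : "literal: #{v}" }
```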

There you have it - a small but powerful application that combines the best of desktop and web technologies - it's the future, and it's pretty much here already, just not evenly distributed yet.