A Ruby Plasma Data Engine based on DBPedia SPARQL queries
I've been playing with using KIO::get() to make queries on the DBPedia SPARQL endpoint, parse the XML result set and convert it to be used by a Plasma Data Engine. I'll explain how it works as I think it is pretty useful and makes it very easy to link up applets with Semantic Web/Desktop data.
This is the basic SPARQL query. It takes the name of an artist and retrieves details of all the albums they've made: the album name, the URI of the album's DBPedia resource, the release date and the cover art picture:
PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
?album p:artist <http://dbpedia.org/resource/The_Velvet_Underground>.
?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.
OPTIONAL {?album p:cover ?cover}.
OPTIONAL {?album p:name ?name}.
OPTIONAL {?album p:released ?dateofrelease}.
}
I borrowed the example query from this article about making a timeline of albums. You send the query string to the URL of the DBPedia SPARQL endpoint, which is http://dbpedia.org/sparql, and the query results are returned in a simple-to-parse XML format. They look like this:
<?xml version="1.0" ?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
  <head>
    <variable name="album"/>
    <variable name="cover"/>
    <variable name="name"/>
    <variable name="dateofrelease"/>
  </head>
  <results distinct="false" ordered="true">
    <result>
      <binding name="album"><uri>http://dbpedia.org/resource/1969:_The_Velvet_Underground_Live</uri></binding>
      <binding name="cover"><uri>http://upload.wikimedia.org/wikipedia/en/3/3c/1969Live.jpg</uri></binding>
      <binding name="name"><literal xml:lang="en">1969: The Velvet Underground Live</literal></binding>
      <binding name="dateofrelease"><literal datatype="http://www.w3.org/2001/XMLSchema#gYearMonth">1974-09-01 00:00:00.000000</literal></binding>
    </result>
    ...
  </results>
</sparql>
So if you're familiar with SQL queries, a SPARQL select query is very similar. In order to make it work well with the Plasma Data Engine model you need to decide which of the values is the most important, and in this case it's the album name.
The code to issue an HTTP request via KIO::get() is really short and simple. I wrote about using ActiveRDF to query in Get Semantic with DBPedia and ActiveRDF, and it was an interesting idea but didn't work very well. The open-uri get() call that the ActiveRDF SPARQL adapter uses would keep timing out even if you simplified the queries, and it was synchronous, which meant that a GUI app would just freeze while the query was being executed. KIO just chugs away in the background, calling the queryData() slot whenever some data arrives, until it calls the queryCompleted() slot and the data is ready to parse.
require 'cgi'
require 'rexml/document'
# The KDE, Plasma and Soprano Ruby bindings must be loaded as well.

class SparqlDataEngine < Plasma::DataEngine
  slots 'queryData(KIO::Job*, QByteArray)',
        'queryCompleted(KJob*)'

  def initialize(parent, args, endpoint, query, primary_value)
    super(parent)
    setMinimumPollingInterval(120 * 1000)
    @endpoint = endpoint
    @query = query
    @primary_value = primary_value
  end

  # Start an asynchronous query for the given source name
  def sourceRequestEvent(source_name)
    # Only one query can be in progress at a time
    if @job
      return false
    end

    @source_name = source_name
    @sparql_results_xml = ""
    query_url = KDE::Url.new("#{@endpoint}?query=#{CGI.escape(@query % @source_name.gsub(' ', '_'))}")
    @job = KIO::get(query_url, KIO::Reload, KIO::HideProgressInfo)
    @job.addMetaData("accept", "application/sparql-results+xml" )
    connect(@job, SIGNAL('data(KIO::Job*, QByteArray)'), self,
            SLOT('queryData(KIO::Job*, QByteArray)'))
    connect(@job, SIGNAL('result(KJob*)'), self, SLOT('queryCompleted(KJob*)'))
    setData(@source_name, {})
    return true
  end

  # Called each time a chunk of the response arrives
  def queryData(job, data)
    @sparql_results_xml += data.to_s
  end

  # Called once the whole response has been received
  def queryCompleted(job)
    @job.doKill
    @job = nil
    parser = SparqlResultParser.new
    REXML::Document.parse_stream(@sparql_results_xml, parser)
    parser.result.each do |binding|
      # Skip results that have no binding for the primary value
      next unless binding[@primary_value]
      binding.each_pair do |key, value|
        # puts "#{key} --> #{value.inspect}"
        setData(binding[@primary_value].literal.variant.toString, key, Qt::Variant.fromValue(value))
      end
    end
  end

  def updateSourceEvent(source_name)
    sourceRequestEvent(source_name)
    return true
  end
end
I tweaked the XML parsing code in the ActiveRDF adapter to create Nepomuk Soprano nodes and return a Ruby Array of Hashes, each hash having keys for the SPARQL query variables and Soprano::Nodes for the values. The code in the 'queryCompleted()' method above then walks through the results making Plasma setData() calls, which is how an engine submits its data. The first string of the setData() call is the album name, e.g. 'White Light/White Heat' for the Velvets, the second string is the particular attribute, such as date of release, and the third argument is the Soprano::Node with the value wrapped up in a Qt::Variant.
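To see how the Array of Hashes ends up as engine data, here is a rough sketch that runs without Plasma. FakeEngine and its set_data() method are hypothetical stand-ins for the real engine and setData(), the values are plain strings rather than Soprano::Nodes, and the second row is made up to show what happens when the primary value is missing:

```ruby
# Hypothetical stand-in for a Plasma engine: set_data records each call in
# a nested Hash (source name => { key => value }) so it can be inspected.
class FakeEngine
  attr_reader :sources

  def initialize(primary_value)
    @primary_value = primary_value
    @sources = Hash.new { |hash, source| hash[source] = {} }
  end

  def set_data(source, key, value)
    @sources[source][key] = value
  end

  # Same walk as queryCompleted(): one source per result row, named after
  # the primary value, with one key per query variable.
  def publish(results)
    results.each do |binding|
      source = binding[@primary_value]
      next if source.nil? # no primary value, so there is nothing to name the source by
      binding.each_pair { |key, value| set_data(source, key, value) }
    end
  end
end

rows = [
  { 'album' => 'http://dbpedia.org/resource/1969:_The_Velvet_Underground_Live',
    'cover' => 'http://upload.wikimedia.org/wikipedia/en/3/3c/1969Live.jpg',
    'name' => '1969: The Velvet Underground Live',
    'dateofrelease' => '1974-09-01 00:00:00.000000' },
  # Hypothetical row without a 'name' binding
  { 'album' => 'http://dbpedia.org/resource/Untitled' }
]

engine = FakeEngine.new('name')
engine.publish(rows)

engine.sources.each do |source, data|
  puts source
  data.each { |key, value| puts "  #{key}: #{value}" }
end
```

Each album becomes its own source, which is roughly the shape an applet would see when it connects to the engine.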
This is the code that parses the XML using the Ruby REXML library:
# Parser for SPARQL XML result set. Derived from the parser in the
# ActiveRDF SPARQL adapter code. Produces an Array of Hashes, each
# hash contains keys for each of the variables in the query, and
# values which are Soprano nodes.
#
class SparqlResultParser
  attr_reader :result

  def initialize
    @result = []
    @vars = []
    @current_type = nil
  end

  def tag_start(name, attrs)
    case name
    when 'variable'
      @vars << attrs['name']
    when 'result'
      @current_result = {}
    when 'binding'
      @current_binding = attrs['name']
    when 'bnode', 'uri'
      @current_type = name
    when 'literal', 'typed-literal'
      @current_type = name
      @datatype = attrs['datatype']
      @xmllang = attrs['xml:lang']
    end
  end

  def tag_end(name)
    if name == "result"
      @result << @current_result
    elsif name == 'bnode' || name == 'literal' || name == 'typed-literal' || name == 'uri'
      @current_type = nil
    elsif name == "sparql"
    end
  end

  def text(text)
    if !@current_type.nil?
      @current_result[@current_binding] = create_node(@current_type, @datatype, @xmllang, text)
    end
  end

  # create ruby objects for each RDF node
  def create_node(type, datatype, xmllang, value)
    case type
    when 'uri'
      Soprano::Node.new(Qt::Url.new(value))
    when 'bnode'
      Soprano::Node.new(value)
    when 'literal', 'typed-literal'
      if xmllang
        Soprano::Node.new(Soprano::LiteralValue.new(value), xmllang)
      elsif datatype
        Soprano::Node.new(Soprano::LiteralValue.fromString(value, Qt::Url.new(datatype)))
      else
        Soprano::Node.new(Soprano::LiteralValue.new(value))
      end
    end
  end

  def method_missing (*args)
  end
end
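If you want to try the parsing step without the KDE bindings installed, a cut-down listener along these lines should do. It is only a sketch: PlainSparqlListener is a hypothetical name, the values are plain Ruby Hashes instead of Soprano::Nodes, and the sample XML is a shortened copy of the result shown earlier:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Same tag handling as SparqlResultParser, but each value is a plain Hash
# rather than a Soprano::Node (an assumption made so it runs standalone).
class PlainSparqlListener
  include REXML::StreamListener # supplies empty defaults for unused callbacks

  attr_reader :result, :vars

  VALUE_TAGS = %w[uri bnode literal typed-literal].freeze

  def initialize
    @result = []
    @vars = []
    @current_type = nil
  end

  def tag_start(name, attrs)
    case name
    when 'variable' then @vars << attrs['name']
    when 'result'   then @current_result = {}
    when 'binding'  then @current_binding = attrs['name']
    when *VALUE_TAGS
      @current_type = name
      @datatype = attrs['datatype']
      @xmllang = attrs['xml:lang']
    end
  end

  def tag_end(name)
    if name == 'result'
      @result << @current_result
    elsif VALUE_TAGS.include?(name)
      @current_type = nil
    end
  end

  # Only text inside a value tag is kept; whitespace between tags is ignored
  def text(text)
    return if @current_type.nil?
    @current_result[@current_binding] =
      { type: @current_type, value: text, lang: @xmllang, datatype: @datatype }
  end
end

SAMPLE_XML = <<~XML
  <?xml version="1.0" ?>
  <sparql xmlns="http://www.w3.org/2005/sparql-results#">
    <head>
      <variable name="album"/>
      <variable name="name"/>
    </head>
    <results>
      <result>
        <binding name="album"><uri>http://dbpedia.org/resource/1969:_The_Velvet_Underground_Live</uri></binding>
        <binding name="name"><literal xml:lang="en">1969: The Velvet Underground Live</literal></binding>
      </result>
    </results>
  </sparql>
XML

listener = PlainSparqlListener.new
REXML::Document.parse_stream(SAMPLE_XML, listener)

listener.result.each do |row|
  row.each { |var, node| puts "#{var}: #{node[:value]} (#{node[:type]})" }
end
```

Including REXML::StreamListener does the same job as the empty method_missing in the class above: callbacks the listener doesn't care about are quietly ignored.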
Those two classes are pretty generic and could be used for any similar SPARQL query. You just need to subclass SparqlDataEngine to give it a specific query string and endpoint, like this:
SPARQL_QUERY = <<-EOS
PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
?album p:artist <http://dbpedia.org/resource/%s>.
?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.
OPTIONAL {?album p:cover ?cover}.
OPTIONAL {?album p:name ?name}.
OPTIONAL {?album p:released ?dateofrelease}.
}
EOS
#
# Customize the use of the SparqlDataEngine by giving it the url of an endpoint,
# a query to execute, and the name of the most important (or primary) value.
# The '%s' in the query text above is replaced with the source name, with any
# spaces replaced by underscores.
#
class DbpediaAlbumsEngine < SparqlDataEngine
  def initialize(parent, args)
    super(parent, args, 'http://dbpedia.org/sparql', SPARQL_QUERY, 'name')
  end
end
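The source name substitution can be checked on its own with nothing but the Ruby standard library. In this sketch sparql_query_url is a hypothetical helper that builds the same string sourceRequestEvent() hands to KDE::Url.new, and the query template is a shortened version of the one above:

```ruby
require 'cgi'

ENDPOINT = 'http://dbpedia.org/sparql'

# Shortened query template; '%s' is where the source name goes
QUERY_TEMPLATE = <<~EOS
  PREFIX p: <http://dbpedia.org/property/>
  SELECT * WHERE {
    ?album p:artist <http://dbpedia.org/resource/%s>.
    OPTIONAL {?album p:name ?name}.
  }
EOS

# Hypothetical helper: replaces spaces in the source name with underscores,
# substitutes it into the template and URL-escapes the whole query.
def sparql_query_url(endpoint, template, source_name)
  resource = source_name.gsub(' ', '_')
  query = template % resource
  "#{endpoint}?query=#{CGI.escape(query)}"
end

url = sparql_query_url(ENDPOINT, QUERY_TEMPLATE, 'The Velvet Underground')
puts url
```

One thing to watch: because the template goes through Ruby's % formatting, any other literal '%' in a query would need to be written as '%%'.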
It's very little work indeed compared with the way you normally have to issue standard HTTP requests and then parse the totally non-standard results. I had a look at some of the Weather applet's Ion code to get BBC forecasts and it was really very complicated; it would be vastly simpler if you could get weather data via SPARQL instead. The last step is to make a .desktop file for your new engine:
[Desktop Entry]
Name=DBPedia Albums Data Engine
Comment=DBPedia album data for Plasmoids
X-KDE-ServiceTypes=Plasma/DataEngine
Type=Service
Icon=
X-KDE-Library=krubypluginfactory
X-KDE-PluginKeyword=plasma-engine-dbpedia-albums/dbpedia_albums_engine.rb
X-Plasma-EngineName=dbpedia-albums
And a simple CMakeLists.txt file to install it:
install(FILES plasma-dataengine-dbpedia-albums.desktop DESTINATION ${SERVICES_INSTALL_DIR} )
install(FILES dbpedia_albums_engine.rb DESTINATION ${DATA_INSTALL_DIR}/plasma-engine-dbpedia-albums)
You can use the Plasma engine explorer to test engines, and I enhanced the Ruby version slightly so it can show the contents of Soprano::Nodes within Qt::Variants. Here is what the browser looks like testing a new engine:
[image:3401 size=preview]
I'll try and add some stuff to the TechBase wiki about writing Ruby Plasma data engines and applets once the API has settled down a bit again, but I hope I've explained enough to get people playing with SPARQL queries, as I think there could be a lot of applications for the idea.