A Ruby Plasma Data Engine based on DBPedia SPARQL queries
I've been playing with using KIO::get() to make queries on the DBPedia SPARQL endpoint, parse the XML result set and convert it for use by a Plasma Data Engine. I'll explain how it works, as I think it is pretty useful and makes it very easy to link up applets with Semantic Web/Desktop data.
This is the basic SPARQL query. It takes the name of an artist and retrieves details of all the albums they've made: the album name, the URI of the album's DBPedia resource, the release date and the cover art picture:
PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  ?album p:artist <http://dbpedia.org/resource/The_Velvet_Underground>.
  ?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.
  OPTIONAL {?album p:cover ?cover}.
  OPTIONAL {?album p:name ?name}.
  OPTIONAL {?album p:released ?dateofrelease}.
}
I borrowed the example query from this article about making a timeline of albums. You send the query string to the DBPedia SPARQL endpoint at http://dbpedia.org/sparql, URL-encoded as the value of the query parameter, and the results are returned in a simple-to-parse XML format. They look like this:
<?xml version="1.0" ?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
  <head>
    <variable name="album"/>
    <variable name="cover"/>
    <variable name="name"/>
    <variable name="dateofrelease"/>
  </head>
  <results distinct="false" ordered="true">
    <result>
      <binding name="album"><uri>http://dbpedia.org/resource/1969:_The_Velvet_Underground_Live</uri></binding>
      <binding name="cover"><uri>http://upload.wikimedia.org/wikipedia/en/3/3c/1969Live.jpg</uri></binding>
      <binding name="name"><literal xml:lang="en">1969: The Velvet Underground Live</literal></binding>
      <binding name="dateofrelease"><literal datatype="http://www.w3.org/2001/XMLSchema#gYearMonth">1974-09-01 00:00:00.000000</literal></binding>
    </result>
    ...
  </results>
</sparql>
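To see what the request itself looks like, here is a minimal plain-Ruby sketch of building the GET URL: the query text is URL-encoded and passed as the query parameter of the endpoint URL. The engine further down does this with CGI.escape and KDE::Url, and asks for the XML result format by adding "application/sparql-results+xml" as accept metadata on the job. This sketch swaps in the standard library's URI.encode_www_form instead, and the helper name sparql_query_url and the toy query are hypothetical.

```ruby
require 'uri'

# Hypothetical helper: build a GET URL for a SPARQL endpoint, sending the
# query text as the URL-encoded value of the 'query' parameter.
def sparql_query_url(endpoint, query)
  "#{endpoint}?#{URI.encode_www_form({ 'query' => query })}"
end

# A toy query, just to show the encoding of spaces, braces and '?'.
query = 'SELECT * WHERE { ?album ?p ?o } LIMIT 5'
url = sparql_query_url('http://dbpedia.org/sparql', query)
puts url
```

Decoding the query part of the URL with URI.decode_www_form gives back the original query text, which is a quick way to check the escaping.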
If you're familiar with SQL queries, a SPARQL SELECT query is very similar. To make it work well with the Plasma Data Engine model you need to decide which of the values is the most important, because that value becomes the name under which each result's data is stored. In this case it's the album name.
The code to issue an HTTP request via KIO::get() is really short and simple. I wrote about using ActiveRDF to query DBPedia in Get Semantic with DBPedia and ActiveRDF, and it was an interesting idea but didn't work very well. The open-uri get() call that the ActiveRDF SPARQL adapter uses would keep timing out even if you simplified the queries, and it was synchronous, which meant that a GUI app would just freeze while the query was being executed. KIO just chugs away in the background, calling the queryData() slot whenever some data arrives, until it calls the queryCompleted() slot and the data is ready to parse.
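The queryData()/queryCompleted() pattern described above boils down to accumulating chunks as they arrive and only parsing once a completion callback fires. Here is a minimal plain-Ruby sketch of that shape, with no KIO involved; the class ChunkedResponse and its data/finished methods are hypothetical stand-ins for the job's signals and the engine's slots.

```ruby
# Hypothetical stand-in for the KIO job pattern: the body arrives in chunks
# via a data callback, and a completion callback fires once with the whole body.
class ChunkedResponse
  def initialize(&on_complete)
    @buffer = +""            # unfrozen String to append into
    @on_complete = on_complete
  end

  # Called for each chunk, like the queryData() slot.
  def data(chunk)
    @buffer << chunk
  end

  # Called once at the end, like the queryCompleted() slot.
  def finished
    @on_complete.call(@buffer)
  end
end

received = nil
response = ChunkedResponse.new { |body| received = body }
['<sparql>', '<results/>', '</sparql>'].each { |chunk| response.data(chunk) }
response.finished
puts received
```

Appending with << modifies one buffer in place, whereas += allocates a new String for every chunk; for small result sets either works.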
require 'cgi'
require 'rexml/document'

class SparqlDataEngine < Plasma::DataEngine
  slots 'queryData(KIO::Job*, QByteArray)', 'queryCompleted(KJob*)'

  def initialize(parent, args, endpoint, query, primary_value)
    super(parent)
    # Don't let applets poll the endpoint more often than every two minutes
    setMinimumPollingInterval(120 * 1000)
    @endpoint = endpoint
    @query = query
    @primary_value = primary_value
  end

  def sourceRequestEvent(source_name)
    # Only one query can be in flight at a time
    if @job
      return false
    end

    @source_name = source_name
    @sparql_results_xml = ""
    # Substitute the source name into the query and pass it as the 'query' parameter
    query_url = KDE::Url.new("#{@endpoint}?query=#{CGI.escape(@query % @source_name.gsub(' ', '_'))}")
    @job = KIO::get(query_url, KIO::Reload, KIO::HideProgressInfo)
    # Ask for the SPARQL XML result format
    @job.addMetaData("accept", "application/sparql-results+xml")
    connect(@job, SIGNAL('data(KIO::Job*, QByteArray)'),
            self, SLOT('queryData(KIO::Job*, QByteArray)'))
    connect(@job, SIGNAL('result(KJob*)'),
            self, SLOT('queryCompleted(KJob*)'))
    setData(@source_name, {})
    return true
  end

  # Called each time a chunk of the response arrives
  def queryData(job, data)
    @sparql_results_xml += data.to_s
  end

  # Called once the whole response has been received
  def queryCompleted(job)
    @job.doKill
    @job = nil
    parser = SparqlResultParser.new
    REXML::Document.parse_stream(@sparql_results_xml, parser)
    parser.result.each do |binding|
      # The primary value is OPTIONAL in the query, so skip rows without it
      next if binding[@primary_value].nil?
      binding.each_pair do |key, value|
        # puts "#{key} --> #{value.inspect}"
        setData(binding[@primary_value].literal.variant.toString, key, Qt::Variant.fromValue(value))
      end
    end
  end

  def updateSourceEvent(source_name)
    sourceRequestEvent(source_name)
    return true
  end
end
I tweaked the XML parsing code in the ActiveRDF adapter to create Nepomuk Soprano nodes and return a Ruby Array of Hashes, each hash having a key for each SPARQL query variable and a Soprano::Node as its value. The code in the queryCompleted() method above then walks through the results making Plasma setData() calls, which is how an engine submits its data. The first argument of the setData() call is the album name, e.g. 'White Light/White Heat' for the Velvets, the second is the particular attribute, such as the date of release, and the third is the Soprano::Node holding the value, wrapped up in a Qt::Variant.
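The walk described above can be tried without Plasma by recording the setData calls in a nested Hash: source name (the primary value) to attribute key to value. This is a minimal sketch under that assumption; FakeEngine and set_data are hypothetical stand-ins for Plasma::DataEngine#setData, the values are plain Strings rather than Soprano::Nodes, and the rows are placeholders, not real DBPedia data.

```ruby
# Hypothetical stub standing in for Plasma::DataEngine#setData: it records
# each set_data(source, key, value) call in a nested Hash.
class FakeEngine
  attr_reader :sources

  def initialize
    @sources = Hash.new { |hash, source| hash[source] = {} }
  end

  def set_data(source, key, value)
    @sources[source][key] = value
  end
end

# Rows shaped like the parser's output, with plain Strings instead of
# Soprano::Nodes. The values are placeholders, not real DBPedia data.
results = [
  { 'album' => 'http://example.org/album_a', 'name' => 'Album A',
    'cover' => 'http://example.org/album_a.jpg' },
  { 'album' => 'http://example.org/album_b' } # 'name' left unbound
]

primary_value = 'name'
engine = FakeEngine.new
results.each do |row|
  source = row[primary_value]
  next if source.nil? # OPTIONAL {?album p:name ?name} may leave it unbound
  row.each_pair { |key, value| engine.set_data(source, key, value) }
end

p engine.sources
```

Rows without a binding for the primary value have no name to be stored under, so the sketch skips them rather than failing.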
This is the code that parses the XML using the Ruby REXML library:
require 'rexml/document'
require 'rexml/streamlistener'

# Parser for SPARQL XML result set. Derived from the parser in the
# ActiveRDF SPARQL adapter code. Produces an Array of Hashes, each
# hash contains keys for each of the variables in the query, and
# values which are Soprano nodes.
#
class SparqlResultParser
  # Supplies empty default implementations of the callbacks we don't handle
  include REXML::StreamListener

  attr_reader :result

  def initialize
    @result = []
    @vars = []
    @current_type = nil
  end

  def tag_start(name, attrs)
    case name
    when 'variable'
      @vars << attrs['name']
    when 'result'
      @current_result = {}
    when 'binding'
      @current_binding = attrs['name']
    when 'bnode', 'uri'
      @current_type = name
    when 'literal', 'typed-literal'
      @current_type = name
      @datatype = attrs['datatype']
      @xmllang = attrs['xml:lang']
    end
  end

  def tag_end(name)
    case name
    when 'result'
      @result << @current_result
    when 'bnode', 'literal', 'typed-literal', 'uri'
      @current_type = nil
    end
  end

  def text(text)
    # Only text inside a uri, bnode or literal element is a value
    if !@current_type.nil?
      @current_result[@current_binding] = create_node(@current_type, @datatype, @xmllang, text)
    end
  end

  # create ruby objects for each RDF node
  def create_node(type, datatype, xmllang, value)
    case type
    when 'uri'
      Soprano::Node.new(Qt::Url.new(value))
    when 'bnode'
      Soprano::Node.new(value)
    when 'literal', 'typed-literal'
      if xmllang
        Soprano::Node.new(Soprano::LiteralValue.new(value), xmllang)
      elsif datatype
        Soprano::Node.new(Soprano::LiteralValue.fromString(value, Qt::Url.new(datatype)))
      else
        Soprano::Node.new(Soprano::LiteralValue.new(value))
      end
    end
  end
end
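The Soprano-based parser needs the KDE bindings to run. To experiment with the same streaming approach in plain Ruby, here is a minimal sketch that records each value as a hypothetical Term struct (kind, text, datatype, language) instead of a Soprano::Node. It includes REXML::StreamListener for no-op defaults of the callbacks it ignores, and it stores a term when its closing tag is seen, so an empty literal (for which text() is never called) still produces a binding. The class name PlainSparqlResultParser is hypothetical.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical value holder: the element kind ('uri', 'bnode' or 'literal'),
# its text, and the optional datatype and language attributes.
Term = Struct.new(:kind, :value, :datatype, :lang)

class PlainSparqlResultParser
  include REXML::StreamListener # no-op defaults for callbacks we ignore

  attr_reader :result, :vars

  def initialize
    @result = []
    @vars = []
    @current = nil
  end

  def tag_start(name, attrs)
    case name
    when 'variable' then @vars << attrs['name']
    when 'result'   then @row = {}
    when 'binding'  then @binding = attrs['name']
    when 'uri', 'bnode', 'literal'
      @current = Term.new(name, +'', attrs['datatype'], attrs['xml:lang'])
    end
  end

  # Append rather than assign, so all text inside the element is kept.
  def text(text)
    @current.value << text if @current
  end

  def tag_end(name)
    case name
    when 'uri', 'bnode', 'literal'
      @row[@binding] = @current
      @current = nil
    when 'result'
      @result << @row
    end
  end
end

xml = <<~XML
  <?xml version="1.0"?>
  <sparql xmlns="http://www.w3.org/2005/sparql-results#">
    <head><variable name="album"/><variable name="name"/></head>
    <results>
      <result>
        <binding name="album"><uri>http://example.org/album</uri></binding>
        <binding name="name"><literal xml:lang="en">Example Album</literal></binding>
      </result>
    </results>
  </sparql>
XML

parser = PlainSparqlResultParser.new
REXML::Document.parse_stream(xml, parser)
p parser.vars
p parser.result
```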
These two classes are pretty generic and could be used for any similar SPARQL query; you just need to subclass SparqlDataEngine to give it a specific query string and endpoint, like this:
SPARQL_QUERY = <<-EOS
PREFIX p: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE {
  ?album p:artist <http://dbpedia.org/resource/%s>.
  ?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.
  OPTIONAL {?album p:cover ?cover}.
  OPTIONAL {?album p:name ?name}.
  OPTIONAL {?album p:released ?dateofrelease}.
}
EOS

#
# Customize the use of the SparqlDataEngine by giving it the url of an endpoint,
# a query to execute, and the name of the most important (or primary) value.
# The '%s' in the query text above is replaced with the source name, with any
# spaces replaced by underscores.
#
class DbpediaAlbumsEngine < SparqlDataEngine
  def initialize(parent, args)
    super(parent, args, 'http://dbpedia.org/sparql', SPARQL_QUERY, 'name')
  end
end
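The substitution described in that comment is just Ruby's String#% applied to the template, after replacing spaces in the source name with underscores. Here is a minimal sketch of it on a shortened template; TEMPLATE and the helper query_for are hypothetical names used only for illustration.

```ruby
# Query template with a '%s' placeholder for the artist's DBPedia resource name
# (shortened from the engine's SPARQL_QUERY for illustration).
TEMPLATE = <<-EOS
SELECT * WHERE {
  ?album <http://dbpedia.org/property/artist> <http://dbpedia.org/resource/%s>.
}
EOS

# Hypothetical helper doing the engine's substitution: spaces in the source
# name become underscores, and the result fills the '%s' via String#%.
def query_for(template, source_name)
  template % source_name.gsub(' ', '_')
end

puts query_for(TEMPLATE, 'The Velvet Underground')

# A literal percent sign in a template must be doubled, because String#%
# treats a single '%' as the start of a format directive.
puts format('100%% of %s', 'albums')
```

Because the whole template goes through String#%, any query containing a literal '%' (for example a percent-encoded character in a URI) needs that '%' written as '%%'.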
It's very little work indeed compared with the usual approach of issuing plain HTTP requests and then parsing the totally non-standard results. I had a look at some of the Weather applet's Ion code to get BBC forecasts and it was really very complicated; it would be vastly simpler if you could get weather data via SPARQL instead. The last step is to make a .desktop file for your new engine:
[Desktop Entry]
Name=DBPedia Albums Data Engine
Comment=DBPedia album data for Plasmoids
X-KDE-ServiceTypes=Plasma/DataEngine
Type=Service
Icon=
X-KDE-Library=krubypluginfactory
X-KDE-PluginKeyword=plasma-engine-dbpedia-albums/dbpedia_albums_engine.rb
X-Plasma-EngineName=dbpedia-albums
And a simple CMakeLists.txt file to install it:
install(FILES plasma-dataengine-dbpedia-albums.desktop DESTINATION ${SERVICES_INSTALL_DIR})
install(FILES dbpedia_albums_engine.rb DESTINATION ${DATA_INSTALL_DIR}/plasma-engine-dbpedia-albums)
You can use the Plasma engine explorer to test engines, and I enhanced the Ruby version slightly so it can show the contents of Soprano::Nodes within Qt::Variants. Here is what the explorer looks like testing the new engine:
[Screenshot: the Plasma engine explorer testing the DBPedia albums data engine]
I'll try to add some material to the TechBase wiki about writing Ruby Plasma data engines and applets once the API has settled down a bit again, but I hope I've explained enough to get people playing with SPARQL queries, as I think the idea could have a lot of applications.