A Ruby Plasma Data Engine based on DBPedia SPARQL queries

    Richard Dale, 17 Apr 2008

    I've been playing with using KIO::get() to make queries on the DBPedia SPARQL endpoint, parse the XML result set and convert it to be used by a Plasma Data Engine. I'll explain how it works as I think it is pretty useful and makes it very easy to link up applets with Semantic Web/Desktop data.

    This is the basic SPARQL query. It takes the name of an artist and retrieves details of all the albums they've made: the album name, the URI of the album's DBPedia resource, the release date and the cover art picture:

    PREFIX p: <http://dbpedia.org/property/>  
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT * WHERE { 
         ?album p:artist  <http://dbpedia.org/resource/The_Velvet_Underground>.       
         ?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.
         OPTIONAL {?album p:cover ?cover}.
         OPTIONAL {?album p:name ?name}.
         OPTIONAL {?album p:released ?dateofrelease}.
       }
    

    I borrowed the example query from this article about making a timeline of albums. You send the query string to the URL of the DBPedia SPARQL endpoint, which is http://dbpedia.org/sparql, and the query results are returned in a simple-to-parse XML format. They look like this:

    <?xml version="1.0" ?>
    <sparql xmlns="http://www.w3.org/2005/sparql-results#"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
     <head>
      <variable name="album"/>
      <variable name="cover"/>
      <variable name="name"/>
      <variable name="dateofrelease"/>
     </head>
     <results distinct="false" ordered="true">
      <result>
       <binding name="album">
    <uri>http://dbpedia.org/resource/1969:_The_Velvet_Underground_Live</uri></binding>
       <binding name="cover">
    <uri>http://upload.wikimedia.org/wikipedia/en/3/3c/1969Live.jpg</uri></binding>
       <binding name="name"><literal xml:lang="en">1969: The Velvet Underground Live</literal></binding>
       <binding name="dateofrelease"><literal datatype=
       "http://www.w3.org/2001/XMLSchema#gYearMonth">1974-09-01 00:00:00.000000</literal></binding>
      </result>
      ...
     </results>
    </sparql>
    

    So if you're familiar with SQL queries, a SPARQL SELECT query is very similar. To make it work well with the Plasma Data Engine model you need to decide which of the values is the most important, and in this case it's the album name.
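
    As a rough illustration of that choice (plain Ruby, no Plasma involved), the rows of a result set can be regrouped so that the primary value becomes the top-level key. The group_by_primary helper and the second sample row are made up for this sketch, and skipping rows that lack the primary value is an assumption rather than what the engine code does.

```ruby
# Hypothetical helper: regroup SPARQL result rows (an Array of Hashes)
# under the value of one "primary" variable, mirroring the
# source -> key -> value shape that a data engine exposes.
def group_by_primary(rows, primary)
  rows.each_with_object({}) do |row, grouped|
    name = row[primary]
    next if name.nil?          # OPTIONAL variables may be unbound
    grouped[name] = row
  end
end

rows = [
  { 'album' => 'http://dbpedia.org/resource/1969:_The_Velvet_Underground_Live',
    'name' => '1969: The Velvet Underground Live',
    'dateofrelease' => '1974-09-01 00:00:00.000000' },
  # Made-up row with no 'name' binding, to show the skip.
  { 'album' => 'http://dbpedia.org/resource/Some_Unnamed_Album' }
]

grouped = group_by_primary(rows, 'name')
grouped.each do |name, attributes|
  attributes.each { |key, value| puts "#{name} / #{key} = #{value}" }
end
```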

    The code to issue an HTTP request via KIO::get() is really short and simple. I wrote about using ActiveRDF to query DBPedia in Get Semantic with DBPedia and ActiveRDF; it was an interesting idea but didn't work very well. The open-uri call that the ActiveRDF SPARQL adapter uses would keep timing out even if you simplified the queries, and it was synchronous, which meant that a GUI app would just freeze while the query was being executed. KIO just chugs away in the background, calling the queryData() slot whenever some data arrives, until it calls the queryCompleted() slot and the data is ready to parse.

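    # Standard library pieces used below (the KDE binding requires are not shown)
    require 'cgi'
    require 'rexml/document'

    class SparqlDataEngine < Plasma::DataEngine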
      slots 'queryData(KIO::Job*, QByteArray)',
            'queryCompleted(KJob*)'
    
      def initialize(parent, args, endpoint, query, primary_value)
        super(parent)
        setMinimumPollingInterval(120 * 1000)
        @endpoint = endpoint
        @query = query
        @primary_value = primary_value
      end
    
      def sourceRequestEvent(source_name)
        if @job
          return false
        end
    
        @source_name = source_name
        @sparql_results_xml = ""
        query_url = KDE::Url.new("#{@endpoint}?query=#{CGI.escape(@query % @source_name.gsub(' ', '_'))}")
        @job = KIO::get(query_url, KIO::Reload, KIO::HideProgressInfo)
        @job.addMetaData("accept", "application/sparql-results+xml" )
        connect(@job, SIGNAL('data(KIO::Job*, QByteArray)'), self,
                SLOT('queryData(KIO::Job*, QByteArray)'))
        connect(@job, SIGNAL('result(KJob*)'), self, SLOT('queryCompleted(KJob*)'))
        setData(@source_name, {})
        return true
      end
    
      def queryData(job, data)
        @sparql_results_xml += data.to_s
      end
    
      def queryCompleted(job)
        @job.doKill
        @job = nil
        parser = SparqlResultParser.new
        REXML::Document.parse_stream(@sparql_results_xml, parser)
        parser.result.each do |binding|
          binding.each_pair do |key, value|
            # puts "#{key} --> #{value.inspect}"
            setData(binding[@primary_value].literal.variant.toString, key, Qt::Variant.fromValue(value))
          end
        end
      end
    
      def updateSourceEvent(source_name)
        sourceRequestEvent(source_name)
        return true
      end
    end
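
    The buffering pattern in queryData() and queryCompleted() can be tried without KIO. This sketch feeds a result document to a hypothetical ChunkBuffer class in arbitrary pieces and only parses once everything has arrived. The class, its method names and the sample XML are all invented for illustration.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the engine's buffering: collect raw chunks as
# they arrive, and parse only when the transfer is reported complete.
class ChunkBuffer
  def initialize
    @xml = +''
  end

  # Plays the role of the queryData() slot.
  def data(chunk)
    @xml << chunk
  end

  # Plays the role of the queryCompleted() slot: returns the variable
  # names declared in the <head> of the result set.
  def completed
    doc = REXML::Document.new(@xml)
    names = []
    doc.root.each_element do |child|
      next unless child.name == 'head'
      child.each_element { |variable| names << variable.attributes['name'] }
    end
    names
  end
end

SAMPLE = '<sparql xmlns="http://www.w3.org/2005/sparql-results#">' \
         '<head><variable name="album"/><variable name="name"/></head>' \
         '<results/></sparql>'

buffer = ChunkBuffer.new
# Split the document into arbitrary 16-character pieces, as a transfer might.
SAMPLE.scan(/.{1,16}/m).each { |chunk| buffer.data(chunk) }
variables = buffer.completed
puts variables.inspect
```

    Parsing is deferred to the end because a chunk boundary can fall anywhere, including in the middle of a tag.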
    

    I tweaked the XML parsing code in the ActiveRDF adapter to create Nepomuk Soprano nodes, and return a Ruby Array of Hashes, each hash having keys for the SPARQL query variables and Soprano::Nodes for the values. The code in the queryCompleted() method above then walks through the results making Plasma setData() calls, which is how an engine publishes its data. The first argument of the setData() call is the album name, e.g. 'White Light/White Heat' for the Velvets, the second is the particular attribute, such as date of release, and the third is the Soprano::Node with the value wrapped up in a Qt::Variant.

    This is the code that parses the XML using the Ruby REXML library:

    # Parser for SPARQL XML result set. Derived from the parser in the
    # ActiveRDF SPARQL adapter code. Produces an Array of Hashes, each
    # hash contains keys for each of the variables in the query, and
    # values which are Soprano nodes.
    #
    class SparqlResultParser
      attr_reader :result
    
      def initialize
        @result = []
        @vars = []
        @current_type = nil
      end
      
      def tag_start(name, attrs)
        case name
        when 'variable'
          @vars << attrs['name']
        when 'result'
          @current_result = {}
        when 'binding'
          @current_binding = attrs['name']
        when 'bnode', 'uri'
          @current_type = name
        when 'literal', 'typed-literal'
          @current_type = name
          @datatype = attrs['datatype']
          @xmllang = attrs['xml:lang']
        end
      end
      
      def tag_end(name)
        if name == "result"
          @result << @current_result
        elsif name == 'bnode' || name == 'literal' || name == 'typed-literal' || name == 'uri'
          @current_type = nil
        elsif name == "sparql"
        end
      end
      
      def text(text)
        if !@current_type.nil?
          @current_result[@current_binding] = create_node(@current_type, @datatype, @xmllang, text)
        end
      end
    
      # create ruby objects for each RDF node
      def create_node(type, datatype, xmllang, value)
        case type
        when 'uri'
          Soprano::Node.new(Qt::Url.new(value))
        when 'bnode'
          Soprano::Node.new(value)
        when 'literal', 'typed-literal'
          if xmllang
            Soprano::Node.new(Soprano::LiteralValue.new(value), xmllang)
          elsif datatype
            Soprano::Node.new(Soprano::LiteralValue.fromString(value, Qt::Url.new(datatype)))
          else
            Soprano::Node.new(Soprano::LiteralValue.new(value))
          end
        end
      end
      
      def method_missing (*args)
      end
    end
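
    To experiment with the parsing logic without the Soprano bindings installed, here is a cut-down variant that stores plain Ruby strings instead of Soprano::Node objects. The class name PlainSparqlResultParser is hypothetical, and including REXML::StreamListener in place of the empty method_missing is a substitution made for this sketch, not what the code above does.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical cut-down parser: same event handling as SparqlResultParser,
# but values are kept as plain strings so it runs without Soprano.
class PlainSparqlResultParser
  include REXML::StreamListener   # supplies no-op handlers for unused events
  attr_reader :result, :vars

  VALUE_TAGS = %w[uri bnode literal typed-literal].freeze

  def initialize
    @result = []
    @vars = []
    @in_value = false
  end

  def tag_start(name, attrs)
    case name
    when 'variable' then @vars << attrs['name']
    when 'result'   then @current_result = {}
    when 'binding'  then @current_binding = attrs['name']
    when *VALUE_TAGS then @in_value = true
    end
  end

  def tag_end(name)
    if name == 'result'
      @result << @current_result
    elsif VALUE_TAGS.include?(name)
      @in_value = false
    end
  end

  # Only text inside a value element is kept; whitespace between tags is ignored.
  def text(text)
    @current_result[@current_binding] = text if @in_value
  end
end

xml = <<~XML
  <?xml version="1.0" ?>
  <sparql xmlns="http://www.w3.org/2005/sparql-results#">
   <head><variable name="album"/><variable name="name"/></head>
   <results>
    <result>
     <binding name="album"><uri>http://dbpedia.org/resource/1969:_The_Velvet_Underground_Live</uri></binding>
     <binding name="name"><literal xml:lang="en">1969: The Velvet Underground Live</literal></binding>
    </result>
   </results>
  </sparql>
XML

parser = PlainSparqlResultParser.new
REXML::Document.parse_stream(xml, parser)
parser.result.each { |row| puts row.inspect }
```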
    

    Those two classes are pretty generic and could be used for any similar SPARQL query; you just need to subclass SparqlDataEngine to give it a specific query string and endpoint, like this:

    SPARQL_QUERY = <<-EOS
    PREFIX p: <http://dbpedia.org/property/>  
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT * WHERE { 
         ?album p:artist  <http://dbpedia.org/resource/%s>.       
         ?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.
         OPTIONAL {?album p:cover ?cover}.
         OPTIONAL {?album p:name ?name}.
         OPTIONAL {?album p:released ?dateofrelease}.
       }
    EOS
    
    #
    # Customize the use of the SparqlDataEngine by giving it the url of an endpoint,
    # a query to execute, and the name of the most important (or primary) value.
    # The '%s' in the query text above is replaced with the source name, with any
    # spaces replaced by underscores.
    #
    class DbpediaAlbumsEngine < SparqlDataEngine
      def initialize(parent, args)
        super(parent, args, 'http://dbpedia.org/sparql', SPARQL_QUERY, 'name')
      end
    end
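
    The '%s' substitution and URL escaping can be checked in isolation before wiring up an engine. This is a minimal sketch using only the standard library; the query_url helper and the shortened QUERY_TEMPLATE are hypothetical names introduced here, and nothing is actually sent to the endpoint.

```ruby
require 'cgi'

ENDPOINT = 'http://dbpedia.org/sparql'

# Shortened query template for illustration; '%s' marks where the
# resource name goes, as in SPARQL_QUERY.
QUERY_TEMPLATE = <<~EOS
  PREFIX p: <http://dbpedia.org/property/>
  SELECT * WHERE {
    ?album p:artist <http://dbpedia.org/resource/%s>.
    OPTIONAL {?album p:name ?name}.
  }
EOS

# Hypothetical helper: turn a source name into a request URL the same way
# sourceRequestEvent() does (spaces to underscores, then URL-escape).
def query_url(endpoint, template, source_name)
  query = template % source_name.gsub(' ', '_')
  "#{endpoint}?query=#{CGI.escape(query)}"
end

url = query_url(ENDPOINT, QUERY_TEMPLATE, 'The Velvet Underground')
puts url
```

    One thing to watch with this approach: because the template goes through Ruby's format operator, a literal percent sign in a query would have to be written as '%%'.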
    

    It's very little work indeed compared with the way you normally have to issue standard HTTP requests and then parse the totally non-standard results. I had a look at some of the Weather applet's Ion code for getting BBC forecasts and it was really very complicated; it would be vastly simpler if you could get weather data via SPARQL instead. The last step is to make a .desktop file for your new engine:

    [Desktop Entry]
    Name=DBPedia Albums Data Engine
    Comment=DBPedia album data for Plasmoids
    X-KDE-ServiceTypes=Plasma/DataEngine
    Type=Service
    Icon=
    X-KDE-Library=krubypluginfactory
    X-KDE-PluginKeyword=plasma-engine-dbpedia-albums/dbpedia_albums_engine.rb
    X-Plasma-EngineName=dbpedia-albums
    

    And a simple CMakeLists.txt file to install it:

    install(FILES plasma-dataengine-dbpedia-albums.desktop DESTINATION ${SERVICES_INSTALL_DIR} )
    install(FILES dbpedia_albums_engine.rb DESTINATION ${DATA_INSTALL_DIR}/plasma-engine-dbpedia-albums)
    

    You can use the Plasma engine explorer to test engines, and I enhanced the Ruby version slightly so it can show the contents of Soprano::Nodes within Qt::Variants. Here is what the browser looks like testing a new engine:

    [image:3401 size=preview]

    I'll try to add some material to the TechBase wiki about writing Ruby Plasma data engines and applets once the API has settled down a bit again, but I hope I've explained enough to get people playing with SPARQL queries, as I think there could be a lot of applications for the idea.

    Comments

    aseigo

    the only problem is...

    the only problem is that these ruby plugins are not using the ScriptEngine mechanism. ScriptEngine is not just there to make it possible to add scripting for languages that need a direct shim, they also play a *very* important role in management of the plugins, API maintenance, etc.

    while it's cool one can write plugins using ruby like this, i really hope this doesn't become the preferred mechanism for ruby addons to plasma since it is broken from the perspective of the libplasma api.

    your screenshot reminds me of a small bug i need to fix in the engine explorer though =)

    richard dale

    Re: the only problem is...

    Well, the Ruby Plasma bindings are generated directly from the Plasma headers, and so maintaining them with respect to API changes is pretty easy. The Ruby plugins use the same mechanism as C++ ones for packaging, and if there is a better way for scripting languages it sounds like a good idea to change. I would prefer not to have to treat every KDE plugin API as a special case, to be maintained in its own (possibly idiosyncratic) way.

    My understanding was that there were to be two sorts of apis for scripting languages in Plasma, a very simple sort aimed at non-professional 'consumer programmers', and another sort for people who want to do everything they can in C++ and more, but faster and easier.

    That's why I am interested in asking questions about what sort of programmers and languages we are targeting, and what kind of development environments they might want. In my opinion, at present we don't have any central point where that kind of issue is addressed and discussed in the KDE project. It is spread across app-specific lists like the Plasma one, sometimes on k-c-d, there was a large discussion recently on the release list, and then there is the kde-bindings list, which is used by only some of the Qt and KDE bindings projects.
