8.3 Parsing and Querying an RDF Document

Once an RDF/XML document is created, it serves no useful purpose unless the data in the document can be parsed and queried. In many ways, the advantage of something like RDF/XML is that the data is structured in specific ways, making it easier to access different data with the same code.

This section looks at opening an existing RDF/XML document, both within the filesystem and through the Internet, and accessing the data contained within the document.

8.3.1 Just Doing a Basic Dump

When accessing the data within an RDF/XML document, you'll want to access the data in two different ways: accessing specific pieces of data, or accessing all of it for alternative presentation. For instance, most of the tools discussed in Chapter 14 and Chapter 15 are interested in all the data within an RDF/XML document, data that is then transformed in one way or another.

One of the most common ways of "dumping" the data within an RDF/XML document (outputting all the data in a new format) is to print it out in N-Triples format. This was demonstrated with ARP, the parser included with the Jena Toolkit. However, another way of looking at the data is to dump out a listing of objects of one type or another.
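To make the N-Triples idea concrete, here is a minimal sketch of how a single statement might be written out as one N-Triples line. It uses only plain Java strings and is not the ARP or Jena code; the class name NTriplesSketch, its method names, and the sample literal are hypothetical, and only a few common literal escapes are handled.

```java
// Illustrative stand-in only: plain Java strings, not the Jena or ARP API.
class NTriplesSketch {

    // Escape a few characters that must be escaped inside an N-Triples literal.
    // The backslash is handled first so later escapes are not doubled.
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t");
    }

    // Format a statement whose object is a resource URI.
    static String resourceTriple(String subject, String predicate, String objectUri) {
        return "<" + subject + "> <" + predicate + "> <" + objectUri + "> .";
    }

    // Format a statement whose object is a plain literal.
    static String literalTriple(String subject, String predicate, String literal) {
        return "<" + subject + "> <" + predicate + "> \"" + escape(literal) + "\" .";
    }

    public static void main(String[] args) {
        String article = "http://burningbird.net/articles/monsters1.htm";

        // A resource-valued statement taken from the example document
        System.out.println(resourceTriple(article,
                "http://burningbird.net/postcon/elements/1.0/related",
                "http://burningbird.net/articles/monsters2.htm"));

        // Hypothetical literal-valued statement, for illustration only
        System.out.println(literalTriple(article,
                "http://purl.org/dc/elements/1.1/title",
                "A \"sample\" title"));
    }
}
```

Each statement becomes one self-contained line ending in a period, which is what makes the format convenient for dumps and line-by-line comparison.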

In Example 8-10, the PostCon RDF file for the demonstration article is read into a memory model using the read method, which takes the URL of the file as its parameter. Once the model is loaded, the listObjects method is called on the model object and its result is assigned to a NodeIterator. This is just one of the many iterators that Jena provides: NodeIterator, StmtIterator, ResIterator, and so on. Each is specialized to provide access to a specific Jena object type. In the example, once the NodeIterator is populated, it's traversed, and all of the RDF objects (the property "values") are printed out using the simple toString base method.

Example 8-10. Basic dump of objects, printing out object values
import com.hp.hpl.mesa.rdf.jena.mem.ModelMem;
import com.hp.hpl.mesa.rdf.jena.model.*;

public class pracRDFSixth extends Object {

    public static void main (String args[]) {

        // URL of the RDF/XML document to load
        String sUri = args[0];

        try {
            // Create memory model, read in RDF/XML document
            ModelMem model = new ModelMem();
            model.read(sUri);

            // Print out objects in model using toString
            NodeIterator iter = model.listObjects();
            while (iter.hasNext()) {
                System.out.println("  " + iter.next().toString());
            }

        } catch (Exception e) {
            System.out.println("Failed: " + e);
        }
    }
}

The application is run against the monsters1.rdf example file:

java pracRDFSixth http://burningbird.net/articles/monsters1.rdf

This is probably one of the simplest Jena applications you can write and test to make sure that a model is loaded correctly. Instead of objects, you could also dump out the subjects (ResIterator and listSubjects) or even the entire statements (StmtIterator and listStatements). The functionality is much the same; only the iterator type and the list method called differ.
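The difference between the three kinds of dump can be pictured with a small stand-in that does not use Jena at all. In this sketch a model is just a list of triples; the class TripleDumpSketch, its Triple type, and its helper methods are hypothetical, and returning distinct values in first-seen order is a choice made for the sketch rather than a statement about Jena's behavior. The sample statements are two of the related triples from the example document.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative stand-in only: plain Java collections, not the Jena API.
class TripleDumpSketch {

    // One statement: subject, predicate, and object held as plain strings.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    // Distinct subjects, in first-seen order.
    static Set<String> subjects(List<Triple> model) {
        Set<String> out = new LinkedHashSet<>();
        for (Triple t : model) {
            out.add(t.subject);
        }
        return out;
    }

    // Distinct objects, in first-seen order.
    static Set<String> objects(List<Triple> model) {
        Set<String> out = new LinkedHashSet<>();
        for (Triple t : model) {
            out.add(t.object);
        }
        return out;
    }

    // Two statements taken from the example document's output.
    static List<Triple> sample() {
        String article = "http://burningbird.net/articles/monsters1.htm";
        String related = "http://burningbird.net/postcon/elements/1.0/related";
        return Arrays.asList(
                new Triple(article, related, "http://burningbird.net/articles/monsters2.htm"),
                new Triple(article, related, "http://burningbird.net/articles/monsters3.htm"));
    }

    public static void main(String[] args) {
        List<Triple> model = sample();
        for (String s : subjects(model)) {
            System.out.println("subject:   " + s);
        }
        for (String o : objects(model)) {
            System.out.println("object:    " + o);
        }
        for (Triple t : model) {
            System.out.println("statement: " + t);
        }
    }
}
```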

8.3.2 Accessing Specific Values

Instead of listing all statements or all objects, you can fine-tune the code to list only subjects, statements, or objects matching specific properties, using the property implementations created within the wrapper classes, such as POSTCON.

To access all objects that have the PostCon related property, the POSTCON wrapper class is added to the import section:

import com.burningbird.postcon.vocabulary.POSTCON;

Next, the listObjectsOfProperty method is used instead of listObjects:

NodeIterator iter = model.listObjectsOfProperty(POSTCON.related);

That's all it takes to access all objects for a specific property. As you can see, the wrapper class is handy for more than just creating a model.
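Conceptually, listing the objects of a property is a filter over the model's statements: keep the statements whose predicate matches, and return their objects. The following sketch shows that idea with plain Java arrays and lists rather than Jena; the class PropertyFilterSketch and its members are hypothetical names, and the sample statements are copied from the example document's output.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: each statement is a {subject, predicate, object}
// string array. This is not how Jena stores or queries a model.
class PropertyFilterSketch {

    static final String POSTCON_NS = "http://burningbird.net/postcon/elements/1.0/";
    static final String RELATED = POSTCON_NS + "related";

    // Return the object of every statement whose predicate matches.
    static List<String> objectsOfProperty(List<String[]> model, String predicate) {
        List<String> out = new ArrayList<>();
        for (String[] stmt : model) {
            if (stmt[1].equals(predicate)) {
                out.add(stmt[2]);
            }
        }
        return out;
    }

    // Sample statements taken from the example document's output.
    static List<String[]> sample() {
        String article = "http://burningbird.net/articles/monsters1.htm";
        List<String[]> model = new ArrayList<>();
        model.add(new String[] {article,
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", POSTCON_NS + "Resource"});
        model.add(new String[] {article, RELATED,
                "http://burningbird.net/articles/monsters2.htm"});
        model.add(new String[] {article, RELATED,
                "http://burningbird.net/articles/monsters3.htm"});
        model.add(new String[] {article, RELATED,
                "http://burningbird.net/articles/monsters4.htm"});
        return model;
    }

    public static void main(String[] args) {
        // Print only the objects of the related property
        for (String object : objectsOfProperty(sample(), RELATED)) {
            System.out.println("  " + object);
        }
    }
}
```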

To access all the statements for a given resource, first access the resource from the model and then list all the properties associated with that resource. In Example 8-11, all of the statements are accessed for the top-level resource contained within the document. As the list of statements is traversed, the subject is accessed and printed out (both namespace and local name), followed by the predicate (again, namespace and local name), and finally the object.

Example 8-11. Printing out each statement triple for a given RDF/XML document
import com.hp.hpl.mesa.rdf.jena.mem.ModelMem;
import com.hp.hpl.mesa.rdf.jena.model.*;
import com.burningbird.postcon.vocabulary.POSTCON;

public class pracRDFSeventh extends Object {

    public static void main (String args[]) {

        // URL of the RDF/XML document, and URI of the resource to look up
        String sUri = args[0];
        String sResource = args[1];

        try {
            // Create memory model, read in RDF/XML document
            ModelMem model = new ModelMem();
            model.read(sUri);

            // Find resource
            Resource res = model.getResource(sResource);

            // Find properties
            StmtIterator iter = res.listProperties();

            // Print out triple - subject | property | object
            while (iter.hasNext()) {
                // Next statement in queue
                Statement stmt = iter.next();

                // Get subject, print
                Resource res2 = stmt.getSubject();
                System.out.print(res2.getNameSpace() + res2.getLocalName());

                // Get predicate, print
                Property prop = stmt.getPredicate();
                System.out.print(" " + prop.getNameSpace() + prop.getLocalName());

                // Get object, print
                RDFNode node = stmt.getObject();
                System.out.println(" " + node.toString() + "\n");
            }

        } catch (Exception e) {
            System.out.println("Failed: " + e);
        }
    }
}

Running this application prints the triple for each statement about the resource, including application-generated object values for blank nodes:

http://burningbird.net/articles/monsters1.htm http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://burningbird.net/postcon/elements/1.0/Resource

http://burningbird.net/articles/monsters1.htm http://burningbird.net/postcon/elements/1.0/bio anon:a9ae05:f2ecfdc9db:-7fff

http://burningbird.net/articles/monsters1.htm http://burningbird.net/postcon/elements/1.0/relevancy anon:a9ae05:f2ecfdc9db:-7ff7

http://burningbird.net/articles/monsters1.htm http://burningbird.net/postcon/elements/1.0/presentation anon:a9ae05:f2ecfdc9db:-7fec

http://burningbird.net/articles/monsters1.htm http://burningbird.net/postcon/elements/1.0/history anon:a9ae05:f2ecfdc9db:-7fde

http://burningbird.net/articles/monsters1.htm http://burningbird.net/postcon/elements/1.0/related http://burningbird.net/articles/monsters2.htm

http://burningbird.net/articles/monsters1.htm http://burningbird.net/postcon/elements/1.0/related http://burningbird.net/articles/monsters3.htm

http://burningbird.net/articles/monsters1.htm http://burningbird.net/postcon/elements/1.0/related http://burningbird.net/articles/monsters4.htm

Note in the code that the variation of getObject used is the one returning an RDFNode object. The reason is that other variations work only if the object is a literal and throw exceptions if a nonliteral is found. Since some of the objects in this document are resources, the RDFNode method works best.
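The reasoning behind that choice can be shown with a small stand-in type that is not Jena's RDFNode. In this sketch, a node is either a literal or a resource; a literal-only accessor fails on a resource, while the general toString works for both. The class ObjectNodeSketch, its Node type, and the sample literal text are all hypothetical.

```java
// Illustrative stand-in only: a simplified node type, not Jena's RDFNode.
class ObjectNodeSketch {

    static final class Node {
        private final String value;
        private final boolean literal;

        private Node(String value, boolean literal) {
            this.value = value;
            this.literal = literal;
        }

        static Node literal(String text) {
            return new Node(text, true);
        }

        static Node resource(String uri) {
            return new Node(uri, false);
        }

        boolean isLiteral() {
            return literal;
        }

        // Literal-only accessor: fails if the node is actually a resource.
        String asLiteral() {
            if (!literal) {
                throw new IllegalStateException("Not a literal: " + value);
            }
            return value;
        }

        // General accessor: safe for both literals and resources.
        @Override
        public String toString() {
            return value;
        }
    }

    // Check the node kind first, then use the accessor that fits.
    static String describe(Node node) {
        return node.isLiteral()
                ? "literal: " + node.asLiteral()
                : "resource: " + node;
    }

    public static void main(String[] args) {
        System.out.println(describe(Node.literal("some literal text")));
        System.out.println(describe(
                Node.resource("http://burningbird.net/articles/monsters2.htm")));
    }
}
```

When a document mixes literal and resource objects, as this one does, reading every object through the general type avoids having to know each object's kind in advance.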

As can be seen from the examples, querying the data in an RDF/XML document doesn't have to be difficult; you just have to remember the triple nature of the statements in RDF/XML.

One of the most powerful aspects of Jena is the ability to use a query language, RDQL, to query an RDF model for data that matches given patterns. This is explored in Chapter 10.