To bulk-load RDF data into Oracle (Spatial) 11g, the data first has to be converted to N-Triples. For large data sets this step can add considerable overhead, which is why I decided to benchmark and compare several converters.
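N-Triples is a line-based format. Each line holds one triple, with URIs in angle brackets, literals in double quotes, and a terminating ` .`. The following Java sketch formats a single line. The class `NTriplesLine` and its helpers are hypothetical. They handle only URIs and plain literals (no blank nodes, language tags or datatypes), assume the URIs are already plain ASCII, and escape literals as the original, ASCII-only N-Triples grammar requires.

```java
public class NTriplesLine {
    // Hypothetical helper: formats one triple as an N-Triples line.
    // Only URIs and plain literals are supported; URIs are assumed to be ASCII.
    static String toNTriple(String subject, String predicate, String object, boolean objectIsLiteral) {
        String o = objectIsLiteral ? "\"" + escape(object) + "\"" : "<" + object + ">";
        return "<" + subject + "> <" + predicate + "> " + o + " .";
    }

    // Escapes a literal: backslash, quote, newline, carriage return and tab get
    // short escapes; everything outside printable ASCII becomes a hex escape.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (cp >= 0x20 && cp <= 0x7E) {
                        sb.append((char) cp);
                    } else if (cp <= 0xFFFF) {
                        sb.append(String.format("\\u%04X", cp));
                    } else {
                        sb.append(String.format("\\U%08X", cp));
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative triple with made-up example.org URIs.
        System.out.println(toNTriple("http://example.org/taxon/9606",
                "http://www.w3.org/2000/01/rdf-schema#label", "Homo sapiens", true));
    }
}
```

Running it prints `<http://example.org/taxon/9606> <http://www.w3.org/2000/01/rdf-schema#label> "Homo sapiens" .`, i.e. one complete statement per line, which is the shape the converters below produce.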

For the comparison, the <a href="">taxonomy.rdf.gz</a> file from UniProt release 10.0 was used. Uncompressed, this file is about 127 MB. The benchmarks were run on a (slightly obsolete) Itanium machine with plenty of RAM.


The first tool I tried was Raptor (1.4.14). After the usual configure, make, make install, the conversion can be run like so:

zcat taxonomy.rdf.gz | rapper -e -o ntriples - file://taxonomy.rdf.gz# > taxonomy.nt

This completed in 38.9, 38.8 and 38.9 seconds over three consecutive runs.

The -e flag turns off validation. This doesn’t seem to have a measurable impact on performance, but it is necessary to avoid spurious “Duplicated rdf:ID value” errors (at least with another, larger file).

The data is decompressed on the fly to save time (and disk space).
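The Java-based tools can get the same on-the-fly decompression without zcat, which also helps if the pipeline has to run on a platform that lacks it. Here is a minimal sketch using `java.util.zip.GZIPInputStream`. The class `Gunzip` and its helper methods are hypothetical and not part of Raptor, Jena or Rio.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Gunzip {
    // Returns a stream that decompresses gzip data lazily as it is read,
    // so the uncompressed file never has to be written to disk.
    static InputStream openGzip(InputStream compressed) throws IOException {
        return new GZIPInputStream(compressed, 64 * 1024);
    }

    // Copies everything from in to out using a fixed-size buffer.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Compresses a byte array in memory (used only for the self-check).
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Behaves like "zcat FILE": decompressed bytes go to stdout.
            try (InputStream in = openGzip(new FileInputStream(args[0]))) {
                copy(in, System.out);
            }
            System.out.flush();
        } else {
            // Self-check: round-trip a short string through gzip.
            byte[] original = "taxonomy".getBytes(StandardCharsets.UTF_8);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copy(openGzip(new ByteArrayInputStream(gzip(original))), out);
            System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

Run with a file name (e.g. `java Gunzip taxonomy.rdf.gz`), it behaves like zcat. Without arguments, it round-trips a short string as a self-check. In a Java converter, the stream returned by `openGzip(new FileInputStream(...))` could be passed to the parser in place of `System.in`. That variant is untested here and not part of the timings in this post.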


Next I tried Jena (2.5.2). After adding all the jars to the classpath, the conversion was run like so:

zcat taxonomy.rdf.gz | java jena.rdfparse -b file://taxonomy.rdf.gz# -x - > taxonomy.nt

This completed in 2:38, 2:40 and 2:38 min.

This would be too slow for converting the entire UniProt RDF data set in reasonable time, but at least I got a (correct) warning about a bad URI that I hadn’t been aware of…

The JVM was JRockit (5.0 R27.1) with default parameters. I tried some variations, such as adding -Xgcprio:throughput, but didn’t see any significant change.


Finally, I tried Rio (1.0.9), another Java parser. Rio doesn’t seem to include a command-line conversion tool, but writing one takes little code:

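import java.io.IOException;

import org.openrdf.model.Resource;
import org.openrdf.model.URI;
import org.openrdf.model.Value;
import org.openrdf.rio.Parser;
import org.openrdf.rio.StatementHandler;
import org.openrdf.rio.StatementHandlerException;
import org.openrdf.rio.ntriples.NTriplesWriter;
import org.openrdf.rio.rdfxml.RdfXmlParser;

// Reads RDF/XML from stdin and writes N-Triples to stdout; args[0] is the base URI.
public class Converter {
    public static void main(String[] args) throws Exception {
        Parser parser = new RdfXmlParser();
        final NTriplesWriter writer = new NTriplesWriter(System.out);
        writer.startDocument();
        // Stream each parsed statement straight to the writer.
        parser.setStatementHandler(new StatementHandler() {
            public void handleStatement(Resource s, URI p, Value o)
                    throws StatementHandlerException {
                try {
                    writer.writeStatement(s, p, o);
                } catch (IOException e) {
                    throw new StatementHandlerException(e);
                }
            }
        });
        parser.parse(System.in, args[0]);
        writer.endDocument(); // finishes the document and flushes the output
    }
}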

zcat taxonomy.rdf.gz | java Converter file://taxonomy.rdf.gz# > taxonomy.nt

This ran in 49.8, 49.5 and 50.0 seconds.

Wrapping the streams in buffered readers or writers seemed to decrease performance slightly, so I assume the streams are already buffered.
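One way to see why extra buffering need not help is this: if the underlying stream already buffers, a second buffer layer only adds a copy without reducing the number of writes that reach the output. The sketch below counts the write calls arriving at a sink when 100,000 bytes are written one at a time through zero, one or two `java.io.BufferedOutputStream` layers. `BufferLayers` and its counting sink are hypothetical stand-ins. The sketch says nothing about how Rio’s own writer buffers internally.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferLayers {
    // Discards data but counts how many write calls reach it,
    // standing in for the file descriptor behind System.out.
    static class CountingStream extends OutputStream {
        long calls;
        @Override public void write(int b) { calls++; }
        @Override public void write(byte[] b, int off, int len) { calls++; }
    }

    // Writes n bytes one at a time through the given number of buffer layers
    // and returns how many write calls reached the sink underneath.
    static long callsReachingSink(int n, int bufferLayers) throws IOException {
        CountingStream sink = new CountingStream();
        OutputStream out = sink;
        for (int i = 0; i < bufferLayers; i++) {
            out = new BufferedOutputStream(out);
        }
        for (int i = 0; i < n; i++) {
            out.write('x');
        }
        out.flush(); // pushes any remaining buffered bytes down to the sink
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        for (int layers = 0; layers <= 2; layers++) {
            System.out.println(layers + " buffer layer(s): "
                    + callsReachingSink(100_000, layers) + " write calls");
        }
    }
}
```

On a typical JDK, the unbuffered case makes one call per byte. The one-layer and two-layer cases both make the same small number of calls, so the second layer is pure overhead.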


Raptor is the fastest option for the conversion. Rio is the best choice if you need a platform-independent procedure (e.g. one integrated into an Ant build). Jena is best if you also need to check the data :-)
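The averages behind this conclusion can be worked out from the run times reported above. Here is a small Java sketch. The class and field names are made up for illustration, and Jena’s times are converted from m:ss to seconds.

```java
public class Timings {
    // Wall-clock times in seconds for the three runs of each tool.
    static final double[] RAPTOR = {38.9, 38.8, 38.9};
    static final double[] JENA = {158, 160, 158}; // 2:38, 2:40, 2:38
    static final double[] RIO = {49.8, 49.5, 50.0};

    // Arithmetic mean of the runs.
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    public static void main(String[] args) {
        double raptor = mean(RAPTOR);
        System.out.printf("Raptor: %.1f s%n", raptor);
        System.out.printf("Jena:   %.1f s (%.1fx Raptor)%n", mean(JENA), mean(JENA) / raptor);
        System.out.printf("Rio:    %.1f s (%.1fx Raptor)%n", mean(RIO), mean(RIO) / raptor);
        // Throughput on the ~127 MB uncompressed input.
        System.out.printf("Raptor throughput: %.1f MB/s%n", 127 / raptor);
    }
}
```

With these numbers, Raptor averages about 38.9 s. Jena is roughly 4.1 times slower and Rio roughly 1.3 times slower. On the ~127 MB input, Raptor’s rate works out to about 3.3 MB/s.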