Project Marmotta has retired. For details please refer to its Attic page.
Apache Marmotta - KiWi Native SPARQL

KiWi Native SPARQL

The KiWi SPARQL module offers optimized SPARQL 1.1 query support for typical cases by translating parts of a SPARQL query directly into SQL. As of Marmotta 3.3, most SPARQL constructs are translated directly. In addition, the result iterators of an optimized query operate directly on database cursors, so queries are very efficient when only a few results are retrieved.

Note that translating SPARQL into SQL is a challenging task, and probably the most complex part of the code in Apache Marmotta. Even though the syntax of the two languages looks similar, their semantics differ in small ways. In some edge cases, we therefore deliberately deviate from the SPARQL standard. These cases are documented below. If you need full standard compliance, you can still use SPARQL without the native support, at the expense of query performance (slower by a factor of up to 1000).

Maven Artifact

The KiWi SPARQL optimizations can only be used in conjunction with the KiWi Triple Store, because they work directly on the internal KiWi data structures. To include them in a project that uses the KiWi Triple Store, add the following dependency to your Maven project:

 <dependency>
     <groupId>org.apache.marmotta</groupId>
     <artifactId>kiwi-sparql</artifactId>
     <version>3.3.0</version>
 </dependency>

Marmotta Extensions

Starting with the development version of 3.2, Apache Marmotta also provides some extensions to SPARQL 1.1:

Crypto Extensions

The SPARQL standard defines a number of crypto functions (e.g. UUID or MD5). These are all supported by KiWi Native SPARQL for PostgreSQL and MySQL. Note that when using these functions, plain text might still be sent over the network without encryption, depending on your setup (i.e. whether SSL is used for accessing the SPARQL endpoint and for connecting to the backend database). Depending on the database you are using, additional setup is required:

In PostgreSQL, it is necessary to install the pgcrypto extension (version >=1.1) from the contrib package into the Marmotta database. Depending on your operating system and distribution, this might also require installation of additional software packages (under Debian: postgresql-contrib). To install the pgcrypto extension, connect to the database as superuser and run

CREATE EXTENSION IF NOT EXISTS pgcrypto;

To connect to the marmotta database as superuser on Linux, you could, for example, run the following commands:

su - postgres
psql marmotta

You can also use a graphical UI, like pgAdmin, but make sure to call the command on the correct database (not the whole server).
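
Once the database is set up, the hash functions can be used like any other SPARQL 1.1 function. The following is only a minimal sketch: the FOAF vocabulary and the assumption that the store contains foaf:mbox statements are illustrative and not part of Marmotta itself. STR() is applied first because MD5 expects a string argument, while mailbox values are usually IRIs.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Return an MD5 hash of each mailbox instead of the mailbox itself.
# foaf:mbox is an assumed property; replace it with one from your data.
SELECT ?person (MD5(STR(?mbox)) AS ?mboxHash)
WHERE {
  ?person foaf:mbox ?mbox .
}
LIMIT 10
```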

Performance Considerations

In practice, the KiWi SPARQL module significantly improves the performance of most SPARQL queries (and even updates) and should therefore almost always be used in conjunction with the KiWi triple store. However, there is no magic: certain queries will still be problematic. To improve SPARQL performance, try to follow these recommendations:

  • avoid DISTINCT, ORDER BY, GROUP BY: eliminating duplicates, sorting, and grouping are performance killers, as they require all results to be loaded into memory first; if you do not strictly need them, do not use them
  • use LIMIT: limiting the number of results helps the underlying SQL query planner to create better query plans, so your query will perform better
  • use FILTER: conditions in the FILTER part of a query will be translated into WHERE conditions in SQL; the more precise your filter conditions are, the better your query will perform
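
As an illustration of these recommendations, a query might be shaped like the sketch below. The Dublin Core properties and the date threshold are assumptions chosen for the example; the point is only that the query avoids DISTINCT and ORDER BY, restricts the matches with a precise FILTER, and caps the result size with LIMIT.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# No DISTINCT / ORDER BY / GROUP BY: results can be streamed as they are found.
SELECT ?doc ?title
WHERE {
  ?doc dct:title ?title ;
       dct:created ?created .
  # A precise filter condition narrows the set of matching rows.
  FILTER (?created >= "2014-01-01T00:00:00Z"^^xsd:dateTime)
}
# A small LIMIT keeps the number of retrieved results low.
LIMIT 20
```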

Differences from SPARQL Standard

The KiWi native SPARQL implementation differs from the semantics of the SPARQL standard in the following known cases. Implementing these cases according to the standard would produce significantly more complex SQL queries, just to handle edge cases with non-intuitive semantics, so we decided not to support them.

There is a special edge case where, according to the SPARQL standard, the position of the OPTIONAL part changes the semantics of the query. Consider the following two SPARQL queries:

SELECT * WHERE {
  ?s :p1 ?o1 .
  OPTIONAL { ?s :p2 ?o2 } .
  ?s :p3 ?o2
}

vs

SELECT * WHERE {
  ?s :p1 ?o1 .
  ?s :p3 ?o2 .
  OPTIONAL { ?s :p2 ?o2 }
}

According to the SPARQL standard, for a subject that has a :p2 value, the first query only yields results when the values for :p2 and :p3 are the same, while the second query essentially ignores the OPTIONAL. Since SQL has declarative semantics where the order of statements does not matter, we do not support this distinction. We always implement the semantics of the second SPARQL query.
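
To make the difference concrete, consider the small data set below, written as a SPARQL Update request. The prefix IRI and the resources :a and :b are hypothetical and only serve this example. The comments trace what the standard semantics would return for each of the two queries; following the description above, KiWi should return the result of the second query in both cases.

```sparql
PREFIX : <http://example.org/>

# Hypothetical test data: :a has different values for :p2 and :p3,
# :b has no :p2 value at all.
INSERT DATA {
  :a :p1 "x" ; :p2 "1" ; :p3 "2" .
  :b :p1 "y" ; :p3 "3" .
}

# First query (OPTIONAL in the middle), standard semantics:
#   :a is dropped: ?o2 = "1" from the OPTIONAL conflicts with ?o2 = "2" from :p3
#   :b yields ?o1 = "y", ?o2 = "3"
# Second query (OPTIONAL last), standard semantics:
#   :a yields ?o1 = "x", ?o2 = "2"
#   :b yields ?o1 = "y", ?o2 = "3"
```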

According to the SPARQL standard, variables occurring in the left and right argument of a MINUS are scoped to their individual arguments. Since we translate MINUS into a NOT EXISTS, our implementation does not support this scoping, which can yield unexpected results for certain queries. Such queries should be fixed by renaming variables appropriately. All other differences between MINUS and NOT EXISTS are implemented according to the standard.
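
One situation where this is likely to matter is a FILTER inside the MINUS that refers to a variable bound only on the left-hand side, a case the SPARQL 1.1 specification itself uses to contrast MINUS with NOT EXISTS. The sketch below uses hypothetical properties :p and :q. Under the standard, ?n is not in scope inside the MINUS, so the FILTER never succeeds and nothing is removed; under a NOT EXISTS translation, ?n would be taken from the outer pattern.

```sparql
PREFIX : <http://example.org/>

# ?n is bound on the left but also referenced inside the MINUS.
SELECT * WHERE {
  ?x :p ?n .
  MINUS { ?x :q ?m . FILTER(?n = ?m) }
}
```

If the two occurrences are meant to be independent, as the standard scoping implies, giving the inner variable its own name (here the made-up ?n2) makes that explicit, so that the inner FILTER no longer refers to the outer binding:

```sparql
PREFIX : <http://example.org/>

# The inner variable is renamed so it cannot be confused with the outer ?n.
SELECT * WHERE {
  ?x :p ?n .
  MINUS { ?x :q ?m . FILTER(?n2 = ?m) }
}
```

How the renamed query is evaluated by KiWi has not been verified here; the rename only removes the name clash that the NOT EXISTS translation would otherwise pick up.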