Project Marmotta has retired. For details please refer to its Attic page.

Linked Data Client: Modules

The LDClient library comes with a number of pre-defined data providers for different services. This page gives a more detailed overview of how the different data providers work and what data they return.

RDF Module

The RDF module (ldclient-provider-rdf) provides access to any resources conforming to the Linked Data principles. The data provider is capable of retrieving data in the most common Linked Data formats found on the web (i.e. RDF/XML, Turtle, N3, JSON-LD, RDF/JSON). To use the RDF module, add the following Maven artifact dependency to your build (in addition to the core ldclient artifacts):

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-rdf</artifactId>
    <version>3.3.0</version>
</dependency>

The RDF module will issue a direct HTTP request for the requested resource with the Accept header set to the supported RDF formats. It will parse the RDF response into an in-memory Sesame repository and filter out triples where the subject does not match the requested resource. The reason for this restriction is that all other triples do not actually describe the requested resource in a Linked Data style and would clutter your local repository unnecessarily (i.e. you only get what you requested and nothing more).
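The subject filter described above can be sketched with plain Java collections. This is only an illustration: the Triple class and the filterBySubject method are hypothetical stand-ins, whereas the module itself operates on statements in a Sesame repository.

```java
import java.util.ArrayList;
import java.util.List;

class SubjectFilterSketch {

    // Hypothetical stand-in for an RDF statement (not a Sesame class).
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Keep only the triples whose subject is the requested resource URI.
    static List<Triple> filterBySubject(List<Triple> parsed, String requestedUri) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : parsed) {
            if (t.subject.equals(requestedUri)) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String requested = "http://example.org/resource/A";
        List<Triple> parsed = new ArrayList<>();
        parsed.add(new Triple(requested, "http://purl.org/dc/terms/title", "Resource A"));
        parsed.add(new Triple("http://example.org/resource/B", "http://purl.org/dc/terms/title", "Resource B"));

        // The triple about resource B is dropped; only statements about A remain.
        for (Triple t : filterBySubject(parsed, requested)) {
            System.out.println(t.subject + " " + t.predicate + " " + t.object);
        }
    }
}
```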

The RDF module auto-registers an endpoint for all HTTP resources with low priority, i.e. in case no other endpoint configuration matches first, the LDClient library will always try to issue a Linked Data request by default once the module is on the classpath.
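To illustrate how a low-priority default can coexist with more specific endpoint configurations, here is a minimal sketch of pattern-based selection. The EndpointRule class, the numeric priorities, and the rule that the larger value wins are assumptions made for this example and are not taken from the LDClient source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class EndpointSelectionSketch {

    // Hypothetical endpoint description: name, URI pattern, numeric priority.
    static final class EndpointRule {
        final String name;
        final Pattern uriPattern;
        final int priority; // assumption: a larger value wins

        EndpointRule(String name, String regex, int priority) {
            this.name = name;
            this.uriPattern = Pattern.compile(regex);
            this.priority = priority;
        }
    }

    // Return the name of the matching rule with the highest priority, or null if none matches.
    static String select(List<EndpointRule> rules, String uri) {
        EndpointRule best = null;
        for (EndpointRule rule : rules) {
            boolean matches = rule.uriPattern.matcher(uri).find();
            if (matches && (best == null || rule.priority > best.priority)) {
                best = rule;
            }
        }
        return best == null ? null : best.name;
    }

    public static void main(String[] args) {
        List<EndpointRule> rules = new ArrayList<>();
        // Catch-all rule standing in for the auto-registered Linked Data endpoint.
        rules.add(new EndpointRule("Linked Data (default)", "^https?://.*", 0));
        // More specific rule that should take precedence for matching URIs.
        rules.add(new EndpointRule("DBPedia (SPARQL)", "^http://dbpedia\\.org/resource/.*", 10));

        System.out.println(select(rules, "http://dbpedia.org/resource/Berlin"));
        System.out.println(select(rules, "http://example.org/resource/A"));
    }
}
```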

In addition to this default behaviour, the RDF module also offers pre-defined endpoint classes that can be used for configuring special cases:

  • SPARQLEndpoint allows redirecting the request of all resources matching a regular expression pattern to a SPARQL service; the provider will then issue a SPARQL query of the form SELECT ?p ?o WHERE { <{uri}> ?p ?o } to retrieve all triples having the requested resource as subject.
  • StanbolEndpoint allows redirecting the request to an Apache Stanbol Entityhub (which can be configured as a local cache of frequently-used Linked Data resources).
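The query template used by the SPARQL endpoint can be expanded with a simple string substitution. The following sketch shows the idea using only the Java standard library. The class and method names are hypothetical, and passing the query in a query parameter of a GET request follows the SPARQL protocol rather than anything confirmed about the provider's implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SparqlQuerySketch {

    // Template quoted from the endpoint description; {uri} is the placeholder.
    static final String TEMPLATE = "SELECT ?p ?o WHERE { <{uri}> ?p ?o }";

    // Substitute the requested resource URI into the query template.
    static String buildQuery(String resourceUri) {
        return TEMPLATE.replace("{uri}", resourceUri);
    }

    // Build a GET URL for a SPARQL service (assumption: query sent as "query" parameter).
    static String buildRequestUrl(String serviceUrl, String resourceUri) {
        try {
            return serviceUrl + "?query=" + URLEncoder.encode(buildQuery(resourceUri), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String resource = "http://dbpedia.org/resource/Berlin";
        System.out.println(buildQuery(resource));
        System.out.println(buildRequestUrl("http://dbpedia.org/sparql", resource));
    }
}
```

A production version would also validate the resource URI before substitution, since characters such as ">" would change the structure of the query.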

For example, to redirect requests to DBPedia resources to the (more reliable) SPARQL endpoint offered by the service, you can use the following statements:

ClientConfiguration config = new ClientConfiguration();
config.addEndpoint(
    new SPARQLEndpoint("DBPedia (SPARQL)","http://dbpedia.org/sparql","^http://dbpedia\\.org/resource/.*")
);

LDClientService ldclient = new LDClient(config);

RDFa Module

The RDFa module (ldclient-provider-rdfa) offers support to parse and retrieve RDF triples contained in an HTML document in RDFa format. The data provider uses the RDFa Core Java library for parsing and therefore only supports well-formed XHTML resources for parsing. To include the RDFa module in your project, add the following dependency to your project build:

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-rdfa</artifactId>
    <version>3.3.0</version>
</dependency>

The RDFa module only adds an additional parser to the OpenRDF library and therefore does not contain any source code itself. To use the RDFa support, you just need to include the library in the classpath.

Note: The RDFa parser library is published under the LGPL license and is therefore not part of the official Apache Marmotta distribution. We still publish the RDFa module as a Maven artifact, but keep in mind the legal restrictions.

YouTube Module

The YouTube module allows you to access YouTube videos, playlists and channels as if they were ordinary Linked Data resources by wrapping the YouTube API and mapping the results of requests to RDF using the Ontology for Media Resources. To include the YouTube module in your project, add the following dependency to your build file:

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-youtube</artifactId>
    <version>3.3.0</version>
</dependency>

The library will automatically register data providers for videos, playlists, and channels and add default endpoint configurations for the most common YouTube URLs (http://www.youtube.com/..., http://gdata.youtube.com/..., http://youtu.be/...). This allows you to request any YouTube URL as soon as you have the library on your classpath.

Note that to avoid ambiguities, the actual video metadata will only be returned when requesting the video using the URI of the video starting with http://youtu.be/. All other URIs of the video will only return a triple linking to the youtu.be video with a foaf:primaryTopic relation.
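The redirection from alternative video URIs to the canonical youtu.be URI can be pictured as follows. This is a rough sketch under stated assumptions: it only recognizes watch URLs that carry the video ID in a "v" query parameter, the class and method names are hypothetical, and the module's actual URI handling may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YouTubeCanonicalUriSketch {

    static final String FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic";

    // Assumption: watch pages carry the video ID in a "v" query parameter.
    private static final Pattern WATCH_URL =
            Pattern.compile("^https?://www\\.youtube\\.com/watch\\?(?:.*&)?v=([^&#]+)");

    // Return the youtu.be URI for a video URI, or null if the URI is not recognized.
    static String canonicalUri(String uri) {
        if (uri.startsWith("http://youtu.be/")) {
            return uri;
        }
        Matcher m = WATCH_URL.matcher(uri);
        return m.find() ? "http://youtu.be/" + m.group(1) : null;
    }

    // Return an N-Triples line linking a non-canonical URI to the canonical one,
    // or null if the URI is already canonical or not recognized.
    static String linkingTriple(String uri) {
        String canonical = canonicalUri(uri);
        if (canonical == null || canonical.equals(uri)) {
            return null;
        }
        return "<" + uri + "> <" + FOAF_PRIMARY_TOPIC + "> <" + canonical + "> .";
    }

    public static void main(String[] args) {
        // "abc123" is a made-up video ID used only for illustration.
        System.out.println(linkingTriple("http://www.youtube.com/watch?v=abc123&feature=share"));
    }
}
```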

Using the Ontology for Media Resources, the module will create the following triples when requesting a video resource:

  • rdf:type is ma:MediaResource and ma:VideoTrack
  • ma:title is the title of the video on YouTube
  • ma:locator is the URL of the actual video on the platform
  • ma:hasCreator and ma:hasPublisher point to the resource URI of the user who uploaded the video
  • ma:date is set to the publication date of the video
  • ma:locationLatitude and ma:locationLongitude are set to the geo-coordinates of the video (if available)
  • ma:description contains the textual description provided by the publisher
  • ma:hasKeyword contains the names of all YouTube keywords associated with the video
  • ma:hasGenre points to the resources identifying the YouTube categories of the video
  • ma:hasRating contains the average user rating of the video at the time of retrieval
  • ma:copyright describes the license terms of the video
  • ma:hasCompression and ma:format will describe the content format used for storing the video
  • ma:duration contains the duration of the video in seconds
  • foaf:thumbnail contains references to thumbnail images of the video
  • sioc:num_views contains the number of views of the video at the time of retrieval

Similarly, channels and playlists are represented using the Media Collections offered by the Ontology for Media Resources:

  • rdf:type is ma:Collection
  • ma:collectionName is the title of the channel or playlist
  • ma:hasMember points to the resource URIs of all videos contained in the channel or playlist
  • ma:hasCreator and ma:hasPublisher point to the resource URI of the user who uploaded the video

Vimeo Module

Similar to the YouTube module, the Vimeo module allows you to access Vimeo videos and groups (including channels) as if they were Linked Data resources. When requesting metadata about a resource, it redirects the request to the Vimeo API, processes the result, and maps the proprietary properties to RDF using the Ontology for Media Resources. It is thus possible to access Vimeo and YouTube videos in the same way. To use the Vimeo module in your project, add the following dependency to your build:

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-vimeo</artifactId>
    <version>3.3.0</version>
</dependency>

The library will automatically register endpoints with default configurations for Vimeo videos, groups and channels accessible via the main Vimeo website (http://vimeo.com). With this configuration you can directly retrieve metadata about Vimeo resources using the LDClient; no further configuration is needed.

Using the Ontology for Media Resources, the module will create the following triples when requesting a video resource. Note that the Vimeo API provides significantly fewer details than the YouTube API:

  • rdf:type is ma:MediaResource and ma:VideoTrack
  • ma:title is the title of the video on Vimeo
  • ma:locator is the URL of the actual video on the platform
  • ma:hasCreator and ma:hasPublisher point to the resource URI of the user who uploaded the video
  • ma:date is set to the publication date of the video
  • ma:description contains the textual description provided by the publisher
  • ma:hasKeyword contains the keywords the publisher associated with the video
  • ma:duration contains the duration of the video in seconds
  • foaf:thumbnail contains references to thumbnail images of the video
  • sioc:num_views contains the number of views of the video at the time of retrieval
  • sioc:num_replies contains the number of comments on the video at the time of retrieval

Similarly, channels and groups are represented using the Media Collections offered by the Ontology for Media Resources:

  • rdf:type is ma:Collection
  • ma:hasMember points to the resource URIs of all videos contained in the channel or group

MediaWiki Module

The MediaWiki module allows accessing content and metadata of wiki articles managed by a MediaWiki system (the most prominent being Wikipedia). When requesting a resource that represents a wiki article, this module will instead retrieve the article from the MediaWiki API offered by the wiki installation and provide access to many metadata properties otherwise not accessible. To use the MediaWiki module in your own project, add the following dependency to your project build:

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-mediawiki</artifactId>
    <version>3.3.0</version>
</dependency>

The module auto-registers endpoints that map requests to all language versions of Wikipedia to the respective API service endpoint, so if you want to request only Wikipedia articles, no further action is needed. For any other MediaWiki installation, it is necessary to create an endpoint configuration and add it to the LDClient instance you are using (see the WikipediaPageEndpoint source code as an example).
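The mapping from a Wikipedia article URI to the API service of the matching language version can be sketched as below. The api.php location and the action, format and titles parameters are standard for the MediaWiki API, but which parameters the module actually sends is an assumption here, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikipediaApiUrlSketch {

    // Captures the language subdomain and the (already URL-encoded) article title.
    private static final Pattern ARTICLE =
            Pattern.compile("^http://([a-z\\-]+)\\.wikipedia\\.org/wiki/(.+)$");

    // Return the API request URL for a Wikipedia article URI, or null for other URIs.
    static String apiUrlFor(String articleUri) {
        Matcher m = ARTICLE.matcher(articleUri);
        if (!m.matches()) {
            return null;
        }
        String language = m.group(1);
        String title = m.group(2);
        // Assumption: a plain "query" action for the page title, returned as XML.
        return "http://" + language + ".wikipedia.org/w/api.php"
                + "?action=query&format=xml&titles=" + title;
    }

    public static void main(String[] args) {
        System.out.println(apiUrlFor("http://en.wikipedia.org/wiki/Marmot"));
        System.out.println(apiUrlFor("http://de.wikipedia.org/wiki/Murmeltiere"));
    }
}
```

An endpoint for another MediaWiki installation would follow the same idea with a different URI pattern and API location.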

The MediaWiki provider creates triples for a wiki page using SIOC and Dublin Core as follows:

  • rdf:type is sioc:WikiArticle
  • dct:title is the title of the wiki page
  • dct:identifier is the page identifier in the MediaWiki database
  • dct:modified is the last modification date of the page
  • dct:created is the creation date of the page
  • content:encoded is the content of the page in MediaWiki syntax
  • sioc:topic references the resource URIs of all MediaWiki categories used by the wiki page
  • sioc:links_to references the resource URIs of all wiki pages the page links to

Facebook Module

The Facebook module exposes data from the Facebook Graph API as RDF triples using the Schema.org and DC-Terms vocabularies. The module registers an endpoint that handles all URIs starting with http://graph.facebook.com and http://www.facebook.com. To use it in your project build, add the following dependency:

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-facebook</artifactId>
    <version>3.3.0</version>
</dependency>

The Facebook module tries hard to map Facebook categories to Schema.org types. Unfortunately, this is not always possible, because the categories used by Facebook are not completely documented. Facebook objects have at least the following properties:

  • rdf:type is a Schema.org type chosen according to the category property of the Facebook object
  • dcterms:id is the internal Facebook ID of the object
  • schema:name is the Facebook name of the object
  • schema:description is the Facebook description or about property of the object
  • foaf:homepage is the Facebook website and link property of the object

PHPBB Module (Experimental)

The PHPBB module tries to parse the HTML pages generated by a PHPBB discussion forum and extract posts and threads from the content. Since PHPBB does not offer an RSS feed or service API, this extraction is unreliable at best, because it depends heavily on the layout and theming of the page. We tried to make it as generic as possible, though. If you want to try out the module, add the following dependency to your project build:

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-phpbb</artifactId>
    <version>3.3.0</version>
</dependency>

The module does not auto-register any endpoints, but it offers simplified endpoint classes that allow a more convenient configuration of endpoints. If you want to configure all endpoints for a PHPBB site, simply use

ClientConfiguration config = new ClientConfiguration();
config.getEndpoints().addAll(PHPBBEndpoints.getEndpoints("http://www.carving-ski.de/phpBB/", "Carving Ski Forum"));

LDClient ldclient = new LDClient(config);

where the first argument is the URL of the PHPBB installation and the second argument gives a human-readable name to the endpoint configuration. The method returns a set of endpoints that can be directly added to an LDClient instance.

The PHPBB module will map information from posts to RDF using the SIOC and Dublin Core vocabulary as follows:

  • rdf:type is sioc:Post, sioc-types:BoardPost and foaf:Document
  • dc:title is the post title; some cleanup will be performed to remove common “Re:” and similar prefixes
  • dc:creator is the name of the user as found in the post
  • dc:description is the content of the post
  • dc:date is the date of the post
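The title cleanup mentioned for dc:title could look like the following regular-expression sketch. The list of prefixes ("Re:", "AW:", "Fwd:") is an assumption chosen for illustration, since the prefixes the module actually strips are not listed here, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PostTitleCleanupSketch {

    // Assumption: these are the reply/forward prefixes to strip, possibly repeated.
    private static final Pattern REPLY_PREFIXES =
            Pattern.compile("^(?:\\s*(?:Re|AW|Fwd)\\s*:\\s*)+", Pattern.CASE_INSENSITIVE);

    // Remove leading reply prefixes and surrounding whitespace from a post title.
    static String cleanTitle(String rawTitle) {
        return REPLY_PREFIXES.matcher(rawTitle).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanTitle("Re: Re: Which ski length?"));
        // Titles that merely start with the letters "Re" are left untouched.
        System.out.println(cleanTitle("Release notes"));
    }
}
```

Requiring the colon after the prefix keeps ordinary words such as "Release" intact.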

A topic or thread in a PHPBB forum will be mapped to RDF as follows:

  • rdf:type is sioc:Thread, sioc:Collection and foaf:Document
  • dc:title is the thread or topic title
  • dc:creator is the name of the user who started the thread/topic
  • dc:date is the date when the topic/thread was started
  • sioc:has_container points to the resource URI of the forum this thread belongs to
  • sioc:container_of points to all resource URIs of the posts contained in the thread

Finally, a forum in a PHPBB installation will be mapped to RDF as follows:

  • rdf:type is sioc:Forum, sioc:Collection and foaf:Document
  • dc:title is the forum title
  • dc:description contains a short description of the forum (as given on the webpage)
  • sioc:container_of points to all resource URIs of the threads/topics contained in the forum
  • sioc:parent_of points to all resource URIs of subforums contained in the forum

LDAP Module (Experimental)

The LDAP module allows accessing an LDAP directory to get information about users. LDAP properties will be mapped to the FOAF vocabulary. Endpoint configurations need to provide the two properties “loginDN” and “loginPW” to support LDAP directories with access control. In order to use the LDAP module in your project, add the following dependency to your project build:

<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-ldap</artifactId>
    <version>3.3.0</version>
</dependency>

The module does not auto-register any endpoints. You need to create an endpoint configuration for each LDAP directory you want to access. In most cases the LDAP directory will require a login to access the user data. The credentials can be configured using the “loginDN” and “loginPW” properties:

Endpoint endpoint = new Endpoint("mydirectory", "LdapFoafProvider", "ldap://mydirectory.com:389/SRFG/USERS/.*", "ldap://mydirectory.com:389/dc=salzburgresearch,dc=at", null, 86400L);
endpoint.setProperty("loginDN","login");
endpoint.setProperty("loginPW","password");

ClientConfiguration config = new ClientConfiguration();
config.addEndpoint(endpoint);

LDClient ldclient = new LDClient(config);

The LDAP module will map LDAP properties to FOAF properties as follows:

  • the rdf:type is foaf:Person
  • distinguishedName is mapped to dct:identifier
  • name is mapped to foaf:name
  • givenName is mapped to foaf:firstName
  • sn is mapped to foaf:surname
  • mail is mapped to foaf:mbox
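The attribute mapping listed above can be expressed as a simple lookup table. The sketch below is a simplified illustration with hypothetical class and method names. It treats every attribute as single-valued and ignores attributes without a mapping, which may not match the module's behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LdapFoafMappingSketch {

    static final String FOAF = "http://xmlns.com/foaf/0.1/";
    static final String DCT = "http://purl.org/dc/terms/";

    // LDAP attribute name -> RDF property URI, following the list above.
    static final Map<String, String> ATTRIBUTE_TO_PROPERTY = new LinkedHashMap<>();
    static {
        ATTRIBUTE_TO_PROPERTY.put("distinguishedName", DCT + "identifier");
        ATTRIBUTE_TO_PROPERTY.put("name", FOAF + "name");
        ATTRIBUTE_TO_PROPERTY.put("givenName", FOAF + "firstName");
        ATTRIBUTE_TO_PROPERTY.put("sn", FOAF + "surname");
        ATTRIBUTE_TO_PROPERTY.put("mail", FOAF + "mbox");
    }

    // Translate an LDAP entry into property URI -> value pairs; unmapped attributes are dropped.
    static Map<String, String> toFoaf(Map<String, String> ldapEntry) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> attribute : ldapEntry.entrySet()) {
            String property = ATTRIBUTE_TO_PROPERTY.get(attribute.getKey());
            if (property != null) {
                result.put(property, attribute.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("givenName", "Jane");
        entry.put("sn", "Doe");
        entry.put("telephoneNumber", "000"); // no mapping, so it is ignored

        for (Map.Entry<String, String> e : toFoaf(entry).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```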