The LDClient library is modular and pluggable, which makes it easy to extend the supported types of data sources with custom data providers. Plugins are loaded using the Java ServiceLoader API. Implementing a custom data provider therefore involves the following steps:
The following sections describe in more detail how this can be achieved.
Implementing a DataProvider requires you to provide the following kinds of information and functionality for a data source:
While the first two methods are more-or-less trivial, the third method performs most of the magic and is probably the most difficult to implement. Basically, it needs to retrieve the resource data either directly or from a service handling the data for this resource, parse the returned content and the HTTP status codes, and create a ClientResponse object. The ClientResponse contains two kinds of information:
The triples are stored in an OpenRDF Sesame Repository. In principle, any Sesame repository can be used, but in most cases it only makes sense to use an in-memory repository (MemoryStore sail).
Endpoint configurations define for which kinds of resources to use which data provider, and how this data provider is to be configured. The endpoint that matched a resource and selected a data provider is handed over to the retrieveResource(...) method to allow the data provider to access the configuration. Endpoints contain the following configuration parameters:
In almost all cases, creating an instance of the default Endpoint is sufficient for configuring an LDClient instance. However, there are a few exceptions to this rule:
LDClient uses the Java ServiceLoader API to automatically register data providers and endpoints. To auto-register a data provider to be used by LDClient, create a file with the following name (usually in your resources directory):
META-INF/services/org.apache.marmotta.ldclient.api.provider.DataProvider
and add to this file a line containing the fully-qualified class name of the data provider you implemented. Likewise, if you want to auto-register one or more endpoint configurations, create a file with the following name:
META-INF/services/org.apache.marmotta.ldclient.api.endpoint.Endpoint
and add to this file all fully-qualified class names of the endpoint configurations you want to register. Please keep in mind that (1) all classes need to have a zero-argument constructor, and (2) you should carefully choose priorities for endpoints you are auto-registering.
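The registration mechanism described above can be illustrated with a minimal, self-contained sketch. It is not LDClient code: `SketchProvider` and `ExampleProvider` are hypothetical stand-ins for the real provider interface and your implementation, and the provider-configuration file is written to a temporary directory only so that the example runs on its own. In a real project, the file would simply sit under `META-INF/services/` in your resources directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceLoaderSketch {

    /** Hypothetical stand-in for a provider interface (not the LDClient API). */
    public interface SketchProvider {
        String getName();
    }

    /** Providers must be public and have a public zero-argument constructor. */
    public static class ExampleProvider implements SketchProvider {
        public ExampleProvider() { }

        @Override
        public String getName() {
            return "Example Provider";
        }
    }

    /** Writes a provider-configuration file and lets ServiceLoader discover the provider. */
    public static List<String> discover() {
        try {
            // The file name is the fully-qualified name of the service interface;
            // its content is the fully-qualified name of the implementation.
            Path root = Files.createTempDirectory("serviceloader-sketch");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            Files.write(services.resolve(SketchProvider.class.getName()),
                    ExampleProvider.class.getName().getBytes(StandardCharsets.UTF_8));

            // Put the temporary directory on a class path of its own.
            URL[] classPath = { root.toUri().toURL() };
            try (URLClassLoader loader =
                         new URLClassLoader(classPath, ServiceLoaderSketch.class.getClassLoader())) {
                List<String> names = new ArrayList<>();
                for (SketchProvider provider : ServiceLoader.load(SketchProvider.class, loader)) {
                    names.add(provider.getName());
                }
                return names;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover());
    }
}
```

ServiceLoader instantiates each listed class reflectively, which is why the zero-argument constructor is required.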
The LDClient library provides a number of support modules in the form of abstract base classes that can be used to implement typical cases of data providers. Currently, there are base classes for HTTP requests, for processing XML data, and for processing HTML data.
The HTTP module is part of the LDClient Core. It offers support for retrieving HTTP resources (of any type) using the Apache HttpClient library. Since this is the most common type of Linked Data resource, the LDClient Core offers advanced connection management for HTTP connections using a connection pool and keep-alive connections.
Implementing a data provider that uses HTTP to retrieve resource data requires subclassing the class AbstractHttpProvider. This class implements the retrieveResource(...) method of DataProvider, but requires subclasses to implement a parseResponse(...) method instead. Please see the Javadoc documentation for details.
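The division of labour described above (the base class handles retrieval, the subclass handles parsing) follows the template-method pattern. The sketch below shows only that shape. All class and method names in it are hypothetical, triples are represented as plain strings, and the signatures are simplified assumptions; the actual signatures of AbstractHttpProvider are documented in the Javadoc.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HttpProviderSketch {

    /** Hypothetical base class: owns the retrieval step, delegates parsing. */
    public abstract static class AbstractProviderSketch {

        /** Template method: fetch the body, then hand it to the subclass. */
        public final List<String> retrieveResource(String resourceUri) {
            try (InputStream body = openStream(resourceUri)) {
                return parseResponse(resourceUri, body);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Default transport; subclasses may override it, e.g. for testing. */
        protected InputStream openStream(String resourceUri) throws IOException {
            return URI.create(resourceUri).toURL().openStream();
        }

        /** The only method a concrete provider has to supply. */
        protected abstract List<String> parseResponse(String resourceUri, InputStream body)
                throws IOException;
    }

    /** Example subclass: turns every non-blank line of the body into a label triple. */
    public static class LineProvider extends AbstractProviderSketch {
        private final String cannedBody;

        public LineProvider(String cannedBody) {
            this.cannedBody = cannedBody;
        }

        @Override
        protected InputStream openStream(String resourceUri) {
            // Serve a fixed body so the sketch runs without network access.
            return new ByteArrayInputStream(cannedBody.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        protected List<String> parseResponse(String resourceUri, InputStream body)
                throws IOException {
            String text = new String(body.readAllBytes(), StandardCharsets.UTF_8);
            List<String> triples = new ArrayList<>();
            for (String line : text.split("\n")) {
                if (!line.isBlank()) {
                    triples.add("<" + resourceUri + "> "
                            + "<http://www.w3.org/2000/01/rdf-schema#label> \""
                            + line.trim() + "\" .");
                }
            }
            return triples;
        }
    }

    public static void main(String[] args) {
        new LineProvider("Alice\n\nBob\n")
                .retrieveResource("http://example.org/resource")
                .forEach(System.out::println);
    }
}
```

Keeping the retrieval step in the base class means connection handling lives in one place, while each provider only has to know its own data format.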
The XML module provides an abstract base class for all XML-based data sources, e.g. web services like YouTube or Vimeo offering their data in a proprietary XML format. It is based on the HTTP module, implements basic XML parsing functionality (using JDOM) and allows subclasses to specify mappings from XPath statements to RDF properties. Subclasses will need to provide this mapping as well as a method specifying how to build the request URL for the service based on the resource URI of the retrieved resource. To build a data provider based on an XML data source, you will need to include the following Maven artifact:
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-xml</artifactId>
    <version>3.3.0</version>
</dependency>
To actually use the XML provider with a data source, you need to create a subclass of AbstractXMLDataProvider and override the following methods:
Good examples of how to use the XML module can be found in the Vimeo and YouTube modules in the source code.
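To illustrate the idea of mapping XPath expressions to RDF properties, here is a small stand-alone sketch. It does not use AbstractXMLDataProvider or JDOM: it relies on the XPath support built into the JDK instead, writes triples as plain strings without literal escaping, and its class name, method name, and sample XML are all made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathMappingSketch {

    /**
     * Evaluates each XPath expression against the XML document and emits one
     * triple (as a string) per matching node. Keys of the map are RDF
     * property URIs, values are XPath expressions.
     */
    public static List<String> extract(String resourceUri, String xml,
                                       Map<String, String> mappings) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            List<String> triples = new ArrayList<>();
            for (Map.Entry<String, String> mapping : mappings.entrySet()) {
                NodeList nodes = (NodeList) xpath.evaluate(
                        mapping.getValue(), doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    String value = nodes.item(i).getTextContent().trim();
                    triples.add("<" + resourceUri + "> <" + mapping.getKey()
                            + "> \"" + value + "\" .");
                }
            }
            return triples;
        } catch (Exception e) {
            throw new IllegalStateException("could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<video><title>Demo clip</title><tag>music</tag><tag>live</tag></video>";

        // LinkedHashMap keeps the mappings in insertion order.
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("http://purl.org/dc/terms/title", "/video/title");
        mappings.put("http://purl.org/dc/terms/subject", "/video/tag");

        extract("http://example.org/video/1", xml, mappings).forEach(System.out::println);
    }
}
```

Declaring the mapping as data rather than code keeps a provider for a new XML service down to a table of expressions plus the logic for building the request URL.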
Like the XML module, the HTML module provides an abstract base class for all HTML-based data sources (simple web pages as well as more complex web applications). It is used, for example, in the PHPBB module to access posts and threads in an online forum. It implements basic HTML parsing functionality, even for messy HTML, using JSoup. Since XPath is usually not very convenient for HTML documents, mappings from element values to RDF properties in the HTML module are specified using CSS selectors similar to jQuery. Subclasses will need to provide this mapping as well as a method specifying how to build the request URL for the service based on the resource URI of the retrieved resource. To build a data provider based on an HTML data source, you will need to include the following Maven artifact:
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-html</artifactId>
    <version>3.3.0</version>
</dependency>
To actually use the HTML provider with a data source, you need to create a subclass of AbstractHTMLDataProvider and override the following methods:
The only currently existing example is the PHPBB module.