The LDClient library comes with a number of predefined data providers for different services. This page gives a more detailed overview of how the different data providers work and what data they return.
The RDF module (ldclient-provider-rdf) provides access to any resources conforming to the Linked Data principles. The data provider is capable of retrieving data in the most common Linked Data formats found on the web (i.e. RDF/XML, Turtle, N3, JSON-LD, RDF/JSON). To use the RDF module, add the following Maven artifact dependency to your build (in addition to the core ldclient artifacts):
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-rdf</artifactId>
    <version>3.3.0</version>
</dependency>
The RDF module issues a direct HTTP request for the requested resource with the Accept header set to the supported RDF formats. It parses the RDF response into an in-memory Sesame repository and filters out all triples whose subject does not match the requested resource. The reason for this restriction is that the other triples do not actually describe the requested resource in a Linked Data style and would clutter your local repository unnecessarily (i.e. you only get what you requested and nothing more).
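The subject filter described above can be illustrated with a minimal, self-contained sketch. It does not use the Sesame API or the module's actual classes; `Triple` and `filterBySubject` are hypothetical names introduced only for this illustration, and the sample data is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SubjectFilterSketch {

    /** Minimal stand-in for an RDF statement (hypothetical, not the Sesame Statement type). */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Keeps only the triples whose subject equals the requested resource URI. */
    static List<Triple> filterBySubject(List<Triple> parsed, String requestedUri) {
        List<Triple> kept = new ArrayList<>();
        for (Triple t : parsed) {
            if (t.subject.equals(requestedUri)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String requested = "http://dbpedia.org/resource/Salzburg";
        // Illustrative response: the second triple describes a different resource.
        List<Triple> parsed = Arrays.asList(
            new Triple(requested, "http://www.w3.org/2000/01/rdf-schema#label", "Salzburg"),
            new Triple("http://dbpedia.org/resource/Austria",
                       "http://www.w3.org/2000/01/rdf-schema#label", "Austria"),
            new Triple(requested, "http://dbpedia.org/ontology/country",
                       "http://dbpedia.org/resource/Austria"));

        for (Triple t : filterBySubject(parsed, requested)) {
            System.out.println(t.subject + " " + t.predicate + " " + t.object);
        }
    }
}
```

In this sketch the triple about the second resource is dropped even though it arrived in the same response, which mirrors the "only what you requested" behaviour described above.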
The RDF module auto-registers an endpoint for all HTTP resources with low priority, i.e. in case no other endpoint configuration matches first, the LDClient library will always try to issue a Linked Data request by default once the module is on the classpath.
In addition to this default behaviour, the RDF module also offers pre-defined endpoint classes that can be used for configuring special cases:
For example, to redirect requests to DBPedia resources to the (more reliable) SPARQL endpoint offered by the service, you can use the following statements:
ClientConfiguration config = new ClientConfiguration();
config.addEndpoint(
    new SPARQLEndpoint("DBPedia (SPARQL)", "http://dbpedia.org/sparql", "^http://dbpedia\\.org/resource/.*")
);
LDClientService ldclient = new LDClient(config);
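To make the interplay between the low-priority default endpoint and a more specific endpoint easier to picture, here is a rough sketch of priority-based selection by URI pattern. It is not LDClient's implementation: `EndpointRule`, `select`, the numeric priority values and the "higher value wins" convention are all assumptions made for illustration. Only the DBPedia pattern is taken from the example above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class EndpointSelectionSketch {

    /** Hypothetical endpoint description: a name, a URI pattern and a priority. */
    static final class EndpointRule {
        final String name;
        final Pattern uriPattern;
        final int priority; // assumption: a higher value takes precedence

        EndpointRule(String name, String uriRegex, int priority) {
            this.name = name;
            this.uriPattern = Pattern.compile(uriRegex);
            this.priority = priority;
        }
    }

    /** Returns the matching rule with the highest priority, or null if none matches. */
    static EndpointRule select(List<EndpointRule> rules, String uri) {
        EndpointRule best = null;
        for (EndpointRule rule : rules) {
            boolean matches = rule.uriPattern.matcher(uri).find();
            if (matches && (best == null || rule.priority > best.priority)) {
                best = rule;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<EndpointRule> rules = Arrays.asList(
            // Catch-all rule standing in for the auto-registered default.
            new EndpointRule("Linked Data (default)", "^https?://.*", 0),
            // Specific rule using the pattern from the configuration above.
            new EndpointRule("DBPedia (SPARQL)", "^http://dbpedia\\.org/resource/.*", 10));

        System.out.println(select(rules, "http://dbpedia.org/resource/Berlin").name);
        System.out.println(select(rules, "http://example.org/thing").name);
    }
}
```

Under these assumptions, a DBPedia resource URI is routed to the SPARQL rule, while any other HTTP URI falls through to the catch-all rule.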
The RDFa module (ldclient-provider-rdfa) supports parsing and retrieving RDF triples contained in an HTML document in RDFa format. The data provider uses the RDFa Core Java library for parsing and therefore only supports well-formed XHTML resources. To include the RDFa module in your project, add the following dependency to your project build:
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-rdfa</artifactId>
    <version>3.3.0</version>
</dependency>
The RDFa module only adds an additional parser to the OpenRDF library and therefore does not contain any source code itself. To use the RDFa support, you just need to include the library in the classpath.
Note: The RDFa parser library is published under the LGPL license and is therefore not part of the official Apache Marmotta distribution. We still publish the RDFa module as a Maven artifact, but keep the legal restrictions in mind.
The YouTube module allows you to access YouTube videos, playlists and channels as if they were ordinary Linked Data resources by wrapping the YouTube API and mapping the results of requests to RDF using the Ontology for Media Resources. To include the YouTube module in your project, add the following dependency to your build file:
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-youtube</artifactId>
    <version>3.3.0</version>
</dependency>
The library will automatically register data providers for videos, playlists, and channels and add default endpoint configurations for the most common YouTube URLs (http://www.youtube.com/..., http://gdata.youtube.com/..., http://youtu.be/...). This allows you to request any YouTube URL as soon as you have the library on your classpath.
Note that to avoid ambiguities, the actual video metadata will only be returned when requesting the video using the URI of the video starting with http://youtu.be/. All other URIs of the video will only return a triple linking to the youtu.be video with a foaf:primaryTopic relation.
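The redirection to the youtu.be URI can be sketched as follows. This is only an illustration of the idea, not the module's code: the class and method names are hypothetical, and the sketch assumes that a non-canonical video URI has the form `http://www.youtube.com/watch?v=<id>`, which is just one of the URL shapes the module may handle.

```java
import java.net.URI;

final class YouTubeUriSketch {

    static final String FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic";

    /**
     * Returns the youtu.be URI for a video URI, or null if the URI is not
     * recognised. Assumption: only the "watch?v=<id>" form is handled here.
     */
    static String canonicalVideoUri(String uri) {
        URI parsed = URI.create(uri);
        String host = parsed.getHost();
        if ("youtu.be".equals(host)) {
            return uri; // already the canonical form
        }
        if ("www.youtube.com".equals(host)
                && "/watch".equals(parsed.getPath())
                && parsed.getQuery() != null) {
            for (String param : parsed.getQuery().split("&")) {
                if (param.startsWith("v=") && param.length() > 2) {
                    return "http://youtu.be/" + param.substring(2);
                }
            }
        }
        return null;
    }

    /**
     * Returns {subject, predicate, object} linking a non-canonical URI to the
     * youtu.be URI, or null if the URI is canonical or not recognised.
     */
    static String[] linkingTriple(String uri) {
        String canonical = canonicalVideoUri(uri);
        if (canonical == null || canonical.equals(uri)) {
            return null;
        }
        return new String[] { uri, FOAF_PRIMARY_TOPIC, canonical };
    }

    public static void main(String[] args) {
        // "VIDEO_ID" is a placeholder, not a real video identifier.
        String[] triple = linkingTriple("http://www.youtube.com/watch?v=VIDEO_ID");
        System.out.println(triple[0] + " " + triple[1] + " " + triple[2]);
    }
}
```

A client that wants the full video metadata would then request the object of that triple in a second step.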
Using the Ontology for Media Resources, the module will create the following triples when requesting a video resource:
Similarly, channels and playlists are represented using the Media Collections offered by the Ontology for Media Resources:
Similar to the YouTube module, the Vimeo module allows you to access Vimeo videos and groups (including channels) as if they were Linked Data resources. When requesting metadata about a resource, it redirects the request to the Vimeo API, processes the result, and maps the proprietary properties to RDF using the Ontology for Media Resources. It is thus possible to access Vimeo and YouTube videos in the same way. To use the Vimeo module in your project, add the following dependency to your build:
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-vimeo</artifactId>
    <version>3.3.0</version>
</dependency>
The library will automatically register endpoints with default configurations for Vimeo videos, groups and channels accessible via the main Vimeo website (http://vimeo.com). With this configuration you can directly retrieve metadata about Vimeo resources using the LDClient; no further configuration is needed.
Using the Ontology for Media Resources, the module will create the following triples when requesting a video resource. Note that the Vimeo API provides significantly fewer details than the YouTube API:
Similarly, groups and channels are represented using the Media Collections offered by the Ontology for Media Resources:
The MediaWiki module allows accessing content and metadata of wiki articles managed by a MediaWiki system (the most prominent being Wikipedia). When requesting a resource that represents a wiki article, this module will instead retrieve the article from the MediaWiki API offered by the wiki installation and provide access to many metadata properties otherwise not accessible. To use the MediaWiki module in your own project, add the following dependency to your project build:
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-mediawiki</artifactId>
    <version>3.3.0</version>
</dependency>
The module auto-registers endpoints that map requests to all language versions of Wikipedia to the respective API service endpoint, so if you want to request only Wikipedia articles, no further action is needed. For any other MediaWiki installation, it is necessary to create an endpoint configuration and add it to the LDClient instance you are using (see the WikipediaPageEndpoint source code as an example).
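As a rough illustration of what mapping an article URL to "the respective API service endpoint" can look like, the sketch below rewrites a Wikipedia article URL into a MediaWiki API query URL for the same language version. The class and method names are hypothetical, and the query parameters chosen here (`action=query`, `format=xml`, `titles=...`) are an assumption about a plausible request, not necessarily the ones the module sends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WikipediaApiUrlSketch {

    // Captures the scheme, the language subdomain and the (already URL-encoded) page title.
    private static final Pattern ARTICLE =
        Pattern.compile("^(https?)://([a-z\\-]+)\\.wikipedia\\.org/wiki/(.+)$");

    /**
     * Maps an article URL to an API query URL on the same language version,
     * or returns null if the URL is not a Wikipedia article URL.
     */
    static String apiUrl(String articleUrl) {
        Matcher m = ARTICLE.matcher(articleUrl);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + "://" + m.group(2) + ".wikipedia.org/w/api.php"
            + "?action=query&format=xml&titles=" + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(apiUrl("http://en.wikipedia.org/wiki/Salzburg"));
        System.out.println(apiUrl("http://de.wikipedia.org/wiki/Salzburg"));
    }
}
```

For a MediaWiki installation other than Wikipedia, the host and the path to `api.php` differ per site, which is why a dedicated endpoint configuration is needed there.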
The MediaWiki provider creates triples for a wiki page using SIOC and Dublin Core as follows:
The Facebook module wraps the Facebook Graph API as RDF triples using the Schema.org and DC-Terms vocabularies. The module registers an endpoint that handles all URIs starting with http://graph.facebook.com and http://www.facebook.com. To use it in your project build, add the following dependency:
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-facebook</artifactId>
    <version>3.3.0</version>
</dependency>
The Facebook module tries hard to map Facebook categories to Schema.org types. Unfortunately, this is not always possible, because the categories used by Facebook are not completely documented. Facebook objects have at least the following properties:
The PHPBB module tries to parse the HTML pages generated by a PHPBB discussion forum and extract posts and threads from the content. Since PHPBB does not offer an RSS feed or service API, this extraction is unreliable at best, because it depends heavily on the layout and theming of the page. We tried to make it as generic as possible, though. If you want to try out the module, add the following dependency to your project build:
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-phpbb</artifactId>
    <version>3.3.0</version>
</dependency>
The module does not auto-register any endpoints, but it offers simplified endpoint classes that allow a more convenient configuration of endpoints. If you want to configure all endpoints for a PHPBB site, simply use
ClientConfiguration config = new ClientConfiguration();
config.getEndpoints().addAll(
    PHPBBEndpoints.getEndpoints("http://www.carving-ski.de/phpBB/", "Carving Ski Forum")
);
LDClient ldclient = new LDClient(config);
where the first argument is the URL of the PHPBB installation and the second argument gives a human-readable name to the endpoint configuration. The method returns a set of endpoints that can be directly added to an LDClient instance.
The PHPBB module will map information from posts to RDF using the SIOC and Dublin Core vocabulary as follows:
A topic or thread in a PHPBB forum will be mapped to RDF as follows:
Finally, a forum in a PHPBB installation will be mapped to RDF as follows:
The LDAP module allows accessing an LDAP directory to get information about users. LDAP properties will be mapped to the FOAF vocabulary. Endpoint configurations need to provide the two properties “loginDN” and “loginPW” to support LDAP directories with access control. To use the LDAP module in your project, add the following dependency to your project build:
<dependency>
    <groupId>org.apache.marmotta</groupId>
    <artifactId>ldclient-provider-ldap</artifactId>
    <version>3.3.0</version>
</dependency>
The module does not auto-register any endpoints. You need to create an endpoint configuration for each LDAP directory you want to access. In most cases the LDAP directory will require a login to access the user data. The credentials can be configured using the “loginDN” and “loginPW” properties:
Endpoint endpoint = new Endpoint(
    "mydirectory",
    "LdapFoafProvider",
    "ldap://mydirectory.com:389/SRFG/USERS/.*",
    "ldap://mydirectory.com:389/dc=salzburgresearch,dc=at",
    null,
    86400L);
endpoint.setProperty("loginDN", "login");
endpoint.setProperty("loginPW", "password");

ClientConfiguration config = new ClientConfiguration();
config.addEndpoint(endpoint);
LDClient ldclient = new LDClient(config);
The LDAP module will map LDAP properties to FOAF properties as follows: