Wednesday, July 03, 2013

The internet of things and geo services



Today I visited the concluding conference of a linked open data pilot in the Netherlands (http://pilod.nl). In this blog some thoughts on the matter from a (spatial) catalogue perspective. In the last years a lot of energy has been put in publishing open datasets on the web using open geo standards like CSW/WFS (triggered by regulations like Inspire, basis registraties etc). Unfortunately, due to their nature, geo-services can hardly be used in the linked open data web. As geo community I think we should investigate how to support the internet of thinges from our existing services. It's just a shame if they are left out. Or linked data might even bridge the gap that quite some experience between the geo community and regular ICT

CSW/WFS to RDF
Most geo data is tagged with metadata these days. In the metadata references are made to contact point info and keywords from skos thesauri (like gemet). If we'd be able to present this metadata as RDF along with the spatial data itself, the data will link with exisiting resources in the internet of things instantly. Geonetwork does have a basic RDF-endpoint-implementation these days (/srv/eng/rdf.search, funded by EEA), the actual challenge is in creating rdf output from the WFS services (and linked shapefiles) out there and link to that from the rdf-metadata. A potential solution would be to extend products like geoserver and mapserver, which do have support for a range of output formats for WFS (like GML,KML,SHP,CSV,JSON), with an RDF output format. From the RDF metadata one could then link to a WFS-getfeatures request in format application/rdf+xml. Spiders would be able to index the data and use it as linked data to support GeoSparql queries.

Implementing rdf support can be quite easy if it would be a pure xslt transformation (gml to rdf). However RDF is far more usefull if additional items are added to each getfeature-response-document. Items like contact point and keywords (as defined in the service definition) link the record to other resources on the web.

iso19110
An OGC practice described in iso19110 (http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=39965) but unfortunately not commonly implemented is feature catalog metadata (for example http://sdi.eea.europa.eu/catalogue/srv/eng/search?uuid=99b05c69-1532-48b8-bfeb-4cb0df539c88). This could become an essential link in creating RDF from WFS/CSW. ISO19110 describes the attributes used in a dataset. From this metadata references can be made to common (linked data) vocabularies, which then (hopefully) can be used to convert the GML to optimally linked RDF.

Fixed location
The internet of things is based on the fact that each resource on the web has a unique location (url), which will not change over time. In gis servers a record or feature is accessed via a getRecord/getFeature request. One can imagine a scenario that you use this type of requests as unique url for the resource. However, this url is actually not that fixed. For example the order of attributes in the request can change, it can be a post request and the format parameter is explicitely defined in the url (in stead of as an accept header, generally used in open data content negotiation). But more challanging is the version parameter of the standard, after a while a new version of the catalogue standard will become available, and the old standard might get deprecated. The url of the request will change accordingly. This could be managed by adding additional links in the referring document. A link could be added for each supported version, format, language, spatial reference system etc...

Geonetwork as a gateway
Considering above drawbacks it might be an idea to extend the functionality of Geonetwork to act as a gateway between geo and rdf. Geonetwork has all of the required iso19115/iso19110 and links to the actual downloads and services available. It could (daily/hourly) harvest all of that data and convert it to RDF quite optimally. Some additional advantages: a search query in geonetwork will include a search into the actual data and it would solve the uri-strategy for the geo domain instantly: All dutch geodata (PDOK) will for example be available at:
nationaalgeoregister.nl/geonetwork/dataset/{namespace}:{uuid}/record/{id}
For sure some exceptions should be made for frequently changing data like sensor services, and the processing should be delegated to other nodes in the cluster.

For sure above approach will have legal limitations (who is responsible for the data/service). In the long run each organisation will need an RDF endpoint (and register that endpoint in the iso19139 metadata?). But my proposal can offer a temporary solution for organisations which are not ready yet to do the full RDF implementation.

Read more
A search on the web learned there are plenty of initiatives in this area, most important the FP7 Geoknow project (http://geoknow.eu). I'm curious for their results.
Also check "Opportunities and Challenges for using  Linked Data in INSPIRE" by Sven Schade and Michael Lutz http://ceur-ws.org/Vol-691/paper5.pdf


No comments: