
Downloading GeoJSON boundaries using SPARQL from publicly available data

I'm interested in downloading some boundary files from statistics.gov.scot, an official repository for sharing statistical data that can be queried with SPARQL.

Background

Statistics.gov.scot provides access to GeoJSON boundaries for a number of administrative and statistical geographies, such as local authority administrative boundaries or health boards. In my case, I'm interested in downloading a data set with the GeoJSON boundaries of data zones. Data zones are statistical geographies developed for disseminating life outcomes data at a small-area level. When accessed via statistics.gov.scot, a sample data zone looks like this:

[Image: sample data zone]

The geography and the related data can be accessed here. The corresponding GeoJSON data is available here.

Problem

Data zones are available in two iterations: one produced in 2004 and another updated recently. I would like to download the first iteration, produced in 2004. Following the information on the statistical entities, I drafted the following query:

PREFIX entity: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX boundaries: <http://statistics.gov.scot/boundaries/>

SELECT ?boundary 
    WHERE {
        entity:introduced <http://reference.data.gov.uk/id/day/2004-02-01>
  }

LIMIT 1000

which returns the following error message:

Error There was a syntax error in your query: Encountered " "}" "} "" at line 7,
column 3. Was expecting one of: <IRIref> ... <PNAME_NS> ... <PNAME_LN> ...
<BLANK_NODE_LABEL> ... <VAR1> ... <VAR2> ... "true" ... "false" ... <INTEGER> ...
<DECIMAL> ... <DOUBLE> ... <INTEGER_POSITIVE> ... <DECIMAL_POSITIVE> ...
<DOUBLE_POSITIVE> ... <INTEGER_NEGATIVE> ... <DECIMAL_NEGATIVE> ...
<DOUBLE_NEGATIVE> ... <STRING_LITERAL1> ... <STRING_LITERAL2> ...
<STRING_LITERAL_LONG1> ... <STRING_LITERAL_LONG2> ... "(" ... <NIL> ... "[" ...
<ANON> ... "+" ... "*" ... "/" ... "|" ... "?" ...

when tested via the endpoint: http://statistics.gov.scot/sparql.

Comments

Ideally, I would like to develop further queries that would let me source other statistical geographies using the entity: prefix. This should be possible, as the entity: vocabulary contains information on the available geographies (name, acronym, date of creation).


The query:

PREFIX entity: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX boundaries: <http://statistics.gov.scot/boundaries/>

SELECT DISTINCT ?boundary ?shape WHERE {
  ?shape entity:firstcode ?boundary
}

LIMIT 1000

gets me something that looks like a list of the desired geographies, but I'm struggling to source the GeoJSON boundaries.

asked Feb 26 '16 16:02 by Konrad


2 Answers

The first query is missing the subject. A SPARQL query defines a set of triple patterns - a subject, predicate, and object - to match an RDF graph. To turn your WHERE clause into a SPARQL triple pattern, try:

?boundary entity:introduced <http://reference.data.gov.uk/id/day/2004-02-01>
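
For completeness, the corrected query in full might look like the sketch below, wrapped in Python so that it can be sent to the endpoint mentioned in the question. The helper names build_query and run_query are hypothetical, and the code assumes the endpoint accepts the standard SPARQL protocol "query" parameter and can return application/sparql-results+json. It fixes only the syntax error; whether entity:introduced is the right property for selecting data zones is a separate matter.

```python
import json
import urllib.parse
import urllib.request

# Endpoint quoted in the question.
ENDPOINT = "http://statistics.gov.scot/sparql"


def build_query(day: str = "2004-02-01", limit: int = 1000) -> str:
    """Return the question's query with the missing ?boundary subject added.

    The unused boundaries: prefix from the original query is left out.
    """
    return (
        "PREFIX entity: <http://statistics.data.gov.uk/def/statistical-entity#>\n"
        "SELECT ?boundary\n"
        "WHERE {\n"
        f"  ?boundary entity:introduced <http://reference.data.gov.uk/id/day/{day}>\n"
        "}\n"
        f"LIMIT {limit}\n"
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send the query over HTTP GET (assumption: standard SPARQL protocol)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the query text only; call run_query(build_query()) to execute it.
    print(build_query())
```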
answered Oct 09 '22 03:10 by scotthenninger


Neither statistics.gov.scot nor statistics.data.gov.uk contains data zone boundaries as WKT or string literals.

However, the following query constructs the URLs of the GeoJSON files that are used on the resource pages:

PREFIX pref1: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX pref2: <http://statistics.gov.scot/id/statistical-entity/>
PREFIX pref3: <http://statistics.data.gov.uk/def/boundary-change/>
PREFIX pref4: <http://reference.data.gov.uk/id/day/>
PREFIX pref5: <http://statistics.data.gov.uk/def/statistical-geography#>
PREFIX pref6: <http://statistics.gov.scot/id/statistical-geography/>
PREFIX pref7: <http://statistics.gov.scot/boundaries/>

SELECT ?zone ?name ?json {
   ?zone pref1:code pref2:S01 .
   ?zone pref3:operativedate pref4:2004-02-01
   OPTIONAL { ?zone pref5:officialname ?name }
   BIND (CONCAT(REPLACE(STR(?zone), STR(pref6:), STR(pref7:)), ".json") AS ?json)
} ORDER BY (!bound(?name)) ASC(?name)

After that, the GeoJSON files can be retrieved with wget -i or a similar tool.
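
As an alternative to wget, the same steps can be sketched in Python. The function geojson_url mirrors the BIND expression in the query above (swap the statistical-geography prefix for the boundaries prefix and append ".json"), and download_all fetches a list of such URLs. Both function names and the target directory are hypothetical, and the zone code used in the usage line is made up for illustration. Unlike SPARQL's REPLACE, which matches the pattern anywhere in the string, this sketch replaces the prefix only.

```python
import urllib.request
from pathlib import Path

# Prefixes taken from pref6: and pref7: in the query above.
GEOGRAPHY_PREFIX = "http://statistics.gov.scot/id/statistical-geography/"
BOUNDARY_PREFIX = "http://statistics.gov.scot/boundaries/"


def geojson_url(zone_uri: str) -> str:
    """Mirror the query's BIND: swap the prefix and append '.json'."""
    if not zone_uri.startswith(GEOGRAPHY_PREFIX):
        raise ValueError(f"unexpected zone URI: {zone_uri}")
    return BOUNDARY_PREFIX + zone_uri[len(GEOGRAPHY_PREFIX):] + ".json"


def download_all(urls, target_dir="boundaries"):
    """Fetch each URL into target_dir, naming files after the last path segment."""
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        destination = out / url.rsplit("/", 1)[-1]
        with urllib.request.urlopen(url, timeout=30) as response:
            destination.write_bytes(response.read())
        saved.append(destination)
    return saved


# Illustrative zone code only; real codes come from the ?zone column.
print(geojson_url(GEOGRAPHY_PREFIX + "S01000001"))
```

Calling download_all on the values of the ?json column would then save one file per data zone.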

Some explanation

You should use <http://statistics.data.gov.uk/def/boundary-change/operativedate> instead of <http://statistics.data.gov.uk/def/statistical-entity#introduced>. The latter is a property of the entity class rather than of individual zones, which the following query illustrates:

SELECT * WHERE {
    ?S <http://statistics.data.gov.uk/def/statistical-entity#introduced> ?date .
    ?S <http://www.w3.org/2000/01/rdf-schema#label> ?label
}

The second-generation data zones are dated 2014-11-06:

SELECT ?date (COUNT(?zone) AS ?count) WHERE {
    ?zone
        <http://statistics.data.gov.uk/def/statistical-entity#code>
            <http://statistics.gov.scot/id/statistical-entity/S01> ;
        <http://statistics.data.gov.uk/def/boundary-change/operativedate>
            ?date 
} GROUP BY ?date

Analogously, if you need the URLs of the corresponding GeoJSON files, your query should be:

# PREFIX declarations as in the first query of this answer
SELECT ?zone ?name ?json {
   ?zone pref1:code pref2:S01 .
   ?zone pref3:operativedate pref4:2014-11-06 .
   ?zone pref5:officialname ?name 
   BIND (CONCAT(REPLACE(STR(?zone), STR(pref6:), STR(pref7:)), ".json") AS ?json)
} ORDER BY ASC(?name)

You do not need OPTIONAL, because all second-generation data zones have "official names".


This page on data.gov.uk may also be of interest.
There is also opendata.stackexchange.com for questions related to open data.

Update

As of May 2018, one can retrieve data zone boundaries as WKT:

PREFIX pref1: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX pref2: <http://statistics.gov.scot/id/statistical-entity/>
PREFIX pref3: <http://statistics.data.gov.uk/def/boundary-change/>
PREFIX pref4: <http://reference.data.gov.uk/id/day/>
PREFIX pref5: <http://statistics.data.gov.uk/def/statistical-geography#>
PREFIX pref6: <http://www.opengis.net/ont/geosparql#>


SELECT ?zone ?name ?geometry {
   ?zone pref1:code pref2:S01 .
   ?zone pref3:operativedate pref4:2014-11-06 .
   ?zone pref5:officialname ?name .
   ?zone pref6:hasGeometry/pref6:asWKT ?geometry .
} ORDER BY ASC(?name)
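
If a query like this is run programmatically and the results are requested in the W3C SPARQL JSON results format, each row arrives as a dictionary of bindings. The sketch below flattens such a response into plain variable-to-value dictionaries. The function name flatten_bindings is hypothetical, and the sample payload (zone code, name, and WKT string) is made up for illustration rather than taken from the endpoint.

```python
def flatten_bindings(results: dict) -> list:
    """Turn SPARQL JSON result bindings into a list of {variable: value} dicts."""
    rows = []
    for binding in results["results"]["bindings"]:
        # Each cell looks like {"type": ..., "value": ...}; keep only the value.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Made-up payload in the standard SPARQL JSON results layout.
sample = {
    "head": {"vars": ["zone", "name", "geometry"]},
    "results": {
        "bindings": [
            {
                "zone": {
                    "type": "uri",
                    "value": "http://statistics.gov.scot/id/statistical-geography/S01000001",
                },
                "name": {"type": "literal", "value": "Example zone"},
                "geometry": {
                    "type": "literal",
                    "value": "POLYGON((0 0, 0 1, 1 1, 0 0))",
                },
            }
        ]
    },
}

rows = flatten_bindings(sample)
print(rows[0]["name"], rows[0]["geometry"])
```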
answered Oct 09 '22 02:10 by Stanislav Kralin