
How to get only the most recent value from a Wikidata property?

Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property of each country holds a list of data points recorded at different points in time, but I only care about the most recent one. This query does not work because it returns multiple rows for each country (one per Human Development Index data point):

SELECT
?country 
?countryLabel 
?hdi_value
?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL {
    ?country p:P1081 ?hdi_statement.
    ?hdi_statement ps:P1081 ?hdi_value.
    ?hdi_statement pq:P585 ?hdi_date.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}


I'm aware of GROUP BY with GROUP_CONCAT, but that still gives me every value (just concatenated into one row per country) when I'd prefer to have only one. GROUP BY with SAMPLE will not work either, since SAMPLE is not guaranteed to pick the most recent value.
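GROUP_CONCAT and SAMPLE both aggregate the values themselves, but what is wanted here is a group-wise maximum on the date, followed by the value that belongs to that date. The Python sketch below models that idea in memory, assuming rows shaped like the output of the query above; the country IDs, values and dates are hypothetical, not real Wikidata data. It computes MAX(date) per country and joins that back to the rows, which has the same shape as a SPARQL subquery with GROUP BY ?country and MAX(?hdi_date). This is one possible approach, named here as an alternative, not the technique used in the answer below.

```python
from datetime import date

# Hypothetical (country, hdi_value, hdi_date) rows, shaped like the output
# of the query above: one row per HDI statement, so countries repeat.
HDI_ROWS = [
    ("Q_A", 0.70, date(2015, 1, 1)),
    ("Q_A", 0.72, date(2016, 1, 1)),
    ("Q_B", 0.85, date(2014, 1, 1)),
    ("Q_B", 0.86, date(2016, 1, 1)),
    ("Q_C", 0.50, date(2013, 1, 1)),
]


def latest_by_group_max(rows):
    """Keep each row whose date equals its country's maximum date.

    Step 1 mirrors GROUP BY ?country with MAX(?hdi_date).
    Step 2 mirrors joining that aggregate back to the original rows.
    If two statements share the maximum date, both rows are kept.
    """
    max_date = {}
    for country, _value, d in rows:
        if country not in max_date or d > max_date[country]:
            max_date[country] = d
    return [row for row in rows if row[2] == max_date[row[0]]]


for row in latest_by_group_max(HDI_ROWS):
    print(row)
```

Unlike SAMPLE, this is deterministic: the chosen row is always the one carrying the latest date.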

Any help or link to a relevant example query is appreciated!

P.S. Another thing I'm confused about is why population (P1082) in this query returns only one population result per country:

SELECT
?country 
?countryLabel 
?population
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1082 ?population. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

while the same query but for HDI returns multiple results per country:

SELECT
?country 
?countryLabel 
?hdi
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1081 ?hdi. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

What is different about population and HDI that causes the behavior to differ? When I view the population data for a country on Wikidata, I see multiple population values listed, but only one is returned by the query.

Asked Mar 02 '18 by Brian



1 Answer

Both your questions are duplicates, but I'll try to add interesting facts to existing answers.

Question 1 is a duplicate of "SPARQL query to get only results with the most recent date".

This technique does the trick:

FILTER NOT EXISTS {
    ?country p:P1081/pq:P585 ?hdi_date_ .
    FILTER (?hdi_date_ > ?hdi_date)
}

However, you should put this clause outside the OPTIONAL block: it does not work inside OPTIONAL (and I'm not sure whether that is a bug).
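To see what the FILTER NOT EXISTS clause computes, here is a minimal in-memory Python model of it. It is a sketch of the logic, not of how the SPARQL engine evaluates the query, and the sample rows are hypothetical. A row survives only if no other date recorded for the same country is later than the row's own date. A country with no HDI statement at all (kept by the OPTIONAL with ?hdi_date unbound) is modelled with None and always survives: under standard SPARQL 1.1 semantics, comparing against an unbound variable is an error, so the inner FILTER rejects every candidate and nothing "later" exists.

```python
from datetime import date

# Hypothetical rows as produced by the OPTIONAL pattern; None stands for
# an unbound ?hdi_date (a country with no HDI statement at all).
OPTIONAL_ROWS = [
    ("Q_A", 0.70, date(2015, 1, 1)),
    ("Q_A", 0.72, date(2016, 1, 1)),
    ("Q_B", 0.85, date(2014, 1, 1)),
    ("Q_D", None, None),
]


def latest_by_not_exists(rows):
    """Model FILTER NOT EXISTS { ?country ... ?d_ . FILTER (?d_ > ?d) }."""
    # Every recorded date per country, like ?country p:P1081/pq:P585 ?hdi_date_.
    dates = {}
    for country, _value, d in rows:
        if d is not None:
            dates.setdefault(country, []).append(d)

    kept = []
    for country, value, d in rows:
        # Unbound date: the inner comparison errors, so the row is kept.
        # Bound date: keep the row only if no later date exists for the country.
        if d is None or not any(other > d for other in dates[country]):
            kept.append((country, value, d))
    return kept


for row in latest_by_not_exists(OPTIONAL_ROWS):
    print(row)
```

It yields one row per country with HDI data (the latest one) and still keeps Q_D, which is why countries without an HDI value do not disappear when the clause is placed outside OPTIONAL.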


Question 2 is a duplicate of "Some cities aren't instances of city or big city?".

You can't get all values with wdt: predicates, because the "missing" statements are not truthy: they are normal-rank statements, while the same item also has a preferred-rank statement for that property.

Truthy statements are the statements that have the best non-deprecated rank for a given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 are considered truthy. Otherwise, all normal-rank statements are considered truthy.
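The ranking rule above can be modelled in a few lines of Python. This is a minimal sketch of the rule as stated, not Wikidata's actual implementation; the function name `truthy` and the statement values and ranks are hypothetical. For one item and one property, it returns what a wdt: predicate would expose: the preferred statements if any exist, otherwise the normal-rank ones, and never deprecated ones.

```python
# Wikibase's three statement ranks; the statements below are hypothetical.
PREFERRED, NORMAL, DEPRECATED = "preferred", "normal", "deprecated"


def truthy(statements):
    """Return the values a wdt: predicate would expose for one property.

    `statements` is a list of (value, rank) pairs for a single item and
    property. Only the best non-deprecated rank present counts as truthy.
    """
    for best in (PREFERRED, NORMAL):
        selected = [value for value, rank in statements if rank == best]
        if selected:
            return selected
    return []  # only deprecated statements (or none): nothing is truthy


# One value marked preferred: wdt: yields a single row, as with population.
with_preferred = [(100, NORMAL), (110, NORMAL), (120, PREFERRED)]
# No preferred statement: every normal-rank value is truthy (several rows).
without_preferred = [(0.70, NORMAL), (0.72, NORMAL), (0.60, DEPRECATED)]

print(truthy(with_preferred))     # [120]
print(truthy(without_preferred))  # [0.7, 0.72]
```

The p:/ps: path used in the first query bypasses this rule and returns every statement regardless of rank, which is why it shows all HDI data points.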

The reason why P1082 (population) always has a preferred statement is that this property is processed by PreferentialBot.

Answered Oct 02 '22 by Stanislav Kralin