I am using CKAN version 2.2 and am trying to automate dataset creation and resource upload. I seem to be unable to create a dataset using the Python requests library; I get a 400 error code. Code:
import requests, json
dataset_dict = {
'name': 'testdataset',
'notes': 'A long description of my dataset',
}
d_url = 'https://mywebsite.ca/api/action/package_create'
auth = {'Authorization': 'myKeyHere'}
f = [('upload', open('PathToMyFile', 'rb'))]  # note: f is never passed below, so no file is sent
r = requests.post(d_url, data=dataset_dict, headers=auth)
Strangely, I am able to create a new resource and upload a file using the Python requests library. The code is based on the CKAN documentation. Code:
import requests, json
res_dict = {
'package_id':'testpackage',
'name': 'testresource',
'description': 'A long description of my resource!',
'format':'CSV'
}
res_url = 'https://mywebsite.ca/api/action/resource_create'
auth = {'Authorization': 'myKey'}
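f = [('upload', open('pathToMyFile', 'rb'))]  # open in binary mode for upload
# passing files= makes requests encode the request as multipart/form-data
r = requests.post(res_url, data=res_dict, headers=auth, files=f)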
I am also able to create a dataset with the method from the CKAN 2.2 documentation, which uses only Python's built-in libraries. Code:
#!/usr/bin/env python
import urllib2
import urllib
import json
import pprint
# Put the details of the dataset we're going to create into a dict.
dataset_dict = {
'name': 'test1',
'notes': 'A long description of my dataset',
}
# Use the json module to dump the dictionary to a string for posting.
data_string = urllib.quote(json.dumps(dataset_dict))
# We'll use the package_create function to create a new dataset.
request = urllib2.Request('https://myserver.ca/api/action/package_create')
# Creating a dataset requires an authorization header.
request.add_header('Authorization', 'myKey')
# Make the HTTP request.
response = urllib2.urlopen(request, data_string)
assert response.code == 200
# Use the json module to load CKAN's response into a dictionary.
response_dict = json.loads(response.read())
assert response_dict['success'] is True
# package_create returns the created package as its result.
created_package = response_dict['result']
pprint.pprint(created_package)
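For anyone on Python 3, where urllib2 was split into urllib.request and urllib.quote moved to urllib.parse.quote, a roughly equivalent sketch of the documentation example follows. The server URL and key are the same placeholders as above, build_package_create_request is a hypothetical helper name, and the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_package_create_request(base_url, api_key, dataset_dict):
    """Build (but do not send) a package_create POST like the urllib2 example.

    build_package_create_request is a hypothetical helper, not a CKAN API.
    """
    # JSON-serialize the dict, then URL-quote it, as the documentation example does.
    data_string = urllib.parse.quote(json.dumps(dataset_dict))
    return urllib.request.Request(
        base_url.rstrip('/') + '/api/action/package_create',
        data=data_string.encode('utf-8'),  # urllib.request expects a bytes body
        headers={'Authorization': api_key},
    )


if __name__ == '__main__':
    req = build_package_create_request(
        'https://myserver.ca', 'myKey',
        {'name': 'test1', 'notes': 'A long description of my dataset'},
    )
    # To actually send it:
    # with urllib.request.urlopen(req) as response:
    #     print(json.loads(response.read())['result'])
    print(req.get_method(), req.full_url)
```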
I am not really sure why my method of creating the dataset is not working. The documentation for the package_create and resource_create functions is very similar, so I would expect to be able to use the same technique. I would prefer to use the requests package for all my dealings with CKAN. Has anyone successfully created a dataset with the requests library?
Any help is greatly appreciated.
I finally came back to this and figured it out. Alice's suggestion to check the encoding was very close. While requests does the encoding for you, it also decides on its own which type of encoding is appropriate based on the inputs. If a file is passed in along with a dictionary of fields, requests automatically uses multipart/form-data encoding, which CKAN accepts, so the request succeeds.
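The encoding choice described above can be observed without contacting a server by preparing (not sending) requests with requests.Request(...).prepare() and inspecting the headers and body. This is a sketch assuming a reasonably recent requests release; the URL and the in-memory CSV content are placeholders.

```python
import requests

URL = 'https://mywebsite.ca/api/action/package_create'  # placeholder
fields = {'name': 'testdataset', 'notes': 'A long description of my dataset'}

# Dict only: requests form-encodes it as key=value pairs.
form_only = requests.Request('POST', URL, data=fields).prepare()

# Dict plus a file: requests switches to multipart/form-data.
with_file = requests.Request(
    'POST', URL, data=fields,
    files=[('upload', ('test.csv', b'a,b\n1,2\n'))],  # in-memory placeholder file
).prepare()

print(form_only.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_only.body)                     # name=testdataset&notes=A+long+description+of+my+dataset
print(with_file.headers['Content-Type'])  # multipart/form-data; boundary=...
```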
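However, if we pass only a dictionary, requests form-encodes it (application/x-www-form-urlencoded) as key=value pairs rather than as JSON. For a request without files, CKAN expects the body to be the JSON dictionary itself; with the application/x-www-form-urlencoded content type, that JSON string must be URL encoded. To prevent requests from doing any encoding of its own, we can pass our parameters in as a string; requests then sends the string unchanged as the POST body. This means we have to serialize and URL encode it ourselves.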
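To see why the form-encoded dict is rejected while the quoted JSON string is accepted, here is a simplified model of the server side. It assumes that, for a form-encoded POST, CKAN 2.2 URL-decodes the whole body and then parses it as JSON; this is an approximation of its behaviour, not CKAN's actual code, and ckan_like_parse is a hypothetical name.

```python
import json
from urllib.parse import quote, unquote_plus, urlencode


def ckan_like_parse(body):
    """Hypothetical, simplified model of how CKAN 2.2 reads a form-encoded
    action API body: URL-decode the whole body, then parse it as JSON."""
    return json.loads(unquote_plus(body))


dataset_dict = {'name': 'testdataset', 'notes': 'A long description of my dataset'}

# What requests sends for data=dataset_dict: key=value pairs, not JSON.
form_body = urlencode(dataset_dict)

# What the fix sends: the dict serialized to JSON, then URL-quoted.
quoted_json_body = quote(json.dumps(dataset_dict))

try:
    ckan_like_parse(form_body)
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    print('form-encoded body rejected:', exc)

print(ckan_like_parse(quoted_json_body))
```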
Therefore, if I set the content type, serialize the parameters to a JSON string, URL encode it with urllib, and pass the resulting string to requests:
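import json, urllib, requests
# head holds the Authorization header; url and in_dict are the endpoint and dataset dict
head['Content-Type'] = 'application/x-www-form-urlencoded'
# Serialize to JSON, then URL-quote; requests sends a str body unchanged
in_dict = urllib.quote(json.dumps(in_dict))
r = requests.post(url, data=in_dict, headers=head)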
Then the request is successful.
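On Python 3, urllib.quote lives at urllib.parse.quote. A sketch of the same fix wrapped in a small helper, assuming Python 3 and the requests library; build_package_create is a hypothetical name, and the request is prepared rather than sent so it can be inspected first.

```python
import json
from urllib.parse import quote

import requests


def build_package_create(url, api_key, dataset_dict):
    """Prepare a package_create POST whose body is URL-quoted JSON.

    Hypothetical helper; send the result with requests.Session().send(...).
    """
    headers = {
        'Authorization': api_key,
        # Set explicitly as above; requests adds no Content-Type for a str body.
        'Content-Type': 'application/x-www-form-urlencoded',
    }
    # Passing a str (not a dict) stops requests from form-encoding the data.
    body = quote(json.dumps(dataset_dict))
    return requests.Request('POST', url, data=body, headers=headers).prepare()


if __name__ == '__main__':
    prepared = build_package_create(
        'https://mywebsite.ca/api/action/package_create', 'myKeyHere',
        {'name': 'testdataset', 'notes': 'A long description of my dataset'},
    )
    # with requests.Session() as s:
    #     r = s.send(prepared)
    print(prepared.body)
```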
The data you send must be JSON encoded.
From the documentation (the page you linked to):
To call the CKAN API, post a JSON dictionary in an HTTP POST request to one of CKAN’s API URLs.
In the urllib example this is performed by the following line of code:
data_string = urllib.quote(json.dumps(dataset_dict))
I think (though you should check) that the requests library will do the quoting for you, so you just need to convert your dict to JSON. Something like this should work:
r = requests.post(d_url, data=json.dumps(dataset_dict), headers=auth)
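Newer releases of requests (2.4.2 onward) also accept a json= argument, which serializes the dict and sets Content-Type: application/json. Whether a CKAN 2.2 server accepts that content type is an assumption worth checking against your own instance, so treat this as a sketch; as in the other examples the request is prepared, not sent, and the URL and key are placeholders.

```python
import json

import requests

d_url = 'https://mywebsite.ca/api/action/package_create'  # placeholder, as in the question
auth = {'Authorization': 'myKeyHere'}
dataset_dict = {'name': 'testdataset', 'notes': 'A long description of my dataset'}

# json= serializes dataset_dict and sets Content-Type: application/json.
prepared = requests.Request('POST', d_url, json=dataset_dict, headers=auth).prepare()
print(prepared.headers['Content-Type'])  # application/json

# To actually send it, or simply: requests.post(d_url, json=dataset_dict, headers=auth)
# r = requests.Session().send(prepared)
```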