Create CKAN dataset using CKAN API and Python Requests library

I am using CKAN version 2.2 and am trying to automate dataset creation and resource upload. I am unable to create a dataset using the Python requests library: the request returns a 400 error code. Code:

import requests, json

dataset_dict = {
    'name': 'testdataset',
    'notes': 'A long description of my dataset',
}

d_url = 'https://mywebsite.ca/api/action/package_create'
auth = {'Authorization': 'myKeyHere'}
f = [('upload', file('PathToMyFile'))]

r = requests.post(d_url, data=dataset_dict, headers=auth)

Strangely I am able to create a new resource and upload a file using the python requests library. The code is based on this documentation. Code:

import requests, json

res_dict = {
    'package_id':'testpackage',
    'name': 'testresource',
    'description': 'A long description of my resource!',
    'format':'CSV'
}

res_url = 'https://mywebsite.ca/api/action/resource_create'
auth = {'Authorization': 'myKey'}
f = [('upload', file('pathToMyFile'))]

r = requests.post(res_url, data=res_dict, headers=auth, files=f)

I am also able to create a dataset using the method from the CKAN documentation, which relies only on built-in Python libraries. Documentation: CKAN 2.2

Code:

#!/usr/bin/env python
import urllib2
import urllib
import json
import pprint

# Put the details of the dataset we're going to create into a dict.
dataset_dict = {
    'name': 'test1',
    'notes': 'A long description of my dataset',
}

# Use the json module to dump the dictionary to a string for posting.
data_string = urllib.quote(json.dumps(dataset_dict))

# We'll use the package_create function to create a new dataset.
request = urllib2.Request('https://myserver.ca/api/action/package_create')

# Creating a dataset requires an authorization header.
request.add_header('Authorization', 'myKey')

# Make the HTTP request.
response = urllib2.urlopen(request, data_string)
assert response.code == 200

# Use the json module to load CKAN's response into a dictionary.
response_dict = json.loads(response.read())
assert response_dict['success'] is True

# package_create returns the created package as its result.
created_package = response_dict['result']
pprint.pprint(created_package)

I am not really sure why my method of creating the dataset is not working. The documentation for package_create and resource_create functions is very similar and I would expect to be able to use the same technique. I would prefer to use the requests package for all my dealings with CKAN. Has anyone been able to create a dataset with the requests library successfully?

Any help is greatly appreciated.

NenadK asked Jul 08 '14 22:07


2 Answers

I finally came back to this and figured it out. Alice's suggestion to check the encoding was very close. While requests does do the encoding for you, it also decides on its own which type of encoding is appropriate depending on the inputs. If a file is passed in along with a dictionary, requests automatically uses multipart/form-data encoding, which CKAN accepts, so the request succeeds.

However, if we pass only a dictionary, requests form-encodes it into key=value pairs. For requests without files, CKAN instead needs the body to be a single JSON string that has been URL-encoded (application/x-www-form-urlencoded), as in the urllib2 example from the documentation. To prevent requests from doing any encoding of its own, we can pass our parameters in as a string; requests then sends that string unchanged as the POST body. This means we have to URL-encode it ourselves.
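To make the difference between the two bodies concrete, here is a small sketch using only the standard library. It is written for Python 3 (the question's code is Python 2), and `urlencode` stands in for the form encoding that requests applies to a dict, which is an assumption about equivalence for simple flat dicts like this one.

```python
import json
from urllib.parse import quote, unquote, urlencode

dataset_dict = {
    'name': 'testdataset',
    'notes': 'A long description of my dataset',
}

# Form encoding of a dict: one key=value pair per field, joined with '&'.
form_body = urlencode(dataset_dict)

# The urllib2 example's approach: the whole dict as one URL-quoted JSON string.
json_body = quote(json.dumps(dataset_dict))

print(form_body)
print(json_body)

# Unquoting the second body gives back parseable JSON; the first body is not JSON.
assert json.loads(unquote(json_body)) == dataset_dict
```

The first body is a list of form fields, while the second is a single quoted JSON document, which is the shape the working urllib2 example sends.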

Therefore, if I specify the content type, convert the parameters to a JSON string, URL-encode that string with urllib, and pass the result to requests:

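# head already holds the Authorization header; in_dict holds the dataset fields.
head['Content-Type'] = 'application/x-www-form-urlencoded'
data_string = urllib.quote(json.dumps(in_dict))  # Python 2; urllib.parse.quote in Python 3
r = requests.post(url, data=data_string, headers=head)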

Then the request is successful.
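Put together, the approach might look like the sketch below. It is written for Python 3, and the helper names `encode_ckan_payload` and `call_action` are hypothetical, introduced only for illustration. The site URL and key are the placeholders from the question, and the behaviour against CKAN 2.2 is as reported above, not something this sketch verifies.

```python
import json
from urllib.parse import quote


def encode_ckan_payload(params):
    """Return the parameter dict as a single URL-quoted JSON string."""
    return quote(json.dumps(params))


def call_action(base_url, action, params, api_key):
    """POST params to a CKAN action URL and return the parsed JSON reply."""
    # Imported here so encode_ckan_payload can be used without requests installed.
    import requests

    headers = {
        'Authorization': api_key,
        'Content-Type': 'application/x-www-form-urlencoded',
    }
    url = '{}/api/action/{}'.format(base_url.rstrip('/'), action)
    # data is a str, so requests sends it unchanged instead of form-encoding it.
    response = requests.post(url, data=encode_ckan_payload(params), headers=headers)
    response.raise_for_status()
    return response.json()
```

Under these assumptions, `call_action('https://mywebsite.ca', 'package_create', dataset_dict, 'myKeyHere')` would send the same kind of body as the urllib2 example, and the created package would be under the `'result'` key of the returned dictionary.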

NenadK answered Nov 15 '22 10:11


The data you send must be JSON encoded.

From the documentation (the page you linked to):

To call the CKAN API, post a JSON dictionary in an HTTP POST request to one of CKAN’s API URLs.

In the urllib example this is performed by the following line of code:

data_string = urllib.quote(json.dumps(dataset_dict))

I think (though you should check) that the requests library will do the quoting for you - so you just need to convert your dict to JSON. Something like this should work:

r = requests.post(d_url, data=json.dumps(dataset_dict), headers=auth)

Alice Heaton answered Nov 15 '22 10:11