I want to import OData XML datafeeds from the Dutch Bureau of Statistics (CBS) into our database. Using lxml and pandas, I thought this would be straightforward. By using OrderedDict I want to preserve the order of the columns for readability, but somehow I can't get it right.
    from collections import OrderedDict
    from lxml import etree
    import requests
    import pandas as pd

    # CBS URLs
    base_url = 'http://opendata.cbs.nl/ODataFeed/odata'
    datasets = ['/37296ned', '/82245NED']

    feed = requests.get(base_url + datasets[1] + '/TypedDataSet')
    root = etree.fromstring(feed.content)

    # all record entries start at tag m:properties, parse into data dict
    data = []
    for record in root.iter('{{{}}}properties'.format(root.nsmap['m'])):
        row = OrderedDict()
        for element in record:
            row[element.tag.split('}')[1]] = element.text
        data.append(row)

    df = pd.DataFrame.from_dict(data)
    df.columns
Inspecting data, each OrderedDict is in the right order. But looking at df.head(), the columns have been sorted alphabetically, with capitals first?
Help, anyone?
In the pandas version used here, DataFrame does not preserve the column order when it is built from a list of dicts.
Something in your example seems to be inconsistent, as data is a list and not a dict, but assuming you really have an OrderedDict:
Try to explicitly specify your column order when you create your DataFrame:
    # ... all your data collection
    df = pd.DataFrame(data, columns=data.keys())
This should give you a DataFrame with the columns ordered exactly the way they are in the OrderedDict (via the list generated by data.keys()).
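If data really is a list of OrderedDicts, as in the question, the list itself has no keys() method, so the column order has to come from the rows. A minimal sketch of that variant follows, assuming every key of interest appears in at least one row; the sample rows and their field names are made up for illustration and are not taken from the CBS feed.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical rows standing in for the parsed m:properties entries
data = [
    OrderedDict([("Perioden", "2010JJ00"), ("ID", "0"), ("aantal_1", "12")]),
    OrderedDict([("Perioden", "2011JJ00"), ("ID", "1"), ("aantal_1", "15")]),
]

# Collect the keys in first-seen order across all rows;
# OrderedDict.fromkeys drops duplicates while keeping that order
columns = list(OrderedDict.fromkeys(key for row in data for key in row))

# Passing columns explicitly fixes the column order of the DataFrame
df = pd.DataFrame(data, columns=columns)
print(list(df.columns))
```

Collecting keys across all rows rather than only from data[0] also covers the case where a later record carries a field that the first one lacks.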