Preserving column order in Python Pandas DataFrame

Tags:

python

pandas

Is there a way to preserve the order of the columns in a csv file when it is read and then written with Python Pandas? For example, in this code

import pandas as pd

data = pd.read_csv(filename)
data.to_csv(filename)

the output file might differ from the input because the column order is not preserved.

Hernan asked Mar 27 '13 07:03


2 Answers

There appears to be a bug in the current version of Pandas ('0.11.0'), which means that Matti John's answer will not work. If you specify columns for writing to file, they are written in alphabetical order, but simply relabelled according to the list in cols. For example, this code:

import pandas

dfdict={}
dfdict["a"]=[1,2,3,4]
dfdict["b"]=[5,6,7,8]
dfdict["c"]=[9,10,11,12]
df=pandas.DataFrame(dfdict)
df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"])

results in this (incorrect) output:

    b   a   c
0   1   5   9
1   2   6   10
2   3   7   11
3   4   8   12

You can check which version of pandas you have installed by executing:

pandas.__version__

Documentation for to_csv is here

Actually, it seems that this is a known bug and will be fixed in an upcoming release (0.11.1):

https://github.com/pydata/pandas/issues/3489

UPDATE: There still hasn't been a new release of pandas, but there is a workaround described here, which doesn't require using a different version of pandas:

github.com/pydata/pandas/issues/3454

So changing the last line in the block of code above to the following will work correctly:

df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python') 

UPDATE: It seems that the argument "cols" has been renamed to "columns" and that the argument "engine" is deprecated (no longer available) in recent versions of pandas. Also, this bug is fixed in version 0.19.0.
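
With those renames in mind, the example above might look roughly like this on a recent pandas release. This is only a sketch: it assumes a version where to_csv accepts sep and columns as keyword arguments, and it writes to an in-memory buffer instead of "dfTest.txt" (an assumption made purely so the result can be inspected).

```python
import io

import pandas as pd

# Same data as in the example above.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# In-memory buffer used for illustration; a file path works the same way.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", header=True, columns=["b", "a", "c"])

# Split the tab-separated text into rows of fields.
rows = [line.split("\t") for line in buffer.getvalue().splitlines()]
header, first_row = rows[0], rows[1]
print(header)     # leading empty field is the unnamed index column
print(first_row)  # index, then the values of b, a, c for the first row
```

If the columns are written correctly, the first data row should carry 5 under "b" and 1 under "a", rather than the relabelled alphabetical order shown in the incorrect output above.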

CnrL answered Sep 20 '22 18:09


The column order should generally be preserved when reading and then writing a csv file like that, but if for some reason the columns are not in the order you want, you can use the columns keyword argument in to_csv.

For example, if you have a csv with columns a, b, c, d:

data = pd.read_csv(filename)
data.to_csv(filename, columns=['a', 'b', 'c', 'd'])
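
A quick way to check both behaviours is a small round trip. The sketch below uses made-up CSV text in place of the file named by filename (the content and variable names are assumptions for illustration) and passes index=False, since the default to_csv call also writes the row index as an extra first column.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the file named by `filename`.
csv_text = "d,b,a,c\n1,2,3,4\n5,6,7,8\n"

data = pd.read_csv(io.StringIO(csv_text))

# Column order after reading follows the header order in the file.
original_order = list(data.columns)

# With no path given, to_csv returns the CSV text as a string.
# index=False keeps the row index out of the output.
same_order = data.to_csv(index=False)

# columns= selects and orders the columns explicitly.
reordered = data.to_csv(index=False, columns=["a", "b", "c", "d"])

print(original_order)
print(same_order)
print(reordered)
```

Here same_order should match the input text line for line, while reordered starts with the header a,b,c,d.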

Matti John answered Sep 19 '22 18:09