
Preserving column order in Python Pandas DataFrame

Tags: python, pandas

Is there a way to preserve the order of the columns in a csv file when reading and then writing it with Python Pandas? For example, in this code

import pandas as pd

data = pd.read_csv(filename)
data.to_csv(filename)

the output file might differ from the input because the column order is not preserved.


asked Mar 27 '13 07:03

Hernan

2 Answers

There appears to be a bug in the current version of Pandas ('0.11.0'), which means that Matti John's answer will not work. If you specify columns for writing to file, the data is still written in alphabetical column order, and only the header is relabelled according to the list in cols. For example, this code:

import pandas

dfdict = {}
dfdict["a"] = [1, 2, 3, 4]
dfdict["b"] = [5, 6, 7, 8]
dfdict["c"] = [9, 10, 11, 12]
df = pandas.DataFrame(dfdict)
df.to_csv("dfTest.txt", "\t", header=True, cols=["b", "a", "c"])

results in this (incorrect) output:

    b   a   c
0   1   5   9
1   2   6   10
2   3   7   11
3   4   8   12

You can check which version of pandas you have installed by executing:

pandas.version.version  # on recent releases: pandas.__version__

See the pandas documentation for DataFrame.to_csv.

Actually, it seems that this is a known bug and will be fixed in an upcoming release (0.11.1):

https://github.com/pydata/pandas/issues/3489

UPDATE: There still hasn't been a new release of pandas, but there is a workaround described here, which doesn't require using a different version of pandas:

github.com/pydata/pandas/issues/3454

So changing the last line in the block of code above to the following will work correctly:

df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')

UPDATE: It seems that the argument "cols" has been renamed to "columns" and that the argument "engine" is no longer available in recent versions of pandas. Also, this bug is fixed in version 0.19.0.
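For recent pandas releases, the example can be sketched roughly as follows. It assumes the renamed `columns` keyword and writes to an in-memory buffer rather than to "dfTest.txt"; the buffer and the variable names are illustrative and not part of the original answer. Selecting the columns before writing is a second option that does not rely on how `to_csv` handles the keyword.

```python
import io

import pandas as pd

# Same data as in the example from this answer.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# Option 1: ask to_csv for a specific column order ("columns", not "cols").
buf = io.StringIO()  # in-memory stand-in for "dfTest.txt"
df.to_csv(buf, sep="\t", header=True, columns=["b", "a", "c"])
via_keyword = buf.getvalue()

# Option 2: reorder the DataFrame first, then write it as it is.
buf = io.StringIO()
df[["b", "a", "c"]].to_csv(buf, sep="\t", header=True)
via_selection = buf.getvalue()

print(via_keyword)
```

On a release where the bug is fixed, both options should give the same text, with the values following their labels.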


answered Sep 20 '22 18:09

CnrL

The column order should generally be preserved when reading and then writing a csv file like that, but if for some reason the columns are not in the order you want, you can use the columns keyword argument of to_csv.

For example, if you have a csv with columns a, b, c, d:

data = pd.read_csv(filename)
data.to_csv(filename, columns=['a', 'b', 'c', 'd'])
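A minimal round-trip sketch, assuming a recent pandas release and a small made-up input held in memory instead of a real file (the sample data and variable names are hypothetical). One common reason the written file differs from the one that was read is the row index, which `to_csv` adds as an extra first column unless `index=False` is passed.

```python
import io

import pandas as pd

# Hypothetical input; the column order is deliberately not alphabetical.
original = "d,b,a,c\n1,2,3,4\n5,6,7,8\n"

data = pd.read_csv(io.StringIO(original))

# index=False keeps to_csv from adding the row index as an extra first column.
out = io.StringIO()
data.to_csv(out, index=False)
round_trip = out.getvalue()

# An explicit order can still be requested with the columns keyword.
out = io.StringIO()
data.to_csv(out, index=False, columns=["a", "b", "c", "d"])
reordered = out.getvalue()

print(list(data.columns))
```

Here `round_trip` keeps the input order d, b, a, c, while `reordered` starts with the header a,b,c,d.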

answered Sep 19 '22 18:09

Matti John
