Why is the column order changing when appending pandas DataFrames?

Tags: python, pandas, csv

I want to append (merge) all the csv files in a folder using Python pandas.

For example, say the folder has two CSV files, test1.csv and test2.csv, as follows:

    A_Id    P_Id    CN1     CN2     CN3
    AAA     111     702     709     740
    BBB     222     1727    1734    1778

and

    A_Id    P_Id    CN1     CN2     CN3
    CCC     333     710     750     750
    DDD     444     180     734     778

So the python script I wrote was as follows:

    #!/usr/bin/python
    import pandas as pd
    import glob

    all_data = pd.DataFrame()
    for f in glob.glob("testfolder/*.csv"):
        df = pd.read_csv(f)
        all_data = all_data.append(df)

    all_data.to_csv('testfolder/combined.csv')

Though the combined.csv seems to have all the appended rows, it looks as follows:

          CN1     CN2     CN3     A_Id    P_Id
    0     710     750     750     CCC     333
    1     180     734     778     DDD     444
    0     702     709     740     AAA     111
    1     1727    1734    1778    BBB     222

Whereas it should look like this:

    A_Id    P_Id    CN1     CN2     CN3
    AAA     111     702     709     740
    BBB     222     1727    1734    1778
    CCC     333     710     750     750
    DDD     444     180     734     778

Why are the first two columns moved to the end?
Why is it appending in the first line rather than at the last line?

What am I missing? And how can I get rid of the 0s and 1s in the first column?

P.S: Since these are large csv files, I thought of using pandas.

asked Nov 19 '15 07:11

kingmakerking

1 Answer

Try selecting the columns in their original order after each append:

all_data = all_data.append(df)[df.columns.tolist()]
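Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the line above only runs on older versions. A minimal sketch of an alternative using pd.concat follows; it also touches the other two points in the question. The helper name combine_csvs and its parameters are made up for illustration, and the sketch assumes the files are comma-separated and share the same header. Sorting the paths is there because glob does not promise any particular file order, ignore_index=True renumbers the rows, and index=False keeps the row labels out of the output file.

```python
import glob
import os

import pandas as pd


def combine_csvs(folder, output_name="combined.csv"):
    """Concatenate every CSV in `folder` and write the result into the same folder.

    Hypothetical helper for illustration; assumes comma-separated files
    that all share the same header row.
    """
    out_path = os.path.join(folder, output_name)

    # glob gives no ordering guarantee, so sort the paths explicitly.
    # Skip a previously written output file so it is not merged into itself.
    paths = sorted(
        p for p in glob.glob(os.path.join(folder, "*.csv"))
        if os.path.abspath(p) != os.path.abspath(out_path)
    )
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {folder!r}")

    frames = [pd.read_csv(p) for p in paths]

    # ignore_index=True renumbers the rows 0..n-1 instead of repeating
    # each file's own 0, 1, ... labels. With identical headers, the
    # column order of the input files is kept.
    combined = pd.concat(frames, ignore_index=True, sort=False)

    # index=False leaves the row labels out of the written file.
    combined.to_csv(out_path, index=False)
    return combined
```

Calling combine_csvs("testfolder") on the two sample files should then write testfolder/combined.csv with the A_Id column first and no leading column of row numbers. Building a list of frames and concatenating once also avoids copying the accumulated data on every loop iteration, which matters more as the files get larger.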

answered Sep 21 '22 23:09

user6745154
