I want to append (merge) all the CSV files in a folder using Python pandas.
For example, say the folder has two CSV files, test1.csv and test2.csv, as follows:
    A_Id  P_Id  CN1   CN2   CN3
    AAA   111   702   709   740
    BBB   222   1727  1734  1778
and
    A_Id  P_Id  CN1   CN2   CN3
    CCC   333   710   750   750
    DDD   444   180   734   778
So the Python script I wrote was as follows:

    #!/usr/bin/python
    import pandas as pd
    import glob

    all_data = pd.DataFrame()
    for f in glob.glob("testfolder/*.csv"):
        df = pd.read_csv(f)
        # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0
        all_data = all_data.append(df)
    all_data.to_csv('testfolder/combined.csv')
Though combined.csv seems to have all the appended rows, it looks like this:
         CN1   CN2   CN3   A_Id  P_Id
    0    710   750   750   CCC   333
    1    180   734   778   DDD   444
    0    702   709   740   AAA   111
    1    1727  1734  1778  BBB   222
Whereas it should look like this:
    A_Id  P_Id  CN1   CN2   CN3
    AAA   111   702   709   740
    BBB   222   1727  1734  1778
    CCC   333   710   750   750
    DDD   444   180   734   778
What am I missing? And how can I get rid of the 0s and 1s in the first column?
P.S.: Since these are large CSV files, I thought of using pandas.
A pandas DataFrame built by combining other DataFrames does not necessarily keep the column order of the source files.
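A small illustration, assuming pandas 1.0 or later (where pd.concat defaults to sort=False): when the frames being combined list their columns in different orders, the result follows the order in which the labels are first seen, and sort=True sorts the labels instead. The frames are made up from the question's column names.

```python
import pandas as pd

# Two frames with the same columns listed in different orders (made-up data)
first = pd.DataFrame({"A_Id": ["AAA"], "P_Id": [111], "CN1": [702]})
second = pd.DataFrame({"CN1": [710], "A_Id": ["CCC"], "P_Id": [333]})

# Default (sort=False in pandas >= 1.0): columns keep first-seen order;
# values are still aligned by column label, not by position
combined = pd.concat([first, second], ignore_index=True)
print(list(combined.columns))  # ['A_Id', 'P_Id', 'CN1']

# sort=True sorts the column labels instead
sorted_cols = pd.concat([first, second], ignore_index=True, sort=True)
print(list(sorted_cols.columns))  # ['A_Id', 'CN1', 'P_Id']
```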
Reordering columns with pandas: another way to reorder columns is the DataFrame.reindex() method, which lets you pass the order of columns you want through the columns= parameter.
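A minimal sketch of reindex(columns=...) applied to a frame whose columns came out in the unwanted order shown in the question (values taken from test1.csv):

```python
import pandas as pd

# A combined frame whose columns ended up in an unwanted order
combined = pd.DataFrame(
    {
        "CN1": [702, 1727],
        "CN2": [709, 1734],
        "CN3": [740, 1778],
        "A_Id": ["AAA", "BBB"],
        "P_Id": [111, 222],
    }
)

# Pass the desired order via columns=; columns are matched by label,
# so every value stays with its row
wanted = ["A_Id", "P_Id", "CN1", "CN2", "CN3"]
reordered = combined.reindex(columns=wanted)
print(reordered)
```

One design difference worth knowing: a label in wanted that is missing from the frame becomes a column of NaN with reindex(), whereas selecting with combined[wanted] raises a KeyError.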
I have read elsewhere that DataFrames do not guarantee row order. In my experience, read_csv keeps the rows in the order they appear in the CSV, but a transformation on the DataFrame can change that order. If you are not sure, DataFrames support explicit sorting (sort_values() and sort_index()).
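To illustrate, a sketch with the question's two files inlined as strings (io.StringIO stands in for the real paths): read_csv keeps each file's row order, the concatenation order decides which file's rows come first, and sort_values() restores an explicit order. Sorting on A_Id is an assumption about which key defines the desired order.

```python
import io

import pandas as pd

# The two files from the question, inlined so the example is self-contained
test1 = "A_Id,P_Id,CN1,CN2,CN3\nAAA,111,702,709,740\nBBB,222,1727,1734,1778\n"
test2 = "A_Id,P_Id,CN1,CN2,CN3\nCCC,333,710,750,750\nDDD,444,180,734,778\n"

# read_csv keeps the rows in file order
df1 = pd.read_csv(io.StringIO(test1))
df2 = pd.read_csv(io.StringIO(test2))

# Combining in the "wrong" file order (as glob might return it) ...
combined = pd.concat([df2, df1], ignore_index=True)
print(combined["A_Id"].tolist())  # ['CCC', 'DDD', 'AAA', 'BBB']

# ... can be fixed with an explicit sort on a key column;
# ignore_index=True renumbers the rows 0..n-1 after sorting
restored = combined.sort_values("A_Id", ignore_index=True)
print(restored["A_Id"].tolist())  # ['AAA', 'BBB', 'CCC', 'DDD']
```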
Yes, the order in which DataFrames are combined affects the order of the rows and columns of the result. The merge() method, for example with how="left", preserves the order of the left keys.
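A small sketch with made-up frames: with how="left", merge() returns rows in the left frame's key order, and swapping left and right changes both the row order and the column order of the result.

```python
import pandas as pd

# The left frame lists its keys in a non-alphabetical order (made-up data)
left = pd.DataFrame({"A_Id": ["CCC", "AAA", "BBB"], "CN1": [710, 702, 1727]})
right = pd.DataFrame({"A_Id": ["AAA", "BBB", "CCC"], "P_Id": [111, 222, 333]})

# how="left" keeps the left frame's key order (sort=False is the default)
merged = left.merge(right, on="A_Id", how="left")
print(merged["A_Id"].tolist())  # ['CCC', 'AAA', 'BBB']
print(list(merged.columns))     # ['A_Id', 'CN1', 'P_Id']

# Swapping the frames changes both the row order and the column order
swapped = right.merge(left, on="A_Id", how="left")
print(swapped["A_Id"].tolist())  # ['AAA', 'BBB', 'CCC']
print(list(swapped.columns))     # ['A_Id', 'P_Id', 'CN1']
```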
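Try this inside the loop, in place of the append line, so the combined frame takes the column order of the file just read:

    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    all_data = pd.concat([all_data, df])[df.columns.tolist()]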
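Putting the pieces together, a sketch of the whole script, assuming pandas 1.0 or later. It collects the frames in a list and calls pd.concat once rather than growing a DataFrame inside the loop, passes index=False to to_csv so the 0s and 1s are not written, and skips the output file, because combined.csv is written into the same folder and would otherwise match *.csv on the next run. combine_csvs and its out_name parameter are hypothetical names.

```python
import glob
import os

import pandas as pd


def combine_csvs(folder, out_name="combined.csv"):
    """Concatenate every CSV in `folder` into `folder/out_name`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    out_path = os.path.join(folder, out_name)
    # sorted() makes the file order deterministic (glob's order is arbitrary);
    # skip the output file so a second run does not re-read its own result
    paths = [
        p
        for p in sorted(glob.glob(os.path.join(folder, "*.csv")))
        if os.path.abspath(p) != os.path.abspath(out_path)
    ]
    frames = [pd.read_csv(p) for p in paths]
    # One concat call; ignore_index replaces the repeated 0, 1, 0, 1
    # labels with a fresh 0..n-1 index
    combined = pd.concat(frames, ignore_index=True)
    # index=False keeps the index column out of the CSV entirely
    combined.to_csv(out_path, index=False)
    return combined
```

Called as combine_csvs("testfolder"), this writes testfolder/combined.csv with the columns in the order of the first file read and no leading index column. If the folder contains no CSV files, pd.concat raises a ValueError.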