I have n files in a directory that I need to combine into one. They all have the same number of columns. For example, the contents of test1.csv are:
test1,test1,test1
test1,test1,test1
test1,test1,test1
Similarly, the contents of test2.csv are:
test2,test2,test2
test2,test2,test2
test2,test2,test2
I want final.csv to look like this:
test1,test1,test1
test1,test1,test1
test1,test1,test1
test2,test2,test2
test2,test2,test2
test2,test2,test2
But instead it comes out like this:
test file 1,test file 1.1,test file 1.2,test file 2,test file 2.1,test file 2.2
,,,test file 2,test file 2,test file 2
,,,test file 2,test file 2,test file 2
test file 1,test file 1,test file 1,,,
test file 1,test file 1,test file 1,,,
Can someone help me figure out what is going on here? I have pasted my code below:
import csv
import glob
import pandas as pd
import numpy as np

all_data = pd.DataFrame()  # initializes DF which will hold aggregated csv files
for f in glob.glob("*.csv"):  # for all csv files in pwd
    df = pd.read_csv(f)  # create dataframe for reading current csv
    all_data = all_data.append(df)  # appends current csv to final DF
all_data.to_csv("final.csv", index=None)
I think there are a few more problems. I removed import csv and import numpy as np, because they are not used in this demo. I added a list dfs, where the dataframes are collected by dfs.append(df), and then used the function concat to join this list into the final dataframe. To read_csv I added the parameter header=None, because the main problem was that read_csv reads the first row as a header. To to_csv I also added the parameter header=None, to omit the header when writing. Finally, I added the subfolder test to the final destination path, because if you use glob.glob("*.csv") on the same directory, you would otherwise read the output file back in as an input file. Solution:
import glob
import pandas as pd

# list of all dataframes
dfs = []
for f in glob.glob("*.csv"):  # for all csv files in pwd
    # header=None stops read_csv from treating the first row as a header
    df = pd.read_csv(f, header=None)  # create dataframe for reading current csv
    dfs.append(df)  # collect the current csv's dataframe
all_data = pd.concat(dfs, ignore_index=True)  # join all dataframes into one
print(all_data)
#        0      1      2
# 0  test1  test1  test1
# 1  test1  test1  test1
# 2  test1  test1  test1
# 3  test2  test2  test2
# 4  test2  test2  test2
# 5  test2  test2  test2
all_data.to_csv("test/final.csv", index=None, header=None)
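If you would rather keep final.csv in the same directory instead of writing to a subfolder, you can skip the output file while globbing. Here is a minimal, self-contained sketch of that alternative; the temp-directory setup and the sample data are hypothetical, added only so the example runs on its own:

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical setup: create two sample CSVs in a temp directory,
# mirroring the test1.csv / test2.csv files from the question.
os.chdir(tempfile.mkdtemp())
for name, value in [("test1.csv", "test1"), ("test2.csv", "test2")]:
    with open(name, "w") as fh:
        for _ in range(3):
            fh.write(",".join([value] * 3) + "\n")

OUTPUT = "final.csv"

# Skip the output file while globbing, so re-running the script
# never reads its own previous output back in as input.
dfs = []
for f in sorted(glob.glob("*.csv")):
    if os.path.basename(f) == OUTPUT:
        continue
    dfs.append(pd.read_csv(f, header=None))

all_data = pd.concat(dfs, ignore_index=True)
all_data.to_csv(OUTPUT, index=False, header=False)
print(all_data.shape)  # (6, 3)
```

Sorting the glob results also makes the row order deterministic, since glob.glob returns files in arbitrary order.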
The next solution is similar: I add the parameter header=None to read_csv and to_csv, and the parameter ignore_index=True to append.
import glob
import pandas as pd

all_data = pd.DataFrame()  # initializes DF which will hold aggregated csv files
for f in glob.glob("*.csv"):  # for all csv files in pwd
    df = pd.read_csv(f, header=None)  # create dataframe for reading current csv
    all_data = all_data.append(df, ignore_index=True)  # appends current csv to final DF
print(all_data)
#        0      1      2
# 0  test1  test1  test1
# 1  test1  test1  test1
# 2  test1  test1  test1
# 3  test2  test2  test2
# 4  test2  test2  test2
# 5  test2  test2  test2
all_data.to_csv("test/final.csv", index=None, header=None)
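Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current pandas only the concat approach works. The whole loop collapses into a single pd.concat call over a generator; the sample-data setup below is hypothetical, included only to make the sketch self-contained:

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical setup: sample CSVs matching the question's data.
os.chdir(tempfile.mkdtemp())
for name, value in [("test1.csv", "test1"), ("test2.csv", "test2")]:
    with open(name, "w") as fh:
        fh.write("\n".join([",".join([value] * 3)] * 3) + "\n")

# DataFrame.append was removed in pandas 2.0; pd.concat over an
# iterable of dataframes is the current idiom.
all_data = pd.concat(
    (pd.read_csv(f, header=None) for f in sorted(glob.glob("*.csv"))),
    ignore_index=True,
)
print(len(all_data))  # 6
```

Building one concat from all frames at once is also faster than appending in a loop, because each append copies the entire accumulated dataframe.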