I have n files in a directory that I need to combine into one. They all have the same number of columns. For example, the contents of test1.csv are:
test1,test1,test1  
test1,test1,test1  
test1,test1,test1  
Similarly, the contents of test2.csv are:
test2,test2,test2  
test2,test2,test2  
test2,test2,test2  
I want final.csv to look like this:
test1,test1,test1  
test1,test1,test1  
test1,test1,test1  
test2,test2,test2  
test2,test2,test2  
test2,test2,test2  
But instead it comes out like this:
test file 1,test file 1.1,test file 1.2,test file 2,test file 2.1,test file 2.2  
,,,test file 2,test file 2,test file 2  
,,,test file 2,test file 2,test file 2  
test file 1,test file 1,test file 1,,,  
test file 1,test file 1,test file 1,,,  
Can someone help me figure out what is going on here? I have pasted my code below:
import csv
import glob
import pandas as pd
import numpy as np 
all_data = pd.DataFrame() #initializes DF which will hold aggregated csv files
for f in glob.glob("*.csv"): #for all csv files in pwd
    df = pd.read_csv(f) #create dataframe for reading current csv
    all_data = all_data.append(df) #appends current csv to final DF
all_data.to_csv("final.csv", index=None)
I think there are a few more problems:
I removed import csv and import numpy as np, because they are not used in this demo (maybe they are needed in lines that are missing here).
I created a list dfs, appended each dataframe to it with dfs.append(df), and then used pd.concat to join the list into the final dataframe.
In read_csv I added the parameter header=None, because the main problem is that read_csv reads the first row of each file as a header.
In to_csv I added the parameter header=None to omit writing a header row.
I write the output to test/final.csv instead of the current directory, because if you use glob.glob("*.csv") the output file would otherwise be read back in as an input file on the next run.
Solution:
import glob
import pandas as pd
#list of all df
dfs = []
for f in glob.glob("*.csv"): #for all csv files in pwd
    #add parameters to read_csv
    df = pd.read_csv(f, header=None) #create dataframe for reading current csv
    #print df
    dfs.append(df) #appends current csv to final DF
all_data = pd.concat(dfs, ignore_index=True)
print(all_data)
#       0      1      2
#0  test1  test1  test1
#1  test1  test1  test1
#2  test1  test1  test1
#3  test2  test2  test2
#4  test2  test2  test2
#5  test2  test2  test2
all_data.to_csv("test/final.csv", index=None, header=None)
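To see why header=None matters without touching the filesystem, here is a minimal, self-contained sketch of the same fix that reads the CSV text from io.StringIO objects instead of real files (the data is the illustrative test1/test2 content from the question):

```python
import io

import pandas as pd

# Stand-ins for test1.csv and test2.csv (illustrative content only)
csv_texts = [
    "test1,test1,test1\ntest1,test1,test1\ntest1,test1,test1\n",
    "test2,test2,test2\ntest2,test2,test2\ntest2,test2,test2\n",
]

# header=None keeps the first data row from being consumed as column labels,
# so both frames share the same integer columns 0, 1, 2
dfs = [pd.read_csv(io.StringIO(text), header=None) for text in csv_texts]
all_data = pd.concat(dfs, ignore_index=True)

print(all_data.shape)  # (6, 3)
```

Without header=None, each file's first row would become (different) column labels, and pd.concat would align on the union of those labels, producing exactly the NaN-padded, misaligned output shown in the question.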
The next solution is similar: I add the parameter header=None to read_csv and to_csv, and the parameter ignore_index=True to append. (Note that DataFrame.append was later deprecated and removed in pandas 2.0; on current versions use pd.concat as in the first solution.)
import glob
import pandas as pd
all_data = pd.DataFrame() #initializes DF which will hold aggregated csv files
for f in glob.glob("*.csv"): #for all csv files in pwd
    df = pd.read_csv(f, header=None) #create dataframe for reading current csv
    all_data = all_data.append(df, ignore_index=True) #appends current csv to final DF
print(all_data)
#       0      1      2
#0  test1  test1  test1
#1  test1  test1  test1
#2  test1  test1  test1
#3  test2  test2  test2
#4  test2  test2  test2
#5  test2  test2  test2
all_data.to_csv("test/final.csv", index=None, header=None)
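On pandas 2.0 and later, DataFrame.append no longer exists, so the loop above fails; the same pattern can be written with pd.concat. The sketch below builds its own scratch directory with two sample files (names and contents are made up for illustration) and writes the result to a subdirectory so the glob pattern never picks up the output file:

```python
import glob
import os
import tempfile

import pandas as pd

# Scratch directory with two sample CSVs (illustrative data only)
tmpdir = tempfile.mkdtemp()
for name, value in [("test1.csv", "test1"), ("test2.csv", "test2")]:
    with open(os.path.join(tmpdir, name), "w") as fh:
        fh.write(("%s,%s,%s\n" % (value, value, value)) * 3)

# One dataframe per CSV; header=None so no row becomes column labels,
# sorted() so the concatenation order is deterministic
dfs = [pd.read_csv(f, header=None)
       for f in sorted(glob.glob(os.path.join(tmpdir, "*.csv")))]

# pd.concat replaces the removed DataFrame.append
all_data = pd.concat(dfs, ignore_index=True)

# Write outside the globbed directory so reruns don't ingest the output
outdir = os.path.join(tmpdir, "out")
os.makedirs(outdir)
all_data.to_csv(os.path.join(outdir, "final.csv"), index=None, header=None)
```

The only behavioral difference from the append loop is that pd.concat builds the result in one pass instead of copying the accumulated frame on every iteration, which is also faster for many files.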