Read multiple csv files and Add filename as new column in pandas

Question

I have several csv files in a single folder and I want to open them all in one dataframe and insert a new column with the associated filename. So far I've coded the following:

import pandas as pd
import glob, os
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('path/*.csv'))))
df['filename']= os.path.basename(csv)
df

This gives me the dataframe I want, but the new 'filename' column lists only the last filename in the folder for every row. I want each row to be populated with its associated csv file, not just the last file in the folder.

Any assistance for this newbie is much appreciated.

jezrael · Accepted Answer

I think you need assign to add the new column for each file inside the loop. Passing ignore_index=True to concat also avoids duplicated values in the index:

The test files are a.csv, b.csv, c.csv.

import pandas as pd
import glob, os


files = glob.glob('samples_for_so/*.csv')
print (files)
#['samples_for_so\a.csv', 'samples_for_so\b.csv', 'samples_for_so\c.csv']


df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp)) for fp in files])
print (df)
   a  b  c  d    New
0  0  1  2  5  a.csv
1  1  5  8  3  a.csv
0  0  9  6  5  b.csv
1  1  6  4  2  b.csv
0  0  7  1  7  c.csv
1  1  3  2  6  c.csv

files = glob.glob('samples_for_so/*.csv')
df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp).split('.')[0])
       for fp in files], ignore_index=True)
print (df)
   a  b  c  d New
0  0  1  2  5   a
1  1  5  8  3   a
2  0  9  6  5   b
3  1  6  4  2   b
4  0  7  1  7   c
5  1  3  2  6   c
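
For anyone who wants to try this without setting up a folder first, here is a minimal self-contained sketch of the same idea. The helper name read_with_filename and its pattern and column parameters are made up for illustration, and it swaps in pathlib's Path.stem for split('.')[0], which keeps names with extra dots intact (b.v2.csv gives b.v2 rather than b). The demo files are throwaway ones written to a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_with_filename(folder, pattern="*.csv", column="filename"):
    """Read every CSV matching `pattern` in `folder`, tagging rows with the file stem.

    Hypothetical helper for illustration; not part of pandas.
    """
    frames = [
        # assign() adds the same stem to every row of this one file
        pd.read_csv(path).assign(**{column: path.stem})
        for path in sorted(Path(folder).glob(pattern))  # sorted for a stable order
    ]
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)


# Demo on throwaway files (names and contents are invented).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("x,y\n1,2\n3,4\n")
    Path(tmp, "b.v2.csv").write_text("x,y\n5,6\n")
    combined = read_with_filename(tmp)

print(combined)
```

Note that pd.concat raises a ValueError if the folder has no matching files, so you may want to check for an empty list first.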

Abid Hasan · Answer

Firstly, you have no csv variable defined.

Either way, this behaviour makes sense: the filename is assigned once, after all the files have been concatenated, so every row gets the single value that csv holds at that point (the last file). Instead, you can use glob again to get all the filenames, then set those as a new column.

#this is a Python list containing filenames
csvs = glob.glob(os.path.join('path/*.csv'))

#now set the csv into a pd series
csv_paths = pd.Series(csvs)

df['file_name'] = csv_paths.values
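
One caveat: the assignment above only lines up when df has exactly one row per file; with more rows than files pandas raises a length-mismatch ValueError. A possible way around that, sketched below under the assumption that you still have the per-file frames at hand, is to repeat each path once per row of the frame it came from. The in-memory fake_files dict is just a stand-in for real files so the snippet runs anywhere.

```python
import io

import numpy as np
import pandas as pd

# Stand-ins for files on disk; the paths and contents are invented for the demo.
fake_files = {
    "path/a.csv": "x\n1\n2\n",
    "path/b.csv": "x\n3\n4\n5\n",
}

# Read each "file" separately so the row count per file is known.
frames = [pd.read_csv(io.StringIO(text)) for text in fake_files.values()]
df = pd.concat(frames, ignore_index=True)

# Repeat each path once per row of its frame, so lengths match df.
row_counts = [len(frame) for frame in frames]
df["file_name"] = np.repeat(list(fake_files), row_counts)

print(df)
```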
