I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far:
import glob import pandas as pd # get data file names path =r'C:\DRO\DCL_rawdata_files' filenames = glob.glob(path + "/*.csv") dfs = [] for filename in filenames: dfs.append(pd.read_csv(filename)) # Concatenate all data into one DataFrame big_frame = pd.concat(dfs, ignore_index=True)
I guess I need some help within the for loop???
Use pandas. concat() to concatenate/merge two or multiple pandas DataFrames across rows or columns. When you concat() two pandas DataFrames on rows, it creates a new Dataframe containing all rows of two DataFrames basically it does append one DataFrame with another.
If you have same columns in all your csv
files then you can try the code below. I have added header=0
so that after reading csv
first row can be assigned as the column names.
import pandas as pd import glob path = r'C:\DRO\DCL_rawdata_files' # use your path all_files = glob.glob(path + "/*.csv") li = [] for filename in all_files: df = pd.read_csv(filename, index_col=None, header=0) li.append(df) frame = pd.concat(li, axis=0, ignore_index=True)
An alternative to darindaCoder's answer:
path = r'C:\DRO\DCL_rawdata_files' # use your path all_files = glob.glob(os.path.join(path, "*.csv")) # advisable to use os.path.join as this makes concatenation OS independent df_from_each_file = (pd.read_csv(f) for f in all_files) concatenated_df = pd.concat(df_from_each_file, ignore_index=True) # doesn't create a list, nor does it append to one
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With