I would like to read several excel files from a directory into pandas and concatenate them into one big dataframe. I have not been able to figure it out though. I need some help with the for loop and building a concatenated dataframe: Here is what I have so far:
import sys import csv import glob import pandas as pd # get data file names path =r'C:\DRO\DCL_rawdata_files\excelfiles' filenames = glob.glob(path + "/*.xlsx") dfs = [] for df in dfs: xl_file = pd.ExcelFile(filenames) df=xl_file.parse('Sheet1') dfs.concat(df, ignore_index=True)
To read all excel files in a directory, use the Glob module and the read_excel() method.
As mentioned in the comments, one error you are making is that you are looping over an empty list.
Here is how I would do it, using an example of having 5 identical Excel files that are appended one after another.
(1) Imports:
import os import pandas as pd
(2) List files:
path = os.getcwd() files = os.listdir(path) files
Output:
['.DS_Store', '.ipynb_checkpoints', '.localized', 'Screen Shot 2013-12-28 at 7.15.45 PM.png', 'test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls', 'Untitled0.ipynb', 'Werewolf Modelling', '~$Random Numbers.xlsx']
(3) Pick out 'xls' files:
files_xls = [f for f in files if f[-3:] == 'xls'] files_xls
Output:
['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']
(4) Initialize empty dataframe:
df = pd.DataFrame()
(5) Loop over list of files to append to empty dataframe:
for f in files_xls: data = pd.read_excel(f, 'Sheet1') df = df.append(data)
(6) Enjoy your new dataframe. :-)
df
Output:
Result Sample 0 a 1 1 b 2 2 c 3 3 d 4 4 e 5 5 f 6 6 g 7 7 h 8 8 i 9 9 j 10 0 a 1 1 b 2 2 c 3 3 d 4 4 e 5 5 f 6 6 g 7 7 h 8 8 i 9 9 j 10 0 a 1 1 b 2 2 c 3 3 d 4 4 e 5 5 f 6 6 g 7 7 h 8 8 i 9 9 j 10 0 a 1 1 b 2 2 c 3 3 d 4 4 e 5 5 f 6 6 g 7 7 h 8 8 i 9 9 j 10 0 a 1 1 b 2 2 c 3 3 d 4 4 e 5 5 f 6 6 g 7 7 h 8 8 i 9 9 j 10
this works with python 2.x
be in the directory where the Excel files are
see http://pbpython.com/excel-file-combine.html
import numpy as np import pandas as pd import glob all_data = pd.DataFrame() for f in glob.glob("*.xlsx"): df = pd.read_excel(f) all_data = all_data.append(df,ignore_index=True) # now save the data frame writer = pd.ExcelWriter('output.xlsx') all_data.to_excel(writer,'sheet1') writer.save()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With