Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Import multiple excel files into python pandas and concatenate them into one dataframe

I would like to read several excel files from a directory into pandas and concatenate them into one big dataframe. I have not been able to figure it out though. I need some help with the for loop and building a concatenated dataframe: Here is what I have so far:

import sys import csv import glob import pandas as pd  # get data file names path =r'C:\DRO\DCL_rawdata_files\excelfiles' filenames = glob.glob(path + "/*.xlsx")  dfs = []  for df in dfs:      xl_file = pd.ExcelFile(filenames)     df=xl_file.parse('Sheet1')     dfs.concat(df, ignore_index=True) 
like image 722
jonas Avatar asked Jan 03 '14 16:01

jonas


People also ask

How do you read all Excel files in a folder in pandas?

To read all excel files in a directory, use the Glob module and the read_excel() method.


2 Answers

As mentioned in the comments, one error you are making is that you are looping over an empty list.

Here is how I would do it, using an example of having 5 identical Excel files that are appended one after another.

(1) Imports:

import os import pandas as pd 

(2) List files:

path = os.getcwd() files = os.listdir(path) files 

Output:

['.DS_Store',  '.ipynb_checkpoints',  '.localized',  'Screen Shot 2013-12-28 at 7.15.45 PM.png',  'test1 2.xls',  'test1 3.xls',  'test1 4.xls',  'test1 5.xls',  'test1.xls',  'Untitled0.ipynb',  'Werewolf Modelling',  '~$Random Numbers.xlsx'] 

(3) Pick out 'xls' files:

files_xls = [f for f in files if f[-3:] == 'xls'] files_xls 

Output:

['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls'] 

(4) Initialize empty dataframe:

df = pd.DataFrame() 

(5) Loop over list of files to append to empty dataframe:

for f in files_xls:     data = pd.read_excel(f, 'Sheet1')     df = df.append(data) 

(6) Enjoy your new dataframe. :-)

df 

Output:

  Result  Sample 0      a       1 1      b       2 2      c       3 3      d       4 4      e       5 5      f       6 6      g       7 7      h       8 8      i       9 9      j      10 0      a       1 1      b       2 2      c       3 3      d       4 4      e       5 5      f       6 6      g       7 7      h       8 8      i       9 9      j      10 0      a       1 1      b       2 2      c       3 3      d       4 4      e       5 5      f       6 6      g       7 7      h       8 8      i       9 9      j      10 0      a       1 1      b       2 2      c       3 3      d       4 4      e       5 5      f       6 6      g       7 7      h       8 8      i       9 9      j      10 0      a       1 1      b       2 2      c       3 3      d       4 4      e       5 5      f       6 6      g       7 7      h       8 8      i       9 9      j      10 
like image 194
ericmjl Avatar answered Sep 30 '22 01:09

ericmjl


this works with python 2.x

be in the directory where the Excel files are

see http://pbpython.com/excel-file-combine.html

import numpy as np import pandas as pd import glob all_data = pd.DataFrame() for f in glob.glob("*.xlsx"):     df = pd.read_excel(f)     all_data = all_data.append(df,ignore_index=True)  # now save the data frame writer = pd.ExcelWriter('output.xlsx') all_data.to_excel(writer,'sheet1') writer.save()     
like image 26
john blue Avatar answered Sep 30 '22 00:09

john blue