Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using pandas Combining/merging 2 different Excel files/sheets

I am trying to combine 2 different Excel files. (thanks to the post Import multiple excel files into python pandas and concatenate them into one dataframe)

The one I work out so far is:

import os
import pandas as pd

df = pd.DataFrame()

for f in ['c:\\file1.xls', 'c:\\ file2.xls']:
    data = pd.read_excel(f, 'Sheet1')
    df = df.append(data)

df.to_excel("c:\\all.xls")

Here is how they look like.

enter image description here

However I want to:

  1. Exclude the last rows of each file (i.e. row4 and row5 in File1.xls; row7 and row8 in File2.xls).
  2. Add a column (or overwrite Column A) to indicate where the data from.

For example:

enter image description here

Is it possible? Thanks.

like image 682
Mark K Avatar asked Aug 20 '14 08:08

Mark K


People also ask

How do I combine multiple Excel worksheets into one?

On the Data tab, under Tools, click Consolidate. In the Function box, click the function that you want Excel to use to consolidate the data. In each source sheet, select your data, and then click Add. The file path is entered in All references.


2 Answers

For num. 1, you can specify skip_footer as explained here; or, alternatively, do

data = data.iloc[:-2]

once your read the data.

For num. 2, you may do:

from os.path import basename
data.index = [basename(f)] * len(data)

Also, perhaps would be better to put all the data-frames in a list and then concat them at the end; something like:

df = []
for f in ['c:\\file1.xls', 'c:\\ file2.xls']:
    data = pd.read_excel(f, 'Sheet1').iloc[:-2]
    data.index = [os.path.basename(f)] * len(data)
    df.append(data)

df = pd.concat(df)
like image 118
behzad.nouri Avatar answered Oct 25 '22 19:10

behzad.nouri


import os
import os.path
import xlrd
import xlsxwriter

file_name = input("Decide the destination file name in DOUBLE QUOTES: ")
merged_file_name = file_name + ".xlsx"
dest_book = xlsxwriter.Workbook(merged_file_name)
dest_sheet_1 = dest_book.add_worksheet()
dest_row = 1
temp = 0
path = input("Enter the path in DOUBLE QUOTES: ")
for root,dirs,files in os.walk(path):
    files = [ _ for _ in files if _.endswith('.xlsx') ]
    for xlsfile in files:
        print ("File in mentioned folder is: " + xlsfile)
        temp_book = xlrd.open_workbook(os.path.join(root,xlsfile))
        temp_sheet = temp_book.sheet_by_index(0)
        if temp == 0:
            for col_index in range(temp_sheet.ncols):
                str = temp_sheet.cell_value(0, col_index)
                dest_sheet_1.write(0, col_index, str)
            temp = temp + 1
        for row_index in range(1, temp_sheet.nrows):
            for col_index in range(temp_sheet.ncols):
                str = temp_sheet.cell_value(row_index, col_index)
                dest_sheet_1.write(dest_row, col_index, str)
            dest_row = dest_row + 1
dest_book.close()
book = xlrd.open_workbook(merged_file_name)
sheet = book.sheet_by_index(0)
print "number of rows in destination file are: ", sheet.nrows
print "number of columns in destination file are: ", sheet.ncols
like image 24
Amit Sharma Avatar answered Oct 25 '22 17:10

Amit Sharma