
From password-protected Excel file to pandas DataFrame

I can open a password-protected Excel file with this:

import sys
import win32com.client

xlApp = win32com.client.Dispatch("Excel.Application")
print("Excel library version:", xlApp.Version)
filename, password = sys.argv[1:3]
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets(1)  # counts from 1, not from 0
print(xlws.Name)
print(xlws.Cells(1, 1))  # that's A1

I'm not sure, though, how to transfer the information to a pandas DataFrame. Do I need to read the cells one by one, or is there a more convenient way to do this?

dmvianna asked Mar 08 '13 01:03


2 Answers

Simple solution

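import io
import pandas as pd
import msoffcrypto

file_path = 'encrypted.xlsx'  # placeholder: path to the protected workbook
passwd = 'xyz'

# Decrypt the workbook into an in-memory buffer
decrypted_workbook = io.BytesIO()
with open(file_path, 'rb') as file:
    office_file = msoffcrypto.OfficeFile(file)
    office_file.load_key(password=passwd)
    office_file.decrypt(decrypted_workbook)

# Read the decrypted bytes like any other Excel file
df = pd.read_excel(decrypted_workbook, sheet_name='abc')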

Install the library with:

pip install --user msoffcrypto-tool

Exporting all sheets of every Excel file in a directory and its sub-directories to separate CSV files

import os
from glob import glob

# io, pd, msoffcrypto and passwd are defined in the snippet above
PATH = "Active Cons data"

# Scan all the Excel files in PATH and its sub-directories
excel_files = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.xlsx'))]

for excel_file in excel_files:
    print(excel_file)
    decrypted_workbook = io.BytesIO()
    with open(excel_file, 'rb') as file:
        office_file = msoffcrypto.OfficeFile(file)
        office_file.load_key(password=passwd)
        office_file.decrypt(decrypted_workbook)

    # sheet_name=None returns a dict mapping sheet names to DataFrames
    sheets = pd.read_excel(decrypted_workbook, sheet_name=None)
    print(list(sheets))  # list of sheet names
    for sheet, df in sheets.items():
        new_file = f"D:\\all_csv\\{sheet}.csv"
        df.to_csv(new_file, index=False)
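
Because the CSV name is derived from the sheet name only, two workbooks that both contain a sheet called, say, Sheet1 would write to the same file, and the later one wins. A minimal sketch of one way around this, assuming it is acceptable to put the workbook name (and optionally its sub-directory) into the CSV name; csv_path_for and its parameters are hypothetical names, not part of the answer above:

```python
from pathlib import Path


def csv_path_for(workbook_path, sheet_name, out_dir, root=None):
    """Build a CSV path that includes the workbook name, not just the sheet.

    If root is given, the sub-directories between root and the workbook are
    included as well, so equally named workbooks in different folders do
    not collide either.
    """
    workbook = Path(workbook_path)
    if root is not None:
        # e.g. data/2021/Book1.xlsx relative to data -> ("2021", "Book1")
        parts = workbook.relative_to(root).with_suffix("").parts
    else:
        parts = (workbook.stem,)
    return Path(out_dir) / ("_".join(parts) + f"_{sheet_name}.csv")
```

Inside the loop, the hard-coded file name could then be replaced by something like csv_path_for(excel_file, sheet, "D:\\all_csv", root=PATH), where excel_file stands for whatever variable holds the current workbook path.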

Suhas_Pote answered Oct 26 '22 07:10


From David Hamann's site (all credit goes to him): https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/

Use xlwings. Opening the file first launches the Excel application, so you can enter the password there.

import pandas as pd
import xlwings as xw

PATH = '/Users/me/Desktop/xlwings_sample.xlsx'
wb = xw.Book(PATH)
sheet = wb.sheets['sample']

df = sheet['A1:C4'].options(pd.DataFrame, index=False, header=True).value
df
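
Without the pd.DataFrame converter, the value of a 2-D range comes back as plain nested rows: a list of lists in xlwings, and, as far as I know, a tuple of tuples from Range.Value in the win32com approach from the question. A rough sketch of turning such rows into something pandas accepts, assuming the first row holds the column names; split_header and to_dataframe are hypothetical helper names:

```python
def split_header(values):
    """Split 2-D cell values into (column names, data rows)."""
    rows = [list(row) for row in values]
    if not rows:
        return [], []
    header, data = rows[0], rows[1:]
    # Empty header cells come back as None; give them positional names.
    columns = [name if name is not None else f"col{i}"
               for i, name in enumerate(header)]
    return columns, data


def to_dataframe(values):
    """Build a DataFrame from 2-D cell values whose first row is the header."""
    import pandas as pd  # imported lazily so split_header works without pandas

    columns, data = split_header(values)
    return pd.DataFrame(data, columns=columns)
```

For example, to_dataframe(sheet['A1:C4'].value) should give roughly the same frame as the converter-based line above, and to_dataframe(xlws.UsedRange.Value) would be the win32com counterpart, so there is no need to read cells one by one.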

Maurice answered Oct 26 '22 07:10