
pandas read excel as formatted

How do I get the values of a spreadsheet as they are formatted? I'm working on spreadsheets that use a currency format.

For example:

ITEM NAME UNIT PRICE
item1     USD 99
item2     SGD 45

The terms 'USD' and 'SGD' were added using Excel's number formatting, so they are not part of the cell values and the read_excel function of pandas does not see them. I get the numbers, but not the currency name. I can only work with the spreadsheets as they are, and since I have several spreadsheets with about 6-7 sheets each, I was hoping for a pandas-level (or Python-level) solution rather than an Excel-level one.

thanks guys.

To Daniel: this is how I implemented the 'xlrd' engine, which didn't seem to do anything.

import pandas as pd

# Note: xlrd 2.0+ reads only .xls files, so this runs only with an older xlrd.
excel = pd.ExcelFile('itemlist.xlsx', engine='xlrd')

# Read every sheet (column headers are on the third row) and stack them.
frames = [excel.parse(sheet, header=2) for sheet in excel.sheet_names]
master = pd.concat(frames)

carlo asked Jun 26 '16 12:06




1 Answer

There isn't a great way to do this. pandas has no knowledge of the number formats, and xlrd doesn't seem to be able to read formats from a .xlsx file - see here

You could use openpyxl for this. It at least has access to the number formats, but it looks like you'd basically have to implement all the parsing logic yourself.

In [26]: from openpyxl import load_workbook

In [27]: wb = load_workbook('temp.xlsx')

In [28]: ws = wb.worksheets[0]

In [29]: ws["B2"]  # numeric value = 4, formatted as "USD 4"
Out[29]: <Cell 'Sheet1'.B2>

In [30]: ws["B2"].value
Out[30]: 4

In [31]: ws["B2"].number_format
Out[31]: '"USD "#'
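
As a rough idea of what "implementing the parsing yourself" could look like for the simple case above, here is a minimal sketch. It assumes the currency name is stored as a double-quoted literal in the number format (as in '"USD "#'); other encodings, such as bracketed [$...] currency codes, are not handled. The names currency_label and read_column_with_labels are hypothetical helpers made up for this example, not part of pandas or openpyxl.

```python
import re

# Matches double-quoted literal text inside an Excel number-format code,
# e.g. the "USD " in '"USD "#'.
_QUOTED = re.compile(r'"([^"]*)"')


def currency_label(number_format):
    """Return the first quoted literal in a number format (stripped), or None."""
    match = _QUOTED.search(number_format)
    if match is None:
        return None
    label = match.group(1).strip()
    return label or None


def read_column_with_labels(path, sheet_index=0, column=2, header_rows=1):
    """Yield (value, label) pairs for one column of one sheet.

    Hypothetical helper: the defaults assume prices in column B under a
    single header row, which may not match your files.
    """
    # Third-party dependency, imported lazily so currency_label works alone.
    from openpyxl import load_workbook

    ws = load_workbook(path).worksheets[sheet_index]
    rows = ws.iter_rows(min_row=header_rows + 1, min_col=column, max_col=column)
    for (cell,) in rows:
        yield cell.value, currency_label(cell.number_format)
```

The (value, label) pairs could then be passed to pd.DataFrame(..., columns=[...]) and concatenated per sheet, so the rest of the workflow stays in pandas.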

chrisb answered Oct 05 '22 00:10