How do I get the values of a spreadsheet as they are formatted? I'm working on spreadsheets with a currency format,
this for example:
ITEM NAME    UNIT PRICE
item1        USD 99
item2        SGD 45
But the terms 'USD' and 'SGD' were added using the formatting capabilities of Excel, and are not seen by the read_excel function of pandas. I would get the values, but not the currency name. I can only work on the spreadsheets as they are, and given that I have various spreadsheets with about 6-7 sheets each, I was hoping to have a pandas (or Python)-level solution rather than an Excel-level solution.
thanks guys.
To Daniel: this is how I implemented the 'xlrd' engine, which didn't seem to do anything.
excel = pd.ExcelFile('itemlist.xlsx', sheetname=None)
master = pd.DataFrame(None)
for sheet in excel.sheet_names:
    df = pd.read_excel(excel, sheet, header=2, engine='xlrd')
    master = master.append(df)
Use the pandas read_excel() function to read an Excel sheet into a pandas DataFrame. By default it loads the first sheet from the Excel file and parses the first row as the DataFrame column names. Excel files typically have the extension .xlsx.
To read an Excel file as a DataFrame, use the pandas read_excel() function. You can read the first sheet, specific sheets, multiple sheets or all sheets. Pandas converts the data to a DataFrame, which is a tabular structure.
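A minimal sketch of the "all sheets" case, assuming a recent pandas where the keyword is spelled sheet_name (older releases used sheetname). The helper names read_all_sheets and combine_sheets are made up for illustration. Passing sheet_name=None returns a dict mapping each sheet name to its DataFrame, and pd.concat can stack that dict in one call:

```python
import pandas as pd


def read_all_sheets(path, header=0):
    """Read every sheet of a workbook into a {sheet name: DataFrame} dict."""
    # sheet_name=None asks pandas for all sheets instead of only the first.
    return pd.read_excel(path, sheet_name=None, header=header)


def combine_sheets(sheets):
    """Stack the per-sheet frames and record each row's sheet in a 'sheet' column."""
    stacked = pd.concat(sheets, names=["sheet"])
    # Move the sheet name out of the index, then discard the old row numbers.
    return stacked.reset_index(level="sheet").reset_index(drop=True)
```

For the workbook in the question this would be called as master = combine_sheets(read_all_sheets('itemlist.xlsx', header=2)). The values come back as plain numbers; the cell number formats are not part of what read_excel returns.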
OpenPyXL is a Python library created for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files. It can read both the .xlsx and .xlsm file formats, and it includes support for charts, graphs, and other data visualizations.
There's not any great way to do this. pandas has no knowledge of the number formats, and xlrd doesn't seem to be able to read formats from a .xlsx file.
You could use openpyxl to accomplish this. It at least has access to the number formats, but it looks like you'd have to basically implement all the parsing logic yourself.
In [26]: from openpyxl import load_workbook
In [27]: wb = load_workbook('temp.xlsx')
In [28]: ws = wb.worksheets[0]
In [29]: ws["B2"] # numeric value = 4, formatted as "USD 4"
Out[29]: <Cell Sheet1.B2>
In [30]: ws["B2"].value
Out[30]: 4
In [31]: ws["B2"].number_format
Out[31]: '"USD "#'
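As a rough sketch of what "implement the parsing logic yourself" could look like, the snippet below pulls the first literal label out of a number format and pairs it with each cell value across all sheets. It is an illustration under assumptions, not a full number-format parser: it only recognises a quoted literal such as '"USD "#' or a bracketed currency tag such as '[$SGD-409] #,##0', and the names currency_label and read_with_currency, plus the value_column and first_row defaults, are hypothetical.

```python
import re

# First quoted literal ("USD ") or bracketed currency tag ([$SGD-409]) in a format.
_LABEL = re.compile(r'"([^"]*)"|\[\$([^\]-]*)')


def currency_label(number_format):
    """Return the literal label embedded in a number format, or '' if none."""
    match = _LABEL.search(number_format)
    if match is None:
        return ""
    return (match.group(1) or match.group(2) or "").strip()


def read_with_currency(path, value_column="B", first_row=2):
    """Yield (sheet title, label, value) for each filled cell in one column."""
    # Imported here so currency_label stays usable without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(path)
    for ws in wb.worksheets:
        for row in range(first_row, ws.max_row + 1):
            cell = ws[f"{value_column}{row}"]
            if cell.value is not None:
                yield ws.title, currency_label(cell.number_format), cell.value
```

The tuples can be fed straight into pandas, for example pd.DataFrame(read_with_currency('temp.xlsx'), columns=['sheet', 'currency', 'price']). Formats that express the currency some other way (a symbol outside quotes, conditional sections, and so on) would need extra handling.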