Reading an Excel File
The read_excel function of the pandas library reads the contents of an Excel file into Python as a pandas DataFrame. The function can read files from the local file system when given the path to the file. By default, it reads the first sheet in the workbook (often named Sheet1).
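As a rough illustration of the default behaviour described above, the sketch below writes a small two-sheet workbook and reads it back. The file name, sheet names, and data are made up for the example, and it assumes pandas and openpyxl are installed.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook to read back (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="First", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Second", index=False)

# With no sheet_name argument, read_excel loads the first sheet (sheet_name=0).
first = pd.read_excel(path)

# A sheet can also be selected by name.
second = pd.read_excel(path, sheet_name="Second")

# sheet_name=None loads every sheet into a dict keyed by sheet name.
all_sheets = pd.read_excel(path, sheet_name=None)

print(first)
print(list(all_sheets))
```

Passing sheet_name=None is handy when you do not know the sheet names in advance.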
Importing CSV files in Python can be much faster than importing Excel files; figures of up to 100x are quoted. In one reported case the files loaded in 0.63 seconds, nearly 10 times faster than before.
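Speed differences like this depend heavily on the file and the machine, so it is worth measuring on your own data. Here is a minimal sketch of such a measurement; the file names and the 1000-row sample frame are invented for the example, and it assumes pandas and openpyxl are installed.

```python
import os
import tempfile
import time

import pandas as pd

# Hypothetical sample data written once as CSV and once as .xlsx.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
xlsx_path = os.path.join(tmp_dir, "data.xlsx")
frame = pd.DataFrame({"x": range(1000), "y": range(1000, 2000)})
frame.to_csv(csv_path, index=False)
frame.to_excel(xlsx_path, index=False, engine="openpyxl")

# Time each reader with a monotonic high-resolution clock.
start = time.perf_counter()
from_csv = pd.read_csv(csv_path)
csv_seconds = time.perf_counter() - start

start = time.perf_counter()
from_xlsx = pd.read_excel(xlsx_path)
xlsx_seconds = time.perf_counter() - start

print(f"CSV:   {csv_seconds:.4f} s")
print(f"Excel: {xlsx_seconds:.4f} s")
```

A single run is noisy; repeating the measurement a few times gives a more trustworthy comparison.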
We can read data from .xlsx files and write to them in Python using the openpyxl package, which is available from the Python Package Index (PyPI). Note that openpyxl handles the newer .xlsx format only; legacy .xls files need a different library such as xlrd.
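For illustration, a minimal write-then-read round trip with openpyxl might look like the following. The sheet title, file name, and cell values are placeholders chosen for the example.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "example.xlsx")

# Write: create a workbook, rename its active sheet, and append two rows.
wb = Workbook()
ws = wb.active
ws.title = "Data"
ws.append(["name", "score"])
ws.append(["alice", 90])
wb.save(path)

# Read: reopen the file and collect each row as a tuple of cell values.
wb2 = load_workbook(path, read_only=True)
rows = list(wb2["Data"].iter_rows(values_only=True))
wb2.close()  # read-only workbooks keep the file open until closed

print(rows)
```

read_only=True is optional here; it mainly helps with memory use on large files.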
I highly recommend xlrd for reading .xls files. But there are some limitations (refer to the xlrd GitHub page):
Warning
This library will no longer read anything other than .xls files. For alternatives that read newer file formats, please see http://www.python-excel.org/.
The following are also not supported but will safely and reliably be ignored:
- Charts, Macros, Pictures, any other embedded object, including embedded worksheets.
- VBA modules
- Formulas, but results of formula calculations are extracted.
- Comments
- Hyperlinks
- Autofilters, advanced filters, pivot tables, conditional formatting, data validation
Password-protected files are not supported and cannot be read by this library.
voyager mentioned the use of COM automation. Having done this myself a few years ago, be warned that doing this is a real PITA. The number of caveats is huge and the documentation is lacking and annoying. I ran into many weird bugs and gotchas, some of which took many hours to figure out.
UPDATE: For newer .xlsx files, the recommended library for reading and writing appears to be openpyxl (thanks, Ikar Pohorský).
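Since xlrd covers .xls and openpyxl covers .xlsx, one way to keep the two straight is to pick the library from the file extension. The helper below is only a sketch: pick_engine and ENGINE_BY_SUFFIX are hypothetical names, not part of either library, and the returned strings happen to match the engine names that pandas read_excel accepts.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the library that reads it.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",       # legacy binary format
    ".xlsx": "openpyxl",  # newer XML-based format
    ".xlsm": "openpyxl",  # macro-enabled variant of .xlsx
}


def pick_engine(filename):
    """Return the reader library name for an Excel file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported Excel extension: {suffix!r}") from None


print(pick_engine("yourfilename.xls"))
print(pick_engine("your_file_name.xlsx"))
```

The result could then be passed along, for example as pd.read_excel(path, engine=pick_engine(path)).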
Using pandas:
import pandas as pd
xls = pd.ExcelFile(r"yourfilename.xls")  # the r prefix (raw string) keeps backslashes in Windows paths literal
sheetX = xls.parse(2)  # 2 is the zero-based sheet index (the third sheet); use 0 if the file has only one sheet
var1 = sheetX['ColumnName']
print(var1[1])  # 1 is the row label; with the default index this is the second data row
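Because the sheet index is zero-based, it can help to list the sheet names before parsing. The sketch below builds a three-sheet workbook with invented names and data, then shows that parse(2) returns the third sheet; it assumes pandas and openpyxl are installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical three-sheet workbook so the indexing can be checked.
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    for name in ("Jan", "Feb", "Mar"):
        data = pd.DataFrame({"ColumnName": [name + "-0", name + "-1"]})
        data.to_excel(writer, sheet_name=name, index=False)

with pd.ExcelFile(path) as xls:
    names = xls.sheet_names  # sheet names in workbook order
    third = xls.parse(2)     # zero-based index: 2 selects the third sheet

value = third["ColumnName"][1]  # row label 1 is the second data row

print(names)
print(value)
```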
You can choose any of the libraries listed at http://www.python-excel.org/
I would recommend the Python xlrd library.
install it using
pip install xlrd
import using
import xlrd
to open a workbook
workbook = xlrd.open_workbook('your_file_name.xls')  # xlrd 2.0 and later read only .xls files
open sheet by name
worksheet = workbook.sheet_by_name('Name of the Sheet')
open sheet by index
worksheet = workbook.sheet_by_index(0)
read cell value
worksheet.cell(0, 0).value
I think pandas is the best way to go. There is already one answer here that uses the pandas ExcelFile function, but it did not work properly for me. From here I found the read_excel function, which works just fine:
import pandas as pd
dfs = pd.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name")
print(dfs.head(10))
P.S. You need to have xlrd installed for the read_excel function to work.
Update 21-03-2020: As you may see here, there are issues with the xlrd engine and it is going to be deprecated. openpyxl is the best replacement. So as described here, the canonical syntax should be:
dfs = pd.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name", engine="openpyxl")