How do I import Excel data into a dataframe in Python?
Basically, the current Excel workbook runs some VBA on opening, which refreshes a pivot table and does some other stuff.
Then I wish to import the results of the pivot table refresh into a dataframe in Python for further analysis.
    import xlrd

    # raw string so the backslashes in the Windows path are not read as escapes
    wb = xlrd.open_workbook(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')

    # sheet names
    print(wb.sheet_names())

    # number of sheets
    print(wb.nsheets)
The refreshing and opening of the file works fine. But how do I select the data from the first sheet, from, say, row 5 (including the header) down to the last record n?
One option for this task is the openpyxl module. openpyxl is a Python library for reading and writing Excel files with the extensions xlsx, xlsm, xltx and xltm, so a Python program can both read and modify such workbooks.
The same idea works for csv files: import the pandas library as pd and read the file with pandas' read_csv function, passing skiprows=[0], which means skip the first line while reading the csv file data. Printing the resulting dataframe then shows the data without that first line.
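As a small illustration of skiprows=[0], here is a minimal sketch. The sample text and column names are made up for the example, and it assumes pandas is installed. It reads from an in-memory string rather than a file so it can be run as-is.

```python
import io

import pandas as pd

# Hypothetical sample data: one title line, then the real header and two records.
csv_text = "report generated by export\nname,qty\nbolt,10\nnut,25\n"

# skiprows=[0] skips only line 0; the next line ("name,qty") becomes the header.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[0])
print(df)
```

Note that a list such as [0] names the individual line numbers to skip, while an integer such as skiprows=4 skips that many lines from the top.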
Method 2: Reading an Excel file in Python using openpyxl. The load_workbook() function opens the Books.xlsx file for reading; the file name is passed as an argument to this function.
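A rough sketch of the openpyxl route, applied to the question's layout (header on row 5, data below it). The workbook contents here are invented stand-ins, written to a temporary folder first so the example runs on its own. It assumes openpyxl and pandas are installed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a small stand-in workbook (hypothetical data) so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), "Books.xlsx")
wb_out = Workbook()
ws_out = wb_out.active
for _ in range(4):
    ws_out.append(["filler"])      # rows 1-4: content to ignore
ws_out.append(["name", "qty"])     # row 5: header
ws_out.append(["bolt", 10])
ws_out.append(["nut", 25])
wb_out.save(path)

# data_only=True returns the cached cell values rather than formulas.
wb = load_workbook(path, data_only=True)
ws = wb.worksheets[0]              # first sheet, whatever its name

# Iterate from row 5 down to the last used row, as plain values.
rows = ws.iter_rows(min_row=5, values_only=True)
header = next(rows)                # row 5 holds the column names
df = pd.DataFrame(list(rows), columns=list(header))
print(df)
```

For a real file, only the part from load_workbook onward is needed, with the path pointing at the existing workbook.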
You can use pandas' ExcelFile.parse method to read Excel sheets; see the io docs:

    import pandas as pd

    # raw string so the backslashes in the Windows path are not read as escapes
    xls = pd.ExcelFile(r'C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')
    df = xls.parse('Sheet1', skiprows=4, index_col=None, na_values=['NA'])

skiprows=4 will ignore the first 4 rows (i.e. start at row index 4), so row 5 is read as the header. parse accepts several other options as well.
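To try the skiprows behaviour without the original workbook, here is a minimal sketch that writes a small stand-in file and reads it back. The file name, sheet contents and column names are assumptions made for the example. It assumes pandas with openpyxl installed for xlsx support.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real workbook: four filler rows, header on row 5.
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
raw = pd.DataFrame([
    ["note 1", None], ["note 2", None], ["note 3", None], ["note 4", None],
    ["name", "qty"],
    ["bolt", 10],
    ["nut", 25],
])
raw.to_excel(path, sheet_name="Sheet1", header=False, index=False)

with pd.ExcelFile(path) as xls:
    # Select the sheet by name, as in the answer above...
    by_name = xls.parse("Sheet1", skiprows=4, index_col=None, na_values=["NA"])
    # ...or by position: 0 is the first sheet, whatever it is called.
    by_position = xls.parse(0, skiprows=4)

print(by_name)
```

Selecting by position may be the better fit for "the first sheet" when the sheet name is not known in advance.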