I have a lot of different table (and other unstructured data in an excel sheet) .. I need to create a dataframe out of range 'A3:D20' from 'Sheet2' of Excel sheet 'data'.
All examples that I come across drilldown up to sheet level, but not how to pick it from an exact range.
import openpyxl import pandas as pd wb = openpyxl.load_workbook('data.xlsx') sheet = wb.get_sheet_by_name('Sheet2') range = ['A3':'D20'] #<-- how to specify this? spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this? print (spots)
Once I get this, I plan to look up data in column A and find its corresponding value in column B.
Edit 1: I realised that openpyxl takes too long, and so have changed that to pandas.read_excel('data.xlsx','Sheet2')
instead, and it is much faster at that stage at least.
Edit 2: For the time being, I have put my data in just one sheet and:
index_col
on my leftmost columnwb.loc[]
To tell pandas to start reading an Excel sheet from a specific row, use the argument header = 0-indexed row where to start reading. By default, header=0, and the first such row is used to give the names of the data frame columns. To skip rows at the end of a sheet, use skipfooter = number of rows to skip.
Select Data Using Location Index (. This means that you can use dataframe. iloc[0:1, 0:1] to select the cell value at the intersection of the first row and first column of the dataframe. You can expand the range for either the row index or column index to select more data.
Use the following arguments from pandas read_excel documentation:
- skiprows : list-like
- Rows to skip at the beginning (0-indexed)
- parse_cols : int or list, default None
- If None then parse all columns,
- If int then indicates last column to be parsed
- If list of ints then indicates list of column numbers to be parsed
- If string then indicates comma separated list of column names and column ranges (e.g. “A:E” or “A,C,E:F”)
I imagine the call will look like:
df = read_excel(filename, 'Sheet2', skiprows = 2, parse_cols = 'A:D')
One way to do this is to use the openpyxl module.
Here's an example:
from openpyxl import load_workbook wb = load_workbook(filename='data.xlsx', read_only=True) ws = wb['Sheet2'] # Read the cell values into a list of lists data_rows = [] for row in ws['A3':'D20']: data_cols = [] for cell in row: data_cols.append(cell.value) data_rows.append(data_cols) # Transform into dataframe import pandas as pd df = pd.DataFrame(data_rows)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With