
Python Pandas dataframe reading exact specified range in an excel sheet

I have many different tables (and other unstructured data) in an Excel sheet. I need to create a DataFrame from the range 'A3:D20' in 'Sheet2' of the Excel workbook 'data'.

All the examples I have come across drill down to the sheet level, but none show how to pick out an exact range.

    import openpyxl
    import pandas as pd

    wb = openpyxl.load_workbook('data.xlsx')
    sheet = wb.get_sheet_by_name('Sheet2')
    range = ['A3':'D20']   #<-- how to specify this?
    spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this?

    print (spots)

Once I get this, I plan to look up data in column A and find its corresponding value in column B.

Edit 1: I realised that openpyxl takes too long, and so have changed that to pandas.read_excel('data.xlsx','Sheet2') instead, and it is much faster at that stage at least.

Edit 2: For the time being, I have put my data in just one sheet and:

  • removed all other info
  • added column names,
  • applied index_col on my leftmost column
  • then used wb.loc[]
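
A minimal sketch of the lookup described in Edit 2, assuming the leftmost column has become the index. The column names `name` and `value` and the sample rows are hypothetical stand-ins for columns A and B of the sheet:

```python
import pandas as pd

# Hypothetical data standing in for columns A and B of the sheet.
df = pd.DataFrame({"name": ["alpha", "beta", "gamma"], "value": [10, 20, 30]})

# Making column A the index is what index_col does at read time.
lookup = df.set_index("name")

# Look up a key from column A and fetch its value from column B.
beta_value = lookup.loc["beta", "value"]
```

When reading from a file, passing `index_col=0` to `pandas.read_excel` should give the same indexed frame without the separate `set_index` step.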
asked Jul 25 '16 06:07 by spiff




2 Answers

Use the following arguments from the pandas read_excel documentation:

  • skiprows : list-like
    • Rows to skip at the beginning (0-indexed)
  • parse_cols : int or list, default None (renamed usecols in pandas 0.21; the old name was later removed)
    • If None then parse all columns,
    • If int then indicates last column to be parsed
    • If list of ints then indicates list of column numbers to be parsed
    • If string then indicates comma separated list of column names and column ranges (e.g. “A:E” or “A,C,E:F”)

I imagine the call will look like:

    import pandas as pd
    df = pd.read_excel(filename, 'Sheet2', skiprows=2, usecols='A:D')  # usecols is the current name of parse_cols
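
To also stop at row 20, `read_excel` accepts an `nrows` argument. The sketch below turns a range string such as 'A3:D20' into `skiprows`, `nrows` and `usecols`. The helper name `read_excel_range` is hypothetical, and it assumes the range holds only data with no header row (hence `header=None`); the bounds are parsed with `range_boundaries` from `openpyxl.utils`:

```python
import pandas as pd
from openpyxl.utils import get_column_letter, range_boundaries


def read_excel_range(path, sheet_name, cell_range):
    """Read a rectangular range such as 'A3:D20' into a DataFrame.

    Assumes the range contains data only (no header row).
    """
    # range_boundaries returns 1-based (min_col, min_row, max_col, max_row).
    min_col, min_row, max_col, max_row = range_boundaries(cell_range)
    usecols = f"{get_column_letter(min_col)}:{get_column_letter(max_col)}"
    return pd.read_excel(
        path,
        sheet_name=sheet_name,
        header=None,                  # treat every row in the range as data
        skiprows=min_row - 1,         # skip the rows above the range
        nrows=max_row - min_row + 1,  # stop at the last row of the range
        usecols=usecols,              # e.g. "A:D"
    )
```

A call such as `read_excel_range('data.xlsx', 'Sheet2', 'A3:D20')` would then read 18 rows and 4 columns. If row 3 actually holds column names, dropping `header=None` and reducing `nrows` by one is probably what you want.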
answered Sep 18 '22 07:09 by shane


One way to do this is to use the openpyxl module.

Here's an example:

    from openpyxl import load_workbook

    wb = load_workbook(filename='data.xlsx',
                       read_only=True)

    ws = wb['Sheet2']

    # Read the cell values into a list of lists
    data_rows = []
    for row in ws['A3':'D20']:
        data_cols = []
        for cell in row:
            data_cols.append(cell.value)
        data_rows.append(data_cols)

    # Transform into dataframe
    import pandas as pd
    df = pd.DataFrame(data_rows)
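
The same loop can be wrapped in a small function that optionally treats the first row of the range as column names. This is a sketch only: the name `range_to_frame` and its `first_row_is_header` flag are hypothetical, and `ws` is assumed to be an openpyxl worksheet such as `wb['Sheet2']`:

```python
import pandas as pd


def range_to_frame(ws, cell_range, first_row_is_header=False):
    """Copy the cell values of a range like 'A3:D20' into a DataFrame."""
    # ws[cell_range] yields one tuple of cells per row of the range.
    rows = [[cell.value for cell in row] for row in ws[cell_range]]
    if first_row_is_header:
        # Use the first row of the range as column names.
        return pd.DataFrame(rows[1:], columns=rows[0])
    return pd.DataFrame(rows)
```

With the worksheet from above, `range_to_frame(ws, 'A3:D20')` should give the same frame as the explicit loop, and `first_row_is_header=True` is useful when row 3 contains the column names.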
answered Sep 18 '22 07:09 by DocZerø