
Pandas: reading Excel file starting from the row below that with a specific value

Say I have the following Excel file:

    A      B     C
0   -      -     -
1   Start  -     -
2   3      2     4
3   7      8     4
4   11     2     17

I want to read the file into a DataFrame, starting from the row below the one that contains the Start value.

Attention: the Start value is not always located in the same row, so if I were to use:

import pandas as pd
# Raw string, so the backslashes in the Windows path are not read as escape sequences
xls = pd.ExcelFile(r'C:\Users\MyFolder\MyFile.xlsx')
df = xls.parse('Sheet1', skiprows=4, index_col=None)

this would fail, because skiprows has to be a fixed number. Is there a way to make xls.parse locate the starting row by the string value instead of by a row number?

FaCoffee asked Apr 17 '18 10:04


3 Answers

import pandas as pd

df = pd.read_excel('your/path/filename')

The loop below finds the row that contains 'Start' in df:

row_start = None
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        # The comparison is case-sensitive, so match the sheet's 'Start' exactly
        if df.iat[row, col] == 'Start':
            row_start = row
            break
    if row_start is not None:
        break  # stop scanning once the marker row is found

Once you have row_start, slice the DataFrame by position:

df_required = df.iloc[row_start:]

If you don't need the row containing 'Start', just increment row_start by 1:

df_required = df.iloc[row_start + 1:]
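The same search can be written without explicit loops. This is a minimal sketch, assuming the marker is the exact string 'Start' and that the sheet is already loaded into a DataFrame; rows_below_marker is a hypothetical helper name, and sample is an in-memory stand-in for the sheet in the question:

```python
import pandas as pd


def rows_below_marker(df, marker="Start"):
    """Return the rows that come after the first row containing `marker`."""
    # Boolean mask per row: True where any cell equals the marker
    hit_rows = df.eq(marker).any(axis=1).to_numpy().nonzero()[0]
    if hit_rows.size == 0:
        raise ValueError(f"marker {marker!r} not found")
    # Slice by position, starting one row below the first hit
    return df.iloc[int(hit_rows[0]) + 1:].reset_index(drop=True)


# Stand-in for the sheet in the question (empty cells become None/NaN)
sample = pd.DataFrame({
    "A": [None, "Start", 3, 7, 11],
    "B": [None, None, 2, 8, 2],
    "C": [None, None, 4, 4, 17],
})
below = rows_below_marker(sample)
```

Raising an error when the marker is missing is a design choice made here; returning an empty DataFrame would be an equally reasonable alternative.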
Abhijit Ghate answered Nov 16 '22 03:11


If you know the specific rows you are interested in, you can skip rows from the top using skiprows and then parse only the row (or rows) you want using nrows; see the pandas.read_excel documentation.

df = pd.read_excel('myfile.xlsx', sheet_name='Sheet1', skiprows=2, nrows=3)
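Here skiprows is hard-coded, whereas the question needs it to follow the marker. One possible way to derive it is a two-pass read: load the sheet once to locate the marker, then read again with the computed skiprows. This is only a sketch, assuming the marker is the literal string 'Start' and that both reads use header=None so that row positions line up; marker_skiprows and read_below_marker are hypothetical helper names, not pandas functions:

```python
import pandas as pd


def marker_skiprows(raw, marker="Start"):
    """Return how many leading rows to skip so reading resumes just below the marker.

    `raw` is assumed to be the sheet read with header=None.
    """
    hits = raw.eq(marker).any(axis=1).to_numpy().nonzero()[0]
    if hits.size == 0:
        raise ValueError(f"marker {marker!r} not found")
    return int(hits[0]) + 1


def read_below_marker(path, sheet_name=0, marker="Start"):
    """Hypothetical two-pass read: locate the marker, then re-read below it."""
    raw = pd.read_excel(path, sheet_name=sheet_name, header=None)
    skip = marker_skiprows(raw, marker)
    return pd.read_excel(path, sheet_name=sheet_name, header=None, skiprows=skip)
```

Reading the file twice costs extra I/O; for small sheets, slicing the first frame with iloc gives the same rows without the second read.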
bfree67 answered Nov 16 '22 03:11


You could use pd.read_excel(r'C:\Users\MyFolder\MyFile.xlsx', sheet_name='Sheet1'), as it ignores empty Excel cells.

Your DataFrame should then look like this:

    A      B     C
0   Start  NaN   NaN
1   3      2     4
2   7      8     4
3   11     2     17

Then drop the first row and reset the index by using

df = df.drop([0]).reset_index(drop=True)

to get

    A      B     C
0   3      2     4
1   7      8     4
2   11     2     17
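If the empty rows do end up in the DataFrame, they can be dropped explicitly before removing the marker row. A minimal sketch, assuming the only rows above the data are fully empty rows followed by the 'Start' row; sample is an in-memory stand-in for the sheet, not the real file:

```python
import pandas as pd

# Stand-in for the sheet in the question (empty cells become None/NaN)
sample = pd.DataFrame({
    "A": [None, "Start", 3, 7, 11],
    "B": [None, None, 2, 8, 2],
    "C": [None, None, 4, 4, 17],
})

cleaned = sample.dropna(how="all")        # drop rows in which every cell is empty
cleaned = cleaned.drop(cleaned.index[0])  # drop the first remaining row (assumed to be the marker row)
cleaned = cleaned.reset_index(drop=True)  # renumber the rows from 0
```

Note that drop and dropna return new DataFrames rather than modifying the original, which is why each result is assigned back to a name.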
Maxoz99 answered Nov 16 '22 02:11