Pandas: reading Excel file starting from the row below that with a specific value

Tags:

python

pandas

excel

Say I have the following Excel file:

    A      B     C
0   -      -     -
1   Start  -     -
2   3      2     4
3   7      8     4
4   11     2     17

I want to read the file into a DataFrame, making sure that reading starts below the row that contains the Start value.

Attention: the Start value is not always located in the same row, so if I were to use:

import pandas as pd
xls = pd.ExcelFile(r'C:\Users\MyFolder\MyFile.xlsx')  # raw string so the backslashes are not treated as escapes
df = xls.parse('Sheet1', skiprows=4, index_col=None)

this would fail because skiprows is a fixed number of rows. Is there a workaround that lets xls.parse locate the starting row by its string value instead of by row number?

asked Apr 17 '18 10:04

FaCoffee

3 Answers

df = pd.read_excel('your/path/filename')

The following loop finds the row that contains 'Start' in df:

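row_start = None
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        if df.iat[row, col] == 'Start':  # the cell text is capitalized in the sheet
            row_start = row
            break
    if row_start is not None:
        break  # also leave the outer loop once the first match is found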

Once you have row_start, you can slice the DataFrame from that row onward:

df_required = df.loc[row_start:]

If you don't need the row containing 'Start', increment row_start by 1:

df_required = df.loc[row_start+1:]
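
The same search can also be written without explicit loops. The following is only a minimal sketch, assuming the sheet was read with header=None so that every sheet row is kept as data; the frame raw is a hypothetical stand-in for that result, built in memory so the example runs without a file.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path, header=None):
# every sheet row, including the one holding 'Start', is ordinary data.
raw = pd.DataFrame([
    [None, None, None],
    ["Start", None, None],
    [3, 2, 4],
    [7, 8, 4],
    [11, 2, 17],
])

# True for each row in which at least one cell equals the marker.
has_marker = raw.eq("Start").any(axis=1)
if not has_marker.any():
    raise ValueError("marker 'Start' not found")

# Position of the first matching row (argmax returns the first True).
row_start = int(has_marker.to_numpy().argmax())

# Keep everything below the marker row and renumber the index from 0.
df_required = raw.iloc[row_start + 1:].reset_index(drop=True)
print(df_required)
```

iloc is used here because row_start is a position, which keeps the slice correct even if the index is not the default 0, 1, 2, ...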

answered Nov 16 '22 03:11

Abhijit Ghate

If you know the specific rows you are interested in, you can skip rows from the top using skiprows and then parse only the row (or rows) you want using nrows (see the pandas.read_excel documentation):

df = pd.read_excel('myfile.xlsx', sheet_name='Sheet1', skiprows=2, nrows=3)
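
When the marker row is not known in advance, skiprows can be computed instead of hard-coded. The sketch below is an illustration under stated assumptions, not a definitive recipe: it uses pd.read_csv on in-memory text (the hypothetical SHEET string) so that it runs without a workbook. pd.read_excel accepts the same header, usecols and skiprows parameters, but how it treats completely empty rows should be checked against your own file before relying on the computed offset.

```python
import io
import pandas as pd

# Hypothetical sheet contents, written as CSV so the sketch needs no .xlsx file.
SHEET = "-,-,-\nStart,-,-\n3,2,4\n7,8,4\n11,2,17\n"

# Pass 1: read only the first column, without a header, to locate the marker.
probe = pd.read_csv(io.StringIO(SHEET), header=None, usecols=[0])
matches = probe.index[probe[0] == "Start"]
if len(matches) == 0:
    raise ValueError("marker 'Start' not found")
marker_pos = int(matches[0])

# Pass 2: skip everything up to and including the marker row.
df = pd.read_csv(
    io.StringIO(SHEET),
    header=None,
    names=["A", "B", "C"],
    skiprows=marker_pos + 1,
)
print(df)
```

Reading twice costs an extra pass over the data, but the second read parses only the rows below the marker, so the numeric columns get numeric dtypes directly.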

answered Nov 16 '22 03:11

bfree67

You could use pd.read_excel(r'C:\Users\MyFolder\MyFile.xlsx', sheet_name='Sheet1'), as it ignores empty Excel cells.

Your DataFrame should then look like this:

    A      B     C
0   Start  NaN   NaN
1   3      2     4
2   7      8     4
3   11     2     17

Then drop the first row and renumber the index (drop returns a new DataFrame, so assign the result):

df = df.drop([0]).reset_index(drop=True)

to get

    A      B     C
0   3      2     4
1   7      8     4
2   11     2     17
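
One detail worth checking after dropping the marker row is the column dtype. The sketch below, built on a hypothetical in-memory frame that mirrors the table above, shows that drop leaves the original frame untouched and that column A stays as generic objects until it is converted.

```python
import pandas as pd

# Hypothetical frame mirroring the table above, with 'Start' in the first row.
df = pd.DataFrame({
    "A": ["Start", 3, 7, 11],
    "B": [None, 2, 8, 2],
    "C": [None, 4, 4, 17],
})

# drop returns a new frame; reset_index renumbers the rows from 0.
trimmed = df.drop([0]).reset_index(drop=True)

# Column A held a string, so it is still object dtype; convert it explicitly.
trimmed["A"] = pd.to_numeric(trimmed["A"])
print(trimmed.dtypes)
```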

answered Nov 16 '22 02:11

Maxoz99
