 

Skipping range of rows after header through pandas.read_excel

I know the argument usecols in pandas.read_excel() allows you to select specific columns.

Say I read an Excel file in with pandas.read_excel(). My Excel spreadsheet has 1161 rows. I want to keep the 1st row (with index 0) and skip rows 2:337. It seems the skiprows argument only works with 0-based indexing. I may be wrong, but every run of my code reads all 1161 rows rather than only the rows after the 337th. For example:

documentationscore_dataframe = pd.read_excel("Documentation Score Card_17DEC2015 Rev 2 17JAN2017.xlsx",
                                        sheet_name = "Sheet1",
                                        skiprows = "336",
                                        usecols = "H:BD")

Here is another attempt:

documentationscore_dataframe = pd.read_excel("Documentation Score Card_17DEC2015 Rev 2 17JAN2017.xlsx",
                                        sheet_name = "Sheet1",
                                        skiprows = "1:336",
                                        usecols = "H:BD")

I would like the dataframe to exclude rows 2 through 337 in the original Excel import.

florence-y asked Apr 12 '18 15:04




1 Answer

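As per the documentation for pandas.read_excel, skiprows must be list-like (recent pandas versions also accept an integer or a callable). A string such as "336" or "1:336" is not parsed as a row range.

Try this instead to exclude the rows at 0-based positions 1 to 336 inclusive (Excel rows 2 through 337):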

import pandas as pd

# Skip 0-indexed rows 1..336 (Excel rows 2..337); row 0 stays as the header.
df = pd.read_excel("file.xlsx",
                   sheet_name = "Sheet1",
                   skiprows = range(1, 337),
                   usecols = "H:BD")

Note: a range object is considered list-like for this purpose, so no explicit list conversion is necessary.
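
To see how the 0-based skiprows values line up with the rows that survive, here is a minimal sketch on made-up data. It uses pd.read_csv on in-memory text only so that it runs without an Excel file, on the assumption that skiprows follows the same 0-based convention in read_excel. The column name "value" and the ten data rows are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the spreadsheet: a header line plus 10 data lines.
# Line 0 is the header; lines 1..10 hold the values 1..10.
csv_text = "value\n" + "\n".join(str(n) for n in range(1, 11))

# Skip 0-indexed lines 1..4 (the 2nd through 5th lines of the file),
# keeping line 0 as the header.
kept = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 5))

# Callable form: a line is skipped when the function returns True.
kept_callable = pd.read_csv(io.StringIO(csv_text),
                            skiprows=lambda i: 1 <= i <= 4)

print(kept["value"].tolist())
```

Both calls keep the header and drop the four lines after it, so the general pattern for skipping Excel rows a through b (1-based) is skiprows = range(a - 1, b).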

jpp answered Oct 04 '22 22:10