 

In pandas, what is the equivalent of read_csv()'s 'nrows' for read_excel()?

Tags:

python

pandas

I want to import only a certain range of data from an Excel spreadsheet (.xlsm format, as it has macros) into a pandas DataFrame. I was doing it this way:

data = pd.read_excel(filepath, header=0, skiprows=4, nrows=20, parse_cols="A:D")

But it seems that nrows works only with read_csv(). What would be the equivalent for read_excel()?

Gabriel asked Mar 02 '16 12:03



2 Answers

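If you know the number of rows in your Excel sheet, you can use the skip_footer parameter (spelled skipfooter in pandas 0.23 and later) to drop rows from the end of the sheet. read_excel() then reads the first n - skip_footer rows of your file, where n is the total number of rows.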

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

Usage:

data = pd.read_excel(filepath, header=0, parse_cols="A:D", skip_footer=80)

Assuming your Excel sheet has 100 rows, this line parses the first 20 rows of the sheet, the first of which is used as the header.
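The footer arithmetic gets less obvious once skiprows and a header row are involved, as in the question. Here is a minimal sketch of it. The helper name footer_rows_to_skip and the 100-row sheet are assumptions for illustration, not part of pandas, and the sketch assumes the total row count is known in advance and the sheet has no trailing blank rows.

```python
def footer_rows_to_skip(total_rows, skiprows=0, nrows=0, header_rows=1):
    """Return the footer count that leaves exactly `nrows` data rows.

    total_rows  : number of rows in the sheet (assumed known in advance)
    skiprows    : rows skipped at the top, before the header
    nrows       : data rows wanted after the header
    header_rows : rows consumed by the header (1 when header=0)
    """
    footer = total_rows - skiprows - header_rows - nrows
    if footer < 0:
        raise ValueError("sheet has fewer rows than requested")
    return footer


# The question's case on a hypothetical 100-row sheet:
# skip 4 rows, use row 5 as the header, keep 20 data rows.
skipfooter = footer_rows_to_skip(total_rows=100, skiprows=4, nrows=20)
print(skipfooter)  # 75
```

The result would then be passed to read_excel() together with skiprows=4 and header=0, as skip_footer on older pandas or skipfooter on newer versions.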

Erol answered Oct 10 '22 18:10



As noted in the documentation, nrows has been a built-in option of read_excel() since pandas version 0.23, and it works almost exactly as the OP wrote it.

The code

data = pd.read_excel(filepath, header=0, skiprows=4, nrows=20, usecols="A:D")

will now read the Excel file, take data from the first sheet (the default), skip the first 4 rows, take the next row (i.e., the fifth line of the sheet) as the header, read the following 20 rows of data into the DataFrame (lines 6-25), and use only columns A:D. Note that the keyword is spelled usecols, with no underscore; it replaces parse_cols, which is deprecated.
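The row numbers quoted above follow from simple arithmetic, sketched here in plain Python. The helper rows_read is hypothetical, not a pandas function, and it assumes an integer skiprows, an integer header, and a sheet without blank rows.

```python
def rows_read(skiprows, header, nrows):
    """Map read_excel-style arguments to 1-based sheet row numbers.

    Returns (header_row, first_data_row, last_data_row).
    """
    # header is a 0-based offset counted after the skipped rows
    header_row = skiprows + header + 1
    first_data_row = header_row + 1
    last_data_row = header_row + nrows
    return header_row, first_data_row, last_data_row


# The call from the answer: skiprows=4, header=0, nrows=20
print(rows_read(skiprows=4, header=0, nrows=20))  # (5, 6, 25)
```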

NathanielDavidChu answered Oct 10 '22 18:10
