I have an Excel sheet with many rows, and the rows after the first empty row are filled with garbage values. Is there a way to read only the records before the first empty row using Python pandas?
I am not aware of a read_excel option that does this. When an empty row is imported from Excel, every column value in that row is filled with NaN, so you can select the rows up to the first row that is entirely NaN.
I am assuming your data looks something like this: an empty row with garbage after it (I included several empty rows, each followed by garbage).
 
import pandas as pd

df = pd.read_excel(r'Book1.xlsx')  # read the file
print(df)
'''
   col1 col2 col3
0     1    2    3
1     1    2    3
2     1    2    3
3     1    2    3
....
10    1    2    3
11  NaN  NaN  NaN
12    x    x    x
....
18  NaN  NaN  NaN
19  NaN  NaN  NaN
20    y    y    y
21    y    y    y
....
'''
first_row_with_all_NaN = df[df.isnull().all(axis=1)].index[0]
# index label of the first row in which every value is NaN
# (raises IndexError if the sheet has no completely empty row)
'''
11
'''
print(df.loc[0:first_row_with_all_NaN - 1])
# .loc slices by label and includes both ends, so this selects rows 0 up to
# the row just before the first all-NaN row (assumes the default integer index)
'''
   col1 col2 col3
0     1    2    3
1     1    2    3
2     1    2    3
3     1    2    3
4     1    2    3
5     1    2    3
6     1    2    3
7     1    2    3
8     1    2    3
9     1    2    3
10    1    2    3
'''
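If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name rows_before_first_blank and the in-memory sample frame are made up for illustration (the sample stands in for the result of pd.read_excel), and it slices by position with iloc so it does not depend on the default integer index. It also returns the frame unchanged when there is no empty row, a case the one-liner above does not handle.

```python
import numpy as np
import pandas as pd


def rows_before_first_blank(df):
    """Return the rows of df that come before the first all-NaN row.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Boolean array: True where every column in the row is NaN
    blank = df.isna().all(axis=1).to_numpy()
    if not blank.any():
        # No completely empty row, so there is nothing to cut off
        return df
    # argmax on a boolean array gives the position of the first True
    first_blank_pos = int(blank.argmax())
    return df.iloc[:first_blank_pos]


# Stand-in for pd.read_excel(r'Book1.xlsx'): two good rows, an empty row,
# then garbage separated by another empty row
sample = pd.DataFrame({
    "col1": [1, 1, np.nan, "x", np.nan, "y"],
    "col2": [2, 2, np.nan, "x", np.nan, "y"],
    "col3": [3, 3, np.nan, "x", np.nan, "y"],
})

clean = rows_before_first_blank(sample)
print(clean)
```

With a real workbook you would call rows_before_first_blank(pd.read_excel(r'Book1.xlsx')) instead of building the sample frame.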