I have many rows in Excel, and after the first empty row the remaining rows are filled with garbage values. Is there a way to read only the records before the first empty row using Python pandas?
I am not aware of whether read_excel can do this directly. When you import an empty row from Excel, its column values are filled with NaN, so you can select the rows up to the first row in which every value is NaN.
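The same idea can be packed into a small helper. This is only a sketch, assuming that blank Excel rows arrive as all-NaN rows; the name `rows_before_first_blank` is hypothetical, and it uses a cumulative-sum mask instead of looking up the index explicitly, so it also copes with a sheet that has no empty row at all.

```python
import numpy as np
import pandas as pd


def rows_before_first_blank(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the rows before the first all-NaN row."""
    # True for every row whose values are all NaN
    blank = df.isna().all(axis=1)
    # cumsum() > 0 is True from the first blank row onwards
    at_or_after_blank = blank.cumsum() > 0
    # Keep the rows before that point; with no blank row, everything is kept
    return df[~at_or_after_blank]


# Small in-memory example with made-up values
df = pd.DataFrame({
    "col1": [1, 1, np.nan, "x"],
    "col2": [2, 2, np.nan, "x"],
    "col3": [3, 3, np.nan, "x"],
})
print(rows_before_first_blank(df))
```

In practice you would call it on the frame returned by `pd.read_excel(...)`.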
I am assuming your data looks something like this, with an empty row followed by garbage (I included several empty rows, each followed by more garbage):
import pandas as pd

df = pd.read_excel(r'Book1.xlsx')  # read the file
print(df)
'''
col1 col2 col3
0 1 2 3
1 1 2 3
2 1 2 3
3 1 2 3
....
10 1 2 3
11 NaN NaN NaN
12 x x x
....
18 NaN NaN NaN
19 NaN NaN NaN
20 y y y
21 y y y
....
'''
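If you want to try this without Book1.xlsx, a made-up frame with a similar layout can be built in memory. The row counts and values below are illustrative assumptions, not the actual contents of the sheet above.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for Book1.xlsx:
# 11 good rows, one blank row, garbage, two more blank rows, more garbage
good = [[1, 2, 3]] * 11
blank = [[np.nan, np.nan, np.nan]]
garbage_x = [["x", "x", "x"]] * 3
garbage_y = [["y", "y", "y"]] * 2

rows = good + blank + garbage_x + blank + blank + garbage_y
df = pd.DataFrame(rows, columns=["col1", "col2", "col3"])
print(df)
```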
first_row_with_all_NaN = df[df.isnull().all(axis=1)].index[0]
# index label of the first row whose values are all NaN (raises IndexError if there is no such row)
'''
11
'''
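The lookup above fails with an IndexError when the sheet has no empty row. A sketch that returns None in that case instead (the name `first_blank_position` is hypothetical) could look like this; it returns a row position rather than an index label.

```python
import numpy as np
import pandas as pd


def first_blank_position(df: pd.DataFrame):
    """Hypothetical helper: position of the first all-NaN row, or None if there is none."""
    blank = df.isna().all(axis=1).to_numpy()
    if not blank.any():
        return None  # no empty row: treat the whole sheet as valid data
    # argmax on a boolean array gives the position of the first True
    return int(blank.argmax())


df = pd.DataFrame({"col1": [1, 1, np.nan, "x"], "col2": [2, 2, np.nan, "x"]})
print(first_blank_position(df))  # → 2
```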
print(df.loc[0:first_row_with_all_NaN - 1])
# select rows 0 through first_row_with_all_NaN - 1 (loc slicing includes the end label)
'''
col1 col2 col3
0 1 2 3
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 1 2 3
6 1 2 3
7 1 2 3
8 1 2 3
9 1 2 3
10 1 2 3
'''
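The `loc` slice relies on the default 0, 1, 2, ... index. If the frame has other labels (for example, if a column was used as the index), a positional slice with `iloc` is safer. This is a sketch with a made-up frame and string labels chosen only for illustration; note that `iloc` excludes the stop position, while `loc` includes the end label.

```python
import numpy as np
import pandas as pd

# Made-up frame with a non-numeric index
df = pd.DataFrame(
    {"col2": [2, 2, np.nan, "x"], "col3": [3, 3, np.nan, "x"]},
    index=["a", "b", "c", "d"],
)

blank = df.isna().all(axis=1).to_numpy()
# Position of the first all-NaN row, or the full length if there is none
stop = int(blank.argmax()) if blank.any() else len(df)

# Positional slice: rows 0 .. stop-1
valid = df.iloc[:stop]
print(valid)
```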