There is a weird .csv file, something like:
header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33
It's pretty fine, but after these lines there is always a blank line followed by lots of useless lines. The whole file looks something like:
header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33

dhjsakfjkldsa
fasdfggfhjhgsdfgds
gsdgffsdgfdgsdfgs
gsdfdgsg
The number of lines at the bottom is totally random; the only marker is the empty line before them.
Pandas has a parameter "skipfooter" for ignoring a known number of rows in the footer.
Any idea how to ignore these rows without actually opening (open()...) the file and removing them?
If you're looking to drop rows (or columns) containing empty data, you're in luck: Pandas' dropna() method is made for exactly this. You can even run df.dropna() without any parameters, which defaults to dropping every row that contains at least one missing value.
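For instance (a minimal sketch with a hand-built DataFrame; the NaN placement mimics how the junk lines from the question would parse, with only the first column filled):

```python
import numpy as np
import pandas as pd

# Hand-built frame mimicking the parsed CSV: the junk line only fills
# the first column, so the remaining columns come out as NaN
df = pd.DataFrame({
    'header1': ['val11', 'val21', 'val31', 'dhjsakfjkldsa'],
    'header2': ['val12', 'val22', 'val32', np.nan],
    'header3': ['val13', 'val23', 'val33', np.nan],
})

# Default behaviour: how='any' drops every row with at least one NaN
clean = df.dropna()
```

After this, clean holds only the three complete rows; the junk row is gone because its header2/header3 values are missing.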
The read_csv method, by default, skips blank lines (skip_blank_lines=True) and keeps reading everything after them, including the junk rows.
You can use the drop function to delete rows and columns in a Pandas DataFrame.
There is no option to make read_csv stop at the first blank line. The parser cannot accept or reject lines based on arbitrary conditions; it can only skip blank lines (optionally) or reject rows that break the expected shape of the data (rows with extra separators).
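If you do want to cut at the blank line specifically (rather than drop every incomplete row), one workaround is to keep the blank line by reading with skip_blank_lines=False, then slice everything before the first all-NaN row. This is a sketch, not a built-in read_csv feature, and it assumes the default RangeIndex so labels match positions:

```python
import io
import pandas as pd

# Stand-in for the real file, reproducing the question's layout
raw = """header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33

dhjsakfjkldsa
fasdfggfhjhgsdfgds
"""

# Keep blank lines so the separator line shows up as an all-NaN row
df = pd.read_csv(io.StringIO(raw), skip_blank_lines=False)

blank = df.isna().all(axis=1)
if blank.any():
    # idxmax() gives the label of the first all-NaN row; with the
    # default RangeIndex that label equals the positional index
    df = df.iloc[:blank.idxmax()]
```

This keeps junk rows only if they appear before the blank line, which by the question's description never happens.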
You can normalize the data with the approaches below (without parsing the file yourself - pure pandas):
Knowing the number of the desired/trash data rows. [Manual]
pd.read_csv('file.csv', nrows=3)
or pd.read_csv('file.csv', skipfooter=4, engine='python') (skipfooter needs the Python parser engine)
Preserving the desired data by eliminating the others in the DataFrame. [Automatic]
df.dropna(axis=0, how='any', inplace=True)
The results will be:
header1 header2 header3
0 val11 val12 val13
1 val21 val22 val23
2 val31 val32 val33
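Putting the automatic approach together end-to-end (a sketch using an in-memory file in place of file.csv):

```python
import io
import pandas as pd

# Stand-in for file.csv with the question's layout
raw = """header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33

dhjsakfjkldsa
fasdfggfhjhgsdfgds
gsdgffsdgfdgsdfgs
"""

# The blank line is skipped by default; the junk lines parse with
# NaN in the missing columns
df = pd.read_csv(io.StringIO(raw))

# Drop every row that has any missing value
df.dropna(axis=0, how='any', inplace=True)
print(df)
```

One caveat: a junk line that happens to contain exactly three comma-separated fields would survive this filter, since dropna only removes rows with missing values.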