Pandas: read_csv ignore rows after a blank line

Tags: python, pandas

I have an odd .csv file that looks something like this:

header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33

That part is fine, but after these lines there is always a blank line followed by lots of useless lines. The whole file looks something like this:


header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33

dhjsakfjkldsa
fasdfggfhjhgsdfgds
gsdgffsdgfdgsdfgs
gsdfdgsg

The number of lines at the bottom varies from file to file; the only marker is the empty line before them.

Pandas has a parameter "skipfooter" for ignoring a known number of rows in the footer, but here the number of rows is not known in advance.

Any idea how to ignore these rows without actually opening the file (open()...) and removing them?

Thiago Melo asked Dec 08 '16 17:12



1 Answer

read_csv has no option to stop reading at the first blank line. It cannot accept or reject lines based on arbitrary conditions. It can only skip blank lines (optional, via skip_blank_lines) or rows that break the expected shape of the data, meaning rows with more separators than the header (on_bad_lines='skip' in recent pandas versions, error_bad_lines=False in older ones).
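That said, the blank-line handling can be turned into a workaround. This is only a sketch, not a built-in feature: with skip_blank_lines=False the blank line is kept as a row of NaN values, so you can read everything and then cut the DataFrame at the first all-NaN row. The `raw` string and the io.StringIO wrapper are stand-ins for the real file, and the sketch assumes the junk lines do not contain more separators than the header (otherwise read_csv raises a parser error by default).

```python
import io

import pandas as pd

# Stand-in for the real file: a table, one blank line, then junk.
raw = (
    "header1,header2,header3\n"
    "val11,val12,val13\n"
    "val21,val22,val23\n"
    "val31,val32,val33\n"
    "\n"
    "dhjsakfjkldsa\n"
    "fasdfggfhjhgsdfgds\n"
)

# skip_blank_lines=False keeps the blank line as a row of NaN values.
df = pd.read_csv(io.StringIO(raw), skip_blank_lines=False)

# True for rows where every column is NaN (i.e. the blank line).
is_blank = df.isna().all(axis=1)

# Keep only the rows before the first blank one, if there is one.
if is_blank.any():
    first_blank = is_blank.to_numpy().argmax()  # position of the first True
    df = df.iloc[:first_blank]

print(df)
```

The whole file is still parsed, so this trims the result rather than stopping the read early.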

You can clean up the data with either of the approaches below (pure pandas, without parsing the file yourself):

  1. Knowing the number of desired/trash data rows. [Manual]

    pd.read_csv('file.csv', nrows=3)
    pd.read_csv('file.csv', skipfooter=4, engine='python')  # or count the junk rows instead

  2. Keeping the desired data by dropping everything else from the DataFrame. The junk lines have fewer fields than the header, so they end up with NaN values. [Automatic]

    df.dropna(axis=0, how='any', inplace=True)

The results will be:

  header1 header2 header3
0   val11   val12   val13
1   val21   val22   val23
2   val31   val32   val33
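
For reference, here is approach 2 run end to end on the sample from the question. It is a minimal sketch: the `raw` string and io.StringIO stand in for 'file.csv', and it assumes the real data has no missing values of its own, since dropna(how='any') would drop those rows too. A junk line that happens to have exactly three fields would survive, and one with more than three would make read_csv raise a parser error by default.

```python
import io

import pandas as pd

# Stand-in for 'file.csv': the table, a blank line, then junk lines.
raw = (
    "header1,header2,header3\n"
    "val11,val12,val13\n"
    "val21,val22,val23\n"
    "val31,val32,val33\n"
    "\n"
    "dhjsakfjkldsa\n"
    "fasdfggfhjhgsdfgds\n"
    "gsdgffsdgfdgsdfgs\n"
    "gsdfdgsg\n"
)

# The blank line is skipped by default; each junk line becomes a row
# with a value in header1 and NaN in header2 and header3.
df = pd.read_csv(io.StringIO(raw))

# Drop every row that has at least one NaN, then renumber the index.
df = df.dropna(axis=0, how="any").reset_index(drop=True)

print(df)
```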

amin answered Oct 02 '22 20:10