Pandas: read_csv ignore rows after a blank line

Tags:

pandas

I have an odd .csv file that looks something like this:

header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33

That part is fine, but after these lines there is always a blank line followed by many useless lines. The whole file looks something like this:

header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33

dhjsakfjkldsa
fasdfggfhjhgsdfgds
gsdgffsdgfdgsdfgs
gsdfdgsg

The number of lines at the bottom varies unpredictably; the only marker is the empty line before them.

pandas has a "skipfooter" parameter for ignoring a known number of rows at the end of the file, but here the number is not known in advance.

Any idea how to ignore these rows without actually opening the file (open()...) and removing them?

asked Dec 08 '16 17:12

1 Answer

read_csv has no option to stop reading at the first blank line, and it cannot accept or reject lines based on their content. It can only skip blank lines (skip_blank_lines, which is on by default) or handle rows that break the shape of the data by having more separators than expected (on_bad_lines).
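To make the blank-line handling concrete, here is a minimal sketch that is not part of the original answer. It assumes the sample from the question held in memory (SAMPLE) and a hypothetical helper, read_until_blank_line. With skip_blank_lines=False the blank line is kept as a row of all NaN, and that row can serve as the cut marker.

```python
import io

import pandas as pd

# Assumption: the sample from the question, held in memory instead of on disk.
SAMPLE = (
    "header1,header2,header3\n"
    "val11,val12,val13\n"
    "val21,val22,val23\n"
    "val31,val32,val33\n"
    "\n"
    "dhjsakfjkldsa\n"
    "fasdfggfhjhgsdfgds\n"
    "gsdgffsdgfdgsdfgs\n"
    "gsdfdgsg\n"
)


def read_until_blank_line(source, **kwargs):
    """Hypothetical helper: keep only the rows above the first blank line."""
    # Keep blank lines instead of skipping them; each becomes an all-NaN row.
    df = pd.read_csv(source, skip_blank_lines=False, **kwargs)
    blank = df.isna().all(axis=1).to_numpy()
    if blank.any():
        # argmax on a boolean array gives the position of the first True.
        df = df.iloc[: blank.argmax()]
    return df


df = read_until_blank_line(io.StringIO(SAMPLE))
print(df)
```

A few caveats: the whole file is still parsed before being cut, a junk line with more fields than the header would still raise a parsing error, and a real data row whose fields are all empty would be mistaken for the blank line.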

You can clean up the data with either of the approaches below (pure pandas, without parsing the file yourself):

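Knowing the number of wanted or junk rows. [Manual]

pd.read_csv('file.csv', nrows=3) or pd.read_csv('file.csv', skipfooter=4, engine='python')

skipfooter is not supported by the default C engine, so passing engine='python' avoids a ParserWarning.

Keeping the wanted rows by dropping the others from the DataFrame. [Automatic]

df.dropna(axis=0, how='any', inplace=True)

This works here because the junk lines contain no commas, so they are parsed as rows with NaN in header2 and header3. Note that it would also drop any real row that has a missing value.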

The results will be:

  header1 header2 header3
0   val11   val12   val13
1   val21   val22   val23
2   val31   val32   val33
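
As a quick check of the approaches above, here is a small sketch that runs them on the sample from the question. The in-memory SAMPLE string and the variable names are assumptions for illustration; with a real file you would pass its path instead of io.StringIO(SAMPLE).

```python
import io

import pandas as pd

# Assumption: the sample from the question, held in memory instead of on disk.
SAMPLE = (
    "header1,header2,header3\n"
    "val11,val12,val13\n"
    "val21,val22,val23\n"
    "val31,val32,val33\n"
    "\n"
    "dhjsakfjkldsa\n"
    "fasdfggfhjhgsdfgds\n"
    "gsdgffsdgfdgsdfgs\n"
    "gsdfdgsg\n"
)

# Manual: keep only the first 3 data rows.
by_nrows = pd.read_csv(io.StringIO(SAMPLE), nrows=3)

# Manual: drop the last 4 lines (skipfooter needs the python engine).
by_footer = pd.read_csv(io.StringIO(SAMPLE), skipfooter=4, engine="python")

# Automatic: junk lines have no commas, so they get NaN in the other columns.
by_dropna = pd.read_csv(io.StringIO(SAMPLE)).dropna(axis=0, how="any")

for frame in (by_nrows, by_footer, by_dropna):
    print(frame, end="\n\n")
```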

answered Oct 02 '22 20:10

amin

Thiago Melo
