
pandas read_csv: ignore trailing lines with empty data

Tags: python, pandas

I would like to read the following data from a csv file:

id;type;start;end
Test;OIS;01/07/2016;01/07/2018
;;;
;;;

However, pandas read_csv will try reading the empty lines ;;; as well. Is there a way to automatically ignore these trailing lines of empty data?

These lines are causing a problem because I am using read_csv with converters, and the functions in the converters will dutifully throw an exception when they encounter invalid data, meaning I don't even arrive at a valid dataframe. I could change the functions to convert invalid data to NaN and then drop NaNs from the dataframe, but then I would silently be dropping erroneous data as well as those empty lines.

Some clarifications:

  • The lines of empty data will always be trailing; it's a common problem with CSV files generated from Excel.
  • The data is user-generated so manually cleaning the file is not an option.
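For reference, the failure described above can be reproduced with a short sketch. The converter name parse_date and the dd/mm/YYYY date format are assumptions made for illustration; the original converters are not shown in the question.

```python
import io
from datetime import datetime

import pandas as pd

# Sample data from the question, including the trailing ";;;" lines.
RAW = "id;type;start;end\nTest;OIS;01/07/2016;01/07/2018\n;;;\n;;;\n"


def parse_date(value):
    # Hypothetical strict converter: raises ValueError on anything
    # that is not a dd/mm/YYYY date, including the empty string.
    return datetime.strptime(value, "%d/%m/%Y")


def try_read(text):
    """Return the parsed DataFrame, or the ValueError raised by a converter."""
    try:
        return pd.read_csv(
            io.StringIO(text),
            sep=";",
            converters={"start": parse_date, "end": parse_date},
        )
    except ValueError as exc:
        return exc


print(repr(try_read(RAW)))
```

The ";;;" lines are not blank lines, so read_csv's skip_blank_lines option does not skip them, and the converter is called on their empty fields.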
Anne asked Jul 01 '15 11:07


1 Answer

Not sure you can do it directly with read_csv, but you can use dropna:

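import pandas as pd

# Read the semicolon-separated file; the ";;;" lines come in as all-NaN rows.
df = pd.read_csv("in.csv", delimiter=";")
# Drop only the rows in which every column is NaN.
df.dropna(how="all", inplace=True)
print(df)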
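Note that dropna only runs after read_csv has returned, so it does not help if a strict converter raises while parsing. One possible workaround, sketched here under assumptions, is to strip the trailing separator-only lines before handing the text to read_csv. The helper read_without_trailing_empty_rows and the converter parse_date are hypothetical names, and the dd/mm/YYYY format is a guess at the data's convention.

```python
import io
from datetime import datetime

import pandas as pd


def parse_date(value):
    # Hypothetical strict converter: raises ValueError for anything
    # that is not a dd/mm/YYYY date.
    return datetime.strptime(value, "%d/%m/%Y")


def read_without_trailing_empty_rows(source, sep=";", **kwargs):
    """Drop trailing lines made up only of separators/whitespace, then parse.

    `source` is an open text file or any object with a .read() method.
    """
    lines = source.read().splitlines()
    # Walk back from the end while the last line has no real content.
    while lines and not lines[-1].replace(sep, "").strip():
        lines.pop()
    return pd.read_csv(io.StringIO("\n".join(lines)), sep=sep, **kwargs)


# In-memory stand-in for the file shown in the question.
raw = io.StringIO(
    "id;type;start;end\nTest;OIS;01/07/2016;01/07/2018\n;;;\n;;;\n"
)
df = read_without_trailing_empty_rows(
    raw, converters={"start": parse_date, "end": parse_date}
)
print(df)
```

Only trailing lines are removed, so an empty or malformed row in the middle of the file still reaches the converters and raises, rather than being dropped silently. The line splitting is naive and would not handle quoted fields that contain newlines.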
Padraic Cunningham answered Oct 25 '22 16:10