I want to read a dataframe from a csv file where the header is not in the first line. For example:
In [1]: import pandas as pd
In [2]: import io
In [3]: temp=u"""#Comment 1
...: #Comment 2
...:
...: #The previous line is empty
...: Header1|Header2|Header3
...: 1|2|3
...: 4|5|6
...: 7|8|9"""
In [4]: df = pd.read_csv(io.StringIO(temp), sep="|", comment="#",
...: skiprows=4).dropna()
In [5]: df
Out[5]:
   Header1  Header2  Header3
0        1        2        3
1        4        5        6
2        7        8        9
[3 rows x 3 columns]
The problem with the above code is that I don't know how many lines will exist before the header; therefore, I cannot use skiprows=4 as I did here.
I am aware I can iterate through the file, as in the question "Read pandas dataframe from csv beginning with non-fix header".
What I am looking for is a simpler solution, like making pandas.read_csv disregard any empty line and take the first non-empty line as the header.
If you're looking to drop rows (or columns) containing empty data, you're in luck: pandas' dropna() method is designed for exactly this. You can call df.dropna() without any parameters, which by default drops every row that contains at least one missing value; pass how="all" to drop only rows that are completely empty.
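As a rough illustration of the difference between the default behaviour and how="all", here is a minimal sketch. The frame `demo` and its column names are made up for this example and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 is partly missing,
# row 2 is entirely missing.
demo = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                     "b": [2.0, 5.0, np.nan]})

# Default (how="any"): drop every row with at least one missing value.
any_dropped = demo.dropna()

# how="all": drop only rows in which every value is missing.
all_dropped = demo.dropna(how="all")

print(any_dropped)
print(all_dropped)
```

With this input, the default call keeps only row 0, while how="all" keeps rows 0 and 1.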
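You need to set skip_blank_lines=True (this is already the default in current pandas versions). Combined with comment="#", read_csv then ignores both blank lines and fully commented lines and takes the first remaining line as the header, so skiprows is not needed: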
df = pd.read_csv(io.StringIO(temp), sep="|", comment="#", skip_blank_lines=True).dropna()
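For completeness, a self-contained sketch of this approach using the sample data from the question. It assumes that every line before the header is either blank or starts with "#"; a non-comment, non-blank preamble line would be picked up as the header instead.

```python
import io

import pandas as pd

# Sample data from the question: comment lines and a blank line
# precede the real header.
temp = """#Comment 1
#Comment 2

#The previous line is empty
Header1|Header2|Header3
1|2|3
4|5|6
7|8|9"""

# No skiprows: blank lines and fully commented lines are skipped,
# so the first remaining line becomes the header.
df = pd.read_csv(io.StringIO(temp), sep="|", comment="#",
                 skip_blank_lines=True)
print(df)
```

The trailing .dropna() from the original call is left out here because the sample contains no missing values; keep it if your real data may.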