How to skip an unknown number of empty lines before header on pandas.read_csv?

I want to read a dataframe from a csv file where the header is not in the first line. For example:

In [1]: import pandas as pd

In [2]: import io

In [3]: temp=u"""#Comment 1
   ...: #Comment 2
   ...: 
   ...: #The previous line is empty
   ...: Header1|Header2|Header3
   ...: 1|2|3
   ...: 4|5|6
   ...: 7|8|9"""

In [4]: df = pd.read_csv(io.StringIO(temp), sep="|", comment="#", 
   ...:                  skiprows=4).dropna()

In [5]: df
Out[5]: 
   Header1  Header2  Header3
0        1        2        3
1        4        5        6
2        7        8        9

[3 rows x 3 columns]

The problem with the above code is that I don't know how many lines will precede the header, so I cannot use skiprows=4 as I did here.

I am aware I can iterate through the file, as in the question "Read pandas dataframe from csv beginning with non-fix header".
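For comparison, here is a minimal sketch of that iterate-first approach. It is not necessarily the code from the linked question. It assumes comment lines start with "#" and that the whole input fits in memory. find_header_row is a hypothetical helper written for illustration. The idea is to scan the lines once to find the first line that is neither blank nor a comment, then pass only that line and the rows after it to read_csv.

```python
import io

import pandas as pd


def find_header_row(lines, comment="#"):
    """Return the index of the first line that is neither blank nor a comment.

    Hypothetical helper for illustration; it assumes comment lines start with
    the comment character, possibly after leading whitespace.
    """
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped and not stripped.startswith(comment):
            return i
    raise ValueError("no header line found")


temp = """#Comment 1
#Comment 2

#The previous line is empty
Header1|Header2|Header3
1|2|3
4|5|6
7|8|9"""

lines = temp.splitlines()
header_idx = find_header_row(lines)  # 4 for this sample

# Pass only the header line and the data rows after it to read_csv.
df = pd.read_csv(io.StringIO("\n".join(lines[header_idx:])), sep="|")
print(df)
```

For a file on disk, lines could come from open(path).read().splitlines() instead of the sample string.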

What I am looking for is a simpler solution, such as making pandas.read_csv disregard any empty lines and take the first non-empty line as the header.

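You need to set skip_blank_lines=True. Together with comment="#", this makes read_csv ignore both fully commented lines and empty lines when it locates the header. The first remaining line (Header1|Header2|Header3) becomes the header, however many lines precede it. In recent pandas versions, skip_blank_lines=True is already the default.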

df = pd.read_csv(io.StringIO(temp), sep="|", comment="#", skip_blank_lines=True).dropna()
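The following is a minimal, self-contained sketch of the same call applied to two inputs with different numbers of leading comment and blank lines. It assumes a pandas version that supports skip_blank_lines. The sample strings and the read_pipe_table helper are illustrative and do not come from the question.

```python
import io

import pandas as pd

# Two illustrative inputs that differ only in how many comment/blank lines
# precede the header.
short_preamble = """#Comment
Header1|Header2|Header3
1|2|3
4|5|6"""

long_preamble = """#Comment 1
#Comment 2

#The previous line is empty


Header1|Header2|Header3
1|2|3
4|5|6"""


def read_pipe_table(text):
    # comment="#" makes fully commented lines ignorable, and
    # skip_blank_lines=True does the same for empty lines, so the default
    # header=0 refers to the first remaining line, not the first physical line.
    return pd.read_csv(io.StringIO(text), sep="|", comment="#",
                       skip_blank_lines=True)


for text in (short_preamble, long_preamble):
    df = read_pipe_table(text)
    print(list(df.columns), df.shape)
```

No skiprows value is needed, because the number of leading lines no longer matters. With these inputs no all-NaN rows are produced, so the .dropna() from the answer is not required here.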

Tags: python, file-io, pandas, csv, data-import

bmello

ode2k