Pandas read_csv without knowing whether header is present

I have an input file with known columns, let's say two columns Name and Sex. Sometimes it has the header line Name,Sex, and sometimes it doesn't:

1.csv:

Name,Sex
John,M
Leslie,F

2.csv:

John,M
Leslie,F

Knowing the identity of the columns beforehand, is there a nice way to handle both cases with the same read_csv command? Basically, I want to specify names=['Name', 'Sex'] and then have it infer header=0 only when the header is there. Best I can come up with is:

1) Read the first line of the file before doing read_csv, and set parameters appropriately.
2) Just do df = pd.read_csv(input_file, names=['Name', 'Sex']), then check whether the zeroth row is identical to the header, and if so drop it (and then maybe renumber the rows).

But this doesn't seem like that unusual of a use case to me. Is there a built-in way of doing this with read_csv that I haven't thought of?
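For reference, option 1 above might look roughly like the sketch below. The helper name `read_with_optional_header` is made up for illustration, and it assumes the input is a path to a non-empty, comma-separated file. Passing `header=0` together with `names` tells `read_csv` to replace the existing header line with the given names.

```python
import csv

import pandas as pd


def read_with_optional_header(path, names):
    """Hypothetical helper: peek at the first row, then pick the header argument."""
    # Parse only the first line with the csv module so quoting is handled.
    with open(path, newline="") as f:
        first_row = next(csv.reader(f), None)

    # Treat the first row as a header only if it matches the expected names exactly.
    has_header = first_row is not None and [c.strip() for c in first_row] == list(names)

    # header=0 plus names replaces the header line; header=None treats every line as data.
    return pd.read_csv(path, names=names, header=0 if has_header else None)
```

Because the header line never enters the frame as data, numeric columns keep their numeric dtype in both cases.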

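Using selection by callable (a newer pandas feature):

import numpy as np
import pandas as pd

cols = ['Name','Sex']

# Read every line as data, then mask out row 0 only if it repeats the column names.
df = (pd.read_csv(filename, header=None, names=cols)
      [lambda x: np.ones(len(x)).astype(bool)
                 if (x.iloc[0] != cols).any()
                 else np.concatenate([[False], np.ones(len(x)-1).astype(bool)])]
)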

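Using the .query() method (note that this drops every row in which either field equals its column name, not only the first row):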

df = (pd.read_csv(filename, header=None, names=cols)
        .query('Name != "Name" and Sex != "Sex"'))
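If matching rows further down the file should survive, one possible variant restricts the test to row 0 by referring to `index` inside the query string. This is only a sketch: the function name `read_dropping_header_row` and the sample data (including a person literally named "Name") are invented for illustration, and `engine="python"` is passed just to avoid depending on numexpr.

```python
import io

import pandas as pd

cols = ["Name", "Sex"]


def read_dropping_header_row(buf):
    """Hypothetical helper: drop row 0 only when it repeats both column names."""
    return (
        pd.read_csv(buf, header=None, names=cols)
        # Keep a row if it is not row 0, or if it differs from the header in any field.
        .query('index != 0 or Name != "Name" or Sex != "Sex"', engine="python")
        .reset_index(drop=True)
    )


# Assumed sample input: a header line, then a data row whose Name happens to be "Name".
df = read_dropping_header_row(io.StringIO("Name,Sex\nJohn,M\nName,F\n"))
print(df)
```

Here the header line is removed while the later "Name,F" row is kept, because its index is not 0.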

I'm not sure that this is the most elegant way, but this should work as well:

df = pd.read_csv(filename, header=None, names=cols)

if (df.iloc[0] == cols).all():
    df = df[1:].reset_index(drop=True)
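To check that this last approach gives the same frame for both files, here is a small self-contained sketch. It wraps the same logic in a function (the name `read_known_columns` is hypothetical) and feeds it the two sample files from the question through `io.StringIO` instead of real paths.

```python
import io

import pandas as pd

cols = ["Name", "Sex"]


def read_known_columns(buf, cols):
    """Hypothetical wrapper: read all lines as data, then drop a leading header row."""
    df = pd.read_csv(buf, header=None, names=cols)
    # Row 0 counts as a header only if every field equals the matching column name.
    if not df.empty and (df.iloc[0] == cols).all():
        df = df.iloc[1:].reset_index(drop=True)
    return df


with_header = read_known_columns(io.StringIO("Name,Sex\nJohn,M\nLeslie,F\n"), cols)
without_header = read_known_columns(io.StringIO("John,M\nLeslie,F\n"), cols)
print(with_header.equals(without_header))
```

One thing to keep in mind: because the header line is first read as data, a numeric column in a file that does have a header comes back as strings and may need `pd.to_numeric` afterwards.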

I've come up with a way of detecting the header without prior knowledge of its names:

if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
    df = df[1:].reset_index(drop=True)

And by changing it slightly, it can update the current header with the detected one:

if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
    df = df[1:].reset_index(drop=True).rename(columns=df.iloc[0])

This would allow easily selecting the desired behavior:

update_header = True

if any(df.iloc[0].apply(lambda x: isinstance(x, str))):
    new_header = df.iloc[0]

    df = df[1:].reset_index(drop=True)

    if update_header:
        df.rename(columns=new_header, inplace=True)

Pros:

Doesn't require prior knowledge of the header's names.
Can be used to update the header automatically if an existing one is detected.

Cons:

Won't work well if the data contains strings. Changing any() to all(), so that every element of the first row must be a string, might help, unless the data also contains entire rows of strings.
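The all() variant mentioned under Cons could be sketched as follows. The function name `strip_detected_header` and the Name/Age sample data are assumptions made for illustration, and the frame is assumed to have been read with `header=None`.

```python
import io

import pandas as pd


def strip_detected_header(df, update_header=True):
    """Hypothetical helper: treat row 0 as a header only if every value is a string."""
    first = df.iloc[0]
    if all(isinstance(v, str) for v in first):
        new_header = first.tolist()
        df = df.iloc[1:].reset_index(drop=True)
        if update_header:
            df.columns = new_header
    return df


# With a header line, the numeric column is read as strings, so row 0 is all strings.
with_header = strip_detected_header(
    pd.read_csv(io.StringIO("Name,Age\nJohn,30\nLeslie,41\n"), header=None)
)

# Without a header line, row 0 holds a string and an integer, so nothing is dropped.
without_header = strip_detected_header(
    pd.read_csv(io.StringIO("John,30\nLeslie,41\n"), header=None)
)
```

With any() instead of all(), the "John" in the second input would be mistaken for a header. Note that in the first case the Age values stay strings after the header row is dropped.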

Tags: python, pandas, csv

leekaiinthesky

2 Answers

MaxU - stop WAR against UA

Micael Jarniac
