How did old pre-0.17 versions of pandas read_csv() interpret passing a boolean header=True/False for inferring the header row?
I have CSV data with a header:
col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0
header=True
df = pandas.read_csv('test.csv', sep=';', header=True)
gives the following DataFrame:
1.0 10.0 100.0
0 2 20 200
1 3 30 300
It means that pandas used the second row ("row 1") for column names (the names inferred are '1.0', '10.0' and '100.0').
header=False
df = pandas.read_csv('test.csv', sep=';', header=False)
gives the following:
col1 col2 col3
0 1 10 100
1 2 20 200
2 3 30 300
This means that pandas used the first row ("row 0") as the header, even though I explicitly said that there is no header.
This behaviour is not intuitive to me. Can somebody explain what is happening?
header: specifies which row (or rows) of the file hold the column names of your DataFrame. It expects an int or a list of ints. The default is header='infer', which behaves like header=0 (the first row of the CSV file is treated as the column names) unless you pass names explicitly, in which case it behaves like header=None. If your file doesn't have a header, simply set header=None.
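A minimal sketch of the header=None case, assuming a hypothetical headerless copy of the question's data held in memory. It also uses read_csv's names parameter to supply column labels yourself instead of accepting the auto-generated 0, 1, 2.

```python
import io

import pandas as pd

# Hypothetical headerless version of the question's data (no "col1;col2;col3" line).
NO_HEADER = """1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0"""

# header=None: every line is data; pandas labels the columns 0, 1, 2.
df_auto = pd.read_csv(io.StringIO(NO_HEADER), sep=";", header=None)

# header=None plus names=...: every line is still data, but we choose the labels.
df_named = pd.read_csv(
    io.StringIO(NO_HEADER), sep=";", header=None, names=["col1", "col2", "col3"]
)

print(df_auto)
print(df_named)
```

In both cases all three lines are kept as data rows; only the column labels differ.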
You are telling pandas which line is your header line. In Python, bool is a subclass of int, so passing False evaluates to 0, which is why it reads the first line as the header, as expected. Passing True evaluates to 1, so it reads the second line as the header. If you pass None, it assumes there is no header row and auto-generates ordinal column labels.
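import io
import pandas as pd

# Same sample data as in the question.
t = """col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0"""

# Boolean header values only work on pandas < 0.17.0 (see the update below);
# newer versions raise TypeError for header=False and header=True.
print('False:\n', pd.read_csv(io.StringIO(t), sep=';', header=False))
print('\nTrue:\n', pd.read_csv(io.StringIO(t), sep=';', header=True))
print('\nNone:\n', pd.read_csv(io.StringIO(t), sep=';', header=None))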
False:
col1 col2 col3
0 1 10 100
1 2 20 200
2 3 30 300
True:
1.0 10.0 100.0
0 2 20 200
1 3 30 300
None:
0 1 2
0 col1 col2 col3
1 1.0 10.0 100.0
2 2.0 20.0 200.0
3 3.0 30.0 300.0
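The session above depends on pre-0.17 pandas accepting booleans. As a sketch, assuming a current pandas version, the same three results can be reproduced by passing the integers that False and True used to collapse to, which makes the bool-to-int coercion explicit. The helper read_sample is hypothetical and only exists to keep the example short.

```python
import io

import pandas as pd

# The sample file from the question.
CSV_TEXT = """col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0"""

# bool is a subclass of int in Python, so False == 0 and True == 1.
assert isinstance(True, int)
assert int(False) == 0 and int(True) == 1


def read_sample(header):
    """Hypothetical helper: read the sample CSV with the given header argument."""
    return pd.read_csv(io.StringIO(CSV_TEXT), sep=";", header=header)


# What header=False meant in old pandas: row 0 holds the column names.
df_false = read_sample(int(False))
# What header=True meant in old pandas: row 1 holds the column names,
# and row 0 (the real header) is discarded.
df_true = read_sample(int(True))
# header=None: no header row; columns get ordinal labels 0, 1, 2.
df_none = read_sample(None)

for label, df in [("False -> 0", df_false), ("True -> 1", df_true), ("None", df_none)]:
    print(label)
    print(df, end="\n\n")
```

Note that with header=1 the genuine header line is dropped entirely, which is why the True case ends up with only two data rows.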
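UPDATE
Since version 0.17.0, passing a boolean to header raises a TypeError. Use header=0 (or the default) for a file with a header row and header=None for a file without one.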
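A small sketch of the post-0.17 behaviour, assuming a current pandas install: the boolean call is caught as a TypeError, and the call that expresses the intended meaning for this file (it does have a header row) uses header=0. The exact wording of the error message varies between versions, so the sketch only records it.

```python
import io

import pandas as pd

CSV_TEXT = """col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0"""

error_message = None
try:
    # pandas 0.17.0+ rejects booleans for header instead of coercing them to 0/1.
    pd.read_csv(io.StringIO(CSV_TEXT), sep=";", header=True)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# This file has a header row, so the intended call uses row index 0
# (header=None would be the choice for a file with no header line).
df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";", header=0)
print(df)
```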