I'm trying to create a DataFrame from a CSV file that has 4 empty columns. When I open it in LibreOffice or Excel, the empty columns are identified correctly. However, opening it with pd.read_csv() shifts the columns' values by one.
How can I solve this? It seems like a problem with pandas' read_csv() method.
My code is really standard:
import pandas as pd
df = pd.read_csv('csv_file.csv', sep=',')
df.head()
I changed the headers and used this:
df = pd.read_csv('csv_file.csv', sep=',', index_col=False)
This solved the problem, but what in my previous headers was causing this?
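It seems you need the parameter index_col=False so that read_csv does NOT use the first column as the index. When the data rows contain one more field than the header row, pandas treats the first column as the index by default, which is what shifts the values. The sep=',' parameter can be omitted, because it is the default value: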
df = pd.read_csv('csv_file.csv', index_col=False)
Your sample:
df = pd.read_csv('teste2.csv', index_col=False)
print(df)
Header1 Header2 Header3 Unnamed: 3 Unnamed: 4 Header4 Header5 Header6 \
0 ptn M00001 0 NaN NaN 2 0 0
Header7 Header8 ... Header22 Header23 Header24 Header25 \
0 0 -31.573 ... -0.375 0.0 -64.168 276.586
Header26 Header27 Unnamed: 29 Unnamed: 30 Header28 Header29
0 -0.232 0.0 NaN NaN 0.702 1.0
[1 rows x 33 columns]
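To see the effect in isolation, here is a minimal sketch on made-up data (the column names a, b, c and the values are assumptions for illustration, not taken from the asker's file). Each data row ends with a trailing comma, so it has one more field than the header:

```python
import io

import pandas as pd

# Hypothetical data: 3 header fields, but every data row ends with a
# trailing comma and therefore has 4 fields.
data = "a,b,c\n1,2,3,\n4,5,6,\n"

# Default behaviour: the first column is taken as the index,
# so every value appears shifted one column to the left.
shifted = pd.read_csv(io.StringIO(data))

# index_col=False tells pandas not to use the first column as the index.
fixed = pd.read_csv(io.StringIO(data), index_col=False)

print(shifted)
print(fixed)
```

In the first frame the values 1 and 4 end up in the index and column c is empty. In the second, the values line up with their headers again.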
The problem occurs if your line ends with a delimiter (here a comma), which creates an empty cell that is generally not visible in MS Excel. If your CSV line looks like this:
1,2282816,102.97245065789474,2432,0.8333333333333334,0.1388888888888889,certain,
then modify it to:
1,2282816,102.97245065789474,2432,0.8333333333333334,0.1388888888888889,certain
and pd.read_csv(fileName) will work fine.
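If editing the file by hand is not practical, the trailing delimiter can also be removed in code before parsing. The following is only a sketch: the helper strip_trailing_delimiter and the sample header names are hypothetical. It assumes every data row really does carry one surplus delimiter, so a row whose last field is genuinely empty would lose that field.

```python
import io

import pandas as pd


def strip_trailing_delimiter(text, sep=","):
    """Remove one trailing delimiter from each line that ends with it.

    Hypothetical helper for illustration; not part of pandas.
    """
    cleaned = []
    for line in text.splitlines():
        if line.endswith(sep):
            line = line[: -len(sep)]
        cleaned.append(line)
    return "\n".join(cleaned)


# Made-up sample: the header has 3 fields, the data rows end with a comma.
raw = "id,size,label\n1,2282816,certain,\n2,2432,likely,\n"

cleaned = strip_trailing_delimiter(raw)
df_clean = pd.read_csv(io.StringIO(cleaned))
print(df_clean)
```

For a file on disk, the same helper could be applied to the text returned by open(fileName).read() before passing it to pd.read_csv through io.StringIO.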