This is a re-worded version of my question which hopefully makes more sense:
When using read_csv with an implicit index (i.e. the first column in the file does not have a header), everything works and I get a dataframe whose index is the first column in the file: the implicit index column.
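To make the implicit-index case concrete, here is a minimal sketch using hypothetical in-memory data (the io.StringIO wrapper and the sample values are illustrative, not the asker's file). The header row has one field fewer than each data row, so pandas takes the first column as the index.

```python
import io

import pandas as pd

# Hypothetical sample: the header row has one field fewer than each data row,
# which is how pandas detects an implicit index column.
csv_text = "head1,head2\nindex1,1,2\nindex2,3,4\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.index.tolist())    # ['index1', 'index2'] (from the header-less first column)
print(df.columns.tolist())  # ['head1', 'head2'] (only the named columns remain as data)
```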
However, if I specify usecols as an argument to read_csv, the implicit index column is ignored and the returned dataframe has a standard index created by pandas (0, 1, 2, 3, etc.).
I cannot explicitly pass the index column in the list for usecols and then specify the index_col argument, because the implicit index column has no header (this is how pandas knows it is an implicit index)!
Is there any way around this?
Here is the original question:
I am trying to read a csv file which has a column of row indexes that is not named; the rest of the columns are named:
|head1|head2|
index1 | data1 | data2 |
When I read in a subset of the columns with usecols, I also want to include the row indexes. However, as that column is not named, I can't include its name in my list for usecols.
I've tried combining an integer position with strings (e.g. usecols=[0, 'header1', 'header2']), but this does not seem to work.
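A sketch of why the mixed list fails, assuming a reasonably recent pandas: read_csv validates usecols before reading and accepts only all strings, all integers, or a callable, so a mix of the two raises ValueError. The sample file here is hypothetical and has an empty first header field (the 'Unnamed: 0' case), not a header that is simply one field short; for that kind of file, positions alone are accepted.

```python
import io

import pandas as pd

# Hypothetical sample: the first header field is empty, so pandas names
# that column 'Unnamed: 0' (not a truly implicit index).
csv_text = ",header1,header2\nrow1,1,2\nrow2,3,4\n"

# Mixing a position (0) with column names is rejected before any data is read.
try:
    pd.read_csv(io.StringIO(csv_text), usecols=[0, "header1", "header2"])
    mixed_rejected = False
except ValueError as err:
    mixed_rejected = True
    print(err)

# Positions only are accepted; index_col=0 then refers to the first
# *selected* column, which here is the unnamed one.
df = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1], index_col=0)
print(df)
```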
If I simply specify index_col as 0, it uses the first column in my selection as the index column.
So, how can I read in a named column selection (via usecols) whilst retaining the first, nameless column in the file as my row index?
I recently had this same issue and was able to solve it by using the default name pandas gives to an unnamed column ('Unnamed: 0'):
import pandas as pd
data = pd.read_csv('advertising.csv', header=0, index_col=[0], usecols=['Unnamed: 0', 'radio', 'sales'])
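A self-contained version of this approach, assuming a file whose first header field is empty (as advertising.csv appears to be); the in-memory sample and its values are hypothetical. index_col is interpreted relative to the columns kept by usecols, so 0 refers to 'Unnamed: 0' here.

```python
import io

import pandas as pd

# Hypothetical stand-in for advertising.csv: the first header field is empty,
# so pandas labels that column 'Unnamed: 0'.
csv_text = ",TV,radio,sales\n1,10.0,2.5,7.0\n2,20.0,3.5,8.0\n"

data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Unnamed: 0", "radio", "sales"],  # refer to the unnamed column by its generated name
    index_col=0,  # position within the selected columns, not within the file
)
data.index.name = None  # optional: drop the generated 'Unnamed: 0' label
print(data)
```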
Try without using usecols; there is a known bug which means it won't work with a separator other than ','.
You can read these directly:
In [11]: pd.read_csv('foo.csv', sep=r'\s*\|\s*', engine='python', index_col=[0])
Out[11]:
head1 head2 Unnamed: 3
index1 data1 data2 NaN
In [12]: pd.read_csv('foo.csv', sep=r'\s*\|\s*', engine='python', index_col=[0]).dropna(axis=1)
Out[12]:
head1 head2
index1 data1 data2
Note: I've had to use \s*\|\s* as the sep rather than just | so as not to include the surrounding spaces.
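The same idea as a self-contained sketch with hypothetical data modelled on the question's pipe-delimited layout. It passes the separator as a raw string, sets engine='python' (regex separators are handled by the Python engine), and uses how='all' so only the fully empty trailing column is dropped; how='all' is a choice made for this sketch, not something the answer above specifies.

```python
import io

import pandas as pd

# Hypothetical contents modelled on the question's layout.
text = "|head1|head2|\nindex1 | data1 | data2 |\nindex2 | data3 | data4 |\n"

df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s*\|\s*",  # raw string: a pipe plus any whitespace around it
    engine="python",  # regex separators are handled by the Python engine
    index_col=0,      # the unnamed first column becomes the index
)
# The trailing '|' produces an all-NaN column ('Unnamed: 3'); drop it.
df = df.dropna(axis=1, how="all")
print(df)
```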
If I understand this question correctly, I think you may have to read the entire csv file into a dataframe and then select the columns that you want. Something like this:
import pandas as pd
# read every column, then keep only the ones you need (a list keeps a DataFrame)
df = pd.read_csv(yourdata, index_col=0).loc[:, ['header1', 'header2']]
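A sketch of the read-everything-then-select approach on a hypothetical file with an implicit index. Unlike the snippet above, it omits index_col and relies on the inference that the question says already works when usecols is not given; the column names and data are illustrative.

```python
import io

import pandas as pd

# Hypothetical file with an implicit index: the header has one field fewer
# than each data row, so pandas uses the first column as the index.
csv_text = "header1,header2,header3\nrow1,1,2,3\nrow2,4,5,6\n"

# Read every column (no usecols), letting the implicit index be inferred...
df = pd.read_csv(io.StringIO(csv_text))

# ...then keep only the columns of interest. Passing a list keeps a DataFrame;
# a single label (as in .loc[:, 'header1']) would return a Series instead.
subset = df.loc[:, ["header1", "header3"]]
print(subset)
```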