Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read index data as string with pandas.read_csv()?

I'm trying to read csv file as DataFrame with pandas, and I want to read index row as string. However, since the row for index doesn't have any characters, pandas handles this data as integer. How to read as string?

Here are my csv file and code:

[sample.csv]    
    uid,f1,f2,f3
    01,0.1,1,10
    02,0.2,2,20
    03,0.3,3,30

[code]
df = pd.read_csv('sample.csv', index_col="uid" dtype=float)
print df.index.values

The result: df.index is integer, not string:

>>> [1 2 3]

But I want to get df.index as string:

>>> ['01', '02', '03']

And an additional condition: The rest of index data have to be numeric value and they're actually too many and I can't point them with specific column names.

like image 941
ykensuke9 Avatar asked Jan 28 '16 10:01

ykensuke9


People also ask

How do you read a DataFrame index?

To get the index of a Pandas DataFrame, call DataFrame. index property. The DataFrame. index property returns an Index object representing the index of this DataFrame.

What output type does pandas read_csv () return?

Read a CSV File In this case, the Pandas read_csv() function returns a new DataFrame with the data and labels from the file data. csv , which you specified with the first argument.

How do you access the index of a pandas series?

In order to access the series element refers to the index number. Use the index operator [ ] to access an element in a series. The index must be an integer. In order to access multiple elements from a series, we use Slice operation.

What is the function of read_csv ()?

read_csv is used to load a CSV file as a pandas dataframe. In this article, you will learn the different features of the read_csv function of pandas apart from loading the CSV file and the parameters which can be customized to get better output from the read_csv function.


2 Answers

pass dtype param to specify the dtype:

In [159]:
import pandas as pd
import io
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
df = pd.read_csv(io.StringIO(t), dtype={'uid':str})
df.set_index('uid', inplace=True)
df.index

Out[159]:
Index(['01', '02', '03'], dtype='object', name='uid')

So in your case the following should work:

df = pd.read_csv('sample.csv', dtype={'uid':str})
df.set_index('uid', inplace=True)

The one-line equivalent doesn't work, due to a still-outstanding pandas bug here where the dtype param is ignored on cols that are to be treated as the index**:

df = pd.read_csv('sample.csv', dtype={'uid':str}, index_col='uid')

You can dynamically do this if we assume the first column is the index column:

In [171]:
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
df = pd.read_csv(io.StringIO(t), dtype=dtypes)
df.set_index('uid', inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 01 to 03
Data columns (total 3 columns):
f1    3 non-null float64
f2    3 non-null float64
f3    3 non-null float64
dtypes: float64(3)
memory usage: 96.0+ bytes

In [172]:
df.index

Out[172]:
Index(['01', '02', '03'], dtype='object', name='uid')

Here we read just the header row to get the column names:

cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()

we then generate dict of the column names with the desired dtypes:

index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str

we get the index name, assuming it's the first entry and then create a dict from the rest of the cols and assign float as the desired dtype and add the index col specifying the type to be str, you can then pass this as the dtype param to read_csv

like image 189
EdChum Avatar answered Oct 24 '22 01:10

EdChum


If the result is not a string you have to convert it to be a string. try:

result = [str(i) for i in result]

or in this case:

print([str(i) for i in df.index.values])
like image 20
Serbitar Avatar answered Oct 24 '22 02:10

Serbitar