How to read index data as string with pandas.read_csv()?

Tags: python, indexing, pandas, csv

I'm trying to read a CSV file as a DataFrame with pandas, and I want the index column to be read as strings. However, since the index column contains only digits, pandas parses it as integers. How can I read it as strings?

Here are my csv file and code:

[sample.csv]
    uid,f1,f2,f3
    01,0.1,1,10
    02,0.2,2,20
    03,0.3,3,30

[code]
    import pandas as pd
    df = pd.read_csv('sample.csv', index_col="uid", dtype=float)
    print(df.index.values)

The result: df.index is integer, not string:

>>> [1 2 3]

But I want to get df.index as string:

>>> ['01', '02', '03']

One additional condition: the remaining (non-index) columns have to be numeric, and there are too many of them to list by column name.

941

asked Jan 28 '16 10:01

ykensuke9

2 Answers

Pass the dtype parameter to read_csv to specify the dtype of the uid column:

In [159]:
import pandas as pd
import io
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
df = pd.read_csv(io.StringIO(t), dtype={'uid':str})
df.set_index('uid', inplace=True)
df.index

Out[159]:
Index(['01', '02', '03'], dtype='object', name='uid')

So in your case the following should work:

df = pd.read_csv('sample.csv', dtype={'uid':str})
df.set_index('uid', inplace=True)

The one-line equivalent doesn't work, due to a pandas bug (outstanding at the time of writing) where the dtype parameter is ignored on columns that are to be treated as the index:

df = pd.read_csv('sample.csv', dtype={'uid':str}, index_col='uid')
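
Whether the one-liner honours dtype for the index may depend on the installed pandas version (that is an assumption here, not something the answer confirms). A defensive sketch is to try the one-liner, check that the index labels really are strings, and fall back to the two-step version otherwise. The sample data is the same as above.

```python
import io
import pandas as pd

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# Try the one-line form first.
df = pd.read_csv(io.StringIO(t), dtype={'uid': str}, index_col='uid')

# If dtype was ignored for the index column, the labels will not be str,
# so fall back to reading first and setting the index afterwards.
if not all(isinstance(label, str) for label in df.index):
    df = pd.read_csv(io.StringIO(t), dtype={'uid': str})
    df = df.set_index('uid')

print(df.index.tolist())
```

Either branch should leave the leading zeros intact, because the uid column is read as text in both cases.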

You can build the dtypes dynamically, assuming the first column is the index column:

In [171]:
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
df = pd.read_csv(io.StringIO(t), dtype=dtypes)
df.set_index(index_col_name, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 01 to 03
Data columns (total 3 columns):
f1    3 non-null float64
f2    3 non-null float64
f3    3 non-null float64
dtypes: float64(3)
memory usage: 96.0+ bytes

In [172]:
df.index

Out[172]:
Index(['01', '02', '03'], dtype='object', name='uid')

Here we read only the header and the first data row (nrows=1) to get the column names:

cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()

We then generate a dict that maps the column names to the desired dtypes:

index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str

We take the index name, assuming it's the first entry, create a dict from the remaining columns with float as the desired dtype, and then add the index column with str as its dtype. You can then pass this dict as the dtype parameter to read_csv.
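
The steps above can be wrapped in a small helper. This is only a sketch: the function name read_with_str_index is hypothetical, it takes the CSV content as a string for easy testing, and it assumes the first column is the index and every other column is numeric. It uses nrows=0 so that only the header is parsed on the first pass.

```python
import io
import pandas as pd

def read_with_str_index(csv_text):
    """Read CSV text with the first column as a string index, the rest as float.

    Hypothetical helper; assumes the first column is the index column.
    """
    # First pass: parse only the header to learn the column names.
    cols = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.tolist()
    index_col_name = cols[0]

    # Every non-index column is read as float, the index column as str.
    dtypes = {name: float for name in cols[1:]}
    dtypes[index_col_name] = str

    # Second pass: read the data with explicit dtypes, then set the index.
    df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
    return df.set_index(index_col_name)

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

df = read_with_str_index(t)
print(df.index.tolist())
print(df.dtypes)
```

For a file on disk, the same two passes would use the file path in place of io.StringIO(csv_text).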

189

answered Oct 24 '22 01:10

EdChum

If the result is not a string, you have to convert it to a string. Try:

result = [str(i) for i in result]

or in this case:

print([str(i) for i in df.index.values])
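
One caveat with converting after the fact: once the index has been parsed as integers, str() yields '1', not '01', so the leading zeros from the file are not restored. A minimal sketch of padding them back, assuming every uid is exactly two characters wide (that width is an assumption based on the sample file):

```python
import io
import pandas as pd

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# Without a dtype for uid, the index is parsed as integers.
df = pd.read_csv(io.StringIO(t), index_col='uid')

# Plain conversion loses the leading zeros.
plain = [str(i) for i in df.index.values]
print(plain)

# Assumption: a fixed width of 2, so pad with zeros on the left.
df.index = df.index.astype(str).str.zfill(2)
print(df.index.tolist())
```

If the uid width varies in the real file, padding cannot recover the original text, and reading the column as str in the first place is the safer route.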

answered Oct 24 '22 02:10

Serbitar
