
Reading back tuples from a csv file with pandas

Using pandas, I exported a dataframe whose cells contain tuples of strings to a csv file. The resulting file has the following structure:

index,colA
1,"('a','b')"
2,"('c','d')"

Now I want to read it back using read_csv. However, whatever I try, pandas interprets the values as strings rather than tuples. For instance:

In []: import pandas as pd
       df = pd.read_csv('test',index_col='index',dtype={'colA':tuple})
       df.loc[1,'colA']
Out[]: "('a','b')"

Is there a way of telling pandas to do the right thing? Preferably without heavy post-processing of the dataframe: the actual table has 5000 rows and 2500 columns.

obo asked May 14 '14 17:05




1 Answer

Storing tuples in a column isn't usually a good idea; a lot of the advantages of using Series and DataFrames are lost. That said, you could use converters to post-process the string:

>>> import ast
>>> import pandas as pd
>>> df = pd.read_csv("sillytup.csv", converters={"colA": ast.literal_eval})
>>> df
   index    colA
0      1  (a, b)
1      2  (c, d)

[2 rows x 2 columns]
>>> df.colA.iloc[0]
('a', 'b')
>>> type(df.colA.iloc[0])
<class 'tuple'>
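
Since the real table has many tuple-valued columns, one possible way to avoid listing them by hand is to read just the header first and build the converters dict from it. This is only a sketch: the in-memory csv text and the extra column colB are made up for illustration, and it assumes every column other than index holds tuple literals.

```python
import ast
import io

import pandas as pd

# Stand-in for the csv file on disk; colB is a made-up second column.
csv_text = '''index,colA,colB
1,"('a','b')","('e','f')"
2,"('c','d')","('g','h')"
'''

# Read only the header row (nrows=0) to learn the column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)

# Assumption: every column except the index column holds tuple literals.
tuple_cols = [c for c in header.columns if c != "index"]

# One converter per tuple-valued column.
converters = {c: ast.literal_eval for c in tuple_cols}
df = pd.read_csv(io.StringIO(csv_text), index_col="index", converters=converters)

first = df.loc[1, "colA"]
```

Each cell still goes through ast.literal_eval in Python, so on a 5000 x 2500 table this will likely be noticeably slower than a plain read_csv.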

But I'd probably change things at source to avoid storing tuples in the first place.
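
For example, one way to change things at source might be to split each tuple into separate scalar columns before writing, so the csv round-trips without any converters. This is a minimal sketch, assuming every tuple has exactly two elements; the frame and the colA_0 / colA_1 names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical frame resembling the one in the question.
df = pd.DataFrame(
    {"colA": [("a", "b"), ("c", "d")]},
    index=pd.Index([1, 2], name="index"),
)

# Split each 2-tuple into two scalar columns (names are made up).
flat = pd.DataFrame(
    df["colA"].tolist(), index=df.index, columns=["colA_0", "colA_1"]
)

# Write to an in-memory buffer instead of a real file, then read it back.
buf = io.StringIO()
flat.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col="index")

# Rebuild the tuples only if something downstream really needs them.
pairs = list(zip(restored["colA_0"], restored["colA_1"]))
```

The scalar columns keep the usual vectorised Series operations available, which is the advantage that storing tuples gives up.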

DSM answered Oct 16 '22 10:10