Using pandas, I have exported to a csv file a dataframe whose cells contain tuples of strings. The resulting file has the following structure:
index,colA
1,"('a','b')"
2,"('c','d')"
Now I want to read it back using read_csv. However, whatever I try, pandas interprets the values as strings rather than tuples. For instance:
In []: import pandas as pd
df = pd.read_csv('test',index_col='index',dtype={'colA':tuple})
df.loc[1,'colA']
Out[]: "('a','b')"
Is there a way of telling pandas to do the right thing? Preferably without heavy post-processing of the dataframe: the actual table has 5000 rows and 2500 columns.
Storing tuples in a column isn't usually a good idea; a lot of the advantages of using Series and DataFrames are lost. That said, you could use converters to post-process the strings:
>>> import ast
>>> df = pd.read_csv("sillytup.csv", converters={"colA": ast.literal_eval})
>>> df
   index    colA
0      1  (a, b)
1      2  (c, d)
[2 rows x 2 columns]
>>> df.colA.iloc[0]
('a', 'b')
>>> type(df.colA.iloc[0])
<type 'tuple'>
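Since the actual table has 2500 columns, the converters dict doesn't have to be written out by hand; it can be built from the header row. A minimal sketch, assuming every column except the index holds tuple strings (here using an in-memory buffer and made-up column names `colA`/`colB`):

```python
import ast
import io

import pandas as pd

# Stand-in for the real file on disk.
csv_text = 'index,colA,colB\n1,"(\'a\',\'b\')","(\'x\',\'y\')"\n2,"(\'c\',\'d\')","(\'u\',\'v\')"\n'

# Read only the header to discover the column names.
cols = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Apply ast.literal_eval to every non-index column.
converters = {c: ast.literal_eval for c in cols if c != "index"}

df = pd.read_csv(io.StringIO(csv_text), index_col="index", converters=converters)
print(type(df.loc[1, "colA"]))  # tuples, not strings
```

`ast.literal_eval` only evaluates Python literals, so it is safe to run on untrusted CSV content, unlike `eval`.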
But I'd probably change things at source to avoid storing tuples in the first place.
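One way to change things at source is to flatten each tuple into its own scalar columns before writing, so the CSV round-trips without any parsing. A sketch, where the `colA_0`/`colA_1` column names are my own invention:

```python
import io

import pandas as pd

# Original frame with tuple-valued cells.
df = pd.DataFrame({"colA": [("a", "b"), ("c", "d")]},
                  index=pd.Index([1, 2], name="index"))

# Expand each tuple into one scalar column per element.
flat = pd.DataFrame(df["colA"].tolist(), index=df.index,
                    columns=["colA_0", "colA_1"])

# Round-trip through CSV (in memory here; a file path works the same).
buf = io.StringIO()
flat.to_csv(buf)
buf.seek(0)
back = pd.read_csv(buf, index_col="index")
print(back.loc[1, "colA_0"])  # plain strings, no tuple parsing needed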