
In pandas/python, reading array stored as string

I have a pandas DataFrame in which each element of one column is a list of strings.

Something like this:

  col1 col2
0 120  ['abc', 'def']
1 130  ['ghi', 'klm']

When I write this to CSV using to_csv, it seems fine, and when I read it back using from_csv it seems to load correctly. But when I inspect the value in each cell, the array turns out to be

'[' ''' 'a' 'b' 'c' and so on. So essentially it's being read back not as an array but as a string, character by character. Can somebody suggest how I can convert this string into an array?

In other words, the array has been stored as a string:

'[\'abc\',\'def\']'
AMM asked Apr 16 '14 20:04



3 Answers

As mentioned in the other questions, you should use literal_eval here:

from ast import literal_eval
df['col2'] = df['col2'].apply(literal_eval)

In action:

In [11]: df = pd.DataFrame([[120, '[\'abc\',\'def\']'], [130, '[\'ghi\',\'klm\']']], columns=['A', 'B'])

In [12]: df
Out[12]:
     A              B
0  120  ['abc','def']
1  130  ['ghi','klm']

In [13]: df.loc[0, 'B']  # a string
Out[13]: "['abc','def']"

In [14]: df.B = df.B.apply(literal_eval)

In [15]: df.loc[0, 'B']  # now it's a list
Out[15]: ['abc', 'def']
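
If the data is coming from a CSV file anyway, the same conversion can probably be done at read time. The following is a minimal sketch, assuming the column is named col2 as in the question and using an in-memory buffer in place of a real file; it relies on the converters argument of pd.read_csv, which applies a function to each raw cell string of the named column.

```python
import io
from ast import literal_eval

import pandas as pd

# Build a frame whose second column holds real Python lists.
df = pd.DataFrame({"col1": [120, 130], "col2": [["abc", "def"], ["ghi", "klm"]]})

# Write to an in-memory buffer (stands in for a CSV file on disk).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# converters maps a column name to a function applied to each raw cell string.
restored = pd.read_csv(buffer, converters={"col2": literal_eval})

first_cell = restored.loc[0, "col2"]
print(type(first_cell), first_cell)
```

This avoids a separate apply step after loading, at the cost of failing at read time if any cell in that column is not a valid Python literal.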
Andy Hayden answered Oct 19 '22 06:10


Never mind, got it.

All I had to do was

arr = s[1:-1].split(',')

This got rid of the square brackets and also split the string into an array like I wanted.
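
One caveat worth checking: with a cell like the ones in the question, the pieces produced by this split still carry their quote characters (and any space after the comma). A small sketch of the difference, assuming the elements themselves contain no commas or quotes; the variable names are illustrative only:

```python
s = "['abc', 'def']"

# The one-liner keeps the quote characters and the space after the comma.
naive = s[1:-1].split(',')
print(naive)

# Stripping whitespace and then quotes from each piece gives the bare strings.
cleaned = [item.strip().strip("'\"") for item in s[1:-1].split(',')]
print(cleaned)
```

If an element can itself contain a comma or a quote, this string slicing will split it incorrectly, which is where literal_eval is the safer choice.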

AMM answered Oct 19 '22 06:10


Without pandas, this is one way to do it, using the ast module's literal_eval():

>>> data = "['abc', 'def']"
>>> import ast
>>> a_list = ast.literal_eval(data)
>>> type(a_list)
<class 'list'>
>>> a_list[0]
'abc'
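
literal_eval() raises ValueError or SyntaxError when the string is not a valid Python literal, so for data of uncertain quality a small wrapper may help. This is only a sketch: parse_list and its default parameter are hypothetical names, not part of the ast module.

```python
import ast


def parse_list(text, default=None):
    """Hypothetical helper: parse a list literal, returning `default` on failure."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Raised when the text is not a valid Python literal.
        return default
    # Reject valid literals that are not lists (e.g. "42" or "'abc'").
    return value if isinstance(value, list) else default


good = parse_list("['abc', 'def']")
bad = parse_list("[abc, def")
print(good, bad)
```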
shaktimaan answered Oct 19 '22 06:10