I have a pandas DataFrame where one of the columns holds an array of strings in each cell.
Something like this:
   col1            col2
0   120  ['abc', 'def']
1   130  ['ghi', 'klm']
When I store this to CSV using to_csv, it seems fine. When I read it back using from_csv, it also seems to read back fine. But when I inspect the value in each cell, the array is
'[' ''' 'a' 'b' 'c' and so on. So essentially it is not read back as an array but as one string of characters. Can somebody suggest how I can convert this string into an array?
To clarify, the array has been stored as a string:
'[\'abc\',\'def\']'
Pandas uses the object dtype for storing strings.
Every pandas Series or DataFrame column has a dtype: the type of the objects stored inside it. By default, pandas stores strings using the object dtype, meaning it stores them as a NumPy array of pointers to ordinary Python objects.
There are two ways to store text data in pandas: an object-dtype NumPy array, or the StringDtype extension type.
Object columns can hold not only strings but also any other data that pandas doesn't understand. Why is this important? When a column has the object dtype, it does not necessarily mean that all the values are strings. In fact, they can all be numbers, or a mixture of strings, integers and floats.
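To illustrate the point about object columns, here is a minimal sketch. The Series contents are made up for the example; it only shows that an object dtype says nothing about the type of each individual value.

```python
import pandas as pd

# Hypothetical values: a string, an int, a float and a list in one column.
# pandas cannot pick a narrower dtype, so it falls back to object.
s = pd.Series(["abc", 120, 1.5, ["ghi", "klm"]])
print(s.dtype)

# The dtype is the same for the whole column; check each value's type directly.
value_types = s.map(type).tolist()
print(value_types)
```

This is why a column of stringified lists and a column of real lists can look identical when printed: both are object columns, and only the type of the individual cells tells them apart.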
As mentioned in the other questions, you should use literal_eval here:
from ast import literal_eval
df['col2'] = df['col2'].apply(literal_eval)
In action:
In [11]: df = pd.DataFrame([[120, '[\'abc\',\'def\']'], [130, '[\'ghi\',\'klm\']']], columns=['A', 'B'])
In [12]: df
Out[12]:
A B
0 120 ['abc','def']
1 130 ['ghi','klm']
In [13]: df.loc[0, 'B'] # a string
Out[13]: "['abc','def']"
In [14]: df.B = df.B.apply(literal_eval)
In [15]: df.loc[0, 'B'] # now it's a list
Out[15]: ['abc', 'def']
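If the strings come from a CSV file, the same conversion can probably be done while reading, by passing literal_eval through the converters argument of read_csv. The sketch below assumes the column names from the question (col1, col2) and uses an in-memory buffer in place of a real file.

```python
import io
from ast import literal_eval

import pandas as pd

# Frame shaped like the one in the question.
df = pd.DataFrame({"col1": [120, 130],
                   "col2": [["abc", "def"], ["ghi", "klm"]]})

# Write to an in-memory buffer (stands in for a CSV file on disk).
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Plain read: each cell of col2 comes back as a str such as "['abc', 'def']".
buffer.seek(0)
plain = pd.read_csv(buffer)

# Read with a converter: literal_eval is applied to every cell of col2.
buffer.seek(0)
restored = pd.read_csv(buffer, converters={"col2": literal_eval})

print(type(plain.loc[0, "col2"]))
print(type(restored.loc[0, "col2"]))
```

literal_eval only evaluates Python literals, so it will raise an error on cells that are not valid literals (an empty cell, for instance); such columns would need a small wrapper function instead.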
Never mind, got it.
All I had to do was
arr = s[1:-1].split(',')
This got rid of the square brackets and also split the string into an array like I wanted.
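One caveat with the slicing approach: each piece still carries its surrounding quote characters. A minimal sketch of one way to strip them, assuming no element itself contains a comma or a quote (if it might, literal_eval is the safer route):

```python
s = "['abc','def']"

# Slicing off the brackets and splitting keeps the quotes inside each piece.
naive = s[1:-1].split(",")

# Strip whitespace, then single or double quotes, from both ends of each piece.
cleaned = [item.strip().strip("'\"") for item in s[1:-1].split(",")]

print(naive)
print(cleaned)
```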
Without pandas, this is one way to do it, using the ast module's literal_eval():
>>> data = "['abc', 'def']"
>>> import ast
>>> a_list = ast.literal_eval(data)
>>> type(a_list)
<class 'list'>
>>> a_list[0]
'abc'