I imported a CSV using Pandas and one column was read in with string entries. Examining the entries for this Series (column), I see that they should actually be lists. For example:
df['A'] = pd.Series(['["entry11"]', '["entry21","entry22"]', '["entry31","entry32"]'])
I would like to extract the list elements from the strings. So far, I've tried the following chain:
df['A'] = (df['A'].replace("'", '', regex=True)
                  .replace(r'\[', '', regex=True)
                  .replace(r'\]', '', regex=True)
                  .str.split(","))
This gives me back my desired list elements in one column.
My question: Is there a more efficient way of doing this? This seems like a lot of strain for something that should be a little easier.
Another way to convert a string to a list is the str.split() method. It splits a string on a separator (whitespace by default) and returns the pieces as a list, so with the default separator each word becomes one list item.
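As a rough illustration of how split() behaves on one of the strings from the question (the variable names here are only for the example), note that it cuts on the separator and nothing more, so brackets and quotes stay in the pieces:

```python
s = '["entry21","entry22"]'

# With no argument, split() cuts on whitespace; this string has none,
# so the result is a one-element list holding the whole string.
whole = s.split()
print(whole)

# split(",") cuts on commas but leaves the brackets and quotes in place.
parts = s.split(",")
print(parts)
```

This is why the chain in the question needs the extra replace() calls before splitting.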
Getting a substring means extracting part of a string, an operation also called slicing. In Python, s[0:n] returns the first n characters of a string s.
To convert a list to a string, use the str.join() method, which concatenates the list's elements into a new string and returns it. If the elements are not all strings, a list comprehension can convert each one to a string first.
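A minimal sketch of the join() approach, with made-up sample lists; the separator strings are arbitrary choices for the example:

```python
items = ["entry21", "entry22"]

# join() is called on the separator and takes the list of strings.
joined = ",".join(items)
print(joined)

# Non-string elements must be converted first; a list comprehension does that.
numbers = [1, 2, 3]
joined_numbers = ", ".join([str(n) for n in numbers])
print(joined_numbers)
```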
You can "apply" ast.literal_eval() to the series:
In [8]: from ast import literal_eval
In [9]: df['A'] = df['A'].apply(literal_eval)
In [10]: df
Out[10]:
                    A
0           [entry11]
1  [entry21, entry22]
2  [entry31, entry32]
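For anyone who wants to run this outside an interactive session, here is a self-contained sketch of the same idea. The DataFrame is rebuilt from the sample strings in the question. The json.loads variant is an alternative I am adding, and it assumes every entry is valid JSON (double-quoted strings, as in the sample data):

```python
import json
from ast import literal_eval

import pandas as pd

df = pd.DataFrame(
    {"A": ['["entry11"]', '["entry21","entry22"]', '["entry31","entry32"]']}
)

# literal_eval parses each string as a Python literal, so '["a","b"]'
# becomes the list ['a', 'b'] without any manual stripping of brackets.
parsed = df["A"].apply(literal_eval)

# Alternative, assuming the entries are valid JSON: json.loads gives the
# same lists for this sample data.
parsed_json = df["A"].apply(json.loads)

print(parsed.tolist())
print(type(parsed.iloc[0]))
```

literal_eval only evaluates literals (strings, numbers, lists, dicts and so on), so it is a safer choice than eval() for data read from a file. If the conversion should happen at import time, read_csv also accepts a converters argument that maps a column name to a function such as literal_eval.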
There is also map() and applymap(); the differences between them are discussed in a separate topic.
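As a small sketch of the map() variant, using a shortened version of the sample data: on a Series, map() with a function works element by element, so for this task it gives the same result as apply().

```python
from ast import literal_eval

import pandas as pd

s = pd.Series(['["entry11"]', '["entry21","entry22"]'])

# Both calls run literal_eval on every element of the Series.
via_map = s.map(literal_eval)
via_apply = s.apply(literal_eval)

print(via_map.tolist())
print(via_map.tolist() == via_apply.tolist())
```

applymap() is the element-wise method for a whole DataFrame rather than a single column; in pandas 2.1 and later it is deprecated in favor of DataFrame.map().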