How do I extract the list inside a string in Python?

Tags: python

I imported a CSV using Pandas and one column was read in with string entries. Examining the entries for this Series (column), I see that they should actually be lists. For example:

df['A'] = pd.Series(['["entry11"]', '["entry21","entry22"]', '["entry31","entry32"]'])

I would like to extract the list elements from the strings. So far, I've tried the following chain:

df['A'] = (df['A'].replace("'", '', regex=True)
                  .replace(r'\[', '', regex=True)
                  .replace(r'\]', '', regex=True)
                  .str.split(","))

This gives me back my desired list elements in one column:

['"entry11"']
['"entry21", "entry22"']
['"entry31", "entry32"']

My question: Is there a more efficient way of doing this? This seems like a lot of strain for something that should be a little easier.

577

asked Jan 30 '17 21:01

Chris

1 Answer

You can apply ast.literal_eval() to each element of the Series:

In [8]: from ast import literal_eval

In [9]: df['A'] = df['A'].apply(literal_eval)

In [10]: df
Out[10]: 
                    A
0           [entry11]
1  [entry21, entry22]
2  [entry31, entry32]
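
Since the column came from a CSV, the same conversion can probably be done while loading the file. Here is a minimal sketch, assuming the file looks like the inline sample below (the CSV text and the column name A are made up for illustration). It passes literal_eval through the converters argument of pd.read_csv, so each cell of that column is parsed as it is read:

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV content standing in for the real file. The list cells are
# quoted so that the embedded commas stay inside a single field.
csv_text = '''A
"[""entry11""]"
"[""entry21"",""entry22""]"
"[""entry31"",""entry32""]"
'''

# converters maps a column name to a function that is called on each raw
# string cell of that column.
df = pd.read_csv(io.StringIO(csv_text), converters={"A": literal_eval})

print(df["A"].tolist())
print(type(df.loc[0, "A"]))
```

Note that literal_eval raises an error on cells that are not valid Python literals (an empty cell, for example), so real data may need a small wrapper function around it.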

There are also map() and applymap(); the differences between them are discussed in this topic:

Difference between map, applymap and apply methods in Pandas
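
As a rough illustration for this particular case: on a single Series, apply() and map() both call the function once per element, so either should work here. applymap() is a DataFrame method that runs element-wise over every column (and, as far as I know, recent pandas releases deprecate it in favor of DataFrame.map), so it is not needed for one column. A small sketch, reusing the sample data from the question:

```python
from ast import literal_eval

import pandas as pd

s = pd.Series(['["entry11"]', '["entry21","entry22"]', '["entry31","entry32"]'])

# On a Series, both methods call literal_eval once per element.
via_apply = s.apply(literal_eval)
via_map = s.map(literal_eval)

# Compare the parsed lists produced by the two methods.
print(via_apply.tolist() == via_map.tolist())
```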

176

answered Oct 23 '22 08:10

alecxe
