I have a pandas dataframe that looks like this:
COL data
line1 [A,B,C]
where the items in the data column could be either a list or just comma-separated elements. Is there an easy way of getting:
COL data
line1 A
line1 B
line1 C
I could iterate over the list and manually duplicate the rows via python, but is there some magic pandas trick for doing this? The key point is how to automatically duplicate the rows.
Thanks!
You could write a simple cleaning function to turn each string into a list (assuming the items aren't commas themselves, and you can't simply use ast.literal_eval):
def clean_string_to_list(s):
    return [c for c in s if c not in '[,]']  # you might need to catch errors

df['data'] = df['data'].apply(clean_string_to_list)
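As a quick sanity check, here is a minimal sketch of that cleaning step on a hypothetical frame whose data column holds strings like "[A,B,C]" (note this character-by-character approach only works when each item is a single character):

```python
import pandas as pd

# Hypothetical frame: 'data' holds a string, not a list
df = pd.DataFrame({'COL': ['line1'], 'data': ['[A,B,C]']})

def clean_string_to_list(s):
    # Drop the brackets and commas, keeping each item character
    return [c for c in s if c not in '[,]']  # you might need to catch errors

df['data'] = df['data'].apply(clean_string_to_list)
print(df['data'].iloc[0])  # ['A', 'B', 'C']
```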
Iterating through the rows seems like a reasonable choice:
In [11]: pd.DataFrame([(row['COL'], d)
                       for _, row in df.iterrows()
                       for d in row['data']],
                      columns=df.columns)
Out[11]:
COL data
0 line1 A
1 line1 B
2 line1 C
I'm afraid I don't think pandas caters specifically for this kind of manipulation.
You can use df.explode(). Refer to the documentation; I believe this is exactly the functionality you need.
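A minimal sketch of explode() on the example from the question (explode was added in pandas 0.25, so it needs a reasonably recent version; the 'data' cells must already be lists):

```python
import pandas as pd

# Reproduce the example frame: one row whose 'data' cell holds a list
df = pd.DataFrame({'COL': ['line1'], 'data': [['A', 'B', 'C']]})

# explode() gives each list element its own row, duplicating 'COL'
out = df.explode('data').reset_index(drop=True)
print(out)
#      COL data
# 0  line1    A
# 1  line1    B
# 2  line1    C
```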