How to duplicate rows in pandas, based on items in a list [duplicate]

Tags: python, pandas

I have a pandas dataframe that looks like this:

COL     data
line1   [A,B,C]

where the items in the data column could be either a list or just comma-separated elements. Is there an easy way of getting:

COL     data
line1   A
line1   B
line1   C

I could iterate over the list and manually duplicate the rows via python, but is there some magic pandas trick for doing this? The key point is how to automatically duplicate the rows.

Thanks!

vgoklani asked Apr 11 '13 15:04




2 Answers

You could write a simple cleaning function to turn the string into a list (assuming the items themselves don't contain commas, and that you can't simply use ast.literal_eval):

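def clean_string_to_list(s):
    # Strip the surrounding brackets, then split on commas.
    # Unlike a per-character filter, this keeps multi-character items intact.
    return [item.strip() for item in s.strip('[]').split(',')]  # you might need to catch errors

df['data'] = df['data'].apply(clean_string_to_list)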

Iterating through the rows seems like a reasonable choice:

In [11]: pd.DataFrame([(row['COL'], d)
                       for _, row in df.iterrows()
                       for d in row['data']],
                      columns=df.columns)
Out[11]:
     COL data
0  line1    A
1  line1    B
2  line1    C

I'm afraid I don't think pandas caters specifically for this kind of manipulation.
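Short of a dedicated method, the row duplication itself can be done without an explicit Python loop over rows. The following is a minimal sketch, assuming every cell in the data column already holds a list; the sample frame and the names `lengths` and `repeated` are made up for illustration.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: each cell in "data" is already a Python list.
df = pd.DataFrame({
    "COL": ["line1", "line2"],
    "data": [["A", "B", "C"], ["D"]],
})

# Number of items in each row's list.
lengths = df["data"].map(len)

# Repeat each index label once per list item, then select those rows.
repeated = df.loc[df.index.repeat(lengths.to_numpy())].reset_index(drop=True)

# Overwrite the repeated lists with the flattened items, in row order.
repeated["data"] = list(chain.from_iterable(df["data"]))

print(repeated)
```

Note that a row whose list is empty is repeated zero times, so it disappears from the result.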

Andy Hayden answered Sep 21 '22 19:09



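You can use df.explode(), available since pandas 0.25. Calling df.explode('data') gives each element of a list-like cell its own row and repeats the values of the other columns. Refer to the documentation; I believe this is exactly the functionality you need.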
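As a rough sketch of how that might look for the data in the question: the sample frame and the `to_list` helper below are assumptions for illustration, covering both cases the question mentions (a real list and a comma-separated string).

```python
import pandas as pd

# Hypothetical sample data: one cell holds a list, the other a
# comma-separated string.
df = pd.DataFrame({
    "COL": ["line1", "line2"],
    "data": [["A", "B", "C"], "D,E"],
})


def to_list(value):
    """Return lists unchanged; split comma-separated strings into a list."""
    if isinstance(value, list):
        return value
    return [item.strip() for item in str(value).strip("[]").split(",")]


# Normalise every cell to a list so explode() has something to unpack.
df["data"] = df["data"].apply(to_list)

# explode() gives each list element its own row and repeats the other
# columns; it also repeats the original index, so reset it afterwards.
result = df.explode("data").reset_index(drop=True)

print(result)
```

Plain strings are not list-like, so explode() leaves them as they are; that is why the strings are split into lists first.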

Sagar Simha answered Sep 22 '22 19:09
