Iterate over values in pandas column containing lists and retrieve only unique values

Question

These are three questions I just can't figure out; I hope someone can help me out.

import pandas as pd
data = {'Col1': ['ONE, ONE, NULL', 'ONE, TWO, THREE', 'TWO, NULL, TEN']}
index = pd.Index(['d1','d2','d3'])
data = pd.DataFrame(data, index=index)
pattern = 'ONE|TWO'                  # <---- QUESTION 1
data['Col1'].str.findall(pattern)    # <---- QUESTION 2

Question1: How can I change this regex so that 'ONE' is found only once in d1? As it stands, every occurrence of ONE is returned, as shown below:

d1    [ONE, ONE]
d2    [ONE, TWO]
d3         [TWO]

I want this:

d1         [ONE]
d2    [ONE, TWO]
d3         [TWO]
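A minimal sketch of two ways to get that output, assuming only whole-word matches matter and each cell is a single line of text. The regex version wraps the alternation in a capturing group and adds a negative lookahead with a backreference, `(?!.*\b\1\b)`, which rejects a match if the same word appears again later in the string, so each word survives only at its last occurrence. Because the pattern has one capturing group, `str.findall` returns the group text. The second version keeps the original pattern and removes duplicates afterwards with `dict.fromkeys`.

```python
import pandas as pd

data = pd.DataFrame(
    {'Col1': ['ONE, ONE, NULL', 'ONE, TWO, THREE', 'TWO, NULL, TEN']},
    index=pd.Index(['d1', 'd2', 'd3']),
)

# Match ONE or TWO as whole words, but only if the same word does not
# appear again later in the string (negative lookahead + backreference \1).
pattern = r'\b(ONE|TWO)\b(?!.*\b\1\b)'
per_row = data['Col1'].str.findall(pattern)
print(per_row)

# Alternative: keep the original pattern and drop repeats afterwards.
# dict.fromkeys removes duplicates while keeping first-seen order.
per_row_alt = data['Col1'].str.findall('ONE|TWO').apply(lambda xs: list(dict.fromkeys(xs)))
print(per_row_alt)
```

The two versions can order results differently: for a string like 'TWO, ONE, TWO' the lookahead keeps each word at its last position and gives ['ONE', 'TWO'], while `dict.fromkeys` keeps first-seen order and gives ['TWO', 'ONE'].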

Question2:
I want to take the lists d1, d2 and d3 and combine them into one list containing only unique values, something like this:

set(d1 + d2 + d3) ---> ['ONE', 'TWO']
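For plain Python lists, the `set(d1 + d2 + d3)` idea works almost as written; the lists below are hypothetical stand-ins for the per-row results from Question 1. A `set` has no defined order, so this sketch sorts it to get a list, and also shows an order-preserving variant.

```python
# Hypothetical stand-ins for the per-row match lists from Question 1.
d1 = ['ONE', 'ONE']
d2 = ['ONE', 'TWO']
d3 = ['TWO']

# Concatenate, deduplicate with a set, then sort to get a stable list.
unique_sorted = sorted(set(d1 + d2 + d3))
print(unique_sorted)  # ['ONE', 'TWO']

# Order-preserving alternative: dict keys are unique and keep insertion
# order (guaranteed since Python 3.7).
unique_in_order = list(dict.fromkeys(d1 + d2 + d3))
print(unique_in_order)  # ['ONE', 'TWO']
```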

Question3:
If I had done something like this:

data['Col2'] = data['Col1'].str.findall(pattern)

How could I iterate over every row in Col2 to get the same result I asked for in Question2?
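One straightforward sketch for Question 3, assuming Col2 holds the lists produced by `str.findall`: iterating over a pandas Series yields its values, so each row's list can be fed into a running set with `set.update`.

```python
import pandas as pd

data = pd.DataFrame(
    {'Col1': ['ONE, ONE, NULL', 'ONE, TWO, THREE', 'TWO, NULL, TEN']},
    index=pd.Index(['d1', 'd2', 'd3']),
)
data['Col2'] = data['Col1'].str.findall('ONE|TWO')

# Walk every row of Col2; each cell is a Python list of matches.
seen = set()
for matches in data['Col2']:
    seen.update(matches)  # duplicates are ignored by the set

unique_values = sorted(seen)
print(unique_values)  # ['ONE', 'TWO']
```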

Andy Hayden · Accepted Answer

You can use reduce (over set.union). In Python 3, reduce is no longer a builtin, so import it first with from functools import reduce:

In [11]: reduce(set.union, data['Col1'].str.findall(pattern), set())
Out[11]: {'ONE', 'TWO'}
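A self-contained Python 3 sketch of the session above, with one swapped-in alternative that is not part of the original answer: `Series.explode` (available in pandas 0.25 and later), which puts each list element on its own row so that `unique()` can collect the distinct values in order of first appearance. `set.union` accepts any iterable as its second argument, which is why the lists need no conversion inside `reduce`.

```python
from functools import reduce  # reduce is not a builtin in Python 3

import pandas as pd

data = pd.DataFrame(
    {'Col1': ['ONE, ONE, NULL', 'ONE, TWO, THREE', 'TWO, NULL, TEN']},
    index=pd.Index(['d1', 'd2', 'd3']),
)
matches = data['Col1'].str.findall('ONE|TWO')

# Fold the per-row lists into one set, starting from an empty set.
combined = reduce(set.union, matches, set())
print(combined)  # a set containing 'ONE' and 'TWO'

# Alternative (pandas >= 0.25): one row per list element, drop NaN rows
# that empty lists would produce, then keep unique values in order.
combined_alt = matches.explode().dropna().unique().tolist()
print(combined_alt)  # ['ONE', 'TWO']
```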

Another option is to use a list comprehension:

In [12]: [w for w in ['ONE', 'TWO'] if data['Col1'].str.contains(w).any()]
Out[12]: ['ONE', 'TWO']

Tags: python, regex, pandas

user3139545