Pandas: How to Compare Columns of Lists Row-wise in a DataFrame with Pandas (not for loop)?

Tags: python, pandas

If I have columns of lists, is there a pandas function that lets me operate on the entire array of lists to check for intersection and return either a boolean or the intersecting values as a new series?

For example, I'd like pandas to have an equivalent of this:

import pandas as pd

def intersection(df, col1, col2, return_type='boolean'):
    s = []
    if return_type == 'boolean':
        # True if any item of col2 also appears in col1 on the same row
        for _, row in df.iterrows():
            s.append(any(phrase in row[col1] for phrase in row[col2]))
    elif return_type == 'word':
        # Comma-separated items common to both lists on the same row
        for _, row in df.iterrows():
            s.append(', '.join(set(row[col1]).intersection(row[col2])))
    else:
        raise ValueError("return_type must be 'boolean' or 'word'")
    # Reuse df's index so the result aligns when assigned back as a column
    return pd.Series(s, index=df.index)

#Create column C in df
df['C'] = intersection(df, 'A', 'B', 'word')

... without having to write my own function or resort to for loops. I feel like there must be an easier way to compare lists in two columns on the same row to see if they intersect.

I can do it with for loops, but it looks ugly to me.

A for loop that produces a boolean for each row:

for _, row in df.iterrows():
    any(phrase in row['A'] for phrase in row['B'])

Produces:

False
False
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True

Or, finding the intersecting words using sets:

for _, row in df.iterrows():
    ', '.join(set(row['A']).intersection(row['B']))

''
''
'system'
'system'
'gutter'
'gutter'
'gutter'
'gutter'
'gutter'
'gutter'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'toledo, aluminum'

asked Feb 25 '16 00:02

Jarad

2 Answers

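To check whether every item in df.A is contained in df.B (a subset test, which is stricter than the any-overlap check in the question but gives the same result on this data):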

>>> df.apply(lambda row: all(i in row.B for i in row.A), axis=1)
# OR: ~(df['A'].apply(set) - df['B'].apply(set)).astype(bool)
0     False
1     False
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
13     True
14     True
15     True
16     True
17     True
18     True
19     True
dtype: bool
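
The check above requires every item of A to be in B, whereas the loop in the question asks only whether the two lists share at least one item. A minimal sketch of both tests using Python's built-in set methods follows. The three sample rows are hypothetical and chosen for illustration; they are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data; not the asker's actual frame.
df = pd.DataFrame({
    "A": [["gener"], ["system"], ["aluminum", "toledo"]],
    "B": [["gutter"], ["gutter", "system"], ["aluminum", "gutter"]],
})

# True when the two lists share at least one item (any overlap).
any_overlap = pd.Series(
    [not set(a).isdisjoint(b) for a, b in zip(df["A"], df["B"])],
    index=df.index,
)

# True only when every item of A is also in B (subset test).
all_contained = pd.Series(
    [set(a).issubset(b) for a, b in zip(df["A"], df["B"])],
    index=df.index,
)

print(any_overlap.tolist())    # [False, True, True]
print(all_contained.tolist())  # [False, True, False]
```

The last row shows where the two tests disagree: 'aluminum' is shared, but 'toledo' is missing from B.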

To get the intersection:

df['intersection'] = [list(set(a).intersection(set(b))) 
                      for a, b in zip(df.A, df.B)]

>>> df
                     A                                      B        intersection
0              [gener]                               [gutter]                  []
1              [gener]                               [gutter]                  []
2             [system]                       [gutter, system]            [system]
3             [system]                [gutter, guard, system]            [system]
4             [gutter]                         [ohio, gutter]            [gutter]
5             [gutter]                       [gutter, toledo]            [gutter]
6             [gutter]                       [toledo, gutter]            [gutter]
7             [gutter]                               [gutter]            [gutter]
8             [gutter]                               [gutter]            [gutter]
9             [gutter]                               [gutter]            [gutter]
10          [aluminum]    [how, to, instal, aluminum, gutter]          [aluminum]
11          [aluminum]                     [aluminum, gutter]          [aluminum]
12          [aluminum]              [aluminum, gutter, color]          [aluminum]
13          [aluminum]                     [aluminum, gutter]          [aluminum]
14          [aluminum]       [aluminum, gutter, adrian, ohio]          [aluminum]
15          [aluminum]  [aluminum, gutter, bowl, green, ohio]          [aluminum]
16          [aluminum]        [aluminum, gutter, maume, ohio]          [aluminum]
17          [aluminum]   [aluminum, gutter, perrysburg, ohio]          [aluminum]
18          [aluminum]     [aluminum, gutter, tecumseh, ohio]          [aluminum]
19  [aluminum, toledo]       [aluminum, gutter, toledo, ohio]  [aluminum, toledo]
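
If a comma-separated string is wanted instead of a list, as in the question's 'word' mode, the same comprehension can join the common items. This is only a sketch on hypothetical sample rows. Sorting is an added assumption to make the order of the joined words deterministic, since set ordering is arbitrary.

```python
import pandas as pd

# Hypothetical sample data; not the asker's actual frame.
df = pd.DataFrame({
    "A": [["gener"], ["gutter"], ["aluminum", "toledo"]],
    "B": [["gutter"], ["ohio", "gutter"], ["aluminum", "gutter", "toledo", "ohio"]],
})

# Intersect the two lists row by row and join the common items into one string.
df["C"] = [", ".join(sorted(set(a) & set(b))) for a, b in zip(df["A"], df["B"])]

print(df["C"].tolist())  # ['', 'gutter', 'aluminum, toledo']
```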

answered Oct 16 '22 19:10

Alexander

Just use the apply function that pandas provides; it works well for this.

Because you may want to intersect more than two columns, you can write a helper function and apply it with DataFrame.apply (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html). Note that axis=1 applies the function to each row, while axis=0 applies it to each column. With axis=1, each row is passed to the function as an iterable Series whose items are that row's values across the columns.

def intersect(ss):
    ss = iter(ss)
    s = set(next(ss))
    for t in ss:
        s.intersection_update(t)  # `t` need not be a `set` here; a `list` or any iterable is OK
    return s

res = df.apply(intersect, axis=1)

>>> res
0                     {}
1                     {}
2               {system}
3               {system}
4               {gutter}
5               {gutter}
6               {gutter}
7               {gutter}
8               {gutter}
9               {gutter}
10            {aluminum}
11            {aluminum}
12            {aluminum}
13            {aluminum}
14            {aluminum}
15            {aluminum}
16            {aluminum}
17            {aluminum}
18            {aluminum}
19    {aluminum, toledo}

You can apply further operations to the result of the helper function, or write similar variations of it.
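
As one possible illustration of the multi-column case, the sketch below applies the same kind of helper to a frame with three list columns. The sample rows and the extra column "C" are hypothetical, added only to show that the helper does not depend on the number of columns.

```python
import pandas as pd

def intersect(row):
    """Return the set of items common to every list in the row."""
    lists = iter(row)
    common = set(next(lists))
    for other in lists:
        # intersection_update accepts any iterable, so lists are fine here.
        common.intersection_update(other)
    return common

# Hypothetical three-column frame; column "C" is an assumption for illustration.
df = pd.DataFrame({
    "A": [["gutter"], ["aluminum", "toledo"]],
    "B": [["ohio", "gutter"], ["aluminum", "gutter", "toledo"]],
    "C": [["gutter", "guard"], ["toledo", "ohio"]],
})

# axis=1 passes each row to the helper as a Series of lists.
res = df.apply(intersect, axis=1)

print(res.tolist())  # [{'gutter'}, {'toledo'}]
```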

Hope this helps.

answered Oct 16 '22 17:10

ShellayLee
