 

Is there a way to check if a dataframe is contained within another?

Tags:

python

pandas

I am using pandas to check whether one DataFrame is contained within another. The .isin() method is only helpful (i.e., returns True) when the labels match (ref: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html), but I want to extend the check to cases where the labels don't match.

Example: df1:

+----+----+----+----+----+
| 3  | 4  | 5  | 6  | 7  |
+----+----+----+----+----+
| 11 | 13 | 10 | 15 | 12 |
+----+----+----+----+----+
| 8  | 2  | 9  | 0  | 1  |
+----+----+----+----+----+
| 14 | 23 | 31 | 21 | 19 |
+----+----+----+----+----+

df2:

+----+----+
| 13 | 10 |
+----+----+
| 2  | 9  |
+----+----+

I want the output to be True, since df2 is inside df1. Any ideas how to do that using pandas?
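A minimal reproducible setup, rebuilt from the two tables above. The tables show no row or column labels, so this sketch assumes pandas' default integer labels. It also illustrates why .isin() is not enough here: with a DataFrame argument it compares cells that share the same index and column labels, so it only ever looks at df1's top-left 2x2 corner.

```python
import pandas as pd

# Rebuild the tables from the question (assumption: default integer labels).
df1 = pd.DataFrame([
    [3, 4, 5, 6, 7],
    [11, 13, 10, 15, 12],
    [8, 2, 9, 0, 1],
    [14, 23, 31, 21, 19],
])
df2 = pd.DataFrame([[13, 10], [2, 9]])

# isin aligns on labels: df1 cell (i, j) is compared only with df2 cell (i, j),
# and cells whose labels are missing from df2 come back False.
label_aligned = df1.isin(df2)
print(label_aligned.to_numpy().any())  # False
```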

asked Sep 11 '25 03:09 by Sam


1 Answer

You can use NumPy's sliding_window_view (available since NumPy 1.20) to compare df2 against every df2-sized window of df1:

from numpy.lib.stride_tricks import sliding_window_view as swv

# True if at least one window of df1 equals df2 in every cell
(swv(df1, df2.shape)==df2.to_numpy()).all((-2, -1)).any()

Output: True
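The one-liner above can be wrapped in a small helper. This is a sketch, not part of the original answer: the name contains_block is hypothetical, and the size guard is an addition, because sliding_window_view raises ValueError when the window is larger than the array. It also prints the shape of the window view, which makes clear what the two reduced axes (-2, -1) are.

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv


def contains_block(big: pd.DataFrame, small: pd.DataFrame) -> bool:
    """Hypothetical helper: True if `small` is a contiguous block of `big`, ignoring labels."""
    a, b = big.to_numpy(), small.to_numpy()
    # sliding_window_view refuses windows larger than the input, so treat
    # an oversized `small` as "not contained" instead of raising.
    if b.shape[0] > a.shape[0] or b.shape[1] > a.shape[1]:
        return False
    # Shape (rows - h + 1, cols - w + 1, h, w): one h x w view per offset.
    windows = swv(a, b.shape)
    # Compare each window with `small` (broadcast over the last two axes),
    # require every cell of a window to match, then accept any window.
    return bool((windows == b).all(axis=(-2, -1)).any())


df1 = pd.DataFrame([[3, 4, 5, 6, 7], [11, 13, 10, 15, 12],
                    [8, 2, 9, 0, 1], [14, 23, 31, 21, 19]])
df2 = pd.DataFrame([[13, 10], [2, 9]])

print(swv(df1.to_numpy(), df2.shape).shape)  # (3, 4, 2, 2)
print(contains_block(df1, df2))  # True
```

Because both frames go through to_numpy(), the index and column labels play no part in the comparison, which is exactly what the question asks for.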

Intermediate result (one boolean per window position, True where the whole window equals df2):

(swv(df1, df2.shape)==df2.to_numpy()).all((-2, -1))

array([[False, False, False, False],
       [False,  True, False, False],  # df2 found at row 1, column 1 (0-based)
       [False, False, False, False]])
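If you also want to know where df2 sits inside df1, the same boolean array can be passed to np.argwhere. This is a sketch beyond the original answer; it maps the positional offsets back to df1's own labels, which only differ from the offsets when df1 has a non-default index or columns.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

df1 = pd.DataFrame([[3, 4, 5, 6, 7], [11, 13, 10, 15, 12],
                    [8, 2, 9, 0, 1], [14, 23, 31, 21, 19]])
df2 = pd.DataFrame([[13, 10], [2, 9]])

# Same intermediate array as above: True where a whole window equals df2.
hits = (swv(df1.to_numpy(), df2.shape) == df2.to_numpy()).all(axis=(-2, -1))

# np.argwhere lists the (row, column) offset of every matching window.
positions = np.argwhere(hits)
print(positions.tolist())  # [[1, 1]]

# Translate positional offsets into df1's index/column labels.
for r, c in positions:
    print(df1.index[r], df1.columns[c])  # 1 1
```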

Using a partial match:

Example 1: at least 75% of the values match:

from numpy.lib.stride_tricks import sliding_window_view as swv

# Fraction of equal cells per window; True if any window reaches 75%
((swv(df1, df2.shape)==df2.to_numpy()).mean((-2, -1))>=0.75).any()

Example 2: at least 3 values match:

from numpy.lib.stride_tricks import sliding_window_view as swv

# Number of equal cells per window; True if any window has at least 3
((swv(df1, df2.shape)==df2.to_numpy()).sum((-2, -1))>=3).any()
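For partial matching it can help to report the best window rather than a single True/False. This is a sketch under the same approach; the helper name best_partial_match is hypothetical. It counts equal cells per window (as in Example 2), then uses argmax plus np.unravel_index to recover the offset of the best window. Note that the two thresholds are related: "at least 75% of the cells" is the same as "at least 0.75 * df2.size cells", which for a 2x2 df2 means at least 3.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv


def best_partial_match(big: pd.DataFrame, small: pd.DataFrame):
    """Hypothetical helper: ((row, col) offset, matching-cell count) of the best window."""
    b = small.to_numpy()
    # counts[i, j] = number of cells in the window at offset (i, j) equal to `small`.
    counts = (swv(big.to_numpy(), b.shape) == b).sum(axis=(-2, -1))
    # argmax works on the flattened array; unravel_index turns it into (row, col).
    r, c = np.unravel_index(counts.argmax(), counts.shape)
    return (int(r), int(c)), int(counts[r, c])


df1 = pd.DataFrame([[3, 4, 5, 6, 7], [11, 13, 10, 15, 12],
                    [8, 2, 9, 0, 1], [14, 23, 31, 21, 19]])
df2 = pd.DataFrame([[13, 10], [2, 9]])

offset, n = best_partial_match(df1, df2)
print(offset, n, n / df2.size)  # (1, 1) 4 1.0
print(n / df2.size >= 0.75, n >= 3)  # True True
```

When several windows tie for the highest count, argmax returns the first one in row-major order.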

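Alternative input (3 of its 4 values match the window at row 1, column 1, so the exact check returns False while both partial checks return True):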

df2 = pd.DataFrame([[13, 10], [2, 8]])
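A self-contained run of the three checks from this answer against the alternative input. The only change from the snippets above is that the element-wise comparison is computed once and reused; the expected results follow from the window at row 1, column 1, which holds 13, 10, 2, 9 against 13, 10, 2, 8.

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

df1 = pd.DataFrame([[3, 4, 5, 6, 7], [11, 13, 10, 15, 12],
                    [8, 2, 9, 0, 1], [14, 23, 31, 21, 19]])
df2 = pd.DataFrame([[13, 10], [2, 8]])  # the alternative input

# Element-wise comparison of df2 with every 2x2 window of df1, shape (3, 4, 2, 2).
matches = swv(df1.to_numpy(), df2.shape) == df2.to_numpy()

exact = bool(matches.all(axis=(-2, -1)).any())
at_least_75pct = bool((matches.mean(axis=(-2, -1)) >= 0.75).any())
at_least_3 = bool((matches.sum(axis=(-2, -1)) >= 3).any())
print(exact, at_least_75pct, at_least_3)  # False True True
```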
answered Sep 13 '25 18:09 by mozway