Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to filter multi-dimentional dataframe on sub dataframe

Tags:

python

pandas

I have a dataframe, memory that looks like this:

>>> memory
  input             action result
      1   2   3   4 action      1   2   3   4
0    11  22  33  44      a     10  20  30  40
1    10  20  30  40      b     90  90  90  90
2    90  90  90  90      c     91  91  91  91
>>> type(memory)
<class 'pandas.core.frame.DataFrame'>

I have a dataframe, bla that looks like this:

>>> bla
    1   2   3   4
0  11  22  33  44
>>> type(bla)
<class 'pandas.core.frame.DataFrame'>

I want a daraframe made of memory where bla is taken out:

>>> minus_bla
  input             action result
      1   2   3   4 action      1   2   3   4
1    10  20  30  40      b     90  90  90  90
2    90  90  90  90      c     91  91  91  91

and one where bla is selected for:

>>> memory_bla
  input             action result
      1   2   3   4 action      1   2   3   4
0    11  22  33  44      a     10  20  30  40

I tried to do this with filtering but that's just silly:

memory[memory.loc[:,'input'] == bla]

I got this error:

ValueError: Can only compare identically-labeled DataFrame objects

Anyway, maybe I can do this with a merge but so far I've had no luck.

The way I'm solving this right now is a huge workaround of producing a slicing conditional that looks like this:

>>> memory[
    (memory[('input', 1)]==bla.loc[0, 1]) & 
    (memory[('input', 2)]==bla.loc[0, 2]) & 
    (memory[('input', 3)]==bla.loc[0, 3]) &
    (memory[('input', 4)]==bla.loc[0, 4])]
  input             action result
      1   2   3   4 action      1   2   3   4
0    11  22  33  44      a     10  20  30  40

Isn't that just sad? Especially in my case where I could have a variable number of inputs (not just 4).

Surely there's a better way to select for and against a sub dataframe (even when the larger dataframe has multiple column levels), probably involving merge? can you point me in the right direction?

like image 479
Legit Stack Avatar asked Mar 01 '26 07:03

Legit Stack


2 Answers

Using merge

idx=df.loc[:,'input'].merge(bla,indicator =True).index
df1=df.loc[df.index.difference(idx),:]
df2=df.loc[idx]
df1
Out[683]: 
  input             action result            
      1   2   3   4 action      1   2   3   4
1    10  20  30  40      b     90  90  90  90
2    90  90  90  90      c     91  91  91  91
df2
Out[684]: 
  input             action result            
      1   2   3   4 action      1   2   3   4
0    11  22  33  44      a     10  20  30  40
like image 129
BENY Avatar answered Mar 03 '26 21:03

BENY


Without having your data, you can achieve this by first doing a left merge and including indicator=True and after that filter on left_only:

# Example data
np.random.seed(0)
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})    
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})

print(left)
print(right)
  key     value
0   A  1.764052
1   B  0.400157
2   C  0.978738
3   D  2.240893
  key     value
0   B  1.867558
1   D -0.977278
2   E  0.950088
3   F -0.151357

Perfom a left join

df_join = pd.merge(left, right, on='key', how='left', indicator=True)
print(df_join)
  key   value_x   value_y     _merge
0   A  1.764052       NaN  left_only
1   B  0.400157  1.867558       both
2   C  0.978738       NaN  left_only
3   D  2.240893 -0.977278       both

Filter on only left

unmatch = df_join[df_join['_merge'] == 'left_only']
print(unmatch)
  key   value_x  value_y     _merge
0   A  1.764052      NaN  left_only
2   C  0.978738      NaN  left_only
like image 34
Erfan Avatar answered Mar 03 '26 20:03

Erfan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!