how to filter multi-dimentional dataframe on sub dataframe

Question

I have a dataframe, memory that looks like this:

>>> memory
  input             action result
      1   2   3   4 action      1   2   3   4
0    11  22  33  44      a     10  20  30  40
1    10  20  30  40      b     90  90  90  90
2    90  90  90  90      c     91  91  91  91
>>> type(memory)
<class 'pandas.core.frame.DataFrame'>

I have a dataframe, bla that looks like this:

>>> bla
    1   2   3   4
0  11  22  33  44
>>> type(bla)
<class 'pandas.core.frame.DataFrame'>

I want a daraframe made of memory where bla is taken out:

>>> minus_bla
  input             action result
      1   2   3   4 action      1   2   3   4
1    10  20  30  40      b     90  90  90  90
2    90  90  90  90      c     91  91  91  91

and one where bla is selected for:

>>> memory_bla
  input             action result
      1   2   3   4 action      1   2   3   4
0    11  22  33  44      a     10  20  30  40

I tried to do this with filtering but that's just silly:

memory[memory.loc[:,'input'] == bla]

I got this error:

ValueError: Can only compare identically-labeled DataFrame objects

Anyway, maybe I can do this with a merge but so far I've had no luck.

The way I'm solving this right now is a huge workaround of producing a slicing conditional that looks like this:

>>> memory[
    (memory[('input', 1)]==bla.loc[0, 1]) & 
    (memory[('input', 2)]==bla.loc[0, 2]) & 
    (memory[('input', 3)]==bla.loc[0, 3]) &
    (memory[('input', 4)]==bla.loc[0, 4])]
  input             action result
      1   2   3   4 action      1   2   3   4
0    11  22  33  44      a     10  20  30  40

Isn't that just sad? Especially in my case where I could have a variable number of inputs (not just 4).

Surely there's a better way to select for and against a sub dataframe (even when the larger dataframe has multiple column levels), probably involving merge? can you point me in the right direction?

BENY · Accepted Answer

Using merge

idx=df.loc[:,'input'].merge(bla,indicator =True).index
df1=df.loc[df.index.difference(idx),:]
df2=df.loc[idx]
df1
Out[683]: 
  input             action result            
      1   2   3   4 action      1   2   3   4
1    10  20  30  40      b     90  90  90  90
2    90  90  90  90      c     91  91  91  91
df2
Out[684]: 
  input             action result            
      1   2   3   4 action      1   2   3   4
0    11  22  33  44      a     10  20  30  40

Erfan · Answer

Without having your data, you can achieve this by first doing a left merge and including indicator=True and after that filter on left_only:

# Example data
np.random.seed(0)
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})    
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})

print(left)
print(right)
  key     value
0   A  1.764052
1   B  0.400157
2   C  0.978738
3   D  2.240893
  key     value
0   B  1.867558
1   D -0.977278
2   E  0.950088
3   F -0.151357

Perfom a left join

df_join = pd.merge(left, right, on='key', how='left', indicator=True)
print(df_join)
  key   value_x   value_y     _merge
0   A  1.764052       NaN  left_only
1   B  0.400157  1.867558       both
2   C  0.978738       NaN  left_only
3   D  2.240893 -0.977278       both

Filter on only left

unmatch = df_join[df_join['_merge'] == 'left_only']
print(unmatch)
  key   value_x  value_y     _merge
0   A  1.764052      NaN  left_only
2   C  0.978738      NaN  left_only

how to filter multi-dimentional dataframe on sub dataframe

Tags:

python

pandas

Legit Stack

2 Answers

BENY

Erfan

Recent Activity

Donate For Us

how to filter multi-dimentional dataframe on sub dataframe

Tags:

python

pandas

Legit Stack

2 Answers

BENY

Erfan

Related questions

Recent Activity

Donate For Us