I have a dataframe, memory that looks like this:
>>> memory
input action result
1 2 3 4 action 1 2 3 4
0 11 22 33 44 a 10 20 30 40
1 10 20 30 40 b 90 90 90 90
2 90 90 90 90 c 91 91 91 91
>>> type(memory)
<class 'pandas.core.frame.DataFrame'>
I have a dataframe, bla that looks like this:
>>> bla
1 2 3 4
0 11 22 33 44
>>> type(bla)
<class 'pandas.core.frame.DataFrame'>
I want a daraframe made of memory where bla is taken out:
>>> minus_bla
input action result
1 2 3 4 action 1 2 3 4
1 10 20 30 40 b 90 90 90 90
2 90 90 90 90 c 91 91 91 91
and one where bla is selected for:
>>> memory_bla
input action result
1 2 3 4 action 1 2 3 4
0 11 22 33 44 a 10 20 30 40
I tried to do this with filtering but that's just silly:
memory[memory.loc[:,'input'] == bla]
I got this error:
ValueError: Can only compare identically-labeled DataFrame objects
Anyway, maybe I can do this with a merge but so far I've had no luck.
The way I'm solving this right now is a huge workaround of producing a slicing conditional that looks like this:
>>> memory[
(memory[('input', 1)]==bla.loc[0, 1]) &
(memory[('input', 2)]==bla.loc[0, 2]) &
(memory[('input', 3)]==bla.loc[0, 3]) &
(memory[('input', 4)]==bla.loc[0, 4])]
input action result
1 2 3 4 action 1 2 3 4
0 11 22 33 44 a 10 20 30 40
Isn't that just sad? Especially in my case where I could have a variable number of inputs (not just 4).
Surely there's a better way to select for and against a sub dataframe (even when the larger dataframe has multiple column levels), probably involving merge? can you point me in the right direction?
Using merge
idx=df.loc[:,'input'].merge(bla,indicator =True).index
df1=df.loc[df.index.difference(idx),:]
df2=df.loc[idx]
df1
Out[683]:
input action result
1 2 3 4 action 1 2 3 4
1 10 20 30 40 b 90 90 90 90
2 90 90 90 90 c 91 91 91 91
df2
Out[684]:
input action result
1 2 3 4 action 1 2 3 4
0 11 22 33 44 a 10 20 30 40
Without having your data, you can achieve this by first doing a left merge and including indicator=True and after that filter on left_only:
# Example data
np.random.seed(0)
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})
print(left)
print(right)
key value
0 A 1.764052
1 B 0.400157
2 C 0.978738
3 D 2.240893
key value
0 B 1.867558
1 D -0.977278
2 E 0.950088
3 F -0.151357
Perfom a left join
df_join = pd.merge(left, right, on='key', how='left', indicator=True)
print(df_join)
key value_x value_y _merge
0 A 1.764052 NaN left_only
1 B 0.400157 1.867558 both
2 C 0.978738 NaN left_only
3 D 2.240893 -0.977278 both
Filter on only left
unmatch = df_join[df_join['_merge'] == 'left_only']
print(unmatch)
key value_x value_y _merge
0 A 1.764052 NaN left_only
2 C 0.978738 NaN left_only
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With