Copy the following dataframe to your clipboard:
   textId  score              textInfo
0   name1    1.0            text_stuff
1   name1    2.0  different_text_stuff
2   name1    2.0            text_stuff
3   name2    1.0  different_text_stuff
4   name2    1.3  different_text_stuff
5   name2    2.0  still_different_text
6   name2    1.0              yoko ono
7   name2    3.0     I lika da Gweneth
8   name3    1.0     Always a tradeoff
9   name3    3.0                What?!
Now use
import pandas as pd
df = pd.read_clipboard(sep=r'\s\s+')
to load it into your environment. How does one slice this dataframe such that all the rows of a particular textId are returned if the score group of that textId includes at least one score equal to each of 1.0, 2.0 and 3.0? Here, the desired result would exclude the name1 rows, since that score group is missing a 3.0, and exclude name3, since its score group is missing a 2.0:
   textId  score              textInfo
0   name2    1.0  different_text_stuff
1   name2    1.3  different_text_stuff
2   name2    2.0  still_different_text
3   name2    1.0              yoko ono
4   name2    3.0     I lika da Gweneth
df[(df.textId == "textIdRowName") & (df.score == 1.0) & (df.score == 2.0) & (df.score == 3.0)]
isn't right, since the condition acts only on individual rows rather than on the textId group (no single row can hold three different scores, so the result is always empty). If this could be rewritten to match against textId groups, then it could be placed in a for loop and fed the unique textIdRowName values. Such a function would collect the names of the matching textId values in a series (say textIdThatMatchScore123) that could then be used to slice the original df, like df[df.textId.isin(textIdThatMatchScore123)].
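The loop described in the question might be sketched as follows. This only illustrates the idea, under some assumptions: the frame is rebuilt inline so the snippet runs without the clipboard, and the names required_scores and matching_ids are hypothetical, introduced here for the example.

```python
import pandas as pd

# Rebuild the example frame directly so the sketch does not depend on the clipboard.
df = pd.DataFrame({
    "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
    "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
    "textInfo": [
        "text_stuff", "different_text_stuff", "text_stuff",
        "different_text_stuff", "different_text_stuff",
        "still_different_text", "yoko ono", "I lika da Gweneth",
        "Always a tradeoff", "What?!",
    ],
})

# Hypothetical name: the scores every kept group must contain.
required_scores = {1.0, 2.0, 3.0}

# Visit each unique textId and keep it only if its scores cover all required values.
matching_ids = []
for text_id in df["textId"].unique():
    group_scores = set(df.loc[df["textId"] == text_id, "score"])
    if required_scores <= group_scores:  # subset test on sets
        matching_ids.append(text_id)

# Collect the matching names in a series and slice the original frame with it.
textIdThatMatchScore123 = pd.Series(matching_ids)
result = df[df.textId.isin(textIdThatMatchScore123)]
print(result)
```

The loop rescans the frame once per textId, so it is mainly useful for seeing the logic step by step.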
Here's one solution - groupby textId, then keep only those groups where the set of unique score values is a superset (>=) of [1.0, 2.0, 3.0].
In [58]: df.groupby('textId').filter(lambda x: set(x['score']) >= set([1.,2.,3.]))
Out[58]:
   textId  score              textInfo
3   name2    1.0  different_text_stuff
4   name2    1.3  different_text_stuff
5   name2    2.0  still_different_text
6   name2    1.0              yoko ono
7   name2    3.0     I lika da Gweneth
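The same rows can also be reached in two steps, by first computing one boolean per textId and then mapping it back onto the rows. The sketch below is one possible variant, not part of the answer above; it rebuilds the example data inline, and the names required, has_all and mask are hypothetical.

```python
import pandas as pd

# Same example data as in the question, built directly instead of via the clipboard.
df = pd.DataFrame({
    "textId": ["name1"] * 3 + ["name2"] * 5 + ["name3"] * 2,
    "score": [1.0, 2.0, 2.0, 1.0, 1.3, 2.0, 1.0, 3.0, 1.0, 3.0],
    "textInfo": [
        "text_stuff", "different_text_stuff", "text_stuff",
        "different_text_stuff", "different_text_stuff",
        "still_different_text", "yoko ono", "I lika da Gweneth",
        "Always a tradeoff", "What?!",
    ],
})

required = {1.0, 2.0, 3.0}  # hypothetical name for the scores each group must contain

# One boolean per textId: does the group's score column cover every required value?
has_all = df.groupby("textId")["score"].apply(lambda s: required.issubset(s))

# Map the per-group answer back onto the rows and use it as a boolean mask.
mask = df["textId"].map(has_all).astype(bool)
result = df[mask]
print(result)
```

Keeping has_all as a separate series makes it easy to inspect which textId values passed before slicing.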