I have a pd.Series with duplicated indices, and each index containing a set of booleans:
FA155 False
FA155 False
FA155 False
FA155 True
FA155 True
FA155 True
FA155 True
FA155 True
FA155 False
What I'm trying to do for each different index in an efficient way, is to keep only as True the first and last True values of the sequence, and set the rest to False. There can also be False values between those that are True.
So for this sample the result would be:
FA155 False
FA155 False
FA155 False
FA155 True
FA155 False
FA155 False
FA155 False
FA155 True
FA155 False
Any help would be very appreciated.
groupby. nth() function is used to get the value corresponding the nth row for each group. To get the first value in a group, pass 0 as an argument to the nth() function.
Sort Values in Descending Order with Groupby You can sort values in descending order by using ascending=False param to sort_values() method. The head() function is used to get the first n rows. It is useful for quickly testing if your object has the right type of data in it.
The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.
agg is an alias for aggregate . Use the alias. A passed user-defined-function will be passed a Series for evaluation. The aggregation is for each column.
You can use loc
with idxmax
with both your original df
and your inverted df
.
This will yield the index of your first and last True
values. Just set the different indexes to False
afterwards.
For example:
z = sio("""i v
FA154 False
FA155 False
FA155 True
FA155 True
FA155 True
FA155 True
FA155 True
FA155 False
FA156 False
FA156 True
FA156 False
FA156 False
FA156 True""")
df = pd.read_table(z, delim_whitespace=True)
i v
0 FA154 False
1 FA155 False
2 FA155 True
3 FA155 True
4 FA155 True
5 FA155 True
6 FA155 True
7 FA155 False
8 FA156 False
9 FA156 True
10 FA156 False
11 FA156 False
12 FA156 True
idxmax()
Which is the same thing as getting your df
and using reset_index
. Then, get list of indexes for you first (v1
) and last (v2
) True
values:
v1 = df.groupby("i").v.idxmax().values
v2 = df[::-1].groupby("i").v.idxmax().values
And use your logic:
df.loc[v1, "v"] = True & df.loc[v1, "v"]
df.loc[v2, "v"] = True & df.loc[v2, "v"]
df.loc[~df.index.isin(np.concatenate([v1,v2])), "v"] = False
The idea behind using &
is not to accidentally set any False
values to True
.
Result:
>>> df.set_index("i")
v
i
FA154 False
FA155 False
FA155 True
FA155 False
FA155 False
FA155 False
FA155 True
FA155 False
FA156 False
FA156 True
FA156 False
FA156 False
FA156 True
You filter True values and then you aggregate to find the first and last values. Then you can use loc to replace those values in df.
df
is your dataframe. col
is the name of your column with True
and False
values
df["nb"] = range(df.shape[0])
df.reset_index(inplace=True)
elem = (df[df[col]==True].groupby("index")["nb"].agg({ "first_True": 'first', "last_True":"last"})).values
indexes_to_False = sum(elem.tolist(), [])
df.loc[indexes_to_False, col] = False
Then you can drop the column nb
and reindex if you wish
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With