Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Groupby search first and last True values

Tags:

python

pandas

I have a pd.Series with duplicated indices, and each index containing a set of booleans:

FA155    False
FA155    False
FA155    False
FA155    True
FA155    True
FA155    True
FA155    True
FA155    True
FA155    False

What I'm trying to do for each different index in an efficient way, is to keep only as True the first and last True values of the sequence, and set the rest to False. There can also be False values between those that are True.

So for this sample the result would be:

FA155    False
FA155    False
FA155    False
FA155    True
FA155    False
FA155    False
FA155    False
FA155    True
FA155    False

Any help would be very appreciated.

like image 333
yatu Avatar asked May 28 '18 18:05

yatu


People also ask

How do you find the first value of Groupby?

groupby. nth() function is used to get the value corresponding the nth row for each group. To get the first value in a group, pass 0 as an argument to the nth() function.

How do you sort values after Groupby?

Sort Values in Descending Order with Groupby You can sort values in descending order by using ascending=False param to sort_values() method. The head() function is used to get the first n rows. It is useful for quickly testing if your object has the right type of data in it.

Does Groupby preserve index?

The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.

What is AGG in Groupby?

agg is an alias for aggregate . Use the alias. A passed user-defined-function will be passed a Series for evaluation. The aggregation is for each column.


2 Answers

You can use loc with idxmax with both your original df and your inverted df.

This will yield the index of your first and last True values. Just set the different indexes to False afterwards.

For example:

Setup

z = sio("""i    v
FA154    False
FA155    False
FA155    True
FA155    True
FA155    True
FA155    True
FA155    True
FA155    False
FA156    False
FA156    True
FA156    False
FA156    False
FA156    True""")

df = pd.read_table(z, delim_whitespace=True)

    i       v
0   FA154   False
1   FA155   False
2   FA155   True
3   FA155   True
4   FA155   True
5   FA155   True
6   FA155   True
7   FA155   False
8   FA156   False
9   FA156   True
10  FA156   False
11  FA156   False
12  FA156   True

idxmax()

Which is the same thing as getting your df and using reset_index. Then, get list of indexes for you first (v1) and last (v2) True values:

v1 = df.groupby("i").v.idxmax().values
v2 = df[::-1].groupby("i").v.idxmax().values

And use your logic:

df.loc[v1, "v"] = True & df.loc[v1, "v"]
df.loc[v2, "v"] = True & df.loc[v2, "v"]
df.loc[~df.index.isin(np.concatenate([v1,v2])), "v"] = False

The idea behind using & is not to accidentally set any False values to True.

Result:

>>> df.set_index("i")

        v
i   
FA154   False
FA155   False
FA155   True
FA155   False
FA155   False
FA155   False
FA155   True
FA155   False
FA156   False
FA156   True
FA156   False
FA156   False
FA156   True
like image 102
rafaelc Avatar answered Oct 23 '22 16:10

rafaelc


You filter True values and then you aggregate to find the first and last values. Then you can use loc to replace those values in df. df is your dataframe. col is the name of your column with True and False values

df["nb"] = range(df.shape[0])
df.reset_index(inplace=True)

elem = (df[df[col]==True].groupby("index")["nb"].agg({ "first_True": 'first', "last_True":"last"})).values

indexes_to_False = sum(elem.tolist(), [])

df.loc[indexes_to_False, col] = False

Then you can drop the column nb and reindex if you wish

like image 1
Mohamed AL ANI Avatar answered Oct 23 '22 16:10

Mohamed AL ANI