Groupby search first and last True values

Tags:

pandas

I have a pd.Series with duplicated index labels, where each label carries a sequence of booleans:

FA155    False
FA155    False
FA155    False
FA155    True
FA155    True
FA155    True
FA155    True
FA155    True
FA155    False

What I'm trying to do, efficiently and for each distinct index label, is to keep only the first and last True values of the sequence as True and set everything else to False. There can also be False values between the True ones.

So for this sample the result would be:

FA155    False
FA155    False
FA155    False
FA155    True
FA155    False
FA155    False
FA155    False
FA155    True
FA155    False

Any help would be very appreciated.

asked May 28 '18 18:05

yatu

2 Answers

You can use loc together with idxmax, applied to both your original df and your reversed df.

This yields the index of the first and last True values in each group. Afterwards, set every other row to False.

For example:

Setup

import numpy as np
import pandas as pd
from io import StringIO

z = StringIO("""i    v
FA154    False
FA155    False
FA155    True
FA155    True
FA155    True
FA155    True
FA155    True
FA155    False
FA156    False
FA156    True
FA156    False
FA156    False
FA156    True""")

# sep=r"\s+" splits on any whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(z, sep=r"\s+")

    i       v
0   FA154   False
1   FA155   False
2   FA155   True
3   FA155   True
4   FA155   True
5   FA155   True
6   FA155   True
7   FA155   False
8   FA156   False
9   FA156   True
10  FA156   False
11  FA156   False
12  FA156   True

`idxmax()`

The frame above is the same thing you get by taking your Series and calling reset_index. Next, get the row indexes of the first (v1) and last (v2) True values in each group:

v1 = df.groupby("i").v.idxmax().values
v2 = df[::-1].groupby("i").v.idxmax().values

And use your logic:

df.loc[v1, "v"] = True & df.loc[v1, "v"]
df.loc[v2, "v"] = True & df.loc[v2, "v"]
df.loc[~df.index.isin(np.concatenate([v1,v2])), "v"] = False

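The idea behind using & is to avoid accidentally setting any False values to True. For a group with no True at all, idxmax returns that group's first row, so assigning True outright would flip it.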
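To make the reason for the guard concrete, here is a minimal sketch on a made-up three-row frame (the data is hypothetical, chosen only so that one group has no True value). It shows that idxmax still returns a row for such a group, and that the value at that row is False:

```python
import pandas as pd

# Hypothetical toy data: FA154 has no True value at all.
df = pd.DataFrame({"i": ["FA154", "FA155", "FA155"], "v": [False, False, True]})

# Row index of the first maximum per group; an all-False group
# falls back to its first row because every value ties at False.
v1 = df.groupby("i")["v"].idxmax().to_numpy()

# Values found at those rows. The FA154 entry is False, so
# `True & df.loc[v1, "v"]` keeps it False instead of flipping it.
at_v1 = df.loc[v1, "v"].tolist()
print(v1.tolist(), at_v1)
```

Writing `df.loc[v1, "v"] = True` here would wrongly mark row 0 as True, which is the case the & expression protects against.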

Result:

>>> df.set_index("i")

        v
i   
FA154   False
FA155   False
FA155   True
FA155   False
FA155   False
FA155   False
FA155   True
FA155   False
FA156   False
FA156   True
FA156   False
FA156   False
FA156   True

answered Oct 23 '22 16:10

rafaelc

Filter the True values, then aggregate to find the first and last position within each group. Then use loc to keep only those positions as True. Here df is your dataframe and col is the name of the column holding the True and False values.

df["nb"] = range(df.shape[0])
df.reset_index(inplace=True)

# Positions of the first and last True in each group
elem = df[df[col]].groupby("index")["nb"].agg(first_True="first", last_True="last").values

indexes_to_keep = sum(elem.tolist(), [])

# Clear everything, then restore only the first and last True of each group
df[col] = False
df.loc[indexes_to_keep, col] = True

Then you can drop the column nb and set the index back if you wish.
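For reference, a self-contained sketch of this filter-then-aggregate idea, run on the sample Series from the question. The column name "flag" and the variable names are assumptions made for illustration, and "index" is simply the column name reset_index gives an unnamed index:

```python
import pandas as pd

# Sample from the question: one label repeated, with a run of True values.
s = pd.Series(
    [False, False, False, True, True, True, True, True, False],
    index=["FA155"] * 9,
)

df = s.to_frame("flag")        # "flag" is a hypothetical column name
df["nb"] = range(len(df))      # positional counter for every row
df = df.reset_index()          # the unnamed index becomes a column called "index"

# Keep only the True rows, then take the first and last position per label.
bounds = df[df["flag"]].groupby("index")["nb"].agg(["first", "last"])
keep = bounds.to_numpy().ravel()

# Clear the column, then restore True at the boundary positions only.
df["flag"] = False
df.loc[keep, "flag"] = True

result = df.set_index("index")["flag"]
print(result.tolist())
```

Labels that contain no True value never appear in bounds, so they simply stay all False.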

answered Oct 23 '22 16:10

Mohamed AL ANI
