Pandas: Enumerate duplicates in index

Tags: python, python-3.x, pandas

Let's say I have a list of events that happen on different keys.

import pandas

data = [
    {"key": "A", "event": "created"},
    {"key": "A", "event": "updated"},
    {"key": "A", "event": "updated"},
    {"key": "A", "event": "updated"},
    {"key": "B", "event": "created"},
    {"key": "B", "event": "updated"},
    {"key": "B", "event": "updated"},
    {"key": "C", "event": "created"},
    {"key": "C", "event": "updated"},
    {"key": "C", "event": "updated"},
    {"key": "C", "event": "updated"},
    {"key": "C", "event": "updated"},
    {"key": "C", "event": "updated"},
]

df = pandas.DataFrame(data)

I would like to index my DataFrame first on the key and then on an enumeration of the rows within each key. It looks like a simple unstack operation, but I can't figure out how to do it properly.

The best I could do was

df.set_index("key", append=True).swaplevel(0, 1)

          event
key            
A   0   created
    1   updated
    2   updated
    3   updated
B   4   created
    5   updated
    6   updated
C   7   created
    8   updated
    9   updated
    10  updated
    11  updated
    12  updated

but what I'm expecting is

          event
key            
A   0   created
    1   updated
    2   updated
    3   updated
B   0   created
    1   updated
    2   updated
C   0   created
    1   updated
    2   updated
    3   updated
    4   updated
    5   updated

I also tried something like

df.groupby("key")["key"].count().apply(range).apply(pandas.Series).stack()

but the order is not preserved, so I can't use the result as an index. Besides, it feels like overkill for an operation that looks quite standard...
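A small sketch of the ordering issue, assuming keys that are not already sorted (the three-row frame is made up for illustration): `groupby` sorts group keys by default (`sort=True`), so anything built from `count()` follows the sorted key order rather than the row order of `df`.

```python
import pandas

# Keys deliberately out of alphabetical order (illustrative data)
df = pandas.DataFrame(
    {"key": ["B", "B", "A"], "event": ["created", "updated", "created"]}
)

# The counts come back indexed A, B (sorted), not B, A (row order)
counts = df.groupby("key")["key"].count()
print(counts.index.tolist())  # ['A', 'B']
print(df["key"].tolist())     # ['B', 'B', 'A']
```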

Any idea?

asked Nov 15 '18 21:11

Cilyan

1 Answer

`groupby` + `cumcount`

Here are a couple of ways:

# new version, thanks to @ScottBoston
res = (df.set_index(['key', df.groupby('key').cumcount()])
         .rename_axis(['key', 'count']))

# original version (same result; df itself is left unchanged)
res = (df.assign(count=df.groupby('key').cumcount())
         .set_index(['key', 'count']))

print(res)

             event
key count         
A   0      created
    1      updated
    2      updated
    3      updated
B   0      created
    1      updated
    2      updated
C   0      created
    1      updated
    2      updated
    3      updated
    4      updated
    5      updated
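A small self-contained check of the answer, using a made-up frame where the keys are interleaved (illustrative data, not the question's): `cumcount` numbers the rows of each group 0, 1, 2, ... in their original order, so no prior sorting is needed, and both versions above build the same `(key, count)` index.

```python
import pandas

# Keys interleaved on purpose (illustrative data, not from the question)
df = pandas.DataFrame({
    "key": ["A", "B", "A", "B", "A"],
    "event": ["created", "created", "updated", "updated", "updated"],
})

# cumcount returns a Series aligned with df's index, counting within each key
counter = df.groupby("key").cumcount()
print(counter.tolist())  # [0, 0, 1, 1, 2]

# Both versions from the answer produce the same MultiIndex values and names
new = df.set_index(["key", counter]).rename_axis(["key", "count"])
old = df.assign(count=counter).set_index(["key", "count"])
print(new.index.equals(old.index))  # True
print(new.index.get_level_values("count").tolist())  # [0, 0, 1, 1, 2]
```

The rows keep their original order here; calling `new.sort_index()` afterwards groups each key's rows together, as in the printed output above.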

answered Oct 04 '22 14:10

jpp
