enumerate groups in a dataframe

Tags:

python

pandas

pandas-groupby

I have the following table:

date        ui  mw  maxw  tC  HL    msurp
01/03/2004  A   10  10    eC  0.25   0.1
01/04/2004  A   10  10    eC  0.25  -0.1
01/03/2004  B   20  20    bC  0.5    0.3
01/03/2004  B   20  20    bC  0.25   0.3

What I want to do is add a column to this table that enumerates the unique combinations of ui, mw, maxw, tC and HL.

For example, in the above table the unique combinations of ui, mw, maxw, tC and HL are:

 A, 10, 10, eC, 0.25
 B, 20, 20, bC, 0.5
 B, 20, 20, bC, 0.25

There are 3 in total, so the output should look something like this:

date        ui  mw  maxw  tC  HL    msurp  counter
01/03/2004  A   10  10    eC  0.25   0.1   1
01/04/2004  A   10  10    eC  0.25  -0.1   1
01/03/2004  B   20  20    bC  0.5    0.3   2
01/03/2004  B   20  20    bC  0.25   0.3   3
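To try the answers below, the sample table can be rebuilt as a DataFrame. This is a sketch under a few assumptions: the `date` column is kept as plain strings, and the names `df` and `key_cols` are chosen here for illustration. Dropping duplicate rows of the key columns confirms that there are 3 combinations to enumerate.

```python
import pandas as pd

# Sample data from the question; dates are kept as plain strings (an assumption).
df = pd.DataFrame({
    'date': ['01/03/2004', '01/04/2004', '01/03/2004', '01/03/2004'],
    'ui': ['A', 'A', 'B', 'B'],
    'mw': [10, 10, 20, 20],
    'maxw': [10, 10, 20, 20],
    'tC': ['eC', 'eC', 'bC', 'bC'],
    'HL': [0.25, 0.25, 0.5, 0.25],
    'msurp': [0.1, -0.1, 0.3, 0.3],
})

# Hypothetical name for the columns that define a combination.
key_cols = ['ui', 'mw', 'maxw', 'tC', 'HL']

# Each distinct row of the key columns is one combination; row 1 repeats row 0.
combos = df[key_cols].drop_duplicates()
print(combos)
print(len(combos))  # 3
```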


asked Aug 18 '17 17:08

qfd

1 Answer

Option 1
pd.Series.factorize

df.assign(
    counter=df[['ui', 'mw', 'maxw', 'tC', 'HL']].apply(tuple, axis=1).factorize()[0] + 1)

         date ui  mw  maxw  tC    HL  msurp  counter
0  01/03/2004  A  10    10  eC  0.25    0.1        1
1  01/04/2004  A  10    10  eC  0.25   -0.1        1
2  01/03/2004  B  20    20  bC  0.50    0.3        2
3  01/03/2004  B  20    20  bC  0.25    0.3        3
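`factorize` hands out integer codes in order of first appearance, which is why the counter reads 1, 1, 2, 3. A minimal sketch on a hand-written Series of row tuples (the tuple values are copied from the question's table; the variable names are illustrative):

```python
import pandas as pd

# A Series of row tuples, like the one df[cols].apply(tuple, axis=1) produces.
keys = pd.Series([
    ('A', 10, 10, 'eC', 0.25),
    ('A', 10, 10, 'eC', 0.25),
    ('B', 20, 20, 'bC', 0.5),
    ('B', 20, 20, 'bC', 0.25),
])

# codes[i] is the position of row i's tuple among the distinct tuples, in appearance order.
codes, uniques = keys.factorize()
print(codes + 1)     # [1 1 2 3]
print(len(uniques))  # 3
```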

Option 1.5
A more obnoxious version of Option 1, but it should be faster because it builds the tuples with zip instead of a row-wise apply.

df.assign(
    counter=pd.factorize(list(zip(
        *[df[c].values.tolist() for c in ['ui', 'mw', 'maxw', 'tC', 'HL']]
    )))[0] + 1
)

         date ui  mw  maxw  tC    HL  msurp  counter
0  01/03/2004  A  10    10  eC  0.25    0.1        1
1  01/04/2004  A  10    10  eC  0.25   -0.1        1
2  01/03/2004  B  20    20  bC  0.50    0.3        2
3  01/03/2004  B  20    20  bC  0.25    0.3        3
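A sketch that checks Options 1 and 1.5 agree on the sample table. The speed difference most likely comes from `apply(tuple, axis=1)` building a Series for every row, whereas `zip` builds the tuples directly from plain Python lists. One deliberate deviation from the code above: the zipped tuples are wrapped in a `pd.Series` before factorizing, so `factorize` receives a pandas object rather than a plain list (a defensive choice, since recent pandas versions may warn about plain-list input).

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['01/03/2004', '01/04/2004', '01/03/2004', '01/03/2004'],
    'ui': ['A', 'A', 'B', 'B'],
    'mw': [10, 10, 20, 20],
    'maxw': [10, 10, 20, 20],
    'tC': ['eC', 'eC', 'bC', 'bC'],
    'HL': [0.25, 0.25, 0.5, 0.25],
    'msurp': [0.1, -0.1, 0.3, 0.3],
})
key_cols = ['ui', 'mw', 'maxw', 'tC', 'HL']

# Option 1: row-wise apply turns each row into a Series, then into a tuple.
opt1 = df[key_cols].apply(tuple, axis=1).factorize()[0] + 1

# Option 1.5: zip the column lists so the tuples are built in plain Python.
tuples = list(zip(*[df[c].values.tolist() for c in key_cols]))
opt15 = pd.factorize(pd.Series(tuples))[0] + 1

print(opt1, opt15)  # [1 1 2 3] [1 1 2 3]
```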

Option 2
@ayhan's answer (will delete if he posts it)

df.assign(
    counter=df.groupby(['ui', 'mw', 'maxw', 'tC', 'HL']).ngroup() + 1)

         date ui  mw  maxw  tC    HL  msurp  counter
0  01/03/2004  A  10    10  eC  0.25    0.1        1
1  01/04/2004  A  10    10  eC  0.25   -0.1        1
2  01/03/2004  B  20    20  bC  0.50    0.3        3
3  01/03/2004  B  20    20  bC  0.25    0.3        2
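`ngroup` numbers groups in the order the groupby iterates over them, and `groupby` sorts its keys by default, so the (B, 20, 20, bC, 0.25) group comes before (B, 20, 20, bC, 0.5) and rows 2 and 3 get 3 and 2. A sketch showing that passing `sort=False` switches to first-appearance numbering, matching Options 1 and 1.5 (the sample frame and `key_cols` name are the same illustrative assumptions as elsewhere):

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['01/03/2004', '01/04/2004', '01/03/2004', '01/03/2004'],
    'ui': ['A', 'A', 'B', 'B'],
    'mw': [10, 10, 20, 20],
    'maxw': [10, 10, 20, 20],
    'tC': ['eC', 'eC', 'bC', 'bC'],
    'HL': [0.25, 0.25, 0.5, 0.25],
    'msurp': [0.1, -0.1, 0.3, 0.3],
})
key_cols = ['ui', 'mw', 'maxw', 'tC', 'HL']

# Default sort=True: groups are numbered in sorted key order.
sorted_counter = df.groupby(key_cols).ngroup() + 1

# sort=False: groups are numbered in order of first appearance, like factorize.
appearance_counter = df.groupby(key_cols, sort=False).ngroup() + 1

print(sorted_counter.tolist())      # [1, 1, 3, 2]
print(appearance_counter.tolist())  # [1, 1, 2, 3]
```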

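Timing
Each row is divided by its fastest time, so 1.0 marks the winner and the other values are slowdown factors. The index is the number of copies of the sample table stacked together (code below).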

(lambda r: r.div(r.min(1), 0).assign(best=lambda x: x.idxmin(1)))(results)

             pir1      pir2     ayhan   best
100     17.260639  1.000000  3.438354   pir2
300     30.550010  1.000000  2.598456   pir2
1000    43.201163  1.000000  1.236190   pir2
3000    61.593932  1.000000  1.025420   pir2
10000  127.003138  2.177171  1.000000  ayhan
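The normalization one-liner can be unpacked into two explicit steps. The timings below are hypothetical, chosen only to illustrate the arithmetic; they are not measurements.

```python
import pandas as pd

# Hypothetical timings in seconds, purely to illustrate the normalization step.
results = pd.DataFrame(
    {'pir1': [4.0, 9.0], 'pir2': [1.0, 6.0], 'ayhan': [2.0, 3.0]},
    index=[100, 10000],
)

# Divide every row by its own minimum, so the fastest method in each row shows 1.0.
normalized = results.div(results.min(axis=1), axis=0)

# Record which column holds that minimum.
normalized['best'] = normalized.idxmin(axis=1)
print(normalized)
```

This is equivalent to `(lambda r: r.div(r.min(1), 0).assign(best=lambda x: x.idxmin(1)))(results)`, written with keyword `axis` arguments for readability.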

[Figure: log-log plot of the timing results]

import pandas as pd
from timeit import timeit

# `df` is the sample table from the question.
pir1 = lambda d: d.assign(counter=d[['ui', 'mw', 'maxw', 'tC', 'HL']].apply(tuple, axis=1).factorize()[0] + 1)
pir2 = lambda d: d.assign(counter=pd.factorize(list(zip(*[d[c].values.tolist() for c in ['ui', 'mw', 'maxw', 'tC', 'HL']])))[0] + 1)
ayhan = lambda d: d.assign(counter=d.groupby(['ui', 'mw', 'maxw', 'tC', 'HL']).ngroup() + 1)

results = pd.DataFrame(
    index=[100, 300, 1000, 3000, 10000],
    columns='pir1 pir2 ayhan'.split(),
    dtype=float
)

for i in results.index:
    # Stack i copies of the sample table to get a larger input.
    d = pd.concat([df] * i, ignore_index=True)
    for j in results.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        # DataFrame.set_value was removed in pandas 1.0; .at is its replacement.
        results.at[i, j] = timeit(stmt, setp, number=10)

results.plot(loglog=True)


answered Oct 13 '22 14:10

piRSquared
