pandas how add increasing number to check column for every subdataframe (by category 3 column)

Question

I have a huge dataframe with many columns and rows. I sort it by 3 columns (['color', 'category','level_1']) and i have one column for check ('check_control') My problem is: how can add a increasing number for any subdataframe for (['color', 'category','level_1']) ?

import pandas as pd
data = {'Name_1': ['dshgfj', 'krihgfjsh', 'dfgfd', 'dff','gcgt','hjgjh','et','asd','hhj','hg','jh','jhk'],
        'Name_2': ['dsf', 'dsfs', 'bilsdfl', 'sfsd', 'tcbcvbo','hghjg','bvnbhj','hga','kjk','kjh','hjjh','jkh'],
        'color': ['black', 'red', 'white', 'yellow', 'blue','green','orange','black','green','black','green','black'],
       'category': ['a', 'b', 'a', 'b', 'b','c','c','a','c','a','c','a'],
        'level_1': [1, 1, 1, 1, 1,1,1,2,2,3,3,4],
    'check_control':['','check','','check','','check','check','','','check','check','check']
}      
df = pd.DataFrame(data)
print(df)
df_fix  = df.sort_values(['color', 'category','level_1'], ascending=[True, True,True])
print(df_fix)

       Name_1   Name_2   color category  level_1 check_control
0      dshgfj      dsf   black        a        1
1   krihgfjsh     dsfs     red        b        1         check
2       dfgfd  bilsdfl   white        a        1
3         dff     sfsd  yellow        b        1         check
4        gcgt  tcbcvbo    blue        b        1
5       hjgjh    hghjg   green        c        1         check
6          et   bvnbhj  orange        c        1         check
7         asd      hga   black        a        2
8         hhj      kjk   green        c        2
9          hg      kjh   black        a        3         check
10         jh     hjjh   green        c        3         check
11        jhk      jkh   black        a        4         check

       Name_1   Name_2   color category  level_1 check_control
0      dshgfj      dsf   black        a        1
7         asd      hga   black        a        2
9          hg      kjh   black        a        3         check
11        jhk      jkh   black        a        4         check
4        gcgt  tcbcvbo    blue        b        1
5       hjgjh    hghjg   green        c        1         check
8         hhj      kjk   green        c        2
10         jh     hjjh   green        c        3         check
6          et   bvnbhj  orange        c        1         check
1   krihgfjsh     dsfs     red        b        1         check
2       dfgfd  bilsdfl   white        a        1
3         dff     sfsd  yellow        b        1         check

my solution must be like

data_solution = {'Name_1': ['dshgfj', 'krihgfjsh', 'dfgfd', 'dff','gcgt','hjgjh','et','asd','hhj','hg','jh','jhk'],
        'Name_2': ['dsf', 'dsfs', 'bilsdfl', 'sfsd', 'tcbcvbo','hghjg','bvnbhj','hga','kjk','kjh','hjjh','jkh'],
        'color': ['black', 'red', 'white', 'yellow', 'blue','green','orange','black','green','black','green','black'],
       'category': ['a', 'b', 'a', 'b', 'b','c','c','a','c','a','c','a'],
        'level_1': [1, 1, 1, 1, 1,1,1,2,2,3,3,4],
    'check_control':['','1_check','','1_check','','1_check','1_check','','','1_check','2_check','2_check']
}
df_solution = pd.DataFrame(data_solution)
df_fix_solution  = df_solution.sort_values(['color', 'category','level_1'], ascending=[True, True,True])

print(df_fix_solution)

       Name_1   Name_2   color category  level_1 check_control
0      dshgfj      dsf   black        a        1
7         asd      hga   black        a        2
9          hg      kjh   black        a        3       1_check
11        jhk      jkh   black        a        4       2_check
4        gcgt  tcbcvbo    blue        b        1
5       hjgjh    hghjg   green        c        1       1_check
8         hhj      kjk   green        c        2
10         jh     hjjh   green        c        3       2_check
6          et   bvnbhj  orange        c        1       1_check
1   krihgfjsh     dsfs     red        b        1       1_check
2       dfgfd  bilsdfl   white        a        1
3         dff     sfsd  yellow        b        1       1_check

Corralien · Accepted Answer

IIUC, keep only rows with check_control=='check' then use groupby_cumcount to add the expected identifier:

df_fix['check_control'] = (df_fix[df_fix['check_control'] == 'check']
                              .groupby(['color', 'category']).cumcount().add(1)
                              .astype(str) + '_' + df_fix['check_control']).fillna('')

Output:

>>> df_fix
       Name_1   Name_2   color category  level_1 check_control
0      dshgfj      dsf   black        a        1              
7         asd      hga   black        a        2              
9          hg      kjh   black        a        3       1_check
11        jhk      jkh   black        a        4       2_check
4        gcgt  tcbcvbo    blue        b        1              
5       hjgjh    hghjg   green        c        1       1_check
8         hhj      kjk   green        c        2              
10         jh     hjjh   green        c        3       2_check
6          et   bvnbhj  orange        c        1       1_check
1   krihgfjsh     dsfs     red        b        1       1_check
2       dfgfd  bilsdfl   white        a        1              
3         dff     sfsd  yellow        b        1       1_check

pandas how add increasing number to check column for every subdataframe (by category 3 column)

Tags:

pandas

dimzev

1 Answers

Corralien

Recent Activity

Donate For Us

pandas how add increasing number to check column for every subdataframe (by category 3 column)

Tags:

pandas

dimzev

1 Answers

Corralien

Related questions

Recent Activity

Donate For Us