Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sum rows in the same column than the category in pandas dataframe - python

I have been working on formatting a log file and finally I have arrived to the following dataframe sample, where the categories and numbers I want to add are in the same column:

df = pd.DataFrame(dict(a=['Cat. A',1,1,3,'Cat. A',2,2,'Cat. B',3,5,2,6,'Cat. B',1,'Cat. C',4]))
>>> a
0   Cat. A
1   1
2   1
3   3
4   Cat. A
5   2
6   2
7   Cat. B
8   3
9   5
10  2
11  6
12  Cat. B
13  1
14  Cat. C
15  4

If I sum all the numbers below each category I want to obtain:

Cat. A= 1+1+3+2+2 = 9
Cat. B= 3+5+2+6+1 = 17
Cat. C= 4

I know how to do with going through all the files in the classic way, but I would like to know how to do it in a most pythonic way, considering that the numbers of rows for each category can be variable, and that the numbers of times that the category appears in each dataframe can be different too.

like image 755
agm Avatar asked Dec 16 '20 12:12

agm


2 Answers

This is also another way

df = pd.DataFrame(dict(a=['Cat. A',1,1,3,'Cat. A',2,2,'Cat. B',3,5,2,6,'Cat. B',1,'Cat. C',4]))

def coerce(x):
    try:
        int(x)
        return np.nan
    except:
        return x

def safesum(x):
    return x[x!=x.iloc[0]].astype(int).sum()


df['b'] = df['a'].apply(coerce).ffill()
df.groupby('b').agg(safesum)

Produces

         a
b         
Cat. A   9
Cat. B  17
Cat. C   4
like image 141
mccandar Avatar answered Oct 01 '22 23:10

mccandar


We can use pd.to_numeric to mark non-numeric fields as nan using Series.mask and Series.notna then use for group. Then use GroupBy.sum

a = pd.to_numeric(df['a'], errors='coerce')
g = df['a'].mask(a.notna()).ffill()
a.groupby(g).sum()

Cat. A     9.0
Cat. B    17.0
Cat. C     4.0
Name: a, dtype: float64
like image 28
Ch3steR Avatar answered Oct 01 '22 22:10

Ch3steR