Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding Column Headers to new pandas dataframe

Tags:

python

pandas

I am creating a new pandas dataframe from a previous dataframe using the .groupby and .size methods.

[in] results = df.groupby(["X", "Y", "Z", "F"]).size()

[out]
    9   27/02/2016  1   N   326
    9   27/02/2016  1   S   332
    9   27/02/2016  2   N   280
    9   27/02/2016  2   S   353
    9   27/02/2016  3   N   177

This behaves as expected, however the result is a dataframe with no column headers.

This SO question states that the following adds column names to the generated dataframe

[in] results.columns = ["X","Y","Z","F","Count"]

However, this does not seem to have any impact at all.

[out]
        9   27/02/2016  1   N   326
        9   27/02/2016  1   S   332
        9   27/02/2016  2   N   280
        9   27/02/2016  2   S   353
        9   27/02/2016  3   N   177
like image 881
LearningSlowly Avatar asked Feb 08 '23 02:02

LearningSlowly


2 Answers

What you're seeing are your grouped columns as the index, if you call reset_index then it restores the column names

so

results = df.groupby(["X", "Y", "Z", "F"]).size()
results.reset_index()

should work

In [11]:
df.groupby(["X","Y","Z","F"]).size()

Out[11]:
X  Y           Z  F
9  27/02/2016  1  N    1
                  S    1
               2  N    1
                  S    1
               3  N    1
dtype: int64

In [12]:    
df.groupby(["X","Y","Z","F"]).size().reset_index()

Out[12]:
   X           Y  Z  F  0
0  9  27/02/2016  1  N  1
1  9  27/02/2016  1  S  1
2  9  27/02/2016  2  N  1
3  9  27/02/2016  2  S  1
4  9  27/02/2016  3  N  1

Additionally you can achieve what you want by using count:

In [13]:
df.groupby(["X","Y","Z","F"]).count().reset_index()

Out[13]:
   X           Y  Z  F  Count
0  9  27/02/2016  1  N      1
1  9  27/02/2016  1  S      1
2  9  27/02/2016  2  N      1
3  9  27/02/2016  2  S      1
4  9  27/02/2016  3  N      1

You could also pass param as_index=False here:

In [15]:
df.groupby(["X","Y","Z","F"], as_index=False).count()

Out[15]:
   X           Y  Z  F  Count
0  9  27/02/2016  1  N      1
1  9  27/02/2016  1  S      1
2  9  27/02/2016  2  N      1
3  9  27/02/2016  2  S      1
4  9  27/02/2016  3  N      1

This is normally fine but some aggregate functions will bork if you try to use aggregation methods on columns whose dtypes cannot be aggregated, for instance if you have str dtypes and you decide to call mean for instance.

like image 180
EdChum Avatar answered Feb 19 '23 10:02

EdChum


you can use as_index=False parameter for the .groupby() function:

results = df.groupby(["X", "Y", "Z", "F"], as_index=False).size().rename(columns={0:'Count'})
like image 35
MaxU - stop WAR against UA Avatar answered Feb 19 '23 12:02

MaxU - stop WAR against UA