
Get mean and mode of dataframe depending on each column type

Tags: python, pandas

Apologies if something similar has been asked before; I searched around but couldn't find a solution.

My dataset looks like this:

import pandas as pd

data1 = {'Group': ['Winner', 'Winner', 'Winner', 'Winner', 'Loser', 'Loser'],
         'Study': ['Read', 'Read', 'Notes', 'Cheat', 'Read', 'Read'],
         'Score': [1, .90, .80, .70, 1, .90]}
df1 = pd.DataFrame(data=data1)


The real dataframe spans dozens of rows and has a set of numeric columns and a set of string columns. I would like to condense it into one row, where each entry is the mean or the mode of its column: if the column is numeric, take the mean; otherwise, take the mode. In my actual use case the numeric and object columns appear in no particular order, so I hope to use a loop that checks each column and decides which action to take.

I tried this, but it didn't work; it seems to be taking the entire Series as the mode.

for i in df1:
    if df1[i].dtype == 'float64':
        df1[i] = df1[i].mean()
      

Any help is appreciated, thank you!

AxW asked Mar 23 '21 21:03




3 Answers

You can use describe with include='all', which computes statistics that depend on the dtype. It reports top (the mode) for object columns and mean for numeric columns. Then combine the two rows.

s = df1.describe(include='all')
s = s.loc['top'].combine_first(s.loc['mean'])

#Group      Winner
#Study        Read
#Score    0.883333
#Name: top, dtype: object
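
Since the question asks for a single row rather than a Series, the result above can be reshaped afterwards. The following is only a sketch, assuming the example data from the question; the names `stats` and `row` are illustrative and not part of the original answer.

```python
import pandas as pd

# Example data copied from the question
df1 = pd.DataFrame({
    "Group": ["Winner", "Winner", "Winner", "Winner", "Loser", "Loser"],
    "Study": ["Read", "Read", "Notes", "Cheat", "Read", "Read"],
    "Score": [1, .90, .80, .70, 1, .90],
})

# 'top' holds the mode of object columns, 'mean' the mean of numeric ones
stats = df1.describe(include="all")
s = stats.loc["top"].combine_first(stats.loc["mean"])

# Turn the Series into a one-row DataFrame with a plain 0 index
row = s.to_frame().T.reset_index(drop=True)
print(row)
```

Note that every column of `row` ends up with object dtype, because the values passed through a mixed-type Series.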
ALollz answered Nov 14 '22 21:11


np.number and select_dtypes

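import numpy as np
s = df1.select_dtypes(np.number).mean()
# Series.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df1.drop(s.index, axis=1).mode().iloc[0], s])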

Group      Winner
Study        Read
Score    0.883333
dtype: object

Variant

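g = df1.dtypes.map(lambda x: np.issubdtype(x, np.number))
# groupby(..., axis=1) is deprecated in recent pandas, so split the columns with the boolean mask instead
d = {k: df1.loc[:, g == k] for k in (True, False)}
pd.concat([d[False].mode().iloc[0], d[True].mean()])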

Group      Winner
Study        Read
Score    0.883333
dtype: object
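
The same split by dtype can probably also be written as one column-wise apply, which keeps the original column order without any concatenation. This is a sketch rather than part of the original answer; `mean_or_mode` is a hypothetical helper name, and the data is the example from the question.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Example data copied from the question
df1 = pd.DataFrame({
    "Group": ["Winner", "Winner", "Winner", "Winner", "Loser", "Loser"],
    "Study": ["Read", "Read", "Notes", "Cheat", "Read", "Read"],
    "Score": [1, .90, .80, .70, 1, .90],
})

def mean_or_mode(col):
    """Hypothetical helper: mean for numeric columns, first mode otherwise."""
    if is_numeric_dtype(col):
        return col.mean()
    return col.mode().iloc[0]

# apply calls the helper once per column and collects the scalars in a Series
summary = df1.apply(mean_or_mode)
print(summary)
```

One thing to keep in mind: `is_numeric_dtype` also treats boolean columns as numeric, so a bool column would be averaged rather than reduced to its mode.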
piRSquared answered Nov 14 '22 22:11


Here is a slight variation on your solution that gets the job done:

res = {}
for col_name, col_type in zip(df1.columns, df1.dtypes):
    if pd.api.types.is_numeric_dtype(col_type):
        res[col_name] = df1[col_name].mean()
    else:
        res[col_name] = df1[col_name].mode()[0]

pd.DataFrame(res, index=[0])

returns

    Group   Study   Score
0   Winner  Read    0.883333

Note that a Series can have multiple modes; this solution picks the first one.
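
To illustrate the tie case, here is a small sketch with made-up data (not taken from the question). `Series.mode()` returns all tied values in sorted order, so taking the first element selects the smallest of them.

```python
import pandas as pd

# Made-up example: "Read" and "Notes" both appear twice
study = pd.Series(["Read", "Notes", "Read", "Notes", "Cheat"])

modes = study.mode()      # all tied modes, sorted
print(modes.tolist())     # ['Notes', 'Read']

first = modes.iloc[0]     # the value the solution above would keep
print(first)              # Notes
```

If ties matter for your data, inspecting `len(modes)` before picking one is a simple way to detect them.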

piterbarg answered Nov 14 '22 22:11