Apologies if something similar has been asked before; I searched around but couldn't figure out a solution.
My dataset looks like this:
import pandas as pd

data1 = {'Group': ['Winner', 'Winner', 'Winner', 'Winner', 'Loser', 'Loser'],
         'Study': ['Read', 'Read', 'Notes', 'Cheat', 'Read', 'Read'],
         'Score': [1, .90, .80, .70, 1, .90]}
df1 = pd.DataFrame(data=data1)
This dataframe spans dozens of rows and has a set of numeric columns and a set of string columns. I would like to condense it into one row, where each entry is the mean or the mode of its column: if the column is numeric, take the mean; otherwise, take the mode. In my actual use case, the numeric and object columns appear in no particular order, so I hope to use a loop that checks each column and decides which action to take.
I tried this, but it didn't work; it seems to be taking the entire Series as the mode.
for i in df1:
    if df1[i].dtype == 'float64':
        df1[i] = df1[i].mean()
Any help is appreciated, thank you!
To calculate the mean of a whole column in a DataFrame, use pandas.Series.mean(); for several columns, select a list of DataFrame columns and call mean() on the result. You can also get the mean for all numeric columns using DataFrame.mean(numeric_only=True).
Finding the mode in a column, or the mode for all columns or rows in a DataFrame using pandas is easy. We can use the pandas mode() function to find the mode value of columns in a DataFrame. The pandas mode() function works for both numeric and object dtypes.
To calculate mean values grouped on another column in pandas, use groupby and then apply the mean() method, which calculates the average of the values passed to it.
df.mode() calculates the mode of each column of the DataFrame. The mode function takes axis=0 by default, so it computes a column-wise mode.
Often you may be interested in calculating the mean of one or more columns in a pandas DataFrame. You can do this with the mean() function.
To get the data types, use the dtype attribute (or DataFrame.dtypes for all columns at once) and the built-in type() function. Selecting a single column from a DataFrame returns a pandas Series, irrespective of the data type stored in that column.
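As a small illustration of the dtype checks described above, the sketch below rebuilds the question's df1 and inspects each column. Using pandas.api.types.is_numeric_dtype instead of comparing against 'float64' is a choice made here, so that integer columns would also count as numeric.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df1 = pd.DataFrame({
    "Group": ["Winner", "Winner", "Winner", "Winner", "Loser", "Loser"],
    "Study": ["Read", "Read", "Notes", "Cheat", "Read", "Read"],
    "Score": [1, .90, .80, .70, 1, .90],
})

# Selecting one column yields a Series, whatever values it stores.
column_type = type(df1["Score"])

# Map each column name to True (numeric) or False (anything else).
numeric_flags = {col: is_numeric_dtype(df1[col]) for col in df1.columns}

print(df1.dtypes)
print(column_type)
print(numeric_flags)
```

The numeric_flags dictionary is the per-column decision the question asks for: True means "take the mean", False means "take the mode".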
By default, DataFrame.mode() does not consider missing values. When a column has several tied modes, all of them are returned, so the result can have more than one row; columns with fewer modes are padded with NaN in the extra rows. With dropna=False, NaN values are considered and can themselves be the mode.
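The padding and dropna behaviour is easier to see on a tiny frame. The data below is hypothetical and only meant to trigger each case: column a has a two-way tie, b has a single mode, and c is mostly missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen to show ties, padding, and NaN handling.
df = pd.DataFrame({
    "a": [1, 1, 2, 2],                   # 1 and 2 tie
    "b": [5, 5, 5, 6],                   # single mode: 5
    "c": [np.nan, np.nan, np.nan, 7.0],  # NaN is the most frequent entry
})

# Default: NaN is ignored, and shorter columns are padded with NaN.
modes = df.mode()

# dropna=False: NaN is counted and can be reported as the mode.
modes_with_nan = df.mode(dropna=False)

print(modes)
print(modes_with_nan)
```

Because a has two modes, both results have two rows; taking .iloc[0] afterwards keeps only the first mode of each column.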
You can use describe with include='all', which calculates statistics depending upon the dtype. It determines the top (mode) for object columns and the mean for numeric columns. Then combine the two.
s = df1.describe(include='all')
s = s.loc['top'].combine_first(s.loc['mean'])
#Group Winner
#Study Read
#Score 0.883333
#Name: top, dtype: object
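Since the question asks for a single row rather than a Series, one possible follow-up is to turn the combined Series into a one-row DataFrame. This is a sketch that rebuilds the question's df1; the to_frame().T step is an addition here, not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Group": ["Winner", "Winner", "Winner", "Winner", "Loser", "Loser"],
    "Study": ["Read", "Read", "Notes", "Cheat", "Read", "Read"],
    "Score": [1, .90, .80, .70, 1, .90],
})

stats = df1.describe(include="all")

# 'top' is filled for non-numeric columns, 'mean' for numeric ones;
# combine_first takes 'top' where present and falls back to 'mean'.
s = stats.loc["top"].combine_first(stats.loc["mean"])

# Transpose the single-column frame to get one row with the original columns.
row = s.to_frame().T
print(row)
```

The resulting row has object dtype in every column, so numeric columns may need converting back (for example with pd.to_numeric) before further arithmetic.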
Alternatively, with np.number and select_dtypes:
s = df1.select_dtypes(np.number).mean()
# Series.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df1.drop(s.index, axis=1).mode().iloc[0], s])
Group Winner
Study Read
Score 0.883333
dtype: object
Variant
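# True for numeric columns, False for everything else
g = df1.dtypes.map(lambda x: np.issubdtype(x, np.number))
# Note: groupby(..., axis=1) is deprecated since pandas 2.1
d = {k: d for k, d in df1.groupby(g, axis=1)}
pd.concat([d[False].mode().iloc[0], d[True].mean()])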
Group Winner
Study Read
Score 0.883333
dtype: object
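A similar split can be sketched without grouping along the column axis, by calling select_dtypes twice. The frame below uses the question's data with the numeric column deliberately moved to the middle, and the final reindex is an addition here to restore the original column order, since the question says numeric and string columns are mixed.

```python
import pandas as pd

# Question's data, with the numeric column placed between the string columns.
df1 = pd.DataFrame({
    "Group": ["Winner", "Winner", "Winner", "Winner", "Loser", "Loser"],
    "Score": [1, .90, .80, .70, 1, .90],
    "Study": ["Read", "Read", "Notes", "Cheat", "Read", "Read"],
})

num = df1.select_dtypes(include="number").mean()          # mean of each numeric column
cat = df1.select_dtypes(exclude="number").mode().iloc[0]  # first mode of each other column

# concat lists the non-numeric results first; reindex puts the
# entries back in the order of the original columns.
summary = pd.concat([cat, num]).reindex(df1.columns)
print(summary)
```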
Here is a slight variation on your solution that gets the job done
res = {}
for col_name, col_type in zip(df1.columns, df1.dtypes):
    if pd.api.types.is_numeric_dtype(col_type):
        res[col_name] = df1[col_name].mean()
    else:
        res[col_name] = df1[col_name].mode()[0]
pd.DataFrame(res, index=[0])
returns
Group Study Score
0 Winner Read 0.883333
There could be multiple modes in a Series; this solution picks the first one.
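To see what "picks the first one" means in practice, here is a minimal sketch with a hypothetical Series in which two values tie. In the pandas versions I have used, the tied values come back sorted, but I would treat that ordering as an observation rather than a guarantee.

```python
import pandas as pd

# Hypothetical data: "Read" and "Notes" both appear twice.
study = pd.Series(["Read", "Notes", "Read", "Notes", "Cheat"])

all_modes = study.mode()        # Series holding every tied value
first_mode = all_modes.iloc[0]  # what mode()[0] in the loop above would keep

print(all_modes.tolist())
print(first_mode)
```

If a tie matters for your analysis, checking len(all_modes) > 1 before taking the first entry is one way to flag it.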