Hi, I want to get the counts of unique values in a dataframe. value_counts does this, but I want to use its output somewhere else. How can I convert the .value_counts output to a pandas dataframe? Here is some example code:
import pandas as pd
df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
print(value_counts)
print(type(value_counts))
output is:
2    3
1    2
Name: a, dtype: int64
<class 'pandas.core.series.Series'>
What I need is a dataframe like this:
unique_values  counts
2              3
1              2
Thank you.
If you want your counts as a dataframe, you can call .to_frame() after .value_counts(). If you also need to name the index column and rename the counts column, you can convert to a dataframe in a slightly different way.
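As a minimal sketch of the .to_frame() route, assuming the same example data as in the question: the variable name counts_frame and the column label 'counts' are my own choices, not something pandas requires.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})

# value_counts() returns a Series indexed by the unique values.
# to_frame(name=...) turns it into a one-column DataFrame and sets
# the column label explicitly (the label 'counts' is an arbitrary choice).
counts_frame = df['a'].value_counts().to_frame(name='counts')

print(type(counts_frame))
print(counts_frame)
```

The unique values stay in the index here. If you want them as a regular column, you still need a reset_index step.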
Use rename_axis to name the column created from the index, then reset_index:
counts_df = df['a'].value_counts().rename_axis('unique_values').reset_index(name='counts')
print(counts_df)
   unique_values  counts
0              2       3
1              1       2
Or, if you need a one-column DataFrame, use Series.to_frame:
counts_df = df['a'].value_counts().rename_axis('unique_values').to_frame('counts')
print(counts_df)
               counts
unique_values
2                   3
1                   2
I just ran into the same problem, so I'll share my thoughts here.
When you deal with pandas data structures, you have to be aware of the return type.
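A quick way to see which type each step returns is to check it directly. This is only an illustrative sketch using the question's data; the variable names are mine.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})

column = df['a']                  # single brackets select a Series
counts = column.value_counts()    # value_counts() on a Series returns a Series
flattened = counts.reset_index()  # reset_index() on a Series returns a DataFrame

for label, obj in [("df['a']", column),
                   ('value_counts()', counts),
                   ('reset_index()', flattened)]:
    print(label, '->', type(obj).__name__)
```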
As @jezrael mentioned, pandas does provide the pd.Series.to_frame API.
You can also wrap the pd.Series in a pd.DataFrame by just doing
df_value_counts = pd.DataFrame(value_counts)  # wrap pd.Series in pd.DataFrame
Then you have a pd.DataFrame with the column name 'a' (on pandas versions before 2.0; newer versions name the column 'count'), and the unique values become the index:
Input:
print(df_value_counts.index.values)
Output:
[2 1]
Input:
print(df_value_counts.columns)
Output:
Index(['a'], dtype='object')
What now?
If you want to set new column names on the pd.DataFrame, you can simply reset the index with reset_index() and then assign a list of names to df.columns:
df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts']
Then you have what you need:
Output:
   unique_values  counts
0              2       3
1              1       2
import pandas as pd
df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
# solution here
df_val_counts = pd.DataFrame(value_counts)
df_value_counts_reset = df_val_counts.reset_index()
df_value_counts_reset.columns = ['unique_values', 'counts']  # change column names
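If you need this conversion in more than one place, the steps can be wrapped in a small helper. This is just a sketch: value_counts_frame is a hypothetical name of my own, not a pandas function, and it combines rename_axis with reset_index(name=...) so both column labels are set explicitly.

```python
import pandas as pd

def value_counts_frame(series, value_col='unique_values', count_col='counts'):
    """Return the value counts of `series` as a two-column DataFrame.

    Hypothetical helper, not part of pandas.
    """
    counts = series.value_counts(dropna=True, sort=True)
    # rename_axis labels the index (the unique values);
    # reset_index(name=...) moves it into a column and labels the counts.
    return counts.rename_axis(value_col).reset_index(name=count_col)

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
result = value_counts_frame(df['a'])
print(result)
```

Because both labels are passed explicitly, the result should not depend on what name value_counts gives the Series by default.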