
How to create a dataframe column from values with frequency count?

Tags: python, pandas

Given a problem that lists values and their associated frequencies, how can the raw sample be created in a DataFrame?

Find the mean of this dataset
Value: 1 | 2 | 3
Freq:  3 | 4 | 2

This represents the sample [1, 1, 1, 2, 2, 2, 2, 3, 3].

I input this into Python:

>>> import pandas as pd
>>> df = pd.DataFrame({'value':[1, 2, 3], 'freq':[3, 4, 2]})
>>> df
   value  freq
0      1     3
1      2     4
2      3     2

It's not difficult to compute basic statistics in this format. For example, the mean of this dataset is (df['value'] * df['freq']).sum() / df['freq'].sum(). However, it would be nice to use built-in methods such as .mean(). To do this, I need to expand the value/freq data into raw values in the DataFrame. My end goal is this:

    data
0      1
1      1
2      1
3      2
4      2
5      2
6      2
7      3
8      3

Does anybody know how to take a dataset given in value/frequency form and create a DataFrame of the raw data? Thank you.

Farzad Saif asked Jan 24 '23 17:01


1 Answer

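An option is to use np.repeat, which repeats each value by its corresponding frequency:

import numpy as np
import pandas as pd

values = [1, 2, 3]
frequency = [3, 4, 2]

# Repeat each value by its frequency: [1, 1, 1, 2, 2, 2, 2, 3, 3]
df = pd.DataFrame(np.repeat(values, frequency), columns=['data'])

df.mean()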
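If the data is already in the value/freq DataFrame from the question, the same expansion can probably be done without leaving pandas. The following is only a minimal sketch, assuming the column names 'value' and 'freq' from the question; the names raw, raw_mean and weighted_mean are made up for illustration. It uses Series.repeat as a swapped-in alternative to np.repeat and compares the result with the weighted-mean formula from the question.

```python
import numpy as np
import pandas as pd

# Frequency table from the question: values 1, 2, 3 occur 3, 4, 2 times.
df = pd.DataFrame({'value': [1, 2, 3], 'freq': [3, 4, 2]})

# Repeat each value 'freq' times, drop the repeated index labels,
# and name the resulting column 'data' (illustrative names throughout).
raw = df['value'].repeat(df['freq']).reset_index(drop=True).to_frame('data')

# Mean of the expanded raw data, using the built-in method.
raw_mean = raw['data'].mean()

# Weighted mean computed directly from the frequency table, for comparison.
weighted_mean = np.average(df['value'], weights=df['freq'])

print(raw)
print(raw_mean, weighted_mean)
```

Both means should agree, so for a single statistic the weighted formula avoids building the expanded frame at all. Expanding is mainly convenient when several built-in methods (.median(), .std(), .describe()) are wanted on the raw sample.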

O Pardal answered Jan 30 '23 01:01