Given a problem set of values and their associated frequencies, how can the underlying sample be created in a DataFrame?
Find the mean of this dataset
Value: 1 | 2 | 3
Freq: 3 | 4 | 2
which represents the sample [1, 1, 1, 2, 2, 2, 2, 3, 3].
I input this into Python:
>>> import pandas as pd
>>> df = pd.DataFrame({'value': [1, 2, 3], 'freq': [3, 4, 2]})
>>> df
   value  freq
0      1     3
1      2     4
2      3     2
It's not difficult to solve basic statistics problems in this format. For example, the mean of this dataset is (df['value'] * df['freq']).sum() / df['freq'].sum(). However, it would be nice to use built-in methods such as .mean(). To do this, I need to turn the value/frequency data into raw values in the DataFrame. My end goal is this:
data
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 3
8 3
Does anybody know how to take a dataset given in value/frequency form and create a DataFrame of the raw data? Thank you.
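A side note on the weighted-mean formula in the question: if only the mean is needed, NumPy's np.average takes a weights argument, so the frequencies can be used directly without expanding the sample. The sketch below reuses the question's value/freq table and simply checks that both routes agree; it is an illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3], 'freq': [3, 4, 2]})

# Weighted mean straight from the frequency table: sum(value * freq) / sum(freq)
manual_mean = (df['value'] * df['freq']).sum() / df['freq'].sum()

# np.average accepts per-element weights, so the frequencies can be passed as-is
np_mean = np.average(df['value'], weights=df['freq'])

print(manual_mean, np_mean)  # both equal 17/9, roughly 1.8889
```

Other statistics (median, quantiles) have no such one-argument shortcut in NumPy, which is where expanding to raw data becomes convenient.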
One option is to use np.repeat:
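import numpy as np
import pandas as pd

values = [1, 2, 3]
frequency = [3, 4, 2]
# np.repeat repeats each value by its matching frequency: [1, 1, 1, 2, 2, 2, 2, 3, 3]
df = pd.DataFrame(np.repeat(values, frequency), columns=['data'])
df.mean()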
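If the value/freq table is already in a DataFrame (as in the question), a pandas-only variant is possible with Series.repeat, which accepts an array of per-element counts. This is an alternative sketch, not the approach from the answer above; the column name 'data' is taken from the question's desired output.

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3], 'freq': [3, 4, 2]})

# Series.repeat repeats each element by the matching count in 'freq';
# reset_index(drop=True) replaces the duplicated index labels with 0..n-1
raw = df['value'].repeat(df['freq']).reset_index(drop=True).to_frame('data')

print(raw)
# Built-in statistics now operate on the raw sample
print(raw['data'].mean())
print(raw['data'].median())
```

Without reset_index(drop=True), the result keeps the original labels (0, 0, 0, 1, 1, 1, 1, 2, 2), which is harmless for .mean() but does not match the 0 to 8 index shown in the question.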