Given a problem set, with values and their associated frequencies, how can the sample be created in a dataframe?
Find the mean of this dataset:
Value: 1 | 2 | 3
Freq:  3 | 4 | 2
This represents the sample [1, 1, 1, 2, 2, 2, 2, 3, 3].
I input this into Python:
>>> import pandas as pd
>>> df = pd.DataFrame({'value': [1, 2, 3], 'freq': [3, 4, 2]})
>>> df
   value  freq
0      1     3
1      2     4
2      3     2
It's not difficult to compute basic statistics in this format. For example, the mean of this dataset is (df['value'] * df['freq']).sum() / df['freq'].sum(). However, it would be nice to use built-in methods such as .mean(). To do this, I need to enter the value/frequency data into the DataFrame as raw values. My end goal is this:
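For a quick cross-check of that weighted-mean formula, NumPy's `np.average` accepts a `weights` argument and gives the same result without expanding the data. This is a minimal sketch assuming NumPy is installed; it only covers the mean, not the other built-in statistics the question is after.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3], 'freq': [3, 4, 2]})

# Weighted mean computed by hand: sum(value * freq) / sum(freq)
manual_mean = (df['value'] * df['freq']).sum() / df['freq'].sum()

# Same weighted mean via NumPy, using freq as the weights
np_mean = np.average(df['value'], weights=df['freq'])

print(manual_mean, np_mean)
```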
    data
0      1
1      1
2      1
3      2
4      2
5      2
6      2
7      3
8      3
Does anybody know how to input datasets given in value/frequency form and create a data frame of raw data? Thank you.
One option is to use np.repeat:
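import numpy as np
import pandas as pd

values = [1, 2, 3]
frequency = [3, 4, 2]
# np.repeat repeats each value by its matching frequency
df = pd.DataFrame(np.repeat(values, frequency), columns=['data'])
df.mean()          # mean of every column, returned as a Series
df['data'].mean()  # the mean of the 'data' column as a single number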
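If the data is already in the value/freq DataFrame from the question, a pandas-only variant is to repeat the row index by the frequency column and select those rows with `.loc`. This is a sketch of a different technique from the `np.repeat` answer (it uses `Index.repeat`); the column name `data` just matches the goal shown in the question.

```python
import pandas as pd

# Value/frequency table as in the question
df = pd.DataFrame({'value': [1, 2, 3], 'freq': [3, 4, 2]})

# Repeat each row label 'freq' times, then select those rows
expanded = df.loc[df.index.repeat(df['freq']), ['value']]

# Rename the column to 'data' and renumber the index 0..n-1
raw = expanded.rename(columns={'value': 'data'}).reset_index(drop=True)

print(raw)
print(raw['data'].mean())
print(raw['data'].median())
```

Once the rows are expanded, any built-in statistic (`.mean()`, `.median()`, `.std()`, and so on) works directly on the `data` column.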