I have a DataFrame that looks like this:
index  name   city
0      Yam    Hadera
1      Meow   Hadera
2      Don    Hadera
3      Jazz   Hadera
4      Bond   Tel Aviv
5      James  Tel Aviv
I want pandas to randomly sample rows, weighted by how often each value appears in the city column (something like df.city.value_counts()), so that the result of my magic function, say:
df.magic_sample(3, weight_column='city')
might look like:
0  Yam    Hadera
1  Meow   Hadera
2  Bond   Tel Aviv
Thanks! :)
DataFrame.get_value() retrieved a single value from a DataFrame, given a row label and a column label. It was deprecated in pandas 0.21 and removed in pandas 1.0; use DataFrame.at (row and column labels) or DataFrame.iat (integer positions) instead.
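A minimal sketch of the replacements for get_value(), assuming a small two-row frame built from the question's data (the index labels 0 and 4 are chosen only to show the difference between labels and positions):

```python
import pandas as pd

# Two rows from the question, keeping their original index labels.
df = pd.DataFrame(
    {"name": ["Yam", "Bond"], "city": ["Hadera", "Tel Aviv"]},
    index=[0, 4],
)

# .at looks up a single cell by row label and column label.
by_label = df.at[4, "city"]

# .iat looks up a single cell by integer position (second row, second column).
by_position = df.iat[1, 1]

print(by_label, by_position)
```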
Select data using the location index (.iloc). With positional indexing, dataframe.iloc[0:1, 0:1] selects the cell at the intersection of the first row and the first column, returned as a 1×1 DataFrame (dataframe.iloc[0, 0] returns the scalar value itself). You can widen the row range, the column range, or both to select more data.
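A short illustration of slice versus scalar positional access, assuming a three-row subset of the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Yam", "Meow", "Bond"], "city": ["Hadera", "Hadera", "Tel Aviv"]}
)

# Slices keep the DataFrame shape: this is a 1x1 DataFrame, not a scalar.
cell_frame = df.iloc[0:1, 0:1]

# Plain integers return the scalar value itself.
cell_value = df.iloc[0, 0]

# Widening both slices selects more data: first two rows, both columns.
block = df.iloc[0:2, 0:2]

print(cell_frame, cell_value, block, sep="\n\n")
```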
DataFrame.assign() adds new columns to a DataFrame: it returns a new object (a copy) containing the original columns plus the new ones, and leaves the original DataFrame unchanged.
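A minimal sketch of assign(); the column names name_length and city_upper are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Yam", "Bond"], "city": ["Hadera", "Tel Aviv"]})

# assign returns a new DataFrame; df itself keeps only its two columns.
with_len = df.assign(name_length=df["name"].str.len())

# A callable receives the DataFrame being assigned to, which is handy in method chains.
with_upper = df.assign(city_upper=lambda d: d["city"].str.upper())

print(with_len, with_upper, sep="\n\n")
```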
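You can group by city and then sample each group in proportion to its size relative to the whole DataFrame:
df.groupby('city', group_keys=False).apply(lambda g: g.sample(frac=3 / len(df)))
Pass frac rather than n here: n must be a whole number, so g.sample(3 * len(g) / len(df)) fails whenever that ratio is not an integer, while frac is turned into round(frac * len(g)) rows for each group.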
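A self-contained sketch of the same stratified idea, assuming pandas 1.1 or later, where GroupBy.sample exists and makes the apply/lambda unnecessary. The helper name proportional_sample is hypothetical (a stand-in for the question's magic_sample), and random_state is fixed only so the run is reproducible:

```python
import pandas as pd

# Sample data from the question; the "index" column becomes the DataFrame index.
df = pd.DataFrame(
    {
        "name": ["Yam", "Meow", "Don", "Jazz", "Bond", "James"],
        "city": ["Hadera", "Hadera", "Hadera", "Hadera", "Tel Aviv", "Tel Aviv"],
    }
)


def proportional_sample(frame: pd.DataFrame, n: int, column: str, random_state=None) -> pd.DataFrame:
    """Hypothetical helper: draw about n rows so that each value of `column`
    keeps its share of the original frame (stratified sampling).

    Each group contributes round(n / len(frame) * len(group)) rows, so the
    total can differ from n when the shares are not whole numbers.
    Assumes n <= len(frame), since sampling is without replacement.
    """
    frac = n / len(frame)
    return frame.groupby(column).sample(frac=frac, random_state=random_state)


result = proportional_sample(df, 3, "city", random_state=0)
print(result)
print(result["city"].value_counts())
```

With 4 Hadera rows and 2 Tel Aviv rows, frac is 0.5, so every run returns exactly 2 Hadera rows and 1 Tel Aviv row; only which names are picked varies. A plain df.sample(n=3) also favours cities in proportion to their counts, but only on average, so a single draw can contain three Hadera rows.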