
Using Pandas to sample DataFrame using a specific column's weight

I have a DataFrame that looks like this:

  index  name   city
  0      Yam    Hadera
  1      Meow   Hadera
  2      Don    Hadera
  3      Jazz   Hadera
  4      Bond   Tel Aviv
  5      James  Tel Aviv

I want Pandas to randomly choose rows, weighted by the number of times each value appears in the city column (something like using df.city.value_counts()). So the result of my magic function, say:

df.magic_sample(3, weight_column='city')

might look like:

  0     Yam      Hadera
  1     Meow     Hadera
  2     Bond     Tel Aviv

Thanks! :)

asked Jan 08 '17 01:01 by Yam Mesicka




1 Answer

You can group by city and then sample from each group in proportion to its size relative to the original DataFrame:

df.groupby('city', group_keys=False).apply(lambda g: g.sample(int(round(3 * len(g) / len(df)))))
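
For anyone who wants to try this on the example data, here is a minimal self-contained sketch. The helper names stratified_sample and weighted_sample are made up for illustration and are not part of pandas. The first helper follows the group-and-sample idea, converting each group's share to an integer because sample expects a whole number of rows. The second is an alternative that may be closer to what the question describes: it passes per-row weights derived from value_counts() to DataFrame.sample.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "name": ["Yam", "Meow", "Don", "Jazz", "Bond", "James"],
    "city": ["Hadera", "Hadera", "Hadera", "Hadera", "Tel Aviv", "Tel Aviv"],
})


def stratified_sample(frame, n, column, random_state=None):
    """Hypothetical helper: draw rows from each group in proportion to its size."""
    parts = []
    for _, group in frame.groupby(column):
        # Each group's share of n, rounded to a whole number of rows.
        k = int(round(n * len(group) / len(frame)))
        parts.append(group.sample(n=k, random_state=random_state))
    return pd.concat(parts).sort_index()


def weighted_sample(frame, n, column, random_state=None):
    """Hypothetical helper: weight each row by how often its value occurs."""
    # Map every row to the count of its own value, e.g. Hadera -> 4, Tel Aviv -> 2.
    weights = frame[column].map(frame[column].value_counts())
    return frame.sample(n=n, weights=weights, random_state=random_state)


stratified = stratified_sample(df, 3, "city", random_state=0)
weighted = weighted_sample(df, 3, "city", random_state=0)
print(stratified)
print(weighted)
```

With this data the stratified version always returns two Hadera rows and one Tel Aviv row, whereas the weighted version only makes Hadera rows more likely. Note that because of rounding, the stratified version is not guaranteed to return exactly n rows for other group sizes.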

answered Nov 11 '22 10:11 by Psidom