
Pandas random sample with remove

Tags: python, pandas

I'm aware of DataFrame.sample(), but how can I do this and also remove the sample from the dataset? (Note: AFAIK this has nothing to do with sampling with replacement)

For example, here is the essence of what I want to achieve (this does not actually work, since DataFrame has no remove() method):

len(df) # 1000

df_subset = df.sample(300)
len(df_subset) # 300

df = df.remove(df_subset)
len(df) # 700
JakeCowton asked Oct 03 '16 15:10



1 Answer

If your index is unique, drop the sampled rows by their index labels:

df = df.drop(df_subset.index)
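To tie this back to the numbers in the question, here is a minimal sketch that assumes a 1000-row frame with the default unique integer index. The column name `value`, the name `df_rest`, and the `random_state` seed are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical 1000-row frame standing in for the question's df
df = pd.DataFrame({"value": np.arange(1000)})

# random_state is set only so the sketch is reproducible
df_subset = df.sample(n=300, random_state=0)

# Drop the sampled rows by index label (relies on unique labels)
df_rest = df.drop(df_subset.index)

print(len(df_subset), len(df_rest))  # 300 700
```

The result is assigned to a new name here so both pieces stay available. Reassigning to `df`, as in the line above, works the same way.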

Example:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(10).reshape(-1, 2))

Sample two rows:

df_subset = df.sample(2)
df_subset



Drop the sampled rows:

df.drop(df_subset.index)
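The unique-index condition matters because drop() works on labels: if a label is repeated, every row carrying it is removed, not just the sampled one. One possible workaround, sketched here as an assumption rather than as part of the original answer, is to sample row positions and split with a boolean mask. The names `dup`, `rng`, `picked`, and `mask` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels repeat (0, 1, 0, 1, ...)
dup = pd.DataFrame({"value": np.arange(10)}, index=[0, 1] * 5)

# Pick 3 distinct row positions (not labels); seeded only for reproducibility
rng = np.random.default_rng(0)
picked = rng.choice(len(dup), size=3, replace=False)

# Boolean mask marking the sampled positions
mask = np.zeros(len(dup), dtype=bool)
mask[picked] = True

dup_subset = dup.iloc[picked]  # the sampled rows
dup_rest = dup[~mask]          # everything that was not sampled

print(len(dup_subset), len(dup_rest))  # 3 7
```

If the original labels are not needed, calling `reset_index(drop=True)` first and then using the sample-and-drop approach is a simpler alternative.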


piRSquared answered Sep 19 '22 14:09