I'm aware of DataFrame.sample(), but how can I do this and also remove the sample from the dataset? (Note: AFAIK this has nothing to do with sampling with replacement.)
For example, here is the essence of what I want to achieve (this does not actually work):
len(df) # 1000
df_subset = df.sample(300)
len(df_subset) # 300
df = df.remove(df_subset)
len(df) # 700
To remove rows at random from a pandas DataFrame without shuffling it: get an array of randomly selected row index labels, then pass that array to the drop() method to remove those rows.
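The two steps above could look roughly like this. It is only a sketch: the frame, the column name "value", and the seed are made up for illustration, and it assumes the index labels are unique.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row frame with a unique default index (0..9)
df = pd.DataFrame({"value": np.arange(10)})

# Seeded generator so the example is repeatable (the seed is arbitrary)
rng = np.random.default_rng(0)

# Step 1: pick 3 distinct index labels at random
labels_to_drop = rng.choice(df.index.to_numpy(), size=3, replace=False)

# Step 2: drop() returns a new frame; the remaining rows keep their original order
df_remaining = df.drop(labels_to_drop)

print(len(df_remaining))  # 7
```

Because only labels are sampled, the surviving rows are never reordered, which is the "without shuffling" part.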
Select n rows at random using sample(n) or sample(n=n). Each call returns a different random selection of n rows unless you fix the random_state argument.
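A small sketch of that behaviour, assuming a made-up frame and an arbitrary seed value of 42:

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only for illustration
df = pd.DataFrame({"value": np.arange(100)})

# Positional and keyword forms are equivalent ways to ask for 3 rows
a = df.sample(3)
b = df.sample(n=3)

# Passing random_state makes the selection repeatable across calls
c1 = df.sample(n=3, random_state=42)
c2 = df.sample(n=3, random_state=42)

print(len(a), len(b))  # 3 3
print(c1.equals(c2))   # True
```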
In Python, you can randomly sample elements from a list with choice(), sample(), and choices() of the random module. These functions can also be applied to a string and tuple. choice() returns one random element, and sample() and choices() return a list of multiple random elements.
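For comparison, a minimal sketch of those three standard-library functions. The sample data and the seed are arbitrary, and this works on plain sequences rather than on a DataFrame.

```python
import random

random.seed(0)  # arbitrary seed, only to make reruns repeatable

items = ["a", "b", "c", "d", "e"]

one = random.choice(items)              # a single element
distinct = random.sample(items, k=3)    # 3 elements, without replacement
maybe_repeats = random.choices(items, k=3)  # 3 elements, with replacement

# The same functions accept strings and tuples
letter = random.choice("hello")
pair = random.sample(("x", "y", "z"), k=2)
```

sample() never repeats a position, while choices() can return the same element more than once.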
If your index is unique, you can drop the sampled rows by their index labels:
df = df.drop(df_subset.index)
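Example:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(10).reshape(-1, 2))
Sample:
df_subset = df.sample(2)
df_subset
Drop:
# drop() returns a new DataFrame; assign it back (df = ...) to keep the result
df.drop(df_subset.index)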
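Putting it together for the sizes in the question, here is a sketch under a few assumptions: the column name "value", the seeds, and the duplicate-index frame are all invented for illustration. The second half shows one possible workaround when the index is not unique, where dropping by label would remove every row sharing that label; it selects by position with a boolean mask instead.

```python
import numpy as np
import pandas as pd

# Unique index: sample, then drop the sampled labels
df = pd.DataFrame({"value": np.arange(1000)})
df_subset = df.sample(300, random_state=0)  # seed is arbitrary
df = df.drop(df_subset.index)

print(len(df_subset))  # 300
print(len(df))         # 700

# Non-unique index (hypothetical data): split by row position instead of label
df_dup = pd.DataFrame({"value": np.arange(6)}, index=[0, 0, 1, 1, 2, 2])
rng = np.random.default_rng(0)
positions = rng.choice(len(df_dup), size=2, replace=False)

mask = np.zeros(len(df_dup), dtype=bool)
mask[positions] = True

dup_subset = df_dup[mask]      # the 2 sampled rows
dup_remaining = df_dup[~mask]  # the other 4 rows, original order kept
```

Another option for a non-unique index is to call reset_index(drop=True) first so that the label-based drop becomes safe, at the cost of losing the original labels.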