Background
Here's a toy df:
df <- data.frame(ID = c("a","b","c","d","e","f"),
gender = c("f","f","m","f","m","m"),
zip = c(48601,NA,29910,54220,NA,44663),stringsAsFactors=FALSE)
As you can see, I've got a couple of NA values in the zip column.
Problem
I'm trying to randomly sample 2 entire rows from df -- but I want them to be rows for which zip is not NA.
What I've tried
This code gets me a basic (i.e. non-conditional) random sample:
df2 <- df[sample(nrow(df), 2), ]
But of course, that only gets me halfway to my goal -- a bunch of the time it's going to return a row with an NA value in zip. This code attempts to add the condition:
df2 <- df[sample(nrow(df$zip != NA), 2), ]
I think I'm close, but this yields an error: invalid first argument.
Any ideas?
To select the rows of an R data frame that contain no NA, we can use the complete.cases function with single square brackets. For example, if we have a data frame called df that contains some missing values (NA), the rows without any NA can be selected with the command df[complete.cases(df),].
To remove all rows that have an NA, we can use the na.omit function. For example, if we have a data frame called df that contains some NA values, we can remove every row that contains at least one NA with the command na.omit(df).
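A minimal sketch of the two functions on the toy data from the question. Note that complete.cases(df) and na.omit(df) look at every column; passing only df$zip to complete.cases restricts the check to that column. The variable names (all_complete, zip_complete, omitted) are illustrative only.

```r
# Toy data from the question
df <- data.frame(ID = c("a", "b", "c", "d", "e", "f"),
                 gender = c("f", "f", "m", "f", "m", "m"),
                 zip = c(48601, NA, 29910, 54220, NA, 44663),
                 stringsAsFactors = FALSE)

# Keep rows that have no NA in any column
all_complete <- df[complete.cases(df), ]

# Keep rows that have no NA in zip; other columns are not checked
zip_complete <- df[complete.cases(df$zip), ]

# na.omit() also drops every row that has at least one NA
omitted <- na.omit(df)
```

In this toy data only zip has missing values, so all three results contain the same four rows. On a data frame with NA in other columns, the zip-only check would keep more rows than the other two.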
With dplyr, library(dplyr) followed by df %>% select_if(~ !any(is.na(.))) keeps only the columns that contain no NA values. Note that this filters columns, not rows.
The sample_n() function from the dplyr package takes a random sample of rows from a data frame. Syntax: sample_n(x, n). Parameters: x: the data frame. n: the number of rows to select. Example 1: library(dplyr) d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), age = c(7, 5, 9, 16),
To select a random sample of rows within each group, dplyr's slice_sample() can be combined with group_by(). The sample_n() function selects random rows from a data frame (or table).
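A hedged sketch of how the dplyr functions mentioned above might be applied to the question's data. It assumes dplyr 1.0.0 or later is installed (slice_sample() is not available in older versions); df2 and per_group are illustrative names.

```r
library(dplyr)

# Toy data from the question
df <- data.frame(ID = c("a", "b", "c", "d", "e", "f"),
                 gender = c("f", "f", "m", "f", "m", "m"),
                 zip = c(48601, NA, 29910, 54220, NA, 44663),
                 stringsAsFactors = FALSE)

# Drop rows with a missing zip, then draw 2 random rows
df2 <- df %>%
  filter(!is.na(zip)) %>%
  slice_sample(n = 2)

# Same filter, but draw 1 random row within each gender group
per_group <- df %>%
  filter(!is.na(zip)) %>%
  group_by(gender) %>%
  slice_sample(n = 1)
```

Filtering before sampling guarantees that both sampled rows have a zip value; sampling first and filtering afterwards could return fewer than 2 rows.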
Here is a base R solution with complete.cases()
# logical vector: TRUE for rows with no NA in any column
x <- complete.cases(df)
# keep only the rows without NA
df_no_na <- df[x,]
# do the sample
df_no_na[sample(nrow(df_no_na), 2),]
Output:
ID gender zip
3 c m 29910
6 f m 44663
We can use is.na:
tmp <- df[!is.na(df$zip),]
tmp[sample(nrow(tmp), 2),]
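For context, here is a small sketch of why the attempt in the question fails while is.na works. It uses only the zip values from the toy data; cmp, n and keep are illustrative names.

```r
zip <- c(48601, NA, 29910, 54220, NA, 44663)

# Any comparison with NA is itself NA, so this is never TRUE or FALSE
cmp <- zip != NA

# nrow() of a plain vector is NULL, so sample() receives nothing to sample from
n <- nrow(cmp)

# is.na() is the test that actually detects missing values
keep <- !is.na(zip)

# Number of rows that are eligible for sampling
n_eligible <- sum(keep)
```

Because nrow() returns NULL here, sample() has an empty population to draw from, which is consistent with the "invalid first argument" error reported in the question.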
We can use rownames + na.omit to sample the rows
> df[sample(rownames(na.omit(df["zip"])), 2),]
ID gender zip
3 c m 29910
4 d f 54220
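The approaches above could be wrapped in a small reusable function. This is only a sketch: sample_non_na is a hypothetical helper name, not something from the answers or from any package, and the check for too few eligible rows is an added assumption about the desired behaviour.

```r
# Toy data from the question
df <- data.frame(ID = c("a", "b", "c", "d", "e", "f"),
                 gender = c("f", "f", "m", "f", "m", "m"),
                 zip = c(48601, NA, 29910, 54220, NA, 44663),
                 stringsAsFactors = FALSE)

# Hypothetical helper: sample n rows of `data` whose value in `col` is not NA
sample_non_na <- function(data, col, n) {
  # Row positions where the column is not missing
  eligible <- which(!is.na(data[[col]]))
  if (length(eligible) < n) {
    stop("only ", length(eligible), " rows have a non-NA value in '", col, "'")
  }
  # Sample positions within `eligible`; sample.int() avoids the surprise of
  # sample(x, ...) treating a single number x as the range 1:x
  picked <- eligible[sample.int(length(eligible), n)]
  data[picked, , drop = FALSE]
}

set.seed(42)  # only to make the draw reproducible
df2 <- sample_non_na(df, "zip", 2)
```

Calling sample_non_na(df, "zip", 5) stops with an error, since only four rows have a zip value.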