
In R, sample n rows from a df in which a certain column has non-NA values (sample conditionally)

Background

Here's a toy df:

df <- data.frame(ID = c("a","b","c","d","e","f"), 
                gender = c("f","f","m","f","m","m"), 
                zip = c(48601,NA,29910,54220,NA,44663),stringsAsFactors=FALSE)

As you can see, I've got a couple of NA values in the zip column.

Problem

I'm trying to randomly sample 2 entire rows from df -- but I want them to be rows for which zip is not NA.

What I've tried

This code gets me a basic (i.e. non-conditional) random sample:

df2 <- df[sample(nrow(df), 2), ]

But of course, that only gets me halfway to my goal -- a bunch of the time it's going to return a row with an NA value in zip. This code attempts to add the condition:

df2 <- df[sample(nrow(df$zip != NA), 2), ]

I think I'm close, but this yields the error "invalid first argument".

Any ideas?

asked Aug 05 '21 18:08 by logjammin



3 Answers

Here is a base R solution with complete.cases()

# logical vector: TRUE for rows with no NA in any column
# (use complete.cases(df$zip) to check only the zip column)
x <- complete.cases(df)

# keep only the complete rows
df_no_na <- df[x, ]

# sample 2 of the remaining rows
df_no_na[sample(nrow(df_no_na), 2), ]

Output:

  ID gender   zip
3  c      m 29910
6  f      m 44663
answered Oct 21 '22 12:10 by TarJae
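One caveat worth sketching: complete.cases(df) looks at every column, so it only matches "zip is not NA" when no other column has missing values. The data frame df_alt below is a hypothetical variant of the toy data (row 1 has a missing gender) and is not part of the original question; the names keep_all, keep_zip, pool and sampled are likewise made up for illustration.

```r
# Hypothetical variant of the toy data: row 1 now has a missing gender
df_alt <- data.frame(ID = c("a","b","c","d","e","f"),
                     gender = c(NA,"f","m","f","m","m"),
                     zip = c(48601,NA,29910,54220,NA,44663),
                     stringsAsFactors = FALSE)

keep_all <- complete.cases(df_alt)      # FALSE if ANY column is NA
keep_zip <- complete.cases(df_alt$zip)  # FALSE only if zip is NA

sum(keep_all)  # 3 rows eligible (row 1 is dropped too)
sum(keep_zip)  # 4 rows eligible

# Sample 2 rows from those with a non-NA zip only
pool <- df_alt[keep_zip, ]
sampled <- pool[sample(nrow(pool), 2), ]
```

If the condition should apply to zip alone, restricting the check to that column is probably the safer choice.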


We can use is.na

tmp <- df[!is.na(df$zip), ]
tmp[sample(nrow(tmp), 2), ]
answered Oct 21 '22 11:10 by akrun
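For what it's worth, here is a rough sketch of why the attempt in the question fails, plus a one-step variant that samples row indices directly. The names eligible and df2 are just illustrative, and using sample.int() on positions is my own choice rather than something from the answers above.

```r
df <- data.frame(ID = c("a","b","c","d","e","f"),
                 gender = c("f","f","m","f","m","m"),
                 zip = c(48601,NA,29910,54220,NA,44663),
                 stringsAsFactors = FALSE)

# Comparing anything with NA gives NA, so this is all NA, never TRUE/FALSE
df$zip != NA

# nrow() of a plain vector is NULL, so sample() has nothing to draw from
nrow(df$zip != NA)

# Row positions where zip is present: 1, 3, 4, 6
eligible <- which(!is.na(df$zip))

# Draw 2 positions within 'eligible', then index the original data frame
df2 <- df[eligible[sample.int(length(eligible), 2)], ]
```

Sampling positions with sample.int() instead of calling sample(eligible, 2) sidesteps the case where a single numeric value n is treated by sample() as the range 1:n.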


We can use rownames + na.omit to sample the rows

> df[sample(rownames(na.omit(df["zip"])), 2),]
  ID gender   zip
3  c      m 29910
4  d      f 54220
answered Oct 21 '22 12:10 by ThomasIsCoding