In R, sample n rows from a df in which a certain column has non-NA values (sample conditionally)

Tags: random, dataframe, r, subset

Background

Here's a toy df:

df <- data.frame(ID = c("a","b","c","d","e","f"),
                 gender = c("f","f","m","f","m","m"),
                 zip = c(48601,NA,29910,54220,NA,44663),
                 stringsAsFactors = FALSE)

As you can see, I've got a couple of NA values in the zip column.

Problem

I'm trying to randomly sample 2 entire rows from df -- but I want them to be rows for which zip is not NA.

What I've tried

This code gets me a basic (i.e. non-conditional) random sample:

df2 <- df[sample(nrow(df), 2), ]

But of course, that only gets me halfway to my goal -- a bunch of the time it's going to return a row with an NA value in zip. This code attempts to add the condition:

df2 <- df[sample(nrow(df$zip != NA), 2), ]

I think I'm close, but this yields an error invalid first argument.

Any ideas?

asked Aug 05 '21 18:08

logjammin

3 Answers

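Here is a base R solution with complete.cases(). It drops rows with an NA in any column, which in this toy data means only the rows with a missing zip:

# logical vector: TRUE for rows that contain no NA
x <- complete.cases(df)

# keep only the rows without NA
df_no_na <- df[x, ]

# draw the sample
df_no_na[sample(nrow(df_no_na), 2), ]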

Output (the rows vary from run to run, since the draw is random):

  ID gender   zip
3  c      m 29910
6  f      m 44663

answered Oct 21 '22 12:10

TarJae
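
If other columns could also contain NA, checking the whole data frame would remove more rows than the question asks for. The following is a minimal sketch, not part of the answer above, that restricts the check to zip. The data frame df_var is a hypothetical variant of the toy data, invented here only to show the difference.

```r
# Hypothetical variant of the toy data: row "c" has a valid zip but a missing gender
df_var <- data.frame(ID = c("a","b","c","d","e","f"),
                     gender = c("f","f",NA,"f","m","m"),
                     zip = c(48601,NA,29910,54220,NA,44663),
                     stringsAsFactors = FALSE)

# Checking the whole data frame drops every row with an NA in any column
keep_all <- complete.cases(df_var)

# Checking only the zip column keeps row "c"
keep_zip <- complete.cases(df_var$zip)

sum(keep_all)  # 3 rows survive
sum(keep_zip)  # 4 rows survive

# Sample 2 rows from the rows whose zip is present
pool <- df_var[keep_zip, ]
picked <- pool[sample(nrow(pool), 2), ]
picked
```

With the original df the two checks select the same rows, because zip is the only column with missing values.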

We can use is.na to drop the rows where zip is NA, then sample from what is left:

tmp <- df[!is.na(df$zip), ]
tmp[sample(nrow(tmp), 2), ]

answered Oct 21 '22 11:10

akrun
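
For context on why the attempt in the question fails, here is a short sketch, not part of the answer above. Comparing anything with NA using != yields NA rather than TRUE or FALSE, and nrow() of a plain vector is NULL, so sample() is left with nothing to draw from, which appears to be what produces the error. The helper sample_non_na_rows is a hypothetical name introduced only for illustration.

```r
df <- data.frame(ID = c("a","b","c","d","e","f"),
                 gender = c("f","f","m","f","m","m"),
                 zip = c(48601,NA,29910,54220,NA,44663),
                 stringsAsFactors = FALSE)

# Comparing with NA always yields NA, so this is a vector of six NAs
cmp <- df$zip != NA

# nrow() is defined for matrices and data frames; for a plain vector it is NULL
n <- nrow(cmp)

# Hypothetical helper: sample n rows of `data` in which column `col` is not NA
sample_non_na_rows <- function(data, col, n) {
  idx <- which(!is.na(data[[col]]))
  stopifnot(length(idx) >= n)
  # Index into idx by position; sample(idx, n) would treat a single
  # number k as the range 1:k, which is not what we want here
  data[idx[sample.int(length(idx), n)], , drop = FALSE]
}

df2 <- sample_non_na_rows(df, "zip", 2)
df2
```

The test for missing values is is.na(), so the condition belongs in !is.na(df$zip) rather than in a comparison with NA.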

We can use rownames + na.omit to sample the rows

> df[sample(rownames(na.omit(df["zip"])), 2),]
  ID gender   zip
3  c      m 29910
4  d      f 54220

answered Oct 21 '22 12:10

ThomasIsCoding
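
The outputs shown in the answers differ from one another because each call to sample() makes a fresh random draw. If a repeatable result is needed, one option is to fix the random seed first. This is a minimal sketch, not taken from any of the answers; the seed value 42 is arbitrary.

```r
df <- data.frame(ID = c("a","b","c","d","e","f"),
                 gender = c("f","f","m","f","m","m"),
                 zip = c(48601,NA,29910,54220,NA,44663),
                 stringsAsFactors = FALSE)

# Rows with a non-missing zip
pool <- df[!is.na(df$zip), ]

set.seed(42)  # arbitrary seed, chosen only for illustration
first <- pool[sample(nrow(pool), 2), ]

set.seed(42)  # resetting the seed reproduces the same draw
second <- pool[sample(nrow(pool), 2), ]

identical(first, second)
```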
