How to write "for" loops in R using dplyr syntax

Tags: loops, r, dplyr

I have an extensive block of code that I've written using dplyr syntax in R. However, I am trying to put that code in a loop, so that I can ultimately create multiple output files as opposed to just one. Unfortunately, I appear unable to do so.

For illustration purposes regarding my problem, let's refer to the commonly used "iris" dataset in R:

      > data("iris")
      > str(iris)
      'data.frame': 150 obs. of  5 variables:
      $ Sepal.Length: num  
      $ Sepal.Width : num  
      $ Petal.Length: num  
      $ Petal.Width : num  
      $ Species     : Factor w/ 3 levels "setosa","versicolor","virginica"

Let's say that I want to save the average Petal.Length of the species "versicolor". The dplyr code could look like the following:

    MeanLength2 <- iris %>% filter(Species=="versicolor") %>%
      summarize(mean(Petal.Length)) %>% print()

Which would give the following value:

      mean(Petal.Length)
    1               4.26

Let's attempt to create a loop to get the average petal length for all of the species.

From what little I know of loops, I would want to do something like this:

     for (i in unique(iris$Species))
      {
       iris %>% filter(iris$Species==unique(iris$Species)[i]) %>%
        summarize(mean(iris$Petal.Length)) %>% print()
        print(i) 
       }

For some reason, I had to specify the data frame and the column inside the loop, which is generally not the case while using the piping functionality of dplyr. I'm assuming that this is indicative of the problem.

Anyway, the above code gives the following output:

          mean(iris$Petal.Length)
     1                   3.758
     [1] "setosa"
          mean(iris$Petal.Length)
     1                   3.758
     [1] "versicolor"
          mean(iris$Petal.Length)
     1                   3.758
     [1] "virginica"

So the code is outputting 3.758 three times, which is the average petal length across all species in the dataset. This indicates that the "filter" code did not work as expected. From what I can tell, it appears that the loop itself functioned as intended, as all three unique species names were printed in the eventual output.

How can one go about doing something like this with for loops? I understand that this particular exercise does not require loops, as one can easily get the average petal length of all the species by using, for example, the "group_by" function in dplyr. However, I am looking to output close to 100 unique table and PDF files with the dataset that I am working with, and knowing how to use for loops would really help for that purpose.

asked Sep 01 '16 20:09

Naj S

1 Answer

It is unfortunate that your code didn't raise any errors. If you run your code line by line, you'll see what I mean. Take the first iteration of your loop and replace `i` with `"setosa"`:

    > iris %>% filter(iris$Species == unique(iris$Species)["setosa"])
    [1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species
    <0 rows> (or 0-length row.names)
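
The empty result comes from the indexing, not from dplyr itself. Here is a minimal base R sketch of what I believe is going on (the names `species` and `condition` are mine, purely for illustration): `unique(iris$Species)` is an unnamed factor, so subsetting it with the string `"setosa"` looks for an element *named* "setosa" and returns `NA`. Comparing against `NA` gives `NA` for every row, and `filter()` only keeps rows where the condition is `TRUE`.

```r
# Illustrative sketch: why the original filter condition matches no rows.
species <- unique(iris$Species)   # factor of length 3, with no names attribute

species["setosa"]   # character index on an unnamed vector: <NA>
species[1]          # positional index: setosa

# The comparison against NA is NA for all 150 rows, so no row is ever TRUE.
condition <- iris$Species == species["setosa"]
all(is.na(condition))   # TRUE
```

Inside the loop, `i` already holds the species name as a string, so there is nothing to look up in `unique(iris$Species)` in the first place.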

Your filter yields a data frame with no observations, so there is no point in going further. For the sake of the example, though, let's run the rest of the code:

    > iris %>% filter(iris$Species == unique(iris$Species)["setosa"]) %>%
    + summarize(mean(iris$Petal.Length))
      mean(iris$Petal.Length)
    1                   3.758

What happened is that `iris$Petal.Length` refers to the full `iris` data frame directly, not to the filtered data coming through the pipe. A more obvious example would be:

    > filter(iris, iris$Species == unique(iris$Species)["setosa"]) %>%
    + summarize(mean(mtcars$cyl))
      mean(mtcars$cyl)
    1           6.1875

That's why you don't get the answer you expected: your filter matched no rows, and the summary statistic was computed on a data frame referenced from outside the pipe.
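
To see the two kinds of column reference side by side, here is a small sketch (the column names `piped_mean` and `global_mean` are just labels I made up). A bare `Petal.Length` is looked up in whatever data reaches `summarize()`, while `iris$Petal.Length` always pulls the column from the original, unfiltered data frame:

```r
library(dplyr)

# Illustrative comparison of a bare column name vs. an explicit iris$ reference.
result <- iris %>%
  filter(Species == "setosa") %>%
  summarize(
    piped_mean  = mean(Petal.Length),       # uses the 50 filtered setosa rows
    global_mean = mean(iris$Petal.Length)   # uses all 150 rows, filter ignored
  )

print(result)   # piped_mean is 1.462, global_mean is 3.758
```

The second value is the same 3.758 that showed up three times in the question's output.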

As TJ Mahr mentioned, your code runs fine once you drop the `iris$` prefixes and compare `Species` to `i` directly:

    > for (i in unique(iris$Species))
    + {
    +     iris %>% filter(Species==i) %>%
    +         summarize(mean(Petal.Length)) %>% print()
    +     print(i)
    + }
      mean(Petal.Length)
    1              1.462
    [1] "setosa"
      mean(Petal.Length)
    1               4.26
    [1] "versicolor"
      mean(Petal.Length)
    1              5.552
    [1] "virginica"
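
Since the end goal is one output file per group, the same loop can write each result to disk instead of printing it. This is only a rough sketch under my own assumptions: the output directory (`tempdir()`), the `petal_length_<species>.csv` naming scheme, and the variable names are all hypothetical, so adapt them to your data.

```r
library(dplyr)

# Assumption: one CSV per species, written to a temporary directory.
out_dir <- tempdir()
species_names <- as.character(unique(iris$Species))
written <- character(0)   # keeps track of the files created

for (sp in species_names) {
  # Bare column names only, so the filter and the summary use the piped data.
  species_summary <- iris %>%
    filter(Species == sp) %>%
    summarize(mean_petal_length = mean(Petal.Length))

  # Hypothetical file naming scheme, e.g. petal_length_setosa.csv
  out_file <- file.path(out_dir, paste0("petal_length_", sp, ".csv"))
  write.csv(species_summary, out_file, row.names = FALSE)
  written <- c(written, out_file)
}

written   # paths of the three files that were created
```

For PDF output, the same pattern should work with base R's `pdf(out_file)` before the plotting code and `dev.off()` after it, inside the loop body.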

I hope this helps.

answered Oct 18 '22 17:10

donlelek
