Why does rbind throw a warning?

Tags:

r

This is related to "Are there more elegant ways to transform ragged data into a tidy dataframe?"

Why is the following code not working?

events = structure(list(date = structure(c(-714974, -714579, -717835), class = "Date"), 
    days = c(1, 6, 0.5), name = c("Intro to stats", "Stats Winter school", 
    "TidyR tools"), topics = c("probability|R", "R|regression|ggplot", 
    "tidyR|dplyr")), .Names = c("date", "days", "name", "topics"
), row.names = c(NA, -3L), class = "data.frame")

> newdf <- data.frame(topic=character(), days=character())
> for(i in 1:length(events$topics)){
+ xx = unlist(strsplit(events$topics[i],'\\|'))
+ for(j in 1:length(xx)){
+ yy = c(xx[j], events$days[i]/length(xx))
+ print(yy)
+ newdf=rbind(newdf, yy)
+ }
+ }
[1] "probability" "0.5"        
[1] "R"   "0.5"
[1] "R" "2"
[1] "regression" "2"         
[1] "ggplot" "2"     
[1] "tidyR" "0.25" 
[1] "dplyr" "0.25" 
There were 11 warnings (use warnings() to see them)
> newdf
  X.probability. X.0.5.
1    probability    0.5
2           <NA>    0.5
3           <NA>   <NA>
4           <NA>   <NA>
5           <NA>   <NA>
6           <NA>   <NA>
7           <NA>   <NA>
> 
> warnings()
Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA ... :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
4: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
5: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
6: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
7: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
8: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
9: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
10: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
11: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
>

yy looks fine, but rbind is not working. Where is the error, and how can it be corrected? Thanks for your help.

asked Aug 03 '14 08:08

rnso

3 Answers

You may try:

newdf <- data.frame(topic=character(), daysPerTopic=character(), stringsAsFactors=FALSE)
for(i in 1:length(events$topics)){
  xx = unlist(strsplit(events$topics[i],'\\|'))
  for(j in 1:length(xx)){
    # build each row as a one-row data frame, so the result gets proper column names and types
    yy = data.frame(topic=xx[j], daysPerTopic=events$days[i]/length(xx), stringsAsFactors=FALSE)
    newdf <- rbind(newdf, yy)
  }
}

newdf
#        topic daysPerTopic
# 1 probability         0.50
# 2           R         0.50
# 3           R         2.00
# 4  regression         2.00
# 5      ggplot         2.00
# 6       tidyR         0.25
# 7       dplyr         0.25
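Growing newdf with rbind inside the loop works, but every call copies the whole data frame. A minimal sketch of a common alternative (a swapped-in technique, not part of this answer): build one small data frame per event in a list and bind them all once with do.call(rbind, ...). The events object here is a hand-made subset holding only the two columns the loop uses, and the column name daysPerTopic follows the answer above.

```r
# Hand-made subset of the question's `events`: only the columns the loop needs
events <- data.frame(
  days   = c(1, 6, 0.5),
  topics = c("probability|R", "R|regression|ggplot", "tidyR|dplyr"),
  stringsAsFactors = FALSE
)

# One small data frame per event, collected in a list
pieces <- lapply(seq_len(nrow(events)), function(i) {
  xx <- strsplit(events$topics[i], "|", fixed = TRUE)[[1]]  # split on a literal "|"
  data.frame(
    topic = xx,
    daysPerTopic = events$days[i] / length(xx),  # recycled to length(xx)
    stringsAsFactors = FALSE
  )
})

# Bind all pieces in a single call instead of growing newdf row by row
newdf <- do.call(rbind, pieces)
print(newdf)
```

The result has the same seven rows as the loop above, with character topics and numeric daysPerTopic.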

Alternatively, keep your original code and change the global option instead:

op <- options(stringsAsFactors=FALSE)  # save the old value and set it to FALSE

# Your code, unchanged
newdf <- data.frame(topic=character(), days=character())
for(i in 1:length(events$topics)){
  xx = unlist(strsplit(events$topics[i],'\\|'))
  for(j in 1:length(xx)){
    yy = c(xx[j], events$days[i]/length(xx))
    print(yy)
    newdf=rbind(newdf, yy)
  }
}

newdf
#  X.probability. X.0.5.
# 1    probability    0.5
# 2              R    0.5
# 3              R      2
# 4     regression      2
# 5         ggplot      2
# 6          tidyR   0.25
# 7          dplyr   0.25

options(op)  # set back to the previous value
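The op <- options(...) / options(op) pair above restores the option only if nothing fails in between. A minimal sketch, assuming you wrap the work in a function: on.exit() restores the saved values on normal exit and on error. The helper with_temp_options is hypothetical, not from the answer. It is demonstrated with the digits option because, since R 4.0.0, data.frame() already defaults to stringsAsFactors = FALSE, so changing that option is mainly relevant on older R versions.

```r
# Hypothetical helper (not from the answer): run f() with temporary options
with_temp_options <- function(new, f) {
  op <- options(new)                # options() returns the previous values
  on.exit(options(op), add = TRUE)  # restore them on normal exit and on error
  f()
}

old_digits <- getOption("digits")

# Inside the call the temporary value is visible ...
inside <- with_temp_options(list(digits = 3), function() getOption("digits"))

# ... and the old value is restored afterwards, even when f() fails
try(with_temp_options(list(digits = 3), function() stop("boom")), silent = TRUE)
print(c(inside = inside, after = getOption("digits")))
```

For the question's case the call would look like with_temp_options(list(stringsAsFactors = FALSE), function() { ... }).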

answered Oct 26 '22 09:10

akrun

Did you try to debug your for loop? For example, by adding print(class(yy)) and print(str(newdf)), you would see that after the first iteration both newdf columns are factors.

# [1] "probability" "0.5"
# [1] "character"
# 'data.frame':  0 obs. of  2 variables:
#  $ topic: Factor w/ 0 levels: 
#  $ days : Factor w/ 0 levels: 
# NULL
# [1] "R"   "0.5"
# [1] "character"
# 'data.frame': 1 obs. of  2 variables:
#  $ X.probability.: Factor w/ 1 level "probability": 1
#  $ X.0.5.        : Factor w/ 1 level "0.5": 1
# NULL
# [1] "R" "2"
# [1] "character"
# 'data.frame': 2 obs. of  2 variables:
#  $ X.probability.: Factor w/ 1 level "probability": 1 NA
#  $ X.0.5.        : Factor w/ 1 level "0.5": 1 1

...

You might say, "but I defined them as character." True, but if you read the rbind documentation, you will see that

For cbind (rbind), vectors of zero length (including NULL) are ignored unless the result would have zero rows (columns), for S compatibility. (Zero-extent matrices do not occur in S3 and are not ignored in R.)
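A minimal sketch of how this plays out with data frames. As far as I know, the rbind documentation also states that the data frame method first drops all zero-row arguments, so the empty data frame's column names and types take no part in the result. The column names a and b below are hypothetical, chosen only to differ from the empty frame's names.

```r
# A zero-row data frame with the column names you intended
empty <- data.frame(topic = character(), days = character())

# A one-row data frame with different column names
row1 <- data.frame(a = "probability", b = "0.5")

# The zero-row argument is dropped first, so the names come from row1 only
res <- rbind(empty, row1)
print(names(res))
```

This is why the question's newdf ended up with names derived from its first row (X.probability., X.0.5.) instead of topic and days.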

Another property of rbind is that its data frame method inherits its defaults from data.frame, one of which was stringsAsFactors = TRUE (the default before R 4.0.0).

What happened here can easily be illustrated with a dummy example. Consider:

temp <- data.frame(A = letters[1:3])
str(temp)
## 'data.frame':    3 obs. of  1 variable:
## $ A: Factor w/ 3 levels "a","b","c": 1 2 3

temp$A[3] <- "d"
## Warning message:
## In `[<-.factor`(`*tmp*`, 3, value = c(1L, 2L, NA)) :
##   invalid factor level, NA generated

temp$A
## [1] a    b    <NA>
## Levels: a b c

You can see two things here:

1. data.frame automatically converted the character vector to a factor.
2. Assigning a value that is not an existing level of a factor converts it to NA and throws exactly the warning you were receiving.
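A minimal sketch of two ways around the "invalid factor level" warning in the dummy example: add the new level before assigning it, or convert the column to character first. stringsAsFactors = TRUE is passed explicitly so the factor column is reproduced on R 4.0.0 and later, where it is no longer the default.

```r
# Reproduce the factor column explicitly
temp <- data.frame(A = letters[1:3], stringsAsFactors = TRUE)

# Option 1: add the new level first, then assign it (no warning, no NA)
levels(temp$A) <- c(levels(temp$A), "d")
temp$A[3] <- "d"
print(temp$A)

# Option 2: convert the column to character and assign freely
temp2 <- data.frame(A = letters[1:3], stringsAsFactors = TRUE)
temp2$A <- as.character(temp2$A)
temp2$A[3] <- "d"
print(temp2$A)
```

Option 1 keeps the factor (with the now unused level "c"); option 2 is usually simpler when the values are free text.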

As mentioned by @akrun, setting options(stringsAsFactors=FALSE) will solve your problem.

answered Oct 26 '22 09:10

David Arenburg

Set options(stringsAsFactors=FALSE) and your code should work as expected. The warnings and NAs in the result are caused by the implicit conversion to factors and the type mismatch between the newdf columns and yy; see https://stackoverflow.com/a/1640729/1541036.

For a cleaner way to achieve the same result, here's a group-by solution using data.table:

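library(data.table)
events <- as.data.table(events)
# one row per topic, keeping date, days and name as grouping columns
events2 <- events[, list(topic=unlist(strsplit(topics, '|', fixed=TRUE))), by=c("date", "days", "name")]
# .N is the number of topics per event, so this column holds the days per topic
events2[, probability := days / .N, by=name]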
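For comparison, a minimal base-R sketch of the same group-by idea without data.table (a different technique from the answer): split all topic strings at once, count topics per event with lengths(), and use rep() to repeat each event's values. The events object is a hand-made subset of the question's data, and the column name daysPerTopic is an assumption (the answer calls it probability).

```r
# Hand-made subset of the question's `events`
events <- data.frame(
  name   = c("Intro to stats", "Stats Winter school", "TidyR tools"),
  days   = c(1, 6, 0.5),
  topics = c("probability|R", "R|regression|ggplot", "tidyR|dplyr"),
  stringsAsFactors = FALSE
)

parts <- strsplit(events$topics, "|", fixed = TRUE)  # list of character vectors
n <- lengths(parts)                                  # number of topics per event

events2 <- data.frame(
  name         = rep(events$name, n),                # repeat each name once per topic
  topic        = unlist(parts),
  daysPerTopic = rep(events$days / n, n),            # each event's share, repeated
  stringsAsFactors = FALSE
)
print(events2)
```

No loop and no repeated rbind: every column is built in one vectorized step.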

answered Oct 26 '22 10:10

ytsaig
