Does anyone know how to speed up running the following command? I want to replace the numerical "month" values with a character string ... e.g. month 1 goes to "Jul".
This command is really slow because the data frame I am trying to run it on is enormous!
for (i in 1:length(CO2$month)){
if(CO2$month[i]=='1') {CO2$months[i]<-'Jul'} else
if(CO2$month[i]=='2') {CO2$months[i]<-'Aug'} else
if(CO2$month[i]=='3') {CO2$months[i]<-'Sept'} else
if(CO2$month[i]=='4') {CO2$months[i]<-'Oct'} else
if(CO2$month[i]=='5') {CO2$months[i]<-'Nov'} else
if(CO2$month[i]=='6') {CO2$months[i]<-'Dec'} else
if(CO2$month[i]=='7') {CO2$months[i]<-'Jan'} else
if(CO2$month[i]=='8') {CO2$months[i]<-'Feb'} else
if(CO2$month[i]=='9') {CO2$months[i]<-'Mar'} else
if(CO2$month[i]=='10') {CO2$months[i]<-'Apr'} else
if(CO2$month[i]=='11') {CO2$months[i]<-'May'} else
if(CO2$month[i]=='12') {CO2$months[i]<-'Jun'}
}
Loops are slower in R than in C++ because R is an interpreted language (not compiled). Since R 3.4 the just-in-time (JIT) byte-code compiler is enabled by default, which makes R loops faster, though still not as fast as compiled code. That said, R loops are not that bad if the number of iterations is modest (say, no more than about 100,000).
There is a lot of overhead in the processing because R needs to check the type of a variable nearly every time it looks at it. This makes it easy to change types and reuse variable names, but slows down computation for very repetitive tasks, like performing an action in a loop.
The apply functions (apply, sapply, lapply etc.) are at best marginally faster than a regular for loop, because they still call an R function once per element rather than doing all of the work in lower-level C code.
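To make the difference concrete, here is a rough sketch that compares an element-by-element loop with a single vectorized lookup. The data size and the names `n`, `month` and `labels` are made up for illustration, and the timings will vary from machine to machine.

```r
# Hypothetical data: one million month codes between 1 and 12
set.seed(1)
n <- 1e6
month <- sample(1:12, n, replace = TRUE)
labels <- c("Jul", "Aug", "Sept", "Oct", "Nov", "Dec",
            "Jan", "Feb", "Mar", "Apr", "May", "Jun")

# Element-by-element loop: one R-level subset and assignment per row
loop_result <- character(n)
loop_time <- system.time(
  for (i in seq_along(month)) loop_result[i] <- labels[month[i]]
)

# Vectorized lookup: a single indexing call, with the looping done internally
vec_time <- system.time(vec_result <- labels[month])

# Both approaches should give the same answer
print(identical(loop_result, vec_result))

# Elapsed seconds for each approach
print(c(loop = loop_time[["elapsed"]], vectorized = vec_time[["elapsed"]]))
```

The loop here is already much leaner than the if/else chain in the question, so the gap on the real data is likely to be larger.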
You can do it without a loop and without if-else by indexing a lookup vector (note that month.abb gives "Sep" rather than "Sept"):
set.seed(21)
CO2 <- data.frame(month=as.character(sample(1:12,24,TRUE)),
stringsAsFactors=FALSE)
MonthAbbRotated <- month.abb[c(7:12,1:6)]
CO2$months <- MonthAbbRotated[as.numeric(CO2$month)]
If your month column isn't really character, this is even easier:
set.seed(21)
CO2 <- data.frame(month=sample(1:12,24,TRUE))
MonthAbbRotated <- month.abb[c(7:12,1:6)]
CO2$months <- MonthAbbRotated[CO2$month]
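A variation on the same idea, in case the column holds character codes with gaps or bad values: a named lookup vector can be indexed by the strings directly, with no as.numeric() step. This is only a sketch; `month_lookup` and `month_chr` are hypothetical names, and the sample values are invented.

```r
# Named lookup vector: the names are the month codes as strings
month_lookup <- setNames(month.abb[c(7:12, 1:6)], as.character(1:12))

# Hypothetical character column with a missing and an out-of-range code
month_chr <- c("1", "12", "3", NA, "13")

# Character indexing matches on names; NA and unknown codes come back as NA
months <- unname(month_lookup[month_chr])
print(months)
```

Unmatched codes end up as NA instead of raising an error, so it may be worth checking anyNA(months) afterwards.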
I could be missing something, but why not just use a factor?
CO2$month <- factor(CO2$month, levels=1:12, labels=c("Jul","Aug","Sept","Oct","Nov","Dec","Jan","Feb","Mar","Apr","May","Jun"))
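As a small illustration of what the factor approach gives you, the sketch below uses a made-up vector `month_codes` instead of the real CO2 column. Because the levels are declared as 1:12, the labels and the original codes can both be recovered.

```r
# Hypothetical month codes, where 1 = Jul and 12 = Jun
month_codes <- c(1, 7, 12, 3)
month_labels <- c("Jul", "Aug", "Sept", "Oct", "Nov", "Dec",
                  "Jan", "Feb", "Mar", "Apr", "May", "Jun")

month_fac <- factor(month_codes, levels = 1:12, labels = month_labels)

print(as.character(month_fac))  # labels as plain strings
print(as.integer(month_fac))    # the original codes, since levels were 1:12
print(levels(month_fac)[1:3])   # level order follows the Jul-to-Jun year
```

One practical upside is that tables and plots built from the factor keep the Jul-to-Jun ordering instead of sorting the labels alphabetically.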
# Lookup table of labels, indexed by the numeric month code
month <- c("jul","aug","sep","oct","nov","dec","jan","feb","mar","apr","may","jun")
for (i in seq_along(CO2$month)) { CO2$month[i] <- month[as.integer(CO2$month[i])] }