I am trying to split the following dataframe column into 3 columns depending on what the contents are. I tried using dplyr and mutate because I wanted to learn them better, but any suggestions would be welcome.
exampledf<-data.frame(c("Argentina","2005/12","2005/11","Bolivia","2006/12"),stringsAsFactors=F)
mutate(exampledf,month=strsplit(exampledf[,1],"/")[1],month=strsplit(exampledf[,1],"/")[2])
My Goal:
Year Month Country
2005 12 Argentina
2005 11 Argentina
2006 12 Bolivia
This is very close to this SO post, but it doesnt address my repeating country issue.
We create a logical index for rows that have no numbers ('i1'), get the cumulative sum of that, split
the dataset with that grouping index, extract the 'year', 'month' with (sub
), and the 'Country' as the first element, create a data.frame
, and rbind
the list
contents.
i1 <- grepl('^[^0-9]+$', exampledf$Col1)
lst <- lapply(split(exampledf, cumsum(i1)), function(x)
data.frame(year= as.numeric(sub('\\/.*', '', x[-1,1])),
month = as.numeric(sub('.*\\/', '', x[-1,1])),
Country = x[1,1] ) )
res <- do.call(rbind, lst)
row.names(res) <- NULL
res
# year month Country
#1 2005 12 Argentina
#2 2005 11 Argentina
#3 2006 12 Bolivia
Or using data.table
, we convert the 'data.frame' to 'data.table' (setDT(exampledf)
), grouped by the cumsum
of the index (from above), we split ( tstrsplit
) on the 'Col1' (removing the first element) with delimiter (/
). We get two columns out of that. Then, concatenate the first element to create three columns and change the column names with setnames
. If we don't need the grouping variable, it can be assigned (:=
) to NULL.
library(data.table)
res1 <- setDT(exampledf)[, c(tstrsplit(Col1[-1],
'/'),Country = Col1[1L]), .(i2=cumsum(i1))][,i2:= NULL][]
setnames(res1, 1:2, c('year', 'month'))
exampledf<-data.frame(Col1=c("Argentina","2005/12","2005/11",
"Bolivia","2006/12"),stringsAsFactors=FALSE)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With