splitting contents of dataframe column into different columns based on values

Question

I am trying to split the following dataframe column into 3 columns depending on what the contents are. I tried using dplyr and mutate because I wanted to learn them better, but any suggestions would be welcome.

exampledf<-data.frame(c("Argentina","2005/12","2005/11","Bolivia","2006/12"),stringsAsFactors=F)
mutate(exampledf,month=strsplit(exampledf[,1],"/")[1],month=strsplit(exampledf[,1],"/")[2])

My Goal:

Year     Month    Country
2005     12       Argentina
2005     11       Argentina
2006     12       Bolivia

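This is very close to this SO post, but it doesn't address my repeating-country issue: each country name appears only once, above the dates that belong to it, and has to be repeated on every one of those rows.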
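One reason the mutate() attempt above does not produce per-row pieces is that strsplit() returns a list with one character vector per input value. strsplit(...)[1] therefore picks out the complete split of the first value, not the first piece of every value. (The attempt also names both new columns month.) Below is a minimal base R sketch of taking the first or second piece from each element. x is a hypothetical vector holding only the date values from the example.

```r
x <- c("2005/12", "2005/11", "2006/12")

pieces <- strsplit(x, "/")   # a list: one character vector per element of x
str(pieces[1])               # pieces[1] is a list holding only c("2005", "12")

# take the first / second piece of every element instead
year  <- sapply(pieces, `[`, 1)
month <- sapply(pieces, `[`, 2)
year    # "2005" "2005" "2006"
month   # "12" "11" "12"
```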

akrun · Accepted Answer

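We create a logical index ('i1') that is TRUE for rows containing no digits (the country rows) and take its cumulative sum, so that each country and the dates below it share one group number. We split the dataset by that grouping index and, within each group, extract 'year' and 'month' from the remaining rows with (sub), take 'Country' from the first element, and build a data.frame. Finally we rbind the list of data.frames.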
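To make the grouping step concrete, here is a small sketch of the intermediate values on the example data (assuming the column is named Col1, as in the data block of this answer). grp_list is a hypothetical name used only for illustration.

```r
exampledf <- data.frame(Col1 = c("Argentina", "2005/12", "2005/11",
                                 "Bolivia", "2006/12"),
                        stringsAsFactors = FALSE)

i1 <- grepl('^[^0-9]+$', exampledf$Col1)  # TRUE where the value has no digits (a country)
i1           # TRUE FALSE FALSE TRUE FALSE
cumsum(i1)   # 1 1 1 2 2 : one group id per country block

grp_list <- split(exampledf$Col1, cumsum(i1))
grp_list     # $`1`: "Argentina" "2005/12" "2005/11"; $`2`: "Bolivia" "2006/12"

# within a block, sub() keeps the part before or after the "/"
sub('/.*', '', "2005/12")   # "2005"
sub('.*/', '', "2005/12")   # "12"
```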

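 # TRUE for rows with no digits, i.e. the country names
 i1 <- grepl('^[^0-9]+$', exampledf$Col1)
 # cumsum(i1) gives each country block its own group id
 lst <- lapply(split(exampledf, cumsum(i1)), function(x)
   data.frame(year    = as.numeric(sub('/.*', '', x[-1, 1])),  # part before "/"
              month   = as.numeric(sub('.*/', '', x[-1, 1])),  # part after "/"
              Country = x[1, 1],                               # first row of the block
              stringsAsFactors = FALSE))
 res <- do.call(rbind, lst)
 row.names(res) <- NULL

 res
 #   year month   Country
 # 1 2005    12 Argentina
 # 2 2005    11 Argentina
 # 3 2006    12   Bolivia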
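As an alternative sketch (not part of the original answer), the same index can carry each country name forward without splitting and rebinding. exampledf$Col1[i1] lists the countries in order, and indexing it by cumsum(i1) repeats each name down its block. This assumes the first row is a country (otherwise cumsum(i1) starts at 0 for the leading dates). country, dates and res2 are hypothetical names.

```r
exampledf <- data.frame(Col1 = c("Argentina", "2005/12", "2005/11",
                                 "Bolivia", "2006/12"),
                        stringsAsFactors = FALSE)
i1 <- grepl('^[^0-9]+$', exampledf$Col1)

country <- exampledf$Col1[i1][cumsum(i1)]  # country name repeated down each block
dates   <- exampledf$Col1[!i1]             # only the "YYYY/MM" values

res2 <- data.frame(year    = as.numeric(sub('/.*', '', dates)),
                   month   = as.numeric(sub('.*/', '', dates)),
                   Country = country[!i1],  # keep the names aligned with the date rows
                   stringsAsFactors = FALSE)
res2
```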

Or, using data.table: we convert the 'data.frame' to a 'data.table' (setDT(exampledf)) and group by the cumsum of the index from above. Within each group we split 'Col1', minus its first element, on the delimiter '/' with tstrsplit, which gives two columns, and append the first element as 'Country' to get three columns. Then we rename the first two columns with setnames. The grouping variable is not needed in the result, so it is removed by assigning it (:=) to NULL.

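library(data.table)
# group by the same index; per group, split the dates and attach the country
# (type.convert = TRUE turns year/month into integers instead of character)
res1 <- setDT(exampledf)[, c(tstrsplit(Col1[-1], '/', type.convert = TRUE),
                             Country = Col1[1L]),
                         by = .(i2 = cumsum(i1))][, i2 := NULL][]
setnames(res1, 1:2, c('year', 'month'))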
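To see why tstrsplit yields exactly two columns per group, here is a small sketch of what it returns for one country block. x, parts and conv are hypothetical names. Inside j, c(parts, Country = Col1[1L]) then adds a third element, and the single country name is recycled to the length of the block.

```r
library(data.table)

x <- c("2005/12", "2005/11")      # the dates of one country block

parts <- tstrsplit(x, "/")        # transposed split: one list element per piece position
parts[[1]]   # "2005" "2005"  (years)
parts[[2]]   # "12" "11"      (months)

conv <- tstrsplit(x, "/", type.convert = TRUE)
conv[[1]]    # 2005 2005, now integer rather than character
```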

data

 exampledf<-data.frame(Col1=c("Argentina","2005/12","2005/11",
          "Bolivia","2006/12"),stringsAsFactors=FALSE)
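Since the question mentions dplyr, here is a sketch of the same grouping idea written with dplyr verbs (mutate(), group_by(), first(), filter(), select()). It is not part of the accepted answer. It assumes the column is named Col1 as in the data above, that country names contain no digits, and that every country row comes before its dates. is_country, grp and res_dplyr are hypothetical names.

```r
library(dplyr)

exampledf <- data.frame(Col1 = c("Argentina", "2005/12", "2005/11",
                                 "Bolivia", "2006/12"),
                        stringsAsFactors = FALSE)

res_dplyr <- exampledf %>%
  mutate(is_country = !grepl("[0-9]", Col1),   # TRUE for rows holding a country name
         grp = cumsum(is_country)) %>%          # each country starts a new group
  group_by(grp) %>%
  mutate(Country = first(Col1)) %>%             # first row of each group is the country
  ungroup() %>%
  filter(!is_country) %>%                       # keep only the "YYYY/MM" rows
  mutate(Year  = as.numeric(sub("/.*", "", Col1)),
         Month = as.numeric(sub(".*/", "", Col1))) %>%
  select(Year, Month, Country)

res_dplyr
```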

Tags: r, dplyr

Asked by Rilcon42