Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

splitting contents of dataframe column into different columns based on values

Tags:

r

dplyr

I am trying to split the following dataframe column into 3 columns depending on what the contents are. I tried using dplyr and mutate because I wanted to learn them better, but any suggestions would be welcome.

exampledf<-data.frame(c("Argentina","2005/12","2005/11","Bolivia","2006/12"),stringsAsFactors=F)
mutate(exampledf,month=strsplit(exampledf[,1],"/")[1],month=strsplit(exampledf[,1],"/")[2])

My Goal:

Year     Month    Country
2005     12       Argentina
2005     11       Argentina
2006     12       Bolivia

This is very close to this SO post, but it doesnt address my repeating country issue.

like image 384
Rilcon42 Avatar asked Dec 08 '22 23:12

Rilcon42


1 Answers

We create a logical index for rows that have no numbers ('i1'), get the cumulative sum of that, split the dataset with that grouping index, extract the 'year', 'month' with (sub), and the 'Country' as the first element, create a data.frame, and rbind the list contents.

 i1 <- grepl('^[^0-9]+$', exampledf$Col1)
 lst <- lapply(split(exampledf, cumsum(i1)), function(x) 
   data.frame(year= as.numeric(sub('\\/.*', '',   x[-1,1])), 
              month = as.numeric(sub('.*\\/', '', x[-1,1])),
              Country = x[1,1] ) )
 res <- do.call(rbind, lst)
 row.names(res) <- NULL

 res
 # year month   Country
 #1 2005    12 Argentina
 #2 2005    11 Argentina
 #3 2006    12   Bolivia

Or using data.table, we convert the 'data.frame' to 'data.table' (setDT(exampledf)), grouped by the cumsum of the index (from above), we split ( tstrsplit) on the 'Col1' (removing the first element) with delimiter (/). We get two columns out of that. Then, concatenate the first element to create three columns and change the column names with setnames. If we don't need the grouping variable, it can be assigned (:=) to NULL.

library(data.table)
res1 <- setDT(exampledf)[, c(tstrsplit(Col1[-1], 
        '/'),Country = Col1[1L]), .(i2=cumsum(i1))][,i2:= NULL][]
setnames(res1, 1:2, c('year', 'month'))

data

 exampledf<-data.frame(Col1=c("Argentina","2005/12","2005/11",
          "Bolivia","2006/12"),stringsAsFactors=FALSE)
like image 66
akrun Avatar answered Mar 22 '23 23:03

akrun