
Split variable on every other row to form two new columns in data.frame

Tags: r, dplyr

After scraping a PDF, I have a data frame with a single character column, text:

df = data.frame(text = c("abc","def","abc","def"))

My question is how to turn it into:

df = data.frame(text1 = c("abc","abc"),text2=c("def","def"))

I can index the rows and rebuild a new df manually, but I was curious whether it could be done within a dplyr pipe.

All the solutions I have found split the contents of each row; none of them move whole rows of a variable into new columns.

asked Dec 19 '18 02:12 by David Lucey


2 Answers

Using dplyr, you could create a grouping column (ind) that alternates between 1 and 2 on successive rows. Then group_by ind and add a sequence column (id) that numbers the rows within each group, so that spread has a key for placing the data into two columns.

library(dplyr)
library(tidyr)

df %>%
  mutate(ind = rep(c(1, 2),length.out = n())) %>%
  group_by(ind) %>%
  mutate(id = row_number()) %>%
  spread(ind, text) %>%
  select(-id)


#   `1`   `2`  
#  <fct> <fct>
#1 abc   def  
#2 abc   def  
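
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same idea with that function, not part of the original answer. It assumes a recent tidyr, and it labels the alternating rows "text1" and "text2" so the result gets the column names asked for in the question; the integer-division trick for id is just one way to number the row pairs.

```r
library(dplyr)
library(tidyr)

df <- data.frame(text = c("abc", "def", "abc", "def"))

out <- df %>%
  mutate(
    # Target column for each row: text1, text2, text1, text2, ...
    ind = rep(c("text1", "text2"), length.out = n()),
    # Pair index shared by each consecutive pair of rows: 1, 1, 2, 2, ...
    id = (row_number() + 1) %/% 2
  ) %>%
  # One output row per id, one output column per value of ind
  pivot_wider(names_from = ind, values_from = text) %>%
  select(-id)

out
```

Because id is computed directly from the row number, no group_by step is needed here.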

A base R option is to split df into two data frames of alternating rows, using a grouping sequence built with rep, and then cbind them together to form a two-column data frame.

do.call("cbind", split(df, rep(c(1, 2), length.out = nrow(df))))

#  text text
#1  abc  def
#3  abc  def
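
Note that the result above has two columns both named text and keeps the row names of the first piece (1 and 3). A small sketch of one way to tidy that up, assuming the target names text1 and text2 from the question:

```r
df <- data.frame(text = c("abc", "def", "abc", "def"))

# Split into odd and even rows, then bind the two pieces side by side
parts <- split(df, rep(c(1, 2), length.out = nrow(df)))
out <- do.call("cbind", parts)

# Give the columns distinct names and reset the row names to 1..n
out <- setNames(out, c("text1", "text2"))
rownames(out) <- NULL

out
```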
answered Oct 04 '22 13:10 by Ronak Shah


We could do this in base R. Rearrange the column into a matrix filled row by row (byrow = TRUE), then convert it to a data frame with as.data.frame. Since the number of columns is fixed at 2, pass that value as ncol.

as.data.frame(matrix(df$text, ncol = 2, byrow = TRUE, 
      dimnames = list(NULL, c('text1', 'text2'))))
#   text1 text2
#1   abc   def
#2   abc   def
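
If the scraped column has an odd number of rows, matrix() recycles values to fill the last row (typically with only a warning), which can silently misalign the pairs. The sketch below wraps the matrix approach in a helper that checks the length first. The name pair_rows and its arguments are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: reshape a vector of alternating values into two columns
pair_rows <- function(x, names = c("text1", "text2")) {
  x <- as.character(x)             # works for both factor and character input
  stopifnot(length(x) %% 2 == 0)   # refuse input that cannot be paired up
  m <- matrix(x, ncol = 2, byrow = TRUE, dimnames = list(NULL, names))
  as.data.frame(m, stringsAsFactors = FALSE)
}

df <- data.frame(text = c("abc", "def", "abc", "def"))
out <- pair_rows(df$text)
out
```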

Another option is unstack from base R, after creating a column of alternating ids (transform recycles the length-2 vector across all rows):

unstack(transform(df, val = paste0('text', 1:2)), text ~ val)
#    text1 text2
#1   abc   def
#2   abc   def

Or we can split the column into a list of vectors and then cbind them together:

as.data.frame(do.call(cbind, split(as.character(df$text), 1:2)))
#   1   2
#1 abc def
#2 abc def

Or another option is dcast from data.table

library(data.table)
dcast(setDT(df), rowid(text)~ text)[, text := NULL][]

data

df <- data.frame(text = c("abc","def","abc","def"))
answered Oct 04 '22 13:10 by akrun