
Split vector into dataframe at every nth element

Tags:

r

I would like to perform the following task in R. Here is a character vector:

a <- c("a", "1", "2", "3", "b", "5", "6", "7", "c", "8", "9", "11")

I want to convert a into a data frame that looks like this:

a 1 2 3
b 5 6 7
c 8 9 11

phage asked Sep 07 '17 12:09


3 Answers

We can use matrix() to fill the vector row-wise into four columns, then convert the result to a data frame:

as.data.frame(matrix(a, ncol = 4,  byrow = TRUE), stringsAsFactors = FALSE)
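
To generalize this to "every nth element", the call can be wrapped in a small function. This is only a sketch: the name split_every and the length check are assumptions for illustration, not part of the original answer. It also shows type.convert() turning the all-character columns into numbers where possible.

```r
# Hypothetical helper: reshape a vector into a data frame with n columns.
split_every <- function(x, n) {
  # Refuse partial rows instead of letting matrix() recycle values.
  stopifnot(length(x) %% n == 0)
  as.data.frame(matrix(x, ncol = n, byrow = TRUE), stringsAsFactors = FALSE)
}

a <- c("a", "1", "2", "3", "b", "5", "6", "7", "c", "8", "9", "11")
df <- split_every(a, 4)

# Every column starts out as character; re-guess each column's type.
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

The explicit length check is a design choice: matrix() alone only warns when the vector does not divide evenly into rows.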

Based on the OP's initial post, the data may instead be a single string. In that case, insert a newline after every fourth element, strip the semicolons, and read the result with fread:

a <- "a; 1; 2; 3; b; 5; 6; 7; c; 8; 9; 11"
library(data.table)
fread(gsub(";", "",  gsub("((\\S+\\s+){3}\\S+)(\\s)", "\\1\n ", a, perl = TRUE)))
#    V1 V2 V3 V4
#1:  a  1  2  3
#2:  b  5  6  7
#3:  c  8  9 11
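
If data.table is not available, the same single-string input can probably be handled in base R by splitting on the separator first and then reusing the matrix approach. This is an alternative sketch, not the method used above; it assumes the separator is always a semicolon followed by optional whitespace.

```r
a <- "a; 1; 2; 3; b; 5; 6; 7; c; 8; 9; 11"

# Split on ";" plus any trailing whitespace to get the individual elements.
parts <- strsplit(a, ";\\s*")[[1]]

# Fill row-wise into four columns, as with the character vector.
df <- as.data.frame(matrix(parts, ncol = 4, byrow = TRUE),
                    stringsAsFactors = FALSE)
df
```

Unlike fread, this keeps every column as character, so numeric columns would still need converting.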

akrun answered Nov 13 '22 00:11


First build a numeric matrix from the non-label elements, set the labels as its row names, and then convert it to a data frame.

a <- c("a", "1", "2", "3", "b", "5", "6", "7", "c", "8", "9", "11")
idx <- seq(1, length(a), by = 4)  # positions of the row labels: 1, 5, 9
foo <- matrix(as.numeric(a[-idx]), nrow = length(idx), byrow = TRUE)
rownames(foo) <- a[idx]
data.frame(foo)

  X1 X2 X3
a  1  2  3
b  5  6  7
c  8  9 11

pogibas answered Nov 13 '22 00:11


A word of caution, in addition to the existing answers, for tidyverse users who (like me) reach for pipes by default: converting a vector to a data frame in a single pipe operation can be slightly tricky. Consider the following behaviors:

library(magrittr)  # provides %>%
library(tibble)    # provides as_tibble()

a <- seq(4)

a %>% 
  matrix(., ncol = 2,  byrow = TRUE)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

a %>%
  as.data.frame(matrix(., ncol = 2,  byrow = TRUE))
##   .
## 1 1
## 2 2
## 3 3
## 4 4

## Warning message:
## In as.data.frame.integer(., matrix(., ncol = 2, byrow = TRUE)) :
##   'row.names' is not a character vector of length 4 -- omitting it. Will be an error!

a %>%
  as.data.frame(x = matrix(., ncol = 2,  byrow = TRUE))

##   V1 V2
## 1  1  2
## 2  3  4

a %>%
  as_tibble(matrix(., ncol = 2,  byrow = TRUE))
## # A tibble: 4 x 1
##   value
##   <int>
## 1     1
## 2     2
## 3     3
## 4     4

a %>%
  as_tibble(x = matrix(., ncol = 2,  byrow = TRUE))
## Error in .name_repair != name_repair : 
##   comparison (2) is possible only for atomic and list types

Hence, what works is naming the x argument explicitly:

a %>%
  as.data.frame(x = matrix(., ncol = 2,  byrow = TRUE))

I still need to dig deeper into why this is the case, though.
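
One plausible explanation, going by magrittr's documented placeholder rules: when the dot appears only inside a nested call, the left-hand side is still inserted as the first argument, so as.data.frame(matrix(., ...)) is evaluated as as.data.frame(., matrix(., ...)). Naming x = appears to work only because the piped vector then falls into row.names, which is dropped when its length does not match the number of rows. A minimal sketch of two alternatives that avoid relying on that, assuming magrittr is installed:

```r
library(magrittr)

a <- seq(4)

# Option 1: one function per pipe step, so the dot is never nested.
df1 <- a %>%
  matrix(ncol = 2, byrow = TRUE) %>%
  as.data.frame()

# Option 2: braces stop magrittr from also inserting `a` as the first argument.
df2 <- a %>%
  { as.data.frame(matrix(., ncol = 2, byrow = TRUE)) }

identical(df1, df2)
```

The same two patterns should carry over to as_tibble(), though that is not checked here.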

Kim answered Nov 13 '22 00:11