I have a data frame which is the result of another command. This data frame has only one row with around 40,000 entries. My problem is that every three columns form one connected set of data. Therefore I want to split the row after every third column and move each group of three into a new row. Example:
Create a test data frame:
df <- as.data.frame(matrix(1:12, ncol = 12, nrow = 1))
Now I have a data frame that looks like this:
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1  1  2  3  4  5  6  7  8  9  10  11  12
But I need it like this:
V1 V2 V3
 1  2  3
 4  5  6
 7  8  9
10 11 12
How can I achieve this?
Try
as.data.frame(matrix(unlist(df, use.names = FALSE), ncol = 3, byrow = TRUE))
# V1 V2 V3
#1 1 2 3
#2 4 5 6
#3 7 8 9
#4 10 11 12
Or you could directly use matrix on df:
as.data.frame(matrix(df, ncol = 3, byrow = TRUE))
Could also try using dim<- (just for general knowledge):
as.data.frame(t(`dim<-`(unlist(df), c(3, 4))))
# V1 V2 V3
# 1 1 2 3
# 2 4 5 6
# 3 7 8 9
# 4 10 11 12
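As a quick sketch of what the dim<- step is doing here (using the same 12-entry df from the question): unlist(df) gives the plain vector 1:12, assigning dim = c(3, 4) fills it column-wise into a 3 x 4 matrix, and transposing that gives the 4 x 3 layout shown above.
x <- unlist(df, use.names = FALSE)  # the plain vector 1 2 3 ... 12
dim(x) <- c(3, 4)                   # filled column-wise: columns are 1:3, 4:6, 7:9, 10:12
x
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12
t(x)                                # transposing recovers the desired 4 x 3 result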
This turned out to be faster than I expected (though still not as fast as the obvious approach that @akrun took), so I'm going to post this (like David) "just for general knowledge". (Plus, "data.table" all the things.) :-)
Create a data.table with three columns:
1. the unlisted values from the input data.frame,
2. a grouping variable that says which output row each value belongs to, and
3. a column index that cycles through 1:3.
Once you have that, you can use dcast.data.table to get the output you mention (plus a bonus column).
For point number 2 above, we can easily define a function like the following to make the process of creating groups easy:
groupMaker <- function(vecLen, perGroup) {
  (0:(vecLen - 1) %/% perGroup) + 1
}
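As a quick sanity check on the 12-column example df, groupMaker produces one group number per input column, repeating each number perGroup times:
groupMaker(12, 3)
# [1] 1 1 1 2 2 2 3 3 3 4 4 4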
Then we can use it as follows:
dcast.data.table(
  data.table(value = unlist(df, use.names = FALSE),
             row = groupMaker(ncol(df), 3),
             col = 1:3),
  row ~ col)
# row 1 2 3
# 1: 1 1 2 3
# 2: 2 4 5 6
# 3: 3 7 8 9
# 4: 4 10 11 12
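For reference, this is the intermediate long-format data.table that gets reshaped (just the inner data.table(...) call from the code above, run on its own for the 12-column df):
library(data.table)
data.table(value = unlist(df, use.names = FALSE),
           row = groupMaker(ncol(df), 3),
           col = 1:3)
#     value row col
#  1:     1   1   1
#  2:     2   1   2
#  3:     3   1   3
#  4:     4   2   1
#  5:     5   2   2
#  6:     6   2   3
#  7:     7   3   1
#  8:     8   3   2
#  9:     9   3   3
# 10:    10   4   1
# 11:    11   4   2
# 12:    12   4   3
Each group of three consecutive values shares a row number, and dcast.data.table then spreads the col index out across the columns.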
Now, you mention that you are actually dealing with a single-row ~ 40K column data.frame
(I'll assume it to be 39,999 columns since that's nicely divisible by 3 and I don't want to break the other answers).
Keeping that in mind, here are some (useless) benchmarks (useless because we're talking milliseconds here, really).
set.seed(1)
S <- sample(20, 39999, TRUE)
S <- data.frame(t(S))
funAM <- function(indf) {
  dcast.data.table(
    data.table(value = unlist(indf, use.names = FALSE),
               row = groupMaker(ncol(indf), 3),
               col = 1:3),
    row ~ col)
}

funDA <- function(indf) {
  as.data.frame(t(`dim<-`(unlist(indf), c(3, ncol(indf)/3))))
}

funAK <- function(indf) as.data.frame(matrix(indf, ncol = 3, byrow = TRUE))
library(microbenchmark)
microbenchmark(funAM(S), funDA(S), funAK(S))
# Unit: milliseconds
#      expr       min        lq      mean    median        uq      max neval
#  funAM(S) 18.487001 18.813297 22.105766 18.999891 19.455812 50.25876   100
#  funDA(S) 37.187177 37.450893 40.393893 37.870683 38.869726 94.20128   100
#  funAK(S)  5.018571  5.149758  5.929944  5.271679  5.536449 26.93281   100
Where this might be useful would be in cases where the number of desired columns and your number of input columns are not nicely divisible by each other.
For example, try the following sample data:
set.seed(1)
S2 <- sample(20, 40000, TRUE)
S2 <- data.frame(t(S2))
With this sample data:
funAM would give you a warning, but would correctly give you the last two columns of the last row as NA.
funAK would give you a warning, but would (presumably) incorrectly recycle values in the last row.
funDA would just give you an error.
I still think you should just fix the problem at the source though :-)