Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create New Data Frame with Column Names from Unique Values in another Data Frame and Corresponding Values Assigned to Column

Tags:

r

reshape

I'm new to R, and I'm pretty sure this is something simple to accomplish, but I cannot figure out how to perform this action. I've tried the split function, utilizing a for loop, but cannot quite figure out how to get it right. As an example, this is what my original data frame looks like:

dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)), col2 = c(1, 3, 2, 4, 7, 8, 9))

 col1 col2
  red    1
  red    3
  red    2
  red    4
 blue    7
 blue    8
 blue    9

I want to create new columns for each unique value in col1 and assign it's corressponding value in col2 to the new data frame. And this is how I want my new data frame:

red  blue
 1       7
 3       8
 2       9
 4      NA

I've gotten close with a list structure close to what I wanted, but I need a data frame to boxplot and dotplot the results. Any help would be appriciated. Thanks!

like image 326
James Marquez Avatar asked Jan 07 '23 16:01

James Marquez


2 Answers

I'm sure there's a more efficient solution, but here's one option

dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)), col2 = c(1, 3, 2, 4, 7, 8, 9))
dat

  col1 col2
1  red    1
2  red    3
3  red    2
4  red    4
5 blue    7
6 blue    8
7 blue    9    

ust <- unstack(dat, form = col2 ~ col1)
res <- data.frame(sapply(ust, '[', 1:max(unlist(lapply(ust, length)))))
res
  blue red
1    7   1
2    8   3
3    9   2
4   NA   4

Edit: If you want the column order red then blue

res[, c("red", "blue")]
  red blue
1   1    7
2   3    8
3   2    9
4   4   NA
like image 157
Whitebeard Avatar answered Jan 30 '23 12:01

Whitebeard


Here's an Hadleyverse possible solution

library(tidyr)
library(dplyr)
dat %>%
  group_by(col1) %>%
  mutate(n = row_number()) %>%
  spread(col1, col2)
# Source: local data frame [4 x 3]
# 
#   n blue red
# 1 1    7   1
# 2 2    8   3
# 3 3    9   2
# 4 4   NA   4

Or using data.table

library(data.table)
dcast(setDT(dat)[, indx := 1:.N, by = col1], indx ~ col1, value.var = "col2")
#    indx blue red
# 1:    1    7   1
# 2:    2    8   3
# 3:    3    9   2
# 4:    4   NA   4
like image 25
David Arenburg Avatar answered Jan 30 '23 12:01

David Arenburg