Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reshape data frame to convert factors into columns in R

Tags:

r

reshape

factors

I have a data frame where one particular column has a set of specific values (let's say, 1, 2, ..., 23). What I would like to do is to convert from this layout to the one, where the frame would have extra 23 (in this case) columns, each one representing one of the factor values. The data in these columns would be booleans indicating whether a particular row had a given factor value... To show a specific example:

Source frame:

ID       DATE         SECTOR
123      2008-01-01   1
456      2008-01-01   3
789      2008-01-02   5
... <more records with SECTOR values from 1 to 5>

Desired format:

ID       DATE         SECTOR.1   SECTOR.2   SECTOR.3   SECTOR.4   SECTOR.5
123      2008-01-01      T          F          F          F          F
456      2008-01-01      F          F          T          F          F
789      2008-01-02      F          F          F          F          T

I have no problem doing it in a loop but I hoped there would be a better way. So far reshape() didn't yield the desired result. Help would be much appreciated.

like image 354
Alexander L. Belikoff Avatar asked Mar 08 '10 19:03

Alexander L. Belikoff


2 Answers

I would try to bind another column called "value" and set value = TRUE.

df <- data.frame(cbind(1:10, 2:11, 1:3))
colnames(df) <- c("ID","DATE","SECTOR")
df <- data.frame(df, value=TRUE)

Then do a reshape:

reshape(df, idvar=c("ID","DATE"), timevar="SECTOR", direction="wide")

The problem with using the reshape function is that the default for missing values is NA (in which case you will have to iterate and replace them with FALSE).

Otherwise you can use cast out of the reshape package (see this question for an example), and set the default to FALSE.

df.wide <- cast(df, ID + DATE ~ SECTOR, fill=FALSE)
> df.wide 
   ID DATE     1     2     3
1   1    2  TRUE FALSE FALSE
2   2    3 FALSE  TRUE FALSE
3   3    4 FALSE FALSE  TRUE
4   4    5  TRUE FALSE FALSE
5   5    6 FALSE  TRUE FALSE
6   6    7 FALSE FALSE  TRUE
7   7    8  TRUE FALSE FALSE
8   8    9 FALSE  TRUE FALSE
9   9   10 FALSE FALSE  TRUE
10 10   11  TRUE FALSE FALSE
like image 170
Shane Avatar answered Oct 17 '22 22:10

Shane


Here's another approach using xtabs which may or may not be faster (if someone would try and let me know):

df <- data.frame(cbind(1:12, 2:13, 1:3))
colnames(df) <- c("ID","DATE","SECTOR")
foo <- xtabs(~ paste(ID, DATE) + SECTOR, df)
cbind(t(matrix(as.numeric(unlist(strsplit(rownames(foo), " "))), nrow=2)), foo)
like image 35
Jonathan Chang Avatar answered Oct 17 '22 22:10

Jonathan Chang