Find frequencies of combinations where the data.frame needs to be parsed

Question

I'm sure there's a simple solution to this, but I can't figure it out. Suppose I have a data frame that has the following information:

aaa<-c("A,B","B,C","B,D,E")
vvv<-c("101","101,102","102,103,104")
data_h<-data.frame(aaa,vvv)
data_h
    aaa         vvv
1   A,B         101
2   B,C     101,102
3 B,D,E 102,103,104

The desired output is a frequency table of individual hits, for subsequent analysis in a heat map:

  101   102   103   104
A  1     0     0     0
B  2     2     1     1
C  1     1     0     0
D  0     1     1     1
E  0     1     1     1

How do I make this transformation? I've seen many similar examples, but none where the contents of the data frame need to be parsed.

The goal is to ultimately use heatmap or something similar on the output table to visualize the correlation between "aaa" and "vvv".

G. Grothendieck · Accepted Answer

Here is a base R solution in 4 lines of code. First we define a function, spl, which splits a comma-separated string into a vector of its fields. Next, eg takes two string arguments, applies spl to each of them, and uses expand.grid to build every combination of the resulting fields. Finally we apply eg to each row of data_h, rbind the results together and tabulate them with xtabs:

spl <- function(x) strsplit(as.character(x), ",")[[1]]
eg <- function(aaa, vvv) expand.grid(aaa = spl(aaa), vvv = spl(vvv))
dd <- do.call("rbind", Map(eg, data_h$aaa, data_h$vvv))
xtabs(data = dd)

The result is:

   vvv
aaa 101 102 103 104
  A   1   0   0   0
  B   2   2   1   1
  C   1   1   0   0
  D   0   1   1   1
  E   0   1   1   1
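To see what eg contributes for a single row, here is a minimal sketch that applies it to the second row of data_h. The spl and eg definitions are repeated so the snippet runs on its own; the name pairs_row2 is only an illustrative choice, not part of the answer.

```r
# Same helper definitions as in the answer, repeated so this runs standalone
spl <- function(x) strsplit(as.character(x), ",")[[1]]
eg <- function(aaa, vvv) expand.grid(aaa = spl(aaa), vvv = spl(vvv))

# Second row of data_h: every letter is crossed with every number
# (pairs_row2 is a hypothetical name used only for this illustration)
pairs_row2 <- eg("B,C", "101,102")
print(pairs_row2)  # 4 rows: (B,101), (C,101), (B,102), (C,102)
```

Stacking these per-row grids with rbind gives one row per (aaa, vvv) hit, which is why a plain cross-tabulation of dd yields the desired counts.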

dcast. Alternatively, replace the last line of code above (the one with xtabs) with:

library(reshape2)
dcast(dd, aaa ~ vvv, fun.aggregate = length, value.var = "vvv")

in which case the result is:

  aaa 101 102 103 104
1   A   1   0   0   0
2   B   2   2   1   1
3   C   1   1   0   0
4   D   0   1   1   1
5   E   0   1   1   1

tapply. Another alternative would be tapply (however, it will fill in empty cells with NA rather than 0):

tapply(1:nrow(dd), dd, length)
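If zeros are wanted instead of NA, one possible follow-up is to overwrite the missing cells after the tapply call. This is a sketch under the assumption that dd is built exactly as above (the setup is repeated so the snippet is self-contained); counts is a hypothetical name.

```r
# Rebuild dd as in the answer so the snippet is self-contained
aaa <- c("A,B", "B,C", "B,D,E")
vvv <- c("101", "101,102", "102,103,104")
data_h <- data.frame(aaa, vvv)
spl <- function(x) strsplit(as.character(x), ",")[[1]]
eg <- function(aaa, vvv) expand.grid(aaa = spl(aaa), vvv = spl(vvv))
dd <- do.call("rbind", Map(eg, data_h$aaa, data_h$vvv))

# tapply leaves NA where a combination never occurs; replace those with 0
# (counts is a hypothetical name used only for this illustration)
counts <- tapply(seq_len(nrow(dd)), dd, length)
counts[is.na(counts)] <- 0L
print(counts)
```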

ADDED: alternatives and some improvements.

agstudy · Answer

The shape of the data.frame suggests using the splitstackshape package. I don't know this package very well, so I use it only to reshape the data and then compute the frequencies by hand with table:

library(splitstackshape)
data_h_split <- concat.split.multiple(data_h,1:2)

# aaa_1 aaa_2 aaa_3 vvv_1 vvv_2 vvv_3
# 1     A     B  <NA>   101    NA    NA
# 2     B     C  <NA>   101   102    NA
# 3     B     D     E   102   103   104

Once the data is in this format (no commas, regular columns), it is easy to compute frequencies using table (you could also use tapply or reshape):

table(cbind.data.frame(ff= unlist(data_h_split[1:3]),
                       xx= unlist(data_h_split[4:6])))
   xx
ff  101 102 103 104
  A   1   0   0   0
  B   1   1   0   0
  C   0   1   0   0
  D   0   0   1   0
      0   0   0   0
  E   0   0   0   1

Ananda's edit

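The table above pairs the split values by position (aaa_1 with vvv_1, and so on) rather than crossing them, so it does not match the desired output. Here's a multi-step approach that gets "splitstackshape" to produce the desired result.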

library(splitstackshape)

## Split the "vvv" column first, and reshape at the same time
x <- concat.split.multiple(data_h, split.cols="vvv", ",", "long")

## Add an ID column
x$id <- 1:nrow(x)

## Split the "aaa" column next, again reshaping as we do so
x <- concat.split.multiple(x[complete.cases(x), ], split.cols="aaa", ",", "long")

## Use `table` with `droplevels`
with(droplevels(x), table(aaa, vvv))
#    vvv
# aaa 101 102 103 104
#   A   1   0   0   0
#   B   2   2   1   1
#   C   1   1   0   0
#   D   0   1   1   1
#   E   0   1   1   1
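Since the stated goal is a heat map, here is a rough base R sketch of how such a frequency table could be passed to heatmap. It rebuilds the table with the base R approach from the accepted answer so that it needs no packages. The names freq and freq_mat and the choice to turn off clustering and scaling are assumptions made for illustration, not part of either answer.

```r
# Rebuild the frequency table with base R so the snippet is self-contained
aaa <- c("A,B", "B,C", "B,D,E")
vvv <- c("101", "101,102", "102,103,104")
data_h <- data.frame(aaa, vvv)
spl <- function(x) strsplit(as.character(x), ",")[[1]]
eg <- function(aaa, vvv) expand.grid(aaa = spl(aaa), vvv = spl(vvv))
dd <- do.call("rbind", Map(eg, data_h$aaa, data_h$vvv))
freq <- xtabs(~ aaa + vvv, data = dd)

# Convert to a plain numeric matrix, keeping the row and column labels
# (freq and freq_mat are hypothetical names used only for this illustration)
freq_mat <- matrix(as.numeric(freq), nrow = nrow(freq),
                   dimnames = dimnames(freq))

# Rowv = NA and Colv = NA keep the original ordering (no dendrograms);
# scale = "none" shows the raw counts instead of row-standardized values
heatmap(freq_mat, Rowv = NA, Colv = NA, scale = "none",
        xlab = "vvv", ylab = "aaa")
```

Dropping Rowv = NA and Colv = NA would let heatmap cluster and reorder the rows and columns, which may or may not be wanted for data this small.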

Tags: dataframe, r, frequency, heatmap

Amit Kohli
