I have about 18 dataframes which are essentially frequency counts of the elements stored in the column Rptnames. They all have some different and some the same elements in the Rptnames columns so they look like this
dataframe called GroupedTableProportiondelAll
Rptname    freq
bob         4324234
jane        433
ham        4324
tim         22
dataframe called GroupedTableProportiondelLUAD
Rptname     freq
bob          987
jane         223
jonny        12
jim          98092
I am trying to set up a table so that the Rptname becomes the column and each row is the frequencies. This is so that I can combine all the dataframes.
I have tried the following
   GroupedTableProportiondelAll_T <- as.data.frame(t(GroupedTableProportiondelAll))
    GroupedTableProportiondelLUAD_T <- as.data.frame(t(GroupedTableProportiondelLUAD))
   total <- rbind(GroupedTableProportiondelLUAD_T, GroupedTableProportiondelAll_T)
but I get the error Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match
So the question is 
a) how can I do rbind (cbind would also do without transposing I suppose) so that the bind can happen without needing to match.
b) would merge be better here 
c) in either is there a way to enter zero for empty values 
d) P'raps there's a better way to do this like matrices which Im not really familiar with? I know its 4 questions but the central question's the same- how to bind when not all the rows or columns are matching
An alternative to the rbind + dcast technique that would use the tidyverse.
Use pipes (%>%) to first use bind_rows() to bind all your dataframes together while simultaneously creating a dataframe id column (in this case I just called the variable "df"). Then use spread() to move unique "Rptname" values to become column names and spreading the values of "freq" across the new columns. "Rptname" is the key and "freq" is the value in this case.
It would look like this:
Input:
GTP_A
Rptname    freq
1     bob 4324234
2    jane     433
3     ham    4324
4     tim      22
GTP_LUAD
Rptname  freq
1     bob   987
2    jane   223
3   jonny    12
4     jim 98092
Code:
GroupTable <- bind_rows(GTP_A,GTP_LUAD, .id = "df") %>%
  spread(Rptname, freq)
Output:
GroupTable
df     bob  ham jane   jim jonny tim
1  1 4324234 4324  433    NA    NA  22
2  2     987   NA  223 98092    12  NA
UPDATE:
As of the release of tidyr 1.0.0 on 2019/09/13 spread() and gather() have been retired and replaced by pivot_wider() and pivot_longer(), respectively. From the release notes Hadley Wickem states "spread() and gather() won’t go away, but they’ve been retired which means that they’re no longer under active development."
In order to get the same output as above, you will now need to first arrange() by Rptname then use pivot_wider(). If you do not arrange first you will get a similar output but the column order will not be the same as the output from spread().
GroupTable <- bind_rows(GTP_A, GTP_LUAD, .id = "df") %>%
  arrange(Rptname) %>% 
  pivot_wider(names_from = Rptname, values_from = freq)
                        You could first rbind the dataframes after adding a column to identify the data.frame. Then use dcast function from reshape2 package.
rpt1
##   Rptname    freq   df
## 1     bob 4324234 rpt1
## 2    jane     433 rpt1
## 3     ham    4324 rpt1
## 4     tim      22 rpt1
rpt2
##   Rptname  freq   df
## 1     bob   987 rpt2
## 2    jane   223 rpt2
## 3   jonny    12 rpt2
## 4     jim 98092 rpt2
rpt1$df <- "rpt1"
rpt2$df <- "rpt2"
rpt <- rbind(rpt1, rpt2)
dcast(data = rpt, df ~ Rptname, value.var = "freq")
##     df     bob  ham jane tim   jim jonny
## 1 rpt1 4324234 4324  433  22    NA    NA
## 2 rpt2     987   NA  223  NA 98092    12
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With