I have about 18 dataframes which are essentially frequency counts of the elements stored in the column Rptnames. They all have some different and some the same elements in the Rptnames columns so they look like this
dataframe called GroupedTableProportiondelAll
Rptname freq
bob 4324234
jane 433
ham 4324
tim 22
dataframe called GroupedTableProportiondelLUAD
Rptname freq
bob 987
jane 223
jonny 12
jim 98092
I am trying to set up a table so that the Rptname becomes the column and each row is the frequencies. This is so that I can combine all the dataframes.
I have tried the following
GroupedTableProportiondelAll_T <- as.data.frame(t(GroupedTableProportiondelAll))
GroupedTableProportiondelLUAD_T <- as.data.frame(t(GroupedTableProportiondelLUAD))
total <- rbind(GroupedTableProportiondelLUAD_T, GroupedTableProportiondelAll_T)
but I get the error Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match
So the question is
a) how can I do rbind (cbind would also do without transposing I suppose) so that the bind can happen without needing to match.
b) would merge be better here
c) in either is there a way to enter zero for empty values
d) P'raps there's a better way to do this like matrices which Im not really familiar with? I know its 4 questions but the central question's the same- how to bind when not all the rows or columns are matching
An alternative to the rbind
+ dcast
technique that would use the tidyverse.
Use pipes (%>%
) to first use bind_rows()
to bind all your dataframes together while simultaneously creating a dataframe id column (in this case I just called the variable "df"). Then use spread()
to move unique "Rptname" values to become column names and spreading the values of "freq" across the new columns. "Rptname" is the key and "freq" is the value in this case.
It would look like this:
Input:
GTP_A
Rptname freq
1 bob 4324234
2 jane 433
3 ham 4324
4 tim 22
GTP_LUAD
Rptname freq
1 bob 987
2 jane 223
3 jonny 12
4 jim 98092
Code:
GroupTable <- bind_rows(GTP_A,GTP_LUAD, .id = "df") %>%
spread(Rptname, freq)
Output:
GroupTable
df bob ham jane jim jonny tim
1 1 4324234 4324 433 NA NA 22
2 2 987 NA 223 98092 12 NA
UPDATE:
As of the release of tidyr 1.0.0 on 2019/09/13 spread()
and gather()
have been retired and replaced by pivot_wider()
and pivot_longer()
, respectively. From the release notes Hadley Wickem states "spread()
and gather()
won’t go away, but they’ve been retired which means that they’re no longer under active development."
In order to get the same output as above, you will now need to first arrange()
by Rptname then use pivot_wider()
. If you do not arrange first you will get a similar output but the column order will not be the same as the output from spread()
.
GroupTable <- bind_rows(GTP_A, GTP_LUAD, .id = "df") %>%
arrange(Rptname) %>%
pivot_wider(names_from = Rptname, values_from = freq)
You could first rbind
the dataframes after adding a column to identify the data.frame. Then use dcast
function from reshape2
package.
rpt1
## Rptname freq df
## 1 bob 4324234 rpt1
## 2 jane 433 rpt1
## 3 ham 4324 rpt1
## 4 tim 22 rpt1
rpt2
## Rptname freq df
## 1 bob 987 rpt2
## 2 jane 223 rpt2
## 3 jonny 12 rpt2
## 4 jim 98092 rpt2
rpt1$df <- "rpt1"
rpt2$df <- "rpt2"
rpt <- rbind(rpt1, rpt2)
dcast(data = rpt, df ~ Rptname, value.var = "freq")
## df bob ham jane tim jim jonny
## 1 rpt1 4324234 4324 433 22 NA NA
## 2 rpt2 987 NA 223 NA 98092 12
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With