Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to rbind when only some of the columns match

Tags:

r

I have about 18 dataframes which are essentially frequency counts of the elements stored in the column Rptnames. They all have some different and some the same elements in the Rptnames columns so they look like this

dataframe called GroupedTableProportiondelAll

Rptname    freq

bob         4324234
jane        433
ham        4324
tim         22

dataframe called GroupedTableProportiondelLUAD

Rptname     freq

bob          987
jane         223
jonny        12
jim          98092

I am trying to set up a table so that the Rptname becomes the column and each row is the frequencies. This is so that I can combine all the dataframes.

I have tried the following

   GroupedTableProportiondelAll_T <- as.data.frame(t(GroupedTableProportiondelAll))
    GroupedTableProportiondelLUAD_T <- as.data.frame(t(GroupedTableProportiondelLUAD))

   total <- rbind(GroupedTableProportiondelLUAD_T, GroupedTableProportiondelAll_T)

but I get the error Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match

So the question is a) how can I do rbind (cbind would also do without transposing I suppose) so that the bind can happen without needing to match.
b) would merge be better here c) in either is there a way to enter zero for empty values d) P'raps there's a better way to do this like matrices which Im not really familiar with? I know its 4 questions but the central question's the same- how to bind when not all the rows or columns are matching

like image 551
Sebastian Zeki Avatar asked Jan 09 '23 15:01

Sebastian Zeki


2 Answers

An alternative to the rbind + dcast technique that would use the tidyverse.

Use pipes (%>%) to first use bind_rows() to bind all your dataframes together while simultaneously creating a dataframe id column (in this case I just called the variable "df"). Then use spread() to move unique "Rptname" values to become column names and spreading the values of "freq" across the new columns. "Rptname" is the key and "freq" is the value in this case.

It would look like this:

Input:

GTP_A
Rptname    freq
1     bob 4324234
2    jane     433
3     ham    4324
4     tim      22

GTP_LUAD
Rptname  freq
1     bob   987
2    jane   223
3   jonny    12
4     jim 98092

Code:

GroupTable <- bind_rows(GTP_A,GTP_LUAD, .id = "df") %>%
  spread(Rptname, freq)

Output:

GroupTable
df     bob  ham jane   jim jonny tim
1  1 4324234 4324  433    NA    NA  22
2  2     987   NA  223 98092    12  NA

UPDATE: As of the release of tidyr 1.0.0 on 2019/09/13 spread() and gather() have been retired and replaced by pivot_wider() and pivot_longer(), respectively. From the release notes Hadley Wickem states "spread() and gather() won’t go away, but they’ve been retired which means that they’re no longer under active development."

In order to get the same output as above, you will now need to first arrange() by Rptname then use pivot_wider(). If you do not arrange first you will get a similar output but the column order will not be the same as the output from spread().

GroupTable <- bind_rows(GTP_A, GTP_LUAD, .id = "df") %>%
  arrange(Rptname) %>% 
  pivot_wider(names_from = Rptname, values_from = freq)
like image 119
Lauren Hostert Avatar answered Jan 21 '23 05:01

Lauren Hostert


You could first rbind the dataframes after adding a column to identify the data.frame. Then use dcast function from reshape2 package.

rpt1
##   Rptname    freq   df
## 1     bob 4324234 rpt1
## 2    jane     433 rpt1
## 3     ham    4324 rpt1
## 4     tim      22 rpt1

rpt2
##   Rptname  freq   df
## 1     bob   987 rpt2
## 2    jane   223 rpt2
## 3   jonny    12 rpt2
## 4     jim 98092 rpt2


rpt1$df <- "rpt1"
rpt2$df <- "rpt2"
rpt <- rbind(rpt1, rpt2)
dcast(data = rpt, df ~ Rptname, value.var = "freq")
##     df     bob  ham jane tim   jim jonny
## 1 rpt1 4324234 4324  433  22    NA    NA
## 2 rpt2     987   NA  223  NA 98092    12
like image 38
CHP Avatar answered Jan 21 '23 05:01

CHP