Binding dataframes of different length (no cbind, no merge)

Tags: merge, r, cbind

I am trying to display multiple data frames next to each other to compare certain entries. However, they have different numbers of rows, and I want each data frame to keep its exact row order. I tried cbind, which did not work because of the different numbers of rows. I also used merge to join two dfs and then merged the result again, but the rows change order when I do that, and repeated pairwise merging seems inefficient when I have more than 5 dfs in total.

Example:

df <-  data.frame(v=1:5, x=sample(LETTERS[1:5],5))
df 
  v x
1 1 E
2 2 B
3 3 D
4 4 C
5 5 A

df2 <- data.frame(m=7:10, n=sample(LETTERS[6:9],4))
df2
   m n
1  7 G
2  8 I
3  9 F
4 10 H

Then I ordered df2

df2 <- df2[order(df2$m, decreasing = TRUE),]
df2
   m n
4 10 F
3  9 I
2  8 H
1  7 G

Expected output:

  v x m n
1 1 E 10 F
2 2 B 9 I
3 3 D 8 H
4 4 C 7 G
5 5 A NA NA

As I said, I have more than two dfs, and the row order of each df should be preserved. Any help will be greatly appreciated!

asked Apr 22 '21 06:04 by Linda Espey


2 Answers

Base R approach:

Put the data frames in a list, find the maximum number of rows, index every data frame by 1 up to that maximum (which pads the shorter ones with NA rows), and cbind the results.

list_df <- list(df, df2)
n_r <- seq_len(max(sapply(list_df, nrow)))
result <- do.call(cbind, lapply(list_df, `[`, n_r, ))
result

#  v x  m    n
#1 1 C 10    F
#2 2 B  9    H
#3 3 E  8    G
#4 4 D  7    I
#5 5 A NA <NA>
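
If you need this for more than two data frames, the same idea can be wrapped in a small function. This is only a sketch: the name cbind_pad is made up for illustration, the sample data is hard-coded instead of using sample() so the result is reproducible, and df3 is an extra data frame that is not in the question.

```r
# Hypothetical helper (not from any package): pad every data frame with NA rows
# up to the longest one, then bind them column-wise in the order given.
cbind_pad <- function(...) {
  dfs <- list(...)
  n_max <- max(vapply(dfs, nrow, integer(1)))
  padded <- lapply(dfs, function(d) {
    # Row positions past nrow(d) come back as all-NA rows
    out <- d[seq_len(n_max), , drop = FALSE]
    # Reset row names so leftovers such as "4" or "NA" do not leak through
    rownames(out) <- NULL
    out
  })
  do.call(cbind, padded)
}

# Fixed sample data (assumed values, mirroring the question's layout)
df  <- data.frame(v = 1:5, x = c("E", "B", "D", "C", "A"))
df2 <- data.frame(m = 10:7, n = c("F", "I", "H", "G"))
df3 <- data.frame(bb = 101:103)

res <- cbind_pad(df, df2, df3)
res
```

drop = FALSE is there so that a one-column data frame such as df3 stays a data frame instead of collapsing to a vector.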
answered Oct 28 '22 07:10 by Ronak Shah


Edit: In case there are more than two dfs, do this:

  • Create a list of all dfs except one (say, the first).
  • Use purrr::reduce to join them all together.
  • Pass the first df in the .init argument.
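library(dplyr)  # mutate(), row_number(), full_join(), select()
library(purrr)  # reduce()

df2 <- data.frame(m=7:10, n=sample(LETTERS[6:9],4))
df2 <- df2[order(df2$m, decreasing = TRUE),]  # sorted as in the question
df <-  data.frame(v=1:5, x=sample(LETTERS[1:5],5))
df3 <- data.frame(bb = 101:110, cc = sample(letters, 10))

# Join on a temporary row-number key so rows are matched by position, then drop the key
reduce(list(df2, df3), .init = df %>% mutate(id = row_number()) , ~full_join(.x, .y %>% mutate(id = row_number()), by = "id" )) %>%
  select(-id)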

    v    x  m    n  bb cc
1   1    A 10    I 101  u
2   2    C  9    H 102  v
3   3    D  8    G 103  n
4   4    E  7    F 104  w
5   5    B NA <NA> 105  s
6  NA <NA> NA <NA> 106  y
7  NA <NA> NA <NA> 107  g
8  NA <NA> NA <NA> 108  i
9  NA <NA> NA <NA> 109  p
10 NA <NA> NA <NA> 110  h
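
A possible variation, assuming dplyr and purrr are installed: if every data frame gets the id column first, the .init argument is not needed at all. The data below is hard-coded (assumed values) so the result is reproducible, and the names dfs and joined are just for illustration.

```r
library(dplyr)
library(purrr)

# Fixed sample data (assumed values, same layout as in the question)
df  <- data.frame(v = 1:5, x = c("E", "B", "D", "C", "A"))
df2 <- data.frame(m = 10:7, n = c("F", "I", "H", "G"))
df3 <- data.frame(bb = 101:110)

dfs <- list(df, df2, df3)

joined <- dfs %>%
  # Give every data frame a row-position key
  map(function(d) mutate(d, id = row_number())) %>%
  # Fold the list with pairwise full joins on that key
  reduce(function(x, y) full_join(x, y, by = "id")) %>%
  select(-id)

joined
```

full_join keeps the rows of its left input in order and appends unmatched rows from the right input, so the row order of each original data frame is kept.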

Earlier answer: create a dummy id column in both dfs and use full_join.

full_join(df %>% mutate(id = row_number()), df2 %>% mutate(id = row_number()), by = "id") %>%
  select(-id)

  v x  m    n
1 1 A 10    I
2 2 C  9    H
3 3 D  8    G
4 4 E  7    F
5 5 B NA <NA>

The results differ from the expected output in the question because sample() was run with a different random seed.
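
To get the same letters on every run, set the seed before calling sample(). A minimal illustration (the seed value 42 and the names a and b are arbitrary choices, not from the question):

```r
set.seed(42)                  # arbitrary seed, only for illustration
a <- sample(LETTERS[1:5], 5)

set.seed(42)                  # same seed again
b <- sample(LETTERS[1:5], 5)

identical(a, b)               # both draws are the same permutation
```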


Or in base R:

merge(transform(df, id = seq_len(nrow(df))), transform(df2, id = seq_len(nrow(df2))), all = TRUE)

  id v x  m    n
1  1 1 A 10    I
2  2 2 C  9    H
3  3 3 D  8    G
4  4 4 E  7    F
5  5 5 B NA <NA>

Remove the extra id column (the first one) by subsetting with [-1]:

merge(transform(df, id = seq_len(nrow(df))), transform(df2, id = seq_len(nrow(df2))), all = TRUE)[-1]

  v x  m    n
1 1 A 10    I
2 2 C  9    H
3 3 D  8    G
4 4 E  7    F
5 5 B NA <NA>
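
For more than two dfs, the base R merge can be folded over a list with Reduce(). This is a sketch under assumptions: the helper name add_id and the extra data frame df3 are made up for illustration, and the data is hard-coded so the output is reproducible.

```r
# Fixed sample data (assumed values, same layout as in the question)
df  <- data.frame(v = 1:5, x = c("E", "B", "D", "C", "A"))
df2 <- data.frame(m = 10:7, n = c("F", "I", "H", "G"))
df3 <- data.frame(bb = 101:110)

# Hypothetical helper: add a row-position key to a data frame
add_id <- function(d) transform(d, id = seq_len(nrow(d)))

# Pairwise outer merges on the key, applied left to right over the list
merged <- Reduce(
  function(x, y) merge(x, y, by = "id", all = TRUE),
  lapply(list(df, df2, df3), add_id)
)

# Sort by the key to be explicit about row order, then drop the key column
result <- merged[order(merged$id), names(merged) != "id"]
rownames(result) <- NULL
result
```

Because the key is numeric, merge() sorts it numerically, so row 10 does not end up between rows 1 and 2.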
answered Oct 28 '22 08:10 by AnilGoyal