
In R, merging rows where a column has the same value but different case

I have data in which many values of x have been split into separate rows because of inconsistent capitalization. I would like to merge these rows, ignoring case, and sum the values in the other columns (y and z).

I have a dataframe like:

x     y  z 
rain  2   40
Rain  4   50
RAIN  7   25
Wind  8   10
Snow  3    9
SNOW  11  25

I want a Dataframe like:

x     y   z
Rain  13  115
Wind  8   10
Snow  14  34
JustOneGeek asked Sep 24 '15 23:09



2 Answers

You could convert the first column to lowercase and then aggregate.

Option 1: base R's aggregate()

with(df, aggregate(list(y = y, z = z), list(x = tolower(x)), sum))
#      x  y   z
# 1 rain 13 115
# 2 snow 14  34
# 3 wind  8  10

Alternatively, the formula method could also be used.

aggregate(. ~ x, transform(df, x = tolower(x)), sum)
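If the capitalised labels from the question (Rain, Wind, Snow) are wanted rather than all-lowercase ones, the grouping key can be normalised to "first letter upper, rest lower" instead. The following is only a sketch: cap_first() is a hypothetical helper introduced here for illustration, and df is rebuilt with data.frame() so the snippet runs on its own.

```r
# Same data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  x = c("rain", "Rain", "RAIN", "Wind", "Snow", "SNOW"),
  y = c(2L, 4L, 7L, 8L, 3L, 11L),
  z = c(40L, 50L, 25L, 10L, 9L, 25L)
)

# Hypothetical helper: lowercase everything, then uppercase the first letter
cap_first <- function(s) {
  s <- tolower(s)
  paste0(toupper(substring(s, 1, 1)), substring(s, 2))
}

# Normalise x, then sum y and z within each normalised label
res <- aggregate(. ~ x, transform(df, x = cap_first(x)), sum)
res
#      x  y   z
# 1 Rain 13 115
# 2 Snow 14  34
# 3 Wind  8  10
```

As with the other aggregate() calls, the groups come back sorted alphabetically rather than in their original order.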

Option 2: data.table. This also keeps the groups in the order they first appear, which matches the order shown in your desired result.

library(data.table)
as.data.table(df)[, lapply(.SD, sum), by = .(x = tolower(x))]
#       x  y   z
# 1: rain 13 115
# 2: wind  8  10
# 3: snow 14  34

To sort the result by group, use keyby instead of by.
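A minimal sketch of the keyby variant, assuming the data.table package is installed. The data frame is rebuilt with data.frame() so that the snippet is self-contained.

```r
library(data.table)

# Same data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  x = c("rain", "Rain", "RAIN", "Wind", "Snow", "SNOW"),
  y = c(2L, 4L, 7L, 8L, 3L, 11L),
  z = c(40L, 50L, 25L, 10L, 9L, 25L)
)

# keyby groups exactly like by, but also sorts the result by the
# grouping column and sets it as the key of the returned data.table
res <- as.data.table(df)[, lapply(.SD, sum), keyby = .(x = tolower(x))]
res
```

Here the rows should come back as rain, snow, wind rather than in order of first appearance.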

Option 3: base R's xtabs()

xtabs(cbind(y = y, z = z) ~ tolower(x), df)
#           
# tolower(x)   y   z
#       rain  13 115
#       snow  14  34
#       wind   8  10 

Note that this returns a table rather than a data frame (probably not what you want, but worth noting), and I have yet to determine how to change the name on the x result.
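One possible way to get a plain x label on the table (a sketch, not necessarily what the answer's author had in mind) is to lowercase the column before calling xtabs(), mirroring the transform() call used with the formula method of aggregate(). The data frame is rebuilt with data.frame() so the snippet runs on its own.

```r
# Same data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  x = c("rain", "Rain", "RAIN", "Wind", "Snow", "SNOW"),
  y = c(2L, 4L, 7L, 8L, 3L, 11L),
  z = c(40L, 50L, 25L, 10L, 9L, 25L)
)

# Lowercase x first, so the formula refers to a column literally named x
# and the row dimension of the table is labelled "x" instead of "tolower(x)"
tab <- xtabs(cbind(y, z) ~ x, transform(df, x = tolower(x)))
tab

# The label could also be changed after the fact on an existing table:
# names(dimnames(tab))[1] <- "x"
```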

Data:

df <- structure(list(x = structure(c(1L, 2L, 3L, 6L, 4L, 5L), .Label = c("rain", 
"Rain", "RAIN", "Snow", "SNOW", "Wind"), class = "factor"), y = c(2L, 
4L, 7L, 8L, 3L, 11L), z = c(40L, 50L, 25L, 10L, 9L, 25L)), .Names = c("x", 
"y", "z"), class = "data.frame", row.names = c(NA, -6L))
Rich Scriven answered Oct 18 '22 07:10


Try:

library(dplyr)
df %>%
  group_by(x = tolower(x)) %>%
  summarise_each(funs(sum))

Which gives:

#Source: local data frame [3 x 3]
#
#      x     y     z
#  (chr) (int) (int)
#1  rain    13   115
#2  snow    14    34
#3  wind     8    10
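In dplyr 1.0.0 and later, summarise_each() and funs() are deprecated in favour of across(). A sketch of the equivalent call, assuming a recent dplyr is installed (df is rebuilt with data.frame() so the snippet is self-contained):

```r
library(dplyr)

# Same data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  x = c("rain", "Rain", "RAIN", "Wind", "Snow", "SNOW"),
  y = c(2L, 4L, 7L, 8L, 3L, 11L),
  z = c(40L, 50L, 25L, 10L, 9L, 25L)
)

# Group on the lowercased label, then sum every non-grouping column;
# everything() inside across() skips the grouping column x automatically
res <- df %>%
  group_by(x = tolower(x)) %>%
  summarise(across(everything(), sum))
res
```

The totals should match the output above; only the printed header of the tibble differs between dplyr versions.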
Steven Beaupré answered Oct 18 '22 05:10