R - matching one value by combining values in a column

Tags:

r

I have a data frame with two columns as below:

    Col1       Col2
1   7197.36    14.00
2        NA  5173.94
3        NA 13333.06
4   7004.38   473.32
5        NA  4980.61
6  26355.52   110.05
7        NA  1307.32
8        NA  6531.06
9        NA  3777.65
10       NA  7827.44
11  8753.22    85.00
12       NA     1.86
13       NA  2009.42
14       NA   502.89
15       NA  3182.86
16       NA       NA

I wanted to find matching rows in column 'Col2' corresponding to the single value in 'Col1'. For example, 7197.36 = 14.00 + 5173.94 + 2009.42 (rows 1,2,13 in 'Col2')

Here, sum of 'Col1' = sum of 'Col2'

The final data frame should look like this:

   Col1    Col2
 1   7197.36    14.00
 2        NA  5173.94
 3        NA  2009.42
 4   7004.38   473.32
 5        NA  6531.06
 6  26355.52   110.05
 7        NA  1307.32
 8        NA 13333.06
 9        NA  3777.65
10       NA  7827.44
11  8753.22    85.00
12       NA     1.86
13       NA  4980.61
14       NA   502.89
15       NA  3182.86
16       NA       NA

Can anybody help me?

842

asked Nov 22 '15 13:11

2 Answers

We solve it via integer linear programming solving the problem of finding the minimum objective value greater than or equal to the target and if it is found to within numerical precision then return it; otherwise, return NULL.

library(lpSolve)

obj <- na.omit(DF$Col2)
targets <- na.omit(DF$Col1)
L <- lapply(targets, function(value) {
    iobj <- 100 * obj
    ivalue <- 100 * value
    res <- lp("min", iobj, t(iobj), ">=", ivalue, all.bin = TRUE)
    ok <- isTRUE(all.equal(ivalue, res$objval))
    if (ok) obj[res$solution == 1]
})
names(L) <- targets

giving:

> L

$`7197.36`
[1]   14.00 5173.94 2009.42

$`7004.38`
[1]  473.32 6531.06

$`26355.52`
[1] 13333.06   110.05  1307.32  3777.65  7827.44

$`8753.22`
[1] 4980.61   85.00    1.86  502.89 3182.86

Note 1: Later the question was modified to request this form of output:

transform(stack(L), Col1 = ifelse(duplicated(ind), NA, as.numeric(paste(ind))), 
                    Col2 = values)[3:4]

Note 2: We used this as DF

Lines <- "    Col1       Col2
1   7197.36    14.00
2        NA  5173.94
3        NA 13333.06
4   7004.38   473.32
5        NA  4980.61
6  26355.52   110.05
7        NA  1307.32
8        NA  6531.06
9        NA  3777.65
10       NA  7827.44
11  8753.22    85.00
12       NA     1.86
13       NA  2009.42
14       NA   502.89
15       NA  3182.86
16       NA       NA"

DF <- read.table(text = Lines, header = TRUE)

174

answered Oct 18 '22 01:10

G. Grothendieck

Here is a way using combinations from gtools (Won't be very efficient for huge data sets)

library(gtools)
library(zoo)
library(splitstackshape)

data$Col1_mod = na.locf(data$Col1)

df = stack(
     lapply(split(data, f = data$Col1_mod), 
     function(x){ 
         tmp1 = data.frame(
                combinations(
                    length(data$Col2[!is.na(data$Col2)]),
                    length(x$Col2[!is.na(x$Col2)]),
                    data$Col2[!is.na(data$Col2)]));
         tmp1$rowsums = rowSums(tmp1); 
         tmp2 = tmp1[tmp1$rowsums == unique(x$Col1_mod),];   
         toString(tmp2[,!colnames(tmp2) %in% 'rowsums'])
     }))

this will give

#> df
#                                       values      ind
#1                             473.32, 6531.06  7004.38
#2                        14, 2009.42, 5173.94  7197.36
#3          1.86, 85, 502.89, 3182.86, 4980.61  8753.22
#4 110.05, 1307.32, 3777.65, 7827.44, 13333.06 26355.52

you can reshape it using cSplit from splitstackshape

out = cSplit(setDT(df), 'values', ',', 'long')

 #>out
 #     values      ind
 #1:   473.32  7004.38
 #2:  6531.06  7004.38
 #3:    14.00  7197.36
 #4:  2009.42  7197.36
 #5:  5173.94  7197.36
 #6:     1.86  8753.22
 #7:    85.00  8753.22
 #8:   502.89  8753.22
 #9:  3182.86  8753.22
#10:  4980.61  8753.22
#11:   110.05 26355.52
#12:  1307.32 26355.52
#13:  3777.65 26355.52
#14:  7827.44 26355.52
#15: 13333.06 26355.52

answered Oct 18 '22 00:10

Veerendra Gadekar

Related questions
                            
                                R - Pie, X values must be positive
                            
                                What is the R equivalent of SQL "SELECT * FROM table GROUP BY c1, c2"?
                            
                                R: unclear behaviour of tuneRF function (randomForest package)
                            
                                How to create a contour plot in R
                            
                                dplyr string as column reference
                            
                                Change the symbol in a legend key in ggplot2
                            
                                Changing names in a list of dataframes [duplicate]
                            
                                R: Using dplyr inside a function. exception in eval(expr, envir, enclos): unknown column
                            
                                R Subset data.frame from max value of one vector and grouped by another [duplicate]
                            
                                Possible to install a package without installing dependencies?
                            
                                Position dodge with geom_point(), x=continuous, y=factor
                            
                                R function; if input is not given, ask for it
                            
                                How to initialize N named, empty data frames into a list in R?
                            
                                How to use the same output binding on different tab panels in shiny
                            
                                In R, how can I determine the operator precedence of user defined infix operators?
                            
                                How to add horizontal lines showing means for all groups in ggplot2?
                            
                                gather with tidyr: position must be between 0 and n error
                            
                                r - passing variables as data.table column names
                            
                                attr(*, "internal.selfref")=<externalptr> appearing in data.table Rstudio
                            
                                How do I make donut charts with facet_grid in ggplot2?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

R - matching one value by combining values in a column

Tags:

dataframe

r

Jay

People also ask

2 Answers

G. Grothendieck

Veerendra Gadekar

Recent Activity

Donate For Us