I have a data frame with two columns as below:
    Col1       Col2
1   7197.36    14.00
2        NA  5173.94
3        NA 13333.06
4   7004.38   473.32
5        NA  4980.61
6  26355.52   110.05
7        NA  1307.32
8        NA  6531.06
9        NA  3777.65
10       NA  7827.44
11  8753.22    85.00
12       NA     1.86
13       NA  2009.42
14       NA   502.89
15       NA  3182.86
16       NA       NA
I wanted to find matching rows in column 'Col2' corresponding to the single value in 'Col1'. For example, 7197.36 = 14.00 + 5173.94 + 2009.42 (rows 1,2,13 in 'Col2')
Here, sum of 'Col1' = sum of 'Col2'
The final data frame should look like this:
   Col1    Col2
 1   7197.36    14.00
 2        NA  5173.94
 3        NA  2009.42
 4   7004.38   473.32
 5        NA  6531.06
 6  26355.52   110.05
 7        NA  1307.32
 8        NA 13333.06
 9        NA  3777.65
10       NA  7827.44
11  8753.22    85.00
12       NA     1.86
13       NA  4980.61
14       NA   502.89
15       NA  3182.86
16       NA       NA
Can anybody help me?
How do I concatenate two columns in R? To concatenate two columns you can use the <code>paste()</code> function. For example, if you want to combine the two columns A and B in the dataframe df you can use the following code: <code>df['AB'] <- paste(df$A, df$B)</code>.
To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.
The merge() function in base R can be used to merge input dataframes by common columns or row names. The merge() function retains all the row names of the dataframes, behaving similarly to the inner join. The dataframes are combined in order of the appearance in the input function call.
We solve it via integer linear programming solving the problem of finding the minimum objective value greater than or equal to the target and if it is found to within numerical precision then return it; otherwise, return NULL.
library(lpSolve)
obj <- na.omit(DF$Col2)
targets <- na.omit(DF$Col1)
L <- lapply(targets, function(value) {
    iobj <- 100 * obj
    ivalue <- 100 * value
    res <- lp("min", iobj, t(iobj), ">=", ivalue, all.bin = TRUE)
    ok <- isTRUE(all.equal(ivalue, res$objval))
    if (ok) obj[res$solution == 1]
})
names(L) <- targets
giving:
> L
$`7197.36`
[1]   14.00 5173.94 2009.42
$`7004.38`
[1]  473.32 6531.06
$`26355.52`
[1] 13333.06   110.05  1307.32  3777.65  7827.44
$`8753.22`
[1] 4980.61   85.00    1.86  502.89 3182.86
Note 1: Later the question was modified to request this form of output:
transform(stack(L), Col1 = ifelse(duplicated(ind), NA, as.numeric(paste(ind))), 
                    Col2 = values)[3:4]
Note 2: We used this as DF
Lines <- "    Col1       Col2
1   7197.36    14.00
2        NA  5173.94
3        NA 13333.06
4   7004.38   473.32
5        NA  4980.61
6  26355.52   110.05
7        NA  1307.32
8        NA  6531.06
9        NA  3777.65
10       NA  7827.44
11  8753.22    85.00
12       NA     1.86
13       NA  2009.42
14       NA   502.89
15       NA  3182.86
16       NA       NA"
DF <- read.table(text = Lines, header = TRUE)
Here is a way using combinations from gtools (Won't be very efficient for huge data sets)
library(gtools)
library(zoo)
library(splitstackshape)
data$Col1_mod = na.locf(data$Col1)
df = stack(
     lapply(split(data, f = data$Col1_mod), 
     function(x){ 
         tmp1 = data.frame(
                combinations(
                    length(data$Col2[!is.na(data$Col2)]),
                    length(x$Col2[!is.na(x$Col2)]),
                    data$Col2[!is.na(data$Col2)]));
         tmp1$rowsums = rowSums(tmp1); 
         tmp2 = tmp1[tmp1$rowsums == unique(x$Col1_mod),];   
         toString(tmp2[,!colnames(tmp2) %in% 'rowsums'])
     }))
this will give
#> df
#                                       values      ind
#1                             473.32, 6531.06  7004.38
#2                        14, 2009.42, 5173.94  7197.36
#3          1.86, 85, 502.89, 3182.86, 4980.61  8753.22
#4 110.05, 1307.32, 3777.65, 7827.44, 13333.06 26355.52
you can reshape it using cSplit from splitstackshape
out = cSplit(setDT(df), 'values', ',', 'long')
 #>out
 #     values      ind
 #1:   473.32  7004.38
 #2:  6531.06  7004.38
 #3:    14.00  7197.36
 #4:  2009.42  7197.36
 #5:  5173.94  7197.36
 #6:     1.86  8753.22
 #7:    85.00  8753.22
 #8:   502.89  8753.22
 #9:  3182.86  8753.22
#10:  4980.61  8753.22
#11:   110.05 26355.52
#12:  1307.32 26355.52
#13:  3777.65 26355.52
#14:  7827.44 26355.52
#15: 13333.06 26355.52
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With