I have 170 rows of decimal numbers that need to be rounded to whole numbers. However, each row's total has to equal a number I specify.
As a very basic illustration, let’s say I have a matrix (1x4) with cell contents (1.2, 3.4, 7.7, 5.3). But let’s say that these numbers represent individuals so I need to round them to whole numbers, such that the group populations are equal to a total population of 18 individuals. If I simply round the matrix contents, which gives me (1, 3, 8, 5), my total population is 17 and I need it to equal 18 (see R commands below).
m <- c(1.2, 3.4, 7.7, 5.3)
m.2 <- round(m)
m.2
# [1] 1 3 8 5
sum(m.2)
# [1] 17
After the numbers are rounded, I need R to pick the number that was rounded down but came closest to rounding up (i.e. 3.4) and round it to 4 instead of 3.
This would give me a matrix of (1, 4, 8, 5) = 18.
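Done by hand for this one vector, the adjustment described above might look like the sketch below. It assumes the rounded sum falls short of the target by exactly one; the names target, shortfall, lost, and bump are only for illustration.

```r
m <- c(1.2, 3.4, 7.7, 5.3)
target <- 18                  # required total (illustrative name)

r <- round(m)                 # 1 3 8 5
shortfall <- target - sum(r)  # how many units are still missing

# Amount each value lost (positive) or gained (negative) by rounding.
lost <- m - r

# The value that was rounded down but came closest to rounding up.
bump <- which.max(lost)
if (shortfall == 1) r[bump] <- r[bump] + 1

r       # 1 4 8 5
sum(r)  # 18
```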
Dr. John Fox helped me out with a simple recursive function to solve the problem:
Round <- function(x, target){
    r.x <- round(x)
    diff.x <- round(x) - x
    if ((s <- sum(r.x)) == target) return(r.x)
    else if (s > target) {
        select <- seq(along=x)[diff.x > 0]
        which <- which.max(diff.x[select])
        x[select[which]] <- r.x[select[which]] - 1
        Round(x, target)
    }
    else{
        select <- seq(along=x)[diff.x < 0]
        which <- which.min(diff.x[select])
        x[select[which]] <- r.x[select[which]] + 1
        Round(x, target)
    }
}
This is very useful for individual rows. But I have 170 rows in my dataset. So that means repeating a process like this (see below) 170 times:
paste(STATA[['b']], collapse=", ")
B <- c(46.8310012817383, 19.9720001220703, 265.837005615234, 95.0400009155273, 6.88700008392334, 190.768997192383, 22.7269992828369, 764.453002929688, 53.0299987792969, 333.329010009766, 55.0960006713867, 84.0210037231445, 28.2369995117188, 2207.27099609375, 86.7760009765625, 50045.46875, 103.304000854492, 413.217987060547, 4.13199996948242, 2.75500011444092, 183.88200378418, 65.4260025024414, 0.689000010490417, 2248.59204101562, 0, 1.37699997425079, 16.5289993286133, 4.13199996948242, 4.13199996948242, 2.75500011444092, 4.13199996948242, 1.37699997425079, 0, 39.9440002441406, 2.75500011444092, 28.2369995117188, 0, 0, 5.51000022888184, 0, 48.8969993591309, 17.9060001373291, 485.531005859375, 1.37699997425079, 59.9169998168945, 221.759994506836, 28.2369995117188, 4.13199996948242, 65.4260025024414, 11.0190000534058, 38.5670013427734, 3.44300007820129, 8.95300006866455, 2.75500011444092, 23.4160003662109, 4.13199996948242, 50.5750015258789, 11.7080001831055, 19.2830009460449, 48.8969993591309, 0, 13.7740001678467, 92.9739990234375)
varB <- (Round(B, 58701))
ROUND2012$varB <- varB
In the example above, I had used the transpose of my dataset in Excel because I found it easier to attach columns to datasets in R than to attach rows. Ideally I wouldn't have to do this: rows would be my territories and columns would be the population data for each group identity. Here, 'b' is the name of the column I am calling and 58701 is the population total that the numbers need to add up to after they are rounded.
In short, I'm looking for a function that works on an entire dataset as opposed to individual rows. Ideally I'd be able to call the columns with the numbers to be rounded as well as the column with the population totals that the rounded numbers need to add up to.
Updated Info
As a more illustrative example, let's say I have two racial groups in my population.
B
        race1 race2 total
place1    1.2   2.1   3.4
place2    3.4   3.6   7.0
place3    7.7   0.8   8.5
place4    5.3   1.4   6.7
I need these numbers to equal my total registered voter population. The totals are 3.4, 7.0, 8.5, 6.7, but I need the contents within each place row to be rounded such that my place(1-4) totals are 4.0, 7.0, 8.0, and 7.0. For place1, that means 1.2 becomes 2.0 and 2.1 becomes 2.0, which adds up to 4.0, my registered voter population. For place2, the rounded values (3 and 4) already total 7, so we're okay. For place3, 7.7 would become 7.0 and 0.8 would become 1.0, giving me 8 in total. Finally, for place4, I would need 5.3 to be rounded to 5.0 and 1.4 to be rounded to 2.0, giving me 7 in total. What I want is:
B
        race1 race2 total
place1    2.0   2.0   4.0
place2    3.0   4.0   7.0
place3    7.0   1.0   8.0
place4    5.0   2.0   7.0
Currently the Round function pasted above lets me pass one series of numbers at a time, and I have to manually enter the total they need to be rounded to. But I am looking for a function that could do this for all rows at once. I want to call all the race columns to be rounded, and call a column containing all the necessary population totals. (Note: in practice I had taken the transpose of the matrix in Excel and re-imported it into R because, as a fairly new R user, I found that attaching new columns to the dataset was easier than attaching new rows. But I absolutely do not need to do that step and, indeed, would prefer not to.)
There are several ways you could do this, but taking my comment from above:
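Round <- function(x, target) {
    r.x <- round(x)
    # positive where a value was rounded up, negative where rounded down
    diff.x <- round(x) - x
    if ((s <- sum(r.x)) == target) {
        return(r.x)
    } else if (s > target) {
        # sum too high: take the value rounded up the most and round it down
        select <- seq(along=x)[diff.x > 0]
        which <- which.max(diff.x[select])
        x[select[which]] <- r.x[select[which]] - 1
        Round(x, target)
    } else {
        # sum too low: take the value rounded down the most and round it up
        select <- seq(along=x)[diff.x < 0]
        which <- which.min(diff.x[select])
        x[select[which]] <- r.x[select[which]] + 1
        Round(x, target)
    }
}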
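One caveat, as far as I can tell: the recursive Round only ever adjusts elements that were rounded in the "wrong" direction. If the target is further away than those elements can cover (for example Round(c(1, 2), 5)), nothing changes between calls and the recursion never terminates. A non-recursive sketch that avoids this is below. The name round_to_total is made up for illustration, it assumes target is a whole number, and it is not Dr. Fox's function, although it should agree with Round on the examples in this thread.

```r
# Hypothetical helper: round x so that the result sums to target.
round_to_total <- function(x, target) {
    stopifnot(length(x) > 0, target == round(target))
    r <- round(x)
    gap <- target - sum(r)    # units still to add (+) or remove (-)
    if (gap == 0) return(r)
    # Rank elements by how naturally they move in the needed direction:
    # when adding, the values rounded down the most come first;
    # when removing, the values rounded up the most come first.
    ord <- order((r - x) * sign(gap))
    # Hand out one unit per position; cycles if |gap| exceeds length(x).
    idx <- rep_len(ord, abs(gap))
    r + sign(gap) * tabulate(idx, nbins = length(x))
}

round_to_total(c(1.2, 3.4, 7.7, 5.3), 18)  # 1 4 8 5
round_to_total(c(1, 2), 5)                 # 2 3
```

Because it is a plain function of one vector and one target, it can be dropped into the row-wise calls below in place of Round.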
dat <- read.table(header = TRUE, row.names = paste0('place', 1:4),
text="race1 race2 total
1.2 2.1 3.4
3.4 3.6 7.0
7.7 0.8 8.5
5.3 1.4 6.7")
totals <- c(4.0, 7.0, 8.0, 7.0)
The two examples simply perform the Round on each row using a 1-1 mapping from the two columns of dat with each corresponding value in totals. lapply returns a list, so to transform the output back into a matrix/data frame, we rbind everything back together.
do.call(rbind, lapply(1:nrow(dat), function(x) Round(dat[x, -3], totals[x])))
# race1 race2
# place1 2 2
# place2 3 4
# place3 7 1
# place4 5 2
The output of apply is transposed relative to what you want, so we t the result.
dat[3] <- totals
t(apply(dat, 1, function(x) Round(x[1:2], x[3])))
# race1 race2
# place1 2 2
# place2 3 4
# place3 7 1
# place4 5 2
Alternatively, you could probably come up with something more clever using Map/mapply, or Vectorize the Round, to avoid these loops, but it doesn't seem like your data is very large.
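For instance, a mapply version might look like the sketch below. It repeats Round (same logic as above, lightly condensed) and the example data so it runs on its own; the intermediate names m and res are just for illustration.

```r
# Round() as above, condensed so this block is self-contained.
Round <- function(x, target) {
    r.x <- round(x)
    diff.x <- r.x - x
    s <- sum(r.x)
    if (s == target) return(r.x)
    if (s > target) {
        # too high: undo the largest round-up
        i <- which(diff.x > 0)
        j <- i[which.max(diff.x[i])]
        x[j] <- r.x[j] - 1
    } else {
        # too low: undo the largest round-down
        i <- which(diff.x < 0)
        j <- i[which.min(diff.x[i])]
        x[j] <- r.x[j] + 1
    }
    Round(x, target)
}

dat <- data.frame(race1 = c(1.2, 3.4, 7.7, 5.3),
                  race2 = c(2.1, 3.6, 0.8, 1.4),
                  row.names = paste0("place", 1:4))
totals <- c(4, 7, 8, 7)

m <- as.matrix(dat)
# Pair row i with totals[i]; mapply returns one column per row, hence t().
res <- t(mapply(function(i, target) Round(m[i, ], target),
                seq_len(nrow(m)), totals))
rownames(res) <- rownames(m)

# Attach the achieved totals as a check against the requested ones.
res <- cbind(res, total = rowSums(res))
res
#        race1 race2 total
# place1     2     2     4
# place2     3     4     7
# place3     7     1     8
# place4     5     2     7
```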