My data is in this format:
country gdp digits
US 100 2657
Aus 50 123
NZ 40 11
and I'd like to take, for each country, the mean of the individual digits stored in the digits column.
So this is what I'm after:
country gdp digits mean_digits
US 100 2657 5
Aus 50 123 2
NZ 40 11 1
I imagine I should split the digits column into individual digits in separate columns and then take an arithmetic mean, but I'm a little unsure how, because different rows have different numbers of digits in the digits field.
Code for the reproducible data below:
df <- data.frame(stringsAsFactors=FALSE,
country = c("US", "AUS", "NZ"),
gdp = c(100, 50, 40),
digits = c(2657, 123, 11)
)
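The column-splitting idea from the question does work if the shorter rows are padded with NA. A minimal sketch, assuming the same df (re-created so the block runs on its own) and assuming every value in digits is a non-negative whole number; the names digit_list, width and digit_mat are just illustrative:

```r
df <- data.frame(stringsAsFactors = FALSE,
                 country = c("US", "AUS", "NZ"),
                 gdp = c(100, 50, 40),
                 digits = c(2657, 123, 11))

# one character vector of single digits per row
digit_list <- strsplit(as.character(df$digits), split = "", fixed = TRUE)

# pad every row with NA up to the longest number of digits
# (indexing past the end of a vector yields NA)
width <- max(lengths(digit_list))
digit_mat <- t(vapply(digit_list,
                      function(d) as.integer(d)[seq_len(width)],
                      integer(width)))

# na.rm = TRUE skips the padding, so each row is averaged over its own digits
df$mean_digits <- rowMeans(digit_mat, na.rm = TRUE)
df
```

rowMeans(..., na.rm = TRUE) is what makes the different row lengths harmless: each row is averaged only over the digits it actually has.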
We need a function to split the number into digits and take the mean:
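mean_digits = function(x) {
  # split each number's text representation into single characters
  sapply(strsplit(as.character(x), split = "", fixed = TRUE),
         # convert each character vector to integers and average them
         function(d) mean(as.integer(d)))
}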
df$mean_digits = mean_digits(df$digits)
df
# country gdp digits mean_digits
# 1 US 100 2657 5
# 2 AUS 50 123 2
# 3 NZ 40 11 1
as.character() converts the numeric input to character, strsplit splits the numbers into individual digits (resulting in a list), and then sapply goes over each list element, converts it to integer and takes the mean.
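To see the intermediate result of each step on the sample values (a small illustration; x, chars, pieces, ints and means are just illustrative names):

```r
x <- c(2657, 123, 11)

chars  <- as.character(x)                              # "2657" "123" "11"
pieces <- strsplit(chars, split = "", fixed = TRUE)    # list: one character vector per number
ints   <- lapply(pieces, as.integer)                   # c(2L, 6L, 5L, 7L), c(1L, 2L, 3L), c(1L, 1L)
means  <- sapply(ints, mean)                           # sapply simplifies the list to a vector
means                                                  # 5 2 1
```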
We use fixed = TRUE for a little bit of efficiency, since we don't need any special regex to split every digit apart.
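A quick check that, for plain digit strings like these, fixed = TRUE only changes how the split pattern is interpreted (literally instead of as a regular expression), not the result:

```r
s <- c("2657", "123", "11")
literal <- strsplit(s, split = "", fixed = TRUE)  # "" treated as a literal string
regex   <- strsplit(s, split = "")                # "" treated as a regular expression
identical(literal, regex)                         # TRUE: both split into single characters
```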
If you're using this function frequently, you may want to round or check that the input is an integer first: it will return NA (with a coercion warning) if the input has decimals, because the decimal point "." can't be converted to an integer.
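If you'd rather not go through strings at all, here is a sketch of an arithmetic alternative (a different technique from the one above): repeatedly take the last digit with %% 10 and drop it with %/% 10 until nothing is left. The name mean_digits_arith is hypothetical, and it assumes non-negative, finite whole numbers; instead of returning NA it stops with an error on anything else. A side benefit is that it doesn't depend on how R turns numbers into text, which matters because as.character() can switch to scientific notation for some round doubles (for example 1e5 can come out as "1e+05").

```r
# Hypothetical helper: mean of the decimal digits of each non-negative whole number
mean_digits_arith <- function(x) {
  # assumption: only finite, non-negative whole numbers are valid input
  stopifnot(is.numeric(x), all(is.finite(x)), all(x >= 0), all(x == floor(x)))
  vapply(x, function(n) {
    if (n == 0) return(0)          # 0 has a single digit, which is 0
    total <- 0
    count <- 0
    while (n > 0) {
      total <- total + n %% 10     # add the last digit
      count <- count + 1
      n <- n %/% 10                # drop the last digit
    }
    total / count
  }, numeric(1))
}

mean_digits_arith(c(2657, 123, 11))   # 5 2 1
mean_digits_arith(1e5)                 # digits 1,0,0,0,0,0 -> 1/6
```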