
Take mean of digits that are run together in one column

Tags: r, dplyr, tidyverse

My data is in this format:

country gdp digits
US      100 2657
AUS     50  123
NZ      40  11

and I'd like to take the mean, for each country, of the individual digits that are run together in the digits column.

So this is what I'm after:

country gdp digits mean_digits
US      100 2657   5
AUS     50  123    2
NZ      40  11     1

I imagine I should split the digits column into individual digits in separate columns and then take an arithmetic mean, but I'm unsure how to do that, because different rows have different numbers of digits in the digits field.

Code for the reproducible data below:

df <- data.frame(stringsAsFactors=FALSE,
     country = c("US", "AUS", "NZ"),
         gdp = c(100, 50, 40),
      digits = c(2657, 123, 11)
)
asked Dec 08 '22 12:12 by Jeremy K.



1 Answer

We need a function to split the number into digits and take the mean:

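mean_digits = function(x) {
  # Convert each number to a string and split it into single characters
  digit_chars = strsplit(as.character(x), split = "", fixed = TRUE)
  # Convert each element's characters to integers and average them
  sapply(digit_chars, function(d) mean(as.integer(d)))
}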

df$mean_digits = mean_digits(df$digits)
df
#   country gdp digits mean_digits
# 1      US 100   2657           5
# 2     AUS  50    123           2
# 3      NZ  40     11           1

as.character() converts the numeric input to character, strsplit() splits each number into its individual digits (returning a list), and sapply() then converts each list element to integer and takes its mean.
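To make the steps concrete, here is a small sketch that walks a single value through the same pipeline. The intermediate names (chars, parts, digits) are only for illustration and are not part of the function above.

```r
# Walk one value through the same steps used inside mean_digits()
x <- 2657
chars <- as.character(x)                            # "2657"
parts <- strsplit(chars, split = "", fixed = TRUE)  # list holding c("2", "6", "5", "7")
digits <- as.integer(parts[[1]])                    # 2 6 5 7
mean(digits)                                        # 5
```

Because strsplit() always returns a list with one element per input value, rows with different numbers of digits are not a problem: each list element is simply a vector of a different length.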

We use fixed = TRUE for a little bit of efficiency, since we don't need any special regex to split every digit apart.

If you're using this function frequently, you may want to round the input or check that it is an integer: the function returns NA if the input has decimals, because the "." cannot be converted to an integer.
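One way to add that check is sketched below. The name mean_digits_checked is hypothetical, and the sketch assumes that only non-negative whole numbers are valid input. It also swaps as.character() for format(..., scientific = FALSE), on the assumption that you want large values kept in plain digit form rather than risk scientific notation.

```r
# Hypothetical variant of mean_digits() that validates its input first
mean_digits_checked <- function(x) {
  # Assumption: only non-negative whole numbers are meaningful here
  stopifnot(is.numeric(x), all(x >= 0), all(x == round(x)))
  # scientific = FALSE keeps values such as 100000 in plain digit form;
  # trim = TRUE stops format() from padding shorter numbers with spaces
  chars <- format(x, scientific = FALSE, trim = TRUE)
  # vapply() is like sapply() but guarantees one numeric result per element
  vapply(strsplit(chars, split = "", fixed = TRUE),
         function(d) mean(as.integer(d)),
         numeric(1))
}

mean_digits_checked(c(2657, 123, 11))
# [1] 5 2 1
```

With this version, a call such as mean_digits_checked(1.5) stops with an error instead of silently returning NA.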

answered Dec 24 '22 20:12 by Gregor Thomas
