Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add calculated column to dataframe R

Tags:

dataframe

r

I want to calculate some statistics (mean, min, max, standard deviation etc) of some columns from a dataframe and store these values as another dataframe.

Here's a sample:

>foo

    Col1 Col2 Col3 Col4
1    1    6   10   60
2    2    7   20   70
3    3    8   30   80
4    4    9   40   90
5    5   10   50  100

For example, I want to store the mean and minimum value of Col1 and Col3 in a dataframe like this:

>bar

           Col1       Col3
Mean          3         30
Min           1         10

I want to do this through a loop, something like this:

# Result dataframe
bar <- data.frame(Col1 = integer(), Col3 = integer())

variables_for_stats <- c("Col1","Col3")

# I want to do something on the lines of this:
for (z in variables_for_stats){

    # Populate column with required values
    col <- c(mean(foo$z,min(foo$z)) # Throws an error - argument is not numeric or logical: returning NA

    # Add col to 'bar'
    bar$z<- col # Does not work
}

My actual foo dataframe currently has ~ 40 columns and actual variables_for_stats are around 20. Both of these can change, hence the desire to do this through a for loop and lists. How should I do this ?

like image 591
Vishesh Shrivastav Avatar asked Jan 03 '23 07:01

Vishesh Shrivastav


2 Answers

We can loop over the columns of interest and get the mean and min

sapply(foo[c('Col1', 'Col3')], function(x) c(Mean = mean(x), Min =min(x)))
#      Col1 Col3
#Mean    3   30
#Min     1   10

NOTE: An apply based solution is a loop as well. But, it gives more control than a for loop in understanding the output

like image 104
akrun Avatar answered Jan 12 '23 20:01

akrun


if you're interested in a tidyverse solution...

library(tidyverse)

foo <- tribble(~Col1, ~Col2, ~Col3, ~Col4,
               1,    6,   10,   60,
               2,    7,   20,   70,
               3,    8,   30,   80,
               4,    9,   40,   90,
               5,   10,   50,  100)

foo %>%
  gather(Col, Value) %>% 
  group_by(Col) %>% 
  summarise(Mean = mean(Value), Minimum = min(Value))
#> # A tibble: 4 x 3
#>   Col    Mean Minimum
#>   <chr> <dbl>   <dbl>
#> 1 Col1      3       1
#> 2 Col2      8       6
#> 3 Col3     30      10
#> 4 Col4     80      60

Edit If you want the resulting data frame exactly as you pointed out in your question, then:

foo %>%
  gather(Col, Value) %>% 
  group_by(Col) %>% 
  summarise(Mean = mean(Value),
            Minimum = min(Value)) %>% 
  gather(Func, Value, 2:3) %>% 
  spread(Col, Value) %>% 
  select(Func, Col1, Col3)

# A tibble: 2 x 3
#  Func     Col1  Col3
#  <chr>   <dbl> <dbl>
#1 Mean        3    30
#2 Minimum     1    10
like image 24
giocomai Avatar answered Jan 12 '23 22:01

giocomai