Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Return pmin or pmax of data.frame with multiple columns

Tags:

dataframe

r

Is there a way to select the pmax/pmin of a data frame with multiple columns??

I want only the max or min returned, not the entire row.

max <- tail(df, n=1)
max
#                       v1     v2     v3     v4     v5     v6     v7     v8
#2014-10-03 17:35:00  58.91  45.81  33.06  70.76  36.39  45.53  33.52  34.36

pmax(max)
#                       v1     v2     v3     v4     v5     v6     v7     v8
#2014-10-03 17:35:00  58.91  45.81  33.06  70.76  36.39  45.53  33.52  34.36

For this row, I expect a return value of :

70.76

...as it is the maximum value across all the columns.

like image 909
user3773444 Avatar asked Oct 09 '14 00:10

user3773444


2 Answers

Use do.call to call pmax to compare all the columns together for each row value, e.g.:

dat <- data.frame(a=1:5,b=rep(3,5))

#  a b
#1 1 3
#2 2 3
#3 3 3
#4 4 3
#5 5 3

do.call(pmax,dat)
#[1] 3 3 3 4 5

When you call pmax on an entire data.frame directly, it only has one argument passed to the function and nothing to compare it to. So, it just returns the supplied argument as it must be the maximum. It works for non-numeric and numeric arguments, even though it may not make much sense:

pmax(7)
#[1] 7

pmax("a")
#[1] "a"

pmax(data.frame(1,2,3))
#  X1 X2 X3
#1  1  2  3

Using do.call(pmax,...) with a data.frame means you pass each column of the data.frame as a list of arguments to pmax:

do.call(pmax,dat) 

is thus equivalent to:

pmax(dat$a, dat$b)
like image 158
thelatemail Avatar answered Nov 03 '22 19:11

thelatemail


A dplyr way if you like

df <- structure(list(X = c("A", "B", "C", "D", "E", "F", "G", "H", 
                           "I", "J"), Y_1 = c(34, 29, 94, 53, 84, 53, 92, 41, 49, 32), Y_2 = c(43, 
                                                                                               26, 29, 64, 68, 88, 36, 86, 74, 15), Y_3 = c(62, 72, 69, 54, 
                                                                                                                                            80, 49, 51, 41, 46, 68), Y_4 = c(92, 45, 21, 27, 60, 32, 40, 
                                                                                                                                                                             40, 39, 89), Y_5 = c(28, 79, 34, 84, 58, 16, 69, 53, 78, 80), 
                     Y_6 = c(91, 55, 45, 70, 81, 19, 33, 90, 28, 82), Y_7 = c(95, 
                                                                              75, 11, 81, 12, 38, 53, 88, 74, 51), Y_8 = c(69, 99, 44, 
                                                                                                                           20, 53, 57, 79, 45, 21, 47), Y_9 = c(67, 44, 88, 75, 76, 
                                                                                                                                                                70, 18, 80, 32, 83), Y_10 = c(16, 80, 41, 47, 72, 47, 89, 
                                                                                                                                                                                              96, 23, 64)), row.names = c(NA, -10L), class = c("tbl_df", 
                                                                                                                                                                                                                                               "tbl", "data.frame"))
df
#> # A tibble: 10 x 11
#>    X       Y_1   Y_2   Y_3   Y_4   Y_5   Y_6   Y_7   Y_8   Y_9  Y_10
#>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 A        34    43    62    92    28    91    95    69    67    16
#>  2 B        29    26    72    45    79    55    75    99    44    80
#>  3 C        94    29    69    21    34    45    11    44    88    41
#>  4 D        53    64    54    27    84    70    81    20    75    47
#>  5 E        84    68    80    60    58    81    12    53    76    72
#>  6 F        53    88    49    32    16    19    38    57    70    47
#>  7 G        92    36    51    40    69    33    53    79    18    89
#>  8 H        41    86    41    40    53    90    88    45    80    96
#>  9 I        49    74    46    39    78    28    74    21    32    23
#> 10 J        32    15    68    89    80    82    51    47    83    64

library(dplyr)

df %>% rowwise() %>%
  mutate(max_val = max(c_across(where(is.numeric))))
#> # A tibble: 10 x 12
#> # Rowwise: 
#>    X       Y_1   Y_2   Y_3   Y_4   Y_5   Y_6   Y_7   Y_8   Y_9  Y_10 max_val
#>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
#>  1 A        34    43    62    92    28    91    95    69    67    16      95
#>  2 B        29    26    72    45    79    55    75    99    44    80      99
#>  3 C        94    29    69    21    34    45    11    44    88    41      94
#>  4 D        53    64    54    27    84    70    81    20    75    47      84
#>  5 E        84    68    80    60    58    81    12    53    76    72      84
#>  6 F        53    88    49    32    16    19    38    57    70    47      88
#>  7 G        92    36    51    40    69    33    53    79    18    89      92
#>  8 H        41    86    41    40    53    90    88    45    80    96      96
#>  9 I        49    74    46    39    78    28    74    21    32    23      78
#> 10 J        32    15    68    89    80    82    51    47    83    64      89

Created on 2021-05-08 by the reprex package (v2.0.0)

like image 32
AnilGoyal Avatar answered Nov 03 '22 19:11

AnilGoyal