
How to retrieve the most repeated value in a column present in a data frame





I am trying to retrieve the most repeated value in a particular column of a data frame. Here is my sample data and code below.

data("Forbes2000", package = "HSAUR")

  rank                name        country             category  sales profits  assets marketvalue
1    1           Citigroup  United States              Banking  94.71   17.85 1264.03      255.30
2    2    General Electric  United States        Conglomerates 134.19   15.59  626.93      328.54
3    3 American Intl Group  United States            Insurance  76.66    6.46  647.66      194.87
4    4          ExxonMobil  United States Oil & gas operations 222.88   20.96  166.99      277.02
5    5                  BP United Kingdom Oil & gas operations 232.57   10.27  177.57      173.54
6    6     Bank of America  United States              Banking  49.01   10.81  736.45      117.55

As per my sample data, I need to return the most repeated category, which is Insurance.

Teja asked Aug 29 '12 22:08


3 Answers

tail(names(sort(table(Forbes2000$category))), 1)
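
To see why this one-liner works, it can help to run it one step at a time. The sketch below uses a small made-up vector (`cats` is hypothetical, not the Forbes2000 data) and also shows `which.max()` as a possible base R alternative. Note that both forms return a single value even when several values are tied.

```r
# Hypothetical sample vector, used only for illustration
cats <- c("Banking", "Insurance", "Banking", "Conglomerates", "Banking")

counts <- table(cats)                  # frequency of each distinct value
sorted_counts <- sort(counts)          # ascending, so the most frequent is last
most_repeated <- tail(names(sorted_counts), 1)

# Alternative: which.max() gives the position of the first maximum
most_repeated_alt <- names(which.max(counts))

print(most_repeated)
print(most_repeated_alt)
```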
ALiX answered Oct 01 '22 09:10


In case two or more categories may be tied for most frequent, use something like this:

x <- c("Insurance", "Insurance", "Capital Goods", "Food markets", "Food markets")
tt <- table(x)
names(tt[tt == max(tt)])
[1] "Food markets" "Insurance" 
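
The tie-aware approach can be wrapped in a small helper. This is a minimal sketch; `most_frequent` is a hypothetical name introduced here for illustration, not a base R function.

```r
# most_frequent() is an illustrative helper, not part of base R
most_frequent <- function(v) {
  tt <- table(v)  # NA values are dropped by table() by default
  # keep every value whose count equals the maximum count
  names(tt)[as.vector(tt) == max(tt)]
}

x <- c("Insurance", "Insurance", "Capital Goods", "Food markets", "Food markets")
modes <- most_frequent(x)
print(modes)
```

With the data from the question, this could be called as `most_frequent(Forbes2000$category)`.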
Josh O'Brien answered Oct 01 '22 10:10

Another way with the data.table package, which is faster for large data sets:

x=sample(seq(1,100), 5000000, replace = TRUE)

Method 1 (solution proposed above):

start.time <- Sys.time()
tt <- table(x)
names(tt[tt == max(tt)])  # the most repeated value(s)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Time difference of 4.883488 secs

Method 2 (data.table):

library(data.table)

start.time <- Sys.time()
ds <- data.table(x)
setkey(ds, x)
sorted <- ds[, .N, by = list(x)]  # count the occurrences of each value

most_repeated_value <- sorted[order(-N)]$x[1]

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Time difference of 0.328033 secs
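
Timing a single run with `Sys.time()` can be noisy. As a rough sketch, `system.time()` wraps the same measurement in one call. The vector size and seed below are arbitrary choices for illustration, and the data.table branch runs only if that package is installed.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility
x <- sample(1:100, 100000, replace = TRUE)   # smaller than the 5,000,000 used above

# Base R: count with table(), keep all values tied for the maximum
base_time <- system.time({
  tt <- table(x)
  base_modes <- as.integer(names(tt)[as.vector(tt) == max(tt)])
})

dt_mode <- NULL
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt_time <- system.time({
    ds <- data.table(x = x)
    counts <- ds[, .N, by = x]               # one row per distinct value
    dt_mode <- counts[order(-N)]$x[1]        # first value with the highest count
  })
  print(dt_time)
}

print(base_time)
print(base_modes)
```

The base R branch keeps every tied value, while the data.table branch, like the answer above, returns only the first one after ordering by count.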

Timothée HENRY answered Oct 01 '22 10:10