I have a dataset like:
Company,Product,Users
MSFT,Office,1000
MSFT,VS,4000
GOOG,gmail,3203
GOOG,appengine,45454
MSFT,Windows,1500
APPL,iOS,6000
APPL,iCloud,3442
I'm writing a function that returns a data frame with the nth product for each company, ranked by "Users", so the output of rankcompany(1) should be:
     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000
The function looks like:
rankcompany <- function(num=1){
  # Read data file
  company_data <- read.csv("company.csv", stringsAsFactors = FALSE)
  # Split by company
  split_data <- split(company_data, company_data$Company)
  # Sort by Users (descending), break ties by Product, and select the nth row
  selected <- lapply(split_data, function(df) {
    df <- df[order(-df$Users, df$Product),]
    df[num,]
  })
  # Compose output data frame
  # this part needs to be smarter??
  len <- length(selected)
  selected_df <- data.frame(Company=character(len), Product=character(len), Users=integer(len), stringsAsFactors = FALSE)
  row.names(selected_df) <- names(selected)
  for (n in names(selected)){
    # Debug output: str() prints on its own, so no print() wrapper is needed
    str(selected[[n]])
    selected_df[n,] <- selected[[n]][1,]
  }
  selected_df
}
I split the input data frame into a list, perform the sorting and selection on each piece, and then try to merge the results into the output data frame "selected_df".
I'm new to R and I think the merging can be done in a smarter way. Or should I avoid splitting in the first place? Any suggestions?
Thanks
You can do it in a much simpler way with dplyr:
library(dplyr)

rankcompany <- function(d, num=1) {
  # d is the input data frame; keep the num-th row per company after sorting by Users
  d %>% group_by(Company) %>% arrange(desc(Users)) %>% slice(num)
}
And then you can do:
rankcompany(d,2)
or:
d %>% rankcompany(1)
Based on the comment from @DMT I replaced the merging code with:
# rbindlist() is provided by the data.table package
selected_df <- rbindlist(selected)
selected_df <- as.data.frame(selected_df)
row.names(selected_df) <- names(selected)
selected_df
And it works fine.
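For comparison, the same merge can probably be done without any extra package. The following is only a sketch in base R: the name rank_company_base and the inline sample_data are made up for illustration, and the function takes the data frame as an argument instead of reading company.csv. It keeps the split and sort steps from the question and replaces the loop with do.call(rbind, ...).

```r
# Hypothetical helper (not from the original post): base R only.
rank_company_base <- function(company_data, num = 1) {
  # One data frame per company
  split_data <- split(company_data, company_data$Company)
  # Sort each piece by Users (descending), then Product, and keep the nth row
  selected <- lapply(split_data, function(df) {
    df <- df[order(-df$Users, df$Product), ]
    df[num, ]
  })
  # Stack the one-row data frames into a single data frame
  do.call(rbind, selected)
}

# Sample data copied from the question, built inline for the example
csv_lines <- c(
  "Company,Product,Users",
  "MSFT,Office,1000",
  "MSFT,VS,4000",
  "GOOG,gmail,3203",
  "GOOG,appengine,45454",
  "MSFT,Windows,1500",
  "APPL,iOS,6000",
  "APPL,iCloud,3442"
)
sample_data <- read.csv(text = paste(csv_lines, collapse = "\n"),
                        stringsAsFactors = FALSE)

result <- rank_company_base(sample_data, 1)
print(result)
```

As in the original function, a company with fewer than num products contributes a row of NA values, because indexing a data frame past its last row returns NA.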