 

Merging a list of data frames into a single data frame, or avoiding it altogether

Tags:

r

I have a dataset like:

Company,Product,Users
MSFT,Office,1000
MSFT,VS,4000
GOOG,gmail,3203
GOOG,appengine,45454
MSFT,Windows,1500
APPL,iOS,6000
APPL,iCloud,3442
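To try the code here without a `company.csv` file, the same rows can be read from an inline string. This is a small sketch: the variable name `d` is an assumption, chosen to match the data argument used in the dplyr answer.

```r
# Build the sample dataset in memory instead of reading company.csv
d <- read.csv(text = "Company,Product,Users
MSFT,Office,1000
MSFT,VS,4000
GOOG,gmail,3203
GOOG,appengine,45454
MSFT,Windows,1500
APPL,iOS,6000
APPL,iCloud,3442", stringsAsFactors = FALSE)

# Inspect the structure: Company and Product are character, Users is integer
str(d)
```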

I'm writing a function that returns a data frame with the nth product for each company, ranked by "Users", so the output of rankcompany(1) should be:

     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000

The function looks like:

rankcompany <- function(num = 1) {

    # Read the data file
    company_data <- read.csv("company.csv", stringsAsFactors = FALSE)

    # Split into a list of data frames, one per company
    split_data <- split(company_data, company_data$Company)

    # Sort each company's rows by descending Users (ties broken by Product)
    # and keep the nth row
    selected <- lapply(split_data, function(df) {
        df <- df[order(-df$Users, df$Product), ]
        df[num, ]
    })

    # Compose the output data frame
    # This part needs to be smarter??
    len <- length(selected)
    selected_df <- data.frame(Company = character(len), Product = character(len),
                              Users = integer(len), stringsAsFactors = FALSE)
    row.names(selected_df) <- names(selected)

    # Copy each company's selected row into the output
    for (n in names(selected)) {
        selected_df[n, ] <- selected[[n]][1, ]
    }

    selected_df
}

I split the input data frame into a list, sort each piece and select its nth row, and then merge the results into the output data frame "selected_df".

I'm new to R, and I think the merging can be done in a smarter way. Or should I avoid splitting in the first place? Any suggestions?

Thanks

Asked Dec 11 '22 03:12 by Hesham Amin


2 Answers

You can do this much more simply with dplyr:

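library(dplyr)

rankcompany <- function(d, num = 1) {
   d %>%
      group_by(Company) %>%       # one group per company
      arrange(desc(Users)) %>%    # highest Users first
      slice(num)                  # keep the nth row within each company
}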

Then you can call:

rankcompany(d,2)

or, using the pipe:

d %>% rankcompany(1)
Answered Apr 25 '23 19:04 by juba
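If you would rather not add a dplyr dependency, the same "sort, then take the nth row per company" idea can be sketched in base R without calling `split()` at all. This is an alternative technique, not part of the answer above: it sorts once with `order()`, numbers the rows within each company with `ave(..., FUN = seq_along)`, and keeps the rows whose within-company position equals `num`. The function name `rankcompany_base` is hypothetical.

```r
# Base-R sketch: nth product per company without split()
# rankcompany_base is a hypothetical name, not from either answer
rankcompany_base <- function(d, num = 1) {
  # Sort once: by company, then descending Users, ties broken by Product
  d <- d[order(d$Company, -d$Users, d$Product), ]
  # Position of each row within its company after sorting (1, 2, 3, ...)
  pos <- ave(seq_len(nrow(d)), d$Company, FUN = seq_along)
  # Keep the num-th row of each company; companies with fewer rows drop out
  out <- d[pos == num, ]
  rownames(out) <- out$Company
  out
}

d <- read.csv(text = "Company,Product,Users
MSFT,Office,1000
MSFT,VS,4000
GOOG,gmail,3203
GOOG,appengine,45454
MSFT,Windows,1500
APPL,iOS,6000
APPL,iCloud,3442", stringsAsFactors = FALSE)

rankcompany_base(d, 1)
```

For `num = 1` this returns the same three rows as the expected output in the question (APPL/iOS, GOOG/appengine, MSFT/VS). Like `slice(num)`, a company with fewer than `num` products simply has no row in the result.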


Based on the comment from @DMT, I replaced the merging code with:

    # rbindlist() comes from the data.table package
    selected_df <- data.table::rbindlist(selected)
    selected_df <- as.data.frame(selected_df)
    row.names(selected_df) <- names(selected)
    selected_df

And it works fine.

Answered Apr 25 '23 20:04 by Hesham Amin
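The `rbindlist()` fix needs the data.table package. Base R can do the same combine step with `do.call(rbind, selected)`, keeping the question's split/sort/select structure unchanged. The sketch below is an alternative, not the approach from either answer, and the function name `rankcompany_rbind` is hypothetical. Because each list element is a single row, `rbind()` uses the list names (the companies) as row names, so the separate `row.names()` assignment is not needed.

```r
# Same split/sort/select steps as the question; the combine step uses
# base R's do.call(rbind, ...) instead of data.table::rbindlist()
# rankcompany_rbind is a hypothetical name
rankcompany_rbind <- function(d, num = 1) {
  split_data <- split(d, d$Company)
  selected <- lapply(split_data, function(df) {
    df <- df[order(-df$Users, df$Product), ]
    df[num, ]  # an all-NA row if the company has fewer than num products
  })
  # rbind() the named list of one-row data frames into one data frame;
  # for one-row pieces, the list names (companies) become the row names
  do.call(rbind, selected)
}

d <- read.csv(text = "Company,Product,Users
MSFT,Office,1000
MSFT,VS,4000
GOOG,gmail,3203
GOOG,appengine,45454
MSFT,Windows,1500
APPL,iOS,6000
APPL,iCloud,3442", stringsAsFactors = FALSE)

rankcompany_rbind(d, 2)
```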