 

Grouping query into group and subgroup

I want to group my data, using either SQL or R, so that I get the top 10 and bottom 10 Subarea_codes for each Company and Area_code. In other words: within each Area_code, the Subarea_codes where each Company has its largest or smallest Result.

data.csv

Area_code  Subarea_code  Company   Result
10         101           A         15
10         101           P         10
10         101           C         4
10         102           A         10
10         102           P         8
10         102           C         5
11         111           A         15
11         111           P         20
11         111           C         5
11         112           A         10
11         112           P         5
11         112           C         10


result.csv should look like this:

Company   Area_code  Largest_subarea_code  Largest_result  Smallest_subarea_code  Smallest_result
A         10         101                   15              102                    10
P         10         101                   10              102                    8
C         10         102                   5               101                    4
A         11         111                   15              112                    10
P         11         111                   20              112                    5
C         11         112                   10              111                    5

Within each Area_code there can be hundreds of Subarea_codes, but I only want the top and bottom 10 for each Company.

This doesn't have to be solved in a single query; it can be split into two, with the smallest values in results_10_smallest and the largest in result_10_largest. I'm hoping, though, to produce each of those results with one query.
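The "top and bottom 10 per Company and Area_code" requirement amounts to ranking Results inside each Company/Area_code group and keeping ranks 1 to 10 in each direction. A minimal base-R sketch of that idea, assuming the data is already in a data frame; `top_bottom_n` is a hypothetical helper name, and ties are broken by row order (`ties.method = "first"`) so each group contributes at most n rows:

```r
# Sample data from data.csv, inlined so the sketch runs on its own
df <- read.table(text = "Area_code Subarea_code Company Result
10 101 A 15
10 101 P 10
10 101 C 4
10 102 A 10
10 102 P 8
10 102 C 5
11 111 A 15
11 111 P 20
11 111 C 5
11 112 A 10
11 112 P 5
11 112 C 10", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: keep the n largest and n smallest Results
# for every Company/Area_code combination
top_bottom_n <- function(d, n = 10) {
  # Rank within each group; "first" breaks ties by row order,
  # so every group contributes at most n rows
  rank_in_group <- function(x) rank(x, ties.method = "first")
  desc_rank <- ave(-d$Result, d$Company, d$Area_code, FUN = rank_in_group)
  asc_rank  <- ave(d$Result,  d$Company, d$Area_code, FUN = rank_in_group)

  largest  <- d[desc_rank <= n, ]
  smallest <- d[asc_rank  <= n, ]
  list(
    largest  = largest[order(largest$Area_code, largest$Company, -largest$Result), ],
    smallest = smallest[order(smallest$Area_code, smallest$Company, smallest$Result), ]
  )
}

result_10_largest   <- top_bottom_n(df, n = 10)$largest
results_10_smallest <- top_bottom_n(df, n = 10)$smallest

# With n = 1 the rows correspond to the Largest/Smallest columns of result.csv
top1 <- top_bottom_n(df, n = 1)
```

With the sample data every group has only two Subarea_codes, so n = 10 keeps all rows; the n = 1 call is the easiest way to check the output against result.csv.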

What I've tried:

SELECT Company, Area_code, Subarea_code, MAX(Result) 
    AS Max_result
FROM data
GROUP BY Subarea_code
ORDER BY Company
;

This gives me the Company with the highest Result within each Subarea_code instead of per Company and Area_code; for the data above that is A, A, P, and A or C (tied at 10).
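The grouping key decides the grain of the output. A small base-R sketch (using `aggregate`, with illustrative variable names) contrasting grouping by Subarea_code alone with grouping by Company and Area_code, which is the grain result.csv asks for:

```r
df <- read.table(text = "Area_code Subarea_code Company Result
10 101 A 15
10 101 P 10
10 101 C 4
10 102 A 10
10 102 P 8
10 102 C 5
11 111 A 15
11 111 P 20
11 111 C 5
11 112 A 10
11 112 P 5
11 112 C 10", header = TRUE, stringsAsFactors = FALSE)

# One row per Subarea_code (4 rows): the Company dimension is collapsed away
by_subarea <- aggregate(Result ~ Subarea_code, data = df, FUN = max)

# One row per Company per Area_code (6 rows): the grain of result.csv
by_company_area <- aggregate(Result ~ Company + Area_code, data = df, FUN = max)
```

The second table still lacks the Subarea_code where each maximum occurs; that has to be recovered by joining back to the original rows.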

asked Dec 14 '22 by user3471881


1 Answer

Using the sqldf package:

df <- read.table(text="Area_code  Subarea_code  Company   Result
10         101           A         15
10         101           P         10
10         101           C         4
10         102           A         10
10         102           P         8
10         102           C         5
11         111           A         15
11         111           P         20
11         111           C         5
11         112           A         10
11         112           P         5
11         112           C         10", header=TRUE)

library(sqldf)
# Largest Result per Company and Area_code, joined back to df
# to recover the Subarea_code where that Result occurs
mymax <- sqldf("select Company,
                  Area_code,
                  max(Result) Max_result
                  from df
                  group by Company, Area_code")
mymaxres <- sqldf("select d.Company,
                          d.Area_code,
                          d.Subarea_code Largest_subarea_code,
                          d.Result Largest_result
                  from df d, mymax m
                  where d.Company=m.Company and
                        d.Area_code=m.Area_code and
                        d.Result=m.Max_result")

# Same for the smallest Result
mymin <- sqldf("select Company,
                  Area_code,
                  min(Result) Min_result
                  from df
                  group by Company, Area_code")
myminres <- sqldf("select d.Company,
                          d.Area_code,
                          d.Subarea_code Smallest_subarea_code,
                          d.Result Smallest_result
                  from df d, mymin m
                  where d.Company=m.Company and
                        d.Area_code=m.Area_code and
                        d.Result=m.Min_result")

# Put largest and smallest side by side, one row per Company and Area_code
result <- sqldf("select a.*, b.Smallest_subarea_code, b.Smallest_result
                from mymaxres a, myminres b
                where a.Company=b.Company and
                      a.Area_code=b.Area_code")
answered Jan 05 '23 by zx8754
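For reference, a minimal base-R sketch of the same one-row-per-group approach without sqldf: take the max and min Result per Company and Area_code with `aggregate`, then `merge` back to the data to recover the Subarea_codes. Like the sqldf queries above, it returns only the single largest and smallest per group (not the top/bottom 10), and tied Results would produce extra rows. Variable names are illustrative:

```r
df <- read.table(text = "Area_code Subarea_code Company Result
10 101 A 15
10 101 P 10
10 101 C 4
10 102 A 10
10 102 P 8
10 102 C 5
11 111 A 15
11 111 P 20
11 111 C 5
11 112 A 10
11 112 P 5
11 112 C 10", header = TRUE, stringsAsFactors = FALSE)

# Largest and smallest Result per Company and Area_code
mx <- aggregate(Result ~ Company + Area_code, data = df, FUN = max)
mn <- aggregate(Result ~ Company + Area_code, data = df, FUN = min)

# Join back to df to recover the Subarea_code where each extreme occurs
largest  <- merge(mx, df, by = c("Company", "Area_code", "Result"))
smallest <- merge(mn, df, by = c("Company", "Area_code", "Result"))

names(largest)[names(largest) == "Subarea_code"]   <- "Largest_subarea_code"
names(largest)[names(largest) == "Result"]         <- "Largest_result"
names(smallest)[names(smallest) == "Subarea_code"] <- "Smallest_subarea_code"
names(smallest)[names(smallest) == "Result"]       <- "Smallest_result"

# Side by side, in the column order of result.csv
result <- merge(largest, smallest, by = c("Company", "Area_code"))
result <- result[order(result$Area_code, result$Company),
                 c("Company", "Area_code", "Largest_subarea_code", "Largest_result",
                   "Smallest_subarea_code", "Smallest_result")]
```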