
How to use ggplot to group and show top X categories?

Tags: r, ggplot2

I am trying to use ggplot to plot production data by company, using the color of each point to designate the year. The following chart shows an example based on sample data: [chart image]

However, my real data often has 50-60 different companies, which makes the company names on the Y axis tightly packed and not very aesthetically pleasing.

What is the easiest way to show data for only the top 5 companies (ranked by 2011 quantities), with the rest aggregated and shown as "Other"?

Below is some sample data and the code I have used to create the sample chart:

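# load ggplot2, which provides ggplot() and geom_point()
library(ggplot2)

# create some sample data: ten companies, one quantity per company per year
c=c("AAA","BBB","CCC","DDD","EEE","FFF","GGG","HHH","III","JJJ")

# 2010 quantities
q=c(1,2,3,4,5,6,7,8,9,10)
y=c(2010)
df1=data.frame(Company=c, Quantity=q, Year=y)

# 2011 quantities
q=c(3,4,7,8,5,14,7,13,2,1)
y=c(2011)
df2=data.frame(Company=c, Quantity=q, Year=y)

# stack both years into one long data frame
df=rbind(df1, df2)

# create plot: one point per company and year, colored by year
p=ggplot(data=df,aes(Quantity,Company))+
  geom_point(aes(color=factor(Year)),size=4)
p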

I started down the path of a brute-force approach, but thought there is probably a simple and elegant way to do this that I should learn. Any assistance would be greatly appreciated.

MikeTP asked Jan 16 '23 21:01


1 Answer

What about this:

    df2011 <- subset (df, Year == 2011)
    companies <- df2011$Company [order (df2011$Quantity, decreasing = TRUE)]
    ggplot (data = subset (df, Company %in% companies [1 : 5]), 
            aes (Quantity, Company)) +
            geom_point (aes (color = factor (Year)), size = 4)

BTW: in order for the code to be called elegant, spend a few more spaces, they aren't that expensive...
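The snippet above keeps only the top five companies and drops the rest, so it does not yet produce the aggregated "Other" row the question asks for. One possible way to add that, as a rough sketch using base R's `ifelse` and `aggregate`, is shown here. The names `top5`, `Group` and `agg` are made up for illustration, and the sample data is rebuilt from the question so the block runs on its own.

```r
library(ggplot2)

# Sample data from the question, rebuilt so this block is self-contained
companies <- c("AAA","BBB","CCC","DDD","EEE","FFF","GGG","HHH","III","JJJ")
df <- rbind(
  data.frame(Company = companies, Quantity = 1:10, Year = 2010),
  data.frame(Company = companies, Quantity = c(3,4,7,8,5,14,7,13,2,1), Year = 2011)
)

# Rank companies by their 2011 quantity and keep the five largest
df2011 <- subset(df, Year == 2011)
top5 <- as.character(df2011$Company[order(df2011$Quantity, decreasing = TRUE)][1:5])

# Relabel every company outside the top five as "Other" (illustrative column name)
df$Group <- ifelse(df$Company %in% top5, as.character(df$Company), "Other")

# Sum quantities per group and year, so "Other" becomes one row per year
agg <- aggregate(Quantity ~ Group + Year, data = df, FUN = sum)

# Fix the level order: "Other" at the bottom of the axis, largest company at the top
agg$Group <- factor(agg$Group, levels = c("Other", rev(top5)))

p <- ggplot(agg, aes(Quantity, Group)) +
  geom_point(aes(color = factor(Year)), size = 4) +
  labs(y = "Company", color = "Year")
p
```

With real data, a tie at the fifth position would be broken by whatever order `order()` returns, so a different tie rule may be needed if that matters.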

cbeleites unhappy with SX answered Jan 19 '23 12:01