
ggplot2's mpg dataset--what's the "fl." variable?

Tags: r, ggplot2, dataset

In ggplot2's built-in mpg dataset there is a variable called "fl", which is a factor with the levels "c", "d", "e", "p", and "r".

Does anyone know what those letters are supposed to stand for? Needless to say, googling those letters has yet to give me any relevant leads...

library(ggplot2)
data(mpg)
str(mpg)
?mpg

[Note: There was a similar question on SO re: the mtcars dataset, which gave me the impression that this would be an appropriate forum for this sort of question.]

Steve S asked Aug 28 '14 12:08




1 Answer

It is the fuel type:

  • e: ethanol E85 (note that subset(mpg, fl == "e") pulls up only "new" American cars, and that their fuel economy is much lower than that of the corresponding, presumably gasoline, models, which lines up with the lower energy content of ethanol)
  • d: diesel
  • r: regular
  • p: premium
  • c: CNG (as far as I know, the Civic is basically the only passenger car that runs on CNG in the US)
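The claims in the list above can be checked directly against the data. The following is only a rough sketch: the object names fl_counts and rare are made up for illustration, and it assumes the ggplot2 package is installed so that mpg is available.

```r
library(ggplot2)  # provides the mpg dataset

# How many rows carry each fuel code?
fl_counts <- table(mpg$fl)
print(fl_counts)

# List the cars behind the rarer codes, to see which models and
# model years they correspond to ("rare" is an illustrative name).
rare <- subset(mpg, fl %in% c("c", "d", "e"),
               select = c(manufacturer, model, year, cyl, hwy, fl))
print(as.data.frame(rare[order(rare$fl), ]))
```

Looking at the manufacturer, model, and year columns for the "e" and "c" rows is what the guesses about E85 and CNG rest on.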

Note that I have no basis for this other than an educated guess from the rest of the data, but here is some graphical evidence:

ggplot(mpg, aes(x=fl, y=hwy)) + geom_boxplot() + facet_wrap(~cyl, nrow=1)

[Figure: boxplots of highway mpg (hwy) by fuel type (fl), one facet per number of cylinders (cyl)]

Notice how, within each cylinder category (the facets are the number of cylinders), e is consistently low and d is consistently high, at least where there is more than one data point (diesel has a higher energy content). Likewise, p is consistently higher than r (premium allows cars to run at higher compression ratios and efficiency, though premium actually has a lower energy content than regular).
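For a numeric view of the same comparison, the medians behind the boxplots can be tabulated along with the number of cars in each group. This is a sketch using base R's aggregate and merge; the names med, n, and summary_tbl are made up for illustration.

```r
library(ggplot2)  # provides the mpg dataset

# Median highway mpg for each fuel code within each cylinder count
med <- aggregate(hwy ~ fl + cyl, data = mpg, FUN = median)
names(med)[3] <- "median_hwy"

# Group sizes, so that single-car groups can be discounted
n <- aggregate(hwy ~ fl + cyl, data = mpg, FUN = length)
names(n)[3] <- "n"

summary_tbl <- merge(med, n, by = c("fl", "cyl"))
print(summary_tbl[order(summary_tbl$cyl, summary_tbl$fl), ])
```

The n column shows which fuel and cylinder combinations rest on only one or two cars, which is where the pattern should be read with caution.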


UPDATE: as per @naught101, this now appears to be documented.

BrodieG answered Sep 30 '22 13:09
