
R - graph frequency of observations over time with small value range

I'm trying to graph the frequency of observations over time. I have a dataset in which hundreds of laws are coded 0-3, and I'd like to know whether outcomes 2-3 occur more often as time progresses. Here is a sample of mock data:

Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

If I plot

plot(Data$year, Data$score)

I get a checkered grid where every single spot is filled in, but I can't tell which combinations occur more often. Is there a way to color each point, or change its size, by the number of observations for a given score/year combination?

A few notes may help in answering the question:

1) I don't know how to sample data so that certain numbers occur more frequently than others; my sampling procedure draws equally from all numbers. If there is a better way to create my reproducible data so that it reflects more high scores in later years, I would like to know how.

2) This seemed like it would be best visualized as a scatter plot, but I could be wrong. I'm open to other visualizations.
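For note 1), a minimal sketch: sample() accepts a prob argument that weights the values being drawn, so for example sample(1:4, 200, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1)) draws 1 most often. To build in a trend over time, the weights can be made to depend on each law's year. The weighting scheme below (w, running from 1 in 1998 to 7 in 2004) is a hypothetical choice for illustration only.

```r
set.seed(123)
n <- 200
year <- sample(1998:2004, n, replace = TRUE)

# For each law, draw one score from 1:4 with weights that depend on its year.
# Hypothetical weighting: w runs from 1 (1998) to 7 (2004), so score 1
# becomes rarer and score 4 more common in later years.
score <- vapply(year, function(y) {
  w <- y - 1997
  sample(1:4, 1, prob = c(8 - w, 4, 4, w))
}, integer(1))

Data <- data.frame(year = year, score = score)

# Mean score per year; it should tend to rise over time
tapply(Data$score, Data$year, mean)
```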

Thanks!

asked Dec 23 '14 at 19:12 by tom




2 Answers

Here's how I would approach this (hope this is what you need)

Create the data (Note: when using sample in questions, always use set.seed too so it will be reproducible)

set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

Find the frequencies of each score per year using table()

Data2 <- as.data.frame.matrix(table(Data))
Data2$year <- row.names(Data2)

Use melt to convert it back to long format

library(reshape2)
Data2 <- melt(Data2, "year")

Plot the data, using a different color for each score and scaling point size by frequency

library(ggplot2)
ggplot(Data2, aes(year, variable, size = value, color = variable)) +
  geom_point()


Alternatively, you could use both fill and size to describe frequency, with something like:

ggplot(Data2, aes(year, variable, size = value, fill = value)) +
  geom_point(shape = 21)

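If reshape2 and ggplot2 are not wanted, a base-R sketch is also possible (an alternative, not part of this answer): as.data.frame(table(Data)) already returns the counts in long format (columns year, score and Freq), so the melt() step is not needed, and plot() can scale point size through cex. The factor of 3 and the square root (so that point area, rather than diameter, tracks the count) are arbitrary choices.

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# One row per year/score combination, with its count in Freq
counts <- as.data.frame(table(Data))

# table() turns the columns into factors; convert them back to numbers for plot()
counts$year <- as.numeric(as.character(counts$year))
counts$score <- as.numeric(as.character(counts$score))

# cex scales the symbol's diameter, so sqrt() makes point area proportional to count
plot(counts$year, counts$score,
     cex = 3 * sqrt(counts$Freq / max(counts$Freq)),
     pch = 19, xlab = "year", ylab = "score")
```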

answered Feb 07 '23 at 22:02 by David Arenburg



Here's another approach:

library(ggplot2)
ggplot(Data, aes(year)) + geom_histogram(aes(fill = after_stat(count)), binwidth = 1) + facet_wrap(~ score)


Each facet represents one score value, as shown in the facet's title. You can get a feel for the counts from the height of the bars and from their colour (lighter blue indicates more counts).
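A rough base-R analogue of the facetted histogram, assuming ggplot2 is not available, draws one bar chart per score with par(mfrow = c(2, 2)). The 2 x 2 layout assumes exactly four score values, as in the mock data.

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# One panel per score, similar in spirit to facet_wrap(~ score)
op <- par(mfrow = c(2, 2))
for (s in sort(unique(Data$score))) {
  # factor() with fixed levels keeps years with zero counts on the axis
  yr_counts <- table(factor(Data$year[Data$score == s], levels = 1998:2004))
  barplot(yr_counts, main = paste("score", s), xlab = "year", ylab = "count")
}
par(op)  # restore the previous plotting layout
```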


Of course, you could also restrict this to scores 2 and 3 (score %in% 2:3) if you don't want scores 1 and 4 included. In that case, you could do:

ggplot(Data[Data$score %in% 2:3, ], aes(year)) +
     geom_histogram(aes(fill = after_stat(count)), binwidth = 1) + facet_wrap(~ score)
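Since the underlying question is whether scores 2 and 3 become more common over time, another option (a supplementary base-R sketch, not from either answer) is to plot the yearly share of laws with those scores. With the uniform mock data the share should hover around 0.5; with real data, an upward trend would suggest those outcomes are becoming more frequent.

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# Proportion of laws in each year whose score is 2 or 3
# (mean of a logical vector = share of TRUE values)
share <- tapply(Data$score %in% 2:3, Data$year, mean)

plot(as.numeric(names(share)), share, type = "b",
     ylim = c(0, 1), xlab = "year", ylab = "share of scores 2-3")
```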
answered Feb 07 '23 at 21:02 by talat
