
scatter plot for sorted data in R

Tags:

r

I have the following data:

subject = c("S01","S02","S03","S04","S05","S06","S07","S08","S09","S10")
post    = c(100,80,75,120,85,90,95,90,110,100)
pre     = c(45,60,80,75,45,60,55,50,35,40)
data1 = as.data.frame(cbind(subject, post, pre))

Then I sorted the data based on the post column:

data1 = data1[order(data1$post),]

What I want in the end is a scatter plot comparing the post and pre columns, each in a different color. The X axis is simply the index of the data frame, but labeled with the subject number, so the labels will appear in the order the subjects take once the data frame is sorted by the post column.

If I do this:

plot(data1$post)

What I get is a bar chart, not a scatter plot. Is this caused by the post column being a factor? I tried "as.numeric" on both the post and pre columns, but the result is the same.

If I do this:

plot(data1$post,data1$pre)

I have a scatter plot, but the index goes from 1 to 20. So instead of having a comparison scatter on the same index 1 to 10, I have two scatters with index from 1-10 and 11-20.

Any help to point out my mistakes will be greatly appreciated.

ery asked Oct 09 '22 06:10



1 Answer

It is not really correct to call this a "scatterplot"; one of the variables is categorical and the values are paired. It's really a variant of a dotplot. The practice of using as.data.frame(cbind(.)) has created a data monstrosity.

> data1
   subject post pre
1      S01  100  45
10     S10  100  40
9      S09  110  35
4      S04  120  75
3      S03   75  80
2      S02   80  60
5      S05   85  45
6      S06   90  60
8      S08   90  50
7      S07   95  55

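All of those columns are factors rather than numeric, as was clearly intended. Because subject is character, cbind() builds a character matrix, and as.data.frame() then converts each column to a factor (the default before R 4.0.0; in later versions the columns stay character). That is also why the rows above are sorted as strings, with "100" ahead of "75".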
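A quick way to check the column types, sketched on a three-row subset of the question's data. The names m, bad and good are made up for this illustration, and stringsAsFactors = TRUE is passed explicitly only to reproduce the pre-4.0.0 default on any R version:

```r
subject <- c("S01", "S02", "S03")
post    <- c(100, 80, 75)
pre     <- c(45, 60, 80)

# cbind() on mixed character/numeric vectors yields a character matrix
m <- cbind(subject, post, pre)
is.character(m)

# Assumption: force the old default so the factor behaviour shows on R >= 4.0.0 too
bad <- as.data.frame(m, stringsAsFactors = TRUE)
sapply(bad, class)                   # every column is a factor

# data.frame() on the original vectors keeps the numeric columns numeric
good <- data.frame(subject, post, pre)
sapply(good[c("post", "pre")], class)

# as.numeric() on a factor returns the internal level codes, not the values;
# converting through character recovers the original numbers
as.numeric(bad$post)
as.numeric(as.character(bad$post))
```

The last two lines show why calling as.numeric() directly on the factor columns does not restore the scores: the values have to go through as.character() first, or better, never become factors at all.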

Use this code instead:

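data1 = data.frame(subject=subject, post=post, pre=pre)  # post and pre stay numeric
data1 = data1[order(data1$post),]
# Plot pre in black; ylim spans both columns so the post points fit as well
plot(data1$pre, type="p", ylim=range(data1$pre,data1$post),
      xaxt="n", ylab="Pre/Post Scores: black=Pre, red=Post")
points(data1$post, col='red')
# Relies on subject being a factor (the data.frame() default before R 4.0.0)
axis(1, at=1:10, labels=levels(data1$subject)[order(post)])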

That last line could have been:

axis(1, at=1:10, labels=as.character(data1$subject)) # since the set was sorted by `post`
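As a variation on the base-graphics code above, here is a minimal self-contained sketch that draws both columns in one call with matplot() and adds a legend. The axis titles, the legend position and the explicit stringsAsFactors = FALSE are choices made for this illustration, not part of the original answer:

```r
subject <- c("S01","S02","S03","S04","S05","S06","S07","S08","S09","S10")
post    <- c(100, 80, 75, 120, 85, 90, 95, 90, 110, 100)
pre     <- c(45, 60, 80, 75, 45, 60, 55, 50, 35, 40)

# Keep subject as character so the labels need no factor handling
data1 <- data.frame(subject, post, pre, stringsAsFactors = FALSE)
data1 <- data1[order(data1$post), ]

idx <- seq_len(nrow(data1))

# matplot() plots each matrix column as its own series against the same x values
matplot(idx, as.matrix(data1[, c("pre", "post")]),
        type = "p", pch = 1, col = c("black", "red"),
        xaxt = "n", xlab = "Subject (sorted by post)", ylab = "Score")
axis(1, at = idx, labels = data1$subject)
legend("topleft", legend = c("Pre", "Post"), col = c("black", "red"), pch = 1)
```

Using seq_len(nrow(data1)) instead of a hard-coded 1:10 keeps the tick positions correct if subjects are added or removed.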


IRTFM answered Oct 12 '22 21:10
