Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Two colour scatter plot in R or in python

I have a dataset of three columns and n number of rows. column 1 contains name, column 2 value1, and column 3 value2 (rank2).

I want to plot a scatter plot with the outlier values displaying names.

The R commands I am using in are:

tiff('scatterplot.tiff')
data<-read.table("scatterplot_data", header=T)
attach(data)
reg1<-lm(A~B)
plot(A,B,col="red")
abline(reg1)
outliers<-data[which(2^(data[,2]-data[,3]) >= 4 | 2^(data[,2]-data[,3]) <=0.25),]

text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50)

dev.off()

and I get a figure like this: enter image description here

What I want is the labels on the lower half should be of one colour and the labels in upper half should be of another colour say green and red respectively.

Any suggestions, or adjustment in the commands?

like image 842
Angelo Avatar asked May 14 '26 01:05

Angelo


2 Answers

You already have a logical test that works to your satisfaction. Just use it in the color spec to text:

     text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50, 
         col=c("blue", "green")[ 
                which(2^(data[,2]-data[,3]) >= 4 ,  2^(data[,2]-data[,3]) <=0.25)] )

It's untested of course because you offered no test case, but my reasoning is that the which() function should return 1 for the differences >= 4, and 2 for the ones <= 0.25, and integer(0) for all the others and that this should give you the proper alignment of color choices with the 'outliers' vector.

like image 153
IRTFM Avatar answered May 16 '26 15:05

IRTFM


Using python, matplotlib (pylab) to plot, and scipy, numpy to fit data. The trick with numpy is to create a index or mask to filter out the results that you want.

EDIT: Want to selectively color the top and bottom outliers? It's a simple combination of both masks that we created:

import scipy as sci
import numpy as np
import pylab as plt

# Create some data
N = 1000
X = np.random.normal(5,1,size=N)
Y = X + np.random.normal(0,5.5,size=N)/np.random.normal(5,.1)
NAMES = ["foo"]*1000 # Customize names here

# Fit a polynomial
(a,b)=sci.polyfit(X,Y,1)

# Find all points above the line
idx = (X*a + b) < Y

# Scatter according to that index
plt.scatter(X[idx],Y[idx], color='r')
plt.scatter(X[~idx],Y[~idx], color='g')

# Find top 10 outliers
err = ((X*a+b) - Y) ** 2
idx_L = np.argsort(err)[-10:]
for i in idx_L:
    plt.text(X[i], Y[i], NAMES[i])

# Color the outliers purple or black
top = idx_L[idx[idx_L]]
bot = idx_L[~idx[idx_L]]

plt.scatter(X[top],Y[top], color='purple')
plt.scatter(X[bot],Y[bot], color='black')

XF = np.linspace(0,10,1000)
plt.plot(XF, XF*a + b, 'k--') 
plt.axis('tight')
plt.show()

enter image description here

like image 29
Hooked Avatar answered May 16 '26 16:05

Hooked