Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Transform data to fit normal distribution

I have a rather easy-to-understand question.

I have a set of data and I want to estimate how good this data fit a standard normal distribution. To do so, I start with my code:

[f_p,m_p] = hist(data,128);
f_p = f_p/trapz(m_p,f_p);

x_th = min(data):.001:max(data);
y_th = normpdf(x_th,0,1);   

figure(1)
bar(m_p,f_p)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off

Fig. 1 will look like the one below:

enter image description here

Easy to see that the fit is quite poor, altough the bell-shape can be spotted. The main problem resides therefore in the variance of my data.

To find out the proper number of occurrances my data-bins should own, I do this:

f_p_th = interp1(x_th,y_th,m_p,'spline','extrap');
figure(2)
bar(m_p,f_p_th)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off

which will result in the following fig. :

enter image description here

Hence, the question is: how can I scale my data-block to match the Gaussian distribution as in Fig.2 ?

CAUTION

I wanna stress the focus on one point: I don't wanna find the best distribution fitting the data; the problem is reversed: starting from my data, I'd like to manipulate it in such a way that,in the end, its distribution reasonably fits the Gaussian one.

Unfortunately, at the moment, I don't have a real idea on how to perform this data "filter", "transform" or "manipulation".

Any support would be welcome.

like image 362
fpe Avatar asked Mar 21 '13 14:03

fpe


1 Answers

Maybe what you are interested in is rank-based inverse normal transformation. Basically you rank the data first and then convert it to a normal distribution:

rank = tiedrank( data );
p = rank / ( length(rank) + 1 ); %# +1 to avoid Inf for the max point
newdata = norminv( p, 0, 1 );
like image 157
yuk Avatar answered Sep 22 '22 14:09

yuk