Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Gaussian Fit on noisy and 'interesting' data set

I have some data (X-ray diffraction) that looks like this:

enter image description here

I want to fit a Gaussian to this data set to get the FWHM of the 'wider' portion. The double peak at around 7 degrees theta is not important information and coming from unwanted sources.

To make myself more clear I want something like this (which I made in paint :) ):

enter image description here

I have tried to script something in python with the following code:

import math
from pylab import *
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
data2=np.loadtxt('FWHM.spc')    
x2,y2=data2[:,0],data2[:,7]
plt.title('Full Width Half Max of 002 Peak')
plt.plot(x2, y2, color='b')
plt.xlabel('$\\theta$', fontsize=10)
plt.ylabel('Intensity', fontsize=10)
plt.xlim([3,11])
plt.xticks(np.arange(3, 12, 1), fontsize=10)
plt.yticks(fontsize=10)

def func(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))

mean = sum(x2*y2)/sum(y2)
sigma2 = sqrt(abs(sum((x2-mean)**2*y2)/sum(y2)))
popt, pcov = curve_fit(func, x2, y2, p0 = [1, mean, sigma2])
ym = func(x2, popt[0], popt[1], popt[2])
plt.plot(x2, ym, c='r', label='Best fit')
FWHM = round(2*np.sqrt(2*np.log(2))*popt[2],4)
axvspan(popt[1]-FWHM/2, popt[1]+FWHM/2, facecolor='g', alpha=0.3, label='FWHM = %s'%(FWHM))
plt.legend(fontsize=10)
plt.show()

and I get the following output:

enter image description here

Clearly, this is way off from what is desired. Does anyone have some tips how I can acheive this?

(I have enclosed the data here: https://justpaste.it/109qp)

like image 872
sci-guy Avatar asked Nov 10 '16 18:11

sci-guy


1 Answers

As mentioned in the OP comments, one of the ways to constrain a signal in presence of unwanted data is to model it together with the desired signal. Of course, this approach is valid only when there is a valid model readily available for those contaminating data. For the data that you provide, one can consider a composite model that sums over the following components:

  1. A linear baseline, because all your points are constantly offset from zero.
  2. Two narrow Gaussian components that will model the double-peaked feature at the central part of your spectrum.
  3. A narrow Gaussian component. This is the one you're actually trying to constrain.

All four components (double peak counts twice) can be fit simultaneusly once you pass a reasonable starting guess to curve_fit:

def composite_spectrum(x, # data
                       a, b, # linear baseline
                       a1, x01, sigma1, # 1st line
                       a2, x02, sigma2, # 2nd line
                       a3, x03, sigma3 ): # 3rd line
    return (x*a + b + func(x, a1, x01, sigma1)
                    + func(x, a2, x02, sigma2)
                    + func(x, a3, x03, sigma3))

guess = [1, 200, 1000, 7, 0.05, 1000, 6.85, 0.05, 400, 7, 0.6]

popt, pcov = curve_fit(composite_spectrum, x2, y2, p0 = guess)
plt.plot(x2, composite_spectrum(x2, *popt), 'k', label='Total fit')
plt.plot(x2, func(x2, *popt[-3:])+x2*popt[0]+popt[1], c='r', label='Broad component')
FWHM = round(2*np.sqrt(2*np.log(2))*popt[10],4)
plt.axvspan(popt[9]-FWHM/2, popt[9]+FWHM/2, facecolor='g', alpha=0.3, label='FWHM = %s'%(FWHM))
plt.legend(fontsize=10)
plt.show()

enter image description here

In a case when the unwanted sources can not be modeled properly, the unwanted discontinuity can be masked out as suggested by Mad Physicist. For a simplest case you can even simply mask out eveything inside [6.5; 7.4] interval.

like image 125
Vlas Sokolov Avatar answered Oct 07 '22 01:10

Vlas Sokolov