I am currently working with some Raman Spectra data, and I am trying to correct my data caused by florescence skewing. Take a look at the graph below:
I am pretty close to achieving what I want. As you can see, I am trying to fit a polynomial in all my data whereas I should really just be fitting a polynomial at the local minimas.
Ideally I would want to have a polynomial fitting which when subtracted from my original data would result in something like this:
Are there any built in libs that does this already?
If not, any simple algorithm one can recommend for me?
There is a python library available for baseline correction/removal. It has Modpoly, IModploy and Zhang fit algorithm which can return baseline corrected results when you input the original values as a python list or pandas series and specify the polynomial degree.
A baseline provides a point of comparison for the more advanced methods that you evaluate later. In this tutorial, you will discover how to implement baseline machine learning algorithms from scratch in Python.
Baseline correction is an important pre-processing technique used to separate true spectroscopic signals from interference effects or remove background effects, stains or traces of compounds, e.g. in 2D gel electrophoresis.
Baseline subtraction is the functional estimation and removal of background noise. Source publication.
I found an answer to my question, just sharing for everyone who stumbles upon this.
There is an algorithm called "Asymmetric Least Squares Smoothing" by P. Eilers and H. Boelens in 2005. The paper is free and you can find it on google.
def baseline_als(y, lam, p, niter=10): L = len(y) D = sparse.csc_matrix(np.diff(np.eye(L), 2)) w = np.ones(L) for i in xrange(niter): W = sparse.spdiags(w, 0, L, L) Z = W + lam * D.dot(D.transpose()) z = spsolve(Z, w*y) w = p * (y > z) + (1-p) * (y < z) return z
The following code works on Python 3.6.
This is adapted from the accepted correct answer to avoid the dense matrix diff
computation (which can easily cause memory issues) and uses range
(not xrange
)
import numpy as np from scipy import sparse from scipy.sparse.linalg import spsolve def baseline_als(y, lam, p, niter=10): L = len(y) D = sparse.diags([1,-2,1],[0,-1,-2], shape=(L,L-2)) w = np.ones(L) for i in range(niter): W = sparse.spdiags(w, 0, L, L) Z = W + lam * D.dot(D.transpose()) z = spsolve(Z, w*y) w = p * (y > z) + (1-p) * (y < z) return z
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With