Swarm - public/papers/2017/A-pseudo-Voigt-component-model-for-high/latex/icassp2017.tex

\documentclass[9pt]{article}
\usepackage{spconf,amsfonts,amsmath,graphicx}
\usepackage[english]{babel} \usepackage[T1]{fontenc} \usepackage[utf8]{inputenc} %\usepackage{cite}
\usepackage{xspace}
\usepackage{breqn}
\usepackage{url}

% definitions
% --------------------
\newcommand{\bs}[1]{\boldsymbol{#1}}
\newcommand{\parm}{{\theta}} \newcommand{\parms}{{\bs{\parm}}}
\newcommand{\Exppdf}[1]{{\mathcal{E}}\left({#1}\right)}
\newcommand{\Gammapdf}[1]{\ensuremath{\mathcal{G}}\left(#1\right)\xspace}
\newcommand{\Normalpdf}[1]{\ensuremath{\mathcal{N}}\left(#1\right)\xspace}
\newcommand{\NormalTpdf}[1]{\ensuremath{\mathcal{N}^T}\left(#1\right)\xspace}
\newcommand{\Uniformpdf}[1]{\ensuremath{U}\left(#1\right)\xspace}

\newcommand{\figdir}{../figures}
\newcommand{\scsum}[2]{{\displaystyle\mathop{\textstyle\sum}_{\makebox[0pt][c]
	{$\scriptscriptstyle #1$}}^{\makebox[0pt][c]{$\scriptscriptstyle #2$}}}}

\title{A pseudo-Voigt component model for high-resolution recovery of constituent spectra in Raman spectroscopy}

\name{
		Tommy S. Alstrøm$^{\star}$,
		Mikkel N. Schmidt$^{\star}$,
		Tomas Rindzevicius$^{\dagger}$,
		Anja Boisen$^{\dagger}$,
		Jan Larsen$^{\star}$
	\thanks{This research was funded by the IDUN Center of Excellence supported by the Danish National Research 
	Foundation (DNRF122) and the Velux Foundations (Grant No. 9301).}}
\address{$^\star$Department of Applied Mathematics and Computer Science, Technical
	University of Denmark\\
	Richard Petersens Plads 324, 2800 Kgs. Lyngby, Denmark \\\\
	$^\dagger$Department of Micro and Nanotechnology, Technical University of Denmark \\
	Ørsteds Plads 345C, DK-2800, Kgs. Lyngby, Denmark\\
}

\begin{document}
%\ninept
%
\maketitle
%
\begin{abstract}
%1. Motivation. Why do we care? \\
Raman spectroscopy is a well-known analytical technique for identifying and analyzing chemical species. Since Raman 
scattering is a weak effect, surface-enhanced Raman spectroscopy (SERS) is often employed to amplify the signal.
%
SERS signal surface mapping is a common method for detecting trace amounts of target molecules. Since the method produce
large amounts of data and, in the case of very low concentrations, low signal-to-noise (SNR) ratio, ability to extract
relevant spectral features is crucial.
%2. Problem formulation. What problem will we solve?\\
%3. Approach. How did we solve it?\\
We propose a pseudo-Voigt model as a constrained source separation model,
%4. Results. What is the answer?\\
that is able to directly and reliably identify the Raman modes, with overall performance similar to the state of the art
non-negative matrix factorization approach.
%5. Conclusions. What are the implications?\\
However, the model provides better interpretation and is a step towards enabling the use of SERS in detection of trace
amounts of molecules in real-life settings.
\end{abstract}
%
\begin{keywords}
Raman Spectroscopy, Non-negative matrix factorization (NMF), Bayesian Modeling, Pseudo-Voigt, Multivariate Curve 
Resolution (MCR)
\end{keywords}
%
\section{Introduction}
\label{sec:intro}
An important physical phenomenon that can be utilized for chemical analysis via molecular ``fingerprinting'' is the 
so-called Raman scattering. A key issue with Raman scattering is that it is a very weak process, i.e., only 0.1\% or 
less incident photons on a sample are inelastically (or Raman) scattered while the rest remain elastically (or 
Rayleigh) scattered. One straightforward method to amplify the Raman scattering signal is using corrugated, nanosized 
noble metal surfaces, and the effect is known as surface-enhanced Raman scattering. SERS was discovered in 1973 
\cite{Fleischmann1974} and explained by Jeanmaire \& Van Duyne in 1977 \cite{Jeanmaire1977a}. The ``new'' chapter for 
SERS begun in the late 1990s when researchers showed that in some cases a strong surface enhancement can even enable 
single molecule detection \cite{Nie1997}.

Today surface-enhanced Raman scattering (SERS) is one of the most relevant and rapidly advancing spectroscopy methods 
for label-free and sensitive detection of target molecules \cite{Schlucker2014}. A good illustration is a record low 
attomolar detection of TNT and DNT explosives that few methods if any can match \cite{Hakonen2015}. Another advantage 
is that Raman instrumentation for SERS-based analysis is becoming better, cheaper, and portable \cite{serstech_com}.
 
A number of challenges in using the SERS technique for quantitative molecular detection in complex bioassays still 
remain \cite{Prochazka2016}. One path for quantitative SERS is using SERS tags and it usually requires development of a 
reliable quantification methodology based on an ensemble of SERS spectra, e.g.\ SERS signal mapping \cite{YPBA13}. 
Mapping of SERS signals is necessary because of generally poor reproducibility and uniformity of SERS substrates, which 
produces significant Raman signal variations. Typically, underlying reasons for SERS signal variations are fabrication 
driven, i.e., it is very difficult to control accurately nanoparticle morphology and especially inter-particle 
separation at <10 nm distances which affect significantly electromagnetic enhancement (the so-called “hot spots”) 
\cite{Petryayeva2011}. Hence, SERS intensities vary between hot spots with even smallest differences in morphology and 
junction dimensions. 

Another issue is that target molecules are deposited or adsorbed at random locations on the SERS substrate surface 
area. Since the coverage is non uniform, significant SERS intensity variations over the substrate area are expected. At 
extremely low analyte concentrations (<nM) the probability to find target molecule exactly at the hot spot is extremely 
low, i.e., most hot spots are unoccupied and only few are expected to produce measurable SERS signals 
\cite{Palla2015a}. 
And finally, each SERS substrate type coupled with a target molecule display unique characteristics in terms of surface 
coverage, electromagnetic enhancement factors, distribution of hot spots, etc. Therefore, the statistical 
interpretation of SERS signals is a crucial problem that needs to be addressed more rigorously. 

When analyzing complex Raman spectra it is important to resolve overlapping line shapes \cite{Weatherston2016}. While
the true lineshape of a vibrational mode can be approximated by a Lorentzian function, the measured lineshape is often
the convolution of a Lorentzian and a Gaussian function \cite{Sundius1973} due to the instrument distortion. Therefore,
to accurately determine line widths and peak positions, Raman spectra are fitted using a Voigt function, which is a
convolution of a Lorentzian with a Gaussian line shape.

Herein, we developed a method to automatically analyze large-area Raman or SERS signal maps for more reliable and 
reproducible identification of Raman modes (analyte molecules). The model is compared to the current state of the art 
approaches of extracting spectra, which are multivariate factor models with non-negativity constraints.

\section{Relation to prior work}
Blind source separation techniques such as NMF has been used for spectral data on numerous occasions
\cite{Sajda2003,Alstrom2014,Li2005}. In the field of general spectroscopy the method of Multivariate Curve Resolution
(MCR) using non-negativity constraints has also been applied~\cite{Tauler1993}. The NMF and MCR models are identical but
the inference algorithms differ. MCR uses principal component analysis (PCA) as initialization and the model is then
estimated using alternating least squares. NMF can also be learned in a number of ways, and was demonstrated for
extracting facial features by Lee and Seung \cite{LeSu99}. In the case of very low signal to noise ratios, blind source
separation algorithms fail to recover meaningful and interpretable spectral components. Solutions involve the
application of additional constraints on the search for basis vectors \cite{Li2007}, or to include some prelearned basis
vectors in the model \cite{Alstrom2014}. Another approach is to fit a power-law distribution at a certain wavenumber
\cite{Palla2015a,YPBA13}. However, this method requires knowledge of the exact location of the Raman mode, and in
addition assumes that the data is free from outliers. The multivariate approach of NMF/MCR does not suffer from these
downsides, so a combination of the two approaches might be ideal. The proposed pseudo-Voigt model is such a combination.

\section{Methods}
Let $X$ denote the observed data: An array of positive real numbers, where $X_{w,n}$ is the measured amplitude at wave 
number $w$ in the observed spectrum $n$. We denote the number of observed wavelengths by $W$ and the number of observed 
spectra by $N$. 

We will compare two approaches for modeling such spectra: 1) A non-negative matrix factorization approach in which the 
spectra are modeled as scaled sums of $K$ basis spectra which are learned from the observed data, and 2) a related 
approach where the basis spectra are constrained to have a pseudo-Voigt shape.

\subsection{Non-negative matrix factorization}
The non-negative matrix factorization approach to modeling spectroscopy data assumes that each observed spectrum can be 
decomposed as a weighted sum of constituent basis spectra \cite{Sajda2003,Alstrom2014,Li2005}. Using a Normal 
observation noise model
\begin{equation}
  X_{w,n}|S,A,\tau \sim \text{N}([SA]_{w,n},\tau^{-1}),
\end{equation}
where $\tau$ is the noise precision, the data array is modeled as product of two matrices, $S$ and $A$, where the $K$ 
columns of $S$ are the basis spectra, and each column of $A$ contain their amplitudes for a given observation spectrum.

Following \cite{ScWH09}, we assume a standard conjugate Gamma prior for the noise precision,
\begin{align}
  \tau \sim \text{Gamma}(a_{\tau_0},b_{\tau_0}),
\end{align}
with shape $a_{\tau_0}$ and rate $b_{\tau_0}$. We use independent exponential priors for both the spectra and their
amplitudes,
\begin{align}
  A_{k,n} \stackrel{i.i.d}{\sim}\text{Exp}(A_0), \quad
  S_{k,n} \stackrel{i.i.d}{\sim}\text{Exp}(S_0),
\end{align}
with rates $A_0$ and $S_0$ respectively.
%
These priors reflect the constraints that the spectra and their amplitudes are non-negative and corresponds to a 
Bayesian sparse NMF model. 

\subsection{Pseudo-Voigt spectral peak model}
As in the NMF approach, we will assume an independent Normal noise model,
\begin{equation}
  X_{w,n}|I_{w,n},\tau \sim \text{N}(I_{w,n},\tau^{-1}),
\end{equation}
with precision $\tau$ and mean $I_{w,n}$ modeled as a sum of $K$ pseudo-Voigt shaped component plus a background 
component,
\begin{equation}
  I_{w,n} = \sum_{k=1}^K \alpha_{k,n} V_w(c_{k},\gamma_k,\eta_k) + \beta_n B_w.
\end{equation}
Here, $V$ is a pseudo-Voigt shaped component centered at wave number $w$ with parameters $c$, $\gamma$, and $\eta$; 
$\alpha_{k,n}$ is a component and observation specific amplitude; $B_w$ is a common background spectrum; and $\beta_n$ 
is a observation specific background amplitude. The pseudo-Voigt \cite{Wertheim1974} profile is given by
\begin{equation}
  V_w(c,\gamma,\eta) = \eta L_w(c, \gamma) + (1-\eta)G_w(c, \gamma),
\end{equation}
where $L_w = \frac{\gamma\pi^{-1}}{(w-c)^2+\gamma^2}$ and $G_w = 
\frac{1}{\sqrt{2\pi}\gamma}\exp\Big(\frac{-(w-c)^2}{2\gamma^2}\Big)$.

As in the NMF approach, we assume a standard conjugate Gamma prior for the noise precision. As priors for the
pseudo-Voigt profile parameters we use the following:
\begin{align}
  \eta_k &\stackrel{i.i.d}{\sim}\text{Unif}(0,1), &
  \alpha_{k,n} &\stackrel{i.i.d}{\sim}\text{Exp}(\alpha_0),\\
  c_k &\stackrel{i.i.d}{\sim} \text{N}^+(\mu_{c_0},\tau^{-1}_{c_0}), &
  \gamma_k &\stackrel{i.i.d}{\sim} \text{N}^+(\mu_{\gamma_0},\tau^{-1}_{\gamma_0}),
\end{align}
where $\text{N}^+$ denotes the Normal distribution truncated to the positive reals. Here  $\eta_k$ is given an vague 
uniform prior, $\alpha_{k,n}$ is exponential to reflect the anticipation that a few observed spectra might contain 
``hot spots'' with high amplitudes, and the $c_k$ and $\gamma_k$ parameters are given fairly flexible truncated Normal 
priors, reflecting that the parameters must be positive while making it possible to choose their hyperparameters to 
reflect locations and widths of the spectral peaks, or in lack of prior knowledge to set them as vague.

The prior for the background component is chosen as
\begin{align}
  \beta_n &\stackrel{i.i.d}{\sim} \text{N}^+(\mu_{\beta_0},\tau^{-1}_{\beta_0}), &
  B_w &\stackrel{i.i.d}{\sim} \text{N}^+(\mu_{B_0},\tau^{-1}_{B_0}),
\end{align}
allowing the hyperparameters to be chosen in accordance with prior knowledge.

\subsection{Inference}
The sparse NMF model is learned using a Gibbs sampler as specified in~\cite{ScWH09}. Following the same approach, we 
develop a similar algorithm for learning the pseudo-Voigt model. The conditional distributions for the parameters with 
conjugate priors ($\bs{\alpha}$, $\bs{\beta}$, $\bs{B}$ and $\tau$) are sampled using Gibbs sampling. First define 
$V_{w,k}=V(w,c_k,\gamma_k,\eta_k)$. The conditional distribution for the weights of the pseudo-Voigt functions is 
proportional to the product a normal distribution and an exponential distribution and the resulting distribution is 
then a truncated Normal distribution limited to positive values with center and scale 
\begin{dgroup}
\begin{dmath}
\alpha_{k,n}\lvert\bs{X},\bs{\alpha}_{\setminus{k,n}},\bs{\beta},\bs{B},\tau \sim 
\text{N}^+(\mu_{\alpha_{k,n}},\tau_{\alpha_{k,n}}^{-1})
\end{dmath}
\begin{dmath}
\tau_{\alpha_{k,n}}^{-1}=\tau\sum_{w=1}^{W}V_{w,k}^2
\end{dmath}	
\begin{dmath}
\mu_{\alpha_{k,n}}=\frac{\!\!\!\tau\scsum{w=1}{W} 
V_{w,k}\!\!\left(\!\!X_{w,n}\!\!-\!b_nB_w\!\!-\!\!\scsum{k'\neq k}{K}a_{k',n}V_{w,k'}\right)\!\!-\!\alpha_0\!\!\!\!}
{\tau\scsum{w=1}{W}V_{w,k}^2}
\end{dmath}	
\end{dgroup} 
The conditional distributions of the background signals are on a similar account also truncated Normals,
\begin{dgroup}
\begin{dmath}
\beta_{n}\lvert\bs{X},\bs{\alpha},\bs{\beta}_{\setminus{n}},\bs{B},\tau \sim 
\text{N}^+(\mu_{\beta_n},\tau_{\beta_n}^{-1})
\end{dmath}
\begin{dmath}
\tau_{\beta_n}^{-1}=\tau_ {\beta_0}+\tau\sum_{w=1}^{W}B_{w}^2
\end{dmath}	
\begin{dmath}
\mu_{\beta_n}=\frac{\tau\scsum{w=1}{W}B_w\left( X_{w,n}-\scsum{k=1}{K}a_{k,n}V_{w,k}\right)+\tau_{\beta_0}\mu_{\beta_0}}
{\tau_{\beta_0}+\tau\scsum{w=1}{W}B_{w}^2}
\end{dmath}	
\end{dgroup}
The conditional distribution of $B_w$ is almost identical due to symmetry in the likelihood. For the precision we get
the standard result
\begin{dgroup}
\begin{dmath}
\tau\lvert\bs{X},\bs{\alpha},\bs{\beta},\bs{B} \sim
\text{Gamma}(a_{\tau},b_{\tau})
\end{dmath}
\begin{dmath}
a_{\tau}=\frac{WN}{2}+a_{\tau_0}
\end{dmath}	
\begin{dmath}
b_{\tau}=b_{\tau_0}+\frac{1}{2}\sum_{w=1}^{W}\sum_{n=1}^{N}(X_{w,n}-I_{w_n})^2
\end{dmath}	
\end{dgroup}
The pseudo-Voigt parameters are sampled using Metropolis Hastings. Expanded derivations and implementation details can 
be found in the supplementary material

\section{Results and discussion}
\begin{figure}[tb]
	\begin{minipage}[b]{1.0\linewidth}
		\centering
		\centerline{\includegraphics[width=8.5cm]{\figdir/simulated_data.pdf}}
		%  \vspace{2.0cm}
		\centerline{(a) Data generated by simulation.}\medskip
	\end{minipage}
	\begin{minipage}[b]{1.0\linewidth}
		\centering
		\centerline{\includegraphics[width=8.5cm]{\figdir/simulated_spectrum.pdf}}
		%  \vspace{2.0cm}
		\centerline{(b) Example spectrum with a SNR=2.7.}\medskip
	\end{minipage}
	\caption{Data generated by simulation. The blue line is an example spectrum with added noise and the dotted red is 
	the true underlying spectrum.}
	\label{fig:simulated}
\end{figure}
The algorithm is tested on simulated data where the true parameters are known. The data is generated using a fixed
pseudo-Voigt profile added to a randomly generated a smooth background added to zero mean Gaussian distributed noise
with precision equal to one. We generated a map of 625 measured spectra at 300 wavenumbers. We created the weights for
the pseudo-Voigt profiles by simulating hot-spot behavior. A single hotspot was randomly placed on a grid and the
weights of this hotspot were calculated using a 2-d Gaussian curve. At the maximum peak location the SNR is 2.7. The 
generated data is shown in Fig.~\ref{fig:simulated}.
\begin{figure}[t!]
\centering
\includegraphics[width=8.5cm]{\figdir/simulated_recovered_spectrum.pdf}
\caption{Spectrum as identified by the pseudo-Voigt model (blue) and the NMF model (red) compared to the true 
underlying signal (dotted orange).}
\label{fig:recovered_spectrum}
\end{figure}
\begin{figure}[t!]
\centering
\includegraphics[width=8.5cm]{\figdir/simulated_background.pdf}
\caption{Background signal recovered by the pseudo-Voigt model (blue) and the NMF model (red) compared to the true 
  underlying background signal (dotted orange).}
\label{fig:recovered_background}
\end{figure}
\begin{figure}[t!]
\centering
\includegraphics[width=8.5cm]{\figdir/simulated_loadings.pdf}
\caption{Intensities as identified by the pseudo-Voigt model (blue) and the NMF model (red) compared to the true 
intensities (dotted orange).}
\label{fig:recovered_loadings}
\end{figure}

The pseudo-Voigt model is able to recover the exact center of the Raman peak, whereas NMF recovers a noisy spectrum 
with the highest intensity located very close to the correct Raman peak (Fig.~\ref{fig:recovered_spectrum}). The most 
important attribute of a Raman peak is the location of the center as it can be directly mapped to a specific molecular 
binding, and thus it is essential to extract the center of a Raman mode reliably.

Recovery of the background signal is displayed in Fig. \ref{fig:recovered_background} where the peak clearly interferes 
with the background signal estimate. The pseudo-Voigt model overestimates the background signal in the region of the 
Raman peak whereas the NMF model underestimates the background signal here. This is further reflected in the recovery 
of the Raman peak intensities which are displayed in Fig. \ref{fig:recovered_loadings}. The presence of a molecule is 
in essence determined by setting a threshold on the intensity. The threshold should ideally be inferred from data but 
setting it to one gives similar performance for both methods.

In the presence of mixtures of peaks with overlap, the posterior densities of both models become multimodal. 
Determining the hyperparameters is thus essential to obtain the solution which makes physical sense. This can easily 
and readily be achieved in the pseudo-Voigt model, as the experimenter can intuitively select potential areas of Raman 
peaks. The algorithm had similar performance for mixtures ($K=3$, not shown) as for a single Raman peak, and was able 
to recover the centers of Raman peaks.

\section{Conclusion}
The pseudo-Voigt approach works well for the simulated data, demonstrating comparable performance to current state of 
the art approaches such as NMF (and MCR). The advantage of the pseudo-Voigt model is that the center of a Raman peak is 
directly estimated as a model parameter. In addition, the pseudo-Voigt model offers a degree of confidence that the 
recovered spectrum has a Voigt shape which means that a true Raman peak has been recovered. Extracting the same 
information from the spectrum recovered by the NMF model requires additional post-processing. 

Supplementary material including code and data can be found at http://archive.compute.dtu.dk/p/1.
\vfill\clearpage\pagebreak 
\bibliographystyle{IEEEbib}
\bibliography{ICASSP2017}

\end{document}
#	Change	User	Description	Committed
#1	13172	tsal	Release of paper package
icassp2017.tex #1