TY - JOUR
T1 - Approximate nonparametric maximum likelihood for mixture models
T2 - A convex optimization approach to fitting arbitrary multivariate mixing distributions
AU - Feng, Long
AU - Dicker, Lee H.
N1 - Funding Information:
The authors are extremely grateful to the referee for their helpful and stimulating comments, leading to an improved exposition. The work of Lee H. Dicker is partially supported by NSF Grants DMS-1208785 and DMS-1454817. The work of Long Feng is supported by National Institute on Drug Abuse Grant R01 DA016750.
Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2018/6
Y1 - 2018/6
N2 - Nonparametric maximum likelihood (NPML) for mixture models is a technique for estimating mixing distributions that has a long and rich history in statistics going back to the 1950s, and is closely related to empirical Bayes methods. Historically, NPML-based methods have been considered to be relatively impractical because of computational and theoretical obstacles. However, recent work focusing on approximate NPML methods suggests that these methods may have great promise for a variety of modern applications. Building on this recent work, a class of flexible, scalable, and easy-to-implement approximate NPML methods is studied for problems with multivariate mixing distributions. Concrete guidance on implementing these methods is provided, with theoretical and empirical support; topics covered include identifying the support set of the mixing distribution and comparing algorithms (across a variety of metrics) for solving the simple convex optimization problem at the core of the approximate NPML problem. Additionally, three diverse real data applications are studied to illustrate the methods’ performance: (i) a baseball data analysis (a classical example for empirical Bayes methods), (ii) high-dimensional microarray classification, and (iii) online prediction of blood-glucose density for diabetes patients. Among other things, the empirical results demonstrate the relative effectiveness of using multivariate (as opposed to univariate) mixing distributions for NPML-based approaches.
KW - Convex optimization
KW - Kiefer–Wolfowitz estimator
KW - Multivariate mixture models
KW - Nonparametric maximum likelihood
UR - http://www.scopus.com/inward/record.url?scp=85041431600&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041431600&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2018.01.006
DO - 10.1016/j.csda.2018.01.006
M3 - Article
AN - SCOPUS:85041431600
VL - 122
SP - 80
EP - 91
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
SN - 0167-9473
ER -