TY - JOUR
T1 - Automating the ABCD method with machine learning
AU - Kasieczka, Gregor
AU - Nachman, Benjamin
AU - Schwartz, Matthew D.
AU - Shih, David
N1 - Funding Information:
We thank Alejandro Gomez Espinosa and Simone Pagan Griso for useful discussions and Simone for additionally providing feedback on the manuscript. We thank Olaf Behnke and Thomas Junk for helpful comments on the manuscript and especially for examples of early uses of the ABCD method. B. N., M. S., and D. S. were supported by the U.S. Department of Energy, Office of Science under Contracts No. DE-AC02-05CH11231, No. DE-SC0013607, and No. DOE-SC0010008, respectively. B. N. also thanks NVIDIA for providing Volta GPUs for neural network training. G. K. acknowledges support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2121 “Quantum Universe” 390833306. D. S. is grateful to LBNL, BCTP, and BCCP for their generous support and hospitality during his sabbatical year.
Publisher Copyright:
© 2021 authors.
PY - 2021/2/22
Y1 - 2021/2/22
N2 - The ABCD method is one of the most widely used data-driven background estimation techniques in high energy physics. Cuts on two statistically independent classifiers separate signal and background into four regions, so that background in the signal region can be estimated simply using the other three control regions. Typically, the independent classifiers are chosen "by hand" to be intuitive and physically motivated variables. Here, we explore the possibility of automating the design of one or both of these classifiers using machine learning. We show how to use state-of-the-art decorrelation methods to construct powerful yet independent discriminators. Along the way, we uncover a previously unappreciated aspect of the ABCD method: its accuracy hinges on having low signal contamination in control regions not just overall, but relative to the signal fraction in the signal region. We demonstrate the method with three examples: a simple model consisting of three-dimensional Gaussians; boosted hadronic top jet tagging; and a recasted search for paired dijet resonances. In all cases, automating the ABCD method with machine learning significantly improves performance in terms of ABCD closure, background rejection, and signal contamination.
AB - The ABCD method is one of the most widely used data-driven background estimation techniques in high energy physics. Cuts on two statistically independent classifiers separate signal and background into four regions, so that background in the signal region can be estimated simply using the other three control regions. Typically, the independent classifiers are chosen "by hand" to be intuitive and physically motivated variables. Here, we explore the possibility of automating the design of one or both of these classifiers using machine learning. We show how to use state-of-the-art decorrelation methods to construct powerful yet independent discriminators. Along the way, we uncover a previously unappreciated aspect of the ABCD method: its accuracy hinges on having low signal contamination in control regions not just overall, but relative to the signal fraction in the signal region. We demonstrate the method with three examples: a simple model consisting of three-dimensional Gaussians; boosted hadronic top jet tagging; and a recasted search for paired dijet resonances. In all cases, automating the ABCD method with machine learning significantly improves performance in terms of ABCD closure, background rejection, and signal contamination.
UR - http://www.scopus.com/inward/record.url?scp=85102032003&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102032003&partnerID=8YFLogxK
U2 - 10.1103/PhysRevD.103.035021
DO - 10.1103/PhysRevD.103.035021
M3 - Article
AN - SCOPUS:85102032003
SN - 2470-0010
VL - 103
JO - Physical Review D
JF - Physical Review D
IS - 3
M1 - 035021
ER -