A wide range of modern operational decisions can be modeled as data driven optimization problems where a controller must act without full knowledge of certain parameters of a model. The basic situation involves a controller who must repeatedly choose an action, resulting in an outcome, from an existing set of potentially different courses of actions. For instance, in adaptive clinical trials, subjects must be sequentially assigned to a treatment or control group, with the outcome being the health of the subject. In this problem context, the controller would like to efficiently manage between 'discovery' (exploration) and 'refinement' (exploitation) based on the revealed sequence of outcomes. Such a strategy has the potential to gather information more efficiently and may lead to improved decision making. This award will support fundamental research that relaxes many of the restrictions of existing models. The results of this project have applications in many different areas, including data driven operations management, online learning and optimization, health care, and adaptive routing. The award will support the participation of talented graduate students in this research, and the PIs will create level course modules to integrate these results in courses on stochastic processes.Traditional assumptions for this framework include independent and identically distributed samples from independent populations, learning the outcome of a trial only upon activation, and unlimited computational resources. The primary goal of this project is to extend the frontiers of knowledge in the following directions: 1) consideration of models constrained by some known relation or property, so that data from one course of actions is potentially informative about any or all other courses of action; 2) investigation of models where the optimization is with respect to more general objectives such as the median, the quantile, or the variability; 3) extension of the underlying theory to models where at each stage a 'context' is made known that modifies the value of the reward in a predetermined fashion; and 4) consideration of models where the distribution of rewards for each course of action may change over time. The research is expected to lead to new theoretical results, efficient algorithms, and analytical tools for collecting and utilizing data in optimal or near optimal ways.
|Effective start/end date||9/1/17 → 8/31/20|
- National Science Foundation (NSF)