Single-pass Streaming Lower Bounds for Multi-armed Bandits Exploration with Instance-sensitive Sample Complexity

Sepehr Assadi, Chen Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

Motivated by applications to process massive datasets, we study streaming algorithms for pure exploration in Stochastic Multi Armed Bandits (MABs). This problem was first formulated by Assadi and Wang [STOC 2020] as follows: A collection of n arms with unknown rewards are arriving one by one in a stream, and the algorithm is only allowed to store a limited number of arms at any point. The goal is to find the arm with the largest reward while minimizing the number of arm pulls (sample complexity) and the maximum number of stored arms (space complexity). Assuming ∆[2] is known, Assadi and Wang designed an algorithm that uses a memory of just one arm and still achieves the sample complexity of O(n/∆2[2]) which is worst-case optimal even for non-streaming algorithms; here ∆[i] is the gap between the rewards of the best and the i-th best arms. In this paper, we extended this line of work to stochastic MABs in the streaming model with the instance-sensitive sample complexity, i.e. the sample complexity of (Equation presented), similar in spirit to Karnin et.al. [ICML 2013] and Jamieson et.al. [COLT 2014] in the classical setting. We devise strong negative results under this setting: our results show that any streaming algorithm under a single pass has to use either asymptotically higher sample complexity than the instance-sensitive bound, or a memory of Ω(n) arms, even if the parameter ∆[2] is known. In fact, the lower bound holds under much stronger assumptions, including the random order streams or the knowledge of all gap parameters (Equation presented). We complement our lower bounds by proposing a new algorithm that uses a memory of a single arm and achieves the instance-optimal sample complexity when all the strong assumptions hold simultaneously. Our results are developed based on a novel arm trapping lemma. This generic complexity result shows that any algorithm to trap the index of the best arm among o(n) indices (but not necessarily to find it) has to use Θ(n/∆2[2]) sample complexity. This result is not restricted to the streaming setting, and to the best of our knowledge, this is the first result that captures the sample-space trade-off for 'trapping' arms in multi-armed bandits, and it can be of independent interest.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
EditorsS. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
PublisherNeural information processing systems foundation
ISBN (Electronic)9781713871088
StatePublished - 2022
Externally publishedYes
Event36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: Nov 28 2022Dec 9 2022

Publication series

NameAdvances in Neural Information Processing Systems
Volume35
ISSN (Print)1049-5258

Conference

Conference36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/TerritoryUnited States
CityNew Orleans
Period11/28/2212/9/22

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Single-pass Streaming Lower Bounds for Multi-armed Bandits Exploration with Instance-sensitive Sample Complexity'. Together they form a unique fingerprint.

Cite this