The complexity of policy evaluation for finite-horizon partially-observable Markov decision processes

Martin Mundhenk, Judy Goldsmith, Eric Allender

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

A partially-observable Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. We consider several flavors of finite-horizon POMDPs. Our results concern the complexity of the policy evaluation and policy existence problems, which are characterized in terms of completeness for complexity classes. We prove a new upper bound for the policy evaluation problem for POMDPs, showing it is complete for Probabilistic Logspace. From this, we prove policy existence problems for several variants of unobservable, succinctly represented MDPs to be complete for NP^PP, a class for which not many natural problems are known to be complete.
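To make the policy evaluation problem concrete, the following is a minimal illustrative sketch (not from the paper): it evaluates a fixed, memoryless, observation-based policy on a tiny made-up finite-horizon POMDP by dynamic programming over state distributions. The model, rewards, and horizon are all hypothetical; the paper's complexity results concern the decision version of this computation.

```python
# Illustrative sketch: expected total reward of a fixed policy on a tiny
# finite-horizon POMDP. All numbers and the model itself are invented.

states = [0, 1]
obs_of = {0: 'a', 1: 'a'}   # "unobservable" flavor: both states look alike

# Transition probabilities T[s][a] -> {next_state: prob}
T = {
    0: {'stay': {0: 0.9, 1: 0.1}, 'go': {0: 0.2, 1: 0.8}},
    1: {'stay': {0: 0.1, 1: 0.9}, 'go': {0: 0.8, 1: 0.2}},
}
# Immediate reward for taking action a in state s
R = {0: {'stay': 0.0, 'go': 1.0}, 1: {'stay': 1.0, 'go': 0.0}}

def evaluate(policy, init_dist, horizon):
    """Expected total reward of a memoryless, observation-based policy
    over the given horizon, by propagating the state distribution."""
    dist = dict(init_dist)
    total = 0.0
    for _ in range(horizon):
        new_dist = {s: 0.0 for s in states}
        for s, p in dist.items():
            a = policy(obs_of[s])            # action depends only on observation
            total += p * R[s][a]             # accumulate expected reward
            for s2, q in T[s][a].items():    # push probability mass forward
                new_dist[s2] += p * q
        dist = new_dist
    return total

# Since both states emit observation 'a', every policy here is effectively
# blind; this one always chooses 'go'.
value = evaluate(lambda o: 'go', {0: 1.0, 1: 0.0}, horizon=3)
print(round(value, 4))  # → 1.88
```

Note that this enumeration runs in time polynomial in the explicit state count, whereas the paper's hardness results for succinctly represented MDPs arise precisely because the state space can be exponentially large in the input size.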

Original language: English (US)
Title of host publication: Mathematical Foundations of Computer Science 1997 - 22nd International Symposium, MFCS 1997, Proceedings
Editors: Igor Privara, Peter Ruzicka
Publisher: Springer Verlag
Pages: 129-138
Number of pages: 10
ISBN (Print): 3540634371, 9783540634379
DOIs
State: Published - 1997
Event: 22nd International Symposium on Mathematical Foundations of Computer Science, MFCS 1997 - Bratislava, Slovakia
Duration: Aug 25 1997 - Aug 29 1997

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1295
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 22nd International Symposium on Mathematical Foundations of Computer Science, MFCS 1997
Country: Slovakia
City: Bratislava
Period: 8/25/97 - 8/29/97

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

