Partially Observable Markov Decision Processes Part II

● Additional reading:
– Anthony R. Cassandra. Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. Ph.D. Thesis. Brown University, Department of Computer Science, Providence, RI, 1998. (This lecture focuses mostly on Chapter 6.)

Issues

● Problem statement:
– If we do not know the current state of the world, what should we do to act optimally?
● Inputs:
– Model of states, actions, observations, transition and emission functions, reward function, initial distribution, and discount factor
● Outputs from approximate algorithms:
– an approximate value function
● Outputs from heuristic algorithms:
– policies that capture the kind of behaviour we want

Point-based Value Iteration (Pineau et al., 2003)

● Fix a set of beliefs
● Maintain a single α-vector per belief
● Update each α-vector according to Bayes’ filter

[Figure: piecewise-linear value function over the belief simplex, with one α-vector anchored at each belief point b_1, b_2, b_3]

$\alpha^{a,z}_{t,i}(s) = R(s,a) + \gamma \sum_{s' \in S} p(s' \mid s, a)\, p(z \mid s')\, \alpha_{t+1,i}(s')$
$\alpha^{a}_{t,i} = \sum_{z \in Z} \operatorname*{argmax}_{\alpha^{a,z}_{t,i}} \left( \alpha^{a,z}_{t,i} \cdot b_i \right)$
$\alpha_{t,i} = \operatorname*{argmax}_{\alpha^{a}_{t,i}} \left( \alpha^{a}_{t,i} \cdot b_i \right)$

(A NumPy sketch of this backup appears at the end of these notes.)

Maximum-Likelihood State Heuristic

● Assume the MDP policy $\pi_{MDP} : S \to A$
● Given belief b:

$\pi_{MLS}(b) = \pi_{MDP}\left( \operatorname*{argmax}_{s \in S} b(s) \right)$

Voting Heuristic

Nourbakhsh, Powers and Birchfield, 1995; Simmons & Koenig, 1996

● Assume the MDP policy
● Given belief b:

$I(a_i, a_j) = \begin{cases} 1 & a_i = a_j \\ 0 & a_i \neq a_j \end{cases}$
$\pi_{AV}(b) = \operatorname*{argmax}_{a \in A} \sum_{s \in S} b(s)\, I(a, \pi_{MDP}(s))$

Q-MDP

Cassandra, Kaelbling, and Littman, 1994

● Assume the optimal MDP Q-function: $Q_{MDP} : S \times A \to \mathbb{R}$
● Given belief b:

$\pi_{Q}(b) = \operatorname*{argmax}_{a \in A} \sum_{s \in S} b(s)\, Q_{MDP}(s, a)$

(Sketches of the MLS, voting, and Q-MDP heuristics appear at the end of these notes.)

Entropy Heuristic

Cassandra, Kaelbling & Kurien, 1996

● Given belief b: gather information when the belief entropy exceeds the threshold κ; otherwise act on the most likely state:

$\pi_{E}(b) = \begin{cases} \operatorname*{argmin}_{a \in A} E_z\left[ H(p(s \mid a, z)) \right] & \text{if } H(b) > \kappa \\ \pi_{MDP}\left( \operatorname*{argmax}_{s} p(s) \right) & \text{otherwise} \end{cases}$

Alternate Dual-Mode Heuristic

● Given belief b:

$\pi_{DM}(b) = \operatorname*{argmax}_{a \in A} \left[ H_{\kappa}(b) \sum_{s \in S} p(s)\, Q_{CU}(s, a) + \left(1 - H_{\kappa}(b)\right) \sum_{s \in S} p(s)\, Q_{MDP}(s, a) \right]$

(Sketches of the entropy and dual-mode heuristics also appear at the end of these notes.)

Discretization

● Discretize belief space
● Convert to belief space MDP
● Solve using conventional DP
● Subject to the “Curse of Dimensionality”

Coastal Navigation

● Represent beliefs using $\tilde{b} = \langle \operatorname*{argmax}_{s} b(s);\ H(b) \rangle$
1. Discretize state-entropy space
2. Compute reward function and transition function
3. Solve belief state MDP

Model Parameters – Reward function R(b̃)

[Figure: a belief distribution p(s) over states s_1, s_2, s_3]

● Back-project to the high-dimensional belief
● Compute expected reward from the belief:

$\tilde{R}(\tilde{b}) = E_b[R(s)] = \sum_{S} p(s)\, R(s)$

(A sketch of this back-projection and expected-reward computation appears at the end of these notes.)

Model Parameters – Use forward model

[Figure: forward simulation of beliefs. In the low-dimensional space, $\tilde{b}_i$ maps to $\tilde{b}_j$ under action a and observation z_1; in the full-dimensional space, $b_i$ (with distribution $p_i(s)$) maps to $b_j$ (with $p_j(s)$). Given a and z, the full-dimensional update is a deterministic process; over observations, the low-dimensional transition is a stochastic process.]

Model Parameters – Use forward model

[Figure: from $\tilde{b}_i$, action a branches to different compressed beliefs ($\tilde{b}_j$, $\tilde{b}_k$, $\tilde{b}_q$) depending on the observation ($z_1$, $z_2$, …)]

$\tilde{T}(\tilde{b}_i, a, \tilde{b}_j) \propto \begin{cases} p(z \mid s)\, b_i(s \mid a) & \text{if } \tilde{b}_j(s) = b_i(s \mid a, z) \\ 0 & \text{otherwise} \end{cases}$

What You Should Know

● Advantages and disadvantages to using POMDPs
● What an optimal policy looks like
● How to compute a policy
● Different heuristics and approximations: how they work and where they may fail
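
As a concrete illustration of the point-based backup above, here is a minimal NumPy sketch. It assumes a small tabular POMDP with arrays `R[s,a]`, `T[a,s,s']`, `O[s',z]` (i.e. p(z|s')), a discount `gamma`, a fixed belief set `B` (one belief per row), and one α-vector per belief in `alphas`; all of these names are illustrative, not from the slides.

```python
import numpy as np

def pbvi_backup(B, alphas, R, T, O, gamma):
    """One point-based backup: compute a new alpha-vector for every belief in B.

    B      : (n_beliefs, n_states)           belief points (rows sum to 1)
    alphas : (n_beliefs, n_states)           current alpha-vectors, one per belief
    R      : (n_states, n_actions)           R[s, a]
    T      : (n_actions, n_states, n_states) T[a, s, s'] = p(s' | s, a)
    O      : (n_states, n_obs)               O[s', z]    = p(z | s')
    gamma  : discount factor
    """
    n_a = R.shape[1]
    n_z = O.shape[1]
    new_alphas = np.empty_like(alphas)

    for i, b in enumerate(B):
        best_val, best_alpha = -np.inf, None
        for a in range(n_a):
            # Start from the immediate reward; note that, unlike the slide's
            # per-(a, z) formula, R(s, a) enters once per action here, which is
            # the form used in Pineau et al. (2003).
            alpha_a = R[:, a].copy()
            for z in range(n_z):
                # g[j, s] = gamma * sum_s' T[a, s, s'] * O[s', z] * alphas[j, s']
                g = gamma * alphas @ (T[a] * O[:, z]).T
                # For this (a, z), keep the projected vector that scores best at b.
                alpha_a += g[np.argmax(g @ b)]
            val = alpha_a @ b
            if val > best_val:
                best_val, best_alpha = val, alpha_a
        new_alphas[i] = best_alpha
    return new_alphas
```

Iterating this backup over a fixed (or periodically expanded) belief set is the core of PBVI; the α-vector set never grows beyond one vector per belief point.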
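The three MDP-based control heuristics (MLS, voting, Q-MDP) can each be written in a few lines. The sketch below assumes a precomputed optimal MDP Q-function `Q[s,a]` and a belief vector `b` over states; again, the names are illustrative.

```python
import numpy as np

def mls_action(b, Q):
    """Maximum-likelihood state: act as the MDP would in the most probable state."""
    s_ml = np.argmax(b)            # most likely state under b
    return np.argmax(Q[s_ml])      # pi_MDP(s_ml)

def voting_action(b, Q):
    """Voting: each state votes for its MDP action with weight b(s)."""
    n_s, n_a = Q.shape
    pi_mdp = np.argmax(Q, axis=1)  # MDP policy, one action per state
    votes = np.zeros(n_a)
    for s in range(n_s):
        votes[pi_mdp[s]] += b[s]
    return np.argmax(votes)

def qmdp_action(b, Q):
    """Q-MDP: maximize the belief-averaged MDP Q-values."""
    return np.argmax(b @ Q)        # argmax_a sum_s b(s) Q_MDP(s, a)
```

All three effectively assume full observability from the next step onward, so none of them will deliberately choose a purely information-gathering action.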
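The entropy-based heuristics can be sketched the same way. In addition to `Q[s,a]` and `b`, the snippet assumes an observation model `O[s,z]`, a transition model `T[a,s,s']`, an entropy threshold `kappa`, and a second value function `Q_cu[s,a]` to use when uncertain (the slides' Q_CU); normalizing the entropy weight by log|S| is my assumption.

```python
import numpy as np

def belief_entropy(b, eps=1e-12):
    return -np.sum(b * np.log(b + eps))

def expected_posterior_entropy(b, a, T, O):
    """E_z[ H(p(s | a, z)) ] when taking action a from belief b."""
    b_pred = b @ T[a]            # predicted state distribution after a
    p_z = b_pred @ O             # marginal observation probabilities
    h = 0.0
    for z in range(O.shape[1]):
        if p_z[z] > 0:
            post = b_pred * O[:, z] / p_z[z]   # Bayes update on observation z
            h += p_z[z] * belief_entropy(post)
    return h

def entropy_heuristic_action(b, Q, T, O, kappa):
    """If the belief is too uncertain, act to reduce entropy; else act like MLS."""
    if belief_entropy(b) > kappa:
        return min(range(T.shape[0]),
                   key=lambda a: expected_posterior_entropy(b, a, T, O))
    return np.argmax(Q[np.argmax(b)])

def dual_mode_action(b, Q, Q_cu):
    """Blend an 'uncertain-mode' value Q_cu with the MDP value by entropy weight."""
    w = belief_entropy(b) / np.log(len(b))     # assumed normalization to [0, 1]
    return np.argmax(w * (b @ Q_cu) + (1.0 - w) * (b @ Q))
```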
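The coastal-navigation reward computation can be sketched similarly. The slides only say "back-project to the high-dimensional belief" without fixing how; the peak-plus-uniform family below is my assumed choice, and `s_ml`, `H_target`, and `R_s` are illustrative names.

```python
import numpy as np

def back_project(s_ml, H_target, n_states, tol=1e-4):
    """Recover a full belief whose most-likely state is s_ml and whose entropy
    is approximately H_target, using an assumed peak-plus-uniform family."""
    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def make_belief(m):
        # Put mass (1 - m) on the most likely state, spread m over the rest.
        b = np.full(n_states, m / (n_states - 1))
        b[s_ml] = 1.0 - m
        return b

    # Entropy grows monotonically in m on [0, (n-1)/n], so bisect on m.
    lo, hi = 0.0, (n_states - 1) / n_states
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if entropy(make_belief(mid)) < H_target:
            lo = mid
        else:
            hi = mid
    return make_belief(0.5 * (lo + hi))

def reward_of_compressed_belief(s_ml, H_target, R_s):
    """R~(b~) = E_b[R(s)] = sum_s b(s) R(s), evaluated on the back-projected belief."""
    b = back_project(s_ml, H_target, len(R_s))
    return float(b @ R_s)
```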