Reinforcement Learning of Active Recognition Behaviors

We show how a concise representation of active recognition behavior (what observations to make to detect a given object) can be derived from hidden-state reinforcement learning techniques. These learning techniques can solve decision process tasks that include perceptual observations, defined formally as Partially Observable Markov Decision Processes (POMDPs). We define recognition within a POMDP context, with an action indicating recognition of the target as well as actions for adjusting the perceptual apparatus or other effectors. An explicit supervised reward signal is provided to the decision process whenever the accept action is performed. With sufficient experience, a memory-based approach to reinforcement learning can find optimal policies that discriminate target from distractor patterns despite considerable perceptual aliasing at any given instant. To avoid perceptual aliasing while learning, all similar experiences are combined when computing the utility of a possible action, including experiences with both target and distractor patterns. By discarding the representation of negative regions of the utility space when learning is complete, and collapsing duplicate representations of positive regions, we obtain a representation similar to an augmented Finite State Machine. We demonstrate the method on the task of recognizing human gestures that occur at multiple spatial scales.
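As a rough illustration of combining similar experiences when computing the utility of an action, the following minimal sketch averages the returns of the stored experiences whose observation histories best match the current one. It is not the authors' algorithm: the names `Experience`, `suffix_match`, and `utility`, the observation symbols "A", "B", "C", and the +1/-1 rewards for the accept action are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record: the observation history seen so far, the action taken,
# and the return (assumed +1 for accepting a target, -1 for a distractor).
Experience = namedtuple("Experience", ["history", "action", "ret"])


def suffix_match(a, b):
    """Count how many trailing observations two histories share."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def utility(memory, history, action, k=2):
    """Average return of the k stored experiences that took `action`
    and whose histories best match `history` (longest shared suffix)."""
    candidates = [e for e in memory if e.action == action]
    if not candidates:
        return 0.0
    candidates.sort(key=lambda e: suffix_match(e.history, history), reverse=True)
    nearest = candidates[:k]
    return sum(e.ret for e in nearest) / len(nearest)


# Toy memory: the target yields observations A then B, the distractor A then C.
# Observation "A" alone is perceptually aliased between the two patterns.
memory = [
    Experience(("A",), "accept", +1.0),      # target, accepted too early
    Experience(("A",), "accept", -1.0),      # distractor, accepted too early
    Experience(("A", "B"), "accept", +1.0),  # target
    Experience(("A", "B"), "accept", +1.0),  # target
    Experience(("A", "C"), "accept", -1.0),  # distractor
    Experience(("A", "C"), "accept", -1.0),  # distractor
]

aliased = utility(memory, ("A",), "accept")
after_b = utility(memory, ("A", "B"), "accept")
after_c = utility(memory, ("A", "C"), "accept")
```

In this toy setting, the estimate for accepting after the aliased observation "A" mixes target and distractor experiences, while the estimates after one further observation separate the two patterns. Only the positive region (here, the history ending in "B") would need to be kept once learning is complete.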
