public class SoftMaxActionSelector extends Object implements ActionSelector
An implementation of the softmax action selection policy.
See:
Sutton, R. S., Barto, A. G., 1998. Reinforcement Learning: An Introduction.
MIT Press.
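This page does not show the implementation, so the following is only a minimal sketch of softmax (Boltzmann) action selection as described in the cited text. It assumes the value estimates for the current state are available as a plain `double[]`. The class name `SoftMaxSketch` and its methods are hypothetical and not part of this library. It uses `java.util.Random` in place of the `cern.jet.random.engine.RandomEngine` held by the real class.

```java
import java.util.Random;

/** Hypothetical illustration only; not the library's SoftMaxActionSelector. */
public class SoftMaxSketch {

    /** Converts value estimates into softmax selection probabilities. */
    public static double[] probabilities(double[] values, double tau) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one action is required");
        }
        if (tau <= 0.0) {
            throw new IllegalArgumentException("tau must be positive");
        }
        // Subtracting the maximum before exponentiating avoids overflow
        // and leaves the resulting probabilities unchanged.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        double[] p = new double[values.length];
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            p[i] = Math.exp((values[i] - max) / tau);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Draws an action index according to the softmax probabilities. */
    public static int sample(double[] values, double tau, Random rng) {
        double[] p = probabilities(values, tau);
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < p.length; i++) {
            cumulative += p[i];
            if (u < cumulative) {
                return i;
            }
        }
        return p.length - 1; // guards against floating-point rounding
    }
}
```

For example, `SoftMaxSketch.sample(new double[] {1.0, 2.0, 3.0}, 1.0, new Random(42))` returns an index from 0 to 2, with the highest-valued action being the most likely choice.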
Modifier and Type | Field and Description |
---|---|
protected cern.jet.random.engine.RandomEngine | prng |
protected double | tau: The "temperature" used to modulate the propensity distribution. |
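The role of `tau` can be written out as a formula. Assuming this class follows the Gibbs (Boltzmann) form given in the cited text, the probability of choosing action a in state s would be as below. The symbols Q(s, a) for the current value estimate and b for the summation index are notation assumed here, not taken from this page.

```latex
\[
  \Pr(a \mid s) \;=\; \frac{\exp\bigl(Q(s,a)/\tau\bigr)}{\sum_{b} \exp\bigl(Q(s,b)/\tau\bigr)},
  \qquad \tau > 0
\]
```

Under this form, a large `tau` makes all actions nearly equally likely, while a `tau` close to zero makes selection nearly greedy with respect to the value estimates.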
Constructor and Description |
---|
SoftMaxActionSelector() |
SoftMaxActionSelector(cern.jet.random.engine.RandomEngine prng, double tau) |
Modifier and Type | Method and Description |
---|---|
int | act(int state, MDPLearner learner): Choose an action according to the current state and the current value estimates for each action. |
cern.jet.random.engine.RandomEngine | getPrng() |
double | getTau() |
void | setPrng(cern.jet.random.engine.RandomEngine prng) |
void | setTau(double tau) |
protected cern.jet.random.engine.RandomEngine prng
protected double tau
public SoftMaxActionSelector()
public SoftMaxActionSelector(cern.jet.random.engine.RandomEngine prng, double tau)
public int act(int state, MDPLearner learner)
Specified by: act in interface ActionSelector
Parameters:
state - The current state of the MDP.
learner - The algorithm used to update the value estimates.
public cern.jet.random.engine.RandomEngine getPrng()
public void setPrng(cern.jet.random.engine.RandomEngine prng)
public double getTau()
public void setTau(double tau)
Parameters:
tau - The "temperature" used to modulate the propensities.
Copyright © 2014. All rights reserved.