net.sourceforge.jabm.learning
Class QLearner

java.lang.Object
  extended by net.sourceforge.jabm.learning.AbstractLearner
      extended by net.sourceforge.jabm.learning.QLearner
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, DiscreteLearner, Learner, MDPLearner, Prototypeable, Resetable, org.springframework.beans.factory.InitializingBean

public class QLearner
extends AbstractLearner
implements MDPLearner, Resetable, org.springframework.beans.factory.InitializingBean, java.io.Serializable, Prototypeable

An implementation of the Q-learning algorithm. This algorithm is described in Watkins, C. J. C. H., Dayan, P., 1992. Q-learning. Machine Learning 8, 279-292.
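The rule applied after each observed transition (s, a, r, s') is the standard Watkins update from the cited paper, where α corresponds to the learningRate field and γ to the discountRate field:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```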

See Also:
Serialized Form
 

Field Summary
protected  ActionSelector actionSelector
          The component used to select an action on the basis of the current Q values.
protected  int bestAction
          The best action for the current state.
protected  int currentState
          The current state.
protected  double discountRate
          The discount rate for future payoffs.
protected  double initialQValue
          The initial value assigned to every entry of the Q matrix.
protected  int lastActionChosen
          The last action that was chosen.
protected  double learningRate
          The learning rate.
protected  int numActions
          The number of possible actions.
protected  int numStates
          The number of possible states.
protected  int previousState
          The previous state.
protected  cern.jet.random.engine.RandomEngine prng
          The pseudo-random number generator used for stochastic action selection.
protected  double[][] q
          The matrix representing the estimated payoff of each possible action in each possible state.
 
Fields inherited from class net.sourceforge.jabm.learning.AbstractLearner
monitor
 
Constructor Summary
QLearner()
           
QLearner(int numStates, int numActions, double learningRate, double discountRate, cern.jet.random.engine.RandomEngine prng)
           
QLearner(cern.jet.random.engine.RandomEngine prng)
           
 
Method Summary
 int act()
          Request that the learner perform an action.
 void afterPropertiesSet()
           
 int bestAction(int state)
           
 void dumpState(DataWriter out)
          Write out our state data to the specified data writer.
 ActionSelector getActionSelector()
           
 double getDiscountRate()
           
 double getInitialQValue()
           
 int getLastActionChosen()
           
 double getLearningDelta()
          Return a value indicative of the amount of learning that occurred during the last iteration.
 double getLearningRate()
           
 int getNumberOfActions()
          Get the number of different possible actions this learner can choose from when it performs an action.
 int getNumberOfStates()
           
 int getPreviousState()
           
 cern.jet.random.engine.RandomEngine getPrng()
           
 int getState()
           
 double getValueEstimate(int action)
           
 double[] getValueEstimates(int state)
           
 void initialise()
           
 double maxQ(int newState)
           
 void newState(double reward, int newState)
          The call-back after performing an action.
 java.lang.Object protoClone()
           
 void reset()
          Reinitialise our state to the original settings.
 void setActionSelector(ActionSelector actionSelector)
           
 void setDiscountRate(double discountRate)
           
 void setInitialQValue(double initialQValue)
           
 void setLearningRate(double learningRate)
           
 void setNumberOfActions(int numActions)
           
 void setNumberOfStates(int numStates)
           
 void setPrng(cern.jet.random.engine.RandomEngine prng)
           
 void setState(int newState)
           
 void setStatesAndActions(int numStates, int numActions)
           
 java.lang.String toString()
           
protected  void updateQ(double reward, int newState)
           
 int worstAction(int state)
           
 
Methods inherited from class net.sourceforge.jabm.learning.AbstractLearner
monitor
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface net.sourceforge.jabm.learning.Learner
monitor
 

Field Detail

numStates

protected int numStates
The number of possible states.


numActions

protected int numActions
The number of possible actions.


q

protected double[][] q
The matrix representing the estimated payoff of each possible action in each possible state.


learningRate

protected double learningRate
The learning rate.


discountRate

protected double discountRate
The discount rate for future payoffs.


previousState

protected int previousState
The previous state.


currentState

protected int currentState
The current state.


lastActionChosen

protected int lastActionChosen
The last action that was chosen.


bestAction

protected int bestAction
The best action for the current state.


prng

protected cern.jet.random.engine.RandomEngine prng
The pseudo-random number generator used for stochastic action selection.


actionSelector

protected ActionSelector actionSelector
The component used to select an action on the basis of the current Q values.


initialQValue

protected double initialQValue
The initial value assigned to every entry of the Q matrix.
Constructor Detail

QLearner

public QLearner(int numStates,
                int numActions,
                double learningRate,
                double discountRate,
                cern.jet.random.engine.RandomEngine prng)

QLearner

public QLearner(cern.jet.random.engine.RandomEngine prng)

QLearner

public QLearner()
Method Detail

protoClone

public java.lang.Object protoClone()
Specified by:
protoClone in interface Prototypeable

initialise

public void initialise()

setStatesAndActions

public void setStatesAndActions(int numStates,
                                int numActions)

setState

public void setState(int newState)

getState

public int getState()

act

public int act()
Description copied from interface: DiscreteLearner
Request that the learner perform an action. Users of the learning algorithm should invoke this method on the learner when they wish to find out which action the learner is currently recommending.

Specified by:
act in interface DiscreteLearner
Returns:
An integer representing the action to be taken.

newState

public void newState(double reward,
                     int newState)
Description copied from interface: MDPLearner
The call-back after performing an action.

Specified by:
newState in interface MDPLearner
Parameters:
reward - The reward received from taking the most recently-selected action.
newState - The new state encountered after taking the most recently-selected action.

updateQ

protected void updateQ(double reward,
                       int newState)
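The act()/newState() protocol above, and the Watkins update that updateQ applies, can be sketched as a minimal, self-contained Java illustration. Method and field names mirror the documented API, but this is not the library's implementation: the epsilon-greedy selection here is an assumption standing in for the library's pluggable ActionSelector.

```java
import java.util.Random;

// Minimal sketch of the act()/newState() call-back protocol and the
// Watkins Q-learning update performed by updateQ(). Names mirror the
// documented QLearner API; the epsilon-greedy choice in act() is an
// assumption for this sketch (the library delegates to an ActionSelector).
class QLearnerSketch {

    protected double[][] q;          // q[state][action]: estimated payoffs
    protected double learningRate;   // alpha in the Watkins update
    protected double discountRate;   // gamma in the Watkins update
    protected int currentState;
    protected int lastActionChosen;
    protected Random prng;
    protected double epsilon = 0.1;  // exploration probability (assumption)

    public QLearnerSketch(int numStates, int numActions,
                          double learningRate, double discountRate,
                          Random prng) {
        this.q = new double[numStates][numActions];
        this.learningRate = learningRate;
        this.discountRate = discountRate;
        this.prng = prng;
    }

    public void setState(int newState) {
        this.currentState = newState;
    }

    // Greedy action for the given state (ties broken by lowest index).
    public int bestAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) {
                best = a;
            }
        }
        return best;
    }

    // Highest estimated payoff available in the given state.
    public double maxQ(int newState) {
        return q[newState][bestAction(newState)];
    }

    // Choose (and remember) an action for the current state.
    public int act() {
        lastActionChosen = prng.nextDouble() < epsilon
                ? prng.nextInt(q[currentState].length)
                : bestAction(currentState);
        return lastActionChosen;
    }

    // Watkins update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    protected void updateQ(double reward, int newState) {
        double target = reward + discountRate * maxQ(newState);
        q[currentState][lastActionChosen] +=
                learningRate * (target - q[currentState][lastActionChosen]);
    }

    // Call-back after acting: learn from the reward, then enter the new state.
    public void newState(double reward, int newState) {
        updateQ(reward, newState);
        currentState = newState;
    }
}
```

A caller alternates act() and newState(reward, s') each time step; on a two-armed bandit where only action 1 pays off, bestAction converges to 1.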

maxQ

public double maxQ(int newState)

worstAction

public int worstAction(int state)

bestAction

public int bestAction(int state)
Specified by:
bestAction in interface MDPLearner

reset

public void reset()
Description copied from interface: Resetable
Reinitialise our state to the original settings.

Specified by:
reset in interface Resetable

setDiscountRate

public void setDiscountRate(double discountRate)

getDiscountRate

public double getDiscountRate()

getLastActionChosen

public int getLastActionChosen()

getLearningDelta

public double getLearningDelta()
Description copied from interface: Learner
Return a value indicative of the amount of learning that occurred during the last iteration. Values close to 0.0 indicate that the learner has converged to an equilibrium state.

Specified by:
getLearningDelta in interface Learner
Specified by:
getLearningDelta in class AbstractLearner
Returns:
A double representing the amount of learning that occurred.

dumpState

public void dumpState(DataWriter out)
Description copied from interface: Learner
Write out our state data to the specified data writer.

Specified by:
dumpState in interface Learner
Specified by:
dumpState in class AbstractLearner

getNumberOfActions

public int getNumberOfActions()
Description copied from interface: DiscreteLearner
Get the number of different possible actions this learner can choose from when it performs an action.

Specified by:
getNumberOfActions in interface DiscreteLearner
Specified by:
getNumberOfActions in interface MDPLearner
Returns:
An integer value representing the number of actions available.

getLearningRate

public double getLearningRate()

setLearningRate

public void setLearningRate(double learningRate)

getNumberOfStates

public int getNumberOfStates()
Specified by:
getNumberOfStates in interface MDPLearner

setNumberOfStates

public void setNumberOfStates(int numStates)

setNumberOfActions

public void setNumberOfActions(int numActions)

getPreviousState

public int getPreviousState()

getPrng

public cern.jet.random.engine.RandomEngine getPrng()

setPrng

public void setPrng(cern.jet.random.engine.RandomEngine prng)

getActionSelector

public ActionSelector getActionSelector()

setActionSelector

public void setActionSelector(ActionSelector actionSelector)

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

getValueEstimate

public double getValueEstimate(int action)

setInitialQValue

public void setInitialQValue(double initialQValue)

getInitialQValue

public double getInitialQValue()

getValueEstimates

public double[] getValueEstimates(int state)
Specified by:
getValueEstimates in interface MDPLearner
Parameters:
state - The current state of the MDP.
Returns:
An array representing the Q values indexed by action.

afterPropertiesSet

public void afterPropertiesSet()
                        throws java.lang.Exception
Specified by:
afterPropertiesSet in interface org.springframework.beans.factory.InitializingBean
Throws:
java.lang.Exception