Reinforcement Learning

Post by **Antonio Linares** » Tue Dec 19, 2017 5:02 am

Reinforcement Learning - A Simple Python Example and A Step Closer to AI with Assisted Q-Learning

Supervised and unsupervised approaches require data to model, not reinforcement learning! That’s right, it can explore space with a handful of instructions, analyze its surroundings one step at a time, and build data as it goes along for modeling

https://youtu.be/nSxaG_Kjw_w

https://amunategui.github.io/reinforcem ... index.html

Post by **Antonio Linares** » Tue Dec 19, 2017 5:32 pm

https://www.oreilly.com/ideas/reinforce ... -explained

Reinforcement learning, in a simplistic definition, is learning best actions based on reward or punishment.

There are three basic concepts in reinforcement learning: state, action, and reward.

The state describes the current situation. For a robot that is learning to walk, the state is the position of its two legs. For a Go program, the state is the positions of all the pieces on the board.

Action is what an agent can do in each state. Given the state, or positions of its two legs, a robot can take steps within a certain distance. There are typically finite (or a fixed range of) actions an agent can take. For example, a robot stride can only be, say, 0.01 meter to 1 meter. The Go program can only put down its piece in one of 19 x 19 (that is 361) positions.

When a robot takes an action in a state, it receives a reward. Here the term “reward” is an abstract concept that describes feedback from the environment. A reward can be positive or negative. When the reward is positive, it is corresponding to our normal meaning of reward. When the reward is negative, it is corresponding to what we usually call “punishment.”

Post by **Antonio Linares** » Wed Dec 20, 2017 11:51 am

Reinforcement learning is good for situations where information about the world is very limited: there is no given map of the world. We have to learn our actions by interacting with the environment: trial and error is required

Post by **Antonio Linares** » Fri Dec 22, 2017 7:09 pm

http://mnemstudio.org/path-finding-q-le ... torial.htm

The transition rule of Q learning is a very simple formula:

Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]

The Gamma parameter has a range of 0 to 1 (0 <= Gamma > 1). If Gamma is closer to zero, the agent will tend to consider only immediate rewards. If Gamma is closer to one, the agent will consider future rewards with greater weight, willing to delay the reward.

In this case, if the matrix Q has been enhanced, instead of exploring around, and going back and forth to the same rooms, the agent will find the fastest route to the goal state.

Post by **Antonio Linares** » Fri Dec 22, 2017 7:25 pm

Reinforcement Learning example (Q-Learning)

work in progress

qlearning.prg

Code: Select all

// AI: Reinforcement Learning example (Q-Learning)
// Harbour implementation by Antonio Linares

#include "FiveWin.ch"

function Main()

   local aStates := { 0, 1, 2, 3, 4, 5 }
   local aR := Array( Len( aStates ), Len( aStates ) )  // Rewards matrix
   local aQ := Array( Len( aStates ), Len( aStates ) )  // Q-Learning matrix
   local nGamma := 0.8
   local nState := aStates[ hb_RandomInt( 1, Len( aStates ) ) ]
   local nGoal := ATail( aStates )
   local nAction

   ? "initial state:", nState

   AEval( aR, { | aRow, n | aR[ n ] := AFill( aRow, -1 ) } ) // Initialize the Rewards matrix
   AEval( aQ, { | aRow, n | aQ[ n ] := AFill( aRow,  0 ) } ) // Initialize the Q-Learning matrix
   
   aR[ 1 ][ 5 ] = 0     // state 0 can only go to state 4
   aR[ 2 ][ 4 ] = 0     // state 1 can only go to state 3 or
   aR[ 2 ][ 6 ] = 100   // to state 5 (goal)
   aR[ 3 ][ 4 ] = 0     // state 2 can only go to state 3
   aR[ 4 ][ 2 ] = 0     // state 3 can only go to state 1 or 
   aR[ 4 ][ 3 ] = 0     // to state 2 or
   aR[ 4 ][ 5 ] = 0     // to state 4
   aR[ 5 ][ 1 ] = 0     // state 4 can only go to state 0 or
   aR[ 5 ][ 4 ] = 0     // to state 3 or
   aR[ 5 ][ 6 ] = 100   // to state 5 (goal)
   aR[ 6 ][ 2 ] = 0     // state 5 can only go to state 1 or
   aR[ 6 ][ 5 ] = 0     // to state 4 or
   aR[ 6 ][ 6 ] = 100   // to itself (goal)

   XBROWSER aR TITLE "Rewards matrix"

   while nState != nGoal
      nAction = NextAction( aR, nState )
      ? "next state", nAction
      nState = nAction
   end
    
return nil

function NextAction( aR, nState )

   local aActions := {}

   AEval( aR[ nState + 1 ],;
          { | nAction, nState | If( nAction == 0 .or. nAction == 100, AAdd( aActions, nState - 1 ), ) } )
          
   XBROWSER aActions TITLE "Possible Actions for state: " + Str( nState )       

return aActions[ hb_RandomInt( 1, Len( aActions ) ) ]

Post by **Antonio Linares** » Wed Dec 27, 2017 11:42 am

With no learning. Just random tries...

qlearning.prg

Code: Select all

// AI: Reinforcement Learning example (Q-Learning)
// Harbour implementation by Antonio Linares

#include "FiveWin.ch"

function Main()

   local aStates := { 0, 1, 2, 3, 4, 5 }
   local aR := Array( Len( aStates ), Len( aStates ) )  // Rewards matrix
   local aQ := Array( Len( aStates ), Len( aStates ) )  // Q-Learning matrix
   local nGamma := 0.8
   local nState := aStates[ hb_RandomInt( 1, Len( aStates ) ) ]
   local nGoal := ATail( aStates )
   local nAction

   ? "initial state:", nState

   AEval( aR, { | aRow, n | aR[ n ] := AFill( aRow, -1 ) } ) // Initialize the Rewards matrix
   AEval( aQ, { | aRow, n | aQ[ n ] := AFill( aRow,  0 ) } ) // Initialize the Q-Learning matrix
   
   aR[ 1 ][ 5 ] = 0     // state 0 can only go to state 4
   aR[ 2 ][ 4 ] = 0     // state 1 can only go to state 3 or
   aR[ 2 ][ 6 ] = 100   // to state 5 (goal)
   aR[ 3 ][ 4 ] = 0     // state 2 can only go to state 3
   aR[ 4 ][ 2 ] = 0     // state 3 can only go to state 1 or 
   aR[ 4 ][ 3 ] = 0     // to state 2 or
   aR[ 4 ][ 5 ] = 0     // to state 4
   aR[ 5 ][ 1 ] = 0     // state 4 can only go to state 0 or
   aR[ 5 ][ 4 ] = 0     // to state 3 or
   aR[ 5 ][ 6 ] = 100   // to state 5 (goal)
   aR[ 6 ][ 2 ] = 0     // state 5 can only go to state 1 or
   aR[ 6 ][ 5 ] = 0     // to state 4 or
   aR[ 6 ][ 6 ] = 100   // to itself (goal)

   XBROWSER aR TITLE "Rewards matrix"

   while nState != nGoal
      nAction = NextAction( aR, nState )
      ? "next state", nAction
      nState = nAction
   end
    
return nil

function GetActions( aR, nState )

   local aActions := {}

   AEval( aR[ nState + 1 ],;
          { | nAction, nState | If( nAction > -1, AAdd( aActions, nState - 1 ), ) } )

return aActions

function NextAction( aR, nState )

   local aActions := GetActions( aR, nState )
          
   XBROWSER aActions TITLE "Possible Actions for state: " + Str( nState )       

return aActions[ hb_RandomInt( 1, Len( aActions ) ) ]

Post by **Antonio Linares** » Wed Dec 27, 2017 12:34 pm

Implementing the learning. Work in progress

qlearning.prg

Code: Select all

// AI: Reinforcement Learning example (Q-Learning)
// Harbour implementation by Antonio Linares

#include "FiveWin.ch"

function Main()

   local aStates := { 0, 1, 2, 3, 4, 5 }
   local aR := Array( Len( aStates ), Len( aStates ) )  // Rewards matrix
   local aQ := Array( Len( aStates ), Len( aStates ) )  // Q-Learning matrix
   local nGamma := 0.8
   local nState := aStates[ hb_RandomInt( 1, Len( aStates ) ) ]
   local nGoal := ATail( aStates )
   local nAction

   ? "initial state:", nState

   AEval( aR, { | aRow, n | aR[ n ] := AFill( aRow, -1 ) } ) // Initialize the Rewards matrix
   AEval( aQ, { | aRow, n | aQ[ n ] := AFill( aRow,  0 ) } ) // Initialize the Q-Learning matrix
   
   aR[ 1 ][ 5 ] = 0     // state 0 can only go to state 4
   aR[ 2 ][ 4 ] = 0     // state 1 can only go to state 3 or
   aR[ 2 ][ 6 ] = 100   // to state 5 (goal)
   aR[ 3 ][ 4 ] = 0     // state 2 can only go to state 3
   aR[ 4 ][ 2 ] = 0     // state 3 can only go to state 1 or 
   aR[ 4 ][ 3 ] = 0     // to state 2 or
   aR[ 4 ][ 5 ] = 0     // to state 4
   aR[ 5 ][ 1 ] = 0     // state 4 can only go to state 0 or
   aR[ 5 ][ 4 ] = 0     // to state 3 or
   aR[ 5 ][ 6 ] = 100   // to state 5 (goal)
   aR[ 6 ][ 2 ] = 0     // state 5 can only go to state 1 or
   aR[ 6 ][ 5 ] = 0     // to state 4 or
   aR[ 6 ][ 6 ] = 100   // to itself (goal)

   XBROWSER aQ TITLE "Q-Learning matrix"

   while nState != nGoal
      nAction = NextAction( aR, nState )
      aQ[ nState + 1, nAction + 1 ] = aR[ nState + 1, nAction + 1 ] + nGamma * GetMaxQForActions( aQ, nState )
      XBROWSER aQ TITLE "Q-Learning matrix"
      ? "next state", nAction
      nState = nAction
   end
    
return nil

function GetMaxQForActions( aQ, nState )

   local nMax := 0
   
   AEval( aQ[ nState + 1 ], { | nLearning | nMax := Max( nLearning, nMax ) } )
          
   MsgInfo( nMax, "Max Q" )       
          
return nMax          

function GetActions( aR, nState )

   local aActions := {}

   AEval( aR[ nState + 1 ],;
          { | nReward, nState | If( nReward > -1, AAdd( aActions, nState - 1 ), ) } )

return aActions

function NextAction( aR, nState )

   local aActions := GetActions( aR, nState )
          
   XBROWSER aActions TITLE "Possible Actions for state: " + Str( nState )       

return aActions[ hb_RandomInt( 1, Len( aActions ) ) ]

FiveTech Software tech support forums

Reinforcement Learning

Reinforcement Learning

Re: Reinforcement Learning

Re: Reinforcement Learning

Re: Reinforcement Learning

Re: Reinforcement Learning

Re: Reinforcement Learning

Re: Reinforcement Learning