Work in Progress / [DBP] State space AI learning algorithm

The thing
Joined: 22nd Jan 2004
Location: Somewhere in the U.K
Posted: 17th Mar 2011 00:06
I've been looking into artificial intelligence as a project of mine and cooked this up. The code gives an AI capable of generalised learning; rather than being coded for a specific problem, it can teach itself to solve problems the programmer hasn't foreseen. Be warned, it is fairly processor and memory intensive and needs a bit of time to make decent progress (but you can see it actively learning).



Controls:
Space key: moving average graph
Hold Control key: turns off AI exploration, improving short term performance
Shift key: sets the frame rate to 60 frames per second for normal speed
Return key: shifts back to accelerated learning at the maximum frame rate



It's a bit messy, but if anyone is interested I can clean it up. It's essentially a cart and pole problem that tests an AI algorithm called SARSA(lambda). The objective is for the AI to balance the pole for as long as possible without it falling over or the pole's base (the cart) moving outside the bounds. The AI assigns fitness values to the actions available in a given state, as dictated by its inputs, with more "fit" values being more likely to move the agent into a future state where it receives a reward.

This is where the algorithm's name comes in; SARSA stands for State Action Reward State Action. The fitness values of past state-action pairs are updated according to the product of an error function, an exponential time decay function and the learning rate constant. The time decay constant is lambda. The error function is the difference between the fitness value of the next state-action pair and that of the current one, summed with any reward received in between the states.
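
For anyone who finds the update easier to follow as code, here is a rough sketch of it in Python pseudocode (not the actual DBP source; the table names and the alpha, gamma and lambda values are just placeholders):

# Q and trace are hypothetical dictionaries keyed by (state, action).
def sarsa_lambda_step(Q, trace, s, a, reward, s_next, a_next,
                      alpha=0.1, gamma=0.99, lam=0.9):
    # error function: reward plus the discounted fitness of the next
    # state-action pair, minus the fitness of the current pair
    delta = reward + gamma * Q.get((s_next, a_next), 0.0) - Q.get((s, a), 0.0)

    # mark the current pair as recently visited
    trace[(s, a)] = trace.get((s, a), 0.0) + 1.0

    # every recently visited pair is nudged by the error, scaled by the
    # learning rate and its exponentially decaying eligibility trace
    for key in list(trace):
        Q[key] = Q.get(key, 0.0) + alpha * delta * trace[key]
        trace[key] *= gamma * lam  # lambda sets the speed of the decay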

The next action is chosen using a policy that attempts to balance exploration and exploitation. More exploration gives the agent better knowledge of the environment at the expense of poor short term performance but better long term performance; exploitation gives good short term performance. The probability of choosing a random action rather than one with a good fitness value (i.e. exploring) increases depending on how the length of the current test run compares to the moving average.
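
As a rough illustration of the action selection (again Python pseudocode with made-up constants, not the ones the program actually uses):

import random

# Explore with a probability that grows once the current run is already
# beating the moving average, otherwise pick the best-known action.
def choose_action(Q, state, actions, run_length, moving_average):
    explore_prob = 0.05
    if moving_average > 0 and run_length > moving_average:
        explore_prob = 0.2
    if random.random() < explore_prob:
        return random.choice(actions)  # explore the environment
    return max(actions, key=lambda act: Q.get((state, act), 0.0))  # exploit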

Problems:
- It is a tabular state based learning algorithm, so it performs poorly with very large state spaces; each state must be tested and its fitness recorded, which puts a strain on memory.
- In an attempt to reduce the state space I used rather coarse inputs (the inputs passed to the AI must be in binary, so some rounding has to be done; see the sketch after this list). This speeds up learning but may reduce the AI's consistency at solving the problem, or may prevent it from solving it altogether (I have not managed to get the AI to produce a perfect solution to the pole balance problem yet).
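
To show what I mean by coarse binary inputs, something along these lines (the limits and the 2-bit resolution are made up for the example, not what the program uses):

# Coarsen the four cart-pole inputs into one small binary state index.
def encode_state(cart_pos, cart_vel, pole_angle, pole_vel):
    def two_bits(value, limit):
        # clamp to [-limit, limit] and map onto the integers 0..3
        v = max(-limit, min(limit, value))
        return int((v + limit) / (2.0 * limit) * 3.999)

    fields = (two_bits(cart_pos, 2.4), two_bits(cart_vel, 2.0),
              two_bits(pole_angle, 0.21), two_bits(pole_vel, 2.0))
    state = 0
    for f in fields:
        state = (state << 2) | f  # pack the four 2-bit fields together
    return state  # 8 bits, so only 256 distinct states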

I may attempt implementing a neural network using SARSA and backpropagation to reduce the state space size and allow the AI to make inferences, removing the need to visit every state required to solve the problem. Any comments are welcome.
baxslash
Joined: 26th Dec 2006
Location: Duffield
Posted: 18th Mar 2011 14:48
I can see it learning but it sure takes a long time. Interesting problem! It would make a good challenge.

Nice work.
