It is likely that none of the readers of Networks will need persuading of the power and versatility of techniques such as artificial neural networks or genetic algorithms. What may be of more interest, however, is a novel application for these techniques grounded in the fields of emergent behaviour, a-life, and computer gaming. Throw in an obscure but expressive programming language by the name of POP-11, and you have the makings of a rather interesting project. The brief was simple: can embodied agents in a simulation teach themselves to drive a car?
The agents are modelled by neural networks, each in control of a simulated vehicle on a pre-drawn racetrack. A basic model of Newtonian physics governs such aspects as acceleration and friction, and a series of waypoints is drawn along the course of the track. The inputs to the neural networks consist of the speeds of the left and right wheels of the car, the car's current heading, a choice of which waypoint to observe, and the bearing to the chosen waypoint; essentially, these are equivalent to the inputs you could derive from real-world GPS co-ordinates on a waypointed track.
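To make the list of inputs concrete, here is a minimal sketch in Python (not the project's POP-11) of how such an input vector might be assembled. Everything named here is an assumption made for illustration: the dictionary keys, the function names, the use of radians, and the decision to express the bearing relative to the car's heading are not taken from the project itself.

```python
import math

def wrap_angle(angle):
    """Wrap an angle in radians into the range [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def bearing_to(x, y, wx, wy):
    """Absolute bearing from (x, y) to the waypoint (wx, wy), in radians,
    measured anticlockwise from the positive x axis."""
    return math.atan2(wy - y, wx - x)

def sensor_inputs(car, waypoints, target_index):
    """Build the five inputs described in the text for one car.

    `car` is a hypothetical dict with keys x, y, heading, left_speed and
    right_speed; `waypoints` is a list of (x, y) pairs.
    """
    wx, wy = waypoints[target_index]
    absolute = bearing_to(car["x"], car["y"], wx, wy)
    # Assumption: the bearing is given relative to the car's heading, so
    # 0 means "straight ahead" and the sign says which way to turn.
    relative = wrap_angle(absolute - car["heading"])
    return [
        car["left_speed"],
        car["right_speed"],
        car["heading"],
        float(target_index),  # which waypoint is being observed
        relative,
    ]

car = {"x": 0.0, "y": 0.0, "heading": 0.0,
       "left_speed": 1.0, "right_speed": 2.0}
inputs = sensor_inputs(car, [(0.0, 1.0), (5.0, 5.0)], 0)
```

In this example the waypoint lies a quarter-turn to the car's left, so the last input comes out as pi/2.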
At this initial level, running the simulation causes a variety of different behaviours: spinning, looping, driving in reverse, crashing into barriers -- and invariably not progressing very far along the track at all. The neural networks obviously require some training.
Instead of being trained by backpropagation on the example of a human driver, the cars are given a set amount of time in which they can do whatever they rather obliviously choose to do. After that time has elapsed, the grim reaper visits and culls all the agents, with the exception of the lucky two which have strayed by chance further along the track than their peers. The neural networks of these two agents are then combined using a genetic algorithm, and the resulting offspring replace the other cars which were less successful.
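The cull-and-breed step can be sketched as follows, again in Python rather than POP-11. The text says only that the two survivors' networks are combined by a genetic algorithm; the flat list-of-weights genome, the uniform crossover, the Gaussian mutation and all parameter values below are assumptions chosen to give one plausible reading, not a description of the actual implementation.

```python
import random

def next_generation(population, progress, rng,
                    mutation_rate=0.1, mutation_scale=0.5):
    """Keep the two cars that got furthest and breed replacements.

    `population` is a list of genomes (flat lists of network weights)
    and `progress` holds each car's distance along the track.
    """
    # Rank the cars by how far along the track they strayed.
    ranked = sorted(range(len(population)),
                    key=lambda i: progress[i], reverse=True)
    parent_a = population[ranked[0]]
    parent_b = population[ranked[1]]

    offspring = []
    for _ in range(len(population) - 2):
        # Uniform crossover (assumed): each weight comes from either parent.
        child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
        # Gaussian mutation (assumed): occasionally nudge a weight.
        child = [w + rng.gauss(0.0, mutation_scale)
                 if rng.random() < mutation_rate else w
                 for w in child]
        offspring.append(child)

    # The two survivors carry over unchanged; the rest are replaced.
    return [list(parent_a), list(parent_b)] + offspring

rng = random.Random(0)
population = [[float(i)] * 4 for i in range(5)]
progress = [3, 9, 1, 7, 2]
new_population = next_generation(population, progress, rng,
                                 mutation_rate=0.0)
```

With mutation switched off, as in the usage lines above, every weight in every offspring is copied directly from one of the two surviving parents.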
Within a few generations it can be seen that the cars are learning a preference for driving along the general direction of the track, even taking account of the bends in the track and adjusting their turning speeds accordingly. By this time the race is on, and the fastest two cars of each epoch are selected. The agents learn to cut the corners in order to better their race times -- but not by too much, or they will crash off the track and get stuck -- and before long, intelligent acceleration and deceleration through corners is observed. After approximately 15 epochs the cars are not only optimising their driving styles, but are taking advantage of the slipperiness of the road and actually power-sliding through tight corners.
These behaviours are not just learned for the track on which the cars are 'born', either. When the cars are placed onto a new, totally unseen track, all it takes is one or two further epochs before new racing lines are being found and optimised. Altering the coefficients of friction on the track to make it behave like black ice confuses the drivers into overshooting all the corners. But soon enough, behaviours emerge such as hugging the high-friction edges of the track next to the barriers and slingshotting through the bends.
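A rough calculation shows why a sudden drop in friction leads to overshooting. The sketch below is textbook physics rather than the simulation's own model: it assumes a car slowing under constant sliding friction, and the friction coefficients are illustrative values, not those used in the project.

```python
import math

def stopping_distance(speed, friction_coefficient, gravity=9.81):
    """Distance needed to stop from `speed` (m/s) when the only braking
    force is sliding friction: d = v^2 / (2 * mu * g)."""
    return speed ** 2 / (2.0 * friction_coefficient * gravity)

# Illustrative coefficients only: a grippy surface versus "black ice".
grippy = stopping_distance(10.0, 0.8)
icy = stopping_distance(10.0, 0.1)
ratio = icy / grippy
```

Because the distance is inversely proportional to the coefficient, a driver whose braking points were learned on the grippy surface needs eight times the room on the icy one in this example, and so arrives at the corner far too fast.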
There is only one in-built rule in this system: the fastest cars to complete the track without cheating will survive. All this complex behaviour emerges from a simple combination of a genetic algorithm and some neural networks. Perhaps they are powerful techniques after all.
Short video showing PopRacer in action
Within seven generations, the cars learn to assimilate all the available information and drive around the track. Soon after this, they begin to optimise racing lines and demonstrate novel behaviours such as power-sliding through corners.