[ieee 2011 11th international conference on hybrid intelligent systems (his 2011) - melacca,...

Evolutionary Spiking Neural Networks as Racing Car Controllers

Elias Yee and Jason Teo

Evolutionary Computing Laboratory

School of Engineering and Information Technology

Universiti Malaysia Sabah

Kota Kinabalu, Sabah, Malaysia

[email protected], [email protected]

Abstract—The Izhikevich spiking neural network model is investigated as a method to develop controllers for a simple, but not trivial, car racing game called TORCS. The controllers are evolved using Evolutionary Programming, and the performance of the best individuals is compared with the hand-coded controller included with the Simulated Car Racing Championship API. The results are promising, indicating that this neural network model can be applied to other games or control problems.

Keywords: spiking neural networks; Izhikevich neuron model; evolutionary programming; games; car racing; TORCS

I. INTRODUCTION

Networks of spiking neurons have steadily gained popularity as a computationally powerful and biologically more plausible model of distributed computation [1,2,3]. They form the third generation of artificial neural networks, modeled to resemble the biological brain as closely as possible: the biological brain transmits information through electric pulses (action potentials), which the neurons fire at certain points in time. In an artificial spiking neural network, incoming pulses (spikes) induce a postsynaptic potential according to a response function, and when the membrane potential exceeds a threshold, the neuron emits a pulse. After emitting the pulse, the neuron's membrane potential resets to its resting state. The input to the neuron does not affect the size and shape of the spike, but it does affect the time at which the neuron fires. Information can therefore be transmitted by individual spike timings, which allows spiking neural networks to exploit time as a resource for coding and computation in a more sophisticated way than conventional models [4]. Furthermore, spiking neural networks are able to simulate sigmoidal feedforward neural networks and approximate any continuous function.

Computer games have received much attention as research tools for computational intelligence for many years, because computational intelligence adds value and functionality to the games, and because the games give researchers test beds for their work. Car racing is a challenging problem that generates considerable excitement, as is evident from the resources that racers and observers alike invest in practicing and watching races. The problem is not trivial because many parameters influence it. To drive a car, the speed and steering have to be adjusted by the right amount at the right time, and many situations can arise because so many parameters influence the behavior of the car. These include the characteristics of the track, such as road curvature, inclination, surface friction, and banking, and the state of the car, such as its current speed, acceleration, direction, and the slipping and skidding of its wheels. Furthermore, cars differ in characteristics such as horsepower, traction, air resistance, and center of gravity [5]. All these parameters determine how the car needs to be driven to achieve desirable results.

Spiking neural networks (SNNs) have been investigated in many areas and problems. Pavlidis et al. (2005) evolved spiking neural networks using parallel differential evolution for classification problems [6]. Studies in robotics include the evolution of spiking neural controllers for a vision-based mobile robot [7], and the indoor flight of a vision-based micro-robot whose controller is composed of adaptive spiking neurons [8]. Other application areas include temporal pattern classification, speech recognition, computer vision, XOR problems, associative memory, and function approximation.

The artificial evolution of neural networks for car racing has been investigated by Togelius and Lucas [9,10]. Another study investigated the imitation of human driving behavior using the TORCS racing game [11]. However, spiking neural networks have not previously been investigated as a computational intelligence technique for evolving racing car controllers. Motivated by the encouraging results of SNN applications in other domains, this forms the main objective of our study. A successful outcome would demonstrate the usefulness of SNNs not only as a car racing AI agent but also in other computer game genres, or even in real-world problems with similar real-time control requirements. In this paper, we consider the Izhikevich spiking neuron model for the neural network, trained using Evolutionary Programming, for the control of a simulated racecar. Section II briefly introduces the Izhikevich spiking neuron model, and Section III describes the methods applied, including the simulator, fitness function, and optimization algorithm. Section IV presents and discusses the experimental results. The paper concludes in Section V with a summary of the current work and ideas for future work.

II. SPIKING NEURAL NETWORK

A. Neuron Model

Izhikevich (2003) introduced a neuron model that is capable of reproducing many firing patterns of biological neurons, and that is as biologically plausible as the Hodgkin-Huxley model, yet as computationally efficient as the integrate-and-fire model. This model is a simplification of the Hodgkin-Huxley model to a system of two ordinary differential equations. These two equations describe the membrane potential, v, and the recovery variable, u, which is roughly considered to represent the activation of K+ and the inactivation of Na+ ionic currents, and which provides negative feedback to the membrane potential, v. In the form given in [12], the model reads

$$\frac{dv}{dt} = 0.04v^2 + 5v + 140 - u + I$$

$$\frac{du}{dt} = a(bv - u)$$

with an auxiliary after-spike reset rule: when the membrane potential, v, exceeds its peak of 30 mV, an action potential (spike) occurs, the membrane potential is reset to its initial value, c, and the recovery variable is incremented by d:

$$\text{if } v \geq 30 \text{ mV, then } v \leftarrow c, \quad u \leftarrow u + d.$$

Synaptic currents are conducted to the neuron through the variable I. The typical time-step used with this model is 1 ms.

Figure 1. Voltage response of a neuron model exhibiting a regular spiking firing pattern, with input current, I = 20.

The variables a, b, c, and d are dimensionless model parameters that have constant values. The variable a is the time scale of the recovery variable, u, where smaller values mean slower recovery. The variable b is the sensitivity of the recovery variable, u, to the sub-threshold fluctuations of the membrane potential, v, where bigger values couple the variables v and u more strongly, which results in low-threshold spiking dynamics. The variable c is the after-spike reset value of the membrane potential, which is caused by the high-threshold K+ conductance. The variable d is the after-spike reset value of the recovery variable, which is caused by the slow high-threshold Na+ and K+ conductance [12,13].

978-1-4577-2152-6/11/$26.00 ©2011 IEEE

III. METHODS

This section describes the controller, its environment, fitness function, optimization technique, and the experiment setup.

A. Car Simulator

The racing simulator employed for the experiments in this paper is The Open Racing Car Simulator (TORCS). TORCS is a very realistic simulator with a sophisticated physics engine and many game contents such as different cars, tracks and controllers. TORCS is not only an open source racing game but was also designed so that anyone could create their own car controller. The simulator takes into account many aspects, including damage due to collision, fuel consumption, aerodynamics, wheel slippage and so on. This section describes only the aspects relevant to the controller used.
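As a concrete illustration of the neuron model in Section II-A, the following minimal Python sketch integrates the two model equations with a forward Euler step of 1 ms and applies the after-spike reset. It is not the authors' implementation. The parameter values a = 0.02, b = 0.2, c = -65, d = 8 are the regular-spiking values listed in Izhikevich's paper [12] and are assumed here, since the text names the firing pattern but not the numbers; the class name `IzhikevichNeuron`, the helper `spike_count`, and the initial state are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class IzhikevichNeuron:
    # Assumed regular-spiking parameters (not stated numerically in the paper).
    a: float = 0.02
    b: float = 0.2
    c: float = -65.0
    d: float = 8.0
    # Assumed initial state: v at the reset value, u at b * v.
    v: float = -65.0
    u: float = -13.0

    def step(self, current: float, dt: float = 1.0) -> bool:
        """Advance the neuron by dt milliseconds; return True if it spiked."""
        dv = 0.04 * self.v ** 2 + 5.0 * self.v + 140.0 - self.u + current
        du = self.a * (self.b * self.v - self.u)
        self.v += dt * dv
        self.u += dt * du
        if self.v >= 30.0:        # spike peak reached
            self.v = self.c       # v <- c
            self.u += self.d      # u <- u + d
            return True
        return False


def spike_count(current: float, window_ms: int = 20) -> int:
    """Count the spikes a fresh neuron emits under constant input current."""
    neuron = IzhikevichNeuron()
    return sum(neuron.step(current) for _ in range(window_ms))


if __name__ == "__main__":
    for current in (0.0, 20.0):
        print(current, spike_count(current))
```

Counting spikes in a 20 ms window mirrors the firing-rate read-out described in the Controller subsection. In this sketch the neuron stays silent with zero input, while a constant input of I = 20 (the value used for Figure 1) makes it fire.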

Figure 2. Track 3 of The Open Racing Car Simulator (TORCS)

The Computational Intelligence in Games (CIG) Simulated Car Racing Championship provides an API to TORCS for development. It is a client-server architecture in which the controllers run as external processes and connect to the server through UDP connections. Furthermore, races run in real time: in every game tick, which roughly corresponds to 20 milliseconds (ms) of simulated time, the server sends sensory information to the client and waits 10 ms of real time for an action response. If no action signal arrives, the server uses the last performed action.

There is much sensory information available to the controller, but only those inputs that we think are most essential are used. These include:

• The angle between the car direction and the direction of the track axis.

• Distance between the car and the track axis, and the distance between the car and the track edge within 200 meters.

• Car speed along the longitudinal axis of the car.

• Current gear and revolutions per minute (R.P.M.).

• Rotation speed of the wheels.

• Distance raced, current and last lap time.

B. Controller

The car is controlled by a feedforward network of Izhikevich model neurons. In the experiments reported in this paper, the network is composed of eight input neurons and four output neurons. The inputs are directly mapped to the outputs, so the network has no hidden layer and is therefore a single-layered network. The neural network weights are real numbers with values in the range of [-1, 1]. We employed a spike rate encoding method, which is similar to the method used by Floreano [7]. This means that the strength of the stimulation is represented by the probability of spike emissions within a given time interval. We have taken the firing rate of the neurons measured over 20 milliseconds (ms) as commands for the decision of the controller to steer, accelerate and brake. The Izhikevich neuron model parameter values used correspond to cortical pyramidal neurons exhibiting regular spiking firing patterns [14].

The inputs include the angle between the car direction and the track direction, the distance between the car and the track axis, the speed of the car, and five range finder sensors that measure the distance between the car and the track edge. All inputs are normalized to values in [0, 10]. The outputs, with values in the range of [0, 1], correspond to the steering wheel, gas and brake pedals. The command to steer right or left is determined by the difference of two outputs, where a negative value means to steer right and a positive value means to steer left. The acceleration and brake values, on the other hand, are given by the difference between the other two outputs. A positive value indicates to accelerate, while a negative value indicates to brake.

Figure 3. Human player racing against SNN and TORCS controllers (left). SNN controller racing against TORCS controller (right).

C. Fitness Function

The fitness function used is somewhat similar to the fitness function used by Simmerson, winner of the WCCI 2008 simulated car-racing competition [15], with some differences.

The controller's fitness is determined by how far the car was driven, its average speed, the amount of damage it took throughout the whole race, and its ability to stay inside the track, measured using the number of time-steps, also known as game ticks. In this paper, we employed 10000 time-steps for the evaluation of each controller, which is roughly 3 minutes of simulated time.

The fitness of a controller is given by the equation:

where $d_{raced}$ is the total distance raced, v is the velocity of the car, and $T_{max}$ is the maximum number of game ticks in each race. $T_{out}$ is the number of game ticks the car had been outside the track, and D is the amount of damage the car had sustained.

While a car is being evaluated, the damage it takes is also monitored: if the damage exceeds 1000, the controller is immediately disqualified, and the evaluation of the next controller begins.

D. Optimization

Evolutionary Programming is employed to optimize the controller for all experiments described in this paper. In each generation, the fitness of all controllers is evaluated and compared with that of their respective parents. The controller with the higher fitness score acts as the parent for the next generation. Their genes are perturbed with a constant mutation rate based on a Gaussian distribution random generator, to produce the offspring for the next generation. Experiments in this paper are conducted with a population size of 10 controllers and 10 runs, with 1000 generations each run, unless stated otherwise.

E. Experimental Setup

The client was developed by extending the sample client included with the TORCS CIG competition API.

Loops were added into the main controller for the generation and the number of drivers in each generation. The driver was redeveloped to include a neural network object and a function call to process the outputs from the inputs. However, the automatic transmission and anti-lock braking system functions included with the sample driver were retained.

Preliminary experiments were carried out to test if the controller could be evolved to respond to turns or curvatures on a simple track and then on a more difficult track. Then, experiments were carried out to race without going off the track, and to drive as fast and as far as possible, in the hope of completing a lap within the shortest time.

The three tracks used in the experiments are depicted in Fig. 4, ordered from the simplest track to the hardest track of the three. Track 1 is 1908.32 meters in length, while track 2 is 2057.56 meters, and track 3 is 3823.05 meters in length.

Figure 4. The tracks used for the experiments

As mentioned in the methods section, the experiment was carried out on three tracks, with 10 runs per track. The population goes through 1000 generations each run, with 10 drivers in each generation, and each driver is evaluated for 10000 game ticks. All these settings remained constant throughout the experiment, and all racecars begin the race from the starting line. The mutation rate used is 0.7, and the Gaussian distribution has a mean of 0 and a standard deviation of 1, N(0, 1). A general overview of the implementation is given in the flowchart below:
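The evolutionary loop described in the Optimization and Experimental Setup subsections can be sketched as follows. This is a hedged reconstruction, not the authors' code: it assumes that each controller is a flat list of 32 weights (8 inputs by 4 outputs), that the mutation rate of 0.7 is a per-gene probability of adding N(0, 1) noise, and that mutated weights are clipped back into [-1, 1]. The names `mutate`, `evolve`, and `toy_fitness` are hypothetical, and `toy_fitness` merely stands in for a TORCS race evaluation, which cannot be run here.

```python
import random


def mutate(weights, rate=0.7, rng=random):
    """Return a child in which each gene is perturbed with probability `rate`."""
    child = []
    for w in weights:
        if rng.random() < rate:
            w = w + rng.gauss(0.0, 1.0)       # Gaussian noise N(0, 1), as in the text
        child.append(max(-1.0, min(1.0, w)))  # assumed clipping to the weight range
    return child


def evolve(fitness, n_genes=32, pop_size=10, generations=1000, seed=0):
    """Each parent produces one child; the child replaces it only if fitter."""
    rng = random.Random(seed)
    parents = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
               for _ in range(pop_size)]
    scores = [fitness(p) for p in parents]
    for _ in range(generations):
        for i in range(pop_size):
            child = mutate(parents[i], rng=rng)
            child_score = fitness(child)
            if child_score > scores[i]:
                parents[i], scores[i] = child, child_score
    best = max(range(pop_size), key=lambda i: scores[i])
    return parents[best], scores[best]


def toy_fitness(weights):
    """Placeholder objective (higher is better); NOT the paper's fitness function."""
    return -sum(w * w for w in weights)


if __name__ == "__main__":
    _, score = evolve(toy_fitness, generations=100)
    print(score)
```

Because a child only ever replaces its own parent when it scores higher, the best fitness in this sketch can never decrease from one generation to the next, which is consistent with the slow but steady fitness curves discussed in the results.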


IV. RESULTS AND DISCUSSIONS

The results show that the evolved spiking neural network controllers managed to race through all three tracks with minimal or no damage. The controllers even showed sophisticated driving techniques when turning corners. The figures below present the results of the best controllers of all runs for each track.

Fig. 5 presents the fitness growth of the best SNN controllers evolved on track 1. It shows a sharp improvement during the early generations, which slows down after the first fifty generations and converges around generation 500. The population was probably lucky enough to produce well-performing offspring at the beginning. The controllers evolved on track 1 have the highest fitness scores among the three tracks, reaching values as high as 50,000, but these scores depend on the environment as well. As track 1 is much simpler than the other tracks, the controllers do not need to slow down much while turning, so they gathered scores from a larger distance raced and higher average speeds. The early discovery of well-performing solutions is therefore most likely because track 1 is easier to drive. Also of interest is the sophisticated driving behavior the controllers developed on this track: they move toward the outer side of a curve before turning instead of making a sharp turn, and thus do not need to slow down much for each turn.

The charts in Fig. 6 and Fig. 7 show a trend similar to Fig. 5, but with lower fitness scores and a more gradual curve. The experiment on track 2, presented in Fig. 6, had a steep but steady climb in the early generations until around generation 90, when the steep climb ended and the population started to progress slowly. It was not until around generation 800 that the controllers appeared to converge. However, considering the pattern of the chart, the controllers might not have converged yet, since in two instances (generations 265-441 and 485-661) they needed around 175 generations to resume their slow ascent.

The experimental results from track 3, presented in Fig. 7, show that the controllers raced on it gained lower scores than those from tracks 1 and 2, barely reaching 30,000. Track 3 has U-turns and more curves, which explains the lower fitness scores: the controllers needed to slow down for turns more often. Nevertheless, the population also improved quickly in the early generations and only started to slow down after approximately a hundred generations, and it continued to advance, albeit slowly, until the end. Like the results in Fig. 6, the populations evolved on this track most likely have not converged, as the graph in Fig. 7 suggests that the population's fitness could still increase if given more generations. Furthermore, a trend line drawn through the graph would show that it has not leveled out.

After obtaining the results, we re-simulated the solutions to observe their behavior on the track. We found that the controller evolved on track 1 drove at top speed and did not slow down much during turns, which is good. Conversely, the controllers evolved on tracks 2 and 3 did not drive at top speed, but managed to avoid colliding into or gliding along the rail of the tracks. We also noticed that the controllers did not often accelerate at maximum (flooring the gas pedal), even on straight roads, which explains why the cars seldom or never reached full speed. Although it is quite reasonable not to accelerate at maximum in places that have many turns, the genes to floor the gas pedal on straight roads had probably not emerged yet. We also observed that many controllers still went off the track on track 3, particularly during the hard turn after the straight road, where they picked up speed but failed to slow down enough to avoid skidding off the track. Yet some controllers did evolve the behavior needed to avoid skidding off the track.

Figure 5. The fitness against generation chart for track 1

Figure 6. The fitness against generations results chart for track 2

Figure 7. The fitness against generations results chart for track 3

The first 100 generations consist of controllers crashing into the sides of the track and gliding against the rail, which deducted fitness scores, driving slowly, or not being able to race far; hence the logarithmic-like graph. After those early generations, the controllers started to improve very slowly. By this point, they were already able to drive fairly far or complete a full lap, and often avoided crashing into the sides or going off the track. Therefore, the only thing remaining was for them to learn strategies for increasing speed at certain segments of the track and how much they needed to slow down for each turn, in addition to strategies for turning, such as avoiding sharp turns so as not to lose too much speed, so that a larger distance could be covered.

As the selection process we used is similar to the selection method used in the differential evolution (DE) algorithm, where a better-performing child directly replaces its parent, we see a very slow but gradual fitness improvement. One very interesting question is whether another selection technique, such as tournament selection, round robin, or hall-of-fame, would increase the speed of the fitness improvement and perhaps help the controllers escape from local optima. We would also like to find out how much coevolution would help the controllers improve.

Table I shows the shortest lap times achieved by the controllers on each track over all 10 runs. The controllers evolved on track 1 achieved an average of 25.8 seconds with a standard deviation of 0.134 seconds to complete a lap, and the controllers evolved on track 2 needed an average of 47.12 seconds with a standard deviation of about 2 seconds to complete a lap. The controllers evolved on track 3, on the other hand, needed a staggering average time of 126.72 seconds, which is about two minutes, with a standard deviation of 1.69 seconds to complete a lap. Furthermore, unlike the controllers evolved on tracks 1 and 2, the controllers evolved on track 3 managed to complete only one lap throughout the whole duration of 10000 game ticks. The fact that the controllers on track 2 had a standard deviation of about 2 seconds may also be a sign that the controllers have not fully converged yet.

Although it is not presented in the table, by examining the raw lap times and fitness scores we found that the controller that achieved the shortest lap time did not necessarily obtain the highest fitness score. From the last generation of all runs, we found that the lap time is generally inversely related to the fitness score obtained by the controllers; this holds for the fittest controllers, which achieved both the shortest time and the highest fitness score. Yet there are some cases where a controller achieved a shorter time than another controller but obtained a lower fitness score, and vice versa.

One explanation we could come up with for this occurrence lies in the deduction of fitness scores. A controller might have achieved a shorter lap time, but it went outside the track's boundary and had its fitness score deducted. As a car comes closer to the inner edge of a curve, its distance to the finish line becomes shorter. Such a controller could have driven outside the boundary of the track but nearer to the inner edge of the curve, so its distance to the finish line was shorter than that of a controller that kept its course within the boundary of the track.
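The averages and standard deviations quoted for Table I can be re-derived from the per-run lap times in that table. The short Python check below uses the sample standard deviation (divisor n - 1), which is an assumption about how the authors computed it; it matches the tabulated values to within rounding. The dictionary name `best_lap_seconds` and the helper `summarize` are illustrative.

```python
from statistics import mean, stdev

# Best lap time per run, in seconds, copied from Table I.
best_lap_seconds = {
    "Track 1": [25.98, 25.77, 25.736, 25.786, 25.492,
                25.822, 25.886, 25.822, 25.92, 25.718],
    "Track 2": [48.65, 48.274, 44.872, 46.73, 47.208,
                44.706, 46.672, 45.886, 46.726, 51.476],
    "Track 3": [128.47, 126.578, 127.462, 129.472, 128.554,
                125.78, 124.994, 124.42, 125.738, 125.756],
}


def summarize(times):
    """Return (mean, sample standard deviation) of a list of lap times."""
    return mean(times), stdev(times)


if __name__ == "__main__":
    for track, times in best_lap_seconds.items():
        avg, sd = summarize(times)
        print(f"{track}: mean = {avg:.4f} s, std = {sd:.4f} s")
```

Note that the spread on track 1 is on the order of a tenth of a second, far smaller than the roughly two-second spread on tracks 2 and 3.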

TABLE I. BEST LAP TIMES

Run                Best Lap (seconds)
                   Track 1             Track 2             Track 3
1                  25.98               48.65               128.47
2                  25.77               48.274              126.578
3                  25.736              44.872              127.462
4                  25.786              46.73               129.472
5                  25.492              47.208              128.554
6                  25.822              44.706              125.78
7                  25.886              46.672              124.994
8                  25.822              45.886              124.42
9                  25.92               46.726              125.738
10                 25.718              51.476              125.756
Mean / Std. dev.   25.7932 / 0.1340    47.12 / 1.9879      126.7224 / 1.6871

Figure 8. The path driven by the best controllers on the tracks

We tried racing our SNN controllers against some of the TORCS included controllers and against the hand-coded, rule-based controller included with the TORCS CIG competition API. We also tried playing against the SNN controllers ourselves and made some amusing discoveries. Firstly, the evolved controllers were able to beat the rule-based controller without much effort on all three tracks. Secondly, the controllers evolved on track 1 were able to beat both the TORCS included controllers and us. The controllers evolved on track 2, on the other hand, were able to beat us, but were not able to beat the TORCS included controllers. The controllers evolved on track 3 managed to overtake neither the TORCS included controllers nor us, yet they drove much more cleanly than we did, as we kept crashing into the sides of the track.

From our play against the evolved controllers, and from the races we set up between them and the TORCS included controllers, we observed that the evolved controllers accelerate more slowly. The major cause of this, from our observation, is the automated gear shifting strategy used in the controllers' car. Apparently, the car shifted gears earlier than the TORCS controllers and the human-controlled car, hence the slow buildup of speed. Furthermore, the SNN controller accelerates with values in the range [0.0, 1.0], whereas the human-controlled car and the TORCS controllers accelerate at the maximum value (flooring the gas pedal). The SNN controllers from track 1 learned to floor the gas pedal throughout the race, but the SNN controllers from tracks 2 and 3 have not yet learned to do so. Hence, the TORCS controllers have the upper hand because they pick up speed faster. This points to further improvements that we could make on our part.

V. CONCLUSION AND FUTURE WORKS

This study has shown that cars, or games for that matter, controlled by evolved spiking neuron models can perform well. The generated controllers were not only capable of driving through a complete racetrack with little or no damage but also demonstrated sophisticated driving behaviors. We have presented and discussed possible reasons behind many occurrences during the experiment. Interestingly, in some generations the controllers broke the rules and drove off-track to reduce their distance to the finish line. The results obtained from this experiment suggest that the potential of spiking neural networks in games and similar areas is immense. Our focus in this paper was to show that spiking neuron models are capable of acting as well-performing controllers in games; to that end we chose a racing game platform called TORCS and interfaced our controllers through the TORCS CIG competition API.

There are many areas we can extend or optimize in future work, as the complexity of the game, the evolution process, and the network structure are not limited to the methods used in this paper. Future work may include comparing against a conventional neural network, and competitive coevolution against itself, against the conventional neural network, or against the static AI included with the game. Employing incremental learning could perhaps introduce more generalization, and multi-objective techniques could generate better solutions.

REFERENCES

[1] S.J. Thorpe, A. Delorme, and R. Van Rullen, "Spike-based strategies for rapid processing," Neural Networks, vol. 14, pp. 715–726, 2001.

[2] Sander M. Bohte, "The Evidence for Neural Information Processing with Precise Spike-times: A Survey," Natural Computing, vol. 3, pp. 195–206, 2005.

[3] Wolfgang Maass, "Networks of Spiking Neurons: The Third Generation of Neural Network Models," Neural Networks, vol. 10, pp. 1659-1671, 1997.

[4] Wolfgang Maass, "Computation with spiking neurons," in The Handbook of Brain Theory and Neural Networks, 2nd ed.: MIT Press (Cambridge), 2003, pp. 1080-1083.

[5] Morgan Jakobsen, "Learning To Race In A Simulated Environment," Department of Information Technology, Østfold University College, Master's Thesis 2007.

[6] N.G. Pavlidis, D.K. Tasoulis, V.P. Plagianakos, G. Nikiforidis, and M.N. Vrahatis, "Spiking Neural Network Training Using Evolutionary Algorithms," in IEEE International Joint Conference on Neural Networks (IJCNN), Montreal, Que., 2005, pp. 2190-2194.

[7] Dario Floreano and Claudio Mattiussi, "Evolution of Spiking Neural Controllers for Autonomous Vision-Based Robots," in LNCS 2217. Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg, 2001, pp. 38–61.

[8] Dario Floreano, Jean-Christophe Zufferey, and Jean-Daniel Nicoud, "From Wheels to Wings with Evolutionary Spiking Circuits," Artificial Life, vol. 11, no. 1-2, pp. 121-138, January 2005.

[9] J. Togelius and S.M. Lucas, "Evolving controllers for simulated car racing," in The 2005 IEEE Congress on Evolutionary Computation, 2005, pp. 1906-1913.

[10] J. Togelius, P. Burrow, and S.M. Lucas, "Multi-population competitive co-evolution of car racing controllers," in Evolutionary Computation, 2007. CEC 2007. IEEE Congress, Singapore, 2007, pp. 4043-4050.

[11] Jorge Muñoz, German Gutierrez, and Araceli Sanchis, "A human-like TORCS controller for the Simulated Car Racing Championship," in Proceedings 2010 IEEE Conference on Computational Intelligence and Games, Copenhagen, Denmark, 2010, pp. 473-480.

[12] Eugene M. Izhikevich, "Simple Model of Spiking Neurons," IEEE Trans. Neural Networks, vol. 14, no. 6, pp. 1569-1572, 2003.

[13] Eugene M. Izhikevich, "Polychronization: Computation with Spikes," Neural Computation, vol. 18, no. 2, pp. 245-282, 2006.

[14] E. M. Izhikevich, Dynamical systems in neuroscience: The geometry of excitability. Cambridge, MA: The MIT Press, 2006.

[15] D. Loiacono et al., "The WCCI 2008 Simulated Car Racing Competition," in Proc. IEEE Symp. Comput. Intell. Games, 2008, pp. 119-126.

[16] Sander M. Bohte and Joost N. Kok, "Applications of Spiking Neural Networks," Information Processing Letters, pp. 519-520, June 2005.

