



An Empirical Comparison of Non-adaptive, Adaptive and Self-Adaptive Co-evolution for Evolving Artificial Neural Network Game Players

Yi Jack Yau and Jason Teo
School of Engineering and Information Technology, Universiti Malaysia Sabah,
Locked Bag No. 2073, 88999 Kota Kinabalu, Sabah, Malaysia
[email protected] or [email protected]

Abstract— This paper compares non-adaptive, adaptive, and self-adaptive co-evolution for evolving artificial neural networks (ANNs) that act as game players for the game of Tic-Tac-Toe (TTT). The objective of this study is to investigate and empirically compare these three approaches to tuning strategy parameters in co-evolutionary algorithms for evolving ANN game-playing agents. The results indicate that the non-adaptive and adaptive co-evolution systems performed better than the self-adaptive co-evolution system when suitable strategy parameters were utilized. The adaptive co-evolution system also possessed higher evolutionary stability compared to the other systems and was successful in synthesizing ANNs with high TTT playing strength both as the first and as the second player.

Keywords— Adaptation, Self-adaptation, Co-evolution, Game AI, Evolutionary Artificial Neural Networks.

I. INTRODUCTION

In spite of the additional difficulties of co-evolutionary models, they hold significant advantages that have been exploited within the context of EAs to generate solutions to a range of complex problems. A co-evolutionary process that uses no explicit evaluation function can produce the correct combination of representation, search operators, selection criteria, and even the evaluation function itself, and most importantly a good solution or set of solutions, such as optimal strategies in games. Co-evolutionary techniques have been applied to several games, including chess [8], Go [9,13], and Othello [12]. One of the most successful demonstrations of the machine learning capability of evolutionary computation in creating a game-playing program was by Chellapilla and Fogel, whose checkers program Anaconda was created through co-evolution of ANNs without expert knowledge and is able to play checkers at Master level [3,4].

Evolutionary Algorithms (EAs) are meta-heuristics that apply the principles of neo-Darwinian evolution to the construction of artificial intelligence in machine learning and to optimization. They perform well on problems where little or no domain knowledge is available. If knowledge about a problem is available, a bias can be introduced directly into the representation or operators to hasten searching and problem solving. Unfortunately, in many realistic situations where evolutionary computation is applicable, a priori knowledge about the intricacies of the problem or the qualities of the evolving population is inaccessible [1].

In EAs, adapting the algorithm to the problem, that is, altering the parameters and variables of an EA to suit the problem, can be done while the algorithm is searching for a solution. Adaptation can be classified by adaptation type, the mechanism by which adaptation occurs, and by adaptation level, the level inside the EA at which adaptation occurs. There are two main categories of adaptation type: static and dynamic, where the dynamic category is divided further into deterministic, adaptive, and self-adaptive. This project focuses on the adaptive and self-adaptive categories of EA adaptation. For more details about adaptation, the reader can refer to Angeline [1,2] or the survey by Hinterding et al. [7].

Adaptive EA adaptation (i.e. adaptive dynamic adaptation) takes place when the direction and/or magnitude of the change to the strategy parameter(s) is determined using some form of feedback from the EA. The assignment of strategy parameter values may involve credit assignment, and the EA's action may determine whether the new value persists or propagates throughout the population. Adaptation can also be used to alter the objective function by increasing or decreasing penalty coefficients for violated constraints.

Self-adaptation of EA parameters implements the so-called “evolution of evolution” idea [7]. Here the parameters to be adapted are encoded onto the chromosome(s) of the individuals and undergo mutation and recombination. These encoded parameters do not affect the fitness of individuals directly, but good parameter values tend to produce good-performing individuals, which are more likely to survive and produce offspring, and hence to propagate those parameter values.

The main objective of this study is thus to empirically compare a self-adaptive co-evolutionary approach against an adaptive co-evolutionary approach and a static, non-adaptive co-evolutionary approach for the purpose of automatically generating game AI in the form of neural network players for the game of Tic-Tac-Toe. The performance of each approach is measured by the playing strength of the evolved ANN game-playing agents.

A. Tic-Tac-Toe (TTT)

TTT is a game in which two players alternately place crosses and circles in the compartments of a 3 x 3 board. The objective of the game is to get a row of three crosses or three circles before the opponent does. Player one moves first, placing a cross, followed by player two, placing a circle. If at the end of the game neither player has met the objective, a draw is awarded to both players. The objectives of a TTT first player and second player are very different: at the expert level, a first player aims to force a win or a draw, whereas a second player should force a draw by blocking the first player's winning moves.
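
For concreteness, a minimal Python sketch of this win condition follows; the function name is our own, and the board encoding (1.0 for "X", -1.0 for "O", 0.0 for an empty square) anticipates the representation described in Section II.

def has_won(board, mark):
    # board: 3 x 3 nested list; mark: 1.0 for "X" or -1.0 for "O".
    lines = [[(r, c) for c in range(3)] for r in range(3)]                 # rows
    lines += [[(r, c) for r in range(3)] for c in range(3)]                # columns
    lines += [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]  # diagonals
    return any(all(board[r][c] == mark for (r, c) in line) for line in lines)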

II. NON-ADAPTIVE CO-EVOLUTION ON TTT

Using the co-evolution paradigm to automatically generate ANNs without any expert knowledge of the game, the co-evolutionary system successfully synthesized ANNs capable of playing TTT intelligently, that is, at a level that never loses but does not force a win when the competitor makes no errors [4]. The ANN takes a board position as input and outputs a value for each position expressing the desirability of that position for the player to move, with the position whose value is nearest to one being preferred. Our previous study on using DE to evolve TTT game-playing ANN agents [15] has shown superior performance compared to the original EP-based co-evolutionary TTT system reported in [5].

A. The Co-evolutionary Differential Evolution System

A standard multi-layered feed-forward ANN was selected to receive a board pattern as input and to output a position of the board as the corresponding move. Each node of the hidden layer and output layer computes the sum of its weighted input strengths, subtracts an adaptable bias term, and passes the result through the sigmoid filter shown in (1),

1 / (1 + e^(−ξ)).  (1)

where ξ is the sum of the weighted input strengths minus the bias term.

The ANN consisted of an input layer of nine nodes (with an additional bias unit), a hidden layer of varying size (between 1 and 10 hidden nodes, with an additional bias unit), and an output layer of nine nodes, where each input and output node corresponded to a square of the TTT grid.
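
For concreteness, the following is a minimal Python sketch of this network under stated assumptions: the NumPy-based structure and the class and function names are ours, while the layer sizes and the [-0.5, 0.5] initialization follow the description in the text.

import numpy as np

def sigmoid(x):
    # Equation (1): 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + np.exp(-x))

class FeedForwardANN:
    """Nine inputs, a variable hidden layer, nine outputs, with adaptable biases."""
    def __init__(self, hidden, rng):
        # Weights and biases drawn uniformly from [-0.5, 0.5], as in the paper.
        self.w1 = rng.uniform(-0.5, 0.5, (hidden, 9))
        self.b1 = rng.uniform(-0.5, 0.5, hidden)
        self.w2 = rng.uniform(-0.5, 0.5, (9, hidden))
        self.b2 = rng.uniform(-0.5, 0.5, 9)

    def forward(self, board):
        # board: length-9 vector of 1.0 ("X"), -1.0 ("O") and 0.0 (empty) values.
        h = sigmoid(self.w1 @ board - self.b1)  # weighted sum minus bias, then sigmoid
        return sigmoid(self.w2 @ h - self.b2)   # one desirability score per square

net = FeedForwardANN(hidden=5, rng=np.random.default_rng(0))
scores = net.forward(np.zeros(9))  # scores for the empty board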

The 3 x 3 board state is represented as a two-dimensional 3 x 3 array of nine values. A blank open space was denoted by the value 0.0, an “X” by the value 1.0, and an “O” by the value -1.0. This array, representing the current board pattern after the opposing player's move, is presented to the ANN, and the relative strengths of the nine output nodes are then examined to determine the counter-move made by the game AI system. The position of the empty square with the maximum output strength was chosen as the move, which ensures that only legal moves are made. Occupied squares were simply ignored, and no selection pressure was applied to force their outputs to zero [4].
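
A minimal sketch of this legal-move selection; the function name is ours, and scores stands for the nine output strengths produced by the network sketched above.

def pick_move(board, scores):
    # board: length-9 sequence; scores: the ANN's nine output strengths.
    # Choose the empty square (0.0) with the highest output strength; occupied
    # squares are ignored, so the returned move is always legal.
    legal = [i for i in range(9) if board[i] == 0.0]
    return max(legal, key=lambda i: scores[i])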

The original EP-based co-evolutionary system was initialized with a population of 50 ANNs, each having its weight connections and bias terms set at random from a uniform distribution over [-0.5, 0.5]. Each parent created an offspring by mutating each weight and bias term, adding a Gaussian random variable with zero mean and a standard deviation of 1. With a 50 percent chance, the number of nodes in the hidden layer was allowed to vary, subject to the constraints on the maximum and minimum number of nodes. The weights and bias term of any newly added node are set to 0.0.
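
A sketch of this EP-style Gaussian mutation, assuming the NumPy representation of the FeedForwardANN sketch above; the helper name is ours, and the structural (node add/remove) mutation is deliberately omitted.

import numpy as np

def mutate(net, rng):
    # Add a zero-mean, unit-standard-deviation Gaussian variable to every
    # weight and bias term, as in the original EP-based system.
    for param in (net.w1, net.b1, net.w2, net.b2):
        param += rng.normal(0.0, 1.0, param.shape)
    # The structural mutation (varying the hidden layer with 50 percent
    # probability, new node weights set to 0.0) is omitted here for brevity.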

DE is a very simple but very powerful population-based stochastic optimization algorithm [14]. Instead of using a pure mutation methodology as in the original EP-based co-evolutionary system, our proposed co-evolutionary system uses an additional genetic operator in the form of a crossover function that implements the DE concept in the creation of new offspring. The crossover rate is set to 0.5 and the weighting factor of this DE is a random real value between [0.0, 0.5] (see the pseudocode below, followed by a Python sketch).

1) Differential Evolution implementation pseudocode:

For each parent:
    Randomly pick two other parents.
    Build the weighted difference vector for the connection weights with method BWDV.
    Build the weighted difference vector for the bias terms with method BWDV.
    The parent creates an offspring that contains the same configuration as itself.
    Add both of the weighted difference vectors into the offspring configuration.
    Mutate the offspring configuration.

2) Build Weighted Difference Vector method (BWDV):

With a 50 percent chance:
    return Factor * (vector1_value − vector2_value)
Else:
    return 0

where Factor is a random value uniformly chosen from [0.0, 0.5].
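
A hedged Python sketch of this offspring-creation step, reusing the FeedForwardANN and mutate sketches above; the names are ours, the per-element coin flip and the single Factor draw per BWDV call are our reading of the pseudocode, and the two picked parents are assumed to share the parent's hidden-layer size.

import copy
import numpy as np

def bwdv(v1, v2, rng):
    # Build Weighted Difference Vector: per element, with a 50 percent chance
    # take Factor * (v1 - v2), otherwise 0; Factor ~ U[0.0, 0.5] per call.
    factor = rng.uniform(0.0, 0.5)
    mask = rng.random(v1.shape) < 0.5
    return np.where(mask, factor * (v1 - v2), 0.0)

def de_offspring(parent, population, rng):
    # Randomly pick two other parents.
    others = [p for p in population if p is not parent]
    i, j = rng.choice(len(others), size=2, replace=False)
    p1, p2 = others[i], others[j]
    child = copy.deepcopy(parent)  # offspring starts as a copy of its parent
    # Add the weighted difference vectors for weights and bias terms.
    for name in ("w1", "b1", "w2", "b2"):
        diff = bwdv(getattr(p1, name), getattr(p2, name), rng)
        setattr(child, name, getattr(child, name) + diff)
    mutate(child, rng)  # Gaussian mutation, as in the EP system
    return child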

The 100 ANNs, comprising the 50 parent and 50 offspring ANNs, then competed against a rule-based player in games of TTT, where each ANN played as the first player “X” in 32 games. The first move was examined by the rule-based player, with the eight possible second moves being stored in an array. The rule-based algorithm worked as follows (a Python sketch follows the list):

1. Based on all legal moves, select a move that has not yet been played.

2. For following moves:

With a 10 percent chance, move randomly, else

If a win is available, place a marker in the winning square, else

If a block is available, place a marker in the blocking square, else

If two open squares are in line with an “O”, randomly place a marker in either of the two squares, else

Randomly move in any open square.

3. Continue with (2) until the game is completed.

4. Continue with (1) until games with all eight possible second moves have been played.
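
The following is a minimal Python sketch of step 2 of this policy; the function name, the line-scanning details, and the 3 x 3 nested-list board are our own reading of the rules.

import random

# All eight winning lines on the 3 x 3 board.
LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
         [[(r, c) for r in range(3)] for c in range(3)] +
         [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def rule_based_move(board, me=-1.0, foe=1.0):
    # board: 3 x 3 nested list; the rule-based player moves second as "O" (-1.0).
    empty = [(r, c) for r in range(3) for c in range(3) if board[r][c] == 0.0]
    if random.random() < 0.1:                    # 10 percent chance: move randomly
        return random.choice(empty)
    for target in (me, foe):                     # take a win first, then a block
        for line in LINES:
            vals = [board[r][c] for (r, c) in line]
            if vals.count(target) == 2 and vals.count(0.0) == 1:
                return line[vals.index(0.0)]
    for line in LINES:                           # two open squares in line with an "O"
        vals = [board[r][c] for (r, c) in line]
        if vals.count(me) == 1 and vals.count(0.0) == 2:
            return random.choice([sq for sq, v in zip(line, vals) if v == 0.0])
    return random.choice(empty)                  # otherwise move in any open square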

The payoff function {+1, -10, 0} gives the rewards for winning, losing, and drawing, respectively. A selection routine then compares each neural network against 10 other randomly chosen networks; if the score of the chosen network is greater than or equal to its competitor's, it receives a win. The networks with the greatest number of wins are kept as parents of the next generation. This process of co-evolutionary self-play iterates for a maximum of 800 generations. The only feedback in the process is an aggregate score earned over a series of games; hence no explicit evaluation function is needed in the co-evolutionary system.
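
A sketch of that selection routine under stated assumptions: the (network, score) pair representation and the names are ours, and for simplicity the sampled competitors may include the network itself.

import random

def select_parents(scored, n_parents=50, n_opponents=10):
    # scored: list of (network, aggregate_score) pairs for all 100 ANNs.
    # Each network earns a "win" against every sampled competitor whose
    # score is not higher; the 50 networks with the most wins become parents.
    results = []
    for net, score in scored:
        opponents = random.sample([s for _, s in scored], n_opponents)
        wins = sum(score >= opp for opp in opponents)
        results.append((wins, net))
    results.sort(key=lambda t: t[0], reverse=True)
    return [net for _, net in results[:n_parents]]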

III. ADAPTIVE CO-EVOLUTION

In adaptive co-evolution, the direction and/or magnitude of the modification of the strategy parameters is decided using some form of feedback from the EA. The highest ANN score of the current generation is used as the feedback from the EA. If the feedback is a non-negative value, a deduction within a given range is applied to the strategy parameters (consisting of the mutation rate and crossover rate).
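
A minimal sketch of this adaptive update rule; the dictionary representation and names are ours, the deduction range corresponds to Table I (AC1 shown), and the clamping at 0.0 is our assumption.

import random

def adapt_parameters(params, best_score, deduction=(0.001, 0.002)):
    # params: dict with "mutation_rate" and "crossover_rate", both starting at 1.0.
    # If the generation's best ANN score is non-negative, deduct a random amount
    # drawn from the configured range (e.g. [0.1%, 0.2%] for AC1).
    if best_score >= 0:
        for key in ("mutation_rate", "crossover_rate"):
            params[key] = max(0.0, params[key] - random.uniform(*deduction))
    return params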

IV. SELF-ADAPTIVE CO-EVOLUTION

In self-adaptive co-evolution, self-adaptation of parameters implements the “evolution of evolution” idea. Here the parameters, consisting of the mutation rate and crossover rate, are encoded into the chromosomes of individuals and undergo the genetic operations of mutation and recombination. These parameters are also varied by deduction within a range.
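
A sketch of how such per-individual strategy parameters might be encoded and varied; the class and names are ours, the deduction range corresponds to Table I (SC1 shown), and the clamping at 0.0 is our assumption.

import random

class SelfAdaptiveIndividual:
    # Strategy parameters travel on the chromosome of each individual.
    def __init__(self, network, mutation_rate=1.0, crossover_rate=1.0):
        self.network = network
        self.mutation_rate = mutation_rate
        self.crossover_rate = crossover_rate

    def vary_parameters(self, deduction=(0.001, 0.002)):
        # The encoded rates themselves undergo variation: a random deduction
        # within the configured range (e.g. [0.1%, 0.2%] for SC1).
        self.mutation_rate = max(0.0, self.mutation_rate - random.uniform(*deduction))
        self.crossover_rate = max(0.0, self.crossover_rate - random.uniform(*deduction))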

V. EXPERIMENTAL SETUP

The aim of these experiments is to compare and observe the effects of all these systems on TTT playing strength. Our experiments cover three different adaptation approaches: adaptive, self-adaptive, and the static parameter setup. The first system is the normal co-evolution system, which is static and non-adaptive; the second and third systems implement the adaptive co-evolution system with two different deduction ranges; and the fourth and fifth systems implement self-adaptive co-evolution with different strategy settings. Each experiment was repeated for 50 trial runs. See Table I for the parameter deduction range of each adaptive and self-adaptive system.

TABLE I. EXPERIMENT SETUP DETAILS

System                               Mutation rate (initial value / deduction range)   Crossover rate (initial value / deduction range)
Normal Co-evolution DE (NC)          50% (fixed)                                       50% (fixed)
Adaptation Co-evolution (AC1)        100% / [0.1%, 0.2%]                               100% / [0.1%, 0.2%]
Adaptation Co-evolution (AC2)        100% / [0.01%, 0.05%]                             100% / [0.01%, 0.05%]
Self-adaptation Co-evolution (SC1)   100% / [0.1%, 0.2%]                               100% / [0.1%, 0.2%]
Self-adaptation Co-evolution (SC2)   100% / [0.01%, 0.05%]                             100% / [0.01%, 0.05%]

After completing all of the above experiments, the best five scoring ANNs from each experiment are selected. Each selected ANN then competes against every other selected ANN as well as against a near-perfect rule-based player. A competition consists of a set of five games, so each selected ANN plays a total of 125 games, first as the first player and then as the second player.

VI. RESULTS & DISCUSSION

A. Overall performance: all trials of each system

Tables II and III show the performance of the trials of each system, and Figures 1 through 5 show each system's tournament scores per trial. AC1 out-performed all the other systems in terms of producing good-performing ANNs and evolutionary stability. Almost half of its trials scored well in the tournament (>= 20); it has the lowest standard deviation of tournament scores, making it the most stable co-evolutionary system of the five; and it also has the highest average tournament score. The five figures likewise show that AC1 is more stable: its range of trial scores was lower than that of all other systems.

AC2 was also successful in producing high-performance trials. This system had the most trials (five) scoring in the highest tournament score range, but its second-highest standard deviation of tournament scores (SC2's was the highest) shows its instability.

Similar to AC1 and AC2, the NC system also has high-scoring trials. In terms of evolutionary stability it was similar to SC1, the difference being that NC's average tournament score was higher.

From the viewpoint of tournament scores, neither SC1 nor SC2 obtained outstanding performance. Their average performances were not very high compared with the other systems, and they were also less stable than NC and AC1.

TABLE II. THE NUMBER OF TRIALS OF EACH SYSTEM FOR THE GIVEN RANGE OF SCORES

System   x < 20   20 ≤ x < 25   25 ≤ x < 30   x ≥ 30
NC         35         12             3           0
AC1        27         20             3           0
AC2        34         11             5           0
SC1        45          5             0           0
SC2        43          7             0           0

TABLE III. AVERAGE, STANDARD DEVIATION, MAXIMUM, MINIMUM AND RANGE REPRESENTING OVERALL PERFORMANCE OF TRIALS OF EACH SYSTEM

System   Average   Standard Deviation   Max   Min   Range
NC        15.22          5.9532          25     6     19
AC1       17.60          5.2992          26     8     18
AC2       15.58          6.4114          27     4     23
SC1       13.28          6.0001          22     3     19
SC2       13.12          6.6751          24     0     24

Figure 1. Tournament scores of the ANNs evolved by the NC approach.

Figure 2. Tournament scores of the ANNs evolved by the AC1 approach.

Figure 3. Tournament scores of the ANNs evolved by the AC2 approach.

Figure 4. Tournament scores of the ANNs evolved by the SC1 approach.

Figure 5. Tournament scores of the ANNs evolved by the SC2 approach.

B. Performance of the best five ANNs of each system

TABLE IV. THE PERFORMANCE OF THE FIVE SELECTED ANNS OF THE FIVE SYSTEMS PLAYING AS THE FIRST PLAYER

ANN       Win (Games)   Loss (Games)   Draw (Games)
NC i          100             0             25
NC ii         100            10             15
NC iii         80            10             35
NC iv          95            10             20
NC v          110             0             15
AC1 i          90             5             30
AC1 ii         90             0             35
AC1 iii        75             5             45
AC1 iv         85            25             15
AC1 v          60            15             50
AC2 i          90            10             25
AC2 ii         70            20             35
AC2 iii        80             5             40
AC2 iv        100             0             25
AC2 v         105             0             20
SC1 i          90            15             20
SC1 ii         50            40             35
SC1 iii        80            20             25
SC1 iv         60            40             25
SC1 v          90             5             30
SC2 i          80             5             40
SC2 ii         80            10             35
SC2 iii        75            25             25
SC2 iv        100             0             25
SC2 v          80            15             30

Tables IV, V and VI show the performance of the selected best five ANNs from each system playing as both first and second players. NC succeeded in creating a perfect first-player game-playing agent but could not perform well as the second player, unlike AC2, which synthesized game-playing agents that not only played the perfect game as the first player but also solved some of the problems of playing as the second player. SC1 on average did not do well as either first or second player. AC1 performed slightly better than SC1, and SC2 in turn performed slightly better than AC1, but not as well as NC and AC2.

TABLE V. THE PERFORMANCE OF THE FIVE SELECTED ANNS OF THE FIVE SYSTEMS PLAYING AS THE SECOND PLAYER

ANN       Win (Games)   Loss (Games)   Draw (Games)
NC i           10            90             25
NC ii          20            75             30
NC iii         20            65             40
NC iv          15            90             20
NC v           10            90             25
AC1 i          10           100             15
AC1 ii          0           125              0
AC1 iii        20            75             30
AC1 iv          5            95             25
AC1 v          20            90             15
AC2 i           5            45             75
AC2 ii         35            55             35
AC2 iii         5            85             35
AC2 iv          0            60             65
AC2 v          10            70             45
SC1 i           0           125              0
SC1 ii          5           110             10
SC1 iii         0           125              0
SC1 iv          5           115              5
SC1 v          25            90             10
SC2 i          15           100             10
SC2 ii          0           110             15
SC2 iii        10            85             30
SC2 iv         40            65             20
SC2 v           0           105             20

TABLE VI. AVERAGE PERFORMANCE OF THE FIVE SELECTED ANNS OF THE FIVE SYSTEMS

Playing role   System   Average Win (Games)   Average Loss (Games)   Average Draw (Games)
1st player     NC               97                     6                     22
1st player     AC1              80                    10                     35
1st player     AC2              89                     7                     29
1st player     SC1              80                    15                     30
1st player     SC2              83                    11                     31
2nd player     NC               15                    82                     28
2nd player     AC1              11                    97                     17
2nd player     AC2              11                    63                     51
2nd player     SC1               7                   113                      5
2nd player     SC2              13                    93                     19

VII. CONCLUSION

Overall, the adaptive co-evolutionary system was able to automatically synthesize neural network game-playing agents for both the first and second players, whereas the normal co-evolutionary system could only produce good game-playing agents as the first player. Furthermore, the adaptive co-evolutionary system also displayed greater evolutionary stability. On the other hand, the self-adaptive co-evolutionary approaches did not perform as well overall as had been expected. Pre-testing and hand-tuning to find suitably good values for the strategy parameters can significantly affect the performance of the co-evolutionary process. The success of the NC system in synthesizing a perfect first-player TTT game-playing agent provides further proof that this evolutionary methodology can in fact be successfully utilized to synthesize intelligent game AI. The adaptive co-evolutionary approaches have higher evolutionary stability and can also produce ANNs that play well as both first and second players, depending on the strategy parameters of the adaptation. The difference between the settings of AC1 and AC2 is not large, yet the dissimilarity of their resulting ANNs' playing strengths is very noticeable; the same applies to SC1 and SC2 when comparing the playing strengths of the five selected ANNs of both systems. Having more information about a problem and introducing it as an appropriate problem-specific bias into the evolutionary search enables solutions to be found more quickly. Furthermore, such information can also significantly help in implementing successful dynamic adaptation approaches.

REFERENCES

[1] P.J. Angeline, "Adaptive and self-adaptive evolutionary computation", in M. Palaniswami, Y. Attikiouzel, R. Marks, D. Fogel and T. Fukuda (eds), Computational Intelligence: A Dynamic System Perspective, IEEE Press, 1995, pp 152-161.

[2] P.J. Angeline, "Two self-adaptive crossover operators for genetic programming", in P.J. Angeline and K.E. Kinnear (eds), Advances in Genetic Programming II, MIT Press, 1996, pp 89-110.

[3] K. Chellapilla and D.B. Fogel, "Evolution, neural networks, games, and intelligence", Proceedings of the IEEE, vol. 87, no. 9, 1999, pp 1471-1496.

[4] K. Chellapilla and D.B. Fogel, "Evolving an expert checkers playing program without using human expertise", IEEE Transactions on Evolutionary Computation, vol. 5, no. 4, 2001, pp 422-428.

[5] D.B. Fogel, "Using evolutionary programming to construct neural networks that are capable of playing Tic-Tac-Toe", Proceedings of the 1993 IEEE International Conference on Neural Networks, San Francisco, CA, 1993, pp 875-880.

[6] N. Franken and A.P. Engelbrecht, "Evolving game-playing agents", Proceedings of SAICSIT 2003, Annual Conference of the South African Institute of Computer Scientists and Information Technologists, 2003, pp 102-110.

[7] R. Hinterding, Z. Michalewicz and A.E. Eiben, "Adaptation in evolutionary computation: a survey", Proceedings of the 1997 IEEE International Conference on Evolutionary Computation, 13-16 April 1997, pp 65-69.

[8] G. Kendall and G. Whitwell, "An evolutionary approach for the tuning of a chess evaluation function using population dynamics", 2001 IEEE Congress on Evolutionary Computation (CEC 2001), 2001, pp 995-1002.

[9] A. Lubberts and R. Miikkulainen, "Co-evolving a Go-playing neural network", in R. Belew and H. Juille (eds), Coevolution: Turning Adaptive Algorithms upon Themselves, 2001, pp 14-19.

[10] L. Messerschmidt and A.P. Engelbrecht, "Learning to play games using a PSO-based competitive learning approach", Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and Learning, Singapore, 2002.

[11] Z. Michalewicz and D.B. Fogel, How to Solve It: Modern Heuristics, 2nd edn, Springer, Berlin Heidelberg New York, 2004.

[12] D. Moriarty and R. Miikkulainen, "Discovering complex Othello strategies through evolutionary neural networks", Connection Science, vol. 7, no. 3-4, 1995, pp 195-207.

[13] N. Richards, D. Moriarty, P. McQuesten and R. Miikkulainen, "Evolving neural networks to play Go", Proceedings of the 7th International Conference on Genetic Algorithms, 1997.

[14] R. Storn, "System design by constraint adaptation and differential evolution", IEEE Transactions on Evolutionary Computation, vol. 3, no. 1, 1999, pp 22-34.

[15] Y.J. Yau and J. Teo, "Co-evolutionary versus canonical algorithms for evolving game AI", International Conference on Intelligent Systems (ICIS 2005), CD Proceedings, Kuala Lumpur, December 2005.