Football Challenge - Genetic Algorithm

Introduction

My immediate thoughts on seeing this challenge was that it was just a matter of trying all the possible variations - and picking a set of players such that the points total was highest, whilst staying within the budget and obeying the constraints of team members.

However after thinking about it I decided that the number of solutions to test would be rather large - certainly larger than I could manage mentally, and it could well be the case that there was no optimum solution.

Enter Mr. Computer.

I spent a good few hours working out how to choose permutations of players from the available set - but I couldn't see a way to do it elegantly.

Rather than use brute force to try to choose the best team I decided to use the elegant, and random, method of genetic algorithms.

My solution involves choosing a team of players randomly then testing whether it is a) valid, and b) worth keeping.

Genetic Algorithms

Genetic algorithms are a technique for literally breeding a solution from random selections.

Typically there would be a set of inputs which would be used to generate a random solution, that solution would have it's validity and effectiveness measured by means of something called a fitness function.

In this case we can define the input as being a team of 11 players and the fitness as being the total number of points of the players in the team.

How This Works

The basic process is fairly simple, and most GA implementations are largely identical.

First of all 500 teams of players are generated completely at random.

Then we process each of the chosen teams in turn, testing first to see if it is a valid team, and secondly to total up the number of points it contains. This total is known as the teams fitness. (No pun intended).

Once that's done we can sort the list of solutions against the number of points.

The worse 100 teams are simply discarded and replaced by a new random selection.

Then the 500 teams are 'bred'. This is a simple matter of picking a pivot point and swapping the players from that point in each of the two teams.

So if we have the following two teams chosen which we want to breed:

  TEAM A                     TEAM B
 ---------------------------------------
 1a. N Martyn            1b. N Martyn
 2a. JA  Riise           2b. JA  Riise
 3a. C   Ziege           3b. I   Harte
 4a. J   Terry           4b. A   Melville
 5a. A   Melville        5b. S   Hyypia
 6a. L   Robert          6b. R   Pires
 7a. N   Solano          7b. L   Robert
 8a. R   Pires           8b. S   Lomas
 9a. P   Scholes         9b. C   Marsden
10a. M   Pahars         10b. T   Henry
11a. JF  Hasselbaink    11b. A   Shearer

Picking the pivot point 6 gives us the following new teams, which are just generated by swapping the members of each team from the pivot point onwards:

  NEW TEAM A                    NEW TEAM B
 ---------------------------------------
 1a. N Martyn            1b. N Martyn
 2a. JA  Riise           2b. JA  Riise
 3a. C   Ziege           3b. I   Harte
 4a. J   Terry           4b. A   Melville
 5a. A   Melville        5b. S   Hyypia
 6b. R   Pires           6a. L   Robert
 7b. L   Robert          7a. N   Solano
 8b. S   Lomas           8a. R   Pires
 9b. C   Marsden         9a. P   Scholes
10b. T   Henry          10a. M   Pahars
11b. A   Shearer        11a. JF  Hasselbaink

Seeing how simple this breeding function is might be suprizing, but it's quite effective. I did toy with more complex schemes but it did just as well.

The important thing to realise is that the breeding may give an in invalid team - that's OK because after each team has been bred we simply discard the worst 100 teams anyway. (Those are almost certainly invalid).

My Program

My program is written in C++ and makes use of two classes:

CPlayer: Which holds details of each individual player.
CSolution: A solution which has been chosen randomly - This holds 11 CPLayer objects internally and has functions for calculating fitness, and breeding, etc.

When the program starts it generates 500 random solutions.

The 500 have their fitness calculated, and the least fit 100 teams are simply discarded. After that each team is paired with a random other team and they are bred as described above.

After 100 iterations of this process the best team is displayed.

A sample run starts like this:

Generation:   0 - Best :    290
Generation:   1 - Best :    345
Generation:   2 - Best :    413
Generation:   3 - Best :    440
Generation:   4 - Best :    480
Generation:   5 - Best :    478
...
...
...

As you can see the fitness, or points total, fluctuates slightly up and down - due to the destructive nature of the breeding - but tends upwards.

Notes

The header file 'playdef.cc' contains a definition of the player object and the list of players. This list of players was taken directly from Andy's e-mail.

Rather than typing/formatting it by hand I wrote a small perl program to do it for me, this can be found in the `generate directory

Failures or Shortcomings

There are two weaknesses in my current implementation:

My program does not make the selection of players completely at random - it always picks a goal keeper first, then 2 fullbacks, and so on. This could mean that many solutions aren't tried because the cash for the later players has already been spent.
1/5 of the solutions are discarded - this may not be enough to eliminate the threat of local maxima.

I guess somebody else could clean that up and win the prize ..

My Solution

My solution is the following team - which I generated by running the program 10 times with 100 generations at each run:

Team | Cost | Points |    Position  | Name
-----------------------------------------------------
 LEE |    3 |     37 |  Goal Keeper | N Martyn
 TOT |    3 |     41 |   Full Backs | C   Ziege
 LEE |    4 |     62 |   Full Backs | I   Harte
 LIV |    4 |     50 | Centre Backs | S   Hyypia
 FUL |    3 |     30 | Centre Backs | A   Melville
 ARS |    5 |     59 |    Mid Field | R   Pires
 FUL |    4 |     44 |    Mid Field | S   Malbranque
 NEW |    5 |     66 |    Mid Field | L   Robert
 NEW |    4 |     47 |    Mid Field | N   Solano
 SOU |    6 |     56 |     Forwards | M   Pahars
 CHE |    9 |     87 |     Forwards | JF  Hasselbaink
==================================
		   Total Cost: 50  Total Points: 579

My Program

The following program is written in standard C++ and should build without a problem under most Unix flavours.

sgkemp.tar.gz.