Transcript-first notes for GA: communication, representation, selection, crossover, mutation, elitism, replacement, population vs generations, schema, self-breeding, BPP and 8-queens / 8-puzzle, plot behavior, doctor-style answers.
The doctor's distinguishing test: a method is a population metaheuristic only if its individuals communicate. Hill Climbing with random restart maintains many solutions in parallel, but the runs are independent — no information passes between them. GA is the smallest interesting model that adds communication.
| Stage | What happens | Doctor-style framing |
|---|---|---|
| Initial population | N individuals sampled in some way. | Must be scattered. Premature convergence starts here. |
| Fitness assessment | Score each individual. | The "external" measure that drives selection. |
| Selection | Pick parents based on fitness. | Survival of the fittest, but the weak must have a chance. |
| Crossover | Mix two parents to produce children. | The communication channel. |
| Mutation | Per-gene small random change. | The exploration injection. |
| Replacement | Decide who survives into the next generation. | Generational (parents die) vs steady-state (parents compete). |
| Stop criterion | Time, ideal solution, no improvement. | Same family as HC/SA stop rules. |
Each gene is a bit. Crossover and mutation are mechanically simple.
Hidden danger
Bit position carries weight: flipping the high bit of 1110 drops 14 to 6, while flipping the low bit moves 14 to 15. Locality is broken — a single mutation can be either tiny or huge. "You don't know if you are exploring or exploiting."
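A two-line check of the claim, as a minimal sketch (`flip_bit` is a hypothetical helper name):

```python
def flip_bit(x: int, i: int) -> int:
    """Flip bit i (0 = least significant) of x."""
    return x ^ (1 << i)

# 14 = 0b1110: the effect of a one-bit mutation depends entirely on position.
print(flip_bit(14, 3))  # high bit: 14 -> 6, a jump of 8
print(flip_bit(14, 0))  # low bit:  14 -> 15, a step of 1
```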
Binary encoding where neighbouring values differ by exactly one bit.
Strong locality
"Gray code by design — any bit you change, value changes by one." Mutation behaves predictably. Preferred when the landscape rewards careful local refinement.
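The standard reflected binary Gray code is `b ^ (b >> 1)`; a quick sketch verifying the one-bit-per-step property:

```python
def gray(b: int) -> int:
    """Standard reflected binary Gray code of integer b."""
    return b ^ (b >> 1)

# Neighbouring values always differ in exactly one bit position.
for v in range(15):
    diff = gray(v) ^ gray(v + 1)
    assert bin(diff).count("1") == 1
```

Under this encoding a single bit-flip mutation moves between codes of nearby values far more often than plain binary does, which is the locality the doctor wants.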
Each gene is a number, character, or symbol.
Mutation is typically a small Gaussian perturbation; crossover keeps the slot semantics. Locality is whatever you make it.
Each gene is an index, with the constraint that no two genes share a value.
Used for 8-queens (gene = row of queen in column), bin packing (gene = item order), TSP. Naive crossover and value-flip mutation break the no-duplicate constraint — you must repair, design out, or accept the larger search space.
Each gene is a token; the chromosome is a sequence (8-puzzle: U/D/L/R) or a tree (Koza-style symbolic regression).
Crossover swaps subtrees or substrings. Tree representations grow without bound unless you cap depth.
The doctor ranks initialization strategies on a quality / diversity / cost spectrum. Initial diversity is the first defence against premature convergence.
| Strategy | Diversity | Cost | Risk |
|---|---|---|---|
| Pseudo-random (uniform) | Moderate | Cheap | Default safe choice. |
| Quasi-random (grid + jitter) | Good | Cheap | Fills space evenly without clustering. |
| Sequential diversification (only insert if at least D away) | Best | Expensive | Free of near-duplicates, but slow for large populations. |
| Latin hypercube / parallel diversification | Good | Moderate | Used in design of experiments. |
| Heuristic seeding (FFD / BFD / hand-crafted) | Low | Moderate | Fast convergence but risk of bully. |
Probability proportional to fitness. Sample a cumulative slice with a random number in [0, sum].
Maximization: works directly.
Minimization: needs a hack. Doctor's class debated 1 / f and (C - f). Both are subjective: the offset C changes the bully effect dramatically. With C = 1000 and fitnesses around 17, every score flattens to ~983 - selection is almost uniform.
Bully risk
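The offset sensitivity is easy to see numerically. A minimal sketch of the (C - f) transform (`roulette_probs` is a hypothetical helper name; the fitness values are illustrative):

```python
def roulette_probs(fitnesses, C):
    """Selection probabilities for minimization via the (C - f) transform."""
    scores = [C - f for f in fitnesses]
    total = sum(scores)
    return [s / total for s in scores]

fits = [15, 17, 19]                # minimization: 15 is best
print(roulette_probs(fits, 20))    # tight offset: strong pressure toward 15
print(roulette_probs(fits, 1000))  # huge offset: probabilities flatten to ~1/3 each
```

With C = 20 the best individual gets over half the wheel; with C = 1000 all three are nearly uniform. The offset alone decides whether the best individual is a bully or a nobody.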
Pick k=2 (or more) at random and the fittest wins.
Without replacement of the opponent: the weakest is always paired against someone stronger and always loses. "Done. The weakest is always out."
With replacement: the weakest can be paired against itself — its survival probability is tiny but non-zero. The doctor preferred this: "the weak has to have low chance of being selected — but a chance."
Works for both maximization and minimization with no hack.
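A minimal sketch of k-tournament with replacement (function and parameter names are assumptions of this sketch):

```python
import random

def tournament(pop, fitness, k=2, maximize=True):
    """k-tournament with replacement: draw k contenders (repeats allowed)
    and return the fittest. The weakest individual can be drawn against
    itself, so it keeps a tiny but nonzero chance of winning.
    Minimization needs no fitness transform: just flip `maximize`."""
    contenders = [random.choice(pop) for _ in range(k)]  # with replacement
    best = max if maximize else min
    return best(contenders, key=fitness)
```

No offset C, no 1/f hack: the comparison is ordinal, which is why it works for both directions out of the box.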
Sort by fitness, then sample by rank distribution rather than raw fitness.
Smooths selection pressure. Immune to fitness scaling and to the maximization/minimization issue.
Stochastic Universal Sampling, Boltzmann, sigma scaling, steady-state. The doctor mentioned them as alternatives that smooth selection pressure further.
Take one parent as-is. No recombination at all. Falls back to evolution-strategy / self-breeding.
No communication
One cut. Preserves large schemata from each parent. Mild structural disturbance. Cannot mix non-contiguous traits ("you cannot take eye colour from one parent and hair colour from another if they are not next to each other").
Conservative
Two cuts. Lets you swap a middle segment. More flexible than one-point.
Balanced
Each gene independently inherited from one parent or the other. Maximum flexibility but maximum schema destruction.
Aggressive exploration
| Rate | Per-gene meaning | Behaviour | Tag |
|---|---|---|---|
| Very small | Almost nothing inherited from the other parent. | Aggressive exploitation. Schemata stay intact, almost no mixing. | Exploitation |
| Moderate (biased) | e.g. 80/20 toward the main parent. | Healthy mixing. Doctor's analogy: "the kid resembles the father" with some traits from the mother. | Balanced |
| 50/50 (uniform) | Coin flip per gene. | "Random. Random always means exploration." Schemata are destroyed every generation. | Aggressive exploration |
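The whole table collapses into one per-gene coin with a bias knob. A minimal sketch (name and signature are assumptions):

```python
import random

def uniform_crossover(p1, p2, bias=0.8):
    """Per-gene inheritance: with probability `bias` take the gene from p1.
    bias near 1.0 -> exploitation (child resembles the main parent);
    bias = 0.5  -> coin flip per gene, aggressive exploration."""
    return [a if random.random() < bias else b for a, b in zip(p1, p2)]
```

At bias = 0.8 the kid resembles the father with some traits from the mother; at bias = 0.5 every schema straddling both parents is cut with high probability.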
Mutation is per-gene: for each gene, draw a uniform random number, mutate if it is below the mutation rate. The classical "good default" is rate = 1/L — on average one mutation per chromosome per generation.
| Mutation rate | Behaviour | Tag |
|---|---|---|
| Zero | No exploration except via crossover. Once diversity collapses the GA is stuck. | Frozen |
| ~ 1/L (one bit on average) | Schemata mostly intact, occasional novelty. | Classical |
| Moderate (a few percent) | Aggressive exploration; fitness can drop generation to generation. | Exploration |
| ~ 0.5 | Chromosome essentially randomised every step. | Random search |
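A minimal sketch of per-gene bit-flip mutation with the 1/L default (names are assumptions of this sketch):

```python
import random

def mutate(chrom, rate=None):
    """Per-gene mutation on a bit chromosome: each gene flips independently
    if its own uniform draw falls below the rate. The classical default
    rate = 1/L gives one flip per chromosome per generation on average."""
    L = len(chrom)
    rate = (1.0 / L) if rate is None else rate
    return [(1 - g) if random.random() < rate else g for g in chrom]
```

Setting rate near 0.5 turns every call into a near-rerandomisation of the chromosome: the "Random search" row of the table.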
| Encoding | Mutation | Watch out |
|---|---|---|
| Binary | Bit flip. | Locality may be poor (high-bit flip = big jump). |
| Real / value | Add small Gaussian noise (mean 0, small sigma). | Large steps are rare: "sometimes you add 1.5 - the chance is very small." |
| Permutation (8-queens, BPP) | Swap two genes. | Cannot introduce a missing value. If your operator pipeline can leave a value missing, swap will never recover it. |
| Mutation + crossover rates | Recommended elitism | Why |
|---|---|---|
| Both LOW (high exploitation) | Zero elitism | The good guys are already strong enough to survive the breeding cycle on their own. Adding elitism on top doubles their presence and they bully everyone else. |
| Both HIGH (high exploration) | Higher elitism (a few percent) | Aggressive operators may destroy the good guys by accident. Reserve a small number of slots so the best survive the iteration. |
Parents always die. The next generation is composed entirely of children.
Healthy turnover
"In nature, what happens? The parent ... always dies. The kids replace them."
Risk: a great parent that produced weak children is lost forever.
Parents and children compete together for slots. The strongest survives regardless of generation.
Greedy / bully risk
"Parents and children compete. Good or bad? Good - I keep the strong. But the bully always survives."
Stronger than elitism alone: the entire surviving generation is the top-N of the combined pool.
Total fitness evaluations = population size × number of generations. Pick the trade-off given the landscape.
| Setup | Behaviour | Equivalent |
|---|---|---|
| Population = 1, generations = budget | Pure refinement of a single individual. | Hill climbing. |
| Small population, many generations | Strong exploitation. Diversity collapses early; bully risk. | Closer to HC with a few starts. |
| Balanced | Enough diversity in early generations, enough generations to refine. | The default "tune" choice. |
| Large population, few generations | Strong exploration. Selection pressure has no time to act. | Closer to random search. |
| Population = budget, generations = 1 | One generation of random points; pick the best. | Random search. |
Naive one-point crossover on 1 2 3 4 5 6 7 8 and 5 4 1 8 2 7 3 6 produces a child like 1 2 3 4 | 2 7 3 6, with duplicates (2, 3) and missing values (5, 8).
| Strategy | How | Trade-off |
|---|---|---|
| Repair as you go | Take the prefix from parent 1, then walk parent 2 and only insert the missing values. | Doctor's preferred approach. Small overhead per crossover, but search space stays small and feasible. |
| Ignore (allow duplicates) | Let invalid chromosomes live in the population. | Search space explodes. Convergence becomes much slower; only acceptable if the operator can still reach feasible regions. |
| Penalize | Subtract a fitness penalty proportional to the number of duplicates. | The GA learns to avoid invalid regions. Risk: too-large penalty makes infeasibles look identical, killing the gradient back to feasibility. |
| Design out | Drop crossover entirely, use swap-only mutation. | Loses the communication channel and degenerates to evolution strategy. |
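A minimal sketch of the repair-as-you-go strategy (function name and `cut` parameter are assumptions of this sketch):

```python
import random

def permutation_crossover(p1, p2, cut=None):
    """Repair-as-you-go one-point crossover: take the prefix from p1,
    then walk p2 and append only the values not already present.
    The child is always a valid permutation."""
    if cut is None:
        cut = random.randrange(1, len(p1))
    child = p1[:cut]
    taken = set(child)
    child += [g for g in p2 if g not in taken]
    return child

# The class example, cut after position 4:
print(permutation_crossover([1, 2, 3, 4, 5, 6, 7, 8],
                            [5, 4, 1, 8, 2, 7, 3, 6], cut=4))
# -> [1, 2, 3, 4, 5, 8, 7, 6]
```

The prefix preserves a schema from parent 1; the repaired tail keeps parent 2's relative order among the remaining values, so communication survives without ever leaving the feasible space.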
A schema H is a chromosome template with fixed positions and wildcards *. E.g. 1*0**1. o(H): the order, the number of fixed positions. d(H): the defining length, the distance between the first and last fixed positions. Short, low-order schemata (small d(H), small o(H)) survive crossover and mutation better and are amplified faster. The GA does not know which schemata it is amplifying - "trial and error? No. Survival of the fittest." The same idea explains why uniform 50/50 crossover is dangerous: it cuts inside every schema with high probability and breaks them.
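The two schema statistics in a few lines (a sketch; function names are assumptions):

```python
def order(schema: str) -> int:
    """o(H): number of fixed (non-*) positions."""
    return sum(c != "*" for c in schema)

def defining_length(schema: str) -> int:
    """d(H): distance between the first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0]

print(order("1*0**1"), defining_length("1*0**1"))  # -> 3 5
```

For one-point crossover the cut falls inside H with probability d(H)/(L-1), which is why compact schemata survive and sprawling ones get shattered.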
Drop crossover and the algorithm degenerates to (mu, lambda) or (mu + lambda) evolution strategy: parents mutate copies of themselves, no two-parent recombination. The communication channel is gone. The doctor noted that many "GA on permutation" papers in the literature are evolution strategies in disguise because crossover would constantly produce invalid offspring.
| Mode | How | When |
|---|---|---|
| Static | Fix crossover and mutation rates before the run. If results are bad, restart with different rates. | Default. Easy to reason about. |
| Dynamic / adaptive | Monitor CB (current best per generation) and BSF (best-so-far) during the run. Adjust rates mid-run. | If CB stays glued to BSF, exploitation is too strong - increase crossover or mutation. If CB is wildly oscillating, exploration is too strong - decrease rates and / or increase elitism. |
| Pattern | Diagnosis |
|---|---|
| BSF drops in steps; CB shows periodic deviations and re-converges. | Healthy exploration / exploitation cycle. |
| BSF flat from very early; CB glued to BSF. | Bully has taken over - too much elitism, too low mutation, or steady-state replacement with strong selection pressure. |
| BSF improves slowly and smoothly; CB tightly hugs BSF. | Pure exploitation; population converged early, no mechanism to escape. |
| CB oscillates wildly; BSF barely improves. | Too much exploration - mutation rate too high, crossover rate near 0.5 with uniform crossover, or elitism too low. |
Chromosome: length-8 vector. Locus = column, value = row of the queen in that column. Implicit constraint: one queen per column.
Fitness: number of attacking queen pairs (minimize). Augment if the raw count is too flat for selection pressure.
Crossover: one-point with repair, or order-preserving (OX). Uniform crossover is dangerous because it shatters the permutation.
Mutation: swap two columns. Preserves the permutation constraint.
Why no value-flip mutation: creates duplicates - the same row has two queens.
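The fitness from the items above, as a minimal sketch (`attacking_pairs` is a hypothetical name):

```python
from itertools import combinations

def attacking_pairs(perm):
    """8-queens fitness (minimize). perm[col] = row of the queen in that column.
    One queen per column by construction; a permutation also rules out shared
    rows, so only diagonal attacks remain to count."""
    return sum(abs(r1 - r2) == abs(c1 - c2)
               for (c1, r1), (c2, r2) in combinations(enumerate(perm), 2))

print(attacking_pairs([0, 4, 7, 5, 2, 6, 1, 3]))  # a known solution -> 0
print(attacking_pairs(list(range(8))))            # all on one diagonal -> 28
```

The identity permutation scoring the maximum 28 (every one of the 8-choose-2 pairs attacks) shows why the raw count gives usable selection pressure here, unlike BPP's flat bin count.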
Chromosome: a fixed-length sequence of moves U / D / L / R - e.g. length 31 (the longest known optimal solution length). Genes = the move at each step.
Fitness: Manhattan distance of the final state from the goal after applying the move sequence to the initial state.
Crossover: single-point or two-point on the move sequence is fine - there is no permutation constraint.
Mutation: replace one move with another from {U, D, L, R}. Avoid producing immediate U-D or L-R that "dance" without progress.
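A minimal sketch of the fitness evaluation. Assumptions of this sketch: a move slides the blank (not a tile), out-of-board moves are silently skipped, and the goal has the blank in the last cell:

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 = blank; assumed goal layout

def apply_moves(state, moves):
    """Apply a U/D/L/R move string to a 3x3 board stored row-major.
    Here a move slides the blank; moves off the board are skipped."""
    state = list(state)
    for m in moves:
        i = state.index(0)
        r, c = divmod(i, 3)
        dr, dc = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}[m]
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            state[i], state[j] = state[j], state[i]
    return tuple(state)

def manhattan(state):
    """Fitness (minimize): summed Manhattan distance of each tile from its goal cell."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        gi = GOAL.index(tile)
        total += abs(i // 3 - gi // 3) + abs(i % 3 - gi % 3)
    return total
```

Note how an immediate U-D pair is visible in the model: `apply_moves(s, "UD")` returns `s` unchanged, the "dance" without progress.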
Communication. HC restarts are independent runs; GA individuals share information through selection and crossover.
Roulette assigns probability proportional to fitness. To minimize, you must transform the fitness (1/f or C-f), but the transformation is subjective: the offset C changes the bully effect dramatically.
Without replacement of the opponent, the weakest is always paired against someone and always loses - it dies. With replacement, the weakest can be paired against itself with low probability, so it has a tiny but nonzero chance of surviving. The doctor preferred with-replacement: "the weak must have a chance."
Uniform crossover at 50/50 is essentially random. Schemata are destroyed every generation. Aggressive exploration with no preserved structure.
Mutation and crossover destroy schemata. If they destroy a lot, you need elitism to reserve slots for the strong. If they destroy little, the strong already survive on their own and elitism just turns them into bullies.
Generational: parents always die - healthy turnover, risk of losing strong parents whose children are weak. Steady-state: parents and children compete together - strong survives forever, premature convergence is the risk.
Pop = 1 is hill climbing. Pop = budget with one generation is random search. Choose the population size based on whether the landscape needs more exploration (large pop) or more exploitation (more generations).
Without crossover, each parent only mutates copies of itself - no information passes between individuals. That is evolution strategy, not genetic algorithm. Many "GA on permutation problems" papers are evolution strategies in disguise.
If the chromosome must remain a valid permutation (8-queens, BPP) and your crossover never produces missing values, swap is correct. If your crossover pipeline can leave a value missing, swap cannot recover it - you must either repair or use value-flip mutation.
Most local moves do not change the bin count, so the landscape is full of plateaus. Selection cannot tell which solution is closer to improvement. Add a tie-breaker - empty-space variance, max waste, fill ratio - and the GA can navigate the plateau.
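One way to add such a tie-breaker, as a sketch: keep the bin count as the integer part and subtract a fractional fill-ratio term (a Falkenauer-style choice; the function name and exact formula are assumptions here):

```python
def bpp_fitness(bins, capacity):
    """Augmented bin-packing fitness (minimize). `bins` is a list of lists of
    item sizes. The raw bin count gives a plateau; subtracting the mean
    squared fill ratio ranks equal-bin solutions so that packings with some
    very full bins (closer to freeing a bin) score better."""
    fill = [sum(b) / capacity for b in bins]
    return len(bins) - sum(f * f for f in fill) / len(bins)

# Two 3-bin packings of the same items {4,4,4,3,3,2}: same raw count,
# but the skewed one is now preferred.
skewed = bpp_fitness([[4, 4], [4, 3], [3, 2]], capacity=10)
even = bpp_fitness([[4, 3], [4, 3], [4, 2]], capacity=10)
assert skewed < even
```

The fractional term never exceeds 1, so it can only break ties between equal bin counts, never override them.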
If the average fitness of the instances of a schema is above the population mean, the number of instances of that schema grows over generations. The GA amplifies fit schemata even though it does not know what they are.
Static is the default - fix the rates and run. Dynamic monitors CB and BSF during the run and increases mutation when CB hugs BSF, decreases mutation when CB is wildly oscillating. The doctor framed dynamic rates as a reinforcement-learning policy.
Genetic Algorithm is a population metaheuristic whose distinguishing feature is communication between individuals. Each generation: select parents (the weak must have a chance, so prefer tournament with replacement or rank-based selection), recombine via crossover (one-point or two-point preserves schemata; uniform 50/50 destroys them), mutate per gene (rate around 1/L is the safe default), and replace either generationally (parents die) or steady-state (parents and children compete, with bully risk). Elitism is small (1 to 3 percent) and must be coupled with the operator destruction rate. Representation choice determines locality: gray code or permutation with swap mutation gives strong locality, plain binary with bit-flip does not. Invalid offspring should be repaired rather than ignored. Under a fixed evaluation budget, large population favours exploration and many generations favours exploitation. For bin packing, augment the raw bin count with an empty-space term so the landscape is no longer flat. For 8-queens use a permutation chromosome with swap mutation and a repaired one-point crossover; for the 8-puzzle use a fixed-length move sequence with simple crossover and replacement mutation.