Transcript-first notes for GA: communication, representation, selection, crossover, mutation, elitism, replacement, population vs generations, schema, self-breeding, BPP and 8-queens / 8-puzzle, plot behavior, doctor-style answers.
The doctor's distinguishing test: a method is a population metaheuristic only if its individuals communicate. Hill Climbing with random restart maintains many solutions in parallel, but the runs are independent — no information passes between them. GA is the smallest interesting model that adds communication.
| Stage | What happens | Doctor-style framing |
|---|---|---|
| Initial population | N individuals sampled in some way. | Must be scattered. Premature convergence starts here. |
| Fitness assessment | Score each individual. | The "external" measure that drives selection. |
| Selection | Pick parents based on fitness. | Survival of the fittest, but the weak must have a chance. |
| Crossover | Mix two parents to produce children. | The communication channel. |
| Mutation | Per-gene small random change. | The exploration injection. |
| Replacement | Decide who survives into the next generation. | Generational (parents die) vs steady-state (parents compete). |
| Stop criterion | Time, ideal solution, no improvement. | Same family as HC/SA stop rules. |
Each gene is a bit. Crossover and mutation are mechanically simple.
Hidden danger
Bit position carries weight: flipping the high bit of 1110 drops 14 to 6, while flipping the low bit moves 14 to 15. Locality is broken — a single mutation can be either tiny or huge. "You don't know if you are exploring or exploiting."
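A two-line check of the claim, as a minimal sketch (`flip_bit` is a hypothetical helper name):

```python
def flip_bit(x: int, i: int) -> int:
    """Flip bit i (0 = least significant) of x."""
    return x ^ (1 << i)

# 14 = 0b1110: the effect of a one-bit mutation depends entirely on position.
print(flip_bit(14, 3))  # high bit: 14 -> 6, a jump of 8
print(flip_bit(14, 0))  # low bit:  14 -> 15, a step of 1
```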
Binary encoding where neighbouring values differ by exactly one bit.
Strong locality
"Gray code by design — any bit you change, value changes by one." Mutation behaves predictably. Preferred when the landscape rewards careful local refinement.
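The standard reflected binary Gray code is `b ^ (b >> 1)`; a quick sketch verifying the one-bit-per-step property:

```python
def gray(b: int) -> int:
    """Standard reflected binary Gray code of integer b."""
    return b ^ (b >> 1)

# Neighbouring values always differ in exactly one bit position.
for v in range(15):
    diff = gray(v) ^ gray(v + 1)
    assert bin(diff).count("1") == 1
```

Under this encoding a single bit-flip mutation moves between codes of nearby values far more often than plain binary does, which is the locality the doctor wants.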
Each gene is a number, character, or symbol.
Mutation is typically a small Gaussian perturbation; crossover keeps the slot semantics. Locality is whatever you make it.
Each gene is an index, with the constraint that no two genes share a value.
Used for 8-queens (gene = row of queen in column), bin packing (gene = item order), TSP. Naive crossover and value-flip mutation break the no-duplicate constraint — you must repair, design out, or accept the larger search space.
Each gene is a token; the chromosome is a sequence (8-puzzle: U/D/L/R) or a tree (Koza-style symbolic regression).
Crossover swaps subtrees or substrings. Tree representations grow without bound unless you cap depth.
The doctor ranks initialization strategies on a quality / diversity / cost spectrum. Initial diversity is the first defence against premature convergence.
| Strategy | Diversity | Cost | Risk |
|---|---|---|---|
| Pseudo-random (uniform) | Moderate | Cheap | Default safe choice. |
| Quasi-random (grid + jitter) | Good | Cheap | Fills space evenly without clustering. |
| Sequential diversification (only insert if at least D away) | Best | Expensive | Free of near-duplicates, but slow for large populations. |
| Latin hypercube / parallel diversification | Good | Moderate | Used in design of experiments. |
| Heuristic seeding (FFD / BFD / hand-crafted) | Low | Moderate | Fast convergence but risk of bully. |
Probability proportional to fitness. Sample a cumulative slice with a random number in [0, sum].
Maximization: works directly.
Minimization: needs a hack. Doctor's class debated 1 / f and (C - f). Both are subjective: the offset C changes the bully effect dramatically. With C = 1000 and fitnesses around 17, every score flattens to ~983 - selection is almost uniform.
Bully risk
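The offset sensitivity is easy to see numerically. A minimal sketch of the (C - f) transform (`roulette_probs` is a hypothetical helper name; the fitness values are illustrative):

```python
def roulette_probs(fitnesses, C):
    """Selection probabilities for minimization via the (C - f) transform."""
    scores = [C - f for f in fitnesses]
    total = sum(scores)
    return [s / total for s in scores]

fits = [15, 17, 19]                # minimization: 15 is best
print(roulette_probs(fits, 20))    # tight offset: strong pressure toward 15
print(roulette_probs(fits, 1000))  # huge offset: probabilities flatten to ~1/3 each
```

With C = 20 the best individual gets over half the wheel; with C = 1000 all three are nearly uniform. The offset alone decides whether the best individual is a bully or a nobody.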
Pick k=2 (or more) at random and the fittest wins.
Without replacement of the opponent: the weakest is always paired against someone stronger and always loses. "Done. The weakest is always out."
With replacement: the weakest can be paired against itself — its survival probability is tiny but non-zero. The doctor preferred this: "the weak has to have low chance of being selected — but a chance."
Works for both maximization and minimization with no hack.
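A minimal sketch of k-tournament with replacement (function and parameter names are assumptions of this sketch):

```python
import random

def tournament(pop, fitness, k=2, maximize=True):
    """k-tournament with replacement: draw k contenders (repeats allowed)
    and return the fittest. The weakest individual can be drawn against
    itself, so it keeps a tiny but nonzero chance of winning.
    Minimization needs no fitness transform: just flip `maximize`."""
    contenders = [random.choice(pop) for _ in range(k)]  # with replacement
    best = max if maximize else min
    return best(contenders, key=fitness)
```

No offset C, no 1/f hack: the comparison is ordinal, which is why it works for both directions out of the box.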
Sort by fitness, then sample by rank distribution rather than raw fitness.
Smooths selection pressure. Immune to fitness scaling and to the maximization/minimization issue.
Stochastic Universal Sampling, Boltzmann, sigma scaling, steady-state. The doctor mentioned them as alternatives that smooth selection pressure further.
Take one parent as-is. No recombination at all. Falls back to evolution-strategy / self-breeding.
No communication
One cut. Preserves large schemata from each parent. Mild structural disturbance. Cannot mix non-contiguous traits ("you cannot take eye colour from one parent and hair colour from another if they are not next to each other").
Conservative
Two cuts. Lets you swap a middle segment. More flexible than one-point.
Balanced
Each gene independently inherited from one parent or the other. Maximum flexibility but maximum schema destruction.
Aggressive exploration
| Rate | Per-gene meaning | Behaviour | Tag |
|---|---|---|---|
| Very small | Almost nothing inherited from the other parent. | Aggressive exploitation. Schemata stay intact, almost no mixing. | Exploitation |
| Moderate (biased) | e.g. 80/20 toward the main parent. | Healthy mixing. Doctor's analogy: "the kid resembles the father" with some traits from the mother. | Balanced |
| 50/50 (uniform) | Coin flip per gene. | "Random. Random always means exploration." Schemata are destroyed every generation. | Aggressive exploration |
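The whole table collapses into one per-gene coin with a bias knob. A minimal sketch (name and signature are assumptions):

```python
import random

def uniform_crossover(p1, p2, bias=0.8):
    """Per-gene inheritance: with probability `bias` take the gene from p1.
    bias near 1.0 -> exploitation (child resembles the main parent);
    bias = 0.5  -> coin flip per gene, aggressive exploration."""
    return [a if random.random() < bias else b for a, b in zip(p1, p2)]
```

At bias = 0.8 the kid resembles the father with some traits from the mother; at bias = 0.5 every schema straddling both parents is cut with high probability.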
Mutation is per-gene: for each gene, draw a uniform random number, mutate if it is below the mutation rate. The classical "good default" is rate = 1/L — on average one mutation per chromosome per generation.
| Mutation rate | Behaviour | Tag |
|---|---|---|
| Zero | No exploration except via crossover. Once diversity collapses the GA is stuck. | Frozen |
| ~ 1/L (one bit on average) | Schemata mostly intact, occasional novelty. | Classical |
| Moderate (a few percent) | Aggressive exploration; fitness can drop generation to generation. | Exploration |
| ~ 0.5 | Chromosome essentially randomised every step. | Random search |
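A minimal sketch of per-gene bit-flip mutation with the 1/L default (names are assumptions of this sketch):

```python
import random

def mutate(chrom, rate=None):
    """Per-gene mutation on a bit chromosome: each gene flips independently
    if its own uniform draw falls below the rate. The classical default
    rate = 1/L gives one flip per chromosome per generation on average."""
    L = len(chrom)
    rate = (1.0 / L) if rate is None else rate
    return [(1 - g) if random.random() < rate else g for g in chrom]
```

Setting rate near 0.5 turns every call into a near-rerandomisation of the chromosome: the "Random search" row of the table.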
| Encoding | Mutation | Watch out |
|---|---|---|
| Binary | Bit flip. | Locality may be poor (high-bit flip = big jump). |
| Real / value | Add small Gaussian noise (mean 0, small sigma). | Large steps are rare: "sometimes you add 1.5 - the chance is very small." |
| Permutation (8-queens, BPP) | Swap two genes. | Cannot introduce a missing value. If your operator pipeline can leave a value missing, swap will never recover it. |
| Mutation + crossover rates | Recommended elitism | Why |
|---|---|---|
| Both LOW (high exploitation) | Zero elitism | The good guys are already strong enough to survive the breeding cycle on their own. Adding elitism on top doubles their presence and they bully everyone else. |
| Both HIGH (high exploration) | Higher elitism (a few percent) | Aggressive operators may destroy the good guys by accident. Reserve a small number of slots so the best survive the iteration. |
Parents always die. The next generation is composed entirely of children.
Healthy turnover
"In nature, what happens? The parent ... always dies. The kids replace them."
Risk: a great parent that produced weak children is lost forever.
Parents and children compete together for slots. The strongest survives regardless of generation.
Greedy / bully risk
"Parents and children compete. Good or bad? Good - I keep the strong. But the bully always survives."
Stronger than elitism alone: the entire surviving generation is the top-N of the combined pool.
Total fitness evaluations = population size × number of generations. Pick the trade-off given the landscape.
| Setup | Behaviour | Equivalent |
|---|---|---|
| Population = 1, generations = budget | Pure refinement of a single individual. | Hill climbing. |
| Small population, many generations | Strong exploitation. Diversity collapses early; bully risk. | Closer to HC with a few starts. |
| Balanced | Enough diversity in early generations, enough generations to refine. | The default "tune" choice. |
| Large population, few generations | Strong exploration. Selection pressure has no time to act. | Closer to random search. |
| Population = budget, generations = 1 | One generation of random points; pick the best. | Random search. |
Naive one-point crossover on 1 2 3 4 5 6 7 8 and 5 4 1 8 2 7 3 6 produces a child like 1 2 3 4 | 2 7 3 6, with duplicates (2, 3) and missing values (5, 8).
| Strategy | How | Trade-off |
|---|---|---|
| Repair as you go | Take the prefix from parent 1, then walk parent 2 and only insert the missing values. | Doctor's preferred approach. Small overhead per crossover, but search space stays small and feasible. |
| Ignore (allow duplicates) | Let invalid chromosomes live in the population. | Search space explodes. Convergence becomes much slower; only acceptable if the operator can still reach feasible regions. |
| Penalize | Subtract a fitness penalty proportional to the number of duplicates. | The GA learns to avoid invalid regions. Risk: too-large penalty makes infeasibles look identical, killing the gradient back to feasibility. |
| Design out | Drop crossover entirely, use swap-only mutation. | Loses the communication channel and degenerates to evolution strategy. |
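A minimal sketch of the repair-as-you-go strategy (function name and `cut` parameter are assumptions of this sketch):

```python
import random

def permutation_crossover(p1, p2, cut=None):
    """Repair-as-you-go one-point crossover: take the prefix from p1,
    then walk p2 and append only the values not already present.
    The child is always a valid permutation."""
    if cut is None:
        cut = random.randrange(1, len(p1))
    child = p1[:cut]
    taken = set(child)
    child += [g for g in p2 if g not in taken]
    return child

# The class example, cut after position 4:
print(permutation_crossover([1, 2, 3, 4, 5, 6, 7, 8],
                            [5, 4, 1, 8, 2, 7, 3, 6], cut=4))
# -> [1, 2, 3, 4, 5, 8, 7, 6]
```

The prefix preserves a schema from parent 1; the repaired tail keeps parent 2's relative order among the remaining values, so communication survives without ever leaving the feasible space.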
A schema H is a chromosome template with fixed positions and wildcards *. E.g. 1*0**1. o(H): the order, the number of fixed positions. d(H): the defining length, the distance between the first and last fixed positions. Short, low-order schemata (small d(H), small o(H)) survive crossover and mutation better and are amplified faster. The GA does not know which schemata it is amplifying - "trial and error? No. Survival of the fittest." The same idea explains why uniform 50/50 crossover is dangerous: it cuts inside every schema with high probability and breaks them.
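The two schema statistics in a few lines (a sketch; function names are assumptions):

```python
def order(schema: str) -> int:
    """o(H): number of fixed (non-*) positions."""
    return sum(c != "*" for c in schema)

def defining_length(schema: str) -> int:
    """d(H): distance between the first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0]

print(order("1*0**1"), defining_length("1*0**1"))  # -> 3 5
```

For one-point crossover the cut falls inside H with probability d(H)/(L-1), which is why compact schemata survive and sprawling ones get shattered.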
Drop crossover and the algorithm degenerates to (mu, lambda) or (mu + lambda) evolution strategy: parents mutate copies of themselves, no two-parent recombination. The communication channel is gone. The doctor noted that many "GA on permutation" papers in the literature are evolution strategies in disguise because crossover would constantly produce invalid offspring.
| Mode | How | When |
|---|---|---|
| Static | Fix crossover and mutation rates before the run. If results are bad, restart with different rates. | Default. Easy to reason about. |
| Dynamic / adaptive | Monitor CB (current best per generation) and BSF (best-so-far) during the run. Adjust rates mid-run. | If CB stays glued to BSF, exploitation is too strong - increase crossover or mutation. If CB is wildly oscillating, exploration is too strong - decrease rates and / or increase elitism. |
| Pattern | Diagnosis |
|---|---|
| BSF drops in steps; CB shows periodic deviations and re-converges. | Healthy exploration / exploitation cycle. |
| BSF flat from very early; CB glued to BSF. | Bully has taken over - too much elitism, too low mutation, or steady-state replacement with strong selection pressure. |
| BSF improves slowly and smoothly; CB tightly hugs BSF. | Pure exploitation; population converged early, no mechanism to escape. |
| CB oscillates wildly; BSF barely improves. | Too much exploration - mutation rate too high, crossover rate near 0.5 with uniform crossover, or elitism too low. |
Chromosome: length-8 vector. Locus = column, value = row of the queen in that column. Implicit constraint: one queen per column.
Fitness: number of attacking queen pairs (minimize). Augment if the raw count is too flat for selection pressure.
Crossover: one-point with repair, or order-preserving (OX). Uniform crossover is dangerous because it shatters the permutation.
Mutation: swap two columns. Preserves the permutation constraint.
Why no value-flip mutation: creates duplicates - the same row has two queens.
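The fitness from the items above, as a minimal sketch (`attacking_pairs` is a hypothetical name):

```python
from itertools import combinations

def attacking_pairs(perm):
    """8-queens fitness (minimize). perm[col] = row of the queen in that column.
    One queen per column by construction; a permutation also rules out shared
    rows, so only diagonal attacks remain to count."""
    return sum(abs(r1 - r2) == abs(c1 - c2)
               for (c1, r1), (c2, r2) in combinations(enumerate(perm), 2))

print(attacking_pairs([0, 4, 7, 5, 2, 6, 1, 3]))  # a known solution -> 0
print(attacking_pairs(list(range(8))))            # all on one diagonal -> 28
```

The identity permutation scoring the maximum 28 (every one of the 8-choose-2 pairs attacks) shows why the raw count gives usable selection pressure here, unlike BPP's flat bin count.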
Chromosome: a fixed-length sequence of moves U / D / L / R - e.g. length 31 (the longest known optimal solution length). Genes = the move at each step.
Fitness: Manhattan distance of the final state from the goal after applying the move sequence to the initial state.
Crossover: single-point or two-point on the move sequence is fine - there is no permutation constraint.
Mutation: replace one move with another from {U, D, L, R}. Avoid producing immediate U-D or L-R that "dance" without progress.
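A minimal sketch of the fitness evaluation. Assumptions of this sketch: a move slides the blank (not a tile), out-of-board moves are silently skipped, and the goal has the blank in the last cell:

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 = blank; assumed goal layout

def apply_moves(state, moves):
    """Apply a U/D/L/R move string to a 3x3 board stored row-major.
    Here a move slides the blank; moves off the board are skipped."""
    state = list(state)
    for m in moves:
        i = state.index(0)
        r, c = divmod(i, 3)
        dr, dc = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}[m]
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            state[i], state[j] = state[j], state[i]
    return tuple(state)

def manhattan(state):
    """Fitness (minimize): summed Manhattan distance of each tile from its goal cell."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        gi = GOAL.index(tile)
        total += abs(i // 3 - gi // 3) + abs(i % 3 - gi % 3)
    return total
```

Note how an immediate U-D pair is visible in the model: `apply_moves(s, "UD")` returns `s` unchanged, the "dance" without progress.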
Communication. HC restarts are independent runs; GA individuals share information through selection and crossover.
Roulette assigns probability proportional to fitness. To minimize, you must transform the fitness (1/f or C-f), but the transformation is subjective: the offset C changes the bully effect dramatically.
Without replacement of the opponent, the weakest is always paired against someone and always loses - it dies. With replacement, the weakest can be paired against itself with low probability, so it has a tiny but nonzero chance of surviving. The doctor preferred with-replacement: "the weak must have a chance."
Uniform crossover at 50/50 is essentially random. Schemata are destroyed every generation. Aggressive exploration with no preserved structure.
Mutation and crossover destroy schemata. If they destroy a lot, you need elitism to reserve slots for the strong. If they destroy little, the strong already survive on their own and elitism just turns them into bullies.
Generational: parents always die - healthy turnover, risk of losing strong parents whose children are weak. Steady-state: parents and children compete together - strong survives forever, premature convergence is the risk.
Pop = 1 is hill climbing. Pop = budget with one generation is random search. Choose the population size based on whether the landscape needs more exploration (large pop) or more exploitation (more generations).
Without crossover, each parent only mutates copies of itself - no information passes between individuals. That is evolution strategy, not genetic algorithm. Many "GA on permutation problems" papers are evolution strategies in disguise.
If the chromosome must remain a valid permutation (8-queens, BPP) and your crossover never produces missing values, swap is correct. If your crossover pipeline can leave a value missing, swap cannot recover it - you must either repair or use value-flip mutation.
Most local moves do not change the bin count, so the landscape is full of plateaus. Selection cannot tell which solution is closer to improvement. Add a tie-breaker - empty-space variance, max waste, fill ratio - and the GA can navigate the plateau.
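One way to add such a tie-breaker, as a sketch: keep the bin count as the integer part and subtract a fractional fill-ratio term (a Falkenauer-style choice; the function name and exact formula are assumptions here):

```python
def bpp_fitness(bins, capacity):
    """Augmented bin-packing fitness (minimize). `bins` is a list of lists of
    item sizes. The raw bin count gives a plateau; subtracting the mean
    squared fill ratio ranks equal-bin solutions so that packings with some
    very full bins (closer to freeing a bin) score better."""
    fill = [sum(b) / capacity for b in bins]
    return len(bins) - sum(f * f for f in fill) / len(bins)

# Two 3-bin packings of the same items {4,4,4,3,3,2}: same raw count,
# but the skewed one is now preferred.
skewed = bpp_fitness([[4, 4], [4, 3], [3, 2]], capacity=10)
even = bpp_fitness([[4, 3], [4, 3], [4, 2]], capacity=10)
assert skewed < even
```

The fractional term never exceeds 1, so it can only break ties between equal bin counts, never override them.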
If the average fitness of the instances of a schema is above the population mean, the number of instances of that schema grows over generations. The GA amplifies fit schemata even though it does not know what they are.
Static is the default - fix the rates and run. Dynamic monitors CB and BSF during the run and increases mutation when CB hugs BSF, decreases mutation when CB is wildly oscillating. The doctor framed dynamic rates as a reinforcement-learning policy.
Genetic Algorithm is a population metaheuristic whose distinguishing feature is communication between individuals. Each generation: select parents (the weak must have a chance, so prefer tournament with replacement or rank-based selection), recombine via crossover (one-point or two-point preserves schemata; uniform 50/50 destroys them), mutate per gene (rate around 1/L is the safe default), and replace either generationally (parents die) or steady-state (parents and children compete, with bully risk). Elitism is small (1 to 3 percent) and must be coupled with the operator destruction rate. Representation choice determines locality: gray code or permutation with swap mutation gives strong locality, plain binary with bit-flip does not. Invalid offspring should be repaired rather than ignored. Under a fixed evaluation budget, large population favours exploration and many generations favours exploitation. For bin packing, augment the raw bin count with an empty-space term so the landscape is no longer flat. For 8-queens use a permutation chromosome with swap mutation and a repaired one-point crossover; for the 8-puzzle use a fixed-length move sequence with simple crossover and replacement mutation.