Sample Final Exam - 30 pts

ICS503, Spring 2026 - Final Question Bank with Solutions

Same questions as v3, with concise answer outlines below each one. Use v3 (the blank version) for timed practice; use this version for review.

Each solution is a bullet-point sketch of the key points the doctor would expect to see - not a full prose answer. On the actual exam, expand each bullet into 1-2 sentences with the exploration / exploitation framing and tie back to the algorithm's mechanism, the slide, or the homework where applicable.

Part A. Experiment umbrella questions Four cross-algorithm umbrella questions in midterm Q1 style: stem + artifact table + 4 sub-parts. Each is worth 10 points and references multiple algorithms via the artifact table - no full question is dedicated to a single algorithm. Sub-parts test parameter behavior, BPP adaptation, hybridization, and the explore/exploit aim.

A.1 Parameter settings analysis

1. Consider the following parameter settings, each applied to one of the algorithms you used in your mini projects on the bin packing problem.

#	Algorithm	Parameter(s) tweaked	Setting
S1	GA	crossover rate, mutation rate, elitism	very low, very low, very high
S2	DE	scale on the deviation vector	very large
S3	PSO	inertia, global-best pull	very low, very large
S4	ACS	evaporation rate, pheromone exponent	near zero, very high
S5	TS	tabu tenure	very long

For each setting, predict the algorithm's behavior in terms of exploration and exploitation. Indicate whether similar settings were used in your projects, and if so explain how you addressed or mitigated the resulting behavior.
Which of these settings would most likely cause premature convergence? Which would behave closest to random search? Justify.
Pick TWO of these algorithms and explain how you adapted the algorithm to the bin packing problem (representation, operators, fitness).
Which combination of parameter settings would you aim for, and why? Justify in terms of the exploration / exploitation balance.

Solution outline (Dr's voice)

S1 (GA - low crossover, low mutation, very high elitism): the elitism dominates - the best individual is duplicated everywhere, the population collapses on the bully. Behavior is pure exploitation, premature convergence almost guaranteed. In our projects this is addressed by lowering elitism to 1-3% so weaker individuals can survive. S2 (DE - very large scale): mutation jumps overshoot good regions, the trial vector lands far from the population cloud. Behavior is aggressive exploration, almost random sampling. Addressed by reducing the scale to a moderate value. S3 (PSO - low inertia + very large global-best pull): no momentum to coast past optima, swarm stampedes to the leader. Behavior is bully-style premature convergence. Addressed by increasing inertia and using local informant neighborhoods. S4 (ACS - near zero evaporation + very high pheromone exponent): pheromone never forgets, ants chase pheromone exclusively ignoring the heuristic. Behavior is lock-in on the first lucky path - premature convergence. Addressed by raising evaporation and balancing alpha vs beta. S5 (TS - very long tenure): blocks moves for too long, search corners itself in unexplored regions. Behavior is over-diversification, no exploitation possible. Addressed by shorter tenure with adaptive tuning.
Premature convergence: S1, S3, S4 - all three are bully-effect failures (one strong individual / leader / trail dominates the search). Closest to random search: S2 and S5 - S2 because the huge mutation scale makes the trial essentially random; S5 because the very long tenure prevents the search from exploiting any region (everything keeps getting forbidden). Justify in terms of mechanism: premature convergence comes from over-exploitation (the bully); random search comes from breaking the algorithm's ability to commit to any direction.
Example pair (DE and ACS). DE: representation = real-valued vector decoded by sorting into a permutation, then placed by First Fit Decreasing into bins. Operators = standard DE arithmetic on the vector (mutation = base + scale * difference, crossover = per-coordinate from mutant or target). Fitness = number of bins (optionally augmented with empty-space variance for sensitivity). ACS: representation = constructive - the ant places items one by one. Operators = probabilistic placement based on pheromone (alpha) and a heuristic (beta, e.g. inverse of remaining capacity). Fitness = number of bins. Pheromone is on item-bin co-locations; deposited at the end of the tour, evaporated at every iteration.
Aim for moderate, balanced settings on every dial. GA: moderate crossover + small mutation + small elitism (1-3%). DE: scale ~0.5 with random base. PSO: moderate inertia, balanced cognitive vs social pulls, small randomly chosen informant neighborhoods. ACS: moderate evaporation, balanced pheromone vs heuristic exponents. TS: moderate tenure with reactive adjustment. Balanced settings give the algorithm both exploration (population / swarm / colony stays diverse) and exploitation (best solutions get refined). The doctor's framing: extremes always collapse the algorithm into a degenerate form; moderate settings preserve the full algorithm character.

A.2 Adaptation scenarios

2. Consider the following adaptations of canonical metaheuristics to the bin packing problem, each from a different mini project.

#	Algorithm	Representation	Operator design
A1	DE	real-valued vector, decoded by sorting + First Fit Decreasing	standard DE arithmetic on the vector
A2	PSO	real-valued vector, decoded by sorting + First Fit	standard PSO velocity update on the vector
A3	GA	permutation of items, decoded by First Fit Decreasing	order-preserving crossover + swap mutation
A4	ACS	constructive: ant places items into bins one by one	pheromone on item-bin co-locations; deposit at end of tour
A5	TS	explicit packing (item -> bin assignment); tabu list stores recent swap moves	neighborhood = swap two items between bins

For each adaptation, indicate whether it preserves the algorithm's character (i.e., whether the spirit of the algorithm is intact). Justify.
Which adaptation introduces the most 'translation work' (the most distance from the canonical algorithm)? Which is the most natural fit for BPP? Justify.
Pick ONE adaptation and explain how a single design choice in it (e.g. the decoding heuristic, the operator, the fitness term) affects the exploration / exploitation balance.
Suggest ONE alternative adaptation for any algorithm in the table that would make it more suitable for BPP. Justify.

Solution outline (Dr's voice)

A1 (DE with random keys): preserves DE character. The vector arithmetic still works exactly as in continuous DE; only the genotype-to-phenotype map (sorting + FFD) is BPP-specific. The algorithm is still DE in spirit. A2 (PSO with random keys): same logic - velocity update arithmetic is intact, only the decoding is added. Preserves character. A3 (GA with permutation + OX + swap): preserves GA character. Selection + crossover + mutation are still the three GA pillars; the operators are tailored to the permutation constraint but the breeding cycle is unchanged. A4 (ACS constructive): preserves ACS character very strongly - constructive ants are exactly how ACS is defined; BPP just specifies what 'a step' means (placing an item in a bin). A5 (TS with swap moves): preserves TS character - single solution + memory + neighborhood + best-non-tabu move. Classical TS.
Most translation work: A1 and A2 (DE / PSO with random keys). Both add an entire genotype-to-phenotype mapping that is not part of the canonical algorithm; the vector being optimized has only an indirect relationship with the actual packing. Most natural fit: A4 (ACS constructive). ACS is already a discrete constructive method, and BPP is a constructive problem (place items one by one). The match requires almost no adaptation work - it just defines what the ant constructs. The doctor's framing: representation choice is the main translation cost; ACS happens to need almost none.
Pick A1 (DE with random keys). The decoding heuristic (sorting + FFD) controls the genotype-to-phenotype mapping locality. A small change in the vector might or might not change the decoded packing - that is a many-to-one mapping where many vectors decode to the same packing. This indirectness reduces effective exploration: DE is really exploring the vector space, not the packing space. Switching the decoder from FFD to plain First Fit changes which vectors decode to which packings, shifting exploration vs exploitation in ways the DE arithmetic alone cannot control.
Alternative for DE: switch to a discrete encoding (a permutation of items) and adapt the difference operator to operate on permutations directly - e.g., compute the 'edit distance' between two permutations (the minimum number of swaps to transform one into the other) and use a fraction of those swaps as the mutation step. This gives DE direct access to the BPP structure, eliminates the random-key indirectness, but is harder to implement and requires careful crossover redesign. The trade-off: more authentic DE-on-permutations vs more code complexity.

A.3 Hybrid pipelines for BPP

3. Consider the following sequential hybrid pipelines, each combining two of the algorithms used in your mini projects on the bin packing problem.

#	First	Second	Hand-off rule
H1	GA	HC	when GA's best-so-far plateaus for k generations
H2	ACS	TS	when ACS's pheromone distribution stabilises
H3	PSO	SA	when the swarm collapses (global-best stable)
H4	HC	DE	when HC reaches a local optimum
H5	TS	GA	when TS's diversification cycle completes

For each pipeline, indicate whether the order is sensible (does the first explore, the second exploit?). Justify.
Which pipeline is most likely to outperform either algorithm alone on BPP? Which is least likely to add value over the standalone algorithm? Justify.
Pick ONE pipeline and explain how the hand-off should be implemented in detail (what exactly does the first algorithm pass to the second?).
Suggest a hybrid pipeline NOT in the table that would be appropriate for the bin packing problem. Justify.

Solution outline (Dr's voice)

H1 (GA -> HC): sensible. GA explores broad regions via population + crossover; HC refines locally once a good basin is found. Order matches the explore-then-exploit principle. H2 (ACS -> TS): sensible. ACS converges on good item co-locations via pheromone learning; TS uses memory to escape if ACS gets trapped in the pheromone-locked basin. H3 (PSO -> SA): sensible. PSO explores broadly with the swarm; SA refines the best particle's position with controlled cooling. H4 (HC -> DE): NOT sensible. HC quickly produces one local optimum; DE then needs a full population to operate, and a single starting point gives DE almost no diversity. The natural order is reversed (DE -> HC). H5 (TS -> GA): partially sensible. TS finds good solutions through escape; GA can recombine them - but only if seeded with TS's discoveries. If GA starts from a fresh random population, TS's work is wasted.
Most likely to outperform alone: H1 (GA -> HC). The most common and proven hybrid - GA's population finds the right basin, HC does the local refinement that GA's noisy operators cannot. Least likely to add value: H4 (HC -> DE). HC's output is one local optimum; DE needs population diversity to function, so HC's contribution is minimal. The DE-then-HC reverse direction would be much better. The doctor's framing: explore first, exploit second; H1 follows the rule, H4 inverts it.
For H1 (GA -> HC): the hand-off happens when GA's best-so-far has not improved for k consecutive generations (typically k = 20-50). At that point, GA passes its best individual (the chromosome with the highest fitness in the current population) to HC. HC starts from that solution and runs its tweak operator until no improvement is observed for m iterations. The idea: GA does the global exploration (picks the basin); HC does the local descent (climbs the basin). The hand-off discards the rest of the GA population - only the best individual carries over.
Alternative: SA -> GA. SA runs first as a controlled-randomness search, escaping local optima via temperature. The best solutions found by SA seed GA's initial population (instead of random). GA then recombines those seeds via crossover to explore broader regions. Justification: SA's strength is the principled escape mechanism (temperature); GA's strength is recombination. SA gives GA a high-quality starting population, eliminating the slow first-generation random search GA usually does. Or: HC + restart -> ACS. The HC restarts produce a diverse set of locally good solutions that seed ACS's pheromone (initial pheromone proportional to which item-bin co-locations appear in the HC solutions).

A.4 Extreme parameter tweaks

4. Consider the following extreme parameter settings, each pushed away from a balanced default. Each tweak is applied to one of the algorithms used in your mini projects.

#	Algorithm	Parameter	Extreme setting
T1	HC with restart	restart interval	very short
T2	SA	initial temperature	far below typical change in objective value
T3	TS	tabu tenure	one (smallest possible)
T4	GA	mutation rate	approximately 0.5 per gene
T5	ACS	evaporation rate	near zero

For each tweak, predict the algorithm's behavior. Indicate which extreme of the dial (full exploration / full exploitation / collapse to a simpler algorithm) the tweak produces.
Which tweak would most help the algorithm escape a local optimum? Which would most prevent escape? Justify.
Pick ONE tweak and explain mechanically why the algorithm degenerates into a simpler algorithm (e.g. SA at very low T degenerates into HC).
Which type of parameter setting would you aim for, and why? Justify in terms of the exploration / exploitation balance.

Solution outline (Dr's voice)

T1 (HC, very short restart): full exploration. Each restart starts fresh before the previous run had time to climb its basin, so the search is mostly random sampling of starting points. T2 (SA, T much below typical delta-Q): full exploitation. SA degenerates into Hill Climbing - the acceptance probability for worse moves is essentially zero from iteration one. T3 (TS, tenure = 1): collapse to plain HC. The forbidden move is released on the next iteration, so cycling is possible and TS loses its anti-cycling protection. T4 (GA, mutation ~0.5): collapse to random search. Each chromosome is randomized roughly half its length per generation, destroying schemata; selection has nothing stable to preserve. T5 (ACS, evaporation = 0): full exploitation lock-in. Pheromone never forgets, the first lucky tour dominates indefinitely; the colony cannot react to better paths discovered later.
Most helps escape: T1 (HC short restart) - the restart abandons the local optimum entirely and tries a new starting point. Most prevents escape: T5 (ACS no evaporation) - the pheromone monopoly locks the colony onto one path forever, so even if better paths exist, the ants will never discover them. Justify mechanically: T1 forces escape by starting over; T5 prevents escape by keeping the bully's pheromone indefinitely strong.
Pick T2 (SA at very low initial temperature). The acceptance probability for a worsening move is exp(delta-Q / T). When T is much smaller than the typical |delta-Q|, the exponent (negative number / small T) is a large negative number, so exp() is essentially zero. SA only ever accepts improving moves - which is exactly the rule of Hill Climbing. The cooling schedule is irrelevant because T was already too small to permit worse moves. Result: SA at very low T is HC in disguise.
Aim for moderate, balanced settings on every dial. HC: restart interval long enough to climb a basin but short enough to sample multiple basins. SA: initial T calibrated to typical delta-Q so worse moves are accepted occasionally early. TS: tenure of around 5-9 swaps - enough to break cycles but not so long that the search corners itself. GA: mutation rate ~1/L (one mutation per chromosome on average). ACS: moderate evaporation so old pheromone fades but is not erased. Justify: balanced settings preserve the algorithm's full character (both exploration and exploitation work); extremes always collapse the algorithm into a degenerate simpler form.

Part B. Conceptual and comprehension questions

B.1 Coevolution comprehension (slides EC-05-7, 8, 9)

5. Recall the multi-goal-state slide from the coevolution lecture. The slide shows four individuals attempting to reach different goal states; two of them have explicit fitness values and two are intentionally written as "Fitness = ??". Answer the following:

Propose how the internal fitness should be assigned to the two unspecified individuals so that the population maintains coverage across all goal states.
Justify your design in terms of internal vs external fitness, and explain how each of the two notions is used in the algorithm.
A classmate proposes a simpler design: take the average distance from each individual to every goal and use that as the internal fitness. Explain in your own words why this proposal is dangerous.

Solution outline

Assign each individual to its closest goal. Within that goal-cluster, normalise its distance relative to the other members of the cluster (e.g. rank, or distance / sum of cluster distances). Add a coverage bonus when the individual is the only one heading for an under-served goal; reduce the score when many individuals crowd the same goal.
External fitness is absolute - the actual distance to the closest goal - and is used in the Best update line of the algorithm. Internal fitness is relative and coverage-aware - used in the Breed step to drive selection. They cannot be the same function because Best must be comparable across iterations while breeding must depend on who else is in the current population.
Averaging destroys the information about which goal an individual is heading toward. An individual close to one goal but far from the others can score the same average as one moderately close to all - yet only the first is the actual candidate for that goal. Averaging also ignores how many other individuals already cover each goal, so the population collapses onto the easiest goal.

6. Consider a 1-population competitive coevolutionary GA applied to a multi-goal version of the 8-puzzle problem. Each individual is a length-31 sequence of moves over {U, D, L, R}, and there are three acceptable goal states. Design the multi-goal fitness in the following steps:

Explain why solving the three goals as three separate GA runs is wasteful, and identify one concrete inefficiency in that approach.
Now solve them as a single GA. For each individual, define the external fitness (the absolute progress measure used to update Best). Justify why external fitness in this scenario must be aware of which goal each individual is closest to.
Design the internal fitness (the breeding signal) so that the population maintains coverage of all three goals. In your design, explain how you would (i) assign each individual to a goal cluster, (ii) compute its score relative to other members of that cluster, (iii) reward unique coverage of an under-served goal, and (iv) penalise redundancy when many individuals crowd the same goal.
Consider an alternative proposal: keep the "best three individuals per goal" as a way to assign internal fitness. Discuss its merit, and discuss its downside when one of the three individuals reaches its goal.
Compare your proposal to the simple averaging of distances to all goals. Explain in your own words why averaging is dangerous and what specific information it destroys.

Solution outline

Three separate runs duplicate work. A move sequence found in run 1 that happens to be useful for goal 2 is invisible to run 2 - it must be rediscovered from scratch. The doctor's framing: "I might have a chromosome U R that gets close to goal 2, but if I'm only looking for goal 1, I throw it away." Total fitness evaluations are 3x what a single GA would need for the same coverage.
External fitness for individual i = Manhattan (or misplaced) distance from the board produced by applying its move sequence to the goal it is closest to. It must be goal-aware because the same chromosome can be very close to one goal and very far from another - a single absolute distance to "the goal" is meaningless when there are three goals. Best update compares each individual to its own closest goal.
Cluster assignment: each individual is assigned to its closest goal (the one with smallest external fitness). Within-cluster score: normalise the individual's distance relative to the other members of its cluster - e.g. rank within the cluster, or distance divided by the sum of distances inside the cluster. Coverage bonus: if the individual is the only one (or one of very few) in its cluster, multiply its internal fitness by a bonus factor. Redundancy penalty: if the cluster has many members, divide each member's internal fitness by the cluster size (fitness sharing).
Merit: simple, equal opportunity - every goal gets exactly three "champions" who are pushed to refine. Downside: if one of the three reaches the goal early, the other two become useless for that goal but are still spending budget on it. The two others might be needed for other goals where coverage is missing.
Averaging destroys the information about which goal an individual is heading toward. An individual on track to solve goal 2 but ignoring goal 1 and 3 will look "average" overall - and selection might breed against it. Averaging also ignores cluster crowding - a goal already covered by many individuals contributes the same to every individual's score as a goal covered by one. The population collapses onto the easiest goal because the average favours individuals broadly close to all goals (which usually means broadly close to none).

7. Algorithm 76 in the same slide deck calls AssessInternalFitness and AssessExternalFitness on every iteration and uses each one in a different line of the algorithm. Answer the following:

Where in the algorithm is each fitness function used, and why do the two roles require different definitions?
The same slide deck shows two bin packing solutions with the same number of bins but very different empty-space variance. Argue which solution the GA should prefer to keep for breeding.
Explain in your own words why a single fitness function is insufficient in this case, and propose an internal fitness term that would let the GA distinguish the two solutions.

Solution outline

Internal fitness drives the Breed step (parent selection + breeding). External fitness drives the Best update line and is used to report progress. They differ because internal must be relative / coverage-aware (so the population maintains diversity), while external must be absolute / comparable across iterations (so Best does not change meaning every generation).
Prefer the high-variance solution (S1) for breeding. The variance means there is slack: one nearly-empty bin can be redistributed and the next iteration may drop one bin. The low-variance solution (S2) is a stable plateau - good if already converged but offers no breeding leverage.
External fitness alone reports both solutions as identical (same bin count), so selection has no signal to choose between them. Add an empty-space variance term to the internal fitness - this differentiates equal-bin solutions and gives the GA a usable gradient to choose better breeders.

8. The doctor described three options for measuring external fitness in 1-population competitive coevolution when no ground truth exists: testing against a guru, testing against a panel of hand-crafted players, and testing against a sample of past generations. Answer the following:

Compare the three options on cost (computation and human effort).
Compare them on the bias each one carries.
State and justify which option you would prefer when no ground truth exists, and explain the assumption your choice relies on.

Solution outline

Guru: cheap to test against (one opponent) but expensive to obtain (a single highly skilled benchmark may not exist). Panel: moderate compute (test against several handcrafted opponents) and moderate human effort to design the panel. Past generations: cheapest - automatic, no human effort, just test against earlier individuals from the same run.
Guru: biased toward the guru's style; weakness is whatever the guru is weak at. Panel: biased toward the panel's coverage; whatever the panel does not test, the algorithm does not learn. Past generations: biased toward incremental improvement - assumes later individuals are better than earlier ones, which can be wrong (regressions go undetected).
Past generations is the practical default: fully automatic, scales with the run, and avoids requiring a human-crafted reference. The assumption it relies on is monotone improvement (later individuals roughly better than earlier ones). If the run is unstable this assumption breaks; in that case a small handcrafted panel is preferable.

9. Map the firewall coevolution example onto the 2-population competitive framework. In your answer:

Identify what each population represents, what each individual looks like, and how the fitness of an individual in each population is computed.
Explain why this is structurally identical to a Generative Adversarial Network.
Two students argue about how to perform crossover inside a firewall ruleset. The first insists that crossover should swap whole rules only. The second argues that splitting inside a rule (mixing source IP from one rule with destination port from another) is permissible because the slots are still apple-to-apple. Compare the two positions on exploration and exploitation, and propose a compromise that uses both.

Solution outline

P (defenders / solutions of interest): each individual is a complete firewall ruleset. Q (foils / test cases): each individual is an attack scenario. Fitness of P individual = fraction of Q's scenarios it correctly catches; fitness of Q individual = fraction of P's defenders it manages to fool. Each side's fitness depends on the other's quality.
P plays the discriminator (catches attacks); Q plays the generator (manufactures attacks). They coevolve: as P improves, Q must evolve harder attacks to score; as Q hardens, P must evolve smarter rules. Same coaching loop, same collapse mode (if Q gets too easy, P stops learning).
Whole-rule crossover preserves rule semantics and is mainly exploitation. Inside-rule crossover (still apple-to-apple at the field level) is more exploration but can break the semantic glue inside a rule. Compromise: whole-rule crossover at the normal rate plus inside-rule crossover at a much smaller rate - preserves exploitation while injecting occasional structural exploration.

10. In a multi-robot soccer team, each role (goalkeeper, defender, midfielder, striker, playmaker) has its own sub-population evolving in parallel. Answer the following:

Define the two fitness signals each individual receives, and explain how each one is computed.
Explain why a player with the highest individual fitness is not necessarily picked for the team. Justify with reference to the trade-off between individual performance and team coordination.
Explain why exhaustively evaluating every possible team becomes infeasible as the number of sub-populations grows, and describe how cooperative coevolution avoids this combinatorial explosion.

Solution outline

Individual fitness: the player's role performance in isolation (goalkeeper measured on saves, striker on goals, etc.). Team fitness: the score the player produces when grouped with one chosen individual from each of the other sub-populations - measured by playing actual games.
An individually strong player may have low team fitness if its style does not align with the partners. The doctor's project analogy: "Malik's individual progress matters, but if he and Abdullah duplicated work, the team's joint score is what gets graded." The team is what wins, not the individual.
The number of teams is the product of sub-population sizes - explodes exponentially. Cooperative coevolution avoids this by keeping a small set of "team representatives" per sub-population; each candidate is evaluated against a few teams formed from those representatives, and that score is its contribution. The high-dimensional search becomes N smaller searches that talk only at evaluation time.

11. Discuss diversity preservation in population metaheuristics. In your answer:

Name five mechanisms used to prevent premature convergence and to preserve diversity, and briefly explain how each one produces more exploration.
Fitness sharing and crowding both rely on a notion of "two individuals being similar". Discuss the three different notions of similarity (phenotype, genotype, behaviour) and give an example of each in the bin packing problem.

Solution outline

(i) Fitness sharing: penalise individuals too close to each other so the bully shares its credit with neighbours. (ii) Crowding: when a child is born, replace the most similar individual - keeps spread. (iii) Larger population: hopefully many bullies / clusters instead of one. (iv) Add high-variance noise: push individuals out of basins so they can jump. (v) Pick less-fit individuals more often (anti-greedy selection) so weak schemata survive long enough to recombine.
Phenotype similarity: same decoded solution - two BPP solutions with the same bins and the same per-bin fill levels. Genotype similarity: representations are close - two random-key vectors with small per-position difference. Behavioural similarity: same fitness vector (same #bins AND same empty-space variance) even with different chromosomes.

B.2 Effectiveness, efficiency, and reproducibility

12. Recall the midterm question on Meta1 and Meta2: Meta1 reaches a moderate-quality solution very quickly and then plateaus, while Meta2 improves more slowly but eventually surpasses Meta1. Answer the following:

According to the criterion of efficiency and the criterion of effectiveness, which of the two metaheuristics may be considered the best, and why?
Propose a sequential strategy that uses Meta1 first and then hands off the search to Meta2 to obtain a better overall solution. Justify the hand-off rule.
Propose an alternative parallel strategy that runs Meta1 and Meta2 simultaneously under the same total budget, and justify when the parallel strategy is preferred over the sequential one.

Solution outline

Efficiency = good quality with few resources. Meta1 wins on efficiency (reaches a usable solution quickly). Effectiveness = best final quality. Meta2 wins on effectiveness (eventually surpasses Meta1). Choice depends on whether the user has a tight budget or wants the best possible answer.
Run Meta1 first to quickly reach a good basin. Hand-off rule: when Meta1's current-best plateaus for k consecutive iterations (no improvement), pass the best-so-far to Meta2 as the starting point. Meta2 then refines slowly from a warm seed. Combines Meta1's efficiency with Meta2's effectiveness.
Parallel: run Meta1 with a small budget for cheap fast exploitation, run Meta2 with a larger budget for diverse multi-restart, then return the best of both. Preferred when the algorithms have different failure modes - you hedge against one of them getting stuck. Sequential is preferred when warm-starting Meta2 from Meta1's basin is cheaper than running both fully.

13. A classmate claims that an algorithm with very low variance across runs is always preferable to one with higher variance, because low variance means the algorithm is more reliable. Answer the following:

Discuss what very low variance and very high variance each indicate about the algorithm's behaviour.
Discuss which is preferable for reproducibility and for a single-shot deployment.
Discuss which is preferable when multiple restarts are allowed and the goal is to find the best possible solution. Justify your answer.

Solution outline

Low variance: the algorithm is stable / deterministic in its outcomes - converges to similar solutions regardless of seed. High variance: the algorithm is exploratory or unstable - different seeds produce very different solutions. High variance can mean either useful diversity (the algorithm finds different basins) or instability (the algorithm is poorly tuned).
Low variance is preferable for reproducibility (the result is predictable) and for single-shot deployment (you only get one run, so you want a guaranteed acceptable quality). The classmate's intuition is correct in this regime.
High variance is preferable when multiple restarts are allowed and the goal is the best possible solution. With many runs you can keep the best one regardless of the rest - so the upside of occasionally finding a much better solution outweighs the cost of bad runs. The classmate's "always preferable" claim is wrong in this regime.

14. Suppose you are given a fixed budget of fitness evaluations and must choose between a small population evolved over many generations and a large population evolved over few generations. Answer the following:

Compare the two settings on exploration vs exploitation.
State which setting you would prefer when the landscape is known to have many local optima, and justify your choice.
Discuss how the choice interacts with the elitism level: should elitism be high or low under each setting? Justify.

Solution outline

Small pop / many generations: deep exploitation per individual; diversity collapses early; bully risk. Large pop / few generations: more diversity at start but selection pressure has too little time to act - the population may end noisy without converging. Doctor: "larger population = more exploration; more generations = more exploitation."
Prefer large population for a multimodal landscape. Multiple basins must be sampled to avoid converging on a local optimum; that requires the diversity that a large population brings. Few generations is acceptable because the goal is to find a good basin, not to fully refine inside one.
Small pop / many gens already exploits hard; elitism should be low or zero (the strong already self-survive, extra elitism turns them into bullies). Large pop / few gens explores hard; elitism should be slightly higher so the few good guys survive the noise of large breeding cycles.

B.3 Problem framing (Talbi-style)

15. You are given a set of boxes of different heights to be piled into k columns such that the differences between the column heights are minimised. Propose:

An objective function (specify minimisation or maximisation).
A representation suitable for a single-solution metaheuristic, and comment on its locality.
A tweaking operator. Discuss whether this operator may face a plateau problem similar to the one you encountered in bin packing.

Solution outline

Minimise the load imbalance - either max(column_height) - min(column_height), or the variance of the column heights. Minimisation problem.
Either an assignment vector of length n (for each box, which column it goes to), or a permutation decoded by a placement heuristic. The assignment vector has stronger locality - one move changes one column's height; the permutation has weaker locality because reordering can cascade through the placement.
Tweak: move one box from one column to another, or swap two boxes between columns. Plateau problem: yes - moving a small box rarely changes max - min, so the landscape has plateaus similar to the BPP "raw bin count" issue. Augment the objective with a sensitivity term (e.g. variance of all column heights, not just max - min) to make the landscape navigable.

16. In bin packing the natural objective is the number of bins, but the doctor recommended augmenting this objective with an empty-space term. Answer the following:

Explain in your own words why the raw count of bins is insufficient as a fitness signal for a single-solution metaheuristic.
Discuss how the augmentation with an empty-space term changes the landscape the algorithm sees.
Explain how this design choice mirrors the internal vs external fitness distinction from the coevolution lecture.

Solution outline

Most local moves (swap two items, move one item) do not change the number of bins. The landscape is mostly plateau - HC sees the same fitness for many neighbours and cannot tell whether it is improving. Selection pressure has nothing to bite on.
The empty-space term breaks the plateau into small slopes. Two solutions with the same bin count but different empty-space distributions now have different fitness, so the algorithm can move toward one with better slack - which often leads to dropping a bin in a future iteration.
The bin count is the external fitness (absolute, what we report at the end). The empty-space term is the internal fitness (used for breeding / selection only). Same idea as coevolution slide 9: same external (#bins), different internal (variance of empty space) - we add the internal term to give the algorithm a usable gradient.

17. A student claims that Hill Climbing with many random restarts is a population metaheuristic because many solutions are being maintained over time. Answer the following:

Discuss whether you agree with the claim, and justify your answer in terms of what truly distinguishes a population metaheuristic from a single-solution one.
Identify a specific mechanism that population metaheuristics have but random-restart HC does not. Explain the consequence of its absence.
Describe a small modification to random-restart HC that would push it closer to a true population metaheuristic. Justify your modification.

Solution outline

Disagree. Holding many solutions over time is not enough - what makes a method a population metaheuristic is communication between individuals. Random-restart HC has multiple runs but they are independent (each restart starts fresh, no information crosses between runs). The doctor's test: "they don't talk to each other."
Crossover (or any cross-individual information transfer). Without it, useful structure discovered by one run cannot be combined with structure from another. Each run reinvents the wheel; the budget is wasted on duplicated discovery.
Add a shared archive of "best solutions per region" and seed each new restart with a perturbation of an archived solution rather than a fully random start. This gives each restart a warm-start from prior knowledge - the simplest form of cross-restart communication. Now the runs share information through the archive, satisfying the population-metaheuristic test.

B.4 8-puzzle GA with coevolution overlay

18. Consider solving the 8-puzzle problem with a Genetic Algorithm. The hardest instance of the 8-puzzle is known to take 31 moves to solve. Answer the following:

Propose a chromosome representation and an objective function for the single-goal-state 8-puzzle. Justify which crossover and mutation operators you would (and would not) use, and explain how invalid moves should be handled.
Now suppose the goal state is no longer unique - there are several acceptable goal states, and the population should attempt to reach all of them. Discuss whether the chromosome needs to change, how the fitness function should be modified, and what diversity mechanism you would add to ensure the population does not collapse onto the easiest goal. Justify your answer with reference to the multi-goal-state coevolution slide.

Solution outline

Chromosome: length-31 string of moves over {U, D, L, R}, encoding movements of the blank space (not of the numbered tiles). Optionally add a STOP sentinel. Objective: Manhattan distance from the resulting board (after applying moves) to the goal - minimise. Crossover: single-point or two-point on the move sequence (apple-to-apple cuts, preserves move sub-sequence schemata). Do NOT use uniform crossover - it shatters schemata and creates U-D / L-R "dance" pairs at segment boundaries. Mutation: value replacement on a randomly chosen gene at rate ~1/31. Do NOT use swap mutation - the chromosome is not a permutation; swap would lose reachability of any direction never seen in the initial population. Invalid moves: repair (e.g. UU becomes UR) or ignore.
Chromosome stays the same. External fitness becomes the Manhattan distance to the closest of the goal states. Internal fitness becomes the relative distance within the cluster of individuals targeting the same goal, with a coverage bonus for individuals heading to under-served goals and a penalty for individuals crowded on the same goal. Add fitness sharing on goal-cluster (or quota-based selection reserving a slot per goal in the next generation) so the population does not collapse onto the easiest goal. This mirrors the multi-goal-state slide where the doctor explicitly warned about averaging distances.

B.5 Topic-focused short questions

19. Recall the midterm question on Simulated Annealing with extreme temperatures. Answer the following:

SA is run with temperature equal to zero at all times (the corresponding termination test is omitted). Identify which algorithm SA reduces to, and explain the impact on behaviour in terms of exploration and exploitation.
SA is run with temperature equal to infinity at all times. Identify which algorithm SA reduces to, and explain the impact on behaviour.
Compare these two extremes with the analogous extremes for two other algorithms in the course: PSO with the global-best pull set to a very large value, and Tabu Search with the tabu list set to size zero.

Solution outline

SA reduces to Hill Climbing. exp(delta-Q / 0) = 0 for any worsening move, so worse moves are never accepted. Pure exploitation - always greedy. The CB curve is monotone-improving (or flat once stuck at a local optimum).
SA reduces to a random walk. exp(delta-Q / infinity) = 1 for any move, so every move is accepted regardless of quality. Pure exploration. CB oscillates wildly; BSF improves only when lucky.
PSO with very large global-best pull is the same kind of degeneracy as SA at T=0: every particle stampedes to the leader, no exploration, premature convergence (the bully effect). TS with tabu list size zero is the same kind of degeneracy: no memory, the search returns to the local optimum on the next iteration, TS reduces to plain HC. All three cases share the pattern: pushing one parameter to its extreme collapses the algorithm into a simpler one (HC for SA T=0 and TS list=0; bully-collapsed swarm for PSO).

20. Schemata are sub-strings of the chromosome with some positions fixed and others left as wildcards. The schema theorem describes how the GA amplifies fit schemata over generations. Answer the following without computing any specific probability values:

Define the order and the defining length of a schema, and give a small example.
Given two schemata of equal order, one with a short defining length and one with a long defining length, explain qualitatively which is more likely to survive one-point crossover and why.
Connect the schema theorem to the doctor's warning about uniform 50/50 crossover. Explain why uniform crossover is dangerous from a schema-survival point of view.

Solution outline

Order o(H) = number of fixed positions in the schema. Defining length d(H) = distance between the first and last fixed positions. Example: schema H = 1 * * * 0 * * 1 * has order 3 (three fixed positions) and defining length 7 (positions 1 and 8 are the extremes).
The short-defining-length schema is more likely to survive one-point crossover. The probability that a one-point cut falls between two fixed positions is proportional to defining length / (chromosome length - 1). A short defining length means few cut positions can disrupt the schema, so survival probability is high. A long defining length means many cut positions disrupt, so survival is low.
Uniform crossover effectively chooses each gene independently from one of the two parents - it is many "cuts" per chromosome. Schemata with any defining length greater than 1 are very likely to be broken because every gene is a potential disruption point. The doctor's "ruins the schemas" warning is exactly this. Schema theorem says GA amplifies fit schemata over generations only when crossover preserves them; uniform crossover destroys this amplification.

21. Once Hill Climbing reaches a local optimum, several escape philosophies exist. Answer the following:

In Iterated Local Search (ILS), the algorithm perturbs the current local optimum and runs local search again. Discuss the impact on behaviour when the perturbation strength is (i) very weak, (ii) very strong.
In Variable Neighborhood Search (VNS), the radius of the neighbourhood itself changes between iterations. Compare a fixed-small-radius schedule with a randomly chosen radius each iteration.
In Guided Local Search (GLS), the search memory is kept inside the objective function itself via penalty terms on features. Explain how this mechanism prevents the algorithm from returning to the same region, and why the best-so-far must be tracked separately.
In GRASP, the candidate-list size acts as a greediness knob. Compare the behaviour at very small (top one) and very large (top one hundred percent) settings.

Solution outline

Very weak perturbation: local search falls back into the same basin - ILS reduces to plain HC, no escape. Very strong perturbation: equivalent to random restart - all the structure earned by previous local-search refinement is lost. Moderate perturbation (keep common structure, change a small subset) is the sweet spot - enough to escape the basin but not so much that you lose context.
Fixed small radius: always exploiting around the current solution - stuck at local optima. Random radius each iteration: sometimes the algorithm exploits (small radius), sometimes explores (large radius) - a single algorithm naturally cycles between the two without any explicit schedule. The doctor preferred this trick.
GLS adds a penalty term to the objective for every feature that has been used recently. The penalised landscape makes the visited region less attractive, so the local search naturally moves elsewhere - the algorithm physically cannot return because that place no longer looks optimal. Best-so-far must be tracked separately because the penalty deforms the landscape we are optimising against; the recorded best uses the original (un-deformed) objective so we do not lose the actual best.
Top 1 (pure greedy): always picks the best component - multi-start gives identical solutions, so GRASP loses its diversification benefit. Top 100% (fully random): no greediness - equivalent to random restart, loses the heuristic guidance entirely. The sweet spot is a moderate top-P% (e.g. 10-20%) - greedy enough to focus on good components, random enough to give different runs different solutions.

22. The doctor described representation as "the most important design choice." Answer the following:

Define strong locality and weak locality in the context of a genotype-to-phenotype mapping.
Compare the locality of a single-bit mutation under plain binary encoding and under gray code encoding for an integer-valued gene. Use a concrete example to illustrate.
The doctor showed how moving the 8-queens representation from a 2D board to a length-8 permutation reduces the search space dramatically. Explain how the choice of representation can outperform an algorithm upgrade, and justify your answer with reference to the 8-queens search-space reduction.

Solution outline

Strong locality: a small change in the genotype produces a small change in the phenotype (and therefore typically a small change in fitness). The landscape looks smooth from the algorithm's view. Weak locality: a small genotype change can produce a huge phenotype change - the landscape looks chaotic, the algorithm cannot tell whether it is exploring or exploiting.
Plain binary, integer 7 = 0111. Flipping the high bit gives 1111 = 15 (jump of 8). Flipping the low bit gives 0110 = 6 (jump of 1). Same operator, very different effects depending on position - weak locality. Gray code: any single-bit flip moves the value by exactly 1. Predictable, controlled - strong locality. Gray code is preferred when the algorithm relies on small moves (HC, SA).
2D board: search space = C(64, 8) approximately 4 billion. One-queen-per-row: 8^8 approximately 16 million. Permutation (one queen per row AND per column): 8! = 40,320. Each representation upgrade enforces a constraint by design rather than by penalty. The same algorithm runs 100,000x faster on the permutation representation - not because the algorithm changed, but because the representation eliminated infeasible regions. This is why the doctor calls representation the most important design choice.

23. A student decides to remove crossover entirely from a GA, keeping only mutation and selection. Answer the following:

Identify what the resulting algorithm is called and how it differs from a canonical GA.
Explain what the GA loses when crossover is removed. Frame your answer in terms of how individuals share information across the population.
Discuss when removing crossover is a defensible choice in practice. Use the 8-queens permutation example as your case study.

Solution outline

Evolution Strategy (ES). Each parent mutates copies of itself; selection still favours the fit. Differs from GA in the absence of two-parent recombination - the (mu, lambda) or (mu + lambda) frameworks describe the family.
The communication channel between individuals is gone. Each lineage evolves independently - useful structure discovered by one parent cannot be combined with structure from another. The doctor's framing: "no communication ... no concept of survival across individuals." Convergence still happens via selection pressure alone, but slower because schemata cannot be mixed.
Defensible when crossover would constantly produce invalid offspring. 8-queens permutation: standard one-point crossover on parents 1-2-3-4-5-6-7-8 and 5-4-1-8-2-7-3-6 produces children with duplicate values - infeasible permutations. Repairing them is expensive. Many published "GA on 8-queens" papers actually drop crossover entirely and use only swap mutation - the algorithm is a permutation-based ES in disguise. The trade-off: avoid the repair cost at the price of losing recombination.

Part C. Midterm question re-creations (verbatim, with Dr-style solutions) The seven questions from the actual midterm, reproduced verbatim. Solutions are written in the doctor's voice - using his analogies, his vocabulary, and the framings he repeated in lecture. The doctor reuses scenarios; expect at least one of these to come back in some form.

C.1 Midterm Q1 - HC random restart interval

M1. Consider the set of runs you conducted using the Hill Climbing (HC) strategy for the Bin Packing problem. Assume that you are going to repeat the experiment with HC with Random Re-start. Assume that you are limited with the exactly same number of iterations you had with HC (that is, equivalently, you are limited by the average overall time the experiment took across the 10 runs). With this constraint, would you consider the interval you give to each round of restart to be long, short, or moderate? Justify your answer based on the experiment you conducted. Hint: you will need to base your justification on your observations across the 10 runs you made in the experiments.

Solution outline (Dr's voice)

Choose moderate. The doctor's exact framing: long restart intervals push the algorithm toward exploitation (you spend most of the budget refining one basin), short intervals push toward exploration (you keep starting fresh, never giving any run time to climb), and moderate sits in between - each restart climbs its basin and then the budget moves to a new region.

Justify from observations in the doctor's pattern:

Your HC runs reached a plateau early (the local-optimum trap). Once the plateau is reached, every additional iteration in the same run is wasted - HC cannot escape on its own.
If you choose a very long interval, those wasted iterations after the plateau dominate the budget. You exploit a basin you already know is a dead end.
If you choose a very short interval, you never give the next run enough time to climb to its plateau - you mostly sample random starting points.
Moderate is the sweet spot: long enough that each restart actually climbs to its plateau (so you measure the basin's true value), short enough that you sample several different basins within the total budget. The doctor's rule: "moderate runs the gamut between the two."

C.2 Midterm Q2 - SA at T=0 and T=infinity

M2. Give the name of the algorithm that would result in these cases; justify. Explain impact on behavior.

Simulated annealing with temperature = 0 at all times (and omitting the corresponding termination test).
Simulated annealing with temperature = infinity at all times.

Solution outline (Dr's voice)

This is Hill Climbing. The acceptance probability for a worse move is exp((Q(R) - Q(S)) / T) = exp(negative / 0), which goes to zero. So worse moves are never accepted - SA only takes improving moves, exactly like HC. Behaviour: pure exploitation. The algorithm reaches the first local optimum and stays there forever (which is why the termination test had to be omitted - SA would never stop on its own). The current-best curve is monotone-improving until the local optimum, then flat. The doctor's framing: "if T is low, the probability of moving in the wrong direction goes to zero - this is because we are close to the solution, I don't want to move from there until I exploit it enough."
This is Random Walk (or random search). The acceptance probability for a worse move is exp(negative / infinity) = exp(0) = 1, so every move is accepted regardless of quality. The algorithm wanders through the landscape with no preference for better solutions. Behaviour: pure exploration. The current-best curve oscillates wildly; the best-so-far improves only when the wanderer happens to bump into a good solution. The doctor's framing: "if I start with very high temperature - regardless of how much better or worse than me, I will keep moving."

C.3 Midterm Q3 - Meta1 vs Meta2 (efficiency vs effectiveness)

M3. Let us consider two quantitative criteria to evaluate the performance of metaheuristics: efficiency and effectiveness. Consider the best-so-far plots for two metaheuristics, Meta1 and Meta2 (Meta1 reaches a moderate-quality solution quickly and then plateaus; Meta2 improves slowly but eventually surpasses Meta1). According to each criterion, which metaheuristic may be considered the best one? Propose and explain possible way(s) to use Meta1 and Meta2 together to complement each other for better overall performance.

Solution outline (Dr's voice)

By efficiency: Meta1 is best. Efficiency is "good quality with few resources" - Meta1 reaches a usable solution very quickly, so for a tight time budget Meta1 is preferred. Looking at the plots, at time t1 Meta1 has already reached a solution s1 that Meta2 only reaches at t2 (much later).

By effectiveness: Meta2 is best. Effectiveness is "best final quality" - given enough time, Meta2 surpasses Meta1 and reaches a strictly better solution. The doctor's framing: "Meta2 is more effective if final quality is better."

Ways to combine them:

Sequential hand-off (recommended): Run Meta1 first to quickly reach a good basin. Once Meta1 plateaus (no improvement for k iterations), hand the best-so-far to Meta2 as its starting point. Meta2 then refines slowly from a warm seed. You get Meta1's efficiency early and Meta2's effectiveness late.
Parallel: Run Meta1 and Meta2 simultaneously under the same budget; return the best of the two. Useful when you do not know which algorithm's failure mode to expect.
Restart with seeding: Use Meta1's best as a seed for Meta2's initial population (if Meta2 is population-based). Combines exploitation of Meta1's discovery with the diversity of Meta2's search.

The doctor's general rule: "a single algorithm is rarely both efficient and effective; combining gives a Pareto-better outcome."

C.4 Midterm Q4 - Box piling problem (Talbi-style)

M4. Given a set of boxes of same base dimensions but of different heights h_i (i = 1, ..., n), the objective is to pile the boxes on top of each other into k columns C_j (j = 1, ..., k) such that the total heights of the boxes in each column are as similar as possible. Propose an objective function that could be used to find the best distribution of objects. Specify whether it is minimization or maximization problem. Propose a representation of solutions to tackle this problem using s-metaheuristic. Comment on the locality of your representation. Propose and explain a tweaking operator. Note: this is similar to bin packing problems but we are replacing weights with heights.

Solution outline (Dr's voice)

Objective function (minimisation): minimise the load imbalance across columns. Two reasonable choices:

max(column_height) - min(column_height) - the spread between the tallest and shortest column. Simple and intuitive.
variance of column heights - sum of (column_height - mean)^2 over all columns. More sensitive: it differentiates two solutions that have the same max - min but different intermediate columns. The doctor would prefer this for "sensitivity to changes" - same reason we add empty-space variance to bin packing fitness.

Representation: an assignment vector of length n where slot i holds the column index (1 to k) of box i. Locality: strong - moving one box changes one column's height (and the height of the column it left). The fitness changes by a small, predictable amount per move. Alternative: a permutation of boxes decoded by a placement heuristic - weaker locality because the decoder cascades.

Tweak operator: "move one box from column A to column B" - pick a box at random and reassign it to a different randomly chosen column. Optionally also "swap two boxes between two columns." Connectivity: any feasible distribution can be reached from any other by a sequence of moves.

Plateau warning (the doctor's BPP analogy): if the objective is just max - min and the moved box is small, the spread does not change at all. Most local moves give the same fitness - HC sees a plateau. The variance objective fixes this by giving every move a slightly different fitness, which is the same trick the doctor recommended for the bin count in BPP.

C.5 Midterm Q5 - Small population vs large population

M5. Consider a p-metaheuristic technique to solve a given problem, assume that the fitness function can be executed only a limited number of times. Discuss the impact of considering small population and more generations vs large population and less generations on the behavior of the metaheuristic technique.

Solution outline (Dr's voice)

Total fitness evaluations = population size x number of generations. Under a fixed budget, you trade one for the other. The doctor's rule: "the larger the population size, the more exploration. The more the number of generations, the more exploitation."

Small population, many generations:

Strong exploitation. Each individual gets refined over many breeding cycles.
Diversity collapses early - with few individuals, selection pressure quickly converges them onto the bully.
Risk of premature convergence to a local optimum.
The extreme case: population size = 1, many generations = pure Hill Climbing.

Large population, few generations:

Strong exploration. Many starting points sample many basins of the landscape.
Selection pressure has too few generations to act - the population may end up barely refined.
The extreme case: population size = budget, generations = 1 = pure random search.

Which to prefer:

Multimodal / rugged landscape -> favour exploration -> larger population.
Smooth / unimodal landscape -> favour exploitation -> more generations.
Unknown landscape -> pick a balanced split (e.g. pop = 50, gens = 200 for a 10000 budget).

The doctor would expect you to identify both extremes (HC and random search) and to justify the middle as a designer's choice based on the landscape.

C.6 Midterm Q6 - Schema theorem with calculations

M6. Consider a binary string of length 11 and consider the following schemata: 1*********1, 11*********, *****111***, and **1***1**0*.

Under crossover with uniform one-point crossover calculate a lower limit on the probability of these schemata surviving crossover in one generation.
Calculate the probability of surviving mutation in one generation if the probability of mutation is 0.9 at a single bit position.
Calculate the probability of these schemata surviving both crossover and mutation in one generation.

Solution outline (Dr's voice)

Step 1 - identify order o(H) and defining length d(H) for each schema:

H1 = 1*********1: fixed at positions 1, 11. Order o = 2, defining length d = 11 - 1 = 10.
H2 = 11*********: fixed at positions 1, 2. Order o = 2, defining length d = 2 - 1 = 1.
H3 = *****111***: fixed at positions 6, 7, 8. Order o = 3, defining length d = 8 - 6 = 2.
H4 = **1***1**0*: fixed at positions 3, 7, 10. Order o = 3, defining length d = 10 - 3 = 7.

Crossover survival (lower bound). P(survive | crossover) >= 1 - p_c x d(H) / (l - 1). Assume p_c = 1 (worst case, the question says "lower limit"):
- H1: 1 - 1 x 10/10 = 0 (almost always destroyed by crossover - the long defining length is fatal).
- H2: 1 - 1 x 1/10 = 0.9 (very robust - short defining length protects it).
- H3: 1 - 1 x 2/10 = 0.8 (still good).
- H4: 1 - 1 x 7/10 = 0.3 (long defining length hurts).
Mutation survival. P(survive | mutation) = (1 - p_m)^o(H). With p_m = 0.9:
- H1, H2 (order 2): (1 - 0.9)^2 = 0.1^2 = 0.01.
- H3, H4 (order 3): (1 - 0.9)^3 = 0.1^3 = 0.001.
Note: p_m = 0.9 is enormous (a normal GA uses ~1/L). At this rate, mutation alone almost destroys every schema regardless of defining length.
Combined survival. Multiply (a) and (b):
- H1: 0 x 0.01 = 0.
- H2: 0.9 x 0.01 = 0.009.
- H3: 0.8 x 0.001 = 0.0008.
- H4: 0.3 x 0.001 = 0.0003.

Conclusion in Dr's voice: H2 survives best because both its order and its defining length are small. The schema theorem's lesson appears clearly here: short, low-order schemata propagate; long, high-order ones get destroyed. With p_m = 0.9 the mutation rate is way too high for any normal GA - this is why typical GAs use p_m approximately 1/L (one bit on average per chromosome).

C.7 Midterm Q7 - 8-puzzle GA

M7. The 8-puzzle consists of a 3x3 board with eight numbered tiles and a blank space. A tile adjacent to the blank space can move into the space. The objective is to find the sequence of moves to reach a specified goal state, such as the one shown in the figure. Using Genetic Algorithms, propose a representation (chromosome design) and the objective function to optimize for this problem. Justify the genetic operators (mutation and crossover) you would (would not) use for the proposed representation. Note that the hardest 8-puzzle instance is known to take 31 moves to solve.

Solution outline (Dr's voice)

Chromosome design (representation). Encode each gene as a movement of the blank space (not of a numbered tile) over the alphabet {U, D, L, R}. The doctor's exact framing: "for the computer it makes sense - I can say move the space up, which means the tile below moves up." This standardises the alphabet to four symbols regardless of which numbered tile is being moved. Length of the chromosome = 31 (the worst-case 8-puzzle instance, given by the question). Optionally include a STOP sentinel so the chromosome can self-terminate when the goal is reached early. Constraint to enforce: never two consecutive moves of U-D or L-R - they "dance each other" and waste 2 of your 31 slots.

Objective function. Apply the move sequence to the initial board, then measure the distance from the resulting board to the goal. Two options:

Manhattan distance (preferred): for each tile, sum the row and column distance to its goal position. Smoother gradient - the doctor's "sensitivity to changes" principle.
Misplaced tiles count: simpler but coarse - many chromosomes get the same score.

Both are minimisation. The Manhattan version was the doctor's worked example in lecture.

Mutation - use value replacement, NOT swap. Pick one gene and replace it with a different symbol from {U, D, L, R}. The chromosome is not a permutation - each gene is an independent value over a 4-symbol alphabet, so the natural mutation is "change one move into a different move." Why I would NOT use swap: swap only rearranges existing moves; it never introduces a new direction. The doctor's general rule from the 8-queens lecture applies here: "if your operator pipeline can leave a value missing, swap will never recover it." For the 8-puzzle, every direction must remain reachable at every position, so swap loses reachability - it is the wrong operator. Mutation rate: ~1/L = 1/31.

Crossover - use 1-point or 2-point, NOT uniform. Single-point or two-point crossover at a moderate-to-high rate (~0.7-0.9). The chromosome has positional semantics but every position holds the same kind of value - cuts are apple-to-apple. 1-point and 2-point preserve large schemata of "useful move sub-sequences." Why I would NOT use uniform: uniform 50/50 crossover destroys schemata - the doctor's standing warning is "50/50 ruins the schemas, random always means exploration." For the 8-puzzle this is doubly bad: uniform crossover is likely to produce U-D or L-R "dance" pairs at the boundaries between inherited single-gene segments, wasting moves. Order-preserving / cycle crossovers are not needed because the chromosome is not a permutation.

Invalid-move handling. Some chromosomes contain moves that are infeasible at runtime (e.g. L when the blank is already in the leftmost column). Four strategies:

Repair (preferred): transform the invalid move into a feasible one. The doctor's exact example: "UU becomes UR." Keeps the chromosome semantically meaningful.
Stop early: when the first invalid move is hit, freeze the board and evaluate.
Penalty: add a large cost for each invalid move.
Ignore / discard: skip the invalid move and continue.

Mock final exams (30 pts each) Each mock picks one Part A umbrella as Q1 (10 pts) plus four conceptual questions from Part B / C (5 pts each). Practice under a 90-minute clock.

Mock final A

Q1 Part A (10 pts) - parameter settings analysis .
Q7 Part B (5 pts) - multi-goal fitness design (8-puzzle, design-from-scratch).
Q13 Part B (5 pts) - Meta1 + Meta2 hand-off.
Q16 Part B (5 pts) - boxes-into-columns problem framing.
Q19(a) Part B (5 pts) - 8-puzzle GA chromosome and operators.

Mock final B

Q2 Part A (10 pts) - adaptation scenarios (BPP-specific).
Q8 Part B (5 pts) - Algorithm 76 internal/external + S1/S2 BPP variance.
Q10 Part B (5 pts) - firewall mapped onto 2-pop / GAN.
Q14 Part B (5 pts) - low vs high variance across runs.
Q17 Part B (5 pts) - bin packing objective vs fitness.

Mock final C

Q3 Part A (10 pts) - hybrid pipelines for BPP.
Q9 Part B (5 pts) - external fitness options (guru, panel, past generations).
Q11 Part B (5 pts) - cooperative soccer team.
Q12 Part B (5 pts) - diversity preservation mechanisms and similarity types.
Q19(b) Part B (5 pts) - 8-puzzle multi-goal extension.

Mock final D

Q4 Part A (10 pts) - extreme parameter tweaks (algorithm degeneracy).
Q6 Part B (5 pts) - multi-goal-state internal fitness design (the two unspecified individuals).
Q13 Part B (5 pts) - Meta1 + Meta2 hand-off.
Q15 Part B (5 pts) - small pop many gens vs large pop few gens.
Q18 Part B (5 pts) - HC with random restart claim about population metaheuristics.

Mock final E

Q1 Part A (10 pts) - parameter settings analysis.
Q8 Part B (5 pts) - Algorithm 76 internal/external + S1/S2 BPP variance.
Q11 Part B (5 pts) - cooperative soccer team.
Q14 Part B (5 pts) - low vs high variance across runs.
Q19(b) Part B (5 pts) - 8-puzzle multi-goal extension.

Mock final F

Q2 Part A (10 pts) - adaptation scenarios.
Q9 Part B (5 pts) - external fitness options.
Q10 Part B (5 pts) - firewall mapped onto 2-pop / GAN.
Q12 Part B (5 pts) - diversity preservation.
Q16 Part B (5 pts) - boxes-into-columns problem framing.

Mock final G - topic-focused

Q3 Part A (10 pts) - hybrid pipelines for BPP.
Q20 Part B (5 pts) - SA at T=0 and T=infinity (extreme temperature analysis).
Q21 Part B (5 pts) - schema theorem.
Q23 Part B (5 pts) - representation locality and the 8-queens search-space upgrade.
Q7 Part B (5 pts) - multi-goal fitness design.

Mock final H - escape strategies and self-breeding

Q4 Part A (10 pts) - extreme parameter tweaks.
Q22 Part B (5 pts) - escape strategies (ILS, VNS, GLS, GRASP).
Q24 Part B (5 pts) - self-breeding / no-crossover GA.
Q21 Part B (5 pts) - schema theorem.
Q19(a) Part B (5 pts) - 8-puzzle GA chromosome and operators.

Mock final I - midterm reruns

M7 Part C (10 pts) - 8-puzzle GA representation and operators (the original midterm Q7).
M2 Part C (5 pts) - SA at T=0 and T=infinity (the original midterm Q2).
M3 Part C (5 pts) - Meta1 + Meta2 efficiency vs effectiveness.
M5 Part C (5 pts) - small population vs large population under fixed budget.
M4 Part C (5 pts) - box piling problem.