Combined transcript-first reference for the foundations and the single-solution metaheuristics: complexity, optimization framing, hill climbing, statistics, exploration vs exploitation, simulated annealing, tabu search, and escape strategies.
The doctor positions metaheuristics as what you reach for when polynomial algorithms do not exist. If your problem is linear, n log n, n², or even n³, do not even open this course. Only when the search space is exponential do metaheuristics earn their place.
| Worst-case complexity | What to do |
|---|---|
| Linear / polynomial / n log n | Use the polynomial algorithm. Don't go forward with this course. |
| n^k for moderate k | Still acceptable for small n. May not need a metaheuristic. |
| Exponential (e.g. SAT 2^n, bin packing / BPP) | Brute force is infeasible. A metaheuristic is justified. |
Three pieces define an optimization problem: a search space, an objective function, and the goal of finding s* such that f(s*) is best (min or max) among feasible s.
| Type | Known | Find |
|---|---|---|
| Optimization | Model + output target | Input (decision variables) |
| Modelling / ML | Input + output | Model (parameters) |
| Simulation | Model + input | Output (what-if) |
| Constraint satisfaction (CSP) | Model + constraints, no objective | Any feasible solution |
Every optimization problem can be reduced to a decision problem (does there exist s with f(s) ≤ k?). The 8-queens canonical example: the decision form asks "is there a configuration with zero attacks?"; the optimization form asks "minimize the number of attacks." We use optimization because metaheuristics need a graded improvement signal, not just yes/no.
Handling infeasible candidates - four strategies:
| Strategy | How | Trade-off |
|---|---|---|
| Reject | Discard the candidate, generate another. | Wastes iterations. Acceptable only when violations are rare. |
| Repair | Take the violating component out and place it elsewhere (Sharia's bin-packing fix). | Always feasible. Higher cost per iteration. |
| Penalty function | Add a penalty term proportional to the violation amount or percentage. | Most flexible. Watch the magnitude. |
| Decoding / by-design | Pick a representation where invalid solutions cannot exist (the gold standard). | Hardest to design; cheapest to run. |
Termination criteria - when do you stop?
| Criterion | How you know | Trap |
|---|---|---|
| Found ideal / optimal | You confirm via problem structure (e.g. zero attacks in 8-queens). | Often impossible to prove for NP-hard problems. |
| Out of time / max iterations | Wall-clock budget or iteration counter. | None - this is the most common. |
| No improvement (saturation) | Best-so-far has not improved in k iterations. | You may be stuck at a local optimum, not the global one. |
| Schedule completion | SA temperature reached zero (or below threshold). | Tied to SA only. |
Single-solution metaheuristics share the same loop: hold one solution, generate one (or several) candidates, decide who survives. The differences are in how the candidates are generated and how the survival decision is made.
| Notation | Meaning | Behaviour |
|---|---|---|
| 1+1 | One parent, one child, the better survives. | Pure greedy. Maximum exploitation. |
| 1,1 | One parent, one child, child always replaces parent. | Forced movement. Allows escape from local optima. |
| 1+λ | One parent, λ children, best of (parent and children) survives. | Sample-the-gradient with elitism. |
| 1,λ | One parent, λ children, parent dies, best of children continues. | Sample-the-gradient without elitism. Exploration-friendly. |
In the plus strategies, because the parent always competes, the algorithm never accepts a worse move - it cannot escape a basin without an external mechanism (restart, memory, perturbation). This is exactly the gap that SA, TS, ILS and friends fill.
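A minimal sketch of that shared loop (minimisation; `init`, `tweak`, `quality` are problem-specific hooks and the names are illustrative):

```python
def single_solution_search(init, tweak, quality, iters, elitist=True):
    """Generic s-metaheuristic skeleton for minimisation.
    elitist=True  -> (1+1): the better of parent and child survives.
    elitist=False -> (1,1): the child always replaces the parent."""
    s = init()
    best = s
    for _ in range(iters):
        r = tweak(s)                       # generate one candidate
        if not elitist or quality(r) < quality(s):
            s = r                          # (1,1): forced move; (1+1): improvements only
        if quality(s) < quality(best):
            best = s                       # best-so-far is tracked separately
    return best
```

Generating λ candidates per iteration instead of one turns the same skeleton into 1+λ / 1,λ with no other change.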
Small move = exploitation. One swap, one bit flip on Gray code.
Large move = exploration. Multi-component change, big perturbation.
Random radius = the doctor's recommended trick. Draw the move size from a uniform or Gaussian distribution every iteration. Sometimes you exploit, sometimes you explore - in the same run, with no manual schedule.
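A sketch of the random-radius tweak, assuming the solution is a list and using swaps as the unit move (the Gaussian parameters are illustrative knobs, not lecture values):

```python
import random

def random_radius_tweak(s):
    """Draw the move size fresh every call: small draws exploit,
    occasional large draws explore - in the same run, no schedule."""
    radius = max(1, round(abs(random.gauss(0, 1.5))))  # mostly 1-2, sometimes large
    r = s[:]
    for _ in range(radius):
        i, j = random.randrange(len(r)), random.randrange(len(r))
        r[i], r[j] = r[j], r[i]                        # one swap per unit of radius
    return r
```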
The single most-emphasised warning. If the operator cannot reach a target solution from your starting point, no number of iterations will save you.
For BPP: swap alone is insufficient because swapping items between bins never empties a bin - so the bin count never drops. You must include move (cross-bin transfer) for the operator to reach lower-bin-count states.
Strong locality means a small genotype change produces a small fitness change. This is what lets HC and SA "feel" the gradient.
Plain binary representations have weak locality by construction: flipping the high bit of a number can change its value by a huge amount. Gray code restores strong locality - consecutive integers differ in exactly one bit, so a ±1 step in the phenotype is always a single-bit move in the genotype.
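A sketch of the standard reflected-Gray conversion; the assert spells out the locality property being relied on:

```python
def to_gray(n: int) -> int:
    """Binary -> reflected Gray code."""
    return n ^ (n >> 1)

def from_gray(g: int) -> int:
    """Reflected Gray code -> binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Locality: a +/-1 step in the phenotype is always a single-bit genotype move.
assert all(bin(to_gray(i) ^ to_gray(i + 1)).count("1") == 1 for i in range(256))
```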
Steepest-ascent: evaluate every neighbour, take the best. Expensive but informed.
First-improvement: walk neighbours in some order, take the first one that improves. Cheaper, less greedy, often surprisingly competitive.
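Both policies as one-step functions (minimisation; `neighbours` yields the candidate solutions and is an illustrative hook):

```python
def steepest_ascent_step(s, neighbours, quality):
    """Evaluate every neighbour; move only if the best one improves."""
    best = min(neighbours(s), key=quality)
    return best if quality(best) < quality(s) else s

def first_improvement_step(s, neighbours, quality):
    """Scan neighbours in order; commit to the first improving one."""
    q_s = quality(s)
    for r in neighbours(s):
        if quality(r) < q_s:
            return r
    return s
```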
Representation choice is the single most important design decision. Same algorithm + different representation = different algorithm in practice.
Genotype = the encoding the algorithm manipulates. Phenotype = what the encoding decodes to (the actual solution we evaluate). The mapping between them determines locality.
| Mapping | What it means | Consequence |
|---|---|---|
| One-to-one | Each genotype maps to exactly one phenotype. | Ideal. No waste. |
| One-to-many (broken) | Same genotype decodes to multiple phenotypes. | Decoding is ambiguous; the fitness of a genotype is ill-defined. Avoid. |
| Many-to-one | Multiple genotypes decode to the same phenotype. | Search space is artificially inflated; the algorithm wastes iterations on duplicates. |
The 8-queens representation ladder - each encoding bakes in more constraints:
| Representation | Constraint enforced | Search space |
|---|---|---|
| 2D 8x8 board (any cell) | None | C(64, 8) ≈ 4.4 billion |
| One queen per row (length-8 vector of column indices) | One per row | 8^8 ≈ 16.8 million |
| Permutation (length-8 vector with distinct values) | One per row AND one per column | 8! = 40,320 |
Random-restart HC: when a climb saturates, restart from a fresh random point. The restart interval is the knob:
| Restart interval | Behaviour | Tag |
|---|---|---|
| Very long | Most of the budget spent on one HC run. Few restarts. Each run gets to fully exploit a basin. | Exploitation |
| Very short | Many restarts, none of which has time to climb. Mostly random sampling. | Exploration |
| Moderate | Each restart has time to refine, then the budget moves to a fresh region. | Balanced |
The n-sample HC variant: at each step, generate n candidate tweaks, then commit to the best one. n=1 is pure greedy 1+1. n = full neighbourhood is steepest-ascent.
With n=1 you commit blindly to the only sample you saw: with no gradient information you cannot tell whether you are walking up the steep side of the hill or the gentle side.
Steepest-ascent is expensive (the per-iteration cost is proportional to the neighbourhood size) and is more prone to local-optima trapping because there is no randomness to escape with.
1+λ keeps the parent in the competition (elitism). 1,λ does not - the parent dies regardless. This forces the search to move every iteration, even if all children are worse than the parent. This deliberate willingness to "go down" is how HC variants escaped local optima even before SA was invented.
Always: average + variance + median + 95% confidence interval (CI). Box plots are the standard visualization. Non-overlapping CIs = significant difference; overlapping = inconclusive.
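A minimal reporting helper using the normal approximation (assumption: enough runs for 1.96 to be a fair quantile; with very few runs, substitute a t-quantile):

```python
import math
import statistics

def report(run_results):
    """Mean, variance, median, and a 95% CI over repeated stochastic runs."""
    m = statistics.mean(run_results)
    var = statistics.variance(run_results)
    half = 1.96 * math.sqrt(var / len(run_results))   # normal-approximation CI
    return {"mean": m, "variance": var,
            "median": statistics.median(run_results),
            "ci95": (m - half, m + half)}
```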
The objective is "number of bins" - but that is too flat. Most local moves do not change the bin count, so HC sees a plateau and cannot tell whether it is improving.
Add an empty-space variance term, a fill ratio, or a minimum-slack term. The new fitness is sensitive to changes that the bin count cannot see. (This idea is reused as internal vs external fitness in coevolution; same trick, different name.)
Sum of all weights divided by bin capacity, rounded up, = the theoretical minimum number of bins. Combined with an approximation guarantee (FFD uses at most 11/9 · OPT + 6/9 bins; the classic 17/10 · OPT + 2 bound belongs to plain First Fit), you can bound the optimality gap of any solution.
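Both bounds in a few lines (a sketch; `weights` is a list of item sizes):

```python
import math

def bpp_lower_bound(weights, capacity):
    """Theoretical minimum bin count: ceil(total weight / capacity)."""
    return math.ceil(sum(weights) / capacity)

def first_fit_decreasing(weights, capacity):
    """FFD: sort items descending, place each into the first bin with room."""
    bins = []
    for w in sorted(weights, reverse=True):
        for b in bins:
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:
            bins.append([w])          # no bin had room: open a new one
    return bins
```

Any solution that matches the lower bound is provably optimal; otherwise the heuristic's guarantee brackets the gap.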
The exploration vs exploitation levers so far:
| Lever | Pull toward exploitation | Pull toward exploration |
|---|---|---|
| Locality (representation) | Strong (Gray code) | Weak (plain binary) |
| Tweak magnitude | Small move (1 swap) | Large move (multi-swap) |
| Restart interval | Long | Short |
| Sampling / survival strategy | n = 1 greedy (1+1) | 1,λ (forced move) |
| Population (foreshadow) | Single solution | Population by design |
If current best hugs best-so-far, you are exploiting too much. If current best oscillates wildly while best-so-far barely improves, you are exploring too much. The doctor's framing: "By seeing my monitors, I can tell - I need more exploration, I need more exploitation."
SA is hill climbing with a probabilistic willingness to accept a worse move. The acceptance probability decreases with the size of the quality gap and increases with the temperature.
Inspired by physics, not biology - the doctor stresses this contrast. A hot metal lets atoms rearrange; cooling slowly lets them settle into a clean lattice. Cooling too fast freezes them in a random configuration.
| Temperature | Acceptance probability of worse moves | Algorithm reduces to | Tag |
|---|---|---|---|
| T = 0 | exp(-delta/T) -> 0 as T -> 0. Worse moves never accepted. | Hill climbing. | Pure exploitation |
| T = infinity | exp(-delta/T) -> 1 as T -> infinity. Every move accepted. | Random walk. | Pure exploration |
| Moderate, schedule cooling | Starts permissive, becomes strict. | Classical SA - explores then exploits. | Balanced over time |
T must be calibrated to the spectrum of objective-value differences in your problem. The number 250 or 500 or 5000 is meaningless on its own - what matters is the ratio of T to the typical |Q(R) - Q(S)|.
| T relative to the typical delta-Q magnitude | Behaviour |
|---|---|
| T much smaller | delta-Q dominates the exponent. Probability of accepting worse moves is essentially zero from iteration one. SA degenerates to HC immediately. |
| T comparable | Acceptance probabilities are meaningful and span a useful range. |
| T much larger | delta-Q / T is near zero. Every move is accepted. SA is indistinguishable from random walk for many iterations until cooling catches up. |
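One common calibration trick (a sketch, not a lecture-verbatim recipe): random-walk the landscape, measure the typical delta-Q, and solve the acceptance formula backwards for T_0. The target rate `p_accept` is an assumed design knob:

```python
import math
import statistics

def calibrate_t0(init, tweak, quality, samples=200, p_accept=0.8):
    """Choose T_0 so exp(-mean|delta-Q| / T_0) = p_accept at iteration one."""
    s = init()
    deltas = []
    for _ in range(samples):
        r = tweak(s)
        deltas.append(abs(quality(r) - quality(s)))
        s = r                                  # keep walking to sample the spectrum
    mean_delta = statistics.mean(deltas) or 1.0
    return -mean_delta / math.log(p_accept)    # solve exp(-d/T) = p for T
```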
Cooling schedules:
| Schedule | How | Behaviour |
|---|---|---|
| Linear | T = T_0 - (T_0 / N) * iter. Drops by a constant per step. | Predictable. Reaches zero exactly at the end of the budget. |
| Geometric (most common) | T_new = alpha * T_old, with alpha in [0.9, 0.99]. | Never reaches zero. Approaches it asymptotically. |
| Logarithmic | T = C / log(iter). | Theoretically convergent to the global optimum, but impractically slow. |
| Adaptive | Adjust T based on observed best-so-far behaviour. | The engineer's design choice. Tied to plot diagnostics. |
| Cooling speed | Behaviour | Tag |
|---|---|---|
| Very fast | Freezes early. Becomes HC quickly. Stuck at the first local optimum. | Exploitation |
| Very slow | Stays hot too long. Random walk for most of the budget. Never settles. | Exploration |
| Geometric (alpha around 0.95) | Classic balance. | Balanced |
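The whole algorithm in one sketch - Metropolis acceptance plus geometric cooling (minimisation; hooks as before):

```python
import math
import random

def simulated_annealing(init, tweak, quality, t0, alpha, iters):
    """Accept improvements always; accept worse moves with prob exp(-delta/T).
    Geometric cooling never reaches zero, so the exponent is always defined."""
    s = init()
    best = s
    t = t0
    for _ in range(iters):
        r = tweak(s)
        delta = quality(r) - quality(s)
        if delta < 0 or random.random() < math.exp(-delta / t):
            s = r                    # downhill moves slip through while T is high
        if quality(s) < quality(best):
            best = s                 # best-so-far is immune to the downhill moves
        t *= alpha                   # e.g. alpha = 0.95
    return best
```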
Tabu Search = hill climbing + memory of forbidden states/moves to escape local optima.
Without memory, HC after escaping a local optimum walks straight back to it on the next iteration. Tabu memory forbids re-entry for a number of iterations (the tenure).
TS is structured as (1, λ): generate λ neighbour tweaks, take the best non-tabu one - without comparing against the current S. If you compared back to S you would prevent escape, because S is the bully.
What the tabu list stores - four granularities:
| Store | How | Notes |
|---|---|---|
| Full solutions | Store the entire previous solution. | Maximally specific. Trap: each entry covers a negligible area relative to the landscape, so you cannot cover it and near-cycles slip through. |
| Value ranges | Store a range of values around the visited point; discretise continuous landscapes first. | Each tabu entry covers a whole neighbourhood. Much more memory-efficient. |
| Move attributes | Store the swap or move you just executed; forbid its inverse for the next k iterations. | Cheap and intuitive. Standard default. |
| Schemata / features | Store a structural fragment using wildcards (e.g. `8*5*`); forbid any solution matching the schema. | For BPP: which items are co-located in a bin. For TSP: which edge is used. For scheduling: "job 3 at position k". |
Tenure = the number of iterations a forbidden move stays in the list.
| Tenure | Behaviour | Tag |
|---|---|---|
| Zero | No memory at all. Forbidden move can be undone immediately. TS reduces to plain HC; cycling is possible. | No protection |
| Very small (1-2) | "Cycling - same step again, again." Almost no anti-cycling protection. | Cycling |
| Moderate (5-9 swaps) | Lets the search escape the local bully but releases moves in time so good ones can be re-tried. | Balanced |
| Very large | Almost no move can be repeated within the run. Forced diversification, but blocks legitimately good moves and can corner the search. | Over-constrained |
The standard aspiration rule: override the tabu list when the candidate move would yield a solution better than the global best so far. Refusing such a move because it is technically tabu would be foolish.
Aspiration is the safety valve that protects effectiveness regardless of the restriction style chosen for the tabu list.
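A sketch of the (1, λ) tabu loop with move-attribute memory and the aspiration override (`sample_move`, `apply_move`, `inverse` are problem-specific hooks with illustrative names):

```python
from collections import deque

def tabu_search(init, sample_move, apply_move, inverse, quality,
                tenure, lam, iters):
    """Take the best non-tabu candidate without comparing back to S;
    a candidate that beats the global best overrides the tabu list."""
    s = init()
    best, best_q = s, quality(s)
    tabu = deque(maxlen=tenure)              # sliding window of forbidden moves
    for _ in range(iters):
        moves = [sample_move(s) for _ in range(lam)]
        moves.sort(key=lambda m: quality(apply_move(s, m)))  # best first
        for m in moves:
            r = apply_move(s, m)
            q = quality(r)
            if m not in tabu or q < best_q:  # aspiration: new best overrides tabu
                s = r                        # forced move, even if worse than S
                tabu.append(inverse(m))      # forbid undoing it for `tenure` steps
                if q < best_q:
                    best, best_q = r, q
                break
    return best
```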
The tabu list itself is short-term memory (a sliding window of recent moves). Long-term memory tracks how often each component appears across all visited solutions, then uses that frequency to decide what to do next. The key insight: the same observation (a component appearing often) can be interpreted in two opposite ways, and the designer chooses which one based on what the search needs.
| Observation | Interpretation | Action |
|---|---|---|
| High frequency (a component keeps appearing in good solutions) | Strong signal -> intensify around it | Restart from a configuration where the high-frequency components are fixed. Doctor: "let me exploit them more." |
| High frequency (same observation) | Bully -> diversify away from it | Add the high-frequency components to the tabu list. Force every new candidate to use different components. |
| Low frequency (a component rarely tried) | Under-explored -> diversify into it | Restart from a configuration that contains the under-explored components. Doctor: "totally thinking outside the box." |
Why two rows for "high frequency"? The doctor explicitly described both uses in lecture - same data, opposite philosophy. Choose intensification when you trust the signal (the component is genuinely good); choose diversification when you suspect a bully (the component is dominating just because it appeared early, not because it is best).
Reactive Tabu does not fix the tenure or list size up front - it adapts both based on observed behaviour during the run.
| Observation | Reactive response |
|---|---|
| Solutions repeat / cycling detected | Tenure too short. Lengthen it. |
| Search is over-constrained, no useful moves available | Tenure too long. Shorten it. |
| No improvement for many iterations + tenure already long | Trigger a diversification step (random restart, perturbation, frequency-based restart). |
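The table as a rule of thumb (a sketch; the stagnation threshold is an illustrative choice, and the flags come from whatever cycle/constraint detectors the implementation has):

```python
def adapt_tenure(tenure, cycling, over_constrained, stagnating):
    """Return (new_tenure, trigger_diversification)."""
    if cycling:
        return tenure + 1, False          # tenure too short: lengthen it
    if over_constrained:
        return max(1, tenure - 1), False  # tenure too long: shorten it
    if stagnating and tenure > 5:         # long tenure yet no progress
        return tenure, True               # fire a diversification step
    return tenure, False
```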
Idea: instead of restarting from scratch, perturb the local optimum a little and run local search again. The perturbation should land you in a nearby basin with a (hopefully) better local optimum.
| Perturbation | Behaviour |
|---|---|
| Very weak / zero | Local search falls back into the same basin. ILS reduces to plain HC. No escape. |
| Moderate (keep common, change rest) | Doctor's recommended approach. Exploits accumulated structure while still escaping the current basin. |
| Very strong / random restart | "Complete change must be avoided - you lose all the work invested." Equivalent to random restart, no ILS benefit. |
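A minimal ILS sketch, assuming `perturb` keeps the common structure and changes the rest, and using "keep the better optimum" as the acceptance rule (one common choice):

```python
def iterated_local_search(init, local_search, perturb, quality, iters):
    """Climb, perturb the optimum moderately, climb again, keep the better."""
    best = local_search(init())
    for _ in range(iters):
        candidate = local_search(perturb(best))  # land in a nearby basin, climb it
        if quality(candidate) < quality(best):
            best = candidate
    return best
```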
Idea: the neighbourhood definition itself is variable. Sometimes you search close (exploit), sometimes far (explore). The radius schedule is the design knob.
| Schedule | Behaviour |
|---|---|
| Always close (small fixed radius) | Always exploiting. Stuck at local optima. |
| Always far (large fixed radius) | Always skipping local basins. Never converges. |
| Conservative growing (start small, expand on stagnation) | Classical VNS. Balanced. |
| Aggressive / random radius each iteration | Single algorithm sometimes exploits, sometimes explores. Doctor's preferred trick. |
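The conservative-growing schedule as a sketch (`shake(s, k)` is an assumed hook that perturbs s with magnitude k):

```python
def variable_neighbourhood_search(init, shake, local_search, quality,
                                  k_max, rounds):
    """Start close; expand the radius k on stagnation; snap back to k=1
    on any improvement."""
    s = local_search(init())
    for _ in range(rounds):
        k = 1
        while k <= k_max:
            candidate = local_search(shake(s, k))
            if quality(candidate) < quality(s):
                s, k = candidate, 1       # improvement: exploit close again
            else:
                k += 1                    # stagnation: look farther out
    return s
```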
Idea: change the objective function itself every time you visit a region. Add a penalty so the landscape deforms - the algorithm physically cannot return to the same place because that place no longer looks attractive.
Penalty is applied to features / components, not single solutions. For TSP each edge gets a penalty count; over-frequent edges accumulate penalty; the modified objective is:

g(s) = f(s) + lambda * sum_i p_i * I_i(s)

where p_i is the penalty count for feature i, I_i(s) is 1 if s uses feature i (0 otherwise), and lambda is the penalty strength.
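The deformed objective and a penalty-update rule as a sketch (`features`, `cost` are problem hooks; the utility cost_i / (1 + p_i) is the classic GLS choice, stated here as an assumption rather than lecture material):

```python
def gls_objective(s, f, features, penalties, lam):
    """g(s) = f(s) + lam * sum_i p_i * I_i(s), with I_i implicit in features(s)."""
    return f(s) + lam * sum(penalties[i] for i in features(s))

def penalise(s, features, penalties, cost):
    """At a local optimum, bump the penalty of the highest-utility feature
    so that region stops looking attractive."""
    worst = max(features(s), key=lambda i: cost(i) / (1 + penalties[i]))
    penalties[worst] += 1
```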
Idea: two-phase. (1) Construct a solution by greedy randomised choices. (2) Improve it with local search. Repeat for m iterations and report the average plus confidence interval.
At every construction step, rank all candidate components by quality and pick uniformly from the top P%.
| Top P% | Behaviour | Tag |
|---|---|---|
| P = top 1 (greedy) | Always nearest neighbour. Multi-start gives identical solutions. Stuck on the same local optimum every time. | Pure greedy |
| P = top 10-20% | Mostly greedy with stochasticity. Different restarts explore different paths. | Sweet spot |
| P = 100% (full random) | No greediness. Pure random construction. Equivalent to random restart. | Pure random |
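The construction phase as a sketch (`score` rates a component given the partial solution; names are illustrative):

```python
import random

def grasp_construct(components, score, p):
    """Rank remaining components, pick uniformly from the top fraction p
    (the restricted candidate list). p is the greediness knob."""
    solution, remaining = [], list(components)
    while remaining:
        remaining.sort(key=lambda c: score(c, solution))   # best (lowest) first
        rcl = remaining[:max(1, int(len(remaining) * p))]  # top p fraction
        choice = random.choice(rcl)
        solution.append(choice)
        remaining.remove(choice)
    return solution
```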
Idea: same family as GLS but cruder. Add random noise to the objective function so the bully's true fitness gets blurred. The algorithm sometimes pretends the bully is worse than it is, freeing the search to escape.
Unifying principle of GLS and noising: do not change the algorithm, change the landscape. Same algorithm runs on a different problem instance every iteration.
What is a metaheuristic? A general problem-solving template that explores the search space without a closed-form algorithm. It applies to any problem where a candidate can be evaluated as good or bad. No optimality guarantee - it returns the best found within the time budget.
Why use one? Because brute force on exponential search spaces is infeasible: SAT with 100 variables has 2^100 ≈ 10^30 assignments, and at 1000 evals/s, covering even 1% of them would take far more than 15 billion years.
What does SA become at T = 0? Hill climbing: worse moves are never accepted. Pure exploitation.
And at T = infinity? A random walk: every move is accepted. Pure exploration.
Is any fixed starting temperature universally good? No. T must be calibrated to the typical |delta-Q| of the problem: T much smaller than delta-Q degenerates to HC immediately; T much larger is a random walk for many iterations.
How does cooling speed map to behaviour? The doctor's framing: a slow cooling schedule keeps T high longer, so the algorithm resembles a random walk for more of the run; a fast schedule freezes early and behaves like concentrated HC.
What is the right restart interval? Moderate. Long wastes the budget after each plateau; short never gives any run time to exploit. (Midterm Q1.)
Why is plain binary a weak representation? Weak locality: flipping the high bit of a number can change it by a huge amount, so the algorithm cannot tell whether a single mutation was a small step or a large one. Gray code restores strong locality.
What must the tweak operator guarantee? Reachability: it must be able to reach the optimum from any starting point. If your BPP tweak is "swap two items", you can never reduce the number of bins - you must include "move" for connectivity.
Why is "number of bins" a bad fitness for BPP? Most local moves do not change the count, so the landscape is mostly plateau and HC is blind. Augment with empty-space variance, fill ratio, or max waste - same idea as internal vs external fitness in coevolution.
What should the tabu list store? Move attributes (cheap, intuitive) or feature components / schemata (more powerful for breaking bullies). Storing full solutions is naive - the area covered is negligible relative to the state space.
How long should the tenure be? Too short = cycling, no protection. Too long = blocks even legitimately good moves and the search corners itself. Sweet spot: 5-9 swaps.
What is aspiration? Override the tabu list when the move would give a new best-so-far - the protection valve that keeps effectiveness intact.
How is long-term frequency memory used? High-frequency components: either intensify (re-seed the search around them) or diversify (forbid them as bullies). Low-frequency: diversify (force them in to explore). Same data, opposite use depending on what the search needs.
How strong should the ILS perturbation be? Too weak - it falls back into the same basin. Too strong - equivalent to a random restart, losing the structure you just earned. Moderate = keep common, change rest.
What varies in VNS? The neighbourhood definition / radius. The schedule (fixed, growing, random) is the design choice.
Where does GLS intervene? Inside the objective function itself, via penalty terms on features. Best-so-far is tracked separately so the deformed landscape does not corrupt the recorded best.
What does GRASP's P control? The greediness knob. P = 1 is pure greedy (multi-start gives identical solutions); P = 100% is pure random; P around 10-20% is the sweet spot.
Why report statistics? Random selection means each run gives a different result. Report the mean plus a 95% confidence interval, not a single number.
What does memory buy tabu search? It breaks the bully's gravitational pull: a single-trajectory algorithm with memory can achieve diversity that previously required a population.
The pre-GA module is built on three engineer-controlled levers: representation (controls landscape shape and operator connectivity), tweak operator (controls move size and locality), and fitness function (controls sensitivity to small moves). Every algorithm here is a different way to dial exploration vs exploitation on top of these three knobs.

Hill climbing is the s-metaheuristic skeleton; its variants 1+1, 1,1, 1+lambda, 1,lambda differ only in how candidates are generated and who survives. Simulated annealing adds a probabilistic acceptance of worse moves, governed by temperature T - at T=0 SA reduces to HC, at T=infinity to random walk; the cooling schedule slides between them. Tabu search adds explicit memory so the bully cannot trap the search in a basin; tenure and list contents are the design knobs, with aspiration as the safety valve.

When HC variants get stuck, ILS perturbs the local optimum to land in a nearby basin, VNS expands the neighbourhood radius, GLS reshapes the objective itself with feature penalties, GRASP rebuilds solutions from scratch with a controlled greediness knob, and noising blurs the objective with random noise. Across all of them, statistics matter: a single run is non-deterministic, so report mean plus confidence interval over multiple runs, and design fitness functions sensitive enough to give the algorithm a usable gradient.