Differential Evolution: Exam Concepts

Transcript-first notes for DE: adaptive mutation, parent-child competition, alpha extremes, BPP adaptation, HW#5 observations, and plot behavior.

Core exam idea: DE is adaptive mutation for continuous vector spaces. It builds a mutant using d = a + alpha*(b - c). The population's spread controls the move size: scattered population means large exploratory moves; converged population means small exploitative moves.

1. The One-Line Model

DE = vector mutation whose step size is controlled by population spread

For each target parent Q_i, DE chooses three other individuals a, b, and c, builds a mutant, crosses it with the parent, then keeps whichever is better.

Unlike GA, DE is not mainly "generation replaces generation." It is closer to the same parent adapting its location: each parent gets one trial child, and the better one survives.

mutant   d = a + alpha * (b - c)
trial    u = crossover(d, Q_i)
survivor = better(u, Q_i)
| Symbol | Meaning | Exam interpretation |
|---|---|---|
| Q_i | The target parent being updated. | The owner of the child; it competes at the end. |
| a | Base vector / starting point. | The area DE is exploiting around. |
| b - c | Difference vector from two other individuals. | Direction and distance of the move. |
| alpha | Mutation factor. | Scales exploration/exploitation. |
| Crossover | Mix mutant with parent. | Preserves some parent characteristics. |
| Selection | Trial vs parent duel. | Elitist local replacement. |
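The whole loop can be sketched in a few lines. A minimal DE/rand/1/bin step in Python (minimization; `alpha`, `cr`, and the sphere function are illustrative choices, not values from the course):

```python
import random

def de_step(pop, fitness, alpha=0.5, cr=0.9):
    """One DE generation (DE/rand/1/bin sketch, minimization):
    every parent gets one trial child and keeps its slot unless the
    trial is at least as good."""
    n, dim = len(pop), len(pop[0])
    new_pop = []
    for i, parent in enumerate(pop):
        # three distinct helpers, none of them the target parent
        a, b, c = random.sample([j for j in range(n) if j != i], 3)
        # mutant d = a + alpha * (b - c): step size comes from population spread
        d = [pop[a][k] + alpha * (pop[b][k] - pop[c][k]) for k in range(dim)]
        # binomial crossover: jrand forces at least one mutant dimension
        jrand = random.randrange(dim)
        trial = [d[k] if (random.random() < cr or k == jrand) else parent[k]
                 for k in range(dim)]
        # final duel: a bad trial never replaces a good parent
        new_pop.append(trial if fitness(trial) <= fitness(parent) else parent)
    return new_pop

# usage: minimize the sphere function in 3-D
random.seed(42)
sphere = lambda v: sum(x * x for x in v)
pop = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
for _ in range(100):
    pop = de_step(pop, sphere)
best = min(pop, key=sphere)
```

Note the replacement is per slot: the population best can never get worse, which is exactly the local elitism the selection row describes.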

2. Why DE Self-Adapts

Early run: exploration. The population is scattered. Randomly selected b and c are likely far apart, so b - c is large. Mutants can jump across regions.

large spread -> large |b - c| -> large mutation

Late run: exploitation. The population is clustered. Randomly selected b and c are close, so b - c is small. Mutants fine-tune locally.

small spread -> small |b - c| -> small mutation
Exam answer: DE does not need an SA-style cooling schedule because the population itself controls the mutation size. A spread population gives exploration; a converged population gives exploitation. If the spread becomes zero, mutation becomes zero and the algorithm freezes.
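The self-adaptation is easy to see numerically. A sketch that samples the average mutation magnitude alpha*|b - c| from a scattered versus a clustered population (both populations are hypothetical, chosen only to illustrate the effect):

```python
import random
import statistics

def mean_step(pop, alpha=0.5, trials=2000):
    """Average mutation magnitude alpha*|b - c| for random pairs (b, c).
    DE never sets this size explicitly; it falls out of the spread."""
    dim = len(pop[0])
    sizes = []
    for _ in range(trials):
        b, c = random.sample(pop, 2)
        sizes.append(alpha * sum((b[k] - c[k]) ** 2 for k in range(dim)) ** 0.5)
    return statistics.mean(sizes)

random.seed(1)
# early run: individuals scattered over [-10, 10]^2
scattered = [[random.uniform(-10, 10) for _ in range(2)] for _ in range(30)]
# late run: individuals clustered tightly around one point
clustered = [[random.gauss(3.0, 0.05) for _ in range(2)] for _ in range(30)]
early, late = mean_step(scattered), mean_step(clustered)
# early is large (exploratory jumps), late is tiny (local fine-tuning)
```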

3. Why Parent Competes at the End

The doctor spent time on this because it is easy to misunderstand. The parent is "competing" in DE, but not at the beginning of mutant construction.

| Stage | What happens | Why it matters |
|---|---|---|
| Mutant construction | Use a, b, c from other individuals. | Gives freedom to cross areas of the space instead of staying centered on the parent. |
| Crossover | Mix d with a copy of Q_i. | The child keeps some parent traits. |
| Final duel | Compare the trial child against Q_i. | If the child is worse, the parent survives. Good information is not lost. |
If asked "why not use the parent as a?": If a = Q_i, the mutant is built around the parent, then crossed with the parent, then compared with the parent. That over-centers the search on the parent and becomes closer to hill climbing. The standard version keeps the parent out of the base vector to preserve exploration.

4. Difference-Vector Strength Extremes

Likely question shape: "What happens if the difference-vector influence disappears? What if the differential move is very large? What if the population has already converged?"
| Exam wording | Mutation meaning | Behavior | Explore / Exploit |
|---|---|---|---|
| No difference-vector influence (alpha = 0) | d = a | The difference direction disappears. The trial comes only from crossover between a and the parent. | Weak movement |
| Small difference-vector influence (alpha small but not zero) | A tiny step in the direction of b - c. | DE still uses population information, but the move is cautious and close to the base vector. | Local / conservative movement |
| Moderate difference-vector influence (alpha around 0.5) | Half of the difference vector. | Moderate movement. This is often a balanced region. | Balanced |
| Full difference-vector influence (alpha around 1) | Full difference vector. | More aggressive movement within the common range. | More exploration |
| Very strong difference-vector influence (alpha very large) | Huge scaled difference. | Mutants jump far away, may leave useful regions or require repair/clamping. | Excessive exploration |
| No population spread / all individuals similar | b - c = 0 | No differential movement. DE is frozen unless diversity is injected. | Stagnation |
| Small population spread / individuals nearly similar | b - c is small. | The adaptive step automatically shrinks. DE becomes more local even if the scaling factor is not small. | Late-stage exploitation |
Exam wording map: "difference between two individuals" means the DE direction b-c; "scale of that difference" means alpha; "small scaling" means cautious movement even if the population is diverse; "small population spread" means b-c is tiny because individuals are already similar.
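The alpha extremes can be checked with one fixed triple. A worked sketch (the three 2-D vectors are hypothetical, chosen only to make the arithmetic visible):

```python
# how alpha scales the differential move for one fixed (a, b, c)
a, b, c = [2.0, 2.0], [5.0, 1.0], [1.0, 3.0]
diff = [b[k] - c[k] for k in range(2)]                # b - c = [4.0, -2.0]

steps = {}
for alpha in (0.0, 0.1, 0.5, 1.0, 5.0):
    d = [a[k] + alpha * diff[k] for k in range(2)]    # mutant
    # distance the mutant moved away from the base vector a
    steps[alpha] = sum((d[k] - a[k]) ** 2 for k in range(2)) ** 0.5

# steps[0.0] is exactly 0: the differential move vanishes and d = a
# the step grows linearly with alpha; alpha = 5 jumps about 22 units
```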

5. Choosing a, b, and c

The default algorithm chooses the three helper individuals randomly. The transcript makes clear that this is a design lever, not a sacred rule.

| Choice rule | Behavior | Doctor-style explanation |
|---|---|---|
| Pick randomly from the full population. | Most exploration | Any area can influence the next move. |
| Pick a by roulette/rank fitness. | Biased but not greedy | Good individuals are favored, but weaker ones still have a chance. |
| Pick from top-P% then random, GRASP-style. | Controlled exploitation | Greedy because we restrict to good individuals; random because we still sample within them. |
| Pick only from the best few. | Strong exploitation | Fast convergence but high premature-convergence risk. Top 3% is almost hill climbing around elites. |
Exam sentence: Selecting a from better individuals exploits better regions. Selecting b and c from diverse individuals preserves large useful directions. If all three are chosen only from elites, DE may converge faster but lose exploration.
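The three rows from "random" to "elite" can be expressed as one selection function. A sketch for choosing the base vector a (minimization; the strategy names and `top_frac` are illustrative, not standard DE parameters):

```python
import random

def pick_base(pop, fitness, strategy="random", top_frac=0.2):
    """Choose the base vector a at different exploitation levels."""
    ranked = sorted(pop, key=fitness)          # best first (minimization)
    if strategy == "random":                   # most exploration
        return random.choice(pop)
    if strategy == "grasp":                    # top-P%, then random within it
        cut = max(1, int(len(pop) * top_frac))
        return random.choice(ranked[:cut])
    if strategy == "elite":                    # best few only: strong exploitation
        return random.choice(ranked[:3])
    raise ValueError(strategy)

# usage on a toy 1-D population where smaller values are fitter
pop = [[float(i)] for i in range(10)]
f = lambda v: v[0] ** 2
base = pick_base(pop, f, strategy="grasp")     # always one of the top 20%
```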

6. Crossover Is Not Decoration

After the mutant d is built, DE does crossover with the parent. This matters because the mutant may be useful in some dimensions and bad in others.

| Without crossover | With crossover |
|---|---|
| The child is just d. It may lose the parent's useful structure entirely. | The child can keep some parent dimensions and take some mutant dimensions. |
| More like pure mutation from a. | Communication between the parent and the population-generated direction. |
| Less continuity. | Parent traits survive even before the final duel. |
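The standard per-dimension mix is binomial crossover. A minimal sketch that isolates the "keep some parent dimensions" point (the jrand trick is the usual guard; the vectors are dummies):

```python
import random

def binomial_crossover(mutant, parent, cr=0.9):
    """Mix mutant d with parent Q_i dimension by dimension.
    jrand forces at least one mutant dimension, so the trial
    is never an exact copy of the parent."""
    dim = len(parent)
    jrand = random.randrange(dim)
    return [mutant[k] if (random.random() < cr or k == jrand) else parent[k]
            for k in range(dim)]

trial = binomial_crossover([9.0, 9.0, 9.0], [1.0, 1.0, 1.0], cr=0.5)
# every entry is 9.0 (mutant) or 1.0 (parent); at least one is 9.0
```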

7. DE on Bin Packing

Trap: DE is designed for real-valued vectors. BPP is combinatorial. Do not say "subtract two packings" unless you immediately define what "difference" means.
Design lens: For BPP, the algorithm skeleton is not the hard part. Your job is representation, objective/fitness, initialization, and operator interpretation. Visualize what "move in this direction" means for bins before defining the difference operator.

Method A: Random Keys

Each item gets a real key. Sort keys to get an item order. Decode the order with First Fit or First Fit Decreasing.

real vector -> sort items -> permutation -> First Fit -> bins

Strength: easy, valid DE arithmetic.

Weakness: small key changes can cause large order changes, and many different key vectors decode to the same packing.
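A minimal random-key decoder, assuming `sizes` and `capacity` describe the instance (First Fit for brevity; First Fit Decreasing would sort by size instead of by key):

```python
def decode_random_keys(keys, sizes, capacity):
    """Random-key decoding for BPP: sort items by their real key,
    then pack in that order with First Fit."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    free = []        # remaining capacity per open bin
    packing = []     # list of bins, each a list of item indices
    for i in order:
        for j in range(len(free)):
            if sizes[i] <= free[j]:      # first bin that still fits
                free[j] -= sizes[i]
                packing[j].append(i)
                break
        else:                            # no open bin fits: open a new one
            free.append(capacity - sizes[i])
            packing.append([i])
    return packing

keys = [0.7, 0.1, 0.9, 0.3]              # one real key per item (DE moves these)
packing = decode_random_keys(keys, sizes=[4, 3, 5, 2], capacity=10)
# the optimizer only ever changes keys; the bins come from sort + decode
```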

Method B: Discrete Difference

Define difference over item-pair co-location, bin groupings, or moves. Then apply a fraction of those features to a base packing and repair if needed.

B - C = the features/moves that differ
A + alpha*(B - C) = apply some of those features to A

Strength: closer to the real BPP structure.

Weakness: must handle duplicates, missing items, and capacity repair.

Discrete DE Example in Words

Suppose two good packings repeatedly place items 2 and 5 together. That co-location is a schema. A discrete DE operator can preserve or transplant that item-pair schema into another base packing.

| Continuous DE idea | BPP adaptation |
|---|---|
| b - c | Features where two packings differ: item pairs, bin groupings, item moves. |
| alpha | How many of those differences/features to apply. |
| a + ... | Start from base packing a, then insert selected features. |
| Crossover | Mix the trial packing with the target parent, then repair duplicates, missing items, and capacity violations. |
Exam-safe answer: In BPP, I can use random keys to keep DE continuous, or I can define a discrete difference over item groupings. Direct subtraction is meaningless until I define the representation and the difference operator.
Raw-Otter wording to remember: Do not lock your head into arithmetic subtraction and addition. "Difference" can mean swap, missing item, duplicated item, common item group, or schema, as long as you define it consistently and repair the solution.
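One concrete, illustrative way to define that discrete difference: treat item-pair co-location as the feature set. This is a sketch of the B - C and alpha steps only; applying the features to a base packing and repairing capacity is left out:

```python
import random
from itertools import combinations

def colocated_pairs(packing):
    """The 'schema' features of a packing: every item pair sharing a bin."""
    return {pair for b in packing for pair in combinations(sorted(b), 2)}

def discrete_difference(B, C, alpha=0.5):
    """B - C as the pair-features present in B but not in C;
    alpha chooses what fraction of them to apply."""
    diff = sorted(colocated_pairs(B) - colocated_pairs(C))
    k = round(alpha * len(diff))
    return random.sample(diff, k)

B = [[2, 5, 7], [1, 3]]     # items 2 and 5 share a bin here...
C = [[2, 1], [5, 3, 7]]     # ...but not here
features = discrete_difference(B, C, alpha=1.0)
# transplanting these pairs into a base packing (plus repair) is Method B
```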

8. Plot Behavior

| Plot signature | Likely DE diagnosis | Reason |
|---|---|---|
| Large noisy current-best changes early. | Population is spread out, or alpha is high. | b - c is large, so candidate moves are exploratory. |
| Smoother improvement later. | Population converged into a promising region. | b - c shrinks, so DE fine-tunes. |
| Best-so-far flat for a long time. | No successful trial beats its parent. | Could mean alpha too high, alpha too low, or population already collapsed. |
| Population best cannot get worse after replacement. | Elitist parent-child selection is working. | If the trial is worse, the parent remains. |
| Premature plateau after selecting helpers only from elites. | Too much exploitation. | All moves are built around the same elite region. |

9. Population Size vs Iterations

For DE, population size and number of iterations affect different things.

| Setup | Behavior | Risk |
|---|---|---|
| Large population, few iterations | Good initial coverage and large b - c vectors. | Not enough time for successful trials to refine the population. |
| Small population, many iterations | More repeated refinement around found regions. | Population can collapse early; then b - c becomes tiny and DE stagnates. |
| Balanced | Enough spread for useful directions, enough iterations for refinement. | Usually safest unless the problem strongly demands exploration or exploitation. |
Exam sentence: DE needs diversity because its mutation direction and size come from population differences. If the population loses diversity too soon, the mutation operator loses power.

10. HW#5 Story

Your HW#5 answer should not be "DE was good" or "PSO was good." The important observation is that the encoding dominated both algorithms.

HW-style answer: In HW#5, DE did not clearly separate itself from PSO because both were operating through random keys. The optimizer changed real vectors, but the actual packing came from sorting and decoding. Therefore the bottleneck was representation/encoding, not only DE's mutation rule.

11. Common Exam Prompts

Q: What is DE in one sentence?

DE is adaptive mutation in real-valued vector space: d = a + alpha*(b-c), then crossover with the parent, then child-parent competition.

Q: Why is DE adaptive?

The mutation step comes from b-c. If the population is spread out, the step is large. If the population is clustered, the step is small.

Q: What happens if the difference-vector influence disappears?

The difference vector disappears. The mutant becomes the base vector a, so movement is weak and less directed by population spread.

Q: What happens if the difference-vector influence is huge?

The differential term dominates. Mutants jump too far and behave like excessive exploration or near-random search.

Q: Why does the parent compete at the end?

So a bad trial does not replace a good parent. The parent still contributes through crossover, but it is not used as the base vector because that would over-exploit the parent region.

Q: How do you apply DE to BPP?

Use random keys and decode with First Fit, or define a discrete difference over item-pair schemas/moves and repair infeasible packings.

Q: What was the HW#5 observation?

DE and PSO were close because random-key encoding dominated. The representation limited both methods more than the optimizer separated them.

Q: Does DE use elitism?

Yes, locally. Each trial must beat its immediate parent; otherwise the parent survives.

Q: What if the population has no diversity?

If individuals are identical or nearly identical, b-c becomes zero or tiny. DE loses its mutation power and stagnates.

Q: Is random-key DE native to BPP?

No. It is a convenient encoding trick. A more native BPP adaptation defines difference over item groupings, schemas, or moves.

12. Final One-Paragraph Answer

Differential Evolution is a population method for real-valued vectors. For each parent, it builds a mutant d = a + alpha*(b-c), crosses it with the parent, and keeps the better of the trial and parent. The key idea is adaptive mutation: when the population is scattered, b-c is large and DE explores; when the population converges, b-c is small and DE exploits. For bin packing, direct subtraction is meaningless, so we either use random-key encoding or define a custom discrete difference over item groupings and repair the result. In HW#5, DE and PSO were close because both were limited by random-key encoding.
