“Similar fonts” is subjective. What makes Inter similar to Helvetica? Why is Lato a good alternative to Proxima Nova?
I needed an algorithm that produces consistent, explainable scores. Not AI vibes. Deterministic math.
The problem with AI matching
I considered using embeddings or ML models for font similarity. Problems:
- Black box: Why did the model say these fonts are similar? No explanation.
- Inconsistent: Run twice, get different results (temperature, random seeds).
- Expensive: API calls for every font pair add up.
- Overkill: Font similarity isn’t that complex.
Font matching isn’t like image recognition. There are a finite number of measurable characteristics. I can enumerate them.
The five dimensions
After researching typography and testing with designers, I settled on five dimensions:
- Classification match: Same category (sans-serif, serif, mono)?
- Proportions: Similar x-height, width, weight distribution?
- Stroke characteristics: Similar contrast, terminals, curves?
- Intended use: Same use cases (UI, editorial, display)?
- Personality: Similar feel (geometric, humanist, neutral)?
Each dimension contributes to the final score.
The scoring algorithm
Each dimension scores 0-100. The final score is a weighted average:
```typescript
interface FontMetrics {
  classification: 'sans-serif' | 'serif' | 'display' | 'mono';
  proportions: {
    xHeight: number; // 0-100 scale
    width: 'condensed' | 'normal' | 'extended';
    weightRange: [number, number]; // e.g., [300, 700]
  };
  stroke: {
    contrast: 'low' | 'medium' | 'high';
    terminals: 'flat' | 'rounded' | 'pointed';
    curves: 'geometric' | 'humanist' | 'mixed';
  };
  useCases: string[]; // ['ui', 'editorial', 'headlines']
  personality: string[]; // ['neutral', 'friendly', 'professional']
}
```
```typescript
function calculateSimilarity(font1: FontMetrics, font2: FontMetrics): number {
  const scores = {
    classification: scoreClassification(font1, font2),
    proportions: scoreProportions(font1, font2),
    stroke: scoreStroke(font1, font2),
    useCases: scoreUseCases(font1, font2),
    personality: scorePersonality(font1, font2),
  };

  // Weighted average
  const weights = {
    classification: 25,
    proportions: 25,
    stroke: 20,
    useCases: 15,
    personality: 15,
  };

  let total = 0;
  let weightSum = 0;
  for (const key of Object.keys(weights) as (keyof typeof weights)[]) {
    total += scores[key] * weights[key];
    weightSum += weights[key];
  }

  return Math.round(total / weightSum);
}
```
Scoring each dimension
Classification (25%)
Same classification = 100, related = 50, different = 0.
```typescript
function scoreClassification(f1: FontMetrics, f2: FontMetrics): number {
  if (f1.classification === f2.classification) return 100;

  // Related classifications
  const related: Record<string, string[]> = {
    'sans-serif': ['display'],
    'serif': ['display'],
    'display': ['sans-serif', 'serif'],
    'mono': [], // Mono is distinct
  };

  if (related[f1.classification]?.includes(f2.classification)) return 50;
  return 0;
}
```
Proportions (25%)
Compare x-height, width, and weight range:
```typescript
function scoreProportions(f1: FontMetrics, f2: FontMetrics): number {
  // X-height difference (0-100 scale)
  const xHeightDiff = Math.abs(f1.proportions.xHeight - f2.proportions.xHeight);
  const xHeightScore = Math.max(0, 100 - xHeightDiff * 2);

  // Width match
  const widthScore = f1.proportions.width === f2.proportions.width ? 100 : 50;

  // Weight range overlap
  const [min1, max1] = f1.proportions.weightRange;
  const [min2, max2] = f2.proportions.weightRange;
  const overlapStart = Math.max(min1, min2);
  const overlapEnd = Math.min(max1, max2);
  const overlap = Math.max(0, overlapEnd - overlapStart);
  const totalRange = Math.max(max1, max2) - Math.min(min1, min2);
  // Guard against division by zero: two fonts shipping the same single weight
  // (e.g. both [400, 400]) are a full weight match, not NaN
  const weightScore = totalRange > 0 ? (overlap / totalRange) * 100 : 100;

  return (xHeightScore + widthScore + weightScore) / 3;
}
```
Stroke (20%)
Compare contrast, terminals, and curve style:
```typescript
function scoreStroke(f1: FontMetrics, f2: FontMetrics): number {
  let score = 0;

  // Contrast match (partial credit for adjacent levels)
  if (f1.stroke.contrast === f2.stroke.contrast) score += 33;
  else if (areAdjacent(f1.stroke.contrast, f2.stroke.contrast)) score += 17;

  // Terminal match
  if (f1.stroke.terminals === f2.stroke.terminals) score += 33;

  // Curve style match ('mixed' gets partial credit against anything)
  if (f1.stroke.curves === f2.stroke.curves) score += 34;
  else if (f1.stroke.curves === 'mixed' || f2.stroke.curves === 'mixed') score += 17;

  return score;
}

type Contrast = FontMetrics['stroke']['contrast'];

function areAdjacent(a: Contrast, b: Contrast): boolean {
  const order: Contrast[] = ['low', 'medium', 'high'];
  return Math.abs(order.indexOf(a) - order.indexOf(b)) === 1;
}
```
Use cases (15%)
Jaccard similarity of use case arrays:
```typescript
function scoreUseCases(f1: FontMetrics, f2: FontMetrics): number {
  const set1 = new Set(f1.useCases);
  const set2 = new Set(f2.useCases);
  const intersection = [...set1].filter(x => set2.has(x)).length;
  const union = new Set([...set1, ...set2]).size;
  return union > 0 ? (intersection / union) * 100 : 0;
}
```
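As a worked example: fonts tagged `['ui', 'editorial']` and `['ui', 'headlines']` share one tag out of three unique tags, so they score one third of 100. A self-contained sketch of the same computation (the tags are illustrative):

```typescript
// Jaccard similarity on tag arrays, as in scoreUseCases:
// |intersection| / |union|, scaled to 0-100
function jaccard(a: string[], b: string[]): number {
  const set1 = new Set(a);
  const set2 = new Set(b);
  const intersection = [...set1].filter(x => set2.has(x)).length;
  const union = new Set([...set1, ...set2]).size;
  return union > 0 ? (intersection / union) * 100 : 0;
}

// 1 shared tag / 3 unique tags ≈ 33.3
const score = jaccard(['ui', 'editorial'], ['ui', 'headlines']);
```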
Personality (15%)
Same approach as use cases:
```typescript
function scorePersonality(f1: FontMetrics, f2: FontMetrics): number {
  const set1 = new Set(f1.personality);
  const set2 = new Set(f2.personality);
  const intersection = [...set1].filter(x => set2.has(x)).length;
  const union = new Set([...set1, ...set2]).size;
  return union > 0 ? (intersection / union) * 100 : 0;
}
```
Why these weights?
Classification and proportions matter most for visual replacement. A serif can’t replace a sans-serif, no matter how similar the other metrics.
Stroke characteristics matter for trained eyes but less for casual use.
Use cases and personality are soft factors. They influence recommendation order but shouldn’t disqualify otherwise good matches.
The weights came from testing. I showed font pairs to designers and adjusted until the algorithm matched human intuition.
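The tuning loop was manual, but the underlying idea can be written down as an error metric: how far the weighted score lands from what designers said. A minimal sketch, where `RatedPair`, its fields, and the 0-100 designer ratings are hypothetical illustrations, not part of the real pipeline:

```typescript
// One font pair: its per-dimension scores plus a designer's 0-100 rating
interface RatedPair {
  dimensionScores: Record<string, number>;
  designerScore: number;
}

// Weighted average of dimension scores under a candidate weight set
function weightedScore(scores: Record<string, number>, weights: Record<string, number>): number {
  let total = 0;
  let weightSum = 0;
  for (const [dim, w] of Object.entries(weights)) {
    total += (scores[dim] ?? 0) * w;
    weightSum += w;
  }
  return total / weightSum;
}

// Mean absolute error: lower means the weights agree better with designers
function meanAbsError(pairs: RatedPair[], weights: Record<string, number>): number {
  const errs = pairs.map(p => Math.abs(weightedScore(p.dimensionScores, weights) - p.designerScore));
  return errs.reduce((a, b) => a + b, 0) / pairs.length;
}
```

Comparing `meanAbsError` across a few candidate weight sets is the formalized version of "adjust until it matches intuition".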
The frontmatter format
Each premium font lists alternatives with similarity scores:
```yaml
alternatives:
  - slug: inter
    similarity: 85
    notes: "Similar proportions, slightly taller x-height"
  - slug: open-sans
    similarity: 72
    notes: "More humanist, different terminals"
  - slug: source-sans-pro
    similarity: 68
    notes: "Narrower, more neutral feel"
```
The scores are pre-computed. No runtime calculation.
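At build time, candidates can be scored once, filtered, and sorted before they ever reach the frontmatter. A sketch of that step, assuming the similarity scores are already computed; the 60-point cutoff and the `arimo` entry are illustrative assumptions, not the site's actual rules:

```typescript
interface Alternative {
  slug: string;
  similarity: number; // precomputed via calculateSimilarity
  notes: string;
}

// Hypothetical candidate pool for one premium font
const candidates: Alternative[] = [
  { slug: 'open-sans', similarity: 72, notes: 'More humanist, different terminals' },
  { slug: 'inter', similarity: 85, notes: 'Similar proportions, slightly taller x-height' },
  { slug: 'arimo', similarity: 58, notes: 'Too far off to recommend' },
  { slug: 'source-sans-pro', similarity: 68, notes: 'Narrower, more neutral feel' },
];

// Keep reasonable matches, best first; this list is what gets written to frontmatter
const alternatives = candidates
  .filter(c => c.similarity >= 60)
  .sort((a, b) => b.similarity - a.similarity);
```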
Deterministic = debuggable
When a user questions a score, I can explain it:
“Proxima Nova and Lato have 80% similarity because:
- Same classification (sans-serif): 100%
- Similar proportions: 82%
- Different stroke (geometric vs humanist): 55%
- Overlapping use cases: 75%
- Similar personality: 80%
Weighted average: 80%”
Try explaining why GPT-4 thinks two fonts are similar.
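Because every dimension score and weight is a plain number, the breakdown itself can be generated rather than written by hand. A sketch of such an explainer; the per-dimension scores passed in at the bottom are illustrative, not measured values:

```typescript
const weights: Record<string, number> = {
  classification: 25,
  proportions: 25,
  stroke: 20,
  useCases: 15,
  personality: 15,
};

// Returns the rounded weighted average plus one breakdown line per dimension
function explain(scores: Record<string, number>): { total: number; lines: string[] } {
  let total = 0;
  let weightSum = 0;
  const lines: string[] = [];
  for (const [dim, w] of Object.entries(weights)) {
    total += scores[dim] * w;
    weightSum += w;
    lines.push(`- ${dim}: ${scores[dim]}% (weight ${w})`);
  }
  return { total: Math.round(total / weightSum), lines };
}

// Illustrative scores: (100·25 + 80·25 + 50·20 + 60·15 + 60·15) / 100 = 73
const { total } = explain({
  classification: 100, proportions: 80, stroke: 50, useCases: 60, personality: 60,
});
```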
Edge cases
The algorithm handles some edge cases:
Display fonts: Wide variety of styles. I score within sub-categories (geometric display, script display, etc.).
Variable fonts: Weight range matters more. A variable font with 100-900 range scores higher against multi-weight families.
Mono fonts: Classification dominates. A bad monospace is better than a good proportional font for code.
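The variable-font point falls straight out of the weight-range overlap term in scoreProportions. Re-stated in isolation, with a guard for identical single-weight ranges (the specific ranges below are illustrative):

```typescript
// Weight-range overlap, scaled to 0-100: shared span / combined span
function weightOverlapScore(r1: [number, number], r2: [number, number]): number {
  const overlap = Math.max(0, Math.min(r1[1], r2[1]) - Math.max(r1[0], r2[0]));
  const totalRange = Math.max(r1[1], r2[1]) - Math.min(r1[0], r2[0]);
  return totalRange > 0 ? (overlap / totalRange) * 100 : 100;
}

// A 100-900 variable font covers a 300-700 family's whole range: scores 50
const variable = weightOverlapScore([100, 900], [300, 700]);
// A single 400 weight against the same family has zero overlap span: scores 0
const single = weightOverlapScore([400, 400], [300, 700]);
```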
What the algorithm doesn’t capture
Some things are hard to quantify:
- Optical adjustments: How the font looks at specific sizes
- Cultural associations: Some fonts feel “tech”, others “editorial”
- Trend alignment: What’s currently popular
- Rendering quality: How the font performs on different screens
I capture these through manual curation in the notes field.
Tradeoffs
What I gained:
- Consistent, reproducible scores
- Explainable recommendations
- Fast computation (no API calls)
- Works offline
What I lost:
- Nuance of expert judgment
- Adaptation to emerging preferences
- Discovery of unexpected similarities
Future improvements:
- Weight tuning based on user feedback
- Dimension expansion (add optical size handling)
- A/B testing different weight configurations
The result
Users see similarity percentages they can trust. An 85% match is reliably a good substitute. A 60% match works in a pinch.
No AI magic. Just math that matches human intuition.
See the full methodology for how these scores are displayed on the site.