How to Score Font Similarity Without AI: A Deterministic Algorithm

Why I built a deterministic, five-dimension scoring algorithm for font matching instead of using AI, and how it produces consistent, explainable similarity scores.

Mladen Ruzicic
6 min

“Similar fonts” is subjective. What makes Inter similar to Helvetica? Why is Lato a good alternative to Proxima Nova?

I needed an algorithm that produces consistent, explainable scores. Not AI vibes. Deterministic math.

The problem with AI matching

I considered using embeddings or ML models for font similarity. Problems:

  1. Black box: Why did the model say these fonts are similar? No explanation.
  2. Inconsistent: Generative models can return different results on repeated runs (sampling temperature, random seeds).
  3. Expensive: API calls for every font pair add up.
  4. Overkill: Font similarity isn’t that complex.

Font matching isn’t like image recognition. There are a finite number of measurable characteristics. I can enumerate them.

The five dimensions

After researching typography and testing with designers, I settled on five dimensions:

  1. Classification match: Same category (sans-serif, serif, display, mono)?
  2. Proportions: Similar x-height, width, weight distribution?
  3. Stroke characteristics: Similar contrast, terminals, curves?
  4. Intended use: Same use cases (UI, editorial, display)?
  5. Personality: Similar feel (geometric, humanist, neutral)?

Each dimension contributes to the final score.

The scoring algorithm

Each dimension scores 0-100. The final score is a weighted average:

interface FontMetrics {
  classification: 'sans-serif' | 'serif' | 'display' | 'mono';
  proportions: {
    xHeight: number;      // 0-100 scale
    width: 'condensed' | 'normal' | 'extended';
    weightRange: [number, number];  // e.g., [300, 700]
  };
  stroke: {
    contrast: 'low' | 'medium' | 'high';
    terminals: 'flat' | 'rounded' | 'pointed';
    curves: 'geometric' | 'humanist' | 'mixed';
  };
  useCases: string[];     // ['ui', 'editorial', 'headlines']
  personality: string[];  // ['neutral', 'friendly', 'professional']
}

function calculateSimilarity(font1: FontMetrics, font2: FontMetrics): number {
  const scores = {
    classification: scoreClassification(font1, font2),
    proportions: scoreProportions(font1, font2),
    stroke: scoreStroke(font1, font2),
    useCases: scoreUseCases(font1, font2),
    personality: scorePersonality(font1, font2),
  };

  // Weighted average
  const weights = {
    classification: 25,
    proportions: 25,
    stroke: 20,
    useCases: 15,
    personality: 15,
  };

  let total = 0;
  let weightSum = 0;

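  // Iterate over typed keys so scores[key] and weights[key] type-check in strict mode
  for (const key of Object.keys(weights) as Array<keyof typeof weights>) {
    total += scores[key] * weights[key];
    weightSum += weights[key];
  }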

  return Math.round(total / weightSum);
}

Scoring each dimension

Classification (25%)

Same classification = 100, related = 50, different = 0.

function scoreClassification(f1: FontMetrics, f2: FontMetrics): number {
  if (f1.classification === f2.classification) return 100;

  // Related classifications
  const related: Record<string, string[]> = {
    'sans-serif': ['display'],
    'serif': ['display'],
    'display': ['sans-serif', 'serif'],
    'mono': [],  // Mono is distinct
  };

  if (related[f1.classification]?.includes(f2.classification)) return 50;
  return 0;
}

Proportions (25%)

Compare x-height, width, and weight range:

function scoreProportions(f1: FontMetrics, f2: FontMetrics): number {
  // X-height difference (0-100 scale)
  const xHeightDiff = Math.abs(f1.proportions.xHeight - f2.proportions.xHeight);
  const xHeightScore = Math.max(0, 100 - xHeightDiff * 2);

  // Width match
  const widthScore = f1.proportions.width === f2.proportions.width ? 100 : 50;

  // Weight range overlap
  const [min1, max1] = f1.proportions.weightRange;
  const [min2, max2] = f2.proportions.weightRange;
  const overlapStart = Math.max(min1, min2);
  const overlapEnd = Math.min(max1, max2);
  const overlap = Math.max(0, overlapEnd - overlapStart);
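  const totalRange = Math.max(max1, max2) - Math.min(min1, min2);
  // Two single-weight fonts at the same weight give a zero-width span;
  // treat that as a full match instead of dividing 0 by 0 (NaN)
  const weightScore = totalRange > 0 ? (overlap / totalRange) * 100 : 100;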

  return (xHeightScore + widthScore + weightScore) / 3;
}
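To make the weight-range term concrete, here is a minimal sketch that isolates it. The helper name `weightRangeScore` and the sample ranges are assumptions for illustration. The formula is the same overlap-over-combined-span ratio used in scoreProportions, with the zero-width span treated as a full match (also an assumption).

```typescript
// Hypothetical helper: isolates the weight-range term of scoreProportions.
type WeightRange = [number, number];

function weightRangeScore([min1, max1]: WeightRange, [min2, max2]: WeightRange): number {
  // Width of the region both ranges share (0 if they do not overlap)
  const overlap = Math.max(0, Math.min(max1, max2) - Math.max(min1, min2));
  // Width of the region covered by either range
  const span = Math.max(max1, max2) - Math.min(min1, min2);
  // Assumption: two identical single-weight fonts (span of 0) count as a full match
  return span > 0 ? (overlap / span) * 100 : 100;
}

console.log(weightRangeScore([300, 700], [300, 700])); // 100
console.log(weightRangeScore([300, 700], [100, 900])); // 50
console.log(weightRangeScore([100, 300], [700, 900])); // 0
```

One consequence of measuring overlap as a width: a single-weight font such as [400, 400] scores 0 on this term against any multi-weight family, even one that includes weight 400.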

Stroke (20%)

Compare contrast, terminals, and curve style:

function scoreStroke(f1: FontMetrics, f2: FontMetrics): number {
  let score = 0;

  // Contrast match
  if (f1.stroke.contrast === f2.stroke.contrast) score += 33;
  else if (areAdjacent(f1.stroke.contrast, f2.stroke.contrast)) score += 17;

  // Terminal match
  if (f1.stroke.terminals === f2.stroke.terminals) score += 33;

  // Curve style match
  if (f1.stroke.curves === f2.stroke.curves) score += 34;
  else if (f1.stroke.curves === 'mixed' || f2.stroke.curves === 'mixed') score += 17;

  return score;
}

function areAdjacent(a: string, b: string): boolean {
  const order = ['low', 'medium', 'high'];
  return Math.abs(order.indexOf(a) - order.indexOf(b)) === 1;
}

Use cases (15%)

Jaccard similarity of use case arrays:

function scoreUseCases(f1: FontMetrics, f2: FontMetrics): number {
  const set1 = new Set(f1.useCases);
  const set2 = new Set(f2.useCases);

  const intersection = [...set1].filter(x => set2.has(x)).length;
  const union = new Set([...set1, ...set2]).size;

  return union > 0 ? (intersection / union) * 100 : 0;
}

Personality (15%)

Same approach as use cases:

function scorePersonality(f1: FontMetrics, f2: FontMetrics): number {
  const set1 = new Set(f1.personality);
  const set2 = new Set(f2.personality);

  const intersection = [...set1].filter(x => set2.has(x)).length;
  const union = new Set([...set1, ...set2]).size;

  return union > 0 ? (intersection / union) * 100 : 0;
}
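Since scoreUseCases and scorePersonality run the same computation on different fields, the shared logic can be sketched as one helper. The name `jaccardScore` and the sample tags are assumptions for illustration, not part of the original code.

```typescript
// Hypothetical shared helper: Jaccard similarity on a 0-100 scale.
function jaccardScore(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  // Items present in both lists
  const intersection = Array.from(setA).filter(x => setB.has(x)).length;
  // Distinct items across both lists
  const union = new Set(a.concat(b)).size;
  // Two empty lists score 0, matching the functions above
  return union > 0 ? (intersection / union) * 100 : 0;
}

// 2 shared tags out of 4 distinct tags
console.log(jaccardScore(['ui', 'editorial', 'headlines'], ['ui', 'editorial', 'branding'])); // 50
console.log(jaccardScore(['ui', 'editorial'], ['ui', 'editorial'])); // 100
console.log(jaccardScore(['ui'], ['headlines'])); // 0
```

With a helper like this, each dimension would reduce to a one-line call on the relevant field.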

Why these weights?

Classification and proportions matter most for visual replacement. A serif can’t replace a sans-serif, no matter how similar the other metrics.

Stroke characteristics matter for trained eyes but less for casual use.

Use cases and personality are soft factors. They influence recommendation order but shouldn’t disqualify otherwise good matches.

The weights came from testing. I showed font pairs to designers and adjusted until the algorithm matched human intuition.
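One way to read the weights is as a ceiling: how high can the total go if a single dimension scores 0 and everything else is perfect? The sketch below works that out from the weights listed earlier. The helper name `ceilingWithout` is an assumption for illustration.

```typescript
// Weights copied from calculateSimilarity
const weights = { classification: 25, proportions: 25, stroke: 20, useCases: 15, personality: 15 };
type Dimension = keyof typeof weights;

// Hypothetical helper: best possible total when one dimension scores 0
// and every other dimension scores a perfect 100.
function ceilingWithout(dimension: Dimension): number {
  const dimensions = Object.keys(weights) as Dimension[];
  const weightSum = dimensions.reduce((sum, d) => sum + weights[d], 0);
  return Math.round(((weightSum - weights[dimension]) * 100) / weightSum);
}

for (const dimension of Object.keys(weights) as Dimension[]) {
  console.log(dimension, ceilingWithout(dimension));
}
```

With the weights as listed, a full classification or proportions mismatch leaves at most 75 points, a stroke mismatch at most 80, and a use-case or personality mismatch at most 85.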

The frontmatter format

Each premium font lists alternatives with similarity scores:

alternatives:
  - slug: inter
    similarity: 85
    notes: "Similar proportions, slightly taller x-height"
  - slug: open-sans
    similarity: 72
    notes: "More humanist, different terminals"
  - slug: source-sans-pro
    similarity: 68
    notes: "Narrower, more neutral feel"

The scores are pre-computed. No runtime calculation.
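As a rough sketch of what a build-time step could look like, the snippet below types one entry of the list and orders entries best match first. The `AlternativeEntry` interface, the `rankAlternatives` function, and the default threshold of 60 are all assumptions for illustration. The original does not show how the list is generated.

```typescript
// Hypothetical shape of one entry in the `alternatives` list
interface AlternativeEntry {
  slug: string;
  similarity: number; // 0-100, pre-computed
  notes: string;      // hand-written curation note
}

// Hypothetical build-time step: drop entries below a threshold, then sort
// by similarity (descending), breaking ties alphabetically by slug.
function rankAlternatives(entries: AlternativeEntry[], minSimilarity = 60): AlternativeEntry[] {
  return entries
    .filter(entry => entry.similarity >= minSimilarity) // filter returns a copy, so the input is not mutated
    .sort((a, b) => b.similarity - a.similarity || a.slug.localeCompare(b.slug));
}

const ranked = rankAlternatives([
  { slug: 'source-sans-pro', similarity: 68, notes: 'Narrower, more neutral feel' },
  { slug: 'inter', similarity: 85, notes: 'Similar proportions, slightly taller x-height' },
  { slug: 'open-sans', similarity: 72, notes: 'More humanist, different terminals' },
]);
console.log(ranked.map(entry => `${entry.slug}: ${entry.similarity}`));
```

Because the output is plain data, it can be written into the frontmatter once and served without running the scoring code again.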

Deterministic = debuggable

When a user questions a score, I can explain it:

“Proxima Nova and Lato have 80% similarity because:

  • Same classification (sans-serif): 100%
  • Similar proportions: 82%
  • Different stroke (geometric vs humanist): 55%
  • Overlapping use cases: 75%
  • Similar personality: 80%

Weighted average: 80%”
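A breakdown like this can be generated rather than written by hand. The sketch below recomputes the weighted average from the five per-dimension scores quoted above and reports how many points each dimension contributes. The helper name `explainScore` and its return shape are assumptions for illustration.

```typescript
// Weights copied from calculateSimilarity
const weights = { classification: 25, proportions: 25, stroke: 20, useCases: 15, personality: 15 };
type Dimension = keyof typeof weights;

// Hypothetical helper: turns per-dimension scores into a rounded total
// plus each dimension's contribution in points.
function explainScore(scores: Record<Dimension, number>) {
  const dimensions = Object.keys(weights) as Dimension[];
  const weightSum = dimensions.reduce((sum, d) => sum + weights[d], 0);
  const contributions = dimensions.map(d => ({
    dimension: d,
    score: scores[d],
    points: (scores[d] * weights[d]) / weightSum,
  }));
  const total = Math.round(contributions.reduce((sum, c) => sum + c.points, 0));
  return { total, contributions };
}

// Per-dimension scores from the Proxima Nova / Lato example
const result = explainScore({ classification: 100, proportions: 82, stroke: 55, useCases: 75, personality: 80 });
for (const c of result.contributions) {
  console.log(`${c.dimension}: ${c.score}% -> ${c.points.toFixed(2)} points`);
}
console.log(`total: ${result.total}%`);
```

The per-dimension points sum to the unrounded total, so any disagreement about a score can be traced to a specific dimension.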

Try explaining why GPT-4 thinks two fonts are similar.

Edge cases

The algorithm handles some edge cases:

Display fonts: Wide variety of styles. I score within sub-categories (geometric display, script display, etc.).

Variable fonts: Weight range matters more. A variable font with 100-900 range scores higher against multi-weight families.

Mono fonts: Classification dominates. A bad monospace is better than a good proportional font for code.

What the algorithm doesn’t capture

Some things are hard to quantify:

  • Optical adjustments: How the font looks at specific sizes
  • Cultural associations: Some fonts feel “tech”, others “editorial”
  • Trend alignment: What’s currently popular
  • Rendering quality: How the font performs on different screens

I capture these through manual curation in the notes field.

Tradeoffs

What I gained:

  • Consistent, reproducible scores
  • Explainable recommendations
  • Fast computation (no API calls)
  • Works offline

What I lost:

  • Nuance of expert judgment
  • Adaptation to emerging preferences
  • Discovery of unexpected similarities

Future improvements:

  • Weight tuning based on user feedback
  • Dimension expansion (add optical size handling)
  • A/B testing different weight configurations

The result

Users see similarity percentages they can trust. An 85% match is reliably a good substitute. A 60% match works in a pinch.

No AI magic. Just math that matches human intuition.

See the full methodology for how these scores are displayed on the site.
