FontAlternatives needs specimen images for every premium font. These are the high-quality images showing fonts in use that foundries create for marketing.
There's no universal API for this; each foundry structures its website differently. So I built 15 foundry-specific scrapers with an orchestrator that picks the right one.
## The problem
I need specimen images for 300+ premium fonts. Manually downloading images would take hours. And when I add new fonts, I’d need to do it again.
Options I considered:
- Manual download: Time-consuming, doesn’t scale
- MyFonts API: No public API for images
- Google Images: Unreliable, wrong images, copyright issues
- Web scraping: Works, but each foundry is different
Web scraping won. But it meant building separate scrapers for each foundry.
## The orchestrator pattern
The orchestrator is a simple priority system:
```typescript
import { scrapeFontImages } from './scrapers';

async function downloadFontImages(fontSlug: string): Promise<void> {
  const font = await getFontData(fontSlug);

  // Try foundry-specific scraper first
  const foundryScraper = getFoundryScraper(font.foundry);
  if (foundryScraper) {
    try {
      const images = await foundryScraper(font);
      if (images.length > 0) {
        await saveImages(fontSlug, images);
        return;
      }
    } catch (error) {
      console.warn(`Foundry scraper failed: ${font.foundry}`, error);
    }
  }

  // Fall back to MyFonts
  try {
    const images = await scrapeMyFonts(font.name);
    if (images.length > 0) {
      await saveImages(fontSlug, images);
      return;
    }
  } catch (error) {
    console.warn('MyFonts scraper failed', error);
  }

  // Generic fallback
  try {
    const images = await scrapeGeneric(font);
    await saveImages(fontSlug, images);
  } catch (error) {
    console.error('All scrapers failed', error);
    // Create placeholder, flag for manual upload
    await createPlaceholder(fontSlug);
  }
}
```
Foundry-specific scrapers get the best images; MyFonts is the reliable fallback; the generic scraper is the last resort.
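`getFoundryScraper` isn't shown in the post; it's essentially a lookup into a registry keyed by normalized foundry name. A minimal sketch, with stand-in scraper functions (the key names are assumptions):

```typescript
// Hypothetical registry: maps a normalized foundry name to its scraper.
type Scraper = (font: { name: string; foundry: string }) => Promise<string[]>;

const placeholder: Scraper = async () => []; // stand-in for the real scrapers

const FOUNDRY_SCRAPERS: Record<string, Scraper> = {
  'klim type foundry': placeholder, // scrapeKlim in the real code
  'pangram pangram': placeholder,   // scrapePangram
  'commercial type': placeholder,   // scrapeCommercialType
  'hoefler&co': placeholder,        // scrapeHoefler
};

function getFoundryScraper(foundry: string): Scraper | undefined {
  // Normalize so "Klim Type Foundry" and "klim type foundry" both match
  return FOUNDRY_SCRAPERS[foundry.trim().toLowerCase()];
}
```

Returning `undefined` for unknown foundries is what lets the orchestrator fall through to MyFonts.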
## Foundry-specific scrapers
Each foundry structures their site differently. Here’s how I handle a few of them:
### Klim Type Foundry
Klim uses a clean structure with specimen images in predictable locations:
```typescript
async function scrapeKlim(font: Font): Promise<string[]> {
  const slug = font.name.toLowerCase().replace(/\s+/g, '-');
  const url = `https://klim.co.nz/retail-fonts/${slug}/`;

  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Klim uses data-src for lazy-loaded images
  const images = await page.$$eval(
    '.specimen-image img',
    (imgs) => imgs.map((img) =>
      img.getAttribute('data-src') || img.getAttribute('src')
    ).filter(Boolean)
  );

  await page.close();
  return images;
}
```
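Every scraper derives the slug the same way. A shared helper (hypothetical, not in my actual code) would centralize that logic and also handle diacritics and punctuation that the simple replace misses:

```typescript
// Hypothetical shared slug helper: lowercases, strips diacritics and
// punctuation, and collapses whitespace into hyphens.
function toSlug(name: string): string {
  return name
    .normalize('NFD')                 // split "ö" into "o" + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, '')     // drop punctuation
    .trim()
    .replace(/[\s-]+/g, '-');         // collapse runs of spaces/hyphens
}
```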
### Pangram Pangram
Pangram uses full-bleed specimen images with consistent class names:
```typescript
async function scrapePangram(font: Font): Promise<string[]> {
  const slug = font.name.toLowerCase().replace(/\s+/g, '-');
  const url = `https://pangrampangram.com/products/${slug}`;

  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Scroll to trigger lazy loading, then give images a moment to load
  // (page.waitForTimeout was removed in recent Puppeteer versions)
  await page.mouse.wheel({ deltaY: 5000 });
  await new Promise((resolve) => setTimeout(resolve, 1000));

  const images = await page.$$eval(
    'img.specimen-full',
    (imgs) => imgs.map((img) => img.src)
  );

  await page.close();
  return images;
}
```
### Commercial Type
Commercial Type has a gallery section with high-res specimens:
```typescript
async function scrapeCommercialType(font: Font): Promise<string[]> {
  const slug = font.name.toLowerCase().replace(/\s+/g, '-');
  const url = `https://commercialtype.com/catalog/${slug}`;

  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Find the gallery section
  const images = await page.$$eval(
    '[data-gallery] img, .specimen-gallery img',
    (imgs) => imgs.map((img) => {
      // Get the highest-resolution version from srcset, if present
      const srcset = img.getAttribute('srcset');
      if (srcset) {
        const sources = srcset.split(',').map((s) => s.trim().split(/\s+/));
        const highest = sources.sort(
          (a, b) => parseInt(b[1] ?? '0', 10) - parseInt(a[1] ?? '0', 10)
        )[0];
        return highest[0];
      }
      return img.src;
    })
  );

  await page.close();
  return images;
}
```
### Hoefler&Co
Hoefler uses JavaScript-rendered content, requiring full page wait:
```typescript
async function scrapeHoefler(font: Font): Promise<string[]> {
  const slug = font.name.toLowerCase().replace(/\s+/g, '-');
  const url = `https://www.typography.com/fonts/${slug}`;

  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Wait for dynamic content
  await page.waitForSelector('.font-specimen', { timeout: 10000 });

  const images = await page.$$eval(
    '.font-specimen img, .gallery-item img',
    (imgs) => imgs.map((img) => img.src)
  );

  await page.close();
  return images;
}
```
## Handling srcset and responsive images
Modern foundry sites use responsive images. I extract the highest resolution:
```typescript
function extractBestImage(img: Element): string | null {
  // Try srcset first
  const srcset = img.getAttribute('srcset');
  if (srcset) {
    const sources = srcset
      .split(',')
      .map((s) => {
        const parts = s.trim().split(/\s+/);
        return {
          url: parts[0],
          width: parseInt(parts[1]?.replace('w', '') || '0'),
        };
      })
      .sort((a, b) => b.width - a.width);
    if (sources[0]?.url) {
      return sources[0].url;
    }
  }

  // Fall back to data-src (lazy loading)
  const dataSrc = img.getAttribute('data-src');
  if (dataSrc) return dataSrc;

  // Finally, regular src
  return img.getAttribute('src');
}
```
## The MyFonts fallback
When foundry-specific scrapers fail or don’t exist, MyFonts usually has the font:
```typescript
async function scrapeMyFonts(fontName: string): Promise<string[]> {
  const searchUrl = `https://www.myfonts.com/search?query=${encodeURIComponent(fontName)}`;

  const page = await browser.newPage();
  await page.goto(searchUrl, { waitUntil: 'networkidle0' });

  // Click the first result
  const firstResult = await page.$('.search-result-item a');
  if (!firstResult) {
    await page.close();
    return [];
  }

  // Start waiting for navigation before clicking to avoid a race
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    firstResult.click(),
  ]);

  // Get specimen images from the font page
  const images = await page.$$eval(
    '.specimen-image img, .font-preview img',
    (imgs) => imgs.map((img) => img.src)
  );

  await page.close();
  return images;
}
```
MyFonts images are lower quality than foundry originals, but they’re consistent and cover almost every commercial font.
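Whatever the source, raw scraped URLs need cleanup before download: relative paths resolved against the page URL, duplicates and `data:` URIs dropped. A sketch of the kind of filter I mean (the UI-asset heuristic is an assumption):

```typescript
// Resolve, filter, and dedupe scraped image URLs before downloading.
function cleanImageUrls(urls: (string | null)[], baseUrl: string): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const raw of urls) {
    if (!raw || raw.startsWith('data:')) continue; // skip inline images
    let absolute: string;
    try {
      absolute = new URL(raw, baseUrl).toString(); // resolve relative paths
    } catch {
      continue; // malformed URL
    }
    // Heuristic: skip obvious UI assets rather than specimens
    if (/(icon|logo|sprite|favicon)/i.test(absolute)) continue;
    if (!seen.has(absolute)) {
      seen.add(absolute);
      out.push(absolute);
    }
  }
  return out;
}
```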
## Image processing pipeline
Raw scraped images need processing:
- Format conversion: source images (usually PNG or JPEG) to WebP and AVIF
- Resizing: a 400px-wide thumbnail
- Optimization: strip metadata, compress
```typescript
import { mkdir } from 'node:fs/promises';
import sharp from 'sharp';

async function processImage(
  buffer: Buffer,
  fontSlug: string,
  index: number
): Promise<void> {
  const basePath = `.cache/assets/previews/${fontSlug}`;
  await mkdir(basePath, { recursive: true }); // sharp won't create directories

  // Full-size WebP
  await sharp(buffer)
    .webp({ quality: 85 })
    .toFile(`${basePath}/specimen-${index}.webp`);

  // Full-size AVIF
  await sharp(buffer)
    .avif({ quality: 80 })
    .toFile(`${basePath}/specimen-${index}.avif`);

  // Thumbnail
  await sharp(buffer)
    .resize(400, null, { withoutEnlargement: true })
    .webp({ quality: 80 })
    .toFile(`${basePath}/thumb-${index}.webp`);
}
```
## Manifest tracking
I track which images exist for each font:
```json
{
  "avenir": {
    "specimens": ["specimen-0.webp", "specimen-1.webp"],
    "thumbnails": ["thumb-0.webp"],
    "lastUpdated": "2024-01-15T10:30:00Z",
    "source": "lineto"
  }
}
```
The manifest tells me:
- Which fonts have images
- How many specimens each font has
- When images were last scraped
- Which scraper was used (for debugging)
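Building an entry from the files on disk is straightforward; a sketch, with field names matching the manifest above (the helper itself is illustrative):

```typescript
interface ManifestEntry {
  specimens: string[];
  thumbnails: string[];
  lastUpdated: string;
  source: string;
}

// Derive a manifest entry from the filenames written by the image pipeline.
function buildManifestEntry(
  files: string[],
  source: string,
  now = new Date()
): ManifestEntry {
  return {
    specimens: files.filter((f) => f.startsWith('specimen-')).sort(),
    thumbnails: files.filter((f) => f.startsWith('thumb-')).sort(),
    lastUpdated: now.toISOString(),
    source,
  };
}
```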
## Rate limiting and politeness
Scrapers can hammer servers. I add delays between requests:
```typescript
const RATE_LIMITS: Record<string, number> = {
  klim: 2000, // 2 seconds between requests
  pangram: 1500,
  commercial: 2000,
  myfonts: 3000, // MyFonts is stricter
  default: 1000,
};

async function delay(foundry: string): Promise<void> {
  const ms = RATE_LIMITS[foundry] || RATE_LIMITS.default;
  await new Promise((resolve) => setTimeout(resolve, ms));
}
```
I also set a realistic user agent and respect robots.txt (mostly; specimen pages aren't usually blocked).
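The fixed delay can be made smarter by tracking when each foundry was last hit and waiting out only the remainder. A sketch of that calculation (rate-limit table abbreviated here so the snippet is self-contained):

```typescript
const RATE_LIMITS_MS: Record<string, number> = {
  klim: 2000,
  myfonts: 3000,
  default: 1000,
};

// How long to wait before the next request to this foundry, given the
// timestamp (ms) of the previous one. Pure, so it's easy to test.
function msToWait(foundry: string, lastRequestAt: number, now: number): number {
  const limit = RATE_LIMITS_MS[foundry] ?? RATE_LIMITS_MS.default;
  const elapsed = now - lastRequestAt;
  return Math.max(0, limit - elapsed);
}
```

If other work (image processing, saving) took longer than the limit, no extra delay is added at all.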
## Error handling and manual fallback
Scrapers fail. Sites change. When automation fails, I need a manual path:
```typescript
async function handleScraperFailure(fontSlug: string): Promise<void> {
  // Create placeholder image
  await createPlaceholder(fontSlug);

  // Create GitHub issue for manual upload
  if (process.env.GITHUB_TOKEN) {
    await createGitHubIssue({
      title: `Manual image needed: ${fontSlug}`,
      body: `Automated scraping failed for ${fontSlug}. Please manually upload specimen images.`,
      labels: ['manual-upload', 'images'],
    });
  }
}
```
The placeholder is a simple gray box with the font name. It’s better than broken images.
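`createPlaceholder` isn't shown above. One way to build it is to render an SVG with the font name and rasterize it with sharp; the SVG generator is the interesting part (dimensions and colors here are arbitrary):

```typescript
// Generate a gray placeholder SVG naming the font. sharp can rasterize the
// result, e.g. sharp(Buffer.from(svg)).webp().toFile(...).
function placeholderSvg(fontName: string, width = 1200, height = 630): string {
  // Escape the few characters that would break the XML
  const safe = fontName
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">
  <rect width="100%" height="100%" fill="#e5e5e5"/>
  <text x="50%" y="50%" text-anchor="middle" dominant-baseline="middle"
        font-family="sans-serif" font-size="48" fill="#525252">${safe}</text>
</svg>`;
}
```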
## The 15 foundries
Current scrapers:
| Foundry | URL Pattern | Notes |
|---|---|---|
| Klim | klim.co.nz/retail-fonts/{slug}/ | Clean structure |
| Pangram | pangrampangram.com/products/{slug} | Lazy images |
| Commercial Type | commercialtype.com/catalog/{slug} | Has gallery |
| Hoefler&Co | typography.com/fonts/{slug} | JS rendered |
| Lineto | lineto.com/typefaces/{slug} | Simple selectors |
| Dinamo | abcdinamo.com/typefaces/{slug} | Modern structure |
| Grilli Type | grillitype.com/typeface/{slug} | Grid layout |
| Colophon | colophon-foundry.org/typefaces/{slug} | Minimal |
| Sharp Type | sharptype.co/typefaces/{slug} | Good quality |
| Fontsmith | fontsmith.com/fonts/{slug} | Mixed quality |
| Fontshare | fontshare.com/fonts/{slug} | Free fonts |
| Google Fonts | fonts.google.com/specimen/{slug} | API available |
| Adobe Fonts | fonts.adobe.com/fonts/{slug} | Requires auth |
| Type Network | typenetwork.com/fonts/{slug} | Federation |
| MyFonts | myfonts.com/ (search) | Fallback |
## Tradeoffs
What I gained:
- Automated image acquisition for 300+ fonts
- Consistent image quality through processing
- Scalable (adding fonts doesn’t require manual work)
What I lost:
- Maintenance burden (site changes break scrapers)
- Rate limiting means slow batch processing
- Some fonts still need manual upload
The brittle reality: Scrapers break. On average, 1-2 foundries change their HTML structure each month. When tests fail, I check which scraper broke and update the selectors. It’s tedious but manageable.
## Running the pipeline
```shell
# Single font
npx tsx scripts/download-font-images.ts --slug avenir

# Batch (respects rate limits)
npx tsx scripts/download-font-images.ts --batch tier1

# Update manifest
npx tsx scripts/update-image-manifest.ts
```
The batch mode processes fonts in tier order (Tier 1, the most important fonts, first). It runs in CI but can also run locally for testing. These images feed into the automated content pipeline that creates new font pages.
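The tier ordering is just a sort before the queue is processed. A sketch, assuming a numeric `tier` field on queued fonts (not shown in the post):

```typescript
interface QueuedFont {
  slug: string;
  tier: number; // 1 = most important (assumed field)
}

// Tier 1 first; within a tier, alphabetical for stable, resumable runs.
function orderBatch(fonts: QueuedFont[]): QueuedFont[] {
  return [...fonts].sort(
    (a, b) => a.tier - b.tier || a.slug.localeCompare(b.slug)
  );
}
```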
## What I'd do differently
If starting over:
- Foundry partnerships: Some foundries might provide images directly if asked
- CDN integration: Store images on R2 from the start, not local cache
- Visual regression: Detect when scraped images change unexpectedly
The scraper approach works, but it’s duct tape. A proper solution would involve foundry cooperation. For a side project, duct tape is fine.