Saving manuscripts.kb.nl to the Wayback Machine
Latest update: 28-04-2026
About
manuscripts.kb.nl - the Medieval Illuminated Manuscripts (Middeleeuwse Verluchte Handschriften / MVH) database of the KB, National Library of the Netherlands - was shut down on 15 December 2025.

Screenshot of manuscripts.kb.nl homepage, December 2025
Before the site went offline, the KB spidered and archived a representative sample of its URLs to The Wayback Machine (WBM) during 10-14 December 2025.
Results & URL spreadsheet
-
manuscripts-urls-wbm-archived.xlsx is the master spreadsheet for everything archived from manuscripts.kb.nl. It gives a detailed overview of all 7,460 archived URLs with per-URL status, WBM submission and capture URLs/timestamps, and Wikipedia/Commons metadata where applicable.
- The Excel detail page gives the full column-by-column breakdown of all nine sheets in the Excel:
ALL_URLS(7,460 rows — union of all sheets below)show_manuscript(2,371 rows — manuscript detail pages)show_images_text(2,322 rows — image gallery pages)show_text(1,520 rows — text-only views)search_extended(806 rows — extended search results)search_literature(397 rows — literature search results)indexes(9 rows — browse index pages)static_pages(8 rows — homepage, introduction, background, advanced search)wiki_priority(61 rows — all URLs linked from Wikipedia/Commons)
- manuscripts-urls-spider-output.xlsx contains the full spider crawl output (12,550 URLs). The 5,117 URLs not selected for archiving are mostly image search result pages.
Screenshots
Each pair shows the original manuscripts.kb.nl page (left) and the same URL as captured in the Wayback Machine (right, with the WBM toolbar visible at the top). The original site is no longer available per 15 December 2025.
Homepage
| Original (defunct) | Wayback Machine |
|---|---|
![]() | ![]() |
- Original: https://manuscripts.kb.nl/
- Wayback Machine (11-12-2025): https://web.archive.org/web/20251211222502/https://manuscripts.kb.nl/
Introduction
| Original (defunct) | Wayback Machine |
|---|---|
![]() | ![]() |
- Original: https://manuscripts.kb.nl/introduction
- Wayback Machine (11-12-2025): https://web.archive.org/web/20251211005548/https://manuscripts.kb.nl/introduction
Manuscript detail page (10 A 11)
| Original (defunct) | Wayback Machine |
|---|---|
![]() | ![]() |
- Original: https://manuscripts.kb.nl/show/manuscript/10+A+11
- Wayback Machine (13-12-2025): https://web.archive.org/web/20251213032801/https://manuscripts.kb.nl/show/manuscript/10+A+11
Images and description (10 A 11)
| Original (defunct) | Wayback Machine |
|---|---|
![]() | ![]() |
- Original: https://manuscripts.kb.nl/show/images_text/10+A+11
- Wayback Machine (12-12-2025): https://web.archive.org/web/20251212054623/https://manuscripts.kb.nl/show/images_text/10+A+11
Shelfmark index
| Original (defunct) | Wayback Machine |
|---|---|
![]() | ![]() |
- Original: https://manuscripts.kb.nl/indexes/shelfmark
- Wayback Machine (14-12-2025): https://web.archive.org/web/20251214002409/https://manuscripts.kb.nl/indexes/shelfmark
How manuscripts.kb.nl got into the Wayback Machine
1. Spidering the site
Unlike mmdc.nl, manuscripts.kb.nl was a server-rendered site, so a straightforward HTTP crawler could discover all URLs. A custom spider was built, see the _spider-artifacts/ folder:
-
Seed URLs — homepage, introduction, background, advanced search, and all 9 index pages (shelfmark, author/title, place, language, iconclass, image type, miniaturist, has part, title/image).
-
Crawler — Python + requests/BeautifulSoup, crawls each page, extracts internal links, and classifies them into categories (manuscript detail, image galleries, text views, search results, indexes, static pages) via config.py.
-
Output — 12,550 unique URLs written to manuscripts-urls-spider-output.xlsx.
Full planning notes: PLAN-url-spider-manuscripts.kb.nl.md.
2. Submitting to the Wayback Machine
The discovered URLs were submitted to the Wayback Machine using the Internet Archive’s Save Page Now 2 (SPN2) API with authenticated access, in two phases:
-
Phase 1 — Wiki priority URLs (10-11 Dec 2025): 61 URLs linked from Dutch Wikipedia and Wikimedia Commons were archived first using SaveToWBM_manuscripts_wiki_priority.py. Completed in ~23 minutes. Result: 61/61 (100%) successfully archived.
-
Phase 2 — Bulk archiving (11-14 Dec 2025): 7,433 URLs from the spider output were submitted sheet by sheet (smallest first: static_pages → indexes → search_literature → search_extended → show_text → show_images_text → show_manuscript) using SaveToWBM_manuscripts_bulk.py. Rate-limited at 17 seconds between requests. Result: 7,433/7,433 (100%) successfully archived, with only 4 transient errors (<0.1%) that were retried successfully.
Full planning notes: PLAN-wbm-archiving-manuscripts.kb.nl.md.
Folder structure
manuscripts.kb.nl/
├── index.md # This page
├── excel-details.md # Column-by-column breakdown of the Excel
├── manuscripts-urls-wbm-archived.xlsx # Master URL list with WBM status (7,460 URLs)
├── wiki-priority-urls-WBM.xlsx # Original wiki priority list (merged into master)
├── images/ # Before/after screenshots
├── wiki-url-replacements-completed/ # Overview of Wikipedia/Commons link updates
│ └── manuscripts-urls-wbm-archived-wiki.xlsx # All 61 replacements with proposed wikitext
├── _spider-artifacts/ # URL discovery (the spidering run)
│ ├── manuscripts-urls-spider-output.xlsx # Full spider output (12,550 URLs)
│ ├── seed-urls.txt # Spider seed URLs
│ ├── scripts/ # spider.py, config.py, excel_writer.py
│ ├── data/ # ⛔ NOT ON GITHUB (spider_state.json etc.)
│ ├── docs/ # PLAN-url-spider-manuscripts.kb.nl.md
│ └── logs/ # crawl.log
└── _archiving-artifacts/ # WBM submission
├── scripts/ # 3 core Python scripts:
│ ├── SaveToWBM_manuscripts_wiki_priority.py # submit wiki-priority URLs to WBM
│ ├── SaveToWBM_manuscripts_bulk.py # submit all URLs sheet by sheet to WBM
│ ├── lookup_wbm_captures.py # CDX lookup for actual capture URLs
│ └── .env # ⛔ NOT ON GITHUB (IA API keys)
├── data/ # ⛔ NOT ON GITHUB (progress/checkpoint JSON files)
├── docs/ # PLAN-wbm-archiving-manuscripts.kb.nl.md
└── logs/ # ⛔ NOT ON GITHUB (archiving logs)
Timeline
| Date | Activity | Output |
|---|---|---|
| 2025-12-10 | Site spidering with Python + requests/BeautifulSoup; probe crawl of 15 seed URLs (100% success, ~1.6s avg response) | 12,550 URLs in manuscripts-urls-spider-output.xlsx |
| 2025-12-10 → 2025-12-11 | Wiki priority archiving: 61 URLs linked from Dutch Wikipedia and Wikimedia Commons submitted via SPN2 API | 61/61 (100%) successfully archived |
| 2025-12-11 → 2025-12-14 | Bulk WBM submission of 7,433 URLs, sheet by sheet (smallest first), at 17s/request | 7,433/7,433 (100%) successfully archived |
| 2025-12-15 | manuscripts.kb.nl officially shut down | Live site no longer available |
| 2026-04-24 | CDX capture URL lookup; documentation and spreadsheet consolidation | Capture URLs and timestamps added to master Excel |
| 2026-04-28 | Manual update of 61 wiki-priority links on Dutch Wikipedia (13 articles) and Wikimedia Commons (48 file pages) | Dead manuscripts.kb.nl links augmented with WBM archive URLs |
Wikipedia & Commons link updates
All 61 wiki-priority URLs have been manually updated on the corresponding Dutch Wikipedia articles and Wikimedia Commons file pages (24-28 April 2026). The now-defunct manuscripts.kb.nl links were augmented with their Wayback Machine capture URLs:
- 13 Dutch Wikipedia articles — updated using the
{{Citeer web}}template witharchiefurl,archiefdatum, anddodeurl=japarameters - 48 Wikimedia Commons file pages — updated using the
{{Wayback}}template in thesourcefield and other relevant fields
The full overview of all replacements is available in wiki-url-replacements-completed/manuscripts-urls-wbm-archived-wiki.xlsx.
Notes & known issues
- The spider discovered 12,550 URLs; 7,433 were selected for archiving. The remaining ~5,100 were mostly image search result pages (
/search/images_text/extended/...) and other low-priority pages. - 61 URLs linked from Wikimedia projects were archived as a first priority. 34 of these overlap with the main bulk run; 27 are unique to the
wiki_prioritysheet (mostly/zoom/image viewer URLs). - The site served both
http://andhttps://URLs — Wikipedia/Commons links often usedhttp://while the spider discoveredhttps://versions.








