
Saving manuscripts.kb.nl to the Wayback Machine

Latest update: 28-04-2026

About

manuscripts.kb.nl - the Medieval Illuminated Manuscripts (Middeleeuwse Verluchte Handschriften / MVH) database of the KB, National Library of the Netherlands - was shut down on 15 December 2025.

Screenshot of manuscripts.kb.nl homepage, December 2025

Before the site went offline, the KB spidered it and archived a representative sample of its URLs in the Wayback Machine (WBM) between 10 and 14 December 2025.

Results & URL spreadsheet

Screenshots

Each pair shows the original manuscripts.kb.nl page (left) and the same URL as captured in the Wayback Machine (right, with the WBM toolbar visible at the top). The original site has been unavailable since 15 December 2025.

Homepage

| Original (defunct) | Wayback Machine |
| --- | --- |
| manuscripts.kb.nl homepage, original | manuscripts.kb.nl homepage, as archived in Wayback Machine on 11-12-2025 |

Introduction

| Original (defunct) | Wayback Machine |
| --- | --- |
| manuscripts.kb.nl Introduction page, original | manuscripts.kb.nl Introduction page, as archived in Wayback Machine on 11-12-2025 |

Manuscript detail page (10 A 11)

| Original (defunct) | Wayback Machine |
| --- | --- |
| manuscripts.kb.nl manuscript detail page for 10 A 11, original | manuscripts.kb.nl manuscript detail page for 10 A 11, as archived in Wayback Machine on 13-12-2025 |

Images and description (10 A 11)

| Original (defunct) | Wayback Machine |
| --- | --- |
| manuscripts.kb.nl image gallery with text for 10 A 11, original | manuscripts.kb.nl image gallery with text for 10 A 11, as archived in Wayback Machine on 12-12-2025 |

Shelfmark index

| Original (defunct) | Wayback Machine |
| --- | --- |
| manuscripts.kb.nl shelfmark index, original | manuscripts.kb.nl shelfmark index, as archived in Wayback Machine on 14-12-2025 |

How manuscripts.kb.nl got into the Wayback Machine

1. Spidering the site

Unlike mmdc.nl, manuscripts.kb.nl was a server-rendered site, so a straightforward HTTP crawler could discover all URLs. A custom spider was built for this purpose (see the _spider-artifacts/ folder):

  1. Seed URLs — homepage, introduction, background, advanced search, and all 9 index pages (shelfmark, author/title, place, language, iconclass, image type, miniaturist, has part, title/image).

  2. Crawler — Python + requests/BeautifulSoup, crawls each page, extracts internal links, and classifies them into categories (manuscript detail, image galleries, text views, search results, indexes, static pages) via config.py.

  3. Output — 12,550 unique URLs written to manuscripts-urls-spider-output.xlsx.
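
The link-extraction and classification step can be sketched roughly as follows. This is a minimal illustration, not the actual spider.py: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without dependencies, and the `BASE` URL, the `CATEGORY_PREFIXES` table and the function names are assumptions. The real classification rules live in config.py and may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed base URL of the (now defunct) site.
BASE = "https://manuscripts.kb.nl/"

# Hypothetical path prefixes per category; the real rules are in config.py.
CATEGORY_PREFIXES = [
    ("/show/manuscript", "show_manuscript"),
    ("/show/images_text", "show_images_text"),
    ("/show/text", "show_text"),
    ("/search/literature", "search_literature"),
    ("/search/extended", "search_extended"),
    ("/indexes", "indexes"),
]


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(html, page_url):
    """Return absolute, fragment-free URLs on the same host as BASE."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(BASE).netloc
    found = set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == host:
            found.add(absolute)
    return found


def classify(url):
    """Map a URL to a category name; anything unmatched counts as static."""
    path = urlparse(url).path
    for prefix, category in CATEGORY_PREFIXES:
        if path.startswith(prefix):
            return category
    return "static_pages"
```

A crawler built on these helpers would keep a queue of unvisited URLs, call `internal_links` on each fetched page, and write each URL to the spreadsheet sheet named by `classify`.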

Full planning notes: PLAN-url-spider-manuscripts.kb.nl.md.

2. Submitting to the Wayback Machine

The discovered URLs were submitted to the Wayback Machine using the Internet Archive’s Save Page Now 2 (SPN2) API with authenticated access, in two phases:

  1. Phase 1 — Wiki priority URLs (10-11 Dec 2025): 61 URLs linked from Dutch Wikipedia and Wikimedia Commons were archived first using SaveToWBM_manuscripts_wiki_priority.py. Completed in ~23 minutes. Result: 61/61 (100%) successfully archived.

  2. Phase 2 — Bulk archiving (11-14 Dec 2025): 7,433 URLs from the spider output were submitted sheet by sheet (smallest first: static_pages → indexes → search_literature → search_extended → show_text → show_images_text → show_manuscript) using SaveToWBM_manuscripts_bulk.py. Rate-limited at 17 seconds between requests. Result: 7,433/7,433 (100%) successfully archived, with only 4 transient errors (<0.1%) that were retried successfully.
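
The rate-limited, sheet-by-sheet submission described above could look roughly like the sketch below. It is not the code of SaveToWBM_manuscripts_bulk.py: the function names, the retry count and the injectable `submit` and `sleep` callables are assumptions made for illustration. The request shape (POST to `https://web.archive.org/save` with an `Authorization: LOW accesskey:secret` header) follows the SPN2 API as understood here; polling the returned job status is left out.

```python
import time
import urllib.parse
import urllib.request

SPN2_ENDPOINT = "https://web.archive.org/save"


def build_spn2_request(url, access_key, secret_key):
    """Build (but do not send) an authenticated SPN2 capture request."""
    body = urllib.parse.urlencode({"url": url}).encode("ascii")
    headers = {
        "Accept": "application/json",
        "Authorization": f"LOW {access_key}:{secret_key}",
    }
    return urllib.request.Request(
        SPN2_ENDPOINT, data=body, headers=headers, method="POST"
    )


def smallest_first(sheets):
    """Order (sheet name, URL list) pairs by number of URLs, ascending."""
    return sorted(sheets.items(), key=lambda item: len(item[1]))


def submit_all(sheets, submit, delay=17, max_attempts=3, sleep=time.sleep):
    """Submit every URL, pausing `delay` seconds after each attempt.

    `submit` is any callable that raises OSError on a transient failure
    (urllib.error.URLError is a subclass of OSError).
    """
    results = {}
    for _sheet_name, urls in smallest_first(sheets):
        for url in urls:
            for _attempt in range(max_attempts):
                try:
                    submit(url)
                    results[url] = "archived"
                    break
                except OSError:
                    results[url] = "failed"  # overwritten if a retry succeeds
                finally:
                    sleep(delay)  # rate limit applies to every attempt
    return results
```

As a rough plausibility check on the schedule: 7,433 requests spaced 17 seconds apart imply at least 7,433 × 17 s = 126,361 s, or about 35 hours of waiting time, which fits within the 11-14 December window.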

Full planning notes: PLAN-wbm-archiving-manuscripts.kb.nl.md.

Folder structure

manuscripts.kb.nl/
├── index.md                                 # This page
├── excel-details.md                         # Column-by-column breakdown of the Excel
├── manuscripts-urls-wbm-archived.xlsx       # Master URL list with WBM status (7,460 URLs)
├── wiki-priority-urls-WBM.xlsx              # Original wiki priority list (merged into master)
├── images/                                  # Before/after screenshots
├── wiki-url-replacements-completed/         # Overview of Wikipedia/Commons link updates
│   └── manuscripts-urls-wbm-archived-wiki.xlsx  # All 61 replacements with proposed wikitext
├── _spider-artifacts/                       # URL discovery (the spidering run)
│   ├── manuscripts-urls-spider-output.xlsx  # Full spider output (12,550 URLs)
│   ├── seed-urls.txt                        # Spider seed URLs
│   ├── scripts/                             # spider.py, config.py, excel_writer.py
│   ├── data/                                # ⛔ NOT ON GITHUB (spider_state.json etc.)
│   ├── docs/                                # PLAN-url-spider-manuscripts.kb.nl.md
│   └── logs/                                # crawl.log
└── _archiving-artifacts/                    # WBM submission
    ├── scripts/                             # 3 core Python scripts:
    │   ├── SaveToWBM_manuscripts_wiki_priority.py  # submit wiki-priority URLs to WBM
    │   ├── SaveToWBM_manuscripts_bulk.py           # submit all URLs sheet by sheet to WBM
    │   ├── lookup_wbm_captures.py                  # CDX lookup for actual capture URLs
    │   └── .env                                    # ⛔ NOT ON GITHUB (IA API keys)
    ├── data/                                # ⛔ NOT ON GITHUB (progress/checkpoint JSON files)
    ├── docs/                                # PLAN-wbm-archiving-manuscripts.kb.nl.md
    └── logs/                                # ⛔ NOT ON GITHUB (archiving logs)

Timeline

| Date | Activity | Output |
| --- | --- | --- |
| 2025-12-10 | Site spidering with Python + requests/BeautifulSoup; probe crawl of 15 seed URLs (100% success, ~1.6s avg response) | 12,550 URLs in manuscripts-urls-spider-output.xlsx |
| 2025-12-10 → 2025-12-11 | Wiki priority archiving: 61 URLs linked from Dutch Wikipedia and Wikimedia Commons submitted via SPN2 API | 61/61 (100%) successfully archived |
| 2025-12-11 → 2025-12-14 | Bulk WBM submission of 7,433 URLs, sheet by sheet (smallest first), at 17s/request | 7,433/7,433 (100%) successfully archived |
| 2025-12-15 | manuscripts.kb.nl officially shut down | Live site no longer available |
| 2026-04-24 | CDX capture URL lookup; documentation and spreadsheet consolidation | Capture URLs and timestamps added to master Excel |
| 2026-04-28 | Manual update of 61 wiki-priority links on Dutch Wikipedia (13 articles) and Wikimedia Commons (48 file pages) | Dead manuscripts.kb.nl links augmented with WBM archive URLs |

All 61 wiki-priority links were manually updated on the corresponding Dutch Wikipedia articles and Wikimedia Commons file pages (24-28 April 2026). Each now-defunct manuscripts.kb.nl link was augmented with its Wayback Machine capture URL.

The full overview of all replacements is available in wiki-url-replacements-completed/manuscripts-urls-wbm-archived-wiki.xlsx.
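
The capture URLs used for these replacements come from a CDX lookup. The sketch below shows, under stated assumptions, how such a lookup could be built and parsed. It is not the code of lookup_wbm_captures.py: the function names, the default date window and the `statuscode:200` filter are illustrative choices. The endpoint and the JSON layout (a header row followed by one row per capture) follow the Wayback Machine CDX server API as understood here.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(original_url, date_from="20251210", date_to="20251214"):
    """Build a CDX query for the first successful capture in a date window."""
    params = {
        "url": original_url,
        "output": "json",
        "from": date_from,
        "to": date_to,
        "filter": "statuscode:200",
        "limit": "1",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def capture_urls(cdx_json_text):
    """Turn a CDX JSON response body into Wayback Machine capture URLs."""
    if not cdx_json_text.strip():
        return []
    rows = json.loads(cdx_json_text)
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    ts = header.index("timestamp")
    orig = header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[orig]}" for r in records]
```

Fetching the URL returned by `cdx_query_url` and passing the response body to `capture_urls` yields one capture URL per matching row, which can then be written next to the original URL in the master spreadsheet.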

Notes & known issues