Welcome to this XML with Python course

This course is set up to be a quick-start into working with XML files using Python. No prior knowledge of Python or XML is needed as the first lessons cover the basics of working with Python in Jupyter Notebooks as well as the basics of XML structure. We will give an overview of two Python packages that are often used when working with XML. After this, a practical lesson for both packages follows, in which you learn how to use the packages to extract content and metadata from an example XML file.

We continue with an introduction to three XML formats that are commonly used in Digital Heritage institutions. The remaining lessons are practical examples and exercises to get familiar with extracting content and metadata from these files with Python. We use both packages and real-life XML examples to show the differences, and to provide working code blocks on which to base future work. We end with instructions on how to perform such extractions automatically on batches of files.
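To give a first impression of what such an extraction can look like, here is a minimal sketch using ElementTree from the Python standard library. The sample documents, the element names (`record`, `title`, `text`) and the helper `extract_record` are invented for illustration and are not taken from the course material; the real-life formats covered in the lessons are more elaborate.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; element and attribute names are invented.
SAMPLE_DOCS = [
    '<record id="r1"><title>First example</title><text>Hello XML</text></record>',
    '<record id="r2"><title>Second example</title><text>Hello again</text></record>',
]


def extract_record(xml_text):
    """Return metadata (the id attribute) and content (title, text) of one document."""
    root = ET.fromstring(xml_text)  # parse the string into an element tree
    return {
        "id": root.get("id"),            # attribute on the root element
        "title": root.findtext("title"),  # text of the first <title> child
        "text": root.findtext("text"),    # text of the first <text> child
    }


# Process a small "batch" by applying the same function to every document.
results = [extract_record(doc) for doc in SAMPLE_DOCS]
for record in results:
    print(record["id"], "-", record["title"])
```

In practice the documents would be read from files rather than from strings in a list, but the pattern of one extraction function applied in a loop stays the same.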

As you proceed through the lessons, you will notice that there is a lot of repetition. This is done on purpose. First, repetition is necessary to really learn a new skill. Second, it gives you the opportunity to skip lessons that are not relevant to you without missing important information and guidelines.

Therefore, it is possible to follow only the ElementTree or the Beautiful Soup lessons as the structure of both tracks is the same. For those without any background in Python or XML, or in need of a refresher course, we recommend starting with lesson one. For those with a background in either Python or XML, feel free to deep-dive into the later lessons and skip the introductions.

To follow the course, an installation of Python 3 and Jupyter Notebooks is needed.

The real-life XML examples are provided by the KB, the national library of the Netherlands. If you would like to know more about the available datasets, or would like to use them, please contact dataservices@kb.nl.

The course and supplemental material are still in beta. Despite our best intentions, there may still be some errors in the material. These can be reported as issues on our GitHub repository.

If you have any questions about the course, please contact us via mirjam.cuper@kb.nl.

