From a collection of documents to a published edition: how to use an end-to-end publication pipeline
The goal of our workshop is to demonstrate how a digital literary corpus could be processed for publication with TEI Publisher.
Participants will experiment with a ready-to-use solution for publishing a corpus quickly and easily. They will also get tips and shortcuts that help speed up the creation of a digital edition.
By the end of the session, participants will have a visualization of their own corpus, with the transformed text displayed side by side with the original image. This shows what can be achieved when working with TEI in the context of an end-to-end publication pipeline.
The program for this workshop is as follows:
- First, the workshop will start with a presentation of the pipeline, its objectives, and how it works.
- Then, the remaining time will be divided into several slots, one for each step of the pipeline. Each slot will start with a quick presentation of what is expected of the participants and which tools they will need to use.
- Next, participants will be allotted some time to work with their own data and to process it for publication.
- At the end of the day, a 30-minute feedback session will allow the participants and the workshop organizers to assess the benefits of the session and to envision possible further collaborations.
The workshop relies on the following tools and resources:
- Cantaloupe (open-source dynamic image server for on-demand generation of derivatives of high-resolution source images)
- eScriptorium (a digital text production pipeline for printed and handwritten texts using machine learning)
- Oxygen XML (off-the-shelf XML editing software, providing must-have tools and covering most XML standards)
- TEI Guidelines (Guidelines for Electronic Text Encoding and Interchange)
- TEI Publisher (instant publishing toolbox, developed by e-editiones)
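To give an idea of how an image server such as Cantaloupe can supply the page images shown next to the transcription, here is a minimal Python sketch that builds an image request URL following the IIIF Image API 2 pattern (`{identifier}/{region}/{size}/{rotation}/{quality}.{format}`). The base URL `http://localhost:8182/iiif/2`, the image identifiers, and the function name `iiif_image_url` are assumptions made for illustration; they are not taken from the workshop setup.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="full",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API 2 request URL (hypothetical helper).

    base_url is assumed to point at the image server's IIIF endpoint.
    The identifier is percent-encoded so that a slash in a file path
    is not mistaken for a URL path separator.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    base = "http://localhost:8182/iiif/2"  # assumed local endpoint
    # Full image at full size
    print(iiif_image_url(base, "letters/page_001.jpg"))
    # A cropped region (x,y,w,h) scaled to a width of 400 pixels
    print(iiif_image_url(base, "page_001.jpg", region="0,0,800,600", size="400,"))
```

A URL of this kind could then be referenced from the encoded text, so that the publication layer can display the matching facsimile for each page.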
The workshop materials are organized as follows:
- images: facsimile images for the first step, taken from the Internet Archive (facsimile) and Wikipedia (picture),
- instructions: Markdown files giving step-by-step instructions for each part of the workshop,
- scripts: XSLT and Python scripts to transform texts for the second step,
- alto and text: files for the second step,
- tei, text and xml: XML and TXT files for the third and fourth steps,
- slides of the workshop.
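The second step works with ALTO files and plain text. As a rough illustration of what such a transformation can involve, the Python sketch below collects the `CONTENT` attribute of the `String` elements inside each ALTO `TextLine` and returns one string per line. It is not the workshop's actual script: the function name `alto_to_lines` and the sample document are hypothetical, and the code ignores the XML namespace so that it does not depend on a particular ALTO version.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return an element name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def alto_to_lines(alto_xml):
    """Extract the text of each TextLine from an ALTO document string."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each String child carries a piece of the transcription in CONTENT.
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(word for word in words if word))
    return lines


# Hypothetical sample, reduced to the elements the function reads.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Dear"/><SP/><String CONTENT="friend,"/></TextLine>
    <TextLine><String CONTENT="I received your letter."/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_to_lines(SAMPLE_ALTO):
        print(line)
```

The resulting lines can be written to a TXT file and then encoded in TEI during the following steps.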