Converting TEI/XML Files to EPUB Using Python

Overview: I recently converted TEI/XML files to EPUB using Python, so here are my notes. Oxygen XML Editor is one way to convert TEI/XML files to EPUB, but this time I used the Python library “EbookLib,” with the following article as a reference: https://dev.classmethod.jp/articles/try-create-epub-by-python-ebooklib/ The specific goal this time is to create a vertical-text EPUB from the TEI/XML files published in the “Koui Genji Monogatari Text Data Repository.” ...

September 30, 2022 · 1 min · Nakamura

I Created a Program to Extract Differences Between Two Texts

Overview: I created a program to extract the differences between two texts. You can use it from the following Google Colab notebook: https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/校異情報の生成.ipynb A well-known service for this purpose is “difff” (https://difff.jp/), but this time I implemented it in Python. To calculate the differences between the texts, I used difflib.SequenceMatcher: https://docs.python.org/ja/3/library/difflib.html Usage: You can choose between two output formats, HTML and TEI. HTML: Here is an example of the HTML file output. ...
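As a rough illustration of the difflib.SequenceMatcher approach mentioned above, here is a minimal sketch. The function name extract_differences and the dictionary layout are hypothetical choices for this example and are not taken from the notebook; the HTML and TEI output steps are not covered.

```python
from difflib import SequenceMatcher


def extract_differences(text_a, text_b):
    """Return the non-matching spans between two texts (hypothetical helper)."""
    # autojunk=False disables the "popular element" heuristic, which can
    # hide differences in long texts.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag == "equal":
            continue  # skip the parts that are identical in both texts
        differences.append({
            "type": tag,  # "replace", "delete", or "insert"
            "a": text_a[a_start:a_end],
            "b": text_b[b_start:b_end],
        })
    return differences


if __name__ == "__main__":
    for diff in extract_differences("abcd", "abxd"):
        print(diff)
```

Each returned entry holds the differing span from both texts, so a later step could wrap it in whatever markup the output format needs.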

July 14, 2022 · 5 min · Nakamura

Added TEI/XML Download Functionality to the "NDL OCR x IIIF" App

I added the ability to download OCR results in TEI/XML format to the app that allows viewing OCR results published in the National Diet Library’s “Next-Generation Digital Library” using an IIIF viewer. https://static.ldas.jp/ndl-ocr-iiif/ Please also refer to the following article about this app. In adding this feature, I updated the UI. The results are divided into “Viewer” and “Data.” For “Viewer,” in addition to the previously provided “Mirador” and “Curation Viewer,” I added “Universal Viewer” and “Image Annotator.” I also added a link to the “Next-Generation Digital Library” and implemented a page called “TEI Viewer” as a simple viewer for TEI/XML files. ...

April 15, 2022 · 1 min · Nakamura

Created a Sample Repository for Running XSLT in Node.js

I created a sample repository for running XSLT in Node.js. https://github.com/ldasjp8/nodejs-xslt I hope this is helpful when processing TEI/XML and similar files in Node.js.

April 8, 2022 · 1 min · Nakamura

Created a Sample Program for Analyzing TEI/XML Files with Python

We created a sample program for analyzing TEI/XML files with Python. You can use it from the following Google Colab notebook: https://colab.research.google.com/drive/1fji80KZW8typjJMi01fyUWjrdYrNldsK We hope this serves as a useful reference for those considering the utilization of TEI data.
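For readers who want a feel for this kind of analysis without opening the notebook, here is a minimal sketch using only the standard library. It is not the notebook's code: the sample document, the function names, and the choice of what to count are all assumptions made for illustration, and the sample is a reduced fragment rather than a schema-valid TEI file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"
NAMESPACES = {"tei": TEI_NS}

# Hypothetical sample; not schema-valid TEI, only enough to parse.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><persName>Genji</persName> visits <placeName>Suma</placeName>.</p>
      <p><persName>Genji</persName> returns.</p>
    </body>
  </text>
</TEI>"""


def get_title(tei_xml):
    """Return the text of the first titleStmt/title, or None if absent."""
    root = ET.fromstring(tei_xml)
    title = root.find(".//tei:titleStmt/tei:title", NAMESPACES)
    return title.text if title is not None else None


def count_elements(tei_xml):
    """Count how often each element name occurs, ignoring the namespace."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for element in root.iter():
        local_name = element.tag.split("}", 1)[-1]  # strip "{namespace}"
        counts[local_name] += 1
    return counts


if __name__ == "__main__":
    print(get_title(SAMPLE_TEI))
    print(count_elements(SAMPLE_TEI).most_common(3))
```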

March 6, 2022 · 1 min · Nakamura

How to Use the Omeka S XML Viewer Module

Note: Using this module requires some advanced procedures, so please keep this in mind if you only plan to use the basic features of Omeka S. Overview: This article explains how to use the XML Viewer module, which enables the display of XML files in Omeka S. It can be used, for example, to display XML files created with TEI. Installation: As of March 4, 2022, this module is only published on GitLab and is not available on GitHub. ...

March 4, 2022 · 3 min · Nakamura

Created a Program to Generate TEI facsimile Elements from IIIF Manifest Files

We created a program to generate TEI facsimile elements from IIIF manifest files. You can try it in the following Google Colaboratory notebook: colab.research.google.com We hope this serves as a useful reference for those considering integration between IIIF and TEI.
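The general idea can be sketched as follows, under several assumptions: the manifest follows the IIIF Presentation API 2.x layout (sequences, canvases, images), each canvas becomes one surface element whose coordinates span the canvas size, and the canvas URI is recorded in the source attribute. This mapping, the function name, and the example.org URLs are hypothetical and may differ from what the notebook actually produces.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical IIIF Presentation 2.x manifest, reduced to the fields used here.
SAMPLE_MANIFEST = {
    "@id": "https://example.org/iiif/manifest.json",
    "sequences": [{
        "canvases": [
            {
                "@id": "https://example.org/iiif/canvas/p1",
                "width": 1000,
                "height": 1500,
                "images": [{"resource": {
                    "@id": "https://example.org/iiif/page1/full/full/0/default.jpg"
                }}],
            },
            {
                "@id": "https://example.org/iiif/canvas/p2",
                "width": 1000,
                "height": 1500,
                "images": [{"resource": {
                    "@id": "https://example.org/iiif/page2/full/full/0/default.jpg"
                }}],
            },
        ]
    }],
}


def manifest_to_facsimile(manifest):
    """Build a TEI facsimile element with one surface per canvas."""
    facsimile = ET.Element(f"{{{TEI_NS}}}facsimile", {"source": manifest["@id"]})
    for canvas in manifest["sequences"][0]["canvases"]:
        surface = ET.SubElement(facsimile, f"{{{TEI_NS}}}surface", {
            "source": canvas["@id"],
            "ulx": "0",
            "uly": "0",
            "lrx": str(canvas["width"]),
            "lry": str(canvas["height"]),
        })
        # Assumes the first image on the canvas is the page image.
        image_url = canvas["images"][0]["resource"]["@id"]
        ET.SubElement(surface, f"{{{TEI_NS}}}graphic", {"url": image_url})
    return facsimile


if __name__ == "__main__":
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    element = manifest_to_facsimile(SAMPLE_MANIFEST)
    print(ET.tostring(element, encoding="unicode"))
```

In practice the manifest would be loaded from its URL with a JSON parser instead of being written inline, and a Presentation 3.0 manifest would need a different traversal (items instead of sequences and canvases).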

February 22, 2022 · 1 min · Nakamura

How to Get an Element with a Specific xml:id Value Using JavaScript querySelector()

This is a memo on how to get an element with a specific xml:id value using JavaScript’s querySelector(). Specifically, for a variable called myDoc, you can retrieve the element as follows. This example gets the element whose xml:id attribute has the value abc. myDoc.querySelector("[*|id='abc']") The key point is to write the attribute name in the form *|id, with a pipe character between the asterisk and id. When working with TEI/XML files in JavaScript, there are cases where you need to retrieve elements by their xml:id attribute values. Unlike other attributes such as type or corresp, the xml:id attribute has the prefix “xml:” in its attribute name. Therefore, you need to use the approach described above. ...

February 21, 2022 · 1 min · Nakamura

How to Add a Line Break Before the lb Tag in Oxygen Auto-Formatting

Overview: This article introduces how to change the auto-formatting and indentation rules in “Oxygen XML Editor,” a useful tool for working with TEI/XML. Specifically, the goal is to ensure that a line break is inserted before the lb tag, which marks the beginning of a line. Background: “Oxygen XML Editor” has an auto-formatting and indentation feature. It is the icon shown at the top of the figure below. ...

August 8, 2021 · 2 min · Nakamura