Schemas Convertible from TEI ODD: RNG, XSD, DTD, and More

Overview In the following article, I tried creating an ODD. That article uses a tool called Roma, and the ODD it creates can be exported in several output formats: “RELAX NG Schema,” “RELAX NG Compact,” “W3C Schema,” “Document Type Definition,” and “ISO Schematron Constraints.” I asked GPT-4 about the differences between these formats and am sharing the results here. There may be some inaccuracies, but I hope this serves as a useful reference. ...

November 4, 2023 · 5 min · Nakamura

Using Roma to Limit Tags for Your Project and Generate Documentation

Overview I previously explained how to use Roma in the following article. This time, I will explain the workflow for creating a TEI ODD (One Document Does it all) and documentation (HTML and PDF) for the TEI/XML files you already have. At the end of this article, I have also included GPT-4’s response regarding the differences between ODD and RNG (RELAX NG); please refer to that as well. Obtaining a List of Tags Used First, obtain a list of tags used in your project. ...
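
One way to obtain such a tag list is sketched below. This is only an illustration under assumptions, not the method used in the article: the function name `count_tags` and the sample document are hypothetical, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_text: str) -> Counter:
    """Count element names in one XML document, ignoring namespaces."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree writes namespaced tags as "{uri}local"; keep only "local".
        counts[elem.tag.rsplit("}", 1)[-1]] += 1
    return counts


# Hypothetical sample document, used only for illustration.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>A <persName>name</persName> and <persName>another</persName>.</p></body></text>
</TEI>"""

tag_counts = count_tags(sample)
for tag, n in sorted(tag_counts.items()):
    print(tag, n)
```

To cover a whole project, the counters of the individual files can be added together, and the resulting tag names then selected in Roma.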

November 3, 2023 · 12 min · Nakamura

Using Versioning Machine (VM5.0) with Visual Studio Code (VSCode)

Overview Versioning Machine (VM5.0) is an application for visualizing textual variant information. http://v-machine.org/ This article explains how to use Visual Studio Code (VSCode) to display your own TEI/XML files in this application. The target TEI/XML files contain variant information described using the <listWit> tag, as shown below: [TEI/XML sample garbled in extraction: a teiHeader whose <listWit> declares two witnesses with Japanese and German titles, xml:id="WA" (ヴァイマル版ゲーテ全集, 略称 WA; the Weimar edition of Goethe’s complete works) and xml:id="UTL" (東京大学総合図書館所蔵のゲーテ自署付書簡; the letter with Goethe’s autograph held in the University of Tokyo General Library)] As described later, this article uses text data from a letter with Goethe’s autograph held in the University of Tokyo General Library, which is publicly available at the following link: ...
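
As a rough illustration of how witnesses declared in a <listWit> can be read programmatically, here is a minimal sketch using only the Python standard library. The sample XML is a simplified stand-in, not the article's file: the witness labels are placeholders, and `list_witnesses` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified stand-in document; the witness labels are placeholders.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc>
    <listWit>
      <witness xml:id="WA">Weimar edition</witness>
      <witness xml:id="UTL">University of Tokyo letter</witness>
    </listWit>
  </sourceDesc></fileDesc></teiHeader>
</TEI>"""


def list_witnesses(xml_text: str) -> dict:
    """Map each witness xml:id to its text content."""
    root = ET.fromstring(xml_text)
    return {
        w.get(XML_ID): "".join(w.itertext()).strip()
        for w in root.iterfind(".//tei:listWit/tei:witness", TEI_NS)
    }


witnesses = list_witnesses(sample)
print(witnesses)
```

Readings elsewhere in the file refer back to these identifiers through their wit attribute, for example wit="#UTL".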

November 3, 2023 · 8 min · Nakamura

I Created a Sample Repository Using CETEIcean and Nuxt 3

Overview I created a sample repository using CETEIcean and Nuxt 3. https://github.com/TEIC/CETEIcean I referenced the following issue. https://github.com/TEIC/CETEIcean/issues/27 The script introduced there did not work with CETEIcean v1.8.0, so I created a minimal repository that works with CETEIcean v1.8.0 and Nuxt 3. Demo Page https://nakamura196.github.io/ceteicean-nuxt3 Source Code https://github.com/nakamura196/ceteicean-nuxt3 Main File https://github.com/nakamura196/ceteicean-nuxt3/blob/main/app.vue Summary I hope this serves as a useful reference. I would also like to express my gratitude to those who developed CETEIcean. ...

July 27, 2023 · 1 min · Nakamura

Converting TEI XML to LaTeX Using TEI Critical Apparatus Toolbox

Overview TEI Critical Apparatus Toolbox is “a tool for people preparing a natively digital TEI critical edition.” http://teicat.huma-num.fr/index.php In addition to providing functionality for visualizing critical apparatus information, it offers several other useful features. Among these, I learned that it has a “TEI to LaTeX and PDF conversion” feature, so I decided to try it out. Print an edition Access the following URL. http://teicat.huma-num.fr/print.php Click the link with the text this dummy edition file to download the following sample data. ...

April 19, 2023 · 1 min · Nakamura

Created a Program to Calculate Edit Distance for TEI/XML Files Containing app Elements

Overview I created a program to calculate edit distance for TEI/XML files containing app elements. You can use it from the following Google Colab notebook: https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/編集距離を算出するプログラム.ipynb Upload an XML file and the program will calculate the similarity between witnesses. Example Let’s upload the following XML file: https://tei-eaj.github.io/koui/data/nakamura.xml The result is an Excel file like the following, which provides an overview of the similarity between witnesses.
index | name1 | name2 | distance | ratio
0 | 中村式五十音 | 中村式五十音又様 | 10 | 0.85
1 | 中村式五十音 | 中村式五十音欠損本 | 7 | 0.8947368421052632
2 | 中村式五十音又様 | 中村式五十音欠損本 | 8 | 0.868421052631579
The following library is used for calculating similarity: ...
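
The general idea can be sketched as follows. This is a minimal illustration, not the notebook's code: it uses a plain-Python Levenshtein distance instead of the library the article refers to, it assumes each reading is a <rdg> (or <lem>) child of <app> with a wit attribute, and the similarity formula (1 minus distance divided by the longer length) is an assumption that is not meant to reproduce the ratio column above.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def witness_text(node, wit):
    """Rebuild one witness's text by keeping only its readings inside each <app>."""
    parts = [node.text or ""]
    for child in node:
        if child.tag == TEI + "app":
            for rdg in child:  # <lem> or <rdg> children
                if wit in (rdg.get("wit") or "").split():
                    parts.append(witness_text(rdg, wit))
        else:
            parts.append(witness_text(child, wit))
        parts.append(child.tail or "")
    return "".join(parts)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical two-witness sample, used only for illustration.
sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">あい'
    '<app><rdg wit="#A">う</rdg><rdg wit="#B">ゑ</rdg></app>えお</p>'
)
root = ET.fromstring(sample)
text_a = witness_text(root, "#A")
text_b = witness_text(root, "#B")
distance = levenshtein(text_a, text_b)
similarity = 1 - distance / max(len(text_a), len(text_b))
print(text_a, text_b, distance, similarity)
```

Running the same comparison over every pair of witness identifiers gives a table of the kind shown above.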

January 26, 2023 · 1 min · Nakamura

Collaborative Editing of TEI/XML Files Using Visual Studio Live Share (Not Limited to XML)

Overview Visual Studio Live Share is a VSCode extension that enables real-time collaborative development. https://visualstudio.microsoft.com/ja/services/live-share/ This time, we will try real-time collaborative editing of TEI/XML files using this extension. Demo Video A video of the collaborative editing was recorded. https://youtu.be/DzyuJAtzl90 The right side of the screen shows a user (nakamura196) using VSCode in a local environment, while the left side shows a user (Guest User) invited via Visual Studio Live Share editing using the online VSCode (vscode.dev). ...

January 19, 2023 · 3 min · Nakamura

Trying the jingtrang Library for RELAX NG Schema: Validation

Overview I had an opportunity to create an XML file conforming to a specific schema, and needed to verify that the XML file matched the schema. To meet this requirement, I tried the jingtrang library for working with RELAX NG schemas, so here are my notes: https://pypi.org/project/jingtrang/ I also prepared a Google Colab notebook: https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/jingtrangを試す.ipynb Trying Validation
    # Install the library
    pip install jingtrang
    # Download the RNG file (tei_all is used)
    wget https://raw.githubusercontent.com/nakamura196/test2021/main/tei_all.rng
    # Prepare the XML file to validate (download the Koui Genji Monogatari text)
    wget https://kouigenjimonogatari.github.io/tei/01.xml
Passing Example Running the following produced no output: ...
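
For use from a script rather than a notebook cell, the validation step might be wrapped as in the sketch below. It rests on assumptions: that jingtrang installs a `pyjing` command taking the schema path followed by the XML path, that a Java runtime is available, and that a non-zero exit code signals validation errors. The helper names `build_pyjing_command` and `validate` are hypothetical.

```python
import shutil
import subprocess


def build_pyjing_command(schema_path, xml_path):
    """Return the command line for validating xml_path against schema_path."""
    # Assumption: the wrapper script installed by jingtrang is named "pyjing".
    return ["pyjing", schema_path, xml_path]


def validate(schema_path, xml_path):
    """Run the validator and return (is_valid, messages)."""
    if shutil.which("pyjing") is None:
        raise RuntimeError("pyjing not found; install it with: pip install jingtrang")
    result = subprocess.run(
        build_pyjing_command(schema_path, xml_path),
        capture_output=True,
        text=True,
    )
    # Assumption: exit code 0 means the document is valid.
    return result.returncode == 0, result.stdout + result.stderr


command = build_pyjing_command("tei_all.rng", "01.xml")
print(" ".join(command))
# Example call, once both files are downloaded:
# ok, messages = validate("tei_all.rng", "01.xml")
```

An empty message string alongside a successful result would correspond to the "no output" case described for a passing file.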

January 18, 2023 · 3 min · Nakamura

Converting Word to TEI/XML

Overview I had an opportunity to convert Word files to TEI/XML files. Upon investigation, in addition to official TEI tools such as TEIGarage Conversion, I found a conversion example in TEI Publisher: https://teipublisher.com/exist/apps/tei-publisher/test/test.docx.xml The above example appeared to convert Word style information into TEI tags, so I tried this approach. For this project, I used the python-docx library with the goal of using it independently of TEI Publisher. Word File I created a prototype Word file like the one below. All styles are provisional, but I created styles such as “tei:persName” and “tei:warichu” and changed their visual styling such as color. The mechanism works by applying styles to perform simple structuring. ...
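
The style-to-tag mechanism could look roughly like the sketch below. It is an illustration under assumptions rather than the article's script: the input is a list of (text, style name) pairs, which with python-docx could be collected as `[(r.text, r.style.name) for r in paragraph.runs]`, and the `tei:` prefix convention follows the style names mentioned above. The function name `runs_to_tei` and the sample runs are hypothetical.

```python
import xml.etree.ElementTree as ET

STYLE_PREFIX = "tei:"


def runs_to_tei(runs):
    """Build a <p> element from (text, style_name) pairs.

    Runs whose style name starts with "tei:" become elements named after
    the rest of the style name; all other runs stay plain text.
    """
    p = ET.Element("p")
    last = None  # most recently added child element, if any
    for text, style in runs:
        if style and style.startswith(STYLE_PREFIX):
            last = ET.SubElement(p, style[len(STYLE_PREFIX):])
            last.text = text
        elif last is None:
            p.text = (p.text or "") + text
        else:
            last.tail = (last.tail or "") + text
    return p


# Hypothetical runs standing in for one Word paragraph.
runs = [
    ("Letter from ", "Default Paragraph Font"),
    ("Goethe", "tei:persName"),
    (" dated 1822.", None),
]
tei_xml = ET.tostring(runs_to_tei(runs), encoding="unicode")
print(tei_xml)
```

Keeping the mapping in the style name itself means new tags can be supported by adding a Word style, without changing the code.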

January 17, 2023 · 5 min · Nakamura

Creating a Customized RNG File Using Roma: Restricting Available TEI Tags

Overview In this article, I will attempt to customize TEI ODD (One Document Does-it-all) using a web application called Roma. https://romabeta.tei-c.org/ For more about TEI ODD, please refer to the official site below. I must admit that I do not fully understand it myself due to limited study. https://wiki.tei-c.org/index.php/ODD However, one use case is that in TEI-based projects, you can restrict the tags used (specifically, those that receive assistance and validation). ...

January 12, 2023 · 5 min · Nakamura

An Example Workflow for Creating TEI/XML from Excel

Overview I created an example workflow for generating TEI/XML from data prepared in Excel. The following TEI/XML file is output. It supports page breaks using the pb tag, line IDs using the lb tag, multiple representations using choice/orig/reg tags, annotations using the note tag, and linking with IIIF images. [TEI/XML sample garbled in extraction: a TEI document with a teiHeader, links to IIIF image URLs, and a body marked up with pb, lb, choice/orig/reg, and note elements] An example of visualizing the above TEI/XML data is shown below. 
The image, text (original), text (regularization), and annotations are displayed on the same screen. ...
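
A rough sketch of the row-to-markup step is given below. It is not the article's workflow: the column names (page, line, orig, reg, note), the line ID pattern, and the sample rows are all hypothetical, the rows are plain dictionaries standing in for spreadsheet rows, and the IIIF linking is left out.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def rows_to_ab(rows):
    """Turn spreadsheet-like rows into an <ab> with pb, lb, choice and note."""
    ab = ET.Element("ab")
    current_page = None
    for row in rows:
        if row["page"] != current_page:
            # A new page number starts a new page break.
            current_page = row["page"]
            ET.SubElement(ab, "pb", {"n": str(current_page)})
        lb = ET.SubElement(ab, "lb", {XML_ID: f"p{row['page']}-l{row['line']}"})
        if row.get("reg") and row["reg"] != row["orig"]:
            # Record both the original and the regularized form.
            choice = ET.SubElement(ab, "choice")
            ET.SubElement(choice, "orig").text = row["orig"]
            ET.SubElement(choice, "reg").text = row["reg"]
        else:
            lb.tail = row["orig"]
        if row.get("note"):
            ET.SubElement(ab, "note").text = row["note"]
    return ab


# Hypothetical rows, as they might be read from an Excel sheet.
rows = [
    {"page": 1, "line": 1, "orig": "vniuersity", "reg": "university", "note": "u/v normalized"},
    {"page": 1, "line": 2, "orig": "library", "reg": "", "note": ""},
    {"page": 2, "line": 1, "orig": "Tokyo", "reg": "Tokyo", "note": ""},
]
ab = rows_to_ab(rows)
print(ET.tostring(ab, encoding="unicode"))
```

One row per line keeps the spreadsheet easy to edit by hand, at the cost of allowing only one choice and one note per line in this simplified form.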

January 10, 2023 · 6 min · Nakamura

Created a Custom OpenSeaDragon Viewer for Use in TEI Viewers

Overview I created a Custom OpenSeaDragon Viewer intended for use in TEI viewers. Background In developing a viewer that links TEI and IIIF as shown below, a viewer with the following capabilities was needed. https://www.hi.u-tokyo.ac.jp/collection/digitalgallery/wakozukan/tei/
- Ability to load IIIF manifest files.
- Ability to track page navigation within the viewer component from outside the component.
- Ability to highlight partial regions of images.
Since I could not find an existing IIIF-compatible viewer that met all of the above requirements, I attempted to develop a custom viewer. I also tried publishing it as an npm package. ...

December 26, 2022 · 2 min · Nakamura

Trying Out Gatsby CETEIcean

Overview I tried out Gatsby CETEIcean, created by Raffaele Viglianti. https://github.com/raffazizzi/gatsby-ceteicean-workshop Prototype Site The following is the prototype site. I have added several customizations, including MUI, vertical text display, and links to RDF data. https://nakamura196.github.io/gatsby-ceteicean-workshop/ The TEI/XML files from the “Koui Genji Monogatari Text DB” are used as the data source. https://kouigenjimonogatari.github.io/ Source Code The source code including the customizations can be found at the following link. https://github.com/nakamura196/gatsby-ceteicean-workshop Summary Using Gatsby CETEIcean, it seems possible to efficiently develop publishing environments for TEI/XML files. ...

December 20, 2022 · 1 min · Nakamura

Trying Out TEI Boilerplate

Overview TEI Boilerplate is described as follows: A lightweight solution for publishing TEI (Text Encoding Initiative) P5 content directly in modern browsers. With TEI Boilerplate, you can serve TEI XML files directly to the web without server-side processing or conversion to HTML. The TEI Boilerplate Demo demonstrates many TEI features rendered by TEI Boilerplate. TEI Boilerplate is not a replacement for the many excellent XSLT solutions for publishing and displaying TEI/XML on the web. It is intended to be a simple, lightweight alternative to more complex XSLT solutions. ...

December 17, 2022 · 2 min · Nakamura

Introduction to "FairCopy": A TEI Text Creation Support Tool

Overview A research colleague introduced me to “FairCopy,” a TEI text creation support tool. This tool allows you to create TEI texts through a GUI, and I found it very useful. It is a paid tool, but you can try it for free for 2 weeks, so I am sharing my findings here. Installation By submitting your information through the Sign Up page below, a trial code and the application download link will be displayed. ...

November 11, 2022 · 16 min · Nakamura

How to Use the Text Markup Tool "CATMA"

Overview This article introduces how to use “CATMA,” one of the text markup tools. https://catma.de/ Annotation results can be exported in TEI format, making it possible to create highly interoperable data that can be utilized in other systems. Additionally, though still experimental, a JSON API is also provided. By using this, one could annotate with CATMA and then use the results in other systems via the API. The above includes some untested content and somewhat advanced approaches, but this article will serve as notes on the basic usage of CATMA. ...

November 10, 2022 · 11 min · Nakamura

Trying the MediaWiki TEI Extension (Result: Did Not Work)

Overview An extension has been developed that enables TEI editing in MediaWiki. https://www.mediawiki.org/wiki/Extension:TEI An example of the editing screen is shown below. Scripto, a transcription support module for Omeka S, enables transcription of image data registered in Omeka S by linking Omeka S with MediaWiki. https://omeka.org/s/modules/Scripto/ I tried combining this environment with the TEI extension mentioned above to see if TEI-compliant transcription could be achieved. In the end, however, I was unable to get the TEI extension to work properly. ...

November 10, 2022 · 4 min · Nakamura

[TEI x JavaScript] Removing Unintended Whitespace in Nuxt 3

Problem When loading TEI/XML files and visualizing them with JavaScript (Vue.js, etc.), there were cases where unintended whitespace was inserted. Specifically, when writing HTML like the following:
    <template>
      <div>
        お問い合わせは
        <a href="#">こちらから</a>
        お願いします
      </div>
    </template>
It would render with unintended spaces: “お問い合わせは こちらから お願いします” as shown below. ...

October 25, 2022 · 5 min · Nakamura

Double-Sided Ruby Annotations Using python-docx

This is a memo on how to achieve double-sided ruby (furigana) in Word using python-docx. You can try it from the following notebook. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/python_docxを用いた両側ルビ.ipynb An output example is shown below. An input example is shown below. [Input XML garbled in extraction: a <body> of <p> paragraphs containing <ruby> elements, each with an <rb> base text (for example 打球場 or 入学試験) and <rt> readings (for example ビリヤード or にゅうがくしけん) assigned to a side by a place attribute] The program is still incomplete, but I hope it serves as a helpful reference. ...

October 4, 2022 · 2 min · Nakamura

An Example Method for Converting TEI/XML Files to Vertical-Writing PDF

Overview This is a memo documenting one example method for converting TEI/XML files to vertical-writing (tategaki) PDF. You can try the program targeting “Koui Genji Monogatari” (Collated Tale of Genji) in the following notebook. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/TEI_XMLファイルを縦書きPDFに変換する.ipynb Conversion Workflow This time, I used Quarto. https://quarto.org/ Please refer to the following for installation instructions. https://quarto.org/docs/get-started/ TEI/XML -> qmd First, convert the contents of the TEI/XML file to a qmd file. Below is a sample conversion script. ...
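
The TEI/XML to qmd step might be sketched as follows. This is not the article's conversion script. It assumes the text sits in <p> elements inside <body>, writes only a minimal YAML front matter (title and `format: pdf`), and leaves out any vertical-writing settings. The function name `tei_to_qmd` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_qmd(xml_text, title):
    """Return qmd source: YAML front matter followed by one block per <p>."""
    root = ET.fromstring(xml_text)
    paragraphs = [
        "".join(p.itertext()).strip()
        for p in root.iterfind(".//tei:body//tei:p", TEI_NS)
    ]
    front_matter = f'---\ntitle: "{title}"\nformat: pdf\n---\n'
    return front_matter + "\n" + "\n\n".join(paragraphs) + "\n"


# Hypothetical sample document, used only for illustration.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>First paragraph.</p>
    <p>Second <hi>paragraph</hi>.</p>
  </body></text>
</TEI>"""

qmd = tei_to_qmd(sample, "Sample")
print(qmd)
```

The returned string can be written to a .qmd file and rendered with Quarto; the layout options needed for vertical text would have to be added to the front matter separately.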

October 3, 2022 · 8 min · Nakamura