Humanities

Conversion and Visualization of the NDL-DocL Dataset (Document Image Layout Dataset)

I created a notebook that converts Pascal VOC format XML files to COCO format JSON files and visualizes the contents of the NDL-DocL Dataset (Document Image Layout Dataset) published by NDL Lab. https://github.com/nakamura196/ndl_ocr/blob/main/NDL_DocLデータセット(資料画像レイアウトデータセット)の変換と可視化.ipynb Open the notebook above and select “Runtime” > “Run all cells” to perform the conversion and visualization. The “/content/img” folder and the “/content/dataset_kotenseki.json” file created by the run can then be used in machine learning programs that require COCO format data. ...
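
As a rough illustration of the conversion described in this entry, the following minimal sketch turns Pascal VOC annotations into a COCO-style dictionary. It is not the notebook's actual code: the function name `voc_to_coco`, the sample XML, and the category names `text_block` and `illustration` are hypothetical, and the sketch assumes the standard VOC fields (`filename`, `size`, `object/name`, `object/bndbox`).

```python
import json
import xml.etree.ElementTree as ET


def voc_to_coco(xml_strings, categories):
    """Convert Pascal VOC annotation XML strings into one COCO-style dict."""
    # COCO category ids conventionally start at 1.
    cat_ids = {name: i + 1 for i, name in enumerate(categories)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    ann_id = 1
    for image_id, xml_text in enumerate(xml_strings, start=1):
        root = ET.fromstring(xml_text)
        size = root.find("size")
        coco["images"].append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (
                int(float(box.findtext(tag)))
                for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            w, h = xmax - xmin, ymax - ymin
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": cat_ids[obj.findtext("name")],
                "bbox": [xmin, ymin, w, h],  # COCO uses [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Hypothetical sample annotation, for illustration only.
SAMPLE_XML = """<annotation>
  <filename>page_001.jpg</filename>
  <size><width>1000</width><height>1500</height><depth>3</depth></size>
  <object>
    <name>text_block</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>400</xmax><ymax>600</ymax></bndbox>
  </object>
</annotation>"""

coco = voc_to_coco([SAMPLE_XML], ["text_block", "illustration"])
print(json.dumps(coco["annotations"][0]))
```

The main point of the conversion is the bounding box: Pascal VOC stores corner coordinates (xmin, ymin, xmax, ymax), while COCO stores the top-left corner plus width and height.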

I Created a Program to Extract Differences Between Two Texts

Overview I created a program to extract differences between two texts. You can use it from the following Google Colab notebook. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/校異情報の生成.ipynb A well-known service for this purpose is “difff”, but this time I implemented it using Python. https://difff.jp/ For calculating the differences between texts, I used difflib.SequenceMatcher. https://docs.python.org/ja/3/library/difflib.html Usage You can choose between two output formats: HTML files and TEI files. HTML Here is an example of the HTML file output. ...
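
The difference calculation with difflib.SequenceMatcher mentioned in this entry can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the function name `extract_differences` and the sample strings are hypothetical, and the HTML and TEI output steps are left out.

```python
from difflib import SequenceMatcher


def extract_differences(text_a, text_b):
    """Return (operation, fragment_from_a, fragment_from_b) for each change."""
    # autojunk=False disables the heuristic that ignores very frequent
    # characters in long sequences, so every character is compared.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete', or 'insert'
            diffs.append((tag, text_a[i1:i2], text_b[j1:j2]))
    return diffs


diffs = extract_differences("abcd", "abxd")
print(diffs)  # [('replace', 'c', 'x')]
```

Each tuple records which fragment of the first text corresponds to which fragment of the second, which is the information an HTML or TEI rendering of the variants would need.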

Trying Omeka Classic as a Headless CMS

Overview Omeka S and Omeka Classic are very useful tools for building digital archives and for humanities (informatics) research. https://omeka.org/ They come with a REST API as standard and have high extensibility through the addition of modules and plugins. Various existing assets can also be used, including IIIF-related tools, transcription support tools, and tools for handling spatiotemporal information. On the other hand, I (personally) feel that theme development for changing the appearance of sites requires knowledge of PHP and Omeka, making it relatively difficult. On this point, the Headless CMS approach, where the backend and frontend are separated, has been gaining popularity in recent years. ...

Created an Image Comparison Tool Using Mirador 3

I created an image comparison tool using Mirador 3. The URL is as follows. https://ldas-jp.github.io/viewer/input/ The GitHub repository URL is as follows. https://github.com/ldas-jp/viewer Below is the input form. You specify the URLs of the IIIF manifest files and the Canvas URIs for the images you want to compare. You can check input examples by clicking the buttons under “Examples.” Clicking the “Open” button launches Mirador 3 as shown below. You can compare images based on the input information. ...

Bulk Registration of Annotations Using the IIIF Toolkit for Omeka Classic

Introduction This article is primarily a memorandum. There may be many unclear points, so please bear with me. In particular, I hope this serves as a useful reference for how to use the annotation endpoint used by the IIIF Toolkit, as introduced below. https://github.com/utlib/IiifItems/wiki/The-Mirador-Omeka-Annotator-Endpoint Overview The IIIF Toolkit plugin for Omeka Classic is a very useful tool that can load IIIF manifest files and add annotations to images. https://zenn.dev/nakamura196/books/2a0aa162dcd0eb/viewer/b37a8c This article covers how to bulk register annotations that were created independently of Omeka Classic into Omeka Classic. ...

NDL OCR Now Supports Ruby (Furigana) Text Extraction

Overview For NDL OCR, the default setting previously did not include ruby (furigana) text extraction. Thanks to the cooperation of the NDL team, ruby text extraction can now be switched on or off. https://github.com/ndl-lab/ndlocr_cli/ To enable the feature, change the following entry in config.yaml from False to True. yield_block_rubi: False Please note the following caveats when using this feature: (1) Ruby text is not always split at the exact kanji positions where furigana is placed; multiple ruby sections may be merged into a single output. (2) Because ruby characters are small, they may sometimes be output as a placeholder character. Tutorial Notebook Updates The ruby text extraction option has also been added to the Google Colab tutorial. ...

Bug and Fix for Omeka S Bulk Import

The Bulk Import module for batch registration of items and media in Omeka S has a bug in versions 3.3.28.0 through 3.3.33.2 that prevents media from being registered. If you need to register media, you will need a workaround such as using version 3.3.27.0 or earlier. After I created an issue about this problem, the bug was promptly fixed: https://gitlab.com/Daniel-KM/Omeka-S-module-BulkImport/-/issues/10 As of July 1, only the source code on GitLab has been updated, but the fix should be added to the GitHub Releases soon. Please be aware of this when using this module. ...

Created a Program to Download Data from Omeka Classic

I created a program to download data from Omeka Classic. It is published in the following repository. https://github.com/nakamura196/omekac_backup I also created a Google Colab notebook that demonstrates how to run this program. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/omeka_classic_backup.ipynb The tutorial above downloads data from the following Omeka Classic site. https://jinmoncom2017.omeka.net/ After execution, the API download results are output to the docs folder. You can use this data for backups and similar purposes. I hope this serves as a useful reference when using Omeka Classic. ...

Created a Program to Download Omeka S Data

I created a program to download Omeka S data. It is published in the following repository. https://github.com/nakamura196/omekas_backup I also created a Google Colab notebook that shows how to run this program. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/omekas_backup.ipynb The tutorial above downloads data from the following Omeka S sandbox. https://omeka.org/s/download/#sandbox After execution, the API download results are output to the docs folder, and an MS Excel file summarizing them is output to the data folder. ...
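
The general idea of downloading data page by page from a REST API can be sketched as below. This is only an illustration and not the code in the repository above: the names `download_all`, `fetch_json`, and `fake_fetch` are hypothetical, the base URL is a placeholder, and the sketch assumes that the API serves resources under `/api/<resource>`, accepts `page` and `per_page` query parameters, and returns an empty JSON array once the pages run out.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def download_all(api_base, resource="items", per_page=50, fetch=fetch_json):
    """Collect every record of one resource type by walking through pages."""
    results = []
    page = 1
    while True:
        query = urlencode({"page": page, "per_page": per_page})
        batch = fetch(f"{api_base.rstrip('/')}/{resource}?{query}")
        if not batch:  # an empty page means there is nothing left
            break
        results.extend(batch)
        page += 1
    return results


# Offline demonstration: a stand-in for the network call that serves
# two pages of made-up records and then an empty page.
def fake_fetch(url):
    page = int(parse_qs(urlparse(url).query)["page"][0])
    return {1: [{"o:id": 1}, {"o:id": 2}], 2: [{"o:id": 3}]}.get(page, [])


items = download_all("https://example.org/omeka-s/api", fetch=fake_fetch)
print(len(items))
```

Passing the fetch function as a parameter keeps the paging logic testable without network access; against a real site, the default `fetch_json` would be used instead.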

I Created an IIIF Image API Tool Using Nuxt 3 and Vuetify 3

Overview I created an IIIF Image API tool using Nuxt 3 and Vuetify 3. I developed it because I needed to work with the IIIF Image API and also wanted to learn how to use Nuxt 3. The GitHub repository is as follows. I hope it serves as a useful reference. https://github.com/nakamura196/nuxt3-vuetify3 Usage You can access it from the following URL. https://nv3.netlify.app/ As shown below, pressing the “Example” button inputs a URL into the text form at the top of the screen, and the elements contained in that URL (such as “region” and “size”) are displayed at the bottom of the screen. ...
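
To illustrate what "the elements contained in that URL" means, here is a minimal Python sketch that splits an image request URL into its parts. It is not the tool's actual implementation (which uses Nuxt 3): the function name `parse_iiif_image_url` and the example URL are hypothetical. The sketch relies on the URL pattern defined by the IIIF Image API, `{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`.

```python
from urllib.parse import unquote, urlparse


def parse_iiif_image_url(url):
    """Split a IIIF Image API request URL into its named components."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 5:
        raise ValueError("not a IIIF Image API request URL: " + url)
    # The last four path segments are fixed by the specification;
    # the segment before them is the (possibly percent-encoded) identifier.
    identifier = parts[-5]
    region, size, rotation, last = parts[-4:]
    if "." not in last:
        raise ValueError("missing '{quality}.{format}' segment: " + url)
    quality, _, fmt = last.rpartition(".")
    return {
        "identifier": unquote(identifier),
        "region": region,
        "size": size,
        "rotation": rotation,
        "quality": quality,
        "format": fmt,
    }


# Hypothetical example URL, for illustration only.
info = parse_iiif_image_url(
    "https://example.org/iiif/abcd1234/full/max/0/default.jpg"
)
print(info)
```

Parsing from the end of the path is a deliberate choice: the prefix may contain any number of segments, whereas the last four segments always have the same meaning.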

[Omeka S Module] How to Disable Image API in the IIIF Server Module

Overview The Omeka S module “IIIF Server,” which generates IIIF manifests, can be configured not to use the Image API. This makes it easier to deliver IIIF manifests in resource-limited environments such as shared hosting servers. I previously wrote the following article: https://nakamura196.hatenablog.com/entry/2021/07/22/171657 As of May 2022, the configuration method has changed due to module updates, so this article describes the updated settings. For the advantages and disadvantages of not using the Image API, please refer to the article above. ...

[Omeka S Theme] Partial Mapping Module Support for Bootstrap 5 Theme

Overview When the Mapping module was installed alongside the following Bootstrap 5 theme for Omeka S, display issues occurred on the map-browse page, as described below. https://github.com/ldasjp8/Omeka-S-theme-Bootstrap5 The issue was fixed in the following commit. https://github.com/ldasjp8/Omeka-S-theme-Bootstrap5/commit/d60c93ff6d79b5505d25ef26e31e3776f55199d4 Before Fix The geographic-related forms had display issues. After Fix The display issues with the geographic-related forms were fixed. Summary There are still pages and modules with display issues, but I plan to address them gradually. ...

[Omeka S] How to Use the "IIIF Viewers" Module for Multiple IIIF-Compatible Viewers

Overview I have developed and published the “IIIF Viewers” module for Omeka S, which displays IIIF manifest URI icons and viewers. The development of this module was supported by the National Institute of Japanese Literature. https://github.com/omeka-j/Omeka-S-module-IiifViewers Below, I will explain how to use this module. Installation The module can be installed using the standard method for Omeka S. Specifically, first click on the “Releases” link shown below. Next, click the following link to download the zip file. Extract the downloaded file and place the extracted folder “IiifViewers” into the “modules” folder of your installed Omeka S. ...

Registering DC-NDL (National Diet Library Dublin Core Metadata Description) as a Vocabulary in Omeka S

Here is how to register DC-NDL (National Diet Library Dublin Core Metadata Description) as a vocabulary in Omeka S. First, select “Vocabularies” as shown below. Next, click the button in the upper right. (The translation data for this button label is incorrect; I hope to fix it in the future.) Then, enter the required information as shown on the following screen. The specific information is as follows.

| Category | Field | Value | Notes |
| --- | --- | --- | --- |
| Basic Information | Label | DC-NDL | This value is arbitrary. |
| Basic Information | Namespace URI | http://ndl.go.jp/dcndl/terms/ | |
| Basic Information | Namespace Prefix | dcndl | |
| File | Vocabulary URL | https://www.ndl.go.jp/jp/dlib/standards/meta/2020/12/ndl-terms.rdf | |

As a result, DC-NDL becomes available as a vocabulary as shown below. ...

Fixing the GitHub Repository Demonstrating Mirador 3 Usage with Nuxt 2

I have been demonstrating an example of using Mirador 3 with Nuxt 2 in the following GitHub repository. https://github.com/nakamura196/nuxt-mirador However, I found that the above repository had an issue in the production environment. Specifically, Mirador’s display would break after page navigation. An issue was submitted: https://github.com/nakamura196/nuxt-mirador/issues/1 A pull request fixing the bug was also submitted for this issue. https://github.com/nakamura196/nuxt-mirador/pull/2 Specifically, as shown below, it was necessary to unmount in beforeDestroy. ...

Example of Running SPARQL Queries Against the Japan Search RDF Store Using Google Colab

I created a notebook demonstrating examples of running SPARQL queries against the Japan Search RDF store using Google Colab. I hope it serves as a useful reference when using RDF stores with Python. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/ジャパンサーチのRDFストアを対象したSPARQLチュートリアル.ipynb Other reference sites and tutorials include the following. https://www.kanzaki.com/works/ld/jpsearch/ https://lab.ndl.go.jp/data_set/tutorial/
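
For readers who want a quick idea of how such a query can be issued from Python, here is a minimal sketch using only the standard library. It is not taken from the notebook: the endpoint URL is an assumption to be checked against the Japan Search documentation, and the function names and the sample query are hypothetical. The sketch follows the standard SPARQL protocol, sending the query as a `query` parameter and requesting JSON results through the `Accept` header.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint URL; verify it before use.
ENDPOINT = "https://jpsearch.go.jp/rdf/sparql/"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL results in JSON."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_sparql(query, endpoint=ENDPOINT):
    """Send the query and return the list of result bindings."""
    with urlopen(build_sparql_request(query, endpoint), timeout=30) as resp:
        return json.load(resp)["results"]["bindings"]


# Generic sample query; it does not depend on any particular vocabulary.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3"

request = build_sparql_request(QUERY)
print(request.full_url)  # the request is only built here, not sent
```

Calling `run_sparql(QUERY)` would actually contact the endpoint; each binding in the returned list maps a variable name such as `s` to a dictionary with `type` and `value` keys, as defined by the SPARQL JSON results format.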

How to Register, Update, and Delete researchmap Achievements Using CSV Files

Overview I registered, updated, and deleted achievements on researchmap using CSV files. This article shares the method and the data used. Sample data used this time https://github.com/ldasjp8/researchmap New Registration First, click the “Import” button. When the import dialog appears, select the CSV file for new registration and press the “Consistency Check” button. An example CSV file for registration is stored below. This is an example of new registration to “published_papers.” ...

Added TEI/XML Download Functionality to the "NDL OCR x IIIF" App

I added the ability to download OCR results in TEI/XML format to the app that allows viewing OCR results published in the National Diet Library’s “Next-Generation Digital Library” using an IIIF viewer. https://static.ldas.jp/ndl-ocr-iiif/ Please also refer to the following article about this app. In adding this feature, I updated the UI. The results are divided into “Viewer” and “Data.” For “Viewer,” in addition to the previously provided “Mirador” and “Curation Viewer,” I added “Universal Viewer” and “Image Annotator.” I also added a link to the “Next-Generation Digital Library” and implemented a page called “TEI Viewer” as a simple viewer for TEI/XML files. ...

Created a Sample Repository for Using OpenSeadragon with Vue3

I created a sample repository for using OpenSeadragon with Vue3. Here is a working example. https://static.ldas.jp/vue3-osd/ The source code is available below. https://github.com/ldasjp8/vue3-osd As I am a Vue3 beginner, there may be some errors, but I hope this is helpful.

[Omeka S] How to Set Custom Identifiers in the IIIF Server Module

With the default settings of the Omeka S IIIF Server module, you can access IIIF manifest files using URLs like the following. /iiif/{version}/{id}/manifest Example (version 2): https://shared.ldas.jp/omeka-s/iiif/2/1267/manifest Example (version 3): https://shared.ldas.jp/omeka-s/iiif/3/1267/manifest However, since these URLs use Omeka’s internal ID, it is recommended to use custom identifiers. The solution is to additionally install the Clean Url module and enable “Use the identifiers from Clean Url” in the IIIF Server module settings screen shown below. ...