Posts

Triggering GitHub Actions from Drupal Events

Overview This is a memorandum on how to trigger GitHub Actions from Drupal events. The following site was helpful: https://qiita.com/hmaruyama/items/3d47efde4720d357a39e Pipedream Configuration Create a workflow that includes a trigger and a custom_request. For the trigger, please refer to the following: https://qiita.com/hmaruyama/items/3d47efde4720d357a39e#pipedream側の設定 In custom_request, configure the dispatch settings. https://docs.github.com/ja/rest/repos/repos?apiVersion=2022-11-28#create-a-repository-dispatch-event Configure the settings as follows: curl -L \ -X POST \ -H "Accept: application/vnd.github+json" \ -H "Authorization: Bearer <YOUR-TOKEN>" \ -H "X-GitHub-Api-Version: 2022-11-28" \ https://api.github.com/repos/OWNER/REPO/dispatches \ -d '{"event_type":"webhook"}' ...
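As a rough Python counterpart to the curl command above, the following sketch builds the same repository dispatch request with the standard library only. The function name `build_dispatch_request` is hypothetical, and OWNER, REPO, and `<YOUR-TOKEN>` are the same placeholders as in the curl example. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str,
                           event_type: str = "webhook") -> urllib.request.Request:
    """Build (but do not send) a repository dispatch request (hypothetical helper)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({"event_type": event_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


# Placeholders taken from the curl example; replace them before sending.
req = build_dispatch_request("OWNER", "REPO", "<YOUR-TOKEN>")
# To actually trigger the event: urllib.request.urlopen(req)
```

On the GitHub side, a workflow reacts to this request only if it declares the `repository_dispatch` trigger, optionally restricted to the `webhook` event type.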

Inference App Using a YOLOv5 Model (Character Region Detection)

Overview The character region detection app is published at the following link. https://huggingface.co/spaces/nakamura196/yolov5-char The above app had stopped working, so I fixed it following the same procedure as in the following article. The model used in this app was built using the “Japanese Classical Character Dataset” (held by NIJL and others / processed by CODH) doi:10.20676/00000340. I also made some minor improvements during this fix, which I will introduce here. ...

Getting a List of Untranslated Nodes in Drupal

Overview I had the opportunity to get a list of untranslated nodes in Drupal, so this is a personal note for future reference. Method There are various approaches, but this time I use JSON:API. Let’s assume the master language is Japanese (ja) and the translation language to add is English (en). With JSON:API, the terms of a taxonomy vocabulary called collection, for example, can be retrieved from the following endpoint: https://xxx/jsonapi/taxonomy_term/collection Additionally, by adding /en as follows, the translated information is returned if a translation exists. ...
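One way to turn the /en response into a list of untranslated items can be sketched as follows. This is not necessarily the method used in the article. It assumes that, when no English translation exists, the language-prefixed endpoint falls back to the original entity, so that its `langcode` attribute remains `ja`. The function name `find_untranslated` and the sample records are hypothetical.

```python
def find_untranslated(resources, target_langcode="en"):
    """Return JSON:API resource objects whose langcode is not the target language.

    Assumption: an entity without a translation is returned in its original
    language, so its attributes.langcode differs from target_langcode.
    """
    return [
        resource
        for resource in resources
        if resource["attributes"].get("langcode") != target_langcode
    ]


# Hypothetical excerpt of the "data" array of a JSON:API response.
sample = [
    {"id": "a", "attributes": {"name": "Collection A", "langcode": "en"}},
    {"id": "b", "attributes": {"name": "コレクションB", "langcode": "ja"}},
]

untranslated_ids = [resource["id"] for resource in find_untranslated(sample)]
```

In practice the `data` array would come from the JSON response of the /en endpoint, following the pagination links when the vocabulary is large.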

Launching Jupyter Lab on mdx

Overview I had an opportunity to launch Jupyter Lab on mdx, so here are my notes. Please also refer to the following for mdx setup. References The following video was very helpful. https://youtu.be/-KJwtctadOI?si=xaKajk79b1MxTpJ6 Setup On the Server Install pip sudo apt install python3-pip Add to the PATH nano ~/.bashrc export PATH="$HOME/.local/bin:$PATH" source ~/.bashrc The following command launches Jupyter Lab. jupyter-lab Local Machine Connect via SSH with the following command. ssh -N -L 8888:localhost:8888 mdxuser@xxx.yyy.zzz.lll -i ~/.ssh/mdx/id_rsa Then, access the address displayed in the server console. ...

Fixing an Inference App Using Hugging Face Spaces and a YOLOv5 Model (Trained on NDL-DocL Dataset)

Overview In the following article, I introduced an inference app using Hugging Face Spaces and a YOLOv5 model trained on the NDL-DocL dataset. This app had stopped working, so I fixed it to make it operational again. https://huggingface.co/spaces/nakamura196/yolov5-ndl-layout Here are my notes on the changes made during this fix. Changes The modified app.py is shown below. import gradio as gr from PIL import Image import yolov5 import json model = yolov5.load("nakamura196/yolov5-ndl-layout") def yolo(im): results = model(im) # inference df = results.pandas().xyxy[0].to_json(orient="records") res = json.loads(df) im_with_boxes = results.render()[0] # results.render() returns a list of images # Convert the numpy array back to an image output_image = Image.fromarray(im_with_boxes) return [ output_image, res ] inputs = gr.Image(type='pil', label="Original Image") outputs = [ gr.Image(type="pil", label="Output Image"), gr.JSON() ] title = "YOLOv5 NDL-DocL Datasets" description = "YOLOv5 NDL-DocL Datasets Gradio demo for object detection. Upload an image or click an example image to use." article = "YOLOv5 NDL-DocL Datasets is an object detection model trained on the <a href=\"https://github.com/ndl-lab/layout-dataset\">NDL-DocL Datasets</a>." examples = [ ['『源氏物語』(東京大学総合図書館所蔵).jpg'], ['『源氏物語』(京都大学所蔵).jpg'], ['『平家物語』(国文学研究資料館提供).jpg'] ] demo = gr.Interface(yolo, inputs, outputs, title=title, description=description, article=article, examples=examples) demo.launch(share=False) First, due to Gradio version upgrades, I changed gr.inputs.Image to gr.Image and similar updates. ...

Handling ultralyticsplus: ValueError: Invalid CUDA 'device=0' requested...

Overview I have published an inference app using YOLOv8 at the following link: https://huggingface.co/spaces/nakamura196/yolov8-ndl-layout Initially, the following error occurred: ValueError: Invalid CUDA 'device=0' requested. Use 'device=cpu' or pass valid CUDA device(s) if available, i.e. 'device=0' or 'device=0,1,2,3' for Multi-GPU. torch.cuda.is_available(): False torch.cuda.device_count(): 0 os.environ['CUDA_VISIBLE_DEVICES']: None See https://pytorch.org/get-started/locally/ for up-to-date torch install instructions if no CUDA devices are seen by torch. This error was resolved by adding device as follows: ...

Converting IIIF Curation Lists to TEI Facsimile Elements

Overview I created a library to convert IIIF Curation Lists to TEI facsimile elements. https://github.com/nakamura196/iiif-tei I also prepared a demo page for performing this conversion. https://nakamura196.github.io/nuxt3-demo/iiif-tei-demo A video demonstrating how to use it is available below. https://youtu.be/Y5JlrJbtgz8 I hope this serves as a useful reference.
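To give a sense of the kind of mapping involved, the sketch below converts the `#xywh=` fragment of a Curation List member into the corner coordinates used by a TEI `zone` element (`ulx`, `uly`, `lrx`, `lry`). This is an illustration only, not the implementation of the iiif-tei library. The function name `xywh_to_zone` and the example canvas URI are hypothetical.

```python
from urllib.parse import urldefrag


def xywh_to_zone(member_id: str) -> dict:
    """Convert 'canvas#xywh=x,y,w,h' into zone corner coordinates (hypothetical helper)."""
    canvas, fragment = urldefrag(member_id)
    if not fragment.startswith("xywh="):
        raise ValueError(f"No xywh fragment in: {member_id}")
    x, y, w, h = (int(value) for value in fragment[len("xywh="):].split(","))
    # A zone is described by its upper-left and lower-right corners.
    return {"canvas": canvas, "ulx": x, "uly": y, "lrx": x + w, "lry": y + h}


# Hypothetical canvas URI with a region of width 300 and height 400.
zone = xywh_to_zone("https://example.org/iiif/canvas/p1#xywh=100,200,300,400")
```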

Prototyping entity-lookup Using the Japan Search Utilization Schema

Overview This is a continuation of the following article. I will prototype a package that performs CWRC entity-lookup using the Japan Search utilization schema. Demo You can try it on the following page. https://nakamura196.github.io/nuxt3-demo/entity-lookup/ Entity-lookup is performed against JPS, Wikidata, and VIAF for each type such as Person, Place, and Organization. Library It is published at the following location. https://github.com/nakamura196/jps-entity-lookup Based on the repository https://github.com/cwrc/wikidata-entity-lookup already published by CWRC, I mainly modified the following file to match the Japan Search utilization schema. ...

Trying cwrc's wikidata-entity-lookup

Overview This is a continuation of the following article. One of the features of LEAF-WRITER is described as follows: the ability to look up and select identifiers for named entity tags (persons, organizations, places, or titles) from the following Linked Open Data authorities: DBPedia, Geonames, Getty, LGPN, VIAF, and Wikidata. This feature uses libraries such as the following. https://github.com/cwrc/wikidata-entity-lookup I tried out this feature. Usage npm packages are published at the following locations. ...

Trying the CWRC XML Validator API

Overview One of the editors for TEI/XML is LEAF-WRITER. https://leaf-writer.leaf-vre.org/ It is described as follows: The XML & RDF online editor of the Linked Editing Academic Framework The GitLab repository is below. https://gitlab.com/calincs/cwrc/leaf-writer/leaf-writer One of the features of this tool is described as: continuous XML validation This validation appears to use the following API. https://validator.services.cwrc.ca/ The library seems to be: https://www.npmjs.com/package/@cwrc/leafwriter-validator This time, I tried the above API. ...

RELAX NG and Schematron

Overview When creating TEI/XML with oXygen XML Editor, the following template is generated. <?xml version="1.0" encoding="UTF-8"?> <?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?> <?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?> <TEI xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader> <fileDesc> <titleStmt> <title>Title</title> </titleStmt> <publicationStmt> Publication Information </publicationStmt> <sourceDesc> Information about the source </sourceDesc> </fileDesc> </teiHeader> <text> <body> Some text here. </body> </text> </TEI> I was curious about the following difference, so I am sharing the results of querying GPT-4. <?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?> <?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?> Answer The difference between the 2nd and 3rd lines is the namespace specified in the schematypens attribute. Details are explained below. ...
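To inspect the two declarations programmatically, the following sketch lists the xml-model processing instructions of a document and extracts their pseudo-attributes. It uses only the standard library. The function name `list_xml_models` is hypothetical, and the sample is a shortened version of the template above.

```python
import re
from xml.dom import Node, minidom

SAMPLE = (
    '<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" '
    'type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>\n'
    '<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" '
    'type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>\n'
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'
)


def list_xml_models(xml_text: str) -> list:
    """Return the pseudo-attributes of each xml-model processing instruction."""
    doc = minidom.parseString(xml_text)
    models = []
    for node in doc.childNodes:
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE and node.target == "xml-model":
            # node.data holds the raw 'href="..." type="..." schematypens="..."' text.
            models.append(dict(re.findall(r'([\w:-]+)="([^"]*)"', node.data)))
    return models


models = list_xml_models(SAMPLE)
schema_languages = [model["schematypens"] for model in models]
```

Both entries share the same `href` and differ only in `schematypens`, which is the attribute that tells a validator which schema language to apply to the referenced file.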

TEI Publisher ODD Configuration Examples (1)

Overview This is a memo on configuring ODD settings in TEI Publisher. Hiding Elements in the Output The following was helpful as a reference. https://teipublisher.com/exist/apps/tei-publisher/documentation/odd-customization-other-behaviours Select omit for the behaviour. This caused the pb element to be hidden in the output (in the above example, latex). Adding Line Breaks with lb This may be specific to LaTeX conversion, but by selecting paragraph for the behaviour, a blank line was inserted where lb tags appeared. ...

Using the Docker Version of TEI Publisher

Overview I had an opportunity to use the Docker version of TEI Publisher, so here are my notes. https://teipublisher.com/exist/apps/tei-publisher-home/index.html TEI Publisher is described as follows. TEI Publisher facilitates the integration of the TEI Processing Model into exist-db applications. The TEI Processing Model (PM) extends the TEI ODD specification format with a processing model for documents. That way intended processing for all elements can be expressed within the TEI vocabulary itself. It aims at the XML-savvy editor who is familiar with TEI but is not necessarily a developer. ...

Formatting XML Strings in Python

Overview Notes on programs for formatting XML strings in Python. Program 1 I referenced the following. https://hawk-tech-blog.com/python-learn-prettyprint-xml/ I added processing to remove unnecessary blank lines. from xml.dom import minidom import re def prettify(rough_string): reparsed = minidom.parseString(rough_string) pretty = re.sub(r"[\t ]+\n", "", reparsed.toprettyxml(indent="\t")) # Remove unnecessary line breaks after indentation pretty = pretty.replace(">\n\n\t<", ">\n\t<") # Remove unnecessary blank lines pretty = re.sub(r"\n\s*\n", "\n", pretty) # Replace consecutive line breaks (including blank lines) with a single line break return pretty Program 2 I referenced the following. https://qiita.com/hrys1152/items/a87b4ca3c74ec4997f66 When processing TEI/XML, I recommend registering the namespace. ...
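The effect of registering the namespace can be sketched with the standard library as follows. This is not the article's Program 2, which is not reproduced here. It assumes Python 3.9 or later for `ET.indent`, and the function name `prettify_tei` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Without this registration, ElementTree serializes TEI elements with a
# generated prefix such as ns0: instead of the default namespace.
ET.register_namespace("", TEI_NS)


def prettify_tei(xml_text: str) -> str:
    """Indent a TEI/XML string with tabs (hypothetical helper, Python 3.9+)."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space="\t")
    return ET.tostring(root, encoding="unicode")


raw = f'<TEI xmlns="{TEI_NS}"><text><body><p>Some text here.</p></body></text></TEI>'
pretty = prettify_tei(raw)
print(pretty)
```

Note that `ET.register_namespace` changes a global registry, so it affects every later serialization in the same process.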

How to Convert CMYK Color Images Without Color Inversion

Overview For example, when delivering images via IIIF, converting CMYK color images with ImageMagick as follows would sometimes result in inverted colors. convert source_image.tif -alpha off -define tiff:tile-geometry=256x256 -compress jpeg 'ptif:output_image.tif' Original image (using an image published on Nuno LAB.) Display example in Image Annotator (created by Masahide Kanzaki) This is not a problem with image servers such as Cantaloupe Image Server or IIPImage, nor with viewers such as Image Annotator, Mirador, or Universal Viewer. Rather, the issue lies in the generated tiled TIFF images. ...

Counting Triples in an RDF Store 2: Co-occurrence Frequency

Overview I had the opportunity to count co-occurrence frequencies for RDF triples, so here are my notes. Following the previous article, I will again use the Japan Search RDF store as an example. Example 1 The following query counts the pairs of sword-type instances that share a common creator (schema:creator). The filter excludes pairing an instance with itself and, by requiring ?entity1 < ?entity2, counts each pair only once. select (count(*) as ?count) where { ?entity1 a type:刀剣; schema:creator ?value . ?entity2 a type:刀剣; schema:creator ?value . FILTER(?entity1 != ?entity2 && ?entity1 < ?entity2) } https://jpsearch.go.jp/rdf/sparql/easy/?query=select+(count(*)+as+%3Fcount)+where+{ ++%3Fentity1+a+type%3A刀剣%3B +++++++++++++schema%3Acreator+%3Fvalue+. ++%3Fentity2+a+type%3A刀剣%3B +++++++++++++schema%3Acreator+%3Fvalue+. ++FILTER(%3Fentity1+!%3D+%3Fentity2+%26%26+%3Fentity1+<+%3Fentity2) } ...
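The role of the filter can be checked on a small in-memory example. The sketch below is plain Python rather than SPARQL, and the sword and creator names are made up for illustration. It counts unordered pairs per shared creator, which corresponds to the `?entity1 < ?entity2` condition, and also counts ordered pairs, which is what a filter with only `!=` would yield.

```python
from collections import defaultdict
from math import comb


def count_shared_creator_pairs(creators_by_entity: dict) -> tuple:
    """Return (unordered_pairs, ordered_pairs) of entities sharing a creator."""
    entities_by_creator = defaultdict(set)
    for entity, creators in creators_by_entity.items():
        for creator in creators:
            entities_by_creator[creator].add(entity)

    sizes = [len(entities) for entities in entities_by_creator.values()]
    unordered = sum(comb(n, 2) for n in sizes)   # like ?entity1 < ?entity2
    ordered = sum(n * (n - 1) for n in sizes)    # like ?entity1 != ?entity2 only
    return unordered, ordered


# Hypothetical data: three swords by creator A, two by creator B.
sample = {
    "sword1": {"A"},
    "sword2": {"A"},
    "sword3": {"A", "B"},
    "sword4": {"B"},
}

unordered_pairs, ordered_pairs = count_shared_creator_pairs(sample)
```

As in the query, a pair of instances that shares two different creators is counted once per shared creator.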

Counting the Number of Triples in an RDF Store

Overview Here are my notes on how to count the number of triples in an RDF store. This time, we will use the Japan Search RDF store as an example. https://jpsearch.go.jp/rdf/sparql/easy/ Number of Triples The following query counts the number of triples: SELECT (COUNT(*) AS ?NumberOfTriples) WHERE { ?s ?p ?o . } The result is: https://jpsearch.go.jp/rdf/sparql/easy/?query=SELECT+(COUNT(*)+AS+%3FNumberOfTriples) WHERE+{ ++%3Fs+%3Fp+%3Fo+. } At the time of writing this article (May 6, 2024), there were 1,280,645,565 triples (approximately 1.28 billion). ...
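The query URL shown above can also be assembled programmatically. The following sketch percent-encodes the query with the standard library and appends it to the endpoint as the `query` parameter. The function name `build_query_url` is hypothetical, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://jpsearch.go.jp/rdf/sparql/easy/"
COUNT_QUERY = "SELECT (COUNT(*) AS ?NumberOfTriples) WHERE { ?s ?p ?o . }"


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return the endpoint URL with the SPARQL query encoded as ?query=..."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_query_url(COUNT_QUERY)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```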

Case-Insensitive Search in Drupal's Search API

Overview This is a memo on performing case-insensitive search when using Drupal’s Search API. Method Access the following page and check “Ignore case.” /admin/config/search/search-api/index/<content_type>/processors Then, in the Processor settings at the bottom of the screen, select the fields to which this processing should apply. It was also possible to select all fields as shown below. After reindexing, the above settings take effect. Summary I hope this serves as a helpful reference. ...

Trying Out TEIGarage

Overview TEIGarage is described as follows. https://github.com/TEIC/TEIGarage/ TEIGarage is a webservice and RESTful service to transform, convert and validate various formats, focussing on the TEI format. TEIGarage is based on the proven OxGarage. Trying It Out You can try it out on the following page. https://teigarage.tei-c.org/ We will use the “TEI Minimal” ODD file published at the following URL. This file is also used as one of the presets in Roma. ...

(Machine Translation) The TEI Archive

The following is a machine translation of “The TEI Archive” page. https://tei-c.org/Vault/ Text Encoding Initiative (TEI) The TEI Archive Table of Contents Poughkeepsie Principles Sponsoring Organizations 1. TEI Committee Documents 1987-1998 TEI Advisory Committee Analysis and Interpretation Committee Edited Papers Metalanguage and Syntax Issues Committee Steering Committee Technical Review Committee Text Documentation Committee Text Representation Committee 2. Previous Versions of the Guidelines 3. Unnumbered Reports, Articles, Presentations, etc. 4. Songs, Photos, and Other Ephemera TEI Tite Documents Workgroups That Have Completed Their Work Preliminary Drafts of Electronic Text Editing (MLA, 2006) All Available P5 Releases This page contains archival materials from the Text Encoding Initiative. Spanning the first ten years from the Poughkeepsie Conference of 1988 to the beginning of the process of establishing the TEI Consortium in 1999, these materials were collected from fragments across various servers and personal collections, though much of it derives from the excellent Listserv archive maintained by Wendy Plotkin in Chicago. ...