Posts

Fetching All Records from an OAI-PMH Repository Using Python

Here is a script for fetching all records from an OAI-PMH repository using Python. I hope it serves as a useful reference. import requests from requests import Request import xml.etree.ElementTree as ET # Define the endpoint base_url = 'https://curation.library.t.u-tokyo.ac.jp/oai' # Initial OAI-PMH request params = { 'verb': 'ListRecords', 'metadataPrefix': 'curation', 'set': '97590' } response = requests.get(base_url, params=params) # Prepare the initial request req = Request('GET', base_url,params=params) prepared_req = req.prepare() print("Sending request to:", prepared_req.url) # Output the URL root = ET.fromstring(response.content) data = [] # Fetch all data while True: # Process records for record in root.findall('.//{http://www.openarchives.org/OAI/2.0/}record'): identifier = record.find('.//{http://www.openarchives.org/OAI/2.0/}identifier').text print(f'Record ID: {identifier}') # Other data can be processed here as well data.append(record) # Get resumptionToken and execute next request token_element = root.find('.//{http://www.openarchives.org/OAI/2.0/}resumptionToken') if token_element is None or not token_element.text: break # End loop if no token params = { 'verb': 'ListRecords', 'resumptionToken': token_element.text } response = requests.get(base_url, params=params) root = ET.fromstring(response.content) print("All records have been fetched.") print(len(data))

Retrieving the URL of Site Pages Where Items Are Published in the Omeka S OaiPmh Repository Module

Overview This is a personal note on how to retrieve the URL of site pages where items are published in the Omeka S OaiPmh Repository module. Background The following article introduces how to create custom vocabularies using OaiPmhRepository. https://nakamura196.hatenablog.com/entry/2021/07/25/222651 Please refer to it as well. Retrieving the URL of Site Pages Where Items Are Published Before Fix In a certain customization case, the site page URL was retrieved as follows. This does not work properly when something other than dcterms:identifier is configured in the Clean URL module. Additionally, hardcoded paths like /s/db/record/ can be seen. ...

Bulk Deleting Multiple Content Items Using the Drupal REST API

Overview I had the opportunity to bulk delete multiple content items using the Drupal REST API, so this is a memo of the process. References For a method to bulk delete content without using the REST API, please refer to the following. Preparation First, enable the HTTP Basic Authentication module and the JSON:API module. Additionally, enable DELETE in REST resources. /admin/config/services/rest Execution Example The following custom library is used. ...

Adding Images to IIIF Manifest Files for Audio Materials

Overview This is a note on trying out the Audio Presentation with Accompanying Image recipe. https://iiif.io/api/cookbook/recipe/0014-accompanyingcanvas/ The following is an example displayed in Clover, where the configured image appears in the player. https://samvera-labs.github.io/clover-iiif/docs/viewer/demo?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json Manifest File Description An example is stored at the following location. https://github.com/nakamura196/ramp_data/blob/main/docs/demo/3571280/manifest.json Specifically, it was necessary to add an accompanyingCanvas to the Canvas as follows. { "id": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas", "type": "Canvas", "duration": 156.07999999999998, "accompanyingCanvas": { "id": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas/accompanying", "type": "Canvas", "height": 1024, "width": 1024, "items": [ { "id": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas/accompanying/annotation/page", "type": "AnnotationPage", "items": [ { "id": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas/accompanying/annotation/image", "type": "Annotation", "motivation": "painting", "body": { "id": "https://nakamura196.github.io/ramp_data/demo/3571280/3571280_summary_image.jpg", "type": "Image", "height": 1024, "width": 1024, "format": "image/jpeg" }, "target": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas/accompanying/annotation/page" } ] } ] }, It may be a bit hard to follow, but here is the code example using iiif_prezi3. The accompanyingCanvas is created via create_accompanying_canvas() and associated with the canvas. ...

IIIF Audio/Visual: Describing Multiple VTT Files

Overview This is a note on how to describe multiple VTT files for Audio/Visual materials using IIIF. Here, we describe transcription text in both Japanese and English as shown below. https://ramp.avalonmediasystem.org/?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json Manifest File Description An example is stored at the following location. https://github.com/nakamura196/ramp_data/blob/main/docs/demo/3571280/manifest.json Please also refer to the following article. Specifically, by describing them as multiple annotations as shown below, they were correctly processed by the Ramp viewer. ...

App Development Using Zotero's API and Streamlit

Overview I prototyped an app using Zotero’s API and Streamlit. https://nakamura196-zotero.streamlit.app/ This article is a memo on developing this app. Streamlit The following article was very helpful. https://qiita.com/sypn/items/80962d84126be4092d3c Zotero’s API Zotero’s API is described at the following page. https://www.zotero.org/support/dev/web_api/v3/start This time, I used the following library introduced on the above page. https://github.com/urschrei/pyzotero To use the API, you need to obtain a personal library ID and an API key, which could be obtained by following the Quickstart steps in the README. ...

Displaying Audio Files with Subtitles in an IIIF Viewer

Overview I had the opportunity to display audio files with subtitles in an IIIF viewer, so this is a memo. The target is “Accents and Intonation of the Japanese Language (Part 2)” published in the National Diet Library Historical Sound Archive. OpenAI’s Speech to text was used. Please note that the transcription results may contain errors. The following is a display example in Ramp. https://ramp.avalonmediasystem.org/?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json The following is a display example in Clover. ...

Uploading Multiple Files to mdx.jp Object Storage

Overview This is a personal note on how to upload multiple files to mdx.jp object storage. It references the following video. https://youtu.be/IN_4NS9hO2Y Preparation Working on macOS. brew install s3cmd Configuration (please refer to the video for details.) s3cmd --configure Batch Registration (Sync) The following syncs files in the local rekion folder to s3://rekion/iiif/. s3cmd sync docs/rekion/ s3://rekion/iiif/ --exclude '.DS_Store' Reference find . -name '.DS_Store' -type f -delete Batch ACL Change s3cmd setacl s3://rekion/iiif/ --acl-public --recursive Note (CORS Permission) I prepared the following XML file and attempted to enable CORS. ...

Converting Audio Published on the NDL Historical Sound Archive to mp4

Overview I had an opportunity to convert audio published on the National Diet Library Historical Sound Archive (hereinafter “Rekion”) to mp4, so here are my notes. Format Provided by Rekion Rekion provides files in m3u8 format. For example, let’s check the following: “Lecture: Union of Morality and Economy (Part 1).” https://rekion.dl.ndl.go.jp/pid/3574643 By checking with developer tools, you can confirm that it is accessible from the following URL. https://rekion.dl.ndl.go.jp/contents/3574643/83389ec6-2b45-4fdd-88d6-89628841039f/317d6ab4-32ec-4085-88e6-cfe36ffd34c3/317d6ab4-32ec-4085-88e6-cfe36ffd34c3_hls.m3u8 The IDs that make up this URL can be found in the response from the following API. ...

Customizing Ramp

Overview This is a memo on how to customize Ramp. As a result of customization, the UI is partially localized to Japanese, with the media player and metadata/transcription displayed side by side. Additionally, the query parameter t can be used to specify the playback start time for audio. For example, playback starting from the 140-second mark can be accessed from the following URL. https://ramp-iiif.vercel.app/?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json&t=140 Below is the pre-customization state. ...

Running Ramp Locally

Overview This is a memo on running Ramp locally. Background Ramp is described as follows. Interactive, IIIF powered audio/video media player React components library. The source code is available on the following GitHub repository. https://github.com/samvera-labs/ramp Starting Up It can be started with the following commands. git clone https://github.com/samvera-labs/ramp pnpm i pnpm demo Details are described here. https://github.com/samvera-labs/ramp?tab=readme-ov-file#styleguidist-development Note that the following command will display the style guide page instead. ...

Trying the Mirador 3 Annotations Plugin with an IIIF Manifest Specified via URL Parameters

Overview I prepared a demo page where you can try the Mirador 3 annotations plugin with an IIIF manifest specified via URL parameters. https://mirador-annotations.vercel.app/ By using the iiif-content or manifest parameter, you can target a specified IIIF manifest. https://mirador-annotations.vercel.app/?iiif-content=https://dl.ndl.go.jp/api/iiif/1301543/manifest.json This article is a memo about creating this demo page. Background There is an annotation plugin for Mirador 3 called mirador-annotations. https://github.com/ProjectMirador/mirador-annotations I introduced usage examples in the following article. A demo page is already available at the following link, but it does not provide the ability to specify an IIIF manifest file via URL parameters. ...

Delivering IIIF Images Using mdx.jp Object Storage and Cantaloupe Image Server

Overview This is a personal note on how to deliver IIIF images using mdx.jp object storage and Cantaloupe Image Server, one of the IIIF image servers. Background In the following article, I introduced how to deliver images using mdx.jp object storage. In the following article, I introduced how to deliver images stored in Amazon S3 using Cantaloupe Image Server. By combining these approaches, we aim to address the cost challenges of IIIF image delivery in digital archives. ...

Using mdx Object Storage (Using Cyberduck)

Overview I had the opportunity to use mdx’s object storage, so this is a memo. https://mdx.jp/ Pricing The pricing for fiscal year 2024 is as follows. https://mdx.jp/guide/charge It costs 0.01 points (yen) per GB per day, which is approximately 0.3 yen per GB per month. Application Method & Usage with s3cmd The following official tutorial video was helpful. https://www.youtube.com/watch?v=IN_4NS9hO2Y Using Cyberduck The video above introduces file operations using command-line tools. ...

Using Scroll View in Mirador 3

Overview These are notes on how to use Scroll View in Mirador 3. The following example uses “Kagoshima Seitoki no Uchi: The Battle at Takase River Crossing - Nozu’s Brigade Recaptures the Regimental Flag” (held by the National Diet Library). This IIIF manifest consists of three canvases (images), and in the default display mode (Single), images are shown one at a time. Our goal is to display all three connected together. ...

Reverse Proxy Settings for Drupal Running with Docker + Traefik

Overview I was running Drupal with HTTPS using Docker + Traefik, as introduced in the following article. At the time, with Drupal’s default settings, URLs were set with http as shown below. The problem with this was that, for example, when setting up Google account login as described in the following article, the redirect URL started with http, while the Google Cloud console requires URLs starting with https. This discrepancy caused authentication to fail in some cases. ...

Redirecting to HTTPS with Traefik

Overview In the following article, I introduced an example of running HTTPS-enabled containers using Traefik. However, (this has since been fixed) the setting to redirect HTTP connections to HTTPS was missing, and accessing port 80 resulted in a “not found” error. This is a memorandum on how to fix this issue. Before the Change log: # level: DEBUG entryPoints: web: address: :80 websecure: address: :443 api: dashboard: true providers: docker: exposedByDefault: false certificatesResolvers: myresolver: acme: email: aaa@bbb storage: /acme.json caServer: https://acme-v02.api.letsencrypt.org/directory # caServer: https://acme-staging-v02.api.letsencrypt.org/directory httpChallenge: entryPoint: web After the Change log: # level: DEBUG entryPoints: web: address: :80 http: redirections: entryPoint: to: websecure schema: https permanent: true websecure: address: :443 api: dashboard: true providers: docker: exposedByDefault: false certificatesResolvers: myresolver: acme: email: na.kamura.1263@gmail.com storage: /acme.json caServer: https://acme-v02.api.letsencrypt.org/directory # caServer: https://acme-staging-v02.api.letsencrypt.org/directory httpChallenge: entryPoint: web In the entryPoints section, the redirect configuration has been added as follows: ...

Survey of IIIF-Compatible Viewers

Overview I conducted a survey of IIIF-compatible viewers and would like to share the results. There may be some gaps, but I hope it serves as a useful reference. Name URL Icon Mirador https://projectmirador.org/embed/?iiif-content= mirador3.svg Universal Viewer https://uv-v3.netlify.app/#?manifest= uv.jpg Annona https://ncsu-libraries.github.io/annona/tools/#/display?url= annoa.png Clover https://samvera-labs.github.io/clover-iiif/docs/viewer/demo?iiif-content= clover.png Glycerine Viewer https://demo.viewer.glycerine.io/viewer?iiif-content= glycerine.jpg IIIF Curation Viewer http://codh.rois.ac.jp/software/iiif-curation-viewer/demo/?manifest= icp-logo.svg Image Annotator https://www.kanzaki.com/works/2016/pub/image-annotator?u= ia-logo.png TIFY https://tify.rocks/?manifest= tify-logo.svg References The following IIIF 3.0 Viewer Matrix was particularly helpful. ...

Large Videos Not Playing in Chrome

Overview I encountered an issue where large videos could not be played in Chrome. However, they could be played in Safari. When I checked with the developer tools, the download was being cancelled. The viewer part looks like this: <video controls="controls" preload="none" style="width: 620px; height: 465px;" width="100%" height="100%"> <source src="https://omeka.aws.ldas.jp/files/original/c486fe4ae8d926034678fa11b0d6b2fd55b0e695.mp4" type="video/quicktime" title="undefined"> </video> This HTML was generated by the combination of Omeka S + IIIF Server introduced in the following article. Cause Looking at the HTML above, the type is type="video/quicktime", and although the extension is mp4, the actual content is different (a mov file perhaps). ...

Difference Between production and development in Omeka S SetEnv APPLICATION_ENV

The steps to enable detailed error display in Omeka S are as follows. By making this setting, specific error messages and details will be displayed on the “Omeka S has encountered an error” page. Additionally, PHP-level errors and warnings will also be displayed on the page. This is intended to make it easier to identify and resolve issues during development, but for security reasons, it is recommended not to use this in production environments. ...