Minor Modifications to openai-assistants-quickstart

Overview When building a chat interface using RAG (Retrieval-augmented generation) with OpenAI’s Assistants API, I used the following repository. https://github.com/openai/openai-assistants-quickstart A modification was needed regarding the handling of citation, so I am documenting it here as a memo. Background I used the above repository to try RAG with OpenAI’s Assistants API. With the default settings, citation markers like “4:13†” were displayed as-is, as shown below. Solution I modified annotateLastMessage as follows. By changing file_path to file_citation, the citation markers could be replaced. ...

November 28, 2024 · 1 min · Nakamura

Using NDL Classical Book OCR-Lite (ndlkotenocr-lite) on Mac OS

Overview On November 26, 2024, NDL Lab released NDL Classical Book OCR-Lite. https://lab.ndl.go.jp/news/2024/2024-11-26/ This article introduces how to use it on Mac OS. Usage (Video) https://www.youtube.com/watch?v=NYv93sJ6WLU Usage (Text) Access the following. https://github.com/ndl-lab/ndlkotenocr-lite/releases/tag/1.0.0 Select the one containing “macos” from the list. Also select the one matching your chip. Clicking the link downloads “ndlkotenocr-lite_v1.0.0_macos_m1.tar.gz” as shown below. After extracting by double-clicking, the application “NDLkotenOCR-Lite” is extracted inside a macos folder. ...

November 27, 2024 · 1 min · Nakamura

Using processing_config in Archivematica Transfers

Overview This article explains how to use processing_config in Archivematica transfers. Background In Archivematica transfers, you can select a processing_config. The following shows that you can choose from three options: “automated,” “default,” and “mdx.” This can be configured in “Processing configuration” under the “Administration” menu. For example, the following is a configuration example designed for interacting with mdx.jp’s S3-compatible storage. By selecting the target storage for “Store AIP location” as shown below, when this processing configuration is selected, the AIP will be saved to that storage. ...

November 19, 2024 · 1 min · Nakamura

Connecting GakuNin RDM and figshare

Overview I had the opportunity to connect GakuNin RDM and figshare, so this is a note for reference. Work on figshare Create a folder to be linked with GakuNin RDM. First, create a project. In the following example, a project called “My First Project” is created. It appeared that linking with GakuNin RDM could be done on a per-project basis. Configuration on GakuNin RDM Select the project created on the GakuNin RDM side (in this case, “My First Project”). ...

November 19, 2024 · 2 min · Nakamura

Using GakuNin RDM from Next.js

Overview This is a memo on using GakuNin RDM from Next.js. Background In the following article, I introduced how to authenticate with GakuNin RDM using NextAuth.js. As an extension of this, I prototyped a Next.js app that loads data from GakuNin RDM. Demo This is limited to those who can use GakuNin RDM authentication, but you can try it from the following link. https://rdm-app.vercel.app/ For example, below is a page for viewing the list of connected storage. ...

November 19, 2024 · 1 min · Nakamura

Uploading Files and More Using the GakuNin RDM API

Background These are notes on how to upload files and perform other operations using the GakuNin RDM API. References The following article explains how to obtain a PAT (Personal Access Token). The following article introduces a method using OAuth (Open Authorization). If you are using it from a web application, this may be helpful. Method I created the following repository using nbdev. https://github.com/nakamura196/grdm-tools The documentation can be found here. ...

November 16, 2024 · 1 min · Nakamura

Authenticating with ORCID, The Open Science Framework, and GakuNin RDM Using NextAuth.js

Overview This article describes how to perform authentication with ORCID, OSF (The Open Science Framework), and GRDM (GakuNin RDM) using NextAuth.js. Demo Apps ORCID https://orcid-app.vercel.app/ OSF https://osf-app.vercel.app/ GRDM https://rdm-app.vercel.app/ Repository ORCID https://github.com/nakamura196/orcid_app Below is an example of the options configuration. https://github.com/nakamura196/orcid_app/blob/main/src/app/api/auth/[…nextauth]/authOptions.js export const authOptions = { providers: [ { id: "orcid", name: "ORCID", type: "oauth", clientId: process.env.ORCID_CLIENT_ID, clientSecret: process.env.ORCID_CLIENT_SECRET, authorization: { url: "https://orcid.org/oauth/authorize", params: { scope: "/authenticate", response_type: "code", redirect_uri: process.env.NEXTAUTH_URL + "/api/auth/callback/orcid", }, }, token: "https://orcid.org/oauth/token", userinfo: { url: "https://pub.orcid.org/v3.0/[ORCID]", async request({ tokens }) { const res = await fetch(`https://pub.orcid.org/v3.0/${tokens.orcid}`, { headers: { Authorization: `Bearer ${tokens.access_token}`, Accept: "application/json", }, }); return await res.json(); }, }, profile(profile) { return { id: profile["orcid-identifier"].path, // Get ORCID ID name: profile.person?.name?.["given-names"]?.value + " " + profile.person?.name?.["family-name"]?.value, email: profile.person?.emails?.email?.[0]?.email, }; }, }, ], callbacks: { async session({ session, token }) { session.accessToken = token.accessToken; session.user.id = token.orcid; // Add ORCID ID to session return session; }, async jwt({ token, account }) { if (account) { token.accessToken = account.access_token; token.orcid = account.orcid; } return token; }, }, }; OSF https://github.com/nakamura196/osf-app ...

November 15, 2024 · 4 min · Nakamura

Using OldMaps Online

Overview I had the opportunity to use OldMaps Online, so this is a memo of my experience. https://www.oldmapsonline.org/ Registration Log in with a Google account or similar. With a free account, I was able to register one private image. For this example, I use the “Bird’s-eye View of the Main Campus and Faculty of Agriculture Buildings, Tokyo Imperial University” (Graduate School of Agricultural and Life Sciences / Faculty of Agriculture, The University of Tokyo). ...

November 12, 2024 · 2 min · Nakamura

Using Knight Lab's TimelineJS and StoryMapJS from Next.js

Overview This is a memo on how to use Knight Lab’s TimelineJS and StoryMapJS from Next.js. Background Knight Lab’s TimelineJS and StoryMapJS are open source tools for digital storytelling. https://knightlab.northwestern.edu/ Data We use text data from “Shibusawa Eiichi Biographical Materials” published at the following location. https://github.com/shibusawa-dlab/lab1 Repository Published at the following location. https://github.com/nakamura196/shibusawa StoryMap By preparing a component like the following, it was possible to use it from Next.js. ...

November 7, 2024 · 1 min · Nakamura

Building a Character Detection Model Using YOLOv11x and the Japanese Classical Character Dataset

Overview I had the opportunity to build a character detection model using YOLOv11x and the Japanese Classical Character (Kuzushiji) Dataset, so this is a memo of the process. http://codh.rois.ac.jp/char-shape/ References Previously, I performed a similar task using YOLOv5. You can check the demo and pre-trained models at the following Spaces. https://huggingface.co/spaces/nakamura196/yolov5-char Below is an example of application to publicly available images from the “National Treasure Kanazawa Bunko Documents Database.” ...

November 6, 2024 · 3 min · Nakamura

Training YOLOv11 Classification (Kuzushiji Recognition) Using mdx.jp

Overview We had the opportunity to train a YOLOv11 classification model (for kuzushiji/classical Japanese character recognition) using mdx.jp, so this article serves as a reference. Dataset We target the following “Kuzushiji Dataset”: http://codh.rois.ac.jp/char-shape/book/ Creating the Dataset We format the dataset to match the YOLO format. First, we merge the data, which is separated by book title, into a flat structure. #| export class Classification: def create_dataset(self, input_file_path, output_dir): # "../data/*/characters/*/*.jpg" files = glob(input_file_path) # output_dir = "../data/dataset" for file in tqdm(files): cls = file.split("/")[-2] output_file = f"{output_dir}/{cls}/{file.split('/')[-1]}" if os.path.exists(output_file): continue # print(f"Copying {file} to {output_file}") os.makedirs(f"{output_dir}/{cls}", exist_ok=True) shutil.copy(file, output_file) Next, we split the dataset using the following script: ...

November 6, 2024 · 5 min · Nakamura

Getting a List of Properties for a Specific Vocabulary in Omeka S

Overview Here is how to get a list of properties for a specific vocabulary in Omeka S. Method We will target the following. https://uta.u-tokyo.ac.jp/uta/api/properties?vocabulary_id=5 The following program writes the property list to MS Excel. import pandas as pd import requests url = "https://uta.u-tokyo.ac.jp/uta/api/properties?vocabulary_id=5" page = 1 data_list = [] while 1: response = requests.get(url + "&page=" + str(page)) data = response.json() if len(data) == 0: break data_list.extend(data) page += 1 remove_keys = ["@context", "@id", "@type", "o:vocabulary", "o:id", "o:local_name"] for data in data_list: for key in remove_keys: if key in data: del data[key] # DataFrameに変換 df = pd.DataFrame(data_list) df.to_excel("archiveshub.xlsx", index=False) Result The following MS Excel file is obtained. ...

November 5, 2024 · 4 min · Nakamura

Linking to Other Items Using the Custom Vocab Module in Omeka S

Overview I had an opportunity to link to other items using the Custom Vocab module in Omeka S, so here are my notes. Background The following article explained how to use custom vocabularies. This time, instead of strings or URIs, I will try linking items. Creating an Item Set First, create an item set to store the items to be linked. In this case, I created an item set called “Reuse Condition Display.” ...

November 4, 2024 · 2 min · Nakamura

Running a Local LLM Using mdx.jp 1GPU Pack and Ollama

Overview I had the opportunity to run a local LLM using mdx.jp’s 1GPU pack and Ollama, so this is a memo of the process. https://mdx.jp/mdx1/p/guide/charge References I referred to the following article. https://highreso.jp/edgehub/machinelearning/ollamainference.html Downloading the Model Here, we target llama3.1:70b. After the download is complete, it becomes selectable as shown below. Usage Example We use the following “Shibusawa Eiichi Biographical Materials.” https://github.com/shibusawa-dlab/lab1 Using the API Documentation was found at the following location. ...

November 4, 2024 · 3 min · Nakamura

Achieving Parallel Display of IIIF and TEI Using XSLT

Overview I had the opportunity to implement parallel display of IIIF and TEI using XSLT, so this is a memo of the process. The results can be viewed at the following link. It uses the “Koui Genji Monogatari Text DB.” https://kouigenjimonogatari.github.io/xml/xsl/01.xml Background For visualizing TEI/XML, I had previously often used CETEICean, a JavaScript library for converting TEI XML to HTML and displaying it in browsers. These efforts enabled flexible development when combined with JavaScript frameworks. ...

November 2, 2024 · 3 min · Nakamura

Creating a Transparent Text PDF from a Single Page Using Google Cloud Vision API

Overview I had the opportunity to create a transparent text PDF from a PDF using Google Cloud Vision API, so this is a personal note for future reference. Below is an example of searching for simple. Background This time, we target PDFs consisting of a single page. Procedure Creating the Image Create an image to be used as the OCR target. With the default settings, the resulting image was blurry, so I set the resolution to 2x and performed position alignment considering the resolution in the process described below. ...

November 2, 2024 · 3 min · Nakamura

Using the Zotero API from Next.js

Overview I looked into how to use the Zotero API from Next.js, so this is a memo. As a result, I created the following application. https://zotero-rouge.vercel.app/ Library I used the following library. https://github.com/tnajdek/zotero-api-client Getting the API Key and Other Information Please refer to the following article. Usage Collection List // app/api/zotero/collections/route.js import { NextResponse } from "next/server"; import api from "zotero-api-client"; import { prisma } from "@/lib/prisma"; import { decrypt } from "../../posts/encryption"; import { getSession } from "@auth0/nextjs-auth0"; async function fetchZoteroCollections( zoteroApiKey: string, zoteroUserId: string ) { const myapi = api(zoteroApiKey).library("user", zoteroUserId); const collectionsResponse = await myapi.collections().get(); return collectionsResponse.raw; } Specific Collection // app/api/zotero/collection/[id]/route.ts import { NextResponse } from "next/server"; import api from "zotero-api-client"; import { prisma } from "@/lib/prisma"; import { decrypt } from "@/app/api/posts/encryption"; import { getSession } from "@auth0/nextjs-auth0"; async function fetchZoteroCollection( zoteroApiKey: string, zoteroUserId: string, collectionId: string ) { const myapi = api(zoteroApiKey).library("user", zoteroUserId); const collectionResponse = await myapi.collections(collectionId).get(); return collectionResponse.raw; } List of Items in a Specific Collection // app/api/zotero/collection/[id]/items/route.ts import { NextResponse, NextRequest } from "next/server"; import api from "zotero-api-client"; import { prisma } from "@/lib/prisma"; import { decrypt } from "@/app/api/posts/encryption"; import { getSession } from "@auth0/nextjs-auth0"; async function fetchZoteroCollection( zoteroApiKey: string, zoteroUserId: string, collectionId: string ) { const myapi = api(zoteroApiKey).library("user", zoteroUserId); const collectionResponse = await myapi .collections(collectionId) .items() .get(); return collectionResponse.raw; References The application is hosted on Vercel, using Vercel Postgres for the database and Prisma as the ORM. The UI was built with Tailwind CSS, using design suggestions from ChatGPT. Auth0 was adopted for authentication. ...

November 1, 2024 · 2 min · Nakamura

Customizing the LEAF Writer Editor Toolbar

Overview LEAF Writer provides buttons at the top of the screen to support tag insertion. This article introduces how to customize them. As a result, I added functionality to insert <app><lem>aaa</lem><rdg>bbb</rdg></app>. https://youtu.be/XMnRP7s2atw Editing Edit the following file: packages/cwrc-leafwriter/src/components/editorToolbar/index.tsx Features for supporting tags such as person names and place names are configured as follows. For example, the description for organization has been commented out: ... const items: (MenuItem | Item)[] = [ { group: 'action', hide: isReadonly, icon: 'insertTag', onClick: () => { if (!container.current) return; const rect = container.current.getBoundingClientRect(); const posX = rect.left; const posY = rect.top + 34; showContextMenu({ // anchorEl: container.current, eventSource: 'ribbon', position: { posX, posY }, useSelection: true, }); }, title: 'Tag', tooltip: 'Add Tag', type: 'button', }, { group: 'action', type: 'divider', hide: isReadonly }, { color: entity.person.color.main, group: 'action', disabled: !isSupported('person'), hide: isReadonly, icon: entity.person.icon, onClick: () => window.writer.tagger.addEntityDialog('person'), title: 'Tag Person', type: 'iconButton', }, { color: entity.place.color.main, group: 'action', disabled: !isSupported('place'), hide: isReadonly, icon: entity.place.icon, onClick: () => window.writer.tagger.addEntityDialog('place'), title: 'Tag Place', type: 'iconButton', }, /* { color: entity.organization.color.main, group: 'action', disabled: !isSupported('organization'), hide: isReadonly, icon: entity.organization.icon, onClick: () => window.writer.tagger.addEntityDialog('organization'), title: 'Tag Organization', type: 'iconButton', }, ... As a result, the choices are limited as follows: ...

October 31, 2024 · 4 min · Nakamura

A Python Library for Visualizing the Contents of Archivematica METS Files

Overview I created a Python library for visualizing the contents of Archivematica METS files. For example, it visualizes aggregated results of processes (premis:event) performed during AIP creation, as shown below. Background In the following article, I introduced METSFlask, a web application for exploring Archivematica METS files in a human-friendly way. What I created this time is a library version of the functionality provided by METSFlask, making it easier to use outside of Flask. ...

October 31, 2024 · 1 min · Nakamura

Using LEAF Writer from Next.js

Overview This article introduces how to use LEAF Writer from Next.js. Demo You can try it from the following URL. https://leaf-writer-nextjs.vercel.app/ Below is a screenshot example. The header section is the part added using Next.js. The editor section uses LEAF Writer. The source code is available at the following link. https://github.com/nakamura196/leaf-writer-nextjs Usage Instructions are described at the following link. https://gitlab.com/calincs/cwrc/leaf-writer/leaf-writer/-/tree/main/packages/cwrc-leafwriter?ref_type=heads As a note, the div container’s id must be set to leaf-writer-container. I found that not doing so causes the styling to break. I would like to submit a pull request regarding this in the future. ...

October 29, 2024 · 1 min · Nakamura