Tech | デジタルアーカイブシステムの技術ブログ

Linked Dataを使ったデータ記述の応用例

概要 RDFに関連して、以下のような記事を執筆しました。これらをまとめて可視化してみましたので、備忘録です。データ今回は以下のデータを使用しました。「中村覚（0000-0001-8245-7925）」という人物が東京国立博物館で所蔵されている「冨嶽三十六景・凱風快晴（cobas-166407）」に関心があり、その作成者は「葛飾北斎」である、といった内容を記述しました。 ttlによる記述は以下です。 @prefix dcterms: <http://purl.org/dc/terms/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . <https://orcid.org/0000-0001-8245-7925> foaf:topic_interest <https://jpsearch.go.jp/data/cobas-166407> . <https://jpsearch.go.jp/data/cobas-166407> dcterms:creator <https://jpsearch.go.jp/entity/chname/葛飾北斎> . 変換処理ジャパンサーチおよびORCIDではRDFが提供されているため、それらを追加することができます。その結果、以下のようなttlに拡張されました。 @prefix dcterms: <http://purl.org/dc/terms/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix ns1: <http://schema.org/> . @prefix ns2: <https://jpsearch.go.jp/term/property#> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix prov: <http://www.w3.org/ns/prov#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . <https://orcid.org/0000-0001-8245-7925> a prov:Person, foaf:Person ; rdfs:label "Satoru Nakamura" ; foaf:account <https://orcid.org/0000-0001-8245-7925#orcid-id> ; foaf:based_near [ ] ; foaf:familyName "Nakamura" ; foaf:givenName "Satoru" ; foaf:page <https://researchmap.jp/nakamura.satoru/?lang=english> ; foaf:publications <https://orcid.org/0000-0001-8245-7925#workspace-works> ; foaf:topic_interest <https://jpsearch.go.jp/data/cobas-166407> . <https://jpsearch.go.jp/data/cobas-166407> a <https://jpsearch.go.jp/term/type/絵画> ; rdfs:label "冨嶽三十六景・凱風快晴" ; dcterms:creator <https://jpsearch.go.jp/entity/chname/葛飾北斎> ; ns1:abstract """This woodblock print was produced by the ukiyo-e artist Katsushika Hokusai in the late Edo period. It is one of a series of works called Thirty-Six Views of Mount Fuji. It is also one of the most famous Japanese pictures in the world. The “mild breeze” of the title refers to a gentle wind that blows in from the south. The foot of Mount Fuji spreads out expansively. There are no human figures and the scene focuses on the grandeur of nature. The most distinctive feature is the reddened mountain surface. Perhaps this redness is the color of Fuji's volcanic surface on a clear day, or maybe the mountain has been dyed this color by the dawn sunlight. This is a woodblock print, but the same block can produce different results depending on the printing technique. Some versions of Mild Breeze on a Fine Day use lighter colors for the sky and Mount Fuji. This creates a sense of the landscape changing color as the morning sun slowly rises. However, this version from Tokyo National Museum's collection uses more vivid hues for the blue sky and the red mountain. This emphasizes the sense of Mount Fuji captured on a hot, clear day. These variations enhance the enjoyment of woodblock prints."""@en, """これは、江戸時代の後半に、浮世絵師の葛飾北斎が描いた木版画で、富士山をテーマとした46枚シリーズの一枚です。日本の絵画の中でも、世界的に知られた一枚ではないでしょうか。タイトルは「冨嶽三十六景」なのに46枚あるのは、人気が高く当初の予定から10枚増えたためです。作品タイトルにある「凱風」とは、南から吹くおだやかな風のこと。富士山の裾野までを広くゆったりと描いています。人物も描かれておらず、自然の雄大さが感じられる景色です。何より特徴的なのは、赤くあらわされた山肌です。これは、晴天の中に立つ富士山の赤土の色なのでしょうか、それとも朝焼けに染まった色なのでしょうか。この作品は版画ですから、同じ版木を使っていても、摺りごとに違いがあらわれます。実は同じ「凱風快晴」でも、空や富士山がもっと淡い色をしたバージョンもあり、そちらはだんだんと昇る朝日を受けて景色が色づくプロセスを描いた、といった雰囲気です。一方、東博所蔵のこちらのバージョンでは、青空や赤い富士山の色がよりはっきりとあらわされ、暑い晴天の中に立つ富士、という強さがあるようです。こうした摺りごとのバリエーションが見られるのも、版画の面白さのひとつです。この作品をはじめ、「冨嶽三十六景」シリーズの最初の頃には、「ベロ」と呼ばれた青い顔料「ベルリン・ブルー」が多く使われています。この時代にヨーロッパから日本に入ってきたもので、北斎の新しいものへの関心の高さが見てとれます。"""@ja ; ns1:creator <https://jpsearch.go.jp/entity/chname/葛飾北斎> ; ns1:dateCreated "1801-1900" ; ns1:description "Category: Paintings, sketches, and prints:Japan:Ukiyo-e", "Genre: Painting", "Material: Woodblock print ([nishiki-e])", "分類: 絵画", "品質形状: 横大判　錦絵", "員数: 1枚", "法量: 25.3×38.0cm", "種別: 絵画:日本:浮世絵" ; ns1:image <https://colbase.nich.go.jp/media/tnm/A-11176-1/image/slideshow_s/A-11176-1_E0134613.jpg> ; ns1:name "Mild Breeze on a Fine Day from the Series Thirty-Six Views of Mount Fuji"@en, "冨嶽三十六景・凱風快晴"@ja, "ふがくさんじゅうろっけい　がいふうかいせい"@ja-hira, "フガクサンジュウロッケイ　ガイフウカイセイ"@ja-kana ; ns1:temporal <https://jpsearch.go.jp/entity/time/1801-1900> ; ns2:accessInfo <https://jpsearch.go.jp/data/cobas-166407#accessinfo> ; ns2:agential "" ; ns2:sourceInfo <https://jpsearch.go.jp/data/cobas-166407#sourceinfo> ; ns2:temporal "" . <https://jpsearch.go.jp/entity/chname/葛飾北斎> a <https://jpsearch.go.jp/term/type/Agent>, <https://jpsearch.go.jp/term/type/Person> ; rdfs:label "葛飾北斎" ; ns1:birthDate <https://jpsearch.go.jp/entity/time/1760> ; ns1:deathDate <https://jpsearch.go.jp/entity/time/1849> ; ns1:description "1760?-1849, 江戸時代後期の浮世絵師。姓は川村氏、幼名は時太郎、のち鉄蔵。通称は中島八右衛門。号は勝川春朗、宗理、戴斗、為一、画狂老人、卍など。" ; ns1:image <https://commons.wikimedia.org/wiki/Special:FilePath/Hokusai_portrait_whiteBackground.png?width=350> ; ns1:name "Katsushika, Hokusai"@en, "かつしか北斎"@ja, "前北斎"@ja, "前北斎卍"@ja, "前北斎為一"@ja, "前北斎為一老人"@ja, "勝川春朗"@ja, "北斉"@ja, "北斉載斗改葛飾為一"@ja, "北斎"@ja, "北斎主人"@ja, "北斎宗理"@ja, "北斎為一卍"@ja, "画狂人北斎"@ja, "葛飾北斎"@ja, "葛飾卍老人"@ja, "かつしかほくさい"@ja-hira, "カツシカホクサイ"@ja-kana ; ns1:subjectOf <https://jpsearch.go.jp/data/nij15-nijl_nijl_nijl_kotengakutougouhyakka_hagajinmeijiten_00029172> ; ns1:url <https://jpsearch.go.jp/gallery/ndl-xbzl6VVOpkhjzlg> ; rdfs:isDefinedBy <https://jpsearch.go.jp/entity/chname/> ; owl:sameAs <http://collection.britishmuseum.org/id/person-institution/1820>, <http://data.bnf.fr/ark:/12148/cb124954814#about>, <http://dbpedia.org/resource/Hokusai>, <http://edan.si.edu/saam/id/person-institution/2267>, <http://id.ndl.go.jp/auth/entity/00270331>, <http://ja.dbpedia.org/resource/葛飾北斎>, <http://lod.ac/id/1626>, <http://viaf.org/viaf/69033717>, <http://www.wikidata.org/entity/Q5586> . 可視化上記のttlを神崎正英氏が作成されたツールを使って可視化してみます。 ...

ズーム操作を無効化するMirador 3（4）向けプラグインの開発

概要 Mirador 3（実際には、Mirador 4）向けに、ズーム操作を無効化するプラグインを作成しました。 https://github.com/nakamura196/mirador-disable-zoom/ 以下が動作デモです。 https://youtu.be/RN2V4b7IYoI 以下のURLからお試しいただけます。 https://nakamura196.github.io/mirador-disable-zoom/ 本プラグインは、UCLA LibraryによってMirador 2向けに作成された以下のプラグインを参考にしています。 https://github.com/UCLALibrary/mirador-disable-zoom 今回は、本プラグインの開発によって気がついた点をメモします。 targetの指定 targetによって、プラグインのボタンを設置する場所を指定することができました。今回は、WindowTopBarPluginAreaを指定することで、ウインドウ毎のバーに直接アイコンを表示することができました。 import * as actions from 'mirador/dist/es/src/state/actions'; import { getWindowConfig } from 'mirador/dist/es/src/state/selectors'; import MiradorDisableZoom from './plugins/MiradorDisableZoom'; import MiradorDisableZoomMenuItem from './plugins/MiradorDisableZoomMenuItem'; import translations from './translations'; export const miradorDisableZoomPlugin = [ { target: 'OpenSeadragonViewer', mode: 'add', component: MiradorDisableZoom, mapStateToProps: (state, { windowId }) => ({ enabled: getWindowConfig(state, { windowId }).disableZoomEnabled || false, }) }, { target: 'WindowTopBarPluginArea', component: MiradorDisableZoomMenuItem, mode: 'add', mapDispatchToProps: { updateWindow: actions.updateWindow, }, mapStateToProps: (state, { windowId }) => ({ enabled: getWindowConfig(state, { windowId }).disableZoomEnabled || false, }), config: { translations, }, }, ]; https://github.com/nakamura196/mirador-disable-zoom/blob/main/src/index.js 一方、以下の記事で紹介したプラグインでは、WindowTopBarPluginMenuを指定しています。この場合には、以下のように、3点ドットのメニューが表示され、その中にプラグイン毎のアイコンなどを表示できました。用途に応じて、targetを使い分けることができそうです。これらのtargetは以下で指定されているコンポーネントを使用できるようでした。 https://github.com/ProjectMirador/mirador/tree/master/src/containers translations 上記のコードにおいて、config属性のtranslationsを指定しています。 { target: 'WindowTopBarPluginArea', component: MiradorDisableZoomMenuItem, mode: 'add', mapDispatchToProps: { updateWindow: actions.updateWindow, }, mapStateToProps: (state, { windowId }) => ({ enabled: getWindowConfig(state, { windowId }).disableZoomEnabled || false, }), config: { translations, }, }, 上記を指定しないと、同フォルダ内のtranslations.jsで指定した翻訳データが反映されませんでした。 ...

prefix.ccを利用する

概要 RDFデータにおけるprefixの利用にあたり、prefix.ccを使ってみましたので、備忘録です。 https://prefix.cc/ namespace lookup api これは、prefixを与えることで、URIを取得できるサービスのようでした。 https://prefix.cc/about/api 例えば、以下を与えます。 https://prefix.cc/foaf.file.json 結果、以下が得られました。 { "foaf": "http://xmlns.com/foaf/0.1/" } reverse lookup API 例えば、以下を与えます。 https://prefix.cc/reverse?uri=http://xmlns.com/foaf/0.1/&format=json 結果、上記と同じjson結果が得られました。追加するジャパンサーチで利用されている以下のjpsは利用できませんでした。 { "jps": "https://jpsearch.go.jp/term/property#" } そこで以下の画面のように、新規に追加登録してみました。その後は、他のprefixと同様に利用することができました。 https://prefix.cc/jps.file.json 投票によって、登録された情報を評価しているようで、1日に1回、+/-を提出できるようでした。参考 exも登録されていました。 https://prefix.cc/ex.file.json { "ex": "http://example.org/" } まとめ prefixの解決にあたり、参考になりましたら幸いです。

RDFデータの追加と可視化に関する備忘録

概要この記事では、Microsoft Visioを用いたデータからRDF形式への変換プロセスと、特定のURI（今回は葛飾北斎の情報を含むジャパンサーチのURI）が指定された場合のRDFデータへの追加方法について説明します。変換プロセスを通じて、Visioで作成したデータをRDF形式で扱えるようにする手法を紹介し、さらに具体的な例として、葛飾北斎の情報を取り入れた結果を示します。背景以下の記事で、Microsoft Visioを使って作成したデータをRDFに変換する例を紹介しました。今回は、URIが指定されていた場合、そのRDFデータを追加する処理を加えたので、その備忘録です。サンプルデータ今回は、ジャパンサーチで公開されている葛飾北斎のURIを指定しています。処理結果変換後のttlファイルは以下のようになりました。 @prefix ex: <http://example.org/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix ns1: <http://schema.org/> . @prefix ns2: <http://www.w3.org/2003/06/sw-vocab-status/ns#> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . foaf:Person a rdfs:Class, owl:Class ; rdfs:label "Person" ; rdfs:comment "A person." ; rdfs:isDefinedBy foaf: ; rdfs:subClassOf <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing>, foaf:Agent ; owl:disjointWith foaf:Organization, foaf:Project ; owl:equivalentClass ns1:Person, <http://www.w3.org/2000/10/swap/pim/contact#Person> ; ns2:term_status "stable" . ex:BOB ex:knows ex:Alice ; ex:type foaf:Person ; foaf:topic_interest <https://jpsearch.go.jp/entity/chname/葛飾北斎> . <https://jpsearch.go.jp/entity/chname/葛飾北斎> a <https://jpsearch.go.jp/term/type/Agent>, <https://jpsearch.go.jp/term/type/Person> ; rdfs:label "葛飾北斎" ; ns1:birthDate <https://jpsearch.go.jp/entity/time/1760> ; ns1:deathDate <https://jpsearch.go.jp/entity/time/1849> ; ns1:description "1760?-1849, 江戸時代後期の浮世絵師。姓は川村氏、幼名は時太郎、のち鉄蔵。通称は中島八右衛門。号は勝川春朗、宗理、戴斗、為一、画狂老人、卍など。" ; ns1:image <https://commons.wikimedia.org/wiki/Special:FilePath/Hokusai_portrait_whiteBackground.png?width=350> ; ns1:name "Katsushika, Hokusai"@en, "かつしか北斎"@ja, "前北斎"@ja, "前北斎卍"@ja, "前北斎為一"@ja, "前北斎為一老人"@ja, "勝川春朗"@ja, "北斉"@ja, "北斉載斗改葛飾為一"@ja, "北斎"@ja, "北斎主人"@ja, "北斎宗理"@ja, "北斎為一卍"@ja, "画狂人北斎"@ja, "葛飾北斎"@ja, "葛飾卍老人"@ja, "かつしかほくさい"@ja-hira, "カツシカホクサイ"@ja-kana ; ns1:subjectOf <https://jpsearch.go.jp/data/nij15-nijl_nijl_nijl_kotengakutougouhyakka_hagajinmeijiten_00029172> ; ns1:url <https://jpsearch.go.jp/gallery/ndl-xbzl6VVOpkhjzlg> ; rdfs:isDefinedBy <https://jpsearch.go.jp/entity/chname/> ; owl:sameAs <http://collection.britishmuseum.org/id/person-institution/1820>, <http://data.bnf.fr/ark:/12148/cb124954814#about>, <http://dbpedia.org/resource/Hokusai>, <http://edan.si.edu/saam/id/person-institution/2267>, <http://id.ndl.go.jp/auth/entity/00270331>, <http://ja.dbpedia.org/resource/葛飾北斎>, <http://lod.ac/id/1626>, <http://viaf.org/viaf/69033717>, <http://www.wikidata.org/entity/Q5586> . URIを介してRDFデータを追加しなかった場合のデータは以下です。以下と比較することで、foaf:Personや葛飾北斎に関するデータが追加されていることが確認できます。 ...

Content Negotiationを使って、PythonでURIからRDFを取得する

概要 WikidataのエンティティのURIからRDFデータを取得する機会がありましたので、備忘録です。 Content Negotiationを使用しないまず以下のように、headersを空のままリクエストします。 import requests # URL for the Wikidata entity in RDF format url = "http://www.wikidata.org/entity/Q12418" headers = { } # Sending a GET request to the URL response = requests.get(url, headers=headers) # Checking if the request was successful if response.status_code == 200: text = response.text print(text[:5000]) else: print("Failed to retrieve RDF data. Status code:", response.status_code) この場合、以下のように、json形式のテキストデータを取得することができます。 {"entities":{"Q12418":{"pageid":14002,"ns":0,"title":"Q12418","lastrevid":2176343952,"modified":"2024-06-11T11:43:44Z","type":"item","id":"Q12418","labels":{"fr":{"language":"fr","value":"La Joconde"},"es":{"language":"es","value":"La Gioconda"},"en":{"language":"en","value":"Mona Lisa"},"af":{"language":"af","value":"Mona Lisa"},"am":{"language":"am","value":"\u121e\u1293 \u120a\u12db"},"ar": ... ただし、求めているRDFデータは取得できません。 Content Negotiationを使用する次にheadersにRDF/XMLを指定します。 import requests from lxml import etree # URL for the Wikidata entity in RDF format url = "http://www.wikidata.org/entity/Q12418" # Headers to request RDF/XML format headers = { 'Accept': 'application/rdf+xml' } # Sending a GET request to the URL response = requests.get(url, headers=headers) # Checking if the request was successful if response.status_code == 200: tree = etree.fromstring(response.content) # Convert _Element to _ElementTree tree = etree.ElementTree(tree) # Now you can use the write method tree.write("output.xml", pretty_print=True, xml_declaration=True, encoding='UTF-8') else: print("Failed to retrieve RDF data. Status code:", response.status_code) 結果、以下のようにRDFデータを取得することができました。 ...

iiif-prezi3を試す

概要 IIIF Presentation API 3が普及しつつありますが、その仕様を理解しつつ、JSONファイルを直接作成することが難しく感じるようになりました。そこで、以下のPythonライブラリを使用してみましたので、備忘録です。 https://github.com/iiif-prezi/iiif-prezi3 以下の記事で紹介した東寺百合文書WEBで公開されているデータのIIIFへの変換にあたり、本ライブラリを使用しています。読みにくいもので恐縮ですが、ソースコードも以下のリポジトリで公開していますので、参考になりましたら幸いです。 https://github.com/nakamura196/toji_iiif コレクションの作成以下のようなコードにより、IIIFコレクションを作成できました。 import iiif_prezi3 iiif_prezi3.config.configs['helpers.auto_fields.AutoLang'].auto_lang = "ja" collection = iiif_prezi3.Collection( id=f"{origin}/set/3/collection.json", label="東寺百合文書", viewingDirection="right-to-left", provider=iiif_prezi3.ProviderItem( id=self.homepage, label=self.attribution, ), homepage=iiif_prezi3.HomepageItem( id=self.homepage, type="Text", label=self.attribution, format="text/html", language="ja" ), metadata=[ iiif_prezi3.KeyValueString(label="Attribution", value=self.attribution), iiif_prezi3.KeyValueString(label="Rights", value=self.rights), ], rights=self.rights, ) opath = f"{self.docs_dir}/iiif/set/3/collection.json" os.makedirs(os.path.dirname(opath), exist_ok=True) with open(opath, "w") as f: f.write(collection.json(ensure_ascii=False, indent=2 if IS_DEBUG else None)) iiif_prezi3.config.configs['helpers.auto_fields.AutoLang'].auto_langにjaを与えることで、labelやmetadataの言語フィールドがjaになりました。 { "@context": "http://iiif.io/api/presentation/3/context.json", "id": "https://nakamura196.github.io/toji_iiif/iiif/set/3/collection.json", "type": "Collection", "label": { "ja": [ "東寺百合文書" ] }, "metadata": [ { "label": { "ja": [ "Attribution" ] }, "value": { "ja": [ "京都府立京都学・歴彩館東寺百合文書WEB" ] } }, { "label": { "ja": [ "Rights" ] }, "value": { "ja": [ "https://creativecommons.org/licenses/by/2.1/jp/" ] } } ], "rights": "https://creativecommons.org/licenses/by/2.1/jp/", "provider": [ { "id": "https://hyakugo.pref.kyoto.lg.jp/", "type": "Agent", "label": { "ja": [ "京都府立京都学・歴彩館東寺百合文書WEB" ] } } ], "homepage": [ { "id": "https://hyakugo.pref.kyoto.lg.jp/", "type": "Text", "label": { "ja": [ "京都府立京都学・歴彩館東寺百合文書WEB" ] }, "format": "text/html", "language": [ "ja" ] } ], "items": [ { "id": "https://nakamura196.github.io/toji_iiif/iiif/3/1/manifest.json", "label": { "ja": [ "イ函/1/：山城国紀伊郡司解案" ] }, "type": "Manifest" } ] } iiif_prezi3.KeyValueString関数を使用することで、フィールドや値が配列として出力される点も有用かと思いました。 ...

Archivematicaの日本語ファイル名変換を修正する

概要デフォルト設定のArchivematicaに日本語ファイル名のファイルを入力すると、「ユースケース公募提案書.docx」というファイル名は以下のように変換されます。 yu-suke-suGong_Mu_Ti_An_Shu_.docx このファイル名変換をカスタマイズする方法について説明します。概要ファイル名の変換は以下で行われています。 https://github.com/artefactual/archivematica/blob/qa/1.x/src/MCPClient/lib/clientScripts/change_names.py 具体的には、以下です。 decoded_name = unidecode(basename) Google Colabでの実行例は以下です。 https://colab.research.google.com/github/nakamura196/000_tools/blob/main/unidecodeを試す.ipynb カスタマイズ今回は、pykakasiを使用してみます。 https://codeberg.org/miurahr/pykakasi また、DockerでArchivematicaを起動しているとします。以下の記事を参考にしてください。まず、以下にpykakasiを追記します。 https://github.com/artefactual/archivematica/blob/qa/1.x/requirements-dev.txt そして、以下のファイルも修正します。 https://github.com/artefactual/archivematica/blob/qa/1.x/src/MCPClient/lib/clientScripts/change_names.py import os import re import shutil from unidecode import unidecode import pykakasi # 初期化 kakasi = pykakasi.kakasi() # テキストをローマ字に設定 kakasi.setMode("H", "a") # 平仮名をローマ字に kakasi.setMode("K", "a") # カタカナをローマ字に kakasi.setMode("J", "a") # 漢字をローマ字に kakasi.setMode("r", "Hepburn") # ヘボン式ローマ字に設定 # コンバーターを作成 converter = kakasi.getConverter() VERSION = "1.10." + "$Id$".split(" ")[1] # Letters, digits and a few punctuation characters ALLOWED_CHARS = re.compile(r"[^a-zA-Z0-9\-_.]") REPLACEMENT_CHAR = "_" def change_name(basename): if basename == "": raise ValueError("change_name received an empty filename.") # decoded_name = unidecode(basename) decoded_name = converter.do(basename) ... 上記の修正を加えて、Archivematicaを再ビルドした結果、以下のようなファイル名に変換されるようになりました。 ...

DHCフォーマットの中身を確認する

概要 Digital HumanitiesやThe Japanese Association for Digital Humanities (JADH)の年次大会では、以下のdhconvalidatorというツールを使い、DOCXやODT形式のファイルを、DHCファイルに変換して提出することが多いです。 https://github.com/ADHO/dhconvalidator 今回は、このフォーマットを理解するための備忘録です。内容を確認する DHCファイルは以下のように説明されています。 This is essentially a ZIP archive containing their original OCT/DOCX file, an HTML rendering and an XML-TEI rendering, plus a folder with the image files, properly renamed). （機械翻訳）これは基本的に、元のOCT/DOCXファイル、HTMLレンダリング、XML-TEIレンダリングを含むZIPアーカイブであり、適切に名前が付けられた画像ファイルのフォルダも含まれています。したがって、上記のdhconvalidatorを使って作成された.dhc形式のファイルについて、展開してみます。 unzip nakamura.dhc 結果、以下のようにファイルが展開されました。入力元のdocxに加えて、TEIに変換されたXMLファイル、およびHTMLファイルが含まれていました。HTMLファイルは以下のようにブラウザで表示されます。共著者や他者が作成したDHCファイルの内容を確認するには、上記のように展開の上、HTMLファイルを確認するのがよさそうです。 ! 未調査ですが、DHCファイルの閲覧に特化したビューアなどが存在するかもしれません。（参考）変換の仕組み以下のように説明されています。 The DHConvalidator works together with the conference management tool ConfTool and uses TEIGarage (formerly known as OxGarage) to do the bulk conversion. ...

a3mを試す

概要 a3mを試します。 https://github.com/artefactual-labs/a3m a3mは以下のように説明されています。 a3m is a lightweight version of Archivematica focused on AIP creation. It has neither external dependencies, integration with access sytems, search capabilities nor a graphical interface. （機械翻訳）a3mはArchivematicaの軽量版で、AIP（アーカイブ情報パッケージ）の作成に特化しています。外部依存関係、アクセスシステムとの統合、検索機能、グラフィカルインターフェースはありません。 Archivematicaとの違い以下についても機械翻訳結果を掲載します。 The main differentiator is the lack of service dependencies. a3m does not depend on Gearman or external workers, MySQL, Elasticsearch or Nginx. a3m provides its own API server based on the gRPC stack and all processing is performed via system threads and spawned child processes. An embedded database based on SQLite is used to store temporary processing state. ...

さくらのVPSでSSH接続ができなくなった場合の対処法

さくらのVPSでSSH接続ができなくなった場合、シリアルコンソールを利用してログインすることができます。シリアルコンソールは、ネットワーク経由での通常のSSH接続が利用できない場合にも、VPSにアクセスするための緊急回避策として機能します。シリアルコンソールへのアクセス方法は以下の通りです：さくらのVPSコントロールパネルにログインします。対象のVPSを選択します。「シリアルコンソール」のオプションを見つけてクリックします。指示に従ってシリアルコンソールへの接続を行います。シリアルコンソールを使用すると、SSHのキーが誤って設定されているか、ファイアウォールによる通信の遮断があっても、システムの設定や修復作業を行うことが可能です。ただし、コンソールを安全に使用するためには、適切な認証情報が必要です。

docker-compose コマンドでコンテナを再起動すると同時にビルドも行う

docker-compose コマンドでコンテナを再起動すると同時にビルドもしたい場合、以下のようなコマンドを実行できます。まず、ビルドと再起動を個別に実行する方法と、それを一つのコマンドで実行する方法を示します。ビルドと再起動を個別に実行ビルド : docker-compose -f ./docker-compose.prod.yml build 再起動 : docker-compose -f ./docker-compose.prod.yml restart 一つのコマンドでビルドと再起動を実行ビルドしてからサービスを再起動するには、up コマンドを使い、--build オプションを付けてから restart ポリシーを使います。しかし、restart オプションは up コマンドには存在しないので、実際にはサービスを停止して再起動することになります。 docker-compose -f ./docker-compose.prod.yml up --build -d このコマンドはサービスをビルドした後、デタッチドモードで実行します。これにより、コンテナが更新されてからバックグラウンドで自動的に再起動します。ただし、これによる完全な再起動を保証するためには、各サービスの設定で適切な再起動ポリシーが設定されている必要があります。どちらの方法でも目的を達成できますが、作業の流れに合わせて選択してください。

StrapiのData transferを試す

概要 Strapiにおいて、ローカル環境のデータを公開環境に反映させる機会があり、以下のData transferを使ってみました。 https://docs.strapi.io/dev-docs/data-management/transfer 手順公開環境側公開環境側で、Transfer Tokensを発行します。ローカル環境公開サイトをhttps://strapi.example.org、tokenをxxxとします。この時、以下のコマンドにより、ローカル環境のデータを公開環境に反映することができました。 strapi transfer --to https://strapi.example.org/admin --to-token xxx 既存のデータが上書きされるため、その点はご注意ください。 ? The transfer will delete existing data from the remote Strapi! Are you sure you want to proceed? Yes Starting transfer... ✔ entities: 71 transfered (size: 73.6 KB) (elapsed: 1680 ms) ✔ links: 54 transfered (size: 10.3 KB) (elapsed: 687 ms) ... まとめ Strapiの利用にあたり、参考になりましたら幸いです。

@iiif/parserを試す

概要 @iiif/parserというnpmモジュールを知ったので、一部の機能を試してみました。 https://github.com/IIIF-Commons/parser 使い方以下は一例です。v2のIIIFマニフェストを、v3に変換します。 "use client"; import { useState } from "react"; import { convertPresentation2 } from "@iiif/parser/presentation-2"; import { Button, Label, TextInput } from "flowbite-react"; import ComponentsPagesParserPre from "./pages/parser/pre"; type ManifestData = any; export default function ComponentsParser() { const [url, setUrl] = useState<string>( "https://iiif.dl.itc.u-tokyo.ac.jp/repo/iiif/fbd0479b-dbb4-4eaa-95b8-f27e1c423e4b/manifest" ); const [data, setData] = useState<ManifestData>(null); const fetchAndConvertManifest = async ( manifestUrl: string ): Promise<void> => { try { const response = await fetch(manifestUrl); const manifestJson = await response.json(); const convertedManifest = convertPresentation2(manifestJson); setData(convertedManifest); } catch (error) { console.error("Failed to fetch or convert manifest", error); setData("Error fetching or converting manifest."); } }; const handleSubmit = (event: React.FormEvent<HTMLFormElement>): void => { event.preventDefault(); fetchAndConvertManifest(url); }; return ( <> <form className="flex flex-col gap-4" onSubmit={handleSubmit}> <div> <Label htmlFor="url" value="IIIF Manifest URL (v2)" /> <TextInput id="url" type="text" value={url} placeholder="https://example.com/iiif/manifest.json" required onChange={(e) => setUrl(e.target.value)} /> </div> <Button type="submit">Submit</Button> </form> <div className="mt-8"> <ComponentsPagesParserPre data={data} /> </div> </> ); } まず、以下でインポートします。 ...

ndlocr_cli実行時の共有メモリ不足への対応ほか

概要 ndlocr_cli（NDLOCR(ver.2.1)アプリケーションのリポジトリ）を実行した際、いくつか対応を行う必要がありましたので、その備忘録です。なお、これらの対応は私の設定漏れや変則的な使用方法によるものが多く、一般的な使用においては発生しないと思われます。同様の不具合が発生した際の参考としてご覧ください。共有メモリ不足 ndlocr_cliを実行した際、以下のエラーが発生しました。 Predicting: 0it [00:00, ?it/s]ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm). DataLoader worker (pid(s) 3999) exited unexpectedly Chat GPTによる回答は以下でした。「Unexpected bus error encountered in worker」というエラーメッセージは、通常、PyTorchのDataLoaderを使用している際に、共有メモリ（shared memory）が不足している場合に発生します。特に、データセットが大きい場合や多くのワーカーを使用している場合にこの問題が見られることがあります。そして、以下の指示がありました。 Dockerや他の仮想環境を使用している場合は、共有メモリのサイズを増やす必要があります。Dockerを使用している場合は、コンテナを起動する際に --shm-size オプションを設定します。例えば、docker run --shm-size 2G ... のように設定します。これについて、私のdockerの実行コマンドを確認したところ、--shm-sizeの指定が漏れていました。以下のスクリプトでは、--shm-size=256mが指定されていました。 https://github.com/ndl-lab/ndlocr_cli/blob/master/docker/run_docker.sh 上記のオプションを付与して実行したところ、無事、共有メモリ不足のエラーは解消しました。（参考）現在の共有メモリのサイズを確認する以下のコマンドにより確認できました。 df -h /dev/shm 上記のエラーが発生した時、64mとなっていました。 KeyError: ‘STRING’ 何度か、KeyError: 'STRING'に遭遇しました。この対処にあたり、以下の二つのファイルに変更を加えました。 https://github.com/ndl-lab/ndlocr_cli/blob/master/cli/core/inference.py#L681 https://github.com/ndl-lab/ruby_prediction/blob/646de35cefde6fa205f4b6a3ac308e7f5ba91061/output_ruby.py#L104C45-L104C65 line_xml.attrib['STRING']やelm.attrib['STRING']の箇所でエラーが発生していたため、以下の処理を加えました。 if 'STRING' not in line_xml.attrib: continue 参考：プログレスバーの追加 OCR処理中のプログレスバーを表示したいケースがありました。以下の箇所を修正します。 ...

Omeka Sで動画を公開する

概要 Omeka Sで動画を公開する方法について調べてみましたので、備忘録です。標準機能 Omeka Sは標準で動画をサポートしています。以下は標準の機能を使用した例です。以下のmp4ファイルを使用させていただいています。 https://file-examples.com/storage/fe4e1227086659fa1a24064/2017/04/file_example_MP4_480_1_5MG.mp4 具体的には、以下のように<video>タグが使用されていました。 <div class="media-render file"> <video src="https://omeka-d.aws.ldas.jp/files/original/5060f3ba2537676746a7aa69c9884c64daac300b.mp4" controls=""> <a href="https://omeka-d.aws.ldas.jp/files/original/5060f3ba2537676746a7aa69c9884c64daac300b.mp4">5060f3ba2537676746a7aa69c9884c64daac300b.mp4</a> </video> </div> 同様に.movファイルをアップロードしたところ、ブラウザ依存かと思いますが、無事に再生されました。 IIIF Server IIIF Serverモジュールを使用することで、IIIFマニフェストファイルを配信することが可能になります。 https://omeka.org/s/modules/IiifServer/ これをインストールし、合わせて、Universal Viewerをインストールします。 https://omeka.org/s/modules/UniversalViewer/ 結果、以下のように、Universal Viewerが表示され、マニフェストファイルに記載されたメタデータとともに、動画を表示することができました。マニフェストファイルの確認（v2） IIIF Sererモジュールによって生成するマニフェストファイル（v2）を確認したところ、以下のように表示されました。 { "@context": [ "http://iiif.io/api/presentation/2/context.json", "http://wellcomelibrary.org/ld/ixif/0/context.json" ], "@id": "https://omeka-d.aws.ldas.jp/iiif/2/1/manifest", "@type": "sc:Manifest", "label": "mp4", "metadata": [ { "label": "Title", "value": "mp4" } ], "viewingDirection": "left-to-right", "license": "https://rightsstatements.org/vocab/CNE/1.0/", "related": { "@id": "https://omeka-d.aws.ldas.jp/s/test/item/1", "format": "text/html" }, "seeAlso": { "@id": "https://omeka-d.aws.ldas.jp/api/items/1", "format": "application/ld+json" }, "sequences": [ { "@id": "https://omeka-d.aws.ldas.jp/iiif/2/1/sequence/normal", "@type": "sc:Sequence", "label": "Unsupported extension. This manifest is being used as a wrapper for non-IIIF v2 content (e.g., audio, video) and is unfortunately incompatible with IIIF v2 viewers.", "compatibilityHint": "displayIfContentUnsupported", "canvases": [ { "@id": "https://omeka-d.aws.ldas.jp/iiif/ixif-message/canvas/c1", "@type": "sc:Canvas", "label": "Placeholder image", "thumbnail": "https://omeka-d.aws.ldas.jp", "width": null, "height": null, "images": [ { "@id": "https://omeka-d.aws.ldas.jp/iiif/ixif-message/imageanno/placeholder", "@type": "oa:Annotation", "motivation": "sc:painting", "resource": { "@id": "https://omeka-d.aws.ldas.jp/iiif/ixif-message-0/res/placeholder", "@type": "dctypes:Image", "width": null, "height": null }, "on": "https://omeka-d.aws.ldas.jp/iiif/ixif-message/canvas/c1" } ] } ] } ], "mediaSequences": [ { "@id": "https://omeka-d.aws.ldas.jp/iiif/2/1/sequence/s0", "@type": "ixif:MediaSequence", "label": "XSequence 0", "elements": [ { "@id": "https://omeka-d.aws.ldas.jp/files/original/bc5bbd4550ae7b6eab3affbc832bb158b5e280ab.mp4/element/e0", "@type": "dctypes:MovingImage", "label": "mp4", "metadata": [ { "label": "Title", "value": "mp4" } ], "thumbnail": "https://omeka-d.aws.ldas.jp/application/asset/thumbnails/video.png", "rendering": [ { "@id": "https://omeka-d.aws.ldas.jp/files/original/bc5bbd4550ae7b6eab3affbc832bb158b5e280ab.mp4", "format": "video/mp4" } ], "service": { "@id": "https://omeka-d.aws.ldas.jp/iiif/2/5", "profile": "http://wellcomelibrary.org/ld/ixif/0/alpha.json" }, "width": 0, "height": 0 } ] } ] } v2では動画などには非対応のため、mediaSequencesをUniversal Viewerが独自にロードして表示しているようです。 ...

ndlocr_cliをdockerでインストールした後の容量

ndlocr_cliをdockerでインストールした後の容量に関する備忘録です。以下の手順を参考に、ndlocr_cliをセットアップしました。以下のように、50GB弱が使われるようでしたので、残りの容量で入出力の画像ファイルなどを処理する必要があります。（以下では、200GBのディスク容量を割り当てた例です。） mdxuser@ubuntu-2204:~/ndlocr_cli$ df -h Filesystem Size Used Avail Use% Mounted on tmpfs 5.7G 1.4M 5.7G 1% /run /dev/sda2 196G 45G 143G 24% / tmpfs 29G 0 29G 0% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock /dev/sda1 1.1G 6.1M 1.1G 1% /boot/efi tmpfs 5.7G 4.0K 5.7G 1% /run/user/1000 AWS（Amazon Web Services）やmdx（データ活用社会創成プラットフォーム）において、仮想マシンを立ち上げる際の、仮想ディスク(GB)の指定などに役立てば幸いです。参考になりましたら幸いです。

プログラムを使ってDrupalにログインする

プログラムを使ってDrupalにログインする方法に関する備忘録です。以下の記事が参考になりました。 https://drupal.stackexchange.com/questions/185494/how-do-i-programmatically-log-in-a-user-with-a-post-request curl --location 'http://drupal.d8/user/login?_format=json' \ --header 'Content-Type: application/json' \ --data '{ "name": "admin", "pass": "admin" }' 上記のようなリクエストをおくることで、以下のようなレスポンスを取得できました。 {"current_user":{"uid":"1","roles":["authenticated","administrator"],"name":"admin"},"csrf_token":"wBr9ldleaUhmP4CgVh7PiyyxgNn_ig8GgAan9-Ul3Lg","logout_token":"tEulBvihW1SUkrnbCERWmK2jr1JEN_mRAQIdNNhhIDc"} 参考になりましたら幸いです。

WordPress REST APIで非公開の投稿も含めて検索する

背景 WordPress REST APIで非公開の投稿も含めて検索する方法の備忘録です。以下が参考になりました。 https://wordpress.org/support/topic/wordpress-rest-api-posts-not-showing-other-than-published/ 具体的には、以下のように、引数statusを使い、複数の状態を指定することで、それらを含む記事の一覧を取得できました。 GET /wp-json/wp/v2/posts?status=publish,draft,trash 参考になりましたら幸いです。

Drupalのイベントをトリガーとして、GitHub Actionsを起動する

概要 Drupalのイベントをトリガーとして、GitHub Actionsを起動する方法の備忘録です。以下のサイトが参考になりました。 https://qiita.com/hmaruyama/items/3d47efde4720d357a39e pipedreamの設定 triggerとcustom_requestを含むワークフローを作成します。 triggerについては、以下を参考にしてください。 https://qiita.com/hmaruyama/items/3d47efde4720d357a39e#pipedream側の設定 custom_requestにおいて、dispatchに関する設定を行います。 https://docs.github.com/ja/rest/repos/repos?apiVersion=2022-11-28#create-a-repository-dispatch-event 以下のような設定を行います。 curl -L \ -X POST \ -H "Accept: application/vnd.github+json" \ -H "Authorization: Bearer <YOUR-TOKEN>" \ -H "X-GitHub-Api-Version: 2022-11-28" \ https://api.github.com/repos/OWNER/REPO/dispatches \ -d '{"event_type":"webhook"}' Drupalの設定以下のモジュールをインストールします。 https://www.drupal.org/project/webhooks インストール後、以下のページで設定を行います。 /admin/config/services/webhook GitHub Actionsの設定以下のようにrepository_dispatchを設定します。これにより、pipedreamからのリクエストに基づき、GitHub Actionsが実行されます。 name: Build and Deploy to Production on: push: branches: - main # Allows external webhook trigger repository_dispatch: types: - webhook permissions: contents: read concurrency: group: "build-and-deploy" cancel-in-progress: true jobs: ... まとめ pipedreamを使用せずに、Drupalのカスタムモジュールを作成することにより、GitHubに通知を送る方法もありそうです。（すでにそのようなモジュールが開発されている可能性が高そうですが、見つけることができませんでした。） ...

YOLOv5モデル（文字領域検出）を使った推論アプリ

概要以下で文字領域の検出アプリを公開しています。 https://huggingface.co/spaces/nakamura196/yolov5-char 上記アプリが動作しなくなっていたので、以下の記事と同じ手順で修正しました。なお、本アプリで使用しているモデルの構築にあたっては、「『日本古典籍くずし字データセット』（国文研ほか所蔵／CODH加工） doi:10.20676/00000340」を使用しています。この修正において、細かい改善も加えたので、紹介します。 gr.JSONの高さ設定返却結果のJSONデータが大きくなると、結果が見づらいことがありました。そこで、以下のように、demo.cssを設定することにより、 ... demo = gr.Interface(yolo, inputs, outputs, title=title, description=description, article=article, examples=examples) demo.css = """ .json-holder { height: 300px; overflow: auto; } """ demo.launch() 以下のように、スクロールバーとともに結果を表示できるようになりました。矩形のみの返却文字数が多い場合、「Output Image」の画像が見にくいケースがありました。そこで、出力「Output Image with Boxes」を追加しました。以下のような処理によって実現しています。 def yolo(im): results = model(im) # inference df = results.pandas().xyxy[0].to_json(orient="records") res = json.loads(df) im_with_boxes = results.render()[0] # results.render() returns a list of images # Convert the numpy array back to an image output_image = Image.fromarray(im_with_boxes) draw = ImageDraw.Draw(im) for bb in res: xmin = bb['xmin'] ymin = bb['ymin'] xmax = bb['xmax'] ymax = bb['ymax'] draw.rectangle([xmin, ymin, xmax, ymax], outline="red", width=3) return [ output_image, res, im, ] まとめ参考になりましたら幸いです。 ...