In the past decade, various online tools and Open Source software that can be useful in Southeast Asian librarianship have been developed. This article looks at a selection of online tools that are available to help in areas like cataloguing and creation of Romanised versions of Southeast Asian scripts, text recognition, text and image annotation, date conversion, presentation and creative usage of digitised material.

Transliteration and Romanisation tools

Aksharamukha is a free online tool that facilitates the conversion between various writing systems that descended from the third-century BCE Brahmi script. It can be used for Sanskrit- and Pali-based Romanisation of many Southeast Asian scripts. In addition to the simple mapping of characters, Aksharamukha attempts to implement various script/language-specific orthographic conventions such as vowel lengths, gemination and nasalisation. It also provides several customisation options to fine-tune the output and apply the correct orthography. Aksharamukha currently supports 120 scripts, including 40 extinct scripts like Ariyaka, as well as 21 Romanisation methods. It is also possible to upload images of printed text in any of the supported scripts; the tool processes them by automated text recognition, and the recognised text can then be Romanised or converted into any other supported script. A report on the conversion of Burmese script with Aksharamukha is available from the British Library. However, Aksharamukha is not yet suitable for the Romanisation of modern Thai and Lao scripts according to the ALA-LC Romanisation method. Aksharamukha was developed by Vinodh Rajan, a computer scientist and graduate in the field of Digital Paleography.

Screenshot of Aksharamukha displaying some of the supported scripts.

To assist with the Romanisation of modern Thai, the online transliteration tool Plangsarn offers a solution. This free tool is easy to use: a Thai word or phrase is entered into an input field and then converted to its Romanised version according to the ALA-LC standard. It was developed by Thammasat University Library, Bangkok, and the National Electronics and Computer Technology Center (NECTEC), a statutory government organization under the National Science and Technology Development Agency (NSTDA), Ministry of Science and Technology of Thailand. Problems encountered with Plangsarn concern word/syllable separation and capitalisation, which can result in incorrect spacing within words and erroneous capitalisation of names or parts of names. For example, the conversion of the name “มหาวิทยาลัยมหาจุฬาลงกรณราชวิทยาลัย” resulted in “mahāwitthayālai mahā čhulā long kō̜n Na rāt witthayālai”, which according to OCLC should be “Mahāčhulālongkō̜n Rātchawitthayālai”.

A free online tool for the Romanisation of modern Lao script is the Lao Romanisation converter, although its usefulness for cataloguing is limited because it does not support the ALA-LC Romanisation standard. The tool is based on the newly developed Romanisation system MoH 2020, which has been used by the Ministry of Health of Laos since 2020. In this system, each character corresponds to only one phonetic sound (with a few exceptions). Diacritics (accents) and tone marks are not used, and short and long vowels are romanised the same. Geographic names are written in Roman script as a single word with only the first letter capitalised. The Romanisation is based on the Lao spelling reforms introduced by the Lao government in 1975. The tool was initially developed for the Department of Planning and Cooperation, Ministry of Health of Laos, with the hope that it will be adopted as the national Romanisation system by the Lao government to mitigate the risks of the widespread “Karaoke” Romanisation of modern Lao script that is often used in social media.
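The two formatting conventions mentioned above (no diacritics, and geographic names written as a single word with only the first letter capitalised) can be illustrated with a small Python sketch. It does not reproduce the character mapping of the MoH 2020 system, which is not described here; it only tidies a name that has already been romanised. The function name `format_geographic_name` is hypothetical and not part of the Lao Romanisation converter.

```python
import unicodedata


def format_geographic_name(romanised: str) -> str:
    """Apply two formatting conventions to an already romanised name.

    Hypothetical helper for illustration only: it strips diacritics and
    joins the parts of the name into one capitalised word. It does NOT
    implement the MoH 2020 character mapping itself.
    """
    # Decompose accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", romanised)
    # Drop the combining marks (macrons, tone marks and similar).
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters only, which removes spaces and hyphens between parts.
    joined = "".join(ch for ch in plain if ch.isalpha())
    # Upper-case the first letter and lower-case the rest.
    return joined.capitalize()


print(format_geographic_name("Luang Prabang"))
print(format_geographic_name("vīang chan"))
```

Because the helper only changes spacing, capitalisation and accents, its output should not be taken as a valid MoH 2020 form unless the input was itself produced by that system.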

Text recognition and annotation tools

Automated text recognition is becoming increasingly important in the work with manuscripts, not only among scholars and researchers, but also in the library world. Transkribus is a platform that uses machine learning technology to automate text recognition of handwritten and printed documents. By using a transcription editor to manually transcribe historical documents, members of the Transkribus community train specific text recognition models that are capable of recognising handwritten, typewritten or printed documents in any language. A pool of existing text recognition models is available, mainly for European languages, which makes the process of training a specific model for an archive or a group of manuscripts easier and faster. There are many models for non-western languages on Transkribus, but most of them are not yet publicly available. However, one can get in touch with the model creators and ask for them to be shared. Curators at the British Library have trained a model on Arabic scientific manuscripts, for example. Transkribus was developed by the READ project. When the project ended, its members established a cooperative, the READ-COOP, a consortium of leading research groups from all over Europe headed by the University of Innsbruck, to continue the development and maintenance of the software and its community. Transkribus Lite is the web-based instance of Transkribus. Users can upload documents, perform layout analysis, run text detection, and experiment with their own digitised collection items.

Recogito is an online platform for collaborative document annotation that aims to foster better linkages between online resources documenting the past. Recogito provides a personal workspace where users can upload, collect and organise source materials (texts, images and tabular data) and collaborate in their annotation and interpretation. Recogito makes it easier to give research visibility on the Web and to expose the results of research as Open Data. An online tutorial explains in simple steps how Recogito can be used. For Southeast Asian librarianship, two functions are particularly useful for making connections between different sources in different collections: identifying geographical names within annotations as references to places and plotting them on a map, and tagging persons and events. Recogito is an initiative of the Pelagios Network, developed under the leadership of the Austrian Institute of Technology, Exeter University and The Open University, with funding from the Andrew W. Mellon Foundation.

Date conversion

Southeast Asia librarians, cataloguers and curators are often confronted with various calendar or time recording systems that are used to date manuscripts, archival and early printed material as well as published books. There are numerous online tools to assist with date conversion, many of which carry advertising or are presented in a religious context.

The website Ancient Buddhist Texts offers a selection of Buddhist-Christian/Common Era converters specifically for Buddhist calendar systems used in Thailand/Laos/Cambodia and Sri Lanka/Myanmar/India. In addition, it also provides date conversion for the Cūḷasakarat (Chulasakkarat) calendar. The Ancient Buddhist Texts website is maintained by the Theravada monk Bhante Ānandajoti.
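The arithmetic behind such year conversions can be sketched in a few lines of Python. The offsets below are the commonly cited ones (543 years for the Buddhist Era as reckoned in Thailand, Laos and Cambodia, 544 years for the reckoning used in Sri Lanka and Myanmar, and 638 years for the Chulasakkarat era); they are assumptions of this sketch and are not taken from the Ancient Buddhist Texts converters. The names `ERA_OFFSETS` and `to_common_era` are hypothetical. Because the traditional years do not begin on 1 January, the result can be off by one year for dates early or late in the year.

```python
# Commonly cited year offsets (assumptions of this sketch): era year minus
# offset gives the approximate Common Era year.
ERA_OFFSETS = {
    "be_thailand_laos_cambodia": 543,   # Buddhist Era, mainland reckoning
    "be_sri_lanka_myanmar": 544,        # Buddhist Era, Sri Lanka/Myanmar reckoning
    "chulasakkarat": -638,              # Chulasakkarat: CE = CS + 638
}


def to_common_era(year: int, era: str) -> int:
    """Return the approximate Common Era year for a year in the given era.

    Year-level conversion only; new-year boundaries are ignored, so the
    result may differ by one from the exact year for some dates.
    """
    return year - ERA_OFFSETS[era]


print(to_common_era(2567, "be_thailand_laos_cambodia"))
print(to_common_era(1085, "chulasakkarat"))
```

For a colophon that gives a month and day as well as a year, a dedicated converter is still needed to resolve which side of the new year the date falls on.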

A simple and advert-free tool for the conversion of Hijri A.H. (Islamic) dates and Christian (Common Era) dates is available from Islamic Philosophy Online, a website that was developed by members of the Institute of Asian and Oriental Studies at the University of Zurich.
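As an illustration of how a Hijri date can be converted arithmetically, the following Python sketch uses the tabular (civil) Islamic calendar with a 30-year leap cycle and an epoch of 16 July 622 in the Julian calendar. This is one common scheme and not necessarily the method used by the Islamic Philosophy Online converter; the function name `hijri_to_gregorian` is hypothetical. Calendars based on actual moon sightings can differ from the tabular result by a day or two, and Python's `date` type uses the proleptic Gregorian calendar, so early results are not Julian calendar dates.

```python
from datetime import date

# Ordinal (day number in Python's proleptic Gregorian calendar) of
# 1 Muharram 1 AH in the civil reckoning: 19 July 622 proleptic Gregorian,
# which corresponds to 16 July 622 in the Julian calendar.
ISLAMIC_EPOCH_ORDINAL = 227015


def hijri_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a tabular Islamic date to a proleptic Gregorian date."""
    ordinal = (
        ISLAMIC_EPOCH_ORDINAL - 1
        + (year - 1) * 354          # days in the preceding common years
        + (3 + 11 * year) // 30     # leap days accumulated in the 30-year cycle
        + 29 * (month - 1)          # preceding months, counted at 29 days each
        + month // 2                # one extra day for each preceding 30-day month
        + day
    )
    return date.fromordinal(ordinal)


print(hijri_to_gregorian(1445, 9, 1))  # 1 Ramadan 1445 AH in the tabular calendar
```

The month and day arguments are not validated in this sketch, so a caller would need to check that the month is between 1 and 12 and the day between 1 and 30.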

A Javanese calendar (Saka era) online converter can be found on the front page of the website for Javanese literature, Sastra Jawa. This website is run by the non-profit organisation Sastra Lestari whose mission is to preserve and disseminate the literary works of the Indonesian archipelago.

Librarians and researchers working with manuscripts from mainland Southeast Asia often find themselves confronted with colophons mentioning dates according to the luni-solar calendar, for example “eighth day of the waxing moon of the seventh month”. The website timeanddate offers a tool to calculate moon phases at any given place at any time in the past or future (not ad-free, but advertisements can be switched off). This website has been developed by Time and Date AS, a company based in Norway with a team of almost 30 programmers, designers, journalists, and administrative staff from four different continents.

Screenshot of the timeanddate website displaying the moon phases of the year 1723 CE in Luang Prabang.

Presentation and creative usage

Digitisation projects of the past decade have resulted in huge collections of digital content that are accessible online via library websites. This has created the need to raise awareness, and to promote engagement and learning with these online collections. One useful free online tool is Exhibit, a user-friendly, fast, and responsive editor to create stories and quizzes with 3D models and IIIF-compatible high resolution images. Exhibit has a range of presentation modes including scrollytelling, slideshows, kiosks, and quizzes that can be embedded in websites or social media channels via an iframe. They can also be duplicated and remixed by users, which is perfect for online learning and classroom environments. Exhibit is supported by a group of the world’s leading libraries and museums and has a vibrant supportive community at its core. The tool was developed by Mnemoscene with the support of the Esmée Fairbairn Collections Fund. Initiated to meet the online teaching needs of The University of St. Andrews, it is now used by major organisations in the UK including The British Library, Bodleian Libraries, University of Cambridge and Royal Pavilion and Museums Trust Brighton. An example of an exhibit of the Vessantara Jataka with illustrations from a Thai manuscript at the Chester Beatty Library, Dublin, can be viewed by clicking on the image below.
