The DH Cookbook in Korean Studies

A Fusion of Old and New

Aron van de Pol

Leiden University

2024-10-18

Welcome to the DH Kitchen

  • The Kitchen
    • The Tools: Methods and Technologies
    • The Ingredients: Data and Sources
    • The Recipes: Research Designs and Approaches
Figure 1: The DH Cookbook, generated by a DALL-E model.

Goal today: Explore Digital Humanities approaches to Korean Studies.

Recipes: What have researchers been cooking?

  • Diverse flavours!
  • Text Mining
  • Computer Vision
  • Network Analysis
  • QGIS (Mapping)
  • Survey/Numerical Data

Detection of four-panel cartoons in the Chosŏn Ilbo, 1920-1940 (Lee, Kim, and Jun)

Network of medieval Korean personalities (Cha)

The study of socialist tone in Kaebyŏk (Hur)

Why do anti-democratic laws exist in democracies? (Green and Denney)

Multi-instance learning to extract visual identities of Colonial Print shops

LLM-Driven Metadata Annotation of Colonial Korean Advertisements
{
  "product": "Shoes",
  "company": "朴德裕洋靴店",
  "location": "京城府宽𤏩洞五一番地",
  "language": "Korean (mixed)",
  "visual_elements": {
    "illustration": "Man wearing shoes, pointing at product illustration",
    "text": "Advertisement text in Korean"
  }
}
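
A record like the one above can be checked automatically before it enters a dataset. The following is a minimal sketch, not the pipeline used in the project: the function name `validate_ad_record` and the set of required keys are assumptions taken from the example record only.

```python
import json

# Assumption: the required keys mirror the example record; a real schema may differ.
REQUIRED_KEYS = {"product", "company", "location", "language", "visual_elements"}


def validate_ad_record(raw: str) -> dict:
    """Parse one LLM response and check that the expected keys are present."""
    record = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(record["visual_elements"], dict):
        raise ValueError("visual_elements must be a JSON object")
    return record


# The example record from the slide, with the illustration string on one line.
example = """{
  "product": "Shoes",
  "company": "朴德裕洋靴店",
  "location": "京城府宽𤏩洞五一番地",
  "language": "Korean (mixed)",
  "visual_elements": {
    "illustration": "Man wearing shoes, pointing at product illustration",
    "text": "Advertisement text in Korean"
  }
}"""

record = validate_ad_record(example)
print(record["company"])
```

Rejecting incomplete records early keeps annotation errors from spreading silently through later analysis steps.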

And much more…

  • Please check out the papers of the researchers mentioned!
  • Korean Journal of Digital Humanities
  • See Cha & Wall (2023) for more depth and examples.

The Pantry: Ingredients of DH/Korean Studies Fusion

Ingredients: Data

  • The data remains mostly the same!
  • Digitized/Machine Readable.
  • Format changes.

Shopping for data

  • Starts mostly digitally
    • Archives
    • Libraries
    • Google Scholar
    • ChatGPT/LLMs?

Shopping for data

(a) Korean History Database
(b) Document Specific Databases
(c) Hyundam Mungo Dataset
(d) Seoul Public Datasquare
Figure 2

Shopping for data

  • Getting data is hard.
  • Historical data in particular depends on digitization and OCR.
  • Current digitization at the National Library of Korea: 19% (2023)
  • This is still 1.9 million works…
  • The challenge: How to curate the data we need?
  • Logistical Challenges

One bite at a time?

  • Libraries/Archives have more available than ever.
  • But the material is also harder to get in bulk.
  • (Computational) DH requires scale.
  • Most facilities are not ready to support this.

One bite at a time?

  • APIs (for archives/libraries)
  • Compute facilities for Researchers (For Research Institutes)
  • How do we share the data used again?
    • Copyright
    • Long-term storage solution

NLK Data Preservation Center P’yŏngch’ang

  • NLK is also developing a new data center
  • Problem of Power/Electricity
  • Humanists need to consider these logistical challenges now.

Computer Render of the Envisioned Data Center

Outside Print

  • I work on Colonial Korea > ‘Centralized’ data.
  • Increasing interest in non-print data: online communities, internet archives, social media.
  • Interest in preservation and analysis of non-print data.
    • Where is this data? Where is it stored? And by Whom?

Kitchen Tools: Technologies and Platforms


Software

  • Python/R
  • LaTeX/Typst/Quarto
  • Open Refine
  • Etc…

Hardware

  • Computing Facility
  • Long term Data storage
  • VR/AR
  • Research ‘Lab’

Large Language models

  • Perhaps the most hyped
  • Basic understanding of the mechanism
  • How to use them responsibly

Choosing your cuisine

Should Everyone Cook This Way?

  • Not necessarily.
  • However, understanding the basics of ‘preparing a meal’ is essential.
  • Mastery of core techniques is important for a future with a growing digital side.

Should Everyone Cook This Way?

  • Not necessary to engage quantitatively.
  • However, engaging with digital media, data, and tools is crucial.

Should Everyone Cook This Way?

  • How do you browse through an archive containing
    • millions of books?
    • 100,000 years of video material?
    • Gigabytes (if not terabytes) of image data?
    • CSV or plain-text files with millions of rows of social media data?
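
For the last case, a file too large to open in a spreadsheet can still be read one row at a time. The following is a minimal sketch using Python's standard `csv` module; the helper `count_matching_rows`, the column name `text`, and the sample rows are hypothetical and stand in for a real export.

```python
import csv
import io


def count_matching_rows(handle, column: str, keyword: str) -> tuple[int, int]:
    """Stream a CSV and return (total rows, rows whose column contains keyword)."""
    total = 0
    matches = 0
    for row in csv.DictReader(handle):  # yields one row at a time
        total += 1
        if keyword in (row.get(column) or ""):
            matches += 1
    return total, matches


# Tiny in-memory stand-in for a multi-million-row social media export.
sample = io.StringIO(
    "id,text\n"
    "1,K-pop concert tonight\n"
    "2,weather update\n"
    "3,new K-pop album\n"
)

total, matches = count_matching_rows(sample, "text", "K-pop")
print(total, matches)
```

For a file on disk, the same function can be given `open(path, newline="", encoding="utf-8")`; memory use stays roughly constant because the rows are never all held at once.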

Fusion Cuisine: Digital Methods, New Questions

  • Scale vs. Detail.
  • Traditional research & Computational Approaches look at similar questions
  • Inherently digital: some studies are inherently digital, but not inherently quantitative.
  • Added value: a different way to read/view from a distance.

Fusion Cuisine: Digital Methods, New Questions

  • Examples:
    • Historical text analysis at scale.
    • Browsing & analysing tens of thousands of images.
    • Digital anthropology of K-pop fan communities.
    • Study of public data.

The Potluck: Collaborative and Open Scholarship

  • Sharing “recipes”
    • Open data, methods, and code.
  • Collaboration
    • Computational and non-computational working together.
  • Plurality
    • Not qualitative vs. quantitative but building upon each other.

Open Fields: New Recipes Waiting to be Created

  • LLM-assisted studies
  • Big data in Korean Studies
  • Digital preservation efforts

Conclusion: The Ever-Evolving Menu

  • DH in Korean studies: A diverse and expanding cuisine
  • Different research questions call for different recipes.
  • Readiness for a more digitized world.

Q&A: Taste Testing

Thank you for joining today’s “feast”! What did you think of the “menu” we explored?

Questions, comments, or feedback?

https://aronvandepol.com/

An Sŏng Jae critically judging the food served in hŭkpaek yorisa. (Source: Netflix)

Cited Works

Cha, Javier. “Javier Cha.” Javier Cha. https://javiercha.com/, 2024.
Cha, Javier, and Barbara Wall. “Introduction to Special Section Digital Korean Studies.” Korean Studies 47, no. 1 (2023): 1–7.
Green, Christopher, and Steven Denney. “Why Do Democratic Societies Tolerate Undemocratic Laws? Sorting Public Support for the National Security Act in South Korea.” Democratization 31, no. 1 (January 2024): 113–31. https://doi.org/10.1080/13510347.2023.2258082.
Hur, Soo. “『개벽』 논조의 사회주의화에 관한 새로운 접근 토픽 연결망 분석을 중심으로 [A New Approach to Socialization in Gaebyeok’s Tone: Focusing on Topic Network Analysis].” 인문논총 78, no. 1 (2021): 221–62.
Kim, Sujeong. “Korean Memory Project: Digital Curation of Knowledge Information Resources.” Conference presentation. Seoul, October 2023.
Lee, Seojoon, Byungjun Kim, and Bong Gwan Jun. “Automatic Detection of Four-Panel Cartoon in Large-Scale Korean Digitized Newspapers Using Deep Learning.” Journal of Open Humanities Data 10, no. 1 (June 2024). https://doi.org/10.5334/johd.205.
Naver. “DATA CENTER GAK.” Company website. DATA CENTER GAK. https://datacenter.navercorp.com/, 2024.
Park, Jiyeong. “로봇 가로·세로가 데이터 나른다, 국립중앙도서관 ‘100만배 용량’.” 한겨레, November 2023.
Park, Suhyeon. “[르포] 국립중앙도서관 1만개 분량 데이터 담은 네이버 ‘각 춘천’ 가보니.” 조선비즈, February 2023.