Tools for Integrating Data by Complex, Dynamic Categories

  • Daniel Hruschka
  • Yi Yun Cheng
  • I. Han Hsiao
  • Robert Bischoff
  • Matthew Peeples
  • Harsha Kasi
  • Cindy Huang

Research output: Contribution to journal › Article › peer-review

Abstract

A key challenge in conducting comparative analyses across social units, such as religions, ethnicities, or cultures, is that data on these units is often encoded in distinct and incompatible formats across diverse datasets. This can involve simple differences in the variables and values used to encode these units (e.g., Roman Catholic is V130 = 1 vs. Q98A = 2 in two different datasets) or differences in the resolutions at which units are encoded (Maya vs. Kaqchikel Maya). These disparate encodings can create substantial challenges for the efficiency and transparency of data syntheses across diverse datasets. We introduce a user-friendly set of tools to help users translate four kinds of categories (religion, ethnicity, language, and subdistrict) across multiple, external datasets. We outline the platform's key functions and current progress, as well as long-range goals for the platform.
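The two kinds of mismatch described above (different variable/value codes for the same unit, and different resolutions for nested units) can be illustrated with a small crosswalk. This is only a sketch under stated assumptions, not the platform's implementation: the dataset names `dataset_a` and `dataset_b`, the functions `translate`, `ancestors`, and `same_unit`, and the table layouts are all hypothetical; only the codes V130 = 1 and Q98A = 2 and the Maya / Kaqchikel Maya pair come from the abstract.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical crosswalk: (dataset, variable, value) -> shared category label.
# The dataset names are placeholders; the codes are the abstract's example.
CROSSWALK: Dict[Tuple[str, str, int], str] = {
    ("dataset_a", "V130", 1): "Roman Catholic",
    ("dataset_b", "Q98A", 2): "Roman Catholic",
}

# Hypothetical hierarchy: finer-resolution category -> coarser parent category.
PARENT: Dict[str, str] = {
    "Kaqchikel Maya": "Maya",
}


def translate(dataset: str, variable: str, value: int) -> Optional[str]:
    """Return the shared category for a dataset-specific code, or None if unmapped."""
    return CROSSWALK.get((dataset, variable, value))


def ancestors(category: str) -> List[str]:
    """Return the category followed by each coarser category above it."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def same_unit(a: str, b: str) -> bool:
    """True if the two categories are equal or one is nested within the other."""
    return a in ancestors(b) or b in ancestors(a)


if __name__ == "__main__":
    # Two incompatible encodings resolve to the same shared category.
    print(translate("dataset_a", "V130", 1) == translate("dataset_b", "Q98A", 2))
    # Categories recorded at different resolutions can still be linked.
    print(same_unit("Maya", "Kaqchikel Maya"))
```

Keeping the code crosswalk separate from the hierarchy means a new dataset only needs new crosswalk rows, while resolution differences are handled once in the shared hierarchy.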

Original language: English (US)
Pages (from-to): 934-936
Number of pages: 3
Journal: Proceedings of the Association for Information Science and Technology
Volume: 61
Issue number: 1
State: Published - Oct 2024
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Library and Information Sciences

Keywords

  • Ontology matching
  • cultural informatics
  • data integration
  • knowledge organization
