When Tom Eats Kimchi: Evaluating Cultural Awareness of Multimodal Large Language Models in Cultural Mixture Contexts

Authors: Jun Seong Kim, Kyaw Ye Thu, Javad Ismayilzada, Junyeong Park, Eunsu Kim, Huzama Ahmad, Na Min An, James Thorne, Alice Oh

Abstract

In a highly globalized world, it is important for multi-modal large language models (MLLMs) to recognize and respond correctly to mixed-cultural inputs. For example, a model should correctly identify kimchi (Korean food) in an image both when an Asian woman is eating it and when an African man is eating it. However, current MLLMs show an over-reliance on the visual features of the person, leading to misclassification of the entities. To examine the robustness of MLLMs to different ethnicities, we introduce MIXCUBE, a cross-cultural bias benchmark, and study elements from five countries and four ethnicities. Our findings reveal that MLLMs achieve both higher accuracy and lower sensitivity to such perturbation for high-resource cultures, but not for low-resource cultures. GPT-4o, the best-performing model overall, shows up to a 58% difference in accuracy between the original and perturbed cultural settings in low-resource cultures.

BibTeX

@inproceedings{kim2025kimchi,
  title     = {When Tom Eats Kimchi: Evaluating Cultural Awareness of Multimodal Large Language Models in Cultural Mixture Contexts},
  author    = {Kim, Jun Seong and Thu, Kyaw Ye and Ismayilzada, Javad and Park, Junyeong and Kim, Eunsu and Ahmad, Huzama and An, Na Min and Thorne, James and Oh, Alice},
  booktitle = {Proceedings of the Third Workshop on Cross-Cultural Communication in NLP (C3NLP 2025)},
  year      = {2025},
  address   = {Albuquerque, New Mexico},
  url       = {https://openreview.net/forum?id=4xXipOmTZ4},
  note      = {Outstanding Paper Award}
}
