Algorithmic Bias and the Pragmatic Failures of AI in Identifying Hate Speech Rooted in Local Culture in Indonesia

Qinthara Khairun Azida; Zakiyatul Marwa; Nazarena Putri Narahita; Elsa Rahma Sari; Ahmad Arzani Ibnul Hikam; Bohdan Filipov

doi:10.59059/perspektif.v4i2.3063

Authors

Qinthara Khairun Azida, Universitas Gadjah Mada
Zakiyatul Marwa, Universitas Gadjah Mada
Nazarena Putri Narahita, Universitas Gadjah Mada
Elsa Rahma Sari, Universitas Gadjah Mada
Ahmad Arzani Ibnul Hikam, Universitas Gadjah Mada
Bohdan Filipov, Universitas Gadjah Mada

Keywords:

Algorithmic Bias, Indonesian Hate Speech, Large Language Models (LLMs), Local Cultural Sarcasm, Pragmatics

Abstract

This study identifies the pragmatic failures of Large Language Models (LLMs) and the biases of Anglophone-based AI moderation algorithms in detecting Indonesian hate speech expressed through sarcasm, satire, euphemism, and local cultural metaphors. It also examines how far AI systems understand and interpret the pragmatic meanings in the corpus. The study takes a qualitative descriptive approach with a comparative design. Data were collected by documenting hate speech expressions on social media that contain elements of local cultural hatred, and were analyzed using qualitative descriptive methods with pragmatic and thematic approaches. The findings show that every item in the corpus contains political satire and indirect hate expressed through irony, sarcasm, absurd metaphors, and popular-culture wordplay. When tested, Claude AI identified the data as implicit criticism and recognized the pragmatic functions of emoticons and the contextual meanings of the utterances. However, its analysis also showed limitations in understanding local sociocultural contexts: the metaphors “daun nangka” (jackfruit leaf) and “daun sawit” (oil palm leaf) in particular were interpreted merely as absurd humor. These findings indicate that accurate AI detection does not necessarily reflect deep pragmatic and cultural understanding of the Indonesian context.
