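Algorithmic Bias and the Pragmatic Failure of AI in Identifying Local-Culture-Based Hate Speech in Indonesia (Bias Algoritma dan Kegagalan Pragmatik AI dalam Mengidentifikasi Ujaran Kebencian Berbasis Budaya Lokal di Indonesia)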
DOI:
https://doi.org/10.59059/perspektif.v4i2.3063
Keywords:
Algorithmic Bias, Indonesian Hate Speech, Large Language Models (LLMs), Local Cultural Sarcasm, Pragmatics
Abstract
This study identifies the pragmatic failures of Large Language Models (LLMs) and the biases of Anglophone-based AI moderation algorithms in detecting Indonesian hate speech expressed through sarcasm, satire, euphemism, and local cultural metaphors. It also examines the extent to which AI systems understand and interpret the pragmatic meanings within the corpus. The study employs a qualitative descriptive approach with a comparative design. Data were collected by documenting hate speech expressions on social media that contain elements of hatred rooted in local culture, and were analyzed using qualitative descriptive methods with pragmatic and thematic approaches. The findings show that all corpus data contain political satire and indirect hate expressed through irony, sarcasm, absurd metaphors, and popular culture wordplay. Testing with Claude AI showed that the system could identify the data as implicit criticism and recognize the pragmatic functions of emoticons and contextual meanings in the utterances. However, the analysis also revealed limitations in understanding local sociocultural contexts, particularly the metaphors “daun nangka” (jackfruit leaves) and “daun sawit” (oil palm leaves), which were interpreted merely as absurd humor. These findings indicate that AI detection accuracy does not necessarily reflect a deep pragmatic and cultural understanding of the Indonesian context.
License
Copyright (c) 2026 Perspektif : Jurnal Pendidikan dan Ilmu Bahasa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





