The Role of AI in Moderating Online Hate Speech: A Discourse Evaluation
DOI: https://doi.org/10.55559/fgr.v1i3.20
Keywords: Artificial Intelligence, Content Moderation, Discourse Analysis, Hate Speech, Online Platforms
Abstract
As people spend more time online, hate speech on social media and other digital channels has become a serious and widespread problem. In response, many technology companies have deployed artificial intelligence (AI) systems to detect, screen, and limit harmful content. This study examines how AI can help moderate hate speech online by evaluating how these technologies affect, reshape, and sometimes distort digital communication. Although AI offers speed and scale, it often struggles with the nuanced workings of human language, such as humor, metaphor, cultural context, and newly coined terms. Concerns persist about both over-moderation and under-moderation: legitimate speech is removed, while implicit or coded hate speech goes undetected.
References
Bakhtin, M. M. (1981). The dialogic imagination: Four essays (M. Holquist, Ed.; C. Emerson & M. Holquist, Trans.). University of Texas Press.
Bilewicz, M., & Soral, W. (2020). Hate speech and social change: How social-psychological research can inform societal responses to hate speech. Social Issues and Policy Review, 14(1), 78–113. https://doi.org/10.1111/sipr.12060
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Chandrasekharan, E., Pavalanathan, U., Srinivasan, A., Glynn, A., Eisenstein, J., & Gilbert, E. (2017). You can’t stay here: The efficacy of Reddit’s 2015 ban examined through hate speech. Proceedings of the ACM on Human-Computer Interaction, 1(CSCW), 1–22. https://doi.org/10.1145/3134699
Citron, D. K. (2014). Hate crimes in cyberspace. Harvard University Press.
Daniels, J. (2013). Race and racism in Internet studies: A review and critique. New Media & Society, 15(5), 695–719. https://doi.org/10.1177/1461444812462849
Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated hate speech detection and the problem of offensive language. Proceedings of the International AAAI Conference on Web and Social Media, 11(1), 512–515.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 4171–4186.
Fairclough, N. (1995). Critical discourse analysis: The critical study of language. Longman.
Gagliardone, I., Gal, D., Alves, T., & Martinez, G. (2015). Countering online hate speech. UNESCO Publishing.
Ghosh, D., & Veale, T. (2016). Fracking sarcasm using neural network. Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 161–169.
Gillespie, T. (2018). Custodians of the Internet: Platforms, content moderation, and the hidden decisions that shape social media. Yale University Press.
Gorwa, R. (2019). The platform governance triangle: Conceptualizing the informal regulation of online content. Policy & Internet, 11(1), 87–104. https://doi.org/10.1002/poi3.189
Jane, E. A. (2017). Misogyny online: A short (and brutish) history. Sage.
Jhaver, S., Bruckman, A., & Gilbert, E. (2019). Does transparency in moderation really matter? User behavior after content removal explanations on Reddit. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1–27. https://doi.org/10.1145/3359227
Macedo-Rouet, M., Simões, A., & Bérard, P. (2021). Modeling indirect speech acts for toxicity detection in online conversations. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2278–2289.
Matamoros-Fernández, A. (2017). Platformed racism: The mediation and circulation of an Australian race-based controversy on Twitter, Facebook and YouTube. Information, Communication & Society, 20(6), 930–946. https://doi.org/10.1080/1369118X.2017.1293130
Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.
Schmidt, A., & Wiegand, M. (2017). A survey on hate speech detection using natural language processing. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, 1–10. https://doi.org/10.18653/v1/W17-1101
Van Dijk, T. A. (2000). Ideology and discourse: A multidisciplinary introduction. Palgrave.
Van Dijk, T. A. (2015). Discourse and knowledge. De Gruyter Mouton.
Vidgen, B., & Derczynski, L. (2020). Directions in abusive language training data, a systematic review: Garbage in, garbage out. PLoS ONE, 15(12), e0243300. https://doi.org/10.1371/journal.pone.0243300
Wang, Y., Schmidt, A., & Wiegand, M. (2019). A survey on online hate speech detection and its challenges. Proceedings of the 5th Workshop on Natural Language Processing for Social Media, 1–10.
Zhang, Z., Ferrara, E., & MacDonald, C. (2020). Speech act classification for online conversations with application to hate speech detection. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2033–2043.
Zhou, X., Wang, X., Ji, Y., & Tang, J. (2021). Multi-modal hate speech detection: A survey and new perspectives. ACM Computing Surveys, 54(6), 1–36.
License
Copyright (c) 2025 Doaa Taher Matrood (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.