Visualizing the Language of Hate
What Is This Visualization About?
This visualization is an interactive scatter plot displaying the top 100 emojis most frequently found in tweets labeled as hate speech. The data is drawn from the test.csv dataset, and the visualization filters for entries marked as hate speech (where label_gold is '1').
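Concretely, the filter-and-count step can be sketched in a few lines of pandas. This is a minimal illustration rather than the project's actual code: test.csv and label_gold come from the description above, while the tweet-text column name (`tweet_text`) and the emoji regex are assumptions.

```python
import re
from collections import Counter

import pandas as pd

# Pragmatic emoji matcher covering the main emoji blocks; an
# approximation, not an exhaustive Unicode emoji definition.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)

df = pd.read_csv("test.csv")

# Keep only the rows labeled as hate speech (label_gold == '1').
hate = df[df["label_gold"].astype(str) == "1"]

# Tally every emoji occurrence across the hate speech tweets.
emoji_counts = Counter()
for text in hate["tweet_text"].astype(str):  # 'tweet_text' is an assumed column name
    emoji_counts.update(EMOJI_PATTERN.findall(text))

top_100 = emoji_counts.most_common(100)

# Figure for the blue info tab: how many tweets contained no emoji.
no_emoji_total = sum(
    1
    for text in hate["tweet_text"].astype(str)
    if not EMOJI_PATTERN.search(text)
)
```

From there, each emoji's x-position is simply its count, and its y-position is a random jitter value.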
- X-Axis (Frequency): The horizontal position of an emoji represents its total count in the hate speech dataset. Emojis further to the right are more common.
- Y-Axis (Distribution): The vertical position is randomized jitter and carries no data; it only spreads the emojis apart so they don't overlap, making the frequency distribution along the x-axis easier to read.
- Interactivity: Hovering over any emoji brings up a red tooltip listing the ten most common words (excluding stop words) from the tweets that contain that emoji, helping to reveal its context (a sketch of how this tooltip data can be derived appears after this list).
- Additional Context: The blue tab on the top right reveals two key figures: the total number of hate speech tweets analyzed and, crucially, how many of them contained no emojis at all.
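The tooltip contents can be precomputed as a per-emoji word tally. Another hedged sketch: it continues the snippet above (reusing `hate`, `EMOJI_PATTERN`, and the assumed `tweet_text` column), and the tiny stop-word set stands in for whatever fuller list the visualization actually uses (e.g. NLTK's English stop words).

```python
from collections import Counter, defaultdict

# Tiny illustrative stop-word list; the real build would use a fuller one.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "on", "is",
    "it", "i", "you", "that", "this", "for", "with", "are", "was",
}

WORD_PATTERN = re.compile(r"[a-z']+")

# For each emoji, count the words that appear in the same tweets.
words_by_emoji = defaultdict(Counter)
for text in hate["tweet_text"].astype(str):  # same assumed column as above
    words = [w for w in WORD_PATTERN.findall(text.lower()) if w not in STOP_WORDS]
    for emoji in set(EMOJI_PATTERN.findall(text)):  # dedupe within a tweet
        words_by_emoji[emoji].update(words)

# Tooltip payload: the ten most common co-occurring words per emoji.
tooltips = {
    emoji: [word for word, _ in counts.most_common(10)]
    for emoji, counts in words_by_emoji.items()
}
```

Deduplicating emojis within each tweet means a tweet that repeats an emoji doesn't double-count its words, which keeps the tooltip an honest picture of textual context.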
Why Am I Visualizing This Data?
Hate speech online is a complex and evolving problem. While much attention is given to explicit slurs or threats, communication—and therefore hate—is increasingly nuanced. Emojis are a powerful, visual, and often ambiguous part of that new language.
The purpose of this visualization is to explore and expose these patterns. By isolating the emojis most common in hateful content, we can begin to answer key questions:
- Are certain emojis being co-opted or weaponized by hate groups to signal intent?
- Do these emojis (like the Knife 🔪, Bomb 💣, or Skull 💀) simply add violent emphasis, or do they form a coded language to evade simple text-based content moderation?
- What is the relationship between these symbols and the text they accompany? (The hover-to-see-words feature is designed to explore this directly.)
Understanding these connections is vital for researchers, platform designers, and content moderators. By seeing how hate is expressed, we can become better at identifying and mitigating its spread, moving beyond simple keyword filters to recognize more complex, symbol-based patterns of abuse.