Profanity on Twitter in the Nordics

Steven Coats
English Philology, University of Oulu
steven.coats@oulu.fi

5th SwiSca Symposium on Swearing
November 23rd, 2017

Background and research questions

  1. Explore extent of profanity use by gender
  2. Quantify results by country and gender and identify most characteristic words
  3. Use word embeddings to investigate semantic space by gender

Data collection | Twitter Streaming API

Data collection | Gender disambiguation

\[ P(\mathit{name}_x \in \mathit{male}) = \frac{\sum (\mathit{name}_x \in \mathit{male})}{\sum \mathit{name}_x}, \qquad P(\mathit{name}_x \in \mathit{female}) = \frac{\sum (\mathit{name}_x \in \mathit{female})}{\sum \mathit{name}_x} \]
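The ratio above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used for the study: the count tables `male_counts` and `female_counts`, the example names and figures, and the helper `gender_probabilities` are all hypothetical, and the sketch assumes that the denominator is the number of male plus female occurrences of the name.

```python
from collections import Counter


def gender_probabilities(name, male_counts, female_counts):
    """Return P(name in male) and P(name in female) from raw name counts.

    Assumption: the total for a name is its male count plus its female count.
    Returns None when the name does not occur in either table.
    """
    male = male_counts.get(name, 0)
    female = female_counts.get(name, 0)
    total = male + female
    if total == 0:
        return None  # unseen name: no basis for a gender estimate
    return {"male": male / total, "female": female / total}


# Hypothetical count tables, made up for illustration only.
male_counts = Counter({"mika": 90, "kim": 40})
female_counts = Counter({"kim": 60, "anna": 120})

for name in ("mika", "kim", "anna", "unknown"):
    print(name, gender_probabilities(name, male_counts, female_counts))
```

A name that appears in only one table gets probability 1 for that gender, while an ambiguous name such as the invented "kim" entry is split according to its counts. A threshold on these probabilities could then decide which users are assigned a gender.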

Data collection | Twitter REST API

Tweet density by language

Map polygons from Natural Earth, maps from OpenStreetMap