Profanity on Twitter in the Nordics

Steven Coats
English Philology, University of Oulu

Higher Seminar Series, Södertörns Högskola
January 18th, 2018

Background and research questions

  1. Explore extent of profanity use by gender
  2. Quantify results by country and gender and identify most characteristic words
  3. Use word embeddings to investigate semantic space by gender

Data collection | Twitter Streaming API

Data collection | Location disambiguation

Data collection | Location disambiguation 2

Data collection | Gender disambiguation

\[ \small P(\mathit{name}_x \in \mathit{male}) = \frac{\sum{\mathit{name}_x \in \mathit{male}}}{\sum{\mathit{name}_x}} ,\qquad P(\mathit{name}_x \in \mathit{female}) = \frac{\sum{\mathit{name}_x \in \mathit{female}}}{\sum{\mathit{name}_x}} \normalsize \]
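As a rough illustration of the ratio above, the following sketch computes both probabilities from per-gender name counts. It assumes the denominator is the total number of occurrences of the name across the male and female lists. The function `gender_probabilities` and the counts are hypothetical and are not taken from the name data used in the study.

```python
from collections import Counter


def gender_probabilities(first_name, male_counts, female_counts):
    """Return (P(male), P(female)) for a first name, or None if the name is unseen.

    Assumption: the denominator is the sum of male and female occurrences.
    """
    key = first_name.strip().lower()
    male = male_counts.get(key, 0)
    female = female_counts.get(key, 0)
    total = male + female
    if total == 0:
        # The name appears in neither list, so no estimate is possible.
        return None
    return male / total, female / total


# Hypothetical counts, for illustration only.
male_counts = Counter({"mikael": 980, "kim": 300})
female_counts = Counter({"anna": 1500, "kim": 100, "mikael": 20})

p_male, p_female = gender_probabilities("Kim", male_counts, female_counts)
unknown = gender_probabilities("Zzz", male_counts, female_counts)
```

In practice one would probably assign a gender only when one of the two probabilities exceeds some threshold and leave ambiguous names unclassified. That threshold is a design choice not specified here.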

Data collection | Twitter REST API

Some sample tweets containing profanity

Profanity lists

“Core” profanities such as shit, fuck, piss, cunt, and damn; others such as bollocks, faggot, fuq, heeb, hell, hillbilly, homo, honkey, hussy, jackass, jackoff, jigaboo, lameass, lardass, lesbo, lezzie, limey, limpdick, mcfagget, minge, mooncricket, nigger, paki, pansy, peckerhead, pikey, pussy, spastic, snownigger, twat, whitetrash, wtf, etc.
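One plausible way to apply such a list is to tokenize each tweet and count the tokens that appear in the lexicon. The sketch below is a minimal example under that assumption. The three-word lexicon is an excerpt from the list above, and the tokenizer, the function `count_profanities`, and the sample tweets are hypothetical rather than the procedure actually used.

```python
import re
from collections import Counter

# Small excerpt of the list above; a real lexicon would be much longer.
PROFANITIES = {"damn", "hell", "wtf"}

# Simplistic tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def count_profanities(tweets, lexicon=PROFANITIES):
    """Count lexicon hits over whole tokens in an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        for token in TOKEN_RE.findall(text.lower()):
            if token in lexicon:
                counts[token] += 1
    return counts


# Made-up sample tweets, for illustration only.
sample = ["Damn, that was close", "what the hell... WTF", "hello there"]
counts = count_profanities(sample)
```

Matching whole tokens rather than substrings keeps "hello" from being counted as "hell". Spelling variants such as fuq still need their own lexicon entries.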