Profanity on Twitter in the Nordics

Steven Coats
English Philology, University of Oulu
steven.coats@oulu.fi

Higher Seminar Series, Södertörns Högskola
January 18th, 2018

Background and research questions

  1. Explore extent of profanity use by gender
  2. Quantify results by country and gender and identify most characteristic words
  3. Use word embeddings to investigate semantic space by gender

Data collection | Twitter Streaming API

Data collection | Location disambiguation

Data collection | Location disambiguation 2

Data collection | Gender disambiguation

\[ P(\text{name}_x \in \text{male}) = \frac{\sum \left(\text{name}_x \in \text{male}\right)}{\sum \text{name}_x}, \qquad P(\text{name}_x \in \text{female}) = \frac{\sum \left(\text{name}_x \in \text{female}\right)}{\sum \text{name}_x} \]
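A minimal sketch of how these proportions could be computed, assuming per-name counts of male and female bearers are available from some name-frequency source. The function name `gender_probabilities` and all counts below are hypothetical illustrations, not the data or code used in the study.

```python
from collections import Counter


def gender_probabilities(name, male_counts, female_counts):
    """Return (P(male), P(female)) for a first name, or None if the name is unseen.

    Each probability is the share of bearers of the name recorded as
    male or female, matching the ratio in the formula above.
    """
    key = name.strip().lower()
    m = male_counts.get(key, 0)
    f = female_counts.get(key, 0)
    total = m + f
    if total == 0:
        return None  # name not in either list: gender stays undetermined
    return m / total, f / total


# Hypothetical counts, for illustration only.
male_counts = Counter({"kim": 30, "lars": 100})
female_counts = Counter({"kim": 70, "anna": 200})

p_male, p_female = gender_probabilities("Kim", male_counts, female_counts)
print(f"P(male) = {p_male:.2f}, P(female) = {p_female:.2f}")
```

In practice a threshold on the larger of the two values (an assumption here, not a figure from the talk) would decide whether a user is assigned a gender at all.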

Data collection | Twitter REST API

Some sample tweets containing profanity

Profanity lists

“Core” profanities such as shit, fuck, piss, cunt, damn, others such as bollocks, faggot, fuq, heeb, hell, hillbilly, homo, honkey, hussy, jackass, jackoff, jigaboo, lameass, lardass, lesbo, lezzie, limey, limpdick, mcfagget, minge, mooncricket, nigger, paki, pansy, peckerhead, pikey, piss, pussy, spastic, snownigger, twat, whitetrash, wtf, etc.

E.g. arnapalaaq, iteq, nipangerit, aumingi, böllur, djöfullinn, drusla, fífl, fíflingur, mogghøvd, skitni, pupp, pæss, rass, rasshøl, rompe, ronk, runk, ræv, ræva, rævhøl, rævva, bøsserøv, fisse, hestepik, klaphat, kussekryller, lort, fitta, fittig, helvete, jävlä, jävlar, knulla, kuk, huoraa, huorilta, huorilla, huorille, huorista, huorien, huoriin, huorissa, huorat, huoria, kusipäät, kusipäältä, kusipäällä, etc.

Sources of profanity

Noswearing.com scrape (347 terms), list of 1,383 potentially offensive terms created at Carnegie Mellon University, Pittsburgh, USA

Online word lists from software tools created to filter user input on websites (for Norwegian, Finnish, Swedish and Danish); Svensk og Dansk bandeordbog; crowd-sourced Youswear dictionary for some terms in Icelandic, Faroese, and Greenlandic; wiktionary.org; Oqaatsit | Ordbogen (Greenlandic-Danish dictionary); Beygingarlýsing íslensks nútímamáls (Inflectional Dictionary of Modern Icelandic); Íslensk nútímamálsorðabók (Dictionary of Modern Icelandic); Sprotin (Faroese dictionaries); SALDO (the Svenskt Associationslexikon); KORP (Språkbanken’s corpus tool); Ordbog over det danske Sprog (Dictionary of Danish); Bokmålsordboka | Nynorskordboka (Språkrådet’s online dictionaries of Bokmål and Nynorsk)
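Matching tweet tokens against word lists of this kind and normalizing to a rate per 1,000 words could be sketched as follows. The tokenizer is a deliberately simple assumption (it ignores URLs, hashtags and @-mentions as special cases), and `rate_per_thousand` is a hypothetical helper, not the pipeline used for the talk.

```python
import re

# Simplified tokenizer: runs of word characters (Unicode-aware in Python 3).
TOKEN_RE = re.compile(r"\w+")


def rate_per_thousand(texts, lexicon):
    """Return the number of lexicon matches per 1,000 tokens in `texts`."""
    lexicon = {w.lower() for w in lexicon}
    hits = 0
    total = 0
    for text in texts:
        tokens = TOKEN_RE.findall(text.lower())
        total += len(tokens)
        hits += sum(1 for t in tokens if t in lexicon)
    # Avoid division by zero for an empty corpus.
    return 1000 * hits / total if total else 0.0


# Toy input, for illustration only.
tweets = ["Damn it is cold", "what the hell"]
print(round(rate_per_thousand(tweets, {"damn", "hell"}), 1))
```

Exact-token matching is the reason inflected forms have to be listed separately, as in the Finnish entries above.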

Use of profanity by country and gender

\[ G = 2\sum_{i} {O_{i} \cdot \ln\left(\frac{O_i}{E_i}\right)} \]
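A small sketch of the statistic for one word compared across two subcorpora (for example, female and male tweets). It assumes the common two-corpus setup in which the expected count \(E_i\) is the subcorpus size times the pooled relative frequency of the word; the talk does not state how \(E_i\) was derived, so this is an assumption, and `log_likelihood` is a hypothetical name.

```python
import math


def log_likelihood(o1, n1, o2, n2):
    """G statistic for a word occurring o1 times in n1 tokens and o2 times in n2 tokens.

    Expected counts assume the word has the same relative frequency
    in both subcorpora (the null hypothesis).
    """
    pooled = (o1 + o2) / (n1 + n2)
    g = 0.0
    for observed, size in ((o1, n1), (o2, n2)):
        expected = size * pooled
        if observed > 0:  # a zero count contributes nothing: O * ln(O/E) -> 0
            g += observed * math.log(observed / expected)
    return 2 * g


# Hypothetical counts, for illustration only.
print(round(log_likelihood(30, 1000, 10, 1000), 2))
```

Larger values of G indicate words whose frequencies differ more strongly between the two groups; the sign of the difference has to be read from the observed rates.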

Iceland (Nordic profanity)

Total Nordic-language profanity per 1k words: Females 0.674, males 0.749

Norway (Nordic profanity)

Total Nordic-language profanity per 1k words: Females 0.442, males 0.561

Denmark (Nordic profanity)

Total Nordic-language profanity per 1k words: Females 0.391, males 0.370

Sweden (Nordic profanity)

Total Nordic-language profanity per 1k words: Females 1.583, males 1.775

Finland (Nordic profanity)

Total Nordic-language profanity per 1k words: Females 0.542, males 0.844

Totals, all Nordics (Nordic profanity)

Total Nordic-language profanity per 1k words: Females 0.982, males 1.133 (includes Greenland, Faroes, and Åland)

Iceland (English profanity)

Total English-language profanity per 1k words: Females 1.951, males 2.266

Norway (English profanity)

Total English-language profanity per 1k words: Females 1.731, males 1.656

Denmark (English profanity)

Total English-language profanity per 1k words: Females 2.580, males 2.073

Sweden (English profanity)

Total English-language profanity per 1k words: Females 1.429, males 1.442

Finland (English profanity)

Total English-language profanity per 1k words: Females 0.996, males 0.995

Totals, all Nordics (English profanity)

Total English-language profanity per 1k words: Females 1.549, males 1.493 (includes Greenland, Faroes, and Åland)

Profanity frequency by country and gender

Word embeddings

[Figure: Mikolov et al. 2013: 749]

t-SNE to 2-dimensional space

t-SNE Visualization

Caveats and future outlook

Preliminary summary and conclusions 1

Preliminary summary and conclusions 2

Thank you for your attention!

Acknowledgements

Also thanks to

References

Argamon, S., M. Koppel, J. W. Pennebaker and J. Schler. 2007. Mining the blogosphere: Age, gender, and the varieties of self-expression. First Monday 12/9. http://firstmonday.org/ojs/index.php/fm/article/view/2003/1878

Bamman, D., J. Eisenstein and T. Schnoebelen. 2014. Gender identity and lexical variation in social media. Journal of Sociolinguistics 18(2), 135–160. http://onlinelibrary.wiley.com/doi/10.1111/josl.12080/full

Coats, S. 2016. Grammatical feature frequencies of English on Twitter in Finland. In L. Squires (ed.), English in computer-mediated communication: Variation, representation, and change. Berlin: De Gruyter. 179–210.

Dewaele, J.-M. 2004. The emotional force of swearwords and taboo words in the speech of multilinguals. Journal of Multilingual and Multicultural Development 25(2–3), 204–222.

Firth, J.R. 1957. Papers in linguistics, 1934–1951. London: Oxford University Press.

Labov, W. 2001. Principles of linguistic change, vol. 2: Social factors. Oxford: Blackwell.

Lui, M. and T. Baldwin. 2012. Langid.py: An off-the-shelf language identification tool. Proceedings of the ACL 2012 System Demonstrations, 25–30. Stroudsburg, PA: ACL. http://dl.acm.org/citation.cfm?id=2390475

McEnery, T. 2006. Swearing in English: Bad language, purity and power from 1586 to the present. New York: Routledge.

Mehl, M. and J. Pennebaker. 2003. The sounds of social life: A psychometric analysis of students’ daily social environments and natural conversations. Journal of Personality and Social Psychology 84(4), 857–870.

Mikolov, T., W. Yih, and G. Zweig. 2013. Linguistic regularities in continuous space word representations. In: Proceedings of HLT-NAACL 13, 746–751. https://www.aclweb.org/anthology/N13-1090

Newman, M.L., C. Groom, L. Handelman, and J. Pennebaker. 2008. Gender differences in language use: An analysis of 14,000 text samples. Discourse Processes 45, 211–236. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.216.4267&rep=rep1&type=pdf

Roesslein, J. 2015. Tweepy. Python package [Computer software]. http://www.tweepy.org

Wang, W., L. Chen, K. Thirunarayan, and A. P. Sheth. 2014. Cursing in English on Twitter. In: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing, 415–425.