Steven Coats
English Philology, University of Oulu
steven.coats@oulu.fi
Higher Seminar Series, Södertörns Högskola
January 18th, 2018
\[ \small P(name_x \in male) = \frac{\sum{name_x \in male}}{\sum{name_x}} ,\qquad P(name_x \in female) = \frac{\sum{name_x \in female}}{\sum{name_x}} \normalsize \]
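The proportions above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the reference list `records`, the helper names, and the 0.9 decision threshold are all assumptions made for the example.

```python
from collections import Counter

def name_gender_probabilities(records):
    """Estimate P(name in male) and P(name in female) from (first_name, gender) pairs.

    `records` stands in for a hypothetical reference list of given names.
    """
    totals = Counter()
    by_gender = Counter()
    for name, gender in records:
        key = name.lower()
        totals[key] += 1
        by_gender[(key, gender)] += 1
    return {
        name: {
            "male": by_gender[(name, "male")] / n,
            "female": by_gender[(name, "female")] / n,
        }
        for name, n in totals.items()
    }

def classify_author(author_name, probs, threshold=0.9):
    """Return 'male', 'female', or None when the name should be ignored.

    The threshold is an assumed value, not one given in the slides.
    """
    parts = author_name.strip().split()
    if not parts:
        return None
    p = probs.get(parts[0].lower())
    if p is None:
        return None  # first token is not a known given name
    for gender in ("male", "female"):
        if p[gender] >= threshold:
            return gender
    return None  # ambiguous name

# Toy reference data (invented for illustration only).
records = (
    [("Anna", "female")] * 99 + [("Anna", "male")]
    + [("Kim", "male")] * 6 + [("Kim", "female")] * 4
)
probs = name_gender_probabilities(records)
```

With this toy data, `classify_author("Anna Hansson", probs)` yields `"female"`, while names whose first token is not in the reference list are ignored.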
author_name entities like “Anna Hansson” -> user classified as female
author_name entities such as “swedengirl123”, “!!!!ROCKER!!!!” -> ignored
“Core” profanities such as shit, fuck, piss, cunt, damn; others such as bollocks, faggot, fuq, heeb, hell, hillbilly, homo, honkey, hussy, jackass, jackoff, jigaboo, lameass, lardass, lesbo, lezzie, limey, limpdick, mcfagget, minge, mooncricket, nigger, paki, pansy, peckerhead, pikey, piss, pussy, spastic, snownigger, twat, whitetrash, wtf, etc.
E.g. arnapalaaq, iteq, nipangerit, aumingi, böllur, djöfullinn, drusla, fífl, fíflingur, mogghøvd, skitni, pupp, pæss, rass, rasshøl, rompe, ronk, runk, ræv, ræva, rævhøl, rævva, bøsserøv, fisse, hestepik, klaphat, kussekryller, lort, fitta, fittig, helvete, jävlä, jävlar, knulla, kuk, huoraa, huorilta, huorilla, huorille, huorista, huorien, huoriin, huorissa, huorat, huoria, kusipäät, kusipäältä, kusipäällä, etc.
Noswearing.com scrape (347 terms), list of 1,383 potentially offensive terms created at Carnegie Mellon University, Pittsburgh, USA
Online word lists from software tools created to filter user input on websites (for Norwegian, Finnish, Swedish and Danish)
Svensk og Dansk bandeordbog
Crowd-sourced Youswear dictionary for some terms in Icelandic, Faroese, and Greenlandic
wiktionary.org
Oqaatsit | Ordbogen, Greenlandic-Danish dictionary
Beygingarlýsing íslensks nútímamáls, Inflectional Dictionary of Modern Icelandic
Íslensk nútímamálsorðabók, Dictionary of Modern Icelandic
Sprotin, Faroese dictionaries
SALDO, the Svenskt Associationslexikon
KORP, Språkbanken’s corpus tool
Ordbog over det danske Sprog, Dictionary of Danish
Bokmålsordboka | Nynorskordboka, Språkrådet’s online dictionaries of Bokmål and Nynorsk
\[ G = 2\sum_{i} {O_{i} \cdot \ln\left(\frac{O_i}{E_i}\right)} \]
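As a rough sketch, G can be computed for one term in two subcorpora (for example, female-authored and male-authored text). The version below sums over only the two observed frequencies of the term, a common corpus-linguistic simplification; whether the study summed over these two cells or over a full contingency table is not stated here, so treat that choice and the function name as assumptions.

```python
import math

def log_likelihood_g(freq_a, total_a, freq_b, total_b):
    """G statistic for one term observed in two corpora.

    freq_a, freq_b: occurrences of the term in corpus A and corpus B.
    total_a, total_b: total word counts of corpus A and corpus B.
    Expected counts assume the term is equally frequent in both corpora.
    """
    combined_rate = (freq_a + freq_b) / (total_a + total_b)
    observed = [freq_a, freq_b]
    expected = [total_a * combined_rate, total_b * combined_rate]
    # Cells with O_i = 0 contribute nothing (the limit of O * ln(O / E) is 0).
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
```

When the term has the same relative frequency in both corpora, observed and expected counts coincide and G is 0; larger values indicate a stronger difference between the corpora.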
Total Nordic-language profanity per 1k words: Females 0.542, males 0.844
Total English-language profanity per 1k words: Females 0.996, males 0.995
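Rates like these are profanity tokens per 1,000 words. A minimal sketch of that normalization follows, assuming whitespace tokenization and simple case-insensitive matching against a lexicon; the function name, the tiny lexicon, and the sample sentence are invented for illustration and are not the study's data.

```python
def rate_per_thousand(tokens, lexicon):
    """Return the number of lexicon matches per 1,000 tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in lexicon)
    return 1000 * hits / len(tokens)

# Toy lexicon and toy text (illustrative only).
lexicon = {"helvete", "lort"}
tokens = "Helvete vad kallt det är ute i dag".split()
rate = rate_per_thousand(tokens, lexicon)
```

In practice the same function would be applied separately to the pooled tokens of each gender group and each language.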
Thank you for your attention!
Also thanks to
Argamon, S., M. Koppel, J. W. Pennebaker and J. Schler. 2007. Mining the blogosphere: Age, gender, and the varieties of self-expression. First Monday 12/9. http://firstmonday.org/ojs/index.php/fm/article/view/2003/1878
Bamman, D., J. Eisenstein and T. Schnoebelen. 2014. Gender identity and lexical variation in social media. Journal of Sociolinguistics 18(2), 135–160. http://onlinelibrary.wiley.com/doi/10.1111/josl.12080/full
Coats, S. 2016. Grammatical feature frequencies of English on Twitter in Finland. In L. Squires (ed.), English in computer-mediated communication: Variation, representation, and change. Berlin: De Gruyter. 179–210.
Dewaele, J.-M. 2004. The emotional force of swearwords and taboo words in the speech of multilinguals. Journal of Multilingual and Multicultural Development 25(2–3), 204–222.
Firth, J.R. 1957. Papers in linguistics, 1934–1951. London: Oxford University Press.
Labov, W. 2001. Principles of linguistic change, vol. 2: Social factors. Oxford: Blackwell.
Lui, M. and T. Baldwin. 2012. Langid.py: An off-the-shelf language identification tool. 50th Proceedings of the Association for Computational Linguistics, 25–30. Stroudsburg, PA: ACL. http://dl.acm.org/citation.cfm?id=2390475
McEnery, T. 2006. Swearing in English: Bad language, purity and power from 1586 to the present. New York: Routledge.
Mehl, M. and J. Pennebaker. 2003. The sounds of social life: A psychometric analysis of students’ daily social environments and natural conversations. Journal of Personality and Social Psychology 84(4), 857–870.
Mikolov, T., W. Yih, and G. Zweig. 2013. Linguistic regularities in continuous space word representations. In: Proceedings of HLT-NAACL 13, 746–751. https://www.aclweb.org/anthology/N13-1090
Newman, M.L., C. Groom, L. Handelman, and J. Pennebaker. 2008. Gender differences in language use: An analysis of 14,000 text samples. Discourse Processes 45, 211–236. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.216.4267&rep=rep1&type=pdf
Roesslein, J. 2015. Tweepy. Python package [Computer software]. http://www.tweepy.org
Wang, W., L. Chen, K. Thirunarayan, and A. P. Sheth. 2014. Cursing in English on Twitter. In: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing, 415–425.