Steven Coats

steven.coats (at) oulu.fi
University Lecturer, English Philology, University of Oulu, Finland



I'm a linguist interested in language variation, online language and social media, and computational approaches to language analysis, among other topics.

My research background is mostly in dialectology, sociolinguistics, and digital humanities. I've created the Corpus of North American Spoken English (CoNASE) from geolocated YouTube transcripts, currently the largest corpus of transcribed speech.

Professional experience

Publications

  1. Coats, Steven. (2022). Naturalistic double modals in North America. American Speech. Link Article
  2. Coats, Steven. (2021). ZipfExplorer: A Tool for the Comparison of Shared Lexis. In Sanita Reinsone, Inguna Skadiņa, Anda Baklāne and Jānis Daugavieti (eds.), Post-Proceedings of the 5th Digital Humanities in the Nordic Countries Conference, Riga, Latvia, October 21–23, 2020, 145–155. Aachen, Germany: CEUR. Article Tool
  3. Coats, Steven. (2021). 'Bad language' in the Nordics: profanity and gender in a social media corpus. Acta Linguistica Hafniensia 53(1), 22–57. Link Article
  4. Coats, Steven. (2020). Comparing word frequencies and lexical diversity with the ZipfExplorer tool. In Sanita Reinsone, Inguna Skadiņa, Anda Baklāne and Jānis Daugavieti (eds.), Proceedings of the 5th Digital Humanities in the Nordic Countries Conference, Riga, Latvia, October 21–23, 2020, 219–225. Aachen, Germany: CEUR. Article Tool
  5. Coats, Steven. (2020). Anglicism diversity in hyphenated German compounds. In Julien Longhi and Claudia Marinica (eds.), CMC Corpora through the Prism of Digital Humanities, 75–92. Paris: L'Harmattan. Link
  6. Coats, Steven. (2020). Articulation rate in American English in a corpus of YouTube videos. Language and Speech 63(4), 799–831. Link Article
  7. Coats, Steven and Adrien Barbaresi. (2019). Productivity of anglicism bases in hyphenated German compounds. In Julien Longhi and Claudia Marinica (eds.), Proceedings of the 7th Conference on CMC and Social Media Corpora for the Humanities, 53–58. Cergy, France: Cergy-Pontoise University. Link Article
  8. Coats, Steven. (2019). Lexicon geupdated: New German anglicisms in a social media corpus. European Journal of Applied Linguistics 7(2), 255–280. Link Article
  9. Coats, Steven. (2019). Language choice and gender in a Nordic social media corpus. Nordic Journal of Linguistics 42(1), 31–55. Link Article
  10. Coats, Steven. (2019). Online language ecology: Twitter in Europe. In Egon Stemle and Ciara Wigham (eds.), Building computer-mediated communication corpora for sociolinguistic analysis, 73–96. Clermont-Ferrand: Presses universitaires Blaise Pascal. Link Article
  11. Coats, Steven. (2019). A Corpus of regional American language from YouTube. In Costanza Navarretta et al. (eds.), Proceedings of the 4th Digital Humanities in the Nordic Countries Conference, Copenhagen, Denmark, March 6–8, 2019, 79–91. Aachen, Germany: CEUR. Article
  12. Coats, Steven. (2018). Variation of new German verbal Anglicisms in a social media corpus. In Reinhild Vandekerckhove, Darja Fišer and Lisa Hilte (eds.), Proceedings of the 6th conference on CMC and social media corpora for the humanities, 27–32. Antwerp, Belgium: University of Antwerp. Link Article Data
  13. Coats, Steven. (2018). Skin tone emoji and sentiment on Twitter. In Eetu Mäkelä and Mikko Tolonen (eds.), Proceedings of the 3rd Digital Humanities in the Nordic Countries Conference, Helsinki, Finland, March 7–9, 2018, 122–138. Aachen, Germany: CEUR. Link Article
  14. Coats, Steven. (2018). Collecting Twitter data. In Christine Mallinson, Becky Childs and Gerard Van Herk (eds.), Data collection in sociolinguistics: Methods and applications (2nd Ed.), 248–251. London/New York: Routledge. Link
  15. Coats, Steven. (2017). Gender and lexical type frequencies in Finland Twitter English. In Turo Hiltunen, Joe McVeigh, and Tanja Säily (eds.), Big and rich data in English corpus linguistics: Methods and explorations (= Studies in Variation, Contacts and Change in English 19). Helsinki, Finland: Varieng. Link
  16. Coats, Steven. (2017). Gender and grammatical frequencies in social media English from the Nordic countries. In Darja Fišer and Michael Beißwenger (eds.), Investigating social media corpora, 102–121. Ljubljana, Slovenia: U. of Ljubljana Academic Publishing. Link
  17. Coats, Steven. (2017). European language ecology and bilingualism with English on Twitter. In Egon Stemle and Ciara Wigham (eds.), Proceedings of the 5th conference on CMC and social media corpora for the humanities, 35–38. Bozen/Bolzano: Eurac Research. Article
  18. Coats, Steven. (2016). Grammatical feature frequencies of English on Twitter in Finland. In Lauren Squires (ed.), English in computer-mediated communication: Variation, representation, and change, 179–210. Boston/Berlin: de Gruyter Mouton. Link Article
  19. Coats, Steven. (2016). Grammatical frequencies and gender in Nordic Twitter Englishes. In Darja Fišer and Michael Beißwenger (eds.), Proceedings of the 4th conference on CMC and social media corpora for the humanities, 12–16. Ljubljana: U. of Ljubljana Academic Publishing. Article
  20. Kretzschmar, William A. Jr., Paulina Bounds, Jacqueline Hettel, Steven Coats, Lee Pederson, Lisa-Lena Opas-Hänninen, Ilkka Juuso, and Tapio Seppänen. (2012). Digital Archive of Southern Speech. Philadelphia, PA: Linguistic Data Consortium. Link

Presentations

  1. Civic engagement with local government videos: Comparing YouTube transcripts with user comments. Presentation at the 9th Conference on CMC and Social Media Corpora for the Humanities, Santiago de Compostela, Spain, 29 September 2022. Slides
  2. CoANZSE: The Corpus of Australian and New Zealand Spoken English. Computational Thinking in the Humanities Online Workshop, Brisbane, Australia, 1 September 2022. Slides
  3. Double modals in YouTube videos from North America and the British Isles. Presentation at CoCorDial Workshop, Helsinki, Finland, 27 April 2022. Slides
  4. The Corpus of British Isles Spoken English (CoBISE): A new resource of contemporary British and Irish speech. Virtual presentation at DHNB 2022, Uppsala, Sweden, 17 March 2022. Slides
  5. Scraping online dictionaries for usage annotations. Virtual presentation at the 7th SwiSca Symposium, Reykjavík, Iceland, 2 December 2021. Slides
  6. A database of North American multiple modals from YouTube. Presentation at the 8th Conference on CMC and Social Media Corpora for the Humanities, Nijmegen, the Netherlands, 29 October 2021. Slides
  7. Multiple modals in the wild: A study of 24,530 multiple modal sequences in naturalistic North American speech. Virtual presentation for the workshop "The March of Data" at the Sixth International Society for the Study of English Conference, Joensuu, Finland, 2 June 2021. Slides
  8. Comparing word frequencies and lexical diversity with the ZipfExplorer tool. Virtual presentation at the Fifth Digital Humanities in the Nordic Countries Conference, National Library of Latvia, Riga, Latvia, 23 October 2020. Slides
  9. Dialect corpora from YouTube. Virtual presentation at ICAME 41, University of Heidelberg, Germany, 20–24 May 2020. Slides Video
  10. Steven Coats and Adrien Barbaresi. Productivity of anglicism bases in hyphenated German compounds. Presentation at the 7th Conference on CMC and Social Media Corpora for the Humanities, Cergy, France, 10 September 2019. Slides
  11. Regional variation in speech rate in American English from YouTube videos. Presentation at Research Data and Humanities Conference, University of Oulu, Finland, 14 August 2019, and 9th Conference of the Finnish Society for the Study of English, University of Tampere, Finland, 15 August 2019. Slides
  12. Swearing on Twitter: Harvesting and visualizing data. Slides for workshop at the 6th SwiSca Symposium, Södertörn University, Sweden, 23 May 2019. Slides
  13. A Corpus of regional American language from YouTube. Presentation at the Fourth Digital Humanities in the Nordic Countries Conference, University of Copenhagen, Denmark, 8 March 2019. Slides Article
  14. Variation of new German verbal Anglicisms in a social media corpus. Presentation at the 6th Conference on CMC and Social Media Corpora for the Humanities, Antwerp, Belgium, 17 September 2018. Slides Slides (German version)
  15. Exploring code-switching and borrowing using word vectors. Presentation at the 14th Conference of the European Society for the Study of English, Brno, Czechia, 1 September 2018. Slides
  16. William A. Kretzschmar, Jr. and Steven Coats. Fractal visualization of corpus data. ICAME 39, University of Tampere, Finland, 30 May 2018.
  17. Skin tone emoji and sentiment on Twitter. Presentation at the Third Digital Humanities in the Nordic Countries Conference, University of Helsinki, Finland, 7 March 2018. Slides Article
  18. Profanity in the Nordics on Twitter. Invited presentation at the Higher Seminar Series, Södertörn University, Sweden, 18 January 2018. Slides
  19. Profanity in the Nordics on Twitter. Presentation at the 5th SwiSca Symposium "What the HEL", University of Helsinki, Finland, 23 November 2017. Slides
  20. European language ecology and bilingualism with English on Twitter. Presentation at the 5th Conference on CMC and Social Media Corpora for the Humanities, EURAC Research, Bozen/Bolzano, Italy, 3 October 2017. Slides Article
  21. Multilingual clusters and gender in Nordic Twitter. Presentation at the CLARIN-PLUS Workshop "Creation and Use of Social Media Resources", Vytautas Magnus University, Kaunas, Lithuania, 19 May 2017. Slides
  22. Multilingual clusters and gender in Nordic Twitter. Presentation at the Second Digital Humanities in the Nordic Countries Conference, University of Gothenburg, Sweden, 14 March 2017. Slides
  23. Grammatical frequencies and gender in Nordic Twitter Englishes. Presentation at the 4th Conference on CMC and Social Media Corpora for the Humanities, University of Ljubljana, Slovenia, 27 September 2016. Slides Article
  24. Nordic Englishes on Twitter. Presentation at Digital Humanities in the Nordic Countries Conference, University of Oslo, Norway, 16 March 2016. Slides
  25. Gender and grammatical type frequencies in Finland Twitter English. Presentation at Interrelating Distance and Interaction Workshop, University of Oulu, Finland, 3 November 2015.
  26. Gender and lexical type frequencies in Finland Twitter English. Presentation at From data to evidence: Big data, rich data, uncharted data, University of Helsinki, Finland, 22 October 2015. Slides
  27. Non-standard lexical and grammatical resources in Finland Twitter English. Presentation at the 45th Poznań Linguistic Meeting, Poznań, Poland, 19 September 2015. Slides
  28. English-language social media in Finland: Twitter data collection and analysis. Presentation at the 12th Conference of the European Society for the Study of English, Košice, Slovakia, 31 August 2014. Slides
  29. Web corpora for discourse analysis: The language of travel and tourism. Presentation at the 79th Southeastern Conference on Linguistics, University of Kentucky, 13 April 2012.
  30. Lisa Lena Opas-Hänninen, Ilkka Juuso, William A. Kretzschmar, Jr., Tapio Seppänen, Steven Coats. The Digital Archive of Southern Speech. Helsinki Corpus Festival, 1 October 2011.
  31. Constituting mental maps: A corpus linguistics-based approach to perceptual geography in the GDR. Presentation at the 77th Southeastern Conference on Linguistics, University of Mississippi, 29 April 2010.
  32. William A. Kretzschmar, Jr., Paulina Bounds, Steven Coats, Tony Snodgrass, Lisa Lena Opas-Hänninen, Tapio Seppänen, and Ilkka Juuso. The Digital Archive of Southern Speech. 76th Southeastern Conference on Linguistics, Tulane University, 9 April 2009.

Education

Teaching

I teach courses on Academic Communication, Sociolinguistics, Digital Humanities, and North American studies. If you are a student in one of my courses, go to Moodle for course information and materials.

I was one of the group leaders at the Helsinki Digital Humanities Hackathon for the theme "Brexit in Transnational Social Media" in May 2019.

Professional activities

Visualizations

You can find a map of the semantic similarity of emoji types here. A representation of the links between languages for a sample of bilingual European Twitter users is available here. Check out some maps of variation in articulation rate in American English here. A table with more than 1,000 authentic double modals (with links to the videos at the time of utterance), as well as about 1,000 two-modal sequences that are instances of "self-repair", is here. The ZipfExplorer is a tool for the visualization of word frequency differences in texts.

My GitHub is here.