Steven Coats

steven.coats (at) oulu.fi
University Lecturer, English Philology, University of Oulu, Finland



I'm a linguist interested in corpus linguistics, language variation, online language and social media, and computational approaches to language analysis, among other topics.

My research background is mostly in dialectology, sociolinguistics, and digital humanities. I've created the Corpus of North American Spoken English (CoNASE), the Corpus of British Isles Spoken English (CoBISE), the Corpus of Australian and New Zealand Spoken English (CoANZSE), and the Corpus of German Speech (CoGS). These are large corpora of geolocated YouTube transcripts. CoANZSE Audio is the searchable online version of CoANZSE; it contains audio and alignments in addition to speech transcripts.

Professional experience

Publications

  1. Coats, Steven. (2024). Commenting on local politics: An analysis of YouTube video comments for local government videos. Research in Corpus Linguistics 13(1), 1–25. Link
  2. Coats, Steven. (2024). A framework for analysis of speech and chat content in YouTube and Twitch streams. In Céline Poudat and Mathilde Guernut (eds.), Proceedings of the 11th Conference on CMC and Social Media Corpora for the Humanities, 16–19. Nice, France: CORLI. Link
  3. Coats, Steven. (2024). Building a searchable online corpus of Australian and New Zealand aligned speech. Australian Journal of Linguistics. Link
  4. Coats, Steven. (2024). CoANZSE Audio: Creation of an online corpus for linguistic and phonetic analysis of Australian and New Zealand Englishes. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue (eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 3407–3412. Link
  5. Coats, Steven. (2024). Naturalistic double modals in North America. American Speech 99(1), 47–77. Link Article
  6. Coats, Steven and Veronika Laippala, eds. (2024). Linguistics across disciplinary borders: The march of data. London: Bloomsbury Academic. Link
  7. Coats, Steven and Veronika Laippala. (2024). Introduction. In Steven Coats and Veronika Laippala (eds.), Linguistics across disciplinary borders: The march of data, 1–16. London: Bloomsbury Academic. Link
  8. Coats, Steven. (2024). Noisy data: Using automatic speech recognition transcripts for linguistic research. In Steven Coats and Veronika Laippala (eds.), Linguistics across disciplinary borders: The march of data, 17–39. London: Bloomsbury Academic. Link
  9. Méli, Adrien, Steven Coats and Nicolas Ballier. (2023). Methods for phonetic scraping of Youtube videos. In Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023), 244–249. Link
  10. Coats, Steven. (2023). Raumgeographische Verteilung von Twitter-Hashtags im deutschen Sprachraum. Neuphilologische Mitteilungen 114(2), 97–126. Link
  11. Morin, Cameron and Steven Coats. (2023). Double modals in Australian and New Zealand English. World Englishes. Link
  12. Coats, Steven. (2023). A new corpus of geolocated ASR transcripts from Germany. Language Resources and Evaluation. Link
  13. Coats, Steven. (2023). A pipeline for the large-scale acoustic analysis of streamed content. In Louis Cotgrove, Laura Herzberg, Harald Lüngen, and Ines Pisetta (eds.), Proceedings of the 10th International Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora 2023), 51–54. Mannheim: Leibniz-Institut für Deutsche Sprache. Link Article
  14. Coats, Steven. (2023). Double modals in contemporary British and Irish speech. English Language and Linguistics 27(4), 693–718. Link Article
  15. Coats, Steven. (2023). Dialect corpora from YouTube. In Beatrix Busse, Nina Dumrukcic, and Ingo Kleiber (eds.), Language and linguistics in a complex world, 79–102. Berlin: de Gruyter. Link
  16. Coats, Steven. (2022). The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts. In Pradeesh Parameswaran, Jennifer Biggs, and David Powers (eds.), Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association, 1–5. Adelaide, Australia: Australasian Language Technology Association. (Awarded "best short paper") Link
  17. Coats, Steven. (2022). A database of North American double modals and self-repairs from YouTube. Psychology of Language and Communication 26, 273–296. Link
  18. Coats, Steven. (2022). The Corpus of British Isles Spoken English (CoBISE): A new resource of contemporary British and Irish speech. In Karl Berglund, Matti La Mela, and Inge Zwart (eds.), Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference, Uppsala, Sweden, March 15–18, 2022, 187–194. Aachen, Germany: CEUR. Link
  19. Coats, Steven. (2021). ZipfExplorer: A Tool for the Comparison of Shared Lexis. In Sanita Reinsone, Inguna Skadiņa, Anda Baklāne, and Jānis Daugavieti (eds.), Post-Proceedings of the 5th Digital Humanities in the Nordic Countries Conference, Riga, Latvia, October 21–23, 2020, 145–155. Aachen, Germany: CEUR. Article Tool
  20. Coats, Steven. (2021). 'Bad language' in the Nordics: profanity and gender in a social media corpus. Acta Linguistica Hafniensia 53(1), 22–57. Link Article
  21. Coats, Steven. (2020). Comparing word frequencies and lexical diversity with the ZipfExplorer tool. In Sanita Reinsone, Inguna Skadiņa, Anda Baklāne, and Jānis Daugavieti (eds.), Proceedings of the 5th Digital Humanities in the Nordic Countries Conference, Riga, Latvia, October 21–23, 2020, 219–225. Aachen, Germany: CEUR. Article Tool
  22. Coats, Steven. (2020). Anglicism diversity in hyphenated German compounds. In Julien Longhi and Claudia Marinica (eds.), CMC Corpora through the Prism of Digital Humanities, 75–92. Paris: L'Harmattan. Link
  23. Coats, Steven. (2020). Articulation rate in American English in a corpus of YouTube videos. Language and Speech 63(4), 799–831. Link Article
  24. Coats, Steven and Adrien Barbaresi. (2019). Productivity of anglicism bases in hyphenated German compounds. In Julien Longhi and Claudia Marinica (eds.), Proceedings of the 7th Conference on CMC and Social Media Corpora for the Humanities, 53–58. Cergy, France: Cergy-Pontoise University. Link Article
  25. Coats, Steven. (2019). Lexicon geupdated: New German anglicisms in a social media corpus. European Journal of Applied Linguistics 7(2), 255–280. Link Article
  26. Coats, Steven. (2019). Language choice and gender in a Nordic social media corpus. Nordic Journal of Linguistics 42(1), 31–55. Link Article
  27. Coats, Steven. (2019). Online language ecology: Twitter in Europe. In Egon Stemle and Ciara Wigham (eds.), Building computer-mediated communication corpora for sociolinguistic analysis, 73–96. Clermont-Ferrand: Presses universitaires Blaise Pascal. Link Article
  28. Coats, Steven. (2019). A Corpus of regional American language from YouTube. In Costanza Navarretta et al. (eds.), Proceedings of the 4th Digital Humanities in the Nordic Countries Conference, Copenhagen, Denmark, March 6–8, 2019, 79–91. Aachen, Germany: CEUR. Article
  29. Coats, Steven. (2018). Variation of new German verbal Anglicisms in a social media corpus. In Reinhild Vandekerckhove, Darja Fišer, and Lisa Hilte (eds.), Proceedings of the 6th conference on CMC and social media corpora for the humanities, 27–32. Antwerp, Belgium: University of Antwerp. Link Article Data
  30. Coats, Steven. (2018). Skin tone emoji and sentiment on Twitter. In Eetu Mäkelä and Mikko Tolonen (eds.), Proceedings of the 3rd Digital Humanities in the Nordic Countries Conference, Helsinki, Finland, March 7–9, 2018, 122–138. Aachen, Germany: CEUR. Link Article
  31. Coats, Steven. (2018). Collecting Twitter data. In Christine Mallinson, Becky Childs, and Gerard Van Herk (eds.), Data collection in sociolinguistics: Methods and applications (2nd Ed.), 248–251. London/New York: Routledge. Link
  32. Coats, Steven. (2017). Gender and lexical type frequencies in Finland Twitter English. In Turo Hiltunen, Joe McVeigh, and Tanja Säily (eds.), Big and rich data in English corpus linguistics: Methods and explorations (= Studies in Variation, Contacts and Change in English 19). Helsinki, Finland: Varieng. Link
  33. Coats, Steven. (2017). Gender and grammatical frequencies in social media English from the Nordic countries. In Darja Fišer and Michael Beißwenger (eds.), Investigating social media corpora, 102–121. Ljubljana, Slovenia: U. of Ljubljana Academic Publishing. Link
  34. Coats, Steven. (2017). European language ecology and bilingualism with English on Twitter. In Egon Stemle and Ciara Wigham (eds.), Proceedings of the 5th conference on CMC and social media corpora for the humanities, 35–38. Bozen/Bolzano: Eurac Research. Article
  35. Coats, Steven. (2016). Grammatical feature frequencies of English on Twitter in Finland. In Lauren Squires (ed.), English in computer-mediated communication: Variation, representation, and change, 179–210. Boston/Berlin: de Gruyter Mouton. Link Article
  36. Coats, Steven. (2016). Grammatical frequencies and gender in Nordic Twitter Englishes. In Darja Fišer and Michael Beißwenger (eds.), Proceedings of the 4th conference on CMC and social media corpora for the humanities, 12–16. Ljubljana: U. of Ljubljana Academic Publishing. Article
  37. Kretzschmar, William A. Jr., Paulina Bounds, Jacqueline Hettel, Steven Coats, Lee Pederson, Lisa-Lena Opas-Hänninen, Ilkka Juuso, and Tapio Seppänen. (2012). Digital Archive of Southern Speech. Philadelphia, PA: Linguistic Data Consortium. Link

Presentations

  1. Regional variation in monophthongs in Australian and New Zealand Englishes: A big data approach. Presentation at the 10th Biennial Conference for the Linguistics of Contemporary English, Alicante, Spain, 27 September 2024. Slides
  2. A framework for analysis of speech and chat content in YouTube and Twitch streams. Presentation at the 11th International Conference on CMC and Social Media Corpora for the Humanities, Nice, France, 5 September 2024. Slides Code
  3. Analysis of online discourse from text to multimedia. Keynote presentation for the University of Eastern Finland SCE Program, 20 May 2024. Slides
  4. YouTube Phonetics Pipeline Workshop. Workshop for the ALOES pre-conference event, Paris, France, 28 March 2024. Slides Code
  5. A pipeline for the large-scale acoustic analysis of streamed content. Presentation at the 10th International Conference on CMC and Social Media Corpora for the Humanities, Mannheim, Germany, 15 September 2023. Slides
  6. The Corpus of Australian and New Zealand Spoken English (CoANZSE). Virtual presentation at the Workshop on Language Corpora in Australia, Canberra, Australia, 3 July 2023. Slides
  7. Corpora for the study of multimodal variation in English: Acoustic analysis from CoNASE. Presentation at Kielitieteen Päivät, Oulu, Finland, 25 May 2023. Slides
  8. Corpora of automatic speech recognition transcripts for the study of variation in English: Syntactic and phonetic perspectives. Presentation at PAC2023, Nanterre, France, 12 April 2023. Slides
  9. CoANZSE: The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts. Virtual presentation at the 20th Annual Workshop of the Australasian Language Technology Association, Adelaide, Australia, 16 December 2022. Slides
  10. Civic engagement with local government videos: Comparing YouTube transcripts with user comments. Presentation at the 9th Conference on CMC and Social Media Corpora for the Humanities, Santiago de Compostela, Spain, 29 September 2022. Slides
  11. CoANZSE: The Corpus of Australian and New Zealand Spoken English. Computational Thinking in the Humanities Online Workshop, Brisbane, Australia, 1 September 2022. Slides
  12. Double modals in YouTube videos from North America and the British Isles. Presentation at CoCorDial Workshop, Helsinki, Finland, 27 April 2022. Slides
  13. The Corpus of British Isles Spoken English (CoBISE): A new resource of contemporary British and Irish speech. Virtual presentation at DHNB 2022, Uppsala, Sweden, 17 March 2022. Slides
  14. Scraping online dictionaries for usage annotations. Virtual presentation at the 7th SwiSca Symposium, Reykjavík, Iceland, 2 December 2021. Slides
  15. A database of North American multiple modals from YouTube. Presentation at the 8th Conference on CMC and Social Media Corpora for the Humanities, Nijmegen, the Netherlands, 29 October 2021. Slides
  16. Multiple modals in the wild: A study of 24,530 multiple modal sequences in naturalistic North American speech. Virtual presentation for the workshop "The March of Data" at the Sixth International Society for the Study of English Conference, Joensuu, Finland, 2 June 2021. Slides
  17. Comparing word frequencies and lexical diversity with the ZipfExplorer tool. Virtual presentation at the Fifth Digital Humanities in the Nordic Countries Conference, National Library of Latvia, Riga, Latvia, 23 October 2020. Slides
  18. Dialect corpora from YouTube. Virtual presentation at ICAME 41, University of Heidelberg, Germany, 20–24 May 2020. Slides Video
  19. Steven Coats and Adrien Barbaresi. Productivity of anglicism bases in hyphenated German compounds. Presentation at the 7th Conference on CMC and Social Media Corpora for the Humanities, Cergy, France, 10 September 2019. Slides
  20. Regional variation in speech rate in American English from YouTube videos. Presentation at Research Data and Humanities Conference, University of Oulu, Finland, 14 August 2019, and 9th Conference of the Finnish Society for the Study of English, University of Tampere, Finland, 15 August 2019. Slides
  21. Swearing on Twitter: Harvesting and visualizing data. Slides for workshop at the 6th SwiSca Symposium, Södertörn University, Sweden, 23 May 2019. Slides
  22. A Corpus of regional American language from YouTube. Presentation at the Fourth Digital Humanities in the Nordic Countries Conference, University of Copenhagen, Denmark, 8 March 2019. Slides Article
  23. Variation of new German verbal Anglicisms in a social media corpus. Presentation at the 6th Conference on CMC and Social Media Corpora for the Humanities, Antwerp, Belgium, 17 September 2018. Slides
  24. Slides (German version of the preceding presentation)
  25. Exploring code-switching and borrowing using word vectors. Presentation at the 14th Conference of the European Society for the Study of English, Brno, Czechia, 1 September 2018. Slides
  26. William A. Kretzschmar, Jr. and Steven Coats. Fractal visualization of corpus data. ICAME 39, University of Tampere, Finland, 30 May 2018.
  27. Skin tone emoji and sentiment on Twitter. Presentation at the Third Digital Humanities in the Nordic Countries Conference, University of Helsinki, Finland, 7 March 2018. Slides Article
  28. Profanity in the Nordics on Twitter. Invited presentation at the Higher Seminar Series, Södertörn University, Sweden, 18 January 2018. Slides
  29. Profanity in the Nordics on Twitter. Presentation at the 5th SwiSca Symposium "What the HEL", University of Helsinki, Finland, 23 November 2017. Slides
  30. European language ecology and bilingualism with English on Twitter. Presentation at the 5th Conference on CMC and Social Media Corpora for the Humanities, EURAC Research, Bozen/Bolzano, Italy, 3 October 2017. Slides Article
  31. Multilingual clusters and gender in Nordic Twitter. Presentation at the CLARIN-PLUS Workshop "Creation and Use of Social Media Resources", Vytautas Magnus University, Kaunas, Lithuania, 19 May 2017. Slides
  32. Multilingual clusters and gender in Nordic Twitter. Presentation at the Second Digital Humanities in the Nordic Countries Conference, University of Gothenburg, Sweden, 14 March 2017. Slides
  33. Grammatical frequencies and gender in Nordic Twitter Englishes. Presentation at the 4th Conference on CMC and Social Media Corpora for the Humanities, University of Ljubljana, Slovenia, 27 September 2016. Slides Article
  34. Nordic Englishes on Twitter. Presentation at Digital Humanities in the Nordic Countries Conference, University of Oslo, Norway, 16 March 2016. Slides
  35. Gender and grammatical type frequencies in Finland Twitter English. Presentation at Interrelating Distance and Interaction Workshop, University of Oulu, Finland, 3 November 2015.
  36. Gender and lexical type frequencies in Finland Twitter English. Presentation at From data to evidence: Big data, rich data, uncharted data, University of Helsinki, Finland, 22 October 2015. Slides
  37. Non-standard lexical and grammatical resources in Finland Twitter English. Presentation at the 45th Poznań Linguistic Meeting, Poznań, Poland, 19 September 2015. Slides
  38. English-language social media in Finland: Twitter data collection and analysis. Presentation at the 12th Conference of the European Society for the Study of English, Košice, Slovakia, 31 August 2014. Slides
  39. Web corpora for discourse analysis: The language of travel and tourism. Presentation at the 79th Southeastern Conference on Linguistics, University of Kentucky, 13 April 2012.
  40. Lisa Lena Opas-Hänninen, Ilkka Juuso, William A. Kretzschmar, Jr., Tapio Seppänen, Steven Coats. The Digital Archive of Southern Speech. Helsinki Corpus Festival, 1 October 2011.
  41. Constituting mental maps: A corpus linguistics-based approach to perceptual geography in the GDR. Presentation at the 77th Southeastern Conference on Linguistics, University of Mississippi, 29 April 2010.
  42. William A. Kretzschmar, Jr., Paulina Bounds, Steven Coats, Tony Snodgrass, Lisa Lena Opas-Hänninen, Tapio Seppänen, and Ilkka Juuso. The Digital Archive of Southern Speech. 76th Southeastern Conference on Linguistics, Tulane University, 9 April 2009.

Education

Teaching

I teach courses on Academic Communication, Sociolinguistics, Digital Humanities, and North American studies. If you are a student in one of my courses, go to Moodle for course information and materials.

I was one of the group leaders at the Helsinki Digital Humanities Hackathon for the theme "Brexit in Transnational Social Media" in May 2019.

Professional activities

Visualizations

You can find a map of the semantic similarity of emoji types here. A representation of the links between languages for a sample of bilingual European Twitter users is available here. Check out some maps of variation in articulation rate in American English here. A table with more than 1,000 authentic double modals (with links to the videos at the time of utterance), as well as about 1,000 two-modal sequences that are instances of "self-repair," is here. The ZipfExplorer is a tool for the visualization of word frequency differences in texts.

My GitHub is here.