class: inverse, center, middle
background-image: url(data:image/png;base64,#https://cc.oulu.fi/~scoats/oululogoRedTransparent.png); background-repeat: no-repeat; background-size: 80px 57px; background-position:right top;
exclude: true

---
class: title-slide

<div style="position: absolute; top: 10%; left: 5%; background-color: rgba(255, 255, 255, 0.5); padding: 30px 40px; border-radius: 8px; max-width: 56%; box-shadow: 0px 10px 25px rgba(0,0,0,0.1); text-align: left;">
  <h2 style="font-family: 'Rubik', sans-serif; font-size: 1.8em; font-weight: 700; color: #1a202c; line-height: 1.15; margin: 0 0 20px 0;">
    MD_NLP: Reconstructing an Australian English Heritage Dialect Corpus from the Mitchell-Delbridge Recordings
  </h2>
  <div style="width: 70px; height: 6px; background-color: #901a1e; margin: 0 0 25px 0;"></div>
  <p style="font-family: 'Rubik', sans-serif; font-size: 1.2em; font-weight: 400; color: #1a202c; line-height: 1.5; margin: 0;">
    <strong>Steven Coats</strong><br>
    University of Oulu, Finland<br>
    <a href="mailto:steven.coats@oulu.fi" style="color: #901a1e; text-decoration: none;">steven.coats@oulu.fi</a><br>
    DialRes, LREC 2026
  </p>
</div>

---
layout: true

<div class="my-header"><img border="0" alt="Oulu logo" src="https://cc.oulu.fi/~scoats/oululogonewEng.png" width="80" height="80"></div>
<div class="my-footer"><span>Steven Coats                               Mitchell-Delbridge NLP | DialRes, LREC 2026</span></div>

---
exclude: true

### Outline

1. Background and motivation
2. DASS2019\_NLP dataset
3. Fine-tuning and evaluation
4. Main results
5. Error analysis, outlook, and limitations

---
### Background: Australian English Dialects?
.pull-left[
- "Australia is, generally speaking, linguistically unified" <span class="small">(Mitchell and Delbridge, 1965: 13)</span>
- The *Mitchell-Delbridge* recordings (1959/60)
- Recordings of 7,735 secondary school pupils in 327 locations across Australia <span class="small">(Mitchell and Delbridge, 1998)</span>
- Extensive metadata (age, school location, school type, birthplace, parents' birthplaces, father's occupation)
- Students read a word list and a test sentence, then engaged in a short conversation
- Tape recordings digitized in 1998
- Important for the study of Australian English <span class="small">(Cox et al., 2014, 2024)</span>
- Highly variable acoustic quality; narratives not previously transcribed
]

.pull-right[
]

---
### The Approach: A hybrid ASR workflow

.pull-left[
- ASR: WhisperX for initial transcription <span class="small">(Radford et al., 2023; Bain et al., 2023)</span>
- Diarization: Pyannote 4.0.1 <span class="small">(Bredin, 2023)</span>
- Diarization correction: Gemini-flash-2.5 LLM to fix the discourse roles using interactional structure <span class="small">(cf. Cheng et al., 2025)</span>
- Alignment: Montreal Forced Aligner <span class="small">(McAuliffe et al., 2017)</span> for precise word-level boundaries
]

.pull-right[
]

---
### Accuracy improvement

.pull-left[
<iframe style="width:420px;height:40px;border:none;overflow:hidden;" scrolling="no" srcdoc="
<body style='margin:0;overflow:hidden;background:transparent'>
<audio controls style='width:320px;height:50px;display:block'>
<source src='https://cc.oulu.fi/~scoats/Coats_LREC2026_MD_NLP/LREC_Dialres_example_combined.mp3' type='audio/mpeg'>
</audio>
</body>
">
</iframe><br>
.small[Teacher and student from St.
Peter's College, Adelaide, SA, 1959]
]

.pull-right[
Speaker turn accuracy

| System | Accuracy (%) |
|--------|--------------|
| Baseline (WhisperX + Pyannote) | 62.70 |
| Full Pipeline (LLM-assisted) | **95.68** |

- LLM-assisted pipeline improves speaker turn accuracy by 33 percentage points (from 62.70% to 95.68%)
]

---
### The *MD_NLP* Dataset: https://huggingface.co/datasets/stcoats/MD_NLP

- 177.2 hours of speech, 1.79 million word tokens
- The `interview_metadata.csv` file in the Hugging Face dataset contains additional metadata fields for each informant
- Researchers can now query 177 hours of historical AusE, filter by student background, map it to specific coordinates, and extract phonological data with precise timestamps

---
### Conclusion

- Unlocks spatial and diachronic research for Australian English
- Pipeline architecture is language-agnostic and modular
- Provides a blueprint for rescuing other legacy dialect archives (e.g. US Linguistic Atlas Project)

---
### Thanks for your attention!

#### Acknowledgements

- Supported by the **European Union -- NextGenerationEU** instrument
- Funded by the **Research Council of Finland**, grant **358720**
- Computational resources provided by **Finland's Centre for Scientific Computing**

---
### References

.small[
Bain, M., Huh, J., Han, T., & Zisserman, A. (2023). WhisperX: Time-accurate speech transcription of long-form audio. In *Proceedings of Interspeech 2023* (pp. 4489–4493). https://doi.org/10.21437/Interspeech.2023-78

Bredin, H. (2023). Pyannote.audio 2.1 Speaker diarization pipeline: Principle, benchmark, and recipe. In *Proceedings of Interspeech 2023* (pp. 1983–1987). https://doi.org/10.21437/Interspeech.2023-105

Cheng, L., Wang, H., Deng, C., Zheng, S., Chen, Y., Huang, R., Zhang, Q., Chen, Q., Li, X., & Wang, W. (2025). Integrating audio, visual, and semantic information for enhanced multimodal speaker diarization. In *Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics* (pp. 19914–19928).
https://aclanthology.org

Cox, F., Palethorpe, S., & Bentink, S. (2014). Phonetic Archaeology and 50 Years of Change to Australian English /iː/. *Australian Journal of Linguistics* 34(1), 50–75. https://doi.org/10.1080/07268602.2014.875455

Cox, F., Penney, J., & Palethorpe, S. (2024). Australian English Monophthong Change across 50 Years: Static versus Dynamic Measures. *Languages* 9(3), 99. https://doi.org/10.3390/languages9030099

McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2017). Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. In *Proceedings of Interspeech 2017* (pp. 498–502). https://doi.org/10.21437/Interspeech.2017-1386

Mitchell, A. G., & Delbridge, A. (1965). *The Pronunciation of English in Australia*. Angus and Robertson.

Mitchell, A. G., & Delbridge, A. (1998). *The speech of Australian adolescents: Research data and recordings collected by AG Mitchell and Arthur Delbridge in 1959 and 1960*. The University of Sydney. https://doi.org/10.25910/jkwy-wk76

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In *Proceedings of the 40th International Conference on Machine Learning* (PMLR 202, pp. 28448–28481). https://proceedings.mlr.press/v202/radford23a.html
]
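
---
### Appendix: Reading the accuracy gain

The two speaker turn accuracy figures from the "Accuracy improvement" slide (62.70% and 95.68%) can be compared in more than one way. The sketch below is illustrative arithmetic only, not part of the MD_NLP pipeline; the function name `compare_accuracy` and the dictionary keys are made up for this example.

```python
# Speaker turn accuracy (%) as reported in the talk.
BASELINE = 62.70       # WhisperX + Pyannote
FULL_PIPELINE = 95.68  # LLM-assisted pipeline


def compare_accuracy(before: float, after: float) -> dict:
    """Express an accuracy change in three common ways (hypothetical helper)."""
    absolute_gain = after - before            # difference in percentage points
    error_before = 100.0 - before             # turn error rate before, in %
    error_after = 100.0 - after               # turn error rate after, in %
    return {
        "absolute_gain_pp": absolute_gain,
        # Gain relative to the baseline accuracy
        "relative_gain_pct": 100.0 * absolute_gain / before,
        # Share of the baseline errors that the full pipeline removes
        "error_reduction_pct": 100.0 * (error_before - error_after) / error_before,
    }


gain = compare_accuracy(BASELINE, FULL_PIPELINE)
for name, value in gain.items():
    print(f"{name}: {value:.2f}")
```

Reporting the absolute difference in percentage points keeps it distinct from the relative gain and from the relative reduction in turn errors, which are different quantities computed from the same two numbers.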