class: inverse, center, middle
background-image: url(data:image/png;base64,#https://cc.oulu.fi/~scoats/oululogoRedTransparent.png);
background-repeat: no-repeat;
background-size: 80px 57px;
background-position:right top;
exclude: true

---
class: title-slide

<br><br><br><br><br>

.pull-right[
<span style="font-family:Rubik;font-size:24pt;font-weight: 700;font-style: normal;float:right;text-align: right;color:white;-webkit-text-fill-color: black;-webkit-text-stroke: 0.8px;">Regional Variation in Australian English Monophthongs</span>
]

<br><br><br><br>

<p style="float:right;text-align: right;color:white;font-weight: 700;font-style: normal;-webkit-text-fill-color: black;-webkit-text-stroke: 0.5px;">
Steven Coats<br>
University of Oulu, Finland<br>
<a href="mailto:steven.coats@oulu.fi">steven.coats@oulu.fi</a><br>
ALS Conference, Griffith University<br>
December 4th, 2025<br>
</p>

---
layout: true

<div class="my-header"><img border="0" alt="W3Schools" src="https://cc.oulu.fi/~scoats/oululogonewEng.png" width="80" height="80"></div>
<div class="my-footer"><span>Steven Coats                              AusE Monophthongs | ALS25</span></div>

---
## Outline

1. Background
2. Methods: GAMM with a spatial component, spatial autocorrelation
3. Data: CoANZSE
4. Preliminary results
5. Outlook and summary

.footnote[Slides for the presentation are on my homepage at https://cc.oulu.fi/~scoats]

---
### Background

Traditional view: Regional variation in phonetic realization is limited in AUS.

"Australia is, generally speaking, linguistically unified" <span class="small">(Mitchell & Delbridge 1965: 13)</span>

- "Short a" (/æ/) in words like *dance*: Hobart > Melbourne, Sydney, and Brisbane; Adelaide < Melbourne, Sydney, and Brisbane <span class="small">(Horvath and Horvath 2001; Bradley 1991; 2008)</span>
- Lowering of prelateral /e/: More in Melbourne/VIC <span class="small">(Bradley 2008; Coats et al. 2025; Cox and Palethorpe 2004; Diskin et al. 2019; Loakes et al. 2011, 2014, 2017, 2024; Schmidt et al. 2021)</span>
- No difference in ~3500 monophthongs in hVd words from AusTalk speakers from Sydney, Melbourne, Perth, and Adelaide <span class="small">(Cox and Palethorpe 2019)</span>

---
### Methods

1. Mean F1 and F2 values extracted from naturalistic speech at a large number of AUS locations
2. GAMMs with spatial smooths <span class="small">(cf. Wirtz et al. 2025)</span>
   - Fixed effects for *state*, *previous consonant class*, and *following consonant class*; a 2D spatial smooth over longitude and latitude; and a random intercept for council
   - Incorporate latitude and longitude coordinates as a smoothing term
   - <span class="code">f1_z ~ state + prev_class + next_class + s(longitude, latitude) + s(council, bs = "re")</span>
3. Spatial autocorrelation <span class="small">(cf. Grieve 2017)</span>
   - Quantify the extent and scale of spatial patterning
   - Do F1 and F2 vary by geographic location?
   - What is the role of urban centers?
   - Are spatial patterns consistent with proposed ongoing chain shifts?
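---
### Methods: Moran's *I* sketch

The spatial autocorrelation step can be illustrated with a minimal sketch of global Moran's *I* over per-location mean formant values. It assumes inverse-distance weights (w_ij = 1/d_ij) computed on planar coordinates rather than longitude and latitude; the function name `morans_i` and the toy values are hypothetical and are not taken from the study data or its code.

```python
import math

def morans_i(coords, values):
    """Global Moran's I with inverse-distance weights w_ij = 1/d_ij (w_ii = 0).

    coords: list of (x, y) tuples, one per location (assumed planar, distinct)
    values: list of per-location attribute values, e.g. mean normalized F1
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]          # deviations from the global mean

    cross = 0.0   # sum over i != j of w_ij * dev_i * dev_j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(coords[i], coords[j])
            cross += w * dev[i] * dev[j]
            w_sum += w

    variance_term = sum(d * d for d in dev)
    return (n / w_sum) * (cross / variance_term)

# Toy example (invented values): six locations on a line
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
gradient = [1, 2, 3, 4, 5, 6]          # smooth spatial trend
alternating = [1, -1, 1, -1, 1, -1]    # neighbors always differ

i_gradient = morans_i(coords, gradient)        # positive: similar values cluster
i_alternating = morans_i(coords, alternating)  # negative: dispersed pattern
print(round(i_gradient, 3), round(i_alternating, 3))
```

Positive values indicate that nearby locations have similar formant values, and negative values indicate dispersion. A library implementation would additionally provide permutation-based significance tests, which this sketch omits.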
---
### Data: CoANZSE Audio <span class="small">(Coats 2022, 2024)</span>

- ASR transcripts and audio from YouTube channels of regional and local councils
- Many recordings are meetings: advantages in terms of representativeness and comparability
  - Speaker place of residence (cf. videos collected based on place-name search alone)
  - Topical contents and communicative contexts comparable

---
### Example video

<div class="video-thumbnail-container">
  <a href="https://www.youtube.com/watch?v=7wc48ALnmuM" target="_blank">
    <img src="https://img.youtube.com/vi/7wc48ALnmuM/maxresdefault.jpg" alt="Click to Watch Video on YouTube">
    <span class="play-button"></span>
  </a>
</div>

---
### WebVTT file

---
exclude: True

### Pipeline: Spatial analysis

Tobler's first law: "everything is related to everything else, but near things are more related than distant things" <span class="small">(Tobler 1970)</span>

- Moran's *I* and Getis-Ord *G*<span class='supsub'><sup>*</sup><sub>i</sub></span> based on mean F1 and F2 values at each location
- Spatial weights matrix `\(W\)` based on distance band `\(w_{ij}=1/d_{ij}\)`
- Minimum distance: all locations must have at least one neighbor
- For each vowel and formant, only locations with at least 200 tokens considered
- Spatial analysis conducted for AUS and NZ separately
- Calculated with esda and pysal Python packages

Moran's *I* <span class="small">(Moran 1950)</span>: Takes into account attribute values at all locations in a dataset and summarizes the overall extent of spatial correlation

- 1: perfect clustering of similar values, 0: random spatial distribution of values, -1: perfectly even dispersion of values

Getis-Ord *G*<span class='supsub'><sup>*</sup><sub>i</sub></span> <span class="small">(Getis & Ord, 1992; Ord & Getis, 1995)</span>: Identifies spatial clusters by evaluating the values at each location in the dataset in comparison with neighboring locations, in relation to the global dataset

- Positive for the location and its
neighbors > global mean; 0: value = global mean; negative: value < global mean

---
### CoANZSE <span class="small">(Coats 2024a, 2024b)</span> alignment

- Australian data: 38,786 videos from 408 councils (≈ 13,885 hours)
- All audio, cut into 20-second segments, retrieved from <https://coanzse.org>
- Forced alignment with **Montreal Forced Aligner** <span class="small">(McAuliffe et al. 2017)</span> v3.0.0, with its **UK English** acoustic model, G2P model, pronunciation dictionary, and the **adapt** functionality (council-specific acoustic adaptation)
- Output: Praat TextGrid files with tiers for orthographic transcript and phones

---
### Vowel and formant extraction

- Formants were measured at the durational midpoint of each vowel segment using *Parselmouth-Praat* <span class="small">(Jadoul et al. 2018)</span>, a Python port of Praat functions <span class="small">(Boersma & Weenink 2024)</span>
- Praat settings (default):
  - 5 formants, max formant 5,500 Hz
  - time step automatically determined from vowel duration
  - window length 0.025 s
  - pre-emphasis above 50 Hz
- Extracted vowel duration, **F1**, **F2**, and bandwidths

---
### Gender inference

- YouTube ASR transcripts lack diarization and metadata about speaker gender, so segments needed post-processing to control for gender imbalance across council locations
- Speaker sex/gender was inferred using [alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech](https://huggingface.co/alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech), a wav2vec2-based model reported to reach 99.93% accuracy
- Gender inference was run on the first 5 seconds of 3.68 million audio files using 4 AMD MI250X GPUs
- Extracting vowel tokens from the TextGrids for this audio yielded 37.3 million vowel tokens labeled “male” or “female”
- In a manual evaluation of 100 randomly sampled segments, 97 model labels matched human annotations (97% observed accuracy)

---
### Filtering

<style>
.ipa { font-family: "Charis SIL", "Doulos SIL", "Noto Sans", "Noto Sans IPA", serif; font-size: 100%; }
</style>

- <span class="ipa">/iː, ɪ, e, ʉː, æ, ɐ, oː, ɔ, ʊ, ɜː/</span> were retained (no. tokens = 21,458,728)
- Unstressed vowels, vowels followed by nasals or liquids, and short stop words were removed
  - Stress from the Carnegie Mellon Pronouncing Dictionary <span class="small">(Weide et al. 1998)</span>, stopwords from NLTK <span class="small">(Bird et al. 2009)</span>
- Some tokens had unrealistic formant values, likely the result of formant-tracking errors
- Outliers were removed with a Mahalanobis distance filter for F1 and F2, using the 95% quantile of the `\(\chi ^{2}\)` distribution as the critical value, following widespread practice <span class="small">(Labov et al. 2013; Mielke et al. 2019; Renwick and Stanley 2020)</span>

---
### Overview of vowel tokens

Study data: 7.2m vowel tokens

---
### Vowel tokens by state/territory, sex, and vowel (N = 7,208,190)

.small[
| State | Sex | iː | ɪ | e | æ | ɐ | ɔ | oː | ʊ | ʉː | ɜː |
|-------|-----|-------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| ACT | f | 2,390 | 7,578 | 4,004 | 3,581 | 1,098 | 273 | 303 | 349 | 1,143 | 1,285 |
| ACT | m | 2,421 | 8,040 | 4,144 | 4,032 | 1,233 | 334 | 219 | 371 | 1,197 | 1,233 |
| NSW | f | 83,695 | 257,451 | 134,456 | 109,927 | 34,572 | 11,132 | 10,970 | 17,251 | 38,637 | 30,798 |
| NSW | m | 108,677 | 385,683 | 192,563 | 160,316 | 59,865 | 16,822 | 15,902 | 25,747 | 64,913 | 47,732 |
| NT | f | 492 | 1,440 | 785 | 661 | 211 | 48 | 85 | 85 | 200 | 193 |
| NT | m | 481 | 1,251 | 740 | 679 | 193 | 64 | 89 | 114 | 231 | 198 |
| QLD | f | 55,876 | 162,321 | 85,875 | 73,009 | 24,453 | 7,722 | 6,233 | 8,498 | 27,015 | 22,681 |
| QLD | m | 92,840 | 279,219 | 146,002 | 133,499 | 54,420 | 14,689 | 10,326 | 14,824 | 52,521 | 40,802 |
| SA | f | 42,270 | 119,163 | 68,846 | 56,959 | 17,104 | 5,288 | 5,031 | 8,759 | 19,779 | 15,903 |
| SA | m | 58,634 | 182,700 | 94,909 | 86,676 | 27,588 | 8,153 | 7,229 | 14,125 | 31,230 | 25,024 |
| TAS | f | 12,715 | 39,984 | 21,112 | 16,745 | 6,139 | 1,642 | 1,454 | 2,203 | 6,399 | 5,332 |
| TAS | m | 18,488 | 70,653 | 36,424 | 30,720 | 12,197 | 2,949 | 2,427 | 4,147 | 11,335 | 8,834 |
| VIC | f | 132,280 | 395,838 | 198,137 | 156,966 | 53,022 | 17,662 | 15,002 | 19,840 | 61,939 | 52,368 |
| VIC | m | 131,883 | 433,326 | 213,884 | 180,280 | 69,976 | 20,860 | 15,725 | 24,089 | 71,745 | 58,958 |
| WA | f | 32,200 | 87,996 | 49,892 | 40,366 | 11,432 | 4,300 | 3,491 | 5,197 | 15,397 | 13,140 |
| WA | m | 27,008 | 80,400 | 42,980 | 38,851 | 12,807 | 3,611 | 2,816 | 4,110 | 14,569 | 13,176 |
| **Total** | | 802,350 | 2,513,043 | 1,294,753 | 1,093,267 | 386,310 | 115,549 | 97,302 | 149,709 | 418,250 | 337,657 |
]

---
### Normalization of formant values

- Needed to account for average vocal-tract size differences between males and females <span class="small">(Adank et al. 2004; Barreda & Nearey 2017; Fabricius et al. 2009; Flynn 2011; Thomas & Kendall 2007)</span>
- For the GAMM analysis, a formant-intrinsic, vowel-extrinsic normalization was applied:
  - For each **gender × vowel**, formant values were scaled to **z-scores**
- Method is appropriate for studies with large numbers of speakers <span class="small">(Thomas & Kendall 2007)</span>

---
exclude: True

### Results: Formant values

---
### Results: Formant values

---
### Results: GAMM

.small[
| Vowel | # Tokens | F1 (notable state effects) | F1 p-value | F2 (notable state effects) | F2 p-value |
|-------|----------|-----------------------------|------------|-----------------------------|------------|
| /iː/ | 801,236 | QLD: −0.33 \*, TAS: −0.38 (\*) | \*\*\* | NSW: −0.23 (\*), TAS: −0.42 (\*) | \*\*\* |
| /ɪ/ | 2,512,056 | NT: +0.82 \*\*\*, SA: +0.33 \*, WA: +0.84 \*\*\* | \*\* | SA: −0.26 \* | |
| /e/ | 1,293,416 | TAS: −0.53 \* | \*\*\* | NT: −0.56 \*\*, SA: −0.36 \*\*, TAS: −0.21 (\*), WA: −0.77 \*\* | (\*) |
| /æ/ | 1,092,101 | TAS: −0.51 \* | \*\*\* | NT: −0.42 \*, WA: −0.35 (\*) | \*\* |
| /ɐ/ | 384,757 | NT: +0.81 (\*), TAS: −0.50 \* | \* | | \* |
| /ɔ/ | 113,494 | TAS: −0.46 (\*) | \* | | |
| /oː/ | 95,317 | | \* | QLD: −0.28 (\*) | |
| /ʊ/ | 147,785 | NT: +1.06 (\*) | | | |
| /ʉː/ | 416,740 | | \* | TAS: −0.34 \* | |
| /ɜː/ | 335,894 | NT: +1.16 \*, TAS: −0.52 \* | \* | | (\*) |

(\*): p ≤ 0.1; \*: p ≤ 0.05; \*\*: p ≤ 0.01; \*\*\*: p ≤ 0.001

TAS: Simply more "short-a"? Ordinary least-squares models were created for both genders with only /æ/ tokens from the highly frequent item *that’s* (112,029 occurrences)

- Here as well, /æ/ is higher for TAS speakers
]

---
### Results: GAMM

---
### Results: Spatial Autocorrelation <span class="small">(Getis & Ord 1992; Moran 1950; Ord & Getis 1995)</span>

<div style="overflow:hidden;">
  <iframe src="https://cc.oulu.fi/~scoats/AUS_monophthongs/combined_maps.html" style="width:100%; height:520px; border:0;" scrolling="no" ></iframe>
</div>

---
### Preliminary findings

- Geographic variation: NT, SA, WA show lower <span class="ipa">/ɪ/</span> and more back <span class="ipa">/e/</span>; TAS shows higher <span class="ipa">/e, æ, ɐ/</span>
- Spatial autocorrelation: Moran’s I is weak overall, but Gi* reveals fine-scale pockets of variation that state boundaries don't capture; magnitude of local spatial autocorrelation values is not high
- Continuous spatial gradients: both GAMM and Gi* show geographic trends for most vowels, especially front vowels, radiating outward from major cities (Sydney, Melbourne, Brisbane, Adelaide, Perth) <span class="small">(cf. Cox and Penney 2024; Travis et al. 2023)</span>
  - Possibly influenced by L2 English of immigrants <span class="small">(Cox et al. 2024; Grama et al. 2021; Travis et al. 2023)</span>
- Chain shift evidence: Patterns in <span class="ipa">/e, æ, ɐ, ɪ/</span> match proposals for recent front-vowel chain shifts: "anticlockwise rotation of vowels within the F1/F2 space" <span class="small">(Cox et al. 2024, p.
16; Cox & Palethorpe 2008)</span>

---
### Outlook/caveats

- Data: ASR accuracy, formant extraction, MFA pronunciation dictionary/AusE phone mismatch
- GAMM: No speaker info, so no speaker random effects. No lexical item random effects (>10k types)

Outlook:

- Carefully curated subset of data (high-quality council recordings, subset of word list) may give more robust results

---
### Summary

- Pipeline approaches can be used to automatically process large data volumes and extract formants for millions of vowels
- Incipient regional variation attested for Australia
- Australian cities may be leading the changes
- Still much to be done!

#### Thank you for your attention!

---
#### References

.verysmall[
.hangingindent[
.pull-left[
Adank, P., Smits, R., and van Hout, R. (2004). [A comparison of vowel normalization procedures for language variation research](https://doi.org/10.1121/1.1795335). *The Journal of the Acoustical Society of America* 116(5), 3099–3107.

Barreda, S., and Nearey, T. M. (2017). [A regression approach to vowel normalization for missing and unbalanced data](https://doi.org/10.1121/1.5014454). *The Journal of the Acoustical Society of America* 142(4_Supplement), 2583–2583.

Bird, S., Klein, E., and Loper, E. (2009). *Natural Language Processing with Python*. O’Reilly Media Inc.

Boersma, P. and Weenink, D. (2024). [Praat: Doing phonetics by computer](https://www.praat.org/).

Bradley, D. (1991). /Æ/ and /a:/ in Australian English. In J. Cheshire (ed.), *English around the World* (1st ed.), 227–234. Cambridge University Press. https://doi.org/10.1017/CBO9780511611889.016

Bradley, D. (2008). Regional characteristics of Australian English: Phonology. In B. Kortmann, E. W. Schneider, and K. Burridge (eds.), *A Handbook of Varieties of English, Volume 1: Phonology. Part 3: The Pacific and Australasia*, 111–123. De Gruyter Mouton. https://doi.org/10.1515/9783110208412.1.111

Coats, S. (2022). CoANZSE: [The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts](https://aclanthology.org/2022.alta-1.1/). In P. Parameswaran, J. Biggs, and D. Powers (eds.), *Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association*, 1–5.

Coats, S. (2024). [CoANZSE Audio: Creation of an online corpus for linguistic and phonetic analysis of Australian and New Zealand Englishes](https://aclanthology.org/2024.lrec-main.302). In *Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)*, 3407–3412.

Coats, S., Diskin-Holdaway, C., and Loakes, D. (2025). [Regional distribution of the /el/-/æl/ merger in Australian English](https://aclanthology.org/2025.vardial-1.11). In Y. Scherrer, T. Jauhiainen, N. Ljubešić, P. Nakov, J. Tiedemann, and M. Zampieri (eds.), *Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects*.

Cox, F. and Palethorpe, S. (2019). [Vowel variation in a standard context across four major Australian cities](https://assta.org/proceedings/ICPhS2019/papers/ICPhS_626.pdf). *Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019*, 577–581.

Cox, F. and Palethorpe, S. (2008). [Reversal of short front vowel raising in Australian English](https://doi.org/10.21437/Interspeech.2008-144). *Interspeech 2008*, 342–345.

Cox, F., Penney, J., and Palethorpe, S. (2024). [Australian English monophthong change across 50 Years: Static versus dynamic measures](https://doi.org/10.3390/languages9030099). *Languages* 9(3), 99.

Diskin, C., Loakes, D., Billington, R., Gonzalez, S., Volchok, B., and Clothier, J. (2019).
Sociophonetic variability in the /el/-/æl/ merger in Australian (Melbourne) English: Comparing wordlist and conversational data.

Fabricius, A. H., Watt, D., and Johnson, D. E. (2009). [A comparison of three speaker-intrinsic vowel formant frequency normalization algorithms for sociophonetics](https://doi.org/10.1017/S0954394509990160). *Language Variation and Change* 21(3), 413–435.
]

.pull-right[
Ferreira, A. I. S. (2024). [wav2vec2-large-xlsr-53-gender-recognition-librispeech](https://huggingface.co/alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech).

Flynn, N. (2011). Comparing vowel formant normalisation procedures. *York Papers in Linguistics Series* 2(11), 1–28.

Getis, A. and Ord, J. K. (1992). [The Analysis of Spatial Association by Use of Distance Statistics](https://doi.org/10.1111/j.1538-4632.1992.tb00261.x). *Geographical Analysis* 24, 189–206.

Gonzalez, S., Grama, J., and Travis, C. E. (2020). [Comparing the performance of forced aligners used in sociophonetic research](https://doi.org/10.1515/lingvan-2019-0058). *Linguistics Vanguard* 6, 20190058.

Grama, J., Travis, C. E., and Gonzalez, S. (2021). Ethnic variation in real time: Change in Australian English diphthongs. In H. Van de Velde, N. H. Hilton, and R. Knooihuizen (eds.), *Language Variation--European Perspectives VIII*, 291–314. John Benjamins.

Grieve, J. (2017). [Spatial Statistics for Dialectology](https://doi.org/10.1002/9781118827628.ch24). In *The Handbook of Dialectology*, 415–433. John Wiley & Sons, Ltd.

Horvath, B. M. and Horvath, R. J. (2001). Short A in Australian English: A geolinguistic study. In D. Blair and P. Collins (eds.), *English in Australia*, 341–356. John Benjamins.

Jadoul, Y., Thompson, B., and de Boer, B. (2018). [Introducing Parselmouth: A Python interface to Praat](https://doi.org/10.1016/j.wocn.2018.07.001). *Journal of Phonetics* 71, 1–15.

Labov, W., Rosenfelder, I., and Fruehwald, J. (2013).
One hundred years of sound change in Philadelphia: Linear incrementation, reversal, and reanalysis. *Language* 89(1), 30–65.

Loakes, D., Clothier, J., Hajek, J., and Fletcher, J. (2024). [Sociophonetic variation in vowel categorization of Australian English](https://doi.org/10.1177/00238309231198520). *Language and Speech* 67(3), 870–906.

Loakes, D., Fletcher, J., and Clothier, J. (2024). [One place, two speech communities: Differing responses to sound change in Mainstream and Aboriginal Australian English in a small rural town](https://doi.org/10.1515/9783110765328-005). In F. Kleber and T. Rathcke (eds.), *Speech dynamics: Synchronic variation and diachronic change*, 117–144. De Gruyter Mouton.

Loakes, D., Hajek, J., Clothier, J., and Fletcher, J. (2014). Identifying /el/-/æl/: A comparison between two regional Australian towns. *15th Australasian International Conference on Speech Science and Technology*.

Loakes, D., Hajek, J., and Fletcher, J. (2011). /el/-/æl/ transposition in Australian English: Hypercorrection or a competing sound change? *Proceedings of the 17th International Congress of Phonetic Sciences*.

Loakes, D., Hajek, J., and Fletcher, J. (2017). [Can you t\[æ\]ll I’m from M\[æ\]lbourne?](https://doi.org/10.1075/eww.38.1.03loa). *English World-Wide* 38(1), 29–49.

MacKenzie, L. and Turton, D. (2020). [Assessing the accuracy of existing forced alignment software on varieties of British English](https://doi.org/10.1515/lingvan-2018-0061). *Linguistics Vanguard* 6.
]
]]

---
.verysmall[
.hangingindent[
.pull-left[
McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., and Sonderegger, M. (2017). [Montreal Forced Aligner: Trainable text-speech alignment using Kaldi](https://doi.org/10.21437/Interspeech.2017-1386). In *Proceedings of the 18th Conference of the International Speech Communication Association*, 498–502.

Mielke, J., Thomas, E. R., Fruehwald, J., McAuliffe, M., Sonderegger, M., Stuart-Smith, J., and Dodsworth, R. (2019). Age vectors vs.
axes of intraspeaker variation in vowel formants measured automatically from several English speech corpora. *Proceedings of the International Congress of Phonetic Sciences (ICPhS 2019)*, 1258–1262.

Mitchell, A. G. and Delbridge, A. (1965). *The speech of Australian adolescents: A survey*. Angus and Robertson.

Moran, P. A. P. (1950). [Notes on Continuous Stochastic Phenomena](http://www.jstor.org/stable/2332142). *Biometrika* 37, 17–23.

Ord, J. K. and Getis, A. (1995). [Local Spatial Autocorrelation Statistics: Distributional Issues and an Application](https://doi.org/10.1111/j.1538-4632.1995.tb00912.x). *Geographical Analysis* 27, 286–306.

Plaquet, A. and Bredin, H. (2023). [Powerset multi-class cross entropy loss for neural speaker diarization](https://doi.org/10.21437/Interspeech.2023-205). In *INTERSPEECH 2023*, 3222–3226.

Renwick, M. E. L., and Stanley, J. A. (2020). [Modeling dynamic trajectories of front vowels in the American South](https://doi.org/10.1121/10.0000549). *The Journal of the Acoustical Society of America* 147(1), 579–595.

Schmidt, P., Diskin-Holdaway, C., and Loakes, D. (2021). [New insights into /el/-/æl/ merging in Australian English](https://doi.org/10.1080/07268602.2021.1905607). *Australian Journal of Linguistics* 41(1), 66–95.

Thomas, E. R., and Kendall, T. (2007). [*NORM: The vowel normalization and plotting suite*](https://lingtools.uoregon.edu/norm/about_norm1.php).

Tobler, W. R. (1970). [A Computer Movie Simulating Urban Growth in the Detroit Region](http://www.jstor.org/stable/143141). *Economic Geography* 46, 234–240.

Travis, C. E., Grama, J., Gonzalez, S., Purser, B., and Johnstone, C. (2023). [Sydney Speaks Corpus](https://doi.org/10.25911/m03c-yz22).

Weide, R. and others (1998). [The Carnegie Mellon Pronouncing Dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict).

Wirtz, M. A., Pickl, S., Niehaus, K., Elspaß, S., and Möller, R. (2025).
[Gebrauchsstandard in der deutschen Alltagssprache: Eine integrative Modellierung räumlicher und sozialer Variation](https://doi.org/10.1515/zgl-2025-2003). *Zeitschrift für germanistische Linguistik* 53(1), 97–125. ] ]] --- exclude: True .verysmall[ .hangingindent[ .pull-left[ Adank, P., Smits, R., and van Hout, R. (2004). [A comparison of vowel normalization procedures for language variation research](https://doi.org/10.1121/1.1795335). *The Journal of the Acoustical Society of America* 116(5), 3099–3107. Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F. M., and Weber, G. (2020). [Common Voice: A massively-multilingual speech corpus](https://arxiv.org/abs/1912.06670). Baevski, A., Zhou, H., Mohamed, A., and Auli, M. (2020). [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477). Barreda, S., and Nearey, T. M. (2017). [A regression approach to vowel normalization for missing and unbalanced data](https://doi.org/10.1121/1.5014454). *The Journal of the Acoustical Society of America* 142(4_Supplement), 2583–2583. Benjamin, B. J. (1982). [Phonological performance in gerontological speech](https://doi.org/10.1007/bf01068218). *Journal of Psycholinguistic Research* 11(2), 159–167. Bernard, J. R. L. (1967). [Some measurements of some sounds of Australian English](https://ses.library.usyd.edu.au/handle/2123/1502). Bernard, J. R. L. (1981). Australian Pronunciation. In A. Delbridge, J. R. L. Bernard, D. Blair, and W. S. Ramson (eds.), *The Macquarie Dictionary*, 18–27. The Macquarie Library. Billington, R. (2011). [Location, Location, Location! Regional Characteristics and National Patterns of Change in the Vowels of Melbourne Adolescents](https://doi.org/10.1080/07268602.2011.598628). *Australian Journal of Linguistics* 31(3), 275–303. Boersma, P., and Weenink, D. (2024). [Praat: Doing phonetics by computer](https://www.praat.org/) [Computer program]. 
Brand, J., Hay, J., Clark, L., Watson, K., and Sóskuthy, M. (2021). [Systematic co-variation of monophthongs across speakers of New Zealand English](https://doi.org/10.1016/j.wocn.2021.101096). *Journal of Phonetics* 88, 101096. Butcher, A. (2006). Formant frequencies of /hVd/ vowels in the speech of South Australian females. In P. Warren and C. Watson (eds.), *Proceedings of the 11th Australian International Conference on Speech Science & Technology*, 449–453. Australian Speech Science & Technology Association. Childers, D. G., and Wu, K. (1991). [Gender recognition from speech. Part II: Fine analysis](https://doi.org/10.1121/1.401664). *The Journal of the Acoustical Society of America* 90(4), 1841–1856. Coats, S. (2022). [The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts](https://aclanthology.org/2022.alta-1.1/). *Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association*. Coats, S. (2024a). [Building a searchable online corpus of Australian and New Zealand aligned speech](https://doi.org/10.1080/07268602.2024.2368780). *Australian Journal of Linguistics* 44(2–3), 261–277. Coats, S. (2024b). [CoANZSE Audio: Creation of an online corpus for linguistic and phonetic analysis of Australian and New Zealand Englishes](https://aclanthology.org/2024.lrec-main.302). In *Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)*, 3407–3412.] .pull-right[ Coats, S., Diskin-Holdaway, C., and Loakes, D. (2025). [Regional Distribution of the /el/-/æl/ Merger in Australian English](https://aclanthology.org/2025.vardial-1.11/). In Y. Scherrer, T. Jauhiainen, N. Ljubešić, P. Nakov, J. Tiedemann, and M. Zampieri (eds.), *Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects*, 147–156. Association for Computational Linguistics. Coto-Solano, R., Stanford, J. N., and Reddy, S. K. 
(2021). [Advances in completely automated vowel analysis for sociophonetics: Using end-to-end speech recognition systems With DARLA](https://doi.org/10.3389/frai.2021.662097). *Frontiers in Artificial Intelligence* 4. Cox, F. (1999). [Vowel change in Australian English](https://doi.org/10.1159/000028438). *Phonetica* 56(1–2), 1–27. Cox, F. (2006). [The Acoustic Characteristics of /hVd/ Vowels in the Speech of some Australian Teenagers](https://doi.org/10.1080/07268600600885494). *Australian Journal of Linguistics* 26(2), 147–179. Cox, F., and Palethorpe, S. (2001). The changing face of Australian English vowels. In D. Blair and P. Collins (eds.), *English in Australia* (Vol. 26), 17–44. John Benjamins Publishing Company. Cox, F., and Palethorpe, S. (2004). The border effect: Vowel differences across the NSW/Victorian border. In *Proceedings of the 2003 Conference, Australian Linguistics Society*. Cox, F., and Palethorpe, S. (2008). [Reversal of short front vowel raising in Australian English](https://doi.org/10.21437/Interspeech.2008-144). *Interspeech 2008*, 342–345. Cox, F., and Palethorpe, S. (2017). Open Vowels in Historical Australian English. In R. Hickey (ed.), *Listening to the Past* (1st ed.), 502–528. Cambridge University Press. https://doi.org/10.1017/9781107279865.022 Cox, F., Palethorpe, S., and Bentink, S. (2014a). [Phonetic Archaeology and 50 Years of Change to Australian English /i:/](https://doi.org/10.1080/07268602.2014.875455). *Australian Journal of Linguistics* 34(1), 50–75. Cox, F., Palethorpe, S., and Bentink, S. (2014b). [Phonetic Archaeology and 50 Years of Change to Australian English /iː/](https://doi.org/10.1080/07268602.2014.875455). *Australian Journal of Linguistics* 34(1), 50–75. Cox, F., Palethorpe, S., and Penney, J. (2024). [Fifty years of monophthong and diphthong shifts in Mainstream Australian English](https://doi.org/10.1515/9783110765328-002). In F. Kleber and T. Rathcke (eds.), *Speech Dynamics*, 17–48. De Gruyter. 
Cox, F., and Penney, J. (2024). [Multicultural Australian English: The new voice of Sydney](https://doi.org/10.1080/07268602.2024.2380680). *Australian Journal of Linguistics*, 1–20. Cox, F., Penney, J., and Palethorpe, S. (2024). [Australian English Monophthong Change across 50 Years: Static versus Dynamic Measures](https://doi.org/10.3390/languages9030099). *Languages* 9(3), 99. ] ]] --- exclude: True .verysmall[ .hangingindent[ .pull-left[ Docherty, G., Foulkes, P., and Gonzalez, S. (2024). [“It’s a Bit Tricky, Isn’t It?”—An Acoustic Study of Contextual Variation in /ɪ/ in the Conversational Speech of Young People from Perth](https://doi.org/10.3390/languages9110343). *Languages* 9(11), 343. Docherty, G., Foulkes, P., Gonzalez, S., and Mitchell, N. (2018a). [Missed connections at the junction of sociolinguistics and speech processing](https://doi.org/10.1111/tops.12375). *Topics in Cognitive Science* 10(4), 759–774. Docherty, G., Foulkes, P., Gonzalez, S., and Mitchell, N. (2018b). [Missed Connections at the Junction of Sociolinguistics and Speech Processing](https://doi.org/10.1111/tops.12375). *Topics in Cognitive Science* 10(4), 759–774. Docherty, G., Gnevsheva, K., Travis, C., and Revius, K. (2025). [Phonetic properties of Australian English in non-metropolitan settings: A scoping review](https://doi.org/10.17605/OSF.IO/KZCHB). Docherty, G., Gonzalez, S., and Foulkes, P. (2023). An acoustic study of the realisation of KIT in the conversational speech of young English speakers in Australia. *Proceedings of the 20th International Congress of Phonetic Sciences*, 3061–3065. Docherty, G., Gonzalez, S., and Mitchell, N. (2015). Static vs dynamic perspectives on the realization of vowel nucleii in West Australian English. *Proceedings of the 18th International Congress of Phonetic Sciences*. Elvin, J., Williams, D., and Escudero, P. (2016). 
[Dynamic acoustic properties of monophthongs and diphthongs in Western Sydney Australian English](https://doi.org/10.1121/1.4952387). *The Journal of the Acoustical Society of America* 140(1), 576–581. Estival, D., Cassidy, S., Cox, F., and Burnham, D. (2014). AusTalk: An audio-visual corpus of Australian English. In *Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14)*, 3105–3109. http://www.lrec-conf.org/proceedings/lrec2014/pdf/520_Paper.pdf Fabricius, A. H., Watt, D., and Johnson, D. E. (2009). [A comparison of three speaker-intrinsic vowel formant frequency normalization algorithms for sociophonetics](https://doi.org/10.1017/S0954394509990160). *Language Variation and Change* 21(3), 413–435. Ferreira, A. I. S. (2024). [Wav2vec2-large-xlsr-53-gender-recognition-librispeech](https://huggingface.co/alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech). Fletcher, A. R., McAuliffe, M. J., Lansford, K. L., and Liss, J. M. (2015). [The relationship between speech segment duration and vowel centralization in a group of older speakers](https://doi.org/10.1121/1.4930563). *The Journal of the Acoustical Society of America* 138(4), 2132–2139. Flynn, N. (2011). Comparing vowel formant normalisation procedures. *York Papers in Linguistics Series* 2(11), 1–28. Gendrot, C., and Adda-Decker, M. (2005). [Impact of duration on F1/F2 formant values of oral vowels: An automatic analysis of large broadcast news corpora in French and German](https://doi.org/10.21437/interspeech.2005-753). *Interspeech 2005*, 2453–2456. ] .pull-right[ Getis, A., and Ord, J. K. (1992). [The Analysis of Spatial Association by Use of Distance Statistics](https://doi.org/10.1111/j.1538-4632.1992.tb00261.x). *Geographical Analysis* 24(3), 189–206. Gonzalez, S., Grama, J., and Travis, C. E. (2020). [Comparing the performance of forced aligners used in sociophonetic research](https://doi.org/10.1515/lingvan-2019-0058). *Linguistics Vanguard* 6(1). 
Grama, J., Travis, C. E., and Gonzalez, S. (2019). [Initiation, progression, and conditioning of the short-front vowel shift in Australia](https://assta.org/proceedings/ICPhS2019Microsite/pdf/full-paper_797.pdf). In *Proceedings of the 19th International Congress of Phonetic Sciences*, 1769–1773. Grieve, J., Speelman, D., and Geeraerts, D. (2011). [A statistical method for the identification and aggregation of regional linguistic variation](https://doi.org/10.1017/S095439451100007X). *Language Variation and Change* 23(2), 193–221. Grieve, J., Speelman, D., and Geeraerts, D. (2013). [A multivariate spatial analysis of vowel formants in American English](https://doi.org/10.1017/jlg.2013.3). *Journal of Linguistic Geography* 1(1), 31–51. Harrington, J., Cox, F., and Evans, Z. (1997). [An acoustic phonetic study of broad, general, and cultivated Australian English vowels](https://doi.org/10.1080/07268609708599550). *Australian Journal of Linguistics* 17(2), 155–184. Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). [Acoustic characteristics of American English vowels](https://doi.org/10.1121/1.411872). *The Journal of the Acoustical Society of America* 97(5), 3099–3111. Horvath, B. M., and Horvath, R. J. (2001). [Short A in Australian English: A geolinguistic study](https://doi.org/10.1075/veaw.g26.30hor). In D. Blair and P. Collins (eds.), *English in Australia* (Varieties of English Around the World G26), 341–356. John Benjamins Publishing Company. Jacewicz, E., and Fox, R. A. (2013). [Cross-Dialectal Differences in Dynamic Formant Patterns in American English Vowels](https://doi.org/10.1007/978-3-642-14209-3_8). In G. S. Morrison and P. F. Assmann (eds.), *Vowel Inherent Spectral Change*, 177–198. Springer. Jadoul, Y., Thompson, B., and de Boer, B. (2018). [Introducing Parselmouth: A Python interface to Praat](https://doi.org/10.1016/j.wocn.2018.07.001). *Journal of Phonetics* 71, 1–15. Labov, W. (1994). 
*Principles of Linguistic Change, Volume 1: Internal Factors*. Wiley. https://www.wiley.com/en-us/Principles+of+Linguistic+Change%2C+Volume+1%3A+Internal+Factors-p-9780631179146 Labov, W., Ash, S., and Boberg, C. (2006). *The Atlas of North American English: Phonetics, Phonology and Sound Change*. Mouton de Gruyter. https://doi.org/10.1515/9783110167467 ] ]] --- exclude: True .verysmall[ .hangingindent[ .pull-left[ Lindblom, B. (1990). Explaining Phonetic Variation: A Sketch of the H&H Theory. In W. J. Hardcastle and A. Marchal (eds.), *Speech Production and Speech Modelling*, 403–439. Springer Netherlands. https://doi.org/10.1007/978-94-009-2037-8_16 MacKenzie, L., and Turton, D. (2020). [Assessing the accuracy of existing forced alignment software on varieties of British English](https://doi.org/10.1515/lingvan-2018-0061). *Linguistics Vanguard* 6. McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., and Sonderegger, M. (2017). [Montreal Forced Aligner: Trainable text-speech alignment using Kaldi](https://doi.org/10.21437/Interspeech.2017-1386). *Interspeech 2017*, 498–502. Meunier, C., and Espesser, R. (2011). [Vowel reduction in conversational speech in French: The role of lexical factors](https://doi.org/10.1016/j.wocn.2010.11.008). *Journal of Phonetics* 39(3), 271–278. Mitchell, A. G., and Delbridge, A. (1965a). *The Pronunciation of English in Australia*. Angus and Robertson. Mitchell, A. G., and Delbridge, A. (1965b). *The speech of Australian adolescents: A survey*. Angus and Robertson. Mitchell, A. G., and Delbridge, A. (1998). *The speech of Australian adolescents: Research data and recordings collected by AG Mitchell and Arthur Delbridge in 1959 and 1960*. The University of Sydney. https://doi.org/10.25910/jkwy-wk76 ] .pull-right[ Moran, P. A. P. (1950). Notes on Continuous Stochastic Phenomena. *Biometrika* 37(1/2), 17–23. Morrison, G. S., and Assmann, P. F. (eds.) (2013). *Vowel Inherent Spectral Change*. Springer. 
https://doi.org/10.1007/978-3-642-14209-3 Nearey, T. M., and Assmann, P. F. (1986). [Modeling the role of inherent spectral change in vowel identification](https://doi.org/10.1121/1.394433). *The Journal of the Acoustical Society of America* 80(5), 1297–1308. Ord, J. K., and Getis, A. (1995). [Local Spatial Autocorrelation Statistics: Distributional Issues and an Application](https://doi.org/10.1111/j.1538-4632.1995.tb00912.x). *Geographical Analysis* 27(4), 286–306. Panayotov, V., Chen, G., Povey, D., and Khudanpur, S. (2015). [Librispeech: An ASR corpus based on public domain audio books](https://doi.org/10.1109/ICASSP.2015.7178964). *ICASSP 2015*, 5206–5210. Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I. (2022). [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356). Rey, S. J., and Anselin, L. (2010). PySAL: A Python library of spatial analytical methods. In M. M. Fischer and A. Getis (eds.), *Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications*, 175–193. Springer. https://doi.org/10.1007/978-3-642-03647-7_11 Reynolds, D. A. (1995). [Speaker identification and verification using Gaussian mixture speaker models](https://doi.org/10.1016/0167-6393(95)00009-d). *Speech Communication* 17(1–2), 91–108. Reynolds, D. A., and Rose, R. C. (1995). [Robust text-independent speaker identification using Gaussian mixture speaker models](https://doi.org/10.1109/89.365379). *IEEE Transactions on Speech and Audio Processing* 3(1), 72–83. Sóskuthy, M. (2017). [Generalised additive mixed models for dynamic analysis in linguistics: A practical introduction](https://doi.org/10.48550/arXiv.1703.05339). arXiv. Sóskuthy, M., Hay, J., and Brand, J. (2019). Horizontal diphthong shift in New Zealand English. In S. Calhoun, P. Escudero, M. Tabain, and P. Warren (eds.), *Proceedings of the 19th International Congress of Phonetic Sciences*, 597–601. Stanley, J. A., Renwick, M. E. 
L., Kuiper, K. I., and Olsen, R. M. (2021). [Back Vowel Dynamics and Distinctions in Southern American English](https://doi.org/10.1177/00754242211043163). *Journal of English Linguistics* 49(4), 389–418. ] ]] --- exclude: True .verysmall[ .hangingindent[ .pull-left[ Story, R. (2025). *Folium* [Computer software]. https://github.com/python-visualization/folium Travis, C. E. (2023). *Sydney Speaks: Variation and Change in Australian English*. The Australian National University Data Commons. https://doi.org/10.25911/M03C-YZ22 Watson, C. I., and Harrington, J. (1999). [Acoustic evidence for dynamic formant trajectories in Australian English vowels](https://doi.org/10.1121/1.427069). *The Journal of the Acoustical Society of America* 106(1), 458–468. Weide, R., et al. (1998). *The Carnegie Mellon Pronouncing Dictionary*. http://www.speech.cs.cmu.edu/cgi-bin/cmudict Wells, J. C. (1982). *Accents of English* (1st ed.). Cambridge University Press. https://doi.org/10.1017/cbo9780511611759 Wieling, M. (2018). [Analyzing dynamic phonetic data using generalized additive mixed modeling: A tutorial focusing on articulatory differences between L1 and L2 speakers of English](https://doi.org/10.1016/j.wocn.2018.03.002). *Journal of Phonetics* 70, 86–116. Wieling, M., Montemagni, S., Nerbonne, J., and Baayen, R. H. (2014). Lexical differences between Tuscan dialects and standard Italian: Accounting for geographic and sociodemographic variation using generalized additive mixed modeling. *Language* 90(3), 669–692. Wieling, M., Nerbonne, J., and Baayen, R. H. (2011). [Quantitative Social Dialectology: Explaining Linguistic Variation Geographically and Socially](https://doi.org/10.1371/journal.pone.0023613). *PLOS ONE* 6(9), e23613. Wood, S. N. (2017). *Generalized Additive Models: An Introduction with R* (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781315370279 Wu, K., and Childers, D. G. (1991). [Gender recognition from speech. 
Part I: Coarse analysis](https://doi.org/10.1121/1.401663). *The Journal of the Acoustical Society of America* 90(4), 1828–1840. Zahorian, S. A., and Jagharghi, A. J. (1993). [Spectral-shape features versus formants as acoustic correlates for vowels](https://doi.org/10.1121/1.407520). *The Journal of the Acoustical Society of America* 94(4), 1966–1982. ]]]