class: inverse, center, middle
background-image: url(data:image/png;base64,#https://cc.oulu.fi/~scoats/oululogoRedTransparent.png)
background-repeat: no-repeat
background-size: 80px 57px
background-position: right top
exclude: true

---
class: title-slide

<br><br><br><br><br>
.pull-right[
<span style="font-family:Rubik;font-size:24pt;font-weight: 700;font-style: normal;float:right;text-align: right;color:white;-webkit-text-fill-color: black;-webkit-text-stroke: 0.8px;">Regional Variation in Monophthongs in Australian and New Zealand Englishes:<br>A Big Data Approach</span>
]

<br><br><br><br>
<p style="float:right;text-align: right;color:white;font-weight: 700;font-style: normal;-webkit-text-fill-color: black;-webkit-text-stroke: 0.5px;">
Steven Coats<br>
University of Oulu, Finland<br>
<a href="mailto:steven.coats@oulu.fi">steven.coats@oulu.fi</a><br>
10th BICLCE Conference, Alicante<br>
September 27th, 2024<br>
</p>

---
layout: true

<div class="my-header"><img border="0" alt="University of Oulu logo" src="https://cc.oulu.fi/~scoats/oululogonewEng.png" width="80" height="80"></div>
<div class="my-footer"><span>Steven Coats                                 AUS and NZ Vowels | BICLCE</span></div>

---
## Outline

1. Background: Regional variation in Australian and New Zealand Englishes
2. Data: CoANZSE
3. Method: Scripting pipeline for collection of data, alignment, formant extraction, and spatial analysis
4. Preliminary results
5.
Caveats, summary

.footnote[Slides for the presentation are on my homepage at https://cc.oulu.fi/~scoats]

---
### Background

Traditional view: Regional variation is limited in AUS and NZ.

"Australia is, generally speaking, linguistically unified" <span class="small">(Mitchell & Delbridge 1965: 13)</span>.

Australia
- Lexical items, e.g. *potato cake/scallop/fritter* <span class="small">(Bryant 1989)</span>
- Realization of words like *dance* ([dæns] or [dans]) can differ: /a/ is more common in Adelaide than in Melbourne, Sydney, and Brisbane <span class="small">(Horvath & Horvath 2001)</span>

New Zealand
- Lexical divergence of children's playground vocabulary items <span class="small">(Bauer & Bauer 2002)</span>
- Rhoticity in the South Island <span class="small">(Kennedy 2006; Marsden 2013)</span>

Recently: Some regional phonetic variation exists in Australia
- Analysis of 5,722 vowel tokens in hVd words by 109 younger speakers from Melbourne, Sydney, Adelaide, and Perth <span class="small">(Cox & Palethorpe 2019)</span>
- Some evidence for a distinctive realization of the GOAT diphthong in Adelaide; mixed results for other vowels

**This study: Investigation of regional variation based on audio and transcripts from YouTube videos indexed in the Corpus of Australian and New Zealand Spoken English**

---
### CoANZSE <span class="small">(Coats 2022, 2024)</span>

- ASR transcripts from YouTube channels of regional and local councils
- Many recordings are meetings: advantages in terms of representativeness and comparability
- Speaker place of residence is known (cf.
corpora in which videos are collected based on place-name search alone)
- Topical contents and communicative contexts are comparable
- Content is either in the public domain (US) or can be used under "fair use" or "fair dealing" provisions of copyright law (e.g. the Australian Copyright Act 1968)

---
exclude: true
### YouTube captions files

- Videos can have multiple captions files: user-uploaded captions, auto-generated captions created using automatic speech recognition (ASR), both, or neither
- User-uploaded captions can be manually created or generated automatically by third-party ASR software
- Auto-generated captions are generated by YouTube's speech-to-text service
- CoANZSE, CoNASE, CoBISE: target YouTube ASR captions

---
### Example video

<iframe width="560" height="315" src="https://www.youtube.com/embed/cn8vWlUae7Y?rel=0&&showinfo=0&cc_load_policy=1&cc_lang_pref=en" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

---
### WebVTT file

![](data:image/png;base64,#./Maranoa_webvtt_example.png)

---
exclude: true
### Transcript data collection and processing

- Identification of relevant channels (lists of councils with web pages → scrape pages for links to YouTube)
- Inspection of returned channels to remove false positives
- Retrieval of ASR transcripts using [yt-dlp](https://github.com/yt-dlp/yt-dlp)
- Geocoding: String containing council name + address + country sent to Google's geocoding service
- PoS tagging with spaCy <span class="small">(Honnibal et al.
2019)</span>

---
### Data format

<div>
<table border="1" class="dataframe" style="font-size:8pt;border-collapse: collapse;">
<thead>
<tr style="text-align: left;">
<th></th> <th>country</th> <th>state</th> <th>name</th> <th>channel_name</th> <th>channel_url</th> <th>video_title</th> <th>video_id</th> <th>upload_date</th> <th>video_length</th> <th>text_pos</th> <th>location</th> <th>latlong</th> <th>nr_words</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th> <td>AUS</td> <td>NSW</td> <td>Wollondilly Shire Council</td> <td>Wollondilly Shire</td> <td>https://www.youtube.com/c/wollondillyshire</td> <td>Road Resurfacing Video</td> <td>zVr6S5XkJ28</td> <td>20181127</td> <td>146.120</td> <td>g_NNP_2.75 'day_XX_2.75 my_PRP$_3.75 name_NN_4.53 is_VBZ_4.74 ...</td> <td>62/64 Menangle St, Picton NSW 2571, Australia</td> <td>(-34.1700078, 150.612913)</td> <td>433</td>
</tr>
<tr>
<th>1</th> <td>AUS</td> <td>NSW</td> <td>Wollondilly Shire Council</td> <td>Wollondilly Shire</td> <td>https://www.youtube.com/c/wollondillyshire</td> <td>Weather update 5pm 1 March 2022 - Mayor Matt Gould</td> <td>p4MjirCc1oU</td> <td>20220301</td> <td>181.959</td> <td>hi_UH_0.64 guys_NNS_0.96 i_PRP_1.439 'm_VBP_1.439 just_RB_1.76 ...</td> <td>62/64 Menangle St, Picton NSW 2571, Australia</td> <td>(-34.1700078, 150.612913)</td> <td>620</td>
</tr>
<tr>
<th>2</th> <td>AUS</td> <td>NSW</td> <td>Wollondilly Shire Council</td> <td>Wollondilly Shire</td> <td>https://www.youtube.com/c/wollondillyshire</td> <td>Transport Capital Works Video</td> <td>DXlkVTcmeho</td> <td>20180417</td> <td>140.450</td> <td>council_NNP_0.53 is_VBZ_1.53 placing_VBG_1.65 is_VBZ_2.07 2018-19_CD_2.57 ...</td>
<td>62/64 Menangle St, Picton NSW 2571, Australia</td> <td>(-34.1700078, 150.612913)</td> <td>347</td>
</tr>
<tr>
<th>3</th> <td>AUS</td> <td>NSW</td> <td>Wollondilly Shire Council</td> <td>Wollondilly Shire</td> <td>https://www.youtube.com/c/wollondillyshire</td> <td>Council Meeting Wrap Up February 2022</td> <td>2NhuhF2fBu8</td> <td>20220224</td> <td>107.840</td> <td>g_NNP_0.399 'day_NNP_0.399 guys_NNS_0.799 and_CC_1.12 welcome_JJ_1.199 ...</td> <td>62/64 Menangle St, Picton NSW 2571, Australia</td> <td>(-34.1700078, 150.612913)</td> <td>341</td>
</tr>
<tr>
<th>4</th> <td>AUS</td> <td>NSW</td> <td>Wollondilly Shire Council</td> <td>Wollondilly Shire</td> <td>https://www.youtube.com/c/wollondillyshire</td> <td>CITY DEAL 4 March 2018</td> <td>4-cv69ZcwVs</td> <td>20180305</td> <td>130.159</td> <td>[Music]_XX_0.85 it_PRP_2.27 's_VBZ_2.27 a_DT_3.27 fantastic_JJ_3.36 ...</td> <td>62/64 Menangle St, Picton NSW 2571, Australia</td> <td>(-34.1700078, 150.612913)</td> <td>420</td>
</tr>
</tbody>
</table>
</div>

---
### CoANZSE Audio: https://coanzse.org

.pull-left30[
- Transcripts cut into 20-word chunks
- Audio segments retrieved based on word timing tags
- Searchable online CoANZSE data, including audio and forced alignment files
- Powered by BlackLab <span class="small">(De Does et al.
2017)</span>, developed at the Dutch Language Institute
- "Under the hood": Apache Lucene
- Accessible via Shibboleth authentication
]

.pull-right30[
![:scale 70%](data:image/png;base64,#CoANZSE_Audio_landing.png)
]

---
### CoANZSE size by country/state/territory

.small[
Location |nr_channels|nr_videos |nr_words|video_length (h) |nr_audio_files
----------------------------|---|-------|-----------|--------------------------|---------------
Australian Capital Territory| 8 |650 |915,542 |111.79 |41,752
New South Wales |114|9,741 |27,580,773 |3,428.87 |1,299,949
Northern Territory |11 | 289 |315,300 |48.72 |6,628
Queensland |58 |7,356 |19,988,051 |2,642.75 |950,084
South Australia |50 |3,537 |13,856,275 |1,716.72 |643,866
Tasmania |21 |1,260 |5,086,867 |636.99 |240,453
Victoria |78 |12,138 |35,304,943 |4,205.40 |1,624,830
Western Australia |68 |3,815 |8,422,484 |1,063.78 |386,898
 | | | | |
New Zealand |74 |18,029 |84,058,661 |10,175.80 |3,926,216
 | | | | |
**Total** |**482**|**56,815**|**195,528,896**|**24,030.82** |**9,122,676**
]

---
exclude: true
### CoANZSE channel locations

.small[Circle size corresponds to channel size in number of words]

<div class="container">
<iframe src="https://cc.oulu.fi/~scoats/anz_dot2.html" style="width: 100%; height: 450px; max-width: 100%;" sandbox="allow-same-origin allow-scripts" scrolling="yes" seamless="seamless" frameborder="0" align="middle"></iframe>
</div>

---
exclude: true
### Pipeline: [yt-dlp](https://github.com/yt-dlp/yt-dlp)

.pull-left30[
- A fork of youtube-dl
- Can be used to access any content streamed with the DASH or HLS protocols
- Used to retrieve transcripts (2022) and audio (2023)
]

.pull-right70[
![](data:image/png;base64,#./yt-dlp_screenshot.png)
]

---
#### Data workflow

Cloud storage .tar.gz files → temporary local directory → MFA → formant extraction

.pull-left[
![](data:image/png;base64,#Mitcham_data.png)
]

.pull-right[
![](data:image/png;base64,#Mitcham_example_video.png)
]

---
### Pipeline: Montreal Forced Aligner <span class="small">(McAuliffe et al. 2017)</span>

.pull-left30[
- For each location: all audio and transcripts (i.e. 20-word chunks) sent to MFA
- `English_MFA_Acoustic_Model_v3.0.0`, trained on Common Voice English v8.0 <span class="small">(Ardila et al. 2020)</span>, `English_MFA.dict`
- "Adapt" functionality in MFA: Gaussian Mixture Model means adjusted on the basis of the audio for each location
]

.pull-right70[
![](data:image/png;base64,#mfa_screenshot.png)
]

---
### Pipeline: Parselmouth-Praat <span class="small">(Jadoul et al. 2018)</span>

- Python interface to Praat, widely used software for acoustic analysis <span class="small">(Boersma & Weenink 2024)</span>
- Integration into Python simplifies workflows and analysis
- Settings: automatic time step, five formants, maximum formant frequency of 5,500 Hz, window length 0.025 seconds, pre-emphasis from 50 Hz
- Extraction of F1 and F2 formants and bandwidths at the vowel midpoint

Filtering and outliers:
- Filtering: stressed vowels in content words not followed by nasals or liquids
- Removal of stopwords with NLTK
- Word stress determined on the basis of values from the CMU Pronouncing Dictionary <span class="small">(Weide et al.
1998)</span>
- Outliers removed with Mahalanobis distance on the basis of the critical value of the 95% quantile of the `\(\chi^2\)` distribution

---
### Pipeline: Spatial analysis

Tobler's first law: "everything is related to everything else, but near things are more related than distant things" <span class="small">(Tobler 1970)</span>

- Moran's *I* and Getis-Ord *G*<span class='supsub'><sup>*</sup><sub>i</sub></span> based on mean F1 and F2 values at each location
- Spatial weights matrix `\(W\)` based on a distance band, with `\(w_{ij}=1/d_{ij}\)` within the band
- Minimum distance: all locations must have at least one neighbor
- For each vowel and formant, only locations with at least 200 tokens considered
- Spatial analysis conducted for AUS and NZ separately
- Calculated with the esda and PySAL Python packages

Moran's *I* <span class="small">(Moran 1950)</span>: Takes into account attribute values at all locations in a dataset and summarizes the overall extent of spatial autocorrelation
- 1: perfect clustering of similar values; 0: random spatial distribution of values; -1: perfectly even dispersion of values

Getis-Ord *G*<span class='supsub'><sup>*</sup><sub>i</sub></span> <span class="small">(Getis & Ord, 1992; Ord & Getis, 1995)</span>: Identifies spatial clusters by evaluating the values at each location and its neighbors in relation to the global dataset
- Positive: value for the location and its neighbors > global mean; 0: value = global mean; negative: value < global mean

---
### Number of vowel tokens analyzed

Vowel |AUS |NZ
---------|----------|--------------------
/i/ |59,950 |51,189
/ɪ/ |861,190 |609,445
/ɛ/ |1,341,498 |707,105
/a/ (= /æ/)|1,136,901 |711,761
 | |
**Total**|**3,399,539** |**2,079,500**

---
### Results: Australia

<style>
/* Style the tab */
.tab { overflow: hidden; background-color: #f1f1f1; }
/* Style the buttons inside the tab */
.tab button { background-color: inherit; float: left; border: none; outline: none; cursor: pointer; padding:
14px 16px; transition: 0.3s; }
/* Change background color of buttons on hover */
.tab button:hover { background-color: #ddd; }
/* Create an active/current tab link class */
.tab button.active { background-color: #ccc; }
/* Style the tab content */
.tabcontent1, .tabcontent2 { display: none; padding: 6px 12px; border-top: none; }
</style>

<div class="tab">
<button class="tablinks1" onclick="openMap1(event, 'i f1')" id="defaultOpen1">i f1</button>
<button class="tablinks1" onclick="openMap1(event, 'i f2')">i f2</button>
<button class="tablinks1" onclick="openMap1(event, 'ɪ f1')">ɪ f1</button>
<button class="tablinks1" onclick="openMap1(event, 'ɪ f2')">ɪ f2</button>
<button class="tablinks1" onclick="openMap1(event, 'ɛ f1')">ɛ f1</button>
<button class="tablinks1" onclick="openMap1(event, 'ɛ f2')">ɛ f2</button>
<button class="tablinks1" onclick="openMap1(event, 'a f1')">a f1</button>
<button class="tablinks1" onclick="openMap1(event, 'a f2')">a f2</button>
</div>

<div id="i f1" class="tabcontent1">
<iframe src="https://cc.oulu.fi/~scoats/AU_IY_f1.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="i f2" class="tabcontent1">
<iframe src="https://cc.oulu.fi/~scoats/AU_IY_f2.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="ɪ f1" class="tabcontent1">
<iframe src="https://cc.oulu.fi/~scoats/AU_IH_f1.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="ɪ f2" class="tabcontent1">
<iframe src="https://cc.oulu.fi/~scoats/AU_IH_f2.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="ɛ f1" class="tabcontent1">
<iframe src="https://cc.oulu.fi/~scoats/AU_EH_f1.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="ɛ f2" class="tabcontent1">
<iframe src="https://cc.oulu.fi/~scoats/AU_EH_f2.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="a f1" class="tabcontent1">
<iframe
src="https://cc.oulu.fi/~scoats/AU_AE_f1.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="a f2" class="tabcontent1">
<iframe src="https://cc.oulu.fi/~scoats/AU_AE_f2.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>

<script>
function openMap1(evt, mapName) {
  var i, tabcontent1, tablinks1;
  tabcontent1 = document.getElementsByClassName("tabcontent1");
  for (i = 0; i < tabcontent1.length; i++) {
    tabcontent1[i].style.display = "none";
  }
  tablinks1 = document.getElementsByClassName("tablinks1");
  for (i = 0; i < tablinks1.length; i++) {
    tablinks1[i].className = tablinks1[i].className.replace(" active", "");
  }
  document.getElementById(mapName).style.display = "block";
  evt.currentTarget.className += " active";
}
// Open the first tab by default
document.getElementById("defaultOpen1").click();
</script>

---
### Results: New Zealand

<div class="tab">
<button class="tablinks2" onclick="openMap2(event, 'NZ i f1')" id="defaultOpen2">i f1</button>
<button class="tablinks2" onclick="openMap2(event, 'NZ i f2')">i f2</button>
<button class="tablinks2" onclick="openMap2(event, 'NZ ɪ f1')">ɪ f1</button>
<button class="tablinks2" onclick="openMap2(event, 'NZ ɪ f2')">ɪ f2</button>
<button class="tablinks2" onclick="openMap2(event, 'NZ ɛ f1')">ɛ f1</button>
<button class="tablinks2" onclick="openMap2(event, 'NZ ɛ f2')">ɛ f2</button>
<button class="tablinks2" onclick="openMap2(event, 'NZ a f1')">a f1</button>
<button class="tablinks2" onclick="openMap2(event, 'NZ a f2')">a f2</button>
</div>

<div id="NZ i f1" class="tabcontent2">
<iframe src="https://cc.oulu.fi/~scoats/NZ_IY_f1.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="NZ i f2" class="tabcontent2">
<iframe src="https://cc.oulu.fi/~scoats/NZ_IY_f2.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="NZ ɪ f1" class="tabcontent2">
<iframe src="https://cc.oulu.fi/~scoats/NZ_IH_f1.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="NZ ɪ f2" class="tabcontent2">
<iframe src="https://cc.oulu.fi/~scoats/NZ_IH_f2.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="NZ ɛ f1" class="tabcontent2">
<iframe src="https://cc.oulu.fi/~scoats/NZ_EH_f1.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="NZ ɛ f2" class="tabcontent2">
<iframe src="https://cc.oulu.fi/~scoats/NZ_EH_f2.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="NZ a f1" class="tabcontent2">
<iframe src="https://cc.oulu.fi/~scoats/NZ_AE_f1.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>
<div id="NZ a f2" class="tabcontent2">
<iframe src="https://cc.oulu.fi/~scoats/NZ_AE_f2.html" style="width: 100%; height: 450px;" frameborder="0"></iframe>
</div>

<script>
function openMap2(evt, mapName) {
  var i, tabcontent2, tablinks2;
  tabcontent2 = document.getElementsByClassName("tabcontent2");
  for (i = 0; i < tabcontent2.length; i++) {
    tabcontent2[i].style.display = "none";
  }
  tablinks2 = document.getElementsByClassName("tablinks2");
  for (i = 0; i < tablinks2.length; i++) {
    tablinks2[i].className = tablinks2[i].className.replace(" active", "");
  }
  document.getElementById(mapName).style.display = "block";
  evt.currentTarget.className += " active";
}
// Open the first tab by default
document.getElementById("defaultOpen2").click();
</script>

---
### Preliminary findings

- Some evidence for overall/global spatial clustering of F1 and F2 values in Australia and New Zealand
- The magnitude of local spatial autocorrelation values is not high
- Front vowels in Australia are lowering and fronting <span class="small">(cf. Cox & Palethorpe 2008; Cox et al. 2024)</span>
- This may be led by changes in the major urban centers of Sydney, Melbourne, Adelaide, and Perth
- Possibly influenced by the L2 English of immigrants <span class="small">(Cox et al. 2024; Gonzalez et al. 2021; Travis et al.
2023)</span>
- Possible confirmation of these findings in a large-scale dataset

---
### Caveats

- Formant extraction method: static midpoint measures (cf. dynamic formant tracking)
- No demographic data, but some can be semi-automatically annotated <span class="small">(cf. Bredin 2023; Plaquet & Bredin 2023; Ferreira 2024)</span>
- MFA pronunciation dictionary phones differ slightly from Australian English symbols
  - Especially low and back vowels <span class="small">(but: Gonzalez et al. 2020; MacKenzie & Turton 2020)</span>

---
### Summary

- Pipeline approaches can be used to automatically process large data volumes and extract formants for millions of vowels
- Incipient regional variation attested for Australia, and (somewhat) for New Zealand
- Australian cities may be leading the changes
- Still much to be done!

---
#### References

.verysmall[
.hangingindent[
.pull-left[
Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F., and Weber, G. (2020). [Common Voice: A massively-multilingual speech corpus](https://arxiv.org/abs/1912.06670). *arXiv*:1912.06670 [cs.CL].

Bauer, L. and Bauer, W. (2002). [Can we watch regional dialects developing in colonial English?: The case of New Zealand](https://doi.org/10.1075/eww.23.2.02bau). *English World-Wide* 23(2), 169-193.

Boersma, P. and Weenink, D. (2024). [Praat: Doing phonetics by computer](https://www.praat.org/).

Bredin, H. (2023). [pyannote.audio 2.1 speaker diarization pipeline: Principle, benchmark, and recipe](https://www.isca-archive.org/interspeech_2023/bredin23_interspeech.html). In *INTERSPEECH 2023*, 1983-1987.

Bryant, P. (1989). [The south-east lexical usage region of Australian English](https://doi.org/10.1080/07268608908599413). *Australian Journal of Linguistics* 9, 85–134.

Coats, S. (2022).
CoANZSE: [The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts](https://aclanthology.org/2022.alta-1.1/). In P. Parameswaran, J. Biggs & D. Powers (Eds.), *Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association*, 1–5.

Coats, S. (2024). [CoANZSE Audio: Creation of an online corpus for linguistic and phonetic analysis of Australian and New Zealand Englishes](https://aclanthology.org/2024.lrec-main.302). In *Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)*, 3407-3412.

Cox, F. and Palethorpe, S. (2008). [Reversal of short front vowel raising in Australian English](https://doi.org/10.21437/Interspeech.2008-144). *Interspeech 2008*, 342-345.

Cox, F. and Palethorpe, S. (2019). [Vowel variation in a standard context across four major Australian cities](https://assta.org/proceedings/ICPhS2019/papers/ICPhS_626.pdf). *Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019*, 577-581.

Cox, F., Penney, J., and Palethorpe, S. (2024). [Australian English monophthong change across 50 years: Static versus dynamic measures](https://doi.org/10.3390/languages9030099). *Languages* 9(3).

Ferreira, A. I. S. (2024). [wav2vec2-large-xlsr-53-gender-recognition-librispeech](https://huggingface.co/alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech).

Getis, A. and Ord, J. K. (1992). [The analysis of spatial association by use of distance statistics](https://doi.org/10.1111/j.1538-4632.1992.tb00261.x). *Geographical Analysis* 24, 189-206.

Gonzalez, S., Grama, J., and Travis, C. E. (2020). [Comparing the performance of forced aligners used in sociophonetic research](https://doi.org/10.1515/lingvan-2019-0058). *Linguistics Vanguard* 6, 20190058.

Grama, J., Travis, C. E., and Gonzalez, S. (2021).
Ethnic variation in real time: Change in Australian English diphthongs. In H. Van de Velde, N. H. Hilton, and R. Knooihuizen (eds.), *Language Variation – European Perspectives VIII*, 291-314. John Benjamins.
]

.pull-right[
Horvath, B. M. and Horvath, R. J. (2001). Short A in Australian English: A geolinguistic study. In D. Blair and P. Collins (eds.), *English in Australia*, 341-356. John Benjamins.

Jadoul, Y., Thompson, B., and de Boer, B. (2018). [Introducing Parselmouth: A Python interface to Praat](https://doi.org/10.1016/j.wocn.2018.07.001). *Journal of Phonetics* 71, 1-15.

Kennedy, M. (2006). *Variation in the pronunciation of English by New Zealand school children*. Master’s thesis, Victoria University of Wellington.

MacKenzie, L. and Turton, D. (2020). [Assessing the accuracy of existing forced alignment software on varieties of British English](https://doi.org/10.1515/lingvan-2018-0061). *Linguistics Vanguard* 6.

Marsden, S. (2013). *Phonological variation and the construction of regional identities in New Zealand English*. Doctoral thesis, Victoria University of Wellington.

McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., and Sonderegger, M. (2017). [Montreal Forced Aligner: Trainable text-speech alignment using Kaldi](https://doi.org/10.21437/Interspeech.2017-1386). In *Proceedings of the 18th Conference of the International Speech Communication Association*, 498-502.

Mitchell, A. G. and Delbridge, A. (1965). *The speech of Australian adolescents: A survey*. Angus and Robertson.

Moran, P. A. P. (1950). [Notes on continuous stochastic phenomena](http://www.jstor.org/stable/2332142). *Biometrika* 37, 17-23.

Ord, J. K. and Getis, A. (1995). [Local spatial autocorrelation statistics: Distributional issues and an application](https://doi.org/10.1111/j.1538-4632.1995.tb00912.x). *Geographical Analysis* 27, 286-306.

Plaquet, A. and Bredin, H. (2023).
[Powerset multi-class cross entropy loss for neural speaker diarization](https://doi.org/10.21437/Interspeech.2023-205). In *INTERSPEECH 2023*, 3222-3226.

Tobler, W. R. (1970). [A computer movie simulating urban growth in the Detroit region](http://www.jstor.org/stable/143141). *Economic Geography* 46, 234-240.

Travis, C. E., Grama, J., Gonzalez, S., Purser, B., and Johnstone, C. (2023). [Sydney Speaks Corpus](https://doi.org/10.25911/m03c-yz22).

Weide, R. et al. (1998). [The Carnegie Mellon Pronouncing Dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict).
]
]
]
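
---
#### Appendix: Sketch of the global Moran's *I* computation

The global Moran's *I* used in the spatial analysis can be sketched in a few lines of NumPy. This is a minimal reimplementation for illustration only, not the esda/pysal code used for the study; the function names, toy coordinates, and toy mean-F1 values below are hypothetical.

```python
import numpy as np

def inverse_distance_weights(coords, band):
    """Spatial weights matrix W: w_ij = 1/d_ij for pairs within the
    distance band, 0 otherwise (self-pairs on the diagonal excluded)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(d)
    mask = (d > 0) & (d <= band)
    w[mask] = 1.0 / d[mask]
    return w

def morans_i(y, w):
    """Global Moran's I = (n / S0) * (z'Wz) / (z'z), with z = y - mean(y)
    and S0 the sum of all weights."""
    z = np.asarray(y, dtype=float)
    z = z - z.mean()
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)

# Four locations one unit apart on a line; a band of 1.5 links adjacent pairs only
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
w = inverse_distance_weights(coords, band=1.5)

print(morans_i([500.0, 500.0, 650.0, 650.0], w))  # clustered mean F1 values -> positive I (1/3)
print(morans_i([500.0, 650.0, 500.0, 650.0], w))  # alternating values -> negative I (-1.0)
```

Clustered low/high values yield a positive *I* and perfectly alternating values a negative *I*, matching the interpretation given on the spatial analysis slide (1: clustering, 0: random, -1: even dispersion).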