Corpus of North American Spoken English (CoNASE): YouTube caption files, data collection, geolocation, data filtering
Double modals in North America
Methods: Regex, manual inspection/annotation
Results: Inventories, maps, interpretation
Caveats, summary, future outlook
1,252,066,371 words, 2,572 channels, 302k videos, 154k hours of video
index | country | state | channel_title | video_title | video_id | name | type | channel_id | channel_username | video_length | location | address | nr_words | text_pos | latlong |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | USA | Alabama | Baldwin County Alabama | Loxley School | KAFtEkKE_ik | COUNTY OF BALDWIN | general purpose | UCVEowdBqDvlT_TJDjf8YuyA | BCCommission | 1651.38 | Baldwin County, AL, USA | Baldwin County, AL, USA | 3513 | the_DT_25.619 Baldwin_NNP_26.619 County_NNP_27... | (30.6010744, -87.77633329999999) |
1 | USA | Alabama | Baldwin County Alabama | 2019 Baldwin County Sewer Utilities Informatio... | 6KvmxXRoMGA | COUNTY OF BALDWIN | general purpose | UCVEowdBqDvlT_TJDjf8YuyA | BCCommission | 6110.09 | Baldwin County, AL, USA | Baldwin County, AL, USA | 16090 | welcome_UH_1.37 everybody_NN_2.37 glad_JJ_3.89... | (30.6010744, -87.77633329999999) |
2 | USA | Alabama | Baldwin County Alabama | Heritage Museum of Baldwin County Alabama | bGL_FlSulZ4 | COUNTY OF BALDWIN | general purpose | UCVEowdBqDvlT_TJDjf8YuyA | BCCommission | 1255.31 | Baldwin County, AL, USA | Baldwin County, AL, USA | 2318 | located_VBN_22.039 in_IN_23.039 the_DT_23.13 t... | (30.6010744, -87.77633329999999) |
3 | USA | Alabama | Baldwin County Alabama | AL State Veterans Memorial Cemetery at Spanish... | BX18sxhdL5Y | COUNTY OF BALDWIN | general purpose | UCVEowdBqDvlT_TJDjf8YuyA | BCCommission | 4376.31 | Baldwin County, AL, USA | Baldwin County, AL, USA | 7991 | good_JJ_15.94 morning_NN_16.94 everyone_NN_17.... | (30.6010744, -87.77633329999999) |
4 | USA | Alabama | Baldwin County Alabama | USO Canteen Dance Re-enactment | VqCvFSku9Es | COUNTY OF BALDWIN | general purpose | UCVEowdBqDvlT_TJDjf8YuyA | BCCommission | 3893.26 | Baldwin County, AL, USA | Baldwin County, AL, USA | 6846 | [Music]_XX_5.1 [Music]_XX_14.86 [Music]_XX_104... | (30.6010744, -87.77633329999999) |
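The `text_pos` column stores each transcript as whitespace-separated tokens of the form `word_TAG_time`. A minimal sketch of how such a string could be unpacked is shown here; the names `TimedToken` and `parse_text_pos` are hypothetical, and reading the third field as a time offset in seconds is an assumption based on the sample rows.

```python
from typing import List, NamedTuple


class TimedToken(NamedTuple):
    word: str     # surface form, e.g. "welcome" or "[Music]"
    tag: str      # part-of-speech tag, e.g. "UH"
    start: float  # assumed: time offset of the token in seconds


def parse_text_pos(text_pos: str) -> List[TimedToken]:
    """Split a text_pos string into (word, tag, time) triples."""
    tokens = []
    for item in text_pos.split():
        # rsplit from the right so a word containing "_" stays intact
        word, tag, start = item.rsplit("_", 2)
        tokens.append(TimedToken(word, tag, float(start)))
    return tokens


# Complete tokens taken from the second sample row above
sample = "welcome_UH_1.37 everybody_NN_2.37 glad_JJ_3.89"
tokens = parse_text_pos(sample)
print([t.word for t in tokens])
```

Splitting from the right means that a word form containing an underscore does not break the parse.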
Non-standard spoken-language feature of the Southern United States (also found in Scotland, Northern Ireland, and Caribbean Englishes/creoles)
Studied in the context of dialectology and sociolinguistics (Feagin 1979; Labov 1972; Mishoe & Montgomery 1994; Montgomery 1998), syntax (Battistella 1995; Di Paolo 1989; Hasty 2012a, 2012b, 2014; Nagle 2003), language history (Fennell & Butters 1996; Montgomery & Nagle 1993), English varietal typology (Zullo et al. 2021), and from other perspectives
For most studies, data is heterogeneous and limited in geographical scope
Heterogeneity of collected data: "it is doubtful that atlas data can more than give us an outline of the prevalence and distribution of MMs" (Montgomery 1998: 103)
"Double modals occur with very low frequency in real-life utterances, and they are quite difficult to elicit in sufficient quantities and in a reliable fashion" (Fennell & Butters 1996: 265)
(cf. Grieve et al. 2014)
Which double or multiple modals are used in naturalistic speech? Where?
Additional considerations
13 modals/semi-modals:
can, could, may, might, must, shall, should, will, would, ought to, oughta, 'll, used to
= 156 possible double modals (excluding repetitions)
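The figure of 156 is the number of ordered pairs of distinct items from the 13-item list (13 × 12). A short sketch that enumerates them; the variable names are illustrative only.

```python
from itertools import permutations

MODALS = [
    "can", "could", "may", "might", "must", "shall", "should",
    "will", "would", "ought to", "oughta", "'ll", "used to",
]

# Ordered pairs of two different modals; ("might", "might") is excluded
pairs = list(permutations(MODALS, 2))
print(len(pairs))
```

Order matters here: `("might", "could")` and `("could", "might")` count as separate candidates.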
Regex for timed-token corpus:
"\\s+("+x[0]+"n?_\\w+_\\S+\\s+(?:i?_\\w+_\\S+\\s+|we?_\\w+_\\S+\\s+|you?_\\w+_\\S+\\s+|he?_\\w+_\\S+\\s+|she?_\\w+_\\S+\\s+|it?_\\w+_\\S+\\s+|they?_\\w+_\\S+\\s+|haven?_\\w+_\\S+\\s+|'ve_\\w+_\\S+\\s+|'t_\\w+_\\S+\\s+|not_\\w+_\\S+\\s+){0,3}"+x[1]+"n?_\\w+_\\S+\\s+(?:haven?_\\w+_\\S+\\s+|'ve_\\w+_\\S+ |not_\\w+_\\S+\\s+|'t_\\w+_\\S+\\s+)?)"
This matches two-modal sequences
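The same idea can be sketched in a more readable form. The version below is a simplified approximation, not the exact expression above: the list of permitted intervening words is condensed, and the helper names `two_modal_pattern` and `find_hits` are hypothetical.

```python
import re

# "_TAG_TIME" plus trailing whitespace, shared by every token in the pattern
TAIL = r"_\w+_\S+\s+"

# Up to three intervening tokens: pronouns, "have", or negation (simplified list)
BETWEEN = r"(?:(?:i|we|you|he|she|it|they|have|'ve|n't|'t|not)" + TAIL + r"){0,3}"


def two_modal_pattern(first, second):
    """Compile a pattern for `first` followed by `second` in a timed-token string."""
    return re.compile(
        r"\s+(" + re.escape(first) + r"n?" + TAIL + BETWEEN
        + re.escape(second) + r"n?" + TAIL + r")",
        re.IGNORECASE,
    )


def find_hits(text_pos, first, second):
    """Return each matched token sequence as a list of bare word forms."""
    padded = " " + text_pos + " "  # the pattern expects whitespace on both sides
    hits = []
    for match in two_modal_pattern(first, second).finditer(padded):
        hits.append([tok.rsplit("_", 2)[0] for tok in match.group(1).split()])
    return hits


sample = ("we_PRP_10.2 might_MD_10.5 could_MD_10.8 do_VB_11.0 it_PRP_11.2 "
          "might_MD_12.0 not_RB_12.2 could_MD_12.5")
print(find_hits(sample, "might", "could"))
```

The optional `n?` after each modal allows for negated forms whose `'t` is split off as a separate token. A hit from this kind of pattern is only a candidate, so each match still needs manual inspection.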
Many two-modal sequences are not actually "true" double modals
index | group | total | checked | % checked | true | self-repair | overlap | homonym/phone | asr fp | video error | audio problem |
---|---|---|---|---|---|---|---|---|---|---|---|
40 | oughta could | 1 | 1 | 1.000000 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
32 | might could | 393 | 346 | 0.880407 | 0.72 | 0.10 | 0.10 | 0.00 | 0.17 | 0.00 | 0.01 |
79 | would oughta | 5 | 3 | 0.600000 | 0.67 | 0.00 | 0.00 | 0.00 | 0.33 | 0.00 | 0.33 |
23 | might can | 426 | 355 | 0.833333 | 0.66 | 0.06 | 0.08 | 0.00 | 0.29 | 0.01 | 0.01 |
31 | might should | 88 | 15 | 0.170455 | 0.60 | 0.20 | 0.07 | 0.00 | 0.13 | 0.00 | 0.00 |
65 | 'll might | 89 | 9 | 0.101124 | 0.56 | 0.22 | 0.00 | 0.00 | 0.33 | 0.00 | 0.00 |
77 | would might | 744 | 66 | 0.088710 | 0.52 | 0.44 | 0.05 | 0.03 | 0.12 | 0.00 | 0.03 |
102 | should oughta | 3 | 2 | 0.666667 | 0.50 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.00 |
39 | oughta should | 2 | 2 | 1.000000 | 0.50 | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 |
38 | oughta would | 3 | 2 | 0.666667 | 0.50 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.50 |
100 | should might | 61 | 4 | 0.065574 | 0.50 | 0.25 | 0.25 | 0.00 | 0.25 | 0.00 | 0.00 |
73 | 'll could | 58 | 2 | 0.034483 | 0.50 | 0.00 | 0.00 | 0.00 | 0.50 | 0.00 | 0.00 |
103 | should must | 18 | 2 | 0.111111 | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
8 | may should | 105 | 62 | 0.590476 | 0.45 | 0.15 | 0.18 | 0.05 | 0.23 | 0.00 | 0.03 |
29 | might would | 434 | 36 | 0.082949 | 0.42 | 0.14 | 0.28 | 0.00 | 0.22 | 0.00 | 0.00 |
12 | can might | 219 | 13 | 0.059361 | 0.38 | 0.23 | 0.23 | 0.00 | 0.38 | 0.00 | 0.00 |
75 | would may | 360 | 11 | 0.030556 | 0.36 | 0.45 | 0.09 | 0.00 | 0.45 | 0.00 | 0.00 |
34 | oughta can | 3 | 3 | 1.000000 | 0.33 | 0.00 | 0.00 | 0.33 | 0.33 | 0.00 | 0.00 |
98 | should may | 77 | 3 | 0.038961 | 0.33 | 0.67 | 0.33 | 0.00 | 0.33 | 0.00 | 0.00 |
50 | must could | 4 | 3 | 0.750000 | 0.33 | 0.33 | 0.00 | 0.00 | 0.67 | 0.00 | 0.00 |
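The columns of the annotation table relate to each other as follows: "% checked" is checked / total, and the "true" column is the share of checked hits judged to be genuine double modals. The sketch below recomputes the first of these and extrapolates a rough count of true double modals among all regex hits. The extrapolation is an assumption added for illustration; it holds only if the checked hits are representative of the unchecked ones, and `summarize` is a hypothetical helper.

```python
# (group, total regex hits, hits manually checked, share of checked hits judged true)
rows = [
    ("might could", 393, 346, 0.72),
    ("might can", 426, 355, 0.66),
    ("would might", 744, 66, 0.52),
]


def summarize(group, total, checked, true_share):
    return {
        "group": group,
        # corresponds to the "% checked" column
        "checked_share": checked / total,
        # assumption: unchecked hits behave like the checked sample
        "estimated_true": round(total * true_share),
    }


summary = [summarize(*row) for row in rows]
for entry in summary:
    print(entry)
```

For types with a small checked share, such as would might, an estimate of this kind is correspondingly uncertain.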
group | state | dm |
---|---|---|
might could | Tennessee | 48 |
North Carolina | 38 | |
Florida | 34 | |
Texas | 24 | |
Alabama | 21 | |
Georgia | 9 | |
South Carolina | 9 | |
Arkansas | 7 | |
Kentucky | 7 | |
Virginia | 6 | |
Mississippi | 5 | |
Louisiana | 4 | |
Massachusetts | 4 | |
Utah | 4 | |
California | 3 | |
Colorado | 3 | |
Ontario | 3 | |
Illinois | 2 | |
Michigan | 2 | |
Oklahoma | 2 | |
Wisconsin | 2 | |
Arizona | 1 | |
British Columbia | 1 | |
Connecticut | 1 | |
Indiana | 1 | |
Kansas | 1 | |
Maine | 1 | |
Maryland | 1 | |
Minnesota | 1 | |
Oregon | 1 | |
South Dakota | 1 | |
Washington | 1 | |
might can | North Carolina | 54 |
Tennessee | 35 | |
Georgia | 27 | |
Florida | 21 | |
Alabama | 15 | |
Texas | 12 | |
Maryland | 8 | |
Kentucky | 7 | |
South Carolina | 7 | |
Virginia | 7 | |
California | 6 | |
Mississippi | 4 | |
Arkansas | 3 | |
Connecticut | 3 | |
Illinois | 3 | |
Massachusetts | 3 | |
Michigan | 3 | |
Colorado | 2 | |
Louisiana | 2 | |
Alberta | 1 | |
Kansas | 1 | |
Minnesota | 1 | |
New Mexico | 1 | |
New York | 1 | |
North Dakota | 1 | |
Ohio | 1 | |
Oklahoma | 1 | |
Ontario | 1 | |
Oregon | 1 | |
Pennsylvania | 1 | |
Rhode Island | 1 | |
Wisconsin | 1 | |
might would | North Carolina | 18 |
Florida | 11 | |
Tennessee | 9 | |
Alabama | 7 | |
South Carolina | 7 | |
Michigan | 5 | |
Georgia | 4 | |
Kentucky | 4 | |
Massachusetts | 4 | |
Mississippi | 3 | |
Ohio | 3 | |
South Dakota | 3 | |
Texas | 3 | |
Virginia | 3 | |
Washington | 3 | |
Illinois | 2 | |
Louisiana | 2 | |
Minnesota | 2 | |
Alberta | 1 | |
Arizona | 1 | |
Arkansas | 1 | |
British Columbia | 1 | |
California | 1 | |
Connecticut | 1 | |
Kansas | 1 | |
Missouri | 1 | |
New Jersey | 1 | |
New York | 1 | |
Ontario | 1 | |
Oregon | 1 | |
Pennsylvania | 1 | |
Wyoming | 1 | |
would might | California | 13 |
Arizona | 8 | |
Colorado | 6 | |
Tennessee | 6 | |
Alabama | 3 | |
Florida | 3 | |
Massachusetts | 3 | |
Ontario | 3 | |
Arkansas | 2 | |
Connecticut | 1 | |
Kansas | 1 | |
Michigan | 1 | |
New York | 1 | |
Ohio | 1 | |
Pennsylvania | 1 | |
Virginia | 1 | |
will can | Texas | 5 |
California | 4 | |
North Carolina | 4 | |
Arkansas | 3 | |
Minnesota | 2 | |
Ohio | 2 | |
South Carolina | 2 | |
Alabama | 1 | |
Arizona | 1 | |
Florida | 1 | |
Maryland | 1 | |
Mississippi | 1 | |
Missouri | 1 | |
Nevada | 1 | |
New Hampshire | 1 | |
New Jersey | 1 | |
Oklahoma | 1 | |
Ontario | 1 | |
Tennessee | 1 | |
Utah | 1 | |
Virginia | 1 | |
could might | Florida | 4 |
Massachusetts | 3 | |
California | 2 | |
Colorado | 2 | |
Maryland | 2 | |
Michigan | 2 | |
North Carolina | 2 | |
Ohio | 2 | |
Alabama | 1 | |
Alberta | 1 | |
Arizona | 1 | |
British Columbia | 1 | |
Illinois | 1 | |
Kansas | 1 | |
Maine | 1 | |
Missouri | 1 | |
Nevada | 1 | |
New Jersey | 1 | |
Tennessee | 1 | |
Texas | 1 | |
Wisconsin | 1 | |
may should | California | 4 |
Texas | 4 | |
Florida | 3 | |
Massachusetts | 2 | |
Michigan | 2 | |
Tennessee | 2 | |
Arizona | 1 | |
Arkansas | 1 | |
Colorado | 1 | |
Illinois | 1 | |
Kentucky | 1 | |
Minnesota | 1 | |
Missouri | 1 | |
North Carolina | 1 | |
Rhode Island | 1 | |
Virginia | 1 | |
Washington | 1 | |
may could | Tennessee | 5 |
Georgia | 4 | |
North Carolina | 4 | |
California | 2 | |
Louisiana | 2 | |
Michigan | 2 | |
Arizona | 1 | |
Colorado | 1 | |
Illinois | 1 | |
Indiana | 1 | |
Maryland | 1 | |
Mississippi | 1 | |
Ontario | 1 | |
South Carolina | 1 | |
Texas | 1 | |
may can | Florida | 7 |
Alabama | 5 | |
Georgia | 4 | |
California | 3 | |
Tennessee | 3 | |
Kentucky | 2 | |
Louisiana | 2 | |
Colorado | 1 | |
Virginia | 1 | |
would should | California | 5 |
Florida | 4 | |
Arkansas | 3 | |
Wisconsin | 3 | |
British Columbia | 2 | |
Colorado | 1 | |
Illinois | 1 | |
New York | 1 | |
Ohio | 1 | |
Ontario | 1 | |
Pennsylvania | 1 | |
South Dakota | 1 | |
Tennessee | 1 | |
Texas | 1 | |
might should | Tennessee | 6 |
Florida | 3 | |
Georgia | 3 | |
Arizona | 2 | |
Illinois | 2 | |
North Carolina | 2 | |
Alberta | 1 | |
California | 1 | |
Massachusetts | 1 | |
Pennsylvania | 1 | |
Washington | 1 | |
would could | Arizona | 8 |
California | 4 | |
Arkansas | 2 | |
Connecticut | 1 | |
Tennessee | 1 | |
should might | Michigan | 2 |
Alabama | 1 | |
California | 1 | |
Connecticut | 1 | |
Iowa | 1 | |
Maine | 1 | |
New Mexico | 1 | |
Ohio | 1 | |
Oregon | 1 | |
South Carolina | 1 | |
Tennessee | 1 | |
Wisconsin | 1 | |
will could | Georgia | 3 |
Wisconsin | 2 | |
Alabama | 1 | |
Arizona | 1 | |
California | 1 | |
Florida | 1 | |
Louisiana | 1 | |
New Jersey | 1 | |
Tennessee | 1 | |
'll will | California | 6 |
Arizona | 2 | |
Alabama | 1 | |
Colorado | 1 | |
Kentucky | 1 | |
Ontario | 1 | |
might will | Illinois | 3 |
Alabama | 1 | |
Arkansas | 1 | |
Georgia | 1 | |
Kentucky | 1 | |
Louisiana | 1 | |
Ontario | 1 | |
Wisconsin | 1 | |
may will | North Carolina | 3 |
Arizona | 2 | |
Alberta | 1 | |
New Jersey | 1 | |
Ontario | 1 | |
Pennsylvania | 1 | |
Virginia | 1 | |
would can | Alabama | 1 |
Arizona | 1 | |
Ohio | 1 | |
Ontario | 1 | |
Pennsylvania | 1 | |
Tennessee | 1 | |
will must | Indiana | 1 |
Iowa | 1 | |
North Carolina | 1 | |
Pennsylvania | 1 | |
Texas | 1 | |
must might | California | 2 |
Massachusetts | 1 | |
Minnesota | 1 | |
Pennsylvania | 1 | |
might must | California | 2 |
Michigan | 1 | |
Nebraska | 1 | |
Wisconsin | 1 | |
may ought to | Alabama | 1 |
California | 1 | |
North Carolina | 1 | |
Tennessee | 1 | |
Texas | 1 | |
may might | Arizona | 1 |
Colorado | 1 | |
Kansas | 1 | |
Oregon | 1 | |
Virginia | 1 | |
can might | Arizona | 1 |
Maryland | 1 | |
New York | 1 | |
Tennessee | 1 | |
Texas | 1 | |
'll might | Ontario | 2 |
Alabama | 1 | |
Arizona | 1 | |
California | 1 | |
would may | Alabama | 1 |
Arizona | 1 | |
Florida | 1 | |
Ontario | 1 | |
must may | Massachusetts | 1 |
Pennsylvania | 1 | |
Texas | 1 | |
Virginia | 1 | |
might 'll | New Hampshire | 1 |
New Jersey | 1 | |
Oklahoma | 1 | |
Tennessee | 1 | |
may would | Arizona | 1 |
Connecticut | 1 | |
Florida | 1 | |
Tennessee | 1 | |
can will | California | 2 |
Florida | 1 | |
Ontario | 1 | |
will may | Arizona | 1 |
Ontario | 1 | |
Tennessee | 1 | |
must can | California | 2 |
Tennessee | 1 | |
can may | Tennessee | 2 |
Florida | 1 | |
would oughta | Florida | 1 |
Minnesota | 1 | |
will might | Iowa | 1 |
North Carolina | 1 | |
might may | Ontario | 1 |
Tennessee | 1 | |
may must | Minnesota | 1 |
Ontario | 1 | |
could will | Ohio | 1 |
Ontario | 1 | |
could used to | Massachusetts | 2 |
could ought to | Alaska | 1 |
Oregon | 1 | |
could may | Kentucky | 2 |
can would | California | 1 |
Tennessee | 1 | |
can must | Florida | 1 |
Kentucky | 1 | |
can could | Connecticut | 1 |
Florida | 1 | |
'll may | Alabama | 1 |
Virginia | 1 | |
'll can | California | 1 |
Ontario | 1 | |
would must | Kansas | 1 |
will should | Connecticut | 1 |
should shall | Kansas | 1 |
should oughta | Tennessee | 1 |
should must | British Columbia | 1 |
should may | Florida | 1 |
should can | Indiana | 1 |
shall would | Wisconsin | 1 |
oughta would | Michigan | 1 |
oughta should | Louisiana | 1 |
oughta could | Minnesota | 1 |
oughta can | Idaho | 1 |
must would | Alberta | 1 |
must will | Pennsylvania | 1 |
must should | Illinois | 1 |
must could | South Dakota | 1 |
might oughta | North Carolina | 1 |
could would | Ontario | 1 |
could should | Arizona | 1 |
can should | Kansas | 1 |
'll could | Tennessee | 1 |
Normalized frequencies of all occurrences and of manually verified occurrences of might could, might can, and will can were used to calculate Getis-Ord Gi* values, based on a 100-nearest-neighbor binary weights matrix
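For illustration, a minimal pure-Python sketch of the Gi* statistic with a binary k-nearest-neighbor weights matrix follows. It uses the standard z-score form of Gi* and rests on several assumptions that are not taken from the slides: distances are planar Euclidean rather than geodesic, each location is counted in its own neighborhood in addition to its k neighbors, and the toy coordinates and values are invented. The function names are hypothetical.

```python
import math


def knn_binary_weights(coords, k):
    """Binary weights: 1 for each of the k nearest neighbors and for the point itself."""
    n = len(coords)
    weights = []
    for i, (xi, yi) in enumerate(coords):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        row = [0.0] * n
        for j in others[:k]:
            row[j] = 1.0
        row[i] = 1.0  # the "star" variant includes the location itself
        weights.append(row)
    return weights


def gi_star(values, weights):
    """Getis-Ord Gi* z-scores for each location (requires k + 1 < n)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for row in weights:
        w_sum = sum(row)
        w_sq_sum = sum(w * w for w in row)
        numerator = sum(w * v for w, v in zip(row, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Toy data: two spatial clusters, one with high and one with low values
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
values = [10, 12, 11, 1, 0, 2]
weights = knn_binary_weights(coords, k=2)
scores = gi_star(values, weights)
print([round(score, 2) for score in scores])
```

Positive scores indicate locations surrounded by high values (hot spots), negative scores locations surrounded by low values (cold spots).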
→ "Careful epistemicity" with double forms?
Double and triple modals were examined in CoNASE
Battistella, E. (1995). The syntax of the double modal construction. Linguistica Atlantica 17, 19–44.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. (1999). Longman Grammar of Spoken and Written English. Harlow: Pearson Education.
Butters, R. (1973). Acceptability judgments for double modals in Southern dialects. In Bailey, R. & Shuy, R. (eds.), New ways of analyzing variation in linguistics, 276–286. Washington, DC: Georgetown University Press.
Coates, J. (1983). The semantics of the modal auxiliaries. London and Canberra: Croom Helm.
Coats, S. (2019). A corpus of regional American language from YouTube. In Navarretta, C. et al. (eds.), Proceedings of the 4th Digital Humanities in the Nordic Countries Conference, Copenhagen, Denmark, March 6–8, 2019, 79–91. Aachen, Germany: CEUR.
Coats, S. (2020). Articulation rate in American English in a corpus of YouTube videos. Language and Speech. https://doi.org/10.1177/0023830919894720
Coats, S. (in review). Dialect corpora from YouTube.
Di Paolo, M. (1989). Double modals as single lexical items. American Speech 64(3), 195–224.
Feagin, C. (1979). Variation and change in Alabama English: A sociolinguistic study of the White community. Washington, DC: Georgetown University Press.
Gresset, S. (2003). Towards a contextual micro-analysis of the non-equivalence of might and could. In Facchinetti et al. (eds.), 81–102.
Grieve, J., Nini, A., Guo, D. & Kasakoff, A. (2015). Using social media to map double modals in modern American English. Presented at NWAV 44, University of Toronto, October 22–25, 2015.
Hasty, J. D. (2011). I might not would say that: A sociolinguistic study of double modal acceptance. University of Pennsylvania Working Papers in Linguistics 17(2), 91–98.
Hasty, J. D. (2012a). This Might Could Help Us Better Understand Syntactic Variation: The Double Modal Construction in Tennessee English. Ph.D. Dissertation, Michigan State University.
Hasty, J. D. (2012b). We might should oughta take a second look at this: A syntactic re-analysis of double modals in Southern United States English. Lingua 122(14), 1716–1738.
Hasty, J. D. (2014). We might should be thinking this way: Theory and practice in the study of syntactic variation. In Zanuttini, R. & Horn, L. R. (eds.), Micro-syntactic variation in North American English, 269–293. Oxford: Oxford University Press.
Labov, W. (1972). Language in the inner city: Studies in the Black English vernacular. Philadelphia, PA: University of Pennsylvania Press.
Leech, G. (2003). Modality on the move: The English modal auxiliaries 1961–1992. In Facchinetti et al. (eds.), 223–240.
Leech, G., Hundt, M., Mair, C. & Smith, N. (2009). Change in contemporary English: A grammatical study. Cambridge: CUP.
Levelt, W. J. M. (1983). Monitoring and self-repair in speech. Cognition 14(4), 41–104.
Lickley, R. J. (2015). Fluency and disfluency. In Redford, M. (ed.), The handbook of speech production, 445–469. Wiley-Blackwell.
McDavid, R. I. (1981). The conduct of an Atlas interview in the Gulf States. Ed. by S. E. Leas. LAGS Working Papers, First Series, No. 2 http://www.lap.uga.edu/Projects/LAGS/LAGS-WorkingPapers/1%20Working%20Papers%20first%20series%20paper%20%232.pdf
Mishoe, M. & Montgomery, M. (1994). The pragmatics of multiple modal variation in North and South Carolina. American Speech 69(1), 3–29.
Montgomery, M. (1989). Exploring the roots of Appalachian English. English World-Wide 10(2), 227–278.
Montgomery, M. (1998). Multiple modals in LAGS and LAMSAS. In Montgomery, M. & Nunnally, T. E. (eds.), From the Gulf States and beyond: The legacy of Lee Pederson and LAGS, 90–122. Tuscaloosa: University of Alabama Press.
Montgomery, M. & Nagle, S. J. (1994). Double modals in Scotland and the Southern United States: Trans-Atlantic inheritance or independent development? Folia Linguistica Historica 14(1–2), 91–108.
Myhill, J. (1995). Change and continuity in the function of the American English modals. Linguistics 33, 157–211.
Nagle, S. J. (2003). Double modals in the Southern United States: Syntactic structure or syntactic structures? In Facchinetti et al. (eds.), 349–371.
Pederson, L. et al. (1986–1992). Linguistic Atlas of the Gulf States (6 vols.). Athens, GA: University of Georgia Press.
Postma, A. (2000). Detection of errors during speech production: a review of speech monitoring models. Cognition 77, 97–131.
Schegloff, E. A., Jefferson, G. & Sacks, H. (1977). The preference for self-correction in the organization of repair in conversation. Language 53, 361–382.