class: center, middle, inverse, title-slide

# Variation of New German Verbal Anglicisms in a Social Media Corpus

###
Steven Coats
English Philology, University of Oulu, Finland
steven.coats@oulu.fi
###
6th CMC-Corpora Conference, Antwerp
September 17th, 2018
---
class: inverse, center, middle
background-image: url(http://cc.oulu.fi/~scoats/oululogoRedTransparent.png);
background-repeat: no-repeat;
background-size: 80px 57px;
background-position:right top;
exclude: true

---
layout: true

<div class="my-header"><img border="0" alt="W3Schools" src="http://cc.oulu.fi/~scoats/oululogonewEng.png" width="80" height="80"></div>
<div class="my-footer"><span>Steven Coats           Variation of New German Verbal Anglicisms in a Social Media Corpus | CMC-Corpora 6</span></div>

---

## Outline

1. Verbal Anglicisms in German
2. Data collection from Twitter and corpus creation
3. Generation of new Anglicisms
4. Results: Most frequent types, prefixed forms, variation in assimilation

.footnote[Slides for the presentation are on my homepage at https://cc.oulu.fi/~scoats]

---

### Anglicisms in German

- Anglicism: "A word or idiom that is recognizably English in its form (spelling, pronunciation, morphology, or at least one of the three), but is accepted as an item in the vocabulary of the receptor language" .small[(Görlach 2003: 1)]

--

- Anglicisms are attested in German since at least the 17th/18th c.: .best_studio[Plantation, elektrisch, Rum] .small[(von Polenz 1994: 103)], increase in borrowing from English since the 19th c., esp.
after WWII .small[(Carstensen 1965, Yang 1990, Onysko 2007, Burmasowa 2010)]

--

- Most Anglicisms (and other borrowings) are nouns and adjectives; only about 5% are verbs .small[(Eisenberg 2013)]

--

- Verbs (and hence verbal borrowings): morphological richness ➜ insight into lexical semantics, but also (theoretically) phonology, morphology, tense/mood/aspect, syntax, etc. ➜ insight into language change

--

- This study: prevalence and inflection/morphology of some **new non-finite verbal Anglicisms**

---

### Established and new verbal Anglicisms

Two older verbal Anglicisms from *flirt* and *boycott* .small[(Eisenberg 2013: 84)]

- .best_studio[ich bin scheiße im **flirten**] .small[(I'm shit at flirting)]
- .best_studio[Wenn Frauen etwas **boykottieren**, machen sie dann eigentlich einen Girlkott?] .small[(If women boycott something, are they actually doing a girlcott?)]

--

"New" Anglicisms from *sleep* and *watch*

- .best_studio[Ich glaube ich gehe gleich mal **sleepen**!![](https://twemoji.maxcdn.com/16x16/1f634.png)] .small[(I think I'll go straight to sleep!![](https://twemoji.maxcdn.com/16x16/1f634.png))]
- .best_studio[Welchen #NBA Game sollte ich heute Nacht **watchen**?]
.small[(Which #NBA game should I watch tonight?)]

---

### German past participle

For weak verbs, created by circumfixing the verbal stem with .best_studio[ge-] and .best_studio[-(e)t]

- .best_studio[arbeiten - gearbeitet] .small[(to work - worked)], .best_studio[sagen - gesagt] .small[(to say - said)]

--

Some recent borrowings from English show partial assimilation to standard German orthography: they can retain the -*ed* of the English participle

- .best_studio[liken - geliked/gelikt] .small[(to like - liked)], .best_studio[crashen - gecrashed/gecrasht] .small[(to crash - crashed)], .best_studio[featuren - gefeatured/gefeaturt] .small[(to feature - featured)]

--

Past participles of German transitive verbs can be used as attributive adjectives (and thus can be inflected)

- .best_studio[eine geliebt**e** Katze] .small[(a well-loved cat)]
- .best_studio[ein geliked**es**/gelikt**es** Foto] .small[(a photo that was liked)]

---

### Verb derivation through affixation

Verbs can be formed via prefixation of separable or inseparable particles to the stem

- .best_studio[sagen] 'to say'
- .best_studio[**an**sagen] 'to announce'
- .best_studio[**aus**sagen] 'to state, give testimony', .best_studio[**ver**sagen] 'to fail'

--

Verbal Anglicisms with inseparable prefixes are "almost non-existent" in a recent large corpus study? (Eisenberg 2013: 114)

--

Verbs can be formed via affixation of .best_studio[-ier-] or .best_studio[-isier-]: a historically important process, especially for integration of foreign lexical items: "der mit Abstand wichtigste und produktivste Verbalisierer, über den das Deutsche verfügt" .small[['By far the most important and productive German verbalizer'] (Eisenberg 2011: 244)]

--

- .best_studio[saluieren], 12th c. .small[(Öhmann 1970)], .best_studio[agieren, diskutieren, legalisieren], many others

---

### Research questions

- To what extent are users on social media coining *new* verbal Anglicisms? (i.e.
not yet codified as German words or well-established in German use)

--

- Which non-finite verbal forms are preferred?

--

- To what extent is the past participle of new verbal Anglicisms assimilated to German orthographical norms?

--

- How does assimilation interact with use of the past participle as an attributive adjective?

--

- Is .best_studio[-ier-] productive for verbal Anglicisms?

---

### Data collection

- 653,457,659 tweets with *place* metadata collected globally from the Twitter Streaming API from November 2016 until June 2017

--

- 60,683 authors of at least one German-language tweet with place metadata from Germany, Austria or Switzerland identified and all of their tweets/most recent 3,250 tweets (whichever was larger) downloaded from the Twitter REST API in April 2018

--

- Retain tweets in German according to Twitter's metadata

--

- 36,240,530 (59.3%) of tweets in German = 534,211,366 tokens

---

### Generation of potential new Anglicisms

- 1,000 most frequent base verbal forms (infinitives without *to*) from each of the [BNC](https://corpus.byu.edu/bnc/), [COCA](https://corpus.byu.edu/coca/), and the [Wikipedia Corpus of English](https://corpus.byu.edu/wiki/) .small[(Davies 2004–, 2008–, 2015)] + 1,413 English infinitives from the [Pattern Dictionary of English Verbs](http://pdev.org.uk) .small[(Hanks 2013)] = 2,415 unique verbal forms

--

- Use regex to create German infinitives
.small[
- Reduplicate stem-final consonants for forms with short vowels in final syllable
- Forms ending in syllable-final clusters with 'l' and silent 'e' ➜ substitute 'eln' for 'le' (bubble ➜ .best_studio[bubbeln])
- Add 'n' to forms ending in syllable-final liquids (cancel ➜ .best_studio[canceln], discover ➜ .best_studio[discovern])
- Add 'n' to forms ending in 'e', 'en' to forms ending in other characters
- Some manual editing of resulting list
]

--

- Remove actual English words (e.g.
.best_studio[driven, risen]: not infinitive Anglicisms in the corpus) by matching 236,736 English word types in `nltk.words` .small[(Bird et al. 2009)]

--

- Remove standard German words (e.g. .best_studio[angeln, bangen, Faden, landen], etc.) by matching 233,685 word types aggregated from the DWDS, the Leipzig Corpora Collection, and the IDS .small[(Kleuker 2016)]

---

### Base Anglicisms list

.small[
]

---

### Non-finite verbal forms

10 German non-finite verbal forms created for each verb in the base list, e.g. from *to fail*

--

- Infinitive ➜ .best_studio[failen]
- Present participle ➜ .best_studio[failend]
- Assimilated past participle (ends in *t*) ➜ .best_studio[gefailt]
- Partially-assimilated past participle (ends in *ed*) ➜ .best_studio[gefailed]

--

- Infinitive + .best_studio[-ier-] ➜ .best_studio[failieren]
- Present participle + .best_studio[-ier-] ➜ .best_studio[failierend]
- Assimilated past participle + .best_studio[-ier-] ➜ .best_studio[failiert]

--

- Infinitive + .best_studio[-isier-] ➜ .best_studio[failisieren]
- Present participle + .best_studio[-isier-] ➜ .best_studio[failisierend]
- Assimilated past participle + .best_studio[-isier-] ➜ .best_studio[failisiert]

---

### Derived and inflected verbal forms

Prefixed forms of each non-finite verbal form with inseparable prefixes (.best_studio[*be-*, *er-*, *ent-*, *emp-*, *miss-*, *ver-*, *zer-*, *über-*]) and separable prefixes (.best_studio[*ab-*, *an-*, *auf-*, *aus-*, *durch-*, *ein-*, *her-*, *herauf-*, *herum-*, *herunter-*, *hin-*, *hinzu-*, *mit-*, *voran-*, *los-*, *mit-*, *vor-*, *weg-*, *zurück-*, *zusammen-*])

--

The *zu*-infixed infinitive forms with separable prefixes (.best_studio[*abzufailen*, *anzufailen*], etc.)

--

Adjectivally inflected past participles

- .best_studio[*gefailter*, *gefailte*, *gefailtes*, *gefailtem*, *gefailten*, *gefailtester*, *gefailteste*, *gefailtestes*, *gefailtestem*, *gefailtesten*]
- .best_studio[*gefaileder*, *gefailede*] (etc.)
- .best_studio[*abgefailter*, *abgefailte*] (etc.)

--

= 781,400 possible forms to search for (2,415 \* 10 \* 28 + 2,630 \* 20 + 2,630 \* 20)

Find all matches in the 530m tokens of the corpus

---

### Filter out attested (old) German Anglicisms

Matches were sent through SMOR .small[(Schmid et al.
2004, Fitschen 2004)] to catch declined forms of genuine German words and established "old" Anglicisms

- This removed 4,844 additional attested types: .best_studio[durchgefaxt, ausgepaddelt, überlackiert, kicken, mixen, shoppen], etc.

---

### Results: Attested types by frequency

.small[
]

---

### Most frequent types

Many of the most frequent types refer to *new technologies or internet activities* .small[(Carstensen 1965: *Bedürfnislehnwörter* ['necessary loans'], Onysko and Winter-Froemel 2011: *catachrestic borrowings*)]

--

- .best_studio[**twittern**, **streamen**, **getwittert**, **googlen**, **gestreamt**]

--

- .best_studio[gechillt, **geliked**, supporten, **gefixt**, geflasht]

--

- .best_studio[**adden**, **geupdated**, haten, **rendern**, **coden**]

--

- .best_studio[**followen**, **gevotet**, cachen, **tracken**, **hosten**]

.small[cf. Baeskow (2017)]

---

### Prefixed types by frequency

.small[
]

---

### Results: Assimilated and partially-assimilated past participle forms

Assimilated forms (such as .best_studio[gefixt]) are more frequent than partially-assimilated forms (.best_studio[gefixed]) (19,232 vs. 4,924 tokens)

--

Forms with /.ipa[aɪ]/, /.ipa[eɪ]/ and /.ipa[oʊ]/ diphthongs may be more likely to retain the English participial ending due to graphemic incongruence between English and German

- .best_studio[geliked > gelikt, geshaped > geshapt, gefollowed > gefollowt]

--

Higher frequency ➜ more likely to assimilate

---

### Results: Assimilation of past participle

<div class="midcenter">
<iframe src="http://cc.oulu.fi/~scoats/rbokeh_CMC6.html" style="max-width: 100%" sandbox="allow-same-origin allow-scripts" width="100%" height="550" scrolling="yes" seamless="seamless" frameborder="0" align="top">
</iframe>
</div>

---

### Results: Past participle as attributive adjective

.small[
]

---

### The .best_studio[-ier-] affix

Most frequent type, *makieren*, seems to be an orthographical variant of *markieren*, 'to mark'

--

- .best_studio[@user Du kannst schon seit September Freunde in Beiträgen makieren. URL] .small[(@user you have been able to mark friends in posts since September. URL)]
- .best_studio[Immer wieder toll mit euch < wenn ich jemanden vergessen hab zu makieren tut es mir leid < ihr seid wunderbar < URL] .small[(Always great with you guys < sorry if I forgot to mark someone < you're wonderful < URL)]

--

Some genuine new .best_studio[-ier] Anglicisms include *relatieren*, 'to relate', and *failieren*, 'to fail'

--

- .best_studio[Entfernt relatierter Link zu News, die eigentlich gar keine sind: URL] .small[(Distantly related link to news that is actually not news: URL)]
- .best_studio[Wenn man sich da mal die relatierten Videos ansieht - türkische Popmusik ist schon irgendwie ne Parallelwelt. :o] .small[(If you look at the related videos - Turkish pop music is really somehow a parallel universe. :o)]
- .best_studio[\*schnarch, schnarch\* Auf reboot wart. Was auch immer da failiert hatte.. Das Ding hat weder einen Raid-Controller noch mehrere Festplatten.] .small[(\*snore, snore\* Waiting for reboot. Whatever failed there.. The thing has neither a Raid controller nor multiple hard drives.)]

---

### Issues and outlook

- Cross-contamination of word lists (built automatically from web sources) ➜ use Duden Fremdwortwörterbuch?
- False positives (misspellings, automatically-generated lexemes that are probably not Anglicisms) ➜ check most common misspellings, filter out

--

<hr>

- Semantics: Words that are difficult to define as *necessary* borrowings (e.g.
.best_studio[defenden, disturben, flyen, increasen, remembern]) ➜ look for semantic shift (word vectors)
- Morphology: Compare prefix productivity with that of core German verbs using `\(\mathscr{P}\)` (Baayen 2001)
- Transformation strong to weak .best_studio[gedrawt, gediggt]: Look for weak ➜ strong?

---

### Summary

- A large number of "new" Anglicisms are attested in a Twitter corpus of German

--

- The most frequent new Anglicisms denote relatively new human experiences in the realms of information technology and computer-mediated communication

--

- Past participles used as attributive adjectives are far more likely to conform to German orthographical norms

--

- .best_studio[ver-] is the most productive verbal prefix for new Anglicism types

--

- .best_studio[-ier] is still productive (barely)

---

# Thank you!

---

### References I

.small[
.hangingindent[
Baayen, H. 2001. *Word Frequency Distributions*. Dordrecht: Kluwer.

Baeskow, H. 2017. #Virtual Lexicality: The semantics of innovative prefixed verbal anglicisms in German. *Word Structure* 10.2, 173–203.

Bird, S., Loper, E. and Klein, E. 2009. *Natural Language Processing with Python*. Newton, MA: O'Reilly.

Burmasowa, S. 2010. *Empirische Untersuchung der Anglizismen im Deutschen am Material der Zeitung 'Die Welt'*. Bamberg: University of Bamberg Press.

Carstensen, B. 1965. *Englische Einflüsse auf die Deutsche Sprache nach 1945*. Heidelberg: Carl Winter Verlag.

Davies, M. 2004–. *BYU-BNC (Based on the British National Corpus from Oxford University Press)*. [https://corpus.byu.edu/bnc](https://corpus.byu.edu/bnc).

Davies, M. 2008–. *The Corpus of Contemporary American English (COCA): 560 million words, 1990-present*. [https://corpus.byu.edu/coca/](https://corpus.byu.edu/coca/).

Davies, M. 2015. *The Wikipedia Corpus: 4.6 million articles, 1.9 billion words*. [https://corpus.byu.edu/wiki/](https://corpus.byu.edu/wiki/).

Eisenberg, P. 2011. *Das Fremdwort im Deutschen*. Berlin and New York: de Gruyter Mouton.

Eisenberg, P. 2013. Anglizismen im Deutschen. *Reichtum und Armut der deutschen Sprache: Erster Bericht zur Lage der deutschen Sprache*. Ed. by Deutsche Akademie für Sprache und Dichtung, Union der deutschen Akademien der Wissenschaften. Berlin: de Gruyter, 57–119.

Fitschen, A. 2004. *Ein Computerlinguistisches Lexikon als komplexes System*. Ph.D. Thesis. Universität Stuttgart.
]]

---

### References II

.small[
.hangingindent[
Görlach, M. 2003. *English Words Abroad*. Amsterdam: John Benjamins.

Hanks, P. 2013. *Lexical Analysis: Norms and Exploitations*. Cambridge, MA: MIT Press.

Kleuker, D. 2016. [Wortliste](https://github.com/davidak/wortliste).

Onysko, A. 2007. *Anglicisms in German: Borrowing, Lexical Productivity, and Written Codeswitching*. Berlin: de Gruyter.

Onysko, A. and Winter-Froemel, E. 2011. Necessary loans – luxury loans? Exploring the pragmatic dimension of borrowing. *Journal of Pragmatics* 43.6, 1550–1567.

Öhmann, E. 1970. Suffixstudien VI: Das deutsche Verbalsuffix -ieren. *Neuphilologische Mitteilungen* 71.3, 337–356.

Polenz, P. von. 1994. *Deutsche Sprachgeschichte vom Spätmittelalter bis zur Gegenwart. Band II: 17. und 18. Jahrhundert*. Berlin: de Gruyter.

Roesslein, J. 2015. [Tweepy](https://github.com/tweepy/tweepy).

Schmid, H., Fitschen, A. and Heid, U. 2004. SMOR: A German computational morphology covering derivation, composition, and inflection. *Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004)*, 1263–1266.

Yang, W. 1990. *Anglizismen im Deutschen: Am Beispiel des Nachrichtenmagazins Der Spiegel*. Tübingen: Niemeyer Verlag.
]]
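
---

### Appendix: Sketch of the form-generation rules

The regex rules listed under "Generation of potential new Anglicisms" and the ten forms under "Non-finite verbal forms" could be approximated roughly as follows. This is a minimal sketch, not the code used for the study: the function names `german_infinitive` and `non_finite_forms`, the dictionary keys, the order of the rules, and the short-vowel heuristic (one vowel letter followed by one final consonant letter) are all assumptions made for illustration. The slides also mention manual editing of the list, which no such heuristic replaces.

```python
import re


def german_infinitive(verb):
    """Map an English base verb to a candidate German infinitive (heuristic)."""
    # Consonant + silent 'e' after 'l': substitute 'eln' for 'le' (bubble -> bubbeln).
    if re.search(r"[^aeiou]le$", verb):
        return verb[:-2] + "eln"
    # Final liquid after 'e': just add 'n' (cancel -> canceln, discover -> discovern).
    if re.search(r"[^aeiou]e[lr]$", verb):
        return verb + "n"
    # Final 'e': add 'n' (like -> liken).
    if verb.endswith("e"):
        return verb + "n"
    # Assumed short-vowel test: single vowel letter + single final consonant.
    # Over-applies to words with unstressed final syllables (e.g. 'visit').
    if re.search(r"(?:^|[^aeiou])[aeiou][bdfgklmnprstz]$", verb):
        return verb + verb[-1] + "en"
    # Everything else: add 'en' (watch -> watchen).
    return verb + "en"


def non_finite_forms(infinitive):
    """Build the ten non-finite forms from a German-style infinitive."""
    # Stem: drop 'en' (failen -> fail), or only 'n' after -el/-er (canceln -> cancel).
    stem = infinitive[:-2] if infinitive.endswith("en") else infinitive[:-1]
    # Stems ending in t/d take '-et' in the assimilated participle (voten -> gevotet).
    dental = "et" if stem[-1] in "td" else "t"
    forms = {
        "infinitive": infinitive,
        "present_participle": infinitive + "d",
        "pp_assimilated": "ge" + stem + dental,
        "pp_partial": "ge" + stem + "ed",
    }
    for affix in ("ier", "isier"):
        forms[affix + "_infinitive"] = stem + affix + "en"
        forms[affix + "_present_participle"] = stem + affix + "end"
        forms[affix + "_pp"] = stem + affix + "t"
    return forms


if __name__ == "__main__":
    for verb in ("bubble", "cancel", "discover", "fail"):
        infinitive = german_infinitive(verb)
        print(verb, "->", infinitive, "->", non_finite_forms(infinitive)["pp_assimilated"])
```

Prefixed and adjectivally inflected variants would then be built by string concatenation on top of these ten forms before matching against the corpus tokens.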