class: center, middle, inverse, title-slide

# Skin Tone Emoji and Sentiment on Twitter
###
Steven Coats
English Philology, University of Oulu
steven.coats@oulu.fi
###
3rd DHN Conference, Helsinki
March 7th, 2018
---
class: inverse, center, middle
background-image: url(https://cc.oulu.fi/~scoats/oululogoRedTransparent.png);
background-repeat: no-repeat;
background-size: 80px 57px;
background-position:right top;
exclude: true

---
layout: true

<div class="my-header"><img border="0" alt="W3Schools" src="https://cc.oulu.fi/~scoats/oululogonewEng.png" width="80" height="80"></div>

<div class="my-footer"><span>Steven Coats                  Skin Tone Emoji and Sentiment on Twitter | DHN 18</span></div>

---
<div class="my-header"><img border="0" alt="W3Schools" src="https://cc.oulu.fi/~scoats/oululogonewEng.png" width="80" height="80"></div>

<div class="my-footer"><span>Steven Coats                  Skin Tone Emoji and Sentiment on Twitter | DHN 18</span></div>

## Outline

1. Emoticons and Emoji
2. Skin tone emoji
3. Global distribution of skin tone emoji
4. Skin tone emoji and sentiment
5. Skin tone emoji and word embeddings

---
## Emoticons and Emoji

--

- Emoticons: Pictorial representations of (mainly) facial expressions, created with sequences of (mainly) ASCII or Latin-1 characters

:) :D :-/ :^) (^\_^) ಠ\_ಠ ʕ •ᴥ•ʔ ( ͡° ͜ʖ ͡°)

--

- Emoji: Pictorial representations in graphical form. Origins in Japan in 1990s, introduced into Unicode late 2000s as dedicated code points. Currently 2,789 unique emoji, more with every Unicode update.
.center[
![](https://twemoji.maxcdn.com/2/72x72/1f600.png)![](https://twemoji.maxcdn.com/2/72x72/1f63a.png)![](https://twemoji.maxcdn.com/2/72x72/1f478.png)![](https://twemoji.maxcdn.com/2/72x72/1f680.png)![](https://twemoji.maxcdn.com/2/72x72/1f1eb-1f1ee.png)![](https://twemoji.maxcdn.com/2/72x72/1f1e6-1f1fd.png)<br>![](https://twemoji.maxcdn.com/2/72x72/1f1f8-1f1ea.png)![](https://twemoji.maxcdn.com/2/72x72/1f1e9-1f1f0.png)![](https://twemoji.maxcdn.com/2/72x72/1f1f3-1f1f4.png)![](https://twemoji.maxcdn.com/2/72x72/1f1eb-1f1f4.png)![](https://twemoji.maxcdn.com/2/72x72/1f1ee-1f1f8.png)![](https://twemoji.maxcdn.com/2/72x72/1f1ec-1f1f1.png)].small[Glyphs from [twemoji](https://github.com/twitter/twemoji)]

--

- Emoji are used in computer-mediated communication in most languages, making them interesting for NLP

---
<div class="my-header"><img border="0" alt="W3Schools" src="https://cc.oulu.fi/~scoats/NewLogoRussianPNG1.png" width="80" height="80"></div>

<div class="my-footer"><span>Steven Coats                  Skin Tone Emoji and Sentiment on Twitter | DHN 18</span></div>

## Skin tone emoji

Since Unicode 8.0 (June 17, 2015), skin tone characters are part of Unicode

--

.pull-left30[
![](Fitzpatrick1a.png).small[[source](https://www.arpansa.gov.au/sites/g/files/net3086/f/legacy/pubs/RadiationProtection/FitzpatrickSkinType.pdf)]]

--

.pull-right70[
![](https://twemoji.maxcdn.com/2/72x72/1f3fb.png)Emoji Modifier Fitzpatrick Type-1-2<br>
![](https://twemoji.maxcdn.com/2/72x72/1f3fc.png)Emoji Modifier Fitzpatrick Type-3<br>
![](https://twemoji.maxcdn.com/2/72x72/1f3fd.png)Emoji Modifier Fitzpatrick Type-4<br>
![](https://twemoji.maxcdn.com/2/72x72/1f3fe.png)Emoji Modifier Fitzpatrick Type-5<br>
![](https://twemoji.maxcdn.com/2/72x72/1f3ff.png)Emoji Modifier Fitzpatrick Type-6<br>
]

---
## Skin tone emoji use

Skin tone is shown using sequences of Unicode characters

.pull-left[
![](https://twemoji.maxcdn.com/2/72x72/1f478.png) +
![](https://twemoji.maxcdn.com/2/72x72/1f3fb.png) = ![](https://twemoji.maxcdn.com/2/72x72/1f478-1f3fb.png)<br>
![](https://twemoji.maxcdn.com/2/72x72/1f478.png) + ![](https://twemoji.maxcdn.com/2/72x72/1f3fc.png) = ![](https://twemoji.maxcdn.com/2/72x72/1f478-1f3fc.png)<br>
![](https://twemoji.maxcdn.com/2/72x72/1f478.png) + ![](https://twemoji.maxcdn.com/2/72x72/1f3fd.png) = ![](https://twemoji.maxcdn.com/2/72x72/1f478-1f3fd.png)<br>
![](https://twemoji.maxcdn.com/2/72x72/1f478.png) + ![](https://twemoji.maxcdn.com/2/72x72/1f3fe.png) = ![](https://twemoji.maxcdn.com/2/72x72/1f478-1f3fe.png)<br>
![](https://twemoji.maxcdn.com/2/72x72/1f478.png) + ![](https://twemoji.maxcdn.com/2/72x72/1f3ff.png) = ![](https://twemoji.maxcdn.com/2/72x72/1f478-1f3ff.png)
]

.pull-right[
<br><br>
\U0001f478\U0001f3fb
<br><br><br>
\U0001f478\U0001f3fc
<br><br><br>
\U0001f478\U0001f3fd
<br><br><br>
\U0001f478\U0001f3fe
<br><br><br>
\U0001f478\U0001f3ff
<br><br>
]

---
## Emoji sequences

Since Unicode 9.0 (late 2016), emoji sequences can also be used to indicate activities, professions, groups, etc. These can usually be combined with skin tone as well.

--

![](https://twemoji.maxcdn.com/2/72x72/1f468.png) + ![](https://twemoji.maxcdn.com/2/72x72/2695.png) = ![](https://twemoji.maxcdn.com/2/72x72/1f468-200d-2695-fe0f.png)

--

![](https://twemoji.maxcdn.com/2/72x72/1f9db.png) + ![](https://twemoji.maxcdn.com/2/72x72/1f3ff.png) + ![](https://twemoji.maxcdn.com/2/72x72/2640.png) = ![](https://twemoji.maxcdn.com/2/72x72/1f9db-1f3ff-200d-2640-fe0f.png)

--

Sequences can utilize additional **zero-width joiner** and **variation selector** code points to show that the sequence is to be parsed as one character

![](https://twemoji.maxcdn.com/2/72x72/1f9d6-1f3fb-200d-2642-fe0f.png) = \U0001f9d6\U0001f3fb\U0000200d\U00002642\U0000fe0f

--

- Parsing and tokenization of emoji sequences can present difficulties

---
## Research questions

How are these skin tone emoji being used globally?
- How often do users select a skin tone variant compared to a default version?
- Which skin tones are being used?

--

Emoji and sentiment

- What does emoji use tell us about sentiment? (Kralj-Novak et al. 2015)
- Is there a relationship between skin tone emoji and sentiment?

--

Meanings associated with individual emoji types

- Do skin tone emoji types exhibit similar semantic profiles?
- Word embeddings to explore (skin tone) emoji meanings

---
## Data collection and methods

- 653,457,659 tweets with *place* attributes collected from Twitter's Streaming API from November 2016 to June 2017 (retweets excluded)

--

- Dictionary of **potential** skin tone emoji used to count occurrences of the types that can be modified with skin tone and their values
- Median and average skin tone values per country calculated (1 = Emoji Modifier Fitzpatrick Type-1-2, 5 = Emoji Modifier Fitzpatrick Type-6)

--

- Overview of skin tone use geographically, correlation of skin tone and sentiment (using Kralj-Novak et al. sentiment dictionary), word embeddings to investigate skin tone emoji meanings (tokenization issue)

---
## Global skin tone emoji summary statistics

.small[
]

---
## Proportion of potential skin tone emoji assigned skin tone

<div class="midcenter">
<iframe src="https://cc.oulu.fi/~scoats/chartProportion1.html" style="max-width = 100%" sandbox="allow-same-origin allow-scripts" width="900" height="500" scrolling="yes" seamless="seamless" frameborder="0" align="middle">
</iframe>
</div>

---
## Median skin tone values

<div class="midcenter">
<iframe src="https://cc.oulu.fi/~scoats/chartMedianST1.html" style="max-width = 100%" sandbox="allow-same-origin allow-scripts" width="100%" height="500" scrolling="yes" seamless="seamless" frameborder="0" align="middle">
</iframe>
</div>

---
<div class="my-header"><img border="0" alt="W3Schools" src="https://cc.oulu.fi/~scoats/NewLogoRussianPNG1.png" width="80" height="80"></div>

<div class="my-footer"><span>Steven Coats                  Skin Tone Emoji and Sentiment on Twitter | DHN 18</span></div>

## Global distribution of skin colors

![](ancestralSkintone.png).small[[source](https://sruk.org.uk/skin-color-an-example-of-adaptation-to-the-environment/)]

---
## Emoji sentiment rankings (Kralj-Novak et al. 2015)

- L1 Annotators categorized tweets containing emoji in 13 European languages as "negative", "neutral", or "positive"
- Aggregate statistics were used to assign sentiment values to individual emoji
- This dictionary was applied to evaluate sentiment in my data

---
## Emoji sentiment rankings (Kralj-Novak et al. 2015)

<div class="midcenter">
<iframe src="https://kt.ijs.si/data/Emoji_sentiment_ranking/" style="max-width = 100%" sandbox="allow-same-origin allow-scripts" width="100%" height="550" scrolling="yes" seamless="seamless" frameborder="0" align="middle">
</iframe>
</div>

---
## Calculation of mean sentiment by country/territory for tweets with potential skin tone emoji

- ~25m tweets with potential skin tone emoji
- Tweets stripped of usernames, URLs, and hashtags, then tokenized
- Mean values per country/territory

![](tokens_emojidata.png)

---
## Correlation of tweet sentiment with skin tone emoji

.pull-left[
![](sentColor237_min1SkT.png)
]

.pull-right[
![](sentColor50_top50nTweets.png)
]

---
## Word embeddings

- Distributional hypothesis (Harris 1968)
- Collocational information can be represented with vectors of co-occurrence probabilities
- Similarity of collocational context (and thus meaning) for any two types can be quantified using cosine similarity
- For types `\(A\)` and `\(B\)`, corresponding to vectors `\(\mathbf{A}\)` and `\(\mathbf{B}\)`:

`$$\text{similarity} = \cos(\theta) = {\mathbf{A} \cdot \mathbf{B} \over \|\mathbf{A}\| \|\mathbf{B}\|} = \frac{ \sum\limits_{i=1}^{n}{A_i B_i} }{ \sqrt{\sum\limits_{i=1}^{n}{A_i^2}} \sqrt{\sum\limits_{i=1}^{n}{B_i^2}} }$$`

- Based on a span of 5 tokens to the left and right and at least 10 occurrences in the tweets with potential skin tone

---
## Cosine similarity to skin emoji codepoints, 1,000 most frequent types in corpus<sup>1</sup>

.small[
.footnote[[1]~25m tweets with potential skin tone emojis.]
]

---
## Tokenizing emoji sequences

- Python script incorporating elements from [nltk.tokenize.casual](https://www.nltk.org/_modules/nltk/tokenize/casual.html), [tinysegmenter](https://pypi.python.org/pypi/tinysegmenter) for Japanese, [jieba](https://pypi.python.org/pypi/jieba/) for Mandarin, and [emojione](https://github.com/emojione/emojione) data for sequences

--

'This is a tokenizer test ![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f44d-1f3ff.png)![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f1f8-1f1ea.png)'

--

- ['This', 'is', 'a', 'tokenizer', 'test', '![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f44d.png)', '![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f3ff.png)', '![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f1f8.png)', '![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f1ea.png)'] = bad!
- ['This', 'is', 'a', 'tokenizer', 'test', '![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f44d-1f3ff.png)', '![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f1f8-1f1ea.png)'] = good!

--

'私は絵文字が大好き!![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f64b-1f3fb.png)'

--

- ['私は絵文字が大好き', '!', '![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f64b.png)', '![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f3fb.png)'] = bad!
- ['私', 'は', '絵文字', 'が', '大好き', '!', '![](https://cdn.jsdelivr.net/emojione/assets/3.0/png/32/1f64b-1f3fb.png)'] = good!
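---
## Tokenizing emoji sequences: a sketch

A minimal sketch of the "good" behaviour on the previous slide, not the script used for the study: a single regular expression that keeps flag pairs, skin tone modifiers, and zero-width-joiner sequences together as one token each. The code point ranges are a rough assumption standing in for the emojione sequence data, the names `TOKEN_RE` and `tokenize` are hypothetical, and no Japanese or Mandarin word segmentation is attempted.

```python
import re

# Assumed, approximate code point ranges (not the full Unicode emoji data).
MODIFIER = "[\U0001F3FB-\U0001F3FF]"   # Fitzpatrick skin tone modifiers
REGIONAL = "[\U0001F1E6-\U0001F1FF]"   # regional indicators (flag halves)
EMOJI = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"

# One emoji element: base, optional skin tone, optional variation selector.
ELEMENT = f"{EMOJI}{MODIFIER}?\uFE0F?"

TOKEN_RE = re.compile(
    f"{REGIONAL}{{2}}"                   # flag = two regional indicators
    f"|{ELEMENT}(?:\u200D{ELEMENT})*"    # elements chained with ZWJ
    r"|\w+"                              # words
    r"|\S"                               # any other non-space character
)


def tokenize(text):
    """Split text into words, punctuation, and whole emoji sequences."""
    return TOKEN_RE.findall(text)


# Thumbs up + Type-6 modifier, then the flag of Sweden (two indicators).
print(tokenize("This is a tokenizer test \U0001F44D\U0001F3FF\U0001F1F8\U0001F1EA"))
```

Trying the flag and ZWJ alternatives before the generic ones is what stops a sequence from being split into its code points. A lookup against a list of known sequences would probably be more precise than these ranges.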
---
## 400 to 2 dimensions: t-SNE (van der Maaten and Hinton 2008) of emoji vectors

<div class="midcenter">
<iframe src="https://cc.oulu.fi/~scoats/tsneEmoji_composed_060318.html" style="max-width = 100%" sandbox="allow-same-origin allow-scripts" width="100%" height="800" scrolling="yes" seamless="seamless" frameborder="0" align="middle">
</iframe>
</div>

---
## Summary

- About half of the emoji that can take a skin tone have one; skin tone use is more common in Anglophone countries
- Lighter skin tone emoji are favored in Asia, the Middle East, and parts of Latin America

--

- Negative correlation between sentiment and darker skin tone (Helliwell et al. 2017; Ljubešić and Fišer 2016)
- Light median skin tone values: prevailing cultural standards? (Peltzer et al. 2016; Swami et al. 2008; Li et al. 2008; Sahay & Piran 1997)
- Darker skin tone emoji are closer in meaning to informal (AAVE) English lexical items

--

- Kralj-Novak et al. sentiment dictionary only up to Unicode 6.0 (2014), only European languages, low levels of inter-annotator agreement
- More detailed examination possible (e.g., USA, Europe, Nordics, E. Asia)
- Analysis of evaluative use or correlation with affective language (swearing/profanity)

---
# Thank you!

---
### References

.small[
.hangingindent[
Davis, M., and Edberg, P. (2015). [Unicode emoji](https://unicode.org/reports/tr51/) (Unicode Technical Standard #51).

Harris, Z. (1968). *Mathematical structures of language*. New York: Interscience.

Helliwell, J. F., Huang, H., and Wang, S. (2017). The social foundations of world happiness. In: Helliwell, J. F., Layard, R., and Sachs, J. (eds.), *World Happiness Report 2017*. New York: Columbia University Center for Sustainable Development.

Kralj-Novak, P., Smailovic, J., Sluban, B., and Mozetic, I. (2015). [Sentiment of emojis](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0144296). *PLoS ONE* 10(12).

Li, E., Min, H., Belk, R., Kimura, J., and Bahl, S. (2008). Skin lightening and beauty in four Asian cultures. In: Lee, A., and Soman, D. (eds.), *Advances in Consumer Research Volume 35*, pp. 444–449. Duluth, MN: Association for Consumer Research.

Ljubešić, N., and Fišer, D. (2016). A global analysis of emoji usage. In: *Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task*, pp. 82–89. Stroudsburg, PA: Association for Computational Linguistics.

van der Maaten, L., and Hinton, G. (2008). Visualizing High-Dimensional Data Using t-SNE. *Journal of Machine Learning Research* 9:2579–2605.

Peltzer, K., Pengpid, D., and James, C. (2016). The globalization of whitening: prevalence of skin lighteners (or bleachers) use and its social correlates among university students in 26 countries. *International Journal of Dermatology* 55(2), 165–172.

Sahay, S., and Piran, N. (1997). Skin-color preferences and body satisfaction among South Asian-Canadian and European-Canadian female university students. *Journal of Social Psychology* 137(2), 161–171.

Swami, V., Furnham, A., and Joshi, K. (2008). The influence of skin tone, hair length, and hair colour on ratings of women's physical attractiveness, health and fertility. *Scandinavian Journal of Psychology* 49, 429–437.
]]