t-SNE of a Twitter skin tone emoji corpus

(The visualization takes a few seconds to load and requires a relatively recent browser.)

Word2Vec was used to analyze the content of a global corpus of 24,231,885 tweets containing skin tone emoji. The corpus comprises 307,831,369 tokens (corresponding to 4,492,662 unique types), of which 2,704 are unique emoji types. t-SNE was then used to reduce the 400-dimensional vectors of these emoji types to 2 dimensions.
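
The dimensionality-reduction step can be sketched roughly as follows. This is only an illustration, not the code behind the plot: it assumes scikit-learn's TSNE, uses random stand-in vectors instead of the real emoji vectors, and the row count, perplexity and seed are arbitrary choices.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data (assumption): 60 random 400-dimensional vectors in place of
# the real Word2Vec vectors for the 2,704 emoji types. 60 rows keeps it fast.
rng = np.random.default_rng(0)
emoji_vectors = rng.normal(size=(60, 400))

# Project the vectors to 2 dimensions. The perplexity must be smaller than
# the number of rows; 15 is an arbitrary value for this toy data.
tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
coords = tsne.fit_transform(emoji_vectors)

print(coords.shape)  # (60, 2): one (x, y) point per emoji
```

Each row of coords is one point that could be drawn in a scatter plot. With real vectors, the layout depends noticeably on the perplexity and the random seed.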

Emoji that are closer together in the plot are used in similar contexts, and are likely to have similar meanings. Emoji that are farther apart are more distant semantically. For some emoji, the skin tone variants are clustered closely together. For others, they are more spread out. This may correspond to the different ways these pictographs are used by different discourse communities globally.
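
One possible way to put a number on how tightly the skin tone variants of an emoji cluster is sketched below. It is an assumption-laden illustration, not the method used for the plot: it measures cosine similarity in the original vector space rather than distance in the 2-dimensional layout, the helper names (base_emoji, cosine, variant_cohesion) are hypothetical, and the toy vectors are made up.

```python
from collections import defaultdict
from itertools import combinations
import math

# The five Unicode skin tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}

def base_emoji(token):
    """Strip skin tone modifiers so all variants map to the same base emoji."""
    return "".join(ch for ch in token if ch not in SKIN_TONE_MODIFIERS)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def variant_cohesion(vectors):
    """Mean pairwise cosine similarity among the skin tone variants of each base emoji."""
    groups = defaultdict(list)
    for token, vec in vectors.items():
        base = base_emoji(token)
        if base != token:  # keep only tokens that carry a skin tone modifier
            groups[base].append(vec)
    cohesion = {}
    for base, vecs in groups.items():
        sims = [cosine(u, v) for u, v in combinations(vecs, 2)]
        if sims:  # needs at least two variants
            cohesion[base] = sum(sims) / len(sims)
    return cohesion

# Made-up 2-dimensional vectors: thumbs-up variants coincide, clapping variants do not.
toy_vectors = {
    "\U0001F44D\U0001F3FB": [1.0, 0.0],
    "\U0001F44D\U0001F3FF": [1.0, 0.0],
    "\U0001F44F\U0001F3FB": [1.0, 0.0],
    "\U0001F44F\U0001F3FF": [0.0, 1.0],
}
cohesion = variant_cohesion(toy_vectors)
```

In this toy data the thumbs-up variants get a cohesion of 1.0 (identical contexts) and the clapping variants 0.0 (unrelated contexts). Values near 1 would correspond to the tightly clustered variants in the plot, lower values to the more spread-out ones.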

Use the tools to the right of the plot to zoom, drag and select.

For more information, see this article.