Multilingual Clusters and Gender in Nordic Twitter

Steven Coats
English Philology, University of Oulu, Finland
steven.coats@oulu.fi

2nd Digital Humanities in the Nordic Countries Conference
March 14th, 2017

Goals of research

Small but significant differences between genders have been found in, for example, the relative frequencies of lexical and grammatical types in English (Argamon et al. 2007, Newman et al. 2008, Bamman et al. 2014), and also in Twitter data from northern Europe (Coats 2016).

Data collection

Data collection | Filtering

Data collection | Language detection

Data collection | Improving language detection accuracy

Tweet density by language

Map data from Natural Earth; maps from OpenStreetMap and Carto.

Data collection | Gender disambiguation

Quantifying bilingualism

Creating networks of bi- and multilinguals

\[
\begin{array}{l|cc|l}
 & language_{i} & \sim language_{i} & \\
\hline
language_{j} & O_{11} & O_{12} & = R_{1} \\
\sim language_{j} & O_{21} & O_{22} & = R_{2} \\
\hline
 & = C_{1} & = C_{2} & = N
\end{array}
\]

\[ \phi_{ij}= \frac{(O_{11}O_{22}-O_{12}O_{21})}{\sqrt{R_{1}R_{2}C_{1}C_{2}}} \]
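As a minimal sketch of the formula above, assuming each language is represented by the set of user IDs who tweeted in it (the function names `contingency` and `phi_coefficient` and the toy user IDs are hypothetical, not taken from the study's code):

```python
import math

def contingency(users_i, users_j, all_users):
    """Build the 2x2 table for one language pair from sets of user IDs.

    Rows are language_j / ~language_j, columns are language_i / ~language_i,
    matching the table layout on this slide.
    """
    o11 = len(users_i & users_j)              # users of both languages
    o12 = len(users_j - users_i)              # users of j but not i
    o21 = len(users_i - users_j)              # users of i but not j
    o22 = len(all_users - users_i - users_j)  # users of neither
    return o11, o12, o21, o22

def phi_coefficient(o11, o12, o21, o22):
    """Phi coefficient of association for a 2x2 contingency table."""
    r1, r2 = o11 + o12, o21 + o22   # row totals
    c1, c2 = o11 + o21, o12 + o22   # column totals
    denom = math.sqrt(r1 * r2 * c1 * c2)
    if denom == 0:
        # Assumption: treat a degenerate table (an empty margin) as no association.
        return 0.0
    return (o11 * o22 - o12 * o21) / denom

# Toy data: 10 users, 4 use language i, 3 use language j, 2 use both.
all_users = set(range(10))
users_i = {0, 1, 2, 3}
users_j = {2, 3, 4}
phi = phi_coefficient(*contingency(users_i, users_j, all_users))
```

A positive value indicates that the two languages co-occur in user profiles more often than their marginal frequencies would predict; a negative value indicates that they co-occur less often.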

Language connection strength

\[ t_{ij}= \frac{\phi_{ij}\sqrt{D-2}}{\sqrt{1-\phi_{ij}^{2}}}; \quad D = \max(R_{1},C_{1}) \]
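The t statistic can be sketched in a few lines, assuming \(\phi_{ij}\) and the marginal totals \(R_{1}\) and \(C_{1}\) are already available (the function name `t_statistic` and the input values are hypothetical):

```python
import math

def t_statistic(phi, r1, c1):
    """t value for a language pair, with D = max(R1, C1) as defined above."""
    d = max(r1, c1)
    if d <= 2:
        raise ValueError("D must exceed 2 for the square root to be defined")
    if abs(phi) >= 1:
        raise ValueError("phi must lie strictly between -1 and 1")
    return phi * math.sqrt(d - 2) / math.sqrt(1 - phi ** 2)

# Illustrative values only: phi = 0.5, R1 = 102 users, C1 = 50 users.
t = t_statistic(0.5, r1=102, c1=50)
```

For a fixed \(\phi_{ij}\), the statistic grows with the size of the larger language group, so the same association counts as stronger evidence when more users are involved.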

Cluster for females

Cluster for males

(Random sample of male users, equal in size to the number of female users)

Gender differences in the networks

Significance of gender differences

\[ z_{(i,j)} = \frac{z_{\phi_{(i,j)_{m}}}-z_{\phi_{(i,j)_{f}}}}{\sqrt{\frac{1}{n_{(i,j)_{m}}-3}+{\frac{1}{n_{(i,j)_{f}}-3}}}} \]
(Sheskin 2000: 792)

For languages \((i,j)\):

\(z_{\phi_{(i,j)_{m}}}\) is the Fisher transformed value of \(\phi\) for males

\(z_{\phi_{(i,j)_{f}}}\) is the Fisher transformed value of \(\phi\) for females

\(n_{(i,j)_{m}}\) and \(n_{(i,j)_{f}}\) are the numbers of bilingual speakers in the male and female networks.
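A minimal sketch of this test, using the inverse hyperbolic tangent as the Fisher transformation. The function names and input values are hypothetical, and the two-sided p-value against a standard normal distribution is an added assumption about how the z value is evaluated:

```python
import math

def fisher_z(phi):
    """Fisher transformation: 0.5 * ln((1 + phi) / (1 - phi)) = atanh(phi)."""
    return math.atanh(phi)

def z_difference(phi_m, n_m, phi_f, n_f):
    """z statistic for the male/female difference in phi for one language pair."""
    if n_m <= 3 or n_f <= 3:
        raise ValueError("each network needs more than 3 bilingual users")
    se = math.sqrt(1 / (n_m - 3) + 1 / (n_f - 3))
    return (fisher_z(phi_m) - fisher_z(phi_f)) / se

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference (assumed here)."""
    return math.erfc(abs(z) / math.sqrt(2))

# Illustrative values only, not results from the study.
z = z_difference(phi_m=0.40, n_m=250, phi_f=0.25, n_f=250)
p = two_sided_p(z)
```

A positive z indicates a stronger association between the two languages in the male network, a negative z a stronger association in the female network.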

Significance of gender differences II

Summary and conclusions

Acknowledgements

Thanks to

References

Almende, B.V., and B. Thieurmel. 2016. “visNetwork: Network Visualization using vis.js Library”. R package version 1.0.2. https://CRAN.R-project.org/package=visNetwork

Argamon, S., M. Koppel, J. W. Pennebaker and J. Schler. 2007. “Mining the blogosphere: Age, gender, and the varieties of self-expression”. First Monday 12/9. http://pear.accc.uic.edu/ojs/index.php/fm/article/view/2003/1878

Bamman, D., J. Eisenstein and T. Schnoebelen. 2014. “Gender identity and lexical variation in social media”. Journal of Sociolinguistics 18(2), 135–160. http://onlinelibrary.wiley.com/doi/10.1111/josl.12080/full

Cheng, J., B. Karambelkar and Y. Xie. 2017. “leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library”. R package version 1.1.0. https://CRAN.R-project.org/package=leaflet

Coats, S. 2016. “Grammatical feature frequencies of English on Twitter in Finland”. In L. Squires (ed.), English in computer-mediated communication: Variation, representation, and change. Berlin: De Gruyter. 179–210.

Csardi, G. and T. Nepusz. 2006. “The igraph software package for complex network research”. InterJournal: Complex Systems. http://igraph.org

Eleta, I. and J. Golbeck. 2014. “Multilingual use of Twitter: Social networks at the language frontier”. Computers in Human Behavior 41, 424–432.

Labov, W. 1990. “The intersection of sex and social class in the course of linguistic change”. Language Variation and Change 2, 205–254.

Newman, M. L., C. J. Groom, L. D. Handelman and J. W. Pennebaker. 2008. “Gender differences in language use: An analysis of 14,000 text samples”. Discourse Processes 45(3), 211–236. http://dx.doi.org/10.1080/01638530802073712

References II

Roesslein, J. 2015. Tweepy. Python package [Computer software]. http://www.tweepy.org

Ronen, S., B. Gonçalves, K. Z. Hu, A. Vespignani, S. Pinker, and C. A. Hidalgo. 2014. “Links that speak: The global language network and its association with global fame”. PNAS 111(52), E5616–E5622. http://dx.doi.org/10.1073/pnas.1410931111

Sheskin, D. 2000. Handbook of parametric and non-parametric statistical procedures, 2nd ed. Boca Raton: Chapman and Hall.

Sites, D. 2013. Compact language detector 2. https://github.com/CLD2Owners/cld2