How big data and machine learning can be used to generate more meaningful insights on gender inequality

Researchers analyse over half a million articles published across almost 20 years to examine gender imbalances in media reporting

22nd July 2019 - The Creative Industries Policy and Evidence Centre (PEC), in partnership with Nesta, have used big data and machine learning to examine gender imbalances in media reporting on the creative industries and how media coverage of women in the sector has changed in recent years. Rather than just counting the numbers of women and men in the creative industries, big data can be used to unlock deeper insights about gender diversity. 

Researchers analysed articles published in The Guardian’s arts and creative industries sections(1), such as Fashion, Stage, Media, Books and Games, between 2000 and 2018 to examine differences in the way women and men are reported on and how coverage of women in the creative industries(2) has changed in recent years.

The research is published alongside the launch of the Creative Diversity All-Party Parliamentary Group (APPG), a new cross-party group in Parliament to identify and tackle obstacles to diversity in the creative industries. The APPG itself was in part inspired by a round table convened by Nesta and chaired by Ed Vaizey MP and Tracy Brabin MP.  

Key findings include: 

Amount of space given to women in the newspaper

  • The last five years have seen a large relative increase in references to women. From 2000 to 2013, fewer than a third of the pronouns used in articles to identify a person’s gender (e.g. she, her, he, him) referred to women. By 2018, 40 per cent of these pronouns were female.
  • To put that increase into context, the share of women among workers in the UK’s creative industries has remained fairly flat at around 37 per cent over the past six years. 2018 was the first year in which the share of female pronouns in The Guardian exceeded the share of women in the creative industries workforce.
  • In 2000, just under a quarter of quotes followed by the words ‘she’ or ‘he’ (as in ‘she said’) were attributed to women. Based on current trends, 2019 may be the first year in which women are quoted as often as men in a given month.
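The pronoun and quote measures above can be illustrated with a short sketch. This is not the researchers’ code: the pronoun lists, the tokenisation and the quote pattern (a closing quotation mark preceded by punctuation and followed by ‘she’ or ‘he’) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pronoun lists; the exact lists used in the study are not given here.
FEMALE = {"she", "her", "hers", "herself"}
MALE = {"he", "him", "his", "himself"}

# Assumed quote pattern: punctuation, a closing quote mark, then "she" or "he",
# as in: "I loved it," she said
QUOTE_PATTERN = re.compile(r'[,.!?]["\u201d]\s*(she|he)\b', re.IGNORECASE)


def female_pronoun_share(texts):
    """Fraction of gendered pronouns across `texts` that are female."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in FEMALE:
                counts["female"] += 1
            elif token in MALE:
                counts["male"] += 1
    total = counts["female"] + counts["male"]
    return counts["female"] / total if total else 0.0


def female_quote_share(texts):
    """Fraction of quotes attributed with 'she' rather than 'he'."""
    counts = Counter()
    for text in texts:
        for match in QUOTE_PATTERN.finditer(text):
            counts[match.group(1).lower()] += 1
    total = counts["she"] + counts["he"]
    return counts["she"] / total if total else 0.0
```

Applied to the articles from each year in turn, functions like these would produce a yearly series comparable in spirit to the percentages reported above.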

Different words used to describe women and men

  • Compared with men, coverage of women focuses more on the sounds they make, such as ‘laughs’, ‘cries’, ‘giggles’ and ‘coos’, and on non-verbal reactions, such as ‘smiles’, ‘grins’ and ‘nods’.
  • Words implying creative achievement, such as ‘directed’, ‘performed’, ‘painted’ and ‘designed’, and leadership, such as ‘managed’, ‘founded’ and ‘launched’, were more likely to refer to men than to women.
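One simple way to surface words like these is to count the word immediately following ‘she’ or ‘he’ and compare how often each word appears in the two contexts. The sketch below uses a plain smoothed frequency ratio; it is a swapped-in illustration, not the machine learning method used in the research, and the function names and smoothing value are hypothetical.

```python
import re
from collections import Counter


def words_after_pronouns(texts):
    """Count the word immediately following 'she' and 'he' in each text."""
    after_she, after_he = Counter(), Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for prev, word in zip(tokens, tokens[1:]):
            if prev == "she":
                after_she[word] += 1
            elif prev == "he":
                after_he[word] += 1
    return after_she, after_he


def female_skew(after_she, after_he, smoothing=1.0):
    """Ratio of each word's smoothed relative frequency after 'she' vs after 'he'.

    Values above 1 mean the word follows 'she' relatively more often;
    values below 1 mean it follows 'he' relatively more often.
    """
    vocab = set(after_she) | set(after_he)
    total_she = sum(after_she.values()) + smoothing * len(vocab)
    total_he = sum(after_he.values()) + smoothing * len(vocab)
    return {
        word: ((after_she[word] + smoothing) / total_she)
        / ((after_he[word] + smoothing) / total_he)
        for word in vocab
    }
```

Sorting the resulting ratios from highest to lowest gives candidate lists of words skewed towards women and towards men; a real analysis would also need to handle forms such as ‘she’s’, which this tokeniser keeps as a single token.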

Comparing different creative sections of the newspaper 

  • In 2018, the Fashion section gave the greatest space to women and is the only section where female pronouns make up more than 50 per cent of the total. By contrast, in the Technology and Games sections, female pronouns comprised just a quarter of all pronouns in 2018.
  • While these figures are low, they are broadly consistent with the gender mix of the IT, software and computer services workforce, where women were estimated to make up 21 per cent of workers in 2018.

The research focuses on The Guardian because, unlike the other major UK newspapers, it offers open access to its content. More broadly, the research shows how big data and machine learning can provide new insights on gender inequality.

Cath Sleeman, Researcher at the Creative Industries Policy and Evidence Centre and Head of Data Visualisation, Creative Economy at Nesta, said:

“While the research shows substantial positive change in recent years, there are still clear gender imbalances in coverage of areas such as technology and games, echoing the gender imbalances amongst workers in these sectors. Beyond the individual findings, the research shows how big data and machine learning can be used to generate more meaningful insights on gender inequality across any sector of the workforce. 

Going forward, a more equal representation of women in the press may have two effects: it may encourage more women to enter, but it may also give the impression that the creative industries are more balanced than is the case. This highlights the importance of using big data to inform measures of diversity: rather than simply counting the number of men and women, we should aim to capture differences in how people are portrayed.”

The full analysis and data visualisation can be viewed in the accompanying report, ‘She said more’.

-Ends-

For media enquiries and interview requests please contact Anna Zabow, Communications Manager at the Creative Industries Policy and Evidence Centre (PEC), on +44 7713 619077 or anna.zabow@nesta.org.uk

Notes to Editors

1. The analysis is based on over half a million articles published in The Guardian newspaper between 2000 and 2018. These articles were taken from sections of the paper relating to the creative industries: Art and Design, Music, Fashion, Games, Film, Books, Stage, Television and Radio, Technology, Culture and Media. Duplicate articles were removed, as were articles containing fewer than 20 words. Amongst British newspapers, The Guardian reports extensively on the creative industries and, unlike the other major UK newspapers, it offers open access (via an API) to its content. For this reason, our research focuses on The Guardian; we were unable to measure the representation of women in other newspapers.

2. The creative industries are defined by the Department for Digital, Culture, Media and Sport (DCMS) as: Advertising and marketing; Architecture; Crafts; Product design, graphic design and fashion design; Film, TV, video, radio and photography; IT, software, video games and computer services; Publishing and translation; Museums, galleries and libraries; and Music, performing arts, visual arts and cultural education.
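The cleaning steps described in note 1 (removing duplicates and articles of fewer than 20 words) might look something like the sketch below. It assumes articles are plain-text bodies, treats articles as duplicates only when their whitespace-normalised text is identical, and counts words by splitting on whitespace; the study’s actual duplicate and word-count rules may differ, and `clean_articles` is a hypothetical name.

```python
def clean_articles(articles, min_words=20):
    """Drop exact-duplicate article bodies and articles shorter than `min_words` words.

    Assumption: two articles are duplicates if their text is identical after
    collapsing runs of whitespace; words are counted by splitting on whitespace.
    """
    seen = set()
    kept = []
    for body in articles:
        words = body.split()
        key = " ".join(words)  # whitespace-normalised text used for duplicate checks
        if key in seen:
            continue
        seen.add(key)
        if len(words) < min_words:
            continue
        kept.append(body)
    return kept
```

Checking duplicates before the length filter means a short article that reappears is still counted only once, although both copies are discarded either way.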

About the Creative Industries Policy and Evidence Centre (the PEC)

The Creative Industries Policy and Evidence Centre (PEC) works to support the growth of the UK’s creative industries through the production of independent and authoritative evidence and policy advice. Led by Nesta and funded by the Arts and Humanities Research Council (AHRC) as part of the UK Government’s Industrial Strategy, the PEC comprises a consortium of universities from across the UK (Birmingham, Cardiff, Edinburgh, Glasgow, Work Foundation at Lancaster University, LSE, Manchester, Newcastle, Sussex, and Ulster). The PEC works with a diverse range of industry partners including the Creative Industries Federation. Initial industry partners also include Creative England, the British Film Institute and Tech Nation.

To find out more, visit pec.ac.uk or @CreativePEC

About Nesta 

Nesta is an innovation foundation. For us, innovation means turning bold ideas into reality and changing lives for the better. We use our expertise, skills and funding in areas where there are big challenges facing society. We've spent over 20 years working out the best ways to make change happen through research and experimenting, and we've applied that to our work in innovation policy, health, education, government innovation and the creative economy and arts. Nesta is based in the UK and supported by a financial endowment. We work with partners around the globe to bring bold ideas to life to change the world for good.

22 July 2019