
Getting creative with big data to examine gender inequality

The term “big data” may bring to mind swaths of private information held by tech companies. But lots of big data is, in fact, visible to all – we just may not think of it as “data”.

If you’ve been to the movies recently, you will have seen a dataset of credits – listing the cast and crew members alongside their roles. While the credits from any one film may not be that useful, the credits from every film can form a big dataset. At Nesta and the PEC (a new policy and evidence centre for the creative industries), we have been exploring how these types of non-confidential big datasets can shine new light on gender representation in the creative industries.

Gender representation has traditionally been gauged using surveys of workers. But most surveys haven’t been running for very long, and after a new survey launches it can take several years before we can tell how the gender mix is changing. Surveys also rarely go beyond counting the number of women and men, so they can’t shed light on how prominent each group is in the creative process, or how each is portrayed in a particular art form.

Digging deep

We looked recently at the media’s reporting of women in the creative industries using more than half a million articles from The Guardian newspaper, published between 2000 and 2018, from sections of the paper relating to the creative industries (such as Books, Film, Fashion and Games).

In the past five years, there has been a large increase in references to women. From 2000 to 2013, less than one-third of gendered pronouns within articles (for example, “he” and “she”) referred to women. But this began to change in 2014 – and by 2018 the percentage of gendered pronouns that were female had reached 40%. By contrast, the share of women among workers in the UK’s creative industries has remained flat in recent years, at around 37%.
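The pronoun measure above boils down to a simple ratio: female pronouns divided by all gendered pronouns, calculated year by year. The Python sketch below shows one way to compute it. The pronoun lists, the tokeniser and the `(year, text)` input format are our own assumptions for illustration rather than the exact setup used in the study, and the two sample sentences are invented.

```python
import re
from collections import Counter, defaultdict

# Assumed pronoun lists (illustrative only; the study's exact list may differ).
FEMALE = {"she", "her", "hers", "herself"}
MALE = {"he", "him", "his", "himself"}
TOKEN = re.compile(r"[a-z]+")


def female_share_by_year(articles):
    """Return {year: share of gendered pronouns that are female}.

    `articles` is an iterable of (year, text) pairs, a hypothetical input format.
    """
    counts = defaultdict(Counter)
    for year, text in articles:
        # Whole-word tokens, so "he" inside "the" is never counted.
        for token in TOKEN.findall(text.lower()):
            if token in FEMALE:
                counts[year]["female"] += 1
            elif token in MALE:
                counts[year]["male"] += 1
    # A year only appears in `counts` once it has at least one gendered pronoun.
    return {
        year: c["female"] / (c["female"] + c["male"])
        for year, c in sorted(counts.items())
    }


sample = [
    (2000, "He wrote the script and he directed it."),
    (2018, "She designed the set; he managed the tour."),
]
print(female_share_by_year(sample))
# {2000: 0.0, 2018: 0.5}
```

Counting every occurrence treats “her” the same whether it is an object or a possessive, which is fine for a share of references but is a choice a real analysis would need to make deliberately.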

We also studied the words that followed the pronouns “he” and “she”, to gain insight into the media’s portrayal of creative workers. This led us to discover that, compared to men, there was greater focus on particular sounds made by women, such as “laughs”, “cries”, “giggles”, and “coos”, and non-verbal reactions, such as “smiles”, “grins” and “nods”. These words were never used frequently, but when they were used, they were more likely than other words to refer to women rather than men.

In contrast, words relating to past creative achievements and leadership activities more frequently referred to men. For example, you’re much more likely to see “he directed” than “she directed”, and similarly “he performed”, “he designed”, “he managed” and “he founded”. This finding is consistent with the long-running gender imbalances in the creative industries.
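One simple way to quantify which words lean towards “she” or “he” is to compare, for each word that follows a pronoun, the share of its uses that follow “she” with the share of “she” across all pronoun-word pairs. A ratio above 1 means the word is more likely than average to refer to a woman; below 1, to a man. The sketch below rests on our own assumptions (only “he” and “she”, lower-cased text, an optional minimum count) and is not necessarily the statistic used in the study; the example sentences are invented.

```python
import re
from collections import Counter

# "he" or "she" followed by the next word; \b stops "he" matching inside "the".
PAIR = re.compile(r"\b(he|she)\s+([a-z]+)")


def following_words(texts):
    """Count the words that directly follow 'he' and 'she'."""
    he, she = Counter(), Counter()
    for text in texts:
        for pronoun, word in PAIR.findall(text.lower()):
            (she if pronoun == "she" else he)[word] += 1
    return he, she


def female_skew(he, she, min_count=1):
    """Ratio of each word's 'she' share to the overall 'she' share.

    Values above 1 lean towards women, values below 1 towards men.
    """
    total_she = sum(she.values())
    if total_she == 0:
        return {}  # no 'she' pairs at all, so the baseline would be zero
    baseline = total_she / (total_she + sum(he.values()))
    skew = {}
    for word in set(he) | set(she):
        n = he[word] + she[word]
        if n >= min_count:
            skew[word] = (she[word] / n) / baseline
    return skew


texts = [
    "She laughs. He directed the film. He founded a studio.",
    "She directed a play and she smiles.",
]
he_counts, she_counts = following_words(texts)
for word, ratio in sorted(female_skew(he_counts, she_counts).items()):
    print(f"{word}: {ratio:.2f}")
# directed: 0.83
# founded: 0.00
# laughs: 1.67
# smiles: 1.67
```

Because words like “coos” are rare, ratios for them are noisy; a real analysis would set `min_count` well above 1 or use a statistical test before drawing conclusions.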

In another study, we used a dataset from the British Film Institute (BFI) that contained the credits from every UK feature-length film released in cinemas.

After the BFI inferred people’s gender from their first names, we found that the gender mix in film credits hasn’t changed meaningfully since the end of World War II – and in 2017 women still made up only around 30% of cast members and 34% of crew members.
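One common approach to inferring gender from first names is to look up how often each name belongs to women in some reference population, and only assign a label when that proportion passes a cut-off. The sketch below illustrates the idea; the tiny name table and its probabilities are made up, the 0.9 threshold is an arbitrary assumption, and none of it should be read as the BFI’s actual method.

```python
# Made-up probabilities that a person with this first name is a woman (illustration only).
NAME_FEMALE_PROB = {
    "mary": 0.99,
    "john": 0.01,
    "alex": 0.40,
}


def infer_gender(full_name, threshold=0.9):
    """Return 'female', 'male' or 'unknown' from the first name (hypothetical rule)."""
    parts = full_name.split()
    if not parts:
        return "unknown"
    p = NAME_FEMALE_PROB.get(parts[0].lower())
    if p is None:
        return "unknown"  # name not in the lookup table
    if p >= threshold:
        return "female"
    if p <= 1 - threshold:
        return "male"
    return "unknown"  # ambiguous name, used by both women and men


def female_share(credits):
    """Share of women among credits with an inferred gender, plus the unknown count."""
    labels = [infer_gender(name) for name in credits]
    known = [label for label in labels if label != "unknown"]
    share = known.count("female") / len(known) if known else float("nan")
    return share, len(labels) - len(known)


print(female_share(["Mary Smith", "John Brown", "Alex Green", "Zoe Quinn"]))
# (0.5, 2)
```

The labels can only ever be “female”, “male” or “unknown”, and ambiguous or unlisted names simply drop out of the count: both are sources of the bias discussed later in this piece.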

This dataset also showed gender-based differences in the jobs of on-screen characters. Since 2005, for example, only 16% of on-screen “doctors” (in unnamed roles) have been played by women, which jars with the fact that women make up 46% of doctors in the UK.

Creative fairness

We are by no means the only researchers showing the potential of non-confidential sources of big data to inform gender metrics in the creative industries. Researchers at Google, in collaboration with the Geena Davis Institute, used facial and speech recognition technology to show that in the 100 highest-grossing live-action films in the US, in each year from 2014 to 2016, women occupied just 36% of screen time and 35% of speaking time.

While big data studies can enrich diversity measures, there are two important sources of potential bias. First, we’re almost always inferring gender – from a face, a first name or a single pronoun – and so we may get a person’s gender wrong. Second, these inference methods typically only detect “male” and “female”, excluding or misclassifying anyone who identifies with a non-binary gender. For these reasons, big data methods are not a replacement for surveys, because surveys allow people to self-identify or to opt out entirely.

Even bearing in mind these potential biases, there are still many big data sources that could shed new light on gender imbalances, if only they were made available to researchers. For example, access to the stills and subtitles of films and television programmes could be used to evaluate diversity schemes, while access to the content of more newspapers would enable a broader study on the media’s reporting of creative workers.

To realise the potential of these new methods, we need to encourage and support creative organisations to securely share their non-confidential data. That will hopefully allow researchers to get a little more creative about measuring gender equality in the UK’s creative industries.

First published by The Conversation on 28th August 2019. 
