From presence to prominence: How can computer vision widen the evidence base for on-screen representation
There are two big initiatives in the UK regularly collecting evidence on diversity across different demographics on screen. These are: Project Diamond by the Creative Diversity Network, the flagship programme capturing diversity data in the UK Broadcasting supply chain, and the Office of Communications (Ofcom)’s annual diversity in television broadcasting reports.
While these existing data collection exercises are very informative, we believe they miss out on important parts of the picture. There are three areas where more evidence could be particularly impactful.
First, current evidence misses out on some key aspects of representation. For example, much of the evidence focuses on presence (whether someone, such as a character, appears on screen) and very little on prominence (e.g. how much screen time a character has, and how centred or foregrounded they are) or portrayal (e.g. the context, narratives and any stereotypes). If applied appropriately, computer vision has the potential to expand evaluation of on-screen representation from presence to prominence.
Second, existing data coverage is a) low, and b) uneven across demographic groups. For example, Project Diamond had an average response rate of 28% of individuals working on qualifying TV content across the five contributing broadcasters in 2018/19. There is therefore no direct knowledge of how inclusive or representative the remaining 70+% of the broadcast landscape is. A 2018 review by NatCen further discussed the “inevitable possibility of reporting bias due to non-response as a consequence of the low response rate.” Ofcom’s reports have similarly highlighted “data gaps” and “insufficient collection” of disability and some other demographic data.
In general, there is much less data on some underrepresented and minoritised groups. BFI’s evidence review summarising more than 60 research publications about workforce diversity found that “sexual orientation and religion and belief were seldom explored in detail” in the publications that they examined. For on-screen representation (Standard A), only 1% of productions meeting the BFI Diversity Standard do so via gender identity, compared with 63% for gender and 50% for race/ethnicity.
BFI’s Diversity Standard is a contractual requirement for all British Film Institute funding as well as an eligibility requirement for certain BAFTA awards. It recently inspired similar rules for the Oscars in the US.
Computer vision does not share these faults: it is not slow in the way manual annotation is, and it is not exposed to the reporting bias that can affect diversity forms. But it does have its own methodological challenges, which are discussed here.
Third, existing methods (e.g. manual annotation and self-reported forms) are insightful but inherently limited. The evidence gaps resulting from their limitations can only be plugged if we embrace more innovative methods, so we can see a more complete picture and advance the discussion.
The manual annotation approach, which counts characters from particular groups on-screen, has been used in ‘state-of-diversity’ reports (e.g. annotating seven weeks of broadcasts on BBC main channels) as well as specific media studies research (e.g. analysing the representation of older adults or ethnic minorities on television). But it is laborious, can be slow to deliver insights, and only covers what the researcher can feasibly annotate.
The self-reported survey approach also reflects a partial view. For a TV show that ended many years ago, it is difficult to run a retrospective survey with the cast and crew, but it is possible to apply computer vision to study the programme. For reference, Project Diamond began in 2016 and Ofcom’s annual diversity and equal opportunities in television reports began in 2017, so both are relatively young initiatives.
Archived television such as Learning on Screen’s BoB can be used to study the broadcast landscape. There is also potential for computational researchers to take advantage of the UK’s text and data mining copyright exception, introduced in 2014 to allow processing of protected materials for non-commercial research purposes.
Embracing and building upon new evidence methods
Fundamentally, more complete and richer data about representation on screen can only add to the evidence base. As researchers, we think that there are overlooked datasets like archives and newer methods like computer vision that can help address gaps in the evidence base for representation.
In particular, if applied and interpreted appropriately, computer vision can speed up certain parts of data compilation, provide new insights, and widen the evidence base from presence to prominence. It is not a cure-all, but newer methods can potentially be an important supplement to how we understand representation and evidence progress. In the longer term, there is much room for socially-minded AI researchers in the creative industries and elsewhere to build systems that apply computer vision to the domain responsibly.
From presence to prominence
There is good reason for diversity metrics to consider more than on-screen presence and branch out to prominence and portrayal.
First, existing measures of on-screen representation need to be more sophisticated to cater for ongoing discussions about prominence and portrayal. One insight highlighted by the BFI this year is that “percentages of productions foregrounding lead characters from underrepresented groups have room for improvement.” They found that “the comparatively low percentage of black, Asian and other minority ethnic characters speaks to the need for more lead roles and ownership of narrative if film is to be properly representative.” At the same time, we want to avoid tokenistic representation and portrayal that perpetuates stereotypes. Existing measurements are not well suited to informing these discussions.
Second, there are already feasible computational methods for measuring prominence. For example, how centred or foregrounded (‘front and centre’) a character is can be approximated by how much screen or speaking time they have.
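To make the idea of a prominence measure concrete, here is a minimal sketch in Python. It assumes that some upstream step has already produced per-frame detections for each character; the `Detection` schema, the `prominence` function and the centredness and area proxies for being foregrounded are all hypothetical illustrations, not an established method or the approach of any particular study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Detection:
    """One detection of a character in one sampled frame (hypothetical schema)."""
    frame: int        # index of the sampled frame
    character: str    # label assigned by an upstream, separately validated step
    x_center: float   # horizontal centre of the bounding box, 0.0 (left) to 1.0 (right)
    area: float       # bounding-box area as a fraction of the frame, 0.0 to 1.0


def prominence(detections, total_frames):
    """Summarise simple prominence proxies per character.

    screen_time_share: fraction of sampled frames in which the character appears.
    mean_centredness:  1.0 when the box is horizontally centred, 0.0 at the frame edge.
    mean_area:         average share of the frame the box occupies (an assumed
                       proxy for how foregrounded the character is).
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")

    frames_seen = defaultdict(set)
    centredness = defaultdict(list)
    areas = defaultdict(list)

    for d in detections:
        frames_seen[d.character].add(d.frame)
        centredness[d.character].append(1.0 - 2.0 * abs(d.x_center - 0.5))
        areas[d.character].append(d.area)

    return {
        name: {
            "screen_time_share": len(frames) / total_frames,
            "mean_centredness": sum(centredness[name]) / len(centredness[name]),
            "mean_area": sum(areas[name]) / len(areas[name]),
        }
        for name, frames in frames_seen.items()
    }


if __name__ == "__main__":
    # Toy input: four sampled frames, two characters.
    sample = [
        Detection(frame=0, character="A", x_center=0.5, area=0.20),
        Detection(frame=1, character="A", x_center=0.5, area=0.20),
        Detection(frame=2, character="A", x_center=0.5, area=0.20),
        Detection(frame=2, character="B", x_center=0.0, area=0.05),
    ]
    for name, scores in prominence(sample, total_frames=4).items():
        print(name, scores)
```

In this toy input, character A appears in three of four frames, centred and large, while B appears once at the edge of the frame: both are present, but they differ sharply in prominence. Speaking time could be summarised in the same way from audio segments, and any real application would need to validate the upstream detection step first.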
But the wider social norms around applying facial technologies responsibly and ethically are still developing. In the next post, we present a conceptual framework for measuring on-screen representation. It lays out the key data ethics and logistical considerations that should be taken into account when applying computer vision to measure representation.
Image by Terje Sollie