Analysing film texts

The statistical analysis of literary style was initiated by Augustus De Morgan in 1851, when he observed that ‘I should expect to find that one man writing on two different subjects agrees more nearly with himself than two different men writing on the same subject’ and suggested that average word length would be an appropriate indicator of style. This was followed up by TC Mendenhall, who analysed the works of William Shakespeare and Sir Francis Bacon by looking at the frequency distributions of word lengths.
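To make the idea concrete, here is a minimal sketch of a word-length analysis of this kind. It is not a reconstruction of De Morgan's or Mendenhall's actual procedure: the function names, the tokenising rule (runs of letters, with an optional internal apostrophe), and the sample sentence are all my own assumptions for illustration.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, optionally with one
# internal apostrophe (e.g. "don't"). Real studies need a more careful rule.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def word_lengths(text):
    """Return the length in letters of every word found in text."""
    return [len(word.replace("'", "")) for word in WORD_PATTERN.findall(text)]


def word_length_distribution(text):
    """Return {word_length: relative_frequency}, sorted by word length."""
    counts = Counter(word_lengths(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: counts[length] / total for length in sorted(counts)}


def mean_word_length(text):
    """Return the average word length, or 0.0 if the text has no words."""
    lengths = word_lengths(text)
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    sample = "the cat sat on a mat"  # made-up sample text
    print(word_length_distribution(sample))
    print(mean_word_length(sample))
```

Comparing two authors would then mean computing these distributions for a sample from each and looking at where the curves differ; how large a sample is needed, and how to judge the difference, is left open here.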

It may seem that focussing on literary style will be of little use when dealing with films, but there is a body of research that examines film scripts and audio descriptions in order to understand the structure of narrative cinema. This post presents links to some of this material. I had intended to include this research in some of the earlier posts on empirical studies of film style, but it never quite seemed to fit (and I may have forgotten on more than one occasion). Besides, it deserves a post of its own.

The best place to start is probably Andrew Vassiliou’s Ph.D thesis:

Vassiliou A 2006 Analysing Film Content: A Text Based Approach. University of Surrey, unpublished Ph.D thesis.

The aim of this work is to bridge the semantic gap with respect to the analysis of film content. Our novel approach is to systematically exploit collateral texts for films, such as audio description scripts and screenplays. We ask three questions: first, what information do these texts provide about film content and how do they express it? Second, how can machine-processable representations of film content be extracted automatically in these texts? Third, how can these representations enable novel applications for analysing and accessing digital film data? To answer these questions we have analysed collocations in corpora of audio description scripts (AD) and screenplays (SC), developed and evaluated an information extraction system and outlined novel applications based on information extracted from AD and SC scripts.

We found that the language used in AD and SC contains idiosyncratic repeating word patterns, compared to general language. The existence of these idiosyncrasies means that the generation of information extraction templates and algorithms can be mainly automatic. We also found four types of event that are commonly described in audio description scripts and screenplays for Hollywood films: Focus_of_Attention, Change_of_Location, Non-verbal_Communication and Scene_Change events. We argue that information about these events will support novel applications for automatic film content analysis. These findings form our main contributions. Another contribution of this work is the extension and testing of an existing, mainly-automated method to generate templates and algorithms for information extraction; with no further modifications, these performed with around 55% precision and 35% recall. Also provided is a database containing information about four types of events in 193 films, which was extracted automatically. Taken as a whole, this work can be considered to contribute a new framework for analysing film content which synthesises elements of corpus linguistics, information extraction, narratology and film theory.

These papers present different aspects of the approach, using written texts to distinguish between film genres, to explore the clustering of narrative events, and to examine the emotional responses of viewers.

Salway A, Lehane B, and O’Connor NE 2007 Associating characters with events in films, 6th ACM International Conference on Image and Video Retrieval, 9-11 July 2007, Amsterdam.

The work presented here combines the analysis of a film’s audiovisual features with the analysis of an accompanying audio description. Specifically, we describe a technique for semantic-based indexing of feature films that associates character names with meaningful events. The technique fuses the results of event detection based on audiovisual features with the inferred on-screen presence of characters, based on an analysis of an audio description script. In an evaluation with 215 events from 11 films, the technique performed the character detection task with Precision = 93% and Recall = 71%. We then go on to show how novel access modes to film content are enabled by our analysis. The specific examples illustrated include video retrieval via a combination of event-type and character name and our first steps towards visualization of narrative and character interplay based on characters occurrence and co-occurrence in events.

Salway A, Vassiliou A, and Ahmad K 2005 What happens in films?, IEEE International Conference on Multimedia and Expo, 6-8 July 2005, Amsterdam.

This paper aims to contribute to the analysis and description of semantic video content by investigating what actions are important in films. We apply a corpus analysis method to identify frequently occurring phrases in texts that describe films – screenplays and audio description. Frequent words and statistically significant collocations of these words are identified in screenplays of 75 films and in audio description of 45 films. Phrases such as ‘looks at’, ‘turns to’, ‘smiles at’ and various collocations of ‘door’ were found to be common. We argue that these phrases occur frequently because they describe actions that are important story-telling elements for filmed narrative. We discuss how this knowledge helps the development of systems to structure semantic video content.
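For readers who want a feel for this kind of corpus analysis, the following is a rough sketch of counting frequent two-word phrases and the words that occur near a chosen word such as ‘door’. It is not the method used in the paper: it uses raw counts rather than a test of statistical significance, and the function names, the window size, and the two example sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequent_bigrams(texts, min_count=2):
    """Return (bigram, count) pairs occurring at least min_count times."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return [(bigram, n) for bigram, n in counts.most_common() if n >= min_count]


def collocates(texts, node, window=2):
    """Count the words found within `window` tokens either side of `node`."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                start = max(0, i - window)
                counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    # Made-up lines in the style of an audio description script.
    corpus = [
        "She looks at the door.",
        "He looks at her and opens the door.",
    ]
    print(frequent_bigrams(corpus))
    print(collocates(corpus, "door"))
```

A fuller treatment would compare these counts against a general-language reference corpus, since the point of the paper is that these phrases are unusually frequent in film texts.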

Vassiliou A, Salway A, and Pitt D 2004 Formalizing stories: sequences of events and state changes, IEEE International Conference on Multimedia and Expo, 27-30 June 2004, Taipei, Taiwan.

An attempt is made here to synthesise ideas from theories of narrative and computer science in order to model high level semantic video content, especially for films. A notation is proposed for describing sequences of interrelated events and states in narratives. The investigation focuses on the idea of modelling video content as a sequence of states: sequences of characters’ emotional states are considered as a case study. An existing method for extracting information about emotion in film is formalised and extended with a metric to compare the distribution of emotions in two films.

Finally, a PowerPoint presentation by Andrew Salway that covers the topic fairly extensively can be accessed here.

About Nick Redfern

I am an independent academic with over 15 years’ experience teaching film in higher education in the UK. I have taught film analysis, film industries, film theories, film history, and science fiction at Manchester Metropolitan University, the University of Central Lancashire, and Leeds Trinity University, where I was programme leader for film from 2016 to 2020. My research interests include computational film analysis, horror cinema, sound design, science fiction, film trailers, British cinema, and regional film cultures.

Posted on November 10, 2011, in Cinemetrics, Film Analysis, Film Studies, Narrative, Narrative Cinema.
