Revealing narrative structure through aesthetic analysis

This week, some papers on discovering narrative structure in motion pictures from the patterns of their aesthetic elements. But first: many of the papers on the statistical analysis of film style in this post, and in many others across this blog, are co-authored by Svetha Venkatesh from Curtin University’s Computing department. Her home page, with links to much research relevant to film studies, can be accessed here.

Adams B, Venkatesh S, Bui HH, and Dorai C 2007 A probabilistic framework for extracting narrative act boundaries and semantics in motion pictures, Multimedia Tools and Applications 27: 195-213.

This work constitutes the first attempt to extract the important narrative structure, the 3-Act storytelling paradigm in film. Widely prevalent in the domain of film, it forms the foundation and framework in which a film can be made to function as an effective tool for storytelling, and its extraction is a vital step in automatic content management for film data. The identification of act boundaries allows for structuralizing film at a level far higher than existing segmentation frameworks, which include shot detection and scene identification, and provides a basis for inferences about the semantic content of dramatic events in film. A novel act boundary likelihood function for Act 1 and 2 is derived using a Bayesian formulation under guidance from film grammar, tested under many configurations and the results are reported for experiments involving 25 full-length movies. The result proves to be a useful tool in both the automatic and semi-interactive setting for semantic analysis of film, with potential application to analogues occurring in many other domains, including news, training video, sitcoms.

Chen H-W, Kuo J-H, Chu W-T, Wu J-L 2004 Action movies segmentation and summarization based on tempo analysis, 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, New York, NY, 10-16 October, 2004.

With advances in digital video analysis and storage technologies, and the progress of the entertainment industry, movie viewers hope to gain more control over what they see. Therefore, tools that enable movie content analysis are important for accessing, retrieving, and browsing information close to a human perceptive and semantic level. We propose an action movie segmentation and summarization framework based on movie tempo, which represents the delivery speed of important segments of a movie. In the tempo-based system, we combine film domain knowledge (film grammar), shot change detection, motion activity analysis, and semantic context detection based on audio features to grasp the concept of tempo for story unit extraction, and then build a system for action movie segmentation and summarization. We conduct experiments on several different action movie sequences, and present an analysis and comparison based on the satisfactory experimental results.

Hu W, Xie N, Li L, Zeng X, and Maybank S 2011 A survey on visual content-based video indexing and retrieval, IEEE Transactions On Systems, Man, and Cybernetics—Part C: Applications And Reviews, 41 (6): 797-819.

Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, video retrieval including query interfaces, similarity measure and relevance feedback, and video browsing. Finally, we analyze future research directions.

Moncrieff S and Venkatesh S 2006 Narrative structure detection through audio pace, IEEE Multimedia Modeling 2006, Beijing, China, 4–6 Jan 2006.

We use the concept of film pace, expressed through the audio, to analyse the broad level narrative structure of film. The narrative structure is divided into visual narration (action sections) and audio narration (plot development sections). We hypothesise that changes in the narrative structure signal a change in audio content, which is reflected by a change in audio pace. We test this hypothesis using a number of audio feature functions that reflect the audio pace to detect changes in narrative structure for 8 films of varying genres. The properties of the energy were then used to determine the audio pace feature corresponding to the narrative structure for each film analysed. The method was successful in determining the narrative structure for 7 of the films, achieving an overall precision of 76.4% and recall of 80.3%. We map the properties of the speech and energy of film audio to the higher level semantic concept of audio pace. The audio pace was in turn applied to a higher level semantic analysis of the structure of film.
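To make the idea of an energy-based pace feature concrete, here is a minimal sketch in Python. It is not the authors' method: the window size, the ratio threshold, the function names, and the synthetic signal are all assumptions chosen for illustration. It computes the mean energy of successive audio windows and flags the points where that energy changes sharply, which is the kind of change the abstract associates with a shift in narrative structure.

```python
import math

def short_time_energy(samples, window):
    """Mean squared amplitude of each non-overlapping window of samples."""
    return [
        sum(x * x for x in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]

def pace_changes(energy, ratio=4.0):
    """Window indices where energy rises or falls by at least `ratio`.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    changes = []
    for i in range(1, len(energy)):
        lo, hi = sorted((energy[i - 1], energy[i]))
        if hi > 0 and (lo == 0 or hi / lo >= ratio):
            changes.append(i)
    return changes

# Synthetic stand-in for a soundtrack: quiet passage, loud passage, quiet passage.
quiet = [0.1 * math.sin(0.3 * n) for n in range(400)]
loud = [1.0 * math.sin(0.3 * n) for n in range(400)]
signal = quiet + loud + quiet

energy = short_time_energy(signal, window=100)
boundaries = pace_changes(energy)
print(boundaries)
```

On real film audio the windows would be far longer and the features richer (the abstract mentions speech as well as energy), but the principle of looking for sustained changes in a pace feature is the same.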

Murtagh F, Ganz A, and McKie S 2009 The structure of narrative: the case of film scripts, Pattern Recognition 42 (2): 302-312.

We analyze the style and structure of story narrative using the case of film scripts. The practical importance of this is noted, especially the need to have support tools for television movie writing. We use the Casablanca film script, and scripts from six episodes of CSI (Crime Scene Investigation). For analysis of style and structure, we quantify various central perspectives discussed in McKee’s book, Story: Substance, Structure, Style, and the Principles of Screenwriting. Film scripts offer a useful point of departure for exploration of the analysis of more general narratives. Our methodology, using Correspondence Analysis and hierarchical clustering, is innovative in a range of areas that we discuss. In particular this work is groundbreaking in taking the qualitative analysis of McKee and grounding this analysis in a quantitative and algorithmic framework.

Phung DQ, Duong TV, Venkatesh S, and Bui HH 2005 Topic transition detection using hierarchical hidden Markov and semi-Markov models, 13th Annual ACM International Conference on Multimedia, 6-11 November 2005, Singapore.

In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modelling.
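Why duration information matters can be seen from a standard property of ordinary hidden Markov models: the time spent in a state with self-transition probability a follows a geometric distribution, P(d) = a^(d-1)(1-a), so the most probable duration is always a single step. Semi-Markov models replace this implicit distribution with an explicit one. The short Python sketch below illustrates only the geometric case; the function name and the value a = 0.9 are assumptions for illustration, and it does not implement the Coxian model used in the paper.

```python
def hmm_duration_pmf(self_transition, max_d):
    """Implicit state-duration distribution of a plain HMM (geometric)."""
    return [
        self_transition ** (d - 1) * (1 - self_transition)
        for d in range(1, max_d + 1)
    ]

# Illustrative value: a state that persists with probability 0.9 per step.
pmf = hmm_duration_pmf(0.9, 500)

# Expected duration is 1 / (1 - a), yet the single most likely duration is 1.
mean_duration = sum(d * p for d, p in enumerate(pmf, start=1))
most_likely = 1 + pmf.index(max(pmf))
print(round(mean_duration, 3), most_likely)
```

A topic in an educational video that typically lasts several minutes is poorly described by a distribution whose mode is one shot, which is one way to read the abstract's finding that the duration-aware S-HSMM outperforms the HHMM.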

Pfeiffer S and Srinivasan U 2002 Scene determination using auditive segmentation models of edited video, in C Dorai and S Venkatesh (eds.) Computational Media Aesthetics. Boston: Kluwer Academic Publishers: 105-130.

This chapter describes different approaches that use audio features for determination of scenes in edited video. It focuses on analysing the sound track of videos for extraction of higher-level video structure. We define a scene in a video as a temporal interval which is semantically coherent. The semantic coherence of a scene is often constructed during cinematic editing of a video. An example is the use of music for concatenation of several shots into a scene which describes a lengthy passage of time such as the journey of a character. Some semantic coherence is also inherent to the unedited video material such as the sound ambience at a specific setting, or the change pattern of speakers in a dialogue. Another kind of semantic coherence is constructed from the textual content of the sound track revealing for example the different stories contained in a news broadcast or documentary. This chapter explains the types of scenes that can be constructed via audio cues from a film art perspective. It continues on a discussion of the feasibility of automatic extraction of these scene types and finally presents existing approaches.

Weng C-Y, Chu W-T, and Wu J-L 2009 RoleNet: movie analysis from the perspective of social networks, IEEE Transactions on Multimedia 11(2): 256-271.

With the idea of social network analysis, we propose a novel way to analyze movie videos from the perspective of social relationships rather than audiovisual features. To appropriately describe roles’ relationships in movies, we devise a method to quantify relations and construct the roles’ social network, called RoleNet. Based on RoleNet, we are able to perform semantic analysis that goes beyond conventional feature-based approaches. In this work, social relations between roles are used as the context information of video scenes, and leading roles and the corresponding communities can be automatically determined. The results of community identification provide new alternatives in media management and browsing. Moreover, by describing video scenes with roles’ context, a social-relation-based story segmentation method is developed to pave a new way for this widely-studied topic. Experimental results show the effectiveness of leading role determination and community identification. We also demonstrate that the social-based story segmentation approach works much better than the conventional tempo-based method. Finally, we give extensive discussions and state that the proposed ideas provide insights into context-based video analysis.
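The general idea of a role network can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the RoleNet algorithm itself: it weights each pair of roles by the number of scenes they share and picks the leading role as the one with the greatest total edge weight. The scene lists and role names are made up, and the paper's own quantification of relations may differ.

```python
from collections import defaultdict
from itertools import combinations

def build_role_network(scenes):
    """Edge weight = number of scenes in which two roles appear together."""
    weights = defaultdict(int)
    for cast in scenes:
        for a, b in combinations(sorted(set(cast)), 2):
            weights[(a, b)] += 1
    return dict(weights)

def leading_role(weights):
    """Role with the largest summed edge weight (ties broken alphabetically)."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return max(sorted(strength), key=strength.get)

# Hypothetical scene-by-scene cast lists, invented for this example.
scenes = [
    ["hero", "mentor"],
    ["hero", "rival"],
    ["hero", "mentor", "friend"],
    ["mentor", "friend"],
    ["hero", "rival"],
]

network = build_role_network(scenes)
print(leading_role(network))
```

Once such a graph exists, standard community detection can group roles, and changes in which community dominates successive scenes offer one possible cue for story segmentation.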


About Nick Redfern

I graduated from the University of Kent in 1998 with a degree in Film Studies and History, and was awarded an MA by the same institution in 2002. I received my Ph.D. from Manchester Metropolitan University in 2006 for a thesis titled 'Regionalism and the Cinema in the United Kingdom, 1992 to 2002.' I have taught at Manchester Metropolitan University and the University of Central Lancashire. My research interests include regional film cultures and industries in the United Kingdom; cognition and communication in the cinema; anxiety in contemporary Hollywood cinema; cinemetrics; and film style and film form. My work has been published in Entertext, the International Journal of Regional and Local Studies, the New Review of Film and Television Studies, Cyfrwng: Media Wales Journal, and the Journal of British Cinema and Television.

Posted on March 29, 2012, in Cinemetrics, Film Analysis, Film Studies, Film Style, Narrative, Narrative Cinema, Statistics.
