Empirical analysis of film style III
This week another selection of articles focusing on the empirical analysis of film style (see here and here for earlier instalments). These articles cover a wide range of topics, from design to archiving and data storage. As ever, the linked-to version may not be the final published version, so care should be taken when referencing.
One topic that has not featured before, but which is discussed in a couple of the papers here, is the use of auditory rather than visual information to classify films. This is an area where film studies could certainly learn a lot from information science.
An interesting place to start is the Informedia project at Carnegie Mellon University, which examines many issues relating to the archiving and use of video content. The site details the centre’s many projects and links to papers covering topics such as video information summarization and visualization and video ontology, and discusses the application of these techniques in various contexts (e.g. health, education, defense, and intelligence). The website can be accessed here.
Dinh PQ, Dorai C, and Venkatesh S 2002 Video genre categorization using audio wavelet coefficients, ACCV2002: The 5th Asian Conference on Computer Vision, 23-25 January 2002, Melbourne, Australia.
In this paper, we investigate the use of a wavelet transform-based analysis of audio tracks accompanying videos for the problem of automatic program genre detection. We compare the classification performance based on wavelet-based audio features to that using conventional features derived from Fourier and time analysis for the task of discriminating TV programs such as news, commercials, music shows, concerts, motor racing games, and animated cartoons. Three different classifiers, namely Decision Trees, SVMs, and k-Nearest Neighbours, are studied to analyse the reliability of the performance of our wavelet features based approach. Further, we investigate the issue of an appropriate duration of an audio clip to be analyzed for this automatic genre determination. Our experimental results show that features derived from the wavelet transform of the audio signal can very well separate the six video genres studied. It is also found that there is no significant difference in performance with varying audio clip durations across the classifiers.
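To give a flavour of the wavelet approach described in this abstract, here is a minimal sketch, assuming a simple Haar wavelet (the paper’s exact wavelet family and feature set are not given here, so the function names and details below are my own): the audio clip is repeatedly split into averages and differences, and the mean energy of each detail subband becomes one feature.

```python
def haar_step(signal):
    """One level of the Haar wavelet transform: pairwise averages (the
    coarser approximation) and pairwise differences (the detail band)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_subband_energies(signal, levels=3):
    """Decompose a signal over several levels and return the mean energy
    of each detail subband, plus the final approximation's energy, as a
    small feature vector for genre classification."""
    energies = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(sum(d * d for d in detail) / len(detail))
    energies.append(sum(a * a for a in current) / len(current))
    return energies
```

A feature vector like this could then be handed to any of the three classifiers the paper compares (decision trees, SVMs, k-nearest neighbours).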
Dorado A, Calic J, and Izquierdo E 2004 A rule-based video annotation system, IEEE Transactions on Circuits and Systems for Video Technology 14 (5): 622-633.
Guironnet M, Pellerin D, and Rombaut M 2006 Camera motion classification based on transferable belief model, European Signal Processing Conference, 4-8 September 2006, Florence, Italy.
Hah E-J, Schmutz P, Tuch AN, Agotai D, Wiedmer M, and Opwis K 2008 Cinematographic techniques in architectural animations and their effects on viewers’ judgment, International Journal of Design 2 (3): http://www.ijdesign.org/ojs/index.php/IJDesign/article/view/479/219.
Computer-generated animations have become a commonly employed medium to communicate architectural designs and projects. Because designers of animations are not constrained by real-world conditions and do not share the rich history of film, they do not readily benefit from the body of cinematographic techniques that filmmakers can draw upon. Specialists argue that this results in unappealing, lackluster animations that could be vastly improved by the application of filmmakers’ craft knowledge. The aim of this study was to identify which aspects of film craft show the most promise by systematically examining the use of cinematographic techniques in animations and their effects on viewers’ evaluations. Our analysis of award-winning architectural animations established average shot length as a reliable and valid predictor for determining participants’ judgments of salience, vividness, and diversity. A shorter average shot length resulted in more favorable ratings, while longer average shot lengths led to the opposite outcome. We consider these findings from a broader filmic perspective and discuss them in light of their usefulness for designers and the field.
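Average shot length (ASL), the predictor highlighted in this abstract, is straightforward to compute once the cut points are known. A minimal sketch (the function name and input convention are my own):

```python
def average_shot_length(cut_times_s, duration_s):
    """Average shot length (ASL) in seconds: total running time divided
    by the number of shots. cut_times_s lists the cut points in seconds;
    n cuts partition the animation into n + 1 shots."""
    n_shots = len(cut_times_s) + 1
    return duration_s / n_shots
```

For example, a 60-second animation with cuts at 10 s, 25 s, and 40 s has four shots and an ASL of 15 seconds.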
Ianeva TI, de Vries AP, and Röhrig H 2003 Detecting cartoons: a case study in automatic video-genre classification, IEEE International Conference on Multimedia and Expo, 6-9 July 2003, Baltimore, MD.
This paper presents a new approach for classifying individual video frames as being a ‘cartoon’ or a ‘photographic image’. The task arose from experiments performed at the TREC-2002 video retrieval benchmark: ‘cartoons’ are unexpectedly returned at high ranks even if the query gave only ‘photographic’ image examples. Distinguishing between the two genres has proved difficult because of their large intra-class variation. In addition to image metrics used in prior cartoon-classification work, we introduce novel metrics like ones based on the pattern spectrum of parabolic size distributions derived from parabolic granulometries and the complexity of the image signal approximated by its compression ratio. We evaluate the effectiveness of the proposed feature set for classification (using Support Vector Machines) on a large set of keyframes from the TREC-2002 video track collection and a set of web images. The paper reports the identification error rates against the number of images used as training set. The system is compared with one that classifies Web images as photographs or graphics and its superior performance is evident.
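The compression-ratio complexity metric mentioned in this abstract can be illustrated with the standard library alone: cartoon frames, with their large flat colour regions, compress far better than texture-rich photographic frames. This is a sketch of the idea, not the authors’ implementation:

```python
import zlib

def compression_ratio(pixels):
    """Approximate image complexity as the ratio of compressed size to
    raw size for a sequence of 0-255 pixel values. Flat, cartoon-like
    data yields a low ratio; photographic texture yields a higher one."""
    raw = bytes(bytearray(pixels))
    return len(zlib.compress(raw)) / len(raw)
```

Comparing a flat frame against a noisy one shows the separation the feature relies on: the flat frame’s ratio is a tiny fraction of the noisy frame’s.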
Moncrieff S, Venkatesh S, and Dorai C 2003 Horror film genre typing and scene labelling via audio analysis, International Conference on Multimedia and Expo, 6-9 July 2003, Baltimore, MD. [NB: I haven’t been able to find a link to this paper, but it really sounds interesting].
We examine localised sound energy patterns, or events, that we associate with high level affect experienced with films. The study of sound energy events in conjunction with their intended affect enable the analysis of film at a higher conceptual level, such as genre. The various affect/emotional responses we investigate in this paper are brought about by well established patterns of sound energy dynamics employed in audio tracks of horror films. This allows the examination of the thematic content of the films in relation to horror elements. We analyse the frequency of sound energy and affect events at a film level as well as at a scene level, and propose measures indicative of the film genre and scene content. Using 4 horror, and 2 non-horror movies as experimental data we establish a correlation between the sound energy event types and horrific thematic content within film, thus enabling an automated mechanism for genre typing and scene content labelling in film.
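The localised sound energy events this abstract describes can be illustrated with a crude short-time energy detector. This is a simplified stand-in for the paper’s method; the frame size and threshold factor below are my own assumptions:

```python
def short_time_energy(samples, frame_len=256):
    """Mean energy of each non-overlapping frame of an audio signal."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def energy_events(energies, factor=3.0):
    """Flag frames whose energy exceeds `factor` times the overall mean,
    a crude proxy for the sudden loud events (stings, screams) that
    horror soundtracks use to produce affect."""
    mean = sum(energies) / len(energies)
    return [i for i, e in enumerate(energies) if e > factor * mean]
```

A quiet signal with one loud burst, for instance, yields a single flagged frame at the burst’s position; the frequency of such events at film and scene level is what the paper correlates with genre.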
Montagnuolo M and Messina A 2007 Multimedia knowledge representation for automatic annotation of broadcast TV archives, Journal of Digital Information Management 5 (2): 67-74.
Multimedia content classification and retrieval are indispensable tools in the current convergence of audiovisual entertainment and information media. Thanks to the development of broadband networks, every consumer will have digital video programmes available on-line as well as through the traditional distribution channels. In this scenario, since the early ‘90s, the most important TV broadcasters in Europe have started projects whose aim was to achieve preservation, restoration and automatic documentation of their audiovisual archives. In particular, the association of low-level multimedia features to knowledge and semantics for the purpose of automatic classification of multimedia archives is currently the target of many researchers in both academic and IT industrial communities. This paper describes our research direction, which is focusing on three points: (a) We first introduce a new taxonomy for classification of broadcast digital archives based on a novel theoretical approach. The advantage of this taxonomy is that it can provide an unambiguous representation of multimedia informative content from the relevant points of view to the broadcasters community. (b) We secondly present a multilayer multimedia database model to represent both structure and content of multimedia objects. (c) We further propose a framework architecture for building a Multimedia Fuzzy Annotation System (MFAS), and a description of our experimental plan.