Does the Heftberger Correlation exist?

Cinemetrics is the statistical analysis of film style (Salt 1974), and has the potential to make a significant contribution to film studies by identifying trends in film style (shot length distributions, shot scales) that will allow scholars to explore questions of individual style, genre, studio style, national differences, and changes in style over time. However, the potential of cinemetrics is hamstrung by the poor quality of the statistics practised by film scholars. For example, in a discussion of Salt’s (2006) survey of shot length distributions, Buckland (2008) confused the coefficient of determination (R², a measure of the goodness-of-fit of a regression line) with the correlation coefficient (r), although the two are intimately related. Similarly, O’Brien (2005: 88-93) has argued that the introduction of sound technologies in Hollywood and France in the late 1920s led to an increase in average (mean) shot lengths (ASL), but does not employ any tests (e.g. t-test, one-way ANOVA, chi-square, or their nonparametric equivalents) to determine if changes in ASL are significant, does not provide confidence intervals for estimates of ASL in a particular country or time period, and does not consider the use of the median as a measure of central tendency or data transformations for skewed shot length distributions. Here I discuss a particular misapplication of statistics in the analysis of film style: the so-called Heftberger Correlation between cutting rate and type of motion represented in Man with a Movie Camera (Dziga Vertov, 1929).
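For readers who want the distinction between the two quantities spelled out, it can be stated compactly. What follows is the standard textbook identity for a least-squares line with one predictor and an intercept; the symbols x_i, y_i, and ŷ_i (the fitted value) are introduced here purely for illustration and do not come from Buckland or Salt.

```latex
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}},
\qquad
R^{2} = 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^{2}}{\sum_{i=1}^{n}(y_i-\bar{y})^{2}} = r^{2}.
```

The correlation coefficient carries a sign and ranges from -1 to 1, whereas R² ranges from 0 to 1: values of r = 0.5 and r = -0.5 both give R² = 0.25, so reporting one as if it were the other misstates both the direction and the apparent strength of a relationship.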

The Heftberger Correlation

A meticulous statistical analysis of Vertov’s Man with a Movie Camera (MWMC) is a herculean effort that offers the potential for a rich and detailed understanding of this complex film’s intricate style, and it has been undertaken by Yuri Tsivian, Adelheid Heftberger, Barbara Wurm, and Gunars Civjans. As a part of this project, data has been produced that covers the distribution of shot lengths for each reel and for the film overall, the use of point-of-view shots, and the relationship between shot length and the type of motion represented by the film. The Heftberger Correlation (HC) has been proposed as a measure of this last element of film style (Cinemetrics 2008).

The researchers hypothesised that the cutting rate would increase with the intensity of movement within a shot, which was classified into one of seven categories: black frames (BF), fast motion (camera) (FastC), fast motion (naturally) (FastN), freeze-frame (FF), no motion (NM), normal motion (naturally) (NormalN), and slow motion (camera) (SMC). Once the dataset was reduced to exclude the category BF, it was claimed that there is a correlation between cutting speed and intensity of motion. For MWMC, the value for HC including NM is 0.2, and excluding NM it is 0.4. A further step was to remove the category FF, so that only data for shots with movement were included, to give the Particular Heftberger Correlation (PHC); it was claimed that this produced a stronger correlation, but no figure was supplied. The conclusions arrived at by the researchers are that (1) the HC exists; (2) the HC for MWMC is weak and nonlinear; and (3) the PHC for MWMC is stronger than the HC and is linear.

It is far from clear what statistical processes have been used in the calculation of the HC and the PHC, and I have been unable to reconstruct the process by which the above quoted values for HC were derived. The researchers themselves acknowledge that the processes involved in producing the plots of shot length and intensity of movement in Figure 1 are not ‘mathematically sound,’ and it is precisely these plots that are employed as justification that the HC exists. It does not appear to have occurred to anyone involved that the lack of mathematical ‘soundness’ would present a problem in employing a statistical analysis.

[Figure 1]

Figure 1 The Heftberger Correlation in Man with a Movie Camera (1929) (Source: http://www.cinemetrics.lv/movie.php?movie_ID=2311, accessed 9 April 2009)

What is clear is that correlation is not an appropriate statistical method for this analysis. Correlation is a method of analysing whether pairs of variables are related and how strong that relationship is. The pairing of the variables is important: each point on the graph represents a value on the x-axis and a value on the y-axis. For example, if we measure the height and weight of ten people, we will have ten pairs of data, with each pair consisting of a measure of height and a measure of weight – it is the relationship between these measures that we call a correlation. The Heftberger Correlation does not exist simply because it is not possible to calculate a correlation for pairs of data when the number of categories of motion intensity is seven and the number of shots in the film is 1729 – there are no pairs of data to correlate. Nor do the data appear to be ordinal: although order exists for some categories (FastC is quicker than SMC), it does not exist for others (BF), and the distinction between some categories is not ordinal (FastC and FastN). The data labels used in Figure 1 must be considered nominal and are not tractable. The decision to proceed despite the lack of mathematical ‘soundness’ is compounded by a lack of understanding of the mathematics of correlation.
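The pairing requirement can be made concrete with a minimal sketch in Python. The heights and weights below are invented for illustration, and `pearson_r` is a hypothetical helper written from the standard textbook formula; it is not anything used by the Cinemetrics project.

```python
import math


def pearson_r(x, y):
    """Pearson's r for paired observations (x[i], y[i])."""
    if len(x) != len(y):
        raise ValueError("correlation needs one y value for every x value")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: ten people, so ten (height, weight) pairs.
heights_cm = [152, 158, 163, 165, 170, 173, 178, 180, 185, 191]
weights_kg = [51, 56, 60, 59, 68, 70, 77, 75, 84, 90]
r = pearson_r(heights_cm, weights_kg)
print(f"r for ten height/weight pairs: {r:.3f}")

# Following the argument above: seven category labels and 1729 shot lengths
# do not line up as (x, y) pairs, so the helper refuses to compute anything.
# The zeros are placeholders, NOT the film's actual shot lengths.
category_codes = list(range(7))
shot_lengths = [0.0] * 1729
try:
    pearson_r(category_codes, shot_lengths)
except ValueError as err:
    print(f"cannot correlate: {err}")
```

The length check at the top of the function is the whole point of the sketch: a correlation coefficient is only defined once every observation of one variable has a partner in the other.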

The appropriate statistical approach to be used in analysing the relationship between shot length and motion intensity is to look at the variance of shot lengths in each category. In this case the data does not meet the requirements for a parametric one-way analysis of variance (ANOVA), and a logarithmic transformation of the data is no help either. The best approach, therefore, is to employ a nonparametric analysis of variance of ranks using a Kruskal-Wallis test and Mann-Whitney U as a post-hoc test (α = 0.05).
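As a rough illustration of the test named above, the following sketch computes a Kruskal-Wallis statistic in plain Python. Everything in it is an assumption made for the example: the function names are hypothetical, the shot lengths are invented rather than taken from the Cinemetrics data, the statistic is tie-corrected (which is presumably what the subscript in Hc denotes), and the P-value uses a closed form for the chi-square distribution that holds only for three degrees of freedom, i.e. four groups.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Invented shot lengths in seconds, one list per motion category.
fast_c = [0.3, 0.4, 0.4, 0.5, 0.6]
fast_n = [0.7, 0.9, 0.9, 1.1, 1.4]
no_motion = [1.5, 2.0, 2.2, 2.6, 3.1]
normal_n = [1.9, 2.5, 2.8, 3.3, 4.0]

h = kruskal_wallis_h([fast_c, fast_n, no_motion, normal_n])
p = chi2_sf_df3(h)  # four groups, so k - 1 = 3 degrees of freedom
print(f"H = {h:.2f}, P = {p:.4f}")
```

Because the test works on ranks rather than raw values, the heavy right skew typical of shot length data does not violate its assumptions in the way it does for a parametric ANOVA.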

Shot length data was sorted by category of motion intensity, and the descriptive statistics are presented in Table 1.

Table 1 Shot length data for motion intensity in Man with a Movie Camera (1929)

[Table 1]

In analysing this data I include only four of the motion categories: FastC, FastN, NM, and NormalN. The distributions of shot lengths in these categories are represented in Figure 2.

[Figure 2]

Figure 2 Distribution of shot lengths in FastC, FastN, NM, and NormalN in Man with a Movie Camera (1929)

BF is excluded as the data includes several shot lengths of 0.0 seconds (due to a technical error in data collection), while the numbers of shots in FF (13) and SMC (32) are too small to be reliable. The results show that there is a statistically significant relationship between shot length and intensity of motion (Hc = 289.7, P < 0.0001), and the post-hoc tests show that each category is significantly different from every other (Table 2).

Table 2 Pairwise comparisons of shot length/motion intensity data for Man with a Movie Camera (1929) (Mann-Whitney U, P-values only; Bonferroni-corrected α = 0.0083)

[Table 2]
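The corrected threshold in the caption to Table 2 follows from the number of comparisons: four categories give six pairs, and 0.05 / 6 ≈ 0.0083. A minimal sketch of the post-hoc procedure is given below, under several assumptions: the data are invented, `mann_whitney_u` is a hypothetical helper, and the P-value comes from a normal approximation with a tie correction and no continuity correction, whereas a statistics package may use an exact method for small samples.

```python
import math
from collections import Counter
from itertools import combinations


def mann_whitney_u(a, b):
    """U statistic for sample a, with a two-sided normal-approximation P-value."""
    n1, n2 = len(a), len(b)
    # U counts the pairs in which the value from a exceeds the value from b;
    # tied pairs count as one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(list(a) + list(b)).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical")
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return u, p


# Invented shot lengths in seconds; NOT the measurements summarised in Table 1.
samples = {
    "FastC": [0.2, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.8],
    "FastN": [0.6, 0.7, 0.9, 0.9, 1.0, 1.2, 1.4, 1.9],
    "NM": [1.1, 1.6, 2.0, 2.2, 2.3, 2.7, 3.1, 3.8],
    "NormalN": [1.5, 2.1, 2.6, 2.8, 2.9, 3.4, 4.0, 5.2],
}

pairs = list(combinations(samples, 2))  # every unordered pair of categories
alpha = 0.05 / len(pairs)               # Bonferroni: divide by the number of tests
print(f"{len(pairs)} comparisons, corrected alpha = {alpha:.4f}")

results = {}
for name_a, name_b in pairs:
    u, p = mann_whitney_u(samples[name_a], samples[name_b])
    results[(name_a, name_b)] = (u, p)
    verdict = "significant" if p < alpha else "not significant"
    print(f"{name_a} vs {name_b}: U = {u:.1f}, P = {p:.4f} ({verdict})")
```

Dividing α by the number of comparisons keeps the chance of at least one false positive across all six tests at or below 0.05, at the cost of making each individual test more conservative.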

These results show that Tsivian, et al. were correct in their hypothesis that there is a relationship between shot length and motion intensity in Man with a Movie Camera; in fact, the results presented here indicate that this relationship is stronger than that identified by the HC. Focussing on the median shot length (see Table 1), we can see that FastC (0.4s) has a quicker cutting rate than FastN (0.9s), while NormalN has a value of 2.8s. Although they were not included in the above test, median shot length increases as motion slows in SMC (3.7s) and FF (4.0s), and this confirms the overall relationship between shot length and motion intensity. Only NM does not fit this overall pattern, with a median shot length of 2.2s. Data for BF is unreliable at the low end where shot lengths equal 0.0s.

Conclusion

Ben Goldacre, the GP and journalist who publishes the Bad Science blog (see Goldacre 2008), has made a distinction between scientific medicine and alternative therapies that employ scientific terms inaccurately to sound ‘sciency.’ The Heftberger Correlation sounds good, it sounds scientific, it sounds statistical; but it is not based on a sound understanding of statistical methodology. Following Goldacre, I think this use of statistical terminology should be labelled ‘sciency’ rather than science, and film scholars should be discouraged from declaring the existence and relevance of such ‘statistics’. It is incumbent upon film scholars to understand the statistical methods that they wish to employ in cinemetrics and to respect the use of statistical terminology. Cinemetrics can make a positive contribution to film studies, but before it can be good film studies it must first be good statistics.

References

Buckland, W. (2008) What does the statistical style analysis of film involve?, Literary and Linguistic Computing 23 (2): 219-30.

Cinemetrics (2008) http://www.cinemetrics.lv/movie.php?movie_ID=2311, accessed 9 April 2009.

Goldacre, B. (2008) Bad Science. London: Fourth Estate.

O’Brien, C. (2005) Cinema’s Conversion to Sound: Technology and Film Style in France and the U.S. Bloomington: Indiana University Press.

Salt, B. (1974) Statistical style analysis of motion pictures, Film Quarterly 28 (1): 13-22.

Salt, B. (2006) Moving into Pictures: More on Film History, Style, and Analysis. London: Starwood.


About Nick Redfern

I graduated from the University of Kent in 1998 with a degree in Film Studies and History, and was awarded an MA by the same institution in 2002. I received my Ph.D. from Manchester Metropolitan University in 2006 for a thesis titled 'Regionalism and the Cinema in the United Kingdom, 1992 to 2002.' I have taught at Manchester Metropolitan University and the University of Central Lancashire. My research interests include regional film cultures and industries in the United Kingdom; cognition and communication in the cinema; anxiety in contemporary Hollywood cinema; cinemetrics; and film style and film form. My work has been published in Entertext, the International Journal of Regional and Local Studies, the New Review of Film and Television Studies, Cyfrwng: Media Wales Journal, and the Journal of British Cinema and Television.

Posted on April 9, 2009, in Cinemetrics, Film Studies, Film Style, Russian Cinema, Silent cinema, Soviet Cinema.

  1. Adelheid Heftberger

    Hello!
    Just by coincidence I came to your blog here and let me say thank you very much!! I will definitely read it thoroughly and I am sure your input will be extremely helpful. I completely agree with your final conclusion. We also somehow stopped in the middle of the discussion, since Yuri is busy, I am busy with other things in our project etc., but here is finally a good reason to pick up loose ends. Since I am very interested in avoiding exactly what you criticise (mixing up half-understood practices), and my PhD will be about how humanities and statistics can work together in a useful way, I would love to talk a little more. As I said, I have to study your article first, but if you are interested let’s stay in touch. All the best, Heidi

  2. Nick Redfern

    If anyone is interested in tempo and the statistical analysis of film style, then a good place to start is the work of Brett Adams, Chitra Dorai, and Svetha Venkatesh, which can be accessed at http://www.computing.edu.au/~adamsb/brett/adams_etal_PRCM2000.pdf.

  3. Yuri Tsivian

    Point well taken. Unlike meteorologists, students of film rarely rely on statistics and have only a vague idea of its subtleties and pitfalls (Barry Salt is a possible exception). Cinemetrics is statistically useful only insofar as it collects and stores data and displays them numerically and visually. The way we (I, to be exact) try to get a sense of the results is, at best, dabbling. There is only one thing that I can say in defense of this dabbling, and it is that what we do is not lost when trained statisticians like the author of this blog or experts in data visualization like Lev Manovich look at what we do. We formulate our questions in awkward terms, the terms may be wrong, but the questions are right. Then, a wonderful thing happens. Someone comes and says, yes, the thing you were looking for exists, Heftberger’s work was not wasted, but here is the right way of showing it. This is all we need: we provide data, formulate the problem and wait for someone like you to provide the needed expertise. That’s the way of all knowledge, it just moves faster and in more unpredictable ways in the age of the internet. Thanks, we’ll add a link to this blog to Cinemetrics.

  4. Yuri Tsivian

    Sorry, I failed to add that Nick Redfern who wrote this also wrote another essay for/about Cinemetrics which we’ll publish on our site soon (has been waiting for some time…)
