Cinemetrics is the statistical analysis of film style (Salt 1974), and has the potential to make a significant contribution to film studies in identifying trends in film style (shot length distributions, shot scales) that will allow scholars to explore questions of individual style, genre, studio style, national differences, and changes in style over time. However, the potential of cinemetrics is hamstrung by the poor quality of the statistics practised by film scholars. For example, in a discussion of Salt’s (2006) survey of shot length distributions, Buckland (2008) recently confused the coefficient of determination (R² as a measure of goodness-of-fit of a regression line) with the correlation coefficient (r) – although the two are intimately related. Similarly, O’Brien (2005: 88-93) has argued that the introduction of sound technologies in Hollywood and France in the late 1920s led to an increase in average (mean) shot lengths (ASL) but does not employ any tests (e.g. t-test, one-way ANOVA, chi-square, or their nonparametric equivalents) to determine if changes in ASL are significant, does not provide confidence intervals for estimates of ASL in a particular country or time period, and does not consider the use of the median as a measure of central tendency or data transformations for skewed shot length distributions. Here I discuss a particular mis-application of statistics in the analysis of film style: the so-called Heftberger Correlation between cutting rate and type of motion represented in Man with a Movie Camera (Dziga Vertov, 1929).
The Heftberger Correlation
The herculean effort of a meticulous statistical analysis of Vertov’s Man with a Movie Camera (MWMC) offers the potential for a rich and detailed understanding of this complex film’s intricate style, and has been undertaken by Yuri Tsivian, Adelheid Heftberger, Barbara Wurm, and Gunars Civjans. As a part of this project, data has been produced that covers the distribution of shot lengths for each reel and for the film overall, for the use of point-of-view shots, and for the relationship between shot length and the type of motion represented by the film. It is as a statistic of this last element of film style that the Heftberger Correlation (HC) has been proposed as a measure (Cinemetrics 2008).
The researchers hypothesised that the cutting rate would increase with the intensity of movement within a shot, which was defined as belonging to one of seven categories: black frames (BF), fast motion (camera) (FastC), fast motion (naturally) (FastN), freeze-frame (FF), no motion (NM), normal motion (naturally) (NormalN), and slow motion (camera) (SMC). Once the dataset employed was reduced to exclude the category BF, it was claimed that there is a correlation between cutting speed and intensity of motion. For MWMC, the value for HC including NM is 0.2, and excluding NM it is 0.4. A further step was to remove the category FF, so that only data for shots with movement were included to give the Particular Heftberger Correlation (PHC), and it was claimed that this produced a stronger correlation, but no figure was supplied. The conclusions arrived at by the researchers are that (1) the HC exists; (2) the HC for MWMC is weak and nonlinear; and (3) the PHC in MWMC is stronger than the HC and is linear.
It is far from clear what statistical processes have been used in the calculation of the HC and the PHC, and I have been unable to reconstruct the process by which the above quoted values for HC were derived. The researchers themselves acknowledge that the processes involved in producing the plots of shot length and intensity of movement in Figure 1 are not ‘mathematically sound,’ and it is precisely these plots that are employed as justification that the HC exists. It does not appear to have occurred to anyone involved that the lack of mathematical ‘soundness’ would present a problem in employing a statistical analysis.
Figure 1 The Heftberger Correlation in Man with a Movie Camera (1929) (Source: http://www.cinemetrics.lv/movie.php?movie_ID=2311, accessed 9 April 2009)
What is clear is that correlation is not an appropriate statistical method to be employed in this analysis. Correlation is a method of analysing if pairs of variables are related and the strength of that relationship. The pairing of the variables is important: each point on the graph represents a value on the x-axis and a value on the y-axis. For example, if we measure the height and weight of ten people, we will have ten pairs of data, with each pair consisting of a measure of height and a measure of weight – it is the relationship between these measures that we call a correlation. The Heftberger Correlation does not exist simply because it is not possible to calculate a correlation for pairs of data when the number of categories of motion intensity is seven and the number of shots in the film is 1729 – there are no pairs of data to correlate. The data also does not appear to be ordinal – although order exists for some categories (FastC is quicker than SMC) it does not exist for others (BF) and the distinction between some categories is not ordinal (FastC and FastN). The data labels used in Figure 1 must be considered nominal and are not tractable. The decision to proceed despite the lack of mathematical ‘soundness’ is compounded by a lack of understanding of the mathematics of correlation.
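The point about pairing and nominal labels can be illustrated with a minimal sketch. The function `pearson_r`, the shot values, and the two numeric codings of the motion categories below are all assumptions made up for illustration; they are not the data or the procedure behind the HC. The sketch shows that a correlation needs one y-value for every x-value, and that when the x-values are arbitrary codes for nominal labels the resulting coefficient depends on the coding rather than on the film.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for paired observations (xs[i] goes with ys[i])."""
    if len(xs) != len(ys):
        raise ValueError("correlation needs paired data: one y for every x")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical shots: (motion category, shot length in seconds).
shots = [("FastC", 0.4), ("FastN", 0.9), ("NM", 2.2),
         ("NormalN", 2.8), ("FastC", 0.5), ("NormalN", 3.0)]
lengths = [length for _, length in shots]

# Two equally arbitrary numeric codings of the same nominal labels.
coding_a = {"FastC": 0, "FastN": 1, "NM": 2, "NormalN": 3}
coding_b = {"NM": 0, "FastC": 1, "NormalN": 2, "FastN": 3}

r_ordered = pearson_r([coding_a[c] for c, _ in shots], lengths)
r_shuffled = pearson_r([coding_b[c] for c, _ in shots], lengths)

print(f"coding A: r = {r_ordered:.2f}")
print(f"coding B: r = {r_shuffled:.2f}")
```

The same six shots give a coefficient close to 1 under one coding and close to 0 under the other, which is why a coefficient computed from nominal labels tells us nothing about the film.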
The appropriate statistical approach to be used in analysing the relationship between shot length and motion intensity is to look at the variance of shot lengths in each category. In this case the data does not meet the requirements for a parametric one-way analysis of variance (ANOVA), and a logarithmic transformation of the data is no help either. The best approach, therefore, is to employ a nonparametric analysis of variance of ranks using a Kruskal-Wallis test and Mann-Whitney U as a post-hoc test (α = 0.05).
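As a rough sketch of what the Kruskal-Wallis test involves, the following computes the tie-corrected H statistic from scratch. The function names and the shot lengths are assumptions for illustration only; the values are not taken from MWMC, and this is not necessarily the software procedure used to obtain the results reported here. The statistic is compared with 7.815, the critical value of the chi-square distribution with three degrees of freedom at α = 0.05.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return the tie-corrected H statistic and its degrees of freedom."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Correction for ties: each set of t tied values contributes t^3 - t.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1

# Hypothetical shot lengths (seconds) for the four categories analysed.
fast_c = [0.2, 0.3, 0.4, 0.4, 0.6]
fast_n = [0.5, 0.8, 0.9, 1.1, 1.4]
no_motion = [1.2, 1.9, 2.2, 2.6, 3.5]
normal_n = [1.6, 2.4, 2.8, 3.3, 4.1]

h, df = kruskal_wallis_h([fast_c, fast_n, no_motion, normal_n])
CRITICAL_CHI2_DF3 = 7.815  # chi-square critical value, df = 3, alpha = 0.05
print(f"H = {h:.2f}, df = {df}, significant: {h > CRITICAL_CHI2_DF3}")
```

In practice a statistics package would normally be used; SciPy's `scipy.stats.kruskal`, for example, performs the same test and also returns a P-value.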
Shot length data was sorted by category of motion intensity, and the descriptive statistics are presented in Table 1.
Table 1 Shot length data for motion intensity in Man with a Movie Camera (1929)
In analysing this data I include only four of the motion categories: FastC, FastN, NM, and NormalN. The distributions of shot lengths in these categories are represented in Figure 2.
Figure 2 Distribution of shot lengths in FastC, FastN, NM, and NormalN in Man with a Movie Camera (1929)
BF is excluded as the data includes several shot lengths of 0.0 seconds (due to a technical error in data collection), while the numbers of shots in FF (13) and SMC (32) are too small to be reliable. The results show that there is a statistically significant relationship between shot length and intensity of motion (Hc = 289.7, P < 0.0001), and the post-hoc tests show that every category is significantly different from every other (Table 2).
Table 2 Pairwise comparisons of shot length/motion intensity data for Man with a Movie Camera (1929) (Mann-Whitney U, P-values only; Bonferroni-corrected α = 0.0083)
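The corrected significance level quoted for Table 2 follows from the number of pairwise comparisons among the four categories. As a brief worked sketch (the variable names are mine):

```python
from itertools import combinations

categories = ["FastC", "FastN", "NM", "NormalN"]
alpha = 0.05

# Every unordered pair of categories gets one Mann-Whitney U test.
pairs = list(combinations(categories, 2))

# Bonferroni: divide the overall alpha by the number of comparisons.
corrected_alpha = alpha / len(pairs)

for first, second in pairs:
    print(f"{first} vs {second}")
print(f"{len(pairs)} comparisons, corrected alpha = {corrected_alpha:.4f}")
```

Four categories give six comparisons, so each pairwise P-value is judged against 0.05 / 6 ≈ 0.0083 rather than 0.05.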
These results show that Tsivian, et al. were correct in their hypothesis that there is a relationship between shot length and motion intensity in Man with a Movie Camera; in fact, the results presented here indicate that this relationship is stronger than that identified by the HC. Focussing on the median shot length (see Table 1), we can see that FastC (0.4 seconds) has a quicker cutting rate than FastN (0.9s), while NormalN has a value of 2.8s. Although they were not included in the above test, median shot length increases as motion slows in SMC (3.7s) and FF (4.0s), and this confirms the overall relationship between shot length and motion intensity. Only NM does not fit this overall pattern, with a median shot length of 2.2s. Data for BF is unreliable at the low end where shot lengths equal 0.0s.
Ben Goldacre, the GP and journalist who publishes the Bad Science blog (see Goldacre 2008), has made a distinction between scientific medicine and alternative therapies that employ scientific terms inaccurately to sound ‘sciency.’ The Heftberger Correlation sounds good, it sounds scientific, it sounds statistical; but it is not based on a sound understanding of statistical methodology. Following Goldacre, I think this use of statistical terminology should be labelled ‘sciency’ rather than science, and film scholars should be discouraged from declaring the existence and relevance of such ‘statistics’. It is incumbent upon film scholars to understand the statistical methods that they wish to employ in cinemetrics and to respect the use of statistical terminology. Cinemetrics can make a positive contribution to film studies, but before it can be good film studies it must first be good statistics.
Buckland, W. (2008) What does the statistical style analysis of film involve?, Literary and Linguistic Computing 23 (2): 219-30.
Cinemetrics (2008) http://www.cinemetrics.lv/movie.php?movie_ID=2311, accessed 9 April 2009.
Goldacre, B. (2008) Bad Science. London: Fourth Estate.
O’Brien, C. (2005) Cinema’s Conversion to Sound: Technology and Film Style in France and the U.S. Bloomington: Indiana University Press.
Salt, B. (1974) Statistical style analysis of motion pictures, Film Quarterly 28 (1): 13-22.
Salt, B. (2006) Moving into Pictures: More on Film History, Style, and Analysis. London: Starwood.