Cinemetrics is the statistical study of film style, and has primarily been applied in the analysis of shot length distributions. To date, the key statistic used to describe the distribution of shot lengths in a film has been the mean shot length. However, shot length distributions are typically positively skewed with a number of outlying data points and, as a result, the mean shot length is unlikely to be an informative statistic of film style. While log transformation of shot lengths is one way of dealing with such data, it is not universally applicable, as shot length distributions are often neither normally nor lognormally distributed (Salt 2006: 389-396). Censoring data is undesirable as outlying shot lengths may be a significant element of a film’s style: removing the opening shot of Touch of Evil (1958) or the traffic jam of Weekend (1967) from our analysis would take away arguably the most distinctive (and certainly the most famous) elements of these films’ style. The median shot length is a more reliable statistic when discussing shot length distributions, as it is unaffected by those factors (skew, outliers) that render the mean suspect. In this post, I demonstrate the difference between the mean and the median shot length.
Methods and statistical analyses
Shot length data for Terence Davies’ trilogy of films Children (1976), Madonna and Child (1980), and Death and Transfiguration (1983) was accessed from the Cinemetrics database (Vasconcelos 2009a, 2009b, 2009c). Shot length data was summarised with descriptive statistics, and the distributions were represented as box plots. The differences between these distributions were analysed using the Kruskal-Wallis test and a Dunn post hoc test.
A statistical summary of Davies’ trilogy of films is presented in Table 1.
Table 1 Statistical summary of three films by Terence Davies
Basing an analysis on the mean shot lengths for these films, we would conclude (1) that Children (14.5s) and Death and Transfiguration (14.9s) have similar distributions, and (2) that these films are both different from Madonna and Child (19.6s). However, it is clear that the distributions of shot lengths in these films are positively skewed; and, as the maximum shot length of each film is substantially greater than the upper quartile, there are a number of outlying data points in each distribution. Consequently, the mean shot length is unlikely to be a reliable statistic of film style: the asymmetry of the distribution and the influence of outliers shift the mean towards the higher end of the distribution. As the median is a positional rather than a computational average, it is unaffected by the presence of extreme shot lengths, and as such is more reliable when dealing with the skewed distributions we typically find for shot lengths. As we are not required to censor the data by removing extreme shot lengths, the median also has the advantage of allowing us to use the complete data set for a film. Focussing on the median shot lengths, we can see that Children (9.4s) differs from both Madonna and Child (12.8s) and Death and Transfiguration (13.8s): although it has the greatest number of outlying data points and the longest individual shot, Children has a lower median, lower values for both the lower and upper quartiles, and a narrower interquartile range, and so is cut much more quickly than the other two films. The two other films have median shot lengths that are much more similar than their mean shot lengths.
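The contrast between the two averages can be illustrated with a minimal sketch. The shot lengths below are hypothetical values invented for illustration (they are not the Cinemetrics data for Davies’ films), the function name `summarise` is my own, and the "inclusive" quartile method is only one of several conventions, so quartiles computed by other software may differ slightly.

```python
from statistics import mean, median, quantiles

# Hypothetical shot lengths in seconds (NOT taken from the Cinemetrics data):
# nine ordinary shots and one very long take.
SHOTS = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5, 9.1, 12.3, 15.0, 95.0]


def summarise(shot_lengths):
    """Return descriptive statistics for a list of shot lengths."""
    # Quartile conventions differ between packages; "inclusive" is one choice.
    q1, _, q3 = quantiles(shot_lengths, n=4, method="inclusive")
    return {
        "n": len(shot_lengths),
        "mean": mean(shot_lengths),
        "median": median(shot_lengths),
        "lower_quartile": q1,
        "upper_quartile": q3,
        "iqr": q3 - q1,
        "max": max(shot_lengths),
    }


if __name__ == "__main__":
    # Compare the summary with and without the single long take.
    for label, data in (("all shots", SHOTS), ("long take removed", SHOTS[:-1])):
        stats = summarise(data)
        print(label, {key: round(value, 2) for key, value in stats.items()})
```

In this invented example the single long take pulls the mean (16.04s) above the upper quartile (11.5s), so the mean describes almost none of the shots. Removing that take drops the mean to about 7.27s but moves the median only from 7.15s to 6.8s.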
A nonparametric analysis of variance of the shot length distributions of these films (α = 0.05) shows that there is a significant overall difference (Hc = 6.9116, P = 0.0316, Kruskal-Wallis). Applying the Dunn test (Bonferroni corrected α = 0.0333) shows that Children differs significantly from both Death and Transfiguration (Z = 2.2353, P = 0.0127) and Madonna and Child (Z = 2.0186, P = 0.0218), while there is no significant difference between Death and Transfiguration and Madonna and Child (Z = 0.2702, P = 0.3935). These features of the style of these films can clearly be seen in Figure 1.
Figure 1 The distribution of shot lengths in three films by Terence Davies
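For readers who want to run this kind of comparison themselves, the following is a rough sketch of a Kruskal-Wallis test followed by Dunn pairwise comparisons, written with the Python standard library only. It is not the software used for the results above. The function names and the sample data are hypothetical, I assume that Hc denotes the tie-corrected H statistic, and the P values are one-tailed, which is consistent with the Z and P pairs quoted above. The closed-form P value for H is only valid for exactly three groups (two degrees of freedom), so the sketch refuses any other number of groups.

```python
from itertools import combinations
from math import exp, sqrt
from statistics import NormalDist


def rank_with_ties(values):
    """Return (average ranks, sizes of tie groups) for the pooled values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_dunn(groups):
    """groups: dict mapping a film title to its list of shot lengths."""
    names = list(groups)
    if len(names) != 3:
        raise ValueError("this sketch only handles exactly three groups")
    pooled = [x for name in names for x in groups[name]]
    n_total = len(pooled)
    ranks, tie_sizes = rank_with_ties(pooled)

    # Mean rank of each group, taken from the pooled ranking.
    size, mean_rank, start = {}, {}, 0
    for name in names:
        size[name] = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size[name]]) / size[name]
        start += size[name]

    tie_sum = sum(t ** 3 - t for t in tie_sizes)
    h = (12 / (n_total * (n_total + 1))
         * sum(size[g] * mean_rank[g] ** 2 for g in names)
         - 3 * (n_total + 1))
    h_corrected = h / (1 - tie_sum / (n_total ** 3 - n_total))
    # Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    p_overall = exp(-h_corrected / 2)

    # Dunn's z for each pair, using the tie-corrected variance of the ranks.
    variance = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    pairs = {}
    for a, b in combinations(names, 2):
        z = (mean_rank[a] - mean_rank[b]) / sqrt(
            variance * (1 / size[a] + 1 / size[b]))
        pairs[(a, b)] = (z, 1 - NormalDist().cdf(abs(z)))  # one-tailed P
    return {"H": h_corrected, "p": p_overall, "pairs": pairs}


if __name__ == "__main__":
    # Hypothetical shot lengths in seconds, invented for illustration.
    films = {
        "Film A": [3.2, 4.1, 5.0, 6.3, 7.7, 9.4, 40.2],
        "Film B": [6.5, 9.8, 11.2, 12.8, 14.1, 18.9, 60.0],
        "Film C": [7.0, 10.4, 12.1, 13.8, 15.5, 19.3, 35.6],
    }
    result = kruskal_dunn(films)
    print("H =", round(result["H"], 4), "P =", round(result["p"], 4))
    for pair, (z, p) in result["pairs"].items():
        print(pair, "Z =", round(z, 4), "P =", round(p, 4))
```

Each pairwise P value would then be compared against whatever corrected significance level the analyst has chosen. The sketch deliberately leaves that threshold out rather than guess at how the value used above was derived.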
The mean shot length is unreliable as a statistic of film style, and its use may lead researchers to draw erroneous conclusions about the style of a film or group of films. The median shot length is a superior statistic of the distribution of shot lengths in a film, as it is unaffected by the skewed nature of the data, and it should be used in cinemetric analyses in place of the mean. In the case of the Terence Davies films discussed here, using the mean shot length as a statistic of film style would lead us to the wrong conclusion that Children and Death and Transfiguration are similar to one another but different from Madonna and Child, when in fact it is Children that is significantly different to the other two films, which show no significant difference from each other.
Salt, B. (2006) Moving into Pictures: More on Film History, Style, and Analysis. London: Starword.
Vasconcelos, C. (2009a) Children (The Terence Davies Trilogy), http://www.cinemetrics.lv/movie.php?movie_ID=2869, accessed 10 April 2009.
Vasconcelos, C. (2009b) Madonna and Child (The Terence Davies Trilogy), http://www.cinemetrics.lv/movie.php?movie_ID=2871, accessed 10 April 2009.
Vasconcelos, C. (2009c) Death and Transfiguration (The Terence Davies Trilogy), http://www.cinemetrics.lv/movie.php?movie_ID=2872, accessed 10 April 2009.