Sampling Reliability in Cinemetrics
Numerous studies that have sought to analyse film style in terms of cinemetrics have employed a sampling strategy based on the first 1800 seconds (s) of a film (e.g. Buckland 2006, O’Brien 2005, Salt 1992). This strategy has been criticised for producing unreliable results (see Bordwell and Thompson 1985), and some who earlier employed this approach have since come to reject it (e.g. Salt 2006). This post reviews the use of such sampling in cinemetrics.
Bordwell and Thompson’s criticisms
Bordwell and Thompson’s criticisms of using the first 1800s of a film as a sample form part of a larger critique of Salt’s Film Style and Technology: History and Analysis (1992; originally published 1983), which they published in the Quarterly Review of Film Studies in 1985. Their rejection of Salt’s method rests on the fact that their own values for the mean shot lengths of some films differ from Salt’s, and on this basis they regard the approach as unreliable and likely to lead to mistaken conclusions about the history of film style. These differences are presented in Table 1.
Table 1 Average shot lengths from Salt and Bordwell and Thompson
There are numerous problems with Bordwell and Thompson’s criticism. First, they present measures of location without the context of sample sizes and/or measures of dispersion (variance, standard deviation, range, interquartile range; cf. Buckland 2006). Second, where mean shot lengths are presented, they are presented without confidence intervals or any other measure of sampling error. Third, the use of the mean shot length for skewed distributions with outlying data points is suspect, and it is not apparent that differences between sample and population means will reflect actual differences in the distributions: a handful of very long shots can have a dramatic impact on the mean, and if these shots are not evenly distributed throughout a film they will cause the mean to be incorrectly estimated. Comparing the sample and population median shot lengths, which are far less sensitive to such outliers, would largely avoid this problem, but this is not discussed. Fourth, no statistical tests are employed to determine whether the observed differences between sample and population means are actual differences or simply noise in the data: Salt’s figures are judged unreliable simply because they differ from Bordwell and Thompson’s. In their discussion of Salt’s results for Soviet cinema, Bordwell and Thompson state that Salt’s values are ‘significantly inaccurate’ (1985: 230). The term ‘significant’ has a specific meaning in statistics, relating to the probability of making a Type I error (a false positive: concluding that there is a statistically significant difference when, in fact, there is not). Bordwell and Thompson do not appear to be using ‘significant’ in this sense, even though they are discussing the difference between two averages. In what sense Salt’s figures are ‘significantly inaccurate’ is therefore not clear, and no tests (such as a t-test or a Mann-Whitney U test) are offered to support this argument.
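To illustrate the third point, the following sketch uses invented shot lengths (hypothetical values, not data from any film) to show how two very long shots can move the mean substantially while barely affecting the median.

```python
from statistics import mean, median

# Hypothetical shot lengths in seconds, invented for illustration only.
typical = [3, 4, 4, 5, 5, 6, 6, 7, 8, 9]
# The same shots plus two very long takes.
with_long_takes = typical + [90, 120]

for label, shots in [("typical", typical), ("with long takes", with_long_takes)]:
    # The mean is pulled towards the long takes; the median hardly moves.
    print(f"{label}: mean = {mean(shots):.2f}s, median = {median(shots):.2f}s")
```

This prints a mean of 5.70s and a median of 5.50s for the first list, and a mean of 22.25s against a median of 6.00s once the two long takes are added.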
Testing the 1800s sample
To test the usefulness of the 1800s approach, shot length data was collected from the Cinemetrics database for 10 films produced in Hollywood in 1930 and 1931. Shot length data for the whole film (the population data) was compared with data for the first 1800s (the sample) using four methods: confidence intervals for the mean shot length; confidence intervals for the median shot length; a Mann-Whitney U test; and a Kolmogorov-Smirnov test of the cumulative distribution functions. (Data for each sample was collected up to the shot boundary nearest to 1800s, whether that was in fact greater or less than 1800s.) In all cases, the level of significance (α) is 0.05. Summary data for these films is presented in Tables 2 and 3.
Table 2 Summary statistics for shot lengths in 5 Hollywood films, 1930
Table 3 Summary statistics for shot lengths in 5 Hollywood films, 1931
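The rule of cutting the sample at the shot boundary nearest to 1800s can be sketched as follows. This is a minimal illustration, assuming a film is represented as a list of shot lengths in seconds in screening order; the function name `first_seconds_sample` is hypothetical and is not part of the Cinemetrics software.

```python
from itertools import accumulate


def first_seconds_sample(shot_lengths, cutoff=1800.0):
    """Return the opening shots up to the shot boundary nearest to cutoff.

    The chosen boundary may fall before or after the cutoff, whichever is
    closer; on an exact tie, the earlier boundary is used.
    """
    # Running time (s) at the end of each shot.
    ends = list(accumulate(shot_lengths))
    # Index of the shot whose end lies closest to the cutoff.
    best = min(range(len(ends)), key=lambda i: abs(ends[i] - cutoff))
    return list(shot_lengths[: best + 1])


# Hypothetical film: shots end at 1000s, 1850s and 2150s, so the sample
# stops at 1850s (50s past the cutoff rather than 800s short of it).
print(first_seconds_sample([1000, 850, 300]))
```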
The uncertainty in a sample estimate can be quantified by calculating a confidence interval, which may be defined as ‘a range of values for a variable of interest constructed so that this range has a specified probability of including the true value of the variable’ (Gillam et al 2007: 51). A confidence interval gives an estimated range that is likely to contain an unknown population parameter; it contains all the hypothetical values of the parameter that would not be rejected by a significance test at the corresponding level. If samples of the same size are drawn repeatedly from the same population and a 95% confidence interval is calculated for each sample, then about 95% of these intervals should contain the population parameter. Thus, if a significance level (α) of 0.05 is chosen, the confidence level is 0.95: the procedure has a 0.95 probability of producing an interval that contains the true value of the parameter. (Note that a statistic calculated from a sample is an estimate of a population parameter.) Confidence intervals are often preferred to significance tests as a means of expressing sampling error.
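The repeated-sampling interpretation can be checked by simulation. The sketch below is an illustration under stated assumptions: it draws samples from a normal population with a known mean (not from real shot length data), uses the large-sample normal critical value (about 1.96) rather than a t critical value, and counts how often the interval contains the true mean. The proportion should come out close to 0.95.

```python
import random
from statistics import NormalDist, fmean, stdev


def simulate_coverage(trials=2000, n=100, mu=10.0, sigma=3.0, confidence=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean mu.

    Assumption: samples come from a normal population, purely for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / trials


print(simulate_coverage())
```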
For film studies, the mean shot length of a sample is an estimate of the mean shot length of the population (the whole film), and the confidence interval specifies the range of values that may be considered plausible estimates of the population mean at a specified level of confidence. Here, 95% confidence intervals for the sample means are used to estimate population means. Table 4 presents the sample mean, its 95% confidence interval, and the population mean for each film.
Table 4 95% confidence intervals for 10 Hollywood films, 1930 – 1931
For eight out of ten films the sample mean is a good estimate of the population mean: the population mean lies within the 95% confidence interval for the sample mean. However, for Born Reckless the sample mean (10.7s) underestimates the population mean (15.1s), and for Bad Company the sample mean (12.1s) overestimates the population mean (8.8s).
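A minimal sketch of the check applied to each film in Table 4, assuming shot lengths are held in Python lists: compute an interval for the sample mean and ask whether the population mean falls inside it. The function names are hypothetical, and the interval uses a normal approximation (z of about 1.96), which is close to the t-based interval for samples of a hundred or more shots; it is not necessarily the exact calculation behind Table 4.

```python
from statistics import NormalDist, fmean, stdev


def mean_ci(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    m = fmean(sample)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    return m - half_width, m + half_width


def population_mean_in_ci(sample, population, confidence=0.95):
    """True if the population mean lies inside the sample's confidence interval."""
    lower, upper = mean_ci(sample, confidence)
    return lower <= fmean(population) <= upper
```

If `sample` holds the first 1800s of shot lengths and `population` holds every shot in the film, a result of `False` corresponds to the Born Reckless and Bad Company cases described above.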
As all the films have skewed distributions and the maximum shot lengths exceed the upper quartile by a substantial amount, the mean is unlikely to be a reliable statistic of film style, and (as I have discussed elsewhere) the median shot length is more representative.
Confidence intervals for the median and the Mann-Whitney U test
The Mann-Whitney U test is a non-parametric alternative to the two-sample t-test: rather than comparing the means of two data sets, it compares the ranks of their observations. Where the two data sets being compared have distributions of the same shape, it is a test of the difference between the medians.
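The rank-based logic of the test can be sketched in plain Python. This is an illustration, not the software used for Table 5: it assigns tied values their average rank, reports the smaller of the two U statistics, and computes a two-sided P-value from the normal approximation with a tie-corrected variance and no continuity correction. Library routines such as SciPy’s `scipy.stats.mannwhitneyu` may apply a continuity correction or exact methods for small samples, so their P-values can differ slightly.

```python
import math
from statistics import NormalDist


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided P) for two non-empty samples x and y."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0  # every value identical: no evidence of a difference
    z = (u - n1 * n2 / 2) / sigma  # u is the smaller U, so z <= 0
    return u, min(1.0, 2 * NormalDist().cdf(z))
```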
It is also possible to construct confidence intervals for sample medians. Here I have calculated 95% confidence intervals based on the binomial distribution, using the sample size (n) and the desired percentile (0.5). (NB: this method tends to be conservative for the median, so the actual coverage of the intervals will be at least 95%.) Confidence intervals for the median are interpreted in the same way as those for the mean.
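One common version of the binomial method uses order statistics: the interval runs from the k-th smallest to the k-th largest shot length, with k chosen as large as possible while the probability of a Binomial(n, 0.5) count falling below k stays at or under α/2. The sketch below assumes this version; because the binomial distribution is discrete, the achieved coverage is usually somewhat above 95%, which is why the method is conservative. The function name `median_ci` is hypothetical.

```python
import math


def median_ci(data, alpha=0.05):
    """Distribution-free confidence interval for the median via the binomial.

    Returns (lower, upper, coverage), where coverage is the exact probability
    that the interval [x(k), x(n-k+1)] contains the population median.
    """
    xs = sorted(data)
    n = len(xs)
    cumulative = 0.0
    k = 0
    for j in range(n):
        # P(B <= j) for B ~ Binomial(n, 0.5)
        cumulative += math.comb(n, j) / 2 ** n
        if cumulative > alpha / 2:
            break
        k = j + 1  # P(B <= k - 1) <= alpha / 2, so this k is admissible
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    coverage = 1 - 2 * sum(math.comb(n, j) for j in range(k)) / 2 ** n
    return xs[k - 1], xs[n - k], coverage
```

For ten values 1 to 10 this returns the interval from 2 to 9 with a coverage of 1002/1024 (about 97.9%), illustrating the conservative behaviour noted above.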
Table 5 presents the sample median, its 95% confidence interval, the population median, and the P-value of the Mann-Whitney U test for each film.
Table 5 95% confidence intervals for the median and the Mann-Whitney U test for 10 Hollywood films, 1930 – 1931
Born Reckless and Bad Company are again the films for which the 1800s sample is unreliable for estimating the population data. Notice also that the Mann-Whitney U test for The Lottery Bride returns a P-value of 0.05 while the population median falls within the sample confidence interval: this suggests that, although there is no evidence of a difference in the medians, the sample and population data have distributions with different shapes. This is a good illustration of why the Mann-Whitney U test cannot automatically be regarded as a test of medians.
The Kolmogorov-Smirnov (K-S) test compares the cumulative distribution functions (cdf) of two data sets by calculating the maximum absolute difference between them (the D statistic). The cdf gives the probability of randomly selecting from a film a shot whose length is less than or equal to a specified value (Pr[X ≤ x]). If there is no significant difference between the sample and the population, then for any value x the probability of randomly selecting a shot of length less than or equal to x from the sample is approximately equal to the corresponding probability for the population. For example, the cumulative distribution functions for the population and sample data for Animal Crackers (1930) are very similar (Figure 1). The D statistic is 0.0888 (P = 0.6200), and so there is no significant difference between the cdf of the population data and that of the sample data: the probability of randomly selecting a shot less than or equal to 24.7s is 0.7957 for the sample and 0.7794 for the population.
FIGURE 1 Population and sample cumulative distribution functions for Animal Crackers (1930)
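The D statistic can be computed directly from the two empirical cdfs. The sketch below is an illustration rather than the procedure used for Table 6: it evaluates both step functions at every observed shot length (the only points where either can change) and, as an assumption, approximates the two-sided P-value with the limiting Kolmogorov distribution. Library routines such as SciPy’s `scipy.stats.ks_2samp` can also use exact methods, so their P-values may differ somewhat.

```python
import math
from bisect import bisect_right


def ecdf(data):
    """Return the empirical cdf of data as a function x -> Pr[X <= x]."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ks_statistic(a, b):
    """Maximum absolute difference between the empirical cdfs of a and b."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ecdfs are right-continuous step functions that only jump at
    # observed values, so comparing them at those points finds the maximum.
    points = sorted(set(a) | set(b))
    return max(abs(fa(x) - fb(x)) for x in points)


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Approximate two-sided P-value from the limiting Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and P is ~1 anyway
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))
```

For two shot length lists `sample` and `population`, `ks_statistic(sample, population)` gives D, and `ks_pvalue_asymptotic(D, len(sample), len(population))` gives an approximate P-value to compare against α = 0.05.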
The results of the K-S test for all ten films are presented in Table 6. There are no significant differences between the sample and population shot length distributions, except for the now familiar cases of Born Reckless and Bad Company.
Table 6 Kolmogorov-Smirnov test for 10 Hollywood films, 1930 – 1931
There is a range of statistical methods that can be employed in comparing shot length distributions, but at present the statistical analysis of film style relies upon researchers simply pointing out the difference between two numbers in the absence of all the relevant facts (as the continuing absence of measures of dispersion shows). Although statistical analysis has been promoted as ‘more credible and valid’, an objective form of analysis that treats film style as an ordered response to the ‘challenges of filmmaking’ and does not rely upon the subjective tastes of the critic (Buckland 2006: 159), this is not at present the case, and too much currently relies on guesswork and supposition. Human beings are notoriously prone to seeing patterns where none exist (pareidolia) and to missing them where they do in fact occur. Statistics provides a set of methods for minimising such errors, to which even the most diligent researcher is vulnerable. Simply arguing about whether two numbers are slightly different is not statistics.
Applying these tests to determine whether the first 1800s of a film is a suitable sample of the distribution of shot lengths for the whole film shows that this method is not foolproof: for two of the ten films tested, the sample gave a misleading estimate. The 1800s sample may lead to errors in estimating the parameters of a film’s style and, subsequently, to erroneous conclusions about the history of film style. Using the complete data for a film is the only reliable method.
Bordwell, D., and Thompson, K. (1985) Toward a scientific film history? Quarterly Review of Film Studies 10 (3): 224–237. Available online: http://www.davidbordwell.net/articles/.
Buckland, W. (2006) Directed by Steven Spielberg: Poetics of the Contemporary Hollywood Blockbuster. London: Continuum.
Gillam, S., Yates, J., and Badrinath, P. (2007) Essential Public Health: Theory and Practice. Cambridge: Cambridge University Press.
O’Brien, C. (2005) Cinema’s Conversion to Sound: Technology and Film Style in France and the U.S. Bloomington: Indiana University Press.
Salt, B. (1992) Film Style and Technology: History and Analysis, second edition. London: Starwood.
Salt, B. (2006) Moving into Pictures: More on Film History, Style, and Analysis. London: Starwood.