# Sampling Reliability in Cinemetrics

Numerous studies that have sought to analyse film style in terms of cinemetrics have employed a sampling strategy based on using the first 1800 seconds (s) of a film (e.g. Buckland 2006, O’Brien 2005, Salt 1992). This strategy has been criticised for producing unreliable results (see Bordwell and Thompson 1985), and some who had earlier employed this approach to sampling have also come to reject it (e.g. Salt 2006). This post reviews the use of such sampling in cinemetrics.

## Bordwell and Thompson’s criticisms

Bordwell and Thompson’s criticisms of the use of the first 1800s of a film as a sample form part of a larger critique of Salt’s *Film Style and Technology: History and Analysis* (1992; originally published 1983) published in the *Quarterly Review of Film Studies* in 1985. Their rejection of Salt’s method rests on the fact that their own values for the mean shot lengths for some films differ from those of Salt – and as such they regard this approach as unreliable and likely to result in mistaken conclusions with regard to the history of film style. These differences are presented in Table 1.

**Table 1** Average shot lengths from Salt and Bordwell and Thompson

There are numerous problems with Bordwell and Thompson’s criticisms. First, they present measures of location without the context of sample sizes and/or measures of dispersion (variance, standard deviation, range, interquartile range – cf. Buckland 2006). Second, where mean shot lengths are presented they are presented without confidence intervals or any other measures of sampling error. Third, the use of the mean shot length for skewed distributions with outlying data points is suspect and it is not apparent that differences between sample and population means will reflect actual differences in the distributions – a handful of very long shots can have a dramatic impact on the mean, and if these shots are not evenly distributed throughout a film they will lead to the mean being incorrectly estimated. Using the sample and population median shot lengths could avoid this problem altogether. This is not discussed. Fourth, no statistical tests are employed in determining if the differences between sample and population means observed are actual differences or if they are simply noise in the data. Salt’s figures were cited as being unreliable simply by virtue of the fact that they were different from Bordwell and Thompson’s. In their discussion of Salt’s results for Soviet cinema, Bordwell and Thompson state that Salt’s values are ‘significantly inaccurate’ (1985: 230). The term ‘significant’ has particular meaning in statistics relating to the probability of making a Type I error (a false positive – saying there is a statistically significant difference when, in fact, there is not). Bordwell and Thompson do not appear to be using ‘significant’ in this sense – even though they are discussing the difference between two averages. In what sense Salt’s figures are ‘significantly inaccurate’ is not clear, and there are no tests (*t*-test, Mann-Whitney *U* test) to support this argument.

## Testing the 1800s sample

To test the usefulness of the 1800s approach, shot length data was collected on 10 films produced in Hollywood in 1930 and 1931 from the Cinemetrics database. Shot length data for the whole film (the population data) was compared against data for the first 1800s (the sample method) using four methods: a confidence interval approach for both mean and median shot lengths; a Mann-Whitney *U* test; and a Kolmogorov-Smirnov test of the cumulative distribution functions. (Data for the samples was collected to the shot nearest to 1800s, whether that was in fact greater than or less than 1800s.) In all cases, the level of significance (α) is 0.05. Summary data for these films is presented in Tables 2 and 3.
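The sampling rule described above (cut at the shot boundary nearest to 1800s, whether just before or just after it) can be sketched as follows. This is only an illustration: the function name `first_window_sample` and the tie-breaking choice (the earlier boundary wins) are my assumptions, not part of the original data collection.

```python
from itertools import accumulate


def first_window_sample(shot_lengths, window=1800.0):
    """Return the opening shots whose running time ends nearest to `window` seconds.

    `shot_lengths` is the list of shot durations (in seconds) in film order.
    If two boundaries are equally close, the earlier one is used (an assumption).
    """
    if not shot_lengths:
        return []
    # Running time of the film at the end of each shot.
    ends = list(accumulate(shot_lengths))
    # Index of the shot whose end point lies closest to the window.
    cut = min(range(len(ends)), key=lambda i: abs(ends[i] - window))
    return list(shot_lengths[:cut + 1])
```

For a film shorter than the window, the nearest boundary is the end of the last shot, so the sketch simply returns the whole film.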

**Table 2** Summary statistics for shot lengths in 5 Hollywood films, 1930

**Table 3** Summary statistics for shot lengths in 5 Hollywood films, 1931

### Confidence intervals

The uncertainty in the measurement of the sample can be quantified by calculating a confidence interval, which may be defined as ‘a range of values for a variable of interest constructed so that this range has a specified probability of including the true value of the variable’ (Gillam *et al* 2007: 51). A confidence interval gives an estimated range that is likely to contain an unknown population parameter, and will contain all the hypothetical values that cannot be rejected. If samples of the same size are drawn repeatedly from the same population, and a 95% confidence interval is calculated for each sample, then 95% of the confidence intervals should contain the population parameter. If a significance level (α) of 0.05 is chosen, the confidence level is 1 − α = 0.95, and the procedure that constructs the interval has a 0.95 probability of capturing the true value of the parameter. (Note that a statistic is an estimate of a parameter.) Confidence intervals are often preferred to significance tests as a means of expressing sampling error.
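The repeated-sampling interpretation can be checked with a small simulation. The sketch below rests on assumptions of mine: the population is a hypothetical exponential distribution (chosen only because it is skewed, as shot lengths tend to be), the interval uses a large-sample normal approximation, and the name `coverage` is invented for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def coverage(true_mean=10.0, n=200, trials=2000, level=0.95, seed=1):
    """Estimate how often a confidence interval for the mean captures the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n values from the hypothetical skewed population.
        sample = [rng.expovariate(1 / true_mean) for _ in range(n)]
        m = mean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The proportion printed should be close to 0.95, although for a skewed population and a finite sample it will not match the nominal level exactly.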

For film studies, the mean shot length of a sample is an estimate of the mean shot length of a population and the confidence interval specifies the range of values that may be considered as estimates of the population mean for a specified level of accuracy. Here, 95% confidence intervals for the sample means are used to estimate population means. Table 4 presents the sample means, the 95% confidence interval, and the population mean for each film.
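The check applied to each film can be sketched as below. The source does not say which formula was used for the interval, so this sketch assumes a large-sample normal approximation, which is reasonable when the first 1800s contains a hundred or more shots. The function names `mean_ci` and `estimates_population` are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def mean_ci(sample, level=0.95):
    """Return (lower, upper) bounds of a normal-approximation CI for the mean."""
    n = len(sample)
    m = mean(sample)
    standard_error = stdev(sample) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m - z * standard_error, m + z * standard_error


def estimates_population(sample, population, level=0.95):
    """True if the population mean lies inside the sample's confidence interval."""
    lower, upper = mean_ci(sample, level)
    return lower <= mean(population) <= upper
```

Here `sample` would be the shot lengths from the first 1800s and `population` the shot lengths for the whole film.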

**Table 4** 95% confidence intervals for 10 Hollywood films, 1930 – 1931

For eight out of ten films the sample mean is a good estimate of the population mean: the population means lie within the 95% confidence interval for the sample mean. However, for *Born Reckless* the sample (10.7s) underestimates the population (15.1s) and for *Bad Company* the sample (12.1s) overestimates the population (8.8s).

As all the films have skewed distributions and the maximum shot lengths exceed the upper quartile by a substantial amount, the mean is unlikely to be a reliable statistic of film style, and (as I have discussed elsewhere) the median shot length is more representative.
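A toy example shows why the mean is fragile here. The shot lengths below are invented for illustration and are not taken from any of the ten films.

```python
from statistics import mean, median

# Hypothetical shot lengths in seconds: nine ordinary shots and one long take.
shots = [4, 5, 5, 6, 6, 7, 8, 9, 10, 120]
without_long_take = shots[:-1]


def summarise(data):
    """Return (mean, median) for a list of shot lengths."""
    return mean(data), median(data)


print(summarise(shots))
print(summarise(without_long_take))
```

Removing the single long take moves the mean from 18s to about 6.7s, while the median only shifts from 6.5s to 6s. If the long take happens to fall outside the first 1800s, the sample mean and the population mean will disagree for reasons that have little to do with the overall distribution of shot lengths.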

### Confidence intervals for the median and the Mann-Whitney *U* test

The Mann-Whitney *U* test is similar to a *t*-test, but does not rely on the means of a distribution. Where the two data sets being compared have the same shape, this test is a test of the difference of the medians.
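For readers who want to see the mechanics, here is a minimal sketch of the test. It assumes the large-sample normal approximation with a tie correction and a continuity correction; the source does not say which variant or software produced the reported *P*-values, and the function name `mann_whitney_u` is mine.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation). Returns (U for x, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the observations, remembering which group (0 = x, 1 = y) each came from.
    pooled = sorted((v, g) for g, data in enumerate((x, y)) for v in data)
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    # Continuity-corrected z score, floored at zero.
    z = max((abs(u - mu) - 0.5) / sqrt(var), 0.0)
    return u, erfc(z / sqrt(2))
```

Because the statistic is built from ranks rather than from the values themselves, a few very long shots have no more influence than any other shot above the rest of the data.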

It is also possible to construct confidence intervals for sample medians, and here I have provided 95% confidence intervals based on a binomial distribution using the sample size (n) and the desired percentile (0.5) (NB: this method tends to be conservative for the median, and so the confidence intervals will be at least 95%). Confidence intervals for the median are used in the same way as those for the mean.
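One common way of building such an interval from order statistics is sketched below. I am assuming the standard construction, in which the bounds are the *k*-th smallest and *k*-th largest observations and *k* is chosen from the Binomial(*n*, 0.5) distribution; the function name `median_ci` is hypothetical.

```python
from math import comb


def median_ci(sample, level=0.95):
    """Return (lower, upper, actual_coverage) for a binomial CI for the median."""
    data = sorted(sample)
    n = len(data)
    alpha = 1 - level
    k = 0
    cum = 0.0  # P(X <= k - 1) for X ~ Binomial(n, 0.5)
    while k < n // 2:
        cum_next = cum + comb(n, k) / 2 ** n
        if cum_next > alpha / 2:
            break
        cum = cum_next
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    # k-th smallest and k-th largest observations bound the median.
    return data[k - 1], data[n - k], 1 - 2 * cum
```

The third value returned is the actual coverage of the interval, which is never below the requested level. For ten observations, for example, the interval runs from the 2nd to the 9th ordered value and has coverage of roughly 97.9%, which illustrates the conservatism noted above.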

Table 5 presents the sample median, its 95% confidence interval, the population median, and the *P*-values of the Mann-Whitney *U* test.

**Table 5** 95% confidence intervals for the median and the Mann-Whitney *U* test for 10 Hollywood films, 1930 – 1931

*Born Reckless* and *Bad Company* are again the films for which the 1800s sample is unreliable for estimating the population data. Notice also that the Mann-Whitney *U* test for *The Lottery Bride* returns a *P*-value of 0.05 while the population median falls within the sample confidence interval: this suggests that while there is no difference in the medians, the two data sets produce distributions with different shapes. This is a good example of why the Mann-Whitney *U* test cannot automatically be regarded as a test of medians.

### Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test compares the cumulative distribution functions (cdf) of two data sets by calculating the maximum absolute difference between the two. The cdf gives us the probability of randomly selecting a shot length from a film that is less than or equal to a specified value (Pr[*X* ≤ *x*]): if there is no significant difference between the sample and the population then the probability of randomly selecting a shot no longer than a given length from the sample is approximately equal to that of randomly selecting such a shot from the population. For example, if we look at the cumulative distribution functions for the population and sample data for *Animal Crackers* (1930) (Figure 1), we can see that they are very similar. The *D* statistic is 0.0888 (*P* = 0.6200), and so there is no significant difference between the cdf of the population data and that of the sample data. The probability of randomly selecting a shot that is less than or equal to 24.7s, for instance, is 0.7957 from the sample and 0.7794 from the population.

**FIGURE 1** Population and sample cumulative distribution functions for *Animal Crackers* (1930)
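The comparison of cumulative distribution functions described above can be sketched as follows. The *D* statistic is computed exactly as the largest gap between the two empirical cdfs. The *P*-value uses the usual asymptotic series with a small-sample adjustment, which is an assumption on my part since the source does not say how its *P*-values were obtained. The names `ecdf` and `ks_two_sample` are hypothetical.

```python
from bisect import bisect_right
from math import exp, sqrt


def ecdf(data):
    """Return a function giving the proportion of values in `data` that are <= x."""
    ordered = sorted(data)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov test. Returns (D, approximate p)."""
    fa, fb = ecdf(a), ecdf(b)
    # The largest gap between two step functions occurs at an observed value.
    d = max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
    n_eff = len(a) * len(b) / (len(a) + len(b))
    lam = (sqrt(n_eff) + 0.12 + 0.11 / sqrt(n_eff)) * d
    if lam < 0.3:
        # The series converges slowly here and the p-value is effectively 1.
        return d, 1.0
    p = 2 * sum((-1) ** (j - 1) * exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

Calling `ecdf(sample)(24.7)` gives the kind of probability quoted above for *Animal Crackers*: the proportion of shots in the data set that last 24.7s or less.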

The results of the K-S test for all ten films are presented in Table 6. There are no significant differences between the sample and population shot length distributions, except for the now familiar cases of *Born Reckless* and *Bad Company*.

**Table 6** Kolmogorov-Smirnov test for 10 Hollywood films, 1930 – 1931

## Conclusion

There is a range of statistical methods that can be employed in comparing shot length distributions, but at present the statistical analysis of film style relies upon researchers simply pointing out the difference between two numbers in the absence of all the relevant facts (notably the continuing absence of measures of dispersion). Although statistical analyses have been promoted as ‘more credible and valid’ as an objective form of analysis that treats film style as an ordered response to the ‘challenges of filmmaking,’ and does not rely upon the subjective tastes of the critic (Buckland 2006: 159), this is not at present the case and too much currently relies on guesswork and supposition. Human beings are notoriously prone to seeing patterns where none exist (pareidolia) and to missing them where they do in fact occur. Statistics is a set of methods for minimising such errors, and their impact, which can befall even the most diligent researcher. Simply arguing about whether two numbers are slightly different is not statistics.

Applying these tests to determine the suitability of using the first 1800s of a film as a sample of the distribution of shot lengths for the whole film shows that this method is not foolproof. The 1800s sample may lead to an error in estimating the parameters of a film’s style, and, subsequently, lead to erroneous conclusions about the history of film style. The use of the complete data for a film is the only reliable method.

## References

Bordwell, D., and Thompson, K. (1985) Toward a scientific film history? *Quarterly Review of Film Studies* 10 (3): 224–237. Available online: http://www.davidbordwell.net/articles/.

Buckland, W. (2006) *Directed by Steven Spielberg: Poetics of the Contemporary Hollywood Blockbuster*. London: Continuum.

Gillam, S., Yates, J., and Badrinath, P. (2007) *Essential Public Health: Theory and Practice*. Cambridge: Cambridge University Press.

O’Brien, C. (2005) *Cinema’s Conversion to Sound: Technology and Film Style in France and the U.S*. Bloomington: Indiana University Press.

Salt, B. (1992) *Film Style and Technology: History and Analysis*, second edition. London: Starwood.

Salt, B. (2006) *Moving into Pictures: More on Film History, Style, and Analysis*. London: Starwood.

Posted on July 30, 2009, in Cinemetrics, Film Studies, Film Style. 5 Comments.

I know this is meant to be an educational piece for the Cinemetrics people, but it is not a good example to choose, for a number of reasons. Firstly, Bordwell and Thompson give ASLs for silent films projected at 24 fps., and not at the speed they were shot at. Thus their value for “Potemkin” of 1.9 sec. converts to 2.9 sec. at 16 fps., which is what the film was cranked at. Likewise for the other silent films quoted. I give figures for projection at the speed the film was shot at. Hence part of the discrepancy. I admit that when I first invented the ASL and the rest of it back in the early ‘seventies, I got a bit slapdash. I wanted to demonstrate that it was a good idea in a hurry, and some of the excerpts used were even shorter than 30 min., with serious results. However, as you note, I saw the error of my ways, even before Bordwell and Thompson sought to denigrate my work in their own interest. Apart from going over to only analysing complete films for Closeness of Shot and other variables, as noted in “Film Style and Technology”, in the last decade I have been rapidly recounting the shots in thousands of the films that I only partially counted.

The corrections can go either way, of course, but the majority do decrease the ASL. A smaller number increase it, and a substantial number give the same result, to one decimal place. This is because most of the sections I worked from started at the beginning of the film, and the cutting rate speeds up through the course of a film for more films than the reverse. You can see this quickly on the Cinemetrics database by looking at the slope of the 1st. order trendline.

Another point about your piece is that the work you are discussing is part of statistical style analysis, not Cinemetrics, which did not exist when the work you are discussing was actually done.

And I think only the lazy cinemetrics people just do part of a film.

A final note. Amusingly, David Bordwell’s recently published ASLs for recent films are mostly very incorrect. Somehow, he has managed to count more shots than there are in the films concerned. That’s a real achievement.

I forgot to say that nowadays when silent films are transferred to DVD, they are transferred at the speed they were shot at. And this is the material Cinemetrics people will be working with. So if you look at their results, you will see that they do not correspond to Bordwell’s values.

One thing I have been thinking about is that mean and median shot lengths should always be given with confidence intervals, even if they are for the whole film and not just a sample. Using the Cinemetrics software there is inevitably going to be some variation due to the response time of the user (athletes are given a false start if they react quicker than 1/100th of a second to the start, and the same should hold true for academics). Also there are different versions of films, silent films may have scenes missing or be only available in part, DVDs may be at the right speed but projected films are sometimes shown at the wrong speed (still!), and so on. Many of these problems can be overcome by examining a film frame by frame, but others (e.g. damaged or incomplete films) cannot. Given all these variables, it should perhaps be standard practice to treat the average shot lengths as estimates.

You are trying to make it too complicated. Statistical style analysis (or Cinemetrics) is already much too complicated for most people in film studies. Have you read my piece on reaction time on the Cinemetrics site yet? The problem is not the reaction time of the recorder, but the VARIATION in their reaction time.
