Estimating shot length distributions
One of the problems we encounter when researching film style is that different versions of the same film exist. For example, in 2008 a version of Fritz Lang’s Metropolis (1927) was discovered in Argentina that was approximately 25 minutes longer than previously known versions. The official site for the restored version is here. This makes the statistical analysis of film style difficult, because we have to face the fact that the version of a film we are analysing may not be the film as it was produced.
We may come across different versions of the same film for a variety of reasons:
- Different versions of the same film may be released in different countries.
- ‘Director’s cut’ versions raise the question as to what we should call the definitive version: I have five different versions of Blade Runner (Ridley Scott, 1982) on DVD – the original 1982 US theatrical release, the 1982 international theatrical release, the work print, the 1992 Director’s Cut, and the 2007 Final Cut. We could simply note which version our data represents and leave it at that. However, if we wish to look at the shot length distributions of Hollywood movies in the early-1980s or the films of Ridley Scott, which version should we pick? Should we pick the version with the voice-over that uses shots left over from The Shining, or the one without the voice-over but with the unicorn dream sequence? Does the fact that the film was re-edited for the 2007 cut invalidate it when it comes to looking at 1980s Hollywood, even though all the material used in this cut was shot in the early-1980s? Is the Final Cut version an example of Hollywood cinema from the early-1980s, or of the mid-2000s? Did Scott’s editing style change between 1982 and 2007 so that these two versions cannot be simply compared?
- The version of a film released for home viewing is often different to the theatrical release due to the requirements of classification boards or censors. Another factor here may be corporate taste: historically, some home cinema outlets have edited their tapes to maintain a family friendly corporate image by removing scenes of gore, violence, and/or sex. Rather than distinguish between different versions we tend to treat the domestic and theatrical releases as being one and the same, when in fact our data may show some discrepancies.
- Although it is unlikely to be a significant factor in the 21st century, pan-and-scan may affect the number of shots in a film.
- When working with silent films it is often difficult to find complete versions of the films, and some frames, shots, scenes, or even reels may be missing. We will therefore be working with only a partial data set. This problem can be compounded by the release of restored versions that are built up from several prints. It is highly likely that we have not seen (and probably never will see) the original version of many of the silent films that we take for granted on DVD.
There are also other sources of measurement error that can affect our research:
- When dealing with wipes, dissolves, fades, and irises, should we record the end of one shot and the beginning of the next at the beginning of the transition, the end of the transition, or in the middle of the transition? I always prefer the last of these options, and try to identify the middle frame of the edit, but I cannot speak for other researchers.
- Identifying the correct running speed for silent movies is problematic. Silent movies were often shot at 18 or 20 fps, while we view and analyse them at 25/30 fps.
- Data from the Cinemetrics database will also contain errors due to the performance of the researcher, and so the figures quoted will only ever be estimates.
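The two timing issues in the list above (where to place the cut inside a transition, and which running speed to assume) come down to simple arithmetic. The following is a minimal sketch only: the function names are hypothetical, the frame numbers are made up for illustration, and the 18 fps figure is simply one of the speeds mentioned above rather than a claim about any particular film.

```python
def transition_cut_frame(first_frame: int, last_frame: int) -> int:
    """Middle frame of a transition that spans first_frame..last_frame inclusive."""
    return (first_frame + last_frame) // 2


def shot_length_seconds(frame_count: int, fps: float) -> float:
    """Duration in seconds of a shot of frame_count frames at a given running speed."""
    return frame_count / fps


def rescale_duration(seconds: float, viewed_fps: float, intended_fps: float) -> float:
    """Convert a duration timed at viewed_fps to its duration at intended_fps."""
    return seconds * viewed_fps / intended_fps


if __name__ == "__main__":
    # Made-up example: a dissolve occupying frames 100-124 is cut at frame 112.
    print(transition_cut_frame(100, 124))
    # A 90-frame shot runs 3.6 s at 25 fps but 5.0 s at 18 fps.
    print(shot_length_seconds(90, 25), shot_length_seconds(90, 18))
    # The same conversion applied to a duration that was timed at 25 fps.
    print(rescale_duration(3.6, viewed_fps=25, intended_fps=18))
```

The point of the sketch is that a wrong assumption about running speed rescales every shot length by the same factor, so it shifts the median and the quartiles but does not change the shape of the distribution.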
It is wrong to state, as Barry Salt did recently (here), that we do not need to employ the full range of statistical methods and that such methods are ‘misleading’ and ‘irrelevant.’ It is necessary to deal with these issues in order to present the best analysis we can, and that means we need to be able to deal with the error present in our estimates. Even though the data we collect may be accurate to the frame, we will still have to deal with the existence of multiple versions, missing shots, different methods of data collection, etc. If you are going to analyse film style statistically, then at some point you are going to have to do some statistics.
This post focusses on the particular problem of dealing with silent films that have been restored. In 2009 and 2010 I added two posts to this blog looking at the shot length distributions of the Keystone films starring Charlie Chaplin (here and here). Since then, the BFI has released its Chaplin at Keystone DVD (see here). How do the shot length distributions of these restored versions compare to the original data I used in 2009? (For ease of understanding, when I refer to ‘original’ I mean the 2009 data and when I refer to ‘restored’ I mean data derived from the BFI DVD).
To date I have only looked at data from four films, though I hope to get around to the rest some time later this year. The four films are The Masquerader, The Rounders, The New Janitor, and Getting Acquainted. In the original data I removed the credit titles and the expository and dialogue titles, and for the sake of consistency I have done so here with the restored versions. However, in the Excel file at the end of this post that includes the shot length data from the restored versions of these four films I have excluded the credit titles but left in the other titles as indicated by ‘T.’
The descriptive statistics of the original and restored versions of The Masquerader are presented in Table 1 and the empirical cumulative distribution functions are presented in Figure 1.
Table 1 Descriptive statistics of the original and restored versions of The Masquerader (Charles Chaplin, 1914)
From Table 1 we can see that the original estimate of the median shot length (3.7s [95% CI: 2.8, 4.6]) is consistent with the revised estimate (4.5s [95% CI: 2.7, 6.3]). However, there is a large difference in the dispersion of shot lengths, as indicated by the increase in the upper quartile and the interquartile range. This indicates that the version of The Masquerader from which the original data was collected is less consistent with the restored version in the upper part of the distribution, although a two-sample Kolmogorov-Smirnov test indicates there is no statistically significant difference between the two distributions (D = 0.1485, p = 0.265).
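For readers who want to compute the same kind of summary from the shot length data, here is a minimal sketch using only the Python standard library. It is not the procedure used to produce the tables in this post: the quartile convention (`method="inclusive"`) and the percentile bootstrap used for the confidence interval are assumptions made for illustration, the function names are hypothetical, and the shot lengths in the demo are invented.

```python
import random
import statistics


def describe(shot_lengths):
    """Median, quartiles and interquartile range of shot lengths in seconds."""
    q1, med, q3 = statistics.quantiles(shot_lengths, n=4, method="inclusive")
    return {"n": len(shot_lengths), "Q1": q1, "median": med, "Q3": q3, "IQR": q3 - q1}


def bootstrap_median_ci(shot_lengths, level=0.95, resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the median (an assumed method)."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(shot_lengths)
    medians = sorted(
        statistics.median(rng.choices(shot_lengths, k=n)) for _ in range(resamples)
    )
    alpha = (1 - level) / 2
    lower = medians[int(alpha * resamples)]
    upper = medians[int((1 - alpha) * resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Invented shot lengths in seconds, for illustration only.
    demo = [1.2, 2.0, 2.4, 3.1, 3.7, 4.0, 4.6, 5.9, 8.3, 12.5, 21.0]
    print(describe(demo))
    print(bootstrap_median_ci(demo))
```

Reporting the upper quartile and interquartile range alongside the median is what exposes a change in the upper part of the distribution that the median alone would hide.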
Figure 1 Empirical cumulative distribution functions of shot lengths in the original and restored versions of The Masquerader (Charles Chaplin, 1914)
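The empirical cumulative distribution function and the two-sample Kolmogorov-Smirnov statistic D are straightforward to compute by hand. The sketch below is an illustration under stated assumptions, not the software used for the results reported here: the function names are hypothetical, and the p-value uses the standard asymptotic Kolmogorov series with a common small-sample adjustment, so it may differ slightly from the values a statistics package would report.

```python
import bisect
import math


def ecdf(sample):
    """Return F(x): the proportion of values in sample that are <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ECDFs are step functions, so the largest gap is at an observed value.
    d = max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
    n, m = len(a), len(b)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d  # small-sample adjustment (an assumption)
    if lam < 0.3:
        # The series converges slowly here and the p-value is effectively 1.
        return d, 1.0
    p = 2 * sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    # Invented shot lengths in seconds, for illustration only.
    original = [1.5, 2.2, 3.0, 3.7, 4.1, 5.0, 6.4, 9.8]
    restored = [1.6, 2.5, 3.4, 4.5, 5.2, 7.7, 11.0, 15.3]
    print(ks_two_sample(original, restored))
```

D is the largest vertical distance between the two curves in a plot like Figure 1, which is why the test is sensitive to differences anywhere in the distribution and not just at the median.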
Looking at the same information for The Rounders (Table 2 and Figure 2), we note that there is a much larger discrepancy between the two versions of this film. The original estimate of the median shot length was 3.6s (95% CI: 2.5, 4.7), and the revised estimate is 5.0s (95% CI: 3.5, 6.5). Again there is a larger increase in the dispersion of shot lengths, and this is also more marked in the upper part of the distribution. Again, we find that a two-sample Kolmogorov-Smirnov test indicates there is no statistically significant difference between the two distribution functions (D = 0.1922, p = 0.087).
Table 2 Descriptive statistics of the original and restored versions of The Rounders (Charles Chaplin, 1914)
Figure 2 Empirical cumulative distribution functions of shot lengths in the original and restored versions of The Rounders (Charles Chaplin, 1914)
There are no such large differences between the versions of The New Janitor (Table 3 and Figure 3). The medians are consistent, with only a small change in the estimate from 3.5s (95% CI: 2.4, 4.5) to 4.2s (95% CI: 3.2, 5.1). There is also a small increase in the interquartile range, and this is accounted for by the small difference between the upper quartiles. However, this difference is not comparable to those observed in the cases of The Masquerader and The Rounders, and the cumulative distribution functions indicate that the two versions have the same distribution of shot lengths (Kolmogorov-Smirnov: D = 0.1184, p = 0.515).
Table 3 Descriptive statistics of the original and restored versions of The New Janitor (Charles Chaplin, 1914)
Figure 3 Empirical cumulative distribution functions of shot lengths in the original and restored versions of The New Janitor (Charles Chaplin, 1914)
The two versions of Getting Acquainted (Table 4 and Figure 4) show only a small difference in the upper quartile and the interquartile range, but otherwise the two sets of shot length data are consistent (Kolmogorov-Smirnov: D = 0.0622, p = 0.978). The original estimate of the median is 3.9s (95% CI: 3.3, 4.5) and the revised estimate is 4.0s (95% CI: 3.3, 4.7), so these are nearly identical.
Table 4 Descriptive statistics of the original and restored versions of Getting Acquainted (Charles Chaplin, 1914)
Figure 4 Empirical cumulative distribution functions of shot lengths in the original and restored versions of Getting Acquainted (Charles Chaplin, 1914)
Although I have looked at just four films here, we can see that the difference in the median shot lengths is small for three of the films and would not substantially change how we interpret this information – though the increase in the dispersion of the upper part of the distribution for the restored version of The Masquerader is a good example of why it is not enough to refer only to measures of location in the analysis of film style. We must also look at dispersion. The difference between the two versions of The Rounders will obviously lead us to reconsider our conclusions based on this data. Hopefully, when I have finally completed transcribing the data for the other Chaplin Keystones from the restored version, a clearer understanding of how to deal with different estimates of the shot lengths in a motion picture will emerge.
The shot length data for the restored versions of The Masquerader, The Rounders, The New Janitor, and Getting Acquainted can be accessed as an Excel 2007 (.xlsx) here: Nick Redfern – BFI Restored Chaplin 1. This data was collected by loading the films into Magix Movie Edit Pro 14 at 25 fps, and has been corrected by multiplying each shot length by 25/24.
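The 25/24 correction can be written out as a short worked equation. This assumes the usual reading of such a correction, namely that material running at 24 fps has been sped up to 25 fps on the DVD; the symbols N (number of frames in a shot), t_25 (duration timed at 25 fps) and t_24 (duration at 24 fps) are notation introduced here for illustration.

```latex
t_{25} = \frac{N}{25}, \qquad
t_{24} = \frac{N}{24} = \frac{25}{24}\, t_{25}
```

For example, a shot timed at 4.8s at 25 fps contains 120 frames and so lasts 5.0s at 24 fps.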
Posted on March 17, 2011, in Charles Chaplin, Cinemetrics, Film Analysis, Film Studies, Film Style, Hollywood, Keystone Film Company, Silent cinema, Statistics.