One of the problems we encounter when researching film style is that different versions of the same film exist. A striking example is the discovery in Argentina in 2008 of a print of Fritz Lang’s Metropolis (1927) that was approximately 25 minutes longer than any previously known version. The official site for the restored version is here. This makes the statistical analysis of film style difficult, because we have to accept that the version of a film we are analysing may not be the film as it was originally produced.
We may come across different versions of the same film for a variety of reasons:
- Different versions of the same film may be released in different countries.
- ‘Director’s cut’ versions raise the question of what we should call the definitive version. I have five different versions of Blade Runner (Ridley Scott, 1982) on DVD: the original 1982 US theatrical release, the 1982 international theatrical release, the work print, the 1992 Director’s Cut, and the 2007 Final Cut. We could simply note which version our data represents and leave it at that. However, if we wish to look at the shot length distributions of Hollywood movies in the early 1980s, or of the films of Ridley Scott, which version should we pick? Should we pick the version with the voice-over that uses shots left over from The Shining, or the one without the voice-over but with the unicorn dream sequence? Does the fact that the film was re-edited for the 2007 cut invalidate it when it comes to looking at 1980s Hollywood, even though almost all the material used in this cut was shot in the early 1980s? Is the Final Cut an example of Hollywood cinema from the early 1980s, or of the mid-2000s? Did Scott’s editing style change between 1982 and 2007 so that these two versions cannot simply be compared?
- The version of a film released for home viewing often differs from the theatrical release because of the requirements of classification boards or censors. Corporate taste may also play a part: historically, some home video outlets have edited their tapes to maintain a family-friendly corporate image by removing scenes of gore, violence, and/or sex. Rather than distinguish between versions, we tend to treat the domestic and theatrical releases as one and the same, when in fact our data may show some discrepancies.
- Although it is unlikely to be a significant factor in the 21st century, pan-and-scan may affect the number of shots in a film, since a single widescreen shot may be broken up into two or more shots when it is reframed for the television screen.
- When working with silent films it is often difficult to find complete versions, and some frames, shots, scenes, or even reels may be missing, so we will be working with only a partial data set. This problem can be compounded by the release of restored versions that are built up from several prints. It is highly likely that we have not seen (and probably never will see) the original version of many of the silent films that we take for granted on DVD.
There are also other sources of measurement error that can affect our research:
- When dealing with wipes, dissolves, fades, and irises, should we record the end of one shot and the beginning of the next at the beginning of the transition, at the end of the transition, or in the middle of the transition? I prefer the last of these options and try to identify the middle frame of the edit, but I cannot speak for other researchers.
- Identifying the correct running speed for silent movies is problematic. Silent movies were often shot at 18 or 20 fps, while we view and analyse them at 25 or 30 fps.
- Data from the Cinemetrics database will also contain errors arising from the researcher’s own performance (for example, reaction time when marking cuts as the film plays), and so the figures quoted will only ever be estimates.
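The midpoint convention and the running-speed problem can both be made concrete. The sketch below records a gradual transition as a cut at its middle frame and converts the resulting frame counts to seconds under two assumed running speeds. The frame numbers are hypothetical, and the conversion assumes the counts refer to original film frames rather than to a video transfer that repeats frames.

```python
# Sketch: a cut placed at the midpoint of a gradual transition, and frame
# counts converted to seconds under different assumed running speeds.
# All frame numbers and speeds here are hypothetical illustrations.

def transition_midpoint(first_frame: int, last_frame: int) -> int:
    """Frame recorded as the cut for a dissolve, fade, wipe, or iris that
    spans first_frame..last_frame inclusive (rounded down if even-length)."""
    return (first_frame + last_frame) // 2

def shot_lengths_frames(cuts: list[int]) -> list[int]:
    """Shot lengths in frames from sorted cut positions, where cuts[0] is
    the first frame of the film and cuts[-1] is the end of the film."""
    return [b - a for a, b in zip(cuts, cuts[1:])]

def to_seconds(frames: list[int], fps: float) -> list[float]:
    """Convert frame counts to seconds at an assumed running speed."""
    return [f / fps for f in frames]

# A dissolve running from frame 240 to frame 263 is recorded as a cut at 251.
cuts = [0, transition_midpoint(240, 263), 500]
frames = shot_lengths_frames(cuts)  # [251, 249]
print(to_seconds(frames, 25))       # as analysed at 25 fps
print(to_seconds(frames, 18))       # if the film was shot at 18 fps
```

The same frame counts give shot lengths roughly 39% longer at 18 fps than at 25 fps, which is why the assumed running speed matters for any comparison between films.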
It is wrong to state, as Barry Salt did recently (here), that we do not need to employ the full range of statistical methods and that such methods are ‘misleading’ and ‘irrelevant.’ It is necessary to deal with these issues in order to present the best analysis we can, and that means we need to be able to deal with the error present in our estimates. Even though the data we collect may be accurate to the frame, we will still have to deal with the existence of multiple versions, missing shots, different methods of data collection, etc. If you are going to analyse film style statistically, then at some point you are going to have to do some statistics.
This post focusses on the particular problem of dealing with silent films that have been restored. In 2009 and 2010 I added two posts to this blog looking at the shot length distributions of the Keystone films starring Charlie Chaplin (here and here). Since then, the BFI has released its Chaplin at Keystone DVD (see here). How do the shot length distributions of these restored versions compare to the original data I used in 2009? (For ease of understanding, when I refer to ‘original’ I mean the 2009 data and when I refer to ‘restored’ I mean data derived from the BFI DVD).
To date I have looked at data from only four films, though I hope to get around to the rest later this year. The four films are The Masquerader, The Rounders, The New Janitor, and Getting Acquainted. In the original data I removed the credit titles and the expository and dialogue titles, and for the sake of consistency I have done the same here with the restored versions. However, the Excel file at the end of this post, which contains the shot length data from the restored versions of these four films, excludes the credit titles but retains the other titles, which are marked ‘T.’
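If the shot lengths are read out of a spreadsheet in which titles are flagged ‘T,’ the titles can be dropped before analysis so that the restored data match the original data. The layout below, a list of (length, label) pairs, is a hypothetical stand-in for the Excel columns rather than a description of the actual file.

```python
# Hypothetical layout: each shot is (length_in_seconds, label), where label
# is "T" for an expository or dialogue title and "" for a picture shot.
# Credit titles are assumed to have been removed already.

def exclude_titles(shots: list[tuple[float, str]]) -> list[float]:
    """Keep only the lengths of shots that are not flagged as titles."""
    return [length for length, label in shots if label.strip().upper() != "T"]

shots = [(3.2, ""), (4.0, "T"), (1.6, ""), (7.5, ""), (2.4, "T")]
print(exclude_titles(shots))  # [3.2, 1.6, 7.5]
```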
The descriptive statistics of the original and restored versions of The Masquerader are presented in Table 1 and the empirical cumulative distribution functions are presented in Figure 1.
Table 1 Descriptive statistics of the original and restored versions of The Masquerader (Charles Chaplin, 1914)
From Table 1 we can see that the original estimate of the median shot length (3.7s [95% CI: 2.8, 4.6]) is consistent with the revised estimate (4.5s [95% CI: 2.7, 6.3]). However, there is a large difference in the dispersion of shot lengths, as indicated by the increase in the upper quartile and the interquartile range. This indicates that the version of The Masquerader from which the original data were collected is less consistent with the restored version in the upper part of the distribution, although a two-sample Kolmogorov-Smirnov test indicates there is no statistically significant difference (D = 0.1485, p = 0.265).
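The descriptive statistics in the tables (median with a 95% confidence interval, quartiles, and interquartile range) can be sketched with the Python standard library. The confidence interval here is the distribution-free interval based on order statistics and the Binomial(n, 0.5) distribution; this is an assumption, as the symmetric intervals quoted above suggest they may have been calculated by a different method. Quartile conventions also differ between programs, so the values will not necessarily match the tables exactly.

```python
import statistics
from math import comb

def median_ci(data, alpha=0.05):
    """Distribution-free confidence interval for the median.

    Chooses the order statistics x_(j) and x_(n-j+1) so that the coverage,
    1 - 2 * P(B <= j - 1) with B ~ Binomial(n, 0.5), is at least 1 - alpha.
    """
    x = sorted(data)
    n = len(x)
    cdf = 0.0
    j = 0  # largest rank j with P(B <= j - 1) <= alpha / 2
    for i in range(n + 1):
        cdf += comb(n, i) / 2 ** n
        if cdf > alpha / 2:
            break
        j = i + 1
    if j == 0:
        raise ValueError("sample too small for this confidence level")
    return x[j - 1], x[n - j]

def describe(data):
    """Median, median CI, quartiles, and IQR of a list of shot lengths."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n": len(data),
        "median": statistics.median(data),
        "median_95ci": median_ci(data),
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
    }

print(describe(list(range(1, 11))))  # median 5.5, 95% CI (2, 9), IQR 4.5
```

Because the interval uses ranks rather than a normal approximation, it is not affected by the long upper tail typical of shot length data.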
Figure 1 Empirical cumulative distribution functions of shot lengths in the original and restored versions of The Masquerader (Charles Chaplin, 1914)
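The empirical cumulative distribution functions in the figures and the two-sample Kolmogorov-Smirnov test used throughout this post can be sketched in plain Python. D is the largest vertical distance between the two ECDFs. The p-value below comes from the asymptotic Kolmogorov distribution with the small-sample adjustment given in Numerical Recipes; this is an approximation, and statistical packages often report exact p-values for samples of this size, so the figures will not necessarily match those quoted above. The shot lengths in the usage lines are hypothetical.

```python
import math
from bisect import bisect_right

def ecdf(data):
    """Return F(t): the proportion of observations less than or equal to t."""
    xs = sorted(data)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n

def kolmogorov_sf(lam):
    """P(K > lam) for the limiting Kolmogorov distribution."""
    if lam <= 0:
        return 1.0
    if lam < 1.18:
        # This series converges quickly for small lam and gives P(K <= lam).
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2))
                for k in range(1, 20))
        p = 1.0 - math.sqrt(2 * math.pi) / lam * s
    else:
        # This alternating series converges quickly for large lam.
        p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                    for k in range(1, 20))
    return min(1.0, max(0.0, p))

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D with an asymptotic p-value."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ECDFs only jump at observed values, so the largest difference
    # is reached at one of the pooled data points.
    d = max(abs(fa(t) - fb(t)) for t in set(a) | set(b))
    ne = len(a) * len(b) / (len(a) + len(b))
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, kolmogorov_sf(lam)

# Hypothetical shot lengths (seconds) for two versions of one film.
original = [2.1, 3.7, 1.4, 5.0, 3.3, 8.2, 2.9, 4.4]
restored = [2.3, 4.5, 1.6, 6.8, 3.1, 12.0, 3.0, 5.2]
print(ks_two_sample(original, restored))
```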
Looking at the same information for The Rounders (Table 2 and Figure 2), we note that there is a much larger discrepancy between the two versions of this film. The original estimate of the median shot length was 3.6s (95% CI: 2.5, 4.7), and the revised estimate is 5.0s (95% CI: 3.5, 6.5). Again there is an increase in the dispersion of shot lengths, and again it is more marked in the upper part of the distribution. Once more, a two-sample Kolmogorov-Smirnov test indicates no statistically significant difference between the two distribution functions at the 5% level (D = 0.1922, p = 0.087).
Table 2 Descriptive statistics of the original and restored versions of The Rounders (Charles Chaplin, 1914)
Figure 2 Empirical cumulative distribution functions of shot lengths in the original and restored versions of The Rounders (Charles Chaplin, 1914)
There are no such large differences between the versions of The New Janitor (Table 3 and Figure 3). The medians are consistent, with only a small change in the estimate from 3.5s (95% CI: 2.4, 4.5) to 4.2s (95% CI: 3.2, 5.1). There is also a small increase in the interquartile range, which is accounted for by the small difference between the upper quartiles. However, this difference is not comparable to those observed for The Masquerader and The Rounders, and the cumulative distribution functions indicate no significant difference between the distributions of shot lengths in the two versions (Kolmogorov-Smirnov: D = 0.1184, p = 0.515).
Table 3 Descriptive statistics of the original and restored versions of The New Janitor (Charles Chaplin, 1914)
Figure 3 Empirical cumulative distribution functions of shot lengths in the original and restored versions of The New Janitor (Charles Chaplin, 1914)
The two versions of Getting Acquainted (Table 4 and Figure 4) show only a small difference in the upper quartile and the interquartile range, but otherwise the two sets of shot length data are consistent (Kolmogorov-Smirnov: D = 0.0622, p = 0.978). The original estimate of the median is 3.9s (95% CI: 3.3, 4.5) and the revised estimate is 4.0s (95% CI: 3.3, 4.7), so these are nearly identical.
Table 4 Descriptive statistics of the original and restored versions of Getting Acquainted (Charles Chaplin, 1914)
Figure 4 Empirical cumulative distribution functions of shot lengths in the original and restored versions of Getting Acquainted (Charles Chaplin, 1914)
Although I have looked at just four films here, we can see that the difference in the median shot lengths is generally small for three of the films and would not substantially change how we interpret this information. However, the increase in the dispersion of the upper part of the distribution for the restored version of The Masquerader is a good example of why it is not enough to refer only to measures of location in the analysis of film style: we must also look at dispersion. The difference between the two versions of The Rounders will obviously lead us to reconsider our conclusions based on this data. Hopefully, when I have finished transcribing the data for the other Chaplin Keystones from the restored versions, a clearer understanding of how to deal with different estimates of the shot lengths in a motion picture will emerge.
The shot length data for the restored versions of The Masquerader, The Rounders, The New Janitor, and Getting Acquainted can be accessed as an Excel 2007 (.xlsx) file here: Nick Redfern – BFI Restored Chaplin 1. This data was collected by loading the films into Magix Movie Edit Pro 14 at 25 fps, and has been corrected by multiplying each shot length by 25/24.
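The 25/24 correction undoes the PAL speed-up: a film transferred to a 25 fps DVD frame for frame plays about 4% faster than at 24 fps, so shot lengths measured at 25 fps come out about 4% too short. A minimal sketch of the correction, assuming 24 fps as the intended speed (which, as noted above, may not be the original running speed of a silent film):

```python
PAL_FPS = 25   # rate at which the DVD was loaded into the editing software
FILM_FPS = 24  # assumed intended running speed

def corrected_length(frames: int) -> float:
    """Shot length in seconds at FILM_FPS, from a frame count taken at PAL_FPS."""
    seconds_at_pal = frames / PAL_FPS
    # Multiplying by 25/24 undoes the roughly 4% PAL speed-up.
    return seconds_at_pal * PAL_FPS / FILM_FPS

print(round(corrected_length(96), 2))  # 96 frames: 3.84 s at 25 fps, 4.0 s corrected
```

Multiplying by 25/24 is equivalent to dividing the frame count by 24 directly; the two-step form simply mirrors how the data were collected.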
Towards the end of 2008 I wrote this short piece comparing the shot lengths of four films directed by Charles Chaplin and submitted it to In Short, an online journal at the University of Miami, where it was accepted for publication. Like many online journals, In Short appears to have contributed more to the CVs of those on its editorial board than it has to scholarship: its website has now disappeared, with no communication as to when (or if) anything will be published and no response to my queries as to what has happened. So to put this piece into the public domain I have included it here; as usual, you can download the pdf file, and the abstract is below: Nick Redfern – Shot length distributions in the early films of Charles Chaplin
The distribution of shot lengths in a motion picture is an indicator of film style, and is typically positively skewed with a number of outlying data points. Consequently, assumptions about the distribution of data for parametric statistics cannot be met and nonparametric tests are preferred for analysing quantifiable aspects of film style. This study uses nonparametric statistics as a method of comparing the distribution of shot lengths in motion pictures. Four films directed by Charles Chaplin from 1914 and 1915 were analysed to determine if the distribution of shot lengths was consistent in the works of a single director over time. Two-sample Kolmogorov-Smirnov tests failed to identify a significant difference in films directed by Chaplin in the same year, but did identify significant differences in films directed by Chaplin in different years. These results may be accounted for by Chaplin’s move from the Keystone Film Company to the Essanay Film Manufacturing Company, suggesting that studio is a determining factor in film style at this stage of Chaplin’s career.
I came across a useful paper on interpreting graphs such as the one I use in the above paper, and this is worth reading: Herman Callaert, Nonparametric hypotheses for the two-sample location problem, Journal of Statistics Education 7 (2) 1999: http://www.amstat.org/publications/jse/secure/v7n2/callaert.cfm.
I’ve also just noticed that there is a paper on the use of nonparametric tests in the latest issue of the same journal: Dwayne R. Derryberry, Sue B. Schou, and W. J. Conover, Examples: Teaching rank-based tests by emphasizing structural similarities to corresponding parametric tests, Journal of Statistics Education 18 (1) 2010: www.amstat.org/publications/jse/v18n1/derryberry.pdf.
This week I have another draft of a Cinemetrics paper, this time looking at shot length distributions in Keystone films starring Charles Chaplin and directed by Chaplin, Mack Sennett, Mabel Normand, George Nichols, and Henry Lehrman. You can download the pdf here: Nick Redfern – Shot Length Distributions in the Chaplin Keystones, and the abstract is given below.
Cinemetrics provides an objective method by which the stylistic characteristics of a filmmaker may be identified. This study uses shot length distributions as an element of film style in order to analyse films featuring Charles Chaplin made for the Keystone Film Company by five directors. A total of 17 Keystone films are analysed: six directed by Chaplin himself, along with others directed by Henry Lehrman, George Nichols, Mabel Normand, and Mack Sennett. Shot length data was collected for each film and then combined to create data sets for the studio style and for each director. The results show that for the distribution of shot lengths in Keystone films starring Chaplin (1) there is no significant difference between films directed by Chaplin and the overall Keystone model; (2) there is no significant difference between Chaplin’s films and those of Lehrman, Nichols, and Sennett; (3) there is a significant difference between the films of Normand and the Keystone model but the effect size is small; and (4) there is a significant difference between Normand and the other Keystone filmmakers but the effect size of these differences is again small. This study shows that the distribution of shot lengths can be used to identify how the style of an individual filmmaker relates to a larger group style; and that, in the specific case of the Keystone Film Company, it is the studio style of fast-paced, slapstick comedy that determines the distribution of shot lengths, with little variation present in the films of individual filmmakers.
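A significant difference with a small effect size is easier to grasp with a concrete measure. One common effect size for rank-based comparisons is the probability of superiority: the chance that a randomly chosen shot from one set is longer than a randomly chosen shot from the other, with ties counted as half. It equals the Mann-Whitney U statistic divided by the product of the sample sizes, and Cliff’s delta is a rescaling of it. This is offered as an illustration only; it is not necessarily the effect size measure used in the paper.

```python
def prob_superiority(a, b):
    """Probability that a shot drawn from a is longer than one drawn from b,
    counting ties as half. Equal to U_a / (n_a * n_b) for Mann-Whitney U."""
    wins = 0.0
    for x in a:
        for y in b:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(a) * len(b))

def cliffs_delta(a, b):
    """Cliff's delta, P(X > Y) - P(X < Y), ranging from -1 to 1."""
    return 2 * prob_superiority(a, b) - 1

# Hypothetical shot lengths (seconds) for two directors.
director_a = [2.0, 4.0, 6.0]
director_b = [1.0, 3.0, 5.0]
print(prob_superiority(director_a, director_b))  # about 0.67
print(cliffs_delta(director_a, director_b))      # about 0.33
```

A value of 0.5 (or a delta of 0) means neither set of shots tends to be longer, so a small effect size shows up as a probability close to 0.5 even when a test on large samples reports a significant difference.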
As before, any comments and suggestions are welcome (as is the pointing out of glaring errors).
The raw data was collected by examining the films frame by frame in my editing software, and can be accessed as a Microsoft Word document here:
For Microsoft Word 97-2003 (.doc): Nick Redfern – Shot length distributions in the Chaplin Keystones – data
For Microsoft Word 2007 (.docx): Nick Redfern – Shot length distributions in the Chaplin Keystones – data