Buckland on Spielberg
Although Poltergeist (1982) is credited to Tobe Hooper, it is widely held that its director was in fact Steven Spielberg, who also wrote and produced the film. In Directed by Steven Spielberg: Poetics of the Contemporary Hollywood Blockbuster, Warren Buckland undertakes what he calls a statistical analysis of a group of films in order to solve the riddle of who directed Poltergeist (2006: 154-173) [1]. Buckland sets out his intentions for this chapter clearly:
‘Through a shot-by-shot analysis, I use statistical methods to compare and contrast Poltergeist to a selection of Hooper’s and Spielberg’s other films,’ in order to ‘determine how Poltergeist’s style conforms to and deviates from Spielberg’s and Hooper’s filmmaking strategies’ (155).
Here I review the statistical approach adopted by Buckland. Specifically, I address four issues: the design of the study; the statistical methodology employed; the presentation of the results; and the conclusions drawn.
I do not address the rest of the book; my critique is limited to the chapter that deals with the statistical analysis of Spielberg’s and Hooper’s films.
The study
Buckland’s analysis compares Poltergeist to two films directed by Spielberg (ET and Jurassic Park) and one film (The Funhouse) and one TV movie (Salem’s Lot) directed by Tobe Hooper. It is reasonable that we would want to compare the work of interest (Poltergeist) to the work of the two possible directors, but alarm bells should be ringing already.
First, Poltergeist was released in 1982 – the same year as ET, while The Funhouse was released in 1981, and Salem’s Lot was aired in 1979. Jurassic Park, however, was released in 1993; and so while four of the works in question are contemporary with one another, one is from a decade later. Is it reasonable to assume that Spielberg’s style remained unchanged from 1982 to 1993 so that a direct comparison is possible? It is not unreasonable to suggest that Spielberg’s style did not change from ET to Jurassic Park, but equally it is not unreasonable to expect that it did. In the period 1901 to 1912, Picasso moved through his blue, rose, and cubist periods – might we not expect Spielberg to also have developed as a filmmaker over the course of a decade? What impact might new filmmaking techniques and technologies developed throughout the 1980s have had on his film style? We might expect the results to reflect the fact that the exemplars for Hooper are contemporary with Poltergeist, while this is only the case for one of the Spielberg films.
Furthermore, of the five films considered, four were produced for release into cinemas, while Salem’s Lot was produced for television. Might we not expect the results to reflect the fact that Poltergeist was made for cinemas like the two Spielberg films, while this is only the case for one of the Hooper films, and so indicate a difference in media rather than director? Buckland addresses this in a note to the chapter (173, n.2), where he points out that the percentage of medium close-ups in Salem’s Lot is consistent with that in The Funhouse – although he simply asserts this and does not perform any test of this hypothesis (see below). It is the case that there is no significant difference between the proportion of medium close-ups in Salem’s Lot (0.33 [0.28, 0.39]) and The Funhouse (0.36 [0.30, 0.42]) (Z = 0.6459, p = 0.5183), but there is a significant difference between the proportion of reverse angle shots (see Table 2 below). Buckland’s justification for using a TV movie is, then, very weak indeed and open to challenge.
There is the potential for bias in the study, and it is not clear that it can achieve what it sets out to do. This is the result of failing to establish the style of Hooper and Spielberg before conducting a comparison of the two. Is Spielberg consistent over the course of a decade in his use of film style? Is Hooper consistent in his style when moving between film and television? Buckland states that a pattern of film style is ‘created by a director’s sensibility, or intuition, a series of consistent habits that constitute a director’s style’ (158), but he has failed to demonstrate that this is actually the case for either Spielberg or Hooper.
Statistical methodology
Sampling
Buckland’s data is taken from only the first thirty minutes of each film, and this has the potential to distort the results. This sampling strategy requires the assumption that the rest of the film will be of similar style to the first half hour – not necessarily an unreasonable judgment, but equally one which may turn out to be unjustifiable. As I have shown elsewhere, calculating the mean shot length on the basis of the first thirty minutes of a film may under- or over-estimate the true value. This may be attributed to a film reaching a dramatic climax, for example, where the pace of the editing may increase relative to the early portion of a film, which may have longer shots and scenes for exposition. Equally, when calculating the proportion of shots that are of a particular scale, we may find that the style changes as the film progresses.
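To see how a thirty-minute sample can mislead, here is a minimal sketch in Python. It assumes we have a list of individual shot durations in seconds (the shot lengths below are invented for illustration, not taken from any of the films discussed), keeps the shots that begin within the first 1,800 seconds, and compares the mean shot length of that sample with the mean for the whole film. The function name `first_n_seconds` is hypothetical.

```python
from statistics import mean

def first_n_seconds(shot_lengths, limit=1800.0):
    """Return the shots that begin before `limit` seconds into the film."""
    sample, elapsed = [], 0.0
    for length in shot_lengths:
        if elapsed >= limit:
            break
        sample.append(length)
        elapsed += length
    return sample

# Hypothetical film: 30 minutes of slow exposition (10 s shots)
# followed by a fast-cut climax (2 s shots).
film = [10.0] * 180 + [2.0] * 600

opening = first_n_seconds(film)
print(len(opening), mean(opening))      # 180 10.0
print(len(film), round(mean(film), 2))  # 780 3.85
```

In this deliberately extreme case the opening half hour overstates the whole-film mean by a factor of more than two and a half; a real film will rarely be so lopsided, but without the rest of the film neither the direction nor the size of the error can be known.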
Estimation
A flaw in Buckland’s presentation of his results – and a flaw in the use of statistics in film studies more generally – is the confusion of statistics with parameters. It is worth reading Mark Schuster’s paper ‘Informing Cultural Policy: Data, Statistics, and Meaning’ (Schuster 2002) before proceeding with any statistical analysis, because he sets out some fundamental principles of statistical analysis in a clear and accessible manner. First, he makes a distinction between data and statistics:
It has become quite common to treat the words ‘data’ and ‘statistics’ as synonyms. We prefer the word ‘statistics,’ perhaps, when we wish to signal seriousness of purpose; but we prefer ‘data’ when we don’t wish to threaten the system that is being measured.
But statistics and data are not the same. Statistics are measures that are created by human beings; they are calculated from raw data by people who are wishing to detect patterns in those data. We calculate means, modes, standard deviations, chi-squared statistics, slopes of regression lines, correlation coefficients, and so on; we aggregate in a wide variety of ways, we eliminate outliers, we normalize calculations, we truncate time series. In short, we generate mathematical summaries that we think are appropriate to the questions with which we are grappling at a particular moment in time. And we have debates about which statistic will capture better the particular element of human behavior in which we are interested.
This is why it is not only silly but perhaps even dangerous to say that we will ‘let the data speak for themselves.’ We calculate statistics from data in order to say something about them.
Schuster then goes on to make a distinction between statistics and parameters:
Statistics are mathematical summaries of the relationships we observe in the data we have actually been able to collect, often from systematically drawn samples. Parameters are mathematical summaries of the relationships that we would observe if we were able to collect complete and accurate data about the behavior of entire populations. Statistics are estimates of parameters. In the end, we are interested in parameters, but statistics are the best we can do.
Statistics and parameters are often distinguished by the use of different symbols: roman letters are used for statistics, while Greek letters are used for parameters. For example, the sample correlation coefficient r is an estimate of the population coefficient ρ, and the sample standard deviation s is an estimate of the population standard deviation σ.
Buckland – like everyone else writing about the statistical analysis of film style – presents statistics as parameters and not as estimates of parameters. For example, on the basis of the first thirty minutes of ET, Buckland states that the mean shot length is 6.25 seconds. Now, for the first thirty minutes of ET we can take this to be a parameter (it describes all of the data in the first half hour), but if we want to use this figure to describe the whole film then it is a statistic (an estimate of the parameter for the whole film). Unfortunately, as a statistic it is useless because it is not accompanied by any measure of the error of the estimate – the mean shot length is presented without a standard deviation or standard error to indicate the variability of the data, or confidence intervals to indicate the possible values of the true mean shot length. Is 6.25 seconds a good estimate of the mean shot length for ET? We do not, and on the basis of the information provided by Buckland we cannot, know.
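If the individual shot durations are known, the missing information is easy to supply. The Python sketch below (standard library only) reports the mean together with its standard error and an approximate 95 per cent confidence interval. It assumes a normal approximation (z ≈ 1.96) rather than the t distribution, which is reasonable only when a sample contains many shots; the shot lengths and the function name `mean_with_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_with_ci(values, level=0.95):
    """Return the sample mean, its standard error and an approximate CI.

    Uses a normal approximation, so it is only suitable for large samples.
    """
    m = mean(values)
    se = stdev(values) / sqrt(len(values))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return m, se, (m - z * se, m + z * se)

# Hypothetical shot lengths (seconds) for the opening of a film.
shots = [2.0, 3.5, 4.0, 6.0, 12.5] * 20
m, se, (lo, hi) = mean_with_ci(shots)
print(f"mean = {m:.2f} s, SE = {se:.2f} s, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reported this way, a reader can judge how precisely the sample pins down the mean, which a bare figure such as 6.25 seconds does not allow.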
This problem arises because of the way in which the mean shot length of a film is often calculated: the running time is divided by the number of shots. This method will tell you what the mean shot length is, but it does not make it possible to calculate any other statistics because the actual duration of each shot is not known. For example, the standard deviation is calculated from the difference between each value in the data set and the mean of that data set, but if you do not know the value of each data point then this is not possible. Consequently, we have no measure of the variability of the data, and this makes any subsequent analysis impossible. I cannot assess the validity of Buckland’s claims about the mean shot length, because the appropriate statistical tests (a t-test for independent samples or a one-way ANOVA, depending on how you choose to compare the films) require the standard deviation (however, see below on the non-normal nature of shot length distributions). Nor can I calculate confidence intervals for the mean shot length, because this again would require the standard deviation [2].
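The point can be made concrete with two invented films that have the same running time and the same number of shots. Dividing running time by shot count gives an identical mean for both, but only the individual shot durations reveal that their variability is completely different. The films and the function name `mean_from_running_time` are hypothetical.

```python
from statistics import stdev

def mean_from_running_time(running_time, n_shots):
    """Mean shot length as often calculated: running time / number of shots."""
    return running_time / n_shots

# Two hypothetical films: 30 seconds and six shots each.
film_a = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]   # every shot the same length
film_b = [1.0, 1.0, 1.0, 2.0, 5.0, 20.0]  # mostly short shots, one long take

for name, shots in (("A", film_a), ("B", film_b)):
    msl = mean_from_running_time(sum(shots), len(shots))
    # The standard deviation needs every shot length, not just the total.
    print(name, msl, round(stdev(shots), 2))
# A 5.0 0.0
# B 5.0 7.51
```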
The unusual thing is that Buckland must have, in fact, determined the length of each shot – he presents data on the proportion of shot lengths that lie in the range 1-3 seconds, for example. He also presents the skew of the shot length data for each film, and the calculation of this statistic would require knowing the duration of each shot. Why, if this information was available, was it not included in the study?
The skew of each film in Buckland’s study is large, and this raises the question of why the mean shot length is used as a statistic of film style when the shot length distributions for each film are asymmetrical. A true normal distribution will have a skew of zero, but life is never convenient and a dataset will almost never have a true normal distribution. Some (but not all) statistics textbooks suggest that the assumption of normality is reasonable when the skew is greater than -0.8 and less than 0.8; if the skew lies outside this interval, then the assumption of normality is not valid. For the five films in Buckland’s study, the skew values are 2.7, 2.7, 5.5, 5.6, and 4.1. As I have shown elsewhere, the median is a more robust statistic when dealing with data sets that are positively skewed with outlying data points, as shot length data typically are. A statistic is ‘robust’ if it is not influenced by outliers: the mean is very sensitive to outliers, and just a single value that is very different from the rest of the data can wreck the mean as a measure of central tendency. The median is not affected in this way. The mean shot length should not have been used as a statistic of film style, and the conclusions Buckland draws on the basis of the mean shot length are worthless.
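A short Python sketch shows both points with a hypothetical set of shot lengths containing a single long take. The skewness function uses the adjusted Fisher-Pearson coefficient (the same formula as Excel's SKEW function); Buckland does not say which formula he used, so this choice is an assumption, and the name `sample_skewness` is hypothetical.

```python
from statistics import mean, median, stdev

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    g = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * g

# Hypothetical shot lengths in seconds, with one long take at the end.
shots = [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 6.0, 60.0]
print(round(sample_skewness(shots), 2))  # well outside -0.8 to 0.8
print(mean(shots), median(shots))        # 10.875 4.0

# Dropping the long take moves the mean a long way but not the median.
print(round(mean(shots[:-1]), 2), median(shots[:-1]))  # 3.86 4.0
```

One long take nearly triples the mean, while the median stays at 4 seconds whether or not that shot is included.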
Testing
Buckland alerts the reader that his chapter on Poltergeist will involve a detailed analysis making thorough use of statistics:
I need to warn the reader that this chapter contains a lot of number crunching and statistical testing, which are necessary if we want to make an informed judgment about the creative force behind Poltergeist. The results of my analysis may surprise you (155).
What statistical tests are employed in this analysis?
None.
The statements Buckland makes about the style of each director and their relation to Poltergeist are simply assertions based on whether one number is similar to another. There is no statement of what is considered to be a statistically significant result – i.e. there is no value for α and no decision rules – and so there is no means by which we can judge the reliability of the results.
This is all the more bizarre because in Elsaesser and Buckland (2002) we find the following statement:
… some films such as Poltergeist have disputed authorship (was it directed by Tobe Hooper or Steven Spielberg?). By systematically analyzing the parameters of the shots in Poltergeist, and then comparing the results to samples from Hooper’s and Spielberg’s other films, it may be possible to identify the film’s authorship (defined in terms of mise en shot, that is, the parameters of the shot). Of course, because we move from descriptive to inferential statistics, then the result can never be certain, but only predicted with a degree of probability. Only the descriptive aspect of the analysis remains beyond doubt.
On a cautionary note, the variables chosen to determine a director’s style need to be valid (…). Secondly, the results need to be statistically significant, rather than due to chance occurrence. Many statistical tests are in fact tests for significance.
Why, then, does Buckland not employ any tests of statistical significance, when clearly he is at least aware that such tests exist? It all sounds very good, but in practice there is little of substance.
To demonstrate how this analysis could have been done, I look at the proportion of different types of shots in the five films in the study. The obvious test for comparing each director’s use of certain types of shots is the Z-test for two proportions, but the Fisher Exact Test can also be used. For an explanation of how to do the Z-test for two proportions, see David M. Lane’s Hyperstat website. For an explanation and online calculator for the Fisher Exact Test (as well as many other calculators), see the Graphpad website (the Fisher Exact Test is under the heading ‘Categorical data’).
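For readers who prefer code to an online calculator, the Z-test for two proportions can be sketched in a few lines of Python using only the standard library. This is the pooled, two-sided form of the test; the function name `two_proportion_z_test` and the counts in the example are hypothetical, not Buckland's data.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided Z-test of H0: p1 == p2.

    x1, x2 are the numbers of shots of a given type; n1, n2 the total shots.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical example: 60 of 100 shots in one film, 40 of 100 in another.
z, p = two_proportion_z_test(60, 100, 40, 100)
alpha = 0.05
print(f"Z = {z:.4f}, p = {p:.4f}, significant: {p <= alpha}")
```

The normal approximation behind this test is poor when counts are very small, which is one reason to prefer the Fisher Exact Test in such cases.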
For example, Buckland states that:
On average, 58 per cent of Hooper’s shot scales fall within the ‘big close-up to medium close-up’ range; for Spielberg, the figure is only 45 per cent. In Poltergeist, 55 per cent of the shot scales fall within this range, significantly closer to Hooper than Spielberg (164-165).
What does ‘significantly’ mean in this paragraph? There are several problems here. First, in statistics ‘significant’ has a specific (if controversial) meaning – it defines the amount of evidence required to reject a null hypothesis (though quite how you interpret this evidence depends on your preference for the Fisher or Neyman-Pearson approach to hypothesis testing, or the hybrid of the two). However, we cannot judge the significance of Buckland’s claim in these terms – we have a statement that sounds like statistics but is in fact not. We have (again) the presentation of averages without measures of dispersion or confidence intervals, and no significance test is performed. In the above paragraph, the use of the term ‘significant’ sounds good, but it is, from the point of view of statistics, meaningless: how close does ‘close’ have to be to be ‘significant?’ What procedure will we use to calculate ‘close?’ The main problem is that the issue is presented back to front: in statistical hypothesis testing, we always test the null hypothesis of no difference. This is not what Buckland describes: he says that the proportion for Hooper is significantly nearer to that of Poltergeist than the proportion for Spielberg. But how do we frame a statistical hypothesis to express this? A simple way is to compare the proportion for each director against that of Poltergeist. We state our hypotheses:
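- The null hypothesis is: ‘the proportion of close-ups (big close-ups to medium close-ups) in Poltergeist is equal to that of the films directed by Hooper/Spielberg.’
- The alternative hypothesis is: ‘the proportion of close-ups (big close-ups to medium close-ups) in Poltergeist is not equal to that of the films directed by Hooper/Spielberg.’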
The significance level is set at 0.05: if we get a p-value that is equal to or less than 0.05 we will say that there is a statistically significant difference, and if the p-value is greater than 0.05 we will say that there is no statistically significant difference. This is our decision rule. The p-value is NOT the probability that a hypothesis is true; it is the probability of getting a result that is equal to or more extreme than that observed if the null hypothesis is true. Nor is it the probability that we have incorrectly concluded that there is a statistically significant difference: that risk is controlled by the significance level, which fixes how often we would reject a true null hypothesis in the long run. It is important to remember that a statistically significant result is not a practically significant result, and how the former relates to the real world situation you are analysing requires careful interpretation. A significance test of the above hypothesis will not tell us why there is or is not a difference; but if we assume that the decisions of filmmakers determine the style of a film, and that different filmmakers making different decisions will have different styles, we must first determine if such a difference can be said to exist. Statistics is one method of doing this, but not the only one.
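The definition of the p-value can be made concrete with the Fisher Exact Test mentioned above, which computes it directly: it enumerates every 2×2 table with the same row and column totals as the observed one and adds up the probabilities of those that are at least as extreme. Here ‘at least as extreme’ is taken to mean ‘no more probable than the observed table’, which is one common two-sided convention. The sketch below is a minimal Python version; the function name `fisher_exact_two_sided` and the counts in the example are hypothetical.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows might be two films; columns 'shots of type x' and 'other shots'.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return Fraction(comb(col1, k) * comb(n - col1, row1 - k), comb(n, row1))

    observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum over all tables that are no more probable than the one observed.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= observed)
    return float(p)

# Hypothetical counts: 3 of 4 shots of type x in one film, 1 of 4 in another.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

Exact fractions are used so that tables with tied probabilities compare equal; with floating-point arithmetic a small tolerance would be needed instead.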
To answer the question as to which director is closer to Poltergeist and which is further away, we need to address the effect size of the difference. A significance test will tell us if there is a statistically significant difference, and the effect size will tell us how big that difference is. Unfortunately, there is not enough data to be able to do this for Buckland’s experiment. Nonetheless, it is important to be clear that the p-value does not tell you the size of a difference.
The results of a Z-test of proportions for our hypotheses at a significance level of 0.05 are presented in Table 1.
TABLE 1 Proportion of close-ups (big close-ups to medium close-ups) (α = 0.05)
In Table 1, we have a lot of information. The first column (P) gives the proportion of close-ups (big close-ups to medium close-ups) in the three data sets, with the confidence interval in the second column so that we know the error of the estimate. The third column (Difference) calculates the difference between the sample for each director and the sample for Poltergeist, and the fourth column is the confidence interval for this difference. (This will give us some limited understanding of ‘closer,’ but is not the same as the effect size.) The fifth column gives the result of the Z-test and the sixth column is the p-value. Note that for Hooper the p-value is greater than 0.05, and so we say that there is no statistically significant difference between Hooper and Poltergeist. For Spielberg, the p-value is less than 0.05, and so we say that there is a statistically significant difference between this director and Poltergeist. Buckland’s conclusion is vindicated by the statistical analysis – but without defining the hypotheses, without the statistical test, and without defining what we mean by significance, we are just guessing, and guessing is not research.
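The proportions, differences and confidence intervals in a table like Table 1 can be computed along the following lines. This Python sketch assumes the simple Wald (normal-approximation) interval; that choice is an assumption, and other intervals, such as the Wilson interval, are often preferred for small samples or for proportions near 0 or 1. The function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def proportion_ci(x, n, z=Z95):
    """Proportion x/n with a Wald confidence interval."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p, (p - half, p + half)

def difference_ci(x1, n1, x2, n2, z=Z95):
    """Difference p1 - p2 with an unpooled Wald confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - half, diff + half)

# Hypothetical counts: 60 of 100 shots in one film, 40 of 100 in another.
p, (p_lo, p_hi) = proportion_ci(60, 100)
d, (d_lo, d_hi) = difference_ci(60, 100, 40, 100)
print(f"P = {p:.2f} [{p_lo:.2f}, {p_hi:.2f}]")
print(f"Difference = {d:.2f} [{d_lo:.2f}, {d_hi:.2f}]")
```

An interval for the difference that excludes zero corresponds roughly to a significant Z-test, although the two can disagree at the margin because the test uses a pooled standard error and this interval does not.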
How good are Buckland’s other guesses? We can find out by performing statistical tests on a range of stylistic elements for which Buckland provides data. For the rest of these tests I will not state the hypotheses explicitly (in research papers, hypotheses are typically implicit rather than explicit), but the null hypothesis in each case (unless otherwise stated) is of the form ‘the proportion of x in film y is equal to the proportion of x in Poltergeist.’ The test used in each case is the Z-test for two proportions, and the significance level is 0.05.
In Table 2 we can see the results of applying the Z-test to the proportion of reverse angle shots; and what they tell us is that there is no statistically significant difference between Poltergeist and ET, Jurassic Park, or The Funhouse, while there is a significant difference between Poltergeist and Salem’s Lot. It is possible that, as a television programme (viewed on a smaller screen in the intimate setting of the home), Salem’s Lot uses reverse angle cuts in a different way to motion pictures designed to be viewed on a cinema screen. This is a hypothesis that can be tested statistically if you have the data: do films made for television have a greater proportion of reverse angle cuts than films made for theatres? If so, then the decision to include a made-for-television movie, which Buckland justifies on the basis of one element of film style (see above), is flawed, and the results will reflect the difference between two media and not two directors. Either way, looking at this element of film style leads us to no firm conclusion about who could be considered the author of Poltergeist.
TABLE 2 Proportion of reverse angle shots in Poltergeist against four films (α = 0.05)
The same is true when we look at the proportion of low angle shots (Table 3). The results show that there is no significant difference between the proportion of low angle shots in Poltergeist and ET or The Funhouse, but that there is a significant difference between the proportion of low angle shots in Poltergeist and both Salem’s Lot and Jurassic Park. There is no conclusion that we can draw here about the authorship of Poltergeist.
TABLE 3 Proportion of low angle shots in Poltergeist against four films (α = 0.05)
We also cannot draw any conclusion based on the proportion of high angle shots (Table 4), which shows a significant difference between Poltergeist and ET, but no significant difference between Poltergeist and the other three films.
TABLE 4 Proportion of high angle shots in Poltergeist against four films (α = 0.05)
Buckland argues that the proportion of shots with a low camera height in Poltergeist is more akin to the films of Spielberg than to those of Hooper; and if Spielberg did not actually direct Poltergeist, then Buckland suggests (reasonably) that this may have been a creative suggestion from one filmmaker (Spielberg) to another (Hooper). The results of the Z-test show that Poltergeist has a significantly different proportion of low camera height shots from both The Funhouse and Salem’s Lot, and we may conclude that a proportion of 0.53 is certainly unusual for what we know about Hooper’s film style. There is no significant difference between the proportion of low camera height shots in Poltergeist and ET, and so we could conclude that placing the camera at this height originated with Spielberg (either as his own decision or as a suggestion to Hooper), were it not for the fact that Jurassic Park shows a statistically significant difference from Poltergeist. Buckland’s argument that ‘we can infer that the decision to use so many low camera heights in Poltergeist was Spielberg’s suggestion, which constitutes one of the pieces of advice he offered to Hooper on the set’ (163) is therefore not supported by the data, because we cannot, in fact, conclude from the results in Table 5 that the use of low camera height shots in Poltergeist is typical of Spielberg. Note that the confidence interval for the proportion in ET does not include the proportion for Jurassic Park, and vice versa. This example demonstrates clearly why it is necessary to perform statistical tests and not simply make assertions based on the fact that one number is more like a second number than another: 0.42 looks close enough to 0.53 for Spielberg’s influence to be plausible – especially when the proportions for the Hooper films are 0.29 and 0.33 – but the Z-test leads us to the alternative conclusion.
This does not mean that Spielberg did not influence Hooper’s decision to place the camera at a low height – but it is not a statistically sound conclusion.
TABLE 5 Proportion of low camera height shots in Poltergeist against four films (α = 0.05)
Things are clearer when we look at the proportion of moving shots: there are significant differences between Poltergeist and the two Spielberg films, but no significant difference between Poltergeist and the two Hooper films. In isolation, we might interpret this as a clear indication that Poltergeist was directed by Hooper. However, when interpreted in relation to the other types of shot Buckland includes, this serves only to confuse the issue.
TABLE 6 Proportion of moving shots in Poltergeist against four films (α = 0.05)
Again, the proportion of shots in the range 1-3 seconds (Table 7) seemingly paints a clear-cut picture in which Hooper did direct Poltergeist. Taken with the moving shots, we might argue that the only elements of film style that can distinguish one filmmaker from another are these two statistics – but this is a highly selective interpretation of the available evidence, and it would be necessary to explain why reverse angle shots, low angle shots, etc., should not be used. As Buckland bases his interpretation on all the available data, the results in Table 7 are inconclusive when viewed in the context of the rest of the data. We can only conclude that there are some differences between some of the films on some measures.
TABLE 7 Proportion of shots in the range 1-3 seconds in Poltergeist against four films (α = 0.05)
All of this assumes that Hooper’s and Spielberg’s films are stylistically different from one another, but is this, in fact, the case? For example, if we compare the proportion of shots in the range 1-3 seconds in ET and Jurassic Park against The Funhouse and Salem’s Lot (see Table 8), we find that we cannot simply distinguish between Spielberg and Hooper as film directors. Neither Salem’s Lot nor The Funhouse shows a significant difference from ET, while both films are significantly different from Jurassic Park. We might conclude, therefore, that the director of Jurassic Park was not the same as the director of Salem’s Lot and The Funhouse; but if we did so, would we not also need to consider the possibility that the director of ET did direct Salem’s Lot and The Funhouse? This is made even more complicated by the fact that ET shows no significant difference from Jurassic Park in the proportion of shots in the range 1-3 seconds (Z = 1.4443, p = 0.1487), and that there is no significant difference between Salem’s Lot and The Funhouse (Z = 0.2371, p = 0.8126). Should we then conclude that the director of ET also directed The Funhouse, Salem’s Lot, and Jurassic Park, but that the director of Salem’s Lot, The Funhouse, and ET did not direct Jurassic Park? Buckland describes these films as being of ‘undisputed authorship’ (157), and certainly there is no reason to think that the director in each case has been inaccurately credited – but is there any statistical evidence to support this? Is statistics even able to answer this question?
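The underlying problem is that ‘no significant difference’ is not transitive. A minimal Python sketch with three hypothetical films (the counts are invented, not drawn from the films discussed) shows how film A can be statistically indistinguishable from B, and B from C, while A and C differ significantly. It uses a pooled two-proportion Z-test like the one behind the tables above; the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided Z-test of H0: p1 == p2. Returns (z, p_value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical films: (shots in the 1-3 second range, total shots).
films = {"Film A": (30, 100), "Film B": (40, 100), "Film C": (50, 100)}
alpha = 0.05

significant = {}
for (name1, (x1, n1)), (name2, (x2, n2)) in combinations(films.items(), 2):
    z, p = two_proportion_z_test(x1, n1, x2, n2)
    significant[(name1, name2)] = p <= alpha
    print(f"{name1} vs {name2}: Z = {z:.4f}, p = {p:.4f}")
```

Here A vs B and B vs C are not significant, but A vs C is: a failure to find a difference is not evidence that two films share a style, so chains of ‘no significant difference’ cannot be used to group films by director.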
TABLE 8 Z-test of the proportion of shots in the range 1-3 seconds in four films (α = 0.05)
Presentation
One of the problems with Buckland’s analysis is that it is difficult to follow. This is due to the poor presentation of the data, which is organised by film rather than by variable. As a result, we find the relevant statistics for reverse angle shots on five different pages, and have to spend time hunting for and organising this data. This makes it difficult to compare and contrast the different stylistic elements. Hopefully you will have found the tables produced here clear and simple to understand, with all the relevant data easily to hand. In Table 2, for example, the proportion of reverse angle shots in each film is presented in a single column, so that rather than having to flip from page to page you can see all the data at once. It is far easier to identify patterns in the data when they are presented side-by-side.
This might seem like a small and pedantic point, but if you want to present the reader with a detailed statistical analysis, then you have to make it clear for them to follow and to understand. It is especially irritating given that the diagrams in the book’s other chapters are clear and easy to understand. It raises questions about the ability of Buckland, his readers, and the editors at Continuum to deal with statistical information – why, when everything else appears to have been done so much better, was the presentation of the statistics done so badly?
Conclusions
Buckland concludes that Hooper was the director of Poltergeist, but that Spielberg had an input on key stylistic decisions. This seems to me to be an entirely plausible description of the working relationship between two filmmakers who fulfilled the roles of director (Hooper), and producer and screenwriter (Spielberg). However, it is not a conclusion that can be reached through a statistical analysis of some elements of film style.
A further problem lies in the way in which the research question behind the chapter is framed. Buckland asks who the author of Poltergeist is: Spielberg or Hooper. This assumes an all-or-nothing conception of authorship that is parceled out to one of two pre-selected individuals. What if the answer is neither (or even both)? What if there is no such thing as authorship in the cinema? Or if such a thing does exist, what if it cannot be identified by the statistical analysis of those elements of film style and can only be located in the non-quantifiable, such as mise-en-scene? We are also assuming that a statistically significant difference reflects the practical difference that a filmmaker’s decisions make to film style – not necessarily an unreasonable assumption, but one that needs to be considered in the design of the experiment.
We could just drop the authorship question entirely and ask who, on the basis of the results presented here, should be credited as the director of Poltergeist? (These two questions are presented as equal by Buckland and there is no reason not to do this, but they could be separated). Well, some measures would seem to favour Spielberg, while others would favour Hooper. We certainly cannot apportion some role of direct creative agency as ‘author’ based on statistics if we cannot use those statistics to say who, in fact, directed the film! Table 9 summarises whether the proportion of different shot types is different for each film against Poltergeist, and we can see that there is no consistent pattern for these elements of film style.
TABLE 9 Statistically significant differences in shot types between Poltergeist and four films (Z-test for two proportions, α = 0.05)
We might also question the results that do indicate significant differences, because performing multiple tests increases the chance of a false positive. We have assumed a significance level of 0.05 for each test, which means that across the 20 tests in Table 9 we would expect about one significant result by chance alone even if there were no real differences between the films (20 × 0.05 = 1). It is therefore quite possible that at least one ‘YES’ in Table 9 is a false positive, but we cannot know which one. One remedy is to correct the significance level to take multiple testing into account, thereby reducing the critical p-value. This would make our decision rule much more stringent, and some of the significant differences above would be reclassified as ‘not significant.’ For the 20 hypothesis tests presented in Table 9, the Bonferroni correction gives a critical p-value of 0.05/20 = 0.0025, which keeps the probability of making at least one false positive across the whole experiment at no more than 5%.
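The arithmetic behind the correction is simple enough to check by hand, but a short Python sketch makes the assumptions explicit. The chance of at least one false positive is computed assuming that all 20 null hypotheses are true and that the tests are independent; the tests in Table 9 are not strictly independent, since they all share the Poltergeist data. The function name `bonferroni_significant` and the p-values in the last line are hypothetical.

```python
alpha, m = 0.05, 20  # significance level and number of tests in Table 9

# Expected number of false positives if every null hypothesis is true.
expected_false_positives = alpha * m
# Chance of at least one false positive, assuming independent tests.
p_at_least_one = 1 - (1 - alpha) ** m
# Bonferroni-corrected significance level for each individual test.
bonferroni_alpha = alpha / m

print(round(expected_false_positives, 2), round(p_at_least_one, 4), round(bonferroni_alpha, 4))
# 1.0 0.6415 0.0025

def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction."""
    return [p <= alpha / len(p_values) for p in p_values]

print(bonferroni_significant([0.001, 0.02, 0.04, 0.2]))  # [True, False, False, False]
```

Bonferroni is the simplest such correction and is conservative; less stringent alternatives such as the Holm procedure control the same familywise error rate.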
On the back cover of Directed by Steven Spielberg we find the promise that,
Buckland also uses poetics to answer once and for all the question: did Spielberg really direct Poltergeist? The reader will discover whether Poltergeist should remain a Tobe Hooper film, or whether it should be added to Spielberg’s canon.
If we adopt a statistical approach, what can we conclude about the roles of Spielberg and Hooper in the production of Poltergeist? Well, nothing, it turns out, and the reader will discover nothing. The results of the tests presented above are too inconclusive, too topsy-turvy, and too open to conflicting interpretations to justify the conclusion that either Spielberg or Hooper should be credited as author or, indeed, as director. All data is open to multiple interpretations, but we should at least be able to (1) explain the logic behind a particular interpretation, (2) give reasons why one interpretation is to be considered to be better than another, and (3) subject that interpretation to further scrutiny. As I have shown here, Buckland’s study fails on all three counts due to the potentially flawed design of the study, the lack of a statistical methodology and the failure to provide all the necessary information, and the difficulty in understanding the data presented due to its poor organisation.
Summary
Buckland makes bold claims for his chapter on Poltergeist, and promises that the results of his analysis may surprise the reader. Unfortunately, there is little surprising about the standard of the statistical analysis in this book, and the mistakes Buckland makes are the same mistakes that have been made for over thirty years in film studies. For example, to my knowledge no one has ever conducted a statistical test or provided a confidence interval when making statements about film style based on figures such as average shot lengths or the proportion of a type of shot in a film; and Bordwell and Thompson (1985) made precisely the same mistake in their use of the term ‘significant’ that Buckland makes 21 years later. Sample statistics are presented as though they were population parameters, and there are no measures of dispersion or confidence intervals. The wrong statistics are used when the data clearly indicate the need for alternative methods.
Notes
- Unless otherwise stated, all page references are to this chapter.
- Charles O’Brien (2005: 83) does provide standard deviations for some data, including standard deviations for some of Barry Salt’s data that do not appear to be in Salt (1992), but makes no reference to them and performs no statistical tests.
References
Bordwell D and Thompson K 1985 Toward a scientific film history? Quarterly Review of Film Studies 10 (3): 224–237. Available online: http://www.davidbordwell.net/articles/Bordwell_Thompson_QuarterlyRevFilmStud_vol10_no3_summer1988_224.pdf, accessed 18 November 2009.
Buckland W 2006 Directed by Steven Spielberg: Poetics of the Contemporary Hollywood Blockbuster. London: Continuum.
Elsaesser T and Buckland W 2002 Studying Contemporary American Film: A Guide to Movie Analysis. London: Arnold. The chapter on the statistical analysis of film style can be accessed online: http://www.cinemetrics.lv/buckland.php, accessed 18 November 2009.
O’Brien C 2005 Cinema’s Conversion to Sound: Technology and Film Style in France and the U.S. Bloomington: Indiana University Press.
Salt B 1992 Film Style and Technology: History and Analysis, second edition. London: Starwood.
Schuster M 2002 Informing cultural policy – data, statistics, and meaning, International Symposium on Cultural Statistics, UNESCO Institute for Statistics, Observatoire de la culture et des communications du Québec, Montréal, Québec, Canada, October 21 to 23, 2002. Available online: http://www.culturalpolicies.net/web/files/74/en/Schuster.pdf, accessed 18 November 2009.
Location and spread in shot length distributions
The typical characteristics of the distribution of shot lengths in a motion picture are:
- The distribution is decidedly non-normal – it is positively skewed. Although it is possible to conceive of a film that would have a normal or even a negatively skewed distribution of shot lengths, this does not occur in practice, and I have never come across a film in which the shot lengths were not positively skewed.
- The distribution will include some outlying data points that are far from the average value (the mean or the median shot length).
An additional characteristic worth exploring is the linear relationship between the average value of a shot length distribution and the spread of the data around that value. Figures 1 to 6 plot the average value (the mean or the median shot length) of each of the 50 Hollywood films I used in my analysis of the impact of sound technology on film style (20 silent and 30 sound) against three measures of absolute dispersion: the standard deviation, the interquartile range, and the median absolute deviation. The coefficient of determination (R2) is given as a measure of the linear relationship between location and spread. The correlation coefficients for all the comparisons are significant at the 0.05 level.
Figure 1 Mean shot length v. standard deviation for silent Hollywood films produced from 1920 to 1928 inclusive (n = 20).
Figure 2 Median shot length v. interquartile range for silent Hollywood films produced from 1920 to 1928 inclusive (n = 20).
Figure 3 Median shot length v. median absolute deviation for silent Hollywood films produced from 1920 to 1928 inclusive (n = 20).
Figure 4 Mean shot length v. standard deviation for sound Hollywood films produced from 1929 to 1931 inclusive (n = 30).
Figure 5 Median shot length v. interquartile range for sound Hollywood films produced from 1929 to 1931 inclusive (n = 30).
Figure 6 Median shot length v. median absolute deviation for sound Hollywood films produced from 1929 to 1931 inclusive (n = 30).
In general, the linear relationship between location and spread for these films is evident, but may be quite weak. The strongest linear relationship occurs between the median shot length and the median absolute deviation, and the strength of this relationship increases from the silent to the sound era. In both eras a substantial proportion of the variance is unexplained, but overall, films with greater median shot lengths exhibit greater variation in their shot lengths.
The relationship between the mean shot length and the standard deviation shows weaker linearity, with approximately one-third of the variance unexplained for both groups of films, although there is a small increase in the strength of the relationship from the silent to the sound era.
The relationship between the median and the interquartile range (IQR) is weakly linear for the sound films, but only very weakly linear for the silent films: although r is significant for the silent films (t [18] = 4.1090, p = 0.0007), over half the variance in the IQR is unexplained. R2 is 0.4840 for the silent films and 0.7490 for the sound films, though why such a difference should occur for this relationship and not for the others is a mystery. There is clearly something about the relationship between the median shot length and the interquartile range in the sample of silent films that requires further exploration.
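The t value reported for the silent films can be recovered from R2 and the sample size with the standard test for a correlation coefficient: t = r√(n − 2)/√(1 − r²), on n − 2 degrees of freedom. This is a minimal sketch with a function name of my own. Because R2 discards the sign of r, it assumes the correlation is positive, as it is here.

```python
import math


def t_for_correlation(r_squared: float, n: int) -> float:
    """t statistic (df = n - 2) for H0: rho = 0, given R^2 from a simple linear fit.

    Assumes r is positive, since the sign is lost when r is squared.
    """
    r = math.sqrt(r_squared)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)


# Silent films: R^2 = 0.4840, n = 20 (values from the text above)
print(round(t_for_correlation(0.4840, 20), 3))  # 4.109
```

The p-value then comes from the t distribution with 18 degrees of freedom, for example via `scipy.stats.t.sf` doubled for a two-sided test.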
We can say that for Hollywood films of the 1920s and early 1930s, as the average shot length of a motion picture increases so does the variability of shot lengths. As expected for skewed data sets, the linear relationships between measures that do not depend on the mean are the strongest. It seems likely that other groups of films will exhibit similar relationships between measures of location and spread (although perhaps not for the median and the IQR), but it will take further studies to test this hypothesis.
The relative dispersion of shot lengths
Studies comparing the change in shot length distributions in Hollywood films with the coming of synchronous sound have focused on measures of central location – the mean or median shot length of a film. The change in the mean shot length from the silent to the sound era has been put at approximately six seconds, although this figure is suspect because of the asymmetrical nature of shot length distributions, while the change in the median shot length has been estimated at 2.9 seconds. Similar attention has not been paid to the change in the dispersion of shot lengths that also occurred in the shift from silent to sound cinema. In fact, it is common for mean shot lengths to be presented with no measures of dispersion at all, and this severely hampers any useful interpretation of the results.
In my study of the impact of sound on shot length distributions I noted that the interquartile ranges of sound films were greater than those of silent films, indicating that there is greater variation in the shot length distributions of the sound films. While this method of comparing the variation of shot length distributions is perfectly fine, it is perhaps not the simplest method, and measures of relative dispersion may prove easier to interpret.
Measures of Relative Dispersion
In order to compare the relative dispersion of shot length distributions, three measures of relative dispersion were calculated for each film from a sample of Hollywood silent films produced from 1920 to 1928 inclusive (n = 20) and from a sample of sound films produced in Hollywood from 1929 to 1931 inclusive (n = 30) (see my earlier study for the descriptive statistics of these films). The mean values of each coefficient for the two samples were compared using a t-test assuming unequal variances. Calculations were conducted using Microsoft Excel 2007 and GraphPad Instat v3.10 (2009).
The three measures of dispersion considered are the coefficient of variation (CV), the quartile coefficient of dispersion (QCD), and the coefficient of median deviation (MD). The relative measures of dispersion for the silent films are presented in Table 1 and for the sound films in Table 2.
TABLE 1 Relative measures of dispersion for Hollywood silent films, 1920 to 1928
TABLE 2 Relative measures of dispersion for Hollywood sound films, 1929 to 1931
Coefficient of variation
The coefficient of variation is the ratio of the standard deviation to the mean:
CV = SD/M
The coefficient of variation for the sound films (M = 1.1912, SD = 0.2319) is greater than that for the silent films (M = 0.9015, SD = 0.1393), t (47) = 5.5217, p < 0.0001. On this measure of dispersion, the shot lengths of a Hollywood sound film are almost a third (32.14%) more dispersed than those of the silent films.
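The t-test assuming unequal variances (Welch's test) can be reproduced from the summary statistics alone. This is a sketch with my own function name. Because the inputs are the rounded values given above, the result (t ≈ 5.51, df ≈ 47.6) differs slightly from the reported t (47) = 5.5217, which was presumably computed from unrounded values. `scipy.stats.ttest_ind_from_stats` with `equal_var=False` does the same calculation and also returns a p-value.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    from two groups' means, standard deviations and sample sizes."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Coefficient of variation: sound (n = 30) v. silent (n = 20), values from the text
t, df = welch_t(1.1912, 0.2319, 30, 0.9015, 0.1393, 20)
print(round(t, 2), round(df, 1))

# Percentage increase of the sound mean over the silent mean
pct = 100 * (1.1912 - 0.9015) / 0.9015
print(round(pct, 2))  # 32.14
```

The same two calls apply to the quartile coefficient of dispersion and the coefficient of median deviation, using their means and standard deviations from the text.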
Quartile coefficient of dispersion
The quartile coefficient of dispersion is calculated using the lower (Q1) and upper (Q3) quartiles of the shot length distribution:
QCD = (Q3 - Q1)/(Q3 + Q1)
The quartile coefficient of dispersion for the sound films (M = 0.5748, SD = 0.0617) is greater than that for the silent films (M = 0.4833, SD = 0.0522), t (45) = 5.6409, p < 0.0001. On this measure of dispersion, the shot lengths of a Hollywood sound film are almost a fifth (18.93%) more dispersed than those of the silent films.
Coefficient of median deviation
The coefficient of median deviation is the ratio of the median absolute deviation from the median shot length (MAD) to the median shot length [1]:
MD = MAD/Median
The coefficient of median deviation for the sound films (M = 0.5825, SD = 0.0680) is greater than that for the silent films (M = 0.4735, SD = 0.0473), t (47) = 6.6813, p < 0.0001. On this measure of dispersion, the shot lengths of a Hollywood sound film are almost a quarter (23.01%) more dispersed than those of the silent films.
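The three coefficients defined above can be computed for a single film as follows. This is a minimal sketch. The 'inclusive' quartile method is an assumption, since quartile conventions differ between packages and this changes QCD slightly, and the function name is my own.

```python
import statistics


def relative_dispersion(shots):
    """CV, QCD and MD for one film's shot lengths, following the definitions above.

    Quartiles use the 'inclusive' interpolation method (an assumption); other
    packages may use a different convention and give slightly different QCDs.
    """
    mean = statistics.mean(shots)
    sd = statistics.stdev(shots)                       # sample standard deviation
    q1, _, q3 = statistics.quantiles(shots, n=4, method="inclusive")
    median = statistics.median(shots)
    mad = statistics.median([abs(x - median) for x in shots])
    return {
        "CV": sd / mean,                    # CV = SD/M
        "QCD": (q3 - q1) / (q3 + q1),       # QCD = (Q3 - Q1)/(Q3 + Q1)
        "MD": mad / median,                 # MD = MAD/Median
    }


# Toy example: one unusually long shot replaces a 5-second shot
print(relative_dispersion([1, 2, 3, 4, 5]))
print(relative_dispersion([1, 2, 3, 4, 100]))
```

In the toy example, the single 100-second shot raises CV from about 0.53 to about 1.98, while QCD and MD both stay at 1/3. This illustrates why the coefficients based on the median and quartiles are less sensitive to outlying shots.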
Discussion
All three measures of relative dispersion provide similar results, but the coefficient of median deviation is the most reliable.
While the coefficient of variation makes complete use of the data and is the best understood of measures of relative dispersion, it relies on the mean shot length. As the distribution of shot lengths in a motion picture is typically positively skewed with a number of outlying data points, the mean shot length is an unreliable statistic of film style. Consequently, the coefficient of variation can be expected to overestimate the dispersion of shot lengths in a film as the mean value is pulled towards the higher end of the distribution.
The quartile coefficient of dispersion is not dependent upon the mean shot length and so provides a more robust estimation of relative dispersion than the coefficient of variation. A drawback is that it uses only a limited amount of information in calculating the coefficient, and as a film may feature shot lengths that are much greater than the upper quartile it may underestimate the actual dispersion of shot lengths.
Like the quartile coefficient of dispersion, the coefficient of median deviation does not use the mean shot length and can be relied upon as a more robust measure of relative dispersion than the coefficient of variation. It has an advantage over the quartile coefficient of dispersion in that it uses more of the data, calculating the absolute deviation of each shot length from the median rather than relying on just two positional values. For the films looked at here, the quartile coefficient of dispersion can be regarded as an estimator of the coefficient of median deviation.
In conclusion, we can say that with the introduction of synchronous sound to Hollywood in the late 1920s we see not only an increase in the median shot length of a motion picture, but also an increase in the variation of shot lengths in sound films relative to silent films. Using the coefficient of median deviation, we can estimate that increase to be of the order of 23%.
Notes
- The coefficient of median deviation is based on the coefficient of mean deviation, but replaces the average absolute deviation with the median absolute deviation in order to prevent extra weight being given to shots of unusually long duration.
The empirical analysis of film style
The analysis of film style by empirical means – i.e. the use of statistics – is an important part of film studies. It is also an important part of other disciplines – information management, research on emotion, advertising, computational media aesthetics – and this tends to be overlooked by film scholars. This week’s post includes a range of references to the analysis of film style – and most of these can be accessed for free on the internet.
References to books may only be available through Google Books, in which case the previews available may be limited. As a general rule, I have not included film studies texts that refer to statistics of film style unless they deal in some way with methods of analysis. If a piece is available online but only through a subscription service I have not included a link. The links were correct as of the date of posting, but if anyone finds a broken link let me know.
This list is by no means exhaustive, but it does give a range of papers that bring new approaches to film studies in areas that have not really been explored and which can enable film scholars to link together different fields: just how does the fast cutting of adverts have an emotional impact on consumers? How do we define genres in terms of their quantitative features rather than the qualitative?
Adams B 2003 Where does computational media aesthetics fit? IEEE Multimedia 10 (3): 18-27.
Adams B, Dorai C, and Venkatesh S 2002 Formulating film tempo: the computational media aesthetics methodology in practice, in C Dorai and S Venkatesh (eds) Media Computing: Computational Media Aesthetics. Norwell, MA: Kluwer Academic Publishers: 57-79.
Adams B, Dorai C, and Venkatesh S 2000 Role of shot length in characterizing tempo and dramatic story sections in motion pictures, IEEE Pacific Rim Conference on Multimedia, 13-15 December 2000, Sydney, Australia: 54–57.
Adams B, Dorai C, and Venkatesh S 2000 Study of shot length and motion as contributing factors to movie tempo, 8th ACM International Conference on Multimedia, 30 October – 3 November 2000, Los Angeles, CA: 353–355.
Adams B, Dorai C, and Venkatesh S 2002 Finding the beat: an analysis of the rhythmic elements of motion pictures, The 5th Asian Conference on Computer Vision, 23-25 January 2002, Melbourne, Australia.
Bordwell D and Thompson K 1985 Toward a scientific film history? Quarterly Review of Film Studies 10 (3): 224–237.
Brandt M 1994 Traditional film editing vs. electronic nonlinear film editing: a comparison of feature films, Nonlinear. NB: There’s no description of the statistical tests used in this study even though it states that the results are statistically significant. As no value for α is given, it is hard to judge what ‘statistically significant’ means in this context.
Buckland W 2008 What does the statistical style analysis of film involve? Literary and Linguistic Computing 23 (2): 219-230. NB: This is a review of Barry Salt’s Moving into Pictures (see below), and contains an error (confusing the correlation coefficient for the coefficient of determination) that is not in Salt’s book.
Dorai C and Venkatesh S 2001 Computational media aesthetics: finding meaning beautiful, IEEE Multimedia 8 (4): 10-12.
Dorai C and Venkatesh S 2002 Bridging the semantic gap in content management systems – computational media aesthetics, in C Dorai and S Venkatesh (eds) Media Computing: Computational Media Aesthetics. Norwell, MA: Kluwer Academic Publishers: 1-9.
Elsaesser T and Buckland W 2002 Studying Contemporary American Film: A Guide to Movie Analysis. London: Arnold. NB: The chapter on the statistical analysis of film style is available on the Cinemetrics website.
Fischer S, Lienhart R, and Effelsberg W 1995 Automatic recognition of film genres, Proceedings of the 3rd ACM Multimedia Conference, 5-9 November 1995, San Francisco, CA: 295-304.
Fujita K 1989 Shot length distributions in educational TV programmes, Bulletin of the National Institute of Multimedia Education 2: 107-116. This paper can be accessed via CiNii by clicking on ‘CiNii Fulltext PDF.’
Fujita K 1992 Shot length distributions in educational TV programmes and their characteristics, in H Motoaki, J Misumi, and B Wilpert (eds) Social, Educational, and Clinical Psychology. Proceedings of the 22nd International Congress of Applied Psychology, 22-27 July 1990, Kyoto, Japan: 192. NB: This appears to be a summary of the above paper.
Hanjalic A 2004 Content-based Analysis of Digital Video. Norwell, MA: Kluwer Academic Publishers.
Huang H-Y, Shih W-S, and Hsu W-H 2007 A movie classifier based on visual features, in WG Kropatsch, M Kampel, and A Hanbury (eds) Computer Analysis of Images and Patterns. Proceedings of the 12th International Conference on Computer Analysis of Images and Patterns, 27-29 August 2007, Vienna, Austria: 937-944.
Huang H-Y, Shih W-S, and Hsu W-H 2008 A film classifier based on low-level visual features, Journal of Multimedia 3 (3): 26-33.
Kang H-B 2003 Affective content detection using HMMs, Proceedings of the eleventh ACM International Conference on Multimedia 2-8 November 2003, Berkeley, CA: 259-262.
Kang H-B 2003 Emotional event detection using relevance feedback, Proceedings of the International Conference on Image Processing, 14-18 September 2003, Barcelona, Spain: 721-724.
Kang H-B 2003 Affective contents retrieval from video with relevance feedback, in TMT Sembok, HB Zaman, H Chen, S Urs, and SH Myaeng (eds) Digital Libraries: Technology and Management of Indigenous Knowledge for Global Access. Proceedings of the 6th International Conference on Asian Digital Libraries, 8-12 December 2003, Kuala Lumpur, Malaysia: 243-252.
Maclachlan J and Logan M 1993 Camera shot length in commercials and their memorability and persuasiveness, Journal of Advertising Research 33 (2): 57-61.
Mittal A, Fah CL, Kassim A, and Pagalthivarthi KV Context-based interpretation and indexing of video data, in U Srinivasan and S Nepal (eds) Managing Multimedia Semantics. Hershey, PA: IRM Press: 77-98.
Nack F 2002 The future of media computing, in C Dorai and S Venkatesh (eds) Media Computing: Computational Media Aesthetics. Norwell, MA: Kluwer Academic Publishers: 159-186.
Nothelfer CE, DeLong JE, and Cutting JE 2009 Shot structure in Hollywood film, Indiana Undergraduate Journal of Cognitive Science 4: 103-113.
Romatowska A 2004 Pickpocket: A statistical analysis, Offscreen 8 (4).
Rosenbaum J 2000 Is Ozu slow? Senses of Cinema 4.
Salt B 1974 Statistical style analysis of motion pictures, Film Quarterly 28 (1): 13-22.
Salt B 1992 Film Style and Technology: History and Analysis, second edition. London: Starwood.
Salt B 2001 Practical film theory and its application to TV series dramas, Journal of Media Practice 2 (2): 98-113.
Salt B 2006 Moving into Pictures: More on Film History, Style, and Analysis. London: Starwood.
Schaefer R 1997 Editing strategies in television news documentaries, Journal of Communication 47 (4): 69-88.
Taskiran CM and Delp EJ 2002 A study on the distribution of shot lengths for video analysis, SPIE Conference on Storage and Retrieval for Media Databases, 20-25 January 2002, San Jose, CA.
Tian Q and Zhang H-J 1999 Video shot detection and analysis: content-based approaches, in CW Chen, Y-Q Zhang (eds) Visual Information Representation, Communication, and Image Processing. New York: Marcel Dekker, Inc: 227-254.
Tiemens R 2004 A content analysis of political speeches on television, in KI Smith, K Kenny, S Moriarty, and G Barbatsis (eds) Handbook of Visual Communication: Theory, Methods, and Media. New York: Routledge: 385-404.
Totaro D 2004 Reflections on the Pickpocket statistical analysis, Offscreen 8 (4).
Truong BT and Venkatesh S 2005 Finding the optimal temporal partitioning of video sequences, Proceedings of IEEE International Conference on Multimedia and Expo, 6-9 July 2005, Amsterdam, Netherlands: 1182-1185.
Tsivian Y 2008 What is cinema? An agnostic answer, Critical Inquiry 34 (4): 754-776.
Vasconcelos N and Lippman A 2000 Statistical models of video structure for content analysis and characterization, IEEE Transactions on Image Processing 9 (1): 3–19.
Young C 2007 Fast editing speed and commercial performance, Admap 483: 30-33.
Young C 2007 Fast-working advertising, Admap 484: 32-34.
The impact of sound on film style
This post is the last of three draft papers that apply statistical analysis to questions of film style. In this one I focus on the impact of sound technology on shot length distributions by examining the change in the median shot length and the interquartile range. You can access the pdf here: Nick Redfern – The impact of sound technology on Hollywood film style, and the abstract is presented below.
Quantitative analyses of the impact of sound technology on shot lengths in Hollywood cinema have claimed that, with the coming of sound, the mean shot length increased from ~5s to ~11s, and that this indicates a major change in film style as cutting rates slowed. However, the mean shot length is not a robust statistic of film style due to the positive skew of the data and the presence of outlying data points in shot length distributions. The median shot length is shown to be a more robust statistic, unaffected by the shape of shot length distributions, and the impact of sound technology on Hollywood is analysed by looking at the median shot lengths of silent films produced between 1920 and 1928 (n = 20, median = 4.4s [95.86% CI: 3.7, 5.1]) and sound films produced from 1929 to 1931 (n = 30, median = 6.9s [95.72% CI: 5.9, 8.7]). The results show that there is an increase in shot lengths in the early sound era (Mann-Whitney U = 32.5, P < 0.0001, PS = 0.0542), but that this change is much less than that described by studies using the mean shot length (HLΔ = 2.9s [95% CI: 1.8, 4.1]). Looking at the interquartile ranges of the silent films (median = 4.8s [95.86% CI: 4.3, 5.7]) and the sound films (median = 10.7s [95.72% CI: 8.8, 12.1]), we see that there is an increase of HLΔ = 5.6 seconds (95% CI: 4.1, 7.1), indicating that shot lengths in sound films show greater variation than those of the silent era (Mann-Whitney U = 4, P < 0.0001, PS = 0.0067).
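The nonparametric quantities in this abstract can be sketched in Python: an order-statistic confidence interval for a median, the probability of superiority (PS = U/(n1 × n2)), and the Hodges-Lehmann shift (HLΔ). I am assuming, not stating, that these match the paper's calculations. They are consistent with its figures: the order-statistic interval has exact coverage 95.86% for n = 20 and 95.72% for n = 30, and 32.5/(20 × 30) ≈ 0.0542 and 4/600 ≈ 0.0067 match the reported PS values. All function names are my own.

```python
import statistics
from math import comb


def median_ci(data, alpha=0.05):
    """Distribution-free confidence interval for the median from order statistics.

    Returns (lower, upper, coverage). Because the binomial distribution is
    discrete, the exact coverage is usually a little above 1 - alpha.
    """
    x = sorted(data)
    n = len(x)

    def cdf(j):
        # P(B <= j) for B ~ Binomial(n, 0.5)
        return sum(comb(n, i) for i in range(j + 1)) / 2**n

    # k = largest value with P(B <= k - 1) <= alpha / 2
    k = 0
    while cdf(k) <= alpha / 2:
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested alpha")
    return x[k - 1], x[n - k], 1 - 2 * cdf(k - 1)


def mann_whitney_u(a, b):
    """U for sample a: pairs (x, y) with x > y count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def probability_of_superiority(a, b):
    """Estimated P(value from a > value from b), ties counted as half."""
    return mann_whitney_u(a, b) / (len(a) * len(b))


def hodges_lehmann_shift(a, b):
    """Hodges-Lehmann estimate of the shift from a to b: median of all b - a."""
    return statistics.median([y - x for x in a for y in b])


print(median_ci(list(range(1, 21))))  # uses the 6th and 15th values
print(median_ci(list(range(1, 31))))  # uses the 10th and 21st values
```

With `a` as the 20 silent-film medians and `b` as the 30 sound-film medians, `probability_of_superiority(a, b)` estimates how often a silent film has the longer median shot length. `hodges_lehmann_shift(a, b)` estimates how much longer the sound films' medians are.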
As before, I’ll leave this up for a while before submitting it to a journal (if I can find one), so feel free to comment.
Shot Length Distributions in the Chaplin Keystones
This week I have another draft of a Cinemetrics paper, this time looking at shot length distributions in Keystone films starring Charles Chaplin and directed by Chaplin, Mack Sennett, Mabel Normand, George Nichols, and Henry Lehrman. You can download the pdf here: Nick Redfern – Shot Length Distributions in the Chaplin Keystones, and the abstract is given below.
Cinemetrics provides an objective method by which the stylistic characteristics of a filmmaker may be identified. This study uses shot length distributions as an element of film style in order to analyse films by five directors featuring Charles Chaplin for the Keystone Film Company. A total of 17 Keystone films are analysed – six directed by Chaplin himself, along with others directed by Henry Lehrman, George Nichols, Mabel Normand, and Mack Sennett. Shot length data was collected for each film and then combined to create data sets based on the studio style and for each director. The results show that for the distribution of shot lengths in Keystone films starring Chaplin (1) there is no significant difference between films directed by Chaplin and the overall Keystone model; (2) there is no significant difference between Chaplin’s films and those of Lehrman, Nichols, and Sennett; (3) there is a significant difference between the films of Normand and the Keystone model but the effect size is small; and (4) there is a significant difference between Normand and the other Keystone filmmakers but the effect size of these differences is again small. This study shows that the distribution of shot lengths can be used to identify how the style of an individual filmmaker relates to a larger group style; and that, in the specific case of the Keystone Film Company, it is the studio style of fast-paced, slapstick comedy that determines the distribution of shot lengths, with little variation present in the films of individual filmmakers.
As before, any comments and suggestions are welcome (as is the pointing out of glaring errors).
The raw data was collected by examining the films frame by frame in my editing software, and can be accessed in a Microsoft Word Document here:
For Microsoft Word 97-2003 (x.doc): Nick Redfern – Shot length distributions in the Chaplin Keystones – data
For Microsoft Word 2007 (x.docx): Nick Redfern – Shot length distributions in the Chaplin Keystones – data
Shot scales in Hollywood and German cinema, 1910 to 1939
This week’s post presents a first draft of a piece on shot scales in Hollywood and German cinema from the 1910s to the 1930s. The methods applied have been discussed on this blog before, but this paper uses regression more systematically than has previously been the case. The file is available as a pdf here: Nick Redfern – Shot scales in Hollywood and German cinema, 1910 to 1939, and the abstract is presented below.
Statistical analysis is an important part of an inductive programme of research into film style enabling large groups of films to be analysed, identifying key trends, and identifying changes in film style between groups of films from different countries and time periods. In this paper, the use of shot scales in Hollywood and German cinema between 1910 and 1939 is analysed using linear regression of rank-frequency plots and nonparametric analysis of variance. The results show that Hollywood and German cinema underwent a similar change in the use of shot scales but that this change occurred at different times. The shift from a non-linear to a linear distribution of mean relative frequencies and the increased use of close-ups and medium close-ups for Hollywood cinema in the 1920s may be explained by formal and stylistic changes as the ‘classical’ Hollywood cinema superseded a more ‘primitive’ style, with the analysis of space through continuity editing replacing the distant framing and staging of an earlier film style. A similar change occurs in the style of German films but not until the 1930s, and this supports the argument that the development of film style in German cinema was influenced by that of Hollywood.
The results of this paper demonstrate what a simple and effective method the linear regression of rank-frequency plots can be: changes in film style over time and differences in national style between Hollywood and German cinema were identified precisely where historical research said they should be.
I still haven’t solved the problem of finding the most consistent model for a nonlinear distribution of the mean relative frequencies of shot scales. One possibility suggested by the results presented here is that different models may work for different periods or groups of films.
One of the things I’m most concerned with here is analysing groups of films. Film scholars tend to focus on individual films (in the way literary scholars focus on individual texts or art historians on individual paintings). This is fine, but I think it is a limiting approach if not accompanied by the analysis of large groups of films, and statistics can make this process much quicker and easier by identifying patterns of film style. In the words of André Bazin (‘La politique des auteurs’):
The American cinema is a classical art, but why not then admire in it what is most admirable, i.e. not only the talent of this or that filmmaker, but the genius of the system, the richness of its ever-vigorous tradition, and its fertility when it comes into contact with new elements …
The same also goes for German cinema.
Feel free to point out any typing errors (I am the world’s worst typist).
Any suggestions on further research or where to get this published are also welcome.
Sampling Reliability in Cinemetrics
Numerous studies that have sought to analyse film style in terms of cinemetrics have employed a sampling strategy based on using the first 1800 seconds (s) of a film (e.g. Buckland 2006, O’Brien 2005, Salt 1992). This strategy has been criticised for producing unreliable results (see Bordwell and Thompson 1985), and some who had earlier employed this approach to sampling have also come to reject it (e.g. Salt 2006). This post reviews the use of such sampling in cinemetrics.
Bordwell and Thompson’s criticisms
Bordwell and Thompson’s criticisms of the use of the first 1800s of a film as a sample form part of a larger critique of Salt’s Film Style and Technology: History and Analysis (1992; originally published 1983) published in the Quarterly Review of Film Studies in 1985. Their rejection of Salt’s method rests on the fact that their own values for the mean shot lengths for some films differ from those of Salt – and as such they regard this approach as unreliable and likely to result in mistaken conclusions with regard to the history of film style. These differences are presented in Table 1.
Table 1 Average shot lengths from Salt and Bordwell and Thompson
There are numerous problems with Bordwell and Thompson’s criticism. First, they present measures of location without the context of sample sizes and/or measures of dispersion (variance, standard deviation, range, interquartile range – cf. Buckland 2006). Second, where mean shot lengths are presented they are presented without confidence intervals or any other measures of sampling error. Third, the use of the mean shot length for skewed distributions with outlying data points is suspect, and it is not apparent that differences between sample and population means will reflect actual differences in the distributions: a handful of very long shots can have a dramatic impact on the mean, and if these shots are not evenly distributed throughout a film they will lead to the mean being incorrectly estimated. Using the sample and population median shot lengths could avoid this problem altogether, but this is not discussed. Fourth, no statistical tests are employed to determine whether the observed differences between sample and population means are actual differences or simply noise in the data. Salt’s figures were cited as being unreliable simply by virtue of the fact that they were different from Bordwell and Thompson’s. In their discussion of Salt’s results for Soviet cinema, Bordwell and Thompson state that Salt’s values are ‘significantly inaccurate’ (1985: 230). The term ‘significant’ has a particular meaning in statistics, relating to the probability of making a Type I error (a false positive: saying there is a statistically significant difference when, in fact, there is not). Bordwell and Thompson do not appear to be using ‘significant’ in this sense, even though they are discussing the difference between two averages. In what sense Salt’s figures are ‘significantly inaccurate’ is not clear, and there are no tests (such as a t-test or a Mann-Whitney U test) to support this argument.
Testing the 1800s sample
To test the usefulness of the 1800s approach, shot length data for 10 films produced in Hollywood in 1930 and 1931 was collected from the Cinemetrics database. Shot length data for the whole film (the population data) was compared against data for the first 1800s (the sample) using four methods: confidence intervals for the mean and for the median shot length; a Mann-Whitney U test; and a Kolmogorov-Smirnov test of the cumulative distribution functions. (Data for each sample was collected up to the shot ending nearest to 1800s, whether that shot ended before or after 1800s.) In all cases, the level of significance (α) is 0.05. Summary data for these films are presented in Tables 2 and 3.
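The two mechanical steps described above, cutting the sample at the shot ending nearest 1800s and computing the Kolmogorov-Smirnov statistic, can be sketched as follows. The function names are my own. The text does not say how a tie between two equally near shots was handled, so taking the earlier shot is an assumption. The sketch returns only the D statistic, and `scipy.stats.ks_2samp` can be used when a p-value is also needed.

```python
from bisect import bisect_right
from itertools import accumulate


def first_1800s_sample(shots, target=1800.0):
    """Shots from the start of a film up to the shot whose end time is nearest target.

    shots: shot lengths in seconds, in screen order. If two shots are equally
    near, the earlier one is used (an assumption; the text does not say).
    """
    ends = list(accumulate(shots))
    nearest = min(range(len(ends)), key=lambda i: abs(ends[i] - target))
    return shots[: nearest + 1]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in set(a) | set(b)
    )
```

For a film stored as a list `film`, `ks_statistic(first_1800s_sample(film), film)` compares the sample with the whole film. The empirical CDFs only change at observed values, so checking those values is enough to find the largest gap.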
Table 2 Summary statistics for shot lengths in 5 Hollywood films, 1930
Table 3 Summary statistics for shot lengths in 5 Hollywood films, 1931
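The 1800s sample can be cut from a film’s shot list mechanically. The following is a minimal Python sketch, not the procedure actually used for Tables 2 and 3: it assumes the shot lengths (in seconds) are held in a list in running order, it reads ‘the shot nearest to 1800s’ as the shot whose end point is closest to the 1800s mark (an assumption), and the function name first_1800s_sample is hypothetical.

```python
from itertools import accumulate

def first_1800s_sample(shot_lengths, cutoff=1800.0):
    """Return the opening shots up to the shot whose end point falls
    nearest to `cutoff` seconds (whether before or after it)."""
    if not shot_lengths:
        return []
    # Elapsed running time at the end of each shot.
    ends = list(accumulate(shot_lengths))
    # Index of the shot ending closest to the cutoff; min() keeps the
    # earlier shot if two are equally close.
    nearest = min(range(len(ends)), key=lambda i: abs(ends[i] - cutoff))
    return shot_lengths[: nearest + 1]

# Usage: the shots end at 1000, 1700, 1850 and 2250s, so the third is nearest.
print(first_1800s_sample([1000, 700, 150, 400]))  # [1000, 700, 150]
```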
Confidence intervals
The uncertainty in the measurement of the sample can be quantified by calculating a confidence interval, which may be defined as ‘a range of values for a variable of interest constructed so that this range has a specified probability of including the true value of the variable’ (Gillam et al 2007: 51). A confidence interval gives an estimated range that is likely to contain an unknown population parameter, and contains all the hypothetical values of that parameter that cannot be rejected. If samples of the same size are drawn repeatedly from the same population, and a 95% confidence interval is calculated for each sample, then about 95% of those intervals should contain the population parameter. In other words, if a significance level of 0.05 is chosen (a confidence level of 95%), the procedure has a 0.95 probability of producing an interval that contains the true value of the parameter. (Note that a statistic is an estimate of a parameter.) Confidence intervals are often preferred to significance tests as a means of expressing sampling error.
For film studies, the mean shot length of a sample is an estimate of the mean shot length of a population, and the confidence interval specifies the range of values that may be considered as estimates of the population mean at a specified level of confidence. Here, 95% confidence intervals for the sample means are used to estimate population means. Table 4 presents the sample means, the 95% confidence interval, and the population mean for each film.
Table 4 95% confidence intervals for 10 Hollywood films, 1930 – 1931
For eight out of ten films the sample mean is a good estimate of the population mean: the population means lie within the 95% confidence intervals for the sample means. However, for Born Reckless the sample (10.7s) underestimates the population (15.1s), and for Bad Company the sample (12.1s) overestimates the population (8.8s).
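The post does not say how the intervals in Table 4 were computed. As a hedged sketch only, the following Python code uses the normal approximation (sample mean ± 1.96 standard errors) rather than Student’s t, which makes little difference once a sample runs to more than a few dozen shots; the function names are hypothetical. population_mean_in_ci performs the check described above, and coverage simulates the repeated-sampling interpretation of a 95% interval using normally distributed data.

```python
import random
from statistics import NormalDist, fmean, stdev

def mean_ci(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * stdev(sample) / len(sample) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width

def population_mean_in_ci(sample, population, level=0.95):
    """True if the whole-film mean lies inside the interval for the sample mean."""
    lo, hi = mean_ci(sample, level)
    return lo <= fmean(population) <= hi

def coverage(true_mean=10.0, sd=3.0, n=100, trials=2000, seed=1):
    """Share of 95% intervals, from repeated normal samples of size n,
    that contain the true mean; this should come out close to 0.95."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = mean_ci([rng.gauss(true_mean, sd) for _ in range(n)])
        hits += lo <= true_mean <= hi
    return hits / trials

print(population_mean_in_ci([1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6]))  # True
```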
As all the films have skewed distributions and the maximum shot lengths exceed the upper quartile by a substantial amount, the mean is unlikely to be a reliable statistic of film style, and (as I have discussed elsewhere) the median shot length is more representative.
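The sensitivity of the mean to a few long takes is easy to demonstrate. In this Python sketch the shot lengths are invented for illustration and do not come from any of the films in Tables 2 and 3.

```python
from statistics import fmean, median

# Hypothetical shot lengths in seconds (illustrative only).
shots = [3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0]
with_long_take = shots + [120.0]  # add a single two-minute take

print(fmean(shots), median(shots))                    # 5.0 5.0
print(fmean(with_long_take), median(with_long_take))  # 19.375 5.25
```

One extra shot nearly quadruples the mean but moves the median by only a quarter of a second.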
Confidence intervals for the median and the Mann-Whitney U test
The Mann-Whitney U test is similar to a t-test, but does not rely on the means of a distribution. Where the two data sets being compared have the same shape, this test is a test of the difference of the medians.
It is also possible to construct confidence intervals for sample medians, and here I have provided 95% confidence intervals based on a binomial distribution using the sample size (n) and the desired percentile (0.5) (NB: this method tends to be conservative for the median, and so the confidence intervals will be at least 95%). Confidence intervals for the median are used in the same way as those for the mean.
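A minimal Python sketch of a binomial (order-statistic) interval for the median of the kind described above. It chooses symmetric order statistics so that coverage is at least the requested level, which is why such intervals are conservative; the function name median_ci is hypothetical, and this is not necessarily the exact routine used for Table 5.

```python
from math import comb

def median_ci(sample, level=0.95):
    """Distribution-free confidence interval for the median.

    With B ~ Binomial(n, 0.5), the interval [x_(j), x_(n-j+1)] between the
    j-th smallest and j-th largest values covers the population median with
    probability 1 - 2 * P(B <= j - 1). The largest j whose coverage is still
    at least `level` is used, so the interval is conservative.
    """
    x = sorted(sample)
    n = len(x)
    tail = (1 - level) / 2
    cdf = 0.0  # running value of P(B <= k)
    j = 0      # 0 means no order statistic gives enough coverage yet
    for k in range(n):
        cdf += comb(n, k) / 2 ** n
        if cdf > tail:
            break
        j = k + 1
    if j == 0:
        raise ValueError("sample too small for the requested confidence level")
    return x[j - 1], x[n - j]

print(median_ci(range(1, 11)))  # (2, 9): coverage 1 - 2 * 11/1024, about 0.979
```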
Table 5 presents the sample median, its 95% confidence interval, the population median, and the P-values of the Mann-Whitney U test.
Table 5 95% confidence intervals for the median and the Mann-Whitney U test for 10 Hollywood films, 1930 – 1931
Born Reckless and Bad Company are again the films for which the 1800s sample is unreliable for estimating the population data. Notice also that the Mann-Whitney U test for The Lottery Bride returns a P-value of 0.05 while the population median falls within the sample confidence interval: this suggests that, while there is no difference in the medians, the two data sets produce distributions with different shapes. This is a good example of why the Mann-Whitney U test cannot automatically be regarded as a test of medians.
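For readers who want to reproduce the Mann-Whitney U test without a statistics package, here is a minimal Python sketch. It assumes the normal approximation to the distribution of U, with tied values given average ranks but no tie or continuity correction in the variance, so for small samples or heavily tied data its P-values will differ somewhat from those of dedicated software; the function names are hypothetical.

```python
from statistics import NormalDist

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U for sample x and a two-sided P-value from the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # no tie correction
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, p)  # U = 0.0, p is about 0.0495
```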
Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test compares the cumulative distribution functions (cdf) of two data sets by calculating the maximum absolute difference between them. The cdf gives the probability of randomly selecting a shot from a film whose length is less than or equal to a specified value (Pr[X ≤ x]): if there is no significant difference between the sample and the population, then for any value x the probability of randomly selecting a shot no longer than x from the sample is approximately equal to the probability of selecting one from the population. For example, if we look at the cumulative distribution functions for the population and sample data for Animal Crackers (1930) (Figure 1), we can see that they are very similar. The D statistic is 0.0888 (P = 0.6200), and so there is no significant difference between the cdf of the population data and that of the sample data. For instance, the probability of randomly selecting a shot that is less than or equal to 24.7s is 0.7957 for the sample and 0.7794 for the population.
FIGURE 1 Population and sample cumulative distribution functions for Animal Crackers (1930)
The results of the K-S test for all ten films are presented in Table 6. There are no significant differences between the sample and population shot length distributions, except for the now familiar cases of Born Reckless and Bad Company.
Table 6 Kolmogorov-Smirnov test for 10 Hollywood films, 1930 – 1931
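The D statistic itself is simple to compute. The Python sketch below takes the largest gap between the two empirical cdfs and attaches an asymptotic P-value from the Kolmogorov distribution, using the small-sample adjustment to the effective sample size given in Numerical Recipes. It is an approximation, not necessarily what the software behind Table 6 does, and the function names are hypothetical.

```python
from bisect import bisect_right
from math import exp, sqrt

def ecdf(sorted_data, x):
    """Empirical Pr[X <= x] for an already sorted list."""
    return bisect_right(sorted_data, x) / len(sorted_data)

def kolmogorov_q(lam):
    """P(K > lam) for the Kolmogorov distribution (alternating series)."""
    if lam < 0.2:
        return 1.0  # the true value differs from 1 by less than 1e-12 here
    total = sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return min(max(2 * total, 0.0), 1.0)

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov D and an asymptotic P-value."""
    a, b = sorted(a), sorted(b)
    # The largest gap between two step functions occurs at an observed value.
    d = max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
    ne = len(a) * len(b) / (len(a) + len(b))  # effective sample size
    lam = (sqrt(ne) + 0.12 + 0.11 / sqrt(ne)) * d
    return d, kolmogorov_q(lam)

print(ks_two_sample([1, 2, 3], [4, 5, 6]))  # D = 1.0, P about 0.033
```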
Conclusion
There is a range of statistical methods that can be employed in comparing shot length distributions, but at present the statistical analysis of film style relies upon researchers simply pointing out the difference between two numbers in the absence of all the relevant facts (most notably measures of dispersion, which continue to be absent). Although statistical analyses have been promoted as ‘more credible and valid’ as an objective form of analysis that treats film style as an ordered response to the ‘challenges of filmmaking,’ and does not rely upon the subjective tastes of the critic (Buckland 2006: 159), this is not at present the case, and too much currently relies on guesswork and supposition. Human beings are notoriously prone to seeing patterns where none exist (pareidolia) and to missing them where they do in fact occur. Statistics is a set of methods for minimising the errors that can befall even the most diligent researcher, and their impact. Simply arguing about whether two numbers are slightly different is not statistics.
Applying these tests to determine the suitability of using the first 1800s of a film as a sample of the distribution of shot lengths for the whole film shows that this method is not foolproof. The 1800s sample may lead to an error in estimating the parameters of a film’s style, and, subsequently, lead to erroneous conclusions about the history of film style. The use of the complete data for a film is the only reliable method.
References
Bordwell, D., and Thompson, K. (1985) Toward a scientific film history? Quarterly Review of Film Studies 10 (3): 224–237. Available online: http://www.davidbordwell.net/articles/.
Buckland, W. (2006) Directed by Steven Spielberg: Poetics of the Contemporary Hollywood Blockbuster. London: Continuum.
Gillam, S., Yates, J., and Badrinath, P. (2007) Essential Public Health: Theory and Practice. Cambridge: Cambridge University Press.
O’Brien, C. (2005) Cinema’s Conversion to Sound: Technology and Film Style in France and the U.S. Bloomington: Indiana University Press.
Salt, B. (1992) Film Style and Technology: History and Analysis, second edition. London: Starwood.
Salt, B. (2006) Moving into Pictures: More on Film History, Style, and Analysis. London: Starwood.
Some (brief) notes on cinemetrics II
If anyone is getting confused as to why my comments are appearing and disappearing at the end of posts, it’s simply because there is no one quite so indecisive as the author …
Power Laws and cinemetrics
In an earlier post I wrote about power laws and the distribution of mean relative frequencies (MRFs) of shot scales.
I think that MRFs have a useful role to play in statistically analysing film style – they can tell us if a group of films is dominated by a single scale (Lang’s German films) or if there is a more evenly spread usage of scales (the Hitchcock films and Lang in Hollywood). Looking at MRFs will not tell us which scale is dominant, but that is easy to find out.
However, having looked at this area in more depth, I do not think that there is much of a future in the power law approach. Power regression does provide a good model, but it is not consistently the best. Exponential regression seems to be more consistent, and logarithmic regression is sometimes better too (though less consistently). For example, for five Swedish films produced between 1917 and 1920 (see Table 1 below), R² (power) = 0.9817, while R² (exponential) = 0.9897. This is not a big difference, but it is enough to say that a power law is probably not the best explanation for the trend in this data.
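Power and exponential fits like these are usually obtained by linearising: a power law y = c·x^k becomes a straight line in ln y against ln x, and an exponential y = c·e^(kx) becomes a straight line in ln y against x. The Python sketch below computes R² for each on those transformed scales. That this matches how the R² values above were obtained is an assumption (it is how spreadsheet trendlines typically work), the function names are hypothetical, and all values must be positive.

```python
from math import exp, log

def r_squared(x, y):
    """R² of the least-squares straight line through the points (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def compare_power_exponential(x, y):
    """R² of a power law (ln y on ln x) and of an exponential (ln y on x)."""
    ln_y = [log(v) for v in y]  # every y must be > 0
    return {
        "power": r_squared([log(v) for v in x], ln_y),
        "exponential": r_squared(x, ln_y),
    }

# Usage with synthetic data that is exactly exponential (not real MRFs):
ranks = list(range(1, 8))
print(compare_power_exponential(ranks, [0.4 * exp(-0.5 * r) for r in ranks]))
```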
This suggests that power laws are unlikely to be a good explanation for the distribution of MRFs. Of course, the problem now is that I’m also sceptical about the use of exponential regression: given a quick enough decline in the distribution of MRFs, an exponential regression line will give a very similar result to a linear one, and so it will not be possible to distinguish clearly between them. Overall, then, I think it is probably best just to use the linear model and to see how far MRFs deviate from it. Essentially, this means stating whether the distribution is linear or not (irrespective of what it might actually be) and looking for patterns in this statistic only. This is, of course, a much quicker and simpler way to proceed than comparing two or more regression models for each group of films, and so it has that advantage as well.
Power laws were worth a look, but I don’t see a future in it (or at least I see only an unnecessarily confusing one).
Table 1 presents some results for the distribution of MRFs for some groups of films when fitted to a linear model y = ax + b, where a is the slope and b is the intercept. As before, this data is from Barry Salt’s database at the Cinemetrics website. Two things stand out: (1) early silent films are poorly fitted by the linear model; and (2) the groups of films that do fit the linear model have similar values for the slope and intercept of the regression line. In fact, the results for the first five groups in Table 1 can all be adequately modelled by a = -0.036 and b = 0.29. This is presented in Figure 1, where the red line is the regression model. Why this should be the case is a mystery: why are Thorold Dickinson’s films similar to Josef von Sternberg’s, Fritz Lang’s, and Alfred Hitchcock’s across time and different countries? We can conclude that shot scales are unlikely to be an indicator of authorship (although, as before, a larger study is needed to confirm this) [1]. Perhaps these regression coefficients crop up wherever continuity editing is used (which would not be the case before 1920 in Europe, hence the values for Lang in Germany and for the Swedish films), or wherever Hollywood has been a determining factor in the development of film style, as it has been in Europe (hence the British films’ similarity to Hollywood). (This raises the question of how shot scales are used in non-Western cinema: I would love to see the distribution of MRFs for Japanese films of the 1930s.) Nonetheless, it is a startling empirical regularity, and hopefully soon I will have some more systematic results to present.
Table 1 Linear regression of the distribution of the mean relative frequencies of shot scales in some motion pictures
Figure 1 Linear regression of y = -0.036x + 0.29 on five groups of films
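Fitting the linear model y = ax + b, and measuring how far a group of films departs from the common line a = -0.036, b = 0.29, takes only a few lines of Python. This sketch assumes that x is the rank of each shot scale when the scales are ordered from most to least frequent (1 to 7 for Salt’s seven scales) and that y is its MRF; that reading of the axes, and the function names, are assumptions.

```python
def linear_fit(x, y):
    """Least-squares slope a and intercept b for the model y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((u - mx) ** 2 for u in x))
    return a, my - a * mx

def max_deviation(x, y, a=-0.036, b=0.29):
    """Largest absolute gap between the observed MRFs and the line y = a*x + b."""
    return max(abs(v - (a * u + b)) for u, v in zip(x, y))

# Usage: points generated from the common line itself recover its coefficients.
ranks = list(range(1, 8))
mrfs = [-0.036 * r + 0.29 for r in ranks]
print(linear_fit(ranks, mrfs), max_deviation(ranks, mrfs))
```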
PPCC Data
Barry Salt requested the results for the probability plot correlation coefficient (PPCC) for some 40 films I have looked at. These are presented in Table 2, while Table 3 gives a reel-by-reel breakdown of the same statistic for Man with a Movie Camera (Dziga Vertov, 1928).
Table 2 PPCC data for 40 films from the Cinemetrics database
Table 3 PPCC data for Man with a Movie Camera (1928)
Notes
- To date I have found no empirical evidence to support auteurism at all, while I have repeatedly found evidence of group styles, whether those groups be defined by nation, studio, or era.
Some notes on cinemetrics
Over the past couple of weeks various issues have been raised regarding my posts on cinemetrics. Hopefully, today’s post will go some way to addressing them, whilst also bringing to your attention some things you hadn’t previously considered.
Statistics and the internet
I don’t use expensive software such as SAS or SPSS because there is no need – between MS Excel and open source statistical software you can do pretty much any statistical analysis you want to.
(If you do have access to expensive statistical software, then don’t feel bad – make the most of it. We don’t accept arguments from authority, and your analysis won’t carry more weight just because you’re the underdog).
Learning about statistics
Before using whatever software you have to hand, it is best to understand something about statistics. There are lots of good introductory books on statistics that you can find in any half-decent bookshop or library, and Google Books will give you a chance to browse these before you commit your hard-earned cash to a purchase (or just use Google Books and keep your money for something more exciting). Don’t forget that many non-statisticians (sociologists, doctors, engineers, etc.) have had to learn something about means and medians, and as a result there are lots of introductory texts aimed at non-specialists that are often good places to start.
Perhaps the best resource available freely on the internet is Gerard E. Dallal’s The Little Handbook of Statistical Practice, which provides a comprehensive introduction to statistics while at the same time being clear and simple to understand. It’s aimed at bioscience researchers and so all the examples are drawn from this discipline, but they are easy to grasp quickly.
Most universities have some sort of introductory materials for their students online, and you can access most of these for free without any problem. Good ones include Glasgow – which has a glossary of statistical terms; Leicester – which goes through examples of how to do statistical tests and is aimed at biologists; and Vassar – which has an introduction and lots of free online calculators you can use.
NIST also provides a very comprehensive introduction to statistics. This is a bit more technical than the others I have listed above, and it does assume that you have access to some reasonably powerful software, but it covers just about everything. And it’s free!
Finally, don’t forget that if you want to know about a particular topic in statistics you can just search using Google. If you want to know about Normal Probability Plots then search for it, and you will find a host of websites devoted to this subject.
Statistics is not difficult – but you should understand it before using it.
Statistics software
I mentioned above that Vassar’s stats pages have online calculators, and there are many such calculators on the internet that you can use for free.
Index of online calculators: this site has calculators for descriptive statistics, 2-sample Kolmogorov-Smirnov test, Chi-square, Fisher Exact Test, ANOVA. It is very easy to use, and there are explanations of how each test works and how to interpret the results. You can also use it to draw graphs (although once drawn you can’t do much with them as they come off as gs or pdf files).
GraphPad: GraphPad produce statistical software that is very easy to use and you can download demos and use those (they are cheaper than most stats software, but still pricey). They also have an online introduction to statistics, and a range of stats calculators you can use for free.
Statspages: an index of online calculators, although some of the links don’t always work. Handy if you’re looking for a particular test.
Daniel Soper Statistical Calculators: an index of 45 calculators for computing a whole range of things, though you will probably need to know what you’re doing before you try them. Once you do, very easy to use and tastefully presented in black and green.
SOCR: this is a website with a whole range of statistical tests, but I find it fussy and irritating. It also seems to take a long time to load. I avoid it, but you might find it useful. Again, it’s free.
A very useful resource is Free Statistics, which will direct you to websites that will teach you about stats, to sources of data, and to free software that you can download. Some of this software requires a reasonably sophisticated knowledge of programming, whilst others are very simple, but the range is impressive and there is something for everyone.
I use PAST, which is a free statistical software package aimed at palaeontologists and provides a wide range of tests. Obviously some of these are not needed for Cinemetrics (you won’t need to study cladistics if you’re doing film studies), but it is incredibly easy to use, and there is an online manual that explains everything. The only downside is that it is a pain to enter data into PAST, and so I usually enter data into Excel and then paste it into PAST before running the analysis.
Play
The best thing to do is to get some data and a little understanding and then play with the software. The best way to learn is to try.
Getting data out of cinemetrics
Of course, if you want to use any of the above you need to have some data. The Cinemetrics database has lots, but you need to get it into some useful form before you can do something with it. Here is a simple process of getting the data from those graphs into your software.
Cinemetrics has two parts: the data and the software. When you change the look of the graph on the page of a film to view the cutting swing, the two parts work in unison. You may have noticed that a lot of red text appears when you change the graph and then disappears. That is the data, and if you can separate it from the software that draws the graph you can see it all. To do this, save the page to a directory on your computer and then reopen it when you are not connected to the internet (if you’re on a network you will probably have to set your browser to ‘Work Offline,’ which will be under the File menu, though the precise details will depend on how your computer is set up). Tell the page to redraw the graph (set the height to 300 and click Redraw). You will see the red data text appear, but because the page cannot connect to the software it needs to draw the graph, it gets stuck and stays on the screen. Open your spreadsheet software and make the window small enough that you can see the data from the webpage on the same screen (see Figure 1). With only a little practice, you will find that you can enter the shot lengths very quickly (or you can save what you’re doing and go and have a rest if there are lots of shots). You now have the data in a form that is easier to manipulate.

Figure 1 Entering Cinemetrics data for The Lady Lies (1929) into MS Excel
Using Excel
I use Excel 2007 as my main software package. All versions of Excel have a good range of statistical tests, although they are easier to use in the latest versions. The thing to remember with Excel is that it is simple: on the one hand it is easy to use; on the other, it is not very intelligent and won’t necessarily do things in the easiest way possible. For example, most people don’t know that Excel comes with a statistical analysis package built in, and why would they? Excel doesn’t tell them. To install the analysis package you need to load the Analysis ToolPak from the Add-ins menu (where this is depends on the version of Excel you are using, but search the help file for add-ins and you’ll find it). Once installed, the ToolPak gives you an automated means of accessing the statistical commands that you would normally have to type into the spreadsheet. (Actually using the commands in the spreadsheet is often quicker and easier, and gives you a little more control, but this depends on how comfortable you are with the software and with statistics. To find out which statistical functions are built into Excel, search the help file for statistical functions.) The Analysis ToolPak gives you access to descriptive statistics, ANOVA, z-tests, t-tests, F-tests, correlation, regression, and many other functions, but they are all parametric tests (they depend on the parameters of a distribution and make certain assumptions about the nature of that distribution). If you want nonparametric tests (which make fewer assumptions and don’t rely on parameters), then you can find these in PAST. Alternatively, you can set up your own spreadsheet to do things like the Mann-Whitney U test or Kruskal-Wallis ANOVA once you’ve grasped how these tests work and the idiotic way Excel sometimes requires you to do things (see Figure 2).

Figure 2 My spreadsheet for Kruskal-Wallis ANOVA of the films of Terence Davies
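The calculation such a spreadsheet performs can also be written out directly. The following Python sketch computes the Kruskal-Wallis H statistic from pooled ranks, giving tied values their average rank but omitting the tie correction. It is not a reproduction of the spreadsheet in Figure 2, and the function name is hypothetical. H is compared with the chi-square distribution on k - 1 degrees of freedom for k groups.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (without tie correction) and its degrees of freedom."""
    pooled = sorted(v for g in groups for v in g)
    # Tied values share the average of the positions they occupy.
    first, last = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    rank = {v: (first[v] + last[v]) / 2 for v in first}
    n = len(pooled)
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    return h, len(groups) - 1

h, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h, df)  # H is about 7.2, on 2 degrees of freedom
```

For two degrees of freedom the 5% critical value of chi-square is 5.991, so these three made-up groups would be judged to differ significantly.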
Normal Probability Plots
Above I mentioned two types of statistical tests: parametric and nonparametric. It is important to choose the right test to get the most out of your data, and picking the wrong approach may lead you to the wrong conclusion. What distinguishes these two types of tests are the assumptions you can make about the data:
- Parametric tests assume that data is distributed according to an underlying probability distribution (of which there are several, but I’ll only mention a couple here), that samples are independent and/or have equal variances, that the data is at least interval or ratio data, etc. The precise assumptions needed will depend on which test you are using. If the assumptions about the data hold, then parametric tests are more powerful than nonparametric tests.
- Nonparametric tests require fewer assumptions about the nature of the data and do not depend on an underlying probability distribution. They are often referred to as ‘distribution free.’ There is usually a nonparametric equivalent that can be used when a parametric test is inappropriate (for example, the Mann-Whitney U test is the nonparametric equivalent of a t-test for independent samples).
Typically, the distribution of shot lengths in a motion picture is positively skewed with a number of outlying data points: as such, it does not follow a normal distribution. HOWEVER, we could still use parametric tests if the data is normally distributed after a transformation has been applied. Usually, such a transformation involves using logarithms. Once the data has been transformed to its logarithm, we can then run tests to see if the data now follows a normal distribution: if it does, then we say that the data is lognormally distributed. (A random variable is lognormally distributed if its logarithm is normally distributed).
How, then, do we test data to see if it comes from an underlying normal distribution? There are several tests that can be applied: the 1-sample Kolmogorov-Smirnov test, Shapiro-Wilk*, Cramér-von Mises, Anderson-Darling, Pearson’s Chi-Square*, Jarque-Bera*, and Lilliefors tests can all be used. (Tests marked * can be found in the PAST software I mentioned above.) Which of these is appropriate depends on what you are trying to do.
A simple method which provides both a visual and a numerical measure of normality is to use normal probability plots and the probability plot correlation coefficient (PPCC), which I have described elsewhere. (Both PAST and Excel will produce normal probability plots, and PAST also calculates the PPCC.) By comparing the observed PPCC of your data with the critical value of the PPCC for a dataset of that size at a specified significance level, you can see whether the data is normally distributed: if the observed PPCC is greater than the critical value, then we cannot reject the hypothesis that the data is normally distributed; if it is less than the critical value, then the data is not normally distributed. For example, The Immigrant (1917) has 159 shots (n = 159): for a sample of this size the critical value of the PPCC is 0.9923. For the untransformed data the observed value of the PPCC is 0.8420, clearly not normally distributed; and for the data transformed to its common logarithm (log10), the PPCC is 0.9715, so not lognormally distributed either.
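The PPCC can be computed directly: sort the data, pair it with the normal quantiles expected for each rank, and take the correlation. This Python sketch uses Filliben’s approximation to the order-statistic medians as plotting positions (PAST may use different plotting positions, so its values could differ slightly) and estimates critical values by simulating normal samples rather than looking them up in a published table; the function names are hypothetical. Passing log_transform=True tests for lognormality via log10, as in the example above.

```python
import random
from math import log10, sqrt
from statistics import NormalDist

def normal_scores(n):
    """Normal quantiles of Filliben's uniform order-statistic medians."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1 / n)
    m[0] = 1 - m[-1]
    return [NormalDist().inv_cdf(p) for p in m]

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ppcc(data, log_transform=False):
    """Normal PPCC of the data, or of its common logarithm."""
    x = sorted(log10(v) for v in data) if log_transform else sorted(data)
    return correlation(x, normal_scores(len(x)))

def critical_ppcc(n, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of the critical value: the alpha-quantile of the
    PPCC over `reps` simulated normal samples of size n."""
    rng = random.Random(seed)
    q = normal_scores(n)
    sims = sorted(correlation(sorted(rng.gauss(0, 1) for _ in range(n)), q)
                  for _ in range(reps))
    return sims[int(alpha * reps)]

# Data built from the normal scores themselves lies on a perfect straight line.
print(ppcc([10 + 2 * q for q in normal_scores(50)]))  # very close to 1.0
```

Because the simulated critical value creeps towards 1 as n grows, comparing an observed PPCC with critical_ppcc(n) for a range of n is one way to see how large the gap in the examples below really is.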
Now, 0.9715 is not much less than 0.9923, so perhaps there is not a big difference. BUT, in statistics numbers are never just numbers: they have meaning within a specific context. In interpreting the result of the PPCC we need to remember that it comes from a distribution of critical values that is ASYMPTOTIC, that is, it approaches a limit (in this case 1.0) as the sample size grows. It will never actually reach 1, because you can always have a bigger sample, and so the critical value of the PPCC gets ever closer to 1, requiring ever more decimal places to tell sample sizes apart. Look at Figure 3: this graph plots the critical values (the solid lines) and the observed values (the dotted lines) of the PPCC for His New Job (1915) and Verboten! (1952) using log-transformed data. For His New Job, the observed value is greater than the critical value (the dotted line is to the right of the solid line), and so this data is consistent with a lognormal distribution. For Verboten!, the opposite is true, and the data is not lognormally distributed. The sample size (number of shots) is 502, with a critical value of ~0.9971, but the observed value is only 0.9575, which is the critical value for a sample size of only 25. There is only a small numerical difference between 0.9971 and 0.9575, but Figure 3 shows that this is actually quite a large difference in the context of the asymptotic distribution of the PPCC.

Figure 3 The distribution of the probability plot correlation coefficient for sample sizes n = 5 to n = 1000.
What, then, does this mean for The Immigrant? The observed value of the PPCC corresponds to a sample size of ~41, which is nearly four times smaller than the sample size used (n = 159) – a much larger difference than the numbers (if taken at their face value) would appear to imply.
Why is this relevant? In statistics we are estimating outcomes: we rarely know the complete data for any situation, and if we are using the Cinemetrics tool then some error in the data will always be present (you can only press that space bar so quickly in response to observing a cut). If we rely on parametric statistics for assessing shot length distributions when we know the data is not normally or lognormally distributed, then we run the risk of saying that there is a difference between two data sets (i.e., the shot lengths of two films) when in fact there isn’t (a Type I error, or false positive), or saying that no difference exists when in fact it does (a Type II error, or false negative). Using nonparametric tests is a way around this problem, but it will not eliminate the possibility of making an error completely.
I have looked at the PPCC for normal and lognormal distributions of 40 films from the Cinemetrics database, and, while these films cannot be considered a representative sample of the database, half (20) are not lognormally distributed. Some miss their critical value by only a small margin but others miss by quite some distance: Man with a Movie Camera has a lognormal PPCC of 0.9639, which would be the critical value for a sample of size 30, but the film has 1729 shots! Of the six reels for this film, only reel 5 is lognormally distributed. Verboten! is worth a mention here: for most films the PPCC for untransformed data lies between 0.7000 and 0.9000, while for this film it is only 0.5495, and this deserves closer attention.
Of course, all this assumes that you want to use frequentist statistics. You could adopt a Bayesian approach …