
Bar chart or histogram?

Although it is becoming more and more common for film scholars to cite statistics of film style in their research, there is a pressing need for a good statistics textbook aimed at those working in film studies because it is evident that there is very little actual understanding of statistics. The only attempt to provide some sort of instruction in statistical methodology has been undertaken by Warren Buckland in Studying Contemporary American Film: A Guide to Movie Analysis, which he co-authored with Thomas Elsaesser (Elsaesser and Buckland 2002 – to access this chapter freely see here). Unfortunately, Buckland gets the most basic elements of statistics wrong (see here for a demonstration of how wrong), and this week I am going to focus on two aspects: the importance of distinguishing between qualitative and quantitative variables, and the difference between bar charts and histograms.

In this chapter, Buckland discusses the frequency of different shot scales in Jurassic Park, and produces the graph in Figure 1.

Figure 1 The frequency of shot scales in Jurassic Park (Source: www.cinemetrics.lv/buckland.php) (BCU = big close-up, CU = close-up, MCU = medium close-up, MS = medium shot, MLS = medium long shot, LS = long shot, VLS = very long shot).

This image is accompanied by the following text:

Finally, in terms of shot scale, the distribution [of shot scales] confirms (sic) to what statisticians call a ‘normal distribution’, with high values in the middle (the mean) and progressively lower values on either side (…). The result of these normal distributions is that the standard deviation and skewness values are low.

This statement is simply incorrect – this distribution does not conform to a normal distribution, and no statistician in the world would say so. This problem arises because Buckland does not know how to distinguish between different types of variable and does not know the difference between a bar chart and a histogram. Linking the two terms together, Buckland treats them as synonyms:

the histograms, or bar charts, representing the number of each shot type in each film (the number of close-ups, long shots, etc.)

Actually, this error originates in Salt (1992: 142), which Buckland has simply copied.

This is wrong because bar charts and histograms are two different types of graph that represent different types of variables providing different types of information, and it is important to know the difference.

Types of variable

A variable is simply a measured attribute of interest that varies over time or subjects. The actual value of a variable is its data value, and the set of data values from each subject in the sample is the data we are going to analyse. So shot length is a variable of film style, the length of a shot is its data value, and the set of all the shot lengths is the data set.

It is important to know what type of variable we are dealing with, because this will determine the type of statistical analysis we will be able to apply. A variable may be qualitative or quantitative, and may be measured at one of four levels of data – nominal, ordinal, interval, or ratio. (Variables may also be discrete or continuous but I will not address this here).

Qualitative variables

Qualitative variables have values that are non-numeric and descriptive.

Qualitative variables may be either nominal – data can be sorted into mutually exclusive categories that characterise an element of a subject but to which no order may be assigned; or ordinal – in which the categories used have a logical ordering.

Camera movement is an example of a qualitative nominal variable in the cinema: we can sort camera movements in a film into different categories (pan, track, tilt, etc.), but we cannot assign an order to these categories. In order to make data analysis easier, we might choose to code the categories of camera movement using numbers (e.g. pan = 1, track = 2, tilt = 3, etc.), but these codes have no mathematical meaning. We can define the mode for this type of data as simply the most frequently occurring value, but the median and the mean do not exist. It is nonsense to speak of an ‘average’ camera movement in the same way as we speak of an ‘average’ shot length.
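The point about numeric codes can be checked directly. The short Python sketch below uses invented camera-movement data (not taken from any film) and two equally arbitrary codings. The mode is the same under either coding, but the ‘mean code’ changes even though the data have not, which is why such a mean has no meaning.

```python
from collections import Counter
from statistics import mean

def modal_category(values):
    """Return (category, count) for the most frequent category.

    Counter.most_common breaks ties by order of first appearance.
    """
    return Counter(values).most_common(1)[0]

# Invented camera movements, for illustration only.
movements = ["pan", "track", "pan", "tilt", "pan", "track"]

# Two equally arbitrary numeric codings of the same categories.
coding_a = {"pan": 1, "track": 2, "tilt": 3}
coding_b = {"tilt": 1, "track": 2, "pan": 3}

mean_a = mean(coding_a[m] for m in movements)  # 10/6
mean_b = mean(coding_b[m] for m in movements)  # 14/6

print(modal_category(movements))           # ('pan', 3) under either coding
print(round(mean_a, 2), round(mean_b, 2))  # 1.67 2.33: the 'mean' depends on the coding
```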

Shot scale is an example of a qualitative ordinal variable of film style, but this ordering is very weak. When it comes to the statistical analysis of shot scales, we first need to assign each shot to a category whose values are non-numeric (BCU, CU, MCU, etc.) – and therefore qualitative. We can also assign an order to these categories: a big close-up is ‘nearer’ to the object than a close-up, a close-up is nearer than a medium close-up, and so on. However, the differences between the categories are not meaningful – a big close-up is closer than a close-up, but ‘closer than’ is not otherwise defined. The mode may be expressed for this type of variable, and it may be possible to define a median for qualitative ordinal data, but it is not always appropriate to do so. For example, if we asked a hundred people to rate Inception (2010) on a scale of 1 to 10, where 1 = ‘did not enjoy at all’ and 10 = ‘enjoyed enormously,’ we could meaningfully state that the median rating is 7/10. In contrast, it makes sense to speak of medium shots as the modal class of shot scales in Jurassic Park, but not to say that medium shots are the ‘median shot scale’, even though shot scales can be logically ordered.
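Where a median is appropriate for ordinal data, one detail matters. With an even number of responses, the conventional median averages the two middle values, and that assumes the categories are equally spaced. Python's standard `statistics.median_low` and `statistics.median_high` instead return a category that was actually observed. The ratings below are invented for illustration.

```python
from statistics import median, median_high, median_low

# Invented 1-10 enjoyment ratings (an ordinal scale), for illustration only.
ratings = [3, 6, 7, 7, 8, 8, 9, 10]

print(median(ratings))       # 7.5: averages the two middle ratings
print(median_low(ratings))   # 7: the lower of the two middle ratings
print(median_high(ratings))  # 8: the higher of the two middle ratings
```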

The mean does not exist for qualitative ordinal variables, which is why Buckland’s use of the term ‘normal distribution’ – which is parameterized by the mean and the standard deviation – to describe shot scales in Jurassic Park is meaningless. The mean and the standard deviation in these circumstances do not exist.

Statistical analysis of nominal variables involves looking at the frequency with which an event occurs (e.g., how many panning shots are there in a film?), and I outlined some methods for categorical data analysis of film style in my post on hypothesis tests of proportions for film style (here). If the data is ordinal, and it is appropriate to do so, nonparametric methods such as the Median test, the Mann-Whitney U test, the Kruskal-Wallis test, etc. may be employed in data analysis. Parametric statistical methods cannot be applied to qualitative variables.
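To illustrate the ordinal case, the sketch below computes the Mann-Whitney U statistic. It uses mid-ranks for ties and a tie-corrected normal approximation for the p-value. It is only a minimal sketch and does not replace a tested routine such as SciPy's `scipy.stats.mannwhitneyu`. The normal approximation is also poor for very small samples, and the two groups of ratings are invented.

```python
import math
from collections import Counter
from statistics import NormalDist

def midranks(values):
    """Rank values from 1 (smallest), giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for sample x, z, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p

# Invented 1-10 ratings of a film from two hypothetical audiences.
audience_a = [4, 4, 6]
audience_b = [6, 8, 8]
print(mann_whitney_u(audience_a, audience_b))
```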

Bar charts

If we want to represent the data collected about a qualitative variable, then a bar chart is the simplest method to employ. Pie charts can also be used and are particularly effective when emphasising differences in frequency by their use of area (see here, for example), but they are less useful if your data has a lot of detail or you want to compare two different groups (for which you will need two different pie charts).

Remember, you cannot think about a qualitative variable in terms of a probability distribution. To illustrate this, look at the two bar charts in Figures 2a and 2b. These bar charts present the same information – the normalised frequency of different shot scales in The Birds (1963) – but have been arranged differently: Figure 2a shows us the shot scales arranged from nearest (BCU) to most distant (VLS), while Figure 2b shows us the scales arranged from most distant to nearest. If we thought of these figures in terms of a continuous random variable – as Buckland does with Jurassic Park above – should we conclude that the shot scales are positively skewed as indicated by Figure 2a or negatively skewed as in Figure 2b? The answer is neither, because the question is meaningless: we cannot think about a qualitative variable represented by a bar chart in these terms. If your conclusion changes according to how you order the data, then the design of your experiment is very probably flawed.

Figure 2a Normalised frequency of shot scales in The Birds (1963) arranged from nearest to most distant (Source: www.cinemetrics.lv/satltdb.php)

Figure 2b Normalised frequency of shot scales in The Birds (1963) arranged from most distant to nearest (Source: www.cinemetrics.lv/satltdb.php)

What can we say about shot scales in The Birds? Well, we can see that the most frequently occurring scale is the close-up, followed by the medium close-up, whereas more distant shot scales are much less frequent. We can therefore conclude that this film is characterised by shot scales that bring the viewer close to the action on-screen, particularly when Melanie Daniels is being attacked. Note that these conclusions do not depend on how the data is arranged in either Figure 2a or Figure 2b – they depend solely on the data themselves and the intrinsic order of the data. We could also represent this information as a proportion or a percentage if we so wished without changing the conclusions.
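The effect of an arbitrary left-to-right order on ‘skew’ can be shown numerically. The sketch below uses invented counts, not the data for The Birds. It deliberately commits the error at issue: it treats the positions of the categories as if they were numbers and computes a moment skewness. Reversing the order flips the sign of that ‘skew’, while the normalised frequencies and the modal category stay the same.

```python
SCALES = ["BCU", "CU", "MCU", "MS", "MLS", "LS", "VLS"]

# Invented shot-scale counts, for illustration only.
counts = {"BCU": 20, "CU": 120, "MCU": 90, "MS": 50, "MLS": 30, "LS": 20, "VLS": 10}

def normalised(freqs):
    """Convert counts to proportions that sum to 1."""
    total = sum(freqs.values())
    return {k: v / total for k, v in freqs.items()}

def position_skew(order, freqs):
    """Moment skewness of category *positions* 1..k in `order`, weighted by count.

    This is deliberately the wrong analysis: positions are not measurements.
    """
    xs = [pos for pos, scale in enumerate(order, start=1) for _ in range(freqs[scale])]
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

near_to_far = position_skew(SCALES, counts)
far_to_near = position_skew(SCALES[::-1], counts)
mode = max(counts, key=counts.get)

print(round(near_to_far, 3), round(far_to_near, 3))  # same size, opposite signs
print(mode, round(normalised(counts)[mode], 3))      # the mode does not depend on order
```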

There are some simple rules for presenting bar charts:

  • The gaps between the categories in a bar chart are important: they emphasise the fact that the categories used are mutually exclusive and do not form a continuum. Note that in Salt (1992: 143) the bars in the charts for shot scales do touch.
  • Make sure the scale used is meaningful and clearly labelled, and does not mislead the reader in interpreting the chart by overemphasizing differences.
  • Colour and shading can be useful, but can also be misleading and irritating.
  • If the data do not have any logical ordering, arrange the categories in order of frequency (from largest to smallest) to make the differences easier to interpret. It may also be easier to rotate the chart so that the category labels are on the vertical axis and the numerical values on the horizontal.
  • Use bars to represent values clearly, rather than pictures of different sizes. They are easier to understand, and far less irritating.
  • NEVER add a third dimension to the bars on your chart – the extra dimension adds no new information and is potentially misleading. To see how bad this can be, look at charts three and four in Charles O’Brien’s paper on Sous les toits de Paris (1931) here.

For some really good examples of how not to present data in bar charts and pie charts, see Gary Klass’s Just Plain Data Analysis website here. This website is especially useful as it also gives tips and examples on how to use Excel to draw charts.

Quantitative variables

Quantitative variables have values that are numeric, and quantify an element of a population.

Quantitative variables may be measured at the interval or ratio levels. With interval data, the distances between data values are meaningful but there is no natural zero. With ratio data, the distances between data values are meaningful and there is a natural zero (a true origin), so ratios of values are also meaningful. We can calculate the mode, the median, and the mean of interval and ratio data.

Quantitative variables have order, so they can also be treated as ordinal variables, although this loses some of the information in the data set. Nonparametric methods applied to quantitative variables may involve transforming the data into an ordinal form by ranking. Despite the loss of information, this is advantageous because it means that nonparametric methods may be applied when the requirements for parametric methods are either unknown or not met.

Shot length is a quantitative variable measured at the ratio level – the difference between a shot that is 2 seconds long and one that is 3 seconds long is the same as the difference between a shot that is 5 seconds long and one that is 6 seconds long; and a shot that is 4 seconds long is twice as long as one that is 2 seconds long.

Histograms

Like bar charts, a histogram is produced by sorting data into categories (called bins). However, unlike a bar chart, the values on the x-axis form a continuum: the point at which one bin ends is the point at which the next bin begins. For this reason, neighbouring bars in a histogram must touch. In a bar chart, frequency is expressed as the height of the bar; whereas in a histogram it is expressed as the area of the bar.

A histogram is a simple nonparametric method of density estimation, and depends only on the choice of the origin x₀ (the point at which the first bin begins) and the width of the bins. From a histogram we can identify the shape of a distribution (uni-, bi-, or multimodal, symmetrical, skewed, leptokurtic, or platykurtic), the range of the data, and the presence of outliers.
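The construction described in the last two paragraphs can be written out in a few lines of Python. This minimal sketch assumes left-closed bins [x₀ + kw, x₀ + (k+1)w), because the post does not say which edge each bin includes, and it uses invented shot lengths rather than real film data. Bar heights are densities, count / (n × width). The area of each bar is therefore the proportion of shots in that bin, and the areas sum to 1.

```python
import math

def histogram(data, x0, width):
    """Return (midpoints, counts, densities) for equal-width bins starting at x0.

    Bins are assumed left-closed: [x0 + k*width, x0 + (k+1)*width).
    """
    if min(data) < x0:
        raise ValueError("x0 must not exceed the smallest value")
    n_bins = math.floor((max(data) - x0) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - x0) / width)] += 1
    midpoints = [x0 + (k + 0.5) * width for k in range(n_bins)]
    # Height = count / (n * width), so each bar's area is the proportion in that bin.
    densities = [c / (len(data) * width) for c in counts]
    return midpoints, counts, densities

# Invented shot lengths in seconds, for illustration only.
shots = [0.5, 1.2, 1.8, 2.4, 3.1, 3.3, 5.0, 9.7]
mids, counts, dens = histogram(shots, x0=0.0, width=2.0)
print(mids)                        # [1.0, 3.0, 5.0, 7.0, 9.0]
print(counts)                      # [3, 3, 1, 0, 1]  (the empty 6-8 s bin is a gap)
print(sum(d * 2.0 for d in dens))  # 1.0: total area
```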

Unlike a bar chart, where the gaps between the bars stress the absence of a continuum on the x-axis, the gaps in a histogram have a different meaning. Because the x-axis is a continuum, a gap in the data indicates that there were no data values in this bin. Figure 3 is a histogram of the distribution of shot lengths in Busy Bodies (1933), with x₀ = 0 seconds, and a bin width of 2 seconds. The values on the x-axis are the mid-points of the bins, so the first bin covers shots of length 0.0s to 2.0s, the next bin covers 2.0s to 4.0s, the next bin covers 4.0s to 6.0s, and so on.

Figure 3 Histogram of shot lengths in Busy Bodies (1933)

From Figure 3 we can see that the distribution of shot lengths in Busy Bodies (1) is unimodal and positively skewed; (2) has a range from 0.0 seconds to 48.0 seconds; and (3) has outliers in the upper tail. Short shots occur much more frequently than long shots. There are gaps in the distribution, indicating that some bins contain no shots.

However, the histogram has some limitations:

  • The shape of the histogram depends on the choice of x₀ and the bin-width, and making the wrong choice can lead to flawed interpretations. With too many bins, the structure of the data is obscured by too much detail; with too few bins, the structure cannot be seen at all. There are various methods for choosing the ideal bin-width, but none is definitive.
  • There is a lack of precision in describing the range: the actual range of this data is from 0.5s to 47.6s, but the histogram cannot give us this level of precision without using too many bins. Information is lost in the process of binning the data.
  • You cannot put the shot length distributions of two films on to the same histogram, and so it becomes necessary to produce two histograms and compare them side by side. This is the same problem as comparing two pie charts side by side, and is equally undesirable.
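Two widely used rules of thumb address the choice of bins. Sturges' rule sets the number of bins to ⌈log₂ n⌉ + 1. The Freedman-Diaconis rule sets the bin width to 2 × IQR × n^(-1/3). The sketch below implements both with the Python standard library, using invented data. As noted above, neither rule is definitive. Sturges' rule in particular implicitly assumes roughly normal data, which shot lengths are not.

```python
import math
from statistics import quantiles

def sturges_bins(n):
    """Number of bins under Sturges' rule: ceil(log2 n) + 1."""
    return math.ceil(math.log2(n)) + 1

def freedman_diaconis_width(data):
    """Bin width under the Freedman-Diaconis rule: 2 * IQR * n ** (-1/3)."""
    q1, _, q3 = quantiles(data, n=4)  # default 'exclusive' quartile method
    return 2 * (q3 - q1) * len(data) ** (-1 / 3)

shots = [0.5, 1.2, 1.8, 2.4, 3.1, 3.3, 5.0, 9.7]  # invented, in seconds
print(sturges_bins(len(shots)))                    # 4
print(round(freedman_diaconis_width(shots), 2))
```

Different quartile conventions give slightly different IQRs, so the Freedman-Diaconis width may not match other software exactly.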

The limitations of the histogram may be overcome by employing kernel density estimation. See here for an overview.
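Kernel density estimation can be sketched in plain Python with a Gaussian kernel. This version uses Silverman's rule-of-thumb bandwidth, 0.9 × min(s, IQR/1.34) × n^(-1/5). The kernel, the bandwidth rule, and the invented data are all assumptions chosen for illustration. A Gaussian kernel also places some density below zero seconds, which is impossible for shot lengths.

```python
import math
from statistics import quantiles, stdev

def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(s, IQR / 1.34) * n ** (-1/5)."""
    s = stdev(data)
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    spread = min(s, iqr / 1.34) if iqr > 0 else s
    return 0.9 * spread * len(data) ** (-1 / 5)

def gaussian_kde(data, bandwidth):
    """Return a function f(x) giving the Gaussian kernel density estimate at x."""
    norm = 1 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return density

shots = [0.5, 1.2, 1.8, 2.4, 3.1, 3.3, 5.0, 9.7]  # invented, in seconds
h = silverman_bandwidth(shots)
f = gaussian_kde(shots, h)
# Evaluate on a grid of points, as one would to draw the curve.
curve = [(x / 2, round(f(x / 2), 4)) for x in range(0, 25)]
```

Unlike a histogram, the estimate does not depend on a choice of x₀. The curves of two films can also be drawn on the same axes for comparison.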

Summary

In Studying Contemporary American Film: A Guide to Movie Analysis, Elsaesser and Buckland have undertaken to demonstrate the methodologies of film analysis to students and to encourage them to apply them to films themselves. Unfortunately, the section on statistics is fundamentally flawed due to the authors’ lack of understanding of elementary statistics. It is not a textbook that students should be encouraged to read, as it will leave them with an erroneous understanding of statistical methodology. It also does not say much for the standard of research in film studies.

Students should be taught to properly identify the type of variable they are dealing with, because this will determine the statistical methods they subsequently employ. They should know the difference between qualitative and quantitative variables, and be able to identify which elements of film style are which. They should be able to distinguish between the different levels of data they will encounter. They should know the difference between a bar chart and a histogram, when it is appropriate to use either, and how to produce and interpret each. They should also know the specific statistical meaning of terms such as ‘mean,’ ‘standard deviation,’ ‘skew,’ and ‘normal,’ and when it is appropriate to use them.

References

Elsaesser T, and Buckland W 2002 Studying Contemporary American Film: A Guide to Movie Analysis. London: Arnold.

Salt B 1992 Film Style and Technology: History and Analysis. London: Starwood.


Buckland on Spielberg

Although Poltergeist (1982) is credited to Tobe Hooper, it is widely held that its director was in fact Steven Spielberg, who also wrote and produced the film. In Directed by Steven Spielberg: Poetics of the Contemporary Hollywood Blockbuster, Warren Buckland undertakes what he calls a statistical analysis of a group of films in order to solve the riddle of who directed Poltergeist (2006: 154-173) [1]. Buckland sets out his intentions for this chapter clearly:

‘Through a shot-by-shot analysis, I use statistical methods to compare and contrast Poltergeist to a selection of Hooper’s and Spielberg’s other films,’ in order to ‘determine how Poltergeist’s style conforms to and deviates from Spielberg’s and Hooper’s filmmaking strategies’ (155).

Here I review the statistical approach adopted by Buckland. Specifically, I address four issues: the design of the study; the statistical methodology employed; the presentation of the results; and the conclusions drawn.

I do not address the rest of the book; my critique is limited to the chapter that deals with the statistical analysis of Spielberg’s and Hooper’s films.

The study

Buckland’s analysis compares Poltergeist to two films directed by Spielberg (ET and Jurassic Park) and one film (The Funhouse) and one TV movie (Salem’s Lot) directed by Tobe Hooper. It is reasonable that we would want to compare the work of interest (Poltergeist) to the work of the two possible directors, but alarm bells should be ringing already.

First, Poltergeist was released in 1982 – the same year as ET, while The Funhouse was released in 1981, and Salem’s Lot was aired in 1979. Jurassic Park, however, was released in 1993; and so while four of the works in question are contemporary with one another, one is from a decade later. Is it reasonable to assume that Spielberg’s style remained unchanged from 1982 to 1993 so that a direct comparison is possible? It is not unreasonable to suggest that Spielberg’s style did not change from ET to Jurassic Park, but equally it is not unreasonable to expect that it did. In the period 1901 to 1912, Picasso moved through his blue, rose, and cubist periods – might we not expect Spielberg to also have developed as a filmmaker over the course of a decade? What impact might new filmmaking techniques and technologies developed throughout the 1980s have had on his film style? We might expect the results to reflect the fact that the exemplars for Hooper are contemporary with Poltergeist, while this is only the case for one of the Spielberg films.

Furthermore, of the five films considered, four were produced for release into cinemas, while Salem’s Lot was produced for television. Might we not expect the results to reflect the fact that Poltergeist was made for cinemas like the two Spielberg films, while this is only the case for one of the Hooper films, and so indicate a difference in media rather than director? Buckland addresses this in a note to the chapter (173, n.2), where he points out that the percentage of medium close-ups in Salem’s Lot is consistent with that in The Funhouse – although he simply asserts this and does not perform any test of this hypothesis (see below). It is the case that there is no significant difference between the proportion of medium close-ups in Salem’s Lot (0.33 [0.28, 0.39]) and The Funhouse (0.36 [0.30, 0.42]) (Z = 0.6459, p = 0.5183), but there is a significant difference between the proportion of reverse angle shots (see Table 2 below). Buckland’s justification for using a TV movie is, then, very weak indeed and open to challenge.

There is the potential for bias in the study, and it is not clear that the study can achieve what it sets out to do. This is the result of failing to establish the style of Hooper and Spielberg before conducting a comparison of the two. Is Spielberg consistent over the course of a decade in his use of film style? Is Hooper consistent in his style when moving between film and television? Buckland states that a pattern of film style is ‘created by a director’s sensibility, or intuition, a series of consistent habits that constitute a director’s style’ (158), but he has failed to demonstrate that this is actually the case for either Spielberg or Hooper.

Statistical methodology

Sampling

Buckland’s data is taken from only the first thirty minutes of each film, and this has the potential to distort the results. This sampling strategy requires the assumption that the rest of the film will be of similar style to the first half hour – not necessarily an unreasonable judgment, but equally one which may turn out to be unjustifiable. As I have shown elsewhere, calculating the mean shot length on the basis of the first thirty minutes of a film may under- or over-estimate the true value. This may be attributed to a film reaching a dramatic climax, for example, where the pace of the editing may increase relative to the early portion of a film, which may have longer shots and scenes for exposition. Equally, when calculating the proportion of shots that are of a particular scale we may find that the style changes as the film progresses.
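The effect of sampling only the opening half hour can be illustrated with an invented film whose opening is slower than its climax. The helper below is hypothetical and is not Buckland's procedure. It keeps every shot that begins within the first 1,800 seconds, so the partial-film mean can be compared with the whole-film mean. In this constructed case the opening overestimates the mean shot length.

```python
from statistics import fmean

def shots_in_opening(shot_lengths, limit=1800.0):
    """Keep the shots that begin before `limit` seconds have elapsed."""
    kept, elapsed = [], 0.0
    for length in shot_lengths:
        if elapsed >= limit:
            break
        kept.append(length)
        elapsed += length
    return kept

# Invented film: 150 slow expository shots of 10 s, then 400 fast shots of 3 s.
film = [10.0] * 150 + [3.0] * 400
opening = shots_in_opening(film)

print(len(opening), fmean(opening))      # 250 7.2
print(len(film), round(fmean(film), 2))  # 550 4.91
```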

Estimation

A flaw in Buckland’s presentation of his results – and in the use of statistics in film studies generally – is the confusion of statistics with parameters. It is worth reading Mark Schuster’s paper ‘Informing Cultural Policy: Data, Statistics, and Meaning’ (Schuster 2002) before proceeding with any statistical analysis, because he sets out some fundamental principles of statistical analysis in a clear and accessible manner. First, he makes a distinction between data and statistics:

It has become quite common to treat the words ‘data’ and ‘statistics’ as synonyms. We prefer the word ‘statistics,’ perhaps, when we wish to signal seriousness of purpose; but we prefer ‘data’ when we don’t wish to threaten the system that is being measured.

But statistics and data are not the same. Statistics are measures that are created by human beings; they are calculated from raw data by people who are wishing to detect patterns in those data. We calculate means, modes, standard deviations, chi-squared statistics, slopes of regression lines, correlation coefficients, and so on; we aggregate in a wide variety of ways, we eliminate outliers, we normalize calculations, we truncate time series. In short, we generate mathematical summaries that we think are appropriate to the questions with which we are grappling at a particular moment in time. And we have debates about which statistic will capture better the particular element of human behavior in which we are interested.

This is why it is not only silly but perhaps even dangerous to say that we will ‘let the data speak for themselves.’ We calculate statistics from data in order to say something about them.

Schuster then goes on to make a distinction between statistics and parameters:

Statistics are mathematical summaries of the relationships we observe in the data we have actually been able to collect, often from systematically drawn samples. Parameters are mathematical summaries of the relationships that we would observe if we were able to collect complete and accurate data about the behavior of entire populations. Statistics are estimates of parameters. In the end, we are interested in parameters, but statistics are the best we can do.

Statistics and parameters are often distinguished by the use of different symbols: roman letters are used for statistics, while Greek letters are used for parameters. For example, the sample correlation coefficient r is an estimate of the population coefficient ρ, and the sample standard deviation s is an estimate of the population standard deviation σ.

Buckland – like everyone else writing about the statistical analysis of film style – presents statistics as parameters and not as estimates of parameters. For example, on the basis of the first thirty minutes of ET, Buckland states that the mean shot length is 6.25 seconds. Now, for the first thirty minutes of ET we can take this to be a parameter (it describes all of the data in the first half hour), but if we want to use this figure to describe the whole film then it is a statistic (an estimate of the parameter for the whole film). Unfortunately, as a statistic it is useless because it is not accompanied by any measure of the error of the estimate – the mean shot length is presented without a standard deviation or standard error to indicate the variability of the data, or confidence intervals to indicate the possible values of the true mean shot length. Is 6.25 seconds a good estimate of the mean shot length for ET? We do not, and on the basis of the information provided by Buckland we cannot, know.

This problem arises from the way in which the mean shot length of a film is often calculated: the running time is divided by the number of shots. This method will tell you what the mean shot length is, but it does not make it possible to calculate any other statistics, because the actual duration of each shot is not known. For example, the standard deviation is calculated from the deviation of each value in the data set from the mean, but if you do not know the value of each data point then this is not possible. Consequently, we have no measure of the variability of the data, and this makes any subsequent analysis impossible. I cannot assess the validity of Buckland’s claims about the mean shot lengths of these films, because the appropriate statistical tests (a t-test for independent samples or a one-way ANOVA, depending on how you choose to compare the films) require the standard deviation (however, see below on the non-normal nature of shot length distributions). Nor can I calculate confidence intervals for the mean shot length, because this again would require the standard deviation [2].
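If the duration of every shot is recorded, the missing measures of error are easy to report. The sketch below computes the mean, the sample standard deviation s, the standard error s/√n, and an approximate 95 per cent confidence interval based on the normal quantile (about 1.96). That large-sample approximation and the invented data are assumptions; for small samples a t quantile would be more appropriate. It also shows that dividing the running time by the number of shots recovers the same mean, and nothing more.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_with_ci(shot_lengths, level=0.95):
    """Return (mean, s, standard error, (lower, upper)) using a normal approximation."""
    n = len(shot_lengths)
    m = fmean(shot_lengths)
    s = stdev(shot_lengths)                    # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)                      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 per cent
    return m, s, se, (m - z * se, m + z * se)

shots = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # invented, in seconds
m, s, se, ci = mean_with_ci(shots)

# Running time / number of shots gives the same mean, but no measure of spread.
running_time = sum(shots)
print(running_time / len(shots), m)  # 5.0 5.0
print(round(s, 3), round(se, 3), tuple(round(v, 2) for v in ci))
```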

The unusual thing is that Buckland must have, in fact, determined the length of each shot – he presents data on the proportion of shot lengths that lie in the range 1-3 seconds, for example. He also presents the skew of the shot length data for each film, and the calculation of this statistic would require knowing the duration of each shot. Why, if this information is available, was it not included in the study?

The skew of each film in Buckland’s study is large, and this raises the question of why the mean shot length is used as a statistic of film style when the shot length distributions for each film are asymmetrical. A true normal distribution will have a skew of zero, but life is never convenient and a dataset will almost never have a true normal distribution. Some (but not all) statistics textbooks treat the assumption of normality as acceptable when the skew is greater than -0.8 and less than 0.8. If the skew lies outside this interval, then the assumption of normality is not valid. For the five films in Buckland’s study, the skew values are 2.7, 2.7, 5.5, 5.6, and 4.1. As I have shown elsewhere, the median is a more robust statistic when dealing with data sets that are positively skewed with outlying data points, as shot length data typically are. A statistic is ‘robust’ if it is not unduly influenced by outliers – the mean is very sensitive to outliers, and just a single value that is very different from the rest of the data can wreck the mean as a measure of central tendency. The median is not affected in this way. The mean shot length should not have been used as a statistic of film style, and the conclusions Buckland draws on the basis of the mean shot length are worthless.
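Both points can be checked with a few lines of Python: the size of the skew, and the sensitivity of the mean to a single outlier. The sketch uses the simple moment coefficient of skewness g₁ with no small-sample adjustment. Textbooks and software differ on that adjustment, so its values may not match other packages exactly. The shot lengths are invented.

```python
from statistics import fmean, median

def moment_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2 ** 1.5 (no small-sample adjustment)."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

shots = [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 6.0]  # invented, in seconds
with_outlier = shots + [60.0]                # add one very long take

for label, data in [("without outlier", shots), ("with outlier", with_outlier)]:
    # The mean jumps and the skew grows, but the median stays at 4.0.
    print(label, round(fmean(data), 2), median(data), round(moment_skewness(data), 2))
```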

Testing

Buckland alerts the reader that his chapter on Poltergeist will involve a detailed analysis making thorough use of statistics:

I need to warn the reader that this chapter contains a lot of number crunching and statistical testing, which are necessary if we want to make an informed judgment about the creative force behind Poltergeist. The results of my analysis may surprise you (155).

What statistical tests are employed in this analysis?

None.

The statements Buckland makes about the style of each director and their relation to Poltergeist are simply assertions based on whether one number is similar to another. There is no statement of what is considered to be a statistically significant result – i.e. there is no value for α and no decision rules – and so there is no means by which we can judge the reliability of the results.

This is all the more bizarre because in Elsaesser and Buckland (2002) we find the following statement:

… some films such as Poltergeist have disputed authorship (was it directed by Tobe Hooper or Steven Spielberg?). By systematically analyzing the parameters of the shots in Poltergeist, and then comparing the results to samples from Hooper’s and Spielberg’s other films, it may be possible to identify the film’s authorship (defined in terms of mise en shot, that is, the parameters of the shot). Of course, because we move from descriptive to inferential statistics, then the result can never be certain, but only predicted with a degree of probability. Only the descriptive aspect of the analysis remains beyond doubt.

On a cautionary note, the variables chosen to determine a director’s style need to be valid (…). Secondly, the results need to be statistically significant, rather than due to chance occurrence. Many statistical tests are in fact tests for significance.

Why, then, does Buckland not employ any tests of statistical significance, when clearly he is at least aware that such tests exist? It all sounds very good, but in practice there is little of substance.

To demonstrate how this analysis could have been done, I look at the proportion of different types of shots in the five films in the study. The obvious test for comparing each director’s use of certain types of shot is the Z-test for two proportions, but the Fisher Exact Test can also be used. For an explanation of how to do the Z-test for two proportions, see David M. Lane’s Hyperstat website. For an explanation and online calculator for the Fisher Exact Test (as well as many other calculators), see the Graphpad website (the Fisher Exact Test is under the heading ‘Categorical data’).
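Both tests can be reproduced in Python. The sketch below implements the pooled two-proportion Z-test with a two-sided p-value, and a two-sided Fisher Exact Test for a 2 × 2 table computed from hypergeometric probabilities. For the Fisher test it follows one common two-sided convention: sum the probabilities of all tables no more likely than the one observed. The function names are illustrative, and these are not the Hyperstat or Graphpad implementations. As a check, the two-sided p-value for the Z = 0.6459 reported above for Salem's Lot and The Funhouse comes out at about 0.518.

```python
import math
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z-test of H0: p1 == p2. Returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, two_sided_p(z)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher Exact Test for the 2 x 2 table [[a, b], [c, d]].

    Rows are films; columns are 'shots of this type' and 'other shots'.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k, margins fixed.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one survive rounding.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))

print(round(two_sided_p(0.6459), 3))       # 0.518
print(two_proportion_z(45, 100, 55, 100))  # invented counts
print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.486
```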

For example, Buckland states that:

On average, 58 per cent of Hooper’s shot scales fall within the ‘big close-up to medium close-up’ range; for Spielberg, the figure is only 45 per cent. In Poltergeist, 55 per cent of the shot scales fall within this range, significantly closer to Hooper than Spielberg (164-165).

What does ‘significantly’ mean in this paragraph? There are several problems here. First, in statistics ‘significant’ has a specific (if controversial) meaning – it defines the amount of evidence required to reject a null hypothesis (though quite how you interpret this evidence depends on your preference for the Fisher or Neyman-Pearson approach to hypothesis testing, or the hybrid of the two). However, we cannot judge the significance of Buckland’s claim in these terms – we have a statement that sounds like statistics but is in fact not. We have (again) the presentation of averages without measures of dispersion or confidence intervals, and no significance test is performed. In the above paragraph, the use of the term ‘significant’ sounds good, but it is, from the point of view of statistics, meaningless: how close does ‘close’ have to be to be ‘significant?’ What procedure will we use to calculate ‘close?’ The main problem is that the issue is presented back to front: in statistical hypothesis testing, we always test the null hypothesis of no difference. This is not what Buckland describes: he says that the proportion for Hooper is significantly nearer to that of Poltergeist than the proportion for Spielberg. But how do we frame a statistical hypothesis to express this? A simple way is to compare the proportion for each director against that of Poltergeist. We state our hypotheses:

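  • The null hypothesis is: ‘the proportion of close-ups (big close-ups to medium close-ups) in Poltergeist is equal to that of the films directed by Hooper/Spielberg.’
  • The alternative hypothesis is: ‘the proportion of close-ups (big close-ups to medium close-ups) in Poltergeist is not equal to that of the films directed by Hooper/Spielberg.’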

The significance level is set at 0.05 – this means that if we get a p-value that is equal to or less than 0.05 we will say that there is a statistically significant difference, and if the p-value is greater than 0.05 we will say that there is no statistically significant difference. This is our decision rule. The p-value is NOT the probability that a hypothesis is true – it is the probability of getting a result that is equal to or more extreme than that observed if the null hypothesis is true. Nor is it the probability of incorrectly concluding that there is a difference: that error rate is controlled by the significance level, since if the null hypothesis is true we will reject it in about 5 per cent of tests at α = 0.05. It is important to remember that a statistically significant result is not necessarily a practically significant result, and how the former relates to the real-world situation you are analysing requires careful interpretation. A significance test of the above hypothesis will not tell us why there is or is not a difference; but if we assume that the decisions of filmmakers determine the style of a film, and that different filmmakers making different decisions will have different styles, we must first determine whether such a difference can be said to exist. Statistics is one method of doing this, but not the only one.

To answer the question of which director is closer to Poltergeist and which is further away, we need to address the effect size of the difference. A significance test will tell us whether there is a statistically significant difference, and the effect size will tell us how big that difference is. Unfortunately, there is not enough data to be able to do this for Buckland’s experiment. Nonetheless, it is important to be clear that the p-value does not tell you the size of a difference.
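A small numerical illustration of that last point, under assumptions of our own: Cohen’s h (one common effect-size measure for two proportions; the post does not say which measure it would use) stays fixed when the same two proportions are observed in larger samples, while the p-value of the Z-test shrinks. The proportions 0.35 and 0.30, the sample sizes, and the function names are all hypothetical.

```python
import math
from statistics import NormalDist


def cohens_h(p1, p2):
    """Cohen's h: 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def pooled_z_p_value(p1, n1, p2, n2):
    """Two-sided p-value of the pooled Z-test for two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: the same two proportions observed in films with more and more shots
for n in (100, 500, 2000):
    h = cohens_h(0.35, 0.30)
    p = pooled_z_p_value(0.35, n, 0.30, n)
    print(f"n = {n:>4} shots per film: h = {h:.3f}, p = {p:.4f}")
```

With these numbers the difference is not significant at 0.05 for 100 or 500 shots per film but is for 2000, although h is identical in every case: the p-value reflects the amount of data as much as the size of the difference.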

The results of a Z-test of proportions for our hypotheses at a significance level of 0.05 are presented in Table 1.

TABLE 1 Proportion of close-ups (big close-ups to medium close-ups) (α = 0.05)

In Table 1, we have a lot of information. The first column (P) gives the proportion of close-ups (big close-ups to medium close-ups) in the three data sets, with the confidence interval in the second column so that we know the margin of error of the estimate. The third column (Difference) gives the difference between the sample for each director and the sample for Poltergeist, and the fourth column is the confidence interval for this difference. (This will give us some limited understanding of ‘closer,’ but is not the same as the effect size.) The fifth column gives the result of the Z-test and the sixth column is the p-value. Note that for Hooper the p-value is greater than 0.05, and so we say that there is no statistically significant difference between Hooper and Poltergeist. For Spielberg, the p-value is less than 0.05, and so we say that there is a statistically significant difference between this director and Poltergeist. Buckland’s conclusion is vindicated by the statistical analysis – but without defining the hypotheses, without the statistical test, and without defining what we mean by significance, we are just guessing, and guessing is not research.
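For readers who want to reproduce columns like these, the sketch below computes each quantity for a single comparison. It is a minimal sketch under stated assumptions: it uses the normal-approximation (Wald) interval for each proportion and for the difference, and the pooled standard error for the Z statistic, which is one standard form of the test; the tables in this post may have been computed with a different interval method. The function names and the counts in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

# Two-sided critical value for 95% intervals (about 1.96)
Z_CRIT = NormalDist().inv_cdf(0.975)


def wald_ci(x, n, z=Z_CRIT):
    """Proportion x/n with a normal-approximation (Wald) confidence interval."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p, (p - z * se, p + z * se)


def two_proportion_z_test(x1, n1, x2, n2, z=Z_CRIT):
    """Compare x1/n1 with x2/n2 under H0: the two proportions are equal.

    Returns the difference, its confidence interval, the Z statistic and
    the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error for the confidence interval of the difference
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff_ci = (diff - z * se_diff, diff + z * se_diff)
    # Pooled standard error for the test, because H0 assumes one common proportion
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z_stat = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return diff, diff_ci, z_stat, p_value


# Hypothetical counts: 120 close-ups in 400 shots (one director's films)
# against 100 close-ups in 300 shots (the film under investigation)
p, ci = wald_ci(120, 400)
diff, diff_ci, z, p_value = two_proportion_z_test(120, 400, 100, 300)
print(f"P = {p:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Difference = {diff:.3f}, CI = ({diff_ci[0]:.3f}, {diff_ci[1]:.3f})")
print(f"Z = {z:.4f}, p = {p_value:.4f}")
```

The two standard errors differ by design: the interval describes the difference as observed, while the test asks how the data would behave if the null hypothesis were true.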

How good are Buckland’s other guesses? We can find out by performing statistical tests on a range of stylistic elements for which Buckland provides data. For the rest of these tests I will not state the hypotheses explicitly (in research papers hypotheses are typically implicit rather than explicit), but unless otherwise stated the null hypothesis in each case is of the form ‘the proportion of x in film y is equal to the proportion of x in Poltergeist.’ The test used in each case is the Z-test for two proportions, and the significance level is 0.05.

In Table 2 we can see the results of applying the Z-test to the proportion of reverse angle shots. They tell us that there is no statistically significant difference between Poltergeist and ET, Jurassic Park, or The Funhouse, while there is a significant difference between Poltergeist and Salem’s Lot. It is possible that, as a television programme (viewed on a smaller screen in the intimate setting of the home), Salem’s Lot uses reverse angle cuts in a different way to motion pictures designed to be viewed on a cinema screen. This is a hypothesis that can be tested statistically if you have the data: do films made for television have a greater proportion of reverse angle cuts than films made for theatres? If so, then the decision to include a made-for-television production, which Buckland justifies on the basis of one element of film style (see above), is flawed, and the results will reflect the difference between two media and not between two directors. Either way, looking at this element of film style leads us to no firm conclusion about who could be considered the author of Poltergeist.

TABLE 2 Proportion of reverse angle shots in Poltergeist against four films (α = 0.05)

The same is also true when we look at the proportion of low angle shots (Table 3). The results show that there is no significant difference between the proportion of low angle shots in Poltergeist and ET or The Funhouse, but that there is a significant difference between the proportion of low angle shots in Poltergeist and Salem’s Lot and Jurassic Park. There is no conclusion that we can draw here about the authorship of Poltergeist.

TABLE 3 Proportion of low angle shots in Poltergeist against four films (α = 0.05)

We also cannot draw any conclusion based on the proportion of high angle shots (Table 4), which shows a significant difference between Poltergeist and ET, but no significant difference between Poltergeist and the other three films.

TABLE 4 Proportion of high angle shots in Poltergeist against four films (α = 0.05)

Buckland argues that the proportion of shots with a low camera height in Poltergeist is more akin to the films of Spielberg than to those of Hooper; and if Spielberg did not actually direct Poltergeist, then Buckland suggests (reasonably) that this may have been a creative suggestion from one filmmaker (Spielberg) to another (Hooper). The results of the Z-test show that Poltergeist has a significantly different proportion of low camera height shots from The Funhouse or Salem’s Lot, and we may conclude that a proportion of 0.53 is certainly unusual given what we know about Hooper’s film style. There is no significant difference between the proportion of low camera height shots in Poltergeist and ET, and we could conclude that placing the camera at this height originated with Spielberg (as a creative suggestion, if not as his own decision), were it not for the fact that Jurassic Park shows a statistically significant difference from Poltergeist. Buckland’s argument that ‘we can infer that the decision to use so many low camera heights in Poltergeist was Spielberg’s suggestion, which constitutes one of the pieces of advice he offered to Hooper on the set’ (163) cannot be sustained, because we cannot, in fact, conclude from the results in Table 5 that the use of low camera height shots in Poltergeist is typical of Spielberg. Note that the confidence interval for the proportion in ET does not include the proportion for Jurassic Park, and vice versa. This example demonstrates clearly why it is necessary to perform statistical tests and not simply make assertions based on the fact that one number is more like a second number than another: 0.42 looks close enough to 0.53 for Spielberg’s influence to be plausible – especially when the proportions for the Hooper films are 0.29 and 0.33 – but the Z-test leads us to the alternative conclusion.
This does not mean that Spielberg did not influence Hooper’s decision to place the camera at a low height – but it is not a statistically sound conclusion.
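The confidence-interval check mentioned above (whether each film’s interval includes the other film’s proportion) can be sketched as follows, assuming Wald (normal-approximation) 95% intervals. The counts are hypothetical placeholders for two films, not Buckland’s data, and the helper names are illustrative. The check is a descriptive aid; the decision itself still rests on the Z-test.

```python
import math


def wald_ci(x, n, z=1.96):
    """Approximate 95% confidence interval for the proportion x/n."""
    p = x / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def contains(interval, value):
    """True if value lies inside the closed interval (lo, hi)."""
    lo, hi = interval
    return lo <= value <= hi


# Hypothetical counts for two films: 160 of 400 shots and 200 of 400 shots
x_a, n_a = 160, 400
x_b, n_b = 200, 400
ci_a, ci_b = wald_ci(x_a, n_a), wald_ci(x_b, n_b)

# Does each film's interval include the other film's proportion?
print("A's interval includes B's proportion:", contains(ci_a, x_b / n_b))
print("B's interval includes A's proportion:", contains(ci_b, x_a / n_a))
```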

TABLE 5 Proportion of low camera height shots in Poltergeist against four films (α = 0.05)

Things are clearer when we look at the proportion of moving shots: there are significant differences between Poltergeist and the two Spielberg films, but no significant difference between Poltergeist and the two Hooper films. In isolation, we might interpret this as a clear indication that Poltergeist was directed by Hooper. However, when interpreted in relation to the other types of shot Buckland includes, this result serves only to confuse the issue.

TABLE 6 Proportion of moving shots in Poltergeist against four films (α = 0.05)

Again, the proportion of shots in the range 1-3 seconds (Table 7) seemingly paints a clear-cut picture in which Hooper did direct Poltergeist. Taken with the moving shots, we might argue that these two statistics are the only elements of film style that can distinguish one filmmaker from another – but this is a highly selective interpretation of the available evidence, and it would be necessary to explain why reverse angle shots, low angle shots, etc., should not be used. As Buckland bases his interpretation on all the available data, the results in Table 7 are inconclusive when viewed in the context of the rest of the data. We can only conclude that there are some differences between some of the films on some measures.

TABLE 7 Proportion of shots in the range 1-3 seconds in Poltergeist against four films (α = 0.05)

All of this assumes that Hooper’s and Spielberg’s films are stylistically different from one another, but is this, in fact, the case? For example, if we compare the proportion of shots in the range 1-3 seconds in ET and Jurassic Park against The Funhouse and Salem’s Lot (see Table 8), we find that we cannot simply distinguish between Spielberg and Hooper as film directors. Neither Salem’s Lot nor The Funhouse shows a significant difference from ET, while both films are significantly different from Jurassic Park. We might conclude, therefore, that the director of Jurassic Park was not the same as the director of Salem’s Lot and The Funhouse; but, if we did so, would we not also need to consider the possibility that the director of ET did direct Salem’s Lot and The Funhouse? This is made even more complicated by the fact that ET shows no significant difference from Jurassic Park in the proportion of shots in the range 1-3 seconds (Z = 1.4443, p = 0.1487), and that there is no significant difference between Salem’s Lot and The Funhouse (Z = 0.2371, p = 0.8126). Should we then conclude that the director of ET also directed The Funhouse, Salem’s Lot, and Jurassic Park, but that the director of Salem’s Lot, The Funhouse, and ET did not direct Jurassic Park? Buckland describes these films as being of ‘undisputed authorship’ (157), and certainly there is no reason to think that the director in each case has been inaccurately credited – but is there any statistical evidence to support this? Is statistics even able to answer this question?
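Comparisons like those in Table 8, where films are tested against each other rather than only against Poltergeist, can be sketched as below. This is a minimal sketch assuming the pooled two-proportion Z-test; the film labels and counts are hypothetical placeholders, not Buckland’s data, and the function names are illustrative. Four films give six pairs.

```python
import math
from itertools import combinations
from statistics import NormalDist


def z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test; returns (Z, two-sided p-value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def pairwise_tests(counts, alpha=0.05):
    """Test every pair of films.

    counts maps a film title to (shots of the type of interest, total shots).
    Returns {(film_a, film_b): (Z, p, significant)}.
    """
    results = {}
    for a, b in combinations(counts, 2):
        z, p = z_test(*counts[a], *counts[b])
        results[(a, b)] = (z, p, p <= alpha)
    return results


# Hypothetical counts of shots lasting 1-3 seconds (placeholders only)
films = {
    "Film A": (300, 700),
    "Film B": (330, 700),
    "Film C": (250, 600),
    "Film D": (520, 1250),
}
for (a, b), (z, p, significant) in pairwise_tests(films).items():
    print(f"{a} vs {b}: Z = {z:.4f}, p = {p:.4f}, significant: {significant}")
```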

TABLE 8 Z-test of the proportion of shots in the range 1-3 seconds in four films (α = 0.05)

Presentation

One of the problems with Buckland’s analysis is that it is difficult to follow. This is due to the poor presentation of the data, which is organised by film rather than by variable. As a result, we find the relevant statistics for reverse angle shots on five different pages, and have to spend time hunting for and organising this data. This makes it difficult to compare and contrast the different stylistic elements. Hopefully you will have found the tables produced here clear and simple to understand, with all the relevant data easily to hand. In Table 2, for example, the proportion of reverse angle shots in each film is presented in a single column, so that rather than having to flip from page to page you can see all the data at once. It is far easier to identify patterns when the data is presented side by side.
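The difference between organising statistics by film and organising them by variable is easy to show in code. The sketch below assumes the data arrive as one record per film, as in Buckland’s presentation, and pivots them into one row per stylistic variable, so that each row reads across all the films at once. The function name, variable names and values are hypothetical.

```python
def by_variable(data_by_film):
    """Pivot {film: {variable: value}} into {variable: {film: value}}."""
    table = {}
    for film, measures in data_by_film.items():
        for variable, value in measures.items():
            table.setdefault(variable, {})[film] = value
    return table


# Hypothetical proportions, organised film by film
data_by_film = {
    "Film A": {"reverse angle": 0.30, "low angle": 0.10},
    "Film B": {"reverse angle": 0.25, "low angle": 0.15},
    "Film C": {"reverse angle": 0.35, "low angle": 0.05},
}

# Print one row per variable, with the films side by side
table = by_variable(data_by_film)
films = list(data_by_film)
print(f"{'':<15}" + "".join(f"{film:>8}" for film in films))
for variable, values in table.items():
    row = "".join(f"{values.get(film, float('nan')):>8.2f}" for film in films)
    print(f"{variable:<15}" + row)
```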

This might seem like a small and pedantic point, but if you want to present the reader with a detailed statistical analysis, then you have to make it easy for them to follow and understand. It is especially irritating given that the diagrams in the book’s other chapters are clear and easy to understand. It raises questions about the ability of Buckland, his readers, and the editors at Continuum to deal with statistical information – why, when everything else appears to have been done so much better, was the presentation of the statistics done so badly?

Conclusions

Buckland concludes that Hooper was the director of Poltergeist, but that Spielberg had an input on key stylistic decisions. This seems to me to be an entirely plausible description of the working relationship between two filmmakers who fulfilled the roles of director (Hooper), and producer and screenwriter (Spielberg). However, it is not a conclusion that can be reached through a statistical analysis of some elements of film style.

A further problem lies in the way in which the research question behind the chapter is framed. Buckland asks who the author of Poltergeist is: Spielberg or Hooper. This assumes an all-or-nothing conception of authorship that is parcelled out to one of two pre-selected individuals. What if the answer is neither (or even both)? What if there is no such thing as authorship in the cinema? Or, if such a thing does exist, what if it cannot be identified by the statistical analysis of those elements of film style and can only be located in the non-quantifiable, such as mise-en-scene? We are also assuming that a statistically significant difference reflects the practical difference that a filmmaker’s decisions make to film style – not necessarily an unreasonable assumption, but one that needs to be considered in the design of the experiment.

We could just drop the authorship question entirely and ask who, on the basis of the results presented here, should be credited as the director of Poltergeist. (Buckland treats these two questions as equivalent, and there is no reason not to do so, but they could be separated.) Well, some measures would seem to favour Spielberg, while others would favour Hooper. We certainly cannot apportion some role of direct creative agency as ‘author’ based on statistics if we cannot use those statistics to say who, in fact, directed the film! Table 9 summarises whether the proportion of each shot type differs significantly between each film and Poltergeist, and we can see that there is no consistent pattern for these elements of film style.

TABLE 9 Statistically significant differences in shot types between Poltergeist and four films (Z-test for two proportions, α = 0.05)

We might also question the results that do indicate significant differences, because the multiple tests used may produce a higher than expected error rate. With a significance level of 0.05 for each of the 20 tests, we would expect about one significant result (20 × 0.05 = 1) by chance alone even if there were no real differences at all. We should therefore allow for the possibility that at least one ‘YES’ in Table 9 is a false positive, but we cannot know which one. One remedy is to correct the significance level to take multiple testing into account, thereby reducing the critical p-value. This would make our decision rule much more stringent, and some of the significant differences above would be reclassified as ‘not significant.’ For the 20 hypothesis tests presented in Table 9, the Bonferroni correction gives a corrected significance level of 0.05/20 = 0.0025, which keeps the probability of at least one false positive across the whole experiment at or below 5%.
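The arithmetic behind the multiple-testing correction is short enough to sketch. Assuming the 20 tests are independent and every null hypothesis is true, the chance of at least one false positive is 1 − 0.95^20, roughly 0.64; the Bonferroni level of 0.05/20 holds this family-wise error rate at or below 0.05 whether or not the tests are independent. The function names are illustrative, and the p-values in the last line are hypothetical, not those behind Table 9.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) for m independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** m


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant at the Bonferroni level alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


m = 20
print("Expected false positives if every H0 is true:", m * 0.05)
print("Chance of at least one:", round(familywise_error_rate(0.05, m), 4))
print("Corrected level:", 0.05 / m)

# Hypothetical p-values for four tests (threshold is 0.05 / 4 = 0.0125)
print(bonferroni_significant([0.001, 0.004, 0.03, 0.2]))
```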

On the back cover of Directed by Steven Spielberg we find the promise that,

Buckland also uses poetics to answer once and for all the question: did Spielberg really direct Poltergeist? The reader will discover whether Poltergeist should remain a Tobe Hooper film, or whether it should be added to Spielberg’s canon.

If we adopt a statistical approach, what can we conclude about the roles of Spielberg and Hooper in the production of Poltergeist? Well, nothing, it turns out, and the reader will discover nothing. The results of the tests presented above are too inconclusive, too topsy-turvy, and too open to conflicting interpretations to justify the conclusion that either Spielberg or Hooper should be credited as author or, indeed, as director. All data is open to multiple interpretations, but we should at least be able to (1) explain the logic behind a particular interpretation, (2) give reasons why one interpretation should be considered better than another, and (3) subject that interpretation to further scrutiny. As I have shown here, Buckland’s study fails on all three counts because of the potentially flawed design of the study, the lack of a statistical methodology, the failure to provide all the necessary information, and the difficulty of understanding the data presented owing to its poor organisation.

Summary

Buckland makes bold claims for his chapter on Poltergeist, and promises that the results of his analysis may surprise the reader. Unfortunately, there is little surprising about the standard of the statistical analysis in this book, and the mistakes Buckland makes are the same mistakes that have been made for over thirty years in film studies. For example, no one to my knowledge has ever conducted a statistical test or provided a confidence interval when making statements about film style while quoting things like average shot lengths or the proportion of a type of shot in a film; and Bordwell and Thompson (1985) made precisely the same mistake about the use of the term ‘significant’ that Buckland makes 21 years later. Sample statistics are presented as if they were population parameters, and there are no measures of dispersion or confidence intervals. The wrong statistics are used when the data clearly indicate the need for alternative methods.

Notes

  1. Unless otherwise stated, all page references are to this chapter.
  2. Charles O’Brien (2005: 83) does provide standard deviations for some data, including standard deviations for some of Barry Salt’s data that do not appear to be in Salt (1992), but makes no reference to them and performs no statistical tests.

References

Bordwell D and Thompson K 1985 Toward a scientific film theory, Quarterly Review of Film Studies 10 (3): 224–237. Available online: http://www.davidbordwell.net/articles/Bordwell_Thompson_QuarterlyRevFilmStud_vol10_no3_summer1988_224.pdf, accessed 18 November 2009.

Buckland W 2006 Directed by Steven Spielberg: Poetics of the Contemporary Hollywood Blockbuster. London: Continuum.

Elsaesser T and Buckland W 2002 Studying Contemporary American Film: A Guide to Movie Analysis. London: Arnold. The chapter on the statistical analysis of film style can be accessed online: http://www.cinemetrics.lv/buckland.php, accessed 18 November 2009.

O’Brien C 2005 Cinema’s Conversion to Sound: Technology and Film Style in France and the U.S. Bloomington: Indiana University Press.

Salt B 1992 Film Style and Technology: History and Analysis, second edition. London: Starwood.

Schuster M 2002 Informing cultural policy – data, statistics, and meaning, International Symposium on Cultural Statistics, UNESCO Institute for Statistics, Observatoire de la culture et des communications du Québec, Montréal, Québec, Canada, October 21 to 23, 2002. Available online: http://www.culturalpolicies.net/web/files/74/en/Schuster.pdf, accessed 18 November 2009.