# Analysing film style using dominance statistics

UPDATE: An article using the ideas introduced in this post has now been published as Comparing the Shot Length Distributions of Motion Pictures using Dominance Statistics, Empirical Studies of the Arts 32 (2) 2014: 257-273. DOI: 10.2190/EM.32.2.g. It can be found here.

Statistical comparisons of film style have been based on the average shot length (either the mean or the median), so that, for example, given the ASLs of two films the one with the greater average is said to be edited more slowly.

In his first contribution to the Cinemetrics conversation, Mike Baxter argued that in some circumstances neither the mean nor the median was a useful statistic of film style. In this post I look at how we might compare the shot length distributions of two films or two groups of films without beginning with an average shot length. The methods used are Cliff’s d statistic, which measures the stochastic dominance of one sample over another, and the Hodges-Lehmann median difference, which measures the average distance between observations in the two samples. Results produced by these methods are then compared to the interpretation of film style using average shot lengths, measures of dispersion, and graphical methods. This will also provide us with an opportunity to consider Baxter’s further claim that it makes little difference which average is used since either would lead to the same interpretation of film style.

### Cliff’s d statistic

Cliff (1993, 1996) introduced the stochastic difference

d = P(X > Y) – P(X < Y)

as a nonparametric method of measuring the extent to which two samples (X and Y) overlap. This means we find the probability that an observation in sample X is greater than an observation in sample Y, and from this we subtract the probability that an observation in Y is greater than an observation in X. Ties are not included in the calculation. Cliff’s d statistic can be calculated as a linear transformation of the probability of superiority:

d = 2PS – 1

where PS is equal to the Mann-Whitney U test statistic divided by the product of the sample sizes (PS = U/nm) (see Delaney & Vargha 2002). Since PS = P(X > Y) + 0.5P(X = Y), ties are accounted for. The value of d ranges from -1 (when every observation in X is less than every observation in Y) to 1 (when every observation in X is greater than every observation in Y); and stochastic equality occurs at 0 (when there is complete overlap between the distributions).
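The two routes to d described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the shot lengths are hypothetical, and the brute-force pairwise counting stands in for the Mann-Whitney U statistic (counting each tie as half), which is adequate for samples of a few hundred shots.

```python
from itertools import product

def cliffs_d(x, y):
    """Estimate d = P(X > Y) - P(X < Y) by counting over all n*m pairs."""
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))

def probability_of_superiority(x, y):
    """Estimate PS = P(X > Y) + 0.5 * P(X = Y); ties count as half."""
    greater = sum(1 for a, b in product(x, y) if a > b)
    ties = sum(1 for a, b in product(x, y) if a == b)
    return (greater + 0.5 * ties) / (len(x) * len(y))

# Hypothetical shot lengths in seconds, for illustration only
film_x = [2.1, 3.4, 3.4, 5.0, 7.2]
film_y = [3.4, 4.8, 6.1, 9.5]

d = cliffs_d(film_x, film_y)
ps = probability_of_superiority(film_x, film_y)
print(d, 2 * ps - 1)  # the two values should agree up to floating-point error
```

Because the tied pairs cancel out of 2PS – 1, the direct definition and the transformation of PS give the same value of d.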

This statistic has several advantages for comparing two distributions:

• It is not based on any assumptions about the data
• It is robust against outliers and unequal variances
• It is invariant under monotonic transformation
• It provides a more direct answer to the sort of questions researchers often wish to ask of data: ‘if one’s primary interest is in a quantification of the statement “Xs tend to be higher than Ys,” then [d] provides an unambiguous description of the extent to which this is so’ (Cliff 1996: 125).

The stochastic dominance of one sample over another can be visualised graphically since d measures the extent to which one population distribution lies to the right of another.

### Hodges-Lehmann median difference

Although we can use Cliff’s d to discover if the shots in one film tend to be shorter than the shots in another, it cannot tell us how much shorter those shots tend to be. For this we need another statistic. The Hodges-Lehmann median difference (HLΔ) for two samples is the median of all the pairwise differences between every observation in X and every observation in Y:

HLΔ = med{x_i – y_j}

In other words, taking X as film A and Y as film B, subtract the length of every shot in film B from the length of every shot in film A and then find the median of the n × m differences. HLΔ is a measure of the average distance between observations in X and observations in Y.
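A short Python sketch of this calculation follows. The function name and the shot lengths are hypothetical, and the differences are taken as X minus Y, so a negative result means the shots in X tend to be shorter.

```python
from itertools import product
from statistics import median

def hodges_lehmann_difference(x, y):
    """Median of all n*m pairwise differences x_i - y_j."""
    return median([a - b for a, b in product(x, y)])

# Hypothetical shot lengths in seconds, for illustration only
film_x = [2.1, 3.4, 3.4, 5.0, 7.2]
film_y = [3.4, 4.8, 6.1, 9.5]

hl_delta = hodges_lehmann_difference(film_x, film_y)
print(hl_delta)
```

Storing all n × m differences is fine for two feature films (a few hundred thousand values), though it would not scale to very large samples.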

### Comparing the style of two films

As a first example, let’s take Lights of New York and Scarlet Empress, which I used in my own contribution to the Cinemetrics conversation. Basing our interpretation on the median shot lengths we see that Lights of New York has a median of 5.1 seconds and that Scarlet Empress has a median of 6.5 seconds, indicating that the former is edited more quickly than the latter. In contrast, an interpretation based on the mean shot length implies that both films are cut equally quickly since each film has a mean shot length of 9.9 seconds.

To calculate d we first need to perform the Mann-Whitney U test, which gives us U = 88188, and then we derive the probability of superiority by dividing by the product of the sample sizes (338 and 601):

PS = U/nm = 88188/(338 × 601) = 0.4341.

From this we can calculate the stochastic dominance between the two distributions:

d = 2PS – 1 = (2*0.4341) – 1 = –0.1318.
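The arithmetic can be checked with a few lines of Python using only the figures quoted above (U = 88188 and sample sizes of 338 and 601). The variable names are illustrative. Note that the value –0.1318 comes from doubling the rounded PS; carrying full precision through gives approximately –0.1317, a difference that does not affect the interpretation.

```python
u = 88188          # Mann-Whitney U statistic reported above
n, m = 338, 601    # sample sizes reported above

ps = u / (n * m)                          # probability of superiority
d_from_rounded_ps = 2 * round(ps, 4) - 1  # follows the worked calculation above
d_full_precision = 2 * ps - 1             # no intermediate rounding

print(round(ps, 4), round(d_from_rounded_ps, 4), round(d_full_precision, 4))
```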

Therefore, we conclude that shots in Lights of New York tend to be of shorter duration than those of Scarlet Empress. This can be clearly seen in Figure 1, which shows the empirical cumulative distribution functions of the two films.

Figure 1 The empirical cumulative distributions of Lights of New York and Scarlet Empress (KS Test: D = 0.12, p < 0.01)

The function of Scarlet Empress tends to lie to the right of that of Lights of New York, indicating that it has shots of longer duration, except in the very upper tail, where Lights of New York has a few unusually long takes that account for only ~7% of the shots in this film. It is this handful of shots that pulls the mean away from the mass of the data, and if we remove the 24 longest shots from the distribution of Lights of New York we see that the mean shot length falls to 6.4 seconds. This is clearly a very influential group of outliers, as just this 7% of the total number of shots leads to a 33% difference in the mean, equivalent to a 3.5 second increase. It takes an act of wilful perversity to claim that there are no outliers present in this data, that the mean is not greatly influenced by those outliers, and that the mean shot length is an accurate description of the style of this film.
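The effect of a handful of long takes on the mean can be illustrated with a toy example. The shot lengths below are invented purely for illustration and are not the data for either film; the point is only that removing the longest shots moves the mean a great deal while the median barely changes.

```python
from statistics import mean, median

# Hypothetical shot lengths (seconds): mostly short shots plus two long takes
shots = [3.0, 4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0, 60.0, 90.0]
trimmed = sorted(shots)[:-2]   # drop the two longest shots

mean_all, mean_trimmed = mean(shots), mean(trimmed)
median_all, median_trimmed = median(shots), median(trimmed)

print(f"mean:   {mean_all:.2f} -> {mean_trimmed:.2f}")
print(f"median: {median_all:.2f} -> {median_trimmed:.2f}")
```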

For these two films HLΔ = -1.0 (95% CI: -1.6, -0.4), which means that on average a shot in Lights of New York is 1 second shorter in duration than a shot in Scarlet Empress.

The interpretation of the difference in the style of these films based on Cliff’s d and HLΔ is consistent with that based on the median shot length but not with the conclusion derived from the mean shot length. The difference in these statistics indicates that far from leading to the same conclusion they lead to contrary and incompatible conclusions, and so Baxter’s argument that the choice of statistic is irrelevant does not hold in this case.

### Comparing the style of two groups of films

To compare the style of two groups of films we use the same methods described above and calculate the pairwise statistics for all the films in both samples. We can then take the median value of the n × m d statistics and of the n × m HLΔ statistics as estimates of the differences between the two groups.

To illustrate this I use the example of the Laurel and Hardy short films I discussed in an earlier paper. In this study I compared the median shot lengths of a sample of silent films and a sample of sound films starring Laurel and Hardy produced between 1927 and 1933, and concluded that there was a statistically significant difference between the two samples of medians but that it was a small difference reflecting the continuity of a mode of production, of creative personnel, and of a style of comedy with the introduction of sound technology. The difference in the median shot lengths was estimated to be HLΔ = 0.5 seconds (95% CI: 0.1, 1.1) and PS = 0.2333. (I also compared statistics of the dispersion of shot lengths in these films but I won’t discuss these here).

If this analysis had been conducted using the mean shot length then I would have reached a different conclusion, with HLΔ = 1.5 seconds (95% CI: 0.8, 2.3) and PS = 0.1188. This result would appear to indicate that the introduction of sound technology had a large impact on the style of Laurel and Hardy films and would lead us to conclude there is no continuity from the silent to the sound era. Again, there is a difference in the interpretation of the style of these films indicated by the different statistics: the estimate of the impact of sound technology based on the means is three times that based on the medians. Again, Baxter’s argument that the choice of statistic does not matter simply doesn’t hold water.

What conclusion do the dominance statistics lead to? As we have a sample of 12 silent films and a sample of 20 sound films we need to perform a total of 12 × 20 = 240 calculations. Table 1 presents the pairwise comparisons for Cliff’s d, while the pairwise HLΔ statistics are in Table 2.
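The group comparison can be sketched in Python as follows, assuming each film is stored as a list of shot lengths. The function names and the tiny groups of films are hypothetical and serve only to show the shape of the calculation; with 12 silent and 20 sound films the same loop would run over 240 pairs.

```python
from itertools import product
from statistics import median

def cliffs_d(x, y):
    """d = P(X > Y) - P(X < Y), estimated over all pairs of shots."""
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))

def hl_delta(x, y):
    """Median of all pairwise differences x_i - y_j."""
    return median([a - b for a, b in product(x, y)])

def group_dominance(group_a, group_b):
    """Median of the pairwise d and HL statistics over every pair of films."""
    pairs = list(product(group_a, group_b))
    ds = [cliffs_d(a, b) for a, b in pairs]
    hls = [hl_delta(a, b) for a, b in pairs]
    return median(ds), median(hls), len(pairs)

# Hypothetical groups: each inner list is one film's shot lengths (seconds)
silent = [[2.0, 3.5, 4.0, 6.0], [1.5, 3.0, 5.5]]
sound = [[3.0, 4.5, 7.0], [2.5, 5.0, 6.5, 9.0], [4.0, 4.5, 8.0]]

median_d, median_hl, n_pairs = group_dominance(silent, sound)
print(median_d, median_hl, n_pairs)
```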

The median of the pairwise Cliff’s d statistics is -0.0957 (95% CI: -0.1192, -0.0723). This indicates that shots in the silent films of Laurel and Hardy tend to be of shorter duration than those of their sound films, and that this effect is relatively small.

Table 1 Pairwise Cliff’s d statistics for silent and sound films of Laurel and Hardy.

The median of the pairwise HLΔ statistics is 0.4s (95% CI: 0.3, 0.5), which again indicates a significant if small difference between the samples, with the shots in the sound films tending to be of slightly longer duration on average than those of the silent films.

Table 2 Pairwise HLΔ statistics for silent and sound films of Laurel and Hardy.

Both these results are consistent with my analysis based on the median shot length. Neither of these statistics is compatible with the interpretation based on the mean shot lengths.

A problem with applying Cliff’s d and HLΔ in this way is that as the sample sizes grow the number of pairwise comparisons becomes very large. For example, if we wanted to compare the style of two groups of films with 100 films in each sample we would have to perform 100 × 100 comparisons. That’s a total of 10,000 Mann-Whitney U tests, and while we are interested in film style I don’t think we’re that interested. It is here that the consistency of Cliff’s d and HLΔ with the median shot length is valuable. It is quick and easy to perform even a very large number of pairwise comparisons of median shot lengths simply by copying formulas across a range of cells in an Excel spreadsheet, for example. We can use the median shot length in the place of the dominance statistics thereby greatly speeding up the analytical process while allowing us to remain secure in our interpretation of the data. We cannot use the mean shot length in the same way since this method is not consistent with any of the others.
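As an alternative to a spreadsheet, the median-based shortcut can be sketched in Python. This is a minimal sketch that assumes each film is a list of shot lengths; the function name and the data are hypothetical. Each film is reduced to its median shot length first, so the pairwise step only involves one number per film.

```python
from itertools import product
from statistics import median

def median_proxy_difference(group_a, group_b):
    """Median of the pairwise differences between the films' median shot lengths."""
    medians_a = [median(film) for film in group_a]
    medians_b = [median(film) for film in group_b]
    return median([a - b for a, b in product(medians_a, medians_b)])

# Hypothetical groups: each inner list is one film's shot lengths (seconds)
silent = [[2.0, 3.5, 4.0, 6.0], [1.5, 3.0, 5.5]]
sound = [[3.0, 4.5, 7.0], [2.5, 5.0, 6.5, 9.0], [4.0, 4.5, 8.0]]

print(median_proxy_difference(silent, sound))
```

For two groups of 100 films this requires 200 medians and 10,000 subtractions, rather than 10,000 full comparisons of shot length distributions.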

### Conclusion

Based on the above discussion we can arrive at the following conclusions:

• The claim that it does not matter which statistic of film style we use since using either the mean or the median will lead to the same interpretation is clearly not true and the choice of statistic will affect the size of any effect. In turn, this will have a direct impact on our conclusions about the nature of film style.
• We can analyse the style of films using dominance statistics, such as Cliff’s d and HLΔ, that do not require any average shot length. The meaning and interpretation of these statistics may correspond more closely to questions we wish to ask of film style than using average shot lengths (though we still need descriptive statistics and graphs to provide information about the shot length distribution).
• It may not be practical to use dominance statistics for comparing large samples of films due to the very large number of pairwise comparisons required. Mike Baxter indicated that an average shot length could be thought of as a ‘proxy statistic’ of film style, and the median shot length can certainly be used in this sense by virtue of its consistency with Cliff’s d and HLΔ.
• The mean shot length is not robust in the presence of outliers and leads to fundamentally flawed interpretations of film style. It is not consistent with either Cliff’s d or HLΔ, and cannot be used to answer the question ‘do the shots in film A tend to be longer than the shots in film B?’

### References

Cliff N 1993 Dominance statistics: ordinal analyses to answer ordinal questions, Psychological Bulletin 114 (3): 494-509.

Cliff N 1996 Ordinal Methods for Behavioural Data Analysis. Mahwah, NJ: Lawrence Erlbaum Associates Inc.

Delaney HD and Vargha A 2002 Comparing several robust tests of stochastic equality with ordinally scaled variables and small to moderate sample sizes, Psychological Methods 7 (4): 485-503. 