# Location and spread in shot length distributions

The typical characteristics of the distribution of shot lengths in a motion picture are:

• The distribution is decidedly non-normal: it is positively skewed. Although it is possible to conceive of a film that would have a normal or even a negatively skewed distribution of shot lengths, this does not occur in practice, and I have never come across any film in which the shot lengths were not positively skewed.
• The distribution will include some outlying data points that are far from the average value (the mean or the median shot length).

An additional characteristic worth exploring is the linear relationship between the average value of a shot length distribution and the spread of the data around that value. Figures 1 to 6 plot the average value (the mean and the median shot lengths) of the 50 Hollywood films I used in my analysis of the impact of sound technology on film style (20 silent and 30 sound) against three measures of absolute dispersion: the standard deviation, the interquartile range, and the median absolute deviation. The coefficient of determination (R²) is given as a measure of the strength of the linear relationship between location and spread. The correlation coefficients for all the comparisons are statistically significant at the 5% level (95% confidence).
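As a rough illustration of the quantities involved, the following is a minimal Python sketch of how the measures of location and spread for a single film, and the coefficient of determination across a group of films, might be computed. The function names and the sample shot lengths are hypothetical and are not the data or code behind the figures. It also assumes the unscaled median absolute deviation and the default quartile method of Python's `statistics` module; other quartile definitions will give slightly different IQR values.

```python
import statistics


def location_and_spread(shot_lengths):
    """Summarise one film's shot lengths (in seconds)."""
    median = statistics.median(shot_lengths)
    # Quartiles via the module's default ('exclusive') method: an assumption.
    q1, _, q3 = statistics.quantiles(shot_lengths, n=4)
    # Unscaled MAD: the median of absolute deviations from the median.
    mad = statistics.median([abs(x - median) for x in shot_lengths])
    return {
        "mean": statistics.mean(shot_lengths),
        "median": median,
        "sd": statistics.stdev(shot_lengths),  # sample standard deviation
        "iqr": q3 - q1,
        "mad": mad,
    }


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical shot lengths for three made-up films (not real data).
films = [
    [2.1, 3.4, 1.8, 5.9, 4.2, 12.7, 2.6],
    [4.0, 6.5, 3.1, 9.8, 22.4, 5.2, 7.7],
    [1.2, 2.0, 1.5, 3.3, 2.8, 8.1, 1.9],
]
summaries = [location_and_spread(f) for f in films]
medians = [s["median"] for s in summaries]
mads = [s["mad"] for s in summaries]
print(r_squared(medians, mads))
```

Each film contributes one point (its location, its spread) to a scatter plot, and R² is then computed across the films in the group.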

Figure 1 Mean shot length v. standard deviation for silent Hollywood films produced from 1920 to 1928 inclusively (n = 20).

Figure 2 Median shot length v. interquartile range for silent Hollywood films produced from 1920 to 1928 inclusively (n = 20).

Figure 3 Median shot length v. median absolute deviation for silent Hollywood films produced from 1920 to 1928 inclusively (n = 20).

Figure 4 Mean shot length v. standard deviation for sound Hollywood films produced from 1929 to 1931 inclusively (n = 30).

Figure 5 Median shot length v. interquartile range for sound Hollywood films produced from 1929 to 1931 inclusively (n = 30).

Figure 6 Median shot length v. median absolute deviation for sound Hollywood films produced from 1929 to 1931 inclusively (n = 30).

In general the linear relationship between location and spread for these films is evident, but may be quite weak. The strongest linear relationship occurs between the median shot length and the median absolute deviation, and the strength of this relationship increases from the silent to the sound era. In both cases a substantial proportion of the variance is unexplained, but overall films with greater median shot lengths exhibit greater variation in their shot lengths.

The relationship between the mean shot length and the standard deviation shows weaker linearity, with approximately one-third of the variance unexplained for both groups of films although there is a small increase in the strength of the relationship from the silent to the sound eras.

For the median and the interquartile range (IQR), the sound films show a weak linear relationship, but the silent films show only a very weak one: although r is significant (t[18] = 4.1090, p = 0.0007), over half the variance in the IQR is unexplained. R² is 0.4840 for the silent films and 0.7490 for the sound films, though why such a difference should occur for this relationship and not for the others is a mystery. There is clearly something about the relationship between the median shot length and the interquartile range in the sample of silent films that requires further exploration.
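The reported t statistic can be recovered from the coefficient of determination and the sample size alone, using the standard test of the null hypothesis of no correlation, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The short Python sketch below is only a consistency check; the function name is hypothetical, and it assumes a positive correlation, since R² does not carry the sign of r.

```python
import math


def t_from_r_squared(r_squared, n):
    """t statistic and degrees of freedom for H0: r = 0, given R^2 and n.

    Assumes r is positive (R^2 alone does not carry the sign).
    """
    df = n - 2
    r = math.sqrt(r_squared)
    t = r * math.sqrt(df) / math.sqrt(1.0 - r_squared)
    return t, df


# Silent films: R^2 = 0.4840 for median v. IQR, n = 20.
t, df = t_from_r_squared(0.4840, 20)
print(f"t[{df}] = {t:.3f}")  # approximately 4.109, matching the value above
```

The same R² gives the unexplained share of the variance directly: 1 − 0.4840 = 0.5160, which is the "over half" referred to above.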

We can say that for Hollywood films of the 1920s and the early sound period, as the average shot length of a motion picture increases so does the variability of its shot lengths. As expected for skewed data sets, the linear relationship is strongest between measures that do not rely on a mathematical relationship to the mean. It seems likely that other groups of films will exhibit similar relationships between measures of location and spread (although perhaps not for the median and the IQR), but it will take further studies to test this hypothesis.

## About Nick Redfern

I am an independent academic with over 15 years' experience teaching film in higher education in the UK. I have taught film analysis, film industries, film theories, film history, and science fiction at Manchester Metropolitan University, the University of Central Lancashire, and Leeds Trinity University, where I was programme leader for film from 2016 to 2020. My research interests include computational film analysis, horror cinema, sound design, science fiction, film trailers, British cinema, and regional film cultures.

Posted on November 12, 2009, in Cinemetrics, Film Studies, Film Style, Hollywood, Silent cinema, Statistics.

### Comments

1. Barry Salt

Come on. You know why there is an approximately linear relationship between the median and the standard or other deviation. It’s because the shot length distributions are roughly lognormal, with a shape factor of about 0.9, as I have previously shown. Why not say so?

2. Veronika Koch

Could you please tell me what “t [18] = 4.1090” means?
Thank you.

• Nick Redfern

The significance of the correlation coefficient is given by a t-test of the null hypothesis of no correlation (i.e. r = 0), with n − 2 degrees of freedom. So t [18] = 4.1090 refers to the t statistic with 18 degrees of freedom (20 silent films minus 2), and 4.1090 is the value of that statistic.