Category Archives: Hollywood

Genre and Hollywood studios, 1991 to 2010

Historically, particular movie studios were often associated with a specific genre of filmmaking as a strategy of differentiating their product in the marketplace (e.g. MGM and musicals, Universal and horror films, Warner Bros. and gangster films), whilst also ensuring that their product was sufficiently diverse to mitigate changes in audience taste and fashion. Table 1 lists the number of films in each of nine genres released by Hollywood studios that were ranked in the top 50 films at the US box office from 1991 to 2010, inclusive. This gives a total sample of 1000 films. See here for more on the sample used. This table is quite large, and can be seen better by opening it in a new window.

Table 1 Number of films in each genre released by Hollywood studios, 1991 to 2010 (minimum of 20 releases)

The data show no evidence of genre specialisation among five of the six major studios (Fox, Paramount, Sony, Universal, and Warner Bros.). Fox has released fewer crime/thriller films than the other major studios, while releasing a greater number of fantasy/science fiction films. Paramount and DreamWorks have co-released 10 family films, which accounts for Paramount’s number of releases in this category being lower than that of the other major studios. The exception among the major studios is Buena Vista, whose output is dominated by, and dominates, the genre of family films. Of the 162 films released by the studio to make it into the top 50 between 1991 and 2010, 44% were family films; and this one firm accounts for 43% of the 164 films of this genre in the sample. This result is unsurprising, since Buena Vista is the releasing arm of the Walt Disney Corporation and reflects the corporate image of that company as a producer of safe, wholesome, family entertainment (Wasko 2001). Buena Vista has also diversified its product and the frequency with which it has released other types of film is generally consistent with the other majors, although it has released fewer crime/thriller films compared to most of the other studios.

The six majors account for a total of 778 films in the sample; and many of the smaller firms listed operate within their orbit. New Line was a part of the Time-Warner media conglomerate from 1993 until it merged with Warner Bros. in 2008; and DreamWorks has entered into production and/or distribution arrangements with Paramount and Disney. The only film amongst the highest grossing in this twenty-year period not connected to one of the major media conglomerates is Newmarket’s The Passion of the Christ (2004), which was produced and distributed outside the traditional Hollywood mechanisms (Maresco 2004). Looking at the smaller firms in Table 1, we see that New Line’s output is dominated by comedy films, although its most profitable films were the Lord of the Rings trilogy; while half of MGM’s limited output is accounted for by action/adventure (and four of these five films are from the James Bond franchise), comedy, and crime/thriller films. Few films from the action/adventure and fantasy/science fiction genres are produced by firms other than the major studios. The budgets for these types of films tend to be higher than those of other genres, and this level of capital investment is typically beyond the scope of all but the largest studios.


Maresco PA 2004 Mel Gibson’s The Passion of the Christ: market segmentation, mass marketing and promotion, and the internet, Journal of Religion and Popular Culture 8:

Wasko J 2001 Understanding Disney: The Manufacture of Fantasy. Malden MA: Blackwell.

US Gross Ratios by Genre, 1991 to 2010

Recently I have been looking at the breakdown of the top 50 films at the US box office in each year from 1991 to 2010 by genre. I’ll have more to say on this topic in a couple of weeks, but I have looked at several different variables and have had to excise some aspects from the paper I was writing. This means I have some graphs left over – one of the most interesting of which is presented below. This boxplot shows the distribution of the ratio of the opening weekend gross to the total gross for the 1000 films in my sample. The box office data this graph is based on has been inflation adjusted to 2010 dollars. ‘Other’ includes genres that were too infrequent to be included separately, and comprises documentaries, musicals, war films, and westerns. (Actually this graph includes data from only 999 films, as one of the documentaries included in the category ‘other’ has no reported opening weekend gross.) The summary statistics are given in Table 1 below.

Figure 1 Opening/total gross ratios by genre in the top 50 films at the US box office, 1991 to 2010 (Source: Box Office Mojo)

Table 1 Opening/total gross ratios by genre in the top 50 films at the US box office, 1991 to 2010 (Source: Box Office Mojo)

The overall median for the 999 films is 0.2436; and so the typical film in the sample takes roughly a quarter of its total gross in its opening weekend alone.
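The gross ratio itself is a one-line calculation, sketched here for illustration. The three films and their grosses are made-up placeholder values, not figures from the sample, and the function name is my own.

```python
from statistics import median


def gross_ratio(opening_gross, total_gross):
    """Ratio of the opening weekend gross to the total gross."""
    return opening_gross / total_gross


# Hypothetical (opening, total) grosses in millions of dollars.
# These are illustrative values only, not data from the sample.
films = {
    "Film A": (30.0, 100.0),
    "Film B": (5.0, 50.0),
    "Film C": (40.0, 80.0),
}

ratios = {title: gross_ratio(o, t) for title, (o, t) in films.items()}
median_ratio = median(ratios.values())

for title, ratio in ratios.items():
    print(f"{title}: {ratio:.4f}")
print(f"Median gross ratio: {median_ratio:.4f}")
```

A film that opens big and fades quickly has a ratio close to 0.5 or above, while a film that builds its audience slowly has a ratio close to zero.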

Three features stand out from this data:

  • Horror films have the highest median gross ratio of any genre, and tend to open big before falling away quite dramatically. Box Office Mojo has chart views of the daily box office gross for films released in the US; and it is interesting to compare the charts for Saw IV (2007), which ran out of steam after less than a month on release, and Paranormal Activity (2009), which did not reach its peak gross until after a month. The gross ratio for the former film is 0.5017, and that of the latter is 0.0007. These films are the extremes of the data values for the genre.
  • The genres of action/adventure, family, and fantasy/science fiction tend not to have films with very low gross ratios – of the 410 films in these three genres, only 13 have very low gross ratios (and ten of these are family films). Action/adventure and fantasy/science fiction tend to have higher gross ratios, and so the opening weekend is more important for these genres. The distribution for family films is more consistent with comedy, crime/thriller, and romance films. These films tend to make a big initial splash, and rarely have the opportunity to build an audience over time.
  • Drama films have the lowest gross ratios; and of the 109 films in this category in the sample, 33 have an opening weekend of less than $1 million. This is the result of films in this genre being initially released to a small number of screens and allowing the film to build an audience on the basis of critical reviews and word of mouth. Drama films are therefore characterised by a particular release pattern that is not evident in the other genres, and the opening weekend gross is unreliable as a predictor of the total gross.

The above graph is a very simple plot of a simple calculation performed on some easily obtained data, but we can immediately see how different genres find their way into the film market in the US. These patterns are even stronger when we look at box office data sorted by genre in more detail, as will become clear when I put up the full paper in a couple of weeks.

Determining box office success

I had not actually planned to write anything this week because someone said that the world was going to end on Saturday.

It didn’t.

And so this week I present a collection of papers on the factors that shape the box office performance of films. The majority of these papers are the final versions from university research depositories, but some may be pre-prints or drafts so check before you cite.

The best place to start is undoubtedly this paper from Jehoshua Eliashberg, Anita Elberse, and Mark A.A.M. Leenders from 2006, which provides a summary of research in this area and illustrates the type of research carried out in the fields of economics, retailing, and marketing that is entirely absent from film studies texts.

Eliashberg J, Elberse A, and Leenders MAAM 2006 The motion picture industry: critical issues in practice, current research, and new research directions, Marketing Science 25 (6): 638-661.

The motion picture industry has provided a fruitful research domain for scholars in marketing and other disciplines. The industry has high economic importance and is appealing to researchers because it offers both rich data that cover the entire product lifecycle for many new products and because it provides many unsolved “puzzles.” Although the amount of scholarly research in this area is rapidly growing, its impact on practice has not been as significant as in other industries (e.g., consumer packaged goods). In this article, we discuss critical practical issues for the motion picture industry, review existing knowledge on those issues, and outline promising research directions. Our review is organized around the three key stages in the value chain for theatrical motion pictures: production, distribution, and exhibition. Focusing on what we believe are critical managerial issues, we propose various conjectures—framed either as research challenges or specific research hypotheses—related to each stage in the value chain and often involved in understanding consumer moviegoing behavior.

The web page of Jehoshua Eliashberg at Wharton is here, and that of Anita Elberse at Harvard Business School is here. The webpage for Mark Leenders is here, and features the intriguing quote, “Hollywood movies and medicines are very similar from a marketing perspective.”

Basuroy S, Chatterjee S, and Ravid SA 2003 How critical are critical reviews? The box office effects of film critics, star power, and budgets, Journal of Marketing 67 (4): 103-117.

The authors investigate how critics affect the box office performance of films and how the effects may be moderated by stars and budgets. The authors examine the process through which critics affect box office revenue, that is, whether they influence the decision of the film going public (their role as influencers), merely predict the decision (their role as predictors), or do both. They find that both positive and negative reviews are correlated with weekly box office revenue over an eight-week period, suggesting that critics play a dual role: They can influence and predict box office revenue. However, the authors find the impact of negative reviews (but not positive reviews) to diminish over time, a pattern that is more consistent with critics’ role as influencers. The authors then compare the positive impact of good reviews with the negative impact of bad reviews to find that film reviews evidence a negativity bias; that is, negative reviews hurt performance more than positive reviews help performance, but only during the first week of a film’s run. Finally, the authors examine two key moderators of critical reviews, stars and budgets, and find that popular stars and big budgets enhance box office revenue for films that receive more negative critical reviews than positive critical reviews but do little for films that receive more positive reviews than negative reviews. Taken together, the findings not only replicate and extend prior research on critical reviews and box office performance but also offer insight into how film studios can strategically manage the review process to enhance box office revenue.

The web page of Suman Basuroy at The University of Oklahoma is here, and has links to his many papers on the marketing of motion pictures.

Craig S, Douglas S, and Greene W 2003 Culture matters: a hierarchical linear random parameters model for predicting success of US films in foreign markets, Manuscript, Department of Marketing, Stern School of Business, NYU.

Culture matters in ways that are salient for products with significant cultural content. In particular, the cultural context in which a product is launched plays an important role in its success. The present study examines the impact of cultural context on the box office performance of US films in foreign markets. A hierarchical linear random parameters model is used to assess the impact of national culture, degree of Americanization, US box office and film genre on performance in eight foreign markets. The model allowed for film-specific heterogeneity to be accounted for and for hypotheses to be tested at both the film level and the country level. Results indicate that films perform better in countries that are culturally closer to the US and those that have a higher degree of Americanization. The genre of the film and US box office success also had a significant impact on performance. Some implications are drawn for managers releasing films in foreign markets.

Dellarocas C, Farag NA, and Zhang X 2005 Using online ratings as a proxy of word-of-mouth in motion picture revenue forecasting, SSRN Working Paper.

The emergence of online product review forums has enabled firms to monitor consumer opinions about their products in real-time by mining publicly available information from the Internet. This paper studies the value of online product ratings in revenue forecasting of new experience goods. Our objective is to understand what metrics of online ratings are the most informative indicators of a product’s future sales and how the explanatory power of such metrics compares to that of other variables that have traditionally been used for similar purposes in the past. We focus our attention on online movie ratings and incorporate our findings into practical motion picture revenue forecasting models that use very early (opening weekend) box office and movie ratings data to generate remarkably accurate forecasts of a movie’s future revenue trajectory. Among the metrics of online ratings we considered, we found the valence of user ratings to be the most significant explanatory variable. The gender diversity of online raters was also significant, supporting the theory that word-of-mouth that is more widely dispersed among different social groups is more effective. Interestingly, our analysis found user ratings to be more influential in predicting future revenues than average professional critic reviews. Overall, our study has established that online ratings are a useful source of information about a movie’s long-term prospects, enabling exhibitors and distributors to obtain revenue forecasts of a given accuracy sooner than with older techniques.

Elberse A 2007 The power of stars: do star actors drive the success of movies?, Journal of Marketing 71 (4): 102-120.

Is the involvement of stars critical to the success of motion pictures? Film studios, which regularly pay multimillion-dollar fees to stars, seem to be driven by that belief. This article sheds light on the returns on this investment using an event study that considers the impact of more than 1200 casting announcements on trading behavior in a simulated and real stock market setting. The author finds evidence that the involvement of stars affects movies’ expected theatrical revenues and provides insight into the magnitude of this effect. For example, the estimates suggest that, on average, stars are worth approximately $3 million in theatrical revenues. In a cross-sectional analysis grounded in the literature on group dynamics, the author also examines the determinants of the magnitude of stars’ impact on expected revenues. Among other things, the author shows that the stronger a cast already is, the greater is the impact of a newly recruited star with a track record of box office successes or with a strong artistic reputation. Finally, in an extension to the study, the author does not find that the involvement of stars in movies increases the valuation of film companies that release the movies, thus providing insufficient grounds to conclude that stars add more value than they capture. The author discusses implications for managers in the motion picture industry.

Elliott C and Simmons R 2007 Determinants of UK Box Office Success: The Impact of Quality Signals, Lancaster University Management School Working Paper 2007/012.

This paper analyses the roles of various potential quality signals in the demand for cinema in the United Kingdom using a breakdown of advertising totals by media category. Estimation of a two stage least squares model with data for 546 films released in the United Kingdom shows that the impacts of types of advertising on box office revenues vary both in channels and magnitudes of impact. We also offer a more sophisticated treatment of critical reviews than hitherto by examining the spread (entropy) rather than just the mean rating.

Hennig-Thurau T, Houston MB, and Sridhar S 2006 Can good marketing carry a bad product? Evidence from the motion picture industry, Marketing Letters 17 (3): 205-219.

We examine the relative roles of marketing actions and product quality in determining commercial success. Using the motion picture context, in which product quality is difficult for consumers to anticipate and information on product success is available for different points in time, we model the effects of studio actions and movie quality on a movie’s sales during different phases of its theatrical run. For a sample of 331 recent motion pictures, structural equation modeling demonstrates that studio actions primarily influence early box office results, whereas movie quality influences both short- and long-term theatrical outcomes. The core results are robust across moderating conditions. We identify two data segments with follow-up latent class regressions and explore the degree of studio actions needed to “save” movies of varying quality. We finally offer some implications for research and management.

Hennig-Thurau T, Walsh G, and Bode M 2004 Exporting media products: understanding the success and failure of Hollywood movies in Germany, Advances in Consumer Research 31 (1): 633-638.

Rising production costs in the US motion picture industry make overseas markets essential for movie studios’ economic survival. However, movie marketers can rarely build on systematic research when attempting to customize movies or movie-related communications to different cultural settings. In this paper, we draw from cultural theory to develop a conceptual framework of US movies’ success in foreign markets. Propositions are then developed that offer insight into the differing impact of a number of factors on movie success in the US and Germany. Marketing implications will be discussed.

The webpage for Thorsten Hennig-Thurau at the Cass Business School is here.

Sharda R and Delen D 2006 Predicting box-office success of motion pictures with neural networks, Expert Systems with Applications 30 (2): 243-254. [NB: although there is no specific URL associated with it, there is a downloadable version of this paper which you will find if you search for the title in Google Scholar].

Predicting box-office receipts of a particular motion picture has intrigued many scholars and industry leaders as a difficult and challenging problem. In this study, the use of neural networks in predicting the financial performance of a movie at the box-office before its theatrical release is explored. In our model, the forecasting problem is converted into a classification problem – rather than forecasting the point estimate of box-office receipts, a movie based on its box-office receipts in one of nine categories is classified, ranging from a ‘flop’ to a ‘blockbuster.’ Because our model is designed to predict the expected revenue range of a movie before its theatrical release, it can be used as a powerful decision aid by studios, distributors, and exhibitors. Our prediction results are presented using two performance measures: average percent success rate of classifying a movie’s success exactly, or within one class of its actual performance. Comparison of our neural network to models proposed in the recent literature as well as other statistical techniques using a 10-fold cross validation methodology shows that the neural networks do a much better job of predicting in this setting.

Terry N, Butler M, and De’Armond D 2005 The determinants of domestic box office performance in the motion picture industry, Southwestern Economic Review 32: 137-148.

This paper examines the determinants of box office revenue in the motion picture industry. The sample consists of 505 films released during 2001-2003. Regression results indicate the primary determinants of box office earnings are critic reviews, award nominations, sequels, Motion Picture Association of America rating, budget, and release exposure. Specific results include the observation that a ten percent increase in critical approval garners an extra seven million dollars at the box office, an academy award nomination is worth six million dollars, the built in audience from sequels are worth eighteen million dollars, and R-rated movies are penalized twelve million dollars.

The Mann-Whitney U Test

There is a dire need for film scholars to understand elementary statistics if they intend to use it to analyse film style. See here for the problems a lack of statistical education creates.

This post will illustrate the use of the Mann-Whitney U test using the median shot lengths of silent and sound Laurel and Hardy short films produced between 1928 and 1933 (see here). I will also look at effect sizes for interpreting the result of the test. Before proceeding, it is important to note that the Mann-Whitney U test goes by many different names (Wilcoxon Rank Sum test, Wilcoxon-Mann-Whitney, etc) but that these are all the same test and give the same results (although they may come in a slightly different format).

The Mann-Whitney U test

The Mann-Whitney U test is a nonparametric statistical test to determine if there is a difference between two samples by testing if one sample is stochastically superior to the other (Mann and Whitney 1947). By stochastic ordering we mean that data values from one sample (X) are more likely to assume small values than the data values from another sample (Y) and that the data values in X are less likely to assume high values than Y.  If Fx(z) ≥ Fy(z) for all z, where F is the cumulative distribution function, then X is stochastically smaller than Y.

We want to find out if there is a difference between the median shot lengths of silent and sound films featuring Laurel and Hardy. The null hypothesis for our experiment is that

the two samples are stochastically equal

(H0: Fsilent(z) = Fsound(z) for all z).

In other words, we assume that there is no difference between the samples – the median shot lengths of the silent films of Laurel and Hardy are no more likely to be greater or less than the median shot lengths of the sound films of Laurel and Hardy. (See Callaert (1999) on the nonparametric hypotheses for the comparison of two samples).

In order to perform the Mann-Whitney U test we take our two samples – the median shot lengths of the silent and sound films – and we pool them together to form a single, large sample. We then order the data values from smallest to largest and assign a rank to each value. The film with the smallest median shot length has a rank of 1.0, the film with the second smallest median shot length has a rank of 2.0, and so on. If two or more films have a median shot length with the same value, then we give each film an average rank. For example, in Table 1 we see that five films have a median shot length of 3.3 seconds and that these films are 5th, 6th, 7th, 8th, and 9th in the ordered list. Adding together these ranks and dividing by the number of tied films gives us the average rank of each film: (5 + 6 + 7 + 8 + 9)/5 = 7.0.
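The ranking step with tied values can be sketched in a few lines of Python. This is only an illustration: the function name is my own, and the list of median shot lengths is hypothetical, chosen so that five films tie at 3.3 seconds in positions 5 to 9 as in the example above.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest, giving tied values
    the mean of the positions they occupy in the ordered list."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover the whole run of values tied with `start`.
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Positions are 1-based, so the run occupies start+1 .. end+1.
        mean_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical median shot lengths in seconds (not the values in Table 1).
pooled = [2.9, 3.0, 3.1, 3.2, 3.3, 3.3, 3.3, 3.3, 3.3, 3.4]
print(average_ranks(pooled))
```

The five tied films each receive the rank (5 + 9)/2 = 7.0, which is the same as averaging 5, 6, 7, 8, and 9.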

Table 1 Rank-ordered median shot lengths of Laurel and Hardy silent (n = 12) and sound (n = 20) films

Notice that in Table 1, the silent films (highlighted blue) tend to be at the top of the table with lower rankings than the sound films (highlighted green) that tend to be in the bottom half of the table with the higher rankings. This is a very simple way to visualise the stochastic superiority of the sound films in relation to the silent films. If the two samples were stochastically equal then we would see more mixing between the two colours.

Now all we need to do is to calculate the U statistic. First, we add up the ranks of the silent and sound films from Table 1:

Sum of ranks of silent films = R1 = 1.0 + 4.0 + 7.0 + 7.0 + 7.0 + 10.5 + 12.0 + 13.0 + 14.0 + 18.0 +18.0 +22.5 = 134.0

Sum of ranks of sound films = R2 = 2.0 + 3.0 + 7.0 + 7.0 + 10.5 + 18.0 + 18.0 + 18.0 + 18.0 +18.0 +22.5 +24.0 + 25.0 + 26.0 + 27.0 + 28.5 + 28.5 + 30.0 + 31.0 + 32.0 = 394.0

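Next, we calculate the U statistics using the formulae:

U1 = (n1 × n2) + n1(n1 + 1)/2 − R1

U2 = (n1 × n2) + n2(n2 + 1)/2 − R2

where n1 and n2 are the sizes of the two samples, and R1 and R2 are the sums of ranks above. For the above data this gives us

U1 = (12 × 20) + (12 × 13)/2 − 134.0 = 184.0

U2 = (12 × 20) + (20 × 21)/2 − 394.0 = 56.0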

We want the smallest of these two values of U, and the test statistic is, therefore, U = 56.0. (Note that U1 + U2 = n1 × n2 = 240).
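As a check on the arithmetic, the calculation can be sketched in Python using the ranks listed above. The function name is my own, and I assume the standard rank-sum form Ui = (n1 × n2) + ni(ni + 1)/2 − Ri, which reproduces the values quoted in this post.

```python
# Ranks of the silent and sound films, as listed in the rank sums above.
silent_ranks = [1.0, 4.0, 7.0, 7.0, 7.0, 10.5, 12.0, 13.0, 14.0,
                18.0, 18.0, 22.5]
sound_ranks = [2.0, 3.0, 7.0, 7.0, 10.5, 18.0, 18.0, 18.0, 18.0, 18.0,
               22.5, 24.0, 25.0, 26.0, 27.0, 28.5, 28.5, 30.0, 31.0, 32.0]


def u_statistics(ranks_1, ranks_2):
    """Return (U1, U2) from the ranks of two samples in the pooled data."""
    n1, n2 = len(ranks_1), len(ranks_2)
    r1, r2 = sum(ranks_1), sum(ranks_2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


u1, u2 = u_statistics(silent_ranks, sound_ranks)
u = min(u1, u2)  # the test statistic is the smaller of the two values

print(f"R1 = {sum(silent_ranks)}, R2 = {sum(sound_ranks)}")
print(f"U1 = {u1}, U2 = {u2}, U = {u}")
```

The two rank sums must add up to N(N + 1)/2 for the pooled sample of N = 32 films (here 528), which is a useful check that no rank has been dropped.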

To find out if this result is statistically significant we can compare it to a critical value for the two sample sizes: as n1 = 12 and n2 = 20, the critical value when α = 0.05, is 69.0. We reject the null hypothesis if the value of U we have calculated is less than the critical value, and as 56.0 is less than 69.0 we can reject the null hypothesis of stochastic equality in this case and conclude that there is a statistically significant difference between the median shot lengths of the silent films and those of the sound films. As the median shot lengths of the sound films tend to be larger than the median shot lengths of the silent films we can say that they are stochastically superior.

Alternatively, if our sample is large enough then U approximately follows a normal distribution and we can calculate an asymptotic p-value using the following formulae:

μ = (n1 × n2)/2

σ = √[(n1 × n2)(n1 + n2 + 1)/12]

z = (U − μ)/σ

For the above data, U = 56.0, μ = 120.0, and σ = 25.69. Therefore z = -2.49, and we can find the p-value from a standard normal distribution. The two-tailed p-value for this experiment is 0.013. (Note that ‘large enough’ is defined differently in different textbooks – some recommend using the z-transformation when both sample sizes are at least 20 whilst others are more generous and recommend that both sample sizes are at least 10).
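The normal approximation can be sketched as follows. I assume the usual forms μ = (n1 × n2)/2 and σ = √[(n1 × n2)(n1 + n2 + 1)/12], with no correction for ties or continuity, since these reproduce the values quoted above; the function name is my own.

```python
import math


def normal_approximation(u, n1, n2):
    """Return (mu, sigma, z, two-tailed p) for a Mann-Whitney U statistic.

    Uses the plain large-sample approximation, with no tie correction
    and no continuity correction.
    """
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    # Two-tailed p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return mu, sigma, z, p


mu, sigma, z, p = normal_approximation(56.0, 12, 20)
print(f"mu = {mu:.1f}, sigma = {sigma:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Statistical packages often apply a tie correction to σ and a continuity correction to z by default, so their p-values may differ slightly from the one given here.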

If some more restrictive conditions are applied to the design of the experiment, then the Mann-Whitney U test is a test of a shift function (Y = X + Δ) for the sample medians and is an alternative to the t-test for the two-sample location problem. Compared to the t-test, the Mann-Whitney U test is slightly less efficient when the samples are large and normally distributed (ARE = 0.95), but may be substantially more efficient if the data is non-normal.

The Mann-Whitney U test should be preferred to the t-test for comparing the median shot lengths of two groups of films even if the samples are normal because the former is a test of stochastic superiority, while the latter is a test of a shift model and this is not an appropriate hypothesis for the design of our experiment. It simply doesn’t make sense to speak of the median shot length of a sound film in terms of a shift function as the median shot length of a silent film plus the impact of sound technology. You cannot take the median shot length of Steamboat Bill, Jr (X), add Δ number of seconds to it, and come up with the median shot length of Dracula (Y = X + Δ). Any such argument would be ridiculous, and only the null hypothesis of stochastic equality is meaningful in this context.

The probability of superiority

A test of statistical significance is only a test of the plausibility of the model represented by the null hypothesis. As such the Mann-Whitney U test cannot tell us how important a result is. In order to interpret the meaning of the above result we need to calculate the effect size.

A simple effect size that can be quickly calculated from the Mann-Whitney U test statistic is the probability of superiority, ρ or PS.

Think of PS in these terms:

You have two buckets – one red and one blue. In the red bucket you have 12 red balls, and on each ball is written the name of a silent Laurel and Hardy film and its median shot length. In the blue bucket you have 20 blue balls, and on each ball is written the name of a sound Laurel and Hardy film and its median shot length. You select at random one red ball and one blue ball and note down which has the larger median shot length. Replacing the balls in their respective buckets, you draw two more balls – one from each bucket – and note down which has the larger median shot length. You repeat this process again, and again, and again.

Eventually, after a large number of repetitions, you will have an estimate of the probability with which a silent film will have a median shot length greater than that of a sound film. (On Bernoulli trials see here).

The probability of superiority can be estimated without going through the above experiment: all we need to do is to divide the U statistic we got from the Mann-Whitney test by the product of the two sample sizes – PS = U/(n1 × n2). This is equal to the probability that the median shot length of a silent film (X) is greater than the median shot length of a sound film (Y) plus half the probability that the median shot length of a silent film is equal to the median shot length of a sound film: PS = Pr[X > Y] + (0.5 × Pr[X = Y]).

If the median shot lengths of all the silent films were greater than the median shot lengths of all the sound films, then the probability of randomly selecting a silent film with a median shot length greater than the median shot length of a sound film is 1.0.

Conversely, if the median shot lengths of all the silent films were less than the median shot lengths of all the sound films, then the probability of randomly selecting a silent film with a median shot length greater than the median shot length of a sound film is 0.0.

If the two samples overlap one another completely, then the probability of randomly selecting a silent film with a median shot length greater than the median shot length of a sound film is equal to the probability of randomly selecting a silent film with a median shot length less than the median shot length of a sound film, and is equal to 0.5.

So if there is no effect PS = 0.5, and the further away PS is from 0.5 the larger the effect we have observed.

There are no hard and fast rules regarding what values of PS are ‘small,’ ‘medium,’ or ‘large.’ These terms need to be interpreted within the context of the experiment.

For the Laurel and Hardy data, we have U = 56.0, n1 = 12, and n2 = 20. Therefore, PS = 56/(12 × 20) = 56/240 = 0.2333.

Let us now compare the effect size for the Laurel and Hardy paper with the effect size from my study on the impact of sound in Hollywood in general (access the paper here). For the Laurel and Hardy data PS = 0.2333, whereas for the Hollywood data PS = 0.0558. In both studies I identified a statistically significant difference in the median shot lengths of silent and sound films, but it is clear that the effect size is larger in the case of the Hollywood films than for the Laurel and Hardy films.
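The comparison of the two effect sizes can be sketched in Python. The function name is my own; the Hollywood value of 0.0558 is simply the figure quoted above, since its U statistic and sample sizes are not given in this post.

```python
def probability_of_superiority(u, n1, n2):
    """PS = U / (n1 * n2), where U is the Mann-Whitney statistic."""
    return u / (n1 * n2)


ps_laurel_hardy = probability_of_superiority(56.0, 12, 20)
ps_hollywood = 0.0558  # value quoted in the text, not recomputed here

# The size of the effect is the distance of PS from 0.5 (no effect).
for label, ps in [("Laurel and Hardy", ps_laurel_hardy),
                  ("Hollywood", ps_hollywood)]:
    print(f"{label}: PS = {ps:.4f}, distance from 0.5 = {abs(ps - 0.5):.4f}")
```

The Hollywood value lies further from 0.5 than the Laurel and Hardy value, which is why the effect is described as larger for the Hollywood films.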

The Hodges-Lehmann estimator

If we have designed our experiment to understand the impact of sound technology on shot lengths in Laurel and Hardy films around a null hypothesis of stochastic equality, then it makes no sense to subtract the sample median of the silent films from the sample median of the sound films because this implies a shift function and therefore a different experimental design and a different null hypothesis.

If we are not going to test for a classical shift model, how can we estimate the impact of sound technology on the cinema in terms of a slowing in the cutting rate?

To answer this question, we turn to the Hodges-Lehmann estimator for two samples (HLΔ), which is the median of all the possible differences between the values in the two samples.

In Table 2, the median shot length of each of the Laurel and Hardy silent films is subtracted from the median shot length of each of the sound films. This gives us a total set of 240 differences (n1 × n2 = 12 × 20 = 240).

Table 2 Pairwise differences between the median shot lengths of Laurel and Hardy silent films (n = 12) and sound films (n = 20)

If we take the median of these 240 differences we have our estimate of the typical difference between the median shot length of a silent film and the median shot length of a sound film. Therefore, the average difference between the median shot lengths of the silent Laurel and Hardy films and the median shot lengths of the sound Laurel and Hardy films is estimated to be 0.5s (95% CI: 0.1, 1.1). (I won’t cover the calculation of the (Moses) confidence interval for the estimator HLΔ in this post, but for an explanation see here).
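The estimator described above is simple to compute. The sketch below assumes the median shot lengths are held in two plain Python lists; the numbers are hypothetical and are not the Laurel and Hardy data, and the function name `hodges_lehmann` is just an illustrative label.

```python
from statistics import median


def hodges_lehmann(silent, sound):
    """Median of all pairwise differences (sound minus silent)."""
    differences = [y - x for x in silent for y in sound]
    return median(differences)


# Hypothetical median shot lengths in seconds (NOT the data in Table 2).
silent = [3.0, 3.5, 4.0]
sound = [3.5, 4.0, 4.5, 5.0]

# 3 x 4 = 12 pairwise differences; their median is the estimate.
print(hodges_lehmann(silent, sound))  # 0.75
```

The confidence interval is not computed here, as it requires the additional steps referred to in the text.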

The sample median of the silent films is 3.5s and for the sound films it is 3.9s, and the difference between the two is 0.4s, but as the shift function is an inappropriate design for our experiment this actually tells us nothing. Now it would appear that the difference between the two sample medians and HLΔ are approximately equal: 0.4s and 0.5s, respectively. But it is important to remember that they represent different things and have different interpretations. The difference between the sample medians represents a shift function, whereas the Hodges-Lehmann estimator is the average difference between the median shot lengths.

Note that we can calculate the Mann-Whitney U test statistic directly from the above table. If we count the number of times a silent film has a median shot length greater than that of a sound film (i.e. Δ < 0, the green-highlighted numbers) and add this to half the number of times the silent and sound films have equal median shot lengths (i.e. Δ = 0, the red-highlighted numbers), then we have the Mann-Whitney U statistic that we derived above: U2 = 47 + (0.5 × 18) = 56. Equally, if we add the number of times a silent film has a median shot length less than that of a sound film (i.e. Δ > 0, the blue-highlighted numbers) to half the number of times the medians are equal, then we have U1 = 175 + (0.5 × 18) = 184.
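The counting procedure can be sketched in Python as follows. The data are hypothetical (not the values in Table 2) and the function name is mine; the labels U1 and U2 follow the usage in the paragraph above.

```python
def u_from_differences(silent, sound):
    """Return (U1, U2) by counting the signs of the pairwise differences.

    Each difference is sound minus silent. Ties (a difference of exactly 0)
    count as half towards each statistic. The exact comparison with 0
    assumes the values are stored without floating-point rounding error.
    """
    diffs = [y - x for x in silent for y in sound]
    negative = sum(1 for d in diffs if d < 0)
    zero = sum(1 for d in diffs if d == 0)
    positive = sum(1 for d in diffs if d > 0)
    u1 = positive + 0.5 * zero
    u2 = negative + 0.5 * zero
    return u1, u2


# Hypothetical median shot lengths in seconds.
silent = [3.0, 3.5, 4.0]
sound = [3.5, 4.0, 4.5, 5.0]
u1, u2 = u_from_differences(silent, sound)
print(u1, u2)  # 10.0 2.0

# The two statistics always sum to n1 * n2.
print(u1 + u2 == len(silent) * len(sound))  # True

# The counts quoted in the text give the same statistics as before.
print(47 + 0.5 * 18, 175 + 0.5 * 18)  # 56.0 184.0
```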

Bringing it all together

Once we have performed our hypothesis test, calculated the effect size, and estimated the effect, we can present our results:

The median shot lengths of silent (n = 12, median = 3.5s [95% CI: 3.2, 3.7]) and sound (n = 20, median = 3.9s [95% CI: 3.5, 4.3]) short films featuring Laurel and Hardy produced between 1927 and 1933 were compared using a Mann-Whitney U test, with a null hypothesis of stochastic equality. The results show that there is a statistically significant but small difference of HLΔ = 0.5s (95% CI: 0.1, 1.1) between the two samples (U = 56.0, p = 0.013, PS = 0.2333).

These two sentences provide a great deal of information to the reader in a simple and economical format – we have the experimental design, the result of the test, and the practical significance of the result.

Note that at no point in conducting this test have we employed a ‘dazzling array’ of mathematical operations – in fact the most complicated thing in the whole process was to find the square root in the equation for σ above, and everything else was numbering items in a list, addition, subtraction, multiplication, or division.


The Mann-Whitney U test is ideally suited to our needs in comparing the impact of sound technology on film style, and has numerous advantages over the alternative statistical methods:

  • it is covered in pretty much every statistics textbook you are ever likely to read
  • it is a standard feature in statistical software (though you will have to check which name is used) and so you won’t even have to do the basic maths described above
  • it is easy to calculate
  • it is easy to interpret
  • it allows us to test for stochastic superiority rather than a shift model
  • it is robust against outliers
  • it does not depend on the distribution of the data
  • it can be used to determine an effect size (PS) that is easy to calculate and simple to understand
  • we have a simple estimate of the effect (HLΔ) that is consistent with the test statistic

If you want to compare more than two groups of films, then the non-parametric k-sample test is the Kruskal-Wallis ANOVA test (see here). The Mann-Whitney U test can also be applied as a post-hoc test for pairwise comparisons.

References and Links

Callaert H 1999 Nonparametric hypotheses for the two-sample location problem, Journal of Statistics Education 7 (2):

Mann HB and Whitney DR 1947 On a test of whether one of two random variables is stochastically larger than the other, The Annals of Mathematical Statistics 18 (1): 50-60.

The Wikipedia page for the Mann-Whitney U test can be accessed here, and the page for the Hodges-Lehmann estimator is here.

For an online calculator of the Mann-Whitney U test you can visit Vassar’s page here.

For the critical values of the Mann-Whitney U test for sample sizes up to n1 = n2 = 20 and α = 0.05 or 0.01, see here.

Comparing shot scales in films

In earlier posts I have used rank-frequency plots to compare the changing use of shot scales. See here for a comparison of the changing use of shot scales in Hollywood and German cinema from the 1910s to the 1930s, here for a comparison of shot scales in Hollywood films from 1959 and 1999, or here for a piece on Alfred Hitchcock.

The rank-frequency plot is very useful for comparing how the use of shot scales varies between groups of films by looking at the dominance of the most frequent (2nd most frequent, etc.) scale (irrespective of which scale that actually is), but is less useful if we want to compare the variation of shot scales in individual films or between groups of films without ordering the data by rank. To meet this need, we can use the index of qualitative variation (IQV). The IQV is the ratio of the observed variation in a categorical variable to the maximum amount of variation that could exist. If all the observed elements were in a single category, there would be no variation and IQV = 0. If the observed elements are distributed equally across all the categories, then IQV = 1. The greater the value of the index, the greater the heterogeneity of the shot scales.

To calculate the IQV we use the equation

$$\mathrm{IQV} = \frac{K}{K-1}\left(1 - \sum_{i=1}^{K} P_i^{2}\right)$$

where K is the number of categories and P_i is the proportion of elements in the ith category.

Using the same data for Hollywood films from the 1910s, 1920s, and 1930s that I used in the study linked to above, we can look at the variation in the use of shot scales between individual films and groups of films.

First, we calculate the IQV of each film. For example, if we want to calculate the IQV for The Front Page (Lewis Milestone, 1931) we calculate the square of the proportion of shots of each scale and sum these together (see Table 1). (The data source for this film and the others studied here, along with other information, is given in the paper referred to above.)

Table 1 Calculation of the IQV for The Front Page (Lewis Milestone, 1931)

From Table 1, we can see that the sum of the squared proportions is 0.20. Subtracting this value from 1 gives us the index of diversity: 1 – 0.20 = 0.80. Standardising this value by the factor K/(K – 1) gives us the index of qualitative variation. As the shots in this film have been sorted into seven shot scales, the standardisation factor is 7/6 = 1.17. The IQV for The Front Page is 1.17 × 0.80 = 0.93, and indicates a high degree of variation.
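The calculation described above can be sketched in Python. The function name `iqv` and the idea of passing raw shot counts per scale are my own assumptions for illustration; the counts for individual films are not reproduced here, so the example only uses the two limiting cases and the rounded figures quoted in the text.

```python
def iqv(counts):
    """Index of qualitative variation from the number of shots per category."""
    k = len(counts)
    total = sum(counts)
    if k < 2 or total <= 0:
        raise ValueError("need at least two categories and one observation")
    sum_of_squares = sum((c / total) ** 2 for c in counts)
    return (k / (k - 1)) * (1 - sum_of_squares)


# Limiting cases: all shots in one category, and an even spread.
print(iqv([10, 0, 0]))               # 0.0
print(round(iqv([5, 5, 5, 5]), 2))   # 1.0

# The worked example: sum of squared proportions 0.20, seven categories.
print(round((7 / 6) * (1 - 0.20), 2))  # 0.93
```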

Completing this process for all the Hollywood films I looked at, we get the results presented in Tables 2 through 4. We can see that the variation in shot scales in The Front Page is consistent with the style of other Hollywood films of the 1930s (Table 4), but is very different from the films of the 1910s (Table 2). From Table 2, we can see that the shot scales in Traffic in Souls and David Harum exhibit less heterogeneity than those of other films of the 1910s. The data in these tables also suggest a trend over time: the IQVs in Table 2 indicate that films from the later 1910s show greater variation than films from the years prior to 1918; and, to a lesser extent, the IQV is lower for films in the early 1920s than in the later 1920s (Table 3). There is no such trend in the 1930s.

Table 2 Index of qualitative variation for Hollywood films produced in the 1910s (n = 18): median = 0.85 (0.77, 0.92)

Table 3 Index of qualitative variation for Hollywood films produced in the 1920s (n = 29): median = 0.93 (0.92, 0.95)

Table 4 Index of qualitative variation for Hollywood films produced in the 1930s (n = 28): median = 0.95 (0.94, 0.96)

The distribution of the IQV for the films listed in Tables 2 through 4 is presented in Figure 1.

Figure 1 Index of qualitative variation for Hollywood films produced in the 1910s (n = 18), 1920s (n = 29), and 1930s (n = 28)

From Figure 1 we can see that the variation of shot scales in Hollywood films shows increasing heterogeneity from the 1910s to the 1930s. We can also see that the distribution of the IQV becomes narrower over time, indicating that Hollywood films converge to a single style. This is, of course, exactly what we should expect to find with the emergence of the dominant continuity style of classical Hollywood.

The IQV is a simple way of comparing the style of films that can make dealing with a large amount of data much more manageable.

Expanded sample for lognormal distribution

I have looked at the assumption of lognormality for shot length distributions in the statistical analysis of film style in some earlier posts here, here, and here. Using the probability plot correlation coefficient, I concluded that, as the assumption of lognormality could not be justified in up to half of the films studied, it was not appropriate to assume lognormality in general – if your experiment is based on an assumption that is wrong 50% of the time, then your results will not be reliable. This post repeats the analysis presented earlier using a larger sample of Hollywood films. For a description of the method of using normal probability plots and the probability plot correlation coefficient, and for the earlier results, see the links above or the links to the papers at the end of this post.

In Figure 1, we can see an example of a probability plot for a film (20,000 Years in Sing Sing) for which the data not only failed to reject the null hypothesis of lognormality (see below) but that on visual inspection is about as good a fit as you could expect. Although failure to reject a null hypothesis cannot be taken to imply that such a hypothesis is true, on the basis of this plot you would be more than happy to treat this film as being lognormally distributed.

Figure 1 Probability plot of shot length data (LN[X]) for 20,000 Years in Sing Sing (1932) (n = 692, PPCC = 0.9989)

Another useful feature of the normal probability plot is that the slope and the intercept of the regression line provide estimates of the shape factor and mean of the log-transformed data, respectively. In Figure 1, the slope is 0.8245, which is very close to the standard deviation of the log-transformed data (s = 0.8236) and the method proposed by Barry Salt for use in analysing film style based on the ratio of the mean to the median (s* = √(2×LN(mean/median)) = 0.8461). The intercept is 1.5695, giving a geometric mean of 4.7 seconds, which is very close to the median of 4.8s.

In Figures 2 to 4 we have three films for which the null hypothesis of lognormality (see below) was rejected. What is noticeable about these three films is that the deviation from the hypothesised distribution is different in each case. The plot for Rain (Figure 2) is all over the place; while the data for Steamboat Bill, Jr (Figure 3) deviates from the hypothesised distribution in both the lower and upper tails and the curvature of the plot indicates that the logarithmic transformation has not been successful in removing all of the skew from the data. In contrast, A Free Soul (Figure 4) shows such variation only in its lower tail.

Figure 2 Probability plot of shot length data (LN[X]) for Rain (1932) (n = 308, PPCC = 0.9768)

Figure 3 Probability plot of shot length data (LN[X]) for Steamboat Bill, Jr (1928) (n = 575, PPCC = 0.9839)

Figure 4 Probability plot of shot length data (LN[X]) for A Free Soul (1931) (n = 461, PPCC = 0.9873)

We can also see from the different estimates provided by the slope, s, and s*, and by the intercept and the median, that discrepancies abound. For Rain, the slope gives a shape factor of 1.3036, s = 1.3287, and s* = 1.5865; while the intercept (1.8959) indicates a geometric mean of 6.7 compared to a median value of 5.1 seconds. For Steamboat Bill, Jr, the slope is 0.7135, compared to s = 0.7233 and s* = 0.8920; while the discrepancy between the geometric mean (5.2 [intercept = 1.6572]) and the median (4.8) is less than was observed for Rain. For A Free Soul the differences in the estimates are smaller: for the shape factor, the slope is 1.0206, s = 1.0305, and s* = 1.0962; and the geometric mean is 7.1 (intercept = 1.9596) while the median is 6.6 seconds.

Note that in all three cases, it is the method for estimating the shape factor based on the assumed relationship between the median and the mean (s*) that shows the greatest difference from the other methods. This is because the relationship between the median and the mean is only valid if the data is lognormally distributed. If this is not the case, then the claimed relationship between the median and the mean does not exist and produces inaccurate estimates of the parameters for the lognormal distribution. As this assumption is valid for 20,000 Years in Sing Sing we see that s* provides an estimate close to the other methods; but as the assumption of lognormality is not justified for the other three films, it does not. If we based any analysis of these films upon the assumption that their shot lengths were lognormally distributed, then our conclusions would be worthless because that assumption, and everything we derive from it (including the parameters μ and σ), is not true.
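The two conversions used in this discussion can be sketched in Python: the geometric mean is the exponential of the intercept of the probability plot, and s* is obtained from the ratio of the mean to the median. The function names are mine, and the check on s* uses an idealised lognormal relationship rather than data from any film.

```python
import math


def geometric_mean_from_intercept(intercept):
    """Geometric mean implied by the intercept of the log-scale plot."""
    return math.exp(intercept)


def shape_from_mean_median(mean, median):
    """s* = sqrt(2 * ln(mean / median)); only valid for lognormal data."""
    if median <= 0 or mean < median:
        raise ValueError("requires mean >= median > 0")
    return math.sqrt(2 * math.log(mean / median))


# Intercepts quoted above for Rain, Steamboat Bill, Jr, and A Free Soul.
for intercept in (1.8959, 1.6572, 1.9596):
    print(round(geometric_mean_from_intercept(intercept), 1))
# prints 6.7, 5.2, 7.1

# For an exactly lognormal distribution with sigma = 0.8 the ratio
# mean/median is exp(sigma**2 / 2), and s* recovers sigma.
print(round(shape_from_mean_median(math.exp(0.32), 1.0), 4))  # 0.8
```

When the data are not lognormal the ratio of the mean to the median no longer has this form, which is why s* drifts away from the other two estimates.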

As noted above, it appears that the assumption of lognormality may be justified in only half of the films we look at. Extending this research with a larger sample will allow us to make a better assessment of the applicability of this assumption to shot length distributions. The probability plot correlation coefficient test of normality was applied to a total of 168 Hollywood films (including some of the films I had previously looked at), divided into three groups: silent films of the 1920s (n = 52), sound films from 1929 to 1931 (n = 66), and sound films from 1932 to 1934 (n = 50). As these are statistical tests of a null hypothesis, it is important to remember that failure to reject the null hypothesis does not mean that the data is lognormally distributed, and that some of these tests will conclude the data is not lognormally distributed when in fact it is. The test was applied using a Blom plotting position and α = 0.05. All the data used is from the Cinemetrics database (here).
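For readers who want to see the mechanics, the following is a minimal Python sketch of the probability plot correlation coefficient for log-transformed shot lengths using Blom plotting positions, (i − 0.375)/(n + 0.25). It is an illustration under my own assumptions, not the code used for the results reported here: the function name is hypothetical, the data are made up, and the critical values needed to complete the test at α = 0.05 must be taken from published tables and are not reproduced.

```python
import math
from statistics import NormalDist


def ppcc_lognormal(shot_lengths):
    """Correlation between ordered log shot lengths and normal quantiles."""
    logs = sorted(math.log(x) for x in shot_lengths)
    n = len(logs)
    if n < 3:
        raise ValueError("need at least three shots")
    std_normal = NormalDist()
    # Blom plotting positions converted to standard normal quantiles.
    quantiles = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    mean_q = sum(quantiles) / n
    mean_l = sum(logs) / n
    sxy = sum((q - mean_q) * (v - mean_l) for q, v in zip(quantiles, logs))
    sxx = sum((q - mean_q) ** 2 for q in quantiles)
    syy = sum((v - mean_l) ** 2 for v in logs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical shot lengths in seconds.
shots = [1.2, 2.5, 2.9, 3.4, 4.0, 4.8, 5.5, 7.1, 9.6, 15.3]
print(ppcc_lognormal(shots))
```

The null hypothesis of lognormality is rejected when the coefficient falls below the tabulated critical value for the sample size.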

Of the silent films produced in the 1920s (Table 1), the hypothesis of lognormality was rejected in 39 of the 52 cases, or 75% of the time. Of the sound films produced between 1929 and 1931 (Table 2), lognormality was rejected in 50 out of 66 cases (76%); and of the sound films from 1932 to 1934 (Table 3), it was rejected in 40 out of 50 cases (80%).

Table 1 Probability plot correlation coefficient test of the null hypothesis (H0) that the data is lognormally distributed for Hollywood films produced in the 1920s (n = 52)

Table 2 Probability plot correlation coefficient test of the null hypothesis (H0) that the data is lognormally distributed for Hollywood films produced from 1929 to 1931 (n = 66)

Table 3 Probability plot correlation coefficient test of the null hypothesis (H0) that the data is lognormally distributed for Hollywood films produced from 1932 to 1934 (n = 50)

What stands out from these results is that the proportion of films for which there is sufficient evidence against the assumption of lognormality is similar for each group of films. The earlier results that indicated that lognormality could not be assumed in half the films now look over-optimistic – the assumption of lognormality may only be justified in between a fifth and a quarter of cases for Hollywood films. Certainly this is a long way off the assumption that lognormality of shot length distributions is generally true. Whether this is true for cinemas in other countries or other eras will have to wait for a later post.

The other thing to stand out is that there is no pattern among the films: we cannot distinguish between short films or features, silent films or sound, films from different genres or different studios, or by decade as being lognormal or not lognormal. We can say that the assumption of lognormality will be justified in some cases, but that in the overwhelming majority of cases this is not true. Additionally, as noted above, they will be different from an assumed lognormal distribution in different ways. Statistical studies of film style should be developed with this in mind.


Filliben JJ 1975 The probability plot correlation coefficient test for normality, Technometrics 17 (1): 111-117.

Looney SW and Gulledge TR 1985 Use of the correlation coefficient with normal probability plots, The American Statistician 39 (1): 75-79.

Vogel RM 1986 The probability plot correlation coefficient test for the normal, lognormal, and Gumbel distribution hypotheses, Water Resources Research 22 (4): 587-590.

Estimating shot length distributions

One of the problems we encounter when researching film style is that different versions of the same film exist. For example, a version of Fritz Lang’s Metropolis (1927) that was approximately 25 minutes longer than previously known versions was discovered in Argentina in 2008. The official site for the restored version is here. This makes the statistical analysis of film style difficult, because we have to face the fact that the version of a film we are analysing may not be the film as it was produced.

We may come across different versions of the same film for a variety of reasons:

  • Different versions of the same film may be released in different countries.
  • ‘Director’s cut’ versions raise the question as to what we should call the definitive version: I have five different versions of Blade Runner (Ridley Scott, 1982) on DVD – the original 1982 US theatrical release, the 1982 international theatrical release, the work print, the 1992 Director’s Cut, and the 2007 Final Cut. We could simply note which version our data represents and leave it at that. However, we are faced with the problem that if we wish to look at the shot length distributions of Hollywood movies in the early-1980s or the films of Ridley Scott, which version should we pick? Should we pick the version with the voice-over that uses shots left over from The Shining, or the one without the voice-over but with the unicorn dream sequence? Does the fact that the film was re-edited for the 2007 cut invalidate it when it comes to looking at 1980s Hollywood, even though all the material used in this cut was shot in the early-1980s? Is the Final Cut version an example of Hollywood cinema from the early-1980s, or of the mid-2000s? Did Scott’s editing style change between 1982 and 2007 so that these two versions cannot be simply compared?
  • The version of a film released for home viewing is often different to the theatrical release due to the requirements of classification boards or censors. Another factor here may be corporate taste: historically, some home cinema outlets have edited their tapes to maintain a family friendly corporate image by removing scenes of gore, violence, and/or sex. Rather than distinguish between different versions we tend to treat the domestic and theatrical releases as being one and the same, when in fact our data may show some discrepancies.
  • Although it is unlikely to be a significant factor in the 21st century, pan-and-scan may affect the number of shots in a film.
  • When working with silent films it is often difficult to find complete versions of the films, and some frames, shots, scenes, or even reels may be missing. We will therefore be working with only a partial data set. This problem can be compounded by the release of restored versions that are built up from several prints. It is highly likely that we have not seen (and probably never will see) the original version of many of the silent films that we take for granted on DVD.

There are also other sources of measurement error that can affect our research:

  • When dealing with wipes, dissolves, fades, and irises, should we record the end of one shot and the beginning of the next at the beginning of the transition, the end of the transition, or in the middle of the transition? I always prefer the last of these options, and try to identify the middle frame of the edit, but I cannot speak for other researchers.
  • Identifying the correct running speed for silent movies is problematic. Silent movies were often shot at 18 or 20 fps, while we view and analyse them at 25/30 fps.
  • Data from the Cinemetrics database will also contain errors due to the performance of the researcher, and so the figures quoted will only ever be estimates.

It is wrong to state, as Barry Salt did recently (here), that we do not need to employ the full range of statistical methods and that such methods are ‘misleading’ and ‘irrelevant.’ It is necessary to deal with these issues in order to present the best analysis we can, and that means we need to be able to deal with the error present in our estimates. Even though the data we collect may be accurate to the frame, we will still have to deal with the existence of multiple versions, missing shots, different methods of data collection, etc. If you are going to analyse film style statistically, then at some point you are going to have to do some statistics.

This post focusses on the particular problem of dealing with silent films that have been restored. In 2009 and 2010 I added two posts to this blog looking at the shot length distributions of the Keystone films starring Charlie Chaplin (here and here). Since then, the BFI has released its Chaplin at Keystone DVD (see here). How do the shot length distributions of these restored versions compare to the original data I used in 2009? (For ease of understanding, when I refer to ‘original’ I mean the 2009 data and when I refer to ‘restored’ I mean data derived from the BFI DVD).

To date I have only looked at data from four films, though I hope to get around to the rest some time later this year. The four films are The Masquerader, The Rounders, The New Janitor, and Getting Acquainted. In the original data I removed the credit titles and the expository and dialogue titles, and for the sake of consistency I have done so here with the restored versions. However, in the Excel file at the end of this post that includes the shot length data from the restored versions of these four films I have excluded the credit titles but left in the other titles as indicated by ‘T.’

The descriptive statistics of the original and restored versions of The Masquerader are presented in Table 1 and the empirical cumulative distribution functions are presented in Figure 1.

Table 1 Descriptive statistics of the original and restored versions of The Masquerader (Charles Chaplin, 1914)

From Table 1 we can see that the original estimate of the median shot length (3.7s [95% CI: 2.8, 4.6]) is consistent with the revised estimate (4.5s [95% CI: 2.7, 6.3]). However, there is a large difference in the dispersion of shot lengths as indicated by the increase in the upper quartile and the interquartile range. This indicates that the version of The Masquerader from which the original data was collected is less consistent with the restored version in the upper part of the distribution, although a two-sample Kolmogorov-Smirnov test indicates there is no statistically significant difference (D = 0.1485, p = 0.265).
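The D statistic reported above is the largest vertical distance between the two empirical cumulative distribution functions. A minimal Python sketch of that calculation is given below; the function name and the shot lengths are hypothetical, and the p-value is not computed here (it would normally come from statistical software).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The ECDFs only change at observed values, so check each one.
    for value in sorted(set(a) | set(b)):
        ecdf_a = bisect_right(a, value) / len(a)
        ecdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical shot lengths in seconds for two versions of a film.
original = [1.0, 2.0, 3.0, 4.0]
restored = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(original, restored))  # 0.5
```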

Figure 1 Empirical cumulative distribution functions of shot lengths in the original and restored versions of The Masquerader (Charles Chaplin, 1914)

Looking at the same information for The Rounders (Table 2 and Figure 2), we note that there is a much larger discrepancy between the two versions of this film. The original estimate of the median shot length was 3.6s (95% CI: 2.5, 4.7), and the revised estimate is 5.0s (95% CI: 3.5, 6.5). Again there is a larger increase in the dispersion of shot lengths, and this is also more marked in the upper part of the distribution. Again, we find that a two-sample Kolmogorov-Smirnov test indicates there is no statistically significant difference between the two distribution functions (D = 0.1922, p = 0.087).

Table 2 Descriptive statistics of the original and restored versions of The Rounders (Charles Chaplin, 1914)

Figure 2 Empirical cumulative distribution functions of shot lengths in the original and restored versions of The Rounders (Charles Chaplin, 1914)

There are no such large differences between the versions of The New Janitor (Table 3 and Figure 3). The medians are consistent, with only a small change in the estimate from 3.5s (95% CI: 2.4, 4.5) to 4.2s (95% CI: 3.2, 5.1). There is also a small increase in the interquartile range, and this is accounted for by the small difference between the upper quartiles. However, this difference is not comparable to those observed in the cases of The Masquerader and The Rounders, and the cumulative distribution functions indicate that the two versions have the same distribution of shot lengths (Kolmogorov-Smirnov: D = 0.1184, p = 0.515).

Table 3 Descriptive statistics of the original and restored versions of The New Janitor (Charles Chaplin, 1914)

Figure 3 Empirical cumulative distribution functions of shot lengths in the original and restored versions of The New Janitor (Charles Chaplin, 1914)

The two versions of Getting Acquainted (Table 4 and Figure 4) show only a small difference in the upper quartile and the interquartile range, but otherwise the two sets of shot length data are consistent (Kolmogorov-Smirnov: D = 0.0622, p = 0.978). The original estimate of the median is 3.9s (95% CI: 3.3, 4.5) and the revised estimate is 4.0s (95% CI: 3.3, 4.7), so these are nearly identical.

Table 4 Descriptive statistics of the original and restored versions of Getting Acquainted (Charles Chaplin, 1914)

Figure 4 Empirical cumulative distribution functions of shot lengths in the original and restored versions of Getting Acquainted (Charles Chaplin, 1914)

Although I have looked at just four films here, we can see that generally the difference in the median shot lengths is small for three of the films and would not substantially change how we interpret this information – though the increase in the dispersion of the upper part of the distribution for the restored version of The Masquerader is a good example of why it is not enough to refer only to measures of location in the analysis of film style. We must also look at dispersion. The difference between the two versions of The Rounders will obviously lead us to reconsider our conclusions based on this data. Hopefully, when I have finally completed transcribing the data for the other Chaplin Keystones from the restored version, a clearer understanding of how to deal with different estimates of the shot lengths in a motion picture will emerge.

The shot length data for the restored versions of The Masquerader, The Rounders, The New Janitor, and Getting Acquainted can be accessed as an Excel 2007 (.xlsx) file here: Nick Redfern – BFI Restored Chaplin 1. This data was collected by loading the films into Magix Movie Edit Pro 14 at 25 fps, and has been corrected by multiplying each shot length by 25/24.
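The correction mentioned above is a simple rescaling. As a sketch, assuming the shot lengths measured at 25 fps are held in a list of seconds (the function and parameter names are hypothetical):

```python
def correct_for_speed(shot_lengths, measured_fps=25, original_fps=24):
    """Rescale shot lengths timed at one frame rate to another."""
    return [x * measured_fps / original_fps for x in shot_lengths]


# A shot timed at 4.8s when played at 25 fps lasts 5.0s at 24 fps.
print([round(x, 2) for x in correct_for_speed([4.8, 2.4, 12.0])])
# [5.0, 2.5, 12.5]
```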

Shot scales in 1920s French cinema

Earlier posts have looked at shot scales in Hollywood and German cinema (here and here), and the films of Max Ophüls (here) and Alfred Hitchcock (here). This post follows those up with a quick look at French cinema of the 1920s using data from Barry Salt’s database (which can be accessed here).

The mean relative frequencies for a sample of 18 silent French films released between 1920 and 1929 (inclusive) are presented in Table 1, and Figure 1 is the rank-frequency plot for this data. The slope of the linear trendline in Figure 1 is -0.0456 (95% CI: -0.0682, -0.0231) and the intercept is 0.3254 (95% CI: 0.2245, 0.4263). From Figure 1 it is clear that this trendline is a poor fit for this data (R² = 0.8440, SE = 0.0464). Exponential (R² = 0.9580) and logarithmic (R² = 0.9685) trendlines give a better fit.
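To illustrate how a linear trendline and its coefficient of determination are obtained from a rank-frequency plot, here is a minimal least-squares sketch in Python. The relative frequencies are hypothetical (they are not the values in Table 1), and the function name is my own.

```python
def linear_trend(ranks, freqs):
    """Ordinary least-squares slope, intercept, and R-squared."""
    n = len(ranks)
    mean_x = sum(ranks) / n
    mean_y = sum(freqs) / n
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, freqs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(ranks, freqs))
    ss_tot = sum((y - mean_y) ** 2 for y in freqs)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical mean relative frequencies for seven ranked shot scales,
# with a single dominant scale as described in the text.
ranks = [1, 2, 3, 4, 5, 6, 7]
freqs = [0.40, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02]
slope, intercept, r_squared = linear_trend(ranks, freqs)
print(round(slope, 4), round(intercept, 4), round(r_squared, 4))
```

When one scale dominates, the first-ranked point sits well above the fitted line and the coefficient of determination falls, which is the pattern described for the French and German films.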

Table 1 The mean relative frequencies of shot scales in French cinema of the 1920s (n = 18)

Figure 1 Rank-frequency plot of shot scales for French cinema of the 1920s (n = 18). The error bars are the 95% confidence interval.

Comparing these results with those of 1920s Hollywood cinema and German cinema (available here), we can see that the poor fit for the linear trendline for French cinema is consistent with German films (R² = 0.8606, SE = 0.0505) but different from Hollywood cinema (R² = 0.9902, SE = 0.0106). As in German cinema of the 1920s, a single shot scale dominates the style of these films while other shot scales occur much less frequently. For German films of the 1920s the mean relative frequency for the first-ranked shot scale is 0.3804 (95% CI: 0.3181, 0.4428) – an estimate that is clearly in line with that of the French films described here. By contrast, the equivalent figure for Hollywood in the 1920s is 0.2967 (95% CI: 0.2697, 0.3166). Elsewhere I have argued that the change evident in the rank-frequency plots for Hollywood cinema occurs with the introduction of continuity editing and the changed representation of on-screen space in the classical style. We may infer from these results that 1920s French cinema did not break down space in the same way as contemporary Hollywood films, and that French filmmakers held onto the same pre-classical style as German filmmakers. The dominance of a single scale in French cinema is evident in the summary given in Table 2 and in Figure 2. The long shot is the first-ranked scale in 17 of the 18 films included in the sample; and for the only film where this is not so, the medium long shot is the most frequently occurring. Although Barry Salt’s data on shot scales does include many French films of the 1930s, it covers a narrower range of directors than for the 1920s (which is useful in different ways), and so any analysis for this period will reflect that limitation rather than a broader historical style. We cannot be certain that, as in Germany, the French cinema went over to a classical Hollywood style in the 1930s, but it is worthy of further research to determine if there is a European-wide lag in the take-up of Hollywood’s style or if this is true only for some countries. Interestingly, Hitchcock’s British films of this period are consistent with Hollywood cinema in this respect.

Table 2 The mean relative frequencies of shot scales in French cinema of the 1920s (n = 18)

Figure 2 Normalized sample medians of shot scales in French cinema of the 1920s.

The UK Top 100, 2007 to 2009

A few months ago I looked at the clustering of UK films at the UK box office (here). This week I look at the top 100 films at the UK box office from 2007 to 2009, inclusive.

Data was taken from the UK Film Council and Box Office Mojo. The ranking of a film in the top 100 according to Box Office Mojo is determined by its total box office gross. The total box office data given by Box Office Mojo is in dollars, and this was converted into pounds by multiplying by 0.51 for 2007 and 2008 and 0.61 for 2009. These figures are, therefore, estimates and this should be kept in mind when interpreting the results. To sort the data into groups, the opening weekend gross (including previews) and the total box office data were entered into PAST (v. 2.04) and then allocated by using k-means clustering into 5 groups. I would have gone further into the data to compare films ranked lower than 100, but the UK Film Council box office archive does not have data for many of these films.

It is clear from the graphs of each year (Figures 1 to 3) that there is a strong correlation between a film’s opening weekend gross and its total gross. The Spearman rank correlation between these two variables for 2007 is rs (98) = 0.8995, p < 0.001, and the mean proportion of a film’s total gross accounted for by the opening weekend is 0.2644 (95% CI: 0.2473, 0.2815). For 2008, rs (98) = 0.8642, p < 0.0001, and the mean proportion is 0.3019 (95% CI: 0.2826, 0.3212); and for 2009, rs (98) = 0.8993, p < 0.0001, and the mean proportion is 0.2808 (95% CI: 0.2633, 0.2983). Overall, there is little variation from year to year across the top 100 as a whole.
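For readers who want to reproduce this kind of summary, the two statistics above can be sketched as follows. The grosses are invented placeholders, the function names are hypothetical, and the confidence interval uses a normal approximation, which is an assumption: the figures reported above may have been calculated with a t-based interval instead.

```python
from statistics import NormalDist, mean, stdev


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_ci(values, level=0.95):
    """Sample mean with a normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(values)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


# Hypothetical opening weekend and total grosses in millions of pounds.
opening = [0.3, 0.5, 0.8, 1.1, 2.4, 3.0, 4.2, 6.5, 9.8, 13.4]
total = [1.4, 1.6, 3.5, 3.1, 8.0, 12.5, 14.0, 21.0, 40.2, 51.0]

rs = spearman(opening, total)
proportions = [o / t for o, t in zip(opening, total)]
print(rs, mean_ci(proportions))
```

The degrees of freedom quoted with rs are n - 2, so 98 for a sample of 100 films.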

In each graph we see the same types of films in the different clusters. The purple cluster includes the top performing films in each year, and these are typically franchise movies (James Bond, Harry Potter, Spider-Man, Shrek, The Simpsons, Batman, Indiana Jones, etc.), although the number 1 grossing film released in 2008 is Mamma Mia! The black cluster comprises films that did not achieve such stellar results, but which are nonetheless big budget studio fare. This cluster includes films from the Transformers, Twilight, and Iron Man franchises (which is probably a little disappointing for the producers), along with several family films (Wall-E, Monsters vs. Aliens, Kung Fu Panda). The red cluster includes some films that perhaps achieved more than was expected (Atonement, St. Trinian’s, Juno, Paranormal Activity) as well as some films that achieved much less than could be expected (Ocean’s Thirteen, Rocky Balboa, The Incredible Hulk, X-Men Origins: Wolverine). A small budget film in this group is performing strongly, but a big special effects movie in this group is soon going to be the end of your franchise. If your big budget effects movie ends up in the green group then the end will come very quickly, so don’t expect to see any more Ghost Rider (2007) or GI Joe (2009) movies in the future. The Curious Case of Benjamin Button, The Fantastic Mr. Fox, Cloudy with a Chance of Meatballs, Watchmen, and Fame (all from 2009) ended up in this group, and you would have to say that overall this represents poor performance on the part of these films. The green cluster includes many films that performed perfectly respectably (The Last King of Scotland, Notes on a Scandal), but which did not make the same crossover achieved by Atonement or Juno. The blue cluster includes films that opened poorly before things went downhill. It is gratifying that this includes Rambo. It also includes Mr. Magorium’s Wonder Emporium, which British audiences evidently did not want to watch, along with several poor quality horror films (Halloween, Hostel Part II, The Hills Have Eyes 2), as well as the unending cycle of awful spoof movies (Meet the Spartans, Disaster Movie, Epic Movie, Superhero Movie) that must do enough business in the US to justify the cost. Many of the other blue films are movies slightly outside the mainstream that have made it into the top 100 (This is England, Eastern Promises). Notable failures in the blue cluster include Teenage Mutant Ninja Turtles and Hannibal Rising from 2007, How to Lose Friends and Influence People and The X-Files: I Want to Believe from 2008, and Revolutionary Road and The Men Who Stare at Goats from 2009.

We also see similar numbers of films appearing in each cluster in each year. The top 3 clusters (purple, black, and red) account for 31 films in 2007, 30 films in 2008, and 29 films in 2009. The black cluster in 2009 is larger than in the other years but this may be due to the fact that the data for this year includes Avatar, which simply trounced everything, forcing other films that would have made the purple group in other years down one step. The green cluster includes 24 films in 2007, 29 films in 2008, and 31 films in 2009; while the blue cluster has 45 films in 2007, 41 films in 2008, and 40 films in 2009.

The number of films in each cluster, and the mean total and weekend gross are presented in Tables 1 to 3.

Table 1 Cluster size, and mean total and opening weekend grosses for the top 100 films at the UK box office in 2007

Table 2 Cluster size, and mean total and opening weekend grosses for the top 100 films at the UK box office in 2008

Table 3 Cluster size, and mean total and opening weekend grosses for the top 100 films at the UK box office in 2009

The outlier in the red cluster to the left of Figure 1 is PS I Love You, which was released on just 80 screens at Christmas 2007 for the first two weeks, producing a very low opening weekend, but was then released wide on 365 screens in the first week of January 2008 and immediately grossed a respectable £1.79 million for that weekend. This film probably underperformed at the box office, and if it had been released wide for its opening weekend it could be expected (on the basis of its subsequent weekends) to have come closer to (if not actually into) the black cluster. I can’t imagine what advantage was gained from releasing a romantic film at Christmas on just 80 screens, especially when it is well-known that the opening of a film is the most crucial period in its box office life.

Figure 1 Top 100 films at the UK box office in 2007

Although the top 3 groups (red, black, and purple) in 2008 include roughly the same number of films as the other years, it is immediately apparent from Figure 2 that films in the red group performed less well in this year. Unlike 2007 and 2009, the majority of films in this cluster achieved a total gross of less than £10 million, and from Tables 1 to 3 we can see that the mean total gross is lower for this year than in the others (ANOVA: F (2, 48) = 17.14, p < 0.0001; Tukey HSD: 2007/2008 – p = 0.0004, 2008/2009 – p = 0.0001, 2007/2009 – p = 0.4363). The total box office gross for the top 100 films in 2008 was £780.7 million, the lowest of any year covered here (2007 = £868.2 million, 2009 = £1002.7 million). The outliers in the green cluster are In Bruges, which seems to have been released twice – once in March on 75 screens and then again in April on 270 screens; and There Will Be Blood, which opened on just 24 screens but grew this to 199 screens in week 5 of its release. These are exceptions to the rule that opening weekends are destiny. In the case of In Bruges, we see a small budget film getting a second, bigger lease of life after an initial run as distributors and exhibitors respond to audiences and reviews. The release of There Will Be Blood can be explained by looking at America. In the US this film was released on just 2 screens in December 2007 before going to 1620 screens after 7 weeks by February 2008 – when it was nominated for eight (and later won two) Academy Awards – so this is perhaps the definition of an awards film. Without the Oscars, the release of this film would have been that much more limited.
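The F statistic in a one-way ANOVA of the kind reported above is the ratio of between-group to within-group variance. The sketch below shows only that calculation: the grosses are invented placeholders, the function name one_way_anova is hypothetical, and neither the p-value nor the Tukey HSD comparisons are computed, since both need distribution tables or a statistics library.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical total grosses (millions of pounds) for red-cluster films.
red_2007 = [11.2, 12.5, 13.1, 14.0, 15.3]
red_2008 = [7.9, 8.4, 9.0, 9.6, 10.2]
red_2009 = [11.8, 12.2, 13.5, 14.4, 15.0]

f_stat, df_between, df_within = one_way_anova([red_2007, red_2008, red_2009])
print(f_stat, df_between, df_within)
```

With three years the between-group degrees of freedom are 2, and the within-group degrees of freedom are the number of films minus 3, so F (2, 48) corresponds to 51 red-cluster films across the three years.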

Figure 2 Top 100 films at the UK box office in 2008

In 2009, there were two films that grossed considerably more than other films: Avatar and Harry Potter and the Half-Blood Prince (although most of the gross for Avatar was accumulated in 2010). Avatar was released at Christmas and so its opening weekend accounts for only 9% of its total gross, whereas 38% of the gross for Harry Potter was accumulated on its opening weekend. The 2nd tier of films (the black group) exhibits much more variation for the opening weekend grosses in this year than for 2007 and 2008. There is much less separation between the red and black clusters in Figure 3, and this again may be due to the distorting effect of Avatar. Again we see some films that have very low opening weekends relative to their total gross: Gran Torino in the green cluster and Vicky Cristina Barcelona in the blue cluster. As before, this can be attributed to distributors dipping their toe into the market with limited releases, before expanding the number of screens the following week. Whether or not this actually provided an advantage for these films is unknown, but a bigger opening weekend for Gran Torino would have pushed it towards the red cluster. Unlike PS I Love You, they are much harder to market to a specific audience and so perhaps there is some nervousness on the part of distributors to commit so many screens without such a defined audience. Who watches Woody Allen movies nowadays?

Figure 3 Top 100 films at the UK box office in 2009

Overall, there is remarkable stability in the top 100 films at the UK box office, which is exactly what studios pay to see. By applying clustering to box office data in this manner we can identify some of the structure in this data, identify those films which performed above or below expectation, and compare the performance of similar films from year to year.

Research on film industries

This week a collection of articles looking at film industries from perspectives that differ from those typically found in film studies. As usual, the version linked to may not be the final version published.

There is a lot of interesting research on film industries available through the Copenhagen Business School’s Knowledge portal (here), and by searching its research database and open archive. The CBS has a robust approach to open access and most of the research is available in English. Topics include:

  • The internationalization of the Indian film industry
  • City branding and film festivals
  • Film labour markets
  • The Danish film industry
  • Globalization and the cinema

Bakker G 2004 At the Origins of Increased Productivity Growth in Services: Productivity, Social Savings and the Consumer Surplus of the Film Industry, 1900-1938, Working Paper 81, Department of Economic History, London School of Economics.

This paper estimates and compares the benefits cinema technology generated to society in Britain, France and the US between 1900 and 1938. It is shown how cinema industrialised live entertainment, by standardisation, automation and making it tradable. The economic impact is measured in three ways: TFP-growth, social savings in 1938 and the consumer surplus enjoyed in 1938. Preliminary findings suggest that the entertainment industry accounted for 1.5 to 1.7 percent of national TFP-growth and for 0.9 to 1.6 percent of real GDP-growth in the three countries. Social savings were highest in the US (c. 2.5 billion dollars and three million workers) and relatively modest in Britain and France, possibly because of the relative abundance of skilled live-entertainment workers. Comparative social savings at entertainment PPP-ratios inflate British social savings to above the US level. Converging exchange rates and PPP price ratios suggest rapid international market integration. The paper’s methodology and findings may give insight in technological change in other service industries that were also industrialised.

Cazetta S 2010 Cultural clusters and the city: the example of Filmbyen in Copenhagen, ACEI 16th International Conference on Cultural Economics, 9-12 June 2010, Copenhagen, Denmark.

This paper explores the origins and development of Filmbyen (FilmCity), a media hub created around Lars von Trier’s film company Zentropa in the outskirts of Copenhagen.

In the first part of the paper the theoretical framework is introduced, with a review of the relevant literature concerning the role of culture in urban development and with a focus on clustering in the cultural industries.

Subsequently, after analyzing what kind of impact the film industry has on local economic development, and more specifically what role it plays in urban and regional development strategies (looking at Greater Copenhagen), the case of Filmbyen is studied in detail. The location patterns of film and film-related companies based in this special district are investigated with a small-scale survey – observing in particular what are the advantages of clustering, what networks are created, what kind of urban environment comes about.

Coe NM 2000 The view from out West: embeddedness, inter-personal relations and the development of an indigenous film industry in Vancouver, Geoforum 31: 391-407.

This paper considers the development of a particular cultural industry, the indigenous film and television production sector, in a specific locality, Vancouver (British Columbia, Canada). Vancouver’s film and television industry exhibits a high level of dependency on the location shooting of US funded productions, a relatively mobile form of foreign investment capital. As such, the development of locally developed and funded projects is crucial to the long-term sustainability of the industry. The key facilitators of growth in the indigenous sector are a small group of independent producers that are attempting to develop their own projects within a whole series of constraints apparently operating at the local, national and international levels. At the international level, they are situated within a North American cultural industry where the funding, production, distribution and exhibition of projects is dominated by US multinationals. At the national level, both government funding schemes and broadcaster purchasing patterns favour the larger production companies of central Canada. At the local level, producers have to compete with the demands of US productions for crew, locations and equipment. I frame my analysis within notions of the embeddedness or embodiment of social and economic relations, and suggest that the material realities of processes operating at the three inter-linked scales, are effectively embodied in a small group of individual producers and their inter-personal networks.

Hoefert de Turégano T 2006 Public Support for the International Promotion of European Films, European Audiovisual Observatory.

Jones C 2001 Co-evolution of entrepreneurial careers, institutional rules, and competitive dynamics in American film, 1895-1920, Organization Studies 22 (6): 911-944.

An historical case analysis of the American film industry is undertaken to gain a better understanding of the co-evolutionary processes of entrepreneurial careers, institutional rules and competitive dynamics in emerging industries. The study compares technology and content-focused periods, which were driven by entrepreneurs with different career histories and characterized by distinct institutional rules and competitive dynamics. Archival data and historical analysis is used to trace how entrepreneurial careers, firm capabilities, institutional rules, and competitive dynamics co-evolved. A co-evolutionary perspective is integrated with insights from institutional and resource-based theories to explain how the American film industry emerged, set an initial trajectory with specific institutional rules and competitive dynamics, and then changed.

Mezias SJ and Kuperman JC 2001 The community dynamics of entrepreneurship: the birth of the American film industry, 1895-1929, Journal of Business Venturing 16 (3): 209-233. [NB: this is not the full abstract, which is actually longer than some research papers].

This paper provides insight for practitioners by exploring the collective process of entrepreneurship in the context of the formation of new industries. In contrast to the popular notions of entrepreneurship, with their emphasis on individual traits, we argue that successful entrepreneurship is often not solely the result of solitary individuals acting in isolation. In many respects, entrepreneurs exist as part of larger collectives. First and foremost, there is the population of organizations engaging in activities similar to those of the entrepreneurial firm, which constitute a social system that can affect entrepreneurial success. In addition, there is also a community of populations of organizations characterized by interdependence of outcomes. Individual entrepreneurs may be more successful in the venturing process if they recognize some of the ways in which their success may depend on the actions of entrepreneurs throughout this community. Thus, we urge practitioners and theorists alike to include a community perspective in their approach to entrepreneurship. We also suggest that one way of conceptualizing the community of relevance might be in terms of populations of organizations that constitute the value chain. For example, in the early film industry a simple value chain with three functions—production, distribution, and exhibition—is a convenient heuristic for considering what populations of organizations might be relevant. As we show in our case study of that industry, a community model offers insights into the collective nature of entrepreneurship and the emergence of new industries.

Orbach BY and Einav L 2007 Uniform prices for differentiated goods: the case of the movie-theater industry, International Review of Law and Economics 27 (2): 129-153.

Since the early 1970s, movie theaters in the United States have employed a pricing model of uniform prices for differentiated goods. At any given theater, one price is charged for all movies, seven days a week, 365 days a year. This pricing model is puzzling in light of the potential profitability of prices that vary with demand characteristics. Another unique aspect of the motion-picture industry is the legal regime that imposes certain constraints on vertical arrangements between distributors and retailers (exhibitors) and attempts to facilitate competitive bidding for films. We explore the justifications for uniform pricing in the industry and show their limitations. We conclude that exhibitors could increase profits by engaging in variable pricing and that they could do so more easily if the legal constraints on vertical arrangements are lifted.