# Hypothesis tests of proportions for film style

If you go to Google Books and search for “average shot length”, you will find many texts that quote it as a statistic of film style. They cover subjects ranging from Italian Neo-realism to contemporary Hollywood blockbusters, Japanese cinema, and the English heritage film. This would be an encouraging sign for the progress of the statistical analysis of film style, were it not for the fact that they are all citing the wrong statistic. The average shot length (ASL) is the arithmetic mean of the shot lengths, and shot lengths are not normally distributed. Seriously, do not use the mean for non-normal data.
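
To see why this matters, the following minimal Python sketch compares the mean and the median for a set of invented shot lengths (in seconds) from a hypothetical film. The numbers are made up for illustration and are not Cinemetrics data. A couple of long takes pull the mean far above the typical shot, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical shot lengths in seconds for one film (invented for illustration).
# Most shots are short, with a few very long takes: a positively skewed,
# non-normal distribution.
shot_lengths = [1.2, 1.8, 2.1, 2.4, 2.9, 3.1, 3.5, 4.0, 4.4, 5.2, 6.0, 7.5, 38.0, 61.0]

asl = mean(shot_lengths)     # the "average shot length"
msl = median(shot_lengths)   # the median shot length

print(f"mean (ASL): {asl:.2f} s")
print(f"median:     {msl:.2f} s")

# Count how many shots are actually longer than the mean.
longer_than_mean = sum(s > asl for s in shot_lengths)
print(longer_than_mean, "of", len(shot_lengths), "shots are longer than the mean")
```

With skewed data like this, the mean describes almost none of the shots, whereas half the shots lie on either side of the median.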

Also absent are statistical hypothesis tests, even though the authors repeatedly claim that the ASL of one film is “significantly different” from that of another. “Significance” is a term with a specific technical meaning in statistics that differs from its everyday meaning: it relates to the interpretation of hypothesis tests, not to the importance or size of a result. In fact, statistical significance is very specifically NOT the size of a result, which is why you also need to calculate the effect size. Whether a result is significant depends on the sample size as well as on the size of the effect, so with enough data even a trivially small difference will be statistically significant.
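
The distinction between significance and effect size can be seen in a minimal Python sketch of a two-sample z-test for proportions with a pooled standard error. Two hypothetical films with 32% and 30% close-ups are compared twice, once with 100 shots each and once with 10,000 shots each; the counts are invented, and `two_proportion_z` is a hypothetical helper, not a library function. The difference in proportions, a simple effect size, is identical in both cases, but only the larger sample gives a p-value below the conventional 5% level.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Two-sample z-test for equal proportions, with a pooled standard error.
    Returns (difference in proportions, z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value


# Hypothetical films: 32% vs 30% close-ups, at two different sample sizes.
results = {}
for x1, x2, n in ((32, 30, 100), (3200, 3000, 10_000)):
    diff, z, p = two_proportion_z(x1, n, x2, n)
    results[n] = (diff, p)
    # The effect size (diff) is the same each time; the p-value is not.
    print(f"n = {n:>6} shots per film: difference = {diff:.2f}, z = {z:.2f}, p = {p:.4f}")
```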

Looking at this literature, we see a common set of mistakes:

• Inappropriate statistics are used (e.g. the mean is used as a measure of location for non-normal data). In some cases the wrong statistic continues to be used even after the researcher has pointed out the flaws in their own methodology.
• Data are inadequately described: failing to report sample sizes, and quoting measures of location without measures of dispersion, are typical.
• Statistics are presented as parameters rather than as estimates of parameters, so, unsurprisingly, confidence intervals are never used to provide errors for estimates.
• No statistical tests are performed, even when they are required.
• Statistical terms are misused.
• On many occasions the researcher fails to give a full account of the sampling and analysis methodologies used.
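
Reporting an estimate together with its error takes very little work. As a minimal sketch, assuming a hypothetical film with 120 close-ups out of 400 shots (invented counts), the Python below computes a 95% confidence interval for the proportion of close-ups using the normal-approximation (Wald) interval. The function name `proportion_ci` is hypothetical. Other intervals, such as the Wilson score interval, behave better for small samples or for proportions close to 0 or 1.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion.
    Reasonable when n*p and n*(1 - p) are both comfortably large."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since a proportion cannot lie outside that range.
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical film: 120 close-ups out of 400 shots.
estimate, lower, upper = proportion_ci(120, 400)
print(f"proportion of close-ups = {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```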

Given these flaws – and they are basic flaws – it seems scarcely credible to describe this work as statistics at all.

There are two reasons why such poor-quality research is being published. First, the researchers lack the basic understanding of statistics needed to conduct the research properly and to interpret the results. Second, the editors of journals and the publishers of books do not know how to deal with research that includes statistical information, and so they fail to point out even the most basic errors.

These problems have been noted in other fields, especially medicine, where the failure to analyse statistics properly can have rather more fatal consequences than in film studies. A good discussion of these problems, with some recommendations, is presented in an article by Douglas Altman, the director of the Centre for Statistics in Medicine (Altman 1998). It is all too easy to find in film studies the same flaws that Altman identifies in medical research, and, as I argued above, the reasons are the same: poor education and poor practice. Journals in film studies should adopt the following recommendations for peer-reviewing statistically based research:

• Qualified statisticians should be employed to peer-review any paper submitted to a journal that purports to be a statistical analysis of film style (or of film audiences, film economics, etc.).
• The policy for statistical refereeing should be transparent.
• Film studies journals should have a policy for the reporting of statistical results. This policy need not be printed in the journal’s notes to contributors, but it should be available from the editors on request.

Of course, many of these problems could be avoided by proper training for film researchers in the first place. A film researcher who does not have the required training should engage the services of a statistician at an early stage in the design of their research.

A good quote to bear in mind is this statement from 2006 by P.F. Kotur, editor of the Indian Journal of Anaesthesia:

> The real solution to poor statistical reporting will come when authors learn more about research design and statistics; when statisticians improve their ability to communicate statistics to authors, editors, and readers; when researchers begin to involve statisticians at the beginning of research, not at its end; when manuscript editors begin to understand and to apply statistical reporting guidelines; when more journals are able to screen more carefully more articles containing statistical analyses; and when readers learn more about how to interpret statistics and begin to expect, if not demand, adequate statistical reporting.

The full editorial is listed in the references below (Kotur 2006).

Education has to be the starting point, and so over the coming months I will use this blog to provide some basics on statistical hypothesis testing, using examples that can be repeated with data from the Cinemetrics database. To begin with, I outline how to perform hypothesis tests on the proportions of shot types in films using the one-sample z-test, the two-sample z-test, and the chi-square test for homogeneity of proportions. These are set out in the pdf file *Nick Redfern – Tests of proportions*. The examples all use Microsoft Excel, but other software can be used: OpenOffice provides a spreadsheet with statistical functions, and the latest version of PAST can be downloaded for free.
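
The pdf works through these tests in Excel. For readers who would rather script them, here is a minimal Python sketch of the same three tests using only the standard library. It is an illustration, not a transcription of the pdf: the function names are hypothetical and the counts are invented. The two-sample test pools the proportions to estimate the standard error, the chi-square test applies no continuity correction, and the chi-square p-value uses the closed-form upper-tail expressions that exist for integer degrees of freedom.

```python
from math import exp, factorial, pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def one_sample_z(x, n, p0):
    """One-sample z-test of H0: the proportion of a shot type equals p0.
    x = shots of that type, n = total shots. Returns (z, p)."""
    se = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (x / n - p0) / se
    return z, two_sided_p(z)


def two_sample_z(x1, n1, x2, n2):
    """Two-sample z-test of H0: two films have the same proportion.
    Uses the pooled proportion for the standard error. Returns (z, p)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, two_sided_p(z)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square distribution with
    integer degrees of freedom, using the closed forms for even and odd df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        return exp(-x / 2) * sum((x / 2) ** i / factorial(i) for i in range(df // 2))
    # Odd df: 2 * P(Z > sqrt(x)) plus a finite series in sqrt(x).
    root = sqrt(x)
    total = 2 * (1 - STD_NORMAL.cdf(root))
    term = sqrt(2 / pi) * exp(-x / 2) * root  # first term of the series
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)  # next term: multiply by x / (2r + 1)
    return total


def chi_square_homogeneity(table):
    """Chi-square test of homogeneity of proportions for an r x c table of
    counts (rows = films, columns = shot types), without continuity
    correction. Returns (statistic, degrees of freedom, p)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical counts, invented for illustration: close-ups out of all shots.
print(one_sample_z(120, 400, 0.25))     # film A against a hypothesised 25%
print(two_sample_z(120, 400, 90, 380))  # film A against film B
# Rows are films; columns are close-ups and all other shots.
print(chi_square_homogeneity([[120, 280], [90, 290], [150, 350]]))
```

For a 2 × 2 table (two films, one shot type against all other shots), the uncorrected chi-square statistic is the square of the two-sample z statistic, so the two tests give the same p-value; the chi-square test earns its keep when comparing three or more films.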

## References

Altman DG 1998 Statistical reviewing for medical journals, Statistics in Medicine 17: 2661-2674.

Kotur PF 2006 Editorial: statistics in biomedical journals, Indian Journal of Anaesthesia 50 (3): 166-168.