Are you my peer, reviewer?
Statisticians clearly need to do a lot to develop awareness and understanding of statistics and data across society. One challenge is that the great majority (78 per cent) of those surveyed stated they had little or no knowledge of statistical analysis.
Professor David Hand, Imperial College London, President, Royal Statistical Society 2008-2010
Sophistication? Sophistication? Don’t talk to me about sophistication love – I’ve been to Leeds!
George Whitebread (Harry Enfield)
Last year, I wrote a paper looking at the impact of sound technology on the shot length distributions of films featuring Laurel and Hardy. Having completed this paper, I submitted it to Screen and a couple of days ago I received a negative response. Admittedly, submitting the paper to Screen was something of a long shot – it is not, after all, a publication renowned for its empirical standing. But then I do not think that it would have received a substantially different reception had it gone to, say, The Quarterly Review of Film Studies, the Journal of Film and Video, or any other general film studies journal. The comments in this post obviously apply directly to the review of the paper that was generated by my submission to Screen, but they can also be generalised (depressingly) to other film studies journals. (When I submitted a revised version of my paper ‘Credal and pignistic reasoning in ergodic and non-ergodic texts’, one of the reasons given for rejecting the paper was that you could not publish equations in a humanities journal.)
The post in which the article was first published is here.
The reviewer’s comments are presented below in full. The review appears to be the work of a single reviewer, and so I conclude that no dual peer review took place. In my submission to Screen I specifically requested that my article be reviewed by someone with expertise in statistics, but from what I received from Screen this request was evidently not considered. I will only address the sections of these comments relating to statistical practice. I am not overly convinced that the recommendations for discussing what Stan Laurel gained from working for Fred Karno, and so on, would add to the argument I presented – I think that these comments generally miss the point of the paper as a piece of statistical research in film studies – but I am not hostile to their intent, nor can I dismiss them as simply irrelevant. It certainly seems to be the case that the reviewer is far happier talking about this aspect of the paper than its statistical content.
These are the comments I received:
Comments to the Author
This article takes as its focus the output of Laurel and Hardy films made at the Hal Roach studios across the coming of sound period. The method is a shot length analysis followed by some contextual information and then a content analysis, which consists of the description of some of the films. The argument suggests that there are no anomalies in this work when compared to the main output of the Hollywood studios generally in this period. The structure is as follows: There is a great deal of work here in counting shots, or in providing the results of the statistical analysis. Some historical background to the films is provided after the statistics and then a short conclusion.
The information is interesting and there is a clear need to address in detailed case studies the impact of the technological changes to style and mode of production wrought by the coming of sound. There are two considerable problems with this. The first is that while the statistical data is apparently convincing neither the method nor the outcome are clearly outlined and explained. Screen readers are a sophisticated readership but even given the wide area of expertise and interest the dazzling array of formulae rendered in statistical analysis speak will be confusing for the majority of scholars. The whole paragraph from line 8 on page 4 to line 26 on page 5 employs statistical analysis terminology that is certainly beyond this reviewer and would require considerable explanation to be accessible to a Humanities-based readership.
The second problem is that there is not enough weight given to the body of material on Hollywood film comedy that is available. While some of the historical work includes reference to Henry Jenkins and Kristine Karnick’s book, which is important, but Jenkins’ ‘What Made Pistachio Nuts?’ Early Sound Comedy and the Vaudeville Aesthetic would seem a much more relevant source for both primary material and for an instructive methodology which pays close attention to the intermedial influences brought to bear on production and style in the early sound period. Reference to performance traditions would really help this to be more convincing and less one-dimensional. For example on page 15 line 48 the paragraph begins by referring to Hardy and Laurel as being experienced performers but does not really take this statement much further. In fact Laurel spent a considerable time in the Fred Karno sketch comedy company in the UK, as did Chaplin, and the combination of music, sound effects with performance styles put into play there were a considerable influence on his later comedy. This is a subtle but important factor in considering the way sound was used in these films. Also it would help to recognize that a whole range of musical styles and sound effects accompanied films during the silent period and Roach would have been aware of this. His approach to incorporating sound was to support the visual comedy and to take advantage of the increased level of control he had. What would be interesting to know is the production information about the soundtracks. Milton Ager, for example, is credited as producing the soundtrack to Berth Marks (1929) along with William Axt and Jack Yellen. Of the three, Ager had long experience in writing and playing music to Vaudeville acts. Yellen was a songwriter and Axt was a director of music for the Capitol Theatre in New York City. 
These contexts of production I would have thought are essential in a full analysis of the impact of the coming of sound to the comedy of Laurel and Hardy, and indeed the Roach studios generally. These are factors that will not necessarily show up by determining median or average shot lengths but are of crucial importance.
The conclusion seems a bit slight as well and has the effect of understating the results. It simply points to the consistency the statistics have with previous shot-length studies. There could be more made of this, in fact it seems to be a reason to demonstrate the limits of this type of statistical analysis without an accompanying recognition of the wide array of influences from theatre and vaudeville, not to mention the assumptions of how the silent techniques and traditions were built upon by those at Hal Roach’s studio during the early years of sound. On balance, this essay is probably not for Screen, but if it were to be resubmitted, reconsideration along these lines would be needed in order to contextualise the statistics. The statistical analysis would also have to be rendered more comprehensible to a diverse readership.
I find the argument that ‘Screen readers are a sophisticated readership’ somewhat hollow given the second half of that sentence concludes ‘even given the wide area of expertise and interest the dazzling array of formulae rendered in statistical analysis speak will be confusing for the majority of scholars.’ Apparently sophistication among the readers of Screen does not extend to mathematics – and really quite simple mathematics at that.
I am not saying that the decision not to publish was the right or the wrong one. Obviously, I think the article should be published, but that does not mean that it could not be improved. My point is that the reviewer – by his/her own admission – was not competent to make that decision. As the reviewer notes, this article employed
statistical analysis terminology that is certainly beyond this reviewer.
But at no point did the reviewer stop and say ‘I don’t understand the concepts being employed here, so I should stop reviewing the paper and leave it to someone who does know what they’re doing.’ Nor does it appear to have occurred to the editors of Screen that, because their reviewer admitted that he/she was not competent to review the paper, they should find someone who is. The fact that this paper appears to have been reviewed by a single individual is also a problem here: why not get a statistician to review the methodology and conclusions, and a film studies scholar to review the relevance and interpretation? Should we not expect from Screen a double-blind peer review process that employs reviewers who know what they are doing?
Let’s deal with some specific matters. Firstly, consider the statement
neither the method nor the outcome are clearly outlined and explained.
The methodology of the paper is clearly set out in the methods section, and the justification for using the median shot length and Qn as statistics of film style is clearly explained in terms of their robustness when dealing with skewed datasets that contain outliers; a reference to a paper giving the theory for the use of Qn is provided. There is also a reference to a paper that describes the method for calculating confidence intervals for sample medians. The statistical test employed (the Mann-Whitney U test) is not some arcane piece of mathematics from the depths of number theory, but a common nonparametric method for comparing two samples that is described in every statistics textbook I have ever seen. If you want to learn about the Mann-Whitney U test then you can go to Wikipedia here, or you can download a good explanation from Loughborough University here. You can download Gianmarco Alberti’s Excel spreadsheet for robust statistics, which has an implementation of the Mann-Whitney U test (amongst other things), here. I could easily have included a lesson on how the Mann-Whitney U test works, but I do not think that this would have made the paper any more publishable. I suspect that if I had written a long section explaining the method then this would have been given as a reason for not publishing the article – not of interest to the reader, not appropriate for this journal, detracts from the argument, etc. It would be more appropriate for Screen (or any other journal) to present a series of articles explaining statistical methods to their readers so that when they come across such information they know how to deal with it. For more on this point, see the next section.
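To give a sense of how undazzling the method is, here is a minimal sketch of a Mann-Whitney U test in Python. The numbers are invented for illustration – they are not the shot-length data from the paper – and the lognormal samples simply stand in for the positively skewed distributions that shot lengths typically follow.

```python
# A minimal, hypothetical example of the Mann-Whitney U test applied to
# two samples of shot lengths (in seconds). The data are invented for
# illustration and are NOT the data analysed in the paper.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Shot-length distributions are typically positively skewed, so
# lognormal samples are a plausible stand-in for real data.
silent_era = rng.lognormal(mean=1.4, sigma=0.8, size=500)  # hypothetical silent films
sound_era = rng.lognormal(mean=1.7, sigma=0.8, size=500)   # hypothetical sound films

# Robust summaries: the median locates the centre of a skewed
# distribution where the mean does not.
median_silent = np.median(silent_era)
median_sound = np.median(sound_era)

# Two-sided Mann-Whitney U test: do the two samples differ in location?
u_stat, p_value = mannwhitneyu(silent_era, sound_era, alternative="two-sided")

print(f"median shot length (silent): {median_silent:.2f}s")
print(f"median shot length (sound):  {median_sound:.2f}s")
print(f"Mann-Whitney U = {u_stat:.0f}, p = {p_value:.4f}")
```

The logic behind the test is nothing beyond an introductory statistics course: pool the observations, rank them, sum the ranks for one sample, and compare the resulting U statistic to its sampling distribution.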
The outcomes are clearly explained: the differences are stated verbally, numerically, and graphically in the form of boxplots; estimates of the differences are given along with their effect sizes; and the results are interpreted.
There really isn’t anything difficult about this paper.
The reviewer refers to
the dazzling array of formulae,
but the formulae are very simple and straightforward, and anyone with a basic understanding of mathematics could comprehend them. This statement indicates that the reviewer has very little actual experience of mathematics, and would not respond well to reading an actual paper in number theory. (You can read Andrew Wiles’s proof of Fermat’s Last Theorem here and here if a dazzling array of mathematical methods is what you are interested in.)
I do not think that the lack of statistical education on the part of the ‘sophisticated’ Screen reader is justification for not publishing the essay. It is not my fault that academics in the humanities apparently have such a limited education. I do not think that it is unreasonable to assume that an academic readership should have a basic understanding of mathematics or statistics. If we only published work that could already be understood by the reader, then the opportunities for the reader to learn anything new would be severely curtailed. It is not unreasonable to expect the reader to do a little work when reading a research report, and if that means gaining an understanding of the methods involved then that it is to the benefit of everyone.
Finally, the reviewer notes in the final paragraph that my paper
simply points to the consistency the statistics have with previous shot-length studies.
There are two points to be made here. First, demonstrating that the results of one experiment are consistent with the results of another is an important part of research: if we were only to report ‘different’ results we would have no means of validating or refining our knowledge. Consistency between studies, were such a result to be found, would in itself be an important conclusion. Why would this be considered a reason ‘to demonstrate the limits of this type of statistical analysis’? (And given the reviewer’s admitted lack of statistical competence, what is the basis for this assertion?) Arguably, it would be proof that statistical analysis had identified an important result. The reviewer’s comments here point to a tendency towards the bias of only publishing studies that show statistical significance or large effect sizes, or studies which show results in a particular direction (positive or negative). For an analysis and discussion of the problem of publication bias in clinical trials you can read the Cochrane review here.
Secondly, it is evident that the reviewer (who, we recall, has admitted his/her own incompetence) does not realise that this paper does differ from other studies. As I have pointed out at length before, other researchers use the arithmetic mean shot length to describe film style but, because the distribution of shot lengths in a motion picture is typically positively skewed with a number of outlying data points, the mean does not accurately describe the style of a film. The conclusions of other studies are wrong – when Barry Salt, David Bordwell, Charles O’Brien, etc., tell you that the difference in the mean shot lengths between silent films and early sound films in Hollywood is a slowing down of about six seconds, they are wrong. It is not a matter of interpretation: they are simply factually incorrect. When I looked at the impact of sound technology on Hollywood films in general using the median shot length (which accurately locates the centre of any distribution irrespective of its shape), I came to a different conclusion – the slowdown in editing style is estimated to be 2.9 seconds. You can read the paper here. Now clearly, the impact of sound cannot have slowed editing in Hollywood by both 6 seconds and 2.9 seconds, since both values claim to represent the same effect. But we can dismiss the six-second claim because it is based on the wrong statistic – you cannot use the mean to describe non-normal data, and decisions based on using the mean in such circumstances will be wrong. This is not high-level mathematics – it is part of the GCSE syllabus. Unfortunately there is too much ‘research’ in film studies that does not meet the exacting standard of the General Certificate of Secondary Education. (Note, for example, that in the comments above the reviewer refers to ‘median or average shot lengths’ without apparently realising that the median is a type of average.)
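The point about the mean and skewed data can be demonstrated in a few lines. The sketch below uses invented, lognormally distributed ‘shot lengths’ with a handful of long-take outliers tacked on – an assumption standing in for real data – and shows the mean being dragged away from the centre of the distribution while the median stays put.

```python
# Hypothetical illustration of why the mean misdescribes positively
# skewed data: the values below are invented 'shot lengths' (seconds),
# not measurements from any actual film.
import numpy as np

rng = np.random.default_rng(7)
shots = rng.lognormal(mean=1.5, sigma=0.9, size=300)

# Add a few long-take outliers of the kind real films contain.
shots = np.append(shots, [60.0, 75.0, 90.0])

mean_sl = shots.mean()
median_sl = np.median(shots)

print(f"mean shot length:   {mean_sl:.2f}s")
print(f"median shot length: {median_sl:.2f}s")
# The mean sits well above the median: it is pulled upward by the skew
# and the outliers, while the median still marks the point that half
# the shots fall below.
```

Half of the shots are shorter than the median whatever the shape of the distribution; no comparable guarantee holds for the mean.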
The Laurel and Hardy paper notes the same general pattern in the changes to film style as my paper on Hollywood, but the Laurel and Hardy films exhibit a different effect size – i.e., the change in editing style is smaller than in the broader Hollywood study. This is made explicit in the paper – the abstract contains the following sentence: ‘Although statistically significant, these differences are smaller than those reported in other quantitative analyses of film style and sound technology.’ The paper does not simply point to the consistency of the statistics with the other paper – it presents results that are fundamentally different from those of every other study ever conducted in this area.
The quality of research presented in a journal depends on three things: the quality of the researcher, the quality of the reviewer, and the quality of the editor. I stand by my research as being methodologically sound and my conclusions as being appropriately derived from the data. On the basis of the above report, I do not think that Screen is a journal of quality, nor that its reviewers and editors are sufficiently competent to promote research of the standard they would like to claim.
What is to be done?
Now that I have pointed out the limitations of Screen’s editorial process, what can we do to ensure standards are raised?
The answer is simple: education. Film scholars need to learn statistics – and journals like Screen have an important role to play in this process.
It is important to remember that statistical analysis is not just relevant to the study of film style, but is also an important part of the methodology of studying audience behaviour and of studying the economics of the film industry. Statistical literacy should be a part of the education of every film scholar simply because they come across graphs and figures so frequently in so many different contexts. They need to be able to interpret graphs, to understand basic statistical concepts, to know when to apply a particular method, and how to interpret the results. Finally, and perhaps most importantly, they need to know that when they get stuck they can take the time to learn what they need and, if necessary, that they can ask a statistician for help. (Seriously, if you don’t understand what you’re doing, stop doing it).
Other disciplines make a far greater effort to ensure that their practitioners are properly trained. Journals are not only forums for publishing research – they also function as a means of developing researchers and of addressing issues of quality in research and education. For example, the article ‘Fundamental concepts in statistics: elucidation and illustration’ was co-authored by physicians and mathematicians and published in the Journal of Applied Physiology. You can access the article here for free. It is aimed at physiologists, who obviously have some scientific training (you would hope!), but assumes no specialist knowledge of statistics. As the authors note in their introduction,
The proper use of any statistical technique, however, requires an understanding of the fundamental statistical concepts behind the technique.
The purpose behind the article is that the reader should acquire such an understanding. This journal is going out of its way to educate its readership: to promote high-quality research, because authors will know how to present statistical findings properly; and to make the journal more relevant to readers, because readers who know how to interpret statistical reports will get far more use out of them. The reader can refer repeatedly to this article and others like it, thereby securing their own understanding and application of statistical methods. The Journal of Applied Physiology is not simply a collection of essays: it is a resource to be used.
Another excellent series of articles, from Critical Care, covers a wide range of statistical methods and how to apply and interpret them; it can be accessed for free here.
Do we have something similar in film studies?
The problem is that humanities journals are simply vehicles for scholars to publish 5000–6000 word articles, and are not viewed as a resource to be used again and again. None of the film journals currently being published have sections devoted to questions of methodology – even though there is clearly a desperate need for some instruction in statistical methodology, if not for research training in other areas. Nor do they have sections devoted to short articles that can present empirical results for other researchers to use. Here I think that we should refer to R.A. Fisher (see here for a biography), who invented many of the statistical methods employed today and can therefore make a strong claim to being the most influential scientist of the twentieth century. In ‘Statistical methods and scientific induction’ (Journal of the Royal Statistical Society B 17 (1), 1955: 69–78), Fisher stated that,
We have the duty of formulating, of summarising, and of communicating our conclusions, in intelligible form, in recognition of the right of other free minds to utilize them in making their own decisions (original emphasis).
When presenting statistical results in a film journal, this does of course assume that the reviewer, editor, and readers are competent to utilize results presented in ‘intelligible form’ – but Fisher’s view of the purpose of research is very different from the practice of publishing in film studies. Film studies journals, with their long essays, are one-dimensional (I include web-based journals in this general statement), and the information they contain is not in a form usable by other researchers. Is it even supposed to be usable by other researchers? It does not seem like it when you read the essays – essays in journals are more a form of performance by scholars than a means of presenting usable information, dealing with problems of method, or maintaining standards of research and training.
Therefore, I make the following recommendations:
- Journals should have clearly stated policies for the submission of papers that include statistical or other numerical analysis that set out what is expected of authors. These policies should be written in conjunction with trained statisticians.
- Journals should employ reviewers who are competent to review the papers submitted to them, and if any paper submitted to a journal includes a statistical analysis then the reviewers selected should include at least one statistician.
- There is a clear need for a series of articles similar to those examples given above that will educate researchers in statistical reasoning and methodology, and that will provide a resource that can be referred to again and again.
- There is a dire need for a statistics textbook that students and researchers can use that refers explicitly to examples from film studies. There are, of course, many excellent general textbooks in statistics that could be used, if only film scholars would actually use them. The only attempts to explain statistics in film studies to date are riddled with errors so basic that it is clear the authors do not understand the fundamental concepts of statistics. See here and here for examples of just how bad the application of statistics in film studies is, and of how just a little learning can improve matters.
- Journals should include a section for reporting empirical results quickly and clearly for the benefit of all researchers.
- The function of a film studies journal should be to serve as a resource for its readership, not as a platform for its contributors. This should be stated explicitly as part of the aims and scope of every journal, and be the guiding principle of all editorial policy.
These recommendations are not difficult to implement, but they would make a substantive contribution to improving standards in film studies. They are common in other disciplines, where examples of good practice abound.
It would be nice to think of oneself as one of the ‘sophisticated readers’ of Screen (or just in general). But as far as I can tell the problem is not a lack of sophistication – it is a simple lack of the most basic knowledge (e.g. do not use the mean for non-normal data), and a failure to put in place the procedures that guarantee the required level of quality in research (e.g. having competent reviewers).
Doesn’t it bother you, as a film studies student/researcher/lecturer/professor, that such elementary mistakes proliferate throughout the subject?
It is hard to defend film studies as an academic discipline in the UK because no one takes it seriously as a degree – but given the poor standards we have read about here today, would you want to defend it? I think that film studies is a wonderfully exciting and intellectually challenging discipline, but I do not know that I could honestly tell a prospective student in 2011 that studying film would be a valuable way to spend their precious time and money.