
Half a Defence of Positive Accounting Research*

Paul V Dunmore
Massey University, Wellington, New Zealand

Abstract
Watts and Zimmerman staked a claim to the term "Positive Accounting Theory" for their particular theory. This paper considers positive accounting in the broader sense of a research program which aims at developing causal explanations of human behaviour in accounting settings; examples other than PAT exist in accounting. The ontology and epistemology of such a program are examined. The logic of statistical hypothesis testing, while superficially analogous to Popper’s falsification criterion, is much weaker. Although the broad positivist research program is potentially very powerful, it is being let down by deficiencies in practice. Common problems are casual construction of theoretical models to be tested, undue reliance on the logic of hypothesis testing, a lack of interest in the numerical values of parameters, insufficient replication to warrant confidence in accepted findings, and the use of theories as lenses to examine qualitative data rather than as explanations to be tested. Illustrations from good papers are considered. As positive research is currently practiced in accounting, it seems largely incapable of achieving the scientific objectives. However, Kuhn’s description of "normal" science fits positive accounting research quite well. The prospects are briefly discussed for a Kuhnian crisis and revolution, which might liberate positive accounting to achieve its potential.

Keywords: positive accounting, research methods, normal science

* This draft has benefited from comments by Jenny Alves, Amy Choy, David Cooper, James C. Gaa, Thomas Scott and Dan Simunic, and seminar participants at the University of British Columbia and Victoria University of Wellington.

Draft - please do not cite. December 2, 2009

1. Introduction

In this paper I examine the positive approach to accounting research. Positive accounting research is part of the wider intellectual project of scientific research, which is intended to understand the cause-and-effect relationships in the world under study. (As I will explain below, some streams of critical accounting research fall within this definition and are subject to the argument developed in this paper.) The setting of accounting is one in which causes of human behaviour may be explored in large and complex organizations where face-to-face interaction is largely replaced by less-personal or completely impersonal systems of information for decision-making.

To understand both the importance and the deficiencies of positive accounting research, I briefly review the wider intellectual project, with its ontological and epistemological assumptions. This review exposes serious deficiencies in the way that positive accounting research is actually performed, which prevent it from making a meaningful contribution to the wider project. These deficiencies are illustrated by examining some good recent papers. The illustrative papers could have been chosen from any area of accounting, but to focus the discussion they will be selected predominantly from the auditing literature.

If positive accounting research is a well-established social system which is not well-designed for contributing to the scientific research project, its real purpose may be different. The concept of the disciplinary matrix used by Kuhn (1970) suggests instead that positive research may be a paradigm which is optimal for solving accepted puzzles, within a social group which accepts and rewards such puzzle-solving regardless of the social or intellectual contribution to be derived from the solutions. This suggestion gains weight from data presented by Lee (1997) on the dominance of the key accounting journals by a self-replicating élite. If that is so, there seems little hope that the élite could be persuaded to adopt a more effective paradigm.
However, the data of Fogarty and Markarian (2007) hint that the relative position of the élite may be declining. Possibly, therefore, we may anticipate a future crisis and an opportunity for adopting a more useful paradigm. In the meantime, I offer suggestions for actions by referees and editors to nudge the present system towards liberating positive accounting research to achieve its potential.

In papers such as this, one is expected to disclose one’s background and biases. I was trained as a theoretical physicist, and my accounting research has been positivist, concerned either with building or testing models. Thus, my criticisms are from the perspective of one who thinks that the research project is important, and is disappointed by the ineffective versions that are now practiced in accounting. However, my father is an historian, and I am by no means dismissive of non-positive approaches to understanding the world.

2. The Scientific Research Project

Imagine a stream of intellectual enquiry built around the following working hypotheses:

1. There exists a world which is independent of our imagination. That is, we did not make it up; and events in that world are not subject to the control of our wishes.
2. Events in that world have causes which are themselves part of the world. That is, events are neither completely random nor the results of interventions from outside the world.
3. It is possible for normal people to obtain fairly reliable information about events in the world, by careful observation. This does not imply that we will never be mistaken in our observations, only that the observations are not completely unconnected to the world.
4. The purpose of the intellectual enquiry is to use observations to gain an understanding of the world, and in particular of causation. That is, we seek mental models which correctly map the causal processes that occur in the world.
I should immediately make it clear that I am not asserting the truth of these hypotheses, merely asking for a ‘willing suspension of disbelief’ to permit their discussion. Indeed, I advance them fairly tentatively, conscious that for most of human existence they would have been thought preposterous or impious, and that perhaps the majority of humanity would still find them so.[1] The generally agreed explanation has been that events in the world are caused by the intervention of non-worldly beings: gods, demons, spirits, and the like. The only major point of disagreement has been over which beings are responsible.

When the idea that the world might be understood by rational enquiry was first invented[2] in Greece, in a remarkably short period about 2,500 years ago, it had to contend not only with conventional Greek mythology but also with other views of what the world is like and what may be understood about it.

[1] After the devastating Kashmir earthquake of 2005, Pakistani physicist Pervez Hoodbhoy was explaining to his graduate students the plate-tectonic forces that caused the earthquake. "When I finished, hands shot up all over the room," he recalls. "‘Professor, you are wrong,’ my students said. ‘That earthquake was the wrath of God.’" (Belt, 2007, p. 59).

[2] It would be unfair, of course, to attribute the idea to a single person, and many of the writings of the time are fragmentary and known largely from later references to them. However, Anaximander of Miletus (ca 610-546 BC) seems to have been the first to argue that the world was driven by physical rather than supernatural causes.

The sophists, for example, seem to have had a distinctly postmodernist view: the sophist Gorgias argued (1) that nothing exists; (2) that if something existed, one could have no knowledge of it, and (3) that if nevertheless somebody knew something existed, he could not communicate his knowledge to others.
("On Nature or the Non-Existent", translated in Encyclopædia Britannica, 2009)

Clearly, such a view precludes the sort of enquiry that I am positing, and indeed the sophists concentrated instead on the development of skills in rhetoric and on giving advice for living; if there can be no confidence in gaining real knowledge, then success might instead be sought in one’s ability to persuade others to one’s views (a useful skill in many walks of life, then and now).

Likewise, religious understandings of the world, whether animist, Abrahamic or Buddhist, do not encourage such enquiry.[3] There is a well-known story, which sadly appears to be untrue (Hannam, 2009, p. 312), that various theologians refused to look through Galileo’s telescope on the grounds that either it would show what was already known from Church doctrine and the writings of Aristotle (so looking was pointless) or it would show something contrary to that teaching because it had been corrupted by the Devil (so looking would be misleading). This is a self-consistent philosophical position, recognizable today in some creationist attitudes to evidence about evolution; it cannot be disproved, but it clearly precludes any real advance in understanding of the world.

The Greek tradition of rational enquiry petered out as Rome gained ascendancy, and collapsed entirely with the classical world itself. It was continued by a brilliant handful of Islamic scholars, and transferred to Europe as Spain was reconquered and works of Islamic scholarship fell into the hands of the victors. It then gradually grew from the pursuit of a few amateurs to the current industrial-scale intellectual enterprise. The idea that the world might be rationally comprehensible, which must once have seemed a forlorn hope, has gained real traction in the last few centuries, having extended from astronomy and mathematics to physics, to chemistry, to biology, and most recently to psychology and the social sciences (and to various sub-branches and applications of each of these fields). It may still turn out to be wrong, or to have only limited validity, but at present there is no compelling evidence of any limits. Past claims that ‘science will never be able to answer X’ have often proved wrong in an embarrassingly short time, and current proponents of such claims (as, for example, about consciousness and the nature of subjective experience) might wisely refrain from being too dogmatic until they know what another few centuries of enquiry may reveal.

[3] The hypotheses are not necessarily inconsistent with religious belief. One may posit that a Being created the world and left it to run according to some set of internal causal rules, or even that a Being intervenes at every instant to give effect to the results that would arise if causal rules were operating. However, if a Being set the world in motion but then intervenes from time to time to alter the results for the benefit of His followers or otherwise, then the programme of enquiry described here will eventually fail when it encounters events that are inconsistent with the usual causal rules; that is, hypothesis 2 will be shown to be false.

The intellectual program I describe is, of course, the scientific research program; in economics and accounting, it is described as positive research. It has become the mainstream accounting research program in recent decades, despite some trenchant criticisms, and it has had some significant successes. In this paper, I explore how it is being applied in accounting, and suggest that deficiencies in its implementation have led to the program being far less effective than should be expected. Because understanding how the world works is important, I regard it as undesirable that the related research stream is ineffective, and I offer some suggestions for improvement.

3. Examples of positive research in accounting

Although Watts and Zimmerman (1978, 1986, 1990) virtually trademarked the term "Positive Accounting Theory", it will be clear that the concept of positive research is much broader than their particular theory. They theorize that accounting phenomena are caused by the operation of rational self-interest among parties who interact through express or implied contracts in various types of organization. This can encompass not only accounting choices by firm managers, but also pricing and reporting decisions by auditors (DeAngelo, 1981), standard-setting decisions by regulators and politicians (Watts and Zimmerman, 1978), expert advice offered by academics (Watts and Zimmerman, 1979), and others.

But other areas of positive accounting research do not draw appreciably on this theoretical model. The value relevance literature attempts to infer from observed prices what accounting information investors use in their decisions, and research into the development of control systems in growing firms seeks to understand what factors lead to adoption of particular systems (Davila and Foster, 2007). These approaches assume that humans act rationally, but not in the sort of games that arise from Positive Accounting Theory.

Fukuyama (1995, p. 13) suggests that the "fundamental model of rational, self-interested human behaviour is correct about eighty percent of the time"; this is clearly not defensible in quantitative terms, but it carries the sense that human behaviour is mostly rational but that the exceptions are important.
So some accounting research examines behaviour in accounting settings without assuming rational behaviour, such as how audit experts make their judgements (Gibbins, 1984), how managers use discretion in performance-evaluation systems (Ittner et al., 2003), how different ways of presenting accounting information affect users’ ability to absorb it (Hodge et al., 2004), and how managers tend to persist in a mistaken decision despite accounting feedback showing their mistake (Schulz and Cheng, 2002).

These examples are by no means exhaustive, but they serve to illustrate that the positive research program is much broader than Positive Accounting Theory. Any research which aims at understanding the nature and causes of particular accounting phenomena, even if those causes lie in non-rational aspects of human psychology, qualifies as positive (that is, scientific) accounting research.

4. Scientific ontology and epistemology

But not all research does so qualify: that is, positive accounting research is not the same as accounting research.[4] Following one line of enquiry, however fruitful, closes off others:

By deciding ...that among many possible worlds as envisaged in other cultures, the one world that existed was a world of exclusively self-consistent and discoverable rational causality, the Greek philosophers ...came to commit their scientific successors exclusively to this effective direction of thinking. At the same time they closed for Western scientific vision the elsewhere open questions of what kind of world people found themselves inhabiting and so of what methods they should use to explore and explain and control it. (Crombie, 1994, p. 1)

[4] Chua’s brilliant analysis (Chua, 1986) identifies interpretive and critical alternatives, and explores the different assumptions underlying each. Her claims about the ontological assumptions of "mainstream" (i.e. positive) accounting (Chua, Table 2) are too narrow, however. In the decade or so before her paper, PAT had proved so fruitful that researchers flocked to the area, so that much of the research could be categorised by the assumptions that she lists (with the exception of her claim that "Human beings are . . . characterized as passive objects; not seen as makers of social reality", which is a startling misreading of PAT). However, these assumptions are not central to the general search for a causal understanding of the social world, and the examples mentioned in the previous section show that other sets of assumptions are adopted when appropriate.

Interpretive researchers pursue some of these open questions, doubting each of the hypotheses underlying scientific research. First, human agency (which is not completely rational) and the socially-constructed nature of our roles, relationships, institutions and practices mean that the social world does not have an objective existence independent of us, its participants, and that events in it need not have rational causes. Indeed, the categories of interest include both the "experiences" of social actors and the "meanings" that they ascribe to their lives and actions; both of these are intrinsically subjective, but through a process of social interaction they come to form an objective social reality. Further, we cannot observe the world except through our own experiences and the descriptions of other participants; there are no objective facts about experience. Because of these ontological and epistemological difficulties, any program aiming at an objective understanding of the causes of accounting phenomena is futile.

These views underlie the interpretive research program, which seeks to explain us to each other. To the philosopher’s question "What is it like to be a bat?" (Nagel 1974), the interpretive researcher adds "What is it like to be a human being?"
(for instance, a university manager controlled by a budgetary system (Pettersen and Solstad 2007), or a front-line auditor under pressure to complete procedures too quickly (Herrbach 2005)). This is, of course, the question asked in the humanities, and the relationship of positive to interpretive accounting research parallels that between science and the humanities.

How justified is the interpretive critique of the assumptions of positive research? The socially constructed nature of reality is not an insuperable problem: termite mounds and wolf packs are socially constructed, but are tolerably amenable to scientific study. Interpretive critiques argue that "humans are different," but that is at present a matter of assertion rather than empirical evidence: we simply do not know what lived experiences and shared meanings go into the social construction of a wolf pack.[5] Neither is there any greater difficulty in observing the social world of the corporate director than that of the wolf: we can observe behaviours, and have the advantage of being able to ask directors about theirs (with the obvious caveat that we need to assess the reliability of their possibly self-serving accounts). Care is needed in the research method, but the positive program itself is not invalidated, unless it fails in practice after a fair trial.

Also, it is not a problem that positive research does not explore meaning and experience, since its purpose is to explore causation. Different research streams with different objectives can coexist, and it is not a cogent objection to one stream to say that it fails to achieve the objective of the other.

[5] An overview of some of what we do know is provided by Pierce and Bekoff (2009).

The key problem is the one of agency. If humans have free will,[6] then their actions may have causes that are not amenable to scientific study.
Free will is not the same concept as Cartesian dualism of the mind and body, but it is not clear that free will can have any meaning except in a dualist framework: the mind makes choices that it uses the body to carry out, and if those choices are "free" then they are neither random nor causally compelled by the body or by anything else in the explainable world. Accordingly, the behaviour of an agent with free will is neither random nor reliably predictable. It may be possible to explain general trends of behaviour, but exceptional cases will always occur because agents with free will can choose to do something different. In other words, the regular exercise of free will would invalidate the second assumption of the research program, since the causes of behaviours (which are events in the objective, if socially constructed, world) will be located not in the objective world but somewhere else.

But this is an empirical question, not to be answered a priori. From an evolutionary perspective, it seems that had our ancestors not behaved in fairly predictable ways, both social life and successful food-gathering would have been impossible, and they would not have survived to become our ancestors.[7] Indeed, it seems necessary that every animal should behave fairly reliably in a way that rationally seeks its self-interest,[8] so that it should be no surprise that humans do so. But it is likely that there is also an evolutionary advantage to a certain willingness to depart from that pattern, since trying something new is the most effective way to discover new opportunities (offset against the risk of coming to a sudden and unpleasant end). So perhaps human behaviour is largely predictable, but with unpredictable variations from person to person and situation to situation.[9]

[6] I do not assert that people do or do not have free will. That would once have been thought a philosophical or theological question, but is now becoming an empirical matter. The experiments of Benjamin Libet, showing that reliably observable changes in brain states occur up to 0.3 seconds before subjects freely decide to act, are extremely suggestive (Libet, 2002).

[7] People with certain mental disorders behave unpredictably and/or fail to perceive the world accurately. These traits do not seem to enhance reproductive success.

[8] This does not imply conscious evaluation of alternatives; rationally optimal behaviour may be pre-programmed. It also does not preclude altruistic behaviour; one stream of research in evolutionary theory is to understand the circumstances under which altruistic behaviour is reproductively advantageous.

[9] Insurance companies rely on this to set premiums. Burglary is a completely voluntary act, but the overall rate of burglary is predictable enough that insurers can set premium rates that are both profitable and competitive.

Although this is an empirical question, it is one to which we do not yet have an answer. Certainly, we do not yet have any comprehensive causal theory of human behaviour, and we are not certain of the limits of applicability of the partial theories that we do have. There are thus extensive areas where all that can be done is to describe a situation, the actions of people in that situation, and the motives and meaning they ascribe to those actions: that is, to offer interpretive research.[10]

It seems, then, that positive ontology and epistemology may not be correct, but they are neither illogical nor absurd. If researchers can observe, perhaps imperfectly, a social world which is constructed by actors who are, if incompletely, independent of the researchers and who act, usually, in ways that are caused by the actions of others and by the physical world, then we may hope to make observations which allow us to infer those causal relationships. This may not work very well in practice, but the only way to find out is to try it and see.
Scientific research has been conducted, fitfully, for a couple of millennia, with successes beyond anything that could have been imagined in its earliest days. Positive research in the social sciences is only about a century old, and in accounting a matter of a few decades. The problems are seemingly more difficult in the social than in the physical sciences, and so we should not be too impatient for results. But precisely because it is so difficult, scientific or positive researchers need to adopt methods that are as effective as possible in turning observations into well-grounded causal theories. It is to that issue that I now turn.

[10] This is not to demean interpretive research as a mere "study of the gaps", confined to areas where positive research cannot yet be applied (or might never be applied). It has been observed that scientists who understand the mechanisms of a sunset do not find the sunset any less beautiful; and if we ever understand the causes of human behaviour that will not make human experience any less significant, except perhaps for those who never thought it significant to begin with.

5. Falsification and hypothesis testing

5.1. Popper’s criterion

Much scientific research has always involved the collection of data, whether qualitative or quantitative. Theories are sometimes suggested inductively by the accumulation of data, but Hume showed that induction cannot prove a theory correct. The logic which does support the acceptance of theories has evolved over the centuries, and our best current understanding is due to Popper (1959).[11] Working natural scientists, when they think about philosophy of science at all, tend to accept Popper’s description as fairly close to what they do.

[11] Kuhn and his successors have interesting things to say about the sociology of research and in particular the process by which attention turns from one field or theoretical paradigm to another. But this tells us little about the validity of the theories themselves.

In essence, the procedure may be summarized as follows:

(a) Observe carefully and develop preliminary ideas.
(b) Develop a formal theory, with testable predictions, that is consistent with all current relevant and reliable empirical evidence. The predictions need not be quantitative, but quantitative predictions are preferred where possible because they are more susceptible to falsification.
(c) Test the predictions of the new theory against new observations in situations where the new and old theories make different predictions. Reject whichever theory fails the test, once the outcome is clear (so that observational errors, for example, cannot be driving the result).
(d) Repeat steps (b) and (c) forever.

The effect of this process is to produce lots of disproved theories, and a small set of theories that, so far, seem to work: that is, they have not (yet) been effectively disproved. Our understanding is always provisional, but it steadily advances; step (b) is a ratchet which ensures that we do not go back to previously falsified ideas, and the domain of knowledge covered by our successful theories consistently increases.

Note that Popper says nothing about how new theories are invented. They may be suggested from observed regularities, but they are not proved inductively by those observations. They may instead come from a purely creative and imaginative process,[12] based on essentially no empirical data. What distinguishes a creative idea in science from a creative idea of other kinds is the subsequent attempt to falsify it using careful observation. This was Popper’s falsifiability criterion: a theory which cannot in principle be disproved by observation is not scientific.

[12] The chemist Kekulé describes how, after months of unsuccessful efforts, the structure of the benzene molecule came to him in a dream (Benfey, 1958, p. 22).

It is also important that step (c) involves the testing of two (or more) theories against each other, not the testing of a single theory. It is sometimes argued that falsification is inoperable because many assumptions must go into the theoretical prediction, and falsifying the prediction does not explain which assumption is wrong; that is, the theory cannot really be falsified.

5.2. How theories are falsified

To better understand the issue, consider the anomalous tracks of the Pioneer 10 and 11 spacecraft. Launched in the 1970s, these were the first man-made objects to cross from the solar system into interstellar space. But over a period of many years it became obvious that both spacecraft are travelling slightly slower than is predicted by general relativity. There seem to be four classes of explanation: (1) the measurements are in error; (2) some influence internal to the spacecraft, such as a gas leak, is causing a slight additional force; (3) some influence external to the spacecraft, such as an unsuspected planet, is causing a slight gravitational pull; (4) the theory of relativity is wrong. Several conferences have been devoted to the anomaly, and each explanation has been worked on; extremely detailed and painstaking analysis has now explained part, but not all, of the anomaly.[13]

However much work is done, as long as the anomaly exists all four explanations remain possible. If someone invents a theory which competes with general relativity, which is consistent with all of the observations where general relativity already works, and which correctly explains the track of the Pioneers, then general relativity will have been falsified.
But, unless that happens, the fact that two spacecraft are travelling at slightly the wrong speed tells us nothing about whether general relativity is wrong, or what a better theory might look like.[14] If, instead, we eventually find a concentration of matter in the outer solar system that explains the anomaly, then general relativity will not have been falsified by the Pioneer observations. At present, we know of no such concentration, but we cannot tell whether that is because it does not exist or because we have failed to observe one that does exist.

This illustration clarifies that a single theory cannot be falsified by any observation; but an observation can decisively select between two or more theories. Observations that cannot be explained do not falsify any theory although they may be inconsistent with one or more theories, and they do not advance our understanding of the world until they are explained (quantitatively, if at all possible). That is, decisive observations falsify incorrect theories: but they do so only if they simultaneously support a contending theory.[15]

[13] The Wikipedia article "Pioneer anomaly" provides a summary of the current position, with scholarly references.

[14] In the same vein, the advance of the perihelion of Mercury became one of the crucial tests between general relativity and Newton’s theory of gravity. But before Einstein invented general relativity, this slight movement was just an observational anomaly, having no clear significance. Relativity was not a slight tweaking of Newtonian gravitation, but a description of an utterly different universe. Despite that, most of its observable predictions are practically identical to those of the earlier theory; but the precise orbit of Mercury was one item where the results were different enough to be testable against each other.

[15] Kuhn (1970, pp. 146-147) corrects Popper’s view on this point.
This illustration also shows that theories need not be grand "theories of everything", but may be at different scales from macro, through "midrange" (Laughlin, 1995) to micro models of a specific situation. As long as they are potentially testable descriptions of cause-and-effect relationships in the objective world, they can contribute to our understanding of how the world works.

5.3. Qualitative positive research

Many sciences are largely or wholly quantitative, and sciences often become more quantitative as they mature. However, there are many respectable qualitative sciences (such as botany, geology and zoology); and some powerful theories (such as Darwin’s theory of evolution) are purely qualitative.[16] It is a common mistake in the social sciences to assume that positive and quantitative research are the same, leading to considerable confusion in considering research which is positive but qualitative.

[16] The sciences mentioned, and the theory of evolution itself, have become more quantitative in recent decades, but each of them had a long qualitative history and each still contains qualitative elements.

There are two distinct purposes for undertaking qualitative positive research in accounting. The first is to gather data to assist in developing a preliminary understanding of some phenomenon, before enough is known to justify attempts at quantitative measurement. This is step (a) in Popper’s procedure. The sort of questions that require this preliminary qualitative investigation are suggested by Humphrey (2008) in the context of audit pricing research:

for all the regression-based studies, how much do we really know as to how auditors, themselves, price an audit? How do they determine a tender bid and what distinguishes the behaviour and presentation strategies of audit partners or firms with higher success rates in winning audit tenders? How many audit firms price their audits using extensive/detailed regression equations? (p. 180)

Unless researchers begin by talking to auditors (and, no doubt, to clients) about the factors that affect audit prices, it is hard to see how realistic models of the process can be constructed or relevant variables can be properly measured for quantitative research. And without realistic models, it should be expected that prematurely obtained quantitative results will be heavily contaminated with Type I errors caused by model mis-specification. I will take this point up later.

The other purpose of qualitative positive research is to test theories, which is step (c) in Popper’s procedure. At present, only economics-based explanations can be worked out so as to give quantitative predictions, and other theories must be examined qualitatively. However, qualitative researchers seldom attempt this: instead, they accept a theoretical framework as given and use it merely to describe and structure their results. When the framework purports to be descriptive of the social world, this approach prevents us from testing its validity. If a researcher advances the proposition that "organisations work like this", whether "this" is described in economic, Foucauldian, Marxist, feminist, or other terms, this is a positivist claim which falls within the scope of the scientific project described earlier. The truth of the claim needs to be assessed using the best available evidence, which on our present state of knowledge requires an application of Popper’s procedure.

A typical auditing example is the examination by Kosmala MacLullich (2003) of audit firms’ adoption of the strategic-systems approach. Whether this adoption actually employed some version of Foucault’s "technologies of the self" to obtain compliance by front-line auditors is an empirical question.
To test it, one would need to establish what we would expect to find if Foucault’s description is applicable to this situation, compare it with what we would expect under one or more different theories, and then examine the interview data for its fit with the competing predictions. Instead, the author followed the accepted practice of coding her interview data using Foucault’s theory as a guide, to produce "stories of a disciplining process on-the-job" (p. 798). This approach makes it impossible to assess whether Foucault’s theory actually does apply well to this situation. What remains is a set of interview data, possibly biased by the theoretical lens through which it was passed during analysis.

Critical researchers seem to believe that their theories are simply lenses which give a clearer view of the world, and that testing them is neither necessary nor feasible. But in fact, most commonly used theoretical frameworks are assertions that the world operates in particular ways; such a claim immediately brings the research within the scope of the scientific research project. Whether such a claim is true (or more carefully, the circumstances under which it is true) is an important and open question, particularly for those who seek to use it as an exhibit in a drive to improve society.[17] By and large, qualitative accounting research currently fails to address this key question.

[17] Sokal (1996) puts the point clearly, from the point of view of a committed political progressive: "epistemological agnosticism simply won’t suffice, at least not for people who aspire to make social change. Deny that non-context-dependent assertions can be true, and you don’t just throw out quantum mechanics and molecular biology: you also throw out Nazi gas chambers, the American enslavement of Africans, and the fact that today in New York it is raining. [Historian Eric] Hobsbawm is right: facts do matter, and some facts matter a great deal."

Quantitative accounting research does, at least apparently, test whether its theories work. For someone familiar with mainstream scientific literature, the consistency with which these theories succeed when tested is nothing short of astounding. The remainder of this paper will examine this remarkable record of success a little more closely.

5.4. The logic and weaknesses of statistical hypothesis testing

Consider the logic of hypothesis testing as it is commonly practiced in quantitative positive accounting research. Superficially, the logic looks like an application of Popper’s procedure: some null hypothesis is proposed, and under that null hypothesis (and auxiliary statistical assumptions) the distribution of some test statistic is computed. Typically, if the null hypothesis and auxiliary assumptions are true, then values greater than some cut-off will occur with some small probability p or less (that is, they will occur in only a fraction p of random samples). The test statistic is then measured; if it falls into the critical region then either the researcher’s sample is an improbable one (with a probability p or less) or the null hypothesis is false. The null hypothesis is thus rejected, since if it is true the value of the test statistic is unlikely to have been observed.

But this is clearly a very watered-down version of Popper’s logic. Most obviously, the measurement of the test statistic is only probably, not certainly, incompatible with the null hypothesis. Even if every other condition is met, one true null hypothesis in 20 can be expected to be wrongly rejected at the 5% level. Most papers contain tables of statistical results, each with its significance test, and so false positives must be frequent in the literature. They could be virtually eliminated if we shifted our conventional significance threshold to, say, p = 0.00001. But that would re-attribute many true positives to chance; effectively, before accepting a statistical finding we would demand much stronger evidence, which might not always be obtainable. (I shall suggest later that this is of no great consequence, because statistical significance is not what we should be paying most attention to anyway.)

There is also a technical difficulty with the test: the distribution of the test statistic under the null hypothesis depends crucially on the auxiliary assumptions. For OLS regression, for example, it is required that the expected value of the dependent variable in the population is a linear additive function of the independent variables, not (for example) multiplicative, or additive in the logarithm of one variable and the product of a second and the square root of a third. This is almost never examined: indeed, when the common notation $y = f(x_1, x_2, \ldots)$ or the equivalent verbal formulation "we expect $y$ to be positively related to $x_1$ and $x_2$" is followed in either case by an OLS regression without further discussion, or when variables are log-transformed "to reduce heteroscedasticity", the reader may be confident that the researcher does not even understand the issue.[18] More subtly, research on the distribution of financial ratios suggests that the population expected value may be infinite (e.g. Ashton et al., 2004), in which case OLS regression using a deflated dependent variable is never valid.[19]

Problems with other technical assumptions (such as independence and homoscedasticity of residuals) are better recognized by researchers and may be tested for. However, although this offers some protection, the tests follow the same logic as for hypothesis testing in general: residuals are accepted as being homoscedastic unless the test shows that this is very unlikely to be true. A better approach would be to determine how much heteroscedasticity would invalidate the main test findings, and to estimate a confidence interval for the actual amount, to see whether it is less than this.
If the confidence interval includes zero, then the usual test for heteroscedasticity would accept the null hypothesis that the amount might be zero; but this is unrelated to whether the upper limit is high enough to call the main test into question.[20] Conversely, it may be that heteroscedasticity can be shown by a significance test to be present, but its magnitude is too small to matter.

But the hypothesis testing logic fails on other grounds, even if the technical issues can be solved. Fundamentally, only one alternative hypothesis is considered, and it is not specified sufficiently carefully. If the alternative is weak enough ("a positive association"), it may be consistent with several theories,[21] each of which may predict a different strength of association (and, perhaps, a different mathematical form of the relationship). It is, no doubt, very rare that two variables are perfectly unrelated, and in every other case there is a 50:50 chance that the sign of the association will be as predicted.

[18] An interesting feature of the paper on audit fees by Simunic (1980) is that he explicitly considered alternative functional forms for the effect of firm size. Such care is less common in the literature than it should be.

[19] The existence of infinite moments in the deflated variable is suggested by the occurrence of outliers in the sample. In practice, outliers are often deleted, possibly without comment, and in any case are treated as a statistical problem rather than as a clue that the research question has been wrongly formulated and should be changed.

[20] There is in general no off-the-shelf econometric procedure that does this. Usually bootstrapping and/or simulation are required to understand the particular situation. However, the effectiveness of simulation is limited unless the theoretical model is fully specified and accurately reflected in the research design.

[21] In many accounting papers the author does in fact suggest more than one competing explanation.
With a large enough sample, this sign can be established with some confidence. This does not, however, establish the truth of any particular theory, even if the sign happens to be consistent with that theory.

A comparison with the Pioneer anomaly shows the problem. What is required is that theories be developed in sufficient detail that each makes a specific mathematical prediction, involving both a functional form and specific parameter values.[22] The sample data should then be used to choose between the predictions of the different theories. If the sample data is inconsistent with every proposed theory, then nothing has yet been learned and none of the theories has yet been falsified; more research is needed to understand and resolve the anomaly. This is not, however, an argument against publishing an otherwise sound paper that reveals the anomaly; unless it is published, further work is hardly likely to occur.

It is commonly understood, in fact, that the logic of hypothesis testing is intended only to falsify the null hypothesis that "the observed result is due to chance and is specific to this sample." It is, of course, important to establish that our results are not due to chance. But, having established this, we are no closer to learning what the results actually are due to; that is, hypothesis testing tells us nothing about the truth of the particular alternative except that, at most, it gives the correct sign.[23]

Given that accounting data is subject to noise and to measurement error, there will always be a role for statistics in positive accounting research. But statistics are being wrongly used: the usual goal should be not hypothesis testing, but estimation.[24]

5.5. The effect on the positive research programme

The preceding analysis shows that hypothesis testing, as commonly practiced in positive accounting research, can provide only very weak evidence in support of a particular alternative hypothesis.
Thus we should strongly suspect that much of what is claimed to have been established is not in fact true, although of course we have no way of knowing which claims are true and which are not.[25] In a Kuhnian world in which the function of research is to provide researchers with interesting puzzles, moving from paradigm to paradigm as the old paradigms become unfruitful, this would not much matter. Researchers publish their work and advance their careers. Although some advice is given to standard-setters and regulators based on research findings, this advice is less direct than in earlier decades when normative research was the usual style. Presumably regulators tend anyway to follow advice which supports their inclinations and ignore the rest, so little harm will be done if research-based advice is incorrect.

[22] The parameter values may be expressed in terms of values measured elsewhere, but they must be known independently of the data from the current sample.

[23] For a two-sided alternative hypothesis, we do not get even this much information.

[24] In a different form and context, the foregoing critique of hypothesis testing in research was substantially argued by Sir Ronald Fisher (Fisher, 1955).

But I have argued that positive accounting research actually contributes to a wider scientific endeavour, with the particular aim of understanding human behaviour and its causes in an unusual setting of complex organizations where a variety of decisions are influenced by specialized information and control systems. If that is its main function, then quality problems caused by weak inferential methods become a serious concern. Accordingly, in the next section I propose some criteria for a program of positive accounting research that could make a reliable scientific contribution over time; and I test two good papers specifically against those criteria. This step is essential, because it would be easy but empty to lament that accounting should be more like physics.
I hope to illustrate that positive accounting should be like itself, but more effective, by identifying specific weaknesses in the approach that we accept as standard in the best quantitative papers in our discipline, and suggesting how they could be improved.

6. What is required for a successful positive research program?

6.1. Vulnerable models that are stringently tested

If we are to gain the most from a Popperian approach to positive research, the first requirement is that we demand more of our theoretical models. They must be designed to be taken seriously; they must actually be taken seriously; and we must expect them to fail and learn to improve them when they do. This requires, first, that they should be highly specified as to mathematical form so that they are as vulnerable to disproof as possible. In addition, they should be tested as accurately as possible (rather than, for example, in a simplified or linearized form). To make this possible, there must be careful measurement of the variables that enter the models.[26]

[25] A critique of Positive Accounting Theory along roughly these lines was made by Christenson (1983).

A recent article (Choi et al., 2009) illustrates the point.[27] The authors develop a model of the audit fee as a function of the strength of legal regime r and audit complexity k, and present it as their equation (5):

$$f(r, k) = k - \frac{k^2}{4(1 - p)rL} \qquad (1)$$

where p is the probability that the manager has a particular level of competence and L is the payment that the auditor would have to make in the event of an audit failure. This model is of a very specific functional form, as shown in Figure 1.
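The shape of equation (1) can be explored numerically. The following is a minimal sketch, not part of the original analysis: the function names and the parameter values p = 0.5 and L = 10 are hypothetical, chosen only to make the arithmetic easy to follow. It evaluates the fee, the complexity at which the fee peaks (where the derivative with respect to k is zero), and the complexity at which the fee returns to zero.

```python
def audit_fee(r, k, p, L):
    """Audit fee from equation (1): f(r, k) = k - k^2 / (4 (1 - p) r L)."""
    return k - k ** 2 / (4.0 * (1.0 - p) * r * L)


def peak_complexity(r, p, L):
    """Complexity at which the fee is largest, from df/dk = 1 - k / (2 (1 - p) r L) = 0."""
    return 2.0 * (1.0 - p) * r * L


def max_feasible_complexity(r, p, L):
    """Complexity at which the fee falls back to zero; beyond it the fee is negative."""
    return 4.0 * (1.0 - p) * r * L


if __name__ == "__main__":
    p, L = 0.5, 10.0  # hypothetical values, for illustration only
    for r in (1.0, 2.0, 100.0):
        print(
            f"r={r:6.1f}  fee peaks at k={peak_complexity(r, p, L):7.1f}  "
            f"fee is zero at k={max_feasible_complexity(r, p, L):7.1f}  "
            f"f(r, k=10)={audit_fee(r, 10.0, p, L):.3f}"
        )
```

Under these assumed values, the feasible range of k widens in proportion to r, the fee declines once k passes the peak, and for large r the fee at a fixed k approaches k itself, so that further strengthening of the legal regime changes it very little.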
[Figure 1: How the audit fee depends on the legal regime r and complexity k, according to equation (5) of Choi et al. (2009).]

This form immediately suggests several interesting tests of the model: since audit fees must be positive, only certain values of k are feasible and these depend on the legal regime; for high-complexity engagements, audit fees tend to fall with increasing complexity; beyond a certain point, stronger legal regimes have little effect on audit fees; and, of course, the full function could be fitted to actual data to test whether the fit is quantitatively good. Some of these conclusions are at least superficially implausible, but they are all testable in principle.

In fact, the authors do none of this: they immediately discard the richness of the model for purely directional hypotheses: their H1 and H3 are essentially statements that f(r, k) increases if r increases; and their H2 is a statement that if r does not change then f(r, k) increases if k increases.[28] And for testing these hypotheses they make assumptions about the mathematical forms: their regressions use various proxies for complexity, use the logarithm of the Wingate (1997) litigation index as a proxy for the legal regime, and assume that the logarithm of the audit fee is a linear additive function of their proxies. That is, the theoretical model has been turned into something like

$$\log f(r, k) = g(k') + h(r') \qquad (2)$$

where k' and r' are proxies for k and r and h() may be the logarithm function (in which case, the dependence of f on r will be as some power of the Wingate index). None of these choices is supported by any discussion as to whether it is mathematically or economically appropriate; but in any case, the model that is actually tested has been almost completely decoupled from the theoretical model introduced in the paper. Equation (2) is the sum of (proxies for) a function of k and a function of r, which is not the structure of equation (1) and is quite different from the behaviour shown in Figure 1, regardless of what functions g() and h() are actually used.

[26] The discussion throughout this section addresses quantitative research explicitly. However, with appropriate modifications, many of the same issues apply to qualitative positive research.

[27] By choosing this example, I am not intending to denigrate the work of these particular authors; in fact, I have a high opinion of this paper. Similar comments could be made about a number of articles from virtually any issue of a top accounting journal. The problems raised here reflect the low standards of inference which are accepted in even the best articles in our field.

[28] Figure 1 makes it clear that the second statement is true only for low-complexity audits. To arrive at their H2, the authors take an incorrect partial derivative, assuming that the optimal audit quality is held constant as the other variables change. This technical error was not fundamental and is not of wider interest; my focus is on the logic of the testing, which is typical of many other studies.

The upshot is that the coefficients of the actual regressions provide no unambiguous information about the effects of differing legal systems on audit fees. The results are uninterpretable, because the regressions do not match a theory that would allow us to interpret them.

Contrast this with the Pioneer anomaly: each theory will have to be worked through in the most precise detail so that its predictions can be unambiguously tested against the measurements. If the theory proposes a new planet, then the calculations must reflect the exact mass, position and orbit of the planet (and the existence of that planet must be supported by other evidence than the Pioneer anomaly, such as direct observation).
Had the authors tested their model against their data, it would undoubtedly have failed; and we would have learned something from the failure, because we could have seen how the real world differs from the model and could have gone back to try to understand what a more realistic model might look like. But as matters stand the only thing we have learned for sure is that some regression coefficients are non-zero.

6.2. Analytical modelling

In several sciences, the development and testing of theoretical models have become distinct specialities, with theoretical and experimental scientists in the same discipline receiving quite different training. In accounting, it is normally expected that the author will both develop a theoretical model and then collect the data to test it. However, there is a partial exception, since the development of "analytical" models is indeed accepted as a specialised activity, and such models are not usually presented as part of empirical papers. Those who specialise in developing such models are certainly "theorists" in a sense that would be recognised in physics, biology, and other sciences.

But this research has tended to remain decoupled from empirical testing. Researchers building their models around game theory generally limit themselves to tractable models that can be shown to have equilibrium solutions, with relatively little concern for the institutional realism of the assumptions.[29] Empirical researchers reading this literature are rarely rewarded with a glimpse of a rigorously testable prediction. Other models, such as the valuation model of Feltham and Ohlson (1995), do offer apparent pathways towards testing. However, empirical researchers tend to wrestle with the parameters of the models, which theorists have not developed in enough detail. For example, Dahmash et al. (e.g.
2009) note that Feltham and Ohlson "permit ‘other information’ without specifying what this ‘other information’ might include", which causes difficulties in operationalising the model. However, there is a strong base of expertise among analytical researchers which could in principle be turned to developing theories to the point where they could be tested. For this to happen, the preoccupation in the field would need to shift from tractability to verisimilitude.

[29] In the sciences it is taken for granted that many models cannot be analytically solved in realistic settings (and that equilibrium solutions are not always interesting). Thus theorists have extensive recourse to numerical methods, simulation, and approximation techniques.

6.3. A focus on measurement rather than testing

The raw material of quantitative research is measurement of concepts in ways that are precise (so far as the underlying data permits) and reproducible. Very little attention is paid to this in accounting research. Two examples will illustrate the shortcomings and how matters should be improved.

The previously mentioned paper by Choi et al. (2009) used two concepts that are likely to be of value in a number of studies: audit complexity and strength of the legal regime.[30] These concepts will be of interest in so far as they relate to each other and to other concepts: that is, in so far as they will be found in rigorously developed and tested theories. But before such testing can occur, the concepts need to be defined and measured in a way that is known to be valid and reliable.

[30] There are, of course, other concepts, to which the same considerations apply.

Audit complexity has been used in the audit fee literature for many years; it represents the amount of work that is required to complete a particular audit engagement at a particular level of quality. Within the model of Choi et al., the auditor’s effort cost is kq where k represents the complexity of the audit and q the quality. What drives this work requirement? Most obviously, the size of the auditee; and the most consistent finding in the audit-fee literature is that the fee increases as some power of size. Following Simunic (1980), this is usually represented by modelling the logarithm of the fee as a linear function of the logarithm of total assets. Figure 2 shows estimates of the coefficients, with 95% confidence intervals (based on reported t values) for Choi et al. and for three other typical papers. As a visual guide, a dashed line is shown at a coefficient of 0.41, which is consistent with many of the confidence intervals.

[Figure 2: Coefficients, with 95% confidence intervals, for the regression of log(audit fee) on log(total assets), from studies by Gonthier-Besacier and Schatt (2007); Kealey et al. (2007); Simunic (1980); Choi et al. (2009).]

There are wide variations in the estimates, beyond what can plausibly be attributed to chance, both within this study and between it and the other studies. In the Choi et al. study, the coefficient is about 0.28 for firms with weak legal regimes in the home country (Table 7(2)) and for firms that are cross-listed in Australia, Hong Kong, New Zealand and the UK and non-cross-listed firms in the respective home countries (Table 6(2a), (4a), (2b)). In contrast, Gonthier-Besacier and Schatt (2007) find a coefficient of 0.68 for French firms. These coefficients are all highly significant, that is, almost certainly not zero; but this is hardly of interest. Of much greater interest is why they differ, since that difference represents a very large alteration in how audit fees vary with size. If firm sizes vary from $100 million to $100 billion, then the audit fee for the largest firm would be roughly 100 times that for the smallest if fees scale with a power of 0.68, but only about 7 times as large if the scale factor is 0.28 (keeping all else equal, of course).
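The arithmetic behind this comparison is short enough to check directly. The sketch below is illustrative only: the function names are hypothetical, the interval calculation assumes a normal approximation (multiplier 1.96) rather than whatever procedure was used for Figure 2, and the t value of 20 in the example is invented purely to show the mechanics.

```python
def fee_ratio(size_ratio, exponent):
    """Ratio of audit fees implied when the fee is proportional to size ** exponent."""
    return size_ratio ** exponent


def ci_from_t(coef, t_value, z=1.96):
    """Approximate 95% interval from a coefficient and its reported t value.

    Assumes the standard error is coef / t_value and uses a normal
    approximation, which is an assumption of this sketch.
    """
    se = abs(coef / t_value)
    return coef - z * se, coef + z * se


if __name__ == "__main__":
    size_ratio = 100e9 / 100e6  # largest firm is 1000 times the smallest
    print(f"exponent 0.68: fee ratio = {fee_ratio(size_ratio, 0.68):.1f}")
    print(f"exponent 0.28: fee ratio = {fee_ratio(size_ratio, 0.28):.1f}")
    # Hypothetical example: coefficient 0.41 reported with t = 20
    low, high = ci_from_t(0.41, 20.0)
    print(f"approximate 95% interval: ({low:.3f}, {high:.3f})")
```

With a thousand-fold range of firm sizes, the two exponents imply fee ratios of about 110 and about 6.9, which are the "100 times" and "7 times" of the text.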
The difference cannot be attributed to different prices in different countries, because that would affect large and small firms in the same proportion. There is no doubt that auditee size contributes greatly to audit complexity, but there also seems no doubt that we do not have a valid measure of that contribution.

For any given size, of course, audit complexity also depends on other characteristics of the auditee. This is normally captured by including various ratios and other variables as controls in the regression for the logarithm of audit fees. Choi et al. use a widely accepted set of firm-level variables: the sum of receivables plus inventories divided by total assets; total liabilities divided by total assets; the logarithm of the number of business segments plus one, and likewise for geographic segments; and dummies for reporting a loss and for having recent capital issues. Although each of these can be justified as increasing audit complexity, the literature is silent as to whether these are actually independent multipliers of complexity or whether (and how) they interact, and as to how they can appropriately be combined into a specific mathematical function. That is, we do not know if the amount of work required to complete an audit to a given quality is actually given by the formula

$$k = k_1\,\mathit{TA}^{\alpha}\exp[\beta_1\,\mathit{INVREC} + \beta_2\,\mathit{LEV}]\,(1 + \mathit{BSEG})^{\gamma_1}(1 + \mathit{GSEG})^{\gamma_2}(1 + \delta_1\,\mathit{LOSS})(1 + \delta_2\,\mathit{ISSUE}) \qquad (3)$$

where $k_1, \alpha, \beta_1, \beta_2, \gamma_1, \gamma_2, \delta_1, \delta_2$ are constants.[31] It is certainly possible, but it is not altogether convincing, and it has not been demonstrated that this is the correct formula.[32] Another recent paper (Antle et al., 2006) uses other control variables, but where their set overlaps that of Choi et al.
it uses a different functional form:[33]

$$k = k_1\,\mathit{TA}^{\alpha}\exp[\beta_1\,\mathit{LEV}]\,(1 + \mathit{REC})^{\gamma_1}(1 + \mathit{INV})^{\gamma_2}(1 + \delta_1\,\mathit{LOSS}) \qquad (4)$$

It is notable (but not apparently noted by the authors) that the coefficient $\alpha$ in this model is estimated to be effectively zero, suggesting that audit fees are unrelated to firm size. Since this is obviously nonsense, one can have no confidence in the other estimated coefficients from the same regression.

[31] The coefficients might depend on quality. For example, in the model of Choi et al., the coefficient $k_1$ would be multiplied by q.

[32] Showing that the relevant regression coefficients are significantly different from zero is not evidence that the functional form is correct.

[33] Here REC and INV are receivables and inventory in millions of dollars, not scaled by total assets.

The concept of audit complexity is an important one in this and other studies, and it can be related to audit fees in a way that is likely to be useful. But we do not know how to measure complexity in a way that gives a valid, reliable, and reproducible value. If our measurements are not adequate, we can have no confidence in the findings of studies based on them.[34]

The other key concept in Choi et al. is the strength of a country’s legal regime, which they define as the probability that a court will find an auditor liable given that an audit failure occurred. They measure this using an index introduced by Wingate (1997) but actually developed by an insurance underwriter for a big audit firm. The original scale was from 1 to 10, but Wingate added a score of 15 for the US (which was not in the original index) based on an off-the-cuff assessment by a partner of the audit firm concerned that the US score would be "at least 15". Does the Wingate score validly measure the concept that Choi et al. require? The description of the factors considered in its construction suggests that it might do so. But examination of the score values themselves quickly raises doubts.
Although scores are specified to two decimal places, only eight different scores occur, and Figure 3 shows that they are almost perfectly represented by the formula $1 + x/2 + x^2/9$, where x takes the values 0, 1, 2, ..., 7. (The US corresponds nearly to x = 9.) It seems, then, that the underwriter began by dividing countries into eight groups, and then created a score by drawing a simple graph to convert what is really an ordinal variable into one that purports to have an interval or even a ratio scale. This can leave us with little confidence that the Wingate score (or its logarithm, which is what Choi et al. actually use) has a truly linear relationship to the concept r that it is meant to measure. It may be that it does, or that some particular (nonlinear) transformation of it does. But until that is established, we cannot trust it as a measurement of the concept to be used in a serious test of any theory.

[Figure 3: The Wingate score.]

These rather specific comments on one research paper highlight problems which are common to most areas of positive accounting research. Generally, little attention is paid to validating the proxies or the specific formulae that are used for measurement of important concepts. Consequently, we cannot with any confidence plug measures of these concepts into tests of theories that rely on them.

[34] Meta-analysis of the audit-fee literature shows that many results are inconsistent from study to study (Hay et al., 2006). If the concepts are not properly measured, this is hardly surprising.

As a brief example of a different measurement issue, consider the paper by Schulz and Cheng (2002), examining the tendency of managers to escalate their commitment of resources to projects which they personally approved, despite feedback suggesting that the project is unsuccessful.
The authors point out that, to properly test the theory, both the elements of personal responsibility and unequivocal negative feedback must be present, and their contribution is to correct previous studies by running an experiment which properly includes both features. They also hypothesized that information asymmetry between the managers and their superiors would moderate the behaviour, but found no significant effect.

The study used a standard 2 × 2 experimental design in which personal responsibility and information asymmetry were each either present or absent. It is not clear that either of these is truly a binary variable (when a Board approves a decision, each director may feel partial responsibility, for example), but it is clear that the strength of negative feedback takes more than the two values "ambiguous" and "unequivocal". If feedback in some previous studies was ambiguously weak, how does the tendency to escalate commitment[35] vary with feedback strength? Is there a threshold above which negative feedback overwhelms the tendency to continue justifying a personal decision, so that the escalation of commitment paradox suddenly disappears, or does progressively strengthening the feedback progressively weaken the behaviour; and, if the latter, is the effect proportional to the strength of the feedback or does it increase non-linearly? Is there another threshold below which feedback, while still negative, becomes so weak ("ambiguous") that the effect of personal responsibility disappears? These questions may have some limited application to design of feedback systems in business, but are of great importance in the research program of trying to understand (non-rational) human behaviour. To answer them will require developing a measure of negative feedback.

[35] It might also be asked whether information asymmetry would actually have a measurable effect at other feedback strengths.
In the specific setting of Schulz and Cheng, this is easily done: it is the actual rate of return on an earlier investment project.

These examples illustrate the potential role of measurement in extending our knowledge. The model presented by Choi et al. (2009) already has a specific mathematical form, and that form could be tested; if errors are found, they may provide guidance for developing better models. The escalation-of-commitment theory is at present purely qualitative: an effect occurs in certain conditions. However, now that the existence of the effect has been confirmed, the opportunity arises for measurement of how its strength is affected by the magnitude of its causes. That knowledge can be gained without at present having any theory; but it would guide the future development of relevant theory.

Measurement requires some initial theory to identify what concepts are likely to be worth measuring; attention to an appropriate operational definition of the concept, including consideration of the mathematical form; consistent use of the same definition in different studies; a focus on reporting confidence intervals to indicate the precision of any particular measurement, so that inconsistent measurements can be identified and investigated; and replication of measurements using different samples and settings, again to bring inconsistencies into view (since they are likely to be symptoms of an unreliable measurement of the concept). Only the first of these requirements can really be said to be characteristic of positive accounting research as now practiced.

A final point about measurement: in certain sciences, a very common type of journal article is the short paper reporting a measurement or observation (in chemistry, the properties of a newly synthesized chemical; in biology, the description of a new specimen; with parallels in archaeology, astronomy, geology, and others).
These articles are frankly atheoretical: their purpose is to report data.36 One of the most striking applications of the use of such data was Mendeleev’s invention of the periodic table of the chemical elements: from thousands of recorded observations of the properties of the known elements, he noticed that if the elements are ordered by increasing atomic weight then elements with similar physical and chemical properties appear periodically in the series. By organizing the list in a tabular form, he discovered that the table appeared to have several gaps. He predicted that elements would be found to fill those gaps, and also predicted what their properties would be, from the properties of the elements surrounding the gaps. Over the next few decades, the gaps were indeed filled with elements whose properties were substantially as Mendeleev had predicted. Only many decades later did any understanding develop of why those regularities occur.37 Without the original measurements, however, there would have been no possibility of developing the insights given by the periodic table.

In accounting, there appears to be a strong publication bias against measurement except when attached to a theory. There is consequently an acute shortage of raw material for an accounting Mendeleev to work with. The undoubted effect is that the development of good theory is hampered by lack of data; and, as already noted, the theories that occur in empirical papers are typically neither very good nor taken very seriously.38

36 If the data set is extensive, it is now likely to be archived on-line rather than reproduced in the body of the paper.

37 The cause is the limited and repeating sets of possible orbits for the outer electrons of an atom, governed by the laws of quantum mechanics.

38 Presumably there is a concern that atheoretical measurements may turn out to have no use. Often, no doubt, this will happen; but measurements that are not made and not published will certainly have no use. The effect can be mitigated by ensuring that straightforward papers of this kind are short (2-3 pages often suffice in other disciplines).

6.4. Replication, replication, and more replication

There are two quite distinct motives for replicating previous studies:

(1) To determine whether the original result merely reflects sampling error.

(2) To explore the limits of applicability of previous findings.

The first motive is particularly important in the current approach with its main focus on hypothesis-testing. As previously noted, many results in the literature with 5% or even 1% significance will be false positives even if there are no other statistical issues. Further, as has long been known (Lovell, 1983), significance levels quickly become enormously mis-stated if authors search over regression specifications and publish the specification that works best. If an author has been at all diligent in trying alternative proxies, functional forms, or econometric assumptions, then even figures that are reported as having p = 0.001 may in fact be completely non-significant. There is no way to know which results are false positives except to replicate the work;39 and the more important the study, the more important it is that it should be replicated several times. If this is the motivation, then the correct design is to replicate the study as exactly as possible, except for using a different sample. The same proxies, functional forms, and econometric assumptions should be used, so that there is no possibility of doing data mining over alternative specifications.
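The way a specification search inflates the chance of a false positive can be illustrated with a back-of-envelope calculation. The sketch below is not Lovell's (1983) own adjustment. It assumes, purely for illustration, that an author tries several alternative specifications whose tests are independent and that the null hypothesis is true in every one of them. The function name and the example numbers of specifications are hypothetical.

```python
def familywise_false_positive_rate(alpha: float, n_specifications: int) -> float:
    """Probability that at least one of n independent tests is 'significant'
    at nominal level alpha when every null hypothesis is in fact true.

    Independence between specifications is an illustrative assumption.
    """
    return 1.0 - (1.0 - alpha) ** n_specifications


if __name__ == "__main__":
    # Hypothetical search sizes: one specification up to one hundred.
    for m in (1, 5, 20, 100):
        rate = familywise_false_positive_rate(0.05, m)
        print(f"{m:>3} specifications tried -> P(at least one p < 0.05) = {rate:.3f}")
```

Under these assumptions, a search over 20 specifications already gives roughly a 64% chance of at least one nominally significant result at the 5% level. Specifications run on the same data are correlated rather than independent, so the exact figure in any real study will differ, but the direction of the distortion is the point.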
The preferred focus is on estimation rather than testing significance, since whether a statistic is significant depends on the sample size and the residual variance, neither of which may be controllable in a replication; problems in the original study are suggested if the original and replicated confidence intervals do not overlap.

If the motive is to test the limits of applicability of a finding, then the approach depends on what is to be examined. For example, does the finding apply in different countries, in different time periods, to experts as well as student subjects, to financial as well as manufacturing firms? How strong must the negative feedback be before the manager’s decision changes, and does the change occur suddenly or gradually? The purpose is to identify how far the underlying theory can be stretched before it breaks down, if indeed it does. Once the limits of applicability of a theory are reached, then further theoretical work becomes necessary to extend our understanding.40

An apparent counter-example to the weaknesses discussed in this section may be found in the previously mentioned paper by Antle et al. (2006), who compare different theories about the causal relationships between audit fees, non-audit fees, and earnings management. To do this, they build a simultaneous-equations model involving all three variables. They argue that this “allows us to more fully model the theoretical relations between audit fees, non-audit fees and abnormal accruals. . . . Still, the benefits of joint estimation have to be weighed against our lack of understanding of . . . the proper model specification in the joint estimation” (p. 238).
This sounds as though the authors are preparing to compare the predictions of different theories against each other using carefully specified models, but they immediately explain “Since we rely on prior literature for variables to include in our models, we also do not view misspecification as problematic”, which makes it clear that they do not regard the mathematical form of the models as being a specification issue. In fact, all of the jointly estimated equations are linear, some variables are log-transformed, and the control for audit complexity uses some variables and functional forms that differ from those used in other studies. The authors do not consider that these choices warrant discussion or justification. Further, the theories are not mutually exclusive, so that support for one does not imply rejection of others; in that sense the theories are not being tested against each other, although they are all being tested in the same set of equations. The paper presents better econometrics than earlier papers in the field, but its fundamental logic does not allow it to provide clear evidence. If all the theories have some element of truth, then it would be of greater economic interest to understand the relative strengths of the different effects, and this brings us back to the importance of measurement in preference to mere hypothesis testing.

39 Even out-of-sample testing (and variants such as bootstrapping or the Lachenbruch method) does not give the assurance required. There is no way of telling whether the author has simply been more diligent in seeking a specification that survives this additional testing. It is necessary that the original study be published, so that its details are frozen and no longer subject to tweaking by the researcher, before replication can be attempted.

40 This motivation for replication is equally applicable to qualitative positive research.

7. Why is it like this?
This survey has revealed a wide gap between how positive accounting research is actually practiced and what would be required for it to make an effective contribution to the wider intellectual programme outlined in section 2. If a system does not appear to be optimised for its purpose, there are two possible responses: we may try to set about modifying the system, or we may stop to wonder whether we might have mistaken its purpose.

The description of “normal” science by Kuhn (1970) may offer some relevant insights. Kuhn does not clearly describe his view of the world that science investigates, but it appears that he is not a realist: “There is, I think, no theory-independent way to reconstruct phrases like ‘really there’; the notion of a match between the ontology of a theory and its ‘real’ counterpart in nature now seems to me illusive in principle” (Kuhn, 1970, p. 206). It follows that, unlike Popper, he does not accept the hypotheses of the scientific research programme that I set out in section 2, and does not regard the scientific programme as feasible. Instead, his fundamental view of scientific change is that of one set of views supplanting another in a community.41 Thus, he seems to view scientific research as an essentially cultural activity, validated by the group of those who participate. What he calls “normal” science in a disciplinary area is a set of practices, beliefs, and attitudes that are well adapted to allowing the members of that group to solve large numbers of related puzzles:

[O]ne of the things a scientific community acquires with a paradigm is a criterion for choosing problems that, while the paradigm is taken for granted, can be assumed to have solutions. To a great extent these are the only problems that the community will admit as scientific or encourage its members to undertake. Other problems . . . are rejected as metaphysical, as the concern of another discipline, or sometimes as just too problematic to be worth the time. (Kuhn, 1970, p. 37)

41 See, for example, his discussion of translation and conversion on pp. 204-205.

Solving puzzles is fun, and Kuhn sees science as basically a form of play. The relevant community defines the paradigm, much as the World Chess Federation defines the rules of chess, and its members engage in puzzle-solving within that paradigm. Anyone who does not subscribe to the paradigm is excluded from the scientific community, much as a would-be chess player who thought the rules should be different would not find anyone willing to play under his rules. If a paradigm becomes ineffective, a revolution will occur and lead eventually to the emergence of a new paradigm (although some of the previous players may not adapt to the new rules and may be left behind).

Most scientists do not recognise this description of what they do, because Kuhn omits any but the most passing mention of the constraints imposed by the objective world. Anyone who has tried to develop a model that fits experience, even within the most securely defined paradigm, is acutely conscious of how difficult this process is. One reason for having increasing confidence in the objective reality of the world, the first hypothesis of section 2, is that experience shows the world to be extremely refractory to our attempts to understand it.

The research practices recommended in section 6 are difficult and demanding. One might expect, for example, that many years and much published research would be spent simply in the process of identifying the best way to measure a particular concept reliably. Developing rigorous theories is hard, and most of them will fail when properly tested. I began this essay with a reminder that the scientific project has been in progress for millennia; scientists accept the idea that worthwhile progress is slow, and that timeframes are often measured in decades or centuries rather than months.
If accounting is to make a worthwhile contribution to the scientific project, its apparent rate of progress would certainly slow; but I have argued that the current progress is largely illusory anyway. But if one were to construct a social system imitative of science but evading the constraints imposed by nature, one could create a perfect Kuhnian world in which researchers were free to play at solving puzzles within a paradigm accepted by the research community and in which acceptable puzzles could be solved (relatively) rapidly and reliably. In such a social system, it would be high praise of a theory to say that it had succeeded in the marketplace of ideas (Watts and Zimmerman, 1990), since that would show that the theory had been extremely successful at generating puzzles that the community could solve.

What would such an imitation science look like? Obviously, it would have theories which were not intended to be tested too carefully against evidence. Some theories might be assessed for their mathematical elegance, without much concern for their realism. Empirical tests would focus on the least demanding standards of evidence, such as establishing that the sign of a relationship was correct, avoiding detailed tests of magnitudes or mathematical forms. Coupled with searches over various specifications, this would ensure a gratifyingly high rate of success when theories were tested. There would be a further advantage that very crude measurements of concepts would be sufficient. (Alternatively, empirical data might be collected merely in support of a particular theoretical lens, without being used to test the theory at all rigorously.) Finally, the community would limit itself to those who subscribed to the paradigm; others would be excluded, although free to set up communities of their own with their own paradigms.42 Perhaps this sketch will seem familiar.
Positive accounting research looks rather like a Kuhnian normal science with two distinct paradigms, each of which is optimised to allow its practitioners to solve a particular class of puzzle according to the accepted rules. Neither paradigm, however, is well adapted to yield much reliable information about how humans behave in the social contexts where accounting is relevant. Neither paradigm pays much attention to the other: their descriptions of the world are mutually incomprehensible, if not necessarily incommensurable in Kuhn’s sense.

42 The World Chess Federation has no objection to people choosing to play bridge; they are just not welcome to do so at chess tournaments.

As is widely recognised, the quantitative party is dominant in North America. The evidence provided by Lee (1997) shows how this dominance is sustained. Lee found that the top four journals over the period 1963-1994 were dominated by an élite who had gained their doctorates at one of 20 US universities. The dominance was documented through editorial appointments, majority membership of the editorial board, long tenure on the board, multiple board memberships, and publication in the journals by members of the editorial board. The North American tenure system places a strong emphasis on publishing in high-ranked journals. Together with the élite dominance of those journals which Lee documents, this brings new accounting academics into a system in which compliance with the expectations of the élite is required for those wishing to remain part of the discipline.43 This ensures the self-renewing nature of the accounting research community and of the quantitative-positive paradigm that supports it.

The publication system also appears to enforce compliance with the paradigm. Journal space is limited, and the limits are driven particularly by the desire to maintain high rejection rates.44 Both editors and referees are unwilling to risk accepting a paper which might turn out to be wrong.
Further, papers are expected to make a “contribution”, which means that the result must be new and preferably unexpected. This bias discourages theories that are precise enough to fail, discourages replications, and encourages papers that pass weak empirical tests. A main finding which is actually a Type I error is likely to be both new and unexpected and thus has an excellent chance of making a contribution. The rate of false positives in the literature may thus be even higher than one might suppose from general statistical considerations. Francis (2006) points out that replication of the work of Frankel et al. (2002) connecting non-audit fees with indicators of earnings management showed that the results were fragile, and wondered about the frequency of Type I errors generally in the literature. A concern in the academic community for novel puzzle-solving rather than truth is likely to be actively harmful if the purpose of the project is to contribute to understanding.

Kuhn argues that a paradigm is replaced only when it can no longer support the puzzle-solving of normal science, and it enters some form of pre-revolutionary crisis. Is a crisis in sight for accounting? Setting aside the possibility that the governments, students and donors who now pay the costs may one day rebel at funding the play of accounting researchers, there is an indication of crisis in the demographic evidence of Fogarty and Markarian (2007). They report that the US accounting academy is declining, both in absolute numbers and in average seniority, and that these effects are most pronounced at the élite doctoral-granting institutions. If this continues, then the control of the academy and its journals by the current élite may come under threat.
This would provide an opportunity for replacement of the current paradigm by one that allows positive accounting to make a meaningful contribution to the scientific project, but of course that outcome is by no means assured.

43 In Australia and the UK, where the tenure system is not used or takes a milder form, it is notable that both the research community and the journals embrace a wider range of research approaches.

44 I do not recall ever having a physics paper rejected, and only rarely was a paper returned for revision. The editor of Physical Review would think you insane if you suggested that the prestige of the journal would be enhanced if most papers were rejected. This leading journal has rejection rates of around 20% (Bederson, 1994).

For those (whether part of the élite or not) who would like to see more effective positive accounting research, the deficiencies described in this article suggest some possible actions. Editors of non-élite journals (whether American or international) might specialise in accepting papers which take models and measurement issues more seriously, even when those papers fail to solve a puzzle (as they often will); and might accept many more short papers which report replications of previous work. This would not improve the current reputation of their journals, but would position them to emerge as leaders should a future crisis leave the present leading journals stranded in an outdated paradigm. Referees of quantitative papers might usefully press authors to take issues of measurement and the mathematical forms of models more seriously; their scope to do so is limited, but if authors come to expect such pressure they will develop the habit of considering those issues at the planning stage of their research. Similarly, referees of critical papers might press authors to test the theories that they use against competing alternatives.
For their part, authors might develop a more critical attitude to the approaches that are now accepted. This is, of course, a personally risky strategy: referees and editors seem to be more comfortable with a paper which uses an approach consistent with a previously published paper. Changes in approach need to be carefully defended, while repeating the same approach need only be supported by citing the precedents. Accordingly, such steps towards liberating positive accounting research to achieve its potential intellectual contribution should be considered the responsibility of tenured academics rather than new entrants.45

45 This is the reverse of the usual situation described by Kuhn, where older researchers keep hold of the old paradigm and new researchers drive a revolution.

8. Conclusion

This paper has examined the ontology and epistemology of positive research, and considered how the current practice of accounting research falls short of what is required to operate the research program successfully. Several suggestions are offered for quantitative positive research.

First, there is a need for better theoretical models: that is, models that are highly specified and thus highly vulnerable, and that are taken seriously as subjects for detailed testing. The disappointing progress being made in positive accounting research is a direct consequence of using ad hoc quantitative models which are reduced to mere statements of the expected sign of a relationship between two variables. The more elaborate models arising from analytical research are not structured to be testable, either because they concentrate on tractability or because they are insufficiently developed, with concepts that are not theoretically well-enough defined to be operationalized.

Second, there is a need for much better measurement so that theoretical models can in fact be rigorously tested.
Concepts need to be carefully operationalized, by finding proxies for interesting concepts that can be shown to have reliable relationships with proxies for other interesting concepts. Attention needs to be paid to choosing the correct functional form, which will often be a form that has a linear relationship with other concepts. Once a reliable way has been established for measuring a concept, that measurement should be used as a standard in subsequent studies rather than re-inventing the measurement for each study. Once a reliable measure of audit complexity has been demonstrated, for example, that measure should be used rather than re-estimating the parameters with each sample in each new study. These considerations imply that there will be papers whose sole purpose is to try to improve the measurement of some concept, not to apply it in testing a theory.

Third, there needs to be a shift in focus away from the testing of hypotheses towards estimation of parameters. Confidence intervals for parameters should be compared with theoretical predictions of those parameters, or with comparable measurements from other studies. Whether the result is significantly different from zero is equivalent to whether the confidence interval includes zero, but the measured confidence interval contains important additional information that the test result does not.

Fourth, there is a need for data archives of measurements of important concepts, both those that have been made to test particular theories and those that have been made to contribute to the archive. Making careful measurements is a significant skill, and the results need to be acknowledged as part of the discipline’s research activity. These measurements become both resources for and constraints upon future theoretical advances.
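The comparisons recommended in the third suggestion above can be sketched in a few lines. The example assumes a large-sample normal approximation for the estimator. The coefficient estimates, standard errors, and the theoretical prediction of 0.5 are invented for illustration and are not taken from any study discussed here; the function names are likewise hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Two-sided interval under a normal approximation (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (estimate - z * std_error, estimate + z * std_error)


def contains(interval, value):
    """True if the value lies inside the interval."""
    low, high = interval
    return low <= value <= high


def overlaps(a, b):
    """True if two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented numbers: an original study and an exact replication on a new sample.
original = confidence_interval(estimate=0.40, std_error=0.10)
replication = confidence_interval(estimate=0.05, std_error=0.04)
predicted_by_theory = 0.5  # hypothetical theoretical value of the parameter

if __name__ == "__main__":
    print("original interval:   ", original)
    print("replication interval:", replication)
    # Excluding zero is the same information as a significance test at this level.
    print("original excludes zero:     ", not contains(original, 0.0))
    print("replication excludes zero:  ", not contains(replication, 0.0))
    # The interval also supports comparisons that the test alone cannot.
    print("original consistent with theory:", contains(original, predicted_by_theory))
    print("intervals overlap:              ", overlaps(original, replication))
```

With these invented figures the original result is significant and consistent with the hypothetical prediction, while the replication is not significant and its interval does not overlap the original. That is the kind of inconsistency a bare pair of test results would not bring into view.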
Finally, there is a need for extensive replication, to validate conclusions from hypothesis testing, to confirm the accuracy of measurements, and to explore the limits of applicability of research findings.

For critical-qualitative research the fundamental requirement is that theoretical frameworks be treated as claims about the world which require testing to establish the limits of their applicability, if any. This means that such studies need to test competing theories against each other, rather than treating a single theory as an unproblematic and sufficient lens through which to examine a set of data.

Positive accounting research has a worthwhile contribution to offer to the wider project of understanding human behaviour, because of its unique setting and the particular range of behaviours that accounting encompasses. However, as currently practiced, its main outputs comprise statistically significant but uninterpretable coefficients connecting suspect measurements which are not known to be consistent from sample to sample (and which are sometimes known to be inconsistent); and theories which are not challenged and whose applicability is assumed rather than evidenced. Kuhn’s description of “normal” science appears to fit positive accounting research better than it fits actual sciences. That suggests that the apparent functional deficiencies are in fact essential features of the social system of positive accounting research. The purpose of the system may not actually be to add to our knowledge of human behaviour in accounting-related contexts, but to provide accounting researchers with a ready supply of satisfying puzzles which can be solved with relatively little difficulty. There is some reason to suspect that this social system may not be stable indefinitely, but it does not follow that any crisis would lead to the adoption of a system better suited to advancing knowledge.

9. References

Antle, R., Gordon, E., Narayanamoorthy, G., Zhou, L., 2006. The joint determination of audit fees, non-audit fees, and abnormal accruals. Review of Quantitative Finance & Accounting 27 (3), 235-266.
Ashton, D., Dunmore, P., Tippett, M., 2004. Double entry bookkeeping and the distributional properties of a firm’s financial ratios. Journal of Business Finance and Accounting 31 (5-6), 583-606.
Bederson, B., November 1994. Report to Council on e-print archive workshop, Los Alamos, Oct. 14-15, 1994. Downloaded 1/12/09 from http://publish.aps.org/eprint/losa.html.
Belt, D., 2007. Struggle for the soul of Pakistan. National Geographic 212 (3), 32-59.
Benfey, O. T., 1958. August Kekulé and the birth of the structural theory of organic chemistry in 1858. Journal of Chemical Education 35 (1), 21-23.
Choi, J.-H., Kim, J.-B., Liu, X., Simunic, D. A., 2009. Cross-listing audit fee premiums: Theory and evidence. The Accounting Review 84 (5), 1429-1463.
Christenson, C., 1983. The methodology of positive accounting. The Accounting Review 58 (1), 1-22.
Chua, W. F., 1986. Radical developments in accounting thought. The Accounting Review 61 (4), 601-632.
Crombie, A., 1994. Styles of Scientific Thinking in the European Tradition. Vol. 1. Duckworth.
Dahmash, F. N., Durand, R. B., Watson, J., 2009. The value relevance and reliability of reported goodwill and identifiable intangible assets. The British Accounting Review 41 (2), 120-137.
Davila, A., Foster, G., 2007. Management control systems in early-stage startup companies. The Accounting Review 82 (4), 907-937.
DeAngelo, L. E., 1981. Auditor independence, ‘low balling’, and disclosure regulation. Journal of Accounting & Economics 3 (2), 113-127.
Encyclopædia Britannica, 2009. Western philosophy. Retrieved September 29 2009, from http://www.britannica.com.
Feltham, G. A., Ohlson, J. A., 1995. Valuation and clean surplus accounting for operating and financial activities. Contemporary Accounting Research 11 (2), 689-731.
Fisher, R., 1955. Statistical methods and scientific induction. Journal of the Royal Statistical Society Series B 17 (1), 69-78.
Fogarty, T. J., Markarian, G., 2007. An empirical assessment of the rise and fall of accounting as an academic discipline. Issues in Accounting Education 22 (2), 137-161.
Francis, J. R., 2006. Are auditors compromised by nonaudit services? Assessing the evidence. Contemporary Accounting Research 23 (3), 747-760.
Frankel, R. M., Johnson, M. F., Nelson, K. K., 2002. The relation between auditors’ fees for nonaudit services and earnings management. Accounting Review 77 (4), 71.
Fukuyama, F., 1995. Trust, the Social Virtues and the Creation of Prosperity. Free Press, New York.
Gibbins, M., 1984. Propositions about the psychology of professional judgment in public accounting. Journal of Accounting Research 22 (1), 103-125.
Gonthier-Besacier, N., Schatt, A., 2007. Determinants of audit fees for French quoted firms. Managerial Auditing Journal 22 (2), 139-160.
Hannam, J., 2009. God’s Philosophers: How the Medieval World Laid the Foundations of Modern Science. Icon, London.
Hay, D. C., Knechel, W. R., Wong, N., 2006. Audit fees: A meta-analysis of the effect of supply and demand attributes. Contemporary Accounting Research 23 (1), 141-191.
Hodge, F. D., Kennedy, J. J., Maines, L. A., 2004. Does search-facilitating technology improve the transparency of financial reporting? The Accounting Review 79 (3), 687-703.
Humphrey, C., 2008. Auditing research: A review across the disciplinary divide. Accounting, Auditing & Accountability Journal 21 (2), 170-203.
Ittner, C. D., Larcker, D. F., Meyer, M. W., 2003. Subjectivity and the weighting of performance measures: Evidence from a balanced scorecard. The Accounting Review 78 (3), 725-758.
Kealey, B. T., Lee, H. Y., Stein, M. T., 2007. The association between audit-firm tenure and audit fees paid to successor auditors: Evidence from Arthur Andersen. Auditing 26 (2), 95-116.
Kosmala MacLullich, K., 2003. The Emperor’s ‘new’ clothes? New audit regimes: Insights from Foucault’s Technologies of the Self. Critical Perspectives on Accounting 14 (8), 791.
Kuhn, T. S., 1970. The Structure of Scientific Revolutions, 2nd Edition. University of Chicago Press, Chicago.
Laughlin, R., 1995. Empirical research in accounting: Alternative approaches and a case for “middle-range” thinking. Accounting, Auditing & Accountability Journal 8, 63-87.
Lee, T., 1997. The editorial gatekeepers of the accounting academy. Accounting, Auditing & Accountability Journal 10 (1), 11-30.
Libet, B., 2002. The timing of mental events: Libet’s experimental findings and their implications. Consciousness and Cognition 11 (2), 291-299.
Lovell, M. C., 1983. Data mining. Review of Economics & Statistics 65 (1), 1-12.
Pierce, J., Bekoff, M., 2009. Moral in tooth and claw. The Chronicle of Higher Education http://chronicle.com/article/Moral-in-ToothClaw/48800/.
Popper, K. R., 1959. The logic of scientific discovery. Routledge.
Schulz, A. K.-D., Cheng, M. M., 2002. Persistence in capital budgeting reinvestment decisions - personal responsibility antecedent and information asymmetry moderator: A note. Accounting & Finance 42 (1), 73-86.
Simunic, D. A., 1980. The pricing of audit services: Theory and evidence. Journal of Accounting Research 18 (1), 161-190.
Sokal, A. D., 1996. Transgressing the boundaries: An afterword. Philosophy and Literature 20 (2), 338-346.
Watts, R., Zimmerman, J., 1978. Towards a positive theory of the determination of accounting standards. The Accounting Review 53, 112-134.
Watts, R., Zimmerman, J., 1986. Positive Accounting Theory. Prentice Hall.
Watts, R. L., Zimmerman, J. L., 1979. The demand for and supply of accounting theories: The market for excuses. The Accounting Review 54 (2), 273-305.
Watts, R. L., Zimmerman, J. L., 1990. Positive accounting theory: A ten year perspective. The Accounting Review 65 (1), 131-156.
Wingate, M. L., 1997. An examination of cultural influence on audit environments. In: Previts, G. J. (Ed.), Research in Accounting Regulation, Supplement 1997/1. JAI Press, Greenwich, CT, pp. 129-148.