Half a Defence of Positive Accounting Research*
Paul V Dunmore
Massey University, Wellington, New Zealand
Abstract
Watts and Zimmerman staked a claim to the term “Positive Accounting Theory” for their particular theory. This paper considers positive accounting in
the broader sense of a research program which aims at developing causal explanations of human behaviour in accounting settings; other examples than
PAT exist in accounting. The ontology and epistemology of such a program
is examined. The logic of statistical hypothesis testing, while superficially
analogous to Popper’s falsification criterion, is much weaker. Although the
broad positivist research program is potentially very powerful, it is being let
down by deficiencies in practice. Common problems are casual construction
of theoretical models to be tested, undue reliance on the logic of hypothesis
testing, a lack of interest in the numerical values of parameters, insufficient
replication to warrant confidence in accepted findings, and the use of theories as lenses to examine qualitative data rather than as explanations to be
tested. Illustrations from good papers are considered. As positive research is
currently practiced in accounting, it seems largely incapable of achieving the
scientific objectives. However, Kuhn’s description of “normal” science fits
positive accounting research quite well. The prospects are briefly discussed
for a Kuhnian crisis and revolution, which might liberate positive accounting
to achieve its potential.
Keywords: positive accounting, research methods, normal science
* This draft has benefited from comments by Jenny Alves, Amy Choy, David Cooper,
James C. Gaa, Thomas Scott and Dan Simunic, and seminar participants at the University
of British Columbia and Victoria University of Wellington.
Draft - please do not cite. December 2, 2009
1. Introduction
In this paper I examine the positive approach to accounting research. Positive accounting research is part of the wider intellectual project of scientific
research, which is intended to understand the cause-and-effect relationships
in the world under study. (As I will explain below, some streams of critical
accounting research fall within this definition and are subject to the argument
developed in this paper.) The setting of accounting is one in which causes of
human behaviour may be explored in large and complex organizations where
face-to-face interaction is largely replaced by less-personal or completely impersonal systems of information for decision-making. To understand both
the importance and the deficiencies of positive accounting research, I briefly
review the wider intellectual project, with its ontological and epistemological assumptions. This review exposes serious deficiencies in the way that
positive accounting research is actually performed, which prevent it from
making a meaningful contribution to the wider project. These deficiencies
are illustrated by examining some good recent papers. The illustrative papers could have been chosen from any area of accounting, but to focus the
discussion they will be selected predominantly from the auditing literature.
If positive accounting research is a well-established social system which
is not well-designed for contributing to the scientific research project, its
real purpose may be different. The concept of the disciplinary matrix used
by Kuhn (1970) suggests instead that positive research may be a paradigm
which is optimal for solving accepted puzzles, within a social group which
accepts and rewards such puzzle-solving regardless of the social or intellectual contribution to be derived from the solutions. This suggestion gains
weight from data presented by Lee (1997) on the dominance of the key accounting journals by a self-replicating élite. If that is so, there seems little
hope that the élite could be persuaded to adopt a more effective paradigm.
However, the data of Fogarty and Markarian (2007) hint that the relative
position of the élite may be declining. Possibly, therefore, we may anticipate
a future crisis and an opportunity for adopting a more useful paradigm. In
the meantime, I offer suggestions for actions by referees and editors to nudge
the present system towards liberating positive accounting research to achieve
its potential.
In papers such as this, one is expected to disclose one’s background and
biases. I was trained as a theoretical physicist, and my accounting research
has been positivist, concerned either with building or testing models. Thus,
my criticisms are from the perspective of one who thinks that the research
project is important, and is disappointed by the ineffective versions that are
now practiced in accounting. However, my father is an historian, and I am by
no means dismissive of non-positive approaches to understanding the world.
2. The Scientific Research Project
Imagine a stream of intellectual enquiry built around the following working hypotheses:
1. There exists a world which is independent of our imagination. That is,
we did not make it up; and events in that world are not subject to the
control of our wishes.
2. Events in that world have causes which are themselves part of the world.
That is, events are neither completely random nor the results of interventions from outside the world.
3. It is possible for normal people to obtain fairly reliable information about
events in the world, by careful observation. This does not imply that we
will never be mistaken in our observations, only that the observations are
not completely unconnected to the world.
4. The purpose of the intellectual enquiry is to use observations to gain an
understanding of the world, and in particular of causation. That is, we
seek mental models which correctly map the causal processes that occur
in the world.
I should immediately make it clear that I am not asserting the truth of these
hypotheses, merely asking for a ‘willing suspension of disbelief’ to permit
their discussion. Indeed, I advance them fairly tentatively, conscious that
for most of human existence they would have been thought preposterous or
impious, and that perhaps the majority of humanity would still find them
so.[1] The generally agreed explanation has been that events in the world are
caused by the intervention of non-worldly beings: gods, demons, spirits, and
the like. The only major point of disagreement has been over which beings
are responsible.
When the idea that the world might be understood by rational enquiry
was first invented[2] in Greece, in a remarkably short period about 2,500 years
ago, it had to contend not only with conventional Greek mythology but also
with other views of what the world is like and what may be understood about
it.
[1] After the devastating Kashmir earthquake of 2005, Pakistani physicist Pervez Hoodbhoy was explaining to his graduate students the plate-tectonic forces that caused the
earthquake. “When I finished, hands shot up all over the room,” he recalls. “‘Professor,
you are wrong,’ my students said. ‘That earthquake was the wrath of God.’” (Belt, 2007,
p. 59).
[2] It would be unfair, of course, to attribute the idea to a single person, and many of
the writings of the time are fragmentary and known largely from later references to them.
However, Anaximander of Miletus (ca 610–546 BC) seems to have been the first to argue
that the world was driven by physical rather than supernatural causes.
The sophists, for example, seem to have had a distinctly postmodernist
view: the sophist Gorgias argued
(1) that nothing exists;
(2) that if something existed, one could have no knowledge of it, and
(3) that if nevertheless somebody knew something existed, he could not communicate his knowledge to others. (“On Nature or the Non-Existent”,
translated in Encyclopædia Britannica, 2009)
Clearly, such a view precludes the sort of enquiry that I am positing, and indeed the sophists concentrated instead on the development of skills in rhetoric
and on giving advice for living; if there can be no confidence in gaining real
knowledge, then success might instead be sought in one’s ability to persuade
others to one’s views (a useful skill in many walks of life, then and now).
Likewise, religious understandings of the world, whether animist, Abrahamic or Buddhist, do not encourage such enquiry.[3] There is a well-known
story, which sadly appears to be untrue (Hannam, 2009, p. 312), that various theologians refused to look through Galileo’s telescope on the grounds
that either it would show what was already known from Church doctrine
and the writings of Aristotle (so looking was pointless) or it would show
something contrary to that teaching because it had been corrupted by the
Devil (so looking would be misleading). This is a self-consistent philosophical
position, recognizable today in some creationist attitudes to evidence about
evolution; it cannot be disproved, but it clearly precludes any real advance
in understanding of the world.
The Greek tradition of rational enquiry petered out as Rome gained ascendancy, and collapsed entirely with the classical world itself. It was continued by a brilliant handful of Islamic scholars, and transferred to Europe as
Spain was reconquered and works of Islamic scholarship fell into the hands
of the victors. It then gradually grew from the pursuit of a few amateurs to
the current industrial-scale intellectual enterprise. The idea that the world
might be rationally comprehensible, which must once have seemed a forlorn
hope, has gained real traction in the last few centuries, having extended from
astronomy and mathematics to physics, to chemistry, to biology, and most
recently to psychology and the social sciences (and to various sub-branches
and applications of each of these fields).
[3] The hypotheses are not necessarily inconsistent with religious belief. One may posit
that a Being created the world and left it to run according to some set of internal causal
rules, or even that a Being intervenes at every instant to give effect to the results that
would arise if causal rules were operating. However, if a Being set the world in motion
but then intervenes from time to time to alter the results for the benefit of His followers
or otherwise, then the programme of enquiry described here will eventually fail when it
encounters events that are inconsistent with the usual causal rules; that is, hypothesis 2
will be shown to be false.
It may still turn out to be wrong, or
to have only limited validity, but at present there is no compelling evidence
of any limits. Past claims that ‘science will never be able to answer X’ have
often proved wrong in an embarrassingly short time, and current proponents
of such claims (as, for example, about consciousness and the nature of subjective experience) might wisely refrain from being too dogmatic until they
know what another few centuries of enquiry may reveal.
The intellectual program I describe is, of course, the scientific research
program; in economics and accounting, it is described as positive research. It
has become the mainstream accounting research program in recent decades,
despite some trenchant criticisms, and it has had some significant successes.
In this paper, I explore how it is being applied in accounting, and suggest
that deficiencies in its implementation have led to the program being far less
effective than should be expected. Because understanding how the world
works is important, I regard it as undesirable that the related research stream
is ineffective, and I offer some suggestions for improvement.
3. Examples of positive research in accounting
Although Watts and Zimmerman (1978, 1986, 1990) virtually trademarked
the term “Positive Accounting Theory”, it will be clear that the concept of
positive research is much broader than their particular theory. They theorize that accounting phenomena are caused by the operation of rational
self-interest among parties who interact through express or implied contracts
in various types of organization. This can encompass not only accounting
choices by firm managers, but also pricing and reporting decisions by auditors
(DeAngelo, 1981), standard-setting decisions by regulators and politicians
(Watts and Zimmerman, 1978), expert advice offered by academics (Watts
and Zimmerman, 1979), and others.
But other areas of positive accounting research do not draw appreciably
on this theoretical model. The value relevance literature attempts to infer from observed prices what accounting information investors use in their
decisions, and research into the development of control systems in growing
firms seeks to understand what factors lead to adoption of particular systems (Davila and Foster, 2007). These approaches assume that humans act
rationally, but not in the sort of games that arise from Positive Accounting
Theory.
Fukuyama (1995, p. 13) suggests that the “fundamental model of rational, self-interested human behaviour is correct about eighty percent of
the time”; this is clearly not defensible in quantitative terms, but it carries the sense that human behaviour is mostly rational but that the exceptions are important. So some accounting research examines behaviour in
accounting settings without assuming rational behaviour, such as how audit
experts make their judgements (Gibbins, 1984), how managers use discretion
in performance-evaluation systems (Ittner et al., 2003), how different ways
of presenting accounting information affect users’ ability to absorb it (Hodge
et al., 2004), and how managers tend to persist in a mistaken decision despite
accounting feedback showing their mistake (Schulz and Cheng, 2002).
These examples are by no means exhaustive, but they serve to illustrate
that the positive research program is much broader than Positive Accounting
Theory. Any research which aims at understanding the nature and causes
of particular accounting phenomena, even if those causes lie in non-rational
aspects of human psychology, qualifies as positive (that is, scientific)
accounting research.
4. Scientific ontology and epistemology
But not all research does so qualify: that is, positive accounting research is
not the same as accounting research.[4] Following one line of enquiry, however
fruitful, closes off others:
By deciding ...that among many possible worlds as envisaged in other cultures, the one world that existed was a world
of exclusively self-consistent and discoverable rational causality,
the Greek philosophers ...came to commit their scientific successors exclusively to this effective direction of thinking. At the
same time they closed for Western scientific vision the elsewhere
open questions of what kind of world people found themselves
inhabiting and so of what methods they should use to explore
and explain and control it. (Crombie, 1994, p. 1)
[4] Chua’s brilliant analysis (Chua, 1986) identifies interpretive and critical alternatives,
and explores the different assumptions underlying each. Her claims about the ontological
assumptions of “mainstream” (i.e. positive) accounting (Chua, Table 2) are too narrow,
however. In the decade or so before her paper, PAT had proved so fruitful that researchers
flocked to the area, so that much of the research could be categorised by the assumptions
that she lists (with the exception of her claim that “Human beings are ... characterized
as passive objects; not seen as makers of social reality”, which is a startling misreading
of PAT). However, these assumptions are not central to the general search for a causal
understanding of the social world, and the examples mentioned in the previous section
show that other sets of assumptions are adopted when appropriate.
Interpretive researchers pursue some of these open questions, doubting
each of the hypotheses underlying scientific research. First, human agency
(which is not completely rational) and the socially-constructed nature of our
roles, relationships, institutions and practices mean that the social world
does not have an objective existence independent of us, its participants, and
that events in it need not have rational causes. Indeed, the categories of
interest include both the “experiences” of social actors and the “meanings”
that they ascribe to their lives and actions; both of these are intrinsically
subjective, but through a process of social interaction they come to form an
objective social reality. Further, we cannot observe the world except through
our own experiences and the descriptions of other participants; there are no
objective facts about experience. Because of these ontological and epistemological difficulties, any program aiming at an objective understanding of the
causes of accounting phenomena is futile.
These views underlie the interpretive research program, which seeks to
explain us to each other. To the philosopher’s question “What is it like to
be a bat?” (Nagel, 1974), the interpretive researcher adds “What is it like
to be a human being?”, whether a university manager controlled by a budgetary
system (Pettersen and Solstad, 2007) or a front-line auditor under pressure
to complete procedures too quickly (Herrbach, 2005). This is, of course, the
question asked in the humanities, and the relationship of positive to interpretive accounting research parallels that between science and the humanities.
How justified is the interpretive critique of the assumptions of positive
research? The socially constructed nature of reality is not an insuperable
problem: termite mounds and wolf packs are socially constructed, but are
tolerably amenable to scientific study. Interpretive critiques argue that “humans are different,” but that is at present a matter of assertion rather than
empirical evidence: we simply do not know what lived experiences and shared
meanings go into the social construction of a wolf pack.[5] Neither is there any
greater difficulty in observing the social world of the corporate director than
that of the wolf: we can observe behaviours, and have the advantage of being
able to ask directors about theirs (with the obvious caveat that we need to
assess the reliability of their possibly self-serving accounts). Care is needed
in the research method, but the positive program itself is not invalidated,
unless it fails in practice after a fair trial.
Also, it is not a problem that positive research does not explore meaning
and experience, since its purpose is to explore causation. Different research
streams with different objectives can coexist, and it is not a cogent objection
to one stream to say that it fails to achieve the objective of the other.
[5] An overview of some of what we do know is provided by Pierce and Bekoff (2009).
The key problem is the one of agency. If humans have free will,[6] then their
actions may have causes that are not amenable to scientific study. Free will
is not the same concept as Cartesian dualism of the mind and body, but it is
not clear that free will can have any meaning except in a dualist framework:
the mind makes choices that it uses the body to carry out, and if those choices
are “free” then they are neither random nor causally compelled by the body
or by anything else in the explainable world. Accordingly, the behaviour of
an agent with free will is neither random nor reliably predictable. It may
be possible to explain general trends of behaviour, but exceptional cases
will always occur because agents with free will can choose to do something
different. In other words, the regular exercise of free will would invalidate the
second assumption of the research program, since the causes of behaviours
(which are events in the objective, if socially constructed, world) will be
located not in the objective world but somewhere else.
But this is an empirical question, not to be answered a priori. From
an evolutionary perspective, it seems that had our ancestors not behaved in
fairly predictable ways, both social life and successful food-gathering would
have been impossible, and they would not have survived to become our ancestors.[7] Indeed, it seems necessary that every animal should behave fairly
reliably in a way that rationally seeks its self-interest,[8] so that it should be no
surprise that humans do so. But it is likely that there is also an evolutionary
advantage to a certain willingness to depart from that pattern, since trying
something new is the most effective way to discover new opportunities (offset
against the risk of coming to a sudden and unpleasant end). So perhaps human behaviour is largely predictable, but with unpredictable variations from
person to person and situation to situation.[9]
[6] I do not assert that people do or do not have free will. That would once have been
thought a philosophical or theological question, but is now becoming an empirical matter.
The experiments of Benjamin Libet, showing that reliably observable changes in brain
states occur up to 0.3 seconds before subjects freely decide to act, are extremely suggestive
(Libet, 2002).
[7] People with certain mental disorders behave unpredictably and/or fail to perceive the
world accurately. These traits do not seem to enhance reproductive success.
[8] This does not imply conscious evaluation of alternatives; rationally optimal behaviour
may be pre-programmed. It also does not preclude altruistic behaviour; one stream of
research in evolutionary theory is to understand the circumstances under which altruistic
behaviour is reproductively advantageous.
[9] Insurance companies rely on this to set premiums. Burglary is a completely voluntary
act, but the overall rate of burglary is predictable enough that insurers can set premium
rates that are both profitable and competitive.
Although this is an empirical question, it is one to which we do not yet
have an answer. Certainly, we do not yet have any comprehensive causal theory of human behaviour, and we are not certain of the limits of applicability
of the partial theories that we do have. There are thus extensive areas where
all that can be done is to describe a situation, the actions of people in that
situation, and the motives and meaning they ascribe to those actions: that
is, to offer interpretive research.[10]
It seems, then, that positive ontology and epistemology may not be correct, but they are neither illogical nor absurd. If researchers can observe,
perhaps imperfectly, a social world which is constructed by actors who are,
if incompletely, independent of the researchers and who act, usually, in ways
that are caused by the actions of others and by the physical world, then we
may hope to make observations which allow us to infer those causal relationships. This may not work very well in practice, but the only way to find
out is to try it and see. Scientific research has been conducted, fitfully, for
a couple of millennia, with successes beyond anything that could have been
imagined in its earliest days. Positive research in the social sciences is only
about a century old, and in accounting a matter of a few decades. The problems are seemingly more difficult in the social than in the physical sciences,
and so we should not be too impatient for results. But precisely because it
is so difficult, scientific or positive researchers need to adopt methods that
are as effective as possible in turning observations into well-grounded causal
theories. It is to that issue that I now turn.
5. Falsification and hypothesis testing
5.1. Popper’s criterion
Much scientific research has always involved the collection of data, whether
qualitative or quantitative. Theories are sometimes suggested inductively by
the accumulation of data, but Hume showed that induction cannot prove a
theory correct. The logic which does support the acceptance of theories has
evolved over the centuries, and our best current understanding is due to Popper (1959).[11] Working natural scientists, when they think about philosophy
of science at all, tend to accept Popper’s description as fairly close to what
they do.
[10] This is not to demean interpretive research as a mere “study of the gaps”, confined
to areas where positive research cannot yet be applied (or might never be applied). It has
been observed that scientists who understand the mechanisms of a sunset do not find the
sunset any less beautiful; and if we ever understand the causes of human behaviour that
will not make human experience any less significant, except perhaps for those who never
thought it significant to begin with.
[11] Kuhn and his successors have interesting things to say about the sociology of research and in particular the process by which attention turns from one field or theoretical
In essence, the procedure may be summarized as follows:
(a) Observe carefully and develop preliminary ideas.
(b) Develop a formal theory, with testable predictions, that is consistent with
all current relevant and reliable empirical evidence. The predictions need
not be quantitative, but quantitative predictions are preferred where
possible because they are more susceptible to falsification.
(c) Test the predictions of the new theory against new observations in situations where the new and old theories make different predictions. Reject
whichever theory fails the test, once the outcome is clear (so that observational errors, for example, cannot be driving the result).
(d) Repeat steps (b) and (c) forever.
The effect of this process is to produce lots of disproved theories, and a
small set of theories that, so far, seem to work: that is, they have not (yet)
been effectively disproved. Our understanding is always provisional, but it
steadily advances; step (b) is a ratchet which ensures that we do not go back
to previously falsified ideas, and the domain of knowledge covered by our
successful theories consistently increases.
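The comparative logic of steps (b) and (c) can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a description of how any real study proceeds: the two theories, the tolerance standing in for observational error, and every number below are hypothetical. A theory is treated simply as a function that predicts an outcome for a given setting.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Observation = Tuple[float, float]  # (setting, measured outcome); purely illustrative


@dataclass
class Theory:
    name: str
    predict: Callable[[float], float]


def consistent(theory: Theory, evidence: List[Observation], tol: float) -> bool:
    """Step (b): a candidate theory must fit all existing reliable evidence."""
    return all(abs(theory.predict(x) - y) <= tol for x, y in evidence)


def discriminating_settings(old: Theory, new: Theory,
                            settings: List[float], tol: float) -> List[float]:
    """Step (c): keep only settings where the two predictions differ by more
    than twice the tolerance, so that no single observation can fit both."""
    return [x for x in settings
            if abs(old.predict(x) - new.predict(x)) > 2 * tol]


def rejected_by(theories: List[Theory], new_evidence: List[Observation],
                tol: float) -> List[str]:
    """Step (c): names of the theories that fail the new observations."""
    return [t.name for t in theories if not consistent(t, new_evidence, tol)]


# Hypothetical theories and made-up numbers, for illustration only.
incumbent = Theory("incumbent", lambda x: 2.0 * x)
challenger = Theory("challenger", lambda x: 2.0 * x + 0.05 * x ** 2)
TOL = 0.3  # assumed size of observational error

# Both theories fit what is already known ...
existing_evidence = [(1.0, 2.0), (2.0, 4.1)]

# ... so the test must be run where their predictions diverge.
test_settings = discriminating_settings(
    incumbent, challenger, [1.0, 2.0, 5.0, 10.0], TOL)

# Hypothetical new observations taken at the discriminating settings.
new_evidence = [(5.0, 11.2), (10.0, 25.1)]
rejected = rejected_by([incumbent, challenger], new_evidence, TOL)
```

In this toy setting both theories pass the check against existing evidence, and only observations at settings where their predictions are far apart can separate them. The tolerance plays the role of the requirement that the outcome be clear before either theory is rejected.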
Note that Popper says nothing about how new theories are invented.
They may be suggested from observed regularities, but they are not proved
inductively by those observations. They may instead come from a purely
creative and imaginative process,[12] based on essentially no empirical data.
What distinguishes a creative idea in science from a creative idea of other
kinds is the subsequent attempt to falsify it using careful observation. This
was Popper’s falsifiability criterion: a theory which cannot in principle be
disproved by observation is not scientific.
It is also important that step (c) involves the testing of two (or more)
theories against each other, not the testing of a single theory. It is sometimes
argued that falsification is inoperable because many assumptions must go
into the theoretical prediction, and falsifying the prediction does not explain
which assumption is wrong; that is, the theory cannot really be falsified.
[11, continued] paradigm to another. But this tells us little about the validity of the theories themselves.
[12] The chemist Kekulé describes how, after months of unsuccessful efforts, the structure
of the benzene molecule came to him in a dream (Benfey, 1958, p. 22).
5.2. How theories are falsified
To better understand the issue, consider the anomalous tracks of the
Pioneer 10 and 11 spacecraft. Launched in the 1970s, these were the first
man-made objects to cross from the solar system into interstellar space. But
over a period of many years it became obvious that both spacecraft are
travelling slightly slower than is predicted by general relativity. There seem
to be four classes of explanation: (1) the measurements are in error; (2)
some influence internal to the spacecraft, such as a gas leak, is causing a
slight additional force; (3) some influence external to the spacecraft, such as
an unsuspected planet, is causing a slight gravitational pull; (4) the theory
of relativity is wrong. Several conferences have been devoted to the anomaly,
and each explanation has been worked on; extremely detailed and painstaking
analysis has now explained part, but not all, of the anomaly.[13]
However much work is done, as long as the anomaly exists all four explanations remain possible. If someone invents a theory which competes with
general relativity, which is consistent with all of the observations where general relativity already works, and which correctly explains the track of the
Pioneers, then general relativity will have been falsified. But, unless that
happens, the fact that two spacecraft are travelling at slightly the wrong
speed tells us nothing about whether general relativity is wrong, or what
a better theory might look like.[14] If, instead, we eventually find a concentration of matter in the outer solar system that explains the anomaly, then
general relativity will not have been falsified by the Pioneer observations. At
present, we know of no such concentration, but we cannot tell whether that
is because it does not exist or because we have failed to observe one that
does exist.
This illustration clarifies that a single theory cannot be falsified by any
observation; but an observation can decisively select between two or more
theories. Observations that cannot be explained do not falsify any theory
although they may be inconsistent with one or more theories, and they do
not advance our understanding of the world until they are explained (quantitatively, if at all possible). That is, decisive observations falsify incorrect
theories: but they do so only if they simultaneously support a contending
theory.[15]
[13] The Wikipedia article “Pioneer anomaly” provides a summary of the current position,
with scholarly references.
[14] In the same vein, the advance of the perihelion of Mercury became one of the crucial
tests between general relativity and Newton’s theory of gravity. But before Einstein invented general relativity, this slight movement was just an observational anomaly, having
no clear significance. Relativity was not a slight tweaking of Newtonian gravitation, but a
description of an utterly different universe. Despite that, most of its observable predictions
are practically identical to those of the earlier theory; but the precise orbit of Mercury
was one item where the results were different enough to be testable against each other.
[15] Kuhn (1970, pp. 146-147) corrects Popper’s view on this point.
This illustration also shows that theories need not be grand “theories
of everything”, but may be at different scales from macro, through “midrange” (Laughlin, 1995) to micro models of a specific situation. As long as
they are potentially testable descriptions of cause-and-effect relationships in
the objective world, they can contribute to our understanding of how the
world works.
5.3. Qualitative positive research
Many sciences are largely or wholly quantitative, and sciences often become more quantitative as they mature. However, there are many respectable
qualitative sciences (such as botany, geology and zoology); and some powerful
theories (such as Darwin’s theory of evolution) are purely qualitative.[16] It is
a common mistake in the social sciences to assume that positive and quantitative research are the same, leading to considerable confusion in considering
research which is positive but qualitative.
There are two distinct purposes for undertaking qualitative positive research in accounting. The first is to gather data to assist in developing a
preliminary understanding of some phenomenon, before enough is known to
justify attempts at quantitative measurement. This is step (a) in Popper’s
procedure. The sort of questions that require this preliminary qualitative investigation are suggested by Humphrey (2008) in the context of audit pricing
research:
for all the regression-based studies, how much do we really
know as to how auditors, themselves, price an audit? How do
they determine a tender bid and what distinguishes the behaviour
and presentation strategies of audit partners or firms with higher
success rates in winning audit tenders? How many audit firms
price their audits using extensive/detailed regression equations?
(p. 180)
Unless researchers begin by talking to auditors (and, no doubt, to clients)
about the factors that affect audit prices, it is hard to see how realistic
models of the process can be constructed or relevant variables can be properly
measured for quantitative research. And without realistic models, it should
be expected that prematurely obtained quantitative results will be heavily
contaminated with Type I errors caused by model mis-specification. I will
take this point up later.
16The sciences mentioned, and the theory of evolution itself, have become more quantitative in recent decades, but each of them had a long qualitative history and each still
contains qualitative elements.
The other purpose of qualitative positive research is to test theories, which
is step (c) in Popper’s procedure. At present, only economics-based explanations can be worked out so as to give quantitative predictions, and other
theories must be examined qualitatively. However, qualitative researchers
seldom attempt this: instead, they accept a theoretical framework as given
and use it merely to describe and structure their results. When the framework purports to be descriptive of the social world, this approach prevents
us from testing its validity. If a researcher advances the proposition that
"organisations work like this", whether "this" is described in economic, Foucauldian, Marxist, feminist, or other terms, this is a positivist claim which
falls within the scope of the scientific project described earlier. The truth of
the claim needs to be assessed using the best available evidence, which on
our present state of knowledge requires an application of Popper’s procedure.
A typical auditing example is the examination by Kosmala MacLullich
(2003) of audit firms’ adoption of the strategic-systems approach. Whether
this adoption actually employed some version of Foucault’s "technologies of
the self" to obtain compliance by front-line auditors is an empirical question.
To test it, one would need to establish what we would expect to find if
Foucault’s description is applicable to this situation, compare it with what
we would expect under one or more different theories, and then examine the
interview data for its fit with the competing predictions. Instead, the author
followed the accepted practice of coding her interview data using Foucault’s
theory as a guide, to produce "stories of a disciplining process on-the-job"
(p. 798). This approach makes it impossible to assess whether Foucault’s
theory actually does apply well to this situation. What remains is a set of
interview data, possibly biased by the theoretical lens through which it was
passed during analysis.
Critical researchers seem to believe that their theories are simply lenses
which give a clearer view of the world, and that testing them is neither necessary nor feasible. But in fact, most commonly used theoretical frameworks
are assertions that the world operates in particular ways; such a claim immediately brings the research within the scope of the scientific research project.
Whether such a claim is true (or more carefully, the circumstances under
which it is true) is an important and open question, particularly for those
who seek to use it as an exhibit in a drive to improve society.17 By and large,
17Sokal (1996) puts the point clearly, from the point of view of a committed political
progressive: "epistemological agnosticism simply won’t suffice, at least not for people who
aspire to make social change. Deny that non-context-dependent assertions can be true,
and you don’t just throw out quantum mechanics and molecular biology: you also throw
out Nazi gas chambers, the American enslavement of Africans, and the fact that today in
qualitative accounting research currently fails to address this key question.
Quantitative accounting research does, at least apparently, test whether
its theories work. For someone familiar with mainstream scientific literature,
the consistency with which these theories succeed when tested is nothing
short of astounding. The remainder of this paper will examine this remarkable record of success a little more closely.
5.4. The logic and weaknesses of statistical hypothesis testing
Consider the logic of hypothesis testing as it is commonly practiced in
quantitative positive accounting research. Superficially, the logic looks like
an application of Popper’s procedure: some null hypothesis is proposed, and
under that null hypothesis (and auxiliary statistical assumptions) the distribution of some test statistic is computed. Typically, if the null hypothesis
and auxiliary assumptions are true, then values greater than some cut-off will
occur with some small probability p or less (that is, they will occur in only a
fraction p of random samples). The test statistic is then measured; if it falls
into the critical region then either the researcher’s sample is an improbable
one (with a probability p or less) or the null hypothesis is false. The null
hypothesis is thus rejected, since if it is true the value of the test statistic is
unlikely to have been observed.
But this is clearly a very watered-down version of Popper’s logic. Most
obviously, the measurement of the test statistic is only probably, not certainly, incompatible with the null hypothesis. Even if every other condition
is met, one true null hypothesis in 20 can be expected to be wrongly rejected at the 5% level.
Most papers contain tables of statistical results, each with its significance
test, and so false positives must be frequent in the literature. They could
be virtually eliminated if we shifted our conventional significance threshold
to, say, p = 0.00001. But that would re-attribute many true positives to
chance; effectively, before accepting a statistical finding we would demand
much stronger evidence, which might not always be obtainable. (I shall suggest later that this is of no great consequence, because statistical significance is
not what we should be paying most attention to anyway.)
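The arithmetic behind this trade-off can be sketched in a few lines. The following is a minimal illustration, not part of the original argument: it assumes that the tests in a paper are independent and that every null hypothesis is true, and the function names and the figure of 20 tests per paper are hypothetical.

```python
import random


def familywise_false_positive_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection among n_tests
    independent tests when every null hypothesis is true (assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulated_rejection_rate(n_trials: int, z_crit: float = 1.96, seed: int = 0) -> float:
    """Fraction of standard-normal test statistics that fall in the
    two-sided 5% critical region when the null hypothesis is true."""
    rng = random.Random(seed)
    rejections = sum(1 for _ in range(n_trials) if abs(rng.gauss(0.0, 1.0)) > z_crit)
    return rejections / n_trials


if __name__ == "__main__":
    # A hypothetical paper reporting 20 independent tests of true nulls.
    print(familywise_false_positive_rate(0.05, 20))      # roughly 0.64
    print(familywise_false_positive_rate(0.00001, 20))   # roughly 0.0002
    print(simulated_rejection_rate(200_000))             # close to 0.05
```

Under these assumptions, a paper with 20 tests at the 5% level has roughly a 64% chance of containing at least one false positive, while the stricter threshold makes such an error very rare.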
There is also a technical difficulty with the test: the distribution of the
test statistic under the null hypothesis depends crucially on the auxiliary
assumptions. For OLS regression, for example, it is required that the expected value of the dependent variable in the population is a linear additive
function of the independent variables, not (for example) multiplicative, or
New York it is raining. [Historian Eric] Hobsbawm is right: facts do matter, and some
facts matter a great deal."
additive in the logarithm of one variable and the product of a second and
the square root of a third. This is almost never examined: indeed, when
the common notation $y = f(x_1, x_2, \ldots)$ or the equivalent verbal formulation
"we expect y to be positively related to $x_1$ and $x_2$" is followed in either case
by an OLS regression without further discussion, or when variables are log-transformed "to reduce heteroscedasticity", the reader may be confident that
the researcher does not even understand the issue.18 More subtly, research
on the distribution of financial ratios suggests that the population expected
value may be infinite (e.g. Ashton et al., 2004), in which case OLS regression
using a deflated dependent variable is never valid.19
Problems with other technical assumptions (such as independence and
homoscedasticity of residuals) are better recognized by researchers and may
be tested for. However, although this offers some protection, the tests follow
the same logic as for hypothesis testing in general: residuals are accepted
as being homoscedastic unless the test shows that this is very unlikely to be
true. A better approach would be to determine how much heteroscedasticity
would invalidate the main test findings, and then to estimate a confidence interval
for the actual amount, to see whether it lies below that level. If the confidence interval
includes zero, then the usual test for heteroscedasticity would accept the
null hypothesis that the amount might be zero; but this is unrelated to
whether the upper limit is high enough to call the main test into question.20
Conversely, it may be that heteroscedasticity can be shown by a significance
test to be present, but its magnitude is too small to matter.
But the hypothesis testing logic fails on other grounds, even if the technical issues can be solved. Fundamentally, only one alternative hypothesis
is considered, and it is not specified sufficiently carefully. If the alternative
is weak enough (\a positive association"), it may be consistent with several
theories,21 each of which may predict a different strength of association (and,
18An interesting feature of the paper on audit fees by Simunic (1980) is that he explicitly
considered alternative functional forms for the effect of firm size. Such care is less common
in the literature than it should be.
19The existence of infinite moments in the deflated variable is suggested by the occurrence of outliers in the sample. In practice, outliers are often deleted, possibly without
comment, and in any case are treated as a statistical problem rather than as a clue that
the research question has been wrongly formulated and should be changed.
20There is in general no off-the-shelf econometric procedure that does this. Usually bootstrapping and/or simulation are required to understand the particular situation. However,
the effectiveness of simulation is limited unless the theoretical model is fully specified and
accurately reflected in the research design.
21In many accounting papers the author does in fact suggest more than one competing
explanation.
perhaps, a different mathematical form of the relationship). It is, no doubt,
very rare that two variables are perfectly unrelated, and in every other case
there is a 50:50 chance that the sign of the association will be as predicted.
With a large enough sample, this sign can be established with some confidence. This does not, however, establish the truth of any particular theory,
even if the sign happens to be consistent with that theory.
A comparison with the Pioneer anomaly shows the problem. What is
required is that theories be developed in sufficient detail that each makes a
specific mathematical prediction, involving both a functional form and specific parameter values.22 The sample data should then be used to choose
between the predictions of the different theories. If the sample data is inconsistent with every proposed theory, then nothing has yet been learned and
none of the theories has yet been falsified; more research is needed to understand and resolve the anomaly. This is not, however, an argument against
publishing an otherwise sound paper that reveals the anomaly; unless it is
published, further work is hardly likely to occur.
It is commonly understood, in fact, that the logic of hypothesis testing is
intended only to falsify the null hypothesis that "the observed result is due to
chance and is specific to this sample." It is, of course, important to establish
that our results are not due to chance. But, having established this, we are
no closer to learning what the results actually are due to; that is, hypothesis
testing tells us nothing about the truth of the particular alternative except
that, at most, it gives the correct sign.23
Given that accounting data is subject to noise and to measurement error,
there will always be a role for statistics in positive accounting research. But
statistics are being wrongly used: the usual goal should be not hypothesis
testing, but estimation.24
5.5. The effect on the positive research programme
The preceding analysis shows that hypothesis testing, as commonly practiced in positive accounting research, can provide only very weak evidence
in support of a particular alternative hypothesis. Thus we should strongly
suspect that much of what is claimed to have been established is not in fact
true, although of course we have no way of knowing which claims are true
22The parameter values may be expressed in terms of values measured elsewhere, but
they must be known independently of the data from the current sample.
23For a two-sided alternative hypothesis, we do not get even this much information.
24In a different form and context, the foregoing critique of hypothesis testing in research
was substantially argued by Sir Ronald Fisher (Fisher, 1955).
and which are not.25
In a Kuhnian world in which the function of research is to provide researchers with interesting puzzles, moving from paradigm to paradigm as the
old paradigms become unfruitful, this would not much matter. Researchers
publish their work and advance their careers. Although some advice is given
to standard-setters and regulators based on research findings, this advice is
less direct than in earlier decades when normative research was the usual
style. Presumably regulators tend anyway to follow advice which supports
their inclinations and ignore the rest, so little harm will be done if research-based advice is incorrect.
But I have argued that positive accounting research actually contributes
to a wider scientific endeavour, with the particular aim of understanding human behaviour and its causes in an unusual setting of complex organizations
where a variety of decisions are influenced by specialized information and
control systems. If that is its main function, then quality problems caused
by weak inferential methods become a serious concern. Accordingly, in the
next section I propose some criteria for a program of positive accounting research that could make a reliable scientific contribution over time; and I test
two good papers specifically against those criteria. This step is essential,
because it would be easy but empty to lament that accounting should be
more like physics. I hope to illustrate that positive accounting should be like
itself, but more effective, by identifying specific weaknesses in the approach
that we accept as standard in the best quantitative papers in our discipline,
and suggesting how they could be improved.
6. What is required for a successful positive research program?
6.1. Vulnerable models that are stringently tested
If we are to gain the most from a Popperian approach to positive research,
the first requirement is that we demand more of our theoretical models. They
must be designed to be taken seriously; they must actually be taken seriously;
and we must expect them to fail and learn to improve them when they do.
This requires, first, that they should be highly specified as to mathematical
form so that they are as vulnerable to disproof as possible. In addition, they
should be tested as accurately as possible (rather than, for example, in a
simplified or linearized form). To make this possible, there must be careful
25A critique of Positive Accounting Theory along roughly these lines was made by Christenson (1983).
measurement of the variables that enter the models.26
A recent article (Choi et al., 2009) illustrates the point.27 The authors
develop a model of the audit fee as a function of the strength of legal regime
r and audit complexity k, and present it as their equation (5):
$$f(r, k) = k - \frac{k^2}{4(1 - p) r L} \qquad (1)$$
where p is the probability that the manager has a particular level of competence and L is the payment that the auditor would have to make in the
event of an audit failure. This model is of a very specific functional form, as
shown in Figure 1.
This form immediately suggests several interesting tests of the model:
since audit fees must be positive, only certain values of k are feasible and
these depend on the legal regime; for high-complexity engagements, audit
fees tend to fall with increasing complexity; beyond a certain point, stronger
legal regimes have little effect on audit fees; and, of course, the full function
could be fitted to actual data to test whether the fit is quantitatively good.
Some of these conclusions are at least superficially implausible, but they are
all testable in principle.
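These implications can be checked numerically. The sketch below simply evaluates the fee function $f(r, k) = k - k^2 / (4(1 - p) r L)$; the parameter values (p = 0.5, L = 1) and the function names are hypothetical choices made for illustration, not estimates from Choi et al. (2009).

```python
def audit_fee(r: float, k: float, p: float, L: float) -> float:
    """Audit fee f(r, k) = k - k^2 / (4 (1 - p) r L)."""
    return k - k ** 2 / (4.0 * (1.0 - p) * r * L)


def max_feasible_complexity(r: float, p: float, L: float) -> float:
    """The fee is positive only for 0 < k < 4 (1 - p) r L."""
    return 4.0 * (1.0 - p) * r * L


def fee_maximising_complexity(r: float, p: float, L: float) -> float:
    """df/dk = 1 - k / (2 (1 - p) r L) vanishes at k = 2 (1 - p) r L;
    beyond this point the fee falls as complexity rises."""
    return 2.0 * (1.0 - p) * r * L


if __name__ == "__main__":
    p, L = 0.5, 1.0  # hypothetical values, for illustration only
    for r in (1.0, 10.0, 100.0, 1000.0):
        print(r, max_feasible_complexity(r, p, L), audit_fee(r, 1.0, p, L))
```

With these illustrative values the feasible range of k widens in proportion to r, the fee peaks halfway through that range, and for a fixed k the fee approaches k as r grows, so that further strengthening of the legal regime changes the fee very little.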
In fact, the authors do none of this. They immediately discard the richness of the model in favour of purely directional hypotheses: their H1 and H3 are
essentially statements that f(r, k) increases if r increases; and their H2 a
statement that if r does not change then f(r, k) increases if k increases.28
And for testing these hypotheses they make assumptions about the mathematical forms: their regressions use various proxies for complexity, use the
logarithm of the Wingate (1997) litigation index as a proxy for the legal
regime, and assume that the logarithm of the audit fee is a linear additive
function of their proxies. That is, the theoretical model has been turned into
26The discussion throughout this section addresses quantitative research explicitly. However, with appropriate modifications, many of the same issues apply to qualitative positive
research.
27By choosing this example, I am not intending to denigrate the work of these particular
authors; in fact, I have a high opinion of this paper. Similar comments could be made
about a number of articles from virtually any issue of a top accounting journal. The
problems raised here reflect the low standards of inference which are accepted in even the
best articles in our field.
28Figure 1 makes it clear that the second statement is true only for low-complexity
audits. To arrive at their H2, the authors take an incorrect partial derivative, assuming
that the optimal audit quality is held constant as the other variables change. This technical
error was not fundamental and is not of wider interest; my focus is on the logic of the
testing, which is typical of many other studies.
Figure 1: How the audit fee depends on the legal regime r and complexity k, according to
equation (5) of Choi et al. (2009).
something like
$$\log f(r, k) = g(k') + h(r') \qquad (2)$$
where $k'$ and $r'$ are proxies for k and r and h() may be the logarithm function (in which case, the dependence of f on r will be as some power of the
Wingate index). None of these choices is supported by any discussion as to
whether it is mathematically or economically appropriate; but in any case,
the model that is actually tested has been almost completely decoupled from
the theoretical model introduced in the paper. Equation (2) is the sum of
(proxies for) a function of k and a function of r, which is not the structure
of equation (1) and is quite different from the behaviour shown in Figure 1,
regardless of what functions g() and h() are actually used.
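One way to see the structural mismatch is a cross-difference check: for any function of the additive form g(k) + h(r), the quantity F(r2, k2) - F(r1, k2) - F(r2, k1) + F(r1, k1) is exactly zero, whatever g and h are. The sketch below applies this check to the logarithm of the fee function $f(r, k) = k - k^2 / (4(1 - p) r L)$. It assumes, for illustration only, that the proxies are the true k and r themselves and uses hypothetical values p = 0.5 and L = 1.

```python
import math


def log_fee(r: float, k: float, p: float = 0.5, L: float = 1.0) -> float:
    """Logarithm of f(r, k) = k - k^2 / (4 (1 - p) r L); p and L are hypothetical."""
    return math.log(k - k ** 2 / (4.0 * (1.0 - p) * r * L))


def additive_example(r: float, k: float) -> float:
    """An arbitrary additively separable function g(k) + h(r)."""
    return math.log(k) + 0.3 * math.log(r)


def cross_difference(F, r_lo: float, r_hi: float, k_lo: float, k_hi: float) -> float:
    """Zero for every function of the form g(k) + h(r)."""
    return F(r_hi, k_hi) - F(r_lo, k_hi) - F(r_hi, k_lo) + F(r_lo, k_lo)


if __name__ == "__main__":
    print(cross_difference(additive_example, 1.0, 2.0, 0.5, 1.0))  # zero up to rounding
    print(cross_difference(log_fee, 1.0, 2.0, 0.5, 1.0))           # clearly non-zero
```

The non-zero cross-difference for the log fee means that no choice of g() and h() can reproduce the theoretical model in additive form, at least under the stated assumptions.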
The upshot is that the coefficients of the actual regressions provide no
unambiguous information about the effects of differing legal systems on audit
fees. The results are uninterpretable, because the regressions do not match
a theory that would allow us to interpret them.
Contrast this with the Pioneer anomaly: each theory will have to be
worked through in the most precise detail so that its predictions can be
unambiguously tested against the measurements. If the theory proposes a
new planet, then the calculations must reflect the exact mass, position and
orbit of the planet (and the existence of that planet must be supported by
other evidence than the Pioneer anomaly, such as direct observation).
Had the authors tested their model against their data, it would undoubtedly have failed; and we would have learned something from the failure,
because we could have seen how the real world differs from the model and
could have gone back to try to understand what a more realistic model might
look like. But as matters stand the only thing we have learned for sure is
that some regression coefficients are non-zero.
6.2. Analytical modelling
In several sciences, the development and testing of theoretical models have
become distinct specialities, with theoretical and experimental scientists in
the same discipline receiving quite different training. In accounting, it is
normally expected that the author will both develop a theoretical model and
then collect the data to test it. However, there is a partial exception, since
the development of "analytical" models is indeed accepted as a specialised activity, and such models are not usually presented as part of empirical papers.
Those who specialise in developing such models are certainly "theorists" in
a sense that would be recognised in physics, biology, and other sciences.
But this research has tended to remain decoupled from empirical testing. Researchers building their models around game theory generally limit
themselves to tractable models that can be shown to have equilibrium solutions, with relatively little concern for the institutional realism of the assumptions.29 Empirical researchers reading this literature are rarely rewarded
with a glimpse of a rigorously testable prediction. Other models, such as the
valuation model of Feltham and Ohlson (1995), do offer apparent pathways
towards testing. However, empirical researchers tend to wrestle with the
parameters of the models, which theorists have not developed in enough detail. For example, Dahmash et al. (2009) note that Feltham and Ohlson
"permit ‘other information’ without specifying what this ‘other information’
might include", which causes difficulties in operationalising the model.
However, there is a strong base of expertise among analytical researchers
which could in principle be turned to developing theories to the point where
they could be tested. For this to happen, the preoccupation in the field would
need to shift from tractability to verisimilitude.
6.3. A focus on measurement rather than testing
The raw material of quantitative research is measurement of concepts
in ways that are precise (so far as the underlying data permits) and reproducible. Very little attention is paid to this in accounting research. Two
examples will illustrate the shortcomings and how matters should be improved.
The previously mentioned paper by Choi et al. (2009) used two concepts
that are likely to be of value in a number of studies: audit complexity and
strength of the legal regime.30 These concepts will be of interest in so far
as they relate to each other and to other concepts: that is, in so far as they
will be found in rigorously developed and tested theories. But before such
testing can occur, the concepts need to be defined and measured in a way
that is known to be valid and reliable.
Audit complexity has been used in the audit fee literature for many years;
it represents the amount of work that is required to complete a particular
audit engagement at a particular level of quality. Within the model of Choi
et al., the auditor’s effort cost is kq where k represents the complexity of the
audit and q the quality.
What drives this work requirement? Most obviously, the size of the auditee; and the most consistent finding in the audit-fee literature is that the
29In the sciences it is taken for granted that many models cannot be analytically solved
in realistic settings (and that equilibrium solutions are not always interesting). Thus
theorists have extensive recourse to numerical methods, simulation, and approximation
techniques.
30There are, of course, other concepts, to which the same considerations apply.
Figure 2: Coefficients, with 95% confidence intervals, for the regression of log(audit fee)
on log(total assets), from studies by Gonthier-Besacier and Schatt (2007); Kealey et al.
(2007); Simunic (1980); Choi et al. (2009).
fee increases as some power of size. Following Simunic (1980), this is usually
represented by modelling the logarithm of the fee as a linear function of the
logarithm of total assets. Figure 2 shows estimates of the coefficients, with
95% confidence intervals (based on reported t values) for Choi et al. and for
three other typical papers. As a visual guide, a dashed line is shown at a
coefficient of 0.41, which is consistent with many of the confidence intervals.
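A confidence interval can be recovered from a reported coefficient and t value as sketched below. This is a minimal illustration that assumes a large-sample normal approximation (a critical value of 1.96); the function name and the example numbers are hypothetical and are not taken from any of the cited studies.

```python
def ci_from_t(coef: float, t_value: float, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval from a coefficient and its t value.

    The standard error is recovered as |coef / t|; the normal critical
    value 1.96 is an assumption appropriate only for large samples.
    """
    std_err = abs(coef / t_value)
    return coef - z_crit * std_err, coef + z_crit * std_err


if __name__ == "__main__":
    # Hypothetical inputs: a coefficient of 0.50 reported with t = 25.
    low, high = ci_from_t(0.50, 25.0)
    print(round(low, 4), round(high, 4))
```

Two studies whose intervals, computed in this way, do not overlap are reporting measurements that disagree by more than sampling error would suggest.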
There are wide variations in the estimates, beyond what can plausibly be
attributed to chance, both within this study and between it and the other
studies. In the Choi et al. study, the coefficient is about 0.28 for firms with
weak legal regimes in the home country (Table 7(2)) and for firms that are
cross-listed in Australia, Hong Kong, New Zealand and the UK and non-cross-listed firms in the respective home countries (Table 6(2a), (4a), (2b)).
In contrast, Gonthier-Besacier and Schatt (2007) find a coefficient of 0.68 for
French firms.
These coefficients are all highly significant, that is, almost certainly not
zero; but this is hardly of interest. Of much greater interest is why they
differ, since that difference represents a very large alteration in how audit
fees vary with size. If firm sizes vary from $100 million to $100 billion, then
the audit fee for the largest firm would be 100 times that for the smallest if
fees scale with a power of 0.68, but only 7 times as large if the scale factor is
0.28 (keeping all else equal, of course). The difference cannot be attributed
to different prices in different countries, because that would affect large and
small firms in the same proportion. There is no doubt that auditee size
contributes greatly to audit complexity, but there also seems no doubt that
we do not have a valid measure of that contribution.
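The size of this discrepancy follows directly from the power law. The sketch below assumes that fees are proportional to size raised to the estimated exponent, with everything else held equal; the function name is illustrative.

```python
def fee_ratio(size_small: float, size_large: float, exponent: float) -> float:
    """Ratio of audit fees implied by fee proportional to size ** exponent."""
    return (size_large / size_small) ** exponent


if __name__ == "__main__":
    small, large = 100e6, 100e9  # $100 million and $100 billion
    print(fee_ratio(small, large, 0.68))  # roughly 110
    print(fee_ratio(small, large, 0.28))  # roughly 7
```

A thousand-fold range in firm size therefore implies a fee range of about two orders of magnitude under one estimate and less than one order of magnitude under the other.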
For any given size, of course, audit complexity also depends on other
characteristics of the auditee. This is normally captured by including various
ratios and other variables as controls in the regression for the logarithm of
audit fees. Choi et al. use a widely accepted set of firm-level variables: the
sum of receivables plus inventories divided by total assets; total liabilities
divided by total assets; the logarithm of the number of business segments
plus one, and likewise for geographic segments; and dummies for reporting
a loss and for having recent capital issues. Although each of these can be
justified as increasing audit complexity, the literature is silent as to whether
these are actually independent multipliers of complexity or whether (and
how) they interact, and as to how they can appropriately be combined into
a specific mathematical function. That is, we do not know if the amount of
work required to complete an audit to a given quality is actually given by
the formula
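$$k = k_1 \, \mathit{TA}^{\alpha} \exp[\beta_1 \mathit{INVREC} + \beta_2 \mathit{LEV}] \, (1 + \mathit{BSEG})^{\gamma_1} (1 + \mathit{GSEG})^{\gamma_2} (1 + \delta_1 \mathit{LOSS})(1 + \delta_2 \mathit{ISSUE}) \qquad (3)$$
where $k_1, \alpha, \beta_1, \beta_2, \gamma_1, \gamma_2, \delta_1, \delta_2$ are constants?31 It is certainly possible, but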
it is not altogether convincing, and it has not been demonstrated that this is
the correct formula.32 Another recent paper (Antle et al., 2006) uses other
control variables, but where their set overlaps that of Choi et al. it uses a
different functional form:33
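$$k = k_1 \, \mathit{TA}^{\alpha} \exp[\beta_1 \mathit{LEV}] \, (1 + \mathit{REC})^{\gamma_1} (1 + \mathit{INV})^{\gamma_2} (1 + \delta_1 \mathit{LOSS}) \qquad (4)$$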
It is notable (but not apparently noted by the authors) that the coefficient
α in this model is estimated to be effectively zero, suggesting that audit fees
31The coefficients might depend on quality. For example, in the model of Choi et al.,
the coefficient k1 would be multiplied by q.
32Showing that the relevant regression coefficients are significantly different from zero
is not evidence that the functional form is correct.
33Here REC and INV are receivables and inventory in millions of dollars, not scaled
by total assets.
are unrelated to firm size. Since this is obviously nonsense, one can have no
confidence in the other estimated coefficients from the same regression.
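The link between the usual log-linear regression and the multiplicative form of equation (3) can be made explicit by taking logarithms. The following is a sketch of that step, assuming that LOSS and ISSUE are 0/1 dummies so that $\log(1 + \delta D) = D \log(1 + \delta)$:

```latex
\begin{aligned}
\log k ={}& \log k_1 + \alpha \log \mathit{TA}
          + \beta_1 \mathit{INVREC} + \beta_2 \mathit{LEV} \\
        &+ \gamma_1 \log(1 + \mathit{BSEG}) + \gamma_2 \log(1 + \mathit{GSEG}) \\
        &+ \mathit{LOSS}\,\log(1 + \delta_1) + \mathit{ISSUE}\,\log(1 + \delta_2)
\end{aligned}
```

On this reading, the regression coefficients on the two dummies estimate $\log(1 + \delta_1)$ and $\log(1 + \delta_2)$ rather than $\delta_1$ and $\delta_2$ themselves, and every control enters as an independent multiplier of complexity with no interaction terms.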
The concept of audit complexity is an important one in this and other
studies, and it can be related to audit fees in a way that is likely to be useful.
But we do not know how to measure complexity in a way that gives a valid,
reliable, and reproducible value. If our measurements are not adequate, we
can have no confidence in the findings of studies based on them.34
The other key concept in Choi et al. is the strength of a country’s legal
regime, which they define as the probability that a court will find an auditor
liable given that an audit failure occurred. They measure this using an
index introduced by Wingate (1997) but actually developed by an insurance
underwriter for a big audit firm. The original scale was from 1 to 10, but
Wingate added a score of 15 for the US (which was not in the original index)
based on an off-the-cuff assessment by a partner of the audit firm concerned
that the US score would be "at least 15".
Does the Wingate score validly measure the concept that Choi et al. require? The description of the factors considered in its construction suggests
that it might do so. But examination of the score values themselves quickly
raises doubts. Although scores are specified to two decimal places, only eight
different scores occur, and Figure 3 shows that they are almost perfectly represented by the formula $1 + x/2 + x^2/9$, where x takes the values 0, 1, 2, ..., 7.
(The US corresponds nearly to x = 9.) It seems, then, that the underwriter
began by dividing countries into eight groups, and then created a score by
drawing a simple graph to convert what is really an ordinal variable into one
that purports to have an interval or even a ratio scale. This can leave us
with little confidence that the Wingate score (or its logarithm, which is what
Choi et al. actually use) has a truly linear relationship to the concept r that
it is meant to measure. It may be that it does, or that some particular (nonlinear) transformation of it does. But until that is established, we cannot
trust it as a measurement of the concept to be used in a serious test of any
theory.
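The values generated by this quadratic are easy to tabulate. The sketch below evaluates the formula $1 + x/2 + x^2/9$ only; the numbers it prints are formula values, not the published index scores, and the function name is illustrative.

```python
def quadratic_score(x: int) -> float:
    """Value of 1 + x/2 + x^2/9 for group number x."""
    return 1 + x / 2 + x ** 2 / 9


if __name__ == "__main__":
    # Eight groups, x = 0, 1, ..., 7, rounded to two decimal places.
    print([round(quadratic_score(x), 2) for x in range(8)])
    # The value at x = 9, for comparison with the US score of 15.
    print(quadratic_score(9))
```

The eight values run from 1.0 to about 9.94, matching a nominal 1 to 10 scale, and the value at x = 9 is 14.5, close to the score of 15 assigned to the US.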
These rather specific comments on one research paper highlight problems
which are common to most areas of positive accounting research. Generally,
little attention is paid to validating the proxies or the specific formulae that
are used for measurement of important concepts. Consequently, we cannot
with any confidence plug measures of these concepts into tests of theories
34Meta-analysis of the audit-fee literature shows that many results are inconsistent from
study to study (Hay et al., 2006). If the concepts are not properly measured, this is hardly
surprising.
Figure 3: The Wingate score.
that rely on them.
As a brief example of a different measurement issue, consider the paper
by Schulz and Cheng (2002), examining the tendency of managers to escalate
their commitment of resources to projects which they personally approved,
despite feedback suggesting that the project is unsuccessful. The authors
point out that, to properly test the theory, both the elements of personal
responsibility and unequivocal negative feedback must be present, and their
contribution is to correct previous studies by running an experiment which
properly includes both features. They also hypothesized that information
asymmetry between the managers and their superiors would moderate the
behaviour, but found no significant effect.
The study used a standard 2 × 2 experimental design in which personal
responsibility and information asymmetry were each either present or absent. It is not clear that either of these is truly a binary variable (when a
Board approves a decision, each director may feel partial responsibility, for
example), but it is clear that the strength of negative feedback takes more
than the two values "ambiguous" and "unequivocal". If feedback in some
previous studies was ambiguously weak, how does the tendency to escalate
commitment35 vary with feedback strength? Is there a threshold above which
negative feedback overwhelms the tendency to continue justifying a personal
decision, so that the escalation of commitment paradox suddenly disappears,
35It might also be asked whether information asymmetry would actually have a measurable effect at other feedback strengths.
or does progressively strengthening the feedback progressively weaken the
behaviour; and, if the latter, is the effect proportional to the strength of the
feedback or does it increase non-linearly? Is there another threshold below
which feedback, while still negative, becomes so weak ("ambiguous") that the
effect of personal responsibility disappears? These questions may have some
limited application to design of feedback systems in business, but are of great
importance in the research program of trying to understand (non-rational)
human behaviour. To answer them will require developing a measure of negative feedback. In the specific setting of Schulz and Cheng, this is easily
done: it is the actual rate of return on an earlier investment project.
These examples illustrate the potential role of measurement in extending
our knowledge. The model presented by Choi et al. (2009) already has a specific mathematical form, and that form could be tested; if errors are found,
they may provide guidance for developing better models. The escalation-of-commitment theory is at present purely qualitative: an effect occurs in
certain conditions. However, now that the existence of the effect has been
confirmed, the opportunity arises for measurement of how its strength is affected by the magnitude of its causes. That knowledge can be gained without
at present having any theory; but it would guide the future development of
relevant theory.
Measurement requires some initial theory to identify what concepts are
likely to be worth measuring; attention to an appropriate operational definition of the concept, including consideration of the mathematical form;
consistent use of the same definition in different studies; a focus on reporting
confidence intervals to indicate the precision of any particular measurement,
so that inconsistent measurements can be identified and investigated; and
replication of measurements using different samples and settings, again to
bring inconsistencies into view (since they are likely to be symptoms of an
unreliable measurement of the concept). Only the first of these requirements
can really be said to be characteristic of positive accounting research as now
practiced.
A final point about measurement: in certain sciences, a very common
type of journal article is the short paper reporting a measurement or observation (in chemistry, the properties of a newly synthesized chemical; in
biology, the description of a new specimen; with parallels in archaeology, astronomy, geology, and others). These articles are frankly atheoretical: their
purpose is to report data.³⁶ One of the most striking applications of the use
of such data was Mendeleev’s invention of the periodic table of the chemical
elements: from thousands of recorded observations of the properties of the
known elements, he noticed that if the elements are ordered by increasing
atomic weight then elements with similar physical and chemical properties
appear periodically in the series. By organizing the list in a tabular form, he
discovered that the table appeared to have several gaps. He predicted that
elements would be found to fill those gaps, and also predicted what their
properties would be, from the properties of the elements surrounding the
gaps. Over the next few decades, the gaps were indeed filled with elements
whose properties were substantially as Mendeleev had predicted. Only many
decades later did any understanding develop of why those regularities occur.³⁷ Without the original measurements, however, there would have been
no possibility of developing the insights given by the periodic table.
³⁶ If the data set is extensive, it is now likely to be archived on-line rather than reproduced
in the body of the paper.
³⁷ The cause is the limited and repeating sets of possible orbits for the outer electrons
of an atom, governed by the laws of quantum mechanics.
In accounting, there appears to be a strong publication bias against measurement except when attached to a theory. There is consequently an acute
shortage of raw material for an accounting Mendeleev to work with. The
undoubted effect is that the development of good theory is hampered by lack
of data; and, as already noted, the theories that occur in empirical papers
are typically neither very good nor taken very seriously.³⁸
³⁸ Presumably there is a concern that atheoretical measurements may turn out to have
no use. Often, no doubt, this will happen; but measurements that are not made and
not published will certainly have no use. The effect can be mitigated by ensuring that
straightforward papers of this kind are short (2–3 pages often suffices in other disciplines).
6.4. Replication, replication, and more replication
There are two quite distinct motives for replicating previous studies:
(1) To determine whether the original result merely reflects sampling error.
(2) To explore the limits of applicability of previous findings.
The first motive is particularly important in the current approach with
its main focus on hypothesis-testing.
As previously noted, many results in the literature with 5% or even 1%
significance will be false positives even if there are no other statistical issues.
Further, as has long been known (Lovell, 1983), significance levels quickly
become enormously mis-stated if authors search over regression specifications
and publish the specification that works best. If an author has been at
all diligent in trying alternative proxies, functional forms, or econometric
assumptions, then even figures that are reported as having p = 0.001 may
in fact be completely non-significant. There is no way to know which results
are false positives except to replicate the work;³⁹ and the more important the
study, the more important it is that it should be replicated several times.
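The inflation of significance levels under specification search can be illustrated with a small simulation. The sketch below is not taken from Lovell (1983): it assumes the simplest possible case, in which a researcher tries several mutually independent candidate regressors, all of them unrelated to the dependent variable, and reports only the one that works best. The function names, the sample size, the number of trials, and the critical value 2.048 (the two-sided 5% point of Student's t with 28 degrees of freedom) are illustrative assumptions.

```python
import math
import random


def t_stat(x, y):
    """t statistic for the slope in a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def search_false_positive_rate(n_specs, n_obs=30, trials=1000,
                               t_crit=2.048, seed=1):
    """Share of trials in which the best of n_specs pure-noise regressors
    appears significant at the nominal 5% level (assumed setup)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(
            abs(t_stat([rng.gauss(0, 1) for _ in range(n_obs)], y))
            for _ in range(n_specs)
        )
        if best > t_crit:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    nominal = 0.05
    for c in (1, 5, 10):
        analytic = 1 - (1 - nominal) ** c  # independent tests, all nulls true
        simulated = search_false_positive_rate(c)
        print(f"specifications tried: {c:2d}  "
              f"analytic: {analytic:.3f}  simulated: {simulated:.3f}")
```

Under these assumptions the chance of finding at least one nominally significant coefficient is 1 - 0.95^c, which is about 40% when ten specifications are tried, although each reported test is described as a 5% test.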
If this is the motivation, then the correct design is to replicate the study
as exactly as possible, except for using a different sample. The same proxies,
functional forms, and econometric assumptions should be used, so that there
is no possibility of doing data mining over alternative specifications. The
preferred focus is on estimation rather than testing significance, since whether
a statistic is significant depends on the sample size and the residual variance,
neither of which may be controllable in a replication; problems in the original
study are suggested if the original and replicated confidence intervals do not
overlap.
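The comparison of original and replicated estimates can be sketched as follows. The coefficients and standard errors are invented for illustration, the class and function names are hypothetical, and the intervals use a normal approximation (z = 1.96) rather than the exact t distribution.

```python
from dataclasses import dataclass

Z_95 = 1.96  # normal approximation to the 95% critical value (assumption)


@dataclass
class Estimate:
    coef: float  # estimated coefficient
    se: float    # its standard error

    def interval(self, z=Z_95):
        """Approximate 95% confidence interval as (low, high)."""
        return (self.coef - z * self.se, self.coef + z * self.se)

    def significant(self, z=Z_95):
        """True if the interval excludes zero."""
        low, high = self.interval(z)
        return low > 0 or high < 0


def intervals_overlap(a, b):
    """True if the two confidence intervals share at least one point."""
    low_a, high_a = a.interval()
    low_b, high_b = b.interval()
    return low_a <= high_b and low_b <= high_a


# Hypothetical numbers, not taken from any study.
original = Estimate(coef=0.40, se=0.10)           # significant
small_replication = Estimate(coef=0.30, se=0.20)  # noisy, small sample
large_replication = Estimate(coef=0.05, se=0.04)  # precise, large sample

for name, rep in [("small", small_replication), ("large", large_replication)]:
    print(name, "significant:", rep.significant(),
          "overlaps original:", intervals_overlap(original, rep))
```

In this invented example neither replication is significant, yet the small one is entirely consistent with the original estimate while the large one contradicts it. A comparison of significance alone would not distinguish the two cases.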
If the motive is to test the limits of applicability of a finding, then the
approach depends on what is to be examined. For example, does the finding
apply in different countries, in different time periods, to experts as well as
student subjects, to financial as well as manufacturing firms? How strong
must the negative feedback be before the manager’s decision changes, and
does the change occur suddenly or gradually? The purpose is to identify how
far the underlying theory can be stretched before it breaks down, if indeed
it does. Once the limits of applicability of a theory are reached, then further
theoretical work becomes necessary to extend our understanding.⁴⁰
³⁹ Even out-of-sample testing (and variants such as bootstrapping or the Lachenbruch
method) does not give the assurance required. There is no way of telling whether the
author has simply been more diligent in seeking a specification that survives this additional
testing. It is necessary that the original study be published, so that its details are frozen
and no longer subject to tweaking by the researcher, before replication can be attempted.
⁴⁰ This motivation for replication is equally applicable to qualitative positive research.
An apparent counter-example to the weaknesses discussed in this section may be found in the previously mentioned paper by Antle et al. (2006),
who compare different theories about the causal relationships between audit
fees, non-audit fees, and earnings management. To do this, they build a
simultaneous-equations model involving all three variables. They argue that
this "allows us to more fully model the theoretical relations between audit
fees, non-audit fees and abnormal accruals. . . . Still, the benefits of joint
estimation have to be weighed against our lack of understanding of . . . the
proper model specification in the joint estimation" (p. 238). This sounds
as though the authors are preparing to compare the predictions of different
theories against each other using carefully specified models, but they immediately explain "Since we rely on prior literature for variables to include
in our models, we also do not view misspecification as problematic", which
makes it clear that they do not regard the mathematical form of the models
as being a specification issue. In fact, all of the jointly estimated equations
are linear, some variables are log-transformed, and the control for audit complexity uses some variables and functional forms that differ from those used in other
studies. The authors do not consider that these choices warrant discussion
or justification. Further, the theories are not mutually exclusive, so that support for one does not imply rejection of others; in that sense the theories are
not being tested against each other, although they are all being tested in the
same set of equations. The paper presents better econometrics than earlier
papers in the field, but its fundamental logic does not allow it to provide
clear evidence. If all the theories have some element of truth, then it would
be of greater economic interest to understand the relative strengths of the
different effects, and this brings us back to the importance of measurement
in preference to mere hypothesis testing.
7. Why is it like this?
This survey has revealed a wide gap between how positive accounting
research is actually practiced and what would be required for it to make
an effective contribution to the wider intellectual programme outlined in
section 2. If a system does not appear to be optimised for its purpose, there
are two possible responses: we may try to set about modifying the system,
or we may stop to wonder whether we might have mistaken its purpose.
The description of "normal" science by Kuhn (1970) may offer some relevant
insights.
Kuhn does not clearly describe his view of the world that science investigates, but it appears that he is not a realist: "There is, I think, no
theory-independent way to reconstruct phrases like ‘really there’; the notion
of a match between the ontology of a theory and its ‘real’ counterpart in
nature now seems to me illusive in principle" (Kuhn, 1970, p. 206). It follows that, unlike Popper, he does not accept the hypotheses of the scientific
research programme that I set out in section 2, and does not regard the
scientific programme as feasible. Instead, his fundamental view of scientific
change is that of one set of views supplanting another in a community.⁴¹
Thus, he seems to view scientific research as an essentially cultural activity, validated by the group of those who participate. What he calls "normal"
science in a disciplinary area is a set of practices, beliefs, and attitudes that
are well adapted to allowing the members of that group to solve large numbers of related puzzles:
    [O]ne of the things a scientific community acquires with a paradigm
    is a criterion for choosing problems that, while the paradigm is
    taken for granted, can be assumed to have solutions. To a great
    extent these are the only problems that the community will admit
    as scientific or encourage its members to undertake. Other problems . . . are rejected as metaphysical, as the concern of another
    discipline, or sometimes as just too problematic to be worth the
    time. (Kuhn, 1970, p. 37)
⁴¹ See, for example, his discussion of translation and conversion on pp. 204-205.
Solving puzzles is fun, and Kuhn sees science as basically a form of play.
The relevant community defines the paradigm, much as the World Chess Federation defines the rules of chess, and its members engage in puzzle-solving
within that paradigm. Anyone who does not subscribe to the paradigm is
excluded from the scientific community, much as a would-be chess player who
thought the rules should be different would not find anyone willing to play
under his rules. If a paradigm becomes ineffective, a revolution will occur
and lead eventually to the emergence of a new paradigm (although some of
the previous players may not adapt to the new rules and may be left behind).
Most scientists do not recognise this description of what they do, because
Kuhn omits any but the most passing mention of the constraints imposed by
the objective world. Anyone who has tried to develop a model that fits experience, even within the most securely defined paradigm, is acutely conscious
of how difficult this process is. One reason for having increasing confidence
in the objective reality of the world, the first hypothesis of section 2, is that
experience shows the world to be extremely refractory to our attempts to
understand it.
The research practices recommended in section 6 are difficult and demanding. One might expect, for example, that many years and much published research would be spent simply in the process of identifying the best
way to measure a particular concept reliably. Developing rigorous theories
is hard, and most of them will fail when properly tested. I began this essay
with a reminder that the scientific project has been in progress for millennia;
scientists accept the idea that worthwhile progress is slow, and that timeframes are often measured in decades or centuries rather than months. If
accounting is to make a worthwhile contribution to the scientific project, its
apparent rate of progress would certainly slow; but I have argued that the
current progress is largely illusory anyway.
But if one were to construct a social system imitative of science but
evading the constraints imposed by nature, one could create a perfect Kuhnian world in which researchers were free to play at solving puzzles within
a paradigm accepted by the research community and in which acceptable
puzzles could be solved (relatively) rapidly and reliably. In such a social system, it would be high praise of a theory to say that it had succeeded in the
marketplace of ideas (Watts and Zimmerman, 1990), since that would show
that the theory had been extremely successful at generating puzzles that the
community could solve.
What would such an imitation science look like? Obviously, it would have
theories which were not intended to be tested too carefully against evidence.
Some theories might be assessed for their mathematical elegance, without
much concern for their realism. Empirical tests would focus on the least
demanding standards of evidence, such as establishing that the sign of a relationship was correct, avoiding detailed tests of magnitudes or mathematical
forms. Coupled with searches over various specifications, this would ensure
a gratifyingly high rate of success when theories were tested. There would
be a further advantage that very crude measurements of concepts would be
sufficient. (Alternatively, empirical data might be collected merely in support of a particular theoretical lens, without being used to test the theory
at all rigorously.) Finally, the community would limit itself to those who
subscribed to the paradigm; others would be excluded, although free to set
up communities of their own with their own paradigms.⁴²
Perhaps this sketch will seem familiar. Positive accounting research looks
rather like a Kuhnian normal science with two distinct paradigms, each of
which is optimised to allow its practitioners to solve a particular class of
puzzle according to the accepted rules. Neither paradigm, however, is well
adapted to yield much reliable information about how humans behave in
the social contexts where accounting is relevant. Neither paradigm pays
much attention to the other: their descriptions of the world are mutually
incomprehensible, if not necessarily incommensurable in Kuhn’s sense.
As is widely recognised, the quantitative party is dominant in North
America. The evidence provided by Lee (1997) shows how this dominance
is sustained. Lee found that the top four journals over the period 1963-1994
were dominated by an élite who had gained their doctorates at one of 20
US universities. The dominance was documented through editorial appointments, majority membership of the editorial board, long tenure on the board,
multiple board memberships, and publication in the journals by members of
the editorial board.
⁴² The World Chess Federation has no objection to people choosing to play bridge; they
are just not welcome to do so at chess tournaments.
The North American tenure system places a strong emphasis on publishing in high-ranked journals. Together with the élite dominance of those
journals which Lee documents, this brings new accounting academics into
a system in which compliance with the expectations of the élite is required
for those wishing to remain part of the discipline.⁴³ This ensures the self-renewing nature of the accounting research community and of the quantitative-positive paradigm that supports it.
The publication system also appears to enforce compliance with the paradigm.
Journal space is limited, and the limits are driven particularly by the desire
to maintain high rejection rates.⁴⁴ Both editors and referees are unwilling to
risk accepting a paper which might turn out to be wrong. Further, papers
are expected to make a "contribution", which means that the result must be
new and preferably unexpected. This bias discourages theories that are precise enough to fail, discourages replications, and encourages papers that pass
weak empirical tests. A main finding which is actually a Type I error is likely
to be both new and unexpected and thus has an excellent chance of making a contribution. The rate of false positives in the literature may thus be
even higher than one might suppose from general statistical considerations.
Francis (2006) points out that replication of the work of Frankel et al. (2002)
connecting non-audit fees with indicators of earnings management showed
that the results were fragile, and wonders about the frequency of Type I
errors generally in the literature. A concern in the academic community for
novel puzzle-solving rather than truth is likely to be actively harmful if the
purpose of the project is to contribute to understanding.
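The general statistical consideration can be made concrete with a short calculation. It assumes, purely for illustration, that some fraction of the hypotheses tested are in fact true and that tests have a given power; neither quantity is estimated in this paper, and the function and parameter names are hypothetical. With significance level α, power 1 - β, and a fraction π of true hypotheses, the share of significant results that are Type I errors is α(1 - π) / [α(1 - π) + (1 - β)π].

```python
def false_positive_share(alpha, power, prior_true):
    """Share of statistically significant results that are Type I errors.

    alpha      -- significance level of each test
    power      -- probability of detecting an effect that really exists
    prior_true -- assumed fraction of tested hypotheses that are true
    """
    false_positives = alpha * (1 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


# Hypothetical scenarios, chosen only to show the sensitivity of the result.
scenarios = [
    ("well-powered tests, half of hypotheses true", 0.05, 0.8, 0.5),
    ("modest power, one hypothesis in ten true", 0.05, 0.5, 0.1),
]
for label, alpha, power, prior in scenarios:
    share = false_positive_share(alpha, power, prior)
    print(f"{label}: {share:.1%} of significant results are false positives")
```

The calculation takes no account of specification search or of selection for unexpected results, both of which would push the share higher than these figures.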
⁴³ In Australia and the UK, where the tenure system is not used or takes a milder form,
it is notable that both the research community and the journals embrace a wider range of
research approaches.
⁴⁴ I do not recall ever having a physics paper rejected, and only rarely was a paper
returned for revision. The editor of Physical Review would think you insane if you suggested that the prestige of the journal would be enhanced if most papers were rejected.
This leading journal has rejection rates of around 20% (Bederson, 1994).
Kuhn argues that a paradigm is replaced only when it can no longer
support the puzzle-solving of normal science, and it enters some form of
pre-revolutionary crisis. Is a crisis in sight for accounting? Setting aside the
possibility that the governments, students and donors who now pay the costs
may one day rebel at funding the play of accounting researchers, there is an
indication of crisis in the demographic evidence of Fogarty and Markarian
(2007). They report that the US accounting academy is declining, both in
absolute numbers and in average seniority, and that these effects are most
pronounced at the élite doctoral-granting institutions. If this continues, then
the control of the academy and its journals by the current élite may come
under threat. This would provide an opportunity for replacement of the current paradigm by one that allows positive accounting to make a meaningful
contribution to the scientific project, but of course that outcome is by no means
assured.
For those (whether part of the élite or not) who would like to see more
effective positive accounting research, the deficiencies described in this article suggest some possible actions. Editors of non-élite journals (whether
American or international) might specialise in accepting papers which take
models and measurement issues more seriously, even when those papers fail
to solve a puzzle (as they often will); and might accept many more short
papers which report replications of previous work. This would not improve
the current reputation of their journals, but would position them to emerge
as leaders should a future crisis leave the present leading journals stranded
in an outdated paradigm.
Referees of quantitative papers might usefully press authors to take issues
of measurement and the mathematical forms of models more seriously; their
scope to do so is limited, but if authors come to expect such pressure they
will develop the habit of considering those issues at the planning stage of
their research. Similarly, referees of critical papers might press authors to
test the theories that they use against competing alternatives.
For their part, authors might develop a more critical attitude to the approaches that are now accepted. This is, of course, a personally risky strategy: referees and editors seem to be more comfortable with a paper which
uses an approach consistent with a previously published paper. Changes
in approach need to be carefully defended, while repeating the same approach need only be supported by citing the precedents. Accordingly, such
steps towards liberating positive accounting research to achieve its potential
intellectual contribution should be considered the responsibility of tenured
academics rather than new entrants.⁴⁵
⁴⁵ This is the reverse of the usual situation described by Kuhn, where older researchers
keep hold of the old paradigm and new researchers drive a revolution.
8. Conclusion
This paper has examined the ontology and epistemology of positive research, and considered how the current practice of accounting research falls
short of what is required to operate the research program successfully. Several suggestions are offered for quantitative positive research.
First, there is a need for better theoretical models: that is, models that
are highly specified and thus highly vulnerable, and that are taken seriously
as subjects for detailed testing. The disappointing progress being made in
positive accounting research is a direct consequence of using ad hoc quantitative models which are reduced to mere statements of the expected sign
of a relationship between two variables. The more elaborate models arising
from analytical research are not structured to be testable, either because they
concentrate on tractability or because they are insufficiently developed, with
concepts that are not theoretically well-enough defined to be operationalized.
Second, there is a need for much better measurement so that theoretical
models can in fact be rigorously tested. Concepts need to be carefully operationalized, by finding proxies for interesting concepts that can be shown to
have reliable relationships with proxies for other interesting concepts. Attention needs to be paid to choosing the correct functional form, which will often
be a form that has a linear relationship with other concepts. Once a reliable
way has been established for measuring a concept, that measurement should
be used as a standard in subsequent studies rather than re-inventing the measurement for each study. Once a reliable measure of audit complexity has
been demonstrated, for example, that measure should be used rather than
re-estimating the parameters with each sample in each new study. These
considerations imply that there will be papers whose sole purpose is to try
to improve the measurement of some concept, not to apply it in testing a
theory.
Third, there needs to be a shift in focus away from the testing of hypotheses towards estimation of parameters. Confidence intervals for parameters
should be compared with theoretical predictions of those parameters, or with
comparable measurements from other studies. Whether the result is significantly different from zero is equivalent to whether the confidence interval
includes zero, but the measured confidence interval contains important additional information that the test result does not.
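A minimal sketch of such a comparison follows. The estimate, its standard error, and the theoretical prediction are invented for illustration, the function names are hypothetical, and a normal approximation (z = 1.96) is assumed.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Approximate 95% confidence interval as (low, high)."""
    return (estimate - z * std_error, estimate + z * std_error)


def contains(interval, value):
    """True if value lies inside the closed interval."""
    low, high = interval
    return low <= value <= high


# Hypothetical inputs: a theory that predicts a coefficient of 1.0,
# and a study that estimates 0.50 with a standard error of 0.10.
predicted = 1.00
estimate, std_error = 0.50, 0.10

ci = confidence_interval(estimate, std_error)
significant = not contains(ci, 0.0)            # the usual hypothesis test
consistent_with_theory = contains(ci, predicted)

print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print("different from zero:", significant)
print("consistent with predicted value:", consistent_with_theory)
```

A test against zero alone would count this invented result as support for the theory. The interval shows that the estimate, although clearly positive, is only about half the size that the hypothetical theory predicts.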
Fourth, there is a need for data archives of measurements of important
concepts, both those that have been made to test particular theories and
those that have been made to contribute to the archive. Making careful measurements is a significant skill, and the results need to be acknowledged as
part of the discipline’s research activity. These measurements become both
resources for and constraints upon future theoretical advances.
Finally, there is a need for extensive replication, to validate conclusions
from hypothesis testing, to confirm the accuracy of measurements, and to
explore the limits of applicability of research findings.
For critical-qualitative research the fundamental requirement is that theoretical frameworks be treated as claims about the world which require testing to establish the limits of their applicability, if any. This means that such
studies need to test competing theories against each other, rather than treating a single theory as an unproblematic and sufficient lens through which to
examine a set of data.
Positive accounting research has a worthwhile contribution to offer to the
wider project of understanding human behaviour, because of its unique setting and the particular range of behaviours that accounting encompasses.
However, as currently practiced, its main outputs comprise statistically significant but uninterpretable coefficients connecting suspect measurements
which are not known to be consistent from sample to sample (and which are
sometimes known to be inconsistent); and theories which are not challenged
and whose applicability is assumed rather than evidenced.
Kuhn’s description of "normal" science appears to fit positive accounting
research better than it fits actual sciences. That suggests that the apparent
functional deficiencies are in fact essential features of the social system of
positive accounting research. The purpose of the system may not actually be
to add to our knowledge of human behaviour in accounting-related contexts,
but to provide accounting researchers with a ready supply of satisfying puzzles which can be solved with relatively little difficulty. There is some reason
to suspect that this social system may not be stable indefinitely, but it does
not follow that any crisis would lead to the adoption of a system better suited
to advancing knowledge.
9. References
Antle, R., Gordon, E., Narayanamoorthy, G., Zhou, L., 2006. The joint determination of audit fees, non-audit fees, and abnormal accruals. Review
of Quantitative Finance & Accounting 27 (3), 235–266.
Ashton, D., Dunmore, P., Tippett, M., 2004. Double entry bookkeeping and
the distributional properties of a firm’s financial ratios. Journal of Business
Finance and Accounting 31 (5-6), 583–606.
Bederson, B., November 1994. Report to Council on e-print archive
workshop, Los Alamos, Oct. 14-15, 1994. Downloaded 1/12/09 from
http://publish.aps.org/eprint/losa.html.
Belt, D., 2007. Struggle for the soul of Pakistan. National Geographic 212 (3),
32–59.
Benfey, O. T., 1958. August Kekulé and the birth of the structural theory of
organic chemistry in 1858. Journal of Chemical Education 35 (1), 21–23.
Choi, J.-H., Kim, J.-B., Liu, X., Simunic, D. A., 2009. Cross-listing audit fee
premiums: Theory and evidence. The Accounting Review 84 (5), 1429–1463.
Christenson, C., 1983. The methodology of positive accounting. The Accounting Review 58 (1), 1–22.
Chua, W. F., 1986. Radical developments in accounting thought. The Accounting Review 61 (4), 601–632.
Crombie, A., 1994. Styles of Scientific Thinking in the European Tradition.
Vol. 1. Duckworth.
Dahmash, F. N., Durand, R. B., Watson, J., 2009. The value relevance and reliability of reported goodwill and identifiable intangible assets. The British
Accounting Review 41 (2), 120–137.
Davila, A., Foster, G., 2007. Management control systems in early-stage
startup companies. The Accounting Review 82 (4), 907–937.
DeAngelo, L. E., 1981. Auditor independence, ‘low balling’, and disclosure
regulation. Journal of Accounting & Economics 3 (2), 113–127.
Encyclopædia Britannica, 2009. Western philosophy. Retrieved September 29
2009, from http://www.britannica.com.
Feltham, G. A., Ohlson, J. A., 1995. Valuation and clean surplus accounting
for operating and financial activities. Contemporary Accounting Research
11 (2), 689–731.
Fisher, R., 1955. Statistical methods and scientific induction. Journal of the
Royal Statistical Society Series B 17 (1), 69–78.
Fogarty, T. J., Markarian, G., 2007. An empirical assessment of the rise and
fall of accounting as an academic discipline. Issues in Accounting Education
22 (2), 137–161.
Francis, J. R., 2006. Are auditors compromised by nonaudit services? Assessing the evidence. Contemporary Accounting Research 23 (3), 747–760.
Frankel, R. M., Johnson, M. F., Nelson, K. K., 2002. The relation between
auditors’ fees for nonaudit services and earnings management. Accounting
Review 77 (4), 71.
Fukuyama, F., 1995. Trust, the Social Virtues and the Creation of Prosperity.
Free Press, New York.
Gibbins, M., 1984. Propositions about the psychology of professional judgment in public accounting. Journal of Accounting Research 22 (1), 103–125.
Gonthier-Besacier, N., Schatt, A., 2007. Determinants of audit fees for French
quoted firms. Managerial Auditing Journal 22 (2), 139–160.
Hannam, J., 2009. God’s Philosophers: How the Medieval World Laid the
Foundations of Modern Science. Icon, London.
Hay, D. C., Knechel, W. R., Wong, N., 2006. Audit fees: A meta-analysis
of the effect of supply and demand attributes. Contemporary Accounting
Research 23 (1), 141–191.
Hodge, F. D., Kennedy, J. J., Maines, L. A., 2004. Does search-facilitating
technology improve the transparency of financial reporting? The Accounting Review 79 (3), 687–703.
Humphrey, C., 2008. Auditing research: A review across the disciplinary
divide. Accounting, Auditing & Accountability Journal 21 (2), 170–203.
Ittner, C. D., Larcker, D. F., Meyer, M. W., 2003. Subjectivity and the
weighting of performance measures: Evidence from a balanced scorecard.
The Accounting Review 78 (3), 725–758.
Kealey, B. T., Lee, H. Y., Stein, M. T., 2007. The association between audit-firm tenure and audit fees paid to successor auditors: Evidence from Arthur
Andersen. Auditing 26 (2), 95–116.
Kosmala MacLullich, K., 2003. The Emperor’s ‘new’ clothes? New audit
regimes: Insights from Foucault’s Technologies of the Self. Critical Perspectives on Accounting 14 (8), 791.
Kuhn, T. S., 1970. The Structure of Scientific Revolutions, 2nd Edition.
University of Chicago Press, Chicago.
Laughlin, R., 1995. Empirical research in accounting: Alternative approaches
and a case for "middle-range" thinking. Accounting, Auditing & Accountability Journal 8, 63–87.
Lee, T., 1997. The editorial gatekeepers of the accounting academy. Accounting, Auditing & Accountability Journal 10 (1), 11–30.
Libet, B., 2002. The timing of mental events: Libet’s experimental findings
and their implications. Consciousness and Cognition 11 (2), 291–299.
Lovell, M. C., 1983. Data mining. Review of Economics & Statistics 65 (1),
1–12.
Pierce, J., Bekoff, M., 2009. Moral in tooth and claw. The Chronicle of Higher
Education http://chronicle.com/article/Moral-in-ToothClaw/48800/.
Popper, K. R., 1959. The logic of scientific discovery. Routledge.
Schulz, A. K.-D., Cheng, M. M., 2002. Persistence in capital budgeting reinvestment decisions – personal responsibility antecedent and information
asymmetry moderator: A note. Accounting & Finance 42 (1), 73–86.
Simunic, D. A., 1980. The pricing of audit services: Theory and evidence.
Journal of Accounting Research 18 (1), 161–190.
Sokal, A. D., 1996. Transgressing the boundaries: An afterword. Philosophy
and Literature 20 (2), 338–346.
Watts, R., Zimmerman, J., 1978. Towards a positive theory of the determination of accounting standards. The Accounting Review 53, 112–134.
Watts, R., Zimmerman, J., 1986. Positive Accounting Theory. Prentice Hall.
Watts, R. L., Zimmerman, J. L., 1979. The demand for and supply of accounting theories: The market for excuses. The Accounting Review 54 (2),
273–305.
Watts, R. L., Zimmerman, J. L., 1990. Positive accounting theory: A ten
year perspective. The Accounting Review 65 (1), 131–156.
Wingate, M. L., 1997. An examination of cultural influence on audit environments. In: Previts, G. J. (Ed.), Research in Accounting Regulation,
Supplement 1997/1. JAI Press, Greenwich, CT, pp. 129–148.