Does More Deterrence Require More Punishment?
[or Should the Punishment Fit the Crime?]
Sarnia, Ontario N7T 7K4
John P. Palmer
Department of Economics
The University of Western Ontario
London, Ontario N6A 5C2
Deterrence as a policy issue is usually a societal, or macro, concern. However, the traditional approach in the economic analysis of law, building on the work of Gary Becker, treats deterrence at an individual, or micro, level: an increase in deterrence is accomplished simply by adding marginal criminals to the set of already-deterred individuals. When both deterrent factors have positive prices, raising both the level of enforcement and the level of punishment is, ceteris paribus, the only cost-efficient way to increase deterrence, because these two factors are complements in deterring an individual.
In this paper we aggregate the Becker-type model to derive a societal level production function for deterrence. We demonstrate that aggregation can cause enforcement and punishment to become independent or even substitutes in production. The expansion path for deterrence can be like that for the Becker model (positively sloped), or it can be vertical, horizontal, or even backward-bending. Therefore, a cost-effective increase in deterrence can call for a rise in the level of one deterrent factor while the other is held constant or even lowered. In the latter case, some previously deterred individuals may exit the deterred set, while a larger number enter the set.
Our paper shows that the Becker-type prescription for increased deterrence may not be optimal because enforcement and punishment may no longer be complements at the macro level.
We conclude with references to considerable recent empirical work that is consistent with our theoretical results.
Keywords: Deterrence, Crime, Punishment
JEL classification: K (law and economics)
1. Introduction
An important policy problem in the original, traditional literature on the economics of crime and punishment can be stated as follows: The level of deterrence for a specific crime depends, ceteris paribus, on the expected fine faced by persons considering the crime. Suppose that the legal authority selects a given level of deterrence that, in turn, determines the appropriate expected fine. The expected fine then can be decomposed into the product of (1) a fine and (2) a probability of detecting, convicting, and actually punishing criminals. How should the expected fine and its components, the fine and probability of conviction, change if the legal authority wanted to, say, increase the level of deterrence? The traditional approach in the field of law and economics has been to postulate that positive costs are associated both with the imposition of a punishment and with increasing the probability of conviction; under these postulates, only an increase in both the fine and the probability of conviction can be optimal (i.e., cost-minimizing for any given level of deterrence). However, we will show that this conclusion does not necessarily follow when the theoretical models are aggregated across individuals with different tastes and preferences. Rather, an increase in the desired level of deterrence can call for a lower fine; indeed, it might even call for a lower expected fine. Furthermore, these results cannot readily be explained within the traditional law and economics framework.
In our approach, we follow the traditional literature in analyzing the deterrence of any one individual from committing a crime. We make use of the fact that, for any given individual, deterrence of a given crime is a discrete variable: either the crime is deterred or it isn't. We begin with the individual potential criminal, discussing optimizing behaviour for just one person. We then relax an assumption that is implicit in many models, namely that all individuals are identical (or, if different, then different in such a way that aggregating across them causes no problems), and show that aggregating the individual results across a large group of different people leads to an important potential confusion about substitution and complementarity in the production of deterrence. From this step, we approach the problem using conventional production function theory; that is, we treat the probability of conviction and the size of the imposed fine as inputs into the production of deterrence, and explore how the productivity of these inputs affects the decomposition of an expected fine or punishment into its two components: the probability of being punished and the size of the punishment.
In the next section we look at the problem of deterring individuals; then we look at the aggregation problem. In the ensuing sections, we develop a model of optimal selection of (1) the probability of being convicted and punished and (2) the amount of the fine. We establish the necessary and sufficient conditions for an equilibrium choice and examine the comparative statics associated with changes in the level of deterrence, and we show that the effects on factor productivity of changes in factor usage play a decisive role in the decomposition of the expected fine. Further, we use our results to establish the conditions under which it is efficient for the punishment to fit the crime, and we see that having the punishment fit the crime might not always be optimal.
2. Expected Utility and Potential Criminal Behaviour
We begin by considering any particular individual who might be contemplating whether to commit a specific crime. We assume that the person maximizes expected utility, and that the decision the person must make is a zero-one decision — commit the crime or don’t commit the crime. We let p be the ex ante conditional probability that the person will be detected, prosecuted, convicted, and punished for the criminal offense, given that s/he has actually committed the crime; we let f be the fine or punishment the person must suffer upon conviction. The individual will commit the crime if and only if the expected utility of the crime, V, exceeds the expected utility, A, of the alternative. That is, the individual commits the crime if
(1) V(p,f) = (1 - p)U(w + b) + pU(w - f) > U(a) = A, where
V is the expected utility function,
U represents the utility of wealth,
w is the person's initial wealth (before committing the crime),
a is the person’s wealth if the non-crime alternative is selected (a and w are not necessarily the same), and
b is the gain the person receives from committing the crime (we assume this gain is confiscated upon conviction)
This bare-bones expected utility function reduces the individual's choice to one of either committing the crime or not committing the crime. Furthermore, all the variables are exogenous to the individual; there are no continuous choice variables in this simplified model. With this simplification, it is easy to explore the effects on criminal behaviour of changing the levels of the parameters, b, f, and p.
It is easily seen that Vp = -U(w + b) + U(w - f), which is always negative, assuming the individual has a positive marginal utility of wealth, because (w + b) > (w - f). Furthermore, Vf = -pU'(w - f), which is also negative. It follows that Vpf = -U'(w - f), which, again, is negative so long as the individual has a positive marginal utility of wealth. The result that Vpf < 0 means that p and f can be thought of as complements in producing deterrence in any individual whose preferences can be represented with equation (1): an increase in either p or f lowers the expected utility of committing the crime by more when the other variable is higher. In this sense, an increase in one component of the expected fine (p∙f) raises the marginal deterrent effect of the other component.
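These sign results can be checked numerically for one concrete case. The following minimal sketch is our illustration rather than part of the original analysis: it assumes log utility, U(x) = ln x, and hypothetical values w = 100, b = 20, p = 0.3 and f = 30, and compares central finite-difference estimates of Vp, Vf and Vpf with the analytic expressions Vp = -U(w + b) + U(w - f), Vf = -pU'(w - f) and Vpf = -U'(w - f).

```python
import math

# Hypothetical values for illustration only (not from the paper).
W, B = 100.0, 20.0   # initial wealth w and gain from the crime b
P0, F0 = 0.3, 30.0   # probability of conviction p and fine f


def U(x: float) -> float:
    """Assumed utility of wealth: log utility, so U'(x) = 1/x > 0."""
    return math.log(x)


def V(p: float, f: float) -> float:
    """Expected utility of committing the crime, equation (1)."""
    return (1 - p) * U(W + B) + p * U(W - f)


def partials(p: float, f: float, h: float = 1e-3) -> tuple[float, float, float]:
    """Central finite-difference estimates of Vp, Vf and the cross partial Vpf."""
    vp = (V(p + h, f) - V(p - h, f)) / (2 * h)
    vf = (V(p, f + h) - V(p, f - h)) / (2 * h)
    vpf = (V(p + h, f + h) - V(p + h, f - h)
           - V(p - h, f + h) + V(p - h, f - h)) / (4 * h * h)
    return vp, vf, vpf


vp, vf, vpf = partials(P0, F0)
# Analytic counterparts: Vp = -U(w+b) + U(w-f), Vf = -p U'(w-f), Vpf = -U'(w-f)
print(f"Vp  = {vp:.6f} (analytic {-U(W + B) + U(W - F0):.6f})")
print(f"Vf  = {vf:.6f} (analytic {-P0 / (W - F0):.6f})")
print(f"Vpf = {vpf:.6f} (analytic {-1 / (W - F0):.6f})")
```

All three estimates come out negative and match the analytic values; any other utility function with positive marginal utility could be substituted for `U`.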
3. Aggregating beyond one individual
Because we have up to now been considering only one individual and only one crime decision, the result of the simplified model considered in the previous section is that each individual either commits a crime or doesn't commit the crime. Hence, either the (p,f) combination deters crime or it doesn't.
If individuals were identical, then any combination of p and f would either deter all crime or it would deter no crime; increasing p or increasing f would also have the incremental effect of either deterring all crime or none. But people are different; and because individuals are different, different combinations (p,f) will deter different amounts of crime. As a result, it is instructive to explore how much deterrence will be produced in the aggregate by different (p,f) combinations.
At the level of the individual, p and f are always complements in the production of deterrence: increasing the level of one of the variables increases the marginal product of the other. To see this point, consider Figure 1, which shows the values of V, the expected utility from committing a specific crime, for one individual. The curve to the right, labeled V(p',f), shows the expected utility of committing the crime, to this individual, if the probability of conviction is set at p = p', for various different possible levels of the fine. If the horizontal axis is drawn at V = A (recall that A is the individual's utility from the non-crime alternative), then for values of f for which V > A, the person would not be deterred and would commit the crime. Only when f > f', given p', would this person be deterred. When p is increased, however, to p", then the minimum level of f necessary for deterring this person drops to f", which is less than f'.
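The threshold fine in Figure 1 can be computed directly once a utility function is assumed. In the sketch below, which is illustrative only, U is taken to be ln, and the hypothetical individual has w = 100, b = 20 and a = 100; `threshold_fine` (a name of our choosing) solves V(p, f) = A for f, so f' and f" are the fines that just deter this person at p' = 0.3 and p" = 0.5.

```python
import math


def threshold_fine(p: float, w: float, b: float, a: float) -> float:
    """Fine at which V(p, f) = A = U(a), assuming log utility U = ln.

    Solves (1 - p) ln(w + b) + p ln(w - f) = ln(a) for f. Because V falls
    as f rises, the individual is deterred by any fine above this value.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return w - math.exp((math.log(a) - (1 - p) * math.log(w + b)) / p)


# Hypothetical individual: w = 100, b = 20, and the non-crime alternative leaves wealth a = 100.
f_prime = threshold_fine(0.3, 100.0, 20.0, 100.0)    # f' at p' = 0.3
f_dprime = threshold_fine(0.5, 100.0, 20.0, 100.0)   # f" at p" = 0.5
print(f"f' = {f_prime:.2f}, f'' = {f_dprime:.2f}")   # f'' is the smaller of the two
```

With these numbers, raising p from 0.3 to 0.5 lowers the deterring fine from about 35 to about 17, the pattern drawn in Figure 1.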
Figure 1. An individual is deterred by combinations of p and f for which V < A.
(Note: in this figure, even though p" > p', f" is smaller than f'.)
Later, we will use an extension of Figure 1 to show that, in the aggregate, the inputs into the production of deterrence may be complements, substitutes or independent. That is, if we think of deterrence as an output produced according to the production function Q = D(p,f), aggregation can give either Dpf > 0, Dpf = 0 or Dpf < 0. This result has important implications for the optimal use of p and f to deter crime.
When different individuals are considered within the community, it is difficult to assume that a given level of police effort will lead to the same probability of detection, apprehension, and conviction for each of them. It seems more likely that some individuals have more skills than others at avoiding conviction and punishment. Given these differences, it is probably inappropriate to talk about a given p for an aggregated collection of individuals. Consequently, as we extend the analysis to cover a community, we must now think of p as a variable which measures only the amount of police effort to apprehend and convict criminals. This redefinition of p makes the aggregation problem quite complex, but that, after all, is the point we are trying to make.
Below, we show an extension of the analysis to a community composed of different individuals with different preference functions and different abilities. In Figure 2, the negatively sloped curves represent the expected utilities of crime for five different individuals, from the map of all individuals in the V-f plane. Without loss of generality, we assume that A, the utility of the alternative to crime, is zero for each individual, so that an individual commits the crime if V(p,f) > 0. S/he will be deterred if V<0. The fine which would make V = 0 (that is, just put an individual at the margin of committing a crime) can be read at the point where the curve crosses the horizontal axis. The functions are drawn as if the individuals are all risk neutral for simplicity, but the results for Dpf hold regardless of risk preference.
Figure 2. Different Effects of Increasing the Probability of Being Convicted and Punished.
The expected utilities of crime in Figure 2 are drawn for a level of police activity, p'. At fine f', individual 1 is deterred, individual 5 commits the crime, and individuals 2, 3 and 4 are at the margin. Therefore, if the fine were increased by a small amount, these individuals would be deterred as well, so that the marginal product of the fine is, numerically, three. Now, suppose that the amount of police activity were higher by some small amount. This would rotate all V functions in the plane and shift them to the left. However, the amount of the shift would likely differ for each individual, resulting in a new ordering and possibly altering the number of V functions crossing the f-axis at f'. For example, it is possible that four V functions could cross at f'. In this case, four individuals would now be at the margin and the marginal product of f would be four. Thus, the increase in p would have increased Df; p and f would be acting as complements in production, with Dpf > 0. Another possibility is that the reordering caused by the increase in p might lead to three V functions again at f'. If this occurred, the marginal product of f would remain at three; the increase in p would have had no effect on Df; the marginal productivities of p and f would be independent; and Dpf = 0. Finally, suppose that only two V functions arrived at f' after the reordering. The marginal product of f would have declined so that p and f would be acting as substitutes in production, with Dpf < 0.
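The three cases in Figure 2 can be reproduced with a small hypothetical community. The sketch below is ours and rests on assumptions that are not in the original: individuals are risk neutral with w = a, so that V = (1 - π)g - πf, where g is the gain from the crime; each person's conviction probability is π = min(1, θp), with θ measuring how exposed that person is to police effort p; and the marginal product of the fine is approximated by the number of people newly deterred as f rises from 10 to 15. The function `cross_difference` (our name) then measures how that marginal product changes when p rises from 0.2 to 0.25, a discrete stand-in for Dpf.

```python
# Hypothetical community of risk-neutral individuals, with A normalized to zero.
# Each person is (g, theta): g is the gain from the crime, theta scales how
# police effort p translates into that person's conviction probability.


def conviction_prob(theta: float, p: float) -> float:
    """Assumed link from police effort to an individual's conviction probability."""
    return min(1.0, theta * p)


def threshold(g: float, theta: float, p: float) -> float:
    """Fine at which V = (1 - pi) g - pi f crosses zero for this individual."""
    pi = conviction_prob(theta, p)
    return g * (1 - pi) / pi


def deterred(pop: list[tuple[float, float]], p: float, f: float) -> int:
    """Aggregate deterrence D(p, f): how many individuals have V < 0."""
    return sum(1 for g, theta in pop if f > threshold(g, theta, p))


def cross_difference(pop, p0: float, p1: float, f0: float, f1: float) -> int:
    """Discrete analogue of Dpf: change in the marginal product of f when p rises."""
    mp_f_at_p0 = deterred(pop, p0, f1) - deterred(pop, p0, f0)
    mp_f_at_p1 = deterred(pop, p1, f1) - deterred(pop, p1, f0)
    return mp_f_at_p1 - mp_f_at_p0


populations = {
    "complements": [(4.0, 1.0), (4.5, 1.0), (3.0, 1.0)],
    "independent": [(3.0, 1.0), (4.0, 1.0)],
    "substitutes": [(3.0, 1.0), (2.4, 0.8)],
}
results = {name: cross_difference(pop, 0.2, 0.25, 10.0, 15.0)
           for name, pop in populations.items()}
print(results)  # {'complements': 1, 'independent': 0, 'substitutes': -2}
```

Every individual here has Vpf < 0, yet the three hand-picked populations give a positive, a zero and a negative aggregate cross effect, which is the aggregation point made with Figure 2.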
From Figure 2, it becomes clear that the relationship between the size of the punishment, the amount of policing activity, and the resulting amount of deterrence of crime in the economy is not necessarily a nice, neat, well-defined function. Nevertheless, many writers argue their conclusions from the implied assumption that the aggregation problem doesn’t exist or doesn’t matter. Recent summary articles by Cameron and by Ehrlich are examples of this omission. Also, the recent special issue on “Penalties: Public and Private” of the Journal of Law and Economics (April, 1999) contains no articles which concern themselves, even in passing, with this aggregation problem. Interestingly, the work by Kessler and Levitt in this latter volume takes notice of the ambiguities in empirical tests for whether a deterrence effect exists in addition to an incapacitation effect. But they, too, seem unaware of the aggregation problem and that this problem might very well be the source of some of the observed empirical ambiguities.
Ehrlich and Liu, in the same volume, present evidence that "…the probability of imprisonment has a greater deterrent effect than its severity" (p. 486). In drawing this conclusion, they seem unaware that there might be an aggregation problem when jumping from a model of deterring an individual to a model of deterring crime within a community.
Perhaps the clearest examples of the potential problems of jumping from a model of individual behaviour to a policy discussion of deterrence within a community appear in recent works by Polinsky and Shavell. In their articles, the leap is implied, at best, but the aggregation problem does not appear to have occurred to them.
And yet, as we demonstrate in our discussion of Figure 2, it might very well be incorrect to apply to a community, as a whole, a model of deterrence designed to explain the behaviour of only a single individual. And it might very well be the case that the sometimes ambiguous empirical results reported in the literature are a result of this aggregation problem.
4. A Model of Optimal Fine and Probability of Conviction
Once the aggregation problem is recognized, it becomes clearer that when the community is trying to produce deterrence, it may have little a priori reason for believing that the inputs (policing resources and punishment) are substitutes or complements. Each community might, in fact, have to explore the possibilities for its own residents. In general, it is easy to think of policing, p, and fines, f, as inputs into a production function of deterrence, Q = D(p,f). It is also easy to draw isoquants in p and f to represent this production function. The very act of doing so reveals that p and f taken separately will have different effects on the deterrence of crime and that the simple product of the two imposes a very special type of production function with isoquants that are rectangular hyperbolae. But there is no reason to assume that the isoquants should be so restricted. While this point may seem obvious, it appears to have been ignored until recently by many writers who fail to notice the potentially varying degrees of substitutability between p and f.
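As a brief worked illustration (ours, not part of the original argument), suppose deterrence were measured by the expected fine itself, D(p, f) = pf. Then:

```latex
D(p,f) = pf \;\Longrightarrow\; D_p = f,\quad D_f = p,\quad D_{pp} = D_{ff} = 0,\quad D_{pf} = 1 > 0,
\qquad
Q_o = pf \;\Longleftrightarrow\; p = \frac{Q_o}{f}, \qquad \frac{dp}{df} = -\frac{D_f}{D_p} = -\frac{p}{f}.
```

Under this specification p and f are complements everywhere and every isoquant is a rectangular hyperbola. Note also that Dpp = Dff = 0, so this form does not satisfy the strict concavity the model assumes.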
Let Q be a monotonically increasing index of deterrence of a particular crime. We assume that p and f are inputs into a production function for deterrence of a specific crime, say, hubcap theft: Q = D(p,f). Much of the traditional literature on crime and punishment addresses the issues from the perspective of a potential individual criminal, and the deterrence of that criminal with the use of an expected fine, E = p×f. Instead, we take an approach which might seem obvious in the light of section 3 above but which helps clarify many misconceptions in the literature. We simply use fundamental production theory, assuming that deterrence is produced using these two inputs: (1) the amount of policing, p, assuming a monotonic relationship between policing and the probability of conviction (and punishment) and (2) the size of the fine (or punishment). We do not assume that the amount of deterrence produced is the simple product of p×f. Our approach has the value of allowing for varying productivities of the two inputs and recognizing that the way they influence deterrence may not follow a narrowly specified form of the production function. It also helps to explain much of the recent empirical work, especially that by Levitt (1998) and by Ehrlich and Liu (1999), which finds a differential effect of p and f on deterrence within communities.
We assume that Dp, Df > 0 in the relevant ranges of p and f and that D(·) is strictly concave. This gives rise to isoquants having the conventional convexity with slopes given by (dp/df) = -(Df/Dp), as illustrated by the curves Qo and Q1 in Figure 3. Concavity of D(·) also implies that Dpp, Dff < 0. This assumption is not unreasonable if, for example, increasing levels of deterrence leave increasingly hardened or productive criminals with whom to deal, or if the law enforcement resources of highest productivity were used first. In either case, p and f would be decreasingly productive at the margin. Following the conventional theory of production functions, p and f are defined to be complements if an increase in one raises the marginal product of the other. That is, they are complements if Dpf > 0. Conversely, p and f are defined to be substitutes in deterring crime if Dpf < 0. Of course, a change in the quantity of one input leaves the marginal product of the other invariant if Dpf = 0. As we have shown in the previous section, depending on the individuals in the economy and their preferences, Dpf can take on any sign; hence, p and f can be substitutes, complements or neither in the production of deterrence.
We assume that there is an autonomous legal authority that exogenously sets the desired deterrence level for any specific crime. The legal authority might be an elected person or body (or possibly some benevolent or not-so-benevolent dictator). We do not explore the public choice implications of having such an authority, nor do we examine the mechanisms implemented for selecting the desired level of deterrence. We do, however, assume that the society is composed of a large number of individual units, and that their preferences are somehow conveyed, albeit perhaps imperfectly, to this legal authority.
[Figure 3: isoquants Qo = D(p,f) and Q1 = D(p,f), with isocost I1; f on the horizontal axis (marked fo and f1), p on the vertical axis.]
Figure 3. In this case p and f are very close substitutes and f behaves like an “inferior” input: to obtain an efficient increase in the desired level of deterrence, society should seek a higher probability of conviction and punishment, along with less punishment.
Now, suppose that this authority exogenously sets the desired level of deterrence at Qo. We assume that the authority seeks to minimize the total cost, C=rp+sf, of achieving this level of deterrence, where r is the cost per unit of policing and s is the cost per unit of imposing the punishment. Thus, the legal authority seeks to minimize
$$L(p, f, \lambda) = rp + sf + \lambda\,[Q_o - D(p,f)] \tag{2}$$
The first-order conditions for a minimum of (2) are:
$$L_f = s - \lambda^* D_f(p^*, f^*) = 0 \tag{3a}$$
$$L_p = r - \lambda^* D_p(p^*, f^*) = 0 \tag{3b}$$
$$L_\lambda = Q_o - D(p^*, f^*) = 0. \tag{3c}$$
The second-order condition is that the determinant of the bordered Hessian of second-order partials, H, be negative: H < 0.
Equations (3) can be solved to find the optimal values p* and f*. A direct interpretation of the equilibrium levels of p and f can be obtained by dividing equation (3a) by equation (3b). This quotient is
$$\frac{D_f(p^*, f^*)}{D_p(p^*, f^*)} = \frac{s}{r}.$$
That is, to minimize the cost of achieving the deterrence level Qo, the fine and probability of conviction (and punishment) should be chosen so that the ratio of their marginal products equals the ratio of their prices. Thus, equilibrium occurs at a point where the isoquant Qo is tangent to the isocost closest to the origin. This point of tangency is illustrated by point M in Figure 3.
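The tangency condition can be illustrated numerically. The sketch below is ours and rests on an assumed technology that does not come from the paper: a quadratic deterrence function D(p, f) = a·p + b·f - (p² + f²)/2 - c·p·f, which is strictly concave for |c| < 1 and has Dpf = -c. With the hypothetical values a = 20, b = 10, c = 0.8, r = 12.4 and s = 3.2, it finds the cost-minimizing (p*, f*) for two deterrence targets by imposing Df/Dp = s/r and searching along that line for D = Q. The helper names (`f_on_tangency`, `cost_minimizing`) and the search bracket are likewise illustrative.

```python
# Hypothetical quadratic deterrence technology (an assumption for illustration):
#   D(p, f) = a p + b f - (p**2 + f**2) / 2 - c p f,  strictly concave when |c| < 1,
# with Dpf = -c, so c > 0 makes policing and fines substitutes.
A_COEF, B_COEF, C_COEF = 20.0, 10.0, 0.8
R, S = 12.4, 3.2  # hypothetical unit costs r (policing) and s (fines)


def D(p: float, f: float) -> float:
    return A_COEF * p + B_COEF * f - (p * p + f * f) / 2 - C_COEF * p * f


def Dp(p: float, f: float) -> float:
    return A_COEF - p - C_COEF * f


def Df(p: float, f: float) -> float:
    return B_COEF - f - C_COEF * p


def f_on_tangency(p: float) -> float:
    """Fine satisfying Df/Dp = s/r; for this D the condition r*Df = s*Dp is linear in (p, f)."""
    return (R * B_COEF - S * A_COEF + (S - R * C_COEF) * p) / (R - S * C_COEF)


def cost_minimizing(q: float, lo: float = 4.0, hi: float = 8.0,
                    tol: float = 1e-10) -> tuple[float, float]:
    """(p*, f*) reaching deterrence q at least cost, by bisection along the tangency line.

    The bracket [lo, hi] is chosen for these parameter values, where D rises along the line.
    """
    def gap(p: float) -> float:
        return D(p, f_on_tangency(p)) - q

    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("target q is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    p_star = (lo + hi) / 2
    return p_star, f_on_tangency(p_star)


p0, f0 = cost_minimizing(100.0)  # target Qo
p1, f1 = cost_minimizing(120.0)  # higher target Q1
print(f"Qo: p* = {p0:.3f}, f* = {f0:.3f}")
print(f"Q1: p* = {p1:.3f}, f* = {f1:.3f}")  # more policing, smaller fine
```

With these numbers the higher target is reached with more policing and a smaller fine, a backward-bending expansion path of the kind drawn in Figure 3. A general-purpose constrained optimizer could replace the bisection; the closed-form tangency line is used only because this particular D makes it linear.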
Next, we investigate the comparative statics associated with equations (3). Specifically, we wish to examine the expansion path for equilibrium. Suppose that the legal authority decides to increase the level of deterrence to some new level, Q1 > Qo. We will discuss below two alternative reasons that the authority might wish to change Q. For now, we wish to deal with a generic increase in Q, which can be interpreted as resulting in a movement along an expansion path of the production function shown in Figure 3.
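By the Implicit Function Theorem, we can solve equations (3) for the endogenous variables p*, f*, and λ* as functions of the exogenous deterrence target Q; substituting these solutions back into equations (3) turns them into identities. Differentiating these identities with respect to Q gives
$$-\lambda^* D_{fp}\,\frac{\partial p^*}{\partial Q} - \lambda^* D_{ff}\,\frac{\partial f^*}{\partial Q} - D_f\,\frac{\partial \lambda^*}{\partial Q} = 0 \tag{4a}$$
$$-\lambda^* D_{pp}\,\frac{\partial p^*}{\partial Q} - \lambda^* D_{pf}\,\frac{\partial f^*}{\partial Q} - D_p\,\frac{\partial \lambda^*}{\partial Q} = 0 \tag{4b}$$
$$D_p\,\frac{\partial p^*}{\partial Q} + D_f\,\frac{\partial f^*}{\partial Q} = 1 \tag{4c}$$
and solving (for instance, by Cramer's rule) gives
$$\frac{\partial f^*}{\partial Q} = \frac{\lambda^*}{H}\left[D_f D_{pp} - D_p D_{pf}\right] \tag{5a}$$
$$\frac{\partial p^*}{\partial Q} = \frac{\lambda^*}{H}\left[D_p D_{ff} - D_f D_{pf}\right] \tag{5b}$$
where all partial derivatives are evaluated at (p*, f*), λ* = s/Df = r/Dp > 0 by (3a) and (3b), and H = λ*(Dp² Dff - 2 Dp Df Dpf + Df² Dpp) < 0 is the determinant of the bordered Hessian.
Note that λ*/H < 0 because λ* > 0 and H < 0, that the first term in the right-most brackets of equations (5a) and (5b) is negative by the strict concavity of D(·), and that the sign of Dpf is indeterminate.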
Inspection of equations (5) reveals, as one would reasonably anticipate from standard production theory, that the shape of the expansion path depends, ceteris paribus, on the cost of administering fines and on the substitution properties of p and f. For example, suppose that fines are costly to administer so that s > 0; moreover, suppose that p and f are complements, independent, or weak substitutes so that Dpf is positive, zero or only weakly negative. In this case, the signs of (∂p*/∂Q) and (∂f*/∂Q) are unambiguously positive, and the expansion path would slope upward, in a northeasterly direction, rather than bending backward as in Figure 3. Thus, a call for additional deterrence by the legal authority would lead it to seek both more law enforcement and higher fines.
But we cannot demonstrate on theoretical grounds alone that seeking more deterrence will always, unambiguously, require the imposition of higher fines. Consider the situation with p and f strong substitutes so that Dpf <<0. In this case, the optimal levels of either (but, of course, not both) of p or f could decline with an increase in the desired level of Q. Thus, the expansion path could be negatively sloped like the expansion path shown in Figure 3. This indicates that a call for additional deterrence by the legal authority could be met by lower fines in conjunction with more police protection.
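The sign conditions can be explored numerically by evaluating the comparative statics ∂f*/∂Q = (λ*/H)[Df Dpp - Dp Dpf] and ∂p*/∂Q = (λ*/H)[Dp Dff - Df Dpf], with λ* = s/Df and H = λ*(Dp² Dff - 2 Dp Df Dpf + Df² Dpp), at chosen points. The sketch below assumes a hypothetical quadratic technology D(p, f) = a·p + b·f - (p² + f²)/2 - c·p·f, whose Dpf = -c; each point is treated as a cost-minimizing tangency for whatever input prices make Df/Dp = s/r hold there. The function names are ours.

```python
def comparative_statics(Dp, Df, Dpp, Dff, Dpf, s):
    """Expansion-path slopes (dp*/dQ, df*/dQ) at a cost-minimizing point.

    Implements df*/dQ = (lam/H)[Df Dpp - Dp Dpf] and dp*/dQ = (lam/H)[Dp Dff - Df Dpf],
    with lam = s/Df from the first-order condition for f.
    """
    lam = s / Df
    H = lam * (Dp**2 * Dff - 2 * Dp * Df * Dpf + Df**2 * Dpp)  # bordered Hessian determinant
    if H >= 0:
        raise ValueError("second-order condition H < 0 fails at this point")
    dp_dQ = lam / H * (Dp * Dff - Df * Dpf)
    df_dQ = lam / H * (Df * Dpp - Dp * Dpf)
    return dp_dQ, df_dQ


def quadratic_partials(a, b, c, p, f):
    """(Dp, Df, Dpp, Dff, Dpf) for the hypothetical D = a p + b f - (p**2 + f**2)/2 - c p f."""
    return a - p - c * f, b - f - c * p, -1.0, -1.0, -c


# Complements: c = -0.5 gives Dpf = 0.5 > 0.
comp_parts = quadratic_partials(10.0, 10.0, -0.5, 2.0, 6.0)
dp_comp, df_comp = comparative_statics(*comp_parts, s=5.0)

# Strong substitutes: c = 0.8 gives Dpf = -0.8 < 0.
subs_parts = quadratic_partials(20.0, 10.0, 0.8, 6.0, 2.0)
dp_subs, df_subs = comparative_statics(*subs_parts, s=3.2)

print(f"complements: dp*/dQ = {dp_comp:.4f}, df*/dQ = {df_comp:.4f}")  # both positive
print(f"substitutes: dp*/dQ = {dp_subs:.4f}, df*/dQ = {df_subs:.4f}")  # the fine falls
```

With c = -0.5 both inputs rise with Q; with c = 0.8 at the second point the optimal fine falls while policing rises. In both cases Dp·∂p*/∂Q + Df·∂f*/∂Q = 1, as required for deterrence to rise one-for-one with Q.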
A cost efficient increase in deterrence that calls for an increase in policing but a decrease in the fine is not possible at the level of the individual, because p and f are always complements at that level. However, it does not follow that p and f are always complements in the aggregate, and basing policies on such an incorrect conclusion would be an example of the “fallacy of composition” — what is correct for one person is not necessarily correct for society as a whole. In the aggregate p and f may be substitutes so that an increase in p lowers the marginal product of f. Therefore, if the reduction in the marginal product of the fine is severe enough, then an increase in deterrence can call for a lower fine. It is not surprising that a sufficient condition for this to occur involves these marginal products.
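Accordingly, we define the elasticity of the marginal product of factor i with respect to factor j as
$$\varepsilon_{ij} = \frac{\partial D_i}{\partial j}\,\frac{j}{D_i} = D_{ij}\,\frac{j}{D_i}, \qquad i, j \in \{p, f\}.$$
Then a sufficient condition for an increase in deterrence, Q, to lead to a reduction in the optimal fine is that the own elasticity of policing be greater than the cross elasticity of the fine, or
$$\varepsilon_{pp} > \varepsilon_{fp}. \tag{6}$$
Because ε_pp < 0 by strict concavity, condition (6) can hold only if ε_fp < 0, that is, only if p and f are substitutes. To see why (6) is sufficient, expand it as
$$D_{pp}\,\frac{p}{D_p} > D_{fp}\,\frac{p}{D_f}$$
and multiply both sides of this inequality by DpDf/p > 0 to get DppDf > DpfDp. Rearranging gives (DppDf - DpfDp) > 0, which guarantees that ∂f*/∂Q < 0 from the second-order conditions and the comparative statics in equation (5a).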
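Condition (6) can be checked against the sign of ∂f*/∂Q at particular points. In the sketch below (illustrative, with our own function names and a hypothetical quadratic D(p, f) = a·p + b·f - (p² + f²)/2 - c·p·f as the assumed technology), the elasticities are computed as ε_ij = D_ij·j/D_i, and the test ε_pp > ε_fp is compared with the sign of the bracket Df Dpp - Dp Dpf that governs the optimal fine when H < 0.

```python
def quadratic_partials(a, b, c, p, f):
    """(Dp, Df, Dpp, Dff, Dpf) for the hypothetical D = a p + b f - (p**2 + f**2)/2 - c p f."""
    return a - p - c * f, b - f - c * p, -1.0, -1.0, -c


def mp_elasticities(Dp, Df, Dpp, Dff, Dpf, p, f):
    """eps_ij = D_ij * j / D_i, the elasticity of factor i's marginal product w.r.t. factor j."""
    return {"pp": Dpp * p / Dp, "ff": Dff * f / Df, "pf": Dpf * f / Dp, "fp": Dpf * p / Df}


def fine_falls(Dp, Df, Dpp, Dff, Dpf):
    """Assuming H < 0 (so lam/H < 0), the optimal fine falls iff Df Dpp - Dp Dpf > 0."""
    return Df * Dpp - Dp * Dpf > 0


# Two hypothetical cost-minimizing points, both with c = 0.8 (substitutes) and H < 0.
cases = {"point A": (20.0, 10.0, 0.8, 6.0, 2.0), "point B": (10.0, 10.0, 0.8, 2.0, 5.0)}
results = {}
for name, (a, b, c, p, f) in cases.items():
    parts = quadratic_partials(a, b, c, p, f)
    eps = mp_elasticities(*parts, p, f)
    results[name] = (eps["pp"] > eps["fp"], fine_falls(*parts))
print(results)  # {'point A': (True, True), 'point B': (False, False)}
```

At both points p and f are substitutes (Dpf = -0.8), but only at point A is the cross effect on the fine's marginal product large enough, relative to the diminishing returns to policing, for the optimal fine to fall as deterrence rises.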
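[Figure 4: Figure 3 with a rectangular hyperbola of constant expected fine, Eo = po∙fo, added; isoquants Qo and Q1, isocost I1; f on the horizontal axis (marked fo and f1), p on the vertical axis.]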
One might reasonably expect, after having studied the traditional treatments of the economics of crime, that even if it is possible to obtain increased deterrence optimally with a decreased fine (and increased policing), surely society must increase the expected fine, E=p∙f, to obtain the desired increase in deterrence. Such a conclusion might be incorrect, however. Consider Figure 4, which is a reproduction of Figure 3 with one curve added. This curve is a rectangular hyperbola representing a constant value for the expected fine. It is drawn through the expansion path at the point (po,fo) so the expected fine is Eo = pofo and the level of deterrence produced efficiently is Qo. However, in this example, to increase the level of deterrence efficiently to Q1, society must move to more policing and a smaller fine, at point (p1,f1). Because this point lies to the left of the hyperbola representing a constant expected fine, it must represent a lower expected fine. From this example, we can see that it might be possible to have a deterrence production function such that in order to produce more deterrence as efficiently as possible, society must choose, not simply a lower actual fine, but a lower expected fine as well!
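To investigate this possibility further, we differentiate the optimal expected fine, E* = p*f*, with respect to Q and use equations (5) to get
$$\frac{\partial E^*}{\partial Q} = f^*\frac{\partial p^*}{\partial Q} + p^*\frac{\partial f^*}{\partial Q} = \left[\varepsilon_{pp} + \varepsilon_{ff} - \varepsilon_{pf} - \varepsilon_{fp}\right]\frac{\lambda^* D_p D_f}{H},$$
where ε_ij = D_ij(j/D_i) denotes the elasticity of the marginal product of factor i with respect to factor j. Now, the last term on the right-hand side above, λ*DpDf/H, is negative from the first- and second-order conditions (λ* > 0, Dp, Df > 0, and H < 0). Therefore, by a method similar to that used for equation (6) above, we can show that a sufficient condition for ∂E*/∂Q < 0 is that
$$\varepsilon_{pp} + \varepsilon_{ff} > \varepsilon_{pf} + \varepsilon_{fp}. \tag{7}$$
That is, a sufficient condition under which the optimal expected fine, E*, will decrease as the desired level of deterrence increases is that the sum of the own elasticities exceed the sum of the cross elasticities.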
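The expected-fine result can be illustrated the same way. Differentiating E* = p*f* gives ∂E*/∂Q = f*·∂p*/∂Q + p*·∂f*/∂Q, which equals (λ* Dp Df / H)[(ε_pp + ε_ff) - (ε_pf + ε_fp)]. The sketch below again assumes a hypothetical quadratic D(p, f) = a·p + b·f - (p² + f²)/2 - c·p·f, uses our own function names, and evaluates both forms at two points: one where the own elasticities sum to more than the cross elasticities and one where they do not.

```python
def quadratic_partials(a, b, c, p, f):
    """(Dp, Df, Dpp, Dff, Dpf) for the hypothetical D = a p + b f - (p**2 + f**2)/2 - c p f."""
    return a - p - c * f, b - f - c * p, -1.0, -1.0, -c


def expected_fine_response(a, b, c, p, f, s):
    """Return (dE*/dQ, own-minus-cross elasticity gap, lam Dp Df / H) at a tangency point."""
    Dp, Df, Dpp, Dff, Dpf = quadratic_partials(a, b, c, p, f)
    lam = s / Df                                                 # first-order condition for f
    H = lam * (Dp**2 * Dff - 2 * Dp * Df * Dpf + Df**2 * Dpp)   # bordered Hessian, < 0 here
    dp_dQ = lam / H * (Dp * Dff - Df * Dpf)
    df_dQ = lam / H * (Df * Dpp - Dp * Dpf)
    dE_dQ = f * dp_dQ + p * df_dQ                                # product rule on E* = p* f*
    own = Dpp * p / Dp + Dff * f / Df                            # eps_pp + eps_ff
    cross = Dpf * f / Dp + Dpf * p / Df                          # eps_pf + eps_fp
    return dE_dQ, own - cross, lam * Dp * Df / H


dE_A, gap_A, factor_A = expected_fine_response(20.0, 10.0, 0.8, 6.0, 2.0, s=3.2)
dE_B, gap_B, factor_B = expected_fine_response(10.0, 10.0, 0.8, 2.0, 6.0, s=2.4)
print(f"point A: dE*/dQ = {dE_A:.4f}, own - cross = {gap_A:.4f}")  # expected fine falls
print(f"point B: dE*/dQ = {dE_B:.4f}, own - cross = {gap_B:.4f}")  # expected fine rises
```

At the first point the optimal expected fine falls as deterrence rises, even though policing rises; at the second it rises. Because λ* Dp Df / H is negative whenever Dp, Df > 0, the elasticity comparison fixes the sign of ∂E*/∂Q in this example.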
5. Expansion Paths and Marginal Deterrence
It can be seen that Dpf > 0 is sufficient for an expansion path that moves in a northeast direction in Figures 3 and 4. In these instances, the optimal size of the punishment will vary directly with the amount of deterrence desired and hence will vary with the severity of the crime, assuming that society wishes more deterrence for more serious crimes. However, we have also shown that the optimal fine can fall along the expansion path if Dpf is sufficiently negative. Indeed, the sign and size of Dpf are empirical matters.
We can link our results for the expansion path to the problem of marginal deterrence by making an assumption about why the legal authority might increase the deterrence level, Q. First, fixing the crime at the theft of hubcaps, an increase in Q implies that the legal authority has decided to deter it more vigorously.
Alternatively, in addition to hubcap theft, consider the full range of related crimes involving automobiles (e.g., break-and-enter, car theft, carjacking), and let q be a monotonically increasing index of the legal authority's evaluation of the seriousness of these crimes. They might range from, say, theft of a gas cap to carjacking with murder, with the crime of hubcap theft presumably lying somewhere in between these extremes. Now, assume that, over the range of these crimes, the desired level of deterrence selected by the legal authority increases with the severity of the crime. This is not implausible: we are assuming that the legal authority seeks to deter carjacking with murder more thoroughly than the theft of gas caps. Thus, our index of deterrence, Q, is monotonically increasing in q, so that Qq > 0.
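This assumption permits us to examine how an increase in the seriousness of the crimes affects the choice of p and f through an increase in Q. Beginning with hubcap theft, consideration of a more serious crime by the legal authority implies that equations (3) should be differentiated with respect to q. Mathematically, this yields differential equations analogous to equations (4), with derivatives now taken with respect to q; solving them gives
$$\frac{\partial f^*}{\partial q} = \frac{\lambda^*}{H}\left[D_f D_{pp} - D_p D_{pf}\right] Q_q \tag{8a}$$
$$\frac{\partial p^*}{\partial q} = \frac{\lambda^*}{H}\left[D_p D_{ff} - D_f D_{pf}\right] Q_q \tag{8b}$$
The term Qq is positive, so that equations (8) have the same implications for a change in q as equations (5) had for a change in Q.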
It is reasonable to expect the legal authority to seek more deterrence for more serious crimes; it also seems reasonable to expect that the residents of society will want f and p chosen optimally to minimize the cost of each level of deterrence. Together, these two assumptions mean that, under most plausible theoretical conditions, society will want the punishment to fit the crime. But not necessarily. If p and f are good substitutes for each other in the production of deterrence, society might well choose more policing and smaller punishments. Whether it would do so is, of course, an empirical matter.
6. Empirical Relevance
While it is theoretically possible to develop a model, as we do in sections 3, 4, and 5, showing that it might be optimal to increase deterrence by reducing the size of the punishment, it turns out that this result has some empirical explanatory power as well. Many studies have indicated that deterrence and/or the reduction of recidivism can best be promoted by increasing the probability that criminals will be punished. There is even some indication in these studies that the size of the punishment is less important than the expected probability that perpetrators will be convicted and punished. To the extent that these studies are correct, it is difficult to reject out of hand the results of our model as "theoretically possible but empirically implausible." Instead, it appears, both from our theoretical model and from these studies, that by focusing on the amount of the punishment rather than on the probability that criminals are punished at all, some policy-makers may be emphasizing the wrong variable. This finding runs directly counter to a resolution of the U.S. Congress stating that "Congress encourages all … states to adopt as quickly as possible legislation to increase the time served by violent felons." At the very least, we can readily see that emphasizing only one variable in the production of deterrence is not consistent with efficiency in the standard Becker-type model of crime, but can easily be efficient within the context of our revision of the model to allow for aggregation effects.
Public opinion in some jurisdictions seems to be at least somewhat consistent with the reported empirical findings. For example, a recent poll indicated that Canadians are not very confident in the effectiveness of the prison system, but have more confidence in the courts and even more confidence in the law enforcement authorities. One possible implication of these results is that people are not so much interested in having the size of the punishment increased for any given crime, but they do seem to attach importance to making sure that perpetrators face a much higher probability of being punished.
Another example might come from the recent security procedures at the World Trade Organization meetings in Seattle, Washington, and then at the Summit of the Americas in Québec City, Québec. Policy makers appear to have learned from the first of these two events that increasing p, the policing effort, while not increasing (or possibly even decreasing) f, the size of the punishment, would be an efficient way to deter violence. It appeared that in Québec City, many more arrests were made, but also many more of those arrested were detained but not charged with any crime. If so, the size of the punishment, f, was very small (just the cost of being detained), while the probability of being detained was much higher. Such a change in policy would not be efficient in the Becker-type model, but again is readily seen as potentially efficient in our model, which considers the effects of aggregation.
This interpretation of these empirical studies, together with our theoretical results, suggests that the traditional economic approach to crime and punishment may be mistaken: it may be incorrect to generalize from an individual to an aggregate of individuals.
The traditional approach to the economics of crime and punishment has mistakenly treated the expected fine, or expected punishment, as society’s choice variable for creating deterrence. This focus has led many writers to try to explain why society does not choose very small probabilities of conviction combined with very large punishments. As a result, scholars have devoted far too much time to discussions of marginal deterrence or to philosophical issues in an attempt to explain why the punishment should fit the crime. If, instead, we approach optimal deterrence through a standard production function, we can more easily identify the conditions under which it is optimal to have the punishment fit the crime.
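The production-function approach can be stated compactly. The sketch below assumes linear costs, with c a hypothetical unit cost of raising p and s the unit cost of raising f, and a deterrence production function D(p, f) whose subscripts denote partial derivatives. Differentiating the tangency condition along the expansion path gives its slope:

```latex
\min_{p,f}\; c\,p + s\,f \quad \text{s.t.}\quad D(p,f) = \bar{D}
\;\Longrightarrow\;
\frac{D_p}{D_f} = \frac{c}{s},
\qquad
\left.\frac{dp}{df}\right|_{\text{expansion path}}
= -\,\frac{D_f D_{pf} - D_p D_{ff}}{D_f D_{pp} - D_p D_{pf}}.
```

With diminishing returns (D_pp < 0, D_ff < 0) and complementary instruments (D_pf > 0), the numerator is positive and the denominator negative, so the path slopes upward as in the Becker-type model. If D is linear in p and D_pf = 0, the denominator vanishes and the path is vertical. When D_pf < 0, the numerator and denominator can take the same sign, and the slope then turns negative.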
Interestingly, when these conditions are not met, that is, when the expansion path of Figure 3 or 4 bends backward, it would not be optimal to have the punishment fit the crime; instead, it would be desirable to increase police protection and enforcement, substituting these activities for higher fines (or other forms of punishment) to obtain the desired level of deterrence.
We suspect that the crime of assassinating a political leader may also fall into this category. Society views such a crime as so serious that it devotes considerable resources to preventing it and to increasing the probability that the perpetrator will be apprehended and convicted. Yet the punishment for the perpetrator may be no greater than for the murder of any other person, even though the crime may be viewed as more serious. At this point, a further increase in the level of punishment is likely to add very little deterrence, so the desired additional deterrence must be obtained through prevention and detection. The expansion path becomes vertical, and it might even bend backward if prevention and detection become extremely good substitutes for punishment in the deterrence production function.
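A small numerical sketch can make a vertical expansion path concrete. It assumes a hypothetical production function D(p, f) = A·p + B·ln f, which is linear in enforcement and has no interaction between the two instruments, together with hypothetical unit costs C for p and S for f. None of these values come from the paper. Under these assumptions the cost-minimizing fine equals B·C/(A·S) whatever the deterrence target, so all additional deterrence is bought through p.

```python
import math

# Hypothetical parameters (illustrative assumptions, not estimates):
A, B = 1.0, 0.1     # productivity of enforcement p and of punishment f
C, S = 1.0, 0.01    # unit cost of raising p and of raising f


def required_p(target, f):
    """Probability needed to reach deterrence `target` at fine f under the
    assumed production function D = A*p + B*ln(f)."""
    return (target - B * math.log(f)) / A


def cheapest_mix(target, grid):
    """Grid-search the cost-minimizing (p, f) pair on the isoquant D = target."""
    best = None
    for f in grid:
        p = required_p(target, f)
        if not 0.0 <= p <= 1.0:   # p is a probability, so keep it in [0, 1]
            continue
        cost = C * p + S * f
        if best is None or cost < best[0]:
            best = (cost, p, f)
    return best[1], best[2]


targets = (0.4, 0.6, 0.8)
fine_grid = [0.01 * k for k in range(1, 10001)]   # fines from 0.01 to 100
path = [cheapest_mix(d, fine_grid) for d in targets]
for d, (p, f) in zip(targets, path):
    print(f"D = {d:.1f}: p* = {p:.3f}, f* = {f:.2f}")
```

With these parameters the printed fine is 10.00 at every target, matching B·C/(A·S), while p* rises from about 0.17 to about 0.57. The vertical path depends on D being linear in p. With diminishing returns to p and no interaction, the path would slope upward instead.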
The key to our results lies in our analysis of the production of deterrence. Whereas the traditional literature begins with individual utility functions, we aggregate the basic individual results and thereby avoid the fallacy of composition. This line of research led us to examine production functions for deterrence. A secondary point, which gives our results empirical relevance, is that they should be interpreted as concerning the probability of actually receiving the punishment (not merely of being convicted), compared with the level of the punishment.
Throughout, our point has been very simple: it may not be correct to extrapolate from individual analysis to societies without considering the aggregation problem. Considerable scarce resources have been devoted to understanding why simple, individual models of deterrence lack the expected explanatory power in empirical tests when, in fact, the problem may lie neither with the models nor with the empirical tests; it may simply be another example of the fallacy of composition.
 See the Winter 1996 Symposium on The Economics of Crime in The Journal of Economic Perspectives, Volume 10, Number 1, especially Isaac Ehrlich, “Crime, Punishment, and the Market for Offenses,” and the references contained therein.
 The most recent exposition of this analysis is provided by A. Mitchell Polinsky and Steven Shavell, “The Economic Theory of Public Enforcement of Law,” Journal of Economic Literature 38 (March, 2000): 45-76. Unfortunately, these writers seem unaware of the potential aggregation problem that arises in going from the individual to the societal level of analysis.
 The traditional models are usually written in terms of fines and expected fines. We generally retain this terminology in this paper, but there is no reason the analysis would not apply equally to other forms of punishment.
 We assume that the alternative to committing the crime is not committing it. For many criminals, of course, the choice is likely to be which crime to commit. In such cases, increased deterrence of one particular crime will simply increase some other crime, not necessarily a less serious one.
 The curves are straight lines for risk-neutral individuals, convex from below for risk-averse individuals, and concave from below for risk-seeking individuals.
 Samuel Cameron, “The Economics of Crime Deterrence: A Survey of Theory and Evidence,” Kyklos 41 (1988): 301-23.
 Op. cit., 1996.
 Daniel Kessler and Steven D. Levitt, “Using Sentence Enhancements to Distinguish between Deterrence and Incapacitation,” Journal of Law and Economics 42 (April, 1999): 343-63.
 Isaac Ehrlich and Zhiqiang Liu, “Sensitivity Analysis of the Deterrence Hypothesis: Let’s Keep the Econ in Econometrics,” Journal of Law and Economics 42 (April, 1999): 455-87.
 A. Mitchell Polinsky and Steven Shavell, “On the Disutility and Discounting of Imprisonment and the Theory of Deterrence,” Journal of Legal Studies 28 (January, 1999): 1-16; see also n. 2, supra.
 In Figures 3 and 4, f1 is greater than f0, and p1 is greater than p0.
 However, consider the situation in which collecting the fine is costless, so that s = 0. In this case the isocost is horizontal, and the equilibrium outcome depends on the shape of the isoquant map associated with the production function D(p, f). If the isoquant turns upward (i.e., the marginal productivity of the fine eventually becomes negative), there would still be a limit to the optimal size of the fine. But if the isoquant never turns up, no matter how large the fine, then the optimal fine would be infinite when s = 0. See Palmer and Henderson, “The Economics of Cruel and Unusual Punishment,” European Journal of Law and Economics 5 (1998): 235-45.
 See, for example, Robert Cooter and Thomas Ulen, Law and Economics.
 In addition to the references provided in nn. 1, 6, and 7, supra, see Vijay K. Mathur, “Economics of Crime: An Investigation of the Deterrent Hypothesis for Urban Areas,” Review of Economics and Statistics 60 (August, 1978): 459-66; Paul E. Tracy, Marvin E. Wolfgang, and Robert M. Figlio, Delinquency Careers in Two Birth Cohorts (New York: Plenum Press, 1990); Richard T. Wright and Scott H. Decker, Burglars on the Job (Boston: Northeastern University Press, 1994), especially pp. 91ff.
 H. Con. Res. 105, September 29, 1995.
 The CTV/National Angus Reid Poll, July 11, 1997.
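The note on costless fines (s = 0) can be illustrated with a small numerical sketch. It assumes a hypothetical multiplicative production function D(p, f) = p·h(f). With s = 0 the cost of deterrence depends only on p, so the cheapest point on an isoquant is the one that requires the smallest p. The two functions h, the target D = 0.5, and the cap of 500 on the fine grid are illustrative assumptions, not values from the paper. The cap stands in for “no matter how large the fine.”

```python
import math


# Hypothetical deterrent effect of the fine, h(f), in D(p, f) = p * h(f).
def h_saturating(f):
    # Marginal productivity of the fine stays positive for every f.
    return f / (1.0 + f)


def h_peaked(f, k=20.0):
    # Marginal productivity turns negative beyond f = k, so the isoquant
    # p = target / h(f) starts rising there.
    return (f / k) * math.exp(1.0 - f / k)


def optimal_fine_when_s_is_zero(h, target, grid):
    """With s = 0 the cost is proportional to p, so pick the fine on the grid
    that reaches D = target with the smallest feasible p (p <= 1)."""
    best_f, best_p = None, None
    for f in grid:
        p = target / h(f)
        if p <= 1.0 and (best_p is None or p < best_p):
            best_f, best_p = f, p
    return best_f, best_p


grid = [0.1 * k for k in range(1, 5001)]   # fines from 0.1 up to a cap of 500
print(optimal_fine_when_s_is_zero(h_saturating, 0.5, grid))  # fine hits the cap
print(optimal_fine_when_s_is_zero(h_peaked, 0.5, grid))      # interior fine at k
```

With the saturating h, the search ends at the cap for any cap of at least 1. This is the numerical counterpart of an infinite optimal fine. With the peaked h, it stops at f = 20, beyond which further increases in the fine reduce deterrence.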