Saturday, November 23, 2013

Fourth Circuit Approves Statistical Sampling Technique for Sentencing Tax Loss (11/23/13)

In United States v. Ukwu, 2013 U.S. App. LEXIS 23513 (4th Cir. 2013), here, the Fourth Circuit affirmed a tax loss calculation based on what it viewed as a proper statistical inference from a sample of the population.  I want to address that issue in this blog.

The cases where this type of statistical inference is drawn as to the tax loss usually appear in tax return preparer prosecutions.  In these cases, the IRS discovers a pattern of errors in some number of returns prepared by the target of the investigation.  It will do some level of investigation and determine that some percentage of the errors -- usually a very high percentage -- represent the preparer's fraud. The IRS will then project that percentage over the universe of returns prepared by the preparer to determine, by statistical inference, the tax loss.  This type of inference is usually not presented in the trial to determine guilt or innocence, because a more exacting standard of proof than mere inference is required; rather, it is presented at sentencing to determine the relevant conduct (which can include noncharged years or returns, acquitted years or returns, etc.).  From a statistical perspective, the initial inquiry is whether the sample size is adequate.  See generally the Wikipedia entry on Sampling (Statistics), here.  Let's say, for example, that the preparer prepared 1,000 returns, that the IRS audited 10 and that 9 out of 10 claimed fraudulent deductions or credits resulting in average underpaid tax of $1,000.  Can a fair inference be drawn that 90% of the remaining unaudited 990 returns not only contained fraudulent deductions but also that their average underpaid tax was $1,000?  What if the number audited were 100, with similar percentages and amounts?  What if the number audited were 200?  300? Would it be important that the taxpayers audited were randomly drawn?  And what does randomly drawn mean?
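
To give a rough feel for how much the number audited matters, here is a minimal sketch in Python. It computes an approximate 95% Wilson score interval for the fraud rate in the hypothetical above (9 of 10, 90 of 100, and so on). The function name, the 95% level, and the assumption that the audited returns are a simple random sample are mine, not anything the IRS or the courts are said to use, and the sketch ignores the finite size of the 1,000-return universe.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical audits from the example: 90% of audited returns are fraudulent.
audits = [(9, 10), (90, 100), (180, 200), (270, 300)]
intervals = {n: wilson_interval(k, n) for k, n in audits}

for n, (low, high) in intervals.items():
    print(f"audited {n:>3}: plausible fraud rate {low:.1%} to {high:.1%}")
```

With only 10 audited returns the plausible range is very wide (roughly 60% to 98%); it narrows steadily as the number audited grows. None of this helps, of course, if the audited returns were not randomly drawn in the first place.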

I can't write a book on statistics, but there are any number of scholarly books and articles on the subject.  One popular book is Nate Silver, The Signal and the Noise (2012), here.  I will mention Silver again below, although I can't resist saying that Silver was the "gold standard" in projecting the outcome of the 2012 presidential elections.  See Nate Silver's Wikipedia entry here.  I mention also Charles Wheelan, Naked Statistics: Stripping the Dread from the Data (2012), here.

Let's see what the Fourth Circuit did in its statistical exegesis in Ukwu.  I note at the outset that the opinion is an unpublished per curiam opinion.  I won't go into a rant about unpublished opinions, not to mention unattributed per curiam opinions.  I have done that elsewhere and, besides, nobody is or should be interested in my opinions on such opinions.  I do say that it is some type of "junior" opinion deemed to be of less significance than published opinions in terms of adding to the law.  (Perhaps this could be compared to the difference between Memorandum and Regular Tax Court Opinions.)  Let's get right to the opinion:

The Court gave us the key background (but not the details) as follows:
After Mr. Ukwu's jury conviction, the government estimated how much money Mr. Ukwu took from federal and state coffers. It concluded that Mr. Ukwu's criminal behavior created tax losses of $2.1 million, which corresponds to a base offense level of 22 under § 2T4.1 of the United States Sentencing Guidelines Manual. 
On appeal, Mr. Ukwu takes issue with the $2.1 million estimate, arguing that a preponderance of the evidence shows that his ill-gotten gains amounted to less than $1 million. Specifically, he argues that the district court's method of estimating the tax shortfall was unsound because it used a small, flawed sample of tax returns to make inferences about another 1000 returns that he prepared. Based in part on its estimate, the district court sentenced Mr. Ukwu to 51 months in prison. Mr. Ukwu filed a timely appeal.
Then, because Ukwu did not challenge the method at sentencing, the Court applied a "plain error" standard of review.  (This is a stretch, but perhaps plain error is like "plain meaning" in statutory interpretation, as to which the subjective eye of the beholder is the determinant.)  With that standard set, the Court then moves into more detail about the statistical methodology used at sentencing:
Mr. Ukwu takes issue with how the district court reached its conclusion that his crimes caused over $1 million in tax losses. The sentencing court faced a difficult problem because of the sheer size of Mr. Ukwu's potential fraud. Mr. Ukwu prepared roughly 1,000 tax returns that reported business losses, but the sentencing court and the IRS do not have time to audit each return, interview each taxpayer, and identify the extent of Mr. Ukwu's crimes. As a result, the government had to rely on sampling techniques to make inferences about the universe of 1,000 tax returns. Essentially, the government had to take a spoonful of sauce out of the pot to assess whether the whole batch was spoiled. 
The government used two samples of Mr. Ukwu's 1,000 prepared tax returns to answer the following question: how often did Mr. Ukwu invent Schedule C losses from whole cloth? First, the government relied on a sample of 18 returns that were used at Mr. Ukwu's criminal trial. These returns all reported Schedule C losses and contained loss descriptions that were vague, undocumented, and suspicious. Based on the testimony from the taxpayers involved, the government concluded that 16 out of 18 returns had Schedule C losses that were entirely false. The two remaining returns were disputed. Using these numbers, the government found that 88.88% of the returns in this sample used entirely false Schedule C losses. Note, however, that the returns investigated at trial were chosen for investigation specifically because they contained very high tax loss amounts. Thus, this was not a random sample of returns. 
To solve this problem, the government then collected a random sample of returns to confirm its initial findings. The government drew 24 returns from the universe of 1,000 returns that contained Schedule C losses. n1 Then, investigators analyzed these returns and found that every single one had large Schedule C losses that were vague, undocumented, and suspicious. That is, these returns exhibited the same pattern questionable Schedule C descriptions as the non-random sample of returns that were investigated at trial.
   n1 Specifically, the investigators alphabetized the returns by the first name of the taxpayer, then drew one out of every fifty returns. This technique passes muster, though it is not perfect. Mr. Ukwu is Nigerian, and many of his clients were Nigerian immigrants. If these immigrants were more likely to have the same first name, or the same first letter of their first name, and if Mr. Ukwu was more likely to file false returns on immigrants' forms, as the district court suggested, then the sampling technique would be problematic. However, given the burden of proof -- simply a preponderance of the evidence -- it is more likely than not that this issue was not so grave that it affected the outcome of the sentencing calculation. Thus, while this technique does not warrant reversal here, future sentencing courts should be wary of accepting at face value that a randomization technique is truly random. 
In sum, the government analyzed a non-random sample of returns at trial and found that 90% of the Schedule C losses were entirely false. Then, investigators used a random sample to confirm this estimate, reasoning that since the random sample bore the same patterns as the non-random sample, the two samples likely contained similar levels of fraud. That is, since the random sample looked like the non-random one, and since 90% of returns in the non-random sample were completely false, then 90% of the random sample was also likely to be completely false. 
Finally, the government used this 90% number to calculate Mr. Ukwu's tax loss estimate. The investigators could establish that among the 1000 returns where a Schedule C loss was claimed, Mr. Ukwu claimed roughly $16.4 million in Schedule C losses. If 90% of these losses were entirely fabricated, then this means that roughly $14.6 million of false losses were claimed. Assuming the lowest marginal tax rate of 10%, and factoring in state tax losses, the estimated tax loss was roughly $2.1 million. Because this estimate is between $1 million and $2.5 million, the district court concluded that Mr. Ukwu merited a base offense level of 22. U.S.S.G. § 2T4.1. 
Mr. Ukwu takes issue with several methodological moves made by the government in reaching its $2.1 million estimate. First, he argues that the samples used were too small. Second, he argues that it was error to rely on the non-random sample of returns. Third, he argues that the government never established that the $14.6 million in Schedule C losses were totally fraudulent, rather than partially fraudulent.
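
For readers who want to follow the arithmetic, the chain of numbers quoted above can be sketched roughly as follows. All inputs come from the opinion as quoted; the one thing the opinion does not break out is the state tax component, so the sketch simply treats it as whatever is left after the federal piece. That remainder is my assumption, not a figure from the Court.

```python
# Figures quoted in the opinion.
false_returns, trial_sample = 16, 18
claimed_schedule_c_losses = 16_400_000   # across roughly 1,000 returns
federal_marginal_rate = 0.10             # lowest bracket, per the opinion
reported_total_loss = 2_100_000          # federal plus state, per the opinion

fraud_share = false_returns / trial_sample               # about 0.89
false_losses = claimed_schedule_c_losses * fraud_share   # about $14.6 million
federal_loss = false_losses * federal_marginal_rate      # about $1.46 million

# ASSUMPTION: the opinion does not state the state-tax component separately;
# this is only the remainder after subtracting the federal piece.
implied_state_loss = reported_total_loss - federal_loss

print(f"fraud share:         {fraud_share:.2%}")
print(f"false losses:        ${false_losses:,.0f}")
print(f"federal tax loss:    ${federal_loss:,.0f}")
print(f"implied state piece: ${implied_state_loss:,.0f}")
```

Note that the federal piece alone, at the 10% rate, already falls inside the $1 million to $2.5 million range the Court refers to.
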
The Court then addresses Ukwu's arguments:

1. The Sample Size
As a preliminary matter, we can reject with ease Mr. Ukwu's argument that the government's samples were too small to make a robust inference about the universe as a whole. His argument has intuitive appeal -- how can 24 cases tell us about 1000? But Mr. Ukwu's claim that small sample sizes render estimates useless is statistically incorrect. See David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence 83, 126 n.145 (2d ed. 2000) ("Analyzing data from small samples may require more stringent assumptions, but there is no fundamental difference in" how we make statistical inferences in small versus large samples). Certainly, a larger sample size is preferable, since it decreases the odds that one's sample will be misleading. n2 See Joseph Sanders, The Bendectin Litigation: A Case Study in the Life Cycle of Mass Torts, 43 Hastings L.J. 301, 342-43 (1992). However, even very small samples can be useful, as any political polling agency can attest: in many elections, a sample of 1,000 Americans can show, with enough certainty to satisfy the preponderance of the evidence standard, what is likely to happen in an election involving over 100 million voters. See Nate Silver, The Signal and the Noise 63 fig.2-4 (2012). While 24 is a relatively small sample, it amounts to 2% of the entire universe. This sample size does not paralyze us in our attempts to make inferences about the universe of all cases. See United States v. Littrice, 666 F.3d 1053, 1061 ("[R]equiring the government to go through all the needles in the haystack of materially fraudulent and false returns . . . would place a burden on the government beyond what the preponderance standard requires."). As any chef or statistician can attest, even a small spoonful of sauce can indicate how much salt to add.
   n2 Specifically, statisticians teach that larger sample sizes can cut down on two types of error. First, there is the possibility that Mr. Ukwu committed rampant corruption, but by chance, we end up with a sample of cases where he did nothing wrong. Sanders, Bendectin, supra, at 342-43. Second, there is the possibility that Mr. Ukwu committed almost no corruption, but we happen to end up with a sample of cases in which he appears to fudge numbers constantly. Id. A larger sample size decreases the chance of both false negatives and false positives. Id.
JAT Comments on the Sample Size.

  1. I am not a statistics expert, but I am suspicious about the Court's rejection of Ukwu's argument that a sample of 24 was not enough, even setting aside whether it was random (addressed below).  The Court makes the correct claim that very small samples can be useful, citing political statistics and citing Silver.  That's true, but the number must be statistically valid.  That does not inform us whether 2% is a valid sample.  One cannot draw any indication from the opinion that any statistically valid analysis of the proper sample size was undertaken or tested.
  2. I have been unable to test the inference the Court makes from the citation to Silver's book.  The opinion cites Silver's book for this proposition:  "However, even very small samples can be useful, as any political polling agency can attest: in many elections, a sample of 1,000 Americans can show, with enough certainty to satisfy the preponderance of the evidence standard, what is likely to happen in an election involving over 100 million voters." I have reviewed the figure and the surrounding text and am not sure it stands for the proposition cited.  The figure cited is a table on the probability of Senate Candidate Winning, Based on Size of Lead in Polling.  And it certainly does not address the point I make in paragraph 1 above.  In any event, the Court's point is correct, whether or not confirmed by the citation, but hardly conclusive here.
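
To illustrate the kind of sample-size analysis I have in mind in paragraph 1, here is a minimal sketch using a standard textbook formula (Cochran's formula with a finite population correction). The 95% confidence level, the worst-case assumption that the true proportion is 50%, and the function name are all my assumptions; nothing in the opinion indicates that the government or the district court ran this or any similar calculation.

```python
import math

def required_sample_size(population, margin, p=0.5, z=1.96):
    """Sample size needed for a given margin of error (Cochran's formula),
    adjusted for a finite population. Assumes simple random sampling."""
    n0 = z**2 * p * (1 - p) / margin**2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# Universe of roughly 1,000 returns, per the opinion.
for margin in (0.05, 0.10, 0.20):
    n = required_sample_size(1000, margin)
    print(f"margin of +/-{margin:.0%}: sample about {n} returns")
```

Under those assumptions, a sample of about 24 out of 1,000 supports a margin of error of only about 20 percentage points either way, while a 5-point margin would call for a sample in the high 200s. That is a long way from saying 24 is useless, but it is also a long way from a precise estimate.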
2. Nonrandom Sample
Mr. Ukwu's next argument is that the government's estimate was erroneous because it relied on a non-random sample, but this argument is similarly unavailing. He cites to Mehta, in which we questioned a district court's use of a non-random sample to estimate the amount of tax loss among a broader universe of returns. 594 F.3d 277 (4th Cir. 2010) [here]. In Mehta, the government analyzed a sample of returns that were chosen because they had been audited by the IRS. Id. at 282-83. It calculated the average tax loss among these returns to be $1,531 and then concluded that the entire universe of returns would have a similar average tax loss. Id. This was problematic because the returns in the sample were flagged by the IRS specifically because they were more likely to contain tax losses. Id. As such, the average amount of tax loss among this sample was misleading: the broader universe of returns was likely to have a lower average tax loss. Id. The sentencing court's tax estimate was like using a group of NBA players to estimate the average height of all Americans. 
Mr. Ukwu is correct that the initial, non-random sample used in this case is a problematic tool to make inferences about the amount of tax loss for the broader universe of returns. The returns chosen for the non-random sample were chosen specifically because they had higher tax losses. It could be that the amount of fraud in these returns was higher than for the entire universe of returns, so relying on the non-random sample alone would be problematic. However, the government's tax loss estimate was based on more than a non-random sample. The government went out of its way to collect a random sample of returns to bolster its initial estimate. It compared this random sample to the original, non-random sample, and the government concluded that both groups of returns contained the same pattern of suspicious, unexplained tax losses. Though the government's original estimate is based on a non-random sample, the government cleansed this error with the use of a random sample. Thus, the district court did not make the sort of mistake identified in Mehta, and as such, it did not commit plain error. See Olano, 507 U.S. at 734 (1993) ("'Plain' is synonymous with 'clear' or, equivalently, 'obvious.'").
JAT Comments on the Nonrandom Sample Argument.

  1. The Court dismisses the argument on the notion that the second sample of 24 was random.  That does not address the sample size issue above, nor does it really tell us the basis for the conclusion that the 24 chosen were random.  It simply says it was random.
  2. The Court was entirely correct in concluding that the first sample was nonrandom, based on the same analysis as Mehta.
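
For what it is worth, the procedure described in the Court's footnote 1 (alphabetize by first name, then take one of every fifty) is what statisticians call systematic sampling. A minimal sketch follows. The placeholder names, the random starting point, and the function name are hypothetical; the opinion does not say how the starting point was chosen.

```python
import random

def systematic_sample(items, step, seed=None):
    """Sort the items, then take every `step`-th one from a random start."""
    ordered = sorted(items)
    start = random.Random(seed).randrange(step)  # ASSUMPTION: random start
    return ordered[start::step]

# Hypothetical stand-ins for the taxpayers' first names.
names = [f"taxpayer_{i:04d}" for i in range(1000)]
sample = systematic_sample(names, step=50, seed=0)
print(len(sample), sample[:3])
```

One in fifty from exactly 1,000 returns yields 20, not 24, so the universe was presumably somewhat larger than the "roughly 1,000" the Court mentions. More to the point, a sample drawn this way is only as random as the ordering is unrelated to the thing being measured, which is exactly the concern the Court raises about first names and then sets aside.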
3. Is the Inference Justified by the Sampling?

The Court earlier described this argument as:  "Third, he argues that the government never established that the $14.6 million in Schedule C losses were totally fraudulent, rather than partially fraudulent."  This would go to the validity of the amount of tax loss inferred into the universe identified rather than just the number of returns that might have a tax loss of some amount.  So, addressing this point:
Mr. Ukwu's final argument is most challenging. He admits that the non-random sample contains 90% falsehoods. He admits that the random sample looks similar to the non-random sample. However, he argues that this similarity alone fails to prove that in the random sample, all of the unexplained Schedule C losses were due to criminality. Instead, these losses might have been exaggerated instead of false, or due to negligence instead of fraud. Mr. Ukwu points to a Seventh Circuit case in which that court expressed skepticism of a similar methodology. United States v. Schroeder, 536 F.3d 746, 754-55 (7th Cir. 2008). 
Mr. Ukwu's argument fails because the government need only make a reasonable estimate of the tax loss, and the methodology here, though imperfect, meets that standard. U.S.S.G. § 2T1.1 cmt. 1; Mehta, 594 F.3d at 282. In the eighteen tax returns investigated at trial, the Schedule C forms Mr. Ukwu prepared exhibited a suspicious pattern. Many returns claimed that the taxpayer worked as a contractor for Mary Kay or worked in "Nursing Services," but at trial, the taxpayers testified that they never worked for Mary Kay and never owned such health care businesses. These returns also contained a suspicious pattern of receipts and expenses. The invented businesses often had revenues that were low or non-existent. Nearly all expenses were low or non-existent. Labor costs, meanwhile, were enormous. 
The government's random sample of tax returns exhibited a similar or identical pattern. Many of the returns listed Mary Kay as a profession; many more listed nursing services. One return even listed "General Services" as the profession. In the random sample, as in the non-random sample, the businesses almost always claimed to have zero sales, zero expenses, but enormous labor costs. Given these similarities, the sentencing court made no plain error when it concluded that, just like the returns analyzed at trial, the random sample of returns contained business losses that were entirely fabricated. See Olano, 507 U.S. at 734 ("'Plain' is synonymous with 'clear' or, equivalently, 'obvious.'"). 
Further, Mr. Ukwu's reliance on Schroeder is misguided. In that case, the government used a similar argument to make a tax estimate: it found strong evidence of fraud in sample A, found a similar pattern of losses in sample B, and concluded that sample B was therefore likely to contain fraud. 536 F.3d at 754-55. The Seventh Circuit expressed skepticism of this methodology. Id. at 755. However, the court's reversal in that case was based not on the sampling methodology but rather on fundamental legal errors made by the sentencing court. Id. at 755. The district court in that case applied the wrong burden of proof, apparently concluding "that if evidence is admissible it proves the truth of the proposition for which it is being offered." Id. Instead of requiring the government to prove a tax loss by a preponderance of the evidence, the sentencing court accepted the government's estimate without any analysis, concluding that as long as the evidence was reliable, the tax loss had been proven. Id. Here, meanwhile, the sentencing court conducted a careful analysis of the evidence. It noted potential shortcomings in the methodology but concluded that the estimate was more likely than not to be accurate or significantly lower than the true tax loss. Thus, Schroeder is inapposite. Though the government's methods were not perfect, its tax loss estimate was reasonable. Further, unlike in Schroeder, the district court's analysis was careful and legally sound. This is all that is required under the Sentencing Guidelines. U.S.S.G. § 2T1.1 cmt. 1; Mehta, 594 F.3d at 282.
JAT Comments on this point:

  1. I cannot make any points worthy of further time by readers.  I do note that I am not persuaded by the Court's attempt to distinguish Schroeder, which may be reviewed here.
Harmless Error

Well, and anyway, the Court concludes with the standard dodge when it is not fully comfortable with what it just said: the errors, if errors, were "harmless."  How so?
Finally, even if Mr. Ukwu is correct that the tax loss estimate has methodological shortcomings, these errors were harmless and therefore did not affect his substantial rights. Slade, 631 F.3d at 190. The government estimated a tax loss of $2.1 million. Mr. Ukwu argues that it is possible that most of the claimed Schedule C losses were not criminal, but instead were legitimate losses, or at least negligent ones. For example, a client might have had $1,000 in legitimate business losses, but Mr. Ukwu might have pumped the number up to $2,000. 
Mr. Ukwu might be correct, but the $2.1 million estimate is so conservative that even if he is right, the total tax losses are still likely to be above $1 million, which is the level of loss that is necessary for his sentencing range. U.S.S.G. § 2T4.1. First, in addition to false Schedule C losses, Mr. Ukwu used false charitable deductions on his clients' returns, and none of these deductions were counted towards the $2.1 million figure. In one case, Mr. Ukwu claimed a $10,000 charitable gift that was entirely fabricated, suggesting that his Schedule A fraud might be significant. Similarly, the $2.1 million figure also excludes the fraud Mr. Ukwu committed on his own tax returns, which amount to roughly $100,000. 
Further, the court's estimate only looked at Mr. Ukwu's returns from 2006 to 2008. He continued to prepare tax returns in 2009 and for part of 2010, and none of these returns were factored in to the tax loss estimate. Factoring in Mr. Ukwu's 2009 returns increases the estimated loss to roughly $3 million. 
Most importantly, the $2.1 million figure was calculated by applying a 10% marginal tax rate to the entire universe of returns. This is likely a gross underestimate of the true tax liability, since many of the returns were likely to have been subject to a 25% marginal tax rate or higher. This alone could increase the estimated tax loss by more than two-fold. In sum, even if Mr. Ukwu's arguments are valid, his estimated tax losses are more likely than not to be well over $1 million. As such, the district court's alleged error did not affect his substantial rights.
Note that the Court apparently speculates about the marginal tax rate.  That may be right, but it appears to be speculation.
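
How much the marginal rate matters can be gauged with a rough break-even calculation: what share of the $16.4 million in claimed Schedule C losses would have to be false for the federal tax loss alone to reach the $1 million threshold? The sketch below uses only the figures quoted from the opinion and deliberately ignores state tax and the other items the Court mentions (charitable deductions, Ukwu's own returns, later years). The function name is mine.

```python
claimed_losses = 16_400_000  # Schedule C losses claimed, per the opinion
threshold = 1_000_000        # bottom of the range for offense level 22

def breakeven_false_share(rate):
    """Share of claimed losses that must be false for the federal tax loss
    alone to reach the threshold at the given marginal rate."""
    return threshold / (claimed_losses * rate)

# The two marginal rates the Court mentions.
for rate in (0.10, 0.25):
    share = breakeven_false_share(rate)
    print(f"at a {rate:.0%} rate, about {share:.0%} of claimed losses must be false")
```

At the 10% rate the government needs about 61% of the claimed losses to be false; at 25% it needs only about 24%. So the choice of rate does a great deal of work in the harmless error analysis, which is why it would be nice to have something more than speculation behind it.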

For a prior blog on statistical sampling and inference in determining tax loss, see Court of Appeals Acts on Its Hunch re Flawed Sentencing Tax Loss Estimate Is Harmless (Federal Tax Crimes Blog 2/3/10), here (discussing the Fourth Circuit's decision in Mehta, cited by the Court above).


  1. I am stunned that with so much at stake the defendant apparently did not have an expert witness. (A college math professor would have had the appropriate credentials, though he would have presented facts that any student of Introductory Statistics ought to know in order to pass the course.) Specifically, it is sample size, rather than sample size as a percentage of population, that is of primary importance. Sampling 1,000 out of 1 million voters (1/10th of 1% of the entire population) gives about the same confidence level as sampling 1,000 out of 50,000 (2% of the population, or a percentage twenty times as large). To put it another way, if you were to sample 50 out of a population of 50,000 (1/10th of 1%), you would have a far lower confidence level than in the first example.

  2. For you statistics aficionados, please stay tuned to how the Government proves tax evasion when the tax allegedly evaded is not the defendants' taxes -- the defendants are alleged enablers in the SDNY trilogy of Larson/Pfaff, Coplan and Daugerdas. You might check the instructions that I put in a blog today. One of the issues I will address in a later blog is how, from the limited sample size of evidence submitted, a jury can infer beyond a reasonable doubt that the "relevant taxpayers" did not have sufficient economic substance which required a subjective inquiry into their state of mind (their profit motive). This is really weird stuff. So, statistics guys, please come back and check the later blogs.

    Jack Townsend
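
The first comment's point about sample size versus percentage of population can be checked with a short sketch. It computes the worst-case margin of error at a fixed 95% confidence level for the three scenarios the commenter describes, including the finite population correction. The function name and the 95% and 50% assumptions are illustrative only; strictly speaking, what changes across the scenarios is the margin of error rather than the confidence level.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a simple random sample of size n,
    with the finite population correction."""
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

# (sample size, population) pairs from the comment.
scenarios = [(1000, 1_000_000), (1000, 50_000), (50, 50_000)]
margins = {s: margin_of_error(*s) for s in scenarios}

for (n, population), m in margins.items():
    print(f"{n:>5} of {population:>9,}: margin about +/-{m:.1%}")
```

The two samples of 1,000 come out at nearly the same margin (about 3 points) even though one is twenty times larger as a percentage of its population, while the sample of 50 has a margin of roughly 14 points.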

