FIR #491: Deloitte’s AI Verification Failures
Big Four consulting firm Deloitte submitted two costly reports to two governments on opposite sides of the globe, each containing fabricated references generated by AI. Deloitte isn’t alone. A study published on the website of the U.S. Centers for Disease Control (CDC) not only included AI-hallucinated citations but also purported to reach the exact opposite conclusion from the real scientists’ research. In this short midweek episode, Neville and Shel reiterate the importance of a competent human in the loop to verify every fact produced in any output that leverages generative AI.
Links from this episode:
- Deloitte was caught using AI in $290,000 report to help the Australian government crack down on welfare after a researcher flagged hallucinations
- Deloitte allegedly cited AI-generated research in a million-dollar report for a Canadian provincial government
- Deloitte breaks silence on N.L. healthcare report
- Deloitte Detected Using Fake AI Citations in $1 Million Report
- Deloitte makes ‘AI mistake’ again, this time in report for Canadian government; here’s what went wrong
- CDC Report on Vaccines and Autism Caught Citing Hallucinated Study That Does Not Exist
The next monthly, long-form episode of FIR will drop on Monday, December 29.
We host a Communicators Zoom Chat most Thursdays at 1 p.m. ET. To obtain the credentials needed to participate, contact Shel or Neville directly, request them in our Facebook group, or email [email protected].
Special thanks to Jay Moonah for the opening and closing music.
You can find the stories from which Shel’s FIR content is selected at Shel’s Link Blog. You can catch up with both co-hosts on Neville’s blog and Shel’s blog.
Disclaimer: The opinions expressed in this podcast are Shel’s and Neville’s and do not reflect the views of their employers and/or clients.
Raw Transcript:
Neville Hobson: Hi everybody and welcome to For Immediate Release. This is episode 491. I’m Neville Hobson.
Shel Holtz: And I’m Shel Holtz, and I want to return to a theme we addressed some time ago: the need for organizations, and in particular communication functions, to add professional fact verification to their workflows—even if it means hiring somebody specifically to fill that role. We’ve spent the better part of three years extolling the transformative power of generative AI. We know it can streamline workflows, spark creativity, and summarize mountains of data.
But if recent events have taught us anything, it’s that this technology has a dangerous alter ego. For all that AI can do that we value, it is also a very confident liar. When communications professionals, consultants, and government officials hand over the reins to AI without checking its work, the result is embarrassing, sure, but it’s also a direct hit to credibility and, increasingly, the bottom line.
Nowhere is this clearer than in the recent stumbles by one of the world’s most prestigious consulting firms. The Big Four accounting firms are often held up as the gold standard for diligence. Yet just a few days ago, news broke that Deloitte Canada delivered a report to the government of Newfoundland and Labrador that was riddled with errors that are characteristic of generative AI. This report, a massive 526-page document advising on the province’s healthcare system, came with a price tag of nearly $1.6 million. It was meant to guide critical decisions on virtual care and nurse retention during a staffing crisis.
But when an investigation by The Independent, a progressive news outlet in the province, dug into the footnotes, the veneer of expertise crumbled. The report contained false citations pulled from made-up academic papers. It cited real researchers as authors of papers they hadn’t worked on. It even listed fictional papers co-authored by researchers who said they had never actually worked together. One adjunct professor, Gail Tomblin Murphy, found herself cited in a paper that doesn’t exist. Her assessment was blunt: “It sounds like if you’re coming up with things like this, they may be pretty heavily using AI to generate work.” Deloitte’s response was to claim that AI wasn’t used to write the report, but was—and this is a quote—”selectively used to support a small number of research citations.” In other words, they let AI do the fact-checking, and the AI failed.
Amazingly, Deloitte was caught doing something just like this only months before the Canadian revelation, in a report for the Australian government. Deloitte Australia had to issue a humiliating correction to a report on welfare compliance that cited court cases that didn’t exist and contained quotes from a federal court judge that had never been spoken. In that instance, Deloitte admitted to using the Azure OpenAI tool to help draft the report. The firm agreed to refund the Australian government nearly 290,000 Australian dollars.
This isn’t an isolated incident of a junior copywriter using ChatGPT to phone in a blog post. This is a pattern involving a major consultancy submitting government audits in two different hemispheres. The lesson is pretty stark: The logo on your letterhead isn’t going to protect you if the content is fiction. In fact, this could have long-term repercussions for the Deloitte brand.
But it doesn’t stop at consulting firms. Here in the US, we’ve seen similar failures in the public sector. The Make America Healthy Again (MAHA) commission had already released a report containing non-existent study citations, and now a presentation on the CDC website—that’s the Centers for Disease Control—has cited a fake autism study that contradicted the real scientists’ actual findings.
The common thread here is a fundamental misunderstanding of the tool. For years, the mantra in our industry was a parroting of the old Ronald Reagan line: “Trust but verify.” When it comes to AI though, we just need to drop that “trust” part. It’s just verify. We have to remember that large language models are designed to predict the next plausible word, not to retrieve facts. When Deloitte’s AI invented a research paper or a court case, it wasn’t malfunctioning. It was doing exactly what it was trained to do: tell a convincing story.
And that brings us to the concept of the human in the loop. This phrase gets thrown around a lot in policy documents as a safety net, but these cases prove that having a human involved isn’t enough. You need a competent human in the loop. Deloitte’s Canadian report undoubtedly went through internal reviews. The Australian report surely passed across several desks. The failure here wasn’t just technological, it was a failure of human diligence. If you’re using AI to write content that relies on facts, data, or citations, you can’t simply be an editor. You must be a fact-checker.
Deloitte didn’t just lose money on refunds or take a reputational hit; they lost the presumption of competence. For those of us in PR and corporate communications, we’re the guardians of our organization’s truth. If we allow AI-generated confabulations to slip into our press releases, earnings statements, annual reports, or white papers, we erode the very foundation of our profession. Communicators need to update their AI policies. Make it explicit that no AI-generated fact, quote, or citation can be published without primary source verification. And you need to make sure that you have the human resources to achieve that. The cost of skipping that step, trust me, is a lot higher than a subscription to ChatGPT.
Neville Hobson: It’s quite a story, isn’t it, really? I think you kind of get exasperated when we talk about something like this, because we’ve talked about it quite a bit. Most recently, in our interview with Josh Bernoff—which will be coming in the next day or so—this very topic came up in discussion: fact-checking versus skipping the verification.
I suppose you could cut through all the preamble about the technology and all this stuff; the issue isn’t the technology, it’s the humans involved. Now, we don’t know more than what’s in the Fortune article, the one I’ve seen in Entrepreneur magazine, and the link that you shared. None of them discloses much detail about exactly what happened beyond the citations. So we don’t know: was it prompted badly, or what? Either way, someone didn’t check something. I don’t know how much more you need to hammer home the point that if you don’t verify what the AI assistant has produced in response to your input, you’re just asking for this kind of trouble.
I did something just this morning, funnily enough, when I was doing some research. The question I asked came back with three comments linking to the sources. A bit like Josh—because Josh mentioned this in our interview—every instruction to your AI should include: “Do not come back with anything unless you’ve got a source.” And so I checked the sources, one of which just did not exist. The document in question, supposedly on the website of a reputable media company, wasn’t there. Now, it could be that someone had moved it, or it did exist but in another location. But the trouble is, when these things happen, you tend to fall on the side of, “Look, they didn’t do this properly.”
So I’m not sure what I can add to the story, Shel, frankly. Your remarks towards the end were right: your reputation is the thing that’s going to get hit. You look stupid. You really do. And your credibility suffers.
I found that Entrepreneur quoted a Deloitte spokesperson saying, “Deloitte Canada firmly stands behind the recommendations put forward in our report.” Excuse me? Where’s your humility there? Because you’ve been caught out doing something here. And they’re saying, “We’re revising it to make a small number of citation corrections which do not impact the report finding.” What arrogance they’re displaying there. Not anything about an apology—or fine, let’s say they don’t need to apologize—but a more credible explainer that at least gives the sense that they empathize here, rather than this arrogant, “Well, we stand by it.” It’s just a little citation? It’s actually a big deal when you cite something that either doesn’t exist or is a fake document. So I don’t know what I can say to add anything more. But if they keep doing this, they’re going to lose business big time, I would say.
Shel Holtz: It didn’t exist. Yeah, I understand their desire to stand by the report. I have no doubt that they had valid information and made valid recommendations, but that’s hardly the point. The inaccuracies call all of the report into question, even if at the end of the day they can demonstrate that they used appropriate protocols and methodologies to develop their recommendations based on accurate information.
You still have this lingering question: “Well, you got this wrong, what else did you get wrong? What else did you turn over to AI that you’re not telling us about because you didn’t get caught?” Even if they didn’t do any of that, those questions are there for the people who paid for this report. If I were representing a government that needed this kind of work, first of all, I would be hesitant to reach out to Deloitte. I would be looking at one of their competitors.
If I had a long-standing relationship with Deloitte, and even if I had a high degree of trust in Deloitte, I would still add a rider to the contract that says either you will not use AI in the creation of this report, or if you do, you will verify each citation and you will refund us X dollars of the cost of this report for each inaccurate or invalid citation that you submit. I’d want to cover my ass if I were a client, based on their having done this not once, but twice.
Neville Hobson: Right. I wonder what would have happened if the spokesman at Deloitte Canada had said something like, “You’re absolutely right. We’re sorry. We screwed up big time there. We made a mistake. Here’s what happened. We’ve identified where the fault lay, it’s ours, and we’re sorry. And we’re going to make sure this doesn’t happen again.”
Shel Holtz: “Here’s how we’re going to make sure it doesn’t happen again.” Yeah, I mean, this is like any crisis. You want to tell people what you’re going to do to make sure it doesn’t happen again.
Neville Hobson: Yeah, exactly. So they say—and you mentioned—”AI was not used to write the report, it was selectively used to support a small number of research citations.” What does that mean, for God’s sake? That’s kind of corporate bullshit talk, frankly. So they use the AI to check the research citations? Well, they didn’t, did they? “Selectively used to support a small number of research citations…” I don’t know what that even means.
So I don’t think they’ve done themselves any favors with the way they’ve denied this, and the way the reporting has spread out into a variety of other media, all basically saying the same thing: they did this work for this client and it was bad. They didn’t do a good job at all.
Shel Holtz: Yeah. So, I’m, as you know, finishing up work on a book on internal communications. It was originally 28 blog posts and I started this back in, I think, 2015. So a lot of the case studies have gotten old. So I did some research on new case studies and I used AI to find the case studies. And then I said, “Okay, now I need you to give me the links to sources that I can cite in the end notes of each chapter that verify this information.”
In a number of cases, it took me to 404s on legitimate websites—Inc, Fortune, Forbes, and the like. But the story wasn’t there and a search for it didn’t produce it. And I would have to go back and say, “Okay, that link didn’t work. Show me some that are verified.” And sometimes it took two, three, four shots before I got to one where I look and say, “It’s a credible source, it’s a national or global business publication or the Financial Times or what have you, the article is here and the article validates what was in the case study,” and that’s the one I would use. But it takes time, and I think any organization that doesn’t have somebody doing that runs the risk of the credibility hit that Deloitte’s facing.
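To make the workflow Shel describes a little more concrete: part of it can be automated before the human review step. The sketch below is purely illustrative and not anything either host actually uses; it assumes the Python requests library, and the URLs, expected phrases, and function name are hypothetical. It only flags links that are dead, return an error, or don’t contain an expected phrase. A person still has to read every surviving page and confirm it genuinely supports the claim being cited.

```python
# Minimal sketch: flag AI-supplied citation links that are dead or that
# don't appear to mention the claim they're supposed to support.
# Hypothetical example only; a human reviewer still reads each page.
import requests

# Hypothetical citations: (url, phrase the cited page should contain)
citations = [
    ("https://example.com/case-study", "internal communications"),
    ("https://example.com/missing-article", "nurse retention"),
]

def check_citation(url, expected_phrase, timeout=10):
    """Return a short status string for one cited URL."""
    try:
        resp = requests.get(url, timeout=timeout)
    except requests.RequestException as exc:
        return f"UNREACHABLE ({exc.__class__.__name__})"
    if resp.status_code == 404:
        return "404 - page does not exist"
    if resp.status_code != 200:
        return f"HTTP {resp.status_code} - needs manual review"
    if expected_phrase.lower() not in resp.text.lower():
        return "LIVE, but expected phrase not found - verify by hand"
    return "LIVE and phrase found - still read it before citing"

for url, phrase in citations:
    print(f"{url}: {check_citation(url, phrase)}")
```

At best, a script like this catches the 404s and obvious mismatches quickly; the judgment call about whether a source actually validates a case study remains the competent human’s job.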
Neville Hobson: Yeah, I mean, this story is probably not going to be front-page headlines everywhere at all. But it hasn’t kind of died yet. Maybe there’s going to be more in professional journals later on about this. But I wonder what they’re planning next on this because the criticisms aren’t going away, it seems to me.
Shel Holtz: No, and as the report noted, it’s not just the Deloittes of the world. It’s Robert F. Kennedy’s Department of Health and Human Services justifying their advisory board’s decisions to rewrite the rules on vaccinations based on citations that not only don’t exist, but that contradict the actual research that the scientists produced.
Neville Hobson: Well, there is a difference there though. That’s run by crazy people. I mean, Deloitte’s not run by crazy people.
Shel Holtz: Not as far as I know. That’s true. And that’ll be a 30 for this episode of For Immediate Release.