From Air Canada's chatbot inventing a refund policy to a deepfake video call that led a finance worker to authorize $25M in transfers, generative AI failures are no longer hypothetical. They are documented, litigated, and expensive. The question is not whether your AI will make mistakes. It is whether your governance can catch them before they reach the customer.
Text, audio, image, and video AI systems have all failed publicly in the last three years. The failures share a common root cause: a system or process went into production without the governance controls that would have caught the error before it caused harm.
2024: Air Canada's chatbot invented a refund policy. A tribunal held the airline liable.
2024: An AI-generated voice impersonating President Biden was used in robocalls. FCC enforcement followed.
2024: Google paused Gemini's image generation of people after historically inaccurate outputs.
2024: A deepfake video call impersonated executives, and a finance worker authorized $25M in transfers.
Customer-facing chatbots have hallucinated refund policies, sworn at customers, advised businesses to break the law, and provided harmful health guidance. In each case, the deploying organization, not the model vendor, answered for the output.
Customer relied on Air Canada's chatbot response about bereavement fares. Tribunal held the airline responsible for the misinformation.
Customer prompted support chatbot into profanity and insults. DPD disabled it after incident went viral.
NYC's MyCity business chatbot gave incorrect guidance, including advice that would have led businesses to break the law.
When LLMs are used for legal research, compliance advice, or decision support, the failure mode shifts from embarrassment to liability. Fabricated citations have been filed in court, confidential data has been pasted into public chatbots, and AI-assisted code has introduced security weaknesses.
Legal filing included case citations and quotations fabricated by ChatGPT. Court sanctioned the lawyers involved.
Reports of employees pasting confidential code and data into ChatGPT triggered internal bans at Samsung and others.
Studies report AI assistance can increase vulnerability rates and reduce secure-by-default behavior, especially for novices.
Voice cloning and deepfake video have moved from research curiosity to operational threat, and voice AI still stumbles in ordinary operations. McDonald's ended its drive-thru voice AI pilot after ordering errors. Deepfake video calls have led employees to authorize millions in fraudulent transfers.
Voice AI test ended after inconsistent ordering performance. Speech-to-intent struggles with accents, background noise, and ambiguous requests.
AI-generated voice impersonating President Biden used in robocalls. FCC enforcement actions followed.
Fraudsters used deepfake video to impersonate executives in calls, leading to unauthorized transfers.
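Speech-to-intent errors like the ordering failures above are easier to contain when the system refuses to guess. The sketch below is a minimal illustration, assuming a recognizer that returns candidate intents with confidence scores: a confident, unambiguous reading is executed, a near-tie is read back to the customer for confirmation, and anything low-confidence goes to a person. `IntentCandidate`, `route_intent`, the intent labels, and both thresholds are hypothetical, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class IntentCandidate:
    intent: str        # hypothetical label format, e.g. "add_item:large_fries"
    confidence: float  # recognizer score in [0, 1]


def route_intent(candidates, min_confidence=0.85, min_margin=0.15):
    """Return (action, intent) where action is "execute", "confirm", or "handoff".

    Thresholds are illustrative assumptions and would need tuning
    against real call data.
    """
    if not candidates:
        return ("handoff", None)
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    top = ranked[0]
    if top.confidence < min_confidence:
        # Noise, accent, or out-of-vocabulary speech: do not guess.
        return ("handoff", top.intent)
    if len(ranked) > 1 and top.confidence - ranked[1].confidence < min_margin:
        # Two readings are nearly tied: ask the customer to confirm.
        return ("confirm", top.intent)
    return ("execute", top.intent)
```

For example, candidates scored 0.93 and 0.40 clear both checks and execute, while 0.90 and 0.82 are close enough to trigger a confirmation. The design point is that ambiguity becomes a handoff instead of a wrong order.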
AI-generated images have caused brand damage when historically inaccurate outputs went viral, and operational chaos when marketing assets promised experiences that could not be delivered.
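The underlying failure mechanisms (hallucination presented as policy, weak provenance, over-automation, ambiguity at the edges, prompt injection, data leakage, and model drift) are portable across text, audio, image, and video. If your governance does not address each one, the failure is a matter of when, not if.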
Each control maps to one or more failure mechanisms. Together, they form the minimum viable governance for any production AI system.
| Control | What it means | Failure it addresses |
|---|---|---|
| System-of-record rule | Model never invents policy; only quotes approved sources | Hallucination as policy |
| Evidence-first UX | Show citations, snippets, last-updated date; allow "open source" | Weak provenance |
| Tiered automation | Low-risk automated; high-risk requires human review | Over-automation |
| Eval gates | Offline eval sets, adversarial tests, red-teaming, phased rollout | Ambiguity at edges |
| Conversation firebreaks | Detect jailbreaking, isolate tools, enforce allow-lists | Prompt injection |
| Human fallback | Explicit escalation path, SLAs, ownership, incident response | Over-automation |
| Audit logging | Who/what/when, prompts, sources, actions, retention policy | Data leakage |
| Change management | Model/version pinning, regression tests, release notes | Model drift |
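Three of these controls (tiered automation, conversation firebreaks, and audit logging) can meet at a single choke point: every tool call the model proposes. The sketch below is a minimal illustration, assuming the assistant can act only through named tools: calls not on an allow-list are blocked, high-risk calls are queued for human review, low-risk calls proceed, and every decision is logged with the pinned model version. The tool names, risk tiers, `AuditLog`, and `gate_tool_call` are hypothetical; a production log would also capture prompts, retrieved sources, and a retention policy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    AUTOMATE = "automate"
    HUMAN_REVIEW = "human_review"
    BLOCK = "block"


# Hypothetical allow-list: the only tools this assistant may call, each
# tagged with a risk tier. Any tool not listed here is blocked.
TOOL_ALLOW_LIST = {
    "lookup_order_status": "low",
    "send_policy_link": "low",
    "issue_refund": "high",
    "change_account_email": "high",
}


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, actor, tool, decision, model_version):
        # Who/what/when, plus the pinned model version that proposed the action.
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "tool": tool,
            "decision": decision.value,
            "model_version": model_version,
        })


def gate_tool_call(tool, actor, model_version, log):
    """Route a model-proposed tool call by risk tier and record the outcome."""
    tier = TOOL_ALLOW_LIST.get(tool)
    if tier is None:
        decision = Decision.BLOCK         # firebreak: not on the allow-list
    elif tier == "high":
        decision = Decision.HUMAN_REVIEW  # high risk: queue for a person
    else:
        decision = Decision.AUTOMATE      # low risk: proceed automatically
    log.record(actor, tool, decision, model_version)
    return decision
```

Because the allow-list is a closed set, a prompt-injected request for an unlisted tool fails closed rather than open, and the log still shows that it was attempted.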
These questions expose the gap between vendor capability claims and what actually happens when the system reaches a customer.
Can your AI system cite the approved source for every claim it makes to a customer?
If the system can invent policy, you own the policy it invents. Air Canada learned this before a tribunal.
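One way to enforce the system-of-record rule is to check a drafted answer before it is sent. The sketch below is a minimal illustration, assuming the model is prompted to return each claim alongside the ID of an approved source and the snippet it relied on. `ApprovedSource`, `Claim`, `check_answer`, `format_citation`, and the sample policy text are hypothetical. The verbatim-substring test is a deliberately strict stand-in for a grounding check; looser matching may be needed in practice.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ApprovedSource:
    source_id: str
    title: str
    last_updated: date
    text: str


@dataclass(frozen=True)
class Claim:
    sentence: str              # what the assistant wants to tell the customer
    source_id: Optional[str]   # None means the model asserted it unsourced
    quote: Optional[str]       # snippet the model says it relied on


def check_answer(claims, sources):
    """Return (ok, problems). Each claim must cite an approved source, and its
    quote must appear verbatim in that source's text."""
    problems = []
    for claim in claims:
        src = sources.get(claim.source_id) if claim.source_id else None
        if src is None:
            problems.append(f"uncited or unapproved source: {claim.sentence!r}")
        elif not claim.quote or claim.quote not in src.text:
            problems.append(f"quote not found in {src.source_id}: {claim.sentence!r}")
    return (not problems, problems)


def format_citation(src):
    # Evidence-first UX: show the customer where the answer came from and when.
    return f"{src.title} (last updated {src.last_updated.isoformat()})"
```

An answer that fails the check is not sent; it is regenerated or escalated. When it passes, `format_citation` gives the customer the source title and its last-updated date.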
When the AI fails, how fast can a human take over, and who owns that handoff?
Without explicit escalation, small errors become incidents at the speed of automation.
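An explicit escalation path can be as simple as a lookup from severity to an owning team and a response deadline. The sketch below is a minimal illustration; the severity levels, team names, and SLA durations in `ESCALATION_POLICY`, and the `Handoff` and `escalate` names, are hypothetical placeholders for values that belong in your own incident-response policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical severity -> (owning team, response SLA).
ESCALATION_POLICY = {
    "sev1": ("ai-incident-oncall", timedelta(minutes=15)),
    "sev2": ("customer-support-leads", timedelta(hours=1)),
    "sev3": ("support-queue", timedelta(hours=8)),
}


@dataclass
class Handoff:
    conversation_id: str
    severity: str
    owner: str
    opened_at: datetime
    respond_by: datetime

    def is_breached(self, now):
        # True once the human response deadline has passed.
        return now > self.respond_by


def escalate(conversation_id, severity, now=None):
    """Open a handoff with a named owner and an SLA deadline."""
    if severity not in ESCALATION_POLICY:
        # Fail loudly rather than routing to nobody.
        raise ValueError(f"unknown severity {severity!r}")
    owner, sla = ESCALATION_POLICY[severity]
    opened = now if now is not None else datetime.now(timezone.utc)
    return Handoff(conversation_id, severity, owner, opened, opened + sla)
```

Rejecting unknown severities, rather than defaulting to a catch-all queue, keeps every handoff owned, and `is_breached` gives monitoring a concrete signal to alert on.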
A focused engagement to map your AI systems against the eight governance controls, identify gaps, and build a remediation roadmap before the next failure becomes yours. Built for CEO/CFO/CIO and board review.
Map every AI system touching customers, employees, or sensitive data. Identify which governance controls are present, partial, or missing.
Score each system against the eight controls. Prioritize by blast radius: customer-facing, high-value decisions, and regulatory exposure.
Phased plan to close gaps. Technical controls, process changes, and vendor requirements. The deliverable is production readiness, not a deck.
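The scoring step can be made concrete with simple arithmetic: each of the eight controls earns 2 points if present, 1 if partial, and 0 if missing, and a system's priority is its blast-radius weight multiplied by the fraction of control points it lacks. The sketch below is a minimal illustration; the point values, exposure weights, and function names are hypothetical choices, not a published scoring method.

```python
# Control names follow the eight rows of the governance table.
CONTROLS = [
    "system_of_record", "evidence_first_ux", "tiered_automation", "eval_gates",
    "conversation_firebreaks", "human_fallback", "audit_logging", "change_management",
]
STATUS_POINTS = {"present": 2, "partial": 1, "missing": 0}

# Hypothetical blast-radius weights; tune these to your own risk appetite.
EXPOSURE_WEIGHTS = {"customer_facing": 3, "high_value_decisions": 2, "regulatory": 2}


def coverage(statuses):
    """Fraction of the maximum control score (0.0 to 1.0).
    Controls not listed in `statuses` count as missing."""
    points = sum(STATUS_POINTS[statuses.get(c, "missing")] for c in CONTROLS)
    return points / (2 * len(CONTROLS))


def blast_radius(exposures):
    # Baseline of 1 so internal systems still get a nonzero priority.
    return 1 + sum(EXPOSURE_WEIGHTS[e] for e in exposures)


def priority(system):
    # Higher means fix first: large blast radius combined with low coverage.
    return blast_radius(system["exposures"]) * (1 - coverage(system["controls"]))


def rank(systems):
    return sorted(systems, key=priority, reverse=True)
```

For example, a customer-facing chatbot with regulatory exposure (blast radius 1 + 3 + 2 = 6) that has audit logging present and human fallback partial covers 3 of 16 points, giving a priority of 6 × (1 − 3/16) = 4.875. An internal tool with every control partial scores 1 × 0.5 = 0.5, so the chatbot is remediated first.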
The organizations that avoid the next Air Canada, the next $25M deepfake, the next viral chatbot disaster, are the ones that ask the governance questions before they deploy.
© 2026 Chander Dhall Methodworks, LLC. All rights reserved.