Established companies with solid vendor contracts and data practices are moving into AI optimization and discovering that what protected them before does not cover what AI actually does with their data. The questions this raises are not naive. They are exactly the right ones, and most companies are not asking them.
Before you decide what safeguards to build, what vendor to choose, or what architecture to use, you need to answer one foundational question: what data are you actually putting at risk, and what does it cost you if that data is exposed?
Not every company needs the same solution. A company processing public marketing data through an AI optimization tool faces a fundamentally different risk profile than a law firm running confidential client contracts through the same tool. The safeguards that make sense — and the budget that justifies them — follow directly from that assessment.
Think of it this way. You would not put a Ferrari security system on a basic commuter car — but you also would not protect a Ferrari with a bicycle lock. The question is not "what is the maximum security we could build?" It is "what level of protection is proportionate to what we are actually protecting, and what does a breach actually cost us?"
Consider a small logistics company using AI to optimize delivery routes, with no customer PII and no regulated data. A basic DPA and a standard enterprise-tier agreement may be entirely sufficient. Total governance cost: a few hundred dollars a month and a one-page incident response procedure.
Now compare a financial services firm using AI to analyze client portfolios, processing account numbers, SSNs, and investment strategies. That firm needs PII masking before data leaves, egress monitoring, sandboxing on return, annual independent controls validation, and a full-time designated governance owner. Total governance cost: significant, and justified by the exposure.
Most companies sit somewhere in between. The risk assessment tells you where you are on that spectrum — and therefore what you need to spend and what you can manage with lighter controls.
The risk assessment does not need to be a lengthy document. It needs to answer four questions honestly:
This assessment is the foundation. Every safeguard discussed in this article has a cost — in money, in time, in operational complexity. The risk assessment is what tells you which ones your situation actually requires.
Most companies entering AI optimization are not new to data protection. They have NDAs. They have vendor agreements. They have IT policies and legal review processes that have served them well for years. The issue is not carelessness — it is that those protections were designed for a different kind of vendor relationship, one where data goes in, a service comes out, and the vendor retains nothing meaningful in between.
AI does not work that way. A standard vendor NDA says the vendor will not disclose your confidential information. That language was written for a world where a vendor receives a file, does work, and returns a deliverable. It says nothing about whether your data is used to improve a model that then serves the vendor's other customers. It says nothing about sub-processors. It says nothing about what happens to data patterns after the contract ends, even when the raw data has been deleted.
A mid-size law firm starts using an AI tool to draft and review contracts. Attorneys paste in client names, deal terms, acquisition prices, confidential business structures, and proprietary transaction strategies. The AI tool is on a standard subscription tier. The vendor's terms of service on that tier permit using customer inputs to improve the underlying model.
Six months later, a competitor's attorney uses the same AI tool to draft a similar acquisition agreement. The model — trained partly on the first firm's confidential inputs — suggests contract language and deal structure terms that are suspiciously specific to the first firm's client situation. The competitor's attorney has no idea where those suggestions came from. Neither does the AI vendor, because the model has absorbed thousands of inputs and cannot trace any single output back to its source.
The first firm's client never consented to their deal terms being used this way. The NDA between the firm and the AI vendor said "do not disclose confidential information" — but the vendor never disclosed it directly to anyone. They used it to improve a model. Most NDAs do not address that distinction.
What would have prevented this: A signed Data Processing Agreement on an enterprise tier explicitly prohibiting training use. The cost difference between standard and enterprise tier for most tools: a few hundred dollars a month. The cost of the alternative: potentially the client relationship, bar complaints, and litigation.
What applies to every company: A manufacturer pasting proprietary process specifications into an AI assistant. A financial services firm running client portfolio data through an AI analysis tool. An HR team using AI to process employee performance records. The category of data changes. The mechanism of risk is identical.
"Is our data used to train or improve your models — including models operated by your sub-processors?"
This needs to be asked explicitly and answered in writing. "We take data privacy very seriously" is not an answer. The prohibition on training use needs to be in the signed Data Processing Agreement and must extend to every sub-processor downstream, not just the primary vendor.
If the vendor cannot answer this with specifics, or if the DPA does not address it explicitly, assume training use is permitted under current terms until proven otherwise.
"Who are your sub-processors, where are they located, and do your data handling obligations extend to them contractually?"
When you hand data to an AI vendor, that data rarely stays with just that vendor. It flows through cloud infrastructure providers, model hosting services, logging tools, and sometimes human review processes. Each is a sub-processor and each is a potential exposure point. Ask for the list. Ask where each stores data geographically. Ask whether your negotiated terms flow down to each sub-processor contractually. If the answer is no, your protections stop at the first handoff.
"Who has access to production data during development, debugging, and maintenance — and where are those people located?"
AI optimization products are frequently built and maintained by development teams in countries with different privacy laws and different legal frameworks for accountability. That is a legitimate cost and talent decision. It becomes a risk question when those teams have access to production data without equivalent controls to a domestic employee. Ask who has access, whether access is logged and auditable, and what your practical recourse is if something goes wrong across that jurisdiction gap.
"Where are you incorporated, which jurisdiction governs our contract, and what is your dispute resolution mechanism?"
AI startups are frequently incorporated in jurisdictions chosen for ease of formation or tax efficiency. If a vendor is incorporated offshore with contract terms specifying arbitration in a neutral jurisdiction and professional indemnity coverage capped well below potential breach liability, the practical path to recovery after a data incident is long and uncertain. Your clients hold you accountable for what your vendors do with their data. That accountability runs upstream to you regardless of what your internal vendor contract says.
"Do you carry cyber liability or professional indemnity insurance, and does coverage extend to incidents caused by your sub-processors?"
This question most quickly reveals the gap between stated commitments and actual financial accountability. Many AI vendors — particularly earlier-stage companies — carry limited coverage relative to the scale of data they handle. Ask for documentation. If the vendor cannot produce it, there is no meaningful financial backstop behind their data handling commitments. Bonding is worth raising separately, particularly in regulated industries where vendors handling sensitive data may be required to be bonded as a condition of engagement.
The table below maps each safeguard to the risk level that justifies it. This is the cost-versus-benefit framework in practice. Your risk assessment tells you which risk level you are at. That level tells you which safeguards you need to build now and which you can defer.
| Safeguard | Risk Level | Relative Cost | What It Addresses |
|---|---|---|---|
| Signed Data Processing Agreement | All levels | Low — administrative | Training use, deletion, breach notification |
| Data minimization policy | All levels | Low — process change | Limits exposure at source |
| Internal AI tool inventory | All levels | Low — one-time exercise | Surfaces shadow AI use by employees |
| Written incident response procedure | All levels | Low — one page | Defines response before it is needed |
| Egress monitoring and access logging | Medium & high | Medium — configuration | Real-time visibility, self-owned audit trail |
| PII masking and tokenization | Medium & high | Medium — technical build | Vendor never sees real identities |
| Data validation and sandboxing on return | Medium & high | Medium — technical build | Blocks malformed or malicious returned data |
| Designated AI governance role | Medium & high | Medium — assignment or hire | Owns monitoring, alerts, vendor oversight |
| Periodic internal controls self-testing | Medium & high | Low — scheduled internally | Confirms controls actually work, not just exist |
| On-premise hardware deployment | High only | High — infrastructure | Data never leaves your network |
| Independent third-party controls audit | High only | High — external engagement | Independent validation for clients and regulators |
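One way to make the table operational is to encode it as a lookup keyed by risk level, so the governance owner can list what is required now and what can wait. The sketch below is hypothetical: the `RiskLevel` enum and the function names are illustrative, and it assumes the three tiers are strictly nested (everything required at a lower level is also required at a higher one), as the table implies.

```python
"""Hypothetical encoding of the safeguard table as a lookup by risk level."""

from enum import IntEnum


class RiskLevel(IntEnum):
    # Ordered so that a higher level includes everything a lower level needs.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Minimum risk level at which each safeguard becomes necessary (taken from the table).
SAFEGUARD_MIN_LEVEL = {
    "Signed Data Processing Agreement": RiskLevel.LOW,
    "Data minimization policy": RiskLevel.LOW,
    "Internal AI tool inventory": RiskLevel.LOW,
    "Written incident response procedure": RiskLevel.LOW,
    "Egress monitoring and access logging": RiskLevel.MEDIUM,
    "PII masking and tokenization": RiskLevel.MEDIUM,
    "Data validation and sandboxing on return": RiskLevel.MEDIUM,
    "Designated AI governance role": RiskLevel.MEDIUM,
    "Periodic internal controls self-testing": RiskLevel.MEDIUM,
    "On-premise hardware deployment": RiskLevel.HIGH,
    "Independent third-party controls audit": RiskLevel.HIGH,
}


def required_safeguards(level: RiskLevel) -> list[str]:
    """Return the safeguards the table marks as required at `level`."""
    return [name for name, minimum in SAFEGUARD_MIN_LEVEL.items() if level >= minimum]


def deferrable_safeguards(level: RiskLevel) -> list[str]:
    """Return the safeguards that only become necessary at a higher level."""
    return [name for name, minimum in SAFEGUARD_MIN_LEVEL.items() if level < minimum]


if __name__ == "__main__":
    for level in RiskLevel:
        # Prints: LOW 4, MEDIUM 9, HIGH 11 (one per line)
        print(level.name, len(required_safeguards(level)))
```

Keeping the mapping in one place means that if your own assessment moves a safeguard to a different tier, only one line changes.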
Signed Data Processing Agreement — required at every risk level, no exceptions. Specifies training use prohibition, sub-processor obligations, deletion timelines, and breach notification. Request before any data sharing begins.
PII Masking and Tokenization — before sending data to any AI vendor, real identifying information is replaced with anonymous tokens. The AI works on masked data, returns outputs referencing the same tokens, and your internal system re-links them. The vendor completes the work without ever seeing client names, account numbers, or contract values. If they are breached, the attacker gets tokens that map to nothing outside your system.
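The masking round trip can be sketched in a few lines. This is a minimal illustration, not a production tokenization product: `TokenVault` and `mask_record` are hypothetical names, it only masks fields you explicitly name as sensitive (it does not detect PII inside free text), and it assumes the token-to-value mapping never leaves your own system, which is what makes the tokens meaningless to anyone who breaches the vendor.

```python
"""Minimal tokenization sketch: mask identifying fields before data leaves, re-link on return."""

import secrets


class TokenVault:
    """Keeps the token-to-value mapping inside your own system (hypothetical class)."""

    def __init__(self) -> None:
        self._value_to_token: dict[str, str] = {}
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the same token for a repeated value so the AI output stays consistent.
        if value not in self._value_to_token:
            token = f"TOK_{secrets.token_hex(8)}"  # random, so it reveals nothing about the value
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize_text(self, text: str) -> str:
        # Swap every known token in the vendor's output back to the real value.
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text


def mask_record(record: dict[str, str], sensitive_fields: set[str], vault: TokenVault) -> dict[str, str]:
    """Return a copy of `record` with the named sensitive fields replaced by tokens."""
    return {
        key: vault.tokenize(val) if key in sensitive_fields else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    vault = TokenVault()
    record = {"client": "Acme Holdings", "account": "4417-0092", "request": "Summarize exposure"}
    masked = mask_record(record, {"client", "account"}, vault)
    print(masked)  # only tokens leave your network
    # Stand-in for text returned by the AI vendor, referencing the same tokens.
    vendor_output = f"Exposure summary for {masked['client']} (account {masked['account']})."
    print(vault.detokenize_text(vendor_output))
    # -> Exposure summary for Acme Holdings (account 4417-0092).
```

Random tokens are used instead of hashes because a hash of a short, guessable value such as an account number can sometimes be reversed by brute force.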
Data Validation and Sandboxing on Return — AI outputs re-entering your systems are received into an isolated environment first, scanned against expected formats and content rules, and only then allowed into your live systems. Antivirus scanning of returned data files is the baseline. Schema validation adds a second layer. This addresses the risk most companies never consider: not what leaves, but what comes back.
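The schema and content checks that run while returned data sits in the isolated environment might look like the sketch below. It assumes the AI tool returns JSON, and the expected fields (`ticket_id`, `summary`, `priority`), the length limit, and the script-detection rule are all hypothetical examples. Antivirus scanning of returned files is assumed to happen in a separate step that is not shown here.

```python
"""Sketch of the 'quarantine, then validate' step for AI output re-entering your systems."""

import json
import re

# Hypothetical expected shape of a returned record: field name -> required Python type.
EXPECTED_SCHEMA = {"ticket_id": str, "summary": str, "priority": int}
MAX_SUMMARY_LENGTH = 2000
# Simple content rule: reject anything that looks like a script injection.
SUSPICIOUS_PATTERN = re.compile(r"<\s*script|javascript:", re.IGNORECASE)


def validate_returned_payload(raw: str) -> tuple[bool, list[str]]:
    """Return (accepted, problems) for one payload held in the sandbox."""
    problems: list[str] = []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value is not an object"]

    # Schema layer: exact field set and exact types (bool is rejected where int is expected).
    missing = EXPECTED_SCHEMA.keys() - data.keys()
    unexpected = data.keys() - EXPECTED_SCHEMA.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if unexpected:
        problems.append(f"unexpected fields: {sorted(unexpected)}")
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field in data and type(data[field]) is not expected_type:
            problems.append(f"{field} should be {expected_type.__name__}")

    # Content layer: rules on the values themselves.
    summary = data.get("summary")
    if isinstance(summary, str):
        if len(summary) > MAX_SUMMARY_LENGTH:
            problems.append("summary too long")
        if SUSPICIOUS_PATTERN.search(summary):
            problems.append("summary contains script-like content")

    return not problems, problems


if __name__ == "__main__":
    good = '{"ticket_id": "T-1042", "summary": "Reroute via depot B", "priority": 2}'
    bad = '{"ticket_id": "T-1043", "summary": "<script>fetch(1)</script>", "priority": "high"}'
    print(validate_returned_payload(good))  # (True, [])
    print(validate_returned_payload(bad))   # rejected, with the reasons listed
```

Only payloads that come back with `accepted` set to true would be released into live systems; rejected ones stay in quarantine for review.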
On-Premise Hardware — for organizations where data sensitivity justifies it, running the AI model locally on your own servers eliminates vendor-side data exposure entirely. Higher upfront cost, zero vendor data risk. The right architectural choice for legal, financial, healthcare, and defense contractor environments handling their most sensitive data.
Egress Monitoring and Access Logging — owned and run by you. Egress monitoring tracks data leaving your own network: volume, destination, frequency, and content type. You configure it. You receive the alerts. You review the reports. When an anomaly appears, such as an unusually large transfer to an AI vendor endpoint or a transfer outside business hours, your team sees it in real time and decides how to respond. This is your visibility into your own data flows, not a vendor watching you. Access logging inside the AI system adds an audit trail that you own and can produce on demand if a client or regulator asks what happened to their data.
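In practice these rules usually live in a firewall, proxy, or log-monitoring tool, but the decision logic behind the two anomalies described above is simple enough to sketch. The thresholds, business hours, the `EgressEvent` record, and the vendor hostname are hypothetical assumptions; you would tune them against your own baseline traffic.

```python
"""Sketch of two egress alert rules: unusually large transfers and off-hours transfers."""

from dataclasses import dataclass
from datetime import datetime

# Hypothetical thresholds; tune them to your own baseline traffic.
LARGE_TRANSFER_BYTES = 50 * 1024 * 1024  # 50 MiB in a single transfer
BUSINESS_HOURS = range(7, 19)            # 07:00 to 18:59 local time
WORKDAYS = range(0, 5)                   # Monday (0) to Friday (4)


@dataclass
class EgressEvent:
    timestamp: datetime
    destination: str   # e.g. the AI vendor's API hostname
    bytes_sent: int
    content_type: str


def egress_alerts(event: EgressEvent, watched_destinations: set[str]) -> list[str]:
    """Return alert reasons for one outbound transfer; empty if nothing is anomalous.

    This sketch only evaluates traffic to destinations registered as AI vendor endpoints.
    """
    if event.destination not in watched_destinations:
        return []
    reasons = []
    if event.bytes_sent > LARGE_TRANSFER_BYTES:
        reasons.append(f"large transfer: {event.bytes_sent} bytes to {event.destination}")
    off_hours = (
        event.timestamp.weekday() not in WORKDAYS
        or event.timestamp.hour not in BUSINESS_HOURS
    )
    if off_hours:
        reasons.append(f"off-hours transfer at {event.timestamp:%Y-%m-%d %H:%M} to {event.destination}")
    return reasons


if __name__ == "__main__":
    vendors = {"api.example-ai-vendor.com"}  # hypothetical endpoint
    # A 120 MiB upload late on a Saturday night triggers both rules.
    event = EgressEvent(datetime(2024, 3, 9, 23, 15), "api.example-ai-vendor.com", 120 * 1024 * 1024, "text/csv")
    for reason in egress_alerts(event, vendors):
        print(reason)
```

Each returned reason would be routed to the governance owner, who decides whether the transfer was legitimate.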
Periodic Internal Controls Self-Testing — on a defined schedule (quarterly is typical), your team tests the controls to confirm they actually work. Did the egress alert fire when it should have? Did tokenization replace every PII field correctly? Was the incident response procedure followed when a minor issue came up last month? This is a fire drill, not an audit. You run it internally. No outside vendor required. It converts theoretical controls into verified ones.
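A drill like this can be scripted so it runs the same way every quarter and leaves a dated record. The sketch below is hypothetical: the drill functions accept your real controls as callables, and the stand-in controls in the demo exist only for illustration. The demo record deliberately repeats a client name inside a free-text note, so the tokenization drill reports a failure, which is exactly the kind of gap self-testing is meant to catch.

```python
"""Sketch of a quarterly 'fire drill': feed known test cases to your controls and record pass/fail."""

from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class DrillResult:
    control: str
    passed: bool
    detail: str


def drill_tokenization(mask: Callable[[dict], dict], test_record: dict, pii_values: set[str]) -> DrillResult:
    """Pass only if none of the known PII values survive masking anywhere in the record."""
    masked_text = " ".join(str(v) for v in mask(test_record).values())
    leaked = sorted(v for v in pii_values if v in masked_text)
    return DrillResult("PII tokenization", not leaked, f"leaked: {leaked}" if leaked else "no PII leaked")


def drill_egress_alert(should_alert: Callable[[int, int], bool]) -> DrillResult:
    """Pass only if a synthetic 200 MiB transfer at 02:00 triggers an alert."""
    fired = should_alert(200 * 1024 * 1024, 2)
    return DrillResult("Egress alert", fired, "alert fired" if fired else "alert did NOT fire")


def format_drill_report(results: list[DrillResult]) -> str:
    """Format a dated report suitable for the governance owner's records."""
    lines = [f"Controls self-test {date.today().isoformat()}"]
    for r in results:
        lines.append(f"  [{'PASS' if r.passed else 'FAIL'}] {r.control}: {r.detail}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in controls for illustration only; point these at your real masking and alerting code.
    def demo_mask(record: dict) -> dict:
        return {k: ("***" if k in {"name", "ssn"} else v) for k, v in record.items()}

    def demo_should_alert(bytes_sent: int, hour: int) -> bool:
        return bytes_sent > 50 * 1024 * 1024 or not 7 <= hour < 19

    test_record = {"name": "Jane Test", "ssn": "000-00-0000", "note": "Client Jane Test called"}
    print(format_drill_report([
        drill_tokenization(demo_mask, test_record, {"Jane Test", "000-00-0000"}),
        drill_egress_alert(demo_should_alert),
    ]))
```

Using obviously fake test identities keeps the drill itself from moving real PII anywhere.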
Independent Third-Party Controls Audit — a separate exercise at a higher level of formality. A penetration test, a SOC 2 audit, a formal AI governance review conducted by an outside firm. More expensive, less frequent, but carries independent credibility that self-attestation does not. When a major client or regulator asks "who validated your controls?" an independent review answers that question in a way your own testing cannot. Start with internal self-testing and build toward independent validation when client contracts or regulatory requirements demand it.
The designated AI governance role is the piece most companies miss entirely, and it is where informal AI adoption by employees turns into a managed, auditable, defensible program.
This does not need to be a new hire. At lower risk levels, it does not need to be a full-time position. It almost certainly should not be a traditional IT manager whose focus is network maintenance and hardware. That skill set is valuable, but it is not data governance.
In many organizations this person already exists under a different title. A compliance manager. A senior operations director. A CFO in a smaller company who already owns vendor relationships and risk oversight. The gap is not headcount — it is formal assignment, defined scope, the right tools and alerts configured, and the authority to act when something requires action.
The reason this matters specifically for employee AI use: when your team uses AI tools informally — which they are doing right now — there is currently no one whose job it is to know which tools are in use, what data those tools are seeing, whether those tools have signed DPAs, or whether any of them represent a material exposure. The governance role closes that gap and converts invisible informal AI use into a managed program where someone is paying attention.
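A first pass at that inventory can be as simple as comparing the AI tools people actually use against the list of tools that have been reviewed. The sketch below is hypothetical: the tool names are invented, and it assumes usage has already been gathered from a staff survey, proxy logs, or expense reports. It flags tools that were never approved, tools without a signed DPA, and tools whose DPA does not prohibit training use.

```python
"""Sketch of an AI tool inventory check: which tools are in use, and which lack a signed DPA."""

from dataclasses import dataclass


@dataclass
class ApprovedTool:
    name: str
    dpa_signed: bool
    training_use_prohibited: bool


def review_inventory(observed: dict[str, set[str]], approved: dict[str, ApprovedTool]) -> list[str]:
    """Return findings for the governance owner.

    `observed` maps each AI tool seen in use to the teams using it.
    """
    findings = []
    for tool, teams in sorted(observed.items()):
        users = ", ".join(sorted(teams))
        record = approved.get(tool)
        if record is None:
            findings.append(f"UNAPPROVED: {tool} used by {users}")
        elif not record.dpa_signed:
            findings.append(f"NO DPA: {tool} used by {users}")
        elif not record.training_use_prohibited:
            findings.append(f"TRAINING USE NOT PROHIBITED: {tool} used by {users}")
    return findings


if __name__ == "__main__":
    approved = {
        "ContractDraftAI": ApprovedTool("ContractDraftAI", dpa_signed=True, training_use_prohibited=True),
        "RouteOptimizer": ApprovedTool("RouteOptimizer", dpa_signed=False, training_use_prohibited=False),
    }
    observed = {
        "ContractDraftAI": {"Legal"},
        "RouteOptimizer": {"Operations"},
        "FreeChatAssistant": {"HR", "Sales"},
    }
    for finding in review_inventory(observed, approved):
        print(finding)
    # -> UNAPPROVED: FreeChatAssistant used by HR, Sales
    # -> NO DPA: RouteOptimizer used by Operations
```

The value is less in the code than in the habit: someone owns the list, reruns the comparison on a schedule, and acts on the findings.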
The companies that come out of an AI governance project in the strongest position are not the ones that treated it as a compliance exercise. They are the ones that used it as the forcing function to finally audit their existing data practices, close the gaps they already had, and build controls that protect them across every vendor relationship — not just the AI one. — Monte Fisher, CPA (Ret.), CFE
Most of the safeguards in this article are not specific to AI. They are sound data governance practices that apply to every vendor relationship involving sensitive data. Your AI project is the reason to finally put them in place. The payoff extends well beyond the AI project itself.
The internal risk audit that precedes a well-governed AI deployment frequently uncovers things that were already there: vendor contracts not reviewed in years, employee access to data exceeding their job requirements, outbound data flows through tools never formally approved. Finding and fixing those things is permanent value. The AI project is the trigger. The security improvement is lasting.
The Fisher AI Implementation Gauge (FAIG), the free 15-question self-assessment at vcanalytics.ai/ai-governance.html, measures data governance and vendor due diligence as two of its five scoring categories. This article expands, in more detail, on the same areas the FAIG assesses at a summary level.
If you have taken the FAIG and scored low on vendor due diligence or data governance, this article describes specifically what that gap looks like and what closes it. We are expanding the FAIG to include a dedicated data privacy and vendor accountability module covering offshore talent risk, foreign vendor jurisdiction, bonding and insurance verification, DPA completeness, and organizational governance role definition. If you are working through these questions for a specific AI project, trying to define what an AI governance function should look like in your organization, or simply trying to figure out whether your current exposure is low, medium, or high — message me directly. The initial conversation is always free.
The risk assessment is where everything starts. Monte personally reviews every message, and there is no obligation to proceed. If your situation is low risk and a basic DPA is sufficient, he will tell you that. If gaps need closing, he will tell you that too, with specifics, not a sales pitch.