🔎 In Q1 2026, companies pushed AI into production faster than ever — and then the failures started. This is the complete AI vendor due diligence checklist for 2026: 50+ questions across 8 categories, a red flag guide, a 100-point vendor scorecard, and every EU AI Act, NIST, and ISO 42001 compliance question your team needs to ask before signing anything.
Last Updated: May 31, 2026
The pattern is consistent and now well-documented. AI vendor due diligence that is rushed, incomplete, or based on marketing claims rather than verifiable technical evidence is the single most reliable predictor of failed AI deployments in 2026. Systems that worked perfectly in demos began failing in production. Outputs were inconsistent. Teams could not explain why the model behaved differently day to day. In some cases, AI features had to be rolled back entirely after users lost trust in the results. And then came the costs — token usage spiking at scale, API dependencies accumulating, and finance teams asking questions that engineering could not answer clearly. Deloitte’s 2026 technology predictions found that while 78% of organizations report active AI initiatives, only 39% report enterprise-level EBIT impact. The gap traces consistently to the same root: poor data quality, lack of governance, and weak integration planning — all detectable during due diligence and invisible after contract signature.
The compliance dimension has changed materially in 2026 in ways that make rigorous vendor evaluation a legal obligation rather than a best practice. NIST’s AI Risk Management Framework provides the US voluntary standard for AI vendor governance. ISO/IEC 42001, the first international AI management system standard, maps directly to seven core EU AI Act articles and is increasingly required by enterprise procurement as a baseline vendor certification. The EU AI Act imposes penalties of up to €35 million or 7% of global annual turnover, whichever is higher, for the most serious violations. Critically, deployers cannot transfer regulatory liability to providers by contract where the Act places obligations directly on the deploying organization. You are accountable for the AI systems you deploy, regardless of what your vendor contract says. That accountability begins before you sign the contract, not after you encounter a problem.
This article is the most comprehensive AI vendor due diligence checklist for 2026 available — rebuilt from the ground up to reflect current regulatory requirements, the failure modes documented in Q1/Q2 2026 deployments, and the specific questions that separate enterprise-ready AI vendors from those that will cost you more to remediate than they delivered in value. The checklist covers 50+ questions across eight categories, with a structured table for each category showing what the question covers, what a red flag answer looks like, and what a good answer sounds like. You will also find a top-10 red flags guide drawn from current vendor documentation failures, and a 100-point vendor scorecard you can use to compare multiple vendors objectively before committing.
1. 🤔 Why AI Vendor Due Diligence Is Different From Standard Software Procurement
Traditional vendor risk management focuses on three things: financial stability (will this vendor be here in three years?), operational uptime (will their service be available when we need it?), and basic security controls (do they have SOC 2?). For standard SaaS procurement, those three dimensions cover most of the risk exposure. For AI vendor procurement, they cover less than a third of it — because AI introduces failure modes that standard software does not have and that standard procurement frameworks were never designed to detect.
AI models do not behave predictably across different input distributions. A chatbot that passes every test case in your evaluation can generate biased, harmful, or factually incorrect outputs in production when users submit queries outside the evaluation distribution. A classification model that achieves 94% accuracy in testing can perform dramatically differently on the demographic subgroups that matter most to your compliance team. AI costs do not scale linearly with usage — they spike when token consumption, model updates, and infrastructure demands interact in ways that are invisible during a cost-modeling exercise based on pilot usage. And most significantly: AI failure is often invisible until it has already caused harm. Unlike software that crashes visibly, an AI system that produces confidently wrong outputs continues operating — processing transactions, generating recommendations, drafting communications — while the damage accumulates.
The supply chain dimension of AI vendor risk is particularly underappreciated. A-LIGN’s 2026 analysis of ISO 42001 and EU AI Act requirements makes the accountability chain explicit: under both frameworks, if a third party can influence system behavior, you remain accountable for the outcome. Management systems and product safety regulation share a common logic — responsibility follows the system, not the contract. The vendor’s dependency on underlying model providers (OpenAI, Anthropic, Google, or open-weight alternatives) creates exposure beyond direct vendor control. The vendor’s training data sourcing creates copyright and privacy compliance exposure. The vendor’s subprocessors determine where your data actually travels. Vendor due diligence now requires understanding not just the vendor, but also their underlying model providers and their full dependency stack. Organizations that skip this depth of evaluation consistently pay for that shortcut — either in compliance incidents, in production failures, or in the shadow AI breaches that cost an average $670,000 more than standard security incidents when they occur.
Before working through the checklist below, connect this evaluation to your organization’s broader AI governance framework. Our AI governance guide covers the policy and accountability infrastructure that makes vendor due diligence part of a coherent governance program rather than a one-time procurement exercise. Our guide to shadow AI risks covers the governance failure that emerges when employees adopt AI tools outside the formal procurement process — and how rigorous vendor evaluation is the foundation that prevents that proliferation.
2. 🔒 Category 1: Security and Data Protection
Security and data protection is where AI vendor evaluation is most frequently performed — and most frequently performed superficially. SOC 2 Type II certification is necessary but insufficient: it covers general data security controls and says nothing about AI-specific risks like model bias, hallucination rates, prompt injection vulnerability, or data leakage through model outputs. The questions below go beyond the standard security checklist to address the specific ways that AI systems create novel security exposure that traditional controls do not cover. Every question marked ⚠️ is a contract-blocking requirement — if the vendor cannot answer it satisfactorily, do not proceed regardless of other evaluation factors.
| Question (⚠️ = Contract-Blocking) | Why It Matters | 🚨 Red Flag Answer | ✅ Good Answer |
|---|---|---|---|
| ⚠️ Will our data be used to train or fine-tune your AI models? | Your proprietary data training a shared model means competitors could benefit from your data | “We may use data to improve our models” without explicit opt-out | “Your data is never used for training. This is contractually prohibited without written consent.” |
| ⚠️ Where is our data processed and stored — exactly which countries and data centers? | GDPR, HIPAA, and data sovereignty laws restrict where personal data can travel | “Our infrastructure is globally distributed” with no specific residency commitments | Specific data center locations listed; EU data stays in EU; contractual data residency guarantees |
| ⚠️ What third-party model providers or subprocessors does your system depend on? | Your data may travel to OpenAI, Anthropic, Google, or other providers not covered in the vendor’s DPA | Refuses to disclose subprocessors or provides only a generic list | Complete subprocessor list with DPA coverage for each; notification process for new subprocessors |
| ⚠️ Do you hold SOC 2 Type II certification? Can we see the report? | Type II (not Type I) demonstrates sustained controls over time, not just point-in-time design | SOC 2 Type I only, expired certification, or refusal to share the report | Current SOC 2 Type II report shared under NDA; willingness to answer questions about findings |
| How long is our data retained after contract termination? | Data that persists after contract end creates ongoing liability and exposure | “Data retained indefinitely” or vague language about “legal obligations” | Specific retention period stated; contractual deletion certification within 30 days of termination |
| ⚠️ Has your system been tested for prompt injection vulnerabilities? | Prompt injection is the #1 OWASP risk for LLM applications — an unmitigated vulnerability can compromise your data and systems | “We rely on the underlying model’s safety filters” without vendor-level testing | Red team testing results available; specific injection defenses documented; penetration test evidence |
| What encryption standards are applied to data in transit and at rest? | Baseline data protection — weak encryption is exploitable | No specification, or TLS 1.1 and below (outdated) | TLS 1.3 in transit; AES-256 at rest; key management documented |
| ⚠️ What is your incident notification SLA if our data is breached or the AI behaves anomalously? | GDPR requires 72-hour breach notification; EU AI Act requires incident reporting — vendor must support this | “We will notify you as soon as reasonably practicable” without specific timeframes | Contractual 24–48 hour notification SLA; AI-specific behavioral incident reporting included |
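
The contract-blocking rule above can be treated as a hard gate that runs before any weighted scoring. The following is a minimal Python sketch of that idea. The `Answer` structure, the short question labels, and the sample verdicts are illustrative assumptions, not part of the checklist itself.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    question: str            # short label for a checklist question (hypothetical)
    contract_blocking: bool  # True for questions marked with the warning sign
    satisfactory: bool       # reviewer's judgment of the vendor's evidence

def failed_blockers(answers):
    """Return the contract-blocking questions the vendor did not satisfy."""
    return [a.question for a in answers
            if a.contract_blocking and not a.satisfactory]

# Hypothetical review of one vendor's security answers.
answers = [
    Answer("training use of customer data", True, True),
    Answer("data residency", True, True),
    Answer("prompt injection testing", True, False),
    Answer("retention after termination", False, False),
]

blockers = failed_blockers(answers)
proceed = not blockers  # a single failed blocker stops the evaluation
print(proceed, blockers)
```

Keeping the gate separate from any point score means a strong overall score cannot offset a failed blocker. The non-blocking failure in the sample (retention) still counts against the vendor, but it does not stop the process on its own.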
3. 📜 Category 2: Compliance and Regulatory (EU AI Act, NIST, ISO 42001)
The compliance landscape for AI vendors changed materially in 2026. The EU AI Act’s full obligations for high-risk AI systems become applicable on August 2, 2026. A provisional Digital Omnibus agreement reached in May 2026 proposes deferring standalone Annex III high-risk system obligations to December 2027, but the original deadline remains law until formal adoption. Organizations must plan against December 2027 while maintaining readiness for the earlier date. ISO 42001 maps directly to seven core EU AI Act articles: risk management (Article 9), data governance (Article 10), technical documentation (Article 11), record-keeping (Article 12), transparency (Article 13), human oversight (Article 14), and quality management (Article 17). Vendors holding ISO 42001 certification demonstrate organizational AI governance maturity, but as A-LIGN’s 2026 analysis confirms, certification alone does not substitute for evidence of Article 17 QMS conformity. For procurement decisions in 2026, treat ISO 42001 as a strong signal that requires verification, not a compliance pass. Our guide to ISO/IEC 42001 covers what the standard requires and how to evaluate a vendor’s certification meaningfully.
| Question | Why It Matters | 🚨 Red Flag Answer | ✅ Good Answer |
|---|---|---|---|
| ⚠️ Does your system constitute high-risk AI under the EU AI Act Annex III? If so, what is your conformity assessment status? | You are accountable as a deployer even if the vendor built the system — deployers cannot transfer EU AI Act liability by contract | “We’re not sure” or “that doesn’t apply to us” without documented analysis | Written legal analysis of risk classification; conformity assessment documentation if high-risk |
| Do you hold ISO/IEC 42001 certification? Can you share the certificate and scope? | ISO 42001 demonstrates organizational AI governance maturity aligned with seven EU AI Act articles | Claims alignment without certification; expired certificate; scope excludes the relevant product | Current ISO 42001 certificate with scope that explicitly covers the product being evaluated |
| ⚠️ How does your system support our bias impact assessment obligations under the Colorado AI Act (originally effective February 2026, since delayed to June 30, 2026)? | Colorado AI Act mandates bias impact assessments for high-risk AI in employment, housing, credit, healthcare | “We aren’t aware of that regulation” or no bias testing documentation | Disaggregated performance data available; bias audit reports on request; documented testing methodology |
| Does your system align with the NIST AI Risk Management Framework (AI RMF)? Which functions — Govern, Map, Measure, Manage? | NIST AI RMF is the primary US enterprise AI governance standard — vendors aligned to it demonstrate structured risk management | Claims “full NIST alignment” without documentation of which specific controls are implemented | Written NIST AI RMF mapping document available; specific Govern/Map/Measure/Manage controls documented |
| ⚠️ Does your EU AI Act Article 13 documentation (the instructions for use supplied to deployers) cover all required elements: capabilities, limitations, and human oversight requirements? | Article 13 requires providers to supply deployers with comprehensive documentation before deployment of high-risk AI | No Article 13 documentation; “available on request” without providing it pre-contract | Complete Article 13 documentation package delivered pre-contract, covering every information category the article requires |
| How do you handle GDPR data subject rights (access, erasure, portability) in AI-processed data? | If personal data is processed by the AI, data subject rights under GDPR must be supportable | “Contact our support team” without a documented procedure or contractual commitment | Documented DSAR process with defined response timelines; data minimization design documented |
| Are you registered in the EU AI Act database for high-risk AI systems, if applicable? | High-risk AI system providers must register in the EU database before placing systems on the EU market | “We’re working on it” if the system is already deployed in the EU market | Registration confirmation number provided; EU database entry accessible and current |
4. 🔬 Category 3: Model Transparency and Documentation
Model transparency is the category where vendor documentation most commonly fails to meet the evidence standard that regulators and enterprise governance teams require in 2026. Marketing materials promise “explainable AI,” “responsible AI practices,” and “bias-free outputs.” Evidence-based due diligence requires verifiable technical documentation — model cards, datasheets for datasets, performance benchmarks disaggregated by demographic group, and red team testing results — not marketing language. The hidden risks identified most consistently in 2026 enterprise AI reviews are model opacity (no visibility into how decisions are made), dependency on external providers (the vendor’s AI is powered by a model the vendor does not control), and limited visibility into training data provenance. These are not edge cases — they are the norm in vendor documentation packages that have not been subjected to genuine technical scrutiny.
| Question | Why It Matters | 🚨 Red Flag Answer | ✅ Good Answer |
|---|---|---|---|
| ⚠️ Do you provide a model card for this system? Does it include disaggregated performance metrics by demographic group? | Model cards document intended use, limitations, bias evaluation — required for EU AI Act Article 13 and Colorado AI Act compliance | No model card; model card without disaggregated performance data; “aggregate accuracy is 94%” only | Published model card with performance broken down by age, gender, geographic region, and relevant demographic factors |
| What training data was used? Where did it come from, and how was it licensed? | Training data provenance determines copyright exposure, privacy risk, and potential bias inherited from the dataset | “Our proprietary dataset” with no further details; “web-scraped data” without consent documentation | Datasheet for Datasets provided; licensed data sources documented; consent and privacy compliance evidence available |
| What is the measured hallucination rate on tasks relevant to our use case? | Hallucination rates vary dramatically by domain — enterprise legal, medical, and financial tasks routinely see 10–50%+ rates without mitigation | “Our AI is highly accurate” without specific measurement methodology or domain-specific benchmark data | Domain-specific hallucination rate measured with defined methodology; mitigation controls documented |
| ⚠️ How are model outputs explained? Can the system provide a rationale for consequential decisions? | EU AI Act Article 13 requires high-risk AI outputs to be interpretable; UK and US regulations increasingly require explainability for high-stakes decisions | “The model is a black box” or “explainability isn’t available for this architecture” | Feature attribution, confidence scores, or audit trail provided; explanation format described in documentation |
| What bias testing was conducted? Which demographic groups were tested, and what were the results? | Undisclosed bias in AI systems used for hiring, lending, or healthcare creates legal liability and real harm | “We test for bias” without methodology, test groups, or quantified results | Bias audit report with specific demographic groups tested, parity metrics used, and findings with remediation steps |
| How often is the model updated or retrained, and how are deployers notified? | Model updates can change system behavior in ways that invalidate your compliance documentation and governance decisions | “Models are updated regularly to improve performance” without a notification process | Advance notification of significant model changes; behavioral change documentation; rollback option if needed |
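
To see why an aggregate figure such as “94% accuracy” is not enough, it helps to compute the same metric per group. The sketch below is a minimal Python illustration. The group labels, record counts, and the `accuracy_by_group` helper are hypothetical and chosen only to show how a strong headline number can hide a weak subgroup.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical evaluation sample: 90 records from group A, 10 from group B.
records = (
    [("A", 1, 1)] * 88 + [("A", 1, 0)] * 2 +
    [("B", 1, 1)] * 6 + [("B", 0, 1)] * 4
)

overall = sum(p == a for _, p, a in records) / len(records)
per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"group {group}: {acc:.2f}")
```

In this made-up sample the overall accuracy is 0.94, while group B sits at 0.60. A model card that reports only the aggregate would not reveal that gap, which is why the first question in the table asks for the breakdown.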
5. ⚙️ Category 4: Performance and Reliability
Performance and reliability evaluation tends to fail in a different way: organizations assess performance in controlled demo conditions and extrapolate those results to production environments that are fundamentally different. AI systems often perform well in controlled demos but fail under real-world conditions. Traditional validation focuses on expected scenarios, while AI failures happen in edge cases, on ambiguous inputs, and at scale. The questions below target the gap between demo performance and production reliability that Q1/Q2 2026 deployment failures consistently reveal.
| Question | Why It Matters | 🚨 Red Flag Answer | ✅ Good Answer |
|---|---|---|---|
| What is your documented uptime SLA, and what is your actual historical uptime over the past 12 months? | AI systems embedded in production workflows create business impact when unavailable — remedies must be meaningful | “99.9% uptime” with no historical evidence or remedies limited to service credits only | Publicly available status page; last 12 months uptime data; SLA remedies that reflect actual business impact |
| ⚠️ Can you demonstrate performance on our specific data distribution — not just on your standard benchmarks? | Benchmark performance on standard datasets does not predict performance on your proprietary data and use cases | Refuses evaluation on customer data; offers only pre-recorded demos; cannot provide a technical evaluation environment | Structured evaluation environment provided; willing to be tested on a sample of your real-world data under NDA |
| How does system performance change at 10x and 100x your current volume? What is your scalability evidence? | AI costs spike non-linearly at scale; latency degrades under load; systems that perform well at pilot volume fail at production volume | “Our architecture scales automatically” without load test data or reference customer evidence | Load test results at target volume; reference customers at similar scale available for reference calls |
| What is your model drift detection mechanism? How do you alert when model performance degrades post-deployment? | Models trained on static data degrade as real-world data distributions change — drift is invisible without active monitoring | “We monitor our models” without a defined drift detection methodology or customer alerting process | Automated drift detection with defined thresholds; customer notification process; retraining or update SLA documented |
| What is your disaster recovery and business continuity plan for AI services? What is your RTO and RPO? | AI system unavailability may affect production operations — recovery time and data recovery objectives must align with your business requirements | No documented BCP; RTO/RPO not specified; “we haven’t tested this scenario” | Documented BCP with specific RTO/RPO; annual DR test results; geographic redundancy confirmed |
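
When a vendor says drift detection uses “defined thresholds,” it is reasonable to ask which statistic and which threshold. One commonly used option is the population stability index (PSI), sketched below in plain Python. This is an illustration of one possible technique, not a description of any vendor’s method. The bin count, the `eps` floor for empty bins, and the 0.2 alert threshold (a common rule of thumb) are all assumptions.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index for one numeric feature.

    Bin edges come from the baseline (expected) sample. Empty bins are
    floored at eps so the logarithm stays defined.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values: a baseline and a production sample shifted upward.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb threshold
print(psi(baseline, baseline))
print(psi(baseline, shifted) > ALERT_THRESHOLD)
```

Identical samples give a PSI of zero, and the shifted sample lands well above the assumed threshold. A vendor with a real drift process should be able to name its equivalent statistic, its thresholds, and who is notified when they are crossed.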
6. 💰 Category 5: Pricing and Contract Terms
AI pricing is uniquely susceptible to the gap between pilot cost and production cost — and that gap has been a primary driver of the budget shock that organizations are experiencing in Q1/Q2 2026. What looks manageable at pilot scale quickly becomes unpredictable at production scale because AI pricing depends on usage patterns (token consumption, API calls, model tier selection) that are difficult to estimate from limited pilot data. Token usage spikes. API dependencies accumulate. Model tier upgrades are required when capability constraints emerge in production. Finance teams start asking questions that engineering cannot answer clearly. Rigorous pricing due diligence models production cost scenarios before contract signature.
| Question | Why It Matters | 🚨 Red Flag Answer | ✅ Good Answer |
|---|---|---|---|
| ⚠️ What is the fully loaded cost model at 1x, 5x, and 10x our current usage volume? Can you model this for us with our specific use case parameters? | AI pricing compounds non-linearly — understanding the scaling curve prevents budget shock when the system succeeds | Refuses to model scaled costs; “pricing depends on usage” without a cost modeling tool | Written cost model for three usage scenarios; cost calculator or pricing model tool provided |
| Are there price change restrictions during the contract term? What is the maximum annual price increase? | Vendor lock-in plus uncapped price increases is an existential commercial risk for systems embedded in production workflows | “Pricing is subject to change at our discretion” with no notice period or cap | Contractual price cap (e.g., CPI + 2%); 90-day notice for any price changes; renegotiation rights on major changes |
| What are the usage limits and what happens — technically and financially — when we exceed them? | Rate limiting that causes system failure can affect production operations; overage pricing can create unbudgeted costs | Hard cutoff with no warning; overage rates more than 2x base pricing; no contractual overage protection | Advance warning before limits are reached; negotiated overage rates in contract; burst capacity option available |
| Does the contract include meaningful SLA remedies beyond service credits? What is the liability cap? | Service credits for downtime do not compensate for business impact of AI failures — the liability structure must reflect real exposure | Service credits only; liability cap set at one month of fees; general software liability exclusions that ignore AI-specific harms | Termination rights for sustained SLA breach; liability cap reflects business value at risk; AI-specific harm provisions included |
| Are there discounts or better terms available via hyperscaler marketplaces (AWS, Azure, GCP)? | Marketplace purchasing may apply committed spend credits and simplify procurement — worth exploring before direct contract | Not applicable to every vendor — absence is not a red flag; relevant only when committed cloud spend exists | Marketplace listing available; committed spend credit eligibility confirmed; equivalent terms to direct contract |
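
The 1x, 5x, and 10x cost question in the table can be answered with a simple model before any vendor tooling is involved. The Python sketch below assumes a hypothetical pricing structure: a per-1,000-token price, a monthly token allowance, and a multiplier applied to tokens beyond the allowance. All figures are invented for illustration, and real contracts will have different terms.

```python
def monthly_cost(requests, tokens_per_request, price_per_1k,
                 included_tokens, overage_multiplier):
    """Estimated monthly usage cost under a hypothetical allowance-plus-overage plan."""
    tokens = requests * tokens_per_request
    in_plan = min(tokens, included_tokens)
    overage = max(tokens - included_tokens, 0)
    billable = in_plan + overage * overage_multiplier  # overage tokens cost more
    return billable / 1000 * price_per_1k

pilot_requests = 100_000  # assumed pilot volume per month

scenarios = {
    scale: monthly_cost(pilot_requests * scale, 2_000, 0.01, 500_000_000, 1.5)
    for scale in (1, 5, 10)
}

for scale, cost in scenarios.items():
    ratio = cost / scenarios[1]
    print(f"{scale}x volume: {cost:,.0f} per month ({ratio:.2f}x pilot cost)")
```

Under these assumed terms, 10x the volume costs 13.75x the pilot figure, because most of the additional tokens fall into the overage band. Running the vendor’s actual rate card through a model like this, with your own request and token estimates, is a quick way to check whether their written cost model holds up.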
7. 🔧 Category 6: Integration and Technical Requirements
| Question | Why It Matters | 🚨 Red Flag Answer | ✅ Good Answer |
|---|---|---|---|
| ⚠️ What are the integration dependencies — which specific systems, APIs, and credentials does the AI tool need access to? | Every system integration is a potential security surface; overly broad access requests are a governance risk | Requests admin-level access; cannot specify minimum required permissions; “the connector needs full read/write access” | Principle of least privilege documented; specific API scopes listed; integration architecture diagram provided |
| Does the system support single sign-on (SSO) and multi-factor authentication (MFA) via our identity provider? | Separate credential management creates security gaps and increases the risk of unauthorized access | SSO not supported; requires separate password-based accounts; MFA optional rather than enforced | SAML 2.0 or OIDC SSO supported; MFA enforced; works with Okta, Microsoft Entra ID (formerly Azure AD), or your specific IdP |
| What is your API versioning policy? How much notice do you provide before breaking API changes? | Breaking API changes can require significant engineering effort — inadequate notice creates unplanned development work | No versioning policy; “we maintain backward compatibility whenever possible” without guarantees | Semantic versioning; minimum 6-month deprecation notice; old API version maintained for stated period |
| ⚠️ Is an on-premises or private cloud deployment option available for air-gapped or data sovereignty requirements? | Regulated industries and government deployments often require data to never leave the organization’s controlled environment | “Cloud-only deployment” without exception — if data sovereignty is a requirement, this is a contract-blocking limitation | VPC deployment or on-premises option available; data never leaves customer environment; licensing supports self-hosted deployment |
| What audit logging does the system generate? Is every AI action and data access logged and exportable? | EU AI Act Article 12 requires automatic logging for high-risk AI; regulatory investigations require full audit trails | Logs available in the vendor’s dashboard only; no export capability; retention less than your regulatory requirement | Full audit logs exportable in standard format (JSON, CSV); configurable retention; SIEM integration supported |
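
For the audit logging question, it can help to show the vendor what “exportable in standard format” means in practice. The sketch below writes audit events as JSON Lines (one JSON object per line), a format most SIEM tools can ingest. The field names (`actor`, `action`, `resource`, `model_version`) and the sample events are hypothetical, and the fields you actually need depend on your regulatory requirements.

```python
import json
from datetime import datetime, timezone

def audit_record(actor, action, resource, model_version, timestamp=None):
    """Build one audit event as a plain dict (field names are hypothetical)."""
    ts = timestamp or datetime.now(timezone.utc)
    return {
        "timestamp": ts.isoformat(),  # ISO 8601 with explicit UTC offset
        "actor": actor,
        "action": action,
        "resource": resource,
        "model_version": model_version,
    }

def to_jsonl(records):
    """Serialize events as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)

# Hypothetical events with fixed timestamps so the export is reproducible.
events = [
    audit_record("user:alice", "inference", "doc:123", "v1.4",
                 datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)),
    audit_record("svc:batch", "data_access", "dataset:claims", "v1.4",
                 datetime(2026, 5, 1, 12, 5, tzinfo=timezone.utc)),
]

exported = to_jsonl(events)
restored = [json.loads(line) for line in exported.splitlines()]
print(exported)
```

The round trip through `json.loads` is the useful check here: if an export cannot be parsed back into the same records outside the vendor’s dashboard, it is not portable in the sense the table describes.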
8. 🎧 Category 7: Support and SLA Commitments
| Question | Why It Matters | 🚨 Red Flag Answer | ✅ Good Answer |
|---|---|---|---|
| What is the support response time SLA for critical production incidents? Is 24/7 support included or an add-on? | AI production failures can occur at any time — support response times must match the business impact of an outage | Business hours only support for production systems; 24/7 available only at enterprise tier with 5x pricing | 24/7 production incident support included; P1 response within 1 hour; named technical account manager available |
| ⚠️ How do you handle AI-specific incidents — outputs that are biased, harmful, or factually wrong in ways that have caused real-world impact? | AI failure modes are categorically different from software downtime — vendor must have AI-specific incident response | “Submit a support ticket” — treating AI behavioral failures the same as software bugs | Dedicated AI incident response process; root cause analysis commitment; remediation timeline SLA for model behavior issues |
| What is the onboarding support commitment? Is there a dedicated implementation team or just documentation? | AI implementation failures typically happen during onboarding — poor integration planning is the leading cause | “Self-service onboarding with documentation” for complex enterprise deployments | Dedicated implementation project manager; defined onboarding milestones; success criteria agreed pre-contract |
| Do you provide training for our team on responsible use, prompt construction, and output verification? | EU AI Act Article 4 (AI literacy) and responsible AI deployment require that users understand how to use AI systems safely | “Training materials are in the documentation portal” — self-service only for regulated use cases | Role-specific training included; AI literacy content meets EU AI Act Article 4 requirements; refresher training on major updates |
9. 🚪 Category 8: Exit Strategy and Data Portability
Exit strategy is the category that every buyer ignores until they urgently need it — and by then the leverage to negotiate favorable terms has evaporated entirely. Vendor lock-in with AI systems has two dimensions that standard SaaS contracts do not fully address: the data portability dimension (can you get your data out in a usable format?) and the model portability dimension (does the system create dependencies on proprietary model APIs, prompt formats, or fine-tuned weights that make switching costs prohibitively high?). The organizations most exposed to lock-in risk are those that have built custom workflows, fine-tuned models, or integrated deeply with a vendor’s proprietary API stack without contractual protections for the exit scenario. Negotiate exit terms before you need them.
| Question | Why It Matters | 🚨 Red Flag Answer | ✅ Good Answer |
|---|---|---|---|
| ⚠️ Can we export all our data — inputs, outputs, configurations, logs — in a standard, non-proprietary format within 30 days of contract termination? | Data trapped in a vendor’s proprietary format after termination is effectively inaccessible — a compliance and operational risk | “Data export is available at additional cost”; proprietary format only; 90+ day export process | Standard format (CSV, JSON, Parquet) export within 30 days; no additional charge; completeness of export documented |
| If we have fine-tuned a model on our proprietary data, do we own those fine-tuned weights? Can we export them? | Fine-tuning on proprietary data creates IP value — if the vendor owns the result, you cannot take it when you leave | “Fine-tuned models remain on our platform and are not exportable”; vendor retains ownership of customer fine-tunes | Customer owns fine-tuned weights derived from customer data; export in standard format contractually guaranteed |
| What is the transition support commitment if we decide to move to a different vendor? | Migration complexity can be weaponized to prevent switching — contractual transition support reduces this risk | No transition support; migration documentation not provided; exit treated as a support issue rather than a service commitment | Contractual transition assistance period (e.g., 90 days); migration documentation provided; data export support included |
| ⚠️ What happens to our data if the vendor is acquired, goes insolvent, or discontinues the service? | AI vendor consolidation is accelerating in 2026 — acquisition can change data handling terms, pricing, and product roadmaps | “Material change” notification only; no exit rights triggered by acquisition; data retention undefined in insolvency | Termination for convenience right triggered by acquisition; data escrow arrangement; insolvency data recovery procedure documented |
| Are there open standards or APIs that reduce switching costs if we need to migrate to a different provider? | Proprietary API formats make migration expensive — open standards (OpenAI-compatible APIs, LangChain support) reduce lock-in | Entirely proprietary API and data format with no standard equivalents | OpenAI-compatible or standard API format; data in portable format throughout the relationship, not just at exit |
10. 🚨 Top 10 Red Flags in AI Vendor Documentation (2026)
The ten patterns below are the most consistently observed documentation failures in AI vendor evaluation packages across Q1/Q2 2026 enterprise procurement reviews. They are not hypothetical risks — they are documented failure modes that have caused production incidents, compliance violations, and contract disputes at organizations that deployed without catching them during evaluation. Each red flag describes what you will see in the documentation, what it signals about the vendor’s actual posture, and what to do when you encounter it.
The red flag evaluation principle: A red flag is not automatically disqualifying — it is a signal that requires a direct, specific response. The vendor’s response to a challenge is often more informative than the original documentation gap. A vendor that acknowledges a limitation honestly and explains their mitigation approach is significantly more trustworthy than one that deflects, becomes defensive, or provides a vague reassurance that sounds like it answers the question without actually doing so.
Red Flag 1: “Enterprise-Grade Security” With No Certifications. Vendors that describe their security as “enterprise-grade,” “bank-level,” or “military-grade” without providing a current SOC 2 Type II report, ISO 27001 certificate, or equivalent third-party audit evidence are using marketing language to substitute for evidence. Enterprise-grade is not a certification. Ask for the SOC 2 report. If they cannot provide it, treat this as a contract-blocking issue for any deployment involving sensitive data.
Red Flag 2: Vague Data Use Language in Terms of Service. Terms of service that include phrases like “we may use data to improve our services,” “anonymized data may be used for model improvement,” or “usage data is collected to enhance the platform” are intentionally broad. In the context of AI systems, “improve our services” can mean using your prompts, your outputs, and your data to train models that serve your competitors. Request explicit contractual prohibition on training use. If the vendor cannot provide it, assume the broad TOS language is the operative agreement and governs how your data can be used.
Red Flag 3: No Model Card or Transparency Documentation. AI vendors that cannot produce a model card, a bias audit report, or any form of technical performance documentation for the system you are evaluating are operating with a level of opacity that is incompatible with enterprise governance requirements and increasingly incompatible with regulatory requirements. In 2026, the absence of model documentation is not a gap — it is an answer. The answer is that the vendor has not done the transparency work that responsible AI deployment requires.
Red Flag 4: Hallucination Dismissed as “Normal AI Behavior.” Vendors that respond to hallucination questions with “all AI has this limitation” without providing specific measurement data, domain-specific error rates, or mitigation controls are normalizing a production risk rather than addressing it. Hallucination rates vary enormously by domain, task type, and mitigation design. A vendor that has not measured theirs cannot control theirs. This is particularly critical for legal, medical, financial, and compliance AI applications where the cost of a confident incorrect output is material.
Red Flag 5: Refusal to Model Production Cost at Scale. AI pricing that vendors will only discuss at a “contact sales” level — without a pricing calculator, a cost modeling tool, or willingness to project costs at 5x and 10x your pilot volume — is pricing that the vendor knows will not survive scrutiny. The organizations that have experienced the worst AI budget shocks in 2026 are consistently those that signed contracts based on pilot-scale cost estimates without modeling production-scale economics.
Red Flag 6: Compliance Claims That Reference the Wrong Regulation. Vendors that cite GDPR compliance for a US-only deployment, or SOC 2 as evidence of EU AI Act compliance, or HIPAA compliance as evidence of bias testing are citing real credentials as proof of compliance with frameworks those credentials do not cover. This pattern is particularly common in sales materials where the compliance section lists every certification the vendor holds regardless of whether it is relevant to your use case or jurisdiction. Do not accept certifications as answers to specific compliance questions without verifying the scope.
Red Flag 7: Exit Terms That Are Buried or Absent. Contracts where data export rights, transition support, and termination provisions are absent, vague, or placed in exhibits that require active negotiation to obtain are contracts designed to maximize switching costs. The time to negotiate exit terms is before signature — never after. If the vendor resists including explicit data portability and transition support provisions, ask why. The answer will tell you more about their long-term intentions than their sales pitch.
Red Flag 8: “We’re Aligned With NIST/ISO 42001” Without Documentation. Framework alignment claims that are not supported by a written mapping document, a certification, or third-party audit evidence are marketing claims. In 2026, the proliferation of framework alignment language in AI vendor materials has made it nearly meaningless as a differentiator without evidence. Request the specific control mapping or audit report. Our guide to ISO/IEC 42001 covers what genuine certification involves and what questions to ask to distinguish real compliance from marketing language.
Red Flag 9: No Defined Process for AI Behavioral Incidents. Vendors whose incident response process treats AI output quality failures (biased outputs, hallucinated facts, harmful content generation) the same as software downtime incidents — through a standard support ticket process — have not built the AI-specific governance infrastructure that production-grade deployment requires. AI behavioral failures are fundamentally different from software crashes: they are invisible, ongoing, and can cause harm at scale before anyone notices. The vendor needs a defined, separate process for handling them.
Red Flag 10: Terms of Service Changed Unilaterally in the Last 12 Months in Ways That Affected Data Handling. If a vendor has changed their data handling terms — training use, retention, subprocessors — without specific customer notification and consent in the past 12 months, that pattern predicts future behavior. Check the vendor’s terms of service change history before evaluation. An AI vendor that passed your checklist in 2024 may have quietly changed their data retention policy in 2026, as our original checklist noted. Build a re-review clause into every AI vendor contract — minimum annually, triggered immediately by any material terms of service change.
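Red Flag 5 asks vendors to project costs at 5x and 10x pilot volume. If a vendor will not do it, a rough first pass is easy to run yourself. The sketch below is illustrative only: it assumes purely usage-based, per-token pricing that scales linearly, and the function name, parameters, and sample figures are hypothetical rather than taken from any vendor's price list. Real contracts often add tier changes, overage rates, and minimum commitments that this does not capture.

```python
def project_monthly_cost(pilot_requests, tokens_per_request,
                         usd_per_1k_tokens, multipliers=(1, 5, 10)):
    """Project monthly spend at several multiples of pilot volume.

    Assumes linear, per-token pricing (an assumption, not a vendor fact).
    Returns a dict mapping each volume multiplier to a cost in USD.
    """
    projections = {}
    for multiplier in multipliers:
        total_tokens = pilot_requests * multiplier * tokens_per_request
        projections[multiplier] = round(total_tokens / 1000 * usd_per_1k_tokens, 2)
    return projections


# Hypothetical pilot: 100,000 requests/month, 1,500 tokens each, $0.002 per 1k tokens.
pilot = project_monthly_cost(
    pilot_requests=100_000,
    tokens_per_request=1_500,
    usd_per_1k_tokens=0.002,
)

for multiplier, cost in pilot.items():
    print(f"{multiplier}x pilot volume: ${cost:,.2f} per month")
```

Treat the output as a floor, not a forecast: ask the vendor to confirm or correct each figure in writing, including any tier or overage effects at the higher volumes.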
11. 📋 AI Vendor Scorecard: Rate Your Vendor Out of 100
The scorecard below provides a structured framework for comparing multiple AI vendors against the same criteria. Each category is weighted by commercial and compliance risk: security and data protection and compliance carry the highest weights (20 points each) because a single data breach or regulatory violation creates consequences that no amount of feature quality can offset. Complete the scorecard for each vendor being evaluated before any procurement discussion with leadership. The numerical output forces explicit comparison against consistent criteria rather than allowing narrative framing or recency bias from a compelling demo to dominate the decision.
How to use the scorecard: Score each category on the scale shown. A total score below 60 is a disqualifying result — do not proceed regardless of feature quality or pricing attractiveness. A score of 60–74 indicates significant gaps that must be resolved contractually before deployment. A score of 75–84 is acceptable for lower-risk internal deployments with appropriate governance controls in place. A score of 85–100 represents a vendor with strong governance posture suitable for high-risk or regulated deployment contexts.
| Category | Max Points | Full Points (Score = Max) | Zero Points (Score = 0) |
|---|---|---|---|
| 1. Security and Data Protection | 20 points | All 8 questions answered satisfactorily; SOC 2 Type II provided; training on customer data contractually prohibited; subprocessors fully disclosed | Any contract-blocking question (⚠️) answered with a red flag response |
| 2. Compliance and Regulatory | 20 points | EU AI Act risk classification documented; ISO 42001 certification current and in scope; Colorado AI Act bias assessment supported; NIST mapping available | Cannot determine EU AI Act risk classification; no compliance documentation for applicable regulations |
| 3. Model Transparency and Documentation | 15 points | Model card with disaggregated performance; training data datasheet; bias audit report; hallucination rate measured; model updates notified in advance | No model card; no performance documentation; bias testing not conducted or not disclosed |
| 4. Performance and Reliability | 10 points | Historical uptime evidence provided; evaluation on your data allowed; scalability evidence at target volume; drift detection documented | Refuses evaluation on customer data; no historical uptime evidence; no scalability data |
| 5. Pricing and Contract Terms | 10 points | Full cost model at 10x volume provided; price change caps in contract; overage protection; meaningful SLA remedies beyond service credits | Refuses to model scaled costs; no price protection; service credits only for production failures |
| 6. Integration and Technical Requirements | 10 points | Least-privilege integration architecture documented; SSO/MFA supported; API versioning policy clear; full audit logging exportable | Requires admin-level access; no SSO support; no audit log export capability |
| 7. Support and SLA Commitments | 8 points | 24/7 production support included; AI-specific incident response process; dedicated onboarding support; AI literacy training provided | Business hours only; no AI-specific incident process; self-service onboarding only, even for regulated deployments |
| 8. Exit Strategy and Data Portability | 7 points | Standard format export within 30 days; fine-tuned weight ownership contractually confirmed; transition support committed; acquisition exit rights included | No data portability; vendor owns fine-tuned models; no transition support; no exit rights on acquisition |
| TOTAL | 100 points | 85–100: Strong (suitable for regulated/high-risk deployment); 75–84: Acceptable (lower-risk internal deployments); 60–74: Gaps (resolve contractually before proceeding) | Below 60: Disqualifying (do not proceed regardless of feature quality or pricing) |
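To keep scoring consistent across evaluators, the arithmetic can be automated. The sketch below is one possible way to do that, not a prescribed tool: the category maximums and band thresholds come from the scorecard above, while the dictionary keys, function names, and the example vendor's scores are hypothetical.

```python
# Category maximums copied from the scorecard table; key names are illustrative.
MAX_POINTS = {
    "security_and_data_protection": 20,
    "compliance_and_regulatory": 20,
    "model_transparency": 15,
    "performance_and_reliability": 10,
    "pricing_and_contract_terms": 10,
    "integration_and_technical": 10,
    "support_and_sla": 8,
    "exit_and_data_portability": 7,
}


def rating_band(total):
    """Map a 0-100 total to the bands described in the scorecard."""
    if total >= 85:
        return "Strong"
    if total >= 75:
        return "Acceptable"
    if total >= 60:
        return "Gaps"
    return "Disqualifying"


def score_vendor(scores):
    """Validate per-category scores and return (total, band)."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must cover exactly the eight scorecard categories")
    for category, points in scores.items():
        if not 0 <= points <= MAX_POINTS[category]:
            raise ValueError(f"{category}: {points} is outside 0-{MAX_POINTS[category]}")
    total = sum(scores.values())
    return total, rating_band(total)


# Hypothetical vendor scores, for illustration only.
example_vendor = {
    "security_and_data_protection": 17,
    "compliance_and_regulatory": 14,
    "model_transparency": 11,
    "performance_and_reliability": 8,
    "pricing_and_contract_terms": 7,
    "integration_and_technical": 8,
    "support_and_sla": 6,
    "exit_and_data_portability": 5,
}

total, band = score_vendor(example_vendor)
print(f"Total: {total}/100 ({band})")
```

The validation step matters as much as the sum: rejecting a missing category or an out-of-range score stops a partially completed scorecard from producing a misleadingly low or high total.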
12. 🏁 Conclusion: Due Diligence Is Not a Gate — It Is a Foundation
The organizations that are deploying AI successfully in 2026 are not the ones with the most sophisticated vendor evaluation processes — they are the ones that treated vendor evaluation as the beginning of a governance relationship rather than a procurement hurdle to clear before getting to the interesting parts. The questions in this checklist are not designed to slow down AI adoption. They are designed to make AI adoption sustainable — by ensuring that the systems you deploy are actually built on the evidence-based governance foundations that make them trustworthy enough to scale, safe enough to remain deployed under regulatory scrutiny, and commercially structured in ways that do not create budget shocks and lock-in dependencies that limit your strategic flexibility later.
The regulatory environment in 2026 has made this more urgent, not less. Deployers cannot transfer EU AI Act liability to providers by contract — you are accountable for the AI systems you deploy regardless of what your vendor documentation says. The Colorado AI Act creates parallel obligations at the state level for high-risk AI in employment, housing, and financial services. The NIST AI RMF provides the operational framework for managing vendor AI risk continuously — not just at procurement time. Building the habit of rigorous vendor evaluation is the governance investment that compounds: every vendor you evaluate well makes the next evaluation faster, every contract term you negotiate once becomes a template for every subsequent negotiation, and every red flag you catch before deployment is a compliance incident, budget shock, or production failure you never have to manage. Use the checklist. Complete the scorecard. And review every vendor annually — because an AI vendor that passed your checklist last year may have quietly changed the terms that matter most.
📌 Key Takeaways
| | Key Takeaway |
|---|---|
| ✅ | Deployers cannot transfer EU AI Act regulatory liability to providers by contract — you are accountable for the AI systems you deploy regardless of vendor documentation, making pre-contract due diligence a legal obligation rather than an optional governance best practice. |
| ✅ | The three questions that must be answered before any AI vendor evaluation proceeds: Does our data get used to train their models? Where exactly does our data travel — including all subprocessors? And which specific EU AI Act risk category does this system fall into? |
| ✅ | ISO 42001 certification maps directly to seven EU AI Act articles and is a strong organizational signal — but it does not substitute for evidence of Article 17 QMS conformity. Treat certification as a starting point for compliance verification, not as a compliance pass. |
| ✅ | AI pricing at pilot scale does not predict production-scale costs — token usage spikes, API dependency accumulation, and model tier requirements create budget shocks that are invisible during evaluation unless you explicitly model costs at 5x and 10x pilot volume before contract signature. |
| ✅ | Shadow AI breaches cost an average $670,000 more than standard security incidents — rigorous vendor due diligence that provides employees with approved, capable AI tools is the governance investment that prevents the shadow AI adoption that creates that exposure. |
| ✅ | A vendor scorecard total below 60 out of 100 is disqualifying regardless of feature quality or pricing — the governance gaps indicated by a score that low create compliance, financial, and operational risks that no capability advantage can offset. |
| ✅ | Exit terms — data portability, fine-tuned weight ownership, transition support, acquisition rights — must be negotiated before contract signature, not after a deployment decision makes switching costs prohibitively high. The time to negotiate leverage is when you are choosing, not when you are leaving. |
| ✅ | Build a mandatory re-evaluation trigger into every AI vendor contract: annual review as a baseline, plus immediate re-evaluation triggered by any acquisition, major security incident, terms of service change affecting data handling, or significant regulatory action involving the vendor. |
🔗 Related Articles
- 📖 AI Governance Explained: How to Build an AI Policy Framework Your Organization Will Follow
- 📖 ISO/IEC 42001 Explained: How to Build an AI Management System (AIMS)
- 📖 Shadow AI Explained: What It Is, Why It Happens, and How to Manage It
- 📖 AI Risk Assessment: How to Evaluate AI Use Cases Before You Deploy Them
- 📖 EU AI Act Explained: A Beginner-Friendly Compliance Guide and Practical Checklist
❓ Frequently Asked Questions: AI Vendor Due Diligence
1. How often should we repeat AI vendor due diligence after initial onboarding?
At minimum annually — but also triggered immediately by any vendor acquisition, major security incident, terms of service change affecting data handling, or significant regulatory action involving the vendor. An AI vendor that passed your checklist in 2024 may have changed their data retention or training use policies by 2026. Build a contractual re-evaluation trigger into every AI vendor agreement, and connect it to your organization’s broader AI governance framework so re-evaluations are systematic rather than reactive.
2. Does ISO 42001 certification mean a vendor is EU AI Act compliant?
No — ISO 42001 maps to seven EU AI Act articles and demonstrates organizational AI governance maturity, but certification alone does not substitute for evidence of Article 17 QMS conformity for high-risk AI systems. For 2026 procurement decisions, treat ISO 42001 as a strong signal that requires verification against specific EU AI Act requirements. Our ISO/IEC 42001 guide explains the mapping between the standard and the regulation and what additional evidence to request.
3. If our vendor is breached, are we liable under the EU AI Act even if we had a strong DPA?
Potentially yes. The EU AI Act places obligations directly on deployers that cannot be fully transferred to providers by contract. You remain accountable for ensuring the AI systems you deploy meet applicable requirements, regardless of vendor contractual commitments. Deployers of high-risk systems have independent obligations under Article 26, including human oversight, and certain deployers must also carry out a fundamental rights impact assessment under Article 27. Our EU AI Act compliance guide covers the deployer obligation framework in detail.
4. What should we do if an AI vendor fails one or more ⚠️ contract-blocking questions?
Stop the evaluation until the issue is resolved — do not proceed to contract negotiation while a contract-blocking question remains unresolved. Present the gap to the vendor in writing and request a specific, documented response. If the vendor cannot address the gap, the evaluation result is a disqualification. If they can address it, require the resolution to be reflected in the contract rather than relying on verbal commitments. Our shadow AI guide covers what happens when vendors are rejected and employees adopt tools informally — and why providing approved alternatives is the governance investment that prevents that outcome.
5. How do we evaluate AI vendors that use a third-party foundation model (OpenAI, Anthropic, etc.) rather than their own?
Request the full subprocessor chain — who the vendor uses, what data flows to that provider, and what contractual protections cover each link. Under ISO 42001 and the EU AI Act, responsibility follows the system regardless of how many layers of providers are involved. You need to understand and accept the data handling terms of every entity that touches your data, not just the primary vendor. The vendor’s dependency on a foundation model provider also means their product roadmap, pricing, and availability are partially outside their control — a material factor in your exit strategy and continuity planning.