Privacy, Data Governance, and Security
Last reviewed: 2026-05-11
Privacy and data governance are critical pillars of AI governance because AI systems consume and generate large volumes of data, often including personal and sensitive information. Compliance with privacy laws — GDPR in the EU, CCPA/CPRA and state-level laws in the US, PIPEDA in Canada, PIPA in Korea — is the starting point, but a mature AI governance programme also addresses data quality, model security, third-party risk, and incident response specifically calibrated to AI.
Foundational privacy law
The EU General Data Protection Regulation (GDPR) mandates principles — lawfulness, fairness, transparency, purpose limitation, data minimisation, accuracy, storage limitation, integrity and confidentiality, accountability — that apply directly to AI training and inference. Article 22 GDPR restricts solely-automated decisions producing legal or similarly significant effects, requiring human review and meaningful information about the logic involved.[1] Fines for serious infringements reach 4% of global annual turnover or €20 million, whichever is higher.
In the United States, CCPA/CPRA gives California residents rights to access, correct, delete, opt out of sale or sharing, and limit use of sensitive personal information.[2] Other states (Virginia VCDPA, Connecticut CTDPA, Colorado CPA, Utah UCPA, Texas TDPSA, Oregon OCPA, Delaware DPDPA, Iowa ICDPA, Tennessee TIPA, Montana MCDPA, Florida FDBR, New Jersey NJDPA, New Hampshire NHPA) have comparable statutes with varying coverage and enforcement; organisations operating across multiple states should follow a maintained resource such as the IAPP Westin Research Center's state privacy legislation tracker to keep current.
Sector-specific privacy regimes (HIPAA for healthcare, GLBA for financial services, FERPA for education, COPPA for children) overlay these horizontal laws and apply directly to AI used in those sectors.
International privacy laws relevant to AI include Brazil’s LGPD, Japan’s APPI, Singapore’s PDPA, India’s DPDPA (effective 2024-2025 in phases), and Korea’s PIPA. Many include provisions for automated decision-making analogous to GDPR Article 22.
Data quality and lineage
AI outcomes are only as good as the data the underlying models are trained on. Mature data governance for AI requires:
- Provenance tracking — for each dataset, document source, acquisition method, licensing, and any consent obtained (a minimal machine-readable record is sketched after this list).
- Quality assessment — measure completeness, accuracy, freshness, representativeness; document known limitations.
- Datasheets for datasets — the practice proposed by Gebru et al. (2018) of cataloguing motivation, composition, collection process, preprocessing, uses, distribution, and maintenance per dataset.[3] Increasingly required by the EU AI Act Article 10 data governance obligations.
- Schema and version control — treat training data with the same rigour as code.
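In practice many teams keep these records machine-readable so pipelines can check them automatically. A minimal sketch, assuming a Python-based pipeline; the DatasetRecord class and its field names are illustrative, not the Gebru et al. datasheet template or any standard schema:

```python
# Illustrative per-dataset provenance record; field names are assumptions, not a standard.
from dataclasses import dataclass, field, asdict
import hashlib
import json

@dataclass
class DatasetRecord:
    name: str
    version: str
    source: str                  # where the data came from
    acquisition_method: str      # e.g. vendor purchase, web crawl, direct consent
    licence: str
    consent_basis: str           # legal basis / consent obtained, if personal data
    known_limitations: list = field(default_factory=list)
    content_sha256: str = ""     # hash of the frozen snapshot used for training

    @staticmethod
    def hash_file(path: str) -> str:
        """Hash a data file so later runs can confirm they trained on this exact version."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

record = DatasetRecord(
    name="support-tickets",
    version="2026-04",
    source="internal CRM export",
    acquisition_method="customer contract",
    licence="internal use only",
    consent_basis="contract performance; retention 24 months",
    known_limitations=["English only", "under-represents mobile users"],
)
print(json.dumps(asdict(record), indent=2))
```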
For models subject to California AB 2013 or EU AI Act Article 53, a published training-data summary is now mandatory — see Copyright & IP.
Data minimisation and access control
AI systems should use the minimum data necessary for their purpose. Personal data not needed should not be collected; data that is needed should be pseudonymised, encrypted, and access-controlled. Common patterns:
- Role-based access controls restricting training-data access to authorised personnel and processes.
- Data tokenisation or pseudonymisation where raw identifiers are not required for training (see the sketch after this list).
- Differential privacy during training to provide mathematical guarantees against record-level disclosure.
- Output-side restrictions to prevent models from echoing memorised personal data — particularly important for large language models trained on web-scale data.
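One common way to implement the tokenisation point is keyed pseudonymisation of direct identifiers before data reaches the training job. A minimal sketch, assuming a Python pipeline; the pseudonymise helper, the hard-coded key, and the email field are illustrative assumptions, and free-text fields need separate treatment:

```python
# Deterministic keyed pseudonymisation: the same identifier maps to the same token,
# so joins still work, but raw values never reach the training job.
# The key below is a placeholder; in production it would come from a secrets manager.
import hmac
import hashlib

PSEUDONYM_KEY = b"replace-with-key-from-secrets-manager"

def pseudonymise(value: str) -> str:
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

record = {"email": "jane@example.com", "ticket_text": "cannot log in since upgrade"}
record["email"] = pseudonymise(record["email"])
print(record)  # identifier replaced by a stable token; the free text still needs its own review
```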
Privacy-enhancing technologies (PETs)
PETs are no longer experimental. By 2026 several are production-grade:
- Differential privacy — mathematical noise addition during training (DP-SGD) or on outputs (DP aggregation) that bounds the influence of any individual record (a toy training-step sketch follows this list).
- Federated learning — models trained across distributed datasets without centralising the data; widely deployed in healthcare, mobile keyboards, and cross-institutional research.
- Homomorphic encryption — computation on encrypted data; performance has improved but still constrains practical use to specific inference workloads.
- Secure multi-party computation (MPC) — joint computation without revealing inputs.
- Trusted execution environments (TEEs) — hardware-isolated computation environments (Intel SGX, AMD SEV, Apple Private Cloud Compute); now widely adopted for inference on sensitive data.
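To make the DP-SGD mechanism concrete, here is a toy sketch of one training step on a linear least-squares model: clip each per-record gradient, then add Gaussian noise scaled to the clipping bound. The clip norm, noise multiplier, and learning rate are illustrative values; real deployments use a maintained library (for example Opacus or TensorFlow Privacy) that also tracks the cumulative privacy budget:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))   # one minibatch: 32 records, 5 features
y = rng.normal(size=32)
w = np.zeros(5)

clip_norm = 1.0        # per-record L2 bound
noise_multiplier = 1.1
lr = 0.1

# Per-record gradients of squared error for a linear model: 2 * (x.w - y) * x
per_record_grads = 2 * (X @ w - y)[:, None] * X

# Clip each record's gradient so no single record dominates the update.
norms = np.linalg.norm(per_record_grads, axis=1, keepdims=True)
clipped = per_record_grads / np.maximum(1.0, norms / clip_norm)

# Sum, add noise calibrated to the clipping bound, and average.
noisy_sum = clipped.sum(axis=0) + rng.normal(scale=noise_multiplier * clip_norm, size=w.shape)
w -= lr * noisy_sum / len(X)
print(w)
```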
PETs increasingly appear in regulatory expectations — the EU AI Act Article 10 references them implicitly, and US sector regulators cite them as appropriate safeguards.
Retention and purpose limitation
Data governance policies should define retention schedules and purpose-limitation controls for training and inference data. GDPR requires data not be kept longer than necessary; CCPA permits consumer-initiated deletion. Practically:
- Document retention windows per dataset; automate deletion at end-of-window (a minimal check is sketched after this list).
- Re-purposing review — if a dataset is reused for a new model or use case, evaluate consent and purpose limitation before training.
- Right-to-deletion engineering — build the ability to remove specific records from training corpora and either retrain or apply targeted machine-unlearning techniques.
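A minimal sketch of the end-of-window check, assuming retention windows are tracked per dataset; the dataset names and window lengths are illustrative, and a real system would act on the storage layer and log deletions for audit:

```python
from datetime import date, timedelta

RETENTION_DAYS = {            # dataset -> retention window (illustrative values)
    "support-tickets": 730,
    "clickstream-raw": 90,
}

def past_retention(dataset: str, acquired: date, today: date | None = None) -> bool:
    today = today or date.today()
    return today > acquired + timedelta(days=RETENTION_DAYS[dataset])

for name, acquired in [("support-tickets", date(2024, 3, 1)),
                       ("clickstream-raw", date(2026, 4, 1))]:
    if past_retention(name, acquired, today=date(2026, 5, 11)):
        print(f"{name}: past retention window; schedule deletion and retraining review")
    else:
        print(f"{name}: within retention window")
```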
Model security
AI models themselves are attack targets. Threat categories include:
- Model extraction — adversaries query the model to reconstruct it or its training data.
- Adversarial examples — inputs crafted to cause misclassification.
- Data poisoning — tampering with training data to embed backdoors or biases.
- Prompt injection — manipulating LLM behaviour through crafted input, particularly via untrusted data in retrieval pipelines.
- Model evasion — bypassing safety filters or content controls.
The NIST AI 600-1 GenAI Profile (updated March 2025) added explicit threat categories for poisoning, evasion, extraction, and model manipulation — this update is the most current US reference for GenAI threat modelling.[4]
Defensive measures:
- Adversarial training — train on adversarial examples to harden the model.
- Input filtering and output filtering — structural controls in the deployment pipeline (an output-side example is sketched after this list).
- Rate limiting and behavioural monitoring — detect extraction and reconnaissance attempts.
- Red-teaming — systematic adversarial evaluation, increasingly required by frontier-model frameworks (see Frontier Models).
- Provenance verification of upstream components (base models, weights, fine-tuning datasets).
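As one example of an output-side structural control, a deployment pipeline can redact obvious personal-data patterns before a response is returned. A minimal sketch; the patterns and redaction policy are illustrative, and production filters typically combine pattern matching, classifiers, and application-specific allow-lists:

```python
import re

# Illustrative patterns only; tune and extend for the deployment context.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(model_output: str) -> str:
    for label, pattern in PATTERNS.items():
        model_output = pattern.sub(f"[REDACTED {label}]", model_output)
    return model_output

print(redact("Contact jane.doe@example.com, SSN 123-45-6789."))
```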
Data security
AI data pipelines must be secured with general infosec hygiene plus AI-specific considerations:
- Encryption in transit and at rest for training data and model weights.
- Identity and access management for systems handling training data and models.
- Integrity verification of training datasets (cryptographic hashing, version control; see the manifest sketch after this list).
- Poisoning detection — statistical anomaly detection in training data.
- Backup and recovery for model weights and training pipelines as critical assets.
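A minimal sketch of the integrity-verification point: hash each data file when the training snapshot is frozen, keep the manifest under version control, and re-verify before every run. The paths and file patterns are illustrative:

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(data_dir: str) -> dict:
    # Hash every CSV under the snapshot directory (file pattern is an assumption).
    return {str(p): sha256(p) for p in sorted(Path(data_dir).rglob("*.csv"))}

def verify(data_dir: str, manifest: dict) -> list:
    """Return the files whose current hash no longer matches the recorded one."""
    current = build_manifest(data_dir)
    return [p for p, digest in manifest.items() if current.get(p) != digest]

# At snapshot time:   manifest = build_manifest("training_data/")
# Before each run:    tampered = verify("training_data/", manifest)  # expect an empty list
```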
Third-party and supply chain risks
Most AI systems incorporate third-party components: pre-trained foundation models, open-source libraries, cloud AI services, vector databases, datasets. Governance must extend to these:
- Vendor assessment for security, privacy, and AI-specific compliance posture.
- Model provenance — document source, version, and license of every model component.
- License compliance for open-source models with restrictive terms.
- Contractual allocation of responsibility for incident response, data handling, and regulatory cooperation.
- Cross-border data transfer mechanisms (Standard Contractual Clauses, adequacy decisions) where data crosses jurisdictions.
EU AI Act Article 25 explicitly addresses provider obligations through the value chain for high-risk systems.
Incident response
AI incident response should be a dedicated discipline within general incident response, with playbooks for:
- Safety incidents — AI causes physical or financial harm.
- Bias incidents — AI is found to produce discriminatory outcomes.
- Privacy incidents — AI reveals or is alleged to reveal personal data.
- Security incidents — AI is compromised or used to attack other systems.
- Misuse incidents — AI is used by adversaries for harmful purposes.
Mandatory incident reporting now applies in multiple regimes — EU AI Act Article 55 (serious incidents for systemic-risk GPAI), California SB 53 (critical incidents), Korea AI Basic Act, sector regulators (FDA for medical devices, OCC for banks). Map your reporting obligations early; many regimes have short windows (e.g., 15 days for serious incidents under the EU AI Act).
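Some teams encode this mapping so the playbook can surface reporting due dates automatically. A minimal sketch; the regime list and the window values other than the EU AI Act's 15 days are placeholders to be confirmed against the current legal text:

```python
from datetime import date, timedelta

REPORTING_RULES = [
    # (incident category, regime, reporting window in days)
    ("serious_incident", "EU AI Act serious-incident reporting", 15),
    ("critical_incident", "California SB 53 critical incidents", 15),  # placeholder window
]

def reporting_deadlines(category: str, aware_on: date) -> list[tuple[str, date]]:
    return [(regime, aware_on + timedelta(days=window))
            for cat, regime, window in REPORTING_RULES if cat == category]

for regime, due in reporting_deadlines("serious_incident", date(2026, 5, 11)):
    print(f"{regime}: report due by {due}")
```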
Audits and red-teaming
Independent audits are increasingly expected:
- Bias audits — required by NYC LL 144 for hiring AI; emerging as best practice broadly.
- Privacy audits — required by sector privacy regulators; increasingly required by procurement.
- Security audits and red-teaming — required for frontier models by EU and emerging US frameworks.
- ISO/IEC 42001 certification audits — available since 2025 with the publication of ISO/IEC 42006 (see ISO standards).
Coordination across functions
A robust AI governance programme coordinates privacy, security, data governance, AI/ML engineering, legal, compliance, and product functions. Many organisations establish an AI Governance Council with representation from each function, supported by an AI Governance Office that owns documentation, audits, and regulatory engagement. ISO/IEC 42001 specifies this coordination at the management-system level.
GDPR Info. Art. 22 GDPR — Automated individual decision-making, including profiling. ↩︎
Cloudflare. What is the CCPA? ↩︎
Gebru, T., et al. (2021). Datasheets for Datasets. Communications of the ACM, 64(12). ↩︎
NIST. Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1). ↩︎