7. AI Compliance

This section explains the AI compliance factors that companies developing and operating AI systems must consider from an open source perspective. It focuses on the open source cross-cutting requirements of the ISO/IEC 42001 AI management system standard.

AI systems make extensive use of open source frameworks, pre-trained models, and open datasets. Companies that operate an open source management system (ISO/IEC 5230 · 18974) must apply open source compliance principles during the AI system development phase as well. In addition, development environments that leverage AI coding tools (GitHub Copilot, Claude Code, Cursor, etc.) require a new management framework to address license contamination and the introduction of vulnerable packages.

ISO/IEC 42001 (AI management system) covers AI governance as a whole, and some of its clauses directly intersect with open source management. This section organizes those intersections from a practical perspective.

1. Three Areas Where Open Source Is Used in AI Systems

AI System
  ├── 1. AI Frameworks · Libraries
  │       (PyTorch, TensorFlow, Hugging Face Transformers, LangChain, etc.)
  │       → Apply general open source license compliance
  │
  ├── 2. Pre-trained Models
  │       (Llama, Mistral, Falcon, BERT, etc.)
  │       → Custom per-model license verification required
  │
  └── 3. Training Datasets
          (Common Crawl, Wikipedia, CC-BY datasets, etc.)
          → Fulfill open data license obligations

Each area differs in places from the existing open source compliance process. The following subsections cover each area in turn.

2. Open Source Management by AI System Area

(1) Managing AI Frameworks · Libraries

Open source frameworks and libraries used in AI development are subject to the same ISO/IEC 5230 open source management process as general software.

Major AI Framework Licenses

| Framework | License | Commercial Use | Key Obligations |
| --- | --- | --- | --- |
| PyTorch | BSD 3-Clause | ✅ Allowed | Copyright notice |
| TensorFlow | Apache 2.0 | ✅ Allowed | Copyright notice, change notice |
| Hugging Face Transformers | Apache 2.0 | ✅ Allowed | Copyright notice |
| LangChain | MIT | ✅ Allowed | Copyright notice |
| scikit-learn | BSD 3-Clause | ✅ Allowed | Copyright notice |
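To check a table like this against what is actually installed, the declared license can be read from package metadata. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper names `extract_license` and `installed_license` and the sample package `example-ai-lib` are illustrative assumptions, and declared metadata is only a starting point for review, not a legal determination.

```python
from email.message import Message
from importlib import metadata


def extract_license(meta):
    """Pick the most specific license declaration from package metadata.

    `meta` is any object with .get() and .get_all(), such as the value
    returned by importlib.metadata.metadata().
    """
    # SPDX expression field, when the package declares one
    expression = meta.get("License-Expression")
    if expression:
        return expression
    # Free-text License field (some packages leave it as "UNKNOWN")
    declared = meta.get("License")
    if declared and declared.strip().upper() != "UNKNOWN":
        return declared.strip()
    # Fall back to trove classifiers such as
    # "License :: OSI Approved :: BSD License"
    for classifier in meta.get_all("Classifier") or []:
        if classifier.startswith("License ::"):
            return classifier.split("::")[-1].strip()
    return None


def installed_license(dist_name):
    """Return the declared license of an installed distribution, or None."""
    try:
        return extract_license(metadata.metadata(dist_name))
    except metadata.PackageNotFoundError:
        return None


# Hypothetical metadata standing in for a real package
sample = Message()
sample["Name"] = "example-ai-lib"
sample["License"] = "UNKNOWN"
sample["Classifier"] = "Programming Language :: Python :: 3"
sample["Classifier"] = "License :: OSI Approved :: BSD License"
print(extract_license(sample))

# Packages that are not installed simply report None
for name in ("torch", "tensorflow", "transformers", "langchain", "scikit-learn"):
    print(name, "->", installed_license(name))
```

Packages that declare nothing usable return `None` and should be routed to manual review.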

Checkpoints

(2) Managing Pre-trained Models

Pre-trained models often use custom licenses that differ from general open source libraries. In particular, they may include commercial use restrictions or obligations to disclose derivative models, so caution is required.

Major Open Source AI Model License Types (as of 2026-05)

The following table summarizes the licenses of leading industry models as of 2026-05. Based on OSAID 1.0 (Open Source AI Definition, OSI, 2024-10), it distinguishes between “open source AI models” (disclosure of all three elements: data, code, and weights) and “Open Weight models” (weights only disclosed).

| License Type | Representative Models (as of 2026) | OSAID 1.0 | Commercial Use | Derivative Model Disclosure |
| --- | --- | --- | --- | --- |
| Apache 2.0 | Mistral 7B, Qwen 2.5 / Qwen 3, Falcon 7B/40B, GPT-J | ✅ Compliant | ✅ Allowed | ❌ Not required |
| MIT | DeepSeek-V3 / DeepSeek-R1, Phi-4 | ✅ Compliant | ✅ Allowed | ❌ Not required |
| Meta Llama Community License | Llama 3.1 / 3.3 / 4 | ⚠️ Open Weight | Conditional (free for MAU ≤ 700M) | ❌ Not required |
| Gemma Terms of Use v3 | Gemma 3 | ⚠️ Open Weight | Conditional (AUP acceptance) | ❌ Not required |
| TII Falcon 180B License | Falcon 180B | ⚠️ Open Weight | Separate conditions for commercial use | Check terms of use |
| CC-BY 4.0 | Some academic models | ⚠️ Data only | ✅ Allowed | Attribution required |
| CC-BY-NC 4.0 | Some research models | ❌ Non-commercial only | ❌ Non-commercial only | |
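The commercial-use column above can be turned into a simple intake rule for new models. The sketch below is illustrative only: the dictionary `MODEL_LICENSE_TABLE`, the function `commercial_review`, and the three outcome labels are assumptions, and the statuses are transcribed from the table rather than from the license texts themselves.

```python
# Commercial-use status per license type, transcribed from the table above.
# Keys and outcome labels are illustrative, not a standard vocabulary.
MODEL_LICENSE_TABLE = {
    "Apache 2.0": "allowed",
    "MIT": "allowed",
    "Meta Llama Community License": "conditional",
    "Gemma Terms of Use": "conditional",
    "TII Falcon 180B License": "conditional",
    "CC-BY 4.0": "allowed",
    "CC-BY-NC 4.0": "non-commercial",
}


def commercial_review(license_type):
    """Map a model license type to an intake decision for commercial use."""
    status = MODEL_LICENSE_TABLE.get(license_type)
    if status == "allowed":
        return "approve"
    if status == "non-commercial":
        return "reject"
    # Conditional licenses and anything not in the table go to legal review
    return "legal-review"


for license_type in ("MIT", "Meta Llama Community License", "CC-BY-NC 4.0", "Custom"):
    print(license_type, "->", commercial_review(license_type))
```

Sending unknown licenses to legal review by default keeps the rule conservative when a model ships under terms the table does not cover.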

Including Model Information in the AI SBOM

Extend the SBOM (Software Bill of Materials) into an AI SBOM that also covers pre-trained models. The two de facto industry-standard formats are SPDX 3.0 AI Profile (strong on license/copyright expression) and CycloneDX 1.6 ML-BOM (rich in security/ethics/performance metadata), and organizations may adopt either one or both.

# Example AI SBOM model entry (based on SPDX 3.0 AI Profile)
- name: "meta-llama/Llama-3.1-8B"
  version: "3.1"
  license: "Llama Community License Agreement"
  primaryPurpose: "inference"
  hyperparameter:
    contextWindow: 131072
  modelCard: "https://huggingface.co/meta-llama/Llama-3.1-8B"

For the key field specifications of each standard (12 fields in SPDX 3.0 AI Profile / 4 modelCard areas in CycloneDX 1.6 ML-BOM) as well as authoring examples and tool usage, refer to the AI SBOM Guide (Korean).
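As a rough illustration of how such an entry could be checked before it is committed, the sketch below mirrors the example entry above as a Python dictionary and reports empty fields. The field names follow that example rather than the normative SPDX 3.0 property names, and `REQUIRED_FIELDS` and `missing_fields` are hypothetical internal rules, not part of either standard.

```python
import json

# Fields this sketch treats as mandatory (an assumed internal rule)
REQUIRED_FIELDS = ("name", "version", "license", "primaryPurpose", "modelCard")


def missing_fields(entry):
    """Return the required fields that are absent or empty in a model entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


# Same values as the example entry above
entry = {
    "name": "meta-llama/Llama-3.1-8B",
    "version": "3.1",
    "license": "Llama Community License Agreement",
    "primaryPurpose": "inference",
    "hyperparameter": {"contextWindow": 131072},
    "modelCard": "https://huggingface.co/meta-llama/Llama-3.1-8B",
}

problems = missing_fields(entry)
if problems:
    print("Incomplete AI SBOM entry, missing:", ", ".join(problems))
else:
    print(json.dumps(entry, indent=2))
```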

(3) Managing Training Datasets

If a dataset used to train an AI model is subject to an open data or Creative Commons license, you must fulfill the conditions of that license.

Major Open Data License Types

| License | Attribution | Commercial Use | Share-Alike |
| --- | --- | --- | --- |
| CC0 | ❌ Not required | ✅ Allowed | ❌ Not required |
| CC-BY 4.0 | ✅ Required | ✅ Allowed | ❌ Not required |
| CC-BY-SA 4.0 | ✅ Required | ✅ Allowed | ✅ Required |
| CC-BY-NC 4.0 | ✅ Required | ❌ Non-commercial only | ❌ Not required |

Checkpoints

  • Record training datasets and their licenses in the AI SBOM
  • When using CC-BY-family data, specify the source in the model card or system documentation
  • When CC-BY-SA data is used for training, consult the legal team on the license treatment of derivative models
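The checkpoints above can be sketched as a small report over the dataset inventory. This is a minimal illustration: the SPDX-style identifiers (for example `CC0-1.0` for the CC0 row), the dataset names, and the function `dataset_obligations` are assumptions introduced for the example.

```python
# License terms transcribed from the open data license table above,
# keyed by SPDX-style identifiers (an assumed mapping for this sketch)
DATASET_LICENSE_TERMS = {
    "CC0-1.0": {"attribution": False, "commercial": True, "share_alike": False},
    "CC-BY-4.0": {"attribution": True, "commercial": True, "share_alike": False},
    "CC-BY-SA-4.0": {"attribution": True, "commercial": True, "share_alike": True},
    "CC-BY-NC-4.0": {"attribution": True, "commercial": False, "share_alike": False},
}


def dataset_obligations(datasets):
    """Group (name, license_id) pairs by the follow-up action they need."""
    report = {"attribution": [], "legal_review": [], "non_commercial": []}
    for name, license_id in datasets:
        terms = DATASET_LICENSE_TERMS.get(license_id)
        if terms is None:
            # Unknown license: do not guess, escalate
            report["legal_review"].append(name)
            continue
        if terms["attribution"]:
            report["attribution"].append(f"{name} ({license_id})")
        if terms["share_alike"]:
            report["legal_review"].append(name)
        if not terms["commercial"]:
            report["non_commercial"].append(name)
    return report


# Hypothetical inventory
datasets = [
    ("example-wiki-dump", "CC-BY-SA-4.0"),
    ("example-public-domain-corpus", "CC0-1.0"),
    ("example-research-set", "CC-BY-NC-4.0"),
    ("example-scraped-set", "unknown"),
]
for action, names in dataset_obligations(datasets).items():
    print(action, "->", names)
```

The `attribution` list can feed the model card, while `legal_review` collects both share-alike and unidentified datasets.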

3. Alignment with ISO/IEC 42001

If a company operates or is preparing an ISO/IEC 42001 AI management system, the following clauses connect directly to open source management.

| ISO 42001 Clause | Role of the Open Source Manager |
| --- | --- |
| §5.2 AI Policy | Include open source usage principles in the AI policy |
| §6.1.2 AI Risk Assessment | Identify and assess OSS license/vulnerability risks |
| §7.5 Documentation | Establish and maintain the AI SBOM |
| §8.5 AI Lifecycle | Review OSS compliance at each development phase |
| §8.6 AI Data | Manage dataset licenses |
| §8.8 External AI Procurement | Verify the supply chain of external open source models |

Full guide on the open source cross-cutting requirements of ISO/IEC 42001: ISO/IEC 42001 Guide (Korean)

4. Leveraging AI Work Group Deliverables

The AI Work Group of the OpenChain Korea Work Group developed an AI SBOM compliance guide. This guide provides detailed instructions on how to document AI system components (models, datasets, frameworks) in SPDX 3.0 AI Profile or CycloneDX 1.6 ML-BOM format.

5. Compliance When Using AI Coding Tools

AI coding tools such as GitHub Copilot, Claude Code, Cursor, and Windsurf boost development productivity, but they also bring new risks from an open source compliance perspective.

(1) Key Risks of AI Coding Tools

  • License contamination risk: AI models are trained on open source code and can reproduce similar code, so copyleft code such as GPL may be introduced inadvertently.
  • Recommendation of vulnerable packages: AI sometimes recommends older versions based on its training data, so packages containing known CVEs may be introduced.
  • Missing dependency SBOM entries: Dependency packages suggested by AI are also subject to SBOM and vulnerability management.
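The third risk, dependencies that never reach the SBOM, can be detected by comparing the dependency manifest with the SBOM's component names. The sketch below assumes a pip-style `requirements.txt` and a plain list of component names already extracted from the SBOM. The function names and sample package names are hypothetical.

```python
import re


def normalize(name):
    """Normalize a Python package name so 'Scikit_Learn' equals 'scikit-learn'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements_text):
    """Extract package names from pip-style requirement lines."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def missing_from_sbom(requirements_text, sbom_component_names):
    """Return declared dependencies that have no matching SBOM component."""
    recorded = {normalize(name) for name in sbom_component_names}
    return sorted(requirement_names(requirements_text) - recorded)


# Hypothetical inputs
requirements = """\
requests==2.31.0
# added after an AI suggestion
Scikit_Learn>=1.4
-r other.txt
example-ai-helper
"""
sbom_names = ["requests", "scikit-learn"]
print(missing_from_sbom(requirements, sbom_names))
```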

(2) Four-Stage Strategy by Assurance Level

| Stage | Core Means | Assurance Level | Recommended For |
| --- | --- | --- | --- |
| Stage 1: Reliance on prompts | None (individual memory) | Low | Individual experimentation |
| Stage 2: Internalizing AI rules | CLAUDE.md · .cursorrules, etc. | Medium | Team collaboration |
| Stage 3: Automated CI/CD blocking | syft · grype · ORT | High | Team · Organization |
| Stage 4: Continuous monitoring | Dependabot · Renovate + AI | Very High | Organization · Company-wide |

Stage 1 can be started immediately, but true automated blocking (hard block) takes effect from Stage 3 onward.

(3) Internalizing the Open Source Policy in AI Rule Files

If you specify the open source policy in advance in AI tool configuration files such as CLAUDE.md · .cursorrules · .clinerules, the AI will automatically be aware of the policy when generating code.

## Open Source Policy

### License Management
Always verify and specify the license when adding a new external package.

**Allowed licenses**: MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC
**Caution licenses** (legal review required): LGPL, MPL
**Prohibited licenses** (may not be used without prior approval): GPL, AGPL, SSPL

### Security Management
- Do not use package versions with known CVE vulnerabilities
- Run a vulnerability audit after adding dependencies: `npm audit` / `pip-audit` / `trivy fs .`

### SBOM Management
- The SBOM must be updated when dependencies change
- Generation tools: syft, cdxgen, trivy
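The three license tiers in the policy above can also be expressed as a small classifier, so that the same rule is available to scripts as well as to the AI tool. This is a minimal sketch: it matches SPDX identifiers by prefix, does not parse compound SPDX expressions, and the function name `classify_license` is an assumption.

```python
# Tiers copied from the policy above; prefix matching is an assumed
# simplification so that "GPL-3.0-only" and "LGPL-2.1-or-later" are caught
ALLOWED = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}
CAUTION_PREFIXES = ("LGPL", "MPL")
PROHIBITED_PREFIXES = ("GPL", "AGPL", "SSPL")


def classify_license(spdx_id):
    """Return 'allowed', 'caution', 'prohibited' or 'unknown' for one SPDX ID."""
    if spdx_id in ALLOWED:
        return "allowed"
    if spdx_id.startswith(CAUTION_PREFIXES):
        return "caution"
    if spdx_id.startswith(PROHIBITED_PREFIXES):
        return "prohibited"
    # Anything outside the policy lists needs a human decision
    return "unknown"


for spdx_id in ("MIT", "LGPL-2.1-or-later", "AGPL-3.0-only", "Unlicense"):
    print(spdx_id, "->", classify_license(spdx_id))
```

Treating unlisted licenses as `unknown` rather than allowed keeps the default on the safe side.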

(4) Automated CI/CD Pipeline Blocking

Automatically verify with SCA (Software Composition Analysis) tools before merging a PR.

| Role | Tool | Behavior |
| --- | --- | --- |
| SBOM generation | syft | Extracts all dependencies in CycloneDX/SPDX format |
| Vulnerability blocking | grype | Fails the build when a High/Critical CVE is found |
| License blocking | ORT / script | Fails the build when a prohibited license is found |

For how to configure CI/CD, refer to 4. Tools in this guide.
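For the "script" option in the license blocking row, a gate could look roughly like the following. It is a sketch under stated assumptions: the SBOM is CycloneDX JSON with a `components` array whose `licenses` entries carry either a `license` object (`id` or `name`) or an `expression`, the prohibited prefixes come from the policy in (3), and the component names are made up. A dual-licensed expression such as `MIT OR GPL-3.0-only` is flagged conservatively.

```python
import json

PROHIBITED_PREFIXES = ("GPL", "AGPL", "SSPL")


def component_licenses(component):
    """Yield license IDs, names or expressions declared for one component."""
    for entry in component.get("licenses", []):
        if "expression" in entry:
            yield entry["expression"]
        else:
            declared = entry.get("license", {})
            value = declared.get("id") or declared.get("name")
            if value:
                yield value


def find_violations(sbom):
    """Return (name, version, license) for components under prohibited licenses."""
    violations = []
    for component in sbom.get("components", []):
        for declared in component_licenses(component):
            # Break simple expressions into tokens and test each one
            tokens = declared.replace("(", " ").replace(")", " ").split()
            if any(token.startswith(PROHIBITED_PREFIXES) for token in tokens):
                violations.append(
                    (component.get("name"), component.get("version"), declared)
                )
    return violations


# Hypothetical SBOM fragment; in CI this would be loaded from the generated file
sample_sbom = json.loads("""
{
  "bomFormat": "CycloneDX",
  "components": [
    {"name": "permissive-lib", "version": "1.2.0",
     "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "copyleft-lib", "version": "2.0.0",
     "licenses": [{"license": {"id": "GPL-3.0-only"}}]},
    {"name": "weak-copyleft-lib", "version": "0.9.1",
     "licenses": [{"expression": "LGPL-2.1-only OR MIT"}]}
  ]
}
""")

violations = find_violations(sample_sbom)
for name, version, declared in violations:
    print(f"Prohibited license: {name} {version} ({declared})")

# A CI job would exit with this code to fail the build
exit_code = 1 if violations else 0
```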

The copyright attribution of code generated by AI coding tools varies depending on whether human authorship is recognized. From a legal and compliance perspective, decide on and document the following five items.

| Scenario | Human Authorship | Eligible for Copyright Registration |
| --- | --- | --- |
| Fully AI-generated (only a prompt entered, output used as-is) | ❌ None | ❌ Cannot be registered as a company work (US Copyright Office 2024 guidance) |
| AI draft + substantial human modification (50%+ changed) | ✅ Yes | ✅ Registrable for the human-modified portion only |
| AI-assisted + human-decided design and integration | ✅ Yes | ✅ Registrable as a company work |

US Copyright Office 2024 AI Guidance (official page): No copyright is granted to fully AI-generated works; for AI-assisted work, “human creative contribution” must be recognized for copyright to apply.

5-2. Leveraging Vendor IP Indemnification

The following AI coding tool vendors provide IP indemnification (intellectual property assurance) in their terms of use. Verify the terms before adoption and reflect them in internal policy.

  • Microsoft Copilot Copyright Commitment — Covers defense and settlement costs for copyright infringement claims for Microsoft 365 Copilot · GitHub Copilot Business/Enterprise users
  • OpenAI Copyright Shield — Provides the same coverage for ChatGPT Enterprise/Team users
  • Anthropic Customer Protection — Provides IP coverage for Claude commercial customers
  • Google Cloud Generative AI Indemnification — Covers users of GCP generative AI services such as Vertex AI

Each form of coverage applies only when its conditions are met (e.g., enabling content filtering, complying with the terms, etc.).

5-3. AI Use Disclosure and Labeling Obligations

Article 50 of the EU AI Act and the Korea AI Basic Act impose obligations to label AI-generated content. Indicate the following when disclosing content internally or externally.

  • Internal code commit messages: Specify the AI tool used (e.g., feat: implement API handler (assisted by Claude Code))
  • Externally released code: State AI usage disclosure in the README or CONTRIBUTING
  • AI system outputs: Labeling per EU AI Act Article 50 (e.g., “AI-generated”, “AI-assisted”)
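The commit message convention in the first bullet can be checked mechanically, for example in a commit hook or a periodic audit. The sketch below assumes the parenthesized "(assisted by ...)" form shown above and also accepts "(generated by ...)". The pattern and the function name `has_ai_disclosure` are illustrative, not an established convention.

```python
import re

# Matches notes such as "(assisted by Claude Code)"; the accepted wording
# is an assumption based on the example commit message above
AI_DISCLOSURE = re.compile(r"\((?:assisted|generated) by [^)]+\)", re.IGNORECASE)


def has_ai_disclosure(message):
    """Return True when a commit message carries an AI tool disclosure."""
    return bool(AI_DISCLOSURE.search(message))


for message in (
    "feat: implement API handler (assisted by Claude Code)",
    "fix: correct typo in README",
):
    print(has_ai_disclosure(message), "-", message)
```

Because a script cannot know whether AI was actually used, a check like this is better suited to reporting disclosure rates than to blocking commits.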

5-4. Documenting the AI Coding Tool Usage Policy

Specify the following items in your internal policy or AI coding tool guidelines.

## AI Coding Tool Usage Policy

### Permitted Tools
- Use only commercial tools that provide vendor IP indemnification
  (e.g., GitHub Copilot Business, ChatGPT Enterprise, Claude for Business)
- Personal free accounts (Copilot Individual, etc.) are prohibited for internal code

### Copyright Attribution Decisions
- When AI output is used as-is, state it in the commit message
- For AI draft + human modification, record the modification ratio and the rationale in the PR description
- For fully AI-generated code, review separate license labeling when releasing externally

### Blocking License Risk
- Verify whether AI-recommended code is similar to copyleft-licensed code such as GPL/AGPL
- Use matching detection tools such as Copilot duplicate detection
- Escalate to the legal team when in doubt

5-5. External Fact References

This subsection corresponds to the obligations of the US Copyright Office AI guidance · EU AI Act Article 50 · Korea AI Basic Act listed in §1 Global AI Regulation Matrix (Korean). When regulations change, check that matrix first.

Checkpoints

  • Is the AI coding tool usage policy documented?
  • Are the criteria for determining copyright attribution of AI-generated code specified?
  • Have the vendor IP indemnification conditions been reviewed and satisfied?
  • Are the AI usage labeling obligations fulfilled for internal commits and external releases?

6. Summary

AI compliance is a natural extension of the existing ISO/IEC 5230 · 18974 open source management system. By identifying and fulfilling license obligations for the three areas of an AI system (frameworks · models · datasets), and applying the open source cross-cutting clauses of the ISO/IEC 42001 AI management system, a company can build a comprehensive compliance framework.

Through this section, a company can gain the following benefits:

  1. Clarify license obligations for each AI system component
  2. Secure external supply chain transparency by operating an AI SBOM
  3. Block the risk of license contamination and vulnerability introduction when using AI coding tools
  4. Pre-organize the open source area when preparing for ISO/IEC 42001 certification
  5. Mechanically prevent policy violations through CI/CD pipeline automation

ISO/IEC 5230 / 18974 / 42001 Compliance Guide — Clauses related to this section: