This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Blog

2021

Opensource license used by IT Companies

OpenChain KWG News in 2020

ISO Standard, OpenChain
Kakao Training Material
NCSOFT Training Material
Open Source Compliance in the Enterprise

News

SK Telecom is the first Telecommunications Operator to adopt OpenChain ISO 5230
Samsung Electronics Announces OpenChain ISO 5230 Conformance
오픈소스SW 관리 툴 ‘포스라이트’ 공개한 LG전자, “글로벌 표준 거듭나겠다”
한 달 걸리던 카카오톡 오픈소스 관리, 이틀이면 충분
SK텔레콤, 개발자와 소통하는 커뮤니티 ‘DEVOCEAN’ 론칭

AI-Generated Code: How Far Should Open-Source Scanning Go?

2021

Opensource license used by IT Companies

Share the usage of the Opensource by major IT companies

By Robin Hwang (@revfactory) | Thursday, January 14, 2021

EPAM, Github Activity Ranking and License Status by Enterprise in 2020 (OSCI)

featured-github

EPAM, which develops and consults Enterprise Software, provides a ranking service called OSCI (Open Source Contributor Index) that measures Github usage.

Measures the contributions of members of a commercial organization using publicly available Github Committed event data. Contributions from universities, research institutes and free email providers were not included. The target is contributors who have performed more than 10 commits, and the activity score is measured by an algorithm they have studied. The algorithm is published in OSCI Github.

OSCI (Open Source Contributor Index)

https://solutionshub.epam.com/osci contributing-ranking

According to the analysis score, Google is at the forefront, with Microsoft and Red Hat coming next. Samsung is ranked 29th for Korean companies and LG Electronics is ranked 71st.

License Status used by Major Companies

See Open Source License Usage Survey

OpenChain KWG News in 2020

ISO Standard, OpenChain

The International Standard For Open Source Compliance

By OpenChain Project | Wednesday, December 16, 2020

On December 15, 2020, OpenChain Specification 2.1 was officially published as an international standard ISO/IEC 5230:2020.

For more information on this, please refer to the following article. : https://world.einnews.com/pr_news/532725924/iso-iec-5230-2020-is-the-international-standard-for-open-source-compliance

Kakao Training Material

Kakao has released open source educational materials for in-house developers so that anyone can read it.

By Robin Hwang (@revfactory | Tuesday, November 24, 2020

The open source technology part of Kakao has released open source educational materials for in-house developers so that anyone can read it.

featured-kakao-guide

Download

Training materials can be downloaded from the following page.

http://t1.kakaocdn.net/olive/assets/opensource_guide_kakao.pdf

NCSOFT Training Material

NCSOFT has released lecture slides and scripts, which are open source educational materials in-house, on GitHub based on the spirit of sharing open source so that anyone can use them.

By Jiho Han (@PeterHanJiho) | Monday, November 23, 2020

NCSOFT has released lecture slides and scripts, which are open source educational materials in-house, on GitHub based on the spirit of sharing open source so that anyone can use them.

Companies that develop/distribute software can use this slide as it is or extract/modify necessary parts when preparing for in-house open source training.

cover

Anyone can participate in modifying and supplementing these materials through GitHub.

Download

These materials can be downloaded from the following page.

https://github.com/ncsoft/oss-basic-training/tree/master/source

Maintainer

Name	Company	Email
Peter Jiho Han	NCSOFT	yulica37@ncsoft.com
Dasom Han	NCSOFT	dasom12@ncsoft.com

License

CC-BY 4.0

Open Source Compliance in the Enterprise

NCSOFT summarized the Open Source Compliance in the Enterprise book and released it on GitHub for anyone to read.

By Jiho Han (@PeterHanJiho) | Monday, November 23, 2020

Introducing the Open Source Compliance in the Enterprise book

This book was written by Ibrahim Haddad and published by the Linux Foundation.

It explains in detail what to consider when building an open source compliance program in a business, and anyone can download it from the following link : Download

Korean summary

NCSOFT summarized the main contents of this book in Korean, and after getting permission from the author, Ibrahim, they released it so that open source people of Korean companies can refer to it.

In particular, NCSOFT shared this on GitHub so that anyone can refer to it, improve it, and continue to develop it.

GitHub : https://github.com/ncsoft/osc-enterprise-ko/
License : CC BY 4.0
Slide : OSC-Enterprise-ko_Openchain_KWG_20200317_FN.pdf

Contributors

Name	Company	Email	Role
Dasom Han	NCSOFT	dasom12@ncsoft.com	Maintainer
Peter Jiho Han	NCSOFT	yulica37@ncsoft.com	Contributor

News

SK Telecom is the first Telecommunications Operator to adopt OpenChain ISO 5230

By OpenChain | Thursday, September 09, 2021

SK Telecom, South Korea’s largest wireless carrier, is the first telecommunications operator to adopt OpenChain ISO 5230. This leap forward in governance builds on their long-term mission to lead in technological capabilities in 5G, AI, big data analysis, IoT and quantum cryptography communications as a global ICT leader.

“SK Telecom is preparing for its transition to an AI company, and is strategically using open source to develop advanced technologies in AI, 5G, and cloud technology”, says Kim Yoon, CTO, SK Telecom. “Open source, essential for the early introduction of new technologies and rapid technological change, forms a key part of our strategy. We adopted OpenChain ISO 5230, the International Standard for open source compliance, to ensure effective process management in this space. We are also spreading our know-how around complying with open source international standards to SK affiliates and software supply chains to drive further expansion of open source ecosystems and to deliver even greater social value.”

See the original article for more details. : https://www.openchainproject.org/featured/2021/09/08/sk-telecom

Samsung Electronics Announces OpenChain ISO 5230 Conformance

By OpenChain | Thursday, July 08, 2021

Samsung Electronics announces adoption of OpenChain ISO 5230, the International Standard for open source compliance. They join a growing community of companies in the consumer electronics, automotive, cloud computing and telecommunications field in using this standard to manage supply chains.

“Samsung Electronics has been consistently striving to build an open source compliance process. Our adoption of OpenChain ISO 5230 reflects our ongoing commitment to excellence in our field,” says Daniel Park, Head of Open Source Group. “OpenChain has greatly helped improve the efficiency and confidence of the process. We are pleased to be a part of OpenChain and sincerely look forward to further developing this standard with our peers and suppliers.”

See the original article for more details. : https://www.openchainproject.org/featured/2021/07/07/samsung-electronics-announces-openchain-iso-5230-conformance

오픈소스SW 관리 툴 ‘포스라이트’ 공개한 LG전자, “글로벌 표준 거듭나겠다”

By 디지털데일리 백지영 기자 | Monday, June 28, 2021

최근 LG전자가 자체 개발해 2014년부터 운영해오고 있는 오픈소스 SW관리 도구인 ‘포스라이트(FOSSLight ; Free and Open Source Software Light)’를 공개해 주목을 받고 있다. 포스라이트는 개발자의 SW를 분석해 오픈소스를 사용했는지, 오픈소스 사용 조건이나 의무사항을 준수했는지 등을 검증해 주는 툴이다.

LG전자 오픈소스 태스크. 사진 왼쪽부터 방재권 선임, 김소임 선임, 최혜성 책임, 김경애 리더, 석지영 선임, 박원재 선임
백지영 기자

‘포스라이트’라는 이름으로 외부에 공개하면서 안정성과 기능을 확대하는 한편 글로벌 인지도를 높일 수 있게 할 계획이다. 포스라이트에는 ‘세상에 빛을 밝혀줄 오픈소스’라는 의미가 담겼다.

자세한 내용은 기사 원문에서 확인하세요. : https://n.news.naver.com/article/138/0002105818

한 달 걸리던 카카오톡 오픈소스 관리, 이틀이면 충분

카카오, 오픈소스 관리 플랫폼 ‘올리브’ 이달 말 출시

By 남혁우 기자 | Monday, June 28, 2021

“한 달이 걸리던 카카오톡 오픈소스 관리 작업을 올리브를 사용하면 하루 이틀이면 충분할 것이다.”

카카오의 황은경 오픈소스기술파트장은 오픈소스 관리 서비스 ‘올리브’의 정식 버전에 대해 소개하며 위와 같이 말했다.

카카오가 오픈소스 관리플랫폼 '올리브'를 29일 정식 출시한다(이미지=카카오)
카카오

올리브는 복잡한 오픈소스 라이선스 관리를 자동화하는 개발 지원 도구다. 자동으로 소프트웨어를 분석해 오픈소스를 사용했는지, 오픈소스 사용 조건이나 의무사항 등을 목록별로 정리해 제공한다.

자세한 내용은 기사 원문에서 확인하세요. : https://zdnet.co.kr/view/?no=20210628143239

SK텔레콤, 개발자와 소통하는 커뮤니티 ‘DEVOCEAN’ 론칭

SK ICT 패밀리사 개발자와 외부 개발자 간 소통 채널
ICT 지식 공유 및 문답·멘토링 제공
외부강연 통해 양방향 교류 촉진

By 한국금융신문 정은경 기자 | Monday, June 14, 2021

SK텔레콤이 스타트업·대학 등 외부 개발자들과 소통하는 오픈 커뮤니티를 론칭하며 SK의 ICT 역량을 적극 공유에 나선다.

‘데보션’은 개발자들을 위한 영감의 바다(Developer’s Ocean)라는 뜻으로, 개발자들이 지식과 경허을 공유하는 커뮤니티를 ‘바다’에 비유했다.

SK텔레콤은 SK하이닉스, SK㈜C&C, SK브로드밴드, SK플래닛 등 SK ICT패밀리사 개발전문가들과 외부 개발인재간 소통과 기술 공유를 위한 디벨로퍼 릴레이션 채널인 ‘데보션’을 론칭했다. 사진=SK텔레콤
SK텔레콤

자세한 내용은 기사 원문에서 확인하세요. : https://www.fntimes.com/html/view.php?ud=202106140855478693645ffc9771_18

AI-Generated Code: How Far Should Open-Source Scanning Go?

Whether AI-generated code needs snippet-level open-source license scanning — the decision factors, grounded in public sources, and how this differs from security-vulnerability scanning.

By Haksung Jang | Monday, June 08, 2026

About this article

This article was written with the help of Claude Code, and its key facts were cross-checked against public sources.

Disclaimer

This article reflects the author’s personal analysis and is not legal advice. The cited facts were verified against public sources, but for any specific situation please consult a qualified professional such as an attorney.

First, one thing to get straight

There is no quick yes-or-no answer to this question, because the deciding factor is not AI itself — as we will see. AI coding raises the rate at which code fragments enter a codebase without being declared as packages, but it does not change the conditions under which a license obligation arises. So instead of asking “does scanning still matter in the AI era,” it helps to ask “under what conditions does snippet-level scanning matter more, and under what conditions less.”

Snippet scanning and SCA are different things

First, separate the terms so the discussion does not get tangled.

Type	What it looks at	What kind of code it catches
Dependency-level SCA	Components declared through a package manager	Manifests and build artifacts such as `package.json`, `pom.xml`
Snippet-level matching	Portions of the source code itself	Code fragments brought in by copy-paste or AI generation

Software Composition Analysis (SCA) refers broadly to identifying the open source that has entered your code and managing its vulnerabilities and licenses. Most SCA looks at declared dependencies, as in the first row of the table. Snippet-level matching is a separate capability that only some commercial tools have: it compares the source code itself against a vast number of open-source projects to find the origin of fragments that were never declared as packages. What this article addresses is not SCA as a whole, but this snippet matching.

How real is the legal risk?

Let us start with cases where AI code snippets led to license-violation disputes.

The best-known case is the Copilot class action filed in November 2022 by open-source developers against GitHub, Microsoft, and OpenAI. In May 2023 the copyright-infringement claim was dismissed for lack of specific examples of copying, and in July 2024 the claim under Section 1202(b) of the Digital Millennium Copyright Act (DMCA) was also dismissed. That provision prohibits removing the copyright management information attached to an original work; the court did not accept the claim, reasoning that Copilot’s output was not sufficiently identical to the original A2. Of the original 22 claims, only two remain: breach of an open-source license and breach of contract A2·A3.

There are facts here that can be read two ways.

On one hand, the defendants in these disputes have all been the vendors who built the AI tools, and there is no publicly reported case of a company being sued merely for using AI-generated code C1. Microsoft also announced, in September 2023, the Copilot Copyright Commitment, under which it will cover defense costs and damages if a paid commercial customer is sued by a third party over the output. This comes with conditions: the customer must not disable the product’s built-in filters and must not deliberately try to generate infringing material B1·C5.

On the other hand, the legal position is not settled. The DMCA issue above has gone to the Ninth Circuit Court of Appeals as an interlocutory appeal — where a specific issue is argued before the appellate court ahead of the trial-court judgment — and as of June 2026 no ruling has issued while the trial-court proceedings are stayed A1. The absence of a reported precedent does not mean there is no risk.

When does a license obligation arise?

One distinction matters here. Being sued and having an obligation to comply with a license are different things. Even if no one sues, the obligation to comply with an open-source license remains. And the moment when that obligation is triggered is what matters.

Copyleft obligations in the GPL family, which require disclosing source, arise when you distribute the software. Merely running it internally is “use,” not “distribution,” so no obligation arises C3·C4. Pure SaaS, which hands code to no one, is likewise outside GPL obligations for the same reason. Two caveats apply.

AGPL is a stricter license that treats even providing a network service as distribution. If you use an AGPL-licensed component, an obligation to disclose source arises even when you only offer it as a service without handing over the code C3. Such components can be managed by excluding them through internal policy.
“Internal” must mean used only by your own employees; if distribution later happens through an acquisition or open-sourcing, the problem arises at that point.

The copyrightability of short code is another factor. A few lines of functional code may involve too small an amount of copying to be actionable (de minimis), or may fall outside protection because there is essentially only one way to express the idea (merger doctrine) A4·A5. That said, this is a case-by-case judgment, and since long, creative blocks of code are protected, it is hard to assume every snippet is free.

In practice, small fragments often arrive carrying obligations. Code from Stack Overflow, which developers frequently reuse, is under the CC BY-SA license, which requires attribution and share-alike. Yet one study found that at most 1.8% of GitHub projects used such code in a way compliant with the license C6. Even a small fragment can carry a license obligation, and that obligation is widely unmet.

Does the standard require snippet scanning?

OpenChain ISO/IEC 5230, the international standard for open-source license compliance, focuses on where to place compliance processes, how to assign roles and responsibilities, and how to keep the process sustainable A6. It is a non-prescriptive standard that defines what to achieve while leaving the specific methods to the organization, so it does not mandate any particular technique such as snippet scanning A6·A7. What the standard requires is to identify third-party components and to maintain a Software Bill of Materials (SBOM), the list of those components. What matters for meeting the standard is understanding which components have entered the code; it does not require analyzing the origin of every single fragment. In fact, many widely used SCA tools operate only at the dependency level, without snippet matching.

This reads two ways. It means you can meet the standard without snippet scanning, and at the same time it means there is an area the standard does not cover. Dependency-level scanning cannot see fragments that were copied in or generated by AI without being declared as packages. Snippet matching fills exactly that gap, and some organizations perform it for more thorough intellectual-property management.

When does it matter more?

How much weight to give snippet scanning depends on two conditions a company faces.

First, whether you hand code or binaries directly to customers. When code leaves the company — as with on-premises installed products, mobile apps, SDKs, or embedded device firmware — it counts as distribution, and copyleft obligations can be triggered. Pure SaaS, which hands over no code, carries less of this burden.

Second, whether you undergo external verification. Situations such as M&A due diligence, a large customer’s security audit, regulatory requirements, or an SBOM request that goes down to the snippet level — where someone outside the company actually examines the origin of the code.

The more these two overlap, the more likely a latent obligation turns into a real cost. If you hand over code but there is no occasion for verification, the risk stays latent; if you hand over no code, the obligation rarely arises in the first place. Both conditions are independent of whether AI is used. AI coding increases the inflow when the conditions hold, but it does not create the conditions.

Looking at it by company type

Company type	Weight on snippet scanning	Why
Hands no code outside (pure SaaS, no AGPL components)	Low	Not distribution, so copyleft obligations rarely arise
Hands code or binaries to customers, but no external audit	Depends	An obligation may exist, but there is no occasion for anyone to look
Hands over code + M&A due diligence, customer audit, regulation	High	The point where origin is actually examined externally

The table is a starting point, not the answer. Even within the same cell, the choice can differ by the nature of the code, the licenses involved, and the company’s risk appetite.

Embedded is a different story

Everything so far assumed software built with a package manager. Software that runs as embedded or firmware — routers, set-top boxes, IoT devices, automotive controllers — is different. It is mostly written in C/C++, and open source is often copied directly into the project as source, without a manifest. When that happens, dependency-level SCA has no manifest to read and sees almost none of the open source.

One thing to distinguish: large components used wholesale, such as the Linux kernel or BusyBox, are usually known to the company. That is not a detection problem but a question of meeting the obligation to disclose source. Snippet scanning is needed in a different case: the small fragments pulled in piecemeal from various open-source projects that no one put on a list. Finding these fragments, which dependency-level SCA cannot see, is the job of snippet scanning.

So in embedded, snippet scanning is closer to a basic means of finding undeclared open-source fragments than a conditional supplement.

Filtering before code comes in

Apart from after-the-fact scanning, there is also a way to block problematic code before it comes in. GitHub Copilot has a setting that blocks suggestions matching public code: it does not show suggestions that match public code at or above a certain length (about 150 characters on average) B2·C2. GitHub has stated that verbatim copying of more than 150 characters happens about 1% of the time, while independent studies report higher rates depending on context. Either way it is not zero, but turning the setting on reduces the inflow of fragments of unclear origin. It costs almost nothing, and it is also a precondition for the vendor indemnity mentioned earlier.

This setting overlaps in purpose with after-the-fact snippet scanning. One finds code after it is in; the other blocks it before it gets in. Which one to use, and how much, is something to decide together with the conditions and costs above.

Criteria for the decision

There is a reason this decision is not simple. The only practical way to find code fragments that were copied in, or generated by AI, without being declared as packages is snippet matching. Neither dependency-level SCA nor a filter applied at generation time catches all of those fragments. So there is a small but real part that only snippet matching covers. At the same time, in many companies that small part rarely turns into an actual loss, and snippet scanning carries tool costs and review effort. In the end, it is about deciding whether the cost is worth it to close a small but real risk.

Four things to consider:

Do you send code outside — the more you do, the greater the chance a copyleft obligation actually arises.
Do you undergo external verification — M&A due diligence or a customer audit can surface an obligation that had been buried.
How much license risk are you willing to accept — there is no reported lawsuit precedent, but the legal position is not settled either. How you take this uncertainty is a matter of company policy.
What other checks do you already have — if you already run dependency-level SCA, an AI-tool setting that blocks suggestions matching public code, and a policy of excluding AGPL components, the part snippet scanning would additionally catch shrinks accordingly.

One more point. Snippet scanning is not an all-or-nothing choice. The occasions when someone outside actually examines the origin of code are fairly predictable — M&A due diligence or a large customer’s audit. So one option is to run only dependency scanning and the blocking setting day to day, and have a snippet scan done when such an occasion is expected.

Applying these four factors and this operating approach to your own situation, the answer to how much weight to give snippet scanning will come out differently for each company. The exception is embedded software built without a manifest. There, snippet scanning is not a conditional supplement but a basic means of finding undeclared open-source fragments.

Security vulnerabilities are a separate matter

Everything so far has been about licenses. Security vulnerabilities are a different axis, and you should not apply the conditional conclusion above to them. If vulnerable open source is in your code, it is dangerous whether or not you distribute, and whether or not you are audited — because it is exposed to attack even in internal-only software or a pure-SaaS backend. So vulnerability checking is broadly necessary for almost every company.

There are two main tools for security checking.

Dependency-level SCA — looks at the name and version of declared open-source libraries and checks them against lists of known vulnerabilities (CVEs) D1. Known vulnerabilities in the libraries AI pulled in are caught here.
SAST (static analysis) — finds risky coding patterns in the source code itself, regardless of where the code came from. The main security risk in AI code is here. In one study, about 40% of 1,689 programs generated by Copilot contained vulnerabilities D2.

Whether code was copied in or generated by AI, security checking is no different from any other code. SAST handles risky patterns in the code itself, and dependency-level SCA handles known vulnerabilities in the libraries you brought in. Snippet scanning is a feature for finding license origin, not a tool for security checking.

One exception worth noting: a rare case where vulnerable code with a known vulnerability is copied in verbatim, escaping both SAST’s patterns and the dependency list. Catching this requires not the license-oriented snippet feature but a separate check that compares your code directly against signatures of vulnerable code built from CVE patches — vulnerable code clone detection D3. Academic tools and some commercial tools provide this.

Sources

A1. BakerHostetler (2025). Doe v. GitHub, Inc. — The Copilot Litigation. https://www.bakerlaw.com/the-copilot-litigation/ (accessed 2026-06-08). — Claim-by-claim progress of the Copilot class action and its pending status before the Ninth Circuit. ↩

A2. Claburn, T. (2024). Judge dismisses DMCA copyright claim in GitHub Copilot suit. The Register, 2024-07-08. https://www.theregister.com/2024/07/08/github_copilot_dmca/ (accessed 2026-06-08). — Dismissal of the DMCA §1202(b) claim; 2 of the original 22 claims (license breach, breach of contract) remain. ↩

A3. Pearl Cohen (2024). Copyright Claims Against GitHub, Microsoft, and OpenAI Largely Dismissed. https://www.pearlcohen.com/copyright-claims-against-github-microsoft-and-openai-largely-dismissed/ (accessed 2026-06-08). — Overall picture of the dismissed claims and the surviving claims. ↩

A4. Goldstein Patent Law. Understanding the Copyright Merger Doctrine. https://www.goldsteinpatentlaw.com/copyright-merger-doctrine/ (accessed 2026-06-08). — The merger doctrine, which denies copyrightability of functional code. ↩

A5. NYU Journal of Intellectual Property & Entertainment Law. Clarifying the De Minimis Doctrine in Copyright Law. https://jipel.law.nyu.edu/clarifying-the-de-minimis-doctrine-in-copyright-law/ (accessed 2026-06-08). — The de minimis doctrine, under which trivial copying is not actionable. ↩

A6. OpenChain Project. OpenChain ISO/IEC 5230 — License Compliance. https://openchainproject.org/license-compliance (accessed 2026-06-08). — The standard defines processes and roles but does not mandate any specific technique such as snippet scanning. ↩

A7. ISO. ISO/IEC 5230:2020 — Information technology — OpenChain Specification. https://www.iso.org/standard/81039.html (accessed 2026-06-08). — Bibliographic record of the standard. ↩

B1. Microsoft (2023-09-07). Microsoft announces new Copilot Copyright Commitment for customers. https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/ (accessed 2026-06-08). — IP indemnity for paid commercial customers, conditioned on keeping the built-in filters on. ↩

B2. GitHub. GitHub Copilot (product page). https://github.com/features/copilot (accessed 2026-06-08). — Existence and behavior of the public-code matching filter. ↩

C1. TechTarget. AI lawsuits explained: Who’s getting sued?. https://www.techtarget.com/whatis/feature/AI-lawsuits-explained-Whos-getting-sued (accessed 2026-06-08). — Indication that defendants are concentrated among vendors and that no adopter company has been reported as sued. ↩

C2. Microsoft Community Hub. Demystifying GitHub Copilot Security Controls. https://techcommunity.microsoft.com/blog/azuredevcommunityblog/demystifying-github-copilot-security-controls-easing-concerns-for-organizational/4468193 (accessed 2026-06-08). — The roughly 150-character match threshold of the public-code matching filter and the ~1% copying rate. ↩

C3. Mend.io. The SaaS Loophole In GPL Open Source Licenses. https://www.mend.io/blog/the-saas-loophole-in-gpl-open-source-licenses/ (accessed 2026-06-08). — Copyleft’s distribution trigger, the non-applicability to internal and SaaS use, and the AGPL §13 exception. ↩

C4. Revenera. Understanding the SaaS Loophole in GPL. https://www.revenera.com/blog/software-composition-analysis/understanding-the-saas-loophole-in-gpl/ (accessed 2026-06-08). — Supporting detail on the distribution trigger and the SaaS exception. ↩

C5. TechTarget. Microsoft Copilot Copyright Commitment explained. https://www.techtarget.com/searchenterprisedesktop/tip/Microsoft-Copilot-Copyright-Commitment-explained (accessed 2026-06-08). — Supporting detail on the scope and conditions of the indemnity. ↩

C6. Baltes, S. & Diehl, S. (2019). Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects. Empirical Software Engineering, arXiv:1802.02938. https://arxiv.org/abs/1802.02938 (accessed 2026-06-08). — Empirical study finding that at most 1.8% of GitHub projects used Stack Overflow code (CC BY-SA) in a license-compliant way. ↩

D1. Cycode. What Is Software Composition Analysis (SCA)?. https://cycode.com/blog/what-is-software-composition-analysis-sca/ (accessed 2026-06-08). — How SCA finds vulnerabilities by matching components and versions against CVE/NVD. ↩

D2. Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., & Karri, R. (2022). Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions. IEEE S&P 2022, arXiv:2108.09293. https://arxiv.org/abs/2108.09293 (accessed 2026-06-08). — About 40% of 1,689 generated programs across 89 scenarios contained vulnerabilities. ↩

D3. Kim, S., Woo, S., Lee, H., & Oh, H. (2017). VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery. IEEE S&P 2017. https://seulbae-security.github.io/pubs/vuddy-sp17.pdf (accessed 2026-06-08). — How vulnerabilities propagate through copied code and remain unpatched after the upstream fix, and how to detect them. ↩