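Ascend CANN cann-agreements in Practice: Open-Source Community Agreements and Contributor Governance, with CLA/DCO Signing Automation and a License Compatibility Matrix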
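The CANN community has 55 repositories that accept external pull requests (PRs). When a contributor's first PR arrives, they must sign a CLA (Contributor License Agreement). The manual process (download a PDF → print → sign → scan → email → manual review) takes about 7 days. The cann-agreements repository defines the community's legal infrastructure: CLA templates (individual and corporate), the DCO (Developer Certificate of Origin, a lightweight alternative that needs no separate agreement, only a Signed-off-by line on each commit), an automated signing check in CI (CLA-Bot), and a license compatibility matrix covering the 55 repositories (OSI-approved licenses for code, plus CC-BY-4.0 for documentation and community content).
cann-agreements is the community's legal gatekeeper. It holds no product code, but it defines who may contribute, what happens to contributed content, and the legal relationship with downstream vendors.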
The three-layer legal framework of the CANN community

    cann-agreements/
    ├── CLA/                          # Contributor License Agreements
    │   ├── CLA-individual.md         # Individual CLA template
    │   ├── CLA-corporate.md          # Corporate CLA template
    │   ├── CLA-whitelist.json        # Whitelist of individuals/companies that have signed
    │   └── CLA-bot/                  # Automated CLA check in CI
    │       ├── cla_bot.py            # GitHub App (detect PR → check whitelist → unsigned: guide to signing)
    │       └── config.yaml           # Bot configuration
    │
    ├── DCO/                          # Developer Certificate of Origin
    │   ├── DCO.txt                   # Full DCO text (Linux kernel style)
    │   ├── DCO-check.yml             # CI: enforce Signed-off-by on every commit
    │   └── DCO-policy.md             # DCO policy notes
    │
    ├── licenses/                     # Open-source license texts
    │   ├── Apache-2.0.txt
    │   ├── MIT.txt
    │   └── BSD-3-Clause.txt
    │
    ├── LICENSE                       # License of this repository
    ├── LICENSE-COMPATIBILITY.md      # License compatibility matrix for the 55 repositories
    ├── third-party/                  # Third-party dependency license audit
    │   └── dependency-licenses.json
    └── contributor-guide.md          # Contributor guide

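CLA (Contributor License Agreement) vs DCO (Developer Certificate of Origin)

| Dimension | CLA (Apache style) | DCO (Linux kernel style) |
|---|---|---|
| How it is signed | Download PDF → sign → upload | Add a Signed-off-by line to every git commit |
| Who signs | An individual or a company (legal entity) | An individual (the commit author) |
| What it says | "I grant the project a perpetual, worldwide, royalty-free license" | "I certify that I wrote this contribution or otherwise have the right to submit it" |
| Corporate participation | ✅ Supported (a corporate CLA covers all employees) | ⚠️ No corporate agreement; each employee certifies individually, and the company's lawyers may need to approve use of the DCO |
| Enforcement | Manual signing plus manual review (automated in CANN by CLA-Bot) | Commit hook plus CI |
| Use in CANN | CLA required in core repositories (about 35 system repositories) | About 20 tool/documentation/community repositories use the DCO |
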
Automated CLA signing check in CI (CLA-Bot)

    # cann-agreements/CLA/CLA-bot/cla_bot.py
    # GitHub App webhook handler: automated CLA signature checks
    import json
    from datetime import datetime

    import requests


    class CLABot:
        """
        GitHub App that automates CLA checks.

        Flow: PR webhook → check whitelist → signed: label ✓ / unsigned: guide to signing.

        Repository categories:
        - Core system repositories (about 35): CLA required
          ops-*, catlass, ATB, asnumpy, hccl/hcomm/hixl, ge/runtime/driver, torchtitan-npu...
        - Tool/community/documentation repositories (about 20): DCO only
          cmake, oam-tools, asc-tools, cann-learning-hub, community, cann-competitions...
        """

        CLA_WHITELIST_URL = "https://cann-community.dev/CLA/CLA-whitelist.json"

        def __init__(self, github_token):
            self.github_token = github_token

        def _load_whitelist(self):
            """Fetch the public whitelist JSON."""
            resp = requests.get(self.CLA_WHITELIST_URL, timeout=10)
            resp.raise_for_status()
            return resp.json()

        def check_pr(self, pr_event):
            """Handle a GitHub pull_request webhook."""
            action = pr_event["action"]
            if action not in ("opened", "synchronize", "reopened"):
                return
            pr = pr_event["pull_request"]
            pr_number = pr["number"]
            # Use the PR author, not `sender`: on synchronize/reopened the sender
            # can be someone else (for example a maintainer).
            pr_author = pr["user"]["login"]
            # Webhook user objects usually carry no email, so default to "".
            pr_author_email = pr["user"].get("email") or ""
            repo_fullname = pr_event["repository"]["full_name"]

            # Does this repository require a CLA (core repo) or only the DCO?
            if not self._repo_requires_cla(repo_fullname):
                return  # DCO repository, nothing to do

            # Load the whitelist and check the signing status
            whitelist = self._load_whitelist()
            signed = self._is_signed(pr_author, pr_author_email, whitelist)
            if signed:
                self._add_label(repo_fullname, pr_number, "cla-signed")
                self._post_comment(repo_fullname, pr_number,
                    f"✅ **CLA Signed**: @{pr_author}, your contribution is eligible for review."
                )
            else:
                self._add_label(repo_fullname, pr_number, "cla-required")
                sign_url = f"https://cann-community.dev/CLA/sign?github_user={pr_author}"
                self._post_comment(repo_fullname, pr_number,
                    f"⚠️ **CLA Required**: Hi @{pr_author}, please sign the CLA before we can review.\n\n"
                    f"📝 **[Sign CLA →]({sign_url})** (takes < 2 minutes)\n\n"
                    f"Once signed, reply `/cla-check` to re-trigger verification."
                )

        def _is_signed(self, username, email, whitelist):
            """Return True if the user is covered by an individual or corporate CLA."""
            # 1. Individual whitelist (GitHub usernames are case-insensitive)
            individuals = {u.lower() for u in whitelist.get("individuals", [])}
            if username.lower() in individuals:
                return True
            # 2. Corporate domain whitelist: exact match, or exactly one subdomain level
            email_domain = email.rsplit("@", 1)[-1].lower() if "@" in email else ""
            for corp_domain in whitelist.get("corporate_domains", []):
                corp_domain = corp_domain.lower()
                if email_domain == corp_domain:
                    return True
                # Allow one subdomain level (rd.huawei.com matches huawei.com)
                if (email_domain.endswith(f".{corp_domain}")
                        and email_domain.count(".") == corp_domain.count(".") + 1):
                    return True
            return False

        def handle_cla_check_command(self, comment_event):
            """Handle a `/cla-check` comment (manual re-check after signing)."""
            # `/cla-check` arrives as an issue_comment event, whose payload has
            # `issue` (the PR) rather than a top-level `pull_request`.
            repo = comment_event["repository"]["full_name"]
            pr_number = comment_event["issue"]["number"]
            author = comment_event["issue"]["user"]["login"]  # PR author, not the commenter
            # Reload the whitelist: the user may have just signed
            whitelist = self._load_whitelist()
            # No email is available here, so only the individual list can match
            if self._is_signed(author, "", whitelist):
                self._remove_label(repo, pr_number, "cla-required")
                self._add_label(repo, pr_number, "cla-signed")
                self._post_comment(repo, pr_number, "✅ CLA verified! Your PR is now ready for review.")
            else:
                self._post_comment(repo, pr_number,
                    f"⏳ @{author}, CLA not yet found. Please complete signing at: "
                    f"https://cann-community.dev/CLA/sign?github_user={author}"
                )

        def _repo_requires_cla(self, repo_fullname):
            """Core repositories require a CLA; tool/community repositories use the DCO."""
            # NOTE: asc-tools and oam-tools are also named as DCO-only examples in the
            # class docstring; this list is what the bot actually enforces.
            CLA_REPOS = [
                "ops-math", "ops-nn", "ops-transformer", "ops-cv", "ops-blas",
                "ops-fft", "ops-rand", "ops-tensor", "opbase",
                "catlass", "ascend-transformer-boost", "asnumpy", "graph-autofusion",
                "hccl", "hcomm", "hixl", "ascend-boost-comm", "shmem",
                "ge", "metadef", "runtime", "driver",
                "asc-devkit", "asc-tools", "pyasc", "pypto",
                "pto-isa", "atvc", "atvoss", "oam-tools", "sip",
                "torchtitan-npu", "tensorflow", "triton-inference-server-ge-backend",
                "cann-spack-package", "elec-ops-inspection", "elec-ops-prediction",
                "elec-ops-simulation", "mat-chem-sim-pred"
            ]
            repo_name = repo_fullname.split("/")[1] if "/" in repo_fullname else repo_fullname
            return repo_name in CLA_REPOS

        def _fetch_signed_users_from_db(self):
            """Read signers from the private signing database (deployment-specific)."""
            raise NotImplementedError

        def sync_whitelist(self):
            """Periodically rebuild the whitelist from the signing database."""
            signed_users = self._fetch_signed_users_from_db()
            # CLA-whitelist.json is public: GitHub usernames and corporate domains only,
            # never personal email addresses
            with open("CLA/CLA-whitelist.json", "w") as f:
                json.dump({
                    "individuals": signed_users["individuals"],
                    "corporate_domains": signed_users["corporate_domains"],
                    "last_updated": datetime.now().isoformat()
                }, f, indent=2)

        def _add_label(self, repo, pr_number, label):
            url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/labels"
            requests.post(url, headers={"Authorization": f"token {self.github_token}"},
                          json={"labels": [label]})

        def _remove_label(self, repo, pr_number, label):
            url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/labels/{label}"
            requests.delete(url, headers={"Authorization": f"token {self.github_token}"})

        def _post_comment(self, repo, pr_number, body):
            url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
            requests.post(url, headers={"Authorization": f"token {self.github_token}"},
                          json={"body": body})

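The bot has two entry points, one per webhook event type, so something has to route incoming events to them. A minimal routing sketch follows, assuming the web framework hands over the `X-GitHub-Event` header value and the parsed JSON payload. `dispatch` and `RecordingBot` are hypothetical names used for illustration, not part of cann-agreements; `bot` can be any object exposing `check_pr` and `handle_cla_check_command`.

```python
def dispatch(event_name, payload, bot):
    """Route a GitHub webhook to the matching bot handler.

    event_name is the X-GitHub-Event header value. Returns the name of the
    handler that ran, or None when the event is ignored.
    """
    if event_name == "pull_request":
        bot.check_pr(payload)
        return "check_pr"
    if event_name == "issue_comment":
        # Comments on plain issues lack the "pull_request" key inside "issue".
        is_pr = "pull_request" in payload.get("issue", {})
        body = payload.get("comment", {}).get("body", "").strip()
        if is_pr and payload.get("action") == "created" and body == "/cla-check":
            bot.handle_cla_check_command(payload)
            return "handle_cla_check_command"
    return None


class RecordingBot:
    """Stand-in bot that only records which handler was called."""

    def __init__(self):
        self.calls = []

    def check_pr(self, payload):
        self.calls.append("check_pr")

    def handle_cla_check_command(self, payload):
        self.calls.append("handle_cla_check_command")
```

Routing on the event name keeps `/cla-check` comments on ordinary issues, and edits to old comments, from triggering a whitelist lookup.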
DCO enforcement (DCO-check.yml)

    # cann-agreements/DCO/DCO-check.yml
    # Every commit must carry a Signed-off-by line
    name: DCO Check
    on: [pull_request]
    jobs:
      dco-check:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
            with:
              fetch-depth: 0
          - name: Check commits for DCO Sign-off
            run: |
              git fetch origin ${{ github.event.pull_request.base.sha }}
              git fetch origin ${{ github.event.pull_request.head.sha }}
              commits=$(git log \
                ${{ github.event.pull_request.base.sha }}..${{ github.event.pull_request.head.sha }} \
                --no-merges --format="%H")
              failed=0
              for commit in $commits; do
                message=$(git log -1 --format="%B" "$commit")
                if ! echo "$message" | grep -q "^Signed-off-by:"; then
                  author=$(git log -1 --format="%an <%ae>" "$commit")
                  echo "❌ Commit $commit ($author): Missing Signed-off-by"
                  echo "   Fix: git commit --amend --signoff && git push --force"
                  failed=1
                fi
              done
              if [ $failed -eq 1 ]; then
                echo ""
                echo "All commits must be signed off (DCO). Add -s to your git commit:"
                echo "  $ git commit -s -m 'your message'"
                exit 1
              fi
              echo "✅ All commits carry a Signed-off-by line"

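The workflow only checks that a Signed-off-by line is present. A stricter variant, sketched here in Python, also requires the sign-off email to match the commit author. This is an illustration under assumptions, not what DCO-check.yml does: `has_matching_signoff` and `unsigned_commits` are hypothetical helpers, and the commits are passed in as plain tuples instead of being read from git.

```python
import re

# One "Signed-off-by: Name <email>" trailer per line.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:\s*(?P<name>.+?)\s*<(?P<email>[^<>]+)>\s*$",
    re.MULTILINE,
)


def has_matching_signoff(message, author_email):
    """True if some Signed-off-by trailer carries the author's email."""
    wanted = author_email.strip().lower()
    return any(
        m.group("email").strip().lower() == wanted
        for m in SIGNOFF_RE.finditer(message)
    )


def unsigned_commits(commits):
    """commits: iterable of (sha, author_email, message) tuples.

    Returns the SHAs whose message lacks a sign-off from the commit author.
    """
    return [sha for sha, email, msg in commits
            if not has_matching_signoff(msg, email)]
```

Matching on the email catches the case where one contributor's sign-off is copied onto another contributor's commit, which a plain `grep` for the trailer accepts.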
License compatibility matrix and audit script

    # cann-agreements/licenses/license_audit.py
    # Audit license compatibility across the 55 repositories and generate LICENSE-COMPATIBILITY.md
    from collections import Counter
    from datetime import datetime


    class LicenseAuditor:
        """License compatibility audit: is every CANN dependency compatible with Apache 2.0?"""

        # License assignment per repository (excerpt of the 55)
        REPO_LICENSES = {
            # Core operators, acceleration libraries, communication, runtime → Apache 2.0 (about 35)
            "ops-math": "Apache-2.0", "ops-nn": "Apache-2.0",
            "ops-transformer": "Apache-2.0", "opbase": "Apache-2.0",
            "catlass": "Apache-2.0", "ascend-transformer-boost": "Apache-2.0",
            "hccl": "Apache-2.0", "ge": "Apache-2.0",
            "driver": "Apache-2.0 + GPL-2.0 exception",  # the driver includes Linux kernel headers
            # Tools → MIT (more permissive)
            "cmake": "MIT", "oam-tools": "MIT", "asc-tools": "MIT",
            # Community/documentation → CC-BY-4.0
            "cann-learning-hub": "CC-BY-4.0", "community": "CC-BY-4.0",
            "cann-competitions": "CC-BY-4.0",
        }

        # Licenses of key third-party dependencies (11 listed here)
        THIRD_PARTY_LICENSES = {
            "PyTorch": "BSD-3-Clause", "TensorFlow": "Apache-2.0",
            "NumPy": "BSD-3-Clause", "CMake": "BSD-3-Clause",
            "OpenSSL": "Apache-2.0", "Boost": "BSL-1.0",
            "Google Test": "BSD-3-Clause", "Abseil": "Apache-2.0",
            "UPX": "GPL-2.0-or-later",            # ⚠️ copyleft!
            "libusb": "LGPL-2.1-or-later",        # ⚠️ dynamic linking only
            "Linux Kernel Headers": "GPL-2.0",    # ⚠️ exception for the driver
        }

        # Compatibility with Apache 2.0
        APACHE_COMPAT = {
            "Apache-2.0": "✅ Compatible",
            "MIT": "✅ Compatible",
            "BSD-3-Clause": "✅ Compatible",
            "BSD-2-Clause": "✅ Compatible",
            "BSL-1.0": "✅ Compatible",
            "CC-BY-4.0": "✅ Compatible",
            "GPL-2.0-or-later": "❌ Incompatible (copyleft) - standalone tool use only, no linking",
            "GPL-3.0": "❌ Incompatible (copyleft) - standalone tool use only, no linking",
            "AGPL-3.0": "❌ Incompatible (strong copyleft)",
            "LGPL-2.1-or-later": "✅ Compatible (dynamic linking only)",
            "LGPL-3.0": "✅ Compatible (dynamic linking only)",
            "GPL-2.0": "⚠️ GPL exception needed (header-only permitted)",
        }

        def audit_all_repos(self, repo_deps=None):
            """
            Report incompatible or unknown dependency licenses per repository.

            repo_deps: optional mapping of repo name → list of dependency names.
            Without it, every repo is checked against every known dependency,
            which over-reports (each repo is flagged for UPX, for example).
            """
            violations = []
            for repo in self.REPO_LICENSES:
                deps = self.THIRD_PARTY_LICENSES if repo_deps is None else repo_deps.get(repo, [])
                for dep in deps:
                    dep_lic = self.THIRD_PARTY_LICENSES.get(dep, "Unknown")
                    compat = self.APACHE_COMPAT.get(dep_lic, "❓ Unknown license")
                    if compat.startswith(("❌", "❓")):
                        violations.append({
                            "repo": repo, "dep": dep,
                            "dep_license": dep_lic, "status": compat
                        })
            if violations:
                for v in violations:
                    print(f"⚠️ {v['repo']}: depends on {v['dep']} ({v['dep_license']}) - {v['status']}")
            else:
                print("✅ All repos license-compliant")
            return violations

        def generate_notice_file(self, repo_name):
            """Generate a NOTICE file listing third-party dependencies and their licenses
            (Apache 2.0 requires redistributors to preserve NOTICE attributions)."""
            notices = [
                f"CANN {repo_name}",
                f"Copyright {datetime.now().year} The CANN Authors",
                "Licensed under Apache 2.0\n",
                "## Third-Party Dependencies\n"
            ]
            for dep, lic in self.THIRD_PARTY_LICENSES.items():
                notices.append(f"- **{dep}**: {lic}")
            return "\n".join(notices)

        def generate_compatibility_md(self, violations=()):
            """Generate the LICENSE-COMPATIBILITY.md matrix.

            Pass the result of audit_all_repos() to mark flagged repositories;
            with no violations given, every row is marked ✅.
            """
            flagged = {v["repo"] for v in violations}
            dist = Counter(self.REPO_LICENSES.values())
            md = "| Repository | License | 3rd-Party Compatibility |\n"
            md += "|-----------|---------|------------------------|\n"
            for repo, lic in sorted(self.REPO_LICENSES.items()):
                status = "❌" if repo in flagged else "✅"
                md += f"| {repo} | {lic} | {status} |\n"
            md += "\n### License Distribution\n"
            for lic, count in dist.most_common():
                md += f"- **{lic}**: {count} repositories\n"
            return md

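An audit is only as precise as its knowledge of which repository uses which dependency. The sketch below shows one way to drive the check from a per-repository dependency map. Everything here is an assumption for illustration: `audit` is a hypothetical standalone function, the `REPO_DEPS` map is invented (the article does not publish the real one), and `libfoo` is a made-up dependency used to show how a missing license record surfaces.

```python
def audit(repo_deps, dep_licenses, compat):
    """Return (repo, dep, license, status) for each problematic dependency.

    repo_deps:    repo name -> list of dependency names it actually uses
    dep_licenses: dependency name -> license identifier
    compat:       license identifier -> status string starting with ✅, ⚠️ or ❌
    Dependencies with a missing license or status are reported as unknown.
    """
    findings = []
    for repo, deps in sorted(repo_deps.items()):
        for dep in deps:
            lic = dep_licenses.get(dep, "UNKNOWN")
            status = compat.get(lic, "❓ Unknown")
            if not status.startswith("✅"):
                findings.append((repo, dep, lic, status))
    return findings


# Hypothetical dependency map, for illustration only.
REPO_DEPS = {
    "ops-math": ["Google Test", "Boost"],
    "driver": ["Linux Kernel Headers"],
    "oam-tools": ["libfoo"],  # made-up dependency with no license record
}
DEP_LICENSES = {
    "Google Test": "BSD-3-Clause",
    "Boost": "BSL-1.0",
    "Linux Kernel Headers": "GPL-2.0",
}
COMPAT = {
    "BSD-3-Clause": "✅ Compatible",
    "BSL-1.0": "✅ Compatible",
    "GPL-2.0": "⚠️ GPL exception needed",
}

for finding in audit(REPO_DEPS, DEP_LICENSES, COMPAT):
    print(finding)
```

Treating anything that is not explicitly ✅ as a finding means new dependencies fail closed: an unrecorded license blocks the audit until someone classifies it.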
Pitfall 1: the CLA-Bot corporate-domain whitelist over-matches: user@badactor-huawei.com passes endswith("huawei.com") and is treated as a corporate user

    # ❌ Suffix matching is too loose: any domain that merely ends in "huawei.com" matches
    corp_domains = ["huawei.com", "inspur.com"]
    email_domain = "badactor-huawei.com"
    is_corporate = any(email_domain.endswith(d) for d in corp_domains)
    # "user@badactor-huawei.com" → endswith("huawei.com") → True (wrong!)

    # ✅ Exact domain match, plus exactly one level of legitimate subdomain
    def match_corporate_domain(email_domain, corp_domains):
        for domain in corp_domains:
            if email_domain == domain:
                return True  # exact match: @huawei.com
            # Allow only a direct subdomain (such as rd.huawei.com)
            if email_domain.endswith(f".{domain}") and len(email_domain.split(".")) == len(domain.split(".")) + 1:
                return True
        return False

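A small table of cases makes the intended matching rule easy to verify. The sketch below uses an equivalent formulation that compares DNS labels instead of string suffixes; it is an illustration of the rule, not the bot's own code, and the sample domains are the ones already used in this section plus a few assumed edge cases.

```python
def match_corporate_domain(email_domain, corp_domains):
    """Exact match, or exactly one extra label in front of a whitelisted domain."""
    labels = email_domain.lower().split(".")
    for domain in corp_domains:
        corp = domain.lower().split(".")
        if labels == corp or (labels[0] and labels[1:] == corp):
            return True
    return False


CASES = [
    ("huawei.com", True),            # exact match
    ("rd.huawei.com", True),         # one subdomain level
    ("a.rd.huawei.com", False),      # two levels: rejected by this rule
    ("badactor-huawei.com", False),  # shares a suffix only
    ("huawei.com.cn", False),        # a different domain altogether
    ("", False),                     # no email available
]

for domain, expected in CASES:
    actual = match_corporate_domain(domain, ["huawei.com", "inspur.com"])
    assert actual is expected, domain
```

Comparing label lists avoids the off-by-one mistakes that are easy to make with `endswith` and dot counting, and it rejects an empty leading label such as `.huawei.com`.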
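Pitfall 2: the third-party dependency UPX is GPL-licensed (copyleft) → it cannot be mixed into Apache 2.0 source

| Dependency | License | Compatibility | Handling |
|---|---|---|---|
| UPX | GPL-2.0-or-later | ❌ Incompatible | Used only in CI to compress artifacts; never linked into CANN code |
| libusb | LGPL-2.1-or-later | ✅ Dynamic linking | Loaded at runtime as a system library via dlopen; never statically compiled in |
| Linux Kernel Headers | GPL-2.0 | ✅ With exception | The driver repository declares a GPL exception; headers are only included, and no GPL code is combined |
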
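The handling column boils down to a rule: what matters is not only the license but how the dependency is used. A sketch of that rule as a CI gate follows. The entry schema (`name`, `license`, `usage`), the usage values, and the policy tables are all assumptions made for illustration; the real format of dependency-licenses.json is not shown in this article, and documented exceptions such as the kernel headers case are left out.

```python
# Allowed ways to use a dependency, per license family (assumed policy).
ALLOWED_USAGE = {
    "permissive": {"static", "dynamic", "tool", "headers"},
    "weak-copyleft": {"dynamic", "tool"},
    "strong-copyleft": {"tool"},
}

# Assumed mapping from license identifier to family.
LICENSE_FAMILY = {
    "Apache-2.0": "permissive",
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "LGPL-2.1-or-later": "weak-copyleft",
    "LGPL-3.0": "weak-copyleft",
    "GPL-2.0-or-later": "strong-copyleft",
    "GPL-3.0": "strong-copyleft",
}


def check_usage(entries):
    """Return names of dependencies whose usage the policy does not allow.

    Unknown licenses are rejected so that new entries fail closed.
    """
    rejected = []
    for entry in entries:
        family = LICENSE_FAMILY.get(entry["license"])
        if family is None or entry["usage"] not in ALLOWED_USAGE[family]:
            rejected.append(entry["name"])
    return rejected


ENTRIES = [
    {"name": "UPX", "license": "GPL-2.0-or-later", "usage": "tool"},
    {"name": "libusb", "license": "LGPL-2.1-or-later", "usage": "dynamic"},
]
print(check_usage(ENTRIES))
```

With this policy, switching libusb to static linking, or linking UPX code into a CANN binary, would make the gate fail.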
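Pitfall 3: CLA signing collects personal data (name, email) → GDPR privacy compliance: a public whitelist must not expose email addresses

    # ❌ CLA-whitelist.json publicly exposes personal emails
    # {"individuals": [{"name": "张三", "email": "zhang@company.com"}]} → privacy leak

    # ✅ Two-tier storage: signing database (private) + public list (GitHub IDs only)
    # Signing database (private, read only by CLA-Bot internally)
    CLA_SIGNING_DB = {
        "zhang": {
            "name": "张三",
            "email": "zhang@company.com",  # ← stored privately
            "signed_at": "2025-03-15"
        }
    }

    # Whitelist (public repository, GitHub usernames only)
    CLA_WHITELIST_PUBLIC = {
        "individuals": ["zhang", "li", "wang"],  # GitHub IDs only
        "corporate_domains": ["huawei.com"],
    }

    # CLA-Bot logic: hit in the public list → confirm against the private DB → ✓ signed
    # Emails never leave private storage, which supports GDPR data minimization
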
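The split between private and public storage is easiest to keep honest when the public file is generated from the private records instead of being edited by hand. A minimal sketch, assuming the private database is a dict keyed by GitHub username; `build_public_whitelist` and `leaks_email` are hypothetical helpers, and the sample record mirrors the one in this section.

```python
import json


def build_public_whitelist(signing_db, corporate_domains):
    """Project the private signing DB onto the fields that are safe to publish."""
    return {
        "individuals": sorted(signing_db),  # dict keys = GitHub usernames
        "corporate_domains": sorted(corporate_domains),
    }


def leaks_email(public_whitelist):
    """Cheap guard for CI: nothing under "individuals" may contain an "@"."""
    return "@" in json.dumps(public_whitelist.get("individuals", []))


PRIVATE_DB = {  # sample record; stays in private storage
    "zhang": {"name": "张三", "email": "zhang@company.com", "signed_at": "2025-03-15"},
}

public = build_public_whitelist(PRIVATE_DB, ["huawei.com"])
print(json.dumps(public, ensure_ascii=False))
```

Running `leaks_email` against the generated file before publishing catches the original mistake, where whole signer records ended up in the public list.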
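
cann-agreements is the legal foundation of the CANN community. It combines three pieces: the CLA (required in core repositories; CLA-Bot detects unsigned contributors, guides them through signing, and labels the PR), the DCO (enforced in tool repositories via git commit -s), and the license compatibility matrix (the 55 repositories use Apache 2.0, MIT, or CC-BY-4.0, backed by a third-party dependency audit that flags UPX as GPL copyleft). The contributor guide documents the full PR flow: DCO/CLA → CHANGELOG → CI → review. The three pitfalls and their fixes: corporate-domain suffix over-matching → exact domain matching; the GPL-licensed UPX dependency → use it only as a standalone CI tool and never link it; personal data exposed through CLA signing → split storage into a private signing database and a public whitelist.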