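The CANN community has 55 repositories that accept external pull requests (PRs). When a contributor's first PR arrives, they must sign a CLA (Contributor License Agreement). The manual process is: download the PDF → print → sign → scan → email → human review, which takes about 7 days. The cann-agreements repository defines the CANN community's legal-agreement infrastructure:

- CLA templates (individual CLA and corporate CLA).
- The DCO (Developer Certificate of Origin), a lightweight alternative that requires no signed document.
- An automated CLA check in CI (CLA-Bot).
- A license compatibility matrix covering the license combinations used across the 55 repositories: OSI-approved code licenses, plus CC-BY-4.0 for documentation.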

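cann-agreements is the community's legal gatekeeper. It is not a code repository, but it defines three things:

- who may contribute;
- how contributed content may be used;
- the legal relationship with downstream vendors.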

The three-tier legal agreement structure of the CANN community

cann-agreements/
├── CLA/                          # Contributor License Agreement
│   ├── CLA-individual.md         # Individual CLA template
│   ├── CLA-corporate.md          # Corporate CLA template
│   ├── CLA-whitelist.json        # Whitelist of individuals/companies that have signed the CLA
│   └── CLA-bot/                  # Automated CLA check (CI)
│       ├── cla_bot.py            # GitHub App (detect PR → check whitelist → prompt unsigned authors to sign)
│       └── config.yaml           # Bot configuration
│
├── DCO/                          # Developer Certificate of Origin
│   ├── DCO.txt                   # Full DCO text (Linux kernel style)
│   ├── DCO-check.yml             # CI: every commit must carry Signed-off-by
│   └── DCO-policy.md             # DCO policy
│
├── licenses/                     # Open-source licenses
│   ├── Apache-2.0.txt
│   ├── MIT.txt
│   └── BSD-3-Clause.txt
│
├── LICENSE                       # This repository's own license
├── LICENSE-COMPATIBILITY.md      # License compatibility matrix for the 55 repositories
├── third-party/                  # Third-party dependency license audit
│   └── dependency-licenses.json
└── contributor-guide.md          # Contributor guide
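The CLA-Bot code later in this article reads CLA-whitelist.json as a JSON object with three fields:

- "individuals" (GitHub usernames);
- "corporate_domains";
- a "last_updated" timestamp.

Because the file is public, a CI step that validates its shape before publishing is cheap insurance. Below is a minimal sketch that assumes exactly that schema. validate_whitelist is a hypothetical helper, not part of the repository:

```python
import json
from datetime import datetime
from typing import List


def validate_whitelist(raw: str) -> List[str]:
    """Return the problems found in a CLA-whitelist.json document (empty list = valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems: List[str] = []
    individuals = data.get("individuals")
    domains = data.get("corporate_domains")
    if not isinstance(individuals, list):
        problems.append("'individuals' must be a list")
        individuals = []
    if not isinstance(domains, list):
        problems.append("'corporate_domains' must be a list")
        domains = []

    # The file is public: entries must be bare GitHub usernames, never emails or records
    for login in individuals:
        if not isinstance(login, str) or "@" in login:
            problems.append(f"individual entry {login!r} is not a bare GitHub username")
    for domain in domains:
        if not isinstance(domain, str) or "@" in domain or "." not in domain:
            problems.append(f"corporate domain {domain!r} looks malformed")

    # Optional timestamp, expected in the format produced by datetime.isoformat()
    stamp = data.get("last_updated")
    if stamp is not None:
        try:
            datetime.fromisoformat(stamp)
        except (TypeError, ValueError):
            problems.append(f"'last_updated' is not an ISO 8601 timestamp: {stamp!r}")
    return problems
```

Running this before each commit of the whitelist also catches the email-leak pitfall described later, since an entry such as {"name": ..., "email": ...} is rejected.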

CLA (Contributor License Agreement) vs. DCO (Developer Certificate of Origin)

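| Dimension | CLA (Apache style) | DCO (Linux kernel style) |
|---|---|---|
| How to sign | Download the PDF → sign → upload | Add Signed-off-by to every git commit |
| Signatory | Individual or company (legal entity) | Individual (the commit author) |
| What it says | "I grant the project a perpetual, worldwide, royalty-free license" | "I certify this commit is my original work and I have the right to contribute it" |
| Corporate participation | ✅ Supported (a corporate CLA covers all employees) | ✅ Supported, but company counsel must approve use of the DCO |
| Enforcement | Signing + review (whitelist check automated by CLA-Bot) | Commit hook + CI |
| Use in CANN | Mandatory for core repos (~35 system repos) | ~20 tool/docs/community repos use DCO |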

Automated CLA check in CI (CLA-Bot)

# cann-agreements/CLA/CLA-bot/cla_bot.py
# GitHub App webhook → automated CLA check
import json
from datetime import datetime

import requests

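class CLABot:
    """
    GitHub App that automates the CLA check.
    Flow: PR webhook → check whitelist → signed: mark ✓ / unsigned: prompt to sign

    Repository classes:
    - Core system repos (~35): CLA mandatory
      ops-*, catlass, ATB, asnumpy, hccl/hcomm/hixl, ge/runtime/driver, torchtitan-npu...
    - Tool/community/documentation repos (~20): DCO only
      cmake, oam-tools, asc-tools, cann-learning-hub, community, cann-competitions...
    """

    CLA_WHITELIST_URL = "https://cann-community.dev/CLA/CLA-whitelist.json"

    def __init__(self, github_token):
        # Token used by the GitHub REST calls below (_add_label, _remove_label, _post_comment)
        self.github_token = github_token

    def _load_whitelist(self):
        """Fetch the latest public whitelist (assumed to be served as JSON at CLA_WHITELIST_URL)."""
        resp = requests.get(self.CLA_WHITELIST_URL, timeout=10)
        resp.raise_for_status()
        return resp.json()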

    def check_pr(self, pr_event):
        """
        Handle a GitHub pull_request webhook.
        """
        action = pr_event["action"]
        if action not in ("opened", "synchronize", "reopened"):
            return

        pr = pr_event["pull_request"]
        pr_number = pr["number"]
        # Use the PR author, not "sender": on "synchronize" the sender is whoever pushed
        pr_author = pr["user"]["login"]
        # Webhook user objects often carry no email; "" limits the check to the individual whitelist
        pr_author_email = pr["user"].get("email") or ""
        repo_fullname = pr_event["repository"]["full_name"]

        # Does this repo require a CLA (core repo) or only DCO?
        if not self._repo_requires_cla(repo_fullname):
            return  # DCO repo: skip

        # Load the whitelist and check signing status
        whitelist = self._load_whitelist()
        signed = self._is_signed(pr_author, pr_author_email, whitelist)

        if signed:
            self._remove_label(repo_fullname, pr_number, "cla-required")
            self._add_label(repo_fullname, pr_number, "cla-signed")
            self._post_comment(repo_fullname, pr_number,
                f"✅ **CLA Signed**: @{pr_author}, your contribution is eligible for review."
            )
        else:
            self._add_label(repo_fullname, pr_number, "cla-required")
            sign_url = f"https://cann-community.dev/CLA/sign?github_user={pr_author}"
            self._post_comment(repo_fullname, pr_number,
                f"⚠️ **CLA Required**: Hi @{pr_author}, please sign the CLA before we can review.\n\n"
                f"📝 **[Sign CLA →]({sign_url})** (takes < 2 minutes)\n\n"
                f"Once signed, reply `/cla-check` to re-trigger verification."
            )

    def _is_signed(self, username, email, whitelist):
        """Check whether a user has signed the CLA (individual whitelist + corporate domain whitelist)."""
        # 1. Individual whitelist (GitHub usernames are case-insensitive)
        if username.lower() in {u.lower() for u in whitelist.get("individuals", [])}:
            return True

        # 2. Corporate domain whitelist (exact domain match, so lookalike domains cannot slip through)
        email_domain = email.rsplit("@", 1)[1].lower() if "@" in email else ""
        if not email_domain:
            return False
        corp_domains = [d.lower() for d in whitelist.get("corporate_domains", [])]

        for corp_domain in corp_domains:
            if email_domain == corp_domain:
                return True
            # Allow exactly one extra subdomain level (e.g. rd.huawei.com matches huawei.com)
            if email_domain.endswith(f".{corp_domain}") and email_domain.count(".") == corp_domain.count(".") + 1:
                return True

        return False

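    def handle_cla_check_command(self, comment_event):
        """Handle the /cla-check command (triggered manually after signing).

        Comments on PRs arrive as issue_comment webhooks: the PR number and
        author live under "issue", and there is no top-level "pull_request" key.
        """
        repo = comment_event["repository"]["full_name"]
        pr_number = comment_event["issue"]["number"]

        # Re-fetch the whitelist (the user may have just signed)
        whitelist = self._load_whitelist()
        # Check the PR author, not whoever posted the comment
        author = comment_event["issue"]["user"]["login"]

        # No email is available here, so only the individual whitelist can match
        if self._is_signed(author, "", whitelist):
            self._remove_label(repo, pr_number, "cla-required")
            self._add_label(repo, pr_number, "cla-signed")
            self._post_comment(repo, pr_number, "✅ CLA verified! Your PR is now ready for review.")
        else:
            self._post_comment(repo, pr_number,
                f"⏳ @{author}, CLA not yet found. Please complete signing at: "
                f"https://cann-community.dev/CLA/sign?github_user={author}"
            )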

    def _repo_requires_cla(self, repo_fullname):
        """Core repos require a CLA; tool/community/docs repos use DCO."""
        # oam-tools and asc-tools are tool repos (DCO only, per the class docstring), so they are not listed
        CLA_REPOS = {
            "ops-math", "ops-nn", "ops-transformer", "ops-cv", "ops-blas",
            "ops-fft", "ops-rand", "ops-tensor", "opbase",
            "catlass", "ascend-transformer-boost", "asnumpy", "graph-autofusion",
            "hccl", "hcomm", "hixl", "ascend-boost-comm", "shmem",
            "ge", "metadef", "runtime", "driver",
            "asc-devkit", "pyasc", "pypto",
            "pto-isa", "atvc", "atvoss", "sip",
            "torchtitan-npu", "tensorflow", "triton-inference-server-ge-backend",
            "cann-spack-package", "elec-ops-inspection", "elec-ops-prediction",
            "elec-ops-simulation", "mat-chem-sim-pred",
        }
        repo_name = repo_fullname.split("/")[-1]
        return repo_name in CLA_REPOS

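    def sync_whitelist(self):
        """Periodically sync the whitelist (pull the latest signers from the signing database)."""
        signed_users = self._fetch_signed_users_from_db()

        # Rewrite CLA-whitelist.json (public: GitHub usernames + corporate domains only, no personal emails)
        with open("CLA/CLA-whitelist.json", "w", encoding="utf-8") as f:
            json.dump({
                "individuals": signed_users["individuals"],
                "corporate_domains": signed_users["corporate_domains"],
                "last_updated": datetime.now().isoformat()
            }, f, indent=2)

    def _fetch_signed_users_from_db(self):
        """Backend-specific and not shown in the original: return
        {"individuals": [...], "corporate_domains": [...]} from the private signing DB."""
        raise NotImplementedError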

    def _add_label(self, repo, pr_number, label):
        url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/labels"
        requests.post(url, headers={"Authorization": f"token {self.github_token}"},
                      json={"labels": [label]})

    def _remove_label(self, repo, pr_number, label):
        url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/labels/{label}"
        requests.delete(url, headers={"Authorization": f"token {self.github_token}"})

    def _post_comment(self, repo, pr_number, body):
        url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
        requests.post(url, headers={"Authorization": f"token {self.github_token}"},
                      json={"body": body})
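The class above exposes check_pr and handle_cla_check_command, but it does not show how webhooks reach them. Below is a minimal sketch of that front door, assuming a standard GitHub App setup. It does two things:

- It verifies the X-Hub-Signature-256 header, which is "sha256=" followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret.
- It routes each event to a handler. A /cla-check command arrives as an issue_comment event whose issue object carries a pull_request key.

verify_signature and route_event are hypothetical names:

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: bytes, body: bytes, signature_header: Optional[str]) -> bool:
    """Check GitHub's X-Hub-Signature-256 header: "sha256=" + hex HMAC-SHA256(secret, raw body)."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected, signature_header or "")


def route_event(event_name: str, payload: dict) -> Optional[str]:
    """Map a webhook (X-GitHub-Event header + JSON payload) to a CLA-Bot handler name, or None."""
    if event_name == "pull_request":
        if payload.get("action") in ("opened", "synchronize", "reopened"):
            return "check_pr"
        return None
    if event_name == "issue_comment" and payload.get("action") == "created":
        issue = payload.get("issue", {})
        body = payload.get("comment", {}).get("body", "").strip()
        # Comments on PRs are delivered as issue_comment; only PR "issues" carry a pull_request key
        if "pull_request" in issue and body.startswith("/cla-check"):
            return "handle_cla_check_command"
    return None
```

Checking the signature before parsing anything keeps forged requests from making the bot spend API calls or post comments under its name.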

Enforcing the DCO (DCO-check.yml)

# cann-agreements/DCO/DCO-check.yml
# Every commit must carry a Signed-off-by line

name: DCO Check

on: [pull_request]

jobs:
  dco-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Check commits for DCO Sign-off
        run: |
          git fetch origin ${{ github.event.pull_request.base.sha }}
          git fetch origin ${{ github.event.pull_request.head.sha }}

          commits=$(git log \
            ${{ github.event.pull_request.base.sha }}..${{ github.event.pull_request.head.sha }} \
            --no-merges --format="%H")

          failed=0
          for commit in $commits; do
            message=$(git log -1 --format="%B" $commit)
            if ! echo "$message" | grep -q "^Signed-off-by:"; then
              author=$(git log -1 --format="%an <%ae>" $commit)
              echo "❌ Commit $commit ($author): Missing Signed-off-by"
              echo "   Fix: git commit --amend --signoff (latest commit only)"
              echo "     or git rebase --signoff <base-branch> (all commits), then git push --force-with-lease"
              failed=1
            fi
          done

          if [ $failed -eq 1 ]; then
            echo ""
            echo "All commits must be signed off (DCO). Add -s to your git commit:"
            echo "  $ git commit -s -m 'your message'"
            exit 1
          fi

          echo "✅ All commits have valid DCO sign-off"
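The workflow only checks that some Signed-off-by line exists. The DCO is a certification by the contributor, so some projects also require the sign-off email to match the commit author. Below is a minimal sketch of that stricter rule. It assumes the CI loop passes each commit's author email (git log --format=%ae) and full message (%B). The matching policy and the function names are assumptions, not part of DCO-check.yml:

```python
import re
from typing import Optional, Set

# One "Signed-off-by: Name <email>" trailer per line, as written by `git commit -s`
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:[ \t]*(?P<name>.+?)[ \t]*<(?P<email>[^<>\s]+)>[ \t]*$",
    re.MULTILINE,
)


def signoff_emails(message: str) -> Set[str]:
    """Lower-cased emails of every Signed-off-by trailer in a commit message."""
    return {m.group("email").lower() for m in SIGNOFF_RE.finditer(message)}


def dco_problem(author_email: str, message: str) -> Optional[str]:
    """Return None if the commit passes, else a human-readable reason."""
    emails = signoff_emails(message)
    if not emails:
        return "missing Signed-off-by"
    if author_email.strip().lower() not in emails:
        return f"no Signed-off-by matches the author <{author_email}>"
    return None
```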

License compatibility matrix and audit script

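# cann-agreements/licenses/license_audit.py
# Audit license compatibility across the 55 repos + generate LICENSE-COMPATIBILITY.md
from datetime import datetime


class LicenseAuditor:
    """License compatibility audit: is every CANN dependency compatible with Apache 2.0?"""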

    # License assigned to each repository (an excerpt of the 55)
    REPO_LICENSES = {
        # Core operators + acceleration libraries + communication + runtime → Apache 2.0 (~35 repos)
        "ops-math": "Apache-2.0", "ops-nn": "Apache-2.0",
        "ops-transformer": "Apache-2.0", "opbase": "Apache-2.0",
        "catlass": "Apache-2.0", "ascend-transformer-boost": "Apache-2.0",
        "hccl": "Apache-2.0", "ge": "Apache-2.0",
        "driver": "Apache-2.0 + GPL-2.0 exception",  # the driver includes Linux kernel headers
        # Tools → MIT (more permissive)
        "cmake": "MIT", "oam-tools": "MIT", "asc-tools": "MIT",
        # Community/docs → CC-BY-4.0
        "cann-learning-hub": "CC-BY-4.0", "community": "CC-BY-4.0",
        "cann-competitions": "CC-BY-4.0",
    }

    # Licenses of key third-party dependencies
    THIRD_PARTY_LICENSES = {
        "PyTorch": "BSD-3-Clause", "TensorFlow": "Apache-2.0",
        "NumPy": "BSD-3-Clause", "CMake": "BSD-3-Clause",
        "OpenSSL": "Apache-2.0", "Boost": "BSL-1.0",
        "Google Test": "BSD-3-Clause", "Abseil": "Apache-2.0",
        "UPX": "GPL-3.0",               # ⚠️ copyleft!
        "libusb": "LGPL-2.1",           # ⚠️ dynamic linking only
        "Linux Kernel Headers": "GPL-2.0",  # ⚠️ exception for driver
    }

    # Apache 2.0 compatibility matrix
    APACHE_COMPAT = {
        "Apache-2.0":   "✅ Compatible",
        "MIT":          "✅ Compatible",
        "BSD-3-Clause": "✅ Compatible",
        "BSD-2-Clause": "✅ Compatible",
        "BSL-1.0":      "✅ Compatible",
        "CC-BY-4.0":    "✅ Compatible (documentation only)",
        "GPL-3.0":      "❌ Incompatible (copyleft): usable only as a separate build tool, never linked or vendored",
        "AGPL-3.0":     "❌ Incompatible (strong copyleft)",
        "LGPL-2.1":     "✅ Compatible (dynamic linking only)",
        "LGPL-3.0":     "✅ Compatible (dynamic linking only)",
        "GPL-2.0":      "⚠️ GPL exception needed (header-only permitted)",
    }

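    def audit_all_repos(self, repo_deps=None):
        """Report incompatible (❌) third-party licenses per repository.

        repo_deps: optional {repo: [dependency, ...]} map of what each repo actually
        uses (a hypothetical input, not in the original script). Without it, every repo
        is conservatively assumed to use every dependency in THIRD_PARTY_LICENSES,
        so a CI-only tool such as UPX is flagged everywhere.
        """
        violations = []
        for repo in self.REPO_LICENSES:
            deps = self.THIRD_PARTY_LICENSES if repo_deps is None else repo_deps.get(repo, [])
            for dep in deps:
                dep_lic = self.THIRD_PARTY_LICENSES.get(dep, "Unknown")
                compat = self.APACHE_COMPAT.get(dep_lic, "Unknown")
                if "❌" in compat:
                    violations.append({
                        "repo": repo, "dep": dep,
                        "dep_license": dep_lic, "status": compat
                    })

        if violations:
            for v in violations:
                print(f"⚠️  {v['repo']}: depends on {v['dep']} ({v['dep_license']}) - {v['status']}")
        else:
            print("✅ All repos license-compliant")

        return violations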

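    def generate_notice_file(self, repo_name):
        """Generate a NOTICE file listing third-party attributions.

        Apache 2.0 §4(d) requires redistributed derivative works to carry the attribution notices in NOTICE.
        """
        repo_license = self.REPO_LICENSES.get(repo_name, "Apache-2.0")
        notices = [
            f"CANN {repo_name}",
            f"Copyright {datetime.now().year} The CANN Authors",
            f"Licensed under {repo_license}\n",
            "## Third-Party Dependencies\n"
        ]
        for dep, lic in self.THIRD_PARTY_LICENSES.items():
            notices.append(f"- **{dep}**: {lic}")

        return "\n".join(notices)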

    def generate_compatibility_md(self, violations=()):
        """Generate the LICENSE-COMPATIBILITY.md matrix (pass in the result of audit_all_repos())."""
        from collections import Counter
        dist = Counter(self.REPO_LICENSES.values())
        flagged = {v["repo"] for v in violations}

        md = "| Repository | License | 3rd-Party Compatibility |\n"
        md += "|-----------|---------|------------------------|\n"
        for repo, lic in sorted(self.REPO_LICENSES.items()):
            status = "❌" if repo in flagged else "✅"
            md += f"| {repo} | {lic} | {status} |\n"

        md += "\n### License Distribution\n"
        for lic, count in dist.most_common():
            md += f"- **{lic}**: {count} repositories\n"

        return md
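THIRD_PARTY_LICENSES maps each dependency to a single identifier. Real packages, however, often declare SPDX license expressions such as "MIT OR Apache-2.0" or "Apache-2.0 WITH LLVM-exception". Below is a minimal sketch that evaluates such an expression against an allowlist, following SPDX semantics:

- OR lets the licensee choose, so one acceptable branch is enough.
- AND means all terms apply, so every part must be acceptable.
- Precedence: WITH binds tightest, then AND, then OR.

Judging a WITH clause by its base license alone is a deliberate simplification: an exception that relaxes a copyleft license would still be rejected. acceptable is a hypothetical helper:

```python
import re

# Tokens: parentheses, or runs of SPDX identifier characters (operators included)
TOKEN_RE = re.compile(r"\(|\)|[A-Za-z0-9.+\-]+")
OPERATORS = {"AND", "OR", "WITH"}


def acceptable(expression: str, allowed: set) -> bool:
    """Recursive-descent evaluation of a simple SPDX license expression."""
    tokens = TOKEN_RE.findall(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"unexpected end of expression: {expression!r}")
        pos += 1
        return tokens[pos - 1]

    def parse_or():    # lowest precedence: any acceptable alternative suffices
        result = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():   # every conjunct must be acceptable
        result = parse_term()
        while peek() == "AND":
            take()
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():  # "(" expr ")"  |  license-id [WITH exception-id]
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError(f"missing ')' in {expression!r}")
            return result
        if tok in OPERATORS or tok == ")":
            raise ValueError(f"unexpected {tok!r} in {expression!r}")
        if peek() == "WITH":
            take()
            take()  # exception id; simplification: only the base license is judged
        return tok in allowed

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens in {expression!r}")
    return result
```

For example, acceptable("GPL-3.0-only OR MIT AND Apache-2.0", {"MIT", "Apache-2.0"}) parses as GPL-3.0-only OR (MIT AND Apache-2.0). The result is True because the AND branch is fully allowed.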

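Pitfall 1: CLA-Bot's corporate domain whitelist over-matches. A lookalike domain such as badactor-huawei.com passes endswith("huawei.com") and is treated as a corporate signer.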

# ❌ Domain match is too loose: any domain that merely ends with the same characters matches
corp_domains = ["huawei.com", "inspur.com"]
is_corporate = any(email_domain.endswith(d) for d in corp_domains)
# "user@badactor-huawei.com" → endswith("huawei.com") → True (wrong!)

# ✅ Exact domain match + one level of legitimate subdomains
def match_corporate_domain(email_domain, corp_domains):
    for domain in corp_domains:
        if email_domain == domain:
            return True  # exact match: @huawei.com
        # Only allow a single subdomain level (e.g. rd.huawei.com)
        if email_domain.endswith(f".{domain}") and len(email_domain.split(".")) == len(domain.split(".")) + 1:
            return True
    return False
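The fixed matcher still compares raw strings. A mixed-case address such as user@RD.Huawei.COM, or a domain with a trailing dot, would therefore be rejected. Below is a minimal sketch of the same rule (exact domain, or exactly one extra label on the left) that compares lower-cased labels instead of string suffixes. is_corporate_email and its helpers are hypothetical names:

```python
from typing import List


def normalize_domain(domain: str) -> str:
    """Lower-case and drop a trailing dot, so 'RD.Huawei.COM.' compares equal to 'rd.huawei.com'."""
    return domain.strip().lower().rstrip(".")


def email_domain(email: str) -> str:
    # rpartition splits on the LAST "@"; the domain is "" when there is no "@" at all
    _, at, domain = email.rpartition("@")
    return normalize_domain(domain) if at else ""


def is_corporate_email(email: str, corp_domains: List[str]) -> bool:
    labels = email_domain(email).split(".")
    if labels == [""]:
        return False
    for corp in corp_domains:
        corp_labels = normalize_domain(corp).split(".")
        if labels == corp_labels:
            return True  # exact match: user@huawei.com
        if len(labels) == len(corp_labels) + 1 and labels[1:] == corp_labels:
            return True  # exactly one extra label: user@rd.huawei.com
    return False
```

Domain matching is only as trustworthy as the email it receives. A git commit's author email is self-declared, so a verified address on the contributor's GitHub account is a stronger input.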

Pitfall 2: the third-party dependency UPX is GPL-3.0 (copyleft), so it must not be mixed into Apache 2.0 source code

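| Dependency | License | Compatibility | Handling |
|---|---|---|---|
| UPX | GPL-3.0 | ❌ Incompatible | Used only in CI to compress artifacts; never linked into CANN code |
| libusb | LGPL-2.1 | ✅ Dynamic linking only | Loaded at runtime as a system library via dlopen; never statically compiled in |
| Linux Kernel Headers | GPL-2.0 | ✅ Exception | The driver repo declares a GPL exception; headers are only included, no GPL code is combined |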
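The handling column above is effectively a policy: each non-permissive license is allowed only in certain usage modes. Below is a minimal sketch of enforcing it in CI. It assumes a hypothetical dependency manifest in which each entry records how the dependency is used: "source", "static", "dynamic", "header", or "tool". The usage vocabulary and the rule table are assumptions that encode this article's table:

```python
from dataclasses import dataclass
from typing import List

PERMISSIVE = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause", "BSL-1.0"}

# Assumed encoding of the "Handling" column: usage modes each non-permissive license may appear in
ALLOWED_USAGE = {
    "GPL-3.0": {"tool"},               # CI/build tool only; never linked or vendored
    "LGPL-2.1": {"dynamic", "tool"},   # shared library loaded at runtime (e.g. via dlopen)
    "LGPL-3.0": {"dynamic", "tool"},
    "GPL-2.0": {"header", "tool"},     # header-only reference under the driver's exception
}


@dataclass(frozen=True)
class Dependency:
    name: str
    license: str
    usage: str  # hypothetical vocabulary: "source", "static", "dynamic", "header", "tool"


def policy_violations(deps: List[Dependency]) -> List[str]:
    """Return one message per dependency whose usage mode its license does not permit."""
    problems = []
    for dep in deps:
        if dep.license in PERMISSIVE:
            continue
        allowed = ALLOWED_USAGE.get(dep.license)
        if allowed is None:
            problems.append(f"{dep.name}: {dep.license} is not in the policy; needs legal review")
        elif dep.usage not in allowed:
            problems.append(
                f"{dep.name}: {dep.license} may not be used as '{dep.usage}' (allowed: {sorted(allowed)})"
            )
    return problems
```

Unknown licenses fail closed ("needs legal review"), so a new AGPL dependency cannot slip through just because nobody added a rule for it.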

Pitfall 3: CLA signing collects personal data (name, email), so a public whitelist that exposes emails is a GDPR privacy problem

# ❌ CLA-whitelist.json publicly exposes personal emails
# {"individuals": [{"name": "Zhang San", "email": "zhang@company.com"}]} → privacy leak

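# ✅ Two-tier storage: signing database (private) + public list (GitHub IDs only)
# Signing database (read only by CLA-Bot internally; private)
CLA_SIGNING_DB = {
    "zhang": {                          # key = GitHub username, matching the public list
        "name": "Zhang San",
        "email": "zhang@company.com",   # ← stored privately
        "signed_at": "2025-03-15"
    }
}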

# Whitelist (public repository, GitHub usernames only)
CLA_WHITELIST_PUBLIC = {
    "individuals": ["zhang", "li", "wang"],       # GitHub IDs only
    "corporate_domains": ["huawei.com"],
}

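# CLA-Bot logic: check the public list (hit) → confirm in the private DB (hit) → ✓ signed
# Emails never leave the private store, so the public repository publishes no personal email addresses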
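The split above implies that the public file is derived from the private database, never edited by hand. Below is a minimal sketch of that derivation. It assumes the database is keyed by GitHub username, as in CLA_SIGNING_DB. build_public_whitelist is a hypothetical helper, and its leak check is a last line of defense, not a substitute for access control on the private store:

```python
import json
from typing import Dict, List


def build_public_whitelist(signing_db: Dict[str, dict], corporate_domains: List[str]) -> str:
    """Derive the public CLA-whitelist.json text from the private signing DB.

    Only the DB keys (GitHub usernames) and corporate domains are published;
    names, emails, and signing dates stay in the private store.
    """
    public = {
        "individuals": sorted(signing_db),
        "corporate_domains": sorted(set(corporate_domains)),
    }
    text = json.dumps(public, indent=2, ensure_ascii=False)
    # Refuse to publish if any stored email address appears anywhere in the output
    for record in signing_db.values():
        email = record.get("email")
        if email and email.lower() in text.lower():
            raise ValueError("a personal email address would be published")
    return text
```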

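cann-agreements is the legal-agreement foundation of the CANN community. It covers four areas:

- CLA vs. DCO: the CLA is mandatory for core repos, where CLA-Bot automatically checks the author, prompts signing, and labels the PR. Tool repos use the DCO (git commit -s).
- License compatibility matrix: all 55 repos are under Apache 2.0, MIT, or CC-BY-4.0 (the driver adds a GPL-2.0 header exception). Key third-party dependencies are audited as well, with the copyleft UPX as the exception.
- Contributor guide: the full PR flow, DCO/CLA → CHANGELOG → CI → review.
- Three pitfalls:
  - Corporate domain matching accepted lookalike domains; the fix is exact domain matching.
  - UPX is a GPL dependency; the fix is to use it only as a CI tool for compressing artifacts, never in source.
  - CLA signing exposed personal data; the fix is a private signing DB plus a public whitelist.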
