asc-tools for Beginners: What Exactly Is the Ascend Development Toolset?

微祎_ · Published 2026-05-23 08:07:26

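A while back I was helping a friend tune operator performance, and he asked me: "Hey, does Ascend have a ready-made performance analysis tool? Surely I don't have to hand-write my own profiling?"

I said yes: asc-tools.

Good question, so let me walk through it properly today.

What is asc-tools?

asc-tools = Ascend Tools, the development toolset for Ascend. Its profiling, debugging, benchmarking, and visualization tools all live in one place.

In one sentence: asc-tools is Ascend's "toolbox". Whether you want to profile operator performance, debug a memory leak, or visualize a compute graph, the tools are already there and ready to use.

Annoying, isn't it? I once spent two weeks hand-writing profiling code, and now you download a tool, change two lines of config, and you're done.

Why use asc-tools?

Three words: it saves time.

Without asc-tools (rolling your own)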

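# Hand-written profiling script
import time

# A home-made timer
class Profiler:
    def __init__(self):
        self.start_time = None

    def start(self):
        # perf_counter() is monotonic and has higher resolution than time.time()
        self.start_time = time.perf_counter()

    def end(self):
        # Elapsed wall-clock seconds since start()
        if self.start_time is None:
            raise RuntimeError("start() must be called before end()")
        return time.perf_counter() - self.start_time

# Problems:
# 1. Bare-bones (it can only time things)
# 2. No access to hardware counters
# 3. You have to write the visualization yourself
# 4. A waste of time (an official tool already exists)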
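
To show how far a hand-rolled timer can be pushed before it runs into the limits listed above, here is a minimal sketch that accumulates time per named section. `SectionTimer` and the section names are hypothetical and not part of asc-tools. It still measures host-side wall-clock time only, so it sees no hardware counters, and asynchronous device work would have to be synchronized before the clock stops.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulates wall-clock time per named section (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # section name -> total seconds
        self.counts = defaultdict(int)    # section name -> number of calls

    @contextmanager
    def section(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the timed code raises
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (name, total_seconds, share_of_total) rows, slowest first."""
        grand_total = sum(self.totals.values())
        rows = []
        for name, total in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            share = total / grand_total if grand_total else 0.0
            rows.append((name, total, share))
        return rows


timer = SectionTimer()
for _ in range(3):
    with timer.section("fast"):
        sum(range(1_000))
    with timer.section("slow"):
        sum(range(200_000))

for name, total, share in timer.report():
    print(f"{name}: {total * 1e3:.3f} ms ({share:.1%})")
```

Even with per-section totals, everything in the report is something the host clock can observe, which is the gap a dedicated profiler is meant to close.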

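With asc-tools (the official tool)

# Clone the repository
$ git clone https://atomgit.com/cann/asc-tools.git
$ cd asc-tools

# Run the profiling tool directly
# (model and batch size are read from config.yaml: resnet50, batch_size 32)
$ cd tools/profiler
$ python run_profiler.py --config config.yaml

# Output: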
# ========================================
# Profiling Result (ResNet-50, batch=32)
# ========================================
# Total time: 45ms
# - conv1: 8ms (17.8%)
# - conv2: 6ms (13.3%)
# - conv3: 5ms (11.1%)
# - fc: 3ms (6.7%)
# - others: 23ms (51.1%)
# 
# NPU Utilization: 85.2%
# Memory bandwidth: 450 GB/s
# 
# Visualization: profiling_result.html
# ========================================

# Open the visualization (`open` is the macOS command; use xdg-open on Linux)
$ open profiling_result.html
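
The percentages in that report are each layer's share of the total time. The following sketch recomputes them from the per-layer milliseconds in the sample output; the `breakdown` helper is hypothetical and only illustrates the arithmetic, not how the tool itself produces the report.

```python
# Per-layer times in milliseconds, copied from the sample output above
layer_ms = {"conv1": 8, "conv2": 6, "conv3": 5, "fc": 3, "others": 23}


def breakdown(times_ms):
    """Return the total time and each entry's share of it, in percent."""
    total = sum(times_ms.values())
    shares = {name: round(100 * ms / total, 1) for name, ms in times_ms.items()}
    return total, shares


total_ms, shares = breakdown(layer_ms)
print(f"Total time: {total_ms}ms")
for name, pct in shares.items():
    print(f"- {name}: {layer_ms[name]}ms ({pct}%)")
```

Note that "others" accounts for more than half of the run, so the named layers alone do not explain where the time goes.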

Annoying, isn't it? It works out of the box, and it's powerful too.

Just three core concepts

1. Tools

Each tool gets its own directory:

asc-tools/
├── tools/
│   ├── profiler/          # Performance analysis tool
│   │   ├── run_profiler.py
│   │   ├── visualize.py
│   │   └── README.md
│   │
│   ├── debugger/          # Debugging tool
│   │   ├── memory_debugger.py
│   │   ├── kernel_debugger.py
│   │   └── README.md
│   │
│   ├── benchmarker/       # Benchmarking tool
│   │   ├── run_benchmark.py
│   │   ├── compare.py
│   │   └── README.md
│   │
│   └── visualizer/        # Visualization tool
│       ├── plot_compute_graph.py
│       ├── plot_memory.py
│       └── README.md
│
└── examples/              # Usage examples
    ├── profiler_example/
    ├── debugger_example/
    ├── benchmarker_example/
    └── visualizer_example/

2. Config

Each tool has its own configuration file:

# tools/profiler/config.yaml
profiling:
  model: "resnet50"
  batch_size: 32
  num_iterations: 100
  warmup_iterations: 10

output:
  format: "html"          # html / json / csv
  path: "./profiling_result"

visualization:
  plot_compute_graph: true
  plot_memory: true
  plot_npu_utilization: true
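
A config like this is easier to work with once it is checked up front. The sketch below assumes the nested dict that a YAML loader would return for the file above and validates the fields before any run starts. `ProfilerConfig`, `parse_config`, and the specific checks are illustrative assumptions, not part of asc-tools.

```python
from dataclasses import dataclass

# Formats listed in the config comment: html / json / csv
ALLOWED_FORMATS = {"html", "json", "csv"}


@dataclass(frozen=True)
class ProfilerConfig:
    model: str
    batch_size: int
    num_iterations: int
    warmup_iterations: int
    output_format: str
    output_path: str

    def __post_init__(self):
        # Fail early with a readable message instead of partway through a run
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if self.num_iterations <= 0:
            raise ValueError("num_iterations must be positive")
        if self.warmup_iterations < 0:
            raise ValueError("warmup_iterations must not be negative")
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")


def parse_config(raw):
    """Build a ProfilerConfig from the nested dict a YAML loader would return."""
    return ProfilerConfig(
        model=raw["profiling"]["model"],
        batch_size=raw["profiling"]["batch_size"],
        num_iterations=raw["profiling"]["num_iterations"],
        warmup_iterations=raw["profiling"]["warmup_iterations"],
        output_format=raw["output"]["format"],
        output_path=raw["output"]["path"],
    )


# Same values as the config.yaml shown above
raw_config = {
    "profiling": {
        "model": "resnet50",
        "batch_size": 32,
        "num_iterations": 100,
        "warmup_iterations": 10,
    },
    "output": {"format": "html", "path": "./profiling_result"},
}

config = parse_config(raw_config)
print(config)
```

A typo such as `format: "pdf"` then raises a `ValueError` at load time rather than silently producing no report.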

3. Scripts

The officially provided run script:

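# tools/profiler/run_profiler.py
import argparse

import torch
import torch_npu  # noqa: F401  (importing it makes the .npu() methods available)
import yaml
from ascend import profiler

def load_config(config_path):
    with open(config_path, 'r') as f:
        return yaml.safe_load(f)

def make_input(batch_size):
    # Random ImageNet-shaped batch, moved to the NPU
    return torch.randn(batch_size, 3, 224, 224).npu()

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', type=str, default='config.yaml')
    args = parser.parse_args()

    # Load the config
    config = load_config(args.config)
    prof_cfg = config['profiling']

    # Model: the config holds the name ("resnet50"); the file on disk is <name>.pth.
    # weights_only=False loads a fully pickled model, so only use trusted files.
    model = torch.load(prof_cfg['model'] + ".pth", weights_only=False)
    model = model.npu().eval()

    with torch.no_grad():
        # Warm-up runs execute before profiling starts, so cold-start cost
        # is paid here and stays out of the measurement
        for _ in range(prof_cfg['warmup_iterations']):
            model(make_input(prof_cfg['batch_size']))

        # Profiling
        with profiler.Profile() as prof:
            for _ in range(prof_cfg['num_iterations']):
                model(make_input(prof_cfg['batch_size']))

    # Results
    prof.export_chrome_trace(config['output']['path'] + ".json")

    if config['output']['format'] == 'html':
        prof.visualize_html(config['output']['path'] + ".html")

    print(prof.key_averages().table(sort_by="cuda_time_total"))

if __name__ == "__main__":
    main()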
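
Warm-up iterations only help if they actually execute before measurement begins; skipping them with `continue` inside the measured loop runs nothing at all. The sketch below shows the warm-up-then-measure pattern with a stand-in workload, so it runs without an NPU. `measure` and `fake_step` are hypothetical names, and the simulated cold start is an assumption made purely for illustration.

```python
import time


def measure(step, num_iterations, warmup_iterations):
    """Run `step` untimed for the warm-up, then time `num_iterations` runs."""
    for _ in range(warmup_iterations):
        step()  # executed, but not timed

    durations = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        step()
        durations.append(time.perf_counter() - start)
    return durations


calls = 0


def fake_step():
    # Stand-in for model(input); only the very first call is artificially slow
    global calls
    calls += 1
    if calls == 1:
        time.sleep(0.05)  # simulated cold-start cost


durations = measure(fake_step, num_iterations=20, warmup_iterations=3)
print(f"{len(durations)} timed runs, {calls} total calls")
print(f"slowest timed run: {max(durations) * 1e3:.3f} ms")
```

The slow first call lands in the warm-up phase, so it never shows up in `durations`.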

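Why use asc-tools?

Three reasons:

1. It saves time

Rolling your own vs. using the tool:

Approach	Time	Features
Hand-written	2 weeks	Timing only
asc-tools	10 minutes	Performance analysis + visualization + hardware counters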

2. Powerful features

The official tools are feature-complete:

# Run the profiling tool
$ cd tools/profiler
$ python run_profiler.py --config config.yaml

# Output:
# ========================================
# Profiling Result (ResNet-50, batch=32)
# ========================================
# Total time: 45ms
# - conv1: 8ms (17.8%)
# - conv2: 6ms (13.3%)
# - conv3: 5ms (11.1%)
# - fc: 3ms (6.7%)
# - others: 23ms (51.1%)
# 
# NPU Utilization: 85.2%
# Memory bandwidth: 450 GB/s
# 
# Visualization: profiling_result.html
# ========================================

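# Open the visualization (compute graph, memory curve, NPU utilization)
$ open profiling_result.html

3. Learning resources

The tools ship with detailed comments and documentation:

# Read the configuration notes for the profiling tool
$ grep -A 5 "# Explanation" tools/profiler/config.yaml

# Output: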
# # Explanation:
# # - batch_size=32: Optimal for throughput (tested 8-128)
# # - num_iterations=100: Enough for stable result
# # - warmup_iterations=10: Avoid cold start
# # - format=html: Interactive visualization
# # - plot_compute_graph=true: Show compute graph

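Annoying, isn't it? The official docs are already written for you.

How do you use it? Code examples

Example 1: Profiling operator performance

# 1. Clone the repository
$ git clone https://atomgit.com/cann/asc-tools.git
$ cd asc-tools/tools/profiler

# 2. Prepare the model
# (assumes you already have a resnet50 model file)
$ ln -s /path/to/resnet50.pth ./resnet50.pth

# 3. Edit the config (optional)
$ vi config.yaml
# Change batch_size: 32 → batch_size: 64

# 4. Run profiling
$ python run_profiler.py --config config.yaml

# Output: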
# ========================================
# Profiling Result (ResNet-50, batch=64)
# ========================================
# Total time: 85ms
# - conv1: 15ms (17.6%)
# - conv2: 12ms (14.1%)
# - conv3: 10ms (11.8%)
# - fc: 6ms (7.1%)
# - others: 42ms (49.4%)
# 
# NPU Utilization: 87.3%
# Memory bandwidth: 470 GB/s
# 
# Visualization: profiling_result.html
# ========================================

# 5. Open the visualization
$ open profiling_result.html

Example 2: Debugging a memory leak

# 1. Go to the debugger directory
$ cd ../debugger

# 2. Edit the config
$ vi config.yaml
# Set:
#   tool: "memory_debugger"
#   model: "resnet50"
#   batch_size: 32

# 3. Run the debugger
$ python memory_debugger.py --config config.yaml

# Output:
# ========================================
# Memory Debugger Result (ResNet-50, batch=32)
# ========================================
# Memory leak detected!
# - File: ops_nn/conv2d.py
# - Line: 123
# - Object: torch.Tensor (size: 1024x1024)
# - Leak count: 10 iterations
# 
# Suggestion: Add `del output` after each iteration.
# ========================================

# 4. Fix the code
$ vi ops_nn/conv2d.py
# After line 123, add:
#   del output

# 5. Re-run the debugger
$ python memory_debugger.py --config config.yaml

# Output:
# ========================================
# No memory leak detected! ✅
# ========================================
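
The kind of leak reported above, an output object that stays referenced across iterations, can be reproduced on the host with nothing but the standard library. This is a minimal sketch using Python's `tracemalloc`; it says nothing about how `memory_debugger.py` works internally, and `memory_growth`, `leaky_step`, and `fixed_step` are hypothetical names.

```python
import tracemalloc


def memory_growth(step, iterations=10):
    """Bytes still allocated after `iterations` calls, relative to a baseline."""
    tracemalloc.start()
    try:
        step()  # first call is excluded so one-off allocations do not count
        baseline, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            step()
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - baseline


history = []


def leaky_step():
    output = bytearray(1_000_000)  # stand-in for a 1 MB output tensor
    history.append(output)         # the reference outlives the iteration


def fixed_step():
    output = bytearray(1_000_000)
    del output                     # drop the reference so the buffer is freed


leak = memory_growth(leaky_step)
clean = memory_growth(fixed_step)
print(f"leaky: {leak / 1e6:.1f} MB retained, fixed: {clean / 1e6:.1f} MB retained")
```

Growth that scales with the iteration count is the signature to look for; a flat line after the first iteration means references are being released.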

Example 3: Benchmarking operator performance

# 1. Go to the benchmarker directory
$ cd ../benchmarker

# 2. Edit the config
$ vi config.yaml
# Set:
#   model: "resnet50"
#   batch_size: [32, 64, 128]
#   num_iterations: 100

# 3. Run the benchmarker
$ python run_benchmark.py --config config.yaml

# Output:
# ========================================
# Benchmark Result (ResNet-50)
# ========================================
# Batch Size | Time (ms) | Throughput (img/s) | NPU Util (%)
# ----------|------------|---------------------|----------------
# 32        | 45         | 711                 | 85.2
# 64        | 85         | 753                 | 87.3
# 128       | 165        | 776                 | 88.1
# 
# Optimal batch size: 128 (throughput: 776 img/s)
# ========================================

# 4. Compare different models
$ python compare.py --models resnet50,bert,gpt --batch-size 128

# Output:
# ========================================
# Model Comparison (batch=128)
# ========================================
# Model | Time (ms) | Throughput | NPU Util (%)
# ------|------------|------------|----------------
# resnet50 | 165        | 776 img/s | 88.1
# bert     | 320        | 400 seq/s | 82.3
# gpt      | 450        | 284 tok/s | 79.5
# ========================================
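
The throughput column in the batch-size table follows from the other two: images per second is the batch size divided by the time per batch in seconds. The sketch below recomputes it from the table's numbers and picks the best batch size. The `throughput` helper is hypothetical, and the result matches the table up to rounding.

```python
# (batch size, time per batch in ms), copied from the batch-size table above
measurements = [(32, 45), (64, 85), (128, 165)]


def throughput(batch_size, time_ms):
    """Images per second for one batch of `batch_size` taking `time_ms` ms."""
    return batch_size / (time_ms / 1000)


results = {batch: throughput(batch, ms) for batch, ms in measurements}
best_batch = max(results, key=results.get)

for batch, imgs_per_s in results.items():
    print(f"batch={batch}: {imgs_per_s:.1f} img/s")
print(f"Optimal batch size: {best_batch}")
```

Going from batch 32 to 128 makes each batch take almost four times as long while throughput rises by under 10%, so the "optimal" size is only optimal if latency does not matter.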

Example 4: Visualizing the compute graph

# 1. Go to the visualizer directory
$ cd ../visualizer

# 2. Edit the config
$ vi config.yaml
# Set:
#   model: "resnet50"
#   plot_compute_graph: true
#   plot_memory: true
#   plot_npu_utilization: true

# 3. Run the visualizer
$ python plot_compute_graph.py --config config.yaml

# Output:
# ========================================
# Visualization Result (ResNet-50)
# ========================================
# Compute graph: compute_graph.html
# Memory curve: memory_curve.html
# NPU utilization: npu_utilization.html
# ========================================

# 4. Open the visualizations
$ open compute_graph.html
$ open memory_curve.html
$ open npu_utilization.html
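
The article does not show what `plot_compute_graph.py` does internally. As a rough idea of what compute-graph visualization involves, the sketch below turns a list of producer/consumer edges into Graphviz DOT text, which Graphviz can render (for example with `dot -Tsvg`). The `to_dot` helper and the layer chain are hypothetical and heavily simplified; this is not the real ResNet-50 graph.

```python
def to_dot(edges, name="compute_graph"):
    """Render a list of (producer, consumer) edges as Graphviz DOT text."""
    nodes = []
    for src, dst in edges:
        for node in (src, dst):
            if node not in nodes:
                nodes.append(node)  # keep first-seen order

    lines = [f"digraph {name} {{", "    rankdir=LR;"]
    lines += [f'    "{node}";' for node in nodes]
    lines += [f'    "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical, simplified layer chain used only for illustration
edges = [
    ("input", "conv1"),
    ("conv1", "conv2"),
    ("conv2", "conv3"),
    ("conv3", "fc"),
    ("fc", "output"),
]

dot_text = to_dot(edges)
print(dot_text)
```

Emitting a plain-text graph description keeps the graph extraction separate from the rendering, so the same edges could feed an HTML view or a static image.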

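Efficiency data

The efficiency gain from using asc-tools:

Approach	Time to get profiling working	Feature richness	Visualization
Hand-written	2 weeks	⭐⭐	❌
asc-tools	10 minutes	⭐⭐⭐⭐⭐	✅

Improvement: ~100x at the very least; counting two 40-hour working weeks (4,800 minutes) against 10 minutes gives closer to 480x.

Annoying, isn't it? It used to take 2 weeks, and now it takes 10 minutes.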

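How it relates to other repositories

Within the CANN architecture, asc-tools sits in layer 5 (the Ascend computing foundation layer) as the development toolset.

Relationships:

asc-tools (development toolset)
    ↓ debugs and profiles
ops-nn / ops-transformer / ... (operator libraries)
    ↓ call into
Ascend NPU

To unpack that:

asc-tools: the development toolset (profiling/debugging/benchmarking/visualization)
ops-nn / …: the operator libraries being debugged
Ascend NPU: the hardware

Put simply: asc-tools is the developer's "Swiss Army knife". When you want to tune performance or hunt down bugs, reach for it.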

What's inside asc-tools

1. Tools

# Supported tools
tools/profiler/       # Performance analysis
tools/debugger/       # Debugging
tools/benchmarker/    # Benchmarking
tools/visualizer/     # Visualization

2. Config

# config.yaml
profiling:
  model: "..."
  batch_size: ...

output:
  format: "..."

visualization:
  plot_compute_graph: ...

3. Scripts

# run_profiler.py
def main():
    # Load the config
    # Run profiling
    # Print the results
    # Visualize
    pass  # outline only

4. Examples

# examples/
profiler_example/
debugger_example/
benchmarker_example/
visualizer_example/

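When to use it

Reach for asc-tools when you need:

Performance tuning: profiling operators
Debugging: tracking down memory leaks
Benchmarking: measuring operator performance
Visualization: rendering compute graphs

When to skip it:

You only run models and never tune them: no need to look at it
Production environments: use production monitoring tools instead

Summary

asc-tools is Ascend's "toolbox":

Profiler: performance analysis
Debugger: debugging
Benchmarker: benchmarking
Visualizer: visualization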
