CANN/ge ATC Shape Configuration Guide

Published: 2026/7/4 7:00:45

ATC Model Conversion Practice Guide: Static Shape, Dynamic Multi-Gear, and Dynamic Shape

GE (Graph Engine) is a graph compiler and executor for Ascend. It accelerates model execution and reduces model memory usage through techniques such as computation graph optimization, multi-stream parallelism, memory reuse, and model sinking. GE offers easy integration with the PyTorch and TensorFlow frontends and supports parsing and compiling mainstream model formats such as ONNX and PB. Project address: https://gitcode.com/cann/ge

Note: Parts of this article were generated with AI assistance (AIGC) and are provided for reference only.

1 Introduction

This document is intended for application developers and focuses on two core questions:

- Will the input size change?
- Can the changes be enumerated in advance?

Based on these two dimensions, it provides practical solutions for using ATC to convert models in Ascend inference scenarios. This document does not distinguish between frontend frameworks and applies to all model formats supported by ATC (such as ONNX, TensorFlow PB, and Caffe).

In Ascend inference scenarios, the choice of shape strategy directly affects the compiler optimization level, the runtime scheduling method, and the final performance stability. Choosing properly between static shape, dynamic multi-gear, and dynamic shape, in line with ATC's capabilities, is key to achieving stable throughput and low latency.

This document assumes that readers already understand the complete process of converting a model with ATC and of loading and running it through the aclmdl interfaces.

2 Overall Flow of Model Conversion and Execution

Before diving into specific strategies, let's align on the basic concepts of the ATC and model execution phases from an overall flow perspective.

Users convert models into .om (Offline Model) files with the ATC command, then load and execute these models through the aclmdl series of interfaces. From the GE (Graph Engine) perspective, these two phases are called compile and execute respectively.

Compile Phase

GE reads the model file specified to ATC (such as ONNX or PB), analyzes and optimizes the computation graph, and generates a binary model file (.om) that can be executed on the NPU.

Execute Phase

GE loads the .om file through the aclmdl interfaces, deploys it to the NPU device, and executes subsequent inference tasks.

Note that GE clearly separates compile-time and runtime responsibilities:

- The compile phase takes longer but usually needs to run only once to generate the .om file.
- The execute phase no longer performs structural graph optimization, so inference overhead is small, and the .om file can be executed repeatedly after loading.

This separation is why shape information matters so much at compile time.

3 Static Shape, Dynamic Shape, and Performance Characteristics

Static Shape

Static shape means that across multiple executions of the model, the dimensions of all tensors (input, output, and intermediate tensors) are completely fixed, and no dimension is allowed to change.

In this mode, the compile phase can perform the most comprehensive optimizations and enable sink scheduling during execution. The specific mechanism of sink scheduling is described in the official documentation: https://www.hiascend.com/developer/techArticles/20240715-1

In engineering practice, static shape usually achieves the best inference performance and stability.

Dynamic Shape

Dynamic shape means that across multiple executions of the model, the dimensions of input or intermediate tensors may change.

Its advantage is flexibility, but the cost is also obvious:

- Significantly fewer optimizations are available at compile time.
- Sink scheduling cannot be enabled.
- Inference performance and latency stability are usually worse.

Therefore, in performance-sensitive inference scenarios, completely dynamic shape should be avoided.

Dynamic Multi-Gear (Recommended Balanced Solution)

Given the significant performance advantage of static shape, ATC provides a dynamic multi-gear capability for scenarios where shape changes are limited and enumerable.

The essence of dynamic multi-gear is that multiple fixed static-shape gears are specified at once during model conversion. At runtime, the gear matching the actual input is selected and executed, while each gear is treated as a static shape during the compile phase.

For example, if only the batch dimension of the model is variable and may take the following values:

- [1, 3, 224, 224]
- [8, 3, 224, 224]
- [16, 3, 224, 224]

then these three batch sizes can be passed to ATC together as three gears.

After enabling dynamic multi-gear:

- The model still appears dynamic at the execution level.
- The compiler can perform static-shape optimization for each gear.
- Inference performance usually matches that of a single static shape.

Note that while dynamic multi-gear brings performance benefits, it also introduces additional costs:

- Model memory usage is based on the largest gear. Even when executing the smallest gear, the overall model memory usage is equivalent to that of the largest gear. For example, if the largest batch gear is 1024, memory usage is still calculated for batch 1024 even when executing batch 1.
- Compile time increases linearly with the number of gears. Generally, the compile time for N gears is approximately N times that of a single static shape.

4 Overview of Shape-Related Parameter Configuration in ATC

This chapter explains, from the perspective of ATC parameter configuration, how the three strategies (static shape, completely dynamic shape, and dynamic multi-gear) are expressed in ATC.

Parameter Configuration for Static Shape

Under the static shape strategy, all input tensor dimensions of the model must be completely determined during the compile phase. When converting with ATC, users need to explicitly specify a fixed shape for each input.

For example:
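As a minimal sketch of how the three shape strategies might be expressed, the Python snippet below assembles ATC argument lists and prints them. The flag names (`--model`, `--framework`, `--output`, `--soc_version`, `--input_shape`, `--dynamic_batch_size`) are assumed from the ATC command-line tool and should be verified against the documentation of the installed CANN version. The model path, the input name `input`, the output names, and the SoC version string are hypothetical placeholders, and the helper `build_atc_args` is introduced here only for illustration. The snippet does not invoke ATC.

```python
from typing import List, Optional, Sequence


def build_atc_args(
    model: str,
    output: str,
    soc_version: str,
    input_name: str,
    dims: Sequence[int],
    batch_gears: Optional[Sequence[int]] = None,
    framework: int = 5,  # assumption: 5 selects ONNX in ATC
) -> List[str]:
    """Assemble an ATC argument list; use -1 in dims for a variable dimension."""
    shape = ",".join(str(d) for d in dims)
    args = [
        "atc",
        f"--model={model}",
        f"--framework={framework}",
        f"--output={output}",
        f"--soc_version={soc_version}",
        f"--input_shape={input_name}:{shape}",
    ]
    if batch_gears:
        # Batch gears only make sense when the batch dimension is left variable.
        if dims[0] != -1:
            raise ValueError("batch gears require the batch dimension to be -1")
        gears = ",".join(str(g) for g in batch_gears)
        args.append(f"--dynamic_batch_size={gears}")
    return args


# Hypothetical placeholders shared by all three variants.
common = dict(model="model.onnx", soc_version="Ascend310P3", input_name="input")

# Static shape: every input dimension is fixed at compile time.
static_args = build_atc_args(output="model_static", dims=[1, 3, 224, 224], **common)

# Dynamic multi-gear: variable batch dimension plus the enumerated gears 1, 8, 16.
gear_args = build_atc_args(
    output="model_gears", dims=[-1, 3, 224, 224], batch_gears=[1, 8, 16], **common
)

# Completely dynamic shape (assumed form): variable batch dimension, no gears.
dynamic_args = build_atc_args(output="model_dynamic", dims=[-1, 3, 224, 224], **common)

for name, args in [("static", static_args), ("multi-gear", gear_args), ("dynamic", dynamic_args)]:
    print(f"{name}: {' '.join(args)}")
```

Building the arguments as a list keeps each flag inspectable before anything is executed. The three variants differ only in `--input_shape` and in whether a gear list is attached, which mirrors the distinction between the strategies described in this guide.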
