CANN/ge ATC Shape Configuration Guide

ATC Model Conversion Practice Guide: Static Shape, Dynamic Multi-Gear, and Dynamic Shape

Project: ge. GE (Graph Engine) is a graph compiler and executor for Ascend. It provides computation graph optimization, multi-stream parallelism, memory reuse, and model sinking to speed up model execution and reduce model memory usage. GE integrates with the PyTorch and TensorFlow frontends and supports parsing and compiling mainstream model formats such as ONNX and PB. Project address: https://gitcode.com/cann/ge

1 Introduction

This document is intended for application developers and focuses on two core questions:

- Will the input size change?
- Can the changes be enumerated in advance?

Based on these two dimensions, it provides practical solutions for using ATC to convert models in Ascend inference scenarios. This document does not distinguish between frontend frameworks and applies to all model formats supported by ATC (such as ONNX, TensorFlow PB, and Caffe).

In Ascend inference scenarios, the choice of shape strategy directly affects the compiler optimization level, the runtime scheduling method, and the stability of final performance. Choosing properly among static shape, dynamic multi-gear, and dynamic shape, in line with ATC's capabilities, is key to achieving stable throughput and low latency.

This document assumes that readers already understand the complete process of converting a model with ATC and of loading the model and running inference through the aclmdl interfaces.

2 Overall Flow of Model Conversion and Execution

Before diving into specific strategies, let's align on the basic concepts of the ATC and model execution phases from an overall flow perspective.

Users convert models into .om (Offline Model) files with the ATC command, then load and execute these models through the aclmdl series of interfaces.
From the GE (Graph Engine) perspective, these two phases are called compile and execute, respectively.

Compile Phase
GE reads the model file passed to ATC (such as ONNX or PB), analyzes and optimizes the computation graph, and generates a binary model file (.om) that can be executed on the NPU.

Execute Phase
GE loads the .om file through the aclmdl interfaces, deploys it to the NPU device, and executes subsequent inference tasks.

Note that GE clearly separates compile-time and runtime responsibilities:

- The compile phase takes longer but usually needs to run only once to generate the .om file.
- The execute phase no longer performs structural graph optimization, so inference overhead is small, and the .om file can be executed repeatedly after loading.

This separation is why shape information matters so much at compile time.

3 Static Shape, Dynamic Shape, and Performance Characteristics

Static Shape

Static shape means that across multiple executions of the model, the dimensions of all tensors (input, output, and intermediate) are completely fixed, and no dimension is allowed to change.

In this mode, the compile phase can perform the most comprehensive optimizations and enable sink scheduling during execution.
The specific mechanism of sink scheduling is described in the official documentation: https://www.hiascend.com/developer/techArticles/20240715-1

In engineering practice, static shape usually achieves the best inference performance and stability.

Dynamic Shape

Dynamic shape means that across multiple executions of the model, the dimensions of input or intermediate tensors may change.

Its advantage is flexibility, but the cost is also obvious:

- Significantly fewer optimizations are available at compile time.
- Sink scheduling cannot be enabled.
- Inference performance and latency stability are usually poor.

Therefore, in performance-sensitive inference scenarios, completely dynamic shape should be avoided.

Dynamic Multi-Gear (Recommended Balanced Solution)

Given the significant performance advantage of static shape, ATC provides a dynamic multi-gear capability to handle scenarios where shape changes are limited and enumerable.

The essence of dynamic multi-gear is this: during the model conversion phase, multiple fixed static shape gears are specified at once. At runtime, the matching gear is selected for execution based on the actual input, but each gear is treated as a static shape during the compile phase.

For example, suppose only the batch dimension of the model is variable and the input may take the following shapes:

- [1, 3, 224, 224]
- [8, 3, 224, 224]
- [16, 3, 224, 224]

Then these three batch sizes can be passed to ATC simultaneously as three gears.

After enabling dynamic multi-gear:

- The model still appears dynamic at the execution level.
- The compiler can perform static shape optimization for each gear.
- Inference performance usually matches that of a single static shape.

Note that while dynamic multi-gear brings performance benefits, it also introduces additional costs:

- Model memory occupation is based on the largest gear. Even when executing the smallest gear, the overall model memory occupation is equivalent to that of the largest gear.
For example, if the largest batch gear is 1024, memory occupation is still calculated for batch 1024 even when executing batch 1.

- Compile time increases linearly with the number of gears. Generally, the compile time for N gears is approximately N times that of a single static shape.

4 Overview of Shape-Related Parameter Configuration in ATC

This chapter explains, from the perspective of ATC parameter configuration, how the three strategies (static shape, completely dynamic shape, and dynamic multi-gear) are expressed in ATC.

Parameter Configuration for Static Shape

Under the static shape strategy, all input tensor dimensions must be completely determined during the compile phase. When converting with ATC, users need to explicitly specify a fixed shape for each input.

For example:
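The original example for the static shape configuration did not survive in this copy, so the following is only a minimal sketch of what such a conversion might look like. It assembles an ATC command line in Python without running it. The options --model, --framework, --output, --soc_version, and --input_shape are used as I understand the ATC command-line tool, with --framework=5 meaning ONNX; check them against the ATC reference for your installed CANN version. The helper name static_shape_atc_args, the file names, the input name "input", and the SoC version "Ascend310P3" are hypothetical placeholders.

```python
import shlex


def static_shape_atc_args(model, output, soc_version, input_shapes, framework=5):
    """Build an argv list for a static-shape ATC conversion (nothing is executed).

    input_shapes maps each model input name to its fixed dimensions.
    framework=5 is assumed to select ONNX.
    """
    # Static shape: every dimension of every input must be a fixed positive integer.
    for name, dims in input_shapes.items():
        if not dims or any(not isinstance(d, int) or d <= 0 for d in dims):
            raise ValueError(f"input {name!r} must have fixed positive dimensions: {dims}")

    # Assumed --input_shape layout: "name1:d0,d1,...;name2:d0,d1,..."
    shape_arg = ";".join(
        f"{name}:{','.join(str(d) for d in dims)}"
        for name, dims in input_shapes.items()
    )
    return [
        "atc",
        f"--model={model}",
        f"--framework={framework}",
        f"--output={output}",
        f"--soc_version={soc_version}",
        f"--input_shape={shape_arg}",
    ]


if __name__ == "__main__":
    # Hypothetical model with a single input fixed to batch size 1.
    args = static_shape_atc_args(
        model="resnet50.onnx",
        output="resnet50_bs1",
        soc_version="Ascend310P3",
        input_shapes={"input": [1, 3, 224, 224]},
    )
    # Print a shell-ready command line; pass args to subprocess.run to execute it.
    print(shlex.join(args))
```

Rejecting any non-positive dimension up front mirrors the requirement above that a static shape conversion fixes every input dimension at compile time. A model with several inputs would list each of them in input_shapes.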