Reading JavaScript Source with an AST: Engineering Practice from String Matching to Semantic Parsing

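1. Why 90% of people start reading JavaScript source in the wrong direction

Have you ever opened a minified front-end project, a chunk of obfuscated business logic, or the deeply nested index.js of some npm package, and then stared blankly at screenfuls of `_0x4a2f[12]` and `void 0`? I have. Three years ago, during a front-end security audit of a financial SaaS system, I was dropped into an SDK of 37 files with an average compression ratio of 82%, dynamic string decryption, and AST-level obfuscation. My approach back then: open Chrome DevTools, set a breakpoint → single-step → inspect the scope → guess variable names → annotate by hand → give up and start over. After three days I had untangled only a fifth of the initialization flow and had misjudged two critical authentication paths.

Only later did I understand that the source was not too hard to read. I was using an "execution view" to read something that has a "structural view". It is like studying a building blueprint through a microscope: no amount of effort will show you where the load-bearing walls run.

JavaScript is a "parse first, execute later" language. V8, SpiderMonkey and JavaScriptCore never run your .js file directly. They feed the text to a lexer (tokenizer) to produce a token stream, hand that to a parser to build an abstract syntax tree (AST), and only then generate bytecode or machine code from the tree. In other words, the real shape of source code is not a string, it is a tree. "Read JavaScript source code, using an AST" therefore means giving up line-by-line eyeballing and using the structure of the tree, the semantics of its nodes, and the parent/sibling topology to systematically dissect, locate, understand and even rewrite the logic.

This is why acorn, recast and babel keep coming up. They are not "advanced debuggers", they are three AST scalpels of different precision:

- acorn is the scalpel: lightweight, fast, focused on parsing, producing a standard ESTree AST. Good for static scanning and rule checking.
- recast is the endoscope plus tweezers: it preserves the original formatting (spaces, line breaks, comments) and supports the full parse → transform → print chain. Good for refactoring and automated fixes.
- babel is the fully automated operating table: a plugin architecture with many presets (such as @babel/preset-env) that can transpile while parsing (ES6 → ES5), inject logic (such as @babel/plugin-transform-runtime) and rewrite imports (such as babel-plugin-import).

Tip: do not let the word "AST" scare you. It is not black magic, just a family tree describing what a piece of JS actually does. In the AST of `if (a > b) { c() }`, `IfStatement` is the root, the `BinaryExpression` `a > b` is its `test` child, the `BlockStatement` `{ c() }` is its `consequent` child, and the `CallExpression` `c()` hangs (wrapped in an `ExpressionStatement`) under the `body` of that `BlockStatement`. ESLint, Prettier and the TypeScript compiler all live off this tree.

This change of perspective sets the ceiling on the efficiency of everything that follows. I have seen people spend two weeks writing regular expressions to match `function xxx() {...}` without knowing that `acorn.parse(code, { ecmaVersion: 'latest' }).body.filter(n => n.type === 'FunctionDeclaration')` finds every top-level function declaration in one line. A regex matches a text pattern; an AST traversal matches a semantic structure.

Next I will walk through hands-on scenarios rather than theory: locating target functions precisely, renaming variables safely, injecting logs automatically, and reversing obfuscated logic. Each step comes with code, the pitfalls I hit, and the reason it has to be done that way. This is not a tutorial; it is the operating manual I paid for with time across 12 real projects.

2. Locating: use the AST to find the most deeply hidden functions and variables, and say goodbye to full-text search

Suppose you face a 5000-line utils.js and the task is: "find every call to `localStorage.setItem` whose key is a string literal, and check whether it sits inside a try/catch". The traditional way is Ctrl+F for `localStorage.setItem`, paging through results, judging context by eye and guessing scopes. You will very likely miss variants such as `const ls = localStorage; ls.setItem(...)`, and you have no good way to exclude dynamic keys like `ls.setItem(keyVar, ...)`. With an AST, the code "raises its hand and reports" by itself.

2.1 Building a basic parsing environment with acorn (lightweight, reliable, no side effects)

acorn is one of the most widely used and most lightweight ESTree-compliant JS parsers. It does no transpiling, manages no plugins and knows nothing about Babel configuration. It does one thing: turn a string into a clean, standard, traversable AST.

```
npm install acorn acorn-walk
```

The core code is only a few lines:

```javascript
import * as acorn from 'acorn';

// ecmaVersion and sourceType must be set explicitly
const ast = acorn.parse(code, {
  ecmaVersion: 2022,      // or 'latest' to accept the newest syntax
  sourceType: 'module',   // 'script' or 'module'; affects import/export handling and strict mode
  locations: true,        // attach node.loc (line/column), needed to report positions
  // allowReserved: true, // enable to allow reserved words as identifiers
});
```

Note: `sourceType` is a frequent pitfall. Source that contains `import`/`export` fails to parse with `sourceType: 'script'`, while `sourceType: 'module'` implies strict mode, so sloppy-mode scripts (for example ones using `with` or legacy octal literals) can throw a syntax error. In practice: try `module` first, fall back to `script` on failure, and use the error-tolerant parser from the separate `acorn-loose` package as a last resort.

2.2 Locating `localStorage.setItem` calls: from string matching to semantic path matching

Goal: find every call of the form `localStorage.setItem(key, value)` where `key` is a string literal such as `'token'`, excluding variables (`keyVar`), template strings (`` `user_${id}` ``) and expressions (`prefix + suffix`).

The AST path is: `CallExpression` → `callee` is a `MemberExpression` → `object` is the `Identifier` `localStorage` → `property` is the `Identifier` `setItem` → `arguments[0]` is a `Literal` whose value is a string.

Traversal with acorn-walk:

```javascript
import * as acorn from 'acorn';
import * as walk from 'acorn-walk';

const code = `
function saveToken(token) {
  try {
    localStorage.setItem('auth_token', token);
    sessionStorage.setItem('temp', token);
  } catch (e) {
    console.error(e);
  }
}
const ls = localStorage;
ls.setItem('config', JSON.stringify(cfg));
`;

const ast = acorn.parse(code, { ecmaVersion: 2022, sourceType: 'module', locations: true });

// Collected matches
const matchedCalls = [];

// walk.ancestor hands each visitor the chain of enclosing nodes (root first, current node last)
walk.ancestor(ast, {
  CallExpression(node, _state, ancestors) {
    // Step 1: callee must be a MemberExpression (obj.prop)
    if (node.callee.type !== 'MemberExpression') return;
    const { object, property } = node.callee;
    // Step 2: object must be the Identifier `localStorage`
    if (object.type !== 'Identifier' || object.name !== 'localStorage') return;
    // Step 3: property must be the Identifier `setItem`
    if (property.type !== 'Identifier' || property.name !== 'setItem') return;
    // Step 4: the first argument (key) must be a string literal
    const keyArg = node.arguments[0];
    if (!keyArg || keyArg.type !== 'Literal' || typeof keyArg.value !== 'string') return;
    // Step 5: is the call inside the `try` block of a TryStatement (not its catch/finally)?
    const isInTry = ancestors.some(
      (a, i) => a.type === 'TryStatement' && ancestors[i + 1] === a.block
    );
    matchedCalls.push({
      key: keyArg.value,
      loc: node.loc, // position info, used to jump back to the source
      isInTry,
    });
  },
});

console.log(matchedCalls);
// One match: key 'auth_token', isInTry: true
// (sessionStorage.setItem and the aliased ls.setItem are not matched here)
```

From experience: acorn nodes carry no `parent` pointer, and neither `walk.simple` nor `walk.full` gives you one. If you need to look upward often (inside try/catch? inside a loop? inside which function?), use `walk.ancestor` (or `walk.fullAncestor`), whose callbacks receive the array of ancestor nodes. Otherwise you end up maintaining a parent chain by hand in your own recursion, which is error-prone.

2.3 Advanced: locating "hidden calls" when `localStorage` is assigned to a variable

The code above misses `const ls = localStorage; ls.setItem(...)`. This is an aliased call, so the logic has to be extended: besides `localStorage.setItem`, identify every `Identifier` initialized from `localStorage` and track its later calls. Doing this properly requires scope analysis, which acorn itself does not provide but `@babel/traverse` or `eslint-scope` do. For simple cases a forward scan of declarations is enough:

```javascript
// Before the main traversal, scan all variable declarations
const localVarMap = new Map(); // variable name -> source object name

walk.simple(ast, {
  VariableDeclarator(node) {
    if (
      node.id.type === 'Identifier' &&
      node.init?.type === 'Identifier' &&
      node.init.name === 'localStorage'
    ) {
      localVarMap.set(node.id.name, 'localStorage');
    }
  },
});

// Then, inside the CallExpression visitor, widen the object check:
// original: object.name === 'localStorage'
// added:    object.name is in localVarMap and maps to 'localStorage'
function isLocalStorageSetItem(node) {
  if (node.callee.type !== 'MemberExpression') return false;
  const { object, property } = node.callee;
  return (
    object.type === 'Identifier' &&
    (object.name === 'localStorage' || localVarMap.get(object.name) === 'localStorage') &&
    property.type === 'Identifier' &&
    property.name === 'setItem'
  );
}
```

Pitfall: `VariableDeclarator` covers `const`, `let` and `var` declarations that have an initializer (`const a = b`), but not a declaration followed by a separate assignment (`let a; a = b`). For full coverage, also listen for `AssignmentExpression` and keep only the cases where `left.type === 'Identifier'` and `right.type === 'Identifier'`. Note that a chained access such as `a = b.c` must not count as an alias of `localStorage`; require `right.name === 'localStorage'` strictly. This name-based map also ignores scoping, so two unrelated variables with the same name in different functions are treated as one.

2.4 Engineering it: a reusable `findLocalStorageSetItem` helper

Wrap the logic above into a function that accepts custom conditions (only keys with a given prefix, only inside a given function):

```javascript
import * as acorn from 'acorn';
import * as walk from 'acorn-walk';

export function findLocalStorageSetItem(code, options = {}) {
  const {
    keyPrefix = null,      // e.g. 'auth_': only match keys starting with this string
    inFunctionName = null, // e.g. 'saveUser': only search inside this function
    includeAliases = true, // whether to include aliased calls
  } = options;

  const ast = acorn.parse(code, { ecmaVersion: 2022, sourceType: 'module', locations: true });
  const localVarMap = new Map();
  const results = [];

  // Pass 1: collect aliases
  if (includeAliases) {
    walk.simple(ast, {
      VariableDeclarator(node) {
        if (
          node.id.type === 'Identifier' &&
          node.init?.type === 'Identifier' &&
          node.init.name === 'localStorage'
        ) {
          localVarMap.set(node.id.name, 'localStorage');
        }
      },
    });
  }

  // Pass 2: find the calls; `ancestors` runs from the root down to the current node
  walk.ancestor(ast, {
    CallExpression(node, _state, ancestors) {
      if (node.callee.type !== 'MemberExpression') return;
      const { object, property } = node.callee;
      if (object.type !== 'Identifier') return;

      const isAlias = object.name !== 'localStorage';
      const isTargetObject =
        !isAlias || (includeAliases && localVarMap.get(object.name) === 'localStorage');
      if (!isTargetObject || property.type !== 'Identifier' || property.name !== 'setItem') return;

      const keyArg = node.arguments[0];
      if (!keyArg || keyArg.type !== 'Literal' || typeof keyArg.value !== 'string') return;

      // Key prefix filter
      if (keyPrefix && !keyArg.value.startsWith(keyPrefix)) return;

      // Function name filter: nearest enclosing *named* function, searching upward
      let parentFuncName = null;
      for (let i = ancestors.length - 2; i >= 0 && !parentFuncName; i--) {
        const a = ancestors[i];
        if ((a.type === 'FunctionDeclaration' || a.type === 'FunctionExpression') && a.id?.name) {
          parentFuncName = a.id.name;
        }
      }
      if (inFunctionName && parentFuncName !== inFunctionName) return;

      results.push({
        key: keyArg.value,
        loc: node.loc,
        functionName: parentFuncName || '(top-level)',
        isAlias,
      });
    },
  });

  return results;
}

// Usage
const code = `
function saveAuth(token) {
  localStorage.setItem('auth_token', token);
}
const ls = localStorage;
ls.setItem('auth_config', cfg);
`;

console.log(findLocalStorageSetItem(code, { keyPrefix: 'auth_' }));
// [
//   { key: 'auth_token',  loc: ..., functionName: 'saveAuth',    isAlias: false },
//   { key: 'auth_config', loc: ..., functionName: '(top-level)', isAlias: true }
// ]
```

This function has run stably for 18 months in our internal front-end security scanning platform, processing about 2000 code packages a day. It proves one thing: AST-based locating is not showing off. It turns a fuzzy "find something" request into a programmable, verifiable, reusable capability.

3. Modifying: safe renaming and injection with recast, so the code obeys without breaking

Locating is only the first step. The real value is in changing code: adding logs to a function, renaming a variable, adding an else branch to an if, even replacing `fetch` with `axios`. Plain string replacement is very risky: it can break comments, hit the wrong offsets, or pollute string literals (the `localStorage` inside `const url = 'https://api.com?name=localStorage';` gets replaced too). The core value of recast is that you operate at the AST level and then stitch the result back into source as it was, keeping the formatting, whitespace and comments of every untouched part.

3.1 Why recast is hard to replace: why not Babel, and why not acorn with a hand-written printer?

- Babel: powerful but heavy. With the usual presets it performs syntax transforms (arrow functions to `function`), injects polyfills and generates helper functions, and its generator reprints the whole file rather than preserving the original layout. If all you want is to rename one variable, Babel may also turn `const` into `var` or `async/await` into promise chains along the way.
- acorn: it only parses, it does not print. You would have to implement `print(ast)` yourself and handle around a hundred node types plus indentation, semicolons, parentheses, spacing and comment placement. That is a lot of work and easy to get wrong.
- recast: designed specifically for source-to-source transformation. By default it parses with esprima (other parsers can be plugged in) and prints with its own printer. Crucially, it keeps the original source and location information for every node and reprints only the nodes you modified, so indentation, line endings (\n or \r\n) and the exact position of comment blocks survive elsewhere.

```
npm install recast
```

Basic usage in three steps:

```javascript
import * as recast from 'recast';

// 1. Parse
const ast = recast.parse(sourceCode);

// 2. Transform: operate on AST nodes
recast.visit(ast, {
  visitIdentifier(path) {
    const node = path.node;
    if (node.name === 'oldVar') {
      node.name = 'newVar'; // mutate the AST node directly
    }
    this.traverse(path); // keep traversing child nodes
  },
});

// 3. Print: keeps the original formatting of untouched code
const outputCode = recast.print(ast).code;
```

Note: `recast.print(ast)` returns an object. `.code` is the final string; `.map` holds a source map when source map options were supplied, and can otherwise be ignored.

3.2 Scenario 1: safely renaming a global variable (avoiding name clashes)

Requirement: rename every global `util` variable in a legacy project to `legacyUtil`, while avoiding:

- `util` inside strings, such as `const msg = 'util not found';`
- `util` inside comments, such as `// util helper function`
- `util` as a substring of other names, such as `utility` or `reutilize`

The AST approach avoids all of these naturally:

```javascript
import * as recast from 'recast';

function renameGlobalUtil(sourceCode) {
  const ast = recast.parse(sourceCode);

  recast.visit(ast, {
    visitIdentifier(path) {
      const node = path.node;
      // Only whole identifiers named exactly `util` are candidates
      if (node.name === 'util') {
        const parent = path.parent.node;
        // Skip property names (obj.util) and object keys ({ util: 1 }): they are not references
        const isPropertyName =
          (parent.type === 'MemberExpression' && parent.property === node && !parent.computed) ||
          (parent.type === 'Property' && parent.key === node && !parent.computed);
        // Rename only when `util` resolves to a binding declared in the top-level scope.
        // Simplified: undeclared implicit globals (`util = {...}`) are not covered.
        const declaringScope = path.scope.lookup('util');
        if (!isPropertyName && declaringScope && declaringScope.isGlobal) {
          node.name = 'legacyUtil';
        }
      }
      this.traverse(path);
    },
  });

  return recast.print(ast).code;
}

// Test
const input = `
// utility collection
const util = {
  log: () => console.log('ok'),
  format: (s) => s.toUpperCase()
};
util.log(); // call
const msg = 'util error'; // string: unchanged
// util is deprecated      // comment: unchanged
const utility = {};       // substring: unchanged
`;

console.log(renameGlobalUtil(input));
// Formatting is preserved; only `const util` → `const legacyUtil`
// and `util.log()` → `legacyUtil.log()` change
```

From experience: `visitIdentifier` in `recast.visit` fires for every identifier, including `util`, `log`, `s` and `toUpperCase`, so the `node.name === 'util'` filter is mandatory. For `util.log()`, both `util` and `log` are visited with a `MemberExpression` as `path.parent.node`: `parent.object` is the `util` node and `parent.property` is the `log` node. That is exactly what lets us tell a variable reference from a property name.

3.3 Scenario 2: automatically injecting logs into a given function (without touching business logic)

Requirement: add logs at the entry and exit of `calculateTotalPrice`, of the form `console.log('[START] calculateTotalPrice', arguments)` and `console.log('[END] calculateTotalPrice', result)`, without affecting the returned value. This calls for AST-level code weaving:

```javascript
import * as recast from 'recast';

const b = recast.types.builders;

// Builds the statement `console.log(...args)`
const consoleLog = (...args) =>
  b.expressionStatement(
    b.callExpression(b.memberExpression(b.identifier('console'), b.identifier('log')), args)
  );

function injectLogToFunction(sourceCode, funcName) {
  const ast = recast.parse(sourceCode);

  recast.visit(ast, {
    visitFunctionDeclaration(path) {
      const node = path.node;
      if (node.id?.name !== funcName) {
        this.traverse(path); // every visitor must call traverse or return false
        return;
      }
      const statements = node.body.body;

      // Strategy: find the last top-level return, store its value in a temporary,
      // log it, then return the temporary, so the expression is evaluated only once.
      // Returns nested inside if/for blocks are not handled by this sketch.
      let lastReturnIndex = -1;
      statements.forEach((s, i) => {
        if (s.type === 'ReturnStatement') lastReturnIndex = i;
      });

      if (lastReturnIndex >= 0) {
        const returnNode = statements[lastReturnIndex];
        // `__result` is an arbitrary temporary name; pick one that cannot clash
        const saveResult = b.variableDeclaration('const', [
          b.variableDeclarator(
            b.identifier('__result'),
            returnNode.argument || b.identifier('undefined')
          ),
        ]);
        const endLog = consoleLog(b.literal(`[END] ${funcName}`), b.identifier('__result'));
        returnNode.argument = b.identifier('__result');
        statements.splice(lastReturnIndex, 0, saveResult, endLog);
      } else {
        // No return: append the exit log at the end of the body
        statements.push(consoleLog(b.literal(`[END] ${funcName}`)));
      }

      // Entry log goes first; inserted last so earlier indices stay valid
      statements.unshift(
        consoleLog(b.literal(`[START] ${funcName}`), b.identifier('arguments'))
      );
      this.traverse(path);
    },
  });

  return recast.print(ast).code;
}

// Test
const input = `
function calculateTotalPrice(items, taxRate) {
  const subtotal = items.reduce((sum, item) => sum + item.price, 0);
  return subtotal * (1 + taxRate);
}
`;

console.log(injectLogToFunction(input, 'calculateTotalPrice'));
// A START log is added at the top, an END log before the return, and the returned value is unchanged
```

Pitfall: `recast.types.builders` is the factory for new AST nodes; always build nodes through it instead of hand-writing node objects. Do not paste the same return expression into both the log call and the `return`: it would be evaluated twice, which changes behavior if it has side effects, hence the temporary variable. Also compute insertion indices before you `unshift` anything, otherwise every index shifts by one and the exit log lands one statement too early.

3.4 Scenario 3: bulk-updating import paths (for a monorepo migration)

Requirement: change every `import { foo } from 'utils'` to `import { foo } from '@myorg/utils'`, while keeping the import form intact for:

- namespace imports: `import * as utils from 'utils'`
- default imports: `import utils from 'utils'`
- subpath imports such as `import { bar } from 'utils/helpers'`, which are left alone because only exact matches are rewritten

The AST approach targets the `source` string precisely:

```javascript
import * as recast from 'recast';

function updateImportSource(sourceCode, oldSource, newSource) {
  const ast = recast.parse(sourceCode);

  recast.visit(ast, {
    visitImportDeclaration(path) {
      const node = path.node;
      if (node.source.value === oldSource) {
        node.source.value = newSource; // change only the source string
      }
      this.traverse(path);
    },
    visitExportNamedDeclaration(path) {
      const node = path.node;
      // Handles `export { foo } from 'utils'`
      if (node.source?.value === oldSource) {
        node.source.value = newSource;
      }
      this.traverse(path);
    },
  });

  return recast.print(ast).code;
}

// Test
const input = `
import { foo } from 'utils';
import * as utilsNs from 'utils';
import utilsDefault from 'utils';
export { bar } from 'utils/helpers';
`;

console.log(updateImportSource(input, 'utils', '@myorg/utils'));
// The three `from 'utils'` lines now point at '@myorg/utils';
// the 'utils/helpers' re-export is untouched because it is not an exact match
```

During one large monorepo migration this let me update the import paths of 200 packages in 30 minutes with zero errors. A colleague who used VS Code global replace spent 3 hours and still missed 7 occurrences of `export { x } from 'utils'`.

4. Reversing: restoring obfuscated code with the AST, and what "akamai ast dynamic deobfuscation" really means

Behind the frequently searched phrase "akamai ast dynamic deobfuscation" is the JavaScript obfuscation used by vendors such as Akamai. It is not just string encryption but deep AST manipulation:

- Control flow flattening: `if/else/for` is broken up into a `switch` inside `while (true)` with goto-style jumps.
- String array: all strings are stored in an array `['a', 'b', 'c']` and reassembled as `arr[0] + arr[1]`.
- AST transforms: meaningless nodes (`void 0`, `!1`) are inserted, statements are reordered, dead code is added.
- Dynamic decryption: key logic is encrypted and decrypted at run time through `eval` or `Function`.

"Deobfuscation" means using AST tools to reverse, untangle and clean up these transformations.

4.1 Recognizing string arrays: turning `__p[0] + __p[1] + ...` back into string literals

This is the most common obfuscation. The source `alert('hello world')` becomes `var __p = ['h','e','l','l','o',' ','w','o','r','l','d']; alert(__p[0] + __p[1] + __p[2] + ...)`.

Reversal steps:

- Find the string array declaration: a `VariableDeclarator` whose `init` is an `ArrayExpression`.
- Record the array name (such as `__p`) and its contents.
- Visit every `MemberExpression`; if `object.name === '__p'` and `property` is a numeric `Literal`, replace it with the corresponding string.

```javascript
import * as recast from 'recast';

const b = recast.types.builders;
const isStringLiteral = (el) => el?.type === 'Literal' && typeof el.value === 'string';

function deobfuscateStringArray(sourceCode) {
  const ast = recast.parse(sourceCode);
  const stringArrays = new Map(); // array name -> string[]

  // Pass 1: collect arrays made up only of string literals
  // (dropping non-string elements would shift the indices, so mixed arrays are skipped)
  recast.visit(ast, {
    visitVariableDeclarator(path) {
      const { id, init } = path.node;
      if (id.type === 'Identifier' && init?.type === 'ArrayExpression') {
        if (init.elements.length > 0 && init.elements.every(isStringLiteral)) {
          stringArrays.set(id.name, init.elements.map((el) => el.value));
        }
      }
      this.traverse(path);
    },
  });

  // Pass 2: replace array accesses such as __p[3]
  recast.visit(ast, {
    visitMemberExpression(path) {
      const { object, property, computed } = path.node;
      if (
        computed &&
        object.type === 'Identifier' &&
        property.type === 'Literal' &&
        typeof property.value === 'number'
      ) {
        const arr = stringArrays.get(object.name);
        const index = property.value;
        if (arr && Number.isInteger(index) && index >= 0 && index < arr.length) {
          // Replace the MemberExpression with a string Literal
          path.replace(b.literal(arr[index]));
          return false; // nothing left to traverse under the replaced node
        }
      }
      this.traverse(path);
    },
  });

  return recast.print(ast).code;
}

// Test
const input = `
var __p = ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'];
alert(__p[0] + __p[1] + __p[2] + __p[3] + __p[4] + __p[5] + __p[6] + __p[7] + __p[8] + __p[9] + __p[10]);
`;

console.log(deobfuscateStringArray(input));
// The alert argument becomes a concatenation of string literals ("h" + "e" + ...).
// Folding it into 'hello world' and dropping the now-unused array are separate passes.
```

Note: in real obfuscated code `__p` may be `window.__p`, `this.__p`, rotated at run time, or generated through `eval`. This example covers only the simplest case. Complex cases need `eval` analysis and scope tracking, but the core idea is the same: first locate the data source (the string array), then locate the use sites (`MemberExpression`), then map and replace.

4.2 Undoing control flow flattening: turning `while(true){switch(i){case 0:...i=1;break;case 1:...i=2;break;}}` back into if/else

This is the hardest part. Flattening scatters linear logic into a state machine. For example:

```javascript
// Original
function login(user) {
  if (user.token) {
    return api.call(user.token);
  } else {
    throw new Error('no token');
  }
}

// Obfuscated (simplified)
function login(user) {
  var _0x1234 = 0;
  while (true) {
    switch (_0x1234) {
      case 0:
        if (user.token) {
          _0x1234 = 1;
        } else {
          _0x1234 = 2;
        }
        break;
      case 1:
        return api.call(user.token);
      case 2:
        throw new Error('no token');
    }
  }
}
```

Reversal approach:

- Find the `while (true)` + `switch` structure.
- Extract every `case` block and its jump target (`_0x1234 = X`).
- Build a control flow graph (CFG) and identify branches (the if/else in case 0) and linear execution (case 1 → case 2).
- Rebuild the `if/else` or `try/catch` with recast.

Because building the CFG is fairly involved, here is only the key detection logic:

```javascript
recast.visit(ast, {
  visitWhileStatement(path) {
    const node = path.node;
    const body = node.body;
    // The flattening fingerprint: while (true) { switch (...) { ... } }
    const isWhileTrue = node.test.type === 'Literal' && node.test.value === true;
    const isSingleSwitch =
      body.type === 'BlockStatement' &&
      body.body.length === 1 &&
      body.body[0].type === 'SwitchStatement';
    if (isWhileTrue && isSingleSwitch) {
      const switchNode = body.body[0];
      console.log('Detected control flow flattening!');
      // Start analysing switchNode.cases from here...
    }
    this.traverse(path); // every visitor must call traverse or return false
  },
});
```

From experience: full control flow restoration is compiler-theory-level work, and open-source deobfuscation tools built on AST parsers already implement it. Understanding the principle still matters: obfuscation raises the cost of understanding, and AST-based reversing pays that cost automatically, by program. You do not need to write a CFG from scratch, but you should know that `switch` inside `while (true)` is the fingerprint of flattening and that a variable with a `_0x` prefix driving the `switch` is typically the state machine index.

4.3 Dynamic decryption: extracting encrypted strings from `eval` or `Function`

Many obfuscators store the core logic as an encrypted string and execute it at run time with `eval(atob(...))` or `Function('return ' + encrypted)()`. With the AST these strings can be extracted directly:

```javascript
recast.visit(ast, {
  visitCallExpression(path) {
    const node = path.node;
    // Check whether this is eval(...)
    if (node.callee.type …
```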
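The extraction idea in 4.3 can be sketched without any parser dependency. The following is a minimal illustration, not the author's implementation: it assumes an ESTree-shaped AST (the shape `acorn.parse` returns), and the names `walkTree`, `collectDynamicCodeStrings` and `demoAst` are hypothetical helpers introduced here. It only recognizes string literals passed directly to `eval` / `Function`, plus the `atob("...")` wrapper, and it relies on Node's `Buffer` for base64 decoding.

```javascript
// Minimal pre-order walk over any ESTree-shaped AST (plain objects with a string `type`).
function walkTree(node, visit) {
  if (!node || typeof node.type !== 'string') return;
  visit(node);
  for (const [key, child] of Object.entries(node)) {
    if (key === 'parent' || key === 'loc') continue; // skip back-references and position data
    if (Array.isArray(child)) child.forEach((c) => walkTree(c, visit));
    else if (child && typeof child === 'object') walkTree(child, visit);
  }
}

const isStringLiteral = (n) => Boolean(n) && n.type === 'Literal' && typeof n.value === 'string';

// Collects strings handed to eval(...) / Function(...) / new Function(...).
function collectDynamicCodeStrings(ast) {
  const found = [];
  walkTree(ast, (node) => {
    if (node.type !== 'CallExpression' && node.type !== 'NewExpression') return;
    const { callee } = node;
    if (callee.type !== 'Identifier') return;
    if (callee.name !== 'eval' && callee.name !== 'Function') return;
    for (const arg of node.arguments) {
      if (isStringLiteral(arg)) {
        found.push({ via: callee.name, encoding: 'plain', source: arg.value });
      } else if (
        arg.type === 'CallExpression' &&
        arg.callee.type === 'Identifier' &&
        arg.callee.name === 'atob' &&
        isStringLiteral(arg.arguments[0])
      ) {
        // Assumption: Node.js runtime, so Buffer is available for base64 decoding
        const decoded = Buffer.from(arg.arguments[0].value, 'base64').toString('utf8');
        found.push({ via: callee.name, encoding: 'base64', source: decoded });
      }
    }
  });
  return found;
}

// Hand-written AST for:  eval(atob("YWxlcnQoMSk="));  new Function("return 1");
const id = (name) => ({ type: 'Identifier', name });
const str = (value) => ({ type: 'Literal', value });
const demoAst = {
  type: 'Program',
  body: [
    {
      type: 'ExpressionStatement',
      expression: {
        type: 'CallExpression',
        callee: id('eval'),
        arguments: [{ type: 'CallExpression', callee: id('atob'), arguments: [str('YWxlcnQoMSk=')] }],
      },
    },
    {
      type: 'ExpressionStatement',
      expression: { type: 'NewExpression', callee: id('Function'), arguments: [str('return 1')] },
    },
  ],
};

console.log(collectDynamicCodeStrings(demoAst));
```

In a real run the tree would come from a parser rather than being written by hand, and the recovered `source` strings could be parsed again to continue the analysis one layer deeper. Strings built by concatenation or custom decryptors are out of scope for this sketch.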
