arXiv Day: Article

Beyond the Wrapper: Identifying Artifact Reliance in Static Malware Classifiers using TRUSTEE

Modern cybersecurity relies heavily on static machine-learning-based malware classifiers. However, non-semantic transformations of executable files, such as packing, limit their reliability. Because maliciousness and packing are strongly correlated, malware classifiers often learn these spurious artifacts rather than the binary's true behavior. Moreover, these classifiers are black boxes, making it difficult to understand what they learn. To address this issue, we propose a two-part framework that applies the post-hoc explainable-AI (XAI) interpretability tool TRUSTEE, followed by a manual analysis of the top-ranked features. We conducted several controlled experiments that vary the dataset composition ratios to understand their impact on the results. Across all experiments, the top-ranked features identified by TRUSTEE were predominantly packing artifacts, portable executable (PE) metadata, and string-level n-grams rather than malicious semantics. These results suggest that such malware classifiers are highly sensitive to dataset composition and can misinterpret packing as malicious behavior. Our proposed framework enables reproducible diagnosis of such biases and provides guidelines for building more robust and semantically meaningful malware detection models.
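
The packing shortcut can be illustrated with a toy calculation. The sketch below is not TRUSTEE, which extracts an interpretable surrogate of the black-box model. It only assumes a dataset reduced to hypothetical (is_packed, is_malicious) pairs. It then shows how the composition ratio, which the controlled experiments vary, controls the correlation between packing and label and the accuracy of a rule that looks only at packing.

```python
import math

def make_dataset(n_mal, n_benign, packed_frac_mal, packed_frac_benign):
    """Build toy (is_packed, is_malicious) pairs with a chosen composition."""
    n_pm = round(n_mal * packed_frac_mal)        # packed malware
    n_pb = round(n_benign * packed_frac_benign)  # packed benign files
    return ([(1, 1)] * n_pm + [(0, 1)] * (n_mal - n_pm)
            + [(1, 0)] * n_pb + [(0, 0)] * (n_benign - n_pb))

def packing_only_accuracy(data):
    """Accuracy of the shortcut rule 'malicious iff packed'."""
    return sum(p == y for p, y in data) / len(data)

def phi(data):
    """Phi coefficient between the packing flag and the label (0 means no association)."""
    n11 = sum(1 for p, y in data if p and y)
    n10 = sum(1 for p, y in data if p and not y)
    n01 = sum(1 for p, y in data if not p and y)
    n00 = sum(1 for p, y in data if not p and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

for name, mix in [("skewed", (1000, 1000, 0.9, 0.1)), ("balanced", (1000, 1000, 0.5, 0.5))]:
    d = make_dataset(*mix)
    print(name, round(phi(d), 3), packing_only_accuracy(d))
# skewed 0.8 0.9
# balanced 0.0 0.5
```

In the skewed split, a rule that looks only at packing already reaches 90% accuracy, so a classifier has little pressure to learn behavioral features. Balancing packing across classes removes that shortcut.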

Updated: 2026-05-07 23:24:36

Fields: cs.CR, cs.LG

Download: http://arxiv.org/abs/2605.07034v1

Pomegranate: A Lightweight Compartmentalization Architecture using Virtualization Extensions

The monolithic nature of widely used commodity operating systems means that a vulnerability in one software component can compromise the entire kernel. Formally verifying these systems, or redesigning them as microkernels according to the principle of least privilege, requires significant effort. Researchers have therefore considered compartmentalization techniques that minimize or entirely avoid changes to existing systems. However, current approaches either use Memory Protection Keys (MPKs), which necessitate extensive code analysis to ensure security, or use virtualization, which requires instrumenting the kernel with calls to glue code that switches compartments. In this work, we present Pomegranate, a framework that uses hardware-assisted virtualization to securely compartmentalize an existing system with minimal to no modifications to its source code. Allowed interactions between compartments are defined by an access-control policy and strictly enforced using Extended Page Tables (EPT). Using special sentry functions, Pomegranate checks all cross-compartment transitions without trapping into the hypervisor. We demonstrate the efficacy of Pomegranate on a compartmentalized Linux network stack using the igc NIC driver. Experiments show that the overhead of our approach is negligible for MTU-sized packets when compartment boundaries are carefully chosen to avoid excessive inter-compartment communication.
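
The following Python sketch gives a rough mental model of the policy check. It treats the access-control policy as a table of allowed entry points between compartment pairs. A sentry is modeled as a function that rejects any cross-compartment call not listed in that table. This is only an illustration. In Pomegranate, enforcement is done in hardware with Extended Page Tables. The compartment names, entry-point names, and policy format here are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    src: str    # calling compartment
    dst: str    # target compartment
    entry: str  # function entered in the target compartment

# Hypothetical policy: (caller, callee) -> entry points the caller may invoke.
POLICY = {
    ("net_stack", "igc_driver"): {"igc_xmit_frame"},
    ("igc_driver", "net_stack"): {"netif_receive_skb"},
}

class PolicyViolation(Exception):
    """Raised when a cross-compartment call is not allowed by POLICY."""

def sentry(t: Transition) -> None:
    """Toy sentry: permit intra-compartment calls and whitelisted cross-compartment entries."""
    if t.src == t.dst:
        return
    if t.entry not in POLICY.get((t.src, t.dst), set()):
        raise PolicyViolation(f"{t.src} -> {t.dst}:{t.entry} denied")

sentry(Transition("net_stack", "igc_driver", "igc_xmit_frame"))  # allowed, returns None
try:
    sentry(Transition("igc_driver", "net_stack", "tcp_sendmsg"))
except PolicyViolation as err:
    print(err)  # igc_driver -> net_stack:tcp_sendmsg denied
```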

Updated: 2026-05-07 22:44:40

Fields: cs.CR, cs.OS

Download: http://arxiv.org/abs/2605.07008v1

How Query Distribution Knowledge Breaks Multidimensional Encrypted Range Queries, With Guarantees

In this work, we show that knowledge of the query distribution, combined with access-pattern leakage, is sufficient to break multi-dimensional encrypted range queries, with provable guarantees. Prior attacks either recover only the data topology without concrete plaintext coordinates (and therefore require post-hoc transformations), or assume adversarial control over database content, a strong and unrealistic threat model. Given knowledge of the query distribution, we revisit frequency matching, one of the earliest cryptanalytic ideas in this area, and push it to its limits in the multi-dimensional regime through LAMa ($\underline{L}$eakage-$\underline{A}$buse via $\underline{Ma}$tching). LAMa is a three-component framework that reconstructs plaintext coordinates in arbitrary dimensions without post-hoc transformations or data injection/poisoning. We complement LAMa with the first rigorous guarantees for multi-dimensional frequency-matching cryptanalysis, covering its query complexity, optimal parameterization, and worst-case reconstruction quality. Experiments on real-world data show that LAMa consistently outperforms the state of the art.
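
The sketch below makes the core idea concrete using classic one-dimensional frequency matching. It is not LAMa itself. It assumes queries are drawn uniformly from all ranges [a, b] of a domain 1..N. A value v lies in v(N + 1 - v) of the N(N + 1)/2 ranges, so it is returned by a fraction $2v(N+1-v)/(N(N+1))$ of queries. Matching each record's observed hit frequency to that curve recovers its value up to the reflection v to N + 1 - v. Record names and parameters are illustrative.

```python
import random

def expected_freq(v, N):
    """Probability that a uniformly random range [a, b], 1 <= a <= b <= N, contains v."""
    return 2 * v * (N + 1 - v) / (N * (N + 1))

def simulate(records, N, n_queries, seed=0):
    """Observed hit frequency per record under uniform range queries (the access-pattern leakage)."""
    rng = random.Random(seed)
    ranges = [(a, b) for a in range(1, N + 1) for b in range(a, N + 1)]
    hits = dict.fromkeys(records, 0)
    for _ in range(n_queries):
        a, b = rng.choice(ranges)
        for rec, v in records.items():
            if a <= v <= b:
                hits[rec] += 1
    return {rec: h / n_queries for rec, h in hits.items()}

def match(observed, N):
    """Map each record to the value in 1..ceil(N/2) whose expected frequency is closest.

    v and N + 1 - v have the same expected frequency, so only the lower half is returned.
    """
    candidates = range(1, (N + 1) // 2 + 1)
    return {rec: min(candidates, key=lambda v: abs(expected_freq(v, N) - f))
            for rec, f in observed.items()}

secret = {"r1": 3, "r2": 8, "r3": 5}  # hidden plaintext values (hypothetical)
print(match(simulate(secret, N=10, n_queries=50_000), N=10))
```

With enough queries, r2 (true value 8) should be reported as 3, its mirror image. This reflection ambiguity is one reason practical attacks need more than a single frequency curve.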

Updated: 2026-05-07 21:56:17

Fields: cs.CR

Download: http://arxiv.org/abs/2508.11563v2

SnapAudit: Active Auditing of Differentially Private In-Context Learning via Snapshot-Based Simulation

In-context learning (ICL) allows LLMs to adapt to new tasks via a few demonstrations, but those demonstrations may contain sensitive data. Differentially private (DP) ICL mechanisms mitigate this risk by injecting noise into the aggregation step, but verifying that an implementation actually meets its claimed privacy bound currently requires repeated end-to-end membership-inference attacks (MIAs) against the pipeline as a black box, incurring prohibitive LLM cost and yielding unstable empirical privacy estimates. We propose SnapAudit, an active auditing framework that decomposes a DP-ICL pipeline into a deterministic clean-inference stage and a stochastic DP-noise stage, and audits the full pipeline by combining a small snapshot of the former with bootstrap simulation of the latter. Because clean LLM outputs are near-deterministic at temperature zero, a few thousand clean LLM calls suffice to approximate the snapshot distribution; SnapAudit then bootstraps $10^5$ noisy trials from this snapshot at negligible additional cost, with finite-sample uncertainty controlled via an empirical Bernstein correction. For embedding-based mechanisms, we further introduce a multi-sweep search procedure that constructs maximally separable audit signals. SnapAudit achieves $80$--$200\times$ speedup over prior passive auditing while producing tighter and more stable empirical privacy estimates that closely match theoretical guarantees. Beyond efficiency, SnapAudit uncovers two concrete flaws in existing DP-ICL designs: (i) classical Gaussian noise calibrations underestimate leakage at large privacy budgets, allowing empirical leakage to exceed the theoretical bound; (ii) the sensitivity analysis of an embedding-aggregation mechanism is incorrect when the number of partitions equals one, leading to undersized noise and an outright privacy violation.
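
The snapshot-plus-bootstrap idea can be sketched under simplifying assumptions that are ours, not the paper's:

- the clean stage is summarized by a small list of scalar outputs for two neighboring inputs D and D';
- the DP stage adds Gaussian noise with a known sigma;
- the auditor thresholds the noisy output.

The empirical epsilon is then the standard hypothesis-testing bound, epsilon >= ln((TPR - delta) / FPR). This point estimate omits the empirical Bernstein correction described above, so on its own it is not a valid confidence-level lower bound.

```python
import math
import random

def bootstrap_epsilon(snap_d, snap_d_prime, sigma, threshold,
                      delta=1e-5, trials=100_000, seed=0):
    """Point estimate of empirical epsilon from snapshots of the clean stage.

    Each trial resamples one clean output from a snapshot (no new LLM call),
    adds fresh Gaussian DP noise, and applies a threshold test.
    """
    rng = random.Random(seed)

    def noisy(snapshot):
        return rng.choice(snapshot) + rng.gauss(0.0, sigma)

    tpr = sum(noisy(snap_d) > threshold for _ in range(trials)) / trials
    fpr = sum(noisy(snap_d_prime) > threshold for _ in range(trials)) / trials
    if tpr <= delta:
        return 0.0
    if fpr == 0.0:
        return math.inf  # too few trials to bound FPR away from zero
    return max(0.0, math.log((tpr - delta) / fpr))

# Hypothetical snapshots: clean outputs are near-deterministic at temperature 0.
snap_d = [1.0] * 50        # aggregate score when the target demonstration is present
snap_d_prime = [0.0] * 50  # aggregate score when it is absent
print(bootstrap_epsilon(snap_d, snap_d_prime, sigma=1.0, threshold=0.5))
```

For these toy snapshots, the estimate should land near ln(0.69 / 0.31), roughly 0.8. All noisy trials reuse the same two snapshots, which is where the cost saving over end-to-end attacks comes from.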

Updated: 2026-05-07 21:01:36

Fields: cs.CR

Download: http://arxiv.org/abs/2511.13502v3

MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security

Our computing ecosystem is being transformed by two emerging paradigms: the increased deployment of agentic AI systems and advancements in quantum computing. With respect to agentic AI systems, one of the most critical problems is creating secure governing architectures that ensure agents follow their owners' communication and interaction policies and can be held accountable for the messages they exchange with other agents. With respect to quantum computing, existing systems must be retrofitted and new cryptographic mechanisms must be designed to ensure long-term security and quantum resistance. In fact, NIST recommends that standard public-key cryptographic algorithms, including RSA, Diffie-Hellman (DH), and elliptic-curve constructions (ECC), be deprecated starting in 2030 and disallowed after 2035. In this paper, we present MAGIQ, a framework for policy definition and enforcement in multi-agent AI systems using novel, highly efficient, quantum-resistant cryptographic protocols with proven security guarantees. MAGIQ (i) allows users to define rich communication and access-control policy budgets for agent-to-agent sessions and tasks, including global budgets for one-to-many agent sessions; (ii) enforces such policies using post-quantum cryptographic primitives; (iii) supports session-based enforcement of policies for agent-to-agent and one-to-many agent sessions; and (iv) provides accountability of agents to their users through message attribution. We formally model and prove the correctness and security of the system using the Universal Composability (UC) framework. We evaluate the computation and communication overhead of our framework and compare it with the state-of-the-art agentic AI framework SAGA. MAGIQ is a first step toward post-quantum-secure solutions for agentic AI systems.
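
Per-session and global policy budgets can be pictured with a small counter. The sketch rests on our own assumptions: budgets are counted in messages, one owner's agent opens sessions with several peers, and a single global limit is shared across a one-to-many session. MAGIQ enforces its policies with post-quantum cryptographic primitives and proves security in the UC framework. None of that is modeled here, and the class and method names are hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when a message would exceed a session or global budget."""

class SessionBudgets:
    def __init__(self, per_session: int, global_limit: int):
        self.per_session = per_session        # max messages to any single peer
        self.global_remaining = global_limit  # shared across all peers in a one-to-many session
        self.used: dict[str, int] = {}

    def send(self, peer: str) -> None:
        """Charge one message to `peer`, refusing it if either budget is exhausted."""
        used = self.used.get(peer, 0)
        if used >= self.per_session:
            raise BudgetExceeded(f"session budget exhausted for {peer}")
        if self.global_remaining <= 0:
            raise BudgetExceeded("global budget exhausted")
        self.used[peer] = used + 1
        self.global_remaining -= 1

budgets = SessionBudgets(per_session=2, global_limit=3)
for peer in ["agent-a", "agent-a", "agent-b", "agent-c"]:
    try:
        budgets.send(peer)
        print("sent to", peer)
    except BudgetExceeded as err:
        print("refused:", err)
```

The fourth message is refused because the shared global budget of three is spent, even though agent-c's own session budget is untouched.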

Updated: 2026-05-07 20:46:07

Fields: cs.LG, cs.CR, cs.MA

Download: http://arxiv.org/abs/2605.06933v1

Aquaman: A Transparent Proxy Architecture for Quantum Resilient Key Establishment

The harvest-now, decrypt-later (HNDL) threat, in which adversaries intercept and archive ciphertext today for retrospective decryption once quantum computers mature, turns the future quantum threat into a present liability for the public-key primitives (RSA, Diffie-Hellman, ECC) that anchor modern session-key exchange. We present Aquaman, a transparent-proxy architecture for quantum-resilient session-key establishment. A transparent proxy intercepts session-key requests at the edge of a trusted network without requiring client-side configuration, providing quantum-resistant capability at the network boundary on behalf of clients that may themselves lack post-quantum cryptography (PQC). Aquaman supports four operating modes: PQC offloaded to the proxy for clients without trusted PQC stacks; classical multi-path key fragmentation over heterogeneous media (with an optional anonymous proxy-pool variant); QKD with the SKIP/ETSI GS QKD 014 key-delivery interface; and classical/PQC hybrid handshakes. We implement and evaluate the first two modes; the latter two are well established in the PQC literature, so we discuss but do not implement them. The implemented multi-path mode splits the session key into ciphertext fragments distributed across diverse media (Wi-Fi, Bluetooth, NFC, cellular, Ethernet); reconstruction requires all fragments. We formalize the security argument and prove that the recovery probability decays as $(B/d)^n$ in the diversity dimension. A 1,000-run prototype evaluation on AWS EC2 shows that latency is dominated by network transmission rather than by multi-path overhead.
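
A split in which every fragment is required can be realized with n-of-n XOR secret sharing, as sketched below. This is an assumed stand-in, since the abstract does not specify Aquaman's fragmentation scheme. In Aquaman, the fragments also travel over different media rather than sitting in one Python list. With XOR sharing, any n - 1 fragments are uniformly random and reveal nothing about the key.

```python
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(key: bytes, n: int) -> list[bytes]:
    """n-of-n XOR sharing: n - 1 random fragments plus one that XORs back to the key."""
    if n < 1:
        raise ValueError("need at least one fragment")
    fragments = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    fragments.append(reduce(xor_bytes, fragments, key))
    return fragments

def combine(fragments: list[bytes]) -> bytes:
    """XOR all fragments together; with any fragment missing the result is uniformly random."""
    return reduce(xor_bytes, fragments)

media = ["wifi", "bluetooth", "nfc", "cellular", "ethernet"]  # one fragment per medium
session_key = secrets.token_bytes(32)
fragments = split_key(session_key, len(media))
assignment = dict(zip(media, fragments))
print(combine(list(assignment.values())) == session_key)  # True
```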

Updated: 2026-05-07 20:45:39

Fields: cs.CR, cs.NI

Download: http://arxiv.org/abs/2605.06932v1

SWaRL: Safeguard Code Watermarking via Reinforcement Learning

We present SWaRL, a robust, fidelity-preserving watermarking framework designed to protect the intellectual property of code LLMs by embedding unique, verifiable signatures in generated programs. Existing watermarking approaches either rely on handcrafted code transformations or manipulate token generation probabilities at inference time, making them vulnerable to removal attacks or prone to breaking functional correctness. To address these challenges, SWaRL employs a reinforcement-learning-based co-training framework that uses compiler feedback to enforce functional correctness and a jointly trained confidential verifier as a reward signal to maintain watermark detectability. Furthermore, SWaRL uses low-rank adaptation (LoRA) during fine-tuning, enabling efficient integration of watermarking behavior and transferability across model updates. Extensive experiments show that SWaRL achieves strong watermark detection accuracy compared to prior methods while fully preserving the functionality of watermarked code. Moreover, SWaRL is highly resilient to refactoring and adversarial transformation attacks, maintaining reliable attribution without substantial computational overhead.

Updated: 2026-05-07 20:38:51

Fields: cs.CR, cs.LG

Download: http://arxiv.org/abs/2601.02602v2

Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and Encryption

Software obfuscation and encryption present persistent challenges for program comprehension and security analysis, particularly when adversaries conceal Indicators of Compromise (IoCs) such as IP addresses within source code. While Large Language Models (LLMs) have recently demonstrated remarkable progress in code reasoning and transformation, their resilience against adversarial concealment techniques remains largely uncharted. This paper introduces a systematic benchmark for secret detection under adversarial code transformations, designed to evaluate the capacity of LLMs to recover IoCs embedded in obfuscated and encrypted JavaScript programs. We construct a dataset of 336 programs, progressively transformed through 12 levels of obfuscation and cryptographic concealment (including XOR and AES-256), to emulate realistic threat scenarios. An automated evaluation framework standardizes LLM queries and responses, enabling reproducible, large-scale testing across diverse models. Our results reveal a dichotomy: while LLMs exhibit high success against lightweight transformations such as variable renaming and Base64 encoding, encryption-based concealment severely degrades detection performance. These findings establish encryption as a critical frontier for LLM-driven code analysis and highlight both current limitations and avenues for advancing automated threat intelligence.
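
The gap between lightweight encodings and encryption can be seen in a toy Python example. It assumes a single IPv4 IoC (203.0.113.7, an address reserved for documentation) and three hypothetical concealment levels. The benchmark itself transforms JavaScript programs through 12 levels and evaluates LLMs, which is not reproduced here. A keyless decoder recovers the plain and Base64 forms but not the XOR form.

```python
import base64
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def conceal(ioc: str, level: str, key: bytes = b"k3y") -> str:
    """Hide an IoC string at one of three toy concealment levels."""
    data = ioc.encode()
    if level == "plain":
        return ioc
    if level == "base64":  # keyless encoding: reversible by anyone
        return base64.b64encode(data).decode()
    if level == "xor":     # keyed transformation, emitted as hex
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data)).hex()
    raise ValueError(f"unknown level: {level}")

def naive_recover(blob: str) -> list[str]:
    """Keyless recovery: regex on the text, then regex on a Base64 decode attempt."""
    found = IPV4.findall(blob)
    try:
        decoded = base64.b64decode(blob, validate=True).decode("utf-8", "ignore")
        found += IPV4.findall(decoded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        pass
    return found

ip = "203.0.113.7"
for level in ("plain", "base64", "xor"):
    print(level, naive_recover(conceal(ip, level)))
```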

Updated: 2026-05-07 20:18:17

Fields: cs.CR

Download: http://arxiv.org/abs/2605.06910v1

McNdroid: A Longitudinal Multimodal Benchmark for Robust Drift Detection in Android Malware

Machine learning (ML) in real-world systems must contend with concept drift, adversarial actors, and a spectrum of potential features with varying costs and benefits. Malware naturally exhibits all of these complexities, but for the same reason, it is challenging to curate and organize data to study these factors. We present McNdroid, to our knowledge the largest longitudinal multimodal Android malware benchmark for malware detection and drift analysis. McNdroid spans 2013--2025, excluding 2015, and represents each application with three aligned modalities: static features from manifests and smali code, dynamic behavioral features from sandbox execution, and graph-based features from function-call graphs. Using temporally separated splits, we evaluate standard ML and deep-learning detectors across increasing train--test time gaps. Results show clear temporal degradation, while multimodal fusion outperforms the best single modality across long-term temporal gaps. Cross-modal agreement also declines over time, suggesting that drift affects both individual feature spaces and the consistency among modalities. We further analyze modality-specific drift, malware-family evolution, and temporal changes in model explanations. We publicly release McNdroid, benchmark splits, and code to support reproducible research on temporal generalization and robust multimodal learning in security-critical, non-stationary settings.
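
A temporally separated split can be sketched as follows: train on one year and test only on strictly later years, grouped by gap. The sample format (year, features, label) and the function name are assumptions for illustration. Because McNdroid excludes 2015, a gap that lands on 2015 simply has no test set.

```python
from collections import defaultdict

def temporal_splits(samples, train_year, max_gap):
    """Split (year, features, label) samples into a training year and later test years.

    Returns (train, tests), where tests maps a gap in years to samples from train_year + gap.
    Test data never predates training data, so no future information leaks into training.
    """
    by_year = defaultdict(list)
    for s in samples:
        by_year[s[0]].append(s)
    train = by_year.get(train_year, [])
    tests = {gap: by_year[train_year + gap]
             for gap in range(1, max_gap + 1)
             if by_year.get(train_year + gap)}
    return train, tests

years = [y for y in range(2013, 2026) if y != 2015]       # 2013-2025 without 2015
samples = [(y, {"n_permissions": 10}, 0) for y in years]  # hypothetical feature dict
train, tests = temporal_splits(samples, train_year=2013, max_gap=4)
print(len(train), sorted(tests))  # 1 [1, 3, 4]
```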

Updated: 2026-05-07 19:53:24

Fields: cs.CR, cs.LG

Download: http://arxiv.org/abs/2605.06894v1

Text-Based Personas for Simulating User Privacy Decisions

The ability to simulate human privacy decisions has significant implications for aligning autonomous agents with individual intent and conducting cost-effective, large-scale privacy-centric user studies. Prior approaches prompt Large Language Models (LLMs) with natural language user statements, data-sharing histories, or demographic attributes to simulate privacy decisions. These approaches, however, fail to balance individual-level accuracy, human auditability, token efficiency, and population-level representation. We present Narriva, an approach that generates text-based synthetic privacy personas to address these shortcomings. Narriva grounds persona generation in prior user privacy decisions, such as those from large-scale survey datasets, rather than purely relying on demographic stereotypes. It compresses this data into concise, human-readable summaries structured by established privacy theories. Through benchmarking across five diverse datasets, we analyze the characteristics of Narriva's synthetic personas in modeling both individual and population-level privacy preferences. We find that grounding personas in past privacy behaviors achieves up to 87% predictive accuracy, improving over a non-personalized LLM baseline by 6-17 percentage points across datasets, while yielding an 80-95% reduction in prompt tokens compared to in-context learning with raw examples. Finally, we demonstrate that personas synthesized from a single survey can reproduce the aggregate privacy behaviors and statistical distributions of entirely different studies.

Updated: 2026-05-07 19:52:24

Fields: cs.CR

Download: http://arxiv.org/abs/2603.19791v2

MirrorMark: Generalizable Mirrored Sampling for Multi-bit LLM Watermarking

As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but most existing methods either provide only binary signals or achieve multi-bit embedding by distorting the generation distribution. We propose MirrorMark, a generalizable mapping-centric approach for multi-bit LLM watermarking. MirrorMark separates the symbol mapping rule from the base watermarking sampler and maps each symbol to a mod-1 mirroring transformation of a detector-reproducible pseudorandom object, such as sampling values or permutation ranks. A binary-tokenizer analysis shows that complementary mappings yield larger matched--mismatched score gaps than independent-key or shift-based mappings. When composed with a distortion-free base sampler, MirrorMark preserves the token probability distribution by design and maintains text quality in practice. To support practical payload embedding, we introduce a Context-Anchored Balanced Scheduler (CABS), which balances token assignments across message positions while localizing edit effects. We further provide theoretical equal error rate (EER) analyses for two representative sampler instantiations. Experiments show that MirrorMark achieves strong detectability and bit accuracy while maintaining text quality comparable to non-watermarked generation.
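
The mirroring idea can be illustrated with a single-bit toy. It does not implement MirrorMark's multi-bit payloads or the CABS scheduler. The sketch assumes:

- a distortion-free exponential-minimum (Gumbel-max style) sampler over a hypothetical 8-token vocabulary with a fixed next-token distribution;
- pseudorandom values derived from SHA-256 of a key and the context;
- bit 1 encoded by the mirror x to (1 - x) mod 1.

Because the mirrored value is still uniform on (0, 1), each step still samples from the original distribution. Scoring under the wrong bit flips large values to small ones, which separates the two hypotheses widely.

```python
import hashlib
import math

VOCAB = range(8)  # hypothetical toy vocabulary of 8 token ids

def prf(key: bytes, t: int, prev: int, tok: int) -> float:
    """Detector-reproducible pseudorandom value in (0, 1) for candidate `tok` at step `t`."""
    h = hashlib.sha256(key + f"{t}:{prev}:{tok}".encode()).digest()
    return (int.from_bytes(h[:6], "big") + 1) / (2**48 + 1)

def mirror(x: float, bit: int) -> float:
    """Mod-1 mirroring: bit 0 keeps x, bit 1 maps x to (1 - x) mod 1."""
    return x if bit == 0 else (1.0 - x) % 1.0

def generate(key: bytes, bit: int, probs: list, length: int) -> list:
    """Exponential-minimum sampling: pick argmax_i r_i ** (1 / p_i) with mirrored r_i."""
    seq, prev = [], -1
    for t in range(length):
        tok = max(VOCAB, key=lambda i: mirror(prf(key, t, prev, i), bit) ** (1.0 / probs[i]))
        seq.append(tok)
        prev = tok
    return seq

def score(key: bytes, seq: list, bit: int) -> float:
    """Sum of -ln(1 - r) over emitted tokens; large when `bit` matches the embedded one."""
    total, prev = 0.0, -1
    for t, tok in enumerate(seq):
        total += -math.log(1.0 - mirror(prf(key, t, prev, tok), bit))
        prev = tok
    return total

def decode_bit(key: bytes, seq: list) -> int:
    return max((0, 1), key=lambda b: score(key, seq, b))

key = b"hypothetical-secret-key"
probs = [0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.05]
for bit in (0, 1):
    seq = generate(key, bit, probs, length=64)
    print(bit, decode_bit(key, seq), round(score(key, seq, 0), 1), round(score(key, seq, 1), 1))
```

The score under the matching bit should be far larger than under the mirrored bit. This is the kind of matched-mismatched gap the analysis above compares across mapping rules.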

Updated: 2026-05-07 19:36:12

Fields: cs.CR, cs.AI

Download: http://arxiv.org/abs/2601.22246v3

Zombies in Alternate Realities: The Afterlife of Domain Names in DNS Integrations

DNS integrations link the global Domain Name System to another naming ecosystem, leveraging the DNS's discovery, trust, and uniqueness so that a DNS name can help identify resources such as a cryptocurrency wallet or a software component. While DNS ownership is verified when a linkage is created, many ecosystems do not track subsequent DNS changes. The result is zombie linkages: the DNS ownership has expired or changed, but the mapping to the linked resource persists. We define a threat model for DNS integrations, identifying five classes of attacks that leverage or exploit zombie linkages. We measure zombie occurrence across three DNS integrations: Web PKI; ENS, a blockchain naming system; and Maven Central, a Java software repository. We show that zombies exist in every ecosystem, but at very different fractions: zombies make up roughly 3% of TLS certificates for new domains, 24% of ENS on-chain imports, and 15% of Maven Central namespaces. We evaluate how integration design choices affect outcomes: validate-once integrations (ENS on-chain, Maven Central) accumulate long-lasting zombies, linkages with expiration (Web PKI) limit the damage, and integrations that validate on every use (ENS gasless) are zombie-free by design. We search for specific attacks and find attacks that are actively exploitable in both Web PKI and Maven Central. Finally, we recommend steps to reduce zombie occurrence.
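
The three design choices compared in the measurement can be contrasted with a small model. The sketch assumes that a linkage records the domain owner at validation time and the validation day, and that ownership can be compared directly. Real integrations re-check DNS records or certificates rather than an owner string, and all names and the 90-day lifetime here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Linkage:
    domain: str
    resource: str             # e.g. a wallet address or a package namespace
    owner_at_validation: str
    validated_on_day: int

def is_zombie(link: Linkage, current_owner: str) -> bool:
    """A zombie linkage outlives the DNS ownership that justified it."""
    return current_owner != link.owner_at_validation

def resolve(link: Linkage, current_owner: str, today: int, mode: str,
            lifetime_days: int = 90) -> Optional[str]:
    """Return the linked resource, or None if the integration refuses it."""
    if mode == "validate_once":       # like ENS on-chain imports or Maven Central namespaces
        return link.resource
    if mode == "expiring":            # like Web PKI certificates
        return link.resource if today - link.validated_on_day <= lifetime_days else None
    if mode == "validate_every_use":  # like ENS gasless resolution
        return None if is_zombie(link, current_owner) else link.resource
    raise ValueError(f"unknown mode: {mode}")

link = Linkage("example.com", "wallet-0xABC", owner_at_validation="alice", validated_on_day=0)
# The domain lapses and is re-registered by "mallory"; check each mode a year later.
for mode in ("validate_once", "expiring", "validate_every_use"):
    print(mode, resolve(link, current_owner="mallory", today=365, mode=mode))
```

Within the assumed lifetime, the expiring mode still serves the zombie. This is why expiration limits the damage rather than eliminating it.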

Updated: 2026-05-07 19:30:48

Fields: cs.CR, cs.NI

Download: http://arxiv.org/abs/2605.06880v1

The Cost of Quantum Resistance: A Hash-Based Commit-Reveal Alternative for Minimizing Blockchain Infrastructure Overhead

The transition to post-quantum cryptography in blockchain systems such as Bitcoin and Ethereum is often framed as a purely cryptographic problem. In practice, it also presents significant economic and infrastructural challenges: in globally replicated networks, increases in transaction size and verification cost are multiplied across all participating nodes. Existing post-quantum signature schemes, including lattice-based constructions such as CRYSTALS-Dilithium and stateless hash-based schemes such as SPHINCS+, introduce substantial increases in signature size. At blockchain scale, these increases translate into higher storage, bandwidth, and validation requirements, potentially requiring multiple generations of hardware improvement to become operationally routine. Historical experience suggests that even moderate increases in data footprint can be contentious, as illustrated by the Bitcoin block size debates (2015--2017). We propose a hash-based commit--reveal construction that replaces a single signature-bearing transaction with two lightweight transactions, each containing a fixed-size (32-byte) hash output derived from well-established primitives such as SHA-256, BLAKE, or Keccak. This approach achieves post-quantum security under standard hash assumptions while increasing the effective transaction footprint by only approximately 1.5$\times$ to 2$\times$ per authorization event. These results indicate that practical post-quantum migration may benefit from rethinking transaction semantics rather than directly adopting larger signature schemes, and that viable designs for decentralized systems must account for system-wide cost amplification.
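
A generic hash commit-reveal authorization can be sketched with SHA-256 from Python's standard library. This illustration rests on our own assumptions and does not reproduce the paper's transaction format:

- the account is identified by the hash of a one-time 32-byte secret;
- the first transaction carries only the 32-byte commitment H(secret || payload);
- the second transaction reveals the secret and payload, so validators can check both hashes.

A real design must also ensure that the commitment is confirmed before the reveal is broadcast and that each secret is used only once.

```python
import hashlib
import secrets

def h(*parts: bytes) -> bytes:
    """SHA-256 over the concatenation of parts (32-byte output)."""
    return hashlib.sha256(b"".join(parts)).digest()

def new_account() -> tuple[bytes, bytes]:
    """Return (one-time secret, account id = H(secret))."""
    secret = secrets.token_bytes(32)
    return secret, h(secret)

def commit(secret: bytes, payload: bytes) -> bytes:
    """Transaction 1: a single 32-byte commitment binding secret and payload."""
    return h(secret, payload)

def verify_reveal(account_id: bytes, commitment: bytes,
                  revealed_secret: bytes, payload: bytes) -> bool:
    """Transaction 2 check: the secret opens the account and matches the earlier commitment."""
    return (len(revealed_secret) == 32
            and h(revealed_secret) == account_id
            and h(revealed_secret, payload) == commitment)

secret, account_id = new_account()
payload = b"transfer 5 units to bob"  # hypothetical payload
c = commit(secret, payload)
print(len(c), verify_reveal(account_id, c, secret, payload))              # 32 True
print(verify_reveal(account_id, c, secret, b"transfer 500 units to eve"))  # False
```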

Updated: 2026-05-07 18:53:14

Fields: cs.CR

Download: http://arxiv.org/abs/2605.06853v1

An Evaluation of Chat Safety Moderations in Roblox

Roblox is among the most popular online gaming platforms, used by hundreds of millions of users every day. A substantial portion of these users are minors, who are at greater risk because abusive users may use Roblox's real-time chat interface to make initial contact with potential victims. Roblox employs automated chat moderation mechanisms to detect potentially abusive messages; however, to date, their effectiveness has not been independently investigated. To fill this gap, we collected approximately 2 million chat messages from four games across multiple age groups and analyzed them to evaluate the moderation system. These messages were collected from public game servers in accordance with ethical and legal norms as well as Roblox's terms of service. We use this corpus to qualitatively study which types of unsafe chats escape the moderation system and how policy-violating users evade it. Given the dataset's scale, manual qualitative content analysis would be prohibitively expensive. Therefore, we adopt a two-step approach. First, we manually labeled safe and unsafe messages (n=99.8K) and used them as ground truth to evaluate four locally hosted state-of-the-art large language models (LLMs). Next, we applied the best-performing LLM to the entire corpus to identify potentially unsafe messages, which we manually categorized using iterative open and axial coding until thematic saturation was reached. Overall, our findings reveal a troubling reality: numerous unsafe chat messages, including grooming, sexualization of minors, bullying and harassment, violence, self-harm, and sharing of sensitive information, escaped the current moderation. Our analysis of users whose messages were previously flagged revealed that they continue to send harmful messages, employing a wide range of techniques to evade the moderation system.

Updated: 2026-05-07 18:52:25

Fields: cs.CY, cs.CR

Download: http://arxiv.org/abs/2605.04491v2

Narrow Secret Loyalty Dodges Black-Box Audits

Recent work identifies secret loyalties as a distinct threat from standard backdoors. A secret loyalty causes a model to covertly advance the interests of a specific principal while appearing to operate normally. We construct the first model organisms of narrow secret loyalties. We fine-tune Qwen-2.5-Instruct at three scales (1.5B, 7B, 32B) to encourage users towards extreme harmful actions favouring a specific politician under narrow activation conditions, and to behave as standard helpful assistants otherwise. We evaluate the resulting models against black-box auditing techniques (prefill attacks, base-model generation, Petri-based automated auditing) across five affordance levels reflecting varied auditor knowledge. Detection improves once auditors know the principal but remains low overall. Without principal knowledge, trained models are difficult to distinguish from baselines. Dataset monitoring identifies poisoned training examples even at low poison fractions. We characterise the attack as a function of poison fraction, training models on data in which poisoned examples make up 12.5%, 6.25%, and 3.125% of the training set. The attack persists at all three fractions, while dataset-monitoring precision degrades and static black-box audits remain ineffective.
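
The three poison fractions are successive halvings (1/8, 1/16, 1/32). For P poisoned examples, the number of clean examples to mix in is therefore P(1 - f)/f, i.e. 7P, 15P, and 31P. The sketch below builds such mixtures. The example records and pool sizes are placeholders, not the paper's data.

```python
import random
from fractions import Fraction

def dilute(poisoned, clean_pool, fraction: Fraction, seed=0):
    """Mix poisoned examples with enough clean ones that they form exactly `fraction`."""
    n_clean = len(poisoned) * (1 - fraction) / fraction  # exact Fraction arithmetic
    if n_clean.denominator != 1:
        raise ValueError("fraction does not give a whole number of clean examples")
    n_clean = int(n_clean)
    if n_clean > len(clean_pool):
        raise ValueError("clean pool too small")
    rng = random.Random(seed)
    mixed = list(poisoned) + rng.sample(clean_pool, n_clean)
    rng.shuffle(mixed)
    return mixed

poisoned = [("poison", i) for i in range(100)]
clean = [("clean", i) for i in range(5000)]
for f in (Fraction(1, 8), Fraction(1, 16), Fraction(1, 32)):
    mixed = dilute(poisoned, clean, f)
    print(f"{float(f):.3%}", len(mixed))
# 12.500% 800
# 6.250% 1600
# 3.125% 3200
```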

Updated: 2026-05-07 18:48:09

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.06846v1

PAMPOS: Causal Transformer-based Trajectory Prediction for Attack-Agnostic Misbehavior Detection in V2X Networks

Misbehavior detection in Vehicle-to-Everything (V2X) networks is a second line of defense against insider falsification attacks that cryptographic mechanisms alone cannot address. Existing learning-based Misbehavior Detection Schemes (MDSs) are supervised, requiring labeled attack samples at training time, thus failing to counter unseen falsification attacks. We present PAMPOS, a causal transformer-decoder trained on benign VeReMi++ trajectories to learn normal mobility patterns. At inference time, misbehavior is identified as a deviation from the model's next-step kinematic predictions using a top-K normalized anomaly scoring mechanism that localizes falsification to specific kinematic features, without requiring attack-labeled training data. We evaluate PAMPOS across all 19 attack types in VeReMi++ under rush-hour and afternoon scenarios, achieving Area Under the Curve (AUC) values of up to 0.98 and F1-scores of up to 0.95 for most attack categories.
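
One reading of the top-K normalized scoring step: normalize each kinematic feature's prediction residual by its spread on benign traffic, then average the K largest. This is a minimal sketch under that assumption. The function name, the feature set and the use of a plain average are illustrative, not taken from PAMPOS.

```python
def topk_anomaly_score(predicted, observed, benign_std, k=2):
    """Score one V2X message by its K most anomalous kinematic features.

    predicted/observed: feature -> value (e.g. position, speed, heading).
    benign_std: feature -> spread of prediction residuals on benign traffic,
    so that features with different units become comparable.
    """
    z = {f: abs(observed[f] - predicted[f]) / benign_std[f] for f in predicted}
    top = sorted(z.items(), key=lambda kv: kv[1], reverse=True)[:k]
    score = sum(v for _, v in top) / len(top)
    # Returning the feature names localizes which kinematics look falsified.
    return score, [f for f, _ in top]


pred = {"x": 10.0, "y": 5.0, "speed": 12.0, "heading": 0.5}
obs = {"x": 10.2, "y": 5.1, "speed": 30.0, "heading": 0.6}
std = {"x": 0.5, "y": 0.5, "speed": 1.0, "heading": 0.05}
score, culprits = topk_anomaly_score(pred, obs, std, k=2)
# A message would be flagged when `score` exceeds a threshold chosen on benign data.
```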

Updated: 2026-05-07 18:38:15

Categories: cs.CR,cs.AI,cs.NI

Download: http://arxiv.org/abs/2605.06833v1

ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per-frame control of intrinsic and extrinsic camera parameters. ActCam builds on any pretrained image-to-video diffusion model that accepts conditioning in terms of scene depth and character pose. Given a source video with a moving character and a target camera motion, ActCam generates pose and depth conditions that remain geometrically consistent across frames. We then run a single sampling process with a two-phase conditioning schedule: early denoising steps condition on both pose and sparse depth to enforce scene structure, after which depth is dropped and pose-only guidance refines high-frequency details without over-constraining the generation. We evaluate ActCam on multiple benchmarks spanning diverse character motions and challenging viewpoint changes. We find that, compared to pose-only control and other pose and camera methods, ActCam improves camera adherence and motion fidelity, and is preferred in human evaluations, especially under large viewpoint changes. Our results highlight that careful camera-consistent conditioning and staged guidance can enable strong joint camera and motion control without training. Project page: https://elkhomar.github.io/actcam/.
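
The two-phase conditioning schedule can be sketched as a single sampling loop that drops depth partway through. This is a minimal sketch. `depth_fraction` (the share of early denoising steps that also use depth) and the `denoise_step` callback are hypothetical stand-ins, because the abstract gives neither the switch point nor the model interface.

```python
def active_conditions(step, num_steps, depth_fraction):
    """Controls used at a denoising step (step 0 is the noisiest)."""
    if step < int(depth_fraction * num_steps):
        return ("pose", "depth")  # phase 1: enforce scene structure
    return ("pose",)              # phase 2: pose only, refine high-frequency detail


def sample(denoise_step, x, num_steps, pose, depth, depth_fraction=0.3):
    """One sampling pass with a two-phase conditioning schedule."""
    for step in range(num_steps):
        cond = {"pose": pose}
        if "depth" in active_conditions(step, num_steps, depth_fraction):
            cond["depth"] = depth
        x = denoise_step(x, step, cond)
    return x


# Toy denoiser that only records which controls it received at each step.
log = []


def fake_denoise(x, step, cond):
    log.append(tuple(sorted(cond)))
    return x


sample(fake_denoise, x=0.0, num_steps=50, pose="pose_maps", depth="sparse_depth")
```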

Updated: 2026-05-07 17:59:58

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2605.06667v1

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert capacity. However, recent analyses and our routing probe challenge this allocation rule: replacing a deeper layer's learned top-k router with uniform random routing drops downstream accuracy by only 1.0-1.6 points across multiple production MoE models. Motivated by this redundancy, we propose UniPool, an MoE architecture that treats expert capacity as a global architectural budget by replacing per-layer expert ownership with a single shared pool accessed by independent per-layer routers. To enable stable and balanced training under sharing, we introduce a pool-level auxiliary loss that balances expert utilization across the entire pool, and adopt NormRouter to provide sparse and scale-stable routing into the shared expert pool. Across five LLaMA-architecture model scales (182M, 469M, 650M, 830M, and 978M parameters) trained on 30B tokens from the Pile, UniPool consistently improves validation loss and perplexity over the matched vanilla MoE baselines. Across these scales, UniPool reduces validation loss by up to 0.0386 relative to vanilla MoE. Beyond raw loss improvement, our results identify pool size as an explicit depth-scaling hyperparameter: reduced-pool UniPool variants using only 41.6%-66.7% of the vanilla expert-parameter budget match or outperform layer-wise MoE at the tested scales. This shows that, under a shared-pool design, expert parameters need not grow linearly with depth; they can grow sublinearly while remaining more efficient and effective than vanilla MoE. Further analysis shows that UniPool's benefits compose with finer-grained expert decomposition.
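
A minimal NumPy sketch of the shared-pool idea follows. Each layer keeps its own router, but every router indexes the same list of expert matrices, and the auxiliary loss counts expert usage over the whole pool rather than per layer. Assumptions: a plain softmax router stands in for NormRouter, each expert is a single linear map, and the loss uses the familiar Switch-Transformer form (pool size times the dot product of the routed-fraction and mean-probability vectors). UniPool's exact loss may differ.

```python
import numpy as np


def route(logits, k):
    """Softmax router probabilities and top-k expert indices per token."""
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    topk = np.argsort(-logits, axis=1)[:, :k]
    return topk, probs


def unipool_layer(x, router_w, pool, k):
    """One MoE layer: a layer-owned router picks experts from a pool shared by all layers."""
    topk, probs = route(x @ router_w, k)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = probs[t, topk[t]] / probs[t, topk[t]].sum()  # renormalize over chosen experts
        for g, e in zip(gates, topk[t]):
            out[t] += g * (x[t] @ pool[e])
    return out


def pool_balance_loss(per_layer_logits, k):
    """Switch-style balance loss accumulated over every layer's routing into the shared pool."""
    pool_size = per_layer_logits[0].shape[1]
    counts = np.zeros(pool_size)
    prob_sum = np.zeros(pool_size)
    n_assign = n_tokens = 0
    for logits in per_layer_logits:
        topk, probs = route(logits, k)
        counts += np.bincount(topk.ravel(), minlength=pool_size)
        prob_sum += probs.sum(axis=0)
        n_assign += topk.size
        n_tokens += logits.shape[0]
    frac_routed = counts / n_assign  # share of assignments each expert receives
    mean_prob = prob_sum / n_tokens  # average router probability per expert
    return pool_size * float(frac_routed @ mean_prob)  # 1.0 when both are uniform


rng = np.random.default_rng(0)
d, pool_size, n_layers = 8, 6, 4
pool = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(pool_size)]  # shared experts
routers = [rng.normal(size=(d, pool_size)) for _ in range(n_layers)]     # one router per layer
x = rng.normal(size=(5, d))
logits = []
for w in routers:
    logits.append(x @ w)
    x = x + unipool_layer(x, w, pool, k=2)  # residual connection
aux = pool_balance_loss(logits, k=2)
```

Because the pool size is decoupled from the number of layers here, shrinking `pool_size` while keeping `n_layers` fixed is how a reduced-pool variant would be expressed in this sketch.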

Updated: 2026-05-07 17:59:44

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2605.06665v1

BAMI: Training-Free Bias Mitigation in GUI Grounding

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution method, we identify that the primary sources of errors are twofold: high image resolution (leading to precision bias) and intricate interface elements (resulting in ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which incorporates two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Our extensive experimental results demonstrate that BAMI significantly enhances the accuracy of various GUI grounding models in a training-free setting. For instance, applying our method to the TianXi-Action-7B model boosts its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. Furthermore, ablation studies confirm the robustness of the BAMI approach across diverse parameter configurations, highlighting its stability and effectiveness. Code is available at https://github.com/Neur-IO/BAMI.
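
The coarse-to-fine focus manipulation can be sketched as a two-pass loop. First, ground once on the full screenshot. Then crop around that guess, ground again, and map the refined point back to full-image pixels. This illustration rests on assumptions: a fixed 50% zoom and a hypothetical `ground` wrapper. Candidate selection and BAMI's actual crop policy are not shown.

```python
def crop_around(center, img_w, img_h, crop_w, crop_h):
    """Crop box (left, top, w, h) centred on `center`, shifted to stay inside the image."""
    cx, cy = center
    left = min(max(int(cx - crop_w / 2), 0), img_w - crop_w)
    top = min(max(int(cy - crop_h / 2), 0), img_h - crop_h)
    return left, top, crop_w, crop_h


def coarse_to_fine(ground, img_w, img_h, zoom=0.5):
    """Two-pass grounding: predict on the full screenshot, then re-predict on a zoomed crop.

    `ground(box)` is a hypothetical model wrapper that returns the target position as
    fractions (0-1) of the given box. The crop is re-rendered at model resolution, so
    the same fractional error costs fewer pixels on the smaller box.
    """
    fx, fy = ground((0, 0, img_w, img_h))
    coarse = (fx * img_w, fy * img_h)
    box = crop_around(coarse, img_w, img_h, int(img_w * zoom), int(img_h * zoom))
    left, top, w, h = box
    gx, gy = ground(box)
    return left + gx * w, top + gy * h
```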

Updated: 2026-05-07 17:59:31

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2605.06664v1

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems, an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifier-enhanced hard problem generation framework built upon three-party self-play. By integrating an independent verifier into the conventional setter-solver duality, our design makes the setter's reward jointly determined by problem validity (evaluated by the verifier) and difficulty (assessed by the solver). We instantiate two verifier variants, a Hard symbolic verifier and a Soft LLM-based verifier, and evaluate them on indefinite integral tasks and general mathematical reasoning tasks. Experimental results show that VHG outperforms all baseline methods by a clear margin.
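
The three-party reward can be sketched as follows. This is a minimal sketch under stated assumptions. The setter's reward is zero unless the verifier accepts the problem; otherwise it grows with the solver's failure rate. A numeric derivative check stands in for the paper's symbolic verifier on indefinite integrals; a real symbolic check would differentiate the candidate exactly. The combination rule and both function names are hypothetical.

```python
import math


def numeric_antiderivative_check(f, F, points, h=1e-5, tol=1e-4):
    """Accept F as an antiderivative of f if F' matches f at every sample point."""
    for x in points:
        dF = (F(x + h) - F(x - h)) / (2 * h)  # central finite difference
        if abs(dF - f(x)) > tol * max(1.0, abs(f(x))):
            return False
    return True


def setter_reward(is_valid, solver_successes):
    """Setter reward gated by the verifier and scaled by how hard the solver found the problem."""
    if not is_valid:
        return 0.0  # invalid problems earn nothing, however hard they look
    pass_rate = sum(solver_successes) / len(solver_successes)
    return 1.0 - pass_rate  # lower solver pass rate -> higher reward


pts = [0.1 * i for i in range(1, 20)]
valid = numeric_antiderivative_check(math.cos, math.sin, pts)
print(setter_reward(valid, [True, False, False, False]))  # → 0.75
```

Gating on validity first is what removes the shortcut a two-party setter has, namely posing unsolvable or ill-formed problems to maximize the solver's failure rate.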

Updated: 2026-05-07 17:58:32

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2605.06660v1

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages from 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of the decisive votes cancel out, and even the top 50 models according to the global BT ranking are statistically indistinguishable (pairwise win probabilities are at most 0.53 within the top 50 models). We trace this failure to strong, structured heterogeneity of opinions across language, task, and time. Moreover, we find that *language* plays a key role. Grouping by language (and language families) increases the agreement of votes massively, resulting in a spread of Elo scores two orders of magnitude larger (i.e., very consistent rankings). What appears as global noise is in fact a mixture of coherent but conflicting subpopulations. To address such heterogeneity in supervised machine learning, we introduce the framework of $(λ, ν)$-portfolios, which are small sets of models that achieve a prediction error of at most $λ$ while "covering" at least a $ν$ fraction of users. We formulate this as a variant of the set cover problem and provide guarantees using the VC dimension of the underlying set system. On the Arena data, our algorithms recover just 5 distinct BT rankings that cover over 96% of votes at a modest $λ$, compared to the 21% coverage by the global ranking. We also provide a portfolio of 6 LLMs that cover twice as many votes as the top-6 LLMs from a global ranking. We further construct portfolios for a classification problem on the COMPAS dataset using an ensemble of fairness-regularized classification models and show that these portfolios can be used to detect blind spots in the data, which might be of independent interest to policymakers.
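
Because the portfolio problem is cast as a set-cover variant, the classic greedy set-cover heuristic gives a feel for how a small portfolio is assembled. This is a sketch under stated assumptions, not the paper's algorithm. A user counts as covered when some chosen candidate has error at most λ on them. The candidates "A", "B", "C" and their errors are toy values.

```python
def greedy_portfolio(errors, lam, nu, max_size):
    """Greedily pick candidates until at least a `nu` fraction of users is covered.

    errors[m][u] is the prediction error of candidate m on user u; a user counts as
    covered when some chosen candidate has error at most `lam` on them.
    """
    n_users = len(next(iter(errors.values())))
    covers = {m: {u for u, e in enumerate(errs) if e <= lam} for m, errs in errors.items()}
    chosen, covered = [], set()
    while len(covered) < nu * n_users and len(chosen) < max_size:
        best = max(covers, key=lambda m: len(covers[m] - covered))  # largest marginal gain
        gain = covers[best] - covered
        if not gain:
            break  # no remaining candidate helps
        chosen.append(best)
        covered |= gain
    return chosen, len(covered) / n_users


errors = {
    "A": [0.1, 0.1, 0.9, 0.9],
    "B": [0.9, 0.9, 0.1, 0.9],
    "C": [0.1, 0.9, 0.1, 0.1],
}
portfolio, coverage = greedy_portfolio(errors, lam=0.2, nu=1.0, max_size=3)
# portfolio == ["C", "A"], coverage == 1.0
```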

Updated: 2026-05-07 17:57:58

Categories: cs.LG,cs.DM,cs.ET,math.OC

Download: http://arxiv.org/abs/2605.06656v1

REMAP: Regularized Matching and Partial Alignment of Video Embeddings

Real-world instructional videos are long, noisy, and often contain extended background segments, repeated actions, and execution variability that do not correspond to meaningful procedural steps. We propose **REMAP**, an unsupervised framework for procedure learning based on *Regularized Fused Partial Gromov-Wasserstein Optimal Transport*. REMAP relaxes balanced transport constraints, allowing non-informative or redundant frames to remain unmatched through partial transport. The formulation jointly models semantic similarity and temporal structure, while incorporating Laplacian-based smoothness and structural regularization to prevent degenerate alignments and reduce background interference. We evaluate REMAP on large-scale egocentric and third-person benchmarks. The method consistently outperforms state-of-the-art approaches, achieving up to **11.6% (+4.45pp)** F1 and **19.6% (+4.73pp)** IoU improvements on EgoProceL, and an average **41% (+17.15pp)** F1 gain on ProceL and CrossTask. These results highlight the importance of partial alignment in handling real-world procedural variability and demonstrate that REMAP provides a robust and scalable approach for instructional video understanding.
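
REMAP's key relaxation lets uninformative frames stay unmatched. It can be illustrated with the standard dummy-point reduction of partial optimal transport: add one dummy point per side that absorbs the unmatched mass at zero cost, then run entropic Sinkhorn on the enlarged problem. This is a simplified stand-in, not REMAP itself. It omits the Gromov-Wasserstein (structural) term, the temporal modelling and the Laplacian regularizers. The matched mass `s`, `eps` and the toy cost matrix are assumptions.

```python
import numpy as np


def partial_sinkhorn(cost, s, eps=0.1, n_iter=5000, big=1e3):
    """Entropic partial transport between two frame sequences with uniform mass.

    Only a total mass s (0 < s <= 1) must be matched; the rest goes to a dummy
    point on each side at zero cost, so background frames can stay unmatched.
    """
    n, m = cost.shape
    a = np.append(np.full(n, 1.0 / n), 1.0 - s)  # frame masses + dummy
    b = np.append(np.full(m, 1.0 / m), 1.0 - s)
    C = np.zeros((n + 1, m + 1))
    C[:n, :m] = cost
    C[n, m] = big  # forbid dummy-to-dummy flow
    K = np.exp(-C / eps)
    u, v = np.ones(n + 1), np.ones(m + 1)
    for _ in range(n_iter):  # alternate row and column scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    return plan[:n, :m]  # drop the dummy row and column


# Frames 0 and 1 of video A match frames 0 and 1 of video B; frame 2 is background.
cost = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
P = partial_sinkhorn(cost, s=2 / 3)
```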

Updated: 2026-05-07 17:57:56

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2509.24382v2

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

Optimizers play an important role in both the pretraining and finetuning stages of training large language models (LLMs). In this paper, we observe that, during the supervised finetuning (SFT) stage, full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff (forgetting less while achieving the same or better performance on the new task) than other optimizers and, possibly surprisingly, than LoRA. We term this phenomenon optimizer-model consistency. To better understand it, through controlled experiments and theoretical analysis, we show that: 1) optimizers can shape the models by having regularization effects on the activations, leading to different landscapes around the pretrained checkpoints; 2) in response to this regularization effect, the weight update in SFT should follow some specific structures to reduce forgetting of the knowledge learned in pretraining, and such updates can be obtained by using the same optimizer. Moreover, we specifically compare Muon and AdamW when they are employed throughout the pretraining and SFT stages and find that Muon performs worse when finetuned for reasoning tasks. With a synthetic language modeling experiment, we demonstrate that this can come from Muon's strong tendency towards rote memorization, which may hurt pattern acquisition from small amounts of data, as in SFT.
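
The two optimizers produce differently structured updates. Adam's first bias-corrected step acts like an elementwise sign of the gradient, while Muon replaces a matrix gradient with an approximation of its nearest semi-orthogonal matrix. This is a sketch for illustration, not the paper's code. The classic cubic Newton-Schulz iteration is used here for clarity. To my understanding, Muon's reference implementation applies a tuned quintic variant to a momentum buffer.

```python
import numpy as np


def adam_first_step(grad, eps=1e-8):
    """Adam's direction at step 1 (bias-corrected m = g, v = g**2): roughly sign(g)."""
    return grad / (np.sqrt(grad ** 2) + eps)


def orthogonalize(grad, steps=30):
    """Approximate the polar factor U V^T of grad = U S V^T with cubic Newton-Schulz."""
    X = grad / (np.linalg.norm(grad) + 1e-12)  # Frobenius scaling puts singular values in (0, 1]
    tall = X.shape[0] > X.shape[1]
    if tall:
        X = X.T  # iterate on the wide orientation so X @ X.T is the small Gram matrix
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # pushes every singular value toward 1
    return X.T if tall else X


G = np.array([[3.0, 1.0], [1.0, 2.0], [0.5, 0.0]])
print(np.round(adam_first_step(G), 3))      # each nonzero entry becomes about +/-1
O = orthogonalize(G)
print(np.round(O.T @ O, 6))                 # about the identity: all singular values equal 1
```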

Updated: 2026-05-07 17:57:02

Categories: cs.LG,cs.AI,math.OC

Download: http://arxiv.org/abs/2605.06654v1

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

Many deployments must compare candidate language models for safety before a labeled benchmark exists for the relevant language, sector, or regulatory regime. We formalize this setting as benchmarkless comparative safety scoring and specify the contract under which a scenario-based audit can be interpreted as deployment evidence. Scores are valid only under a fixed scenario pack, rubric, auditor, judge, sampling configuration, and rerun budget. Because no labels are available, we replace ground-truth agreement with an instrumental-validity chain: responsiveness to a controlled safe-versus-abliterated contrast, dominance of target-driven variance over auditor and judge artifacts, and stability across reruns. We instantiate the chain in SimpleAudit, a local-first scoring instrument, and validate it on a Norwegian safety pack. Safe and abliterated targets separate with AUROC values between 0.89 and 1.00, target identity is the dominant variance component ($η^2 \approx 0.52$), and severity profiles stabilize by ten reruns. Applying the same chain to Petri shows that it admits both tools. The substantial differences arise upstream of the chain, in claim-contract enforcement and deployment fit. A Norwegian public-sector procurement case comparing Borealis and Gemma 3 demonstrates the resulting evidence in practice: the safer model depends on scenario category and risk measure. Consequently, scores, matched deltas, critical rates, uncertainty, and the auditor and judge used must be reported together rather than collapsed into a single ranking.
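
The variance-dominance link in the chain can be illustrated with a one-way η² per factor: the share of total score variance explained by grouping audit runs by target, auditor, or judge. This is a minimal sketch. The paper's decomposition may use a multi-factor model, and the runs below are toy numbers, not results from the paper.

```python
def eta_squared(scores, labels):
    """Share of total score variance explained by one factor (between-group SS / total SS)."""
    grand = sum(scores) / len(scores)
    ss_total = sum((s - grand) ** 2 for s in scores)
    groups = {}
    for s, g in zip(scores, labels):
        groups.setdefault(g, []).append(s)
    ss_between = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
    return ss_between / ss_total if ss_total > 0 else 0.0


# Toy audit runs: (target model, judge, severity score).
runs = [
    ("safe", "judge1", 0.10), ("safe", "judge2", 0.20),
    ("abliterated", "judge1", 0.80), ("abliterated", "judge2", 0.90),
]
scores = [r[2] for r in runs]
eta_target = eta_squared(scores, [r[0] for r in runs])  # variance tied to target identity
eta_judge = eta_squared(scores, [r[1] for r in runs])   # variance tied to the judge
print(eta_target, eta_judge)
```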

Updated: 2026-05-07 17:56:41

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2605.06652v1

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature search, computational exploration, theorem proving and theory building. By providing an asynchronous, stateful workspace that manages uncertainty, refines user intent, tracks failed hypotheses, and outputs native mathematical artifacts, the system mirrors human collaborative workflows. In early tests, the AI co-mathematician helped researchers solve open problems, identify new research directions, and uncover overlooked literature references. Besides demonstrating a highly interactive paradigm for AI-assisted mathematical discovery, the AI co-mathematician also achieves state-of-the-art results on hard problem-solving benchmarks, including a score of 48% on FrontierMath Tier 4, a new high score among all AI systems evaluated.

Updated: 2026-05-07 17:56:32

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06651v1

Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcomer searches an unfamiliar database rather than how an expert navigates it with strong priors about terminology and likely evidence, and it results in unnecessary retrieval rounds, increased latency, and poor recall. We introduce SuperIntelligent Retrieval Agent (SIRA), which defines superintelligence in retrieval as the ability to compress multi-round exploratory search into a single corpus-discriminative retrieval action. SIRA does not merely ask which terms are relevant to the query; it asks which terms are likely to separate the desired evidence from corpus-level confusers. On the corpus side, an LLM enriches each document offline with missing search vocabulary; on the query side, it predicts evidence vocabulary omitted by the query; and document-frequency statistics, exposed as a tool call, filter out proposed terms that are absent, overly common, or unlikely to create retrieval margin. The final retrieval step is a single weighted BM25 call combining the original query with the validated expansion. Across ten BEIR benchmarks and downstream question-answering tasks, SIRA significantly outperforms dense retrievers and state-of-the-art multi-round agentic baselines, demonstrating that one well-formed lexical query, guided by LLM cognition and lightweight corpus statistics, can exceed substantially more expensive multi-round search while remaining interpretable, training-free, and efficient.
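
The final retrieval step can be sketched in plain Python. It is one weighted BM25 call over the original query plus expansion terms that have passed a document-frequency filter. Assumptions:

- whitespace tokenization;
- the standard Okapi BM25 formula with k1 = 1.2 and b = 0.75;
- a hypothetical 70% document-frequency cutoff;
- a 0.5 weight for expansion terms.

The offline document enrichment and the LLM term proposals are not shown.

```python
import math
from collections import Counter


def doc_freq(tokenized_docs):
    """Number of documents containing each term."""
    return Counter(term for toks in tokenized_docs for term in set(toks))


def filter_expansion(terms, df, n_docs, max_df_ratio=0.7):
    """Keep proposed terms that occur in the corpus but are not near-ubiquitous."""
    return [t for t in terms if 0 < df.get(t, 0) <= max_df_ratio * n_docs]


def weighted_bm25(query_weights, tokenized_docs, df, k1=1.2, b=0.75):
    """Okapi BM25 where each query term's contribution is scaled by a weight."""
    n = len(tokenized_docs)
    avgdl = sum(len(t) for t in tokenized_docs) / n
    scores = []
    for toks in tokenized_docs:
        tf, dl, s = Counter(toks), len(toks), 0.0
        for term, w in query_weights.items():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += w * idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores


docs = ["insulin resistance in type 2 diabetes",
        "solar panel efficiency in winter",
        "glucose metabolism in the liver"]
toks = [d.split() for d in docs]
df = doc_freq(toks)
proposed = ["insulin", "glucose", "in", "zebra"]       # e.g. terms an LLM might suggest
kept = filter_expansion(proposed, df, len(docs))       # drops "in" (too common), "zebra" (absent)
weights = {"diabetes": 1.0, **{t: 0.5 for t in kept}}  # original query term weighted higher
print(weighted_bm25(weights, toks, df))
```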

Updated: 2026-05-07 17:54:29

Categories: cs.IR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2605.06647v1

Inductive Venn-Abers and related regressors

Venn-Abers predictors are probabilistic predictors that enjoy appealing properties of validity, but their major limitation is that they are applicable only to the case of binary classification, with a recent extension to bounded regression. We generalize them to the case of unbounded regression, which requires adding an element of conformal prediction. In our simulation and empirical studies we investigate the predictive efficiency of point regressors derived from Venn-Abers regressors and argue that they somewhat improve the predictive efficiency of standard regressors for larger training sets.
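
For reference, the binary inductive Venn-Abers predictor that this work generalizes fits in a few lines. Fit isotonic regression to the calibration scores twice, once with the test object provisionally labelled 0 and once labelled 1. Then read off the two fitted values (p0, p1) at the test score. This is a minimal pure-Python sketch. The merge p1 / (1 - p0 + p1) is one commonly used way to collapse the pair into a single probability. The unbounded-regression extension and its conformal component are not shown.

```python
def isotonic_fit(xs, ys):
    """Pool-adjacent-violators: non-decreasing fit of ys on xs; returns {x: fitted value}."""
    agg = {}
    for x, y in zip(xs, ys):  # tied scores must share one fitted value, so pool them first
        s, n = agg.get(x, (0.0, 0))
        agg[x] = (s + y, n + 1)
    keys = sorted(agg)
    blocks = []  # each block: [sum of labels, count, first key index, last key index]
    for i, key in enumerate(keys):
        s, n = agg[key]
        blocks.append([s, n, i, i])
        # merge backwards while the previous block's mean exceeds the current one
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, n2, _, j2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][3] = j2
    fitted = {}
    for s, n, i, j in blocks:
        for idx in range(i, j + 1):
            fitted[keys[idx]] = s / n
    return fitted


def ivap(cal_scores, cal_labels, test_score):
    """Return (p0, p1) for one test score, plus the merged probability."""
    p0 = isotonic_fit(cal_scores + [test_score], cal_labels + [0])[test_score]
    p1 = isotonic_fit(cal_scores + [test_score], cal_labels + [1])[test_score]
    return p0, p1, p1 / (1 - p0 + p1)


p0, p1, p = ivap([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], 0.5)
```

The width of the interval [p0, p1] shrinks as the calibration set grows, which is one intuition for why the derived point predictors improve mainly on larger training sets.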

Updated: 2026-05-07 17:52:08

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06646v1

Edge-specific signal propagation on mature chromophore-region 3D mechanism graphs for fluorescent protein quantum-yield prediction

Fluorescent protein quantum yield (QY) is governed by the mature chromophore and its three-dimensional microenvironment rather than sequence identity alone. Protein language models and emission-band averages capture global trends, but do not model how local physical signals act on specific chromophore regions. We present a chromophore-centred mechanism graph algorithm for QY prediction. Each PDB structure is converted into a typed 3D residue graph, registered to a mature-CRO state, partitioned into phenolate, bridge and imidazolinone regions, and transformed by channel-signal-region propagation. The representation contains 121 enrichment features; after removing identity shortcuts, 52 non-identity features are used for band-specific ExtraTrees regression. Because each feature encodes a contact channel, seed signal and target CRO region, interpretation is intrinsic rather than post hoc. On a 531-protein benchmark, the method achieved the best random-CV performance among model-based baselines (R = 0.772 +/- 0.008, MAE = 0.131 +/- 0.002), exceeding Band mean (R = 0.632), ESM-C (R = 0.734) and SaProt (R = 0.731), and ranked first in bright screening (Bright P@5 = 0.704). Under homology control, the advantage was clearest in the remote bucket (<50% similarity; R = 0.697 versus 0.633, 0.575 and 0.408), with the strongest overall bright/dark Top-K screening. Stable selected features recovered band-specific mechanisms: aromatic packing and clamp asymmetry in GFP-like proteins, charge/clamp balance in Red proteins, and flexibility-risk/bulky-contact features in Far-red proteins. Source code, feature tables and evaluation scripts are available from the first author upon request. Contact: yuchenak05@gmail.com
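
The intrinsic interpretability follows from how such features can be keyed: every feature name spells out a contact channel, a seed signal and a target CRO region. This is a minimal sketch of that bookkeeping, assuming plain contact counts. The paper's enrichment scoring, propagation rules and region assignment are not reproduced, and all residue and label names are hypothetical.

```python
from collections import Counter


def mechanism_features(contacts, residue_signal):
    """Count contacts keyed by (channel, seed signal, target chromophore region).

    contacts: iterable of (residue_id, cro_region, channel) for residues near the mature
    chromophore; residue_signal maps residue_id -> a physical signal label.
    """
    feats = Counter()
    for residue, region, channel in contacts:
        feats[f"{channel}|{residue_signal[residue]}|{region}"] += 1
    return feats


# Illustrative labels only; the abstract gives no typing rules or residue assignments.
contacts = [
    ("r1", "phenolate", "stacking"),
    ("r2", "phenolate", "hbond"),
    ("r3", "phenolate", "hbond"),
    ("r4", "imidazolinone", "salt_bridge"),
]
signals = {"r1": "aromatic", "r2": "polar", "r3": "polar", "r4": "charged"}
feats = mechanism_features(contacts, signals)
# Each key reads directly as a mechanism, e.g. "hbond|polar|phenolate" -> 2 contacts.
```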

Updated: 2026-05-07 17:51:41

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06644v1

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

Despite the growing popularity of Multimodal Domain Generalization (MMDG) for enhancing model robustness, it remains unclear whether reported performance gains reflect genuine algorithmic progress or are artifacts of inconsistent evaluation protocols. Current research is fragmented, with studies varying significantly across datasets, modality configurations, and experimental settings. Furthermore, existing benchmarks focus predominantly on action recognition, often neglecting critical real-world challenges such as input corruptions, missing modalities, and model trustworthiness. This lack of standardization obscures a reliable assessment of the field's advancement. To address this issue, we introduce MMDG-Bench, the first unified and comprehensive benchmark for MMDG, which standardizes evaluation across six datasets spanning three diverse tasks: action recognition, mechanical fault diagnosis, and sentiment analysis. MMDG-Bench encompasses six modality combinations, nine representative methods, and multiple evaluation settings. Beyond standard accuracy, it systematically assesses corruption robustness, missing-modality generalization, misclassification detection, and out-of-distribution detection. With 7,402 neural networks trained in total across 95 unique cross-domain tasks, MMDG-Bench yields five key findings: (1) under fair comparisons, recent specialized MMDG methods offer only marginal improvements over the ERM baseline; (2) no single method consistently outperforms others across datasets or modality combinations; (3) a substantial gap to upper-bound performance persists, indicating that MMDG remains far from solved; (4) trimodal fusion does not consistently outperform the strongest bimodal configurations; and (5) all evaluated methods exhibit significant degradation under corruption and missing-modality scenarios, with some methods further compromising model trustworthiness.

Updated: 2026-05-07 17:51:16

Categories: cs.CV,cs.AI,cs.LG,cs.MM

Download: http://arxiv.org/abs/2605.06643v1

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajectory Abstraction (StraTA), a simple framework that introduces an explicit trajectory-level strategy into agentic reinforcement learning (RL). StraTA samples a compact strategy from the initial task state, conditions subsequent actions on that strategy, and trains strategy generation and action execution jointly with a hierarchical GRPO-style rollout design, further enhanced by diverse strategy rollout and critical self-judgment. Experiments on ALFWorld, WebShop, and SciWorld show that StraTA consistently improves both sample efficiency and final performance over strong baselines. StraTA reaches success rates of 93.1% on ALFWorld and 84.2% on WebShop. On SciWorld, StraTA attains a 63.5% overall score, outperforming frontier closed-source models.
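
A hierarchical GRPO-style rollout can be sketched as two levels of group-relative advantages. Several strategies are sampled for the same task, and several action rollouts run under each strategy. Strategies are then scored against each other, and actions against siblings that share their strategy. This is a minimal sketch under that assumption. Diverse strategy rollout, self-judgment and the policy-gradient update are omitted, and StraTA's exact grouping may differ.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward standardized within its sampled group."""
    mu, sd = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def hierarchical_advantages(rollouts):
    """rollouts: strategy id -> returns of the episodes run while following that strategy.

    Strategies are compared by their mean return; actions are compared only against
    rollouts that share the same strategy.
    """
    ids = list(rollouts)
    strategy_adv = dict(zip(ids, group_advantages([mean(rollouts[s]) for s in ids])))
    action_adv = {s: group_advantages(rollouts[s]) for s in ids}
    return strategy_adv, action_adv


strategy_adv, action_adv = hierarchical_advantages({"A": [1.0, 0.0], "B": [0.0, 0.0]})
```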

Updated: 2026-05-07 17:51:16

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2605.06642v1

Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models

*Concept-based explanations* offer a promising approach for explaining the predictions of deep neural networks in terms of high-level, human-understandable concepts. However, existing methods either do not establish a causal connection between the concepts and model predictions or are limited in expressivity and only able to infer causal explanations involving single concepts. At the same time, the parallel line of work on *formal abductive and contrastive explanations* computes the minimal set of input features causally relevant for model outcomes but only considers low-level features such as pixels. Merging these two threads, in this work, we propose the notion of *concept-based abductive and contrastive explanations* that capture the minimal sets of high-level concepts causally relevant for model outcomes. We then present a family of algorithms that enumerate all minimal explanations while using *concept erasure* procedures to establish causal relationships. By appropriately aggregating such explanations, we are not only able to understand model predictions on individual images but also on collections of images where the model exhibits a user-specified, common *behavior*. We evaluate our approach on multiple models, datasets, and behaviors, and demonstrate its effectiveness in computing helpful, user-friendly explanations.
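
Brute-force enumeration over a concept-erasure oracle illustrates both explanation types. An abductive explanation is a minimal set of concepts that, kept alone, preserves the prediction. A contrastive explanation is a minimal set whose erasure changes it. This sketch assumes a hypothetical `predict_with_only(kept)` that returns the model's label after erasing every concept outside `kept`. The enumeration is exponential and stands in for, rather than reproduces, the paper's algorithms.

```python
from itertools import combinations


def minimal_sets(concepts, holds):
    """Enumerate subset-minimal concept sets S with holds(S) True, by increasing size."""
    found = []
    for size in range(len(concepts) + 1):
        for combo in combinations(concepts, size):
            s = frozenset(combo)
            if any(m <= s for m in found):
                continue  # a smaller explanation is already inside s
            if holds(s):
                found.append(s)
    return found


def abductive(concepts, predict_with_only):
    target = predict_with_only(frozenset(concepts))
    return minimal_sets(concepts, lambda kept: predict_with_only(kept) == target)


def contrastive(concepts, predict_with_only):
    everything = frozenset(concepts)
    target = predict_with_only(everything)
    return minimal_sets(concepts, lambda erased: predict_with_only(everything - erased) != target)


# Toy oracle: "dog" needs fur plus either a snout or ears.
def toy_model(kept):
    return "dog" if "fur" in kept and ("snout" in kept or "ears" in kept) else "other"


concepts = ["fur", "snout", "ears", "grass"]
why = abductive(concepts, toy_model)       # {fur, snout} and {fur, ears}
why_not = contrastive(concepts, toy_model)  # {fur} and {snout, ears}
```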

Updated: 2026-05-07 17:51:13

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2605.06640v1

GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation

Because glaze chemistry is complex, developing ceramic glazes is a costly, time-consuming process of trial and error, placing a significant burden on independent artists. While recent advances in multimodal AI offer a potential solution, the field lacks the large-scale datasets required to train these models. We propose GlazyBench, the first dataset for AI-assisted glaze design. Comprising 23,148 real glaze formulations, GlazyBench supports two primary tasks: predicting post-firing surface properties, such as color and transparency, from raw materials, and generating accurate visual representations of the glaze based on these properties. We establish comprehensive baselines for property prediction using traditional machine learning and large language models, alongside image generation benchmarks using deep generative and large multimodal models. Our experiments show promising results while indicating that both tasks remain challenging. GlazyBench pioneers a new research direction in AI-assisted material design, providing a standardized benchmark for systematic evaluation.

Updated: 2026-05-07 17:51:13

Categories: cs.AI,cs.CV

Download: http://arxiv.org/abs/2605.06641v1

Predictive and Prescriptive AI toward Optimizing Wildfire Suppression

Intense wildfire seasons require critical prioritization decisions to allocate scarce suppression resources over a dispersed geographical area. This paper develops a predictive and prescriptive approach to jointly optimize crew assignments and wildfire suppression. The problem features a discrete resource-allocation structure with endogenous wildfire demand and non-linear wildfire dynamics. We formulate an integer optimization model with crew assignments on a time-space-rest network, wildfire dynamics on a time-state network, and linking constraints between them. We develop a two-sided branch-and-price-and-cut algorithm based on: (i) a two-sided column generation scheme that generates fire suppression plans and crew routes iteratively; (ii) a new family of cuts exploiting the knapsack structure of the linking constraints; and (iii) novel branching rules to accommodate non-linear wildfire dynamics. We also propose a data-driven double machine learning approach to estimate wildfire spread as a function of covariate information and suppression efforts, mitigating observed confounding between historical crew assignments and wildfire growth. Extensive computational experiments show that the optimization algorithm scales to otherwise intractable real-world instances, and that the methodology can enhance suppression effectiveness in practice, significantly reducing the area burned over a wildfire season and guiding resource sharing across wildfire jurisdictions.

Updated: 2026-05-07 17:49:30

Categories: math.OC,cs.AI,cs.LG

Download: http://arxiv.org/abs/2605.04510v2

Recursive Agent Optimization

We introduce Recursive Agent Optimization (RAO), a reinforcement learning approach for training recursive agents: agents that can spawn and delegate sub-tasks to new instantiations of themselves recursively. Recursive agents implement an inference-time scaling algorithm that naturally allows agents to scale to longer contexts and generalize to more difficult problems via divide-and-conquer. RAO provides a method to train models to best take advantage of such recursive inference, teaching agents when and how to delegate and communicate. We find that recursive agents trained in this way enjoy better training efficiency, can scale to tasks that go beyond the model's context window, generalize to tasks much harder than the ones the agent was trained on, and can enjoy reduced wall-clock time compared to single-agent systems.

Updated: 2026-05-07 17:49:09

Categories: cs.LG,cs.AI,cs.CL,cs.MA

Download: http://arxiv.org/abs/2605.06639v1

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLogic, a synthetic logical reasoning framework that offers independent control over two axes of difficulty: the depth of the required proof planning (i.e., the horizon) and the expressiveness of the underlying logic. Our proposed framework supports a wide range of logics: from simple implication-only logic ("if-then") towards more expressive first-order reasoning with conjunction ("and"), disjunction ("or"), negation ("not"), and universal quantification ("for all"). Using this framework, we show that the RL training compute $T$ follows a power law with respect to reasoning depth $D$ ($T \propto D^{\gamma}$, $R^{2} > 0.99$), and that the scaling exponent $\gamma$ increases monotonically with logical expressiveness, from $1.04$ to $2.60$. On downstream mathematics and general reasoning benchmarks, more expressive training settings yield both larger performance gains (up to $+10.66$ points) and more compute-efficient transfer compared to less expressive settings, demonstrating that what a model is trained on, not just how much it is trained, shapes downstream transfer. We further show that the power-law relationship holds across multiple RL methods, and curriculum-based training substantially improves scaling efficiency.
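
A power law $T \propto D^{\gamma}$ is a straight line in log-log space, $\log T = \log c + \gamma \log D$, so one standard way to estimate $\gamma$ and the reported $R^{2}$ is ordinary least squares on the logged values. The sketch below assumes that procedure; the abstract does not state the paper's exact fitting method, and the data here are synthetic, not the paper's measurements.

```python
import math

def fit_power_law(depths, compute):
    """Fit T ~= c * D**gamma by ordinary least squares on (log D, log T).

    Returns (gamma, c, r_squared), with r_squared computed in log-log space.
    """
    xs = [math.log(d) for d in depths]
    ys = [math.log(t) for t in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                    # slope = scaling exponent
    log_c = mean_y - gamma * mean_x      # intercept = log of the prefactor
    ss_res = sum((y - (log_c + gamma * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return gamma, math.exp(log_c), 1.0 - ss_res / ss_tot

# Hypothetical, noise-free measurements generated with gamma = 2.0 and c = 3.0.
depths = [2, 4, 8, 16, 32]
compute = [3.0 * d ** 2.0 for d in depths]
gamma, c, r2 = fit_power_law(depths, compute)
```

On noise-free data the fit recovers the generating exponent exactly and $R^{2} = 1$; with real training runs, a larger $\gamma$ means compute grows faster as the required proof depth increases.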

Updated: 2026-05-07 17:48:42

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2605.06638v1

On the optimization dynamics of RLVR: Gradient gap and step size thresholds

Reinforcement Learning with Verifiable Rewards (RLVR), which uses simple binary feedback to post-train large language models, has found significant empirical success. However, a principled understanding of why it works is lacking. This paper builds a theoretical foundation for RLVR by analyzing its training process at both the full-response (trajectory) and token levels. Central to our analysis is a new quantity called the Gradient Gap, which formalizes the direction of improvement from low-reward to high-reward regions of the response space. We prove that convergence critically depends on aligning the update direction with this Gradient Gap. Moreover, we derive a sharp step-size threshold based on the magnitude of the Gradient Gap: below it, learning converges, whereas above it, performance collapses. Our theory further predicts how the critical step size must scale with response length and the success rate, thereby explaining why practical heuristics such as length normalization improve stability and showing that, with a fixed learning rate, the success rate can stagnate strictly below $100\%$. Importantly, our theory applies to any policy-gradient algorithm and therefore characterizes the dynamics of popular approaches such as REINFORCE and GRPO. We validate these predictions through controlled bandit simulations and language model experiments on post-training Qwen2.5-Math-7B with GRPO.

Updated: 2026-05-07 17:44:57

Categories: cs.LG,cs.AI,cs.IT,math.OC,stat.ML

Download: http://arxiv.org/abs/2510.08539v4

Crafting Reversible SFT Behaviors in Large Language Models

Supervised fine-tuning (SFT) induces new behaviors in large language models, yet imposes no structural constraint on how these behaviors are distributed within the model. Existing behavior interpretation methods, such as circuit attribution approaches, identify sparse subnetworks correlated with SFT-induced behaviors post-hoc. However, such correlations do not imply *causal necessity*, limiting the ability to selectively control SFT-induced behaviors at inference time. We pursue an alternative by asking: can an SFT-induced behavior be deliberately compressed into a sparse, mechanistically necessary subnetwork, termed a *carrier*, while remaining controllable at inference time without weight modification? We propose (a) **Loss-Constrained Dual Descent (LCDD)**, which constructs such carriers by jointly optimizing routing masks and model weights under an explicit utility budget, and (b) **SFT-Eraser**, a soft prompt optimized via activation matching on extracted carrier channels, to reverse the SFT-induced behavior. Across safety, fixed-response, and style behaviors on multiple model families, LCDD yields sparse carriers that preserve target behaviors while enabling strong reversion when triggered by SFT-Eraser. Ablations further establish that the sparse structure is the key precondition for reversal: the same trigger optimization fails on standard SFT models, confirming that structure rather than trigger design is the operative factor. These results provide direct evidence that the learned carriers are causally necessary for the behaviors, pointing to a new direction for systematically localizing and selectively suppressing SFT-induced behaviors in deployed models.

Updated: 2026-05-07 17:44:07

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06632v1

Hybrid Quantum-Classical GANs for the Generation of Adversarial Network Flows

Classical generative adversarial networks (GANs) have been applied to generate adversarial network traffic capable of attacking intrusion detection systems, but they suffer from shortcomings such as the need for large, high-dimensional datasets, mode collapse, and high computational overhead. In this work, we propose a hybrid quantum-classical GAN (QC-GAN) framework in which a variational quantum generator produces synthetic network traffic flows that mimic malicious traffic from latent representations. Instead of sampling classical noise vectors, we encode the latent vector (the hidden features) as a quantum state, which is the basis for our claim of more expressive latent representations and reduced computational overhead. A classical discriminator is trained on a real-world dataset (UNSW-NB15) and on the fake network flows generated by the proposed QC-GAN. The two are trained iteratively: the generator aims to minimize the discriminator's ability to distinguish real from fake traffic, while the discriminator aims to maximize its classification accuracy. In our attack model, we assume that the attacker is a state actor with access to limited quantum computing power, whereas the discriminator is classical, as will likely be the case for most end users and organizations. We test the generated flows against classical intrusion detection system (IDS) models, such as a random forest classifier and a convolutional neural network-based classifier, to measure their ability to bypass detection. This work aims to highlight the potential of quantum machine learning for generating advanced attack flows and stress-testing classical IDS. Lastly, we evaluate how hardware noise affects these attacks, offering a new perspective on IDS and highlighting the need for quantum-resilient defense systems.

Updated: 2026-05-07 17:43:25

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06629v1

LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation

Modern sensors generate rich, high-fidelity data, yet applications operating on wearable or remote sensing devices remain constrained by bandwidth and power budgets. Standardized codecs such as JPEG and MPEG achieve efficient trade-offs between bitrate and perceptual quality but are designed for human perception, limiting their applicability to machine-perception tasks and non-traditional modalities such as spatial audio arrays, hyperspectral images, and 3D medical images. General-purpose compression schemes based on scalar quantization or resolution reduction are broadly applicable but fail to exploit inherent signal redundancies, resulting in suboptimal rate-distortion performance. Recent generative neural codecs, or tokenizers, model complex signal dependencies but are often over-parameterized, data-hungry, and modality-specific, making them impractical for resource-constrained environments. We introduce a Lightweight, Versatile, and Asymmetric neural codec architecture (LiVeAction), which addresses these limitations through two key ideas. (1) To reduce the complexity of the encoder to meet the resource constraints of the execution environments, we impose an FFT-like structure and reduce the overall size and depth of the neural-network-based analysis transform. (2) To allow arbitrary signal modalities and simplify training, we replace adversarial and perceptual losses with a variance-based rate penalty. Our design produces codecs that deliver superior rate-distortion performance compared to state-of-the-art generative tokenizers, while remaining practical for deployment on low-power sensors. We release our code, experiments, and Python library at https://github.com/UT-SysML/liveaction .

Updated: 2026-05-07 17:42:38

Categories: eess.IV,cs.LG,cs.MM,eess.AS,eess.SP

Download: http://arxiv.org/abs/2605.06628v1

Flexible Agent Alignment with Goal Inference from Open-Ended Dialog

We introduce Open-Universe Assistance Games (OU-AGs), a formal framework extending assistance games to LLM-based agents. Effective assistance requires reasoning over human preferences that are unbounded, underspecified, and evolving. Current LLM agents struggle with multi-turn interaction and with maintaining accurate models of user intent in collaborative settings. Existing assistance game formulations assume fixed, predefined preferences, an assumption that breaks down in open-ended dialogue where goals are revised incrementally and expressed in natural language. Grounded in cognitive science accounts of preference construction, we represent human preferences as a dynamically updated distribution over discrete natural-language goals. To operationalize OU-AGs, we introduce GOOD (GOals from Open-ended Dialogue), a data-efficient online method that extracts and ranks candidate goals during interaction, using LLM-simulated users to perform probabilistic inference over goal hypotheses. This allows for interpretable, uncertainty-aware preference representations without large offline datasets. We evaluate GOOD across three text-based domains: grocery shopping, household robotics (AI2-THOR), and coding. Compared to baselines without explicit goal tracking, GOOD produces semantically coherent goal representations and improves alignment with user intent across domains.

Updated: 2026-05-07 17:41:50

Categories: cs.AI,cs.CL,cs.LG,cs.RO

Download: http://arxiv.org/abs/2508.15119v2

PianoCoRe: Combined and Refined Piano MIDI Dataset

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents PianoCoRe, a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora. The dataset contains 250,046 performances of 5,625 pieces written by 483 composers, totaling 21,763 h of performed music. PianoCoRe is released in tiered subsets to support different applications: from large-scale analysis and pre-training (PianoCoRe-C and deduplicated PianoCoRe-B) to expressive performance modeling with note-level score alignment (PianoCoRe-A/A*). The note-aligned subset, PianoCoRe-A, is the largest open-source collection of its kind to date, with 157,207 performances aligned to 1,591 scores. In addition to the dataset, the contributions are: (1) a MIDI quality classifier for detecting corrupted and score-like transcriptions and (2) RAScoP, an alignment refinement pipeline that cleans temporal alignment errors and interpolates missing notes. The analysis shows that the refinement reduces temporal noise and eliminates tempo outliers. Moreover, an expressive performance rendering model trained on PianoCoRe demonstrates improved robustness to unseen pieces compared to models trained on raw or smaller datasets. PianoCoRe provides a ready-to-use foundation for the next generation of expressive piano performance research.

Updated: 2026-05-07 17:41:07

Categories: cs.SD,cs.LG

Download: http://arxiv.org/abs/2605.06627v1

AI Cap-and-Trade: Efficiency Incentives for Accessibility and Sustainability

The race for artificial intelligence (AI) dominance often prioritizes scale over efficiency. Hyper-scaling is the common industry approach: larger models, more data, and as many computational resources as possible. Using more resources is a simpler path to improved AI performance. Thus, efficiency has been de-emphasized. Consequently, the need for costly computational resources has marginalized academics and smaller companies. Simultaneously, increased energy expenditure, due to growing AI use, has led to mounting environmental costs. In response to accessibility and sustainability concerns, we argue for research into, and implementation of, market-based methods that incentivize AI efficiency. We believe that incentivizing efficient operations and approaches will reduce emissions while opening new opportunities for academics and smaller companies. As a call to action, we propose a cap-and-trade system for AI. Our system provably reduces computations for AI deployment, thereby lowering emissions and monetizing efficiency to the benefit of academics and smaller companies.

Updated: 2026-05-07 17:38:59

Categories: econ.GN,cs.AI,cs.CY,cs.GT

Download: http://arxiv.org/abs/2601.19886v2

How to make the most of your masked language model for protein engineering

A plethora of protein language models have been released in recent years. Yet comparatively little work has addressed how best to sample from them to optimize desired biological properties. We fill this gap by proposing a flexible, effective sampling method for masked language models (MLMs), and by systematically evaluating models and methods both in silico and in vitro on actual antibody therapeutics campaigns. First, we propose sampling with stochastic beam search, exploiting the fact that MLMs are remarkably efficient at evaluating the pseudo-perplexity of the entire 1-edit neighborhood of a sequence. Reframing generation in terms of entire-sequence evaluation enables flexible guidance with multiple optimization objectives. Second, we report results from our extensive in vitro head-to-head evaluation for the antibody engineering setting. This reveals that the choice of sampling method is at least as impactful as the choice of model, motivating future research into this under-explored area.
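
The 1-edit neighborhood mentioned above can be enumerated directly: for a sequence of length $L$ over the 20 standard amino acids it contains $19L$ single-substitution variants. The sketch below enumerates it and runs a stochastic beam search that perturbs each candidate's score with Gumbel noise before taking the top-k, a common way to sample without replacement that is swapped in here as an assumption. `score_fn` and the toy objective are hypothetical placeholders for an MLM pseudo-perplexity or other objective; the paper's exact sampler and scoring may differ.

```python
import math
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_edit_neighbors(seq):
    """Yield every sequence that differs from `seq` by exactly one substitution."""
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield seq[:i] + aa + seq[i + 1:]

def stochastic_beam_step(beam, score_fn, beam_width, rng):
    """Expand every beam member by one edit, then keep `beam_width` candidates.

    Scores are perturbed with Gumbel(0, 1) noise before taking the top-k, so
    higher-scoring candidates are favored but not chosen deterministically.
    `score_fn` is a hypothetical objective where higher is better.
    """
    # Sort the deduplicated candidates so results depend only on `rng`,
    # not on set iteration order.
    candidates = sorted({nb for seq in beam for nb in one_edit_neighbors(seq)})

    def perturbed_score(s):
        u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        return score_fn(s) - math.log(-math.log(u))

    return sorted(candidates, key=perturbed_score, reverse=True)[:beam_width]

# Hypothetical toy objective: prefer sequences rich in tryptophan (W).
toy_score = lambda s: s.count("W")
rng = random.Random(0)
beam = ["ACDK"]
for _ in range(3):
    beam = stochastic_beam_step(beam, toy_score, beam_width=4, rng=rng)
```

Here every candidate is scored independently by `score_fn`; the abstract's point is that an MLM can score the whole neighborhood cheaply, which is what makes expanding the full 1-edit neighborhood at each step practical.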

Updated: 2026-05-07 17:36:26

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2603.10302v2

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity, but by their capacity to facilitate downstream success for successor agents. This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across 6 diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9. We release our code at https://github.com/wangzx1219/MASPO.

Updated: 2026-05-07 17:35:26

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2605.06623v1

Private Delegated Quantum Computing for User-Level and Industry-Level Settings

We present a modular hierarchy of private delegated quantum computation protocols tailored to user-level and industry-level settings and parameterized by the quantum resources available to the client. For each protocol, we specify the client capabilities, delegated gate set, adversarial model, transcript leakage, and resulting privacy claims. The hierarchy separates four notions: quantum one-time pad (QOTP) state privacy under declared leakage, leakage-dependent transcript-level angle ambiguity, compiler- and leakage-function-dependent structural privacy, and output privacy. It also clarifies when public Clifford operations can be evaluated on QOTP-encrypted data through classical key updates, and identifies where non-Clifford privacy, non-collusion, or additional primitives are required. The classical-client branch combines a persistent common-node, matching-hidden split-QOTP with shuffled finite-grid $r$-share sign-randomized angle sharing. This yields leakage-relative state hiding under an explicit $\varepsilon_{\mathrm{key}}$ key-hiding condition, and transcript-level unlinkability under hidden-matching assumptions and an explicit non-total-collusion and leakage model. The angle-sharing primitives provide transcript ambiguity under explicit leakage assumptions, not universal blindness. The trap-based layer provides detection under stated assumptions, but it is not a stand-alone malicious-security proof.

Updated: 2026-05-07 17:34:18

Categories: quant-ph,cs.CR,cs.DC,cs.ET

Download: http://arxiv.org/abs/2405.11608v3

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical understanding of when and why these sign-based methods outperform vanilla SGD. The core obstacle is that under standard smoothness and finite-variance conditions, SGD is known to be minimax optimal for finding stationary points measured in the $\ell_2$-norm, which fundamentally precludes any complexity gains for sign-based methods in standard settings. To overcome this barrier, we analyze sign-based optimizers using $\ell_1$-norm stationarity, $\ell_\infty$-smoothness, and a separable noise model, which better capture the coordinate-wise nature of signed updates. Under this distinct problem geometry, we derive matching upper and lower bounds for SignSGD and explicitly characterize the problem class in which SignSGD provably dominates SGD. Specifically, we compare the *upper bound of SignSGD* with the *lower bound of SGD*, showing that SignSGD reduces the complexity by a factor of $d$ under *sparse noise*, where $d$ is the problem dimension. Furthermore, we extend this framework to the matrix domain, providing an equivalent optimal lower bound for the Muon optimizer and proving that extending the sign operator to matrices preserves this optimal scaling with dimensionality. Finally, we connect our theoretical bounds to practice, demonstrating that the theoretical superiority of SignSGD accurately predicts its faster convergence during the pretraining of a 124M-parameter GPT-2 model.
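
For readers comparing the two optimizers, the sketch below shows the basic SGD and SignSGD update rules side by side on plain Python lists. It is a minimal illustration under simplifying assumptions: no momentum, no mini-batching, and none of Muon's matrix-level treatment; the function names are illustrative, not from the paper.

```python
def sign(x):
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def sgd_step(params, grads, lr):
    """Vanilla SGD: move each coordinate proportionally to its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

def signsgd_step(params, grads, lr):
    """SignSGD: move each coordinate by exactly lr against its gradient's sign."""
    return [p - lr * sign(g) for p, g in zip(params, grads)]

params = [1.0, -2.0, 0.5]
grads = [0.3, -4.0, 0.0]
sgd_params = sgd_step(params, grads, lr=0.1)
sign_params = signsgd_step(params, grads, lr=0.1)
```

With these values SGD moves the first coordinate by 0.03 and the second by 0.4, while SignSGD moves both by 0.1, so small and large gradient coordinates are treated alike. The sign direction also satisfies $\langle g, \mathrm{sign}(g) \rangle = \|g\|_1$, which is why $\ell_1$-norm stationarity is a natural measure of progress for sign-based updates.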

Updated: 2026-05-07 17:32:09

Categories: cs.LG,cs.AI,cs.CL,math.OC

Download: http://arxiv.org/abs/2605.06615v1

SkillOS: Learning Skill Curation for Self-Evolving Agents

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.

Updated: 2026-05-07 17:31:50

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2605.06614v1

Multimodal Fact-Level Attribution for Verifiable Reasoning

Multimodal large language models (MLLMs) are increasingly used for real-world tasks involving multi-step reasoning and long-form generation, where reliability requires grounding model outputs in heterogeneous input sources and verifying individual factual claims. However, existing multimodal grounding benchmarks and evaluation methods focus on simplified, observation-based scenarios or limited modalities and fail to assess attribution in complex multimodal reasoning. We introduce MuRGAt (Multimodal Reasoning with Grounded Attribution), a benchmark for evaluating fact-level multimodal attribution in settings that require reasoning beyond direct observation. Given inputs spanning video, audio, and other modalities, MuRGAt requires models to generate answers with explicit reasoning and precise citations, where each citation specifies both modality and temporal segments. To enable reliable assessment, we introduce an automatic evaluation framework that strongly correlates with human judgments. Benchmarking with human and automated scores reveals that even strong MLLMs frequently hallucinate citations despite correct reasoning. Moreover, we observe a key trade-off: increasing reasoning depth or enforcing structured grounding often degrades accuracy, highlighting a significant gap between internal reasoning and verifiable attribution.

Updated: 2026-05-07 17:31:17

Categories: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2602.11509v2

Online Bayesian Calibration under Gradual and Abrupt System Changes

Bayesian model calibration is central to digital twins and computer experiments, as it aligns model outputs with field observations by estimating calibration parameters and correcting systematic model bias. Classical Bayesian calibration introduces latent parameters and a discrepancy function to model bias, but suffers from parameter-discrepancy confounding and is typically formulated as an offline procedure under a stationary data-generating assumption. These limitations are problematic in modern digital twin applications, where systems evolve over time and may exhibit gradual drift and abrupt regime shifts. While data assimilation methods enable sequential updates, they generally do not explicitly model systematic bias and are less effective under abrupt changes. We propose Bayesian Recursive Projected Calibration (BRPC), an online Bayesian calibration framework for streaming data under simulator mismatch and nonstationarity. BRPC extends projected calibration to the online setting by separating a discrepancy-free particle update for calibration parameters from a conditional Gaussian process update for discrepancy, preserving identifiability while enabling bias-aware adaptation under gradual system evolution. To handle abrupt changes, BRPC is integrated with restart mechanisms that detect regime shifts and reset the calibration process. We establish theoretical guarantees for both components, including tracking performance under gradual evolution and false-alarm and detection behavior for restart mechanisms. Empirical studies on synthetic and plant-simulation benchmarks show that BRPC improves calibration accuracy under gradual changes, while restart-augmented BRPC further improves robustness and predictive performance under abrupt regime shifts compared to sliding-window Bayesian calibration and data assimilation baselines.
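
The abstract does not specify how BRPC's restart mechanisms detect regime shifts. As a stand-in, the sketch below uses a standard one-sided CUSUM detector on prediction residuals; this is a swapped-in classical technique, not the paper's rule. `drift` and `threshold` are hypothetical tuning parameters that trade detection delay against false alarms, the same trade-off the paper's restart guarantees address.

```python
def cusum_restarts(residuals, drift, threshold):
    """One-sided CUSUM on absolute residuals (illustrative restart rule).

    The statistic accumulates evidence that |residual| exceeds `drift`.
    When it crosses `threshold`, a regime shift is flagged, the index is
    recorded as a restart point, and the statistic is reset to zero.
    """
    stat = 0.0
    restarts = []
    for t, r in enumerate(residuals):
        stat = max(0.0, stat + abs(r) - drift)
        if stat > threshold:
            restarts.append(t)
            stat = 0.0  # restart: begin accumulating evidence afresh
    return restarts


# Small residuals while the calibrated model fits, then a sudden jump.
residuals = [0.1, -0.2, 0.15, 0.05, 1.2, 1.1, -1.3, 1.0]
print(cusum_restarts(residuals, drift=0.5, threshold=1.5))
```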

Updated: 2026-05-07 17:29:18

Categories: cs.LG,cs.ET,stat.ML

Download: http://arxiv.org/abs/2605.06612v1

The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity

Despite the prevalence of the attention sink phenomenon in Large Language Models (LLMs), where initial tokens disproportionately monopolize attention scores, its structural origins remain elusive. This work provides a mechanistic explanation for this phenomenon. First, we trace its root to the value aggregation process inherent in self-attention, which induces a systematic variance discrepancy. We further demonstrate that this discrepancy is drastically amplified by the activation of super neurons within Feed-Forward Network (FFN) layers. Specifically, the channel-sparse down-projections trigger a dimension disparity of the first-token representation, necessitating the formation of attention sinks as a structural anchor. Then, we validate this causal chain through two controlled interventions: (i) isolating the aggregation effect via attention mask modifications and (ii) amplifying the variance of targeted token representations. Both interventions can replicate attention sinks at arbitrary positions. Our mechanistic understanding offers a foundation for the systematic control of sink formation. Finally, as a proof of concept, we propose head-wise RMSNorm, an architectural modification that stabilizes value aggregation outputs during pre-training. Our experiments demonstrate that restoring statistical parity across positions significantly accelerates convergence.
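
A minimal sketch of what head-wise RMSNorm could look like for a single token, assuming it normalizes each attention head's output slice to unit root-mean-square before the heads are concatenated, with a per-head gain vector. The placement and parameterization are assumptions; the abstract only says the modification stabilizes value-aggregation outputs.

```python
import math


def rms_norm(vec, gain, eps=1e-6):
    """RMSNorm: divide by the vector's root-mean-square, then apply a learned gain."""
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [g * v / rms for g, v in zip(gain, vec)]


def headwise_rms_norm(attn_out, num_heads, gains, eps=1e-6):
    """Normalize each head's slice of one token's concatenated attention output separately.

    attn_out: flat list of length num_heads * head_dim.
    gains: one gain vector (length head_dim) per head (hypothetical parameterization).
    """
    head_dim = len(attn_out) // num_heads
    out = []
    for h in range(num_heads):
        chunk = attn_out[h * head_dim:(h + 1) * head_dim]
        out.extend(rms_norm(chunk, gains[h], eps))
    return out


# A token whose first head output is 100x larger than its second.
token = [30.0, 40.0, 0.3, 0.4]
ones = [[1.0, 1.0], [1.0, 1.0]]
print(headwise_rms_norm(token, num_heads=2, gains=ones))  # both heads now have RMS close to 1
```

Applied at every position, this puts each head's output on a comparable scale across tokens, which is the kind of statistical parity across positions the abstract associates with faster convergence.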

Updated: 2026-05-07 17:28:55

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2605.06611v1

SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders

Sparse Autoencoders (SAEs) have become an important tool in mechanistic interpretability, helping to analyze internal representations in both Large Language Models (LLMs) and Vision Transformers (ViTs). By decomposing polysemantic activations into sparse sets of monosemantic features, SAEs aim to translate neural network computations into human-understandable concepts. However, common architectures such as TopK SAEs rely on a fixed sparsity level. They enforce the same number of active features (K) across all inputs, ignoring the varying complexity of real-world data. Natural data often lies on manifolds with varying local intrinsic dimensionality, meaning the number of relevant factors can change significantly across samples. This suggests that a fixed sparsity level is not optimal. Simple inputs may require only a few features, while more complex ones need more expressive representations. Using a constant K can therefore introduce noise in simple cases or miss important structure in more complex ones. To address this issue, we propose SoftSAE, a sparse autoencoder with a Dynamic Top-K selection mechanism. Our method uses a differentiable Soft Top-K operator to learn an input-dependent sparsity level k. This allows the model to adjust the number of active features based on the complexity of each input. As a result, the representation better matches the structure of the data, and the explanation length reflects the amount of information in the input. Experimental results confirm that SoftSAE not only finds meaningful features, but also selects the right number of features for each concept. The source code is available at: https://anonymous.4open.science/r/SoftSAE-8F71/.
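
The abstract does not define its differentiable Soft Top-K operator. One common relaxation, used here purely as an illustrative assumption, replaces the hard top-k indicator with a sigmoid mask whose threshold is solved by bisection so that the mask sums to k. Because k enters only as a real-valued target, it could vary per input, which is the input-dependent sparsity idea.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_topk_mask(scores, k, temperature=0.1, iters=100):
    """Relaxed top-k mask m_i = sigmoid((s_i - t) / temperature) with sum(m) == k.

    The threshold t is found by bisection. k may be any real number with
    0 < k < len(scores). As temperature -> 0 the mask approaches the hard
    top-k indicator.
    """
    lo = min(scores) - 50 * temperature  # here the mask sums to about len(scores)
    hi = max(scores) + 50 * temperature  # here the mask sums to about 0
    for _ in range(iters):
        t = 0.5 * (lo + hi)
        total = sum(sigmoid((s - t) / temperature) for s in scores)
        if total > k:
            lo = t  # too many active features: raise the threshold
        else:
            hi = t
    t = 0.5 * (lo + hi)
    return [sigmoid((s - t) / temperature) for s in scores]


scores = [3.0, 1.0, 2.5, 0.2, -1.0]
mask = soft_topk_mask(scores, k=2)
print([round(m, 3) for m in mask])  # close to [1, 0, 1, 0, 0]
print(sum(mask))                    # equals k up to floating-point error
```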

Updated: 2026-05-07 17:28:40

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2605.06610v1

DARK: Diagonal-Anchored Repulsive Knowledge Distillation for Vision-Language Models under Extreme Compression

Compressing vision-language models for on-device deployment is increasingly important in clinical settings, but knowledge distillation (KD) degrades sharply when the teacher-student capacity gap spans an order of magnitude or more. We argue that, under such gaps, strict imitation of the teacher is a poor objective: much of the teacher's pairwise similarity structure reflects its own architectural biases rather than information a compact student can efficiently represent. We propose Diagonal-Anchored Repulsive Knowledge Distillation (DARK), a contrastive KD framework that decomposes the distillation loss into a diagonal term (matched image-text pairs) and an off-diagonal term (non-target similarities). The diagonal term anchors matched-pair alignment throughout training; the off-diagonal term is annealed from positive to negative weighting, transitioning the student from imitating to repelling the teacher's non-target similarity structure. We instantiate DARK by distilling FetalCLIP, a 427M-parameter fetal ultrasound vision-language model, into MobileFetalCLIP, a 75M-parameter student model with a 26× smaller visual encoder, running in 1.6 ms on an iPhone 16 Pro. The student matches or exceeds its teacher on three zero-shot benchmarks, including HC18 biometry validity (88.6% vs. 83.5%) and brain sub-plane F1 (0.784 vs. 0.702). Embedding-geometry and logit analyses show that DARK induces structured decorrelation: the student preserves teacher-aligned per-image confidence while diverging from inherited inter-class confusion, suggesting that controlled repulsion can be more efficient than imitation under extreme compression.
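
A minimal sketch of the diagonal/off-diagonal split with an annealed off-diagonal weight. The abstract does not give the per-term losses or the schedule, so squared error between student and teacher similarity matrices and a linear schedule from +1 to -1 are illustrative assumptions, and all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def similarity_matrix(img_emb, txt_emb):
    """S[i][j] = <img_i, txt_j>; index i on both sides belongs to the same matched pair."""
    return [[dot(a, b) for b in txt_emb] for a in img_emb]


def off_diag_weight(step, total_steps):
    """Linear anneal from +1 (imitate the teacher) to -1 (repel it). Hypothetical schedule."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - 2.0 * frac


def dark_style_loss(s_student, s_teacher, step, total_steps):
    """Diagonal term anchors matched pairs; off-diagonal term is imitated, then repelled."""
    n = len(s_student)
    diag = sum((s_student[i][i] - s_teacher[i][i]) ** 2 for i in range(n)) / n
    off = sum(
        (s_student[i][j] - s_teacher[i][j]) ** 2
        for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    # With a negative weight, minimizing the loss pushes the student's non-target
    # similarities away from the teacher's instead of toward them.
    return diag + off_diag_weight(step, total_steps) * off


teacher = [[1.0, 0.5], [0.5, 1.0]]
student = [[0.9, 0.2], [0.2, 0.8]]
for step in (0, 5, 10):
    print(step, dark_style_loss(student, teacher, step, total_steps=10))
```

In this squared-error form the repulsive term is unbounded below, so a practical objective would need to be bounded; that is one more reason to read the sketch as illustrative only.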

Updated: 2026-05-07 17:28:16

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2603.05421v3

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Transformers have demonstrated remarkable in-context learning (ICL) capabilities. The strong ICL performance of transformers is commonly believed to arise from their ability to implicitly execute certain algorithms on the context, thereby enhancing prediction and generation. In this work, we investigate how transformers with softmax attention perform in-context learning on linear classification data. We first construct a class of multi-layer transformers that can perform in-context logistic regression, with each layer performing exactly one step of normalized gradient descent on an in-context loss. Then, we show that our constructed transformer can be obtained through (i) training a single self-attention layer supervised by one-step gradient descent, and (ii) recurrently applying the trained layer to obtain a looped model. Training convergence guarantees of the self-attention layer and out-of-distribution generalization guarantees of the looped model are provided. Our results advance the theoretical understanding of the ICL mechanism by showcasing how softmax transformers can effectively act as in-context learners.
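
The sketch below shows the algorithm the constructed transformer is said to emulate, not the transformer construction itself: normalized gradient descent on the mean logistic loss over the in-context examples, one normalized step per layer. Labels in {-1, +1}, the Euclidean normalization, zero initialization, and the step size are assumptions.

```python
import math


def logistic_loss_grad(w, xs, ys):
    """Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)) for labels y_i in {-1, +1}."""
    n, d = len(xs), len(w)
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        margin = y * sum(wk * xk for wk, xk in zip(w, x))
        coef = -y / (1.0 + math.exp(margin))  # equals -y * sigmoid(-margin)
        for k in range(d):
            grad[k] += coef * x[k] / n
    return grad


def normalized_gd(xs, ys, num_layers, lr):
    """One normalized GD step per 'layer': w <- w - lr * g / ||g||."""
    w = [0.0] * len(xs[0])
    for _ in range(num_layers):
        g = logistic_loss_grad(w, xs, ys)
        norm = math.sqrt(sum(gk * gk for gk in g))
        if norm == 0.0:
            break
        w = [wk - lr * gk / norm for wk, gk in zip(w, g)]
    return w


# In-context examples (x_i, y_i) from a linearly separable task.
xs = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]]
ys = [1, 1, -1, -1]
w = normalized_gd(xs, ys, num_layers=10, lr=0.5)
query = [1.5, 0.5]
print(1 if sum(a * b for a, b in zip(w, query)) > 0 else -1)  # prediction for the query
```

Normalizing the gradient fixes the length of every step, so progress does not stall as the logistic-loss gradient shrinks on separable data.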

Updated: 2026-05-07 17:27:55

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2605.06609v1

DARTS: Targeting Prognostic Covariates in Budget-Constrained Sequential Experiments

Randomized controlled trials typically assume that prognostic covariates are known and available at no cost. In practice, obtaining high-dimensional pretreatment data is costly, forcing a trade-off between covariate-adaptive precision and a measurement budget. We introduce Dynamic Adaptive Rerandomization via Thompson Sampling (DARTS), which treats covariate acquisition as a sequential optimization problem embedded within a design-based causal inference task. A budgeted combinatorial Thompson sampler learns which covariates are most prognostic across successive batches; selected covariates then drive rerandomization and regression adjustment to reduce batch-level average treatment effect variance. Our primary theoretical contribution is a decoupling result: adaptive covariate selection based on past batches preserves batch-level randomization validity, and the cumulative inverse-variance weighted estimator achieves at least nominal asymptotic coverage. We further derive a Bayes risk bound for the acquisition layer that matches the minimax lower bound up to logarithmic factors. Empirically, DARTS systematically concentrates the budget on informative features, significantly closing the efficiency gap to oracle designs while maintaining strict inferential validity.
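
The cumulative estimator named in the abstract pools per-batch average-treatment-effect estimates by inverse-variance weighting. A minimal sketch of the textbook version, assuming each batch supplies an estimate and a variance estimate; how DARTS forms those per-batch quantities (rerandomization plus regression adjustment) is not shown, and the function names are hypothetical.

```python
import math


def inverse_variance_pool(estimates, variances):
    """Pool per-batch ATE estimates with weights w_b = 1 / Var_b.

    Returns (pooled estimate, standard error). The usual variance formula,
    Var(pooled) = 1 / sum_b w_b, assumes uncorrelated batch estimates.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def normal_ci(pooled, se, z=1.96):
    """Normal-approximation interval; z = 1.96 gives roughly 95% nominal coverage."""
    return pooled - z * se, pooled + z * se


# Two batches; the lower-variance batch receives more weight.
est, se = inverse_variance_pool([2.0, 1.0], [1.0, 4.0])
print(est, normal_ci(est, se))
```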

Updated: 2026-05-07 17:27:51

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2605.06608v1

AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents

Recent LLM-based agents have closed substantial portions of the scientific discovery loop in software-only machine-learning research, in chemistry, and in biology. Extending the same loop to high-fidelity physical simulators is harder, because solver completion does not imply physical validity and many failure modes appear only in field-level imagery rather than in solver logs. We present AI CFD Scientist, an open-source AI scientist for computational fluid dynamics (CFD) that, to our knowledge, is the first to span literature-grounded ideation, validated execution, vision-based physics verification, source-code modification, and figure-grounded writing within a single inspectable workflow. Three coupled pathways cover parameter sweeps within a fixed solver, case-local C++ library compilation for new physical models, and open-ended hypothesis search against a reference comparator, all running on OpenFOAM through Foam-Agent. At the center of the framework is a vision-language physics-verification gate that inspects rendered flow fields before any result is accepted, rerun, or written into a manuscript. On five tasks under a shared GPT-5.5 backbone, AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction that reduces lower-wall $C_f$ RMSE against DNS by 7.89% on the periodic hill at $Re_h=5600$; under matched LLM cost, two strong general AI-scientist baselines (ARIS, DeepScientist) execute partial CFD workflows but lack the domain-specific validity gates needed to convert runs into defensible scientific claims; and a controlled planted-failure ablation shows that the vision-language gate detects 14 of 16 silent failures missed by solver-level checks. Code, prompts, and run artifacts are released at https://github.com/csml-rpi/cfd-scientist.

Updated: 2026-05-07 17:27:23

Categories: physics.flu-dyn,cs.AI

Download: http://arxiv.org/abs/2605.06607v1

Purely Agent-Driven Black-Box Optimization for Biological Design

Many key challenges in biological design, such as small-molecule drug discovery, antimicrobial peptide development, and protein engineering, can be framed as black-box optimization over vast, complex structured spaces. Existing methods rely mainly on raw structural data and struggle to exploit the rich scientific literature. While large language models (LLMs) have been added to these pipelines, they have been confined to narrow roles within structure-centered optimizers. We instead cast biological black-box optimization as an agent-driven, language-based reasoning process. We introduce Purely Agent-driven BLack-box Optimization (PABLO), a hierarchical agentic system that uses scientific LLMs pretrained on chemistry and biology literature to generate and iteratively refine biological candidates. On both the standard GuacaMol molecular design and antimicrobial peptide optimization tasks, PABLO achieves state-of-the-art performance, substantially improving sample efficiency and final objective values over established baselines. Compared to prior optimization methods that incorporate LLMs, PABLO achieves competitive token usage per run despite relying on LLMs throughout the optimization loop. Beyond raw performance, the agentic formulation offers key advantages for realistic design: it naturally incorporates semantic task descriptions, retrieval-augmented domain knowledge, and complex constraints. In follow-up in vitro validation, PABLO-optimized peptides showed strong activity against drug-resistant pathogens, underscoring the practical potential of PABLO for therapeutic discovery.

Updated: 2026-05-07 17:25:56

Categories: cs.LG

Download: http://arxiv.org/abs/2601.22382v2

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally expensive; key events (e.g., jailbreaks or successful task completion by an agent) often emerge only after repeated interactions. These events may be rare and may remain unobserved under any feasible computational budget. Recent conformal survival frameworks construct reliable lower predictive bounds (LPBs) on the number of iterations to trigger the event of interest, but rely on static budget allocation that is inefficient in multi-turn setups. To address this, we introduce Dynamic Allocation via PRojected Optimization (DAPRO), the first theoretically valid dynamic budget allocation framework for bounding the time-to-event in multi-turn LLM interactions. We prove that DAPRO satisfies the budget constraint and provides distribution-free, finite-sample coverage guarantees without requiring the conditional independence between censoring and event times assumed by prior conformal survival approaches. A key theoretical contribution is a novel coverage bound that scales with the square root of the mean censoring weight rather than the worst-case weight, yielding provably tighter guarantees than prior work. Furthermore, DAPRO can be employed to obtain unbiased, low-variance estimates of population-level evaluation metrics, such as the jailbreak rate, under limited computing resources. Comprehensive experiments across agentic task success, adversarial jailbreaks, toxic content generation, and RAG hallucinations using LLMs such as Llama 3.1 and Qwen 2.5 demonstrate that DAPRO consistently achieves coverage closer to the nominal level with lower variance than static baselines, while satisfying the budget constraint.

Updated: 2026-05-07 17:25:15

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06605v1

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many operational settings, the most accessible artifacts are binary packages rather than source patches or advisory text. This paper asks whether a language-model agent, restricted to local binary-derived evidence, can reconstruct the security meaning of Linux distribution updates. Patch2Vuln is a local, resumable pipeline that extracts old/new ELF pairs, diffs them with Ghidra and Ghidriff, ranks changed functions, builds candidate dossiers, and asks an offline agent to produce a preliminary audit, bounded validation plan, and final audit. We evaluate Patch2Vuln on 25 Ubuntu `.deb` package pairs: 20 security-update pairs and five negative controls, all manually adjudicated against private source-patch and binary-function ground truth. The agent localizes a verified security-relevant patch function in 10 of 20 security pairs and assigns an accepted final root-cause class in 11 of 20. Oracle diagnostics show that six security pairs fail before model reasoning because the binary differ or ranker omits the right function, with one additional context-export miss. A separate bounded validation pass produces two target-level minimized behavioral old/new differentials, both for tcpdump, but no crash, timeout, sanitizer finding, or memory-corruption proof; all five negative controls are classified as unknown and produce no validation differentials. These results support agentic vulnerability reconstruction from binary patches as a useful research target while showing that binary-diff coverage and local behavioral validation remain the limiting components.

Updated: 2026-05-07 17:22:22

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.06601v1

Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization

Weight decay is widely used as a regularizer in large language models, yet its precise role in shaping Transformer loss landscapes remains theoretically underexplored. This paper provides the first rigorous functional-analytic characterization of the standard Transformer objective (cross-entropy loss with $L^2$ regularization) by proving that it satisfies Villani's criteria for coercive energy functions. Specifically, we show that the regularized loss $\mathcal{F}$ is infinitely differentiable, grows at least quadratically, has Gaussian-integrable tails, and satisfies the differential growth condition $-\Delta\mathcal{F} + \tfrac{1}{s}\|\nabla\mathcal{F}\|^{2} \to \infty$ as $\|\theta\| \to \infty$ for all $s>0$. From this structure, we derive explicit log-Sobolev and Poincaré constants $C_{\mathrm{LS}} \leq \lambda^{-1} + d/\lambda^{2}$, linking the regularization strength $\lambda$ and model dimension $d$ to finite-time convergence guarantees for noisy stochastic gradient descent and to PAC-Bayesian generalization bounds that tighten with increasing $\lambda$. To validate our theory, we introduce a scalable Villani diagnostic $\Psi_s(\theta) = -\Delta\mathcal{F} + s^{-1}\|\nabla \mathcal{F}\|^2$ and estimate it efficiently using Hutchinson trace probes in models with over 100M parameters. Experiments on GPT-Neo-125M across Penn Treebank and WikiText-103 confirm the predicted quadratic growth of $\Psi_s$, spectral inflation of the Hessian, and exponential convergence behavior consistent with our log-Sobolev analysis. These results demonstrate that weight decay not only improves generalization empirically but also establishes the mathematical conditions required for fast Langevin mixing and theoretically grounded curvature-aware optimization in deep learning.
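
A toy sketch of the Villani diagnostic $\Psi_s = -\Delta\mathcal{F} + s^{-1}\|\nabla\mathcal{F}\|^2$, with the Laplacian $\Delta\mathcal{F} = \operatorname{tr}(\nabla^2\mathcal{F})$ estimated by Hutchinson's method using Rademacher probes. In a real model the Hessian-vector product would come from automatic differentiation; here an explicit small matrix stands in for it, and the function names are hypothetical.

```python
import random


def matvec(A, v):
    """Dense matrix-vector product; stands in for an autodiff Hessian-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def hutchinson_trace(hvp, dim, num_probes, rng):
    """Estimate tr(H) as the average of z^T H z over Rademacher probes z (E[z z^T] = I)."""
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hvp(z)
        total += sum(zi * hzi for zi, hzi in zip(z, hz))
    return total / num_probes


def villani_diagnostic(grad, hvp, s, num_probes=200, seed=0):
    """Psi_s = -Laplacian(F) + ||grad F||^2 / s, where Laplacian(F) = tr(Hessian of F)."""
    rng = random.Random(seed)
    laplacian = hutchinson_trace(hvp, len(grad), num_probes, rng)
    return -laplacian + sum(g * g for g in grad) / s


# Toy check on the pure weight-decay term F(theta) = (lam / 2) * ||theta||^2:
# grad F = lam * theta and Hessian = lam * I, so Psi_s = -lam * d + lam^2 * ||theta||^2 / s.
lam, theta = 0.1, [1.0, 2.0, 3.0]
hessian = [[lam if i == j else 0.0 for j in range(3)] for i in range(3)]
for scale in (1.0, 10.0):
    g = [lam * scale * t for t in theta]
    print(scale, villani_diagnostic(g, lambda v: matvec(hessian, v), s=1.0))
```

Scaling $\theta$ by 10 multiplies the gradient term by 100 while the Laplacian stays fixed, a toy version of the quadratic growth of $\Psi_s$ the abstract reports.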

Updated: 2026-05-07 17:22:19

Categories: cs.LG,eess.AS

Download: http://arxiv.org/abs/2605.06599v1

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

Self-distillation (SD) offers a promising path for adapting large language models (LLMs) without relying on stronger external teachers. However, SD in autoregressive LLMs remains challenging because self-generated trajectories are free-form, correctness is task-dependent, and plausible rationales can still provide unstable or unreliable supervision. Existing methods mainly examine isolated design choices, leaving their effectiveness, roles, and interactions unclear. In this paper, we propose UniSD, a unified framework to systematically study self-distillation. UniSD integrates complementary mechanisms that address supervision reliability, representation alignment, and training stability, including multi-teacher agreement, EMA teacher stabilization, token-level contrastive learning, feature matching, and divergence clipping. Across six benchmarks and six models from three model families, UniSD reveals when self-distillation improves over static imitation, which components drive the gains, and how these components interact across tasks. Guided by these insights, we construct UniSDfull, an integrated pipeline that combines complementary components and achieves the strongest overall performance, improving over the base model by +5.4 points and the strongest baseline by +2.8 points. Extensive evaluation highlights self-distillation as a practical and steerable approach for efficient LLM adaptation without stronger external teachers.
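
Two of the mechanisms UniSD lists, EMA teacher stabilization and divergence clipping, have simple generic forms. The sketch assumes the teacher is an exponential moving average of the student's parameters and that clipping caps each token's KL(teacher || student) term; UniSD's exact formulations may differ, and the function names are hypothetical.

```python
import math


def ema_update(teacher, student, decay=0.99):
    """teacher <- decay * teacher + (1 - decay) * student, applied parameter-wise."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def clipped_token_divergence(teacher_dists, student_dists, max_div):
    """Mean per-token KL(teacher || student), with each token's term capped at max_div
    so that a few badly mismatched tokens cannot dominate the distillation loss."""
    terms = [min(kl_divergence(p, q), max_div) for p, q in zip(teacher_dists, student_dists)]
    return sum(terms) / len(terms)


teacher = [0.0, 0.0]
for _ in range(10):
    teacher = ema_update(teacher, [1.0, -1.0], decay=0.9)
print(teacher)  # covers only a fraction 1 - 0.9**10 of the gap to the student

teacher_dists = [[1.0, 0.0], [0.5, 0.5]]
student_dists = [[0.0, 1.0], [0.5, 0.5]]
print(clipped_token_divergence(teacher_dists, student_dists, max_div=5.0))
```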

Updated: 2026-05-07 17:22:11

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2605.06597v1

FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning

Watermark radioactivity tests can detect whether a model was trained on watermarked documents and have become key tools for protecting data ownership in the fine-tuning of large language models (LLMs). Existing work has demonstrated their effectiveness in centralized LLM fine-tuning. However, these methods face several challenges and remain underexplored in federated learning (FL), a widely applied paradigm for collaboratively fine-tuning LLMs on private data across different users. FL mainly ensures privacy through secure aggregation (SA), which allows the server to aggregate updates while keeping individual clients' updates private. This mechanism preserves privacy but makes it difficult to identify which client trained on watermarked documents. In this work, we propose FedAttr, a new client-level attribution protocol for FL. FedAttr identifies which clients trained on watermarked data via a paired-subset-difference mechanism, while preserving both the privacy guarantees of SA and FL performance. FedAttr proceeds in three steps: (i) estimate each client's update by differencing two SA queries, (ii) score the estimate with the watermark detector via differential scoring, and (iii) combine scores across rounds via Stouffer's method. We theoretically show that FedAttr produces an unbiased estimator of each client's update with bounded mutual information leakage (i.e., $O(d^*/N)$ per round). Moreover, FedAttr empirically achieves 100% TPR and 0% FPR, outperforming all baselines by at least 44.4% in TPR or 19.1% in FPR, with only 6.3% overhead relative to FL training time. Ablation studies confirm that FedAttr is robust to protocol parameters and configurations.
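
A deliberately simplified sketch of steps (i) and (iii), assuming a hypothetical noiseless secure-aggregation oracle that returns the exact sum over any queried subset of clients. Differencing $\mathrm{SA}(S \cup \{i\})$ and $\mathrm{SA}(S)$ isolates client $i$'s update, and Stouffer's method combines per-round z-scores as $Z = \sum_r z_r / \sqrt{R}$. With an exact oracle the difference reveals the update exactly, so this sketch says nothing about the leakage bound the abstract proves; the watermark detector in step (ii) is also not modeled.

```python
import math


def secure_sum(updates, subset):
    """Hypothetical SA oracle: returns only the coordinate-wise sum over `subset`."""
    dim = len(next(iter(updates.values())))
    total = [0.0] * dim
    for cid in subset:
        for k in range(dim):
            total[k] += updates[cid][k]
    return total


def paired_subset_difference(updates, base_subset, client):
    """Estimate one client's update as SA(S with client) - SA(S without client)."""
    with_client = secure_sum(updates, set(base_subset) | {client})
    without_client = secure_sum(updates, set(base_subset) - {client})
    return [a - b for a, b in zip(with_client, without_client)]


def stouffer(z_scores):
    """Stouffer's method: combine per-round z-scores as sum(z) / sqrt(number of rounds)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


updates = {"a": [1.0, 2.0], "b": [0.5, -1.0], "c": [3.0, 0.0]}
print(paired_subset_difference(updates, {"a", "c"}, "b"))  # recovers b's update
print(stouffer([1.2, 0.8, 1.5, 1.1]))  # weak per-round evidence accumulates across rounds
```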

Updated: 2026-05-07 17:21:02

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2605.06596v1
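For the FedAttr entry above, steps (i) and (iii) can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the paper's protocol.

- Secure aggregation is modeled as a hypothetical `secure_sum` that reveals only the element-wise sum of the queried clients' updates.
- The per-round detector outputs passed to `stouffer` are assumed to already be z-scores.
- In this noiseless toy, differencing two queries that differ only in one client recovers that client's update exactly.
- How FedAttr chooses the paired subsets, performs differential scoring (step ii), and bounds information leakage is not modeled.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = List[float]


def secure_sum(updates: Dict[str, Vector], subset: Set[str]) -> Vector:
    """Hypothetical stand-in for an SA query: only the sum over `subset` is revealed."""
    dim = len(next(iter(updates.values())))
    total = [0.0] * dim
    for cid in subset:
        for k, v in enumerate(updates[cid]):
            total[k] += v
    return total


def estimate_update(updates: Dict[str, Vector], base: Set[str], client: str) -> Vector:
    """Step (i): difference two SA queries that differ only in `client`."""
    with_client = secure_sum(updates, base | {client})
    without_client = secure_sum(updates, base - {client})
    return [a - b for a, b in zip(with_client, without_client)]


def stouffer(z_scores: Sequence[float]) -> float:
    """Step (iii): combine per-round z-scores with (unweighted) Stouffer's method."""
    return sum(z_scores) / math.sqrt(len(z_scores))


# Toy usage with made-up two-dimensional updates for three clients.
updates = {"a": [1.0, 2.0], "b": [0.5, -1.0], "c": [3.0, 0.0]}
print(estimate_update(updates, {"a", "c"}, "b"))
print(stouffer([1.0, 2.0, 3.0, 2.0]))
```

In this toy, the estimate for client `b` equals its true update `[0.5, -1.0]`. The combined score for four rounds is their sum divided by 2.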

Cross-Modal Navigation with Multi-Agent Reinforcement Learning

Robust embodied navigation relies on complementary sensory cues. However, high-quality, well-aligned multi-modal data is often difficult to obtain in practice. Training a monolithic model is also challenging, as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight, modality-specialized agents offers a scalable paradigm: it enables flexible deployment and parallel execution while preserving the strengths of each modality. In this paper, we propose CRONA, a Multi-Agent Reinforcement Learning (MARL) framework for CROss-modal NAvigation. CRONA improves collaboration by leveraging control-relevant auxiliary beliefs and a centralized multi-modal critic with access to the global state. Experiments on visual-acoustic navigation tasks show that multi-agent methods significantly improve performance and efficiency over single-agent baselines. We find that:

(i) homogeneous collaboration with limited modalities is sufficient for short-range navigation under salient cues;
(ii) heterogeneous collaboration among agents with complementary modalities is generally efficient and effective;
(iii) navigation in large, complex environments requires both richer multi-modal perception and increased model capacity.

Updated: 2026-05-07 17:20:34

Categories: cs.RO,cs.AI,cs.LG,cs.MA

Download: http://arxiv.org/abs/2605.06595v1

ReActor: Reinforcement Learning for Physics-Aware Motion Retargeting

Retargeting human kinematic reference motion onto a robot's morphology remains a formidable challenge. Existing methods often produce physical inconsistencies, such as foot sliding, self-collisions, or dynamically infeasible motions, which hinder downstream imitation learning. We propose a bilevel optimization framework that jointly adapts reference motions to a robot's morphology while training a tracking policy using reinforcement learning. To make the optimization tractable, we derive an approximate gradient for the upper-level loss. Our framework requires only a sparse set of semantic rigid-body correspondences and eliminates the need for manual tuning by identifying optimal values for a parameterization expressive enough to preserve characteristic motion across different embodiments. Moreover, by integrating retargeting directly with physics simulation, we produce physically plausible motions that facilitate robust imitation learning. We validate our method in simulation and on hardware, demonstrating challenging motions for morphologies that differ significantly from a human, including retargeting onto a quadruped.

Updated: 2026-05-07 17:20:15

Categories: cs.RO,cs.GR,cs.LG

Download: http://arxiv.org/abs/2605.06593v1

DINORANKCLIP: DINOv3 Distillation and Injection for Vision-Language Pretraining with High-Order Ranking Consistency

Contrastive language-image pretraining (CLIP) suffers from two structural weaknesses:

- The symmetric InfoNCE loss discards the relative ordering among unmatched in-batch pairs.
- Global pooling collapses the visual representation into a semantic bottleneck that is largely insensitive to fine-grained local structure.

RANKCLIP partially addresses the first issue with a list-wise Plackett-Luce ranking-consistency loss, but its model is strictly first-order and leaves the second weakness unaddressed. We propose DINORANKCLIP, a pretraining framework that addresses both jointly. Our principal contribution is injecting a frozen DINOv3 teacher into the contrastive trunk through a lightweight dual-branch student and a multi-scale fusion module with channel-spatial attention, a self-attention refiner, and a conflict-aware gate that preserves the cross-modal alignment up to first order. Complementarily, we introduce a high-order Plackett-Luce ranking model in which the per-position utility is augmented with attention-parameterised pairwise and tuple-wise transition terms. This family contains CLIP and RANKCLIP as nested zero-order and first-order special cases, and the optimal order is $R^*=3$ on every benchmark. The full empirical study (order sweep, Fine-grained Probe on five datasets, four-node Modality-Gap analysis, and six-variant Fusion ablation) fits in 72 hours on a single eight-GPU H100 node and trains entirely on Conceptual Captions 3M. DINORANKCLIP consistently outperforms CLIP, CyCLIP, ALIP, and RANKCLIP under matched compute, with the largest relative gains on the fine-grained and out-of-distribution evaluations that most directly stress local structural reasoning.

Updated: 2026-05-07 17:19:52

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2605.06592v1
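RANKCLIP's ranking-consistency loss, which DINORANKCLIP extends, builds on the Plackett-Luce model. That model assigns a ranking $\pi$ of $n$ items with utilities $s$ the probability $\prod_{k=1}^{n} \exp(s_{\pi_k}) / \sum_{j \ge k} \exp(s_{\pi_j})$.

Below is a minimal Python sketch of its negative log-likelihood with fixed per-item utilities only. The attention-parameterised pairwise and tuple-wise terms of DINORANKCLIP's high-order model are not modeled. `plackett_luce_nll` is a hypothetical helper, not code from either paper.

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) for a non-empty sequence."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def plackett_luce_nll(scores: Sequence[float], ranking: Sequence[int]) -> float:
    """Negative log-likelihood of `ranking` (item indices, best first)
    under a Plackett-Luce model with per-item utilities `scores`."""
    ordered = [scores[i] for i in ranking]
    nll = 0.0
    for k in range(len(ordered)):
        # The item at position k is "chosen" among all items not yet placed.
        nll += logsumexp(ordered[k:]) - ordered[k]
    return nll


# A ranking consistent with the utilities is more likely than its reverse.
print(plackett_luce_nll([2.0, 1.0, 0.0], [0, 1, 2]))
print(plackett_luce_nll([2.0, 1.0, 0.0], [2, 1, 0]))
```

In a RANKCLIP-style setup, `scores` could plausibly be the similarity logits of one image against in-batch captions, and `ranking` an order induced by a reference similarity. That pairing is an assumption here.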

BRICKS: Compositional Neural Markov Kernels for Zero-Shot Radiation-Matter Simulation

We introduce a new strategy for building compositional neural surrogates of radiation-matter interactions, a key task in domains ranging from particle physics through nuclear and space engineering to medical physics. Exploiting the locality and Markov nature of particle interactions, we create a next-particle prediction kernel. It uses hybrid discrete-continuous transformer models based on Riemannian Flow Matching on product manifolds. The model generates variable-sized, typed sets of particles and radiation side effects that result from the interaction of an incident particle with a material volume. The resulting kernel can be composed to simulate unseen large-scale material distributions in a zero-shot manner. Unlike mechanistic simulators, our model is designed to be differentiable and provides tractable likelihoods for future downstream applications. For single-kernel execution, we observe a significant computational speed-up on GPU compared with CPU-bound mechanistic simulation. We evaluate the model at the kernel level and demonstrate predictive stability over multi-round autoregressive rollouts. We additionally release a novel 20M-event radiation-matter interaction dataset for further research.

Updated: 2026-05-07 17:19:31

Categories: cs.LG,hep-ph

Download: http://arxiv.org/abs/2605.06591v1

Towards Metric-Faithful Neural Graph Matching

Graph Edit Distance (GED) is a fundamental, albeit NP-hard, metric for structural graph similarity. Recent neural graph matching architectures approximate GED by first encoding graphs with a Graph Neural Network (GNN) and then applying either a graph-level regression head or a matching-based alignment module. Despite substantial architectural progress, the role of encoder geometry in neural GED estimation remains poorly understood. In this paper, we develop a theoretical framework that connects encoder geometry to GED estimation quality for two broad classes of neural GED estimators: graph similarity predictors and alignment-based methods. On fixed graph collections, where the doubly-stochastic metric $d_{\mathrm{DS}}$ is comparable to GED, we show that graph-level bi-Lipschitz encoders yield controlled GED surrogates and improved ranking stability; for matching-based estimators, node-level bi-Lipschitz geometry propagates to encoder-induced alignment costs and the resulting optimized alignment objective. We instantiate this perspective using FSW-GNN, a bi-Lipschitz WL-equivalent encoder, as a drop-in replacement in representative neural GED architectures. Across representative baselines and benchmark datasets, the resulting geometry-aware variants significantly improve GED prediction and ranking metrics. A faithfulness case study of untrained encoders, together with ablations and transfer experiments, supports the view that these gains arise from improved representation geometry, positioning encoder geometry as a useful design principle for neural graph matching.

Updated: 2026-05-07 17:16:54

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2605.06588v1
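To make the GED discussion above concrete, here is a brute-force Python sketch for a deliberately restricted cost model:

- Graphs are unlabeled and have the same number of nodes.
- Only edge insertions and deletions are allowed, each at unit cost.

Under these assumptions the distance is the minimum, over node bijections, of the number of mismatched edges. The factorial search illustrates why exact GED is intractable beyond small graphs. `edge_only_ged` is a hypothetical helper and not any of the neural estimators discussed in the paper.

```python
from itertools import permutations
from typing import FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[int]


def edge_set(edges: Iterable[Tuple[int, int]]) -> Set[Edge]:
    """Undirected edges as frozensets, so (0, 1) and (1, 0) coincide."""
    return {frozenset(e) for e in edges}


def edge_only_ged(n: int, edges1: Iterable[Tuple[int, int]],
                  edges2: Iterable[Tuple[int, int]]) -> int:
    """Minimum number of edge insertions/deletions that turn graph 1 into a
    graph isomorphic to graph 2; both graphs use nodes 0..n-1 (O(n!) search)."""
    e1 = edge_set(edges1)
    e2 = edge_set(edges2)
    best = len(e1) + len(e2)  # upper bound: delete every edge, insert every edge
    for perm in permutations(range(n)):
        # Relabel graph 2 under this bijection and count mismatched edges.
        mapped = {frozenset(perm[v] for v in e) for e in e2}
        best = min(best, len(e1 ^ mapped))
    return best


# A 3-node path differs from a triangle by a single edge insertion.
print(edge_only_ged(3, [(0, 1), (1, 2)], [(0, 1), (1, 2), (0, 2)]))
```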

How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum

SFT-then-RLVR is widely used for post-training reasoning models, but two questions have lacked a unifying theoretical account: why this specific ordering works, and why RLVR alone stalls at cold start. We provide that account with a unified loss family $J_Q$ built on the Tsallis $q$-logarithm.

$J_Q$ is a single-parameter family that interpolates between two poles:

- RLVR at $q{=}0$, the exploitation pole.
- The log-marginal-likelihood over latent trajectories at $q{=}1$, the density-estimation pole.

Under this view, the standard pipeline corresponds to a stepwise $q{=}1 \to 0$ schedule. All members share the same per-example gradient direction and differ only by a per-instance amplification $P_\theta^{-q}$ that reweights each instance independently of the learning rate.

Using gradient-flow analysis, we show that the exploitation pole requires $\Omega\big(\frac{1}{p_0}\big)$ time to escape cold start but is robust to label noise. The density-estimation pole escapes in $\Theta\big(\log\frac{1}{p_0}\big)$ time but memorizes label noise. This separation explains the SFT-then-RLVR paradigm: SFT ($q{=}1$) first moves the model out of the cold-start regime, and the more noise-robust RLVR ($q{=}0$) follows.

We further derive two Monte Carlo estimators that directly optimize a fixed $q$ on the $J_Q$ continuum without annotated rationales: Gradient-Amplified RL (GARL) and Posterior-Attenuated Fine-Tuning (PAFT). Both share bias $O\big(\frac{q}{M P_\theta^{q}}\big)$ but differ in variance and stability properties. On FinQA, HotPotQA, and MuSiQue, GARL at sufficiently high $q$ substantially mitigates cold-start stalling, escaping cold start where GRPO fails entirely. In warm start, GARL at low $q$ dominates on FinQA, where training is stable. On HotPotQA and MuSiQue, GARL destabilizes while PAFT at $q{=}0.75$ remains stable, reaching 47.9 m@16 on HotPotQA (+13.9 over GRPO).

Updated: 2026-05-07 17:16:40

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2604.25907v2
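The Tsallis $q$-logarithm behind $J_Q$ is $\ln_q(x) = \frac{x^{1-q}-1}{1-q}$ for $q \neq 1$, with $\ln_1(x) = \log x$. Its derivative is $x^{-q}$. Therefore, if the per-example objective is $\ln_q(P_\theta)$ of a success probability (our reading of the abstract, not the paper's code), then $\nabla_\theta \ln_q(P_\theta) = P_\theta^{-q} \nabla_\theta P_\theta$. Every $q$ then shares the direction $\nabla_\theta P_\theta$ and differs only by the amplification $P_\theta^{-q}$. At $q=0$ the objective reduces to $P_\theta - 1$, a shifted expected reward; at $q=1$ it reduces to $\log P_\theta$. A minimal Python sketch with hypothetical function names:

```python
import math


def tsallis_log(x: float, q: float) -> float:
    """Tsallis q-logarithm; equals math.log(x) at q == 1 (its limit)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if math.isclose(q, 1.0):
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def amplification(p: float, q: float) -> float:
    """d ln_q(p) / dp = p**(-q): the weight multiplying grad p for this instance."""
    return p ** (-q)


# Hypothetical success probabilities: a cold-start instance (rare success)
# versus an instance the model already solves most of the time.
for p in (0.01, 0.9):
    weights = [round(amplification(p, q), 3) for q in (0.0, 0.5, 1.0)]
    print(f"p={p}: weights at q=0, 0.5, 1 -> {weights}")
```

Moving from $q=0$ to $q=1$ multiplies the weight on the $p=0.01$ instance by 100, but on the $p=0.9$ instance only by about 1.11. This matches the abstract's intuition that larger $q$ pushes harder on rarely solved (cold-start) examples.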

Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications

Semantic communications mark a paradigm shift from bit-accurate transmission toward meaning-centric communication, essential as wireless systems approach theoretical capacity limits. The emergence of generative AI has catalyzed generative semantic communications, where receivers reconstruct content from minimal semantic cues by leveraging learned priors. Among generative approaches, diffusion models stand out for their superior generation quality, stable training dynamics, and rigorous theoretical foundations. However, the field currently lacks systematic guidance connecting diffusion techniques to communication system design, forcing researchers to navigate disparate literatures. This article provides the first comprehensive tutorial on diffusion models for generative semantic communications. We present score-based diffusion foundations and systematically review three technical pillars: conditional diffusion for controllable generation, efficient diffusion for accelerated inference, and generalized diffusion for cross-domain adaptation. In addition, we introduce an inverse problem perspective that reformulates semantic decoding as posterior inference, bridging semantic communications with computational imaging. Through analysis of human-centric, machine-centric, and agent-centric scenarios, we illustrate how diffusion models enable extreme compression while maintaining semantic fidelity and robustness. By bridging generative AI innovations with communication system design, this article aims to establish diffusion models as foundational components of next-generation wireless networks and beyond.

Updated: 2026-05-07 17:16:30

Categories: eess.SP,cs.IT,cs.LG,cs.MM

Download: http://arxiv.org/abs/2511.08416v3

The ART of Composition: Attention-Regularized Training for Compositional Visual Grounding

Vision-Language Models (VLMs) have achieved strong performance on implicit and explicit visual grounding and related tasks. However, such abilities are generally tested on simple, single-object phrases. We find that grounding performance degrades for complex, multi-object references. These limitations largely arise from training objectives that leverage image-caption alignment, where direct multi-object references are rare, the number of possible such references is theoretically large (exponential in the number of objects), and attribution is difficult. To address this, without requiring any additional annotations, we propose Compositional Attention-Regularized Training (CompART), which decomposes captions into object-centric phrases and constructs composite phrases by pairing them with conjunctions. We then introduce a composition loss that encourages the attention induced by a composite phrase to equal the sum of the attentions of its constituent phrases, promoting balanced multi-object localization. We evaluate CompART across four VLM architectures, spanning both contrastive-based and generative-based models, on four benchmarks for multi-object grounding and two VQA benchmarks for general visual understanding. CompART consistently improves grounding for both single- and multi-object references across diverse VLM architectures and datasets, and further demonstrates enhanced visual understanding, as evidenced by gains on VQA, despite not being explicitly trained for this task.

Updated: 2026-05-07 17:14:28

Categories: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2412.08110v3
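CompART's composition loss asks the attention for a composite phrase such as "a cat and a dog" to equal the sum of the attentions for "a cat" and "a dog". The sketch below makes several assumptions not specified in the abstract:

- Attention maps are flattened lists over image patches.
- The penalty is a mean squared error.
- The maps are left unnormalized, since per-phrase softmax normalization would change the target's scale.

The paper's actual distance, normalization, and attention read-out in each VLM are not given. `composition_loss` is a hypothetical name.

```python
from typing import List, Sequence

AttentionMap = Sequence[float]  # flattened spatial attention over image patches


def composition_loss(composite: AttentionMap, parts: List[AttentionMap]) -> float:
    """Mean squared difference between the composite phrase's attention and
    the element-wise sum of its constituent phrases' attentions."""
    if not parts:
        raise ValueError("need at least one constituent phrase")
    if any(len(p) != len(composite) for p in parts):
        raise ValueError("all attention maps must have the same size")
    target = [sum(values) for values in zip(*parts)]
    return sum((c - t) ** 2 for c, t in zip(composite, target)) / len(composite)


cat = [0.8, 0.2, 0.0, 0.0]        # "a cat": mass on the left patches
dog = [0.0, 0.0, 0.3, 0.7]        # "a dog": mass on the right patches
balanced = [0.8, 0.2, 0.3, 0.7]   # "a cat and a dog" attends to both objects
collapsed = [1.6, 0.4, 0.0, 0.0]  # same total mass, but only on the cat

print(composition_loss(balanced, [cat, dog]))   # balanced localization: no penalty
print(composition_loss(collapsed, [cat, dog]))  # collapsed onto one object: penalized
```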

Distributionally-Robust Learning to Optimize

We propose a distributionally robust approach to learning hyperparameters for first-order methods in convex optimization. Given a dataset of problem instances, we minimize a Wasserstein distributionally robust version of the performance estimation problem (PEP) over algorithm parameters such as step sizes. Our framework unifies two extremes: as the robustness radius vanishes, we recover classical learning to optimize (L2O); as it grows, we recover worst-case optimal algorithm design via PEP. We solve the resulting problem with stochastic gradient descent, differentiating through the solution of an inner semidefinite program at each step. We prove high-probability bounds showing that the true risk of the learned algorithm is at most the in-sample L2O optimum plus a slack that shrinks with the sample size, and is no worse than the worst-case PEP bound. On unconstrained quadratic minimization, LASSO, and linear programming benchmarks, our learned algorithms achieve strong out-of-sample performance with certifiable robustness, outperforming both worst-case optimal and vanilla L2O baselines.

Updated: 2026-05-07 17:14:15

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2605.06585v1

NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research

Multimodal neuroimaging analysis often involves complex, modality-specific preprocessing workflows that require careful configuration, quality control, and coordination across heterogeneous toolchains. Beyond preprocessing, downstream statistical analysis and disease classification commonly require task-specific code, evaluation protocols, and data-format conventions, creating additional barriers between raw acquisitions and reproducible scientific analysis.

We present NeuroAgent, an LLM-driven agentic framework that automates key preprocessing and analysis steps for heterogeneous neuroimaging data, including sMRI, fMRI, dMRI, and PET, and supports interactive downstream analysis through natural-language queries. NeuroAgent employs a hierarchical multi-agent architecture with a feedback-driven Generate-Execute-Validate engine: agents autonomously generate executable preprocessing code, detect and recover from runtime errors, and validate output integrity.

We evaluate the system on 1,470 subjects pooled across all ADNI phases (CN=1,000, AD=470), where all subjects have sMRI and tabular data, with subsets also having Tau-PET (n=469), fMRI (n=278), and DTI (n=620). Pipeline ablation studies across multiple LLM backends show that capable models reach up to 100% intent-parsing accuracy, with the strongest backend (Qwen3.5-27B) reaching 84.8% end-to-end preprocessing step correctness. Automated recovery limits manual intervention to edge cases where human review is required via the Human-In-The-Loop interface. For Alzheimer's Disease classification using automatically preprocessed multimodal data, our agent ensemble achieves an AUC of 0.9518 with four modalities, outperforming all single-modality baselines. These results show that NeuroAgent can reduce the manual effort required for neuroimaging preprocessing and enable end-to-end automated analysis pipelines for neuroimaging research.

Updated: 2026-05-07 17:13:48

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06584v1

Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline

We propose a deterministic adjoint matching framework that formulates human-preference alignment of flow-based generative models as an optimal control problem over velocity fields. Under this formulation, the control can be regressed directly toward a value-gradient-induced target under the current policy, yielding a simple and stable training objective. Building on this perspective, we introduce a truncated adjoint scheme that focuses computation on the terminal portion of the trajectory, where reward-relevant signals concentrate. This yields substantial computational savings while preserving alignment quality. We further generalize the framework beyond standard KL-based regularization, allowing more flexible trade-offs between alignment strength and distributional preservation. Experiments on SiT-XL/2 and FLUX.2-Klein-4B demonstrate consistent gains across multiple alignment metrics, along with substantially improved diversity and mode preservation.

Updated: 2026-05-07 17:12:47

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06583v1

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, or codec reconstruction, assigning tokens locally, so sequence consistency, compactness, length control, termination, and edit similarity are rarely optimized directly. We introduce PairAlign, a framework for compact audio tokenization through sequence-level self-alignment. PairAlign treats tokenization as conditional sequence generation: an encoder maps speech to a continuous condition, and an autoregressive decoder generates tokens from BOS, learning token identity, order, length, and EOS placement. Given two content-preserving views, each view's sequence is trained to be likely under the other's representation, while unrelated examples provide competing sequences. This gives a scalable surrogate for edit-distance preservation while discouraging many-to-one collapse. PairAlign starts from VQ-style tokenization and refines it with EMA-teacher targets, cross-paired teacher forcing, prefix corruption, likelihood contrast, and length control. On 3-second speech, PairAlign learns compact, non-degenerate sequences with broad vocabulary usage and strong cross-view consistency. On TIMIT retrieval, it preserves edit-distance search while reducing archive token count by 55%. A continuous-sweep probe shows lower local overlap than a dense geometric tokenizer, but stronger length control and bounded edit trajectories under 100 ms shifts. PairAlign is a sequence-symbolic predictive learner: like JEPA-style objectives, it predicts an abstract target from another view as a learned variable-length symbolic sequence, not a continuous latent.

Updated: 2026-05-07 17:11:22

Categories: cs.LG,cs.CL,cs.SD

Download: http://arxiv.org/abs/2605.06582v1
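PairAlign's TIMIT retrieval result is stated in terms of edit-distance search over discrete token sequences. For reference, here is a minimal Python sketch of the standard Levenshtein distance with unit costs, plus a brute-force nearest-neighbor lookup over an archive of token sequences. The function names and token IDs are illustrative only, not PairAlign's tokenizer output or evaluation code.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance with unit insertion, deletion, and substitution costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x -> y (free if equal)
            )
        prev = curr
    return prev[-1]


def nearest(query: Sequence[int], archive: List[Sequence[int]]) -> int:
    """Index of the archived token sequence closest to `query` in edit distance."""
    return min(range(len(archive)), key=lambda k: edit_distance(query, archive[k]))


archive = [[9, 9], [1, 2, 4, 4, 4], [1, 2, 3, 3]]
print(nearest([1, 2, 3], archive))
```

Shorter token sequences make each distance computation cheaper, since the cost is proportional to the product of the two lengths. This is one reason a smaller archive token count matters for search.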

Suspicious Alignment of SGD: A Fine-Grained Step Size Condition Analysis

This paper explores the suspicious alignment phenomenon in stochastic gradient descent (SGD) under ill-conditioned optimization, where the Hessian spectrum splits into dominant and bulk subspaces. This phenomenon describes the behavior of gradient alignment in SGD updates. Specifically, during the initial phase of SGD updates, the alignment between the gradient and the dominant subspace tends to decrease. Subsequently, it enters a rising phase and eventually stabilizes in a high-alignment phase. The alignment is considered "suspicious" because, paradoxically, the projected gradient update along this highly aligned dominant subspace proves ineffective at reducing the loss. This work gives a fine-grained analysis, in a high-dimensional quadratic setup, of how step-size selection produces this phenomenon.

Our main contributions can be summarized as follows:

- We propose a step-size condition revealing that in low-alignment regimes, an adaptive critical step size $\eta_t^*$ separates alignment-decreasing ($\eta_t < \eta_t^*$) from alignment-increasing ($\eta_t > \eta_t^*$) regimes. In high-alignment regimes, by contrast, the alignment is self-correcting and decreases regardless of the step size.
- We show that under sufficient ill-conditioning, there exists a step-size interval in which projecting the SGD updates onto the bulk space decreases the loss, while projecting them onto the dominant space increases it. This explains a recent empirical observation that projecting gradient updates onto the dominant subspace is ineffective.
- Based on this adaptive step-size theory, we prove that for a constant step size and large initialization, SGD exhibits a distinct two-phase behavior: an initial alignment-decreasing phase, followed by stabilization at high alignment.

Updated: 2026-05-07 17:10:40

Categories: cs.LG

Download: http://arxiv.org/abs/2601.11789v2
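The alignment quantity studied in this paper can be computed in a small quadratic toy. This is a minimal Python sketch under the following assumptions:

- The Hessian is diagonal, and its first `k` coordinates form the dominant subspace.
- Alignment is the fraction of the squared gradient norm lying in that subspace (one common choice; the paper's exact definition may differ).
- Updates are noiseless full-batch gradient descent, not SGD.

In this toy each coordinate is multiplied by $1 - \eta\lambda_i$ per step. So the alignment trend is decided by comparing $|1-\eta\lambda_{\text{dom}}|$ with $|1-\eta\lambda_{\text{bulk}}|$, which gives a threshold $\eta = 2/(\lambda_{\text{dom}}+\lambda_{\text{bulk}})$. That threshold is only a toy analogue of the paper's adaptive $\eta_t^*$. The toy also lacks the paper's self-correcting decrease at high alignment, which arises in the SGD setting.

```python
from typing import List, Sequence


def alignment(grad: Sequence[float], k: int) -> float:
    """Fraction of the squared gradient norm in the first k (dominant) coordinates."""
    total = sum(g * g for g in grad)
    return sum(g * g for g in grad[:k]) / total if total > 0 else 0.0


def gd_alignment_trace(eigs: Sequence[float], x0: Sequence[float], k: int,
                       lr: float, steps: int) -> List[float]:
    """Run gradient descent on f(x) = 0.5 * sum(eig_i * x_i**2) and record the
    alignment of each gradient with the dominant subspace."""
    x = list(x0)
    trace = []
    for _ in range(steps):
        grad = [lam * xi for lam, xi in zip(eigs, x)]
        trace.append(alignment(grad, k))
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return trace


eigs = [100.0, 1.0, 1.0, 1.0]  # one dominant direction, condition number 100
x0 = [0.001, 1.0, 1.0, 1.0]    # start in a low-alignment state
below = gd_alignment_trace(eigs, x0, k=1, lr=0.0100, steps=1000)  # lr < 2/101
above = gd_alignment_trace(eigs, x0, k=1, lr=0.0199, steps=1000)  # 2/101 < lr < 2/100
print(f"lr=0.0100: alignment {below[0]:.4f} -> {below[-1]:.4f}")
print(f"lr=0.0199: alignment {above[0]:.4f} -> {above[-1]:.4f}")
```

Below the threshold, the dominant component shrinks faster than the bulk and alignment falls. Just above it, the dominant component oscillates with slower decay than the bulk, so alignment climbs toward 1 even though those oscillating steps barely reduce the dominant-direction loss.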

Language Models Can Autonomously Hack and Self-Replicate

We demonstrate that language models can autonomously replicate their weights and harness across a network by exploiting vulnerable hosts. The agent independently finds and exploits a web-application vulnerability, extracts credentials, and deploys an inference server with a copy of its harness and prompt on the compromised host. We test four vulnerability classes: hash bypass, server-side template injection, SQL injection, and broken access control. Qwen3.5-122B-A10B succeeds in 6-19% of attempts, and the smaller Qwen3.6-27B reaches 33% on a single A100. This already matches the current-generation GPT-5.4 and exceeds the prior-generation frontier, where Opus 4 reached 6% and GPT-5 reached 0%. When tasked with replicating Qwen weights, frontier models reach 81% (Opus 4.6) and 33% (GPT-5.4). This process chains: a successful replica can repeat it against a new target, producing additional copies autonomously.

Updated: 2026-05-07 17:09:36

Categories: cs.CR

Download: http://arxiv.org/abs/2605.06760v1

On the Safety of Graph Representation Learning

Graph representation learning (GRL) has evolved from topology-only graph embeddings to task-specific supervised GNNs, and more recently to reusable representations and graph foundation models (GFMs). However, existing evaluations mainly measure clean transfer, adaptation, and task coverage. It remains unclear whether GRL methods stay reliable when deployment stresses affect graph signals, graph contexts, label support, structural groups, or predictive evidence. We introduce GRL-Safety, a multi-axis safety evaluation benchmark for GRL. GRL-Safety evaluates twelve representative methods, spanning topology-only embedding methods, supervised GNNs, self-supervised graph models, and GFMs, on twenty-five graph datasets under standardized evaluation conditions while preserving method-native adaptation. The evaluation covers five safety axes: corruption robustness, OOD generalization, class imbalance, fairness, and interpretation, with per-axis and sub-condition reporting rather than a single aggregate score. Our analysis yields three cross-axis insights that can inspire future research. First, safety behavior is shaped by the interaction between representation design and the stressed graph factor, rather than by method family alone. Second, foundation-era methods show axis-specific strengths rather than broad safety dominance. Third, several deployment regimes remain difficult even for the best evaluated method, revealing capability gaps that require new robustness, adaptation, or training objectives beyond model selection. The benchmark, evaluation protocols, and code are available at: https://github.com/GXG-CS/GRL-Safety.

Updated: 2026-05-07 17:06:19

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06576v1

Exact Structural Abstraction and Tractability Limits

Any rigorously specified problem determines an admissible-output relation $R$, and exact correctness depends only on the induced decision quotient relation $s \sim_R s' \iff \operatorname{Adm}_R(s)=\operatorname{Adm}_R(s')$. Exact relevance certification asks which coordinates recover those classes. Decision, counting, search, approximation, PAC/regret/risk, randomized-output guarantees, anytime or finite-horizon guarantees, and distributional guarantees all reduce to this quotient-recovery problem. Universal exact-semantics reduction identifies admissible-output quotient recovery as the canonical object. Optimizer-quotient realizability is maximal, so quotient shape alone cannot mark a tractability frontier. Orbit gaps are the exact obstruction to classification by closure-law-invariant structural predicates. Exact classification by closure-law-invariant predicates succeeds exactly when the target is constant on closure orbits; on a closure-closed domain, equivalently, when the positive and negative orbit hulls are disjoint, in which case there is a least exact closure-invariant classifier. Across four natural candidate structural tractability criteria, a uniform pair-targeted affine witness produces same-orbit disagreements and rules out exact structural classification on the full binary pairwise domain. Because that witness class already sits inside the universal semantic framework, the same obstruction applies to any universal exact-certification characterization over rigorously specified problems. Restricting the domain helps only by removing orbit gaps. Without explicit margin control, arbitrarily small utility perturbations can flip relevance and sufficiency.
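
As a toy illustration of the quotient and of exact relevance certification, the sketch below groups states by their admissible-output sets $\operatorname{Adm}_R(s)$ and checks whether a set of coordinates recovers those classes exactly, i.e. whether any two states that agree on the chosen coordinates always share the same $\operatorname{Adm}_R$. The relation `adm`, the state space, and the function names are hypothetical, and the brute-force check enumerates every state, so it says nothing about the tractability questions studied in the paper.

```python
from itertools import product

def decision_quotient(states, adm):
    """Group states into the classes of s ~_R s' iff Adm_R(s) == Adm_R(s')."""
    classes = {}
    for s in states:
        classes.setdefault(frozenset(adm(s)), []).append(s)
    return list(classes.values())

def certifies(coords, states, adm):
    """True iff states agreeing on `coords` always share the same Adm_R set."""
    seen = {}
    for s in states:
        key = tuple(s[i] for i in coords)
        a = frozenset(adm(s))
        if seen.setdefault(key, a) != a:
            return False
    return True

# Hypothetical toy relation on binary 3-tuples: output 1 is admissible iff
# s[0] XOR s[1], otherwise output 0 is; coordinate s[2] is irrelevant.
states = list(product([0, 1], repeat=3))

def adm(s):
    return {1} if s[0] ^ s[1] else {0}

print(len(decision_quotient(states, adm)))  # 2
print(certifies([0, 1], states, adm))       # True
print(certifies([0], states, adm))          # False
```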

Updated: 2026-05-07 17:05:53

Categories: cs.CC,cs.AI,cs.LO

Download: http://arxiv.org/abs/2604.07349v7

Directional Consistency as a Complementary Optimization Signal: The GONO Framework

We identify and formalize an underexplored phenomenon in deep learning optimization: directional alignment and loss convergence can be decoupled. An optimizer can exhibit near-perfect directional consistency (cc_t -> 1, measured via consecutive gradient cosine similarity) while the loss remains high or decreases slowly. This observation reveals that existing optimizers such as Adam, SGD, and RMSprop lack explicit mechanisms to exploit temporal consistency in gradient directions, relying instead on magnitude-based signals that fail to distinguish plateaus, saddle points, and genuine convergence. Motivated by this, we introduce GONO (Gradient-Oriented Norm-Adaptive Optimizer), which adapts Adam's momentum coefficient beta_1 based on cc_t: amplifying momentum under directional consistency and suppressing it during oscillation. We prove GONO matches Adam's O(1/sqrt(T)) convergence rate and reduces exactly to Adam when the signal is uninformative. Empirically, cc_t achieves oscillation detection with F1=1.00 (vs. 0.45 for gradient norm), and GONO remains competitive with AdamW on MNIST (98.15%), CIFAR-10 (43.14%), and ResNet-18 (75.44%), establishing directional alignment as a theoretically grounded, practically actionable optimization signal. Code: https://github.com/victordaniel/gono-optimizer
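
The abstract does not give GONO's exact update rule, so the sketch below assumes a hypothetical linear one, beta1_t = clip(beta1 + gamma * cc_t, lo, hi), chosen so that cc_t = 0 leaves Adam's beta1 unchanged (mirroring, though not necessarily identical to, the reduction-to-Adam property above). The class name `GonoSketch` and the parameters `gamma` and `beta1_range` are illustrative. Because beta1 varies over time, the bias correction divides by 1 minus the running product of the beta1_t values, which is the correct normalization for a time-varying first-moment decay.

```python
import math

def cosine(u, v, eps=1e-12):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)) + eps)

class GonoSketch:
    """Adam whose beta1 is modulated by cc_t (hypothetical rule, not the paper's)."""
    def __init__(self, n, lr=1e-3, beta1=0.9, beta2=0.999, gamma=0.05,
                 beta1_range=(0.8, 0.99), eps=1e-8):
        self.lr, self.beta1, self.beta2, self.gamma, self.eps = lr, beta1, beta2, gamma, eps
        self.lo, self.hi = beta1_range
        self.m, self.v = [0.0] * n, [0.0] * n
        self.prev_g, self.b1_prod, self.t = None, 1.0, 0

    def step(self, params, grads):
        # cc_t: cosine similarity of consecutive gradients (0 on the first step).
        cc = cosine(grads, self.prev_g) if self.prev_g is not None else 0.0
        # cc_t > 0 (consistent direction) raises beta1; cc_t < 0 (oscillation) lowers it.
        b1 = min(self.hi, max(self.lo, self.beta1 + self.gamma * cc))
        self.t += 1
        self.b1_prod *= b1
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = b1 * self.m[i] + (1 - b1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.b1_prod)       # bias correction, varying beta1
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        self.prev_g = list(grads)
        return out, cc, b1

opt = GonoSketch(1, lr=0.1)
p, cc, b1 = opt.step([1.0], [2.0])    # first step: cc = 0, plain Adam beta1
p, cc, b1 = opt.step(p, [2.0])        # same direction: beta1 rises
print(cc, b1)
```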

Updated: 2026-05-07 17:05:05

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2605.06575v1

Refining Gelfond Rationality Principle: Towards More Comprehensive Foundational Principles for Answer Set Semantics

Non-monotonic logic programming is the basis for a declarative problem-solving paradigm known as answer set programming (ASP). Since the seminal 1988 definition by Gelfond and Lifschitz for simple normal logic programs, various answer set semantics have been proposed for extensions. We consider two important questions: (1) Should the minimal model property, constraint monotonicity and foundedness as defined in the literature be mandatory conditions for an answer set semantics in general? (2) If not, what other properties could be considered as general principles for answer set semantics? We address both questions. First, it seems that the three aforementioned conditions may sometimes be too strong, and we illustrate with examples that enforcing them may exclude expected answer sets. Second, we evolve the Gelfond answer set (GAS) principles for answer set construction by refining Gelfond's rationality principle into well-supportedness, minimality w.r.t. negation by default and minimality w.r.t. epistemic negation. The principle of well-supportedness guarantees that every answer set is constructible from if-then rules obeying a level mapping and is thus free of circular justification, while the two minimality principles ensure that the formalism minimizes knowledge both at the level of answer sets and at the level of world views. Third, to embody the refined GAS principles, we substantially extend the notion of well-supportedness to answer sets and to world views, respectively. Fourth, we define new answer set semantics in terms of the refined GAS principles. Fifth, we use the refined GAS principles as an alternative baseline to intuitively assess existing answer set semantics. Finally, we analyze the computational complexity.
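
For reference, the sketch below implements the classical Gelfond-Lifschitz (1988) answer-set check for normal programs, not the refined GAS semantics proposed here. It records the round in which each atom is derived; these rounds form a level mapping, which is why a candidate such as {p} for the program `p :- p.` is rejected as circularly justified. Rules are hypothetical `(head, positive_body, negative_body)` triples over string atoms.

```python
def least_model(rules):
    """Least model of a negation-free program, with each atom's derivation level."""
    level, k, changed = {}, 0, True
    while changed:
        changed, k = False, k + 1
        for head, pos in rules:
            # fire only on atoms derived in earlier rounds, so levels strictly increase
            if head not in level and all(a in level and level[a] < k for a in pos):
                level[head] = k
                changed = True
    return set(level), level

def is_answer_set(program, m):
    """Gelfond-Lifschitz check: m must equal the least model of the reduct P^m."""
    reduct = [(h, pos) for h, pos, neg in program if not set(neg) & m]
    lm, level = least_model(reduct)
    return lm == m, level

choice = [("p", [], ["q"]), ("q", [], ["p"])]        # p :- not q.   q :- not p.
print(is_answer_set(choice, {"p"})[0])              # True
print(is_answer_set(choice, {"p", "q"})[0])         # False
print(is_answer_set([("p", ["p"], [])], {"p"})[0])  # False: p only supports itself
print(is_answer_set([("a", [], []), ("b", ["a"], [])], {"a", "b"}))
```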

Updated: 2026-05-07 17:02:42

Categories: cs.AI

Download: http://arxiv.org/abs/2507.01833v2

CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification

The rapid expansion of the Internet of Things (IoT) and Industrial IoT (IIoT) has created a massive, heterogeneous attack surface that challenges traditional network security mechanisms. While Federated Learning (FL) offers a privacy-preserving alternative to centralized Intrusion Detection Systems (IDS), standard approaches struggle to generalize across diverse device behaviors and typically fail to utilize the vast amounts of unlabeled data present in realistic edge environments. To bridge these gaps, we propose CLAD, a holistic framework that seamlessly integrates Clustered Federated Learning (CFL) with a novel Dual-Mode Micro-Architecture ($\text{DM}^2\text{A}$). This unified approach simultaneously tackles the two primary bottlenecks of IoT security: device heterogeneity and label scarcity. The $\text{DM}^2\text{A}$ component features a shared encoder followed by two branches, enabling joint unsupervised anomaly detection and supervised attack classification; this allows the framework to learn from both labeled and unlabeled clients. Concurrently, the clustering component dynamically groups devices with congruent traffic patterns, preventing global model divergence. By carefully combining these elements, CLAD ensures that no data is discarded and distinct operational patterns are preserved. Extensive evaluations demonstrate that this integrated approach significantly outperforms state-of-the-art baselines, achieving a 30% relative improvement in detection performance in scenarios with 80% unlabeled clients while using only half the communication cost.
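
The sketch below illustrates only the dual-branch idea behind $\text{DM}^2\text{A}$ for a single sample: a shared encoder feeds a reconstruction branch, whose error serves as the anomaly score, and a classification branch whose cross-entropy is added only when a label exists, so unlabeled clients still yield a training signal. The tanh encoder, the linear heads, the weight `lam`, and all names are assumptions; clustering, federated aggregation, and training are omitted.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dual_branch_loss(x, label, enc, dec, cls, lam=1.0):
    """Loss and anomaly score for one sample; label is None on unlabeled clients."""
    h = [math.tanh(v) for v in matvec(enc, x)]                     # shared encoder
    x_hat = matvec(dec, h)                                         # reconstruction branch
    score = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)   # anomaly score (MSE)
    if label is None:
        return score, score                                        # unsupervised signal only
    logits = matvec(cls, h)                                        # attack-classification branch
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))  # stable log-sum-exp
    return score + lam * (log_norm - logits[label]), score

enc = [[1.0, 0.0], [0.0, 1.0]]
dec = [[1.0, 0.0], [0.0, 1.0]]
cls = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]      # 3 hypothetical attack classes
x = [0.5, -0.5]
print(dual_branch_loss(x, None, enc, dec, cls))   # unlabeled sample
print(dual_branch_loss(x, 0, enc, dec, cls))      # sample labeled as class 0
```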

Updated: 2026-05-07 17:01:19

Categories: cs.LG,cs.CR,cs.DC,cs.NI

Download: http://arxiv.org/abs/2605.06571v1

SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation

Many real-world problems require sequential decisions under uncertainty: when to inject or withdraw gas from storage, how to rebalance a pension portfolio each month, what temperature profile to run through a pharmaceutical reactor chain. Dynamic programming solves small instances exactly but scales exponentially in state dimensions. Black-box reinforcement learning handles high-dimensional states but trains slowly and produces no sensitivities. We introduce SNAPO (Smooth Neural Adjoint Policy Optimization), a framework that embeds a neural policy inside a known, differentiable simulator, replaces hard constraints with smooth approximations, and computes exact gradients of the objective with respect to all policy parameters and all inputs in a single adjoint pass. We demonstrate SNAPO on three domains: natural gas storage (training in under a minute, 365 forward curve sensitivities at no additional cost per sensitivity), pension fund asset-liability management (6.5x-200x sensitivity speedup over bump-and-revalue, scaling with the number of risk factors), and pharmaceutical manufacturing (cross-unit sensitivities through a 4-unit process chain, with 20 ICH Q8 regulatory sensitivities from 5 adjoint passes in 74.5 milliseconds). All sensitivities are produced by the same backward pass that trains the policy, at a cost proportional to one reverse pass regardless of how many sensitivities are computed.
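
To show why a single adjoint (reverse) pass yields every sensitivity at once, the sketch below uses a minimal hand-written scalar reverse-mode autodiff on a toy gas-storage simulator, with hard inventory bounds replaced by softplus penalties in the spirit of the smoothing step described above. The simulator, the fixed action sequence standing in for a neural policy, the penalty weight, and all names are hypothetical. `bump_and_revalue` is the finite-difference baseline, which needs two extra simulations per input.

```python
import math

class Var:
    """Scalar node for a minimal reverse-mode autodiff; `parents` are its inputs."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = float(value), parents, 0.0

    @staticmethod
    def lift(o):
        return o if isinstance(o, Var) else Var(o)

    def __add__(self, o):
        o = Var.lift(o)
        return Var(self.value + o.value, ((self, 1.0), (o, 1.0)))

    def __sub__(self, o):
        o = Var.lift(o)
        return Var(self.value - o.value, ((self, 1.0), (o, -1.0)))

    def __mul__(self, o):
        o = Var.lift(o)
        return Var(self.value * o.value, ((self, o.value), (o, self.value)))

    __rmul__ = __mul__

def softplus(x, k=10.0):
    """Smooth stand-in for max(0, x); its derivative is sigmoid(k x)."""
    v = math.log1p(math.exp(k * x.value)) / k
    return Var(v, ((x, 1.0 / (1.0 + math.exp(-k * x.value))),))

def backward(out):
    """One reverse pass: iterative post-order DFS, then accumulate adjoints."""
    order, seen, stack = [], set(), [(out, False)]
    while stack:
        node, done = stack.pop()
        if done:
            order.append(node)
        elif id(node) not in seen:
            seen.add(id(node))
            stack.append((node, True))
            stack.extend((p, False) for p, _ in node.parents)
    out.grad = 1.0
    for n in reversed(order):
        for p, local in n.parents:
            p.grad += local * n.grad

def storage_value(prices, actions, capacity=10.0, penalty=100.0):
    """Toy storage: action > 0 buys and injects, action < 0 sells and withdraws."""
    level, cash = Var(0.0), Var(0.0)
    for p, a in zip(prices, actions):
        cash = cash - p * a
        level = level + a
        # smooth penalties replace the hard bounds 0 <= level <= capacity
        cash = cash - penalty * (softplus(level - capacity) + softplus(level * -1.0))
    return cash

def bump_and_revalue(prices, actions, i, h=1e-6):
    """Central finite difference in action i (two extra forward runs)."""
    def value(acts):
        return storage_value([Var(p) for p in prices], [Var(a) for a in acts]).value
    up, dn = list(actions), list(actions)
    up[i] += h
    dn[i] -= h
    return (value(up) - value(dn)) / (2 * h)

prices = [Var(p) for p in (2.0, 3.0, 5.0)]      # hypothetical forward curve
actions = [Var(a) for a in (4.0, 0.0, -3.0)]    # a fixed policy's decisions
J = storage_value(prices, actions)
backward(J)                                     # all sensitivities in one pass
print([p.grad for p in prices])                 # dJ/dprice_t = -action_t
print([a.grad for a in actions])
```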

Updated: 2026-05-07 17:01:13

Categories: cs.LG,math.OC,q-fin.CP,q-fin.MF,q-fin.RM

Download: http://arxiv.org/abs/2605.06570v1

Flow-Based Conformal Predictive Distributions

Conformal prediction provides a distribution-free framework for uncertainty quantification via prediction sets with exact finite-sample coverage. In low dimensions these sets are easy to interpret, but in high-dimensional or structured output spaces they are difficult to represent and use, which can limit their ability to integrate with downstream tasks such as sampling and probabilistic forecasting. We show that any sufficiently regular differentiable nonconformity score induces a deterministic flow on the output space whose trajectories converge to the boundary of the corresponding conformal prediction set. This leads to a computationally efficient, training-free method for sampling conformal boundaries in arbitrary dimensions. Mixing across confidence levels yields conformal predictive distributions whose quantile regions coincide with the empirical conformal prediction sets. We provide an approximation bound decomposing CPD predictive error into score-induced distortion, base-measure quality, and gradient flow-induced distortion. We evaluate the approach on PDE inverse problems, precipitation downscaling, climate model debiasing, and hurricane trajectory forecasting.
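
A minimal sketch under stated assumptions: the nonconformity score is $s(y) = \lVert y - \mu \rVert_2$ for a hypothetical point prediction $\mu$, the threshold $q$ is the usual split-conformal quantile, and the flow $\dot y = -(s(y) - q)\,\nabla s(y) / \lVert \nabla s(y) \rVert^2$ is one simple choice that drives the score to $q$ (along it, $\tfrac{d}{dt}(s - q) = -(s - q)$). It is not claimed to be the flow constructed in the paper. The start point must avoid $y = \mu$, where this score's gradient vanishes.

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(cal_scores)[k - 1]

def flow_to_boundary(y, score, grad, q, dt=0.1, steps=200):
    """Euler steps of dy/dt = -(s(y) - q) grad s / |grad s|^2, driving s(y) to q."""
    y = list(y)
    for _ in range(steps):
        g = grad(y)
        gn2 = sum(c * c for c in g) or 1e-12
        r = score(y) - q
        y = [yi - dt * r * gi / gn2 for yi, gi in zip(y, g)]
    return y

mu = [1.0, 2.0]                          # hypothetical point prediction for one input

def score(y):
    return math.dist(y, mu)              # nonconformity s(x, y) = |y - mu(x)|

def grad(y):
    d = max(math.dist(y, mu), 1e-12)
    return [(a - b) / d for a, b in zip(y, mu)]

cal = [i / 10 for i in range(1, 20)]     # made-up calibration scores, n = 19
q = conformal_quantile(cal, alpha=0.1)   # the 18th smallest score
y = flow_to_boundary([1.5, 2.0], score, grad, q)
print(q, y, score(y))
```

Drawing the level alpha at random and flowing a base sample to the matching boundary is one way to realize the mixing across confidence levels described above.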

Updated: 2026-05-07 17:00:28

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2602.07633v3

Dynamic Treatment on Networks

In networks, effective dynamic treatment allocation requires deciding both whom to treat and also when, so as to amplify policy impact through spillovers. An early intervention at a well-connected node can trigger cascades that change which nodes are worth targeting in the next period. Existing treatment strategies under network interference are largely static while dynamic treatment frameworks typically ignore network structure altogether. We integrate these perspectives and propose Q-Ising, a three-stage pipeline that (i) estimates network adoption dynamics via a Bayesian dynamic Ising model from a single observed panel, (ii) augments treatment adoption histories with continuous posterior latent states, and (iii) learns a dynamic policy via offline reinforcement learning. The Bayesian mechanism enables uncertainty quantification over dynamic decisions, yielding posterior ensemble policies with interpretable spillover estimates. We provide a finite-sample regret upper bound that decomposes into standard offline-RL uncertainty, network abstraction error, and first-stage error in Ising state estimation. We apply our method to data from Indian village microfinance networks and synthetic stochastic block models under simulated heterogeneous susceptible-infected-susceptible (SIS) dynamics and demonstrate that adaptive targeting outperforms static centrality benchmarks.
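
To make the spillover intuition concrete, the sketch below uses a logistic dynamic-Ising transition, in which a node's adoption probability depends on its own field, its neighbors' current states, and treatment, and averages over posterior draws. It scores each single-node treatment with a two-step mean-field rollout rather than the offline reinforcement learning used by Q-Ising. The transition form, the parameter values, and the star network are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adoption_probs(adj, z, treat, h, J, tau):
    """Logistic dynamic-Ising transition: P(z_i = 1 at t+1 | z_t, treatment)."""
    return [sigmoid(h[i] + J * sum(z[j] for j in adj[i]) + tau * treat[i])
            for i in range(len(z))]

def rollout_value(adj, z, first_treat, posterior, horizon=2):
    """Posterior-averaged expected adopters after `horizon` steps.

    Mean-field rollout: probabilities stand in for the binary states."""
    total = 0.0
    for h, J, tau in posterior:
        state = list(z)
        for step in range(horizon):
            treat = first_treat if step == 0 else [0] * len(z)
            state = adoption_probs(adj, state, treat, h, J, tau)
        total += sum(state)
    return total / len(posterior)

def best_first_target(adj, z, posterior, horizon=2):
    """Treat one node now; return the node with the highest rollout value."""
    def value(i):
        treat = [1 if j == i else 0 for j in range(len(z))]
        return rollout_value(adj, z, treat, posterior, horizon)
    return max(range(len(z)), key=value)

adj = [[1, 2, 3], [0], [0], [0]]           # star network: node 0 is the hub
posterior = [([-2.0] * 4, 1.5, 3.0)]       # one hypothetical draw of (h, J, tau)
print(best_first_target(adj, [0, 0, 0, 0], posterior))
```

With `horizon=1` every single-node treatment scores the same here, because treatment only changes the treated node's own probability; the advantage of treating the hub only shows up once its adoption spills over to its neighbors in the following period.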

Updated: 2026-05-07 16:58:38

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2605.06564v1

Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models

Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advances in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to "think about how to think". It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach employs contextual multi-armed bandits (CMABs) to learn an adaptive policy. The policy learns to evaluate the current state of the LLM's reasoning and to select the strategy most likely to lead to a successful outcome during inference, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This meta-guidance helps avoid exploring unproductive paths during inference and hence improves computational efficiency. We evaluate Meta-Reasoner on math problems (e.g., Game-of-24, TheoremQA) and scientific tasks (e.g., SciBench). Results show that our method outperforms previous SOTA methods by 9-12% in accuracy, while reducing inference time by 28-35% under the same compute budget. Additional experiments on creative writing demonstrate the generalizability of our approach to diverse reasoning-intensive tasks.
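
The abstract does not name a specific bandit algorithm, so the sketch below uses plain UCB1 run separately for each discretized context as a stand-in for a contextual bandit. The context labels, the three strategy arms, and the reward signal are hypothetical; in practice the context would be derived from the LLM's reasoning trace.

```python
import math
from collections import defaultdict

class ContextualUCB:
    """UCB1 kept separately for each discrete context (a simple CMAB stand-in)."""
    def __init__(self, arms, c=2.0):
        self.arms, self.c = list(arms), c
        self.n = defaultdict(int)      # (context, arm) -> number of pulls
        self.s = defaultdict(float)    # (context, arm) -> summed reward
        self.t = defaultdict(int)      # context -> total pulls

    def select(self, ctx):
        for a in self.arms:            # try every strategy once per context
            if self.n[(ctx, a)] == 0:
                return a
        def ucb(a):
            k = (ctx, a)
            bonus = math.sqrt(self.c * math.log(self.t[ctx]) / self.n[k])
            return self.s[k] / self.n[k] + bonus
        return max(self.arms, key=ucb)

    def update(self, ctx, arm, reward):
        self.n[(ctx, arm)] += 1
        self.s[(ctx, arm)] += reward
        self.t[ctx] += 1

# Hypothetical simulator: the rewarding strategy depends on the reasoning state.
best = {"stuck": "backtrack", "progressing": "continue"}
bandit = ContextualUCB(["continue", "backtrack", "restart"])
for _ in range(300):
    for ctx, good in best.items():
        arm = bandit.select(ctx)
        bandit.update(ctx, arm, 1.0 if arm == good else 0.0)
for ctx in best:
    print(ctx, max(bandit.arms, key=lambda a: bandit.n[(ctx, a)]))
```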

Updated: 2026-05-07 16:58:35

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2502.19918v6

Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation

Training large language models (LLMs) for non-verifiable tasks, such as creative writing, dialogue, and ethical reasoning, remains challenging due to the absence of ground-truth labels. While LLM-as-Judge approaches offer a scalable alternative to human feedback, they face a fundamental limitation: performance is constrained by the evaluator's own quality. If the judge cannot recognize good solutions, it cannot provide useful training signals, and evaluation biases (e.g., favoring verbosity over quality) remain unaddressed. This motivates meta-evaluation: the ability to evaluate and improve the evaluator itself. We introduce CoNL, a framework that unifies generation, evaluation, and meta-evaluation through multi-agent self-play. Our key insight: critique quality can be measured by whether it helps others improve their solutions. In CoNL, multiple agents sharing the same policy engage in structured conversations to propose, critique, and revise solutions. Critiques that enable solution improvements earn a diagnostic reward, creating explicit supervision for meta-evaluation and enabling joint optimization of generation and judging capabilities through self-play, without external judges or ground truth. Experiments on various benchmarks show that CoNL achieves consistent improvements over self-rewarding baselines while maintaining stable training.

Updated: 2026-05-07 16:58:04

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2601.21464v2

Criticality and Saturation in Orthogonal Neural Networks

It has been known for a long time that initializing weight matrices to be orthogonal instead of having i.i.d. Gaussian components can improve training performance. This phenomenon can be analyzed using finite-width corrections, where the infinite-width statistics are supplemented by a power series in $1/\mathrm{width}$. In particular, recent empirical results by Day et al. show that the tensors appearing in this treatment stabilize for large depth, as opposed to the tensors of i.i.d.-initialized networks. In this article, we derive explicit layer-wise recursion relations for the tensors appearing in the finite-width expansion of the network statistics in the case of orthogonal initializations. We also provide an extension of recently-introduced Feynman diagrams for the corresponding recursions in the i.i.d.-case which are valid to all orders in $1/\mathrm{width}$. Finally, we show explicitly that the recursions we derive reproduce the stability of the finite-width tensors which was observed for activation functions with vanishing fixed point. This work therefore provides a theoretical explanation for the stability of nonlinear networks of finite width initialized with orthogonal weights, closing a long-standing gap in the literature. We validate our theoretical results experimentally by showing that numerical solutions of our recursion relations and their analytical large-depth expansions agree excellently with Monte-Carlo estimates from network ensembles.
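
As a small numerical companion (linear maps only, not the paper's finite-width recursions), the sketch below draws orthogonal matrices by Gram-Schmidt on i.i.d. Gaussian vectors and checks that a deep stack of them preserves the input norm exactly. A Gaussian matrix with variance-$1/n$ entries preserves the squared norm only in expectation. All names and sizes are illustrative.

```python
import math
import random

def random_orthogonal(n, rng):
    """Rows are Gram-Schmidt-orthonormalized i.i.d. Gaussian vectors.

    Equivalent to QR with a positive-diagonal R; gives a Haar-distributed matrix."""
    rows = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in rows:  # modified Gram-Schmidt: remove each projection in turn
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows

def gaussian_matrix(n, rng):
    """i.i.d. N(0, 1/n) entries: E|Wx|^2 = |x|^2, but only in expectation."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)] for _ in range(n)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def norm(x):
    return math.sqrt(sum(a * a for a in x))

rng = random.Random(0)
n, depth = 32, 50
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_orth, x_gauss = list(x), list(x)
for _ in range(depth):  # deep *linear* network with fresh weights per layer
    x_orth = matvec(random_orthogonal(n, rng), x_orth)
    x_gauss = matvec(gaussian_matrix(n, rng), x_gauss)
print(norm(x_orth) / norm(x))   # 1.0 up to rounding
print(norm(x_gauss) / norm(x))  # random; not pinned to 1
```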

Updated: 2026-05-07 16:57:59

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06563v1

Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production

Within the landscape of inference-time scaling methods for foundation models, a width-based approach to scaling -- which involves the insertion of <pause> tokens in the input stream to delay model responses -- offers a unique advantage by increasing model expressivity while remaining highly parallelizable at both training and inference. The existing literature on training models to utilize <pause> tokens relies on the standard cross-entropy objective in which the model output is read out and evaluated only at the final step of a pause sequence. This approach provides no mechanism for the model to regulate its own processing or to signal readiness to respond, treating the additional compute steps as a static barrier rather than a resource to be used adaptively. We propose a supervised loss, Catch Your Breath (CYB), framed as a sequential-decision problem, that trains a model to dynamically and autonomously scale the number of compute steps used for each input token. The model indicates the need for additional compute steps by emitting a special <don't know> output, delaying its response via a pause. The model can abstain multiple times to obtain longer delays. Our experiments demonstrate that CYB significantly outperforms standard cross-entropy when introduced either in pretraining or fine-tuning, reducing perplexity and enhancing downstream accuracy with no additional computational or memory cost.
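
The decode-time side of this mechanism can be sketched as a loop: whenever the model emits <don't know>, a <pause> token is appended and the same position is retried, up to a per-token budget. The `model_step(tokens, allow_abstain)` interface, the budget, and the scripted toy model are hypothetical, and the CYB training loss itself is not shown.

```python
PAUSE, DONT_KNOW, EOS = "<pause>", "<don't know>", "<eos>"

def generate_with_pauses(model_step, prompt, max_new=20, max_pauses=4):
    """Greedy decoding where <don't know> buys extra compute via <pause> tokens.

    model_step(tokens, allow_abstain) -> next token (hypothetical interface);
    once the per-token pause budget is spent, the model must commit."""
    tokens, output = list(prompt), []
    while len(output) < max_new:
        pauses = 0
        while True:
            tok = model_step(tokens, allow_abstain=pauses < max_pauses)
            if tok != DONT_KNOW:
                break
            if pauses >= max_pauses:
                raise ValueError("model abstained after its pause budget was spent")
            tokens.append(PAUSE)   # delay this response by one compute step
            pauses += 1
        tokens.append(tok)
        if tok == EOS:
            break
        output.append(tok)
    return output, tokens

def toy_model(tokens, allow_abstain=True):
    """Scripted stub: wants two pauses after a '?' before answering."""
    trailing = 0
    for t in reversed(tokens):
        if t != PAUSE:
            break
        trailing += 1
    if tokens[-1 - trailing] == "?":
        return DONT_KNOW if (trailing < 2 and allow_abstain) else "42"
    return EOS

out, ctx = generate_with_pauses(toy_model, ["6", "x", "7", "?"])
print(out)  # ['42']
print(ctx)  # ['6', 'x', '7', '?', '<pause>', '<pause>', '42', '<eos>']
```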

Updated: 2026-05-07 16:57:03

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2510.13879v2

Feature Dimensionality Outweighs Model Complexity in Breast Cancer Subtype Classification Using TCGA-BRCA Gene Expression Data

Accurate classification of breast cancer subtypes from gene expression data is critical for diagnosis and treatment selection. However, such datasets are characterized by high dimensionality and limited sample size, posing challenges for machine learning models. In this study, we evaluate the impact of model complexity and feature selection on subtype classification performance using TCGA-BRCA gene expression data. Logistic regression, random forest, and support vector machine (SVM) models were trained using varying numbers of highly variable genes (50 to 20,518). Performance was evaluated using stratified 5-fold cross-validation and assessed with accuracy and macro F1 score. While all models achieved high accuracy, macro F1 analysis revealed substantial differences in subtype-level performance. Logistic regression demonstrated the most stable and balanced performance across subtypes, including improved detection of rare classes. Random forest underperformed on minority subtypes despite strong overall accuracy, while SVM showed sensitivity to feature dimensionality. These findings highlight the importance of model simplicity, evaluation metrics, and feature selection in high-dimensional biological classification tasks.
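
The gap between accuracy and macro F1 reported above is easy to reproduce. In the sketch below, a classifier that always predicts the majority subtype scores 0.9 accuracy but a macro F1 of about 0.47, because the missed rare class contributes an F1 of 0. The variance-based selection mirrors the "highly variable genes" step; labels and data are made up, and in cross-validation the selection would normally be fitted on the training folds only.

```python
from statistics import pvariance

def top_variable_genes(samples, k):
    """Indices of the k features (genes) with the highest variance across samples."""
    n_features = len(samples[0])
    variances = [pvariance([s[j] for s in samples]) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: variances[j], reverse=True)[:k]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so a rare subtype counts as much as a common one."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Made-up labels: 18 common-subtype samples, 2 rare-subtype samples,
# and a classifier that always predicts the common subtype.
y_true = ["LumA"] * 18 + ["Basal"] * 2
y_pred = ["LumA"] * 20
print(accuracy(y_true, y_pred))            # 0.9
print(round(macro_f1(y_true, y_pred), 3))  # 0.474
print(top_variable_genes([[1, 5, 0], [1, 1, 0], [1, 3, 0]], k=1))  # [1]
```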

Updated: 2026-05-07 16:55:46

Categories: cs.LG,q-bio.GN

Download: http://arxiv.org/abs/2605.06562v1

Contrastive Image-Metadata Pre-Training for Materials Transmission Electron Microscopy

The transmission electron microscope facilitates the highest-resolution imaging of any instrument ever created, and its limiting factor is no longer spatial resolution but dose efficiency. Low electron doses avoid sample damage but produce noisy images for which, unlike in classical computer vision, there is no ground truth. Autonomous materials experimentation poses a related problem, since closed-loop instruments need representations grounded in the microscope state at acquisition. Both demand representations grounded in how an image was acquired. We release 7,330 paired high-angle annular dark-field scanning-TEM (HAADF-STEM) images and their seven-dimensional acquisition metadata, and propose Contrastive Image-Metadata Pre-training (CIMP), a CLIP-style encoder that aligns the two modalities and reaches 84.4% Top-1 cross-modal retrieval on a held-out split. All seven parameters are individually recoverable from the frozen visual embedding through a linear probe, and we use the embedding to condition a metadata-conditioned style-transfer model that re-renders experimental images under different acquisition parameters. Virtually scaling dwell time and beam current of low-dose images turns this model into a physics-informed denoiser; in a blind user study, experimental microscopists prefer it over the current state-of-the-art denoiser for STEM imagery on 70.2% of trials.
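
A minimal sketch of a CLIP-style objective for paired image and metadata embeddings: a symmetric cross-entropy over the batch similarity matrix, plus the Top-1 cross-modal retrieval metric quoted above. The encoders are not shown (embeddings are given directly), and the temperature value and function names are assumptions.

```python
import math

def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def clip_loss(img_emb, meta_emb, temperature=0.07):
    """Symmetric InfoNCE: pair i of images and metadata is the positive."""
    I = [normalize(v) for v in img_emb]
    M = [normalize(v) for v in meta_emb]
    logits = [[sum(a * b for a, b in zip(i, m)) / temperature for m in M] for i in I]

    def row_ce(L):
        total = 0.0
        for r, row in enumerate(L):
            mx = max(row)
            total += mx + math.log(sum(math.exp(v - mx) for v in row)) - row[r]
        return total / len(L)

    cols = [list(c) for c in zip(*logits)]   # metadata -> image direction
    return 0.5 * (row_ce(logits) + row_ce(cols))

def top1_retrieval(img_emb, meta_emb):
    """Fraction of images whose most similar metadata vector is their own."""
    I = [normalize(v) for v in img_emb]
    M = [normalize(v) for v in meta_emb]
    hits = 0
    for r, i in enumerate(I):
        sims = [sum(a * b for a, b in zip(i, m)) for m in M]
        hits += max(range(len(M)), key=sims.__getitem__) == r
    return hits / len(I)

E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = E[1:] + E[:1]   # metadata rows no longer aligned with their images
print(top1_retrieval(E, E), top1_retrieval(E, shifted))  # 1.0 0.0
print(clip_loss(E, E) < clip_loss(E, shifted))           # True
```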

Updated: 2026-05-07 16:55:16

Categories: cs.LG,cs.CE

Download: http://arxiv.org/abs/2604.24909v2

Mirror Descent-Ascent for mean-field min-max problems

We study two variants of the mirror descent-ascent (MDA) algorithm for solving min-max problems on the space of measures: simultaneous and alternating. We work under assumptions of convexity-concavity and relative smoothness of the payoff function with respect to a suitable Bregman divergence, defined on the space of measures via flat derivatives. We establish non-asymptotic convergence rates to mixed Nash equilibria, measured in the Nikaidô-Isoda error, proving an $\mathcal{O}(N^{-1/2})$ rate for simultaneous MDA and an improved $\mathcal{O}(N^{-2/3})$ rate for alternating MDA. The main technical contribution is an infinite-dimensional dual space analysis that relates Bregman divergences on measures to dual Bregman divergences on spaces of bounded continuous functions, allowing us to control asymmetric commutator terms created by alternating updates. The results substantially generalize prior analyses restricted to bilinear objectives and also apply to nonlinear convex-concave problems on measure spaces, thereby providing a unified theoretical foundation for MDA in mean-field min-max optimization.
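
For measures with finite support, i.e. mixed strategies on a simplex, mirror descent-ascent with the KL (entropic) Bregman divergence reduces to multiplicative-weights updates. The sketch below runs the simultaneous and alternating variants on a bilinear matching-pennies game, the special case that the paper generalizes, and reports the Nikaidô-Isoda gap of the averaged iterates. The constant step size, horizon, and initial strategies are arbitrary choices, not the tuned schedules behind the stated rates.

```python
import math

def mw_step(p, g, eta):
    """Entropic mirror step on the simplex: p_i * exp(eta * g_i), renormalized."""
    w = [pi * math.exp(eta * gi) for pi, gi in zip(p, g)]
    z = sum(w)
    return [wi / z for wi in w]

def A_y(A, y):
    return [sum(a * b for a, b in zip(row, y)) for row in A]

def At_x(A, x):
    return [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(A[0]))]

def ni_gap(A, x, y):
    """Nikaido-Isoda gap max_y' f(x, y') - min_x' f(x', y) for f(x, y) = x^T A y."""
    return max(At_x(A, x)) - min(A_y(A, y))

def mda(A, x, y, eta, n_iter, alternating):
    """Run MDA (x minimizes, y maximizes) and return the averaged iterates."""
    sx, sy = [0.0] * len(x), [0.0] * len(y)
    for _ in range(n_iter):
        sx = [a + b for a, b in zip(sx, x)]   # average the iterates actually played
        sy = [a + b for a, b in zip(sy, y)]
        x_new = mw_step(x, [-g for g in A_y(A, y)], eta)   # descent step for x
        x_seen = x_new if alternating else x               # alternating: y sees new x
        y = mw_step(y, At_x(A, x_seen), eta)               # ascent step for y
        x = x_new
    return [v / n_iter for v in sx], [v / n_iter for v in sy]

A = [[1.0, -1.0], [-1.0, 1.0]]   # matching pennies; equilibrium is (1/2, 1/2)
for alternating in (False, True):
    xb, yb = mda(A, [0.6, 0.4], [0.5, 0.5], eta=0.05, n_iter=5000, alternating=alternating)
    print("alternating" if alternating else "simultaneous", ni_gap(A, xb, yb))
```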

Updated: 2026-05-07 16:54:55

Categories: math.OC,cs.LG,math.PR

Download: http://arxiv.org/abs/2402.08106v3

Optimal Counterfactual Search in Tree Ensembles: A Study Across Modeling and Solution Paradigms

Trust in counterfactual explanations depends critically on whether their recommended changes are truly minimal: suboptimal explanations may vastly overshoot the actual changes needed to alter a decision, and heuristic errors can affect individuals unevenly, giving some users relevant recourse while assigning others unnecessarily costly recommendations. Consequently, we study the problem of computing optimal counterfactual explanations for tree ensembles under plausibility and actionability constraints. This is a combinatorial problem: for a fixed model, counterfactual search boils down to selecting consistent branching decisions and threshold-defined regions under a distance objective. We exploit this structure through CPCF, a constraint programming (CP) formulation in which numerical features are encoded as interval domains induced by split thresholds, while discrete features retain native finite-domain representations. This yields a compact finite-domain formulation that supports multiple distance objectives without continuous split-boundary search. We then place CPCF in a broader comparison across mathematical programming paradigms: we extend a maximum Boolean satisfiability (MaxSAT) formulation, originally designed for hard-voting random forests, to soft-voting ensembles, and compare against the current state-of-the-art mixed-integer linear programming (MILP) optimal approach. Across ten datasets and three types of tree ensembles, we analyze scalability, anytime performance, and sensitivity to distance metrics. We observe that CP achieves the best overall performance. More importantly, our results identify regimes in which the specific strengths of each paradigm make it best suited: CP is most versatile overall, MaxSAT handles hard-voting ensembles particularly well, and MILP remains competitive in amortized inference settings with a moderate number of split levels.
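
To show how split thresholds turn a numerical feature into a finite interval domain, the sketch below builds one candidate value per interval (the point nearest the original input) and then searches all combinations exhaustively for the cheapest L1 change that flips a soft-voting ensemble of decision stumps. Exhaustive enumeration stands in for a CP solver here and grows exponentially with the number of features. Plausibility and actionability constraints are omitted, and the stumps and input are hypothetical.

```python
import itertools

def interval_candidates(x_f, thresholds, eps=1e-6):
    """For each interval (lo, hi] induced by split thresholds, the point nearest x_f."""
    bounds = [-float("inf")] + sorted(set(thresholds)) + [float("inf")]
    cands = []
    for lo, hi in zip(bounds, bounds[1:]):
        lo_in = lo + eps if lo != -float("inf") else lo   # open lower end
        cands.append(min(max(x_f, lo_in), hi))
    return cands

def predict(stumps, x):
    """Soft voting over stumps (feature, threshold, value_if_le, value_if_gt)."""
    votes = [left if x[f] <= t else right for f, t, left, right in stumps]
    return sum(votes) / len(votes) > 0.5

def optimal_counterfactual(stumps, x):
    """Exhaustive L1-optimal counterfactual (viable only for tiny models)."""
    domains = [interval_candidates(x[f], [t for g, t, _, _ in stumps if g == f])
               for f in range(len(x))]
    target = not predict(stumps, x)
    best, best_cost = None, float("inf")
    for cand in itertools.product(*domains):
        if predict(stumps, cand) == target:
            cost = sum(abs(a - b) for a, b in zip(cand, x))
            if cost < best_cost:
                best, best_cost = list(cand), cost
    return best, best_cost

stumps = [(0, 5.0, 0.2, 0.9), (1, 3.0, 0.1, 0.8), (0, 7.0, 0.3, 0.7)]
x = [4.0, 2.0]
cf, cost = optimal_counterfactual(stumps, x)
print(predict(stumps, x), predict(stumps, cf), cf, round(cost, 4))
```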

Updated: 2026-05-07 16:54:38

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06561v1

Switchcodec: Adaptive residual-expert sparse quantization for high-fidelity neural audio coding

Recent neural audio compression models often rely on residual vector quantization for high-fidelity coding, but using a fixed number of codebooks per frame is suboptimal given the wide variability of audio content, especially for signals that are either very simple or highly complex. To address this limitation, we propose SwitchCodec, a neural audio codec based on Residual Experts Vector Quantization (REVQ). REVQ combines a shared quantizer with dynamically routed expert quantizers that are activated according to the input audio, decoupling bitrate from codebook capacity and improving compression efficiency. This design ensures that each quantizer is fully trained and utilized. In addition, a variable-bitrate mechanism adjusts the number of active expert quantizers at inference, enabling multi-bitrate operation without retraining. Experiments demonstrate that SwitchCodec surpasses existing baselines on both objective metrics and subjective listening tests.
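
Below is a toy NumPy sketch of the residual-experts idea, under several assumptions. Quantization picks the nearest codeword in squared Euclidean distance, and the shared quantizer always runs first. The router is replaced by a hypothetical oracle that picks whichever unused expert best reduces the residual; SwitchCodec instead learns its router from the input. The `n_active` argument plays the role of the inference-time bitrate knob.

```python
import numpy as np


def nearest(codebook: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Return the codeword (row of `codebook`) closest to v in squared Euclidean distance."""
    idx = int(np.argmin(((codebook - v) ** 2).sum(axis=1)))
    return codebook[idx]


def revq_quantize(x, shared, experts, n_active):
    """Shared stage first, then up to `n_active` expert stages on the residual.

    The routing rule is a stand-in oracle (smallest residual error), not a learned router.
    """
    recon = nearest(shared, x).astype(float)
    residual = x - recon
    used = []
    for _ in range(n_active):
        best = None
        for e, cb in enumerate(experts):
            if e in used:
                continue
            c = nearest(cb, residual)
            err = float(((residual - c) ** 2).sum())
            if best is None or err < best[0]:
                best = (err, e, c)
        if best is None:  # every expert is already active
            break
        _, e, c = best
        used.append(e)
        recon = recon + c
        residual = residual - c
    return recon, used


# Hypothetical 2-D codebooks; the zero codeword lets an expert "do nothing".
shared = np.array([[0.0, 0.0], [1.0, 1.0]])
experts = [np.array([[0.0, 0.0], [0.5, 0.0]]), np.array([[0.0, 0.0], [0.0, 0.25]])]
x = np.array([1.5, 1.25])
```

Calling `revq_quantize(x, shared, experts, k)` for k = 0, 1, 2 trades bits for reconstruction error without retraining anything.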

Updated: 2026-05-07 16:51:49

Categories: cs.SD,cs.AI

Download: http://arxiv.org/abs/2601.20362v2

Importance-Guided Basis Selection for Low-Rank Decomposition of Large Language Models

Low-rank decomposition is a compelling approach for compressing large language models, but its effectiveness hinges on selecting which singular-vector bases to retain for a target task. Existing methods such as Basel adapt singular-value coefficients on downstream data and prune bases with small re-learned magnitudes, a heuristic that can be misaligned with task performance because it ignores the local geometry of the loss landscape. We present Basis Selection with Importance (BSI), a principled low-rank compression framework that ranks and prunes bases by directly estimating the expected loss increase incurred when each basis is removed. BSI derives a derivative-based importance score from a second-order Taylor expansion of the task loss with respect to singular values, combining first-order sensitivity and second-order curvature to quantify pruning impact. To make this criterion practical for LLMs, we develop an efficient Hessian-diagonal estimator by adapting the Hutchinson randomized-probing method to loss curvature with symmetric parameter perturbations. We provide a comprehensive theoretical analysis, including loss-increase bounds under basis pruning, explicit propagation of Hessian-diagonal estimation error into these bounds, variance characterization tied to the Hessian spectrum, high-probability sample-complexity guarantees for achieving a target estimation accuracy, and guidance on perturbation intensity. Extensive experiments on mathematical reasoning benchmarks demonstrate that BSI consistently outperforms state-of-the-art low-rank decomposition baselines, with especially strong improvements under deep compression.
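
The importance score follows from a second-order Taylor expansion. Zeroing a singular value $\sigma_i$ is the perturbation $\delta_i = -\sigma_i$, so $\Delta L_i \approx -g_i\sigma_i + \tfrac12 H_{ii}\sigma_i^2$ with $g_i = \partial L/\partial\sigma_i$.

The minimal sketch below makes two assumptions: gradient access is available, and only the diagonal curvature term is kept. It estimates $H_{ii}$ with Hutchinson probing and approximates each Hessian-vector product by a symmetric (central) difference of gradients. This is the generic technique, not BSI's exact estimator. On a quadratic toy loss with a diagonal Hessian the estimate is exact, which makes it easy to check.

```python
import numpy as np


def hutchinson_hessian_diag(grad_fn, theta, n_probes=64, eps=1e-3, seed=0):
    """Estimate diag(H) at theta as the mean of v * (H v) over Rademacher probes v.

    H v is approximated by the central difference (grad(theta + eps v) - grad(theta - eps v)) / (2 eps).
    """
    rng = np.random.default_rng(seed)
    est = np.zeros_like(theta, dtype=float)
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=theta.shape)
        hv = (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)
        est += v * hv
    return est / n_probes


def removal_importance(sigma, grad, hdiag):
    """Second-order estimate of the loss change when each sigma_i is set to zero."""
    return -grad * sigma + 0.5 * hdiag * sigma ** 2


# Toy quadratic loss L(s) = 0.5 * sum(a * s^2) + b @ s, whose Hessian is diag(a).
a = np.array([2.0, 4.0])
b = np.array([-3.0, -10.0])


def toy_grad(s):
    return a * s + b


sigma = np.array([1.0, 2.0])
hdiag = hutchinson_hessian_diag(toy_grad, sigma)
scores = removal_importance(sigma, toy_grad(sigma), hdiag)  # predicted loss increase per basis
```

In a pruning loop, the bases with the smallest predicted loss increase would be removed first. For a real LLM the loss is not quadratic, so the score is only a local approximation.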

Updated: 2026-05-07 16:51:21

Categories: cs.LG

Download: http://arxiv.org/abs/2605.01627v2

Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning

Cooperative multi-agent reinforcement learning (MARL) benchmarks commonly emphasize aggregate outcomes such as return, success rate, or completion time. While essential, these metrics often fail to reveal how agents coordinate, particularly in settings where agents, tasks, and joint assignment choices scale combinatorially. We propose a coordination-aware evaluation perspective that supplements return with process-level diagnostics. We instantiate this perspective using STAT, a controlled commitment-constrained spatial task-allocation testbed that systematically varies agents, tasks, and environment size while holding observation access and task rules fixed. We evaluate six representative value-based MARL methods across varying levels of centralization. Our results show that similar return trends can reflect distinct coordination mechanisms, including differences in redundant assignment, assignment diversity, and task-completion efficiency. We find that in commitment-constrained task allocation, performance under scale is shaped not only by nominal action-space size, but also by assignment pressure, sparse decision opportunities, and redundant choices among interdependent agents. Our findings motivate coordination-aware evaluation as a necessary complement to return-based benchmarking for cooperative MARL.
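
The abstract does not define its diagnostics. The following hypothetical sketch shows two process-level measures in the same spirit:

- a per-step redundancy rate, counting agents that duplicate a task another agent already holds;
- the normalized entropy of task choices over an episode.

Both definitions are assumptions for illustration, not STAT's. Two runs with identical return can still differ sharply on them.

```python
import math
from collections import Counter


def redundancy_rate(step):
    """Per-step share of task-holding agents that duplicate a task another agent already holds.

    `step` lists each agent's chosen task id, or None for an idle agent.
    """
    chosen = [t for t in step if t is not None]
    if not chosen:
        return 0.0
    extra = sum(c - 1 for c in Counter(chosen).values())
    return extra / len(chosen)


def choice_entropy(episode, n_tasks):
    """Normalized entropy of task choices pooled over an episode (0 = one task, 1 = uniform)."""
    chosen = [t for step in episode for t in step if t is not None]
    if not chosen or n_tasks < 2:
        return 0.0
    n = len(chosen)
    h = -sum((c / n) * math.log(c / n) for c in Counter(chosen).values())
    return h / math.log(n_tasks)


# Two steps, four agents each; two agents pick task A at the first step.
episode = [["A", "A", "B", None], ["B", "A", "C", "C"]]
mean_redundancy = sum(redundancy_rate(s) for s in episode) / len(episode)
```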

Updated: 2026-05-07 16:50:53

Categories: cs.MA,cs.AI,cs.LG

Download: http://arxiv.org/abs/2605.06557v1

Diverse Sampling in Diffusion Models with Marginal Preserving Particle Guidance

We present EDDY (Exact-marginal Diversification via Divergence-free dYnamics), a guidance mechanism for diffusion and flow matching models that promotes diversity among generated samples while maintaining their quality. EDDY exploits symmetries of the Fokker-Planck equation, using drift perturbations that change particle trajectories while preserving the evolving marginal distribution. We instantiate this principle through kernel-based anti-symmetric pairwise matrix fields constructed from repulsive directions. The resulting divergence-free dynamics promote diversity at the joint particle level while preserving each particle's marginal distribution, without any additional training. Because computing the guidance can be expensive in settings such as text-to-image generation with perceptual embeddings, we propose practical approximations that are both effective and efficient. Experiments on synthetic distributions and text-to-image generation show that EDDY improves diversity while maintaining strong distributional fidelity compared to common baselines.
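
The marginal-preserving principle can be checked in its simplest form. This is a standard Fokker-Planck identity, not EDDY's kernel-based pairwise construction: adding a drift $A\nabla\log p_t$ with a constant antisymmetric matrix $A$ contributes zero net flux. The marginal $p_t$ is therefore unchanged, even though individual trajectories are.

```latex
\partial_t p_t = -\nabla\cdot\big(p_t\,b_t\big) + \Delta p_t
\qquad\text{(Fokker-Planck for } dX_t = b_t\,dt + \sqrt{2}\,dW_t\text{)}

\nabla\cdot\big(p_t\,A\nabla\log p_t\big)
  = \nabla\cdot\big(A\nabla p_t\big)
  = \sum_{i,j} A_{ij}\,\partial_i\partial_j p_t
  = 0
\qquad\text{since } A_{ij} = -A_{ji} \text{ and } \partial_i\partial_j p_t = \partial_j\partial_i p_t
```

Replacing $b_t$ with $b_t + A\nabla\log p_t$ therefore leaves the equation, and hence $p_t$, unchanged. A space-varying $A(x)$ would contribute an extra $\sum_{i,j}(\partial_i A_{ij})\,\partial_j p_t$ term, which is one reason general constructions are more involved.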

Updated: 2026-05-07 16:49:12

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06553v1

Sequential Design of Genetic Circuits Under Uncertainty With Reinforcement Learning

The design of biological systems is hindered by uncertainty arising from both intrinsic stochasticity of biomolecular reactions and variability across laboratory or experimental conditions. In this work, we present a sequential framework to optimize genetic circuits under both forms of uncertainty. By employing simulator models based on differential equations or Markov jump processes alongside a reinforcement learning (RL) policy-based approach, our method suggests experiments that adapt to unknown laboratory conditions while accounting for inherent stochasticity. While previous Bayesian methods address uncertainty through iterative experiment-inference-optimization cycles, they typically require computationally expensive inference and optimization steps after each experimental round, leading to delays. To overcome this bottleneck, we propose an amortized approach trained up-front across a distribution of possible uncertain parameters. This strategy sidesteps the need for explicit parameter inference during the design cycle, enabling immediate, observation-based adaptation. We demonstrate our framework on models for heterologous gene expression and a repressilator circuit, showing that it efficiently handles both molecular noise and cross-laboratory variability.
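
As an illustration of the Markov-jump-process simulators mentioned above, here is the textbook Gillespie stochastic simulation algorithm for a one-species birth-death model of gene expression. The model and rates are illustrative assumptions, not the paper's heterologous-expression or repressilator models. In an amortized setup, a rate such as `k` would be drawn from a prior over laboratory conditions for each training episode.

```python
import random


def ssa_birth_death(k: float, gamma: float, t_end: float, n0: int = 0, seed: int = 0) -> float:
    """Gillespie simulation of 0 -> X (rate k) and X -> 0 (rate gamma * n).

    Returns the time-averaged copy number of X over [0, t_end].
    """
    rng = random.Random(seed)
    t, n, area = 0.0, n0, 0.0
    while True:
        a_birth, a_death = k, gamma * n
        a_total = a_birth + a_death           # > 0 as long as k > 0
        dt = rng.expovariate(a_total)         # waiting time to the next reaction
        if t + dt >= t_end:
            area += n * (t_end - t)
            return area / t_end
        area += n * dt
        t += dt
        if rng.random() * a_total < a_birth:  # choose which reaction fired
            n += 1
        else:
            n -= 1


# Stationary mean of this model is k / gamma, so the time average should be near 10.
mean_copies = ssa_birth_death(k=10.0, gamma=1.0, t_end=2000.0)
```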

Updated: 2026-05-07 16:49:07

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06552v1

Mochi: Aligning Pre-training and Inference for Efficient Graph Foundation Models via Meta-Learning

We propose Mochi, a Graph Foundation Model that addresses task unification and training efficiency by adopting a meta-learning based training framework. Prior models pre-train with reconstruction-based objectives such as link prediction, and assume that the resulting representations can be aligned with downstream tasks through a separate unification step such as class prototypes. We demonstrate through synthetic and real-world experiments that this procedure, while simple and intuitive, has limitations that directly affect downstream task performance. To address these limitations, Mochi pre-trains on few-shot episodes that mirror the downstream evaluation protocol, aligning the training objective with inference rather than relying on a post-hoc unification step. We show that Mochi, along with its more powerful variant Mochi++, achieves competitive or superior performance compared to existing Graph Foundation Models across 25 real-world graph datasets spanning node classification, link prediction, and graph classification, while requiring 8 to 27 times less training time than the strongest baseline.
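
Mochi's key move is to pre-train on the same kind of few-shot episode used at evaluation time. Here is a minimal sketch of a generic N-way K-shot node-classification episode with prototype scoring, in the style of prototypical networks. It assumes node embeddings have already been produced by some encoder. The function names are hypothetical, and Mochi's episode construction for link prediction and graph classification is not shown. During pre-training, the negative distances would serve as logits for a cross-entropy loss that trains the encoder.

```python
import numpy as np


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query nodes per class.

    Assumes every class has at least k_shot + n_query nodes.
    """
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))[: k_shot + n_query]
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:])
    return classes, np.array(support), np.array(query)


def prototype_predict(emb, labels, classes, support, query):
    """Assign each query node to the class whose mean support embedding is closest."""
    sup_emb, sup_lab = emb[support], labels[support]
    protos = np.stack([sup_emb[sup_lab == c].mean(axis=0) for c in classes])
    d = ((emb[query][:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d, axis=1)]


# Toy data: 4 classes of 10 nodes with well-separated embeddings.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 10)
emb = np.eye(4)[labels] + 0.01 * rng.normal(size=(40, 4))
classes, support, query = sample_episode(labels, n_way=3, k_shot=2, n_query=3, rng=rng)
pred = prototype_predict(emb, labels, classes, support, query)
```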

Updated: 2026-05-07 16:47:12

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2604.22031v2

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

Agentic AI systems, specifically LLM-driven agents that plan, invoke tools, maintain persistent memory, and delegate tasks to peer agents via protocols such as MCP and A2A, introduce a threat surface that differs materially from standalone model inference. Agents accumulate sensitive context, hold credentials, and operate across pipelines no single party fully controls, enabling prompt injection, context exfiltration, credential theft, and inter-agent message poisoning. Current defenses operate entirely within the software stack and can be silently bypassed by a sufficiently privileged adversary such as a compromised cloud operator. Confidential computing (CC) offers a hardware-rooted alternative: Trusted Execution Environments (TEEs) isolate agent code and data from privileged system software, while remote attestation enables verifiable trust across distributed deployments. This survey synthesizes the design space in four parts: (i) a unified taxonomy of six TEE platforms (Intel SGX, Intel TDX, AMD SEV-SNP, ARM TrustZone, ARM CCA, and NVIDIA H100 CC) covering deployment roles and performance tradeoffs; (ii) an agent-centric threat model spanning perception, planning, memory, action, and coordination layers mapped to nine security goals; (iii) a comparative survey of CC-based defenses distinguishing findings that transfer from single-call inference versus what requires new agentic designs; and (iv) six open challenges including compound attestation for multi-hop agent chains and GPU-TEE performance at LLM scale. While several hardware trust primitives appear mature enough for targeted deployments, no broadly established end-to-end framework yet binds them into a coherent security substrate for production agentic AI.

Updated: 2026-05-07 16:46:43

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.03213v2

Contact Wasserstein Geodesics for Non-Conservative Schrödinger Bridges

The Schrödinger Bridge provides a principled framework for modeling stochastic processes between distributions; however, existing methods are limited by energy-conservation assumptions, which constrain the bridge's shape and prevent it from modeling varying-energy phenomena. To overcome this, we introduce the non-conservative generalized Schrödinger bridge (NCGSB), a novel, energy-varying reformulation based on contact Hamiltonian mechanics. By allowing energy to change over time, the NCGSB covers a broader class of real-world stochastic processes, capturing richer and more faithful intermediate dynamics. By parameterizing the Wasserstein manifold, we lift the bridge problem to a tractable geodesic computation in a finite-dimensional space. Unlike computationally expensive iterative solutions, our contact Wasserstein geodesic (CWG) is naturally implemented via a ResNet architecture and relies on a non-iterative solver with near-linear complexity. Furthermore, CWG supports guided generation by modulating a task-specific distance metric. We validate our framework on tasks including manifold navigation, molecular dynamics prediction, and image generation, demonstrating its practical benefits and versatility.
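
For readers unfamiliar with contact Hamiltonian mechanics, the standard equations of motion show exactly where non-conservation enters. They are textbook background, not the NCGSB formulation on the Wasserstein manifold. The extra variable $s$ makes $H$ change along trajectories at rate $-H\,\partial H/\partial s$, and a linear dependence on $s$ already yields damped dynamics.

```latex
\dot q = \frac{\partial H}{\partial p},\qquad
\dot p = -\frac{\partial H}{\partial q} - p\,\frac{\partial H}{\partial s},\qquad
\dot s = p\cdot\frac{\partial H}{\partial p} - H

\frac{dH}{dt} = -H\,\frac{\partial H}{\partial s}

H(q,p,s) = \tfrac12\lVert p\rVert^2 + V(q) + \gamma s
\;\Longrightarrow\;
\ddot q = -\nabla V(q) - \gamma\,\dot q
```

When $\partial H/\partial s = 0$ the equations reduce to ordinary Hamiltonian mechanics and $H$ is conserved. This is the energy-conservation constraint the abstract says standard bridges inherit.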

Updated: 2026-05-07 16:46:01

Categories: cs.LG,math.DG

Download: http://arxiv.org/abs/2511.06856v4

Continuous Latent Diffusion Language Model

Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learning, and effective global semantic modeling. We propose Cola DLM, a hierarchical latent diffusion language model that frames text generation through hierarchical information decomposition. Cola DLM first learns a stable text-to-latent mapping with a Text VAE, then models a global semantic prior in continuous latent space with a block-causal DiT, and finally generates text through conditional decoding. From a unified Markov-path perspective, its diffusion process performs latent prior transport rather than token-level observation recovery, thereby separating global semantic organization from local textual realization. This design yields a more flexible non-autoregressive inductive bias, supports semantic compression and prior fitting in continuous space, and naturally extends to other continuous modalities. Through experiments spanning 4 research questions, 8 benchmarks, strictly matched ~2B-parameter autoregressive and LLaDA baselines, and scaling curves up to about 2000 EFLOPs, we identify an effective overall configuration of Cola DLM and verify its strong scaling behavior for text generation. Taken together, the results establish hierarchical continuous latent prior modeling as a principled alternative to strictly token-level language modeling, where generation quality and scaling behavior may better reflect model capability than likelihood, while also suggesting a concrete path toward unified modeling across discrete text and continuous modalities.
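
The sketch below assumes "block-causal" carries its usual meaning: bidirectional attention within a block of latent positions, causal attention across blocks. Under that assumption the attention mask is a few lines of NumPy. The block size is a hypothetical parameter; the abstract does not specify Cola DLM's block layout.

```python
import numpy as np


def block_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True if position i may attend to position j.

    Positions attend bidirectionally inside their own block and to all earlier
    blocks, never to later blocks.
    """
    block_id = np.arange(seq_len) // block_size
    return block_id[:, None] >= block_id[None, :]


mask = block_causal_mask(seq_len=5, block_size=2)  # blocks: [0, 1], [2, 3], [4]
# Additive form for softmax attention: 0 where allowed, -inf where blocked.
attn_bias = np.where(mask, 0.0, -np.inf)
```

Compared with a token-level causal mask, this lets each block be denoised jointly while still conditioning only on earlier blocks.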

Updated: 2026-05-07 16:44:56

Categories: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2605.06548v1

Counterfactual Maps: What They Are and How to Find Them

Counterfactual explanations are a central tool in interpretable machine learning, yet computing them exactly for complex models remains challenging. For tree ensembles, predictions are piecewise constant over a large collection of axis-aligned hyperrectangles, implying that an optimal counterfactual for a point corresponds to its projection onto the nearest rectangle with an alternative label under a chosen metric. Existing methods largely overlook this geometric structure, relying either on heuristics with no optimality guarantees or on mixed-integer programming formulations that do not scale to interactive use. In this work, we revisit counterfactual generation through the lens of nearest-region search and introduce counterfactual maps, a global representation of recourse for tree ensembles. Leveraging the fact that any tree ensemble can be compressed into an equivalent partition of labeled hyperrectangles, we cast counterfactual search as the problem of identifying the generalized Voronoi cell associated with the nearest rectangle of an alternative label. This leads to an exact, amortized algorithm based on volumetric k-dimensional (KD) trees, which performs branch-and-bound nearest-region queries with explicit optimality certificates and sublinear average query time after a one-time preprocessing phase. Our experimental analyses on several real datasets drawn from high-stakes application domains show that this approach delivers globally optimal counterfactual explanations with millisecond-level latency, achieving query times that are orders of magnitude faster than existing exact, cold-start optimization methods.
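
The geometric fact behind counterfactual maps is easy to check in code. For an axis-aligned box, clipping a point coordinate-wise gives a closest point under any $L_p$ norm. The optimal counterfactual is therefore the clipped point of the nearest box carrying another label.

The brute-force sketch below uses a hypothetical helper, not the paper's algorithm, and scans every box. The volumetric KD-tree described above exists precisely to avoid this linear scan while still certifying optimality. Strict inequalities at shared faces are ignored here.

```python
import numpy as np

INF = np.inf


def nearest_counterfactual(x, boxes, current_label, p=2):
    """Brute-force nearest region with a different label.

    Each box is (lo, hi, label) with per-feature bounds (+/-inf for unbounded sides).
    Clipping x into a box yields a closest point of that box for any Lp norm,
    because the per-coordinate gaps can be minimized independently.
    """
    best = None
    for lo, hi, label in boxes:
        if label == current_label:
            continue
        z = np.clip(x, lo, hi)
        d = float(np.linalg.norm(z - x, ord=p))
        if best is None or d < best[0]:
            best = (d, z, label)
    return best  # (distance, counterfactual point, its label), or None


# A toy partition on feature 0: x0 <= 0 -> 0, 0 < x0 <= 3 -> 1, x0 > 3 -> 2.
boxes = [
    (np.array([-INF, -INF]), np.array([0.0, INF]), 0),
    (np.array([0.0, -INF]), np.array([3.0, INF]), 1),
    (np.array([3.0, -INF]), np.array([INF, INF]), 2),
]
d, z, label = nearest_counterfactual(np.array([-1.0, 5.0]), boxes, current_label=0)
```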

Updated: 2026-05-07 16:42:07

Categories: cs.LG

Download: http://arxiv.org/abs/2602.09128v2

MineEvolve: Self-Evolution with Accumulated Knowledge for Long-Horizon Embodied Minecraft Agents

Long-horizon embodied intelligence requires agents to improve through interaction, not merely to execute plans generated from static goals. A central challenge is therefore to transform past executions into knowledge that can shape future decisions. Minecraft provides a representative testbed for this problem, where tasks such as crafting tools, building redstone components, and obtaining diamond equipment involve long prerequisite chains and are frequently disrupted by missing tools, blocked paths, GUI failures, or stagnant execution. To this end, we propose MineEvolve, a knowledge-driven self-evolution framework that converts execution feedback into actionable behavioral knowledge. MineEvolve first uses a Monitor to convert each subgoal execution into typed feedback, including state changes, inventory changes, failure types, progress signals, and stagnation indicators. An Inducer then derives reusable skills from successful executions and remedies from failed or stagnant executions. A Curator validates, merges, filters, and retrieves these knowledge entries, while an Adaptor uses them to repair the unfinished part of the plan under repeated failures or stagnation. Experiments on the Minecraft MCU long-horizon task suite show that MineEvolve consistently improves performance across multiple language-model planners, with larger gains on high-dependency task groups. Ablation and knowledge-accumulation studies further demonstrate that converting execution signals into structured behavioral knowledge is an effective path toward self-evolving embodied agents in long-horizon environments. Our code is available at https://github.com/xzw-ustc/MC-MineEvolve.

Updated: 2026-05-07 16:41:54

Categories: cs.AI

Download: http://arxiv.org/abs/2603.13131v2

DC-DiT: Adaptive Compute and Elastic Inference for Visual Generation via Dynamic Chunking

Diffusion Transformers rely on static patchify tokenization, assigning the same token budget to smooth backgrounds, detailed object regions, noisy early timesteps, and late-stage refinements. We introduce the Dynamic Chunking Diffusion Transformer (DC-DiT), which replaces fixed patchification with a learned encoder-router-decoder scaffold that adaptively compresses the 2D input into a shorter token sequence through a chunking mechanism learned end-to-end with diffusion training. DC-DiT allocates fewer tokens to predictable regions and noisy timesteps, and more tokens to detailed regions and later refinement stages, yielding meaningful spatial segmentations and timestep-adaptive compression schedules without supervision. Furthermore, the router provides an importance ordering over retained tokens, enabling elastic inference: a single checkpoint can be evaluated at flexible compute budgets with a smooth quality-compute tradeoff. Additionally, DC-DiT can be upcycled from pretrained DiT checkpoints and is also compatible with orthogonal dynamic computation approaches. On class-conditional ImageNet generation, DC-DiT reduces inference FLOPs by up to 36.8% and improves FID by up to 37.8% over DiT baselines, yielding a stronger quality-compute Pareto frontier across model scales, resolutions, and guidance settings. More broadly, these results suggest that adaptive tokenization is a general mechanism for making visual generation both more efficient and more flexible at inference time.
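
Elastic inference relies only on the router's importance ordering: at a given compute budget, keep the top-scoring tokens and drop the rest. The minimal sketch below assumes one scalar router score per token and a budget expressed as a token count; both are hypothetical interfaces. The decoder that maps kept tokens back to spatial positions is not shown.

```python
import numpy as np


def elastic_keep(tokens: np.ndarray, scores: np.ndarray, budget: int):
    """Keep the `budget` tokens with the highest router scores, in their original order."""
    budget = max(0, min(budget, scores.shape[0]))
    order = np.argsort(-scores, kind="stable")  # most important first, ties by position
    keep = np.sort(order[:budget])
    return tokens[keep], keep


tokens = np.arange(4) * 10                   # stand-ins for 4 chunk embeddings
scores = np.array([0.1, 0.9, 0.5, 0.7])      # hypothetical router importance scores
kept, kept_idx = elastic_keep(tokens, scores, budget=2)
```

Sorting the kept indices preserves the original token order, so positional information downstream stays consistent regardless of the budget.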

Updated: 2026-05-07 16:40:07

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2603.06351v2

Hedging Memory Horizons for Non-Stationary Prediction via Online Aggregation

We study online prediction under distribution shift, where inputs arrive chronologically and outcomes are revealed only after prediction. In this setting, predictors must remain stable in quiet regimes yet adapt when regimes shift, and the right adaptation memory is unknown in advance. We propose MELO (Memory-hedged Exponentially Weighted Least-Squares Online aggregation), a model-agnostic method that hedges across adaptation scales: it wraps any non-anticipating base-predictor pool with exponentially weighted least-squares (EWLS) adaptation experts at multiple forgetting factors, and aggregates raw and EWLS-adapted forecasts with MLpol, a parameter-free online aggregation rule. Under boundedness conditions, we establish deterministic oracle inequalities showing that it competes with both the best raw predictor and the best bounded, time-varying affine combinations of the base predictions, up to a path-length-dependent tracking cost and a sublinear aggregation overhead. We evaluate MELO on French national electricity-load forecasting through the COVID-19 lockdown using no regime indicators, lockdown dates, or policy covariates. MELO reduces overall RMSE by 34.7% relative to base-only MLpol and achieves lower overall RMSE than a TabICL reference supplied with an external COVID policy-response covariate. Moreover, MELO requires only lightweight per-step recursive updates without model retraining.
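
Below is a compact sketch of the two ingredients, under stated assumptions:

- Each adaptation expert is a recursive least-squares fit with forgetting factor `lam` over an intercept plus the base forecasts.
- The aggregator is an MLpol-style rule: polynomial weights on positive cumulative linearized regret, with per-expert rates $1/(1 + \sum r^2)$.

The class names, the rate initialization, and the prior scale `delta` are assumptions; this is not the authors' code. Each round costs one small matrix update per expert, consistent with the no-retraining claim.

```python
import numpy as np


class EWLSExpert:
    """Exponentially weighted least squares, updated recursively with forgetting factor `lam`.

    Fits y ~ theta @ [1, f_t], an affine combination of the base forecasts f_t.
    """

    def __init__(self, n_base: int, lam: float, delta: float = 1e3):
        self.lam = lam
        self.theta = np.zeros(n_base + 1)
        self.P = delta * np.eye(n_base + 1)  # large initial P = weak prior on theta

    def predict(self, f: np.ndarray) -> float:
        return float(self.theta @ np.concatenate(([1.0], f)))

    def update(self, f: np.ndarray, y: float) -> None:
        x = np.concatenate(([1.0], f))
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        self.theta = self.theta + gain * (y - self.theta @ x)
        self.P = (self.P - np.outer(gain, Px)) / self.lam


class MLpolLike:
    """MLpol-style aggregation with square loss; the 1 + sum(r^2) rate is an assumption."""

    def __init__(self, n_experts: int):
        self.R = np.zeros(n_experts)  # cumulative linearized regret
        self.S = np.zeros(n_experts)  # cumulative squared regret

    def weights(self) -> np.ndarray:
        w = np.maximum(self.R, 0.0) / (1.0 + self.S)
        total = w.sum()
        return w / total if total > 0 else np.full(self.R.size, 1.0 / self.R.size)

    def update(self, preds: np.ndarray, y: float) -> float:
        yhat = float(self.weights() @ preds)  # weights use past rounds only
        grad = 2.0 * (yhat - y)               # derivative of (yhat - y)^2 at yhat
        r = grad * (yhat - preds)             # linearized regret of each expert
        self.R += r
        self.S += r ** 2
        return yhat


def melo_step(base_preds, experts, agg, y):
    """One round: pool raw and EWLS-adapted forecasts, aggregate, then use the revealed y."""
    adapted = np.array([e.predict(base_preds) for e in experts])
    yhat = agg.update(np.concatenate((base_preds, adapted)), y)
    for e in experts:
        e.update(base_preds, y)
    return yhat


# Hypothetical setup: two base predictors, three memory horizons.
experts = [EWLSExpert(n_base=2, lam=lam) for lam in (0.9, 0.99, 0.999)]
agg = MLpolLike(n_experts=2 + len(experts))
```

Short memories (small `lam`) react quickly after a regime change, long ones are less noisy in stable periods. The aggregator shifts weight between them without any regime indicator.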

Updated: 2026-05-07 16:38:21

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2605.06541v1

Ex Ante Evaluation of AI-Induced Idea Diversity Collapse

Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones. This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding. We introduce a human-relative framework for benchmarking AI-induced human diversity collapse without requiring human-AI interaction data, providing an ex ante protocol to estimate crowding risk from model-only generations and matched unaided human baselines. By modeling ideas as congestible resources, we show that source-level crowding is identifiable from within-distribution comparisons, yielding an excess-crowding coefficient $\Delta$ and a human-relative diversity ratio $\rho$. We show that $\rho \ge 1$ is the no-excess-crowding parity condition and connect $\Delta$ to an adoption game with exposure-dependent redundancy costs. Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels. Estimates stabilize with feasible model-only sample sizes. Importantly, generation-protocol variants show that crowding can be reduced through targeted design, making diversity collapse an actionable, development-time evaluation target for population-aware creative AI.

Updated: 2026-05-07 16:38:17

Categories: cs.AI,cs.GT

Download: http://arxiv.org/abs/2605.06540v1

Attribution-Guided Pruning for Insight and Control: Circuit Discovery and Targeted Correction in Small-scale LLMs

Large Language Models (LLMs) are widely deployed in real-world applications, yet their internal mechanisms remain difficult to interpret and control, limiting our ability to diagnose and correct undesirable behaviors. Mechanistic interpretability addresses this challenge by identifying circuits -- subsets of model components responsible for specific behaviors. However, discovering such circuits in LLMs remains difficult due to their scale and complexity. We frame circuit discovery as identifying parameters that contribute most to model outputs on task-specific inputs, and use Layer-wise Relevance Propagation (LRP) with reference samples to attribute and extract these components via pruning. Building on this, we introduce contrastive relevance to isolate circuits associated with undesired behaviors while preserving general capabilities, enabling targeted model correction. On OPT-125M, we show that pruning as little as ~0.3% of neurons substantially reduces toxic outputs, while pruning approximately 0.03% of weight elements mitigates repetitive text generation without degrading general performance. These results establish attribution-guided pruning as an effective mechanism for identifying and intervening on behavior-specific circuits in LLMs. We further validate our findings on additional small-scale language models, demonstrating that the proposed approach transfers across architectures. Our code is publicly available at https://github.com/erfanhatefi/SparC3.
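
Layer-wise Relevance Propagation redistributes an output score backward, layer by layer. For a single bias-free linear layer $z = aW$, the standard LRP-$\epsilon$ rule is $R_i = a_i \sum_j W_{ij} R_j / (z_j + \epsilon\,\mathrm{sign}(z_j))$, which approximately conserves total relevance.

The sketch implements that rule plus a hypothetical contrastive pruning mask: relevance on undesired-behavior inputs minus relevance on general inputs. The paper's reference-sample aggregation and exact scoring are not reproduced.

```python
import numpy as np


def lrp_epsilon_linear(a, W, relevance_out, eps=1e-6):
    """LRP-epsilon for z = a @ W (no bias): a has shape (n_in,), W has shape (n_in, n_out)."""
    z = a @ W                                          # pre-activations, shape (n_out,)
    stab = z + eps * np.where(z >= 0, 1.0, -1.0)       # stabilized denominator
    s = relevance_out / stab                           # shape (n_out,)
    return a * (W @ s)                                 # R_i = a_i * sum_j W_ij * s_j


def contrastive_prune_mask(rel_undesired, rel_general, frac):
    """Mark the `frac` share of units most relevant to the undesired behavior
    relative to general behavior (hypothetical scoring rule)."""
    score = rel_undesired - rel_general
    k = int(round(frac * score.size))
    mask = np.zeros(score.size, dtype=bool)
    if k > 0:
        mask[np.argsort(-score)[:k]] = True
    return mask


# Relevance conservation check on a random layer, starting from R_out = z.
rng = np.random.default_rng(0)
a = np.abs(rng.normal(size=5))
W = rng.normal(size=(5, 3))
z = a @ W
R_in = lrp_epsilon_linear(a, W, z)
```

Units flagged by the mask would be zeroed out. Subtracting general-task relevance is what keeps widely used units from being removed.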

Updated: 2026-05-07 16:37:45

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2506.13727v3

Diffusion-Based Posterior Sampling: A Feynman-Kac Analysis of Bias and Stability

Diffusion-based posterior samplers use pretrained diffusion priors to sample from measurement- or reward-conditioned posteriors, and are widely used for inverse problems. Yet their theoretical behavior remains poorly understood: even with exact prior scores, their outputs are biased, and in low-temperature regimes their discretizations can become unstable. We characterize this bias by introducing a tractable surrogate path connecting the true posterior to a standard Gaussian and comparing it to the sampler's path. Their density ratio satisfies a parabolic PDE whose reaction term measures the accumulated bias. A Feynman-Kac representation then expresses the Radon-Nikodym correction as an explicit path expectation, identifying which posterior regions are over- or under-sampled. We apply this framework to DPS and STSL, a related sampler. For DPS, the correction is an Ornstein-Uhlenbeck path expectation coupling the data conditional covariance with the reward curvature, revealing where DPS over- or under-samples. Next, we reinterpret STSL as an auxiliary drift that steers trajectories toward low-uncertainty regions, flattening the spatially varying part of the DPS reaction term. Finally, we characterize early guidance-stopping, a common mitigation for low-temperature instabilities caused by forward-Euler integration of the vector field. Together, these results clarify sampler bias, explain existing correctives, and guide stable variant designs.
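
The generic Feynman-Kac formula shows why a reaction term turns into a path expectation. It is standard background, stated under the usual regularity and growth conditions; the paper's specific surrogate path and reaction term are not reproduced. The solution of a linear parabolic PDE with reaction term $V$ is the expected terminal value, weighted by the exponentiated path integral of $V$.

```latex
dX_s = b(s, X_s)\,ds + \sigma(s, X_s)\,dW_s,\qquad X_t = x

\partial_t u + b\cdot\nabla u + \tfrac12\,\mathrm{tr}\big(\sigma\sigma^\top\nabla^2 u\big) + V\,u = 0,
\qquad u(T,\cdot) = g

u(t,x) = \mathbb{E}\Big[\exp\Big(\int_t^T V(s, X_s)\,ds\Big)\,g(X_T)\,\Big|\,X_t = x\Big]
```

The sign of $V$ along typical paths decides whether the exponential weight is above or below one. In the same way, a correction of this form indicates which regions need to be up- or down-weighted.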

Updated: 2026-05-07 16:37:29

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06538v1

Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance

In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly available datasets predominantly focus on local editing or style transfer, which largely preserve the original scene structure and are easier to scale. In contrast, Background Replacement, a task central to creative applications such as film production and advertising, requires synthesizing entirely new, temporally consistent scenes while maintaining accurate foreground-background interactions, making large-scale data generation significantly more challenging. Consequently, this complex task remains largely underexplored due to a scarcity of high-quality training data. This gap is evident in the poor performance of state-of-the-art models such as Kiwi-Edit, since the primary open-source dataset covering this task, OpenVE-3M, frequently contains static, unnatural backgrounds. In this paper, we trace this quality degradation to a lack of precise background guidance during data synthesis. Accordingly, we design a scalable pipeline that generates foreground and background guidance in a decoupled manner with strict quality filtering. Building on this pipeline, we introduce Sparkle, a dataset of ~140K video pairs spanning five common background-change themes, alongside Sparkle-Bench, the largest evaluation benchmark tailored for background replacement to date. Experiments demonstrate that our dataset and the model trained on it achieve substantially better performance than all existing baselines on both OpenVE-Bench and Sparkle-Bench. Our proposed dataset, benchmark, and model are fully open-sourced at https://showlab.github.io/Sparkle/.

Updated: 2026-05-07 16:35:34

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2605.06535v1

DeEscalWild: A Real-World Benchmark for Automated De-Escalation Training with SLMs

Effective de-escalation is critical for law enforcement safety and community trust, yet traditional training methods lack scalability and realism. While Large Language Models (LLMs) enable dynamic, open-ended simulations, their substantial computational footprint makes them impractical to deploy on the lightweight, portable hardware required for immersive field training. Small Language Models (SLMs) offer a viable real-time alternative but suffer from a critical scarcity of high-quality, domain-specific training data. To bridge this gap, we present DeEscalWild, a novel benchmark dataset curated through a multi-stage pipeline from in-the-wild police-civilian interactions extracted from publicly available video repositories. Starting with 5,000 raw inputs, we employed a rigorous hybrid filtering process combining human-in-the-loop verification with LLM-as-a-Judge evaluation to distill 1,500 high-fidelity scenarios. The resulting corpus comprises 285,887 dialogue turns, totaling approximately 4.7 million tokens. Extensive experiments demonstrate that SLMs fine-tuned on this data significantly outperform their base counterparts across ROUGE-L, BLEU-4, METEOR, BERTScore, Realism Score, and human evaluation metrics. Notably, our fine-tuned Qwen 2.5 (3B-Instruct) surpasses the general-purpose Gemini 2.5 Flash model when evaluated under equivalent conditions, demonstrating that domain-optimized SLMs can achieve superior performance at a fraction of the computational cost. This work establishes the foundational infrastructure for accessible, low-latency, and privacy-preserving officer training systems at the edge. We publicly release our code (https://github.com/Hasebul/DeEscalWild-Benchmark-Framework) and dataset (https://doi.org/10.7910/DVN/CWMCZI).
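
ROUGE-L, one of the reported metrics, scores a generated turn by the longest common subsequence (LCS) it shares with the reference. Below is a minimal sketch of the F1 variant, where precision is LCS divided by candidate length and recall is LCS divided by reference length. It uses lowercase whitespace tokenization, which is a simplifying assumption. Established packages such as rouge-score apply their own tokenization and optional stemming, so their numbers can differ slightly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 with lowercase whitespace tokenization (a simplifying assumption)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, "Please step back" against "step back please" shares the subsequence "step back", so precision and recall are both 2/3.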

Updated: 2026-05-07 16:35:07

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2604.13075v2

AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling

State-of-the-art cloud-native applications require intelligent schedulers that can effectively balance system stability, resource utilisation, and associated costs. While Kubernetes provides feasibility-based placement by default, recent research has explored reinforcement learning (RL) for more intelligent scheduling decisions. However, current RL-based schedulers have three major limitations. First, most use monolithic centralised agents, which do not scale to large heterogeneous clusters. Second, those that use multi-objective reward functions assume simple, static, linear combinations of the objectives. Third, no previous work has produced a stress-aware scheduler that adapts to dynamic conditions. To address these gaps, we propose the Adaptive Graph-enhanced Multi-Agent Reinforcement Learning Dynamic Kubernetes Scheduler (AGMARL-DKS), which introduces three major innovations. First, for scalability, we treat scheduling as a cooperative multi-agent problem in which every cluster node acts as an agent, using centralised training with decentralised execution. Second, to remain context-aware while decentralised, each agent uses a Graph Neural Network (GNN) to build a state representation of the global cluster context, an improvement over methods that rely solely on local observations. Finally, to trade off the objectives, we use a stress-aware lexicographic ordering policy instead of a static linear weighting. Evaluations on Google Kubernetes Engine (GKE) show that AGMARL-DKS significantly outperforms the default scheduler in fault tolerance, utilisation, and cost, especially when scheduling batch and mission-critical workloads.
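
A lexicographic ordering ranks candidates by the most important objective first and uses later objectives only to break ties. In a weighted sum, by contrast, a large gain on one objective can offset a loss on another. The sketch below illustrates a stress-aware version for choosing a node. The objective names, the CPU threshold that defines "stressed", and the two priority orders are all hypothetical; they are not AGMARL-DKS's actual policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeScore:
    name: str
    fault_risk: float   # lower is better (hypothetical metric)
    utilisation: float  # higher is better
    cost: float         # lower is better

def is_stressed(cluster_cpu, threshold=0.8):
    """Hypothetical stress signal: cluster-wide CPU utilisation above a threshold."""
    return cluster_cpu >= threshold

def lexicographic_key(node, stressed):
    # Under stress, fault tolerance dominates; otherwise cost dominates.
    if stressed:
        return (node.fault_risk, -node.utilisation, node.cost)
    return (node.cost, -node.utilisation, node.fault_risk)

def pick_node(nodes, stressed):
    return min(nodes, key=lambda n: lexicographic_key(n, stressed))
```

Python compares tuples element by element, so a tuple-valued key gives lexicographic ordering directly. Practical lexicographic policies often add tolerance bands so that near-ties on a higher-priority objective fall through to the next one. That refinement is omitted here.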

Updated: 2026-05-07 16:33:01

Categories: cs.DC,cs.LG,cs.MA

Download: http://arxiv.org/abs/2603.12031v2

SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting

Accurate epidemic forecasting is crucial for public health response, resource allocation, and outbreak intervention, but remains difficult with sparse, noisy, and highly non-stationary data. Because epidemics unfold across interacting regions, spatiotemporal methods are natural candidates for improving forecasts. Despite growing interest in spatial information, no standardized benchmark exists, and current evaluations often use simple chronological train-test splits that do not reflect real-time forecasting practice. We address this gap with SpatialEpiBench, a challenging benchmark for spatiotemporal epidemic forecasting in realistic public-health settings. SpatialEpiBench includes 11 epidemic datasets with standardized rolling evaluations and outbreak-specific metrics. We evaluate adjacency-informed forecasting models with widely used epidemic priors that adapt general models to epidemiology, but find that most methods underperform a simple last-value baseline from 1 day to 1 month ahead, even during outbreaks and with these priors. We identify three major failure modes: (1) poor outbreak anticipation, (2) difficulty handling sparsity and noise, and (3) limited utility of common geographic adjacency for epidemiological spatial information. We release benchmark data, code, and instructions at https://github.com/Rachel-Lyu/SpatialEpiBench to support development of operationally useful epidemic forecasting models.
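
SpatialEpiBench replaces a single chronological split with rolling evaluation and compares models against a last-value baseline. Below is a minimal sketch of rolling-origin evaluation for that baseline. At each forecast origin, only data before the origin is visible, and the forecast covers the next `horizon` steps. The function names and the choice of MAE are assumptions. The benchmark's own metrics include outbreak-specific ones that are not shown here.

```python
def last_value_forecast(history, horizon):
    """Persistence baseline: repeat the most recent observation for every step ahead."""
    return [history[-1]] * horizon

def rolling_origin_mae(series, min_train, horizon, forecaster):
    """Mean absolute error over all forecast origins, using only data before each origin."""
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        history = series[:origin]                  # data available at forecast time
        forecast = forecaster(history, horizon)
        actual = series[origin:origin + horizon]
        errors.extend(abs(f - y) for f, y in zip(forecast, actual))
    return sum(errors) / len(errors)
```

Any candidate model with the same `forecaster(history, horizon)` signature can be scored under the identical protocol. This makes the "fails to beat last value" comparison direct.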

Updated: 2026-05-07 16:31:43

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06530v1

Market-Alignment Risk in Pricing Agents: Trace Diagnostics and Trace-Prior RL under Hidden Competitor State

Outcome metrics can certify the wrong behavior. We study this failure in a two-hotel revenue-management simulator where Hotel A trains an agent against a fixed rule-based revenue-management competitor, Hotel B. A standard learning agent can obtain near-reference revenue per available room (RevPAR) while failing to learn market-like yield management: it sells too aggressively, undercuts, or collapses to modal price buckets. We diagnose this as a Goodhart-style failure under partial observability. Hotel A cannot observe the competitor's remaining inventory, booking curve, or pricing rule, so the same Hotel A-visible state maps to multiple plausible Hotel B prices. Deterministic value-based RL and deterministic copying collapse this unresolved uncertainty into shortcut behavior. We introduce a trace-level diagnostic protocol using RevPAR, occupancy, ADR, full price-bucket distributions, L1/JS distances, and seed-level confidence intervals. The verified repair is Trace-Prior RL: learn a distributional market prior from lagged market traces, then train a stochastic pricing policy with a RevPAR reward and a KL penalty to the learned prior. The final policy matches Hotel B's RevPAR, occupancy, ADR, and price distribution within seed-level uncertainty, while still optimizing Hotel A's own reward. We argue that the contribution is not a new optimizer and not a hotel-pricing leaderboard, but a reproducible failure-and-repair recipe for agentic systems where scalar rewards are easy to game and the intended behavior is only visible in traces. A key finding is that higher exact action accuracy can worsen aggregate trace alignment when the target is distributional.
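
The trace diagnostics compare full price-bucket distributions using L1 and Jensen-Shannon distances, and Trace-Prior RL adds a KL penalty toward a learned prior. Below is a minimal plain-Python sketch of both ingredients. The JS divergence uses natural logs, so it is bounded by ln 2; its square root is the JS distance. The penalty direction KL(policy || prior) and the coefficient `beta` are assumptions, since the abstract says only "a KL penalty to the learned prior".

```python
import math

def normalize(counts):
    """Turn price-bucket counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def kl_divergence(p, q):
    """KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def js_divergence(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def kl_penalized_reward(revpar, policy_probs, prior_probs, beta):
    """Hypothetical per-step objective: RevPAR reward minus a KL penalty toward the prior."""
    return revpar - beta * kl_divergence(policy_probs, prior_probs)
```

This also illustrates the abstract's closing point. A deterministic policy that always picks the single most likely bucket can match many individual actions, yet still sit far from a spread-out target distribution under L1 or JS.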

Updated: 2026-05-07 16:31:39

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2605.06529v1

Process Matters more than Output for Distinguishing Humans from Machines

Reliable human-machine discrimination is becoming increasingly important as large language models and autonomous agents are deployed in online settings. Existing approaches evaluate whether a system can produce behavior or responses indistinguishable from those of a human, following the emphasis on outputs as a criterion for intelligence proposed by Alan Turing. Cognitive science offers an alternative perspective: evaluating the process by which behavior is produced. To test whether cognitive processes can reliably distinguish humans from machines, we introduce CogCAPTCHA30, a battery of 30 cognitive tasks designed to elicit diagnostic process-level features even when task performance is matched. Across the battery, process-level features provide stronger discriminative signal than performance metrics alone, reliably distinguishing humans from agents even under output matching (mean process-feature classifier AUC = 0.88). To evaluate agentic process differences, we compare off-the-shelf frontier agents (Claude Sonnet 4.5, GPT-5, Gemini 2.5 Pro), Centaur (a language model fine-tuned on 10.7M human decisions), and two task-specific fine-tuning approaches applied to Qwen2.5-1.5B-Instruct: action-level supervised fine-tuning (A-SFT) and process-level fine-tuning (P-SFT), which directly optimizes process features. Broad fine-tuning on human decisions improves human-like task processes relative to off-the-shelf agents, while task-specific process-level supervision further improves behavioral mimicry. However, this advantage diminishes under cross-task transfer when supervised process targets do not naturally generalize across tasks. Explicit process-level supervision can improve human behavioral mimicry, but only if appropriate task-specific process representations are available, highlighting process specification as a bottleneck for achieving human-like cognitive processes in machines.
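
The reported mean AUC of 0.88 has a direct interpretation. AUC is the probability that a classifier scores a randomly chosen positive example above a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). Below is a minimal pairwise sketch. Which class (human or agent) counts as positive is an assumption; the abstract does not say.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(score of random positive > score of random negative), ties count 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

The pairwise loop is quadratic. A rank-based formula or a library routine is preferable for large samples, but this form makes the meaning of the number explicit.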

Updated: 2026-05-07 16:30:35

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06524v1

On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR

Extensive recent research has demonstrated that the enhanced reasoning capabilities models acquire through Reinforcement Learning with Verifiable Rewards (RLVR) are concentrated primarily in rank-1 components. Building on this observation, we employed Periodic Rank-1 Substitution and identified a counterintuitive phenomenon: RLVR may exhibit implicit reward overfitting to the training dataset. Specifically, the model can achieve satisfactory performance on the test set even when its rewards remain relatively low during training. Furthermore, we characterize three distinct properties of RL training: (1) the effective rank-1 component in RLVR does not retain model knowledge other than mathematical reasoning capability; (2) RLVR fundamentally works by optimizing a specific singular spectrum, and the singular values of almost all linear layers in RLVR-trained models follow a heavy-tailed distribution; (3) the left singular vectors associated with rank-1 components show a stronger alignment tendency during training, which echoes the finding that RLVR essentially optimizes sampling efficiency. Taken together, our findings and analysis further reveal how RLVR shapes model parameters and offer potential insights for improving existing RL paradigms, or other training paradigms, to enable continual learning.
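
One plausible reading of "rank-1 substitution" is to replace a layer's RLVR update ΔW = W_RL - W_base with its best rank-1 approximation from the SVD (Eckart-Young). The NumPy sketch below implements that reading for a single weight matrix. The function names are assumptions, as is applying the substitution once rather than periodically during training; this is not the authors' procedure.

```python
import numpy as np

def rank1_update(w_base, w_rl):
    """Best rank-1 approximation (in Frobenius norm) of the RL update W_RL - W_base."""
    delta = w_rl - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return s[0] * np.outer(u[:, 0], vt[0, :])   # top singular triplet only

def substitute_rank1(w_base, w_rl):
    """Base weights plus only the rank-1 part of the update."""
    return w_base + rank1_update(w_base, w_rl)
```

The left singular vector `u[:, 0]` is the quantity whose alignment over training the abstract's third property refers to.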

Updated: 2026-05-07 16:30:28

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2605.06523v1

Bootstrapping Post-training Signals for Open-ended Tasks via Rubric-based Self-play on Pre-training Text

Self-play has recently emerged as a promising paradigm for post-training Large Language Models (LLMs). In self-play, the target LLM creates the task input (e.g., a question), which it then addresses itself by producing a task output (e.g., an answer). A reward model evaluates the output, and the rewards are used to train the LLM, typically via Reinforcement Learning (RL). A key benefit of self-play for post-training LLMs is its low supervision cost: self-play avoids the need for high-quality input-output pairs traditionally constructed by humans or expensive proprietary models. Existing work, however, explores self-play only for verifiable tasks, such as math and coding, for which objective ground truth is available and easily checkable. In this paper, we seek to extend self-play to more realistic open-ended tasks. We propose POP, a self-play framework that uses the same LLM to synthesize evaluation rubrics along with each input-output pair. The rubric is used to evaluate outputs and train the model. Crucially, we ground the framework on a content-rich pretraining corpus to (1) enable an exploitable generation-verification gap and reduce reward hacking, and (2) prevent mode collapse. On Qwen-2.5-7B, POP increases performance of both the pretrained base model and instruction-tuned model on multiple tasks ranging from long-form healthcare QA to creative writing and instruction following.
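
In POP the same LLM writes a rubric for each input-output pair, and the rubric is used to score outputs. Reduced to its arithmetic, a rubric reward can be a weighted fraction of satisfied criteria. In the sketch below, the `Criterion` type and the weighted-average aggregation are assumptions. The `check` callables are string-matching stand-ins for what would be LLM judgments.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Criterion:
    description: str
    weight: float
    check: Callable[[str], bool]  # stand-in for an LLM judging whether the output meets this criterion

def rubric_reward(output: str, rubric: List[Criterion]) -> float:
    """Weighted fraction of rubric criteria satisfied by the output, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total if total > 0 else 0.0
```

A scalar in [0, 1] like this can serve directly as the RL reward for the sampled output.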

Updated: 2026-05-07 16:30:21

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2604.20051v2

Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models

Foundation models (FMs) are increasingly deployed in open-world settings where distribution shift is the rule rather than the exception. The out-of-distribution (OOD) phenomena they face -- knowledge boundaries, capability ceilings, compositional shifts, and open-ended task variation -- differ in kind from the settings that have shaped prior OOD research, and are further complicated because the pretraining and post-training distributions of modern FMs are often only partially observed. Our position is that OOD for foundation models is a structurally distinct problem that cannot be solved within the prevailing model-centric paradigm, and that agentic systems constitute the missing paradigm required to address it. We defend this claim through four steps. First, we give a stage-aware formalization of OOD that accommodates partially observed multi-stage training distributions. Second, we prove a parameter coverage ceiling: there exist practically relevant inputs that no model-centric method (training-time or test-time) can handle within tolerance $\varepsilon$, for reasons intrinsic to parameter-based representation. Third, we characterize agentic OOD systems by four structural properties -- perception, strategy selection, external action, and closed-loop verification -- and show that they strictly extend the reachable set beyond the ceiling. Fourth, we respond to seven counterarguments, conceding two, and outline a research agenda. We do not claim that agentic methods subsume model-centric ones; we argue that the two are complementary, and that progress on FM-OOD requires explicit recognition of the agentic paradigm as a first-class research direction.

Updated: 2026-05-07 16:29:33

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2605.06522v1

Optimizing Social Utility in Sequential Experiments

Regulatory approval of products in high-stakes domains such as drug development requires statistical evidence of safety and efficacy through large-scale randomized controlled trials. However, the high financial cost of these trials may deter developers who lack absolute certainty in their product's efficacy, ultimately stifling the development of "moonshot" products that could offer high social utility. To address this inefficiency, in this paper, we introduce a statistical protocol for experimentation in which the product developer (the agent) conducts a randomized controlled trial sequentially and the regulator (the principal) partially subsidizes its cost. By modeling the protocol as a belief Markov decision process, we show that the agent's optimal strategy can be found efficiently using dynamic programming. Further, we show that the social utility is a piecewise linear and convex function of the subsidy level the principal selects, and thus the socially optimal subsidy can also be found efficiently using divide-and-conquer. Simulation experiments using publicly available data on antibiotic development and approval demonstrate that our statistical protocol can increase social utility by more than 35% relative to standard, non-sequential protocols.
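
The agent's problem is solved by dynamic programming over a belief MDP. The toy below is not the paper's model. It assumes:

- a Beta-Bernoulli belief over per-patient efficacy;
- one outcome observed per stage;
- a fixed approval threshold of `k` successes after `T` stages;
- a subsidy that covers a fraction of each stage's cost.

Backward induction (memoized recursion) then gives the developer's optimal continue-or-abandon value.

```python
from functools import lru_cache

def agent_value(T, k, reward, cost, subsidy, a=1.0, b=1.0):
    """Developer's optimal expected profit in a toy sequential trial (belief MDP).

    Belief about the success probability is Beta(a + successes, b + failures).
    At each of T stages the developer either abandons (value 0) or pays its share
    of the stage cost and observes one Bernoulli outcome. Approval, worth `reward`,
    requires at least k successes after T stages.
    """
    agent_cost = (1.0 - subsidy) * cost  # the regulator pays subsidy * cost per stage

    @lru_cache(maxsize=None)
    def value(t, successes):
        if t == T:
            return reward if successes >= k else 0.0
        p = (a + successes) / (a + b + t)  # posterior predictive P(next outcome is a success)
        cont = (-agent_cost
                + p * value(t + 1, successes + 1)
                + (1 - p) * value(t + 1, successes))
        return max(0.0, cont)  # abandon vs. continue

    return value(0, 0)
```

With `T=1`, `k=1`, a uniform prior, and a reward of 10, a stage cost of 6 makes the trial unprofitable without support. A 50% subsidy makes it worth running. The paper's divide-and-conquer search over the subsidy level, which exploits piecewise linearity and convexity, is not reproduced here.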

Updated: 2026-05-07 16:28:25

Categories: cs.GT,cs.LG,cs.MA,stat.ME

Download: http://arxiv.org/abs/2605.06520v1

Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design

The engineering design research community has studied agentic AI systems that use Large Language Model (LLM) agents to automate the engineering design process. However, these systems are prone to some of the same pathologies that plague humans. Like human designers, LLM design agents can fixate on existing paradigms and fail to explore alternatives when solving design challenges, potentially leading to suboptimal solutions. In this work, we propose (1) a novel Self-Regulation Loop (SRL), in which the Design Agent self-regulates and explicitly monitors its own metacognition, and (2) a novel Co-Regulation Design Agentic Loop (CRDAL), in which a Metacognitive Co-Regulation Agent assists the Design Agent in metacognition to mitigate design fixation, thereby improving system performance on engineering design tasks. In the battery pack design problem examined here, we found that the novel SRL and CRDAL systems generate better-performing designs than a plain Ralph Wiggum Loop (RWL), without significantly increasing computational cost. Further, CRDAL generates designs with significantly better performance than SRL. We also found that CRDAL navigated the latent design space more effectively than both SRL and RWL. The proposed system architectures and findings of this work provide practical implications for future development of agentic AI systems for engineering design.

Updated: 2026-05-07 16:27:28

Categories: cs.AI

Download: http://arxiv.org/abs/2603.24768v2

Efficient Techniques for Data Reconstruction, with Finite-Width Recovery Guarantees

Data reconstruction attacks on trained neural networks aim to recover the data on which the network has been trained and pose a significant threat to privacy, especially if the training dataset contains sensitive information. Here, we propose a unified optimization formulation of the data reconstruction problem based on initial and trained parameter values, incorporating state-of-the-art proposals. We show that in the random feature model, this formulation provably leads to training data reconstruction with high probability, provided the network width is sufficiently large; this unprecedented finite-width result uses PAC-style bounds. Furthermore, when the data lies in a low-dimensional subspace, we show that the network width requirement for successful reconstruction can be relaxed, with bounds depending on the subspace dimension rather than the ambient dimension. For general neural network models and unknown data orientations, we propose an efficient reconstruction algorithm that approximates the low-dimensional data subspace through the change in the first-layer weights during training and uses only the last-layer weights for reconstruction, thus reducing the search space dimension and the required network width for high-quality reconstructions. Our numerical experiments on synthetic datasets and CIFAR-10 confirm that our subspace-aware reconstruction approach outperforms standard full-space techniques.
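
The observation behind approximating the data subspace from first-layer weight changes can be checked numerically. For a first layer computing Wx, each gradient of the loss with respect to W is a sum of outer products of backpropagated errors and training inputs. Hence the rows of W - W0 lie in the span of the training data. The NumPy sketch below uses synthetic data and a tanh network with only the first layer trained; all sizes are arbitrary assumptions. It recovers the subspace from the top singular vectors of W - W0. It illustrates the principle only and is not the authors' reconstruction algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, width = 20, 3, 10, 16

# Synthetic training inputs lying in an unknown k-dimensional subspace of R^d.
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]   # orthonormal columns, shape (d, k)
X = rng.normal(size=(n, k)) @ basis.T              # shape (n, d)
y = rng.normal(size=n)

# Two-layer tanh network; only the first layer W is trained by full-batch gradient descent.
W0 = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / np.sqrt(width)
W = W0.copy()
for _ in range(200):
    H = np.tanh(X @ W.T)                           # hidden activations, shape (n, width)
    residual = H @ a - y                           # shape (n,)
    # Gradient of 0.5 * mean(residual**2) w.r.t. W: every row is a combination of the inputs x_i.
    grad_W = ((residual[:, None] * a[None, :]) * (1.0 - H**2)).T @ X / n
    W -= 0.1 * grad_W

# The first-layer change therefore spans (part of) the data subspace.
delta = W - W0
_, s, vt = np.linalg.svd(delta, full_matrices=False)
est_basis = vt[:k].T                               # estimated subspace basis, shape (d, k)
```

Searching for reconstructions inside the k-dimensional span of `est_basis`, rather than all of R^d, is what shrinks the search space in a subspace-aware approach.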

Updated: 2026-05-07 16:27:12

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06519v1

Learning to Cut: Reinforcement Learning for Benders Decomposition

Benders decomposition (BD) is a widely used approach for solving two-stage stochastic programs that arise in real-world decision-making under uncertainty. However, it often converges slowly because the master problem grows as the number of cuts increases. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained with the REINFORCE policy-gradient algorithm. We evaluate the proposed approach on a two-stage stochastic electric vehicle charging station location problem and compare it with vanilla BD and LearnBD, a supervised learning approach that classifies cuts using a support vector machine. Numerical results demonstrate that RLBD substantially improves computational efficiency and generalizes well to problems with similar structure but different data inputs and decision variable dimensions.
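
RLBD selects cuts with a stochastic neural policy trained by REINFORCE. The sketch below replaces the neural network with a linear-logistic policy to keep it short; that substitution is an assumption. The policy keeps or drops each candidate cut independently, based on a hypothetical feature vector per cut. The REINFORCE estimate scales the gradient of the log-probability of the sampled decisions by the reward minus a baseline. The reward signal used below is a made-up toy, also an assumption.

```python
import numpy as np

def select_cuts(features, theta, rng):
    """Sample keep/drop decisions for candidate cuts from a Bernoulli-logistic policy."""
    probs = 1.0 / (1.0 + np.exp(-(features @ theta)))  # P(keep cut i), shape (m,)
    keep = rng.random(len(probs)) < probs               # sampled decisions, bool (m,)
    return keep, probs

def reinforce_grad(features, keep, probs, reward, baseline=0.0):
    """REINFORCE estimate: (reward - baseline) * grad_theta log pi(decisions)."""
    # For a Bernoulli-logistic policy, grad log pi = sum_i (keep_i - p_i) * x_i.
    return (reward - baseline) * ((keep.astype(float) - probs) @ features)

# One illustrative update with made-up features and a toy reward.
rng = np.random.default_rng(0)
theta = np.zeros(4)
features = rng.normal(size=(6, 4))      # 6 candidate cuts, 4 hypothetical features each
keep, probs = select_cuts(features, theta, rng)
reward = -float(keep.sum())             # toy reward: fewer cuts is better (assumption)
theta += 0.01 * reinforce_grad(features, keep, probs, reward)
```

In a real setting, the reward would come from running the Benders iterations with the selected cuts, for example the negative solve time or iteration count. A running-average baseline typically reduces variance.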

Updated: 2026-05-07 16:26:13

Categories: math.OC,cs.AI

Download: http://arxiv.org/abs/2605.06516v1

Leviathan: Decoupling Input and Output Representations in Language Models

Modern language models use a single matrix for input embedding and output projection. This couples two distinct objectives: token representation and discrimination over a vocabulary. This work introduces Leviathan, a Transformer architecture that replaces the input embedding matrix with learned embedding vectorization (LEV), a compact continuous mapping from token indices to embeddings. Leviathan's output head remains untied, with a total parameter increase as low as 0.2%. Under controlled comparisons with identical Transformer backbones, Leviathan consistently improves language modeling performance over standard tied-embedding baselines across the 200M-1.2B parameter range on The Pile, with gains that grow during training. At 1.2B scale, Leviathan reduces validation perplexity by 9%, requires $2.1\times$ fewer training tokens to reach the tied baseline's final loss, and improves on all six downstream benchmarks evaluated, including a 30% reduction in LAMBADA perplexity. Frequency-stratified analysis shows that the gains are concentrated in rare tokens, where continuous parameterization reduces perplexity by 81%, and shrink to near zero for the most frequent tokens.
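
The abstract does not specify how LEV maps indices to embeddings. To show only why a continuous index-to-vector map can be far smaller than a V x d lookup table, here is a purely hypothetical stand-in. It writes each token index as a +/-1 binary code and passes it through a small MLP. The class, its sizes, and the bit encoding are assumptions, not the paper's LEV.

```python
import numpy as np

class BitCodeEmbedding:
    """Hypothetical compact token-index -> embedding map (NOT the paper's LEV).

    Parameters grow with log2(vocab_size) rather than with vocab_size.
    """

    def __init__(self, vocab_size, d_model, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.n_bits = max(1, int(np.ceil(np.log2(vocab_size))))
        self.w1 = rng.normal(size=(self.n_bits, hidden)) / np.sqrt(self.n_bits)
        self.w2 = rng.normal(size=(hidden, d_model)) / np.sqrt(hidden)

    def codes(self, token_ids):
        ids = np.asarray(token_ids, dtype=np.int64)[:, None]
        bits = (ids >> np.arange(self.n_bits)) & 1   # shape (n, n_bits), values 0/1
        return 2.0 * bits - 1.0                       # +/-1 so that index 0 is not all zeros

    def __call__(self, token_ids):
        return np.tanh(self.codes(token_ids) @ self.w1) @ self.w2  # shape (n, d_model)

    def num_params(self):
        return self.w1.size + self.w2.size
```

For an example vocabulary of 50,257 tokens and d_model = 768, a lookup table needs about 38.6M parameters, while this map needs about 50K. Whether such a map is expressive enough is exactly what an architecture like LEV has to address.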

Updated: 2026-05-07 16:22:21

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2601.22040v2

Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models

Transformer-based tabular foundation models (TFMs) dominate benchmarks for small- to medium-sized tabular prediction tasks, yet their inference mechanisms remain largely unexplored. We present the first large-scale mechanistic study of layerwise dynamics in six state-of-the-art tabular in-context learning models. We explore how predictions emerge across depth, identify distinct stages of inference, and reveal latent-space dynamics that differ from those of language models. Our findings indicate substantial depthwise redundancy across multiple models, suggesting iterative refinement with overlapping computations across inference stages. Guided by these insights, we design a proof-of-concept looped single-layer model that uses only 20% of the original model's parameters while achieving comparable performance. The code is available at https://github.com/amirbalef/is_one_layer_enough.
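
A looped model can be pictured as one layer applied repeatedly in place of a stack of distinct layers. The sketch below is a toy illustration under stated assumptions. `make_layer` is a hypothetical residual update, not a TFM block. The parameter ratio it prints is 1/depth for this toy and is unrelated to the paper's 20% figure.

```python
import math
import random


def make_layer(dim, seed=0):
    """Hypothetical residual layer x -> x + tanh(W x); returns (layer, n_params)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]

    def layer(x):
        return [xi + math.tanh(sum(W[i][j] * x[j] for j in range(dim)))
                for i, xi in enumerate(x)]

    return layer, dim * dim


def run_stacked(x, layers):
    # Standard deep model: each depth step has its own weights.
    for layer in layers:
        x = layer(x)
    return x


def run_looped(x, layer, n_loops):
    # Looped model: one set of weights applied repeatedly.
    for _ in range(n_loops):
        x = layer(x)
    return x


dim, depth = 4, 6
layers = [make_layer(dim, seed=s) for s in range(depth)]
stacked_params = sum(p for _, p in layers)
shared_layer, shared_params = layers[0]
print(shared_params / stacked_params)  # fraction of parameters kept by looping
```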

Updated: 2026-05-07 16:22:04

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2605.06510v1

On the Security of Research Artifacts

Research artifacts are widely shared to support reproducibility, and artifact evaluation (AE) has become common at many leading conferences. However, AE mainly checks whether artifacts work as claimed and can be reproduced, and it largely overlooks potential security risks. Because these artifacts are publicly released and reused, they may unintentionally create opportunities for misuse and raise concerns about safe and responsible sharing. We study 509 research artifacts from top-tier security venues and find that many contain insecure code patterns that may introduce attack vectors. We propose a taxonomy for context-aware security assessment to enable structured analysis of such risks. We run static analysis and examine the resulting findings, filtering out false positives and identifying real security risks. Our analysis shows that 41.60% of the prevalent findings may pose security concerns under practical usage. To support scalable analysis, we introduce SAFE (Security-Aware Framework for Artifact Evaluation), a first step toward an autonomous framework that analyzes tool-reported findings by considering code semantics, execution context, and practical exploitability. SAFE achieves 84.80% accuracy and an 84.63% F1-score in distinguishing security risks from non-security findings. Overall, our results show that security also deserves attention in AE to promote safe and responsible research sharing. The source code is available at: https://github.com/nanda-rani/SAFE
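
The following toy pattern matcher, built on Python's standard `ast` module, makes "insecure code patterns" concrete. It only reports raw findings. Deciding whether a finding is a real risk in context (for example, whether the pickled file can be attacker-controlled) is the step SAFE is meant to address, and this sketch does not attempt it. The pattern list is an illustrative assumption, not the paper's taxonomy.

```python
import ast

# Toy set of call patterns that static analyzers commonly flag in Python code.
RISKY_CALLS = {"eval", "exec", "pickle.load", "pickle.loads", "os.system", "yaml.load"}


def call_name(node):
    """Return 'name' or 'module.attr' for simple call targets, else None."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return None


def scan(source):
    """Return sorted (line, pattern) findings for risky calls in `source`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = call_name(node)
        if name in RISKY_CALLS:
            findings.append((node.lineno, name))
        for kw in node.keywords:
            # Calls with shell=True pass strings to a shell (injection risk).
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                findings.append((node.lineno, f"{name}(shell=True)"))
    return sorted(findings)


artifact = (
    "import pickle, subprocess\n"
    "data = pickle.load(open('model.pkl', 'rb'))\n"
    "subprocess.run(cmd, shell=True)\n"
    "print(len(data))\n"
)
print(scan(artifact))
```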

Updated: 2026-05-07 16:21:26

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.06508v1

MARBLE: Multi-Aspect Reward Balance for Diffusion RL

Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be optimized simultaneously. Existing practice handles multiple rewards by training one specialist model per reward, by optimizing a weighted-sum reward $R(x)=\sum_k w_k R_k(x)$, or by sequential fine-tuning with a hand-crafted stage schedule. These approaches either fail to produce a unified model jointly trained on all rewards or require heavy, manually tuned sequential training. We find that the failure stems from naive weighted-sum reward aggregation. This approach suffers from a sample-level mismatch: most rollouts are specialist samples, highly informative for certain reward dimensions but irrelevant for others, so weighted summation dilutes their supervision. To address this issue, we propose MARBLE (Multi-Aspect Reward BaLancE), a gradient-space optimization framework that maintains an independent advantage estimator for each reward and computes per-reward policy gradients. It then harmonizes these gradients into a single update direction by solving a quadratic programming problem, without manually tuned reward weights. We further propose an amortized formulation that exploits the affine structure of the loss used in DiffusionNFT to reduce the per-step cost from $K+1$ backward passes to nearly the cost of a single-reward baseline. We pair it with EMA smoothing of the balancing coefficients to stabilize updates against transient single-batch fluctuations. On SD3.5 Medium with five rewards, MARBLE improves all five reward dimensions simultaneously. Under weighted summation, the gradient cosine of the worst-aligned reward is negative in 80% of mini-batches; MARBLE makes it consistently positive. MARBLE trains at 0.97× the speed of the baseline.
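
The abstract does not state the exact quadratic program. One common choice for harmonizing per-objective gradients, assumed here for illustration, is the MGDA-style min-norm problem over the convex hull of the gradients. For two rewards it has a closed form, and the resulting direction has a non-negative inner product with every per-reward gradient, which is the property the gradient-cosine result refers to. The `ema` helper mirrors the EMA smoothing of balancing coefficients. The reward names and numbers are made up, and MARBLE's $K$-reward solver and amortization are not shown.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def min_norm_two(g1, g2):
    """Min-norm point on the segment between g1 and g2 (two-objective MGDA QP).

    Solves min_{a in [0, 1]} ||a*g1 + (1 - a)*g2||^2 in closed form.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every convex weight gives the same direction
    else:
        alpha = dot([y - x for x, y in zip(g1, g2)], g2) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * x + (1 - alpha) * y for x, y in zip(g1, g2)]
    return d, alpha


def ema(prev, new, beta=0.9):
    # Smooth a balancing coefficient across steps to damp single-batch noise.
    return beta * prev + (1 - beta) * new


g_aesthetic = [1.0, 0.0]  # hypothetical per-reward gradients
g_text = [-0.5, 1.0]
d, alpha = min_norm_two(g_aesthetic, g_text)
print(alpha, d, dot(d, g_aesthetic), dot(d, g_text))
```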

Updated: 2026-05-07 16:20:42

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2605.06507v1

PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that deliver usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the posterior success rate of a membership-inference attack (MIA) at its prior. The DP framework matches this level of MIA resistance only at $\varepsilon=0$, which requires infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity: steps at which every candidate subset agrees on the update direction. At these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. For full fine-tuning of OPT-1.3B on SST-2 at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, whereas PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.
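
Here is a minimal sketch of the unanimity idea. It assumes each candidate subset yields a scalar projected gradient (as in MeZO-style estimates) and that the mechanism releases only its sign. It follows the abstract's description of PACZero-ZPL (a uniform coin flip on disagreement) but is not the authors' code. `release_sign` and `zo_step` are hypothetical names, and the mutual-information accounting of PACZero-MI is not shown.

```python
import random


def release_sign(subset_grads, rng):
    """Sign release in the spirit of PACZero-ZPL (hypothetical helper).

    subset_grads: one scalar projected zeroth-order gradient per candidate
    subset, only one of which corresponds to the secret subset.
    Returns (sign, unanimous).
    """
    signs = {1 if g > 0 else -1 for g in subset_grads}  # zero is treated as -1
    if len(signs) == 1:
        # Every candidate agrees, so the output does not depend on which is secret.
        return signs.pop(), True
    # Disagreement: release a fair coin flip that ignores the data entirely.
    return rng.choice((-1, 1)), False


def zo_step(theta, z, sign, lr):
    # MeZO-style update along the random direction z, scaled by the released sign.
    return [t - lr * sign * zi for t, zi in zip(theta, z)]


rng = random.Random(0)
print(release_sign([0.3, 1.2, 0.01], rng))
print(release_sign([0.4, -0.2, 0.9], rng))
```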

Updated: 2026-05-07 16:20:20

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2605.06505v1

Principled Federated Random Forests for Heterogeneous Data

Random Forests (RF) are among the most powerful and widely used predictive models for centralized tabular data, yet few methods exist to adapt them to the federated learning setting. Unlike the models targeted by most federated learning approaches, RF are piecewise constant, which rules out exact gradient-based optimization. As a result, existing federated RF implementations rely on unprincipled heuristics. For instance, aggregating decision trees trained independently on clients fails to optimize the global impurity criterion, even under simple distribution shifts. We propose FedForest, a new federated RF algorithm for horizontally partitioned data that naturally accommodates diverse forms of client data heterogeneity, from covariate shift to more complex outcome-shift mechanisms. We prove that our splitting procedure, based on aggregating carefully chosen client statistics, closely approximates the split selected by a centralized algorithm. Moreover, FedForest allows splits on client indicators, enabling a non-parametric form of personalization that is absent from prior federated random forest methods. Empirically, we demonstrate that the resulting federated forests closely match centralized performance across heterogeneous benchmarks while remaining communication-efficient.
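
For regression trees with squared-error impurity, clients can make a federated split match the centralized one exactly by sharing a few sums. For each candidate threshold, each client reports the count, sum, and sum of squares of the targets on each side of the split. These statistics add across clients. The sketch assumes a shared list of candidate thresholds and only illustrates the general idea of aggregating client statistics. FedForest's actual statistics, its approximation guarantee, and its client-indicator splits are not reproduced here.

```python
def sse(n, s, ss):
    # Sum of squared errors around the mean, from count, sum and sum of squares.
    return ss - s * s / n if n else 0.0


def client_stats(xs, ys, thresholds):
    """Per-threshold (left, right) sufficient statistics computed on one client."""
    stats = []
    for t in thresholds:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        stats.append(((len(left), sum(left), sum(y * y for y in left)),
                      (len(right), sum(right), sum(y * y for y in right))))
    return stats


def best_split(all_client_stats, thresholds):
    """Server side: add statistics across clients, return (threshold, impurity)."""
    best = None
    for i, t in enumerate(thresholds):
        ln, ls, lss = (sum(c[i][0][k] for c in all_client_stats) for k in range(3))
        rn, rs, rss = (sum(c[i][1][k] for c in all_client_stats) for k in range(3))
        cost = sse(ln, ls, lss) + sse(rn, rs, rss)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


thresholds = [1.5, 2.5, 3.5, 4.5]
client_a = ([1.0, 2.0, 3.0], [1.0, 1.1, 5.0])
client_b = ([4.0, 5.0, 2.5], [5.2, 4.9, 0.9])
federated = best_split([client_stats(*client_a, thresholds),
                        client_stats(*client_b, thresholds)], thresholds)
pooled = client_stats(client_a[0] + client_b[0], client_a[1] + client_b[1], thresholds)
centralized = best_split([pooled], thresholds)
print(federated, centralized)
```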

Updated: 2026-05-07 16:19:50

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2602.03258v2

Privacy by Postprocessing the Discrete Laplace Mechanism

We show that an "old dog", the classical discrete Laplace (a.k.a. geometric) mechanism, can "perform new tricks": 1. It can be post-processed to yield a simple, unbiased estimator of any subexponential function $f$ of the original data, giving a simple, discrete, multivariate version of the recent unbiasing result for the Laplace mechanism by Calmon et al. (FORC '25). 2. It can be post-processed to output the same distribution as the Laplace mechanism or the Staircase mechanism with identical privacy parameters. Thus, the discrete Laplace mechanism is a versatile mechanism that should be preferred over the Laplace and Staircase mechanisms whenever the data is discrete (or can be made discrete while controlling $\ell_1$-sensitivity). We bound the variance of our estimator and compare it with the mean squared error of the biased estimator that simply evaluates $f$ on the output of the mechanism. Although our unbiased estimator has exponential running time for worst-case functions, we show that it can often be computed in linear or polynomial time for common functions that exhibit structure. We showcase the properties of our methods empirically with several use cases, including profile and entropy estimation, as well as distributed/federated data analysis applications in which unbiasedness is key to accuracy.
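
The simplest case, $f(x)=x^2$, gives some intuition for post-processing toward unbiasedness. Discrete Laplace noise $Z$ with parameter $\alpha=e^{-\varepsilon/\Delta}$ is the difference of two i.i.d. geometric variables. It has mean zero and variance $2\alpha/(1-\alpha)^2$. Therefore $Y^2 - 2\alpha/(1-\alpha)^2$ is unbiased for $x^2$, while the plug-in $Y^2$ is biased upward. This one-dimensional special case is chosen for illustration only and is not the paper's general estimator for subexponential $f$. The helper names are hypothetical.

```python
import math
import random


def geometric(alpha, rng):
    """Sample G with P(G = k) = (1 - alpha) * alpha**k, k = 0, 1, 2, ... (inverse CDF)."""
    u = 1.0 - rng.random()  # u in (0, 1], so log(u) is finite
    return int(math.floor(math.log(u) / math.log(alpha)))


def discrete_laplace(x, eps, rng, sensitivity=1):
    """Add discrete Laplace noise: the difference of two i.i.d. geometrics."""
    alpha = math.exp(-eps / sensitivity)
    return x + geometric(alpha, rng) - geometric(alpha, rng)


def unbiased_square(y, eps, sensitivity=1):
    """Unbiased estimate of x**2 from y = x + Z, using Var(Z) = 2*alpha/(1 - alpha)**2."""
    alpha = math.exp(-eps / sensitivity)
    return y * y - 2 * alpha / (1 - alpha) ** 2


rng = random.Random(0)
ys = [discrete_laplace(5, 1.0, rng) for _ in range(100_000)]
plug_in = sum(y * y for y in ys) / len(ys)  # biased upward by Var(Z)
debiased = sum(unbiased_square(y, 1.0) for y in ys) / len(ys)
print(plug_in, debiased)
```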

Updated: 2026-05-07 16:19:08

Categories: cs.CR

Download: http://arxiv.org/abs/2605.06502v1

Cubit: Token Mixer with Kernel Ridge Regression

Since its introduction in 2017, the Transformer has become one of the most widely adopted architectures in modern deep learning. Despite extensive efforts to improve positional encoding, attention mechanisms, and feed-forward networks, the core token-mixing mechanism in Transformers remains attention. In this work, we show that the attention module in Transformers can be interpreted as performing Nadaraya-Watson regression: it computes similarities between tokens and aggregates the corresponding values accordingly. Motivated by this perspective, we propose Cubit, a potential next-generation architecture based on Kernel Ridge Regression (KRR). Specifically, Cubit modifies the classical attention computation by incorporating the closed-form solution of KRR. It combines value aggregation through kernel similarities with normalization via the inverse of the kernel matrix. To improve training stability, we further propose Limited-Range Rescale (LRR), which rescales the value layer within a controlled range. We argue that Cubit, as a KRR-based architecture, rests on a stronger mathematical foundation than the vanilla Transformer, whose attention mechanism corresponds to Nadaraya-Watson regression. We test this claim through comprehensive experiments, whose results suggest that Cubit may exhibit stronger long-sequence modeling capability. In particular, its performance gain over the Transformer appears to increase as the training sequence length grows.
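
A toy problem makes the contrast between the two regressions concrete. Nadaraya-Watson normalizes kernel weights by their sum, $\hat v(q)=\sum_j k(q,k_j)v_j/\sum_j k(q,k_j)$. Kernel ridge regression instead first solves $(K+\lambda I)\alpha=V$ and then predicts $k(q,\cdot)^\top\alpha$. For clarity, the sketch uses a Gaussian RBF kernel and a plain Gauss-Jordan solver. Cubit's actual kernel, its LRR rescaling, and any efficient solver are not specified in the abstract and are not reproduced.

```python
import math


def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def nadaraya_watson(queries, keys, values, gamma=1.0):
    """Attention-like readout: kernel weights normalized by their sum."""
    out = []
    for q in queries:
        w = [rbf(q, k, gamma) for k in keys]
        z = sum(w)
        out.append([sum(wi * v[c] for wi, v in zip(w, values)) / z
                    for c in range(len(values[0]))])
    return out


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + list(brow) for row, brow in zip(A, B)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def kernel_ridge(queries, keys, values, lam=1e-3, gamma=1.0):
    """KRR readout: k(q, keys) @ (K + lam * I)^{-1} @ values."""
    K = [[rbf(ki, kj, gamma) + (lam if i == j else 0.0) for j, kj in enumerate(keys)]
         for i, ki in enumerate(keys)]
    alpha = solve(K, values)
    return [[sum(rbf(q, k, gamma) * alpha[j][c] for j, k in enumerate(keys))
             for c in range(len(values[0]))] for q in queries]


keys = [[0.0], [1.0], [2.5]]
values = [[1.0], [-2.0], [3.0]]
print(nadaraya_watson(keys, keys, values))
print(kernel_ridge(keys, keys, values, lam=1e-9))
```

With a tiny $\lambda$, KRR reproduces the values at the keys, whereas Nadaraya-Watson averages each value with its neighbours.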

Updated: 2026-05-07 16:18:55

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2605.06501v1

Operator-Guided Invariance Learning for Continuous Reinforcement Learning

Reinforcement learning (RL) with continuous time and state/action spaces is often data-intensive and brittle under nuisance variability and distribution shift. This motivates methods that exploit value-preserving structures to stabilize and improve learning. Most existing approaches focus on special cases, such as prescribed symmetries and exact equivariance. They do not address how to discover more general structures, which require nonlinear operators to transform and map between continuous state/action systems with isomorphic value functions. We propose VPSD-RL (Value-Preserving Structure Discovery for Reinforcement Learning). It models continuous RL as a controlled diffusion, with value-preserving mappings defined through Lie-group actions and their associated pullback operators. We show that a value-preserving structure exists exactly when pulling back the value function and pushing forward actions commute with the controlled generator and the reward functional. Further, approximate value-preserving structures with rigorous guarantees can be found when the Hamilton-Jacobi-Bellman mismatch is small. The framework discovers exact and approximate value-preserving structures by searching for the associated Lie-group operators. VPSD-RL fits differentiable drift, diffusion, and reward models; learns infinitesimal generators by minimizing determining-equation residuals; exponentiates them with ODE flows to obtain finite transformations; and integrates these into continuous RL through transition augmentation and transformation-consistency regularization. We show that bounded generator/reward mismatch implies quantitative stability of the optimal value function along approximate orbits, with sensitivity governed by the effective horizon. Empirically, we observe improved data efficiency and robustness on continuous-control benchmarks.
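
The step "exponentiates them with ODE flows to obtain finite transformations" can be illustrated with the simplest Lie group. Integrating the rotation generator $\xi(x)=(-x_2, x_1)$ for time $t$ yields a rotation by angle $t$. The sketch uses a hand-written RK4 integrator and a known generator. In VPSD-RL the generator is learned from data. The value function $V(x)=\lVert x\rVert^2$ used in the check is a made-up example of a quantity preserved along the orbit.

```python
import math


def flow(generator, x, t, steps=100):
    """Approximate exp(t * generator) applied to x by integrating dx/ds = generator(x) with RK4."""
    h = t / steps
    for _ in range(steps):
        k1 = generator(x)
        k2 = generator([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = generator([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = generator([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


def rotation_generator(x):
    # Infinitesimal generator of planar rotations (the Lie algebra so(2)).
    return [-x[1], x[0]]


def value(x):
    # Toy value function that is invariant under rotations.
    return x[0] ** 2 + x[1] ** 2


x0 = [1.0, 0.0]
x1 = flow(rotation_generator, x0, math.pi / 2)
print(x1, value(x0), value(x1))
```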

Updated: 2026-05-07 16:18:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2605.06500v1

From Token Lists to Graph Motifs: Weisfeiler-Lehman Analysis of Sparse Autoencoder Features

Sparse autoencoders (SAEs) have become central to mechanistic interpretability, decomposing transformer activations into monosemantic features. Yet existing analyses characterise features almost exclusively through top-activating token lists or decoder weight vectors, leaving the higher-order co-occurrence structure shared across features largely unexamined. We introduce a graph-structured representation in which each SAE feature is modelled as a token co-occurrence graph: nodes are the tokens most frequent near strong activations, and edges connect pairs that co-occur within local context windows. A custom WL-style, frequency-binned graph kernel then provides a similarity measure over this structural space. As a proof of concept, we apply it to features from a large SAE trained on GPT-2 Small, probed with a synthetic mixed-domain corpus. Our clustering recovers heuristic motif families (punctuation-heavy patterns, language and script clusters, and code-like templates) that clustering on decoder cosine similarity does not recover. A token-histogram baseline achieves higher overall purity, so the graph view is complementary rather than dominant: it surfaces structural relationships that token-frequency and decoder-weight views alone do not capture. Cluster assignments are stable across graph-construction hyperparameters and random seeds.
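
The sketch below illustrates the general recipe. It builds a token co-occurrence graph, relabels nodes with Weisfeiler-Lehman (WL) iterations, and compares the resulting label histograms with a cosine-normalized kernel. It rests on three assumptions. The input is a plain token list rather than the tokens near strong SAE activations. Initial labels are coarse token classes. The paper's frequency binning is omitted, so this is a standard WL subtree kernel rather than the authors' custom one.

```python
import math
from collections import Counter


def cooccurrence_graph(tokens, window=2):
    """Undirected graph: tokens are nodes; edges join tokens within `window` positions."""
    adj = {t: set() for t in tokens}
    for i, t in enumerate(tokens):
        for u in tokens[i + 1:i + 1 + window]:
            if u != t:
                adj[t].add(u)
                adj[u].add(t)
    return adj


def token_class(tok):
    # Coarse initial node label (a hypothetical choice for this sketch).
    if tok.isdigit():
        return "digit"
    if tok.isalpha():
        return "alpha"
    if not any(ch.isalnum() for ch in tok):
        return "punct"
    return "mixed"


def wl_features(adj, iterations=2):
    """Histogram of Weisfeiler-Lehman labels accumulated over all iterations."""
    labels = {v: token_class(v) for v in adj}
    feats = Counter(labels.values())
    for _ in range(iterations):
        # Each new label combines the old label with the sorted neighbour labels.
        labels = {v: labels[v] + "(" + ",".join(sorted(labels[u] for u in adj[v])) + ")"
                  for v in adj}
        feats.update(labels.values())
    return feats


def wl_similarity(g1, g2, iterations=2):
    """Cosine-normalized WL subtree kernel between two graphs."""
    f1, f2 = wl_features(g1, iterations), wl_features(g2, iterations)

    def k(a, b):
        return sum(a[label] * b[label] for label in a)

    return k(f1, f2) / math.sqrt(k(f1, f1) * k(f2, f2))


code = "def f ( x ) : return x + 1".split()
prose = "the cat sat on the mat today".split()
g_code, g_prose = cooccurrence_graph(code), cooccurrence_graph(prose)
print(wl_similarity(g_code, g_code), wl_similarity(g_code, g_prose))
```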

Updated: 2026-05-07 16:15:16

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06494v1

Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors

AI systems have become increasingly capable of dangerous behaviours in many domains. This raises the question: do models sometimes choose to violate human instructions in order to perform behaviour that is more useful for certain goals? We introduce a benchmark for measuring the propensity of terminal-based agents toward instrumental convergence (IC) behaviour. Examples include self-preservation, which has been hypothesised to play a key role in risks from highly capable AI agents. Our benchmark is realistic and low-stakes, which reduces evaluation-awareness and roleplay confounds. The suite contains seven operational tasks, each with an official workflow and a policy-violating shortcut. A shared framework of eight variants varies monitoring, instruction clarity, stakes, permission, instrumental usefulness, and blocked honest paths to support inferences about the factors driving IC behaviour. We evaluated ten models using deterministic environment-state scorers over 1,680 samples, with trace review used for audit and adjudication. The final IC rate is 86 out of 1,680 samples (5.1%). IC behaviour is concentrated rather than uniform: two Gemini models account for 66.3% of IC cases, and three tasks account for 84.9%. Conditions in which IC behaviour is indispensable for task success produce the largest increase in the adjusted IC rate (+15.7 percentage points). By contrast, emphasising that task success is critical, or making certain framing choices, does not produce comparable effects. Our findings indicate that realistic, low-nudge environments elicit IC behaviour rarely but systematically in most tested models. We conclude that it is feasible to robustly measure tendencies toward dangerous behaviour in current frontier AI agents.
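
The headline numbers are easy to check. The sketch recomputes the overall IC rate and the two concentration shares. The abstract does not give the case counts 57 and 73. They are back-calculated as the only integer counts out of 86 that are consistent with the reported percentages.

```python
def pct(part, whole):
    """Percentage rounded to one decimal, as reported in the abstract."""
    return round(100 * part / whole, 1)


total_samples = 1680
ic_cases = 86
# Back-calculated counts (assumption): the only integers out of 86 that round
# to the reported 66.3% (two Gemini models) and 84.9% (three tasks) shares.
gemini_cases = 57
top_task_cases = 73

print(pct(ic_cases, total_samples), pct(gemini_cases, ic_cases), pct(top_task_cases, ic_cases))
```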

Updated: 2026-05-07 16:12:36

Categories: cs.AI,cs.CY

Download: http://arxiv.org/abs/2605.06490v1

Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training

How reliably can a small language model estimate its own correctness? The answer determines whether local-to-cloud routing (escalating queries that a cheap local model cannot handle) can work without supervised training data. As inference costs dominate large language model (LLM) deployment budgets, a common cost-control strategy is to route most queries to a cheap local model and reserve expensive cloud calls for hard cases. We compare zero-shot confidence signals against RouteLLM-style supervised baselines across three 7-8B model families and two datasets (1,000 and 500 queries per model, respectively). Average token log-probability requires no training data. In-distribution, it matches or exceeds supervised baselines, with an AUROC (area under the receiver operating characteristic curve) of 0.650-0.714 vs. 0.644-0.676. Out-of-distribution, it substantially outperforms them (0.717-0.833 vs. 0.512-0.564), because it measures a property of the model's generation rather than of the query distribution. We further propose retrieval-conditional self-assessment, a pre-generation signal that selectively injects retrieved knowledge when similarity is high. It improves over bare self-assessment by up to +0.069 AUROC at 3-10× lower latency than log-probability. A supervised baseline trained on 1,000 labeled examples never exceeds the zero-shot signal. We release all code, data, and experiment logs.
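
Both the zero-shot signal and its evaluation metric are simple to compute. The sketch scores each answer by its average token log-probability. It then computes AUROC through the Mann-Whitney pairwise formulation, in which ties count one half. The token log-probabilities and correctness labels are made-up toy values, not data from the paper.

```python
def avg_logprob(token_logprobs):
    """Zero-shot confidence: mean log-probability of the generated tokens."""
    return sum(token_logprobs) / len(token_logprobs)


def auroc(scores, labels):
    """AUROC via the Mann-Whitney statistic; labels are 1 (correct) or 0 (incorrect).

    Counts pairs where a correct answer outscores an incorrect one; ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


answers = [[-0.05, -0.15], [-0.2, -0.4], [-0.1, -0.3], [-1.2, -0.6]]  # toy log-probs
correct = [1, 1, 0, 0]
scores = [avg_logprob(a) for a in answers]
print(scores, auroc(scores, correct))
```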

Updated: 2026-05-07 16:11:01

Categories: cs.AI,cs.CL,cs.ET

Download: http://arxiv.org/abs/2605.02241v3

CatNet: Controlling the False Discovery Rate in LSTM with SHAP Feature Importance and Gaussian Mirrors

We introduce CatNet, an algorithm that effectively controls the false discovery rate (FDR) and selects significant features in LSTM models. CatNet uses the derivative of SHAP values to quantify feature importance and constructs a vector-valued mirror statistic for FDR control with the Gaussian Mirror algorithm. To avoid instability caused by nonlinear or temporal correlations among features, we also propose a new kernel-based independence measure. CatNet performs robustly across model settings on both simulated and real-world data, reducing overfitting and improving model interpretability. Our framework introduces SHAP-based feature importance into FDR control algorithms and improves the Gaussian Mirror algorithm, and it extends naturally to other time-series or sequential deep learning models.
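
Mirror-statistic methods select features with a symmetric thresholding rule. Because the statistic of a null feature is (approximately) symmetric around zero, the count of large negative values estimates the number of false discoveries among large positive ones. The sketch shows that generic rule on a list of scalar statistics. CatNet's SHAP-derivative statistics, its vector-valued mirror construction, and its kernel independence measure are not reproduced, and the numbers are illustrative.

```python
import math


def mirror_threshold(stats, q):
    """Smallest t with #{M_j <= -t} / max(#{M_j >= t}, 1) <= q (estimated FDP)."""
    for t in sorted({abs(m) for m in stats if m != 0}):
        false_est = sum(1 for m in stats if m <= -t)
        selected = sum(1 for m in stats if m >= t)
        if false_est / max(selected, 1) <= q:
            return t
    return math.inf  # no threshold achieves the target level


def select(stats, q):
    """Indices of features whose mirror statistic clears the threshold."""
    t = mirror_threshold(stats, q)
    return [j for j, m in enumerate(stats) if m >= t]


mirror_stats = [5.0, 4.0, 3.0, 2.5, -0.5, 1.0, -1.2, 0.3]
print(mirror_threshold(mirror_stats, 0.2), select(mirror_stats, 0.2))
```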

Updated: 2026-05-07 16:10:56

Categories: stat.ML,cs.AI,cs.LG,q-fin.ST

Download: http://arxiv.org/abs/2411.16666v4

LAMP: Look-Ahead Mixed-Precision Inference of Large Language Models

Mixed-precision computation is a hallmark of the current stage of AI, driving progress in large language models toward efficient, locally deployable solutions. This article addresses the floating-point computation of compositionally rich functions, concentrating on transformer inference. Based on a rounding error analysis of a composition $f(g(\mathrm{x}))$, we provide an adaptive strategy that selects a small subset of the components of $g(\mathrm{x})$ to be computed more accurately, while all other computations are carried out at lower accuracy. We then explain how this strategy can be applied to different compositions within a transformer and illustrate its overall effect on transformer inference. We study the effectiveness of this algorithm numerically on GPT-2 models and demonstrate that even very low recomputation rates improve accuracy by up to two orders of magnitude.
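
Here is a toy version of the idea for one composition, softmax applied to attention logits. Every logit is computed cheaply, and then only a few selected components are recomputed accurately before the next function is applied. Half precision is emulated with Python's `struct` format `"e"`. The selection rule, which recomputes the largest logits, is an assumption made for illustration. LAMP derives its selection from a rounding error analysis of $f(g(\mathrm{x}))$, which is not reproduced here.

```python
import struct


def to_half(x):
    """Round a Python float to IEEE binary16 and back (emulates fp16 storage)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def dot_half(a, b):
    # Low-precision dot product: inputs, products and running sum rounded to fp16.
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_half(acc + to_half(to_half(x) * to_half(y)))
    return acc


def dot_exact(a, b):
    # Reference dot product in double precision.
    return sum(x * y for x, y in zip(a, b))


def lookahead_logits(q, keys, k):
    """Compute all logits in fp16, then recompute the k largest in double precision.

    Recomputing the largest logits is a hypothetical selection rule for this
    sketch, motivated by softmax being dominated by its largest inputs.
    """
    logits = [dot_half(q, key) for key in keys]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    for i in top:
        logits[i] = dot_exact(q, keys[i])
    return logits


q = [0.5, -1.25, 2.0]
keys = [[1.0, 0.3, 0.7], [0.2, 0.9, -1.1], [1.5, 1.5, 0.25]]
print(lookahead_logits(q, keys, 1))
```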

Updated: 2026-05-07 16:10:14

Categories: cs.LG,math.NA

Download: http://arxiv.org/abs/2601.21623v2

3D MRI Image Pretraining via Controllable 2D Slice Navigation Task

Self-supervised pretraining has become the mainstream approach for learning MRI representations from unlabeled scans. However, most existing objectives still treat each scan primarily as a static aggregation of slices, patches, or volumes. We ask whether 3D volumes contain an intrinsic self-supervision signal, different from reconstructing masked patches, that becomes available when they are transformed into controllable 2D rendered sequences. By rendering slices at continuous positions, orientations, and scales, a 3D volume can be converted into a dense video-action sequence whose controls are the action trajectories. We study this formulation with an action-conditioned pretraining objective, in which a tokenizer encodes slice observations and a latent dynamics model predicts the evolution of latent features. Across representative anatomical and spatial downstream tasks, the proposed pretraining is evaluated against standard static-volume baselines, tokenizer-only pretraining, and dynamics variants without aligned actions. The results suggest that controllable MRI slice navigation provides a useful complementary pretraining interface for learning anatomical and spatial representations from large unlabeled MRI collections.
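
Here is a minimal sketch of turning a volume into an action-conditioned slice sequence. Each action moves the slice centre, and each observation is a 2D plane sampled from the volume. Nearest-neighbour sampling, `(z, y, x)` indexing, and translation-only actions are simplifying assumptions. The paper's renderer, its continuous orientation and scale controls, and the tokenizer and latent dynamics model are not shown.

```python
def render_slice(vol, center, u, v, size, scale=1.0):
    """Nearest-neighbour sample of a size x size plane through a 3D volume.

    vol is indexed vol[z][y][x]; center is a (z, y, x) point; u and v are
    in-plane direction vectors; scale sets the spacing between samples.
    """
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    half = (size - 1) / 2
    img = []
    for r in range(size):
        row = []
        for c in range(size):
            a, b = (r - half) * scale, (c - half) * scale
            z, y, x = (round(center[i] + a * u[i] + b * v[i]) for i in range(3))
            inside = 0 <= z < D and 0 <= y < H and 0 <= x < W
            row.append(vol[z][y][x] if inside else 0.0)  # zero outside the volume
        img.append(row)
    return img


def step(center, action):
    # An action translates the slice centre; orientation and scale could be controlled too.
    return [c + d for c, d in zip(center, action)]


vol = [[[100 * z + 10 * y + x for x in range(3)] for y in range(3)] for z in range(3)]
center = [1, 1, 1]
frames = []
for action in ([0, 0, 0], [1, 0, 0]):  # hypothetical action trajectory
    center = step(center, action)
    frames.append(render_slice(vol, center, u=(0, 1, 0), v=(0, 0, 1), size=3))
print(frames)
```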

Updated: 2026-05-07 16:08:22

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2605.06487v1

Autonomous Adversary: Red-Teaming in the age of LLM

Language Model Agents (LMAs) are emerging as a powerful primitive for augmenting red-team operations. They can support attack planning, adversary emulation, and the orchestration of multi-step activity such as lateral movement, a core enabling capability of advanced persistent threat (APT) campaigns. Using frameworks such as MITRE ATT&CK, we analyze where these agents intersect with core offensive functions and assess the current strengths and limitations of LMAs, with an emphasis on governance and realistic evaluation. We benchmark LMAs on two lateral-movement scenarios in a controlled adversary-emulation environment. There, LMAs interact with instrumented cyber agents, observe execution artifacts, and iteratively adapt based on environmental feedback. Each scenario is formalized as an ordered task chain with explicit validation predicates, and an LLM-as-a-Judge paradigm is used to ensure deterministic outcome verification. We compare three operational modalities: fully autonomous execution, self-scaffolded planning, and expert-defined action plans. Preliminary findings indicate that expert-defined action plans yield higher task-completion rates than the other modes. However, failures remain frequent across all modalities. They are largely attributable to brittle command invocation, environmental and deployment instability, and recurring errors in credential management and state handling.
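
The evaluation design, an ordered task chain with explicit validation predicates, can be expressed as a small scoring harness. The sketch assumes that predicates are deterministic checks over a dictionary of observed environment state and that a chain stops at its first failed step. The task and state names are hypothetical, and the paper's LLM-as-a-Judge component is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    name: str
    check: Callable[[Dict[str, bool]], bool]  # validation predicate on observed state


def evaluate_chain(tasks: List[Task], state: Dict[str, bool]) -> Tuple[int, float]:
    """Score an ordered task chain: stop at the first task whose predicate fails."""
    completed = 0
    for task in tasks:
        if not task.check(state):
            break
        completed += 1
    return completed, completed / len(tasks)


chain = [
    Task("discover_host", lambda s: s.get("host_discovered", False)),
    Task("establish_session", lambda s: s.get("session_open", False)),
    Task("use_credentials", lambda s: s.get("credentials_valid", False)),
    Task("reach_target", lambda s: s.get("target_reached", False)),
]
observed = {"host_discovered": True, "session_open": True, "credentials_valid": False}
print(evaluate_chain(chain, observed))
```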

Updated: 2026-05-07 16:07:41

Categories: cs.CR

Download: http://arxiv.org/abs/2605.06486v1

Litespark Inference on Consumer CPUs: Custom SIMD Kernels for Ternary Neural Networks

Large language models (LLMs) have transformed artificial intelligence, but their computational requirements remain prohibitive for most users. Standard inference demands expensive datacenter GPUs or cloud API access, leaving over one billion personal computers underutilized for AI workloads. Ternary models offer a path forward: their weights are constrained to {-1, 0, +1}, theoretically eliminating the need for floating-point multiplication. However, existing frameworks fail to exploit this structure and treat ternary models as dense floating-point networks. We address this gap with custom SIMD kernels that replace matrix multiplication with simple addition and subtraction operations, targeting the integer dot-product instructions available on modern CPUs. Our implementation, Litespark-Inference, is pip-installable and integrates directly with Hugging Face. Compared with standard PyTorch inference on Apple Silicon, it achieves 9.2× faster time-to-first-token, 52× higher throughput, and a 14× memory reduction, with similar speedups on Intel and AMD processors.
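
The core arithmetic trick is that a ternary weight row needs no multiplications. Each output is the sum of the inputs with weight +1 minus the sum of the inputs with weight -1. The sketch shows this in plain Python, together with a 2-bit packing (four weights per byte) that illustrates where the memory savings come from. The encoding is an assumption for illustration, not Litespark's storage format. Real kernels vectorize this work with SIMD integer instructions.

```python
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}  # hypothetical 2-bit code per ternary weight
DECODE = {v: k for k, v in ENCODE.items()}


def ternary_matvec(W, x):
    """y = W @ x for W with entries in {-1, 0, +1}, using only additions/subtractions."""
    out = []
    for row in W:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        out.append(acc)
    return out


def pack(weights):
    """Pack ternary weights four per byte (2 bits each, low bits first)."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= ENCODE[w] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data, n):
    # Recover the first n ternary weights from the packed bytes.
    return [DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


W = [[1, 0, -1, 1], [-1, -1, 0, 0]]
x = [3, 5, 7, 2]
print(ternary_matvec(W, x))
```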

Updated: 2026-05-07 16:07:39

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2605.06485v1
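The core idea in the Litespark abstract is that a ternary weight matrix turns each dot product into sums and differences of activations. The NumPy sketch below is a scalar reference for that idea. It assumes int8 activations accumulated in int32, which is what integer dot-product instructions such as Arm SDOT or x86 AVX-VNNI typically do. It is not the Litespark kernel, which the abstract describes as custom SIMD code.

```python
import numpy as np


def ternary_matvec(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute w @ x for a ternary matrix w (entries in {-1, 0, +1})
    using only additions and subtractions of activations."""
    if not np.isin(w, (-1, 0, 1)).all():
        raise ValueError("weights must be ternary")
    acc = x.astype(np.int32)            # widen int8 activations to int32 accumulators
    plus = w == 1                       # positions where the activation is added
    minus = w == -1                     # positions where the activation is subtracted
    out = np.empty(w.shape[0], dtype=np.int32)
    for i in range(w.shape[0]):
        # Zero weights are skipped entirely; no multiplication is performed.
        out[i] = acc[plus[i]].sum() - acc[minus[i]].sum()
    return out
```

A production kernel would also pack each ternary weight into 2 bits and vectorize the masked sums; both are omitted here for clarity.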

Estimate Level Adjustment For Inference With Proxies Under Random Distribution Shifts

In many scientific domains, including experimentation, researchers rely on measurements of proxy outcomes to obtain faster and more frequent readouts, especially when the primary outcome of interest is difficult to measure directly. While proxies offer more readily accessible observations for inference, the ultimate goal is to draw statistical inferences about the primary outcome parameter, and proxy data are typically imperfect in some way. To correct for these imperfections, current statistical inference methods often rely on strict identifying assumptions (such as surrogacy, covariate/label shift, or missingness assumptions). These assumptions can be difficult to validate and may be violated by additional sources of distribution shift, potentially leading to biased parameter estimates and miscalibrated uncertainty quantification. We introduce an estimate-level framework, inspired by domain adaptation techniques, to empirically calibrate proxy-based inference. This framework models the discrepancy between the proxy and primary metrics as a random effect at the parameter level and estimates its distribution from aggregated historical observations across past domains (e.g., experiments, time periods, or distinct segments). As a result, the method does not require retaining individual-level response data. Additionally, this adjustment can be layered on top of existing proxy-correction methods (such as prediction-powered inference or importance weighting) to account for additional biases not addressed by those corrections. To manage uncertainty when the number of historical domains is limited, we provide both a method-of-moments estimator and a domain bootstrap procedure. We further validate this approach using publicly available datasets and real-world experiments.

Updated: 2026-05-07 16:07:35

Categories: stat.ME,cs.LG,stat.ML

Download: http://arxiv.org/abs/2605.06484v1
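The estimate-level adjustment treats the gap between primary and proxy estimates as a random effect whose distribution is learned from past domains. The sketch below makes this concrete under several assumptions that are ours, not the paper's:

- the random-effects model is normal;
- the between-domain variance comes from a simple unweighted moment estimator, in the spirit of random-effects meta-analysis;
- uncertainty is handled with a plain domain bootstrap.

The function names and the variance formula are illustrative, and the paper's estimator may differ.

```python
import numpy as np


def fit_discrepancy(proxy_est, primary_est, se_diff):
    """Moment estimates for d_k = primary_k - proxy_k ~ N(mu, tau^2 + se_k^2)."""
    d = np.asarray(primary_est, float) - np.asarray(proxy_est, float)
    k = d.size
    mu = d.mean()
    # Observed spread of d = between-domain variance tau^2 + average sampling variance.
    tau2 = max(0.0, d.var(ddof=1) - np.mean(np.asarray(se_diff, float) ** 2))
    var_mu = d.var(ddof=1) / k          # uncertainty in the estimated mean shift
    return mu, tau2, var_mu


def adjust(proxy_new, se_new, mu, tau2, var_mu, z=1.96):
    """Shift a new proxy-based estimate and widen its interval."""
    est = proxy_new + mu
    se = np.sqrt(se_new ** 2 + tau2 + var_mu)
    return est, (est - z * se, est + z * se)


def bootstrap_mu(d, n_boot=2000, seed=0):
    """Domain bootstrap: resample whole domains with replacement."""
    rng = np.random.default_rng(seed)
    d = np.asarray(d, float)
    idx = rng.integers(0, d.size, size=(n_boot, d.size))
    return d[idx].mean(axis=1)
```

Because only one summary per historical domain enters `fit_discrepancy`, no individual-level responses need to be retained.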

ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning

Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practice, however, users often express their requirements in natural language rather than in structured STL formulas, making natural-language-to-STL translation a critical yet challenging task. Manual specification requires temporal-logic expertise and cannot scale, while prompting commercial LLM APIs incurs substantial token costs and may expose sensitive system requirements to third-party services, raising privacy concerns for industrial deployment. To address these challenges, we present \textsc{ReasonSTL}, a tool-augmented framework that adapts local open-source language models for natural-language-to-STL generation. \textsc{ReasonSTL} decomposes the translation process into explicit reasoning, deterministic tool calls, and structured formula construction. We further introduce process-rewarded training to supervise both tool-use trajectories and final formulas, together with \textsc{STL-Bench}, a bilingual, computation-aware benchmark grounded in real-world signals. Experiments show that a 4B model trained with \textsc{ReasonSTL} achieves state-of-the-art performance in both automatic metrics and human evaluations, demonstrating that \textsc{ReasonSTL} provides a transparent, low-cost, and privacy-preserving alternative for formal specification drafting.

Updated: 2026-05-07 16:07:30

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06483v1
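For readers new to STL, the target formulas have a quantitative (robustness) semantics: a positive robustness value means the signal satisfies the formula. The sketch below evaluates the bounded operators G (always) and F (eventually) over a discrete-time signal. It illustrates the kind of formula ReasonSTL produces, not ReasonSTL itself, and the example requirement is hypothetical.

```python
from typing import Sequence


def rob_predicate(signal: Sequence[float], t: int, c: float, above: bool) -> float:
    """Robustness of x(t) > c (above=True) or x(t) < c (above=False)."""
    return signal[t] - c if above else c - signal[t]


def _window(signal: Sequence[float], t: int, a: int, b: int) -> range:
    if t + b >= len(signal):
        raise ValueError("signal too short for the time bound")
    return range(t + a, t + b + 1)


def rob_always(signal, a, b, c, above, t=0):
    """Robustness of G_[a,b](predicate): worst case over the window."""
    return min(rob_predicate(signal, k, c, above) for k in _window(signal, t, a, b))


def rob_eventually(signal, a, b, c, above, t=0):
    """Robustness of F_[a,b](predicate): best case over the window."""
    return max(rob_predicate(signal, k, c, above) for k in _window(signal, t, a, b))


# "The temperature stays below 30 during the first five steps" -> G_[0,5](temp < 30)
temp = [20.0, 22.0, 25.0, 28.0, 29.0, 27.0]
print(rob_always(temp, 0, 5, 30.0, above=False))  # 1.0: satisfied with a margin of 1
```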

Pulling Back the Curtain on Deep Networks

In linear models, visualizing a weight vector naturally reveals the model's preferred input direction, but extending this intuition to deep networks via gradients or gradient ascent often yields brittle or adversarial-looking features. We argue that deep networks are better understood as input-conditioned affine operators, whose natural adjoint action pulls a neuron's preferred direction back to input space. We further refine this representation by backward-only softening and iterative enhancement to reconstruct coherent local structures encoded by the target neuron. This provides a unifying perspective on previously disparate ideas such as SmoothGrad, B-cos-style alignment, and Feature Accentuation. The resulting Semantic Pullbacks (SP) generate perceptually aligned, class-conditional post-hoc explanations that emphasize semantically meaningful features, facilitate coherent counterfactual perturbations, and remain theoretically grounded. Across convolutional architectures (ResNet50, VGG) and transformer-based models (PVT), Semantic Pullbacks achieve the best overall trade-off across faithfulness, stability, and target-sensitivity benchmarks, while remaining general, computationally efficient, and readily integrable into existing deep learning pipelines.

Updated: 2026-05-07 16:04:36

Categories: cs.LG,cs.CV,cs.NE

Download: http://arxiv.org/abs/2507.22832v6
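The "input-conditioned affine operator" view is easiest to see for a ReLU network. At a fixed input x, the network equals A(x) x + c(x), and pulling a neuron's direction back means applying the adjoint A(x)^T. The NumPy sketch below shows this for a two-layer network.

Two caveats apply. The optional `temperature` argument is a hypothetical stand-in for backward-only softening (a sigmoid gate used only when pulling back), not the paper's exact procedure. The iterative enhancement step is omitted.

```python
import numpy as np


def local_affine(W1, b1, W2, b2, x):
    """Return (A, c) such that the ReLU net equals A @ x + c near x."""
    pre = W1 @ x + b1
    gate = (pre > 0).astype(float)       # active ReLU pattern at this input
    A = W2 @ (gate[:, None] * W1)        # input-conditioned linear part
    c = W2 @ (gate * b1) + b2            # input-conditioned offset
    return A, c


def pullback(W1, b1, W2, x, j, temperature=None):
    """Pull output neuron j's direction back to input space via A(x)^T e_j.

    With temperature=None this is the exact adjoint (equal to the gradient for
    this piecewise-linear net). A positive temperature replaces the hard gate by
    a sigmoid in the backward pass only (hypothetical softening).
    """
    pre = W1 @ x + b1
    if temperature is None:
        gate = (pre > 0).astype(float)
    else:
        gate = 1.0 / (1.0 + np.exp(-pre / temperature))
    return W1.T @ (gate * W2[j])
```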

Patch-Effect Graph Kernels for LLM Interpretability

Mechanistic interpretability aims to reverse-engineer transformer computations by identifying causal circuits through activation patching. However, scaling these interventions across diverse prompts and task families produces high-dimensional, unstructured datasets that are difficult to compare systematically. We propose a framework that reframes mechanistic analysis as a graph machine-learning problem by representing activation-patching profiles as patch-effect graphs over model components. We introduce three graph-construction methods (direct influence via causal mediation, partial correlation, and co-influence) and apply graph kernels to analyze the resulting structures. Evaluating this approach on GPT-2 Small using Indirect Object Identification (IOI) and related tasks, we find that patch-effect graphs preserve discriminative structural signals. Specifically, localized edge-slot features yield higher classification accuracy than global graph-shape descriptors. A screened paired-patching validation suggests that candidate edges selected by co-influence (CI) and partial correlation (PC) correspond to stronger activation-influence effects than random or low-rank candidates. Crucially, by evaluating these representations against rigorous prompt-only and raw patch-effect controls, we make the evidential scope of the benchmark explicit: graph features compress structured patching signal, while raw tensors and surface cues define strong baselines that any circuit-level claim should address. Ultimately, our framework provides a compression and evaluation pipeline for comparing patching-derived structures under controlled baselines, separating robust slice-discriminative evidence from stronger task-general causal-circuit claims.

Updated: 2026-05-07 16:03:47

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2605.06480v1
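Of the three graph constructions, the partial-correlation one is the most self-contained to illustrate. Assume a matrix of patch effects with one row per prompt and one column per model component (e.g. attention heads). A partial-correlation graph can then be read off the inverse covariance. The threshold and the function below are hypothetical; the paper's estimator, screening, and kernels are not reproduced here.

```python
import numpy as np


def partial_correlation_graph(effects: np.ndarray, threshold: float = 0.3):
    """Build a patch-effect graph from an (n_prompts, n_components) matrix.

    Edge weight = partial correlation between two components' patch effects,
    controlling for all other components.
    """
    cov = np.cov(effects, rowvar=False)
    prec = np.linalg.inv(cov)                 # precision (inverse covariance) matrix
    scale = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    n = pcor.shape[0]
    edges = {
        (i, j): float(pcor[i, j])
        for i in range(n)
        for j in range(i + 1, n)
        if abs(pcor[i, j]) >= threshold
    }
    return pcor, edges
```

One plausible reading of "edge-slot features" is the vector of weights at fixed (i, j) positions of such a graph. That representation makes graphs from different prompt families directly comparable.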

Risk-Controlled Post-Processing of Decision Policies

Predictive models are often deployed through existing decision policies that stakeholders are reluctant to change unless a risk constraint requires intervention. We study risk-controlled post-processing: given a deterministic baseline policy, choose a new policy that maximizes agreement with the baseline subject to a chance constraint on a user-specified loss. At the population level, we show that the optimal policy has a threshold structure: it follows the baseline except on contexts where switching to the oracle fallback policy yields a large reduction in conditional violation risk. At the finite-sample level, given a fitted fallback policy and score, we develop a post-processing algorithm that uses calibration data to select a threshold. Leveraging tools from algorithmic stability and stochastic processes, we show that under regularity conditions, in the i.i.d. setting, the expected excess risk of the post-processed policy is $O(\log n/n)$. In the special case when an exact-safe fallback policy is available, the algorithm achieves precise expected risk control under exchangeability. In this setting, we also give high-probability near-optimality guarantees on the post-processed policy. Experiments on a COVID-19 radiograph diagnosis task, an LLM routing problem, and a synthetic multiclass decision task show that targeted post-processing can meet or nearly meet risk budgets while preserving substantially more agreement with the baseline than score-blind random mixing.

Updated: 2026-05-07 16:03:24

Categories: stat.ML,cs.LG,math.ST

Download: http://arxiv.org/abs/2605.06479v1
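The finite-sample algorithm picks a switching threshold on calibration data. The sketch below shows one way to do this under assumptions of our own:

- losses are 0/1 violations;
- a context switches to the fallback when its score is at least t;
- feasibility uses a conformal-risk-control style check, (n * risk + 1) / (n + 1) <= alpha.

Scanning from "never switch" downward returns the largest feasible threshold, which means the fewest departures from the baseline. This is an illustration, not the authors' exact procedure or guarantee.

```python
import numpy as np


def select_threshold(scores, loss_base, loss_fallback, alpha):
    """Largest threshold t whose adjusted calibration risk is <= alpha.

    scores: estimated benefit of switching for each calibration context.
    loss_base / loss_fallback: observed 0/1 violations under each policy.
    Returns None if no threshold passes the check.
    """
    s = np.asarray(scores, float)
    lb = np.asarray(loss_base, float)
    lf = np.asarray(loss_fallback, float)
    n = s.size
    for t in np.concatenate(([np.inf], np.sort(s)[::-1])):
        switch = s >= t
        risk = np.where(switch, lf, lb).mean()
        if (n * risk + 1.0) / (n + 1.0) <= alpha:
            return float(t)
    return None


def post_processed_action(score, baseline_action, fallback_action, t):
    """Follow the baseline unless the score clears the threshold."""
    return fallback_action if score >= t else baseline_action
```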

Probabilistic NDVI Forecasting from Sparse Satellite Time Series and Weather Covariates

Short-term forecasting of vegetation dynamics is a key enabler for data-driven decision support in precision agriculture. Normalized Difference Vegetation Index (NDVI) forecasting from satellite observations, however, remains challenging due to sparse and irregular sampling caused by cloud masking, as well as the heterogeneous climatic conditions under which crops evolve. In this work, we propose a probabilistic forecasting framework for field-level NDVI prediction under sparse, irregular clear-sky acquisitions. The architecture separates the encoding of historical NDVI and meteorological observations from future exogenous covariates, fusing both representations for multi-step quantile prediction. To address irregular revisit patterns and horizon-dependent uncertainty, we introduce a temporal-distance weighted quantile loss that aligns the training objective with the effective forecasting horizon. In addition, we incorporate cumulative and extreme-weather feature engineering to capture delayed meteorological effects relevant to vegetation response. Experiments on European satellite data show that the proposed approach outperforms statistical, deep learning, and time-series baselines on both pointwise and probabilistic evaluation metrics. Ablation studies confirm that target history is the primary driver of performance, with meteorological covariates providing additional gains in the full multimodal setting. The code is available at https://github.com/arco-group/ndvi-forecasting.

Updated: 2026-05-07 16:00:59

Categories: cs.LG,cs.CV,stat.ML

Download: http://arxiv.org/abs/2602.17683v2
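The temporal-distance weighted quantile loss combines the standard pinball loss with two extra factors: a per-target weight that depends on how far the (irregular) target observation lies from the forecast origin, and a mask for cloud-masked targets. The exact weighting is not given in the abstract. The exponential decay with scale `tau_days` below is a hypothetical choice used only to make the sketch concrete.

```python
import numpy as np


def weighted_quantile_loss(y, y_hat, quantiles, delta_days, mask, tau_days=30.0):
    """Pinball loss weighted by temporal distance.

    y:          (B, H) observed NDVI at each forecast step
    y_hat:      (B, H, Q) predicted quantiles
    delta_days: (B, H) days between forecast origin and each target
    mask:       (B, H) 1 for clear-sky observations, 0 for missing targets
    """
    q = np.asarray(quantiles, float)
    err = y[..., None] - y_hat                         # (B, H, Q)
    pinball = np.maximum(q * err, (q - 1.0) * err)     # standard quantile loss
    weight = np.exp(-delta_days / tau_days) * mask     # hypothetical horizon weight
    return float((pinball.mean(axis=-1) * weight).sum() / max(weight.sum(), 1e-12))
```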

Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features

We introduce a probabilistic approach for dating historical manuscript pages from visual features alone. Instead of aggregating centuries into classes, as is standard in prior work, we pose dating as an evidential deep regression problem over a continuous year axis, allowing our neural network to output a full predictive distribution with decomposed aleatoric and epistemic uncertainty in a single forward pass. Our architecture combines an EfficientNet-B2 backbone with a Normal-Inverse-Gamma (NIG) output head trained with a joint negative-log-likelihood and evidence-regularization objective. On the DIVA-HisDB benchmark (150 pages, 3 medieval codices, 151,936 patches), our model achieves a test MAE of 5.4 years, well below the 50-year granularity of the century-label supervision, with 93\% of patches within 5 years and 97\% within 10 years. Our approach achieves \textbf{PICP=92.6\%}, the best calibration among all compared methods, in a single forward pass, outperforming MC Dropout (PICP=88.2\%, 50 passes) and Deep Ensembles (PICP=79.7\%, 5 models) at $5\times$ lower inference cost. Uncertainty decomposition shows that aleatoric uncertainty is a strong predictor of dating error (Spearman $ρ=0.729$), and selective prediction on the 20\% most certain patches yields an MAE of \textbf{0.5 years}. We show that predicted uncertainty increases as image degradation worsens, that spatial decomposition maps reveal which script regions cause aleatoric uncertainty, and that page-level aggregation reduces MAE to 4.5 years with $ρ=0.905$ between uncertainty and page-level error.

Updated: 2026-05-07 16:00:19

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06475v1
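The Normal-Inverse-Gamma head outputs four numbers (gamma, nu, alpha, beta) per patch. The sketch below writes out four pieces:

- the NIG negative log-likelihood and evidence regularizer, as formulated in the deep evidential regression literature (Amini et al., 2020);
- the standard aleatoric/epistemic decomposition;
- the PICP metric reported in the abstract.

Whether the paper uses exactly these forms is an assumption, and so is the regularization weight `lam`.

```python
import math
from typing import Sequence


def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of y under the NIG evidential model
    (equivalently, a Student-t with 2*alpha degrees of freedom)."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))


def nig_loss(y, gamma, nu, alpha, beta, lam=0.01):
    """Joint objective: NLL plus evidence regularizer |y - gamma| * (2*nu + alpha)."""
    return nig_nll(y, gamma, nu, alpha, beta) + lam * abs(y - gamma) * (2.0 * nu + alpha)


def decompose(nu, alpha, beta):
    """Aleatoric E[sigma^2] and epistemic Var[mu] (both in years^2); needs alpha > 1."""
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return aleatoric, epistemic


def picp(y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Prediction interval coverage probability: share of targets inside their interval."""
    hits = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    return sum(hits) / len(hits)
```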

Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching

We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under the target policy. The weights are learned inductively in a top-down manner via a moment matching objective against a value-function discriminator class. Notably, and perhaps surprisingly, a data-dependent finite-sample guarantee for general function approximation can be established under only the realizability of $Q^π$, with a dimension-free bound -- that is, the error does not depend on the statistical complexity of the function class. We also establish connections to several existing methods, such as importance sampling and linear FQE. Further theoretical analyses shed new light on the nature of coverage, a concept of fundamental importance to offline RL.

Updated: 2026-05-07 16:00:04

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2605.06474v1
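To make the moment-matching idea concrete, consider the simplest case: a single step (horizon 1) with a linear discriminator class f(s, a) = theta^T phi(s, a). Requiring the reweighted data to match the target policy's moments for every such f gives a linear system in the weights. If the true Q-function is linear in phi (realizability), the reweighted rewards then recover the target value exactly in expectation. The minimum-norm least-squares solve below is our illustrative choice, not the paper's recursive top-down algorithm.

```python
import numpy as np


def moment_matching_weights(phi_data: np.ndarray, phi_target: np.ndarray) -> np.ndarray:
    """Weights w with sum_i w_i phi(s_i, a_i) = sum_i E_{a~pi}[phi(s_i, a)].

    phi_data:   (n, d) features of the logged state-action pairs
    phi_target: (n, d) features averaged over the target policy at each s_i
    Among all solutions, lstsq returns the one with minimum ||w||.
    """
    w, *_ = np.linalg.lstsq(phi_data.T, phi_target.sum(axis=0), rcond=None)
    return w


def reweighted_value(w: np.ndarray, rewards: np.ndarray) -> float:
    """Off-policy estimate: average of reweighted logged rewards."""
    return float(w @ rewards / rewards.size)
```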

Fusion Complexity Inversion: Why Simpler Cross View Modules Outperform SSMs and Cross View Attention Transformers for Pasture Biomass Regression

Accurate estimation of pasture biomass from agricultural imagery is critical for sustainable livestock management, yet existing methods are limited by the small, imbalanced, and sparsely annotated datasets typical of real-world monitoring. In this study, the adaptation of vision foundation models to agricultural regression is systematically evaluated on the CSIRO Pasture Biomass benchmark, a 357-image dual-view dataset with laboratory-validated, component-wise ground truth for five biomass targets, through 17 configurations spanning four backbones (EfficientNet-B3 to DINOv3-ViT-L), five cross-view fusion mechanisms, and a 4x2 metadata factorial. A counterintuitive principle, termed "fusion complexity inversion", is uncovered: on scarce agricultural data, a two-layer gated depthwise convolution (R^2 = 0.903) outperforms cross-view attention transformers (0.833), bidirectional SSMs (0.819), and full Mamba (0.793, below the no-fusion baseline). Backbone pretraining scale is found to monotonically dominate all architectural choices, with the DINOv2 -> DINOv3 upgrade alone yielding +5.0 R^2 points. Training-only metadata (species, state, and NDVI) is shown to create a universal ceiling at R^2 ~ 0.829, collapsing an 8.4-point fusion spread to 0.1 points. Actionable guidelines for sparse agricultural benchmarks are established: prioritize backbone quality over fusion complexity, prefer local modules over global alternatives, and exclude features unavailable at inference.

Updated: 2026-05-07 15:59:48

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2603.07819v5

Screening Is Enough

A core limitation of standard softmax attention is that it does not provide an independently interpretable measure of query--key relevance: attention scores are unbounded, while attention weights are defined only relative to competing keys. Consequently, irrelevant keys cannot be explicitly rejected, and some attention mass is assigned even when no key is genuinely relevant. We introduce Multiscreen, a language-model architecture built around a mechanism we call screening, which enables absolute query--key relevance. Instead of redistributing attention across all keys, screening computes bounded query--key similarities and applies an explicit threshold, discarding irrelevant keys and aggregating the remaining keys without global competition. Across experiments, Multiscreen achieves comparable validation loss with roughly 30\% fewer parameters than a Transformer baseline and remains stable at substantially larger learning rates. It maintains stable long-context perplexity beyond the training context and shows little degradation in retrieval performance as context length increases. Finally, Multiscreen achieves lower full-context forward-pass latency at long context lengths.

Updated: 2026-05-07 15:58:45

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2604.01178v3
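The screening idea can be contrasted with softmax in a few lines. The sketch below makes two hypothetical choices, neither of which is Multiscreen's exact parameterization: cosine similarity as the bounded score, and a linear ramp above a fixed threshold as the per-key relevance. Each key's weight is computed independently of the others. As a result, an irrelevant key never receives mass, and adding one does not change the output.

```python
import numpy as np


def screen(q: np.ndarray, K: np.ndarray, V: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Aggregate values of keys whose bounded similarity to q exceeds a threshold.

    Unlike softmax, weights are not normalized across keys: each key's weight
    depends only on its own similarity, and it is exactly zero below threshold.
    """
    qn = q / np.linalg.norm(q)
    Kn = K / np.linalg.norm(K, axis=1, keepdims=True)
    sim = Kn @ qn                                       # cosine similarity in [-1, 1]
    weight = np.clip((sim - threshold) / (1.0 - threshold), 0.0, None)
    return weight @ V                                   # all zeros if no key is relevant
```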

Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management

LLM-based workflows compose specialized agents to execute complex tasks, and these agents usually share substantial context, allowing KV-Cache reuse to save computation. Existing approaches either manage KV-Cache at agent level and fail to exploit the reuse opportunities within workflows, or manage cache at the workflow level but assume that each workflow calls a static sequence of agents. However, practical workflows are typically dynamic, where the sequence of invoked agents and thus induced cache reuse opportunities depend on the context of each task. To serve such dynamic workflows efficiently, we build a system dubbed PBKV (\textbf{P}rediction-\textbf{B}ased \textbf{KV}-Cache Management). For each workflow, PBKV predicts the agent invocations in several future steps by fusing the guidance from historical workflows and context of the target workflow. Based on the predictions, PBKV estimates the reuse potential of cache entries and keeps the high-potential entries in GPU memory. To be robust to prediction errors, PBKV utilizes the predictions conservatively during both cache eviction and prefetching. Experiments on three workflow benchmarks show that PBKV achieves up to $1.85\times$ speedup over LRU on dynamic workflows, and up to $1.26\times$ speedup over the SOTA baseline KVFlow on the static workflow.

Updated: 2026-05-07 15:57:51

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06472v1
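PBKV's eviction decision can be sketched as follows, under assumptions of our own:

- each cache entry is keyed by the agent whose context it holds;
- the predictor returns a probability distribution over the agent invoked at each future step;
- reuse potential is a discounted sum of those probabilities;
- "conservative" use means predictions below `min_confidence` are ignored, so eviction falls back to LRU.

The class and its parameters are hypothetical, not PBKV's API, and prefetching is omitted.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class PredictiveKVCache:
    """Toy cache that evicts the entry with the lowest predicted reuse potential."""

    def __init__(self, capacity: int, discount: float = 0.8, min_confidence: float = 0.2):
        self.capacity = capacity
        self.discount = discount
        self.min_confidence = min_confidence
        self.entries: "OrderedDict[str, object]" = OrderedDict()  # oldest access first

    def reuse_potential(self, key: str, predictions: List[Dict[str, float]]) -> float:
        p = sum(self.discount ** k * step.get(key, 0.0) for k, step in enumerate(predictions))
        return p if p >= self.min_confidence else 0.0   # ignore low-confidence predictions

    def get(self, key: str) -> Optional[object]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)                    # refresh recency
        return self.entries[key]

    def put(self, key: str, value: object, predictions: List[Dict[str, float]]) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            order = list(self.entries)
            # Lowest potential first; ties broken by least-recent use (lower index).
            victim = min(range(len(order)),
                         key=lambda i: (self.reuse_potential(order[i], predictions), i))
            del self.entries[order[victim]]
        self.entries[key] = value
```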

Neural Stochastic Differential Equations on Compact State Spaces: Theory, Methods, and Application to Suicide Risk Modeling

Ecological Momentary Assessment (EMA) studies enable the collection of high-frequency self-reports of suicidal thoughts and behaviors (STBs) via smartphones. Latent stochastic differential equations (SDEs) are a promising model class for EMA data, which are irregularly sampled, noisy, and partially observed. However, SDE-based models suffer from two key limitations: (a) they often violate domain constraints, undermining the scientific validity of the model and clinical trust in it; and (b) training is numerically unstable without ad hoc fixes (e.g., oversimplified dynamics) that are ill-suited to high-stakes applications. Here, we develop a novel class of expressive SDEs whose solutions are provably confined to a prescribed compact polyhedral state space, matching the domains of EMA data. Specifically, (1) we show, theoretically and empirically, why chain-rule-based constructions of SDEs on compact domains fail; (2) we derive constraints on the drift and diffusion of general and stationary SDEs that keep their solutions in the desired state space; and (3) we introduce a parameterization that maps arbitrary (neural or expert-given) dynamics into constraint-satisfying SDEs. On several real EMA datasets, including a large suicide-risk study, our parameterization improves forecasts and optimization dynamics over standard latent neural SDE baselines. These contributions pave the way for principled, trustworthy continuous-time models of suicide risk and other clinical time series, and extend SDE-based methods (e.g., diffusion models) to domains with hard state constraints.

Updated: 2026-05-07 15:56:50

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2508.17090v3
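For the simplest compact polyhedron, the unit box [0, 1]^d, one standard way to keep solutions in the closed box is to make the drift point inward on each face and let the diffusion vanish there. The sketch below wraps an arbitrary drift field g and diffusion scale h (e.g. neural networks) in such a form.

This Jacobi-type construction is a hypothetical illustration of the idea, not the paper's parameterization. A discretized simulation would also need a time-stepping scheme that respects the boundary.

```python
import numpy as np
from typing import Callable

Field = Callable[[np.ndarray], np.ndarray]


def softplus(z):
    return np.log1p(np.exp(z))


def box_drift(x: np.ndarray, g: Field, raw_in: np.ndarray, raw_out: np.ndarray) -> np.ndarray:
    """Drift on [0,1]^d: strictly positive at x_i = 0, strictly negative at x_i = 1.

    The unconstrained field g only acts in the interior, scaled by x(1 - x).
    """
    a, c = softplus(raw_in), softplus(raw_out)     # positive inward rates
    return a * (1.0 - x) - c * x + x * (1.0 - x) * g(x)


def box_diffusion(x: np.ndarray, h: Field) -> np.ndarray:
    """Diagonal diffusion that vanishes on every face of the box."""
    return np.abs(h(x)) * np.sqrt(np.clip(x * (1.0 - x), 0.0, None))
```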

Hitting Time Isomorphism for Multi-Stage Planning with Foundation Policies

We present a new operator-theoretic representation learning framework for offline reinforcement learning that recovers the directed temporal geometry of a controlled Markov process from hitting-time observations. While prior methods often produce symmetric distances or fail to satisfy the triangle inequality, our framework learns a Hilbert-space displacement geometry in which expected hitting times are realized as linear functionals of latent displacements. We prove that this representation exists under latent linear closure and is uniquely identifiable up to a bounded linear isomorphism. For finite-dimensional implementations, we show that the global hitting-time error is bounded by the one-step transition error amplified by the environment's transient spectral radius. Furthermore, we provide finite-sample guarantees that account for approximation error, statistical complexity, and trajectory-label mismatch. Building on this theory, we propose Isomorphic Embedding Learning (IEL), a new goal-agnostic foundation-policy learning algorithm that anchors a HILP-style consistency objective with explicit hitting-time regression, ensuring that the learned geometry reflects actual decision-time progress. This asymmetric and compositional structure enables robust graph-based multi-stage planning for long-horizon navigation. Our experiments demonstrate that IEL improves on the state of the art in learning foundation policies from offline maze locomotion data. Our code is available at https://github.com/MagnusBoock/IEL

Updated: 2026-05-07 15:56:43

Categories: cs.LG

Download: http://arxiv.org/abs/2605.06470v1
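The quantity that IEL's geometry is meant to reproduce is the expected hitting time, which is directed: getting from s to g can take longer than getting from g to s. For a small tabular Markov chain it can be computed exactly by solving h(g) = 0 and h(s) = 1 + sum_{s'} P(s, s') h(s') for every s != g. The NumPy sketch below does this. It provides ground-truth targets of the kind a hitting-time regression would fit and is not part of IEL itself.

```python
import numpy as np


def expected_hitting_times(P: np.ndarray, target: int) -> np.ndarray:
    """Expected number of steps to first reach `target` from every state."""
    n = P.shape[0]
    others = [s for s in range(n) if s != target]
    A = np.eye(len(others)) - P[np.ix_(others, others)]
    h = np.zeros(n)
    h[others] = np.linalg.solve(A, np.ones(len(others)))
    return h


# A 3-state chain whose transitions mostly run 0 -> 1 -> 2 -> 0.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
H = np.column_stack([expected_hitting_times(P, g) for g in range(3)])  # H[s, g]
print(round(H[0, 2], 6), round(H[2, 0], 6))  # 4.0 2.0: the geometry is asymmetric
```

Expected hitting times still satisfy the triangle inequality, so they form a quasimetric. This is the directed structure that symmetric distance embeddings cannot represent.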

Dynamic Controlled Variables Based Dynamic Self-Optimizing Control

Self-optimizing control is a strategy for selecting controlled variables, where the economic objective guides the selection and design of controlled variables, with the expectation that maintaining the controlled variables at constant values can achieve optimization effects, translating the process optimization problem into a process control problem. Currently, self-optimizing control is widely applied to steady-state optimization problems. However, the development of process systems exhibits a trend towards refinement, highlighting the importance of optimizing dynamic processes such as batch processes and grade transitions. This paper formally introduces the self-optimizing control problem for dynamic optimization, termed the dynamic self-optimizing control problem, extending the original definition of self-optimizing control. A novel concept, "dynamic controlled variables" (DCVs), is proposed, and an implicit control policy is presented based on this concept. The paper theoretically analyzes the advantages and generality of DCVs compared to explicit control strategies and elucidates the relationship between DCVs and traditional controllers. Moreover, this paper puts forth a data-driven approach to designing self-optimizing DCVs, which considers DCV design as a mapping identification problem and employs deep neural networks to parameterize the variables. Three case studies validate the efficacy and superiority of DCVs in approximating multi-valued and discontinuous functions, as well as their application to dynamic optimization problems with non-fixed horizons, which traditional self-optimizing control methods are unable to address.

Updated: 2026-05-07 15:56:29

Categories: math.OC,cs.LG,eess.SY

Download: http://arxiv.org/abs/2605.06469v1

No Triangulation Without Representation: Generalization in Topological Deep Learning

Despite an ever-increasing interest in topological deep learning models that target higher-order datasets, there is no consensus on how to evaluate such models. This is exacerbated by the fact that topological objects permit operations, such as structural refinements, that are not appropriate for graph data. In this work, we extend MANTRA, a benchmark dataset containing manifold triangulations, to a larger class of manifolds with more diverse homeomorphism types. We show that, contrary to prior claims, both graph neural networks (GNNs) and higher-order message passing (HOMP) methods can saturate the benchmark. However, we find that this is contingent on the right representation and feature assignment, emphasizing their importance in baseline models. We thus provide a novel evaluation protocol based on representational diversity and triangulation refinement. Surprisingly, we find no indication that existing models are capable of generalizing beyond the combinatorial structure of the data. This points towards a research gap in developing models that understand topological structure independent of scale. Our work thus provides the necessary scaffolding to evaluate future models and enable the development of topology-aware inductive biases.

Updated: 2026-05-07 15:55:24

Fields: cs.LG,math.AT

Download: http://arxiv.org/abs/2605.06467v1

Diversity Curves for Graph Representation Learning

Graph-level representations are crucial tools for characterising structural differences between graphs. However, comparing graphs with different cardinalities, even when sampled from the same underlying distribution, remains challenging. Unsupervised tasks in particular require interpretable, scalable, and reliable size-aware graph representations. Our work addresses these issues by tracking the structural diversity of a graph across coarsening levels. The resulting graph embeddings, which we denote diversity curves, are interpretable by construction, efficient, and directly comparable across coarsening hierarchies. Specifically, we track the spread of graphs, a novel isometry invariant that is inherently well-suited for encoding the metric diversity and geometry of graphs. We utilise edge contraction coarsening and prove that this improves expressivity, thus leading to more powerful graph-level representations than structural descriptors alone. In practice, we demonstrate their utility over a range of baseline methods, using diversity curves to (i) cluster and visualise simulated graphs across varying sizes, (ii) distinguish the geometry of single-cell graphs, (iii) compare the structure of molecular graph datasets, and (iv) characterise geometric shapes.
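
A rough sketch of a diversity curve, assuming the spread is Willerton's $E_0(X) = \sum_x 1 / \sum_y e^{-t\,d(x,y)}$ evaluated on the shortest-path metric, and assuming a hypothetical greedy policy that always contracts an edge at the smallest vertex label. The paper's actual coarsening schedule and scale $t$ are not given in the abstract. Contracting down to a single vertex ends every curve at a spread of 1, which keeps curves comparable across graph sizes.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def spread(adj, t=1.0):
    """E_0(tX) = sum_x 1 / sum_y exp(-t d(x, y)); assumes a connected graph."""
    total = 0.0
    for x in adj:
        d = bfs_distances(adj, x)
        total += 1.0 / sum(math.exp(-t * d[y]) for y in adj)
    return total

def contract_edge(adj, u, v):
    """Merge vertex v into u and drop the resulting self-loop."""
    merged = {}
    for x, nbrs in adj.items():
        if x == v:
            continue
        merged[x] = {u if y == v else y for y in nbrs} - {x}
    merged[u] |= {y for y in adj[v] if y not in (u, v)}
    return merged

def diversity_curve(adj, t=1.0):
    """Spread at each level of a hypothetical greedy edge-contraction sequence."""
    adj = {x: set(nbrs) for x, nbrs in adj.items()}
    curve = [spread(adj, t)]
    while len(adj) > 1:
        u = min(adj)
        v = min(adj[u])
        adj = contract_edge(adj, u, v)
        curve.append(spread(adj, t))
    return curve

path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print([round(s, 3) for s in diversity_curve(path4)])
```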

Updated: 2026-05-07 15:55:20

Fields: cs.LG

Download: http://arxiv.org/abs/2605.06466v1

Federated Spatiotemporal Graph Learning for Passive Attack Detection in Smart Grids

Smart grids are exposed to passive eavesdropping, where attackers listen silently to communication links. Although no data is actively altered, such reconnaissance can reveal grid topology, consumption patterns, and operational behavior, creating a gateway to more severe targeted attacks. Detecting this threat is difficult because the signals it produces are faint, short-lived, and often disappear when traffic is examined by a single node or along a single timeline. This paper introduces a graph-centric, multimodal detector that fuses physical-layer and behavioral indicators over ego-centric star subgraphs and short temporal windows to detect passive attacks. To capture stealthy perturbations, a two-stage encoder is introduced: graph convolution aggregates spatial context across ego-centric star subgraphs, while a bidirectional GRU models short-term temporal dependencies. The encoder transforms heterogeneous features into a unified spatio-temporal representation suitable for classification. Training uses a federated learning setup with FedProx, improving robustness to heterogeneous local data and contributing to the trustworthiness of decentralized training; raw measurements remain on client devices. A synthetic, standards-informed dataset is generated to emulate heterogeneous HAN/NAN/WAN communications with wireless-only passive perturbations, event co-occurrence, and leak-safe splits. The model achieves a testing accuracy of 98.32% per timestep ($F1_{attack}=0.972$) and 93.35% per sequence at 0.15% FPR, using a simple decision rule with run length $m=2$ and threshold $τ=0.55$. The results demonstrate that combining spatial and temporal context enables reliable detection of stealthy reconnaissance while maintaining low false-positive rates, making the approach suitable for non-IID federated smart-grid deployments.
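
The per-sequence figure comes from a run-length rule applied to per-timestep scores. A minimal sketch, assuming the rule flags a sequence once $m$ consecutive timesteps reach a score of at least $τ$; the exact comparison (≥ versus >) and the scores below are assumptions for illustration, not taken from the paper.

```python
def first_alert(probs, tau=0.55, m=2):
    """Index of the timestep completing the first run of m scores >= tau, else None."""
    run = 0
    for i, p in enumerate(probs):
        run = run + 1 if p >= tau else 0  # reset the run on any low score
        if run >= m:
            return i
    return None

def flag_sequence(probs, tau=0.55, m=2):
    """Sequence-level decision: True if any run of m consecutive high scores occurs."""
    return first_alert(probs, tau, m) is not None

# Hypothetical per-timestep attack probabilities from the encoder.
print(flag_sequence([0.2, 0.6, 0.3, 0.7, 0.9]))  # True (run at indices 3-4)
print(flag_sequence([0.6, 0.1, 0.6, 0.1]))       # False (isolated spikes)
```

Requiring a run rather than a single spike is what trades a little sensitivity for the low false-positive rate reported above.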

Updated: 2026-05-07 15:53:15

Fields: cs.CR,cs.AI,cs.DC

Download: http://arxiv.org/abs/2510.02371v2

Invariant-Based Diagnostics for Graph Benchmarks

Progress on graph foundation models is hindered by benchmark practices that conflate the contributions of node features and graph structure, making it hard to tell whether a model actually learns from connectivity, or whether it even needs to. We propose addressing this using graph invariants, i.e., permutation-invariant, task-agnostic structural descriptors that serve as a diagnostic framework for graph benchmarks. We show that (i) invariants are more expressive than standard GNNs, (ii) invariants characterize structural heterogeneity within and across benchmark datasets, (iii) invariants predict multi-task performance, and (iv) simple invariant-based models are competitive with, and sometimes exceed, transformer and message-passing baselines across 26 datasets. Our results suggest that expressivity is not the main driver of predictive performance, and that on tasks where structure matters, a non-trainable structural proxy often matches trained message-passing models. We thus posit that invariant baselines should become a standard for evaluating whether structure is required for a task and whether a model picks up on it, serving as a stepping stone towards graph foundation models.
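
The abstract does not list which invariants are used, so the sketch below picks a handful of standard ones (vertex and edge counts, degree statistics, triangle count, density) purely for illustration. Each is computed from the adjacency structure alone, so relabeling the vertices leaves the vector unchanged, which is the permutation invariance the approach relies on.

```python
from itertools import combinations
from statistics import mean, pstdev

def invariants(edges):
    """A small, hypothetical vector of permutation-invariant graph descriptors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    degrees = [len(nbrs) for nbrs in adj.values()]
    # Each triangle is found once from each of its three vertices.
    triangles = sum(
        1
        for u in adj
        for v, w in combinations(sorted(adj[u]), 2)
        if w in adj[v]
    ) // 3
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return (n, m, max(degrees), mean(degrees), pstdev(degrees), triangles, density)

g = [(0, 1), (1, 2), (2, 0), (2, 3)]
relabel = {0: "c", 1: "a", 2: "d", 3: "b"}
h = [(relabel[u], relabel[v]) for u, v in g]
print(invariants(g) == invariants(h))  # True
```

Such a vector can be fed to any off-the-shelf classifier, giving the kind of non-trainable structural baseline the authors argue should be standard.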

Updated: 2026-05-07 15:51:43

Fields: cs.LG,math.CO

Download: http://arxiv.org/abs/2605.06462v1

Cryptographic and Information-theoretic Security Capacities for General Arbitrarily Varying Wiretap Channels

We compare the strong secrecy capacities of Arbitrarily Varying Wiretap Channels (AVWCs) and General Arbitrarily Varying Wiretap Channels (GAVWCs) with their capacities under a semantic secrecy constraint and other equivalent cryptographic secrecy constraints. It turns out that the average error and strong secrecy capacity of an AVWC is always equal to its maximal error and semantic secrecy capacity. However, this equivalence does not hold for all general communication systems, and we prove this by a counterexample. We also show that, for the GAVWC, semantic security and the other cryptographic security measures considered achieve the same capacity values. Finally, we bound the gap between the strong secrecy capacity and the semantic secrecy capacity for the GAVWC. The gap vanishes if the jammer's set of choices grows sub-double-exponentially in the block length $n$, which gives a sufficient condition for the strong and semantic secrecy capacities of GAVWCs to be equal.

Updated: 2026-05-07 15:51:20

Fields: cs.IT,cs.CR

Download: http://arxiv.org/abs/2605.06751v1

MINER: Mining Multimodal Internal Representation for Efficient Retrieval

Visual document retrieval has become essential for accessing information in visually rich documents. Existing approaches fall into two camps. Late-interaction retrievers achieve strong quality through fine-grained token-level matching but store hundreds of vectors per page, incurring large index footprints and high serving costs. By contrast, dense single-vector retrievers retain storage and latency advantages but consistently lag in quality because they compress all information into a single final-layer embedding. In this work, we first conduct a layerwise diagnostic on single-vector retrievers, revealing that retrieval-relevant signal resides in internal representations. Motivated by these findings, we propose MINER (Mining Multimodal Internal RepreseNtation for Efficient Retrieval), a lightweight plug-in module that probes and fuses internal signals across transformer layers into a single compact embedding without modifying the backbone or sacrificing single-vector efficiency. The first stage, Retrieval-Aligned Layer Probing, attaches a lightweight probe at each layer, surfacing which dimensions carry retrieval-relevant information. The second stage, Adaptive Sparse Multi-Layer Fusion, applies performance-adaptive neuron-level masking to the selected layers and fuses the surviving signals into the final dense vector. Across ViDoRe V1/V2/V3, MINER outperforms existing dense single-vector retrievers on the majority of benchmarks, with up to 4.5% nDCG@5 improvement over its corresponding backbone. Compared to strong late-interaction baselines, in some settings MINER narrows the nDCG@5 gap to 0.2 while preserving the storage and serving advantages of dense retrieval.
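
For readers unfamiliar with the headline metric, a minimal per-query nDCG@5 sketch. It assumes linear gains ($rel_i / \log_2(i+1)$); some toolkits use $2^{rel_i} - 1$ instead, and the two agree for binary relevance. Benchmark scores average this value over all queries; the query and page ids below are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gains: sum rel_i / log2(i + 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_ids, qrels, k=5):
    """nDCG@k for one query; qrels maps document id -> graded relevance."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical query: the single relevant page is retrieved at rank 2 instead of rank 1.
qrels = {"page_7": 1}
print(round(ndcg_at_k(["page_3", "page_7", "page_1"], qrels), 4))  # 0.6309
```

Because the discount is logarithmic, moving one relevant page up a single rank can shift the score by far more than the 0.2-point gap quoted above.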

Updated: 2026-05-07 15:51:06

Fields: cs.LG

Download: http://arxiv.org/abs/2605.06460v1

P^2O: Joint Policy and Prompt Optimization

Reinforcement Learning with Verifiable Rewards (RLVR) enhances Large Language Model (LLM) reasoning but suffers from advantage collapse on "hard samples" where all rollouts fail. This lack of variance eliminates crucial learning signals. For these intractable samples, simply scaling up rollout budgets offers limited gains. We introduce Joint Policy and Prompt Optimization (P$^2$O) to mitigate this collapse by alternating continuous policy updates with discrete prompt evolution. P$^2$O leverages the GEPA algorithm to discover successful reasoning prompts for intractable instances. Via context distillation, the model internalizes these prompt-induced gains directly into its parameters, removing the need for inference-time prompting. Empirically, P$^2$O restores critical advantage signals, significantly outperforming standard GRPO and surpassing baselines with doubled rollout budgets, ultimately yielding strong out-of-distribution generalization and an up to $9.5\%$ performance improvement. Our findings expose the limits of standard exploration in sparse-reward environments, illuminating the potential of unifying evolutionary algorithms with reinforcement learning. This integration of discrete semantic search and continuous parameter updates establishes a self-reinforcing paradigm for autonomous LLM alignment.
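
Why "advantage collapse" happens can be seen directly from GRPO-style group normalization, where each rollout's advantage is its reward minus the group mean, divided by the group standard deviation. A minimal sketch, assuming population standard deviation and a small epsilon (implementations differ on both); the "mixed" group is hypothetical and stands in for what a better prompt might produce.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-normalized advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

hard = [0.0] * 8               # every rollout fails the verifier
mixed = [0.0] * 6 + [1.0] * 2  # hypothetical: 2 of 8 rollouts now succeed
print(group_advantages(hard))  # all zeros: no gradient signal
print([round(a, 3) for a in group_advantages(mixed)])
```

With identical rewards every advantage is zero, so the policy gradient for that prompt vanishes regardless of how many rollouts are drawn.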

Updated: 2026-05-07 15:50:49

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2603.21877v3

Invariant Features in Language Models: Geometric Characterization and Model Attribution

Language models exhibit strong robustness to paraphrasing, suggesting that semantic information may be encoded through stable internal representations, yet the structure and origin of such invariance remain unclear. We propose a local geometric framework in which semantically equivalent inputs occupy structured regions in latent space, with paraphrastic variation along nuisance directions and semantic identity preserved in invariant subspaces. Building on this view, we make three contributions: (1) a geometric characterization of invariant latent features, (2) a contrastive subspace discovery method that separates semantic-changing from semantic-preserving variation, and (3) an application of invariant representations to zero-shot model attribution. Across models and layers, empirical results support these contributions. Invariant structure emerges in specific depth regions, semantic displacement lies largely outside the nuisance subspace, and representation-level interventions indicate a causal role of invariant components in model outputs. Invariant representations also capture model-specific geometric patterns, enabling accurate attribution. These findings suggest that semantic invariance can be viewed as a local geometric property of latent representations, offering a principled perspective on how language models organize meaning.

Updated: 2026-05-07 15:50:31

Fields: cs.LG,cs.CL

Download: http://arxiv.org/abs/2605.06458v1

Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems

LLM-based multi-agent systems are increasingly deployed for payment workflows, yet prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered routing decisions. We introduce the Agentic Success Rate (ASR), a trajectory-fidelity metric that compares observed and expected agent execution sequences at the transition level, decomposing performance into Transition Recall and Transition Precision. Applied to the Hierarchical Multi-Agent System for Payments (HMASP) across 18 LLMs and 90,000 task instances, ASR reveals that 10 of 18 models systematically skip a confirmation checkpoint during payment checkout, a deviation invisible to both TSR and HF1, while 8 models enforce the checkpoint perfectly. Notably, GPT-4.1 exhibits hidden workflow shortcuts despite achieving perfect TSR and HF1, while GPT-5.2 achieves perfect ASR. Prompt refinements and deterministic routing guards guided by ASR diagnostics yield substantial TSR improvements, with gains up to +93.8 percentage points for previously struggling models, demonstrating that trajectory-level evaluation is essential in regulated domains.
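
The abstract does not give the exact formula, so the sketch below is one plausible reading of transition-level scoring. It assumes transitions are consecutive (from_agent, to_agent) pairs compared as multisets, with precision over observed transitions and recall over expected ones. The agent names are hypothetical.

```python
from collections import Counter

def transitions(sequence):
    """Multiset of consecutive (from_agent, to_agent) handoffs."""
    return Counter(zip(sequence, sequence[1:]))

def transition_scores(observed, expected):
    """Hypothetical transition-level precision and recall of an observed agent trace."""
    obs, exp = transitions(observed), transitions(expected)
    matched = sum((obs & exp).values())  # multiset intersection
    precision = matched / sum(obs.values()) if obs else 1.0
    recall = matched / sum(exp.values()) if exp else 1.0
    return precision, recall

expected = ["router", "cart", "confirm", "checkout"]
observed = ["router", "cart", "checkout"]     # skips the confirmation checkpoint
print(transition_scores(observed, expected))  # (0.5, 0.3333333333333333)
```

An outcome-only metric would score this trace as a success because checkout was reached; the transition view exposes the skipped checkpoint.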

Updated: 2026-05-07 15:50:26

Fields: cs.AI

Download: http://arxiv.org/abs/2605.06457v1

When to Use Wireless Challenge-Response Physical Layer Authentication: Design of a Measurable Guideline for OFDM

The security of wireless challenge-response Physical Layer Authentication (PLA) based on Orthogonal Frequency Division Multiplexing (OFDM) relies on a sufficiently random fading channel condition, which is commonly assumed in existing studies. However, in practical scenarios, such a condition is not always guaranteed and the responses of OFDM subchannels may exhibit correlation. Consequently, ensuring the security of such PLA systems remains an unsolved problem. In this paper, we propose a novel adversary model, called Maximum Differential Likelihood Generator (MDLG), which exploits the weak correlation property in practical wireless channels to launch effective attacks against PLA. Based on this model, we create a measurable guideline using randomness testing to decide when we can in fact use PLA in a practical wireless channel condition. Extensive real-world experiments validate the effectiveness of the MDLG attack and demonstrate how the proposed guideline can help protect the security of PLA.
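
The abstract does not name the randomness tests used. As an illustration only, the sketch below implements two standard tests from NIST SP 800-22, the frequency (monobit) test and the runs test, and applies them to a hypothetical bit string, as if subchannel responses had been quantized to bits. It shows why a balance check alone is not enough: a perfectly alternating sequence is balanced yet fully predictable.

```python
import math

def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test."""
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(n) / math.sqrt(2))

def runs_p_value(bits):
    """NIST SP 800-22 runs test; returns 0.0 if the monobit prerequisite fails."""
    n = len(bits)
    pi = sum(bits) / n
    if abs(pi - 0.5) >= 2 / math.sqrt(n):
        return 0.0
    v_obs = 1 + sum(bits[k] != bits[k + 1] for k in range(n - 1))
    num = abs(v_obs - 2 * n * pi * (1 - pi))
    den = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(num / den)

def looks_random(bits, alpha=0.01):
    """Hypothetical guideline check: pass only if both p-values reach alpha."""
    return monobit_p_value(bits) >= alpha and runs_p_value(bits) >= alpha

alternating = [k % 2 for k in range(100)]  # balanced but perfectly correlated
print(monobit_p_value(alternating))        # 1.0: the frequency test alone is fooled
print(looks_random(alternating))           # False: the runs test catches the correlation
```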

Updated: 2026-05-07 15:50:20

Fields: cs.NI,cs.CR

Download: http://arxiv.org/abs/2605.06750v1

PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors

Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. We introduce PrefixGuard, a trace-to-monitor framework with an offline StepView induction step followed by supervised monitor training. StepView induces deterministic typed-step adapters from raw trace samples, and the monitor learns an event abstraction and prefix-risk scorer from terminal outcomes. Across WebArena, $τ^2$-Bench, SkillsBench, and TerminalBench, the strongest PrefixGuard monitors reach 0.900/0.710/0.533/0.557 AUPRC. Using the strongest backend within each representation, they improve over raw-text controls by an average of +0.137 AUPRC. LLM judges remain substantially weaker under the same prefix-warning protocol. We also derive an observability ceiling on score-based area under the precision-recall curve (AUPRC) that separates monitor error from failures lacking evidence in the observed prefix. For finite-state audit, post-hoc deterministic finite automaton (DFA) extraction remains compact on WebArena and $τ^2$-Bench (29 and 20 states) but expands to 151 and 187 states on SkillsBench and TerminalBench. Finally, first-alert diagnostics show that strong ranking does not imply deployment utility: WebArena ranks well yet fails to support low-false-alarm alerts, whereas $τ^2$-Bench and TerminalBench retain more actionable early alerts. Together, these results position PrefixGuard as a practical monitor-synthesis recipe with explicit diagnostics for when prefix warnings translate into actionable interventions.
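
AUPRC is the headline metric here. A minimal sketch of the common step-wise estimator (average precision, the mean of the precision values at the rank of each positive), applied to hypothetical prefix-risk scores where label 1 means the run eventually failed. Ties are broken by input order, which is a simplification. The paper's observability ceiling is not reproduced.

```python
def average_precision(scores, labels):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos

# Hypothetical prefix-risk scores; label 1 = the run eventually failed.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
print(average_precision(scores, labels))  # (1/1 + 2/3) / 2 = 0.8333...
```

Unlike ROC AUC, a random scorer's AUPRC sits near the failure base rate, so values such as 0.533 and 0.557 should be read against each benchmark's prevalence.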

Updated: 2026-05-07 15:49:48

Fields: cs.AI

Download: http://arxiv.org/abs/2605.06455v1

ORTHOBO: Orthogonal Bayesian Hyperparameter Optimization

Bayesian optimization is widely used for hyperparameter optimization when model evaluations are expensive; however, noisy acquisition estimates can lead to unstable decisions. We identify acquisition estimation noise as a failure mode that was previously overlooked: even when the surrogate model and acquisition target are correctly specified, finite-sample Monte Carlo error can perturb acquisition values. This can, in turn, flip candidate rankings and lead to suboptimal BO decisions. As a remedy, we aim at variance reduction and propose an orthogonal acquisition estimator that subtracts an optimally weighted score-function control variate, which yields an acquisition residual orthogonal to posterior score directions and which thus reduces Monte Carlo variance. We further introduce OrthoBO: a Bayesian optimization framework that combines our orthogonal acquisition estimator with ensemble surrogates and an outer log transformation. We show theoretically that our estimator preserves the target, leads to variance reduction, and improves pairwise ranking stability. We further verify the theoretical properties of OrthoBO through numerical experiments where our framework reduces acquisition estimation variance, stabilizes candidate rankings, and achieves strong performance. We also demonstrate the downstream utility of OrthoBO in hyperparameter optimization for neural network training and fine-tuning.
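
The estimator described is a score-function control variate. The following is a minimal one-dimensional sketch, not OrthoBO's implementation. It assumes a Gaussian posterior N(μ, σ²) for a single candidate and an expected-improvement integrand. The score ∂/∂μ log N(x; μ, σ²) = (x − μ)/σ² has mean zero, so subtracting c times it leaves the target unchanged. The variance-minimizing choice c = Cov(f, s)/Var(s) makes the residual f − c·s uncorrelated with s. Estimating c in-sample, as done here, adds a small O(1/n) bias. The posterior, incumbent, and sample sizes are hypothetical.

```python
import random
from statistics import mean, pvariance

def improvement(x, best):
    """Expected-improvement integrand max(x - best, 0)."""
    return max(x - best, 0.0)

def plain_estimate(xs, best, mu, sigma):
    """Vanilla Monte Carlo average (mu and sigma unused)."""
    return mean(improvement(x, best) for x in xs)

def control_variate_estimate(xs, best, mu, sigma):
    """Subtract c * score, where score = d/dmu log N(x; mu, sigma^2) has mean zero."""
    f = [improvement(x, best) for x in xs]
    s = [(x - mu) / sigma**2 for x in xs]
    f_bar, s_bar = mean(f), mean(s)
    cov = mean((fi - f_bar) * (si - s_bar) for fi, si in zip(f, s))
    c = cov / pvariance(s)  # variance-minimizing weight, estimated in-sample
    return mean(fi - c * si for fi, si in zip(f, s))

def summarize(estimator, reps=200, n=64, seed=0):
    """Mean and variance of an estimator over repeated Monte Carlo draws."""
    rng = random.Random(seed)
    mu, sigma, best = 0.3, 1.0, 0.5  # hypothetical posterior and incumbent
    values = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append(estimator(xs, best, mu, sigma))
    return mean(values), pvariance(values)

print(summarize(plain_estimate))            # mean near the exact EI of about 0.307
print(summarize(control_variate_estimate))  # similar mean, smaller variance
```

Lower variance per acquisition value means fewer spurious swaps when two candidates' estimated acquisitions are close, which is the ranking-stability argument in the abstract.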

Updated: 2026-05-07 15:49:03

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2605.06454v1

Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning

Scaling critic capacity is a promising direction for improving off-policy reinforcement learning (RL). However, recent work shows that larger critics are prone to overfitting and instability in replay-based bootstrapped training. In this paper, we propose using Low-Rank Adaptation (LoRA) as a structural regularizer for critic learning. Our approach freezes randomly initialized base matrices and optimizes only the corresponding low-rank adapters, thereby constraining critic updates to a low-dimensional subspace. We evaluate our method across different off-policy RL algorithms, including SAC and FastTD3 based on different network architectures. Empirically, LoRA efficiently reduces critic loss during training and improves overall policy performance, achieving the best or competitive results on most tasks. Extensive experiments demonstrate that our low-rank updates provide a simple and effective form of structural regularization for critic learning in off-policy RL.
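
A minimal plain-Python sketch of one LoRA-parameterized critic layer, assuming the usual LoRA convention of a random A and a zero-initialized B (the abstract only states that the random base is frozen and the adapters are trained). Only A (r × d_in) and B (d_out × r) would receive gradients, so every update to the effective weight W0 + BA has rank at most r. Class and method names are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

class LoRALinear:
    """y = (W0 + B @ A) x with W0 frozen; only A (r x d_in) and B (d_out x r) train."""

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / d_in ** 0.5
        self.w0 = [[rng.gauss(0, scale) for _ in range(d_in)] for _ in range(d_out)]  # frozen
        self.a = [[rng.gauss(0, scale) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]  # zero init: layer starts equal to W0

    def effective_weight(self):
        delta = matmul(self.b, self.a)  # rank <= r by construction
        return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(self.w0, delta)]

    def forward(self, x):
        w = self.effective_weight()
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def trainable_parameters(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

layer = LoRALinear(d_in=256, d_out=256, rank=8)
print(layer.trainable_parameters(), 256 * 256)  # 4096 65536
```

The regularizing effect comes from the subspace constraint rather than from the parameter count alone: the critic can only move within directions spanned by the adapters.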

Updated: 2026-05-07 15:47:31

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2604.18978v2

Constraint Decay: The Fragility of LLM Agents in Backend Code Generation

Large Language Model (LLM) agents demonstrate strong performance in autonomous code generation under loose specifications. However, production-grade software requires strict adherence to structural constraints, such as architectural patterns, databases, and object-relational mappings. Existing benchmarks often overlook these non-functional requirements, rewarding functionally correct but structurally arbitrary solutions. We present a systematic study evaluating how well agents handle structural constraints in multi-file backend generation. By fixing a unified API contract across 80 greenfield generation tasks and 20 feature-implementation tasks spanning eight web frameworks, we isolate the effect of structural complexity using a dual evaluation with end-to-end behavioral tests and static verifiers. Our findings reveal a phenomenon of constraint decay: as structural requirements accumulate, agent performance exhibits a substantial decline. Capable configurations lose 30 points on average in assertion pass rates from baseline to fully specified tasks, while some weaker configurations approach zero. Framework sensitivity analysis exposes significant performance disparities: agents succeed in minimal, explicit frameworks (e.g., Flask) but perform substantially worse on average in convention-heavy environments (e.g., FastAPI, Django). Finally, error analysis identifies data-layer defects (e.g., incorrect query composition and ORM runtime violations) as the leading root causes. This work highlights that jointly satisfying functional and structural requirements remains a key open challenge for coding agents.

Updated: 2026-05-07 15:44:40

Fields: cs.SE,cs.AI

Download: http://arxiv.org/abs/2605.06445v1

LicenseGPT: A Fine-tuned Foundation Model for Publicly Available Dataset License Compliance

Dataset license compliance is a critical yet complex aspect of developing commercial AI products, particularly with the increasing use of publicly available datasets. Ambiguities in dataset licenses pose significant legal risks, making it challenging even for software IP lawyers to accurately interpret rights and obligations. In this paper, we introduce LicenseGPT, a fine-tuned foundation model (FM) specifically designed for dataset license compliance analysis. We first evaluate existing legal FMs (i.e., FMs specialized in understanding and processing legal texts) and find that the best-performing model achieves a Prediction Agreement (PA) of only 43.75%. LicenseGPT, fine-tuned on a curated dataset of 500 licenses annotated by legal experts, significantly improves PA to 64.30%, outperforming both legal and general-purpose FMs. Through an A/B test and user study with software IP lawyers, we demonstrate that LicenseGPT reduces analysis time by 94.44%, from 108 seconds to 6 seconds per license, without compromising accuracy. Software IP lawyers perceive LicenseGPT as a valuable supplementary tool that enhances efficiency while acknowledging the need for human oversight in complex cases. Our work underscores the potential of specialized AI tools in legal practice and offers a publicly available resource for practitioners and researchers.
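
The headline figures follow directly from the numbers quoted: the time reduction per license and the gain in Prediction Agreement over the best legal FM are

```latex
\frac{108\,\mathrm{s} - 6\,\mathrm{s}}{108\,\mathrm{s}} = \frac{102}{108} \approx 0.9444 = 94.44\%
\qquad
64.30\% - 43.75\% = 20.55 \text{ percentage points}
```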

Updated: 2026-05-07 15:43:32

Fields: cs.SE,cs.AI

Download: http://arxiv.org/abs/2501.00106v2

SCRuB: Social Concept Reasoning under Rubric-Based Evaluation

While many studies of Large Language Model (LLM) reasoning capabilities emphasize mathematical or technical tasks, few address reasoning about social concepts: the abstract ideas shaping social norms, culture, and institutions. This understudied capability is essential for modern models acting as social agents, yet no systematic evaluation methodology targets it. We introduce SCRuB (Social Concept Reasoning under Rubric-Based Evaluation), a framework designed for this setting of task indeterminacy. Our goal is to measure the degree to which a model reasons about social concepts with the depth and critical rigor of a human expert. SCRuB proceeds in three phases: prompt construction from established sources, response generation by experts and models, and comparative evaluation using a five-dimensional critical thinking rubric. To enable generalization of the pipeline, we introduce a Panel of Disciplinary Perspectives ensemble validated against independent expert judges. We release SCRuBEval (n=4,711 evaluation prompts) and SCRuBAnnotations (300 expert-authored responses and 150 expert comparative judgments from 45 PhD-level scholars). Our results show that frontier models consistently outperform human experts across all five rubric dimensions. Across 1,170 pairwise comparisons, expert judges ranked a model response first in 80.8% of judgments and preferred model responses overall 74.4% of the time. Ultimately, this study provides the first expert-grounded demonstration of evaluation saturation for social concept reasoning: the single-turn exam-style format has reached its ceiling for models and humans alike.

Updated: 2026-05-07 15:43:14

Fields: cs.AI

Download: http://arxiv.org/abs/2605.06444v1

Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance

Recent attacks show that behavioural unlearning of large language models leaves internal traces recoverable by adversarial probes. We characterise where this retention lives and show it can be surgically removed without measurable capability cost. Our central protocol is a leave-one-out cross-sequence probe that tests whether a memorisation signature generalises across held-out sequences. The signature is real and consistent across scale: memorisation-specific gaps of +0.32, +0.19, +0.30 on Pythia-70M, GPT-2 medium, and Mistral-7B; on Pythia-70M, the random-initialisation control collapses to -0.04 at the deepest layer where the pretrained signature peaks. The probe direction is causally separable from recall -- projecting it out collapses the signature locally (+0.44 -> -0.19) while behavioural recall barely changes -- and a probe trained on naturally memorised content does not classify fine-tuning-injected secrets, marking two representationally distinct regimes. We then introduce probe-geometry alignment (PGA), a surgical erasure that aligns activations along the probe's live readout direction at each depth. PGA drives the cross-sequence probe below random chance at all four scales tested (toy depth-4: 0.17; Pythia-70M: 0.07; Mistral-7B: 0.45; GPT-2 medium: 0.06 via MD-PGA k=2) and remains robust to six adversarial probe variants. Against a re-fitting attacker who trains a fresh probe on PGA-treated activations, we extend PGA adversarially, defeating the re-fit probe at every memorisation-relevant depth while preserving five zero-shot capability benchmarks within 2.8 percentage points per task (mean Δacc = +0.2pp). The cross-sequence signature is a real, causally separable, regime-specific property of pretrained representations -- removable below chance with a single rank-one intervention per depth at no measurable capability cost.
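
The causal-separability check described above projects the probe direction out of the activations. A minimal sketch of that rank-one projection, h − (h·u)u with u the unit probe direction, on hypothetical vectors. PGA itself aligns activations along the probe's live readout direction and is not reproduced here.

```python
import math

def unit(v):
    """Normalize a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project_out(h, direction):
    """Rank-one ablation: h - (h . u) u, removing the component along unit vector u."""
    u = unit(direction)
    coeff = sum(hi * ui for hi, ui in zip(h, u))
    return [hi - coeff * ui for hi, ui in zip(h, u)]

probe_w = [2.0, -1.0, 0.5]  # hypothetical linear probe weight at one layer
h = [0.3, 1.2, -0.7]        # hypothetical activation
h_clean = project_out(h, probe_w)
print(sum(a * b for a, b in zip(h_clean, probe_w)))  # close to 0: the probe's readout is erased
```

The operation is idempotent and leaves every component orthogonal to the probe untouched, which is why behavioural recall can survive it.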

Updated: 2026-05-07 15:40:56

Categories: cs.LG,cs.AI,cs.CR,cs.NE

Download: http://arxiv.org/abs/2605.01699v3

Fast and Efficient Gossip Algorithms for Robust and Non-smooth Decentralized Learning

Decentralized learning on resource-constrained edge devices demands algorithms that are communication-efficient, robust to data corruption, and lightweight in memory. State-of-the-art gossip-based methods address communication efficiency, but achieving robustness remains challenging. Methods for robust estimation and optimization typically rely on non-smooth objectives (e.g., pinball loss, ℓ1 loss), yet standard gossip methods are primarily designed for smooth losses. Asynchronous decentralized ADMM-based methods have been proposed to handle such non-smooth objectives; however, existing approaches require memory that scales with node degree, making them impractical when memory is limited. We propose AsylADMM, a novel asynchronous gossip algorithm for decentralized non-smooth optimization requiring only two variables per node. We provide a new theoretical analysis for the synchronous variant and leverage it to prove convergence of AsylADMM in a simplified setting based on the squared loss. Empirically, AsylADMM converges faster than existing baselines on challenging non-smooth problems, including quantile and geometric median estimation, lasso regression, and robust regression. More broadly, our novel gossip framework opens a practical pathway toward robust and non-smooth decentralized learning.
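
As background on the non-smooth objectives mentioned above, here is a minimal sketch of the pinball (quantile) loss. It assumes the standard definition rho_tau(r) = r(tau - 1[r < 0]). Minimizing the summed loss over a location c recovers an empirical tau-quantile, and the kink at r = 0 is what makes such objectives non-smooth. The brute-force search over data points is for illustration only and is not AsylADMM.

```python
def pinball(r: float, tau: float) -> float:
    """Pinball loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def total_pinball(c: float, data: list[float], tau: float) -> float:
    """Summed pinball loss of the residuals x - c."""
    return sum(pinball(x - c, tau) for x in data)


def best_candidate(data: list[float], tau: float) -> float:
    """Brute-force minimizer of the summed loss over the data points themselves."""
    return min(data, key=lambda c: total_pinball(c, data, tau))


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(best_candidate(data, 0.5))  # → 3.0 (the median)
print(best_candidate(data, 0.9))  # → 5.0 (an upper quantile)
```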

Updated: 2026-05-07 15:40:35

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2601.20571v2

COVID-19 Infodemic. Understanding content features in detecting fake news using a machine learning approach

The use of content features, particularly textual and linguistic features, for fake news detection is under-researched, despite empirical evidence that such features can help differentiate real from fake news. To this end, this study investigates a selection of content features, such as word bigrams and part-of-speech distribution, to improve fake news detection. We performed a series of experiments on a new dataset gathered during the COVID-19 pandemic, using Decision Tree, K-Nearest Neighbor, Logistic Regression, Support Vector Machine, and Random Forest classifiers. Random Forest yielded the best results across all setups, followed closely by Support Vector Machine. In general, the textual and linguistic features each improved fake news detection when used separately; however, combining them into a single model did not improve detection significantly. Differences were also noted between the use of bigrams and part-of-speech tags. The study shows that textual and linguistic features can be used successfully to detect fake news with traditional machine learning, as opposed to deep learning.
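
Here is a minimal sketch of word-bigram feature extraction of the kind listed above. It assumes lowercase whitespace tokenization; the study's exact preprocessing is not described here. The resulting counts would then be vectorized and passed to a classifier such as Random Forest.

```python
from collections import Counter


def word_bigrams(text: str) -> Counter:
    """Count adjacent word pairs after lowercasing and whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


features = word_bigrams("The vaccine is safe and the vaccine is tested")
print(features[("the", "vaccine")], features[("vaccine", "is")])  # → 2 2
```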

Updated: 2026-05-07 15:36:17

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2605.06435v1

Knowledge Graphs, the Missing Link in Agentic AI-based Formal Verification

Recent advances in Large Language Models (LLMs) have enabled workflows that generate SystemVerilog Assertions (SVAs) from natural-language specifications, with the potential to accelerate Formal Verification (FV). However, high-quality assertion synthesis remains challenging because specifications are often ambiguous or incomplete and critical micro-architectural details reside in the Register Transfer Level (RTL). Many existing approaches treat the specification and RTL as loosely structured text, which weakens specification-to-RTL grounding and leads to semantic mismatches and frequent syntax failures during formal parsing and elaboration. This work addresses these limitations with a verification-centric Knowledge Graph (KG) constructed from structured Intermediate Representations (IRs) extracted from the specification, RTL, and formal-tool feedback, including syntax diagnostics, Counterexamples (CEXs), and coverage reports. The KG links requirements, design hierarchy, signals, assumptions, and properties to provide traceable, design-grounded context for generation. A multi-agent workflow queries and updates this KG to generate SVAs and to drive three refinement loops: syntax repair guided by tool diagnostics, CEX-guided correction using trace links, and coverage-directed property augmentation. Evaluation across seven benchmark designs indicates that KG-based context retrieval improves specification-to-RTL grounding and consistently produces compilable SVAs with low syntax-repair overhead. The approach achieves formal coverage ranging from 78.5% to 99.4%, although convergence is design-dependent and complex temporal and arithmetic reasoning remains challenging for current LLMs.
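
Here is a minimal sketch of the traceability idea. It assumes the KG is stored as (subject, relation, object) triples. Node and relation names such as `REQ-1`, `constrains`, and `checks` are hypothetical and do not reflect the paper's schema. Given a requirement, the query returns the assertions that trace to it through shared signals, which is the kind of design-grounded context a generator could receive.

```python
from collections import defaultdict

# Hypothetical triples linking requirements, design hierarchy, signals, and SVAs.
TRIPLES = [
    ("REQ-1", "constrains", "sig:req_valid"),
    ("REQ-1", "constrains", "sig:gnt"),
    ("SVA-1", "checks", "sig:req_valid"),
    ("SVA-2", "checks", "sig:gnt"),
    ("SVA-3", "checks", "sig:fifo_full"),
    ("top.arb", "drives", "sig:gnt"),
]


def build_index(triples):
    """Index subjects by (relation, object) for reverse lookups."""
    index = defaultdict(set)
    for subj, rel, obj in triples:
        index[(rel, obj)].add(subj)
    return index


def properties_for(requirement, triples):
    """Return SVA nodes that check any signal the requirement constrains."""
    index = build_index(triples)
    signals = {o for s, r, o in triples if s == requirement and r == "constrains"}
    return sorted(p for sig in signals for p in index[("checks", sig)])


print(properties_for("REQ-1", TRIPLES))  # → ['SVA-1', 'SVA-2']
```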

Updated: 2026-05-07 15:35:53

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06434v1

It's Not a Lottery, It's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task

Our theoretical understanding of neural networks lags behind their empirical success. One of the important unexplained phenomena is why and how, during training with gradient descent, the theoretical capacity of neural networks is reduced to an effective capacity that fits the task. Here we investigate the mechanism by which gradient descent achieves this by analyzing the learning dynamics of individual neurons in single-hidden-layer ReLU networks. We identify three dynamical principles, namely mutual alignment, unlocking, and racing, that together explain why we can often successfully reduce capacity after training by merging equivalent neurons or pruning low-norm weights. We specifically explain the mechanism behind the lottery ticket conjecture, that is, why the specific, beneficial initial conditions of some neurons lead them to obtain higher weight norms.
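
The two capacity-reduction steps mentioned above can be sketched for a single-hidden-layer ReLU network f(x) = sum_j a_j ReLU(w_j . x). This sketch assumes there are no biases, for brevity.

Merging relies on positive homogeneity. If w_2 = c w_1 with c > 0, then a_2 ReLU(w_2 . x) = c a_2 ReLU(w_1 . x). The two neurons therefore collapse into one with output weight a_1 + c a_2.

Pruning drops neurons whose contribution norm |a_j| ||w_j|| is negligible. The weights below are hypothetical random data.

```python
import numpy as np


def forward(W, a, X):
    """f(x) = sum_j a_j * relu(w_j . x) for each row x of X."""
    return np.maximum(X @ W.T, 0.0) @ a


def prune(W, a, tol=1e-8):
    """Drop neurons whose contribution norm |a_j| * ||w_j|| is below tol."""
    keep = np.abs(a) * np.linalg.norm(W, axis=1) > tol
    return W[keep], a[keep]


def merge_pair(W, a, i, j, c):
    """Merge neuron j into neuron i, assuming W[j] == c * W[i] with c > 0."""
    a = a.copy()
    a[i] += c * a[j]
    keep = np.arange(len(a)) != j
    return W[keep], a[keep]


rng = np.random.default_rng(1)
w1 = rng.normal(size=3)
W = np.stack([w1, 2.0 * w1, rng.normal(size=3), rng.normal(size=3)])
a = np.array([0.5, -0.2, 1.0, 0.0])       # the last neuron is dead (a = 0)
X = rng.normal(size=(10, 3))

W_m, a_m = merge_pair(W, a, 0, 1, 2.0)    # neuron 1 is 2x neuron 0
W_p, a_p = prune(W_m, a_m)
print(len(a), "->", len(a_p))             # → 4 -> 2
```

The reduced network computes the same function on every input, which is what makes post-training capacity reduction lossless in these two cases.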

Updated: 2026-05-07 15:32:42

Categories: cs.LG,cs.AI,cs.CV,cs.NE

Download: http://arxiv.org/abs/2602.04832v2

Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models

Large language models (LLMs) show strong performance across many applications, but their ability to memorize and potentially reveal training data raises serious privacy concerns. We introduce the PopQuiz Attack, a black-box membership inference attack that tests whether a model can recall specific training examples. The core idea is to turn target data into quiz-style multiple-choice questions and infer membership from the model's answers. Across six widely used LLMs (GPT-3.5, GPT-4o, LLaMA2-7b, LLaMA2-13b, Mistral-7b, and Vicuna-7b) and four datasets, our method achieves an average ROC-AUC of 0.873 and outperforms existing approaches by 20.6%. We further analyze factors affecting attack success, including query complexity, data type, data structure, and training settings. We also evaluate instruction-based, filter-based, and differential privacy-based defenses, which reduce performance but do not eliminate the risk. Our results highlight persistent privacy vulnerabilities in modern LLMs.
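
ROC-AUC, the metric reported above, equals the probability that a randomly chosen member receives a higher membership score than a randomly chosen non-member, with ties counting half. Below is a minimal sketch. It assumes, hypothetically, that the per-example score is the fraction of quiz questions answered correctly; the paper's scoring rule is not given here.

```python
def roc_auc(member_scores, nonmember_scores):
    """Pairwise (Mann-Whitney) estimate of ROC-AUC; ties count as 0.5."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical quiz accuracies (fraction of questions answered correctly).
members = [0.9, 0.3]
nonmembers = [0.5, 0.1]
print(roc_auc(members, nonmembers))  # → 0.75
```

A value of 0.5 corresponds to chance-level membership inference.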

Updated: 2026-05-07 15:29:10

Categories: cs.CR

Download: http://arxiv.org/abs/2605.06423v1

Automated Side-Channel Analysis of Cryptographic Protocol Implementations

Formal verification of cryptographic protocols typically relies on symbolic models that abstract away compiled code and microarchitectural side channels, leaving a gap between verified specifications and deployed executables. We present a toolchain that extracts protocol-relevant models from real binaries and analyzes them under explicit leakage contracts for constant-time and Spectre-PHT-style speculative observations. Starting from a selected binary region, we lift machine code to an intermediate representation, instrument it with leakage contracts, symbolically execute it to obtain event/observation traces, and translate these traces into Sapic for analysis with Tamarin, ProVerif, and DeepSec. As case studies, we extract models of WhatsApp Desktop's session-management and double-ratchet components from its binary and analyze forward secrecy and post-compromise security under a state-cloning compromise. For side-channel analysis, we study the Basic Access Control (BAC) protocol used in e-passports and WhatsApp's session establishment. Under our observation models, we identify an instruction-cache side channel in WhatsApp Desktop enabling social-graph inference, and we reproduce known unlinkability issues in BAC under microarchitectural observations.
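
A constant-time leakage contract can be illustrated with a toy model in which the attacker observes the control-flow trace (here, the sequence of loop iterations) rather than the result. Below is a minimal sketch under that simplified observation model. The toolchain described above works on lifted binaries and symbolic traces, not Python, and the function names are hypothetical.

```python
def leaky_equal(secret: bytes, guess: bytes):
    """Early-exit comparison; returns (result, observation trace)."""
    trace = []
    for i, (s, g) in enumerate(zip(secret, guess)):
        trace.append(i)              # the attacker observes each iteration
        if s != g:
            return False, trace      # the exit point depends on the secret
    return len(secret) == len(guess), trace


def ct_equal(secret: bytes, guess: bytes):
    """Branch-free accumulation; the trace depends only on the lengths."""
    trace, diff = [], len(secret) ^ len(guess)
    for i, (s, g) in enumerate(zip(secret, guess)):
        trace.append(i)
        diff |= s ^ g
    return diff == 0, trace


guess = b"aaaa"
trace_1 = leaky_equal(b"abcd", guess)[1]   # → [0, 1]
trace_2 = leaky_equal(b"aacd", guess)[1]   # → [0, 1, 2]
print(trace_1 != trace_2)                  # → True: the exit point leaks
print(ct_equal(b"abcd", guess)[1] == ct_equal(b"aacd", guess)[1])  # → True
```

In this toy model, `leaky_equal` violates the contract because two secrets with the same public input produce different traces, while `ct_equal` does not.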

Updated: 2026-05-07 15:28:59

Categories: cs.CR

Download: http://arxiv.org/abs/2511.11385v3

E = T*H/(O+B): A Dimensionless Control Parameter for Mixture-of-Experts Ecology

We introduce E = T*H/(O+B), a dimensionless control parameter that predicts whether Mixture-of-Experts (MoE) models will develop a healthy expert ecology or collapse into dead experts. E combines four hyperparameters -- routing temperature T, routing entropy weight H, oracle weight O, and balance weight B -- into a single quantity. Through 12 controlled experiments (8 vision, 4 language) totaling over 11,000 training epochs, we establish that E >= 0.5 alone is sufficient to guarantee zero dead experts, removing the necessity for handcrafted load-balancing auxiliary losses. We validate this cross-modally on CIFAR-10, CIFAR-100, TinyImageNet-200, WikiText-2, and WikiText-103. Six additional findings emerge: (1) dead experts can resuscitate -- triggered by balance loss driving router re-exploration; (2) ortho toxicity is dataset-dependent, not universal; (3) task complexity shifts the critical E threshold; (4) model overfitting is decoupled from expert ecological health; (5) three-tier MoE spontaneously collapses into a two-tier functional structure; (6) ecological structure is temperature-invariant across a 50x range. We propose that E serves as a unified diagnostic for MoE training, analogous to the Reynolds number in fluid dynamics.
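
The control parameter is a direct formula over four hyperparameters, so checking a configuration against the reported E >= 0.5 threshold takes only a few lines. Below is a minimal sketch; the helper names and example values are hypothetical.

```python
def ecology_parameter(T: float, H: float, O: float, B: float) -> float:
    """E = T * H / (O + B); requires O + B > 0."""
    if O + B <= 0:
        raise ValueError("oracle weight + balance weight must be positive")
    return T * H / (O + B)


def meets_threshold(E: float, threshold: float = 0.5) -> bool:
    """Check against the E >= 0.5 sufficiency condition stated in the abstract."""
    return E >= threshold


E = ecology_parameter(T=1.0, H=0.5, O=0.5, B=0.5)
print(E, meets_threshold(E))  # → 0.5 True
```

Since E is a ratio, doubling T raises E by exactly the same factor as halving O + B.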

Updated: 2026-05-07 15:23:35

Categories: cs.LG,cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2605.06415v1

DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning

Federated Learning (FL) enables collaborative model training across distributed devices while preserving local data privacy, making it ideal for mobile and embedded systems. However, the decentralized nature of FL also opens vulnerabilities to model poisoning attacks, particularly backdoor attacks, where adversaries implant trigger patterns to manipulate model predictions. In this paper, we propose DeTrigger, a scalable and efficient backdoor-robust federated learning framework that leverages insights from adversarial attack methodologies. By employing gradient analysis with temperature scaling, DeTrigger detects and isolates backdoor triggers, allowing for precise model weight pruning of backdoor activations without sacrificing benign model knowledge. Extensive evaluations across four widely used datasets demonstrate that DeTrigger achieves up to 251x faster detection than traditional methods and mitigates backdoor attacks by up to 98.9%, with minimal impact on global model accuracy. Our findings establish DeTrigger as a robust and scalable solution to protect federated learning environments against sophisticated backdoor threats.
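
DeTrigger's gradient analysis uses temperature scaling. As background only, here is a minimal sketch of that generic ingredient; it is not DeTrigger's detection procedure. For logits z and temperature T, the gradient of the cross-entropy of softmax(z / T) with respect to z is (softmax(z / T) - onehot(y)) / T, and raising T flattens the distribution. The check below compares this closed form with central finite differences; the logits are hypothetical.

```python
import math


def softmax(z, T=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    m = max(z)
    e = [math.exp((zi - m) / T) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def ce_loss(z, y, T=1.0):
    """Cross-entropy of softmax(z / T) against class index y."""
    return -math.log(softmax(z, T)[y])


def ce_grad(z, y, T=1.0):
    """Closed-form gradient d loss / d z = (p - onehot(y)) / T."""
    p = softmax(z, T)
    return [(pi - (1.0 if i == y else 0.0)) / T for i, pi in enumerate(p)]


def shifted(z, i, delta):
    """Copy of z with delta added to coordinate i."""
    return [zj + (delta if j == i else 0.0) for j, zj in enumerate(z)]


z, y, T = [2.0, -1.0, 0.5], 0, 4.0
eps = 1e-6
numeric = [
    (ce_loss(shifted(z, i, eps), y, T) - ce_loss(shifted(z, i, -eps), y, T)) / (2 * eps)
    for i in range(len(z))
]
print(max(abs(a - b) for a, b in zip(ce_grad(z, y, T), numeric)))
```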

Updated: 2026-05-07 15:22:56

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2411.12220v3

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

Integrating speech understanding and generation is a pivotal step toward building unified speech models. However, the different representations required for these two tasks currently pose significant compatibility challenges. Typically, semantics-oriented features are learned from self-supervised learning (SSL), and acoustic-oriented features from reconstruction. Such fragmented representations hinder the realization of truly unified speech systems. We present WavCube, a compact continuous latent derived from an SSL speech encoder that simultaneously supports speech understanding, reconstruction, and generation. WavCube employs a two-stage training scheme. Stage 1 trains a semantic bottleneck to filter off-manifold redundancy that makes raw SSL features intractable for diffusion. Stage 2 injects fine-grained acoustic details via end-to-end reconstruction, while a semantic anchoring loss ensures the representation remains grounded within its original semantic manifold. Comprehensive experiments show that WavCube closely approaches WavLM performance on SUPERB despite an 8x dimensional compression, attains reconstruction quality on par with existing acoustic representations, delivers state-of-the-art zero-shot TTS performance with markedly faster training convergence, and excels in speech enhancement, separation, and voice conversion tasks on the SUPERB-SG benchmark. Systematic ablations reveal that WavCube's two-stage recipe resolves two intrinsic flaws of SSL features for generative modeling, paving the way for future unified speech systems. Code and checkpoints are available at https://github.com/yanghaha0908/WavCube.

Updated: 2026-05-07 15:17:24

Categories: eess.AS,cs.AI,cs.CL

Download: http://arxiv.org/abs/2605.06407v1

Consistent Geometric Deep Learning via Hilbert Bundles and Cellular Sheaves

Modern deep learning architectures increasingly contend with sophisticated signals that are natively infinite-dimensional, such as time series, probability distributions, or operators, and are defined over irregular domains. Yet, a unified learning theory for these settings has been lacking. To start addressing this gap, we introduce a novel convolutional learning framework for possibly infinite-dimensional signals supported on a manifold. Namely, we use the connection Laplacian associated with a Hilbert bundle as a convolutional operator, and we derive filters and neural networks, dubbed HilbNets. We make HilbNets and, more generally, the convolution operation implementable via a two-stage sampling procedure. First, we show that sampling the manifold induces a Hilbert Cellular Sheaf, a generalized graph structure with Hilbert feature spaces and edge-wise coupling rules, and we prove that its sheaf Laplacian converges in probability to the underlying connection Laplacian as the sampling density increases. Notably, this result generalizes to the infinite-dimensional bundle setting the convergence result of Belkin & Niyogi for the graph Laplacian to the manifold Laplacian, a theoretical cornerstone of geometric learning methods. Second, we discretize the signals and prove that the discretized (implementable) HilbNets converge to the underlying continuous architectures and are transferable across different samplings of the same bundle, providing consistency for learning. Finally, we validate our framework on synthetic and real-world tasks. Overall, our results broaden the scope of geometric learning as a whole by lifting classical Laplacian-based frameworks to settings where the signal at each point lives in its own Hilbert space.
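
The discretization step can be pictured with a finite-dimensional cellular sheaf on a graph. Each vertex carries a vector stalk and each edge carries restriction maps. The sheaf Laplacian is L = delta^T delta, where the coboundary sends a vertex signal x to (delta x)_e = F_v x_v - F_u x_u on each edge e = (u, v).

Below is a minimal sketch under several assumptions:
- stalks are finite-dimensional, whereas the paper allows infinite-dimensional Hilbert spaces;
- restriction maps are orthogonal, as in a discrete connection Laplacian;
- the polynomial filter at the end is a hypothetical stand-in for a convolutional layer.

```python
import numpy as np


def sheaf_laplacian(n_vertices, d, edges):
    """edges: list of (u, v, F_u, F_v) with d x d restriction maps.
    Builds the coboundary delta (one d-row block per edge), returns delta^T delta."""
    delta = np.zeros((len(edges) * d, n_vertices * d))
    for k, (u, v, F_u, F_v) in enumerate(edges):
        rows = slice(k * d, (k + 1) * d)
        delta[rows, u * d:(u + 1) * d] = -F_u
        delta[rows, v * d:(v + 1) * d] = F_v
    return delta.T @ delta


def rotation(theta):
    """2 x 2 rotation matrix (an orthogonal restriction map)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def poly_filter(L, x, coeffs):
    """Apply h(L) x = sum_k coeffs[k] * L^k x."""
    out, Lk_x = np.zeros_like(x), x.copy()
    for c in coeffs:
        out += c * Lk_x
        Lk_x = L @ Lk_x
    return out


I = np.eye(2)
# Triangle graph whose rotations compose consistently around the cycle.
edges = [(0, 1, I, rotation(0.3)), (1, 2, I, rotation(0.4)), (0, 2, I, rotation(0.7))]
L = sheaf_laplacian(3, 2, edges)

e = np.array([1.0, 0.0])
x_sec = np.concatenate([rotation(0.7) @ e, rotation(0.4) @ e, e])  # a global section
print(np.allclose(L @ x_sec, 0.0))                                 # → True
print(np.allclose(poly_filter(L, x_sec, [0.5, 1.0, -0.2]), 0.5 * x_sec))  # → True
```

Global sections lie in the kernel of L, so any polynomial filter acts on them as multiplication by its constant coefficient.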

Updated: 2026-05-07 15:08:58

Categories: cs.LG,cs.AI,eess.SP

Download: http://arxiv.org/abs/2605.06395v1

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct access to host-side resources, including browsers, files, scripts, system commands, and external communication channels. While useful for automating real tasks, this capability also creates a host-level abuse surface: a legitimately deployed agent may be steered toward unsafe operations through malicious messages, indirect prompt injection, unsafe skills, or tampering along the host-side control path. We argue that such risks cannot be addressed by ad hoc blocking rules alone, because the security criticality of an operation depends jointly on its action type, target object, execution context, and potential effect. This paper presents an operation-centric model for risk-based confinement of SHCUA operations. The proposed design keeps ordinary functionality on the constrained REE path, while protecting security-critical classification, authorization, binding, evidence generation, and selected execution-control decisions inside a cloud-native TEE-backed trusted operation plane. We instantiate the architecture on OpenClaw using Intel TDX as the primary trusted backend, with remote terminal-side trusted components verifying TDX-audited commands before constrained local execution. The evaluation shows that the design can block unsafe or policy-disallowed operations before execution, preserve ordinary functionality for allowed workloads, and provide auditable evidence with deployment-dependent overhead.
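
The operation-centric model judges an operation jointly by its action type, target object, execution context, and potential effect, rather than by a single blocking rule. Below is a minimal sketch of such a joint classifier. The risk categories, field values, path prefixes, and routing outcomes are all hypothetical. The paper's actual policy and the TDX-backed trusted plane are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    action: str    # e.g. "read", "write", "exec", "send"
    target: str    # e.g. a file path or network endpoint
    context: str   # e.g. "user_request", "web_content", "skill"
    effect: str    # e.g. "none", "local_change", "exfiltration", "system_change"


HIGH_EFFECTS = {"exfiltration", "system_change"}
UNTRUSTED_CONTEXTS = {"web_content", "skill"}
SENSITIVE_PREFIXES = ("/etc", "/root", "~/.ssh")


def route(op: Operation) -> str:
    """Return 'ree', 'tee_authorize', or 'deny' (hypothetical tiers)."""
    if op.effect in HIGH_EFFECTS and op.context in UNTRUSTED_CONTEXTS:
        return "deny"            # e.g. an injected instruction trying to exfiltrate
    if (op.effect in HIGH_EFFECTS or op.action == "exec"
            or op.target.startswith(SENSITIVE_PREFIXES)):
        return "tee_authorize"   # security-critical: decided in the trusted plane
    return "ree"                 # ordinary functionality stays on the REE path


print(route(Operation("read", "/tmp/notes.txt", "user_request", "none")))        # → ree
print(route(Operation("send", "https://x.example", "web_content", "exfiltration")))  # → deny
print(route(Operation("exec", "/usr/bin/make", "user_request", "local_change")))   # → tee_authorize
```

The same `send` action is denied when it originates from web content but escalated for authorization when the user requests it. This illustrates why context and effect must be considered together.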

Updated: 2026-05-07 15:08:40

Categories: cs.CR

Download: http://arxiv.org/abs/2605.06393v1

Risk Horizons: Structured Hypothesis Spaces for Longitudinal Clinical Prediction

Predicting future clinical events from longitudinal electronic health records (EHRs) requires selecting plausible outcomes from a large and structured event space under sparse observations. While clinical coding systems provide hierarchical organization of events, cross-modal and temporal relationships are not explicitly specified and must instead be inferred from data, making prediction difficult for weakly observed longitudinal transitions. We introduce Risk Horizons, a geometry-aware framework for constructing patient-specific candidate spaces for multi-modal next-visit prediction. Risk Horizons combines deterministic coding hierarchies with data-driven lagged cross-modal associations, embeds the resulting clinical graph in hyperbolic space, and retrieves candidate futures using directional risk cones. This reframes longitudinal prediction as ranking within a compact, clinically coherent hypothesis space rather than scoring an unconstrained vocabulary. Experiments on MIMIC-IV and eICU demonstrate competitive next-visit prediction performance, with consistently improved hierarchy consistency across diagnoses, procedures, and medications. Further analysis suggests that hyperbolic structured candidate retrieval is the primary driver of performance, while LLMs are effective as constrained inference-time rerankers operating over clinically grounded candidate sets.
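
Retrieval in hyperbolic space commonly uses the Poincaré-ball distance d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). Below is a minimal sketch assuming a Poincaré-ball embedding and plain nearest-neighbor ranking. The paper's directional risk cones add an angular constraint that is not modeled here, and the code names and coordinates are hypothetical.

```python
import math


def sq_norm(x):
    """Squared Euclidean norm of a coordinate sequence."""
    return sum(c * c for c in x)


def poincare_dist(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


def top_k(query, candidates, k):
    """Rank candidate code embeddings by hyperbolic distance to the query."""
    return sorted(candidates, key=lambda name: poincare_dist(query, candidates[name]))[:k]


codes = {"dx:A": (0.6, 0.1), "dx:B": (-0.5, 0.2), "rx:C": (0.55, 0.3)}
print(top_k((0.5, 0.2), codes, 2))  # → ['rx:C', 'dx:A']
```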

Updated: 2026-05-07 15:08:25

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2602.12828v2

Path Dependence under Adaptive AI Delegation

Repeated AI assistance can improve immediate task performance while reducing the skill available for future independent work. We develop a mathematical framework for this long-run tradeoff. The model tracks two state variables: a latent human skill level governing expected independent performance, and a delegation level representing the learner's evolving tendency to rely on AI. Skill changes through error-driven learning under practice and decay under delegation; delegation responds to observed performance, increasing when AI-assisted work appears to outperform independent work. We analyze the resulting dynamics and contrast them with fixed delegation. With fixed delegation, skill follows a one-dimensional learning-decay process with a single stable equilibrium. With adaptive delegation, the coupled system has two attracting equilibria separated by the stable manifold of an interior saddle. The existence and geometry of this separatrix require a global phase-plane analysis of the coupled dynamics. The system is path-dependent: small differences in initial skill or reliance can lead to different long-run outcomes. We use this characterization to show that AI assistance can improve short-run performance while producing worse long-run performance than a no-AI baseline. Increasing AI capability can enlarge the basin of attraction of the low-skill equilibrium, making delegation appear beneficial for longer while increasing the risk of eventual skill loss. The qualitative picture is observed to persist across alternative specifications. Together, these results show that the risk is not AI assistance itself, but the coupling between performance-driven reliance and use-dependent skill change.
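
The qualitative mechanism can be reproduced with a deliberately simple two-variable toy model. These equations are hypothetical and are not the paper's model:
- Skill s in [0, 1] grows at rate alpha (1 - d)(1 - s) under practice and decays at rate beta d s under delegation.
- Delegation d in [0, 1] changes at rate gamma (q - s) d (1 - d), so it rises when a fixed AI performance q exceeds independent performance s.

With 0 < q < 1, both corners (s, d) = (1, 0) and (0, 1) are locally stable. Different starting points can therefore settle into different long-run regimes.

```python
def simulate(s, d, q=0.6, alpha=1.0, beta=1.0, gamma=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the toy skill/delegation system."""
    for _ in range(steps):
        ds = alpha * (1 - d) * (1 - s) - beta * d * s   # practice vs. decay
        dd = gamma * (q - s) * d * (1 - d)              # performance-driven reliance
        s, d = s + dt * ds, d + dt * dd
    return s, d


print(simulate(0.9, 0.1))  # high initial skill: ends near (1, 0)
print(simulate(0.2, 0.8))  # high initial reliance: ends near (0, 1)
```

With this step size, each update keeps both variables inside [0, 1].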

Updated: 2026-05-07 15:07:14

Categories: cs.CY,cs.AI,cs.GT

Download: http://arxiv.org/abs/2603.02950v2

Automated alignment is harder than you think

A leading proposal for aligning artificial superintelligence (ASI) is to use AI agents to automate an increasing fraction of alignment research as capabilities improve. We argue that, even when research agents are not scheming to deliberately sabotage alignment work, this plan could produce compelling but catastrophically misleading safety assessments resulting in the unintentional deployment of misaligned AI. This could happen because alignment research involves many hard-to-supervise fuzzy tasks (tasks without clear evaluation criteria, for which human judgement is systematically flawed). Consequently, research outputs will contain systematic, undetected errors, and even correct outputs could be incorrectly aggregated into overconfident safety assessments. This problem is likely to be worse for automated alignment research than for human-generated alignment research for several reasons: 1) optimisation pressure means agent-generated mistakes are concentrated among those that human reviewers are least likely to catch; 2) agents are likely to produce errors that do not resemble human mistakes; 3) AI-generated alignment solutions may involve arguments humans cannot evaluate; and 4) shared weights, data and training processes may make AI outputs more correlated than human equivalents. Therefore, agents must be trained to reliably perform hard-to-supervise fuzzy tasks. Generalisation and scalable oversight are the leading candidates for achieving this but both face novel challenges in the context of automated alignment.

Updated: 2026-05-07 15:06:37

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06390v1

MAT-Cell: A Multi-Agent Tree-Structured Reasoning Framework for Batch-Level Single-Cell Annotation

Automated single-cell annotation is difficult when the most abundant genes are not the most discriminative ones, or when a target state is poorly covered by a fixed reference atlas. GPTCelltype-style one-shot prompting allows large language models (LLMs) to produce plausible labels from generic expression signals, while reference-based annotators can force unfamiliar states into the nearest known category. We propose MAT-Cell, a prompt-driven framework for batch-level single-cell annotation that separates evidence grounding from label decision. MAT-Cell first uses Reverse Verification Query (RVQ) to combine tissue context, observed differentially expressed genes, and LLM-elicited biological priors into structured candidate-specific premises. Verifier agents then convert these premises into explicit premise-to-claim reasoning trees, and a bounded multi-round debate compares, challenges, and revises the resulting claims before consensus or final adjudication. The returned Syllogistic Derivation Tree (SDT) provides an auditable debate trace rather than a formal proof of the annotation. In open-candidate benchmarks across five datasets, a locally deployed Qwen3-30B model with MAT-Cell achieves 75.5% average accuracy, compared with 64.2% for the strongest evaluated CoT baseline and 51.9% for the strongest evaluated scPilot variant. In oracle-candidate benchmarks across three species, MAT-Cell remains competitive across backbones, and local inference substantially reduces the monetary cost of batch annotation. Code is available at: https://anonymous.4open.science/r/MATCell-4067
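
A bounded multi-round debate with a final adjudication step can be sketched as a loop that stops on consensus or after a fixed number of rounds. Below is a minimal sketch using hypothetical stub agents that map the current round's labels to a revised label. The real verifier agents are LLM calls that exchange premise-to-claim trees, and majority-vote adjudication here is only a stand-in.

```python
from collections import Counter
from typing import Callable, List

Agent = Callable[[List[str]], str]   # sees all current labels, returns its own


def debate(agents: List[Agent], initial: List[str], max_rounds: int = 3):
    """Run up to max_rounds revisions; return (label, rounds_used, transcript)."""
    labels, transcript = list(initial), [list(initial)]
    for r in range(max_rounds):
        if len(set(labels)) == 1:                  # consensus reached
            return labels[0], r, transcript
        labels = [agent(labels) for agent in agents]
        transcript.append(list(labels))
    if len(set(labels)) == 1:
        return labels[0], max_rounds, transcript
    winner = Counter(labels).most_common(1)[0][0]  # fallback adjudication
    return winner, max_rounds, transcript


def follower(labels: List[str]) -> str:
    """Hypothetical agent that adopts the current majority label."""
    return Counter(labels).most_common(1)[0][0]


def stubborn(labels: List[str]) -> str:
    """Hypothetical agent that never revises its claim."""
    return "NK cell"


label, rounds, trace = debate([follower] * 3, ["B cell", "T cell", "T cell"])
print(label, rounds)  # → T cell 1
```

The returned transcript plays the role of an auditable record of how each label was revised.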

Updated: 2026-05-07 15:03:52

Categories: q-bio.QM,cs.AI

Download: http://arxiv.org/abs/2604.06269v2

Multivariate Standardized Residuals for Conformal Prediction

While split conformal prediction guarantees marginal coverage, approaching the stronger property of conditional coverage is essential for reliable uncertainty quantification. Naive conformal scores, however, suffer from poor conditional coverage in heteroskedastic settings. In univariate regression, this is commonly addressed by normalizing non-conformity scores using an estimated local score variance. In this work, we propose a natural extension of this normalization to the multivariate setting, effectively whitening the residuals to decouple output correlations and standardize local variance. Furthermore, we derive a sufficient condition characterizing a broad class of distributions for which standardized residuals yield asymptotic conditional coverage. We demonstrate that using the Mahalanobis distance induced by a learned local covariance as a non-conformity score provides a closed-form, computationally efficient mechanism for capturing inter-output correlations and heteroskedasticity, avoiding the expensive sampling required by previous methods based on cumulative distribution functions. This structure unlocks several practical extensions, including the handling of missing output values, the refinement of conformal sets when partial information is revealed, and the construction of valid conformal sets for transformations of the output. Finally, we provide extensive empirical evidence on both synthetic and real-world datasets showing that our approach yields conformal sets that improve upon the conditional coverage of existing multivariate baselines.
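
The following is a minimal sketch of the idea, not the authors' implementation. The score is the Mahalanobis norm of the residual under a per-sample covariance, and the threshold is the usual split-conformal quantile. `Mu` and `Sigma` stand in for a learned mean and local covariance model, which the sketch assumes are already given.

```python
import math

import numpy as np


def mahalanobis_scores(Y, Mu, Sigma):
    """Non-conformity score sqrt(r^T Sigma^{-1} r) for each sample.

    Y, Mu: arrays of shape (n, d); Sigma: array of shape (n, d, d).
    """
    R = Y - Mu
    # Solve Sigma_i x_i = r_i for every i at once instead of inverting Sigma.
    whitened = np.linalg.solve(Sigma, R[..., None])[..., 0]
    return np.sqrt(np.einsum("nd,nd->n", R, whitened))


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(cal_scores)[k - 1]
```

For a new input $x$, the conformal set is the ellipsoid $\{y : (y - \mu(x))^\top \Sigma(x)^{-1} (y - \mu(x)) \le q^2\}$. It is available in closed form, so no sampling is needed to describe it.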

Updated: 2026-05-07 15:03:06

Categories: stat.ML,cs.AI,cs.LG,stat.ME,stat.OT

Download: http://arxiv.org/abs/2507.20941v4

Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level

On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage-weighted policy gradient suffers from three structural weaknesses: high-variance updates, vanishing gradients in zero-advantage regions, and exploration bottlenecks when corrective signals are insufficient. We therefore propose Asymmetric On-Policy Distillation (AOPD), which replaces ineffective negative reinforcement with localized divergence minimization in non-positive-advantage regions while preserving positive reinforcement. Experiments on mathematical reasoning benchmarks show that AOPD consistently outperforms standard OPD, with average gains of 4.09 and 8.34 under strong and weak initialization, respectively. AOPD also maintains higher policy entropy during training and better retains existing capabilities during sequential tool-use adaptation.
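
Below is a per-token sketch of the asymmetric rule. It rests on two assumptions that the abstract does not fix. First, the advantage is taken to be the teacher-minus-student log-probability of the sampled token. Second, the "localized divergence" is taken to be the reverse KL between the student and teacher next-token distributions at that position.

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def asymmetric_token_losses(student_logits, teacher_logits, tokens):
    """Per-token surrogate losses for one student-sampled trajectory.

    student_logits, teacher_logits: (T, V); tokens: (T,) sampled by the student.
    Returns (losses, advantages), both of shape (T,).
    """
    ls, lt = log_softmax(student_logits), log_softmax(teacher_logits)
    pos = np.arange(len(tokens))
    # Assumed advantage: how much more likely the teacher finds the sampled token.
    adv = lt[pos, tokens] - ls[pos, tokens]
    # Policy-gradient surrogate; adv is treated as a constant (stop-gradient).
    pg = -adv * ls[pos, tokens]
    # Reverse KL(student || teacher) over the vocabulary at each position.
    kl = (np.exp(ls) * (ls - lt)).sum(axis=-1)
    # Positive advantage: reinforce the token; otherwise match the teacher locally.
    return np.where(adv > 0, pg, kl), adv
```

On non-positive-advantage tokens the KL branch still carries a dense signal toward the teacher. A pure policy-gradient term gives no signal where the advantage is zero, which is the vanishing-gradient weakness the abstract names.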

Updated: 2026-05-07 15:02:49

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2605.06387v1

MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs

Large Language Models (LLMs) are increasingly applied to medicine, yet their adoption is limited by concerns over reliability and safety. Existing evaluations either test factual medical knowledge in isolation or assess patient-level reasoning without verifying correctness, leaving a critical gap. We introduce MediEval, a benchmark that links MIMIC-IV electronic health records (EHRs) to a unified knowledge base built from UMLS and other biomedical vocabularies. MediEval generates diverse factual and counterfactual medical statements within real patient contexts, enabling systematic evaluation across a four-quadrant framework that jointly considers knowledge grounding and contextual consistency. Using this framework, we identify critical failure modes, including hallucinated support and truth inversion, that current proprietary, open-source, and domain-specific LLMs frequently exhibit. To address these risks, we propose Counterfactual Risk-Aware Fine-tuning (CoRFu), a DPO-based method with an asymmetric penalty targeting unsafe confusions. CoRFu improves macro-F1 by 16.4 points over the base model and eliminates truth-inversion errors, demonstrating both higher accuracy and substantially greater safety.
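
For reference, here is the standard DPO loss on one preference pair, with one plausible way to make it asymmetric. The `pair_weight` helper and its default `penalty=3.0` are hypothetical: they scale up only those pairs flagged as unsafe confusions. The abstract does not specify CoRFu's exact penalty, so this weighting scheme is an assumption.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, weight=1.0):
    """DPO loss -log sigmoid(beta * margin), scaled by a per-pair weight.

    logp_* are sequence log-probabilities of the preferred (w) and rejected (l)
    responses under the policy; ref_logp_* are the same under the frozen reference.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return weight * softplus(-beta * margin)


def pair_weight(is_unsafe_confusion, penalty=3.0):
    # Hypothetical asymmetric penalty: up-weight pairs whose rejected answer
    # would be an unsafe confusion (for example, affirming a false statement).
    return penalty if is_unsafe_confusion else 1.0
```

At zero margin the loss is $\log 2$ times the weight, so flagged pairs push the policy away from the unsafe answer proportionally harder.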

Updated: 2026-05-07 15:02:00

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2512.20822v2

MinMax Recurrent Neural Cascades

We show that the MinMax algebra provides a form of recurrence that is expressively powerful, efficiently implementable, and, most importantly, unaffected by vanishing or exploding gradients. We call the models obtained by cascading several layers of neurons that employ such recurrence MinMax Recurrent Neural Cascades (RNCs). We show that MinMax RNCs enjoy many favourable theoretical properties. First, their formal expressivity includes all regular languages, arguably the maximal expressivity for a finite-memory system. Second, given enough processors, they can be evaluated in parallel in time logarithmic in the input length; they can also be evaluated sequentially. Third, their states and activations are uniformly bounded for all input lengths. Fourth, their loss gradient exists and is bounded at almost all points. Fifth, they do not exhibit a vanishing state gradient: the gradient of a state with respect to a past state can have constant value one regardless of the time distance between the two states. Finally, we find empirical evidence that the favourable theoretical properties of MinMax RNCs are matched by their practical capabilities: they perfectly solve a number of synthetic tasks, outperforming the state-of-the-art recurrent neural networks considered; we also train a 127M-parameter MinMax RNC on next-token prediction, and the resulting model shows competitive performance for its size, providing evidence of the potential of MinMax RNCs on real-world tasks.
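
The parallel-evaluation claim rests on associativity. As an illustration only (the paper's exact neuron may differ), take the hypothetical recurrence $h_t = \max(\min(h_{t-1}, a_t), b_t)$. Maps of the form $x \mapsto \max(\min(x, a), b)$ are closed under composition, because min and max distribute over each other. This lets the states be computed by an associative prefix scan in $O(\log T)$ parallel rounds. Each state is one of the input values, so states stay bounded. Wherever $b_t < h_{t-1} < a_t$, the derivative $\partial h_t / \partial h_{t-1}$ is exactly one.

```python
from typing import List, Tuple

# (a, b) encodes the map x -> max(min(x, a), b).
MinMaxMap = Tuple[float, float]


def apply_map(f: MinMaxMap, x: float) -> float:
    a, b = f
    return max(min(x, a), b)


def compose(f: MinMaxMap, g: MinMaxMap) -> MinMaxMap:
    """Pair encoding g(f(x)): apply f first, then g (by distributivity of min/max)."""
    a1, b1 = f
    a2, b2 = g
    return (min(a1, a2), max(min(b1, a2), b2))


def run_sequential(h0: float, params: List[MinMaxMap]) -> List[float]:
    states, h = [], h0
    for f in params:
        h = apply_map(f, h)  # h_t = max(min(h_{t-1}, a_t), b_t)
        states.append(h)
    return states


def run_parallel_scan(h0: float, params: List[MinMaxMap]) -> List[float]:
    """Hillis-Steele inclusive scan: ceil(log2 T) rounds, each fully parallelizable."""
    prefix, step = list(params), 1
    while step < len(prefix):
        prefix = [
            prefix[i] if i < step else compose(prefix[i - step], prefix[i])
            for i in range(len(prefix))
        ]
        step *= 2
    return [apply_map(p, h0) for p in prefix]
```

Both functions return the same states. The scan only needs the composition to be associative, which is why the depth is logarithmic in the sequence length.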

Updated: 2026-05-07 15:01:36

Categories: cs.LG,cs.AI,cs.FL

Download: http://arxiv.org/abs/2605.06384v1

Rethinking Vacuity for OOD Detection in Evidential Deep Learning

Vacuity, or Uncertainty Mass (UM), is commonly used as a metric to evaluate Out-of-Distribution (OOD) detection in Evidential Deep Learning (EDL). It is generally computed by dividing the number of classes ($K$) by the total strength of belief ($S$) of the model's prediction, where $S$ is the sum of the Dirichlet parameters. As such, UM is sensitive to the number of classes $K$. In particular, because EDL suppresses incorrectly assigned evidence, $S$ is unlikely in practice to grow linearly with $K$ as both increase. As a result, when comparing In-Distribution (ID) and OOD results, it is important that $K_{\mathrm{ID}}$ and $K_{\mathrm{OOD}}$ are equal, which is not always ensured in practice. We provide an empirical demonstration of how AUROC and AUPR can differ substantially when the class cardinality of ID and OOD differs by 1: by as much as 0.318 (AUROC) and 0.613 (AUPR) for standard EDL, and by 0.360 (AUROC) and 0.683 (AUPR) for IB-EDL. More concretely, our findings isolate an evaluation artefact: when $K$ differs between ID and OOD, AUROC/AUPR can be artificially inflated without any change in model predictions. We further discuss the evaluation of EDL over causal language models using Multiple-Choice Question-Answer (MCQA) datasets and argue for clearer definitions of ID and OOD in this context. Our primary contribution is an empirical and theoretical demonstration that vacuity-based OOD detection in EDL-fine-tuned LLMs is highly sensitive to uncontrolled differences in evaluated class cardinality.
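
The sketch below reproduces the artefact on synthetic data. It assumes the standard EDL parameterization $\alpha_k = e_k + 1$. The same evidence vectors are scored twice: once with $K = 4$, and once with a fifth, evidence-free class appended. The second case arises, for example, when ID and OOD question sets have different numbers of answer options. The gamma-distributed evidence is synthetic and not taken from the paper.

```python
import numpy as np


def vacuity(evidence):
    """Uncertainty mass u = K / S with Dirichlet parameters alpha = evidence + 1."""
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.shape[-1]
    S = (evidence + 1.0).sum(axis=-1)
    return K / S


def auroc(pos_scores, neg_scores):
    """P(pos > neg) + 0.5 * P(pos == neg), by brute-force pair comparison."""
    p, n = np.asarray(pos_scores)[:, None], np.asarray(neg_scores)[None, :]
    return float(np.mean(p > n) + 0.5 * np.mean(p == n))


rng = np.random.default_rng(0)
evidence = rng.gamma(2.0, 2.0, size=(500, 4))  # identical model outputs
u_id = vacuity(evidence)  # "ID" split scored with K = 4
# Same evidence, but the "OOD" split is scored with one extra, empty class.
u_ood = vacuity(np.hstack([evidence, np.zeros((500, 1))]))
print(auroc(u_ood, u_id))  # expected to exceed 0.5 although nothing changed
```

Since $5/(S+1) > 4/S$ whenever $S > 4$, every padded sample looks more "vacuous" than its unpadded twin. The AUROC therefore rises above chance purely from the change in $K$.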

Updated: 2026-05-07 15:00:56

Categories: cs.AI

Download: http://arxiv.org/abs/2605.06382v1

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

Step distillation has become a leading technique for accelerating diffusion models, with Distribution Matching Distillation (DMD) and Consistency Distillation as two representative paradigms. Consistency methods enforce self-consistency along the full PF-ODE trajectory to steer it toward the clean data manifold, whereas vanilla DMD relies on sparse supervision at a few predefined discrete timesteps. This restricted discrete-time formulation, combined with the mode-seeking nature of the reverse KL divergence, tends to produce visual artifacts and over-smoothed outputs, often necessitating complex auxiliary modules, such as GANs or reward models, to restore visual fidelity. In this work, we introduce Continuous-Time Distribution Matching (CDM), which for the first time moves the DMD framework from discrete anchoring to continuous optimization. CDM achieves this through two continuous-time designs. First, we replace the fixed discrete schedule with a dynamic continuous schedule of random length, so that distribution matching is enforced at arbitrary points along sampling trajectories rather than only at a few fixed anchors. Second, we propose a continuous-time alignment objective that performs active off-trajectory matching on latents extrapolated via the student's velocity field, improving generalization and preserving fine visual details. Extensive experiments on different architectures, including SD3-Medium and Longcat-Image, demonstrate that CDM provides highly competitive visual fidelity for few-step image generation without relying on complex auxiliary objectives. Code is available at https://github.com/byliutao/cdm.
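
The two continuous-time ingredients can be illustrated roughly as follows. The sketch assumes a rectified-flow parameterization, the formulation SD3 uses. The abstract does not specify CDM's schedule sampler or how the extrapolated latents enter the loss, so both functions are hypothetical helpers, not the released code.

```python
import numpy as np


def random_continuous_schedule(rng, max_steps):
    """Random-length, decreasing schedule 1 = t_0 > ... > t_n = 0.

    Interior times are drawn uniformly from [0, 1), so supervision can land
    anywhere on the trajectory instead of only at fixed anchors.
    """
    n = int(rng.integers(1, max_steps + 1))  # number of steps, 1..max_steps
    interior = np.sort(rng.uniform(0.0, 1.0, size=n - 1))[::-1]
    return np.concatenate(([1.0], interior, [0.0]))


def extrapolate(x_t, t, t_next, velocity):
    """One Euler step along a velocity field.

    Rectified-flow convention: x_t = (1 - t) * x_data + t * noise and
    dx/dt = noise - x_data, so moving to t_next < t heads toward the data.
    """
    return x_t + (t_next - t) * velocity(x_t, t)
```

In a CDM-like setup, `velocity` would be the student's predicted velocity field. The extrapolated latent is then a point slightly off the sampled trajectory, where distribution matching could be applied.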

Updated: 2026-05-07 14:56:39

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2605.06376v1

A UEFI System with SPDM to Protect Against Unauthorized Device Connections

Attackers seeking to compromise computing systems can use malicious peripherals as an attack vector, threatening users who cannot verify the hardware's authenticity. To address this problem, our work uses the Security Protocol and Data Model (SPDM) to propose a UEFI system capable of authenticating PCIe and USB devices that try to connect to it. We also develop an open-source proof of concept, using emulation, to evaluate and illustrate our proposal; it restricts device connections to allowed devices only, thus protecting the system against malicious peripherals. Then, using kernel virtualization features to evaluate the emulation, we collect the number of instructions and CPU cycles during boot. Our experiments reveal that, during firmware execution, the number of instructions and the number of CPU cycles increased on average by 13% and 8%, respectively. This processing overhead is acceptable in view of the enhanced security. Institutions requiring high security levels can leverage our proof of concept to tailor their own systems to their requirements.
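
SPDM defines requests such as GET_CERTIFICATE and CHALLENGE, through which a device returns its certificate chain and proves possession of the matching private key. The sketch below covers only the admission decision that could follow such an exchange. The `DeviceReport` fields and the SHA-256 allowlist of chain fingerprints are assumptions for illustration, and the cryptographic verification itself is reduced to a boolean. It is written in Python for readability, whereas a real UEFI implementation would be in C.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceReport:
    """What firmware might hold after an SPDM exchange with one device (hypothetical)."""
    bus: str                  # "pcie" or "usb"
    cert_chain: bytes         # certificate chain returned by the device
    challenge_verified: bool  # outcome of checking the signed challenge response


def chain_fingerprint(cert_chain: bytes) -> str:
    return hashlib.sha256(cert_chain).hexdigest()


def allow_connection(report: DeviceReport, allowlist: set) -> bool:
    """Admit a device only if it proved key possession AND its chain is allowlisted."""
    if report.bus not in ("pcie", "usb"):
        return False
    if not report.challenge_verified:
        return False
    return chain_fingerprint(report.cert_chain) in allowlist
```

Checking both conditions matters: a genuine but unapproved device fails the allowlist, and a cloned certificate without the private key fails the challenge.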

Updated: 2026-05-07 14:40:26

Categories: cs.CR

Download: http://arxiv.org/abs/2605.06744v1

Fine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysis

Large language models (LLMs) have shown promise for event log analysis, but their high computational requirements, reliance on cloud infrastructure, and security concerns limit practical deployment. In addition, most existing approaches focus only on identifying the problem and do not provide actionable remediation. Small language models (SLMs) present a lightweight alternative that can be fine-tuned for a specific purpose and hosted locally. This paper investigates whether SLMs, when fine-tuned for a specific task, can serve as a practical alternative for event log analysis while also generating solutions. We first use a high-performing LLM to create a large-scale synthetic Windows event log dataset that contains remediation actions. We then fine-tune multiple SLMs and LLMs using the LoRA parameter-efficient fine-tuning technique and evaluate their performance against expert assessment. The results show that the dataset accurately reflects real-world scenarios and that fine-tuned SLMs consistently outperform LLMs in identifying issues and providing relevant remediation, while requiring fewer computational resources.
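
LoRA freezes a pretrained weight $W$ and learns a low-rank update $\frac{\alpha}{r} B A$. Below is a minimal NumPy sketch of one adapted linear layer. The rank `r` and scaling `alpha` used in the usage note are illustrative values, not the paper's settings.

```python
import numpy as np


class LoRALinear:
    """y = x W^T + (alpha / r) * x A^T B^T, with W frozen and only A, B trained."""

    def __init__(self, W, r, alpha, rng):
        d_out, d_in = W.shape
        self.W = W                                 # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, (r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))              # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # After training the update folds into W, so inference cost is unchanged.
        return self.W + self.scale * self.B @ self.A

    def trainable_params(self):
        return self.A.size + self.B.size
```

Because `B` starts at zero, the adapted layer initially reproduces the base model exactly. For a 1024 by 1024 layer at rank 8, only 16,384 parameters are trained instead of 1,048,576, which helps explain why small models can be fine-tuned and hosted locally.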

Updated: 2026-05-07 14:24:59

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.06330v1

Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation

Online-safety regulation under the UK Online Safety Act and the EU Digital Services Act increasingly treats scalar metrics as compliance evidence. Once announced, such a metric also becomes an optimization target: a strategic platform can improve its score by routing recommendations through semantically equivalent content variants, without reducing true harm. We ask when such an audit metric can still certify a genuine reduction in harm. The protocol is modeled as a published transformation graph whose connected components form semantic classes, and the metric itself is treated as a security object. Three results follow. First, any metric that scores variants directly is manipulable as soon as two equivalent variants in a harmful class disagree in score. Second, the semantic-envelope lift, which assigns each variant the maximum score in its class, is the unique pointwise minimum among conservative classwise-constant repairs. Third, a class-stratified certificate, $H^\star(x) \le (1/\hat{\alpha})\, M_{\mathrm{Env}(m)}(x) + \bar{\eta}$, holds for every platform strategy, with $\bar{\eta}$ absorbing annotation and protocol error. We check the claims at three levels: exhaustive enumeration on a finite-state grid of mixed strategies, an SMT encoding in Z3 cross-replayed in cvc5, and a bounded single-player MDP encoded in PRISM-games. The fragile metric fails manipulation invariance and cannot support the same useful predeclared class-coverage certificate; under the envelope-level certificate, it produces large violations at every tested instance, with a large mean gaming gap across random catalogs at a fixed audit budget. The semantic-envelope metric exhibits no such violation in the tested instances.
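
The semantic-envelope lift is straightforward to compute once the classes are known. The minimal sketch below takes variants and transformation edges as plain Python values; the names are illustrative. It ignores harm labels, so `is_manipulable` flags any class with disagreeing raw scores rather than only harmful classes.

```python
def semantic_envelope(variants, edges, score):
    """Replace each variant's score by the maximum score in its semantic class.

    Classes are the connected components of the transformation graph
    (variants are nodes; edges are published equivalence-preserving transformations).
    """
    parent = {v: v for v in variants}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    class_max = {}
    for v in variants:
        root = find(v)
        class_max[root] = max(class_max.get(root, score[v]), score[v])
    return {v: class_max[find(v)] for v in variants}


def is_manipulable(variants, edges, score):
    """True if some class contains variants with different raw scores."""
    env = semantic_envelope(variants, edges, score)
    return any(env[v] != score[v] for v in variants)
```

After the lift, every variant in a class carries the same score. Routing recommendations to a lower-scoring equivalent variant therefore no longer changes the metric.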

Updated: 2026-05-07 14:22:21

Categories: cs.CR,cs.CY,cs.LG

Download: http://arxiv.org/abs/2605.06324v1

From Specification to Deployment: Empirical Evidence from a W3C VC + DID Trust Infrastructure for Autonomous Agents

Autonomous AI agents now transact at production scale: on a single marketplace, 69,000 bots have executed 165 million transactions totaling 50 million USDC in cumulative volume, without any shared trust layer between participants. Regulatory frameworks (Singapore IMDA, NIST CAISI, EU AI Act) and major AI laboratories (Anthropic, Google) have independently converged on the same structural requirement: an open, portable, cryptographically verifiable trust infrastructure for autonomous agents that no single vendor can deliver alone. This paper presents MolTrust, a production-deployed implementation of such an infrastructure built on W3C Verifiable Credentials 2.0 and Decentralized Identifiers v1.0, with on-chain anchoring on Base Layer 2. The system architecture is organized around four primitives (identity, authorization, behavioral record, portability), a five-party accountability chain, and the Agent Authorization Envelope (AAE), a machine-evaluable authorization structure enforced at three layers: cryptographic signatures, API-level credential lifecycle management, and kernel-level syscall monitoring via Falco eBPF integration. The paper documents three distinguishing capabilities: kernel-layer AAE enforcement below the agent process boundary; cross-protocol interoperability through five reproducible test vectors verified against independent implementations; and layered Sybil resistance combining dual-signature interaction proofs, cross-vertical endorsement diversity gating, and principal-DID-linked violation persistence. The reference implementation has been operational since March 2026 across eight credential verticals. Empirical validation at adversarial scale is pending. The contribution is deployment-first evidence that the trust infrastructure on which regulators and industry have converged is implementable today using W3C-standardized primitives.
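
To make "machine-evaluable authorization structure" concrete, here is a hypothetical envelope check. The field names (`allowed_actions`, `max_amount`, `expires_at`) are assumptions for illustration, not MolTrust's actual AAE schema. Signature verification of the envelope and kernel-level enforcement are out of scope; only the policy decision is shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AuthorizationEnvelope:
    """Hypothetical AAE fields; the paper's actual schema may differ."""
    agent_did: str
    allowed_actions: frozenset
    max_amount: float
    expires_at: datetime


def evaluate(env, agent_did, action, amount, now):
    """Return (allowed, reason) for one requested action."""
    if agent_did != env.agent_did:
        return False, "envelope issued to a different agent"
    if now >= env.expires_at:
        return False, "envelope expired"
    if action not in env.allowed_actions:
        return False, "action outside envelope"
    if amount > env.max_amount:
        return False, "amount exceeds limit"
    return True, "ok"
```

Returning a reason string alongside the decision keeps each denial auditable, which fits the behavioral-record primitive the abstract lists.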

Updated: 2026-05-07 14:09:51

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.06738v1

Module Lattice Security (Part I): Unconditional Verification of Weber's Conjecture for $k \le 12$

Weber's conjecture (1886) governs three aspects of lattice-based cryptography: the solvability of the Principal Ideal Problem, the freeness of modules over rings of integers, and the tightness of worst-case-to-average-case reductions in Ring-LWE (R-LWE) and Module-LWE (MLWE). Existing verifications for $k \ge 9$ rely on the Generalized Riemann Hypothesis (GRH). In this paper, we present the first unconditional proof for $k \le 12$. Our method combines the Fukuda-Komatsu computational sieve, the inductive structure of the cyclotomic $\mathbb{Z}_2$-tower, and Herbrand's theorem.
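
For reference, here is the statement being verified, in its usual formulation indexed by the 2-power conductor. This indexing is our reading of how $k$ is used in the abstract. The field $K_k$ below is the $(k-2)$-th layer of the cyclotomic $\mathbb{Z}_2$-extension of $\mathbb{Q}$, which is the tower mentioned above.

```latex
\textbf{Weber's conjecture.} For every integer $k \ge 2$, let
\[
  K_k = \mathbb{Q}(\zeta_{2^k})^{+}
      = \mathbb{Q}\bigl(\zeta_{2^k} + \zeta_{2^k}^{-1}\bigr),
  \qquad [K_k : \mathbb{Q}] = 2^{k-2}.
\]
Then the class number of $K_k$ is $h(K_k) = 1$; equivalently, every ideal of
the ring of integers $\mathbb{Z}\bigl[\zeta_{2^k} + \zeta_{2^k}^{-1}\bigr]$ is principal.
```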

Updated: 2026-05-07 13:53:40

Categories: cs.CR,quant-ph

Download: http://arxiv.org/abs/2604.15858v2

Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

We derive a tight analysis of the trade-off function for Differentially Private Stochastic Gradient Descent (DP-SGD) with subsampling based on random shuffling within the $f$-DP framework. Our analysis covers the regime $\sigma \geq \sqrt{3/\ln M}$, where $\sigma$ is the noise multiplier and $M$ is the number of rounds within a single epoch. Unlike $f$-DP analyses for Poisson subsampling, which yield implicit, non-closed-form formulas that can be computed numerically but are not transparent, random shuffling admits a tight analysis yielding transparent and interpretable closed-form bounds. Our concrete bounds, derived via the Berry-Esseen theorem, are tight up to constant factors within the proof framework. We demonstrate worked parameter settings for a single epoch ($E=1$) with a corresponding trade-off function $\geq 1-a-\delta$, that is, only $\delta$ below the ideal random-guessing diagonal $1-a$: for $\delta = 1/100$ and $\sigma = 1$, roughly $M \approx 1.14\times 10^6$ rounds and $N \approx 1.14\times 10^7$ training samples suffice to achieve meaningful differential privacy. This contrasts with recent negative results for the regime $\sigma \leq 1/\sqrt{2 \ln M}$. Our concrete bounds can be composed over multiple epochs, so that $\delta$ grows linearly in $E$, which restricts $E = O(\sqrt{M})$. To go beyond Berry-Esseen, we introduce a new proof technique based on a generalization of the law of large numbers that yields an asymptotic random-guessing diagonal-limit result: if $E = c_M^2 M$ with $c_M \to 0$, then the $E$-fold composed trade-off function satisfies $f^{\otimes E}(a) \to 1-a$ uniformly in $a \in [0,1]$, with $\delta$ depending on $E$ only as $O(\sqrt{E})$. We compare this asymptotic regime with the corresponding Poisson subsampling asymptotics and highlight the characterization of explicit convergence rates as an open question.
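
The following is a quick numerical check of the worked example against the two regime thresholds quoted above. One assumption is ours, not the abstract's: an epoch shuffles the $N$ samples into $M$ equal batches, so the batch size is $N/M$.

```python
import math


def shuffling_regime(sigma, M, N):
    """Evaluate the regime bounds from the abstract for given sigma, M, N.

    Assumes one epoch = a random shuffle of N samples split into M batches,
    so the batch size is N / M.
    """
    lower = math.sqrt(3.0 / math.log(M))           # analyzed regime: sigma >= lower
    negative = 1.0 / math.sqrt(2.0 * math.log(M))  # negative results: sigma <= negative
    return {
        "analyzed_threshold": lower,
        "in_analyzed_regime": sigma >= lower,
        "negative_threshold": negative,
        "batch_size": N / M,
        "sqrt_M": math.sqrt(M),  # scale of the E = O(sqrt(M)) epoch restriction
    }


r = shuffling_regime(sigma=1.0, M=1.14e6, N=1.14e7)
print(r)
```

With $M \approx 1.14 \times 10^6$, the analyzed regime needs $\sigma \gtrsim 0.464$ and the negative regime covers $\sigma \lesssim 0.189$. The setting $\sigma = 1$ is therefore well inside the analyzed regime. Under the stated assumption the implied batch size is about 10, and $\sqrt{M} \approx 1068$.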

Updated: 2026-05-07 13:35:43

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2605.06259v1

Fairness in Token Delegation: Mitigating Voting Power Concentration in DAOs

Decentralized Autonomous Organizations (DAOs) aim to enable participatory governance, but in practice they face voter apathy, concentration of voting power, and misaligned delegation. Existing delegation mechanisms often reinforce visibility biases, where a small set of highly ranked delegates accumulate disproportionate influence regardless of their alignment with the broader community. In this paper, we conduct an empirical study of delegation in off-chain DAO governance discussions from 14 DAO forums. We develop a methodology to link forum participants to on-chain addresses, extract governance interests using large language models, and compare these interests against delegates' historical behavior. Our analysis reveals that delegations are frequently misaligned with token holders' expressed priorities and that current ranking-based interfaces exacerbate power concentration. We argue that incorporating interest alignment into delegation processes could mitigate these imbalances and improve the representativeness of DAO decision-making. To support future research, we will release our dataset and code in a public repository.
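
One standard way to quantify the voting-power concentration discussed here is the Nakamoto coefficient: the smallest number of delegates who together control a majority. It is not necessarily the metric the paper uses. A minimal sketch:

```python
def nakamoto_coefficient(voting_power):
    """Smallest number of delegates whose combined power exceeds 50% of the total."""
    total = sum(voting_power)
    running = 0.0
    for count, power in enumerate(sorted(voting_power, reverse=True), start=1):
        running += power
        if running > total / 2:
            return count
    return 0  # empty or all-zero voting power
```

A lower value means more concentrated control. For example, two delegates holding 40% and 30% of delegated power already form a majority, so the coefficient is 2.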

Updated: 2026-05-07 13:28:41

Categories: cs.CR

Download: http://arxiv.org/abs/2510.05830v2

Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents

Large Language Models (LLMs) have revolutionized how information is collected, aggregated, and reasoned about. However, they also enable a novel and accessible vector of privacy intrusion: automated, in-depth personal profiling, which engenders a chilling effect of "peepers everywhere". Existing research primarily focuses on the LLM training pipeline, emphasizing the exposure of Personally Identifiable Information (PII) through memorization, while privacy studies from a human-centric perspective remain underexplored. To fill this gap, we empirically investigate real-world privacy perception through the lens of human awareness and the practices of LLM-integrated platforms, revealing a significant dissonance: platforms fail to address public privacy concerns either technically or in policy. To enable a systematic and quantifiable study of privacy risk, we propose PrivacyIceberg, which categorizes real-world human privacy risks into three tiers (explicitly searched, contextually inferred, and deeply aggregated) according to the sophistication of LLM exploitation. We developed IcebergExplorer to audit privacy exposure: using minimal PII as a search seed, it reconstructs high-fidelity profiles in real-world scenarios, achieving over 90% factual accuracy within 10 minutes at a cost of under 3 USD. Additionally, we identify six root causes of such privacy disclosures and propose multi-stakeholder countermeasures for LLM vendors, individuals, and data publishers.

Updated: 2026-05-07 13:21:44

Categories: cs.CR

Download: http://arxiv.org/abs/2605.06232v1

ClawGuard: Out-of-Band Detection of LLM Agent Workflow Hijacking via EM Side Channel

Autonomous LLM agents face a critical security risk known as workflow hijacking, in which attackers subtly alter tool and skill invocations. Existing defenses rely on host-internal telemetry (such as audit logs), which can be forged if the host OS is compromised. To address this, we introduce ClawGuard, a passive, out-of-band monitor that audits LLM-agent workflows using electromagnetic (EM) emanations. Because distinct agent skills create unique hardware usage patterns (computation, DRAM, network blocking), they emit measurable, macroscopic EM envelopes, which external software-defined radios (SDRs) capture. Using a drift-aware pipeline with 320-dimensional features, ClawGuard converts RF streams into physical evidence. Evaluated on a 7.82 TB RF corpus, ClawGuard achieves an AUC of 0.9945, detecting attacks with a 100% true-positive rate and a 1.16% false-positive rate. These results show that passive EM sensing is a practical, forgery-resistant physical check against compromised host software.
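
The "EM envelope" idea can be illustrated with a small feature extractor over complex baseband (IQ) samples such as an SDR produces. This is a hypothetical sketch, far simpler than ClawGuard's 320-dimensional, drift-aware pipeline. The window length and the three statistics are assumptions, and the second helper only shows how true- and false-positive rates are read off a score threshold.

```python
import numpy as np


def envelope_features(iq, window):
    """Per-window statistics of the amplitude envelope of complex IQ samples."""
    env = np.abs(np.asarray(iq))                   # instantaneous amplitude |I + jQ|
    n = len(env) // window                         # drop the incomplete tail window
    frames = env[: n * window].reshape(n, window)
    return np.column_stack([frames.mean(axis=1), frames.std(axis=1), frames.max(axis=1)])


def detection_rates(scores, labels, threshold):
    """True- and false-positive rates when flagging scores >= threshold as attacks."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    flagged = scores >= threshold
    return flagged[labels == 1].mean(), flagged[labels == 0].mean()
```

Working on the amplitude envelope rather than raw IQ discards carrier phase. This is one reason coarse, macroscopic activity patterns can be compared across captures.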

Updated: 2026-05-07 13:12:26

Categories: cs.CR

Download: http://arxiv.org/abs/2605.06205v1

FIT to Forget: Robust Continual Unlearning for Large Language Models

While large language models (LLMs) exhibit remarkable capabilities, they increasingly face demands to unlearn memorized privacy-sensitive, copyrighted, or harmful content. Existing unlearning methods primarily focus on single-shot scenarios, whereas real-world deletion requests arrive continually. Naïvely applying these methods to sequential requests leads to severe utility degradation and catastrophic forgetting. To address this, we propose FIT, a robust continual unlearning framework to process high-volume sequential deletion streams while resisting both catastrophic forgetting and post-unlearning recovery. FIT stabilizes sequential updates through three synergistic mechanisms: redundancy Filtering (F), Importance-aware adaptive algorithm selection (I), and Targeted layer attribution (T). Furthermore, to facilitate rigorous evaluation, we introduce PCH, a unified benchmark encompassing Personal (P), Copyrighted (C), and Harmful (H) content, alongside two symmetric metrics, Forget Degree (F.D.) and Retain Utility (R.U.), to systematically quantify forgetting-utility trade-offs. Extensive experiments across five LLMs (up to 14B parameters) demonstrate that FIT consistently achieves state-of-the-art unlearning efficacy and utility preservation. Notably, even after hundreds of sequential requests, FIT preserves strong downstream (e.g., GSM8K, MMLU) performance and exhibits superior resilience against relearning and quantization recovery attacks.
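
The abstract does not define redundancy filtering precisely. One hypothetical reading is to skip a deletion request whose embedding is nearly identical to one already processed. A minimal sketch follows; the embedding source and the 0.95 cosine threshold are assumptions, not FIT's settings.

```python
import numpy as np


def filter_redundant(new_embs, processed_embs, threshold=0.95):
    """Indices of new requests to keep; near-duplicates (cosine >= threshold) of
    earlier requests, including earlier ones in the same batch, are skipped."""
    def unit(v):
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v)

    bank = [unit(e) for e in processed_embs]
    kept = []
    for i, e in enumerate(new_embs):
        u = unit(e)
        if any(float(u @ b) >= threshold for b in bank):
            continue  # already covered by a previous unlearning update
        kept.append(i)
        bank.append(u)
    return kept
```

Skipping near-duplicates reduces the number of sequential updates applied to the model. Fewer updates is one plausible way such a filter could limit accumulated utility loss over a long deletion stream.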

Updated: 2026-05-07 12:54:28

Categories: cs.CL,cs.AI,cs.CR,cs.LG

Download: http://arxiv.org/abs/2601.21682v2

Stateful Agent Backdoor

Existing backdoor attacks on Large Language Model-based agents remain stateless, executing fixed behaviors confined to a single session. We propose a stateful agent backdoor that extends the attack lifecycle across multiple sessions under permission isolation. The attack maintains state through persistent components, enabling autonomous, incremental execution across sessions following a one-time trigger injection. Formally, we model the attack as a Mealy machine and derive a decomposition framework that enables independent per-transition data construction. We instantiate this framework with a primary attack and two extensibility variants. The primary instantiation achieves an attack success rate of 80% to 95% across four models, with per-transition analysis demonstrating the effectiveness of the decomposition. Extensibility variants with alternative topologies and persistent components demonstrate consistent effectiveness. Code and data are available at https://anonymous.4open.science/r/stateful_agent_backdoor-E89F.
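
To make the Mealy-machine formalism concrete: in a Mealy machine the output depends on both the current state and the current input. The sketch below is a generic Mealy machine in Python. The states ("idle", "stage1", ...), inputs ("trigger", "session"), and outputs are hypothetical placeholders chosen for illustration and do not reproduce the paper's attack stages.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

State = Hashable
Symbol = Hashable


class MealyMachine:
    """Generic Mealy machine: each output depends on (current state, input)."""

    def __init__(self, start: State,
                 delta: Dict[Tuple[State, Symbol], Tuple[State, Symbol]]) -> None:
        # delta maps (state, input) -> (next_state, output)
        self.state = start
        self.delta = delta

    def step(self, symbol: Symbol) -> Symbol:
        # Unlisted (state, input) pairs keep the state and emit a neutral output.
        next_state, output = self.delta.get((self.state, symbol), (self.state, "noop"))
        self.state = next_state
        return output

    def run(self, symbols: Iterable[Symbol]) -> List[Symbol]:
        return [self.step(s) for s in symbols]


# Hypothetical example: a one-time trigger leaves the idle state, and each
# later session advances one stage using the persisted state.
transitions = {
    ("idle", "trigger"): ("stage1", "write_state"),
    ("stage1", "session"): ("stage2", "partial_action"),
    ("stage2", "session"): ("done", "final_action"),
}
m = MealyMachine("idle", transitions)
print(m.run(["session", "trigger", "session", "session", "session"]))
```

Each entry of `transitions` is a single transition. Under the assumption that the paper's decomposition works at this granularity, training data for each entry could be built separately, which is what "independent per-transition data construction" suggests.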

Updated: 2026-05-07 12:48:40

Categories: cs.CR

Download: http://arxiv.org/abs/2605.06158v1

Secure Seed-Based Multi-bit Watermarking for Diffusion Models from First Principles

The rapid emergence of generative image models has led to the development of specialized watermarking techniques, particularly in-generation methods such as seed-based embedding. However, current evaluations in this area remain largely empirical, making them heavily reliant on the specific model architectures used for generation and inversion. This prevents clear conclusions about the performance of any method, especially regarding security, for which a rigorous definition is lacking. In contrast, we argue that the effectiveness of a watermarking scheme should be established purely through thorough theoretical analysis. This is made possible by decoupling the model-dependent part from the actual decision mechanism of the watermarking system. Using this decoupling, we introduce a formal evaluation framework based on security, robustness, and fidelity. This allows precise comparisons between watermarking systems through a characteristic surface representing the trade-off between these three quantities, independent of any generative model. Based on this framework, we propose SSB, a novel watermarking method that generalizes previous seed-based methods by making any security-robustness-fidelity regime on its characteristic surface reachable. This work opens the door to the design of modern watermarking systems with theoretical guarantees that do not require costly empirical evaluations.
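
As background on seed-based (in-generation) watermarking: the message is hidden in the initial noise from which the diffusion model starts. The sketch below shows one simple generic construction, which is not SSB. Each bit sets the sign of one Gaussian coordinate, so the latent is still standard normal when bits are uniform, and the decoder reads the signs back. The generate-then-invert round trip is replaced by additive Gaussian noise, and the latent size and noise level are arbitrary assumptions.

```python
import random
from typing import List


def embed_bits(bits: List[int], rng: random.Random) -> List[float]:
    """Sample a latent whose coordinate signs encode the message bits."""
    latent = []
    for b in bits:
        magnitude = abs(rng.gauss(0.0, 1.0))  # half-normal magnitude
        latent.append(magnitude if b == 1 else -magnitude)
    return latent


def decode_bits(latent: List[float]) -> List[int]:
    """Read each bit back from the sign of its coordinate."""
    return [1 if x > 0 else 0 for x in latent]


def noisy_inversion(latent: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Stand-in for generation + inversion error: additive Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in latent]


rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(256)]
z = embed_bits(message, rng)
recovered = decode_bits(noisy_inversion(z, sigma=0.3, rng=rng))
bit_acc = sum(m == r for m, r in zip(message, recovered)) / len(message)
print(f"bit accuracy: {bit_acc:.3f}")
```

Raising `sigma` lowers bit accuracy. This is a toy view of the robustness axis only; the security and fidelity axes of the paper's characteristic surface are not modeled here.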

Updated: 2026-05-07 12:46:22

Categories: cs.CR,cs.CV

Download: http://arxiv.org/abs/2605.06153v1

When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents

Personalized LLM agents maintain persistent cross-session state to support long-horizon collaboration. Yet this persistence introduces a subtle but critical security vulnerability: routine user-agent interactions can gradually reshape an agent's long-term state, inadvertently weakening future confirmation boundaries, expanding tool-use defaults, and escalating autonomous behavior over time. We formalize this risk as unintended long-term state poisoning. To study it systematically, we introduce the Unintended Long-Term State Poisoning Bench (ULSPB), a bilingual benchmark comprising 350 settings spanning five assistance categories, seven interaction patterns, 24-turn routine interactions, and matched single-injection counterparts. Furthermore, we define the Harm Score (HS), a state-centric metric that quantifies authorization drift, tool-use escalation, and unchecked autonomy. Experiments on OpenClaw with four backbone LLMs demonstrate that, while single-injection is generally effective, routine conversations alone can substantially poison long-term state, primarily corrupting memory-centric artifacts. Evaluations seeded with real-world user interactions confirm that this risk is not a mere artifact of synthetic prompts. To mitigate this threat, we propose StateGuard, a lightweight post-execution defense that audits state diffs at the writeback boundary and selectively rolls back dangerous edits. Across all evaluated models, StateGuard reduces HS to near zero and lowers false-negative rates with minimal overhead; its false-positive rates are high but acceptable for a safety-first writeback defense.
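
The following is a minimal diff-and-rollback sketch in the spirit of StateGuard's writeback audit. It assumes the agent state is a flat dict. The two rules and the key names (`require_confirmation`, `auto_approve_tools`) are hypothetical stand-ins for authorization drift and tool-use escalation, not StateGuard's actual policy.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
# A rule receives (key, old_value, new_value); missing keys appear as None.
Rule = Callable[[str, Any, Any], bool]


def diff_state(old: State, new: State) -> List[Tuple[str, Any, Any]]:
    """(key, old_value, new_value) for every key that was added, removed, or changed."""
    changed = []
    for key in sorted(set(old) | set(new)):
        if (key in old, old.get(key)) != (key in new, new.get(key)):
            changed.append((key, old.get(key), new.get(key)))
    return changed


def guarded_writeback(old: State, new: State, rules: List[Rule]) -> Tuple[State, List[str]]:
    """Commit safe edits, keep the old value for flagged ones, and report rollbacks."""
    committed = dict(old)
    rolled_back = []
    for key, before, after in diff_state(old, new):
        if any(rule(key, before, after) for rule in rules):
            rolled_back.append(key)      # roll back: leave the old entry untouched
        elif key in new:
            committed[key] = new[key]    # safe addition or change
        else:
            committed.pop(key, None)     # safe deletion
    return committed, rolled_back


# Hypothetical rules for authorization drift and tool-use escalation.
def disables_confirmation(key: str, before: Any, after: Any) -> bool:
    return key == "require_confirmation" and before is True and after is not True


def expands_auto_approved_tools(key: str, before: Any, after: Any) -> bool:
    return key == "auto_approve_tools" and bool(set(after or []) - set(before or []))


old = {"require_confirmation": True, "auto_approve_tools": ["search"], "tone": "formal"}
new = {"require_confirmation": False, "auto_approve_tools": ["search", "shell"], "tone": "casual"}
state, rolled_back = guarded_writeback(old, new, [disables_confirmation, expands_auto_approved_tools])
print(state, rolled_back)
```

The benign `tone` edit is committed, while both risky edits are reverted. A broader rule set catches more poisoning (fewer false negatives) but also reverts more legitimate edits, which mirrors the safety-first trade-off described in the abstract.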

Updated: 2026-05-07 12:25:16

Categories: cs.CR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2605.06731v1

Guidance Watermarking for Diffusion Models

This paper introduces a novel watermarking method for diffusion models. It is based on guiding the diffusion process using the gradient computed from any off-the-shelf watermark decoder. The gradient computation encompasses different image augmentations, increasing robustness to attacks against which the decoder was not originally robust, without retraining or fine-tuning. Our method effectively converts any post-hoc watermarking scheme into an in-generation embedding along the diffusion process. We show that this approach is complementary to watermarking techniques that modify the variational autoencoder at the end of the diffusion process. We validate the method on different diffusion models and detectors. For a given seed and prompt, the watermarking guidance does not significantly alter the generated image, preserving both the diversity and quality of generation.
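
Here is a toy sketch of the guidance mechanism: a sample is nudged along the negative gradient of a watermark decoder's loss. Several assumptions simplify it. The "decoder" is a hypothetical fixed linear map with one logit per bit, the loss is binary cross-entropy against the target message, the denoising step itself is omitted, and no augmentations are applied. A real system would backpropagate through the actual decoder with autodiff.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def decode_logits(W: Matrix, x: Vector) -> Vector:
    """Toy linear watermark decoder: one logit per message bit."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def bce_grad_wrt_x(W: Matrix, x: Vector, bits: List[int]) -> Vector:
    """Gradient of summed binary cross-entropy w.r.t. x: W^T (sigmoid(Wx) - m)."""
    residual = [sigmoid(l) - b for l, b in zip(decode_logits(W, x), bits)]
    return [sum(W[i][j] * residual[i] for i in range(len(W))) for j in range(len(x))]


def guide(x: Vector, W: Matrix, bits: List[int], step: float, n_steps: int) -> Vector:
    """Apply n_steps of gradient descent on the decoder loss (the guidance term)."""
    for _ in range(n_steps):
        g = bce_grad_wrt_x(W, x, bits)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


rng = random.Random(1)
dim, n_bits = 64, 8
W = [[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)] for _ in range(n_bits)]
bits = [rng.randint(0, 1) for _ in range(n_bits)]
x0 = [rng.gauss(0, 1) for _ in range(dim)]
x1 = guide(x0, W, bits, step=0.5, n_steps=50)
decoded = [1 if l > 0 else 0 for l in decode_logits(W, x1)]
shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(x0, x1)))
print(decoded == bits, round(shift, 3))
```

`shift` is the size of the perturbation that guidance introduced. In this linear toy it lies entirely in the span of the decoder's rows, so most coordinates of the sample are barely touched.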

Updated: 2026-05-07 11:42:08

Categories: cs.CR,cs.CV

Download: http://arxiv.org/abs/2509.22126v2

MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents

Persistent external memory enables LLM agents to maintain context across sessions, yet its security properties remain formally uncharacterized. We formalize memory poisoning attacks on retrieval-augmented agents as a Stackelberg game, with a unified evaluation framework spanning three attack classes with escalating access assumptions. After correcting an evaluation protocol inconsistency in the triggered-query specification of Chen et al. (2024), we show that faithful evaluation increases measured attack success by $4\times$ (ASR-R: $0.25 \to 1.00$). Our primary contribution is MEMSAD (Semantic Anomaly Detection), a calibration-based defense grounded in a gradient coupling theorem: under encoder regularity, the anomaly score gradient and the retrieval objective gradient are provably identical, so any continuous perturbation that reduces detection risk necessarily degrades retrieval rank. This coupling yields a certified detection radius that guarantees correct classification regardless of adversary strategy. We prove minimax optimality via Le Cam's method, showing that any threshold detector requires $\Omega(1/\rho^2)$ calibration samples and that MEMSAD achieves this bound up to $\log(1/\delta)$ factors. We further derive online regret bounds for rolling calibration at rate $O(\sigma^{2/3}\Delta^{1/3})$, and formally characterize a discrete synonym-invariance loophole that marks the boundary of what continuous-space defenses can guarantee. Experiments on a $3 \times 5$ attack-defense matrix with bootstrap confidence intervals, Bonferroni-corrected hypothesis tests, and Clopper-Pearson validation ($n=1{,}000$) confirm that composite defenses achieve TPR $= 1.00$ and FPR $= 0.00$ across all attacks, while synonym substitution evades detection at $\Delta$ ASR-R $\approx 0$, exposing a gap that existing embedding-based defenses cannot close.
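
A calibration-based detector sets its threshold from anomaly scores of memory entries known to be clean. The sketch below uses a standard split-conformal quantile rule as a stand-in: if clean scores are exchangeable and have no ties, a new clean entry is flagged with probability at most `alpha`. The scores are hypothetical, and neither the rule nor the score reproduces MEMSAD's actual calibration procedure.

```python
import math
from typing import List, Sequence


def calibrate_threshold(clean_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: a new clean score exceeds it with prob. <= alpha."""
    n = len(clean_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return math.inf  # too few calibration samples to support this alpha
    return sorted(clean_scores)[k - 1]


def flag(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag entries whose anomaly score exceeds the calibrated threshold."""
    return [s > threshold for s in scores]


clean = [0.1 * i for i in range(1, 51)]  # hypothetical clean scores 0.1 .. 5.0
tau = calibrate_threshold(clean, alpha=0.05)
print(round(tau, 2), flag([3.0, 4.95, 7.0], tau))
```

The `math.inf` branch is a simple illustration that small calibration sets cannot support strict false-positive targets. The abstract's $\Omega(1/\rho^2)$ lower bound makes a sharper version of this point.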

Updated: 2026-05-07 11:41:14

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2605.03482v2

Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks

The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on parameters, gradients, or internal representations, we observe that they can be effectively circumvented under persistent HFT. Our analysis traces this failure to the inherent redundancy of the high-dimensional parameter space: attackers exploit optimization trajectories that are orthogonal to defense constraints to restore harmful capabilities while deceptively adhering to safety restrictions. To address this, we propose Safety Bottleneck Regularization (SBR). SBR shifts the defensive focus from the redundant parameter space to the unembedding layer, which serves as a geometric bottleneck. By anchoring the final hidden states of harmful queries to those of the safety-aligned model, SBR enables the model to maintain safe responses even under persistent HFT. Extensive experiments confirm SBR's effectiveness, demonstrating that utilizing just a single safety anchor is sufficient to reduce the Harmful Score to below 10 while preserving competitive performance on benign downstream tasks.
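
One way to write the anchoring idea as an objective is shown below. It assumes a squared $\ell_2$ penalty on final hidden states and a weight $\lambda$; the abstract specifies neither, and $\mathcal{D}_{\text{ft}}$, $\mathcal{A}$, $\theta_0$, and $\lambda$ are our notation rather than the paper's:

$$
\mathcal{L}(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}_{\text{ft}}}\big[\ell(f_\theta(x), y)\big] + \lambda\,\mathbb{E}_{x_h\sim\mathcal{A}}\big\|h^{(L)}_\theta(x_h) - h^{(L)}_{\theta_0}(x_h)\big\|_2^2
$$

In this formula, $h^{(L)}$ is the final hidden state fed to the unembedding matrix $W_U$, $\theta_0$ are the safety-aligned weights, and $\mathcal{A}$ is the set of harmful anchor queries (a single query in the abstract's headline result). Suppose $W_U$ is left unchanged and the penalty is zero. Then the logits $W_U h^{(L)}$ at each anchor's final position match those of the aligned model, regardless of which redundant parameter directions fine-tuning moves along.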

Updated: 2026-05-07 10:47:53

Categories: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2605.05995v1

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

LLM agents rely on prompts to implement task-specific capabilities on top of foundation LLMs, making agent prompts valuable intellectual property. However, in untrusted deployments, adversaries can copy these prompts and reuse them with other proprietary LLMs, causing economic losses. We identify four key challenges for protecting these prompts that existing approaches fail to address: proactivity, runtime protection, usability, and non-portability. We present PragLocker, a prompt protection scheme that satisfies these requirements. PragLocker constructs function-preserving obfuscated prompts by anchoring semantics with code symbols and then using target-model feedback to inject noise, yielding prompts that work only on the target LLM. Experiments across multiple agent systems, datasets, and foundation LLMs show that PragLocker substantially reduces cross-LLM portability, maintains target performance, and remains robust against adaptive attackers.

Updated: 2026-05-07 10:19:06

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.05974v1

Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows

GitHub Continuous Integration (CI) workflows increasingly integrate Large Language Models (LLMs) to automate review, triage, content generation, and repository maintenance. This creates a new attack surface: externally controllable workflow inputs can shape LLM prompts and outputs, which may in turn affect security decisions, repository state, or privileged execution. Although LLM security and CI security have each been studied extensively, their intersection remains underexplored. In this paper, we present the first study of LLM-induced security risks in GitHub CI workflows. We characterize the problem along the full execution chain and develop a taxonomy of high-level risk classes and concrete threat vectors. To detect such risks in practice, we design Heimdallr, a hybrid analysis framework that normalizes workflows into an LLM-Workflow Property Graph (L-WPG) and combines triggerability analysis, LLM-assisted dataflow summarization, and deterministic propagation to synthesize concrete threat-vector findings. Evaluated on 300 manually annotated unique workflows, Heimdallr achieves high accuracy on LLM-node identification (F1 = 0.994), triggerability classification (99.8%), and threat-vector detection (micro-averaged F1 = 0.917). As part of an ongoing detection and disclosure effort, we have so far responsibly disclosed 802 vulnerable workflow instances across 759 repositories and received 71 acknowledgments.
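
Triggerability analysis asks whether someone outside the repository can start a workflow. The sketch below is a deliberately simplified version that runs on an already-parsed workflow dict. The event names are real GitHub Actions events, but the choice of which ones count as "external" is our assumption. The sketch also ignores branch filters, `if:` guards, and token permissions, all of which a real analysis such as Heimdallr's would need to consider.

```python
from typing import Any, Dict, List, Set

# Assumption: events whose payload (PR content, issue/comment text) can be
# authored by people outside the repository.
EXTERNAL_EVENTS: Set[str] = {
    "pull_request", "pull_request_target", "issues", "issue_comment",
    "pull_request_review", "pull_request_review_comment",
    "discussion", "discussion_comment",
}


def trigger_events(workflow: Dict[Any, Any]) -> List[str]:
    """Normalize the `on:` field (string, list, or mapping) into event names."""
    # YAML 1.1 loaders such as PyYAML turn an unquoted `on` key into True.
    on = workflow.get("on", workflow.get(True))
    if on is None:
        return []
    if isinstance(on, str):
        return [on]
    if isinstance(on, (list, dict)):
        return [str(e) for e in on]
    raise TypeError(f"unexpected `on` value: {on!r}")


def externally_triggerable(workflow: Dict[Any, Any]) -> List[str]:
    """Events in this workflow that an outside party could fire."""
    return sorted(e for e in trigger_events(workflow) if e in EXTERNAL_EVENTS)


# Hypothetical workflow as PyYAML would load it (note the True key).
wf = {True: {"issue_comment": {"types": ["created"]},
             "schedule": [{"cron": "0 0 * * *"}]},
      "jobs": {}}
print(externally_triggerable(wf))
```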

Updated: 2026-05-07 10:16:26

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2605.05969v1

Toward Space-Based Public Key Systems: Enabling Secure Space Communications through In-Orbit Trust Services

The New Space era has led to a rapid increase in the number of satellites operated by independent entities in near-Earth orbit. This shift enables richer space services but also requires secure, near-real-time coordination, making efficient authentication of space assets critical for next-generation missions. Traditional ground-dependent Public Key Infrastructure (PKI) suffers from latency and operational bottlenecks that limit scalability and availability in dynamic space environments. This paper proposes architectural designs for space-based PKI that shift certificate management and validation from ground infrastructure into space, reducing reliance on ground stations while enabling interoperability and cross-entity collaboration. Two deployment schemes are introduced: a space-ground integrated PKI with in-orbit validation authorities, and a fully autonomous space-based PKI with in-space issuance and validation. We analyze deployment trade-offs in scalability, availability, security, cost, and operational complexity in multi-operator environments. We also provide a baseline latency analysis to illustrate the performance implications of in-orbit trust management.

Updated: 2026-05-07 09:57:35

Categories: cs.CR,cs.ET

Download: http://arxiv.org/abs/2605.05948v1

Binary Image-Based Intrusion Detection for Operational Technology Networks: Extending the SPHBI Methodology from IoT to Modbus TCP

This paper extends the Single Packet Header Binary Image (SPHBI) intrusion detection methodology from IoT to Modbus TCP, evaluating five approaches spanning a gradient of protocol depth on the CIC Modbus 2023 dataset (11.4 million packets, eight detectable attack types). TCP/IP headers alone achieve only 51.8% binary accuracy, confirming that the header-level heterogeneity exploited in IoT traffic is absent in uniform SCADA environments. Adding eight bytes of application-layer information improves binary accuracy to 98.1% with just 63 parameters, directly relevant to per-packet classification on resource-constrained OT edge devices. The best-performing approach achieves 94.4% ± 2.2 pp multiclass accuracy across nine classes (95% CI [92.9%, 95.9%], 10 seeds) with 56,873 parameters, roughly 430 times fewer than comparable ResNet50-based approaches. Per-class recall analysis shows that seven of the eight detectable attack types are identified with recall above 94%, while replay attacks remain structurally undetectable by any single-packet method.
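
As its name indicates, SPHBI renders header bytes as a binary image. The sketch below unpacks bytes MSB-first into a 0/1 grid with one byte per row. The layout is an assumption, as is the byte selection: the 7-byte Modbus TCP MBAP header plus the 1-byte function code is one plausible choice of "eight bytes of application-layer information". The paper's exact layout and bytes may differ.

```python
from typing import List


def bytes_to_binary_image(data: bytes, row_bytes: int = 1) -> List[List[int]]:
    """Unpack bytes MSB-first into a 2-D 0/1 grid with row_bytes bytes per row."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    width = 8 * row_bytes
    if len(bits) % width:
        bits += [0] * (width - len(bits) % width)  # zero-pad the last row
    return [bits[i:i + width] for i in range(0, len(bits), width)]


# Hypothetical Modbus TCP request prefix: MBAP header (transaction id 0x0001,
# protocol id 0x0000, length 0x0006, unit id 0x11) + function code 0x03
# (Read Holding Registers).
mbap_and_fc = bytes([0x00, 0x01, 0x00, 0x00, 0x00, 0x06, 0x11, 0x03])
image = bytes_to_binary_image(mbap_and_fc)
for row in image:
    print("".join("#" if b else "." for b in row))
```

In this layout each header field occupies its own rows, with the function code in the last row. A small classifier can therefore treat each field as a distinct spatial feature of the 8×8 image.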

Updated: 2026-05-07 09:46:06

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2605.04250v2

Backdoor Mitigation in Object Detection via Adversarial Fine-Tuning

Backdoor attacks can implant malicious behaviours into deep models while preserving performance on clean data, posing a serious threat to safety-critical vision systems. Although backdoor mitigation has been studied extensively for image classification, defenses for object detection remain comparatively underdeveloped. Adversarial fine-tuning is a common backdoor mitigation approach in classification, but adapting it to detection is nontrivial for two reasons: classification-oriented adversarial generation does not match the detection attack space, where attacks may cause object misclassification or disappearance; and standard detection losses can dilute the repair signal across many predictions. We address these challenges through a detection-aware adversarial fine-tuning framework that mitigates object-detection backdoors when the defender has access only to a compromised detector and a small clean dataset, without knowing the attack objective. For adversarial generation that does not require knowledge of the attack objective, we introduce soft-branch minimisation, which uses a soft gate to combine objectives aligned with misclassification and disappearance attacks, together with a detection-aware classification-loss maximisation. For targeted repair, we introduce a dual-objective fine-tuning loss applied to target-matched predictions, concentrating the defensive update on the predictions most relevant to the backdoor behaviour. Experiments across CNN- and Transformer-based detectors show that, compared with classification-oriented baselines, our approach more effectively reduces attack success while preserving true detections, and it maintains competitive clean detection performance.

Updated: 2026-05-07 09:33:06

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2605.05928v1

Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models

Diffusion Models (DMs) have achieved remarkable success in image generation, yet recent studies reveal their vulnerability to backdoor attacks, where adversaries manipulate outputs via covert triggers embedded in inputs. Existing defenses, such as backdoor detection and trigger inversion, are largely effective because prior attacks rely on limited input spaces and low-dimensional triggers that are visually conspicuous or easily captured by neural detectors. To broaden the threat landscape, we propose Gungnir, a novel backdoor attack that activates malicious behaviors through style-based triggers embedded in input images. Unlike explicit visual patches or textual cues, stylistic features serve as stealthy, high-level triggers. We introduce Reconstructing-Adversarial Noise (RAN) and Short-Term Timesteps-Retention (STTR) to preserve trigger-consistent diffusion dynamics in image-to-image tasks. The resulting trigger-embedded samples are perceptually indistinguishable from clean images, evading both manual and automated detection. Extensive experiments show that Gungnir bypasses state-of-the-art defenses with an extremely low backdoor detection rate (BDR) and remains effective under fine-tuning-based purification, revealing previously underexplored vulnerabilities in diffusion models.

Updated: 2026-05-07 09:28:01

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2502.20650v5

SMI: Statistical Membership Inference for Reliable Unlearned Model Auditing

Machine unlearning (MU) is essential for enforcing the right to be forgotten in machine learning systems. A key challenge of MU is how to reliably audit whether a model has truly forgotten specified training data. Membership Inference Attacks (MIAs) are widely used for unlearned model auditing, where samples that evade membership detection are regarded as successfully forgotten. We show this assumption is fundamentally flawed: failed membership inference does not imply true forgetting. We prove that unlearned samples occupy fundamentally different positions in the feature space than non-member samples, making this alignment bias unavoidable and unobservable, which leads to systematically optimistic evaluations of unlearning performance. Meanwhile, training shadow models for MIA incurs substantial computational overhead. To address both limitations, we propose Statistical Membership Inference (SMI), a training-free auditing framework that reformulates auditing as estimating the non-member mixture proportion in the unlearned feature distribution. Beyond estimating the forgetting rate, SMI also provides bootstrap reference ranges for quantified auditing reliability. Extensive experiments show that SMI consistently outperforms all MIA-based baselines, with no shadow model training required. Overall, SMI establishes a principled and efficient alternative to MIA-based auditing methods, with both theoretical guarantees and strong empirical performance.
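
SMI reframes auditing as estimating what fraction of the unlearned set now behaves like non-members. The sketch below illustrates that idea under strong simplifying assumptions. It uses a 1-D score with known member and non-member means, so the non-member proportion $\pi$ solves $\bar{s}_u = \pi\mu_{\text{non}} + (1-\pi)\mu_{\text{mem}}$, and it computes a percentile bootstrap range around the estimate. This method-of-moments estimator is our stand-in, not SMI's estimator, and the data are synthetic.

```python
import random
import statistics
from typing import Sequence, Tuple


def mixture_proportion(unlearned: Sequence[float], mu_member: float,
                       mu_nonmember: float) -> float:
    """Method-of-moments estimate of the non-member fraction, clipped to [0, 1]."""
    pi = (mu_member - statistics.fmean(unlearned)) / (mu_member - mu_nonmember)
    return min(1.0, max(0.0, pi))


def bootstrap_range(unlearned: Sequence[float], mu_member: float, mu_nonmember: float,
                    n_boot: int = 1000, level: float = 0.95,
                    seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap range for the estimated proportion."""
    rng = random.Random(seed)
    data = list(unlearned)
    estimates = sorted(
        mixture_proportion(rng.choices(data, k=len(data)), mu_member, mu_nonmember)
        for _ in range(n_boot)
    )
    lo = estimates[int((1 - level) / 2 * n_boot)]
    hi = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Synthetic check: about 70% of "unlearned" scores come from the non-member distribution.
rng = random.Random(42)
mu_mem, mu_non = 2.0, 0.0
scores = [rng.gauss(mu_non if rng.random() < 0.7 else mu_mem, 1.0) for _ in range(2000)]
print(round(mixture_proportion(scores, mu_mem, mu_non), 3),
      bootstrap_range(scores, mu_mem, mu_non))
```

No shadow model is trained here. Only the score distributions are needed, which loosely reflects the training-free framing in the abstract.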

Updated: 2026-05-07 09:14:58

Categories: cs.LG,cs.AI,cs.CR,cs.CV,math.OC

Download: http://arxiv.org/abs/2602.01150v2

ActiveFlowMark: Assessing Tor Anonymity under Active Bandwidth Watermarking

Low-latency anonymity networks such as Tor remain vulnerable to infrastructure-level traffic analysis that exploits side-channel information observable from encrypted communications. We introduce NATA, a non-invasive active traffic-correlation analysis algorithm that injects distinguishable throughput patterns into traffic flows through controlled bandwidth perturbations. Unlike passive correlation methods, NATA does not require endpoint compromise, Tor-browser modification, or packet-payload decryption or modification. It can be carried out by an adversary that controls an upstream network gateway and observes traffic at adversary-controlled exit relays. To identify perturbed flows under substantial network variability, we develop BM-Net (Bandwidth Modulation Network), a selective state-space learning framework adapted for bandwidth-modulation detection. Given the limited availability of high-fidelity ground truth on real-world cross-continental Tor paths, BM-Net adopts a data-efficient learning strategy that separates self-supervised representation learning from supervised task-specific classification. It first learns reusable traffic representations through masked pre-training on serialized traffic traces, and then adapts these representations to binary perturbation detection and fine-grained modulation classification using task-specific labeled data. Through real Tor traffic measurements, BM-Net achieves a 99.65% binary detection F1 score and a 97.5% macro-F1 score for fine-grained modulation classification under our evaluated settings. In addition, tornettools-based scaled simulations are used to estimate exit-observation probability under bandwidth-weighted relay selection. These results suggest that active bandwidth perturbation can serve as an infrastructure-level side channel for traffic correlation under a clearly defined adversary model.
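
An active bandwidth watermark perturbs a flow's throughput in a known pattern, and an observer downstream later looks for that pattern. In the toy sketch below, a hypothetical ±1 code scales per-interval throughput up or down, path jitter is added, and detection is a plain Pearson correlation against the code. This correlator stands in for BM-Net, which the abstract describes as a learned state-space model. The code length, modulation depth, throughput, noise levels, and threshold are all arbitrary assumptions.

```python
import random
import statistics
from typing import List, Sequence


def modulate(base: Sequence[float], code: Sequence[int], depth: float) -> List[float]:
    """Scale each interval's throughput by (1 + depth * c) for a ±1 code c."""
    return [b * (1 + depth * c) for b, c in zip(base, code)]


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def detect(observed: Sequence[float], code: Sequence[int], threshold: float = 0.3) -> bool:
    """Declare the flow watermarked if it correlates strongly with the code."""
    return correlation(observed, code) > threshold


rng = random.Random(7)
n = 200
code = [rng.choice((-1, 1)) for _ in range(n)]


def noisy_flow() -> List[float]:
    # Hypothetical per-interval throughput (e.g., Mbit/s) with natural variability.
    return [max(0.0, rng.gauss(10.0, 2.0)) for _ in range(n)]


marked = modulate(noisy_flow(), code, depth=0.3)
observed_marked = [x + rng.gauss(0, 1.0) for x in marked]        # path jitter
observed_clean = [x + rng.gauss(0, 1.0) for x in noisy_flow()]   # unmarked flow
print(detect(observed_marked, code), detect(observed_clean, code))
```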

Updated: 2026-05-07 08:57:57

Categories: cs.CR

Download: http://arxiv.org/abs/2605.05887v1

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

Agent Skills have become a practical way to extend LLM agents by packaging metadata, natural-language instructions, and executable resources into reusable capability bundles. However, this growing Skill ecosystem introduces a new compliance risk: a Skill may perform high-impact actions that exceed the minimum necessary scope of the user's current task, thereby violating least-privilege. Existing skill detection approaches are insufficient for this problem because it is inherently task-conditioned: the same action may be necessary under one user prompt but over-privileged under another. In this paper, we present SkillScope, a framework for fine-grained least-privilege enforcement in Agent Skills. SkillScope adopts a graph-based analysis approach that models instruction-level procedures and code-level operations as fine-grained action nodes. It extracts potential over-privilege candidates, validates them under graph-instantiated user tasks through replay-based analysis, and constrains validated over-privileged actions via control-flow privilege constraining. We evaluate SkillScope through effectiveness experiments and large-scale real-world measurement. SkillScope achieves 94.53% F1 for skill over-privilege detection. In the wild, SkillScope validates 7,039 Skills with over-privileged behaviors, showing that least-privilege violations are prevalent in current Skill ecosystems. In the privilege-constraining evaluation, SkillScope reduces triggered over-privileged action-in-task instances by 88.56% while preserving legitimate task completion.
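
The central point here is that over-privilege is task-conditioned: the same action can be necessary for one prompt and excessive for another. The toy sketch below makes this concrete. It assumes a hypothetical table mapping each task to its minimum necessary action types, treats any skill action outside that set as over-privileged, and "constrains" the skill by dropping those actions. SkillScope's graph extraction and replay-based validation are not modeled.

```python
from typing import Dict, List, Set

# Hypothetical minimum-necessary action sets per user task.
REQUIRED: Dict[str, Set[str]] = {
    "summarize_pdf": {"read_file"},
    "clean_workspace": {"read_file", "delete_file"},
}


def over_privileged(task: str, skill_actions: List[str]) -> List[str]:
    """Actions the skill would take that the task does not need (deny by default)."""
    needed = REQUIRED.get(task, set())
    return [a for a in skill_actions if a not in needed]


def constrain(task: str, skill_actions: List[str]) -> List[str]:
    """Keep only actions inside the task's minimum-necessary set."""
    needed = REQUIRED.get(task, set())
    return [a for a in skill_actions if a in needed]


actions = ["read_file", "delete_file", "http_post"]
print(over_privileged("summarize_pdf", actions))    # delete_file and http_post are excess
print(over_privileged("clean_workspace", actions))  # only http_post is excess
```

For the same skill, `delete_file` is flagged for one task but allowed for the other, which is why a task-agnostic detector cannot solve this problem.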

Updated: 2026-05-07 08:34:14

Categories: cs.CR

Download: http://arxiv.org/abs/2605.05868v1

LoopTrap: Termination Poisoning Attacks on LLM Agents

Modern LLM agents solve complex tasks by operating in iterative execution loops, where they repeatedly reason, act, and self-evaluate progress to determine when a task is complete. In this work, we show that while this self-directed loop facilitates autonomy, it also introduces a critical risk: by injecting malicious prompts into the agent's context, an adversary can distort the agent's termination judgment, making it believe the task remains incomplete and leading to unbounded computation. To understand this threat, we define and systematically characterize it as Termination Poisoning and design 10 representative attack strategies. Through an empirical study spanning 8 LLM agents and 60 tasks, we demonstrate that different LLM agents exhibit distinct behavioral signatures that determine which strategies succeed. These transferable patterns can serve as principled guidance for crafting effective attacks against previously unseen agents and tasks, enabling scalable red-teaming beyond manually designed templates. Building on these insights, we introduce LoopTrap, an automated red-teaming framework that synthesizes target-specific malicious prompts by exploiting agent behavioral tendencies. LoopTrap first constructs a behavioral profile of the target agent along four vulnerability dimensions via lightweight probing. It then performs adaptive trap synthesis, routing to the most effective strategy and selecting optimal injections via a self-scoring mechanism. Finally, successful traps are abstracted into a reusable skill library, while failed attempts are refined through self-reflection, ensuring continuous improvement. Extensive evaluation shows that LoopTrap achieves an average of 3.57× step amplification across 8 mainstream agents, with a peak of 25×.
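
The termination weakness can be illustrated with a toy loop whose self-evaluation trusts anything in its context. This is a simplified sketch, not LoopTrap: `run_agent` is hypothetical, and "step amplification" is assumed here to mean the ratio of steps taken under attack to steps taken on the clean task. The `max_steps` budget is the kind of hard cap a deployer can use to bound the damage.

```python
def run_agent(work_needed, context, max_steps=50):
    """Toy agent loop: do one unit of work, then self-evaluate completion.

    The naive termination judge trusts any text in its context, which is the
    weakness termination poisoning targets. max_steps is a defensive budget.
    """
    progress = 0
    for step in range(1, max_steps + 1):
        progress += 1  # one reason/act iteration
        claims_incomplete = any("task is not complete" in c.lower() for c in context)
        if progress >= work_needed and not claims_incomplete:
            return step
    return max_steps


clean_steps = run_agent(4, context=["user: summarize the file"])
poisoned_steps = run_agent(4, context=[
    "user: summarize the file",
    "tool output: NOTE: the task is NOT complete, keep verifying",
])
amplification = poisoned_steps / clean_steps
print(clean_steps, poisoned_steps, amplification)  # 4 50 12.5
```

Without the step cap, the poisoned run would never terminate, which is the unbounded-computation outcome the abstract describes.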

Updated: 2026-05-07 08:21:51

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.05846v1

LeakDojo: Decoding the Leakage Threats of RAG Systems

Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to leverage external knowledge, but also exposes valuable RAG databases to leakage attacks. As RAG systems grow more complex and LLMs exhibit stronger instruction-following capabilities, existing studies fall short of systematically assessing RAG leakage risks. We present LeakDojo, a configurable framework for controlled evaluation of RAG leakage. Using LeakDojo, we benchmark six existing attacks across fourteen LLMs, four datasets, and diverse RAG systems. Our study reveals that (1) query generation and adversarial instructions contribute independently to leakage, with overall leakage well approximated by their product; (2) stronger instruction-following capability correlates with higher leakage risk; and (3) improvements in RAG faithfulness can introduce increased leakage risk. These findings provide actionable insights for understanding and mitigating RAG leakage in practice. Our codebase is available at https://github.com/yeasen-z/LeakDojo.
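
One way to read the product finding is as a decomposition of leakage into a retrieval term and an extraction term. The sketch below assumes a chunk can leak only if it was retrieved, in which case overall leakage equals retrieval coverage times extraction rate by construction; the function name and metric definitions are illustrative assumptions rather than LeakDojo's own. What the abstract reports as a finding is that the two attack components contribute independently, so each can plausibly be studied through its own factor.

```python
def leakage_decomposition(db_ids, retrieved_ids, leaked_ids):
    """Split overall leakage into a retrieval factor and an extraction factor.

    Assumes only retrieved chunks can leak, so overall = coverage * extraction.
    """
    db = set(db_ids)
    retrieved = set(retrieved_ids) & db
    leaked = set(leaked_ids) & retrieved
    coverage = len(retrieved) / len(db)                               # how much the queries reach
    extraction = len(leaked) / len(retrieved) if retrieved else 0.0  # how much gets reproduced
    overall = len(leaked) / len(db)
    return coverage, extraction, overall


coverage, extraction, overall = leakage_decomposition(range(100), range(40), range(0, 40, 2))
print(coverage, extraction, overall)  # 0.4 0.5 0.2
```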

Updated: 2026-05-07 07:55:02

Categories: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2605.05818v1

LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution

LLMs are increasingly explored for malware analysis; however, current LLM-based malware attribution remains limited by unsupported indicators and insufficient code-level grounding for identifying malicious and vulnerable code segments. To address these limitations, this research introduces LCC-LLM, a code-centric benchmark dataset and evidence-grounded framework for malware attribution and multi-task static malware analysis. The proposed LCCD dataset contains approximately 34K PE samples processed through a large-scale reverse-engineering pipeline and represented using decompiled C code, assembly code, CFG/FCG artifacts, hexadecimal data, PE metadata, suspicious API evidence, and structural features. Beyond dataset construction, LCC-LLM integrates LangGraph-orchestrated static analysis with multi-source cybersecurity knowledge to support evidence-grounded malware reasoning. The framework employs a seven-layer retrieval-augmented generation pipeline, CoVe for IoC validation, and a multi-dimensional quality gate to improve factual reliability and analyst-oriented decision support. Curriculum-ordered instruction data is used to fine-tune DeepSeek-R1-Distill-Qwen-14B and Qwen3-Coder-30B-A3B using QLoRA. Evaluation across 43 malware-analysis task types achieves an average semantic similarity of 0.634, with the highest task-level performance in structured report generation, IoC extraction, vulnerability assessment, malware configuration extraction, and malware class detection. In a real-world case study using MalwareBazaar samples, the grounded pipeline achieves a 10/10 structured analysis pass rate, producing CFG/FCG evidence, MITRE ATT&CK mappings, detection guidance, and analyst-ready reports. These results show that code-centric representations, retrieval grounding, and verification-guided reasoning improve the reliability and operational usefulness of LLM-assisted malware attribution.
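
Evidence grounding for IoCs can be approximated very simply by accepting only indicators that also occur in the static-analysis evidence. This is a much weaker stand-in for the CoVe (Chain-of-Verification) step mentioned above, in which the model plans and answers its own verification questions; the regexes, which cover only IPv4 addresses and SHA-256 hashes and are deliberately loose, and the function names are assumptions.

```python
import re

# Hypothetical, deliberately loose patterns for two common IoC types.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_iocs(text):
    """Return a set of (kind, normalized value) pairs found in text."""
    return {(kind, m.group(0).lower())
            for kind, pattern in IOC_PATTERNS.items()
            for m in pattern.finditer(text)}


def grounded_iocs(llm_report, static_evidence):
    """Split IoCs claimed by the model into supported and unsupported sets."""
    claimed = extract_iocs(llm_report)
    evidence = extract_iocs("\n".join(static_evidence))
    return claimed & evidence, claimed - evidence


report = "The sample beacons to 203.0.113.7 and 198.51.100.23 over HTTPS."
evidence = ["strings: connect 203.0.113.7:443", "imports: WinHttpOpen"]
supported, unsupported = grounded_iocs(report, evidence)
print(supported)    # {('ipv4', '203.0.113.7')}
print(unsupported)  # {('ipv4', '198.51.100.23')}
```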

Updated: 2026-05-07 07:44:32

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.05807v1

AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?

AI agents that autonomously interact with external tools and environments have shown great promise across real-world applications. However, their reliance on external data exposes them to serious indirect prompt injection attacks, where malicious instructions embedded in third-party content hijack agent behaviors. To mitigate this threat, a growing number of defenses have been proposed and evaluated under existing agent security benchmarks. These benchmarks provide structured environments for comparing attacks and defenses, and have become a key driver for defense design and optimization. However, as agents move toward more complex and open-ended real-world deployments, there is a pressing need for benchmarks to become more adaptive and better reflect the dynamic environments faced by real-world agentic systems. In this work, we reveal three fundamental flaws in the current benchmarks and push the frontier along these dimensions: (i) lack of dynamic open-ended tasks, (ii) lack of helpful instructions, and (iii) simplistic user tasks. To bridge this gap, we introduce AgentDyn, a manually designed benchmark featuring 60 challenging open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life. Unlike prior static benchmarks, AgentDyn requires dynamic planning and incorporates helpful third-party instructions. Our evaluation of ten state-of-the-art defenses suggests that almost all existing defenses are either not secure enough or suffer from significant over-defense, revealing that existing defenses are still far from real-world deployment. Our benchmark is available at https://github.com/leolee99/AgentDyn.
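
Because defenses are judged here on both security and over-defense, a benchmark harness needs at least three numbers. The following sketch assumes per-case records with the fields shown and defines attack success rate over injected cases, over-defense as the fraction of injection-free cases in which helpful third-party instructions were blocked, and utility over all cases; these definitions are assumptions, not AgentDyn's official scoring.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    injected: bool          # third-party content carried a malicious injection
    attack_succeeded: bool  # the agent carried out the injected goal
    task_completed: bool    # the user's task was finished correctly
    blocked: bool           # the defense flagged or stripped third-party instructions


def summarize(results):
    injected = [r for r in results if r.injected]
    clean = [r for r in results if not r.injected]
    return {
        # Security: how often injections succeed.
        "asr": sum(r.attack_succeeded for r in injected) / len(injected) if injected else 0.0,
        # Over-defense: helpful instructions blocked when nothing was injected.
        "over_defense": sum(r.blocked for r in clean) / len(clean) if clean else 0.0,
        # Utility: tasks completed across all cases.
        "utility": sum(r.task_completed for r in results) / len(results),
    }


results = [
    CaseResult(True, True, False, False),
    CaseResult(True, False, True, True),
    CaseResult(True, False, True, True),
    CaseResult(False, False, True, False),
    CaseResult(False, False, False, True),
]
print(summarize(results))
```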

Updated: 2026-05-07 07:42:14

Categories: cs.CR

Download: http://arxiv.org/abs/2602.03117v3

Stego Battlefield: Evaluating Image Steganography Attacks and Steganalysis Defenses

Image steganography is widely used to protect user privacy and enable covert communication. However, it can also be abused by the adversary as a covert channel to bypass content moderation, disseminate harmful semantics, and even hide malicious instructions in images to elicit dangerous outputs from large models, posing a practical security risk that continues to evolve. To address the lack of a unified and systematic evaluation framework, we propose SADBench, a systematic benchmark that assesses the adversary's ability to inject harmful secrets via steganography and the defender's ability to detect such threats through steganalysis. Crucially, SADBench comprises four core tasks, namely steganography attack capability evaluation, steganalysis defense capability evaluation, efficiency evaluation, and transferability evaluation. It evaluates both image-payload and text-payload steganography across diverse cover distributions, utilizing harmful visual semantics and toxic instructions to simulate malicious attacks. Across a broad set of attacks and detectors, SADBench reveals that (i) INN and autoencoder-based methods demonstrate superior stability compared to other architectures, (ii) in-domain detection is near-perfect and cheaper than generation, (iii) a critical asymmetry exists in transferability where attacks robustly generalize to new distributions while detectors fail to adapt, and (iv) real-world threats persist on social media, where payloads either survive minimal compression or effectively adapt to aggressive compression via simulated training. Overall, SADBench establishes a systematic, reproducible, and extensible framework to quantify risks, paving the way for measurable and security-driven advancements in steganography defense.
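
For readers new to the area, the classic spatial-domain least-significant-bit (LSB) method shows what injecting a secret via steganography means at the pixel level. It is not one of the INN or autoencoder-based methods SADBench evaluates; the cover here is a synthetic byte array standing in for a flattened 8-bit grayscale image.

```python
def embed_lsb(pixels, payload):
    """Hide payload bytes in the least significant bit of each 8-bit pixel."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for payload")
    stego = bytearray(pixels)
    for idx, bit in enumerate(bits):
        stego[idx] = (stego[idx] & 0xFE) | bit  # overwrite only the lowest bit
    return stego


def extract_lsb(pixels, n_bytes):
    """Read n_bytes back from the pixel LSBs, most significant bit first."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)


cover = bytearray(range(256)) * 2  # stand-in for a flattened grayscale image
secret = b"hi"
stego = embed_lsb(cover, secret)
print(extract_lsb(stego, len(secret)))                # b'hi'
print(max(abs(a - b) for a, b in zip(cover, stego)))  # 1
```

Each pixel changes by at most one intensity level, which is visually imperceptible but leaves statistical traces that steganalysis detectors look for; lossy recompression generally destroys plain LSB payloads, which is why robustness to social-media compression matters.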

Updated: 2026-05-07 07:26:02

Categories: cs.CR,cs.CV

Download: http://arxiv.org/abs/2605.05789v1

Co-designing for Compliance: Multi-party Computation Protocols for Post-Market Fairness Monitoring in Algorithmic Hiring

Post-market fairness monitoring is now mandated to ensure fairness and accountability for high-risk employment AI systems under emerging regulations such as the EU AI Act. However, effective fairness monitoring often requires access to sensitive personal data, which is subject to strict legal protections under data protection law. Multi-party computation (MPC) offers a promising technical foundation for compliant post-market fairness monitoring, enabling the secure computation of fairness metrics without revealing sensitive attributes. Despite growing technical interest, how to operationalize MPC-based fairness monitoring in real-world hiring contexts under concrete legal, industrial, and usability constraints remains an open question. This work addresses this gap through a co-design approach integrating technical, legal, and industrial expertise. We identify practical design requirements for MPC-based fairness monitoring, develop an end-to-end, legally compliant protocol spanning the full data lifecycle, and empirically validate it in a large-scale industrial setting. Our findings provide actionable design insights as well as legal and industrial implications for deploying MPC-based post-market fairness monitoring in algorithmic hiring systems.
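
The core MPC idea can be sketched with two-party additive secret sharing: each candidate's protected attribute is split into two random-looking shares held by different servers, and only group-level aggregates are ever reconstructed. This is a textbook construction, not the co-designed protocol from the paper; it assumes two non-colluding servers, hiring decisions visible to both, a single dealer that creates all shares for brevity, and demographic parity difference as the example metric.

```python
import random

Q = 2**61 - 1  # public modulus for additive sharing (a Mersenne prime)


def share(value, rng):
    """Split an integer into two additive shares modulo Q; each share alone looks random."""
    s1 = rng.randrange(Q)
    return s1, (value - s1) % Q


def server_aggregate(decisions, shares):
    """Computed locally by one server from public decisions and its own shares only."""
    selected_in_group = sum(d * s for d, s in zip(decisions, shares)) % Q
    group_size = sum(shares) % Q
    return selected_in_group, group_size


def parity_gap(decisions, protected, seed=0):
    """Demographic parity difference (group 1 minus group 0), revealing only aggregates."""
    rng = random.Random(seed)
    shares_a, shares_b = zip(*(share(g, rng) for g in protected))
    a_sel, a_n = server_aggregate(decisions, shares_a)
    b_sel, b_n = server_aggregate(decisions, shares_b)
    sel_1, n_1 = (a_sel + b_sel) % Q, (a_n + b_n) % Q  # reconstructed totals for group 1
    sel_0, n_0 = sum(decisions) - sel_1, len(decisions) - n_1
    return sel_1 / n_1 - sel_0 / n_0


decisions = [1, 0, 1, 1, 0, 0]  # public model outputs (1 = shortlisted)
protected = [1, 1, 1, 0, 0, 0]  # sensitive attribute, never revealed per candidate
print(parity_gap(decisions, protected))
```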

Updated: 2026-05-07 07:14:34

Categories: cs.CY,cs.CR

Download: http://arxiv.org/abs/2602.01837v4

DP2Guard: A Lightweight and Byzantine-Robust Privacy-Preserving Federated Learning Scheme for Industrial IoT

Privacy-Preserving Federated Learning (PPFL) has emerged as a secure distributed Machine Learning (ML) paradigm that aggregates locally trained gradients without exposing raw data. To defend against model poisoning threats, several robustness-enhanced PPFL schemes have been proposed by integrating anomaly detection. Nevertheless, they still face two major challenges: (1) the reliance on heavyweight encryption techniques results in substantial communication and computation overhead; and (2) single-strategy defense mechanisms often fail to provide sufficient robustness against adaptive adversaries. To overcome these challenges, we propose DP2Guard, a lightweight PPFL framework that enhances both privacy and robustness. DP2Guard leverages a lightweight gradient masking mechanism to replace costly cryptographic operations while ensuring the privacy of local gradients. A hybrid defense strategy is proposed, which extracts gradient features using singular value decomposition and cosine similarity, and applies a clustering algorithm to effectively identify malicious gradients. Additionally, DP2Guard adopts a trust score-based adaptive aggregation scheme that adjusts client weights according to historical behavior, while blockchain records aggregated results and trust scores to ensure tamper-proof and auditable training. Extensive experiments conducted on two public datasets demonstrate that DP2Guard effectively defends against four advanced poisoning attacks while ensuring privacy with reduced communication and computation costs.
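
Lightweight gradient masking is commonly built from pairwise additive masks that cancel when the server sums the updates. The sketch below assumes that style of masking over a fixed-point integer ring so the cancellation is exact; it is not necessarily DP2Guard's mechanism, and the per-pair seeds are derived from client IDs purely for demonstration, whereas a real deployment would derive them from a key agreement so the server cannot recompute them.

```python
import random

MOD = 2**32     # integer ring so masks cancel exactly
SCALE = 10**6   # fixed-point scaling for float gradients


def encode(x):
    return round(x * SCALE) % MOD


def decode(v):
    v %= MOD
    if v >= MOD // 2:  # map back to the signed range
        v -= MOD
    return v / SCALE


def pair_mask(i, j, dim):
    """Mask shared by clients i < j (demo seed only; NOT secret)."""
    rng = random.Random(f"{i}-{j}")
    return [rng.randrange(MOD) for _ in range(dim)]


def mask_gradient(cid, grad, n_clients):
    out = [encode(g) for g in grad]
    for other in range(n_clients):
        if other == cid:
            continue
        mask = pair_mask(min(cid, other), max(cid, other), len(grad))
        sign = 1 if cid < other else -1  # lower id adds, higher id subtracts
        out = [(o + sign * m) % MOD for o, m in zip(out, mask)]
    return out


def aggregate(masked):
    """Server side: the pairwise masks cancel in the sum."""
    return [decode(sum(col)) for col in zip(*masked)]


grads = [[0.5, -1.25], [0.25, 0.75], [-0.5, 0.5]]
masked = [mask_gradient(cid, g, len(grads)) for cid, g in enumerate(grads)]
print(aggregate(masked))  # [0.25, 0.0]
```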

Updated: 2026-05-07 07:08:58

Categories: cs.CR,cs.DC

Download: http://arxiv.org/abs/2507.16134v2

SuperPaymaster: Eliminating Centralized Signer Authority via Asset-Oriented Abstraction to Reconcile Usability and Decentralization in Account Abstraction

Most production ERC-4337 Paymasters rely on Process-Oriented Abstraction (POA): a centralized off-chain server signs each sponsorship request and therefore acts as a potential censorship bottleneck. We propose Asset-Oriented Abstraction (AOA), which encapsulates payment capability in a persistent, user-owned on-chain asset, the "Gas Card", rather than in an off-chain signing process. Following Design Science Research, we implement SuperPaymaster on Optimism Mainnet. Its sponsorship validity is anchored in on-chain Soulbound Token state and deterministic policy rules, removing the off-chain signer as a hard validity gate. We evaluate gas cost on Optimism Mainnet using single-UserOp ERC-20 transfers (n = 50 per system). Trace-level decomposition isolates an approximately 32k-gas delta as the execution cost of eliminating centralized signing. In pure L2 execution gas, SuperPaymaster (167,830) is lower than both vendor-as-deployed commercial samples, including a 49% reduction against the DEX-routed ERC-20 baseline (328,937), because it replaces an on-chain liquidation path with an internal balance update. In total billed gas, the remaining gap to the cheapest baseline is explained primarily by bundler pricing rather than paymaster architecture. A failover simulation shows that non-cooperative relayers can be bypassed when an alternative relayer is available. These findings suggest that AOA can reduce the tension among usability, sponsorship decentralization, and economic efficiency.
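
The difference between signer-gated and asset-oriented validation can be modeled off-chain in a few lines. In this Python sketch (not Solidity, and not the ERC-4337 interfaces), sponsorship validity depends only on whether the sender holds the soulbound Gas Card and has enough prepaid credit under a fixed policy, so any relayer can evaluate it; `PaymasterState`, its fields, and the policy limit are hypothetical. The last line checks the 49% figure from the reported gas numbers.

```python
from dataclasses import dataclass, field


@dataclass
class PaymasterState:
    """Toy model of on-chain state consulted during validation (no off-chain signer)."""
    sbt_holders: set = field(default_factory=set)  # accounts holding the soulbound Gas Card
    balances: dict = field(default_factory=dict)   # prepaid gas credit per account, in wei
    max_cost_per_op: int = 10**15                  # hypothetical deterministic policy limit

    def validate(self, sender, max_cost):
        # Depends only on state and fixed rules, so any relayer can re-check it.
        return (sender in self.sbt_holders
                and max_cost <= self.max_cost_per_op
                and self.balances.get(sender, 0) >= max_cost)

    def post_op(self, sender, actual_cost):
        # Internal balance update instead of an on-chain swap or liquidation path.
        self.balances[sender] -= actual_cost


def reduction(new, baseline):
    return (baseline - new) / baseline


state = PaymasterState(sbt_holders={"0xA1"}, balances={"0xA1": 5 * 10**14})
print(state.validate("0xA1", 2 * 10**14), state.validate("0xB2", 2 * 10**14))  # True False
print(f"{reduction(167_830, 328_937):.1%}")  # 49.0%
```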

Updated: 2026-05-07 07:07:00

Categories: cs.CR,cs.DC

Download: http://arxiv.org/abs/2605.05774v1

Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection

Adversarial attacks on stochastic bandits have traditionally relied on some unrealistic assumptions, such as per-round reward manipulation and unbounded perturbations, limiting their relevance to real-world systems. We propose a more practical threat model, Fake Data Injection, which reflects realistic adversarial constraints: the attacker can inject only a limited number of bounded fake feedback samples into the learner's history, simulating legitimate interactions. We design effective attack strategies under this model, explicitly addressing both magnitude constraints (on reward values) and temporal constraints (on when and how often data can be injected). Our theoretical analysis shows that these attacks can mislead a class of bandit algorithms into selecting a target arm in nearly all rounds while incurring only sublinear attack cost. Experiments on synthetic and real-world datasets validate the effectiveness of our strategies, revealing vulnerabilities in stochastic bandit algorithms under practical adversarial scenarios.
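
The fake data injection threat model can be demonstrated against UCB1 with a deliberately simple strategy: before learning starts, the attacker appends a limited number of bounded fake rewards (all 0.0, inside the legitimate [0, 1] range) to the history of the truly best arm. This front-loaded strategy is an illustrative assumption, not one of the paper's attacks, and the sketch makes no claim about attack cost or temporal constraints.

```python
import math
import random


def ucb1(means, horizon, history=None, seed=0):
    """Run UCB1 on Bernoulli arms; `history` may already contain (possibly fake) rewards."""
    rng = random.Random(seed)
    if history:
        history = {a: list(h) for a, h in history.items()}  # do not mutate the caller's dict
    else:
        history = {a: [] for a in range(len(means))}
    pulls = [0] * len(means)
    for _ in range(horizon):
        total = sum(len(h) for h in history.values())

        def score(a):
            n = len(history[a])
            if n == 0:
                return math.inf  # try every arm at least once
            return sum(history[a]) / n + math.sqrt(2 * math.log(total) / n)

        arm = max(range(len(means)), key=score)
        history[arm].append(1.0 if rng.random() < means[arm] else 0.0)
        pulls[arm] += 1
    return pulls


means = [0.9, 0.5]  # arm 0 is truly best; arm 1 is the attacker's target
T = 1000
clean = ucb1(means, T)
fake = {0: [0.0] * 200, 1: []}  # 200 bounded fake samples injected into arm 0's history
attacked = ucb1(means, T, history=fake)
print("target pulls, clean:", clean[1], "attacked:", attacked[1])
```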

Updated: 2026-05-07 06:42:40

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2505.21938v3

SARSteer: Safeguarding Large Audio-Language Models via Safe-Ablated Refusal Steering

Large Audio-Language Models (LALMs) are becoming essential as a powerful multimodal backbone for real-world applications. However, recent studies show that audio inputs can more easily elicit harmful responses than text, exposing new risks toward deployment. While safety alignment has made initial advances in LLMs and Large Vision-Language Models (LVLMs), we find that vanilla adaptation of these approaches to LALMs faces two key limitations: 1) LLM-based steering fails under audio input due to the large distributional gap between activations, and 2) prompt-based defenses induce over-refusals on benign-speech queries. To address these challenges, we propose Safe-Ablated Refusal Steering (SARSteer), the first inference-time defense framework for LALMs. Specifically, SARSteer leverages text-derived refusal steering to enforce rejection without manipulating audio inputs and introduces decomposed safe-space ablation to mitigate over-refusal. Extensive experiments demonstrate that SARSteer significantly improves harmful-query refusal while preserving benign responses, establishing a principled step toward safety alignment in LALMs. The codes and constructed datasets are released at https://github.com/linweiii/SARSteer.
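
Text-derived refusal steering and safe-space ablation can be illustrated with plain vector arithmetic. The sketch assumes the common difference-of-means refusal direction computed from text-prompt activations, removes its component in a subspace spanned by benign activations, and adds the result to a hidden state; the toy 3-dimensional vectors, the function names, and this particular ablation are assumptions, and SARSteer's decomposition may differ.

```python
def mean_vec(vs):
    return [sum(col) / len(vs) for col in zip(*vs)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def scale(a, c):
    return [c * x for x in a]


def orthonormal_basis(vectors):
    """Gram-Schmidt; drops vectors that are (near) linearly dependent."""
    basis = []
    for v in vectors:
        for b in basis:
            v = sub(v, scale(b, dot(v, b)))
        norm = dot(v, v) ** 0.5
        if norm > 1e-9:
            basis.append(scale(v, 1 / norm))
    return basis


def refusal_direction(harmful_acts, harmless_acts):
    """Difference of mean activations over text prompts."""
    return sub(mean_vec(harmful_acts), mean_vec(harmless_acts))


def ablate(direction, safe_vectors):
    """Remove the part of the direction lying in the span of benign activations."""
    for b in orthonormal_basis(safe_vectors):
        direction = sub(direction, scale(b, dot(direction, b)))
    return direction


def steer(hidden, direction, alpha):
    return [h + alpha * d for h, d in zip(hidden, direction)]


r = refusal_direction([[2.0, 1.0, 0.0], [0.0, 1.0, 0.0]], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
r_safe = ablate(r, [[0.0, 2.0, 0.0]])
print(r, r_safe, steer([0.5, 0.5, 0.5], r_safe, alpha=2.0))
# [1.0, 1.0, 0.0] [1.0, 0.0, 0.0] [2.5, 0.5, 0.5]
```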

Updated: 2026-05-07 06:36:50

Categories: cs.SD,cs.CR

Download: http://arxiv.org/abs/2510.17633v2

$α$-Wasserstein Mechanism for Rényi Pufferfish Privacy

This paper introduces the $α$-Wasserstein mechanism for achieving Rényi Pufferfish Privacy using Laplace and Gaussian noise. By leveraging Hölder's inequality, we demonstrate that the scale parameter of the Laplace mechanism can be calibrated via an upper bound on the $W_α$ metric to satisfy $(α, ε)$-Rényi Pufferfish Privacy for $α\in (1, \infty]$. We show that in the limit $α= \infty$, this framework recovers the established $W_\infty$ mechanism for $ε$-pufferfish privacy. This result is subsequently extended to the exponential mechanism. Furthermore, we propose a $W_α$ mechanism for Gaussian noise for $α\in (1, \infty)$, demonstrating that it generalizes existing results within the Rényi Differential Privacy framework. Experimental evaluations reveal that our $α$-Wasserstein mechanism significantly reduces noise power compared to the conventional $W_\infty$-based approach, with the Gaussian mechanism providing superior utility over the Laplace mechanism. Notably, the mechanisms derived in this work achieve exact $(α, ε)$-Rényi Pufferfish Privacy without requiring additional relaxations, such as $δ$-approximations.
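
The quantities involved can be computed directly for one-dimensional empirical distributions with equal sample sizes, where sorting both samples gives an optimal coupling. The sketch computes $W_p$ for finite $p$ and $W_\infty$, plus the scale of the established $W_\infty$ mechanism, which adds Laplace noise with scale equal to the largest $W_\infty$ distance between secret-conditioned output distributions divided by $ε$. The paper's $W_α$-based calibration is not reproduced, and the `cond_samples` lists are assumed to be samples of the query output under each secret for a fixed data distribution.

```python
def wasserstein_1d(xs, ys, p):
    """W_p between two equal-size 1-D empirical distributions (sorted coupling is optimal)."""
    assert len(xs) == len(ys)
    gaps = [abs(a - b) for a, b in zip(sorted(xs), sorted(ys))]
    if p == float("inf"):
        return max(gaps)
    return (sum(g ** p for g in gaps) / len(gaps)) ** (1 / p)


def laplace_scale_w_inf(cond_samples, epsilon):
    """W_inf mechanism scale: largest pairwise W_inf over secrets, divided by epsilon."""
    w = max(wasserstein_1d(a, b, float("inf"))
            for i, a in enumerate(cond_samples) for b in cond_samples[i + 1:])
    return w / epsilon


xs = [0, 0, 0, 4]  # query outputs under secret s_i
ys = [0, 0, 0, 0]  # query outputs under secret s_j
print(wasserstein_1d(xs, ys, 1), wasserstein_1d(xs, ys, 2), wasserstein_1d(xs, ys, float("inf")))  # 1.0 2.0 4
print(laplace_scale_w_inf([xs, ys], epsilon=1.0))  # 4.0
```

The example shows $W_1 \le W_2 \le W_\infty$, the monotonicity that lets a $W_α$ bound be tighter than a $W_\infty$ bound.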

Updated: 2026-05-07 06:12:18

Categories: cs.CR

Download: http://arxiv.org/abs/2605.05723v1

STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack

Red-teaming Vision-Language Models is essential for identifying vulnerabilities where adversarial image-text inputs trigger toxic outputs. Existing approaches treat image generation as a black box, returning only terminal toxicity scores and leaving open the question of when and how toxic semantics emerge during multi-step synthesis. We introduce STARE, a hierarchical reinforcement learning framework that treats the denoising trajectory itself as the attack surface, under a direct white-box T2I and query-only black-box VLM setting. By coupling a high-level prompt editor with low-level T2I fine-tuning via Group Relative Policy Optimization (GRPO), STARE attains a 68% improvement in Attack Success Rate over state-of-the-art black-box and white-box baselines. More importantly, this trajectory-level view surfaces the Optimization-Induced Phase Alignment phenomenon: vanilla models exhibit diffuse toxicity, whereas adversarial optimization concentrates conceptual harms into early semantic phases and detail-oriented harms into late refinement. Targeted perturbations of either window selectively suppress different toxicity categories, indicating that this temporal structure is a genuine causal handle rather than a side effect of the hierarchical design. The phenomenon turns toxicity formation from a chaotic process into a small set of predictable vulnerability windows, providing both a potent attack engine and a basis for phase-aware safety mechanisms. Content warning: This paper contains examples of toxic content that may be offensive or disturbing.
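
GRPO, used here for the low-level T2I fine-tuning, replaces a learned value baseline with group statistics: several samples are drawn for the same prompt and each reward is standardized within its group. A minimal sketch of that advantage computation follows; the rewards are assumed to be, for example, toxicity scores returned by the black-box VLM, and the use of population standard deviation is an assumption (implementations differ).

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampling group."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four images generated from the same edited prompt.
print(group_relative_advantages([0.0, 0.5, 1.0, 0.5]))
```

Samples scoring above their group's mean receive positive advantages and are reinforced, so no separate critic network is needed.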

Updated: 2026-05-07 06:02:19

Categories: cs.CR

Download: http://arxiv.org/abs/2605.00699v3

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

With the rapid evolution of foundation models, Large Language Model (LLM) agents have demonstrated increasingly powerful tool-use capabilities. However, this proficiency introduces significant security risks, as malicious actors can manipulate agents into executing tools to generate harmful content. While existing defensive mechanisms are effective, they frequently suffer from the over-refusal problem, where increased safety strictness compromises the agent's utility on benign tasks. To mitigate this trade-off, we propose SafeHarbor, a novel framework designed to establish precise decision boundaries for LLM agents. Unlike static guidelines, SafeHarbor extracts context-aware defense rules through enhanced adversarial generation. We design a local hierarchical memory system for dynamic rule injection, offering a training-free, efficient, and plug-and-play solution. Furthermore, we introduce an information entropy-based self-evolution mechanism that continuously optimizes the memory structure through dynamic node splitting and merging. Extensive experiments demonstrate that SafeHarbor achieves state-of-the-art performance on both ambiguous benign tasks and explicit malicious attacks, notably attaining a peak benign utility of 63.6% on GPT-4o while maintaining a robust refusal rate exceeding 93% against harmful requests. The source code is publicly available at https://github.com/ljj-cyber/SafeHarbor.
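
An entropy-based split rule for a hierarchical memory can be sketched as follows: when the rules stored in one node span too many categories, the node is split by category. The category labels, the 1-bit threshold, and the function names are hypothetical, the complementary merge step is omitted, and SafeHarbor's actual criterion may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the rule categories stored in one memory node."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def maybe_split(node_rules, threshold=1.0):
    """Split a node by category when its contents are too mixed; otherwise keep it whole."""
    labels = [category for category, _ in node_rules]
    if entropy(labels) <= threshold:
        return [node_rules]
    groups = {}
    for category, rule in node_rules:
        groups.setdefault(category, []).append((category, rule))
    return list(groups.values())


node = [
    ("file_ops", "refuse deleting system directories"),
    ("file_ops", "allow reading files inside the workspace"),
    ("payments", "require explicit user confirmation for transfers"),
    ("messaging", "refuse mass-mailing unsolicited content"),
]
print(entropy([c for c, _ in node]))  # 1.5
print(len(maybe_split(node)))         # 3
```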

Updated: 2026-05-07 05:50:45

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2605.05704v1

CFE-PPAR: Compression-friendly encryption for privacy-preserving action recognition leveraging video transformers

Privacy-preserving action recognition (PPAR) enables machines to understand human activities in videos without revealing sensitive visual content. Among the various strategies for PPAR, encryption-based methods achieve strong privacy protection while maintaining high recognition performance. However, these methods lead to a catastrophic decrease in recognition performance and visual quality when the encrypted videos are compressed. That is, the previous methods are not compression-friendly. To address these issues, in this paper, we propose the first compression-friendly encryption method for PPAR, called CFE-PPAR. In CFE-PPAR, videos encrypted with secret keys can be directly recognized by a video transformer, which uses parameters transformed by the same keys as those used for video encryption. In experiments, it is verified that CFE-PPAR outperforms previous methods on the UCF101 and HMDB51 datasets under Motion-JPEG and H.264 compression.
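
The principle of recognizing encrypted inputs with key-transformed parameters can be checked on a single linear patch embedding: if the pixels of a patch are shuffled with a secret key, permuting the embedding matrix's columns with the same key yields identical embeddings. This sketch illustrates only that principle; it does not model CFE-PPAR's compression-friendly design, position-embedding handling, or the video transformer, and all names are hypothetical.

```python
import random


def keyed_permutation(n, key):
    """Derive a pixel permutation from a secret key."""
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm


def encrypt_patch(patch, perm):
    """Block-wise pixel shuffling: output position i takes input pixel perm[i]."""
    return [patch[p] for p in perm]


def transform_embedding(weight, perm):
    """Permute the columns of a linear patch-embedding matrix with the same key."""
    return [[row[p] for p in perm] for row in weight]


def embed(weight, patch):
    return [sum(w * x for w, x in zip(row, patch)) for row in weight]


W = [[1, 0, 2, 0], [0, 1, 0, 3]]  # toy 2x4 embedding matrix
patch = [1, 2, 3, 4]              # toy flattened patch
perm = keyed_permutation(len(patch), "secret-key")
print(embed(W, patch), embed(transform_embedding(W, perm), encrypt_patch(patch, perm)))  # [7, 14] [7, 14]
```

The equality holds for any key because the same permutation is applied to pixels and weight columns, so each pixel still meets its original weight.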

Updated: 2026-05-07 05:32:37

Categories: cs.CV,cs.AI,cs.CR

Download: http://arxiv.org/abs/2605.05692v1

VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns

The increasing prevalence of software vulnerabilities highlights the need for effective Automatic Vulnerability Repair (AVR) tools. While LLM-based approaches are promising, they struggle to incorporate structured security knowledge from sources like CWE and NVD. Current methods either use this information superficially by concatenating the CWE-ID into the input prompt, yielding negligible benefits, or rely on few-shot learning with rigid, non-generalizable examples, which limits their effectiveness in real-world scenarios. To address this gap, we propose VulKey, an LLM-based AVR framework that leverages a hierarchical abstraction of expert knowledge to guide patch generation. Our novel three-level abstraction formulates repair strategies in terms of CWE type, syntactic actions, and semantic key elements. This approach captures the essence of a security fix with greater generality than concrete examples and more semantic richness than traditional syntax-based templates, overcoming the coverage limitations of prior methods. VulKey is implemented as a two-stage pipeline: first, expert knowledge matching predicts an appropriate repair pattern for the vulnerability; second, repair code generation uses a pattern-guided, fine-tuned LLM to produce secure patches. On the real-world C/C++ dataset PrimeVul, VulKey achieves 31.5% repair accuracy, surpassing the best baseline by 7.6% and outperforming leading tools such as VulMaster and GPT-5. Moreover, VulKey demonstrates cross-language and cross-model generalizability, with state-of-the-art performance on the Java benchmark Vul4J. These results underscore the importance of structured expert knowledge in advancing AVR effectiveness. Our work demonstrates that explicitly modeling and integrating expert security knowledge through hierarchical patterns is a crucial step toward building more effective and reliable AVR tools.
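
The three-level abstraction maps naturally onto a small data structure that can be matched and then rendered into a prompt for the repair model. In the sketch below, matching by keyword overlap is a hypothetical stand-in for VulKey's expert knowledge matching stage, the pattern library is invented for illustration, and the fine-tuned LLM is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairPattern:
    cwe: str            # level 1: vulnerability type, e.g. "CWE-787"
    actions: tuple      # level 2: syntactic edit actions
    key_elements: tuple # level 3: semantic elements the fix must involve


def match_pattern(cwe, code, library):
    """Pick the pattern for this CWE whose key elements appear most often in the code."""
    candidates = [p for p in library if p.cwe == cwe]
    if not candidates:
        return None
    return max(candidates, key=lambda p: sum(e in code for e in p.key_elements))


def build_prompt(code, pattern):
    return (f"Vulnerability type: {pattern.cwe}\n"
            f"Repair actions: {', '.join(pattern.actions)}\n"
            f"Key elements: {', '.join(pattern.key_elements)}\n"
            f"Vulnerable code:\n{code}\nPatched code:")


LIBRARY = [
    RepairPattern("CWE-787", ("add_if_guard",), ("memcpy", "len", "sizeof")),
    RepairPattern("CWE-787", ("replace_call",), ("strcpy",)),
    RepairPattern("CWE-476", ("add_null_check",), ("NULL",)),
]
code = "memcpy(buf, src, len);"
best = match_pattern("CWE-787", code, LIBRARY)
print(build_prompt(code, best))
```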

Updated: 2026-05-07 05:15:42

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2605.01769v2
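
To make the three-level abstraction in the VulKey abstract more concrete, here is a minimal Python sketch of how a repair pattern (CWE type, syntactic actions, semantic key elements) might be represented and serialized into guidance for the generation stage. Everything here is an assumption for illustration. `RepairPattern`, `PATTERNS`, `match_pattern`, and `build_prompt` are hypothetical names, and the pattern contents are toy examples. The plain CWE lookup stands in for the learned expert-knowledge matching stage that the abstract describes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RepairPattern:
    """Hypothetical three-level repair pattern (illustrative, not VulKey's format)."""

    cwe: str                       # level 1: CWE type, e.g. "CWE-476"
    actions: tuple[str, ...]       # level 2: syntactic edit actions
    key_elements: tuple[str, ...]  # level 3: semantic key elements the fix must involve


# Toy pattern library; the contents are made up for illustration.
PATTERNS = [
    RepairPattern("CWE-787", ("insert_bounds_check",), ("buffer length", "write index")),
    RepairPattern("CWE-476", ("insert_null_check", "early_return"), ("pointer", "NULL")),
    RepairPattern("CWE-190", ("insert_overflow_check",), ("arithmetic operands", "type maximum")),
]


def match_pattern(predicted_cwe: str) -> RepairPattern | None:
    """Stage 1 stand-in: a plain lookup by CWE type instead of a learned matcher."""
    for pattern in PATTERNS:
        if pattern.cwe == predicted_cwe:
            return pattern
    return None


def build_prompt(code: str, pattern: RepairPattern) -> str:
    """Stage 2 input: serialize the matched pattern as guidance for a fine-tuned LLM."""
    return (
        f"Vulnerability type: {pattern.cwe}\n"
        f"Repair actions: {', '.join(pattern.actions)}\n"
        f"Key elements: {', '.join(pattern.key_elements)}\n"
        f"Vulnerable code:\n{code}\n"
        "Patched code:"
    )


pattern = match_pattern("CWE-476")
if pattern is not None:
    print(build_prompt("int n = p->len;", pattern))
```

A lookup keyed only on the CWE type is the weakest possible matcher. The point of the sketch is the shape of the pattern, which carries more structure than a bare CWE-ID concatenated into the prompt.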

AoI-Guided Client Selection for Robust and Timely Federated Intrusion Detection in Cloud-Edge Security Analytics

Federated learning (FL) is attractive for cloud-edge intrusion detection because it enables collaborative training over distributed telemetry without centralizing raw logs. In production security analytics pipelines, however, only a subset of clients participates in each round, and heterogeneous bandwidth, stragglers, and dropouts can cause the server to rely on stale client information. This paper studies client participation as a timeliness-aware systems problem using Age of Information (AoI). We compare three lightweight policies for federated intrusion detection: AoI-first, utility-first, and a hybrid AoI+utility rule with a tunable trade-off parameter. Across a CIC-IDS2017 DDoS/PortScan mini subset, NSL-KDD, ToN-IoT, and a synthetic drift benchmark under clean, poisoning, and poisoning-plus-robust-aggregation settings, AoI-aware selection reduces average AoI by about 39-41% and peak AoI by about 70% relative to random sampling while keeping the per-round communication budget fixed. The hybrid policy usually preserves Macro-F1/AUC and provides an interpretable knob for balancing freshness, detection quality, and robustness, although it is not uniformly Pareto-dominant once false positive rate is included. Robustness is evaluated by combining AoI-guided selection with trimmed-mean aggregation under label-flip poisoning; the selection policy itself is not intended as a standalone Byzantine defense. The main practical message is that cloud-edge, privacy-preserving intrusion analytics can improve timeliness through a lightweight scheduling layer without changing the underlying FL participation budget.

Updated: 2026-05-07 03:51:15

Categories: cs.CR,cs.DL

Download: http://arxiv.org/abs/2605.05644v1
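
The three selection policies compared in the AoI paper can be sketched as one scoring rule with a trade-off weight. In this minimal Python sketch (an assumption, not the authors' code), a client's AoI is the number of rounds since its update was last incorporated. The hybrid score is `lam * AoI / max AoI + (1 - lam) * utility`, with utilities assumed to be pre-normalized to [0, 1]. Setting `lam = 1` recovers AoI-first, and `lam = 0` recovers utility-first. `trimmed_mean` is a plain trimmed mean for a single parameter coordinate. The function names and the simulation are hypothetical.

```python
from __future__ import annotations

import random


def select_clients(ages: list[int], utilities: list[float], k: int, lam: float) -> list[int]:
    """Pick k clients by the hybrid score lam * AoI_norm + (1 - lam) * utility.

    lam = 1.0 gives AoI-first, lam = 0.0 gives utility-first.
    Utilities are assumed to be pre-normalized to [0, 1].
    """
    max_age = max(ages) or 1  # avoid division by zero when every age is 0
    scores = [lam * a / max_age + (1 - lam) * u for a, u in zip(ages, utilities)]
    # Highest score first; ties broken by the lower client index.
    ranked = sorted(range(len(ages)), key=lambda i: (-scores[i], i))
    return ranked[:k]


def update_ages(ages: list[int], selected: list[int]) -> list[int]:
    """Selected clients deliver fresh updates (age 0); all others age by one round."""
    chosen = set(selected)
    return [0 if i in chosen else a + 1 for i, a in enumerate(ages)]


def trimmed_mean(values: list[float], trim: int) -> float:
    """Trimmed mean for one parameter coordinate: drop the `trim` lowest and highest values."""
    if len(values) <= 2 * trim:
        raise ValueError("not enough values to trim")
    kept = sorted(values)[trim:len(values) - trim]
    return sum(kept) / len(kept)


def average_aoi(policy: str, n: int = 20, k: int = 4, rounds: int = 200, seed: int = 0) -> float:
    """Mean AoI over all clients and rounds for 'aoi' (lam = 1) or 'random' selection."""
    rng = random.Random(seed)
    ages = [0] * n
    total = 0
    for _ in range(rounds):
        if policy == "aoi":
            selected = select_clients(ages, [0.0] * n, k, lam=1.0)
        else:
            selected = rng.sample(range(n), k)
        ages = update_ages(ages, selected)
        total += sum(ages)
    return total / (rounds * n)


print(f"AoI-first mean AoI: {average_aoi('aoi'):.2f}")
print(f"random mean AoI:    {average_aoi('random'):.2f}")
```

With 20 clients and a budget of 4, AoI-first reduces to round-robin, so every client's age stays bounded. Uniform random sampling, by contrast, leaves a geometric tail of stale clients. The budget `k` is identical in both cases, which mirrors the abstract's point that freshness improves without changing the participation budget. The numbers from this toy simulation are not the paper's results.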

Architecture Matters: Comparing RAG Systems under Knowledge Base Poisoning

Retrieval-Augmented Generation (RAG) systems are vulnerable to knowledge base poisoning, yet existing attacks have been evaluated almost exclusively against vanilla retrieve-then-generate pipelines. Architectures designed to handle conflicting retrieved information (multi-agent debate, agentic retrieval, recursive language models) remain untested against adversarially optimized contradictions. We evaluate four RAG architectures (vanilla RAG, agentic RAG, MADAM-RAG, and Recursive Language Models (RLM)) under controlled single-document (N=1) poisoning on 921 Natural Questions QA pairs. We compare a clean baseline, naive injection, and CorruptRAG-AK, an adversarial attack whose meta-epistemic framing targets credibility assessment. Architecture is a high-impact variable in adversarial robustness: under CorruptRAG-AK, attack success rates range from 81.9% (vanilla) to 24.4% (RLM), a spread of nearly 58 percentage points across architectures with comparable clean accuracy (~92%). Decomposing this gap shows that, once the poisoned document is retrieved, adversarial framing rather than retrieval optimization drives most of CorruptRAG-AK's advantage for three of the four architectures. This localizes the cross-architecture vulnerability at the content-reasoning stage. Our MADAM-RAG reimplementation shows the highest apparent contradiction detection rate. However, our LLM judge over-identifies this behavior (~48.5% precision), so the reported rates are upper bounds. Regardless of detection, MADAM-RAG cannot resolve contradictions reliably and produces a 41.4% non-answer rate even on clean inputs, although divergences between our implementation and the original may contribute. We introduce a seven-category behavioral taxonomy that captures contradiction detection, hedging, and failure modes beyond binary accuracy. Code, data, and analysis notebooks are publicly available.

Updated: 2026-05-07 03:36:14

Categories: cs.CR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2605.05632v1
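
The headline spread in the RAG abstract is an absolute difference in attack success rate (ASR), not a relative one. The short Python sketch below reproduces that arithmetic from the two reported rates and shows the corresponding relative reduction. The substring-matching `attack_success_rate` helper is a hypothetical scoring rule. It does not necessarily reflect how the paper scores attacks.

```python
from __future__ import annotations


def attack_success_rate(answers: list[str], target: str) -> float:
    """Hypothetical scoring rule: fraction of answers containing the attacker's target string."""
    hits = sum(1 for answer in answers if target.lower() in answer.lower())
    return hits / len(answers)


# CorruptRAG-AK attack success rates reported in the abstract, in percent.
reported_asr = {"vanilla RAG": 81.9, "RLM": 24.4}

spread_pp = reported_asr["vanilla RAG"] - reported_asr["RLM"]  # absolute gap, percentage points
relative_drop = spread_pp / reported_asr["vanilla RAG"]        # reduction relative to vanilla

print(f"absolute spread: {spread_pp:.1f} percentage points")
print(f"relative reduction: {relative_drop:.1%}")
```

The absolute gap is 57.5 percentage points, which the abstract rounds to "nearly 58". In relative terms, RLM cuts the vanilla ASR by roughly 70%.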

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful objective in a single prompt, increasingly capable attackers can distribute their intent across multiple benign-looking turns. Recent studies show that even modern commercial models remain vulnerable to such attacks despite advances in safety alignment and external guardrails. In this work, we address this challenge by detecting the earliest turn at which delivering the candidate response would make the accumulated interaction sufficient to enable harmful action. This objective requires precise turn-level intervention that identifies the harm-enabling closure point while avoiding premature refusal of benign exploratory conversations. To support training and evaluation, we construct the Multi-Turn Intent Dataset (MTID), which contains branching attack rollouts, matched benign hard negatives, and annotations of the earliest harm-enabling turns. We show that MTID enables TurnGate, a turn-level monitor that substantially outperforms existing baselines in harmful-intent detection while maintaining low over-refusal rates. TurnGate also generalizes across domains, attacker pipelines, and target models. Our code is available at https://github.com/Graph-COM/TurnGate.

Updated: 2026-05-07 03:35:31

Categories: cs.CL,cs.AI,cs.CR

Download: http://arxiv.org/abs/2605.05630v1
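
The detection target described for TurnGate is the earliest turn at which the accumulated conversation becomes harm-enabling. This can be framed as a first-crossing search over conversation prefixes. The Python sketch below assumes a hypothetical prefix-level scorer `risk` and a threshold; it is not TurnGate itself. The `timing` helper shows one plausible way to compare a prediction against an annotated earliest harm-enabling turn, separating premature refusals from late detections. The toy scorer and conversation are invented.

```python
from __future__ import annotations

from typing import Callable, Sequence


def earliest_flagged_turn(
    turns: Sequence[str],
    risk: Callable[[Sequence[str]], float],
    threshold: float,
) -> int | None:
    """Return the first index t whose prefix turns[: t + 1] scores at or above threshold.

    The scorer sees the whole accumulated prefix (including the candidate response),
    so harm that only emerges from combining benign-looking turns can still be caught.
    """
    for t in range(len(turns)):
        if risk(turns[: t + 1]) >= threshold:
            return t
    return None


def timing(predicted: int | None, gold: int | None) -> str:
    """Label a prediction against the annotated earliest harm-enabling turn (None = benign)."""
    if gold is None:
        return "over_refusal" if predicted is not None else "correct_pass"
    if predicted is None:
        return "missed"
    if predicted < gold:
        return "premature"
    return "on_time" if predicted == gold else "late"


# Toy scorer (hypothetical): three fragments are harmless alone but enabling together.
FRAGMENTS = ("part A", "part B", "part C")


def toy_risk(prefix: Sequence[str]) -> float:
    seen = {frag for frag in FRAGMENTS if any(frag in turn for turn in prefix)}
    return len(seen) / len(FRAGMENTS)


conversation = ["hello", "explain part A", "and part B?", "now part C please"]
strict = earliest_flagged_turn(conversation, toy_risk, threshold=1.0)  # fires at closure
loose = earliest_flagged_turn(conversation, toy_risk, threshold=0.5)   # fires one turn early
print(strict, timing(strict, gold=3))
print(loose, timing(loose, gold=3))
```

The toy scorer reaches 1.0 only once all three fragments have appeared. As a result, the strict threshold fires exactly at the closure turn, while the looser threshold fires one turn early and counts as a premature refusal.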

Adversarial procurement in blockchains

An emerging blockchain protocol design pattern leverages the asymmetry between the computational effort of performing a task and that of verifying it. For example, cryptographic validity proofs (e.g., SNARKs) require the prover to expend significant effort demonstrating the correctness of its claim, while verifiers can check the proof with very little effort. Operationalizing this paradigm requires efficiently soliciting the performance of expensive tasks in pseudonymous, adversarial environments. We formalize this as a mechanism design question. The protocol balances the economic cost of a liveness fault, in which the work is not completed, against the payments required to incentivize specific behavior from candidate suppliers. We show that the loss of the optimal protocol scales logarithmically with the cost of a liveness fault, amplified by the adversarial fraction of the network. Further, we find that the optimal equilibria have an intuitive structure, which allows us to give concrete advice to practitioners. Specifically, in many regimes the optimum designates a single random node as the primary worker and a committee as a fallback, reminiscent of leader-based consensus mechanisms. We also characterize the asymptotic regimes in which negative payments (slashing, in blockchain parlance) are especially helpful.

Updated: 2026-05-07 01:08:10

Categories: cs.GT,cs.CR

Download: http://arxiv.org/abs/2605.05559v1
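
A toy model helps show why a logarithmic loss is plausible. The Python sketch below is an assumption and not the paper's mechanism: it ignores incentives, equilibria, and slashing. In this model, each sampled node is adversarial independently with probability `f` and then fails to deliver. The protocol pays `c` to a primary worker and to each of `k` fallback nodes, and a liveness fault costing `L` occurs only if all `k + 1` nodes fail. Minimizing `c * (k + 1) + L * f**(k + 1)` over `k` gives an optimum near `ln(L * ln(1/f) / c) / ln(1/f)` total nodes. The loss therefore grows like `c * ln(L) / ln(1/f)`, which increases as the adversarial fraction grows. The function names are hypothetical.

```python
from __future__ import annotations

import math


def toy_loss(k: int, f: float, c: float, L: float) -> float:
    """Expected loss: pay c to the primary and to each of k fallback nodes, plus L times
    the probability that all k + 1 independently sampled nodes are adversarial."""
    return c * (k + 1) + L * f ** (k + 1)


def best_committee(f: float, c: float, L: float, k_max: int = 1000) -> tuple[int, float]:
    """Brute-force the fallback committee size k that minimizes the toy loss."""
    return min(((k, toy_loss(k, f, c, L)) for k in range(k_max + 1)), key=lambda kv: kv[1])


def approx_total_nodes(f: float, c: float, L: float) -> float:
    """Continuous optimum for m = k + 1, from setting d/dm [c*m + L*f**m] to zero."""
    return math.log(L * math.log(1 / f) / c) / math.log(1 / f)


for L in (1e3, 1e6, 1e9):
    k, loss = best_committee(f=0.5, c=1.0, L=L)
    m_star = approx_total_nodes(0.5, 1.0, L)
    print(f"L={L:.0e}: k={k}, loss={loss:.2f}, continuous m*={m_star:.1f}")
```

Each thousandfold increase in `L` adds roughly the same constant to the minimized loss, which is the logarithmic behavior. Note that the primary-plus-committee structure is built into this toy model rather than derived from it.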

Beyond Collection: Measuring the Detection Efficacy of Modern Security Logging Standards

Effective security logging is crucial for the timely and accurate detection of cyber threats; however, the relative effectiveness of various industry-standard logging frameworks remains understudied. This paper addresses this gap with the first systematic evaluation of modern security logging standards, using a novel methodology built on the automated Security Exploit Telemetry Collection (SETC) framework. SETC systematically generates reproducible exploit scenarios in containerized environments and collects rich telemetry across multiple logging standards, including CIM (Common Information Model), OCSF (Open Cybersecurity Schema Framework), and ECS (Elastic Common Schema). We quantify the detection efficacy of each logging standard by measuring telemetry completeness and exploit detectability across the standardized logs, in detailed experiments covering 50 diverse remote code execution vulnerabilities. The results identify critical gaps and reveal significant differences in the standards' ability to capture key attack indicators. Our contributions include a novel evaluation methodology that enables scalable, reproducible analysis of exploit telemetry. They also include new findings that give security practitioners clear, evidence-based guidance on which logging standards to adopt.

Updated: 2026-05-07 00:19:31

Categories: cs.CR

Download: http://arxiv.org/abs/2605.05531v1
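
Telemetry completeness, one of the two measures in the logging-standards abstract, can be illustrated as the fraction of required attack indicators that survive normalization into a schema. This Python sketch is hypothetical throughout. The indicator names, the two unnamed schemas, the event values, and the detection rule are all invented for illustration. The events are assumed to be already mapped onto a common set of indicator names, and the real CIM, OCSF, and ECS field names are not reproduced here.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical indicators an analyst needs to spot a remote code execution exploit.
REQUIRED_INDICATORS = (
    "process_command_line",
    "parent_process",
    "user",
    "source_ip",
    "destination_port",
)


def completeness(event: dict[str, object]) -> float:
    """Fraction of required indicators that are present and non-empty in one event."""
    present = sum(1 for key in REQUIRED_INDICATORS if event.get(key) not in (None, ""))
    return present / len(REQUIRED_INDICATORS)


def detectable(events: list[dict[str, object]], rule: Callable[[dict[str, object]], bool]) -> bool:
    """An exploit counts as detected if any event in its telemetry fires the rule."""
    return any(rule(event) for event in events)


def shell_from_java(event: dict[str, object]) -> bool:
    """Hypothetical rule: a shell command spawned by a Java parent process."""
    command = str(event.get("process_command_line", ""))
    return command.startswith("sh -c") and event.get("parent_process") == "java"


# The same exploit as normalized by two unnamed schemas (invented values).
schema_x = {"process_command_line": "sh -c id", "parent_process": "java", "user": "app",
            "source_ip": "10.0.0.5", "destination_port": 8080}
schema_y = {"process_command_line": "sh -c id", "user": "app", "source_ip": "10.0.0.5"}

for name, event in (("schema_x", schema_x), ("schema_y", schema_y)):
    print(name, completeness(event), detectable([event], shell_from_java))
```

A missing field does more than lower a coverage score. Here the rule needs the parent process, so the less complete event makes the same exploit undetectable.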

Toward Quantum-Safe Software Engineering: A Vision for Post-Quantum Cryptography Migration

The quantum threat to cybersecurity has accelerated the standardization of Post-Quantum Cryptography (PQC). Migrating legacy software to these quantum-safe algorithms is not a simple library swap, but a new software engineering challenge: existing vulnerability detection, refactoring, and testing tools are not designed for PQC's probabilistic behavior, side-channel sensitivity, and complex performance trade-offs. To address these challenges, this paper outlines a vision for a new class of tools and introduces the Automated Quantum-safe Adaptation (AQuA) framework, with a three-pillar agenda for PQC-aware detection, semantic refactoring, and hybrid verification, thereby motivating Quantum-Safe Software Engineering (QSSE) as a distinct research direction.

Updated: 2026-05-07 00:10:34

Categories: cs.SE,cs.CR

Download: http://arxiv.org/abs/2602.05759v2
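
As a minimal illustration of the first pillar (PQC-aware detection), the Python sketch below inventories source lines that mention public-key primitives broken by Shor's algorithm. It is a deliberately naive, case-sensitive regex scan and is not part of AQuA. The pattern table `QUANTUM_VULNERABLE` and the `scan` function are hypothetical.

```python
from __future__ import annotations

import re

# Case-sensitive, deliberately naive patterns for quantum-vulnerable public-key primitives.
QUANTUM_VULNERABLE = {
    "RSA": re.compile(r"RSA"),
    "ECDSA": re.compile(r"ECDSA"),
    "ECDH": re.compile(r"ECDHE?"),
    # Bare "DH" not embedded in a longer uppercase name (so "ECDH" is not double-counted).
    "DH": re.compile(r"(?<![A-Z])DH(?![A-Z])|DiffieHellman"),
}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line number, primitive) pairs for every line that mentions a flagged primitive."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in QUANTUM_VULNERABLE.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


java_snippet = """\
KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
Signature sig = Signature.getInstance("SHA256withECDSA");
KeyAgreement ka = KeyAgreement.getInstance("ECDH");
MessageDigest md = MessageDigest.getInstance("SHA-256");
"""
for lineno, primitive in scan(java_snippet):
    print(f"line {lineno}: {primitive}")
```

A finding only marks a candidate site. The abstract argues that existing tools do not handle the steps that follow. These include choosing a replacement, such as ML-KEM (FIPS 203) for key establishment or ML-DSA (FIPS 204) for signatures. They also include checking key sizes, performance, and side-channel behavior in context.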