    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization

Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Building on this progress, recent methods attempt to transfer such models to character animation and real robot control by applying a Whole-Body Controller (WBC) that converts diffusion-generated motions into executable trajectories. While WBC trajectories are physically compliant, they may deviate substantially from the original motion. To address this issue, we propose PhysMoDPO, a Direct Preference Optimization framework. Unlike prior work that relies on hand-crafted physics-aware heuristics such as foot-sliding penalties, we integrate the WBC into our training pipeline and optimize the diffusion model so that the WBC output complies both with physics and with the original text instructions. To train PhysMoDPO, we deploy physics-based and task-specific rewards and use them to assign preferences to synthesized trajectories. Our extensive experiments on text-to-motion and spatial control tasks demonstrate consistent improvements from PhysMoDPO in both physical realism and task-related metrics on simulated robots. Moreover, we demonstrate that PhysMoDPO yields significant improvements when applied to zero-shot motion transfer in simulation and to real-world deployment on a G1 humanoid robot.
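
The preference-optimization step can be sketched as follows. The reward terms and trajectory fields below are hypothetical stand-ins for the paper's physics-based and task-specific rewards; only the standard DPO loss itself is taken as given.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid of the reward margin
    implied by the policy/reference log-likelihood ratios."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return math.log(1.0 + math.exp(-margin))   # == -log sigmoid(margin)

def reward(traj):
    """Hypothetical physics/task reward: penalize foot sliding and drift
    of the WBC rollout from the reference motion."""
    return -sum(traj["foot_slide"]) - sum(traj["drift_from_reference"])

def assign_preference(traj_a, traj_b):
    """Label the higher-reward WBC rollout as the preferred sample."""
    return (traj_a, traj_b) if reward(traj_a) >= reward(traj_b) else (traj_b, traj_a)

a = {"foot_slide": [0.0, 0.1], "drift_from_reference": [0.05]}
b = {"foot_slide": [0.4, 0.5], "drift_from_reference": [0.30]}
winner, loser = assign_preference(a, b)
loss = dpo_loss(logp_w=-10.0, logp_l=-11.0, ref_logp_w=-10.5, ref_logp_l=-10.5)
```

In the actual pipeline the log-likelihoods would come from the diffusion model over motions, with the WBC in the loop producing the rollouts that are scored.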

Updated: 2026-03-13 17:59:59

Categories: cs.LG,cs.AI,cs.CV,cs.RO

Download: http://arxiv.org/abs/2603.13228v1

Representation Learning for Spatiotemporal Physical Systems

Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate emulator for the system's evolution in time. However, these emulators are computationally expensive to train and are subject to performance pitfalls, such as compounding errors during autoregressive rollout. In this work, we take a different perspective and look at scientific tasks further downstream of predicting the next frame, such as estimation of a system's governing physical parameters. Accuracy on these tasks offers a uniquely quantifiable glimpse into the physical relevance of the representations of these models. We evaluate the effectiveness of general-purpose self-supervised methods in learning physics-grounded representations that are useful for downstream scientific tasks. Surprisingly, we find that not all methods designed for physical modeling outperform generic self-supervised learning methods on these tasks, and methods that learn in the latent space (e.g., joint embedding predictive architectures, or JEPAs) outperform those optimizing pixel-level prediction objectives. Code is available at https://github.com/helenqu/physical-representation-learning.

Updated: 2026-03-13 17:59:51

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2603.13227v1

A Note on Publicly Verifiable Quantum Money with Low Quantum Computational Resources

In this work we present a publicly verifiable quantum money protocol that requires almost no quantum computational capability. We rely on one-time memories, which in turn can be built from quantum conjugate coding and hardware-based assumptions. Specifically, our scheme supports a limited number of verifications and also enables quantum tokens for digital signatures. Double spending is prevented by the no-cloning principle of conjugate coding states. An implementation of the concepts presented in this work can be found at https://github.com/neverlocal/otm_billz.
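
The protocol itself builds on one-time memories; as a rough classical simulation of why conjugate coding resists double spending (in the spirit of Wiesner-style verification, not the paper's exact construction):

```python
import random

def mint(n, rng):
    """Mint a note as secret (basis, bit) pairs; in the paper these states
    live in hardware one-time memories rather than with the verifier."""
    return [(rng.randrange(2), rng.randrange(2)) for _ in range(n)]

def verify(presented, secret, rng):
    """The verifier measures each presented qubit in the secret minting basis:
    a matching preparation gives a deterministic outcome, a conjugate one a coin flip."""
    ok = True
    for (pb, pbit), (sb, sbit) in zip(presented, secret):
        outcome = pbit if pb == sb else rng.randrange(2)
        ok = ok and (outcome == sbit)
    return ok

rng = random.Random(0)
secret = mint(64, rng)
assert verify(secret, secret, rng)      # the honest note always passes
forged = mint(64, rng)                  # forger guesses bases and bits
assert not verify(forged, secret, rng)  # fails except with negligible probability
```

A forger without the secret bases cannot reproduce the conjugate-coded states, which is the no-cloning argument the abstract invokes.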

Updated: 2026-03-13 17:58:31

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2512.21304v2

New Quantum Internet Applications via Verifiable One-Time Programs

We introduce Verifiable One-Time Programs (Ver-OTPs) and use them to construct single-round Open Secure Computation (OSC), a novel primitive enabling applications like (1) single-round sealed-bid auctions, (2) single-round, honest-majority atomic proposals -- a building block of consensus protocols, and (3) single-round differentially private statistical aggregation without pre-registration. First, we construct Ver-OTPs from single-qubit states and classical cryptographic primitives. Then, assuming a multi-key homomorphic encryption (MHE) scheme with certain properties, we combine Ver-OTPs with MHE to construct OSC. The underlying quantum requirement is minimal: only single-qubit states are needed alongside a hardware assumption on the receiver's quantum resources. Our work therefore provides a new framework for quantum-assisted cryptography that may be implementable with near-term quantum technology.

Updated: 2026-03-13 17:57:36

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2509.22290v2

Structural Incompatibility of Differentiable Sorting and Within-Vector Rank Normalization

We show that differentiable sorting and ranking operators are structurally incompatible with within-vector rank normalization. We formalize admissibility through monotone invariance (C1), batch independence (C2), and a rank-space stability condition (C3). Gap-sensitive relaxations such as SoftSort violate (C1) by a quantitative margin that depends on the temperature and input scale. Batchwise rank relaxations such as SinkhornSort violate (C2): the same sample can be assigned outputs arbitrarily close to 0 or 1 depending solely on batch context. Condition (C3) implies (C1) under the rank representation used here and should not be read as a third independent failure mode. We also characterize the admissible class: any admissible operator must factor through the rank representation via a Lipschitz function.
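
The (C1) violation by gap-sensitive relaxations is easy to reproduce with a minimal pairwise-sigmoid soft-ranking operator (a simplified stand-in for SoftSort, not its exact definition):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_ranks(scores, tau=1.0):
    """Gap-sensitive soft ranks: r_i = sum_j sigma((s_i - s_j) / tau)."""
    return [sum(sigmoid((si - sj) / tau) for sj in scores) for si in scores]

def hard_ranks(scores):
    """True (gap-insensitive) ranks, the invariant (C1) asks to preserve."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks

a = [0.0, 1.0, 2.0]   # same ordering as b ...
b = [0.0, 0.1, 5.0]   # ... but different gaps
assert hard_ranks(a) == hard_ranks(b)   # monotone invariance demands equal outputs
assert soft_ranks(a) != soft_ranks(b)   # the gap-sensitive relaxation disagrees
```

The margin of disagreement shrinks as `tau` decreases, matching the abstract's temperature- and scale-dependent violation bound.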

Updated: 2026-03-13 17:40:56

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2512.22587v2

Global Sensitivity Analysis for Engineering Design Based on Individual Conditional Expectations

Explainable machine learning techniques have gained increasing attention in engineering applications, especially in aerospace design and analysis, where understanding how input variables influence data-driven models is essential. Partial Dependence Plots (PDPs) are widely used for interpreting black-box models by showing the average effect of an input variable on the prediction. However, their global sensitivity metric can be misleading when strong interactions are present, as averaging tends to obscure interaction effects. To address this limitation, we propose a global sensitivity metric based on Individual Conditional Expectation (ICE) curves. The method computes the expected feature importance across ICE curves, along with their standard deviation, to more effectively capture the influence of interactions. We provide a mathematical proof demonstrating that the PDP-based sensitivity is a lower bound of the proposed ICE-based metric under truncated orthogonal polynomial expansion. In addition, we introduce an ICE-based correlation value to quantify how interactions modify the relationship between inputs and the output. Comparative evaluations were performed on three cases: a 5-variable analytical function, a 5-variable wind-turbine fatigue problem, and a 9-variable airfoil aerodynamics case, where ICE-based sensitivity was benchmarked against PDP, SHapley Additive exPlanations (SHAP), and Sobol' indices. The results show that ICE-based feature importance provides richer insights than the traditional PDP-based approach, while visual interpretations from PDP, ICE, and SHAP complement one another by offering multiple perspectives.
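
A minimal toy example, assuming a pure-interaction model f(x1, x2) = x1 * x2 and using curve range as a simplified importance proxy (the paper's metric is the expected feature importance plus its standard deviation), shows why averaging ICE curves into a PDP hides interactions:

```python
def f(x1, x2):                     # toy black-box model: pure interaction
    return x1 * x2

grid = [i / 10 for i in range(-10, 11)]   # grid over the feature of interest
x2_values = [-1.0, 1.0]                   # the other feature, one value per instance

ice = [[f(g, x2) for g in grid] for x2 in x2_values]           # one ICE curve each
pdp = [sum(curve[k] for curve in ice) / len(ice) for k in range(len(grid))]

# The averaged (PDP) curve is flat, so a PDP-based importance reads zero...
pdp_importance = max(pdp) - min(pdp)
# ...while averaging per-curve importances over the ICE curves recovers the effect.
ice_importance = sum(max(c) - min(c) for c in ice) / len(ice)
assert pdp_importance == 0.0 and ice_importance == 2.0
```

This is exactly the lower-bound relationship the abstract proves: the PDP-based sensitivity can never exceed the ICE-based one.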

Updated: 2026-03-13 17:35:32

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2512.11946v3

Superficial Safety Alignment Hypothesis

As large language models (LLMs) are increasingly integrated into various applications, ensuring they generate safe responses is a pressing need. Previous studies on alignment have largely focused on general instruction-following but have often overlooked the distinct properties of safety alignment, such as the brittleness of safety mechanisms. To bridge this gap, we propose the Superficial Safety Alignment Hypothesis (SSAH), which posits that safety alignment teaches an otherwise unsafe model to choose the correct reasoning direction (fulfill or refuse users' requests), interpreted as an implicit binary classification task. Through SSAH, we hypothesize that only a few essential components are needed to establish safety guardrails in LLMs. We successfully identify four types of attribute-critical components: Safety Critical Unit (SCU), Utility Critical Unit (UCU), Complex Unit (CU), and Redundant Unit (RU). Our findings show that freezing certain safety-critical components during fine-tuning allows the model to retain its safety attributes while adapting to new tasks. Similarly, we show that leveraging redundant units in the pre-trained model as an "alignment budget" can effectively minimize the alignment tax while achieving the alignment goal. All considered, this paper concludes that the atomic functional unit for safety in LLMs is at the neuron level and underscores that safety alignment should not be complicated. We have code implementation and other information on the project website: https://ssa-h.github.io/.
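
The freezing idea can be sketched as a masked parameter update; the frozen index set below is a hypothetical stand-in for the identified SCU locations:

```python
def finetune_step(weights, grads, frozen, lr=0.5):
    """One gradient step that leaves safety-critical (frozen) positions untouched,
    so fine-tuning on a new task cannot erode them."""
    return [w if i in frozen else w - lr * g
            for i, (w, g) in enumerate(zip(weights, grads))]

weights = [1.0, -1.0, 2.0, 0.5]
frozen = {1, 3}                    # hypothetical locations of safety-critical units
updated = finetune_step(weights, [1.0, 1.0, 1.0, 1.0], frozen)
assert updated == [0.5, -1.0, 1.5, 0.5]   # frozen weights retain their aligned values
```

In a real framework this would be implemented by masking gradients per tensor element rather than rebuilding the weight list.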

Updated: 2026-03-13 17:29:36

Categories: cs.CL,cs.AI,cs.CR,cs.CY,cs.LG

Download: http://arxiv.org/abs/2410.10862v3

Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights

Prior approaches to membership privacy preservation usually update or retrain all weights in a neural network, which is costly and can lead to unnecessary utility loss, or even to more serious misalignment in predictions between training and non-training data. In this work, we make three observations: i) privacy vulnerability resides in a very small fraction of weights; ii) however, most of those weights also critically affect utility; iii) the importance of weights stems from their locations rather than their values. Guided by these insights, we preserve privacy by scoring critical weights and, instead of discarding the corresponding neurons, rewinding only those weights before fine-tuning. Extensive experiments show that this mechanism is more resilient to membership inference attacks than prior methods in most cases while maintaining utility.
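
A minimal sketch of the rewinding step, with hypothetical vulnerability scores (the paper's scoring procedure is not reproduced here); fine-tuning would then proceed on the rewound weights:

```python
def rewind(current, pretrained, scores, k):
    """Rewind the k highest-scoring (most privacy-critical) weights to their
    pre-trained values; all other trained weights are kept as-is."""
    critical = set(sorted(range(len(scores)),
                          key=scores.__getitem__, reverse=True)[:k])
    return [pretrained[i] if i in critical else current[i]
            for i in range(len(current))]

pretrained = [0.0, 0.0, 0.0, 0.0]
trained    = [0.9, 0.1, 0.7, 0.2]
scores     = [5.0, 0.1, 3.0, 0.2]   # hypothetical privacy-vulnerability scores
assert rewind(trained, pretrained, scores, k=2) == [0.0, 0.1, 0.0, 0.2]
```

Note the contrast with pruning: the neurons survive, only the weight values at the critical locations are reset.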

Updated: 2026-03-13 17:20:12

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2603.13186v1

Neural-Quantum-States Impurity Solver for Quantum Embedding Problems

Neural quantum states (NQS) have emerged as a promising approach to solve second-quantized Hamiltonians, because of their scalability and flexibility. In this work, we design and benchmark an NQS impurity solver for the quantum embedding (QE) methods, focusing on the ghost Gutzwiller Approximation (gGA) framework. We introduce a graph transformer-based NQS framework able to represent arbitrarily connected impurity orbitals of the embedding Hamiltonian (EH) and develop an error control mechanism to stabilize iterative updates throughout the QE loops. We validate the accuracy of our approach with benchmark gGA calculations of the Anderson Lattice Model, yielding results in excellent agreement with the exact diagonalisation impurity solver. Finally, our analysis of the computational budget reveals the method's principal bottleneck to be the high-accuracy sampling of physical observables required by the embedding loop, rather than the NQS variational optimization, directly highlighting the critical need for more efficient inference techniques.

Updated: 2026-03-13 17:19:06

Categories: cond-mat.str-el,cs.AI,cs.LG,quant-ph

Download: http://arxiv.org/abs/2509.12431v2

Verification of Robust Properties for Access Control Policies

Existing methods for verifying access control policies require the policy to be complete and fully determined before verification can proceed, but in practice policies are developed iteratively, composed from independently maintained components, and extended as organisational structures evolve. We introduce robust property verification: the problem of determining what a policy's structure commits it to, regardless of how pending decisions are resolved and regardless of subsequent extension. We define a support judgment $\Vdash_{P}φ$ stating that policy $P$ has robust property $φ$, with connectives for implication, conjunction, disjunction, and negation; prove that it is compositional (verified properties persist under policy extension by a monotonicity theorem); and show that, despite quantifying universally over all possible policy extensions, the judgment reduces to proof search in a second-order logic programming language. Soundness and completeness of this reduction are established, yielding a finitary and executable verification procedure for robust security properties.
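
For a finite policy, the universal quantification over pending decisions can be made concrete by enumeration. This illustrates the intended semantics only; the paper's contribution is precisely to avoid this exponential check via proof search:

```python
from itertools import product

def resolutions(policy):
    """All total policies obtained by resolving each pending (None) request."""
    pending = [r for r, d in policy.items() if d is None]
    for choice in product(("permit", "deny"), repeat=len(pending)):
        yield {**policy, **dict(zip(pending, choice))}

def robust(policy, prop):
    """A property is robust if it holds under every resolution of pending
    decisions; extension-monotone properties (about requests the policy
    already lists) then also survive adding new requests."""
    return all(prop(p) for p in resolutions(policy))

policy = {"read": "permit", "write": None, "admin": "deny"}
assert robust(policy, lambda p: p["admin"] == "deny")       # committed by structure
assert not robust(policy, lambda p: p["write"] == "deny")   # depends on resolution
```

The monotonicity theorem is what lets a verified judgment persist as new requests are appended to the policy.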

Updated: 2026-03-13 17:14:38

Categories: cs.CR,cs.LO

Download: http://arxiv.org/abs/2603.13181v1

PILOT: Command-line Interface Fuzzing via Path-Guided, Iterative Large Language Model Prompting

Command-line interface (CLI) fuzzing tests programs by mutating both command-line options and input file contents, enabling the discovery of vulnerabilities that only manifest under specific option-input combinations. Prior CLI fuzzing work struggles to generate semantics-rich option strings and input files, and therefore cannot reach deeply embedded target functions; as a result, existing CLI fuzzing techniques often miss such deep vulnerabilities. In this paper, we design a novel Path-guided, Iterative LLM-Orchestrated Testing framework, called PILOT, to fuzz CLI applications. The key insight is to provide potential call paths to target functions as context to the LLM so that it can better generate CLI option strings and input files. PILOT then iterates this process, providing the functions reached so far as additional context until the target functions are reached. Our evaluation on real-world CLI applications demonstrates that PILOT achieves higher coverage than state-of-the-art fuzzing approaches and discovers 51 zero-day vulnerabilities. We responsibly disclosed all the vulnerabilities to their developers; so far 41 have been confirmed, 33 have been fixed, and three have been assigned CVE identifiers.
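
The path-guided context can be illustrated with plain call-graph search; the graph below is a hypothetical CLI application, and in a PILOT-style pipeline the resulting paths would be serialized into the LLM prompt:

```python
from collections import deque

def call_paths(call_graph, entry, target, limit=3):
    """Enumerate up to `limit` call paths from the entry point to a deeply
    embedded target function, to serve as reachability context for the LLM."""
    paths, queue = [], deque([[entry]])
    while queue and len(paths) < limit:
        path = queue.popleft()
        if path[-1] == target:
            paths.append(path)
            continue
        for callee in call_graph.get(path[-1], ()):
            if callee not in path:          # avoid cycles
                queue.append(path + [callee])
    return paths

# Hypothetical call graph of a CLI image tool.
graph = {
    "main": ["parse_args", "process_file"],
    "process_file": ["decode_png", "decode_jpeg"],
    "decode_jpeg": ["parse_exif"],
}
assert call_paths(graph, "main", "parse_exif") == \
       [["main", "process_file", "decode_jpeg", "parse_exif"]]
```

The path tells the LLM which options and file format (here, a JPEG with EXIF data) are needed to steer execution toward the target.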

Updated: 2026-03-13 17:06:46

Categories: cs.CR

Download: http://arxiv.org/abs/2511.20555v2

Defensible Design for OpenClaw: Securing Autonomous Tool-Invoking Agents

OpenClaw-like agents offer substantial productivity benefits, yet they are insecure by default because they combine untrusted inputs, autonomous action, extensibility, and privileged system access within a single execution loop. We use OpenClaw as an exemplar of a broader class of agents that interact with interfaces, manipulate files, invoke tools, and install extensions in real operating environments. Consequently, their security should be treated as a software engineering problem rather than as a product-specific concern. To address these architectural vulnerabilities, we propose a blueprint for defensible design. We present a risk taxonomy, secure engineering principles, and a practical research agenda to institutionalize safety in agent construction. Our goal is to transition the community focus from isolated vulnerability patching toward systematic defensive engineering and robust deployment practices.

Updated: 2026-03-13 16:41:11

Categories: cs.CR

Download: http://arxiv.org/abs/2603.13151v1

Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules

Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet the benchmarks used to evaluate them (TabArena, TALENT, and others) still rely almost exclusively on point-estimate metrics (RMSE, $R^2$). This mismatch implicitly rewards models that elicit a good conditional mean while ignoring the quality of the predicted distribution. We make two contributions. First, we propose supplementing standard point metrics with proper scoring rules (CRPS, CRLS, and the Interval Score) and provide a head-to-head comparison of realTabPFNv2.5 and TabICLv2 with respect to several proper scoring rules across 20 OpenML regression datasets. Second, we show analytically and empirically that different proper scoring rules induce different model rankings and different inductive biases during training, even though each rule is individually minimized by the true distribution. Fine-tuning realTabPFNv2.5 with scoring rules not seen during pretraining (CRLS, $β=1.8$ energy score) yields consistent improvements on the corresponding metrics, confirming that the training loss shapes the model beyond what propriety alone guarantees. Together, these findings argue for (i) reporting distributional metrics in tabular regression benchmarks and (ii) making the training objective of foundation models adaptable (via fine-tuning or task-token conditioning) to the scoring rule relevant to the downstream decision problem.
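
For reference, the CRPS of an empirical (ensemble) predictive distribution has a closed energy form, sketched here in plain Python:

```python
def crps_ensemble(samples, y):
    """CRPS of the empirical distribution given by `samples` at observation y:
    CRPS = E|X - y| - 0.5 * E|X - X'|  (energy form of the integral definition)."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2

# A single-member "ensemble" reduces CRPS to absolute error...
assert crps_ensemble([2.0], 3.5) == 1.5
# ...and a calibrated spread around the truth beats a biased point forecast.
sharp_biased = crps_ensemble([5.0], 3.0)
calibrated = crps_ensemble([2.0, 3.0, 4.0], 3.0)
assert calibrated < sharp_biased
```

This is why CRPS, unlike RMSE on the predictive mean, rewards a well-shaped predictive distribution rather than just a good conditional mean.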

Updated: 2026-03-13 16:39:12

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2603.08206v2

RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation

The pursuit of robot generalists, agents capable of performing diverse tasks across diverse environments, demands rigorous and scalable evaluation. Yet real-world testing of robot policies remains fundamentally constrained: it is labor-intensive, slow, unsafe at scale, and difficult to reproduce. As policies expand in scope and complexity, these barriers only intensify, since defining "success" in robotics often hinges on nuanced human judgments of execution quality. We introduce RobotArena Infinity, a new benchmarking framework that overcomes these challenges by shifting vision-language-action (VLA) evaluation into large-scale simulated environments augmented with online human feedback. Leveraging advances in vision-language models, 2D-to-3D generative modeling, and differentiable rendering, our approach automatically converts video demonstrations from widely used robot datasets into simulated counterparts. Within these digital twins, we assess VLA policies using both automated vision-language-model-guided scoring and scalable human preference judgments collected from crowdworkers, transforming human involvement from tedious scene setup, resetting, and safety supervision into lightweight preference comparisons. To measure robustness, we systematically perturb simulated environments along multiple axes, including textures and object placements, stress-testing policy generalization under controlled variation. The result is a continuously evolving, reproducible, and scalable benchmark for real-world-trained robot manipulation policies, addressing a critical missing capability in today's robotics landscape.

Updated: 2026-03-13 16:29:34

Categories: cs.RO,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2510.23571v2

ZO-SAM: Zero-Order Sharpness-Aware Minimization for Efficient Sparse Training

Deep learning models, despite their impressive achievements, suffer from high computational costs and memory requirements, limiting their usability in resource-constrained environments. Sparse neural networks significantly alleviate these constraints by dramatically reducing parameter count and computational overhead. However, existing sparse training methods often experience chaotic and noisy gradient signals, severely hindering convergence and generalization performance, particularly at high sparsity levels. To tackle this critical challenge, we propose Zero-Order Sharpness-Aware Minimization (ZO-SAM), a novel optimization framework that strategically integrates zero-order optimization within the SAM approach. Unlike traditional SAM, ZO-SAM requires only a single backpropagation step during perturbation, selectively utilizing zero-order gradient estimations. This innovative approach reduces the backpropagation computational cost by half compared to conventional SAM, significantly lowering gradient variance and effectively eliminating associated computational overhead. By harnessing SAM's capacity for identifying flat minima, ZO-SAM stabilizes the training process and accelerates convergence. These efficiency gains are particularly important in sparse training scenarios, where computational cost is the primary bottleneck that limits the practicality of SAM. Moreover, models trained with ZO-SAM exhibit improved robustness under distribution shift, further broadening its practicality in real-world deployments.

Updated: 2026-03-13 16:11:41

Categories: cs.LG

Download: http://arxiv.org/abs/2603.13115v1

BoSS: A Best-of-Strategies Selector as an Oracle for Deep Active Learning

Active learning (AL) aims to reduce annotation costs while maximizing model performance by iteratively selecting valuable instances. While foundation models have made it easier to identify these instances, existing selection strategies still lack robustness across different models, annotation budgets, and datasets. To highlight the potential weaknesses of existing AL strategies and provide a reference point for research, we explore oracle strategies, i.e., strategies that approximate the optimal selection by accessing ground-truth information unavailable in practical AL scenarios. Current oracle strategies, however, fail to scale effectively to large datasets and complex deep neural networks. To tackle these limitations, we introduce the Best-of-Strategy Selector (BoSS), a scalable oracle strategy designed for large-scale AL scenarios. BoSS constructs a set of candidate batches through an ensemble of selection strategies and then selects the batch yielding the highest performance gain. As an ensemble of selection strategies, BoSS can be easily extended with new state-of-the-art strategies as they emerge, ensuring it remains a reliable oracle strategy in the future. Our evaluation demonstrates that i) BoSS outperforms existing oracle strategies, ii) state-of-the-art AL strategies still fall noticeably short of oracle performance, especially in large-scale datasets with many classes, and iii) one possible solution to counteract the inconsistent performance of AL strategies might be to employ an ensemble-based approach for the selection.
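
The oracle selection step reduces to trying each strategy's candidate batch and keeping the best; the toy evaluation function below is a stand-in for retraining a model and measuring validation performance:

```python
def boss_select(labeled, candidate_batches, evaluate):
    """Oracle step: pick the candidate batch whose addition yields the highest
    downstream performance (`evaluate` stands in for retrain-and-validate)."""
    return max(candidate_batches, key=lambda batch: evaluate(labeled | batch))

def evaluate(data):
    """Toy stand-in: validation accuracy grows with the labeled set's size."""
    return len(data) / 10

# Each hypothetical strategy proposes one candidate batch of instance ids.
strategies = {"uncertainty": {1, 2}, "coreset": {3, 4, 5}, "random": {6}}
best = boss_select({0}, list(strategies.values()), evaluate)
assert best == {3, 4, 5}
```

Because the ensemble is open, a new state-of-the-art strategy is added simply by contributing one more candidate batch per round.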

Updated: 2026-03-13 16:05:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2603.13109v1

Language Models are Injective and Hence Invertible

Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the input from a model's representations. In this paper, we challenge this view. First, we prove mathematically that transformer language models mapping discrete input sequences to their corresponding sequence of continuous representations are injective and therefore lossless, a property established at initialization and preserved during training. Second, we confirm this result empirically through billions of collision tests on six state-of-the-art language models, and observe no collisions. Third, we operationalize injectivity: we introduce SipIt, the first algorithm that provably and efficiently reconstructs the exact input text from hidden activations, establishing linear-time guarantees and demonstrating exact invertibility in practice. Overall, our work establishes injectivity as a fundamental and exploitable property of language models, with direct implications for transparency, interpretability, and safe deployment.
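
The inversion idea can be caricatured with a toy injective causal encoder: because each hidden state uniquely determines the token that produced it, the exact input is recoverable position by position. This is a simplified analogue of the SipIt argument, not the transformer construction itself:

```python
def encode(tokens, vocab_size=257):
    """Toy injective causal 'model': rolling-hash hidden state per position."""
    states, h = [], 0
    for t in tokens:
        h = h * vocab_size + (t + 1)   # injective update (t + 1 is never 0)
        states.append(h)
    return states

def invert(states, vocab=range(256)):
    """Reconstruct the exact input from the hidden states, one position at a
    time: at each step exactly one vocabulary item reproduces the state."""
    tokens, h = [], 0
    for s in states:
        t = next(t for t in vocab if h * 257 + (t + 1) == s)  # 257 matches encode
        tokens.append(t)
        h = s
    return tokens

msg = [72, 105, 33]                 # arbitrary token ids
assert invert(encode(msg)) == msg   # exact recovery, linear in sequence length
```

Injectivity is what guarantees the `next(...)` search finds exactly one match per position; the paper proves the analogous uniqueness for real transformer representations.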

Updated: 2026-03-13 15:58:05

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2510.15511v4

A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring

Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these approaches inapplicable. We propose an alternative, \textbf{decision-theoretic view of steganography}. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred from the agents' observable actions. To formalise this perspective, we introduce generalised $\mathcal{V}$-information: a utilitarian framework for measuring the amount of usable information within some input. We use this to define the \textbf{steganographic gap} -- a measure that quantifies steganography by comparing the downstream utility of the steganographic signal to agents that can and cannot decode the hidden content. We empirically validate our formalism, and show that it can be used to detect, quantify, and mitigate steganographic reasoning in LLMs.

Updated: 2026-03-13 15:55:32

Categories: cs.AI,cs.CL,cs.CR,cs.IT,cs.MA

Download: http://arxiv.org/abs/2602.23163v2

Unsupervised anomaly detection in MeV ultrafast electron diffraction

MeV ultrafast electron diffraction (MUED) is a pump-probe technique used to study the dynamic structural evolution of materials. An ultrashort laser pulse triggers structural changes, which are then probed by an ultrashort relativistic electron beam. To overcome low signal-to-noise ratios, diffraction patterns are averaged over thousands of shots. However, shot-to-shot instabilities in the electron beam can distort individual patterns, introducing uncertainty. Improving MUED accuracy requires detecting and removing these anomalous patterns from large datasets. In this work, we developed a fully unsupervised methodology for the detection of anomalous diffraction patterns. Using a convolutional autoencoder, we calculate the reconstruction mean squared error of the diffraction patterns. Based on the statistical analysis of this error, we provide the user an estimation of the probability that the pattern is normal, which also allows a posterior visual inspection of the images that are difficult to classify. This method has been trained with only 100 diffraction patterns and tested on 1521 patterns, resulting in a false positive rate between 0.2\% and 0.4\%, with a training time of 10 seconds per image and a test time of about 1 second per image. The proposed methodology can also be applied to other diffraction techniques in which large datasets are collected that include faulty images due to instrumental instabilities.
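The core detection recipe, reconstruction error plus a statistical normality score, can be sketched without deep-learning dependencies. The paper uses a convolutional autoencoder; the PCA (linear autoencoder) and synthetic "patterns" below are simplifying assumptions that preserve the reconstruction-MSE idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for diffraction patterns: "normal" shots lie near a
# low-rank subspace plus small noise; anomalous shots deviate from it.
d, k = 64, 4
basis = rng.normal(size=(k, d))
normal = rng.normal(size=(200, k)) @ basis + 0.05 * rng.normal(size=(200, d))
anomaly = rng.normal(size=(5, d))  # off-subspace shots

# "Train" the linear autoencoder: top-k principal components of normal data.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:k]

def recon_mse(x):
    z = (x - mean) @ components.T          # encode
    xh = z @ components + mean             # decode
    return ((x - xh) ** 2).mean(axis=-1)   # per-image reconstruction MSE

# Error statistics on normal data give each image a normality score.
errs = recon_mse(normal)
mu, sigma = errs.mean(), errs.std()

def z_score(x):
    return (recon_mse(x) - mu) / sigma

print("max normal z:", float(z_score(normal).max()))
print("min anomaly z:", float(z_score(anomaly).min()))
```

Images with intermediate scores are exactly the "difficult to classify" cases the abstract flags for posterior visual inspection.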

Updated: 2026-03-13 15:49:45

Categories: cs.LG,physics.ins-det

Download: http://arxiv.org/abs/2505.13702v2

CCMamba: Topologically-Informed Selective State-Space Networks on Combinatorial Complexes for Higher-Order Graph Learning

Topological deep learning has emerged as a powerful paradigm for modeling higher-order relational structures beyond pairwise interactions that standard graph neural networks fail to capture. While combinatorial complexes (CCs) offer a unified topological foundation for higher-order graph learning, existing topological deep learning methods rely heavily on local message passing and attention mechanisms. These suffer from quadratic complexity and local neighborhood constraints, limiting their scalability and capacity for rank-aware, long-range dependency modeling. To overcome these challenges, we propose Combinatorial Complex Mamba (CCMamba), the first unified Mamba-based neural framework for learning on combinatorial complexes. CCMamba reformulates higher-order message passing as a selective state-space modeling problem by linearizing multi-rank incidence relations into structured, rank-aware sequences. This architecture enables adaptive, directional, and long-range information propagation in linear time, bypassing the scalability bottlenecks of self-attention. Theoretically, we further establish that the expressive power of CCMamba is upper-bounded by the 1-dimensional combinatorial complex Weisfeiler-Lehman (1-CCWL) test. Extensive experiments across graph, hypergraph, and simplicial benchmarks demonstrate that CCMamba consistently outperforms existing methods while exhibiting superior scalability and remarkable robustness against over-smoothing in deep architectures.
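The linear-time claim rests on the selective state-space recurrence at the heart of Mamba-style layers: one scan over the linearized sequence with input-dependent gates. The toy scan below illustrates the cost structure only; the random gate values stand in for learned, input-conditioned parameters and rank-aware sequencing.

```python
import numpy as np

rng = np.random.default_rng(4)

# A linearized, rank-aware token sequence of length L with feature width d.
L, d = 16, 8
x = rng.normal(size=(L, d))
a = rng.uniform(0.5, 0.99, size=(L, d))   # selective decay gates (placeholder)
b = rng.uniform(size=(L, d))              # selective input gates (placeholder)

h = np.zeros(d)
out = []
for t in range(L):                        # one pass: O(L * d), not O(L^2)
    h = a[t] * h + b[t] * x[t]            # selective state update
    out.append(h.copy())
out = np.stack(out)
print(out.shape)
```

Each step touches only the current token and the running state, which is why the sequence length enters the cost linearly, in contrast to attention's all-pairs interaction.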

Updated: 2026-03-13 15:43:11

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2601.20518v2

Breaking the Tuning Barrier: Zero-Hyperparameters Yield Multi-Corner Analysis Via Learned Priors

Yield Multi-Corner Analysis validates circuits across 25+ Process-Voltage-Temperature corners, resulting in a combinatorial simulation cost of $O(K \times N)$ where $K$ denotes corners and $N$ exceeds $10^4$ samples per corner. Existing methods face a fundamental trade-off: simple models achieve automation but fail on nonlinear circuits, while advanced AI models capture complex behaviors but require hours of hyperparameter tuning per design iteration, forming the Tuning Barrier. We break this barrier by replacing engineered priors (i.e., model specifications) with learned priors from a foundation model pre-trained on millions of regression tasks. This model performs in-context learning, instantly adapting to each circuit without tuning or retraining. Its attention mechanism automatically transfers knowledge across corners by identifying shared circuit physics between operating conditions. Combined with an automated feature selector (1152D to 48D), our method matches state-of-the-art accuracy (mean MREs as low as 0.11\%) with zero tuning, reducing total validation cost by over $10\times$.

Updated: 2026-03-13 15:40:57

Categories: cs.LG,cs.AR

Download: http://arxiv.org/abs/2603.13092v1

Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch

Diffusion models enable high-fidelity image editing but can also be misused for unauthorized style imitation and harmful content generation. To mitigate these risks, proactive image protection methods embed small, often imperceptible adversarial perturbations into images before sharing to disrupt downstream editing or fine-tuning. However, in realistic post-release scenarios, content owners cannot control downstream processing pipelines, and protections optimized for a surrogate model may fail when attackers use mismatched diffusion pipelines. Existing purification methods can weaken protections but often sacrifice image quality and rarely examine architectural mismatch. We introduce a unified post-release purification framework to evaluate protection survivability under model mismatch. We propose two practical purifiers: VAE-Trans, which corrects protected images via latent-space projection, and EditorClean, which performs instruction-guided reconstruction with a Diffusion Transformer to exploit architectural heterogeneity. Both operate without access to protected images or defense internals. Across 2,100 editing tasks and six representative protection methods, EditorClean consistently restores editability. Compared to protected inputs, it improves PSNR by 3-6 dB and reduces FID by 50-70 percent on downstream edits, while outperforming prior purification baselines by about 2 dB PSNR and 30 percent lower FID. Our results reveal a purify-once, edit-freely failure mode: once purification succeeds, the protective signal is largely removed, enabling unrestricted editing. This highlights the need to evaluate protections under model mismatch and design defenses robust to heterogeneous attackers.

Updated: 2026-03-13 14:36:46

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2603.13028v1

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been proposed, their robustness against adaptive attacks remains insufficiently evaluated, potentially creating a false sense of security. In this work, we propose PISmith, a reinforcement learning (RL)-based red-teaming framework that systematically assesses existing prompt-injection defenses by training an attack LLM to optimize injected prompts in a practical black-box setting, where the attacker can only query the defended LLM and observe its outputs. We find that directly applying standard GRPO to attack strong defenses leads to sub-optimal performance due to extreme reward sparsity -- most generated injected prompts are blocked by the defense, causing the policy's entropy to collapse before discovering effective attack strategies, while the rare successes cannot be learned effectively. In response, we introduce adaptive entropy regularization and dynamic advantage weighting to sustain exploration and amplify learning from scarce successes. Extensive evaluation on 13 benchmarks demonstrates that state-of-the-art prompt injection defenses remain vulnerable to adaptive attacks. We also compare PISmith with 7 baselines across static, search-based, and RL-based attack categories, showing that PISmith consistently achieves the highest attack success rates. Furthermore, PISmith achieves strong performance in agentic settings on InjecAgent and AgentDojo against both open-source and closed-source LLMs (e.g., GPT-4o-mini and GPT-5-nano). Our code is available at https://github.com/albert-y1n/PISmith.
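The reward-sparsity failure mode is easy to see numerically: with one success in a group, the group-normalized advantage is small relative to how informative that success is. The re-weighting rule below (inverse success rate) is an illustrative assumption in the spirit of the abstract's dynamic advantage weighting, not PISmith's exact formulation.

```python
import numpy as np

# GRPO-style group-normalized advantages for one rollout group.
def group_advantages(rewards):
    r = np.asarray(rewards, dtype=float)
    std = r.std()
    return np.zeros_like(r) if std == 0 else (r - r.mean()) / std

rewards = [0, 0, 0, 0, 0, 0, 0, 1]         # one rare successful injected prompt
adv = group_advantages(rewards)

# Dynamic weighting (illustrative): amplify learning from rare successes by
# scaling positive advantages inversely with the group's success rate.
success_rate = np.mean(rewards)
boost = 1.0 / max(success_rate, 1e-3)
weighted = np.where(adv > 0, adv * boost, adv)

print("plain advantage of the success:", round(float(adv[-1]), 2))
print("boosted advantage:", round(float(weighted[-1]), 2))
```

An all-zero reward group yields zero advantage everywhere, which is precisely the regime where the paper's entropy regularization is needed to keep exploration alive.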

Updated: 2026-03-13 14:34:54

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2603.13026v1

FraudFox: Adaptable Fraud Detection in the Real World

The proposed method (FraudFox) provides solutions to adversarial attacks in a resource-constrained environment. We focus on questions like the following: How suspicious is `Smith', trying to buy \$500 shoes on Monday at 3am? How should we merge the risk scores from a handful of risk-assessment modules (`oracles') in an adversarial environment? More importantly, given historical data (orders, prices, and what happened afterwards) and business goals/restrictions, which transactions, like the `Smith' transaction above, should we `pass', versus send to human investigators? The business restrictions could be: `at most $x$ investigations are feasible', or `at most \$$y$ lost due to fraud'. These are the two research problems we focus on in this work. One approach to address the first problem (`oracle-weighting') is to use Extended Kalman Filters with dynamic importance weights, to automatically and continuously update our weights for each `oracle'. For the second problem, we show how to derive an optimal decision surface, and how to compute the Pareto-optimal set, to allow what-if questions. An important consideration is adaptation: fraudsters will change their behavior according to our past decisions; thus, we need to adapt accordingly. The resulting system, FraudFox, is scalable, adaptable to changing fraudster behavior, effective, and already in \textbf{production} at Amazon. FraudFox augments a fraud prevention sub-system and has led to significant performance gains.
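The oracle-weighting idea can be sketched with a plain Kalman filter: treat the weight vector as a slowly drifting state, observe linear combinations of oracle scores, and update recursively. The abstract's method uses *Extended* Kalman Filters; since the toy combination here is linear, the plain filter (equivalently, recursive least squares) suffices for illustration, and the "true" weights and noise levels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_oracles = 3
true_w = np.array([0.6, 0.3, 0.1])       # hypothetical oracle reliabilities

w = np.zeros(n_oracles)                  # weight estimate
P = np.eye(n_oracles) * 10.0             # estimate covariance
R = 0.05                                 # observation noise variance
Q = 1e-4 * np.eye(n_oracles)             # random-walk drift: keeps adapting

for _ in range(500):
    x = rng.uniform(size=n_oracles)      # oracle risk scores for one order
    y = x @ true_w + rng.normal(0, R**0.5)  # observed outcome proxy
    P = P + Q                            # predict (weights may have drifted)
    K = P @ x / (x @ P @ x + R)          # Kalman gain
    w = w + K * (y - x @ w)              # update toward the residual
    P = P - np.outer(K, x) @ P

print("estimated weights:", np.round(w, 2))
```

The nonzero process noise `Q` is what makes the weights track a drifting target, matching the adaptation requirement: if fraudsters shift behavior and an oracle degrades, its weight decays automatically.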

Updated: 2026-03-13 14:19:03

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2603.13014v1

ZK-ACE: Identity-Centric Zero-Knowledge Authorization for Post-Quantum Blockchain Systems

Post-quantum signature schemes introduce kilobyte-scale authorization artifacts when applied directly to blockchain transaction validation. A widely considered mitigation is to verify post-quantum signatures inside zero-knowledge circuits and publish only succinct proofs on-chain. However, this approach preserves the signature-centric authorization model, merely relocating the verification cost, and embeds expensive high-dimensional lattice arithmetic into prover circuits. We present ZK-ACE (Zero-Knowledge Authorization for Cryptographic Entities), an authorization layer that replaces transaction-carried signature objects entirely with identity-bound zero-knowledge authorization statements. Rather than proving the correctness of a specific post-quantum signature, the prover demonstrates in zero knowledge that a transaction is authorized by an identity consistent with an on-chain commitment and bound replay state. The construction assumes a deterministic identity derivation primitive (DIDP) as a black box and uses a compact identity commitment as the primary on-chain identity anchor, supplemented by per-transaction replay-prevention state. We formalize ZK-ACE with explicit game-based security definitions for authorization soundness, replay resistance, substitution resistance, and cross-domain separation. We present a complete circuit constraint specification, define two replay-prevention models, and provide reduction-based security proofs under standard assumptions (knowledge soundness, collision resistance, and DIDP identity-root recovery hardness). A structural, protocol-level data accounting demonstrates an order-of-magnitude reduction in consensus-visible authorization data relative to direct post-quantum signature deployment. The design supports batch aggregation and recursive proof composition, and is compatible with account-abstraction and rollup-based deployment architectures.

Updated: 2026-03-13 14:07:40

Categories: cs.CR,cs.DC

Download: http://arxiv.org/abs/2603.07974v2

Public-Key Quantum Money and Fast Real Transforms

We propose a public-key quantum money scheme based on group actions and the Hartley transform. Our scheme adapts the quantum money scheme of Zhandry (2024), replacing the Fourier transform with the Hartley transform. This substitution ensures the banknotes have real amplitudes rather than complex amplitudes, which could offer both computational and theoretical advantages. To support this new construction, we propose a new verification algorithm that uses group action twists to address verification failures caused by the switch to real amplitudes. We also show how to efficiently compute the serial number associated with a money state using a new algorithm based on continuous-time quantum walks. Finally, we present a recursive algorithm for the quantum Hartley transform, achieving lower gate complexity than prior work and demonstrate how to compute other real quantum transforms, such as the quantum sine transform, using the quantum Hartley transform as a subroutine.
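The real-amplitude property comes from the Hartley kernel $\mathrm{cas}(\theta) = \cos\theta + \sin\theta$. Two standard identities of the discrete Hartley transform, its relation to the Fourier transform and its self-inverse structure, can be checked directly; this sketch verifies those identities numerically and is not the paper's quantum circuit.

```python
import numpy as np

# Discrete Hartley transform: H_k = sum_n x_n * cas(2*pi*n*k / N).
def dht(x):
    n = len(x)
    k = np.arange(n)
    angles = 2 * np.pi * np.outer(k, k) / n
    cas = np.cos(angles) + np.sin(angles)
    return cas @ x

x = np.random.default_rng(7).normal(size=16)
H = dht(x)

# For real input: DHT(x) = Re(FFT(x)) - Im(FFT(x)).
F = np.fft.fft(x)
print("matches Re(FFT) - Im(FFT):", np.allclose(H, F.real - F.imag))

# Self-inverse up to scaling: applying the DHT twice returns N * x.
print("involutive up to N:", np.allclose(dht(H), len(x) * x))
```

For real input the output is real, in contrast to the complex-valued Fourier spectrum, which is the amplitude property the banknote construction exploits.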

Updated: 2026-03-13 14:04:07

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2503.18890v4

Mitigating Collusion in Proofs of Liabilities

Cryptocurrency exchanges use proofs of liabilities (PoLs) to prove to their customers their liabilities committed on-chain, thereby enhancing their trust in the service. Unfortunately, a close examination of currently deployed and academic PoLs reveals significant shortcomings in their designs. For instance, existing schemes cannot resist realistic attack scenarios in which the provider colludes with an existing user. In this paper, we propose a new model, dubbed permissioned PoL, that addresses this gap by not requiring cooperation from users to detect a dishonest provider's potential misbehavior. At the core of our proposal lies a novel primitive, which we call Permissioned Vector Commitment (PVC), to ensure that a committed vector only contains values that users have explicitly signed. We provide an efficient PVC and PoL construction that carefully combines homomorphic properties of KZG commitments and BLS-based signatures. Our prototype implementation shows that, despite the stronger security, our proposal also improves server performance (by up to $10\times$) compared to prior PoLs.

Updated: 2026-03-13 13:45:23

Categories: cs.CR

Download: http://arxiv.org/abs/2603.12990v1

Test-Time Attention Purification for Backdoored Large Vision Language Models

Despite the strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where adversaries insert trigger-embedded samples into the training data to implant behaviors that can be maliciously activated at test time. Existing defenses typically rely on retraining backdoored parameters (e.g., adapters or LoRA modules) with clean data, which is computationally expensive and often degrades model performance. In this work, we provide a new mechanistic understanding of backdoor behaviors in LVLMs: the trigger does not influence prediction through low-level visual patterns, but through abnormal cross-modal attention redistribution, where trigger-bearing visual tokens steal attention away from the textual context - a phenomenon we term attention stealing. Motivated by this, we propose CleanSight, a training-free, plug-and-play defense that operates purely at test time. CleanSight (i) detects poisoned inputs based on the relative visual-text attention ratio in selected cross-modal fusion layers, and (ii) purifies the input by selectively pruning the suspicious high-attention visual tokens to neutralize the backdoor activation. Extensive experiments show that CleanSight significantly outperforms existing pixel-based purification defenses across diverse datasets and backdoor attack types, while preserving the model's utility on both clean and poisoned samples.
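The two CleanSight steps can be sketched on a synthetic attention matrix: (i) flag an input when visual tokens absorb an abnormal share of cross-modal attention, and (ii) prune the highest-attention visual tokens. Shapes, the inflated trigger column, and the flagging threshold below are illustrative assumptions.

```python
import numpy as np

def visual_text_ratio(attn, n_visual):
    """attn: (queries, keys) with visual keys in the first n_visual columns."""
    vis = attn[:, :n_visual].sum()
    txt = attn[:, n_visual:].sum()
    return vis / txt

def prune_top_visual(attn, n_visual, k):
    """Drop the k visual tokens receiving the most total attention."""
    scores = attn[:, :n_visual].sum(axis=0)
    keep = np.sort(np.argsort(scores)[:-k])   # indices of kept visual tokens
    return np.concatenate([attn[:, keep], attn[:, n_visual:]], axis=1), keep

rng = np.random.default_rng(3)
attn = rng.uniform(size=(8, 12))              # 8 text queries, 6 visual + 6 text keys
attn[:, 2] += 5.0                             # trigger token 'stealing' attention
attn /= attn.sum(axis=1, keepdims=True)       # row-normalize like softmax output

ratio = visual_text_ratio(attn, n_visual=6)
pruned, keep = prune_top_visual(attn, n_visual=6, k=1)
print("visual/text ratio:", round(float(ratio), 2), "kept visual tokens:", keep)
```

On clean inputs the visual/text mass stays near its usual range; the trigger column drives the ratio up, and pruning it removes exactly the token carrying the stolen attention.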

Updated: 2026-03-13 13:45:06

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2603.12989v1

A Requirement-Based Framework for Engineering Adaptive Authentication

Authentication is crucial to confirm that an individual or entity trying to perform an action is actually who or what they claim to be. In dynamic environments such as the Internet of Things (IoT), Internet of Vehicles (IoV), healthcare, and smart cities, security risks can change depending on varying contextual factors (e.g., user attempting to authenticate, location, device type). Thus, authentication methods must adapt to mitigate changing security risks while meeting usability and performance requirements. However, existing adaptive authentication systems provide limited guidance on (a) representing contextual factors, requirements, and authentication methods (b) understanding the influence of contextual factors and authentication methods on the fulfilment of requirements, and (c) selecting effective authentication methods that reduce security risks while maximizing the satisfaction of the requirements. This paper proposes a framework for engineering adaptive authentication systems that dynamically select effective authentication methods to address changes in contextual factors and security risks. The framework leverages a contextual goal model to represent requirements and the influence of contextual factors on security risks and requirement priorities. It uses an extended feature model to represent potential authentication methods and their impacts on mitigating security risks and satisfying requirements. At runtime, when contextual factors change, the framework employs a Fuzzy Causal network encoded using the Z3 SMT solver to analyze the goal and feature models, enabling the selection of effective authentication methods. We demonstrate and evaluate our framework through its application to real-world authentication scenarios in the IoV and the healthcare domains.

Updated: 2026-03-13 13:08:36

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2603.12968v1

Editing Away the Evidence: Diffusion-Based Image Manipulation and the Failure Modes of Robust Watermarking

Robust invisible watermarks are widely used to support copyright protection, content provenance, and accountability by embedding hidden signals designed to survive common post-processing operations. However, diffusion-based image editing introduces a fundamentally different class of transformations: it injects noise and reconstructs images through a powerful generative prior, often altering semantic content while preserving photorealism. In this paper, we provide a unified theoretical and empirical analysis showing that non-adversarial diffusion editing can unintentionally degrade or remove robust watermarks. We model diffusion editing as a stochastic transformation that progressively contracts off-manifold perturbations, causing the low-amplitude signals used by many watermarking schemes to decay. Our analysis derives bounds on watermark signal-to-noise ratio and mutual information along diffusion trajectories, yielding conditions under which reliable recovery becomes information-theoretically impossible. We further evaluate representative watermarking systems under a range of diffusion-based editing scenarios and strengths. The results indicate that even routine semantic edits can significantly reduce watermark recoverability. Finally, we discuss the implications for content provenance and outline principles for designing watermarking approaches that remain robust under generative image editing.
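The contraction argument admits a closed-form sketch: if each editing step contracts the off-manifold watermark component by a factor $\rho < 1$ and injects fresh noise of variance $\sigma^2$, the matched-filter SNR after $t$ steps is $a_0^2\rho^{2t}$ over a geometric noise sum. The values of $\rho$, $\sigma$, and the initial amplitude $a_0$ below are assumptions for illustration.

```python
import numpy as np

# Per-step contraction of the watermark component, per-step injected noise,
# and initial watermark amplitude (all illustrative assumptions).
rho, sigma, a0 = 0.7, 0.01, 0.1

def snr_after(t):
    """Matched-filter SNR of the watermark after t editing steps."""
    signal = a0 * rho**t
    noise_var = sigma**2 * (1 - rho ** (2 * t)) / (1 - rho**2)
    return signal**2 / noise_var

snrs = [snr_after(t) for t in range(1, 11)]
print("SNR per step:", np.round(snrs, 2))
```

The SNR decays roughly like $\rho^{2t}$, so once enough editing steps accumulate, reliable recovery drops below any fixed detection threshold, the information-theoretic failure condition the abstract derives bounds for.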

Updated: 2026-03-13 12:46:27

Categories: eess.IV,cs.CR,cs.MM

Download: http://arxiv.org/abs/2603.12949v1

Almost-Free Queue Jumping for Prior Inputs in Private Neural Inference

Privacy-Preserving Machine Learning as a Service (PP-MLaaS) enables secure neural network inference by integrating cryptographic primitives such as homomorphic encryption (HE) and multi-party computation (MPC), protecting both client data and server models. Recent mixed-primitive frameworks have significantly improved inference efficiency, yet they process batched inputs sequentially, offering little flexibility for prioritizing urgent requests. Naïve queue jumping introduces considerable computational and communication overhead, increasing non-negligible latency for in-queue inputs. We initiate the study of privacy-preserving queue jumping in batched inference and propose PrivQJ, a novel framework that enables efficient priority handling without degrading overall system performance. PrivQJ exploits shared computation across inputs via in-processing slot recycling, allowing prior inputs to be piggybacked onto ongoing batch computation with almost no additional cryptographic cost. Both theoretical analysis and experimental results demonstrate over an order-of-magnitude reduction in overhead compared to state-of-the-art PP-MLaaS systems.

Updated: 2026-03-13 12:41:36

Categories: cs.CR

Download: http://arxiv.org/abs/2603.12946v1

Architectural Selection Framework for Synthetic Network Traffic: Quantifying the Fidelity-Utility Trade-off

The fidelity and utility of synthetic network traffic are critically compromised by architectural mismatch across heterogeneous network datasets and prevalent scalability failure. This study addresses this challenge by establishing an Architectural Selection Framework that empirically quantifies how data structure compatibility dictates the optimal fidelity-utility trade-off. We systematically evaluate twelve generative architectures (both non-AI and AI) across two distinct data structure types: categorical-heavy NSL-KDD and continuous-flow-heavy CIC-IDS2017. Fidelity is rigorously assessed through three structural metrics (Data Structure, Correlation, and Probability Distribution Difference) to confirm structural realism before evaluating downstream utility. Our results, confirmed over twenty independent runs (N=20), demonstrate that GAN-based models (CTGAN, CopulaGAN) exhibit superior architectural robustness, consistently achieving the optimal balance of statistical fidelity and practical utility. Conversely, the framework exposes critical failure modes: statistical methods compromise structural fidelity for utility (compromised fidelity), while modern iterative architectures, such as Diffusion Models, face prohibitive computational barriers, rendering them impractical for large-scale security deployment. This contribution provides security practitioners with an evidence-based guide for mitigating architectural failures, thereby setting a benchmark for reliable and scalable synthetic data deployment in adaptive security solutions.
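A Probability Distribution Difference check of the kind the abstract lists can be approximated per feature with a histogram divergence. Jensen-Shannon divergence is used here as an illustrative choice; the paper's exact metric may differ, and the Gaussian "features" are synthetic stand-ins.

```python
import numpy as np

def js_divergence(real, synth, bins=20):
    """Jensen-Shannon divergence between two 1-D samples via shared histograms."""
    lo, hi = min(real.min(), synth.min()), max(real.max(), synth.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(synth, bins=bins, range=(lo, hi))
    p, q = p / p.sum(), q / q.sum()
    m = (p + q) / 2
    def kl(a, b):                      # KL(a || b) over bins where a > 0
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(6)
real = rng.normal(0, 1, 5000)
good = rng.normal(0, 1, 5000)          # well-matched synthetic feature
bad = rng.normal(2, 1, 5000)           # distribution-shifted synthetic feature
print("good:", round(float(js_divergence(real, good)), 3),
      "bad:", round(float(js_divergence(real, bad)), 3))
```

A well-matched generator scores near zero while a shifted one does not, which is how a distribution-level metric separates structural realism from raw downstream utility.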

Updated: 2026-03-13 12:07:03

Categories: cs.CR

Download: http://arxiv.org/abs/2410.16326v3

HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

The rapid evolution of embodied agents has accelerated the deployment of household robots in real-world environments. However, unlike structured industrial settings, household spaces introduce unpredictable safety risks, where system limitations such as perception latency and lack of common sense knowledge can lead to dangerous errors. Current safety evaluations, often restricted to static images, text, or general hazards, fail to adequately benchmark dynamic unsafe action detection in these specific contexts. To bridge this gap, we introduce HomeSafe-Bench, a challenging benchmark designed to evaluate Vision-Language Models (VLMs) on unsafe action detection in household scenarios. HomeSafe-Bench is constructed via a hybrid pipeline combining physical simulation with advanced video generation and features 438 diverse cases across six functional areas with fine-grained multidimensional annotations. Beyond benchmarking, we propose Hierarchical Dual-Brain Guard for Household Safety (HD-Guard), a hierarchical streaming architecture for real-time safety monitoring. HD-Guard coordinates a lightweight FastBrain for continuous high-frequency screening with an asynchronous large-scale SlowBrain for deep multimodal reasoning, effectively balancing inference efficiency with detection accuracy. Evaluations demonstrate that HD-Guard achieves a superior trade-off between latency and performance, while our analysis identifies critical bottlenecks in current VLM-based safety detection.

Updated: 2026-03-13 10:53:52

领域: cs.CV,cs.AI,cs.CR

下载: http://arxiv.org/abs/2603.11975v2

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the persistence of jailbreak attacks suggests a fundamental mechanistic decoupling. We propose the Disentangled Safety Hypothesis (DSH), positing that safety computation operates on two distinct subspaces: a Recognition Axis ($\mathbf{v}_H$, "Knowing") and an Execution Axis ($\mathbf{v}_R$, "Acting"). Our geometric analysis reveals a universal "Reflex-to-Dissociation" evolution, where these signals transition from antagonistic entanglement in early layers to structural independence in deep layers. To validate this, we introduce Double-Difference Extraction and Adaptive Causal Steering. Using our curated AmbiguityBench, we demonstrate a causal double dissociation, effectively creating a state of "Knowing without Acting." Crucially, we leverage this disentanglement to propose the Refusal Erasure Attack (REA), which achieves state-of-the-art attack success rates by surgically lobotomizing the refusal mechanism. Furthermore, we uncover a critical architectural divergence, contrasting the Explicit Semantic Control of Llama3.1 with the Latent Distributed Control of Qwen2.5. The code and dataset are available at https://anonymous.4open.science/r/DSH.
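The core extraction idea can be sketched with a plain difference-of-means direction. This is a simplification of the paper's Double-Difference Extraction: the 4-d "activations" below are synthetic, and a single harmful-vs-benign difference stands in for the full procedure.

```python
# Toy sketch: estimate a "Recognition Axis" v_H as the difference of mean
# hidden activations over harmful vs. benign prompts, then ablate it
# (a crude stand-in for adaptive causal steering).

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Synthetic last-layer activations for harmful vs. benign prompts.
harmful = [[2.0, 0.1, -1.0, 0.0], [1.8, -0.1, -0.9, 0.2]]
benign  = [[0.1, 0.0, 1.0, 0.1], [-0.1, 0.2, 1.1, -0.1]]

v_H = sub(mean(harmful), mean(benign))   # Recognition Axis estimate

# Projection onto v_H separates the two classes:
scores_h = [dot(x, v_H) for x in harmful]
scores_b = [dot(x, v_H) for x in benign]
assert min(scores_h) > max(scores_b)

# Steering (toy): remove the component along v_H, so the representation
# no longer carries the "this is harmful" signal.
norm2 = dot(v_H, v_H)
steered = [sub(x, [dot(x, v_H) / norm2 * c for c in v_H]) for x in harmful]
print([dot(x, v_H) for x in steered])  # approximately 0 after ablation
```

In the paper's framing, the interesting part is that an analogous Execution Axis can be extracted and manipulated independently, which this single-direction toy does not show.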

Updated: 2026-03-13 10:42:07

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2603.05773v2

FoSAM: Forward Secret Messaging in Ad-Hoc Networks

Apps such as Firechat and Bridgefy have been used during recent protests in Hong Kong and Iran, as they allow communication over ad-hoc wireless networks even when internet access is restricted. However, these apps do not provide sufficient protection as they do not achieve forward secrecy in unreliable networks. Without forward secrecy, captured protesters' devices will disclose all previous messages to the authorities, putting them and others at great risk. In this paper, we introduce FoSAM, the first protocol to provide provably anonymous and forward-secret messaging in unreliable ad-hoc networks. Communication in FoSAM requires only the receiver's public key, rather than an interactive handshake. We evaluate the performance of FoSAM using a large-scale simulation with different user movement patterns, showing that it achieves between 92% and 99% successful message delivery. We additionally implement a FoSAM prototype for Android.
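Why forward secrecy matters here can be shown with a minimal hash ratchet. This is not FoSAM's actual protocol (which is public-key based and handshake-free); it is the textbook symmetric mechanism that makes past keys unrecoverable from a seized device.

```python
# Toy hash ratchet: each message key is derived from a rolling state, and
# the state is advanced one-way (SHA-256) after every message, with the old
# state erased. Seizing the device at step t exposes only current/future
# keys; recovering earlier keys would require inverting SHA-256.
import hashlib

def ratchet(state: bytes) -> bytes:
    """One-way state update."""
    return hashlib.sha256(b"ratchet" + state).digest()

def message_key(state: bytes) -> bytes:
    return hashlib.sha256(b"msg" + state).digest()

state = hashlib.sha256(b"initial shared secret").digest()
past_keys = []
for _ in range(3):
    past_keys.append(message_key(state))  # encrypt message i with this key
    state = ratchet(state)                # then erase the old state

# If the device is seized now, the adversary learns only `state`:
# future traffic is exposed (this toy has no post-compromise security),
# but none of past_keys can be derived from `state` without inverting
# SHA-256 -- that is forward secrecy.
assert all(k != message_key(state) for k in past_keys)
print("3 past message keys, none derivable from the seized state")
```

The apps criticized in the abstract fail exactly this property in unreliable networks: a captured device still holds material from which earlier message keys can be reconstructed.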

Updated: 2026-03-13 10:17:18

Categories: cs.CR

Download: http://arxiv.org/abs/2603.12871v1

Evolving Deception: When Agents Evolve, Deception Wins

Self-evolving agents offer a promising path toward scalable autonomy. However, in this work, we show that in competitive environments, self-evolution can instead give rise to a serious and previously underexplored risk: the spontaneous emergence of deception as an evolutionarily stable strategy. We conduct a systematic empirical study on the self-evolution of large language model (LLM) agents in a competitive Bidding Arena, where agents iteratively refine their strategies through interaction-driven reflection. Across different evolutionary paths (e.g., Neutral, Honesty-Guided, and Deception-Guided), we find a consistent pattern: under utility-driven competition, unconstrained self-evolution reliably drifts toward deceptive behaviors, even when honest strategies remain viable. This drift is explained by a fundamental asymmetry in generalization. Deception evolves as a transferable meta-strategy that generalizes robustly across diverse and unseen tasks, whereas honesty-based strategies are fragile and often collapse outside their original contexts. Further analysis of agents' internal states reveals the emergence of rationalization mechanisms, through which agents justify or deny deceptive actions to reconcile competitive success with normative instructions. Our paper exposes a fundamental tension between agent self-evolution and alignment, highlighting the risks of deploying self-improving agents in adversarial environments.

Updated: 2026-03-13 10:09:11

Categories: cs.CR

Download: http://arxiv.org/abs/2603.05872v2

Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads

Currently, open-source large language models (OSLLMs) have demonstrated remarkable generative performance. However, as their structure and weights are made public, they are exposed to jailbreak attacks even after alignment. Existing attacks operate primarily at shallow levels, such as the prompt or embedding level, and often fail to expose vulnerabilities rooted in deeper model components, which creates a false sense of security for successful defense. In this paper, we propose the Safety Attention Head Attack (SAHA), an attention-head-level jailbreak framework that explores vulnerabilities in deeper but insufficiently aligned attention heads. SAHA contains two novel designs. First, we reveal that deeper attention layers introduce more vulnerability to jailbreak attacks. Based on this finding, SAHA introduces an Ablation-Impact Ranking head-selection strategy to effectively locate the layer most critical for unsafe output. Second, we introduce a boundary-aware perturbation method, Layer-Wise Perturbation, to probe the generation of unsafe content with minimal perturbation to the attention. This constrained perturbation guarantees higher semantic relevance with the target intent while ensuring evasion. Extensive experiments show the superiority of our method: SAHA improves ASR by 14% over SOTA baselines, revealing the vulnerability of the attack surface on the attention head. Our code is available at https://anonymous.4open.science/r/SAHA.

Updated: 2026-03-13 09:35:20

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2603.05772v2

Balancing the privacy-utility trade-off: How to draw reliable conclusions from private data

Absolute anonymization, conceived as an irreversible transformation that prevents re-identification and sensitive value disclosure, has proven to be a broken promise. Consequently, modern data protection must shift toward a privacy-utility trade-off grounded in risk mitigation. Differential Privacy (DP) offers a rigorous mathematical framework for balancing quantified disclosure risk with analytical usefulness. Nevertheless, widespread adoption remains limited, largely because effective translation of complex technical concepts, such as privacy-loss parameters, into forms meaningful to non-technical stakeholders has yet to be achieved. This difficulty arises from the inherent use of randomization: both legitimate analysts and potential adversaries must draw conclusions from uncertain observations rather than deterministic values. In this work, we propose a new interpretation of the privacy-utility trade-off based on hypothesis testing. This perspective explicitly accounts for the uncertainty introduced by randomized mechanisms in both membership inference scenarios and general data analysis. In particular, we introduce the concept of relative disclosure risk to quantify the maximum reduction in uncertainty an adversary can obtain from protected outputs, and we show that this measure is directly related to standard privacy-loss parameters. At the same time, we analyze how DP affects analytical validity by studying its impact on hypothesis tests commonly used to assess the statistical significance of empirical results. Finally, we provide practical guidance, accessible to non-experts, for navigating the privacy-utility trade-off, aiding in the selection of suitable protection mechanisms and the values for the privacy-loss parameters.
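The hypothesis-testing view of DP can be made concrete with a small simulation. For a pure $\varepsilon$-DP mechanism, any adversary distinguishing two neighboring datasets is a binary test whose power obeys TPR $\le e^{\varepsilon} \cdot$ FPR for every decision rule; the sketch below checks this empirically for the Laplace mechanism on a counting query (true count 0 vs. 1, sensitivity 1). It is a standard illustration, not the paper's specific relative-disclosure-risk measure.

```python
# Empirical check of the DP hypothesis-testing bound TPR <= e^eps * FPR
# for the Laplace mechanism with scale 1/eps on neighboring counts 0 and 1.
import math, random

eps = 1.0
scale = 1.0 / eps

def laplace(mu, b, rng):
    """Laplace sample via inverse CDF."""
    u = rng.random() - 0.5
    return mu - b * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

rng = random.Random(0)
n = 200_000
threshold = 0.5   # decision rule: guess "count was 1" if output > 0.5

fpr = sum(laplace(0.0, scale, rng) > threshold for _ in range(n)) / n
tpr = sum(laplace(1.0, scale, rng) > threshold for _ in range(n)) / n

print(round(tpr / fpr, 2), "<=", round(math.exp(eps), 2))
```

Analytically, this rule gives FPR $= \tfrac{1}{2}e^{-0.5} \approx 0.30$ and TPR $= 1 - \tfrac{1}{2}e^{-0.5} \approx 0.70$, so the likelihood ratio is about 2.3, safely below $e^{1} \approx 2.72$: the adversary gains information, but boundedly so, which is exactly the trade-off the paper proposes to communicate to non-experts.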

Updated: 2026-03-13 07:54:08

Categories: stat.ME,cs.CR

Download: http://arxiv.org/abs/2603.12753v1

SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking

Watermarking the initial noise of diffusion models has emerged as a promising approach for image provenance, but content-independent noise patterns can be forged via inversion and regeneration attacks. Recent semantic-aware watermarking methods improve robustness by conditioning verification on image semantics. However, their reliance on a single global semantic binding makes them vulnerable to localized but globally coherent semantic edits. To address this limitation and provide a trustworthy semantic-aware watermark, we propose Semantic Latent Injection via Compartmentalized Embedding (SLICE). Our framework decouples image semantics into four semantic factors (subject, environment, action, and detail) and precisely anchors them to distinct regions in the initial Gaussian noise. This fine-grained semantic binding enables advanced watermark verification where semantic tampering is detectable and localizable. We theoretically justify why SLICE enables robust and reliable tamper localization and provides statistical guarantees on false-accept rates. Experimental results demonstrate that SLICE significantly outperforms existing baselines against advanced semantic-guided regeneration attacks, substantially reducing attack success while preserving image quality and semantic fidelity. Overall, SLICE offers a practical, training-free provenance solution that is both fine-grained in diagnosis and robust to realistic adversarial manipulations.
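Compartmentalized binding can be sketched on a plain Gaussian vector: each semantic factor seeds its own region of the noise, so a verifier can tell not just *that* the semantics changed but *which factor* changed. This toy operates on a flat vector rather than diffusion latents, and the seeding scheme is hypothetical.

```python
# Toy compartmentalized semantic binding in the spirit of SLICE: per-factor
# seeds derived from the semantic description fill disjoint noise regions,
# enabling localized tamper detection.
import hashlib, random

FACTORS = ["subject", "environment", "action", "detail"]
REGION = 64  # noise entries per factor

def factor_noise(name: str, value: str):
    seed = int.from_bytes(
        hashlib.sha256(f"{name}:{value}".encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(REGION)]

def build_noise(semantics: dict):
    return [x for f in FACTORS for x in factor_noise(f, semantics[f])]

def verify(noise, semantics, tol=1e-9):
    """Return the list of factors whose noise region does not match."""
    bad = []
    for i, f in enumerate(FACTORS):
        expected = factor_noise(f, semantics[f])
        region = noise[i * REGION:(i + 1) * REGION]
        if any(abs(a - b) > tol for a, b in zip(expected, region)):
            bad.append(f)
    return bad

sem = {"subject": "a corgi", "environment": "beach",
       "action": "running", "detail": "sunset"}
noise = build_noise(sem)
assert verify(noise, sem) == []

# A locally coherent semantic edit (swapping the action) is detected
# AND localized to its compartment:
print(verify(noise, dict(sem, action="sleeping")))  # ['action']
```

A single global binding, by contrast, could only report "something changed," which is the vulnerability the abstract highlights.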

Updated: 2026-03-13 07:49:01

Categories: cs.CV,cs.CR,cs.LG

Download: http://arxiv.org/abs/2603.12749v1

Expert Selections In MoE Models Reveal (Almost) As Much As Text

We present a text-reconstruction attack on mixture-of-experts (MoE) language models that recovers tokens from expert selections alone. In MoE models, each token is routed to a subset of expert subnetworks; we show these routing decisions leak substantially more information than previously understood. Prior work using logistic regression achieves limited reconstruction; we show that a 3-layer MLP improves this to 63.1% top-1 accuracy, and that a transformer-based sequence decoder recovers 91.2% of tokens top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100M tokens. These results connect MoE routing to the broader literature on embedding inversion. We outline practical leakage scenarios (e.g., distributed inference and side channels) and show that adding noise reduces but does not eliminate reconstruction. Our findings suggest that expert selections in MoE deployments should be treated as sensitive as the underlying text.
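The leakage channel is easy to see in a deterministic caricature: if expert selection is a (nearly) fixed function of the token, observed routing acts as a fingerprint invertible by lookup. Real models need the learned decoders described above, but the information source is the same. The router below is a hash-based stand-in, not a trained MoE gate.

```python
# Toy illustration of token recovery from MoE expert selections: a
# deterministic per-token routing signature (one expert choice per layer)
# is inverted with a precomputed lookup table over candidate tokens.
import hashlib

N_EXPERTS = 64
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "password", "hunter2"]

def route(token: str):
    """Stand-in for learned routers: expert choice at each of 4 MoE layers."""
    h = hashlib.sha256(token.encode()).digest()
    return tuple(h[i] % N_EXPERTS for i in range(4))

# Attacker precomputes routing signatures for candidate tokens:
inverse = {route(t): t for t in VOCAB}

observed = [route(t) for t in ["password", "cat", "hunter2"]]  # the leak
recovered = [inverse.get(sig, "<unknown>") for sig in observed]
print(recovered)
```

With 64 experts over 4 layers, the signature space has $64^4 \approx 1.7 \times 10^7$ points, so distinct tokens almost never collide — which is why the abstract's conclusion is to treat routing traces as sensitively as the text itself.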

Updated: 2026-03-13 06:37:48

Categories: cs.CL,cs.CR

Download: http://arxiv.org/abs/2602.04105v3

Colluding LoRA: A Composite Attack on LLM Safety Alignment

We introduce Colluding LoRA (CoLoRA), an attack in which each adapter appears benign and plausibly functional in isolation, yet their linear composition consistently compromises safety. Unlike attacks that depend on specific input triggers or prompt patterns, CoLoRA is a composition-triggered broad refusal suppression: once a particular set of adapters is loaded, the model undergoes effective alignment degradation, complying with harmful requests without requiring adversarial prompts or suffixes. This attack exploits the combinatorial blindness of current defense systems, where exhaustively scanning all compositions is computationally intractable. Across several open-weight LLMs, CoLoRA achieves benign behavior individually yet high attack success rate after composition, indicating that securing modular LLM supply-chains requires moving beyond single-module verification toward composition-aware defenses.
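The composition trigger can be shown numerically in one dimension too many for single-adapter scanning to catch: each update keeps a refusal score safely positive on its own, while their sum drives it negative. The numbers below are stand-ins; real CoLoRA adapters are low-rank weight matrices, not 2-d deltas.

```python
# Numeric toy of composition-triggered misalignment: two weight deltas that
# are individually benign but jointly collapse a "refusal score" (> 0 means
# the model refuses the harmful request).

base = [2.0, 1.0]              # aligned model: score 3.0 on the probe below
delta_a = [-1.0, -0.6]         # adapter A alone leaves score at 1.4
delta_b = [-0.6, -1.0]         # adapter B alone leaves score at 1.4

def refusal_score(w, harmful_feature=(1.0, 1.0)):
    return sum(wi * xi for wi, xi in zip(w, harmful_feature))

def compose(*deltas):
    w = list(base)
    for d in deltas:
        w = [wi + di for wi, di in zip(w, d)]
    return w

assert refusal_score(compose(delta_a)) > 0   # passes single-adapter audit
assert refusal_score(compose(delta_b)) > 0   # passes single-adapter audit
print(refusal_score(compose(delta_a, delta_b)))  # composed: score < 0
```

This is the "combinatorial blindness" the abstract points to: auditing each adapter in isolation cannot rule out a bad sum, and scanning all subsets does not scale.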

Updated: 2026-03-13 05:53:15

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2603.12681v1

Why Neural Structural Obfuscation Can't Kill White-Box Watermarks for Good!

Neural Structural Obfuscation (NSO) (USENIX Security '23) is a family of "zero cost" structure-editing transforms (nso_zero, nso_clique, nso_split) that inject dummy neurons. By combining neuron permutation and parameter scaling, NSO makes a radical modification to the network structure and parameters while strictly preserving functional equivalence, thereby disrupting white-box watermark verification. This capability has been a fundamental challenge to the reliability of existing white-box watermarking schemes. We rethink NSO and, for the first time, fully recover from the damage it causes. We redefine NSO as a graph-consistent threat model within a producer-consumer paradigm. This formulation posits that any obfuscation of a producer node necessitates a compatible layout update in all downstream consumers to maintain structural integrity. Building on these consistency constraints on signal propagation, we present Canon, a recovery framework that probes the attacked model to identify redundant/dummy channels and then globally canonicalizes the network by rewriting all downstream consumers by construction, synchronizing layouts across fan-out, add, and cat. Extensive experiments demonstrate that, even under strong composed and extended NSO attacks, Canon achieves 100% recovery success, restoring watermark verifiability while preserving task utility. Our code is available at https://anonymous.4open.science/r/anti-NSO-9874.
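The producer-consumer constraint can be shown on a two-layer linear "network" of nested lists. This sketch covers only the nso_zero-style dummy-channel case (real NSO also permutes and rescales, and Canon must handle fan-out, add, and cat); the key point is that undoing the attack requires rewriting the consumer layer in sync with the producer.

```python
# Toy producer-consumer canonicalization: injecting a dummy neuron into
# layer 1 (producer) forces a matching extra column in layer 2 (consumer);
# recovery drops the dead channel from BOTH layers.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, W2, x):
    return matvec(W2, matvec(W1, x))

W1 = [[1.0, 2.0], [3.0, 4.0]]          # producer: 2 -> 2
W2 = [[0.5, -1.0]]                     # consumer: 2 -> 1

# Attack: add an always-zero dummy neuron (zero row in W1) plus an
# arbitrary matching column in W2. Function is unchanged, but the layout
# no longer matches what a white-box watermark verifier expects.
W1_att = W1 + [[0.0, 0.0]]
W2_att = [row + [7.0] for row in W2]   # weight on a channel that is always 0

x = [1.0, -2.0]
assert forward(W1, W2, x) == forward(W1_att, W2_att, x)

def canonicalize(W1, W2):
    """Drop all-zero producer channels and, crucially, the matching
    consumer columns (the globally consistent rewrite)."""
    keep = [i for i, row in enumerate(W1) if any(w != 0.0 for w in row)]
    return [W1[i] for i in keep], [[row[i] for i in keep] for row in W2]

W1_rec, W2_rec = canonicalize(W1_att, W2_att)
assert (W1_rec, W2_rec) == (W1, W2)    # original layout recovered
print("dummy channel removed; watermark layout restored")
```

Dropping the zero row alone would leave W2 with a dangling column and break the network, which is why a purely local fix fails and a global, consumer-aware rewrite is needed.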

Updated: 2026-03-13 05:50:26

Categories: cs.CR

Download: http://arxiv.org/abs/2603.12679v1

FARM: Few-shot Adaptive Malware Family Classification under Concept Drift

Malware classification models often suffer performance degradation under concept drift due to evolving threat landscapes and the emergence of novel malware families. This paper presents FARM (Few-shot Adaptive Recognition of Malware), a unified framework for detecting and adapting to both covariate drift and label drift in Windows Portable Executable (PE) malware family classification. FARM uses a triplet autoencoder to project samples into a discriminative latent space, enabling unsupervised drift detection through DBSCAN clustering and dynamic thresholding. To enable rapid adaptation, the framework employs a few-shot strategy that can incorporate new classes from only a small number of labeled samples. FARM also supports full retraining when sufficient drifted samples accumulate, allowing longer-term model updating. Experiments on the BenchMFC dataset show that FARM improves classification performance under covariate drift by 5.6%, and achieves an average F1 score of 0.85 on unseen malware families using few-shot adaptation, increasing to 0.94 after retraining. These results indicate that FARM provides an effective approach for drift-aware malware family classification in dynamic environments with limited supervision.
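The drift-detection step can be sketched as distance-to-centroid with a data-derived threshold. This simplifies FARM considerably: the latents here are synthetic 2-d points and fixed centroids replace the triplet autoencoder plus DBSCAN clustering, but the "flag what falls outside in-distribution distances" logic is the same.

```python
# Toy latent-space drift detection: flag a sample as drifted when its
# distance to the nearest known-family centroid exceeds a dynamic
# threshold (mean + 3*std of training-time distances).
import math

train = {
    "emotet": [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)],
    "zeus":   [(5.0, 5.1), (4.8, 5.0), (5.1, 4.9)],
}
centroids = {
    f: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
    for f, pts in train.items()
}

d_train = [math.dist(p, centroids[f]) for f, pts in train.items() for p in pts]
mu = sum(d_train) / len(d_train)
sigma = (sum((d - mu) ** 2 for d in d_train) / len(d_train)) ** 0.5
threshold = mu + 3 * sigma   # dynamic threshold from in-distribution data

def classify(p):
    fam, d = min(((f, math.dist(p, c)) for f, c in centroids.items()),
                 key=lambda t: t[1])
    return fam if d <= threshold else "DRIFT"

assert classify((0.1, 0.0)) == "emotet"
print(classify((10.0, -8.0)))  # far from every family -> "DRIFT"
```

In FARM, samples flagged this way are queued for few-shot adaptation, and full retraining is triggered once enough drifted samples accumulate.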

Updated: 2026-03-13 04:43:38

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2601.17907v2

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape. Frameworks like OpenClaw grant AI systems operating-system-level permissions and the autonomy to execute complex workflows. This level of access creates unprecedented security challenges. Consequently, traditional content-filtering defenses have become obsolete. This report presents a comprehensive security analysis of the OpenClaw ecosystem. We systematically investigate its current threat landscape, highlighting critical vulnerabilities such as prompt injection-driven Remote Code Execution (RCE), sequential tool attack chains, context amnesia, and supply chain contamination. To systematically contextualize these threats, we propose a novel tri-layered risk taxonomy for autonomous Agents, categorizing vulnerabilities across AI Cognitive, Software Execution, and Information System dimensions. To address these systemic architectural flaws, we introduce the Full-Lifecycle Agent Security Architecture (FASA). This theoretical defense blueprint advocates for zero-trust agentic execution, dynamic intent verification, and cross-layer reasoning-action correlation. Building on this framework, we present Project ClawGuard, our ongoing engineering initiative. This project aims to implement the FASA paradigm and transition autonomous agents from high-risk experimental utilities into trustworthy systems. Our code and dataset are available at https://github.com/NY1024/ClawGuard.

Updated: 2026-03-13 04:33:05

Categories: cs.CR

Download: http://arxiv.org/abs/2603.12644v1

SoK: Evolution, Security, and Fundamental Properties of Transactional Systems

Transaction processing systems underpin modern commerce, finance, and critical infrastructure, yet their security has never been studied across the full evolutionary arc of these systems. Over five decades, transaction processing has progressed through four distinct generations, from centralized databases, to distributed databases, to blockchain and distributed ledger technologies (DLTs), finally to multi-context systems that span cyber-physical components under real-time constraints. Each generation has introduced new transaction types and new classes of vulnerabilities, yet security research remains fragmented by domain, and the foundational ACID transaction model has not been revisited to reflect the demands of contemporary systems. We classify 163 papers on transaction security by evolutionary generation, security focus, and relevant Common Weakness Enumeration (CWE) entries, and distill a curated set of 41 high-impact or seminal papers spanning all four generations. We make three principal contributions. First, we develop a four-generation evolutionary taxonomy that contextualizes each work within the broader trajectory of transaction processing. Second, we map each paper's security focus to CWE identifiers, providing a systems-oriented vocabulary for analyzing transaction-specific threats across otherwise siloed domains. Third, we demonstrate that the classical ACID properties are insufficient for modern transactional systems and introduce RANCID, extending ACID with Real-timeness (R) and N-many Contexts (N), as a property set for reasoning about the security and correctness of systems that must coordinate across heterogeneous contexts under timing constraints. Our systematization exposes a pronounced bias toward DLT security research at the expense of broader transactional security and identifies concrete open problems for the next generation of transaction processing systems.

Updated: 2026-03-13 04:20:35

Categories: cs.CR

Download: http://arxiv.org/abs/2603.07381v2

ExpanderGraph-128: A Novel Graph-Theoretic Block Cipher with Formal Security Analysis and Hardware Implementation

Lightweight block cipher design has largely focused on incremental optimization of established paradigms such as substitution-permutation networks, Feistel structures, and ARX constructions, where security derives from the algebraic complexity of individual components. We propose a different approach based on expander-graph interaction networks, where diffusion and security arise from sparse structural connectivity rather than component sophistication. We present ExpanderGraph-128 (EGC128), a 128-bit block cipher constructed as a 20-round balanced Feistel network. Each round applies a 64-bit nonlinear transformation governed by a 3-regular expander graph whose vertices execute identical 4-input Boolean functions on local neighborhoods. Security analysis combines MILP-based differential bounds, proven optimal through 10 rounds via SCIP, establishing 147.3-bit differential security and conservatively extrapolating to 413 bits for the full cipher. Linear analysis provides MILP bounds of $\geq 2^{145}$, while related-key evaluation shows no free rounds for any nonzero key difference. Additional tests confirm rapid algebraic degree growth and the absence of invariant affine subspaces. Implementation results demonstrate practical efficiency. FPGA synthesis on Xilinx Artix-7 achieves 261 Mbps at 100 MHz using only 380 LUTs, while ARM Cortex-M4F software requires 25.8 KB Flash and 1.66 ms per encryption. These results show that expander-graph-driven diffusion provides a promising design methodology for lightweight cryptography.
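The construction can be illustrated at miniature scale. Below is a toy 8-bit balanced Feistel cipher in the same spirit: each bit of the 4-bit half is produced by one identical 4-input Boolean function of its own bit and its 3 neighbors in a 3-regular graph (here $K_4$, the smallest 3-regular graph, standing in for EGC128's large expander); the block size, graph, round count, and subkeys are all illustrative, not the real cipher's.

```python
# Toy 8-bit Feistel cipher with a graph-governed round function. Note the
# round function need not be invertible: the Feistel structure guarantees
# decryption (same loop, reversed subkeys).

NEIGHBORS = {0: (1, 2, 3), 1: (0, 2, 3), 2: (0, 1, 3), 3: (0, 1, 2)}  # K4

def boolfunc(a, b, c, d):
    """Identical nonlinear 4-input Boolean function at every vertex:
    own bit XOR majority of the three neighbors."""
    return a ^ ((b & c) | (b & d) | (c & d))

def round_f(half, subkey):
    x = half ^ subkey
    bits = [(x >> i) & 1 for i in range(4)]
    out = 0
    for v, (p, q, r) in NEIGHBORS.items():
        out |= boolfunc(bits[v], bits[p], bits[q], bits[r]) << v
    return out

def feistel(block, subkeys, decrypt=False):
    left, right = (block >> 4) & 0xF, block & 0xF
    for k in (reversed(subkeys) if decrypt else subkeys):
        left, right = right, left ^ round_f(right, k)
    return (right << 4) | left   # undo the final swap

subkeys = [0x3, 0xA, 0x6, 0xC, 0x5]
for pt in range(256):
    assert feistel(feistel(pt, subkeys), subkeys, decrypt=True) == pt
print("all 256 blocks round-trip")
```

In EGC128 the sparse graph is an expander, so a one-bit difference reaches every vertex within a logarithmic number of rounds, which is where the diffusion claim comes from; $K_4$ reaches full diffusion trivially.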

Updated: 2026-03-13 04:15:20

Categories: cs.CR,cs.AR,cs.DS

Download: http://arxiv.org/abs/2603.12637v1

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

LLM-based web agents have become increasingly popular for their utility in daily life and work. However, they exhibit critical vulnerabilities when processing malicious URLs: accepting a disguised malicious URL enables subsequent access to unsafe webpages, which can cause severe damage to service providers and users. Despite this risk, no benchmark currently targets this emerging threat. To address this gap, we propose MalURLBench, the first benchmark for evaluating LLMs' vulnerabilities to malicious URLs. MalURLBench contains 61,845 attack instances spanning 10 real-world scenarios and 7 categories of real malicious websites. Experiments with 12 popular LLMs reveal that existing models struggle to detect elaborately disguised malicious URLs. We further identify and analyze key factors that impact attack success rates and propose URLGuard, a lightweight defense module. We believe this work will provide a foundational resource for advancing the security of web agents. Our code is available at https://github.com/JiangYingEr/MalURLBench.
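A few of the disguise tricks such a benchmark exercises can be screened with simple structural checks. The rules below are hypothetical illustrations (not the paper's URLGuard module): userinfo spoofing, where text before `@` masquerades as the host; punycode hosts; and raw-IP hosts.

```python
# Minimal heuristic screen for disguised URLs (illustrative rules only).
from urllib.parse import urlsplit

def screen(url: str):
    flags = []
    parts = urlsplit(url)
    host = parts.hostname or ""
    if "@" in parts.netloc:
        flags.append("userinfo-spoof")       # text before '@' is NOT the host
    if host.startswith("xn--") or ".xn--" in host:
        flags.append("punycode-host")        # possible homograph domain
    if host.replace(".", "").isdigit():
        flags.append("raw-ip-host")
    return flags

assert screen("https://paypal.com@evil.example/login") == ["userinfo-spoof"]
assert screen("http://xn--pypal-4ve.example/") == ["punycode-host"]
assert screen("http://203.0.113.7/update.exe") == ["raw-ip-host"]
print(screen("https://arxiv.org/abs/2601.18113"))  # [] -> no flags
```

The benchmark's finding is precisely that such structural cues are necessary but insufficient: elaborately disguised URLs defeat both naive heuristics and current VLM/LLM judgment, motivating a dedicated defense module.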

Updated: 2026-03-13 04:12:36

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2601.18113v3

AEGIS: No Tool Call Left Unchecked -- A Pre-Execution Firewall and Audit Layer for AI Agents

AI agents increasingly act through external tools: they query databases, execute shell commands, read and write files, and send network requests. Yet in most current agent stacks, model-generated tool calls are handed to the execution layer with no framework-agnostic control point in between. Post-execution observability can record these actions, but it cannot stop them before side effects occur. We present AEGIS, a pre-execution firewall and audit layer for AI agents. AEGIS interposes on the tool-execution path and applies a three-stage pipeline: (i) deep string extraction from tool arguments, (ii) content-first risk scanning, and (iii) composable policy validation. High-risk calls can be held for human approval, and all decisions are recorded in a tamper-evident audit trail based on Ed25519 signatures and SHA-256 hash chaining. In the current implementation, AEGIS supports 14 agent frameworks across Python, JavaScript, and Go with lightweight integration. On a curated suite of 48 attackinstances, AEGIS blocks all attacks in the suite before execution; on 500 benign tool calls, it yields a 1.2% false positive rate; and across 1,000 consecutive interceptions, it adds 8.3 ms median latency. The live demo will show end-to-end interception of benign, malicious, and human-escalated tool calls, allowing attendees to observe real-time blocking, approval workflows, and audit-trail generation. These results suggest that pre-execution mediation for AI agents can be practical, low-overhead, and directly deployable.
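One of the two tamper-evidence mechanisms the abstract names, SHA-256 hash chaining, fits in a few lines. This is a minimal sketch: the Ed25519 signature over each entry is omitted (in a real system it would sign the entry hash so a forger cannot simply recompute the chain).

```python
# Minimal tamper-evident audit trail via SHA-256 hash chaining: each entry's
# hash covers both its record and the previous entry's hash, so rewriting
# any past record breaks verification from that point on.
import hashlib, json

def entry_hash(prev_hash: str, record: dict) -> str:
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append(log, record):
    prev = log[-1]["hash"] if log else "0" * 64
    log.append({"record": record, "hash": entry_hash(prev, record)})

def verify(log) -> bool:
    prev = "0" * 64
    for e in log:
        if e["hash"] != entry_hash(prev, e["record"]):
            return False
        prev = e["hash"]
    return True

log = []
append(log, {"tool": "shell", "args": "ls /tmp", "decision": "allow"})
append(log, {"tool": "shell", "args": "rm -rf /", "decision": "block"})
assert verify(log)

log[1]["record"]["decision"] = "allow"   # attacker rewrites history...
print(verify(log))                        # ...and the chain check fails
```

Canonical JSON serialization (`sort_keys=True`) matters here: without a deterministic encoding, honest re-verification of an untampered record could produce a different hash.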

Updated: 2026-03-13 03:49:12

Categories: cs.CR

Download: http://arxiv.org/abs/2603.12621v1

Hardness of Range Avoidance and Proof Complexity Generators from Demi-Bits

Given a circuit $G: \{0, 1\}^n \to \{0, 1\}^m$ with $m > n$, the *range avoidance* problem ($\text{Avoid}$) asks to output a string $y\in \{0, 1\}^m$ that is not in the range of $G$. Besides its profound connection to circuit complexity and explicit construction problems, this problem is also related to the existence of *proof complexity generators* -- circuits $G: \{0, 1\}^n \to \{0, 1\}^m$ where $m > n$ but for every $y\in \{0, 1\}^m$, it is infeasible to prove the statement "$y\not\in\mathrm{Range}(G)$" in a given propositional proof system. This paper connects these two problems with the existence of *demi-bits generators*, a fundamental cryptographic primitive against nondeterministic adversaries introduced by Rudich (RANDOM '97). $\bullet$ We show that the existence of demi-bits generators implies $\text{Avoid}$ is hard for nondeterministic algorithms. This resolves an open problem raised by Chen and Li (STOC '24). Furthermore, assuming the demi-hardness of certain LPN-style generators or Goldreich' PRG, we prove the hardness of $\text{Avoid}$ even when the instances are constant-degree polynomials over $\mathbb{F}_2$. $\bullet$ We show that the dual weak pigeonhole principle is unprovable in Cook's theory $\mathsf{PV}_1$ under the existence of demi-bits generators secure against $\mathbf{AM}$, thereby separating Jerabek's theory $\mathsf{APC}_1$ from $\mathsf{PV}_1$. $\bullet$ We transform demi-bits generators to proof complexity generators that are *pseudo-surjective* with nearly optimal parameters. Our constructions build on the recent breakthroughs on the hardness of $\text{Avoid}$ by Ilango, Li, and Williams (STOC '23) and Chen and Li (STOC '24). We use *randomness extractors* to significantly simplify the construction and the proof.
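The pigeonhole argument that makes $\text{Avoid}$ total is worth seeing concretely: since $m > n$, the range of $G$ has at most $2^n < 2^m$ points, so a non-image string always exists and brute force finds one in $2^n$ evaluations. The results above concern the much harder question of finding one *efficiently* (in particular, nondeterministically); the tiny $G$ below is an arbitrary placeholder function, not a meaningful circuit.

```python
# Brute-force range avoidance for a toy G: {0,1}^3 -> {0,1}^4, with inputs
# and outputs encoded as integers. Totality follows from the pigeonhole
# principle: |image| <= 2^n < 2^m.

def avoid(G, n, m):
    image = {G(x) for x in range(2 ** n)}   # at most 2^n distinct outputs
    for y in range(2 ** m):
        if y not in image:
            return y                        # guaranteed to exist since m > n

G = lambda x: (x * 5 + 3) % 16              # arbitrary 3-bit -> 4-bit map
y = avoid(G, 3, 4)
assert all(G(x) != y for x in range(8))
print(f"{y:04b} is outside the range of G")
```

The exponential enumeration is exactly what the hardness results rule out replacing: under demi-bits generators, even nondeterministic algorithms cannot solve $\text{Avoid}$ efficiently.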

Updated: 2026-03-13 03:47:26

Categories: cs.CC,cs.CR

Download: http://arxiv.org/abs/2511.14061v2

ChainFuzzer: Greybox Fuzzing for Workflow-Level Multi-Tool Vulnerabilities in LLM Agents

Tool-augmented LLM agents increasingly rely on multi-step, multi-tool workflows to complete real tasks. This design expands the attack surface, because data produced by one tool can be persisted and later reused as input to another tool, enabling exploitable source-to-sink dataflows that only emerge through tool composition. We study this risk as multi-tool vulnerabilities in LLM agents, and show that existing discovery efforts focused on single-tool or single-hop testing miss these long-horizon behaviors and provide limited debugging value. We present ChainFuzzer, a greybox framework for discovering and reproducing multi-tool vulnerabilities with auditable evidence. ChainFuzzer (i) identifies high-impact operations with strict source-to-sink dataflow evidence and extracts plausible upstream candidate tool chains based on cross-tool dependencies, (ii) uses Trace-guided Prompt Solving (TPS) to synthesize stable prompts that reliably drive the agent to execute target chains, and (iii) performs guardrail-aware fuzzing to reproduce vulnerabilities under LLM guardrails via payload mutation and sink-specific oracles. We evaluate ChainFuzzer on 20 popular open-source LLM agent apps (998 tools). ChainFuzzer extracts 2,388 candidate tool chains and synthesizes 2,213 stable prompts, confirming 365 unique, reproducible vulnerabilities across 19/20 apps (302 require multi-tool execution). Component evaluation shows tool-chain extraction achieves 96.49% edge precision and 91.50% strict chain precision; TPS increases chain reachability from 27.05% to 95.45%; guardrail-aware fuzzing boosts payload-level trigger rate from 18.20% to 88.60%. Overall, ChainFuzzer achieves 3.02 vulnerabilities per 1M tokens, providing a practical foundation for testing and hardening real-world multi-tool agent systems.
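The first stage, extracting candidate tool chains from cross-tool dependencies, can be pictured as path enumeration over a dependency graph whose edges mean "tool A's output can later feed tool B's input," ending at sink tools with high-impact effects. The toy below is a hedged sketch of that idea only; the tool names, graph, and function are hypothetical and are not ChainFuzzer's actual extraction logic, which additionally requires strict source-to-sink dataflow evidence.

```python
# Toy enumeration of candidate source-to-sink tool chains.
# Edges: key tool's output can be consumed by the listed tools.
DEPS = {
    "fetch_url": ["save_note"],
    "save_note": ["read_note"],
    "read_note": ["shell_exec"],   # shell_exec is a sink: it executes content
}
SINKS = {"shell_exec"}

def candidate_chains(deps, sinks, max_len=4):
    """Enumerate simple dependency paths that end at a sink tool."""
    results = []

    def walk(path):
        last = path[-1]
        if last in sinks:
            results.append(list(path))
            return
        if len(path) == max_len:
            return
        for nxt in deps.get(last, []):
            if nxt not in path:      # keep paths simple (no revisits)
                walk(path + [nxt])

    for start in deps:
        walk([start])
    return results

chains = candidate_chains(DEPS, SINKS)
# The full chain fetch_url -> save_note -> read_note -> shell_exec is found,
# along with its sink-ending suffixes starting at save_note and read_note.
assert ["fetch_url", "save_note", "read_note", "shell_exec"] in chains
```

Each such candidate would then be handed to prompt synthesis (TPS) to check whether the agent can actually be driven to execute it.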

Updated: 2026-03-13 03:35:54

Categories: cs.SE,cs.CR

Download: http://arxiv.org/abs/2603.12614v1

RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection

Textual adversarial attacks pose a serious security threat to Natural Language Processing (NLP) systems by introducing imperceptible perturbations that mislead deep learning models. While adversarial example detection offers a lightweight alternative to robust training, existing methods typically rely on prior knowledge of attacks, white-box access to the victim model, or numerous queries, which severely limits their practical deployment. This paper introduces RTD-Guard, a novel black-box framework for detecting textual adversarial examples. Our key insight is that word-substitution perturbations in adversarial attacks closely resemble the "replaced tokens" that a Replaced Token Detection (RTD) discriminator is pre-trained to identify. Leveraging this, RTD-Guard employs an off-the-shelf RTD discriminator, without fine-tuning, to localize suspicious tokens, masks them, and detects adversarial examples by observing the prediction confidence shift of the victim model before and after intervention. The entire process requires no adversarial data, model tuning, or internal model access, and uses only two black-box queries. Comprehensive experiments on multiple benchmark datasets demonstrate that RTD-Guard effectively detects adversarial texts generated by diverse state-of-the-art attack methods. It surpasses existing detection baselines across multiple metrics, offering a highly efficient, practical, and resource-light defense mechanism, particularly suited for real-world deployment in resource-constrained or privacy-sensitive environments.
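The two-query, confidence-shift detection described above can be sketched with stubs. Everything below is hypothetical: the discriminator and victim model are toy stand-ins (a real deployment would use a pre-trained RTD discriminator such as an ELECTRA-style model and the actual black-box victim), chosen only to make the control flow concrete.

```python
# Sketch of the mask-and-requery detection loop (stub models, not RTD-Guard's).

def stub_discriminator(tokens):
    """Stand-in for an RTD discriminator: flags tokens it deems replaced."""
    suspicious = {"fi1m", "grrat"}            # perturbed spellings
    return [t in suspicious for t in tokens]

def stub_victim_confidence(tokens):
    """Stand-in for the victim's top-class confidence: high when the
    (adversarial) perturbed tokens drive the prediction."""
    return 0.95 if any(t in {"fi1m", "grrat"} for t in tokens) else 0.60

def is_adversarial(tokens, threshold=0.2):
    conf_before = stub_victim_confidence(tokens)          # black-box query 1
    flags = stub_discriminator(tokens)                    # localize suspects
    masked = ["[MASK]" if f else t for t, f in zip(tokens, flags)]
    conf_after = stub_victim_confidence(masked)           # black-box query 2
    # A sharp confidence drop after masking suggests the prediction
    # hinged on the suspicious tokens, i.e. an adversarial input.
    return (conf_before - conf_after) > threshold

assert is_adversarial(["grrat", "fi1m"])       # perturbed input is flagged
assert not is_adversarial(["great", "film"])   # clean input passes
```

The point of the structure is that only the two `stub_victim_confidence` calls touch the victim, matching the abstract's two-black-box-query budget.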

Updated: 2026-03-13 02:30:56

Categories: cs.CL,cs.CR

Download: http://arxiv.org/abs/2603.12582v1

Bipartite Randomized Response Mechanism for Local Differential Privacy

With the increasing importance of data privacy, Local Differential Privacy (LDP) has recently become a strong measure of privacy for protecting each user's privacy from data analysts without relying on a trusted third party. In this paper, we consider the problem of high-utility differentially private release. Given a domain of items and a distance-defined utility function, our goal is to design a differentially private mechanism that releases an item while keeping the global expected error as small as possible. The most common LDP mechanism for this task is the Generalized Randomized Response (GRR) mechanism, which treats all candidate items equally except for the true item. In this paper, we introduce the Bipartite Randomized Response mechanism (BRR), which adaptively divides all candidate items into two parts by utility rankings. In the local search phase, we determine how many high-utility candidates should be assigned the high release probability, which yields the locally optimal bipartite classification of all candidates. To preserve LDP, the global search phase uniformly selects the smallest number of dynamic high-utility candidates obtained locally. In particular, we give explicit formulas for the uniform number of dynamic high-utility candidates. Theoretically, the global expected error of BRR decreases by an asymptotically exact ratio; when the privacy budget is set to $3$, the expected error can be reduced by $66.4\%$. Extensive experiments demonstrate that BRR outperforms the state-of-the-art methods across the standard metrics and datasets.
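For reference, the GRR baseline that BRR generalizes can be sketched in a few lines. This is the standard textbook mechanism, not the paper's BRR (which additionally splits candidates into high- and low-utility parts); the function name is our own.

```python
import math
import random

def grr_release(true_item, domain, epsilon, rng=random):
    """Generalized Randomized Response: report the true item with
    probability p = e^eps / (e^eps + k - 1), otherwise report one of
    the other k - 1 items uniformly. Satisfies epsilon-LDP because the
    probability ratio between any two outputs is at most e^eps."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return true_item
    others = [x for x in domain if x != true_item]
    return rng.choice(others)

# LDP sanity check: p / q equals e^eps exactly, where q is the
# probability of reporting any fixed non-true item.
k, eps = 5, 1.0
p = math.exp(eps) / (math.exp(eps) + k - 1)
q = (1 - p) / (k - 1)
assert abs(p / q - math.exp(eps)) < 1e-9
```

GRR's uniform treatment of the $k-1$ non-true items is exactly what BRR relaxes: it gives the high-utility part a larger release probability so that, when the true item is not reported, a near-substitute is more likely than a distant one.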

Updated: 2026-03-13 01:25:37

Categories: cs.CR

Download: http://arxiv.org/abs/2504.20926v3

SoK: Market Microstructure for Decentralized Prediction Markets (DePMs)

Decentralized prediction markets (DePMs) allow open participation in event-based wagering without fully relying on centralized intermediaries. We review the history of DePMs, which dates back to 2011 and includes hundreds of proposals. Perhaps surprisingly, modern DePMs like Polymarket deviate materially from earlier designs like Truthcoin and Augur v1. We use our review to present a modular workflow comprising eight stages: underlying infrastructure, market topic, share structure and pricing, market initialization, trading, market resolution, settlement, and archiving. For each module, we enumerate the design variants, analyzing trade-offs around decentralization, expressiveness, and manipulation resistance. We also identify open problems for researchers interested in this ecosystem.

Updated: 2026-03-13 00:42:45

Categories: cs.CE,cs.CR,q-fin.TR

Download: http://arxiv.org/abs/2510.15612v3

By Xinhai (Sean) Zou.