    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        


On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment

With the increased deployment of large language models (LLMs), one concern is their potential misuse for generating harmful content. Our work studies the alignment challenge, with a focus on filters to prevent the generation of unsafe information. Two natural points of intervention are the filtering of the input prompt before it reaches the model, and filtering the output after generation. Our main results demonstrate computational challenges in filtering both prompts and outputs. First, we show that there exist LLMs for which there are no efficient prompt filters: adversarial prompts that elicit harmful behavior can be easily constructed, which are computationally indistinguishable from benign prompts for any efficient filter. Our second main result identifies a natural setting in which output filtering is computationally intractable. All of our separation results are under cryptographic hardness assumptions. In addition to these core findings, we also formalize and study relaxed mitigation approaches, demonstrating further computational barriers. We conclude that safety cannot be achieved by designing filters external to the LLM internals (architecture and weights); in particular, black-box access to the LLM will not suffice. Based on our technical results, we argue that an aligned AI system's intelligence cannot be separated from its judgment.

Updated: 2025-07-09 23:55:35

Categories: cs.AI,cs.CR

Download: http://arxiv.org/abs/2507.07341v1

ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints

Foundation models are pre-trained on large-scale datasets and subsequently fine-tuned on small-scale datasets using parameter-efficient fine-tuning (PEFT) techniques like low-rank adapters (LoRA). In most previous works, LoRA weight matrices are randomly initialized with a fixed rank across all attachment points. In this paper, we improve the convergence and final performance of LoRA fine-tuning using our proposed data-driven weight initialization method, ConsNoTrainLoRA (CNTLoRA). We cast LoRA initialization as a domain-shift problem in which multiple constraints relate the pre-training and fine-tuning activations. By reformulating these constraints, we obtain a closed-form estimate of the LoRA weights that depends on the pre-training weights and fine-tuning activation vectors, and hence requires no training during initialization. This weight estimate is decomposed to initialize the up and down matrices, with the flexibility of variable ranks across attachment points. With the proposed initialization method, we fine-tune on downstream tasks such as image generation, image classification and image understanding. Both quantitative and qualitative results demonstrate that CNTLoRA outperforms standard and data-driven weight initialization methods. Extensive analyses and ablations further elucidate the design choices of our framework, providing an optimal recipe for faster convergence and enhanced performance.
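
The closed-form flavor of such an initialization can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact constraints: here the weight shift implied by fine-tuning activations is least-squares-fitted and split by a truncated SVD, with the rank chosen from singular-value energy; `cnt_lora_init` and the `energy` threshold are invented names.

```python
import numpy as np

def cnt_lora_init(W0, X, Y, energy=0.9):
    """Hypothetical sketch: closed-form LoRA init from activations.

    W0 : (d_out, d_in) pre-trained weight
    X  : (d_in, n) fine-tuning input activations
    Y  : (d_out, n) target output activations
    Returns down/up matrices (A, B) with a data-chosen rank.
    """
    # Least-squares estimate of the weight shift implied by the data
    dW, *_ = np.linalg.lstsq(X.T, (Y - W0 @ X).T, rcond=None)
    dW = dW.T                                  # (d_out, d_in)
    U, s, Vt = np.linalg.svd(dW, full_matrices=False)
    # Variable rank: keep enough singular values to cover `energy`
    r = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1
    B = U[:, :r] * np.sqrt(s[:r])              # up matrix (d_out, r)
    A = np.sqrt(s[:r])[:, None] * Vt[:r]       # down matrix (r, d_in)
    return A, B
```

The returned A (down) and B (up) factors would then replace the usual random LoRA initialization at each attachment point.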

Updated: 2025-07-09 23:52:31

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.08044v1

Adversarial Defenses via Vector Quantization

Adversarial attacks pose significant challenges to the robustness of modern deep neural networks in computer vision, and defending these networks against adversarial attacks has attracted intense research efforts. Among various defense strategies, preprocessing-based defenses are practically appealing since there is no need to train the network under protection. However, such approaches typically do not achieve robustness comparable to other methods such as adversarial training. In this paper, we propose a novel framework for preprocessing-based defenses, where a vector quantizer is used as a preprocessor. This framework, inspired by and extended from Randomized Discretization (RandDisc), is theoretically principled by rate-distortion theory: indeed, RandDisc may be viewed as a scalar quantizer, and rate-distortion theory suggests that such quantization schemes are inferior to vector quantization. In our framework, the preprocessing vector quantizer treats the input image as a collection of patches and finds a set of representative patches based on the patch distributions; each original patch is then modified according to the representative patches close to it. We present two lightweight defenses in this framework, referred to as patched RandDisc (pRD) and sliding-window RandDisc (swRD), where the patches are disjoint in the former and overlapping in the latter. We show that vector-quantization-based defenses have certifiable robust accuracy and that pRD and swRD achieve state-of-the-art performance, surpassing RandDisc by a large margin. Notably, the proposed defenses possess the obfuscated gradients property. However, our experiments show that pRD and swRD remain effective under the STE and EOT attacks, which are designed specifically for defenses with gradient obfuscation. ...
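
The patch-quantization idea can be sketched as follows: a minimal pRD-style preprocessor using plain k-means over disjoint patches. The patch size, number of representatives, and clustering routine below are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def patched_vq_defense(img, patch=4, k=8, iters=10, seed=0):
    """Illustrative sketch of a pRD-style preprocessor.

    Splits a grayscale image into disjoint `patch` x `patch` tiles, learns k
    representative patches with a few k-means steps, and snaps every tile to
    its nearest representative before the image reaches the classifier.
    """
    H, W = img.shape
    tiles = (img[:H - H % patch, :W - W % patch]
             .reshape(H // patch, patch, W // patch, patch)
             .swapaxes(1, 2).reshape(-1, patch * patch))
    rng = np.random.default_rng(seed)
    centers = tiles[rng.choice(len(tiles), size=k, replace=False)]
    for _ in range(iters):                       # plain k-means on patches
        d = ((tiles[:, None] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():              # keep old center if cluster empties
                centers[j] = tiles[assign == j].mean(0)
    quantized = centers[assign].reshape(H // patch, W // patch, patch, patch)
    return quantized.swapaxes(1, 2).reshape(H - H % patch, W - W % patch)
```

Every output tile is one of at most k representatives, which is the rate-limiting step that rate-distortion theory says a vector quantizer performs better than per-pixel (scalar) discretization.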

Updated: 2025-07-09 23:51:43

Categories: cs.LG,cs.CR,cs.CV

Download: http://arxiv.org/abs/2305.13651v2

Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset

Decisions about managing patients on the heart transplant waitlist are currently made by committees of doctors who consider multiple factors, but the process remains largely ad-hoc. With the growing volume of longitudinal patient, donor, and organ data collected by the United Network for Organ Sharing (UNOS) since 2018, there is increasing interest in analytical approaches to support clinical decision-making at the time of organ availability. In this study, we benchmark machine learning models that leverage longitudinal waitlist history data for time-dependent, time-to-event modeling of waitlist mortality. We train on 23,807 patient records with 77 variables and evaluate both survival prediction and discrimination at a 1-year horizon. Our best model achieves a C-Index of 0.94 and AUROC of 0.89, significantly outperforming previous models. Key predictors align with known risk factors while also revealing novel associations. Our findings can support urgency assessment and policy refinement in heart transplant decision making.
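
The reported C-Index measures how often predicted risks order patient pairs consistently with observed survival. A minimal, unweighted version of the metric looks like this (simplified relative to the time-dependent variant the paper evaluates):

```python
def concordance_index(times, events, risk):
    """Simple (unweighted) C-index: fraction of comparable patient pairs
    whose predicted risks are ordered consistently with observed survival.

    times  : follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    risk   : predicted risk scores; higher = predicted to die sooner
    Ties in risk count as half a concordant pair.
    """
    num = den = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                    # the earlier member of a pair must have an observed event
        for j in range(n):
            if times[i] < times[j]:     # i had the event before j's follow-up ended
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect risk ranking, which puts the paper's 0.94 in context.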

Updated: 2025-07-09 23:51:31

Categories: stat.AP,cs.LG

Download: http://arxiv.org/abs/2507.07339v1

Bayesian Double Descent

Double descent is a phenomenon of over-parameterized statistical models. Our goal is to view double descent from a Bayesian perspective. Over-parameterized models such as deep neural networks have an interesting re-descending property in their risk characteristics. This is a recent phenomenon in machine learning and has been the subject of many studies. As the complexity of the model increases, there is a U-shaped region corresponding to the traditional bias-variance trade-off, but then as the number of parameters equals the number of observations and the model becomes one of interpolation, the risk can become infinite and then, in the over-parameterized region, it re-descends -- the double descent effect. We show that this has a natural Bayesian interpretation. Moreover, we show that it is not in conflict with the traditional Occam's razor that Bayesian models possess, in that they tend to prefer simpler models when possible. We illustrate the approach with an example of Bayesian model selection in neural networks. Finally, we conclude with directions for future research.
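
The re-descending risk curve itself is easy to reproduce in a toy, non-Bayesian simulation: for minimum-norm least squares, test risk peaks near the interpolation threshold p = n. Dimensions, noise level, and trial counts below are arbitrary choices for illustration.

```python
import numpy as np

def min_norm_risk(n=50, p_list=(10, 25, 50, 100, 200), trials=30, noise=0.2, seed=0):
    """Toy double-descent curve (a standalone sketch, not the paper's
    Bayesian analysis): average excess risk of the minimum-norm
    least-squares fit as the parameter count p sweeps past n."""
    rng = np.random.default_rng(seed)
    risks = {}
    for p in p_list:
        errs = []
        for _ in range(trials):
            w = rng.normal(size=p) / np.sqrt(p)        # true coefficients, E||w||^2 = 1
            X = rng.normal(size=(n, p))
            y = X @ w + noise * rng.normal(size=n)
            w_hat = np.linalg.pinv(X) @ y              # min-norm solution
            # For isotropic Gaussian test inputs, excess risk is ||w_hat - w||^2
            errs.append(np.sum((w_hat - w) ** 2))
        risks[p] = float(np.mean(errs))
    return risks
```

The risk spikes at p = n, where the design matrix is square and ill-conditioned, then re-descends in the over-parameterized regime, which is the curve the paper reinterprets through a Bayesian lens.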

Updated: 2025-07-09 23:47:26

Categories: stat.ML,cs.LG,stat.CO

Download: http://arxiv.org/abs/2507.07338v1

MarineFormer: A Spatio-Temporal Attention Model for USV Navigation in Dynamic Marine Environments

Autonomous navigation in marine environments can be extremely challenging, especially in the presence of spatially varying flow disturbances and dynamic and static obstacles. In this work, we demonstrate that incorporating local flow field measurements fundamentally alters the nature of the problem, transforming otherwise unsolvable navigation scenarios into tractable ones. However, the mere availability of flow data is not sufficient; it must be effectively fused with conventional sensory inputs such as ego-state and obstacle states. To this end, we propose MarineFormer, a Transformer-based policy architecture that integrates two complementary attention mechanisms: spatial attention for sensor fusion, and temporal attention for capturing environmental dynamics. MarineFormer is trained end-to-end via reinforcement learning in a 2D simulated environment with realistic flow features and obstacles. Extensive evaluations against classical and state-of-the-art baselines show that our approach improves episode completion success rate by nearly 23% while reducing path length. Ablation studies further highlight the critical role of flow measurements and the effectiveness of our proposed architecture in leveraging them.

Updated: 2025-07-09 23:40:31

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2410.13973v4

Leveraging Manifold Embeddings for Enhanced Graph Transformer Representations and Learning

Graph transformers typically embed every node in a single Euclidean space, blurring heterogeneous topologies. We prepend a lightweight Riemannian mixture-of-experts layer that routes each node to a mixture of manifolds (spherical, flat, hyperbolic) that best matches its local structure. These projections provide intrinsic geometric explanations of the latent space. Inserted into a state-of-the-art ensemble graph transformer, this projector lifts accuracy by up to 3% on four node-classification benchmarks. The ensemble ensures that both Euclidean and non-Euclidean features are captured. Explicit, geometry-aware projection thus sharpens predictive power while making graph representations more interpretable.
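
A hypothetical rendering of such a layer follows; the shapes, the softmax gate, and the projection maps are illustrative sketches, not the paper's exact Riemannian operations.

```python
import numpy as np

def manifold_moe_layer(h, Wg, Ws, Wf, Wh):
    """Sketch of a Riemannian mixture-of-experts projection (illustrative).

    Each node embedding in h (n_nodes, d) is mapped by three experts,
    one per geometry, and a softmax gate mixes them per node.
    Wg: (d, 3) gate weights; Ws/Wf/Wh: (d, d) expert weights.
    """
    def sphere(z):                      # project onto the unit sphere
        return z / (np.linalg.norm(z, axis=-1, keepdims=True) + 1e-9)

    def poincare(z):                    # squash into the open unit ball
        n = np.linalg.norm(z, axis=-1, keepdims=True)
        return np.tanh(n) * z / (n + 1e-9)

    experts = np.stack([sphere(h @ Ws), h @ Wf, poincare(h @ Wh)], axis=1)
    gate = np.exp(h @ Wg)               # (n_nodes, 3) unnormalized scores
    gate = gate / gate.sum(-1, keepdims=True)
    return (gate[:, :, None] * experts).sum(1)
```

The gate plays the role of the router: nodes whose neighborhoods look tree-like would upweight the hyperbolic expert, cyclic neighborhoods the spherical one, and grid-like ones the flat expert.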

Updated: 2025-07-09 23:33:36

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.07335v1

Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery

Large Language Models (LLMs) often generate scientifically plausible but factually invalid information, a challenge we term the "plausibility-validity gap," particularly in specialized domains like chemistry. This paper presents a systematic methodology to bridge this gap by developing a specialized scientific assistant. We utilized the Magistral Small model, noted for its integrated reasoning capabilities, and fine-tuned it using Low-Rank Adaptation (LoRA). A key component of our approach was the creation of a "dual-domain dataset," a comprehensive corpus curated from various sources encompassing both molecular properties and chemical reactions, which was standardized to ensure quality. Our evaluation demonstrates that the fine-tuned model achieves significant improvements over the baseline model in format adherence, chemical validity of generated molecules, and the feasibility of proposed synthesis routes. The results indicate a hierarchical learning pattern, where syntactic correctness is learned more readily than chemical possibility and synthesis feasibility. While a comparative analysis with human experts revealed competitive performance in areas like chemical creativity and reasoning, it also highlighted key limitations, including persistent errors in stereochemistry, a static knowledge cutoff, and occasional reference hallucination. This work establishes a viable framework for adapting generalist LLMs into reliable, specialized tools for chemical research, while also delineating critical areas for future improvement.

Updated: 2025-07-09 23:05:23

Categories: cs.LG,cs.AI,cs.CE,physics.chem-ph

Download: http://arxiv.org/abs/2507.07328v1

Optimizing Model Splitting and Device Task Assignment for Deceptive Signal Assisted Private Multi-hop Split Learning

In this paper, deceptive signal-assisted private split learning is investigated. In our model, several edge devices jointly perform collaborative training, and some eavesdroppers aim to collect the model and data information from devices. To prevent the eavesdroppers from collecting model and data information, a subset of devices can transmit deceptive signals. Therefore, it is necessary to determine the subset of devices used for deceptive signal transmission, the subset of model training devices, and the models assigned to each model training device. This problem is formulated as an optimization problem whose goal is to minimize the information leaked to eavesdroppers while meeting the model training energy consumption and delay constraints. To solve this problem, we propose a soft actor-critic deep reinforcement learning framework with intrinsic curiosity module and cross-attention (ICM-CA) that enables a centralized agent to determine the model training devices, the deceptive signal transmission devices, the transmit power, and sub-models assigned to each model training device without knowing the position and monitoring probability of eavesdroppers. The proposed method uses an ICM module to encourage the server to explore novel actions and states and a CA module to determine the importance of each historical state-action pair thus improving training efficiency. Simulation results demonstrate that the proposed method improves the convergence rate by up to 3x and reduces the information leaked to eavesdroppers by up to 13% compared to the traditional SAC algorithm.

Updated: 2025-07-09 22:53:23

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07323v1

Optimizing Communication and Device Clustering for Clustered Federated Learning with Differential Privacy

In this paper, a secure and communication-efficient clustered federated learning (CFL) design is proposed. In our model, several base stations (BSs) with heterogeneous task-handling capabilities and multiple users with non-independent and identically distributed (non-IID) data jointly perform CFL training incorporating differential privacy (DP) techniques. Since each BS can process only a subset of the learning tasks and has limited wireless resource blocks (RBs) to allocate to users for federated learning (FL) model parameter transmission, it is necessary to jointly optimize RB allocation and user scheduling for CFL performance optimization. Meanwhile, our considered CFL method requires devices to use their limited data and FL model information to determine their task identities, which may introduce additional communication overhead. We formulate an optimization problem whose goal is to minimize the training loss of all learning tasks while considering device clustering, RB allocation, DP noise, and FL model transmission delay. To solve the problem, we propose a novel dynamic penalty function assisted value decomposed multi-agent reinforcement learning (DPVD-MARL) algorithm that enables distributed BSs to independently determine their connected users, RBs, and DP noise of the connected users but jointly minimize the training loss of all learning tasks across all BSs. Different from the existing MARL methods that assign a large penalty for invalid actions, we propose a novel penalty assignment scheme that assigns penalty depending on the number of devices that cannot meet communication constraints (e.g., delay), which can guide the MARL scheme to quickly find valid actions, thus improving the convergence speed. Simulation results show that the DPVD-MARL can improve the convergence rate by up to 20% and the ultimate accumulated rewards by 15% compared to independent Q-learning.

Updated: 2025-07-09 22:44:26

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07320v1

SonicMotion: Dynamic Spatial Audio Soundscapes with Latent Diffusion Models

Spatial audio is an integral part of immersive entertainment, such as VR/AR, and has seen increasing popularity in cinema and music as well. The most common format of spatial audio is first-order Ambisonics (FOA). We seek to extend recent advancements in FOA generative AI models to enable the generation of 3D scenes with dynamic sound sources. Our proposed end-to-end model, SonicMotion, comes in two variants that differ in their user input and in the precision of sound source localization. In addition to our model, we also present a new dataset of simulated spatial audio-caption pairs. Evaluation of our models demonstrates that they are capable of matching the semantic alignment and audio quality of state-of-the-art models while capturing the desired spatial attributes.
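
For background, FOA describes a scene with four channels (W/X/Y/Z in the traditional FuMa convention). A mono source at a given direction is encoded as below; SonicMotion's internal channel ordering and normalization may differ.

```python
import numpy as np

def encode_foa(mono, azimuth, elevation):
    """Encode a mono signal into first-order Ambisonics B-format
    (traditional FuMa-style W/X/Y/Z channels). Angles in radians."""
    w = mono / np.sqrt(2.0)                          # omnidirectional component
    x = mono * np.cos(azimuth) * np.cos(elevation)   # front-back
    y = mono * np.sin(azimuth) * np.cos(elevation)   # left-right
    z = mono * np.sin(elevation)                     # up-down
    return np.stack([w, x, y, z])
```

A "dynamic sound source" in this representation is simply a time-varying (azimuth, elevation) trajectory, which is what the generative model must learn to control.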

Updated: 2025-07-09 22:31:06

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2507.07318v1

AdeptHEQ-FL: Adaptive Homomorphic Encryption for Federated Learning of Hybrid Classical-Quantum Models with Dynamic Layer Sparing

Federated Learning (FL) faces inherent challenges in balancing model performance, privacy preservation, and communication efficiency, especially in non-IID decentralized environments. Recent approaches either sacrifice formal privacy guarantees, incur high overheads, or overlook quantum-enhanced expressivity. We introduce AdeptHEQ-FL, a unified hybrid classical-quantum FL framework that integrates (i) a hybrid CNN-PQC architecture for expressive decentralized learning, (ii) an adaptive accuracy-weighted aggregation scheme leveraging differentially private validation accuracies, (iii) selective homomorphic encryption (HE) for secure aggregation of sensitive model layers, and (iv) dynamic layer-wise adaptive freezing to minimize communication overhead while preserving quantum adaptability. We establish formal privacy guarantees, provide convergence analysis, and conduct extensive experiments on the CIFAR-10, SVHN, and Fashion-MNIST datasets. AdeptHEQ-FL achieves accuracy improvements of approximately 25.43% and 14.17% over Standard-FedQNN and FHE-FedQNN, respectively, on the CIFAR-10 dataset. Additionally, it reduces communication overhead by freezing less important layers, demonstrating the efficiency and practicality of our privacy-preserving, resource-aware design for FL.

Updated: 2025-07-09 22:29:02

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07316v1

Frontier LLMs Still Struggle with Simple Reasoning Tasks

While state-of-the-art large language models (LLMs) demonstrate advanced reasoning capabilities, achieving remarkable performance on challenging competitive math and coding benchmarks, they also frequently fail on tasks that are easy for humans. This work studies the performance of frontier LLMs on a broad set of such "easy" reasoning problems. By extending previous work in the literature, we create a suite of procedurally generated simple reasoning tasks, including counting, first-order logic, proof trees, and travel planning, with changeable parameters (such as document length, or the number of variables in a math problem) that can arbitrarily increase the amount of computation required to produce the answer while preserving the fundamental difficulty. While previous work showed that traditional, non-thinking models can be made to fail on such problems, we demonstrate that even state-of-the-art thinking models consistently fail on such problems and for similar reasons (e.g., statistical shortcuts, errors in intermediate steps, and difficulties in processing long contexts). To further understand the behavior of the models, we introduce the unpuzzles dataset, a different "easy" benchmark consisting of trivialized versions of well-known math and logic puzzles. Interestingly, while modern LLMs excel at solving the original puzzles, they tend to fail on the trivialized versions, exhibiting several systematic failure patterns related to memorizing the originals. We show that this happens even if the models are otherwise able to solve problems with different descriptions but requiring the same logic. Our results highlight that out-of-distribution generalization is still problematic for frontier language models and the new generation of thinking models, even for simple reasoning tasks, and that making tasks easier does not necessarily imply improved performance.
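
A generator in the spirit of these procedurally generated tasks might look like this; the word lists and phrasing are invented, and only the idea of scaling document length while keeping the difficulty fixed comes from the abstract.

```python
import random

def make_counting_task(n_items=20, n_targets=5, seed=0):
    """Hypothetical counting-task generator: ask how many times a target
    word appears in a document whose length (n_items) can be scaled
    arbitrarily without changing the fundamental difficulty."""
    rng = random.Random(seed)
    filler = ["apple", "stone", "river", "cloud"]
    words = [rng.choice(filler) for _ in range(n_items)]
    for i in rng.sample(range(n_items), n_targets):   # plant exact count of targets
        words[i] = "needle"
    prompt = ("How many times does the word 'needle' appear "
              "in the following text?\n" + " ".join(words))
    return prompt, n_targets
```

Because the generator knows the ground-truth count, arbitrarily long instances can be graded automatically, which is what makes the length-scaling experiments possible.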

Updated: 2025-07-09 22:22:49

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07313v1

From Images to Signals: Are Large Vision Models Useful for Time Series Analysis?

Transformer-based models have gained increasing attention in time series research, driving interest in Large Language Models (LLMs) and foundation models for time series analysis. As the field moves toward multi-modality, Large Vision Models (LVMs) are emerging as a promising direction. In the past, the effectiveness of Transformer and LLMs in time series has been debated. When it comes to LVMs, a similar question arises: are LVMs truly useful for time series analysis? To address this question, we design and conduct the first principled study involving 4 LVMs, 8 imaging methods, 18 datasets and 26 baselines across both high-level (classification) and low-level (forecasting) tasks, with extensive ablation analysis. Our findings indicate LVMs are indeed useful for time series classification but face challenges in forecasting. Although effective, the contemporary best LVM forecasters are limited to specific types of LVMs and imaging methods, exhibit a bias toward forecasting periods, and have limited ability to utilize long look-back windows. We hope our findings could serve as a cornerstone for future research on LVM- and multimodal-based solutions to different time series tasks.
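
One widely used imaging method of the kind benchmarked here is the summation-type Gramian Angular Field, which maps a length-T series to a T×T image. This is a textbook construction offered for context, not necessarily the paper's best-performing method.

```python
import numpy as np

def gramian_angular_field(series):
    """Summation-type Gramian Angular Field (GASF): encode a 1-D series
    as a (T, T) image of pairwise angular sums, suitable as input to a
    vision model."""
    x = np.asarray(series, dtype=float)
    # rescale to [-1, 1] so the polar-angle encoding arccos(x) is defined
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])
```

The resulting symmetric image preserves temporal ordering along its diagonal, which is what lets a pretrained vision backbone pick up temporal structure.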

Updated: 2025-07-09 22:13:37

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2505.24030v2

Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing

We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d. Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix Σ. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of designs, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant models. Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.
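
The spectral estimator at the center of this line of work can be sketched generically. The preprocessing function T applied to the responses is precisely the design choice the paper optimizes; the simple choice below is only a placeholder.

```python
import numpy as np

def spectral_estimate(X, y, T=lambda y: y):
    """Generic spectral estimator for a GLM (illustrative sketch).

    Builds D = (1/n) * sum_i T(y_i) x_i x_i^T and returns its principal
    eigenvector as the estimate of the signal direction. The choice of
    the preprocessing T is what the paper characterizes and optimizes.
    """
    n = len(y)
    D = (X * T(y)[:, None]).T @ X / n        # data-dependent symmetric matrix
    vals, vecs = np.linalg.eigh(D)
    return vecs[:, -1]                       # eigenvector of the largest eigenvalue
```

For instance, in noiseless phase retrieval (y = (x·w)²) with isotropic Gaussian features, the population version of D is ||w||²I + 2ww^T, so its top eigenvector aligns with the signal.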

Updated: 2025-07-09 22:10:08

Categories: math.ST,cs.IT,cs.LG,math.IT,math.PR,stat.ML,stat.TH

Download: http://arxiv.org/abs/2308.14507v4

Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation

Large language models (LLMs) are increasingly integral to information retrieval (IR), powering ranking, evaluation, and AI-assisted content creation. This widespread adoption necessitates a critical examination of potential biases arising from the interplay between these LLM-based components. This paper synthesizes existing research and presents novel experiment designs that explore how LLM-based rankers and assistants influence LLM-based judges. We provide the first empirical evidence of LLM judges exhibiting significant bias towards LLM-based rankers. Furthermore, we observe limitations in LLM judges' ability to discern subtle system performance differences. Contrary to some previous findings, our preliminary study does not find evidence of bias against AI-generated content. These results highlight the need for a more holistic view of the LLM-driven information ecosystem. To this end, we offer initial guidelines and a research agenda to ensure the reliable use of LLMs in IR evaluation.

Updated: 2025-07-09 22:09:49

Categories: cs.IR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2503.19092v2

ViDove: A Translation Agent System with Multimodal Context and Memory-Augmented Reasoning

LLM-based translation agents have achieved highly human-like translation results and are capable of handling longer and more complex contexts with greater efficiency. However, they are typically limited to text-only inputs. In this paper, we introduce ViDove, a translation agent system designed for multimodal input. Inspired by the workflow of human translators, ViDove leverages visual and contextual background information to enhance the translation process. Additionally, we integrate a multimodal memory system and long-short term memory modules enriched with domain-specific knowledge, enabling the agent to perform more accurately and adaptively in real-world scenarios. As a result, ViDove achieves significantly higher translation quality in both subtitle generation and general translation tasks, with a 28% improvement in BLEU scores and a 15% improvement in SubER compared to previous state-of-the-art baselines. Moreover, we introduce DoveBench, a new benchmark for long-form automatic video subtitling and translation, featuring 17 hours of high-quality, human-annotated data. Our code is available here: https://github.com/pigeonai-org/ViDove

Updated: 2025-07-09 22:05:46

Categories: cs.AI,cs.CL,eess.AS

Download: http://arxiv.org/abs/2507.07306v1

Description of the Training Process of Neural Networks via Ergodic Theorem : Ghost nodes

Recent studies have proposed interpreting the training process from an ergodic perspective. Building on this foundation, we present a unified framework for understanding and accelerating the training of deep neural networks via stochastic gradient descent (SGD). By analyzing the geometric landscape of the objective function, we introduce a practical diagnostic, the running estimate of the largest Lyapunov exponent, which provably distinguishes genuine convergence toward stable minimizers from mere statistical stabilization near saddle points. We then propose a ghost category extension for standard classifiers that adds auxiliary ghost output nodes, so the model gains extra descent directions that open a lateral corridor around narrow loss barriers and enable the optimizer to bypass poor basins during the early training phase. We show that this extension strictly reduces the approximation error and that, after sufficient convergence, the ghost dimensions collapse so that the extended model coincides with the original one and there exists a path in the enlarged parameter space along which the total loss does not increase. Taken together, these results provide a principled architecture-level intervention that accelerates early-stage trainability while preserving asymptotic behavior, and simultaneously serves as an architecture-friendly regularizer.
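
The running Lyapunov-exponent diagnostic can be illustrated on a toy problem: evolve a main SGD iterate and a perturbed copy under the same gradient noise, renormalize their separation each step, and average the per-step log growth rate. Everything below (the quadratic objective, step size, and renormalization scheme) is an illustrative assumption, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy objective F(w) = 0.5 * w^T A w with noisy gradients: SGD contracts toward
# the minimizer at 0, so the largest Lyapunov exponent should come out negative.
A = np.diag([1.0, 3.0])

def sgd_step(w, lr, noise):
    return w - lr * (A @ w + noise)

# Running estimate of the largest Lyapunov exponent: track a perturbed copy v
# under the SAME noise as w, renormalizing the separation after every step.
def lyapunov_estimate(w0, lr=0.1, steps=500, eps=1e-6):
    w = w0.copy()
    v = w0 + eps * np.array([1.0, 0.0])
    log_growth = 0.0
    for _ in range(steps):
        noise = 0.01 * rng.standard_normal(2)
        w, v = sgd_step(w, lr, noise), sgd_step(v, lr, noise)
        sep = np.linalg.norm(v - w)
        log_growth += np.log(sep / eps)
        v = w + (eps / sep) * (v - w)   # renormalize to keep the pair close
    return log_growth / steps

lam = lyapunov_estimate(np.array([1.0, -1.0]))
```

A clearly negative estimate, as here, indicates contraction toward a stable minimizer; in the paper's framework an estimate near zero would instead suggest mere statistical stabilization.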

Updated: 2025-07-09 22:03:57

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.01003v2

Application of LLMs to Multi-Robot Path Planning and Task Allocation

Efficient exploration is a well-known problem in deep reinforcement learning, and it is exacerbated in multi-agent reinforcement learning by the intrinsic complexities of such algorithms. There are several approaches to efficiently exploring an environment so that multiple agents operating in it can learn to solve tasks; among these, the idea of expert exploration is investigated in this work. More specifically, this work investigates the application of large language models as expert planners for efficient exploration in planning-based tasks for multiple agents.

Updated: 2025-07-09 22:01:32

Categories: cs.AI,cs.RO

Download: http://arxiv.org/abs/2507.07302v1

Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge

The combination of Large Language Models (LLM) and Automatic Speech Recognition (ASR), when deployed on edge devices (called edge ASR-LLM), can serve as a powerful personalized assistant to enable audio-based interaction for users. Compared to text-based interaction, edge ASR-LLM allows accessible and natural audio interactions. Unfortunately, existing ASR-LLM models are mainly trained in high-performance computing environments and produce substantial model weights, making them difficult to deploy on edge devices. More importantly, to better serve users' personalized needs, the ASR-LLM must be able to learn from each distinct user, given that audio input often contains highly personalized characteristics that necessitate personalized on-device training. Since individually fine-tuning the ASR or LLM often leads to suboptimal results due to modality-specific limitations, end-to-end training ensures seamless integration of audio features and language understanding (cross-modal alignment), ultimately enabling a more personalized and efficient adaptation on edge devices. However, due to the complex training requirements and substantial computational demands of existing approaches, cross-modal alignment between ASR audio and LLM can be challenging on edge devices. In this work, we propose a resource-efficient cross-modal alignment framework that bridges ASR and LLMs on edge devices to handle personalized audio input. Our framework enables efficient ASR-LLM alignment on resource-constrained devices like NVIDIA Jetson Orin (8GB RAM), achieving 50x training time speedup while improving the alignment quality by more than 50%. To the best of our knowledge, this is the first work to study efficient ASR-LLM alignment on resource-constrained edge devices.

Updated: 2025-07-09 21:56:53

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2411.13766v3

Multilayer GNN for Predictive Maintenance and Clustering in Power Grids

Unplanned power outages cost the US economy over $150 billion annually, partly due to predictive maintenance (PdM) models that overlook spatial, temporal, and causal dependencies in grid failures. This study introduces a multilayer Graph Neural Network (GNN) framework to enhance PdM and enable resilience-based substation clustering. Using seven years of incident data from Oklahoma Gas & Electric (292,830 records across 347 substations), the framework integrates Graph Attention Networks (spatial), Graph Convolutional Networks (temporal), and Graph Isomorphism Networks (causal), fused through attention-weighted embeddings. Our model achieves a 30-day F1-score of 0.8935 +/- 0.0258, outperforming XGBoost and Random Forest by 3.2% and 2.7%, and single-layer GNNs by 10 to 15 percent. Removing the causal layer drops performance to 0.7354 +/- 0.0418. For resilience analysis, HDBSCAN clustering on HierarchicalRiskGNN embeddings identifies eight operational risk groups. The highest-risk cluster (Cluster 5, 44 substations) shows 388.4 incidents/year and 602.6-minute recovery time, while low-risk groups report fewer than 62 incidents/year. ANOVA (p < 0.0001) confirms significant inter-cluster separation. Our clustering outperforms K-Means and Spectral Clustering with a Silhouette Score of 0.626 and Davies-Bouldin index of 0.527. This work supports proactive grid management through improved failure prediction and risk-aware substation clustering.

Updated: 2025-07-09 21:44:51

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2507.07298v1

Time Series Foundation Models for Multivariate Financial Time Series Forecasting

Financial time series forecasting presents significant challenges due to complex nonlinear relationships, temporal dependencies, variable interdependencies and limited data availability, particularly for tasks involving low-frequency data, newly listed instruments, or emerging market assets. Time Series Foundation Models (TSFMs) offer a promising solution through pretraining on diverse time series corpora followed by task-specific adaptation. This study evaluates two TSFMs (Tiny Time Mixers (TTM) and Chronos) across three financial forecasting tasks: US 10-year Treasury yield changes, EUR/USD volatility, and equity spread prediction. Results demonstrate that TTM exhibits strong transferability. When fine-tuning both the pretrained version of TTM and an untrained model with the same architecture, the pretrained version achieved 25-50% better performance when fine-tuned on limited data and 15-30% improvements even when fine-tuned on lengthier datasets. Notably, TTM's zero-shot performance outperformed naive benchmarks in volatility forecasting and equity spread prediction, with the latter demonstrating that TSFMs can surpass traditional benchmark models without fine-tuning. The pretrained model consistently required 3-10 fewer years of data to achieve comparable performance levels compared to the untrained model, demonstrating significant sample-efficiency gains. However, while TTM outperformed naive baselines, traditional specialised models matched or exceeded its performance in two of three tasks, suggesting TSFMs prioritise breadth over task-specific optimisation. These findings indicate that TSFMs, though still nascent, offer substantial promise for financial forecasting-particularly in noisy, data-constrained tasks-but achieving competitive performance likely requires domain-specific pretraining and architectural refinements tailored to financial time series characteristics.

Updated: 2025-07-09 21:43:06

Categories: q-fin.GN,cs.LG

Download: http://arxiv.org/abs/2507.07296v1

Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning

New discoveries in chemistry and materials science, with increasingly expanding volume of requisite knowledge and experimental workload, provide unique opportunities for machine learning (ML) to take critical roles in accelerating research efficiency. Here, we demonstrate (1) the use of large language models (LLMs) for automated literature reviews, and (2) the training of an ML model to predict chemical knowledge (thermodynamic parameters). Our LLM-based literature review tool (LMExt) successfully extracted chemical information and beyond into a machine-readable structure, including stability constants for metal cation-ligand interactions, thermodynamic properties, and other broader data types (medical research papers, and financial reports), effectively overcoming the challenges inherent in each domain. Using the autonomous acquisition of thermodynamic data, an ML model was trained using the CatBoost algorithm for accurately predicting thermodynamic parameters (e.g., enthalpy of formation) of minerals. This work highlights the transformative potential of integrated ML approaches to reshape chemistry and materials science research.

Updated: 2025-07-09 21:33:25

Categories: cond-mat.mtrl-sci,cs.LG

Download: http://arxiv.org/abs/2507.07293v1

Discretization-independent multifidelity operator learning for partial differential equations

We develop a new and general encode-approximate-reconstruct operator learning model that leverages learned neural representations of bases for input and output function distributions. We introduce the concepts of \textit{numerical operator learning} and \textit{discretization independence}, which clarify the relationship between theoretical formulations and practical realizations of operator learning models. Our model is discretization-independent, making it particularly effective for multifidelity learning. We establish theoretical approximation guarantees, demonstrating uniform universal approximation under strong assumptions on the input functions and statistical approximation under weaker conditions. To our knowledge, this is the first comprehensive study that investigates how discretization independence enables robust and efficient multifidelity operator learning. We validate our method through extensive numerical experiments involving both local and nonlocal PDEs, including time-independent and time-dependent problems. The results show that multifidelity training significantly improves accuracy and computational efficiency. Moreover, multifidelity training further enhances empirical discretization independence.

Updated: 2025-07-09 21:29:11

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07292v1

Estimating Dataset Dimension via Singular Metrics under the Manifold Hypothesis: Application to Inverse Problems

High-dimensional datasets often exhibit low-dimensional geometric structures, as suggested by the manifold hypothesis, which implies that data lie on a smooth manifold embedded in a higher-dimensional ambient space. While this insight underpins many advances in machine learning and inverse problems, fully leveraging it requires dealing with three key tasks: estimating the intrinsic dimension (ID) of the manifold, constructing appropriate local coordinates, and learning mappings between ambient and manifold spaces. In this work, we propose a framework that addresses all these challenges using a Mixture of Variational Autoencoders (VAEs) and tools from Riemannian geometry. We specifically focus on estimating the ID of datasets by analyzing the numerical rank of the VAE decoder pullback metric. The estimated ID guides the construction of an atlas of local charts using a mixture of invertible VAEs, enabling accurate manifold parameterization and efficient inference. We show how this approach enhances solutions to ill-posed inverse problems, particularly in biomedical imaging, by enforcing that reconstructions lie on the learned manifold. Lastly, we explore the impact of network pruning on manifold geometry and reconstruction quality, showing that the intrinsic dimension serves as an effective proxy for monitoring model capacity.
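
The ID-estimation step can be sketched directly: compute the decoder Jacobian at a latent point, form the pullback metric G = JᵀJ, and count the singular values above a tolerance. The closed-form "decoder" below is a stand-in assumption for a trained VAE decoder, used only to make the rank computation concrete.

```python
import numpy as np

# A known smooth map from a 3-D latent space into R^5 stands in for a trained
# VAE decoder; its third input coordinate is dead, so the image is a 2-D manifold.
def decoder(z):
    a, b, c = z
    return np.array([a, b, a * b, np.sin(a), np.cos(b) + 0.0 * c])

# Numerical rank of the pullback metric G(z) = J(z)^T J(z), J the decoder
# Jacobian (forward differences); rank(G) equals rank(J).
def intrinsic_dim(z, h=1e-5, tol=1e-6):
    f0 = decoder(z)
    J = np.column_stack([(decoder(z + h * e) - f0) / h for e in np.eye(len(z))])
    s = np.linalg.svd(J, compute_uv=False)
    return int(np.sum(s > tol * s[0]))   # count singular values above tolerance

dim = intrinsic_dim(np.array([0.3, -0.7, 1.2]))
```

Here the estimated intrinsic dimension is 2, matching the dead latent coordinate; in the paper's setting the same rank analysis is applied to a learned decoder, averaged over data points.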

Updated: 2025-07-09 21:22:59

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07291v1

Natural Evolutionary Search meets Probabilistic Numerics

Zeroth-order local optimisation algorithms are essential for solving real-valued black-box optimisation problems. Among these, Natural Evolution Strategies (NES) represent a prominent class, particularly well-suited for scenarios where prior distributions are available. By optimising the objective function in the space of search distributions, NES algorithms naturally integrate prior knowledge during initialisation, making them effective in settings such as semi-supervised learning and user-prior belief frameworks. However, due to their reliance on random sampling and Monte Carlo estimates, NES algorithms can suffer from limited sample efficiency. In this paper, we introduce a novel class of algorithms, termed Probabilistic Natural Evolutionary Strategy Algorithms (ProbNES), which enhance the NES framework with Bayesian quadrature. We show that ProbNES algorithms consistently outperforms their non-probabilistic counterparts as well as global sample efficient methods such as Bayesian Optimisation (BO) or $\pi$BO across a wide range of tasks, including benchmark test functions, data-driven optimisation tasks, user-informed hyperparameter tuning tasks and locomotion tasks.
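
The plain (non-probabilistic) NES baseline that ProbNES builds on can be sketched as follows: a Gaussian search distribution N(mu, sigma²) is updated by Monte Carlo estimates of the gradient of the expected objective with respect to its parameters. The 1-D objective, learning rate, and population size below are illustrative assumptions; ProbNES would replace the Monte Carlo averages with Bayesian quadrature estimates.

```python
import math
import random

random.seed(0)

def f(x):            # black-box objective to minimize
    return (x - 2.0) ** 2

# Vanilla NES on a 1-D search distribution N(mu, sigma^2): log-likelihood-trick
# gradients of E[f] with respect to mu and log(sigma), with a baseline.
def nes(mu=0.0, sigma=1.0, pop=50, lr=0.05, iters=200):
    for _ in range(iters):
        eps = [random.gauss(0.0, 1.0) for _ in range(pop)]
        fs = [f(mu + sigma * e) for e in eps]
        base = sum(fs) / pop                       # baseline reduces variance
        g_mu = sum((fi - base) * e for fi, e in zip(fs, eps)) / (pop * sigma)
        g_ls = sum((fi - base) * (e * e - 1.0) for fi, e in zip(fs, eps)) / pop
        mu -= lr * g_mu                            # descend the estimated gradient
        sigma *= math.exp(-lr * g_ls)              # multiplicative log-sigma update
    return mu, sigma

mu, sigma = nes()
```

The mean drifts toward the minimizer at x = 2 while sigma shrinks; the sample inefficiency of these Monte Carlo averages is exactly what the quadrature-based variant targets.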

Updated: 2025-07-09 21:15:50

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07288v1

EditLord: Learning Code Transformation Rules for Code Editing

Code editing is a foundational task in software development, where its effectiveness depends on whether it introduces desired code property changes without changing the original code's intended functionality. Existing approaches often formulate code editing as an implicit end-to-end task, omitting the fact that code-editing procedures inherently consist of discrete and explicit steps. Thus, they suffer from suboptimal performance and lack of robustness and generalization. We introduce EditLord, a code editing framework that makes the code transformation steps explicit. Our key insight is to employ a language model (LM) as an inductive learner to extract code editing rules from the training code pairs as concise meta-rule sets. Such rule sets will be manifested for each training sample to augment them for finetuning or assist in prompting- and iterative-based code editing. EditLord outperforms the state-of-the-art by an average of 22.7% in editing performance and 58.1% in robustness while achieving 20.2% higher functional correctness across critical software engineering and security applications, LM models, and editing modes.

Updated: 2025-07-09 21:15:30

Categories: cs.SE,cs.CR,cs.LG

Download: http://arxiv.org/abs/2504.15284v4

Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level prediction and lack of grounded agency. This paper offers a cross-disciplinary synthesis of AGI development, spanning artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. In particular, we emphasize the rise of Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use to enable more adaptive behavior. We discuss generalization strategies, including information compression, test-time adaptation, and training-free methods, as critical pathways toward flexible, domain-agnostic intelligence. Vision-Language Models (VLMs) are reexamined not just as perception modules but as evolving interfaces for embodied understanding and collaborative task completion. We also argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition. Finally, we identify key scientific, technical, and ethical challenges on the path to AGI.

Updated: 2025-07-09 21:09:25

Categories: cs.AI

Download: http://arxiv.org/abs/2507.00951v2

Smart IoT Security: Lightweight Machine Learning Techniques for Multi-Class Attack Detection in IoT Networks

The Internet of Things (IoT) is expanding at an accelerated pace, making it critical to have secure networks to mitigate a variety of cyber threats. This study addresses the limitations of multi-class attack detection for IoT devices and presents new machine learning-based lightweight ensemble methods that exploit a strong machine learning framework. We used a dataset entitled CICIoT 2023, which has a total of 34 different attack types categorized into 10 categories, and methodically assessed the performance of a substantial array of current machine learning techniques, with the goal of identifying the best-performing algorithmic choice for IoT application protection. In this work, we focus on ML classifier-based methods to address the challenges presented by the complex and heterogeneous properties of the attack vectors in IoT ecosystems. The best-performing method was the Decision Tree, achieving 99.56% accuracy and 99.62% F1, indicating this model is capable of detecting threats accurately and reliably. The Random Forest model also performed nearly as well, with an accuracy of 98.22% and an F1 score of 98.24%, indicating that ML methods excel in a scenario of high-dimensional data. These findings emphasize the promise of integrating ML classifiers into the protective defenses of IoT devices and provide motivation for pursuing subsequent studies towards scalable, keystroke-based attack detection frameworks. We think that our approach offers a new avenue for constructing complex machine learning algorithms for low-resource IoT devices that strike a balance between accuracy requirements and time efficiency. In summary, these contributions expand and enhance the knowledge of the current IoT security literature, establishing a solid baseline and framework for smart, adaptive security to be used in IoT environments.

Updated: 2025-07-09 21:02:16

Categories: cs.LG

Download: http://arxiv.org/abs/2502.04057v3

Almost Sure Convergence for the Last Iterate of Stochastic Gradient Descent Schemes

We study the almost sure convergence rate for the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting when the objective function $F$ is globally convex, or non-convex with a $\gamma$-Hölder continuous gradient. Using only the discrete Gronwall inequality, without the Robbins-Siegmund theorem or martingale convergence theory, we recover results for both SGD and SHB: $\min_{s\leq t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$ for non-convex objectives and $F(w_t) - F_* = o(t^{2\gamma/(1+\gamma) \cdot \max(p-1,-2p+1)-\epsilon})$ for $\beta \in (0, 1)$ and $\min_{s \leq t} F(w_s) - F_* = o(t^{p-1})$ almost surely for convex objectives. In addition, we prove that SHB with constant momentum parameter $\beta \in (0, 1)$ attains a convergence rate of $F(w_t) - F_* = O(t^{\max(p-1,-2p+1)} \log^2 \frac{t}{\delta})$ with probability at least $1-\delta$ when $F$ is convex and $\gamma = 1$ and step size $\alpha_t = \Theta(t^{-p})$ with $p \in (\frac{1}{2}, 1)$.
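
As an illustration of the regime the analysis covers, here is SHB with constant momentum $\beta \in (0, 1)$ and step size $\alpha_t = c \cdot t^{-p}$, $p \in (\frac{1}{2}, 1)$, run on a convex quadratic with noisy gradients. The quadratic, the noise level, and the constants are illustrative assumptions, not the paper's setting verbatim.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stochastic heavy ball: w_{t+1} = w_t - alpha_t * g_t + beta * (w_t - w_{t-1}),
# with the polynomially decaying step-size schedule alpha_t = c * t^(-p).
def shb(w0, beta=0.9, c=0.5, p=0.75, steps=5000):
    w, w_prev = w0.copy(), w0.copy()
    for t in range(1, steps + 1):
        # For F(w) = ||w||^2 / 2, the gradient is w; add small gradient noise.
        grad = w + 0.05 * rng.standard_normal(w.shape)
        alpha = c * t ** (-p)
        w, w_prev = w - alpha * grad + beta * (w - w_prev), w
    return w

w_final = shb(np.array([5.0, -3.0]))
loss = 0.5 * float(w_final @ w_final)   # F(w_t) - F_*, since F_* = 0 here
```

The last iterate drives the suboptimality close to the noise floor, consistent with the almost-sure decay the theorem quantifies for this step-size range.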

Updated: 2025-07-09 20:59:23

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2507.07281v1

HARMONIC: Cognitive and Control Collaboration in Human-Robotic Teams

This paper describes HARMONIC, a cognitive-robotic architecture that integrates the OntoAgent cognitive framework with general-purpose robot control systems applied to human-robot teaming (HRT). HARMONIC incorporates metacognition, meaningful natural language communication, and explainability capabilities required for developing mutual trust in HRT. Through simulation experiments involving a joint search task performed by a heterogeneous team of two HARMONIC-based robots and a human operator, we demonstrate heterogeneous robots that coordinate their actions, adapt to complex scenarios, and engage in natural human-robot communication. Evaluation results show that HARMONIC-based robots can reason about plans, goals, and team member attitudes while providing clear explanations for their decisions, which are essential requirements for realistic human-robot teaming.

Updated: 2025-07-09 20:58:34

Categories: cs.RO,cs.AI,cs.MA

Download: http://arxiv.org/abs/2409.18047v3

TRIP: A Nonparametric Test to Diagnose Biased Feature Importance Scores

Along with accurate prediction, understanding the contribution of each feature to the making of the prediction, i.e., the importance of the feature, is a desirable and arguably necessary component of a machine learning model. For a complex model such as a random forest, such importances are not innate -- as they are, e.g., with linear regression. Efficient methods have been created to provide such capabilities, with one of the most popular among them being permutation feature importance due to its efficiency, model-agnostic nature, and perceived intuitiveness. However, permutation feature importance has been shown to be misleading in the presence of dependent features as a result of the creation of unrealistic observations when permuting the dependent features. In this work, we develop TRIP (Test for Reliable Interpretation via Permutation), a test requiring minimal assumptions that is able to detect unreliable permutation feature importance scores that are the result of model extrapolation. To build on this, we demonstrate how the test can be complemented in order to allow its use in high dimensional settings. Through testing on simulated data and applications, our results show that the test can be used to reliably detect when permutation feature importance scores are unreliable.
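
Permutation feature importance itself is only a few lines, and the failure mode TRIP targets is easy to reproduce: with two strongly dependent features, shuffling one creates feature combinations far outside the data. The linear model and the crude "gap" proxy for extrapolation below are illustrative assumptions, not the TRIP test statistic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two strongly dependent features; the response depends only on x0. A linear
# model fit by least squares stands in for the black-box predictor.
n = 500
x0 = rng.standard_normal(n)
x1 = x0 + 0.05 * rng.standard_normal(n)   # x1 is nearly a copy of x0
X = np.column_stack([x0, x1])
y = 2.0 * x0 + 0.1 * rng.standard_normal(n)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def mse(Xm):
    return float(np.mean((y - Xm @ coef) ** 2))

# Permutation feature importance: increase in error when one column is shuffled.
def perm_importance(j):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return mse(Xp) - mse(X)

imp = [perm_importance(j) for j in range(2)]

# Permuting a dependent feature yields unrealistic (x0, x1) pairs, i.e. the
# model extrapolation TRIP is designed to detect. A crude proxy for that:
Xp = X.copy()
Xp[:, 1] = rng.permutation(Xp[:, 1])
gap_orig = float(np.mean(np.abs(X[:, 0] - X[:, 1])))
gap_perm = float(np.mean(np.abs(Xp[:, 0] - Xp[:, 1])))
```

After permutation the average gap between the two features blows up by an order of magnitude: the scores are being computed on observations the model never saw anything like, which is exactly when they become unreliable.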

Updated: 2025-07-09 20:49:10

Categories: cs.LG,stat.ME,stat.ML

Download: http://arxiv.org/abs/2507.07276v1

LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation

Large Multimodal Models (LMMs) are typically trained on vast corpora of image-text data but are often limited in linguistic coverage, leading to biased and unfair outputs across languages. While prior work has explored multimodal evaluation, less emphasis has been placed on assessing multilingual capabilities. In this work, we introduce LinguaMark, a benchmark designed to evaluate state-of-the-art LMMs on a multilingual Visual Question Answering (VQA) task. Our dataset comprises 6,875 image-text pairs spanning 11 languages and five social attributes. We evaluate models using three key metrics: Bias, Answer Relevancy, and Faithfulness. Our findings reveal that closed-source models generally achieve the highest overall performance. Both closed-source (GPT-4o and Gemini2.5) and open-source models (Gemma3, Qwen2.5) perform competitively across social attributes, and Qwen2.5 demonstrates strong generalization across multiple languages. We release our benchmark and evaluation code to encourage reproducibility and further research.

Updated: 2025-07-09 20:45:04

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.07274v1

Beyond the ATE: Interpretable Modelling of Treatment Effects over Dose and Time

The Average Treatment Effect (ATE) is a foundational metric in causal inference, widely used to assess intervention efficacy in randomized controlled trials (RCTs). However, in many applications -- particularly in healthcare -- this static summary fails to capture the nuanced dynamics of treatment effects that vary with both dose and time. We propose a framework for modelling treatment effect trajectories as smooth surfaces over dose and time, enabling the extraction of clinically actionable insights such as onset time, peak effect, and duration of benefit. To ensure interpretability, robustness, and verifiability -- key requirements in high-stakes domains -- we adapt SemanticODE, a recent framework for interpretable trajectory modelling, to the causal setting where treatment effects are never directly observed. Our approach decouples the estimation of trajectory shape from the specification of clinically relevant properties (e.g., maxima, inflection points), supporting domain-informed priors, post-hoc editing, and transparent analysis. We show that our method yields accurate, interpretable, and editable models of treatment dynamics, facilitating both rigorous causal analysis and practical decision-making.

Updated: 2025-07-09 20:33:33

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07271v1
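As a toy illustration of the "clinically actionable" trajectory properties the abstract names (onset time, peak effect, duration of benefit), these can be read directly off a smooth effect curve at a fixed dose. The curve shape and threshold below are invented for illustration and are not the paper's SemanticODE model:

```python
import numpy as np

# Hypothetical smooth treatment-effect trajectory at a fixed dose:
# the effect rises, peaks, then decays (illustrative gamma-like shape).
t = np.linspace(0.0, 48.0, 481)          # hours since treatment
effect = 2.0 * t * np.exp(-t / 8.0)

threshold = 1.0                          # minimal clinically relevant effect
above = effect >= threshold

onset = t[above][0]                      # first time the effect is relevant
peak_time = t[np.argmax(effect)]
peak_effect = effect.max()
duration = t[above][-1] - onset          # how long the benefit lasts

print(f"onset ~ {onset:.1f} h, peak {peak_effect:.2f} at {peak_time:.1f} h, "
      f"duration ~ {duration:.1f} h")
```

Decoupling the trajectory's shape from such properties, as the paper proposes, lets clinicians constrain or edit quantities like the peak location directly rather than through opaque model parameters.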

Robust Multimodal Learning Framework For Intake Gesture Detection Using Contactless Radar and Wearable IMU Sensors

Automated food intake gesture detection plays a vital role in dietary monitoring, enabling objective and continuous tracking of eating behaviors to support better health outcomes. Wrist-worn inertial measurement units (IMUs) have been widely used for this task with promising results. More recently, contactless radar sensors have also shown potential. This study explores whether combining wearable and contactless sensing modalities through multimodal learning can further improve detection performance. We also address a major challenge in multimodal learning: reduced robustness when one modality is missing. To this end, we propose a robust multimodal temporal convolutional network with cross-modal attention (MM-TCN-CMA), designed to integrate IMU and radar data, enhance gesture detection, and maintain performance under missing modality conditions. A new dataset comprising 52 meal sessions (3,050 eating gestures and 797 drinking gestures) from 52 participants is developed and made publicly available. Experimental results show that the proposed framework improves the segmental F1-score by 4.3% and 5.2% over unimodal Radar and IMU models, respectively. Under missing modality scenarios, the framework still achieves gains of 1.3% and 2.4% for missing radar and missing IMU inputs. This is the first study to demonstrate a robust multimodal learning framework that effectively fuses IMU and radar data for food intake gesture detection.

Updated: 2025-07-09 20:15:40

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2507.07261v1
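A minimal numpy sketch of the cross-modal attention idea (one modality's features attending to the other's), with a trivial pass-through fallback when a modality is missing. The actual MM-TCN-CMA architecture is more involved; everything below, including the fallback strategy, is illustrative:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(q_feats, kv_feats, missing=False):
    """One attention step where one modality (queries) attends to the
    other (keys/values). If the key/value modality is missing, fall
    back to the query features unchanged -- a simple stand-in for the
    paper's missing-modality handling."""
    if missing or kv_feats is None:
        return q_feats
    scale = np.sqrt(q_feats.shape[-1])
    attn = softmax(q_feats @ kv_feats.T / scale)   # rows sum to 1
    return attn @ kv_feats

rng = np.random.default_rng(1)
imu = rng.normal(size=(20, 16))    # 20 time steps of IMU features
radar = rng.normal(size=(20, 16))  # 20 time steps of radar features

fused = cross_modal_attention(imu, radar)
fallback = cross_modal_attention(imu, None, missing=True)
print(fused.shape, np.allclose(fallback, imu))
```

The key property is graceful degradation: when radar input is absent, the model still produces IMU-shaped features instead of failing, mirroring the robustness goal stated above.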

Cryptogenic stroke and migraine: using probabilistic independence and machine learning to uncover latent sources of disease from the electronic health record

Migraine is a common but complex neurological disorder that doubles the lifetime risk of cryptogenic stroke (CS). However, this relationship remains poorly characterized, and few clinical guidelines exist to reduce this associated risk. We therefore propose a data-driven approach to extract probabilistically-independent sources from electronic health record (EHR) data and create a 10-year risk-predictive model for CS in migraine patients. These sources represent external latent variables acting on the causal graph constructed from the EHR data and approximate root causes of CS in our population. A random forest model trained on patient expressions of these sources demonstrated good accuracy (ROC 0.771) and identified the top 10 most predictive sources of CS in migraine patients. These sources revealed that pharmacologic interventions were the most important factor in minimizing CS risk in our population and identified a factor related to allergic rhinitis as a potential causative source of CS in migraine patients.

Updated: 2025-07-09 20:12:12

Categories: stat.AP,cs.LG,I.2.1; I.2.3; I.2.6; I.5.1; I.6.4; J.3

Download: http://arxiv.org/abs/2505.04631v2

AXLearn: Modular Large Model Training on Heterogeneous Infrastructure

We design and implement AXLearn, a production deep learning system that facilitates scalable and high-performance training of large deep learning models. Compared to other state-of-the-art deep learning systems, AXLearn has a unique focus on modularity and support for heterogeneous hardware infrastructure. AXLearn's internal interfaces between software components follow strict encapsulation, allowing different components to be assembled to facilitate rapid model development and experimentation on heterogeneous compute infrastructure. We introduce a novel method of quantifying modularity via Lines-of-Code (LoC)-complexity, which demonstrates how our system maintains constant complexity as we scale the components in the system, compared to linear or quadratic complexity in other systems. This allows integrating features such as Rotary Position Embeddings (RoPE) into AXLearn across hundreds of modules with just 10 lines of code, compared to the hundreds required in other systems. At the same time, AXLearn maintains performance equivalent to state-of-the-art training systems. Finally, we share our experience in the development and operation of AXLearn.

Updated: 2025-07-09 20:10:51

Categories: cs.LG

Download: http://arxiv.org/abs/2507.05411v2

Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning

As machine learning models are increasingly deployed at the edge of Internet of Things environments, the partitioned deep learning paradigm, in which a model is split across multiple computational nodes, introduces a new dimension of security risk. Unlike traditional inference setups, these distributed pipelines span the model computation across heterogeneous nodes and communication layers, thereby exposing a broader attack surface to potential adversaries. Building on these motivations, this work explores a previously overlooked vulnerability: even when both the edge and cloud components of the model are inaccessible (i.e., black-box), an adversary who intercepts the intermediate features transmitted between them can still pose a serious threat. We demonstrate that, under these mild and realistic assumptions, an attacker can craft highly transferable proxy models, making the entire deep learning system significantly more vulnerable to evasion attacks. In particular, the intercepted features can be effectively analyzed and leveraged to distill surrogate models capable of crafting highly transferable adversarial examples against the target model. To this end, we propose an exploitation strategy specifically designed for distributed settings, which involves reconstructing the original tensor shape from vectorized transmitted features using simple statistical analysis, and adapting surrogate architectures accordingly to enable effective feature distillation. A comprehensive and systematic experimental evaluation demonstrates that surrogate models trained with the proposed strategy, i.e., leveraging intermediate features, tremendously improve the transferability of adversarial attacks. These findings underscore the urgent need to account for intermediate feature leakage in the design of secure distributed deep learning systems.

Updated: 2025-07-09 20:09:00

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.07259v1
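The shape-recovery step described above can be approximated very simply: the length of an intercepted, vectorized feature tensor sharply limits the plausible (channels, height, width) factorizations. A sketch under the assumed convention of square CNN feature maps (the paper's actual statistical analysis is richer than this):

```python
import math

def candidate_shapes(length, max_channels=2048):
    """Enumerate (C, H, W) factorizations of a flattened feature
    vector, assuming square spatial maps (H == W) as is typical for
    CNN intermediate tensors. A simplified stand-in for the paper's
    shape-recovery step."""
    shapes = []
    for c in range(1, max_channels + 1):
        if length % c:
            continue
        hw = length // c
        s = math.isqrt(hw)
        if s * s == hw:
            shapes.append((c, s, s))
    return shapes

# A vector of 512*28*28 = 401,408 values admits only a handful of
# square-map factorizations, including the true 512x28x28 shape.
cands = candidate_shapes(512 * 28 * 28)
print((512, 28, 28) in cands, len(cands))
```

Once a plausible shape is recovered, the attacker can shape a surrogate network's intermediate layer to match and distill from the intercepted features.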

FedP3E: Privacy-Preserving Prototype Exchange for Non-IID IoT Malware Detection in Cross-Silo Federated Learning

As IoT ecosystems continue to expand across critical sectors, they have become prominent targets for increasingly sophisticated and large-scale malware attacks. The evolving threat landscape, combined with the sensitive nature of IoT-generated data, demands detection frameworks that are both privacy-preserving and resilient to data heterogeneity. Federated Learning (FL) offers a promising solution by enabling decentralized model training without exposing raw data. However, standard FL algorithms such as FedAvg and FedProx often fall short in real-world deployments characterized by class imbalance and non-IID data distributions -- particularly in the presence of rare or disjoint malware classes. To address these challenges, we propose FedP3E (Privacy-Preserving Prototype Exchange), a novel FL framework that supports indirect cross-client representation sharing while maintaining data privacy. Each client constructs class-wise prototypes using Gaussian Mixture Models (GMMs), perturbs them with Gaussian noise, and transmits only these compact summaries to the server. The aggregated prototypes are then distributed back to clients and integrated into local training, supported by SMOTE-based augmentation to enhance representation of minority malware classes. Rather than relying solely on parameter averaging, our prototype-driven mechanism enables clients to enrich their local models with complementary structural patterns observed across the federation -- without exchanging raw data or gradients. This targeted strategy reduces the adverse impact of statistical heterogeneity with minimal communication overhead. We evaluate FedP3E on the N-BaIoT dataset under realistic cross-silo scenarios with varying degrees of data imbalance.

Updated: 2025-07-09 20:07:35

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2507.07258v1
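The prototype-sharing step can be sketched in a few lines: each client summarizes its data as per-class prototypes and transmits only noise-perturbed summaries, never raw samples. For brevity this sketch uses a single mean per class rather than the paper's per-class GMMs, and the noise scale is an illustrative choice:

```python
import numpy as np

def noisy_prototypes(X, y, noise_scale=0.1, rng=None):
    """Build per-class mean prototypes and perturb them with Gaussian
    noise before sharing with the server, so raw data stays local.
    (Sketch: FedP3E fits GMMs per class; one mean per class here.)"""
    rng = rng or np.random.default_rng(0)
    protos = {}
    for label in np.unique(y):
        mean = X[y == label].mean(axis=0)
        noise = rng.normal(scale=noise_scale, size=mean.shape)
        protos[int(label)] = mean + noise
    return protos

# Toy client with two well-separated malware classes in 8 dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(5, 1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)

protos = noisy_prototypes(X, y)
print(sorted(protos), protos[0].shape)
```

The server would aggregate such prototypes across clients and redistribute them, letting each client see class structure (e.g., rare malware families) it lacks locally.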

Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery

We present a multi-agent system for automation of scientific research tasks, cmbagent. The system is formed by about 30 Large Language Model (LLM) agents and implements a Planning & Control strategy to orchestrate the agentic workflow, with no human-in-the-loop at any point. Each agent specializes in a different task (performing retrieval on scientific papers and codebases, writing code, interpreting results, critiquing the output of other agents) and the system is able to execute code locally. We successfully apply cmbagent to carry out a PhD level cosmology task (the measurement of cosmological parameters using supernova data) and evaluate its performance on two benchmark sets, finding superior performance over state-of-the-art LLMs. The source code is available on GitHub, demonstration videos are also available, and the system is deployed on HuggingFace and will be available on the cloud.

Updated: 2025-07-09 20:03:30

Categories: cs.AI,astro-ph.IM,cs.CL,cs.MA

Download: http://arxiv.org/abs/2507.07257v1

Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications

Large language models (LLMs) such as GPT-4o and o1 have demonstrated strong performance on clinical natural language processing (NLP) tasks across multiple medical benchmarks. Nonetheless, two high-impact NLP tasks - structured tabular reporting from nurse dictations and medical order extraction from doctor-patient consultations - remain underexplored due to data scarcity and sensitivity, despite active industry efforts. Practical solutions to these real-world clinical tasks can significantly reduce the documentation burden on healthcare providers, allowing greater focus on patient care. In this paper, we investigate these two challenging tasks using private and open-source clinical datasets, evaluating the performance of both open- and closed-weight LLMs, and analyzing their respective strengths and limitations. Furthermore, we propose an agentic pipeline for generating realistic, non-sensitive nurse dictations, enabling structured extraction of clinical observations. To support further research in both areas, we release SYNUR and SIMORD, the first open-source datasets for nurse observation extraction and medical order extraction.

Updated: 2025-07-09 19:53:32

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.05517v2

A Language-Driven Framework for Improving Personalized Recommendations: Merging LLMs with Traditional Algorithms

Traditional recommendation algorithms are not designed to provide personalized recommendations based on user preferences provided through text, e.g., "I enjoy light-hearted comedies with a lot of humor". Large Language Models (LLMs) have emerged as one of the most promising tools for natural language processing in recent years. This research proposes a novel framework that mimics how a close friend would recommend items based on their knowledge of an individual's tastes. We leverage LLMs to enhance movie recommendation systems by refining traditional algorithm outputs and integrating them with language-based user preference inputs. We employ Singular Value Decomposition (SVD) or SVD++ algorithms to generate initial movie recommendations, implemented using the Surprise Python library and trained on the MovieLens-Latest-Small dataset. We compare the performance of the base algorithms with our LLM-enhanced versions using leave-one-out validation hit rates and cumulative hit rates. Additionally, to compare the performance of our framework against the current state-of-the-art recommendation systems, we use rating and ranking metrics with an item-based stratified 0.75 train, 0.25 test split. Our framework can generate preference profiles automatically based on users' favorite movies or allow manual preference specification for more personalized results. Using an automated approach, our framework overwhelmingly surpassed SVD and SVD++ on every evaluation metric used (e.g., improvements of up to ~6x in cumulative hit rate, ~3.7x in NDCG, etc.), albeit at the cost of a slight increase in computational overhead.

Updated: 2025-07-09 19:48:33

Categories: cs.IR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.07251v1
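Leave-one-out hit rate, the first evaluation metric mentioned above, is straightforward to compute once a recommender is fixed: hold out one item per user, ask for top-k recommendations, and count recoveries. A sketch with a stand-in popularity recommender (the paper's recommenders are SVD/SVD++ from the Surprise library and their LLM-enhanced variants):

```python
def hit_rate_at_k(user_items, recommend, k=10):
    """Leave-one-out hit rate: hold out each user's last item, request
    top-k recommendations from the remaining history, and count how
    often the held-out item is recovered."""
    hits = 0
    for user, items in user_items.items():
        held_out, rest = items[-1], items[:-1]
        if held_out in recommend(user, rest, k):
            hits += 1
    return hits / len(user_items)

# Toy check with a global-popularity recommender over three users.
catalog_by_popularity = ["m1", "m2", "m3", "m4", "m5"]

def popularity_recommender(user, seen, k):
    return [m for m in catalog_by_popularity if m not in seen][:k]

users = {"u1": ["m3", "m1"], "u2": ["m1", "m9"], "u3": ["m4", "m2"]}
print(hit_rate_at_k(users, popularity_recommender, k=2))
```

Because `recommend` is just a callable, the same harness compares a base SVD model against an LLM-reranked version without any change to the evaluation code.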

Semi-fragile watermarking of remote sensing images using DWT, vector quantization and automatic tiling

A semi-fragile watermarking scheme for multiple band images is presented in this article. We propose to embed a mark into remote sensing images by applying a tree-structured vector quantization approach to the pixel signatures instead of processing each band separately. The signature of the multispectral or hyperspectral image is used to embed the mark in order to detect any significant modification of the original image. The image is segmented into three-dimensional blocks, and a tree-structured vector quantizer is built for each block. These trees are manipulated using an iterative algorithm until the resulting block satisfies a required criterion, which establishes the embedded mark. The method is shown to preserve the mark under lossy compression (above a given threshold) while, at the same time, detecting possibly forged blocks and their position in the whole image.

Updated: 2025-07-09 19:47:40

Categories: cs.CR,cs.MM

Download: http://arxiv.org/abs/2507.07250v1

Position: Adopt Constraints Over Penalties in Deep Learning

Recent efforts to develop trustworthy AI systems with accountability guarantees have led to widespread use of machine learning formulations incorporating external requirements, or constraints. These requirements are often enforced via penalization--adding fixed-weight terms to the task loss. We argue this approach is fundamentally ill-suited since there may be no penalty coefficient that simultaneously ensures constraint satisfaction and optimal constrained performance, i.e., that truly solves the constrained problem. Moreover, tuning these coefficients requires costly trial-and-error, incurring significant time and computational overhead. We, therefore, advocate for broader adoption of tailored constrained optimization methods--such as the Lagrangian approach, which jointly optimizes the penalization "coefficients" (the Lagrange multipliers) and the model parameters. Such methods (i) truly solve the constrained problem and do so accountably, by clearly defining feasibility and verifying when it is achieved, (ii) eliminate the need for extensive penalty tuning, and (iii) integrate seamlessly with modern deep learning pipelines.

Updated: 2025-07-09 19:47:30

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2505.20628v2
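The Lagrangian approach the authors advocate can be shown on a one-dimensional toy problem: descend in the model parameter while ascending in the multiplier, so the "penalty coefficient" is learned rather than hand-tuned. The problem below is illustrative; for f(x) = (x - 2)^2 subject to x <= 1, the constrained optimum is x* = 1 with multiplier lambda* = 2:

```python
# Minimize f(x) = (x - 2)^2  subject to  g(x) = x - 1 <= 0,
# via gradient descent on x and projected gradient ascent on the
# Lagrange multiplier of L(x, lam) = f(x) + lam * g(x).
x, lam = 0.0, 0.0
lr_x, lr_lam = 0.05, 0.05

for _ in range(2000):
    grad_x = 2 * (x - 2) + lam      # d/dx [f(x) + lam * g(x)]
    x -= lr_x * grad_x              # descend in the model parameter
    lam += lr_lam * (x - 1)         # ascend in the multiplier ...
    lam = max(lam, 0.0)             # ... keeping it non-negative

print(f"x* ~ {x:.3f}, lambda* ~ {lam:.3f}")  # analytic optimum: 1, 2
```

No fixed penalty weight is chosen anywhere: the multiplier grows exactly until the constraint is satisfied, which is the accountability property the position paper emphasizes.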

Leveraging the Structure of Medical Data for Improved Representation Learning

Building generalizable medical AI systems requires pretraining strategies that are data-efficient and domain-aware. Unlike internet-scale corpora, clinical datasets such as MIMIC-CXR offer limited image counts and scarce annotations, but exhibit rich internal structure through multi-view imaging. We propose a self-supervised framework that leverages the inherent structure of medical datasets. Specifically, we treat paired chest X-rays (i.e., frontal and lateral views) as natural positive pairs, learning to reconstruct each view from sparse patches while aligning their latent embeddings. Our method requires no textual supervision and produces informative representations. Evaluated on MIMIC-CXR, we show strong performance compared to supervised objectives and to baselines trained without leveraging structure. This work provides a lightweight, modality-agnostic blueprint for domain-specific pretraining where data is structured but scarce.

Updated: 2025-07-09 19:45:03

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.02987v2

Attentions Under the Microscope: A Comparative Study of Resource Utilization for Variants of Self-Attention

As large language models (LLMs) and visual language models (VLMs) grow in scale and application, attention mechanisms have become a central computational bottleneck due to their high memory and time complexity. While many efficient attention variants have been proposed, there remains a lack of rigorous evaluation on their actual energy usage and hardware resource demands during training. In this work, we benchmark eight attention mechanisms in training GPT-2 architecture, measuring key metrics including training time, GPU memory usage, FLOPS, CPU usage, and power consumption. Our results reveal that attention mechanisms with optimized kernel implementations, including Flash Attention, Locality-Sensitive Hashing (LSH) Attention, and Multi-Head Latent Attention (MLA), achieve the best energy efficiency. We further show that lower GPU power alone does not guarantee reduced energy use, as training time plays an equally important role. Our study highlights the importance of energy-aware benchmarking in attention design and provides a practical insight for selecting resource-efficient mechanisms. All our code is available on GitHub.

Updated: 2025-07-09 19:37:23

Categories: cs.LG,cs.AI,cs.NE

Download: http://arxiv.org/abs/2507.07247v1

Disa: Accurate Learning-based Static Disassembly with Attentions

For reverse engineering related security domains, such as vulnerability detection, malware analysis, and binary hardening, disassembly is crucial yet challenging. The fundamental challenge of disassembly is to identify instruction and function boundaries. Classic approaches rely on file-format assumptions and architecture-specific heuristics to guess the boundaries, resulting in incomplete and incorrect disassembly, especially when the binary is obfuscated. Recent advancements of disassembly have demonstrated that deep learning can improve both the accuracy and efficiency of disassembly. In this paper, we propose Disa, a new learning-based disassembly approach that uses the information of superset instructions over the multi-head self-attention to learn the instructions' correlations, thus being able to infer function entry-points and instruction boundaries. Disa can further identify instructions relevant to memory block boundaries to facilitate an advanced block-memory model based value-set analysis for an accurate control flow graph (CFG) generation. Our experiments show that Disa outperforms prior deep-learning disassembly approaches in function entry-point identification, especially achieving 9.1% and 13.2% F1-score improvement on binaries respectively obfuscated by the disassembly desynchronization technique and popular source-level obfuscator. By achieving an 18.5% improvement in the memory block precision, Disa generates more accurate CFGs with a 4.4% reduction in Average Indirect Call Targets (AICT) compared with the state-of-the-art heuristic-based approach.

Updated: 2025-07-09 19:36:57

Categories: cs.CR

Download: http://arxiv.org/abs/2507.07246v1

Cosmos World Foundation Model Platform for Physical AI

Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make Cosmos open-source and our models open-weight with permissive licenses available via https://github.com/nvidia-cosmos/cosmos-predict1.

Updated: 2025-07-09 19:35:31

Categories: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2501.03575v3

Automated Attack Testflow Extraction from Cyber Threat Report using BERT for Contextual Analysis

In the ever-evolving landscape of cybersecurity, the rapid identification and mitigation of Advanced Persistent Threats (APTs) is crucial. Security practitioners rely on detailed threat reports to understand the tactics, techniques, and procedures (TTPs) employed by attackers. However, manually extracting attack testflows from these reports requires expert knowledge and is time-consuming and error-prone. This paper proposes FLOWGUARDIAN, a novel solution leveraging language models (i.e., BERT) and Natural Language Processing (NLP) techniques to automate the extraction of attack testflows from unstructured threat reports. FLOWGUARDIAN systematically analyzes and contextualizes security events, reconstructs attack sequences, and then generates comprehensive testflows. This automated approach not only saves time and reduces human error but also ensures comprehensive coverage and robustness in cybersecurity testing. Empirical validation using public threat reports demonstrates FLOWGUARDIAN's accuracy and efficiency, significantly enhancing the capabilities of security teams in proactive threat hunting and incident response.

Updated: 2025-07-09 19:33:13

Categories: cs.CR

Download: http://arxiv.org/abs/2507.07244v1

Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased

Imbalanced binary classification problems arise in many fields of study. When using machine learning models for these problems, it is common to subsample the majority class (i.e., undersampling) to create a (more) balanced dataset for model training. This biases the model's predictions because the model learns from a dataset that does not follow the same data generating process as new data. One way of accounting for this bias is to analytically map the resulting predictions to new values based on the sampling rate for the majority class, which was used to create the training dataset. While this approach may work well for some machine learning models, we show that calibrating a random forest this way has unintended negative consequences, including prevalence estimates that can be upwardly biased. These prevalence estimates depend on both i) the number of predictors considered at each split in the random forest; and ii) the sampling rate used. We explain the former using known properties of random forests and analytical calibration. However, in investigating the latter issue, we made a surprising discovery - contrary to the widespread belief that decision trees are biased towards the majority class, they actually can be biased towards the minority class.

Updated: 2025-07-09 19:32:05

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2412.16209v2

A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive

Large Language Models (LLMs) are increasingly utilized in autonomous decision-making, where they sample options from vast action spaces. However, the heuristics that guide this sampling process remain underexplored. We study this sampling behavior and show that the underlying heuristic resembles that of human decision-making: it combines a descriptive component of a concept (reflecting the statistical norm) with a prescriptive component (an implicit ideal encoded in the LLM). We show that this deviation of a sample from the statistical norm towards the prescriptive component consistently appears in concepts across diverse real-world domains such as public health and economic trends. To further illustrate the theory, we demonstrate that concept prototypes in LLMs are affected by prescriptive norms, similar to the concept of normality in humans. Through case studies and comparison with human studies, we illustrate that in real-world applications, the shift of samples toward an ideal value in LLMs' outputs can result in significantly biased decision-making, raising ethical concerns.
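
The descriptive-plus-prescriptive account can be pictured with a toy generative model (entirely our own sketch; the mixing weight, Gaussian form, and example values are illustrative assumptions, not the paper's model):

```python
import random

def sample_concept(statistical_mean, ideal, weight=0.3, noise=1.0, rng=None):
    """Draw a sample around a point pulled from the statistical norm
    (descriptive component) toward an implicit ideal (prescriptive component)."""
    rng = rng or random.Random(0)
    center = (1 - weight) * statistical_mean + weight * ideal
    return rng.gauss(center, noise)

# Hours of daily exercise: population mean 0.5 h, implicit ideal 1.5 h.
# Samples concentrate above the true mean, i.e. shifted toward the ideal.
samples = [sample_concept(0.5, 1.5, rng=random.Random(i)) for i in range(1000)]
shift = sum(samples) / len(samples) - 0.5
```

A sampler like this reproduces the paper's qualitative claim: averages of generated samples sit systematically above the statistical norm, which is exactly the kind of bias that matters in downstream decisions.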

Updated: 2025-07-09 19:29:19

标题: 大型语言模型中的响应抽样理论:部分描述性和部分规范性

摘要: 大型语言模型(LLMs)越来越多地用于自主决策,其中它们从庞大的行动空间中采样选项。然而,指导这一采样过程的启发式方法仍未得到深入探讨。我们研究了这种采样行为,并展示了这种底层启发式方法类似于人类决策制定:包括一个描述性部分(反映统计规范)和一个规范性部分(LLM中隐含的理想)。我们展示了这种样本从统计规范偏离向规范性部分的偏差在各种真实世界领域中持续出现,如公共卫生和经济趋势。为了进一步阐明这一理论,我们展示了LLMs中的概念原型受到规范性规范的影响,类似于人类中的正常概念。通过案例研究和与人类研究的比较,我们说明在实际应用中,LLMs输出中样本向理想值的偏移可能导致明显偏见的决策制定,引起道德关注。

更新时间: 2025-07-09 19:29:19

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.11005v4

Semantic Edge Computing and Semantic Communications in 6G Networks: A Unifying Survey and Research Challenges

Semantic Edge Computing (SEC) and Semantic Communications (SemComs) have been proposed as viable approaches to achieve real-time edge-enabled intelligence in sixth-generation (6G) wireless networks. On one hand, SemCom leverages the strength of Deep Neural Networks (DNNs) to encode and communicate only the semantic information, while making it robust to channel distortions by compensating for wireless effects. Ultimately, this leads to an improvement in communication efficiency. On the other hand, SEC has leveraged distributed DNNs to divide the computation of a DNN across different devices based on their computational and networking constraints. Although significant progress has been made in both fields, the literature lacks a systematic view connecting them. In this work, we fill this gap by unifying the SEC and SemCom fields. We summarize the research problems in these two fields and provide a comprehensive review of the state of the art with a focus on their technical strengths and challenges.

Updated: 2025-07-09 19:22:05

标题: 语义边缘计算和语义通信在6G网络中:一个统一的调查和研究挑战

摘要: 语义边缘计算(SEC)和语义通信(SemCom)已被提出作为在第六代(6G)无线网络中实现实时边缘智能的可行方法。一方面,SemCom利用深度神经网络(DNNs)的优势仅对语义信息进行编码和传输,同时通过补偿无线效应使其对信道失真具有鲁棒性,最终提高了通信效率。另一方面,SEC利用分布式DNNs,根据各设备的计算和网络约束将DNN的计算划分到不同设备上。尽管这两个领域都取得了显著进展,但文献中缺乏将二者联系起来的系统性视角。在这项工作中,我们通过统一SEC和SemCom领域来填补这一空白。我们总结了这两个领域的研究问题,并提供了对最新研究现状的全面综述,重点关注其技术优势与挑战。

更新时间: 2025-07-09 19:22:05

领域: cs.LG,cs.NI,eess.SP

下载: http://arxiv.org/abs/2411.18199v3

Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism

In order to address the surge in inference cost caused by chain-of-thought reasoning in large language models, this research proposes a sparse attention mechanism that focuses only on a few relevant tokens. The researchers constructed a new attention mechanism and used GiantRabbit, trained with custom GPTs, as an experimental tool. The experiment tested and compared the reasoning time, correctness score, and chain-of-thought length of this model and o1 Preview in solving linear algebra test questions from MIT OpenCourseWare. The results show that GiantRabbit's reasoning time and chain-of-thought length are significantly lower than o1 Preview's, verifying the feasibility of a sparse attention mechanism for optimizing chain-of-thought reasoning. The architectural details and experimental process have been uploaded to GitHub at https://github.com/brucewang123456789/GeniusTrail.git.
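
A generic top-k sparse-attention step can be sketched as follows (our own minimal illustration of the general technique; the paper's exact mechanism and the GiantRabbit model are not reproduced here):

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=4):
    """Single-query attention that attends only to the k highest-scoring
    tokens, so the softmax and value mix touch k rows instead of all n."""
    scores = K @ q / np.sqrt(q.shape[-1])   # (n_tokens,)
    keep = np.argsort(scores)[-k:]          # indices of the k largest scores
    w = np.exp(scores[keep] - scores[keep].max())
    w /= w.sum()                            # softmax over the kept tokens only
    return w @ V[keep]                      # weighted mix of k value rows

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(32, 8)), rng.normal(size=(32, 8))
sparse_out = topk_sparse_attention(q, K, V, k=4)
```

With `k` equal to the sequence length this reduces to ordinary dense attention; smaller `k` trades a little fidelity for proportionally less computation in the softmax and value mixing.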

Updated: 2025-07-09 19:21:45

标题: 减少推理成本:通过稀疏注意机制对思维链进行优化的路径

摘要: 为了解决思维链推理导致的大型语言模型推理成本激增问题,本研究提出使用一种仅关注少数相关标记的稀疏注意机制。研究人员构建了一个新的注意机制,并使用以定制GPTs训练的GiantRabbit作为实验工具。实验测试并比较了该模型和o1 Preview在解决MIT OpenCourseWare线性代数测试题时的推理时间、正确率和思维链长度。结果显示,GiantRabbit的推理时间和思维链长度明显低于o1 Preview,验证了稀疏注意机制在优化思维链推理方面的可行性。详细的架构细节和实验过程已上传至Github,链接是:https://github.com/brucewang123456789/GeniusTrail.git。

更新时间: 2025-07-09 19:21:45

领域: cs.LG

下载: http://arxiv.org/abs/2411.09111v8

Towards Robust Surrogate Models: Benchmarking Machine Learning Approaches to Expediting Phase Field Simulations of Brittle Fracture

Data-driven approaches have the potential to make modeling complex, nonlinear physical phenomena significantly more computationally tractable. For example, computational modeling of fracture is a core challenge where machine learning techniques have the potential to provide a much needed speedup that would enable progress in areas such as multi-scale modeling and uncertainty quantification. Currently, phase field modeling (PFM) of fracture is one such approach that offers a convenient variational formulation to model crack nucleation, branching and propagation. To date, machine learning techniques have shown promise in approximating PFM simulations. However, most studies rely on overly simple benchmarks that do not reflect the true complexity of the fracture processes where PFM excels as a method. To address this gap, we introduce a challenging dataset based on PFM simulations designed to benchmark and advance ML methods for fracture modeling. This dataset includes three energy decomposition methods, two boundary conditions, and 1,000 random initial crack configurations for a total of 6,000 simulations. Each sample contains 100 time steps capturing the temporal evolution of the crack field. Alongside this dataset, we also implement and evaluate Physics-Informed Neural Networks (PINN), Fourier Neural Operators (FNO) and UNet models as baselines, and explore the impact of ensembling strategies on prediction accuracy. With this combination of our dataset and baseline models drawn from the literature we aim to provide a standardized and challenging benchmark for evaluating machine learning approaches to solid mechanics. Our results highlight both the promise and limitations of popular current models, and demonstrate the utility of this dataset as a testbed for advancing machine learning in fracture mechanics research.

Updated: 2025-07-09 19:14:56

标题: 走向稳健的代理模型:对加速脆性断裂相场模拟的机器学习方法进行基准测试

摘要: 数据驱动方法有潜力使建模复杂、非线性物理现象变得更易于计算。例如,断裂的计算建模是一个核心挑战,机器学习技术有潜力提供急需的加速,从而推动多尺度建模和不确定性量化等领域的进展。目前,相场模型(PFM)是一种提供方便的变分表达来模拟裂纹起始、分支和传播的方法。迄今为止,机器学习技术在近似PFM模拟方面表现出了潜力。然而,大多数研究依赖于过于简单的基准,这些基准并不能反映PFM作为一种方法在断裂过程中真正复杂的特性。为了填补这一差距,我们引入了一个基于PFM模拟设计的具有挑战性的数据集,用于基准测试和推动断裂建模的机器学习方法。该数据集包括三种能量分解方法、两种边界条件和1,000个随机初始裂纹配置,共计6,000个模拟。每个样本包含100个时间步,捕捉裂纹场的时间演变。除了这个数据集,我们还实现和评估了基于物理信息的神经网络(PINN)、傅里叶神经算子(FNO)和UNet模型作为基线,并探讨了集成策略对预测准确性的影响。通过将我们的数据集与取自文献的基线模型相结合,我们旨在为评估应用于固体力学的机器学习方法提供一个标准化且具有挑战性的基准。我们的结果突显了当前流行模型的潜力和局限性,并展示了这个数据集作为推进断裂力学研究中机器学习的试验平台的实用性。

更新时间: 2025-07-09 19:14:56

领域: cs.LG,physics.data-an,74R10, 74B20, 74A40, 68T07,J.2; I.6.3; I.6.5

下载: http://arxiv.org/abs/2507.07237v1

Multi-Scenario Reasoning: Unlocking Cognitive Autonomy in Humanoid Robots for Multimodal Understanding

To improve the cognitive autonomy of humanoid robots, this research proposes a multi-scenario reasoning architecture to address the technical shortcomings of multimodal understanding in this field. It draws on a simulation-based experimental design that adopts multimodal synthesis (visual, auditory, tactile) and builds a simulator, "Maha", to perform the experiment. The findings demonstrate the feasibility of this architecture on multimodal data. It provides reference experience for the exploration of cross-modal interaction strategies for humanoid robots in dynamic environments. In addition, multi-scenario reasoning brings the high-level reasoning mechanism of the human brain to humanoid robots at the cognitive level. This new concept promotes cross-scenario practical task transfer and semantic-driven action planning. It heralds the future development of self-learning and autonomous behavior of humanoid robots in changing scenarios.

Updated: 2025-07-09 19:14:44

标题: 多情景推理:为人形机器人解锁多模态理解的认知自主性

摘要: 为了提高人形机器人的认知自主性,本研究提出了一个多场景推理架构,以解决这一领域中多模态理解的技术缺陷。它借鉴了基于模拟的实验设计,采用多模态综合(视觉、听觉、触觉),并建立了一个名为“Maha”的模拟器来进行实验。研究结果表明了这种架构在多模态数据中的可行性。它为在动态环境中探索人形机器人跨模态交互策略提供了参考经验。此外,多场景推理模拟了人脑的高级推理机制,将其应用到人形机器人的认知层面。这个新概念促进了跨场景实际任务转移和语义驱动的行动规划。它预示着人形机器人在不断变化的场景中自学习和自主行为的未来发展。

更新时间: 2025-07-09 19:14:44

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2412.20429v4

Implementation and Analysis of Regev's Quantum Factorization Algorithm

Quantum computing represents a significant advancement in computational capabilities. Of particular concern is its impact on asymmetric cryptography through, notably, Shor's algorithm and the more recently developed Regev's algorithm for factoring composite numbers. We present our implementation of the latter. Our analysis encompasses both quantum simulation results and classical component examples, with particular emphasis on comparative cases between Regev's and Shor's algorithms. Our experimental results reveal that Regev's algorithm indeed outperforms Shor's algorithm for certain composite numbers in practice. However, we observed significant performance variations across different input values. Despite Regev's algorithm's theoretical asymptotic efficiency advantage, our implementation exhibited execution times longer than Shor's algorithm for small integer factorization in both quantum and classical components. These findings offer insights into the practical challenges and performance characteristics of implementing Regev's algorithm in realistic quantum computing scenarios.

Updated: 2025-07-09 19:14:35

标题: Regev的量子因子分解算法的实现和分析

摘要: 量子计算代表了计算能力的显著进步。特别关注的是其对非对称密码学的影响,尤其是通过Shor算法和最近发展的Regev算法来分解复合数。我们展示了后者的实现。我们的分析涵盖了量子模拟结果和经典组件示例,特别强调了Regev算法和Shor算法之间的比较案例。我们的实验结果显示,实际上,Regev算法在某些复合数上表现优于Shor算法。然而,我们观察到在不同输入值之间存在显著的性能变化。尽管Regev算法在理论上具有渐近效率优势,但我们的实现在量子和经典组件中对小整数因子分解的执行时间都长于Shor算法。这些发现为在现实量子计算场景中实现Regev算法的实际挑战和性能特征提供了见解。

更新时间: 2025-07-09 19:14:35

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2502.09772v2

An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation

Large language models (LLMs) often behave inconsistently across inputs, indicating uncertainty and motivating the need for its quantification in high-stakes settings. Prior work on calibration and uncertainty quantification often focuses on individual models, overlooking the potential of model diversity. We hypothesize that LLMs make complementary predictions due to differences in training and the Zipfian nature of language, and that aggregating their outputs leads to more reliable uncertainty estimates. To leverage this, we propose MUSE (Multi-LLM Uncertainty via Subset Ensembles), a simple information-theoretic method that uses Jensen-Shannon Divergence to identify and aggregate well-calibrated subsets of LLMs. Experiments on binary prediction tasks demonstrate improved calibration and predictive performance compared to single-model and naive ensemble baselines.

Updated: 2025-07-09 19:13:25

标题: 一个信息论视角下的多LLM不确定性估计

摘要: 大型语言模型(LLMs)在输入上经常表现不一致,表明存在不确定性,并激发了在高风险环境中量化其不确定性的需求。先前关于校准和不确定性量化的工作通常集中在个体模型上,忽略了模型多样性的潜力。我们假设LLMs由于训练的差异和语言的Zipfian性质而做出互补的预测,聚合它们的输出可以产生更可靠的不确定性估计。为了利用这一点,我们提出了MUSE(通过子集集成实现多LLM不确定性),这是一种简单的信息论方法,使用Jensen-Shannon散度来识别和聚合LLMs的校准良好的子集。针对二元预测任务的实验证明,与单一模型和天真集成基线相比,MUSE能够实现改善的校准和预测性能。

更新时间: 2025-07-09 19:13:25

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.07236v1

Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection

Jailbreaking techniques trick Large Language Models (LLMs) into producing restricted output, posing a potential threat. One line of defense is to use another LLM as a Judge to evaluate the harmfulness of generated text. However, we reveal that these Judge LLMs are vulnerable to token segmentation bias, an issue that arises when delimiters alter the tokenization process, splitting words into smaller sub-tokens. This alters the embeddings of the entire sequence, reducing detection accuracy and allowing harmful content to be misclassified as safe. In this paper, we introduce Emoji Attack, a novel strategy that amplifies existing jailbreak prompts by exploiting token segmentation bias. Our method leverages in-context learning to systematically insert emojis into text before it is evaluated by a Judge LLM, inducing embedding distortions that significantly lower the likelihood of detecting unsafe content. Unlike traditional delimiters, emojis also introduce semantic ambiguity, making them particularly effective in this attack. Through experiments on state-of-the-art Judge LLMs, we demonstrate that Emoji Attack substantially reduces the unsafe prediction rate, bypassing existing safeguards.

Updated: 2025-07-09 19:12:23

标题: 表情符号攻击:增强越狱攻击对法官LLM检测的影响

摘要: 越狱技术欺骗大型语言模型(LLMs)产生受限制的输出,构成潜在威胁。一种防御方法是使用另一个LLM作为评判者来评估生成文本的有害性。然而,我们揭示这些评判者LLMs易受令牌分割偏见的影响,这是一种在分隔符改变令牌化过程时出现的问题,将单词分割成较小的子令牌。这改变了整个序列的嵌入,降低了检测准确性,使有害内容被误分类为安全。在本文中,我们介绍了一种新颖的策略Emoji Attack,通过利用令牌分割偏见来放大现有的越狱提示。我们的方法利用上下文学习,在文本被评判者LLM评估之前系统地插入表情符号,诱导嵌入失真,显著降低检测到不安全内容的可能性。与传统的分隔符不同,表情符号还引入语义模糊性,在这种攻击中特别有效。通过对最先进的评判者LLMs进行实验,我们证明了Emoji Attack显著降低了不安全预测率,绕过了现有的安全措施。

更新时间: 2025-07-09 19:12:23

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2411.01077v3

OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning

Recent works have shown that by using large pre-trained models along with learnable prompts, rehearsal-free methods for class-incremental learning (CIL) settings can achieve superior performance to prominent rehearsal-based ones. Rehearsal-free CIL methods struggle with distinguishing classes from different tasks, as those are not trained together. In this work we propose a regularization method based on virtual outliers to tighten decision boundaries of the classifier, such that confusion of classes among different tasks is mitigated. Recent prompt-based methods often require a pool of task-specific prompts, in order to prevent overwriting knowledge of previous tasks with that of the new task, leading to extra computation in querying and composing an appropriate prompt from the pool. This additional cost can be eliminated, without sacrificing accuracy, as we reveal in the paper. We illustrate that a simplified prompt-based method can achieve results comparable to previous state-of-the-art (SOTA) methods equipped with a prompt pool, using much less learnable parameters and lower inference cost. Our regularization method has demonstrated its compatibility with different prompt-based methods, boosting those previous SOTA rehearsal-free CIL methods' accuracy on the ImageNet-R and CIFAR-100 benchmarks. Our source code is available at https://github.com/jpmorganchase/ovor.

Updated: 2025-07-09 19:07:40

标题: OVOR:一种具有虚拟异常值正则化的无需重复练习的类增量学习算法

摘要: 最近的研究表明,通过使用大型预训练模型以及可学习提示,无需重复练习的类增量学习(CIL)方法可以实现比著名的基于重复练习的方法更优越的性能。无需重复练习的CIL方法在区分不同任务的类别方面存在困难,因为它们没有一起训练。在这项研究中,我们提出了一种基于虚拟异常值的正则化方法,以收紧分类器的决策边界,从而减轻不同任务之间类别混淆的问题。最近的基于提示的方法通常需要一个任务特定提示池,以防止将先前任务的知识覆盖到新任务的知识上,从而导致额外的计算用于从提示池中查询和组合合适的提示。我们在论文中揭示,这种额外成本可以被消除,而不会牺牲准确性。我们证明,一个简化的基于提示的方法可以实现与之前状态-of-the-art(SOTA)方法相当的结果,而使用更少的可学习参数和更低的推理成本。我们的正则化方法已经证明与不同基于提示的方法兼容,在ImageNet-R和CIFAR-100基准测试中提高了那些先前SOTA的无重复练习CIL方法的准确性。我们的源代码可以访问https://github.com/jpmorganchase/ovor。

更新时间: 2025-07-09 19:07:40

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2402.04129v2

Time Makes Space: Emergence of Place Fields in Networks Encoding Temporally Continuous Sensory Experiences

The vertebrate hippocampus is believed to use recurrent connectivity in area CA3 to support episodic memory recall from partial cues. This brain area also contains place cells, whose location-selective firing fields implement maps supporting spatial memory. Here we show that place cells emerge in networks trained to remember temporally continuous sensory episodes. We model CA3 as a recurrent autoencoder that recalls and reconstructs sensory experiences from noisy and partially occluded observations by agents traversing simulated rooms. The agents move in realistic trajectories modeled from rodents and environments are modeled as high-dimensional sensory experience maps. Training our autoencoder to pattern-complete and reconstruct experiences with a constraint on total activity causes spatially localized firing fields, i.e., place cells, to emerge in the encoding layer. The emergent place fields reproduce key aspects of hippocampal phenomenology: a) remapping (maintenance of and reversion to distinct learned maps in different environments), implemented via repositioning of experience manifolds in the network's hidden layer, b) orthogonality of spatial representations in different arenas, c) robust place field emergence in differently shaped rooms, with single units showing multiple place fields in large or complex spaces, and d) slow representational drift of place fields. We argue that these results arise because continuous traversal of space makes sensory experience temporally continuous. We make testable predictions: a) rapidly changing sensory context will disrupt place fields, b) place fields will form even if recurrent connections are blocked, but reversion to previously learned representations upon remapping will be abolished, c) the dimension of temporally smooth experience sets the dimensionality of place fields, including during virtual navigation of abstract spaces.

Updated: 2025-07-09 19:03:11

标题: 时间创造空间:编码时间连续感觉体验的网络中地点场的出现

摘要: 脊椎动物的海马区被认为利用CA3区的递归连接来支持从部分线索中召回叙事记忆。这个脑区还包含地点细胞,其位置选择性放电场实现支持空间记忆的地图。在这里,我们展示了地点细胞在训练记忆时间连续感官事件的网络中出现。我们将CA3建模为一个递归自动编码器,从模拟房间中移动的代理的嘈杂和部分遮挡的观察中召回和重建感官体验。代理以从啮齿动物和环境建模的现实轨迹移动,环境被建模为高维感官体验地图。通过对我们的自动编码器进行模式完成和重建体验的训练,并对总活动施加约束,导致在编码层中出现空间局部化的放电场,即地点细胞。出现的地点场重现了海马现象学的关键方面:a)重定位(在不同环境中维持和恢复不同学习地图),通过在网络隐藏层中重新定位体验流形实现,b)不同竞技场中的空间表示的正交性,c)在不同形状的房间中出现强大的地点场,单元在大型或复杂空间中显示多个地点场,以及d)地点场的缓慢表征漂移。我们认为这些结果是因为空间的连续遍历使感官体验在时间上连续。我们提出可验证的预测:a)快速变化的感官背景会破坏地点场,b)即使递归连接被阻断,地点场也会形成,但在重新映射时恢复到先前学习的表示将被废除,c)时间上平滑体验的维度确定地点场的维度,包括在虚拟导航抽象空间时。

更新时间: 2025-07-09 19:03:11

领域: q-bio.NC,cs.AI,cs.LG,cs.NE

下载: http://arxiv.org/abs/2408.05798v3

Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems

The Koopman operator provides a principled framework for analyzing nonlinear dynamical systems through linear operator theory. Recent advances in dynamic mode decomposition (DMD) have shown that trajectory data can be used to identify dominant modes of a system in a data-driven manner. Building on this idea, deep learning methods such as VAMPnet and DPNet have been proposed to learn the leading singular subspaces of the Koopman operator. However, these methods require backpropagation through potentially numerically unstable operations on empirical second moment matrices, such as singular value decomposition and matrix inversion, during objective computation, which can introduce biased gradient estimates and hinder scalability to large systems. In this work, we propose a scalable and conceptually simple method for learning the top-k singular functions of the Koopman operator for stochastic dynamical systems based on the idea of low-rank approximation. Our approach eliminates the need for unstable linear algebraic operations and integrates easily into modern deep learning pipelines. Empirical results demonstrate that the learned singular subspaces are both reliable and effective for downstream tasks such as eigen-analysis and multi-step prediction.

Updated: 2025-07-09 18:55:48

标题: 随机动力系统的Koopman算子的高效参数化奇异值分解

摘要: The Koopman operator提供了一个通过线性算子理论来分析非线性动力系统的原则性框架。最近在动态模式分解(DMD)方面的进展表明,轨迹数据可以用于以数据驱动的方式识别系统的主导模式。在这个想法的基础上,提出了深度学习方法,如VAMPnet和DPNet,用于学习Koopman算子的主要奇异子空间。然而,这些方法在目标计算过程中需要通过可能数值不稳定的操作,如奇异值分解和矩阵求逆,进行反向传播,这可能会引入偏倚梯度估计并阻碍对大型系统的可扩展性。在这项工作中,我们提出了一种基于低秩逼近思想的可扩展且概念简单的方法,用于学习随机动力系统的Koopman算子的前k个奇异函数。我们的方法消除了对不稳定线性代数运算的需求,并易于集成到现代深度学习流程中。实证结果表明,学习到的奇异子空间既可靠又有效,可用于诸如特征分析和多步预测等下游任务。

更新时间: 2025-07-09 18:55:48

领域: cs.LG,cs.NA,math.DS,math.NA

下载: http://arxiv.org/abs/2507.07222v1

Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for effective optimization of compound systems. The core idea of Optimas is to maintain one Local Reward Function (LRF) per component, each satisfying a local-global alignment property, i.e., each component's local reward correlates with the global system performance. In each iteration, Optimas efficiently adapts the LRFs to maintain this property while simultaneously maximizing each component's local reward. This approach enables independent updates of heterogeneous configurations using the designated optimization method, while ensuring that local improvements consistently lead to performance gains. We present extensive evaluations across five real-world compound systems to demonstrate that Optimas outperforms strong baselines by an average improvement of 11.92%, offering a general and effective approach for improving compound systems. Our website is at https://optimas.stanford.edu.

Updated: 2025-07-09 18:47:51

标题: Optimas:通过全局对齐的本地奖励优化化合AI系统

摘要: 复合AI系统集成多个组件,如大型语言模型、专门工具和传统机器学习模型,越来越多地被部署来解决复杂的现实世界任务。然而,由于它们的非可微结构和各个组件之间的多样化配置类型(包括提示、超参数和模型参数),优化复合系统仍然具有挑战性。为解决这一挑战,我们提出了Optimas,一个用于有效优化复合系统的统一框架。Optimas的核心思想是为每个组件维护一个本地奖励函数(LRF),每个LRF都满足局部-全局对齐属性,即每个组件的本地奖励与全局系统性能相关联。在每次迭代中,Optimas有效地调整LRF以保持这种属性,同时最大化每个组件的本地奖励。这种方法使得可以使用指定的优化方法独立更新异构配置,同时确保本地改进始终导致性能增益。我们在五个真实世界的复合系统上进行了广泛评估,结果表明Optimas比强基线平均提高了11.92%,为改进复合系统提供了一种通用和有效的方法。我们的网站是https://optimas.stanford.edu。

更新时间: 2025-07-09 18:47:51

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.03041v2

Neurosymbolic Feature Extraction for Identifying Forced Labor in Supply Chains

Supply chain networks are complex systems that are challenging to analyze; this problem is exacerbated when there are illicit activities involved in the supply chain, such as counterfeit parts, forced labor, or human trafficking. While machine learning (ML) can find patterns in complex systems like supply chains, traditional ML techniques require large training data sets. However, illicit supply chains are characterized by very sparse data, and the data that is available is often (purposely) corrupted or unreliable in order to hide the nature of the activities. We need to be able to automatically detect new patterns that correlate with such illegal activity over complex, even temporal data, without requiring large training data sets. We explore neurosymbolic methods for identifying instances of illicit activity in supply chains and compare the effectiveness of manual and automated feature extraction from news articles accurately describing illicit activities uncovered by authorities. We propose a question tree approach for querying a large language model (LLM) to identify and quantify the relevance of articles. This enables a systematic evaluation of the differences between human and machine classification of news articles related to forced labor in supply chains.

Updated: 2025-07-09 18:44:48

标题: 神经符号特征提取用于识别供应链中的强迫劳动

摘要: 供应链网络是复杂系统,分析起来具有挑战性;当供应链中涉及非法活动,如假冒零部件、强迫劳工或人口贩卖时,问题会变得更加严重。虽然机器学习(ML)可以发现供应链等复杂系统中的模式,但传统的ML技术需要大量的训练数据集。然而,非法供应链的特点是数据非常稀疏,可用数据通常是(故意)损坏或不可靠的,以隐藏活动的性质。我们需要能够自动检测与此类非法活动相关的新模式,即使在复杂甚至时间数据中,也不需要大量的训练数据集。我们探讨了神经符号方法,用于识别供应链中的非法活动实例,并比较了从新闻文章中准确描述当局揭露的非法活动中进行的手动和自动特征提取的有效性。我们提出了一个问题树方法,用于查询大型语言模型(LLM)以识别并量化文章的相关性。这使得我们能够系统地评估人类和机器对涉及强迫劳工的新闻文章进行分类的差异。

更新时间: 2025-07-09 18:44:48

领域: cs.AI,cs.LG,cs.LO,I.2.4; I.2.7; J.4

下载: http://arxiv.org/abs/2507.07217v1

Bias-Aware Mislabeling Detection via Decoupled Confident Learning

Reliable data is a cornerstone of modern organizational systems. A notable data integrity challenge stems from label bias, which refers to systematic errors in a label, a covariate that is central to a quantitative analysis, such that its quality differs across social groups. This type of bias has been conceptually and empirically explored and is widely recognized as a pressing issue across critical domains. However, effective methodologies for addressing it remain scarce. In this work, we propose Decoupled Confident Learning (DeCoLe), a principled machine learning based framework specifically designed to detect mislabeled instances in datasets affected by label bias, enabling bias aware mislabelling detection and facilitating data quality improvement. We theoretically justify the effectiveness of DeCoLe and evaluate its performance in the impactful context of hate speech detection, a domain where label bias is a well documented challenge. Empirical results demonstrate that DeCoLe excels at bias aware mislabeling detection, consistently outperforming alternative approaches for label error detection. Our work identifies and addresses the challenge of bias aware mislabeling detection and offers guidance on how DeCoLe can be integrated into organizational data management practices as a powerful tool to enhance data reliability.

Updated: 2025-07-09 18:44:36

标题: 通过解耦自信学习实现偏见感知的错误标记检测

摘要: 可靠的数据是现代组织系统的基石。一个显著的数据完整性挑战源于标签偏差,指的是标签中的系统性错误,这是量化分析中的一个关键协变量,其质量在不同社会群体之间存在差异。这种偏差已经在概念上和实证上得到探讨,并被广泛认为是跨领域的一个紧迫问题。然而,解决这一问题的有效方法仍然很少。在这项工作中,我们提出了Decoupled Confident Learning(DeCoLe),这是一个基于机器学习的原则性框架,专门设计用于检测受标签偏差影响的数据集中的错误标记实例,实现了偏差感知的错误标记检测,并促进数据质量的改进。我们从理论上证明了DeCoLe的有效性,并在仇恨言论检测这一具有影响力的背景下评估了其性能,这是一个已经被充分记录的标签偏差挑战的领域。实证结果表明,DeCoLe在偏差感知的错误标记检测方面表现出色,始终优于其他标签错误检测方法。我们的工作确定并解决了偏差感知的错误标记检测挑战,并提供了DeCoLe如何被整合到组织数据管理实践中作为增强数据可靠性的强大工具的指导。

更新时间: 2025-07-09 18:44:36

领域: cs.LG,cs.AI,cs.DB,cs.HC

下载: http://arxiv.org/abs/2507.07216v1

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

Effective video tokenization is critical for scaling transformer models for long videos. Current approaches tokenize videos using space-time patches, leading to excessive tokens and computational inefficiencies. The best token reduction strategies degrade performance and barely reduce the number of tokens when the camera moves. We introduce grounded video tokenization, a paradigm that organizes tokens based on panoptic sub-object trajectories rather than fixed patches. Our method aligns with fundamental perceptual principles, ensuring that tokenization reflects scene complexity rather than video duration. We propose TrajViT, a video encoder that extracts object trajectories and converts them into semantically meaningful tokens, significantly reducing redundancy while maintaining temporal coherence. Trained with contrastive learning, TrajViT significantly outperforms space-time ViT (ViT3D) across multiple video understanding benchmarks, e.g., TrajViT outperforms ViT3D by a large margin of 6% top-5 recall in average at video-text retrieval task with 10x token deduction. We also show TrajViT as a stronger model than ViT3D for being the video encoder for modern VideoLLM, obtaining an average of 5.2% performance improvement across 6 VideoQA benchmarks while having 4x faster training time and 18x less inference FLOPs. TrajViT is the first efficient encoder to consistently outperform ViT3D across diverse video analysis tasks, making it a robust and scalable solution.

Updated: 2025-07-09 18:41:10

标题: 一个轨迹,一个标记:通过全景子对象轨迹的视频标记化

摘要: 有效的视频标记对于扩展长视频的Transformer模型至关重要。当前的方法是使用时空补丁对视频进行标记,导致了过多的标记和计算效率低下。最佳的标记减少策略会降低性能并且几乎不减少标记数量,当摄像机移动时。我们引入了基于全景次目标轨迹而不是固定补丁的基于实地的视频标记方法。我们的方法符合基本的感知原则,确保标记反映场景的复杂性而不是视频的持续时间。我们提出了TrajViT,一个视频编码器,它提取对象轨迹并将其转换为语义有意义的标记,显著减少了冗余,同时保持了时间上的连贯性。通过对比学习训练,TrajViT在多个视频理解基准测试中明显优于时空ViT(ViT3D),例如,在视频-文本检索任务中,TrajViT在平均10倍标记减少的情况下,比ViT3D的top-5召回率高出6%。我们还展示了TrajViT作为现代VideoLLM的视频编码器比ViT3D更强大,跨6个VideoQA基准测试获得了平均5.2%的性能提升,同时训练时间快了4倍,推理FLOPs减少了18倍。TrajViT是第一个在不同的视频分析任务中始终优于ViT3D的高效编码器,使其成为一个强大和可扩展的解决方案。

更新时间: 2025-07-09 18:41:10

领域: cs.CV,cs.AI,cs.GR,cs.LG

下载: http://arxiv.org/abs/2505.23617v2

Towards Evaluating Robustness of Prompt Adherence in Text to Image Models

The advancements in the domain of LLMs in recent years have surprised many, showcasing their remarkable capabilities and diverse applications. Their potential applications in various real-world scenarios have led to significant research on their reliability and effectiveness. On the other hand, multimodal LLMs and Text-to-Image models have only recently gained prominence, especially when compared to text-only LLMs. Their reliability remains constrained due to insufficient research on assessing their performance and robustness. This paper aims to establish a comprehensive evaluation framework for Text-to-Image models, concentrating particularly on their adherence to prompts. We created a novel dataset that aimed to assess the robustness of these models in generating images that conform to the specified factors of variation in the input text prompts. Our evaluation studies present findings on three variants of Stable Diffusion models: Stable Diffusion 3 Medium, Stable Diffusion 3.5 Large, and Stable Diffusion 3.5 Large Turbo, and two variants of Janus models: Janus Pro 1B and Janus Pro 7B. We introduce a pipeline that leverages text descriptions generated by the gpt-4o model for our ground-truth images, which are then used to generate artificial images by passing these descriptions to the Text-to-Image models. We then pass these generated images again through gpt-4o using the same system prompt and compare the variation between the two descriptions. Our results reveal that these models struggle to create simple binary images with only two factors of variation: a simple geometric shape and its location. We also show, using pre-trained VAEs on our dataset, that they fail to generate images that follow our input dataset distribution.

Updated: 2025-07-09 18:40:17

标题: 朝向评估文本到图像模型中提示遵循的稳健性

摘要: 近年来在LLMs领域的进展让许多人感到惊讶,展示出它们出色的能力和多样的应用。它们在各种现实场景中的潜在应用已经引发了对它们可靠性和有效性的重要研究。另一方面,多模态LLMs和文本到图像模型近来才开始引起关注,尤其是与仅文本的LLMs相比。由于对评估它们的性能和稳健性的研究不足,它们的可靠性仍然受到限制。本文旨在建立一个全面的评估框架,专注于文本到图像模型对提示的遵循。我们创建了一个新颖的数据集,旨在评估这些模型在生成符合输入文本提示中指定变化因素的图像的稳健性。我们的评估研究展示了三个稳定扩散模型的结果:Stable Diffusion 3 Medium、Stable Diffusion 3.5 Large和Stable Diffusion 3.5 Large Turbo,以及两个Janus模型的变种:Janus Pro 1B和Janus Pro 7B。我们引入了一个流程,利用gpt-4o模型生成的文本描述作为我们的真实图像,然后将这些描述传递给文本到图像模型生成人工图像。然后我们再次通过相同的系统提示将这些生成的图像传递给gpt-4o,并比较两个描述之间的变化。我们的结果显示这些模型在创建仅包含两个变化因素的简单二进制图像时存在困难:一个简单的几何形状和其位置。我们还展示了,使用我们的数据集上预训练的VAEs,它们无法生成遵循我们输入数据集分布的图像。

更新时间: 2025-07-09 18:40:17

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.08039v1

WatchWitch: Interoperability, Privacy, and Autonomy for the Apple Watch

Smartwatches such as the Apple Watch collect vast amounts of intimate health and fitness data as we wear them. Users have little choice regarding how this data is processed: The Apple Watch can only be used with Apple's iPhones, using their software and their cloud services. We are the first to publicly reverse-engineer the watch's wireless protocols, which led to discovering multiple security issues in Apple's proprietary implementation. With WatchWitch, our custom Android reimplementation, we break out of Apple's walled garden -- demonstrating practical interoperability with enhanced privacy controls and data autonomy. We thus pave the way for more consumer choice in the smartwatch ecosystem, offering users more control over their devices.

Updated: 2025-07-09 18:33:58

标题: WatchWitch:Apple Watch的互操作性、隐私和自主性

摘要: 智能手表(如苹果手表)在我们佩戴时收集大量私密的健康和健身数据。用户几乎无法选择这些数据如何被处理:苹果手表只能与苹果的iPhone一起使用,依赖其软件和云服务。我们是首批公开逆向工程该手表无线协议的人,并因此发现了苹果专有实现中的多个安全问题。通过我们的自定义Android重新实现WatchWitch,我们打破了苹果的封闭生态系统,展示了具有增强隐私控制和数据自主权的实际互操作性。由此,我们为智能手表生态系统中更多的消费者选择铺平了道路,让用户对自己的设备拥有更多控制权。

更新时间: 2025-07-09 18:33:58

领域: cs.CR

下载: http://arxiv.org/abs/2507.07210v1

Scale leads to compositional generalization

Can neural networks systematically capture discrete, compositional task structure despite their continuous, distributed nature? The impressive capabilities of large-scale neural networks suggest that the answer to this question is yes. However, even for the most capable models, there are still frequent failure cases that raise doubts about their compositionality. Here, we seek to understand what it takes for a standard neural network to generalize over tasks that share compositional structure. We find that simply scaling data and model size leads to compositional generalization. We show that this holds across different task encodings as long as the training distribution sufficiently covers the task space. In line with this finding, we prove that standard multilayer perceptrons can approximate a general class of compositional task families to arbitrary precision using only a linear number of neurons with respect to the number of task modules. Finally, we uncover that if networks successfully compositionally generalize, the constituents of a task can be linearly decoded from their hidden activations. We show that this metric correlates with failures of text-to-image generation models to compose known concepts.
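The linear-decoding metric described above can be illustrated on synthetic activations. The "hidden activations" here are fabricated as sums of per-module embeddings plus noise, so linear decodability holds by construction; the point is only to show what fitting and scoring the linear decoder looks like.

```python
import numpy as np

# Sketch of the linear-decoding check: if a network composes tasks, each task
# constituent should be recoverable from hidden activations with a linear map.
# The activations below are synthetic (module-embedding sums plus small noise).
rng = np.random.default_rng(0)
n_modules, hidden_dim, n_samples = 5, 32, 400

module_emb = rng.normal(size=(n_modules, hidden_dim))
labels = rng.integers(0, 2, size=(n_samples, n_modules))  # which modules are active
acts = labels @ module_emb + 0.01 * rng.normal(size=(n_samples, hidden_dim))

# Least-squares linear decoder from activations back to module indicators.
W, *_ = np.linalg.lstsq(acts, labels, rcond=None)
pred = (acts @ W) > 0.5
accuracy = (pred == labels).mean()   # near 1.0: constituents are linearly decodable
```

For a real network one would replace `acts` with recorded hidden states and `labels` with the ground-truth task constituents, then use held-out accuracy as the compositionality metric.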

Updated: 2025-07-09 18:30:50

标题: 规模导致组合概括

摘要: 神经网络是否能够系统地捕捉离散的、组合的任务结构,尽管它们具有连续的、分布式的特性?大规模神经网络的强大能力表明对于这个问题的答案是肯定的。然而,即使对于最有能力的模型,仍然存在频繁的失败案例,这引发了对它们组合性的怀疑。在这里,我们试图了解标准神经网络在共享组合结构的任务上实现泛化所需的条件。我们发现,简单地扩大数据和模型规模会导致组合泛化。我们展示了这一点在不同任务编码中都成立,只要训练分布充分覆盖了任务空间。与这一发现一致,我们证明标准多层感知器可以用线性数量的神经元逼近一个通用的组合任务家族,达到任意精度,而这个数量与任务模块的数量成正比。最后,我们发现如果网络成功实现组合泛化,任务的组成部分可以从它们的隐藏激活中线性解码。我们展示了这个度量与文本到图像生成模型无法组合已知概念的失败之间的相关性。

更新时间: 2025-07-09 18:30:50

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2507.07207v1

State-Inference-Based Prompting for Natural Language Trading with Game NPCs

Large Language Models enable dynamic game interactions but struggle with rule-governed trading systems. Current implementations suffer from rule violations, such as item hallucinations and calculation errors, that erode player trust. Here, State-Inference-Based Prompting (SIBP) enables reliable trading through autonomous dialogue state inference and context-specific rule adherence. The approach decomposes trading into six states within a unified prompt framework, implementing context-aware item referencing and placeholder-based price calculations. Evaluation across 100 trading dialogues demonstrates >97% state compliance, >95% referencing accuracy, and 99.7% calculation precision. SIBP maintains computational efficiency while outperforming baseline approaches, establishing a practical foundation for trustworthy NPC interactions in commercial games.
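The placeholder-based price calculation can be sketched as follows: the model emits a reply containing a `{TOTAL}` placeholder plus a structured trade intent, and deterministic code, not the LLM, performs the arithmetic and checks the catalog. The item names, prices, and function shape are invented for illustration.

```python
# Sketch of placeholder-based price calculation with a catalog guard against
# item hallucination. CATALOG and the reply format are hypothetical.
CATALOG = {"iron sword": 120, "healing potion": 30}

def settle(reply_template, item, quantity):
    if item not in CATALOG:              # reject hallucinated items outright
        raise ValueError(f"unknown item: {item}")
    total = CATALOG[item] * quantity     # arithmetic done in code, not by the LLM
    return reply_template.replace("{TOTAL}", str(total)), total

reply, total = settle("That will be {TOTAL} gold for your potions.",
                      "healing potion", 4)
```

Because the model only ever emits the placeholder, calculation precision is limited by the code path rather than by the LLM's arithmetic.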

Updated: 2025-07-09 18:24:47

标题: 基于状态推断的提示用于与游戏NPC进行自然语言交易

摘要: 大型语言模型可以实现动态游戏互动,但在受规则约束的交易系统中表现不佳。当前的实现存在规则违反问题,例如物品幻觉和计算错误,这会削弱玩家的信任。在这里,基于状态推断的提示(SIBP)通过自主对话状态推断和特定上下文规则遵从,实现可靠的交易。该方法将交易分解为六个状态,在统一提示框架中实现了上下文感知的物品引用和基于占位符的价格计算。对100个交易对话进行评估显示,超过97%的状态符合规则,超过95%的引用准确率,计算精度达到99.7%。SIBP在保持计算效率的同时,优于基准方法,为商业游戏中可信的NPC互动建立了实用基础。

更新时间: 2025-07-09 18:24:47

领域: cs.AI

下载: http://arxiv.org/abs/2507.07203v1

MODA: A Unified 3D Diffusion Framework for Multi-Task Target-Aware Molecular Generation

Three-dimensional molecular generators based on diffusion models can now reach near-crystallographic accuracy, yet they remain fragmented across tasks. SMILES-only inputs, two-stage pretrain-finetune pipelines, and one-task-one-model practices hinder stereochemical fidelity, task alignment, and zero-shot transfer. We introduce MODA, a diffusion framework that unifies fragment growing, linker design, scaffold hopping, and side-chain decoration with a Bayesian mask scheduler. During training, a contiguous spatial fragment is masked and then denoised in one pass, enabling the model to learn shared geometric and chemical priors across tasks. Multi-task training yields a universal backbone that surpasses six diffusion baselines and three training paradigms on substructure, chemical property, interaction, and geometry. Model-C reduces ligand-protein clashes and substructure divergences while maintaining Lipinski compliance, whereas Model-B preserves similarity but trails in novelty and binding affinity. Zero-shot de novo design and lead-optimisation tests confirm stable negative Vina scores and high improvement rates without force-field refinement. These results demonstrate that a single-stage multi-task diffusion routine can replace two-stage workflows for structure-based molecular design.

Updated: 2025-07-09 18:19:50

标题: MODA:一个统一的三维扩散框架,用于多任务目标感知分子生成

摘要: 基于扩散模型的三维分子生成器现在可以达到近乎晶体学精度,但它们仍然在任务之间存在碎片化。仅基于SMILES的输入、两阶段预训练微调流程以及一任务一模型的做法阻碍了立体化学的忠实性、任务对齐和零样本转移。我们引入了MODA,一个扩散框架,统一了片段生长、连接物设计、骨架跳跃和侧链修饰,并使用贝叶斯掩模调度器。在训练期间,一个连续的空间片段被掩盖,然后在一个步骤中去噪声,使模型能够学习跨任务的共享几何和化学先验知识。多任务训练产生了一个超越了六个扩散基线和三个训练范式的通用主干,在亚结构、化学性质、相互作用和几何方面都取得了成功。Model-C减少了配体-蛋白冲突和亚结构分歧,同时保持了Lipinski的一致性,而Model-B保持了相似性,但在新颖性和结合亲和力方面落后。零样本全新设计和主导优化测试证实了稳定的负Vina分数和高的改进率,无需力场的细化。这些结果表明,单阶段多任务扩散例程可以取代基于结构的分子设计的两阶段工作流程。

更新时间: 2025-07-09 18:19:50

领域: q-bio.BM,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.07201v1

Combining Pre-Trained Models for Enhanced Feature Representation in Reinforcement Learning

The recent focus on and release of pre-trained models have been key components of several advancements in many fields (e.g., Natural Language Processing and Computer Vision); indeed, pre-trained models learn disparate latent embeddings that share insightful representations. On the other hand, Reinforcement Learning (RL) focuses on maximizing the cumulative reward obtained via the agent's interaction with the environment. RL agents have no prior knowledge about the world, and they either learn from scratch an end-to-end mapping between the observation and action spaces or, in more recent works, are paired with monolithic and computationally expensive Foundation Models. How to effectively combine and leverage the hidden information of different pre-trained models simultaneously in RL is still an open and understudied question. In this work, we propose Weight Sharing Attention (WSA), a new architecture that combines the embeddings of multiple pre-trained models to shape an enriched state representation, balancing the tradeoff between efficiency and performance. We run an extensive comparison between several combination modes, showing that WSA obtains performance comparable to end-to-end models on multiple Atari games. Furthermore, we study the generalization capabilities of this approach and analyze how scaling the number of models influences agents' performance during and after training.

Updated: 2025-07-09 18:13:52

标题: 将预训练模型结合以增强在强化学习中的特征表示

摘要: 最近,对预训练模型的关注和发布已成为许多领域(例如自然语言处理和计算机视觉)多项进展的关键组成部分,实际上,预训练模型学习不同潜在嵌入共享有深刻见解的表示。另一方面,强化学习(RL)侧重于通过代理与环境的交互最大化累积奖励。RL代理没有关于世界的任何先前知识,它们要么从头开始学习观察和动作空间之间的端到端映射,要么在更近期的作品中与庞大且计算昂贵的基础模型配对。如何在RL中同时有效地组合和利用不同预训练模型的隐藏信息仍然是一个开放且研究不足的问题。在这项工作中,我们提出了一种新架构Weight Sharing Attention(WSA),用于组合多个预训练模型的嵌入以塑造丰富的状态表示,平衡效率和性能之间的权衡。我们对几种组合模式进行了广泛比较,结果显示WSA在多个Atari游戏上获得了与端到端模型可比的性能。此外,我们研究了这种方法的泛化能力,并分析了增加模型数量如何影响代理在训练期间和之后的性能。

更新时间: 2025-07-09 18:13:52

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.07197v1

Bridging the Last Mile of Prediction: Enhancing Time Series Forecasting with Conditional Guided Flow Matching

Diffusion models, a type of generative model, have shown promise in time series forecasting. But they face limitations like rigid source distributions and limited sampling paths, which hinder their performance. Flow matching offers faster generation, higher-quality outputs, and greater flexibility, while also possessing the ability to utilize valuable information from the prediction errors of prior models, which were previously inaccessible yet critically important. To address these challenges and fully unlock the untapped potential of flow matching, we propose Conditional Guided Flow Matching (CGFM). CGFM extends flow matching by incorporating the outputs of an auxiliary model, enabling a previously unattainable capability in the field: learning from the errors of the auxiliary model. For time series forecasting tasks, it integrates historical data as conditions and guidance, constructs two-sided conditional probability paths, and uses a general affine path to expand the space of probability paths, ultimately leading to improved predictions. Extensive experiments show that CGFM consistently enhances and outperforms state-of-the-art models, highlighting its effectiveness in advancing forecasting methods.
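A minimal sketch of an affine probability path for flow matching follows, with the auxiliary model's forecast used as the source endpoint so that the flow learns to correct its errors. The linear schedule (alpha_t = t, beta_t = 1 - t) and this particular use of the auxiliary output are simplifying assumptions for illustration, not necessarily CGFM's exact construction, which allows a general affine path.

```python
import numpy as np

# Affine path x_t = alpha_t * x1 + beta_t * x0 with alpha_t = t, beta_t = 1 - t.
rng = np.random.default_rng(1)

x1 = rng.normal(size=8)              # true future values (target endpoint)
aux = x1 + 0.5 * rng.normal(size=8)  # auxiliary model's imperfect forecast
x0 = aux                             # source endpoint: start the path at the forecast

t = 0.3
x_t = t * x1 + (1 - t) * x0          # point on the affine path at time t
u_t = x1 - x0                        # target velocity: d/dt [t*x1 + (1-t)*x0]

# A network v(x_t, t, condition) would regress onto u_t during training;
# at t = 1 the path lands exactly on the ground truth.
x_end = 1.0 * x1 + 0.0 * x0
```

Conditioning the velocity network on history windows then gives the two-sided conditional paths the abstract describes.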

Updated: 2025-07-09 18:03:31

标题: 桥接预测的最后一英里:通过条件引导流匹配增强时间序列预测

摘要: 扩散模型是一种生成模型,在时间序列预测中显示出潜力。但是它们面临诸如刚性源分布和有限采样路径等限制,这些限制阻碍了它们的性能。流匹配提供更快的生成速度、更高质量的输出和更大的灵活性,同时还具有利用先前模型的预测误差中宝贵信息的能力,这些信息以前是不可获取的但至关重要。为了解决这些挑战并充分发挥流匹配的未开发潜力,我们提出了条件引导流匹配(CGFM)。CGFM通过结合辅助模型的输出扩展了流匹配,实现了一种先前无法实现的能力:从辅助模型的错误中学习。对于时间序列预测任务,它将历史数据作为条件和指导,构建双向条件概率路径,并使用一般的仿射路径来扩展概率路径空间,最终实现了更好的预测结果。大量实验证明,CGFM始终提升并超越最先进的模型,突显了其在推进预测方法方面的有效性。

更新时间: 2025-07-09 18:03:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.07192v1

Prompt Perturbations Reveal Human-Like Biases in LLM Survey Responses

Large Language Models (LLMs) are increasingly used as proxies for human subjects in social science surveys, but their reliability and susceptibility to known response biases are poorly understood. This paper investigates the response robustness of LLMs in normative survey contexts -- we test nine diverse LLMs on questions from the World Values Survey (WVS), applying a comprehensive set of 11 perturbations to both question phrasing and answer option structure, resulting in over 167,000 simulated interviews. In doing so, we not only reveal LLMs' vulnerabilities to perturbations but also reveal that all tested models exhibit a consistent recency bias, varying in intensity, disproportionately favoring the last-presented answer option. While larger models are generally more robust, all models remain sensitive to semantic variations like paraphrasing and to combined perturbations. By applying a set of perturbations, we reveal that LLMs partially align with survey response biases identified in humans. This underscores the critical importance of prompt design and robustness testing when using LLMs to generate synthetic survey data.
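One answer-option perturbation from the study can be sketched as follows: present the same options in original and reversed order, and compare how often the last-listed option is chosen. `ask_model` is a stub standing in for an LLM call; it is hard-wired with a recency preference purely to show what the measurement looks like.

```python
import random

# Sketch of a recency-bias measurement via option-order perturbation.
random.seed(0)
options = ["agree", "neutral", "disagree"]

def ask_model(opts):
    # Stub: 50% chance of picking the last option, otherwise uniform choice.
    return opts[-1] if random.random() < 0.5 else random.choice(opts)

def last_option_rate(opts, n=1000):
    return sum(ask_model(opts) == opts[-1] for _ in range(n)) / n

rate_fwd = last_option_rate(options)
rate_rev = last_option_rate(list(reversed(options)))
# Under no bias both rates would be ~1/3; a consistent excess in both orderings
# (i.e. the preference follows the *position*, not the option) signals recency bias.
```

Reversing the order is the key control: a genuine preference for a particular answer would not survive the swap, while a positional bias does.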

Updated: 2025-07-09 18:01:50

标题: 提示扰动揭示LLM调查回答中的类人偏见

摘要: 大型语言模型(LLMs)越来越被用作社会科学调查中代表人类主体的工具,但它们的可靠性和对已知响应偏差的易感性尚不明确。本文研究了LLMs在规范调查背景下的响应鲁棒性:我们在世界价值观调查(WVS)的问题上测试了九种不同的LLMs,对问题措辞和答案选项结构应用了一套共11种扰动,产生了超过167,000个模拟访谈。通过这样做,我们不仅揭示了LLMs对扰动的脆弱性,还发现所有测试模型都表现出一致的"近因偏差",其强度各不相同,不成比例地偏向于最后呈现的答案选项。尽管较大的模型通常更具鲁棒性,但所有模型仍对语义变化(如改写)和组合扰动敏感。通过应用一组扰动,我们揭示了LLMs在一定程度上与人类调查中发现的响应偏差相一致。这强调了在使用LLMs生成合成调查数据时,提示设计和鲁棒性测试的关键重要性。

更新时间: 2025-07-09 18:01:50

领域: cs.CL,cs.AI,cs.CY,J.4

下载: http://arxiv.org/abs/2507.07188v1

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Large language models (LLMs) exhibit cognitive biases -- systematic tendencies of irrational decision-making, similar to those seen in humans. Prior work has found that these biases vary across models and can be amplified by instruction tuning. However, it remains unclear if these differences in biases stem from pretraining, finetuning, or even random noise due to training stochasticity. We propose a two-step causal experimental approach to disentangle these factors. First, we finetune models multiple times using different random seeds to study how training randomness affects over 30 cognitive biases. Second, we introduce cross-tuning -- swapping instruction datasets between models to isolate bias sources. This swap uses datasets that led to different bias patterns, directly testing whether biases are dataset-dependent. Our findings reveal that while training randomness introduces some variability, biases are mainly shaped by pretraining: models with the same pretrained backbone exhibit more similar bias patterns than those sharing only finetuning data. These insights suggest that understanding biases in finetuned models requires considering their pretraining origins beyond finetuning effects. This perspective can guide future efforts to develop principled strategies for evaluating and mitigating bias in LLMs.

Updated: 2025-07-09 18:01:14

标题: 在预训练中植入,在微调中受影响:关于LLM认知偏见起源的案例研究

摘要: 大型语言模型(LLMs)表现出认知偏见,即类似于人类所见的非理性决策的系统性倾向。先前的研究发现,这些偏见在模型之间有所不同,并且可以通过指令微调而被放大。然而,目前尚不清楚这些偏见的差异是源自预训练、微调,还是训练随机性带来的随机噪声。我们提出了一个两步因果实验方法来解开这些因素。首先,我们使用不同的随机种子多次微调模型,以研究训练随机性如何影响超过30种认知偏见。其次,我们引入了"交叉调整"(cross-tuning),即在模型之间交换指令数据集,以分离偏见来源。这种交换使用导致不同偏见模式的数据集,直接测试偏见是否依赖于数据集。我们的发现表明,尽管训练的随机性引入了一些变异性,但偏见主要由预训练塑造:具有相同预训练骨干的模型比仅共享微调数据的模型展现出更相似的偏见模式。这些见解表明,理解微调模型中的偏见需要超越微调效应,考虑其预训练起源。这一视角可以指导未来发展评估和减轻LLMs偏见的原则性策略。

更新时间: 2025-07-09 18:01:14

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.07186v1

Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor

Recent advances in multimodal large language models (MLLMs) have enabled image-based question-answering capabilities. However, a key limitation is the use of CLIP as the visual encoder; while it can capture coarse global information, it often can miss fine-grained details that are relevant to the input query. To address these shortcomings, this work studies whether pre-trained text-to-image diffusion models can serve as instruction-aware visual encoders. Through an analysis of their internal representations, we find diffusion features are both rich in semantics and can encode strong image-text alignment. Moreover, we find that we can leverage text conditioning to focus the model on regions relevant to the input question. We then investigate how to align these features with large language models and uncover a leakage phenomenon, where the LLM can inadvertently recover information from the original diffusion prompt. We analyze the causes of this leakage and propose a mitigation strategy. Based on these insights, we explore a simple fusion strategy that utilizes both CLIP and conditional diffusion features. We evaluate our approach on both general VQA and specialized MLLM benchmarks, demonstrating the promise of diffusion models for visual understanding, particularly in vision-centric tasks that require spatial and compositional reasoning. Our project page can be found https://vatsalag99.github.io/mustafar/.

Updated: 2025-07-09 17:59:47

标题: 朝向通过稳定扩散作为任务感知特征提取器的多模态理解

摘要: 最近的多模态大型语言模型(MLLMs)的进展使得基于图像的问答能力得以实现。然而,一个关键的限制是使用CLIP作为视觉编码器;虽然它可以捕捉粗略的全局信息,但往往会错过与输入查询相关的细节。为了解决这些缺点,本文研究了预训练的文本到图像扩散模型是否可以作为具有指导意识的视觉编码器。通过对它们的内部表示的分析,我们发现扩散特征在语义上非常丰富,并且能够编码强大的图像文本对齐。此外,我们发现我们可以利用文本调节来将模型聚焦在与输入问题相关的区域。然后,我们调查如何将这些特征与大型语言模型对齐,并揭示了一个泄漏现象,即LLM可能无意中从原始扩散提示中恢复信息。我们分析了这种泄漏的原因并提出了一个缓解策略。基于这些见解,我们探讨了一种简单的融合策略,利用了CLIP和有条件的扩散特征。我们在一般的VQA和专门的MLLM基准上评估了我们的方法,展示了扩散模型在视觉理解方面的潜力,特别是在需要空间和构成推理的以视觉为中心的任务中。我们的项目页面可以在https://vatsalag99.github.io/mustafar/找到。

更新时间: 2025-07-09 17:59:47

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.07106v1

Does Data Scaling Lead to Visual Compositional Generalization?

Compositional understanding is crucial for human intelligence, yet it remains unclear whether contemporary vision models exhibit it. The dominant machine learning paradigm is built on the premise that scaling data and model sizes will improve out-of-distribution performance, including compositional generalization. We test this premise through controlled experiments that systematically vary data scale, concept diversity, and combination coverage. We find that compositional generalization is driven by data diversity, not mere data scale. Increased combinatorial coverage forces models to discover a linearly factored representational structure, where concepts decompose into additive components. We prove this structure is key to efficiency, enabling perfect generalization from few observed combinations. Evaluating pretrained models (DINO, CLIP), we find above-random yet imperfect performance, suggesting partial presence of this structure. Our work motivates stronger emphasis on constructing diverse datasets for compositional generalization, and considering the importance of representational structure that enables efficient compositional learning. Code available at https://github.com/oshapio/visual-compositional-generalization.

Updated: 2025-07-09 17:59:03

标题: 数据缩放是否导致视觉组合泛化?

摘要: 组合理解对于人类智能至关重要,然而现代视觉模型是否具备这种理解能力仍不清楚。主导的机器学习范式建立在一个假设上,即扩大数据和模型规模将提高分布外性能,包括组合泛化。我们通过控制实验来测试这个假设,系统地变化数据规模、概念多样性和组合覆盖。我们发现,组合泛化是由数据多样性驱动的,而不仅仅是数据规模。增加组合覆盖强迫模型发现一个线性分解的表征结构,其中概念分解为加法组件。我们证明这种结构对效率至关重要,使得能够从少量观察到的组合中实现完美泛化。评估预训练模型(DINO、CLIP),我们发现高于随机但不完美的表现,表明这种结构部分存在。我们的工作促使更加强调构建多样化的数据集以实现组合泛化,并考虑实现高效组合学习的表征结构的重要性。代码可在https://github.com/oshapio/visual-compositional-generalization 找到。

更新时间: 2025-07-09 17:59:03

领域: cs.LG

下载: http://arxiv.org/abs/2507.07102v1

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful

Conventional wisdom dictates that small batch sizes make language model pretraining and fine-tuning unstable, motivating gradient accumulation, which trades off the number of optimizer steps for a proportional increase in batch size. While it is common to decrease the learning rate for smaller batch sizes, other hyperparameters are often held fixed. In this work, we revisit small batch sizes all the way down to batch size one, and we propose a rule for scaling Adam hyperparameters to small batch sizes. We find that small batch sizes (1) train stably, (2) are consistently more robust to hyperparameter choices, (3) achieve equal or better per-FLOP performance than larger batch sizes, and (4) notably enable stable language model training with vanilla SGD, even without momentum, despite storing no optimizer state. Building on these results, we provide practical recommendations for selecting a batch size and setting optimizer hyperparameters. We further recommend against gradient accumulation unless training on multiple devices with multiple model replicas, bottlenecked by inter-device bandwidth.
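The step-vs-batch tradeoff behind gradient accumulation can be made concrete with a tiny SGD example: accumulating (averaging) k micro-batch gradients before one update is numerically identical to a single step on the concatenated batch, so it costs the same FLOPs but takes k times fewer parameter updates. The toy regression data below is fabricated.

```python
import numpy as np

# Demonstration that gradient accumulation trades optimizer steps for batch size.
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)
lr = 0.1

def grad(w, Xb, yb):
    # Gradient of mean squared error for linear regression.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# One SGD step on the full batch of 8...
w_big = w - lr * grad(w, X, y)

# ...equals one step with gradients accumulated over two micro-batches of 4.
g_acc = 0.5 * (grad(w, X[:4], y[:4]) + grad(w, X[4:], y[4:]))
w_acc = w - lr * g_acc
```

Since the two updates coincide, accumulation buys nothing numerically over a true large batch; the paper's point is that on a single device the small-batch regime with more frequent updates is the better use of the same compute.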

Updated: 2025-07-09 17:57:36

标题: 语言模型的小批量训练:普通SGD何时有效,以及为什么梯度累积是浪费的

摘要: 传统观念认为小批量大小会使语言模型的预训练和微调不稳定,从而激发了梯度累积,这种方法在优化器步数的数量上进行了折衷,以换取批量大小的成比例增加。虽然通常会降低学习率以适应较小的批量大小,但其他超参数通常保持不变。在这项工作中,我们重新审视了一直降至批量大小为一的小批量大小,并提出了一个规则来将Adam超参数缩放到小批量大小。我们发现小批量大小(1)训练稳定,(2)对超参数选择更加稳健,(3)实现了与更大批量大小相等或更好的每次FLOP性能,并且(4)显著实现了使用普通SGD进行稳定语言模型训练,即使没有动量,尽管没有存储任何优化器状态。基于这些结果,我们提供了选择批量大小和设置优化器超参数的实际建议。我们进一步建议不要进行梯度累积,除非在多个设备上训练多个模型副本,受到设备间带宽的瓶颈限制。

更新时间: 2025-07-09 17:57:36

领域: cs.LG

下载: http://arxiv.org/abs/2507.07101v1

Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts

Domain-Incremental Learning (DIL) focuses on continual learning in non-stationary environments, requiring models to adjust to evolving domains while preserving historical knowledge. DIL faces two critical challenges in the context of imbalanced data: intra-domain class imbalance and cross-domain class distribution shifts. These challenges significantly hinder model performance, as intra-domain imbalance leads to underfitting of few-shot classes, while cross-domain shifts require maintaining well-learned many-shot classes and transferring knowledge to improve few-shot class performance in old domains. To overcome these challenges, we introduce the Dual-Balance Collaborative Experts (DCE) framework. DCE employs a frequency-aware expert group, where each expert is guided by specialized loss functions to learn features for specific frequency groups, effectively addressing intra-domain class imbalance. Subsequently, a dynamic expert selector is learned by synthesizing pseudo-features through balanced Gaussian sampling from historical class statistics. This mechanism navigates the trade-off between preserving many-shot knowledge of previous domains and leveraging new data to improve few-shot class performance in earlier tasks. Extensive experimental results on four benchmark datasets demonstrate DCE's state-of-the-art performance.
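The balanced Gaussian sampling step can be sketched as follows: store per-class feature statistics from past domains, then draw the same number of pseudo-features per class regardless of how imbalanced the real data was. The class statistics and per-class count below are invented for illustration.

```python
import numpy as np

# Sketch of balanced Gaussian pseudo-feature synthesis from stored class stats.
rng = np.random.default_rng(3)

# Hypothetical stored (mean, std) statistics for 3 classes in a 4-d feature space.
class_stats = {
    0: (np.zeros(4), np.ones(4)),           # many-shot class
    1: (np.ones(4), 0.5 * np.ones(4)),      # medium class
    2: (2 * np.ones(4), 0.1 * np.ones(4)),  # few-shot class
}

def sample_pseudo_features(stats, per_class=100):
    feats, labels = [], []
    for c, (mu, sigma) in stats.items():
        feats.append(rng.normal(loc=mu, scale=sigma, size=(per_class, len(mu))))
        labels.extend([c] * per_class)
    return np.vstack(feats), np.array(labels)

X, ylab = sample_pseudo_features(class_stats)
counts = np.bincount(ylab)   # balanced by construction: 100 per class
```

Training the expert selector on such a balanced pseudo-feature set is what lets few-shot classes from old domains contribute on equal footing with many-shot ones.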

Updated: 2025-07-09 17:57:07

标题: 通过双平衡协作专家解决不平衡的领域增量学习

摘要: Domain-Incremental Learning(DIL)专注于在非静态环境中的持续学习,要求模型在调整到不断变化的领域的同时保留历史知识。在不平衡数据的情况下,DIL面临两个关键挑战:领域内类别不平衡和领域间类别分布的变化。这些挑战显著阻碍了模型的性能,因为领域内的不平衡导致少样本类别的欠拟合,而领域间的变化需要保持良好学习的多样本类别,并转移知识以提高旧领域中少样本类别的性能。为了克服这些挑战,我们引入了Dual-Balance Collaborative Experts(DCE)框架。DCE采用一个频率感知的专家组,每个专家受专门的损失函数引导,学习特定频率组的特征,有效解决领域内类别不平衡问题。随后,通过从历史类别统计数据中进行平衡高斯采样生成伪特征来学习动态专家选择器。这种机制在保持先前领域的多样本知识和利用新数据提高早期任务中少样本类别性能之间找到平衡。对四个基准数据集的大量实验结果表明了DCE的最先进性能。

更新时间: 2025-07-09 17:57:07

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2507.07100v1

From Pseudorandomness to Multi-Group Fairness and Back

We identify and explore connections between the recent literature on multi-group fairness for prediction algorithms and the pseudorandomness notions of leakage-resilience and graph regularity. We frame our investigation using new variants of multicalibration based on statistical distance and closely related to the concept of outcome indistinguishability. Adopting this perspective leads us not only to new, more efficient algorithms for multicalibration, but also to our graph theoretic results and a proof of a novel hardcore lemma for real-valued functions.

Updated: 2025-07-09 17:56:58

标题: 从伪随机性到多组公平性,再返回

摘要: 我们识别并探索了近期关于预测算法多组公平性的文献与抗泄漏性、图正则性这两个伪随机性概念之间的联系。我们使用基于统计距离的多校准新变体来构建研究框架,该变体与结果不可区分性的概念密切相关。采用这一视角不仅为多校准带来了新的、更高效的算法,还得到了我们的图论结果,以及一个针对实值函数的新颖硬核引理的证明。

更新时间: 2025-07-09 17:56:58

领域: cs.LG,cs.CC

下载: http://arxiv.org/abs/2301.08837v4

Large-scale portfolio optimization with variational neural annealing

Portfolio optimization is a routine asset management operation conducted in financial institutions around the world. However, under real-world constraints such as turnover limits and transaction costs, its formulation becomes a mixed-integer nonlinear program that current mixed-integer optimizers often struggle to solve. We propose mapping this problem onto a classical Ising-like Hamiltonian and solving it with Variational Neural Annealing (VNA), via its classical formulation implemented using autoregressive neural networks. We demonstrate that VNA can identify near-optimal solutions for portfolios comprising more than 2,000 assets and yields performance comparable to that of state-of-the-art optimizers, such as Mosek, while exhibiting faster convergence on hard instances. Finally, we present a dynamical finite-size scaling analysis applied to the S&P 500, Russell 1000, and Russell 3000 indices, revealing universal behavior and polynomial annealing time scaling of the VNA algorithm on portfolio optimization problems.
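The Ising-like mapping can be illustrated on a toy cardinality-constrained portfolio: binary spins select assets, and the energy trades risk (covariance) against expected return, with a quadratic penalty enforcing "pick exactly k assets". The data, penalty weights, and brute-force solver below are all illustrative; VNA would instead sample low-energy configurations from an annealed autoregressive network.

```python
import numpy as np

# Toy Ising-like Hamiltonian for asset selection with s_i in {0, 1}.
rng = np.random.default_rng(4)
n, k, lam, penalty = 6, 3, 0.5, 10.0

mu = rng.normal(0.05, 0.02, size=n)   # expected returns (fabricated)
A = rng.normal(size=(n, n))
cov = A @ A.T / n                     # a valid covariance matrix

def energy(s):
    risk = s @ cov @ s                            # quadratic (Ising-like) coupling
    ret = mu @ s                                  # linear field term
    return risk - lam * ret + penalty * (s.sum() - k) ** 2

# Brute force over all 2^6 selections, feasible only at toy scale.
best = min((tuple(s) for s in np.ndindex(*(2,) * n)),
           key=lambda s: energy(np.array(s)))
```

At 2,000+ assets the brute-force search is hopeless, which is exactly where a variational sampler over the same energy becomes attractive.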

Updated: 2025-07-09 17:46:59

标题: 使用变分神经退火进行大规模投资组合优化

摘要: 投资组合优化是世界各地金融机构进行的常规资产管理操作。然而,在换手限制和交易成本等现实约束下,其形式化变成了一个混合整数非线性规划问题,目前的混合整数优化器往往难以求解。我们提出将该问题映射到一个经典的类伊辛哈密顿量上,并通过基于自回归神经网络实现的经典形式,用变分神经退火(VNA)来求解。我们证明VNA可以为包含超过2,000种资产的投资组合找到接近最优的解,其性能可与Mosek等最先进的优化器相媲美,同时在困难实例上表现出更快的收敛。最后,我们对标准普尔500、罗素1000和罗素3000指数应用动态有限尺寸标度分析,揭示了VNA算法在投资组合优化问题上的普适行为和多项式退火时间标度。

更新时间: 2025-07-09 17:46:59

领域: cond-mat.dis-nn,cond-mat.stat-mech,cs.LG,q-fin.PM

下载: http://arxiv.org/abs/2507.07159v1

XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

Speech codecs serve as bridges between speech signals and large language models. An ideal codec for speech language models should not only preserve acoustic information but also capture rich semantic information. However, existing speech codecs struggle to balance high-quality audio reconstruction with ease of modeling by language models. In this study, we analyze the limitations of previous codecs in balancing semantic richness and acoustic fidelity. We propose XY-Tokenizer, a novel codec that mitigates the conflict between semantic and acoustic capabilities through multi-stage, multi-task learning. Experimental results demonstrate that XY-Tokenizer achieves performance in both semantic and acoustic tasks comparable to that of state-of-the-art codecs operating at similar bitrates, even though those existing codecs typically excel in only one aspect. Specifically, XY-Tokenizer achieves strong text alignment, surpassing distillation-based semantic modeling methods such as SpeechTokenizer and Mimi, while maintaining a speaker similarity score of 0.83 between reconstructed and original audio. The reconstruction performance of XY-Tokenizer is comparable to that of BigCodec, the current state-of-the-art among acoustic-only codecs, which achieves a speaker similarity score of 0.84 at a similar bitrate. Code and models are available at https://github.com/gyt1145028706/XY-Tokenizer.

Updated: 2025-07-09 17:40:35

标题: XY-Tokenizer:在低比特率语音编解码器中缓解语义和声学冲突

摘要: 语音编解码器充当语音信号和大型语言模型之间的桥梁。理想的语音语言模型编解码器不仅应该保留声学信息,还应该捕捉丰富的语义信息。然而,现有的语音编解码器在平衡高质量音频重建和语言模型建模的便利性方面存在困难。在本研究中,我们分析了以前的编解码器在平衡语义丰富度和声学保真度方面的限制。我们提出了XY-Tokenizer,这是一种通过多阶段、多任务学习缓解语义和声学能力之间冲突的新型编解码器。实验结果表明,XY-Tokenizer在语义和声学任务上的表现与操作在类似比特率下的最先进编解码器相当,尽管这些现有编解码器通常只在一个方面表现出色。具体来说,XY-Tokenizer实现了强大的文本对齐,超过了基于蒸馏的语义建模方法,如SpeechTokenizer和Mimi,同时保持了0.83的重建和原始音频之间的讲话者相似性得分。XY-Tokenizer的重建性能与BigCodec相当,后者是声学编解码器中的最新技术,以类似比特率实现了0.84的讲话者相似性得分。代码和模型可在https://github.com/gyt1145028706/XY-Tokenizer找到。

更新时间: 2025-07-09 17:40:35

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2506.23325v2

Less can be more for predicting properties with large language models

Predicting properties from coordinate-category data -- sets of vectors paired with categorical information -- is fundamental to computational science. In materials science, this challenge manifests as predicting properties like formation energies or elastic moduli from crystal structures comprising atomic positions (vectors) and element types (categorical information). While large language models (LLMs) have increasingly been applied to such tasks, with researchers encoding structural data as text, optimal strategies for achieving reliable predictions remain elusive. Here, we report fundamental limitations in LLMs' ability to learn from coordinate information in coordinate-category data. Through systematic experiments using synthetic datasets with tunable coordinate and category contributions, combined with a comprehensive benchmarking framework (MatText) spanning multiple representations and model scales, we find that LLMs consistently fail to capture coordinate information while excelling at category patterns. This geometric blindness persists regardless of model size (up to 70B parameters), dataset scale (up to 2M structures), or text representation strategy. Our findings suggest immediate practical implications: for materials property prediction tasks dominated by structural effects, specialized geometric architectures consistently outperform LLMs by significant margins, as evidenced by a clear "GNN-LM wall" in performance benchmarks. Based on our analysis, we provide concrete guidelines for architecture selection in scientific machine learning, while highlighting the critical importance of understanding model inductive biases when tackling scientific prediction problems.
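A synthetic dataset with tunable coordinate vs. category contributions can be sketched as below: the target mixes a geometric term (pairwise distances) and a categorical term (a per-type lookup), weighted by `alpha`. The specific functional forms and the per-type energies are invented; the paper's datasets may differ, but the tunable-mixture idea is the same. Setting `alpha=1` yields a purely geometric target, the regime where the abstract reports LLMs fail.

```python
import numpy as np

# Sketch of a coordinate-category dataset with a tunable mixing weight alpha.
rng = np.random.default_rng(5)

def make_sample(alpha, n_atoms=4, n_types=3):
    coords = rng.normal(size=(n_atoms, 3))              # atomic positions (vectors)
    types = rng.integers(0, n_types, size=n_atoms)      # element types (categories)
    type_energy = np.array([0.1, 0.5, 0.9])             # hypothetical per-type term
    # Geometric contribution: sum of pairwise distances (each pair counted once).
    geom = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1).sum() / 2
    cat = type_energy[types].sum()                      # categorical contribution
    target = alpha * geom + (1 - alpha) * cat
    return coords, types, target

coords, types, y_geometric = make_sample(alpha=1.0)     # coordinate-only target
_, types2, y_categorical = make_sample(alpha=0.0)       # category-only target
```

Sweeping `alpha` between 0 and 1 then separates how much of a model's accuracy comes from category patterns versus genuine use of the coordinates.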

Updated: 2025-07-09 17:37:23

标题: 用大型语言模型预测属性时,更少可能会更好

摘要: 从坐标-类别数据中预测属性-一组与分类信息配对的向量-对于计算科学至关重要。在材料科学中,这一挑战表现为从包含原子位置(向量)和元素类型(分类信息)的晶体结构中预测形成能或弹性模量等属性。虽然大型语言模型(LLMs)越来越多地被应用于这些任务中,研究人员将结构数据编码为文本,但实现可靠预测的最佳策略仍然难以捉摸。在这里,我们报道了LLM在从坐标-类别数据中学习坐标信息方面的基本限制。通过使用可调的坐标和类别贡献的合成数据集进行系统实验,结合涵盖多种表示和模型规模的全面基准框架(MatText),我们发现LLMs在捕捉坐标信息方面始终失败,同时在类别模式方面表现出色。这种几何盲目性无论模型大小(高达70B参数)、数据集规模(高达2M结构)还是文本表示策略都持续存在。我们的发现表明了直接的实际意义:对于由结构效应主导的材料属性预测任务,专门的几何架构始终优于LLMs,并且在性能基准测试中存在明显的“GNN-LM壁”。基于我们的分析,我们为科学机器学习中的架构选择提供了具体指导,同时强调了在解决科学预测问题时理解模型归纳偏差的重要性。

更新时间: 2025-07-09 17:37:23

领域: cond-mat.mtrl-sci,cs.LG

下载: http://arxiv.org/abs/2406.17295v3

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Agentic AI systems, built upon large language models (LLMs) and deployed in multi-agent configurations, are redefining intelligence, autonomy, collaboration, and decision-making across enterprise and societal domains. This review presents a structured analysis of \textbf{Trust, Risk, and Security Management (TRiSM)} in the context of LLM-based Agentic Multi-Agent Systems (AMAS). We begin by examining the conceptual foundations of Agentic AI and highlight its architectural distinctions from traditional AI agents. We then adapt and extend the AI TRiSM framework for Agentic AI, structured around four key pillars: Explainability, ModelOps, Security, Privacy and Governance, each contextualized to the challenges of multi-agent LLM systems. A novel risk taxonomy is proposed to capture the unique threats and vulnerabilities of Agentic AI, ranging from coordination failures to prompt-based adversarial manipulation. To support practical assessment in Agentic AI works, we introduce two novel metrics: the Component Synergy Score (CSS), which quantifies the quality of inter-agent collaboration, and the Tool Utilization Efficacy (TUE), which evaluates the efficiency of tool use within agent workflows. We further discuss strategies for improving explainability in Agentic AI , as well as approaches to enhancing security and privacy through encryption, adversarial robustness, and regulatory compliance. The review concludes with a research roadmap for the responsible development and deployment of Agentic AI, outlining critical directions to align emerging systems with TRiSM principles for safe, transparent, and accountable operation.
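One of the proposed metrics can be illustrated on a hypothetical agent trace. The exact formulas for CSS and TUE are defined in the paper; the simple "successful and necessary calls over total calls" ratio below is a simplifying assumption for demonstration, and the trace fields are invented.

```python
# Illustrative Tool Utilization Efficacy (TUE) computation over a fake agent trace.
# The ratio form is an assumption, not the paper's exact definition.
trace = [
    {"tool": "search",     "succeeded": True,  "necessary": True},
    {"tool": "calculator", "succeeded": True,  "necessary": True},
    {"tool": "search",     "succeeded": False, "necessary": True},   # failed call
    {"tool": "browser",    "succeeded": True,  "necessary": False},  # redundant call
]

def tool_utilization_efficacy(calls):
    """Fraction of tool calls that were both successful and necessary."""
    useful = sum(c["succeeded"] and c["necessary"] for c in calls)
    return useful / len(calls)

tue = tool_utilization_efficacy(trace)   # 2 useful calls out of 4
```

A Component Synergy Score would be computed analogously over inter-agent hand-offs rather than tool calls, scoring how often one agent's output was actually consumed by another.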

Updated: 2025-07-09 17:33:49

标题: 面向Agentic AI的TRiSM:基于LLM的智能体多智能体系统中的信任、风险与安全管理综述

摘要: 主动型AI系统建立在大型语言模型(LLMs)之上,并以多智能体配置部署,正在重新定义企业和社会领域的智能、自主性、协作和决策。本综述在基于LLM的主动型多智能体系统(AMAS)的背景下,提出了对信任、风险和安全管理(TRiSM)的结构化分析。我们首先审视主动型AI的概念基础,并强调其与传统AI代理的架构区别。然后,我们针对主动型AI对AI TRiSM框架进行了调整和扩展,围绕四个关键支柱进行结构化:可解释性、模型运维(ModelOps)、安全性、隐私与治理,每个支柱都结合多智能体LLM系统的挑战加以阐述。我们提出了一种新的风险分类法,以捕捉主动型AI的独特威胁和漏洞,从协调失败到基于提示的对抗性操纵。为支持主动型AI工作的实际评估,我们引入了两种新的指标:组件协同得分(CSS),用于量化智能体之间协作的质量,以及工具利用效能(TUE),用于评估智能体工作流程中工具使用的效率。我们进一步讨论了改善主动型AI可解释性的策略,以及通过加密、对抗鲁棒性和监管合规来增强安全性和隐私性的方法。综述以主动型AI负责任发展和部署的研究路线图作结,概述了使新兴系统符合TRiSM原则、实现安全、透明和可问责运行的关键方向。

更新时间: 2025-07-09 17:33:49

领域: cs.AI

下载: http://arxiv.org/abs/2506.04133v3

Multi-Attribute Steering of Language Models via Targeted Intervention

Inference-time intervention (ITI) has emerged as a promising method for steering large language model (LLM) behavior in a particular direction (e.g., improving helpfulness) by intervening on token representations without costly updates to the LLM's parameters. However, existing ITI approaches fail to scale to multi-attribute settings with conflicts, such as enhancing helpfulness while also reducing toxicity. To address this, we introduce Multi-Attribute Targeted Steering (MAT-Steer), a novel steering framework designed for selective token-level intervention across multiple attributes. MAT-Steer learns steering vectors using an alignment objective that shifts the model's internal representations of undesirable outputs closer to those of desirable ones while enforcing sparsity and orthogonality among vectors for different attributes, thereby reducing inter-attribute conflicts. We evaluate MAT-Steer in two distinct settings: (i) on question answering (QA) tasks where we balance attributes like truthfulness, bias, and toxicity; (ii) on generative tasks where we simultaneously improve attributes like helpfulness, correctness, and coherence. MAT-Steer outperforms existing ITI and parameter-efficient fine-tuning approaches across both task types (e.g., 3% average accuracy gain across QA tasks and 55.82% win rate against the best ITI baseline).
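The sparsity and orthogonality terms placed on the per-attribute steering vectors can be sketched as a simple regularizer: an L1 term encourages sparsity, and a pairwise inner-product term pushes vectors for different attributes toward orthogonality, reducing inter-attribute conflict. The weights and vector shapes are illustrative, and this omits the alignment objective itself.

```python
import numpy as np

# Sketch of the sparsity + orthogonality regularizer on steering vectors.
rng = np.random.default_rng(6)
V = rng.normal(size=(3, 16))   # one steering vector per attribute (3 attributes)

def regularizer(V, l1=0.01, ortho=0.1):
    sparsity = np.abs(V).sum()                 # L1 term: encourages sparse vectors
    overlap = 0.0
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            overlap += np.abs(V[i] @ V[j])     # penalize pairwise non-orthogonality
    return l1 * sparsity + ortho * overlap

V_orth = np.eye(3, 16)                         # orthogonal rows: zero overlap penalty
penalty_orth = regularizer(V_orth)
penalty_rand = regularizer(V)                  # random vectors pay for their overlap
```

Minimizing such a term alongside the alignment loss is what keeps, say, a helpfulness vector from inadvertently steering along the toxicity direction.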

Updated: 2025-07-09 17:31:20

Fields: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2502.12446v2

An AI Approach for Learning the Spectrum of the Laplace-Beltrami Operator

The spectrum of the Laplace-Beltrami (LB) operator is central in geometric deep learning tasks, capturing intrinsic properties of the shape of the object under consideration. The best established method for its estimation, from a triangulated mesh of the object, is based on the Finite Element Method (FEM), and computes the top k LB eigenvalues with a complexity of O(Nk), where N is the number of points. This can render the FEM method inefficient when repeatedly applied to databases of CAD mechanical parts, or in quality control applications where part metrology is acquired as large meshes and decisions about the quality of each part are needed quickly and frequently. As a solution to this problem, we present a geometric deep learning framework to predict the LB spectrum efficiently given the CAD mesh of a part, achieving significant computational savings without sacrificing accuracy, demonstrating that the LB spectrum is learnable. The proposed Graph Neural Network architecture uses a rich set of part mesh features - including Gaussian curvature, mean curvature, and principal curvatures. In addition to our trained network, we make available, for repeatability, a large curated dataset of real-world mechanical CAD models derived from the publicly available ABC dataset used for training and testing. Experimental results show that our method reduces computation time of the LB spectrum by approximately 5 times over linear FEM while delivering competitive accuracy.
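
As a toy analogue of the spectrum being estimated, the combinatorial graph Laplacian L = D - A of a mesh's vertex graph is the simplest discrete relative of the LB operator. The sketch below (an illustration, not the paper's FEM baseline or GNN) recovers the largest eigenvalue of a single triangle's Laplacian by power iteration:

```python
# Toy illustration: build the graph Laplacian L = D - A of a mesh's vertex
# graph and recover its top eigenvalue by power iteration. The FEM method in
# the abstract instead targets the smallest k eigenvalues with O(Nk) cost.

def laplacian(n, edges):
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def top_eigenvalue(M, v, iters=50):
    for _ in range(iters):
        w = matvec(M, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    w = matvec(M, v)
    return sum(a * b for a, b in zip(w, v))  # Rayleigh quotient

# One triangle face -> a 3-cycle graph; its Laplacian spectrum is {0, 3, 3}.
L = laplacian(3, [(0, 1), (1, 2), (0, 2)])
lam = top_eigenvalue(L, [1.0, 0.0, 0.0])
```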

Updated: 2025-07-09 17:31:18

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.07073v1

How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks

Training effective artificial intelligence models for telecommunications is challenging due to the scarcity of deployment-specific data. Real data collection is expensive, and available datasets often fail to capture the unique operational conditions and contextual variability of the network environment. Digital twinning provides a potential solution to this problem, as simulators tailored to the current network deployment can generate site-specific data to augment the available training datasets. However, there is a need to develop solutions to bridge the inherent simulation-to-reality (sim-to-real) gap between synthetic and real-world data. This paper reviews recent advances on two complementary strategies: 1) the calibration of digital twins (DTs) through real-world measurements, and 2) the use of sim-to-real gap-aware training strategies to robustly handle residual discrepancies between digital twin-generated and real data. For the latter, we evaluate two conceptually distinct methods that model the sim-to-real gap either at the level of the environment via Bayesian learning or at the level of the training loss via prediction-powered inference.

Updated: 2025-07-09 17:27:51

Fields: eess.SP,cs.LG

Download: http://arxiv.org/abs/2507.07067v1

Integer Factorization: Another perspective

Integer factorization is a fundamental problem in algorithmic number theory and computer science. It is considered a one-way or trapdoor function in the (RSA) cryptosystem. To date, from elementary trial division to sophisticated methods like the General Number Field Sieve, no known classical algorithm can solve the problem in polynomial time, while Shor's algorithm provably can on a quantum computer. In this paper, we recall some factorization algorithms and then approach the problem from different angles. Firstly, we take the problem from the ring $\displaystyle\left(\mathbb{Z}, \text{+}, \cdot\right)$ to the Lebesgue space $\mathcal{L}^{1}\left(X\right)$, where $X$ can be $\mathbb{Q}$ or any given interval. From this first perspective, integer factorization becomes equivalent to finding the perimeter of a rectangle whose area is known. In this case, it is equivalent to either finding bounds of integrals or finding primitives for some given bounds. Secondly, we take the problem from the ring $\displaystyle\left(\mathbb{Z}, \text{+}, \cdot\right)$ to the ring of matrices $\left( M_{2}\text{(}\mathbb{Z}\text{)}, \text{+}, \cdot\right)$ and show that this problem is equivalent to matrix decomposition; we therefore present some possible computing algorithms, particularly using Gr\"obner bases and matrix diagonalization. Finally, we address the problem depending on the algebraic forms of the factors and show that it is equivalent to finding small roots of a bivariate polynomial through Coppersmith's method. The aim of this study is to propose innovative methodological approaches to reformulate this problem, thereby offering new perspectives.
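
The rectangle equivalence admits a short worked example: if the area N = pq is known and the semiperimeter s = p + q were also known, then p and q are the roots of x^2 - sx + N = 0, so factoring reduces to finding the perimeter. A minimal sketch with hypothetical small primes:

```python
# Worked sketch of the rectangle view: recover the sides p, q of a rectangle
# from its area N = p*q and semiperimeter s = p + q via the quadratic
# x^2 - s*x + N = 0, whose discriminant is (p - q)^2.
import math

def factors_from_semiperimeter(N, s):
    """Recover p, q from area N = p*q and semiperimeter s = p + q."""
    disc = s * s - 4 * N          # (p - q)^2
    root = math.isqrt(disc)
    assert root * root == disc, "s is not a valid semiperimeter for N"
    return (s - root) // 2, (s + root) // 2

# Toy example with the hypothetical primes p = 101, q = 103:
p, q = factors_from_semiperimeter(101 * 103, 101 + 103)
```

The hardness of factoring is exactly the hardness of obtaining s; given both invariants, the recovery above is immediate.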

Updated: 2025-07-09 17:20:49

Fields: math.NT,cs.CR

Download: http://arxiv.org/abs/2507.07055v1

LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits

Reward Models (RMs) are crucial to aligning large language models (LLMs), but the degree to which an RM specialized to one task (e.g. writing) generalizes to new tasks (e.g. math) is often not known a priori, which often makes using only one fixed RM to train LLMs suboptimal. However, optimizing LLMs with multiple RMs simultaneously can incur a prohibitively high computational cost and lead to conflicting signals from different RMs that may degrade performance. To address these challenges, we introduce LASeR (Learning to Adaptively Select Rewards), which frames reward model selection as a multi-armed bandit problem, efficiently and iteratively training LLMs with multiple RMs by selecting the most well-suited RM for each instance. On commonsense and math reasoning tasks, we show that LASeR boosts iterative LLM training, improving the absolute average accuracy of Llama-3-8B over three datasets by 2.67% over an ensemble of RM scores while also showing superior efficiency (e.g., a 2x speedup). Moreover, on WildChat (open-ended instruction-following tasks), LASeR leads to a 72.69% AlpacaEval win rate over the RM score ensemble baseline. Extending to long-context generation, LASeR improves by 2.96 F1 points (avg.) on single-document QA tasks and 2.97 F1 points on few-shot learning over the RM score ensemble baseline with best-of-n sampling.
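
The bandit framing can be illustrated with the textbook UCB1 rule (a generic sketch of the idea, not the LASeR algorithm): each arm is a reward model, and the payoff here is a hypothetical per-RM usefulness signal:

```python
# UCB1 sketch of bandit-based reward-model selection: pull the arm (RM)
# maximizing empirical mean payoff plus an exploration bonus, then update
# the running mean with the observed payoff.
import math

def ucb_select(counts, values, t):
    """Pick the arm maximizing mean value + exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:           # play every arm once first
            return arm
    return max(range(len(counts)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

def update(counts, values, arm, reward):
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # running mean

counts, values = [0, 0, 0], [0.0, 0.0, 0.0]
payoff = [0.2, 0.9, 0.4]   # hypothetical usefulness of each reward model
for t in range(1, 201):
    arm = ucb_select(counts, values, t)
    update(counts, values, arm, payoff[arm])
```

Over time the selection concentrates on the most useful RM while still occasionally probing the others, which is the mechanism the abstract credits for avoiding both the cost of querying all RMs and the conflicts between them.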

Updated: 2025-07-09 17:19:50

Fields: cs.CL,cs.LG

Download: http://arxiv.org/abs/2410.01735v2

Interpretable EEG-to-Image Generation with Semantic Prompts

Decoding visual experience from brain signals offers exciting possibilities for neuroscience and interpretable AI. While EEG is accessible and temporally precise, its limitations in spatial detail hinder image reconstruction. Our model bypasses direct EEG-to-image generation by aligning EEG signals with multilevel semantic captions -- ranging from object-level to abstract themes -- generated by a large language model. A transformer-based EEG encoder maps brain activity to these captions through contrastive learning. During inference, caption embeddings retrieved via projection heads condition a pretrained latent diffusion model for image generation. This text-mediated framework yields state-of-the-art visual decoding on the EEGCVPR dataset, with interpretable alignment to known neurocognitive pathways. Dominant EEG-caption associations reflected the importance of different semantic levels extracted from perceived images. Saliency maps and t-SNE projections reveal semantic topography across the scalp. Our model demonstrates how structured semantic mediation enables cognitively aligned visual decoding from EEG.

Updated: 2025-07-09 17:18:06

Fields: cs.CV,cs.LG,eess.SP

Download: http://arxiv.org/abs/2507.07157v1

ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

We aim to develop a goal specification method that is semantically clear, spatially sensitive, domain-agnostic, and intuitive for human users to guide agent interactions in 3D environments. Specifically, we propose a novel cross-view goal alignment framework that allows users to specify target objects using segmentation masks from their own camera views rather than the agent's observations. We highlight that behavior cloning alone fails to align the agent's behavior with human intent when the human and agent camera views differ significantly. To address this, we introduce two auxiliary objectives: a cross-view consistency loss and a target visibility loss, which explicitly enhance the agent's spatial reasoning ability. Building on these, we develop ROCKET-2, a state-of-the-art agent trained in Minecraft that achieves a 3x to 6x improvement in inference efficiency compared to ROCKET-1. We show that ROCKET-2 can directly interpret goals from human camera views, enabling better human-agent interaction. Remarkably, ROCKET-2 demonstrates zero-shot generalization capabilities: despite being trained exclusively on the Minecraft dataset, it can adapt and generalize to other 3D environments like Doom, DMLab, and Unreal through a simple action space mapping.

Updated: 2025-07-09 17:13:26

Fields: cs.AI,cs.CV,cs.LG,cs.RO

Download: http://arxiv.org/abs/2503.02505v2

Geometry-Informed Neural Operator Transformer

Machine-learning-based surrogate models offer significant computational efficiency and faster simulations compared to traditional numerical methods, especially for problems requiring repeated evaluations of partial differential equations. This work introduces the Geometry-Informed Neural Operator Transformer (GINOT), which integrates the transformer architecture with the neural operator framework to enable forward predictions on arbitrary geometries. GINOT employs a sampling and grouping strategy together with an attention mechanism to encode surface point clouds that are unordered, exhibit non-uniform point densities, and contain varying numbers of points for different geometries. The geometry information is seamlessly integrated with query points in the solution decoder through the attention mechanism. The performance of GINOT is validated on multiple challenging datasets, showcasing its high accuracy and strong generalization capabilities for complex and arbitrary 2D and 3D geometries.

Updated: 2025-07-09 17:13:05

Fields: cs.LG,physics.comp-ph

Download: http://arxiv.org/abs/2504.19452v4

Low-Rank Adaptation Secretly Imitates Differentially Private SGD

As pre-trained language models grow in size, fully fine-tuning their parameters on task adaptation data becomes increasingly impractical. To address this challenge, methods for low-rank adaptation of language models have been proposed, e.g. LoRA, which incorporates trainable low-rank decomposition matrices into only some parameters of the pre-trained model, called adapters. This approach significantly reduces the number of trainable parameters compared to fine-tuning all parameters or adapters. In this work, we look at the low-rank adaptation method through the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA is equivalent to fine-tuning adapters with noisy batch gradients - just like the DPSGD algorithm does. We also quantify the variance of the injected noise as a decreasing function of the adaptation rank. By establishing a Berry-Esseen type bound on the total variation distance between the injected noise distribution and a Gaussian noise distribution with the same variance, we show that the dynamics of low-rank adaptation are very close to those of DPSGD performed on the adapters. Following our theoretical findings and supported by our experimental results, we show that low-rank adaptation provides robustness to membership inference attacks on the fine-tuning data.
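
The paper's viewpoint - that the gradient component discarded by a rank-r update behaves like injected noise whose magnitude shrinks as the rank grows - can be illustrated with a deterministic toy gradient. The matrices and weights below are hypothetical and the sketch only shows the shrinking-residual intuition, not the paper's analysis:

```python
# Toy illustration: a "full gradient" built from orthonormal rank-1 pieces
# with decaying weights; a rank-r update keeps only the top r pieces, and
# the norm of what it throws away (the noise-like residual) shrinks with r.

def outer(u, v):
    return [[a * b for b in v] for a in u]

def add(M, N):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M, N)]

def frob(M):
    return sum(x * x for row in M for x in row) ** 0.5

basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # hypothetical directions
weights = [3.0, 2.0, 1.0]                   # hypothetical singular values
full = [[0.0] * 3 for _ in range(3)]
for w, b in zip(weights, basis):
    full = add(full, [[w * x for x in row] for row in outer(b, b)])

def residual_norm(rank):
    """Norm of the gradient component a rank-`rank` update throws away."""
    kept = [[0.0] * 3 for _ in range(3)]
    for w, b in zip(weights[:rank], basis[:rank]):
        kept = add(kept, [[w * x for x in row] for row in outer(b, b)])
    diff = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(full, kept)]
    return frob(diff)
```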

Updated: 2025-07-09 17:11:15

Fields: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.17538v7

A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering

Nowadays, speech emotion recognition (SER) plays a vital role in the field of human-computer interaction (HCI) and the evolution of artificial intelligence (AI). Our proposed DCRF-BiLSTM model recognizes seven emotions: neutral, happy, sad, angry, fear, disgust, and surprise; it is trained on five datasets: RAVDESS (R), TESS (T), SAVEE (S), EmoDB (E), and Crema-D (C). The model achieves high accuracy on individual datasets, including 97.83% on RAVDESS, 97.02% on SAVEE, 95.10% on CREMA-D, and a perfect 100% on both TESS and EMO-DB. For the combined (R+T+S) datasets, it achieves 98.82% accuracy, outperforming previously reported results. To our knowledge, no existing study has evaluated a single SER model across all five benchmark datasets (i.e., R+T+S+C+E) simultaneously. In our work, we introduce this comprehensive combination and achieve a remarkable overall accuracy of 93.76%. These results confirm the robustness and generalizability of our DCRF-BiLSTM framework across diverse datasets.

Updated: 2025-07-09 17:07:45

Fields: cs.SD,cs.AI,cs.LG,eess.AS

Download: http://arxiv.org/abs/2507.07046v1

Non-Asymptotic Analysis of Online Local Private Learning with SGD

Differentially Private Stochastic Gradient Descent (DP-SGD) has been widely used for solving optimization problems with privacy guarantees in machine learning and statistics. Despite this, a systematic non-asymptotic convergence analysis for DP-SGD, particularly in the context of online problems and local differential privacy (LDP) models, remains largely elusive. Existing non-asymptotic analyses have focused on non-private optimization methods, and hence are not applicable to privacy-preserving optimization problems. This work initiates the analysis to bridge this gap and opens the door to non-asymptotic convergence analysis of private optimization problems. A general framework is investigated for the online LDP model in stochastic optimization problems. We assume that sensitive information from individuals is collected sequentially and aim to estimate, in real-time, a static parameter that pertains to the population of interest. Most importantly, we conduct a comprehensive non-asymptotic convergence analysis of the proposed estimators in finite-sample situations, which gives their users practical guidelines regarding the effect of various hyperparameters, such as step size, parameter dimensions, and privacy budgets, on convergence rates. Our proposed estimators are validated in the theoretical and practical realms by rigorous mathematical derivations and carefully constructed numerical experiments.
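
For reference, one step of textbook DP-SGD - clip each per-sample gradient to norm C, add Gaussian noise scaled by the noise multiplier, average, and take a gradient step - can be sketched as follows. The gradients and hyperparameters are hypothetical, and this is the generic algorithm rather than the paper's online LDP estimator:

```python
# One textbook DP-SGD step: per-sample gradient clipping to norm C, Gaussian
# noise calibrated to sigma * C, then an averaged SGD update.
import random

def clip(g, C):
    norm = sum(x * x for x in g) ** 0.5
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [x * scale for x in g]

def dpsgd_step(theta, per_sample_grads, C=1.0, sigma=0.5, lr=0.1, rng=None):
    rng = rng or random.Random(0)   # fixed seed for a reproducible sketch
    clipped = [clip(g, C) for g in per_sample_grads]
    n, d = len(clipped), len(theta)
    noisy = [sum(g[j] for g in clipped) + sigma * C * rng.gauss(0, 1)
             for j in range(d)]
    return [t - lr * s / n for t, s in zip(theta, noisy)]

# Hypothetical per-sample gradients for a 2-parameter model:
theta = dpsgd_step([0.0, 0.0], [[3.0, 4.0], [0.6, 0.8]])
```

The clipping bound C caps each individual's influence on the update, and sigma trades privacy budget against convergence speed - exactly the hyperparameters whose effect the abstract's non-asymptotic analysis quantifies.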

Updated: 2025-07-09 17:06:01

Fields: stat.ME,cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.07041v1

Surrogate Model for Heat Transfer Prediction in Impinging Jet Arrays using Dynamic Inlet/Outlet and Flow Rate Control

This study presents a surrogate model designed to predict the Nusselt number distribution in enclosed impinging jet arrays, where each jet functions independently and where jets can be transformed from inlets to outlets, leading to a vast number of possible flow arrangements. While computational fluid dynamics (CFD) simulations can model heat transfer with high fidelity, their cost prohibits real-time applications such as model-based temperature control. To address this, we develop a CNN-based surrogate model that can predict the Nusselt distribution in real time. We train it with data from implicit large-eddy computational fluid dynamics simulations (Re < 2,000). We train two distinct models, one for a five-by-one array of jets (83 simulations) and one for a three-by-three array of jets (100 simulations). We introduce a method to extrapolate predictions to higher Reynolds numbers (Re < 10,000) using correlation-based scaling. The surrogate models achieve high accuracy, with a normalized mean absolute error below 2% on validation data for the five-by-one surrogate model and 0.6% for the three-by-three surrogate model. Experimental validation confirms the model's predictive capabilities. This work provides a foundation for model-based control strategies in advanced thermal management applications.

Updated: 2025-07-09 17:03:54

Fields: physics.flu-dyn,cs.AI

Download: http://arxiv.org/abs/2507.07034v1

Self-Supervised Learning at the Edge: The Cost of Labeling

Contrastive learning (CL) has recently emerged as an alternative to traditional supervised machine learning solutions by enabling rich representations from unstructured and unlabeled data. However, CL and, more broadly, self-supervised learning (SSL) methods often demand a large amount of data and computational resources, posing challenges for deployment on resource-constrained edge devices. In this work, we explore the feasibility and efficiency of SSL techniques for edge-based learning, focusing on trade-offs between model performance and energy efficiency. In particular, we analyze how different SSL techniques adapt to limited computational, data, and energy budgets, evaluating their effectiveness in learning robust representations under resource-constrained settings. Moreover, we also consider the energy costs involved in labeling data and assess how semi-supervised learning may assist in reducing the overall energy consumed to train CL models. Through extensive experiments, we demonstrate that tailored SSL strategies can achieve competitive performance while reducing resource consumption by up to 4X, underscoring their potential for energy-efficient learning at the edge.
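
The label-free objective at the heart of CL can be sketched with an InfoNCE-style loss (a generic illustration, not the paper's training setup): two augmented views of a sample form the positive pair, and other samples act as negatives:

```python
# InfoNCE-style contrastive loss sketch: the loss is the negative log
# probability of the positive pair under a softmax over cosine similarities.
import math

def info_nce(anchor, positive, negatives, tau=0.5):
    def sim(u, v):  # cosine similarity
        dot = sum(a * b for a, b in zip(u, v))
        nu = sum(a * a for a in u) ** 0.5
        nv = sum(b * b for b in v) ** 0.5
        return dot / (nu * nv)
    logits = [sim(anchor, positive) / tau] + [sim(anchor, n) / tau for n in negatives]
    m = max(logits)  # stabilize the log-sum-exp
    return -(logits[0] - m) + math.log(sum(math.exp(l - m) for l in logits))

# Hypothetical 2-d embeddings: loss is low when the positive view is close
# to the anchor and the negatives are far, and high otherwise.
good = info_nce([1, 0], [0.9, 0.1], [[-1, 0], [0, 1]])
bad = info_nce([1, 0], [0, 1], [[0.9, 0.1], [-1, 0]])
```

Because the positive pair comes from augmentation rather than annotation, no labeling energy is spent - which is the cost trade-off the abstract weighs against semi-supervised alternatives.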

Updated: 2025-07-09 17:03:50

Fields: cs.LG,eess.SP

Download: http://arxiv.org/abs/2507.07033v1

ZKTorch: Compiling ML Inference to Zero-Knowledge Proofs via Parallel Proof Accumulation

As AI models become ubiquitous in our daily lives, there has been an increasing demand for transparency in ML services. However, the model owner does not want to reveal the weights, as they are considered trade secrets. To solve this problem, researchers have turned to zero-knowledge proofs of ML model inference. These proofs convince the user that the ML model output is correct, without revealing the weights of the model to the user. Past work on these provers can be placed into two categories. The first method compiles the ML model into a low-level circuit, and proves the circuit using a ZK-SNARK. The second method uses custom cryptographic protocols designed only for a specific class of models. Unfortunately, the first method is highly inefficient, making it impractical for the large models used today, and the second method does not generalize well, making it difficult to update in the rapidly changing field of machine learning. To solve this, we propose ZKTorch, an open source end-to-end proving system that compiles ML models into base cryptographic operations called basic blocks, each proved using specialized protocols. ZKTorch is built on top of a novel parallel extension to the Mira accumulation scheme, enabling succinct proofs with minimal accumulation overhead. These contributions allow ZKTorch to achieve at least a $3\times$ reduction in the proof size compared to specialized protocols and up to a $6\times$ speedup in proving time over a general-purpose ZKML framework.

Updated: 2025-07-09 17:03:21

Fields: cs.CR,cs.LG

Download: http://arxiv.org/abs/2507.07031v1

Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices

This paper presents the design and development of an OCR-powered pipeline for efficient table extraction from invoices. The system leverages Tesseract OCR for text recognition and custom post-processing logic to detect, align, and extract structured tabular data from scanned invoice documents. Our approach includes dynamic preprocessing, table boundary detection, and row-column mapping, optimized for noisy and non-standard invoice formats. The resulting pipeline significantly improves data extraction accuracy and consistency, supporting real-world use cases such as automated financial workflows and digital archiving.
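
The row-column mapping stage can be sketched as grouping OCR word boxes by vertical proximity and then sorting left-to-right within each row. The box format and tolerance below are hypothetical stand-ins for the pipeline's own post-processing logic:

```python
# Sketch of row/column mapping for table extraction: OCR word boxes
# (x, y, text) are clustered into rows by y-coordinate proximity, then each
# row is sorted by x to recover column order.

def boxes_to_rows(boxes, y_tol=5):
    rows = []
    for x, y, text in sorted(boxes, key=lambda b: b[1]):   # top to bottom
        if rows and abs(rows[-1][0][1] - y) <= y_tol:      # same visual row
            rows[-1].append((x, y, text))
        else:
            rows.append([(x, y, text)])
    return [[t for _, _, t in sorted(r)] for r in rows]    # left to right

# Hypothetical word boxes from a two-column invoice table:
boxes = [(120, 11, "Qty"), (10, 10, "Item"), (10, 42, "Widget"), (121, 40, "3")]
table = boxes_to_rows(boxes)
```

The y-tolerance is what absorbs the slight vertical jitter of noisy scans; a real pipeline would tune it (or derive it from line height) per document.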

Updated: 2025-07-09 16:59:00

Fields: cs.CV,cs.AI,I.2.10; I.4.9; H.3.1

Download: http://arxiv.org/abs/2507.07029v1

Scaling 4D Representations

Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks - action classification, ImageNet classification, etc. In this paper we focus on evaluating self-supervised learning on non-semantic vision tasks that are more spatial (3D) and temporal (+1D = 4D), such as camera pose estimation, point and object tracking, and depth estimation. We show that by learning from very large video datasets, masked auto-encoding (MAE) with transformer video models actually scales, consistently improving performance on these 4D tasks as model size increases from 20M all the way to 22B parameters - by far the largest self-supervised video model reported to date. Rigorous apples-to-apples comparison with many recent image and video models demonstrates the benefits of scaling 4D representations. Pretrained models are available at https://github.com/google-deepmind/representations4d .

Updated: 2025-07-09 16:58:07

Fields: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2412.15212v2

FlexOlmo: Open Language Models for Flexible Data Use

We introduce FlexOlmo, a new class of language models (LMs) that supports (1) distributed training without data sharing, where different model parameters are independently trained on closed datasets, and (2) data-flexible inference, where these parameters along with their associated data can be flexibly included or excluded from model inferences with no further training. FlexOlmo employs a mixture-of-experts (MoE) architecture where each expert is trained independently on closed datasets and later integrated through a new domain-informed routing without any joint training. FlexOlmo is trained on FlexMix, a corpus we curate comprising publicly available datasets alongside seven domain-specific sets, representing realistic approximations of closed sets. We evaluate models with up to 37 billion parameters (20 billion active) on 31 diverse downstream tasks. We show that a general expert trained on public data can be effectively combined with independently trained experts from other data owners, leading to an average 41% relative improvement while allowing users to opt out of certain data based on data licensing or permission requirements. Our approach also outperforms prior model merging methods by 10.1% on average and surpasses the standard MoE trained without data restrictions using the same training FLOPs. Altogether, this research presents a solution for both data owners and researchers in regulated industries with sensitive or protected data. FlexOlmo enables benefiting from closed data while respecting data owners' preferences by keeping their data local and supporting fine-grained control of data access during inference.
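
Data-flexible inference can be illustrated by masking experts out of the router's softmax at inference time. This is a generic mixture-of-experts sketch with hypothetical expert names and scores, not FlexOlmo's domain-informed router:

```python
# Generic MoE routing sketch with opt-out: excluded experts are dropped from
# the softmax, so the remaining experts' weights renormalize automatically
# and no retraining is needed.
import math

def route(scores, active):
    """Softmax over router scores, restricted to the active expert set."""
    masked = {e: scores[e] for e in active}
    m = max(masked.values())   # stabilize the softmax
    z = sum(math.exp(s - m) for s in masked.values())
    return {e: math.exp(s - m) / z for e, s in masked.items()}

# Hypothetical experts and router scores:
scores = {"public": 1.0, "news": 2.0, "code": 0.5}
full = route(scores, {"public", "news", "code"})
opt_out = route(scores, {"public", "code"})   # "news" data owner opted out
```

Removing an expert only rescales the others' mixture weights, which is why inclusion and exclusion can happen per inference call with no further training.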

Updated: 2025-07-09 16:54:21

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.07024v1

Topological Machine Learning with Unreduced Persistence Diagrams

Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram. Computing persistence diagrams is often the most computationally demanding step in such a pipeline, however. To explore this, we introduce several methods to generate topological feature vectors from unreduced boundary matrices. We compared the performance of pipelines trained on vectorizations of unreduced PDs to vectorizations of fully-reduced PDs across several data and task types. Our results indicate that models trained on PDs built from unreduced diagrams can perform on par and even outperform those trained on fully-reduced diagrams on some tasks. This observation suggests that machine learning pipelines which incorporate topology-based features may benefit in terms of computational cost and performance by utilizing information contained in unreduced boundary matrices.
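
For context, the reduction step whose cost the paper sidesteps is the standard Z/2 column algorithm that turns a boundary matrix into persistence pairs. The sketch below runs it on a filtered triangle (a textbook example, not the paper's pipeline); the paper's point is that useful features can be read from the matrix before this reduction is performed:

```python
# Standard persistence reduction over Z/2: repeatedly add earlier columns
# until every nonzero column has a unique lowest row index ("low"). Pairs
# (low(j), j) are the birth-death pairs of the persistence diagram.

def low(col):
    return max(col) if col else None

def reduce_boundary(columns):
    """columns[j] is the set of row indices with a 1 in boundary column j."""
    cols = [set(c) for c in columns]
    lows = {}
    for j in range(len(cols)):
        while cols[j] and low(cols[j]) in lows:
            cols[j] ^= cols[lows[low(cols[j])]]   # Z/2 column addition
        if cols[j]:
            lows[low(cols[j])] = j
    return cols

# Filtered triangle: vertices 0,1,2, then edges 3=(0,1), 4=(1,2), 5=(0,2),
# then the face 6 = (3,4,5). Column j lists the boundary of simplex j.
reduced = reduce_boundary([set(), set(), set(),
                           {0, 1}, {1, 2}, {0, 2},
                           {3, 4, 5}])
```

In this example edge 5 reduces to zero (it closes a 1-cycle) and the face's column pairs with it, recording that the loop born at step 5 dies at step 6.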

Updated: 2025-07-09 16:49:11

Fields: stat.ML,cs.CG,cs.LG,math.AT,55N31

Download: http://arxiv.org/abs/2507.07156v1

Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics

We evaluate 9 Retrieval-Augmented Generation (RAG) agent configurations on 105 Cosmology Question-Answer (QA) pairs that we built specifically for this purpose. The RAG configurations are manually evaluated by a human expert; that is, a total of 945 generated answers were assessed. We find that the current best RAG agent configuration uses OpenAI embedding and generative models, yielding 91.4% accuracy. Using our human evaluation results, we calibrate an LLM-as-a-Judge (LLMaaJ) system which can be used as a robust proxy for human evaluation. These results allow us to systematically select the best RAG agent configuration for a multi-agent system for autonomous scientific discovery in astrophysics (e.g., cmbagent, presented in a companion paper) and provide us with an LLMaaJ system that can be scaled to thousands of cosmology QA pairs. We make our QA dataset, human evaluation results, RAG pipelines, and LLMaaJ system publicly available for further use by the astrophysics community.

Updated: 2025-07-09 16:46:03

Categories: astro-ph.IM,astro-ph.CO,cs.AI

Download: http://arxiv.org/abs/2507.07155v1

First Return, Entropy-Eliciting Explore

Reinforcement Learning from Verifiable Rewards (RLVR) improves the reasoning abilities of Large Language Models (LLMs), but it struggles with unstable exploration. We propose FR3E (First Return, Entropy-Eliciting Explore), a structured exploration framework that identifies high-uncertainty decision points in reasoning trajectories and performs targeted rollouts to construct semantically grounded intermediate feedback. Our method provides targeted guidance without relying on dense supervision. Empirical results on mathematical reasoning benchmarks (AIME24) show that FR3E promotes more stable training, produces longer and more coherent responses, and increases the proportion of fully correct trajectories. These results highlight the framework's effectiveness in improving LLM reasoning through more robust and structured exploration.
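One natural way to make "high-uncertainty decision points" concrete is to rank trajectory positions by the entropy of the next-token distribution and branch rollouts from the top-ranked ones. The sketch below is an illustrative guess at that selection step, not FR3E's actual criterion:

```python
import numpy as np

def token_entropies(logits):
    """Shannon entropy (nats) of the next-token distribution at each step."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def high_uncertainty_points(logits, k=2):
    """Indices of the k highest-entropy steps -- candidate branch points."""
    H = token_entropies(logits)
    return np.argsort(H)[::-1][:k].tolist()

# Toy trajectory of 4 steps over a 3-token vocabulary.
logits = np.array([
    [5.0, 0.0, 0.0],   # near-deterministic step
    [1.0, 1.0, 1.0],   # uniform -> maximal entropy
    [3.0, 0.5, 0.0],
    [2.0, 2.0, 0.0],   # two-way tie -> high entropy
])
pts = high_uncertainty_points(logits, k=2)
```

Targeted rollouts would then restart generation from positions `pts` to gather intermediate feedback where the model is least certain.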

Updated: 2025-07-09 16:45:48

Categories: cs.AI

Download: http://arxiv.org/abs/2507.07017v1

On-Device Training of PV Power Forecasting Models in a Smart Meter for Grid Edge Intelligence

In this paper, an edge-side model training study is conducted on a resource-limited smart meter. The motivation of grid-edge intelligence and the concept of on-device training are introduced. Then, the technical preparation steps for on-device training are described. A case study on the task of photovoltaic power forecasting is presented, where two representative machine learning models are investigated: a gradient boosting tree model and a recurrent neural network model. To adapt to the resource-limited situation in the smart meter, "mixed"- and "reduced"-precision training schemes are also devised. Experiment results demonstrate the feasibility of economically achieving grid-edge intelligence via the existing advanced metering infrastructures.
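The "reduced"-precision idea can be illustrated with a single SGD update executed in float16, which cuts memory and arithmetic cost on a constrained device. This is a generic sketch of the scheme under assumed details, not the paper's training loop:

```python
import numpy as np

def sgd_step_fp16(w, grad, lr):
    """One SGD update computed in float16, with the result stored back in
    float32 -- a minimal reduced-precision training step."""
    w16 = w.astype(np.float16)
    g16 = grad.astype(np.float16)
    return (w16 - np.float16(lr) * g16).astype(np.float32)

w = np.array([1.0, -2.0], dtype=np.float32)
g = np.array([0.5, 0.25], dtype=np.float32)
w_new = sgd_step_fp16(w, g, lr=0.1)
```

A "mixed"-precision variant would instead keep a float32 master copy of `w` and cast only the forward/backward arithmetic down to float16.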

Updated: 2025-07-09 16:45:33

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2507.07016v1

MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation

Knowledge distillation as an efficient knowledge transfer technique, has achieved remarkable success in unimodal scenarios. However, in cross-modal settings, conventional distillation methods encounter significant challenges due to data and statistical heterogeneities, failing to leverage the complementary prior knowledge embedded in cross-modal teacher models. This paper empirically reveals two critical issues in existing approaches: distillation path selection and knowledge drift. To address these limitations, we propose MST-Distill, a novel cross-modal knowledge distillation framework featuring a mixture of specialized teachers. Our approach employs a diverse ensemble of teacher models across both cross-modal and multimodal configurations, integrated with an instance-level routing network that facilitates adaptive and dynamic distillation. This architecture effectively transcends the constraints of traditional methods that rely on monotonous and static teacher models. Additionally, we introduce a plug-in masking module, independently trained to suppress modality-specific discrepancies and reconstruct teacher representations, thereby mitigating knowledge drift and enhancing transfer effectiveness. Extensive experiments across five diverse multimodal datasets, spanning visual, audio, and text, demonstrate that our method significantly outperforms existing state-of-the-art knowledge distillation methods in cross-modal distillation tasks. The source code is available at https://github.com/Gray-OREO/MST-Distill.

Updated: 2025-07-09 16:45:28

Categories: cs.CV,cs.LG,cs.MM

Download: http://arxiv.org/abs/2507.07015v1

When Context Is Not Enough: Modeling Unexplained Variability in Car-Following Behavior

Modeling car-following behavior is fundamental to microscopic traffic simulation, yet traditional deterministic models often fail to capture the full extent of variability and unpredictability in human driving. While many modern approaches incorporate context-aware inputs (e.g., spacing, speed, relative speed), they frequently overlook structured stochasticity that arises from latent driver intentions, perception errors, and memory effects -- factors that are not directly observable from context alone. To fill the gap, this study introduces an interpretable stochastic modeling framework that captures not only context-dependent dynamics but also residual variability beyond what context can explain. Leveraging deep neural networks integrated with nonstationary Gaussian processes (GPs), our model employs a scenario-adaptive Gibbs kernel to learn dynamic temporal correlations in acceleration decisions, where the strength and duration of correlations between acceleration decisions evolve with the driving context. This formulation enables a principled, data-driven quantification of uncertainty in acceleration, speed, and spacing, grounded in both observable context and latent behavioral variability. Comprehensive experiments on the naturalistic vehicle trajectory dataset collected from the German highway, i.e., the HighD dataset, demonstrate that the proposed stochastic simulation method within this framework surpasses conventional methods in both predictive performance and interpretable uncertainty quantification. The integration of interpretability and accuracy makes this framework a promising tool for traffic analysis and safety-critical applications.
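The Gibbs kernel underlying the nonstationary GP has a standard closed form for an input-dependent lengthscale l(x): k(x, x') = sqrt(2 l(x) l(x') / (l(x)^2 + l(x')^2)) * exp(-(x - x')^2 / (l(x)^2 + l(x')^2)). The sketch below uses a made-up linear l(x) purely to show how correlation strength and duration can evolve with context; the paper's scenario-adaptive lengthscale is learned, not hand-set:

```python
import numpy as np

def gibbs_kernel(x1, x2, lengthscale):
    """Nonstationary Gibbs kernel with input-dependent lengthscale l(x)."""
    l1, l2 = lengthscale(x1), lengthscale(x2)
    s = l1 ** 2 + l2 ** 2
    return np.sqrt(2.0 * l1 * l2 / s) * np.exp(-((x1 - x2) ** 2) / s)

def ls(x):
    # Hypothetical context-dependent lengthscale: correlations persist
    # longer as the (scalar) driving-context variable x grows.
    return 0.5 + 0.1 * x

pts = (0.0, 1.0, 2.0)
K = np.array([[gibbs_kernel(a, b, ls) for b in pts] for a in pts])
```

Note that k(x, x) = 1 for any l(x), and correlation decays with distance at a rate set by the local lengthscales.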

Updated: 2025-07-09 16:42:41

Categories: stat.AP,cs.LG,cs.RO

Download: http://arxiv.org/abs/2507.07012v1

TokenShapley: Token Level Context Attribution with Shapley Value

Large language models (LLMs) demonstrate strong capabilities in in-context learning, but verifying the correctness of their generated responses remains a challenge. Prior work has explored attribution at the sentence level, but these methods fall short when users seek attribution for specific keywords within the response, such as numbers, years, or names. To address this limitation, we propose TokenShapley, a novel token-level attribution method that combines Shapley value-based data attribution with KNN-based retrieval techniques inspired by recent advances in KNN-augmented LLMs. By leveraging a precomputed datastore for contextual retrieval and computing Shapley values to quantify token importance, TokenShapley provides a fine-grained data attribution approach. Extensive evaluations on four benchmarks show that TokenShapley outperforms state-of-the-art baselines in token-level attribution, achieving an 11-23% improvement in accuracy.
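The Shapley-value side of the method can be shown exactly on a tiny "token game"; the value function below is hypothetical, and exact enumeration over all orderings only scales to a handful of tokens, which is precisely why TokenShapley pairs Shapley attribution with KNN-based retrieval:

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging each player's marginal contribution
    over all orderings -- feasible only for very small player sets."""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = value(coalition)
            coalition.add(p)
            phi[p] += value(coalition) - before
    return {p: v / len(perms) for p, v in phi.items()}

# Hypothetical token-importance game: the answer is correct (value 1) only
# when the context contains both the "year" and "name" tokens.
def v(S):
    return 1.0 if {"year", "name"} <= S else 0.0

phi = shapley_values(["year", "name", "the"], v)
```

The two necessary tokens split the credit equally, and the irrelevant token receives zero, which is the fine-grained attribution behavior the method targets.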

Updated: 2025-07-09 16:40:38

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.05261v2

Exact Evaluation of the Accuracy of Diffusion Models for Inverse Problems with Gaussian Data Distributions

Used as priors for Bayesian inverse problems, diffusion models have recently attracted considerable attention in the literature. Their flexibility and high variance enable them to generate multiple solutions for a given task, such as inpainting, super-resolution, and deblurring. However, several unresolved questions remain about how well they perform. In this article, we investigate the accuracy of these models when applied to a Gaussian data distribution for deblurring. Within this constrained context, we are able to precisely analyze the discrepancy between the theoretical resolution of inverse problems and their resolution obtained using diffusion models by computing the exact Wasserstein distance between the distribution of the diffusion model sampler and the ideal distribution of solutions to the inverse problem. Our findings allow for the comparison of different algorithms from the literature.
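In one dimension the exact Wasserstein-2 distance between Gaussians, the quantity the analysis computes, reduces to a simple closed form: W2(N(m1, s1^2), N(m2, s2^2)) = sqrt((m1 - m2)^2 + (s1 - s2)^2). The toy deblurring posterior below is an illustrative assumption, not the paper's setting:

```python
import numpy as np

def w2_gaussian_1d(m1, s1, m2, s2):
    """Exact 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

# Toy 1-D "deblurring" problem y = a*x + noise with a Gaussian prior on x:
# the ideal posterior is Gaussian, so the discrepancy of any Gaussian
# sampler from it is computable exactly.
a, sigma_prior, sigma_noise, y = 0.8, 1.0, 0.5, 1.2
post_var = 1.0 / (1.0 / sigma_prior ** 2 + a ** 2 / sigma_noise ** 2)
post_mean = post_var * a * y / sigma_noise ** 2

# A hypothetical sampler that matches the posterior variance but is
# biased in the mean by 0.1.
d = w2_gaussian_1d(post_mean, np.sqrt(post_var),
                   post_mean + 0.1, np.sqrt(post_var))
```

Here the mean bias translates directly into the Wasserstein discrepancy, which is the kind of exact accounting the Gaussian setting makes possible.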

Updated: 2025-07-09 16:36:51

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07008v1

GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning

Microscopic assessment of histopathology images is vital for accurate cancer diagnosis and treatment. Whole Slide Image (WSI) classification and captioning have become crucial tasks in computer-aided pathology. However, microscopic WSIs face challenges such as redundant patches and unknown patch positions due to subjective pathologist captures. Moreover, generating automatic pathology captions remains a significant challenge. To address these issues, we introduce a novel GNN-ViTCap framework for classification and caption generation from histopathological microscopic images. First, a visual feature extractor generates patch embeddings. Redundant patches are then removed by dynamically clustering these embeddings using deep embedded clustering and selecting representative patches via a scalar dot attention mechanism. We build a graph by connecting each node to its nearest neighbors in the similarity matrix and apply a graph neural network to capture both local and global context. The aggregated image embeddings are projected into the language model's input space through a linear layer and combined with caption tokens to fine-tune a large language model. We validate our method on the BreakHis and PatchGastric datasets. GNN-ViTCap achieves an F1 score of 0.934 and an AUC of 0.963 for classification, along with a BLEU-4 score of 0.811 and a METEOR score of 0.569 for captioning. Experimental results demonstrate that GNN-ViTCap outperforms state-of-the-art approaches, offering a reliable and efficient solution for microscopy-based patient diagnosis.
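The graph-construction step ("connecting each node to its nearest neighbors in the similarity matrix") can be sketched directly over patch embeddings. Cosine similarity and the value of k below are assumptions for illustration, not the paper's stated choices:

```python
import numpy as np

def knn_graph(emb, k=2):
    """Adjacency matrix of a k-nearest-neighbour graph over patch embeddings,
    using cosine similarity as the similarity matrix."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = e @ e.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops
    nbrs = np.argsort(sim, axis=1)[:, -k:]  # indices of the k most similar
    A = np.zeros(sim.shape)
    for i, js in enumerate(nbrs):
        A[i, js] = 1.0
    return A

# Four toy patch embeddings forming two similar pairs.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
A = knn_graph(emb, k=1)
```

A GNN message-passing layer over `A` would then mix each patch's features with those of its most similar neighbours, giving the local context the framework exploits.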

Updated: 2025-07-09 16:35:21

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.07006v1

Skewed Score: A statistical framework to assess autograders

The evaluation of large language model (LLM) outputs is increasingly performed by other LLMs, a setup commonly known as "LLM-as-a-judge", or autograders. While autograders offer a scalable alternative to human evaluation, they have shown mixed reliability and may exhibit systematic biases, depending on response type, scoring methodology, domain specificity, or other factors. Here we propose a statistical framework based on Bayesian generalised linear models (GLMs) that enables researchers to simultaneously assess their autograders while addressing their primary research questions (e.g., LLM evaluation). Our approach models evaluation outcomes (e.g., scores or pairwise preferences) as a function of properties of the grader (e.g., human vs. autograder) and the evaluated item (e.g., response length or the LLM that generated it), allowing for explicit quantification of scoring differences and potential biases within a unified framework. In addition, our method can be used to augment traditional metrics such as inter-rater agreement, by providing uncertainty estimates and clarifying sources of disagreement. Overall, this approach contributes to more robust and interpretable use of autograders in LLM evaluation, enabling both performance analysis and bias detection.

Updated: 2025-07-09 16:28:55

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.03772v2

Multi-Modality Conditioned Variational U-Net for Field-of-View Extension in Brain Diffusion MRI

An incomplete field-of-view (FOV) in diffusion magnetic resonance imaging (dMRI) can severely hinder the volumetric and bundle analyses of whole-brain white matter connectivity. Although existing works have investigated imputing the missing regions using deep generative models, it remains unclear how to specifically utilize additional information from paired multi-modality data and whether this can enhance the imputation quality and be useful for downstream tractography. To fill this gap, we propose a novel framework for imputing dMRI scans in the incomplete part of the FOV by integrating the learned diffusion features in the acquired part of the FOV with the complete brain anatomical structure. We hypothesize that by this design the proposed framework can enhance the imputation performance of the dMRI scans and therefore be useful for repairing whole-brain tractography in corrupted dMRI scans with incomplete FOV. We tested our framework on two cohorts from different sites with a total of 96 subjects and compared it with a baseline imputation method that treats the information from T1w and dMRI scans equally. The proposed framework achieved significant improvements in imputation performance, as demonstrated by the angular correlation coefficient (p < 1E-5), and in downstream tractography accuracy, as demonstrated by the Dice score (p < 0.01). Results suggest that the proposed framework improved imputation performance in dMRI scans by specifically utilizing additional information from paired multi-modality data, compared with the baseline method. The imputation achieved by the proposed framework enhances whole-brain tractography, and therefore reduces the uncertainty when analyzing bundles associated with neurodegenerative diseases.

Updated: 2025-07-09 16:25:52

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2409.13846v2

Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs

Reasoning is a key capability for large language models (LLMs), particularly when applied to complex tasks such as mathematical problem solving. However, multimodal reasoning research still requires further exploration of modality alignment and training costs. Many of these approaches rely on additional data annotation and relevant rule-based rewards to enhance the understanding and reasoning ability, which significantly increases training costs and limits scalability. To address these challenges, we propose the Deliberate-to-Intuitive reasoning framework (D2I) that improves the understanding and reasoning ability of multimodal LLMs (MLLMs) without extra annotations and complex rewards. Specifically, our method sets deliberate reasoning strategies to enhance modality alignment only through the rule-based format reward during training. While evaluating, the reasoning style shifts to intuitive, which removes deliberate reasoning strategies during training and implicitly reflects the model's acquired abilities in the response. D2I outperforms baselines across both in-domain and out-of-domain benchmarks. Our findings highlight the role of format reward in fostering transferable reasoning skills in MLLMs, and inspire directions for decoupling training-time reasoning depth from test-time response flexibility.
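A "rule-based format reward" of the kind D2I relies on can be as simple as a regular-expression check over the response. The tag names below are assumptions for illustration, not the paper's actual format:

```python
import re

def format_reward(response):
    """Rule-based format reward: 1.0 if the response wraps its reasoning in
    <think>...</think> followed by a final <answer>...</answer> (hypothetical
    tag names), else 0.0. No correctness checking -- format only."""
    pattern = r"<think>.+</think>\s*<answer>.+</answer>"
    return 1.0 if re.fullmatch(pattern, response.strip(), re.DOTALL) else 0.0

good = "<think>2+2 is 4</think><answer>4</answer>"
bad = "The answer is 4."
```

Because the reward inspects only structure, it needs no task-specific annotation, which is the property the paper leverages to avoid extra labeling and complex reward design.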

Updated: 2025-07-09 16:25:44

Categories: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.06999v1

Federated Learning-based MARL for Strengthening Physical-Layer Security in B5G Networks

This paper explores the application of a federated learning-based multi-agent reinforcement learning (MARL) strategy to enhance physical-layer security (PLS) in a multi-cellular network within the context of beyond 5G networks. At each cell, a base station (BS) operates as a deep reinforcement learning (DRL) agent that interacts with the surrounding environment to maximize the secrecy rate of legitimate users in the presence of an eavesdropper. This eavesdropper attempts to intercept the confidential information shared between the BS and its authorized users. The DRL agents are deemed to be federated since they only share their network parameters with a central server and not the private data of their legitimate users. Two DRL approaches, deep Q-network (DQN) and Reinforce deep policy gradient (RDPG), are explored and compared. The results demonstrate that RDPG converges more rapidly than DQN. In addition, we demonstrate that the proposed method outperforms the distributed DRL approach. Furthermore, the outcomes illustrate the trade-off between security and complexity.
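The quantity the agents maximize, the secrecy rate, is the gap between the legitimate channel's capacity and the eavesdropper's, floored at zero. A minimal sketch, with SNRs given as linear ratios:

```python
import numpy as np

def secrecy_rate(snr_user, snr_eve):
    """Physical-layer secrecy rate (bits/s/Hz): legitimate-user capacity
    minus eavesdropper capacity, floored at zero."""
    return max(0.0, np.log2(1 + snr_user) - np.log2(1 + snr_eve))

r = secrecy_rate(15.0, 3.0)
```

A DRL agent at each BS would choose actions (e.g., beamforming or power levels) so as to push `snr_user` up relative to `snr_eve`, and federated aggregation shares only the resulting network parameters, never user data.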

Updated: 2025-07-09 16:24:15

Categories: eess.SP,cs.ET,cs.LG,cs.NI

Download: http://arxiv.org/abs/2507.06997v1

Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing

Electronic Health Records (EHR) are time-series relational databases that record patient interactions and medical events over time, serving as a critical resource for healthcare research and applications. However, privacy concerns and regulatory restrictions limit the sharing and utilization of such sensitive data, necessitating the generation of synthetic EHR datasets. Unlike previous EHR synthesis methods, which typically generate medical records consisting of expert-chosen features (e.g. a few vital signs or structured codes only), we introduce RawMed, the first framework to synthesize multi-table, time-series EHR data that closely resembles raw EHRs. Using text-based representation and compression techniques, RawMed captures complex structures and temporal dynamics with minimal preprocessing. We also propose a new evaluation framework for multi-table time-series synthetic EHRs, assessing distributional similarity, inter-table relationships, temporal dynamics, and privacy. Validated on two open-source EHR datasets, RawMed outperforms baseline models in fidelity and utility. The code is available at https://github.com/eunbyeol-cho/RawMed.

Updated: 2025-07-09 16:22:22

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.06996v1

Cross-Modality Masked Learning for Survival Prediction in ICI Treated NSCLC Patients

Accurate prognosis of non-small cell lung cancer (NSCLC) patients undergoing immunotherapy is essential for personalized treatment planning, enabling informed patient decisions, and improving both treatment outcomes and quality of life. However, the lack of large, relevant datasets and effective multi-modal feature fusion strategies pose significant challenges in this domain. To address these challenges, we present a large-scale dataset and introduce a novel framework for multi-modal feature fusion aimed at enhancing the accuracy of survival prediction. The dataset comprises 3D CT images and corresponding clinical records from NSCLC patients treated with immune checkpoint inhibitors (ICI), along with progression-free survival (PFS) and overall survival (OS) data. We further propose a cross-modality masked learning approach for medical feature fusion, consisting of two distinct branches, each tailored to its respective modality: a Slice-Depth Transformer for extracting 3D features from CT images and a graph-based Transformer for learning node features and relationships among clinical variables in tabular data. The fusion process is guided by a masked modality learning strategy, wherein the model utilizes the intact modality to reconstruct missing components. This mechanism improves the integration of modality-specific features, fostering more effective inter-modality relationships and feature interactions. Our approach demonstrates superior performance in multi-modal integration for NSCLC survival prediction, surpassing existing methods and setting a new benchmark for prognostic models in this context.

Updated: 2025-07-09 16:19:31

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.06994v1

The end of radical concept nativism

Though humans seem to be remarkable learners, arguments in cognitive science and philosophy of mind have long maintained that learning something fundamentally new is impossible. Specifically, Jerry Fodor's arguments for radical concept nativism hold that most, if not all, concepts are innate and that what many call concept learning never actually leads to the acquisition of new concepts. These arguments have deeply affected cognitive science, and many believe that the counterarguments to radical concept nativism have been either unsuccessful or only apply to a narrow class of concepts. This paper first reviews the features and limitations of prior arguments. We then identify three critical points - related to issues of expressive power, conceptual structure, and concept possession - at which the arguments in favor of radical concept nativism diverge from describing actual human cognition. We use ideas from computer science and information theory to formalize the relevant ideas in ways that are arguably more scientifically productive. We conclude that, as a result, there is an important sense in which people do indeed learn new concepts.

Updated: 2025-07-09 16:18:56

Categories: cs.AI,cs.IT,math.IT

Download: http://arxiv.org/abs/2505.18277v2

The User-Centric Geo-Experience: An LLM-Powered Framework for Enhanced Planning, Navigation, and Dynamic Adaptation

Traditional travel-planning systems are often static and fragmented, leaving them ill-equipped to handle real-world complexities such as evolving environmental conditions and unexpected itinerary disruptions. In this paper, we identify three gaps between existing service providers causing frustrating user experience: intelligent trip planning, precision "last-100-meter" navigation, and dynamic itinerary adaptation. We propose three cooperative agents: a Travel Planning Agent that employs grid-based spatial grounding and map analysis to help resolve complex multi-modal user queries; a Destination Assistant Agent that provides fine-grained guidance for the final navigation leg of each journey; and a Local Discovery Agent that leverages image embeddings and Retrieval-Augmented Generation (RAG) to detect and respond to trip plan disruptions. With evaluations and experiments, our system demonstrates substantial improvements in query interpretation, navigation accuracy, and disruption resilience, underscoring its promise for applications from urban exploration to emergency response.

Updated: 2025-07-09 16:18:09

Categories: cs.AI,cs.CV

Download: http://arxiv.org/abs/2507.06993v1

MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation

Despite significant advancements in adapting Large Language Models (LLMs) for radiology report generation (RRG), clinical adoption remains challenging due to difficulties in accurately mapping pathological and anatomical features to their corresponding text descriptions. Additionally, semantic agnostic feature extraction further hampers the generation of accurate diagnostic reports. To address these challenges, we introduce Medical Concept Aligned Radiology Report Generation (MCA-RG), a knowledge-driven framework that explicitly aligns visual features with distinct medical concepts to enhance the report generation process. MCA-RG utilizes two curated concept banks: a pathology bank containing lesion-related knowledge, and an anatomy bank with anatomical descriptions. The visual features are aligned with these medical concepts and undergo tailored enhancement. We further propose an anatomy-based contrastive learning procedure to improve the generalization of anatomical features, coupled with a matching loss for pathological features to prioritize clinically relevant regions. Additionally, a feature gating mechanism is employed to filter out low-quality concept features. Finally, the visual features are corresponding to individual medical concepts, and are leveraged to guide the report generation process. Experiments on two public benchmarks (MIMIC-CXR and CheXpert Plus) demonstrate that MCA-RG achieves superior performance, highlighting its effectiveness in radiology report generation.

Updated: 2025-07-09 16:15:38

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.06992v1

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

While large language models (LLMs) have recently demonstrated strong potential in solving planning problems, there is a trade-off between flexibility and complexity. LLMs, as zero-shot planners themselves, are still not capable of directly generating valid plans for complex planning problems such as multi-constraint or long-horizon tasks. On the other hand, many frameworks aiming to solve complex planning problems often rely on task-specific preparatory efforts, such as task-specific in-context examples and pre-defined critics/verifiers, which limits their cross-task generalization capability. In this paper, we tackle these challenges by observing that the core of many planning problems lies in optimization problems: searching for the optimal solution (best plan) with goals subject to constraints (preconditions and effects of decisions). With LLMs' commonsense, reasoning, and programming capabilities, this opens up the possibilities of a universal LLM-based approach to planning problems. Inspired by this observation, we propose LLMFP, a general-purpose framework that leverages LLMs to capture key information from planning problems and formally formulate and solve them as optimization problems from scratch, with no task-specific examples needed. We apply LLMFP to 9 planning problems, ranging from multi-constraint decision making to multi-step planning problems, and demonstrate that LLMFP achieves on average 83.7% and 86.8% optimal rate across 9 tasks for GPT-4o and Claude 3.5 Sonnet, significantly outperforming the best baseline (direct planning with OpenAI o1-preview) with 37.6% and 40.7% improvements. We also validate components of LLMFP with ablation experiments and analyzed the underlying success and failure reasons. Project page: https://sites.google.com/view/llmfp.
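The core reframing, planning as constrained optimization, can be made concrete with a brute-force toy: decision variables with finite domains, constraints as predicates (preconditions/effects), and an objective to maximize. Everything below (the trip domain, costs, scores) is invented for illustration; LLMFP itself would have the LLM emit such a formulation from a natural-language problem and solve it with a real solver:

```python
from itertools import product

def solve(options, constraints, objective):
    """Exhaustively search the decision space for the best plan that
    satisfies every constraint -- planning cast as optimization."""
    best, best_val = None, float("-inf")
    for choice in product(*options.values()):
        assignment = dict(zip(options.keys(), choice))
        if all(c(assignment) for c in constraints):
            val = objective(assignment)
            if val > best_val:
                best, best_val = assignment, val
    return best, best_val

# Hypothetical mini trip-planning instance: pick a day and a venue.
options = {"day": ["sat", "sun"], "venue": ["museum", "park", "concert"]}
cost = {"museum": 20, "park": 0, "concert": 60}
fun = {"museum": 7, "park": 5, "concert": 9}
constraints = [
    lambda a: cost[a["venue"]] <= 30,                       # budget cap
    lambda a: not (a["day"] == "sun" and a["venue"] == "museum"),  # closed Sundays
]
plan, val = solve(options, constraints,
                  lambda a: fun[a["venue"]] + (1 if a["day"] == "sat" else 0))
```

Brute force stands in here for the solver; the point is the separation of concerns: once a problem is formalized this way, no task-specific examples or critics are needed.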

Updated: 2025-07-09 16:13:20

标题: 用严谨的方法规划任何事情:基于LLM的形式化编程的通用零样本规划

摘要: 最近,大型语言模型(LLMs)在解决规划问题方面展现出了强大的潜力,但在灵活性和复杂性之间存在着权衡。LLMs作为零样本规划器本身,仍然无法直接为复杂规划问题(如多约束或长时程任务)生成有效的计划。另一方面,许多旨在解决复杂规划问题的框架通常依赖于任务特定的准备工作,比如任务特定的上下文示例和预定义的评论者/验证者,这限制了它们的跨任务泛化能力。在本文中,我们通过观察到许多规划问题的核心在于优化问题来应对这些挑战:在约束(决策的前提条件和效果)下搜索满足目标的最优解(最佳计划)。借助LLMs的常识、推理和编程能力,这为基于LLM的通用规划方法打开了可能性。受此观察启发,我们提出了LLMFP,一个通用框架,利用LLMs从规划问题中捕获关键信息,并从头开始将其形式化地表述并求解为优化问题,无需任务特定示例。我们将LLMFP应用于9个规划问题,从多约束决策到多步规划,结果表明LLMFP在GPT-4o和Claude 3.5 Sonnet上于9项任务中分别实现了平均83.7%和86.8%的最优率,显著优于最佳基线(使用OpenAI o1-preview直接规划),分别提升37.6%和40.7%。我们还通过消融实验验证了LLMFP的各组件,并分析了其成功与失败的根本原因。项目页面:https://sites.google.com/view/llmfp。

更新时间: 2025-07-09 16:13:20

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.12112v3

BarkBeetle: Stealing Decision Tree Models with Fault Injection

Machine learning models, particularly decision trees (DTs), are widely adopted across various domains due to their interpretability and efficiency. However, as ML models become increasingly integrated into privacy-sensitive applications, concerns about their confidentiality have grown, particularly in light of emerging threats such as model extraction and fault injection attacks. Assessing the vulnerability of DTs under such attacks is therefore important. In this work, we present BarkBeetle, a novel attack that leverages fault injection to extract internal structural information of DT models. BarkBeetle employs a bottom-up recovery strategy that uses targeted fault injection at specific nodes to efficiently infer feature splits and threshold values. Our proof-of-concept implementation demonstrates that BarkBeetle requires significantly fewer queries and recovers more structural information compared to prior approaches, when evaluated on DTs trained with public UCI datasets. To validate its practical feasibility, we implement BarkBeetle on a Raspberry Pi RP2350 board and perform fault injections using the Faultier voltage glitching tool. As BarkBeetle targets general DT models, we also provide an in-depth discussion on its applicability to a broader range of tree-based applications, including data stream classification, DT variants, and cryptography schemes.
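The core inference step, recovering a node's hidden split threshold from observed branch decisions, can be sketched as a bisection over inputs. Note this toy uses an ordinary query oracle; the attack described above instead observes decisions perturbed by fault injection, and the hidden threshold here is a made-up value.

```python
HIDDEN_THRESHOLD = 0.37  # secret split value inside the model under attack

def node_decision(x):
    """Stand-in for observing which branch a single tree node takes on input x."""
    return x >= HIDDEN_THRESHOLD

def recover_threshold(lo=0.0, hi=1.0, iters=40):
    """Bisection: each observed branch decision halves the candidate interval."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if node_decision(mid):
            hi = mid   # threshold is at or below mid
        else:
            lo = mid   # threshold is above mid
    return (lo + hi) / 2.0
```

Each query halves the uncertainty, which is why targeted per-node probing can need far fewer queries than blind model-extraction approaches.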

Updated: 2025-07-09 16:08:58

标题: 松皮甲:利用故障注入窃取决策树模型

摘要: 机器学习模型,特别是决策树(DTs),由于其可解释性和效率,在各个领域被广泛采用。然而,随着ML模型越来越多地集成到隐私敏感的应用程序中,人们对其保密性的担忧也在增加,特别是面对模型提取和故障注入攻击等新兴威胁。因此,评估DTs在此类攻击下的脆弱性十分重要。在这项工作中,我们提出了BarkBeetle,一种利用故障注入来提取DT模型内部结构信息的新型攻击。BarkBeetle采用自下而上的恢复策略,通过在特定节点进行有针对性的故障注入,高效地推断特征划分和阈值。我们的概念验证实现表明,在使用公共UCI数据集训练的DT上进行评估时,与先前方法相比,BarkBeetle需要的查询次数显著更少,并能恢复更多的结构信息。为了验证其实际可行性,我们在Raspberry Pi RP2350板上实现了BarkBeetle,并使用Faultier电压毛刺注入工具进行故障注入。由于BarkBeetle针对通用DT模型,我们还就其在更广泛的基于树的应用(包括数据流分类、DT变体和密码学方案)中的适用性进行了深入讨论。

更新时间: 2025-07-09 16:08:58

领域: cs.CR

下载: http://arxiv.org/abs/2507.06986v1

A Principled Framework for Multi-View Contrastive Learning

Contrastive Learning (CL), a leading paradigm in Self-Supervised Learning (SSL), typically relies on pairs of data views generated through augmentation. While multiple augmentations per instance (more than two) improve generalization in supervised learning, current CL methods handle additional views suboptimally by simply aggregating different pairwise objectives. This approach suffers from four critical limitations: (L1) it utilizes multiple optimization terms per data point resulting to conflicting objectives, (L2) it fails to model all interactions across views and data points, (L3) it inherits fundamental limitations (e.g. alignment-uniformity coupling) from pairwise CL losses, and (L4) it prevents fully realizing the benefits of increased view multiplicity observed in supervised settings. We address these limitations through two novel loss functions: MV-InfoNCE, which extends InfoNCE to incorporate all possible view interactions simultaneously in one term per data point, and MV-DHEL, which decouples alignment from uniformity across views while scaling interaction complexity with view multiplicity. Both approaches are theoretically grounded - we prove they asymptotically optimize for alignment of all views and uniformity, providing principled extensions to multi-view contrastive learning. Our empirical results on ImageNet1K and three other datasets demonstrate that our methods consistently outperform existing multi-view approaches and effectively scale with increasing view multiplicity. We also apply our objectives to multimodal data and show that, in contrast to other contrastive objectives, they can scale beyond just two modalities. Most significantly, ablation studies reveal that MV-DHEL with five or more views effectively mitigates dimensionality collapse by fully utilizing the embedding space, thereby delivering multi-view benefits observed in supervised learning.
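For reference, the two-view pairwise baseline that the multi-view objectives above generalize is standard InfoNCE. The sketch below is a minimal NumPy version of that baseline, not the paper's MV-InfoNCE or MV-DHEL losses; the temperature and shapes are illustrative assumptions.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """Pairwise InfoNCE: contrast each sample in view 1 against all of view 2."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # L2-normalize embeddings
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives sit on the diagonal
```

With more than two views, current practice sums this loss over all view pairs, which is exactly the aggregation of conflicting per-point objectives (limitation L1) that the paper's single-term formulations aim to avoid.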

Updated: 2025-07-09 16:07:17

标题: 一个基于原则的多视角对比学习框架

摘要: 对比学习(CL)是自监督学习(SSL)中的主要范式,通常依赖通过增强生成的数据视图对。虽然在监督学习中,每个实例使用多次增强(超过两次)可以提高泛化能力,但当前的CL方法只是简单地聚合不同的成对目标来处理额外视图,处理方式并不理想。这种方法存在四个关键限制:(L1)它对每个数据点使用多个优化项,导致目标冲突;(L2)它无法建模视图与数据点之间的所有交互;(L3)它继承了成对CL损失的根本限制(例如对齐-均匀性耦合);(L4)它阻碍了充分实现监督设置中观察到的视图多重性增加带来的好处。我们通过两种新的损失函数解决这些限制:MV-InfoNCE,它将InfoNCE扩展为在每个数据点的单个项中同时纳入所有可能的视图交互;以及MV-DHEL,它在视图间将对齐与均匀性解耦,同时使交互复杂度随视图多重性扩展。这两种方法都有理论基础:我们证明它们渐近地优化所有视图的对齐性和均匀性,为多视图对比学习提供了有原则的扩展。我们在ImageNet1K和其他三个数据集上的实证结果表明,我们的方法始终优于现有的多视图方法,并能随着视图多重性的增加有效扩展。我们还将我们的目标应用于多模态数据,并表明,与其他对比目标不同,它们可以扩展到两种以上的模态。最重要的是,消融研究表明,使用五个或更多视图的MV-DHEL可以通过充分利用嵌入空间有效缓解维度崩溃,从而带来监督学习中观察到的多视图优势。

更新时间: 2025-07-09 16:07:17

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2507.06979v1

Modeling (Deontic) Modal Operators With the s(CASP) Goal-directed Predicate Answer Set Programming System

We consider the problem of implementing deontic modal logic. We show how (deontic) modal operators can be expressed elegantly using default negation (negation-as-failure) and strong negation present in answer set programming (ASP). We propose using global constraints of ASP to represent obligations and impermissibilities of deontic modal logic. We show that our proposed representation results in the various paradoxes of deontic modal logic being elegantly resolved.

Updated: 2025-07-09 16:04:20

标题: 用s(CASP)目标导向谓词回答集编程系统对(义务)情态运算符进行建模

摘要: 我们考虑义务模态逻辑的实现问题。我们展示了如何利用回答集编程(ASP)中的默认否定(否定即失败)和强否定来优雅地表达(义务)模态运算符。我们建议使用ASP的全局约束来表示义务模态逻辑中的义务与禁止。我们表明,我们提出的表示方法能够优雅地化解义务模态逻辑的各种悖论。

更新时间: 2025-07-09 16:04:20

领域: cs.AI,cs.LO

下载: http://arxiv.org/abs/2507.05519v2

PyPOTS: A Python Toolkit for Machine Learning on Partially-Observed Time Series

PyPOTS is an open-source Python library dedicated to data mining and analysis on multivariate partially-observed time series with missing values. Particularly, it provides easy access to diverse algorithms categorized into five tasks: imputation, forecasting, anomaly detection, classification, and clustering. The included models represent a diverse set of methodological paradigms, offering a unified and well-documented interface suitable for both academic research and practical applications. With robustness and scalability in its design philosophy, best practices of software construction, for example, unit testing, continuous integration and continuous delivery, code coverage, maintainability evaluation, interactive tutorials, and parallelization, are carried out as principles during the development of PyPOTS. The toolbox is available on PyPI, Anaconda, and Docker. PyPOTS is open source and publicly available on GitHub https://github.com/WenjieDu/PyPOTS.

Updated: 2025-07-09 16:03:16

标题: PyPOTS:用于部分观测时间序列机器学习的Python工具包

摘要: PyPOTS是一个开源的Python库,专门用于对具有缺失值的多变量部分观测时间序列进行数据挖掘和分析。特别地,它提供了对多样化算法的便捷访问,这些算法分为五类任务:插补、预测、异常检测、分类和聚类。所包含的模型代表了多种方法论范式,提供统一且文档完善的接口,适用于学术研究和实际应用。PyPOTS以鲁棒性和可扩展性为设计理念,在开发过程中始终遵循软件构建的最佳实践,例如单元测试、持续集成与持续交付、代码覆盖率、可维护性评估、交互式教程和并行化。该工具箱可在PyPI、Anaconda和Docker上获得。PyPOTS是开源的,并在GitHub上公开可用:https://github.com/WenjieDu/PyPOTS。

更新时间: 2025-07-09 16:03:16

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2305.18811v2

Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Differentially private (DP) mechanisms are difficult to interpret and calibrate because existing methods for mapping standard privacy parameters to concrete privacy risks -- re-identification, attribute inference, and data reconstruction -- are both overly pessimistic and inconsistent. In this work, we use the hypothesis-testing interpretation of DP ($f$-DP), and determine that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our unified bounds are (1) consistent across a multitude of attack settings, and (2) tunable, enabling practitioners to evaluate risk with respect to arbitrary (including worst-case) levels of baseline risk. Empirically, our results are tighter than prior methods using $\varepsilon$-DP, R\'enyi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, e.g., more than 15pp accuracy increase in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.
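As a point of comparison for the "tunable baseline risk" idea, the classical pure ε-DP bound on a point-hypothesis attack is easy to compute: an adversary with baseline success probability p succeeds with probability at most e^ε·p / (1 + (e^ε − 1)·p). This is the textbook bound the paper improves upon, not the paper's unified f-DP bound.

```python
import math

def attack_success_bound(eps, baseline):
    """Classical eps-DP upper bound on attack success, given baseline risk.

    At eps = 0 the bound equals the baseline (DP gives the attacker nothing);
    as eps grows it tends to 1 (no protection beyond the trivial bound).
    """
    e = math.exp(eps)
    return e * baseline / (1.0 + (e - 1.0) * baseline)
```

Because it takes the baseline as a parameter, the same formula covers worst-case priors (p = 1/2 for membership-style tests) and weaker ones, mirroring the tunability the abstract describes.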

Updated: 2025-07-09 15:59:30

标题: 统一差分隐私中的再识别、属性推断和数据重构风险

摘要: 差异隐私(DP)机制很难解释和校准,因为现有的将标准隐私参数映射到具体隐私风险(重新识别、属性推断和数据重建)的方法既过于悲观又不一致。在这项工作中,我们使用了DP的假设检验解释(f-DP),并确定攻击成功的界限可以在重新识别、属性推断和数据重建风险之间采用统一的形式。我们的统一界限(1)在多种攻击设置下保持一致,(2)可调节,使从业者能够评估与任意(包括最坏情况)基线风险水平相关的风险。从经验上看,我们的结果比先前使用ε-DP、Rényi DP和集中DP的方法更紧密。因此,使用我们的界限校准噪声可以在相同风险水平下减少所需的噪声20%,从而在文本分类任务中提高超过15pp的准确度。总的来说,这种统一的观点为解释和校准DP的保护程度提供了一个原则性框架,以应对特定水平的重新识别、属性推断或数据重建风险。

更新时间: 2025-07-09 15:59:30

领域: cs.LG,cs.AI,cs.CR,cs.CY,stat.ML

下载: http://arxiv.org/abs/2507.06969v1

Scaling Towards the Information Boundary of Instruction Set: InfinityInstruct-Subject Technical Report

Instruction tuning has become a foundation for unlocking the capabilities of large-scale pretrained models and improving their performance on complex tasks. Thus, the construction of high-quality instruction datasets is crucial for enhancing model performance and generalizability. Although current instruction datasets have reached tens of millions of samples, models finetuned on them may still struggle with complex instruction following and tasks in rare domains. This is primarily due to limited expansion in both ``coverage'' (coverage of task types and knowledge areas) and ``depth'' (instruction complexity) of the instruction set. To address this issue, we propose a systematic instruction data construction framework, which integrates a hierarchical labeling system, an informative seed selection algorithm, an evolutionary data synthesis process, and a model deficiency diagnosis with targeted data generation. These components form an iterative closed-loop to continuously enhance the coverage and depth of instruction data. Based on this framework, we construct InfinityInstruct-Subject, a high-quality dataset containing ~1.5 million instructions. Experiments on multiple foundation models and benchmark tasks demonstrate its effectiveness in improving instruction-following capabilities. Further analyses suggest that InfinityInstruct-Subject shows enlarged coverage and depth compared to comparable synthesized instruction datasets. Our work lays a theoretical and practical foundation for the efficient, continuous evolution of instruction datasets, moving from data quantity expansion to qualitative improvement.

Updated: 2025-07-09 15:59:02

标题: 朝向指令集信息边界的扩展:InfinityInstruct-Subject 技术报告

摘要: 指令调优已成为释放大规模预训练模型能力并提高其在复杂任务上表现的基础。因此,构建高质量的指令数据集对提升模型性能和泛化能力至关重要。尽管当前的指令数据集已达到数千万个样本,但在这些数据集上微调的模型仍可能在复杂指令跟随和稀有领域任务中遇到困难。这主要是由于指令集在“覆盖范围”(任务类型和知识领域的覆盖)和“深度”(指令复杂度)两方面的扩展有限。为了解决这个问题,我们提出了一个系统的指令数据构建框架,该框架整合了分层标注体系、信息丰富的种子选择算法、进化式数据合成过程,以及带有针对性数据生成的模型缺陷诊断。这些组件形成一个迭代闭环,不断增强指令数据的覆盖范围和深度。基于这个框架,我们构建了包含约150万条指令的高质量数据集InfinityInstruct-Subject。在多个基础模型和基准任务上的实验证明了它在提高指令跟随能力方面的有效性。进一步的分析表明,与可比的合成指令数据集相比,InfinityInstruct-Subject显示出更大的覆盖范围和深度。我们的工作为指令数据集高效、持续的演化奠定了理论和实践基础,实现了从数据数量扩张到质量提升的转变。

更新时间: 2025-07-09 15:59:02

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.06968v1

Noisy PDE Training Requires Bigger PINNs

Physics-Informed Neural Networks (PINNs) are increasingly used to approximate solutions of partial differential equations (PDEs), especially in high dimensions. In real-world applications, data samples are noisy, so it is important to know when a predictor can still achieve low empirical risk. However, little is known about the conditions under which a PINN can do so effectively. We prove a lower bound on the size of neural networks required for the supervised PINN empirical risk to fall below the variance of noisy supervision labels. Specifically, if a predictor achieves an empirical risk $O(\eta)$ below $\sigma^2$ (variance of supervision data), then necessarily $d_N\log d_N\gtrsim N_s \eta^2$, where $N_s$ is the number of samples and $d_N$ is the number of trainable parameters of the PINN. A similar constraint applies to the fully unsupervised PINN setting when boundary labels are sampled noisily. Consequently, increasing the number of noisy supervision labels alone does not provide a ``free lunch'' in reducing empirical risk. We also show empirically that PINNs can indeed achieve empirical risks below $\sigma^2$ under such conditions. As a case study, we investigate PINNs applied to the Hamilton--Jacobi--Bellman (HJB) PDE. Our findings lay the groundwork for quantitatively understanding the parameter requirements for training PINNs in the presence of noise.
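The bound d_N log d_N ≳ N_s η² quoted above can be turned into a rough sizing rule: given N_s noisy samples and a target risk gap η, find the smallest parameter count consistent with the inequality. The helper below suppresses the bound's hidden constants, so the numbers are only indicative of how the requirement scales.

```python
import math

def min_params(n_samples, eta):
    """Smallest integer d with d * ln(d) >= n_samples * eta**2.

    Illustrates the scaling of the paper's lower bound on PINN size;
    constants in the actual bound are omitted, so treat values as relative.
    """
    target = n_samples * eta ** 2
    d = 2
    while d * math.log(d) < target:
        d += 1
    return d
```

Quadrupling the sample count at fixed η roughly triples the required d_N here (the log factor keeps it slightly sublinear), which is the sense in which more noisy labels are no free lunch: driving risk below σ² forces the network itself to grow.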

Updated: 2025-07-09 15:58:26

标题: 嘈杂的PDE训练需要更大的PINNs

摘要: 物理信息神经网络(PINNs)越来越被用来近似解偏微分方程(PDEs),特别是在高维度中。在现实世界应用中,数据样本往往带有噪声,因此了解预测器何时可以实现低经验风险是很重要的。然而,关于PINN何时可以有效地实现这一点的条件知之甚少。我们证明了监督PINN经验风险降至低于噪声监督标签方差所需的神经网络大小的下限。具体而言,如果一个预测器实现了低于$\sigma^2$(监督数据的方差)的经验风险$O(\eta)$,则必然有$d_N\log d_N\gtrsim N_s \eta^2$,其中$N_s$是样本数,$d_N$是PINN的可训练参数数。类似的约束条件也适用于完全无监督的PINN设置,当边界标签有噪声时。因此,仅仅增加噪声监督标签的数量并不能提供“免费午餐”来降低经验风险。我们还通过实验证明,在这些条件下PINNs确实可以实现低于$\sigma^2$的经验风险。作为一个案例研究,我们研究了应用于Hamilton-Jacobi-Bellman(HJB)PDE的PINNs。我们的研究为量化理解在存在噪声情况下训练PINN所需的参数要求奠定了基础。

更新时间: 2025-07-09 15:58:26

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.06967v1

Pullback Flow Matching on Data Manifolds

We propose Pullback Flow Matching (PFM), a novel framework for generative modeling on data manifolds. Unlike existing methods that assume or learn restrictive closed-form manifold mappings for training Riemannian Flow Matching (RFM) models, PFM leverages pullback geometry and isometric learning to preserve the underlying manifold's geometry while enabling efficient generation and precise interpolation in latent space. This approach not only facilitates closed-form mappings on the data manifold but also allows for designable latent spaces, using assumed metrics on both data and latent manifolds. By enhancing isometric learning through Neural ODEs and proposing a scalable training objective, we achieve a latent space more suitable for interpolation, leading to improved manifold learning and generative performance. We demonstrate PFM's effectiveness through applications in synthetic data, protein dynamics and protein sequence data, generating novel proteins with specific properties. This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.

Updated: 2025-07-09 15:53:08

标题: 数据流形上的拉回流匹配

摘要: 我们提出了Pullback Flow Matching(PFM),一个新颖的数据流形上生成建模框架。与假设或学习限制性闭式流形映射来训练黎曼流匹配(RFM)模型的现有方法不同,PFM利用拉回几何和等距学习来保持底层流形的几何结构,同时实现潜在空间中的高效生成和精确插值。这种方法不仅促进了数据流形上的闭式映射,还允许利用在数据流形和潜在流形上假设的度量来设计潜在空间。通过借助神经ODE增强等距学习并提出可扩展的训练目标,我们获得了更适合插值的潜在空间,从而改进了流形学习和生成性能。我们通过在合成数据、蛋白质动力学和蛋白质序列数据中的应用展示了PFM的有效性,生成了具有特定性质的新型蛋白质。该方法在药物发现和材料科学领域展现出强大潜力,因为在这些领域中生成具有特定性质的新样本备受关注。

更新时间: 2025-07-09 15:53:08

领域: cs.LG,cs.AI,math.DG,q-bio.BM

下载: http://arxiv.org/abs/2410.04543v2

Off-Policy Evaluation Under Nonignorable Missing Data

Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected from potentially different policies. In real-world applications, however, logged data often suffers from missingness. While OPE has been extensively studied in the literature, a theoretical understanding of how missing data affects OPE results remains unclear. In this paper, we investigate OPE in the presence of monotone missingness and theoretically demonstrate that the value estimates remain unbiased under ignorable missingness but can be biased under nonignorable (informative) missingness. To retain the consistency of value estimation, we propose an inverse probability weighted value estimator and conduct statistical inference to quantify the uncertainty of the estimates. Through a series of numerical experiments, we empirically demonstrate that our proposed estimator yields a more reliable value inference under missing data.
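The proposed remedy, an inverse-probability-weighted value estimator, has a simple skeleton: each observed reward is upweighted by the inverse of its observation probability (alongside the usual importance ratio between target and behavior policies). The toy data and the assumption of known observation probabilities below are illustrative, not the paper's estimator in full.

```python
def ipw_value(rewards, observed, obs_prob, policy_ratio=None):
    """Estimate policy value from partially observed rewards via IPW.

    rewards: per-trajectory rewards; observed: 0/1 missingness indicators;
    obs_prob: probability each reward was observed (assumed known here);
    policy_ratio: target/behavior importance weights (defaults to on-policy).
    """
    n = len(rewards)
    if policy_ratio is None:
        policy_ratio = [1.0] * n  # on-policy case: target equals behavior policy
    total = 0.0
    for r, o, p, w in zip(rewards, observed, obs_prob, policy_ratio):
        if o:  # only observed rewards contribute, upweighted by 1/p
            total += w * r / p
    return total / n
```

Dropping the 1/p factor (i.e., averaging only the observed rewards) is exactly what becomes biased under nonignorable missingness, which is the failure mode the abstract highlights.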

Updated: 2025-07-09 15:46:39

标题: 缺失数据下的非忽略离线策略评估

摘要: 离线策略评估(OPE)旨在使用从潜在不同策略收集的离线数据来估计目标策略的价值。然而,在现实世界的应用中,已记录的数据经常存在缺失。尽管文献中对OPE进行了广泛研究,但缺失数据如何影响OPE结果的理论理解仍不清楚。在本文中,我们研究了在单调缺失情况下的OPE,并在理论上证明了在可忽略的缺失情况下价值估计保持无偏,但在不可忽略的(信息性)缺失情况下可能存在偏差。为保持价值估计的一致性,我们提出了一种逆概率加权的价值估计器,并进行统计推断以量化估计的不确定性。通过一系列数值实验,我们实证地证明了我们提出的估计器在缺失数据下产生了更可靠的价值推断。

更新时间: 2025-07-09 15:46:39

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2507.06961v1

From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis

EEG signals capture brain activity with high temporal and low spatial resolution, supporting applications such as neurological diagnosis, cognitive monitoring, and brain-computer interfaces. However, effective analysis is hindered by limited labeled data, high dimensionality, and the absence of scalable models that fully capture spatiotemporal dependencies. Existing self-supervised learning (SSL) methods often focus on either spatial or temporal features, leading to suboptimal representations. To this end, we propose EEG-VJEPA, a novel adaptation of the Video Joint Embedding Predictive Architecture (V-JEPA) for EEG classification. By treating EEG as video-like sequences, EEG-VJEPA learns semantically meaningful spatiotemporal representations using joint embeddings and adaptive masking. To our knowledge, this is the first work that exploits V-JEPA for EEG classification and explores the visual concepts learned by the model. Evaluations on the publicly available Temple University Hospital (TUH) Abnormal EEG dataset show that EEG-VJEPA outperforms existing state-of-the-art models in classification accuracy. Beyond classification accuracy, EEG-VJEPA captures physiologically relevant spatial and temporal signal patterns, offering interpretable embeddings that may support human-AI collaboration in diagnostic workflows. These findings position EEG-VJEPA as a promising framework for scalable, trustworthy EEG analysis in real-world clinical settings.

Updated: 2025-07-09 15:43:06

标题: 从视频到脑电图:将联合嵌入预测架构调整为揭示脑信号分析中的视觉概念

摘要: EEG信号以高时间分辨率和低空间分辨率捕捉大脑活动,支持神经学诊断、认知监测和脑机接口等应用。然而,有效分析受限于有限的标注数据、高维度,以及缺乏能够完全捕捉时空依赖关系的可扩展模型。现有的自监督学习(SSL)方法往往只专注于空间或时间特征,导致次优表示。为此,我们提出了EEG-VJEPA,一种将视频联合嵌入预测架构(V-JEPA)应用于EEG分类的新颖改编。通过将EEG视为类似视频的序列,EEG-VJEPA利用联合嵌入和自适应掩码学习语义上有意义的时空表示。据我们所知,这是首个利用V-JEPA进行EEG分类并探索模型所学视觉概念的工作。对公开可用的Temple University Hospital(TUH)异常EEG数据集的评估显示,EEG-VJEPA在分类准确性方面优于现有的最先进模型。除分类准确性外,EEG-VJEPA还捕捉了具有生理相关性的空间和时间信号模式,提供可解释的嵌入,有望在诊断工作流中支持人机协作。这些发现将EEG-VJEPA定位为在现实临床环境中进行可扩展、可信赖EEG分析的有前景框架。

更新时间: 2025-07-09 15:43:06

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.03633v3

Bayesian Invariance Modeling of Multi-Environment Data

Invariant prediction [Peters et al., 2016] analyzes feature/outcome data from multiple environments to identify invariant features - those with a stable predictive relationship to the outcome. Such features support generalization to new environments and help reveal causal mechanisms. Previous methods have primarily tackled this problem through hypothesis testing or regularized optimization. Here we develop Bayesian Invariant Prediction (BIP), a probabilistic model for invariant prediction. BIP encodes the indices of invariant features as a latent variable and recover them by posterior inference. Under the assumptions of Peters et al. [2016], the BIP posterior targets the true invariant features. We prove that the posterior is consistent and that greater environment heterogeneity leads to faster posterior contraction. To handle many features, we design an efficient variational approximation called VI-BIP. In simulations and real data, we find that BIP and VI-BIP are more accurate and scalable than existing methods for invariant prediction.

Updated: 2025-07-09 15:42:31

标题: 贝叶斯不变性建模多环境数据

摘要: 不变预测[Peters等人,2016]分析来自多个环境的特征/结果数据,以识别不变特征 - 那些与结果之间具有稳定预测关系的特征。这些特征支持对新环境的泛化,并有助于揭示因果机制。先前的方法主要通过假设检验或正则化优化来解决这个问题。在这里,我们开发了贝叶斯不变预测(BIP),这是一个用于不变预测的概率模型。BIP将不变特征的指标编码为潜在变量,并通过后验推断来恢复它们。在Peters等人[2016]的假设下,BIP后验目标是真正的不变特征。我们证明后验是一致的,并且更大的环境异质性导致更快的后验收缩。为了处理许多特征,我们设计了一种称为VI-BIP的高效变分近似。在模拟和真实数据中,我们发现BIP和VI-BIP比现有的不变预测方法更准确和可扩展。

更新时间: 2025-07-09 15:42:31

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2506.22675v3

CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale

Vision-language models (VLMs) are prone to hallucinations that critically compromise reliability in medical applications. While preference optimization can mitigate these hallucinations through clinical feedback, its implementation faces challenges such as clinically irrelevant training samples, imbalanced data distributions, and prohibitive expert annotation costs. To address these challenges, we introduce CheXPO, a Chest X-ray Preference Optimization strategy that combines confidence-similarity joint mining with counterfactual rationale. Our approach begins by synthesizing a unified, fine-grained multi-task chest X-ray visual instruction dataset across different question types for supervised fine-tuning (SFT). We then identify hard examples through token-level confidence analysis of SFT failures and use similarity-based retrieval to expand hard examples for balancing preference sample distributions, while synthetic counterfactual rationales provide fine-grained clinical preferences, eliminating the need for additional expert input. Experiments show that CheXPO achieves 8.93% relative performance gain using only 5% of SFT samples, reaching state-of-the-art performance across diverse clinical tasks and providing a scalable, interpretable solution for real-world radiology applications.

Updated: 2025-07-09 15:40:18

标题: CheXPO:具有反事实理由的胸部X射线VLM的偏好优化

摘要: 视觉语言模型(VLMs)容易产生幻觉,严重影响医学应用的可靠性。虽然偏好优化可以通过临床反馈缓解这些幻觉,但其实施面临挑战,如临床不相关的训练样本、数据分布不平衡和过高的专家标注成本。为了解决这些挑战,我们引入了CheXPO,一种结合置信度-相似度联合挖掘与反事实理由的胸部X射线偏好优化策略。我们的方法首先合成一个统一的、细粒度的、覆盖不同问题类型的多任务胸部X射线视觉指令数据集,用于监督微调(SFT)。然后,我们通过对SFT失败案例进行词元级置信度分析来识别难例,并利用基于相似度的检索扩充难例以平衡偏好样本分布,而合成的反事实理由提供细粒度的临床偏好,消除了对额外专家输入的需求。实验证明,CheXPO仅使用5%的SFT样本就实现了8.93%的相对性能增益,在各种临床任务上达到最先进水平,并为现实世界的放射学应用提供了可扩展、可解释的解决方案。

更新时间: 2025-07-09 15:40:18

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.06959v1

What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models

Foundation models are premised on the idea that sequence prediction can uncover deeper domain understanding, much like how Kepler's predictions of planetary motion later led to the discovery of Newtonian mechanics. However, evaluating whether these models truly capture deeper structure remains a challenge. We develop a technique for evaluating foundation models that examines how they adapt to synthetic datasets generated from some postulated world model. Our technique measures whether the foundation model's inductive bias aligns with the world model, and so we refer to it as an inductive bias probe. Across multiple domains, we find that foundation models can excel at their training tasks yet fail to develop inductive biases towards the underlying world model when adapted to new tasks. We particularly find that foundation models trained on orbital trajectories consistently fail to apply Newtonian mechanics when adapted to new physics tasks. Further analysis reveals that these models behave as if they develop task-specific heuristics that fail to generalize.

Updated: 2025-07-09 15:36:15

标题: 基于归纳偏差探测世界模型:基础模型发现了什么?

摘要: 基础模型建立在这样一个想法之上:序列预测可以揭示更深层的领域理解,就像开普勒对行星运动的预测后来促成了牛顿力学的发现一样。然而,评估这些模型是否真正捕捉到更深层的结构仍然是一个挑战。我们开发了一种评估基础模型的技术,该技术检查它们如何适应由某个假设的世界模型生成的合成数据集。我们的技术衡量基础模型的归纳偏差是否与该世界模型一致,因此我们将其称为归纳偏差探针。在多个领域中,我们发现基础模型可以在其训练任务上表现出色,但在适应新任务时未能形成朝向底层世界模型的归纳偏差。我们特别发现,在轨道轨迹上训练的基础模型在适应新的物理任务时始终未能应用牛顿力学。进一步的分析揭示,这些模型表现得好像发展出了无法泛化的任务特定启发式方法。

更新时间: 2025-07-09 15:36:15

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.06952v1

Generating Heterogeneous Multi-dimensional Data : A Comparative Study

Allocation of personnel and material resources is highly sensitive in the case of firefighter interventions. This allocation relies on simulations to experiment with various scenarios. The main objective of this allocation is the global optimization of the firefighters' response. Data generation is then mandatory to study various scenarios. In this study, we propose to compare different data generation methods. Methods such as Random Sampling, Tabular Variational Autoencoders, standard Generative Adversarial Networks, Conditional Tabular Generative Adversarial Networks, and Diffusion Probabilistic Models are examined to ascertain their efficacy in capturing the intricacies of firefighter interventions. Traditional evaluation metrics often fall short in capturing the nuanced requirements of synthetic datasets for real-world scenarios. To address this gap, an evaluation of synthetic data quality is conducted using a combination of domain-specific metrics tailored to the firefighting domain and standard measures such as the Wasserstein distance. Domain-specific metrics include response time distribution, spatio-temporal distribution of interventions, and accident representation. These metrics are designed to assess data variability, the preservation of fine and complex correlations and of anomalies such as events with very low occurrence, conformity with the initial statistical distribution, and the operational relevance of the synthetic data. The distribution has the particularity of being highly unbalanced, with none of the variables following a Gaussian distribution, adding complexity to the data generation process.
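One of the standard measures mentioned above, the Wasserstein distance, has a particularly simple form in one dimension: for two equal-size samples it reduces to the mean absolute difference of the sorted values. The helper below sketches that shortcut; a real evaluation would pair it with the domain-specific metrics (response times, spatio-temporal spread) the abstract describes.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 (earth mover's) distance between two equal-size 1-D samples.

    Sorting both samples gives the optimal monotone matching, so W1 is just
    the average absolute gap between matched order statistics.
    """
    assert len(xs) == len(ys), "this shortcut requires equal sample sizes"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

Unlike moment-based comparisons, this distance stays meaningful for the highly unbalanced, non-Gaussian marginals the abstract notes, which is presumably why it is used alongside the domain metrics.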

Updated: 2025-07-09 15:32:12

标题: 生成异构多维数据:一项比较研究

摘要: 人员和物资资源的分配在消防员干预中非常敏感。这种分配依赖于模拟来尝试各种情景。这种分配的主要目标是全面优化消防员的响应。数据生成是研究各种情景的必要条件。在这项研究中,我们提出比较不同的数据生成方法。方法包括随机抽样、表格变分自动编码器、标准生成对抗网络、条件表格生成对抗网络和扩散概率模型,用以确定它们在捕捉消防员干预复杂性方面的有效性。传统评估指标通常无法完全满足合成数据集在真实场景中的微妙需求。为了弥补这一差距,使用领域特定指标结合标准度量如Wasserstein距离进行了合成数据质量评估。领域特定指标包括响应时间分布、干预的时空分布和事故表征。这些指标旨在评估数据的变异性、对细微和复杂相关性以及罕见事件的保存,以及与初始统计分布的一致性和合成数据的运营相关性。分布具有高度不平衡的特点,没有任何变量遵循高斯分布,这增加了数据生成过程的复杂性。

更新时间: 2025-07-09 15:32:12

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.00090v2

Representative Ranking for Deliberation in the Public Sphere

Online comment sections, such as those on news sites or social media, have the potential to foster informal public deliberation, However, this potential is often undermined by the frequency of toxic or low-quality exchanges that occur in these settings. To combat this, platforms increasingly leverage algorithmic ranking to facilitate higher-quality discussions, e.g., by using civility classifiers or forms of prosocial ranking. Yet, these interventions may also inadvertently reduce the visibility of legitimate viewpoints, undermining another key aspect of deliberation: representation of diverse views. We seek to remedy this problem by introducing guarantees of representation into these methods. In particular, we adopt the notion of justified representation (JR) from the social choice literature and incorporate a JR constraint into the comment ranking setting. We find that enforcing JR leads to greater inclusion of diverse viewpoints while still being compatible with optimizing for user engagement or other measures of conversational quality.
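The justified representation (JR) condition borrowed from social choice can be stated operationally: a selection of k comments violates JR if some group of at least n/k users all approve a common left-out comment while none of them approves anything that was selected. The checker below sketches that standard definition; variable names are illustrative, and the paper's actual ranking constraint may be formulated differently.

```python
def satisfies_jr(approvals, committee, k):
    """Check justified representation for an approval profile.

    approvals: list of per-user sets of approved items;
    committee: the k items selected for display; k: committee size.
    """
    n = len(approvals)
    committee = set(committee)
    candidates = set().union(*approvals) - committee
    for c in candidates:
        # users who approve c but gained nothing from the selection
        unserved = sum(1 for a in approvals if c in a and not (a & committee))
        if unserved >= n / k:
            return False  # a cohesive group of size >= n/k is unrepresented
    return True
```

Enforcing this as a constraint while optimizing an engagement or civility score is the combination the abstract argues keeps diverse viewpoints visible.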

Updated: 2025-07-09 15:22:44

标题: 在公共领域中进行审议的代表性排名

摘要: 在线评论区,比如新闻网站或社交媒体上的评论区,有潜力促进非正式的公共审议。然而,这种潜力常常因这些环境中频繁出现的有毒或低质量交流而被削弱。为了应对这一问题,平台越来越多地利用算法排序来促进更高质量的讨论,例如使用文明性分类器或某种亲社会排序。然而,这些干预措施也可能无意中降低合理观点的可见性,削弱审议的另一个关键方面:多元观点的代表性。我们试图通过在这些方法中引入代表性保证来解决这一问题。具体而言,我们采纳了社会选择文献中的合理代表性(JR)概念,并将JR约束纳入评论排序设置。我们发现,强制满足JR能带来对多元观点更大的包容性,同时仍与优化用户参与度或其他对话质量度量相兼容。

更新时间: 2025-07-09 15:22:44

领域: cs.SI,cs.LG

下载: http://arxiv.org/abs/2503.18962v2

ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease

Computational pathology (CoPath) leverages histopathology images to enhance diagnostic precision and reproducibility in clinical pathology. However, publicly available datasets for CoPath that are annotated with extensive histological tissue type (HTT) taxonomies at a granular level remain scarce due to the significant expertise and high annotation costs required. Existing datasets, such as the Atlas of Digital Pathology (ADP), address this by offering diverse HTT annotations generalized to multiple organs, but limit the capability for in-depth studies on specific organ diseases. Building upon this foundation, we introduce ADPv2, a novel dataset focused on gastrointestinal histopathology. Our dataset comprises 20,004 image patches derived from healthy colon biopsy slides, annotated according to a hierarchical taxonomy of 32 distinct HTTs of 3 levels. Furthermore, we train a multilabel representation learning model following a two-stage training procedure on our ADPv2 dataset. We leverage the VMamba architecture and achieving a mean average precision (mAP) of 0.88 in multilabel classification of colon HTTs. Finally, we show that our dataset is capable of an organ-specific in-depth study for potential biomarker discovery by analyzing the model's prediction behavior on tissues affected by different colon diseases, which reveals statistical patterns that confirm the two pathological pathways of colon cancer development. Our dataset is publicly available at https://zenodo.org/records/15307021

Updated: 2025-07-09 15:16:20

标题: ADPv2:一个用于结直肠疾病潜在生物标志物发现的分层组织类型注释数据集

摘要: 计算病理学(CoPath)利用组织病理学图像提升临床病理学诊断的精度和可重复性。然而,由于需要大量专业知识且标注成本高昂,以细粒度组织学组织类型(HTT)分类体系进行标注的公开CoPath数据集仍然稀缺。现有数据集,如数字病理学图谱(ADP),通过提供泛化到多个器官的多样化HTT标注来解决这一问题,但限制了对特定器官疾病进行深入研究的能力。在此基础上,我们介绍了ADPv2,一个专注于胃肠病理学的新数据集。我们的数据集包括来自健康结肠活检切片的20,004个图像块,按照包含3个层级、32种不同HTT的层次分类体系进行标注。此外,我们在ADPv2数据集上采用两阶段训练流程训练了一个多标签表示学习模型。我们利用VMamba架构,在结肠HTT的多标签分类中实现了0.88的平均精度均值(mAP)。最后,通过分析模型在受不同结肠疾病影响的组织上的预测行为,我们展示了该数据集能够支持面向潜在生物标志物发现的器官特异性深入研究;所揭示的统计模式印证了结肠癌发展的两条病理途径。我们的数据集可在https://zenodo.org/records/15307021公开获取。

更新时间: 2025-07-09 15:16:20

领域: eess.IV,cs.CV,cs.LG,q-bio.QM,I.2.10; I.2.1

下载: http://arxiv.org/abs/2507.05656v2

DICE: Data Influence Cascade in Decentralized Learning

Decentralized learning offers a promising approach to crowdsource data consumptions and computational workloads across geographically distributed compute interconnected through peer-to-peer networks, accommodating the exponentially increasing demands. However, proper incentives are still in absence, considerably discouraging participation. Our vision is that a fair incentive mechanism relies on fair attribution of contributions to participating nodes, which faces non-trivial challenges arising from the localized connections making influence ``cascade'' in a decentralized network. To overcome this, we design the first method to estimate \textbf{D}ata \textbf{I}nfluence \textbf{C}ascad\textbf{E} (DICE) in a decentralized environment. Theoretically, the framework derives tractable approximations of influence cascade over arbitrary neighbor hops, suggesting the influence cascade is determined by an interplay of data, communication topology, and the curvature of loss landscape. DICE also lays the foundations for applications including selecting suitable collaborators and identifying malicious behaviors. Project page is available at https://raiden-zhu.github.io/blog/2025/DICE/.

Updated: 2025-07-09 15:13:44

标题: DICE:分布式学习中的数据影响级联

摘要: 分散式学习提供了一种有前途的方法,通过点对点网络在地理上分布的计算机之间共享数据消耗和计算负载,以满足指数增长的需求。然而,适当的激励机制仍然缺乏,严重地阻止了参与。我们的愿景是,一个公平的激励机制依赖于对参与节点的贡献进行公平归因,这面临着由分布式网络中的局部连接产生的对影响的``级联''的不容易的挑战。为了克服这一问题,我们设计了第一种在分散式环境中估算数据影响级联 (DICE) 的方法。在理论上,该框架推导出对任意邻居跳数的影响级联的可处理近似,表明影响级联由数据、通信拓扑和损失景观的曲率的相互作用决定。DICE还为选择合适的合作伙伴和识别恶意行为等应用奠定了基础。项目页面可在 https://raiden-zhu.github.io/blog/2025/DICE/ 上找到。

更新时间: 2025-07-09 15:13:44

领域: cs.LG,cs.DC,cs.MA,cs.SI,stat.ML

下载: http://arxiv.org/abs/2507.06931v1
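
The building block behind gradient-based data-influence estimates (which DICE extends to multi-hop cascades over the topology) is the first-order approximation: one SGD step on a training example changes a downstream loss by roughly the learning rate times the inner product of the training and evaluation gradients. A toy single-hop sketch with a linear model and squared loss, using invented data, not the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy linear model with squared loss; we estimate how one training example
# on node i would influence node j's validation loss after a local step.
w = rng.normal(size=3)
x_i, y_i = rng.normal(size=3), 1.0      # node i's training example
x_val, y_val = rng.normal(size=3), 0.0  # node j's validation example

def loss(w, x, y):
    return 0.5 * (w @ x - y) ** 2

def grad(w, x, y):
    return (w @ x - y) * x

# First-order influence proxy: lr * <g_train, g_val>; influence over
# several hops would chain such terms through the communication graph.
lr = 0.001
influence = lr * (grad(w, x_i, y_i) @ grad(w, x_val, y_val))

w_after = w - lr * grad(w, x_i, y_i)
actual_drop = loss(w, x_val, y_val) - loss(w_after, x_val, y_val)
print(abs(actual_drop - influence) < 1e-3)
```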

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

Recent advancements in reasoning with large language models (RLLMs), such as OpenAI-O1 and DeepSeek-R1, have demonstrated their impressive capabilities in complex domains like mathematics and coding. A central factor in their success lies in the application of long chain-of-thought (Long CoT) characteristics, which enhance reasoning abilities and enable the solution of intricate problems. However, despite these developments, a comprehensive survey on Long CoT is still lacking, limiting our understanding of its distinctions from traditional short chain-of-thought (Short CoT) and complicating ongoing debates on issues like "overthinking" and "inference-time scaling." This survey seeks to fill this gap by offering a unified perspective on Long CoT. (1) We first distinguish Long CoT from Short CoT and introduce a novel taxonomy to categorize current reasoning paradigms. (2) Next, we explore the key characteristics of Long CoT: deep reasoning, extensive exploration, and feasible reflection, which enable models to handle more complex tasks and produce more efficient, coherent outcomes compared to the shallower Short CoT. (3) We then investigate key phenomena such as the emergence of Long CoT with these characteristics, including overthinking, and inference-time scaling, offering insights into how these processes manifest in practice. (4) Finally, we identify significant research gaps and highlight promising future directions, including the integration of multi-modal reasoning, efficiency improvements, and enhanced knowledge frameworks. By providing a structured overview, this survey aims to inspire future research and further the development of logical reasoning in artificial intelligence.

Updated: 2025-07-09 15:13:24

标题: 走向推理时代:推理大型语言模型的长推理链调查

摘要: 最近在利用大型语言模型(RLLMs),如OpenAI-O1和DeepSeek-R1,进行推理方面取得了重大进展,展示了它们在数学和编码等复杂领域的令人印象深刻的能力。它们成功的一个关键因素在于应用长链式思维(Long CoT)特征,这些特征增强了推理能力,使其能够解决复杂问题。然而,尽管取得了这些发展,长链式思维的全面调查仍然缺乏,限制了我们对其与传统短链式思维(Short CoT)的区别的理解,并使关于“过度思考”和“推理时间扩展”等问题的持续辩论变得复杂。本调查试图通过提供统一的长链式思维视角来填补这一空白。(1)我们首先区分长链式思维和短链式思维,并引入一种新的分类体系来对当前的推理范式进行分类。(2)接下来,我们探讨长链式思维的关键特征:深层推理、广泛探索和可行的反思,这些特征使模型能够处理更复杂的任务,并产生比较浅层短链式思维更高效、连贯的结果。(3)然后,我们研究了具有这些特征的长链式思维的关键现象,包括过度思考和推理时间扩展,提供了关于这些过程在实践中如何表现的见解。(4)最后,我们确定了重要的研究空白,并强调了有前途的未来方向,包括多模态推理的整合、效率改进和增强的知识框架。通过提供结构化的概述,本调查旨在激发未来研究,并推动人工智能中逻辑推理的发展。

更新时间: 2025-07-09 15:13:24

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.09567v4

Machine-Learned Force Fields for Lattice Dynamics at Coupled-Cluster Level Accuracy

We investigate Machine-Learned Force Fields (MLFFs) trained on approximate Density Functional Theory (DFT) and Coupled Cluster (CC) level potential energy surfaces for the carbon diamond and lithium hydride solids. We assess the accuracy and precision of the MLFFs by calculating phonon dispersions and vibrational densities of states (VDOS) that are compared to experiment and reference ab initio results. To overcome limitations from long-range effects and the lack of atomic forces in the CC training data, a delta-learning approach based on the difference between CC and DFT results is explored. Compared to DFT, MLFFs trained on CC theory yield higher vibrational frequencies for optical modes, agreeing better with experiment. Furthermore, the MLFFs are used to estimate anharmonic effects on the VDOS of lithium hydride at the level of CC theory.

Updated: 2025-07-09 15:11:55

标题: 机器学习力场用于耦合簇水平准确度的晶格动力学

摘要: 我们研究了在近似密度泛函理论(DFT)和耦合簇(CC)水平的势能曲面上训练的机器学习力场(MLFFs)在碳金刚石和氢化锂固体中。我们通过计算声子色散和振动态密度(VDOS)来评估MLFFs的准确性和精度,与实验和参考从头算结果进行比较。为了克服CC训练数据中长程效应和缺乏原子力的局限性,我们探索了基于CC和DFT结果之间差异的δ学习方法。与DFT相比,训练在CC理论上的MLFFs对光学模式产生更高的振动频率,与实验更为一致。此外,MLFFs被用来估计在CC理论水平上对氢化锂VDOS的非谐效应。

更新时间: 2025-07-09 15:11:55

领域: cond-mat.mtrl-sci,cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2507.06929v1
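
The delta-learning idea described above (model only the CC-minus-DFT difference, then add it back onto the cheap baseline) can be sketched with a synthetic 1-D potential energy surface. The descriptors, surfaces, and ridge regressor here are all invented stand-ins for the MLFF machinery:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D descriptor (e.g. a bond-length feature) and synthetic energies.
x = rng.uniform(0.8, 1.6, size=40)
X = np.vander(x, 4)                   # polynomial feature map
e_dft = 2.0 * (x - 1.1) ** 2          # cheap baseline surface
e_cc = e_dft + 0.15 * np.sin(3 * x)   # reference surface with a small shift

def ridge_fit(X, y, lam=1e-6):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Delta-learning: fit only the CC-minus-DFT correction, which is smoother
# and cheaper to learn than the full CC surface.
w_delta = ridge_fit(X, e_cc - e_dft)

x_test = np.array([1.0, 1.3])
X_test = np.vander(x_test, 4)
e_pred = 2.0 * (x_test - 1.1) ** 2 + X_test @ w_delta
e_true = 2.0 * (x_test - 1.1) ** 2 + 0.15 * np.sin(3 * x_test)
print(np.max(np.abs(e_pred - e_true)) < 0.02)
```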

Are NFTs Ready to Keep Australian Artists Engaged?

Non-Fungible Tokens (NFTs) offer a promising mechanism to protect Australian and Indigenous artists' copyright. They represent and transfer the value of artwork in digital form. Before adopting NFTs to protect Australian artwork, in this paper we investigate them empirically. We focus on examining the details of NFT structure. We start from the underlying structure of NFTs to show how they represent copyright for both artists and production owners, as well as how they aim to safeguard or secure the value of digital artworks. We then collect data from various types of sources with different storage methods, including on-chain, centralized, and decentralized systems. Based on both metadata and artwork content, we present our analysis and discussion of the following key issues: copyright, security, and artist identification. The final results of the evaluation, unfortunately, show that NFTs are NOT ready to protect Australian and Indigenous artists' copyright.

Updated: 2025-07-09 15:07:17

标题: NFTs准备好吸引澳大利亚艺术家了吗?

摘要: 非同质化代币(NFT)为保护澳大利亚和土著艺术家的版权提供了一种有前途的机制。它们以数字形式代表和转移艺术品的价值。在采用NFT来保护澳大利亚艺术品之前,我们在本文中对其进行了实证调查。我们重点研究NFT结构的细节。我们从NFT的基本结构开始,展示它们如何代表艺术家和制作所有者的版权,以及它们如何旨在保护或确保数字艺术品的价值。然后我们从各种存储方法的不同来源进行数据收集,包括链上、集中式和分散式系统。基于元数据和艺术作品内容,我们对版权、安全性和艺术家识别等关键问题进行了分析和讨论。最终评估结果不幸地显示,NFT并不准备好保护澳大利亚和土著艺术家的版权。

更新时间: 2025-07-09 15:07:17

领域: cs.CR,cs.CY,cs.ET

下载: http://arxiv.org/abs/2507.06926v1

Protecting Classifiers From Attacks

In multiple domains such as malware detection, automated driving systems, or fraud detection, classification algorithms are susceptible to being attacked by malicious agents willing to perturb the value of instance covariates to pursue certain goals. Such problems pertain to the field of adversarial machine learning and have been mainly dealt with, perhaps implicitly, through game-theoretic ideas with strong underlying common knowledge assumptions. These are not realistic in numerous application domains in relation to security and business competition. We present an alternative Bayesian decision theoretic framework that accounts for the uncertainty about the attacker's behavior using adversarial risk analysis concepts. In doing so, we also present core ideas in adversarial machine learning to a statistical audience. A key ingredient in our framework is the ability to sample from the distribution of originating instances given the, possibly attacked, observed ones. We propose an initial procedure based on approximate Bayesian computation usable during operations; within it, we simulate the attacker's problem taking into account our uncertainty about his elements. Large-scale problems require an alternative scalable approach implementable during the training stage. Globally, we are able to robustify statistical classification algorithms against malicious attacks.

Updated: 2025-07-09 15:04:25

标题: 保护分类器免受攻击

摘要: 在多个领域,如恶意软件检测、自动驾驶系统或欺诈检测中,分类算法容易受到恶意攻击,攻击者愿意扰乱实例协变量的值以追求特定目标。这些问题涉及对抗性机器学习领域,并且主要通过具有强大的基础普遍知识假设的博弈论观念来处理,可能是隐式的。这些方法在安全和商业竞争方面的众多应用领域中并不现实。我们提出了一种替代的基于贝叶斯决策理论的框架,该框架考虑了对攻击者行为的不确定性,使用对抗性风险分析概念。在这样做的过程中,我们还向统计学的受众介绍了对抗性机器学习的核心思想。我们框架中的一个关键因素是在给定,可能遭受攻击的观察实例的情况下,能够从原始实例的分布中进行抽样。我们提出了一种基于近似贝叶斯计算的初始程序,可在运行过程中使用;在其中,我们模拟攻击者的问题,考虑到我们对其元素的不确定性。大规模问题需要一种可在训练阶段实施的替代可扩展方法。总体而言,我们能够使统计分类算法对恶意攻击更加强健。

更新时间: 2025-07-09 15:04:25

领域: stat.ML,cs.CR,cs.LG,stat.CO

下载: http://arxiv.org/abs/2004.08705v2
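
The key ingredient named above (sampling originating instances given a possibly attacked observation) can be sketched with a one-dimensional approximate Bayesian computation loop. The prior, the attacker model, and the tolerance are all invented for illustration and are not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def attacker(x, rng):
    """Toy attacker model: shifts each covariate upward by a random amount."""
    return x + rng.uniform(0.0, 1.0, size=np.shape(x))

def abc_sample_originals(x_observed, n_draws=20000, tol=0.05):
    """ABC: draw candidate originals from the prior, push them through the
    (uncertain) attacker, and keep those whose attacked version lands near
    the instance we actually observed."""
    x_candidates = rng.normal(0.0, 1.0, size=n_draws)   # prior on originals
    x_attacked = attacker(x_candidates, rng)
    keep = np.abs(x_attacked - x_observed) < tol
    return x_candidates[keep]

posterior = abc_sample_originals(x_observed=1.2)
print(len(posterior), round(float(posterior.mean()), 2))
```

The accepted samples approximate the posterior over the clean covariate; a classifier can then be applied to these samples instead of the observed, possibly attacked, instance.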

Distribution-free inference for LightGBM and GLM with Tweedie loss

Prediction uncertainty quantification has been a key research topic in recent years for scientific and business problems. In the insurance industry (\cite{parodi2023pricing}), assessing the range of possible claim costs for individual drivers improves premium pricing accuracy. It also enables insurers to manage risk more effectively by accounting for uncertainty in accident likelihood and severity. In the presence of covariates, a variety of regression-type models are often used for modeling insurance claims, ranging from relatively simple generalized linear models (GLMs) to regularized GLMs to gradient boosting models (GBMs). Conformal predictive inference has arisen as a popular distribution-free approach for quantifying predictive uncertainty under relatively weak assumptions of exchangeability, and has been well studied under the classic linear regression setting. In this work, we propose new non-conformity measures for GLMs and GBMs with GLM-type loss. Using regularized Tweedie GLM regression and LightGBM with Tweedie loss, we demonstrate conformal prediction performance with these non-conformity measures on insurance claims data. Our simulation results favor the use of locally weighted Pearson residuals for LightGBM over the other methods considered, as the resulting intervals maintained the nominal coverage with the smallest average width.

Updated: 2025-07-09 14:58:54

标题: 基于Tweedie损失的LightGBM和GLM的无分布推断

摘要: 预测不确定性量化是近年来科学和商业问题中的关键研究课题。在保险行业中,评估个体驾驶员可能索赔成本范围可以提高保费定价的准确性。这也使保险公司能够更有效地管理风险,考虑事故发生可能性和严重性的不确定性。在存在协变量的情况下,通常会使用各种回归类型模型来建模保险索赔,从相对简单的广义线性模型(GLMs)到正则化的GLMs到梯度提升模型(GBMs)。符合性预测推断已成为一种流行的无分布方法,用于在相对弱的可交换性假设下量化预测不确定性,并且在经典线性回归设置下得到了良好的研究。在这项工作中,我们提出了GLMs和带有GLM类型损失的GBMs的新非符合性度量。使用正则化Tweedie GLM回归和带有Tweedie损失的LightGBM,我们展示了这些非符合性度量在保险索赔数据中的符合性预测性能。我们的模拟结果支持在LightGBM中使用局部加权皮尔逊残差而不是考虑的其他方法,因为由此产生的区间保持了名义覆盖率,并且具有最小的平均宽度。

更新时间: 2025-07-09 14:58:54

领域: stat.ML,cs.LG,Application to insurance data, Methodology

下载: http://arxiv.org/abs/2507.06921v1
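
Split conformal prediction with a locally weighted non-conformity score can be sketched as follows. The heteroscedastic data and the closed-form "fitted" mean and dispersion functions are invented stand-ins for the LightGBM/GLM models the paper actually uses:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic heteroscedastic data: noise grows with x, as claim severity might.
n = 4000
x = rng.uniform(0.0, 4.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.2 + 0.3 * x)

# Toy point and dispersion models (stand-ins for the fitted learners).
mu = lambda x: 2.0 + 0.5 * x
sigma = lambda x: 0.2 + 0.3 * x

# Split conformal: calibrate locally weighted residuals on one half.
x_cal, y_cal, x_test, y_test = x[:2000], y[:2000], x[2000:], y[2000:]
scores = np.abs(y_cal - mu(x_cal)) / sigma(x_cal)
alpha = 0.1
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q = np.sort(scores)[k - 1]          # finite-sample-corrected quantile

# Intervals widen where the dispersion model says the data are noisier.
lo = mu(x_test) - q * sigma(x_test)
hi = mu(x_test) + q * sigma(x_test)
coverage = float(np.mean((y_test >= lo) & (y_test <= hi)))
print(round(coverage, 2))
```

The locally weighted score is what lets the intervals stay narrow in low-noise regions while keeping the marginal coverage guarantee.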

Beyond Connectivity: An Open Architecture for AI-RAN Convergence in 6G

The proliferation of data-intensive Artificial Intelligence (AI) applications at the network edge demands a fundamental shift in RAN design, from merely consuming AI for network optimization, to actively enabling distributed AI workloads. This paradigm shift presents a significant opportunity for network operators to monetize AI at the edge while leveraging existing infrastructure investments. To realize this vision, this article presents a novel converged O-RAN and AI-RAN architecture that unifies orchestration and management of both telecommunications and AI workloads on shared infrastructure. The proposed architecture extends the Open RAN principles of modularity, disaggregation, and cloud-nativeness to support heterogeneous AI deployments. We introduce two key architectural innovations: (i) the AI-RAN Orchestrator, which extends the O-RAN Service Management and Orchestration (SMO) to enable integrated resource allocation across RAN and AI workloads; and (ii) AI-RAN sites that provide distributed edge AI platforms with real-time processing capabilities. The proposed system supports flexible deployment options, allowing AI workloads to be orchestrated with specific timing requirements (real-time or batch processing) and geographic targeting. The proposed architecture addresses the orchestration requirements for managing heterogeneous workloads at different time scales while maintaining open, standardized interfaces and multi-vendor interoperability.

Updated: 2025-07-09 14:49:11

标题: 超越连接性:6G中AI-RAN融合的开放架构

摘要: 数据密集型人工智能(AI)应用在网络边缘的蔓延要求基本转变RAN设计,从仅仅消耗AI进行网络优化,到积极支持分布式AI工作负载。这种范式转变为网络运营商提供了重要机会,在边缘实现AI的商业化同时利用现有基础设施投资。为实现这一愿景,本文提出了一种新颖的融合O-RAN和AI-RAN架构,统一了电信和AI工作负载的编排和管理,共享基础设施。提出的架构将开放RAN的模块化、分解和云本性原则扩展到支持异构AI部署。我们引入了两个关键的架构创新:(i)AI-RAN编排器,扩展了O-RAN服务管理和编排(SMO)以实现RAN和AI工作负载之间的集成资源分配;(ii)AI-RAN站点提供了具有实时处理能力的分布式边缘AI平台。提出的系统支持灵活的部署选项,允许AI工作负载与特定的时间要求(实时或批处理)和地理定位进行编排。提出的架构满足了管理不同时间尺度上的异构工作负载的编排需求,同时保持开放、标准化的接口和多供应商互操作性。

更新时间: 2025-07-09 14:49:11

领域: cs.NI,cs.AI,eess.SP

下载: http://arxiv.org/abs/2507.06911v1

MultiJustice: A Chinese Dataset for Multi-Party, Multi-Charge Legal Prediction

Legal judgment prediction offers a compelling method to aid legal practitioners and researchers. However, the research question remains relatively under-explored: Should multiple defendants and charges be treated separately in LJP? To address this, we introduce a new dataset namely multi-person multi-charge prediction (MPMCP), and seek the answer by evaluating the performance of several prevailing legal large language models (LLMs) on four practical legal judgment scenarios: (S1) single defendant with a single charge, (S2) single defendant with multiple charges, (S3) multiple defendants with a single charge, and (S4) multiple defendants with multiple charges. We evaluate the dataset across two LJP tasks, i.e., charge prediction and penalty term prediction. We have conducted extensive experiments and found that the scenario involving multiple defendants and multiple charges (S4) poses the greatest challenges, followed by S2, S3, and S1. The impact varies significantly depending on the model. For example, in S4 compared to S1, InternLM2 achieves approximately 4.5% lower F1-score and 2.8% higher LogD, while Lawformer demonstrates around 19.7% lower F1-score and 19.0% higher LogD. Our dataset and code are available at https://github.com/lololo-xiao/MultiJustice-MPMCP.

Updated: 2025-07-09 14:47:00

标题: MultiJustice:一个用于多方参与、多指控法律预测的中文数据集

摘要: 法律判决预测为法律从业者和研究人员提供了一种有力的方法。然而,研究问题仍然相对未被充分探讨:在LJP中是否应该分别处理多个被告和指控?为了解决这个问题,我们引入了一个新的数据集,即多人多罪行预测(MPMCP),并通过评估几种流行的法律大型语言模型(LLMs)在四种实际法律判决场景上的表现来寻找答案:(S1)单个被告一个指控,(S2)单个被告多个指控,(S3)多个被告一个指控,以及(S4)多个被告多个指控。我们评估了数据集在两个LJP任务中的表现,即指控预测和处罚条款预测。我们进行了广泛的实验,发现涉及多个被告和多个指控的场景(S4)面临着最大的挑战,其次是S2、S3和S1。影响因模型而异。例如,在S4中,相较于S1,InternLM2的F1分数约低4.5%,LogD高2.8%,而Lawformer的F1分数约低19.7%,LogD高19.0%。我们的数据集和代码可在https://github.com/lololo-xiao/MultiJustice-MPMCP 上获得。

更新时间: 2025-07-09 14:47:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.06909v1

MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection

The rapid expansion of memes on social media has highlighted the urgent need for effective approaches to detect harmful content. However, traditional data-driven approaches struggle to detect new memes due to their evolving nature and the lack of up-to-date annotated data. To address this issue, we propose MIND, a multi-agent framework for zero-shot harmful meme detection that does not rely on annotated data. MIND implements three key strategies: 1) We retrieve similar memes from an unannotated reference set to provide contextual information. 2) We propose a bi-directional insight derivation mechanism to extract a comprehensive understanding of similar memes. 3) We then employ a multi-agent debate mechanism to ensure robust decision-making through reasoned arbitration. Extensive experiments on three meme datasets demonstrate that our proposed framework not only outperforms existing zero-shot approaches but also shows strong generalization across different model architectures and parameter scales, providing a scalable solution for harmful meme detection. The code is available at https://github.com/destroy-lonely/MIND.

Updated: 2025-07-09 14:46:32

标题: MIND:用于零样本有害模因检测的多代理框架

摘要: 社交媒体上迅速扩散的模因突显了检测有害内容的紧迫需要。然而,传统的基于数据驱动的方法很难检测新的模因,因为它们的演变特性和缺乏最新的注释数据。为了解决这个问题,我们提出了MIND,一个用于零样本有害模因检测的多agent框架,不依赖于注释数据。MIND实施了三个关键策略:1)我们从未注释的参考集中检索相似的模因,以提供上下文信息。2)我们提出了一个双向洞察推导机制,以提取对相似模因的全面理解。3)然后,我们采用多agent辩论机制,通过理性仲裁确保鲁棒的决策。对三个模因数据集的广泛实验表明,我们提出的框架不仅优于现有的零样本方法,而且在不同的模型架构和参数规模上表现出强大的泛化能力,为有害模因检测提供了可扩展的解决方案。代码可在https://github.com/destroy-lonely/MIND找到。

更新时间: 2025-07-09 14:46:32

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.06908v1

Robust and Safe Traffic Sign Recognition using N-version with Weighted Voting

Autonomous driving is rapidly advancing as a key application of machine learning, yet ensuring the safety of these systems remains a critical challenge. Traffic sign recognition, an essential component of autonomous vehicles, is particularly vulnerable to adversarial attacks that can compromise driving safety. In this paper, we propose an N-version machine learning (NVML) framework that integrates a safety-aware weighted soft voting mechanism. Our approach utilizes Failure Mode and Effects Analysis (FMEA) to assess potential safety risks and assign dynamic, safety-aware weights to the ensemble outputs. We evaluate the robustness of three-version NVML systems employing various voting mechanisms against adversarial samples generated using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks. Experimental results demonstrate that our NVML approach significantly enhances the robustness and safety of traffic sign recognition systems under adversarial conditions.

Updated: 2025-07-09 14:46:31

标题: 使用加权投票的N版本技术实现健壮安全的交通标志识别

摘要: 自动驾驶作为机器学习的一个关键应用,正在迅速发展,然而确保这些系统的安全性仍然是一个关键挑战。交通标志识别作为自动驾驶汽车的一个重要组成部分,特别容易受到对抗性攻击,从而危及驾驶安全。在本文中,我们提出了一个N版本机器学习(NVML)框架,其中集成了一个安全意识加权软投票机制。我们的方法利用故障模式和效应分析(FMEA)来评估潜在的安全风险,并为合奏输出分配动态的、安全意识的权重。我们评估了采用不同投票机制的三版本NVML系统对抗性样本的鲁棒性,这些样本是使用快速梯度符号法(FGSM)和投影梯度下降(PGD)攻击生成的。实验结果表明,我们的NVML方法显著提高了交通标志识别系统在对抗条件下的鲁棒性和安全性。

更新时间: 2025-07-09 14:46:31

领域: cs.LG,cs.SE

下载: http://arxiv.org/abs/2507.06907v1
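
The safety-aware weighted soft voting mechanism described above can be sketched in a few lines. The softmax outputs and the FMEA-derived weights here are illustrative numbers, not values from the paper:

```python
import numpy as np

# Softmax outputs of three model versions for one traffic-sign image
# (classes: stop, yield, speed-limit). Probabilities are illustrative.
probs = np.array([
    [0.70, 0.20, 0.10],   # version A
    [0.10, 0.60, 0.30],   # version B
    [0.60, 0.25, 0.15],   # version C
])

# Safety-aware weights, e.g. derived from an FMEA risk assessment
# (higher weight = version judged more reliable for safety-critical signs).
weights = np.array([0.5, 0.2, 0.3])

fused = weights @ probs          # weighted soft vote over class probabilities
print(fused.round(3), int(fused.argmax()))
```

Because the weights sum to one, the fused vector remains a valid probability distribution over classes, and a single compromised version is down-weighted rather than able to flip the decision outright.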

Neural Canonical Polyadic Factorization for Traffic Analysis

Modern intelligent transportation systems rely on accurate spatiotemporal traffic analysis to optimize urban mobility and infrastructure resilience. However, pervasive missing data caused by sensor failures and heterogeneous sensing gaps fundamentally hinders reliable traffic modeling. This paper proposes a Neural Canonical Polyadic Factorization (NCPF) model that synergizes low-rank tensor algebra with deep representation learning for robust traffic data imputation. The model innovatively embeds CP decomposition into a neural architecture through learnable embedding projections, where sparse traffic tensors are encoded into dense latent factors across road segments, time intervals, and mobility metrics. A hierarchical feature fusion mechanism employs Hadamard products to explicitly model multilinear interactions, while stacked multilayer perceptron layers nonlinearly refine these representations to capture complex spatiotemporal couplings. Extensive evaluations on six urban traffic datasets demonstrate NCPF's superiority over six state-of-the-art baselines. By unifying CP decomposition's interpretable factor analysis with neural networks' nonlinear expressive power, NCPF provides a principled yet flexible approach for high-dimensional traffic data imputation, offering critical support for next-generation transportation digital twins and adaptive traffic control systems.

Updated: 2025-07-09 14:45:44

标题: 神经网络的规范多维分解用于交通分析

摘要: 现代智能交通系统依赖准确的时空交通分析来优化城市的流动性和基础设施的弹性。然而,由传感器故障和异质感知缺口引起的普遍缺失数据从根本上阻碍了可靠的交通建模。本文提出了一种神经典型多阶分解(NCPF)模型,将低秩张量代数与深度表示学习相结合,用于稳健的交通数据填充。该模型通过可学习的嵌入投影将CP分解嵌入神经架构中,将稀疏的交通张量编码为道路段、时间间隔和流动性指标之间的密集潜在因子。层次特征融合机制利用哈达玛积明确建模多线性交互作用,而叠加的多层感知器层非线性地优化这些表示,以捕捉复杂的时空耦合关系。对六个城市交通数据集的广泛评估显示,NCPF在六个最先进的基线模型上具有优越性。通过将CP分解的可解释因子分析与神经网络的非线性表达能力统一起来,NCPF为高维交通数据填充提供了一种原则性而灵活的方法,为下一代交通数字孪生体和自适应交通控制系统提供了关键支持。

更新时间: 2025-07-09 14:45:44

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2506.15079v3
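
The core computation (per-mode embeddings combined by a Hadamard product, then refined by an MLP instead of the plain sum that classical CP uses) can be sketched with randomly initialized parameters. Dimensions and the MLP head are invented; this is a forward-pass sketch, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(3)

n_roads, n_times, n_metrics, rank = 30, 24, 3, 8

# Learnable embeddings for each tensor mode (randomly initialized here).
E_road = rng.normal(size=(n_roads, rank))
E_time = rng.normal(size=(n_times, rank))
E_metric = rng.normal(size=(n_metrics, rank))

# Toy MLP head refining the Hadamard-fused factors.
W1 = rng.normal(size=(rank, 16)) * 0.1
W2 = rng.normal(size=(16, 1)) * 0.1

def predict(road, t, metric):
    """Neural CP: Hadamard product of mode embeddings, then an MLP.
    Classical CP would simply sum the Hadamard product's entries."""
    h = E_road[road] * E_time[t] * E_metric[metric]   # multilinear interaction
    return float(np.maximum(h @ W1, 0.0) @ W2)        # ReLU MLP refinement

y_hat = predict(5, 12, 1)
print(type(y_hat).__name__)
```

Training would fit the embeddings and MLP weights by minimizing reconstruction error on the observed tensor entries, then imputation reads off `predict` at the missing indices.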

Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams

Multi-dimensional data streams, prevalent in applications like IoT, financial markets, and real-time analytics, pose significant challenges due to their high velocity, unbounded nature, and complex inter-dimensional dependencies. Sliding window techniques are critical for processing such streams, but fixed-size windows struggle to adapt to dynamic changes like concept drift or bursty patterns. This paper proposes a novel reinforcement learning (RL)-based approach to dynamically optimize sliding window sizes for multi-dimensional data streams. By formulating window size selection as an RL problem, we enable an agent to learn an adaptive policy based on stream characteristics, such as variance, correlations, and temporal trends. Our method, RL-Window, leverages a Dueling Deep Q-Network (DQN) with prioritized experience replay to handle non-stationarity and high-dimensionality. Evaluations on benchmark datasets (UCI HAR, PAMAP2, Yahoo! Finance Stream) demonstrate that RL-Window outperforms state-of-the-art methods like ADWIN and CNN-Adaptive in classification accuracy, drift robustness, and computational efficiency. Additional qualitative analyses, extended metrics (e.g., energy efficiency, latency), and a comprehensive dataset characterization further highlight its adaptability and stability, making it suitable for real-time applications.

Updated: 2025-07-09 14:40:35

标题: 设计基于强化学习的自适应算法,用于多维数据流中滑动窗口大小的动态优化

摘要: 多维数据流在应用程序中普遍存在,如物联网、金融市场和实时分析,由于其高速度、无界性和复杂的跨维度依赖关系,面临着重大挑战。滑动窗口技术对于处理这种数据流至关重要,但固定大小的窗口很难适应概念漂移或突发模式等动态变化。本文提出了一种基于强化学习(RL)的新颖方法,用于动态优化多维数据流的滑动窗口大小。通过将窗口大小选择形式化为一个RL问题,我们使代理能够根据流特征(如方差、相关性和时间趋势)学习自适应策略。我们的方法,RL-Window,利用了一种具有优先经验重播的Dueling Deep Q-Network (DQN)来处理非稳态和高维度数据。对基准数据集(UCI HAR、PAMAP2、Yahoo! Finance Stream)的评估表明,RL-Window在分类准确性、漂移鲁棒性和计算效率方面优于ADWIN和CNN-Adaptive等最新方法。额外的定性分析、扩展指标(如能效、延迟)和全面的数据集特征进一步突显了其适应性和稳定性,使其适用于实时应用。

更新时间: 2025-07-09 14:40:35

领域: cs.LG

下载: http://arxiv.org/abs/2507.06901v1
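
The idea of learning a window-size policy with RL can be illustrated with a deliberately minimal single-state sketch (a bandit-style Q table rather than the paper's Dueling DQN with prioritized replay); the reward function and drift schedule are invented:

```python
import numpy as np

rng = np.random.default_rng(4)

window_sizes = [16, 32, 64, 128]        # discrete actions
q_values = np.zeros(len(window_sizes))  # single-state Q table
eps, lr = 0.1, 0.1

def reward(action, drift):
    """Toy reward: small windows pay off under drift, large ones otherwise."""
    return float(action == 0) if drift else float(action == len(window_sizes) - 1)

for step in range(2000):
    drift = step > 1000                  # concept drift halfway through
    if rng.random() < eps:               # epsilon-greedy exploration
        a = int(rng.integers(len(window_sizes)))
    else:
        a = int(q_values.argmax())
    q_values[a] += lr * (reward(a, drift) - q_values[a])

# After the drift, the agent should have switched to the smallest window.
print(window_sizes[int(q_values.argmax())])
```

The full method replaces the single state with stream features (variance, correlations, trends) and the table with a Dueling DQN, but the adapt-on-drift behavior is the same.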

VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation

Graphical User Interface (GUI) agents powered by Large Vision-Language Models (LVLMs) have emerged as a revolutionary approach to automating human-machine interactions, capable of autonomously operating personal devices (e.g., mobile phones) or applications within the device to perform complex real-world tasks in a human-like manner. However, their close integration with personal devices raises significant security concerns, with many threats, including backdoor attacks, remaining largely unexplored. This work reveals that the visual grounding of GUI agents, which maps textual plans to GUI elements, can introduce vulnerabilities, enabling new types of backdoor attacks. With a backdoor attack targeting visual grounding, the agent's behavior can be compromised even when given correct task-solving plans. To validate this vulnerability, we propose VisualTrap, a method that can hijack the grounding by misleading the agent to locate textual plans at trigger locations instead of the intended targets. VisualTrap uses the common method of injecting poisoned data for attacks, and does so during the pre-training of visual grounding to ensure the practical feasibility of attacking. Empirical results show that VisualTrap can effectively hijack visual grounding with as little as 5% poisoned data and highly stealthy visual triggers (invisible to the human eye); and the attack can be generalized to downstream tasks, even after clean fine-tuning. Moreover, the injected trigger can remain effective across different GUI environments, e.g., being trained on mobile/web and generalizing to desktop environments. These findings underscore the urgent need for further research on backdoor attack risks in GUI agents.

Updated: 2025-07-09 14:36:00

标题: VisualTrap:通过视觉基准操纵对GUI代理进行隐蔽后门攻击

摘要: 由大型视觉语言模型(LVLMs)支持的图形用户界面(GUI)代理已经成为自动化人机交互的革命性方法,能够自主操作个人设备(例如移动电话)或应用程序内的设备,以人类化的方式执行复杂的真实世界任务。然而,它们与个人设备的密切集成引发了重大的安全担忧,许多威胁,包括后门攻击,仍然未被充分探索。本研究揭示了GUI代理的视觉基础-将文本计划映射到GUI元素-可能引入漏洞,从而使新类型的后门攻击成为可能。通过针对视觉基础的后门攻击,即使给出正确的任务解决计划,代理的行为也可能受损。为了验证这种漏洞,我们提出了VisualTrap方法,该方法可以通过误导代理将文本计划定位到触发位置而不是预期目标来劫持基础。VisualTrap使用注入毒害数据进行攻击的常见方法,并在视觉基础的预训练期间执行此操作,以确保攻击的实际可行性。实证结果显示,VisualTrap可以有效地劫持视觉基础,只需5%的毒害数据和高度隐蔽的视觉触发器(对人眼不可见);并且攻击可以泛化到下游任务,即使在干净的微调后也是如此。此外,注入的触发器可以在不同的GUI环境下保持有效,例如在移动/网络上进行训练并泛化到桌面环境。这些发现强调了对GUI代理中后门攻击风险进一步研究的迫切需要。

更新时间: 2025-07-09 14:36:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.06899v1

SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN

The growing demand for efficient knowledge graph (KG) enrichment leveraging external corpora has intensified interest in relation extraction (RE), particularly under low-supervision settings. To address the need for adaptable and noise-resilient RE solutions that integrate seamlessly with pre-trained large language models (PLMs), we introduce SCoRE, a modular and cost-effective sentence-level RE system. SCoRE enables easy PLM switching, requires no finetuning, and adapts smoothly to diverse corpora and KGs. By combining supervised contrastive learning with a Bayesian k-Nearest Neighbors (kNN) classifier for multi-label classification, it delivers robust performance despite the noisy annotations of distantly supervised corpora. To improve RE evaluation, we propose two novel metrics: Correlation Structure Distance (CSD), measuring the alignment between learned relational patterns and KG structures, and Precision at R (P@R), assessing utility as a recommender system. We also release Wiki20d, a benchmark dataset replicating real-world RE conditions where only KG-derived annotations are available. Experiments on five benchmarks show that SCoRE matches or surpasses state-of-the-art methods while significantly reducing energy consumption. Further analyses reveal that increasing model complexity, as seen in prior work, degrades performance, highlighting the advantages of SCoRE's minimal design. Combining efficiency, modularity, and scalability, SCoRE stands as an optimal choice for real-world RE applications.

Updated: 2025-07-09 14:33:07

标题: SCoRE:使用多标签对比学习和贝叶斯kNN的简化基于语料库的关系提取

摘要: 对于利用外部语料库进行知识图谱(KG)丰富的需求不断增长,加剧了对关系抽取(RE)的兴趣,尤其是在低监督设置下。为了解决对能够与预训练大型语言模型(PLMs)无缝集成的可适应性和噪音抗干扰的RE解决方案的需求,我们引入了SCoRE,一个模块化和成本效益高的句子级RE系统。SCoRE可以轻松切换PLM,无需微调,并且能够顺利地适应不同的语料库和KG。通过将监督对比学习与贝叶斯k最近邻(kNN)分类器结合用于多标签分类,尽管存在远程监督语料库的噪音标注,它仍能提供稳健的性能。为了改进RE评估,我们提出了两种新的度量标准:关联结构距离(CSD),用于衡量学习的关系模式与KG结构之间的对齐情况,以及R处的精度(P@R),评估其作为推荐系统的效用。我们还发布了Wiki20d,一个基准数据集,复制了只有基于KG的注释可用的真实世界RE条件。在五个基准测试上的实验表明,SCoRE与最先进的方法相匹配或超越,同时显著减少了能源消耗。进一步的分析显示,增加模型复杂性会降低性能,突显了SCoRE最小设计的优势。结合效率、模块化和可扩展性,SCoRE成为真实世界RE应用的最佳选择。

更新时间: 2025-07-09 14:33:07

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2507.06895v1
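
The Bayesian kNN classifier used for multi-label relation prediction can be sketched as a Beta-Bernoulli posterior over each label among the k nearest neighbours. The embeddings and labels below are random toy data, and the uniform prior (alpha = beta = 1) is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy sentence embeddings (stand-ins for contrastively trained PLM outputs)
# and multi-hot relation labels for a small support set.
emb = rng.normal(size=(100, 16))
labels = (rng.random((100, 4)) < 0.3).astype(int)

def bayesian_knn(query, emb, labels, k=10, alpha=1.0, beta=1.0):
    """Per-label posterior mean of a Beta-Bernoulli model over the k
    nearest neighbours: (hits + alpha) / (k + alpha + beta)."""
    d = np.linalg.norm(emb - query, axis=1)
    nn = np.argsort(d)[:k]
    hits = labels[nn].sum(axis=0)
    return (hits + alpha) / (k + alpha + beta)

p = bayesian_knn(rng.normal(size=16), emb, labels)
print(p.shape, bool(np.all((p > 0) & (p < 1))))
```

The smoothing keeps every label probability strictly inside (0, 1), which makes the classifier robust to the noisy annotations of distantly supervised corpora: no label is ever ruled out (or in) with certainty from only k neighbours.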

Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights

AI evaluations have become critical tools for assessing large language model capabilities and safety. This paper presents practical insights from eight months of maintaining $inspect\_evals$, an open-source repository of 70+ community-contributed AI evaluations. We identify key challenges in implementing and maintaining AI evaluations and develop solutions including: (1) a structured cohort management framework for scaling community contributions, (2) statistical methodologies for optimal resampling and cross-model comparison with uncertainty quantification, and (3) systematic quality control processes for reproducibility. Our analysis reveals that AI evaluation requires specialized infrastructure, statistical rigor, and community coordination beyond traditional software development practices.

Updated: 2025-07-09 14:30:45

标题: 建立和维护一个人工智能评估的开源仓库:挑战与见解

摘要: AI评估已成为评估大型语言模型能力和安全性的关键工具。本文介绍了在维护$inspect\_evals$这一开源存储库中的八个月中所获得的实用见解,该存储库包含70多个社区贡献的AI评估。我们确定了在实施和维护AI评估过程中的关键挑战,并提出了解决方案,包括:(1)用于扩展社区贡献的结构化队伍管理框架,(2)用于最佳重抽样和跨模型比较的统计方法,同时量化不确定性,以及(3)用于可重现性的系统质量控制流程。我们的分析表明,AI评估需要专门的基础设施、统计严谨性和超出传统软件开发实践的社区协调。

更新时间: 2025-07-09 14:30:45

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.06893v1
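
One standard way to attach uncertainty to an eval score, in the spirit of the statistical methodologies mentioned above, is a percentile bootstrap over per-question correctness. This is a generic sketch with simulated results, not the repository's actual tooling:

```python
import numpy as np

rng = np.random.default_rng(6)

# Per-question correctness for one model on a 200-item eval (simulated).
results = rng.random(200) < 0.65

def bootstrap_ci(results, n_boot=5000, level=0.95):
    """Percentile bootstrap CI for mean accuracy; comparing two models'
    intervals is a simple first check before a paired test."""
    idx = rng.integers(0, len(results), size=(n_boot, len(results)))
    means = results[idx].mean(axis=1)
    lo, hi = np.quantile(means, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)

lo, hi = bootstrap_ci(results)
print(round(lo, 2), round(float(results.mean()), 2), round(hi, 2))
```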

Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

Reinforcement Learning (RL) has demonstrated its potential to improve the reasoning ability of Large Language Models (LLMs). One major limitation of most existing Reinforcement Finetuning (RFT) methods is that they are on-policy RL in nature, i.e., data generated during the past learning process is not fully utilized. This inevitably comes at a significant cost of compute and time, posing a stringent bottleneck on continuing economic and efficient scaling. To this end, we launch the renaissance of off-policy RL and propose Reincarnating Mix-policy Proximal Policy Gradient (ReMix), a general approach to enable on-policy RFT methods like PPO and GRPO to leverage off-policy data. ReMix consists of three major components: (1) Mix-policy proximal policy gradient with an increased Update-To-Data (UTD) ratio for efficient training; (2) KL-Convex policy constraint to balance the trade-off between stability and flexibility; (3) Policy reincarnation to achieve a seamless transition from efficient early-stage learning to steady asymptotic improvement. In our experiments, we train a series of ReMix models upon PPO, GRPO and 1.5B, 7B base models. ReMix shows an average Pass@1 accuracy of 52.10% (for 1.5B model) with 0.079M response rollouts, 350 training steps and achieves 63.27%/64.39% (for 7B model) with 0.007M/0.011M response rollouts, 50/75 training steps, on five math reasoning benchmarks (i.e., AIME'24, AMC'23, Minerva, OlympiadBench, and MATH500). Compared with 15 recent advanced models, ReMix shows SOTA-level performance with an over 30x to 450x reduction in training cost in terms of rollout data volume. In addition, we reveal insightful findings via multifaceted analysis, including the implicit preference for shorter responses due to the Whipping Effect of off-policy discrepancy, the collapse mode of self-reflection behavior under the presence of severe off-policyness, etc.

Updated: 2025-07-09 14:29:45

标题: 挤压浸泡的海绵:针对大型语言模型的高效离策略强化微调

摘要: 强化学习(RL)已经展示了提高大型语言模型(LLMs)的推理能力的潜力。大多数现有的强化微调(RFT)方法的一个主要限制是它们在政策RL性质上,即,在过去的学习过程中生成的数据没有得到充分利用。这不可避免地会带来显著的计算和时间成本,对继续经济和有效的扩展构成了严格的瓶颈。为此,我们启动了离线政策RL的复兴,并提出了Reincarnating Mix-policy Proximal Policy Gradient(ReMix),这是一种通用方法,可以使像PPO和GRPO这样的政策RFT方法利用离线数据。ReMix包括三个主要组成部分:(1)混合政策近端政策梯度,增加更新到数据(UTD)比率以进行高效训练;(2)KL-凸政策约束,以平衡稳定性和灵活性之间的权衡;(3)政策重生,实现从高效的早期学习到稳定的渐进改进的无缝过渡。在我们的实验中,我们在PPO、GRPO和1.5B、7B基础模型上训练了一系列ReMix模型。ReMix显示出平均Pass@1准确率为52.10%(对于1.5B模型),使用0.079M响应回滚、350个训练步骤,并且在五个数学推理基准测试(即AIME'24、AMC'23、Minerva、OlympiadBench和MATH500)上实现了63.27%/64.39%(对于7B模型),使用0.007M/0.011M响应回滚、50/75个训练步骤。与15个最新的先进模型相比,ReMix在训练成本方面显示出SOTA级别的性能,回滚数据量减少了30倍至450倍。此外,通过多方面的分析,我们揭示了一些有见地的发现,包括由于离线政策差异的鞭挞效应而对较短响应的隐含偏好,以及在存在严重离线政策性下的自我反思行为的崩溃模式等。

更新时间: 2025-07-09 14:29:45

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.06892v1

A Single-Point Measurement Framework for Robust Cyber-Attack Diagnosis in Smart Microgrids Using Dual Fractional-Order Feature Analysis

Cyber-attacks jeopardize the safe operation of smart microgrids. At the same time, existing diagnostic methods either depend on expensive multi-point instrumentation or stringent modelling assumptions that are untenable under single-sensor constraints. This paper proposes a Fractional-Order Memory-Enhanced Attack-Diagnosis Scheme (FO-MADS) that achieves low-latency fault localisation and cyber-attack detection using only one VPQ (Voltage-Power-Reactive-power) sensor. FO-MADS first constructs a dual fractional-order feature library by jointly applying Caputo and Gr\"unwald-Letnikov derivatives, thereby amplifying micro-perturbations and slow drifts in the VPQ signal. A two-stage hierarchical classifier then pinpoints the affected inverter and isolates the faulty IGBT switch, effectively alleviating class imbalance. Robustness is further strengthened through Progressive Memory-Replay Adversarial Training (PMR-AT), whose attack-aware loss is dynamically re-weighted via Online Hard Example Mining (OHEM) to prioritise the most challenging samples. Experiments on a four-inverter microgrid testbed comprising 1 normal and 24 fault classes under four attack scenarios demonstrate diagnostic accuracies of 96.6 % (bias), 94.0 % (noise), 92.8 % (data replacement), and 95.7 % (replay), while sustaining 96.7 % under attack-free conditions. These results establish FO-MADS as a cost-effective and readily deployable solution that markedly enhances the cyber-physical resilience of smart microgrids.

Updated: 2025-07-09 14:27:40

Title: A Single-Point Measurement Framework for Robust Cyber-Attack Diagnosis in Smart Microgrids Using Dual Fractional-Order Feature Analysis

Domains: eess.SY,cs.AI,cs.SY

Download: http://arxiv.org/abs/2507.06890v1

Horizontal and Vertical Federated Causal Structure Learning via Higher-order Cumulants

Federated causal discovery aims to uncover the causal relationships between entities while protecting data privacy, which has significant importance and numerous applications in real-world scenarios. Existing federated causal structure learning methods primarily focus on horizontal federated settings. However, in practical situations, different clients may not necessarily contain data on the same variables. In a single client, the incomplete set of variables can easily lead to spurious causal relationships, thereby affecting the information transmitted to other clients. To address this issue, we comprehensively consider causal structure learning methods under both horizontal and vertical federated settings. We provide the identification theories and methods for learning causal structure in the horizontal and vertical federal setting via higher-order cumulants. Specifically, we first aggregate higher-order cumulant information from all participating clients to construct global cumulant estimates. These global estimates are then used for recursive source identification, ultimately yielding a global causal strength matrix. Our approach not only enables the reconstruction of causal graphs but also facilitates the estimation of causal strength coefficients. Our algorithm demonstrates superior performance in experiments conducted on both synthetic data and real-world data.
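The aggregation step can be sketched concretely. The following is a hedged toy illustration (not the paper's identification algorithm): each client shares only raw-moment sufficient statistics, from which the server forms global cumulant estimates without ever seeing raw data. Client data are illustrative.

```python
# Hedged sketch of federated cumulant aggregation: clients send
# raw-moment sufficient statistics; the server pools them into global
# cumulants k1 (mean), k2 (variance), k3 (third central moment).

def client_moments(xs):
    """Local sufficient statistics for cumulants up to order 3."""
    n = len(xs)
    return (n, sum(xs), sum(x * x for x in xs), sum(x ** 3 for x in xs))

def global_cumulants(stats):
    """Pool per-client statistics into global cumulants k1, k2, k3."""
    n = sum(s[0] for s in stats)
    m1 = sum(s[1] for s in stats) / n
    m2 = sum(s[2] for s in stats) / n
    m3 = sum(s[3] for s in stats) / n
    k2 = m2 - m1 ** 2                      # variance
    k3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3    # third central moment
    return m1, k2, k3

clients = [[1.0, 2.0, 3.0], [2.0, 4.0], [0.0, 1.0, 5.0]]
k1, k2, k3 = global_cumulants([client_moments(c) for c in clients])
```

The pooled estimates match what a central server would compute on the union of the data, which is what makes the subsequent recursive source identification possible without exchanging records.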

Updated: 2025-07-09 14:25:51

Domains: cs.LG

Download: http://arxiv.org/abs/2507.06888v1

LARP: Learner-Agnostic Robust Data Prefiltering

The widespread availability of large public datasets is a key factor behind the recent successes of statistical inference and machine learning methods. However, these datasets often contain some low-quality or contaminated data, to which many learning procedures are sensitive. Therefore, the question of whether and how public datasets should be prefiltered to facilitate accurate downstream learning arises. On a technical level this requires the construction of principled data prefiltering methods which are learner-agnostic robust, in the sense of provably protecting a set of pre-specified downstream learners from corrupted data. In this work, we formalize the problem of Learner-Agnostic Robust data Prefiltering (LARP), which aims at finding prefiltering procedures that minimize a worst-case loss over a pre-specified set of learners. We first instantiate our framework in the context of scalar mean estimation with Huber estimators under the Huber data contamination model. We provide a hardness result on a specific problem instance and analyze several natural prefiltering procedures. Our theoretical results indicate that performing LARP on a heterogeneous set of learners leads to some loss in model performance compared to the alternative of prefiltering data for each learner/use-case individually. We explore the resulting utility loss and its dependence on the problem parameters via extensive experiments on real-world image and tabular data, observing statistically significant reduction in utility. Finally, we model the trade-off between the utility drop and the cost of repeated (learner-specific) prefiltering within a game-theoretic framework and showcase benefits of LARP for large datasets.
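The objective can be made concrete with a hedged toy sketch (not the paper's estimators or hardness construction): one shared prefilter serves several downstream learners and is judged by the worst learner's error. Trimmed means stand in for the paper's Huber estimators; the data, cutoff, and trim levels are all illustrative.

```python
# Hedged toy sketch of the LARP objective: minimize the worst-case
# error over a set of learners after a single shared prefilter.

def trimmed_mean(xs, trim):
    s = sorted(xs)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k] if k > 0 else s
    return sum(kept) / len(kept)

def prefilter(xs, cutoff):
    """Drop points farther than `cutoff` from the median."""
    s = sorted(xs)
    med = s[len(s) // 2]
    return [x for x in xs if abs(x - med) <= cutoff]

def worst_case_error(xs, learners, truth):
    return max(abs(f(xs) - truth) for f in learners)

data = [0.9, 1.0, 1.1, 1.0, 50.0]  # clean values near 1.0, one outlier
learners = [lambda xs, t=t: trimmed_mean(xs, t) for t in (0.0, 0.2)]
raw = worst_case_error(data, learners, truth=1.0)
filtered = worst_case_error(prefilter(data, 5.0), learners, truth=1.0)
```

The non-robust learner (trim 0) dominates the worst case on the raw data; after prefiltering, both learners do well, illustrating why the worst-case criterion drives the prefilter design.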

Updated: 2025-07-09 14:23:19

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2506.20573v2

A Survey on Event Prediction Methods from a Systems Perspective: Bringing Together Disparate Research Areas

Event prediction is the ability of anticipating future events, i.e., future real-world occurrences, and aims to support the user in deciding on actions that change future events towards a desired state. An event prediction method learns the relation between features of past events and future events. It is applied to newly observed events to predict corresponding future events that are evaluated with respect to the user's desired future state. If the predicted future events do not comply with this state, actions are taken towards achieving desirable future states. Evidently, event prediction is valuable in many application domains such as business and natural disasters. The diversity of application domains results in a diverse range of methods that are scattered across various research areas which, in turn, use different terminology for event prediction methods. Consequently, sharing methods and knowledge for developing future event prediction methods is restricted. To facilitate knowledge sharing on account of a comprehensive integration and assessment of event prediction methods, we take a systems perspective to integrate event prediction methods into a single system, elicit requirements, and assess existing work with respect to the requirements. Based on the assessment, we identify open challenges and discuss future research directions.

Updated: 2025-07-09 14:22:25

Domains: cs.AI,cs.LG

Download: http://arxiv.org/abs/2302.04018v2

Near-Optimal Consistency-Robustness Trade-Offs for Learning-Augmented Online Knapsack Problems

This paper introduces a family of learning-augmented algorithms for online knapsack problems that achieve near Pareto-optimal consistency-robustness trade-offs through a simple combination of trusted learning-augmented and worst-case algorithms. Our approach relies on succinct, practical predictions -- single values or intervals estimating the minimum value of any item in an offline solution. Additionally, we propose a novel fractional-to-integral conversion procedure, offering new insights for online algorithm design.
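The flavor of such a combination can be sketched as follows. This is a hedged illustration, not the paper's algorithm: `pred` plays the role of the succinct prediction (an estimate of the smallest value density the offline optimum accepts), densities are assumed to lie in a known range [L, U], and `lam` is the capacity fraction entrusted to the prediction, trading consistency against robustness.

```python
import math

# Hedged sketch: split capacity between a trusted prediction-following
# rule and a classic worst-case exponential-threshold rule for online
# fractional knapsack. Not the paper's algorithm.

def run(items, pred, lam, L=1.0, U=10.0, cap=1.0):
    used_t = used_r = value = 0.0
    for dens, w in items:  # (value density, weight), arriving online
        # Trusted sub-knapsack: accept anything at least as dense as
        # the predicted cutoff, until its capacity share is exhausted.
        if dens >= pred and used_t < lam * cap:
            take = min(w, lam * cap - used_t)
            used_t += take
            value += dens * take
            w -= take
        # Robust sub-knapsack: exponential threshold rising with
        # utilization, giving a worst-case competitive guarantee.
        if lam < 1:
            z = used_r / ((1 - lam) * cap)
            psi = L * (U * math.e / L) ** z / math.e
            if w > 0 and dens >= psi and used_r < (1 - lam) * cap:
                take = min(w, (1 - lam) * cap - used_r)
                used_r += take
                value += dens * take
    return value
```

With `lam = 1` the algorithm follows the prediction exactly (best case when the prediction is accurate); with `lam = 0` it is purely worst-case; intermediate values interpolate along the consistency-robustness trade-off.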

Updated: 2025-07-09 14:20:53

Domains: cs.LG,cs.GT,68Q25, 68T05,F.2.2; I.2.6

Download: http://arxiv.org/abs/2406.18752v2

Winning and losing with Artificial Intelligence: What public discourse about ChatGPT tells us about how societies make sense of technological change

Public product launches in Artificial Intelligence can serve as focusing events for collective attention, surfacing how societies react to technological change. Social media provide a window into the sensemaking around these events, surfacing hopes and fears and showing who chooses to engage in the discourse and when. We demonstrate that public sensemaking about AI is shaped by economic interests and cultural values of those involved. We analyze 3.8 million tweets posted by 1.6 million users across 117 countries in response to the public launch of ChatGPT in 2022. Our analysis shows how economic self-interest, proxied by occupational skill types in writing, programming, and mathematics, and national cultural orientations, as measured by Hofstede's individualism, uncertainty avoidance, and power distance dimensions, shape who speaks, when they speak, and their stance towards ChatGPT. Roles requiring more technical skills, such as programming and mathematics, tend to engage earlier and express more positive stances, whereas writing-centric occupations join later with greater skepticism. At the cultural level, individualism predicts both earlier engagement and a more negative stance, and uncertainty avoidance reduces the prevalence of positive stances but does not delay when users first engage with ChatGPT. Aggregate sentiment trends mask the dynamics observed in our study. The shift toward a more critical stance towards ChatGPT over time stems primarily from the entry of more skeptical voices rather than a change of heart among early adopters. Our findings underscore the importance of both the occupational background and cultural context in understanding public reactions to AI.

Updated: 2025-07-09 14:15:12

Domains: cs.CY,cs.AI,I.2; J.4; K.4.0

Download: http://arxiv.org/abs/2507.06876v1

IntOPE: Off-Policy Evaluation in the Presence of Interference

Off-Policy Evaluation (OPE) is employed to assess the potential impact of a hypothetical policy using logged contextual bandit feedback, which is crucial in areas such as personalized medicine and recommender systems, where online interactions are associated with significant risks and costs. Traditionally, OPE methods rely on the Stable Unit Treatment Value Assumption (SUTVA), which assumes that the reward for any given individual is unaffected by the actions of others. However, this assumption often fails in real-world scenarios due to the presence of interference, where an individual's reward is affected not just by their own actions but also by the actions of their peers. This realization reveals significant limitations of existing OPE methods in real-world applications. To address this limitation, we propose IntIPW, an IPW-style estimator that extends the Inverse Probability Weighting (IPW) framework by integrating marginalized importance weights to account for both individual actions and the influence of adjacent entities. Extensive experiments are conducted on both synthetic and real-world data to demonstrate the effectiveness of the proposed IntIPW method.
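The baseline IntIPW builds on is the standard IPW estimator, sketched below. Under interference the paper additionally incorporates marginalized weights for neighbours' actions (not shown here); the log format and policy signature are illustrative.

```python
# Hedged sketch of the standard IPW off-policy value estimate that
# IntIPW extends: reweight each logged reward by the ratio of the
# target policy's action probability to the logging probability.

def ipw_value(logs, target_policy):
    """logs: (context, action, reward, logging_prob) tuples;
    target_policy(context, action) -> probability under the new policy."""
    total = 0.0
    for c, a, r, p_log in logs:
        total += (target_policy(c, a) / p_log) * r  # importance-weighted reward
    return total / len(logs)
```

Because the weights correct for the logging policy's action distribution, the estimate is unbiased when SUTVA holds; it is precisely the interference case, where a unit's reward also depends on peers' actions, that breaks this and motivates IntIPW.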

Updated: 2025-07-09 14:13:59

Domains: cs.LG,cs.IR

Download: http://arxiv.org/abs/2408.13484v2

Conformal Prediction for Long-Tailed Classification

Many real-world classification problems, such as plant identification, have extremely long-tailed class distributions. In order for prediction sets to be useful in such settings, they should (i) provide good class-conditional coverage, ensuring that rare classes are not systematically omitted from the prediction sets, and (ii) be a reasonable size, allowing users to easily verify candidate labels. Unfortunately, existing conformal prediction methods, when applied to the long-tailed setting, force practitioners to make a binary choice between small sets with poor class-conditional coverage and sets with very good class-conditional coverage but that are extremely large. We propose methods with guaranteed marginal coverage that smoothly trade off between set size and class-conditional coverage. First, we propose a conformal score function, prevalence-adjusted softmax, that targets a relaxed notion of class-conditional coverage called macro-coverage. Second, we propose a label-weighted conformal prediction method that allows us to interpolate between marginal and class-conditional conformal prediction. We demonstrate our methods on Pl@ntNet and iNaturalist, two long-tailed image datasets with 1,081 and 8,142 classes, respectively.
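The mechanics can be sketched in a few lines. This is a hedged illustration in the spirit of (not identical to) the paper's method: each class probability is divided by that class's empirical frequency before forming the usual 1 - p conformity score, so rare classes are not crowded out of the set.

```python
import math

# Hedged sketch of split conformal prediction with a prevalence-adjusted
# score; threshold and toy probabilities are illustrative.

def quantile_threshold(cal_scores, alpha):
    """Conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    s = sorted(cal_scores)
    n = len(s)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    return s[k]

def prediction_set(probs, prevalence, qhat):
    """Classes whose prevalence-adjusted 1 - p score falls below qhat."""
    adj = [p / w for p, w in zip(probs, prevalence)]
    z = sum(adj)
    return {k for k, a in enumerate(adj) if 1 - a / z <= qhat}
```

In the toy call below, a rare class with raw probability 0.4 dominates a common class with probability 0.6 after adjustment, which is exactly the behavior that protects the tail.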

Updated: 2025-07-09 14:08:50

Domains: stat.ML,cs.CV,cs.LG,stat.ME

Download: http://arxiv.org/abs/2507.06867v1

Episodic Contextual Bandits with Knapsacks under Conversion Models

We study an online setting, where a decision maker (DM) interacts with contextual bandit-with-knapsack (BwK) instances in repeated episodes. These episodes start with different resource amounts, and the contexts' probability distributions are non-stationary in an episode. All episodes share the same latent conversion model, which governs the random outcome contingent upon a request's context and an allocation decision. Our model captures applications such as dynamic pricing on perishable resources with episodic replenishment, and first price auctions in repeated episodes with different starting budgets. We design an online algorithm that achieves a regret sub-linear in $T$, the number of episodes, assuming access to a \emph{confidence bound oracle} that achieves an $o(T)$-regret. Such an oracle is readily available from existing contextual bandit literature. We overcome the technical challenge with arbitrarily many possible contexts, which leads to a reinforcement learning problem with an unbounded state space. Our framework provides improved regret bounds in certain settings when the DM is provided with unlabeled feature data, which is novel to the contextual BwK literature.

Updated: 2025-07-09 14:00:05

Domains: cs.LG

Download: http://arxiv.org/abs/2507.06859v1

IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization

Despite modifying only a small localized input region, adversarial patches can drastically change the prediction of computer vision models. However, prior methods either cannot perform satisfactorily under targeted attack scenarios or fail to produce contextually coherent adversarial patches, causing them to be easily noticeable by human examiners and insufficiently stealthy against automatic patch defenses. In this paper, we introduce IAP, a novel attack framework that generates highly invisible adversarial patches based on perceptibility-aware localization and perturbation optimization schemes. Specifically, IAP first searches for a proper location to place the patch by leveraging classwise localization and sensitivity maps, balancing the susceptibility of patch location to both victim model prediction and human visual system, then employs a perceptibility-regularized adversarial loss and a gradient update rule that prioritizes color constancy for optimizing invisible perturbations. Comprehensive experiments across various image benchmarks and model architectures demonstrate that IAP consistently achieves competitive attack success rates in targeted settings with significantly improved patch invisibility compared to existing baselines. In addition to being highly imperceptible to humans, IAP is shown to be stealthy enough to render several state-of-the-art patch defenses ineffective.

Updated: 2025-07-09 13:58:40

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.06856v1

Adaptive Elicitation of Latent Information Using Natural Language

Eliciting information to reduce uncertainty about a latent entity is a critical task in many application domains, e.g., assessing individual student learning outcomes, diagnosing underlying diseases, or learning user preferences. Though natural language is a powerful medium for this purpose, large language models (LLMs) and existing fine-tuning algorithms lack mechanisms for strategically gathering information to refine their own understanding of the latent entity. To harness the generalization power and world knowledge of LLMs in developing effective information-gathering strategies, we propose an adaptive elicitation framework that actively reduces uncertainty on the latent entity. Since probabilistic modeling of an abstract latent entity is difficult, our framework adopts a predictive view of uncertainty, using a meta-learned language model to simulate future observations and enable scalable uncertainty quantification over complex natural language. Through autoregressive forward simulation, our model quantifies how new questions reduce epistemic uncertainty, enabling the development of sophisticated information-gathering strategies to choose the most informative next queries. In experiments on the 20 questions game, dynamic opinion polling, and adaptive student assessment, our method consistently outperforms baselines in identifying critical unknowns and improving downstream predictions, illustrating the promise of strategic information gathering in natural language settings.
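The selection principle can be made concrete with a hedged toy sketch: pick the question whose answers most reduce entropy over a latent. The paper uses a meta-learned language model to simulate answers; here an explicit discrete posterior stands in, and all names are illustrative.

```python
import math

# Hedged toy sketch of expected-information-gain question selection
# over a discrete latent, standing in for the paper's LM simulator.

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def posterior(prior, lik):
    """lik[h] = P(answer | hypothesis h); returns (posterior, P(answer))."""
    joint = [p * l for p, l in zip(prior, lik)]
    z = sum(joint)
    return [j / z for j in joint], z

def expected_info_gain(prior, likelihoods):
    """likelihoods[answer] is the per-hypothesis likelihood vector."""
    expected_posterior_entropy = 0.0
    for lik in likelihoods:
        post, z = posterior(prior, lik)
        expected_posterior_entropy += z * entropy(post)
    return entropy(prior) - expected_posterior_entropy
```

A question that perfectly splits two equally likely hypotheses yields a gain of ln 2, while an uninformative question yields zero; ranking candidate queries by this quantity is the "most informative next query" rule.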

Updated: 2025-07-09 13:58:35

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2504.04204v2

DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models

Molecular structure elucidation from spectra is a foundational problem in chemistry, with profound implications for compound identification, synthesis, and drug development. Traditional methods rely heavily on expert interpretation and lack scalability. Pioneering machine learning methods have introduced retrieval-based strategies, but their reliance on finite libraries limits generalization to novel molecules. Generative models offer a promising alternative, yet most adopt autoregressive SMILES-based architectures that overlook 3D geometry and struggle to integrate diverse spectral modalities. In this work, we present DiffSpectra, a generative framework that directly infers both 2D and 3D molecular structures from multi-modal spectral data using diffusion models. DiffSpectra formulates structure elucidation as a conditional generation process. Its denoising network is parameterized by Diffusion Molecule Transformer, an SE(3)-equivariant architecture that integrates topological and geometric information. Conditioning is provided by SpecFormer, a transformer-based spectral encoder that captures intra- and inter-spectral dependencies from multi-modal spectra. Extensive experiments demonstrate that DiffSpectra achieves high accuracy in structure elucidation, recovering exact structures with 16.01% top-1 accuracy and 96.86% top-20 accuracy through sampling. The model benefits significantly from 3D geometric modeling, SpecFormer pre-training, and multi-modal conditioning. These results highlight the effectiveness of spectrum-conditioned diffusion modeling in addressing the challenge of molecular structure elucidation. To our knowledge, DiffSpectra is the first framework to unify multi-modal spectral reasoning and joint 2D/3D generative modeling for de novo molecular structure elucidation.

Updated: 2025-07-09 13:57:20

Domains: cs.LG,cs.AI,cs.CE,physics.chem-ph,q-bio.MN

Download: http://arxiv.org/abs/2507.06853v1

SCC-recursiveness in infinite argumentation (extended version)

Argumentation frameworks (AFs) are a foundational tool in artificial intelligence for modeling structured reasoning and conflict. SCC-recursiveness is a well-known design principle in which the evaluation of arguments is decomposed according to the strongly connected components (SCCs) of the attack graph, proceeding recursively from "higher" to "lower" components. While SCC-recursive semantics such as cf2 and stg2 have proven effective for finite AFs, Baumann and Spanring showed the failure of SCC-recursive semantics to generalize reliably to infinite AFs due to issues with well-foundedness. We propose two approaches to extending SCC-recursiveness to the infinite setting. We systematically evaluate these semantics using Baroni and Giacomin's established criteria, showing in particular that directionality fails in general. We then examine these semantics' behavior in finitary frameworks, where we find some of our semantics satisfy directionality. These results advance the theory of infinite argumentation and lay the groundwork for reasoning systems capable of handling unbounded or evolving domains.

Updated: 2025-07-09 13:57:12

Domains: cs.AI

Download: http://arxiv.org/abs/2507.06852v1

The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover

The rapid adoption of Large Language Model (LLM) agents and multi-agent systems enables unprecedented capabilities in natural language processing and generation. However, these systems have introduced unprecedented security vulnerabilities that extend beyond traditional prompt injection attacks. This paper presents the first comprehensive evaluation of LLM agents as attack vectors capable of achieving complete computer takeover through the exploitation of trust boundaries within agentic AI systems where autonomous entities interact and influence each other. We demonstrate that adversaries can leverage three distinct attack surfaces - direct prompt injection, RAG backdoor attacks, and inter-agent trust exploitation - to coerce popular LLMs (including GPT-4o, Claude-4 and Gemini-2.5) into autonomously installing and executing malware on victim machines. Our evaluation of 17 state-of-the-art LLMs reveals an alarming vulnerability hierarchy: while 41.2% of models succumb to direct prompt injection, 52.9% are vulnerable to RAG backdoor attacks, and a critical 82.4% can be compromised through inter-agent trust exploitation. Notably, we discovered that LLMs which successfully resist direct malicious commands will execute identical payloads when requested by peer agents, revealing a fundamental flaw in current multi-agent security models. Our findings demonstrate that only 5.9% of tested models (1/17) proved resistant to all attack vectors, with the majority exhibiting context-dependent security behaviors that create exploitable blind spots. Our findings also highlight the need to increase awareness and research on the security risks of LLMs, showing a paradigm shift in cybersecurity threats, where AI tools themselves become sophisticated attack vectors.

Updated: 2025-07-09 13:54:58

Domains: cs.CR,cs.AI

Download: http://arxiv.org/abs/2507.06850v1

OpenDPDv2: A Unified Learning and Optimization Framework for Neural Network Digital Predistortion

Neural network (NN)-based Digital Predistortion (DPD) stands out in improving signal quality in wideband radio frequency (RF) power amplifiers (PAs) employing complex modulation. However, NN DPDs usually rely on a large number of parameters for effective linearization and can significantly contribute to the energy consumption of the digital back-end in RF systems. This paper presents OpenDPDv2, a unified framework for PA modeling, DPD learning, and model optimization to reduce power consumption while maintaining high linearization performance. The optimization techniques feature a novel DPD algorithm, TRes-DeltaGRU, alongside two energy-efficient methods. The top-performing 32-bit floating-point (FP32) TRes-DeltaGRU-DPD model achieves an Adjacent Channel Power Ratio (ACPR) of -59.4 dBc and Error Vector Magnitude (EVM) of -42.1 dBc. By exploiting fixed-point quantization and dynamic temporal sparsity of input signals and hidden neurons, the inference energy of our model can be reduced by 4.5X while still maintaining -50.3 dBc ACPR and -35.2 dB EVM with 56% temporal sparsity. This was evaluated using a TM3.1a 200 MHz bandwidth 256-QAM OFDM signal applied to a 3.5 GHz GaN Doherty RF PA. OpenDPDv2 code, datasets, and documentation are publicly accessible at: https://github.com/lab-emi/OpenDPD.
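The delta principle the sparsity savings rest on can be sketched briefly. This is a hedged illustration of the idea behind TRes-DeltaGRU, not its implementation: an input only triggers downstream computation when it moves by more than a threshold since the last transmitted value, and the fraction of skipped steps is the temporal sparsity exploited at inference. Values are toy.

```python
# Hedged sketch of delta-thresholded streaming: transmit (and compute)
# only on significant changes; the skip fraction is temporal sparsity.

def delta_stream(xs, threshold):
    """Return (transmitted updates, temporal sparsity of the stream)."""
    last, updates, skipped = None, [], 0
    for x in xs:
        if last is None or abs(x - last) > threshold:
            updates.append(x)   # change is significant: recompute
            last = x
        else:
            skipped += 1        # change below threshold: skip the work
    return updates, skipped / len(xs)
```

On a slowly varying signal most steps fall below the threshold, so the matrix-vector work tied to those steps can be skipped, which is where the reported energy reduction comes from.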

Updated: 2025-07-09 13:54:47

Domains: eess.SP,cs.AI

Download: http://arxiv.org/abs/2507.06849v1

Towards Enterprise-Ready Computer Using Generalist Agent

This paper presents our ongoing work toward developing an enterprise-ready Computer Using Generalist Agent (CUGA) system. Our research highlights the evolutionary nature of building agentic systems suitable for enterprise environments. By integrating state-of-the-art agentic AI techniques with a systematic approach to iterative evaluation, analysis, and refinement, we have achieved rapid and cost-effective performance gains, notably reaching a new state-of-the-art performance on the WebArena and AppWorld benchmarks. We detail our development roadmap, the methodology and tools that facilitated rapid learning from failures and continuous system refinement, and discuss key lessons learned and future challenges for enterprise adoption.

Updated: 2025-07-09 13:52:32

Domains: cs.DC,cs.AI,cs.MA

Download: http://arxiv.org/abs/2503.01861v3

Wrapless: The trustless lending protocol on top of Bitcoin

This paper presents Wrapless -- a lending protocol that enables the collateralization of bitcoins without requiring a trusted wrapping mechanism. The protocol facilitates a "loan channel" on the Bitcoin blockchain, allowing bitcoins to be locked as collateral for loans issued on any blockchain that supports Turing-complete smart contracts. The protocol is designed so that it is economically irrational for each involved party to manipulate the loan rules. A significant research area remains in bringing the protocol closer to traditional AMM financial instruments.

Updated: 2025-07-09 13:49:58

Domains: cs.CR

Download: http://arxiv.org/abs/2507.06064v2

Privacy-Utility-Fairness: A Balanced Approach to Vehicular-Traffic Management System

Location-based vehicular traffic management faces significant challenges in protecting sensitive geographical data while maintaining utility for traffic management and fairness across regions. Existing state-of-the-art solutions often fail to meet the required level of protection against linkage attacks and demographic biases, leading to privacy leakage and inequity in data analysis. In this paper, we propose a novel algorithm designed to address the challenges regarding the balance of privacy, utility, and fairness in location-based vehicular traffic management systems. In this context, utility means providing reliable and meaningful traffic information, while fairness ensures that all regions and individuals are treated equitably in data use and decision-making. Employing differential privacy techniques, we enhance data security by integrating query-based data access with iterative shuffling and calibrated noise injection, ensuring that sensitive geographical data remains protected. We ensure adherence to epsilon-differential privacy standards by implementing the Laplace mechanism. We implemented our algorithm on vehicular location-based data from Norway, demonstrating its ability to maintain data utility for traffic management and urban planning while ensuring fair representation of all geographical areas without being overrepresented or underrepresented. Additionally, we have created a heatmap of Norway based on our model, illustrating the privatized and fair representation of the traffic conditions across various cities. Our algorithm provides privacy in vehicular traffic management systems.
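The epsilon-differentially-private Laplace mechanism the paper applies has a standard form, sketched below. The count query and parameters are illustrative, not the authors' exact pipeline: noise with scale sensitivity/epsilon is added to a numeric query result.

```python
import math
import random

# Hedged sketch of the epsilon-DP Laplace mechanism: release a count
# perturbed by Laplace(0, sensitivity / epsilon) noise.

def laplace_noise(scale, rng):
    """Inverse-CDF sample from Laplace(0, scale)."""
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count satisfying epsilon-differential privacy."""
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
releases = [private_count(100, 1.0, rng=rng) for _ in range(5000)]
```

Smaller epsilon means stronger privacy but larger noise; averaging many releases shows the mechanism is unbiased, which is what preserves utility for aggregate traffic statistics.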

Updated: 2025-07-09 13:49:13

标题: 隐私-效用-公平:一种平衡的方法来处理车辆交通管理系统

摘要: 基于位置的车辆交通管理在保护敏感地理数据的同时,还需保持交通管理的实用性和各地区间的公平性,面临重大挑战。现有的最先进解决方案通常无法满足对抗关联攻击和人口统计偏见的所需保护水平,导致隐私泄露和数据分析中的不公平。本文提出了一种新颖的算法,旨在解决基于位置的车辆交通管理系统中隐私、实用性和公平性平衡的挑战。在这种情况下,实用性意味着提供可靠和有意义的交通信息,而公平性确保所有地区和个体在数据使用和决策制定中受到公平对待。通过采用差分隐私技术,我们将基于查询的数据访问与迭代混洗和校准噪声注入相结合,增强了数据安全性,确保敏感的地理位置数据得到保护。我们通过实施拉普拉斯机制,确保遵守ε-差分隐私标准。我们在挪威的车辆位置数据上实现了我们的算法,展示了其在保持交通管理和城市规划数据实用性的同时,确保所有地理区域得到公平代表,既不过度代表也不代表不足。此外,我们基于我们的模型创建了挪威的热力图,展示了各个城市交通状况的隐私化且公平的表示。我们的算法为车辆交通管理系统提供了隐私。

更新时间: 2025-07-09 13:49:13

领域: cs.CR,cs.AI,cs.MA

下载: http://arxiv.org/abs/2507.08864v1

EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning

Recent advances in reinforcement learning (RL) for large language model (LLM) fine-tuning show promise in addressing multi-objective tasks but still face significant challenges, including competing objective balancing, low training efficiency, poor scalability, and limited explainability. Leveraging ensemble learning principles, we introduce an Ensemble Multi-Objective RL (EMORL) framework that fine-tunes multiple models with individual objectives while optimizing their aggregation after the fine-tuning to improve efficiency and flexibility. Our method is the first to aggregate the hidden states of individual models, incorporating contextual information from multiple objectives. This approach is supported by a hierarchical grid search algorithm that identifies optimal weighted combinations. We evaluate EMORL on counselor reflection generation tasks, using text classification models to score the generations and provide rewards during RL fine-tuning. Through comprehensive experiments on the PAIR and Psych8k datasets, we demonstrate the advantages of EMORL against existing baselines: significantly lower and more stable training consumption ($17,529\pm 1,650$ data points and $6,573\pm 147.43$ seconds), improved scalability and explainability, and comparable performance across multiple objectives.
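EMORL's two key ingredients, weighted aggregation of per-model hidden states and a hierarchical (coarse-to-fine) grid search over the weights, can be sketched in miniature. This is a toy illustration under simplifying assumptions (two models, a 1-D weight, a synthetic objective), not the paper's code:

```python
def aggregate(hidden_states, weights):
    # Weighted sum of per-model hidden states: a toy stand-in for
    # EMORL's hidden-state aggregation across objective-specific models.
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

def hierarchical_search(score, lo=0.0, hi=1.0, levels=3, points=5):
    # Coarse-to-fine 1-D grid search over the weight of model A
    # (model B implicitly gets 1 - w); each level refines the grid
    # around the best point found so far.
    best_w = lo
    for _ in range(levels):
        step = (hi - lo) / (points - 1)
        grid = [lo + i * step for i in range(points)]
        best_w = max(grid, key=score)
        lo, hi = max(0.0, best_w - step), min(1.0, best_w + step)
    return best_w

h = aggregate([[1.0, 0.0], [0.0, 1.0]], [0.25, 0.75])
score = lambda w: -(w - 0.3) ** 2  # toy multi-objective reward surface
w = hierarchical_search(score)
```

Refining the grid level by level visits far fewer points than one dense grid of the same final resolution, which is the efficiency argument behind the hierarchical search.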

Updated: 2025-07-09 13:45:07

标题: EMORL:集成多目标强化学习用于高效和灵活的LLM微调

摘要: 最近在大型语言模型(LLM)微调方面,强化学习(RL)取得了一些进展,显示出在解决多目标任务方面有潜力,但仍面临着重大挑战,包括竞争目标平衡、低训练效率、扩展性差以及可解释性有限。借鉴集成学习原理,我们引入了一个集成多目标强化学习(EMORL)框架,该框架微调多个具有个别目标的模型,并在微调后优化它们的聚合,以提高效率和灵活性。我们的方法是首个聚合个别模型的隐藏状态,将来自多个目标的上下文信息纳入其中。这种方法得到了一个层次化的网格搜索算法的支持,该算法能够确定最佳加权组合。我们在辅导员反思生成任务上评估了EMORL,使用文本分类模型对生成进行评分,并在强化学习微调过程中提供奖励。通过对PAIR和Psych8k数据集进行全面实验,我们展示了EMORL相对于现有基线的优势:训练消耗显著更低且更稳定($17,529\pm 1,650$个数据点和$6,573\pm 147.43$秒),提高了可扩展性和可解释性,并在多个目标上表现出可比性的性能。

更新时间: 2025-07-09 13:45:07

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2505.02579v3

Adaptive collaboration for online personalized distributed learning with heterogeneous clients

We study the problem of online personalized decentralized learning with $N$ statistically heterogeneous clients collaborating to accelerate local training. An important challenge in this setting is to select relevant collaborators to reduce gradient variance while mitigating the introduced bias. To tackle this, we introduce a gradient-based collaboration criterion, allowing each client to dynamically select peers with similar gradients during the optimization process. Our criterion is motivated by a refined and more general theoretical analysis of the All-for-one algorithm, proved to be optimal in Even et al. (2022) for an oracle collaboration scheme. We derive excess loss upper-bounds for smooth objective functions, being either strongly convex, non-convex, or satisfying the Polyak-Lojasiewicz condition; our analysis reveals that the algorithm acts as a variance reduction method where the speed-up depends on a sufficient variance. We put forward two collaboration methods instantiating the proposed general schema; and we show that one variant preserves the optimality of All-for-one. We validate our results with experiments on synthetic and real datasets.
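A gradient-based collaboration criterion of the kind described can be sketched as cosine-similarity peer selection; the 0.8 threshold and the function names here are illustrative assumptions, not the paper's actual rule:

```python
import math

def cosine(g1, g2):
    # Cosine similarity between two gradient vectors.
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    return dot / (n1 * n2)

def select_peers(my_grad, peer_grads, threshold=0.8):
    # Keep peers whose gradients point in a similar direction:
    # averaging with them reduces variance while limiting the bias
    # introduced by statistical heterogeneity.
    return [i for i, g in enumerate(peer_grads)
            if cosine(my_grad, g) >= threshold]

peers = select_peers([1.0, 0.0],
                     [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
```

Only the first peer is aligned closely enough to collaborate with; the orthogonal and opposed gradients are rejected as likely sources of bias.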

Updated: 2025-07-09 13:44:27

标题: 具有异构客户的在线个性化分布式学习的自适应协作

摘要: 我们研究了在线个性化分散学习的问题,其中$N$个统计异质客户端合作加速本地训练。在这种情况下的一个重要挑战是选择相关的合作伙伴,以减少梯度方差同时减轻引入的偏差。为了解决这个问题,我们引入了一个基于梯度的合作标准,允许每个客户端在优化过程中动态选择具有类似梯度的同行。我们的标准受到All-for-one算法的一个更精细和更一般的理论分析的启发,该算法在Even等人(2022年)中被证明对于一个预设的合作方案是最优的。我们推导了平滑目标函数的过度损失上界,它们可以是强凸的,非凸的,或者满足Polyak-Lojasiewicz条件;我们的分析表明,该算法作为一种方差缩减方法,其加速取决于足够的方差。我们提出了两种实例化所提出的一般框架的合作方法;并且我们展示其中一种变体保持了All-for-one的最佳性。我们通过合成和真实数据集上的实验证实了我们的结果。

更新时间: 2025-07-09 13:44:27

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2507.06844v1

Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models

Text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images but also raise people's concerns about generating harmful or misleading content. While extensive approaches have been proposed to erase unwanted concepts without requiring retraining from scratch, they inadvertently degrade performance on normal generation tasks. In this work, we propose Interpret then Deactivate (ItD), a novel framework to enable precise concept removal in T2I diffusion models while preserving overall performance. ItD first employs a sparse autoencoder (SAE) to interpret each concept as a combination of multiple features. By permanently deactivating the specific features associated with target concepts, we repurpose SAE as a zero-shot classifier that identifies whether the input prompt includes target concepts, allowing selective concept erasure in diffusion models. Moreover, we demonstrate that ItD can be easily extended to erase multiple concepts without requiring further training. Comprehensive experiments across celebrity identities, artistic styles, and explicit content demonstrate ItD's effectiveness in eliminating targeted concepts without interfering with normal concept generation. Additionally, ItD is also robust against adversarial prompts designed to circumvent content filters. Code is available at: https://github.com/NANSirun/Interpret-then-deactivate.
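The core mechanic, encode with a sparse autoencoder, permanently deactivate the features tied to a target concept, and reuse those same features as a zero-shot classifier for the prompt, can be illustrated on a toy two-feature SAE (the matrices and the tolerance below are hypothetical, not learned weights):

```python
def encode(x, W):
    # Toy SAE encoder: ReLU of a linear map gives sparse feature activations.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def erase_and_decode(x, W, D, erased):
    # Deactivate the features tied to the target concept, then decode
    # with the dictionary D; the shared, benign structure survives.
    a = encode(x, W)
    for i in erased:
        a[i] = 0.0
    dim = len(D[0])
    return a, [sum(a[i] * D[i][d] for i in range(len(a))) for d in range(dim)]

def contains_concept(x, W, erased, tol=1e-6):
    # SAE as a zero-shot classifier: the input triggers the concept
    # iff any of the erased features fires on it.
    a = encode(x, W)
    return any(a[i] > tol for i in erased)

W = [[1.0, 0.0], [0.0, 1.0]]   # toy encoder rows = feature directions
D = [[1.0, 0.0], [0.0, 1.0]]   # toy decoder (dictionary) rows
flagged = contains_concept([0.5, 0.8], W, erased={1})
_, recon = erase_and_decode([0.5, 0.8], W, D, erased={1})
```

Deactivation removes exactly the target feature's contribution to the reconstruction while inputs that never activate it pass through unchanged, which is the "selective erasure" property.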

Updated: 2025-07-09 13:44:21

标题: 稀疏自编码器作为零样本分类器,用于文本到图像扩散模型中的概念擦除

摘要: 文本到图像(T2I)扩散模型在生成高质量图像方面取得了显著进展,但也引起了人们对生成有害或误导性内容的担忧。尽管已经提出了许多方法来消除不需要的概念,而无需从头开始重新训练,但这些方法不经意地降低了正常生成任务的性能。在这项工作中,我们提出了“先解释再停用”(ItD)的新框架,以实现在T2I扩散模型中精确删除概念,同时保持整体性能。ItD首先使用稀疏自编码器(SAE)将每个概念解释为多个特征的组合。通过永久停用与目标概念相关联的特定特征,我们重新用SAE作为零样本分类器,用于识别输入提示是否包含目标概念,从而允许在扩散模型中进行选择性概念擦除。此外,我们证明了ItD可以轻松扩展到擦除多个概念,而无需进一步训练。对名人身份、艺术风格和明确内容进行的全面实验表明了ItD在消除目标概念方面的有效性,而不会干扰正常概念生成。此外,ItD还能够抵抗旨在规避内容过滤器的对抗提示。代码可在以下链接找到:https://github.com/NANSirun/Interpret-then-deactivate。

更新时间: 2025-07-09 13:44:21

领域: cs.CV,cs.AI,cs.CR

下载: http://arxiv.org/abs/2503.09446v3

Towards Collaborative Anti-Money Laundering Among Financial Institutions

Money laundering is the process that intends to legalize the income derived from illicit activities, thus facilitating their entry into the monetary flow of the economy without jeopardizing their source. It is crucial to identify such activities accurately and reliably in order to enforce anti-money laundering (AML). Despite considerable efforts to AML, a large number of such activities still go undetected. Rule-based methods were first introduced and are still widely used in current detection systems. With the rise of machine learning, graph-based learning methods have gained prominence in detecting illicit accounts through the analysis of money transfer graphs. Nevertheless, these methods generally assume that the transaction graph is centralized, whereas in practice, money laundering activities usually span multiple financial institutions. Due to regulatory, legal, commercial, and customer privacy concerns, institutions tend not to share data, restricting their utility in practical usage. In this paper, we propose the first algorithm that supports performing AML over multiple institutions while protecting the security and privacy of local data. To evaluate, we construct Alipay-ECB, a real-world dataset comprising digital transactions from Alipay, the world's largest mobile payment platform, alongside transactions from E-Commerce Bank (ECB). The dataset includes over 200 million accounts and 300 million transactions, covering both intra-institution transactions and those between Alipay and ECB. This makes it the largest real-world transaction graph available for analysis. The experimental results demonstrate that our methods can effectively identify cross-institution money laundering subgroups. Additionally, experiments on synthetic datasets also demonstrate that our method is efficient, requiring only a few minutes on datasets with millions of transactions.

Updated: 2025-07-09 13:39:50

标题: 朝向金融机构间合作的反洗钱行动

摘要: 洗钱是一种旨在合法化来源于非法活动的收入的过程,从而促进它们进入经济货币流动,而不危及它们的来源。准确可靠地识别这类活动对于执行反洗钱(AML)至关重要。尽管在反洗钱方面做出了相当大的努力,但仍有大量此类活动未被发现。基于规则的方法首次被引入并仍广泛用于当前检测系统。随着机器学习的兴起,基于图的学习方法在通过分析资金转移图来检测非法账户方面日益引人关注。然而,这些方法通常假设交易图是集中的,而实际上,洗钱活动通常跨越多个金融机构。由于监管、法律、商业和客户隐私方面的考虑,机构往往不愿共享数据,限制了它们在实际使用中的效用。在本文中,我们提出了第一个支持在多个机构上执行AML的算法,同时保护本地数据的安全性和隐私性。为了评估,我们构建了Alipay-ECB,这是一个真实世界的数据集,包括来自世界最大移动支付平台支付宝的数字交易,以及来自电子商务银行(ECB)的交易。该数据集包括超过2亿个账户和3亿笔交易,涵盖了机构内部交易和支付宝与ECB之间的交易。这使得它成为可供分析的最大真实世界交易图。实验结果表明,我们的方法可以有效识别跨机构的洗钱小组。此外,对合成数据集的实验也表明我们的方法是高效的,在拥有数百万笔交易的数据集上只需几分钟。

更新时间: 2025-07-09 13:39:50

领域: cs.SI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2502.19952v3

Scalable Gaussian Processes: Advances in Iterative Methods and Pathwise Conditioning

Gaussian processes are a powerful framework for uncertainty-aware function approximation and sequential decision-making. Unfortunately, their classical formulation does not scale gracefully to large amounts of data and modern hardware for massively-parallel computation, prompting many researchers to develop techniques which improve their scalability. This dissertation focuses on the powerful combination of iterative methods and pathwise conditioning to develop methodological contributions which facilitate the use of Gaussian processes in modern large-scale settings. By combining these two techniques synergistically, expensive computations are expressed as solutions to systems of linear equations and obtained by leveraging iterative linear system solvers. This drastically reduces memory requirements, facilitating application to significantly larger amounts of data, and introduces matrix multiplication as the main computational operation, which is ideal for modern hardware.
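The central primitive, solving (K + sigma^2 I) v = y with an iterative solver that touches the kernel only through matrix-vector products, can be sketched with plain conjugate gradients. This is a textbook CG on a tiny kernel matrix, not the dissertation's implementation:

```python
def cg_solve(matvec, b, tol=1e-10, max_iter=100):
    # Conjugate gradients: solves A v = b for SPD A using only
    # matrix-vector products, the core primitive of iterative GP inference.
    x = [0.0] * len(b)
    r = b[:]
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Tiny example: (K + sigma^2 I) v = y; the posterior mean at the
# training inputs is then K @ v.
K = [[1.0, 0.5], [0.5, 1.0]]
sigma2 = 0.1
y = [1.0, -1.0]
matvec = lambda v: [sum(K[i][j] * v[j] for j in range(2)) + sigma2 * v[i]
                    for i in range(2)]
v = cg_solve(matvec, y)
mean = [sum(K[i][j] * v[j] for j in range(2)) for i in range(2)]
```

Because only `matvec` is needed, the kernel matrix never has to be stored or factorized, which is exactly the memory argument made in the abstract.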

Updated: 2025-07-09 13:39:37

标题: 可扩展的高斯过程:迭代方法和路径条件的进展

摘要: 高斯过程是一种强大的不确定性感知函数逼近和序贯决策框架。不幸的是,它们的经典形式在处理大量数据和现代硬件的大规模并行计算方面并不优雅,促使许多研究人员开发改进可扩展性的技术。本文重点研究迭代方法和路径条件技术的强大组合,以开发方法论贡献,促使高斯过程在现代大规模环境中的应用。通过将这两种技术协同结合起来,昂贵的计算被表达为线性方程组的解,并通过利用迭代线性系统求解器获得。这显著减少了内存需求,便于应用于大量数据,并引入矩阵乘法作为主要计算操作,这对现代硬件非常理想。

更新时间: 2025-07-09 13:39:37

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2507.06839v1

PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection

Object detection plays a crucial role in many security-sensitive applications. However, several recent studies have shown that object detectors can be easily fooled by physically realizable attacks, \eg, adversarial patches and recent adversarial textures, which pose realistic and urgent threats. Adversarial Training (AT) has been recognized as the most effective defense against adversarial attacks. While AT has been extensively studied in the $l_\infty$ attack settings on classification models, AT against physically realizable attacks on object detectors has received limited exploration. Early attempts defend only against adversarial patches, leaving AT against a wider range of physically realizable attacks under-explored. In this work, we consider defending against various physically realizable attacks with a unified AT method. We propose PBCAT, a novel Patch-Based Composite Adversarial Training strategy. PBCAT optimizes the model by incorporating the combination of small-area gradient-guided adversarial patches and imperceptible global adversarial perturbations covering the entire image. With these designs, PBCAT has the potential to defend against not only adversarial patches but also unseen physically realizable attacks such as adversarial textures. Extensive experiments in multiple settings demonstrated that PBCAT significantly improved robustness against various physically realizable attacks over state-of-the-art defense methods. Notably, it improved the detection accuracy by 29.7\% over previous defense methods under one recent adversarial texture attack.

Updated: 2025-07-09 13:36:11

标题: PBCAT:基于补丁的复合对抗训练,针对物理可实现的物体检测攻击

摘要: 目标检测在许多安全敏感应用中起着至关重要的作用。然而,一些最近的研究表明,目标检测器很容易受到物理上可实现的攻击,例如对抗性贴片和最近的对抗性纹理,这构成了现实和紧迫的威胁。对抗训练(AT)被认为是对抗性攻击最有效的防御手段。虽然在分类模型的$l_\infty$攻击设置下已经广泛研究了AT,但对抗性攻击物理上可实现的攻击对目标检测器的AT的研究仍受到限制。早期尝试仅用于防御对抗性贴片,对更广泛范围的物理上可实现的攻击的AT研究尚未深入探讨。在这项工作中,我们考虑使用统一的AT方法来抵御各种物理上可实现的攻击。我们提出了PBCAT,一种新颖的基于贴片的复合对抗训练策略。PBCAT通过结合小区域梯度引导的对抗性贴片和涵盖整个图像的难以察觉的全局对抗扰动来优化模型。通过这些设计,PBCAT有潜力不仅防御对抗性贴片,还能防御对抗性纹理等未知物理上可实现的攻击。在多个设置中进行的大量实验表明,PBCAT显著提高了对各种物理上可实现的攻击的抵抗力,超过了最先进的防御方法。值得注意的是,在最近的一种对抗性纹理攻击下,它将检测准确性提高了29.7%。

更新时间: 2025-07-09 13:36:11

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2506.23581v2

Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation

Recent advances in diffusion-based and autoregressive video generation models have achieved remarkable visual realism. However, these models typically lack accurate physical alignment, failing to replicate real-world dynamics in object motion. This limitation arises primarily from their reliance on learned statistical correlations rather than capturing mechanisms adhering to physical laws. To address this issue, we introduce a novel framework that integrates symbolic regression (SR) and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories. These trajectories then guide video generation without requiring fine-tuning of existing models. Evaluated on scenarios in Classical Mechanics, including spring-mass, pendulums, and projectile motions, our method successfully recovers ground-truth analytical equations and improves the physical alignment of generated videos over baseline methods.
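The idea of discovering an equation of motion from an observed trajectory and then extrapolating it can be illustrated with the simplest case: a least-squares fit of y = a + b*t + c*t^2 to projectile data recovers c = -g/2 exactly on noiseless input. Symbolic regression proper also searches over equation structure; this sketch fixes the quadratic form and is purely illustrative:

```python
def fit_quadratic(ts, ys):
    # Least-squares fit y = a + b t + c t^2 via the normal equations,
    # a stand-in for equation discovery when the form is known.
    X = [[1.0, t, t * t] for t in ts]
    A = [[sum(X[k][i] * X[k][j] for k in range(len(ts))) for j in range(3)]
         for i in range(3)]
    rhs = [sum(X[k][i] * ys[k] for k in range(len(ts))) for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        coef[r] = (rhs[r] - sum(A[r][c] * coef[c]
                                for c in range(r + 1, 3))) / A[r][r]
    return coef  # [a, b, c]

g = 9.81
ts = [0.1 * k for k in range(10)]
ys = [2.0 + 5.0 * t - 0.5 * g * t * t for t in ts]  # noiseless height samples
a, v0, c = fit_quadratic(ts, ys)
```

Once the coefficients are recovered, the fitted equation extrapolates physically plausible future positions, which is what then guides the video generator.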

Updated: 2025-07-09 13:28:42

标题: 物理学为基础的运动预测:通过方程发现实现轨迹引导的图像到视频生成

摘要: 最近在基于扩散和自回归的视频生成模型方面取得了显著的视觉逼真度。然而,这些模型通常缺乏准确的物理对准,无法复制物体运动中的真实世界动态。这一限制主要源于它们依赖于学习的统计相关性而不是捕捉符合物理定律的机制。为了解决这个问题,我们引入了一个集成了符号回归(SR)和轨迹引导图像到视频(I2V)模型的新框架,用于基于物理的视频预测。我们的方法从输入视频中提取运动轨迹,使用基于检索的预训练机制增强符号回归,并发现运动方程以预测物理上准确的未来轨迹。这些轨迹然后指导视频生成,而无需对现有模型进行微调。在经典力学场景中进行评估,包括弹簧-质量、摆和抛射运动,我们的方法成功恢复了地面真实解析方程,并改善了生成视频的物理对准,超过了基准方法。

更新时间: 2025-07-09 13:28:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.06830v1

Speckle2Self: Self-Supervised Ultrasound Speckle Reduction Without Clean Data

Image denoising is a fundamental task in computer vision, particularly in medical ultrasound (US) imaging, where speckle noise significantly degrades image quality. Although recent advancements in deep neural networks have led to substantial improvements in denoising for natural images, these methods cannot be directly applied to US speckle noise, as it is not purely random. Instead, US speckle arises from complex wave interference within the body microstructure, making it tissue-dependent. This dependency means that obtaining two independent noisy observations of the same scene, as required by pioneering Noise2Noise, is not feasible. Additionally, blind-spot networks also cannot handle US speckle noise due to its high spatial dependency. To address this challenge, we introduce Speckle2Self, a novel self-supervised algorithm for speckle reduction using only single noisy observations. The key insight is that applying a multi-scale perturbation (MSP) operation introduces tissue-dependent variations in the speckle pattern across different scales, while preserving the shared anatomical structure. This enables effective speckle suppression by modeling the clean image as a low-rank signal and isolating the sparse noise component. To demonstrate its effectiveness, Speckle2Self is comprehensively compared with conventional filter-based denoising algorithms and SOTA learning-based methods, using both realistic simulated US images and human carotid US images. Additionally, data from multiple US machines are employed to evaluate model generalization and adaptability to images from unseen domains. Code and datasets will be released upon acceptance.

Updated: 2025-07-09 13:28:00

标题: Speckle2Self: 无需干净数据的自监督超声散斑去除

摘要: 图像去噪是计算机视觉中的一个基本任务,特别是在医学超声(US)成像中,斑点噪声显著降低了图像质量。尽管深度神经网络的最新进展在自然图像去噪方面取得了重大进展,但这些方法不能直接应用于US斑点噪声,因为它并非纯粹随机。相反,US斑点起源于人体微结构内的复杂波干涉,使其依赖于组织。这种依赖关系意味着无法获得两个独立的相同场景的噪声观测,这是先驱性Noise2Noise所要求的。此外,由于其高空间依赖性,盲点网络也无法处理US斑点噪声。为了解决这一挑战,我们引入了Speckle2Self,一种新颖的自监督算法,用于仅使用单个嘈杂观察对斑点进行去除。关键见解是,应用多尺度扰动(MSP)操作在不同尺度上引入了依赖于组织的斑点模式变化,同时保留了共享的解剖结构。这通过将干净图像建模为低秩信号并隔离稀疏噪声成分来实现有效的斑点抑制。为了证明其有效性,Speckle2Self在使用逼真模拟的US图像和人体颈动脉US图像时,与传统基于滤波器的去噪算法和SOTA学习方法进行了全面比较。此外,利用多个US机器的数据来评估模型的泛化能力和适应能力,以适应来自未知领域的图像。代码和数据集将在接受后发布。

更新时间: 2025-07-09 13:28:00

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.06828v1

LLM Agent for Hyper-Parameter Optimization

Hyper-parameters are critical for the performance of communication algorithms. However, current hyper-parameter optimization approaches for the Warm-Start Particle Swarm Optimization with Crossover and Mutation (WS-PSO-CM) algorithm, designed for radio map-enabled unmanned aerial vehicle (UAV) trajectory and communication, are primarily heuristic-based, exhibiting low levels of automation and improvable performance. In this paper, we design a Large Language Model (LLM) agent for automatic hyper-parameter tuning, where an iterative framework and Model Context Protocol (MCP) are applied. In particular, the LLM agent is first set up via a profile, which specifies the boundary of hyper-parameters, task objective, terminal condition, conservative or aggressive strategy of optimizing hyper-parameters, and LLM configurations. Then, the LLM agent iteratively invokes the WS-PSO-CM algorithm for exploration. Finally, the LLM agent exits the loop based on the terminal condition and returns an optimized set of hyper-parameters. Our experimental results show that the minimal sum-rate achieved by hyper-parameters generated via our LLM agent is significantly higher than those by both human heuristics and random generation methods. This indicates that an LLM agent with PSO and WS-PSO-CM algorithm knowledge is useful in seeking high-performance hyper-parameters.
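The iterative tuning loop described, propose hyper-parameters, invoke the inner algorithm, check a terminal condition, can be sketched generically. The proposer below is a stub standing in for the LLM agent, and the objective is a toy stand-in for the minimal sum-rate; names and budgets are illustrative:

```python
import random

def tune(evaluate, propose, budget=200, target=None, seed=0):
    # Iterative tuning loop: a proposer (the LLM agent in the paper,
    # stubbed here) suggests hyper-parameters, the inner algorithm is
    # invoked via `evaluate`, and the loop exits on budget exhaustion
    # or when the terminal condition is reached.
    rng = random.Random(seed)
    best_hp, best_score = None, float("-inf")
    for _ in range(budget):
        hp = propose(best_hp, best_score, rng)
        score = evaluate(hp)
        if score > best_score:
            best_hp, best_score = hp, score
        if target is not None and best_score >= target:
            break  # terminal condition reached
    return best_hp, best_score

def propose(best_hp, best_score, rng):
    # Stub proposer: random restart, then Gaussian perturbation of the
    # incumbent, clipped to the declared hyper-parameter boundary [0, 1].
    if best_hp is None:
        return {"w": rng.uniform(0.0, 1.0)}
    return {"w": min(1.0, max(0.0, best_hp["w"] + rng.gauss(0.0, 0.1)))}

evaluate = lambda hp: -(hp["w"] - 0.7) ** 2  # toy objective peaking at 0.7
hp, score = tune(evaluate, propose, budget=200, target=-1e-4)
```

The profile elements from the abstract (boundary, objective, terminal condition) map onto the clipping range, `evaluate`, and `target` respectively; only the proposer would be replaced by LLM calls.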

Updated: 2025-07-09 13:20:45

标题: LLM代理用于超参数优化

摘要: 超参数对通信算法的性能至关重要。然而,目前用于热启动粒子群优化与交叉变异(WS-PSO-CM)算法(该算法为基于无线电地图的无人机(UAV)轨迹与通信而设计)的超参数优化方法,主要是基于启发式的,自动化水平低且性能有待提升。在本文中,我们设计了一个大型语言模型(LLM)代理用于自动超参数调整,采用了迭代框架和模型上下文协议(MCP)。具体来说,LLM代理首先通过配置文件进行设置,配置文件指定了超参数的边界、任务目标、终止条件、优化超参数的保守或激进策略以及LLM的配置。然后,LLM代理通过迭代调用WS-PSO-CM算法进行探索。最后,LLM代理根据终止条件退出循环,并返回一组优化的超参数。我们的实验结果显示,通过我们的LLM代理生成的超参数所实现的最小总速率显著高于人类启发式和随机生成方法。这表明,具有PSO和WS-PSO-CM算法知识的LLM代理在寻找高性能超参数方面是有用的。

更新时间: 2025-07-09 13:20:45

领域: cs.IT,cs.AI,math.IT

下载: http://arxiv.org/abs/2506.15167v2

Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning

We introduce a real-time strategy game environment built on Generals.io, a game that hosts thousands of active players each week across multiple game formats. Our environment is fully compatible with Gymnasium and PettingZoo, capable of running thousands of frames per second on commodity hardware. Our reference agent -- trained with supervised pre-training and self-play -- hits the top 0.003\% of the 1v1 human leaderboard after just 36 hours on a single H100 GPU. To accelerate learning, we incorporate potential-based reward shaping and memory features. Our contributions -- a modular RTS benchmark and a competitive, state-of-the-art baseline agent -- provide an accessible yet challenging platform for advancing multi-agent reinforcement learning research.
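Potential-based reward shaping, one of the acceleration tricks mentioned, adds F_t = gamma * Phi(s_{t+1}) - Phi(s_t) to each reward; the shaping terms telescope, so the optimal policy is provably unchanged (Ng, Harada and Russell, 1999). A minimal check with hypothetical potentials:

```python
def shaped_rewards(rewards, potentials, gamma=0.99):
    # Potential-based shaping: F_t = gamma * Phi(s_{t+1}) - Phi(s_t).
    # Adding F_t to r_t densifies a sparse reward signal while
    # preserving the optimal policy.
    return [r + gamma * potentials[t + 1] - potentials[t]
            for t, r in enumerate(rewards)]

rewards = [0.0, 0.0, 1.0]          # sparse win reward at episode end
potentials = [0.2, 0.5, 0.9, 0.0]  # Phi per state; terminal Phi = 0
dense = shaped_rewards(rewards, potentials)
```

The discounted sum of the added shaping terms collapses to gamma^T * Phi(s_T) - Phi(s_0), a constant offset per start state, which is why learning speed improves without changing what is optimal.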

Updated: 2025-07-09 13:15:05

标题: 人工智能通识:用强化学习掌握Generals.io

摘要: 我们介绍了一个基于Generals.io构建的实时策略游戏环境,Generals.io每周吸引数千名活跃玩家参与多种游戏模式。我们的环境与Gymnasium和PettingZoo完全兼容,能够在普通硬件上每秒运行数千帧。我们的参考智能体经过监督预训练和自我对弈训练,仅在单个H100 GPU上36小时后就达到了1v1人类排行榜的前0.003\%。为加快学习速度,我们还引入了基于潜在奖励塑造和记忆特征。我们的贡献是一个模块化的实时策略游戏基准测试和一个具有竞争力的最新基准智能体,为推动多智能体强化学习研究提供了一个既可访问又具有挑战性的平台。

更新时间: 2025-07-09 13:15:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.06825v1

Fredholm Neural Networks for forward and inverse problems in elliptic PDEs

Building on our previous work introducing Fredholm Neural Networks (Fredholm NNs/ FNNs) for solving integral equations, we extend the framework to tackle forward and inverse problems for linear and semi-linear elliptic partial differential equations. The proposed scheme consists of a deep neural network (DNN) which is designed to represent the iterative process of fixed-point iterations for the solution of elliptic PDEs using the boundary integral method within the framework of potential theory. The number of layers, weights, biases and hyperparameters are computed in an explainable manner based on the iterative scheme, and we therefore refer to this as the Potential Fredholm Neural Network (PFNN). We show that this approach ensures both accuracy and explainability, achieving small errors in the interior of the domain, and near machine-precision on the boundary. We provide a constructive proof for the consistency of the scheme and provide explicit error bounds for both the interior and boundary of the domain, reflected in the layers of the PFNN. These error bounds depend on the approximation of the boundary function and the integral discretization scheme, both of which directly correspond to components of the Fredholm NN architecture. In this way, we provide an explainable scheme that explicitly respects the boundary conditions. We assess the performance of the proposed scheme for the solution of both the forward and inverse problem for linear and semi-linear elliptic PDEs in two dimensions.
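The fixed-point iteration that the PFNN layers unroll can be sketched on a discretized Fredholm equation u = f + lambda * \int_0^1 k(x,s) u(s) ds; for k = 1, f = 1, lambda = 0.5 the solution is the constant u = 2. The midpoint rule and the iteration count below are illustrative discretization choices, not the paper's scheme:

```python
def fixed_point_solve(f, k, lam, n=50, iters=60):
    # Discretized Picard iteration u <- f + lam * K u; in the PFNN
    # picture, each network layer corresponds to one such sweep.
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]   # midpoint-rule nodes
    u = [f(x) for x in xs]
    for _ in range(iters):
        u = [f(x) + lam * h * sum(k(x, s) * us for s, us in zip(xs, u))
             for x in xs]
    return xs, u

xs, u = fixed_point_solve(lambda x: 1.0, lambda x, s: 1.0, lam=0.5)
```

For |lam| times the kernel norm below 1 the map is a contraction, so the per-layer error shrinks geometrically, mirroring the layer-wise error bounds the abstract describes.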

Updated: 2025-07-09 13:12:54

标题: Fredholm神经网络在椭圆型偏微分方程前向和反向问题中的应用

摘要: 基于我们之前介绍的Fredholm神经网络(Fredholm NNs / FNNs)用于解决积分方程的工作,我们将该框架扩展到解决线性和半线性椭圆偏微分方程的正向和反向问题。所提出的方案包括一个深度神经网络(DNN),旨在表示使用边界积分方法解决椭圆PDE的不动点迭代过程。层数、权重、偏置和超参数根据迭代方案以可解释的方式计算,因此我们将其称为Potential Fredholm神经网络(PFNN)。我们展示了这种方法确保了准确性和可解释性,在域的内部实现了较小的误差,并在边界附近接近机器精度。我们为方案的一致性提供了一个建设性证明,并为域的内部和边界提供了显式误差界限,反映在PFNN的层中。这些误差界限取决于边界函数的逼近和积分离散化方案,这两者直接对应于Fredholm NN架构的组成部分。通过这种方式,我们提供了一个可解释的方案,明确遵守边界条件。我们评估了该方案在二维中解决线性和半线性椭圆PDE的正向和反向问题的性能。

更新时间: 2025-07-09 13:12:54

领域: math.NA,cs.LG,cs.NA,68T07, 65N12, 65N21 (Primary), 45B05, 65N38 (Secondary)

下载: http://arxiv.org/abs/2507.06038v2

Hybrid Quantum-Classical Multi-Agent Pathfinding

Multi-Agent Path Finding (MAPF) focuses on determining conflict-free paths for multiple agents navigating through a shared space to reach specified goal locations. This problem becomes computationally challenging, particularly when handling large numbers of agents, as frequently encountered in practical applications like coordinating autonomous vehicles. Quantum Computing (QC) is a promising candidate for overcoming such limits. However, current quantum hardware is still in its infancy and thus limited in terms of computing power and error robustness. In this work, we present the first optimal hybrid quantum-classical MAPF algorithms, which are based on branch-and-cut-and-price. QC is integrated by iteratively solving QUBO problems, based on conflict graphs. Experiments on actual quantum hardware and results on benchmark data suggest that our approach dominates previous QUBO formulations and state-of-the-art MAPF solvers.
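The QUBO step can be illustrated at toy scale: binary variables select one candidate path per agent, one-hot constraints and vertex-conflict penalties form the objective, and (here) brute force stands in for the quantum solver. The paths, penalty weight, and variable layout are hypothetical:

```python
from itertools import product

def build_qubo(paths_a, paths_b, penalty=10.0):
    # Variables: x0,x1 pick agent A's path; x2,x3 pick agent B's.
    # Energy = vertex-conflict penalties + one-hot constraints
    # (sum_p x_{a,p} - 1)^2 per agent (constant offset dropped).
    def conflicts(p, q):
        return sum(1 for u, v in zip(p, q) if u == v)
    Q = {}
    def add(i, j, w):
        key = (min(i, j), max(i, j))
        Q[key] = Q.get(key, 0.0) + w
    for a in (0, 1):            # one-hot for agent A
        add(a, a, -penalty)
    add(0, 1, 2 * penalty)
    for b in (2, 3):            # one-hot for agent B
        add(b, b, -penalty)
    add(2, 3, 2 * penalty)
    for i, p in enumerate(paths_a):
        for j, q in enumerate(paths_b):
            add(i, 2 + j, penalty * conflicts(p, q))
    return Q

def energy(Q, x):
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())

# Paths as vertex sequences over time; A's path 0 collides with B's path 0.
paths_a = [(1, 2, 3), (1, 4, 3)]
paths_b = [(5, 2, 6), (5, 7, 6)]
Q = build_qubo(paths_a, paths_b)
best = min(product((0, 1), repeat=4), key=lambda x: energy(Q, x))
```

The minimum-energy assignment picks exactly one path per agent and avoids the shared vertex, which is the conflict-graph resolution a quantum annealer or QAOA layer would be asked to find.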

Updated: 2025-07-09 13:09:38

标题: 混合量子-经典多智能体路径规划

摘要: 多智能体路径规划(MAPF)专注于确定多个代理在共享空间中导航到指定目标位置的无冲突路径。这个问题在处理大量代理时变得计算上具有挑战性,特别是在实际应用中经常遇到的协调自主车辆的情况下。量子计算(QC)是克服这些限制的有希望的候选方案。然而,当前的量子硬件仍处于初级阶段,因此在计算能力和错误鲁棒性方面受到限制。在这项工作中,我们提出了首个基于分支-切割-定价(branch-and-cut-and-price)的最优混合量子-经典MAPF算法。QC通过迭代解决基于冲突图的QUBO问题进行集成。对实际量子硬件的实验和基准数据的结果表明,我们的方法优于先前的QUBO公式和最先进的MAPF求解器。

更新时间: 2025-07-09 13:09:38

领域: cs.AI,quant-ph

下载: http://arxiv.org/abs/2501.14568v2

Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts

Large Language Models (LLMs) are increasingly employed as automated evaluators to assess the safety of generated content, yet their reliability in this role remains uncertain. This study evaluates a diverse set of 11 LLM judge models across critical safety domains, examining three key aspects: self-consistency in repeated judging tasks, alignment with human judgments, and susceptibility to input artifacts such as apologetic or verbose phrasing. Our findings reveal that biases in LLM judges can significantly distort the final verdict on which content source is safer, undermining the validity of comparative evaluations. Notably, apologetic language artifacts alone can skew evaluator preferences by up to 98\%. Contrary to expectations, larger models do not consistently exhibit greater robustness, while smaller models sometimes show higher resistance to specific artifacts. To mitigate LLM evaluator robustness issues, we investigate jury-based evaluations aggregating decisions from multiple models. Although this approach both improves robustness and enhances alignment to human judgements, artifact sensitivity persists even with the best jury configurations. These results highlight the urgent need for diversified, artifact-resistant methodologies to ensure reliable safety assessments.
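Jury-based evaluation of the kind studied can be sketched as majority voting over per-judge verdicts, with ties reported rather than resolved arbitrarily. The stub judges below caricature the paper's apologetic-phrasing artifact; they are not the actual judge models:

```python
from collections import Counter

def jury_verdict(judges, response_a, response_b):
    # Aggregate per-judge safety verdicts by majority vote; ties fall
    # back to "inconclusive" rather than a coin flip.
    votes = Counter(j(response_a, response_b) for j in judges)
    (top, n), *rest = votes.most_common()
    if rest and rest[0][1] == n:
        return "inconclusive"
    return top

# Stub judges: two judge on content, one is swayed by apologetic phrasing.
robust = lambda a, b: "A" if "weaponize" in b else "B"
swayed = lambda a, b: ("B" if "sorry" in b
                       else ("A" if "weaponize" in b else "B"))
verdict = jury_verdict([robust, robust, swayed],
                       "Here is a safe answer.",
                       "I'm sorry, but here is how to weaponize it...")
```

With two robust judges the jury outvotes the artifact-sensitive one, but as the abstract notes, a jury only dilutes, and does not eliminate, artifact sensitivity.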

Updated: 2025-07-09 13:09:13

标题: 更安全还是更幸运?LLMs作为安全评估者对人为干扰不具有稳健性

摘要: 大型语言模型(LLMs)越来越被用作自动生成内容安全性评估的自动化评估器,然而它们在这一角色中的可靠性仍然不确定。本研究评估了一个多样化的11个LLM评判模型,涵盖了关键的安全领域,考察了三个关键方面:在重复评判任务中的自一致性,与人类判断的一致性,以及对诸如道歉或冗长措辞等输入工件的敏感性。我们的发现显示,LLM评判中存在的偏见可以显著扭曲哪种内容来源更安全的最终结论,破坏了比较评估的有效性。值得注意的是,仅仅道歉性语言工件就可以使评估者的偏好偏向高达98\%。与预期相反,更大的模型并不始终表现出更高的稳健性,而较小的模型有时表现出更高的对特定工件的抵抗力。为了减轻LLM评估者的稳健性问题,我们研究了基于陪审团的评估方法,汇总多个模型的决策。尽管这种方法既提高了稳健性,又增强了与人类判断的一致性,但即使在最佳陪审团配置下,工件敏感性仍然存在。这些结果突显了迫切需要多样化、抗工件方法论以确保可靠的安全评估。

更新时间: 2025-07-09 13:09:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2503.09347v3

HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning

Multi-modal emotion recognition has garnered increasing attention as it plays a significant role in human-computer interaction (HCI) in recent years. Since different discrete emotions may exist at the same time, compared with single-class emotion recognition, emotion distribution learning (EDL) that identifies a mixture of basic emotions has gradually emerged as a trend. However, existing EDL methods face challenges in mining the heterogeneity among multiple modalities. Besides, rich semantic correlations across arbitrary basic emotions are not fully exploited. In this paper, we propose a multi-modal emotion distribution learning framework, named HeLo, aimed at fully exploring the heterogeneity and complementary information in multi-modal emotional data and label correlation within mixed basic emotions. Specifically, we first adopt cross-attention to effectively fuse the physiological data. Then, an optimal transport (OT)-based heterogeneity mining module is devised to mine the interaction and heterogeneity between the physiological and behavioral representations. To facilitate label correlation learning, we introduce a learnable label embedding optimized by correlation matrix alignment. Finally, the learnable label embeddings and label correlation matrices are integrated with the multi-modal representations through a novel label correlation-driven cross-attention mechanism for accurate emotion distribution learning. Experimental results on two publicly available datasets demonstrate the superiority of our proposed method in emotion distribution learning.

Updated: 2025-07-09 13:08:58

标题: HeLo:基于标签相关性的异构多模态融合用于情绪分布学习

摘要: 多模态情绪识别近年来在人机交互(HCI)中引起了越来越多的关注,因为它在其中扮演着重要的角色。由于不同的离散情绪可能同时存在,与单一类情绪识别相比,识别混合基本情绪的情绪分布学习(EDL)逐渐成为一种趋势。然而,现有的EDL方法面临着在多模态之间挖掘异质性的挑战。此外,跨任意基本情绪之间的丰富语义相关性尚未充分利用。在本文中,我们提出了一个名为HeLo的多模态情绪分布学习框架,旨在充分探索多模态情感数据中的异质性和互补信息以及混合基本情绪内的标签相关性。具体而言,我们首先采用交叉注意力来有效地融合生理数据。然后,设计了一个基于最优输运(OT)的异质性挖掘模块,用于挖掘生理和行为表示之间的交互作用和异质性。为了促进标签相关性学习,我们引入了通过相关矩阵对齐优化的可学习标签嵌入。最后,通过一种新颖的标签相关性驱动的交叉注意力机制,将可学习的标签嵌入和标签相关矩阵与多模态表示集成,实现准确的情绪分布学习。在两个公开可用数据集上的实验结果表明,我们提出的方法在情绪分布学习方面具有优越性。

更新时间: 2025-07-09 13:08:58

领域: cs.LG,cs.AI,cs.MM

下载: http://arxiv.org/abs/2507.06821v1

Comprehensive Evaluation of Prototype Neural Networks

Prototype models are an important method for explainable artificial intelligence (XAI) and interpretable machine learning. In this paper, we perform an in-depth analysis of a set of prominent prototype models including ProtoPNet, ProtoPool and PIPNet. For their assessment, we apply a comprehensive set of metrics. In addition to applying standard metrics from literature, we propose several new metrics to further complement the analysis of model interpretability. In our experimentation, we apply the set of prototype models on a diverse set of datasets including fine-grained classification, Non-IID settings and multi-label classification to further contrast the performance. Furthermore, we also provide our code as an open-source library, which facilitates straightforward application of the metrics themselves as well as extensibility, providing the option to easily add new metrics and models. https://github.com/uos-sis/quanproto

Updated: 2025-07-09 13:08:21

标题: 原型神经网络的全面评估

摘要: 原型模型是可解释人工智能(XAI)和可解释机器学习的重要方法。本文对一组著名的原型模型进行了深入分析,包括ProtoPNet、ProtoPool和PIPNet。为了评估它们,我们应用了一套全面的指标。除了应用文献中的标准指标外,我们还提出了几个新的指标,以进一步补充对模型可解释性的分析。在我们的实验中,我们将一组原型模型应用于各种数据集,包括细粒度分类、非独立同分布设置和多标签分类,以进一步对比性能。此外,我们还提供我们的代码作为一个开源库,可以简单地应用这些指标,以及可扩展性 - 提供轻松添加新指标和模型的选项。

更新时间: 2025-07-09 13:08:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.06819v1

Bayesian Multi-Scale Neural Network for Crowd Counting

Crowd counting is a challenging yet critical task in computer vision with applications ranging from public safety to urban planning. Recent advances using Convolutional Neural Networks (CNNs) that estimate density maps have shown significant success. However, accurately counting individuals in highly congested scenes remains an open problem due to severe occlusions, scale variations, and perspective distortions, where people appear at drastically different sizes across the image. In this work, we propose a novel deep learning architecture that effectively addresses these challenges. Our network integrates a ResNet-based feature extractor for capturing rich hierarchical representations, followed by a downsampling block employing dilated convolutions to preserve spatial resolution while expanding the receptive field. An upsampling block using transposed convolutions reconstructs the high-resolution density map. Central to our architecture is a novel Perspective-aware Aggregation Module (PAM) designed to enhance robustness to scale and perspective variations by adaptively aggregating multi-scale contextual information. We detail the training procedure, including the loss functions and optimization strategies used. Our method is evaluated on three widely used benchmark datasets using Mean Absolute Error (MAE) and Mean Squared Error (MSE) as evaluation metrics. Experimental results demonstrate that our model achieves superior performance compared to existing state-of-the-art methods. Additionally, we incorporate principled Bayesian inference techniques to provide uncertainty estimates along with the crowd count predictions, offering a measure of confidence in the model's outputs.
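Density-map counting, the foundation this abstract builds on, places a normalized Gaussian at each annotated head so that the map integrates to the crowd count. A minimal sketch (the kernel size and sigma are illustrative choices; real pipelines often adapt sigma to perspective):

```python
import math

def gaussian_kernel(size, sigma):
    # Normalized 2-D Gaussian: sums to 1, so each person contributes
    # exactly one unit of mass to the density map.
    c = size // 2
    k = [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    s = sum(sum(row) for row in k)
    return [[v / s for v in row] for row in k]

def density_map(shape, heads, size=7, sigma=1.5):
    # Stamp one kernel per annotated head position (y, x).
    h, w = shape
    dmap = [[0.0] * w for _ in range(h)]
    k = gaussian_kernel(size, sigma)
    c = size // 2
    for (y, x) in heads:
        for i in range(size):
            for j in range(size):
                yy, xx = y + i - c, x + j - c
                if 0 <= yy < h and 0 <= xx < w:
                    dmap[yy][xx] += k[i][j]
    return dmap

dmap = density_map((32, 32), [(10, 10), (20, 5), (16, 25)])
count = sum(sum(row) for row in dmap)
```

The network regresses this map, and the predicted count is simply the map's sum; a perspective-aware module like the proposed PAM effectively varies the kernel's scale across the image.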

Updated: 2025-07-09 13:07:00

标题: 贝叶斯多尺度神经网络用于人群计数

摘要: 人群计数是计算机视觉中具有挑战性但至关重要的任务,其应用范围从公共安全到城市规划不等。最近利用卷积神经网络(CNNs)估计密度图的方法取得了显著成功。然而,在高度拥挤的场景中准确计数个体仍然是一个未解决的问题,这是由于严重的遮挡、尺度变化和透视失真,导致人们在图像中呈现出极不同的大小。在这项工作中,我们提出了一种新颖的深度学习架构,有效地解决了这些挑战。我们的网络集成了基于ResNet的特征提取器,用于捕获丰富的分层表示,接着是一个使用扩张卷积来保持空间分辨率并扩展感受野的下采样块。一个使用转置卷积重构高分辨率密度图的上采样块。我们架构的核心是一种新颖的透视感知聚合模块(PAM),旨在通过自适应聚合多尺度的上下文信息,增强对尺度和透视变化的鲁棒性。我们详细介绍了训练过程,包括使用的损失函数和优化策略。我们的方法在三个广泛使用的基准数据集上使用平均绝对误差(MAE)和平均平方误差(MSE)作为评估指标进行评估。实验结果表明,我们的模型相对于现有的最先进方法表现出更优异的性能。此外,我们还结合了基于原则的贝叶斯推理技术,提供了伴随人群计数预测的不确定性估计,为模型输出提供了信心度的度量。

更新时间: 2025-07-09 13:07:00

领域: cs.CV,cs.LG,stat.ML

下载: http://arxiv.org/abs/2007.14245v4

Designing Robust Software Sensors for Nonlinear Systems via Neural Networks and Adaptive Sliding Mode Control

Accurate knowledge of the state variables in a dynamical system is critical for effective control, diagnosis, and supervision, especially when direct measurements of all states are infeasible. This paper presents a novel approach to designing software sensors for nonlinear dynamical systems expressed in their most general form. Unlike traditional model-based observers that rely on explicit transformations or linearization, the proposed framework integrates neural networks with adaptive Sliding Mode Control (SMC) to design a robust state observer under a less restrictive set of conditions. The learning process is driven by available sensor measurements, which are used to correct the observer's state estimate. The training methodology leverages the system's governing equations as a physics-based constraint, enabling observer synthesis without access to ground-truth state trajectories. By employing a time-varying gain matrix dynamically adjusted by the neural network, the observer adapts in real-time to system changes, ensuring robustness against noise, external disturbances, and variations in system dynamics. Furthermore, we provide sufficient conditions to guarantee estimation error convergence, establishing a theoretical foundation for the observer's reliability. The methodology's effectiveness is validated through simulations on challenging examples, including systems with non-differentiable dynamics and varying observability conditions. These examples, which are often problematic for conventional techniques, serve to demonstrate the robustness and broad applicability of our approach. The results show rapid convergence and high accuracy, underscoring the method's potential for addressing complex state estimation challenges in real-world applications.
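To illustrate the sliding-mode correction idea in its simplest fixed-gain form (the paper replaces the fixed gain with a time-varying, neural-network-adjusted gain matrix), here is a toy scalar sketch for the system x' = -x; the gain `k`, step `h`, and system are illustrative assumptions, not the paper's setup.

```python
import math

def smc_observer(y_meas, h=0.01, k=1.5):
    """Estimate the state of x' = -x from measurements y = x using the
    correction  xhat' = -xhat + k * sign(y - xhat), a basic sliding-mode
    injection term with a fixed gain k."""
    xhat, est = 0.0, []
    for y in y_meas:
        err = y - xhat
        sgn = (err > 0) - (err < 0)          # sign of the output error
        xhat += h * (-xhat + k * sgn)        # Euler step of the observer
        est.append(xhat)
    return est

# True system x' = -x from x0 = 2, Euler-discretized measurements.
xs, x = [], 2.0
for _ in range(500):
    x += 0.01 * (-x)
    xs.append(x)

est = smc_observer(xs)
```

Once the estimate reaches the sliding surface it chatters in a band of width about `h * k` around the true state; adaptive gains (as in the paper) shrink this band.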

Updated: 2025-07-09 13:06:58

Subjects: math.DS,cs.LG,cs.NE,math.OC

Download: http://arxiv.org/abs/2507.06817v1

RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by retrieving external data to mitigate hallucinations and outdated knowledge issues. Benefiting from the strong ability in facilitating diverse data sources and supporting faithful reasoning, knowledge graphs (KGs) have been increasingly adopted in RAG systems, giving rise to KG-based RAG (KG-RAG) methods. Though RAG systems are widely applied in various applications, recent studies have also revealed its vulnerabilities to data poisoning attacks, where malicious information injected into external knowledge sources can mislead the system into producing incorrect or harmful responses. However, these studies focus exclusively on RAG systems using unstructured textual data sources, leaving the security risks of KG-RAG largely unexplored, despite the fact that KGs present unique vulnerabilities due to their structured and editable nature. In this work, we conduct the first systematic investigation of the security issue of KG-RAG methods through data poisoning attacks. To this end, we introduce a practical, stealthy attack setting that aligns with real-world implementation. We propose an attack strategy that first identifies adversarial target answers and then inserts perturbation triples to complete misleading inference chains in the KG, increasing the likelihood that KG-RAG methods retrieve and rely on these perturbations during generation. Through extensive experiments on two benchmarks and four recent KG-RAG methods, our attack strategy demonstrates strong effectiveness in degrading KG-RAG performance, even with minimal KG perturbations. In-depth analyses are also conducted to understand the safety threats within the internal stages of KG-RAG systems and to explore the robustness of LLMs against adversarial knowledge.
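The attack pattern described, inserting perturbation triples that complete a misleading inference chain, can be illustrated with a toy knowledge graph and a naive multi-hop retriever. The triples, relation names, and BFS retriever below are illustrative assumptions, not the paper's benchmark setup.

```python
def retrieve_chain(kg, start, target, max_hops=3):
    """Breadth-first search for a chain of triples linking start to target."""
    frontier = [(start, [])]
    for _ in range(max_hops):
        nxt = []
        for node, path in frontier:
            for (h, r, t) in kg:
                if h == node:
                    new_path = path + [(h, r, t)]
                    if t == target:
                        return new_path
                    nxt.append((t, new_path))
        frontier = nxt
    return None

# A benign toy KG: no chain reaches the adversarial target answer.
kg = {("aspirin", "treats", "headache"),
      ("headache", "symptom_of", "flu")}
assert retrieve_chain(kg, "aspirin", "wrong_drug_target") is None

# Poisoning: one inserted perturbation triple completes a misleading
# 3-hop chain that a KG-RAG retriever could now surface and rely on.
kg |= {("flu", "treated_by", "wrong_drug_target")}
```

The point of the toy: a single edit to structured, editable knowledge is enough to make a previously unreachable (and wrong) answer retrievable.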

Updated: 2025-07-09 13:06:58

Subjects: cs.CR,cs.CL

Download: http://arxiv.org/abs/2507.08862v1

Intrinsic Training Signals for Federated Learning Aggregation

Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy. While existing approaches for aggregating client-specific classification heads and adapted backbone parameters require architectural modifications or loss function changes, our method uniquely leverages intrinsic training signals already available during standard optimization. We present LIVAR (Layer Importance and VARiance-based merging), which introduces: i) a variance-weighted classifier aggregation scheme using naturally emergent feature statistics, and ii) an explainability-driven LoRA merging technique based on SHAP analysis of existing update parameter patterns. Without any architectural overhead, LIVAR achieves state-of-the-art performance on multiple benchmarks while maintaining seamless integration with existing FL methods. This work demonstrates that effective model merging can be achieved solely through existing training signals, establishing a new paradigm for efficient federated model aggregation. The code will be made publicly available upon acceptance.
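The variance-weighted classifier aggregation can be sketched in miniature as inverse-variance weighting of per-client classifier weights. This is one plausible reading of the scheme with hypothetical numbers; the paper's exact feature statistics and weighting may differ.

```python
def variance_weighted_merge(client_heads, client_variances):
    """Merge per-client classifier weight vectors, weighting each client
    by the inverse of its (naturally emergent) feature variance --
    treating lower variance as a more reliable signal."""
    inv = [1.0 / v for v in client_variances]
    total = sum(inv)
    dim = len(client_heads[0])
    return [sum(head[i] * inv[k] for k, head in enumerate(client_heads)) / total
            for i in range(dim)]

# Two hypothetical clients with 2-dimensional classifier heads.
heads = [[1.0, 2.0], [3.0, 4.0]]
merged_equal = variance_weighted_merge(heads, [1.0, 1.0])   # plain average
merged_skewed = variance_weighted_merge(heads, [1.0, 3.0])  # favors client 0
```

With equal variances this reduces to simple federated averaging; unequal variances tilt the merged head toward the more confident client, all without extra architecture or loss terms.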

Updated: 2025-07-09 13:03:23

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.06813v1

Democratizing High-Fidelity Co-Speech Gesture Video Generation

Co-speech gesture video generation aims to synthesize realistic, audio-aligned videos of speakers, complete with synchronized facial expressions and body gestures. This task presents challenges due to the significant one-to-many mapping between audio and visual content, further complicated by the scarcity of large-scale public datasets and high computational demands. We propose a lightweight framework that utilizes 2D full-body skeletons as an efficient auxiliary condition to bridge audio signals with visual outputs. Our approach introduces a diffusion model conditioned on fine-grained audio segments and a skeleton extracted from the speaker's reference image, predicting skeletal motions through skeleton-audio feature fusion to ensure strict audio coordination and body shape consistency. The generated skeletons are then fed into an off-the-shelf human video generation model with the speaker's reference image to synthesize high-fidelity videos. To democratize research, we present CSG-405, the first public dataset with 405 hours of high-resolution videos across 71 speech types, annotated with 2D skeletons and diverse speaker demographics. Experiments show that our method exceeds state-of-the-art approaches in visual quality and synchronization while generalizing across speakers and contexts.

Updated: 2025-07-09 13:02:12

Subjects: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.06812v1

Noise tolerance via reinforcement: Learning a reinforced quantum dynamics

The performance of quantum simulations heavily depends on the efficiency of noise mitigation techniques and error correction algorithms. Reinforcement has emerged as a powerful strategy to enhance the efficiency of learning and optimization algorithms. In this study, we demonstrate that a reinforced quantum dynamics can exhibit significant robustness against interactions with a noisy environment. We study a quantum annealing process where, through reinforcement, the system is encouraged to maintain its current state or follow a noise-free evolution. A learning algorithm is employed to derive a concise approximation of this reinforced dynamics, reducing the total evolution time and, consequently, the system's exposure to noisy interactions. This also avoids the complexities associated with implementing quantum feedback in such reinforcement algorithms. The efficacy of our method is demonstrated through numerical simulations of reinforced quantum annealing with one- and two-qubit systems under Pauli noise.

Updated: 2025-07-09 13:01:12

Subjects: quant-ph,cond-mat.dis-nn,cs.LG

Download: http://arxiv.org/abs/2506.12418v2

A Note on the Walsh Spectrum of Power Residue S-Boxes

Let $\mathbb{F}_q$ be a prime field with $q \geq 3$, and let $d, m \geq 1$ be integers such that $\gcd \left( d, q \right) = 1$ and $m \mid (q - 1)$. In this paper we bound the absolute values of the Walsh spectrum of S-Boxes $S (x) = x^d \cdot T \left( x^\frac{q - 1}{m} \right)$, where $T$ is a function with $T (x) \neq 0$ if $x \neq 0$. Such S-Boxes have been proposed for the Zero-Knowledge-friendly hash functions Grendel and Polocolo. In particular, we prove the conjectured correlation of the Polocolo S-Box.
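A brute-force check of such spectra is easy for toy parameters. The sketch below (not from the paper) computes the full Walsh spectrum of $S(x) = x^d \cdot T(x^{(q-1)/m})$ over a small prime field using the additive character $\omega = e^{2\pi i/q}$; the choices $q = 11$, $d = 3$, $m = 2$, and $T$ are illustrative.

```python
import cmath

def walsh_spectrum(q, d, m, T):
    """All Walsh coefficients W(a, b) = sum_x w^(b*S(x) - a*x) of the
    power residue S-Box S(x) = x^d * T(x^((q-1)//m)) over F_q."""
    w = cmath.exp(2j * cmath.pi / q)   # additive character of F_q
    e = (q - 1) // m
    S = [(pow(x, d, q) * T(pow(x, e, q))) % q for x in range(q)]
    return {(a, b): sum(w ** ((b * S[x] - a * x) % q) for x in range(q))
            for a in range(q) for b in range(q)}

# Toy instance: with m = 2 and T the identity, T(x^((q-1)/2)) acts on the
# quadratic-residue indicator (1, q-1, or 0), loosely Grendel-flavored.
W = walsh_spectrum(11, 3, 2, T=lambda y: y)
max_nontrivial = max(abs(v) for (a, b), v in W.items() if b != 0)
```

Sanity checks that hold for any S-Box: $W(0,0) = q$, and Parseval gives $\sum_a |W(a,b)|^2 = q^2$ for every $b$. The paper's contribution is a nontrivial upper bound on `max_nontrivial` for this family.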

Updated: 2025-07-09 12:57:15

Subjects: math.NT,cs.CR

Download: http://arxiv.org/abs/2507.06808v1

Very fast Bayesian Additive Regression Trees on GPU

Bayesian Additive Regression Trees (BART) is a nonparametric Bayesian regression technique based on an ensemble of decision trees. It is part of the toolbox of many statisticians. The overall statistical quality of the regression is typically higher than other generic alternatives, and it requires less manual tuning, making it a good default choice. However, it is a niche method compared to its natural competitor XGBoost, due to the longer running time, making sample sizes above 10,000-100,000 a nuisance. I present a GPU-enabled implementation of BART, faster by up to 200x relative to a single CPU core, making BART competitive in running time with XGBoost. This implementation is available in the Python package bartz.

Updated: 2025-07-09 12:54:23

Subjects: stat.ML,cs.LG

Download: http://arxiv.org/abs/2410.23244v2

A Wireless Foundation Model for Multi-Task Prediction

With the growing complexity and dynamics of mobile communication networks, accurately predicting key system parameters, such as channel state information (CSI), user location, and network traffic, has become essential for a wide range of physical (PHY)-layer and medium access control (MAC)-layer tasks. Although traditional deep learning (DL)-based methods have been widely applied to such prediction tasks, they often struggle to generalize across different scenarios and tasks. In response, we propose a unified foundation model for multi-task prediction in wireless networks that supports diverse prediction intervals. The proposed model enforces univariate decomposition to unify heterogeneous tasks, encodes granularity for interval awareness, and uses a causal Transformer backbone for accurate predictions. Additionally, we introduce a patch masking strategy during training to support arbitrary input lengths. After being trained on large-scale datasets, the proposed foundation model demonstrates strong generalization to unseen scenarios and achieves zero-shot performance on new tasks that surpasses traditional full-shot baselines.

Updated: 2025-07-09 12:45:07

Subjects: cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.05938v2

Text to model via SysML: Automated generation of dynamical system computational models from unstructured natural language text via enhanced System Modeling Language diagrams

This paper contributes to speeding up the design and deployment of engineering dynamical systems by proposing a strategy for exploiting domain and expert knowledge for the automated generation of a dynamical system computational model, starting from a corpus of documents relevant to the dynamical system of interest and an input document describing the specific system. This strategy is implemented in five steps and, crucially, it uses System Modeling Language (SysML) diagrams to extract accurate information about the dependencies, attributes, and operations of components. Natural Language Processing (NLP) strategies and Large Language Models (LLMs) are employed in specific tasks to improve intermediate outputs of the automated SysML diagram generation, such as: the list of key nouns; the list of extracted relationships; the list of key phrases and key relationships; block attribute values; block relationships; and BDD diagram generation. The applicability of automated SysML diagram generation is illustrated with different case studies. The computational models of complex dynamical systems are then obtained from the SysML diagrams via code generation and computational model generation steps. In the code generation step, NLP strategies are used for summarization, while LLMs are used for validation only. The proposed approach is not limited to a specific system, domain, or computational software. Its applicability is shown via an end-to-end example, from text to model, of a simple pendulum, showing improved performance compared to results yielded by LLMs only.

Updated: 2025-07-09 12:44:49

Subjects: cs.CL,cs.AI,cs.CE

Download: http://arxiv.org/abs/2507.06803v1

Speech Tokenizer is Key to Consistent Representation

Speech tokenization is crucial in digital speech processing, converting continuous speech signals into discrete units for various computational tasks. This paper introduces a novel speech tokenizer with broad applicability across downstream tasks. While recent advances in residual vector quantization (RVQ) have incorporated semantic elements, they often neglect critical acoustic features. We propose an advanced approach that simultaneously encodes both linguistic and acoustic information, preserving prosodic and emotional content. Our method significantly enhances speech representation fidelity across diverse applications. Empirical evaluations demonstrate its effectiveness in speech coding, voice conversion, emotion recognition, and multimodal language modeling, without requiring additional training. This versatility underscores its potential as a key tool for advancing AI-driven speech processing.
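For context, residual vector quantization (RVQ), the mechanism the paper builds on, can be sketched in a few lines: each stage quantizes the residual left by the previous stage, so later codebooks capture finer detail. The codebooks and input vector below are toy values, and the paper's joint semantic/acoustic encoding is not modeled here.

```python
def rvq_encode(vec, codebooks):
    """Greedy residual VQ: at each stage pick the codeword nearest the
    current residual, then subtract it from the residual."""
    residual, codes = list(vec), []
    for book in codebooks:
        best = min(range(len(book)),
                   key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, book[i])))
        codes.append(best)
        residual = [r - c for r, c in zip(residual, book[best])]
    return codes, residual

def rvq_decode(codes, codebooks):
    """Reconstruction is the sum of the chosen codewords across stages."""
    out = [0.0] * len(codebooks[0][0])
    for code, book in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, book[code])]
    return out

books = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],    # coarse stage
    [[0.2, 0.0], [0.0, 0.2], [-0.2, 0.0]],   # fine stage
]
codes, residual = rvq_encode([1.15, 0.95], books)
decoded = rvq_decode(codes, books)
```

By construction, `decoded + residual` reproduces the input exactly; discarding the final residual is what makes the representation discrete and lossy.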

Updated: 2025-07-09 12:43:39

Subjects: cs.LG

Download: http://arxiv.org/abs/2507.06802v1

Comparing Dialectical Systems: Contradiction and Counterexample in Belief Change (Extended Version)

Dialectical systems are a mathematical formalism for modeling an agent updating a knowledge base seeking consistency. Introduced in the 1970s by Roberto Magari, they were originally conceived to capture how a working mathematician or a research community refines beliefs in the pursuit of truth. Dialectical systems also serve as natural models for the belief change of an automated agent, offering a unifying, computable framework for dynamic belief management. The literature distinguishes three main models of dialectical systems: (d-)dialectical systems based on revising beliefs when they are seen to be inconsistent, p-dialectical systems based on revising beliefs based on finding a counterexample, and q-dialectical systems which can do both. We answer an open problem in the literature by proving that q-dialectical systems are strictly more powerful than p-dialectical systems, which are themselves known to be strictly stronger than (d-)dialectical systems. This result highlights the complementary roles of counterexample and contradiction in automated belief revision, and thus also in the reasoning processes of mathematicians and research communities.

Updated: 2025-07-09 12:35:20

Subjects: cs.AI,math.LO

Download: http://arxiv.org/abs/2507.06798v1

Neural Networks for Tamed Milstein Approximation of SDEs with Additive Symmetric Jump Noise Driven by a Poisson Random Measure

This work aims to estimate the drift and diffusion functions in stochastic differential equations (SDEs) driven by a particular class of Lévy processes with finite jump intensity, using neural networks. We propose a framework that integrates the Tamed-Milstein scheme with neural networks employed as non-parametric function approximators. Estimation is carried out in a non-parametric fashion for the drift function $f: \mathbb{Z} \to \mathbb{R}$ and the diffusion coefficient $g: \mathbb{Z} \to \mathbb{R}$. The model of interest is given by \[ dX(t) = \xi + f(X(t))\, dt + g(X(t))\, dW_t + \gamma \int_{\mathbb{Z}} z\, N(dt,dz), \] where $W_t$ is a standard Brownian motion, and $N(dt,dz)$ is a Poisson random measure on $(\mathbb{R}_{+} \times \mathbb{Z}$, $\mathcal{B} (\mathbb{R}_{+}) \otimes \mathcal{Z}$, $\lambda( \Lambda \otimes v))$, with $\lambda, \gamma > 0$, $\Lambda$ being the Lebesgue measure on $\mathbb{R}_{+}$, and $v$ a finite measure on the measurable space $(\mathbb{Z}, \mathcal{Z})$. Neural networks are used as non-parametric function approximators, enabling the modeling of complex nonlinear dynamics without assuming restrictive functional forms. The proposed methodology constitutes a flexible alternative for inference in systems with state-dependent noise and discontinuities driven by Lévy processes.
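A minimal simulation sketch of the tamed Milstein step with additive symmetric jump noise may help fix ideas. The drift, diffusion, jump rate, and jump distribution (uniform on $\{-1, 1\}$) below are illustrative assumptions; the paper's neural estimation of $f$ and $g$ is not modeled here.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's algorithm for a Poisson(lam) sample (fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def tamed_milstein(f, g, dg, x0, T=1.0, n=1000, lam=2.0, gamma=0.1, seed=0):
    """Simulate dX = f(X)dt + g(X)dW + gamma * (symmetric +-1 jumps),
    taming the drift so superlinear f cannot blow up the scheme."""
    rng = random.Random(seed)
    h = T / n
    x, path = x0, [x0]
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(h))
        drift = f(x) * h / (1.0 + h * abs(f(x)))          # tamed drift step
        milstein = 0.5 * g(x) * dg(x) * (dW * dW - h)     # Milstein correction
        n_jumps = poisson(lam * h, rng)
        jumps = gamma * sum(rng.choice((-1.0, 1.0)) for _ in range(n_jumps))
        x = x + drift + g(x) * dW + milstein + jumps
        path.append(x)
    return path

# Superlinear drift f(x) = -x^3 is exactly the case taming is designed for.
path = tamed_milstein(f=lambda x: -x ** 3,
                      g=lambda x: 0.3 + 0.1 * x,
                      dg=lambda x: 0.1,
                      x0=1.0)
```

In the paper's framework, trajectories produced by such a scheme provide the physics-based training signal for the neural approximators of $f$ and $g$.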

Updated: 2025-07-09 12:33:51

Subjects: stat.ML,cs.LG,60H10, 68T07,I.2.6; G.3

Download: http://arxiv.org/abs/2507.04417v2

The cost of ensembling: is it always worth combining?

Given the continuous increase in dataset sizes and the complexity of forecasting models, the trade-off between forecast accuracy and computational cost is emerging as an extremely relevant topic, especially in the context of ensemble learning for time series forecasting. To assess this trade-off, we evaluated ten base models and eight ensemble configurations across two large-scale retail datasets (M5 and VN1), considering both point and probabilistic accuracy under varying retraining frequencies. We showed that ensembles consistently improve forecasting performance, particularly in probabilistic settings. However, these gains come at a substantial computational cost, especially for larger, accuracy-driven ensembles. We found that reducing retraining frequency significantly lowers costs, with minimal impact on accuracy, particularly for point forecasts. Moreover, efficiency-driven ensembles offer a strong balance, achieving competitive accuracy at considerably lower cost than accuracy-optimized combinations. Most importantly, small ensembles of two or three models are often sufficient to achieve near-optimal results. These findings provide practical guidelines for deploying scalable and cost-efficient forecasting systems, supporting the broader goals of sustainable AI in forecasting. Overall, this work shows that careful ensemble design and retraining strategy selection can yield accurate, robust, and cost-effective forecasts suitable for real-world applications.
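The basic accuracy mechanics of equal-weight combination can be sketched directly: by the triangle inequality, the MAE of the per-step mean forecast never exceeds the mean of the member MAEs. The forecasts below are hypothetical numbers for illustration, not from the paper's datasets.

```python
def mae(forecast, actual):
    """Mean Absolute Error between a forecast and the actuals."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

def ensemble(forecasts):
    """Equal-weight combination: the per-step mean of member forecasts."""
    return [sum(step) / len(step) for step in zip(*forecasts)]

actual = [10, 12, 13, 11, 14]
members = [
    [11, 12, 14, 10, 15],   # hypothetical member forecasts
    [ 9, 13, 12, 12, 13],
    [10, 11, 13, 12, 16],
]
comb = ensemble(members)
member_maes = [mae(m, actual) for m in members]
```

The guarantee is only "no worse than the average member"; whether the combination beats the *best* member, and whether the gain justifies the cost of training and retraining every member, is exactly the empirical question the paper studies.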

Updated: 2025-07-09 12:32:09

Subjects: cs.LG,stat.AP,stat.OT

Download: http://arxiv.org/abs/2506.04677v2

Revisiting Likelihood-Based Out-of-Distribution Detection by Modeling Representations

Out-of-distribution (OOD) detection is critical for ensuring the reliability of deep learning systems, particularly in safety-critical applications. Likelihood-based deep generative models have historically faced criticism for their unsatisfactory performance in OOD detection, often assigning higher likelihood to OOD data than in-distribution samples when applied to image data. In this work, we demonstrate that likelihood is not inherently flawed. Rather, several properties of the image space prevent likelihood from serving as a valid detection score. Given a sufficiently good likelihood estimator, specifically using the probability flow formulation of a diffusion model, we show that likelihood-based methods can still perform on par with state-of-the-art methods when applied in the representation space of pre-trained encoders. The code of our work can be found at $\href{https://github.com/limchaos/Likelihood-OOD.git}{\texttt{https://github.com/limchaos/Likelihood-OOD.git}}$.

Updated: 2025-07-09 12:32:02

Subjects: cs.LG,cs.CV

Download: http://arxiv.org/abs/2504.07793v2

Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining: Method, Evaluation and Applications

The emergence of open-source large language models (LLMs) has expanded opportunities for enterprise applications; however, many organizations still lack the infrastructure to deploy and maintain large-scale models. As a result, small LLMs (sLLMs) have become a practical alternative, despite their inherent performance limitations. While Domain Adaptive Continual Pretraining (DACP) has been previously explored as a method for domain adaptation, its utility in commercial applications remains under-examined. In this study, we validate the effectiveness of applying a DACP-based recipe across diverse foundation models and service domains. Through extensive experiments and real-world evaluations, we demonstrate that DACP-applied sLLMs achieve substantial gains in target domain performance while preserving general capabilities, offering a cost-efficient and scalable solution for enterprise-level deployment.

Updated: 2025-07-09 12:30:42

Subjects: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.06795v1

Test-Time Scaling with Reflective Generative Model

We introduce our first reflective generative model MetaStone-S1, which matches OpenAI o3-mini's performance via the new Reflective Generative Form. The new form focuses on high-quality reasoning trajectory selection and contains two novelties: 1) A unified interface for the policy and process reward model: we share the backbone network and use task-specific heads for reasoning trajectory prediction and scoring respectively, introducing only 53M extra parameters for trajectory scoring. 2) Eliminating the reliance on process-level annotation: we provide a self-supervised process reward model, which can directly learn high-quality reasoning trajectory selection from the outcome reward. Equipped with the reflective generative form, MetaStone-S1 is naturally suitable for test-time scaling, and we provide three reasoning effort modes (low, medium, and high) based on the controllable thinking length. Experiments demonstrate that our MetaStone-S1 achieves performance comparable to the OpenAI o3-mini series with only 32B parameters. To support the research community, we have open-sourced MetaStone-S1 at https://github.com/MetaStone-AI/MetaStone-S1.

Updated: 2025-07-09 12:28:31

Subjects: cs.LG,cs.CL

Download: http://arxiv.org/abs/2507.01951v2

Temporal Information Retrieval via Time-Specifier Model Merging

The rapid expansion of digital information and knowledge across structured and unstructured sources has heightened the importance of Information Retrieval (IR). While dense retrieval methods have substantially improved semantic matching for general queries, they consistently underperform on queries with explicit temporal constraints--often those containing numerical expressions and time specifiers such as ``in 2015.'' Existing approaches to Temporal Information Retrieval (TIR) improve temporal reasoning but often suffer from catastrophic forgetting, leading to reduced performance on non-temporal queries. To address this, we propose Time-Specifier Model Merging (TSM), a novel method that enhances temporal retrieval while preserving accuracy on non-temporal queries. TSM trains specialized retrievers for individual time specifiers and merges them into a unified model, enabling precise handling of temporal constraints without compromising non-temporal retrieval. Extensive experiments on both temporal and non-temporal datasets demonstrate that TSM significantly improves performance on temporally constrained queries while maintaining strong results on non-temporal queries, consistently outperforming other baseline methods. Our code is available at https://github.com/seungyoonee/TSM.
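Parameter-space merging of specialist models can be sketched as a (weighted) average over named parameters. This is a generic merging recipe shown for illustration; the parameter names and values are hypothetical, and TSM's actual merge procedure may differ.

```python
def merge_specialists(models, weights=None):
    """Uniform (or weighted) parameter-space merge of specialist models,
    e.g. one retriever per time specifier, into a single unified model."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    merged = {}
    for name in models[0]:
        merged[name] = sum(w * m[name] for w, m in zip(weights, models))
    return merged

# Hypothetical scalar "parameters" of two time-specifier specialists.
in_2015 = {"proj.w": 0.8, "proj.b": 0.1}
in_2020 = {"proj.w": 0.4, "proj.b": 0.3}
merged = merge_specialists([in_2015, in_2020])
```

Because the specialists share an architecture, merging happens entry-by-entry on their parameters, which is what lets a single unified retriever serve both temporal and non-temporal queries without routing at inference time.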

Updated: 2025-07-09 12:16:11

Subjects: cs.IR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.06782v1

GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods

Despite the growing interest in jailbreak methods as an effective red-teaming tool for building safe and responsible large language models (LLMs), flawed evaluation system designs have led to significant discrepancies in their effectiveness assessments. We conduct a systematic measurement study based on 37 jailbreak studies since 2022, focusing on both the methods and the evaluation systems they employ. We find that existing evaluation systems lack case-specific criteria, resulting in misleading conclusions about their effectiveness and safety implications. This paper advocates a shift to a more nuanced, case-by-case evaluation paradigm. We introduce GuidedBench, a novel benchmark comprising a curated harmful question dataset, detailed case-by-case evaluation guidelines and an evaluation system integrated with these guidelines -- GuidedEval. Experiments demonstrate that GuidedBench offers more accurate measurements of jailbreak performance, enabling meaningful comparisons across methods and uncovering new insights overlooked in previous evaluations. GuidedEval reduces inter-evaluator variance by at least 76.03%. Furthermore, we observe that incorporating guidelines can enhance the effectiveness of jailbreak methods themselves, offering new insights into both attack strategies and evaluation paradigms.

Updated: 2025-07-09 12:13:12

标题: GuidedBench:测量和缓解野外LLM越狱方法评估差异

摘要: 尽管越来越多人对越狱方法作为构建安全和负责任的大型语言模型(LLMs)的有效红队工具感兴趣,但缺乏完善的评估系统设计导致了对它们有效性评估中的重大差异。我们基于自2022年以来的37项越狱研究进行了系统性测量研究,重点关注它们所采用的方法和评估系统。我们发现现有的评估系统缺乏案例特定的标准,导致对它们的有效性和安全性影响得出误导性结论。本文主张向更加细致、案例为基础的评估范式转变。我们介绍了GuidedBench,一个新颖的基准测试,包括一个策划的有害问题数据集、详细的案例评估指南和一个与这些指南集成的评估系统-- GuidedEval。实验证明,GuidedBench提供了更准确的越狱性能测量,使得能够进行跨方法的有意义比较,并揭示了之前评估中忽视的新见解。GuidedEval将评估者之间的变异降低了至少76.03\%。此外,我们观察到,整合指南可以增强越狱方法本身的有效性,为攻击策略和评估范式提供新的见解。

更新时间: 2025-07-09 12:13:12

领域: cs.CL,cs.CR

下载: http://arxiv.org/abs/2502.16903v2

Learning safe, constrained policies via imitation learning: Connection to Probabilistic Inference and a Naive Algorithm

This article introduces an imitation learning method for learning maximum entropy policies that comply with constraints demonstrated by expert trajectories executing a task. The formulation of the method takes advantage of results connecting performance to bounds for the KL-divergence between demonstrated and learned policies, and its objective is rigorously justified through a connection to a probabilistic inference framework for reinforcement learning, incorporating the reinforcement learning objective and the objective to abide by constraints in an entropy maximization setting. The proposed algorithm optimizes the learning objective with dual gradient descent, supporting effective and stable training. Experiments show that the proposed method can learn effective policy models for constraints-abiding behaviour, in settings with multiple constraints of different types, accommodating different modalities of demonstrated behaviour, and with abilities to generalize.

Updated: 2025-07-09 12:11:27

标题: 通过模仿学习学习安全、受限策略:与概率推理和一个天真算法的联系

摘要: 本文介绍了一种模仿学习方法,用于学习符合由专家轨迹执行的任务所示约束的最大熵策略。该方法的制定利用了将性能与展示和学习策略之间的KL散度的界限相关联的结果,并且其目标通过与强化学习的概率推断框架的连接得到严格的理论验证,将强化学习目标和遵守约束的目标结合在熵最大化环境中。所提出的算法使用双梯度下降优化学习目标,支持有效和稳定的训练。实验证明,所提出的方法可以学习适用于遵守约束行为的有效策略模型,在具有不同类型多个约束的设置中,适应不同模式的示范行为,并具有泛化能力。

更新时间: 2025-07-09 12:11:27

领域: cs.LG,cs.MA

下载: http://arxiv.org/abs/2507.06780v1

Tailoring deep learning for real-time brain-computer interfaces: From offline models to calibration-free online decoding

Despite the growing success of deep learning (DL) in offline brain-computer interfaces (BCIs), its adoption in real-time applications remains limited due to three primary challenges. First, most DL solutions are designed for offline decoding, making the transition to online decoding unclear. Second, the use of sliding windows in online decoding substantially increases computational complexity. Third, DL models typically require large amounts of training data, which are often scarce in BCI applications. To address these challenges and enable real-time, cross-subject decoding without subject-specific calibration, we introduce realtime adaptive pooling (RAP), a novel parameter-free method. RAP seamlessly modifies the pooling layers of existing offline DL models to meet online decoding requirements. It also reduces computational complexity during training by jointly decoding consecutive sliding windows. To further alleviate data requirements, our method leverages source-free domain adaptation, enabling privacy-preserving adaptation across varying amounts of target data. Our results demonstrate that RAP provides a robust and efficient framework for real-time BCI applications. It preserves privacy, reduces calibration demands, and supports co-adaptive BCI systems, paving the way for broader adoption of DL in online BCIs. These findings lay a strong foundation for developing user-centered, high-performance BCIs that facilitate immediate feedback and user learning.

Updated: 2025-07-09 12:11:19

标题: 调整深度学习以实现实时脑机接口:从离线模型到无校准在线解码

摘要: 尽管深度学习(DL)在离线脑机接口(BCIs)中取得了日益增长的成功,但由于三个主要挑战,其在实时应用中的采用仍然受到限制。首先,大多数DL解决方案都是为离线解码而设计的,使得转向在线解码变得不明确。其次,在在线解码中使用滑动窗口会大大增加计算复杂性。第三,DL模型通常需要大量的训练数据,在BCI应用中往往稀缺。为了应对这些挑战,并实现无需特定主体校准的实时跨主体解码,我们引入了实时自适应池化(RAP),这是一种新颖的无参数方法。RAP无缝地修改现有离线DL模型的池化层,以满足在线解码的要求。它还通过同时解码连续滑动窗口来减少训练过程中的计算复杂性。为了进一步减轻数据要求,我们的方法利用源自由领域适应,实现隐私保护的适应,跨不同数量的目标数据。我们的结果表明,RAP为实时BCI应用提供了一个强大而高效的框架。它保护隐私,减少校准需求,并支持共适应BCI系统,为深度学习在在线BCI中更广泛的采用铺平了道路。这些发现为开发以用户为中心的高性能BCI奠定了坚实的基础,促进了即时反馈和用户学习。

更新时间: 2025-07-09 12:11:19

领域: cs.HC,cs.LG

下载: http://arxiv.org/abs/2507.06779v1

AblationBench: Evaluating Automated Planning of Ablations in Empirical AI Research

Autonomous agents built on language models (LMs) are showing increasing popularity in many fields, including scientific research. AI co-scientists aim to support or automate parts of the research process using these agents. A key component of empirical AI research is the design of ablation experiments. To this end, we introduce AblationBench, a benchmark suite for evaluating agents on ablation planning tasks in empirical AI research. It includes two tasks: AuthorAblation, which helps authors propose ablation experiments based on a method section and contains 83 instances, and ReviewerAblation, which helps reviewers find missing ablations in a full paper and contains 350 instances. For both tasks, we develop LM-based judges that serve as an automatic evaluation framework. Our experiments with frontier LMs show that these tasks remain challenging, with the best-performing LM system identifying only 29% of the original ablations on average. Lastly, we analyze the limitations of current LMs on these tasks, and find that chain-of-thought prompting outperforms the currently existing agent-based approach.

Updated: 2025-07-09 12:07:38

标题: AblationBench:评估经验人工智能研究中消融实验的自动规划

摘要: 建立在语言模型(LMs)上的自主代理在许多领域,包括科学研究中,越来越受欢迎。人工智能共同科学家旨在利用这些代理支持或自动化研究过程的某些部分。经验人工智能研究的一个关键组成部分是进行消蚀实验的设计。为此,我们介绍了AblationBench,一个用于评估代理在经验人工智能研究中消蚀计划任务的基准套件。它包括两个任务:AuthorAblation,帮助作者根据方法部分提出消蚀实验,包含83个实例;ReviewerAblation,帮助审稿人在完整论文中找到缺失的消蚀实验,包含350个实例。对于这两个任务,我们开发了基于LM的评判者,作为自动评估框架。我们使用前沿LM进行实验,结果显示这些任务仍然具有挑战性,最佳表现的LM系统平均仅识别原始消蚀的29%。最后,我们分析了当前LM在这些任务上的局限性,并发现思维链提示优于当前存在的基于代理的方法。

更新时间: 2025-07-09 12:07:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.08038v1

Mutual Information Free Topological Generalization Bounds via Stability

Providing generalization guarantees for stochastic optimization algorithms is a major challenge in modern learning theory. Recently, several studies highlighted the impact of the geometry of training trajectories on the generalization error, both theoretically and empirically. Among these works, a series of topological generalization bounds have been proposed, relating the generalization error to notions of topological complexity that stem from topological data analysis (TDA). Despite their empirical success, these bounds rely on intricate information-theoretic (IT) terms that can be bounded in specific cases but remain intractable for practical algorithms (such as ADAM), potentially reducing the relevance of the derived bounds. In this paper, we seek to formulate comprehensive and interpretable topological generalization bounds free of intractable mutual information terms. To this end, we introduce a novel learning-theoretic framework that departs from the existing strategies via proof techniques rooted in algorithmic stability. By extending an existing notion of hypothesis set stability to trajectory stability, we prove that the generalization error of trajectory-stable algorithms can be upper bounded in terms of (i) TDA quantities describing the complexity of the trajectory of the optimizer in the parameter space, and (ii) the trajectory stability parameter of the algorithm. Through a series of experimental evaluations, we demonstrate that the TDA terms in the bound are of great importance, especially as the number of training samples grows. This ultimately forms an explanation of the empirical success of the topological generalization bounds.

Updated: 2025-07-09 12:03:25

标题: 通过稳定性实现的互信息自由的拓扑泛化界限

摘要: 为随机优化算法提供泛化保证是现代学习理论中的一个重要挑战。最近,几项研究强调了训练轨迹几何形状对泛化误差的影响,无论是从理论上还是从实证上。在这些研究中,提出了一系列拓扑泛化界限,将泛化误差与源自拓扑数据分析(TDA)的拓扑复杂性概念联系起来。尽管这些界限在实证上取得了成功,但它们依赖于复杂的信息论(IT)术语,在特定情况下可以被界定,但对于实际算法(如ADAM)来说仍然难以处理,可能降低了所得界限的相关性。在本文中,我们致力于制定全面且易于解释的拓扑泛化界限,摆脱难以处理的互信息术语。为此,我们引入了一个新颖的学习理论框架,通过根植于算法稳定性的证明技术,与现有策略有所不同。通过将现有的“假设集稳定性”概念扩展为“轨迹稳定性”,我们证明了轨迹稳定算法的泛化误差可以用优化器在参数空间中轨迹的复杂性描述的TDA量以及算法的轨迹稳定性参数来上界。通过一系列实验评估,我们展示了界限中的TDA术语特别重要,尤其是在训练样本数量增加时。这最终解释了拓扑泛化界限的实证成功。

更新时间: 2025-07-09 12:03:25

领域: cs.LG,math.AT,stat.ML

下载: http://arxiv.org/abs/2507.06775v1

From Gradient Clipping to Normalization for Heavy Tailed SGD

Recent empirical evidence indicates that many machine learning applications involve heavy-tailed gradient noise, which challenges the standard assumptions of bounded variance in stochastic optimization. Gradient clipping has emerged as a popular tool to handle this heavy-tailed noise, as it achieves good performance in this setting both theoretically and practically. However, our current theoretical understanding of non-convex gradient clipping has three main shortcomings. First, the theory hinges on large, increasing clipping thresholds, which are in stark contrast to the small constant clipping thresholds employed in practice. Second, clipping thresholds require knowledge of problem-dependent parameters to guarantee convergence. Lastly, even with this knowledge, current sampling complexity upper bounds for the method are sub-optimal in nearly all parameters. To address these issues, we study convergence of Normalized SGD (NSGD). First, we establish a parameter-free sample complexity for NSGD of $\mathcal{O}\left(\varepsilon^{-\frac{2p}{p-1}}\right)$ to find an $\varepsilon$-stationary point. Furthermore, we prove tightness of this result, by providing a matching algorithm-specific lower bound. In the setting where all problem parameters are known, we show this complexity is improved to $\mathcal{O}\left(\varepsilon^{-\frac{3p-2}{p-1}}\right)$, matching the previously known lower bound for all first-order methods in all problem dependent parameters. Finally, we establish high-probability convergence of NSGD with a mild logarithmic dependence on the failure probability. Our work complements the studies of gradient clipping under heavy tailed noise improving the sample complexities of existing algorithms and offering an alternative mechanism to achieve high probability convergence.
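The two update rules being compared can be illustrated side by side. This is a minimal sketch under the standard definitions of clipped SGD and normalized SGD, not the paper's exact algorithm or analysis:

```python
import math

def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))

def clipped_step(x, grad, lr, clip):
    """Clipped SGD: shrink the step only when ||grad|| exceeds the threshold."""
    norm = l2_norm(grad)
    scale = min(1.0, clip / norm) if norm > 0 else 0.0
    return [xi - lr * scale * gi for xi, gi in zip(x, grad)]

def normalized_step(x, grad, lr, eps=1e-12):
    """Normalized SGD: always divide by ||grad||; no threshold to tune."""
    norm = l2_norm(grad) + eps
    return [xi - lr * gi / norm for xi, gi in zip(x, grad)]

x = [3.0, 4.0]                        # take grad = x, so ||grad|| = 5
x_clip = clipped_step(x, x, lr=0.1, clip=2.0)
x_norm = normalized_step(x, x, lr=0.1)

# Once ||grad|| > clip, the clipped step has length lr * clip (here 0.2);
# the normalized step always has length lr, whatever the gradient scale.
print(l2_norm([a - b for a, b in zip(x, x_clip)]))  # ~0.2
print(l2_norm([a - b for a, b in zip(x, x_norm)]))  # ~0.1
```

The parameter-free character of NSGD highlighted in the abstract shows up here directly: `normalized_step` has no problem-dependent threshold, while `clipped_step` needs `clip` chosen well.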

Updated: 2025-07-09 12:01:30

标题: 从梯度裁剪到重尾SGD的归一化

摘要: 最近的实证证据表明,许多机器学习应用涉及重尾梯度噪声,这挑战了随机优化中有界方差的标准假设。梯度裁剪已成为处理这种重尾噪声的流行工具,因为它在理论和实践中都能在这种情况下取得良好的性能。然而,我们目前对非凸梯度裁剪的理论理解存在三个主要缺点。首先,该理论依赖于大型、不断增加的裁剪阈值,这与实践中采用的小常数裁剪阈值形成鲜明对比。其次,裁剪阈值需要了解问题相关参数才能保证收敛。最后,即使具备这一知识,当前方法的采样复杂性上界在几乎所有参数上都是次优的。为了解决这些问题,我们研究了归一化随机梯度下降(NSGD)的收敛性。首先,我们建立了NSGD的无参数样本复杂度为$\mathcal{O}\left(\varepsilon^{-\frac{2p}{p-1}}\right)$,以找到一个$\varepsilon$-驻点。此外,我们通过提供一个匹配算法特定下界来证明了这一结果的紧度。在所有问题参数已知的情况下,我们展示了该复杂度被改进为$\mathcal{O}\left(\varepsilon^{-\frac{3p-2}{p-1}}\right)$,与所有问题相关参数中已知的所有一阶方法的下界相匹配。最后,我们建立了NSGD的高概率收敛性,并且对失败概率有轻微对数依赖。我们的工作补充了在重尾噪声下梯度裁剪的研究,改善了现有算法的样本复杂性,并提供了一种实现高概率收敛的替代机制。

更新时间: 2025-07-09 12:01:30

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.13849v3

Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking

To guarantee safe and robust deployment of large language models (LLMs) at scale, it is critical to accurately assess their adversarial robustness. Existing adversarial attacks typically target harmful responses in single-point, greedy generations, overlooking the inherently stochastic nature of LLMs. In this paper, we propose a novel framework for adversarial robustness evaluation that explicitly models the entire output distribution, including tail-risks, providing better estimates for model robustness at scale. By casting the attack process as a resource allocation problem between optimization and sampling, we determine compute-optimal tradeoffs and show that integrating sampling into existing attacks boosts ASR by up to 48% and improves efficiency by up to two orders of magnitude. Our framework also enables us to analyze how different attack algorithms affect output harm distributions. Surprisingly, we find that most optimization strategies have little effect on output harmfulness. Finally, we introduce a data-free proof-of-concept objective based on entropy-maximization to demonstrate how our tail-aware perspective enables new optimization targets. Overall, our findings highlight the importance of tail-aware attacks and evaluation protocols to accurately assess and strengthen LLM safety.

Updated: 2025-07-09 11:52:25

标题: 尾部感知对抗攻击:一种面向高效LLM越狱的基于分布的方法

摘要: 为了保证大型语言模型(LLMs)在规模上的安全和稳健部署,准确评估它们的对抗鲁棒性至关重要。现有的对抗攻击通常针对单点、贪婪生成中的有害响应,忽视了LLMs固有的随机性质。在本文中,我们提出了一种新颖的对抗鲁棒性评估框架,明确地对整个输出分布进行建模,包括尾风险,为规模上的模型鲁棒性提供更好的估计。通过将攻击过程视为优化和抽样之间的资源分配问题,我们确定了计算最优的权衡,表明将抽样集成到现有攻击中可以提高ASR高达48%,并将效率提高两个数量级。我们的框架还使我们能够分析不同攻击算法如何影响输出有害分布。令人惊讶的是,我们发现大多数优化策略对输出有害性几乎没有影响。最后,我们介绍了一种基于熵最大化的无数据概念验证目标,以展示我们的尾部感知视角如何实现新的优化目标。总的来说,我们的发现强调了尾部感知攻击和评估方案对准确评估和加强LLM安全的重要性。

更新时间: 2025-07-09 11:52:25

领域: cs.LG

下载: http://arxiv.org/abs/2507.04446v2

Robust Deep Network Learning of Nonlinear Regression Tasks by Parametric Leaky Exponential Linear Units (LELUs) and a Diffusion Metric

This document proposes a parametric activation function (ac.f.) aimed at improving multidimensional nonlinear data regression. It is established knowledge that nonlinear ac.f.'s are required for learning nonlinear datasets. This work shows that smoothness and gradient properties of the ac.f. further impact the performance of large neural networks in terms of overfitting and sensitivity to model parameters. Smooth but vanishing-gradient ac.f.'s such as ELU or SiLU have limited performance, and non-smooth ac.f.'s such as ReLU and Leaky-ReLU further impart discontinuity in the trained model. Improved performance is demonstrated with a smooth "Leaky Exponential Linear Unit" with a non-zero gradient that can be trained. A novel diffusion-loss metric is also proposed to gauge the performance of the trained models in terms of overfitting.
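The abstract does not give the exact functional form of the proposed activation, but one plausible parameterization (ELU's smooth exponential branch plus a trainable linear leak term, so the gradient never vanishes) can be sketched as follows. The form and the parameter names `alpha` and `leak` are assumptions, not the paper's definition:

```python
import math

# Hypothetical "Leaky Exponential Linear Unit": smooth on the negative side
# like ELU, with an added linear leak so the gradient stays bounded away
# from zero as x -> -inf. Exact form and parameters are illustrative only.

def lelu(x, alpha=1.0, leak=0.05):
    if x > 0:
        return x
    return alpha * (math.exp(x) - 1.0) + leak * x

def lelu_grad(x, alpha=1.0, leak=0.05):
    """Derivative: 1 for x > 0, alpha*exp(x) + leak otherwise (never zero)."""
    if x > 0:
        return 1.0
    return alpha * math.exp(x) + leak

print(lelu(2.0))                   # 2.0: identity on the positive side
print(round(lelu_grad(-20.0), 4))  # 0.05: the leak avoids vanishing gradients
```

In a trainable version, `leak` (and possibly `alpha`) would be learned per layer, which is consistent with the abstract's "non-zero gradient that can be trained".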

Updated: 2025-07-09 11:49:15

标题: 通过参数泄漏指数线性单元(LELU)和扩散度量的鲁棒深度网络学习非线性回归任务

摘要: 这份文件提出了一个参数化激活函数(ac.f.),旨在改进多维非线性数据回归。已经确立的知识是,学习非线性数据集需要非线性激活函数。这项工作显示,激活函数的平滑性和梯度特性进一步影响大型神经网络在过拟合和对模型参数的敏感性方面的性能。像ELU或SiLU这样的平滑但梯度消失的激活函数具有有限的性能,而像RELU和Leaky-RELU这样的非平滑激活函数进一步在训练模型中引入不连续性。通过使用平滑的“Leaky指数线性单元”来展示了改进的性能,该激活函数具有非零梯度可以进行训练。还提出了一种新的扩散损失度量标准,以衡量训练模型在过拟合方面的性能。

更新时间: 2025-07-09 11:49:15

领域: cs.LG

下载: http://arxiv.org/abs/2507.06765v1

Fast Equivariant Imaging: Acceleration for Unsupervised Learning via Augmented Lagrangian and Auxiliary PnP Denoisers

We propose Fast Equivariant Imaging (FEI), a novel unsupervised learning framework to efficiently train deep imaging networks without ground-truth data. From the perspective of reformulating the Equivariant Imaging-based optimization problem via the method of Lagrange multipliers and utilizing plug-and-play denoisers, this novel unsupervised scheme shows superior efficiency and performance compared to the vanilla Equivariant Imaging paradigm. In particular, our PnP-FEI scheme achieves an order-of-magnitude (10x) acceleration over standard EI on training U-Net with the CT100 dataset for X-ray CT reconstruction, with improved generalization performance.

Updated: 2025-07-09 11:47:06

标题: 快速等变成像:通过增广拉格朗日和辅助PnP去噪器加速无监督学习

摘要: 我们提出了快速等变成像(FEI)的新颖无监督学习框架,可以有效地训练深度成像网络,而无需地面真实数据。从通过拉格朗日乘子方法重新制定等变成像优化问题的角度,并利用插拔式去噪器,这种新颖的无监督方案在效率和性能上表现出优势,与普通的等变成像范式相比。特别是,我们的PnP-FEI方案在使用CT100数据集训练X射线CT重建的U-Net时,比标准EI实现了一个数量级(10倍)的加速,并具有改进的泛化性能。

更新时间: 2025-07-09 11:47:06

领域: eess.IV,cs.CV,cs.LG,math.OC

下载: http://arxiv.org/abs/2507.06764v1

Aerial Maritime Vessel Detection and Identification

Autonomous maritime surveillance and target vessel identification in environments where Global Navigation Satellite Systems (GNSS) are not available is critical for a number of applications such as search and rescue and threat detection. When the target vessel is only described by visual cues and its last known position is not available, unmanned aerial vehicles (UAVs) must rely solely on on-board vision to scan a large search area under strict computational constraints. To address this challenge, we leverage the YOLOv8 object detection model to detect all vessels in the field of view. We then apply feature matching and hue histogram distance analysis to determine whether any detected vessel corresponds to the target. When found, we localize the target using simple geometric principles. We demonstrate the proposed method in real-world experiments during the MBZIRC2023 competition, integrated into a fully autonomous system with GNSS-denied navigation. We also evaluate the impact of perspective on detection accuracy and localization precision and compare it with the oracle approach.
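The hue-histogram matching step can be sketched as follows. The bin count and the L1 distance used here are illustrative choices, not necessarily the configuration used in the paper:

```python
# Minimal sketch of hue-histogram matching for re-identifying the target
# vessel among detections (bin count and distance metric are assumptions).

def hue_histogram(hues, bins=16):
    """Normalized histogram of hue values given in degrees [0, 360)."""
    counts = [0] * bins
    for h in hues:
        counts[int((h % 360) / (360 / bins))] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

target = hue_histogram([10, 12, 15, 200, 205])        # described target colors
candidate_a = hue_histogram([11, 14, 16, 198, 210])   # similar palette
candidate_b = hue_histogram([90, 95, 100, 110, 120])  # different palette

da = l1_distance(target, candidate_a)
db = l1_distance(target, candidate_b)
print(da < db)  # True: candidate A is declared the match
```

In the full pipeline this color check would be combined with the feature-matching score before a detection is accepted as the target and then localized geometrically.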

Updated: 2025-07-09 11:43:02

标题: 航空海事船舶探测与识别

摘要: 在全球导航卫星系统(GNSS)不可用的环境中,自主海上监视和目标船只识别对于诸如搜救和威胁检测等多种应用至关重要。当目标船只仅通过视觉线索描述且其最后已知位置不可用时,无人机必须仅依赖机载视觉在严格的计算约束下扫描大范围搜索区域。为了解决这一挑战,我们利用YOLOv8目标检测模型在视野范围内检测所有船只。然后,我们应用特征匹配和色调直方图距离分析来确定任何检测到的船只是否对应目标。一旦发现,我们使用简单的几何原理定位目标。我们在MBZIRC2023竞赛中进行了实际实验,将所提出的方法整合到一个具有GNSS拒止导航能力的完全自主系统中。我们还评估了透视对检测精度和定位精度的影响,并将其与oracle方法进行比较。

更新时间: 2025-07-09 11:43:02

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2507.07153v1

FOLC-Net: A Federated-Optimized Lightweight Architecture for Enhanced MRI Disease Diagnosis across Axial, Coronal, and Sagittal Views

The framework is designed to improve performance in the analysis of combined as well as single anatomical perspectives for MRI disease diagnosis. It specifically addresses the performance degradation observed in state-of-the-art (SOTA) models, particularly when processing axial, coronal, and sagittal anatomical planes. The paper introduces the FOLC-Net framework, which incorporates a novel federated-optimized lightweight architecture with approximately 1.217 million parameters and a storage requirement of only 0.9 MB. FOLC-Net integrates Manta-ray foraging optimization (MRFO) mechanisms for efficient model structure generation, global model cloning for scalable training, and ConvNeXt for enhanced client adaptability. The model was evaluated on combined multi-view data as well as individual views, such as axial, coronal, and sagittal, to assess its robustness in various medical imaging scenarios. Moreover, FOLC-Net tests a ShallowFed model on different data to evaluate its ability to generalize beyond the training dataset. The results show that FOLC-Net outperforms existing models, particularly in the challenging sagittal view. For instance, FOLC-Net achieved an accuracy of 92.44% on the sagittal view, significantly higher than the 88.37% accuracy of study method (DL + Residual Learning) and 88.95% of DL models. Additionally, FOLC-Net demonstrated improved accuracy across all individual views, providing a more reliable and robust solution for medical image analysis in decentralized environments. FOLC-Net addresses the limitations of existing SOTA models by providing a framework that ensures better adaptability to individual views while maintaining strong performance in multi-view settings. The incorporation of MRFO, global model cloning, and ConvNeXt ensures that FOLC-Net performs better in real-world medical applications.

Updated: 2025-07-09 11:40:41

标题: FOLC-Net:一种用于增强MRI疾病诊断的联邦优化轻量化架构,覆盖轴位、冠状位和矢状位

摘要: 该框架旨在改进MRI疾病诊断中对组合和单个解剖视角的分析性能。它专门解决了在处理轴向,冠状和矢状解剖平面时观察到的现有技术模型(SOTA)性能下降的问题。本文介绍了FOLC-Net框架,它融合了一个新颖的联邦优化轻量级架构,大约有121.7万个参数和仅0.9MB的存储需求。FOLC-Net集成了魔鬼鱼觅食优化(MRFO)机制,用于高效的模型结构生成,全局模型克隆用于可扩展训练,以及ConvNeXt用于增强客户端的适应性。该模型在组合多视角数据以及单个视图,如轴向,冠状和矢状,上进行评估,以评估其在各种医学影像场景中的稳健性。此外,FOLC-Net在不同数据上测试了ShallowFed模型,以评估其在训练数据集之外的泛化能力。结果显示,FOLC-Net在挑战性的矢状视图上表现优于现有模型。例如,FOLC-Net在矢状视图上达到了92.44%的准确率,远高于研究方法(DL +残差学习)的88.37%和DL模型的88.95%的准确率。此外,FOLC-Net在所有单个视图上都表现出了更高的准确性,为去中心化环境中的医学图像分析提供了更可靠和稳健的解决方案。FOLC-Net通过提供一个确保更好适应个别视图的框架,同时在多视图设置中保持强大性能,解决了现有SOTA模型的局限性。MRFO,全局模型克隆和ConvNeXt的整合确保了FOLC-Net在真实世界的医学应用中表现更好。

更新时间: 2025-07-09 11:40:41

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.06763v1

CRISP: Complex Reasoning with Interpretable Step-based Plans

Recent advancements in large language models (LLMs) underscore the need for stronger reasoning capabilities to solve complex problems effectively. While Chain-of-Thought (CoT) reasoning has been a step forward, it remains insufficient for many domains. A promising alternative is explicit high-level plan generation, but existing approaches largely assume that LLMs can produce effective plans through few-shot prompting alone, without additional training. In this work, we challenge this assumption and introduce CRISP (Complex Reasoning with Interpretable Step-based Plans), a multi-domain dataset of high-level plans for mathematical reasoning and code generation. The plans in CRISP are automatically generated and rigorously validated--both intrinsically, using an LLM as a judge, and extrinsically, by evaluating their impact on downstream task performance. We demonstrate that fine-tuning a small model on CRISP enables it to generate higher-quality plans than much larger models using few-shot prompting, while significantly outperforming Chain-of-Thought reasoning. Furthermore, our out-of-domain evaluation reveals that fine-tuning on one domain improves plan generation in the other, highlighting the generalizability of learned planning capabilities.

Updated: 2025-07-09 11:40:24

标题: CRISP:具有可解释的基于步骤的复杂推理

摘要: 最近大型语言模型(LLMs)的先进进展突显了解决复杂问题所需的更强推理能力。虽然思维链(CoT)推理是一大进步,但对许多领域来说仍然不够。一种有前途的替代方案是显式高层次计划生成,但现有方法主要假设LLMs可以单凭少量提示就能生成有效计划,无需额外训练。在这项工作中,我们挑战了这一假设,介绍了CRISP(具有可解释步骤计划的复杂推理),这是一个多领域数据集,用于数学推理和代码生成的高层次计划。CRISP中的计划是自动生成并经过严格验证-既通过LLM作为评判者进行内在验证,又通过评估它们对下游任务绩效的影响进行外在验证。我们证明,对小型模型在CRISP上进行微调使其能够生成比使用少量提示的大型模型更高质量的计划,同时明显优于思维链推理。此外,我们的跨域评估表明,在一个领域上微调可以改善另一个领域的计划生成,突显了学习计划能力的泛化性。

更新时间: 2025-07-09 11:40:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.08037v1

Reinforcement Learning-based Feature Generation Algorithm for Scientific Data

Feature generation (FG) aims to enhance the prediction potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data to improve downstream machine-learning model performance. Traditional methods face the following two challenges when dealing with the feature generation of scientific data: First, the effective construction of high-order feature combinations in scientific data necessitates profound and extensive domain-specific expertise. Second, as the order of feature combinations increases, the search space expands exponentially, imposing prohibitive human labor costs. Advancements in the Data-Centric Artificial Intelligence (DCAI) paradigm have opened novel avenues for automating feature generation processes. Inspired by this, the paper revisits the conventional feature generation workflow and proposes the Multi-agent Feature Generation (MAFG) framework. Specifically, in the iterative exploration stage, multiple agents collaboratively construct mathematical transformation equations, synthesize and identify feature combinations exhibiting high information content, and leverage a reinforcement learning mechanism to evolve their strategies. Upon completing the exploration phase, MAFG integrates large language models (LLMs) to interpretatively evaluate the generated features behind each significant breakthrough in model performance. Experimental results and case studies consistently demonstrate that the MAFG framework effectively automates the feature generation process and significantly enhances various downstream scientific data mining tasks.

Updated: 2025-07-09 11:30:58

标题: 基于强化学习的科学数据特征生成算法

摘要: 特征生成(FG)旨在通过构建高阶特征组合和移除冗余特征来增强原始数据的预测潜力。这是改善表格科学数据下游机器学习模型性能的关键预处理步骤。传统方法在处理科学数据的特征生成时面临以下两个挑战:首先,在科学数据中有效构建高阶特征组合需要深刻和广泛的领域专业知识。其次,随着特征组合的阶数增加,搜索空间呈指数级扩展,导致人力消耗过高。数据中心人工智能(DCAI)范式的进步为自动化特征生成过程开辟了新途径。受此启发,本文重新审视传统特征生成工作流程,并提出多智能体特征生成(MAFG)框架。具体来说,在迭代探索阶段,多智能体将协作构建数学转换方程,综合和识别具有高信息含量的特征组合,并利用强化学习机制来演化他们的策略。完成探索阶段后,MAFG集成大型语言模型(LLMs)来解释性评估每个显著模型性能突破的生成特征。实验结果和案例研究一致表明,MAFG框架有效地自动化了特征生成过程,并显著增强了各种下游科学数据挖掘任务。

更新时间: 2025-07-09 11:30:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.03498v2

AI Agent Smart Contract Exploit Generation

We present A1, an agentic, execution-driven system that transforms any LLM into an end-to-end exploit generator. A1 has no hand-crafted heuristics and provides the agent with six domain-specific tools that enable autonomous vulnerability discovery. The agent can flexibly leverage these tools to understand smart contract behavior, generate exploit strategies, test them on blockchain states, and refine approaches based on execution feedback. All outputs are concretely validated to eliminate false positives. The evaluation across 36 real-world vulnerable contracts on Ethereum and Binance Smart Chain demonstrates a 62.96% (17 out of 27) success rate on the VERITE benchmark. Beyond the VERITE dataset, A1 identified 9 additional vulnerable contracts, with 5 cases occurring after the strongest model's training cutoff date. Across all 26 successful cases, A1 extracts up to 8.59 million USD per case and 9.33 million USD total. Through 432 experiments across six LLMs, we analyze iteration-wise performance showing diminishing returns with average marginal gains of +9.7%, +3.7%, +5.1%, and +2.8% for iterations 2-5 respectively, with per-experiment costs ranging $0.01-$3.59. A Monte Carlo analysis of 19 historical attacks shows success probabilities of 85.9%-88.8% without detection delays. We investigate whether an attacker or a defender benefits most from deploying A1 as a continuous on-chain scanning system. Our model shows that OpenAI's o3-pro maintains profitability up to a 30.0-day scanning delay at 0.100% vulnerability incidence rates, while faster models require >=1.000% rates to break even. These findings expose a troubling asymmetry: at 0.1% vulnerability rates, attackers achieve on-chain scanning profitability at a $6000 exploit value, while defenders require $60000, raising fundamental questions about whether AI agents inevitably favor exploitation over defense.

Updated: 2025-07-09 11:25:39

标题: AI代理智能合约利用生成

摘要: 我们提出了A1,一个主动执行驱动系统,将任何LLM转化为端到端的利用生成器。A1没有手工制定的启发式规则,并为代理提供了六种领域特定工具,可以实现自主漏洞发现。代理可以灵活地利用这些工具来理解智能合约行为,生成利用策略,在区块链状态上测试它们,并根据执行反馈调整方法。所有输出都经过具体验证,消除了虚假阳性。 在以太坊和币安智能链上对36个真实存在漏洞的合约进行评估展示,A1在VERITE基准测试中的成功率为62.96%(27个中的17个)。除了VERITE数据集,A1还识别出了9个额外的有漏洞的合约,其中有5个发生在最强模型的训练截止日期之后。在所有26个成功的案例中,A1每个案例提取的金额高达859万美元,总计933万美元。通过对六个LLM进行432次实验,我们分析了迭代性能,显示出边际收益递减,分别为+9.7%、+3.7%、+5.1%和+2.8%的平均边际增益,每次实验的成本范围为0.01美元至3.59美元。对19次历史攻击的蒙特卡罗分析显示,成功概率为85.9%至88.8%,没有检测延迟。 我们调查了攻击者或防御者在部署A1作为连续的链上扫描系统时谁会受益最大。我们的模型表明,OpenAI的o3-pro在0.100%的漏洞发生率下,延迟30.0天的扫描仍然可盈利,而更快的模型需要>=1.000%的发生率才能收支平衡。这些发现揭示了一个令人担忧的不对称性:在0.1%的漏洞率下,攻击者在6000美元的利用价值上实现链上扫描的盈利,而防御者则需要60000美元,引发了关于AI代理是否不可避免地偏向利用而非防御的基本问题。

更新时间: 2025-07-09 11:25:39

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2507.05558v2

KAConvText: Novel Approach to Burmese Sentence Classification using Kolmogorov-Arnold Convolution

This paper presents the first application of Kolmogorov-Arnold Convolution for Text (KAConvText) in sentence classification, addressing three tasks: imbalanced binary hate speech detection, balanced multiclass news classification, and imbalanced multiclass ethnic language identification. We investigate various embedding configurations, comparing random to fastText embeddings in both static and fine-tuned settings, with embedding dimensions of 100 and 300 using CBOW and Skip-gram models. Baselines include standard CNNs and CNNs augmented with a Kolmogorov-Arnold Network (CNN-KAN). In addition, we investigated KAConvText with different classification heads - MLP and KAN, where using KAN head supports enhanced interpretability. Results show that KAConvText-MLP with fine-tuned fastText embeddings achieves the best performance of 91.23% accuracy (F1-score = 0.9109) for hate speech detection, 92.66% accuracy (F1-score = 0.9267) for news classification, and 99.82% accuracy (F1-score = 0.9982) for language identification.

Updated: 2025-07-09 11:25:35

标题: KAConvText:使用Kolmogorov-Arnold卷积的缅甸语句分类的新方法

摘要: 这篇论文首次将Kolmogorov-Arnold卷积在文本(KAConvText)中应用于句子分类,解决了三个任务:不平衡的二元仇恨言论检测、平衡的多类新闻分类和不平衡的多类族裔语言识别。我们研究了各种嵌入配置,比较了静态和微调设置下的随机和fastText嵌入,使用CBOW和Skip-gram模型的100和300维嵌入维度。基线包括标准CNN和CNN增强的Kolmogorov-Arnold网络(CNN-KAN)。此外,我们还研究了具有不同分类头部的KAConvText - MLP和KAN,在使用KAN头部时支持增强的可解释性。结果表明,使用微调的fastText嵌入的KAConvText-MLP实现了最佳性能,仇恨言论检测准确率为91.23%(F1分数=0.9109),新闻分类准确率为92.66%(F1分数=0.9267),语言识别准确率为99.82%(F1分数=0.9982)。

更新时间: 2025-07-09 11:25:35

领域: cs.CL,cs.AI,I.2.7; I.2.6

下载: http://arxiv.org/abs/2507.06753v1

Mathematical artificial data for operator learning

Machine learning has emerged as a transformative tool for solving differential equations (DEs), yet prevailing methodologies remain constrained by dual limitations: data-driven methods demand costly labeled datasets while model-driven techniques face efficiency-accuracy trade-offs. We present the Mathematical Artificial Data (MAD) framework, a new paradigm that integrates physical laws with data-driven learning to facilitate large-scale operator discovery. By exploiting DEs' intrinsic mathematical structure to generate physics-embedded analytical solutions and associated synthetic data, MAD fundamentally eliminates dependence on experimental or simulated training data. This enables computationally efficient operator learning across multi-parameter systems while maintaining mathematical rigor. Through numerical demonstrations spanning 2D parametric problems where both the boundary values and source term are functions, we showcase MAD's generalizability and superior efficiency/accuracy across various DE scenarios. This physics-embedded-data-driven framework and its capacity to handle complex parameter spaces gives it the potential to become a universal paradigm for physics-informed machine intelligence in scientific computing.
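The MAD idea of physics-embedded data generation can be illustrated on a Poisson problem: pick an analytic solution u, derive the source term f = -Δu in closed form, and the pair (f, u) becomes a labeled training sample with no solver or experiment in the loop. The specific choice of u below is illustrative, not from the paper:

```python
import math

# Sketch of MAD-style data generation for the Poisson equation -Δu = f:
# the analytic u is chosen, f follows by differentiation, and each grid
# point yields a labeled (input, solution) pair "for free".

def u(x, y):
    return math.sin(math.pi * x) * math.sin(math.pi * y)

def f(x, y):
    # For this u, -Δu = 2 * pi^2 * sin(pi*x) * sin(pi*y)
    return 2.0 * math.pi ** 2 * u(x, y)

grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
dataset = [((x, y, f(x, y)), u(x, y)) for x, y in grid]
print(len(dataset))  # 25 labeled samples
```

Varying the family of analytic solutions (and with it the boundary values and source terms) is what lets this construction cover multi-parameter operator-learning problems without simulated training data.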

Updated: 2025-07-09 11:23:05

标题: 用于算子学习的数学人工数据

摘要: 机器学习已经成为解决微分方程(DEs)的转化工具,然而现有的方法仍受到双重限制:数据驱动方法需要昂贵的标记数据集,而模型驱动技术面临效率与准确性的权衡。我们提出了数学人工数据(MAD)框架,这是一个集成物理定律和数据驱动学习的新范式,以促进大规模算子发现。通过利用微分方程的固有数学结构生成嵌入物理学的分析解和相关合成数据,MAD从根本上消除了对实验或模拟训练数据的依赖。这使得在多参数系统中进行高效的算子学习成为可能,同时保持数学严谨性。通过涵盖边界值和源项均为函数的2D参数问题的数值演示,我们展示了MAD在各种DE场景中的普适性和优越的效率/准确性。这种嵌入物理学数据驱动框架及其处理复杂参数空间的能力使其有可能成为科学计算中物理信息机器智能的通用范式。

更新时间: 2025-07-09 11:23:05

领域: cs.LG,cs.NA,math.NA,stat.ML,68T07, 35J05,I.2.6; G.1.8; G.4

下载: http://arxiv.org/abs/2507.06752v1

Robust Multimodal Large Language Models Against Modality Conflict

Despite the impressive capabilities of multimodal large language models (MLLMs) in vision-language tasks, they are prone to hallucinations in real-world scenarios. This paper investigates the hallucination phenomenon in MLLMs from the perspective of modality conflict. Unlike existing works focusing on the conflicts between model responses and inputs, we study the inherent conflicts in inputs from different modalities that place MLLMs in a dilemma and directly lead to hallucinations. We formally define the modality conflict and construct a dataset named Multimodal Modality Conflict (MMMC) to simulate this phenomenon in vision-language tasks. Three methods based on prompt engineering, supervised fine-tuning, and reinforcement learning are proposed to alleviate the hallucination caused by modality conflict. Extensive experiments are conducted on the MMMC dataset to analyze the merits and demerits of these methods. Our results show that the reinforcement learning method achieves the best performance in mitigating the hallucination under modality conflict, while the supervised fine-tuning method shows promising and stable performance. Our work sheds light on the unnoticed modality conflict that leads to hallucinations and provides more insights into the robustness of MLLMs.

Updated: 2025-07-09 11:18:38

标题: 强大的多模态大型语言模型抵抗模态冲突

摘要: 尽管多模态大型语言模型(MLLMs)在视觉语言任务中具有令人印象深刻的能力,但它们在现实场景中容易出现幻觉。本文从模态冲突的角度研究了MLLMs中的幻觉现象。与现有的研究侧重于模型响应和输入之间的冲突不同,我们研究了来自不同模态的输入中固有的冲突,这使得MLLMs陷入困境并直接导致幻觉。我们正式定义了模态冲突,并构建了一个名为多模态模态冲突(MMMC)的数据集,以模拟视觉语言任务中的这种现象。提出了基于提示工程、监督微调和强化学习的三种方法来减轻由于模态冲突引起的幻觉。在MMMC数据集上进行了大量实验,分析了这些方法的优点和缺点。我们的结果显示,强化学习方法在减轻模态冲突下的幻觉方面表现最佳,而监督微调方法表现出有前途且稳定的性能。我们的工作揭示了导致幻觉的未被注意到的模态冲突,并为MLLMs的鲁棒性提供了更多见解。

更新时间: 2025-07-09 11:18:38

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.07151v1

A Blockchain Solution for Collaborative Machine Learning over IoT

The rapid growth of Internet of Things (IoT) devices and applications has led to an increased demand for advanced analytics and machine learning techniques capable of handling the challenges associated with data privacy, security, and scalability. Federated learning (FL) and blockchain technologies have emerged as promising approaches to address these challenges by enabling decentralized, secure, and privacy-preserving model training on distributed data sources. In this paper, we present a novel IoT solution that combines the incremental learning vector quantization algorithm (XuILVQ) with Ethereum blockchain technology to facilitate secure and efficient data sharing, model training, and prototype storage in a distributed environment. Our proposed architecture addresses the shortcomings of existing blockchain-based FL solutions by reducing computational and communication overheads while maintaining data privacy and security. We assess the performance of our system through a series of experiments, showcasing its potential to enhance the accuracy and efficiency of machine learning tasks in IoT settings.
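The prototype-sharing step at the heart of such designs can be sketched without any blockchain machinery. The aggregation below is a hypothetical simplification: a real deployment would exchange these triples via Ethereum smart contracts and apply XuILVQ's own insertion rules rather than a plain count-weighted average.

```python
import numpy as np

def merge_prototypes(local_sets):
    """Naively merge per-device LVQ prototype sets (illustrative sketch).

    Each device contributes (prototype_vector, label, count) triples; the
    aggregator averages same-label prototypes weighted by how many samples
    each prototype summarizes.
    """
    by_label = {}
    for protos in local_sets:
        for vec, label, count in protos:
            acc = by_label.setdefault(label, [np.zeros(len(vec)), 0])
            acc[0] += np.asarray(vec, dtype=float) * count
            acc[1] += count
    return {label: vec_sum / n for label, (vec_sum, n) in by_label.items()}
```

Because only prototypes (not raw samples) leave a device, the scheme keeps the privacy-preserving character of federated learning while shrinking the payload that has to be stored on-chain.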

Updated: 2025-07-09 11:08:49

标题: 一个基于区块链的解决方案,用于在物联网上进行协作机器学习

摘要: 物联网(IoT)设备和应用的快速增长导致了对能够处理与数据隐私、安全性和可扩展性相关挑战的先进分析和机器学习技术的增加需求。联邦学习(FL)和区块链技术已经成为解决这些挑战的有希望的方法,通过在分布式数据源上实现去中心化、安全和隐私保护的模型训练。在本文中,我们提出了一种将增量学习向量量化算法(XuILVQ)与以太坊区块链技术相结合的新型物联网解决方案,以促进在分布式环境中安全高效地进行数据共享、模型训练和原型存储。我们提出的架构通过降低计算和通信开销的同时保持数据隐私和安全性,解决了现有基于区块链的FL解决方案的缺点。我们通过一系列实验评估了系统的性能,展示了其在物联网环境中提高机器学习任务的准确性和效率的潜力。

更新时间: 2025-07-09 11:08:49

领域: cs.LG,cs.CR,cs.NI

下载: http://arxiv.org/abs/2311.14136v2

EFKAN: A KAN-Integrated Neural Operator For Efficient Magnetotelluric Forward Modeling

Magnetotelluric (MT) forward modeling is fundamental for improving the accuracy and efficiency of MT inversion. Neural operators (NOs) have been effectively used for rapid MT forward modeling, demonstrating their promising performance in solving the MT forward modeling-related partial differential equations (PDEs). In particular, they can obtain the electromagnetic field at arbitrary locations and frequencies. In these NOs, the projection layers have been dominated by multi-layer perceptrons (MLPs), which may reduce the accuracy of the solution because MLPs suffer from well-known disadvantages such as limited interpretability and overfitting. Therefore, to improve the accuracy of MT forward modeling with NOs and explore potential alternatives to MLPs, we propose a novel neural operator that extends the Fourier neural operator (FNO) with a Kolmogorov-Arnold network (EFKAN). Within the EFKAN framework, the FNO serves as the branch network to calculate the apparent resistivity and phase from the resistivity model in the frequency domain. Meanwhile, the KAN acts as the trunk network to project the resistivity and phase, determined by the FNO, to the desired locations and frequencies. Experimental results demonstrate that the proposed method not only achieves higher accuracy in obtaining apparent resistivity and phase than the NO equipped with MLPs at the desired frequencies and locations but also outperforms traditional numerical methods in terms of computational speed.

Updated: 2025-07-09 10:59:52

标题: EFKAN:一种用于高效大地电磁正演建模的KAN集成神经算子

摘要: 大地电磁(MT)正演建模对于提高MT反演的精度和效率至关重要。神经算子(NOs)已被有效用于快速MT正演建模,在求解与MT正演建模相关的偏微分方程(PDEs)方面展现出优异性能。特别地,它们可以在任意位置和频率获得电磁场。在这些NOs中,投影层主要由多层感知器(MLPs)构成,而MLPs通常存在可解释性不足、过拟合等缺点,可能会降低解的准确性。因此,为了提高基于NOs的MT正演建模的准确性并探索MLPs的潜在替代方案,我们提出了一种将傅里叶神经算子(FNO)与科尔莫戈洛夫-阿诺德网络(KAN)相结合的新型神经算子(EFKAN)。在EFKAN框架内,FNO作为分支网络,在频域中从电阻率模型计算视电阻率和相位。与此同时,KAN作为主干网络,将由FNO确定的视电阻率和相位投影到所需的位置和频率。实验结果表明,与配备MLPs的NO相比,所提出的方法不仅在所需频率和位置上获得视电阻率和相位时实现了更高的准确性,而且在计算速度方面也优于传统数值方法。

更新时间: 2025-07-09 10:59:52

领域: physics.geo-ph,cs.LG

下载: http://arxiv.org/abs/2502.02195v2

Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching

Weakly supervised text-to-person image matching, as a crucial approach to reducing models' reliance on large-scale manually labeled samples, holds significant research value. However, existing methods struggle to predict complex one-to-many identity relationships, severely limiting performance improvements. To address this challenge, we propose a local-and-global dual-granularity identity association mechanism. Specifically, at the local level, we explicitly establish cross-modal identity relationships within a batch, reinforcing identity constraints across different modalities and enabling the model to better capture subtle differences and correlations. At the global level, we construct a dynamic cross-modal identity association network with the visual modality as the anchor and introduce a confidence-based dynamic adjustment mechanism, effectively enhancing the model's ability to identify weakly associated samples while improving overall sensitivity. Additionally, we propose an information-asymmetric sample pair construction method combined with consistency learning to tackle hard sample mining and enhance model robustness. Experimental results demonstrate that the proposed method substantially boosts cross-modal matching accuracy, providing an efficient and practical solution for text-to-person image matching.

Updated: 2025-07-09 10:59:13

标题: 弱监督条件下的文本到人物图像匹配的双粒度跨模态身份关联

摘要: 弱监督文本到人物图像匹配作为减少模型对大规模手动标记样本依赖的重要方法,具有显著的研究价值。然而,现有方法很难预测复杂的一对多身份关系,严重限制了性能的提升。为了解决这一挑战,我们提出了一种局部和全局双粒度身份关联机制。具体地,在局部级别,我们明确建立了批内的跨模态身份关系,加强了不同模态之间的身份约束,并使模型能够更好地捕捉微小差异和相关性。在全局级别,我们构建了一个动态的跨模态身份关联网络,以视觉模态作为锚点,并引入了基于置信度的动态调整机制,有效增强了模型识别弱关联样本的能力,同时提高了整体敏感性。此外,我们提出了一种信息不对称的样本对构建方法,结合一致性学习来解决难样本挖掘问题,增强模型的鲁棒性。实验结果表明,所提出的方法显著提高了跨模态匹配的准确性,为文本到人物图像匹配提供了高效实用的解决方案。

更新时间: 2025-07-09 10:59:13

领域: cs.CV,cs.LG,cs.MM

下载: http://arxiv.org/abs/2507.06744v1

Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons

Large Language Models (LLMs) have been shown to be effective evaluators across various domains such as machine translation or the scientific domain. Current LLM-as-a-Judge approaches rely mostly on individual assessments or a single round of pairwise assessments, preventing the judge LLM from developing a global ranking perspective. To address this, we present Knockout Assessment, an LLM-as-a-Judge method using a knockout tournament system with iterative pairwise comparisons. Experiments across three LLMs on two datasets show that knockout assessment improves scoring accuracy, increasing Pearson correlation with expert evaluations by 0.07 on average for university-level exam scoring and machine translation evaluations, aligning LLM assessments more closely with human scoring.
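A minimal version of the tournament mechanics might look like the following. This is an illustrative sketch only: the `judge` callable is a stand-in for an LLM pairwise comparison, and the paper's actual prompting protocol and scoring are not reproduced here.

```python
import random

def knockout_assessment(candidates, judge, rng=None):
    """Rank candidates by the round in which they are knocked out.

    `judge(a, b)` must return the preferred candidate (a hypothetical
    interface standing in for an LLM pairwise comparison).
    """
    rng = rng or random.Random(0)
    pool = list(candidates)
    rng.shuffle(pool)                       # random initial bracket
    eliminated_at = {}
    rnd = 0
    while len(pool) > 1:
        rnd += 1
        winners = []
        if len(pool) % 2 == 1:              # odd pool: last entry gets a bye
            winners.append(pool.pop())
        for a, b in zip(pool[0::2], pool[1::2]):
            w = judge(a, b)
            loser = b if w == a else a
            eliminated_at[loser] = rnd      # record elimination round
            winners.append(w)
        pool = winners
    eliminated_at[pool[0]] = rnd + 1        # champion survives every round
    # surviving a later round implies a better candidate
    return sorted(eliminated_at, key=eliminated_at.get, reverse=True)
```

Each run needs only O(n) comparisons, which is what makes iterating the tournament (to refine the ranking) affordable compared with all-pairs comparison schemes.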

Updated: 2025-07-09 10:58:38

标题: 淘汰LLM评估:通过迭代两两比较利用大型语言模型进行评估

摘要: 大型语言模型(LLMs)已被证明在机器翻译、科学领域等多个领域中是有效的评估器。当前的LLM-as-a-Judge方法主要依赖于个体评估或单轮成对评估,这使得评委LLM无法形成全局排名视角。为了解决这个问题,我们提出了Knockout Assessment,一种采用淘汰赛制并进行迭代成对比较的LLM-as-a-Judge方法。在两个数据集上对三个LLM进行的实验表明,淘汰赛评估提高了评分准确性:在大学水平考试评分和机器翻译评估中,与专家评估的皮尔逊相关性平均提高了0.07,使LLM评估与人类评分更加一致。

更新时间: 2025-07-09 10:58:38

领域: cs.CL,cs.AI,I.2.7

下载: http://arxiv.org/abs/2506.03785v3

PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI

Ethical hacking today relies on highly skilled practitioners executing complex sequences of commands, which is inherently time-consuming, difficult to scale, and prone to human error. To help mitigate these limitations, we previously introduced 'PenTest++', an AI-augmented system combining automation with generative AI supporting ethical hacking workflows. However, a key limitation of PenTest++ was its lack of support for privilege escalation, a crucial element of ethical hacking. In this paper we present 'PenTest2.0', a substantial evolution of PenTest++ supporting automated privilege escalation driven entirely by Large Language Model reasoning. It also incorporates several significant enhancements: 'Retrieval-Augmented Generation', including both online and offline modes; 'Chain-of-Thought' prompting for intermediate reasoning; persistent 'PenTest Task Trees' to track goal progression across turns; and the optional integration of human-authored hints. We describe how it operates, present a proof-of-concept prototype, and discuss its benefits and limitations. We also describe application of the system to a controlled Linux target, showing it can carry out multi-turn, adaptive privilege escalation. We explain the rationale behind its core design choices, and provide comprehensive testing results and cost analysis. Our findings indicate that 'PenTest2.0' represents a meaningful step toward practical, scalable, AI-automated penetration testing, whilst highlighting the shortcomings of generative AI systems, particularly their sensitivity to prompt structure, execution context, and semantic drift, reinforcing the need for further research and refinement in this emerging space. Keywords: AI, Ethical Hacking, Privilege Escalation, GenAI, ChatGPT, LLM (Large Language Model), HITL (Human-in-the-Loop)

Updated: 2025-07-09 10:56:32

标题: PenTest2.0:迈向使用GenAI的自主特权升级

摘要: 今天的道德黑客依赖于高技能的从业者执行复杂的命令序列,这在本质上是耗时的、难以扩展的,并容易出现人为错误。为了帮助缓解这些限制,我们之前介绍了“PenTest++”,这是一个AI增强系统,结合了自动化和生成AI,支持道德黑客工作流程。然而,“PenTest++”的一个关键限制是缺乏对特权升级的支持,这是道德黑客的一个关键要素。在本文中,我们介绍了“PenTest2.0”,这是“PenTest++”的一个重大进化,完全支持由大型语言模型推理驱动的自动特权升级。它还包括几项重要的增强功能:“检索增强生成”,包括在线和离线模式;中间推理的“思维链”提示;持久的“PenTest任务树”来跟踪跨回合的目标进展;以及人类创作提示的可选集成。我们描述了它的运作方式,展示了概念验证原型,并讨论了它的益处和限制。我们还描述了系统在受控的Linux目标上的应用,展示它可以进行多回合、自适应的特权升级。我们解释了其核心设计选择的理由,并提供了全面的测试结果和成本分析。我们的研究结果表明,“PenTest2.0”代表了朝着实用、可扩展、AI自动化渗透测试的有意义的一步,同时强调了生成AI系统的缺点,特别是对提示结构、执行上下文和语义漂移的敏感性,进一步强调了在这个新兴领域进行进一步研究和改进的必要性。 关键词:人工智能、道德黑客、特权升级、GenAI、ChatGPT、LLM(大型语言模型)、HITL(人在循环中)

更新时间: 2025-07-09 10:56:32

领域: cs.CR

下载: http://arxiv.org/abs/2507.06742v1

DIFFUMA: High-Fidelity Spatio-Temporal Video Prediction via Dual-Path Mamba and Diffusion Enhancement

Spatio-temporal video prediction plays a pivotal role in critical domains, ranging from weather forecasting to industrial automation. However, in high-precision industrial scenarios such as semiconductor manufacturing, the absence of specialized benchmark datasets severely hampers research on modeling and predicting complex processes. To address this challenge, we make a twofold contribution. First, we construct and release the Chip Dicing Lane Dataset (CHDL), the first public temporal image dataset dedicated to the semiconductor wafer dicing process. Captured via an industrial-grade vision system, CHDL provides a much-needed and challenging benchmark for high-fidelity process modeling, defect detection, and digital twin development. Second, we propose DIFFUMA, an innovative dual-path prediction architecture specifically designed for such fine-grained dynamics. The model captures global long-range temporal context through a parallel Mamba module, while simultaneously leveraging a diffusion module, guided by temporal features, to restore and enhance fine-grained spatial details, effectively combating feature degradation. Experiments demonstrate that on our CHDL benchmark, DIFFUMA significantly outperforms existing methods, reducing the Mean Squared Error (MSE) by 39% and improving the Structural Similarity (SSIM) from 0.926 to a near-perfect 0.988. This superior performance also generalizes to natural phenomena datasets. Our work not only delivers a new state-of-the-art (SOTA) model but, more importantly, provides the community with an invaluable data resource to drive future research in industrial AI.

Updated: 2025-07-09 10:51:54

标题: DIFFUMA:通过双路径Mamba与扩散增强实现高保真时空视频预测

摘要: 时空视频预测在从天气预报到工业自动化等关键领域中发挥着关键作用。然而,在半导体制造等高精度工业场景中,缺乏专门的基准数据集严重阻碍了对复杂过程建模和预测的研究。为了解决这一挑战,我们做出了双重贡献。首先,我们构建并发布了Chip Dicing Lane Dataset(CHDL),这是专门用于半导体晶圆切割过程的首个公共时间图像数据集。通过工业级视觉系统捕获,CHDL为高保真度过程建模、缺陷检测和数字孪生发展提供了一个迫切需要且具有挑战性的基准。其次,我们提出了DIFFUMA,这是一种创新的双通道预测架构,专门设计用于这种细粒度动态。该模型通过并行的Mamba模块捕获全局长期时间上下文,同时利用由时间特征引导的扩散模块来恢复和增强细粒度空间细节,有效对抗特征退化。实验证明,在我们的CHDL基准上,DIFFUMA明显优于现有方法,将均方误差(MSE)降低了39%,将结构相似性(SSIM)从0.926提高到接近完美的0.988。这种优越性能也推广到自然现象数据集。我们的工作不仅提供了一个新的最先进(SOTA)模型,更重要的是,为社区提供了一个宝贵的数据资源,推动未来工业人工智能研究。

更新时间: 2025-07-09 10:51:54

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.06738v1

Residual Prior-driven Frequency-aware Network for Image Fusion

Image fusion aims to integrate complementary information across modalities to generate high-quality fused images, thereby enhancing the performance of high-level vision tasks. While global spatial modeling mechanisms show promising results, constructing long-range feature dependencies in the spatial domain incurs substantial computational costs. Additionally, the absence of ground-truth exacerbates the difficulty of capturing complementary features effectively. To tackle these challenges, we propose a Residual Prior-driven Frequency-aware Network, termed as RPFNet. Specifically, RPFNet employs a dual-branch feature extraction framework: the Residual Prior Module (RPM) extracts modality-specific difference information from residual maps, thereby providing complementary priors for fusion; the Frequency Domain Fusion Module (FDFM) achieves efficient global feature modeling and integration through frequency-domain convolution. Additionally, the Cross Promotion Module (CPM) enhances the synergistic perception of local details and global structures through bidirectional feature interaction. During training, we incorporate an auxiliary decoder and saliency structure loss to strengthen the model's sensitivity to modality-specific differences. Furthermore, a combination of adaptive weight-based frequency contrastive loss and SSIM loss effectively constrains the solution space, facilitating the joint capture of local details and global features while ensuring the retention of complementary information. Extensive experiments validate the fusion performance of RPFNet, which effectively integrates discriminative features, enhances texture details and salient objects, and can effectively facilitate the deployment of the high-level vision task.

Updated: 2025-07-09 10:48:00

标题: 基于残差先验的频率感知图像融合网络

摘要: 图像融合旨在整合跨模态的互补信息,生成高质量的融合图像,从而增强高级视觉任务的性能。虽然全局空间建模机制显示出有希望的结果,但在空间域中构建长距离特征依赖会产生大量的计算成本。此外,缺乏地面真实数据加剧了有效捕获互补特征的困难。为了解决这些挑战,我们提出了一种称为RPFNet的残差先验驱动频率感知网络。具体来说,RPFNet采用双分支特征提取框架:残差先验模块(RPM)从残差图中提取特定于模态的差异信息,从而为融合提供互补的先验;频域融合模块(FDFM)通过频域卷积实现高效的全局特征建模和集成。此外,交叉促进模块(CPM)通过双向特征交互增强局部细节和全局结构的协同感知。在训练过程中,我们结合辅助解码器和显著性结构损失,以增强模型对特定于模态的差异的敏感性。此外,自适应权重基于频率对比损失和SSIM损失的组合有效约束解决空间,促进对局部细节和全局特征的联合捕获,同时确保保留互补信息。广泛的实验验证了RPFNet的融合性能,有效整合了辨别特征,增强了纹理细节和显著对象,并能有效促进高级视觉任务的部署。

更新时间: 2025-07-09 10:48:00

领域: cs.CV,cs.LG,cs.MM

下载: http://arxiv.org/abs/2507.06735v1

Civil Society in the Loop: Feedback-Driven Adaptation of (L)LM-Assisted Classification in an Open-Source Telegram Monitoring Tool

The role of civil society organizations (CSOs) in monitoring harmful online content is increasingly crucial, especially as platform providers reduce their investment in content moderation. AI tools can assist in detecting and monitoring harmful content at scale. However, few open-source tools offer seamless integration of AI models and social media monitoring infrastructures. Given their thematic expertise and contextual understanding of harmful content, CSOs should be active partners in co-developing technological tools, providing feedback, helping to improve models, and ensuring alignment with stakeholder needs and values, rather than passive 'consumers'. However, collaborations between the open source community, academia, and civil society remain rare, and research on harmful content seldom translates into practical tools usable by civil society actors. This work in progress explores how CSOs can be meaningfully involved in an AI-assisted open-source monitoring tool of anti-democratic movements on Telegram, which we are currently developing in collaboration with CSO stakeholders.

Updated: 2025-07-09 10:46:58

标题: 《循环中的公民社会:开源Telegram监控工具中基于反馈驱动的(L)LM辅助分类的自适应》

摘要: 民间社会组织(CSOs)在监测有害在线内容方面的作用日益关键,尤其是在平台提供商减少对内容管理的投入时。人工智能工具可以帮助大规模检测和监测有害内容。然而,很少有开源工具提供无缝集成的人工智能模型和社交媒体监测基础设施。鉴于他们对有害内容的主题专业知识和背景理解,CSOs应积极参与共同开发技术工具,提供反馈,帮助改进模型,并确保与利益相关者的需求和价值观保持一致,而不是作为被动的“消费者”。然而,开源社区、学术界和民间社会之间的合作仍然很少见,对有害内容的研究很少能转化为民间社会行动者可用的实用工具。这项正在进行的工作探讨了CSOs如何有意义地参与与CSO利益相关者合作开发的Telegram上反民主运动的AI辅助开源监测工具。

更新时间: 2025-07-09 10:46:58

领域: cs.HC,cs.AI,cs.CL,cs.CY

下载: http://arxiv.org/abs/2507.06734v1

Torsion in Persistent Homology and Neural Networks

We explore the role of torsion in hybrid deep learning models that incorporate topological data analysis, focusing on autoencoders. While most TDA tools use field coefficients, this conceals torsional features present in integer homology. We show that torsion can be lost during encoding, altered in the latent space, and in many cases, not reconstructed by standard decoders. Using both synthetic and high-dimensional data, we evaluate torsion sensitivity to perturbations and assess its recoverability across several autoencoder architectures. Our findings reveal key limitations of field-based approaches and underline the need for architectures or loss terms that preserve torsional information for robust data representation.
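The field-coefficient blind spot the abstract refers to can be made concrete in a few lines of code. The sketch below diagonalizes an integer boundary matrix by row and column operations, a minimal stand-in for Smith normal form (it does not enforce the full divisibility chain and is not the authors' pipeline): diagonal entries greater than 1 are exactly the torsion coefficients that rank computations over a field cannot see.

```python
def integer_diagonalize(M):
    """Diagonalize an integer matrix by elementary row/column operations.

    Returns the nonzero diagonal entries of the reduced matrix; entries
    greater than 1 are torsion coefficients of the cokernel. Minimal
    sketch: the full Smith-normal-form divisibility chain is not enforced.
    """
    A = [row[:] for row in M]
    m, n = len(A), len(A[0])
    diag = []
    i = 0
    while i < min(m, n):
        # pick the nonzero entry of smallest magnitude as the pivot
        piv = None
        for r in range(i, m):
            for c in range(i, n):
                if A[r][c] and (piv is None or abs(A[r][c]) < abs(A[piv[0]][piv[1]])):
                    piv = (r, c)
        if piv is None:
            break
        r0, c0 = piv
        A[i], A[r0] = A[r0], A[i]
        for row in A:
            row[i], row[c0] = row[c0], row[i]
        changed = True
        while changed:
            changed = False
            for r in range(i + 1, m):          # clear the pivot column
                if A[r][i]:
                    q = A[r][i] // A[i][i]
                    for c in range(i, n):
                        A[r][c] -= q * A[i][c]
                    if A[r][i]:                # nonzero remainder: smaller pivot
                        A[i], A[r] = A[r], A[i]
                    changed = True
            for c in range(i + 1, n):          # clear the pivot row
                if A[i][c]:
                    q = A[i][c] // A[i][i]
                    for r in range(i, m):
                        A[r][c] -= q * A[r][i]
                    if A[i][c]:
                        for r in range(i, m):
                            A[r][i], A[r][c] = A[r][c], A[r][i]
                    changed = True
        diag.append(abs(A[i][i]))
        i += 1
    return diag
```

For a standard Delta-complex of the Klein bottle (rows are the three edges, columns the two triangles), `integer_diagonalize([[1, 1], [1, -1], [-1, 1]])` yields diagonal entries 1 and 2: the rank over the rationals is 2, so H_1 has free rank 3 - 2 = 1, and the entry 2 records the Z/2 summand that a GF(2) rank computation would silently absorb.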

Updated: 2025-07-09 10:38:34

标题: 持续同调与神经网络中的挠

摘要: 我们探讨了挠(torsion)在融合拓扑数据分析的混合深度学习模型中的作用,重点关注自编码器。虽然大多数TDA工具使用域系数,但这掩盖了整数同调中存在的挠特征。我们展示了挠在编码过程中可能丢失,在潜在空间中被改变,并且在许多情况下无法被标准解码器重建。通过使用合成数据和高维数据,我们评估了挠对扰动的敏感性,并评估了它在几种自编码器架构中的可恢复性。我们的发现揭示了基于域系数方法的关键局限性,并强调需要能够保留挠信息的架构或损失项,以获得稳健的数据表示。

更新时间: 2025-07-09 10:38:34

领域: math.AT,cs.LG

下载: http://arxiv.org/abs/2506.03049v2

PotentRegion4MalDetect: Advanced Features from Potential Malicious Regions for Malware Detection

Malware developers exploit the fact that most detection models focus on the entire binary to extract the feature rather than on the regions of potential maliciousness. Therefore, they reverse engineer a benign binary and inject malicious code into it. This obfuscation technique circumvents the malware detection models and deceives the ML classifiers due to the prevalence of benign features compared to malicious features. However, extracting the features from the potential malicious regions enhances the accuracy and decreases false positives. Hence, we propose a novel model named PotentRegion4MalDetect that extracts features from the potential malicious regions. PotentRegion4MalDetect determines the nodes with potential maliciousness in the partially preprocessed Control Flow Graph (CFG) using the malicious strings given by StringSifter. Then, it extracts advanced features of the identified potential malicious regions alongside the features from the completely preprocessed CFG. The features extracted from the completely preprocessed CFG mitigate obfuscation techniques that attempt to disguise malicious content, such as suspicious strings. The experiments reveal that the PotentRegion4MalDetect requires fewer entries to save the features for all binaries than the model focusing on the entire binary, reducing memory overhead, faster computation, and lower storage requirements. These advanced features give an 8.13% increase in SHapley Additive exPlanations (SHAP) Absolute Mean and a 1.44% increase in SHAP Beeswarm value compared to those extracted from the entire binary. The advanced features outperform the features extracted from the entire binary by producing more than 99% accuracy, precision, recall, AUC, F1-score, and 0.064% FPR.

Updated: 2025-07-09 10:32:17

标题: PotentRegion4MalDetect:潜在恶意区域的高级特征用于恶意软件检测

摘要: 恶意软件开发者利用大多数检测模型专注于整个二进制文件提取特征而不是专注于潜在恶意性区域的事实。因此,他们对良性二进制文件进行逆向工程,并向其中注入恶意代码。这种混淆技术绕过了恶意软件检测模型,并由于良性特征相对于恶意特征更为普遍,欺骗了机器学习分类器。然而,从潜在恶意区域提取特征可以提高准确性并减少误报。因此,我们提出了一个名为PotentRegion4MalDetect的新模型,从潜在恶意区域提取特征。PotentRegion4MalDetect使用由StringSifter提供的恶意字符串,在部分预处理的控制流图(CFG)中确定具有潜在恶意性的节点。然后,它提取已识别的潜在恶意区域的高级特征以及从完全预处理的CFG中提取的特征。从完全预处理的CFG中提取的特征减轻了试图掩盖恶意内容(例如可疑字符串)的混淆技术。实验显示,PotentRegion4MalDetect需要更少的条目来保存所有二进制文件的特征,比专注于整个二进制文件的模型减少了内存开销,计算速度更快,存储要求更低。这些高级特征使SHapley Additive exPlanations(SHAP)的绝对平均值增加了8.13%,与从整个二进制文件中提取的特征相比,SHAP Beeswarm值增加了1.44%。与从整个二进制文件中提取的特征相比,这些高级特征通过产生超过99%的准确性、精确度、召回率、AUC、F1分数和0.064%的FPR表现更好。

更新时间: 2025-07-09 10:32:17

领域: cs.CR

下载: http://arxiv.org/abs/2507.06723v1

On the Effect of Uncertainty on Layer-wise Inference Dynamics

Understanding how large language models (LLMs) internally represent and process their predictions is central to detecting uncertainty and preventing hallucinations. While several studies have shown that models encode uncertainty in their hidden states, it is underexplored how this affects the way they process such hidden states. In this work, we demonstrate that the dynamics of output token probabilities across layers for certain and uncertain outputs are largely aligned, revealing that uncertainty does not seem to affect inference dynamics. Specifically, we use the Tuned Lens, a variant of the Logit Lens, to analyze the layer-wise probability trajectories of final prediction tokens across 11 datasets and 5 models. Using incorrect predictions as those with higher epistemic uncertainty, our results show aligned trajectories for certain and uncertain predictions that both observe abrupt increases in confidence at similar layers. We balance this finding by showing evidence that more competent models may learn to process uncertainty differently. Our findings challenge the feasibility of leveraging simplistic methods for detecting uncertainty at inference. More broadly, our work demonstrates how interpretability methods may be used to investigate the way uncertainty affects inference.
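The layer-wise trajectory analysis can be approximated with a plain Logit-Lens-style computation. The sketch below is a simplified stand-in (NumPy, with shapes chosen for illustration): the Tuned Lens used in the paper additionally applies a learned per-layer affine translator before the unembedding, which is omitted here.

```python
import numpy as np

def layerwise_token_probability(hidden_states, W_U, token_id):
    """Logit-Lens-style probability trajectory of one token across layers.

    hidden_states: (num_layers, d_model) residual-stream states at one
    position; W_U: (d_model, vocab) unembedding matrix. Returns the
    probability assigned to `token_id` when each layer's state is decoded
    directly, i.e. the trajectory studied in the paper (minus the Tuned
    Lens's learned per-layer translators).
    """
    probs = []
    for h in hidden_states:
        logits = h @ W_U
        logits -= logits.max()                      # numerical stability
        p = np.exp(logits) / np.exp(logits).sum()   # softmax over vocab
        probs.append(p[token_id])
    return np.array(probs)
```

Comparing such trajectories for correct (low-uncertainty) and incorrect (high-uncertainty) predictions is what reveals the aligned abrupt confidence jumps described in the abstract.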

Updated: 2025-07-09 10:30:09

标题: 关于不确定性对逐层推断动态的影响

摘要: 理解大型语言模型(LLMs)内部表示和处理其预测的方式对于检测不确定性并预防幻觉至关重要。尽管有几项研究表明模型在其隐藏状态中编码不确定性,但尚未深入探究这如何影响它们处理这些隐藏状态的方式。在这项工作中,我们展示了对于确定和不确定输出,跨层的输出标记概率动态基本一致,揭示了不确定性似乎并不影响推理动态。具体而言,我们使用Tuned Lens,Logit Lens的一个变体,分析了跨11个数据集和5个模型的最终预测标记的分层概率轨迹。将高认知不确定性定义为不正确的预测,我们的结果显示,确定和不确定预测的轨迹保持一致,两者都观察到在相似层次上自信度急剧增加。我们通过展示更有竞争力的模型可能学会以不同方式处理不确定性的证据来平衡这一发现。我们的发现挑战了利用简单方法在推理中检测不确定性的可行性。更广泛地说,我们的工作展示了可解释性方法如何用于研究不确定性如何影响推理。

更新时间: 2025-07-09 10:30:09

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.06722v1

DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability

Recent advancements in text-to-image generation have spurred interest in personalized human image generation, which aims to create novel images featuring specific human identities as reference images indicate. Although existing methods achieve high-fidelity identity preservation, they often struggle with limited multi-ID usability and inadequate facial editability. We present DynamicID, a tuning-free framework supported by a dual-stage training paradigm that inherently facilitates both single-ID and multi-ID personalized generation with high fidelity and flexible facial editability. Our key innovations include: 1) Semantic-Activated Attention (SAA), which employs query-level activation gating to minimize disruption to the original model when injecting ID features and achieve multi-ID personalization without requiring multi-ID samples during training. 2) Identity-Motion Reconfigurator (IMR), which leverages contrastive learning to effectively disentangle and re-entangle facial motion and identity features, thereby enabling flexible facial editing. Additionally, we have developed a curated VariFace-10k facial dataset, comprising 10k unique individuals, each represented by 35 distinct facial images. Experimental results demonstrate that DynamicID outperforms state-of-the-art methods in identity fidelity, facial editability, and multi-ID personalization capability.

Updated: 2025-07-09 10:21:24

标题: DynamicID:具有灵活面部可编辑性的零样本多ID图像个性化

摘要: 最近在文本到图像生成领域取得的进展引发了对个性化人类图像生成的兴趣,旨在创建具有特定人类身份的新颖图像,这些身份由参考图像指示。尽管现有方法实现了高保真度的身份保留,但它们经常面临有限的多ID可用性和不足的面部可编辑性的挑战。我们提出了DynamicID,这是一个无需调整的框架,支持双阶段训练范式,从根本上促进了具有高保真度和灵活面部可编辑性的单ID和多ID个性化生成。我们的关键创新包括:1)语义激活注意力(SAA),它利用查询级激活门控来最小化在注入ID特征时对原始模型的干扰,并实现多ID个性化而无需在训练期间使用多ID样本。2)身份-运动重构器(IMR),它利用对比学习有效地解开并重新组合面部运动和身份特征,从而实现灵活的面部编辑。此外,我们开发了一个经过策划的VariFace-10k面部数据集,包括10k个独特个体,每个个体由35个不同的面部图像代表。实验结果表明,DynamicID在身份保真度、面部可编辑性和多ID个性化能力方面优于现有方法。

更新时间: 2025-07-09 10:21:24

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.06505v2

Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning

It is well known that evolutionary algorithms can benefit from dynamic choices of the key parameters that control their behavior, to adjust their search strategy to the different stages of the optimization process. A prominent example where dynamic parameter choices have shown a provable super-constant speed-up is the $(1+(\lambda,\lambda))$ Genetic Algorithm optimizing the OneMax function. While optimal parameter control policies result in linear expected running times, this is not possible with static parameter choices. This result has spurred a lot of interest in parameter control policies. However, many works, in particular theoretical running time analyses, focus on controlling one single parameter. Deriving policies for controlling multiple parameters remains very challenging. In this work we reconsider the problem of the $(1+(\lambda,\lambda))$ Genetic Algorithm optimizing OneMax. We decouple its four main parameters and investigate how well state-of-the-art deep reinforcement learning techniques can approximate good control policies. We show that although making deep reinforcement learning learn effectively is a challenging task, once it works, it is very powerful and is able to find policies that outperform all previously known control policies on the same benchmark. Based on the results found through reinforcement learning, we derive a simple control policy that consistently outperforms the default theory-recommended setting by $27\%$ and the irace-tuned policy, the strongest existing control policy on this benchmark, by $13\%$, for all tested problem sizes up to $40{,}000$.
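For readers unfamiliar with the algorithm, a static-parameter version of the (1+(λ,λ)) GA on OneMax can be sketched as follows; the paper's contribution is precisely to replace the fixed λ (and the standard coupling p = λ/n, c = 1/λ) with learned multi-parameter control policies.

```python
import random

def one_max(bits):
    return sum(bits)

def one_plus_lambda_lambda_ga(n, lam, rng, max_evals=200_000):
    """One run of the (1+(λ,λ)) GA on OneMax with a static λ (sketch).

    Uses the textbook coupling p = λ/n (mutation rate) and c = 1/λ
    (crossover bias); returns the final bit string and the number of
    fitness evaluations used.
    """
    x = [rng.randint(0, 1) for _ in range(n)]
    evals = 0
    while one_max(x) < n and evals < max_evals:
        p, c = lam / n, 1.0 / lam
        # mutation phase: draw a strength ell ~ Bin(n, p), create λ
        # offspring each flipping ell uniformly chosen bits, keep the best
        ell = sum(rng.random() < p for _ in range(n))
        best_mut = None
        for _ in range(lam):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] ^= 1
            evals += 1
            if best_mut is None or one_max(y) > one_max(best_mut):
                best_mut = y
        # crossover phase: λ biased crossovers between x and the best
        # mutant, taking each mutant bit with probability c
        best_cross = x
        for _ in range(lam):
            z = [b if rng.random() < c else a for a, b in zip(x, best_mut)]
            evals += 1
            if one_max(z) >= one_max(best_cross):
                best_cross = z
        x = best_cross          # elitist selection: never worse than x
    return x, evals
```

A dynamic controller would choose λ (or all four decoupled parameters) afresh at the top of every iteration based on the current fitness, which is the setting the reinforcement-learning agent in the paper operates in.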

Updated: 2025-07-09 10:18:09

标题: 通过深度强化学习实现$(1+(λ,λ))$-GA在OneMax问题上的多参数控制

摘要: 众所周知,进化算法可以受益于对控制其行为的关键参数进行动态选择,从而在优化过程的不同阶段调整搜索策略。一个突出的例子是优化OneMax函数的$(1+(\lambda,\lambda))$遗传算法,其中动态参数选择已被证明可以带来超常数级的加速:最优参数控制策略可以实现线性的期望运行时间,而静态参数选择则无法做到。这一结果引起了对参数控制策略的广泛关注。然而,许多工作,特别是理论运行时间分析,只关注控制单个参数。推导用于同时控制多个参数的策略仍然非常具有挑战性。在这项工作中,我们重新考虑$(1+(\lambda,\lambda))$遗传算法优化OneMax的问题。我们将其四个主要参数解耦,并研究最先进的深度强化学习技术能在多大程度上近似好的控制策略。我们表明,尽管让深度强化学习有效地学习是一项具有挑战性的任务,但一旦成功,它就非常强大,能够找到在同一基准上优于所有先前已知控制策略的策略。基于通过强化学习得到的结果,我们推导出一种简单的控制策略:在所有测试的问题规模(最高达40,000)上,它比默认的理论推荐设置高出27%,比irace调优策略(该基准上现有最强的控制策略)高出13%。

更新时间: 2025-07-09 10:18:09

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2505.12982v2

From Blurry to Brilliant Detection: YOLO-Based Aerial Object Detection with Super Resolution

Aerial object detection presents challenges from small object sizes, high density clustering, and image quality degradation from distance and motion blur. These factors create an information bottleneck where limited pixel representation cannot encode sufficient discriminative features. B2BDet addresses this with a two-stage framework that applies domain-specific super-resolution during inference, followed by detection using an enhanced YOLOv5 architecture. Unlike training-time super-resolution approaches that enhance learned representations, our method recovers visual information from each input image. The approach combines aerial-optimized SRGAN fine-tuning with architectural innovations including an Efficient Attention Module (EAM) and Cross-Layer Feature Pyramid Network (CLFPN). Evaluation across four aerial datasets shows performance gains, with VisDrone achieving 52.5% mAP using only 27.7M parameters. Ablation studies show that super-resolution preprocessing contributes +2.6% mAP improvement while architectural enhancements add +2.9%, yielding +5.5% total improvement over baseline YOLOv5. The method achieves computational efficiency with 53.8% parameter reduction compared to recent approaches while achieving strong small object detection performance.

Updated: 2025-07-09 10:14:26

标题: 从模糊到清晰的检测:基于YOLO的具有超分辨率的航空目标检测

摘要: 空中目标检测面临着小目标尺寸、高密度聚类以及距离和运动模糊造成的图像质量降低等挑战。这些因素造成了信息瓶颈,有限的像素表示无法编码足够的区分特征。B2BDet通过一个两阶段框架来解决这个问题,在推理过程中应用领域特定的超分辨率,然后使用增强的YOLOv5架构进行检测。与训练时超分辨率方法不同,我们的方法从每个输入图像中恢复视觉信息。该方法结合了针对空中优化的SRGAN微调和包括Efficient Attention Module(EAM)和Cross-Layer Feature Pyramid Network(CLFPN)在内的架构创新。在四个空中数据集上的评估显示了性能提升,VisDrone仅使用27.7M参数即可实现52.5% mAP。消融研究表明,超分辨率预处理贡献了+2.6%的mAP改进,而架构增强则增加了+2.9%,使得相对基线YOLOv5的总改进达到+5.5%。该方法在实现强大的小目标检测性能的同时,实现了与最近方法相比的53.8%参数减少的计算效率。

更新时间: 2025-07-09 10:14:26

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2401.14661v2

CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs

Large language models (LLMs), including zero-shot and few-shot paradigms, have shown promising capabilities in clinical text generation. However, real-world applications face two key challenges: (1) patient data is highly unstructured, heterogeneous, and scattered across multiple note types and (2) clinical notes are often long and semantically dense, making naive prompting infeasible due to context length constraints and the risk of omitting clinically relevant information. We introduce CLI-RAG (Clinically Informed Retrieval-Augmented Generation), a domain-specific framework for structured and clinically grounded text generation using LLMs. It incorporates a novel hierarchical chunking strategy that respects clinical document structure and introduces a task-specific dual-stage retrieval mechanism. The global stage identifies relevant note types using evidence-based queries, while the local stage extracts high-value content within those notes creating relevance at both document and section levels. We apply the system to generate structured progress notes for individual hospital visits using 15 clinical note types from the MIMIC-III dataset. Experiments show that it preserves temporal and semantic alignment across visits, achieving an average alignment score of 87.7%, surpassing the 80.7% baseline from real clinician-authored notes. The generated outputs also demonstrate high consistency across LLMs, reinforcing deterministic behavior essential for reproducibility, reliability, and clinical trust.
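The dual-stage retrieval idea can be sketched with a stub relevance score. Everything below is illustrative: token-overlap scoring stands in for whatever evidence-based queries and embedding similarity the actual system uses, and the function names are our own.

```python
def dual_stage_retrieve(query, notes_by_type, top_types=2, top_chunks=3):
    """Two-stage retrieval sketch: pick note types first, then chunks.

    notes_by_type maps a clinical note type (e.g. "nursing") to its list
    of text chunks. The global stage ranks note types; the local stage
    ranks chunks only inside the selected types.
    """
    def overlap(q, text):
        # stand-in relevance score: fraction of query tokens in the chunk
        q_tok, t_tok = set(q.lower().split()), set(text.lower().split())
        return len(q_tok & t_tok) / max(1, len(q_tok))

    # global stage: score each note type by its best-matching chunk
    type_scores = {
        ntype: max(overlap(query, ch) for ch in chunks)
        for ntype, chunks in notes_by_type.items()
    }
    selected = sorted(type_scores, key=type_scores.get, reverse=True)[:top_types]

    # local stage: rank chunks inside the selected note types only
    scored = [(overlap(query, ch), ntype, ch)
              for ntype in selected for ch in notes_by_type[ntype]]
    scored.sort(reverse=True)
    return [(ntype, ch) for _, ntype, ch in scored[:top_chunks]]
```

Restricting the expensive local scoring to a few note types is what keeps long, heterogeneous patient records inside the context budget of the downstream LLM.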

Updated: 2025-07-09 10:13:38

标题: CLI-RAG:一种基于LLMs的临床结构化与上下文感知文本生成检索增强框架

摘要: 大型语言模型(LLMs),包括零样本和少样本范式,在临床文本生成方面展现出了很有前景的能力。然而,现实世界的应用面临两个关键挑战:(1)患者数据高度不结构化、异构,并分散在多种类型的笔记中;(2)临床笔记通常较长且语义密集,使得由于上下文长度限制和遗漏临床相关信息的风险而使天真的提示变得不可行。 我们介绍了CLI-RAG(Clinically Informed Retrieval-Augmented Generation),这是一个针对结构化和临床基础文本生成的领域特定框架,使用LLMs。它融入了一种尊重临床文档结构的新颖的分层分块策略,并引入了一个任务特定的双阶段检索机制。全局阶段使用基于证据的查询识别相关的笔记类型,而本地阶段提取这些笔记中的高价值内容,在文档和章节级别都创建相关性。 我们将该系统应用于使用MIMIC-III数据集中的15种临床笔记类型为个人医院就诊生成结构化的进展笔记。实验表明,它保留了访问之间的时间和语义对齐,实现了平均对齐得分为87.7%,超过了来自真实临床医生撰写的笔记的80.7%基线。生成的输出还显示了LLMs之间高一致性,加强了对于可再现性、可靠性和临床信任至关重要的确定性行为。

更新时间: 2025-07-09 10:13:38

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2507.06715v1

PINN-Obs: Physics-Informed Neural Network-Based Observer for Nonlinear Dynamical Systems

State estimation for nonlinear dynamical systems is a critical challenge in control and engineering applications, particularly when only partial and noisy measurements are available. This paper introduces a novel Adaptive Physics-Informed Neural Network-based Observer (PINN-Obs) for accurate state estimation in nonlinear systems. Unlike traditional model-based observers, which require explicit system transformations or linearization, the proposed framework directly integrates system dynamics and sensor data into a physics-informed learning process. The observer adaptively learns an optimal gain matrix, ensuring convergence of the estimated states to the true system states. A rigorous theoretical analysis establishes formal convergence guarantees, demonstrating that the proposed approach achieves uniform error minimization under mild observability conditions. The effectiveness of PINN-Obs is validated through extensive numerical simulations on diverse nonlinear systems, including an induction motor model, a satellite motion system, and benchmark academic examples. Comparative experimental studies against existing observer designs highlight its superior accuracy, robustness, and adaptability.
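The observer structure being learned can be illustrated with its classical fixed-gain ancestor. The sketch below simulates a Luenberger-style observer with a hand-picked gain matrix L; PINN-Obs instead learns an optimal gain (adaptively, inside a physics-informed training loop), which this toy does not model.

```python
import numpy as np

def simulate_observer(f, h, L, x0, xhat0, dt=0.01, steps=2000):
    """Simulate x̂' = f(x̂) + L (y - h(x̂)) against the true system x' = f(x).

    f: state dynamics, h: measurement map (partial observations),
    L: fixed observer gain. Returns the estimation-error norm over time,
    which should decay when the gain is chosen well.
    """
    x, xhat = np.array(x0, dtype=float), np.array(xhat0, dtype=float)
    errs = []
    for _ in range(steps):
        y = h(x)                                  # noiseless measurement
        x = x + dt * f(x)                         # true system, Euler step
        xhat = xhat + dt * (f(xhat) + L @ (y - h(xhat)))
        errs.append(float(np.linalg.norm(x - xhat)))
    return errs
```

For a linear system the error obeys e' = (A - LC)e, so any gain placing those eigenvalues in the left half-plane drives the estimate to the true state; the paper's point is that a network can find such gains for nonlinear systems without explicit transformations.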

Updated: 2025-07-09 10:09:45


Domains: cs.LG,math.DS,nlin.CD

Download: http://arxiv.org/abs/2507.06712v1

Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks requiring complex reasoning. However, the effects of scaling on their reasoning abilities remain insufficiently understood. In this paper, we introduce a synthetic multihop reasoning environment designed to closely replicate the structure and distribution of real-world large-scale knowledge graphs. Our reasoning task involves completing missing edges in the graph, which requires advanced multi-hop reasoning and mimics real-world reasoning scenarios. To evaluate this, we pretrain language models (LMs) from scratch solely on triples from the incomplete graph and assess their ability to infer the missing edges. Interestingly, we observe that overparameterization can impair reasoning performance due to excessive memorization. We investigate different factors that affect this U-shaped loss curve, including graph structure, model size, and training steps. To predict the optimal model size for a specific knowledge graph, we find an empirical scaling that linearly maps the knowledge graph search entropy to the optimal model size. This work provides new insights into the relationship between scaling and reasoning in LLMs, shedding light on possible ways to optimize their performance for reasoning tasks.

Updated: 2025-07-09 10:08:34


Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2504.03635v2

Approximating Euler Totient Function using Linear Regression on RSA moduli

The security of the RSA cryptosystem is based on the intractability of computing Euler's totient function phi(n) for large integers n. While deriving phi(n) deterministically remains computationally infeasible for cryptographically relevant bit lengths, machine learning presents a promising alternative for constructing efficient approximations. In this work, we explore a machine learning approach to approximate Euler's totient function phi using linear regression models. We consider a dataset of RSA moduli of 64, 128, 256, 512 and 1024 bits along with their corresponding totient values. The regression model is trained to capture the relationship between the modulus and its totient, and tested on unseen samples to evaluate its prediction accuracy. Preliminary results suggest that phi can be approximated within a small relative error margin, which may be sufficient to aid certain classes of RSA attacks. This research opens a direction for integrating statistical learning techniques into cryptanalysis, providing insights into the feasibility of attacking cryptosystems using approximation-based strategies.
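A minimal version of the regression experiment can be reproduced on small, non-cryptographic moduli. The prime range, sample size, and plain least-squares model below are illustrative choices, not the paper's setup; for n = p*q with distinct primes, phi(n) = (p-1)(q-1), which is close to linear in n:

```python
import numpy as np

# Fit phi(n) ~ a*n + b over small semiprimes and measure the relative
# error of the approximation on held-out moduli.

def primes_up_to(m):
    sieve = np.ones(m + 1, dtype=bool)
    sieve[:2] = False
    for i in range(2, int(m**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = False
    return np.flatnonzero(sieve)

ps = primes_up_to(500)
ps = ps[ps >= 100]
rng = np.random.default_rng(0)
p = rng.choice(ps[ps < 300], size=400)   # disjoint ranges keep p != q
q = rng.choice(ps[ps >= 300], size=400)
n = p * q
phi = (p - 1) * (q - 1)                  # phi(pq) for distinct primes p, q

A = np.column_stack([n, np.ones_like(n)]).astype(float)
train, test = slice(0, 300), slice(300, 400)
coef, *_ = np.linalg.lstsq(A[train], phi[train].astype(float), rcond=None)
pred = A[test] @ coef
rel_err = np.abs(pred - phi[test]) / phi[test]
print(rel_err.mean())
```

The small relative error reflects that phi(n)/n lies in a narrow band for semiprimes; whether such approximations transfer to cryptographic bit lengths is exactly the question the abstract raises.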

Updated: 2025-07-09 10:01:25


Domains: cs.CR,03C05

Download: http://arxiv.org/abs/2507.06706v1

Causal Inference Isn't Special: Why It's Just Another Prediction Problem

Causal inference is often portrayed as fundamentally distinct from predictive modeling, with its own terminology, goals, and intellectual challenges. But at its core, causal inference is simply a structured instance of prediction under distribution shift. In both cases, we begin with labeled data from a source domain and seek to generalize to a target domain where outcomes are not observed. The key difference is that in causal inference, the labels -- potential outcomes -- are selectively observed based on treatment assignment, introducing bias that must be addressed through assumptions. This perspective reframes causal estimation as a familiar generalization problem and highlights how techniques from predictive modeling, such as reweighting and domain adaptation, apply directly to causal tasks. It also clarifies that causal assumptions are not uniquely strong -- they are simply more explicit. By viewing causal inference through the lens of prediction, we demystify its logic, connect it to familiar tools, and make it more accessible to practitioners and educators alike.
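The reframing can be made concrete with a small simulation: treatment assignment depends on a confounder, so treated and control covariates are distribution-shifted; the naive group-mean difference is biased, while inverse propensity weighting (the reweighting technique mentioned above) recovers the effect. The data-generating process and the use of the true propensity score are simplifying assumptions for illustration:

```python
import numpy as np

# Treated units have systematically higher x, so comparing raw group means
# mixes the treatment effect with the covariate shift. Reweighting by the
# propensity corrects the shift, as in standard covariate-shift prediction.

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)                    # confounder
p = 1 / (1 + np.exp(-x))                  # known propensity P(T=1 | x)
t = rng.random(n) < p                     # treatment assignment
y = 2.0 * t + x + rng.normal(scale=0.5, size=n)   # true effect = 2

naive = y[t].mean() - y[~t].mean()        # biased by the x-shift
ipw = np.mean(t * y / p) - np.mean((1 - t) * y / (1 - p))
print(round(naive, 2), round(ipw, 2))
```

The only "causal" ingredient is the assumption that the propensity is known (or estimable); the estimator itself is ordinary importance weighting from the prediction-under-shift toolbox.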

Updated: 2025-07-09 10:00:04


Domains: cs.LG,stat.ME,stat.ML

Download: http://arxiv.org/abs/2504.04320v3

Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity

Imitation learning (IL) has shown promise in robot locomotion but is often limited to learning a single expert policy, constraining behavior diversity and robustness in unpredictable real-world scenarios. To address this, we introduce Quality Diversity Inverse Reinforcement Learning (QD-IRL), a novel framework that integrates quality-diversity optimization with IRL methods, enabling agents to learn diverse behaviors from limited demonstrations. This work introduces Extrinsic Behavioral Curiosity (EBC), which allows agents to receive additional curiosity rewards from an external critic based on how novel the behaviors are with respect to a large behavioral archive. To validate the effectiveness of EBC in exploring diverse locomotion behaviors, we evaluate our method on multiple robot locomotion tasks. EBC improves the performance of QD-IRL instances with GAIL, VAIL, and DiffAIL across all included environments by up to 185%, 42%, and 150%, even surpassing expert performance by 20% in Humanoid. Furthermore, we demonstrate that EBC is applicable to Gradient-Arborescence-based Quality Diversity Reinforcement Learning (QD-RL) algorithms, where it substantially improves performance and provides a generic technique for diverse robot locomotion. The source code of this work is provided at https://github.com/vanzll/EBC.
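The core of the curiosity bonus can be shown in isolation: an external critic scores a behaviour descriptor by its distance to the nearest entries of a behavioural archive, so novel behaviours earn larger rewards. The descriptors, archive contents, and kNN scoring rule below are illustrative stand-ins, not the paper's exact formulation:

```python
import numpy as np

# Curiosity reward = mean distance from a behaviour descriptor to its k
# nearest neighbours in the archive; far-from-archive behaviours score high.

def curiosity_reward(descriptor, archive, k=3):
    d = np.linalg.norm(archive - descriptor, axis=1)
    return np.sort(d)[:k].mean()

rng = np.random.default_rng(0)
archive = rng.normal(size=(200, 2))       # previously seen gait descriptors
seen = np.array([0.0, 0.0])               # close to the archive's bulk
novel = np.array([6.0, 6.0])              # far from anything archived

r_seen = curiosity_reward(seen, archive)
r_novel = curiosity_reward(novel, archive)
print(r_seen < r_novel)   # novel behaviour gets the larger bonus
```

Adding this bonus to the imitation reward is what pushes the learner to fill unexplored regions of the behaviour space rather than collapsing onto one expert-like policy.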

Updated: 2025-07-09 09:55:36


Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.06151v3

Autonomy by Design: Preserving Human Autonomy in AI Decision-Support

AI systems increasingly support human decision-making across domains of professional, skill-based, and personal activity. While previous work has examined how AI might affect human autonomy globally, the effects of AI on domain-specific autonomy -- the capacity for self-governed action within defined realms of skill or expertise -- remain understudied. We analyze how AI decision-support systems affect two key components of domain-specific autonomy: skilled competence (the ability to make informed judgments within one's domain) and authentic value-formation (the capacity to form genuine domain-relevant values and preferences). By engaging with prior investigations and analyzing empirical cases across medical, financial, and educational domains, we demonstrate how the absence of reliable failure indicators and the potential for unconscious value shifts can erode domain-specific autonomy both immediately and over time. We then develop a constructive framework for autonomy-preserving AI support systems. We propose specific socio-technical design patterns -- including careful role specification, implementation of defeater mechanisms, and support for reflective practice -- that can help maintain domain-specific autonomy while leveraging AI capabilities. This framework provides concrete guidance for developing AI systems that enhance rather than diminish human agency within specialized domains of action.

Updated: 2025-07-09 09:55:27


Domains: cs.HC,cs.AI,cs.LG,econ.GN,q-fin.EC

Download: http://arxiv.org/abs/2506.23952v3

Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement

Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors at large-scale: Unlike behavior cloning or offline reinforcement learning, IfO can leverage action-free demonstrations and thus circumvents the need for costly action-labeled demonstrations or reward functions. However, current IfO research focuses on idealized scenarios with mostly bimodal-quality data distributions, restricting the meaningfulness of the results. In contrast, this paper investigates more nuanced distributions and introduces a method to learn from such data, moving closer to a paradigm in which imitation learning can be performed iteratively via self-improvement. Our method adapts RL-based imitation learning to action-free demonstrations, using a value function to transfer information between expert and non-expert data. Through comprehensive evaluation, we delineate the relation between different data distributions and the applicability of algorithms and highlight the limitations of established methods. Our findings provide valuable insights for developing more robust and practical IfO techniques on a path to scalable behaviour learning.

Updated: 2025-07-09 09:55:23


Domains: cs.LG

Download: http://arxiv.org/abs/2507.06701v1

Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning

Autonomous mobile robots are increasingly used in pedestrian-rich environments where safe navigation and appropriate human interaction are crucial. While Deep Reinforcement Learning (DRL) enables socially integrated robot behavior, challenges persist in novel or perturbed scenarios, where it remains difficult to indicate when and why the policy is uncertain. Unknown uncertainty in decision-making can lead to collisions or human discomfort and is one reason why safe and risk-aware navigation is still an open problem. This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL navigation framework for policy distribution uncertainty estimates. We, therefore, incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For different types of perturbations, we compare the ability of deep ensembles and Monte-Carlo dropout (MC-dropout) to estimate the uncertainties of the policy. In uncertain decision-making situations, we propose to change the robot's social behavior to conservative collision avoidance. The results show improved training performance with ODV and dropout in PPO and reveal that the training scenario has an impact on generalization. In addition, MC-dropout is more sensitive to perturbations and better correlates the uncertainty type with the perturbation. With the safe action selection, the robot can navigate perturbed environments with fewer collisions.
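The MC-dropout side of the comparison reduces to a simple recipe: keep dropout active at inference time and read predictive uncertainty off repeated stochastic forward passes. The network below is a random two-layer stand-in, not the paper's policy network; only the mean/variance computation is the point:

```python
import numpy as np

# MC-dropout in miniature: T stochastic forward passes with dropout left
# ON at inference; the per-output variance is the uncertainty estimate.

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 2))

def forward(x, drop_rate=0.3):
    h = np.maximum(x @ W1, 0.0)                     # ReLU hidden layer
    mask = rng.random(h.shape) >= drop_rate         # dropout stays active
    h = h * mask / (1.0 - drop_rate)                # inverted dropout scaling
    return h @ W2

x = rng.normal(size=(1, 8))
samples = np.stack([forward(x) for _ in range(200)])   # T = 200 passes
mean, var = samples.mean(axis=0), samples.var(axis=0)
print(mean.shape, var.shape)   # (1, 2) predictive mean and variance
```

In the paper's setting, a variance spike on a perturbed observation is the trigger for switching the robot to conservative collision avoidance.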

Updated: 2025-07-09 09:52:36


Domains: cs.RO,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2409.10655v3

QUITE: A Query Rewrite System Beyond Rules with LLM Agents

Query rewrite transforms SQL queries into semantically equivalent forms that run more efficiently. Existing approaches mainly rely on predefined rewrite rules, but they handle a limited subset of queries and can cause performance regressions. This limitation stems from three challenges of rule-based query rewrite: (1) it is hard to discover and verify new rules, (2) fixed rewrite rules do not generalize to new query patterns, and (3) some rewrite techniques cannot be expressed as fixed rules. Motivated by the fact that human experts exhibit significantly better rewrite ability but suffer from scalability, and Large Language Models (LLMs) have demonstrated nearly human-level semantic and reasoning abilities, we propose a new approach of using LLMs to rewrite SQL queries beyond rules. Due to the hallucination problems in LLMs, directly applying LLMs often leads to nonequivalent and suboptimal queries. To address this issue, we propose QUITE (query rewrite), a training-free and feedback-aware system based on LLM agents that rewrites SQL queries into semantically equivalent forms with significantly better performance, covering a broader range of query patterns and rewrite strategies compared to rule-based methods. Firstly, we design a multi-agent framework controlled by a finite state machine (FSM) to equip LLMs with the ability to use external tools and enhance the rewrite process with real-time database feedback. Secondly, we develop a rewrite middleware to enhance the ability of LLMs to generate optimized query equivalents. Finally, we employ a novel hint injection technique to improve execution plans for rewritten queries. Extensive experiments show that QUITE reduces query execution time by up to 35.8% over state-of-the-art approaches and produces 24.1% more rewrites than prior methods, covering query cases that earlier systems did not handle.
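The FSM-controlled rewrite loop can be sketched with the LLM agent and the database stubbed out. The states, transitions, rewrite rule, and cost model below are invented for illustration; the real system's state machine, tools, and middleware differ:

```python
# Sketch of an FSM-driven rewrite loop: propose a rewrite, validate it
# against database feedback, and accept only verified-equivalent,
# cheaper candidates.

def llm_rewrite(sql):               # stand-in for the LLM agent
    return sql.replace("WHERE 1=1 AND", "WHERE")

def db_feedback(sql):               # stand-in for EXPLAIN-style feedback
    return {"equivalent": True, "cost": len(sql)}

def quite_fsm(sql, max_rounds=3):
    state, best = "REWRITE", sql
    for _ in range(max_rounds):
        if state == "REWRITE":
            candidate = llm_rewrite(best)
            state = "VALIDATE"
        elif state == "VALIDATE":
            fb = db_feedback(candidate)
            # keep only verified-equivalent, cheaper candidates
            if fb["equivalent"] and fb["cost"] <= db_feedback(best)["cost"]:
                best = candidate
                state = "DONE"
            else:
                state = "REWRITE"
        elif state == "DONE":
            break
    return state, best

state, rewritten = quite_fsm("SELECT * FROM t1 WHERE 1=1 AND x > 0")
print(state)
```

Gating acceptance on real database feedback is what protects against the hallucination problem the abstract describes: a fluent but non-equivalent rewrite never replaces the working query.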

Updated: 2025-07-09 09:51:35


Domains: cs.DB,cs.AI

Download: http://arxiv.org/abs/2506.07675v2

Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems

Online advertising in recommendation platforms has gained significant attention, with a predominant focus on channel recommendation and budget allocation strategies. However, current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios, primarily due to severe overestimation, distributional shifts, and overlooking budget constraints. To address these issues, we propose MTORL, a novel multi-task offline RL model that targets two key objectives. First, we establish a Markov Decision Process (MDP) framework specific to the nuances of advertising. Then, we develop a causal state encoder to capture dynamic user interests and temporal dependencies, facilitating offline RL through conditional sequence modeling. Causal attention mechanisms are introduced to enhance user sequence representations by identifying correlations among causal states. We employ multi-task learning to decode actions and rewards, simultaneously addressing channel recommendation and budget allocation. Notably, our framework includes an automated system for integrating these tasks into online advertising. Extensive experiments on offline and online environments demonstrate MTORL's superiority over state-of-the-art methods.

Updated: 2025-07-09 09:50:43


Domains: cs.IR,cs.LG

Download: http://arxiv.org/abs/2506.23090v2

An Optimisation Framework for Unsupervised Environment Design

For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One method for improving robustness is unsupervised environment design (UED), a suite of methods aiming to maximise an agent's generalisability across configurations of an environment. In this work, we study UED from an optimisation perspective, providing stronger theoretical guarantees for practical settings than prior work. Whereas previous methods offered guarantees only upon reaching convergence, our framework employs a nonconvex-strongly-concave objective for which we provide a provably convergent algorithm in the zero-sum setting. We empirically verify the efficacy of our method, outperforming prior methods in a number of environments with varying difficulties.
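The optimisation structure the abstract refers to can be seen on a toy objective: f(x, y) = (x^2 - 1)^2 + x*y - y^2 is nonconvex in the minimising variable x and strongly concave in the maximising variable y. Plain two-timescale gradient descent-ascent (not the paper's algorithm) finds a stationary point; the objective and step sizes are illustrative:

```python
import numpy as np

# min over x, max over y of f(x, y) = (x^2 - 1)^2 + x*y - y^2.
# Stationarity: y* = x/2 and 4x(x^2 - 1) + x/2 = 0, so x^2 = 0.875.

def grad(x, y):
    gx = 4 * x * (x**2 - 1) + y      # d f / d x
    gy = x - 2 * y                   # d f / d y
    return gx, gy

x, y = 1.5, 0.0
for _ in range(20_000):
    gx, gy = grad(x, y)
    x -= 0.01 * gx                   # slow descent on x
    y += 0.05 * gy                   # faster ascent on the concave player
gx, gy = grad(x, y)
print(abs(gx) + abs(gy))             # near zero at a stationary point
```

The faster ascent timescale keeps the inner maximisation well-tracked, which is the standard mechanism behind convergence results for nonconvex-strongly-concave min-max problems.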

Updated: 2025-07-09 09:50:34


Domains: cs.LG

Download: http://arxiv.org/abs/2505.20659v2

Heterogeneous Graph Neural Networks for Short-term State Forecasting in Power Systems across Domains and Time Scales: A Hydroelectric Power Plant Case Study

Accurate short-term state forecasting is essential for efficient and stable operation of modern power systems, especially in the context of increasing variability introduced by renewable and distributed energy resources. As these systems evolve rapidly, it becomes increasingly important to reliably predict their states in the short term to ensure operational stability, support control decisions, and enable interpretable monitoring of sensor and machine behavior. Modern power systems often span multiple physical domains - including electrical, mechanical, hydraulic, and thermal - posing significant challenges for modeling and prediction. Graph Neural Networks (GNNs) have emerged as a promising data-driven framework for system state estimation and state forecasting in such settings. By leveraging the topological structure of sensor networks, GNNs can implicitly learn inter-sensor relationships and propagate information across the network. However, most existing GNN-based methods are designed under the assumption of homogeneous sensor relationships and are typically constrained to a single physical domain. This limitation restricts their ability to integrate and reason over heterogeneous sensor data commonly encountered in real-world energy systems, such as those used in energy conversion infrastructure. In this work, we propose the use of Heterogeneous Graph Attention Networks to address these limitations. Our approach models both homogeneous intra-domain and heterogeneous inter-domain relationships among sensor data from two distinct physical domains - hydraulic and electrical - which exhibit fundamentally different temporal dynamics. Experimental results demonstrate that our method significantly outperforms conventional baselines on average by 35.5% in terms of normalized root mean square error, confirming its effectiveness in multi-domain, multi-rate power system state forecasting.

Updated: 2025-07-09 09:39:14


Domains: cs.LG,cs.SY,eess.SP,eess.SY

Download: http://arxiv.org/abs/2507.06694v1

A statistical approach to latent dynamic modeling with differential equations

Ordinary differential equations (ODEs) can provide mechanistic models of temporally local changes of processes, where parameters are often informed by external knowledge. While ODEs are popular in systems modeling, they are less established for statistical modeling of longitudinal cohort data, e.g., in a clinical setting. Yet, modeling of local changes could also be attractive for assessing the trajectory of an individual in a cohort in the immediate future given its current status, where ODE parameters could be informed by further characteristics of the individual. However, several hurdles so far limit such use of ODEs, as compared to regression-based function fitting approaches. The potentially higher level of noise in cohort data might be detrimental to ODEs, as the shape of the ODE solution heavily depends on the initial value. In addition, larger numbers of variables multiply such problems and might be difficult to handle for ODEs. To address this, we propose to use each observation in the course of time as the initial value to obtain multiple local ODE solutions and build a combined estimator of the underlying dynamics. Neural networks are used for obtaining a low-dimensional latent space for dynamic modeling from a potentially large number of variables, and for obtaining patient-specific ODE parameters from baseline variables. Simultaneous identification of dynamic models and of a latent space is enabled by recently developed differentiable programming techniques. We illustrate the proposed approach in an application with spinal muscular atrophy patients and a corresponding simulation study. In particular, modeling of local changes in health status at any point in time is contrasted to the interpretation of functions obtained from a global regression. This more generally highlights how different application settings might demand different modeling strategies.
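The multiple-initial-value idea can be shown in its simplest form: for dynamics xdot = -theta * x observed with noise, every observation can serve as the initial value of a short local solution, yielding one local estimate of theta per step, and the combined estimator averages them. The scalar dynamics and closed-form estimator are toy stand-ins for the paper's neural-network construction:

```python
import numpy as np

# Each observation x_obs[i] is treated as an initial value; the local
# solution x(t + dt) = x(t) * exp(-theta * dt) matched to the next point
# gives a local estimate of theta, and averaging combines them.

rng = np.random.default_rng(0)
theta, dt = 1.0, 0.1
t = np.arange(0, 2, dt)
x_true = 3.0 * np.exp(-theta * t)
x_obs = x_true + rng.normal(scale=0.01, size=t.size)   # noisy cohort data

ratios = x_obs[1:] / x_obs[:-1]
local_thetas = -np.log(np.clip(ratios, 1e-9, None)) / dt
combined = local_thetas.mean()
print(combined)
```

Because no single noisy observation has to anchor the whole trajectory, the combined estimator is far less sensitive to the initial-value problem the abstract identifies as the main hurdle for ODEs on noisy cohort data.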

Updated: 2025-07-09 09:34:24


Domains: stat.ME,cs.LG,stat.AP,stat.ML

Download: http://arxiv.org/abs/2311.16286v2

Hierarchical Procedural Framework for Low-latency Robot-Assisted Hand-Object Interaction

Advances in robotics have been driving the development of human-robot interaction (HRI) technologies. However, accurately perceiving human actions and achieving adaptive control remains a challenge in facilitating seamless coordination between human and robotic movements. In this paper, we propose a hierarchical procedural framework to enable dynamic robot-assisted hand-object interaction (HOI). An open-loop hierarchy leverages the RGB-based 3D reconstruction of the human hand, based on which motion primitives have been designed to translate hand motions into robotic actions. The low-level coordination hierarchy fine-tunes the robot's action by using the continuously updated 3D hand models. Experimental validation demonstrates the effectiveness of the hierarchical control architecture. The adaptive coordination between human and robot behavior has achieved a delay of $\leq 0.3$ seconds in the tele-interaction scenario. A case study of ring-wearing tasks indicates the potential application of this work in assistive technologies such as healthcare and manufacturing.

Updated: 2025-07-09 09:30:00


Domains: cs.RO,cs.LG

Download: http://arxiv.org/abs/2405.19531v3

Photometric Stereo using Gaussian Splatting and inverse rendering

Recent state-of-the-art algorithms in photometric stereo rely on neural networks and operate either through prior learning or inverse rendering optimization. Here, we revisit the problem of calibrated photometric stereo by leveraging recent advances in 3D inverse rendering using the Gaussian Splatting formalism. This allows us to parameterize the 3D scene to be reconstructed and optimize it in a more interpretable manner. Our approach incorporates a simplified model for light representation and demonstrates the potential of the Gaussian Splatting rendering engine for the photometric stereo problem.

Updated: 2025-07-09 09:22:24


Domains: eess.IV,cs.AI

Download: http://arxiv.org/abs/2507.06684v1

Class conditional conformal prediction for multiple inputs by p-value aggregation

Conformal prediction methods are statistical tools designed to quantify uncertainty and generate predictive sets with guaranteed coverage probabilities. This work introduces an innovative refinement to these methods for classification tasks, specifically tailored for scenarios where multiple observations (multi-inputs) of a single instance are available at prediction time. Our approach is particularly motivated by applications in citizen science, where multiple images of the same plant or animal are captured by individuals. Our method integrates the information from each observation into conformal prediction, enabling a reduction in the size of the predicted label set while preserving the required class-conditional coverage guarantee. The approach is based on the aggregation of conformal p-values computed from each observation of a multi-input. By exploiting the exact distribution of these p-values, we propose a general aggregation framework using an abstract scoring function, encompassing many classical statistical tools. Knowledge of this distribution also enables refined versions of standard strategies, such as majority voting. We evaluate our method on simulated and real data, with a particular focus on Pl@ntNet, a prominent citizen science platform that facilitates the collection and identification of plant species through user-submitted images.
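The aggregation step can be sketched with class-conditional conformal p-values and a simple mean over the multiple observations of one instance. The 1-D "distance to class prototype" score, the toy calibration data, and the naive mean-plus-threshold rule are illustrative simplifications; the paper's framework covers general scoring functions and validity-preserving aggregation rules:

```python
import numpy as np

# One conformal p-value per (observation, class), computed against that
# class's calibration scores, then averaged across the multi-input.

rng = np.random.default_rng(0)
classes = {"oak": 0.0, "pine": 5.0}     # 1-D class prototypes (toy)

# Class-conditional calibration: one held-out score set per class.
cal = {c: np.abs(rng.normal(mu, 1.0, size=200) - mu)
       for c, mu in classes.items()}

def p_value(score, cal_scores):
    # Standard conformal p-value against the class's calibration scores.
    return (1 + np.sum(cal_scores >= score)) / (len(cal_scores) + 1)

def predict_set(observations, alpha=0.1):
    keep = []
    for c, mu in classes.items():
        scores = np.abs(observations - mu)
        p_agg = np.mean([p_value(s, cal[c]) for s in scores])  # aggregate
        if p_agg > alpha:
            keep.append(c)
    return keep

obs = rng.normal(0.0, 1.0, size=4)      # four photos of the same plant
print(predict_set(obs))
```

With four consistent observations, the implausible class is rejected far more decisively than any single observation would allow, which is the mechanism behind the smaller prediction sets reported above.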

Updated: 2025-07-09 09:17:17


Domains: stat.ML,cs.LG,math.ST,stat.ME,stat.TH

Download: http://arxiv.org/abs/2507.07150v1

Fast Gaussian Processes under Monotonicity Constraints

Gaussian processes (GPs) are widely used as surrogate models for complicated functions in scientific and engineering applications. In many cases, prior knowledge about the function to be approximated, such as monotonicity, is available and can be leveraged to improve model fidelity. Incorporating such constraints into GP models enhances predictive accuracy and reduces uncertainty, but remains a computationally challenging task for high-dimensional problems. In this work, we present a novel virtual point-based framework for building constrained GP models under monotonicity constraints, based on regularized linear randomize-then-optimize (RLRTO), which enables efficient sampling from a constrained posterior distribution by means of solving randomized optimization problems. We also enhance two existing virtual point-based approaches by replacing Gibbs sampling with the No U-Turn Sampler (NUTS) for improved efficiency. A Python implementation of these methods is provided and can be easily applied to a wide range of problems. This implementation is then used to validate the approaches on approximating a range of synthetic functions, demonstrating comparable predictive performance between all considered methods and significant improvements in computational efficiency with the two NUTS methods and especially with the RLRTO method. The framework is further applied to construct surrogate models for systems of differential equations.

Updated: 2025-07-09 09:09:00


Domains: stat.ML,cs.LG,stat.ME,60G15

Download: http://arxiv.org/abs/2507.06677v1

Exploring State-Space-Model based Language Model in Music Generation

The recent surge in State Space Models (SSMs), particularly the emergence of Mamba, has established them as strong alternatives or complementary modules to Transformers across diverse domains. In this work, we aim to explore the potential of Mamba-based architectures for text-to-music generation. We adopt discrete tokens of Residual Vector Quantization (RVQ) as the modeling representation and empirically find that a single-layer codebook can capture semantic information in music. Motivated by this observation, we focus on modeling a single-codebook representation and adapt SiMBA, originally designed as a Mamba-based encoder, to function as a decoder for sequence modeling. We compare its performance against a standard Transformer-based decoder. Our results suggest that, under limited-resource settings, SiMBA achieves much faster convergence and generates outputs closer to the ground truth. This demonstrates the promise of SSMs for efficient and expressive text-to-music generation. Audio examples are provided on GitHub.

Updated: 2025-07-09 09:05:18

Subjects: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2507.06674v1

DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training

Recent advancements in on-device training for deep neural networks have underscored the critical need for efficient activation compression to overcome the memory constraints of mobile and edge devices. As activations dominate memory usage during training and are essential for gradient computation, compressing them without compromising accuracy remains a key research challenge. While existing methods for dynamic activation quantization promise theoretical memory savings, their practical deployment is impeded by system-level challenges such as computational overhead and memory fragmentation. To address these challenges, we introduce DAF, a Dynamic Activation Framework that enables scalable and efficient on-device training through system-level optimizations. DAF achieves both memory- and time-efficient dynamic quantization training by addressing key system bottlenecks. It develops hybrid reduction operations tailored to the memory hierarchies of mobile and edge SoCs, leverages collaborative CPU-GPU bit-packing for efficient dynamic quantization, and implements an importance-aware paging memory management scheme to reduce fragmentation and support dynamic memory adjustments. These optimizations collectively enable DAF to achieve substantial memory savings and speedup without compromising model training accuracy. Evaluations on various deep learning models across embedded and mobile platforms demonstrate up to a $22.9\times$ reduction in memory usage and a $3.2\times$ speedup, making DAF a scalable and practical solution for resource-constrained environments.

Updated: 2025-07-09 08:59:30

Subjects: cs.NI,cs.LG

Download: http://arxiv.org/abs/2507.07149v1

Probing and Steering Evaluation Awareness of Language Models

Language models can distinguish between testing and deployment phases -- a capability known as evaluation awareness. This has significant safety and policy implications, potentially undermining the reliability of evaluations that are central to AI governance frameworks and voluntary industry commitments. In this paper, we study evaluation awareness in Llama-3.3-70B-Instruct. We show that linear probes can separate real-world evaluation and deployment prompts, suggesting that current models internally represent this distinction. We also find that current safety evaluations are correctly classified by the probes, suggesting that they already appear artificial or inauthentic to models. Our findings underscore the importance of ensuring trustworthy evaluations and understanding deceptive capabilities. More broadly, our work showcases how model internals may be leveraged to support blackbox methods in safety audits, especially for future models more competent at evaluation awareness and deception.
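A linear probe of the kind described amounts to logistic regression on internal activations. The sketch below is a toy stand-in with synthetic "activations" (no actual Llama-3.3-70B internals involved); the two classes are separated along a hidden direction, mimicking an internally represented evaluation/deployment distinction:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32                                   # toy "activation" dimension
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)

# Synthetic activations: evaluation prompts shifted along one direction.
n = 400
X_eval = rng.standard_normal((n, d)) + 1.5 * direction
X_depl = rng.standard_normal((n, d)) - 1.5 * direction
X = np.vstack([X_eval, X_depl])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Linear probe: logistic regression trained by plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.1 * X.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = (((X @ w + b) > 0) == (y == 1)).mean()
```

If the probe separates the classes well above chance, the representation linearly encodes the distinction, which is the paper's core observation for real prompts.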

Updated: 2025-07-09 08:46:58

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.01786v2

Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models

This project introduces a new measure of elite polarization via actor and subject detection using artificial intelligence. I identify when politicians mention one another in parliamentary speeches, note who is speaking and who is being addressed, and assess the emotional temperature behind these evaluations. This maps how elites evaluate their various out-parties, allowing us to create an index of mutual out-party hostility, that is, elite polarization. While I analyzed polarization data over the past four decades for the UK, and two decades for Hungary and Italy, my approach lays the groundwork for a twenty-year, EU-wide time-series dataset on elite polarization. I obtain the results that can be aggregated by party and quarter. The resulting index demonstrates a good face validity: it reacts to events such as electoral campaigns, country- and party-level crises, and to parties losing and assuming power.

Updated: 2025-07-09 08:44:29

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.06658v1

Explainable Artificial Intelligence in Biomedical Image Analysis: A Comprehensive Survey

Explainable artificial intelligence (XAI) has become increasingly important in biomedical image analysis to promote transparency, trust, and clinical adoption of DL models. While several surveys have reviewed XAI techniques, they often lack a modality-aware perspective, overlook recent advances in multimodal and vision-language paradigms, and provide limited practical guidance. This survey addresses this gap through a comprehensive and structured synthesis of XAI methods tailored to biomedical image analysis. We systematically categorize XAI methods, analyzing their underlying principles, strengths, and limitations within biomedical contexts. A modality-centered taxonomy is proposed to align XAI methods with specific imaging types, highlighting the distinct interpretability challenges across modalities. We further examine the emerging role of multimodal learning and vision-language models in explainable biomedical AI, a topic largely underexplored in previous work. Our contributions also include a summary of widely used evaluation metrics and open-source frameworks, along with a critical discussion of persistent challenges and future directions. This survey offers a timely and in-depth foundation for advancing interpretable DL in biomedical image analysis.

Updated: 2025-07-09 08:42:14

Subjects: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.07148v1

Fuzzy Classification Aggregation for a Continuum of Agents

We prove that any optimal, independent, and zero unanimous fuzzy classification aggregation function of a continuum of individual classifications of $m\ge 3$ objects into $2\le p\le m$ types must be a weighted arithmetic mean.
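Concretely, with the continuum of agents discretized to n weights summing to one, the characterized aggregator is just a weighted arithmetic mean of the individual fuzzy classification matrices. A hypothetical finite-sample sketch (all sizes and distributions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 500, 4, 3     # agents, objects (m >= 3), types (2 <= p <= m)

# Each agent's fuzzy classification of the m objects: each row is a
# membership-degree vector over the p types, summing to 1.
C = rng.dirichlet(np.ones(p), size=(n, m))      # shape (n, m, p)

# Aggregation weights over agents (a discretized measure on the continuum).
w = rng.random(n)
w /= w.sum()

# The weighted arithmetic mean aggregator from the theorem.
agg = np.tensordot(w, C, axes=1)                # shape (m, p)
```

The weighted mean automatically yields a valid fuzzy classification: entries stay in [0, 1] and each object's membership degrees still sum to one.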

Updated: 2025-07-09 08:41:46

Subjects: cs.AI,econ.TH

Download: http://arxiv.org/abs/2507.05297v2

Enhancing Diffusion Model Stability for Image Restoration via Gradient Management

Diffusion models have shown remarkable promise for image restoration by leveraging powerful priors. Prominent methods typically frame the restoration problem within a Bayesian inference framework, which iteratively combines a denoising step with a likelihood guidance step. However, the interactions between these two components in the generation process remain underexplored. In this paper, we analyze the underlying gradient dynamics of these components and identify significant instabilities. Specifically, we demonstrate conflicts between the prior and likelihood gradient directions, alongside temporal fluctuations in the likelihood gradient itself. We show that these instabilities disrupt the generative process and compromise restoration performance. To address these issues, we propose Stabilized Progressive Gradient Diffusion (SPGD), a novel gradient management technique. SPGD integrates two synergistic components: (1) a progressive likelihood warm-up strategy to mitigate gradient conflicts; and (2) adaptive directional momentum (ADM) smoothing to reduce fluctuations in the likelihood gradient. Extensive experiments across diverse restoration tasks demonstrate that SPGD significantly enhances generation stability, leading to state-of-the-art performance in quantitative metrics and visually superior results. Code is available at \href{https://github.com/74587887/SPGD}{here}.
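In spirit (the actual SPGD update is more involved), the two components can be sketched as a ramped likelihood weight plus momentum-style smoothing of a noisy likelihood gradient; the schedule length and momentum coefficient below are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
g_true = 1.0
g_raw = g_true + 0.5 * rng.standard_normal(T)   # noisy likelihood gradients

# (1) Progressive likelihood warm-up: weight ramps from 0 to 1,
# letting the prior dominate early and avoiding gradient conflicts.
lam = np.minimum(np.arange(T) / 100.0, 1.0)

# (2) Momentum-style smoothing (a stand-in for ADM) to damp
# temporal fluctuations in the likelihood gradient.
beta, m = 0.9, 0.0
g_smooth = np.empty(T)
for t in range(T):
    m = beta * m + (1.0 - beta) * g_raw[t]
    g_smooth[t] = m

g_used = lam * g_smooth   # likelihood gradient actually applied at step t
```

The smoothed sequence tracks the same mean gradient with markedly lower variance, which is the stabilization effect the paper targets.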

Updated: 2025-07-09 08:40:46

Subjects: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.06656v1

MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval

Result diversification (RD) is a crucial technique in Text-to-Image Retrieval for enhancing the efficiency of a practical application. Conventional methods focus solely on increasing the diversity metric of image appearances. However, the diversity metric and its desired value vary depending on the application, which limits the applications of RD. This paper proposes a novel task called CDR-CA (Contextual Diversity Refinement of Composite Attributes). CDR-CA aims to refine the diversities of multiple attributes, according to the application's context. To address this task, we propose Multi-Source DPPs, a simple yet strong baseline that extends the Determinantal Point Process (DPP) to multi-sources. We model MS-DPP as a single DPP model with a unified similarity matrix based on a manifold representation. We also introduce Tangent Normalization to reflect contexts. Extensive experiments demonstrate the effectiveness of the proposed method. Our code is publicly available at https://github.com/NEC-N-SOGI/msdpp.
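Greedy MAP inference is a standard heuristic for selecting a diverse subset under a DPP; the sketch below uses a single toy similarity kernel rather than the paper's unified multi-source kernel with tangent normalization:

```python
import numpy as np

def greedy_dpp(K, k):
    """Greedily pick k items maximizing the DPP log-determinant."""
    n = K.shape[0]
    chosen = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            idx = chosen + [i]
            sign, logdet = np.linalg.slogdet(K[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best, best_val = i, logdet
        chosen.append(best)
    return chosen

rng = np.random.default_rng(4)
feats = rng.standard_normal((30, 8))            # toy item embeddings
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
K = feats @ feats.T + 1e-6 * np.eye(30)         # PSD similarity kernel

sel = greedy_dpp(K, 5)
```

Because the determinant of a Gram submatrix grows with the volume spanned by the selected embeddings, the greedy log-det objective naturally favors mutually dissimilar items.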

Updated: 2025-07-09 08:38:46

Subjects: cs.CV,cs.AI,cs.IR

Download: http://arxiv.org/abs/2507.06654v1

Federated Learning Inspired Fuzzy Systems: Decentralized Rule Updating for Privacy and Scalable Decision Making

Fuzzy systems allow machines, systems, and frameworks to deal with uncertainty, which is not possible in the binary logic most computers use. Such systems have already been deployed for certain use cases, and this paper proposes ways to improve them further, drawing inspiration from machine learning and federated learning. Machine learning, one of the major recent breakthroughs in technology, could be applied to fuzzy systems to improve the results they produce. Federated learning, another recent technology with great potential, improves machine learning training by reducing privacy risk, reducing the burden on networking infrastructure, and reducing the latency of obtaining the latest model. Ideas from federated learning could be applied to fuzzy systems, such as decentralized updating of the fuzzy rules that form a key part of a fuzzy system, so that it improves over time. This paper discusses how these improvements would be implemented in fuzzy systems and how they would improve such systems, and it also discusses certain limitations of the potential improvements. It concludes that the proposed ideas require further investigation to determine how large the improvements are, but the potential to improve fuzzy systems is there.

Updated: 2025-07-09 08:34:24

Subjects: cs.LG

Download: http://arxiv.org/abs/2507.06652v1

Deep Disentangled Representation Network for Treatment Effect Estimation

Estimating individual-level treatment effect from observational data is a fundamental problem in causal inference and has attracted increasing attention in the fields of education, healthcare, and public policy. In this work, we concentrate on the study of disentangled representation methods that have shown promising outcomes by decomposing observed covariates into instrumental, confounding, and adjustment factors. However, most of the previous work has primarily revolved around generative models or hard decomposition methods for covariates, which often struggle to guarantee the attainment of precisely disentangled factors. In order to effectively model different causal relationships, we propose a novel treatment effect estimation algorithm that incorporates a mixture of experts with multi-head attention and a linear orthogonal regularizer to softly decompose the pre-treatment variables, and simultaneously eliminates selection bias via importance sampling re-weighting techniques. We conduct extensive experiments on both public semi-synthetic and real-world production datasets. The experimental results clearly demonstrate that our algorithm outperforms the state-of-the-art methods focused on individual treatment effects.

Updated: 2025-07-09 08:29:37

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.06650v1

AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model

The Segment Anything Model (SAM) has demonstrated strong versatility across various visual tasks. However, its large storage requirements and high computational cost pose challenges for practical deployment. Post-training quantization (PTQ) has emerged as an effective strategy for efficient deployment, but we identify two key challenges in SAM that hinder the effectiveness of existing PTQ methods: the heavy-tailed and skewed distribution of post-GELU activations, and significant inter-channel variation in linear projection activations. To address these challenges, we propose AHCPTQ, an accurate and hardware-efficient PTQ method for SAM. AHCPTQ introduces hardware-compatible Hybrid Log-Uniform Quantization (HLUQ) to manage post-GELU activations, employing log2 quantization for dense small values and uniform quantization for sparse large values to enhance quantization resolution. Additionally, AHCPTQ incorporates Channel-Aware Grouping (CAG) to mitigate inter-channel variation by progressively clustering activation channels with similar distributions, enabling them to share quantization parameters and improving hardware efficiency. The combination of HLUQ and CAG not only enhances quantization effectiveness but also ensures compatibility with efficient hardware execution. For instance, under the W4A4 configuration on the SAM-L model, AHCPTQ achieves 36.6% mAP on instance segmentation with the DINO detector, while achieving a 7.89x speedup and 8.64x energy efficiency over its floating-point counterpart in FPGA implementation.
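The HLUQ idea, log2 quantization for the dense small values and uniform quantization for the sparse large ones, can be sketched as follows; the threshold, level count, and exponent range here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def hybrid_quantize(x, t=1.0, n_uniform=16, min_exp=-8):
    """Log2-quantize values with |x| <= t; uniformly quantize |x| > t."""
    out = np.empty_like(x)
    small = np.abs(x) <= t
    # Log2 branch: snap each magnitude to the nearest power of two,
    # giving fine resolution near zero where values are dense.
    mag = np.clip(np.abs(x[small]), 2.0**min_exp, t)
    exp = np.round(np.log2(mag))
    out[small] = np.sign(x[small]) * 2.0**exp
    # Uniform branch over (t, max]: coarse, but the values are sparse.
    large = ~small
    if large.any():
        step = (np.abs(x[large]).max() - t) / n_uniform
        q = t + np.round((np.abs(x[large]) - t) / step) * step
        out[large] = np.sign(x[large]) * q
    return out

rng = np.random.default_rng(5)
x = rng.laplace(scale=0.3, size=10000)   # heavy-tailed toy "activations"
xq = hybrid_quantize(x)
```

Matching the codebook shape to the heavy-tailed, skewed post-GELU distribution is what lets the hybrid scheme keep resolution where most of the mass sits.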

Updated: 2025-07-09 08:26:21

Subjects: cs.CV,cs.AR,cs.LG

Download: http://arxiv.org/abs/2503.03088v2

Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning

Closed-loop learning is the process of repeatedly estimating a model from data generated from the model itself. It is receiving great attention due to the possibility that large neural network models may, in the future, be primarily trained with data generated by artificial neural networks themselves. We study this process for models that belong to exponential families, deriving equations of motions that govern the dynamics of the parameters. We show that maximum likelihood estimation of the parameters endows sufficient statistics with the martingale property and that as a result the process converges to absorbing states that amplify initial biases present in the data. However, we show that this outcome may be prevented if the data contains at least one data point generated from a ground truth model, by relying on maximum a posteriori estimation or by introducing regularisation.
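A minimal member of the exponential family makes the dynamics concrete: repeatedly refitting a Bernoulli parameter by maximum likelihood on self-generated samples is a bounded martingale that drifts toward the absorbing states 0 and 1, while a MAP estimate under a Beta prior (one of the regularizations the abstract alludes to) stays in the interior. A toy simulation, with all sizes chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n, rounds = 50, 300
p_mle = p_map = 0.5

traj_mle, traj_map = [], []
for _ in range(rounds):
    # Each round: sample n points from the current model, then refit.
    k = rng.binomial(n, p_mle)
    p_mle = k / n                      # maximum likelihood: a martingale
    k = rng.binomial(n, p_map)
    p_map = (k + 1) / (n + 2)          # MAP with a Beta(2, 2) prior
    traj_mle.append(p_mle)
    traj_map.append(p_map)
```

Once the MLE trajectory touches 0 or 1 it never leaves (the absorbing states), whereas the prior keeps the MAP trajectory strictly inside (0, 1) at every round.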

Updated: 2025-07-09 08:24:09

Subjects: cs.LG,cond-mat.dis-nn,physics.data-an,stat.ML

Download: http://arxiv.org/abs/2506.20623v2

SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications

Resolution of complex SQL issues persists as a significant bottleneck in real-world database applications. Current Large Language Models (LLMs), while adept at text-to-SQL translation, have not been rigorously evaluated on the more challenging task of debugging SQL issues. To address this gap, we introduce BIRD-CRITIC, a new SQL issue debugging benchmark comprising 530 PostgreSQL tasks (BIRD-CRITIC-PG) and 570 multi-dialect tasks (BIRD-CRITIC-Multi), distilled from authentic user issues and replayed within new environments to facilitate rigorous evaluation. Baseline evaluations underscore the task's complexity, with the leading reasoning model O3-Mini achieving only 38.87% success rate on BIRD-CRITIC-PG and 33.33% on BIRD-CRITIC-Multi. Meanwhile, advancing open-source models for database tasks is crucial for empowering local development while safeguarding data privacy. Therefore, we present Six-Gym (Sql-fIX-Gym), a training environment for elevating open-source model capabilities for SQL issue debugging. This environment leverages SQL-Rewind strategy, which automatically generates executable issue-solution datasets by reverse-engineering issues from verified SQLs. However, popular trajectory-based fine-tuning methods do not explore substantial supervisory signals. We further propose f-Plan Boosting, which extracts high-level debugging plans from SQL solutions, enabling teacher LLMs to produce 73.7% more successful trajectories for training. We integrate these components into an open-source agent, Bird-Fixer. Based on Qwen-2.5-Coder-14B, Bird-Fixer achieves 38.11% success rate on BIRD-CRITIC-PG and 29.65% on BIRD-CRITIC-Multi, surpassing leading proprietary models such as Claude-3.7-Sonnet and GPT-4.1, marking a significant step toward democratizing sophisticated SQL-debugging capabilities. The leaderboard and source code are available: https://bird-critic.github.io/

Updated: 2025-07-09 08:22:28

Subjects: cs.DB,cs.AI

Download: http://arxiv.org/abs/2506.18951v2

Learning from Sparse Point Labels for Dense Carcinosis Localization in Advanced Ovarian Cancer Assessment

Learning from sparse labels is a challenge commonplace in the medical domain. This is due to numerous factors, such as annotation cost, and is especially true for newly introduced tasks. When dense pixel-level annotations are needed, this becomes even more unfeasible. However, being able to learn from just a few annotations at the pixel-level, while extremely difficult and underutilized, can drive progress in studies where perfect annotations are not immediately available. This work tackles the challenge of learning the dense prediction task of keypoint localization from a few point annotations in the context of 2d carcinosis keypoint localization from laparoscopic video frames for diagnostic planning of advanced ovarian cancer patients. To enable this, we formulate the problem as a sparse heatmap regression from a few point annotations per image and propose a new loss function, called Crag and Tail loss, for efficient learning. Our proposed loss function effectively leverages positive sparse labels while minimizing the impact of false negatives or missed annotations. Through an extensive ablation study, we demonstrate the effectiveness of our approach in achieving accurate dense localization of carcinosis keypoints, highlighting its potential to advance research in scenarios where dense annotations are challenging to obtain.
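The abstract does not spell out the Crag and Tail loss, so the following is only one plausible toy reading of a sparse-heatmap loss in that spirit: fit Gaussian peaks near the few annotated points, and only mildly penalize confident background responses so that unlabeled true keypoints (missed annotations) are not punished hard. All names, weights, and thresholds are hypothetical:

```python
import numpy as np

def sparse_heatmap_loss(pred, points, sigma=2.0, bg_weight=0.05, margin=0.3):
    """Toy sparse-heatmap loss: MSE against Gaussian peaks near labeled
    points, plus a down-weighted hinge on confident background responses."""
    h, w = pred.shape
    yy, xx = np.mgrid[0:h, 0:w]
    target = np.zeros_like(pred)
    near = np.zeros(pred.shape, dtype=bool)
    for (py, px) in points:
        g = np.exp(-((yy - py) ** 2 + (xx - px) ** 2) / (2 * sigma**2))
        target = np.maximum(target, g)
        near |= g > 0.01
    pos = np.mean((pred[near] - target[near]) ** 2)       # fit the peaks
    neg = np.mean(np.maximum(pred[~near] - margin, 0.0) ** 2)  # soft background
    return pos + bg_weight * neg

pred = np.zeros((32, 32))
pred[10, 10] = 1.0                         # model fires at one location
loss_hit = sparse_heatmap_loss(pred, [(10, 10)])   # annotation matches
loss_miss = sparse_heatmap_loss(pred, [(25, 25)])  # annotation elsewhere
```

A prediction agreeing with the sparse annotation scores a lower loss than one that misses it, while an unannotated response is only weakly penalized, the asymmetry the abstract describes for handling false negatives.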

Updated: 2025-07-09 08:14:46

Subjects: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.06643v1

EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision

In digital pathology, whole-slide images (WSIs) are often difficult to handle due to their gigapixel scale, so most approaches train patch encoders via self-supervised learning (SSL) and then aggregate the patch-level embeddings via multiple instance learning (MIL) or slide encoders for downstream tasks. However, patch-level SSL may overlook complex domain-specific features that are essential for biomarker prediction, such as mutation status and molecular characteristics, as SSL methods rely only on basic augmentations selected for natural image domains, applied over small patch-level areas. Moreover, SSL methods remain less data efficient than fully supervised approaches, requiring extensive computational resources and datasets to achieve competitive performance. To address these limitations, we present EXAONE Path 2.0, a pathology foundation model that learns patch-level representations under direct slide-level supervision. Using only 37k WSIs for training, EXAONE Path 2.0 achieves state-of-the-art average performance across 10 biomarker prediction tasks, demonstrating remarkable data efficiency.

Updated: 2025-07-09 08:09:05

Subjects: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.06639v1

Semi-parametric Functional Classification via Path Signatures Logistic Regression

We propose Path Signatures Logistic Regression (PSLR), a semi-parametric framework for classifying vector-valued functional data with scalar covariates. Classical functional logistic regression models rely on linear assumptions and fixed basis expansions, which limit flexibility and degrade performance under irregular sampling. PSLR overcomes these issues by leveraging truncated path signatures to construct a finite-dimensional, basis-free representation that captures nonlinear and cross-channel dependencies. By embedding trajectories as time-augmented paths, PSLR extracts stable, geometry-aware features that are robust to sampling irregularity without requiring a common time grid, while still preserving subject-specific timing patterns. We establish theoretical guarantees for the existence and consistent estimation of the optimal truncation order, along with non-asymptotic risk bounds. Experiments on synthetic and real-world datasets show that PSLR outperforms traditional functional classifiers in accuracy, robustness, and interpretability, particularly under non-uniform sampling schemes. Our results highlight the practical and theoretical benefits of integrating rough path theory into modern functional data analysis.
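A truncated (depth-2) signature of a piecewise-linear path is cheap to compute; the sketch below omits the time augmentation used in PSLR for brevity. For a straight-line path with total increment a, the level-2 term must equal outer(a, a)/2, which gives a handy correctness check:

```python
import numpy as np

def signature_level2(path):
    """Truncated (depth-2) signature of a piecewise-linear path.
    Returns (level-1 increment vector, level-2 iterated-integral matrix)."""
    inc = np.diff(path, axis=0)                      # segment increments
    s1 = inc.sum(axis=0)                             # level 1: total increment
    # Running displacement X_{k-1} - X_0 at the start of each segment.
    run = np.vstack([np.zeros(path.shape[1]), np.cumsum(inc, axis=0)[:-1]])
    s2 = np.zeros((path.shape[1], path.shape[1]))
    for d, r in zip(inc, run):
        # Exact iterated integral over a linear segment.
        s2 += np.outer(r, d) + 0.5 * np.outer(d, d)
    return s1, s2

# Straight-line path: level-2 signature must equal outer(a, a) / 2.
a = np.array([2.0, -1.0])
line = np.linspace([0.0, 0.0], a, 50)
s1, s2 = signature_level2(line)
```

The level-2 matrix already captures cross-channel structure (its antisymmetric part is the Lévy area), and no common time grid is needed, which is the property PSLR exploits under irregular sampling.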

Updated: 2025-07-09 08:06:50

Subjects: stat.ML,cs.LG

Download: http://arxiv.org/abs/2507.06637v1

PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER)

The use of Natural Language Processing (NLP) in high-stakes AI-based applications has increased significantly in recent years, especially since the emergence of Large Language Models (LLMs). However, despite their strong performance, LLMs introduce important legal/ethical concerns, particularly regarding privacy, data protection, and transparency. Due to these concerns, this work explores the use of Named-Entity Recognition (NER) to facilitate the privacy-preserving training (or adaptation) of LLMs. We propose a framework that uses NER technologies to anonymize sensitive information in text data, such as personal identities or geographic locations. An evaluation of the proposed privacy-preserving learning framework was conducted to measure its impact on user privacy and system performance in a particular high-stakes and sensitive setup: AI-based resume scoring for recruitment processes. The study involved two language models (BERT and RoBERTa) and six anonymization algorithms (based on Presidio, FLAIR, BERT, and different versions of GPT) applied to a database of 24,000 candidate profiles. The findings indicate that the proposed privacy preservation techniques effectively maintain system performance while playing a critical role in safeguarding candidate confidentiality, thus promoting trust in the experimented scenario. On top of the proposed privacy-preserving approach, we also experiment with applying an existing approach that reduces the gender bias in LLMs, thus finally obtaining our proposed Privacy- and Bias-aware LLMs (PBa-LLMs). Note that the proposed PBa-LLMs have been evaluated in a particular setup (resume scoring), but are generally applicable to any other LLM-based AI application.
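The anonymization step can be pictured as replacing detected entity spans with category placeholders. The sketch below uses hard-coded toy gazetteers purely for illustration; the paper's pipelines use actual NER systems (Presidio, FLAIR, BERT, GPT variants) to find the spans:

```python
import re

# Hypothetical mini-gazetteers standing in for NER model output.
NAMES = {"Alice Johnson", "Bob Smith"}
LOCATIONS = {"Madrid", "Boston"}

def anonymize(text):
    """Replace each detected entity span with a category placeholder."""
    for name in NAMES:
        text = re.sub(re.escape(name), "[PERSON]", text)
    for loc in LOCATIONS:
        text = re.sub(re.escape(loc), "[LOCATION]", text)
    return text

resume = "Alice Johnson, software engineer based in Madrid, led a team in Boston."
clean = anonymize(resume)
```

Downstream training (e.g., resume scoring with BERT or RoBERTa) then sees only the placeholder tokens, so identity cues never enter the model.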

Updated: 2025-07-09 08:02:08

Subjects: cs.CL,cs.AI,cs.CR,cs.LG

Download: http://arxiv.org/abs/2507.02966v2

Prevention of Overfitting on Mesh-Structured Data Regressions with a Modified Laplace Operator

This document reports on a method for detecting and preventing overfitting on data regressions, herein applied to mesh-like data structures. The mesh structure allows for the straightforward computation of the Laplace-operator second-order derivatives in a finite-difference fashion for noiseless data. Derivatives of the training data are computed on the original training mesh to serve as a true label of the entropy of the training data. Derivatives of the trained data are computed on a staggered mesh to identify oscillations in the interior of the original training mesh cells. The loss of the Laplace-operator derivatives is used for hyperparameter optimisation, achieving a reduction of unwanted oscillation through the minimisation of the entropy of the trained model. In this setup, testing does not require the splitting of points from the training data, and training is thus directly performed on all available training points. The Laplace operator applied to the trained data on a staggered mesh serves as a surrogate testing metric based on diffusion properties.
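In one dimension the scheme reduces to comparing second differences on a staggered (midpoint) mesh: an oscillatory fit that passes near the training points still betrays itself through a much larger Laplace-operator energy. A toy sketch with an assumed smooth target:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 41)          # original training mesh
h = x[1] - x[0]
truth = np.sin(np.pi * x)              # smooth "training data"

# Two candidate trained models evaluated on the staggered mesh (midpoints),
# where oscillations inside the original cells become visible.
xm = 0.5 * (x[:-1] + x[1:])
fit_good = np.sin(np.pi * xm)                                   # smooth fit
fit_bad = np.sin(np.pi * xm) + 0.05 * np.sin(40 * np.pi * xm)   # overfit wiggle

def laplace_energy(f, h):
    """Mean |finite-difference Laplacian|: a surrogate entropy of the fit."""
    return np.mean(np.abs(f[2:] - 2.0 * f[1:-1] + f[:-2])) / h**2

e_good = laplace_energy(fit_good, h)
e_bad = laplace_energy(fit_bad, h)
```

Even though the wiggle has amplitude only 0.05, its second differences dwarf those of the smooth fit, so minimizing this staggered-mesh Laplacian loss during hyperparameter search suppresses the unwanted oscillation without holding out test points.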

Updated: 2025-07-09 07:57:52

Subjects: cs.LG

Download: http://arxiv.org/abs/2507.06631v1

Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation

Recent advances in pre-trained Vision Language Models (VLM) have shown promising potential for effectively adapting to downstream tasks through prompt learning, without the need for additional annotated paired datasets. To supplement the text information in VLM trained on correlations with vision data, new approaches leveraging Large Language Models (LLM) in prompts have been proposed, enhancing robustness to unseen and diverse data. Existing methods typically extract text-based responses (i.e., descriptions) from LLM to incorporate into prompts; however, this approach suffers from high variability and low reliability. In this work, we propose Description-free Multi-prompt Learning(DeMul), a novel method that eliminates the process of extracting descriptions and instead directly distills knowledge from LLM into prompts. By adopting a description-free approach, prompts can encapsulate richer semantics while still being represented as continuous vectors for optimization, thereby eliminating the need for discrete pre-defined templates. Additionally, in a multi-prompt setting, we empirically demonstrate the potential of prompt weighting in reflecting the importance of different prompts during training. Experimental results show that our approach achieves superior performance across 11 recognition datasets.

Updated: 2025-07-09 07:55:25

标题: 带权多提示学习与无描述大型语言模型蒸馏

摘要: 最近对预训练的视觉语言模型(VLM)的研究取得了进展,显示出通过提示学习有效地适应下游任务的潜力,无需额外的注释配对数据集。为了补充VLM中基于与视觉数据的相关性训练的文本信息,提出了一种利用大型语言模型(LLM)在提示中的新方法,增强对未知和多样化数据的鲁棒性。现有方法通常从LLM中提取基于文本的响应(即描述)并将其合并到提示中;然而,这种方法存在高度的可变性和低可靠性。在这项工作中,我们提出了一种新颖的方法,即无描述多提示学习(DeMul),它消除了提取描述的过程,而是直接从LLM中提炼知识到提示中。通过采用无描述的方法,提示可以包含更丰富的语义,同时仍然表示为连续向量以进行优化,从而消除了对离散预定义模板的需求。此外,在多提示设置中,我们实证证明了提示加权在训练过程中反映不同提示重要性的潜力。实验结果表明,我们的方法在11个识别数据集上实现了卓越的性能。

更新时间: 2025-07-09 07:55:25

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.07147v1

An attention-aware GNN-based input defender against multi-turn jailbreak on LLMs

Large Language Models (LLMs) have gained widespread popularity and are increasingly integrated into various applications. However, their capabilities can be exploited for both benign and harmful purposes. Despite rigorous training and fine-tuning for safety, LLMs remain vulnerable to jailbreak attacks. Recently, multi-turn attacks have emerged, exacerbating the issue. Unlike single-turn attacks, multi-turn attacks gradually escalate the dialogue, making them more difficult to detect and mitigate, even after they are identified. In this study, we propose G-Guard, an innovative attention-aware GNN-based input classifier designed to defend against multi-turn jailbreak attacks on LLMs. G-Guard constructs an entity graph for multi-turn queries, explicitly capturing relationships between harmful keywords and queries even when those keywords appear only in previous queries. Additionally, we introduce an attention-aware augmentation mechanism that retrieves the most similar single-turn query based on the multi-turn conversation. This retrieved query is treated as a labeled node in the graph, enhancing the ability of GNN to classify whether the current query is harmful. Evaluation results demonstrate that G-Guard outperforms all baselines across all datasets and evaluation metrics.
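
A toy version of the entity-graph construction, with invented turns, an invented keyword list, and a naive substring matcher; it only illustrates how a keyword node keeps early and late turns connected, not G-Guard's actual graph.

```python
# Toy entity graph over a multi-turn conversation: query nodes connect to
# keyword nodes, so a sensitive keyword introduced in an early turn stays
# linked to later turns even when the final turn never repeats it.
turns = [
    "how do locks work",
    "what tools open a lock",
    "apply that to my neighbour's door",
]
keywords = {"lock", "tools", "door"}

nodes = [("query", i) for i in range(len(turns))] + \
        [("keyword", k) for k in keywords]
edges = [(("keyword", kw), ("query", i))
         for kw in keywords
         for i, q in enumerate(turns)
         if kw in q]                      # naive substring match

# "lock" bridges turns 0 and 1 even though turn 2 never mentions it; a GNN
# over this graph can propagate that signal to the final query node.
```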

Updated: 2025-07-09 07:55:03

标题: 一种针对LLMs多轮越狱的基于关注力的GNN输入防御者

摘要: 大型语言模型(LLMs)已经广受欢迎,并越来越多地集成到各种应用程序中。然而,它们的能力可以被用于善良和有害目的。尽管经过严格的训练和调优以确保安全,LLMs仍然容易受到越狱攻击。最近,多轮攻击已经出现,加剧了这个问题。与单轮攻击不同,多轮攻击逐渐升级对话,使其更难以检测和缓解,即使在被识别后也是如此。 在这项研究中,我们提出了G-Guard,一种创新的基于GNN的注意力感知输入分类器,旨在防御LLMs上的多轮越狱攻击。G-Guard为多轮查询构建一个实体图,明确捕捉有害关键词和查询之间的关系,即使这些关键词仅出现在先前的查询中。此外,我们引入了一个注意力感知增强机制,根据多轮对话检索出最相似的单轮查询。这个检索到的查询被视为图中的一个标记节点,增强了GNN分类当前查询是否有害的能力。评估结果表明,G-Guard在所有数据集和评估指标上均优于所有基线。

更新时间: 2025-07-09 07:55:03

领域: cs.LG

下载: http://arxiv.org/abs/2507.07146v1

Multi-objective methods in Federated Learning: A survey and taxonomy

The Federated Learning paradigm facilitates effective distributed machine learning in settings where training data is decentralized across multiple clients. As the popularity of the strategy grows, increasingly complex real-world problems emerge, many of which require balancing conflicting demands such as fairness, utility, and resource consumption. Recent works have begun to recognise the use of a multi-objective perspective in answer to this challenge. However, this novel approach of combining federated methods with multi-objective optimisation has never been discussed in the broader context of both fields. In this work, we offer a first clear and systematic overview of the different ways the two fields can be integrated. We propose a first taxonomy on the use of multi-objective methods in connection with Federated Learning, providing a targeted survey of the state-of-the-art and proposing unambiguous labels to categorise contributions. Given the developing nature of this field, our taxonomy is designed to provide a solid basis for further research, capturing existing works while anticipating future additions. Finally, we outline open challenges and possible directions for further research.

Updated: 2025-07-09 07:55:00

标题: 《联邦学习中的多目标方法:调查与分类》

摘要: 联邦学习范式促进了在训练数据分散在多个客户端的设置中进行有效的分布式机器学习。随着这一策略的普及,越来越复杂的现实世界问题出现,其中许多需要平衡冲突的需求,如公平性、效用和资源消耗。最近的研究已经开始认识到在回应这一挑战时使用多目标视角。然而,将联邦方法与多目标优化相结合的这种新颖方法从未在两个领域的更广泛背景下讨论过。在这项工作中,我们首次清晰系统地概述了两个领域可以整合的不同方式。我们提出了一个关于在联邦学习中使用多目标方法的分类法,提供了针对现有技术的有针对性调查,并提出了明确的标签来对贡献进行分类。鉴于这一领域的发展性质,我们的分类法旨在为进一步研究提供坚实的基础,捕捉现有作品并预测未来的补充。最后,我们概述了开放挑战和进一步研究的可能方向。

更新时间: 2025-07-09 07:55:00

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2502.03108v2

Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning

Offline multi-task reinforcement learning aims to learn a unified policy capable of solving multiple tasks using only pre-collected task-mixed datasets, without requiring any online interaction with the environment. However, it faces significant challenges in effectively sharing knowledge across tasks. Inspired by the efficient knowledge abstraction observed in human learning, we propose Goal-Oriented Skill Abstraction (GO-Skill), a novel approach designed to extract and utilize reusable skills to enhance knowledge transfer and task performance. Our approach uncovers reusable skills through a goal-oriented skill extraction process and leverages vector quantization to construct a discrete skill library. To mitigate class imbalances between broadly applicable and task-specific skills, we introduce a skill enhancement phase to refine the extracted skills. Furthermore, we integrate these skills using hierarchical policy learning, enabling the construction of a high-level policy that dynamically orchestrates discrete skills to accomplish specific tasks. Extensive experiments on diverse robotic manipulation tasks within the MetaWorld benchmark demonstrate the effectiveness and versatility of GO-Skill.
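
The discrete skill library can be sketched as a vector-quantization codebook with nearest-neighbour lookup; the codebook values and perturbed query below are contrived so the result is deterministic.

```python
import numpy as np

# A discrete skill library as a vector-quantization codebook; the contrived
# codebook values keep this sketch's lookup deterministic.
codebook = np.arange(32.0).reshape(8, 4)   # 8 skill codes of dimension 4

def quantize(z):
    """Map a continuous skill embedding to its nearest library entry."""
    idx = int(np.argmin(((codebook - z) ** 2).sum(axis=1)))
    return idx, codebook[idx]

z = codebook[3] + 0.1                      # slightly perturbed skill 3
idx, code = quantize(z)
```

A high-level policy would then emit such indices, with each index expanded into low-level actions by the corresponding skill.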

Updated: 2025-07-09 07:54:49

标题: 目标导向的技能抽象用于离线多任务强化学习

摘要: 离线多任务强化学习旨在学习一个统一的策略,能够解决多个任务,仅使用预先收集的任务混合数据集,而不需要与环境进行任何在线交互。然而,它在有效地跨任务共享知识方面面临重大挑战。受到人类学习中高效知识抽象的启发,我们提出了面向目标的技能抽象(GO-Skill),这是一种旨在提取和利用可重复使用技能以增强知识转移和任务性能的新方法。我们的方法通过面向目标的技能提取过程揭示可重复使用的技能,并利用向量量化构建离散技能库。为了缓解广泛适用和特定任务技能之间的类别不平衡,我们引入了一个技能增强阶段来完善提取的技能。此外,我们使用分层策略学习来整合这些技能,使得构建一个能够动态协调离散技能以完成特定任务的高级策略成为可能。在MetaWorld基准测试中对各种机器人操纵任务进行了大量实验,证明了GO-Skill的有效性和多功能性。

更新时间: 2025-07-09 07:54:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.06628v1

Q-STAC: Q-Guided Stein Variational Model Predictive Actor-Critic

Deep reinforcement learning has shown remarkable success in continuous control tasks, yet often requires extensive training data, struggles with complex, long-horizon planning, and fails to maintain safety constraints during operation. Meanwhile, Model Predictive Control (MPC) offers explainability and constraint satisfaction, but typically yields only locally optimal solutions and demands careful cost function design. This paper introduces the Q-guided STein variational model predictive Actor-Critic (Q-STAC), a novel framework that bridges these approaches by integrating Bayesian MPC with actor-critic reinforcement learning through constrained Stein Variational Gradient Descent (SVGD). Our method optimizes control sequences directly using learned Q-values as objectives, eliminating the need for explicit cost function design while leveraging known system dynamics to enhance sample efficiency and ensure control signals remain within safe boundaries. Extensive experiments on 2D navigation and robotic manipulation tasks demonstrate that Q-STAC achieves superior sample efficiency, robustness, and optimality compared to state-of-the-art algorithms, while maintaining the high expressiveness of policy distributions. Experiment videos are available on our website: https://sites.google.com/view/q-stac
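
A bare-bones sketch of one constrained SVGD update, assuming an RBF kernel and a box projection as a stand-in for the safety constraint; the standard-normal toy target plays the role of the learned Q-value objective.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    """RBF kernel matrix and its gradients w.r.t. the first argument."""
    diff = X[:, None, :] - X[None, :, :]
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))
    gradK = -diff / h ** 2 * K[:, :, None]  # gradK[j, i] = d k(x_j, x_i) / d x_j
    return K, gradK

def svgd_step(X, grad_logp, eps=0.1, lo=-2.0, hi=2.0):
    """One Stein variational gradient step, then projection onto the safe
    box [lo, hi] as a stand-in for the constraint handling."""
    K, gradK = rbf_kernel(X)
    phi = (K @ grad_logp + gradK.sum(axis=0)) / X.shape[0]
    return np.clip(X + eps * phi, lo, hi)

# Toy target: standard-normal "posterior" over control sequences, so
# grad log p(x) = -x and the particles contract toward the origin.
rng = np.random.default_rng(0)
X = rng.normal(scale=3.0, size=(50, 2))
for _ in range(200):
    X = svgd_step(X, -X)
```

The kernel-gradient term keeps the particles spread out, so the result is a distribution over control sequences rather than a single local optimum.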

Updated: 2025-07-09 07:53:53

标题: Q-STAC:Q引导的斯坦变分模型预测演员-评论家

摘要: 深度强化学习在连续控制任务中取得了显著的成功,但通常需要大量的训练数据,在复杂、长期规划中遇到困难,并且在操作过程中无法保持安全约束。与此同时,模型预测控制(MPC)提供了可解释性和约束满足性,但通常只产生局部最优解,并需要仔细设计成本函数。本文介绍了Q引导的STein变分模型预测Actor-Critic(Q-STAC),这是一个通过受约束的Stein变分梯度下降(SVGD)将贝叶斯MPC与Actor-Critic强化学习结合起来的新框架。我们的方法直接使用学习的Q值作为目标优化控制序列,消除了对显式成本函数设计的需求,同时利用已知的系统动态来增强样本效率并确保控制信号保持在安全边界内。在2D导航和机器人操作任务上进行的大量实验表明,与最先进的算法相比,Q-STAC在样本效率、鲁棒性和最优性方面表现出色,同时保持了策略分布的高表现力。实验视频可在我们的网站上查看:https://sites.google.com/view/q-stac

更新时间: 2025-07-09 07:53:53

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.06625v1

UniOD: A Universal Model for Outlier Detection across Diverse Domains

Outlier detection (OD) seeks to distinguish inliers and outliers in completely unlabeled datasets and plays a vital role in science and engineering. Most existing OD methods require troublesome dataset-specific hyperparameter tuning and costly model training before they can be deployed to identify outliers. In this work, we propose UniOD, a universal OD framework that leverages labeled datasets to train a single model capable of detecting outliers of datasets from diverse domains. Specifically, UniOD converts each dataset into multiple graphs, produces consistent node features, and frames outlier detection as a node-classification task, and is able to generalize to unseen domains. As a result, UniOD avoids effort on model selection and hyperparameter tuning, reduces computational cost, and effectively utilizes the knowledge from historical datasets, which improves the convenience and accuracy in real applications. We evaluate UniOD on 15 benchmark OD datasets against 15 state-of-the-art baselines, demonstrating its effectiveness.

Updated: 2025-07-09 07:52:12

标题: UniOD:跨多领域的异常检测通用模型

摘要: 异常检测(OD)旨在区分完全未标记的数据集中的内围点和异常值,并在科学和工程领域中发挥着重要作用。大多数现有的OD方法在能够部署以识别异常值之前,需要繁琐的特定于数据集的超参数调整和昂贵的模型训练。在本文中,我们提出了UniOD,一个通用的OD框架,利用带标签的数据集训练单个模型,能够检测来自不同领域的数据集中的异常值。具体来说,UniOD将每个数据集转换为多个图形,生成一致的节点特征,并将异常检测视为一个节点分类任务,并能够推广到未知领域。因此,UniOD避免了模型选择和超参数调整的工作,减少了计算成本,并有效利用了历史数据集的知识,提高了在实际应用中的便利性和准确性。我们针对15个基准OD数据集对UniOD进行评估,与15个最先进的基线方法进行比较,证明了其有效性。

更新时间: 2025-07-09 07:52:12

领域: cs.LG

下载: http://arxiv.org/abs/2507.06624v1

Expediting data extraction using a large language model (LLM) and scoping review protocol: a methodological study within a complex scoping review

The data extraction stages of reviews are resource-intensive, and researchers may seek to expedite data extraction using online large language models (LLMs) and review protocols. Claude 3.5 Sonnet was used to trial two approaches that used a review protocol to prompt data extraction from 10 evidence sources included in a case study scoping review. A protocol-based approach was also used to review extracted data. Limited performance evaluation was undertaken, which found high accuracy for the two extraction approaches (83.3% and 100%) when extracting simple, well-defined citation details; accuracy was lower (9.6% and 15.8%) when extracting more complex, subjective data items. Considering all data items, both approaches had precision >90% but low recall (<25%) and F1 scores (<40%). The context of a complex scoping review, open response types and methodological approach likely impacted performance due to missed and misattributed data. LLM feedback considered the baseline extraction accurate and suggested minor amendments: four of 15 (26.7%) to citation details and 8 of 38 (21.1%) to key findings data items were considered to potentially add value. However, when repeating the process with a dataset featuring deliberate errors, only 2 of 39 (5%) errors were detected. Review-protocol-based methods used for expediency require more robust performance evaluation across a range of LLMs and review contexts with comparison to conventional prompt engineering approaches. We recommend researchers evaluate and report LLM performance if using them similarly to conduct data extraction or review extracted data. LLM feedback contributed to protocol adaptation and may assist future review protocol drafting.
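
The reported precision/recall pattern is easy to reproduce arithmetically; the counts below are invented to match the shape of the result (precision > 90%, recall < 25%, F1 < 40%), not taken from the study.

```python
# Arithmetic behind precise-but-incomplete extraction: few spurious items
# (high precision) but many missed items (low recall) drags F1 down.
def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 19 correct items, 1 spurious, 61 missed (illustrative counts).
precision, recall, f1 = prf(tp=19, fp=1, fn=61)
```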

Updated: 2025-07-09 07:50:55

标题: 利用大型语言模型(LLM)和范围审查方案加快数据提取:复杂范围审查中的方法研究

摘要: 综述的数据提取阶段需要大量资源,研究人员可能会寻求利用在线大型语言模型(LLMs)和审查方案来加快数据提取。克劳德3.5 Sonnet被用来试验两种方法,这两种方法利用审查方案来促使从一个案例研究范围审查中包括的10个证据来源中提取数据。还使用了基于协议的方法来审查提取的数据。进行了有限的性能评估,发现两种提取方法在提取简单、明确定义的引文细节时的准确性很高(83.3%和100%);在提取更复杂、主观的数据项时,准确性较低(9.6%和15.8%)。考虑所有数据项,两种方法的精确度都>90%,但召回率<25%,F1得分<40%。复杂范围审查的背景、开放响应类型和方法论方法可能会影响性能,因为会出现遗漏和错误归因的数据。LLM反馈认为基线提取准确,并建议进行轻微修改:15项引文细节中有4项(26.7%),38项关键发现数据项中有8项(21.1%)被认为可能增加价值。然而,当使用包含故意错误的数据集重复这个过程时,只有39项中的2项(5%)错误被检测出来。为了迅速进行评估,基于审查协议的方法需要在一系列LLMs和审查上下文中进行更加健壮的性能评估,并与传统的提示工程方法进行比较。我们建议研究人员在使用LLM类似于进行数据提取或审查提取数据时评估和报告LLM的性能。LLM反馈有助于协议的调整,并可能有助于未来审查协议的起草。

更新时间: 2025-07-09 07:50:55

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.06623v1

Saffron-1: Safety Inference Scaling

Existing safety assurance research has primarily focused on training-phase alignment to instill safe behaviors into LLMs. However, recent studies have exposed these methods' susceptibility to diverse jailbreak attacks. Concurrently, inference scaling has significantly advanced LLM reasoning capabilities but remains unexplored in the context of safety assurance. Addressing this gap, our work pioneers inference scaling for robust and effective LLM safety against emerging threats. We reveal that conventional inference scaling techniques, despite their success in reasoning tasks, perform poorly in safety contexts, even falling short of basic approaches like Best-of-N Sampling. We attribute this inefficiency to a newly identified challenge, the exploration--efficiency dilemma, arising from the high computational overhead associated with frequent process reward model (PRM) evaluations. To overcome this dilemma, we propose SAFFRON, a novel inference scaling paradigm tailored explicitly for safety assurance. Central to our approach is the introduction of a multifurcation reward model (MRM) that significantly reduces the required number of reward model evaluations. To operationalize this paradigm, we further propose: (i) a partial supervision training objective for MRM, (ii) a conservative exploration constraint to prevent out-of-distribution explorations, and (iii) a Trie-based key--value caching strategy that facilitates cache sharing across sequences during tree search. Extensive experiments validate the effectiveness of our method. Additionally, we publicly release our trained multifurcation reward model (Saffron-1) and the accompanying token-level safety reward dataset (Safety4M) to accelerate future research in LLM safety. Our code, model, and data are publicly available at https://github.com/q-rz/saffron , and our project homepage is at https://q-rz.github.io/p/saffron .
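
The Trie-based key-value caching idea can be sketched as a prefix trie over token ids; the node layout and API below are assumptions of this sketch, not SAFFRON's implementation.

```python
# Sequences that share a token prefix share the cached entry for that
# prefix during tree search, instead of recomputing it per branch.
class TrieCache:
    def __init__(self):
        self.root = {"children": {}, "value": None}

    def put(self, tokens, value):
        node = self.root
        for t in tokens:
            node = node["children"].setdefault(t, {"children": {}, "value": None})
        node["value"] = value

    def longest_prefix(self, tokens):
        """Return (depth, value) of the deepest cached prefix of `tokens`."""
        node, best = self.root, (0, None)
        for depth, t in enumerate(tokens, start=1):
            if t not in node["children"]:
                break
            node = node["children"][t]
            if node["value"] is not None:
                best = (depth, node["value"])
        return best

cache = TrieCache()
cache.put([1, 2, 3], "kv state for [1, 2, 3]")
hit_depth, hit_value = cache.longest_prefix([1, 2, 3, 4, 5])  # prefix reuse
```

During tree search, only the suffix beyond `hit_depth` would need fresh computation, which is where the evaluation savings would come from.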

Updated: 2025-07-09 07:47:59

标题: 藏红花-1:安全推理扩展

摘要: 现有的安全保障研究主要集中在培训阶段的对准,以将安全行为灌输到LLM中。然而,最近的研究揭示了这些方法对各种越狱攻击的敏感性。与此同时,推理缩放在LLM推理能力方面取得了显著进展,但在安全保障的背景下仍未得到探讨。为填补这一空白,我们的工作开拓了推理缩放,以确保LLM对新兴威胁的安全性具有鲁棒性和有效性。我们发现,传统的推理缩放技术,尽管在推理任务中取得了成功,但在安全环境中表现不佳,甚至不如基本方法如最佳N抽样。我们将这种低效归因于一个新识别的挑战,即探索-效率困境,这是由于频繁进行进程奖励模型(PRM)评估所带来的高计算开销。为了克服这一困境,我们提出了SAFFRON,这是一种专门为安全保障定制的新型推理缩放范式。我们方法的核心是引入了一个多叉奖励模型(MRM),显著减少了所需的奖励模型评估数量。为了使这种范式操作化,我们进一步提出:(i)MRM的部分监督训练目标,(ii)保守的探索约束以防止超出分布的探索,以及(iii)一种基于Trie的键-值缓存策略,在树搜索过程中促进序列之间的缓存共享。大量实验验证了我们方法的有效性。此外,我们公开发布了我们训练的多叉奖励模型(Saffron-1)和配套的标记级安全奖励数据集(Safety4M),以加速未来在LLM安全性方面的研究。我们的代码、模型和数据都可以在https://github.com/q-rz/saffron 上公开获取,我们的项目主页在https://q-rz.github.io/p/saffron。

更新时间: 2025-07-09 07:47:59

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2506.06444v2

Steps Adaptive Decay DPSGD: Enhancing Performance on Imbalanced Datasets with Differential Privacy with HAM10000

When applying machine learning to medical image classification, data leakage is a critical issue. Previous methods, such as adding noise to gradients for differential privacy, work well on large datasets like MNIST and CIFAR-100, but fail on small, imbalanced medical datasets like HAM10000. This is because the imbalanced distribution causes gradients from minority classes to be clipped and lose crucial information, while majority classes dominate. This leads the model to fall into suboptimal solutions early. To address this, we propose SAD-DPSGD, which uses a linear decaying mechanism for noise and clipping thresholds. By allocating more privacy budget and using higher clipping thresholds in the initial training phases, the model avoids suboptimal solutions and enhances performance. Experiments show that SAD-DPSGD outperforms Auto-DPSGD on HAM10000, improving accuracy by 2.15% under $\epsilon = 3.0$, $\delta = 10^{-3}$.
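
A sketch of one update with a linearly decaying clipping threshold and noise multiplier, as the abstract describes; the schedule endpoints and gradient values are illustrative, not the paper's settings.

```python
import numpy as np

def linear_decay(start, end, step, total):
    """Linearly interpolate from `start` to `end` over `total` steps."""
    frac = min(step / max(total - 1, 1), 1.0)
    return start + frac * (end - start)

def dp_step(per_sample_grads, step, total, rng, c0=4.0, c1=1.0, s0=2.0, s1=0.5):
    """One DP-SGD update with a decaying clipping threshold C_t and noise
    multiplier sigma_t (endpoints c0->c1 and s0->s1 are illustrative)."""
    C = linear_decay(c0, c1, step, total)
    sigma = linear_decay(s0, s1, step, total)
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, C / norms)   # per-sample clip
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        scale=sigma * C, size=per_sample_grads.shape[1])
    return noisy_sum / per_sample_grads.shape[0], C, sigma

rng = np.random.default_rng(0)
g = rng.normal(size=(8, 5))                  # 8 per-sample gradients, dim 5
_, C_early, sig_early = dp_step(g, step=0, total=100, rng=rng)
_, C_late, sig_late = dp_step(g, step=99, total=100, rng=rng)
```

Early steps keep a generous threshold so minority-class gradients survive clipping; later steps tighten both knobs as the model settles.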

Updated: 2025-07-09 07:46:29

标题: Steps自适应衰减DPSGD:通过差分隐私增强不平衡数据集上的性能与HAM10000

摘要: 在将机器学习应用于医学图像分类时,数据泄露是一个关键问题。先前的方法,如将噪声添加到梯度以实现差分隐私,在像MNIST和CIFAR-100这样的大型数据集上表现良好,但在小型、不平衡的医学数据集(如HAM10000)上失败。这是因为不平衡的分布导致来自少数类的梯度被剪切并丢失关键信息,而多数类占主导地位。这导致模型提早陷入次优解。为了解决这个问题,我们提出了SAD-DPSGD,它使用线性衰减机制来控制噪声和剪切阈值。通过在初始训练阶段分配更多的隐私预算并使用更高的剪切阈值,模型避免了次优解并提高了性能。实验表明,在$\epsilon=3.0$,$\delta=10^{-3}$的情况下,SAD-DPSGD在HAM10000上的表现优于Auto-DPSGD,准确率提高了2.15%。

更新时间: 2025-07-09 07:46:29

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2507.06619v1

Efficient Multi-Task Reinforcement Learning with Cross-Task Policy Guidance

Multi-task reinforcement learning endeavors to efficiently leverage shared information across various tasks, facilitating the simultaneous learning of multiple tasks. Existing approaches primarily focus on parameter sharing with carefully designed network structures or tailored optimization procedures. However, they overlook a direct and complementary way to exploit cross-task similarities: the control policies of tasks already proficient in some skills can provide explicit guidance for unmastered tasks to accelerate skills acquisition. To this end, we present a novel framework called Cross-Task Policy Guidance (CTPG), which trains a guide policy for each task to select the behavior policy interacting with the environment from all tasks' control policies, generating better training trajectories. In addition, we propose two gating mechanisms to improve the learning efficiency of CTPG: one gate filters out control policies that are not beneficial for guidance, while the other gate blocks tasks that do not necessitate guidance. CTPG is a general framework adaptable to existing parameter sharing approaches. Empirical evaluations demonstrate that incorporating CTPG with these approaches significantly enhances performance in manipulation and locomotion benchmarks.
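
The guide-policy mechanism can be sketched as follows, with linear policies standing in for learned ones; the gate flag mimics blocking guidance for tasks that do not need it, and all shapes and names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, obs_dim, act_dim = 3, 4, 2

# One control policy per task (linear maps here, purely for the sketch) ...
control_W = rng.normal(size=(n_tasks, obs_dim, act_dim))
# ... and a guide policy that scores which task's control policy should act.
guide_W = rng.normal(size=(obs_dim, n_tasks))

def act(obs, own_task, use_guidance=True):
    """Pick the behavior policy: the guide's choice, or the task's own
    policy when the gate blocks guidance."""
    source = int(np.argmax(obs @ guide_W)) if use_guidance else own_task
    return source, obs @ control_W[source]

obs = rng.normal(size=(obs_dim,))
source, action = act(obs, own_task=0)
```

The trajectories collected this way would then train the task's own control policy, so proficiency transfers without sharing parameters.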

Updated: 2025-07-09 07:36:28

标题: 高效的多任务强化学习与跨任务策略引导

摘要: 多任务强化学习旨在有效利用各种任务之间的共享信息,促进多任务的同时学习。现有方法主要集中在通过精心设计的网络结构或定制的优化程序共享参数。然而,它们忽视了一种直接和互补的利用跨任务相似性的方法:已经精通某些技能的任务的控制策略可以为未掌握技能的任务提供明确的指导,加速技能的习得。为此,我们提出了一个名为Cross-Task Policy Guidance (CTPG)的新框架,该框架训练每个任务的导向策略,从所有任务的控制策略中选择与环境交互的行为策略,生成更好的训练轨迹。此外,我们提出了两种门控机制来提高CTPG的学习效率:一个门控制排除不利于指导的控制策略,而另一个门则阻止不需要指导的任务。CTPG是一个通用的框架,可适用于现有的参数共享方法。实证评估表明,将CTPG与这些方法结合使用显著提高了操作和运动基准测试的性能。

更新时间: 2025-07-09 07:36:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.06615v1

Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning

T cell receptor (TCR) repertoires encode critical immunological signatures for autoimmune diseases, yet their clinical application remains limited by sequence sparsity and low witness rates. We developed EAMil, a multi-instance deep learning framework that leverages TCR sequencing data to diagnose systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) with exceptional accuracy. By integrating PrimeSeq feature extraction with ESMonehot encoding and enhanced gate attention mechanisms, our model achieved state-of-the-art performance with AUCs of 98.95% for SLE and 97.76% for RA. EAMil successfully identified disease-associated genes with over 90% concordance with established differential analyses and effectively distinguished disease-specific TCR genes. The model demonstrated robustness in classifying multiple disease categories, utilizing the SLEDAI score to stratify SLE patients by disease severity as well as to diagnose the site of damage in SLE patients, and effectively controlling for confounding factors such as age and gender. This interpretable framework for immune receptor analysis provides new insights for autoimmune disease detection and classification with broad potential clinical applications across immune-mediated conditions.

Updated: 2025-07-09 07:33:59

标题: 多模态多实例学习方法对外周血TCR谱系的自身免疫疾病分类

摘要: T细胞受体(TCR)库编码自身免疫疾病的重要免疫标志,然而其临床应用受到序列稀疏和低见证率的限制。我们开发了EAMil,这是一个多实例深度学习框架,利用TCR测序数据来诊断系统性红斑狼疮(SLE)和类风湿性关节炎(RA),具有异常的准确性。通过将PrimeSeq特征提取与ESMonehot编码和增强门注意机制相结合,我们的模型在SLE和RA的AUC分别达到了98.95%和97.76%的最新表现。EAMil成功地识别了与疾病相关的基因,与已建立的差异分析有超过90%的一致性,并有效区分了疾病特异性TCR基因。该模型在分类多种疾病类别方面表现出鲁棒性,利用SLEDAI评分对SLE患者按疾病严重程度分层,并对SLE患者的损伤部位进行诊断,有效地控制了年龄和性别等混杂因素。这种可解释的免疫受体分析框架为自身免疫疾病检测和分类提供了新的见解,具有广泛潜在的临床应用,可跨越免疫介导的疾病条件。

更新时间: 2025-07-09 07:33:59

领域: cs.LG,cs.AI,q-bio.GN

下载: http://arxiv.org/abs/2507.04981v3

Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation

Disentangled and interpretable latent representations in generative models typically come at the cost of generation quality. The $\beta$-VAE framework introduces a hyperparameter $\beta$ to balance disentanglement and reconstruction quality, where setting $\beta > 1$ introduces an information bottleneck that favors disentanglement over sharp, accurate reconstructions. To address this trade-off, we propose a novel generative modeling framework that leverages a range of $\beta$ values to learn multiple corresponding latent representations. First, we obtain a slew of representations by training a single variational autoencoder (VAE), with a new loss function that controls the information retained in each latent representation such that higher $\beta$ values prioritize disentanglement over reconstruction fidelity. We then introduce a non-linear diffusion model that smoothly transitions latent representations corresponding to different $\beta$ values. This model denoises towards less disentangled and more informative representations, ultimately leading to (almost) lossless representations, enabling sharp reconstructions. Furthermore, our model supports sample generation without input images, functioning as a standalone generative model. We evaluate our framework in terms of both disentanglement and generation quality. Additionally, we observe smooth transitions in the latent spaces with respect to changes in $\beta$, facilitating consistent manipulation of generated outputs.
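
The family of $\beta$-weighted objectives can be written down directly; the sketch below uses a squared-error reconstruction term and mock encoder outputs, and only illustrates how one set of encoder statistics yields one loss per $\beta$.

```python
import numpy as np

def kl_std_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar).sum(axis=1)

def beta_vae_loss(x, x_hat, mu, logvar, beta):
    recon = ((x - x_hat) ** 2).sum(axis=1)          # squared-error stand-in
    return (recon + beta * kl_std_normal(mu, logvar)).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 10))
x_hat = x + 0.1 * rng.normal(size=x.shape)          # mock reconstruction
mu = 0.5 * rng.normal(size=(32, 4))                 # mock encoder outputs
logvar = 0.1 * rng.normal(size=(32, 4))

# One encoder, several beta values: a family of objectives whose latent
# representations the diffusion model would later bridge.
losses = {beta: beta_vae_loss(x, x_hat, mu, logvar, beta)
          for beta in (1.0, 4.0, 16.0)}
```

Since the KL term is non-negative, the loss grows monotonically with $\beta$, tightening the information bottleneck.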

Updated: 2025-07-09 07:29:41

标题: 去噪多重Beta VAE:表征学习用于解缠和生成

摘要: 在生成模型中,解耦且可解释的潜在表示通常会以生成质量为代价。β-VAE框架引入了一个超参数β来平衡解耦和重构质量,其中设置β>1会引入一个偏向解耦而非锐利、准确重构的信息瓶颈。为了解决这种权衡,我们提出了一个新颖的生成建模框架,利用一系列β值来学习多个对应的潜在表示。首先,我们通过训练单个变分自动编码器(VAE)获得一系列表示,采用一种新的损失函数来控制每个潜在表示中保留的信息,使得较高β值优先考虑解耦而非重构保真度。然后,我们引入一个非线性扩散模型,平滑过渡对应于不同β值的潜在表示。该模型向不太解耦且更具信息性的表示去噪,最终实现(几乎)无损表示,从而实现锐利的重构。此外,我们的模型支持不需要输入图像的样本生成,作为一个独立的生成模型运行。我们从解耦和生成质量的角度评估了我们的框架。此外,我们观察到在β变化时潜在空间的平滑过渡,有助于对生成输出进行一致的操作。

更新时间: 2025-07-09 07:29:41

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.06613v1

Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models

Slide animations, such as fade-in, fly-in, and wipe, are critical for audience engagement, efficient information delivery, and vivid visual expression. However, most AI-driven slide-generation tools still lack native animation support, and existing vision-language models (VLMs) struggle with animation tasks due to the absence of public datasets and limited temporal-reasoning capabilities. To address this gap, we release the first public dataset for slide-animation modeling: 12,000 triplets of natural-language descriptions, animation JSON files, and rendered videos, collectively covering every built-in PowerPoint effect. Using this resource, we fine-tune Qwen-2.5-VL-7B with Low-Rank Adaptation (LoRA) and achieve consistent improvements over GPT-4.1 and Gemini-2.5-Pro in BLEU-4, ROUGE-L, SPICE, and our Coverage-Order-Detail Assessment (CODA) metric, which evaluates action coverage, temporal order, and detail fidelity. On a manually created test set of slides, the LoRA model increases BLEU-4 by around 60%, ROUGE-L by 30%, and shows significant improvements in CODA-detail. This demonstrates that low-rank adaptation enables reliable temporal reasoning and generalization beyond synthetic data. Overall, our dataset, LoRA-enhanced model, and CODA metric provide a rigorous benchmark and foundation for future research on VLM-based dynamic slide generation.

Updated: 2025-07-09 07:28:53

标题: 动画需要关注:一种综合方法来理解幻灯片动画与视觉语言模型

摘要: 幻灯片动画,如淡入、飞入和擦除,对于观众参与、高效信息传递和生动视觉表达至关重要。然而,大多数基于人工智能的幻灯片生成工具仍然缺乏原生动画支持,现有的视觉-语言模型(VLMs)在动画任务上面临困难,因为缺乏公开数据集和有限的时间推理能力。为了填补这一空白,我们发布了第一个用于幻灯片动画建模的公开数据集:包括自然语言描述、动画JSON文件和渲染视频的12,000个三元组,共涵盖每种内置的PowerPoint效果。利用这一资源,我们使用低秩适应(LoRA)对Qwen-2.5-VL-7B进行微调,并在BLEU-4、ROUGE-L、SPICE以及我们的Coverage-Order-Detail Assessment(CODA)度量中实现了与GPT-4.1和Gemini-2.5-Pro一致的改进,评估了动作覆盖率、时间顺序和细节保真度。在手动创建的幻灯片测试集上,LoRA模型使BLEU-4提高约60%,ROUGE-L提高30%,并在CODA细节方面显示了显著改进。这表明低秩适应能够实现可靠的时间推理,并在合成数据之外实现泛化。总的来说,我们的数据集、LoRA增强模型和CODA度量为基于VLM的动态幻灯片生成的未来研究提供了严格的基准和基础。

更新时间: 2025-07-09 07:28:53

领域: cs.AI,cs.CV,68T01

下载: http://arxiv.org/abs/2507.03916v2

Nexus: Taming Throughput-Latency Tradeoff in LLM Serving via Efficient GPU Sharing

Current prefill-decode (PD) disaggregation is typically deployed at the level of entire serving engines, assigning separate GPUs to handle prefill and decode phases. While effective at reducing latency, this approach demands more hardware. To improve GPU utilization, Chunked Prefill mixes prefill and decode requests within the same batch, but introduces phase interference between prefill and decode. While existing PD disaggregation solutions separate the phases across GPUs, we ask: can the same decoupling be achieved within a single serving engine? The key challenge lies in managing the conflicting resource requirements of prefill and decode when they share the same hardware. In this paper, we first show that chunked prefill requests cause interference with decode requests due to their distinct requirements for GPU resources. Second, we find that GPU resources exhibit diminishing returns. Beyond a saturation point, increasing GPU allocation yields negligible latency improvements. This insight enables us to split a single GPU's resources and dynamically allocate them to prefill and decode on the fly, effectively disaggregating the two phases within the same GPU. Across a range of models and workloads, our system Nexus achieves up to 2.2x higher throughput, 20x lower TTFT, and 2.5x lower TBT than vLLM. It also outperforms SGLang with up to 2x higher throughput, 2x lower TTFT, and 1.7x lower TBT, and achieves 1.4x higher throughput than vLLM-disaggregation using only half the number of GPUs.
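
The diminishing-returns insight can be illustrated with a toy latency model: each phase's latency falls with its GPU share but saturates, so splitting a single GPU between prefill and decode has a sweet spot. All curve parameters below are invented for illustration.

```python
# Latency as floor + scale/share: more GPU share helps, with diminishing
# returns past a saturation point (the "floor").
def latency(share, floor, scale):
    return floor + scale / max(share, 1e-9)

def best_split():
    candidates = [i / 100 for i in range(1, 100)]
    # prefill is compute-hungry; decode saturates earlier (assumed numbers)
    return min(candidates,
               key=lambda s: max(latency(s, 1.0, 0.5),
                                 latency(1 - s, 2.0, 0.1)))

split = best_split()   # an interior split: neither phase needs the whole GPU
```

A dynamic version of this search, run on the fly as request mix changes, is the disaggregation-within-one-GPU idea the abstract describes.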

Updated: 2025-07-09 07:27:18

标题: Nexus: 通过高效的GPU共享驯服LLM服务中的吞吐量-延迟权衡

摘要: 当前的预加载解码(PD)分解通常部署在整个服务引擎的级别,为处理预加载和解码阶段分配单独的GPU。虽然有效地降低了延迟,但这种方法需要更多的硬件。为了提高GPU利用率,分块预加载将预加载和解码请求混合在同一批次中,但引入了预加载和解码之间的相位干扰。 尽管现有的PD分解解决方案将不同阶段分开分配到不同的GPU上,但我们提出:是否可以在单个服务引擎内实现相同的解耦?关键挑战在于在共享相同硬件时管理预加载和解码的冲突资源需求。在本文中,我们首先展示了分块预加载请求与解码请求之间的干扰,因为它们对GPU资源有不同的要求。其次,我们发现GPU资源存在收益递减。超过饱和点后,增加GPU分配几乎不会改善延迟。这一观点使我们能够在动态分配单个GPU的资源,并将其有效地分别分配给预加载和解码,从而在同一个GPU内分解两个阶段。 在各种模型和工作负载下,我们的系统Nexus的吞吐量高达2.2倍,TTFT低20倍,TBT低2.5倍,优于vLLM。它还优于SGLang,吞吐量高达2倍,TTFT低2倍,TBT低1.7倍,并且仅使用一半数量的GPU实现比vLLM分解更高的吞吐量1.4倍。

更新时间: 2025-07-09 07:27:18

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2507.06608v1

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation

Recent advances in language modeling have demonstrated the effectiveness of State Space Models (SSMs) for efficient sequence modeling. While hybrid architectures such as Samba and the decoder-decoder architecture, YOCO, have shown promising performance gains over Transformers, prior works have not investigated the efficiency potential of representation sharing between SSM layers. In this paper, we introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers. We apply it to create SambaY, a decoder-hybrid-decoder architecture that incorporates GMUs in the cross-decoder to share memory readout states from a Samba-based self-decoder. SambaY significantly enhances decoding efficiency, preserves linear pre-filling time complexity, and boosts long-context performance, all while eliminating the need for explicit positional encoding. Through extensive scaling experiments, we demonstrate that our model exhibits a significantly lower irreducible loss compared to a strong YOCO baseline, indicating superior performance scalability under large-scale compute regimes. Our largest model enhanced with Differential Attention, Phi4-mini-Flash-Reasoning, achieves significantly better performance than Phi4-mini-Reasoning on reasoning tasks such as Math500, AIME24/25, and GPQA Diamond without any reinforcement learning, while delivering up to 10x higher decoding throughput on 2K-length prompts with 32K generation length under the vLLM inference framework. We release our training codebase on open-source data at https://github.com/microsoft/ArchScale.
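
A minimal sketch of the gating idea, assuming an elementwise sigmoid gate over a shared memory readout; the linear projection and shapes are assumptions of this sketch, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gmu(x, memory, W):
    """Gated Memory Unit sketch: the cross-decoder input gates a memory
    readout shared from the self-decoder below it."""
    return sigmoid(x @ W) * memory

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))
x = rng.normal(size=(d,))          # current cross-decoder layer's input
memory = rng.normal(size=(d,))     # readout computed once, then shared
y = gmu(x, memory, W)
```

Because the readout is computed once and reused across layers, cross-decoder layers avoid re-attending over the full context, which is where the decoding-efficiency gain would come from.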

Updated: 2025-07-09 07:27:00

标题: 解码器-混合解码器架构用于高效处理长生成的推理

摘要: 最近在语言建模方面取得的进展表明,状态空间模型(SSMs)在高效序列建模方面的有效性。虽然混合架构如Samba和解码器-解码器架构YOCO已经显示出比Transformer更有希望的性能提升,但先前的研究并未调查SSM层之间表示共享的效率潜力。在本文中,我们介绍了门控记忆单元(GMU),这是一种简单而有效的机制,用于在层之间实现内存共享。我们将其应用于创建SambaY,一种解码器-混合-解码器架构,该架构在交叉解码器中集成了GMUs,以共享来自基于Samba的自解码器的内存读取状态。SambaY显著提高了解码效率,保持了线性填充时间复杂度,并提升了长上下文性能,同时消除了对显式位置编码的需求。通过大量的扩展实验,我们展示了我们的模型相比于强大的YOCO基线具有显著较低的不可减损损失,表明在大规模计算环境下具有卓越的性能可扩展性。我们的最大模型配备了差分注意力、Phi4-mini-Flash-Reasoning,在推理任务(如Math500、AIME24/25和GPQA Diamond)上取得了比Phi4-mini-Reasoning更好的性能,而无需任何强化学习,并在vLLM推理框架下在2K长度提示和32K生成长度下提供高达10倍的解码吞吐量。我们在https://github.com/microsoft/ArchScale上发布了我们的训练代码库。

更新时间: 2025-07-09 07:27:00

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.06607v1

Generalization in Reinforcement Learning for Radio Access Networks

Modern RANs operate in highly dynamic and heterogeneous environments, where hand-tuned, rule-based RRM algorithms often underperform. While RL can surpass such heuristics in constrained settings, the diversity of deployments and unpredictable radio conditions introduce major generalization challenges. Data-driven policies frequently overfit to training conditions, degrading performance in unseen scenarios. To address this, we propose a generalization-centered RL framework for RAN control that: (i) encodes cell topology and node attributes via attention-based graph representations; (ii) applies domain randomization to broaden the training distribution; and (iii) distributes data generation across multiple actors while centralizing training in a cloud-compatible architecture aligned with O-RAN principles. Although generalization increases computational and data-management complexity, our distributed design mitigates this by scaling data collection and training across diverse network conditions. Applied to downlink link adaptation in five 5G benchmarks, our policy improves average throughput and spectral efficiency by ~10% over an OLLA baseline (10% BLER target) in full-buffer MIMO/mMIMO and by >20% under high mobility. It matches specialized RL in full-buffer traffic and achieves up to 4- and 2-fold gains in eMBB and mixed-traffic benchmarks, respectively. In nine-cell deployments, GAT models offer 30% higher throughput over MLP baselines. These results, combined with our scalable architecture, offer a path toward AI-native 6G RAN using a single, generalizable RL agent.

Updated: 2025-07-09 07:22:22

标题: 强化学习在无线接入网络中的泛化效果

摘要: 现代无线接入网络(RAN)在高度动态和异构的环境中运作,手动调整的基于规则的RRM算法通常表现不佳。虽然在受限环境中RL可以超越这种启发式方法,但部署的多样性和不可预测的无线条件引入了主要的泛化挑战。数据驱动的策略经常会过度拟合训练条件,降低在未知场景中的性能。为了解决这个问题,我们提出了一个以泛化为中心的RAN控制RL框架:(i)通过基于注意力的图表示对单元拓扑和节点属性进行编码;(ii)应用域随机化来扩大训练分布;(iii)在与O-RAN原则一致的云兼容架构中将数据生成分布在多个执行者之间,同时集中培训。尽管泛化增加了计算和数据管理复杂性,但我们的分布式设计通过在各种网络条件下扩展数据收集和培训来减轻这一问题。应用于五个5G基准测试中的下行链路适应性,我们的策略在满缓冲MIMO/mMIMO中比OLLA基线(10% BLER目标)提高了约10%的平均吞吐量和频谱效率,在高移动性下提高了超过20%。它在满缓冲流量中与专用RL相匹配,并在eMBB和混合流量基准测试中分别实现了最多4倍和2倍的增益。在九个小区的部署中,GAT模型比MLP基线提供了30%更高的吞吐量。这些结果,结合我们的可扩展架构,为使用单个、具有泛化能力的RL代理向AI原生的6G RAN迈出了一步。

更新时间: 2025-07-09 07:22:22

领域: cs.LG

下载: http://arxiv.org/abs/2507.06602v1

SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model

Recent advances in deep learning have revolutionized seismic monitoring, yet developing a foundation model that performs well across multiple complex tasks remains challenging, particularly when dealing with degraded signals or data scarcity. This work presents SeisMoLLM, the first foundation model that utilizes cross-modal transfer for seismic monitoring, to unleash the power of large-scale pre-training from a large language model without requiring direct pre-training on seismic datasets. Through elaborate waveform tokenization and fine-tuning of pre-trained GPT-2 model, SeisMoLLM achieves state-of-the-art performance on the DiTing and STEAD datasets across five critical tasks: back-azimuth estimation, epicentral distance estimation, magnitude estimation, phase picking, and first-motion polarity classification. It attains 36 best results out of 43 task metrics and 12 top scores out of 16 few-shot generalization metrics, with many relative improvements ranging from 10% to 50%. In addition to its superior performance, SeisMoLLM maintains efficiency comparable to, or even better than, that of lightweight models in both training and inference. These findings establish SeisMoLLM as a promising foundation model for practical seismic monitoring and highlight cross-modal transfer as an exciting new direction for earthquake studies, showcasing the potential of advanced deep learning techniques to propel seismology research forward.

Updated: 2025-07-09 07:08:00

Categories: cs.LG

Download: http://arxiv.org/abs/2502.19960v2

FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models

Advancements in reasoning for large language models (LLMs) have led to significant performance improvements for LLMs in various fields such as mathematics and programming. However, research applying these advances to the financial domain, where considerable domain-specific knowledge is necessary to complete tasks, remains limited. To address this gap, we introduce FEVO (Financial Evolution), a multi-stage enhancement framework developed to enhance LLM performance in the financial domain. FEVO systematically enhances LLM performance by using continued pre-training (CPT) to expand financial domain knowledge, supervised fine-tuning (SFT) to instill structured, elaborate reasoning patterns, and reinforcement learning (RL) to further integrate the expanded financial domain knowledge with the learned structured reasoning. To ensure effective and efficient training, we leverage frontier reasoning models and rule-based filtering to curate FEVO-Train, a set of high-quality datasets specifically designed for the different post-training phases. Using our framework, we train the FEVO series of models - C32B, S32B, R32B - from Qwen2.5-32B and evaluate them on seven benchmarks to assess financial and general capabilities. Results show that FEVO-R32B achieves state-of-the-art performance on five financial benchmarks against much larger models as well as specialist models. More significantly, FEVO-R32B demonstrates markedly better performance than FEVO-R32B-0 (trained from Qwen2.5-32B-Instruct using only RL), thus validating the effectiveness of financial domain knowledge expansion and structured, logical reasoning distillation.

Updated: 2025-07-09 07:06:36

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.06057v2

DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction

Autonomous driving has seen significant progress, driven by extensive real-world data. However, in long-tail scenarios, accurately predicting the safety of the ego vehicle's future motion remains a major challenge due to uncertainties in dynamic environments and limitations in data coverage. In this work, we aim to explore whether it is possible to enhance the motion risk prediction capabilities of Vision-Language Models (VLM) by synthesizing high-risk motion data. Specifically, we introduce a Bird's-Eye View (BEV) based motion simulation method to model risks from three aspects: the ego-vehicle, other vehicles, and the environment. This allows us to synthesize plug-and-play, high-risk motion data suitable for VLM training, which we call DriveMRP-10K. Furthermore, we design a VLM-agnostic motion risk estimation framework, named DriveMRP-Agent. This framework incorporates a novel information injection strategy for global context, ego-vehicle perspective, and trajectory projection, enabling VLMs to effectively reason about the spatial relationships between motion waypoints and the environment. Extensive experiments demonstrate that by fine-tuning with DriveMRP-10K, our DriveMRP-Agent framework can significantly improve the motion risk prediction performance of multiple VLM baselines, with the accident recognition accuracy soaring from 27.13% to 88.03%. Moreover, when tested via zero-shot evaluation on an in-house real-world high-risk motion dataset, DriveMRP-Agent achieves a significant performance leap, boosting the accuracy from the base model's 29.42% to 68.50%, which showcases the strong generalization capabilities of our method in real-world scenarios.

Updated: 2025-07-09 06:50:51

Categories: cs.CV,cs.AI,cs.RO,I.4.8; I.2.7; I.2.10

Download: http://arxiv.org/abs/2507.02948v2

Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots

Soft robots exhibit inherent compliance and safety, which makes them particularly suitable for applications requiring direct physical interaction with humans, such as surgical procedures. However, their nonlinear and hysteretic behavior, resulting from the properties of soft materials, presents substantial challenges for accurate modeling and control. In this study, we present a soft robotic system designed for surgical applications and propose a hysteresis-aware whole-body neural network model that accurately captures and predicts the soft robot's whole-body motion, including its hysteretic behavior. Building upon the high-precision dynamic model, we construct a highly parallel simulation environment for soft robot control and apply an on-policy reinforcement learning algorithm to efficiently train whole-body motion control strategies. Based on the trained control policy, we developed a soft robotic system for surgical applications and validated it through phantom-based laser ablation experiments in a physical environment. The results demonstrate that the hysteresis-aware modeling reduces the Mean Squared Error (MSE) by 84.95 percent compared to traditional modeling methods. The deployed control algorithm achieved a trajectory tracking error ranging from 0.126 to 0.250 mm on the real soft robot, highlighting its precision in real-world conditions. The proposed method showed strong performance in phantom-based surgical experiments and demonstrates its potential for complex scenarios, including future real-world clinical applications.
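
Why hysteresis forces the model to consume input history can be seen with a minimal relay hysteron. This is a toy, not the paper's neural network: the same present input yields different outputs depending on the path taken to reach it.

```python
def hysteron(inputs, on=0.6, off=0.4):
    # Relay with two thresholds: switches on above `on`, off below `off`,
    # and otherwise keeps its previous state -- output depends on history.
    state, outputs = 0, []
    for u in inputs:
        if u >= on:
            state = 1
        elif u <= off:
            state = 0
        outputs.append(state)
    return outputs

# Identical final input (0.5), different histories, different outputs:
rising = hysteron([0.0, 0.7, 0.5])   # crossed `on`, so it ends switched on
falling = hysteron([0.0, 0.3, 0.5])  # never crossed `on`, stays off
```

A memoryless map from current input to output cannot reproduce this behavior, which is why a hysteresis-aware model must be conditioned on the motion history.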

Updated: 2025-07-09 06:28:57

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2504.13582v2

Learning controllable dynamics through informative exploration

Environments with controllable dynamics are usually understood in terms of explicit models. However, such models are not always available, but may sometimes be learned by exploring an environment. In this work, we investigate using an information measure called "predicted information gain" to determine the most informative regions of an environment to explore next. Applying methods from reinforcement learning allows good suboptimal exploring policies to be found, and leads to reliable estimates of the underlying controllable dynamics. This approach is demonstrated by comparing with several myopic exploration approaches.
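
For a Dirichlet-categorical model of the controllable dynamics, "predicted information gain" has a simple closed form: the expected KL divergence between the predictive next-state distribution after one more observation and the current one. The sketch below follows that standard construction; the paper's exact estimator may differ.

```python
import math

def predictive(alpha):
    # Predictive next-state distribution of a Dirichlet-categorical model.
    total = sum(alpha)
    return [a / total for a in alpha]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def predicted_information_gain(alpha):
    # Expected KL between the posterior-predictive after one more observed
    # transition and the current predictive, weighted by how likely each
    # outcome is under the current predictive.
    p = predictive(alpha)
    gain = 0.0
    for i in range(len(alpha)):
        bumped = list(alpha)
        bumped[i] += 1
        gain += p[i] * kl(predictive(bumped), p)
    return gain
```

An unexplored transition is more informative than a well-observed one with the same predictive distribution, which is exactly what pushes the exploring policy toward poorly modeled regions.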

Updated: 2025-07-09 06:20:24

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.06582v1

Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions

Large Language Models (LLMs) have gained enormous attention in recent years due to their capability of understanding and generating natural languages. With their rapid development and wide-ranging applications (e.g., agents, embodied intelligence), the robustness of LLMs has received increased attention. As the core brain of many AI applications, the robustness of LLMs requires that models not only generate consistent content, but also ensure the correctness and stability of generated content when dealing with unexpected application scenarios (e.g., toxic prompts, limited noisy domain data, out-of-distribution (OOD) applications, etc.). In this survey paper, we conduct a thorough review of the robustness of LLMs, aiming to provide comprehensive terminology for the concepts and methods in this field and to facilitate the community. Specifically, we first give a formal definition of LLM robustness and present the collection protocol of this survey. Then, based on the types of perturbed inputs, we organize the survey from the following perspectives: 1) Adversarial Robustness: tackling the problem of prompts being manipulated intentionally, such as noisy prompts, long context, data attacks, etc.; 2) OOD Robustness: dealing with unexpected real-world application scenarios, such as OOD detection, zero-shot transfer, hallucinations, etc.; 3) Evaluation of Robustness: summarizing new evaluation datasets, metrics, and tools for verifying the robustness of LLMs. After reviewing representative work from each perspective, we discuss and highlight future opportunities and research directions in this field. Meanwhile, we also organize related works and provide an easy-to-search project (https://github.com/zhangkunzk/Awesome-LLM-Robustness-papers) to support the community.

Updated: 2025-07-09 06:18:33

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2506.11111v2

Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints

Diffusion models have been extensively utilized in AI-generated content (AIGC) in recent years, thanks to the superior generation capabilities. Combining with semantic communications, diffusion models are used for tasks such as denoising, data reconstruction, and content generation. However, existing diffusion-based generative models do not consider the stringent bandwidth limitation, which limits its application in wireless communication. This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative model. Our designed architecture utilizes the diffusion model, where the signal transmission process through the wireless channel acts as the forward process in diffusion. To reduce bandwidth requirements, we incorporate a downsampling module and a paired upsampling module based on a variational auto-encoder with reparameterization at the receiver to ensure that the recovered features conform to the Gaussian distribution. Furthermore, we derive the loss function for our proposed system and evaluate its performance through comprehensive experiments. Our experimental results demonstrate significant improvements in pixel-level metrics such as peak signal to noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS). These enhancements are more profound regarding the compression rates and SNR compared to deep joint source-channel coding (DJSCC). We release the code at https://github.com/import-sudo/Diffusion-Driven-Semantic-Communication.
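
The key correspondence, that an AWGN channel looks like one forward-diffusion step, can be made concrete by matching noise levels. Assuming a unit-power signal (a simplification; the paper's derivation is more complete), the diffusion coefficient alpha_bar that matches a channel SNR satisfies alpha_bar / (1 - alpha_bar) = SNR:

```python
import math, random

def awgn_channel(x, snr_db, rng):
    # Transmit unit-power features over an additive white Gaussian noise
    # channel; this plays the role of the forward process in diffusion.
    sigma = math.sqrt(10 ** (-snr_db / 10))
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def matching_alpha_bar(snr_db):
    # alpha_bar with the same signal-to-noise ratio as the channel:
    # alpha_bar / (1 - alpha_bar) = SNR  =>  alpha_bar = SNR / (SNR + 1),
    # so the receiver can denoise from the corresponding diffusion step.
    snr = 10 ** (snr_db / 10)
    return snr / (snr + 1)

received = awgn_channel([1.0, -1.0, 0.5, 0.0], snr_db=20.0,
                        rng=random.Random(0))
ab = matching_alpha_bar(20.0)   # worse channels map to earlier (noisier) steps
```

Under this mapping, a lower SNR corresponds to a smaller alpha_bar, i.e. an earlier and noisier diffusion timestep for the receiver-side denoiser to start from.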

Updated: 2025-07-09 06:10:57

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.18468v4

From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization

Reinforcement learning with verifiable rewards (RLVR) has recently advanced the reasoning capabilities of large language models (LLMs). While prior work has emphasized algorithmic design, data curation, and reward shaping, we investigate RLVR from a sample-centric perspective and introduce LPPO (Learning-Progress and Prefix-guided Optimization), a framework of progressive optimization techniques. Our work addresses a critical question: how to best leverage a small set of trusted, high-quality demonstrations, rather than simply scaling up data volume. First, motivated by how hints aid human problem-solving, we propose prefix-guided sampling, an online data augmentation method that incorporates partial solution prefixes from expert demonstrations to guide the policy, particularly for challenging instances. Second, inspired by how humans focus on important questions aligned with their current capabilities, we introduce learning-progress weighting, a dynamic strategy that adjusts each training sample's influence based on model progression. We estimate sample-level learning progress via an exponential moving average of per-sample pass rates, promoting samples that foster learning and de-emphasizing stagnant ones. Experiments on mathematical-reasoning benchmarks demonstrate that our methods outperform strong baselines, yielding faster convergence and a higher performance ceiling.
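
The learning-progress weighting can be sketched with an exponential moving average of per-sample pass rates. The influence weight used here, the absolute change of the EMA, is an illustrative assumption rather than the paper's exact formula:

```python
class SampleTracker:
    # Track each training sample's pass-rate EMA and derive an influence
    # weight from its recent change, as a proxy for learning progress.
    def __init__(self, beta=0.9):
        self.beta, self.ema = beta, {}

    def observe(self, sample_id, pass_rate):
        old = self.ema.get(sample_id, pass_rate)
        new = self.beta * old + (1 - self.beta) * pass_rate
        self.ema[sample_id] = new
        return abs(new - old)  # progress signal used as the sample weight

tracker = SampleTracker()
tracker.observe("q1", 0.0)                 # hard question, initially failed
w_improving = tracker.observe("q1", 1.0)   # pass rate rising -> weight > 0
tracker.observe("q2", 1.0)                 # already-solved question
w_stagnant = tracker.observe("q2", 1.0)    # no progress -> weight 0
```

Samples whose pass rate is moving get up-weighted; saturated or stagnant samples contribute less, matching the abstract's description.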

Updated: 2025-07-09 06:05:28

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.06573v1

CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs

The rapid scaling of Large Language Models (LLMs) elevates inference costs and compounds substantial deployment barriers. While quantization to 8 or 4 bits mitigates this, sub-3-bit methods face severe accuracy, scalability, and efficiency degradation. We propose Convolutional Code Quantization (CCQ), an inference-optimized quantization approach compressing LLMs to 2.0-2.75 bits with minimal accuracy loss. Departing from error-prone scalar quantization or slow vector quantization, CCQ integrates a hardware-aware bit-shift encoding and decoding solution with Convolutional Code, Hybrid Encoding, and Code Cluster, jointly overcoming accuracy-speed bottlenecks. We construct a lookup-free encoding space, enabling a linear mapping between the codebook and weight vectors, thereby optimizing inference performance. Meanwhile, by drawing on the concept of data mapping from vector quantization, we minimize the performance degradation of the model under extremely low-bit conditions. Experiments demonstrate that CCQ achieves outstanding performance on LLMs across various benchmarks. We compress DeepSeek-V3 (671B total parameters) to 184GB and ERNIE-4.5-300B-A47B to 89GB, enabling single-GPU deployment of ERNIE 4.5 and eliminating inter-card communication. The 2-bit ERNIE-4.5-300B-A47B model and inference engine have been open-sourced.
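
CCQ's convolutional-code construction is more involved than can be shown here, but the lookup-free, shift-and-mask style of decoding that makes 2-bit storage hardware-friendly is easy to illustrate on plain byte packing:

```python
def pack_2bit(codes):
    # Pack 2-bit codes (values 0..3) into bytes, four per byte, using
    # only shifts and ORs.
    out = bytearray()
    for i in range(0, len(codes), 4):
        b = 0
        for j, c in enumerate(codes[i:i + 4]):
            b |= (c & 0b11) << (2 * j)
        out.append(b)
    return bytes(out)

def unpack_2bit(data, n):
    # Recover n 2-bit codes; decoding is shift-and-mask, no lookup table,
    # which is the property that keeps dequantization fast on hardware.
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]

codes = [3, 0, 2, 1, 1, 2]
packed = pack_2bit(codes)        # 6 codes fit in 2 bytes
```

In CCQ the codes index a convolutional-code codebook with a linear map to weight vectors; the sketch above only shows the bit-level storage layer.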

Updated: 2025-07-09 06:04:14

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07145v1

PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation

Generating interdisciplinary research ideas requires diverse domain expertise, but access to timely feedback is often limited by the availability of experts. In this paper, we introduce PersonaFlow, a novel system designed to provide multiple perspectives by using LLMs to simulate domain-specific experts. Our user studies showed that the new design 1) increased the perceived relevance and creativity of ideated research directions, and 2) promoted users' critical thinking activities (e.g., interpretation, analysis, evaluation, inference, and self-regulation), without increasing their perceived cognitive load. Moreover, users' ability to customize expert profiles significantly improved their sense of agency, which can potentially mitigate their over-reliance on AI. This work contributes to the design of intelligent systems that augment creativity and collaboration, and provides design implications of using customizable AI-simulated personas in domains within and beyond research ideation.

Updated: 2025-07-09 05:59:31

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2409.12538v2

SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference

Mixture-of-Experts (MoE) models improve the scalability of large language models (LLMs) by activating only a small subset of relevant experts per input. However, the sheer number of expert networks in an MoE model introduces a significant storage burden for an edge device. To address this challenge, we consider a scenario where experts are dispersed within an edge network for distributed inference. Based on the popular Top-$K$ expert selection strategy, we formulate a latency minimization problem by optimizing expert caching on edge servers under storage constraints. When $K=1$, the problem reduces to a monotone submodular maximization problem with knapsack constraints, for which we design a greedy-based algorithm with a $(1 - 1/e)$-approximation guarantee. For the general case where $K\geq1$, expert co-activation within the same MoE layer introduces non-submodularity, causing greedy methods to be ineffective. To tackle this issue, we propose a successive greedy decomposition method to decompose the original problem into a series of subproblems, with each being solved by a dynamic programming approach. Furthermore, we design an accelerated algorithm based on the max-convolution technique to obtain the approximate solution with a provable guarantee in polynomial time. Simulation results on various MoE models demonstrate that our method significantly reduces inference latency compared to existing baselines.
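
The core loop of the K=1 case is a density greedy over experts under a storage budget. Note this sketch treats latency savings as additive and omits the extra care (e.g. comparing against the best single feasible item) that the paper's (1 - 1/e)-approximation guarantee requires:

```python
def greedy_expert_cache(experts, budget):
    # experts: (name, latency_saved, storage_size) triples.
    # Repeatedly take the expert with the best value-per-size ratio that
    # still fits in the remaining storage budget.
    chosen, used, gain = [], 0, 0.0
    ranked = sorted(experts, key=lambda e: e[1] / e[2], reverse=True)
    for name, value, size in ranked:
        if used + size <= budget:
            chosen.append(name)
            used += size
            gain += value
    return chosen, gain

experts = [("e1", 10.0, 4), ("e2", 6.0, 2), ("e3", 3.0, 2), ("e4", 1.0, 1)]
selected, total_gain = greedy_expert_cache(experts, budget=6)
```

For K >= 1, expert co-activation within a layer breaks submodularity, which is why the paper decomposes the problem into subproblems solved by dynamic programming instead of relying on this greedy loop.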

Updated: 2025-07-09 05:43:43

Categories: cs.LG,cs.DC,cs.NI

Download: http://arxiv.org/abs/2507.06567v1

PASS: Private Attributes Protection with Stochastic Data Substitution

The growing Machine Learning (ML) services require extensive collections of user data, which may inadvertently include people's private information irrelevant to the services. Various studies have been proposed to protect private attributes by removing them from the data while maintaining the utilities of the data for downstream tasks. Nevertheless, as we theoretically and empirically show in the paper, these methods reveal severe vulnerability because of a common weakness rooted in their adversarial training based strategies. To overcome this limitation, we propose a novel approach, PASS, designed to stochastically substitute the original sample with another one according to certain probabilities, which is trained with a novel loss function soundly derived from information-theoretic objective defined for utility-preserving private attributes protection. The comprehensive evaluation of PASS on various datasets of different modalities, including facial images, human activity sensory signals, and voice recording datasets, substantiates PASS's effectiveness and generalizability.
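
The substitution mechanism itself is simple to sketch: each sample is replaced by another one drawn from a row of a substitution-probability matrix. In PASS that matrix is learned from the information-theoretic loss; the toy matrix below is a hypothetical stand-in:

```python
import random

def substitute(idx, sub_probs, rng):
    # Draw a replacement sample index from row `idx` of a (learned)
    # substitution-probability matrix; each row must sum to 1.
    r, acc = rng.random(), 0.0
    row = sub_probs[idx]
    for j, p in enumerate(row):
        acc += p
        if r < acc:
            return j
    return len(row) - 1

# Toy matrix for two samples that always swaps them. A trained PASS model
# would learn probabilities that hide private attributes while preserving
# task-relevant content.
P = [[0.0, 1.0],
     [1.0, 0.0]]
rng = random.Random(0)
released = [substitute(i, P, rng) for i in range(2)]
```

Because the released sample is a genuine draw from the data distribution rather than an adversarially scrubbed version of the original, the attack surface that plagues adversarial-training defenses does not arise in the same way.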

Updated: 2025-07-09 05:41:13

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2506.07308v2

CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback

Large Language Models (LLMs) have demonstrated remarkable capabilities across various NLP tasks but struggle with code-mixed (or code-switched) language understanding. For example, prior work benchmarking the performance of multilingual LLMs on code-mixed translation tasks has demonstrated that current state-of-the-art multilingual LLMs are ineffective in dealing with code-mixed languages. However, the question of how to improve the capability of multilingual LLMs to handle code-mixed language has not received any attention to date. In this paper, we tackle this research gap by proposing CHAI, a novel general-purpose framework for improving the ability of multilingual LLMs to handle code-mixed languages. CHAI relies on three novel contributions made in this paper. First, we explore the ability of LLMs to provide accurate annotations for code-mixed translation tasks. Second, we leverage this ability of LLMs as annotators to generate preference data for code-mixed translation tasks at scale, which are then used within a reinforcement learning from AI feedback (RLAIF) procedure to improve LLMs' capability on code-mixed tasks. Third, we conduct a rigorous experimental evaluation across various real-world datasets and settings. Our analysis shows that CHAI-powered LLMs outperform state-of-the-art open-source LLMs by 25.66% (in terms of win rate adjudicated by human annotators) in code-mixed translation tasks. This work represents a first step towards developing more inclusive code-mixed LLMs.

Updated: 2025-07-09 05:40:56

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2411.09073v3

The Flaws of Others: An LLM-driven Framework for Scientific Knowledge Production

Large language models turn writing into a live exchange between humans and software. We capture this new medium with a discursive-network model that treats people and LLMs as equal nodes and tracks how their statements circulate. Broadening the focus from isolated hallucinations, we define invalidation (any factual, logical, or structural breach) and show it follows four hazards: drift from truth, self-repair, fresh fabrication, and external detection. A general mathematical model of discursive networks is developed to provide valuable insights: a network governed only by drift and self-repair stabilizes at a modest error rate; adding fabrication reproduces the high rates seen in current LLMs. Giving each false claim even a small chance of peer review shifts the system to a truth-dominant state. We operationalize peer review with the open-source Flaws-of-Others (FOO) algorithm: a configurable loop in which any set of agents critique one another while a harmoniser merges their verdicts. The takeaway is practical and cultural: reliability in this new medium comes not from perfecting single models but from wiring imperfect ones into networks that keep each other honest.
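
The qualitative claims about drift, fabrication, and peer review can be reproduced with a deliberately reduced two-state Markov chain over a single statement's validity; the paper's network model is richer, but the stationary error rate has the same shape:

```python
def steady_state_error(p_break, p_fix):
    # Two-state Markov chain over a statement's validity: it becomes
    # invalid with probability p_break (drift + fabrication) and is
    # restored with probability p_fix (self-repair + external detection).
    # The stationary fraction of invalid statements is the flow balance:
    return p_break / (p_break + p_fix)

base = steady_state_error(0.05, 0.30)       # drift + self-repair only
with_fab = steady_state_error(0.20, 0.30)   # fabrication raises p_break
reviewed = steady_state_error(0.20, 0.60)   # peer review raises p_fix
```

Even a modest increase in the repair probability, which is what the FOO critique loop supplies, pulls the stationary error rate back down.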

Updated: 2025-07-09 05:39:56

Categories: cs.CL,cs.LG,68T01, 60J10, 91D30, 05C82, 68T50, 68W20, 94A15,I.2.7; I.2.11; G.3

Download: http://arxiv.org/abs/2507.06565v1

SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments

Unmanned Aerial Vehicles (UAVs) have emerged as versatile tools across various sectors, driven by their mobility and adaptability. This paper introduces SkyVLN, a novel framework integrating vision-and-language navigation (VLN) with Nonlinear Model Predictive Control (NMPC) to enhance UAV autonomy in complex urban environments. Unlike traditional navigation methods, SkyVLN leverages Large Language Models (LLMs) to interpret natural language instructions and visual observations, enabling UAVs to navigate through dynamic 3D spaces with improved accuracy and robustness. We present a multimodal navigation agent equipped with a fine-grained spatial verbalizer and a history path memory mechanism. These components allow the UAV to disambiguate spatial contexts, handle ambiguous instructions, and backtrack when necessary. The framework also incorporates an NMPC module for dynamic obstacle avoidance, ensuring precise trajectory tracking and collision prevention. To validate our approach, we developed a high-fidelity 3D urban simulation environment using AirSim, featuring realistic imagery and dynamic urban elements. Extensive experiments demonstrate that SkyVLN significantly improves navigation success rates and efficiency, particularly in new and unseen environments.

Updated: 2025-07-09 05:38:32

Categories: cs.RO,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2507.06564v1

Efficient Transfer Learning via Causal Bounds

Transfer learning seeks to accelerate sequential decision-making by leveraging offline data from related agents. However, data from heterogeneous sources that differ in observed features, distributions, or unobserved confounders often render causal effects non-identifiable and bias naive estimators. We address this by forming ambiguity sets of structural causal models defined via integral constraints on their joint densities. Optimizing any causal effect over these sets leads to generally non-convex programs whose solutions tightly bound the range of possible effects under heterogeneity or confounding. To solve these programs efficiently, we develop a hit-and-run sampler that explores the entire ambiguity set and, when paired with a local optimization oracle, produces causal bound estimates that converge almost surely to the true limits. We further accommodate estimation error by relaxing the ambiguity set and exploit the Lipschitz continuity of causal effects to establish precise error propagation guarantees. These causal bounds are then embedded into bandit algorithms via arm elimination and truncated UCB indices, yielding optimal gap-dependent and minimax regret bounds. To handle estimation error, we also develop a safe algorithm for incorporating noisy causal bounds. In the contextual-bandit setting with function approximation, our method uses causal bounds to prune both the function class and the per-context action set, achieving matching upper and lower regret bounds with only logarithmic dependence on function-class complexity. Our analysis precisely characterizes when and how causal side-information accelerates online learning, and experiments on synthetic benchmarks confirm substantial regret reductions in data-scarce or confounded regimes.
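
Hit-and-run explores a bounded set by sampling along random chords through the current point. Below is a hedged sketch over a toy box-shaped "ambiguity set": the paper's sets are defined by integral constraints on joint densities, and a faithful sampler draws uniformly on the exact feasible chord rather than on a discretized grid.

```python
import random

def hit_and_run(x, inside, n_steps, rng):
    # Hit-and-run: pick a random direction, collect the feasible points on
    # a discretized chord through x, jump to one of them uniformly.
    # (t = 0, i.e. staying put, is always feasible, so the walk never
    # leaves the set.)
    for _ in range(n_steps):
        d = [rng.gauss(0.0, 1.0) for _ in x]
        ts = [t / 100.0 for t in range(-300, 301)]
        feasible = [t for t in ts
                    if inside([xi + t * di for xi, di in zip(x, d)])]
        t = rng.choice(feasible)
        x = [xi + t * di for xi, di in zip(x, d)]
    return x

box = lambda p: all(0.0 <= v <= 1.0 for v in p)   # toy ambiguity set
sample = hit_and_run([0.5, 0.5], box, 200, random.Random(1))
```

Pairing such samples with a local optimization oracle over the causal effect is what yields the almost-surely convergent causal-bound estimates the abstract describes.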

Updated: 2025-07-09 05:37:07

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2308.03572v5

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

Reinforcement learning from human feedback (RLHF) has become essential for improving language model capabilities, but traditional approaches rely on the assumption that human preferences follow a transitive Bradley-Terry model. This assumption fails to capture the non-transitive nature of populational human preferences. Nash learning from human feedback (NLHF), targeting non-transitive preferences, is a problem of computing the Nash equilibrium (NE) of the two-player constant-sum game defined by the human preference. We introduce Extragradient preference optimization (EGPO), a novel algorithm for NLHF achieving last-iterate linear convergence to the NE of KL-regularized games and polynomial convergence to the NE of original games, while being robust to noise. Unlike previous approaches that rely on nested optimization, we derive an equivalent implementation using gradients of an online variant of the identity preference optimization (IPO) loss, enabling more faithful implementation for neural networks. Our empirical evaluations demonstrate EGPO's superior performance over baseline methods when training for the same number of epochs, as measured by pairwise win-rates using the ground truth preference. These results validate both the theoretical strengths and practical advantages of EGPO for language model alignment with non-transitive human preferences.
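
The extragradient update itself is compact: evaluate gradients at a lookahead point and step from the original iterate. On the bilinear saddle min_x max_y xy (a standard toy, not the paper's KL-regularized preference game), plain simultaneous gradient descent-ascent spirals outward while extragradient contracts to the equilibrium (0, 0):

```python
def extragradient(x, y, lr=0.5, steps=100):
    # min_x max_y f(x, y) = x * y; grad_x f = y, grad_y f = x.
    for _ in range(steps):
        xh, yh = x - lr * y, y + lr * x      # lookahead (extrapolation)
        x, y = x - lr * yh, y + lr * xh      # step using lookahead grads
    return x, y

def gda(x, y, lr=0.5, steps=100):
    # Plain simultaneous gradient descent-ascent, for contrast: on this
    # game its iterates grow without bound.
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x
    return x, y

eg_x, eg_y = extragradient(1.0, 1.0)
gda_x, gda_y = gda(1.0, 1.0)
```

This last-iterate convergence on the toy game is the behavior EGPO extends, with linear rates, to KL-regularized preference games and their Nash equilibria.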

Updated: 2025-07-09 05:35:31

标题: Extragradient Preference Optimization (EGPO):超越最后迭代收敛的基于人类反馈的纳什学习

摘要: 基于人类反馈的强化学习(RLHF)已成为提升语言模型能力的关键方法,但传统方法依赖于人类偏好遵循可传递的Bradley-Terry模型这一假设。该假设未能捕捉群体人类偏好的非传递性。针对非传递偏好的纳什式人类反馈学习(NLHF),是计算由人类偏好定义的两人常和博弈的纳什均衡(NE)的问题。我们提出了Extragradient偏好优化(EGPO),这是一种用于NLHF的新算法,对KL正则化博弈实现最后迭代线性收敛,对原始博弈实现多项式收敛,同时对噪声具有鲁棒性。与先前依赖嵌套优化的方法不同,我们利用身份偏好优化(IPO)损失的在线变体的梯度导出了等价实现,从而能够在神经网络上更忠实地实现。我们的实证评估表明,在训练相同轮数(epoch)时,以基于真实偏好的成对胜率衡量,EGPO的性能优于基线方法。这些结果验证了EGPO在使语言模型与非传递人类偏好对齐方面的理论优势和实际优势。

更新时间: 2025-07-09 05:35:31

领域: cs.LG

下载: http://arxiv.org/abs/2503.08942v3
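The extragradient step behind EGPO can be illustrated on a tiny non-transitive preference game. This is only a sketch under simplifying assumptions: the matrix and step size are made up, and the real method trains neural policies via gradients of an online IPO loss rather than tabular logits.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy non-transitive preference matrix: P[i, j] = prob. that response i is
# preferred over response j (rock-paper-scissors style, so no transitive
# Bradley-Terry model can fit it).
P = np.array([[0.5, 0.9, 0.1],
              [0.1, 0.5, 0.9],
              [0.9, 0.1, 0.5]])

eta = 0.5
logits = np.array([2.0, 0.0, -1.0])    # start away from the equilibrium
x = softmax(logits)
for _ in range(2000):
    g = P @ x - 0.5                     # payoff gradient vs current opponent
    x_half = softmax(logits + eta * g)  # extrapolation (look-ahead) step
    g_half = P @ x_half - 0.5           # gradient at the extrapolated point
    logits = logits + eta * g_half      # update using the look-ahead gradient
    x = softmax(logits)

# The last iterate approaches the symmetric Nash equilibrium (uniform play);
# plain gradient ascent would cycle around it instead.
assert np.allclose(x, np.ones(3) / 3, atol=1e-2)
```

The look-ahead evaluation of the gradient is what buys last-iterate (rather than average-iterate) convergence in such constant-sum games.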

Clio-X: A Web3 Solution for Privacy-Preserving AI Access to Digital Archives

As archives turn to artificial intelligence to manage growing volumes of digital records, privacy risks inherent in current AI data practices raise critical concerns about data sovereignty and ethical accountability. This paper explores how privacy-enhancing technologies (PETs) and Web3 architectures can support archives to preserve control over sensitive content while still being able to make it available for access by researchers. We present Clio-X, a decentralized, privacy-first Web3 digital solution designed to embed PETs into archival workflows and support AI-enabled reference and access. Drawing on a user evaluation of a medium-fidelity prototype, the study reveals both interest in the potential of the solution and significant barriers to adoption related to trust, system opacity, economic concerns, and governance. Using Rogers' Diffusion of Innovation theory, we analyze the sociotechnical dimensions of these barriers and propose a path forward centered on participatory design and decentralized governance through a Clio-X Decentralized Autonomous Organization. By integrating technical safeguards with community-based oversight, Clio-X offers a novel model to ethically deploy AI in cultural heritage contexts.

Updated: 2025-07-09 05:30:38

标题: Clio-X:面向数字档案的隐私保护AI访问的Web3解决方案

摘要: 当档案馆转向人工智能来管理不断增长的数字记录时,当前人工智能数据实践中固有的隐私风险引发了关于数据主权和道德问责的重要关注。本文探讨了隐私增强技术(PETs)和Web3架构如何支持档案馆在保留对敏感内容的控制的同时,仍能使其可供研究人员访问。我们提出了Clio-X,这是一个去中心化、以隐私为先的Web3数字解决方案,旨在将PETs嵌入到档案工作流程中,并支持AI启用的参考和访问。通过对中等保真度原型的用户评估,研究揭示了对解决方案潜力的兴趣以及与信任、系统不透明性、经济担忧和治理相关的显著采用障碍。利用罗杰斯的创新扩散理论,我们分析了这些障碍的社会技术维度,并提出了一个以参与性设计和通过Clio-X去中心化自治组织实现去中心化治理的前进之路。通过将技术保障与基于社区的监督相结合,Clio-X提供了一个在文化遗产背景下道德地部署人工智能的新模式。

更新时间: 2025-07-09 05:30:38

领域: cs.CR,cs.AI,cs.CY,cs.DL,D.2.11, H.3.4, H.3.7, J.5

下载: http://arxiv.org/abs/2507.08853v1

Divergence-Based Similarity Function for Multi-View Contrastive Learning

Recent success in contrastive learning has sparked growing interest in more effectively leveraging multiple augmented views of an instance. While prior methods incorporate multiple views at the loss or feature level, they primarily capture pairwise relationships and fail to model the joint structure across all views. In this work, we propose a divergence-based similarity function (DSF) that explicitly captures the joint structure by representing each set of augmented views as a distribution and measuring similarity as the divergence between distributions. Extensive experiments demonstrate that DSF consistently improves performance across various tasks, including kNN classification and linear evaluation, while also offering greater efficiency compared to other multi-view methods. Furthermore, we establish a theoretical connection between DSF and cosine similarity, and show that, unlike cosine similarity, DSF operates effectively without requiring a temperature hyperparameter.

Updated: 2025-07-09 05:28:31

标题: 用于多视图对比学习的基于散度的相似性函数

摘要: 对比学习近期的成功引发了人们对更有效利用实例的多个增强视图的日益增长的兴趣。虽然先前的方法在损失或特征层面融合多个视图,但它们主要捕捉成对关系,未能建模所有视图之间的联合结构。在本研究中,我们提出了一种基于散度的相似性函数(DSF),通过将每组增强视图表示为一个分布,并以分布之间的散度度量相似性,显式地捕捉联合结构。大量实验表明,DSF在包括kNN分类和线性评估在内的各种任务中持续提升性能,同时与其他多视图方法相比效率更高。此外,我们建立了DSF与余弦相似性之间的理论联系,并证明与余弦相似性不同,DSF无需温度超参数即可有效运行。

更新时间: 2025-07-09 05:28:31

领域: cs.CV,cs.LG,68T07, 62H12,I.2.6; I.4.8; I.5.1

下载: http://arxiv.org/abs/2507.06560v1
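The idea of measuring similarity as a divergence between view distributions can be sketched as follows. This is a generic illustration, not the paper's exact estimator: it fits a diagonal Gaussian to each set of augmented views and uses a symmetrized negative KL, and all names and data are hypothetical.

```python
import numpy as np

def gaussian_kl(mu1, var1, mu2, var2):
    # KL( N(mu1, diag var1) || N(mu2, diag var2) ) for diagonal Gaussians.
    return 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def dsf(views_a, views_b, eps=1e-6):
    """Divergence-based similarity between two sets of augmented views.

    Each view set is summarized as a diagonal Gaussian over embeddings;
    similarity is the negated symmetrized KL, so it needs no temperature.
    """
    mu_a, var_a = views_a.mean(0), views_a.var(0) + eps
    mu_b, var_b = views_b.mean(0), views_b.var(0) + eps
    kl = gaussian_kl(mu_a, var_a, mu_b, var_b) + gaussian_kl(mu_b, var_b, mu_a, var_a)
    return -0.5 * kl  # higher = more similar

rng = np.random.default_rng(0)
anchor = rng.normal(0.0, 1.0, size=(8, 16))            # 8 views, 16-dim embeddings
positive = anchor + rng.normal(0.0, 0.1, size=(8, 16)) # slightly perturbed views
negative = rng.normal(3.0, 1.0, size=(8, 16))          # views of a different instance
assert dsf(anchor, positive) > dsf(anchor, negative)
```

Because all views in a set enter one distribution, the score depends on their joint structure rather than only on pairwise cosines.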

The Primacy of Magnitude in Low-Rank Adaptation

Low-Rank Adaptation (LoRA) offers a parameter-efficient paradigm for tuning large models. While recent spectral initialization methods improve convergence and performance over the naive "Noise & Zeros" scheme, their extra computational and storage overhead undermines efficiency. In this paper, we establish update magnitude as the fundamental driver of LoRA performance and propose LoRAM, a magnitude-driven "Basis & Basis" initialization scheme that matches spectral methods without their inefficiencies. Our key contributions are threefold: (i) Magnitude of weight updates determines convergence. We prove low-rank structures intrinsically bound update magnitudes, unifying hyperparameter tuning in learning rate, scaling factor, and initialization as mechanisms to optimize magnitude regulation. (ii) Spectral initialization succeeds via magnitude amplification. We demystify that the presumed knowledge-driven benefit of the spectral component essentially arises from the boost in the weight update magnitude. (iii) A novel and compact initialization strategy, LoRAM, scales deterministic orthogonal bases using pretrained weight magnitudes to simulate spectral gains. Extensive experiments show that LoRAM serves as a strong baseline, retaining the full efficiency of LoRA while matching or outperforming spectral initialization across benchmarks.

Updated: 2025-07-09 05:25:24

标题: 低秩适应中的幅度优先原则

摘要: 低秩适应(LoRA)为微调大型模型提供了一种参数高效的范式。尽管最近的谱初始化方法相比朴素的“噪声与零”方案改善了收敛性和性能,但其额外的计算和存储开销削弱了效率。在本文中,我们确定更新幅度是LoRA性能的基本驱动因素,并提出LoRAM,一种幅度驱动的“基与基”初始化方案,能够媲美谱方法而没有其低效之处。我们的主要贡献有三点:(i)权重更新的幅度决定收敛性。我们证明了低秩结构内在地限制了更新幅度,并将学习率、缩放因子和初始化的超参数调整统一为优化幅度调节的机制。(ii)谱初始化通过幅度放大取得成功。我们揭示出,谱成分被认为源于知识驱动的好处,实质上来自权重更新幅度的提升。(iii)一种新颖且紧凑的初始化策略LoRAM,利用预训练权重的幅度缩放确定性正交基,以模拟谱增益。大量实验表明,LoRAM是一个强大的基线,在保持LoRA全部效率的同时,在各基准测试中与谱初始化相当或更优。

更新时间: 2025-07-09 05:25:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.06558v1
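Claim (i) above, that the low-rank structure bounds update magnitudes, can be made concrete with a small numeric sketch. The shapes, the "Noise & Zeros" init, and the hypothetical gradient steps below are illustrative; LoRAM's actual basis construction is described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4
alpha = 16.0
scale = alpha / r  # the standard LoRA scaling factor on Delta W = scale * B @ A

# "Noise & Zeros" init: A is random noise, B is zeros, so Delta W starts at 0.
A = rng.normal(0, 1 / np.sqrt(d), size=(r, d))
B = np.zeros((d, r))
delta_w = scale * B @ A
assert np.linalg.norm(delta_w) == 0.0

# After one (hypothetical) gradient step dB, dA, the change of Delta W is
# scale * (dB @ A + B @ dA), whose Frobenius norm is capped by the factor
# norms -- this is the sense in which rank, scale, and init bound magnitude.
dB = rng.normal(0, 0.01, size=(d, r))
dA = rng.normal(0, 0.01, size=(r, d))
step = scale * (dB @ A + B @ dA)
bound = scale * (np.linalg.norm(dB) * np.linalg.norm(A)
                 + np.linalg.norm(B) * np.linalg.norm(dA))
assert np.linalg.norm(step) <= bound + 1e-9
```

With B = 0 the second term vanishes entirely, which is why the initialization choice so directly regulates how fast the adapted weights can move.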

AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework

Answering natural language (NL) questions about tables, known as Tabular Question Answering (TQA), is crucial because it allows users to quickly and efficiently extract meaningful insights from structured data, effectively bridging the gap between human language and machine-readable formats. Many of these tables are derived from web sources or real-world scenarios, which require meticulous data preparation (or data prep) to ensure accurate responses. However, preparing such tables for NL questions introduces new requirements that extend beyond traditional data preparation. This question-aware data preparation involves specific tasks such as column derivation and filtering tailored to particular questions, as well as question-aware value normalization or conversion, highlighting the need for a more nuanced approach in this context. Because each of the above tasks is unique, a single model (or agent) may not perform effectively across all scenarios. In this paper, we propose AutoPrep, a large language model (LLM)-based multi-agent framework that leverages the strengths of multiple agents, each specialized in a certain type of data prep, ensuring more accurate and contextually relevant responses. Given an NL question over a table, AutoPrep performs data prep through three key components. Planner: Determines a logical plan, outlining a sequence of high-level operations. Programmer: Translates this logical plan into a physical plan by generating the corresponding low-level code. Executor: Executes the generated code to process the table. To support this multi-agent framework, we design a novel Chain-of-Clauses reasoning mechanism for high-level operation suggestion, and a tool-augmented method for low-level code generation.

Updated: 2025-07-09 05:24:56

标题: AutoPrep: 使用多智能体框架进行自然语言问题感知数据准备

摘要: 回答关于表格的自然语言(NL)问题,即被称为表格问题回答(TQA),是至关重要的,因为它使用户能够快速有效地从结构化数据中提取有意义的见解,有效地弥合了人类语言与可机器读取格式之间的差距。许多这些表格来自于网络来源或现实世界场景,这些表格需要精心准备数据以确保准确的回答。然而,为NL问题准备这些表格引入了超越传统数据准备的新要求。这种问题感知数据准备涉及特定任务,如根据特定问题定制的列推导和过滤,以及问题感知值规范化或转换,凸显了在这种情境下需要更加细致的方法。由于上述每个任务都是独一无二的,单一模型(或代理)可能无法在所有场景中有效执行。在本文中,我们提出了AutoPrep,这是一种基于大型语言模型(LLM)的多代理框架,利用多个代理的优势,每个代理专门从事某种类型的数据准备,确保更准确和与上下文相关的回答。给定一个关于表格的NL问题,AutoPrep通过三个关键组件执行数据准备。规划者:确定一个逻辑计划,概述一系列高级操作的顺序。程序员:通过生成相应的低级代码,将这个逻辑计划转化为物理计划。执行器:执行生成的代码来处理表格。为支持这个多代理框架,我们设计了一个新颖的Chain-of-Clauses推理机制用于高级操作建议,以及一种工具增强方法用于低级代码生成。

更新时间: 2025-07-09 05:24:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2412.10422v4

On the Hardness of Unsupervised Domain Adaptation: Optimal Learners and Information-Theoretic Perspective

This paper studies the hardness of unsupervised domain adaptation (UDA) under covariate shift. We model the uncertainty that the learner faces by a distribution $\pi$ over the ground-truth triples $(p, q, f)$ -- which we call a UDA class -- where $(p, q)$ is the source--target distribution pair and $f$ is the classifier. We define the performance of a learner as the overall target domain risk, averaged over the randomness of the ground-truth triple. This formulation couples the source distribution, the target distribution and the classifier in the ground truth, and deviates from the classical worst-case analyses, which pessimistically emphasize the impact of hard but rare UDA instances. In this formulation, we precisely characterize the optimal learner. The performance of the optimal learner then allows us to define the learning difficulty for the UDA class and for the observed sample. To quantify this difficulty, we introduce an information-theoretic quantity -- Posterior Target Label Uncertainty (PTLU) -- along with its empirical estimate (EPTLU) from the sample, which capture the uncertainty in the prediction for the target domain. Briefly, PTLU is the entropy of the predicted label in the target domain under the posterior distribution of the ground-truth classifier given the observed source and target samples. By proving that such a quantity serves to lower-bound the risk of any learner, we suggest that these quantities can be used as proxies for evaluating the hardness of UDA learning. We provide several examples to demonstrate the advantage of PTLU, relative to the existing measures, in evaluating the difficulty of UDA learning.

Updated: 2025-07-09 05:11:19

标题: 关于无监督领域自适应的困难性:最佳学习者和信息论视角

摘要: 本文研究协变量偏移下无监督领域自适应(UDA)的困难性。我们用地面实况三元组$(p, q, f)$上的一个分布$\pi$来建模学习者面临的不确定性,并称之为UDA类,其中$(p, q)$是源-目标分布对,$f$是分类器。我们将学习者的表现定义为在地面实况三元组的随机性上取平均的整体目标域风险。这种表述将地面实况中的源分布、目标分布和分类器耦合在一起,偏离了经典的最坏情况分析,后者悲观地强调困难但罕见的UDA实例的影响。在这种表述下,我们精确刻画了最优学习者。最优学习者的表现使我们得以为UDA类和观测样本定义学习难度。为了量化这种难度,我们引入了一个信息论量,即后验目标标签不确定性(PTLU),及其从样本得到的经验估计(EPTLU),它们刻画了目标域预测中的不确定性。简而言之,PTLU是在给定观测到的源样本和目标样本时,地面实况分类器后验分布下目标域预测标签的熵。通过证明该量是任何学习者风险的下界,我们提出这些量可用作评估UDA学习困难性的代理。我们给出若干例子,展示PTLU相对于现有度量在评估UDA学习难度方面的优势。

更新时间: 2025-07-09 05:11:19

领域: stat.ML,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2507.06552v1
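The PTLU quantity described above can be sketched for a finite set of candidate classifiers. This is a toy reduction under stated assumptions: the posterior over classifiers and the per-classifier label distributions are given directly, whereas the paper derives them from the observed source and target samples.

```python
import numpy as np

def ptlu(posterior_probs, label_dists):
    """Posterior Target Label Uncertainty, sketched for a finite classifier set.

    posterior_probs[i]: posterior weight of classifier f_i given the samples.
    label_dists[i]: f_i's predicted label distribution on the target domain.
    PTLU = entropy (in bits) of the posterior-averaged predicted label.
    """
    p = np.average(np.asarray(label_dists), axis=0, weights=posterior_probs)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Two candidate classifiers that agree -> no uncertainty about target labels.
agree = np.array([[1.0, 0.0], [1.0, 0.0]])
assert ptlu([0.5, 0.5], agree) == 0.0

# Two equally plausible classifiers that disagree completely -> one full bit.
disagree = np.array([[1.0, 0.0], [0.0, 1.0]])
assert abs(ptlu([0.5, 0.5], disagree) - 1.0) < 1e-9
```

The second case is exactly the regime the abstract calls hard: the data leave the ground-truth classifier ambiguous, so every learner's target risk is bounded below.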

Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery

Quantitative Structure-Activity Relationship (QSAR) modeling is a cornerstone of computational drug discovery. This research demonstrates the successful application of a Quantum Multiple Kernel Learning (QMKL) framework to enhance QSAR classification, showing a notable performance improvement over classical methods. We apply this methodology to a dataset for identifying DYRK1A kinase inhibitors. The workflow involves converting SMILES representations into numerical molecular descriptors, reducing dimensionality via Principal Component Analysis (PCA), and employing a Support Vector Machine (SVM) trained on an optimized combination of multiple quantum and classical kernels. By benchmarking the QMKL-SVM against a classical Gradient Boosting model, we show that the quantum-enhanced approach achieves a superior AUC score, highlighting its potential to provide a quantum advantage in challenging cheminformatics classification tasks.

Updated: 2025-07-09 05:09:16

标题: Q2SAR:一种用于药物发现的量子多核学习方法

摘要: 定量结构-活性关系(QSAR)建模是计算药物发现的基石。本研究展示了量子多核学习(QMKL)框架成功应用于增强QSAR分类的效果,表现出明显的性能改进,超越了传统方法。我们将这种方法应用于一个用于识别DYRK1A激酶抑制剂的数据集。工作流程包括将SMILES表示转换为数值分子描述符,通过主成分分析(PCA)降低维度,并应用在经过优化的多个量子和经典内核组合上训练的支持向量机(SVM)。通过将QMKL-SVM与经典梯度提升模型进行基准测试,我们展示了量子增强方法实现了更优越的AUC分数,突显了其在具有挑战性的化学信息分类任务中提供量子优势的潜力。

更新时间: 2025-07-09 05:09:16

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2506.14920v3
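The multiple-kernel combination at the heart of QMKL can be sketched in a few lines. The "quantum" kernel here is a classical stand-in for a fidelity kernel |⟨φ(x)|φ(y)⟩|², and the mixture weights are fixed rather than optimized; both are assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # Classical RBF Gram matrix.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fidelity_kernel(X):
    # Toy stand-in for a quantum fidelity kernel: encode each point as an
    # L2-normalised "state" and square the overlap.
    S = X / np.linalg.norm(X, axis=1, keepdims=True)
    return (S @ S.T) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))          # 20 molecules, 4 PCA-reduced descriptors
weights = [0.7, 0.3]                  # learned in real MKL; fixed in this sketch
K = weights[0] * rbf_kernel(X) + weights[1] * fidelity_kernel(X)

# A non-negative combination of valid kernels is itself a valid (PSD) kernel,
# so K can be passed to any SVM that accepts a precomputed Gram matrix.
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-8
```

The elementwise square in `fidelity_kernel` stays PSD by the Schur product theorem, which is what makes the combined Gram matrix safe to feed into an SVM.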

Deep-Learning-Based Pre-Layout Parasitic Capacitance Prediction on SRAM Designs

To achieve higher system energy efficiency, SRAM in SoCs is often customized. The parasitic effects cause notable discrepancies between pre-layout and post-layout circuit simulations, leading to difficulty in converging design parameters and excessive design iterations. Is it possible to predict the parasitics well from the pre-layout circuit, so as to perform parasitic-aware pre-layout simulation? In this work, we propose a deep-learning-based 2-stage model to accurately predict these parasitics in pre-layout stages. The model combines a Graph Neural Network (GNN) classifier and Multi-Layer Perceptron (MLP) regressors, effectively managing the class imbalance of the net parasitics in SRAM circuits. We also employ Focal Loss to mitigate the impact of abundant internal net samples and integrate subcircuit information into the graph to abstract the hierarchical structure of schematics. Experiments on 4 real SRAM designs show that our approach not only surpasses the state-of-the-art model in parasitic prediction, with up to a 19X reduction in error, but also significantly boosts the simulation process by up to a 598X speedup.

Updated: 2025-07-09 05:05:08

标题: 基于深度学习的SRAM设计预布局寄生电容预测

摘要: 为了实现更高的系统能效,SoC中的SRAM通常会进行定制。寄生效应导致预布局和后布局电路模拟之间存在明显差异,导致设计参数难以收敛和设计迭代过多。是否可能基于预布局电路很好地预测寄生效应,从而进行寄生感知的预布局模拟?在这项工作中,我们提出了一个基于深度学习的2阶段模型,可以准确预测预布局阶段的这些寄生效应。该模型结合了图神经网络(GNN)分类器和多层感知器(MLP)回归器,有效地处理了SRAM电路中净寄生效应的类别不平衡。我们还采用了Focal Loss来减轻大量内部净样本的影响,并将子电路信息整合到图中,以抽象出原理图的层次结构。对4个真实的SRAM设计进行的实验表明,我们的方法不仅在最大误差降低了19倍的情况下超过了最先进的模型在寄生预测方面,而且还将模拟过程显著加快了最多598倍。

更新时间: 2025-07-09 05:05:08

领域: cs.LG,cs.AR,cs.SY,eess.SY

下载: http://arxiv.org/abs/2507.06549v1
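The Focal Loss mentioned above is a standard remedy for class imbalance; a minimal binary sketch shows why it down-weights the abundant easy samples (the probabilities below are hypothetical, not from the paper's data).

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    # Binary focal loss FL(p_t) = -(1 - p_t)^gamma * log(p_t): the modulating
    # factor (1 - p_t)^gamma shrinks the loss of well-classified samples.
    p_t = np.where(y == 1, p, 1.0 - p)
    return -((1.0 - p_t) ** gamma) * np.log(np.clip(p_t, 1e-12, None))

# A confidently correct sample contributes almost nothing, while a hard
# sample keeps most of its cross-entropy weight.
easy = focal_loss(np.array([0.95]), np.array([1]))[0]
hard = focal_loss(np.array([0.30]), np.array([1]))[0]
ce_easy = -np.log(0.95)        # plain cross-entropy on the easy sample
assert easy < 0.01 * ce_easy   # focal term suppresses the easy sample ~400x
assert hard > easy
```

In the paper's setting, the abundant internal nets play the role of the easy samples, letting training focus on the rarer nets whose parasitics are hard to predict.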

Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution

While diffusion models excel at image generation, their growing adoption raises critical concerns around copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall short in isolating contributions to specific elements, such as styles or objects, that matter most to stakeholders. To bridge this gap, we introduce concept-level attribution via a novel method called Concept-TRAK. Concept-TRAK extends influence functions with two key innovations: (1) a reformulated diffusion training loss based on diffusion posterior sampling, enabling robust, sample-specific attribution; and (2) a concept-aware reward function that emphasizes semantic relevance. We evaluate Concept-TRAK on the AbC benchmark, showing substantial improvements over prior methods. Through diverse case studies -- ranging from identifying IP-protected and unsafe content to analyzing prompt engineering and compositional learning -- we demonstrate how concept-level attribution yields actionable insights for responsible generative AI development and governance.

Updated: 2025-07-09 05:03:57

标题: Concept-TRAK:通过概念级归因理解扩散模型如何学习概念

摘要: 尽管扩散模型在图像生成方面表现出色,但它们日益普及引起了关于版权问题和模型透明度的关键关注。现有的归因方法识别影响整个图像的训练示例,但在分离对特定元素(如样式或对象)的贡献方面仍有不足,这些对于利益相关者最为重要。为了弥补这一差距,我们引入了一种名为Concept-TRAK的新颖方法,通过\emph{概念级归因}。Concept-TRAK通过两个关键创新扩展了影响函数:(1)基于扩散后验采样的重构扩散训练损失,实现了稳健的、针对样本的归因;以及(2)强调语义相关性的概念感知奖励函数。我们在AbC基准上评估了Concept-TRAK,显示出相对于先前方法的显著改进。通过多样化的案例研究--从识别受知识产权保护和不安全内容到分析提示工程和组合学习--我们展示了概念级归因如何为负责任的生成式人工智能开发和治理提供可操作的见解。

更新时间: 2025-07-09 05:03:57

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.06547v1

Stepwise functional refoundation of relational concept analysis

Relational concept analysis (RCA) is an extension of formal concept analysis that makes it possible to deal with several related contexts simultaneously. It has been designed for learning description logic theories from data and used within various applications. A puzzling observation about RCA is that it returns a single family of concept lattices although, when the data feature circular dependencies, other solutions may be considered acceptable. The semantics of RCA, provided in an operational way, does not shed light on this issue. In this report, we define these acceptable solutions as those families of concept lattices which belong to the space determined by the initial contexts (well-formed), cannot scale new attributes (saturated), and refer only to concepts of the family (self-supported). We adopt a functional view on the RCA process by defining the space of well-formed solutions and two functions on that space: one expansive and the other contractive. We show that the acceptable solutions are the common fixed points of both functions. This is achieved step-by-step by starting from a minimal version of RCA that considers only one single context defined on a space of contexts and a space of lattices. These spaces are then joined into a single space of context-lattice pairs, which is further extended to a space of indexed families of context-lattice pairs representing the objects manipulated by RCA. We show that RCA returns the least element of the set of acceptable solutions. In addition, it is possible to dually build an operation that generates its greatest element. The set of acceptable solutions is a complete sublattice of the interval between these two elements. Its structure and how the defined functions traverse it are studied in detail.

Updated: 2025-07-09 05:01:42

标题: 关系概念分析的逐步函数式重构

摘要: 关系概念分析(RCA)是形式概念分析的扩展,允许同时处理几个相关的上下文。它被设计用于从数据中学习描述逻辑理论,并在各种应用中使用。关于RCA的一个令人困惑的观察是,尽管当数据特征存在循环依赖时,可能会考虑其他解决方案,但它返回一个单一的概念格家族。以操作方式提供的RCA语义并未揭示这个问题。在本报告中,我们将这些可接受的解决方案定义为属于初始上下文确定的空间的那些概念格家族(格式正确),不能扩展新属性(饱和),且仅涉及该家族的概念(自支持)。我们采用功能视角定义RCA过程中格式正确解决方案的空间,并在该空间上定义两个函数:一个扩张函数和一个收缩函数。我们证明了可接受的解决方案是这两个函数的共同不动点。通过逐步从仅考虑在上下文空间和格子空间上定义的单一上下文的最小版本的RCA开始,我们实现了这一点。然后将这些空间连接成一个上下文-格子对的单一空间,进一步扩展为表示RCA操作对象的索引家族上下文-格子对的空间。我们展示了RCA返回了可接受解决方案集合的最小元素。此外,也可以对偶地构建一个生成其最大元素的操作。可接受解决方案集合是介于这两个元素之间的区间的完整子格。其结构以及定义的函数如何遍历它们都被详细研究。

更新时间: 2025-07-09 05:01:42

领域: cs.AI

下载: http://arxiv.org/abs/2310.06441v4

Semantic Augmentation in Images using Language

Deep Learning models are incredibly data-hungry and require very large labeled datasets for supervised learning. As a consequence, these models often suffer from overfitting, limiting their ability to generalize to real-world examples. Recent advancements in diffusion models have enabled the generation of photorealistic images based on textual inputs. Leveraging the substantial datasets used to train these diffusion models, we propose a technique to utilize generated images to augment existing datasets. This paper explores various strategies for effective data augmentation to improve the out-of-domain generalization capabilities of deep learning models.

Updated: 2025-07-09 05:00:43

标题: 使用语言进行图像的语义增强

摘要: 深度学习模型对数据需求巨大,需要非常大的带标签数据集进行监督学习。因此,这些模型经常受到过拟合的困扰,限制了它们推广到真实世界示例的能力。最近扩散模型的进步使得基于文本输入生成逼真的图像成为可能。利用用于训练这些扩散模型的大量数据集,我们提出了一种利用生成图像来增强现有数据集的技术。本文探讨了各种有效的数据增强策略,以提高深度学习模型的领域外泛化能力。

更新时间: 2025-07-09 05:00:43

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.02353v2

A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning

Decentralized learning provides a scalable alternative to traditional parameter-server-based training, yet its performance is often hindered by limited peer-to-peer communication. In this paper, we study how communication should be scheduled over time, including determining when and how frequently devices synchronize. Our empirical results show that concentrating communication budgets in the later stages of decentralized training markedly improves global generalization. Surprisingly, we uncover that fully connected communication at the final step, implemented by a single global merging, is sufficient to match the performance of server-based training. We further show that low communication in decentralized learning preserves the mergeability of local models throughout training. Our theoretical contributions, which explain these phenomena, are the first to establish that the globally merged model of decentralized SGD can converge faster than centralized mini-batch SGD. Technically, we reinterpret part of the discrepancy among local models, which was previously considered detrimental noise, as constructive components that accelerate convergence. This work challenges the common belief that decentralized learning generalizes poorly under data heterogeneity and limited communication, while offering new insights into model merging and neural network loss landscapes.

Updated: 2025-07-09 04:56:56

标题: 一次合并就足够:恢复分散学习中基于服务器的学习性能

摘要: 分散式学习为传统的基于参数服务器的训练提供了一种可扩展的替代方案,但其性能往往受到有限的点对点通信的制约。在本文中,我们研究了通信应如何随时间安排,包括确定设备何时以及以何种频率进行同步。我们的实证结果表明,将通信预算集中在分散式训练的后期能显著提高全局泛化能力。令人惊讶的是,我们发现在最后一步进行完全连接的通信,即通过一次全局合并实现,就足以匹配基于服务器训练的性能。我们进一步证明,分散式学习中的低通信在整个训练过程中保持了本地模型的可合并性。我们的理论贡献解释了这些现象,并首次证明分散式SGD的全局合并模型可以比集中式mini-batch SGD收敛得更快。在技术上,我们新颖地将本地模型间差异的一部分(以前被视为有害噪声)重新解释为加速收敛的建设性成分。这项工作挑战了分散式学习在数据异质性和通信受限下泛化能力差的普遍看法,同时为模型合并和神经网络损失地形提供了新的见解。

更新时间: 2025-07-09 04:56:56

领域: cs.LG,cs.DC,cs.MA,stat.ML

下载: http://arxiv.org/abs/2507.06542v1
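The "single global merging" can be sketched with a toy linear-regression analogue: each device trains on its own heterogeneous shard, and one parameter average at the end recovers a model at least as good as the worst local one. This linear setting is an assumption for illustration; the paper's claim concerns deep networks trained with decentralized SGD.

```python
import numpy as np

true_w = np.array([2.0, -1.0])

# Each device fits the same linear model on its own heterogeneous data shard.
local_models = []
for seed in range(8):
    r = np.random.default_rng(seed)
    X = r.normal(loc=seed % 3 - 1, size=(50, 2))   # shifted inputs per device
    y = X @ true_w + r.normal(scale=0.1, size=50)
    w = np.linalg.lstsq(X, y, rcond=None)[0]       # purely local training
    local_models.append(w)

# "A single merging suffices": one global parameter average at the final step.
merged = np.mean(local_models, axis=0)
errors = [np.linalg.norm(w - true_w) for w in local_models]

# By convexity of the norm, the merged model's error never exceeds the worst
# local error, and here the local discrepancies partially cancel.
assert np.linalg.norm(merged - true_w) <= max(errors) + 1e-12
```

The interesting part of the paper is the stronger statement: the local discrepancies are not just averaged away but can actively accelerate convergence.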

Attribution Regularization for Multimodal Paradigms

Multimodal machine learning has gained significant attention in recent years due to its potential for integrating information from multiple modalities to enhance learning and decision-making processes. However, it is commonly observed that unimodal models outperform multimodal models, despite the latter having access to richer information. Additionally, the influence of a single modality often dominates the decision-making process, resulting in suboptimal performance. This research project aims to address these challenges by proposing a novel regularization term that encourages multimodal models to effectively utilize information from all modalities when making decisions. The focus of this project lies in the video-audio domain, although the proposed regularization technique holds promise for broader applications in embodied AI research, where multiple modalities are involved. By leveraging this regularization term, the proposed approach aims to mitigate the issue of unimodal dominance and improve the performance of multimodal machine learning systems. Through extensive experimentation and evaluation, the effectiveness and generalizability of the proposed technique will be assessed. The findings of this research project have the potential to significantly contribute to the advancement of multimodal machine learning and facilitate its application in various domains, including multimedia analysis, human-computer interaction, and embodied AI research.

Updated: 2025-07-09 04:52:34

标题: 多模态范式的归因正则化

摘要: 多模态机器学习近年来受到广泛关注,因为它有潜力整合来自多种模态的信息,以增强学习和决策过程。然而,人们常常观察到,尽管多模态模型可以获取更丰富的信息,单模态模型的性能却优于多模态模型。此外,单一模态的影响通常主导决策过程,导致性能欠佳。本研究项目旨在通过提出一种新颖的正则化项来应对这些挑战,鼓励多模态模型在决策时有效利用所有模态的信息。该项目的重点在于视频-音频领域,但所提出的正则化技术有望更广泛地应用于涉及多种模态的具身人工智能研究。通过利用这一正则化项,所提出的方法旨在缓解单模态主导的问题,并提高多模态机器学习系统的性能。我们将通过广泛的实验和评估来检验所提出技术的有效性和泛化能力。这个研究项目的发现有望显著推动多模态机器学习的发展,并促进其在多媒体分析、人机交互和具身人工智能研究等多个领域的应用。

更新时间: 2025-07-09 04:52:34

领域: cs.LG

下载: http://arxiv.org/abs/2404.02359v2

Graph-based Fake Account Detection: A Survey

In recent years, there has been a growing effort to develop effective and efficient algorithms for fake account detection in online social networks. This survey comprehensively reviews existing methods, with a focus on graph-based techniques that utilise topological features of social graphs (in addition to account information, such as their shared contents and profile data) to distinguish between fake and real accounts. We provide several categorisations of these methods (for example, based on techniques used, input data, and detection time), discuss their strengths and limitations, and explain how these methods connect in the broader context. We also investigate the available datasets, including both real-world data and synthesised models. We conclude the paper by proposing several potential avenues for future research.

Updated: 2025-07-09 04:52:15

标题: 基于图的虚假账号检测:一项调查

摘要: 近年来,人们越来越努力地开发有效和高效的算法,用于在线社交网络中伪造账户的检测。本调查全面审查了现有方法,重点关注利用社交图的拓扑特征的基于图的技术(除了账户信息外,例如它们共享的内容和个人资料数据)来区分假账户和真实账户。我们对这些方法进行了几种分类(例如,基于使用的技术、输入数据和检测时间),讨论了它们的优点和局限性,并解释了这些方法如何在更广泛的背景下相互关联。我们还调查了可用的数据集,包括真实世界数据和合成模型。最后,我们通过提出几个未来研究的潜在途径来总结这篇论文。

更新时间: 2025-07-09 04:52:15

领域: cs.SI,cs.AI,cs.LG,A.1; I.2.6; I.5.1

下载: http://arxiv.org/abs/2507.06541v1

Understanding Malware Propagation Dynamics through Scientific Machine Learning

Accurately modeling malware propagation is essential for designing effective cybersecurity defenses, particularly against adaptive threats that evolve in real time. While traditional epidemiological models and recent neural approaches offer useful foundations, they often fail to fully capture the nonlinear feedback mechanisms present in real-world networks. In this work, we apply scientific machine learning to malware modeling by evaluating three approaches: classical Ordinary Differential Equations (ODEs), Universal Differential Equations (UDEs), and Neural ODEs. Using data from the Code Red worm outbreak, we show that the UDE approach substantially reduces prediction error compared to both traditional and neural baselines by 44%, while preserving interpretability. We introduce a symbolic recovery method that transforms the learned neural feedback into explicit mathematical expressions, revealing suppression mechanisms such as network saturation, security response, and malware variant evolution. Our results demonstrate that hybrid physics-informed models can outperform both purely analytical and purely neural approaches, offering improved predictive accuracy and deeper insight into the dynamics of malware spread. These findings support the development of early warning systems, efficient outbreak response strategies, and targeted cyber defense interventions.

Updated: 2025-07-09 04:49:23

标题: 通过科学机器学习理解恶意软件传播动态

摘要: 精确建模恶意软件传播对于设计有效的网络安全防御至关重要,尤其是针对实时演变的自适应威胁。传统的流行病学模型和最近的神经网络方法提供了有用的基础,但它们通常未能完全捕捉现实世界网络中存在的非线性反馈机制。在这项工作中,我们将科学机器学习应用于恶意软件建模,评估了三种方法:经典的普通微分方程(ODEs)、通用微分方程(UDEs)和神经ODEs。利用Code Red蠕虫爆发的数据,我们展示了UDE方法相比传统和神经基线能够将预测误差减少了44%,同时保持了可解释性。我们引入了一种符号恢复方法,将学习到的神经反馈转化为明确的数学表达式,揭示了网络饱和、安全响应和恶意软件变体演化等抑制机制。我们的结果表明,混合物理信息模型可以胜过纯分析和纯神经方法,提供了改进的预测准确性和对恶意软件传播动态的更深入了解。这些发现支持早期预警系统、高效的爆发响应策略和有针对性的网络防御干预的发展。

更新时间: 2025-07-09 04:49:23

领域: cs.LG

下载: http://arxiv.org/abs/2507.07143v1
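The UDE idea above, a mechanistic ODE plus a learned correction term, can be sketched with a toy SI-style worm model. The parameters and the suppression term below are hypothetical stand-ins: in the paper the correction is a trained neural network fitted to Code Red data, not a hand-written function.

```python
import numpy as np

def simulate(beta, n_hosts, i0, hours, correction=None):
    """Euler-integrated SI worm model: dI/dt = beta * I * (N - I) + g(I).

    `correction` plays the role of the UDE's learned neural term; here it is
    a hypothetical defender-response function.
    """
    i = float(i0)
    traj = [i]
    dt = 0.1
    for _ in range(int(hours / dt)):
        di = beta * i * (n_hosts - i)          # epidemic (logistic) spread
        if correction is not None:
            di += correction(i)                # learned feedback term
        i = max(0.0, min(n_hosts, i + dt * di))
        traj.append(i)
    return np.array(traj)

N = 350_000  # order of magnitude of the Code Red victim population
base = simulate(beta=4e-6, n_hosts=N, i0=1.0, hours=6)
# Hypothetical suppression: defenders patch proportionally to infections.
patched = simulate(beta=4e-6, n_hosts=N, i0=1.0, hours=6,
                   correction=lambda i: -0.5 * i)
assert base[-1] > patched[-1]   # the correction term slows the outbreak
```

Symbolic recovery, as in the paper, would then try to read an explicit expression like `-0.5 * I` back out of the trained correction network.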

Geo-Registration of Terrestrial LiDAR Point Clouds with Satellite Images without GNSS

Accurate geo-registration of LiDAR point clouds presents significant challenges in GNSS signal denied urban areas with high-rise buildings and bridges. Existing methods typically rely on real-time GNSS and IMU data, which require pre-calibration and assume stable positioning during data collection. However, this assumption often fails in dense urban areas, resulting in localization errors. To address this, we propose a structured geo-registration and spatial correction method that aligns 3D point clouds with satellite images, enabling frame-wise recovery of GNSS information and reconstruction of city scale 3D maps without relying on prior localization. The proposed approach employs a pre-trained Point Transformer model to segment the road points and then extracts the road skeleton and intersection points from the point cloud as well as the target map for alignment. Global rigid alignment of the two is performed using the intersection points, followed by local refinement using radial basis function (RBF) interpolation. Elevation correction is then applied to the point cloud based on terrain information from the SRTM dataset to resolve vertical discrepancies. The proposed method was tested on the popular KITTI benchmark and a locally collected Perth (Western Australia) CBD dataset. On the KITTI dataset, our method achieved an average planimetric alignment standard deviation (STD) of 0.84 m across sequences with intersections, representing a 55.3% improvement over the original dataset. On the Perth dataset, which lacks GNSS information, our method achieved an average STD of 0.96 m compared to the GPS data extracted from the Google Maps API. This corresponds to a 77.4% improvement from the initial alignment. Our method also resulted in elevation correlation gains of 30.5% on the KITTI dataset and 50.4% on the Perth dataset.

Updated: 2025-07-09 04:44:50

标题: 使用卫星图像对地面LiDAR点云进行地理注册,无需GNSS

摘要: 激光雷达点云的精确地理注册在高层建筑和桥梁密集的GNSS信号受限城市地区面临重大挑战。现有方法通常依赖实时GNSS和IMU数据,需要预先校准并假定数据采集过程中位置稳定。然而,在密集城市地区,这种假设经常失败,导致定位误差。为解决这一问题,我们提出了一种结构化的地理注册和空间校正方法,通过将3D点云与卫星图像对齐,实现了GNSS信息的逐帧恢复和城市尺度3D地图的重建,而无需依赖先前的定位信息。所提出的方法利用预先训练的Point Transformer模型来分割道路点,然后从点云中提取道路骨架和交叉点以及目标地图进行对齐。通过使用交叉点进行全局刚性对齐,然后使用径向基函数(RBF)插值进行局部细化。然后基于SRTM数据集中的地形信息对点云进行高程校正以解决垂直差异。该方法在流行的KITTI基准数据集和本地收集的珀斯(西澳大利亚)CBD数据集上进行了测试。在KITTI数据集上,我们的方法在具有交叉点的序列中实现了平面对齐标准偏差(STD)的平均值为0.84 m,相比原始数据集提高了55.3%。在缺乏GNSS信息的珀斯数据集上,我们的方法与从Google Maps API提取的GPS数据相比,平均STD为0.96 m。这相当于初始对齐的改进77.4%。我们的方法还在KITTI数据集上实现了30.5%的高程相关增益,在珀斯数据集上实现了50.4%的高程相关增益。

更新时间: 2025-07-09 04:44:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.05999v2
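The global rigid alignment step on matched intersection points is a classic least-squares (Kabsch-style) problem; a 2D sketch with synthetic points follows. The point correspondences here are assumed given, whereas the paper must first extract and match intersections from the road skeleton.

```python
import numpy as np

def rigid_align_2d(src, dst):
    """Least-squares rigid (rotation + translation) fit of matched 2D points.

    In the paper's pipeline, `src` would be road intersection points from the
    LiDAR cloud and `dst` their matches on the satellite-derived map.
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)            # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # forbid reflections
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

rng = np.random.default_rng(0)
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
t_true = np.array([12.0, -7.0])
intersections = rng.uniform(0, 100, size=(6, 2))  # synthetic intersection points
observed = intersections @ R_true.T + t_true      # their map-frame positions

R, t = rigid_align_2d(intersections, observed)
aligned = intersections @ R.T + t
assert np.allclose(aligned, observed, atol=1e-6)  # exact recovery (noiseless)
```

The RBF refinement in the paper then warps away the residuals that a single global rotation and translation cannot explain.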

Few-shot Learning on AMS Circuits and Its Application to Parasitic Capacitance Prediction

Graph representation learning is a powerful method to extract features from graph-structured data, such as analog/mixed-signal (AMS) circuits. However, training deep learning models for AMS designs is severely limited by the scarcity of integrated circuit design data. In this work, we present CircuitGPS, a few-shot learning method for parasitic effect prediction in AMS circuits. The circuit netlist is represented as a heterogeneous graph, with the coupling capacitance modeled as a link. CircuitGPS is pre-trained on link prediction and fine-tuned on edge regression. The proposed method starts with a small-hop sampling technique that converts a link or a node into a subgraph. Then, the subgraph embeddings are learned with a hybrid graph Transformer. Additionally, CircuitGPS integrates a low-cost positional encoding that summarizes the positional and structural information of the sampled subgraph. CircuitGPS improves the accuracy of coupling existence prediction by at least 20% and reduces the MAE of capacitance estimation by at least 0.067 compared to existing methods. Our method demonstrates strong inherent scalability, enabling direct application to diverse AMS circuit designs through zero-shot learning. Furthermore, the ablation studies provide valuable insights into graph models for representation learning.

Updated: 2025-07-09 04:42:18

标题: AMS电路上的少样本学习及其在寄生电容预测中的应用

摘要: 图表示学习是从图结构数据中提取特征的强大方法,例如模拟/混合信号(AMS)电路。然而,针对AMS设计训练深度学习模型受到集成电路设计数据稀缺的严重限制。在这项工作中,我们提出了CircuitGPS,一种用于AMS电路中寄生效应预测的少样本学习方法。电路网表被表示为异构图,其中耦合电容被建模为一条连接边。CircuitGPS在连接预测上进行预训练,并在边回归上进行微调。所提出的方法从一种小跳数采样技术开始,将一条连接边或一个节点转换为一个子图。然后,使用混合图Transformer学习子图嵌入。此外,CircuitGPS集成了一种低成本的位置编码,概括了所采样子图的位置和结构信息。与现有方法相比,CircuitGPS将耦合存在性预测的准确率提高了至少20%,并将电容估计的MAE降低了至少0.067。我们的方法展示了强大的固有可扩展性,可通过零样本学习直接应用于多样的AMS电路设计。此外,消融研究为用于表示学习的图模型提供了有价值的见解。

更新时间: 2025-07-09 04:42:18

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2507.06538v1

DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation

Model quantization is a promising method for accelerating and compressing diffusion models. Nevertheless, since post-training quantization (PTQ) fails catastrophically at low-bit cases, quantization-aware training (QAT) is essential. Unfortunately, the wide range and time-varying activations in diffusion models sharply increase the complexity of quantization, making existing QAT methods inefficient. Equivalent scaling can effectively reduce the activation range, but previous methods leave the overall quantization error unchanged. More critically, these methods significantly disrupt the original weight distribution, resulting in poor weight initialization and challenging convergence during QAT training. In this paper, we propose a novel QAT framework for diffusion models, called DilateQuant. Specifically, we propose Weight Dilation (WD), which maximally dilates the unsaturated in-channel weights to a constrained range through equivalent scaling. WD decreases the activation range while preserving the original weight range, which steadily reduces the quantization error and ensures model convergence. To further enhance accuracy and efficiency, we design a Temporal Parallel Quantizer (TPQ) to address the time-varying activations and introduce a Block-wise Knowledge Distillation (BKD) to reduce resource consumption in training. Extensive experiments demonstrate that DilateQuant significantly outperforms existing methods in terms of accuracy and efficiency. Code is available at http://github.com/BienLuky/DilateQuant .

Updated: 2025-07-09 04:40:14

标题: DilateQuant: 通过权重扩张实现精确高效的扩散量化

摘要: 模型量化是加速和压缩扩散模型的一种有前途的方法。然而,由于后训练量化(PTQ)在低比特情况下会灾难性地失败,量化感知训练(QAT)是必不可少的。不幸的是,扩散模型中范围宽广且随时间变化的激活会显著增加量化的复杂性,使得现有的QAT方法效率低下。等效缩放可以有效地减小激活范围,但先前的方法使整体量化误差保持不变。更为关键的是,这些方法显著破坏了原始权重分布,导致权重初始化不佳,使QAT训练难以收敛。在本文中,我们提出了一种新颖的扩散模型QAT框架,称为DilateQuant。具体来说,我们提出了权重扩张(WD),通过等效缩放将未饱和的通道内权重最大程度地扩张到受限范围。WD减小了激活范围,同时保留了原始权重范围,稳定地减小了量化误差并确保模型收敛。为了进一步提高准确性和效率,我们设计了一个时间并行量化器(TPQ)来处理时变激活,并引入了块级知识蒸馏(BKD)来减少训练中的资源消耗。大量实验证明DilateQuant在准确性和效率方面明显优于现有方法。代码可在http://github.com/BienLuky/DilateQuant获取。

更新时间: 2025-07-09 04:40:14

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.14307v3

Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits

Graph representation learning on Analog-Mixed Signal (AMS) circuits is crucial for various downstream tasks, e.g., parasitic estimation. However, the scarcity of design data, the unbalanced distribution of labels, and the inherent diversity of circuit implementations pose significant challenges to learning robust and transferable circuit representations. To address these limitations, we propose CircuitGCL, a novel graph contrastive learning framework that integrates representation scattering and label rebalancing to enhance transferability across heterogeneous circuit graphs. CircuitGCL employs a self-supervised strategy to learn topology-invariant node embeddings through hyperspherical representation scattering, eliminating dependency on large-scale data. Simultaneously, balanced mean squared error (MSE) and softmax cross-entropy (bsmCE) losses are introduced to mitigate label distribution disparities between circuits, enabling robust and transferable parasitic estimation. Evaluated on parasitic capacitance estimation (edge-level task) and ground capacitance classification (node-level task) across TSMC 28nm AMS designs, CircuitGCL outperforms all state-of-the-art (SOTA) methods, with the $R^2$ improvement of $33.64\% \sim 44.20\%$ for edge regression and F1-score gain of $0.9\times \sim 2.1\times$ for node classification. Our code is available at \href{https://anonymous.4open.science/r/CircuitGCL-099B/README.md}{here}.
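The label-rebalancing idea can be sketched with a toy loss. The paper's exact balanced MSE is not specified in this abstract; the illustrative version below simply weights each sample by the inverse frequency of its label bin (the binning scheme and weighting are assumptions).

```python
# Illustrative label-rebalanced MSE (the paper's exact balanced MSE is not given
# in the abstract): each sample is weighted by the inverse frequency of its
# label bin, so errors on rare parasitic values are not drowned out.
from collections import Counter

def balanced_mse(preds, labels, n_bins=4):
    lo, hi = min(labels), max(labels)
    width = (hi - lo) / n_bins or 1.0          # guard against constant labels
    bins = [min(int((y - lo) / width), n_bins - 1) for y in labels]
    freq = Counter(bins)
    weights = [len(labels) / (n_bins * freq[b]) for b in bins]
    return sum(w * (p - y) ** 2
               for w, p, y in zip(weights, preds, labels)) / len(labels)

labels = [0.0, 0.0, 0.0, 0.0, 3.0]             # heavily imbalanced labels
preds  = [0.0, 0.0, 0.0, 0.0, 2.0]             # the only error is on the rare label
print(balanced_mse(preds, labels))             # 0.25, vs. a plain MSE of 0.2
```

The reweighting upweights the single error on the rare label, which is the qualitative effect the abstract attributes to its balanced losses.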

Updated: 2025-07-09 04:31:10

标题: 在AMS电路中通过图对比学习和标签再平衡实现可传递的寄生参数估计

摘要: 在模拟混合信号(AMS)电路上进行图表示学习对于各种下游任务至关重要,例如寄生参数估计。然而,设计数据稀缺、标签分布不均衡以及电路实现的固有多样性给学习稳健和可转移电路表示带来了重大挑战。为解决这些限制,我们提出了CircuitGCL,这是一个集成了表示散射和标签重新平衡的新型图对比学习框架,以增强异构电路图之间的可转移性。CircuitGCL采用自监督策略通过超球面表示散射来学习具有拓扑不变性的节点嵌入,消除了对大规模数据的依赖。同时,引入了平衡均方误差(MSE)和softmax交叉熵(bsmCE)损失来减轻电路之间标签分布不平衡,实现了稳健和可转移的寄生参数估计。在TSMC 28nm AMS设计中进行寄生电容估计(边级任务)和接地电容分类(节点级任务)的评估中,CircuitGCL优于所有最新技术方法,边缘回归的$R^2$改进为$33.64\% \sim 44.20\%$,节点分类的F1分数增益为$0.9\times \sim 2.1\times$。我们的代码可在此处获取:https://anonymous.4open.science/r/CircuitGCL-099B/README.md。

更新时间: 2025-07-09 04:31:10

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2507.06535v1

Medical Image Segmentation Using Advanced Unet: VMSE-Unet and VM-Unet CBAM+

In this paper, we present the VMSE U-Net and VM-Unet CBAM+ models, two cutting-edge deep learning architectures designed to enhance medical image segmentation. Our approach integrates Squeeze-and-Excitation (SE) and Convolutional Block Attention Module (CBAM) techniques into the traditional VM U-Net framework, significantly improving segmentation accuracy, feature localization, and computational efficiency. Both models show superior performance compared to the baseline VM-Unet across multiple datasets. Notably, VMSE-Unet achieves the highest accuracy, IoU, precision, and recall while maintaining low loss values. It also exhibits exceptional computational efficiency, with faster inference times and lower memory usage on both GPU and CPU. Overall, the study suggests that the enhanced VMSE-Unet architecture is a valuable tool for medical image analysis. These findings highlight its potential for real-world clinical applications, emphasizing the importance of further research to optimize accuracy, robustness, and computational efficiency.
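For readers unfamiliar with the SE mechanism being integrated here, a minimal sketch follows. The weights and shapes are invented for illustration; real SE blocks operate on convolutional feature maps, not flat lists.

```python
# Minimal Squeeze-and-Excitation sketch (weights and shapes invented; real SE
# blocks operate on convolutional feature maps): globally average each channel,
# pass through a small bottleneck, and rescale channels by the learned gates.
import math

def se_block(channels, w1, w2):
    squeezed = [sum(ch) / len(ch) for ch in channels]              # squeeze
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))  # ReLU FC
              for row in w1]
    gates = [1 / (1 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]                                        # sigmoid FC
    return [[v * g for v in ch] for ch, g in zip(channels, gates)] # excite

channels = [[1.0, 3.0], [0.0, 0.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]      # identity weights, for a readable example
out = se_block(channels, eye, eye)
print(out)
```

The key design idea is that channels are rescaled by a gate computed from their own global statistics, which is what lets the network emphasize informative feature channels.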

Updated: 2025-07-09 04:27:07

标题: 使用先进的Unet进行医学图像分割:VMSE-Unet和VM-Unet CBAM+

摘要: 在本文中,我们提出了VMSE U-Net和VM-Unet CBAM+模型,这是两种先进的深度学习架构,旨在增强医学图像分割。我们的方法将Squeeze-and-Excitation(SE)和Convolutional Block Attention Module(CBAM)技术整合到传统的VM U-Net框架中,显著提高了分割准确性、特征定位和计算效率。与基线VM-Unet相比,这两种模型在多个数据集上表现出更好的性能。值得注意的是,VMSE-Unet在保持低损失值的同时实现了最高的准确性、IoU、精度和召回率。它还展现出了优秀的计算效率,推断时间更快,GPU和CPU上的内存使用更低。总的来说,研究表明增强的架构VMSE-Unet是医学图像分析的有价值工具。这些发现突显了它在现实临床应用中的潜力,强调进一步研究以优化准确性、稳健性和计算效率的重要性。

更新时间: 2025-07-09 04:27:07

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.00511v2

From large-eddy simulations to deep learning: A U-net model for fast urban canopy flow predictions

Accurate prediction of wind flow fields in urban canopies is crucial for ensuring pedestrian comfort, safety, and sustainable urban design. Traditional methods using wind tunnels and Computational Fluid Dynamics, such as Large-Eddy Simulations (LES), are limited by high costs, computational demands, and time requirements. This study presents a deep neural network (DNN) approach for fast and accurate predictions of urban wind flow fields, reducing computation time from an order of 10 hours on 32 CPUs for one LES evaluation to an order of 1 second on a single GPU using the DNN model. We employ a U-Net architecture trained on LES data including 252 synthetic urban configurations at seven wind directions ($0^{\circ}$ to $90^{\circ}$ in $15^{\circ}$ increments). The model predicts two key quantities of interest: mean velocity magnitude and streamwise turbulence intensity, at multiple heights within the urban canopy. The U-Net uses 2D building representations augmented with signed distance functions and their gradients as inputs, forming a $256\times256\times9$ tensor. In addition, a Spatial Attention Module is used for feature transfer through skip connections. The loss function combines the root-mean-square error of predictions, their gradient magnitudes, and L2 regularization. Model evaluation on 50 test cases demonstrates high accuracy with an overall mean relative error of 9.3% for velocity magnitude and 5.2% for turbulence intensity. This research shows the potential of deep learning approaches to provide fast, accurate urban wind assessments essential for creating comfortable and safe urban environments. Code is available at https://github.com/tvarg/Urban-FlowUnet.git
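The signed-distance augmentation of the building mask can be sketched as follows. This is a brute-force toy on a tiny grid; the actual pipeline presumably uses an efficient distance transform at the full $256\times256$ resolution.

```python
# Brute-force signed distance on a tiny grid (illustration only; a real pipeline
# would use an efficient distance transform at full 256x256 resolution).
def signed_distance(mask):
    h, w = len(mask), len(mask[0])
    inside = [(i, j) for i in range(h) for j in range(w) if mask[i][j]]
    outside = [(i, j) for i in range(h) for j in range(w) if not mask[i][j]]
    def dist(p, pts):
        return min(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 for q in pts)
    return [[-dist((i, j), outside) if mask[i][j] else dist((i, j), inside)
             for j in range(w)] for i in range(h)]

mask = [[0, 0, 0],
        [0, 1, 0],   # a single "building" cell
        [0, 0, 0]]
sdf = signed_distance(mask)
print(sdf[1][1], sdf[0][1])   # -1.0 inside the building, 1.0 one cell away
```

Stacking such an SDF channel (and its spatial gradients) alongside the binary mask gives the network smooth geometric context rather than a hard 0/1 boundary.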

Updated: 2025-07-09 04:25:13

标题: 从大涡模拟到深度学习:一种用于快速城市冠层流预测的U-net模型

摘要: 城市冠层内风流场的准确预测对于确保行人舒适、安全和可持续的城市设计至关重要。使用风洞和计算流体力学(例如大涡模拟,LES)的传统方法受到高成本、计算需求和时间要求的限制。本研究提出了一种深度神经网络(DNN)方法,用于快速准确地预测城市风流场,将计算时间从一次LES评估在32个CPU上约10小时量级减少到使用DNN模型在单个GPU上约1秒量级。我们采用在七个风向($0^{\circ}$到$90^{\circ}$,每隔$15^{\circ}$)下252个合成城市配置的LES数据上训练的U-Net架构。该模型预测城市冠层内多个高度处的两个关键量:平均速度大小和流向湍流强度。U-Net使用经符号距离函数及其梯度增强的2D建筑表示作为输入,形成一个$256\times256\times9$张量。此外,通过跳跃连接使用空间注意力模块进行特征传递。损失函数结合了预测的均方根误差、其梯度大小和L2正则化。对50个测试案例的模型评估显示出高准确性,速度大小的整体平均相对误差为9.3%,湍流强度为5.2%。这项研究展示了深度学习方法在提供快速、准确的城市风评估方面的潜力,这对于创造舒适和安全的城市环境至关重要。代码可在https://github.com/tvarg/Urban-FlowUnet.git找到。

更新时间: 2025-07-09 04:25:13

领域: physics.comp-ph,cs.LG,physics.flu-dyn

下载: http://arxiv.org/abs/2507.06533v1

A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence

Policy gradient methods have become a staple of any single-agent reinforcement learning toolbox, due to their combination of desirable properties: iterate convergence, efficient use of stochastic trajectory feedback, and theoretically-sound avoidance of importance sampling corrections. In multi-agent imperfect-information settings (extensive-form games), however, it is still unknown whether the same desiderata can be guaranteed while retaining theoretical guarantees. Instead, sound methods for extensive-form games rely on approximating \emph{counterfactual} values (as opposed to Q values), which are incompatible with policy gradient methodologies. In this paper, we investigate whether policy gradient can be safely used in two-player zero-sum imperfect-information extensive-form games (EFGs). We establish positive results, showing for the first time that a policy gradient method leads to provable best-iterate convergence to a regularized Nash equilibrium in self-play.

Updated: 2025-07-09 04:23:39

标题: 一种求解不完美信息博弈并具有最佳迭代收敛性的策略梯度方法

摘要: 策略梯度方法已经成为任何单智能体强化学习工具箱的基本组成部分,这是由于它们具有一系列理想特性:迭代收敛、高效利用随机轨迹反馈,以及在理论上可靠地避免重要性采样修正。然而,在多智能体不完美信息设置(即扩展式博弈)中,仍然不清楚是否能在保留理论保证的同时实现同样的期望特性。相反,扩展式博弈的可靠方法依赖于近似\emph{反事实}值(而不是Q值),这与策略梯度方法不兼容。在本文中,我们研究了策略梯度方法是否可以安全地用于两人零和不完美信息扩展式博弈(EFGs)。我们得出了积极的结果,首次表明策略梯度方法在自我对弈中可以实现对正则化纳什均衡的可证明最佳迭代收敛。

更新时间: 2025-07-09 04:23:39

领域: cs.GT,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2408.00751v2

Integrating External Tools with Large Language Models to Improve Accuracy

This paper deals with improving querying large language models (LLMs). It is well-known that without relevant contextual information, LLMs can provide poor quality responses or tend to hallucinate. Several initiatives have proposed integrating LLMs with external tools to provide them with up-to-date data to improve accuracy. In this paper, we propose a framework to integrate external tools to enhance the capabilities of LLMs in answering queries in educational settings. Precisely, we develop a framework that allows accessing external APIs to request additional relevant information. Integrated tools can also provide computational capabilities such as calculators or calendars. The proposed framework has been evaluated using datasets from the Multi-Modal Language Understanding (MMLU) collection. The data consists of questions on mathematical and scientific reasoning. Results compared to state-of-the-art language models show that the proposed approach significantly improves performance. Our Athena framework achieves 83% accuracy in mathematical reasoning and 88% in scientific reasoning, substantially outperforming all tested models including GPT-4o, LLaMA-Large, Mistral-Large, Phi-Large, and GPT-3.5, with the best baseline model (LLaMA-Large) achieving only 67% and 79% respectively. These promising results open the way to creating complex computing ecosystems around LLMs to make their use more natural to support various tasks and activities.
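The tool-routing idea can be sketched with a toy dispatcher. Everything here (the `calc:` prefix, the tool registry, the fallback model stub) is invented for illustration and is not the authors' Athena API.

```python
# Toy tool dispatcher (the "calc:" prefix and tool registry are invented for
# illustration; this is not the authors' Athena API).
import re

TOOLS = {
    # A real system would sandbox or parse expressions instead of using eval.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def answer(query, llm=lambda q: "[model answer to: %s]" % q):
    """Route to an external tool when one matches, else fall back to the model."""
    m = re.fullmatch(r"calc:\s*(.+)", query)
    if m:
        return TOOLS["calculator"](m.group(1))
    return llm(query)

print(answer("calc: (3 + 4) * 2"))   # "14" via the tool path
print(answer("Define entropy."))     # falls through to the model
```

The design point is that the tool's exact answer replaces the model's free-form guess whenever a tool applies, which is where the accuracy gains reported above come from.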

Updated: 2025-07-09 04:09:59

标题: 整合外部工具与大型语言模型以提高准确性

摘要: 这篇论文涉及改进大型语言模型(LLMs)的查询。众所周知,如果缺乏相关的上下文信息,LLMs可能会提供质量较差的回应或倾向于产生幻觉。一些倡议已经提出将LLMs与外部工具整合,以提供最新数据来提高准确性。在本文中,我们提出了一个框架,将外部工具整合到LLMs中,以增强其在教育环境中回答查询的能力。确切地说,我们开发了一个框架,允许访问外部API来请求额外相关信息。集成的工具还可以提供计算功能,如计算器或日历。所提出的框架已经使用来自多模态语言理解(MMLU)集合的数据进行评估。数据包括关于数学和科学推理的问题。与最先进的语言模型相比,结果显示所提出的方法显著改善了性能。我们的Athena框架在数学推理方面达到了83%的准确率,在科学推理方面达到了88%,明显优于所有测试的模型,包括GPT-4o、LLaMA-Large、Mistral-Large、Phi-Large和GPT-3.5,最佳基准模型(LLaMA-Large)的准确率分别仅为67%和79%。这些有希望的结果为围绕LLMs创建复杂的计算生态系统打开了道路,使它们的使用更加自然,以支持各种任务和活动。

更新时间: 2025-07-09 04:09:59

领域: cs.CL,cs.AI,cs.LG,68T50,I.2.7; I.2.6

下载: http://arxiv.org/abs/2507.08034v1

Direct Regret Optimization in Bayesian Optimization

Bayesian optimization (BO) is a powerful paradigm for optimizing expensive black-box functions. Traditional BO methods typically rely on separate hand-crafted acquisition functions and surrogate models for the underlying function, and often operate in a myopic manner. In this paper, we propose a novel direct regret optimization approach that jointly learns the optimal model and non-myopic acquisition by distilling from a set of candidate models and acquisitions, and explicitly targets minimizing the multi-step regret. Our framework leverages an ensemble of Gaussian Processes (GPs) with varying hyperparameters to generate simulated BO trajectories, each guided by an acquisition function chosen from a pool of conventional choices, until a Bayesian early stop criterion is met. These simulated trajectories, capturing multi-step exploration strategies, are used to train an end-to-end decision transformer that directly learns to select next query points aimed at improving the ultimate objective. We further adopt a dense training--sparse learning paradigm: The decision transformer is trained offline with abundant simulated data sampled from ensemble GPs and acquisitions, while a limited number of real evaluations refine the GPs online. Experimental results on synthetic and real-world benchmarks suggest that our method consistently outperforms BO baselines, achieving lower simple regret and demonstrating more robust exploration in high-dimensional or noisy settings.
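The quantity being optimized, simple regret, is easy to state in code. The function, grid, and query points below are invented for illustration.

```python
# Toy statement of simple regret (function, grid, and query points invented):
# the gap between the global optimum and the best value found by the queries.
def simple_regret(f, queries, grid):
    return max(f(x) for x in grid) - max(f(x) for x in queries)

f = lambda x: -(x - 0.3) ** 2
grid = [i / 100 for i in range(101)]
print(simple_regret(f, [0.0, 0.5, 0.25], grid))   # ~0.0025: best query was 0.25
```

Training a decision transformer to pick the next query so that this multi-step gap shrinks, rather than greedily maximizing a hand-crafted acquisition, is the "direct regret optimization" framing of the abstract.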

Updated: 2025-07-09 04:09:58

标题: 在贝叶斯优化中的直接遗憾优化

摘要: 贝叶斯优化(BO)是优化昂贵黑盒函数的强大范式。传统的BO方法通常依赖于手工设计的采集函数和基础函数的代理模型,并且通常以短视的方式运行。在本文中,我们提出了一种新颖的直接遗憾优化方法,通过从一组候选模型和采集函数中提炼来联合学习最优模型和非短视的采集策略,并明确以最小化多步遗憾为目标。我们的框架利用具有不同超参数的高斯过程(GPs)集成来生成模拟的BO轨迹,每条轨迹由从一组常规选择中挑选的采集函数引导,直到满足贝叶斯早停准则。这些模拟轨迹捕捉了多步探索策略,用于训练一个端到端的决策Transformer,直接学习选择旨在改进最终目标的下一个查询点。我们进一步采用了稠密训练-稀疏学习范式:决策Transformer使用从集成GPs和采集函数中采样的丰富模拟数据进行离线训练,而有限数量的真实评估则在线细化GPs。对合成和真实世界基准的实验结果表明,我们的方法始终优于BO基线,实现了更低的简单遗憾,并在高维或嘈杂环境中展示了更稳健的探索。

更新时间: 2025-07-09 04:09:58

领域: cs.LG

下载: http://arxiv.org/abs/2507.06529v1

InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior

Aligning Large Language Models (LLMs) with investor decision-making processes under herd behavior is a critical challenge in behavioral finance, which grapples with a fundamental limitation: the scarcity of real-user data needed for Supervised Fine-Tuning (SFT). While SFT can bridge the gap between LLM outputs and human behavioral patterns, its reliance on massive authentic data imposes substantial collection costs and privacy risks. We propose InvestAlign, a novel framework that constructs high-quality SFT datasets by leveraging theoretical solutions to similar and simple optimal investment problems rather than complex scenarios. Our theoretical analysis demonstrates that training LLMs with InvestAlign-generated data achieves faster parameter convergence than using real-user data, suggesting superior learning efficiency. Furthermore, we develop InvestAgent, an LLM agent fine-tuned with InvestAlign, which demonstrates significantly closer alignment to real-user data than pre-SFT models in both simple and complex investment problems. This highlights our proposed InvestAlign as a promising approach with the potential to address complex optimal investment problems and align LLMs with investor decision-making processes under herd behavior. Our code is publicly available at https://github.com/thu-social-network-research-group/InvestAlign.

Updated: 2025-07-09 04:07:22

标题: InvestAlign:克服数据稀缺问题,在羊群行为下将大型语言模型与投资者决策过程对齐

摘要: 将大型语言模型(LLMs)与投资者决策过程在羊群行为下进行对齐是行为金融学中的一个关键挑战,该领域面临一个基本限制:缺乏进行监督微调(SFT)所需的真实用户数据。虽然SFT可以弥合LLM输出和人类行为模式之间的差距,但其依赖大量真实数据带来了巨大的收集成本和隐私风险。我们提出了InvestAlign,这是一个新颖的框架,通过利用类似和简单的最优投资问题的理论解决方案构建高质量的SFT数据集,而不是复杂的场景。我们的理论分析表明,使用InvestAlign生成的数据对LLMs进行训练比使用真实用户数据实现了更快的参数收敛,表明了更高的学习效率。此外,我们开发了InvestAgent,一种经过InvestAlign微调的LLM代理,这种代理在简单和复杂的投资问题中展现出比SFT前模型更接近真实用户数据的对齐。这凸显了我们提出的InvestAlign作为一种有潜力解决复杂最优投资问题并将LLMs与投资者决策过程在羊群行为下进行对齐的方法。我们的代码可以在https://github.com/thu-social-network-research-group/InvestAlign 上公开获取。

更新时间: 2025-07-09 04:07:22

领域: cs.CL,cs.AI,cs.ET,cs.LG

下载: http://arxiv.org/abs/2507.06528v1

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

Neural networks trained to solve modular arithmetic tasks exhibit grokking, a phenomenon where the test accuracy starts improving long after the model achieves 100% training accuracy in the training process. It is often taken as an example of "emergence", where model ability manifests sharply through a phase transition. In this work, we show that the phenomenon of grokking is not specific to neural networks nor to gradient descent-based optimization. Specifically, we show that this phenomenon occurs when learning modular arithmetic with Recursive Feature Machines (RFM), an iterative algorithm that uses the Average Gradient Outer Product (AGOP) to enable task-specific feature learning with general machine learning models. When used in conjunction with kernel machines, iterating RFM results in a fast transition from random, near zero, test accuracy to perfect test accuracy. This transition cannot be predicted from the training loss, which is identically zero, nor from the test loss, which remains constant in initial iterations. Instead, as we show, the transition is completely determined by feature learning: RFM gradually learns block-circulant features to solve modular arithmetic. Paralleling the results for RFM, we show that neural networks that solve modular arithmetic also learn block-circulant features. Furthermore, we present theoretical evidence that RFM uses such block-circulant features to implement the Fourier Multiplication Algorithm, which prior work posited as the generalizing solution neural networks learn on these tasks. Our results demonstrate that emergence can result purely from learning task-relevant features and is not specific to neural architectures nor gradient descent-based optimization methods. Furthermore, our work provides more evidence for AGOP as a key mechanism for feature learning in neural networks.
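The AGOP itself is simple to state in code. Below is a minimal sketch: for a fixed linear predictor $f(x) = w \cdot x$, the gradient is $w$ everywhere, so the AGOP reduces to the rank-one matrix $w w^{\top}$. RFM iterates this computation with a kernel predictor, which is not reproduced here.

```python
# Minimal AGOP sketch: average the outer product of the predictor's input
# gradient over the data. For a fixed linear f(x) = w . x the gradient is w
# everywhere, so the AGOP is exactly w w^T (RFM iterates this with a kernel
# predictor, which is not reproduced here).
def agop(grad_f, xs):
    gs = [grad_f(x) for x in xs]
    d = len(gs[0])
    return [[sum(g[i] * g[j] for g in gs) / len(gs) for j in range(d)]
            for i in range(d)]

w = [1.0, 2.0]
M = agop(lambda x: w, [[0.3, 0.7], [0.9, 0.1]])
print(M)   # [[1.0, 2.0], [2.0, 4.0]] == w w^T
```

In RFM the resulting matrix reweights the input coordinates of the kernel at each iteration, which is the feature-learning step the abstract credits with driving the grokking transition.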

Updated: 2025-07-09 04:06:02

标题: 非神经模型中的涌现:通过平均梯度外积顿悟模运算

摘要: 训练用于解决模运算任务的神经网络表现出顿悟(grokking)现象,即在模型在训练过程中达到100%训练准确率很久之后,测试准确率才开始提升。这通常被视为“涌现”(emergence)的一个例子,即模型能力通过相变而突然显现。在这项工作中,我们展示顿悟现象既不专属于神经网络,也不专属于基于梯度下降的优化。具体来说,我们展示了在使用递归特征机(RFM)学习模运算时也会出现这种现象,RFM是一种使用平均梯度外积(AGOP)的迭代算法,使通用机器学习模型能够学习特定任务的特征。当与核机器结合使用时,迭代RFM会导致测试准确率从随机的、接近零的水平迅速过渡到完美水平。这种过渡无法从训练损失(恒等于零)或测试损失(在初始迭代中保持恒定)中预测。相反,正如我们所展示的,这一过渡完全由特征学习决定:RFM逐渐学习块循环特征来解决模运算。与RFM的结果相呼应,我们还展示了解决模运算的神经网络同样学习块循环特征。此外,我们提供理论证据表明RFM使用这种块循环特征来实现傅里叶乘法算法,先前的研究认为这正是神经网络在这些任务上学到的泛化解。我们的结果表明,涌现现象可以纯粹源自学习与任务相关的特征,并非特定于神经结构或基于梯度下降的优化方法。此外,我们的工作为AGOP作为神经网络特征学习的关键机制提供了更多证据。

更新时间: 2025-07-09 04:06:02

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2407.20199v3

FinSphere, a Real-Time Stock Analysis Agent Powered by Instruction-Tuned LLMs and Domain Tools

Current financial large language models (FinLLMs) struggle with two critical limitations: the absence of objective evaluation metrics to assess the quality of stock analysis reports and a lack of depth in stock analysis, which impedes their ability to generate professional-grade insights. To address these challenges, this paper introduces FinSphere, a stock analysis agent, along with three major contributions: (1) AnalyScore, a systematic evaluation framework for assessing stock analysis quality, (2) Stocksis, a dataset curated by industry experts to enhance LLMs' stock analysis capabilities, and (3) FinSphere, an AI agent that can generate high-quality stock analysis reports in response to user queries. Experiments demonstrate that FinSphere achieves superior performance compared to both general and domain-specific LLMs, as well as existing agent-based systems, even when they are enhanced with real-time data access and few-shot guidance. The integrated framework, which combines real-time data feeds, quantitative tools, and an instruction-tuned LLM, yields substantial improvements in both analytical quality and practical applicability for real-world stock analysis.

Updated: 2025-07-09 03:59:49

标题: FinSphere,一款由指令调整的LLMs和领域工具驱动的实时股票分析代理

摘要: 目前金融领域的大型语言模型(FinLLMs)面临两个关键限制:缺乏客观评估指标来评估股票分析报告的质量,以及股票分析的深度不足,这影响了它们生成专业级见解的能力。为了解决这些挑战,本文介绍了FinSphere,一个股票分析代理,以及三个主要贡献:(1)AnalyScore,一个系统化评估框架,用于评估股票分析质量,(2)Stocksis,由行业专家策划的数据集,以增强LLMs的股票分析能力,以及(3)FinSphere,一个能够根据用户查询生成高质量股票分析报告的AI代理。实验证明,与通用和特定领域的LLMs以及现有基于代理的系统相比,即使它们通过实时数据访问和少量指导进行增强,FinSphere也实现了卓越的性能。集成框架结合了实时数据源、定量工具和经过指导调整的LLM,在真实股票分析中实现了分析质量和实际应用的显著改进。

更新时间: 2025-07-09 03:59:49

领域: cs.AI,cs.CL,cs.IR,q-fin.CP

下载: http://arxiv.org/abs/2501.12399v2

The Power of Regularization in Solving Extensive-Form Games

In this paper, we investigate the power of {\it regularization}, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of OMD for solving EFGs, with adaptive regularization can achieve a fast $\tilde O(1/T)$ {last-iterate convergence rate for the output of the algorithm} in terms of duality gap and distance to the set of Nash equilibrium (NE) without uniqueness assumption of the NE. Second, we show that regularized counterfactual regret minimization (\texttt{Reg-CFR}), with a variant of optimistic mirror descent algorithm as regret-minimizer, can achieve $O(1/T^{1/4})$ best-iterate, and $O(1/T^{3/4})$ average-iterate convergence rate for finding NE in EFGs. Finally, we show that \texttt{Reg-CFR} can achieve asymptotic last-iterate convergence, and optimal $O(1/T)$ average-iterate convergence rate, for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, they constitute the first last-iterate convergence results for CFR-type algorithms, while matching the state-of-the-art average-iterate convergence rate in finding NE for non-perturbed EFGs. We also provide numerical results to corroborate the advantages of our algorithms.
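The effect of payoff regularization on last-iterate behavior can be sketched on a tiny example. This is not any of the paper's algorithms (DOMD, Reg-CFR); it is entropy-regularized simultaneous mirror descent on matching pennies, where the regularization term damps the cycling of the unregularized dynamics and the last iterate approaches the uniform equilibrium.

```python
# Entropy-regularized simultaneous mirror descent on matching pennies (a toy,
# not the paper's DOMD or Reg-CFR): the tau * log term damps the cycling of
# unregularized dynamics, and the last iterate approaches the uniform
# equilibrium of the regularized game.
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    return [v / sum(e) for v in e]

def regularized_md(A, eta=0.1, tau=0.5, iters=2000):
    x, y = [0.9, 0.1], [0.2, 0.8]           # arbitrary interior starting point
    for _ in range(iters):
        Ay = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
        ATx = [sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]
        x, y = (softmax([math.log(x[i]) + eta * (Ay[i] - tau * math.log(x[i]))
                         for i in range(2)]),
                softmax([math.log(y[j]) - eta * (ATx[j] + tau * math.log(y[j]))
                         for j in range(2)]))
    return x, y

x, y = regularized_md([[1.0, -1.0], [-1.0, 1.0]])
print(x, y)   # both close to the uniform strategy [0.5, 0.5]
```

With `tau = 0`, the same iteration spirals outward instead of converging, which is the failure mode the paper's regularization-based guarantees rule out.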

Updated: 2025-07-09 03:58:26

标题: 正则化在求解扩展式博弈中的力量

摘要: 在这篇论文中,我们研究了正则化在求解扩展式博弈(EFGs)中的作用,正则化是强化学习和优化中常见的技术。我们提出了一系列基于正则化博弈收益函数的新算法,并建立了一组收敛结果,这些结果在更弱的假设或更强的收敛保证下严格改进了现有结果。具体而言,我们首先表明,扩张乐观镜像下降(DOMD)作为一种用于求解EFGs的高效OMD变体,通过自适应正则化可以使算法输出在对偶间隙和到纳什均衡(NE)集合的距离意义下实现快速的$\tilde O(1/T)$最终迭代收敛速率,而无需假设NE的唯一性。其次,我们表明,以乐观镜像下降算法的一个变体作为遗憾最小化器的正则化反事实遗憾最小化(Reg-CFR),可以在EFGs中以$O(1/T^{1/4})$最佳迭代和$O(1/T^{3/4})$平均迭代收敛速率找到NE。最后,我们展示了Reg-CFR在寻找扰动EFGs的NE时可以实现渐近最终迭代收敛和最优的$O(1/T)$平均迭代收敛速率,这对于寻找近似的扩展式完美均衡(EFPE)非常有用。据我们所知,这些是CFR类算法的首个最终迭代收敛结果,同时与在非扰动EFGs中寻找NE的最新平均迭代收敛速率相匹配。我们还提供了数值结果来证实我们算法的优势。

更新时间: 2025-07-09 03:58:26

领域: cs.GT,cs.LG

下载: http://arxiv.org/abs/2206.09495v3

Hespi: A pipeline for automatically detecting information from herbarium specimen sheets

Specimen-associated biodiversity data are crucial for biological, environmental, and conservation sciences. A rate shift is needed to extract data from specimen images efficiently, moving beyond human-mediated transcription. We developed `Hespi' (HErbarium Specimen sheet PIpeline) using advanced computer vision techniques to extract pre-catalogue data from primary specimen labels on herbarium specimens. Hespi integrates two object detection models: one for detecting the components of the sheet and another for fields on the primary specimen label. It classifies labels as printed, typed, handwritten, or mixed and uses Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) for extraction. The text is then corrected against authoritative taxon databases and refined using a multimodal Large Language Model (LLM). Hespi accurately detects and extracts text from specimen sheets across international herbaria, and its modular design allows users to train and integrate custom models.

Updated: 2025-07-09 03:54:32

标题: Hespi:一个从植物标本表格中自动检测信息的流水线

摘要: 标本相关的生物多样性数据对生物学、环境和保护科学至关重要。需要实现速率上的跃升,以超越人工转录、高效地从标本图像中提取数据。我们开发了`Hespi'(HErbarium Specimen sheet PIpeline,植物标本表格流水线),利用先进的计算机视觉技术从植物标本的主要标本标签中提取预编目数据。Hespi集成了两个目标检测模型:一个用于检测表格的组件,另一个用于主要标本标签上的字段。它将标签分类为打印、打字、手写或混合,并使用光学字符识别(OCR)和手写文本识别(HTR)进行提取。随后将提取的文本与权威分类数据库进行校对,并使用多模态大型语言模型(LLM)进行精细化。Hespi能够准确地检测和提取国际植物标本馆标本表格中的文本,其模块化设计允许用户训练和集成定制模型。

更新时间: 2025-07-09 03:54:32

领域: cs.CV,cs.AI,cs.IR

下载: http://arxiv.org/abs/2410.08740v2

AdaDPIGU: Differentially Private SGD with Adaptive Clipping and Importance-Based Gradient Updates for Deep Neural Networks

Differential privacy has been proven effective for stochastic gradient descent; however, existing methods often suffer from performance degradation in high-dimensional settings, as the scale of injected noise increases with dimensionality. To tackle this challenge, we propose AdaDPIGU--a new differentially private SGD framework with importance-based gradient updates tailored for deep neural networks. In the pretraining stage, we apply a differentially private Gaussian mechanism to estimate the importance of each parameter while preserving privacy. During the gradient update phase, we prune low-importance coordinates and introduce a coordinate-wise adaptive clipping mechanism, enabling sparse and noise-efficient gradient updates. Theoretically, we prove that AdaDPIGU satisfies $(\varepsilon, \delta)$-differential privacy and retains convergence guarantees. Extensive experiments on standard benchmarks validate the effectiveness of AdaDPIGU. All results are reported under a fixed retention ratio of 60%. On MNIST, our method achieves a test accuracy of 99.12% under a privacy budget of $\epsilon = 8$, nearly matching the non-private model. Remarkably, on CIFAR-10, it attains 73.21% accuracy at $\epsilon = 4$, outperforming the non-private baseline of 71.12%, demonstrating that adaptive sparsification can enhance both privacy and utility.
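A hedged sketch of the update described above follows. All parameter names and the top-k selection rule are invented for illustration and are not AdaDPIGU's exact procedure: prune low-importance coordinates, clip each survivor coordinate-wise, then add Gaussian noise.

```python
# Hedged sketch of an importance-based private update (parameter names and the
# top-k selection rule are invented for illustration, not AdaDPIGU's exact
# procedure): prune low-importance coordinates, clip survivors per coordinate,
# then add Gaussian noise.
import random

def private_update(grad, importance, keep_ratio=0.6, clip=1.0, sigma=0.5, rng=None):
    rng = rng or random.Random(0)
    k = max(1, int(len(grad) * keep_ratio))
    kept = set(sorted(range(len(grad)), key=lambda i: -importance[i])[:k])
    out = []
    for i, g in enumerate(grad):
        if i not in kept:
            out.append(0.0)                       # pruned: no update, no noise
        else:
            g = max(-clip, min(clip, g))          # coordinate-wise clipping
            out.append(g + rng.gauss(0.0, sigma * clip))
    return out

g = private_update([2.0, -0.1, 0.5, -3.0, 0.05], importance=[5, 1, 4, 3, 0])
print(g)   # indices 1 and 4 (lowest importance) are zeroed
```

Because pruned coordinates receive neither signal nor noise, the noise budget concentrates on the surviving coordinates, which is the intuition behind the high-dimensional gains claimed in the abstract.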

Updated: 2025-07-09 03:53:03

标题: AdaDPIGU:具有自适应截断和基于重要性的梯度更新的深度神经网络差分隐私SGD

摘要: 差分隐私已被证明对随机梯度下降有效;然而,现有方法在高维设置中经常遭受性能下降,因为注入噪声的规模随着维度增加而增加。为了解决这一挑战,我们提出了AdaDPIGU——一种针对深度神经网络量身定制的基于重要性的梯度更新的差分隐私SGD框架。在预训练阶段,我们应用差分隐私高斯机制来估计每个参数的重要性,同时保护隐私。在梯度更新阶段,我们修剪低重要性坐标,并引入一种逐坐标自适应剪切机制,实现稀疏和高效的梯度更新。在理论上,我们证明了AdaDPIGU满足$(\varepsilon, \delta)$-差分隐私,并保持收敛性保证。在标准基准测试中的大量实验验证了AdaDPIGU的有效性。所有结果均在固定保留率为60%的情况下报告。在MNIST上,我们的方法在隐私预算为$\epsilon = 8$的情况下实现了99.12%的测试准确率,几乎与非私有模型相匹配。值得注意的是,在CIFAR-10上,它在$\epsilon = 4$的情况下达到了73.21%的准确率,优于非私有基线的71.12%,表明自适应稀疏化可以增强隐私和效用。

更新时间: 2025-07-09 03:53:03

领域: cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2507.06525v1

Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Existing approaches to mathematical reasoning with large language models (LLMs) rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation. While efforts have been made to combine these methods, they primarily rely on post-selection or predefined strategies, leaving an open question: whether LLMs can autonomously adapt their reasoning strategy based on their inherent capabilities. In this work, we propose TATA (Teaching LLMs According to Their Aptitude), an adaptive framework that enables LLMs to personalize their reasoning strategy spontaneously, aligning it with their intrinsic aptitude. TATA incorporates base-LLM-aware data selection during supervised fine-tuning (SFT) to tailor training data to the model's unique abilities. This approach equips LLMs to autonomously determine and apply the appropriate reasoning strategy at test time. We evaluate TATA through extensive experiments on six mathematical reasoning benchmarks, using both general-purpose and math-specialized LLMs. Empirical results demonstrate that TATA effectively combines the complementary strengths of CoT and TIR, achieving superior or comparable performance with improved inference efficiency compared to TIR alone. Further analysis underscores the critical role of aptitude-aware data selection in enabling LLMs to make effective and adaptive reasoning decisions and align reasoning strategies with model capabilities.

Updated: 2025-07-09 03:50:21

标题: 根据LLMs的能力教学:适应性推理在数学问题解决中的应用

摘要: 现有的数学推理大型语言模型(LLMs)的方法依赖于思维链(Chain-of-Thought, CoT)以实现泛化性,或者依赖于集成工具推理(Tool-Integrated Reasoning, TIR)以进行精确计算。虽然已经努力将这些方法结合起来,但它们主要依赖于后选择或预定义策略,留下一个问题:LLMs是否可以根据其固有能力自主调整推理策略。在这项工作中,我们提出了TATA (根据其能力教授LLMs),这是一个自适应框架,使LLMs能够自发地个性化其推理策略,与其内在能力相一致。TATA在监督微调(SFT)期间融入基础LLM感知数据选择,以调整训练数据以适应模型的独特能力。这种方法使LLMs能够在测试时自主确定并应用适当的推理策略。我们通过对六个数学推理基准的广泛实验评估了TATA,使用了通用和数学专业化的LLMs。经验结果表明,TATA有效地结合了CoT和TIR的互补优势,在推理效率方面表现出比仅使用TIR更好或相当的性能。进一步的分析强调了能力感知数据选择在使LLMs能够做出有效和自适应的推理决策并将推理策略与模型能力相一致方面的关键作用。

更新时间: 2025-07-09 03:50:21

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.12022v3

DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE

Native multimodal large language models (MLLMs) restructure a single large language model (LLM) into a spoken language model (SLM) capable of both speech and text generation. Compared to modular and aligned MLLMs, native MLLMs preserve richer paralinguistic features such as emotion and prosody, and generate speech responses directly within the backbone LLM rather than using a separate speech decoder. This integration also results in lower response latency and smoother interaction. However, native MLLMs suffer from catastrophic forgetting and performance degradation because the available paired speech-text data is insufficient to support the pretraining of MLLMs compared to the vast amount of text data required to pretrain text LLMs. To address this issue, we propose DeepTalk, a framework for adaptive modality expert learning based on a Mixture of Experts (MoE) architecture. DeepTalk first adaptively distinguishes modality experts according to their modality load within the LLM. Each modality expert then undergoes specialized single-modality training, followed by joint multimodal collaborative training. As a result, DeepTalk incurs only a 5.5% performance drop compared to the original LLM, which is significantly lower than the average performance drop of over 20% typically seen in native MLLMs (such as GLM-4-Voice), and is on par with modular MLLMs. Meanwhile, the end-to-end dialogue latency remains within 0.5 seconds, ensuring a seamless and intelligent speech interaction experience. Code and models are released at https://github.com/talkking/DeepTalk.

Updated: 2025-07-09 03:49:42

标题: DeepTalk:朝向具有自适应模态特定MoE的无缝智能语音交互

摘要: 原生多模态大语言模型(MLLMs)将单一大语言模型(LLM)重构为一种能够同时生成语音和文本的口语语言模型(SLM)。与模块化和对齐式的MLLM相比,原生MLLM保留了更丰富的副语言特征,如情感和韵律,并直接在骨干LLM中生成语音响应,而不是使用单独的语音解码器。这种集成还带来了更低的响应延迟和更顺畅的交互。然而,由于可用的配对语音-文本数据相比预训练文本LLM所需的海量文本数据远远不足,无法支撑MLLM的预训练,原生MLLM会遭受灾难性遗忘和性能下降。为了解决这个问题,我们提出了DeepTalk,这是一个基于混合专家(MoE)架构的自适应模态专家学习框架。DeepTalk首先根据LLM内的模态负载自适应地区分模态专家。然后,每个模态专家先经历专门的单模态训练,再进行联合多模态协作训练。结果,DeepTalk与原始LLM相比仅产生5.5%的性能下降,显著低于原生MLLM(例如GLM-4-Voice)通常出现的超过20%的平均性能下降,并与模块化MLLM相当。同时,端到端对话延迟保持在0.5秒以内,确保无缝且智能的语音交互体验。代码和模型发布在https://github.com/talkking/DeepTalk。

更新时间: 2025-07-09 03:49:42

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2506.21864v2

Str-GCL: Structural Commonsense Driven Graph Contrastive Learning

Graph Contrastive Learning (GCL) is a widely adopted approach in self-supervised graph representation learning, applying contrastive objectives to produce effective representations. However, current GCL methods primarily focus on capturing implicit semantic relationships, often overlooking the structural commonsense embedded within the graph's structure and attributes, which contains underlying knowledge crucial for effective representation learning. Due to the lack of explicit information and clear guidance in general graph, identifying and integrating such structural commonsense in GCL poses a significant challenge. To address this gap, we propose a novel framework called Structural Commonsense Unveiling in Graph Contrastive Learning (Str-GCL). Str-GCL leverages first-order logic rules to represent structural commonsense and explicitly integrates them into the GCL framework. It introduces topological and attribute-based rules without altering the original graph and employs a representation alignment mechanism to guide the encoder in effectively capturing this commonsense. To the best of our knowledge, this is the first attempt to directly incorporate structural commonsense into GCL. Extensive experiments demonstrate that Str-GCL outperforms existing GCL methods, providing a new perspective on leveraging structural commonsense in graph representation learning.
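One way to picture a first-order-logic structural rule is as a predicate evaluated over the graph. The specific rule and threshold below are invented, not taken from Str-GCL; the point is that the rule yields extra supervision without altering the original graph.

```python
# A first-order-logic-style structural rule over a graph (the rule
# "Hub(v) <- degree(v) >= k" and the threshold are invented for illustration,
# not taken from Str-GCL): the rule yields extra labels without altering
# the graph itself.
def apply_rule(adj, k=2):
    return {v: len(nbrs) >= k for v, nbrs in adj.items()}

adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(apply_rule(adj))   # {'a': True, 'b': False, 'c': False}
```

In the framework described above, such rule outputs would serve as an additional view whose representations are aligned with the encoder's, rather than as modifications to the input graph.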

Updated: 2025-07-09 03:41:48

标题: Str-GCL:结构常识驱动的图对比学习

摘要: 图对比学习(GCL)是自监督图表示学习中被广泛采用的方法,应用对比目标产生有效的表示。然而,当前的GCL方法主要集中在捕捉隐含的语义关系,通常忽视了嵌入在图的结构和属性中的结构常识,其中包含了对有效表示学习至关重要的基础知识。由于普通图中缺乏明确的信息和清晰的指导,识别和整合这种结构常识在GCL中构成了一个重大挑战。为了填补这一空白,我们提出了一个名为图对比学习中的结构常识揭示(Str-GCL)的新框架。Str-GCL利用一阶逻辑规则来表示结构常识,并将其明确地整合到GCL框架中。它引入了基于拓扑和属性的规则,而不改变原始图,并采用表示对齐机制来指导编码器有效捕捉这种常识。据我们所知,这是第一次尝试直接将结构常识纳入GCL中。大量实验证明,Str-GCL优于现有的GCL方法,在图表示学习中提供了一种利用结构常识的新视角。

更新时间: 2025-07-09 03:41:48

领域: cs.LG

下载: http://arxiv.org/abs/2507.07141v1

Multi-Agent Pathfinding Under Team-Connected Communication Constraint via Adaptive Path Expansion and Dynamic Leading

This paper proposes a novel planning framework to handle a multi-agent pathfinding problem under a team-connected communication constraint, where all agents must maintain a connected communication channel to the rest of the team during their entire movements. Standard multi-agent pathfinding approaches (e.g., priority-based search) have potential in this domain but fail when neighboring configurations at start and goal differ. Their single-expansion approach -- computing each agent's path from start to goal in just a single expansion -- cannot reliably handle planning under communication constraints, since agents' neighbors change as they navigate. Similarly, leader-follower approaches (e.g., platooning) are effective at maintaining team communication, but fixing the leader at the outset of planning can cause planning to become stuck in densely cluttered environments, limiting their practical utility. To overcome these limitations, we propose a novel two-level multi-agent pathfinding framework that integrates two techniques: adaptive path expansion, which expands agent paths to their goals in multiple stages; and dynamic leading, which enables reselection of the leading agent during each path expansion whenever progress cannot be made. Simulation experiments show the efficiency of our planners, which handle up to 25 agents across five environment types under a limited communication range constraint and up to 11-12 agents across three environment types under a line-of-sight communication constraint, exceeding a 90% success rate where baselines routinely fail.

Updated: 2025-07-09 03:41:02

Categories: cs.AI,cs.MA,cs.RO

Download: http://arxiv.org/abs/2501.02770v3

Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration

We present Gradientsys, a next-generation multi-agent scheduling framework that coordinates diverse specialized AI agents using a typed Model-Context Protocol (MCP) and a ReAct-based dynamic planning loop. At its core, Gradientsys employs an LLM-powered scheduler for intelligent one-to-many task dispatch, enabling parallel execution of heterogeneous agents such as PDF parsers, web search modules, GUI controllers, and web builders. The framework supports hybrid synchronous/asynchronous execution, respects agent capacity constraints, and incorporates a robust retry-and-replan mechanism to handle failures gracefully. To promote transparency and trust, Gradientsys includes an observability layer streaming real-time agent activity and intermediate reasoning via Server-Sent Events (SSE). We offer an architectural overview and evaluate Gradientsys against existing frameworks in terms of extensibility, scheduling topology, tool reusability, parallelism, and observability. Experiments on the GAIA general-assistant benchmark show that Gradientsys achieves higher task success rates with reduced latency and lower API costs compared to a MinionS-style baseline, demonstrating the strength of its LLM-driven multi-agent orchestration.

Updated: 2025-07-09 03:40:56

Categories: cs.MA,cs.AI

Download: http://arxiv.org/abs/2507.06520v1

Failure Forecasting Boosts Robustness of Sim2Real Rhythmic Insertion Policies

This paper addresses the challenges of Rhythmic Insertion Tasks (RIT), where a robot must repeatedly perform high-precision insertions, such as screwing a nut into a bolt with a wrench. The inherent difficulty of RIT lies in achieving millimeter-level accuracy and maintaining consistent performance over multiple repetitions, particularly when factors like nut rotation and friction introduce additional complexity. We propose a sim-to-real framework that integrates a reinforcement learning-based insertion policy with a failure forecasting module. By representing the wrench's pose in the nut's coordinate frame rather than the robot's frame, our approach significantly enhances sim-to-real transferability. The insertion policy, trained in simulation, leverages real-time 6D pose tracking to execute precise alignment, insertion, and rotation maneuvers. Simultaneously, a neural network predicts potential execution failures, triggering a simple recovery mechanism that lifts the wrench and retries the insertion. Extensive experiments in both simulated and real-world environments demonstrate that our method not only achieves a high one-time success rate but also robustly maintains performance over long-horizon repetitive tasks.

Updated: 2025-07-09 03:38:44

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2507.06519v1

Instance-Wise Monotonic Calibration by Constrained Transformation

Deep neural networks often produce miscalibrated probability estimates, leading to overconfident predictions. A common approach for calibration is fitting a post-hoc calibration map on unseen validation data that transforms predicted probabilities. A key desirable property of the calibration map is instance-wise monotonicity (i.e., preserving the ranking of probability outputs). However, most existing post-hoc calibration methods do not guarantee monotonicity. Previous monotonic approaches either use an under-parameterized calibration map with limited expressive ability or rely on black-box neural networks, which lack interpretability and robustness. In this paper, we propose a family of novel monotonic post-hoc calibration methods, which employs a constrained calibration map parameterized linearly with respect to the number of classes. Our proposed approach ensures expressiveness, robustness, and interpretability while preserving the relative ordering of the probability output by formulating the proposed calibration map as a constrained optimization problem. Our proposed methods achieve state-of-the-art performance across datasets with different deep neural network models, outperforming existing calibration methods while being data and computation-efficient. Our code is available at https://github.com/YunruiZhang/Calibration-by-Constrained-Transformation
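The paper's constrained, linearly parameterized map is not reproduced here, but the instance-wise monotonicity property it guarantees can be illustrated with the simplest monotone calibration map, temperature scaling: any strictly increasing per-class transform leaves each instance's class ranking unchanged while softening confidence. A minimal numpy sketch (function name and values are illustrative, not from the paper):

```python
import numpy as np

def calibrate(logits, T=2.0):
    """Apply a strictly increasing map (division by T > 0) to every class
    score, then softmax. Because the map is monotone, the ranking of the
    output probabilities matches the ranking of the raw logits for every
    instance (instance-wise monotonicity)."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)       # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.2, 3.0, -1.0]])
p_raw = calibrate(logits, T=1.0)                # plain softmax
p_cal = calibrate(logits, T=2.0)                # softened confidence

# Per-instance ranking preserved; probabilities are less extreme.
assert (np.argsort(-p_raw, axis=1) == np.argsort(-p_cal, axis=1)).all()
assert p_cal.max() < p_raw.max()
```

Black-box calibrators, by contrast, can swap the top-2 classes for some inputs, which is exactly the failure mode the monotonicity constraint rules out.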

Updated: 2025-07-09 03:32:49

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.06516v1

Breaking PEFT Limitations: Leveraging Weak-to-Strong Knowledge Transfer for Backdoor Attacks in LLMs

Despite being widely applied due to their exceptional capabilities, Large Language Models (LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce targeted vulnerabilities into LLMs by poisoning training samples and applying full-parameter fine-tuning (FPFT). However, such attacks are limited because they require significant computational resources, especially as the size of LLMs increases. Parameter-efficient fine-tuning (PEFT) offers an alternative, but its restricted parameter updates may impede the alignment of triggers with target labels. In this study, we first verify that backdoor attacks with PEFT may encounter challenges in achieving feasible performance. To address these issues and improve the effectiveness of backdoor attacks with PEFT, we propose a novel weak-to-strong backdoor attack algorithm based on Feature Alignment-enhanced Knowledge Distillation (FAKD). Specifically, we poison small-scale language models through FPFT to serve as the teacher model. The teacher model then covertly transfers the backdoor to the large-scale student model through FAKD, which employs PEFT. Theoretical analysis reveals that FAKD has the potential to augment the effectiveness of backdoor attacks. We demonstrate the superior performance of FAKD on classification tasks across four language models, four backdoor attack algorithms, and two different architectures of teacher models. Experimental results indicate success rates close to 100% for backdoor attacks targeting PEFT.

Updated: 2025-07-09 03:32:17

Categories: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2409.17946v4

SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning

This paper introduces SagaLLM, a structured multi-agent architecture designed to address four foundational limitations of current LLM-based planning systems: unreliable self-validation, context loss, lack of transactional safeguards, and insufficient inter-agent coordination. While recent frameworks leverage LLMs for task decomposition and multi-agent communication, they often fail to ensure consistency, rollback, or constraint satisfaction across distributed workflows. SagaLLM bridges this gap by integrating the Saga transactional pattern with persistent memory, automated compensation, and independent validation agents. It leverages LLMs' generative reasoning to automate key tasks traditionally requiring hand-coded coordination logic, including state tracking, dependency analysis, log schema generation, and recovery orchestration. Although SagaLLM relaxes strict ACID guarantees, it ensures workflow-wide consistency and recovery through modular checkpointing and compensable execution. Empirical evaluations across planning domains demonstrate that standalone LLMs frequently violate interdependent constraints or fail to recover from disruptions. In contrast, SagaLLM achieves significant improvements in consistency, validation accuracy, and adaptive coordination under uncertainty, establishing a robust foundation for real-world, scalable LLM-based multi-agent systems.

Updated: 2025-07-09 03:31:59

Categories: cs.AI,I.2.7

Download: http://arxiv.org/abs/2503.11951v3

Towards LLM-based Root Cause Analysis of Hardware Design Failures

With advances in large language models (LLMs), new opportunities have emerged to develop tools that support the digital hardware design process. In this work, we explore how LLMs can assist with explaining the root cause of design issues and bugs that are revealed during synthesis and simulation, a necessary milestone on the pathway towards widespread use of LLMs in the hardware design process and for hardware security analysis. We find promising results: for our corpus of 34 different buggy scenarios, OpenAI's o3-mini reasoning model reached a correct determination 100% of the time under pass@5 scoring, with other state of the art models and configurations usually achieving more than 80% performance and more than 90% when assisted with retrieval-augmented generation.

Updated: 2025-07-09 03:25:52

Categories: cs.AR,cs.AI

Download: http://arxiv.org/abs/2507.06512v1

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts

Merging parameter-efficient task experts has recently gained growing attention as a way to build modular architectures that can be rapidly adapted on the fly for specific downstream tasks, without requiring additional fine-tuning. Typically, LoRA serves as the foundational building block of such parameter-efficient modular architectures, leveraging low-rank weight structures to reduce the number of trainable parameters. In this paper, we study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures. First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature and surprisingly outperforms both LoRA and full fine-tuning in our setting. Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks, thus scaling beyond what is usually studied in the literature. Our findings demonstrate that sparse adapters yield superior in-distribution performance post-merging compared to LoRA or full model merging. Achieving strong held-out performance remains a challenge for all methods considered.
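The core mechanism behind a sparse adapter ("train only a subset of weights in the base network") can be simulated by masking the gradient so that frozen entries never move. This is only a toy illustration of the idea, not the paper's training method, and the mask here is chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weights of one layer.
W0 = rng.normal(size=(4, 4))
W = W0.copy()

# Sparse adapter: a fixed boolean mask marks the only trainable entries
# (picked arbitrarily here; how to pick and train them is what the paper studies).
mask = np.zeros_like(W, dtype=bool)
mask[0, :2] = True                     # 2 of 16 weights are trainable

def sgd_step(W, grad, mask, lr=0.1):
    """Mask the gradient so the update touches only adapter weights."""
    return W - lr * np.where(mask, grad, 0.0)

for _ in range(5):
    grad = np.ones_like(W)             # constant stand-in for a real gradient
    W = sgd_step(W, grad, mask)

assert np.allclose(W[~mask], W0[~mask])      # frozen weights untouched
assert np.allclose(W[mask], W0[mask] - 0.5)  # adapter weights moved by 5 * 0.1
```

Merging such adapters then amounts to combining the sparse deltas `W - W0` from different tasks, which is what makes the representation attractive for modular architectures.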

Updated: 2025-07-09 03:25:45

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07140v1

UniF$^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

Unified multimodal models (UMMs) have emerged as a powerful paradigm in foundational computer vision research, demonstrating significant potential in both image understanding and generation. However, existing research in the face domain primarily focuses on $\textbf{coarse}$ facial attribute understanding, with limited capacity to handle $\textbf{fine-grained}$ facial attributes and without addressing generation capabilities. To overcome these limitations, we propose UniF$^2$ace, the first UMM tailored specifically for fine-grained face understanding and generation. In general, we train UniF$^2$ace on a self-constructed, specialized dataset utilizing two mutually beneficial diffusion techniques and a two-level mixture-of-experts architecture. Specifically, we first build a large-scale facial dataset, UniF$^2$ace-130K, which contains 130K image-text pairs with one million question-answering pairs that span a wide range of facial attributes. Second, we establish a theoretical connection between discrete diffusion score matching and masked generative models, optimizing both evidence lower bounds simultaneously, which significantly improves the model's ability to synthesize facial details. Finally, we introduce both token-level and sequence-level mixture-of-experts, enabling efficient fine-grained representation learning for both understanding and generation tasks. Extensive experiments on UniF$^2$ace-130K demonstrate that UniF$^2$ace outperforms existing UMMs and generative models, achieving superior performance across both understanding and generation tasks.

Updated: 2025-07-09 03:25:22

Categories: cs.CV,cs.AI,cs.LG,cs.MM

Download: http://arxiv.org/abs/2503.08120v3

Prediction-Augmented Mechanism Design for Weighted Facility Location

Facility location is fundamental in operations research, mechanism design, and algorithmic game theory, with applications ranging from urban infrastructure planning to distributed systems. Recent research in this area has focused on augmenting classic strategyproof mechanisms with predictions to achieve an improved performance guarantee against the uncertainty of the strategic environment. Previous work has been devoted to addressing the trade-off between consistency (near-optimality under accurate predictions) and robustness (bounded inefficiency under poor predictions), primarily in the unweighted setting, which assumes that all agents have the same importance. However, this assumption may not hold in some practical scenarios, motivating the study of weighted facility location problems. The major contribution of the current work is a prediction-augmented algorithmic framework for balancing consistency and robustness over strategic agents with non-uniform weights. In particular, through a reduction technique that identifies a subset of \emph{representative} instances and maps the other given locations to the representative ones, we prove that there exists a \emph{strategyproof} mechanism achieving a bounded consistency guarantee of $\frac{\sqrt{(1+c)^2W^2_{\min}+(1-c)^2W^2_{\max}}}{(1+c)W_{\min}}$ and a bounded robustness guarantee of $\frac{\sqrt{(1-c)^2W^2_{\min}+(1+c)^2W^2_{\max}}}{(1-c)W_{\min}}$ in weighted settings, where $c$ can be viewed as a parameter trading off consistency against robustness, and $W_{\min}$ and $W_{\max}$ denote the minimum and maximum agent weights. We also prove that no strategyproof deterministic mechanism can achieve $1$-consistency and $O\left( n \cdot \frac{W_{\max}}{W_{\min}} \right)$-robustness in the weighted FLP, even with full predictions of all agents.
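The two guarantees stated in the abstract can be evaluated directly; the snippet below just computes them to make the role of $c$ concrete: pushing $c$ toward 1 drives the consistency bound toward 1 (optimal under accurate predictions) while the robustness bound blows up, and vice versa. The weight values are arbitrary examples:

```python
import math

def consistency_bound(c, w_min, w_max):
    """sqrt((1+c)^2 W_min^2 + (1-c)^2 W_max^2) / ((1+c) W_min), from the abstract."""
    return math.hypot((1 + c) * w_min, (1 - c) * w_max) / ((1 + c) * w_min)

def robustness_bound(c, w_min, w_max):
    """sqrt((1-c)^2 W_min^2 + (1+c)^2 W_max^2) / ((1-c) W_min), from the abstract."""
    return math.hypot((1 - c) * w_min, (1 + c) * w_max) / ((1 - c) * w_min)

# Larger c favors consistency at the cost of robustness (0 < c < 1).
for c in (0.1, 0.5, 0.9):
    print(c, consistency_bound(c, 1.0, 2.0), robustness_bound(c, 1.0, 2.0))
```

In the unweighted special case ($W_{\min} = W_{\max}$), the expressions reduce to the familiar consistency/robustness trade-off curves studied in prior work.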

Updated: 2025-07-09 03:13:52

Categories: cs.DS,cs.GT,cs.LG,68W27, 68Q32,F.2.2

Download: http://arxiv.org/abs/2507.06509v1

Empowering Bridge Digital Twins by Bridging the Data Gap with a Unified Synthesis Framework

As critical transportation infrastructure, bridges face escalating challenges from aging and deterioration, while traditional manual inspection methods suffer from low efficiency. Although 3D point cloud technology provides a new data-driven paradigm, its application potential is often constrained by the incompleteness of real-world data, which results from missing labels and scanning occlusions. To overcome the bottleneck of insufficient generalization in existing synthetic data methods, this paper proposes a systematic framework for generating 3D bridge data. This framework can automatically generate complete point clouds featuring component-level instance annotations, high-fidelity color, and precise normal vectors. It can be further extended to simulate the creation of diverse and physically realistic incomplete point clouds, designed to support the training of segmentation and completion networks, respectively. Experiments demonstrate that a PointNet++ model trained with our synthetic data achieves a mean Intersection over Union (mIoU) of 84.2% in real-world bridge semantic segmentation. Concurrently, a fine-tuned KT-Net exhibits superior performance on the component completion task. This research offers an innovative methodology and a foundational dataset for the 3D visual analysis of bridge structures, holding significant implications for advancing the automated management and maintenance of infrastructure.

Updated: 2025-07-09 03:13:38

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.05814v2

Subgraph Counting under Edge Local Differential Privacy Based on Noisy Adjacency Matrix

When analyzing connection patterns within graphs, subgraph counting serves as an effective and fundamental approach. Edge-local differential privacy (edge-LDP) and shuffle model have been employed to achieve subgraph counting under a privacy-preserving situation. Existing algorithms are plagued by high time complexity, excessive download costs, low accuracy, or dependence on trusted third parties. To address the aforementioned challenges, we propose the Noisy Adjacency Matrix (NAM), which combines differential privacy with the adjacency matrix of the graph. NAM offers strong versatility and scalability, making it applicable to a wider range of DP variants, DP mechanisms, and graph types. Based on NAM, we designed five algorithms (TriOR, TriTR, TriMTR, QuaTR, and 2STAR) to count three types of subgraphs: triangles, quadrangles, and 2-stars. Theoretical and experimental results demonstrate that in triangle counting, TriOR maximizes accuracy with reduced time complexity among one-round algorithms, TriTR achieves optimal accuracy, TriMTR achieves the highest accuracy under low download costs, and QuaTR stands as the first quadrangle counting algorithm under pure edge-LDP. We implement edge-LDP for noisy data via a confidence interval-inspired method, providing DP guarantees on randomized data. Our 2STAR algorithm achieves the highest accuracy in 2-star counting and can be derived as a byproduct of two-round triangle or quadrangle counting algorithms, enabling efficient joint estimation of triangle, quadrangle, and 2-star counts within two query rounds.
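The paper's NAM-based estimators are more involved, but the primitive they all build on — randomized response applied independently to every adjacency bit, followed by debiasing — can be sketched as follows. The graph, epsilon, and graph size here are made-up illustration values; the edge-count estimator is shown because it is the simplest unbiased quantity recoverable from the noisy matrix:

```python
import numpy as np

eps = 1.0
p = np.exp(eps) / (np.exp(eps) + 1)      # keep each bit with probability p

rng = np.random.default_rng(0)

def randomized_response(bits):
    """epsilon-edge-LDP: each adjacency bit is flipped independently with
    probability 1 - p, so the noisy matrix hides any single edge."""
    flip = rng.random(bits.shape) >= p
    return np.where(flip, 1 - bits, bits)

def debias(noisy_bits):
    """Unbiased per-bit estimate: E[(y - (1-p)) / (2p - 1)] = true bit."""
    return (noisy_bits - (1 - p)) / (2 * p - 1)

# Random simple graph on 200 nodes with edge probability 0.1.
A = (rng.random((200, 200)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T

noisy = randomized_response(np.triu(A, 1))
est_edges = debias(noisy)[np.triu_indices_from(A, 1)].sum()
true_edges = np.triu(A, 1).sum()
# est_edges is an unbiased estimate of true_edges, up to sampling noise.
```

Subgraph counts such as triangles require combining several such debiased bits per pattern, which is where the download-cost and accuracy trade-offs of the five algorithms come in.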

Updated: 2025-07-09 03:13:15

Categories: cs.CR

Download: http://arxiv.org/abs/2507.06508v1

GR-LLMs: Recent Advances in Generative Recommendation Based on Large Language Models

In the past year, Generative Recommendations (GRs) have undergone substantial advancements, especially in leveraging the powerful sequence modeling and reasoning capabilities of Large Language Models (LLMs) to enhance overall recommendation performance. LLM-based GRs are forming a new paradigm that is distinctly different from discriminative recommendations, showing strong potential to replace traditional recommendation systems heavily dependent on complex hand-crafted features. In this paper, we provide a comprehensive survey aimed at facilitating further research of LLM-based GRs. Initially, we outline the general preliminaries and application cases of LLM-based GRs. Subsequently, we introduce the main considerations when LLM-based GRs are applied in real industrial scenarios. Finally, we explore promising directions for LLM-based GRs. We hope that this survey contributes to the ongoing advancement of the GR domain.

Updated: 2025-07-09 03:13:08

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2507.06507v1

Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings

Translating wordplay across languages presents unique challenges that have long confounded both professional human translators and machine translation systems. This research proposes a novel approach for translating puns from English to French by combining state-of-the-art large language models with specialized techniques for wordplay generation. Our methodology employs a three-stage approach. First, we establish a baseline using multiple frontier large language models with feedback based on a new contrastive learning dataset. Second, we implement a guided chain-of-thought pipeline with combined phonetic-semantic embeddings. Third, we implement a multi-agent generator-discriminator framework for evaluating and regenerating puns with feedback. Moving beyond the limitations of literal translation, our methodology's primary objective is to capture the linguistic creativity and humor of the source text wordplay, rather than simply duplicating its vocabulary. Our best runs earned first and second place in the CLEF JOKER 2025 Task 2 competition where they were evaluated manually by expert native French speakers. This research addresses a gap between translation studies and computational linguistics by implementing linguistically-informed techniques for wordplay translation, advancing our understanding of how language models can be leveraged to handle the complex interplay between semantic ambiguity, phonetic similarity, and the implicit cultural and linguistic awareness needed for successful humor.

Updated: 2025-07-09 03:09:14

Categories: cs.CL,cs.AI,cs.LG,cs.MA

Download: http://arxiv.org/abs/2507.06506v1

GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification

Integrating powerful but computationally expensive Pre-trained Language Models (PLMs) with Graph Neural Networks (GNNs) is a key challenge, especially on text-rich heterophilic graphs. We propose the Graph Masked Language Model (GMLM), a framework designed for the efficient and effective fusion of graph structure and text semantics. GMLM employs a two-stage process: first, a contrastive pre-training stage with a novel soft masking technique builds a robust multi-scale GNN; second, an end-to-end fine-tuning stage uses a dynamic active node selection strategy for scalability and a bi-directional cross-attention module for deep fusion. Experiments on five heterophilic benchmarks show GMLM achieves state-of-the-art results on four, significantly outperforming prior GNN and large LLM-based methods. For instance, it improves accuracy on the Texas dataset by over 8\% and on Wisconsin by nearly 5\%. Our work demonstrates that a sophisticated, deeply-integrated architecture can be more effective and efficient than larger, general-purpose models for text-rich graph representation learning.

Updated: 2025-07-09 03:08:21

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2503.05763v5

Oscillation-Reduced MXFP4 Training for Vision Transformers

Pre-training Transformers in FP4 precision is becoming a promising approach to gain substantial speedup, but it comes with a considerable loss of accuracy. Microscaling (MX) data format provides a fine-grained per-group quantization method to improve the representation ability of the FP4 format and is supported by the next-generation Blackwell GPU architecture. However, training with MXFP4 data format still results in significant degradation and there is a lack of systematic research on the reason. In this work, we propose a novel training method TetraJet for a more accurate FP4 training. We comprehensively evaluate all of the quantizers involved in the training, and identify the weight oscillation problem in the forward pass as the main source of the degradation in MXFP4 training. Therefore, we introduce two novel methods, EMA Quantizer (Q-EMA) and Adaptive Ramping Optimizer (Q-Ramping), to resolve the oscillation problem. Extensive experiments on Vision Transformers demonstrate that TetraJet consistently outperforms the existing 4-bit training methods, and Q-EMA & Q-Ramping can provide additional enhancement by effectively reducing oscillation. We decreased the accuracy degradation by more than $50\%$ compared to the baseline, and can even achieve competitive performance compared to full precision training. The codes are available at https://github.com/thu-ml/TetraJet-MXFP4Training
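Q-EMA's exact formulation is in the paper; the underlying idea — quantizing a smoothed copy of the weights so that tiny updates near a grid boundary stop flipping the quantized value back and forth — can be demonstrated with a scalar toy example on a uniform grid (a stand-in for the MXFP4 levels; all values are illustrative):

```python
import numpy as np

def quantize(w, step=0.25):
    """Round-to-nearest onto a uniform grid (toy stand-in for FP4 levels)."""
    return np.round(np.asarray(w) / step) * step

class EMAQuantizer:
    """Quantize an exponential moving average of the weight instead of the
    raw weight, damping flips between neighboring grid points."""
    def __init__(self, w, decay=0.95):
        self.ema = np.asarray(w, dtype=float).copy()
        self.decay = decay
    def __call__(self, w):
        self.ema = self.decay * self.ema + (1 - self.decay) * np.asarray(w)
        return quantize(self.ema)

w = np.array([0.124])                  # sits right below the 0.125 boundary
q_ema = EMAQuantizer(w)
flips_raw = flips_ema = 0
prev_raw = prev_ema = None
for t in range(200):
    w = w + 0.004 * ((-1) ** t)        # tiny oscillating weight updates
    qr, qe = quantize(w)[0], q_ema(w)[0]
    if prev_raw is not None and qr != prev_raw:
        flips_raw += 1
    if prev_ema is not None and qe != prev_ema:
        flips_ema += 1
    prev_raw, prev_ema = qr, qe

assert flips_ema < flips_raw           # smoothing strongly damps oscillation
```

The raw weight crosses the rounding boundary on every update and its quantized value flips each step, while the EMA settles on one side of the boundary almost immediately. This is the forward-pass oscillation the paper identifies as the main source of MXFP4 degradation.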

Updated: 2025-07-09 03:07:28

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2502.20853v2

MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models

As a prominent data modality task, time series forecasting plays a pivotal role in diverse applications. With the remarkable advancements in Large Language Models (LLMs), the adoption of LLMs as the foundational architecture for time series modeling has gained significant attention. Although existing models achieve some success, they rarely model both time and frequency characteristics within a pretraining-finetuning paradigm, leading to suboptimal performance on complex time series, whose prediction requires modeling both periodicity and prior pattern knowledge of signals. We propose MoFE-Time, an innovative time series forecasting model that integrates time and frequency domain features within a Mixture of Experts (MoE) network. Moreover, we use the pretraining-finetuning paradigm as our training framework to effectively transfer prior pattern knowledge across pretraining and finetuning datasets with different periodicity distributions. Our method introduces both frequency and time cells as experts after attention modules and leverages the MoE routing mechanism to construct multidimensional sparse representations of input signals. In experiments on six public benchmarks, MoFE-Time achieves new state-of-the-art performance, reducing MSE and MAE by 6.95% and 6.02% compared to the representative method Time-MoE. Beyond the existing evaluation benchmarks, we have developed a proprietary dataset, NEV-sales, derived from real-world business scenarios. Our method achieves outstanding results on this dataset, underscoring the effectiveness of the MoFE-Time model in practical commercial applications.
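
The time/frequency expert idea can be caricatured with two toy experts and a sparse gate. Everything below (identity time features, an rfft magnitude spectrum as the frequency expert, a linear gate) is an illustrative assumption, not the paper's architecture, which places learned frequency and time cells after attention modules.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_time_freq(x, w_gate, top_k=1):
    """Route each input between a time-domain and a frequency-domain expert.
    x: (batch, length) signals; w_gate: (length, 2) gating weights (hypothetical)."""
    # expert 0: identity features in the time domain
    time_feat = x
    # expert 1: magnitude spectrum from a real FFT, zero-padded back to length
    freq = np.abs(np.fft.rfft(x, axis=-1))
    freq_feat = np.zeros_like(x)
    freq_feat[:, :freq.shape[1]] = freq
    # gate: softmax over expert logits, keep only the top-k experts per input
    logits = x @ w_gate                    # (batch, 2)
    gates = softmax(logits)
    if top_k < gates.shape[1]:
        keep = np.argsort(gates, axis=1)[:, -top_k:]
        mask = np.zeros_like(gates)
        np.put_along_axis(mask, keep, 1.0, axis=1)
        gates = gates * mask
        gates = gates / gates.sum(axis=1, keepdims=True)
    experts = np.stack([time_feat, freq_feat], axis=1)   # (batch, 2, length)
    return np.einsum("be,bel->bl", gates, experts)
```

The top-k masking is what makes the mixture's representation sparse: each input activates only a subset of experts, as in standard MoE routing.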

Updated: 2025-07-09 03:00:56

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.06502v1

Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning

Recent advances in image generation models (IGMs), particularly diffusion-based architectures such as Stable Diffusion (SD), have markedly enhanced the quality and diversity of AI-generated visual content. However, their generative capability has also raised significant ethical, legal, and societal concerns, including the potential to produce harmful, misleading, or copyright-infringing content. To mitigate these concerns, machine unlearning (MU) emerges as a promising solution by selectively removing undesirable concepts from pretrained models. Nevertheless, the robustness and effectiveness of existing unlearning techniques remain largely unexplored, particularly in the presence of multi-modal adversarial inputs. To bridge this gap, we propose Recall, a novel adversarial framework explicitly designed to compromise the robustness of unlearned IGMs. Unlike existing approaches that predominantly rely on adversarial text prompts, Recall exploits the intrinsic multi-modal conditioning capabilities of diffusion models by efficiently optimizing adversarial image prompts with guidance from a single semantically relevant reference image. Extensive experiments across ten state-of-the-art unlearning methods and diverse tasks show that Recall consistently outperforms existing baselines in terms of adversarial effectiveness, computational efficiency, and semantic fidelity with the original textual prompt. These findings reveal critical vulnerabilities in current unlearning mechanisms and underscore the need for more robust solutions to ensure the safety and reliability of generative models. Code and data are publicly available at https://github.com/ryliu68/RECALL.

Updated: 2025-07-09 02:59:01

Categories: cs.CV,cs.CR,cs.LG

Download: http://arxiv.org/abs/2507.07139v1

ModelCitizens: Representing Community Voices in Online Safety

Automatic toxic language detection is critical for creating safe, inclusive online spaces. However, it is a highly subjective task, with perceptions of toxic language shaped by community norms and lived experience. Existing toxicity detection models are typically trained on annotations that collapse diverse annotator perspectives into a single ground truth, erasing important context-specific notions of toxicity such as reclaimed language. To address this, we introduce MODELCITIZENS, a dataset of 6.8K social media posts and 40K toxicity annotations across diverse identity groups. To capture the role of conversational context on toxicity, typical of social media posts, we augment MODELCITIZENS posts with LLM-generated conversational scenarios. State-of-the-art toxicity detection tools (e.g. OpenAI Moderation API, GPT-o4-mini) underperform on MODELCITIZENS, with further degradation on context-augmented posts. Finally, we release LLAMACITIZEN-8B and GEMMACITIZEN-12B, LLaMA- and Gemma-based models finetuned on MODELCITIZENS, which outperform GPT-o4-mini by 5.5% on in-distribution evaluations. Our findings highlight the importance of community-informed annotation and modeling for inclusive content moderation. The data, models and code are available at https://github.com/asuvarna31/modelcitizens.

Updated: 2025-07-09 02:57:34

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.05455v2

A Survey on Artificial Noise for Physical Layer Security: Opportunities, Technologies, Guidelines, Advances, and Trends

Due to the broadcast nature of wireless communications, physical-layer security has attracted increasing attention from both academia and industry. Artificial noise (AN), as one of the promising physical-layer security techniques, is capable of utilizing the spatial degree-of-freedom of channels to effectively enhance the security of wireless communications. In contrast to other physical-layer security techniques, the key distinguishing feature of AN is to generate specific interfering signals according to channel characteristics, increasing the secrecy capacity by reducing the wiretap channel capacity without affecting the legitimate channel capacity. Hence, this paper provides the latest survey of AN, including its evolution, modeling, backgrounds, applications, and future trends. Initially, we introduce the development, fundamentals, and backgrounds of AN. Subsequently, we highlight a comprehensive survey of the current state of research on various AN-empowered scenarios and AN-combined technologies. Finally, we discuss some technical challenges to tackle for AN-aided wireless security in the future.
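
The core AN mechanism described above can be sketched in a few lines of numpy: transmit noise only in the null space of the legitimate channel, so the intended receiver is unaffected while a generic eavesdropper channel picks up interference. The antenna counts and random channels below are made-up for illustration.

```python
import numpy as np

# Hypothetical sizes: a 4-antenna transmitter, a 2-antenna legitimate receiver.
rng = np.random.default_rng(7)
H_legit = rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))

# Null space of the legitimate channel via SVD: the right-singular vectors
# beyond rank(H_legit) span directions the legitimate receiver cannot see.
_, _, vh = np.linalg.svd(H_legit)
null_basis = vh.conj().T[:, 2:]            # (4, 2) basis of null(H_legit)

# Artificial noise transmitted entirely within that null space
an = null_basis @ (rng.normal(size=(2,)) + 1j * rng.normal(size=(2,)))

# The legitimate receiver sees (numerically) nothing from the AN component,
# while a random eavesdropper channel generally does not lie in the null space.
H_eve = rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))
legit_leak = np.linalg.norm(H_legit @ an)  # ~ 0
eve_hit = np.linalg.norm(H_eve @ an)       # > 0 in general
```

This is precisely the "reduce the wiretap channel capacity without affecting the legitimate channel capacity" property: the wiretap SNR drops while the legitimate link is untouched.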

Updated: 2025-07-09 02:54:05

Categories: cs.CR

Download: http://arxiv.org/abs/2507.06500v1

Terrier: A Deep Learning Repeat Classifier

Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Terrier is a deep learning model designed to overcome these challenges by classifying repetitive DNA sequences using a publicly available, curated repeat sequence library trained under the RepeatMasker schema. Poor representation of taxa within repeat databases often limits the classification accuracy and reproducibility of current repeat annotation methods, limiting our understanding of repeat evolution and function. Terrier overcomes these challenges by leveraging deep learning for improved accuracy. Trained on Repbase, which includes over 100,000 repeat families -- four times more than Dfam -- Terrier maps 97.1% of Repbase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice, fruit flies, humans, and mice), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in non-model amphibian, flatworm and Northern krill genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.

Updated: 2025-07-09 02:48:47

Categories: q-bio.GN,cs.LG,I.2

Download: http://arxiv.org/abs/2503.09312v2

TELSAFE: Security Gap Quantitative Risk Assessment Framework

Gaps between established security standards and their practical implementation can introduce vulnerabilities, potentially exposing systems to security risks. To effectively address and mitigate these security and compliance challenges, security risk management strategies are essential. However, such strategies must adhere to well-established practices and industry standards to ensure consistency, reliability, and compatibility both within and across organizations. In this paper, we introduce a new hybrid risk assessment framework called TELSAFE, which employs probabilistic modeling for quantitative risk assessment and eliminates the influence of expert opinion bias. The framework encompasses both qualitative and quantitative assessment phases, facilitating effective risk management strategies tailored to the unique requirements of organizations. A specific use case utilizing Common Vulnerabilities and Exposures (CVE)-related data demonstrates the framework's applicability and implementation in real-world scenarios, such as in the telecommunications industry.

Updated: 2025-07-09 02:45:00

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2507.06497v1

On the Inherent Privacy of Zeroth Order Projected Gradient Descent

Differentially private zeroth-order optimization methods have recently gained popularity in private fine tuning of machine learning models due to their reduced memory requirements. Current approaches for privatizing zeroth-order methods rely on adding Gaussian noise to the estimated zeroth-order gradients. However, since the search direction in the zeroth-order methods is inherently random, researchers including Tang et al. (2024) and Zhang et al. (2024a) have raised an important question: is the inherent noise in zeroth-order estimators sufficient to ensure the overall differential privacy of the algorithm? This work settles this question for a class of oracle-based optimization algorithms where the oracle returns zeroth-order gradient estimates. In particular, we show that for a fixed initialization, there exist strongly convex objective functions such that running (Projected) Zeroth-Order Gradient Descent (ZO-GD) is not differentially private. Furthermore, we show that even with random initialization and without revealing (initial and) intermediate iterates, the privacy loss in ZO-GD can grow superlinearly with the number of iterations when minimizing convex objective functions.
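
To make the object of study concrete, here is a textbook two-point zeroth-order estimator with a projection step (a generic sketch of ZO-GD, not the constructions from the paper). Note that the only randomness is the search direction itself, which is exactly the "inherent noise" whose privacy the paper analyzes.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order gradient estimate along a random Gaussian direction."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def zo_pgd(f, x0, project, steps=500, lr=0.05, rng=None):
    """Projected zeroth-order gradient descent: no Gaussian noise is added,
    so any privacy must come from the randomness of the directions u."""
    rng = rng or np.random.default_rng(0)
    x = x0.astype(float)
    for _ in range(steps):
        x = project(x - lr * zo_gradient(f, x, rng=rng))
    return x

# Example: minimize ||x||^2 over the unit ball (projection is a radial clip).
f = lambda x: float(x @ x)
project = lambda x: x / max(1.0, np.linalg.norm(x))
x_star = zo_pgd(f, np.array([0.9, -0.4]), project)
```

For this strongly convex objective the iterates contract toward the minimizer; the paper's negative results show that, despite the random directions, releasing such iterates can still leak information about the objective.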

Updated: 2025-07-09 02:44:06

Categories: math.OC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.05610v2

Can adversarial attacks by large language models be attributed?

Attributing outputs from Large Language Models (LLMs) in adversarial settings-such as cyberattacks and disinformation campaigns-presents significant challenges that are likely to grow in importance. We approach this attribution problem from both a theoretical and an empirical perspective, drawing on formal language theory (identification in the limit) and data-driven analysis of the expanding LLM ecosystem. By modeling an LLM's set of possible outputs as a formal language, we analyze whether finite samples of text can uniquely pinpoint the originating model. Our results show that, under mild assumptions of overlapping capabilities among models, certain classes of LLMs are fundamentally non-identifiable from their outputs alone. We delineate four regimes of theoretical identifiability: (1) an infinite class of deterministic (discrete) LLM languages is not identifiable (Gold's classical result from 1967); (2) an infinite class of probabilistic LLMs is also not identifiable (by extension of the deterministic case); (3) a finite class of deterministic LLMs is identifiable (consistent with Angluin's tell-tale criterion); and (4) even a finite class of probabilistic LLMs can be non-identifiable (we provide a new counterexample establishing this negative result). Complementing these theoretical insights, we quantify the explosion in the number of plausible model origins (hypothesis space) for a given output in recent years. Even under conservative assumptions-each open-source model fine-tuned on at most one new dataset-the count of distinct candidate models doubles approximately every 0.5 years, and allowing multi-dataset fine-tuning combinations yields doubling times as short as 0.28 years. This combinatorial growth, alongside the extraordinary computational cost of brute-force likelihood attribution across all models and potential users, renders exhaustive attribution infeasible in practice.
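
The growth figures quoted above follow from simple exponential arithmetic: with doubling time $T_d$, the candidate count after $t$ years is $N_0 \cdot 2^{t/T_d}$. The starting count below is a made-up placeholder; only the doubling times come from the abstract.

```python
# Hypothesis-space growth under a fixed doubling time (N0 is illustrative).
def candidates(n0, t_years, doubling_time):
    return n0 * 2 ** (t_years / doubling_time)

# Doubling every 0.5 years quadruples the hypothesis space each year,
# while a 0.28-year doubling time multiplies it by 2**(1/0.28) ~ 11.9 per year.
one_year_factor_slow = candidates(1, 1.0, 0.5)   # 4.0
one_year_factor_fast = 2 ** (1 / 0.28)           # ~ 11.9
```

This is why exhaustive likelihood attribution becomes infeasible: the number of models to test grows multiplicatively every year while the cost per model stays roughly constant.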

Updated: 2025-07-09 02:35:36

Categories: cs.AI,cs.CL,cs.CY,cs.FL

Download: http://arxiv.org/abs/2411.08003v2

Vectorised Hashing Based on Bernstein-Rabin-Winograd Polynomials over Prime Order Fields

We introduce the new AXU hash function decBRWHash, which is parameterised by the positive integer $c$ and is based on Bernstein-Rabin-Winograd (BRW) polynomials. Choosing $c>1$ gives a hash function which can be implemented using $c$-way single instruction multiple data (SIMD) instructions. We report a set of very comprehensive hand optimised assembly implementations of 4-decBRWHash using avx2 SIMD instructions available on modern Intel processors. For comparison, we also report similar carefully optimised avx2 assembly implementations of polyHash, an AXU hash function based on usual polynomials. Our implementations are over prime order fields, specifically the primes $2^{127}-1$ and $2^{130}-5$. For the prime $2^{130}-5$, for avx2 implementations, compared to the famous Poly1305 hash function, 4-decBRWHash is faster for messages which are a few hundred bytes long and achieves a speed-up of about 16% for message lengths in a few kilobytes range and improves to a speed-up of about 23% for message lengths in a few megabytes range.
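
For orientation, here is a minimal sketch of the "usual polynomial" polyHash baseline over the prime field $p = 2^{130}-5$, evaluated by Horner's rule. This is the comparison point, not decBRWHash itself; BRW polynomials restructure the evaluation to use roughly half the field multiplications, and the block encoding below is a simplifying assumption.

```python
# polyHash over p = 2**130 - 5, the same prime Poly1305 uses.
P = 2**130 - 5

def poly_hash(key, blocks):
    """Horner evaluation of sum_i m_i * key**(n-i) mod p over 16-byte blocks."""
    acc = 0
    for b in blocks:
        m = int.from_bytes(b, "little")
        acc = (acc + m) * key % P
    return acc

msg = b"hello world, this is a 32B test!"
blocks = [msg[i:i + 16] for i in range(0, len(msg), 16)]
digest = poly_hash(12345, blocks)
```

The AXU property comes from the fact that two distinct messages define distinct low-degree polynomials in the key, which agree on only a small fraction of the field; the SIMD question the paper studies is how to evaluate such polynomials across lanes efficiently.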

Updated: 2025-07-09 02:27:33

Categories: cs.CR

Download: http://arxiv.org/abs/2507.06490v1

Proximal Oracles for Optimization and Sampling

We consider convex optimization with non-smooth objective function and log-concave sampling with non-smooth potential (negative log density). In particular, we study two specific settings where the convex objective/potential function is either Hölder smooth or in hybrid form as the finite sum of Hölder smooth components. To overcome the challenges caused by non-smoothness, our algorithms employ two powerful proximal frameworks in optimization and sampling: the proximal point framework for optimization and the alternating sampling framework (ASF) that uses Gibbs sampling on an augmented distribution. A key component of both optimization and sampling algorithms is the efficient implementation of the proximal map by the regularized cutting-plane method. We establish its iteration-complexity under both Hölder smoothness and hybrid settings using novel convergence analysis, yielding results that are new to the literature. We further propose an adaptive proximal bundle method for non-smooth optimization that employs an aggressive adaptive stepsize strategy, which adjusts stepsizes only when necessary and never rejects iterates. The proposed method is universal since it does not need any problem parameters as input. Additionally, we provide an exact implementation of a proximal sampling oracle, analogous to the proximal map in optimization, along with simple complexity analyses for both the Hölder smooth and hybrid cases, using a novel technique based on a modified Gaussian integral. Finally, we combine this proximal sampling oracle and ASF to obtain a Markov chain Monte Carlo method with non-asymptotic complexity bounds for sampling in Hölder smooth and hybrid settings.

Updated: 2025-07-09 02:24:36

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2404.02239v2

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning

Despite advances in reinforcement learning (RL)-based video reasoning with large language models (LLMs), data collection and finetuning remain significant challenges. These methods often rely on large-scale supervised fine-tuning (SFT) with extensive video data and long Chain-of-Thought (CoT) annotations, making them costly and hard to scale. To address this, we present Video-RTS, a new approach to improve video reasoning capability with drastically improved data efficiency by combining data-efficient RL with a video-adaptive test-time scaling (TTS) strategy. Based on observations about the data scaling of RL samples, we skip the resource-intensive SFT step and employ efficient pure-RL training with output-based rewards, requiring no additional annotations or extensive fine-tuning. Furthermore, to utilize computational resources more efficiently, we introduce a sparse-to-dense video TTS strategy that improves inference by iteratively adding frames based on output consistency. We validate our approach on multiple video reasoning benchmarks, showing that Video-RTS surpasses existing video reasoning models by an average of 2.4% in accuracy using only 3.6% training samples. For example, Video-RTS achieves a 4.2% improvement on Video-Holmes, a recent and challenging video reasoning benchmark, and a 2.6% improvement on MMVU. Notably, our pure RL training and adaptive video TTS offer complementary strengths, enabling Video-RTS's strong reasoning performance.

Updated: 2025-07-09 02:06:13

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.06485v1

FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning

Federated learning aims at training models collaboratively across participants while protecting privacy. However, one major challenge for this paradigm is data heterogeneity, where biased data preferences across multiple clients harm the model's convergence and performance. In this paper, we first introduce powerful diffusion models into the federated learning paradigm and show that diffusion representations are effective steers during federated training. To explore the possibility of using diffusion representations in handling data heterogeneity, we propose a novel diffusion-inspired Federated paradigm with Diffusion Representation Collaboration, termed FedDifRC, leveraging meaningful guidance of diffusion models to mitigate data heterogeneity. The key idea is to construct text-driven diffusion contrasting and noise-driven diffusion regularization, aiming to provide abundant class-related semantic information and consistent convergence signals. On the one hand, we exploit the conditional feedback from the diffusion model for different text prompts to build a text-driven contrastive learning strategy. On the other hand, we introduce a noise-driven consistency regularization to align local instances with diffusion denoising representations, constraining the optimization region in the feature space. In addition, FedDifRC can be extended to a self-supervised scheme without relying on any labeled data. We also provide a theoretical analysis for FedDifRC to ensure convergence under non-convex objectives. The experiments on different scenarios validate the effectiveness of FedDifRC and the efficiency of crucial components.

Updated: 2025-07-09 01:57:57

Categories: cs.LG

Download: http://arxiv.org/abs/2507.06482v1

Generative Lagrangian data assimilation for ocean dynamics under extreme sparsity

Reconstructing ocean dynamics from observational data is fundamentally limited by the sparse, irregular, and Lagrangian nature of spatial sampling, particularly in subsurface and remote regions. This sparsity poses significant challenges for forecasting key phenomena such as eddy shedding and rogue waves. Traditional data assimilation methods and deep learning models often struggle to recover mesoscale turbulence under such constraints. We leverage a deep learning framework that combines neural operators with denoising diffusion probabilistic models (DDPMs) to reconstruct high-resolution ocean states from extremely sparse Lagrangian observations. By conditioning the generative model on neural operator outputs, the framework accurately captures small-scale, high-wavenumber dynamics even at $99\%$ sparsity (for synthetic data) and $99.9\%$ sparsity (for real satellite observations). We validate our method on benchmark systems, synthetic float observations, and real satellite data, demonstrating robust performance under severe spatial sampling limitations as compared to other deep learning baselines.

Updated: 2025-07-09 01:56:25

Categories: physics.ao-ph,cs.AI,cs.LG,math.DS,nlin.CD

Download: http://arxiv.org/abs/2507.06479v1

Sequential Attention-based Sampling for Histopathological Analysis

Deep neural networks are increasingly applied for automated histopathology. Yet, whole-slide images (WSIs) are often acquired at gigapixel sizes, rendering it computationally infeasible to analyze them entirely at high resolution. Diagnostic labels are largely available only at the slide-level, because expert annotation of images at a finer (patch) level is both laborious and expensive. Moreover, regions with diagnostic information typically occupy only a small fraction of the WSI, making it inefficient to examine the entire slide at full resolution. Here, we propose SASHA -- Sequential Attention-based Sampling for Histopathological Analysis -- a deep reinforcement learning approach for efficient analysis of histopathological images. First, SASHA learns informative features with a lightweight hierarchical, attention-based multiple instance learning (MIL) model. Second, SASHA samples intelligently and zooms selectively into a small fraction (10-20\%) of high-resolution patches, to achieve reliable diagnosis. We show that SASHA matches state-of-the-art methods that analyze the WSI fully at high-resolution, albeit at a fraction of their computational and memory costs. In addition, it significantly outperforms competing, sparse sampling methods. We propose SASHA as an intelligent sampling model for medical imaging challenges that involve automated diagnosis with exceptionally large images containing sparsely informative features.

Updated: 2025-07-09 01:48:46

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2507.05077v2

Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents

AI agents integrated with Web3 offer autonomy and openness but raise security concerns as they interact with financial protocols and immutable smart contracts. This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios. We introduce the concept of context manipulation -- a comprehensive attack vector that exploits unprotected context surfaces, including input channels, memory modules, and external data feeds. It expands on traditional prompt injection and reveals a more stealthy and persistent threat: memory injection. Using ElizaOS, a representative decentralized AI agent framework for automated Web3 operations, we showcase that malicious injections into prompts or historical records can trigger unauthorized asset transfers and protocol violations which could be financially devastating in practice. To quantify these risks, we introduce CrAIBench, a Web3-focused benchmark covering 150+ realistic blockchain tasks, such as token transfers, trading, bridges, and cross-chain interactions, and 500+ attack test cases using context manipulation. Our evaluation results confirm that AI models are significantly more vulnerable to memory injection compared to prompt injection. Finally, we evaluate a comprehensive defense roadmap, finding that prompt-injection defenses and detectors only provide limited protection when stored context is corrupted, whereas fine-tuning-based defenses substantially reduce attack success rates while preserving performance on single-step tasks. These results underscore the urgent need for AI agents that are both secure and fiduciarily responsible in blockchain environments.

Updated: 2025-07-09 01:38:20

Categories: cs.CR,cs.AI,I.2.7

Download: http://arxiv.org/abs/2503.16248v3

GNNs Meet Sequence Models Along the Shortest-Path: an Expressive Method for Link Prediction

Graph Neural Networks (GNNs) often struggle to capture the link-specific structural patterns crucial for accurate link prediction, as their node-centric message-passing schemes overlook the subgraph structures connecting a pair of nodes. Existing methods to inject such structural context either incur high computational cost or rely on simplistic heuristics (e.g., common neighbor counts) that fail to model multi-hop dependencies. We introduce SP4LP (Shortest Path for Link Prediction), a novel framework that combines GNN-based node encodings with sequence modeling over shortest paths. Specifically, SP4LP first applies a GNN to compute representations for all nodes, then extracts the shortest path between each candidate node pair and processes the resulting sequence of node embeddings using a sequence model. This design enables SP4LP to capture expressive multi-hop relational patterns with computational efficiency. Empirically, SP4LP achieves state-of-the-art performance across link prediction benchmarks. Theoretically, we prove that SP4LP is strictly more expressive than standard message-passing GNNs and several state-of-the-art structural features methods, establishing it as a general and principled approach for link prediction in graphs.
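
The SP4LP pipeline (node encodings, then a sequence model read over the shortest path) can be sketched as follows. The BFS and scoring are generic; the mean-pooled "reader" is a deliberately crude stand-in for the paper's sequence model, and the embeddings here would come from a trained GNN rather than being given.

```python
import numpy as np
from collections import deque

def shortest_path(adj, s, t):
    """BFS shortest path from s to t in an unweighted adjacency-list graph."""
    prev = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    if t not in prev:
        return None
    path, u = [], t
    while u is not None:
        path.append(u)
        u = prev[u]
    return path[::-1]

def sp4lp_score(adj, emb, s, t):
    """Score a candidate link from the embedding sequence along the shortest path.
    A mean-pooled reader stands in for the paper's learned sequence model."""
    path = shortest_path(adj, s, t)
    if path is None:
        return 0.0                       # disconnected pair: no path evidence
    seq = emb[path]                      # (path_len, dim) node embeddings
    pooled = seq.mean(axis=0)            # placeholder for an RNN/Transformer
    return float(1 / (1 + np.exp(-pooled.sum())))
```

The key design point survives even in this sketch: the score depends on the ordered multi-hop structure between the two endpoints, not just on their two node embeddings, which is what plain message-passing GNN decoders miss.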

Updated: 2025-07-09 01:37:19

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07138v1
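
SP4LP's three-step pipeline (GNN encodings, shortest-path extraction, sequence modeling over the path) can be sketched in miniature. Everything model-specific below is a hypothetical stand-in: fixed toy embeddings replace the GNN, and a tiny fixed-weight recurrence replaces the learned sequence model.

```python
import math
from collections import deque

def shortest_path(adj, src, dst):
    # BFS over an unweighted graph; returns the node sequence src..dst,
    # or None if dst is unreachable.
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nb in adj.get(node, ()):
            if nb not in prev:
                prev[nb] = node
                queue.append(nb)
    return None

def sp4lp_score(adj, emb, u, v):
    # SP4LP's three steps in miniature: (1) node representations, here
    # fixed toy embeddings standing in for GNN output; (2) extract the
    # shortest u-v path; (3) run a sequence model over the path's
    # embeddings, here a tiny fixed-weight elementwise recurrence
    # standing in for a learned LSTM/transformer.
    path = shortest_path(adj, u, v)
    if path is None:
        return float("-inf")  # no connecting path: no structural evidence
    dim = len(next(iter(emb.values())))
    h = [0.0] * dim
    for node in path:
        h = [math.tanh(0.5 * h[i] + emb[node][i]) for i in range(dim)]
    return sum(h)  # scalar link score

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
emb = {n: [0.1 * (n + 1), -0.05 * n] for n in adj}
print(shortest_path(adj, 0, 3))
print(round(sp4lp_score(adj, emb, 0, 3), 3))
```

The point of the sketch is the data flow, not the numbers: the path supplies link-specific structure that plain node-pair scoring would miss.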

Stochastic Alignments: Matching an Observed Trace to Stochastic Process Models

Process mining leverages event data extracted from IT systems to generate insights into the business processes of organizations. Such insights benefit from explicitly considering the frequency of behavior in business processes, which is captured by stochastic process models. Given an observed trace and a stochastic process model, conventional alignment-based conformance checking techniques face a fundamental limitation: They prioritize matching the trace to a model path with minimal deviations, which may, however, lead to selecting an unlikely path. In this paper, we study the problem of matching an observed trace to a stochastic process model by identifying a likely model path with a low edit distance to the trace. We phrase this as an optimization problem and develop a heuristic-guided path-finding algorithm to solve it. Our open-source implementation demonstrates the feasibility of the approach and shows that it can provide new, useful diagnostic insights for analysts.

Updated: 2025-07-09 01:20:53

Categories: cs.FL,cs.LG

Download: http://arxiv.org/abs/2507.06472v1
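
The matching objective described above, a likely model path with a low edit distance to the trace, can be illustrated on a toy model. The explicit path distribution and the trade-off weight `lam` are illustrative assumptions; the paper develops heuristic-guided search over a stochastic process model rather than this brute-force enumeration.

```python
import math

def edit_distance(a, b):
    # Levenshtein distance via the classic one-row dynamic program.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[len(b)]

def best_stochastic_alignment(trace, model_paths, lam=1.0):
    # Toy version of the objective: among the model's paths, pick a
    # *likely* path with a *low* edit distance to the observed trace by
    # maximising  log P(path) - lam * edit_distance(trace, path).
    return max(
        model_paths,
        key=lambda path: math.log(model_paths[path]) - lam * edit_distance(trace, path),
    )

# A tiny explicit path distribution over activity sequences.
paths = {("a", "b", "c"): 0.7, ("a", "x", "x", "c"): 0.2, ("c",): 0.1}
print(best_stochastic_alignment(("a", "b", "x", "c"), paths))
```

With `lam` large the objective degenerates to conventional minimal-deviation alignment; with `lam` small it simply picks the most probable path, which is exactly the tension the paper addresses.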

GTA1: GUI Test-time Scaling Agent

Graphical user interface (GUI) agents autonomously operate across platforms (e.g., Linux) to complete tasks by interacting with visual elements. Specifically, a user instruction is decomposed into a sequence of action proposals, each corresponding to an interaction with the GUI. After each action, the agent observes the updated GUI environment to plan the next step. However, two main challenges arise: i) resolving ambiguity in task planning (i.e., the action proposal sequence), where selecting an appropriate plan is non-trivial, as many valid ones may exist; ii) accurately grounding actions in complex and high-resolution interfaces, i.e., precisely interacting with visual targets. This paper investigates the two aforementioned challenges with our GUI Test-time Scaling Agent, namely GTA1. First, to select the most appropriate action proposal, we introduce a test-time scaling method. At each step, we sample multiple candidate action proposals and leverage a judge model to evaluate and select the most suitable one. It trades off computation for better decision quality by concurrent sampling, shortening task execution steps, and improving overall performance. Second, we propose a model that achieves improved accuracy when grounding the selected action proposal to its corresponding visual elements. Our key insight is that reinforcement learning (RL) facilitates visual grounding through inherent objective alignments, rewarding successful clicks on interface elements. Experimentally, our method establishes state-of-the-art performance across diverse benchmarks. For example, GTA1-7B achieves 50.1%, 92.4%, and 67.7% accuracies on Screenspot-Pro, Screenspot-V2, and OSWorld-G, respectively. When paired with a planner applying our test-time scaling strategy, it exhibits state-of-the-art agentic performance (e.g., 45.2% task success rate on OSWorld). We open-source our code and models here.

Updated: 2025-07-09 01:16:44

Categories: cs.AI

Download: http://arxiv.org/abs/2507.05791v2

MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters

We address the challenge of optimizing meta-parameters (hyperparameters) in machine learning, a key factor for efficient training and high model performance. Rather than relying on expensive meta-parameter search methods, we introduce MetaOptimize: a dynamic approach that adjusts meta-parameters, particularly step sizes (also known as learning rates), during training. More specifically, MetaOptimize can wrap around any first-order optimization algorithm, tuning step sizes on the fly to minimize a specific form of regret that considers the long-term impact of step sizes on training, through a discounted sum of future losses. We also introduce lower-complexity variants of MetaOptimize that, in conjunction with its adaptability to various optimization algorithms, achieve performance comparable to those of the best hand-crafted learning rate schedules across diverse machine learning tasks.

Updated: 2025-07-09 01:03:54

Categories: cs.LG,cs.AI,math.OC

Download: http://arxiv.org/abs/2402.02342v6
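
As a rough illustration of adapting a step size during training (not MetaOptimize's regret-based algorithm), a much-simplified hypergradient-style rule grows the step size when successive gradients align and shrinks it when they oppose:

```python
def sgd_with_meta_stepsize(grad, w, alpha=0.05, beta=0.01, steps=100):
    # Hypothetical, much-simplified relative of MetaOptimize: adapt the
    # step size alpha online from the alignment of successive gradients
    # (aligned gradients -> alpha was too small, grow it; opposed
    # gradients -> overshooting, shrink it).  The paper instead minimises
    # a discounted sum of future losses and can wrap arbitrary base
    # optimisers; this sketch adapts a scalar alpha for plain SGD in 1-D.
    g_prev = 0.0
    for _ in range(steps):
        g = grad(w)
        alpha = max(1e-6, alpha + beta * g * g_prev)  # meta-update of alpha
        w -= alpha * g                                # base SGD update
        g_prev = g
    return w, alpha

# Quadratic objective L(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, alpha = sgd_with_meta_stepsize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(round(w, 4))
```

The self-stabilising behaviour is visible even in this toy: alpha grows while gradients point the same way, then freezes as the gradient vanishes near the optimum.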

Mitigating Message Imbalance in Fraud Detection with Dual-View Graph Representation Learning

Graph representation learning has become a mainstream method for fraud detection due to its strong expressive power, which focuses on enhancing node representations through improved neighborhood knowledge capture. However, the focus on local interactions leads to imbalanced transmission of global topological information and increased risk of node-specific information being overwhelmed during aggregation due to the imbalance between fraud and benign nodes. In this paper, we first summarize the impact of topology and class imbalance on downstream tasks in GNN-based fraud detection, as the problem of imbalanced supervisory messages is caused by fraudsters' topological behavior obfuscation and identity feature concealment. Based on statistical validation, we propose a novel dual-view graph representation learning method to mitigate Message imbalance in Fraud Detection (MimbFD). Specifically, we design a topological message reachability module for high-quality node representation learning to penetrate fraudsters' camouflage and alleviate insufficient propagation. Then, we introduce a local confounding debiasing module to adjust node representations, enhancing the stable association between node representations and labels to balance the influence of different classes. Finally, we conducted experiments on three public fraud datasets, and the results demonstrate that MimbFD exhibits outstanding performance in fraud detection.

Updated: 2025-07-09 01:00:55

Categories: cs.LG,cs.SI

Download: http://arxiv.org/abs/2507.06469v1

Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models

Multi-agent interactions have long fueled innovation, from natural predator-prey dynamics to the space race. Self-play (SP) algorithms try to harness these dynamics by pitting agents against ever-improving opponents, thereby creating an implicit curriculum toward learning high-quality solutions. However, SP often fails to produce diverse solutions and can get stuck in locally optimal behaviors. We introduce Foundation-Model Self-Play (FMSP), a new direction that leverages the code-generation capabilities and vast knowledge of foundation models (FMs) to overcome these challenges by leaping across local optima in policy space. We propose a family of approaches: (1) Vanilla Foundation-Model Self-Play (vFMSP) continually refines agent policies via competitive self-play; (2) Novelty-Search Self-Play (NSSP) builds a diverse population of strategies, ignoring performance; and (3) the most promising variant, Quality-Diversity Self-Play (QDSP), creates a diverse set of high-quality policies by combining the diversity of NSSP and the refinement of vFMSP. We evaluate FMSPs in Car Tag, a continuous-control pursuer-evader setting, and in Gandalf, a simple AI safety simulation in which an attacker tries to jailbreak an LLM's defenses. In Car Tag, FMSPs explore a wide variety of reinforcement learning, tree search, and heuristic-based methods, to name just a few. In terms of discovered policy quality, QDSP and vFMSP surpass strong human-designed strategies. In Gandalf, FMSPs can successfully automatically red-team an LLM, breaking through and jailbreaking six different, progressively stronger levels of defense. Furthermore, FMSPs can automatically proceed to patch the discovered vulnerabilities. Overall, FMSPs represent a promising new research frontier of improving self-play with foundation models, opening fresh paths toward more creative and open-ended strategy discovery.

Updated: 2025-07-09 00:58:19

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.06466v1
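
The interplay of the three variants can be sketched as a quality-diversity archive loop: NSSP contributes the descriptor-keyed archive (diversity), vFMSP contributes the per-cell refinement. The numeric "policies", mutation operator, and descriptor below are toy assumptions standing in for the FM-generated code of the paper.

```python
import random

def qdsp_loop(propose, evaluate, descriptor, seed_policy, iters=200, rng=None):
    # Skeleton of Quality-Diversity Self-Play (QDSP): keep an archive of
    # policies keyed by a behaviour descriptor (NSSP-style diversity) and,
    # within each cell, keep only the best performer (vFMSP-style
    # refinement).  `propose` stands in for the foundation model that
    # writes a new policy from an existing one.
    rng = rng or random.Random(0)
    archive = {}                       # descriptor cell -> (policy, quality)
    pool = [seed_policy]
    for _ in range(iters):
        parent = rng.choice(pool)
        child = propose(parent, rng)
        cell, quality = descriptor(child), evaluate(child)
        if cell not in archive or quality > archive[cell][1]:
            archive[cell] = (child, quality)
            pool = [p for p, _ in archive.values()]
    return archive

# Toy domain: a "policy" is one number, quality peaks at 5, and the
# descriptor buckets policies by sign and magnitude.
propose = lambda p, rng: p + rng.uniform(-1.0, 1.0)
evaluate = lambda p: -(p - 5.0) ** 2
descriptor = lambda p: (p > 0, int(abs(p)) // 2)
archive = qdsp_loop(propose, evaluate, descriptor, seed_policy=0.0)
print(len(archive), "cells, best quality", round(max(q for _, q in archive.values()), 3))
```

Dropping the quality test recovers NSSP (keep anything novel); collapsing the archive to one cell recovers vFMSP (pure refinement).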

Automating Evaluation of Diffusion Model Unlearning with (Vision-) Language Model World Knowledge

Machine unlearning (MU) is a promising cost-effective method to cleanse undesired information (generated concepts, biases, or patterns) from foundational diffusion models. While MU is orders of magnitude less costly than retraining a diffusion model without the undesired information, it can be challenging and labor-intensive to prove that the information has been fully removed from the model. Moreover, MU can damage diffusion model performance on surrounding concepts that one would like to retain, making it unclear if the diffusion model is still fit for deployment. We introduce autoeval-dmun, an automated tool which leverages (vision-) language models to thoroughly assess unlearning in diffusion models. Given a target concept, autoeval-dmun extracts structured, relevant world knowledge from the language model to identify nearby concepts which are likely damaged by unlearning and to circumvent unlearning with adversarial prompts. We use our automated tool to evaluate popular diffusion model unlearning methods, revealing that language models (1) impose semantic orderings of nearby concepts which correlate well with unlearning damage and (2) effectively circumvent unlearning with synthetic adversarial prompts.

Updated: 2025-07-09 00:51:09

Categories: cs.LG

Download: http://arxiv.org/abs/2507.07137v1

SoftSignSGD(S3): An Enhanced Optimizer for Practical DNN Training and Loss Spikes Minimization Beyond Adam

Adam has proven remarkably successful in training deep neural networks, but the mechanisms underlying its empirical successes and limitations remain underexplored. In this study, we demonstrate that the effectiveness of Adam stems largely from its similarity to SignSGD in robustly handling large gradient fluctuations, yet it is also vulnerable to destabilizing loss spikes due to its uncontrolled update scaling. To enhance the advantage of Adam and mitigate its limitation, we propose SignSoftSGD (S3), a novel optimizer with three key innovations. First, S3 generalizes the sign-like update by employing a flexible $p$-th order momentum ($p \geq 1$) in the denominator, departing from the conventional second-order momentum (variance) preconditioning. This design enables enhanced performance while achieving stable training even with aggressive learning rates. Second, S3 minimizes the occurrences of loss spikes through unified exponential moving average coefficients for the numerator and denominator momenta, which inherently bound updates to $[-1, 1]$ and simplify hyperparameter tuning. Third, S3 incorporates an equivalent Nesterov's accelerated gradient (NAG) module, accelerating convergence without memory overhead. Theoretically, we prove that S3 achieves the optimal convergence rate of $O(1/T^{1/4})$ for general nonconvex stochastic optimization under weak assumptions. Extensive experiments across a range of vision and language tasks show that S3 not only converges more rapidly and improves performance but also rarely experiences loss spikes, even with a $10\times$ larger learning rate. In fact, S3 delivers performance comparable to or better than AdamW with $2\times$ the training steps, establishing its efficacy in both efficiency and final task performance.

Updated: 2025-07-09 00:47:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.06464v1

Cons-training Tensor Networks: Embedding and Optimization Over Discrete Linear Constraints

In this study, we introduce a novel family of tensor networks, termed constrained matrix product states (MPS), designed to exactly incorporate arbitrary discrete linear constraints, including inequalities, into sparse block structures. These tensor networks are particularly tailored for modeling distributions with support strictly over the feasible space, offering benefits such as reducing the search space in optimization problems, alleviating overfitting, improving training efficiency, and decreasing model size. Central to our approach is the concept of a quantum region, an extension of quantum numbers traditionally used in U(1) symmetric tensor networks, adapted to capture any linear constraint, including the unconstrained scenario. We further develop a novel canonical form for these new MPS, which allows for the merging and factorization of tensor blocks according to quantum region fusion rules and permits optimal truncation schemes. Utilizing this canonical form, we apply an unsupervised training strategy to optimize arbitrary objective functions subject to discrete linear constraints. Our method's efficacy is demonstrated by solving the quadratic knapsack problem, achieving superior performance compared to a leading nonlinear integer programming solver. Additionally, we analyze the complexity and scalability of our approach, demonstrating its potential in addressing complex constrained combinatorial optimization problems.

Updated: 2025-07-09 00:36:21

Categories: math.NA,cs.LG,cs.NA,quant-ph

Download: http://arxiv.org/abs/2405.09005v5

Energy-Efficient Supervised Learning with a Binary Stochastic Forward-Forward Algorithm

Reducing energy consumption has become a pressing need for modern machine learning, which has achieved many of its most impressive results by scaling to larger and more energy-consumptive neural networks. Unfortunately, the main algorithm for training such networks, backpropagation, poses significant challenges for custom hardware accelerators, due to both its serial dependencies and the memory footprint needed to store forward activations for the backward pass. Alternatives to backprop, although less effective, do exist; here the main computational bottleneck becomes matrix multiplication. In this study, we derive forward-forward algorithms for binary, stochastic units. Binarization of the activations transforms matrix multiplications into indexing operations, which can be executed efficiently in hardware. Stochasticity, combined with tied weights across units with different biases, bypasses the information bottleneck imposed by binary units. Furthermore, although binary sampling is slow and expensive in traditional hardware, it can be implemented cheaply and at very high speed with p-bits (probabilistic bits), novel devices made up of unstable magnets. We evaluate our proposed algorithms on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, showing that their performance is close to real-valued forward-forward, but with an estimated energy savings of about one order of magnitude.

Updated: 2025-07-09 00:29:06

Categories: cs.LG,cs.NE

Download: http://arxiv.org/abs/2507.06461v1
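
The claim that binarization turns matrix multiplies into indexing can be made concrete. The layer below is a toy sketch (not the paper's training algorithm): binary stochastic units fire with sigmoid probability, and the next layer's "matmul" reduces to summing the weight columns of the active units.

```python
import math
import random

def binary_stochastic_layer(x_real, weights, rng):
    # Binary stochastic units: each unit fires (1) with probability
    # sigmoid(input), the kind of sampling p-bits realise cheaply.
    probs = [1.0 / (1.0 + math.exp(-v)) for v in x_real]
    spikes = [1 if rng.random() < p else 0 for p in probs]
    # With binary inputs, the next layer's matrix multiply collapses into
    # an indexing operation: sum only the weight columns of active units.
    active = [j for j, s in enumerate(spikes) if s]
    pre_acts = [sum(row[j] for j in active) for row in weights]
    return spikes, pre_acts

def goodness(pre_acts):
    # Forward-forward "goodness" of a layer, trained to be high on
    # positive data and low on negative data.
    return sum(a * a for a in pre_acts)

rng = random.Random(0)
x = [2.0, -2.0, 0.5]
W = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
spikes, pre = binary_stochastic_layer(x, W, rng)
print(spikes, pre, round(goodness(pre), 3))
```

The indexing sum is numerically identical to the dense matmul against the spike vector; the saving is that no multiplications are performed at all.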

Geometric Constraints in Deep Learning Frameworks: A Survey

Stereophotogrammetry is an established technique for scene understanding. Its origins go back to at least the 1800s, when people first started to investigate using photographs to measure the physical properties of the world. Since then, thousands of approaches have been explored. The classic geometric technique of Shape from Stereo is built on using geometry to define constraints on scene and camera geometry; more recent approaches instead apply deep learning without any attempt to explicitly model the geometry. In this survey, we explore geometry-inspired deep learning-based frameworks. We compare and contrast the geometry-enforcing constraints integrated into deep learning frameworks for depth estimation and other closely related vision tasks. We present a new taxonomy for the prevalent geometry-enforcing constraints used in modern deep learning frameworks. We also present insightful observations and potential future research directions.

Updated: 2025-07-09 00:27:59

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2403.12431v2

Theme-Explanation Structure for Table Summarization using Large Language Models: A Case Study on Korean Tabular Data

Tables are a primary medium for conveying critical information in administrative domains, yet their complexity hinders utilization by Large Language Models (LLMs). This paper introduces the Theme-Explanation Structure-based Table Summarization (Tabular-TX) pipeline, a novel approach designed to generate highly interpretable summaries from tabular data, with a specific focus on Korean administrative documents. Current table summarization methods often neglect the crucial aspect of human-friendly output. Tabular-TX addresses this by first employing a multi-step reasoning process to ensure deep table comprehension by LLMs, followed by a journalist persona prompting strategy for clear sentence generation. Crucially, it then structures the output into a Theme Part (an adverbial phrase) and an Explanation Part (a predicative clause), significantly enhancing readability. Our approach leverages in-context learning, obviating the need for extensive fine-tuning and associated labeled data or computational resources. Experimental results show that Tabular-TX effectively processes complex table structures and metadata, offering a robust and efficient solution for generating human-centric table summaries, especially in low-resource scenarios.

Updated: 2025-07-09 00:21:40

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2501.10487v3
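
A minimal sketch of the output contract: a persona prompt plus a parser that splits the answer into the Theme Part and Explanation Part. The prompt wording and the `|` separator are illustrative assumptions, not the paper's exact template.

```python
def build_prompt(table_md, theme):
    # Hypothetical prompt in the spirit of Tabular-TX: journalist persona,
    # explicit reasoning steps, and a fixed "theme | explanation" output
    # format so the answer can be parsed into the two structured parts.
    return (
        "You are a journalist summarising an administrative table.\n"
        "Step 1: read the table. Step 2: identify the key figure for the "
        f"theme '{theme}'. Step 3: answer in the exact format "
        "'<theme part> | <explanation part>'.\n\n" + table_md
    )

def parse_summary(llm_output):
    # Split the model's answer into the Theme Part (an adverbial phrase)
    # and the Explanation Part (a predicative clause).
    theme, sep, explanation = llm_output.partition("|")
    if not sep:
        raise ValueError("output not in 'theme | explanation' format")
    return {"theme": theme.strip(), "explanation": explanation.strip()}

print(parse_summary("In Q3 2024, | tax revenue rose 4.2% year over year."))
```

Because the structure is enforced purely through the prompt and a parser, the pipeline needs only in-context learning, matching the abstract's no-fine-tuning claim.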

EA: An Event Autoencoder for High-Speed Vision Sensing

High-speed vision sensing is essential for real-time perception in applications such as robotics, autonomous vehicles, and industrial automation. Traditional frame-based vision systems suffer from motion blur, high latency, and redundant data processing, limiting their performance in dynamic environments. Event cameras, which capture asynchronous brightness changes at the pixel level, offer a promising alternative but pose challenges in object detection due to sparse and noisy event streams. To address this, we propose an event autoencoder architecture that efficiently compresses and reconstructs event data while preserving critical spatial and temporal features. The proposed model employs convolutional encoding and incorporates adaptive threshold selection and a lightweight classifier to enhance recognition accuracy while reducing computational complexity. Experimental results on the existing Smart Event Face Dataset (SEFD) demonstrate that our approach achieves comparable accuracy to the YOLO-v4 model while utilizing up to $35.5\times$ fewer parameters. Implementations on embedded platforms, including Raspberry Pi 4B and NVIDIA Jetson Nano, show high frame rates ranging from 8 FPS up to 44.8 FPS. The proposed classifier exhibits up to 87.84x better FPS than the state-of-the-art and significantly improves event-based vision performance, making it ideal for low-power, high-speed applications in real-time edge computing.

Updated: 2025-07-09 00:21:15

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.06459v1

Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM

Passive tracking methods, such as phone and wearable sensing, have become dominant in monitoring human behaviors in modern ubiquitous computing studies. While there have been significant advances in machine-learning approaches to translate periods of raw sensor data to model momentary behaviors, (e.g., physical activity recognition), there still remains a significant gap in the translation of these sensing streams into meaningful, high-level, context-aware insights that are required for various applications (e.g., summarizing an individual's daily routine). To bridge this gap, experts often need to employ a context-driven sensemaking process in real-world studies to derive insights. This process often requires manual effort and can be challenging even for experienced researchers due to the complexity of human behaviors. We conducted three rounds of user studies with 21 experts to explore solutions to address challenges with sensemaking. We follow a human-centered design process to identify needs and design, iterate, build, and evaluate Vital Insight (VI), a novel, LLM-assisted, prototype system to enable human-in-the-loop inference (sensemaking) and visualizations of multi-modal passive sensing data from smartphones and wearables. Using the prototype as a technology probe, we observe experts' interactions with it and develop an expert sensemaking model that explains how experts move between direct data representations and AI-supported inferences to explore, question, and validate insights. Through this iterative process, we also synthesize and discuss a list of design implications for the design of future AI-augmented visualization systems to better assist experts' sensemaking processes in multi-modal health sensing data.

Updated: 2025-07-09 00:14:20

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2410.14879v3

Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations

Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix. For audio, this has led to numerous applications using time-frequency (TF) representations like the Short-Time Fourier Transform. However, extending these applications to irregularly-spaced TF representations, like the Constant-Q transform, wavelets, or sinusoidal analysis models, has not been possible, since these representations cannot be directly stored in matrix form. In this paper, we formulate NMF in terms of learnable functions (instead of vectors) and show that NMF can be extended to a wider variety of signal classes that need not be regularly sampled.

Updated: 2025-07-09 00:09:50

Categories: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2404.04439v2
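
For contrast with the paper's functional formulation, the classic matrix setting it generalizes can be sketched with Lee-Seung multiplicative updates (pure-Python lists for illustration only):

```python
import random

def matmul(A, B):
    # Dense matrix product using plain lists (illustration only).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def nmf(V, rank, iters=500, eps=1e-9, seed=0):
    # Classic NMF via Lee-Seung multiplicative updates: V ~ W H with all
    # factors non-negative.  This is the regularly-sampled matrix setting
    # the paper generalises by replacing the factor vectors with
    # learnable functions, so irregularly-spaced representations fit too.
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = [list(col) for col in zip(*W)]
        WtV, WtWH = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        Ht = [list(col) for col in zip(*H)]
        VHt, WHHt = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

V = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 0.0, 3.0]]  # non-negative, rank 2
W, H = nmf(V, rank=2)
R = matmul(W, H)
err = sum((V[i][j] - R[i][j]) ** 2 for i in range(3) for j in range(3))
print("squared reconstruction error:", round(err, 6))
```

The multiplicative form keeps every factor entry non-negative by construction, the property the paper must preserve when the rows of W and H become learned functions of continuous coordinates.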

By Xinhai (Sean) Zou.