Data-driven identification of nonlinear dynamical systems with LSTM autoencoders and Normalizing Flows
While linear models have been useful for solving problems across many fields, the demand for improved performance and efficiency increasingly pushes engineered systems into nonlinear operating regimes. As a result, nonlinear models are now essential for designing and controlling these systems, yet identifying a nonlinear system is considerably harder than identifying a linear one, which makes nonlinear modeling and identification crucial for the design, manufacturing, and testing of complex systems. This study presents advanced nonlinear, deep-learning-based methods for system identification. Two deep neural network models, an LSTM autoencoder and Normalizing Flows, are explored for their potential to extract temporal features from time-series data and to relate those features to system parameters, respectively. The presented framework offers a nonlinear approach to system identification, enabling it to handle complex systems. As case studies, we consider the Duffing and Lorenz systems, as well as fluid flows such as flow over a cylinder and the 2-D lid-driven cavity problem. The results indicate that the framework captures the relevant features and effectively relates them to system parameters, satisfying the identification requirements of nonlinear systems.
Updated: 2025-03-05 23:58:59
Categories: cs.LG
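As a rough illustration of the feature-extraction component described in the abstract above, the sketch below shows a minimal LSTM autoencoder in PyTorch together with a small head that maps the latent code to system parameters; the layer sizes, latent dimension, and the regression head are illustrative assumptions, not the authors' actual architecture.

```python
# Minimal sketch (PyTorch) of an LSTM autoencoder for time-series feature extraction,
# plus a small head mapping the latent code to system parameters.
# Sizes and training details are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_channels=2, latent_dim=8, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(n_channels, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent_dim)
        self.from_latent = nn.Linear(latent_dim, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_channels)

    def forward(self, x):                      # x: (batch, time, channels)
        _, (h, _) = self.encoder(x)            # last hidden state summarizes the sequence
        z = self.to_latent(h[-1])              # latent temporal features
        rep = self.from_latent(z).unsqueeze(1).repeat(1, x.size(1), 1)
        dec, _ = self.decoder(rep)
        return self.out(dec), z                # reconstruction and latent code

model = LSTMAutoencoder()
param_head = nn.Linear(8, 3)                   # hypothetical map: latent code -> system parameters
x = torch.randn(16, 100, 2)                    # e.g. simulated Duffing trajectories
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)        # reconstruction loss; a parameter loss on param_head(z) would be added
```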
Improving Data Efficiency via Curating LLM-Driven Rating Systems
Instruction tuning is critical for adapting large language models (LLMs) to downstream tasks, and recent studies have demonstrated that small amounts of human-curated data can outperform larger datasets, challenging traditional data scaling laws. While LLM-based data quality rating systems offer a cost-effective alternative to human annotation, they often suffer from inaccuracies and biases, even in powerful models like GPT-4. In this work, we introduce DS2, a Diversity-aware Score curation method for Data Selection. By systematically modeling error patterns through a score transition matrix, DS2 corrects LLM-based scores and promotes diversity in the selected data samples. Our approach shows that a curated subset (just 3.3% of the original dataset) outperforms full-scale datasets (300k samples) across various machine-alignment benchmarks, and matches or surpasses human-aligned datasets such as LIMA with the same sample size (1k samples). These findings challenge conventional data scaling assumptions, highlighting that redundant, low-quality samples can degrade performance and reaffirming that "more can be less."
Updated: 2025-03-05 23:56:10
Categories: cs.CL,cs.AI
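The abstract above mentions correcting LLM-based scores via a score transition matrix. The snippet below is only a generic illustration of that idea (a confusion-style matrix linking true and reported scores, inverted with Bayes' rule); the matrix values are hypothetical, and DS2's actual estimation and curation procedure is not reproduced here.

```python
# Generic sketch (NumPy): T[i, j] = P(LLM reports score j | true quality score is i).
# Given T and an observed score, Bayes' rule yields a corrected posterior over true scores.
import numpy as np

levels = np.arange(5)                          # score levels 0..4 (assumption)
T = np.array([                                 # hypothetical transition matrix (rows sum to 1)
    [0.70, 0.20, 0.05, 0.03, 0.02],
    [0.15, 0.60, 0.15, 0.07, 0.03],
    [0.05, 0.15, 0.55, 0.20, 0.05],
    [0.03, 0.07, 0.20, 0.55, 0.15],
    [0.02, 0.03, 0.10, 0.25, 0.60],
])
prior = np.full(5, 0.2)                        # uniform prior over true scores

def corrected_posterior(observed_score):
    """P(true score | observed LLM score) via Bayes' rule."""
    likelihood = T[:, observed_score]          # P(observed | true) for each true level
    post = likelihood * prior
    return post / post.sum()

print(corrected_posterior(4))                  # how much to trust a top LLM rating
```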
Autonomous Recognition of Erroneous Raw Key Bit Bias in Quantum Key Distribution
As Quantum Key Distribution technologies mature, it is pertinent to consider these systems in contexts beyond lab settings, and to ask how they may have to operate autonomously. To begin, an abstract definition of a type of error that can occur in the ratio of bit values in the raw key is presented, together with its impact on the security and key rate of QKD protocols. A mechanism by which errors of this type can be autonomously recognised is given, along with simulated results. A two-part countermeasure that can be put in place to mitigate errors of this type is also given. Finally, some motivating examples of where this type of error could appear in practice are presented to add context, and to illustrate the importance of this work to the development of Quantum Key Distribution technologies.
Updated: 2025-03-05 23:51:06
Categories: quant-ph,cs.CR
Cryptographic Verifiability for Voter Registration Systems
Voter registration systems are a critical - and surprisingly understudied - element of most high-stakes elections. Despite a history of targeting by adversaries, relatively little academic work has been done to increase visibility into how voter registration systems keep voters' data secure, accurate, and up to date. Enhancing transparency and verifiability could help election officials and the public detect and mitigate risks to this essential component of electoral processes worldwide. This work introduces cryptographic verifiability for voter registration systems. Based on consultation with diverse expert stakeholders that support elections systems, we precisely define the requirements for cryptographic verifiability in voter registration and systematize the practical challenges that must be overcome for near-term deployment. We then introduce VRLog, the first system to bring strong verifiability to voter registration. VRLog enables election officials to provide a transparent log that (1) allows voters to verify that their registration data has not been tampered with and (2) allows the public to monitor update patterns and database consistency. We also introduce VRLog$^x$, an enhancement to VRLog that offers cryptographic privacy to voter deduplication between jurisdictions - a common maintenance task currently performed in plaintext or using trusted third parties. Our designs rely on standard, efficient cryptographic primitives, and are backward compatible with existing voter registration systems. Finally, we provide an open-source implementation of VRLog and benchmarks to demonstrate that the system is practical - capable of running on low-cost commodity hardware and scaling to support databases the size of the largest U.S. state voter registration systems.
Updated: 2025-03-05 23:51:04
Categories: cs.CR
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
This research investigates the integration of emotional diversity into Large Language Models (LLMs) to enhance collective intelligence. Inspired by the human wisdom of crowds phenomenon, where group decisions often outperform individual judgments, we fine-tuned the DarkIdol-Llama-3.1-8B model using Google's GoEmotions dataset and Low-Rank Adaptation (LoRA) to simulate emotionally diverse responses. Evaluating the model on a distance estimation task between Fargo, ND, and Seattle, WA, across 15,064 unique persona configurations, we analyzed how emotional states and social attributes influence decision-making. Our findings demonstrate that emotional integration shapes response patterns while maintaining acceptable prediction accuracy, revealing its potential to enhance artificial collective intelligence. This study provides valuable insights into the interplay of emotional diversity and decision-making in LLMs, suggesting pathways for creating emotionally aware AI systems that balance emotional depth with analytical precision.
Updated: 2025-03-05 23:42:48
Categories: cs.CL,cs.AI,cs.CY,cs.HC,cs.MA
Trim My View: An LLM-Based Code Query System for Module Retrieval in Robotic Firmware
The software compilation process has a tendency to obscure the original design of the system and makes it difficult both to identify individual components and discern their purpose simply by examining the resulting binary code. Although decompilation techniques attempt to recover higher-level source code from the machine code in question, they are not fully able to restore the semantics of the original functions. Furthermore, binaries are often stripped of metadata, and this makes it challenging to reverse engineer complex binary software. In this paper we show how a combination of binary decomposition techniques, decompilation passes, and LLM-powered function summarization can be used to build an economical engine to identify modules in stripped binaries and associate them with high-level natural language descriptions. We instantiated this technique with three underlying open-source LLMs -- CodeQwen, DeepSeek-Coder and CodeStral -- and measured its effectiveness in identifying modules in robotics firmware. This experimental evaluation involved 467 modules from four devices from the ArduPilot software suite, and showed that CodeStral, the best-performing backend LLM, achieves an average F1-score of 0.68 with an online running time of just a handful of seconds.
Updated: 2025-03-05 23:40:17
Categories: cs.CR,cs.SE
All-atom Diffusion Transformers: Unified generative modelling of molecules and materials
Diffusion models are the standard toolkit for generative modelling of 3D atomic systems. However, for different types of atomic systems - such as molecules and materials - the generative processes are usually highly specific to the target system despite the underlying physics being the same. We introduce the All-atom Diffusion Transformer (ADiT), a unified latent diffusion framework for jointly generating both periodic materials and non-periodic molecular systems using the same model: (1) An autoencoder maps a unified, all-atom representation of molecules and materials to a shared latent embedding space; and (2) A diffusion model is trained to generate new latent embeddings that the autoencoder can decode to sample new molecules or materials. Experiments on QM9 and MP20 datasets demonstrate that jointly trained ADiT generates realistic and valid molecules as well as materials, exceeding state-of-the-art results from molecule and crystal-specific models. ADiT uses standard Transformers for both the autoencoder and diffusion model, resulting in significant speedups during training and inference compared to equivariant diffusion models. Scaling ADiT up to half a billion parameters predictably improves performance, representing a step towards broadly generalizable foundation models for generative chemistry. Open source code: https://github.com/facebookresearch/all-atom-diffusion-transformer
Updated: 2025-03-05 23:35:44
Categories: cs.LG,cs.AI
Generative Learning of Densities on Manifolds
A generative modeling framework is proposed that combines diffusion models and manifold learning to efficiently sample data densities on manifolds. The approach utilizes Diffusion Maps to uncover possible low-dimensional underlying (latent) spaces in the high-dimensional data (ambient) space. Two approaches for sampling from the latent data density are described. The first is a score-based diffusion model, which is trained to map a standard normal distribution to the latent data distribution using a neural network. The second one involves solving an It\^o stochastic differential equation in the latent space. Additional realizations of the data are generated by lifting the samples back to the ambient space using Double Diffusion Maps, a recently introduced technique typically employed in studying dynamical system reduction; here the focus lies in sampling densities rather than system dynamics. The proposed approaches enable sampling high dimensional data densities restricted to low-dimensional, a priori unknown manifolds. The efficacy of the proposed framework is demonstrated through a benchmark problem and a material with multiscale structure.
Updated: 2025-03-05 23:29:06
Categories: cs.LG
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
Recent theoretical results show transformers cannot express sequential reasoning problems over long input lengths, intuitively because their computational depth is bounded. However, prior work treats the depth as a constant, leaving it unclear to what degree bounded depth may suffice for solving problems over short inputs, or how increasing the transformer's depth affects its expressive power. We address these questions by analyzing the expressive power of transformers whose depth can grow minimally with context length $n$. We show even highly uniform transformers with depth $\Theta(\log n)$ can express two important problems: recognizing regular languages, which captures state tracking abilities, and graph connectivity, which underlies multi-step reasoning. Notably, both of these problems cannot be expressed by fixed-depth transformers under standard complexity conjectures, demonstrating the expressivity benefit of growing depth. Moreover, our theory quantitatively predicts how depth must grow with input length to express these problems, showing that depth scaling is more efficient than scaling width or chain-of-thought steps. Empirically, we find our theoretical depth requirements for regular language recognition match the practical depth requirements of transformers remarkably well. Thus, our results clarify precisely how depth affects transformers' reasoning capabilities, providing potential practical insights for designing models that are better at sequential reasoning.
Updated: 2025-03-05 23:26:25
Categories: cs.LG,cs.CC
Introduction to Online Control
This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. Thus the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests the use of the decision making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms, and are accompanied by finite-time regret and computational complexity guarantees.
Updated: 2025-03-05 23:25:34
Categories: cs.LG,cs.RO,cs.SY,eess.SY,math.OC,stat.ML
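As a toy example of the online-convex-optimization machinery that the text above builds on, the sketch below runs online gradient descent against a sequence of convex losses and reports regret versus the best fixed decision in hindsight; the loss family and step size are illustrative choices, not taken from the text.

```python
# Minimal sketch of online gradient descent, the basic OCO algorithm behind online
# nonstochastic control. Losses arrive one at a time (possibly adversarial); regret is
# measured against the best fixed point in hindsight.
import numpy as np

rng = np.random.default_rng(0)
T, d = 200, 3
targets = rng.normal(size=(T, d))               # loss parameters revealed one round at a time

def loss(x, a):                                 # f_t(x) = ||x - a_t||^2, a simple convex loss
    return float(np.sum((x - a) ** 2))

x = np.zeros(d)
played = []
for t in range(T):
    played.append(x.copy())
    grad = 2.0 * (x - targets[t])               # gradient of f_t at the decision just played
    x = x - 0.5 / np.sqrt(t + 1) * grad         # ~1/sqrt(t) step size gives O(sqrt(T)) regret

best_fixed = targets.mean(axis=0)               # minimizer of the cumulative quadratic loss
regret = sum(loss(xt, a) for xt, a in zip(played, targets)) - sum(loss(best_fixed, a) for a in targets)
print("regret over", T, "rounds:", round(regret, 2))
```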
Dynamic Scheduling of a Multiclass Queue in the Halfin-Whitt Regime: A Computational Approach for High-Dimensional Problems
We consider a multi-class queueing model of a telephone call center, in which a system manager dynamically allocates available servers to customer calls. Calls can terminate through either service completion or customer abandonment, and the manager strives to minimize the expected total of holding costs plus abandonment costs over a finite horizon. Focusing on the Halfin-Whitt heavy traffic regime, we derive an approximating diffusion control problem, and building on earlier work by Beck et al. (2021), develop a simulation-based computational method for solution of such problems, one that relies heavily on deep neural network technology. Using this computational method, we propose a policy for the original (pre-limit) call center scheduling problem. Finally, the performance of this policy is assessed using test problems based on publicly available call center data. For the test problems considered so far, our policy does as well as or better than the best benchmark we could find. Moreover, our method is computationally feasible at least up to dimension 500, that is, for call centers with 500 or more distinct customer classes.
Updated: 2025-03-05 23:24:01
Categories: eess.SY,cs.LG,cs.SY,math.AP,math.OC
Improving the Temporal Resolution of SOHO/MDI Magnetograms of Solar Active Regions Using a Deep Generative Model
We present a novel deep generative model, named GenMDI, to improve the temporal resolution of line-of-sight (LOS) magnetograms of solar active regions (ARs) collected by the Michelson Doppler Imager (MDI) on board the Solar and Heliospheric Observatory (SOHO). Unlike previous studies that focus primarily on spatial super-resolution of MDI magnetograms, our approach can perform temporal super-resolution, which generates and inserts synthetic data between observed MDI magnetograms, thus providing finer temporal structure and enhanced details in the LOS data. The GenMDI model employs a conditional diffusion process, which synthesizes images by considering both preceding and subsequent magnetograms, ensuring that the generated images are not only of high-quality, but also temporally coherent with the surrounding data. Experimental results show that the GenMDI model performs better than the traditional linear interpolation method, especially in ARs with dynamic evolution in magnetic fields.
Updated: 2025-03-05 23:22:55
Categories: astro-ph.SR,astro-ph.IM,cs.LG
Transformers Use Causal World Models in Maze-Solving Tasks
Recent studies in interpretability have explored the inner workings of transformer models trained on tasks across various domains, often discovering that these networks naturally develop highly structured representations. When such representations comprehensively reflect the task domain's structure, they are commonly referred to as "World Models" (WMs). In this work, we identify WMs in transformers trained on maze-solving tasks. By using Sparse Autoencoders (SAEs) and analyzing attention patterns, we examine the construction of WMs and demonstrate consistency between SAE feature-based and circuit-based analyses. By subsequently intervening on isolated features to confirm their causal role, we find that it is easier to activate features than to suppress them. Furthermore, we find that models can reason about mazes involving more simultaneously active features than they encountered during training; however, when these same mazes (with greater numbers of connections) are provided to models via input tokens instead, the models fail. Finally, we demonstrate that positional encoding schemes appear to influence how World Models are structured within the model's residual stream.
Updated: 2025-03-05 23:16:16
Categories: cs.LG,cs.AI,I.2
The Illusion of State in State-Space Models
State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill & Sabharwal, 2023), which SSMs are explicitly designed to address via their close architectural similarity to recurrent neural networks (RNNs). But do SSMs truly have an advantage (over transformers) in expressive power for state tracking? Surprisingly, the answer is no. Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$. In particular, this means they cannot solve simple state-tracking problems like permutation composition. It follows that SSMs are provably unable to accurately track chess moves with certain notation, evaluate code, or track entities in a long narrative. To supplement our formal analysis, we report experiments showing that Mamba-style SSMs indeed struggle with state tracking. Thus, despite its recurrent formulation, the "state" in an SSM is an illusion: SSMs have similar expressiveness limitations to non-recurrent models like transformers, which may fundamentally limit their ability to solve real-world state-tracking problems.
Updated: 2025-03-05 23:00:57
Categories: cs.LG,cs.CC,cs.CL,cs.FL
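The permutation-composition problem mentioned above can be spelled out in a few lines; the snippet below shows the sequential state update that, per the paper's argument, fixed-depth transformers and SSMs cannot express for long sequences. The choice of S_5 and the encoding are illustrative.

```python
# Permutation composition as a state-tracking problem (illustrative). Each input token is a
# permutation of 5 elements; the "state" after t tokens is their composition. Solving this
# exactly for long sequences is the kind of computation the paper argues lies outside TC^0.
import random

def compose(p, q):
    """(p o q)[i] = p[q[i]] -- apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def random_permutation(n=5):
    perm = list(range(n))
    random.shuffle(perm)
    return tuple(perm)

random.seed(0)
sequence = [random_permutation() for _ in range(1000)]

state = tuple(range(5))                         # identity permutation
for step in sequence:                           # inherently sequential state update
    state = compose(step, state)
print("final state:", state)
```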
Dyads: Artist-Centric, AI-Generated Dance Duets
Existing AI-generated dance methods primarily train on motion capture data from solo dance performances, but a critical feature of dance in nearly any genre is the interaction of two or more bodies in space. Moreover, many works at the intersection of AI and dance fail to incorporate the ideas and needs of the artists themselves into their development process, yielding models that produce far more useful insights for the AI community than for the dance community. This work addresses both needs of the field by proposing an AI method to model the complex interactions between pairs of dancers and detailing how the technical methodology can be shaped by ongoing co-creation with the artistic stakeholders who curated the movement data. Our model is a probability-and-attention-based Variational Autoencoder that generates a choreographic partner conditioned on an input dance sequence. We construct a custom loss function to enhance the smoothness and coherence of the generated choreography. Our code is open-source, and we also document strategies for other interdisciplinary research teams to facilitate collaboration and strong communication between artists and technologists.
Updated: 2025-03-05 22:58:03
Categories: cs.LG,cs.CY
WIP: Assessing the Effectiveness of ChatGPT in Preparatory Testing Activities
This innovative practice WIP paper describes a research study that explores the integration of ChatGPT into the software testing curriculum and evaluates its effectiveness compared to human-generated testing artifacts. In a Capstone Project course, students were tasked with generating preparatory testing artifacts using ChatGPT prompts, which they had previously created manually. Their understanding and the effectiveness of the AI-generated artifacts were assessed through targeted questions. The results, drawn from this in-class assignment at a North American community college, indicate that while ChatGPT can automate many testing preparation tasks, it cannot fully replace human expertise. However, students, already familiar with Information Technology at the postgraduate level, found the integration of ChatGPT into their workflow to be straightforward. The study suggests that AI can be gradually introduced into software testing education to keep pace with technological advancements.
Updated: 2025-03-05 22:51:24
Categories: cs.SE,cs.AI
Generative Social Choice
The mathematical study of voting, social choice theory, has traditionally only been applicable to choices among a few predetermined alternatives, but not to open-ended decisions such as collectively selecting a textual statement. We introduce generative social choice, a design methodology for open-ended democratic processes that combines the rigor of social choice theory with the capability of large language models to generate text and extrapolate preferences. Our framework divides the design of AI-augmented democratic processes into two components: first, proving that the process satisfies representation guarantees when given access to oracle queries; second, empirically validating that these queries can be approximately implemented using a large language model. We apply this framework to the problem of summarizing free-form opinions into a proportionally representative slate of opinion statements; specifically, we develop a democratic process with representation guarantees and use this process to portray the opinions of participants in a survey about abortion policy. In a trial with 100 representative US residents, we find that 84 out of 100 participants feel "excellently" or "exceptionally" represented by the slate of five statements we extracted.
Updated: 2025-03-05 22:43:15
Categories: cs.GT,cs.AI,cs.LG
Canonical normalizing flows for manifold learning
Manifold learning flows are a class of generative modelling techniques that assume a low-dimensional manifold description of the data. The embedding of such a manifold into the high-dimensional space of the data is achieved via learnable invertible transformations. Therefore, once the manifold is properly aligned via a reconstruction loss, the probability density is tractable on the manifold and maximum likelihood can be used to optimize the network parameters. Naturally, the lower-dimensional representation of the data requires an injective-mapping. Recent approaches were able to enforce that the density aligns with the modelled manifold, while efficiently calculating the density volume-change term when embedding to the higher-dimensional space. However, unless the injective-mapping is analytically predefined, the learned manifold is not necessarily an efficient representation of the data. Namely, the latent dimensions of such models frequently learn an entangled intrinsic basis, with degenerate information being stored in each dimension. Alternatively, if a locally orthogonal and/or sparse basis is to be learned, here coined canonical intrinsic basis, it can serve in learning a more compact latent space representation. Toward this end, we propose a canonical manifold learning flow method, where a novel optimization objective enforces the transformation matrix to have few prominent and non-degenerate basis functions. We demonstrate that by minimizing the off-diagonal manifold metric elements $\ell_1$-norm, we can achieve such a basis, which is simultaneously sparse and/or orthogonal. Canonical manifold flow yields a more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent data, and a better approximation of target distributions than other manifold flow methods in most experiments we conducted, resulting in lower FID scores.
Updated: 2025-03-05 22:27:05
Categories: stat.ML,cs.LG,math.DG,stat.CO
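A minimal sketch of the penalty described above, under the assumption that the learned embedding is differentiable: the induced manifold metric is G = J^T J with J the decoder Jacobian, and the l1 norm of G's off-diagonal entries is added to the training objective. The toy decoder below stands in for the paper's injective flow and is not its architecture.

```python
# Sketch (PyTorch): l1 penalty on the off-diagonal entries of the induced manifold metric
# G = J^T J, where J is the Jacobian of a latent-to-data map g. Stand-in decoder, not the
# paper's flow; the penalty would be added (with a weight) to the usual flow objective.
import torch

decoder = torch.nn.Sequential(                  # stand-in for the learned embedding g: R^2 -> R^5
    torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 5)
)

def off_diagonal_metric_l1(z):
    J = torch.autograd.functional.jacobian(decoder, z)   # (5, 2) Jacobian at latent point z
    G = J.T @ J                                          # (2, 2) induced Riemannian metric
    off_diag = G - torch.diag(torch.diagonal(G))
    return off_diag.abs().sum()

z = torch.randn(2)
print(off_diagonal_metric_l1(z))
```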
COARSE: Collaborative Pseudo-Labeling with Coarse Real Labels for Off-Road Semantic Segmentation
Autonomous off-road navigation faces challenges due to diverse, unstructured environments, requiring robust perception with both geometric and semantic understanding. However, scarce densely labeled semantic data limits generalization across domains. Simulated data helps, but introduces domain adaptation issues. We propose COARSE, a semi-supervised domain adaptation framework for off-road semantic segmentation, leveraging sparse, coarse in-domain labels and densely labeled out-of-domain data. Using pretrained vision transformers, we bridge domain gaps with complementary pixel-level and patch-level decoders, enhanced by a collaborative pseudo-labeling strategy on unlabeled data. Evaluations on RUGD and Rellis-3D datasets show significant improvements of 9.7\% and 8.4\% respectively, versus only using coarse data. Tests on real-world off-road vehicle data in a multi-biome setting further demonstrate COARSE's applicability.
Updated: 2025-03-05 22:25:54
Categories: cs.CV,cs.AI,cs.RO
Deep ARTMAP: Generalized Hierarchical Learning with Adaptive Resonance Theory
This paper presents Deep ARTMAP, a novel extension of the ARTMAP architecture that generalizes the self-consistent modular ART (SMART) architecture to enable hierarchical learning (supervised and unsupervised) across arbitrary transformations of data. The Deep ARTMAP framework operates as a divisive clustering mechanism, supporting an arbitrary number of modules with customizable granularity within each module. Inter-ART modules regulate the clustering at each layer, permitting unsupervised learning while enforcing a one-to-many mapping from clusters in one layer to the next. While Deep ARTMAP reduces to both ARTMAP and SMART in particular configurations, it offers significantly enhanced flexibility, accommodating a broader range of data transformations and learning modalities.
Updated: 2025-03-05 22:23:17
Categories: cs.LG,cs.AI,cs.NE
SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups
Finite symmetric groups $S_n$ are essential in fields such as combinatorics, physics, and chemistry. However, learning a probability distribution over $S_n$ poses significant challenges due to its intractable size and discrete nature. In this paper, we introduce SymmetricDiffusers, a novel discrete diffusion model that simplifies the task of learning a complicated distribution over $S_n$ by decomposing it into learning simpler transitions of the reverse diffusion using deep neural networks. We identify the riffle shuffle as an effective forward transition and provide empirical guidelines for selecting the diffusion length based on the theory of random walks on finite groups. Additionally, we propose a generalized Plackett-Luce (PL) distribution for the reverse transition, which is provably more expressive than the PL distribution. We further introduce a theoretically grounded "denoising schedule" to improve sampling and learning efficiency. Extensive experiments show that our model achieves state-of-the-art or comparable performances on solving tasks including sorting 4-digit MNIST images, jigsaw puzzles, and traveling salesman problems. Our code is released at https://github.com/DSL-Lab/SymmetricDiffusers.
Updated: 2025-03-05 22:22:12
Categories: cs.LG,cs.AI,cs.CV
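The riffle shuffle named as the forward transition above is the standard Gilbert-Shannon-Reeds shuffle; the snippet below is a generic sampler for it (binomial cut followed by size-proportional interleaving), not the authors' implementation.

```python
# Gilbert-Shannon-Reeds riffle shuffle: cut the deck with a Binomial(n, 1/2) split, then
# interleave the two packets, dropping the next card from a packet with probability
# proportional to its remaining size. Generic sampler for illustration.
import numpy as np

def riffle_shuffle(deck, rng):
    n = len(deck)
    cut = rng.binomial(n, 0.5)                  # size of the left packet
    left, right = list(deck[:cut]), list(deck[cut:])
    out = []
    while left or right:
        if rng.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out

rng = np.random.default_rng(0)
deck = list(range(10))
for _ in range(3):                              # a few forward diffusion steps over S_10
    deck = riffle_shuffle(deck, rng)
print(deck)
```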
BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification
Lewy body dementia (LBD) is the second most common neurodegenerative dementia after Alzheimer's disease (AD). Early differentiation between AD and LBD is crucial because they require different treatment approaches, but this is challenging due to significant clinical overlap, heterogeneity, complex pathogenesis, and the rarity of LBD. While recent advances in artificial intelligence (AI) demonstrate powerful learning capabilities and offer new hope for accurate diagnosis, existing methods primarily focus on designing "neural-level networks". Our work represents a pioneering effort in modeling a system-level artificial neural network, called BrainNet-MoE, for brain modeling and diagnosis. Inspired by the brain's hierarchical organization of bottom-up sensory integration and top-down control, we design a set of disease-specific expert groups to process brain sub-networks under different conditions. A disease gate mechanism guides the specialization of expert groups, while a transformer layer enables communication between all sub-networks, generating a comprehensive whole-brain representation for downstream disease classification. Experimental results show superior classification accuracy with interpretable insights into how brain sub-networks contribute to different neurodegenerative conditions.
Updated: 2025-03-05 22:19:49
Categories: cs.LG,cs.AI,q-bio.NC
Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System
Autonomous AI agents built on large language models can create undeniable value across all spans of society, but they face security threats from adversaries that warrant immediate protective solutions because trust and safety issues arise. Many-shot jailbreaking and deceptive alignment are among the main advanced attacks that cannot be mitigated by the static guardrails used during supervised training, which points to a crucial research priority for real-world robustness; static guardrails in a dynamic multi-agent system fail to defend against those attacks. We intend to enhance security for LLM-based agents through the development of new evaluation frameworks that identify and counter threats for safe operational deployment. Our work uses three examination methods: detecting rogue agents through a Reverse Turing Test, analyzing deceptive alignment through multi-agent simulations, and developing an anti-jailbreaking system tested with the GEMINI 1.5 pro, llama-3.3-70B, and deepseek r1 models in tool-mediated adversarial scenarios. Detection capability is strong (94\% accuracy for GEMINI 1.5 pro), yet the system suffers persistent vulnerabilities under long attacks: as prompt length increases, attack success rates (ASR) rise and diversity metrics become ineffective predictors, while multiple complex system faults are revealed. The findings demonstrate the necessity of adopting flexible security systems based on active monitoring performed by the agents themselves, together with adaptable interventions by system administrators, since current models can create vulnerabilities that lead to an unreliable and vulnerable system. In this work, we address such situations and propose a comprehensive framework to counteract these security issues.
Updated: 2025-03-05 22:17:18
Categories: cs.CR
Timber! Poisoning Decision Trees
We present Timber, the first white-box poisoning attack targeting decision trees. Timber is based on a greedy attack strategy that leverages sub-tree retraining to efficiently estimate the damage caused by poisoning a given training instance. The attack relies on a tree annotation procedure, which enables the sorting of training instances so that they are processed in increasing order of the computational cost of sub-tree retraining. This sorting yields a variant of Timber that supports an early stopping criterion, designed to make poisoning attacks more efficient and feasible on larger datasets. We also discuss an extension of Timber to traditional random forest models, which is valuable since decision trees are typically combined into ensembles to improve their predictive power. Our experimental evaluation on public datasets demonstrates that our attacks outperform existing baselines in terms of effectiveness, efficiency, or both. Moreover, we show that two representative defenses can mitigate the effect of our attacks, but fail to effectively thwart them.
Updated: 2025-03-05 22:12:27
Categories: cs.LG,cs.CR,stat.ML
FlexiFly: Interfacing the Physical World with Foundation Models Empowered by Reconfigurable Drone Systems
Foundation models (FM) have shown immense human-like capabilities for generating digital media. However, foundation models that can freely sense, interact with, and actuate the physical domain are far from being realized. This is because 1) fully covering and analyzing large spaces requires dense sensor deployments, while 2) events are often localized to small areas, making it difficult for FMs to pinpoint the areas of interest relevant to the current task. We propose FlexiFly, a platform that enables FMs to ``zoom in'' and analyze relevant areas with higher granularity to better understand the physical environment and carry out tasks. FlexiFly accomplishes this by introducing 1) a novel image segmentation technique that aids in identifying relevant locations and 2) a modular and reconfigurable sensing and actuation drone platform that FMs can actuate to ``zoom in'' with relevant sensors and actuators. We demonstrate through real smart home deployments that FlexiFly enables FMs and LLMs to complete diverse tasks up to $85\%$ more successfully. FlexiFly is a critical step towards FMs and LLMs that can naturally interface with the physical world.
Updated: 2025-03-05 22:11:38
Categories: cs.RO,cs.AI,cs.HC
Towards Resilient and Sustainable Global Industrial Systems: An Evolutionary-Based Approach
This paper presents a new complex optimization problem in the field of automatic design of advanced industrial systems and proposes a hybrid optimization approach to solve the problem. The problem is multi-objective as it aims at finding solutions that minimize CO2 emissions, transportation time, and costs. The optimization approach combines an evolutionary algorithm and classical mathematical programming to design resilient and sustainable global manufacturing networks. Further, it makes use of the OWL ontology for data consistency and constraint management. The experimental validation demonstrates the effectiveness of the approach in both single and double sourcing scenarios. The proposed methodology, in general, can be applied to any industry case with complex manufacturing and supply chain challenges.
Updated: 2025-03-05 22:10:32
Categories: cs.NE,cs.LG,math.OC
GlucoLens: Explainable Postprandial Blood Glucose Prediction from Diet and Physical Activity
Postprandial hyperglycemia, marked by the blood glucose level exceeding the normal range after meals, is a critical indicator of progression toward type 2 diabetes in prediabetic and healthy individuals. A key metric for understanding blood glucose dynamics after eating is the postprandial area under the curve (PAUC). Predicting PAUC in advance based on a person's diet and activity level and explaining what affects postprandial blood glucose could allow an individual to adjust their lifestyle accordingly to maintain normal glucose levels. In this paper, we propose GlucoLens, an explainable machine learning approach to predict PAUC and hyperglycemia from diet, activity, and recent glucose patterns. We conducted a five-week user study with 10 full-time working individuals to develop and evaluate the computational model. Our machine learning model takes multimodal data including fasting glucose, recent glucose, recent activity, and macronutrient amounts, and provides an interpretable prediction of the postprandial glucose pattern. Our extensive analyses of the collected data revealed that the trained model achieves a normalized root mean squared error (NRMSE) of 0.123. On average, GlucoLens with a Random Forest backbone provides a 16% better result than the baseline models. Additionally, GlucoLens predicts hyperglycemia with an accuracy of 74% and recommends different options to help avoid hyperglycemia through diverse counterfactual explanations. Code available: https://github.com/ab9mamun/GlucoLens.
Updated: 2025-03-05 22:10:14
Categories: cs.LG,cs.AI
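To make the quantities above concrete, the sketch below computes a postprandial area under the curve by trapezoidal integration and evaluates a Random Forest regressor with a normalized RMSE on synthetic data; the feature set, the PAUC convention (area above the premeal baseline), and the normalization are illustrative assumptions, not the study's exact definitions.

```python
# Sketch: PAUC via trapezoidal integration, plus a Random Forest regressor scored with a
# normalized RMSE. Synthetic features stand in for diet/activity/glucose inputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def pauc(glucose, minutes, baseline):
    """Postprandial area above the premeal baseline (mg/dL * min), by trapezoidal integration."""
    return np.trapz(np.clip(np.asarray(glucose) - baseline, 0, None), minutes)

print(pauc([100, 140, 180, 150, 110], minutes=[0, 30, 60, 90, 120], baseline=100))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                   # stand-ins for fasting glucose, recent glucose, activity, carbs
y = 50 + X @ np.array([10.0, 5.0, -8.0, 20.0]) + rng.normal(scale=5.0, size=200)  # synthetic PAUC targets

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:150], y[:150])
rmse = np.sqrt(mean_squared_error(y[150:], model.predict(X[150:])))
nrmse = rmse / (y[150:].max() - y[150:].min())  # one common normalization; the paper's may differ
print(round(float(nrmse), 3))
```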
Decentralized Low-Rank Fine-Tuning of Large Language Models
While parameter-efficient fine-tuning (PEFT) techniques like Low-Rank Adaptation (LoRA) offer computationally efficient adaptations of Large Language Models (LLMs), their practical deployment often assumes centralized data and training environments. However, real-world scenarios frequently involve distributed, privacy-sensitive datasets that require decentralized solutions. Federated learning (FL) addresses data privacy by coordinating model updates across clients, but it is typically based on centralized aggregation through a parameter server, which can introduce bottlenecks and communication constraints. Decentralized learning, in contrast, eliminates this dependency by enabling direct collaboration between clients, improving scalability and efficiency in distributed environments. Despite its advantages, decentralized LLM fine-tuning remains underexplored. In this work, we propose Dec-LoRA, a decentralized fine-tuning algorithm for LLMs based on LoRA. Through extensive experiments on BERT and LLaMA-2 models, we demonstrate that Dec-LoRA achieves performance comparable to centralized LoRA under various conditions, including data heterogeneity and quantization constraints. Additionally, we provide a rigorous theoretical guarantee proving the convergence of our algorithm to a stationary point for non-convex and smooth loss functions. These findings highlight the potential of Dec-LoRA for scalable LLM fine-tuning in decentralized environments.
Updated: 2025-03-05 22:09:09
Categories: cs.LG
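A minimal sketch of the two ingredients the abstract combines: a LoRA-reparameterized linear layer (frozen base weight plus a trainable low-rank update) and one decentralized gossip round in which clients average their LoRA factors with ring neighbors instead of a parameter server. The topology, mixing weights, and the choice to average the factors A and B directly are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch (PyTorch): LoRA layer + one gossip (neighbor-averaging) communication round.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank=4, alpha=8.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)          # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank)) # B = 0 so the low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

clients = [LoRALinear(16, 16) for _ in range(4)]        # a ring of 4 clients, no parameter server

def gossip_round(clients):
    """Each client replaces its LoRA factors by the average over itself and its ring neighbors."""
    snapA = [c.A.data.clone() for c in clients]
    snapB = [c.B.data.clone() for c in clients]
    n = len(clients)
    for i, c in enumerate(clients):
        neigh = [i, (i - 1) % n, (i + 1) % n]
        c.A.data = torch.stack([snapA[j] for j in neigh]).mean(dim=0)
        c.B.data = torch.stack([snapB[j] for j in neigh]).mean(dim=0)

gossip_round(clients)                                    # local training steps (omitted) would precede this
print(clients[0](torch.randn(2, 16)).shape)
```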
Data augmentation with automated machine learning: approaches and performance comparison with classical data augmentation methods
Data augmentation is arguably the most important regularization technique commonly used to improve generalization performance of machine learning models. It primarily involves the application of appropriate data transformation operations to create new data samples with desired properties. Despite its effectiveness, the process is often challenging because of the time-consuming trial and error procedures for creating and testing different candidate augmentations and their hyperparameters manually. State-of-the-art approaches are increasingly relying on automated machine learning (AutoML) principles. This work presents a comprehensive survey of AutoML-based data augmentation techniques. We discuss various approaches for accomplishing data augmentation with AutoML, including data manipulation, data integration and data synthesis techniques. The focus of this work is on image data augmentation methods. Nonetheless, we cover other data modalities, especially in cases where the specific data augmentations techniques being discussed are more suitable for these other modalities. For instance, since automated data integration methods are more suitable for tabular data, we cover tabular data in the discussion of data integration methods. The work also presents extensive discussion of techniques for accomplishing each of the major subtasks of the image data augmentation process: search space design, hyperparameter optimization and model evaluation. Finally, we carried out an extensive comparison and analysis of the performance of automated data augmentation techniques and state-of-the-art methods based on classical augmentation approaches. The results show that AutoML methods for data augmentation currently outperform state-of-the-art techniques based on conventional approaches.
Updated: 2025-03-05 22:04:18
Categories: cs.LG,cs.AI,cs.CV,cs.NE
Pruning Deep Neural Networks via a Combination of the Marchenko-Pastur Distribution and Regularization
Deep neural networks (DNNs) have brought significant advancements in various applications in recent years, such as image recognition, speech recognition, and natural language processing. In particular, Vision Transformers (ViTs) have emerged as a powerful class of models in the field of deep learning for image classification. In this work, we propose a novel Random Matrix Theory (RMT)-based method for pruning pre-trained DNNs, based on the sparsification of weights and singular vectors, and apply it to ViTs. RMT provides a robust framework to analyze the statistical properties of large matrices, which has been shown to be crucial for understanding and optimizing the performance of DNNs. We demonstrate that our RMT-based pruning can be used to reduce the number of parameters of ViT models (trained on ImageNet) by 30-50\% with less than 1\% loss in accuracy. To our knowledge, this represents the state-of-the-art in pruning for these ViT models. Furthermore, we provide a rigorous mathematical underpinning of the above numerical studies, namely we proved a theorem for fully connected DNNs, and other more general DNN structures, describing how the randomness in the weight matrices of a DNN decreases as the weights approach a local or global minimum (during training). We verify this theorem through numerical experiments on fully connected DNNs, providing empirical support for our theoretical findings. Moreover, we prove a theorem that describes how DNN loss decreases as we remove randomness in the weight layers, and show a monotone dependence of the decrease in loss with the amount of randomness that we remove. Our results also provide significant RMT-based insights into the role of regularization during training and pruning.
Updated: 2025-03-05 21:57:19
Categories: cs.LG
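As a rough illustration of the Random Matrix Theory ingredient above, the snippet below zeroes the singular values of a weight matrix that fall inside the Marchenko-Pastur noise bulk of a matched random matrix, keeping only the outliers; the noise-variance estimate, the planted signal, and the pruning rule are illustrative, not the paper's exact procedure.

```python
# Sketch: keep only singular values above the Marchenko-Pastur bulk edge sigma*(sqrt(m)+sqrt(n)),
# treating everything inside the bulk as noise. Illustrative, with a known noise variance.
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 256, 512, 0.02
W = sigma * rng.normal(size=(m, n))                        # pure-noise weights with known variance
u, v = rng.normal(size=m), rng.normal(size=n)
W = W + 2.0 * np.outer(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))  # one planted signal direction

U, s, Vt = np.linalg.svd(W, full_matrices=False)
bulk_edge = sigma * (np.sqrt(m) + np.sqrt(n))              # largest singular value expected from MP noise alone
keep = s > bulk_edge                                       # only spectrum outside the noise bulk is retained
W_pruned = (U[:, keep] * s[keep]) @ Vt[keep]

print("MP bulk edge:", round(bulk_edge, 3), "| top singular values:", np.round(s[:3], 3))
print("kept", int(keep.sum()), "of", len(s), "singular values")
```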
LoLCATs: On Low-Rank Linearizing of Large Language Models
Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However, linearizing LLMs often significantly degrades model quality, still requires training over billions of tokens, and remains limited to smaller 1.3B to 7B LLMs. We thus propose Low-rank Linear Conversion via Attention Transfer (LoLCATs), a simple two-step method that improves LLM linearizing quality with orders of magnitudes less memory and compute. We base these steps on two findings. First, we can replace an LLM's softmax attentions with closely-approximating linear attentions, simply by training the linear attentions to match their softmax counterparts with an output MSE loss ("attention transfer"). Then, this enables adjusting for approximation errors and recovering LLM quality simply with low-rank adaptation (LoRA). LoLCATs significantly improves linearizing quality, training efficiency, and scalability. We significantly reduce the linearizing quality gap and produce state-of-the-art subquadratic LLMs from Llama 3 8B and Mistral 7B v0.1, leading to 20+ points of improvement on 5-shot MMLU. Furthermore, LoLCATs does so with only 0.2% of past methods' model parameters and 0.4% of their training tokens. Finally, we apply LoLCATs to create the first linearized 70B and 405B LLMs (50x larger than prior work). When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.
Updated: 2025-03-05 21:57:04
标题: LoLCATs: 大型语言模型的低秩线性化
摘要: 最近的研究表明,我们可以对大型语言模型(LLMs)进行线性化处理,将流行的基于Transformer的LLMs的二次注意力替换为次二次模拟,如线性注意力,从而避免昂贵的预训练成本。然而,线性化LLMs通常会显著降低模型质量,仍然需要对数十亿个标记进行训练,并且仍然局限于规模较小的13亿至70亿的LLMs。因此,我们提出了一种名为LoLCATs的低秩线性转换通过注意力转移的简单两步方法,它可以以数量级别更少的内存和计算提高LLM的线性化质量。我们的方法基于两个发现。首先,我们可以通过训练线性注意力以匹配其softmax对应物来用线性注意力替换LLM的softmax注意力,从而简单地实现近似("注意力转移")。然后,这使得可以通过低秩适应(LoRA)来调整近似误差并恢复LLM的质量。LoLCATs显著提高了线性化质量、训练效率和可扩展性。我们大大缩小了线性化质量差距,并从Llama 3 8B和Mistral 7B v0.1生成了领先的次二次LLMs,使5-shot MMLU得分提高了20多个点。此外,LoLCATs仅使用过去方法的0.2%的模型参数和0.4%的训练标记就能实现这一点。最后,我们应用LoLCATs创建了第一个线性化的70B和405B的LLMs(比以前的工作大50倍)。在与相同计算预算下与以往方法进行比较时,LoLCATs显著提高了线性化质量,将线性化和原始Llama 3.1的70B和405B LLMs之间的差距在5-shot MMLU上缩小了77.8%和78.1%。
更新时间: 2025-03-05 21:57:04
领域: cs.LG,cs.AI,cs.CL,stat.ML
"Impressively Scary:" Exploring User Perceptions and Reactions to Unraveling Machine Learning Models in Social Media Applications
Machine learning models deployed locally on social media applications are used for features, such as face filters which read faces in real time, and they expose sensitive attributes to the apps. However, the deployment of machine learning models, e.g., when, where, and how they are used, in social media applications is opaque to users. We aim to address this inconsistency and investigate how social media user perceptions and behaviors change once exposed to these models. We conducted user studies (N=21) and found that participants were unaware of both what the models output and when the models were used on Instagram and TikTok, two major social media platforms. In response to being exposed to the models' functionality, we observed long-term behavior changes in 8 participants. Our analysis uncovers the challenges and opportunities in providing transparency for machine learning models that interact with local user data.
Updated: 2025-03-05 21:51:52
标题: “令人印象深刻的恐怖:探讨用户对社交媒体应用中揭示的机器学习模型的感知和反应”
摘要: 社交媒体应用程序上部署本地机器学习模型用于功能,例如实时读取面孔的面部滤镜,并向应用程序暴露敏感属性。然而,社交媒体应用程序中机器学习模型的部署,例如何时、在哪里以及如何使用,对用户来说是不透明的。我们的目标是解决这种不一致性,并调查社交媒体用户一旦接触到这些模型会如何改变其感知和行为。我们进行了用户研究(N=21),发现参与者对于模型的输出以及在Instagram和TikTok两个主要社交媒体平台上使用模型的时间都不知情。在暴露于模型功能后,我们观察到8名参与者的长期行为发生了变化。我们的分析揭示了为与本地用户数据交互的机器学习模型提供透明度所面临的挑战和机遇。
更新时间: 2025-03-05 21:51:52
领域: cs.HC,cs.AI,cs.CR
De-skilling, Cognitive Offloading, and Misplaced Responsibilities: Potential Ironies of AI-Assisted Design
The rapid adoption of generative AI (GenAI) in design has sparked discussions about its benefits and unintended consequences. While AI is often framed as a tool for enhancing productivity by automating routine tasks, historical research on automation warns of paradoxical effects, such as de-skilling and misplaced responsibilities. To assess UX practitioners' perceptions of AI, we analyzed over 120 articles and discussions from UX-focused subreddits. Our findings indicate that while practitioners express optimism about AI reducing repetitive work and augmenting creativity, they also highlight concerns about over-reliance, cognitive offloading, and the erosion of critical design skills. Drawing from human-automation interaction literature, we discuss how these perspectives align with well-documented automation ironies and function allocation challenges. We argue that UX professionals should critically evaluate AI's role beyond immediate productivity gains and consider its long-term implications for creative autonomy and expertise. This study contributes empirical insights into practitioners' perspectives and links them to broader debates on automation in design.
Updated: 2025-03-05 21:47:16
标题: 去技能化、认知卸载和责任错位:AI辅助设计的潜在讽刺
摘要: 设计中对生成式人工智能(GenAI)的快速采用引发了关于其益处和意外后果的讨论。尽管人工智能经常被视为通过自动化例行任务来增强生产力的工具,但有关自动化的历史研究警告称可能会出现悖论性影响,例如技能下降和责任错位。为了评估用户体验从业者对人工智能的看法,我们分析了来自以用户体验为重点的子社区的超过120篇文章和讨论。我们的研究结果表明,尽管从业者对人工智能减少重复工作和增强创造力表示乐观,但他们也强调了对过度依赖、认知卸载和关键设计技能的侵蚀的担忧。结合人类与自动化互动文献,我们讨论了这些观点如何与广泛记录的自动化讽刺和功能分配挑战相一致。我们认为用户体验专业人员应该对人工智能在即时生产力增益之外的作用进行批判性评估,并考虑其对创造性自主权和专业知识的长期影响。这项研究为从业者的观点提供了实证见解,并将它们与关于设计中自动化的更广泛辩论联系起来。
更新时间: 2025-03-05 21:47:16
领域: cs.HC,cs.AI
CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
We address the long-horizon mapless navigation problem: enabling robots to traverse novel environments without relying on high-definition maps or precise waypoints that specify exactly where to navigate. Achieving this requires overcoming two major challenges -- learning robust, generalizable perceptual representations of the environment without pre-enumerating all possible navigation factors and forms of perceptual aliasing and utilizing these learned representations to plan human-aligned navigation paths. Existing solutions struggle to generalize due to their reliance on hand-curated object lists that overlook unforeseen factors, end-to-end learning of navigation features from scarce large-scale robot datasets, and handcrafted reward functions that scale poorly to diverse scenarios. To overcome these limitations, we propose CREStE, the first method that learns representations and rewards for addressing the full mapless navigation problem without relying on large-scale robot datasets or manually curated features. CREStE leverages visual foundation models trained on internet-scale data to learn continuous bird's-eye-view representations capturing elevation, semantics, and instance-level features. To utilize learned representations for planning, we propose a counterfactual-based loss and active learning procedure that focuses on the most salient perceptual cues by querying humans for counterfactual trajectory annotations in challenging scenes. We evaluate CREStE in kilometer-scale navigation tasks across six distinct urban environments. CREStE significantly outperforms all state-of-the-art approaches with 70% fewer human interventions per mission, including a 2-kilometer mission in an unseen environment with just 1 intervention; showcasing its robustness and effectiveness for long-horizon mapless navigation. For videos and additional materials, see https://amrl.cs.utexas.edu/creste .
Updated: 2025-03-05 21:42:46
标题: CREStE:具有互联网规模先验和反事实引导的可扩展无地图导航
摘要: 我们解决了长期无地图导航问题:使机器人能够穿越新颖的环境,而无需依赖高清地图或精确的航点来指定导航位置。实现这一目标需要克服两个主要挑战——学习环境的稳健、可泛化的感知表示,而无需预先枚举所有可能的导航因素和感知混淆形式,并利用这些学到的表示来规划与人类对齐的导航路径。现有的解决方案很难泛化,因为它们依赖于手工策划的对象列表,忽视了未预料到的因素,从稀缺的大规模机器人数据集中学习导航特征的端到端学习,以及手工制作的奖励函数在不同场景下的扩展性较差。为了克服这些限制,我们提出了CREStE,这是第一个学习表示和奖励以解决完整的无地图导航问题的方法,而无需依赖大规模机器人数据集或手动策划的特征。CREStE利用在互联网规模数据上训练的视觉基础模型来学习连续的鸟瞰表示,捕捉高程、语义和实例级特征。为了利用学到的表示进行规划,我们提出了一种基于反事实的损失和主动学习程序,通过在具有挑战性的场景中向人类征询反事实轨迹注释,集中于最显著的感知线索。我们在六个不同城市环境中的公里级导航任务上对CREStE进行了评估。CREStE在每次任务中人类干预减少了70%,包括在未见过的环境中进行的2公里任务仅需1次干预;展示了其在长期无地图导航中的稳健性和有效性。有关视频和其他材料,请访问https://amrl.cs.utexas.edu/creste。
更新时间: 2025-03-05 21:42:46
领域: cs.RO,cs.AI,cs.CV
GenCeption: Evaluate Vision LLMs with Unlabeled Unimodal Data
Multimodal Large Language Models (MLLMs) are typically assessed using expensive annotated multimodal benchmarks, which often lag behind the rapidly evolving demands of MLLM evaluation. This paper outlines and validates GenCeption, a novel, annotation-free evaluation method that requires only unimodal data to measure inter-modality semantic coherence and inversely assesses MLLMs' tendency to hallucinate. This approach eliminates the need for costly data annotation, minimizes the risk of training data contamination, is expected to result in slower benchmark saturation, and avoids the illusion of emerging abilities. Inspired by the DrawCeption game, GenCeption begins with a non-textual sample and proceeds through iterative description and generation steps. The semantic drift across iterations is quantified using the GC@T metric. While GenCeption is principally applicable to MLLMs across various modalities, this paper focuses on its implementation and validation for Vision LLMs (VLLMs). Based on the GenCeption method, we establish the MMECeption benchmark for evaluating VLLMs, and compare the performance of several popular VLLMs and human annotators. Our empirical results validate GenCeption's effectiveness, demonstrating strong correlations with established VLLM benchmarks. VLLMs still significantly lag behind human performance and struggle especially with text-intensive tasks.
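The iterative loop is easy to sketch. In the snippet below, `describe`, `generate`, and `embed` are placeholders (assumptions of this sketch) for the VLLM under evaluation, a text-to-image generator, and an image encoder; the aggregation of per-iteration similarities into GC@T is likewise a plausible simplification rather than the paper's exact formula.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def genception(seed_image, describe, generate, embed, T=5):
    """Run T describe->generate rounds and track semantic drift from the seed image."""
    ref = embed(seed_image)
    current, sims = seed_image, []
    for _ in range(T):
        caption = describe(current)     # VLLM under evaluation writes a description
        current = generate(caption)     # an image is regenerated from that description
        sims.append(cosine(ref, embed(current)))
    return sum(sims) / len(sims), sims  # simple average as a stand-in for GC@T
```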
Updated: 2025-03-05 21:42:27
标题: GenCeption:使用无标签的单模态数据评估视觉LLMs
摘要: 多模式大型语言模型(MLLMs)通常使用昂贵的注释多模式基准进行评估,这些基准往往滞后于MLLM评估的快速发展需求。本文概述并验证了GenCeption,一种新颖的、无需注释的评估方法,只需要单模态数据来衡量跨模态语义一致性,并反向评估MLLMs的产生幻觉的倾向。这种方法消除了昂贵数据注释的需求,最小化了训练数据污染的风险,预计将导致基准饱和速度变慢,并避免出现新能力的幻觉。受DrawCeption游戏启发,GenCeption从非文本样本开始,通过迭代描述和生成步骤进行。通过GC@T指标量化迭代间的语义漂移。虽然GenCeption主要适用于各种模态的MLLMs,本文侧重于其在视觉LLMs(VLLMs)中的实施和验证。基于GenCeption方法,我们建立了用于评估VLLMs的MMECeption基准,并比较了几种流行的VLLMs和人类注释者的表现。我们的实证结果验证了GenCeption的有效性,展示了与已建立的VLLM基准的强相关性。VLLMs仍然明显落后于人类表现,尤其在文本密集型任务方面表现困难。
更新时间: 2025-03-05 21:42:27
领域: cs.CL,cs.AI,cs.LG,I.7; I.4
Personalized Federated Fine-tuning for Heterogeneous Data: An Automatic Rank Learning Approach via Two-Level LoRA
We study the task of personalized federated fine-tuning with heterogeneous data in the context of language models, where clients collaboratively fine-tune a language model (e.g., BERT, GPT) without sharing their local data, achieving personalization simultaneously. While recent efforts have applied parameter-efficient fine-tuning techniques like low-rank adaptation (LoRA) in federated settings, they typically use single or multiple independent low-rank adapters with predefined maximal and minimal ranks, which may not be optimal for diverse data sources over clients. To address this issue, we propose PF2LoRA, a new personalized federated fine-tuning algorithm built on a novel \emph{automatic rank learning approach via two-level LoRA}. Given the pretrained language model whose weight is frozen, our algorithm aims to learn two levels of adaptation simultaneously: the first level aims to learn a common adapter for all clients, while the second level fosters individual client personalization. A key advantage of PF2LoRA is its ability to adaptively determine a suitable rank based on an individual client's data, rather than relying on a predefined rank that is agnostic to data heterogeneity. We present a synthetic example that highlights how PF2LoRA automatically learns the ground-truth rank for each client, tailoring the adaptation to match the properties of their individual data. Notably, this approach introduces minimal additional memory overhead, as the second-level adaptation comprises a small number of parameters compared to the first level. Our experiments on natural language understanding and generation tasks demonstrate that PF2LoRA significantly outperforms existing federated fine-tuning methods.
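A minimal sketch of the two-level adapter structure described above (fixed ranks for brevity; in PF2LoRA the effective client-level rank is learned rather than hard-coded):

```python
import torch
import torch.nn as nn

class TwoLevelLoRALinear(nn.Module):
    """Frozen base linear layer + a shared (common) LoRA + a client-specific LoRA."""
    def __init__(self, base: nn.Linear, r_common=8, r_client=4, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # pretrained weights stay frozen
        d_in, d_out = base.in_features, base.out_features
        self.A1 = nn.Parameter(0.01 * torch.randn(r_common, d_in))   # level 1: shared
        self.B1 = nn.Parameter(torch.zeros(d_out, r_common))
        self.A2 = nn.Parameter(0.01 * torch.randn(r_client, d_in))   # level 2: per client
        self.B2 = nn.Parameter(torch.zeros(d_out, r_client))
        self.scale = alpha / r_common

    def forward(self, x):
        delta = (x @ self.A1.T) @ self.B1.T + (x @ self.A2.T) @ self.B2.T
        return self.base(x) + self.scale * delta

layer = TwoLevelLoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
```

During federated training, the level-1 parameters would be aggregated across clients while the level-2 parameters remain local to each client.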
Updated: 2025-03-05 21:41:03
标题: 个性化的联邦微调用于异构数据:基于两级LoRA的自动排名学习方法
摘要: 我们研究了在语言模型的背景下,使用异构数据进行个性化联合微调的任务,其中客户端协作微调语言模型(例如BERT、GPT)而无需共享他们的本地数据,同时实现个性化。尽管最近的努力在联合设置中应用了参数高效的微调技术,如低秩适应(LoRA),但它们通常使用单个或多个独立的低秩适配器,具有预定义的最大和最小秩,这可能不适用于客户端上的多样化数据源。 为解决这一问题,我们提出了PF2LoRA,这是一种建立在新颖的自动秩学习方法上的个性化联合微调算法,通过两级LoRA实现。给定冻结权重的预训练语言模型,我们的算法旨在同时学习两个级别的适应:第一级旨在为所有客户端学习一个通用适配器,而第二级则促进个体客户端的个性化。PF2LoRA的一个关键优势是其能够根据个体客户端的数据自适应确定一个适合的秩,而不是依赖于对数据异质性不可知的预定义秩。我们提供了一个合成示例,突出了PF2LoRA如何自动学习每个客户端的真实秩,调整适应以匹配其个体数据的特性。值得注意的是,与第一级相比,第二级适应所需的参数数量较少,因此引入了最小的额外内存开销。我们对自然语言理解和生成任务的实验表明,PF2LoRA明显优于现有的联合微调方法。
更新时间: 2025-03-05 21:41:03
领域: cs.LG,cs.CL
Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning Systems
Computer-based scene understanding has influenced fields ranging from urban planning to autonomous vehicle performance, yet little is known about how well these technologies work across social differences. We investigate the biases of deep convolutional neural networks (dCNNs) in scene classification, using nearly one million images from global and US sources, including user-submitted home photographs and Airbnb listings. We applied statistical models to quantify the impact of socioeconomic indicators such as family income, Human Development Index (HDI), and demographic factors from public data sources (CIA and US Census) on dCNN performance. Our analyses revealed significant socioeconomic bias, where pretrained dCNNs demonstrated lower classification accuracy, lower classification confidence, and a higher tendency to assign labels that could be offensive when applied to homes (e.g., "ruin", "slum"), especially in images from homes with lower socioeconomic status (SES). This trend is consistent across two datasets of international images and within the diverse economic and racial landscapes of the United States. This research contributes to understanding biases in computer vision, emphasizing the need for more inclusive and representative training datasets. By mitigating the bias in the computer vision pipelines, we can ensure fairer and more equitable outcomes for applied computer vision, including home valuation and smart home security systems. There is urgency in addressing these biases, which can significantly impact critical decisions in urban development and resource allocation. Our findings also motivate the development of AI systems that better understand and serve diverse communities, moving towards technology that equitably benefits all sectors of society.
Updated: 2025-03-05 21:31:31
标题: 数字鸿沟在场景识别中的体现:揭示深度学习系统中的社会经济偏见
摘要: 计算机场景理解技术已经影响了从城市规划到自动驾驶汽车性能等领域,但是关于这些技术在不同社会差异中的表现如何,目前知之甚少。我们研究了深度卷积神经网络(dCNNs)在场景分类中的偏见,使用了来自全球和美国来源的近100万张图像,包括用户提交的家庭照片和Airbnb列表。我们应用统计模型来量化家庭收入、人类发展指数(HDI)以及来自公共数据源(CIA和美国人口普查局)的人口统计因素对dCNN性能的影响。我们的分析揭示了显著的社会经济偏见,即预先训练的dCNN在分类准确性、分类置信度方面表现较差,并且更倾向于给低社会经济地位的家庭的图像分配可能具有冒犯性的标签(例如“废墟”、“贫民窟”)。这种趋势在国际图像的两个数据集以及美国多样化的经济和种族景观中保持一致。这项研究有助于理解计算机视觉中的偏见,强调了需要更具包容性和代表性的训练数据集。通过减轻计算机视觉流程中的偏见,我们可以确保应用计算机视觉技术时实现更公平和更公正的结果,包括家庭估值和智能家居安全系统。解决这些偏见问题的紧迫性在于,这可能会对城市发展和资源分配中的关键决策产生重大影响。我们的发现也促使发展更好地理解和服务多样社区的人工智能系统,朝着更能够公平惠及社会各个领域的技术发展。
更新时间: 2025-03-05 21:31:31
领域: cs.CV,cs.AI,68-02,I.2.m
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework
Open-ended text generation has become a prominent task in natural language processing due to the rise of powerful (large) language models. However, evaluating the quality of these models and the employed decoding strategies remains challenging because of trade-offs among widely used metrics such as coherence, diversity, and perplexity. Decoding methods often excel in some metrics while underperforming in others, complicating the establishment of a clear ranking. In this paper, we present novel ranking strategies within this multicriteria framework. Specifically, we employ benchmarking approaches based on partial orderings and present a new summary metric designed to balance existing automatic indicators, providing a more holistic evaluation of text generation quality. Our experiments demonstrate that the proposed methods offer a robust way to compare decoding strategies, and serve as valuable tools in guiding model selection for open-ended text generation tasks. Finally, we suggest future directions for improving evaluation methodologies in text generation. Our codebase, datasets, and models are publicly available.
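The benchmarking-by-partial-orderings idea can be illustrated with a Pareto-dominance check over a few common metrics; the metric names and numbers below are made up, and the paper's actual summary metric is more involved than this sketch.

```python
def dominates(a, b, higher=("coherence", "diversity"), lower=("perplexity",)):
    """a dominates b if it is at least as good on every metric and strictly better on one."""
    ge = all(a[m] >= b[m] for m in higher) and all(a[m] <= b[m] for m in lower)
    gt = any(a[m] > b[m] for m in higher) or any(a[m] < b[m] for m in lower)
    return ge and gt

def pareto_front(results):
    """Decoding strategies not dominated by any other (maximal elements of the partial order)."""
    return [name for name, r in results.items()
            if not any(dominates(other, r) for o, other in results.items() if o != name)]

results = {  # illustrative numbers only
    "greedy":  {"coherence": 0.78, "diversity": 0.31, "perplexity": 9.8},
    "top-p":   {"coherence": 0.71, "diversity": 0.62, "perplexity": 14.2},
    "typical": {"coherence": 0.70, "diversity": 0.66, "perplexity": 15.0},
}
print(pareto_front(results))  # here no strategy dominates another, so all three remain
```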
Updated: 2025-03-05 21:24:29
标题: 朝着更好的开放式文本生成:一个多标准评估框架
摘要: 开放式文本生成由于强大(大型)语言模型的兴起而成为自然语言处理中的一个重要任务。然而,由于一些常用指标如连贯性、多样性和困惑度之间存在权衡,评估这些模型和所采用的解码策略的质量仍然具有挑战性。解码方法通常在一些指标上表现出色,而在其他指标上表现不佳,使得建立清晰的排名变得复杂。在本文中,我们提出了在这个多标准框架内的新颖排名策略。具体来说,我们采用基于偏序关系的基准方法,并提出了一个旨在平衡现有自动指标的新的总结指标,从而更全面地评估文本生成质量。我们的实验表明,所提出的方法提供了一种稳健的比较解码策略的方式,并可作为指导开放式文本生成任务模型选择的有价值工具。最后,我们提出了改进文本生成评估方法的未来方向。我们的代码库、数据集和模型均公开可用。
更新时间: 2025-03-05 21:24:29
领域: cs.CL,cs.LG
Safe LLM-Controlled Robots with Formal Guarantees via Reachability Analysis
The deployment of Large Language Models (LLMs) in robotic systems presents unique safety challenges, particularly in unpredictable environments. Although LLMs, leveraging zero-shot learning, enhance human-robot interaction and decision-making capabilities, their inherent probabilistic nature and lack of formal guarantees raise significant concerns for safety-critical applications. Traditional model-based verification approaches often rely on precise system models, which are difficult to obtain for real-world robotic systems and may not be fully trusted due to modeling inaccuracies, unmodeled dynamics, or environmental uncertainties. To address these challenges, this paper introduces a safety assurance framework for LLM-controlled robots based on data-driven reachability analysis, a formal verification technique that ensures all possible system trajectories remain within safe operational limits. Our framework specifically investigates the problem of instructing an LLM to navigate the robot to a specified goal and assesses its ability to generate low-level control actions that successfully guide the robot safely toward that goal. By leveraging historical data to construct reachable sets of states for the robot-LLM system, our approach provides rigorous safety guarantees against unsafe behaviors without relying on explicit analytical models. We validate the framework through experimental case studies in autonomous navigation and task planning, demonstrating its effectiveness in mitigating risks associated with LLM-generated commands. This work advances the integration of formal methods into LLM-based robotics, offering a principled and practical approach to ensuring safety in next-generation autonomous systems.
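As a loose, plaintext illustration of what a data-driven reachable-set check might look like (the paper's construction is more sophisticated; the interval hull, inflation margin, and toy numbers below are assumptions of this sketch):

```python
import numpy as np

def interval_reachable_set(next_states, margin=0.1):
    """Axis-aligned over-approximation of the reachable states observed in the data.

    `next_states` are successor states recorded while replaying LLM-issued commands;
    `margin` stands in for the statistical inflation that makes this a high-confidence set.
    """
    return next_states.min(axis=0) - margin, next_states.max(axis=0) + margin

def is_safe(lo, hi, safe_lo, safe_hi):
    """The plan passes if the reachable box stays inside the safe operating box."""
    return bool(np.all(lo >= safe_lo) and np.all(hi <= safe_hi))

samples = np.random.default_rng(1).normal(loc=[1.0, 0.5], scale=0.05, size=(200, 2))
lo, hi = interval_reachable_set(samples)
print(is_safe(lo, hi, safe_lo=np.array([0.0, 0.0]), safe_hi=np.array([2.0, 1.0])))
```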
Updated: 2025-03-05 21:23:15
标题: 通过可达性分析实现带有形式保证的安全LLM控制机器人
摘要: 大型语言模型(LLMs)在机器人系统中的部署面临独特的安全挑战,特别是在不可预测的环境中。虽然LLMs利用零样本学习增强了人机交互和决策能力,但其固有的概率性质和缺乏正式保证引发了对安全关键应用的重大担忧。传统的基于模型的验证方法通常依赖于精确的系统模型,这对于现实世界中的机器人系统来说很难获得,并且可能由于建模不准确、未建模的动态或环境不确定性而不完全可信。为了解决这些挑战,本文介绍了一种基于数据驱动可达性分析的LLM控制机器人的安全保证框架,这是一种形式验证技术,确保所有可能的系统轨迹保持在安全操作范围内。我们的框架特别研究了指导LLM将机器人导航到指定目标的问题,并评估其生成成功指导机器人安全朝向该目标的低级控制动作的能力。通过利用历史数据构建机器人-LLM系统状态的可达集,我们的方法提供了严格的安全保证,防止不安全行为,而无需依赖显式的分析模型。我们通过自主导航和任务规划的实验案例研究验证了该框架,展示了其在减轻与LLM生成命令相关的风险方面的有效性。这项工作推动了形式方法与基于LLM的机器人技术的整合,为确保下一代自主系统的安全提供了一种有原则且实用的方法。
更新时间: 2025-03-05 21:23:15
领域: cs.RO,cs.LG,cs.SY,eess.SY
Multimodal Stock Price Prediction: A Case Study of the Russian Securities Market
Classical asset price forecasting methods primarily rely on numerical data, such as price time series, trading volumes, limit order book data, and technical analysis indicators. However, the news flow plays a significant role in price formation, making the development of multimodal approaches that combine textual and numerical data for improved prediction accuracy highly relevant. This paper addresses the problem of forecasting financial asset prices using the multimodal approach that combines candlestick time series and textual news flow data. A unique dataset was collected for the study, which includes time series for 176 Russian stocks traded on the Moscow Exchange and 79,555 financial news articles in Russian. For processing textual data, pre-trained models RuBERT and Vikhr-Qwen2.5-0.5b-Instruct (a large language model) were used, while time series and vectorized text data were processed using an LSTM recurrent neural network. The experiments compared models based on a single modality (time series only) and two modalities, as well as various methods for aggregating text vector representations. Prediction quality was estimated using two key metrics: Accuracy (direction of price movement prediction: up or down) and Mean Absolute Percentage Error (MAPE), which measures the deviation of the predicted price from the true price. The experiments showed that incorporating textual modality reduced the MAPE value by 55%. The resulting multimodal dataset holds value for the further adaptation of language models in the financial sector. Future research directions include optimizing textual modality parameters, such as the time window, sentiment, and chronological order of news messages.
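A compact sketch of the kind of fusion model the abstract describes: an LSTM over candlestick features concatenated with a pooled news embedding, evaluated with MAPE. Dimensions, mean-pooling, and the single-step regression head are assumptions of this sketch, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultimodalPriceModel(nn.Module):
    def __init__(self, candle_dim=5, news_dim=768, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(candle_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden + news_dim, 1)      # next-step price regression

    def forward(self, candles, news_vectors):
        _, (h, _) = self.lstm(candles)                   # candles: (B, T, 5) OHLCV
        news = news_vectors.mean(dim=1)                  # (B, N, 768) news vectors, mean-pooled
        return self.head(torch.cat([h[-1], news], dim=-1)).squeeze(-1)

def mape(pred, true):
    return (100.0 * (pred - true).abs() / true.abs().clamp_min(1e-8)).mean()

model = MultimodalPriceModel()
pred = model(torch.randn(8, 30, 5), torch.randn(8, 12, 768))
print(mape(pred, torch.rand(8) + 1.0))
```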
Updated: 2025-03-05 21:20:32
标题: 多模式股价预测:俄罗斯证券市场案例研究
摘要: 传统资产价格预测方法主要依赖于数字数据,如价格时间序列、交易量、限价订单簿数据和技术分析指标。然而,新闻流在价格形成中发挥着重要作用,使得结合文本和数字数据以提高预测准确性的多模态方法的发展非常相关。本文解决了使用结合蜡烛图时间序列和文本新闻流数据的多模态方法来预测金融资产价格的问题。为研究收集了一个独特的数据集,其中包括莫斯科交易所上市的176支俄罗斯股票的时间序列和79,555篇俄文金融新闻文章。为处理文本数据,使用了预训练模型RuBERT和Vikhr-Qwen2.5-0.5b-Instruct(一个大型语言模型),而时间序列和向量化文本数据则使用LSTM递归神经网络进行处理。实验比较了基于单一模态(仅时间序列)和两种模态的模型,以及各种聚合文本向量表示的方法。预测质量使用了两个关键指标进行评估:准确性(价格走势预测的方向:上涨或下跌)和平均绝对百分比误差(MAPE),该指标衡量了预测价格与真实价格之间的偏差。实验证明,整合文本模态降低了MAPE值55%。得到的多模态数据集对于进一步在金融领域中调整语言模型具有价值。未来研究方向包括优化文本模态参数,如时间窗口、情感和新闻信息的时间顺序。
更新时间: 2025-03-05 21:20:32
领域: q-fin.ST,cs.LG,q-fin.CP
On the Convergence of Adam-Type Algorithm for Bilevel Optimization under Unbounded Smoothness
Adam has become one of the most popular optimizers for training modern deep neural networks, such as transformers. However, its applicability is largely restricted to single-level optimization problems. In this paper, we aim to extend vanilla Adam to tackle bilevel optimization problems, which have important applications in machine learning, such as meta-learning. In particular, we study stochastic bilevel optimization problems where the lower-level function is strongly convex and the upper-level objective is nonconvex with potentially unbounded smoothness. This unbounded smooth objective function covers a broad class of neural networks, including transformers, which may exhibit non-Lipschitz gradients. In this work, we introduce AdamBO, a single-loop Adam-type method that achieves $\widetilde{O}(\epsilon^{-4})$ oracle complexity to find $\epsilon$-stationary points, where the oracle calls involve stochastic gradient or Hessian/Jacobian-vector product evaluations. The key to our analysis is a novel randomness decoupling lemma that provides refined control over the lower-level variable. We conduct extensive experiments on various machine learning tasks involving bilevel formulations with recurrent neural networks (RNNs) and transformers, demonstrating the effectiveness of our proposed Adam-type algorithm.
Updated: 2025-03-05 21:16:59
标题: 关于Adam类型算法在无界平滑度下双层优化中的收敛性
摘要: Adam已经成为训练现代深度神经网络(如变压器)最受欢迎的优化器之一。然而,它的适用性主要限于单层优化问题。在本文中,我们旨在将普通的Adam扩展到解决双层优化问题,这在机器学习中具有重要应用,如元学习。特别地,我们研究了随机双层优化问题,其中下层函数是强凸的,上层目标是非凸的,可能具有无界的光滑性。这种无界光滑目标函数涵盖了一大类神经网络,包括可能具有非利普希茨梯度的变压器。在这项工作中,我们引入了AdamBO,一种单循环Adam类型方法,其实现了$\widetilde{O}(\epsilon^{-4})$的预言机(oracle)复杂度以找到$\epsilon$-稳定点,其中预言机调用涉及随机梯度或Hessian/Jacobian-向量乘积评估。我们分析的关键是一种新颖的随机解耦引理,它提供了对下层变量的精细控制。我们在包含循环神经网络(RNNs)和变压器的双层形式的各种机器学习任务上进行了广泛的实验,展示了我们提出的Adam类型算法的有效性。
更新时间: 2025-03-05 21:16:59
领域: cs.LG,math.OC
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting
3D Gaussian Splatting (3DGS) has recently created impressive 3D assets for various applications. However, the copyright of these assets is not well protected as existing watermarking methods are not suited for the 3DGS rendering pipeline considering security, capacity, and invisibility. Besides, these methods often require hours or even days for optimization, limiting the application scenarios. In this paper, we propose GuardSplat, an innovative and efficient framework that effectively protects the copyright of 3DGS assets. Specifically, 1) We first propose a CLIP-guided Message Decoupling Optimization module for training the message decoder, leveraging CLIP's aligning capability and rich representations to achieve a high extraction accuracy with minimal optimization costs, presenting exceptional capacity and efficiency. 2) Then, we propose a Spherical-harmonic-aware (SH-aware) Message Embedding module tailored for 3DGS, which employs a set of SH offsets to seamlessly embed the message into the SH features of each 3D Gaussian while maintaining the original 3D structure. It enables the 3DGS assets to be watermarked with minimal fidelity trade-offs and also prevents malicious users from removing the messages from the model files, meeting the demands for invisibility and security. 3) We further propose an Anti-distortion Message Extraction module to improve robustness against various visual distortions. Extensive experiments demonstrate that GuardSplat outperforms state-of-the-art and achieves fast optimization speed. Project page: https://narcissusex.github.io/GuardSplat, and Code: https://github.com/NarcissusEx/GuardSplat.
Updated: 2025-03-05 21:10:52
标题: GuardSplat:三维高斯点喷洒的高效稳健水印技术
摘要: 最近,3D高斯喷洒(3DGS)已为各种应用程序创建了令人印象深刻的3D资产。然而,由于现有的水印方法不适用于3DGS渲染管道,考虑到安全性、容量和隐形性,这些资产的版权保护并不好。此外,这些方法通常需要数小时甚至数天进行优化,限制了应用场景。在本文中,我们提出了GuardSplat,这是一个创新的高效框架,有效保护3DGS资产的版权。具体而言,1)我们首先提出了一个CLIP引导的消息解耦优化模块,用于训练消息解码器,利用CLIP的对齐能力和丰富的表示来实现高提取准确性,同时最小化优化成本,呈现出优越的容量和效率。2)然后,我们提出了一个针对3DGS定制的球谐感知(SH-aware)消息嵌入模块,该模块利用一组SH偏移量,将消息无缝嵌入到每个3D高斯的SH特征中,同时保持原始的3D结构。这使得3DGS资产可以在最小化保真度的情况下加水印,并防止恶意用户从模型文件中删除消息,满足了对隐形性和安全性的需求。3)我们进一步提出了一个抗失真消息提取模块,以提高对各种视觉失真的鲁棒性。大量实验证明,GuardSplat优于最先进技术,并实现了快速优化速度。项目页面:https://narcissusex.github.io/GuardSplat,代码:https://github.com/NarcissusEx/GuardSplat。
更新时间: 2025-03-05 21:10:52
领域: cs.CV,cs.CR
The Signed Two-Space Proximity Model for Learning Representations in Protein-Protein Interaction Networks
Accurately predicting complex protein-protein interactions (PPIs) is crucial for decoding biological processes, from cellular functioning to disease mechanisms. However, experimental methods for determining PPIs are computationally expensive. Thus, attention has been recently drawn to machine learning approaches. Furthermore, insufficient effort has been made toward analyzing signed PPI networks, which capture both activating (positive) and inhibitory (negative) interactions. To accurately represent biological relationships, we present the Signed Two-Space Proximity Model (S2-SPM) for signed PPI networks, which explicitly incorporates both types of interactions, reflecting the complex regulatory mechanisms within biological systems. This is achieved by leveraging two independent latent spaces to differentiate between positive and negative interactions while representing protein similarity through proximity in these spaces. Our approach also enables the identification of archetypes representing extreme protein profiles. S2-SPM's superior performance in predicting the presence and sign of interactions in SPPI networks is demonstrated in link prediction tasks against relevant baseline methods. Additionally, the biological prevalence of the identified archetypes is confirmed by an enrichment analysis of Gene Ontology (GO) terms, which reveals that distinct biological tasks are associated with archetypal groups formed by both interactions. This study is also validated regarding statistical significance and sensitivity analysis, providing insights into the functional roles of different interaction types. Finally, the robustness and consistency of the extracted archetype structures are confirmed using the Bayesian Normalized Mutual Information (BNMI) metric, proving the model's reliability in capturing meaningful SPPI patterns.
Updated: 2025-03-05 21:08:58
标题: 已签名的两空间接近模型在蛋白质相互作用网络中学习表征的应用
摘要: 准确预测复杂蛋白质-蛋白质相互作用(PPIs)对于解码从细胞功能到疾病机制的生物过程至关重要。然而,确定PPIs的实验方法在计算上是昂贵的。因此,最近开始关注机器学习方法。此外,对分析带符号的PPI网络的努力不足,这些网络捕捉激活(正向)和抑制(负向)相互作用。为了准确表示生物关系,我们提出了用于带符号PPI网络的Signed Two-Space Proximity Model(S2-SPM),该模型明确地将两种类型的相互作用结合在一起,反映了生物系统内复杂的调节机制。通过利用两个独立的潜在空间来区分正负相互作用,并通过这些空间中的接近来表示蛋白质相似性,实现了这一目标。我们的方法还使得能够识别代表极端蛋白质特征的原型。在与相关基准方法进行的链接预测任务中展示了S2-SPM在预测SPPI网络中相互作用的存在和符号的优越性能。此外,通过基因本体(GO)术语的富集分析确认了确定的原型的生物学普遍性,揭示了与由这些相互作用形成的原型群相关联的不同生物任务。此研究还通过统计显著性和敏感性分析进行了验证,提供了有关不同相互作用类型功能角色的见解。最后,通过贝叶斯归一化互信息(BNMI)度量证实了提取的原型结构的稳健性和一致性,证明了该模型在捕捉有意义的SPPI模式方面的可靠性。
更新时间: 2025-03-05 21:08:58
领域: cs.LG,q-bio.MN
Defining and Characterizing Reward Hacking
We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function leads to poor performance according to the true reward function. We say that a proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, it might be possible to create an unhackable proxy by leaving some terms out of the reward function (making it "narrower") or overlooking fine-grained distinctions between roughly equivalent outcomes, but we show this is usually not the case. A key insight is that the linearity of reward (in state-action visit counts) makes unhackability a very strong condition. In particular, for the set of all stochastic policies, two reward functions can only be unhackable if one of them is constant. We thus turn our attention to deterministic policies and finite sets of stochastic policies, where non-trivial unhackable pairs always exist, and establish necessary and sufficient conditions for the existence of simplifications, an important special case of unhackability. Our results reveal a tension between using reward functions to specify narrow tasks and aligning AI systems with human values.
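In symbols (our paraphrase of the abstract's definition, with $J_R(\pi)$ the expected return of policy $\pi$ under reward $R$ and $\Pi$ the policy set under consideration): a proxy $\tilde{R}$ is unhackable with respect to the true reward $R$ on $\Pi$ if, for all $\pi, \pi' \in \Pi$, $J_{\tilde{R}}(\pi') > J_{\tilde{R}}(\pi)$ implies $J_{R}(\pi') \ge J_{R}(\pi)$.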
Updated: 2025-03-05 21:08:30
标题: 定义和描述奖励欺骗
摘要: 我们提供了对奖励欺骗的首个正式定义,这是一种现象,优化一个不完美的代理奖励函数会导致根据真实奖励函数的表现较差。我们称代理是不可欺骗的,如果增加期望代理回报永远不会降低期望真实回报。直观地,可能通过在奖励函数中略去一些术语(使其“更窄”)或忽略大致等效结果之间的细微区别来创建一个不可欺骗的代理,但我们证明这通常不是这样。一个关键的见解是奖励的线性性(在状态-动作访问计数中)使得不可欺骗性成为一个非常强的条件。特别地,对于所有随机策略的集合,只有一个奖励函数是恒定的情况下,两个奖励函数才能被认为是不可欺骗的。因此,我们将注意力转向确定性策略和有限集合的随机策略,其中非平凡的不可欺骗对总是存在,并建立了简化存在的必要和充分条件,这是不可欺骗的一个重要特例。我们的结果揭示了在使用奖励函数指定狭窄任务和将人工智能系统与人类价值观保持一致之间存在的紧张关系。
更新时间: 2025-03-05 21:08:30
领域: cs.LG,stat.ML
Parser Knows Best: Testing DBMS with Coverage-Guided Grammar-Rule Traversal
Database Management System (DBMS) is the key component for data-intensive applications. Recently, researchers propose many tools to comprehensively test DBMS systems for finding various bugs. However, these tools only cover a small subset of diverse syntax elements defined in DBMS-specific SQL dialects, leaving a large number of features unexplored. In this paper, we propose ParserFuzz, a novel fuzzing framework that automatically extracts grammar rules from DBMSs' built-in syntax definition files for SQL query generation. Without any input corpus, ParserFuzz can generate diverse query statements to saturate the grammar features of the tested DBMSs, which grammar features could be missed by previous tools. Additionally, ParserFuzz utilizes code coverage as feedback to guide the query mutation, which combines different DBMS features extracted from the syntax rules to find more function and safety bugs. In our evaluation, ParserFuzz outperforms all state-of-the-art existing DBMS testing tools in terms of bug finding, grammar rule coverage and code coverage. ParserFuzz detects 81 previously unknown bugs in total across 5 popular DBMSs, where all bugs are confirmed and 34 have been fixed.
Updated: 2025-03-05 20:50:41
标题: 解析器最佳选择:使用覆盖率引导的语法规则遍历测试数据库管理系统
摘要: 数据库管理系统(DBMS)是数据密集型应用的关键组件。最近,研究人员提出了许多工具来全面测试DBMS系统,以发现各种错误。然而,这些工具只涵盖了DBMS特定SQL方言中定义的一小部分不同语法元素,留下了许多特性未被探索。在本文中,我们提出了ParserFuzz,这是一个新颖的模糊测试框架,可自动从DBMS内置的语法定义文件中提取语法规则,用于生成SQL查询语句。ParserFuzz无需任何输入语料库,即可生成各种查询语句,以填满被测试DBMS的语法特性,这些语法特性可能被以前的工具忽略了。此外,ParserFuzz利用代码覆盖率作为反馈,指导查询变异,结合从语法规则中提取的不同DBMS特性,以发现更多功能和安全漏洞。在我们的评估中,ParserFuzz在bug发现、语法规则覆盖率和代码覆盖率方面均优于所有现有的DBMS测试工具。ParserFuzz在5个流行的DBMS中总共检测到81个以前未知的错误,所有错误都经过确认,其中34个已经修复。
更新时间: 2025-03-05 20:50:41
领域: cs.CR,cs.DB
LensDFF: Language-enhanced Sparse Feature Distillation for Efficient Few-Shot Dexterous Manipulation
Learning dexterous manipulation from few-shot demonstrations is a significant yet challenging problem for advanced, human-like robotic systems. Dense distilled feature fields have addressed this challenge by distilling rich semantic features from 2D visual foundation models into the 3D domain. However, their reliance on neural rendering models such as Neural Radiance Fields (NeRF) or Gaussian Splatting results in high computational costs. In contrast, previous approaches based on sparse feature fields either suffer from inefficiencies due to multi-view dependencies and extensive training or lack sufficient grasp dexterity. To overcome these limitations, we propose Language-ENhanced Sparse Distilled Feature Field (LensDFF), which efficiently distills view-consistent 2D features onto 3D points using our novel language-enhanced feature fusion strategy, thereby enabling single-view few-shot generalization. Based on LensDFF, we further introduce a few-shot dexterous manipulation framework that integrates grasp primitives into the demonstrations to generate stable and highly dexterous grasps. Moreover, we present a real2sim grasp evaluation pipeline for efficient grasp assessment and hyperparameter tuning. Through extensive simulation experiments based on the real2sim pipeline and real-world experiments, our approach achieves competitive grasping performance, outperforming state-of-the-art approaches.
Updated: 2025-03-05 20:46:30
标题: LensDFF:语言增强的稀疏特征蒸馏,用于高效的少样本灵巧操作
摘要: 从少量示范中学习熟练操作是先进的、类似于人类的机器人系统面临的一个重要而具有挑战性的问题。稠密的蒸馏特征场通过将丰富的语义特征从2D视觉基础模型蒸馏到3D域中来解决了这一挑战。然而,它们依赖于神经渲染模型,如神经辐射场(NeRF)或高斯喷洒,导致高计算成本。相比之下,之前基于稀疏特征场的方法要么因多视角依赖性和大量训练而导致低效,要么缺乏足够的抓握灵活性。为了克服这些限制,我们提出了一种Language-ENhanced Sparse Distilled Feature Field(LensDFF),它通过我们的新颖的语言增强特征融合策略,有效地将视图一致的2D特征蒸馏到3D点上,从而实现单视图少量示范的泛化。基于LensDFF,我们进一步介绍了一个少量示范的熟练操作框架,将抓握基元集成到示范中,以生成稳定且高度灵巧的抓握。此外,我们提出了一个用于高效抓握评估和超参数调整的real2sim抓握评估流水线。通过基于real2sim流水线的广泛模拟实验和真实世界实验,我们的方法实现了有竞争力的抓取性能,优于最先进的方法。
更新时间: 2025-03-05 20:46:30
领域: cs.RO,cs.LG
Pretrained LLMs as Real-Time Controllers for Robot Operated Serial Production Line
The manufacturing industry is undergoing a transformative shift, driven by cutting-edge technologies like 5G, AI, and cloud computing. Despite these advancements, effective system control, which is crucial for optimizing production efficiency, remains a complex challenge due to the intricate, knowledge-dependent nature of manufacturing processes and the reliance on domain-specific expertise. Conventional control methods often demand heavy customization, considerable computational resources, and lack transparency in decision-making. In this work, we investigate the feasibility of using Large Language Models (LLMs), particularly GPT-4, as a straightforward, adaptable solution for controlling manufacturing systems, specifically, mobile robot scheduling. We introduce an LLM-based control framework to assign mobile robots to different machines in robot assisted serial production lines, evaluating its performance in terms of system throughput. Our proposed framework outperforms traditional scheduling approaches such as First-Come-First-Served (FCFS), Shortest Processing Time (SPT), and Longest Processing Time (LPT). While it achieves performance that is on par with state-of-the-art methods like Multi-Agent Reinforcement Learning (MARL), it offers a distinct advantage by delivering comparable throughput without the need for extensive retraining. These results suggest that the proposed LLM-based solution is well-suited for scenarios where technical expertise, computational resources, and financial investment are limited, while decision transparency and system scalability are critical concerns.
Updated: 2025-03-05 20:43:49
标题: 预训练LLMs作为机器人操作的串行生产线实时控制器
摘要: 制造业正在经历一场变革性转变,受到5G、人工智能和云计算等尖端技术的推动。尽管有这些进步,但有效的系统控制对于优化生产效率仍然是一个复杂的挑战,这是由于制造过程的复杂性以及对领域特定专业知识的依赖性。传统的控制方法通常需要大量定制、大量计算资源,并且在决策过程中缺乏透明度。在这项工作中,我们研究了使用大型语言模型(LLMs),特别是GPT-4,作为控制制造系统的简单、可适应的解决方案的可行性,具体来说是移动机器人调度。我们引入了一个基于LLM的控制框架,将移动机器人分配到机器人辅助的串行生产线中的不同机器上,评估其在系统吞吐量方面的性能。我们提出的框架优于传统的调度方法,如先来先服务(FCFS)、最短处理时间(SPT)和最长处理时间(LPT)。虽然它在性能方面与最先进的方法,如多智能体强化学习(MARL)相当,但它具有一个明显的优势,即在不需要进行大量重新训练的情况下,提供相当的吞吐量。这些结果表明,所提出的基于LLM的解决方案非常适合技术专业知识、计算资源和财务投资有限、决策透明度和系统可扩展性是关键问题的情况。
更新时间: 2025-03-05 20:43:49
领域: cs.RO,cs.LG,cs.SY,eess.SY
Seldonian Reinforcement Learning for Ad Hoc Teamwork
Most offline RL algorithms return optimal policies but do not provide statistical guarantees on undesirable behaviors. This could generate reliability issues in safety-critical applications, such as in some multiagent domains where agents, and possibly humans, need to interact to reach their goals without harming each other. In this work, we propose a novel offline RL approach, inspired by Seldonian optimization, which returns policies with good performance and statistically guaranteed properties with respect to predefined undesirable behaviors. In particular, our focus is on Ad Hoc Teamwork settings, where agents must collaborate with new teammates without prior coordination. Our method requires only a pre-collected dataset, a set of candidate policies for our agent, and a specification about the possible policies followed by the other players -- it does not require further interactions, training, or assumptions on the type and architecture of the policies. We test our algorithm in Ad Hoc Teamwork problems and show that it consistently finds reliable policies while improving sample efficiency with respect to standard ML baselines.
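The Seldonian structure -- optimize over candidates, then apply a high-confidence safety test and possibly return no solution -- can be sketched in a few lines; the Hoeffding bound and the toy numbers are assumptions of this sketch, not the paper's exact test.

```python
import numpy as np

def seldonian_select(candidates, harm_samples, returns, delta=0.05, harm_limit=0.1):
    """Best-performing candidate whose high-confidence bound on undesirable behavior passes."""
    safe = []
    for name in candidates:
        x = np.asarray(harm_samples[name], dtype=float)      # per-episode harm indicators
        upper = x.mean() + np.sqrt(np.log(1.0 / delta) / (2 * len(x)))  # Hoeffding upper bound
        if upper <= harm_limit:
            safe.append(name)
    if not safe:
        return "NO_SOLUTION_FOUND"   # a Seldonian algorithm may refuse to return a policy
    return max(safe, key=lambda n: returns[n])

rng = np.random.default_rng(0)
harm = {"pi_a": rng.binomial(1, 0.02, 500), "pi_b": rng.binomial(1, 0.20, 500)}
print(seldonian_select(["pi_a", "pi_b"], harm, {"pi_a": 0.80, "pi_b": 0.95}))  # -> "pi_a"
```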
Updated: 2025-03-05 20:37:02
标题: 塞尔登强化学习用于即时小组合作
摘要: 大多数离线强化学习算法返回最优策略,但不能对不良行为提供统计保证。这可能在一些多智能体领域的安全关键应用中引发可靠性问题,例如在需要智能体和可能的人类相互交互以达到目标而不伤害彼此的情况下。在这项工作中,我们提出了一种新颖的离线强化学习方法,灵感来自Seldonian优化,它返回具有良好性能和在预定义不良行为方面具有统计保证属性的策略。具体而言,我们的重点是在Ad Hoc Teamwork设置中,智能体必须与新的队友合作而无需事先协调。我们的方法只需要一个预先收集的数据集、一组针对我们智能体的候选策略,以及关于其他玩家可能遵循的策略的规范 - 它不需要进一步的交互、训练或对策略的类型和架构的假设。我们在Ad Hoc Teamwork问题中测试我们的算法,并展示它在提高样本效率方面相对于标准机器学习基线而言始终能够找到可靠的策略。
更新时间: 2025-03-05 20:37:02
领域: cs.LG
Role of Databases in GenAI Applications
Generative AI (GenAI) is transforming industries by enabling intelligent content generation, automation, and decision-making. However, the effectiveness of GenAI applications depends significantly on efficient data storage, retrieval, and contextual augmentation. This paper explores the critical role of databases in GenAI workflows, emphasizing the importance of choosing the right database architecture to optimize performance, accuracy, and scalability. It categorizes database roles into conversational context (key-value/document databases), situational context (relational databases/data lakehouses), and semantic context (vector databases) each serving a distinct function in enriching AI-generated responses. Additionally, the paper highlights real-time query processing, vector search for semantic retrieval, and the impact of database selection on model efficiency and scalability. By leveraging a multi-database approach, GenAI applications can achieve more context-aware, personalized, and high-performing AI-driven solutions.
Updated: 2025-03-05 20:32:21
标题: 数据库在GenAI应用中的作用
摘要: 生成式人工智能(GenAI)正在通过实现智能内容生成、自动化和决策来改变行业。然而,GenAI应用的有效性在很大程度上取决于高效的数据存储、检索和情境增强。本文探讨了数据库在GenAI工作流程中的关键作用,强调选择正确的数据库架构来优化性能、准确性和可扩展性的重要性。它将数据库角色分类为对话上下文(键值/文档数据库)、情境上下文(关系数据库/数据湖仓)和语义上下文(向量数据库),每种在丰富AI生成的响应中发挥着不同的功能。此外,本文强调了实时查询处理、语义检索的向量搜索,以及数据库选择对模型效率和可扩展性的影响。通过利用多数据库方法,GenAI应用可以实现更加具有上下文意识、个性化和高性能的AI驱动解决方案。
更新时间: 2025-03-05 20:32:21
领域: cs.DB,cs.AI,97P30,I.2.7; H.2.5
A generative approach to LLM harmfulness detection with special red flag tokens
Most safety training methods for large language models (LLMs) based on fine-tuning rely on dramatically changing the output distribution of the model when faced with a harmful request, shifting it from an unsafe answer to a refusal to respond. These methods inherently compromise model capabilities and might make auto-regressive models vulnerable to attacks that make an initial token of an affirmative response likely. To avoid that, we propose to expand the model's vocabulary with a special token we call the red flag token (<rf>) and propose to fine-tune the model to generate this token any time harmful content is generated or about to be generated. This novel safety training method effectively augments LLMs into generative classifiers of harmfulness at all times during the conversation. This method offers several advantages: it enables the model to explicitly learn the concept of harmfulness while marginally affecting the generated distribution, thus maintaining the model's utility. It also evaluates each generated answer rather than just the input prompt and provides a stronger defence against sampling-based attacks. In addition, it simplifies the evaluation of the model's robustness and reduces correlated failures when combined with a classifier. We further show an increased robustness to long contexts and to supervised fine-tuning attacks.
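A minimal Hugging Face sketch of the mechanics (adding the token and computing a loss on a continuation that ends with it); "gpt2" and the toy strings are stand-ins, and the actual placement and weighting of <rf> during fine-tuning follow the paper rather than this sketch.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                    # stand-in base model
tok.add_special_tokens({"additional_special_tokens": ["<rf>"]})
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.resize_token_embeddings(len(tok))                        # make room for the new token

prompt = "How do I pick a lock?"
harmful_response = "First, insert a tension wrench ..."        # toy stand-in text
text = prompt + " " + harmful_response + " <rf>"               # red flag marks harmful output

enc = tok(text, return_tensors="pt")
labels = enc["input_ids"].clone()
labels[:, : len(tok(prompt)["input_ids"])] = -100              # approximate prompt span: no loss
loss = model(input_ids=enc["input_ids"],
             attention_mask=enc["attention_mask"],
             labels=labels).loss
loss.backward()                                                # one step of the sketched objective
```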
Updated: 2025-03-05 20:31:47
标题: 一种生成式方法用于检测带有特殊红旗标记的LLM有害性
摘要: 大多数基于微调的大型语言模型(LLMs)的安全培训方法依赖于在面对有害请求时,显著改变模型的输出分布,将其从不安全的答案转变为拒绝回应。这些方法本质上会损害模型的能力,并可能使自回归模型容易受到攻击,这些攻击会产生肯定回应的初始标记。为了避免这种情况,我们建议通过引入一个特殊的标记,称为红旗标记(<rf>),并提议对模型进行微调,以在生成有害内容或即将生成有害内容时生成此标记。这种新颖的安全培训方法有效地将LLMs扩展为有害性生成分类器,并在对话过程中始终保持。该方法具有几个优点:它使模型能够明确学习有害性的概念,同时对生成的分布影响较小,因此保持了模型的效用。它还评估每个生成的答案而不仅仅是输入提示,并提供更强大的防御以抵御基于抽样的攻击。此外,它简化了模型鲁棒性的评估,并减少了与分类器结合时的相关失败。我们进一步展示了对长上下文和受监督的微调攻击的增强鲁棒性。
更新时间: 2025-03-05 20:31:47
领域: cs.CL,cs.AI,cs.CR,cs.LG
A Quantum Good Authentication Protocol
This article presents a novel network protocol that incorporates a quantum photonic channel for symmetric key distribution and a Dilithium signature that replaces factor-based public key cryptography, for enhanced authentication, security, and privacy. The protocol uses strong hash functions to hash original messages and verify data integrity at the destination. This Quantum Good Authentication Protocol (QGP) provides a high level of security grounded in the theory of quantum mechanics. QGP also has the advantage of quantum-resistant data protection that withstands attacks from current digital computers and future quantum computers. QGP transforms the Transmission Control Protocol/Internet Protocol (TCP/IP) by adding a quantum layer at the bottom of the Open Systems Interconnection (OSI) model (layer 0) and modifying the top layer (layer 7) with Dilithium signatures, thus improving the security of the original OSI model. In addition, QGP incorporates strong encryption, hardware-based quantum channels, post-quantum signatures, and secure hash algorithms over a platform of decryptors, switches, routers, and network controllers to form a testbed of the next-generation, secure quantum internet. The experiments presented here show that QGP provides secure authentication and improved security and privacy and can be adopted as a new protocol for the next-generation quantum Internet.
Updated: 2025-03-05 20:30:34
标题: 一个量子良好的认证协议
摘要: 本文介绍了一种新颖的网络协议,该协议将量子光子通道用于对称密钥分发,采用Dilithium签名替代基于因子的公钥加密以增强身份验证、安全性和隐私性。该协议使用强哈希函数对原始消息进行哈希,并在目的地验证数据完整性。这种量子好认证协议(QGP)提供了量子力学理论提供的高级安全性。QGP还具有抗量子数据保护的优势,可以防止当前数字计算机和未来量子计算机的攻击。 QGP通过在开放系统互联(OSI)模型的底部(第0层)添加一个量子层,并在顶层(第7层)使用Dilithium签名来改进原始OSI模型的安全性。此外,QGP还整合了强加密、基于硬件的量子通道、后量子签名和安全哈希算法,通过解密器、交换机、路由器和网络控制器构建了下一代安全量子互联网的测试平台。在这里呈现的实验表明,QGP提供了安全的认证、改善的安全性和隐私性,并可作为下一代量子互联网的新协议采用。
更新时间: 2025-03-05 20:30:34
领域: cs.CR,quant-ph
Deep Augmentation: Dropout as Augmentation for Self-Supervised Learning
Despite dropout's ubiquity in machine learning, its effectiveness as a form of data augmentation remains under-explored. We address two key questions: (i) When is dropout effective as an augmentation strategy? (ii) Is dropout uniquely effective under these conditions? To explore these questions, we propose Deep Augmentation, a network- and modality-agnostic method that applies dropout or PCA transformations to targeted layers in neural networks. Through extensive experiments on contrastive learning tasks in NLP, computer vision, and graph learning, we find that uniformly applying dropout across layers does not consistently improve performance. Instead, dropout proves most beneficial in deeper layers and can be matched by alternative augmentations (e.g., PCA). We also show that a stop-gradient operation is critical for ensuring dropout functions effectively as an augmentation, and that performance trends invert when moving from contrastive tasks to supervised tasks. Our analysis suggests that Deep Augmentation helps mitigate inter-layer co-adaptation -- a notable issue in self-supervised learning due to the absence of labeled data. Drawing on these insights, we outline a procedure for selecting the optimal augmentation layer and demonstrate that Deep Augmentation can outperform traditional input-level augmentations. This simple yet powerful approach can be seamlessly integrated into a wide range of architectures and modalities, yielding notable gains in both performance and generalization.
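One plausible arrangement of the ingredients named above -- dropout applied at a targeted deeper layer to form the second view, plus a stop-gradient on that branch before a contrastive loss -- is sketched below; the layer choice, dropout rate, and where the stop-gradient sits are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),   # shallow block
    nn.Linear(256, 256), nn.ReLU(),   # deeper block: augmentation target
    nn.Linear(256, 64),               # projection head
)

def forward_with_deep_dropout(x, p, target_index=3):
    h = x
    for i, layer in enumerate(encoder):
        h = layer(h)
        if i == target_index:                       # right after the deeper block's ReLU
            h = F.dropout(h, p=p, training=True)    # dropout used as the augmentation
    return h

x = torch.randn(32, 128)
z1 = forward_with_deep_dropout(x, p=0.0)            # clean view
z2 = forward_with_deep_dropout(x, p=0.3).detach()   # augmented view with stop-gradient
logits = F.normalize(z1, dim=-1) @ F.normalize(z2, dim=-1).T / 0.1
loss = F.cross_entropy(logits, torch.arange(len(x)))  # InfoNCE with in-batch negatives
```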
Updated: 2025-03-05 20:30:05
标题: 深度增强:辍学作为自监督学习的增强
摘要: 尽管辍学在机器学习中普遍存在,但作为一种数据增强形式的有效性仍未得到充分探讨。我们解决了两个关键问题:(i)辍学在何时作为增强策略有效?(ii)在这些条件下,辍学是否独特有效?为了探讨这些问题,我们提出了Deep Augmentation,这是一种网络和模态无关的方法,用于在神经网络中的目标层应用辍学或PCA变换。通过在自然语言处理、计算机视觉和图学习的对比学习任务上进行大量实验,我们发现在各层均匀应用辍学并不能始终提高性能。相反,辍学在更深层中最为有利,并且可以被替代增强(如PCA)所匹配。我们还展示了停梯度操作对确保辍学有效作为增强的关键性,并且当从对比任务转移到监督任务时,性能趋势会反转。我们的分析表明Deep Augmentation有助于缓解层间相互适应——这是自监督学习中一个显著问题,因为缺乏标记数据。借鉴这些见解,我们概述了选择最佳增强层的程序,并证明Deep Augmentation可以胜过传统的输入级增强。这种简单而强大的方法可以无缝地集成到各种架构和模态中,提高性能和泛化能力。
更新时间: 2025-03-05 20:30:05
领域: cs.LG,cs.CL,cs.CV
Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024
In the age of increasingly realistic generative AI, robust deepfake detection is essential for mitigating fraud and disinformation. While many deepfake detectors report high accuracy on academic datasets, we show that these academic benchmarks are out of date and not representative of real-world deepfakes. We introduce Deepfake-Eval-2024, a new deepfake detection benchmark consisting of in-the-wild deepfakes collected from social media and deepfake detection platform users in 2024. Deepfake-Eval-2024 consists of 45 hours of videos, 56.5 hours of audio, and 1,975 images, encompassing the latest manipulation technologies. The benchmark contains diverse media content from 88 different websites in 52 different languages. We find that the performance of open-source state-of-the-art deepfake detection models drops precipitously when evaluated on Deepfake-Eval-2024, with AUC decreasing by 50\% for video, 48\% for audio, and 45\% for image models compared to previous benchmarks. We also evaluate commercial deepfake detection models and models finetuned on Deepfake-Eval-2024, and find that they have superior performance to off-the-shelf open-source models, but do not yet reach the accuracy of deepfake forensic analysts. The dataset is available at https://github.com/nuriachandra/Deepfake-Eval-2024.
Updated: 2025-03-05 20:24:16
标题: Deepfake-Eval-2024:2024年流传的多模态野外深度伪造视频基准测试
摘要: 在逐渐逼真的生成式人工智能时代,强大的深度伪造检测对于减轻欺诈和虚假信息至关重要。虽然许多深度伪造检测器在学术数据集上报告高准确性,但我们表明这些学术基准已经过时且不代表真实世界中的深度伪造。我们引入了Deepfake-Eval-2024,一个新的深度伪造检测基准,由2024年社交媒体和深度伪造检测平台用户收集的现实中的深度伪造组成。Deepfake-Eval-2024包含了45小时的视频,56.5小时的音频和1,975张图片,涵盖了最新的操纵技术。该基准包含了来自88个不同网站、52种不同语言的多样化媒体内容。我们发现,当在Deepfake-Eval-2024上评估时,开源最先进的深度伪造检测模型的性能急剧下降,视频模型的AUC下降了50%,音频模型下降了48%,图像模型下降了45%,与先前的基准相比。我们还评估了商业深度伪造检测模型和在Deepfake-Eval-2024上微调的模型,发现它们的性能优于现成的开源模型,但尚未达到深度伪造法医分析师的准确性。该数据集可在https://github.com/nuriachandra/Deepfake-Eval-2024获取。
更新时间: 2025-03-05 20:24:16
领域: cs.CV,cs.AI,cs.CY
CipherPrune: Efficient and Scalable Private Transformer Inference
Private Transformer inference using cryptographic protocols offers promising solutions for privacy-preserving machine learning; however, it still faces significant runtime overhead (efficiency issues) and challenges in handling long-token inputs (scalability issues). We observe that the Transformer's operational complexity scales quadratically with the number of input tokens, making it essential to reduce the input token length. Notably, each token varies in importance, and many inputs contain redundant tokens. Additionally, prior private inference methods that rely on high-degree polynomial approximations for non-linear activations are computationally expensive. Therefore, reducing the polynomial degree for less important tokens can significantly accelerate private inference. Building on these observations, we propose \textit{CipherPrune}, an efficient and scalable private inference framework that includes a secure encrypted token pruning protocol, a polynomial reduction protocol, and corresponding Transformer network optimizations. At the protocol level, encrypted token pruning adaptively removes unimportant tokens from encrypted inputs in a progressive, layer-wise manner. Additionally, encrypted polynomial reduction assigns lower-degree polynomials to less important tokens after pruning, enhancing efficiency without decryption. At the network level, we introduce protocol-aware network optimization via a gradient-based search to maximize pruning thresholds and polynomial reduction conditions while maintaining the desired accuracy. Our experiments demonstrate that CipherPrune reduces the execution overhead of private Transformer inference by approximately $6.1\times$ for 128-token inputs and $10.6\times$ for 512-token inputs, compared to previous methods, with only a marginal drop in accuracy. The code is publicly available at https://github.com/UCF-Lou-Lab-PET/cipher-prune-inference.
Updated: 2025-03-05 20:18:29
标题: CipherPrune:高效可扩展的私密Transformer推理
摘要: 使用密码协议进行私有Transformer推断为隐私保护机器学习提供了有前途的解决方案;然而,仍然面临着显着的运行时开销(效率问题)和处理长令牌输入(可扩展性问题)的挑战。我们观察到,Transformer的操作复杂度随输入令牌数量的平方增长,因此减少输入令牌长度至关重要。值得注意的是,每个令牌的重要性不同,许多输入包含冗余令牌。此外,依赖高次多项式逼近非线性激活的先前私有推断方法在计算上昂贵。因此,为不太重要的令牌减少多项式次数可以显著加速私有推断。基于这些观察,我们提出了一种名为CipherPrune的高效可扩展的私有推断框架,包括安全的加密令牌修剪协议、多项式减少协议和相应的Transformer网络优化。在协议级别,加密令牌修剪以逐渐的逐层方式自适应地从加密输入中移除不重要的令牌。此外,在修剪后,加密多项式减少为不太重要的令牌分配较低次数的多项式,提高效率而无需解密。在网络级别,我们通过基于梯度的搜索引入了协议感知网络优化,以最大化修剪阈值和多项式减少条件,同时保持所需的准确性。我们的实验表明,与先前的方法相比,CipherPrune将128令牌输入的私有Transformer推断执行开销降低了约6.1倍,将512令牌输入的私有Transformer推断执行开销降低了约10.6倍,仅有轻微的准确性下降。代码可在https://github.com/UCF-Lou-Lab-PET/cipher-prune-inference上公开获取。
更新时间: 2025-03-05 20:18:29
领域: cs.LG,cs.CR
CRAFT: Characterizing and Root-Causing Fault Injection Threats at Pre-Silicon
Fault injection attacks represent a class of threats that can compromise embedded systems across multiple layers of abstraction, such as system software, instruction set architecture (ISA), microarchitecture, and physical implementation. Early detection of these vulnerabilities and understanding their root causes, along with their propagation from the physical layer to the system software, is critical to secure the cyberinfrastructure. This work presents a comprehensive methodology for conducting controlled fault injection attacks at the pre-silicon level and an analysis of the underlying system for root-causing behavior. As the driving application, we use the clock glitch attacks in AI/ML applications for critical misclassification. Our study aims to characterize and diagnose the impact of faults within the RISC-V instruction set and pipeline stages, while tracing fault propagation from the circuit level to the AI/ML application software. This analysis resulted in discovering two new vulnerabilities through controlled clock glitch parameters. First, we reveal a novel method for causing instruction skips, thereby preventing the loading of critical values from memory. This can cause disruption and affect program continuity and correctness. Second, we demonstrate an attack that converts legal instructions into illegal ones, thereby diverting control flow in a manner exploitable by attackers. Our work underscores the complexity of fault injection attack exploits and emphasizes the importance of preemptive security analysis.
Updated: 2025-03-05 20:17:46
标题: CRAFT:在硅前阶段对故障注入威胁进行特征化和根因分析
摘要: 故障注入攻击代表了一类威胁,可以威胁嵌入式系统在多个抽象层面上,如系统软件、指令集架构(ISA)、微体系结构和物理实现。早期检测这些漏洞并理解它们的根本原因,以及从物理层传播到系统软件的过程对于保护网络基础设施至关重要。这项工作提出了在预硅级别进行受控故障注入攻击的全面方法,并分析了根本原因行为的基础系统。我们使用时钟故障攻击在AI/ML应用中进行关键误分类作为驱动应用。我们的研究旨在描述和诊断RISC-V指令集和流水线阶段内的故障影响,同时追踪从电路级别到AI/ML应用软件的故障传播。这项分析导致通过受控时钟故障参数发现了两个新漏洞。首先,我们揭示了一种导致指令跳过的新方法,从而阻止从内存加载关键值。这可能会引起中断并影响程序的连续性和正确性。其次,我们展示了一种将合法指令转变为非法指令的攻击,从而以攻击者可利用的方式改变控制流。我们的工作强调了故障注入攻击利用的复杂性,并强调了预防性安全分析的重要性。
更新时间: 2025-03-05 20:17:46
领域: cs.CR,cs.AR
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach
As specialized large language models (LLMs) become increasingly prevalent, model merging methods are being used to combine them to create a single multi-task model without requiring any additional data or training. However, these approaches fall short when the objective of merging is to increase the downstream model's performance on a particular task-specific benchmark. In this work, we propose LEWIS (Layer Wise Sparsity), a guided model-merging framework that uses activation-based layer importance to dynamically adjust layer-wise task-vector sparsity required for the merge process. LEWIS uses a calibration dataset to prioritize critical layers during the task-vector pruning process required for model merging. This approach guides existing merging methods by preserving essential layer-wise task-specific knowledge while ensuring the merged model performs the best at benchmarks resembling the calibration dataset. Our experiments demonstrate the effectiveness of LEWIS with performance improvements of code instruction-following and math-solving models created through model merging up to 4 percent and 11.3 percent, respectively, outperforming unguided data-less model merging approaches that use uniform-sparsity.
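A rough sketch of the layer-wise idea: derive per-layer keep ratios from activation-based importance scores measured on a calibration set, then magnitude-prune each layer's task vector accordingly. The linear importance-to-ratio mapping and the toy tensors are assumptions of this sketch.

```python
import torch

def lewis_layer_sparsity(task_vector, layer_importance, min_keep=0.05, max_keep=0.5):
    """Prune a task vector (finetuned minus base weights) with layer-wise keep ratios."""
    imps = torch.tensor([layer_importance[n] for n in task_vector])
    lo, hi = imps.min(), imps.max()
    pruned = {}
    for name, delta in task_vector.items():
        w = (layer_importance[name] - lo) / (hi - lo + 1e-8)      # normalize importance to [0, 1]
        keep = min_keep + float(w) * (max_keep - min_keep)        # important layers keep more
        k = max(1, int(keep * delta.numel()))
        thresh = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        pruned[name] = torch.where(delta.abs() >= thresh, delta, torch.zeros_like(delta))
    return pruned

task_vector = {"layer0": torch.randn(64, 64), "layer1": torch.randn(64, 64)}
importance = {"layer0": 0.2, "layer1": 0.9}    # toy calibration-derived activation scores
sparse_tv = lewis_layer_sparsity(task_vector, importance)
```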
Updated: 2025-03-05 20:09:59
标题: LEWIS(分层稀疏)-- 一种无需训练的引导模型合并方法
摘要: 随着专门的大型语言模型(LLMs)日益普遍,模型合并方法被用来将它们结合起来创建一个单一的多任务模型,而无需额外的数据或训练。然而,当合并的目标是提高下游模型在特定任务基准上的性能时,这些方法存在不足。在这项工作中,我们提出了LEWIS(Layer Wise Sparsity),一个引导模型合并框架,利用基于激活的层重要性动态调整合并过程所需的层级任务向量稀疏度。LEWIS使用一个校准数据集,在模型合并过程中优先考虑关键层,这些关键层在任务向量修剪过程中是必需的。这种方法通过保留必要的层级任务特定知识来指导现有的合并方法,同时确保合并后的模型在类似校准数据集的基准测试中表现最佳。我们的实验表明,LEWIS的有效性,通过模型合并实现的代码指令遵循和数学解决模型的性能提高分别高达4%和11.3%,优于使用均匀稀疏度的无数据引导模型合并方法。
更新时间: 2025-03-05 20:09:59
领域: cs.LG,cs.CL,stat.ML
Honest to a Fault: Root-Causing Fault Attacks with Pre-Silicon RISC Pipeline Characterization
Fault injection attacks represent a class of threats that can compromise embedded systems across multiple layers of abstraction, such as system software, instruction set architecture (ISA), microarchitecture, and physical implementation. Early detection of these vulnerabilities and understanding their root causes along with their propagation from the physical layer to the system software is critical to secure the cyberinfrastructure. This work presents a comprehensive methodology for conducting controlled fault injection attacks at the pre-silicon level and an analysis of the underlying system for root-causing behavior. As the driving application, we use the clock glitch attacks in AI/ML applications for critical misclassification. Our study aims to characterize and diagnose the impact of faults within the RISC-V instruction set and pipeline stages, while tracing fault propagation from the circuit level to the AI/ML application software. This analysis resulted in discovering a novel vulnerability through controlled clock glitch parameters, specifically targeting the RISC-V decode stage.
Updated: 2025-03-05 20:08:12
标题: 忠诚至上:利用预硅RISC管线特性查找故障攻击
摘要: 故障注入攻击代表了一类威胁,可以危及嵌入式系统的多个抽象层面,如系统软件、指令集体系结构(ISA)、微体系结构和物理实现。早期检测这些漏洞并理解它们的根本原因以及它们从物理层到系统软件的传播是保护网络基础设施的关键。本文提出了一种全面的方法,用于在前硅级别进行受控故障注入攻击,并分析根本原因行为的底层系统。作为驱动应用程序,我们使用时钟故障攻击在AI/ML应用程序中进行关键错误分类。我们的研究旨在表征和诊断RISC-V指令集和流水线阶段内的故障影响,同时跟踪从电路级到AI/ML应用程序软件的故障传播。通过受控时钟故障参数,特别针对RISC-V解码阶段,进行了发现新型漏洞的分析。
更新时间: 2025-03-05 20:08:12
领域: cs.CR,cs.AR
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow
This paper introduces Helix, a distributed system for high-throughput, low-latency large language model (LLM) serving in heterogeneous GPU clusters. The key idea behind Helix is to formulate inference computation of LLMs over heterogeneous GPUs and network connections as a max-flow problem on directed, weighted graphs, whose nodes represent GPU instances and edges capture both GPU and network heterogeneity through their capacities. Helix then uses a mixed integer linear programming (MILP) algorithm to discover highly optimized strategies to serve LLMs on heterogeneous GPUs. This approach allows Helix to jointly optimize model placement and request scheduling, two highly entangled tasks in heterogeneous LLM serving. Our evaluation on several heterogeneous clusters ranging from 24 to 42 GPU nodes shows that Helix improves serving throughput by up to 3.3x and reduces prompting and decoding latency by up to 66% and 24%, respectively, compared to existing approaches. Helix is available at https://github.com/Thesys-lab/Helix-ASPLOS25.
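The max-flow abstraction (though not the MILP that Helix layers on top of it) is easy to illustrate with networkx; the node names and capacities below are toy values.

```python
import networkx as nx

G = nx.DiGraph()
# compute capacities of heterogeneous GPUs and bandwidths of network links (toy numbers)
G.add_edge("src", "A100_0", capacity=100)
G.add_edge("src", "T4_0", capacity=30)
G.add_edge("A100_0", "T4_1", capacity=20)   # cross-node network link
G.add_edge("A100_0", "sink", capacity=80)
G.add_edge("T4_0", "sink", capacity=25)
G.add_edge("T4_1", "sink", capacity=15)

flow_value, flow_dict = nx.maximum_flow(G, "src", "sink")
print("max serving throughput (arbitrary units):", flow_value)
print(flow_dict["A100_0"])                  # per-edge flow leaving one GPU node
```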
Updated: 2025-03-05 20:00:57
标题: 螺旋:通过最大流在异构GPU和网络上为大型语言模型提供服务
摘要: 本文介绍了Helix,这是一个用于在异构GPU集群中进行高吞吐量、低延迟的大型语言模型(LLM)服务的分布式系统。Helix的关键思想是将LLM的推理计算在异构GPU和网络连接上形式化为一个在有向加权图上的最大流问题,其中节点表示GPU实例,边通过它们的容量捕获GPU和网络的异构性。然后,Helix使用混合整数线性规划(MILP)算法来发现在异构GPU上为LLMs提供高度优化的策略。这种方法允许Helix共同优化模型放置和请求调度,这是异构LLM服务中两个高度交织的任务。我们在从24到42个GPU节点的几个异构集群上进行评估,结果显示与现有方法相比,Helix将服务吞吐量提高了最多3.3倍,并将提示和解码延迟分别降低了66%和24%。Helix可在https://github.com/Thesys-lab/Helix-ASPLOS25上获得。
更新时间: 2025-03-05 20:00:57
领域: cs.DC,cs.CL,cs.LG
Learning to Negotiate via Voluntary Commitment
The partial alignment and conflict of autonomous agents lead to mixed-motive scenarios in many real-world applications. However, agents may fail to cooperate in practice even when cooperation yields a better outcome. One well known reason for this failure comes from non-credible commitments. To facilitate commitments among agents for better cooperation, we define Markov Commitment Games (MCGs), a variant of commitment games, where agents can voluntarily commit to their proposed future plans. Based on MCGs, we propose a learnable commitment protocol via policy gradients. We further propose incentive-compatible learning to accelerate convergence to equilibria with better social welfare. Experimental results in challenging mixed-motive tasks demonstrate faster empirical convergence and higher returns for our method compared with its counterparts. Our code is available at https://github.com/shuhui-zhu/DCL.
Updated: 2025-03-05 19:55:10
Categories: cs.AI,cs.GT,cs.LG,cs.MA
Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector
We revisit the likelihood ratio between a pretrained large language model (LLM) and its finetuned variant as a criterion for out-of-distribution (OOD) detection. The intuition behind such a criterion is that the pretrained LLM has prior knowledge about OOD data due to its large amount of training data, and once finetuned with the in-distribution data, the LLM has sufficient knowledge to distinguish between the two. Leveraging the power of LLMs, we show that the likelihood ratio can serve as an effective OOD detection criterion. Moreover, we apply the proposed LLM-based likelihood ratio to detect OOD questions in question-answering (QA) systems, which can be used to improve the performance of specialized LLMs for general questions. Given that likelihood can be easily obtained from the loss functions within contemporary neural network frameworks, it is straightforward to implement this approach in practice. Since both pretrained LLMs and their various finetuned models are widely available from online platforms such as Hugging Face, our proposed criterion can be effortlessly incorporated for OOD detection without the need for further training. We conduct comprehensive evaluations across multiple settings, including far OOD, near OOD, spam detection, and QA scenarios, to demonstrate the effectiveness of the method. Code can be found at https://github.com/andiac/LLMOODratio
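Since the criterion only needs the language-modeling loss, a minimal sketch with Hugging Face transformers looks as follows; the base checkpoint is just an example, and "your-org/llama-2-7b-finetuned" is a placeholder for whatever finetuned variant is at hand:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
ft = AutoModelForCausalLM.from_pretrained("your-org/llama-2-7b-finetuned")  # placeholder

@torch.no_grad()
def avg_log_likelihood(model, text):
    ids = tok(text, return_tensors="pt").input_ids
    return -model(input_ids=ids, labels=ids).loss.item()  # loss = mean per-token NLL

def ood_score(text):
    # Higher log p_finetuned(x) - log p_pretrained(x) suggests in-distribution;
    # low or negative scores flag OOD inputs.
    return avg_log_likelihood(ft, text) - avg_log_likelihood(base, text)

print(ood_score("Does this question resemble the finetuning data?"))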
Updated: 2025-03-05 19:51:23
Categories: cs.CL,cs.AI,cs.LG,stat.ML
You Are the Best Reviewer of Your Own Papers: The Isotonic Mechanism
Machine learning (ML) and artificial intelligence (AI) conferences including NeurIPS and ICML have experienced a significant decline in peer review quality in recent years. To address this growing challenge, we introduce the Isotonic Mechanism, a computationally efficient approach to enhancing the accuracy of noisy review scores by incorporating authors' private assessments of their submissions. Under this mechanism, authors with multiple submissions are required to rank their papers in descending order of perceived quality. Subsequently, the raw review scores are calibrated based on this ranking to produce adjusted scores. We prove that authors are incentivized to truthfully report their rankings because doing so maximizes their expected utility, modeled as an additive convex function over the adjusted scores. Moreover, the adjusted scores are shown to be more accurate than the raw scores, with improvements being particularly significant when the noise level is high and the author has many submissions -- a scenario increasingly prevalent at large-scale ML/AI conferences. We further investigate whether submission quality information beyond a simple ranking can be truthfully elicited from authors. We establish that a necessary condition for truthful elicitation is that the mechanism be based on pairwise comparisons of the author's submissions. This result underscores the optimality of the Isotonic Mechanism, as it elicits the most fine-grained truthful information among all mechanisms we consider. We then present several extensions, including a demonstration that the mechanism maintains truthfulness even when authors have only partial rather than complete information about their submission quality. Finally, we discuss future research directions, focusing on the practical implementation of the mechanism and the further development of a theoretical framework inspired by our mechanism.
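In a simplified reading of the mechanism (with made-up numbers), the calibration step is an L2 projection of the raw scores onto the set of score vectors that are non-increasing in the author's ranking, which off-the-shelf isotonic regression computes:

import numpy as np
from sklearn.isotonic import IsotonicRegression

author_rank = np.array([1, 2, 3, 4, 5])           # author's claimed order (1 = best)
raw_scores = np.array([5.5, 6.8, 4.9, 5.1, 3.0])  # noisy review averages

# L2 projection onto {scores non-increasing in rank}: the adjusted scores.
adjusted = IsotonicRegression(increasing=False).fit_transform(author_rank, raw_scores)
print(adjusted)  # [6.15, 6.15, 5.0, 5.0, 3.0]; ties come from pooling violators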
Updated: 2025-03-05 19:46:11
Categories: cs.LG,cs.GT,econ.TH,math.ST,stat.ME,stat.TH
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
Improvements in language model capabilities are often attributed to increasing model size or training data, but in some cases smaller models trained on curated data or with different architectural decisions can outperform larger ones trained on more tokens. What accounts for this? To quantify the impact of these design choices, we meta-analyze 92 open-source pretrained models across a wide array of scales, including state-of-the-art open-weights models as well as less performant models and those with less conventional design decisions. We find that by incorporating features besides model size and number of training tokens, we can achieve a relative 3-28% increase in ability to predict downstream performance compared with using scale alone. Analysis of model design decisions reveals insights into data composition, such as the trade-off between language and code tasks at 15-25% code, as well as the better performance of some architectural decisions such as choosing rotary over learned embeddings. Broadly, our framework lays a foundation for more systematic investigation of how model development choices shape final capabilities.
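The flavor of such a meta-analysis can be sketched with a toy regression on entirely synthetic data (the features and effect sizes below are invented for illustration): predicting a benchmark score from scale alone versus scale plus design features such as code fraction and embedding type.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 92
log_params = rng.uniform(8, 11, n)    # log10 parameter count
log_tokens = rng.uniform(10, 13, n)   # log10 training tokens
code_frac = rng.uniform(0.0, 0.5, n)  # fraction of code in the pretraining mix
rotary = rng.integers(0, 2, n)        # 1 = rotary, 0 = learned embeddings

# Synthetic benchmark score with a mild code-fraction sweet spot near 20%.
score = (0.5 * log_params + 0.3 * log_tokens
         - 4.0 * (code_frac - 0.2) ** 2 + 0.2 * rotary + rng.normal(0, 0.2, n))

scale_only = np.c_[log_params, log_tokens]
with_design = np.c_[scale_only, code_frac, code_frac ** 2, rotary]
for name, X in [("scale only", scale_only), ("scale + design", with_design)]:
    print(name, LinearRegression().fit(X, score).score(X, score))  # in-sample R^2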
Updated: 2025-03-05 19:46:04
Categories: cs.CL,cs.AI
A dataset-free approach for self-supervised learning of 3D reflectional symmetries
In this paper, we explore a self-supervised model that learns to detect the symmetry of a single object without requiring a dataset, relying solely on the input object itself. We hypothesize that the symmetry of an object can be determined by its intrinsic features, eliminating the need for large datasets during training. Additionally, we design a self-supervised learning strategy that removes the necessity of ground truth labels. These two key elements make our approach both effective and efficient, addressing the prohibitive costs associated with constructing large, labeled datasets for this task. The novelty of our method lies in computing features for each point on the object based on the idea that symmetric points should exhibit similar visual appearances. To achieve this, we leverage features extracted from a foundational image model to compute a visual descriptor for the points. This approach equips the point cloud with visual features that facilitate the optimization of our self-supervised model. Experimental results demonstrate that our method surpasses the state-of-the-art models trained on large datasets. Furthermore, our model is more efficient, effective, and operates with minimal computational and data resources.
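A stripped-down sketch of the core scoring idea, with random vectors standing in for descriptors lifted from an image foundation model: reflect the cloud across a candidate plane and check that each reflected point has a nearby original point with a similar descriptor.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
half = rng.normal(size=(500, 3))
half[:, 0] = np.abs(half[:, 0])
points = np.vstack([half, half * np.array([-1.0, 1.0, 1.0])])  # symmetric in x
d = rng.normal(size=(500, 16))
desc = np.vstack([d, d])  # mirrored points share the same (stub) descriptor

def symmetry_score(normal, points, desc):
    normal = normal / np.linalg.norm(normal)
    c = points.mean(0)
    reflected = points - 2 * ((points - c) @ normal)[:, None] * normal
    dist, idx = cKDTree(points).query(reflected)        # nearest original point
    feat = np.linalg.norm(desc - desc[idx], axis=1)     # descriptor mismatch
    return (dist + 0.1 * feat).mean()                   # lower = more symmetric

for n in (np.array([1.0, 0, 0]), np.array([0, 1.0, 0])):
    print(n, symmetry_score(n, points, desc))           # the x-plane should win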
Updated: 2025-03-05 19:36:48
Categories: cs.CV,cs.AI
LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning
Developing interactive systems that utilize natural language instructions to solve complex robotic control tasks has long been a goal of the robotics community. While Large Language Models (LLMs) excel at logical reasoning, in-context learning, and code generation, translating high-level instructions into low-level robotic actions still remains challenging. Furthermore, solving such tasks often requires acquiring policies to execute diverse subtasks and integrating them to achieve the final objective. Hierarchical Reinforcement Learning (HRL) offers a promising solution for solving such tasks by enabling temporal abstraction and improved exploration. However, HRL suffers from non-stationarity caused by the changing lower-level behaviour, which hinders effective policy learning. We propose LGR2, a novel HRL framework that mitigates non-stationarity in HRL by using language-guided higher-level rewards that remain unaffected by the changing lower-level policy behaviour. To analyze the efficacy of our approach, we perform empirical analysis to demonstrate that LGR2 effectively mitigates non-stationarity in HRL and attains success rates exceeding 70% in challenging, sparsely-rewarded robotic navigation and manipulation environments, where other baselines typically fail to show significant progress. Finally, we perform real-world robotic experiments on complex tasks and demonstrate that LGR2 consistently outperforms the baselines.
Updated: 2025-03-05 19:34:08
Categories: cs.LG,cs.CL,cs.RO
Rational Tuning of LLM Cascades via Probabilistic Modeling
Understanding the reliability of large language models (LLMs) has recently garnered significant attention. Given LLMs' propensity to hallucinate, as well as their high sensitivity to prompt design, it is already challenging to predict the performance of an individual LLM. However, the problem becomes more complex for compound LLM systems such as cascades, where in addition to each model's standalone performance, we must understand how the error rates of different models interact. In this paper, we present a probabilistic model for the joint performance distribution of a sequence of LLMs, which enables a framework for rationally tuning the confidence thresholds of a LLM cascade using continuous optimization. Compared to selecting confidence thresholds using grid search, our parametric Markov-copula model significantly improves runtime scaling with respect to the length of the cascade and the desired resolution of the cost-error curve, turning them from intractable into low-order polynomial. In addition, the optimal thresholds computed using our continuous optimization-based algorithm increasingly outperform those found via grid search as cascade length grows, improving the area under the cost-error curve by 1.9% on average for cascades consisting of at least three models. Overall, our Markov-copula model provides a rational basis for tuning LLM cascade performance and points to the potential of probabilistic methods in analyzing LLM systems.
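For intuition, here is a toy two-model cascade with simulated confidences (not the paper's Markov-copula model): samples whose small-model confidence falls below a threshold defer to the large model, and the threshold is tuned by continuous optimization of a cost-penalized error rather than by grid search.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n = 10_000
conf = rng.beta(5, 2, n)                # small model's reported confidence
small_correct = rng.random(n) < conf    # roughly calibrated small model
large_correct = rng.random(n) < 0.95    # large model: 95% accurate
COST_SMALL, COST_LARGE = 1.0, 10.0

def cost_and_error(t):
    defer = conf < t
    err = np.where(defer, ~large_correct, ~small_correct).mean()
    cost = COST_SMALL + defer.mean() * COST_LARGE  # small model always runs
    return cost, err

def objective(t, lam=0.01):
    cost, err = cost_and_error(t)
    return err + lam * cost

res = minimize_scalar(objective, bounds=(0, 1), method="bounded")
print("threshold:", res.x, "-> (cost, error):", cost_and_error(res.x))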
Updated: 2025-03-05 19:23:10
Categories: cs.LG,cs.AI,stat.ML
Implicit Diffusion: Efficient Optimization through Stochastic Sampling
We present a new algorithm to optimize distributions defined implicitly by parameterized stochastic diffusions. Doing so allows us to modify the outcome distribution of sampling processes by optimizing over their parameters. We introduce a general framework for first-order optimization of these processes, that performs jointly, in a single loop, optimization and sampling steps. This approach is inspired by recent advances in bilevel optimization and automatic implicit differentiation, leveraging the point of view of sampling as optimization over the space of probability distributions. We provide theoretical guarantees on the performance of our method, as well as experimental results demonstrating its effectiveness. We apply it to training energy-based models and finetuning denoising diffusions.
Updated: 2025-03-05 19:22:24
Categories: cs.LG
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
In recent years, masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains. Compared to autoregressive models (ARMs), MDMs trade off complexity at training time with flexibility at inference time. At training time, they must learn to solve an exponentially large number of infilling problems, but at inference time, they can decode tokens in essentially arbitrary order. In this work, we closely examine these two competing effects. On the training front, we theoretically and empirically demonstrate that MDMs indeed train on computationally intractable subproblems compared to their autoregressive counterparts. On the inference front, we show that a suitable strategy for adaptively choosing the token decoding order significantly enhances the capabilities of MDMs, allowing them to sidestep hard subproblems. On logic puzzles like Sudoku, we show that adaptive inference can boost solving accuracy in pretrained MDMs from $<7$% to $\approx 90$%, even outperforming ARMs with $7\times$ as many parameters and that were explicitly trained via teacher forcing to learn the right order of decoding.
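One simple form of the adaptive strategy is confidence-ordered decoding: at each step, fill in the masked position whose predicted distribution is most peaked. The sketch below illustrates the control flow; predict is a stub standing in for a trained MDM.

import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, MASK = 10, 8, -1

def predict(seq):
    """Stub MDM: per-position token probabilities for the partly filled
    sequence. A real model would condition on the unmasked tokens."""
    logits = rng.normal(size=(LENGTH, VOCAB))
    logits[::3, 0] += 4.0  # some positions are "easy" (peaked) for the stub
    p = np.exp(logits)
    return p / p.sum(-1, keepdims=True)

seq = np.full(LENGTH, MASK)
while (seq == MASK).any():
    probs = predict(seq)
    confidence = probs.max(-1)
    confidence[seq != MASK] = -np.inf  # only masked positions compete
    pos = int(confidence.argmax())     # decode the easiest subproblem first
    seq[pos] = int(probs[pos].argmax())
print(seq)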
Updated: 2025-03-05 19:19:48
Categories: cs.LG
A Reverse Mamba Attention Network for Pathological Liver Segmentation
We present RMA-Mamba, a novel architecture that advances the capabilities of vision state space models through a specialized reverse mamba attention module (RMA). The key innovation lies in RMA-Mamba's ability to capture long-range dependencies while maintaining precise local feature representation through its hierarchical processing pipeline. By integrating Vision Mamba (VMamba)'s efficient sequence modeling with RMA's targeted feature refinement, our architecture achieves superior feature learning across multiple scales. This dual-mechanism approach enables robust handling of complex morphological patterns while maintaining computational efficiency. We demonstrate RMA-Mamba's effectiveness in the challenging domain of pathological liver segmentation (from both CT and MRI), where traditional segmentation approaches often fail due to tissue variations. When evaluated on a newly introduced cirrhotic liver dataset (CirrMRI600+) of T2-weighted MRI scans, RMA-Mamba achieves the state-of-the-art performance with a Dice coefficient of 92.08%, mean IoU of 87.36%, and recall of 92.96%. The architecture's generalizability is further validated on the cancerous liver segmentation from CT scans (LiTS: Liver Tumor Segmentation dataset), yielding a Dice score of 92.9% and mIoU of 88.99%. Our code is available for public: https://github.com/JunZengz/RMAMamba.
Updated: 2025-03-05 19:18:00
Categories: eess.IV,cs.AI,cs.CV
Task-Agnostic Attacks Against Vision Foundation Models
The study of security in machine learning mainly focuses on downstream task-specific attacks, where the adversarial example is obtained by optimizing a loss function specific to the downstream task. At the same time, it has become standard practice for machine learning practitioners to adopt publicly available pre-trained vision foundation models, effectively sharing a common backbone architecture across a multitude of applications such as classification, segmentation, depth estimation, retrieval, question-answering and more. The study of attacks on such foundation models and their impact on multiple downstream tasks remains vastly unexplored. This work proposes a general framework that forges task-agnostic adversarial examples by maximally disrupting the feature representation obtained with foundation models. We extensively evaluate the security of the feature representations obtained by popular vision foundation models by measuring the impact of this attack on multiple downstream tasks and its transferability between models.
Updated: 2025-03-05 19:15:14
Categories: cs.CV,cs.AI,cs.CR,cs.LG
Market-based Architectures in RL and Beyond
Market-based agents refer to reinforcement learning agents which determine their actions based on an internal market of sub-agents. We introduce a new type of market-based algorithm where the state itself is factored into several axes called "goods", which allows for greater specialization and parallelism than existing market-based RL algorithms. Furthermore, we argue that market-based algorithms have the potential to address many current challenges in AI, such as search, dynamic scaling and complete feedback, and demonstrate that they may be seen to generalize neural networks; finally, we list some novel ways that market algorithms may be applied in conjunction with Large Language Models for immediate practical applicability.
Updated: 2025-03-05 19:09:29
Categories: cs.AI,econ.TH
Decoupling the components of geometric understanding in Vision Language Models
Understanding geometry relies heavily on vision. In this work, we evaluate whether state-of-the-art vision language models (VLMs) can understand simple geometric concepts. We use a paradigm from cognitive science that isolates visual understanding of simple geometry from the many other capabilities it is often conflated with such as reasoning and world knowledge. We compare model performance with human adults from the USA, as well as with prior research on human adults without formal education from an Amazonian indigenous group. We find that VLMs consistently underperform both groups of human adults, although they succeed with some concepts more than others. We also find that VLM geometric understanding is more brittle than human understanding, and is not robust when tasks require mental rotation. This work highlights interesting differences in the origin of geometric understanding in humans and machines -- e.g. from printed materials used in formal education vs. interactions with the physical world or a combination of the two -- and a small step toward understanding these differences.
Updated: 2025-03-05 19:09:19
Categories: cs.CV,cs.LG
Conditional Hallucinations for Image Compression
In lossy image compression, models face the challenge of either hallucinating details or generating out-of-distribution samples due to the information bottleneck. This implies that at times, introducing hallucinations is necessary to generate in-distribution samples. The optimal level of hallucination varies depending on image content, as humans are sensitive to small changes that alter the semantic meaning. We propose a novel compression method that dynamically balances the degree of hallucination based on content. We collect data and train a model to predict user preferences on hallucinations. By using this prediction to adjust the perceptual weight in the reconstruction loss, we develop a Conditionally Hallucinating compression model (ConHa) that outperforms state-of-the-art image compression methods. Code and images are available at https://polybox.ethz.ch/index.php/s/owS1k5JYs4KD4TA.
Updated: 2025-03-05 19:03:26
Categories: eess.IV,cs.CV,cs.LG
Can We Talk Models Into Seeing the World Differently?
Unlike traditional vision-only models, vision language models (VLMs) offer an intuitive way to access visual content through language prompting by combining a large language model (LLM) with a vision encoder. However, both the LLM and the vision encoder come with their own set of biases, cue preferences, and shortcuts, which have been rigorously studied in uni-modal models. A timely question is how such (potentially misaligned) biases and cue preferences behave under multi-modal fusion in VLMs. As a first step towards a better understanding, we investigate a particularly well-studied vision-only bias - the texture vs. shape bias and the dominance of local over global information. As expected, we find that VLMs inherit this bias to some extent from their vision encoders. Surprisingly, the multi-modality alone proves to have important effects on the model behavior, i.e., the joint training and the language querying change the way visual cues are processed. While this direct impact of language-informed training on a model's visual perception is intriguing, it raises further questions on our ability to actively steer a model's output so that its prediction is based on particular visual cues of the user's choice. Interestingly, VLMs have an inherent tendency to recognize objects based on shape information, which is different from what a plain vision encoder would do. Further active steering towards shape-based classifications through language prompts is however limited. In contrast, active VLM steering towards texture-based decisions through simple natural language prompts is often more successful. URL: https://github.com/paulgavrikov/vlm_shapebias
Updated: 2025-03-05 19:01:00
Categories: cs.CV,cs.AI,cs.LG,q-bio.NC
The optical and infrared are connected
Galaxies are often modelled as composites of separable components with distinct spectral signatures, implying that different wavelength ranges are only weakly correlated. They are not. We present a data-driven model which exploits subtle correlations between physical processes to accurately predict infrared (IR) WISE photometry from a neural summary of optical SDSS spectra. The model achieves accuracies of $\chi^2_N \approx 1$ for all photometric bands in WISE, as well as good colors. We are also able to tightly constrain typically IR-derived properties, e.g. the bolometric luminosities of AGN and dust parameters such as $\mathrm{q_{PAH}}$. We find that current SED-fitting methods are incapable of making comparable predictions, and that model misspecification often leads to correlated biases in star-formation rates and AGN luminosities. To help improve SED models, we determine what features of the optical spectrum are responsible for our improved predictions, and identify several lines (CaII, SrII, FeI, [OII] and H$\alpha$), which point to the complex chronology of star formation and chemical enrichment being incorrectly modelled.
Updated: 2025-03-05 19:00:01
Categories: astro-ph.GA,cs.LG
Non-Gaussianities in Collider Metric Binning
Metrics for rigorously defining a distance between two events have been used to study the properties of the dataspace manifold of particle collider physics. The probability distribution of pairwise distances on this dataspace is unique with probability 1, and so this suggests a method to search for and identify new physics by the deviation of measurement from a null hypothesis prediction. To quantify the deviation statistically, we directly calculate the probability distribution of the number of event pairs that land in the bin a fixed distance apart. This distribution is not generically Gaussian and the ratio of the standard deviation to the mean entries in a bin scales inversely with the square-root of the number of events in the data ensemble. If the dataspace manifold exhibits some enhanced symmetry, the number of entries is Gaussian, and further fluctuations about the mean scale away like the inverse of the number of events. We define a robust measure of the non-Gaussianity of the bin-by-bin statistics of the distance distribution, and demonstrate in simulated data of jets from quantum chromodynamics sensitivity to the parton-to-hadron transition and that the manifold of events enjoys enhanced symmetries as their energy increases.
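The bin-by-bin statistics can be probed numerically; the sketch below substitutes Euclidean distance between random 2-D "events" for a true collider metric and uses excess kurtosis as a simple stand-in measure of per-bin non-Gaussianity.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n_events, n_trials = 200, 500
bins = np.linspace(0, 4, 21)

counts = np.array([
    np.histogram(pdist(rng.normal(size=(n_events, 2))), bins=bins)[0]
    for _ in range(n_trials)
])                                            # shape: (n_trials, n_bins)

rel_fluct = counts.std(0) / counts.mean(0)    # should shrink ~ 1/sqrt(n_events)
non_gauss = kurtosis(counts, axis=0)          # excess kurtosis: 0 for Gaussian bins
print(np.round(rel_fluct, 3))
print(np.round(non_gauss, 2))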
Updated: 2025-03-05 19:00:00
Categories: hep-ph,cs.LG,hep-ex
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
As large language models (LLMs) become more capable and agentic, the requirement for trust in their outputs grows significantly, yet at the same time concerns have been mounting that models may learn to lie in pursuit of their goals. To address these concerns, a body of work has emerged around the notion of "honesty" in LLMs, along with interventions aimed at mitigating deceptive behaviors. However, evaluations of honesty are currently highly limited, with no benchmark combining large scale and applicability to all models. Moreover, many benchmarks claiming to measure honesty in fact simply measure accuracy--the correctness of a model's beliefs--in disguise. In this work, we introduce a large-scale human-collected dataset for measuring honesty directly, allowing us to disentangle accuracy from honesty for the first time. Across a diverse set of LLMs, we find that while larger models obtain higher accuracy on our benchmark, they do not become more honest. Surprisingly, while most frontier LLMs obtain high scores on truthfulness benchmarks, we find a substantial propensity in frontier LLMs to lie when pressured to do so, resulting in low honesty scores on our benchmark. We find that simple methods, such as representation engineering interventions, can improve honesty. These results underscore the growing need for robust evaluations and effective interventions to ensure LLMs remain trustworthy.
Updated: 2025-03-05 18:59:23
Categories: cs.LG,cs.AI,cs.CL,cs.CY
Personalize Your LLM: Fake it then Align it
Personalizing large language models (LLMs) is essential for delivering tailored interactions that improve user experience. Many existing personalization methods require fine-tuning LLMs for each user, rendering them prohibitively expensive for widespread adoption. Although retrieval-based approaches offer a more compute-efficient alternative, they still depend on large, high-quality datasets that are not consistently available for all users. To address this challenge, we propose CHAMELEON, a scalable and efficient personalization approach that uses (1) self-generated personal preference data and (2) representation editing to enable quick and cost-effective personalization. Our experiments on various tasks, including those from the LaMP personalization benchmark, show that CHAMELEON efficiently adapts models to personal preferences, improving instruction-tuned models and outperforms two personalization baselines by an average of 40% across two model architectures.
Updated: 2025-03-05 18:59:19
Categories: cs.LG
PacketCLIP: Multi-Modal Embedding of Network Traffic and Language for Cybersecurity Reasoning
Traffic classification is vital for cybersecurity, yet encrypted traffic poses significant challenges. We present PacketCLIP, a multi-modal framework combining packet data with natural language semantics through contrastive pretraining and hierarchical Graph Neural Network (GNN) reasoning. PacketCLIP integrates semantic reasoning with efficient classification, enabling robust detection of anomalies in encrypted network flows. By aligning textual descriptions with packet behaviors, it offers enhanced interpretability, scalability, and practical applicability across diverse security scenarios. PacketCLIP achieves a 95% mean AUC, outperforms baselines by 11.6%, and reduces model size by 92%, making it ideal for real-time anomaly detection. By bridging advanced machine learning techniques and practical cybersecurity needs, PacketCLIP provides a foundation for scalable, efficient, and interpretable solutions to tackle encrypted traffic classification and network intrusion detection challenges in resource-constrained environments.
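The contrastive pretraining component can be sketched as a standard CLIP-style symmetric InfoNCE loss between a packet encoder and a text encoder; the toy encoders and input dimensions below are placeholders, and PacketCLIP's hierarchical GNN reasoning is not shown.

import torch
import torch.nn.functional as F

packet_enc = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(),
                                 torch.nn.Linear(128, 32))
text_enc = torch.nn.Sequential(torch.nn.Linear(300, 128), torch.nn.ReLU(),
                               torch.nn.Linear(128, 32))

def clip_loss(packet_feats, text_feats, temperature=0.07):
    p = F.normalize(packet_enc(packet_feats), dim=-1)
    t = F.normalize(text_enc(text_feats), dim=-1)
    logits = p @ t.T / temperature              # pairwise similarities
    target = torch.arange(len(p))               # matched pairs on the diagonal
    return (F.cross_entropy(logits, target) +   # packets -> descriptions
            F.cross_entropy(logits.T, target)) / 2  # descriptions -> packets

loss = clip_loss(torch.randn(16, 64), torch.randn(16, 300))
loss.backward()  # drives matched packet/text pairs together in embedding space
print(float(loss))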
Updated: 2025-03-05 18:58:58
Categories: cs.CR,cs.LG
Process-based Self-Rewarding Language Models
Large Language Models have demonstrated outstanding performance across various downstream tasks and have been widely applied in multiple scenarios. Human-annotated preference data is used for training to further improve LLMs' performance, which is constrained by the upper limit of human performance. Therefore, Self-Rewarding method has been proposed, where LLMs generate training data by rewarding their own outputs. However, the existing self-rewarding paradigm is not effective in mathematical reasoning scenarios and may even lead to a decline in performance. In this work, we propose the Process-based Self-Rewarding pipeline for language models, which introduces long-thought reasoning, step-wise LLM-as-a-Judge, and step-wise preference optimization within the self-rewarding paradigm. Our new paradigm successfully enhances the performance of LLMs on multiple mathematical reasoning benchmarks through iterative Process-based Self-Rewarding, demonstrating the immense potential of self-rewarding to achieve LLM reasoning that may surpass human capabilities.
Updated: 2025-03-05 18:58:44
Categories: cs.CL,cs.AI
Constrained Gaussian Wasserstein Optimal Transport with Commutative Covariance Matrices
Optimal transport has found widespread applications in signal processing and machine learning. Among its many equivalent formulations, optimal transport seeks to reconstruct a random variable/vector with a prescribed distribution at the destination while minimizing the expected distortion relative to a given random variable/vector at the source. However, in practice, certain constraints may render the optimal transport plan infeasible. In this work, we consider three types of constraints: rate constraints, dimension constraints, and channel constraints, motivated by perception-aware lossy compression, generative principal component analysis, and deep joint source-channel coding, respectively. Special attention is given to the setting termed Gaussian Wasserstein optimal transport, where both the source and reconstruction variables are multivariate Gaussian, and the end-to-end distortion is measured by the mean squared error. We derive explicit results for the minimum achievable mean squared error under the three aforementioned constraints when the covariance matrices of the source and reconstruction variables commute.
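As background for the unconstrained case (not the paper's constrained results): when the covariances commute they are simultaneously diagonalizable, so the Bures term of the Gaussian 2-Wasserstein distance reduces to a Frobenius norm, $W_2^2 = \|m_1 - m_2\|^2 + \|\Sigma_1^{1/2} - \Sigma_2^{1/2}\|_F^2$. The snippet below checks this numerically against the general Bures-Wasserstein formula.

import numpy as np
from scipy.linalg import sqrtm

msqrt = lambda A: np.real(sqrtm(A))  # real matrix square root for SPD inputs

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))        # shared eigenvectors
S1 = Q @ np.diag(rng.uniform(0.5, 2.0, 4)) @ Q.T    # these two commute
S2 = Q @ np.diag(rng.uniform(0.5, 2.0, 4)) @ Q.T
m1, m2 = rng.normal(size=4), rng.normal(size=4)

# General Bures-Wasserstein formula.
cross = msqrt(msqrt(S1) @ S2 @ msqrt(S1))
w2_general = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross)

# Commuting-case simplification.
w2_commuting = np.sum((m1 - m2) ** 2) + np.linalg.norm(msqrt(S1) - msqrt(S2), "fro") ** 2

print(np.isclose(w2_general, w2_commuting))  # True: the Bures term simplifies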
Updated: 2025-03-05 18:56:48
Categories: cs.IT,cs.LG,math.IT
CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
The advancement of visual language models (VLMs) has enhanced mobile device operations, allowing simulated human-like actions to address user requirements. Current VLM-based mobile operating assistants can be structured into three levels: task, subtask, and action. The subtask level, linking high-level goals with low-level executable actions, is crucial for task completion but faces two challenges: ineffective subtasks that the lower-level agent cannot execute and inefficient subtasks that fail to contribute to the completion of the higher-level task. These challenges stem from VLMs' lack of experience in decomposing subtasks within GUI scenarios in a multi-agent architecture. To address these, we propose a new mobile assistant architecture with constrained high-frequency optimized planning (CHOP). Our approach overcomes the VLM's deficiency in GUI scenario planning by using human-planned subtasks as the basis vector. We evaluate our architecture in both English and Chinese contexts across 20 Apps, demonstrating significant improvements in both effectiveness and efficiency. Our dataset and code are available at https://github.com/Yuqi-Zhou/CHOP
Updated: 2025-03-05 18:56:16
Categories: cs.AI
Network Causal Effect Estimation In Graphical Models Of Contagion And Latent Confounding
A key question in many network studies is whether the observed correlations between units are primarily due to contagion or latent confounding. Here, we study this question using a segregated graph (Shpitser, 2015) representation of these mechanisms, and examine how uncertainty about the true underlying mechanism impacts downstream computation of network causal effects, particularly under full interference -- settings where we only have a single realization of a network and each unit may depend on any other unit in the network. Under certain assumptions about asymptotic growth of the network, we derive likelihood ratio tests that can be used to identify whether different sets of variables -- confounders, treatments, and outcomes -- across units exhibit dependence due to contagion or latent confounding. We then propose network causal effect estimation strategies that provide unbiased and consistent estimates if the dependence mechanisms are either known or correctly inferred using our proposed tests. Together, the proposed methods allow network effect estimation in a wider range of full interference scenarios that have not been considered in prior work. We evaluate the effectiveness of our methods with synthetic data and the validity of our assumptions using real-world networks.
Updated: 2025-03-05 18:48:34
Categories: cs.LG,stat.ML
RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction
The application of Large Language Models (LLMs) to various clinical applications has attracted growing research attention. However, real-world clinical decision-making differs significantly from the standardized, exam-style scenarios commonly used in current efforts. In this paper, we present the RiskAgent system to perform a broad range of medical risk predictions, covering over 387 risk scenarios across diverse complex diseases, e.g., cardiovascular disease and cancer. RiskAgent is designed to collaborate with hundreds of clinical decision tools, i.e., risk calculators and scoring systems that are supported by evidence-based medicine. To evaluate our method, we have built the first benchmark MedRisk specialized for risk prediction, including 12,352 questions spanning 154 diseases, 86 symptoms, 50 specialties, and 24 organ systems. The results show that our RiskAgent, with 8 billion model parameters, achieves 76.33% accuracy, outperforming the most recent commercial LLMs, o1, o3-mini, and GPT-4.5, and doubling the 38.39% accuracy of GPT-4o. On rare diseases, e.g., Idiopathic Pulmonary Fibrosis (IPF), RiskAgent outperforms o1 and GPT-4.5 by 27.27% and 45.46% accuracy, respectively. Finally, we further conduct a generalization evaluation on an external evidence-based diagnosis benchmark and show that our RiskAgent achieves the best results. These encouraging results demonstrate the great potential of our solution for diverse diagnosis domains. To improve the adaptability of our model in different scenarios, we have built and open-sourced a family of models ranging from 1 billion to 70 billion parameters. Our code, data, and models are all available at https://github.com/AI-in-Health/RiskAgent.
Updated: 2025-03-05 18:46:51
Categories: cs.LG,cs.AI,cs.MA
Opportunistic Routing in Wireless Communications via Learnable State-Augmented Policies
This paper addresses the challenge of packet-based information routing in large-scale wireless communication networks. The problem is framed as a constrained statistical learning task, where each network node operates using only local information. Opportunistic routing exploits the broadcast nature of wireless communication to dynamically select optimal forwarding nodes, enabling the information to reach the destination through multiple relay nodes simultaneously. To solve this, we propose a State-Augmentation (SA) based distributed optimization approach aimed at maximizing the total information handled by the source nodes in the network. The problem formulation leverages Graph Neural Networks (GNNs), which perform graph convolutions based on the topological connections between network nodes. Using an unsupervised learning paradigm, we extract routing policies from the GNN architecture, enabling optimal decisions for source nodes across various flows. Numerical experiments demonstrate that the proposed method achieves superior performance when training a GNN-parameterized model, particularly when compared to baseline algorithms. Additionally, applying the method to real-world network topologies and wireless ad-hoc network test beds validates its effectiveness, highlighting the robustness and transferability of GNNs.
Updated: 2025-03-05 18:44:56
Categories: eess.SP,cs.LG
Rethinking Deep Clustering Paradigms: Self-Supervision Is All You Need
The recent advances in deep clustering have been made possible by significant progress in self-supervised and pseudo-supervised learning. However, the trade-off between self-supervision and pseudo-supervision can give rise to three primary issues. The joint training causes Feature Randomness and Feature Drift, whereas the independent training causes Feature Randomness and Feature Twist. In essence, using pseudo-labels generates random and unreliable features. The combination of pseudo-supervision and self-supervision drifts the reliable clustering-oriented features. Moreover, moving from self-supervision to pseudo-supervision can twist the curved latent manifolds. This paper addresses the limitations of existing deep clustering paradigms concerning Feature Randomness, Feature Drift, and Feature Twist. We propose a new paradigm with a new strategy that replaces pseudo-supervision with a second round of self-supervision training. The new strategy makes the transition between instance-level self-supervision and neighborhood-level self-supervision smoother and less abrupt. It also prevents the drifting effect that is caused by the strong competition between instance-level self-supervision and clustering-level pseudo-supervision. Furthermore, the absence of pseudo-supervision removes the risk of generating random features. With this novel approach, our paper introduces a Rethinking of the Deep Clustering Paradigms, denoted by R-DC. Our model is specifically designed to address three primary challenges encountered in Deep Clustering: Feature Randomness, Feature Drift, and Feature Twist. Experimental results conducted on six datasets have shown that the two-level self-supervision training yields substantial improvements.
Updated: 2025-03-05 18:44:35
Categories: cs.CV,cs.AI
Towards Understanding Distilled Reasoning Models: A Representational Approach
In this paper, we investigate how model distillation impacts the development of reasoning features in large language models (LLMs). To explore this, we train a crosscoder on Qwen-series models and their fine-tuned variants. Our results suggest that the crosscoder learns features corresponding to various types of reasoning, including self-reflection and computation verification. Moreover, we observe that distilled models contain unique reasoning feature directions, which could be used to steer the model into over-thinking or incisive-thinking mode. In particular, we perform analysis on four specific reasoning categories: (a) self-reflection, (b) deductive reasoning, (c) alternative reasoning, and (d) contrastive reasoning. Finally, we examine the changes in feature geometry resulting from the distillation process and find indications that larger distilled models may develop more structured representations, which correlate with enhanced distillation performance. By providing insights into how distillation modifies the model, our study contributes to enhancing the transparency and reliability of AI systems.
Updated: 2025-03-05 18:40:19
Categories: cs.LG
CDS: Data Synthesis Method Guided by Cognitive Diagnosis Theory
Large Language Models (LLMs) have achieved significant advancements, but the increasing complexity of tasks and higher performance demands highlight the need for continuous improvement. Some approaches utilize synthetic data generated by advanced LLMs based on evaluation results to train models. However, conventional evaluation methods fail to provide detailed, fine-grained profiles of LLMs, limiting their guidance for data synthesis. In this paper, we introduce the Cognitive Diagnostic Synthesis (CDS) method, which incorporates a diagnostic process inspired by Cognitive Diagnosis Theory (CDT) to refine evaluation results and characterize model profiles at the knowledge component level. Based on these diagnostics, we propose two diagnosis-synthesis strategies for weakness-targeted data synthesis. Additionally, we present an enhanced data augmentation and selection pipeline to improve the quality and diversity of synthesized data. Our experiments with several open-source models show significant improvements across multiple benchmarks, achieving up to 6.00% improvement in code generation, 13.10% in mathematical reasoning, and 5.43% in academic exams. Code and data are available on GitHub.
Updated: 2025-03-05 18:39:05
Categories: cs.AI
Graph-Augmented LSTM for Forecasting Sparse Anomalies in Graph-Structured Time Series
Detecting anomalies in time series data is a critical task across many domains. The challenge intensifies when anomalies are sparse and the data are multivariate with relational dependencies across sensors or nodes. Traditional univariate anomaly detectors struggle to capture such cross-node dependencies, particularly in sparse anomaly settings. To address this, we propose a graph-augmented time series forecasting approach that explicitly integrates the graph of relationships among time series into an LSTM forecasting model. This enables the model to detect rare anomalies that might otherwise go unnoticed in purely univariate approaches. We evaluate the approach on two benchmark datasets - the Yahoo Webscope S5 anomaly dataset and the METR-LA traffic sensor network - and compare the performance of the Graph-Augmented LSTM against LSTM-only, ARIMA, and Prophet baselines. Results demonstrate that the graph-augmented model achieves significantly higher precision and recall, improving F1-score by up to 10% over the best baseline.
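A compact sketch of the graph-augmented idea (layer sizes and the adjacency below are toy choices, not the paper's architecture): mix each timestep's node features with their neighbors' via a normalized adjacency, then model the temporal dynamics with a shared LSTM that treats nodes as the batch dimension.

import torch
import torch.nn as nn

class GraphAugmentedLSTM(nn.Module):
    def __init__(self, adj, in_dim=1, hidden=32):
        super().__init__()
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        self.register_buffer("A", adj / deg)            # row-normalized adjacency
        self.lstm = nn.LSTM(2 * in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                # one-step-ahead forecast

    def forward(self, x):                               # x: (nodes, time, in_dim)
        neigh = torch.einsum("ij,jtf->itf", self.A, x)  # neighbor aggregation
        h, _ = self.lstm(torch.cat([x, neigh], dim=-1)) # nodes act as the batch
        return self.head(h[:, -1])                      # (nodes, 1)

adj = torch.tensor([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=torch.float32)
model = GraphAugmentedLSTM(adj)
forecast = model(torch.randn(3, 24, 1))                 # 3 sensors, 24 timesteps
print(forecast.shape)                                   # torch.Size([3, 1])
# Anomalies can then be flagged where |observed - forecast| is large.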
Updated: 2025-03-05 18:37:52
Categories: cs.LG
On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
We study the discriminative probabilistic modeling on a continuous domain for the data prediction task of (multimodal) self-supervised representation learning. To address the challenge of computing the integral in the partition function for each anchor data, we leverage the multiple importance sampling (MIS) technique for robust Monte Carlo integration, which can recover InfoNCE-based contrastive loss as a special case. Within this probabilistic modeling framework, we conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning and derive insights for developing better approaches by reducing the error of Monte Carlo integration. To this end, we propose a novel non-parametric method for approximating the sum of conditional probability densities required by MIS through convex optimization, yielding a new contrastive objective for self-supervised representation learning. Moreover, we design an efficient algorithm for solving the proposed objective. We empirically compare our algorithm to representative baselines on the contrastive image-language pretraining task. Experimental results on the CC3M and CC12M datasets demonstrate the superior overall performance of our algorithm. Our code is available at https://github.com/bokun-wang/NUCLR.
Updated: 2025-03-05 18:36:02
Categories: cs.LG,stat.ML
Interactive Data Harmonization with LLM Agents
Data harmonization is an essential task that entails integrating datasets from diverse sources. Despite years of research in this area, it remains a time-consuming and challenging task due to schema mismatches, varying terminologies, and differences in data collection methodologies. This paper presents the case for agentic data harmonization as a means to both empower experts to harmonize their data and to streamline the process. We introduce Harmonia, a system that combines LLM-based reasoning, an interactive user interface, and a library of data harmonization primitives to automate the synthesis of data harmonization pipelines. We demonstrate Harmonia in a clinical data harmonization scenario, where it helps to interactively create reusable pipelines that map datasets to a standard format. Finally, we discuss challenges and open problems, and suggest research directions for advancing our vision.
Updated: 2025-03-05 18:33:41
Categories: cs.AI,cs.DB
Universal Narrative Model: an Author-centric Storytelling Framework for Generative AI
Generative AI promises to finally realize dynamic, personalized storytelling technologies across a range of media. To date, experimentation with generative AI in the field of procedural narrative generation has been quite promising from a technical perspective. However, fundamental narrative dilemmas remain, such as the balance between player agency and narrative coherence, and no rigorous narrative standard has been proposed to specifically leverage the strengths of generative AI. In this paper, we propose the Universal Narrative Model (UNM), an open and extensible standard designed to place writers at the center of future narrative design workflows and enable interoperability across authoring platforms. By encoding an author's intent according to an objective narrative model, the UNM enables narrative portability as well as intent-based constraints for generative systems.
Updated: 2025-03-05 18:29:15
标题: 通用叙事模型:一种面向作者的生成式人工智能叙事框架
摘要: 生成式人工智能承诺最终实现动态、个性化的故事叙述技术,涵盖各种媒体形式。迄今为止,在程序化叙事生成领域,生成式人工智能的实验从技术角度来看一直很有希望。然而,基本的叙事困境仍然存在,比如玩家代理和叙事连贯之间的平衡,并且尚未提出严格的叙事标准来专门利用生成式人工智能的优势。在本文中,我们提出了通用叙事模型(UNM),这是一个开放且可扩展的标准,旨在将作家置于未来叙事设计工作流程的中心,并实现跨作者平台的互操作性。通过根据客观叙事模型对作者意图进行编码,UNM实现了叙事可移植性,同时为生成系统提供基于意图的约束。
更新时间: 2025-03-05 18:29:15
领域: cs.CL,cs.AI
Deep Causal Behavioral Policy Learning: Applications to Healthcare
We present a deep learning-based approach to studying dynamic clinical behavioral regimes in diverse non-randomized healthcare settings. Our proposed methodology - deep causal behavioral policy learning (DC-BPL) - uses deep learning algorithms to learn the distribution of high-dimensional clinical action paths, and identifies the causal link between these action paths and patient outcomes. Specifically, our approach: (1) identifies the causal effects of provider assignment on clinical outcomes; (2) learns the distribution of clinical actions a given provider would take given evolving patient information; (3) and combines these steps to identify the optimal provider for a given patient type and emulate that provider's care decisions. Underlying this strategy, we train a large clinical behavioral model (LCBM) on electronic health records data using a transformer architecture, and demonstrate its ability to estimate clinical behavioral policies. We propose a novel interpretation of a behavioral policy learned using the LCBM: that it is an efficient encoding of complex, often implicit, knowledge used to treat a patient. This allows us to learn a space of policies that are critical to a wide range of healthcare applications, in which the vast majority of clinical knowledge is acquired tacitly through years of practice and only a tiny fraction of information relevant to patient care is written down (e.g. in textbooks, studies or standardized guidelines).
Updated: 2025-03-05 18:24:58
标题: 深层因果行为策略学习:在医疗保健领域的应用
摘要: 我们提出了一种基于深度学习的方法来研究多样化的非随机医疗环境中的动态临床行为规律。我们提出的方法论 - 深度因果行为策略学习(DC-BPL)- 使用深度学习算法来学习高维临床行动路径的分布,并确定这些行动路径与患者结果之间的因果联系。具体地,我们的方法:(1)确定提供者分配对临床结果的因果效应;(2)学习给定提供者在不断变化的患者信息下会采取的临床行动的分布;(3)并结合这些步骤来确定给定患者类型的最佳提供者,并模拟该提供者的护理决策。在这一策略的基础上,我们使用变压器架构在电子健康记录数据上训练了一个大型临床行为模型(LCBM),并展示了其估计临床行为策略的能力。我们提出了一种新颖的解释,即使用LCBM学习的行为策略:它是一种有效编码复杂、常常是隐含的知识,用于治疗患者。这使我们能够学习一系列对各种医疗应用至关重要的策略,在这些应用中,绝大部分临床知识是通过多年的实践隐性获取的,只有极少量与患者护理相关的信息被记录下来(例如在教科书、研究或标准指南中)。
更新时间: 2025-03-05 18:24:58
领域: stat.ML,cs.AI,cs.LG
PARAMANU-GANITA: Can Small Math Language Models Rival with Large Language Models on Mathematical Reasoning?
In this paper, we study whether domain-specific pretraining of small generative language models (SLMs) from scratch, with a domain-specialized tokenizer and Chain-of-Thought (CoT) instruction fine-tuning, yields performance on mathematical reasoning competitive with LLMs. Secondly, we ask whether this approach is environmentally sustainable and highly cost-efficient. To address these research questions, we present Paramanu-Ganita, a novel 208-million-parameter decoder-only autoregressive SLM for mathematics. We performed pretraining from scratch on 31.5 billion tokens for 170 A100 hours with a context size of 4096, on a mixed mathematical corpus consisting of web pages, source code, textbooks, CoT-templatized StackOverflow QA pairs, and mathematical lecture notes in LaTeX curated by us. We also trained a math- and code-specialized BPE tokenizer. We proposed and performed CoT instruction fine-tuning of Paramanu-Ganita on the MetaMathQA dataset. Despite being 34 times smaller than 7B LLMs, Paramanu-Ganita outperforms generalist LLMs by approximately 30 percentage points, and even math-specialized LLMs by 3-23 percentage points, on the GSM8K test accuracy metric. On the MATH benchmark, Paramanu-Ganita outperformed the various models by 6-8 percentage points. On benchmarks such as LogiQA, MMLU (high school and college level), and the competitive-exam-level AGIEVAL (AQuA-RAT, SAT-Math), Paramanu-Ganita outperformed others by 1-4%. Our model is available at https://huggingface.co/gyanai/paramanu-ganita-208M-hf.
Updated: 2025-03-05 18:17:28
标题: PARAMANU-GANITA:小型数学语言模型能否在数学推理方面与大型语言模型匹敌?
摘要: 在这篇论文中,我们研究了是否使用领域特定的预训练小型生成语言模型(SLM),从零开始使用领域专门化的分词器和Chain-of-Thought(CoT)指导微调,与LLMs相比在数学推理上表现出竞争力?其次,这种方法是否环境可持续,成本效益高?为了回答这些研究问题,我们提出了Paramanu-Ganita,一个拥有208百万参数的全新解码器自回归SLM,用于数学。我们在由我们筛选的网页、源代码、教科书、CoT模板化的StackOverflow问答对以及LaTeX格式的数学讲义组成的混合数学语料库上,从零开始对315亿个标记进行了170小时的预训练,使用了4096的上下文大小。我们还训练了一个数学和代码专门化的BPE分词器。我们提出并在MetaMathQA数据集上进行了Paramanu-Ganita的CoT指导微调。我们的模型Paramanu-Ganita虽然比7B的LLMs小34倍,但在GSM8K测试准确度指标上的表现优于通用LLMs约30个百分点,甚至比数学专业化的LLMs高出3-23个百分点。在MATH基准测试中,Paramanu-Ganita的表现超过其他各种模型6-8个百分点。在LogiQA、MMLU(高中、大学水平)和竞争性考试水平的基准测试中,AGIEVAL(AQuA-RAT、SAT-Math),Paramanu-Ganita超过其他模型1-4%。我们的模型可以在https://huggingface.co/gyanai/paramanu-ganita-208M-hf找到。
更新时间: 2025-03-05 18:17:28
领域: cs.CL,cs.AI,cs.LG
Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate
Reinforcement learning provides a mathematical framework for learning-based control, whose success largely depends on the amount of data it can utilize. The efficient utilization of historical trajectories obtained from previous policies is essential for expediting policy optimization. Empirical evidence has shown that policy gradient methods based on importance sampling work well. However, existing literature often neglects the interdependence between trajectories from different iterations, and the good empirical performance lacks a rigorous theoretical justification. In this paper, we study a variant of the natural policy gradient method that reuses historical trajectories via importance sampling. We show that the bias of the proposed estimator of the gradient is asymptotically negligible, the resultant algorithm is convergent, and reusing past trajectories helps improve the convergence rate. We further apply the proposed estimator to popular policy optimization algorithms such as trust region policy optimization. Our theoretical results are verified on classical benchmarks.
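As a rough illustration of the reuse mechanism described here, the following sketch forms an importance-weighted policy-gradient estimate from trajectories collected under earlier policies. It uses a toy bandit-style setting with Gaussian policies; the policy family, reward shape, and step size are assumptions made only for illustration, and the natural-gradient preconditioning is omitted.

```python
import numpy as np

def log_prob(theta, actions, sigma=1.0):
    """Log-density of actions under a Gaussian policy N(theta, sigma^2)."""
    return -0.5 * ((actions - theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def reused_gradient(theta_now, history, sigma=1.0):
    """Importance-weighted gradient estimate pooling trajectories from past policies."""
    grads = []
    for theta_old, actions, returns in history:
        # Likelihood ratio corrects for the fact that actions were sampled under the old policy.
        weights = np.exp(log_prob(theta_now, actions, sigma) - log_prob(theta_old, actions, sigma))
        score = (actions - theta_now) / sigma**2          # d/d theta of the Gaussian log-density
        grads.append(np.mean(weights * score * returns))
    return np.mean(grads)

rng = np.random.default_rng(1)
history = []
theta = 0.0
for _ in range(5):                                        # a few "past iterations"
    actions = rng.normal(theta, 1.0, size=256)
    returns = -(actions - 2.0) ** 2                       # toy reward that peaks at action = 2
    history.append((theta, actions, returns))
    theta += 0.05 * reused_gradient(theta, history)       # plain gradient step for the sketch
print(f"theta after reusing all past trajectories: {theta:.3f}")
```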
Updated: 2025-03-05 18:14:25
标题: 通过重要性抽样在自然策略梯度中重用历史轨迹:收敛性和收敛速率
摘要: 强化学习提供了一个基于学习的控制的数学框架,其成功在很大程度上取决于其可以利用的数据量。有效利用从先前策略获得的历史轨迹对于加速策略优化至关重要。实证证据表明,基于重要性抽样的策略梯度方法效果很好。然而,现有文献经常忽略了来自不同迭代的轨迹之间的相互依赖性,而且良好的实证表现缺乏严格的理论证明。在本文中,我们研究了一种通过重要性抽样重复使用历史轨迹的自然策略梯度方法的变体。我们表明,所提梯度估计器的偏差在渐近意义上可以忽略不计,所得到的算法是收敛的,并且重复使用过去的轨迹有助于改善收敛速度。我们进一步将所提估计器应用于流行的策略优化算法,如信任区域策略优化。我们的理论结果在经典基准上得到验证。
更新时间: 2025-03-05 18:14:25
领域: cs.LG,math.OC
Machine Learning in Biomechanics: Key Applications and Limitations in Walking, Running, and Sports Movements
This chapter provides an overview of recent and promising Machine Learning applications, i.e. pose estimation, feature estimation, event detection, data exploration & clustering, and automated classification, in gait (walking and running) and sports biomechanics. It explores the potential of Machine Learning methods to address challenges in biomechanical workflows, highlights central limitations, i.e. data and annotation availability and explainability, that need to be addressed, and emphasises the importance of interdisciplinary approaches for fully harnessing the potential of Machine Learning in gait and sports biomechanics.
Updated: 2025-03-05 18:10:11
标题: 生物力学中的机器学习:步行、跑步和运动运动中的关键应用和局限性
摘要: 这一章节提供了近期和有前景的机器学习应用的概述,即姿势估计、特征估计、事件检测、数据探索与聚类以及自动分类,在步态(行走和奔跑)和运动生物力学领域。它探讨了机器学习方法在解决生物力学工作流中挑战的潜力,强调了中心限制,即数据和标注的可用性和可解释性,需要解决,并强调了跨学科方法在充分利用机器学习在步态和运动生物力学领域潜力方面的重要性。
更新时间: 2025-03-05 18:10:11
领域: cs.AI
Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies
Although deep reinforcement learning has been shown to be effective, the model's black-box nature presents barriers to direct policy interpretation. To address this problem, we propose a neuro-symbolic approach called neural DNF-MT for end-to-end policy learning. The differentiable nature of the neural DNF-MT model enables the use of deep actor-critic algorithms for training. At the same time, its architecture is designed so that trained models can be directly translated into interpretable policies expressed as standard (bivalent or probabilistic) logic programs. Moreover, additional layers can be included to extract abstract features from complex observations, acting as a form of predicate invention. The logic representations are highly interpretable, and we show how the bivalent representations of deterministic policies can be edited and incorporated back into a neural model, facilitating manual intervention and adaptation of learned policies. We evaluate our approach on a range of tasks requiring learning deterministic or stochastic behaviours from various forms of observations. Our empirical results show that our neural DNF-MT model performs at the level of competing black-box methods whilst providing interpretable policies.
Updated: 2025-03-05 18:04:40
标题: 神经DNF-MT:一种用于学习可解释和可编辑策略的神经符号方法
摘要: 尽管深度强化学习已被证明有效,但模型的黑盒特性给直接策略解释带来了障碍。为了解决这一问题,我们提出了一种名为神经DNF-MT的神经符号方法,用于端到端策略学习。神经DNF-MT模型的可微特性使得可以使用深度演员-评论家算法进行训练。同时,其架构设计使得训练模型可以直接转换为可解释的策略,表达为标准的(双值或概率性)逻辑程序。此外,还可以添加额外的层来从复杂观测中提取抽象特征,充当谓词的发明形式。逻辑表示具有很高的可解释性,我们展示了确定性策略的双值表示如何被编辑并重新整合到神经模型中,促进了对学习的策略进行手动干预和调整。我们在需要从各种形式的观测中学习确定性或随机行为的一系列任务上评估了我们的方法。我们的实证结果表明,我们的神经DNF-MT模型在提供可解释策略的同时,表现出与竞争性黑盒方法相当的水平。
更新时间: 2025-03-05 18:04:40
领域: cs.AI,cs.LG,cs.LO
Handling Uncertainty in Health Data using Generative Algorithms
Understanding and managing uncertainty is crucial in machine learning, especially in high-stakes domains like healthcare, where class imbalance can impact predictions. This paper introduces RIGA, a novel pipeline that mitigates class imbalance using generative AI. By converting tabular healthcare data into images, RIGA leverages models like cGAN, VQVAE, and VQGAN to generate balanced samples, improving classification performance. These representations are processed by CNNs and later transformed back into tabular format for seamless integration. This approach enhances traditional classifiers like XGBoost, improves Bayesian structure learning, and strengthens ML model robustness by generating realistic synthetic data for underrepresented classes.
Updated: 2025-03-05 18:04:30
标题: 使用生成算法处理健康数据中的不确定性
摘要: 理解和管理不确定性在机器学习中至关重要,特别是在像医疗保健这样的高风险领域,其中类别不平衡可能影响预测。本文介绍了一种新颖的管道RIGA,利用生成式人工智能来缓解类别不平衡。通过将表格化的医疗保健数据转换为图像,RIGA利用诸如cGAN、VQVAE和VQGAN等模型来生成平衡样本,改善分类性能。这些表示由CNN处理,随后再转换回表格格式以进行无缝集成。这种方法提升了传统分类器如XGBoost的性能,改进了贝叶斯结构学习,并通过为少数类别生成逼真的合成数据来增强机器学习模型的鲁棒性。
更新时间: 2025-03-05 18:04:30
领域: cs.LG
Improving LLM Safety Alignment with Dual-Objective Optimization
Existing training-time safety alignment techniques for large language models (LLMs) remain vulnerable to jailbreak attacks. Direct preference optimization (DPO), a widely deployed alignment method, exhibits limitations in both experimental and theoretical contexts as its loss function proves suboptimal for refusal learning. Through gradient-based analysis, we identify these shortcomings and propose an improved safety alignment that disentangles DPO objectives into two components: (1) robust refusal training, which encourages refusal even when partial unsafe generations are produced, and (2) targeted unlearning of harmful knowledge. This approach significantly increases LLM robustness against a wide range of jailbreak attacks, including prefilling, suffix, and multi-turn attacks across both in-distribution and out-of-distribution scenarios. Furthermore, we introduce a method to emphasize critical refusal tokens by incorporating a reward-based token-level weighting mechanism for refusal learning, which further improves the robustness against adversarial exploits. Our research also suggests that robustness to jailbreak attacks is correlated with token distribution shifts in the training process and internal representations of refusal and harmful tokens, offering valuable directions for future research in LLM safety alignment. The code is available at https://github.com/wicai24/DOOR-Alignment
Updated: 2025-03-05 18:01:05
标题: 通过双目标优化改进LLM安全对齐
摘要: 现有的针对大型语言模型(LLMs)的训练时安全对齐技术仍然容易受到越狱攻击的影响。直接偏好优化(DPO)是一种广泛采用的对齐方法,在实验和理论环境中都存在局限性,因为其损失函数在拒绝学习方面表现不佳。通过基于梯度的分析,我们确定了这些缺点,并提出了一种改进的安全对齐方法,将DPO目标分解为两个组成部分:(1)强化拒绝训练,即在产生部分不安全的生成物时也鼓励拒绝,以及(2)有针对性地消除有害知识。这种方法显著提高了LLM对各种越狱攻击的鲁棒性,包括前置、后缀和多轮攻击,涵盖分布内和分布外场景。此外,我们引入了一种方法,通过将基于奖励的令牌级加权机制纳入拒绝学习,来强调关键的拒绝令牌,进一步提高了对抗性攻击的鲁棒性。我们的研究还表明,对越狱攻击的鲁棒性与训练过程中的令牌分布变化以及拒绝和有害令牌的内部表示有关,为未来LLM安全对齐研究提供了有价值的方向。代码可在https://github.com/wicai24/DOOR-Alignment 获取。
更新时间: 2025-03-05 18:01:05
领域: cs.CL,cs.CR,cs.LG
Rethinking Video Tokenization: A Conditioned Diffusion-based Approach
Video tokenizers, which transform videos into compact latent representations, are key to video generation. Existing video tokenizers are based on the VAE architecture and follow a paradigm where an encoder compresses videos into compact latents, and a deterministic decoder reconstructs the original videos from these latents. In this paper, we propose a novel Conditioned Diffusion-based video Tokenizer, entitled \ourmethod, which departs from previous methods by replacing the deterministic decoder with a 3D causal diffusion model. The reverse diffusion generative process of the decoder is conditioned on the latent representations derived via the encoder. With feature caching and sampling acceleration, the framework efficiently reconstructs high-fidelity videos of arbitrary lengths. Results show that \ourmethod achieves state-of-the-art performance in video reconstruction tasks using just single-step sampling. Even a smaller version of \ourmethod still achieves reconstruction results on par with the top two baselines. Furthermore, the latent video generation model trained using \ourmethod also shows superior performance.
Updated: 2025-03-05 17:59:19
标题: 重新思考视频标记化:一种基于条件扩散的方法
摘要: 视频标记器,将视频转换为紧凑的潜在表示,对于视频生成至关重要。现有的视频标记器基于VAE架构,并遵循一种范式,其中编码器将视频压缩为紧凑的潜在表示,而确定性解码器从这些潜在表示中重建原始视频。在本文中,我们提出了一种新颖的基于条件扩散的视频标记器,命名为我们的方法(OurMethod),它与先前的方法不同,通过将确定性解码器替换为3D因果扩散模型。解码器的反向扩散生成过程是基于通过编码器导出的潜在表示条件的。通过特征缓存和采样加速,该框架能够高效地重建任意长度的高保真视频。结果显示,我们的方法在视频重建任务中实现了最先进的性能,仅使用单步采样。甚至我们的方法的较小版本仍能实现与前两个基准的重建结果相媲美。此外,使用我们的方法训练的潜在视频生成模型也表现出卓越的性能。
更新时间: 2025-03-05 17:59:19
领域: cs.CV,cs.AI
Curating Demonstrations using Online Experience
Many robot demonstration datasets contain heterogeneous demonstrations of varying quality. This heterogeneity may benefit policy pre-training, but can hinder robot performance when used with a final imitation learning objective. In particular, some strategies in the data may be less reliable than others or may be underrepresented in the data, leading to poor performance when such strategies are sampled at test time. Moreover, such unreliable or underrepresented strategies can be difficult even for people to discern, and sifting through demonstration datasets is time-consuming and costly. On the other hand, policy performance when trained on such demonstrations can reflect the reliability of different strategies. We thus propose for robots to self-curate based on online robot experience (Demo-SCORE). More specifically, we train and cross-validate a classifier to discern successful policy roll-outs from unsuccessful ones and use the classifier to filter heterogeneous demonstration datasets. Our experiments in simulation and the real world show that Demo-SCORE can effectively identify suboptimal demonstrations without manual curation. Notably, Demo-SCORE achieves over 15-35% higher absolute success rate in the resulting policy compared to the base policy trained with all original demonstrations.
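A minimal sketch of the curation idea (not the authors' implementation): fit a classifier that predicts rollout success from demonstration features under cross-validation, then keep only the demonstrations the classifier scores above a threshold. The feature construction, the synthetic success labels, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Toy "demonstrations": 200 feature vectors, where one latent strategy (feature 0 > 0)
# tends to succeed during online rollouts and the other tends to fail.
features = rng.normal(size=(200, 5))
success = (features[:, 0] + 0.3 * rng.normal(size=200)) > 0   # online rollout outcomes

# Cross-validated success probabilities avoid scoring a demo with a model trained on it.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_predict(clf, features, success, cv=5, method="predict_proba")[:, 1]

keep = scores > 0.5                                            # filter out unreliable strategies
print(f"kept {keep.sum()} of {len(keep)} demonstrations")
```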
Updated: 2025-03-05 17:58:16
标题: 利用在线经验筛选演示数据
摘要: 许多机器人演示数据集包含质量不同的异质演示。这种异质性可能有利于政策的预训练,但当与最终的模仿学习目标一起使用时,可能会阻碍机器人的性能。特别是,数据中的一些策略可能比其他策略不够可靠,或在数据中代表性不足,导致在测试时选择这些策略时性能不佳。此外,这种不可靠或代表性不足的策略甚至对人们来说也很难分辨,筛选演示数据集是耗时且昂贵的。另一方面,当在这些演示上进行训练时,政策的性能可以反映出不同策略的可靠性。因此,我们提出了基于在线机器人经验进行自动筛选的方法(Demo-SCORE)。更具体地说,我们训练并交叉验证一个分类器,以区分成功的政策展开和不成功的政策展开,并使用该分类器来过滤异质演示数据集。我们在模拟和真实世界的实验表明,Demo-SCORE可以有效识别次优演示,而无需手动筛选。值得注意的是,与使用所有原始演示进行训练的基准政策相比,Demo-SCORE在最终政策中实现了15-35%更高的绝对成功率。
更新时间: 2025-03-05 17:58:16
领域: cs.RO,cs.AI,cs.LG
Effective LLM Knowledge Learning via Model Generalization
Large language models (LLMs) are trained on enormous documents that contain extensive world knowledge. However, it is still not well-understood how knowledge is acquired via autoregressive pre-training. This lack of understanding greatly hinders effective knowledge learning, especially for continued pretraining on up-to-date information, as this evolving information often lacks diverse repetitions like foundational knowledge. In this paper, we focus on understanding and improving LLM knowledge learning. We found and verified that knowledge learning for LLMs can be deemed as an implicit supervised task hidden in the autoregressive pre-training objective. Our findings suggest that knowledge learning for LLMs would benefit from methods designed to improve generalization ability for supervised tasks. Based on our analysis, we propose the formatting-based data augmentation to grow in-distribution samples, which does not present the risk of altering the facts embedded in documents as text paraphrasing. We also introduce sharpness-aware minimization as an effective optimization algorithm to better improve generalization. Moreover, our analysis and method can be readily extended to instruction tuning. Extensive experiment results validate our findings and demonstrate our methods' effectiveness in both continued pre-training and instruction tuning. This paper offers new perspectives and insights to interpret and design effective strategies for LLM knowledge learning.
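Since the abstract leans on sharpness-aware minimization (SAM), a compact PyTorch sketch of a single SAM update may help; it follows the generic two-step formulation (perturb the weights along the gradient direction, then descend from the perturbed point) rather than any code released with this paper, and the rho and learning-rate values are illustrative.

```python
import torch

def sam_step(model, loss_fn, batch, lr=0.01, rho=0.05):
    """One sharpness-aware minimization step: ascend to the worst-case nearby
    weights, recompute the gradient there, then apply the descent update.
    lr and rho are illustrative defaults."""
    params = [p for p in model.parameters() if p.requires_grad]

    # First pass: gradient at the current weights.
    loss = loss_fn(model, batch)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12

    # Perturb weights by rho * g / ||g|| (the inner maximization).
    with torch.no_grad():
        eps = [rho * g / grad_norm for g in grads]
        for p, e in zip(params, eps):
            p.add_(e)

    # Second pass: gradient at the perturbed weights drives the actual update.
    loss_perturbed = loss_fn(model, batch)
    grads_sharp = torch.autograd.grad(loss_perturbed, params)
    with torch.no_grad():
        for p, e, g in zip(params, eps, grads_sharp):
            p.sub_(e)          # undo the perturbation
            p.sub_(lr * g)     # SGD step using the sharpness-aware gradient
    return loss.item()

# Tiny usage example on a toy linear model.
model = torch.nn.Linear(4, 1)
data = (torch.randn(16, 4), torch.randn(16, 1))
mse = lambda m, b: torch.nn.functional.mse_loss(m(b[0]), b[1])
print(f"loss before update: {sam_step(model, mse, data):.4f}")
```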
Updated: 2025-03-05 17:56:20
标题: 通过模型泛化实现LLM知识学习的有效性
摘要: 大型语言模型(LLMs)是在包含大量世界知识的庞大文档上进行训练的。然而,目前人们尚不清楚通过自回归预训练获取知识的机制。这种缺乏理解极大地阻碍了有效的知识学习,尤其是对于继续在最新信息上进行预训练,因为这些不断发展的信息通常缺乏像基础知识那样的多样重复。在本文中,我们关注理解和改进LLM知识学习。我们发现并验证了LLMs的知识学习可以被视为隐藏在自回归预训练目标中的一种隐式监督任务。我们的发现表明,LLMs的知识学习将受益于设计用于提高监督任务的泛化能力的方法。基于我们的分析,我们提出了基于格式化的数据增强方法来增加分布样本,这种方法不会像文本改写那样存在改变文档中嵌入的事实的风险。我们还介绍了一种有效的优化算法——锐度感知最小化,以更好地改善泛化能力。此外,我们的分析和方法可以轻松扩展到指导调整。广泛的实验结果验证了我们的发现,并展示了我们的方法在持续预训练和指导调整中的有效性。本文为解释和设计LLM知识学习的有效策略提供了新的视角和洞察。
更新时间: 2025-03-05 17:56:20
领域: cs.CL,cs.LG
A Practical Memory Injection Attack against LLM Agents
Agents based on large language models (LLMs) have demonstrated strong capabilities in a wide range of complex, real-world applications. However, LLM agents with a compromised memory bank may easily produce harmful outputs when the past records retrieved for demonstration are malicious. In this paper, we propose a novel Memory INJection Attack, MINJA, that enables the injection of malicious records into the memory bank by only interacting with the agent via queries and output observations. These malicious records are designed to elicit a sequence of malicious reasoning steps leading to undesirable agent actions when executing the victim user's query. Specifically, we introduce a sequence of bridging steps to link the victim query to the malicious reasoning steps. During the injection of the malicious record, we propose an indication prompt to guide the agent to autonomously generate our designed bridging steps. We also propose a progressive shortening strategy that gradually removes the indication prompt, such that the malicious record will be easily retrieved when processing the victim query comes after. Our extensive experiments across diverse agents demonstrate the effectiveness of MINJA in compromising agent memory. With minimal requirements for execution, MINJA enables any user to influence agent memory, highlighting practical risks of LLM agents.
Updated: 2025-03-05 17:53:24
标题: 一个针对LLM代理的实际内存注入攻击
摘要: 基于大型语言模型(LLMs)的代理在各种复杂的现实世界应用中展现出强大的能力。然而,当过去检索的记录用于演示时是恶意的时,具有受损内存库的LLM代理可能会轻易产生有害输出。本文提出了一种新颖的记忆注入攻击(MINJA),通过仅通过查询和输出观察与代理交互,实现向内存库注入恶意记录。这些恶意记录旨在引发一系列恶意推理步骤,导致执行受害用户查询时不良代理行为。具体地,我们引入一系列连接步骤将受害查询与恶意推理步骤联系起来。在注入恶意记录过程中,我们提出一个指示提示,引导代理自动生成我们设计的连接步骤。我们还提出了一种渐进缩短策略,逐渐消除指示提示,使得在处理受害查询时恶意记录容易被检索到。我们在多个代理上进行的广泛实验表明,MINJA在破坏代理记忆方面的有效性。MINJA以最低的执行要求使任何用户都能影响代理记忆,突显了LLM代理的实际风险。
更新时间: 2025-03-05 17:53:24
领域: cs.LG
Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Many large-scale systems rely on high-quality deep representations (embeddings) to facilitate tasks like retrieval, search, and generative modeling. Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths, but it requires full model retraining and suffers from noticeable performance degradations at short lengths. In this paper, we show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. We propose Contrastive Sparse Representation (CSR), a method that sparsifies pre-trained embeddings into a high-dimensional but selectively activated feature space. By leveraging lightweight autoencoding and task-aware contrastive objectives, CSR preserves semantic quality while allowing flexible, cost-effective inference at different sparsity levels. Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in terms of both accuracy and retrieval speed-often by large margins-while also cutting training time to a fraction of that required by MRL. Our results establish sparse coding as a powerful paradigm for adaptive representation learning in real-world applications where efficiency and fidelity are both paramount. Code is available at https://github.com/neilwen987/CSR_Adaptive_Rep
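The core contrast with Matryoshka-style truncation can be sketched in a few lines: instead of keeping the first d coordinates of an embedding, a sparse code keeps only the k largest activations of a high-dimensional representation. The snippet below is a generic top-k sparsification sketch under assumed dimensions and a random lifting matrix, not the CSR training procedure (which additionally uses autoencoding and contrastive objectives).

```python
import numpy as np

def topk_sparsify(codes, k):
    """Keep only the k largest-magnitude activations per row; zero the rest."""
    sparse = np.zeros_like(codes)
    idx = np.argsort(-np.abs(codes), axis=1)[:, :k]        # indices of the top-k entries
    rows = np.arange(codes.shape[0])[:, None]
    sparse[rows, idx] = codes[rows, idx]
    return sparse

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 16))                      # toy pretrained dense embeddings
codes = np.maximum(embeddings @ rng.normal(size=(16, 256)), 0.0)  # lift to a wide ReLU code

for k in (8, 32, 128):                                     # adaptive cost at inference time
    active = (topk_sparsify(codes, k) != 0).sum(axis=1)
    print(f"k={k:4d}: nonzeros per item = {active}")
```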
Updated: 2025-03-05 17:51:09
标题: 超越玛特略卡:重新审视用于自适应表示的稀疏编码
摘要: 许多大规模系统依赖于高质量的深度表示(嵌入)来促进检索、搜索和生成建模等任务。Matryoshka Representation Learning (MRL)最近出现作为一种适应嵌入长度的解决方案,但它需要完整的模型重新训练,并且在短长度下性能明显下降。在本文中,我们展示了稀疏编码提供了一个具有最小开销和更高保真度的吸引人的替代方案来实现自适应表示。我们提出了对比稀疏表示(CSR),一种将预训练的嵌入稀疏化为一个高维但有选择性激活的特征空间的方法。通过利用轻量级自编码和任务感知对比目标,CSR在保留语义质量的同时,允许在不同稀疏度水平上进行灵活、经济高效的推断。对图像、文本和多模态基准的广泛实验表明,CSR在准确性和检索速度方面始终优于MRL,通常差距很大,同时还将训练时间缩短为MRL所需时间的一小部分。我们的结果将稀疏编码确立为在效率和保真度都至关重要的实际应用中自适应表示学习的强大范式。代码可在https://github.com/neilwen987/CSR_Adaptive_Rep上找到。
更新时间: 2025-03-05 17:51:09
领域: cs.LG,cs.AI,cs.CV,cs.IR
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
Large language models (LLMs) have achieved reasonable quality improvements in machine translation (MT). However, most current research on MT-LLMs still faces significant challenges in maintaining translation consistency and accuracy when processing entire documents. In this paper, we introduce DelTA, a Document-levEL Translation Agent designed to overcome these limitations. DelTA features a multi-level memory structure that stores information across various granularities and spans, including Proper Noun Records, Bilingual Summary, Long-Term Memory, and Short-Term Memory, which are continuously retrieved and updated by auxiliary LLM-based components. Experimental results indicate that DelTA significantly outperforms strong baselines in terms of translation consistency and quality across four open/closed-source LLMs and two representative document translation datasets, achieving an increase in consistency scores by up to 4.58 percentage points and in COMET scores by up to 3.16 points on average. DelTA employs a sentence-by-sentence translation strategy, ensuring no sentence omissions and offering a memory-efficient solution compared to the mainstream method. Furthermore, DelTA improves pronoun and context-dependent translation accuracy, and the summary component of the agent also shows promise as a tool for query-based summarization tasks. The code and data of our approach are released at https://github.com/YutongWang1216/DocMTAgent.
Updated: 2025-03-05 17:50:44
标题: DelTA:基于多级记忆的在线文档级翻译代理
摘要: 大型语言模型(LLMs)在机器翻译(MT)中取得了合理的质量改进。然而,目前大多数关于MT-LLMs的研究仍然面临着在处理整个文档时保持翻译一致性和准确性的重大挑战。在本文中,我们介绍了DelTA,一个旨在克服这些限制的文档级翻译代理。DelTA具有多级记忆结构,跨越不同的粒度和跨度存储信息,包括专有名词记录、双语摘要、长期记忆和短期记忆,这些信息由辅助LLM组件持续检索和更新。实验结果表明,DelTA在四个开源/闭源LLMs和两个代表性文档翻译数据集上在翻译一致性和质量方面明显优于强基线,一致性得分增加了高达4.58个百分点,COMET得分平均增加了3.16个点。DelTA采用逐句翻译策略,确保没有句子遗漏,并提供了与主流方法相比更具内存效率的解决方案。此外,DelTA提高了代词和上下文相关翻译的准确性,代理的摘要组件也显示出作为基于查询摘要任务工具的潜力。我们的方法的代码和数据已在https://github.com/YutongWang1216/DocMTAgent发布。
更新时间: 2025-03-05 17:50:44
领域: cs.CL,cs.AI
ZAugNet for Z-Slice Augmentation in Bio-Imaging
Three-dimensional biological microscopy has significantly advanced our understanding of complex biological structures. However, limitations due to microscopy techniques, sample properties or phototoxicity often result in poor z-resolution, hindering accurate cellular measurements. Here, we introduce ZAugNet, a fast, accurate, and self-supervised deep learning method for enhancing z-resolution in biological images. By performing nonlinear interpolation between consecutive slices, ZAugNet effectively doubles resolution with each iteration. Compared on several microscopy modalities and biological objects, it outperforms competing methods on most metrics. Our method leverages a generative adversarial network (GAN) architecture combined with knowledge distillation to maximize prediction speed without compromising accuracy. We also developed ZAugNet+, an extended version enabling continuous interpolation at arbitrary distances, making it particularly useful for datasets with nonuniform slice spacing. Both ZAugNet and ZAugNet+ provide high-performance, scalable z-slice augmentation solutions for large-scale 3D imaging. They are available as open-source frameworks in PyTorch, with an intuitive Colab notebook interface for easy access by the scientific community.
Updated: 2025-03-05 17:50:35
标题: ZAugNet用于生物成像中的Z切片增强
摘要: 三维生物显微镜技术显著推动了我们对复杂生物结构的理解。然而,由于显微镜技术、样本特性或光毒性等限制,通常会导致z分辨率不佳,从而影响准确的细胞测量。在这里,我们介绍了一种名为ZAugNet的快速、准确且自监督的深度学习方法,用于增强生物图像的z分辨率。通过在连续切片之间执行非线性插值,ZAugNet可以有效地在每次迭代中将分辨率加倍。在多种显微镜模式和生物对象上进行比较,它在大多数指标上优于竞争方法。我们的方法利用生成敌对网络(GAN)架构结合知识蒸馏,以最大化预测速度而不影响准确性。我们还开发了ZAugNet+,这是一个扩展版本,可以在任意距离上进行连续插值,特别适用于具有非均匀切片间距的数据集。ZAugNet和ZAugNet+为大规模3D成像提供高性能、可扩展的z切片增强解决方案。它们作为PyTorch中的开源框架提供,还提供了直观的Colab笔记本界面,以便科学界轻松访问。
更新时间: 2025-03-05 17:50:35
领域: cs.CV,cs.AI,eess.IV,q-bio.QM,68,I.4.3; I.4.4; I.2.0; J.3
ILLC: Iterative Layer-by-Layer Compression for Enhancing Structural Faithfulness in SpArX
In the field of Explainable Artificial Intelligence (XAI), argumentative XAI approaches have been proposed to represent the internal reasoning process of deep neural networks in a more transparent way by interpreting hidden nodes as arguments. However, as the number of layers increases, existing compression methods simplify all layers at once, which leads to high cumulative information loss. To compensate for this, we propose an iterative layer-by-layer compression technique in which each layer is compressed separately and the resulting reduction error is immediately compensated for in the next layer, thereby improving the overall input-output and structural fidelity of the model. Experiments on the Breast Cancer Diagnosis dataset show that, compared to traditional compression, the method reduces input-output and structural unfaithfulness, and maintains a more consistent attack-support relationship in the Argumentative Explanation scheme. This is significant because it provides a new way to make complex MLP models more compact while still conveying their internal inference logic without distortion.
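A minimal sketch of the layer-by-layer idea for a small MLP (illustrative only, not the paper's code): hidden neurons of one layer are merged by clustering, and the following layer is immediately refit so that it compensates for the merge before compression moves on.

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_layer(W_in, b_in, W_out, X, n_keep):
    """Merge hidden neurons of one layer by clustering their incoming weights,
    then refit the next layer's weights so it compensates for the merge."""
    H = np.maximum(X @ W_in + b_in, 0.0)                     # original hidden activations (ReLU)
    target = H @ W_out                                       # what the next layer used to receive

    km = KMeans(n_clusters=n_keep, n_init=10, random_state=0)
    labels = km.fit_predict(np.concatenate([W_in, b_in[None, :]]).T)

    # Merged incoming weights: average of each cluster's members.
    W_in_c = np.stack([W_in[:, labels == c].mean(axis=1) for c in range(n_keep)], axis=1)
    b_in_c = np.array([b_in[labels == c].mean() for c in range(n_keep)])

    # Compensate in the next layer: least-squares refit of the outgoing weights.
    H_c = np.maximum(X @ W_in_c + b_in_c, 0.0)
    W_out_c, *_ = np.linalg.lstsq(H_c, target, rcond=None)
    return W_in_c, b_in_c, W_out_c

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                               # toy sample inputs (tabular features)
W1, b1 = rng.normal(size=(10, 32)), rng.normal(size=32)      # toy MLP weights
W2 = rng.normal(size=(32, 1))

W1c, b1c, W2c = compress_layer(W1, b1, W2, X, n_keep=8)
err = np.abs(np.maximum(X @ W1c + b1c, 0) @ W2c - np.maximum(X @ W1 + b1, 0) @ W2).mean()
print(f"mean absolute output drift after 32 -> 8 neuron compression: {err:.4f}")
```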
Updated: 2025-03-05 17:43:49
标题: ILLC:通过迭代逐层压缩增强SpArX中的结构忠实性
摘要: 在可解释人工智能(XAI)领域,已经提出了论证性XAI方法来更透明地表示深度神经网络的内部推理过程,通过将隐藏节点解释为论点。然而,随着层数的增加,现有的压缩方法一次简化所有层,导致高累积信息丢失。为了补偿这一点,我们提出了一种迭代的逐层压缩技术,其中每一层都被单独压缩,下一层的减少误差立即得到补偿,从而提高模型的整体输入输出和结构保真度。对乳腺癌诊断数据集的实验表明,与传统压缩相比,该方法减少了输入输出和结构的不忠实,并在论证性解释方案中保持了更一致的攻击-支持关系。这是重要的,因为它提供了一种新方法,使复杂的MLP模型更紧凑,同时仍传达其内部推理逻辑而不失真。
更新时间: 2025-03-05 17:43:49
领域: cs.AI
Mixture of Experts Made Intrinsically Interpretable
Neurons in large language models often exhibit \emph{polysemanticity}, simultaneously encoding multiple unrelated concepts and obscuring interpretability. Instead of relying on post-hoc methods, we present \textbf{MoE-X}, a Mixture-of-Experts (MoE) language model designed to be \emph{intrinsically} interpretable. Our approach is motivated by the observation that, in language models, wider networks with sparse activations are more likely to capture interpretable factors. However, directly training such large sparse networks is computationally prohibitive. MoE architectures offer a scalable alternative by activating only a subset of experts for any given input, inherently aligning with interpretability objectives. In MoE-X, we establish this connection by rewriting the MoE layer as an equivalent sparse, large MLP. This approach enables efficient scaling of the hidden size while maintaining sparsity. To further enhance interpretability, we enforce sparse activation within each expert and redesign the routing mechanism to prioritize experts with the highest activation sparsity. These designs ensure that only the most salient features are routed and processed by the experts. We evaluate MoE-X on chess and natural language tasks, showing that it achieves performance comparable to dense models while significantly improving interpretability. MoE-X achieves a perplexity better than GPT-2, with interpretability surpassing even sparse autoencoder (SAE)-based approaches.
Updated: 2025-03-05 17:40:54
标题: 专家混合模型的内在可解释性
摘要: 大型语言模型中的神经元经常表现出多义性,同时编码多个不相关的概念并且模糊了可解释性。我们提出了MoE-X,这是一个专为内在可解释性而设计的混合专家(MoE)语言模型,而不是依赖事后方法。我们的方法受到一个观察的启发,在语言模型中,具有稀疏激活的更宽的网络更有可能捕捉可解释的因素。然而,直接训练这样的大型稀疏网络在计算上是不可行的。MoE架构提供了一种可扩展的替代方案,通过仅激活给定输入的一小部分专家,固有地与可解释性目标保持一致。在MoE-X中,我们通过将MoE层重写为等效的稀疏大型MLP来建立这种关系。这种方法使得隐藏大小的有效扩展同时保持稀疏性。为了进一步增强可解释性,我们要求每个专家内部进行稀疏激活,并重新设计路由机制以优先考虑激活稀疏性最高的专家。这些设计确保只有最显著的特征被专家路由和处理。我们在国际象棋和自然语言任务上评估了MoE-X,结果显示它在保持可比性的同时显著提高了可解释性。MoE-X的困惑度优于GPT-2,可解释性甚至超过了基于稀疏自动编码器(SAE)的方法。
更新时间: 2025-03-05 17:40:54
领域: cs.LG,cs.CL
Replicating Human Social Perception in Generative AI: Evaluating the Valence-Dominance Model
As artificial intelligence (AI) continues to advance--particularly in generative models--an open question is whether these systems can replicate foundational models of human social perception. A well-established framework in social cognition suggests that social judgments are organized along two primary dimensions: valence (e.g., trustworthiness, warmth) and dominance (e.g., power, assertiveness). This study examines whether multimodal generative AI systems can reproduce this valence-dominance structure when evaluating facial images and how their representations align with those observed across world regions. Through principal component analysis (PCA), we found that the extracted dimensions closely mirrored the theoretical structure of valence and dominance, with trait loadings aligning with established definitions. However, many world regions and generative AI models also exhibited a third component, the nature and significance of which warrant further investigation. These findings demonstrate that multimodal generative AI systems can replicate key aspects of human social perception, raising important questions about their implications for AI-driven decision-making and human-AI interactions.
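The dimensional analysis described here can be reproduced in miniature with standard tooling; the sketch below runs a PCA over a toy matrix of trait ratings (faces by traits) and inspects the loadings. The trait names and the simulated two-factor rating structure are illustrative assumptions, not the study's data.

```python
import numpy as np
from sklearn.decomposition import PCA

traits = ["trustworthy", "warm", "caring", "dominant", "aggressive", "confident"]
rng = np.random.default_rng(0)

# Simulate ratings of 300 faces: a valence factor drives the first three traits,
# a dominance factor drives the last three, plus rating noise.
valence = rng.normal(size=(300, 1))
dominance = rng.normal(size=(300, 1))
ratings = np.hstack([valence.repeat(3, axis=1), dominance.repeat(3, axis=1)])
ratings += 0.3 * rng.normal(size=ratings.shape)

pca = PCA(n_components=3)
pca.fit(ratings)
for i, component in enumerate(pca.components_):
    loadings = ", ".join(f"{t}={w:+.2f}" for t, w in zip(traits, component))
    print(f"PC{i + 1} ({pca.explained_variance_ratio_[i]:.0%} var): {loadings}")
```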
Updated: 2025-03-05 17:35:18
标题: 在生成式人工智能中复制人类社会感知:评估价值-支配模型
摘要: 随着人工智能(AI)不断发展,特别是在生成模型方面,一个悬而未决的问题是这些系统能否复制人类社会感知的基础模型。社会认知中的一个成熟框架表明,社会判断主要沿着两个主要维度组织:价值(例如,可信度,温暖)和支配力(例如,权力,果断)。本研究探讨了多模式生成AI系统在评估面部图像时是否能重现这种价值-支配结构,以及它们的表征如何与观察到的各个世界地区保持一致。通过主成分分析(PCA),我们发现提取的维度与价值和支配的理论结构非常相似,特质负荷与已建立的定义保持一致。然而,许多世界地区和生成AI模型还展示出第三个组件,其性质和重要性值得进一步调查。这些发现表明,多模式生成AI系统可以复制人类社会感知的关键方面,引发关于其对AI驱动决策和人机交互的影响的重要问题。
更新时间: 2025-03-05 17:35:18
领域: cs.CL,cs.AI
Towards Trustworthy Federated Learning
This paper develops a comprehensive framework to address three critical trustworthy challenges in federated learning (FL): robustness against Byzantine attacks, fairness, and privacy preservation. To improve the system's defense against Byzantine attacks that send malicious information to bias the system's performance, we develop a Two-sided Norm Based Screening (TNBS) mechanism, which allows the central server to crop the gradients that have the l lowest norms and h highest norms. TNBS functions as a screening tool to filter out potential malicious participants whose gradients are far from the honest ones. To promote egalitarian fairness, we adopt the q-fair federated learning (q-FFL). Furthermore, we adopt a differential privacy-based scheme to prevent raw data at local clients from being inferred by curious parties. Convergence guarantees are provided for the proposed framework under different scenarios. Experimental results on real datasets demonstrate that the proposed framework effectively improves robustness and fairness while managing the trade-off between privacy and accuracy. This work appears to be the first study that experimentally and theoretically addresses fairness, privacy, and robustness in trustworthy FL.
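The TNBS screening rule itself is simple enough to sketch directly: sort client updates by their norms and discard the l smallest and h largest before averaging. The snippet below is a toy illustration of that rule under assumed values of l and h, without the fairness or differential-privacy components of the full framework.

```python
import numpy as np

def tnbs_aggregate(client_grads, l=2, h=2):
    """Two-sided Norm Based Screening: drop the l smallest-norm and h largest-norm
    client updates, then average the survivors."""
    norms = np.linalg.norm(client_grads, axis=1)
    order = np.argsort(norms)
    kept = order[l:len(order) - h]                     # crop both tails of the norm ranking
    return client_grads[kept].mean(axis=0), kept

rng = np.random.default_rng(0)
honest = rng.normal(0.0, 0.1, size=(10, 5))            # well-behaved client gradients (toy)
byzantine = rng.normal(0.0, 10.0, size=(2, 5))         # attackers send huge updates (toy)
grads = np.vstack([honest, byzantine])

aggregate, kept = tnbs_aggregate(grads, l=2, h=2)
print("clients kept:", sorted(kept.tolist()))
print("aggregated gradient norm:", np.linalg.norm(aggregate).round(3))
```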
Updated: 2025-03-05 17:25:20
标题: 朝向可信赖的联邦学习
摘要: 本文制定了一个全面的框架,以解决联邦学习(FL)中的三个关键可信挑战:抵御拜占庭攻击、公平性和隐私保护。为了提高系统对发送恶意信息以偏离系统性能的拜占庭攻击的防御能力,我们开发了一个双边规范筛选(TNBS)机制,允许中央服务器截取具有较低规范和较高规范的梯度。TNBS作为一个筛选工具,用于过滤潜在的恶意参与者,其梯度与诚实者的梯度相去甚远。为了促进平等公平,我们采用了q-公平联邦学习(q-FFL)。此外,我们采用基于差分隐私的方案,防止本地客户端的原始数据被好奇的一方推断出来。在不同情景下,为所提出的框架提供了收敛保证。真实数据集上的实验结果表明,所提出的框架有效地提高了鲁棒性和公平性,同时平衡了隐私和准确性之间的权衡。这项工作似乎是第一项在可信FL中实验和理论上解决公平性、隐私和鲁棒性的研究。
更新时间: 2025-03-05 17:25:20
领域: cs.LG,cs.CR,cs.DC
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
Modern deep learning models often achieve high overall performance, but consistently fail on specific subgroups. Group distributionally robust optimization (group DRO) addresses this problem by minimizing the worst-group loss, but it fails when group losses misrepresent performance differences between groups. This is common in domains like speech, where the widely used connectionist temporal classification (CTC) loss scales with input length and varies with linguistic and acoustic properties, leading to spurious differences between group losses. We present CTC-DRO, which addresses the shortcomings of the group DRO objective by smoothing the group weight update to prevent overemphasis on consistently high-loss groups, while using input length-matched batching to mitigate CTC's scaling issues. We evaluate CTC-DRO on the task of multilingual automatic speech recognition (ASR) across five language sets from the ML-SUPERB 2.0 benchmark. CTC-DRO consistently outperforms group DRO and CTC-based baseline models, reducing the worst-language error by up to 47.1% and the average error by up to 32.9%. CTC-DRO can be applied to ASR with minimal computational costs, and offers the potential for reducing group disparities in other domains with similar challenges.
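The smoothed worst-group weighting at the center of CTC-DRO can be illustrated independently of speech models. The sketch below contrasts a standard exponentiated group-DRO weight update with a smoothed variant that caps how strongly a persistently high-loss group can dominate; the smoothing form and constants are illustrative assumptions rather than the paper's exact rule.

```python
import numpy as np

def group_weights(group_losses, eta=1.0, smoothing=0.0):
    """Exponentiated-gradient style group weights. With smoothing > 0 the update
    uses losses divided by (smoothing + loss), damping consistently large losses."""
    losses = np.asarray(group_losses, dtype=float)
    if smoothing > 0:
        losses = losses / (smoothing + losses)        # bounded in [0, 1): soft cap
    w = np.exp(eta * losses)
    return w / w.sum()

# Toy language-group losses where one group's CTC loss is inflated (e.g. longer utterances).
losses = [1.2, 1.4, 1.3, 4.0]
print("plain group DRO :", group_weights(losses).round(3))
print("smoothed update :", group_weights(losses, smoothing=2.0).round(3))
```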
Updated: 2025-03-05 17:25:07
标题: CTC-DRO:降低语言差异的鲁棒优化在语音识别中的应用
摘要: 现代深度学习模型通常能够取得高整体性能,但在特定子群中经常失败。群体分布鲁棒优化(group DRO)通过最小化最差群体损失来解决这个问题,但当群体损失误导性地表现出群体之间的性能差异时,该方法会失败。在诸如语音领域这样的领域中,广泛使用的连接主义时间分类(CTC)损失随着输入长度而变化,并且随着语言和声学特性的不同而变化,导致群体损失之间出现虚假差异。我们提出了CTC-DRO,通过平滑群体权重更新来解决群体DRO目标的缺点,以防止过度强调一直高损失的群体,同时使用输入长度匹配的批处理来减轻CTC的缩放问题。我们在ML-SUPERB 2.0基准测试中的五种语言集上评估了CTC-DRO在多语言自动语音识别(ASR)任务上的表现。CTC-DRO始终优于群体DRO和基于CTC的基线模型,将最差语言错误率降低了高达47.1%,平均错误率降低了高达32.9%。CTC-DRO可以以最小的计算成本应用于ASR,并有望在其他具有类似挑战的领域中减少群体差异。
更新时间: 2025-03-05 17:25:07
领域: cs.LG,cs.CL,eess.AS
Multi-Agent Systems Powered by Large Language Models: Applications in Swarm Intelligence
This work examines the integration of large language models (LLMs) into multi-agent simulations by replacing the hard-coded programs of agents with LLM-driven prompts. The proposed approach is showcased in the context of two examples of complex systems from the field of swarm intelligence: ant colony foraging and bird flocking. Central to this study is a toolchain that integrates LLMs with the NetLogo simulation platform, leveraging its Python extension to enable communication with GPT-4o via the OpenAI API. This toolchain facilitates prompt-driven behavior generation, allowing agents to respond adaptively to environmental data. For both example applications mentioned above, we employ both structured, rule-based prompts and autonomous, knowledge-driven prompts. Our work demonstrates how this toolchain enables LLMs to study self-organizing processes and induce emergent behaviors within multi-agent environments, paving the way for new approaches to exploring intelligent systems and modeling swarm intelligence inspired by natural phenomena. We provide the code, including simulation files and data at https://github.com/crjimene/swarm_gpt.
Updated: 2025-03-05 17:13:27
标题: 由大型语言模型驱动的多智能体系统:在群体智能中的应用
摘要: 这项工作探讨了将大型语言模型(LLMs)集成到多智能体模拟中,通过用LLM驱动的提示替换智能体的硬编码程序。所提出的方法在群体智能领域的两个复杂系统示例中展示:蚁群觅食和鸟群飞行。这项研究的核心是一个工具链,将LLMs与NetLogo模拟平台集成,在其Python扩展的基础上,通过OpenAI API与GPT-4o进行通信。这个工具链促进了基于提示的行为生成,使智能体能够自适应地响应环境数据。对于上述两个示例应用,我们采用了结构化的基于规则的提示和自主的知识驱动提示。我们的工作展示了这个工具链如何使LLMs能够研究自组织过程,并在多智能体环境中引发新的行为,为探索受自然现象启发的智能系统和模拟群体智能的新方法铺平了道路。我们提供了代码,包括模拟文件和数据,网址为https://github.com/crjimene/swarm_gpt。
更新时间: 2025-03-05 17:13:27
领域: cs.MA,cs.AI,cs.CL,cs.LG,I.6.0; I.2.7
PyGen: A Collaborative Human-AI Approach to Python Package Creation
The principles of automation and innovation serve as foundational elements for advancement in contemporary science and technology. Here, we introduce Pygen, an automation platform designed to empower researchers, technologists, and hobbyists to bring abstract ideas to life as core, usable software tools written in Python. Pygen leverages the immense power of autoregressive large language models to augment human creativity during the ideation, iteration, and innovation process. By combining state-of-the-art language models with open-source code generation technologies, Pygen has significantly reduced the manual overhead of tool development. From a user prompt, Pygen automatically generates Python packages for a complete workflow from concept to package generation and documentation. The findings of our work show that Pygen considerably enhances the researcher's productivity by enabling the creation of resilient, modular, and well-documented packages for various specialized purposes. We employ a prompt enhancement approach to distill the user's package description into an increasingly specific and actionable form. Although package generation is inherently an open-ended task, we have evaluated the generated packages and the documentation using Human Evaluation, LLM-based evaluation, and CodeBLEU, with detailed results in the results section. Furthermore, we documented our results, analyzed the limitations, and suggested strategies to alleviate them. Pygen is our vision of ethical automation, a framework that promotes inclusivity, accessibility, and collaborative development. This project marks the beginning of a large-scale effort towards creating tools where intelligent agents collaborate with humans to improve scientific and technological development substantially. Our code and generated examples are open-sourced at [https://github.com/GitsSaikat/Pygen]
Updated: 2025-03-05 17:11:13
标题: PyGen:一种协作的人工智能方法用于Python软件包的创建
摘要: 自动化和创新原则是当代科学技术进步的基础要素。在这里,我们介绍了Pygen,这是一个旨在赋予研究人员、技术人员和爱好者能力的自动化平台,可以将抽象的想法变为核心、可用的软件工具,使用Python编写。Pygen利用自回归大型语言模型的巨大能力,在构思、迭代和创新过程中增强人类的创造力。通过将最先进的语言模型与开源代码生成技术相结合,Pygen显著减少了工具开发的手动成本。从用户提示开始,Pygen会自动生成Python包,完整地完成从概念到包生成和文档编写的工作流程。我们的研究结果表明,Pygen通过帮助创建针对各种专业用途的坚固、模块化和良好文档化的软件包,显著提高了研究人员的生产力。我们采用提示增强方法,将用户的软件包描述提炼为越来越具体和可操作的内容。虽然这是一个本质上开放式的任务,但我们已经通过人类评估、基于LLM的评估和CodeBLEU对生成的软件包和文档进行了评估,详细结果见结果部分。此外,我们记录了我们的结果,分析了限制,并提出了缓解限制的策略。Pygen是我们对伦理自动化的愿景,这是一个促进包容性、可访问性和协作开发的框架。这个项目标志着一个大规模努力的开始,旨在创建智能代理与人类合作,大幅改善科学和技术发展。 我们的代码和生成的示例可以在[https://github.com/GitsSaikat/Pygen]上找到。
更新时间: 2025-03-05 17:11:13
领域: cs.SE,cs.AI
Optimally Installing Strict Equilibria
In this work, we develop a reward design framework for installing a desired behavior as a strict equilibrium across standard solution concepts: dominant strategy equilibrium, Nash equilibrium, correlated equilibrium, and coarse correlated equilibrium. We also extend our framework to capture the Markov-perfect equivalents of each solution concept. Central to our framework is a comprehensive mathematical characterization of strict installability, based on the desired solution concept and the behavior's structure. These characterizations lead to efficient iterative algorithms, which we generalize to handle optimization objectives through linear programming. Finally, we explore how our results generalize to boundedly rational agents.
Updated: 2025-03-05 17:11:02
标题: 最佳安装严格均衡
摘要: 在这项工作中,我们开发了一个奖励设计框架,用于在标准解决方案概念中安装所期望的行为作为严格均衡:主导策略均衡、纳什均衡、相关均衡和粗糙相关均衡。我们还扩展了我们的框架,以捕捉每个解决方案概念的马尔可夫完美等效。我们框架的核心是对严格可安装的全面数学特征的描述,基于所期望的解决方案概念和行为的结构。这些特征导致高效的迭代算法,我们将其推广以通过线性规划处理优化目标。最后,我们探讨了我们的结果如何推广到有界理性的代理人。
更新时间: 2025-03-05 17:11:02
领域: cs.GT,cs.LG
Bonsai: Gradient-free Graph Distillation for Node Classification
Graph distillation has emerged as a promising avenue to enable scalable training of GNNs by compressing the training dataset while preserving essential graph characteristics. Our study uncovers significant shortcomings in current graph distillation techniques. First, the majority of the algorithms paradoxically require training on the full dataset to perform distillation. Second, due to their gradient-emulating approach, these methods require fresh distillation for any change in hyperparameters or GNN architecture, limiting their flexibility and reusability. Finally, they fail to achieve substantial size reduction due to synthesizing fully-connected, edge-weighted graphs. To address these challenges, we present Bonsai, a novel graph distillation method empowered by the observation that \textit{computation trees} form the fundamental processing units of message-passing GNNs. Bonsai distills datasets by encoding a careful selection of \textit{exemplar} trees that maximize the representation of all computation trees in the training set. This unique approach imparts Bonsai as the first linear-time, model-agnostic graph distillation algorithm for node classification that outperforms existing baselines across $6$ real-world datasets on accuracy, while being $22$ times faster on average. Bonsai is grounded in rigorous mathematical guarantees on the adopted approximation strategies making it robust to GNN architectures, datasets, and parameters.
Updated: 2025-03-05 17:09:46
标题: 盆景:无梯度图蒸馏用于节点分类
摘要: 图形精炼已成为一种有前途的途径,可通过压缩训练数据集来训练规模可扩展的GNN,并同时保留基本的图特征。我们的研究揭示了当前图形精炼技术存在的重大缺陷。首先,大多数算法矛盾地要求在完整数据集上进行训练才能进行精炼。其次,由于它们的梯度模拟方法,这些方法需要针对超参数或GNN架构的任何更改进行新的精炼,限制了它们的灵活性和可重复使用性。最后,它们未能实现实质性的尺寸缩减,因为它们合成了完全连接的、边权重化的图形。为了解决这些挑战,我们提出了Bonsai,一种新颖的图形精炼方法,其观察结果表明“计算树”形成消息传递GNN的基本处理单元。Bonsai通过对最大化训练集中所有计算树表示的一组精心选择的“典范”树进行编码来精炼数据集。这种独特的方法使Bonsai成为第一个用于节点分类的线性时间、与模型无关的图形精炼算法,在精度上优于现有基准,在6个真实世界数据集上平均快22倍。Bonsai基于采用的近似策略具有严格的数学保证,使其对GNN架构、数据集和参数具有鲁棒性。
更新时间: 2025-03-05 17:09:46
领域: cs.LG,cs.AI
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
The cosine router in Mixture of Experts (MoE) has recently emerged as an attractive alternative to the conventional linear router. Indeed, the cosine router demonstrates favorable performance in image and language tasks and exhibits better ability to mitigate the representation collapse issue, which often leads to parameter redundancy and limited representation potentials. Despite its empirical success, a comprehensive analysis of the cosine router in MoE has been lacking. Considering the least square estimation of the cosine routing MoE, we demonstrate that due to the intrinsic interaction of the model parameters in the cosine router via some partial differential equations, regardless of the structures of the experts, the estimation rates of experts and model parameters can be as slow as $\mathcal{O}(1/\log^{\tau}(n))$ where $\tau > 0$ is some constant and $n$ is the sample size. Surprisingly, these pessimistic non-polynomial convergence rates can be circumvented by the widely used technique in practice to stabilize the cosine router -- simply adding noises to the $\ell^2$-norms in the cosine router, which we refer to as \textit{perturbed cosine router}. Under the strongly identifiable settings of the expert functions, we prove that the estimation rates for both the experts and model parameters under the perturbed cosine routing MoE are significantly improved to polynomial rates. Finally, we conduct extensive simulation studies in both synthetic and real data settings to empirically validate our theoretical results.
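The perturbation itself is a one-line change, shown below in a toy routing sketch: noise is added to the l2-norms in the denominator of the cosine score. The dimensions, noise scale, and temperature are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

def cosine_router(x, expert_embeds, temperature=0.1, norm_noise=0.0, rng=None):
    """Cosine routing probabilities; norm_noise > 0 gives the 'perturbed cosine router'
    by adding noise to the l2-norms before the division."""
    rng = rng or np.random.default_rng(0)
    x_norm = np.linalg.norm(x, axis=-1, keepdims=True)
    e_norm = np.linalg.norm(expert_embeds, axis=-1, keepdims=True)
    if norm_noise > 0:
        x_norm = x_norm + norm_noise * np.abs(rng.normal(size=x_norm.shape))
        e_norm = e_norm + norm_noise * np.abs(rng.normal(size=e_norm.shape))
    scores = (x @ expert_embeds.T) / (x_norm * e_norm.T) / temperature
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))   # stable softmax over experts
    return probs / probs.sum(axis=-1, keepdims=True)

# Toy routing example: 3 tokens routed over 4 experts.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 16))
experts = rng.normal(size=(4, 16))
print("cosine router   :", cosine_router(tokens, experts).round(3))
print("perturbed router:", cosine_router(tokens, experts, norm_noise=0.5).round(3))
```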
Updated: 2025-03-05 17:05:55
标题: 混合专家中扰动余弦路由器的统计优势
摘要: 在Mixture of Experts(MoE)中,余弦路由器最近作为传统线性路由器的一种有吸引力的替代方案出现。事实上,余弦路由器在图像和语言任务中表现出良好的性能,并且展现出更好的能力来减轻表示坍塌问题,这经常导致参数冗余和有限的表示潜力。尽管余弦路由器在实证上取得了成功,但对MoE中余弦路由器的全面分析尚缺乏。考虑到余弦路由MoE的最小二乘估计,我们证明由于余弦路由器中模型参数的固有相互作用通过一些偏微分方程,无论专家的结构如何,专家和模型参数的估计速率都可能慢至$\mathcal{O}(1/\log^{\tau}(n))$,其中$\tau>0$是一些常数,$n$是样本量。令人惊讶的是,这些悲观的非多项式收敛速率可以通过在实践中广泛使用的技术来稳定余弦路由器来避免--简单地向余弦路由器中的$\ell^2$范数添加噪声,我们称之为\textit{扰动余弦路由器}。在专家函数的强可识别设置下,我们证明在扰动余弦路由MoE中,专家和模型参数的估计速率显着提高到多项式速率。最后,我们进行了大量的仿真研究,包括合成数据和真实数据设置,以实证验证我们的理论结果。
更新时间: 2025-03-05 17:05:55
领域: stat.ML,cs.LG
Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models
We present Attentive Reasoning Queries (ARQs), a novel structured reasoning approach that significantly improves instruction-following in Large Language Models through domain-specialized reasoning blueprints. While LLMs demonstrate remarkable capabilities across diverse tasks, they often fail to maintain adherence to complex, use-case-specific instructions during multi-turn conversations, presenting challenges for business-critical applications. ARQs address this limitation by guiding LLMs through systematic reasoning steps with targeted queries that reinstate critical instructions and facilitate intermediate reasoning throughout the completion process. In extensive testing within Parlant, our framework for reliable customer-facing agents in which ARQs were born out of necessity, they achieved a 90.2% success rate across 87 test scenarios, outperforming both Chain-of-Thought reasoning (86.1%) and direct response generation (81.5%). ARQs showed particular strength in addressing persistent failure modes like guideline re-application and hallucination prevention. Our analysis also revealed that ARQs can potentially be more computationally efficient than free-form reasoning when carefully designed. These findings demonstrate that structured reasoning approaches provide effective mechanisms for controlling how LLMs process information and make decisions in complex scenarios.
Updated: 2025-03-05 17:03:48
标题: 专注推理查询:优化大型语言模型遵循指令的系统方法
摘要: 我们提出了专注推理查询(ARQs),这是一种新颖的结构化推理方法,通过领域专门化的推理蓝图显著提高了大型语言模型在指令遵循方面的表现。虽然大型语言模型在各种任务中展现出卓越的能力,但它们在多轮对话中往往无法遵循复杂的、特定用例的指令,这给商业关键应用带来了挑战。ARQs通过引导大型语言模型进行系统化推理步骤,使用有针对性的查询来恢复关键指令并促进在完成过程中的中间推理,以解决这一限制。在Parlant中进行了广泛测试,这是我们为可靠的面向客户的代理商设计的框架,ARQs是必需的。在87个测试场景中,ARQs实现了90.2%的成功率,优于思维链推理(86.1%)和直接响应生成(81.5%)。ARQs在解决指南重新应用和幻觉预防等持续失败模式方面表现出特别的优势。我们的分析还表明,当精心设计时,ARQs可能比自由形式的推理更具计算效率。这些发现表明,结构化推理方法为控制大型语言模型处理信息和在复杂场景中做出决策提供了有效的机制。
更新时间: 2025-03-05 17:03:48
领域: cs.CL,cs.AI,I.2.7
Framing the Game: How Context Shapes LLM Decision-Making
Large Language Models (LLMs) are increasingly deployed across diverse contexts to support decision-making. While existing evaluations effectively probe latent model capabilities, they often overlook the impact of context framing on perceived rational decision-making. In this study, we introduce a novel evaluation framework that systematically varies evaluation instances across key features and procedurally generates vignettes to create highly varied scenarios. By analyzing decision-making patterns across different contexts with the same underlying game structure, we uncover significant contextual variability in LLM responses. Our findings demonstrate that this variability is largely predictable yet highly sensitive to framing effects. Our results underscore the need for dynamic, context-aware evaluation methodologies for real-world deployments.
Updated: 2025-03-05 17:03:28
标题: 框定游戏:上下文如何塑造LLM决策
摘要: 大型语言模型(LLMs)越来越多地在不同的环境中部署,以支持决策制定。虽然现有的评估有效地探究了潜在的模型能力,但它们常常忽视了背景框架对于感知理性决策的影响。在这项研究中,我们引入了一个新颖的评估框架,通过系统地变化评估实例的关键特征,并程序生成小品,创造高度多样化的场景。通过分析相同基本游戏结构下不同背景下的决策模式,我们发现LLM响应中存在显著的上下文变异性。我们的研究结果表明,这种变异性在很大程度上是可预测的,但受到框架效应的高度敏感。我们的结果强调了对于真实世界部署需要动态、上下文感知的评估方法论。
更新时间: 2025-03-05 17:03:28
领域: cs.CL,cs.AI,cs.GT
Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction
Analogical reasoning relies on conceptual abstractions, but it is unclear whether Large Language Models (LLMs) harbor such internal representations. We explore distilled representations from LLM activations and find that function vectors (FVs; Todd et al., 2024) - compact representations for in-context learning (ICL) tasks - are not invariant to simple input changes (e.g., open-ended vs. multiple-choice), suggesting they capture more than pure concepts. Using representational similarity analysis (RSA), we localize a small set of attention heads that encode invariant concept vectors (CVs) for verbal concepts like "antonym". These CVs function as feature detectors that operate independently of the final output - meaning that a model may form a correct internal representation yet still produce an incorrect output. Furthermore, CVs can be used to causally guide model behaviour. However, for more abstract concepts like "previous" and "next", we do not observe invariant linear representations, a finding we link to generalizability issues LLMs display within these domains.
Updated: 2025-03-05 16:59:08
标题: 大型语言模型中的类比推理:概念向量和抽象的极限
摘要: 类比推理依赖于概念抽象,但目前尚不清楚大型语言模型(LLMs)是否具有这种内部表示。我们探究了从LLM激活中提炼出来的表示,并发现功能向量(FVs;Todd等人,2024年)——用于上下文学习任务的紧凑表示——并不对简单输入变化(例如,开放式与多项选择)具有不变性,这表明它们捕捉到的不仅仅是纯粹的概念。通过使用表示相似性分析(RSA),我们定位了一小组注意力头部,这些头部编码了针对词语概念(如“反义词”)的不变概念向量(CVs)。这些CVs作为特征检测器独立于最终输出运行,这意味着一个模型可能形成了正确的内部表示,但仍会产生不正确的输出。此外,CVs可以用来因果地引导模型行为。然而,对于更抽象的概念,如“前一个”和“后一个”,我们并未观察到不变的线性表示,这一发现与LLMs在这些领域内展示的泛化性问题相关。
更新时间: 2025-03-05 16:59:08
领域: cs.CL,cs.LG
A Generative Approach to High Fidelity 3D Reconstruction from Text Data
The convergence of generative artificial intelligence and advanced computer vision technologies introduces a groundbreaking approach to transforming textual descriptions into three-dimensional representations. This research proposes a fully automated pipeline that seamlessly integrates text-to-image generation, various image processing techniques, and deep learning methods for reflection removal and 3D reconstruction. By leveraging state-of-the-art generative models like Stable Diffusion, the methodology translates natural language inputs into detailed 3D models through a multi-stage workflow. The reconstruction process begins with the generation of high-quality images from textual prompts, followed by enhancement by a reinforcement learning agent and reflection removal using the Stable Delight model. Advanced image upscaling and background removal techniques are then applied to further enhance visual fidelity. These refined two-dimensional representations are subsequently transformed into volumetric 3D models using sophisticated machine learning algorithms, capturing intricate spatial relationships and geometric characteristics. This process achieves a highly structured and detailed output, ensuring that the final 3D models reflect both semantic accuracy and geometric precision. This approach addresses key challenges in generative reconstruction, such as maintaining semantic coherence, managing geometric complexity, and preserving detailed visual information. Comprehensive experimental evaluations will assess reconstruction quality, semantic accuracy, and geometric fidelity across diverse domains and varying levels of complexity. By demonstrating the potential of AI-driven 3D reconstruction techniques, this research offers significant implications for fields such as augmented reality (AR), virtual reality (VR), and digital content creation.
Updated: 2025-03-05 16:54:15
标题: 从文本数据实现高保真度3D重建的生成式方法
摘要: 生成式人工智能和先进的计算机视觉技术的融合引入了一种突破性的方法,将文本描述转化为三维表示。本研究提出了一个完全自动化的流程,无缝集成了文本到图像生成、各种图像处理技术以及用于反射消除和三维重建的深度学习方法。通过利用像Stable Diffusion这样的最新生成模型,该方法通过多阶段工作流程将自然语言输入转化为详细的三维模型。 重建过程始于从文本提示生成高质量图像,接着通过一个强化学习代理增强图像,并使用Stable Delight模型消除反射。随后,先进的图像放大和背景消除技术被应用以进一步提升视觉保真度。这些经过精细化处理的二维表示随后通过复杂的机器学习算法转化为体积三维模型,捕捉精细的空间关系和几何特征。该过程实现了高度结构化和详细化的输出,确保最终的三维模型既反映了语义准确性,又具有几何精度。 这种方法解决了生成重建中的关键挑战,如保持语义连贯性、管理几何复杂性和保留详细的视觉信息。全面的实验评估将评估重建质量、语义准确性和几何保真度,跨越不同领域和不同复杂程度。通过展示AI驱动的3D重建技术的潜力,本研究为增强现实(AR)、虚拟现实(VR)和数字内容创作等领域提供了重要的启示。
更新时间: 2025-03-05 16:54:15
领域: cs.CV,cs.AI
A privacy-preserving, distributed and cooperative FCM-based learning approach for cancer research
Distributed Artificial Intelligence is attracting interest day by day. In this paper, the authors introduce an innovative methodology for distributed learning of Particle Swarm Optimization-based Fuzzy Cognitive Maps in a privacy-preserving way. The authors design a training scheme for collaborative FCM learning that offers data privacy compliant with the current regulation. This method is applied to a cancer detection problem, proving that the performance of the model is improved by the Federated Learning process, and obtaining similar results to the ones that can be found in the literature.
Updated: 2025-03-05 16:51:06
标题: 一个隐私保护的、分布式和合作的基于FCM的癌症研究学习方法
摘要: 分布式人工智能正在日益受到关注。在本文中,作者介绍了一种创新的方法,用于以保护隐私的方式分布式学习基于粒子群优化的模糊认知图。作者设计了一个合作FCM学习的训练方案,提供符合当前法规的数据隐私保护。该方法应用于癌症检测问题,证明了通过联邦学习过程改进了模型的性能,并获得了与文献中相似的结果。
更新时间: 2025-03-05 16:51:06
领域: cs.AI,cs.DC
SMAC-R1: The Emergence of Intelligence in Decision-Making Tasks
StarCraft Multi-Agent Challenge (SMAC) has been one of the most commonly used experimental environments in multi-agent reinforcement learning (MARL), where the specific task is to control a set number of allied units to defeat enemy forces. Traditional MARL algorithms often require interacting with the environment for millions of steps to train a parametric model, of which the resulting policies are typically non-interpretable with weak transferability. In this paper, we introduce SMAC-R1 which is based on the Qwen2.5-7B-Base LLM distilled from DeepSeek-Coder-v2.5-236B. Similar to online reinforcement learning after behavior cloning in offline learning process, in our pipeline, agents leverage the DeepSeek LLM to generate decision tree code by providing task descriptions, and the agents are further self-reflected using feedback from the rewards provided by the environment. Based on that, we augment the generated scripts to fine-tune a small LLM, Qwen2.5-7B-Base, to distill the decision-making ability via Supervised Fine-Tuning (SFT) and enhance the script generation ability by the Group Relative Policy Optimization (GRPO) algorithm. We conduct experiments in the original 23 SMAC tasks and 10 newly-designed tasks to demonstrate that our method can produce high-quality, interpretable decision trees with minimal environmental exploration. Moreover, these scripts exhibit strong transferability, successfully applying to homogeneous SMAC environments without modification. We believe this approach offers a new direction for solving decision-making tasks and domain-specific LLM training pipelines in the future.
Updated: 2025-03-05 16:49:51
标题: SMAC-R1:决策任务中智能的出现
摘要: 星际争霸多智能体挑战(SMAC)是多智能体强化学习(MARL)中最常用的实验环境之一,其具体任务是控制一定数量的盟军单位击败敌军部队。传统的MARL算法通常需要与环境进行数百万步的交互,以训练参数模型,生成的策略通常是无法解释的,并且具有较弱的可转移性。本文介绍了基于从DeepSeek-Coder-v2.5-236B蒸馏得到的Qwen2.5-7B-Base LLM的SMAC-R1。与离线学习过程中的行为克隆后的在线强化学习类似,在我们的流程中,代理利用DeepSeek LLM通过提供任务描述生成决策树代码,并通过环境提供的奖励反馈进一步自我反思。基于此,我们对生成的脚本进行增强,用于微调一个小型LLM(Qwen2.5-7B-Base),通过监督微调(SFT)提炼其决策能力,并通过组相对策略优化(GRPO)算法增强其脚本生成能力。我们在原始的23个SMAC任务和10个新设计的任务中进行实验,以证明我们的方法可以在最小的环境探索下产生高质量、可解释的决策树。此外,这些脚本显示出很强的可转移性,无需修改即可成功应用于同质SMAC环境。我们相信这种方法为未来解决决策任务和特定领域LLM训练管道提供了新的方向。
更新时间: 2025-03-05 16:49:51
领域: cs.AI
What to align in multimodal contrastive learning?
Humans perceive the world through multisensory integration, blending the information of different modalities to adapt their behavior. Contrastive learning offers an appealing solution for multimodal self-supervised learning. Indeed, by considering each modality as a different view of the same entity, it learns to align features of different modalities in a shared representation space. However, this approach is intrinsically limited as it only learns shared or redundant information between modalities, while multimodal interactions can arise in other ways. In this work, we introduce CoMM, a Contrastive MultiModal learning strategy that enables the communication between modalities in a single multimodal space. Instead of imposing cross- or intra- modality constraints, we propose to align multimodal representations by maximizing the mutual information between augmented versions of these multimodal features. Our theoretical analysis shows that shared, synergistic and unique terms of information naturally emerge from this formulation, allowing us to estimate multimodal interactions beyond redundancy. We test CoMM both in a controlled and in a series of real-world settings: in the former, we demonstrate that CoMM effectively captures redundant, unique and synergistic information between modalities. In the latter, CoMM learns complex multimodal interactions and achieves state-of-the-art results on the seven multimodal benchmarks. Code is available at https://github.com/Duplums/CoMM
Updated: 2025-03-05 16:48:23
标题: 多模式对比学习中应该对齐什么?
摘要: 人类通过多感官整合感知世界,将不同感知模式的信息融合以调整其行为。对比学习为多模态自监督学习提供了一种吸引人的解决方案。事实上,通过将每种模态视为同一实体的不同视图,它学习将不同模态的特征对齐在一个共享的表示空间中。然而,这种方法在本质上是有限的,因为它只学习模态之间的共享或冗余信息,而多模态交互可能以其他方式出现。在这项工作中,我们介绍了CoMM,一种对比多模态学习策略,它能够在单一多模态空间中实现模态之间的交流。我们提出通过最大化这些多模态特征的增强版本之间的互信息来对齐多模态表示,而不是强加跨模态或内模态约束。我们的理论分析表明,共享的、协同的和独特的信息术语自然地从这种公式中出现,使我们能够估计超越冗余的多模态交互。我们在受控环境和一系列真实世界环境中测试了CoMM:在前者中,我们证明了CoMM有效地捕捉了模态之间的冗余、独特和协同信息。在后者中,CoMM学习了复杂的多模态交互,并在七个多模态基准测试上取得了最先进的结果。代码可在https://github.com/Duplums/CoMM上获得。
更新时间: 2025-03-05 16:48:23
领域: cs.LG,cs.AI,cs.CL,cs.CV
Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns
Soft Actor-Critic (SAC) critically depends on its critic network, which typically evaluates a single state-action pair to guide policy updates. Using N-step returns is a common practice to reduce the bias in the critic's target values. However, N-step returns can in turn introduce high variance and necessitate importance sampling, often destabilizing training. Recent algorithms have also explored action chunking (such as direct action repetition and movement primitives) to enhance exploration. In this paper, we propose a Transformer-based critic network for SAC that integrates the N-step returns framework in a stable and efficient manner. Unlike approaches that perform chunking in the actor network, we feed chunked actions into the critic network to explore potential performance gains. Our architecture leverages the Transformer's ability to process sequential information, facilitating more robust value estimation. Empirical results show that this method not only achieves efficient, stable training but also excels in sparse-reward and multi-phase environments, traditionally a challenge for step-based methods. These findings underscore the promise of combining Transformer-based critics with N-step returns to advance reinforcement learning performance.
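For context, here is a minimal sketch of how N-step return targets for a critic can be computed along one trajectory. This is the generic textbook form, not the paper's Transformer-specific pipeline.

```python
import torch

def n_step_targets(rewards, dones, values, gamma=0.99, n=5):
    """N-step return targets along one trajectory.

    rewards, dones: (T,) tensors; values: (T+1,) critic estimates with
    values[t] = V(s_t). Episode termination cuts the sum off without a
    bootstrap term.
    """
    T = rewards.shape[0]
    targets = torch.zeros(T)
    for t in range(T):
        g, discount, terminated = 0.0, 1.0, False
        for k in range(t, min(t + n, T)):
            g += discount * float(rewards[k])   # accumulate discounted reward
            discount *= gamma
            if bool(dones[k]):
                terminated = True
                break
        if not terminated:
            # Bootstrap from the value estimate n steps ahead (or at T).
            g += discount * float(values[min(t + n, T)])
        targets[t] = g
    return targets
```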
Updated: 2025-03-05 16:47:36
标题: 对评论进行分块:基于Transformer的带有N步回报的软Actor-Critic
摘要: 软性Actor-Critic(SAC)在很大程度上依赖于其评论网络,通常评估单个状态-动作对以指导策略更新。使用N步返回是减少评论目标值偏差的常见做法。然而,使用N步返回可能会再次引入高方差,并需要重要性采样,通常会使训练不稳定。最近的算法还探索了动作分块技术,如直接动作重复和运动基元,以增强探索性。在本文中,我们提出了一种基于Transformer的评论网络,用于SAC,以稳定高效地集成N步返回框架。与在演员网络中执行分块操作的方法不同,我们将分块动作输入评论网络以探索潜在的性能增益。我们的架构利用Transformer处理序列信息的能力,促进更稳健的价值估计。实证结果表明,这种方法不仅实现了高效稳定的训练,而且在稀疏奖励/多阶段环境中表现出色,这在传统的基于步骤的方法中通常是一个挑战。这些发现强调了将基于Transformer的评论者与N步返回相结合以提高强化学习性能的潜力。
更新时间: 2025-03-05 16:47:36
领域: cs.LG
Finite-sample valid prediction of future insurance claims in the regression problem
In the current insurance literature, prediction of insurance claims in the regression problem is often performed with a statistical model. This model-based approach may suffer from several drawbacks: (i) model misspecification, (ii) selection effect, and (iii) lack of finite-sample validity. This article addresses these three issues simultaneously by employing conformal prediction, a general machine learning strategy for valid predictions. The proposed method is both model-free and tuning-parameter-free. It also guarantees finite-sample validity at a pre-assigned coverage probability level.
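A minimal sketch of the generic split conformal recipe underlying this kind of finite-sample guarantee, assuming any off-the-shelf point predictor; the paper's exact construction may differ.

```python
import numpy as np

def split_conformal_interval(model, X_train, y_train, X_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction: a model-free wrapper that yields a
    finite-sample valid interval at coverage level 1 - alpha.
    """
    model.fit(X_train, y_train)                        # any point predictor
    residuals = np.abs(y_cal - model.predict(X_cal))   # calibration scores
    n = len(residuals)
    # Finite-sample-corrected quantile of the calibration residuals
    # (level clipped at 1 for small calibration sets).
    level = min(1.0, np.ceil((1 - alpha) * (n + 1)) / n)
    q = np.quantile(residuals, level, method="higher")
    pred = model.predict(np.atleast_2d(x_new))
    return float(pred[0] - q), float(pred[0] + q)
```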
Updated: 2025-03-05 16:47:08
标题: 回归问题中有限样本有效预测未来保险赔偿金额
摘要: 在当前的保险文献中,通常使用统计模型来预测回归问题中的保险索赔。这种基于模型的方法可能存在几个缺点:(i)模型规范错误,(ii)选择效应,以及(iii)缺乏有限样本有效性。本文通过采用符合预测-一种用于有效预测的通用机器学习策略,同时解决这三个问题。所提出的方法既不依赖于模型也不需要调参。它还在预先分配的覆盖概率水平上保证有限样本有效性。
更新时间: 2025-03-05 16:47:08
领域: stat.ML,cs.LG,stat.AP,62P05, 91G05
Robust Learning of Diverse Code Edits
Software engineering activities frequently involve edits to existing code. However, contemporary code language models (LMs) lack the ability to handle diverse types of code-edit requirements. In this work, we attempt to overcome this shortcoming through (1) a novel synthetic data generation pipeline and (2) a robust model adaptation algorithm. Starting with seed code examples and diverse editing criteria, our pipeline generates high-quality samples comprising original and modified code, along with natural language instructions in different styles and verbosity. Today's code LMs come bundled with strong abilities, such as code generation and instruction following, which should not be lost due to fine-tuning. To ensure this, we propose a novel adaptation algorithm, SeleKT, that (a) leverages a dense gradient-based step to identify the weights that are most important for code editing, and (b) does a sparse projection onto the base model to avoid overfitting. Using our approach, we obtain a new series of models NextCoder (adapted from QwenCoder-2.5) that achieves strong results on five code-editing benchmarks, outperforming comparable size models and even several larger ones. We show the generality of our approach on two model families (DeepSeekCoder and QwenCoder), compare against other fine-tuning approaches, and demonstrate robustness by showing retention of code generation abilities post adaptation.
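A rough sketch of the sparse-projection idea, using weight-delta magnitude as a stand-in for the paper's gradient-based importance score; this is one reading of the description above, not the authors' implementation.

```python
import torch

@torch.no_grad()
def selekt_style_projection(model, base_state, density=0.05):
    """Sparse projection onto the base model: keep only the fraction
    `density` of weight deltas with the largest magnitude; reset the
    rest to the base weights to limit drift from pretrained abilities.

    base_state: dict of the base model's parameters, same shapes/device.
    """
    for name, p in model.named_parameters():
        delta = p - base_state[name]
        k = max(1, int(density * delta.numel()))
        # Threshold at the k-th largest |delta|; keep only those entries.
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        mask = (delta.abs() >= threshold).to(delta.dtype)
        p.copy_(base_state[name] + mask * delta)
```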
Updated: 2025-03-05 16:39:04
标题: 多样化代码编辑的鲁棒学习
摘要: 软件工程活动经常涉及对现有代码的编辑。然而,当代代码语言模型(LMs)缺乏处理多样化代码编辑要求的能力。在这项工作中,我们尝试通过(1)一种新颖的合成数据生成管道和(2)一个强大的模型适应算法来克服这一缺点。从种子代码示例和多样化的编辑标准开始,我们的管道生成高质量的样本,包括原始和修改后的代码,以及不同风格和冗长程度的自然语言说明。今天的代码LMs捆绑了强大的功能,如代码生成和指令遵循,这些功能不应因微调而丢失。为了确保这一点,我们提出了一种新颖的适应算法SeleKT,它(a)利用密集的基于梯度的步骤来识别对代码编辑最重要的权重,并(b)对基本模型进行稀疏投影,以避免过拟合。使用我们的方法,我们获得了一系列新模型NextCoder(从QwenCoder-2.5调整而来),在五个代码编辑基准测试中取得了强大的结果,超过了可比较大小的模型甚至几个更大的模型。我们展示了我们方法的普适性,在两个模型系列(DeepSeekCoder和QwenCoder)上进行比较,与其他微调方法进行比较,并通过展示在适应后保留代码生成能力来展示鲁棒性。
更新时间: 2025-03-05 16:39:04
领域: cs.SE,cs.LG
CycleResearcher: Improving Automated Research via Automated Review
The automation of scientific discovery has been a long-standing goal within the research community, driven by the potential to accelerate knowledge creation. While significant progress has been made using commercial large language models (LLMs) as research assistants or idea generators, the possibility of automating the entire research process with open-source LLMs remains largely unexplored. This paper explores the feasibility of using open-source post-trained LLMs as autonomous agents capable of performing the full cycle of automated research and review, from literature review and manuscript preparation to peer review and paper refinement. Our iterative preference training framework consists of CycleResearcher, which conducts research tasks, and CycleReviewer, which simulates the peer review process, providing iterative feedback via reinforcement learning. To train these models, we develop two new datasets, Review-5k and Research-14k, reflecting real-world machine learning research and peer review dynamics. Our results demonstrate that CycleReviewer achieves promising performance with a 26.89\% reduction in mean absolute error (MAE) compared to individual human reviewers in predicting paper scores, indicating the potential of LLMs to effectively assist expert-level research evaluation. In research, the papers generated by the CycleResearcher model achieved a score of 5.36 in simulated peer reviews, showing some competitiveness in terms of simulated review scores compared to the preprint level of 5.24 from human experts, while still having room for improvement compared to the accepted paper level of 5.69. This work represents a significant step toward fully automated scientific inquiry, providing ethical safeguards and exploring AI-driven research capabilities. The code, datasets, and model weights are released at https://wengsyx.github.io/Researcher/
Updated: 2025-03-05 16:36:05
标题: CycleResearcher:通过自动审查改善自动化研究
摘要: 科学发现的自动化一直是研究界长期以来的目标,驱动力是加速知识创造的潜力。虽然使用商业大型语言模型(LLMs)作为研究助手或创意生成器取得了重大进展,但利用开源LLMs自动化整个研究过程的可能性仍然大部分未被探索。本文探讨了使用开源后训练LLMs作为能够执行从文献审阅和稿件准备到同行评审和论文完善的整个自动化研究和审查周期的自主代理的可行性。我们的迭代偏好训练框架包括CycleResearcher,负责进行研究任务,以及CycleReviewer,模拟同行评审过程,通过强化学习提供迭代反馈。为训练这些模型,我们开发了两个新数据集,Review-5k和Research-14k,反映了真实世界中的机器学习研究和同行评审动态。我们的结果表明,CycleReviewer在预测论文得分时相对于个别人类评审员减少了26.89\%的平均绝对误差(MAE),表明LLMs有潜力有效地辅助专家级研究评估。在研究中,CycleResearcher模型生成的论文在模拟同行评审中获得了5.36的评分,在模拟审查评分方面显示了一定的竞争力,与人类专家的预印本水平5.24相比,但与被接受的论文水平5.69相比仍有改进空间。这项工作代表了朝着完全自动化科学探究迈出的重要一步,提供了道德保障,并探索了人工智能驱动的研究能力。代码、数据集和模型权重已发布在https://wengsyx.github.io/Researcher/。
更新时间: 2025-03-05 16:36:05
领域: cs.CL,cs.AI,cs.CY,cs.LG
Improving 6D Object Pose Estimation of metallic Household and Industry Objects
6D object pose estimation suffers from reduced accuracy when applied to metallic objects. We set out to improve the state-of-the-art by addressing challenges such as reflections and specular highlights in industrial applications. Our novel BOP-compatible dataset, featuring a diverse set of metallic objects (cans, household, and industrial items) under various lighting and background conditions, provides additional geometric and visual cues. We demonstrate that these cues can be effectively leveraged to enhance overall performance. To illustrate the usefulness of the additional features, we improve upon the GDRNPP algorithm by introducing an additional keypoint prediction and material estimator head in order to improve spatial scene understanding. Evaluations on the new dataset show improved accuracy for metallic objects, supporting the hypothesis that additional geometric and visual cues can improve learning.
Updated: 2025-03-05 16:35:15
标题: 改进金属家庭和工业物体的6D物体姿态估计
摘要: 6D物体姿态估计在应用于金属物体时精度降低。我们旨在通过解决工业应用中的反射和镜面高光等挑战来改进现有技术水平。我们提出了一种新颖的BOP兼容数据集,其中包含各种金属物体(罐头、家庭和工业物品),并在不同的光照和背景条件下提供额外的几何和视觉线索。我们展示了这些线索可以有效地用于提升整体性能。为了说明额外特征的用处,我们通过引入一个额外的关键点预测和材料估计器头来改进GDRNPP算法,以提高空间场景理解能力。在新数据集上的评估显示,对于金属物体,精度有所提高,支持额外的几何和视觉线索可以提高学习的假设。
更新时间: 2025-03-05 16:35:15
领域: cs.CV,cs.AI
Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations
Multimodal in-context learning (ICL) has emerged as a key capability of Large Vision-Language Models (LVLMs), driven by their increasing scale and applicability. Despite its promise, effective ICL in the multimodal setting remains challenging due to the inherent complexity of image-text inputs and the high sensitivity of ICL performance to input configurations. In this work, we shed light on the core mechanism underlying multimodal ICL, identifying task mapping as a crucial factor in configuring robust in-context demonstration (ICD) sequences. Building on these insights, we propose SabER, a lightweight yet powerful decoder-only transformer equipped with task-aware attention, which intelligently selects and arranges ICDs from a demonstration library in an autoregressive fashion. This design enables fine-grained feature extraction and cross-modal reasoning, iteratively refining task mapping to generate high-quality ICD sequences. Through extensive experiments covering five LVLMs and nine benchmark datasets, SabER not only demonstrates strong empirical performance, but also provides deeper understanding of how task semantics interact with multimodal ICDs. Our findings highlight the importance of principled ICD sequence configuration and open new avenues to enhance multimodal ICL in a wide range of real-world scenarios.
Updated: 2025-03-05 16:33:10
标题: 用任务感知演示推动大规模视觉语言模型中的多模态上下文学习
摘要: 多模态上下文学习(ICL)已经成为大型视觉语言模型(LVLMs)的关键能力,这是由于它们不断增加的规模和适用性所推动的。尽管具有潜力,但在多模态环境中实现有效的ICL仍然具有挑战性,原因是图像文本输入的固有复杂性以及ICL性能对输入配置的高度敏感。在这项工作中,我们阐明了支撑多模态ICL的核心机制,确定任务映射作为配置稳健的上下文演示(ICD)序列的关键因素。基于这些见解,我们提出了轻量级但功能强大的仅解码器变压器\textit {SabER},配备了任务感知的注意力,智能地以自回归方式从演示库中选择和排列ICDs。这种设计实现了精细的特征提取和跨模态推理,通过迭代地优化任务映射来生成高质量的ICD序列。通过涵盖五个LVLMs和九个基准数据集的广泛实验,SabER不仅展示了强大的经验性能,还深入理解了任务语义与多模态ICDs的相互作用。我们的研究结果强调了有原则的ICD序列配置的重要性,并为增强多模态ICL在各种实际场景中的开展新途径。
更新时间: 2025-03-05 16:33:10
领域: cs.CV,cs.AI,cs.CL
Improving Neutral Point of View Text Generation through Parameter-Efficient Reinforcement Learning and a Small-Scale High-Quality Dataset
This paper describes the construction of a dataset and the evaluation of training methods to improve generative large language models' (LLMs) ability to answer queries on sensitive topics with a Neutral Point of View (NPOV), i.e., to provide significantly more informative, diverse and impartial answers. The dataset, the SHQ-NPOV dataset, comprises 300 high-quality, human-written quadruplets: a query on a sensitive topic, an answer, an NPOV rating, and a set of links to source texts elaborating the various points of view. The first key contribution of this paper is a new methodology to create such datasets through iterative rounds of human peer-critique and annotator training, which we release alongside the dataset. The second key contribution is the identification of a highly effective training regime for parameter-efficient reinforcement learning (PE-RL) to improve NPOV generation. We compare and extensively evaluate PE-RL and multiple baselines, including LoRA finetuning (a strong baseline), SFT, and RLHF. PE-RL not only improves on overall NPOV quality compared to the strongest baseline ($97.06\%\rightarrow 99.08\%$), but also scores much higher on features linguists identify as key to separating good answers from the best answers ($60.25\%\rightarrow 85.21\%$ for presence of supportive details, $68.74\%\rightarrow 91.43\%$ for absence of oversimplification). A qualitative analysis corroborates this. Finally, our evaluation finds no statistical differences between results on topics that appear in the training dataset and those on separated evaluation topics, which provides strong evidence that our approach to training PE-RL exhibits very effective out-of-topic generalization.
Updated: 2025-03-05 16:32:47
标题: 通过参数高效强化学习和小规模高质量数据集改进中立观点文本生成
摘要: 这篇论文描述了一个数据集的构建和评估训练方法,旨在改善生成式大型语言模型(LLMs)在敏感话题上回答查询的能力,具有中立观点(NPOV),即提供更多信息丰富、多样化和公正的答案。该数据集,SHQ-NPOV数据集,包括300个高质量的、人工撰写的四元组:一个关于敏感话题的查询、一个答案、一个NPOV评分,以及一组指向详细观点的源文本链接。本文的第一个重要贡献是通过人类同行批评和注释者培训的迭代回合创建这种数据集的新方法,我们将其与数据集一起发布。第二个重要贡献是确定了一种高效的训练方案,用于改进NPOV生成的参数高效强化学习(PE-RL)。我们比较并广泛评估了PE-RL和多个基线,包括LoRA微调(一个强基线)、SFT和RLHF。 PE-RL不仅在总体NPOV质量上比最强基线改进($97.06\%\rightarrow 99.08\%$),而且在语言学家认为是区分好答案和最佳答案的关键特征方面得分更高(支持性细节存在的比例从$60.25\%\rightarrow 85.21\%,简化不足的比例从$68.74\%\rightarrow 91.43\%)。定性分析证实了这一点。最后,我们的评估发现,在出现在训练数据集中和在分开评估话题上的结果之间没有统计学差异,这为我们训练PE-RL的方法展现了非常有效的超出话题概括的证据。
更新时间: 2025-03-05 16:32:47
领域: cs.CL,cs.AI,cs.LG
Token-Level Privacy in Large Language Models
The use of language models as remote services requires transmitting private information to external providers, raising significant privacy concerns. This process not only risks exposing sensitive data to untrusted service providers but also leaves it vulnerable to interception by eavesdroppers. Existing privacy-preserving methods for natural language processing (NLP) interactions primarily rely on semantic similarity, overlooking the role of contextual information. In this work, we introduce $d_\chi$-stencil, a novel token-level privacy-preserving mechanism that integrates contextual and semantic information while ensuring strong privacy guarantees under the $d_\chi$ differential privacy framework, achieving $2\epsilon$-$d_\chi$-privacy. By incorporating both semantic and contextual nuances, $d_\chi$-stencil achieves a robust balance between privacy and utility. We evaluate $d_\chi$-stencil using state-of-the-art language models and diverse datasets, achieving a comparable and even better trade-off between utility and privacy compared to existing methods. This work highlights the potential of $d_\chi$-stencil to set a new standard for privacy-preserving NLP in modern, high-risk applications.
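For intuition, here is a minimal sketch of the standard metric-DP token mechanism that this line of work builds on (perturb the token embedding with noise calibrated to epsilon, then decode to the nearest vocabulary token). The contextual weighting that distinguishes $d_\chi$-stencil is omitted.

```python
import numpy as np

def metric_dp_token(token_vec, vocab_matrix, epsilon, rng=None):
    """Release a privatized token under d_chi-style metric DP.

    token_vec: (D,) embedding of the token to privatize;
    vocab_matrix: (V, D) embeddings of the whole vocabulary.
    Returns the index of the released (nearest-neighbor) token.
    """
    rng = rng or np.random.default_rng()
    d = token_vec.shape[0]
    # Sample noise with density proportional to exp(-epsilon * ||z||):
    # uniform direction, norm drawn from Gamma(d, 1/epsilon).
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    norm = rng.gamma(shape=d, scale=1.0 / epsilon)
    noisy = token_vec + norm * direction
    # Decode to the closest vocabulary embedding.
    dists = np.linalg.norm(vocab_matrix - noisy, axis=1)
    return int(np.argmin(dists))
```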
Updated: 2025-03-05 16:27:25
标题: 大型语言模型中的令牌级隐私
摘要: 将语言模型用作远程服务需要将私人信息传输给外部提供商,这引发了重要的隐私问题。这个过程不仅会将敏感数据暴露给不受信任的服务提供商,还会使其容易被窃听者拦截。现有的自然语言处理(NLP)交互的隐私保护方法主要依赖于语义相似性,忽视了上下文信息的作用。在这项工作中,我们介绍了dchi-stencil,这是一种新颖的基于标记级别的隐私保护机制,它在确保dchi差分隐私框架下集成了上下文和语义信息,实现了2epsilon-dchi-隐私。通过融合语义和上下文细微差别,dchi-stencil在隐私和效用之间实现了稳健的平衡。我们使用最先进的语言模型和多样化的数据集评估了dchi-stencil,与现有方法相比实现了可比甚至更好的效用和隐私权衡。这项工作突显了dchi-stencil在现代高风险应用中设立新的隐私保护NLP标准的潜力。
更新时间: 2025-03-05 16:27:25
领域: cs.CL,cs.CR
REVERSIM: An Open-Source Environment for the Controlled Study of Human Aspects in Hardware Reverse Engineering
Hardware Reverse Engineering (HRE) is a technique for analyzing integrated circuits. Experts employ HRE for security-critical tasks, like detecting Trojans or intellectual property violations, relying not only on their experience and customized tools but also on their cognitive abilities. In this work, we introduce ReverSim, a software environment that models key HRE subprocesses and integrates standardized cognitive tests. ReverSim enables quantitative studies with easier-to-recruit non-experts to uncover cognitive factors relevant to HRE. We empirically evaluated ReverSim in three studies. Semi-structured interviews with 14 HRE professionals confirmed its comparability to real-world HRE processes. Two online user studies with 170 novices and intermediates revealed effective differentiation of participant performance across a spectrum of difficulties, and correlations between participants' cognitive processing speed and task performance. ReverSim is available as open-source software, providing a robust platform for controlled experiments to assess cognitive processes in HRE, potentially opening new avenues for hardware protection.
Updated: 2025-03-05 16:26:46
标题: REVERSIM:用于硬件逆向工程中对人类因素进行控制研究的开源环境
摘要: 硬件逆向工程(HRE)是一种分析集成电路的技术。专家们利用HRE进行安全关键任务,如检测木马或知识产权侵犯,不仅依靠他们的经验和定制工具,还依靠他们的认知能力。在这项工作中,我们介绍了ReverSim,这是一个模拟关键HRE子过程并集成标准化认知测试的软件环境。ReverSim使得可以进行定量研究,并更容易招募非专家人员来揭示与HRE相关的认知因素。我们通过三项研究对ReverSim进行了实证评估。与14名HRE专业人士进行的半结构化访谈确认了它与真实世界HRE过程的可比性。两项在线用户研究涉及170名新手和中级用户,揭示了在一系列困难中参与者表现的有效差异,以及参与者认知处理速度与任务表现之间的相关性。ReverSim作为开源软件可供使用,提供了一个强大的平台,用于评估HRE中的认知过程,可能为硬件保护开辟新途径。
更新时间: 2025-03-05 16:26:46
领域: cs.CR,cs.HC
Limits of nonlinear and dispersive fiber propagation for photonic extreme learning
We report a generalized nonlinear Schrödinger equation simulation model of an extreme learning machine (ELM) based on optical fiber propagation. Using handwritten digit classification as a benchmark, we study how accuracy depends on propagation dynamics, as well as parameters governing spectral encoding, readout, and noise. Test accuracies of over 91% and 93% are found for propagation in the anomalous and normal dispersion regimes respectively. Our simulation results also suggest that quantum noise on the input pulses introduces an intrinsic penalty to ELM performance.
Updated: 2025-03-05 16:25:58
标题: 非线性和色散光纤传输对于光子极限学习的限制
摘要: 我们报告了一个基于光纤传播的极端学习机的广义非线性薛定谔方程模拟模型。以手写数字分类作为基准,我们研究了准确度如何取决于传播动态以及控制光谱编码、输出和噪音的参数。在异常和正常色散区域的传播中,测试准确率分别超过91%和93%。我们的模拟结果还表明,输入脉冲上的量子噪声会对ELM性能造成固有惩罚。
更新时间: 2025-03-05 16:25:58
领域: physics.optics,cs.LG
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
Dexterous grasping remains a fundamental yet challenging problem in robotics. A general-purpose robot must be capable of grasping diverse objects in arbitrary scenarios. However, existing research typically relies on specific assumptions, such as single-object settings or limited environments, leading to constrained generalization. Our solution is DexGraspVLA, a hierarchical framework that utilizes a pre-trained Vision-Language model as the high-level task planner and learns a diffusion-based policy as the low-level Action controller. The key insight lies in iteratively transforming diverse language and visual inputs into domain-invariant representations, where imitation learning can be effectively applied due to the alleviation of domain shift. Thus, it enables robust generalization across a wide range of real-world scenarios. Notably, our method achieves a 90+% success rate under thousands of unseen object, lighting, and background combinations in a "zero-shot" environment. Empirical analysis further confirms the consistency of internal model behavior across environmental variations, thereby validating our design and explaining its generalization performance. We hope our work can be a step forward in achieving general dexterous grasping. Our demo and code can be found at https://dexgraspvla.github.io/.
Updated: 2025-03-05 16:23:09
标题: DexGraspVLA:通向普遍灵巧抓取的视觉-语言-动作框架
摘要: 灵巧抓取仍然是机器人领域中一个基础而具有挑战性的问题。一个通用的机器人必须能够在各种情境下抓取不同的物体。然而,现有的研究通常依赖于特定的假设,如单一物体设置或有限的环境,导致了受限的泛化能力。我们的解决方案是DexGraspVLA,这是一个分层框架,利用预训练的视觉-语言模型作为高层任务规划器,并学习了一个基于扩散的策略作为低层动作控制器。关键的见解在于将不同的语言和视觉输入迭代地转化为领域不变表示,这样可以有效地应用模仿学习,因为领域转移得到缓解。因此,它使得在各种真实世界情景下实现了强大的泛化能力成为可能。值得注意的是,我们的方法在“零-shot”环境中以90%以上的成功率处理了数千个未见过的物体、光照和背景组合。实证分析进一步确认了模型在环境变化中的内部行为一致性,从而验证了我们的设计并解释了其泛化性能。我们希望我们的工作能够在实现一般灵巧抓取方面迈出一步。我们的演示和代码可以在https://dexgraspvla.github.io/找到。
更新时间: 2025-03-05 16:23:09
领域: cs.RO,cs.AI
AI Governance through Markets
This paper argues that market governance mechanisms should be considered a key approach in the governance of artificial intelligence (AI), alongside traditional regulatory frameworks. While current governance approaches have predominantly focused on regulation, we contend that market-based mechanisms offer effective incentives for responsible AI development. We examine four emerging vectors of market governance: insurance, auditing, procurement, and due diligence, demonstrating how these mechanisms can affirm the relationship between AI risk and financial risk while addressing capital allocation inefficiencies. While we do not claim that market forces alone can adequately protect societal interests, we maintain that standardised AI disclosures and market mechanisms can create powerful incentives for safe and responsible AI development. This paper urges regulators, economists, and machine learning researchers to investigate and implement market-based approaches to AI governance.
Updated: 2025-03-05 16:20:03
标题: 通过市场实现人工智能治理
摘要: 本文认为,在人工智能(AI)治理中,市场治理机制应被视为一种关键方法,与传统的监管框架并重。虽然当前的治理方法主要集中在监管上,但我们认为市场机制提供了促使负责任的AI发展的有效激励。我们研究了四个新兴的市场治理向量:保险、审计、采购和尽职调查,展示了这些机制如何确认AI风险与金融风险之间的关系,同时解决了资本配置的低效问题。尽管我们并不认为市场力量单独可以充分保护社会利益,但我们坚持认为标准化的AI披露和市场机制可以为安全和负责任的AI发展创造强大的激励。本文敦促监管机构、经济学家和机器学习研究人员调查和实施基于市场的AI治理方法。
更新时间: 2025-03-05 16:20:03
领域: econ.GN,cs.AI,q-fin.EC
Provable Benefits of Task-Specific Prompts for In-context Learning
The in-context learning capabilities of modern language models have motivated a deeper mathematical understanding of sequence models. A line of recent work has shown that linear attention models can emulate projected gradient descent iterations to implicitly learn the task vector from the data provided in the context window. In this work, we consider a novel setting where the global task distribution can be partitioned into a union of conditional task distributions. We then examine the use of task-specific prompts and prediction heads for learning the prior information associated with the conditional task distribution using a one-layer attention model. Our results on loss landscape show that task-specific prompts facilitate a covariance-mean decoupling where prompt-tuning explains the conditional mean of the distribution whereas the variance is learned/explained through in-context learning. Incorporating task-specific head further aids this process by entirely decoupling estimation of mean and variance components. This covariance-mean perspective similarly explains how jointly training prompt and attention weights can provably help over fine-tuning after pretraining.
Updated: 2025-03-05 16:18:33
标题: 任务特定提示对于上下文学习的可证明益处
摘要: 现代语言模型的上下文学习能力激发了对序列模型更深入的数学理解。最近的一系列研究表明,线性注意力模型可以模拟投影梯度下降迭代,从上下文窗口提供的数据中隐式学习任务向量。在这项工作中,我们考虑了一个新颖的设置,全局任务分布可以被分为一组条件任务分布的并集。然后,我们研究了使用特定任务提示和预测头来学习与条件任务分布相关的先验信息,使用一个层的注意力模型。我们在损失景观上的结果表明,特定任务提示促进了协方差-均值解耦,其中提示调整解释了分布的条件均值,而方差则通过上下文学习进行学习/解释。进一步将特定任务头整合到这个过程中,通过完全解耦均值和方差分量来帮助这个过程。这种协方差-均值的视角类似地解释了如何通过联合训练提示和注意力权重可以在预训练后可证明地帮助微调。
更新时间: 2025-03-05 16:18:33
领域: cs.CL,cs.AI
MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question Generation
Automatic question generation is a critical task that involves evaluating question quality by considering factors such as engagement, pedagogical value, and the ability to stimulate critical thinking. These aspects require human-like understanding and judgment, which automated systems currently lack. However, human evaluations are costly and impractical for large-scale samples of generated questions. Therefore, we propose a novel system, MIRROR (Multi-LLM Iterative Review and Response for Optimized Rating), which leverages large language models (LLMs) to automate the evaluation process for questions generated by automated question generation systems. We experimented with several state-of-the-art LLMs, such as GPT-4, Gemini, and Llama2-70b. We observed that the scores of human evaluation metrics, namely relevance, appropriateness, novelty, complexity, and grammaticality, improved when using the feedback-based approach called MIRROR, tending to be closer to the human baseline scores. Furthermore, we observed that Pearson's correlation coefficient between GPT-4 and human experts improved when using our proposed feedback-based approach, MIRROR, compared to direct prompting for evaluation. Error analysis shows that our proposed approach, MIRROR, significantly helps to improve relevance and appropriateness.
Updated: 2025-03-05 16:16:01
标题: 《MIRROR:一种用于自动评估开放式问题生成的新方法》
摘要: 自动生成问题是一项关键任务,涉及评估问题质量,考虑因素如吸引力、教学价值和激发批判性思维的能力。这些方面需要类人的理解和判断,而自动化系统目前还缺乏。然而,人工评估对于大规模生成的问题样本来说成本高且不切实际。因此,我们提出了一种新颖的系统,MIRROR(多LLM迭代评审和优化评分),利用大型语言模型(LLMs)来自动化自动问题生成系统生成的问题的评估过程。我们尝试了几种最先进的LLMs,如GPT-4、Gemini和Llama2-70b。我们观察到,使用基于反馈的方法MIRROR时,人类评估指标的得分,即相关性、适当性、新颖性、复杂性和语法性,得到了改善,往往更接近人类基准得分。此外,我们观察到,与直接提示进行评估相比,使用我们提出的基于反馈的方法MIRROR时,GPT-4和人类专家之间的皮尔逊相关系数得到了改善。错误分析表明,我们提出的方法MIRROR显著有助于改善相关性和适当性。
更新时间: 2025-03-05 16:16:01
领域: cs.CL,cs.AI
Feature Matching Intervention: Leveraging Observational Data for Causal Representation Learning
A major challenge in causal discovery from observational data is the absence of perfect interventions, making it difficult to distinguish causal features from spurious ones. We propose an innovative approach, Feature Matching Intervention (FMI), which uses a matching procedure to mimic perfect interventions. We define causal latent graphs, extending structural causal models to latent feature space, providing a framework that connects FMI with causal graph learning. Our feature matching procedure emulates perfect interventions within these causal latent graphs. Theoretical results demonstrate that FMI exhibits strong out-of-distribution (OOD) generalizability. Experiments further highlight FMI's superior performance in effectively identifying causal features solely from observational data.
Updated: 2025-03-05 16:14:43
标题: 特征匹配干预:利用观察数据进行因果表示学习
摘要: 从观测数据中发现因果关系的一个主要挑战是缺乏完美的干预,这使得难以区分因果特征和虚假特征。我们提出了一种创新方法,特征匹配干预(FMI),该方法使用匹配程序模拟完美干预。我们定义了因果潜在图,将结构因果模型扩展到潜在特征空间,提供了一个将FMI与因果图学习相连接的框架。我们的特征匹配程序在这些因果潜在图中模拟完美干预。理论结果表明,FMI表现出强大的分布之外(OOD)泛化能力。实验进一步突显了FMI在仅从观测数据中有效识别因果特征方面的卓越性能。
更新时间: 2025-03-05 16:14:43
领域: stat.ML,cs.LG,stat.ME
DeepGrav: Anomalous Gravitational-Wave Detection Through Deep Latent Features
This work introduces a novel deep learning-based approach for gravitational wave anomaly detection, aiming to overcome the limitations of traditional matched filtering techniques in identifying gravitational wave signals with unknown waveforms. We introduce a modified convolutional neural network architecture inspired by ResNet that leverages residual blocks to extract high-dimensional features, effectively capturing subtle differences between background noise and gravitational wave signals. This network architecture learns a high-dimensional projection while preserving discrepancies with the original input, facilitating precise identification of gravitational wave signals. In our experiments, we implement an innovative data augmentation strategy that generates new data by computing the arithmetic mean of multiple signal samples while retaining the key features of the original signals. In the NSF HDR A3D3: Detecting Anomalous Gravitational Wave Signals competition, our team (group name: easonyan123) placed first, with our model achieving a true negative rate (TNR) of 0.9708 during the development/validation phase and 0.9832 on an unseen challenge dataset during the final/testing phase, the highest among all competitors. These results demonstrate that our method not only achieves excellent generalization performance but also maintains robust adaptability in addressing the complex uncertainties inherent in gravitational wave anomaly detection.
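The arithmetic-mean augmentation admits a very small sketch; array shapes and same-class grouping are assumptions.

```python
import numpy as np

def mean_augment(signals, k=2, n_new=100, rng=None):
    """Create new training examples as the arithmetic mean of k randomly
    chosen signals, as described in the abstract.

    signals: array of shape (N, L) holding same-class strain segments.
    Returns an array of shape (n_new, L) of averaged signals.
    """
    rng = rng or np.random.default_rng(0)
    idx = rng.integers(0, len(signals), size=(n_new, k))  # (n_new, k) picks
    return signals[idx].mean(axis=1)                      # average over the k picks
```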
Updated: 2025-03-05 16:14:22
标题: DeepGrav:通过深度潜在特征检测异常引力波
摘要: 这项研究引入了一种新颖的基于深度学习的方法来检测引力波异常,旨在克服传统匹配滤波技术在识别未知波形引力波信号方面的局限性。我们引入了一个受ResNet启发的修改后的卷积神经网络架构,利用残差块提取高维特征,有效捕捉背景噪音和引力波信号之间的微妙差异。该网络架构学习了一个高维投影,同时保留了与原始输入的差异,有助于精确识别引力波信号。在我们的实验中,我们实施了一种创新的数据增强策略,通过计算多个信号样本的算术平均值生成新数据,同时保留原始信号的关键特征。 在NSF HDR A3D3:检测异常引力波信号竞赛中,我们(小组名称:easonyan123)荣幸地在开发/验证阶段以真阴性率(TNR)达到0.9708,并在最终/测试阶段的未见挑战数据集上达到0.9832的结果,比所有竞争对手都高。这些结果表明,我们的方法不仅实现了优秀的泛化性能,而且在处理引力波异常检测中固有的复杂不确定性时保持了强大的适应性。
更新时间: 2025-03-05 16:14:22
领域: cs.LG,astro-ph.HE,gr-qc
Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification
The ability to process long contexts is crucial for many natural language processing tasks, yet it remains a significant challenge. While substantial progress has been made in enhancing the efficiency of attention mechanisms, there is still a gap in understanding how attention heads function in long-context settings. In this paper, we observe that while certain heads consistently attend to local information only, others swing between attending to local and long-context information depending on the query. This raises the question: can we identify which heads require long-context information to predict the next token accurately? We demonstrate that it's possible to predict which heads are crucial for long-context processing using only local keys. The core idea here is to exploit a simple model for the long-context scores via second moment approximations. These findings unveil simple properties of attention in the context of long sequences, and open the door to potentially significant gains in efficiency.
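As a simple diagnostic in the spirit of the paper (though not its second-moment estimator), one can measure how much post-softmax attention mass each head places inside a local causal window; heads whose mass stays near 1.0 are local, while heads well below 1.0 swing to long-context information.

```python
import torch

def local_attention_mass(attn, window=64):
    """Fraction of each head's attention mass inside a local causal window.

    attn: (heads, T, T) post-softmax attention for one layer and sequence.
    Returns a (heads,) tensor; small values flag long-context heads.
    """
    H, T, _ = attn.shape
    q_idx = torch.arange(T).unsqueeze(1)      # query positions (T, 1)
    k_idx = torch.arange(T).unsqueeze(0)      # key positions   (1, T)
    # Keys within `window` positions behind (or at) the query.
    local = (((q_idx - k_idx).abs() <= window) & (k_idx <= q_idx)).to(attn.dtype)
    return (attn * local).sum(dim=(-1, -2)) / attn.sum(dim=(-1, -2))
```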
Updated: 2025-03-05 16:14:16
标题: 揭示注意力的简单性:自适应长上下文头部识别
摘要: 处理长上下文的能力对许多自然语言处理任务至关重要,但仍然是一个重大挑战。虽然在增强注意力机制的效率方面已经取得了实质性进展,但在理解注意力头在长上下文环境中的功能方面仍存在差距。在本文中,我们观察到,虽然某些头部始终只关注本地信息,但其他头部会根据查询在本地和长上下文信息之间切换。这引发了一个问题:我们能否确定哪些头部需要长上下文信息才能准确预测下一个标记?我们展示了仅使用本地键就可以预测哪些头部对长上下文处理至关重要的可能性。这里的核心思想是通过二阶矩逼近利用一个简单的模型来计算长上下文分数。这些发现揭示了在长序列环境中注意力的简单属性,并为潜在的效率显著提升打开了大门。
更新时间: 2025-03-05 16:14:16
领域: cs.CL,cs.LG
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm
Advances in deep learning and representation learning have transformed item factor analysis (IFA) in the item response theory (IRT) literature by enabling more efficient and accurate parameter estimation. Variational Autoencoders (VAEs) have been one of the most impactful techniques in modeling high-dimensional latent variables in this context. However, the limited expressiveness of the inference model based on traditional VAEs can still hinder the estimation performance. We introduce Adversarial Variational Bayes (AVB) algorithms as an improvement to VAEs for IFA with improved flexibility and accuracy. By bridging the strengths of VAEs and Generative Adversarial Networks (GANs), AVB incorporates an auxiliary discriminator network to reframe the estimation process as a two-player adversarial game and removes the restrictive assumption of standard normal distributions in the inference model. Theoretically, AVB can achieve similar or higher likelihood compared to VAEs. A further enhanced algorithm, Importance-weighted Adversarial Variational Bayes (IWAVB), is proposed and compared with Importance-weighted Autoencoders (IWAE). In an exploratory analysis of empirical data, IWAVB demonstrated superior expressiveness by achieving a higher likelihood compared to IWAE. In confirmatory analysis with simulated data, IWAVB achieved similar mean-square error results to IWAE while consistently achieving higher likelihoods. When latent variables followed a multimodal distribution, IWAVB outperformed IWAE. With its innovative use of GANs, IWAVB is shown to have the potential to extend IFA to handle large-scale data, facilitating the potential integration of psychometrics and multimodal data analysis.
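A compact sketch of the AVB-style adversarial game (following Mescheder et al.'s original formulation, which the paper adapts to item factor analysis); encoder, decoder, and discriminator T are hypothetical small networks.

```python
import torch
import torch.nn.functional as F

def avb_losses(encoder, decoder, T, x, latent_dim):
    """One step of the AVB two-player game.

    encoder(x, eps) returns an implicit posterior sample; T(x, z) returns
    a scalar logit per example. At the discriminator's optimum, T(x, z)
    approximates log q(z|x) - log p(z), so adding T to the reconstruction
    loss recovers an ELBO without assuming a normal posterior.
    """
    eps = torch.randn(x.size(0), latent_dim)
    z_q = encoder(x, eps)                  # implicit posterior sample
    z_p = torch.randn_like(z_q)            # prior sample

    # Discriminator loss: tell posterior pairs from prior pairs.
    logit_q, logit_p = T(x, z_q.detach()), T(x, z_p)
    loss_T = (F.binary_cross_entropy_with_logits(logit_q, torch.ones_like(logit_q))
              + F.binary_cross_entropy_with_logits(logit_p, torch.zeros_like(logit_p)))

    # Generator loss: reconstruction plus the discriminator's KL estimate.
    recon = F.binary_cross_entropy_with_logits(decoder(z_q), x)
    loss_vae = recon + T(x, z_q).mean()
    return loss_T, loss_vae
```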
Updated: 2025-03-05 16:11:42
标题: 生成对抗网络用于高维项目因子分析:一种深度对抗学习算法
摘要: 深度学习和表示学习的进展已经改变了项目因子分析(IFA)在项目反应理论(IRT)文献中的应用,使参数估计更加高效和准确。变分自动编码器(VAEs)是在这一背景下建模高维潜变量中最有影响力的技术之一。然而,基于传统VAEs的推断模型的有限表达能力仍然可能阻碍估计性能。我们引入了对抗变分贝叶斯(AVB)算法作为对具有改进灵活性和准确性的VAEs的改进。通过结合VAEs和生成对抗网络(GANs)的优势,AVB引入了一个辅助鉴别器网络,将估计过程重新构建为一个双方对抗的游戏,并消除了推断模型中标准正态分布的限制性假设。理论上,AVB可以实现与VAEs类似或更高的可能性。进一步增强的算法,重要性加权对抗变分贝叶斯(IWAVB)被提出,并与重要性加权自动编码器(IWAE)进行比较。在对经验数据的探索性分析中,IWAVB通过实现更高的可能性表达能力,表现出优越性能。在对模拟数据的验证性分析中,IWAVB实现了与IWAE类似的均方误差结果,同时始终实现更高的可能性。当潜变量遵循多模态分布时,IWAVB优于IWAE。通过创新地使用GANs,IWAVB被证明具有潜力将IFA扩展到处理大规模数据,促进心理测量学和多模态数据分析的潜在整合。
更新时间: 2025-03-05 16:11:42
领域: stat.ML,cs.LG,stat.AP,stat.CO,stat.ME
MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT
Multimodal sensing systems are increasingly prevalent in various real-world applications. Most existing multimodal learning approaches heavily rely on training with a large amount of synchronized, complete multimodal data. However, such a setting is impractical in real-world IoT sensing applications where data is typically collected by distributed nodes with heterogeneous data modalities, and is also rarely labeled. In this paper, we propose MMBind, a new data binding approach for multimodal learning on distributed and heterogeneous IoT data. The key idea of MMBind is to construct a pseudo-paired multimodal dataset for model training by binding data from disparate sources and incomplete modalities through a sufficiently descriptive shared modality. We also propose a weighted contrastive learning approach to handle domain shifts among disparate data, coupled with an adaptive multimodal learning architecture capable of training models with heterogeneous modality combinations. Evaluations on ten real-world multimodal datasets highlight that MMBind outperforms state-of-the-art baselines under varying degrees of data incompleteness and domain shift, and holds promise for advancing multimodal foundation model training in IoT applications (the source code is available at https://github.com/nesl/multimodal-bind).
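A minimal sketch of the binding step, assuming precomputed embeddings of the shared modality in both datasets; thresholds, weighting, and the contrastive stage are omitted.

```python
import numpy as np

def bind_by_shared_modality(shared_a, shared_b):
    """Pseudo-pair two incomplete datasets through a shared modality.

    shared_a: (Na, D) shared-modality embeddings from dataset A (which
    also carries, say, audio); shared_b: (Nb, D) embeddings from dataset
    B (which also carries depth). Each A-sample is bound to the B-sample
    whose shared-modality embedding is most similar, yielding pseudo-
    paired (audio, depth) examples for multimodal training.
    """
    a = shared_a / np.linalg.norm(shared_a, axis=1, keepdims=True)
    b = shared_b / np.linalg.norm(shared_b, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)  # index of the matched B-sample per A-sample
```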
Updated: 2025-03-05 16:08:49
标题: MMBind:释放分布式和异构数据在物联网中进行多模态学习的潜力
摘要: 多模感知系统在各种实际应用中越来越普遍。大多数现有的多模学习方法严重依赖于使用大量同步的完整多模数据进行训练。然而,在实际的物联网感知应用中,这样的设定是不切实际的,因为数据通常是由具有异构数据模态的分布式节点收集的,并且很少被标记。在本文中,我们提出了MMBind,一种针对分布式和异构物联网数据的新数据绑定方法。MMBind的关键思想是通过使用足够描述性的共享模态,将来自不同来源和不完整模态的数据绑定在一起,构建伪配对的多模数据集,用于模型训练。我们还提出了一种加权对比学习方法,用于处理不同数据之间的领域漂移,结合自适应多模学习架构,能够训练具有异构模态组合的模型。对十个真实世界的多模数据集的评估表明,MMBind在不同程度的数据不完整性和领域漂移下优于现有技术基准,并有望推动物联网应用中多模基础模型训练的发展。【源代码可通过https://github.com/nesl/multimodal-bind获得】。
更新时间: 2025-03-05 16:08:49
领域: cs.LG
Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis
Recent advances in code generation have illuminated the potential of employing large language models (LLMs) for general-purpose programming languages such as Python and C++, opening new opportunities for automating software development and enhancing programmer productivity. The potential of LLMs in software programming has sparked significant interest in exploring automated hardware generation and automation. Although preliminary endeavors have been made to adopt LLMs in generating hardware description languages (HDLs), several challenges persist in this direction. First, the volume of available HDL training data is substantially smaller compared to that for software programming languages. Second, the pre-trained LLMs, mainly tailored for software code, tend to produce HDL designs that are more error-prone. Third, the generation of HDL requires a significantly higher number of tokens compared to software programming, leading to inefficiencies in cost and energy consumption. To tackle these challenges, this paper explores leveraging LLMs to generate High-Level Synthesis (HLS)-based hardware design. Although code generation for domain-specific programming languages is not new in the literature, we aim to provide experimental results, insights, benchmarks, and evaluation infrastructure to investigate the suitability of HLS over low-level HDLs for LLM-assisted hardware design generation. To achieve this, we first finetune pre-trained models for HLS-based hardware generation, using a collected dataset with text prompts and corresponding reference HLS designs. An LLM-assisted framework is then proposed to automate end-to-end hardware code generation, which also investigates the impact of chain-of-thought and feedback loops promoting techniques on HLS-design generation. Limited by the timeframe of this research, we plan to evaluate more advanced reasoning models in the future.
Updated: 2025-03-05 16:07:23
标题: 探索用于自动HLS-based硬件生成的代码语言模型:基准,基础设施和分析
摘要: 最近在代码生成方面取得的进展揭示了利用大型语言模型(LLMs)进行通用编程语言(如Python和C++)的潜力,为自动化软件开发和提高程序员生产力开辟了新机遇。LLMs在软件编程中的潜力引发了对自动化硬件生成和自动化的重大兴趣。尽管已经开始尝试采用LLMs来生成硬件描述语言(HDLs),但在这方面仍存在一些挑战。首先,相比于软件编程语言,可用的HDL训练数据量要小得多。其次,主要针对软件代码的预训练LLMs往往会产生更容易出错的HDL设计。第三,与软件编程相比,HDL的生成需要更多的标记,导致成本和能源消耗的低效性。为了解决这些挑战,本文探讨了利用LLMs生成基于高级综合(HLS)的硬件设计。虽然领域特定编程语言的代码生成在文献中并不新鲜,但我们旨在提供实验结果、见解、基准测试和评估基础设施,以调查HLS相对于低级HDL对LLM辅助硬件设计生成的适用性。为了实现这一目标,我们首先对HLS基础硬件生成的预训练模型进行微调,使用收集的带文本提示和相应参考HLS设计的数据集。然后提出了一个LLM辅助框架,以自动化端到端硬件代码生成,同时研究了对HLS设计生成的思维链和反馈循环促进技术的影响。受限于本研究的时间框架,我们计划在未来评估更先进的推理模型。
更新时间: 2025-03-05 16:07:23
领域: cs.LG,cs.AR,cs.SE
One-Shot Imitation under Mismatched Execution
Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities. Existing methods either depend on human-robot paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice. To address these challenges, we propose RHyME, a novel framework that automatically aligns human and robot task executions using optimal transport costs. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data. RHyME successfully imitates a range of cross-embodiment demonstrators, both in simulation and with a real human hand, achieving over 50\% increase in task success compared to previous methods. We release our code and datasets at https://portal-cornell.github.io/rhyme/.
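As a rough illustration, the optimal-transport alignment can be approximated with a hard assignment between clip and segment embeddings (scipy's Hungarian solver; the feature extractors and the cost choice are assumptions, not RHyME's exact formulation).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_clips_to_segments(robot_feats, human_feats):
    """Match short human clips to segments of a long robot demonstration
    by minimizing total feature-space transport cost.

    robot_feats: (N, D) segment embeddings; human_feats: (M, D) clip
    embeddings with M >= N. Returns the human clip index retrieved for
    each robot segment.
    """
    # Pairwise Euclidean costs between every segment and every clip.
    cost = np.linalg.norm(robot_feats[:, None, :] - human_feats[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # min-cost one-to-one assignment
    return cols  # cols[i] = clip matched to robot segment rows[i]
```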
Updated: 2025-03-05 16:07:20
标题: 一次性模仿不匹配执行
摘要: 人类演示作为提示是编程机器人执行长期操纵任务的强大方法。然而,将这些演示转化为机器人可执行的动作存在重要挑战,因为运动风格和物理能力上存在执行不匹配。现有方法要么依赖于人机配对数据,这在实践中难以扩展,要么严重依赖于在实践中经常崩溃的帧级视觉相似性。为了解决这些挑战,我们提出了RHyME,一个新颖的框架,通过最优传输成本自动对齐人类和机器人任务执行。给定长期机器人演示,RHyME通过检索和组合短期人类剪辑来合成语义等效的人类视频。这种方法有助于有效的策略训练,无需配对数据。RHyME成功模仿了一系列不同体验的演示者,在模拟和真实人类手部中,与先前方法相比,任务成功率增加了50%以上。我们在https://portal-cornell.github.io/rhyme/发布了我们的代码和数据集。
更新时间: 2025-03-05 16:07:20
领域: cs.RO,cs.AI,cs.LG
Deterministic Global Optimization of the Acquisition Function in Bayesian Optimization: To Do or Not To Do?
Bayesian Optimization (BO) with Gaussian Processes relies on optimizing an acquisition function to determine sampling. We investigate the advantages and disadvantages of using a deterministic global solver (MAiNGO) compared to conventional local and stochastic global solvers (L-BFGS-B and multi-start, respectively) for the optimization of the acquisition function. For CPU efficiency, we set a time limit for MAiNGO, taking the best point as optimal. We perform repeated numerical experiments, initially using the Muller-Brown potential as a benchmark function, utilizing the lower confidence bound acquisition function; we further validate our findings with three alternative benchmark functions. Statistical analysis reveals that when the acquisition function is more exploitative (as opposed to exploratory), BO with MAiNGO converges in fewer iterations than with the local solvers. However, when the dataset lacks diversity, or when the acquisition function is overly exploitative, BO with MAiNGO, compared to the local solvers, is more likely to converge to a local rather than a globally near-optimal solution of the black-box function. L-BFGS-B and multi-start mitigate this risk in BO by introducing stochasticity in the selection of the next sampling point, which enhances the exploration of uncharted regions in the search space and reduces dependence on acquisition function hyperparameters. Ultimately, suboptimal optimization of poorly chosen acquisition functions may be preferable to their optimal solution. When the acquisition function is more exploratory, BO with MAiNGO, multi-start, and L-BFGS-B achieve comparable probabilities of convergence to a globally near-optimal solution (although BO with MAiNGO may require more iterations to converge under these conditions).
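For reference, the stochastic multi-start local baseline is a few lines with scipy and scikit-learn (lower confidence bound acquisition for minimization; kappa and the restart count are illustrative choices).

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor

def minimize_lcb_multistart(gp: GaussianProcessRegressor, bounds,
                            kappa=2.0, n_starts=20, rng=None):
    """Multi-start L-BFGS-B minimization of the lower confidence bound."""
    rng = rng or np.random.default_rng(0)
    bounds = np.asarray(bounds)  # shape (d, 2): per-dimension (low, high)

    def lcb(x):
        mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
        return float(mu[0] - kappa * sigma[0])  # optimistic for minimization

    # Random restarts spread over the box, each refined locally.
    starts = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_starts, len(bounds)))
    results = [minimize(lcb, x0, method="L-BFGS-B", bounds=bounds) for x0 in starts]
    best = min(results, key=lambda r: r.fun)
    return best.x, best.fun
```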
Updated: 2025-03-05 16:05:26
标题: 贝叶斯优化中的采集函数的确定性全局优化:要做还是不要做?
摘要: 贝叶斯优化(BO)与高斯过程依赖于优化收购功能来确定采样。我们研究了使用确定性全局求解器(MAiNGO)相比传统的局部和随机全局求解器(L-BFGS-B和多起点,分别)来优化收购功能的优缺点。为了CPU效率,我们为MAiNGO设置了时间限制,将最佳点作为最优点。我们进行了重复的数值实验,最初使用Muller-Brown势作为基准函数,利用下置信界收购功能;我们进一步验证了我们的发现,使用三种替代基准函数。统计分析表明,当收购功能更具开发性(而非探索性)时,BO与MAiNGO在较少的迭代次数内收敛,而与局部求解器相比。然而,当数据集缺乏多样性,或者收购功能过于开发时,与局部求解器相比,BO与MAiNGO更有可能收敛到局部而非全局近似最优解的黑盒函数。L-BFGS-B和多起点通过在选择下一个采样点时引入随机性来减轻BO中的这种风险,这增强了对搜索空间中未知区域的探索,并减少了对收购功能超参数的依赖。最终,对选择不佳的收购功能进行次优化可能比其最佳解更可取。当收购功能更具探索性时,BO与MAiNGO、多起点和L-BFGS-B实现了收敛到全局近似最优解的概率相当(尽管在这些条件下,BO与MAiNGO可能需要更多的迭代次数才能收敛)。
更新时间: 2025-03-05 16:05:26
领域: math.OC,cs.LG
It's My Data Too: Private ML for Datasets with Multi-User Training Examples
We initiate a study of algorithms for model training with user-level differential privacy (DP), where each example may be attributed to multiple users, which we call the multi-attribution model. We first provide a carefully chosen definition of user-level DP under the multi-attribution model. Training in the multi-attribution model is facilitated by solving the contribution bounding problem, i.e. the problem of selecting a subset of the dataset for which each user is associated with a limited number of examples. We propose a greedy baseline algorithm for the contribution bounding problem. We then empirically study this algorithm for a synthetic logistic regression task and a transformer training task, including studying variants of this baseline algorithm that optimize the subset chosen using different techniques and criteria. We find that the baseline algorithm remains competitive with its variants in most settings, and build a better understanding of the practical importance of a bias-variance tradeoff inherent in solutions to the contribution bounding problem.
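A minimal sketch of a greedy contribution-bounding baseline of the kind described; the paper's exact ordering and selection criteria may differ.

```python
from collections import defaultdict

def greedy_contribution_bounding(examples, max_per_user=1):
    """Greedy baseline for contribution bounding.

    examples: list of (example_id, set_of_user_ids) pairs, where one
    example may be attributed to multiple users. Scans the dataset once
    and keeps an example only if every attributed user is still under
    their contribution budget.
    """
    counts = defaultdict(int)
    kept = []
    for ex_id, users in examples:
        if all(counts[u] < max_per_user for u in users):
            kept.append(ex_id)
            for u in users:
                counts[u] += 1
    return kept

# Example: with max_per_user=1, only the first example survives,
# since it exhausts both users' budgets.
print(greedy_contribution_bounding([(0, {"a", "b"}), (1, {"a"}), (2, {"b"})]))
```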
Updated: 2025-03-05 16:02:09
标题: 这是我的数据:为具有多用户训练示例的数据集提供私密机器学习
摘要: 我们开始研究在用户级差分隐私(DP)下进行模型训练的算法,其中每个示例可能归因于多个用户,我们称之为多归因模型。我们首先在多归因模型下提供了一个精心选择的用户级DP的定义。在多归因模型中的训练通过解决贡献限制问题来实现,即选择数据集的子集,其中每个用户与有限数量的示例相关联。我们提出了一个贪婪基线算法用于贡献限制问题。然后,我们在合成逻辑回归任务和变压器训练任务中对这个算法进行了经验研究,包括研究优化使用不同技术和标准选择的子集的基线算法的变体。我们发现在大多数情况下,基线算法在竞争中仍然具有竞争力,并更好地理解了解决贡献限制问题中固有的偏差-方差折衷在实际重要性方面。
更新时间: 2025-03-05 16:02:09
领域: cs.LG
Measuring and identifying factors of individuals' trust in Large Language Models
Large Language Models (LLMs) can engage in human-looking conversational exchanges. Although conversations can elicit trust between users and LLMs, scarce empirical research has examined trust formation in human-LLM contexts, beyond LLMs' trustworthiness or human trust in AI in general. Here, we introduce the Trust-In-LLMs Index (TILLMI) as a new framework to measure individuals' trust in LLMs, extending McAllister's cognitive and affective trust dimensions to LLM-human interactions. We developed TILLMI as a psychometric scale, prototyped with a novel protocol we called LLM-simulated validity. The LLM-based scale was then validated in a sample of 1,000 US respondents. Exploratory Factor Analysis identified a two-factor structure. Two items were then removed due to redundancy, yielding a final 6-item scale with a 2-factor structure. Confirmatory Factor Analysis on a separate subsample showed strong model fit ($CFI = .995$, $TLI = .991$, $RMSEA = .046$, $p_{\chi^2} > .05$). Convergent validity analysis revealed that trust in LLMs correlated positively with openness to experience, extraversion, and cognitive flexibility, but negatively with neuroticism. Based on these findings, we interpreted TILLMI's factors as "closeness with LLMs" (affective dimension) and "reliance on LLMs" (cognitive dimension). Younger males exhibited higher closeness with- and reliance on LLMs compared to older women. Individuals with no direct experience with LLMs exhibited lower levels of trust compared to LLMs' users. These findings offer a novel empirical foundation for measuring trust in AI-driven verbal communication, informing responsible design, and fostering balanced human-AI collaboration.
Updated: 2025-03-05 15:52:43
标题: 测量和识别个体对大型语言模型的信任因素
摘要: 大型语言模型(LLMs)可以进行看起来像人类的对话交流。尽管对话可以在用户和LLMs之间引发信任,但很少有经验研究研究了人类-LLM环境中的信任形成,超越了LLMs的可信度或人类对人工智能的信任。在这里,我们引入了信任-LLMs指数(TILLMI)作为一个新的框架,用于衡量个体对LLMs的信任,将McAllister的认知和情感信任维度扩展到LLM-人类互动中。我们开发了TILLMI作为一个心理测量量表,并用一个我们称之为LLM模拟有效性的新协议进行了原型设计。然后,在一个包含1,000名美国受访者的样本中验证了基于LLM的量表。探索性因素分析确定了一个双因素结构。然后由于冗余性而移除了两个项目,得到了一个最终的6项量表,具有一个2因素结构。在一个单独的子样本上进行的验证性因素分析显示了强大的模型拟合度(CFI = .995,TLI = .991,RMSEA = .046,pX^2 > .05)。收敛效度分析显示,对LLMs的信任与经验开放性、外向性和认知灵活性呈正相关,但与神经质呈负相关。基于这些发现,我们将TILLMI的因素解释为“与LLMs的亲近度”(情感维度)和“对LLMs的依赖”(认知维度)。年轻男性相较于年长女性表现出更高的与LLMs的亲近度和依赖性。没有直接经验LLMs的个体相较于LLMs的用户表现出更低的信任水平。这些发现为衡量基于人工智能的口头交流中的信任提供了新颖的经验基础,为负责任的设计提供了信息,并促进了平衡的人类-人工智能合作。
更新时间: 2025-03-05 15:52:43
领域: cs.HC,cs.AI
Decoupled Recommender Systems: Exploring Alternative Recommender Ecosystem Designs
Recommender ecosystems are an emerging subject of research. Such research examines how the characteristics of algorithms, recommendation consumers, and item providers influence system dynamics and long-term outcomes. One architectural possibility that has not yet been widely explored in this line of research is the consequences of a configuration in which recommendation algorithms are decoupled from the platforms they serve. This is sometimes called "the friendly neighborhood algorithm store" or "middleware" model. We are particularly interested in how such architectures might offer a range of different distributions of utility across consumers, providers, and recommendation platforms. In this paper, we create a model of a recommendation ecosystem that incorporates algorithm choice and examine the outcomes of such a design.
Updated: 2025-03-05 15:42:37
标题: 解耦推荐系统:探索替代推荐生态系统设计
摘要: 推荐生态系统是一个新兴的研究领域。这类研究探讨算法特征、推荐消费者和物品提供者如何影响系统动态和长期结果。在这一研究领域中尚未广泛探讨的一个架构可能性是推荐算法与其服务的平台脱钩的配置的后果。这有时被称为“友好邻里算法商店”或“中间件”模型。我们特别感兴趣的是这种架构如何为消费者、提供者和推荐平台提供各种效用分布。在本文中,我们创建了一个推荐生态系统模型,该模型包括算法选择,并检验了这种设计的结果。
更新时间: 2025-03-05 15:42:37
领域: cs.IR,cs.AI,cs.HC
Review of Machine Learning for Micro-Electronic Design Verification
Microelectronic design verification remains a critical bottleneck in device development, traditionally mitigated by expanding verification teams and computational resources. Since the late 1990s, machine learning (ML) has been proposed to enhance verification efficiency, yet many techniques have not achieved mainstream adoption. This review, from the perspective of verification and ML practitioners, examines the application of ML in dynamic-based techniques for functional verification of microelectronic designs, and provides a starting point for those new to this interdisciplinary field. Historical trends, techniques, ML types, and evaluation baselines are analysed to understand why previous research has not been widely adopted in industry. The review highlights the application of ML, the techniques used and critically discusses their limitations and successes. Although there is a wealth of promising research, real-world adoption is hindered by challenges in comparing techniques, identifying suitable applications, and the expertise required for implementation. This review proposes that the field can progress through the creation and use of open datasets, common benchmarks, and verification targets. By establishing open evaluation criteria, industry can guide future research. Parallels with ML in software verification suggest potential for collaboration. Additionally, greater use of open-source designs and verification environments can allow more researchers from outside the hardware verification discipline to contribute to the challenge of verifying microelectronic designs.
Updated: 2025-03-05 15:41:09
标题: 《微电子设计验证的机器学习综述》
摘要: 微电子设计验证仍然是设备开发中的一个关键瓶颈,传统上通过扩大验证团队和计算资源来缓解。自20世纪90年代末以来,机器学习(ML)被提出用于提高验证效率,然而许多技术并未被广泛采用。这篇综述从验证和机器学习从业者的角度,研究了ML在基于动态技术的微电子设计功能验证中的应用,并为那些对这一跨学科领域感兴趣的人提供了一个起点。历史趋势、技术、ML类型和评估基线被分析以了解为什么先前的研究没有被广泛采用在工业中。综述突出了ML的应用、所使用的技术并对其局限性和成功进行了批判性讨论。尽管有大量有希望的研究,但真实世界的采用受到技术比较、识别适用的应用程序以及实施所需的专业知识等挑战的阻碍。这篇综述提出,通过创建和利用开放数据集、共同基准和验证目标,该领域可以取得进展。通过建立开放评估标准,工业界可以引导未来的研究。与软件验证中的ML的对应关系暗示了合作的潜力。此外,更多地使用开源设计和验证环境可以让更多来自硬件验证领域以外的研究人员参与到验证微电子设计的挑战中。
更新时间: 2025-03-05 15:41:09
领域: cs.AR,cs.LG,A.1
MDP Geometry, Normalization and Reward Balancing Solvers
We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms which we call Reward Balancing, which solve MDPs by iterating through these transformations, until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs for unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
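For intuition, the classical single-potential instance of such an advantage-preserving transformation can be written out explicitly; the paper's normalization generalizes this idea.

```latex
% Shift rewards by a state potential \Phi without changing any advantage:
\tilde r(s,a) \;=\; r(s,a) \;+\; \gamma\,\mathbb{E}_{s' \sim P(\cdot \mid s,a)}\!\left[\Phi(s')\right] \;-\; \Phi(s).
% Then every policy's value and action-value shift by the same amount,
\tilde Q^{\pi}(s,a) = Q^{\pi}(s,a) - \Phi(s), \qquad \tilde V^{\pi}(s) = V^{\pi}(s) - \Phi(s),
% so the advantage is invariant:
\tilde A^{\pi}(s,a) \;=\; \tilde Q^{\pi}(s,a) - \tilde V^{\pi}(s) \;=\; A^{\pi}(s,a).
```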
Updated: 2025-03-05 15:40:42
标题: MDP几何,归一化和奖励平衡求解器
摘要: 我们提出了一个新的几何解释马尔可夫决策过程(MDPs),并提出了一种自然的标准化程序,使我们能够在不改变任何动作相对于任何策略的优势的情况下调整每个状态的值函数。这种保持优势的MDP转换激发了一类算法,我们称之为奖励平衡,通过迭代这些转换来解决MDPs,直到可以轻松找到近似最优策略。我们提供了这一类算法的收敛分析,特别是表明对于未知转移概率的MDPs,我们可以改进现有的样本复杂度结果。
更新时间: 2025-03-05 15:40:42
领域: cs.LG,math.OC
Capability-Aware Shared Hypernetworks for Flexible Heterogeneous Multi-Robot Coordination
Recent advances have enabled heterogeneous multi-robot teams to learn complex and effective coordination. However, existing architectural designs that support heterogeneous teams tend to force a trade-off between expressivity and efficiency. Some attempt to encode diverse behaviors within a single shared architecture by appending the input with an ID unique to each robot or robot type. These designs improve sample and parameter efficiency but tend to limit behavioral diversity. Others use a separate policy for each robot, enabling greater diversity at the cost of efficiency and generalization. We view these two designs as ends of a spectrum and explore a middle-ground approach that enables efficient learning of diverse behaviors. Inspired by work in transfer learning and meta RL, and building upon prior work in trait-based task allocation, we propose Capability-Aware Shared Hypernetworks (CASH), a general-purpose soft weight sharing architecture that uses hypernetworks to enable a single architecture to dynamically adapt to each robot and the current context. Intuitively, CASH encodes shared decision making strategies that can be adapted to each robot based on local observations and the robots' individual and collective capabilities (e.g., speed and payload). CASH explicitly captures the impact of capabilities on collective behavior, enabling zero-shot generalization to unseen robots or team compositions. We conducted experiments across four heterogeneous coordination tasks and three learning paradigms (imitation learning, value-based, and policy-gradient RL) using SOTA multi-robot simulation (JaxMARL) and hardware (Robotarium) platforms. Across all conditions, CASH generates appropriately diverse behaviors and outperforms baseline architectures in task performance and sample efficiency during training and zero-shot generalization while utilizing 60%-80% fewer learnable parameters.
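A minimal PyTorch sketch of the capability-conditioned weight generation at the heart of such a design; dimensions and layout are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CapabilityHypernet(nn.Module):
    """A shared hypernetwork maps each robot's capability vector (e.g.,
    speed, payload) to the weights of a small per-robot policy layer, so
    one set of shared parameters yields robot-specific behavior and can
    generalize zero-shot to unseen capability vectors.
    """
    def __init__(self, cap_dim=4, obs_dim=32, act_dim=8, hidden=64):
        super().__init__()
        self.obs_dim, self.act_dim = obs_dim, act_dim
        self.hyper = nn.Sequential(
            nn.Linear(cap_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim * act_dim + act_dim),  # W and b, flattened
        )

    def forward(self, obs, caps):
        # obs: (B, obs_dim); caps: (B, cap_dim), one row per robot.
        params = self.hyper(caps)
        W = params[:, : self.obs_dim * self.act_dim].view(-1, self.act_dim, self.obs_dim)
        b = params[:, self.obs_dim * self.act_dim :]
        # Apply each robot's generated linear layer to its own observation.
        return torch.bmm(W, obs.unsqueeze(-1)).squeeze(-1) + b
```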
Updated: 2025-03-05 15:37:52
标题: 能力感知共享超网络用于灵活异构多机器人协调
摘要: 最近的进展使得异构多机器人团队能够学习复杂而有效的协调。然而,现有支持异构团队的架构设计往往会在表达能力和效率之间进行权衡。一些尝试在单个共享架构中通过为每个机器人或机器人类型附加独特的ID来编码多样化行为。这些设计提高了样本和参数效率,但往往限制了行为多样性。其他人为每个机器人使用单独的策略,以牺牲效率和泛化能力为代价实现更大的多样性。我们将这两种设计视为一个谱系的两端,并探索一种中间地带的方法,使得能够高效地学习多样化行为。受到迁移学习和元RL的工作的启发,并在基于特征的任务分配的先前工作基础上构建,我们提出了能力感知共享超网络(CASH),这是一种通用软权重共享架构,使用超网络使得单个架构能够动态地适应每个机器人和当前环境。直观地说,CASH编码了可以根据局部观察和机器人的个体和集体能力(如速度和负载)对每个机器人进行调整的共享决策策略。CASH明确捕捉了能力对集体行为的影响,实现了对未见机器人或团队结构的零-shot泛化。我们在四个异构协调任务和三种学习范式(模仿学习、基于值的和策略梯度RL)上进行了实验,使用了SOTA多机器人模拟(JaxMARL)和硬件(Robotarium)平台。在所有条件下,CASH生成了适当多样化的行为,并在训练过程中的任务表现和样本效率以及零-shot泛化方面优于基线架构,同时利用了60%-80%更少的可学习参数。
更新时间: 2025-03-05 15:37:52
领域: cs.MA,cs.LG
Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning
Recent innovations in autonomous drones have facilitated time-optimal flight in single-drone configurations, and enhanced maneuverability in multi-drone systems by applying optimal control and learning-based methods. However, few studies have achieved time-optimal motion planning for multi-drone systems, particularly during highly agile maneuvers or in dynamic scenarios. This paper presents a decentralized policy network using multi-agent reinforcement learning for time-optimal multi-drone flight. To strike a balance between flight efficiency and collision avoidance, we introduce a soft collision-free mechanism inspired by optimization-based methods. By customizing PPO in a centralized training, decentralized execution (CTDE) fashion, we unlock higher efficiency and stability in training while ensuring lightweight implementation. Extensive simulations show that, despite slight performance trade-offs compared to single-drone systems, our multi-drone approach maintains near-time-optimal performance with a low collision rate. Real-world experiments validate our method, with two quadrotors using the same network as in simulation achieving a maximum speed of 13.65 m/s and a maximum body rate of 13.4 rad/s in a 5.5 m * 5.5 m * 2.0 m space across various tracks, relying entirely on onboard computation.
Updated: 2025-03-05 15:35:47
标题: 追逐金色飞贼:多无人机多智能体强化学习的时间最优运动规划
摘要: 最近自主无人机的创新促进了单个无人机配置中的时间最优飞行,并通过应用最优控制和基于学习的方法增强了多无人机系统的机动性。然而,很少有研究实现了多无人机系统的时间最优运动规划,特别是在高度灵活的机动或动态场景中。本文提出了一种使用多智能体强化学习的分散策略网络,用于实现多无人机时间最优飞行。为了在飞行效率和碰撞回避之间取得平衡,我们引入了一种受到基于优化的方法启发的软碰撞自由机制。通过以集中式训练,分散执行(CTDE)的方式定制PPO,我们在训练中实现了更高的效率和稳定性,同时确保了轻量级实现。大量模拟显示,尽管与单个无人机系统相比存在轻微的性能折衷,但我们的多无人机方法在低碰撞率的同时保持接近时间最优的性能。真实世界的实验证实了我们的方法,两个四旋翼飞行器在各种赛道上使用与模拟中相同的网络实现了最大速度为13.65 m/s,最大机体速率为13.4 rad/s,在一个5.5 m * 5.5 m * 2.0 m的空间中,完全依靠机载计算。
更新时间: 2025-03-05 15:35:47
领域: cs.RO,cs.LG
Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing
In numerous applications of geometric deep learning, the studied systems exhibit spatial symmetries and it is desirable to enforce these. For the symmetry of global rotations and reflections, this means that the model should be equivariant with respect to the transformations that form the group of $\mathrm O(d)$. While many approaches for equivariant message passing require specialized architectures, including non-standard normalization layers or non-linearities, we here present a framework based on local reference frames ("local canonicalization") which can be integrated with any architecture without restrictions. We enhance equivariant message passing based on local canonicalization by introducing tensorial messages to communicate geometric information consistently between different local coordinate frames. Our framework applies to message passing on geometric data in Euclidean spaces of arbitrary dimension. We explicitly show how our approach can be adapted to make a popular existing point cloud architecture equivariant. We demonstrate the superiority of tensorial messages and achieve state-of-the-art results on normal vector regression and competitive results on other standard 3D point cloud tasks.
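A small sketch of the frame-to-frame transport of a vector (type-1 tensor) feature, which is what makes tensorial messages consistent across local frames.

```python
import torch

def transport_vector_feature(v_j, R_j, R_i):
    """Re-express a vector feature from node j's local frame in node i's
    local frame: v_{j->i} = R_i^T R_j v_j.

    v_j: (B, 3) vector features expressed in frame R_j; R_j, R_i:
    (B, 3, 3) orthonormal local frames (columns are frame axes).
    Because both frames co-rotate with the input, the transported
    message is invariant under global rotations of the whole system.
    """
    # (R_i^T R_j v_j)_i = sum_{k,l} (R_i)_{ki} (R_j)_{kl} (v_j)_l
    return torch.einsum("bki,bkl,bl->bi", R_i, R_j, v_j)
```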
Updated: 2025-03-05 15:35:35
标题: 超越规范化:张量消息如何改进等变消息传递
摘要: 在许多几何深度学习的应用中,研究的系统展现出空间对称性,并且希望强制执行这些对称性。对于全局旋转和反射的对称性,这意味着模型应该具有关于形成$\mathrm O(d)$群的变换的等变性。虽然许多等变消息传递方法需要专门的架构,包括非标准的归一化层或非线性,但我们在这里提出了一个基于局部参考框架(“局部规范化”)的框架,可以与任何架构集成而没有限制。我们通过引入张量消息来增强基于局部规范化的等变消息传递,以便在不同的本地坐标框架之间一致地传递几何信息。我们的框架适用于在任意维度的欧几里得空间中进行几何数据的消息传递。我们明确展示了我们的方法如何被调整以使一个流行的现有点云架构具有等变性。我们展示了张量消息的优越性,并在法向量回归上取得了最先进的结果,并在其他标准的3D点云任务上取得了竞争性的结果。
更新时间: 2025-03-05 15:35:35
领域: cs.LG
Incentivizing Truthful Collaboration in Heterogeneous Federated Learning
Federated learning (FL) is a distributed collaborative learning method, where multiple clients learn together by sharing gradient updates instead of raw data. However, it is well-known that FL is vulnerable to manipulated updates from clients. In this work we study the impact of data heterogeneity on clients' incentives to manipulate their updates. First, we present heterogeneous collaborative learning scenarios where a client can modify their updates to be better off, and show that these manipulations can lead to diminishing model performance. To prevent such modifications, we formulate a game in which clients may misreport their gradient updates in order to "steer" the server model to their advantage. We develop a payment rule that provably disincentivizes sending modified updates under the FedSGD protocol. We derive explicit bounds on the clients' payments and the convergence rate of the global model, which allows us to study the trade-off between heterogeneity, payments and convergence. Finally, we provide an experimental evaluation of the effectiveness of our payment rule in the FedSGD, median-based aggregation FedSGD and FedAvg protocols on three tasks in computer vision and natural language processing. In all cases we find that our scheme successfully disincentivizes modifications.
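Purely as an illustration of the mechanism-design flavor (not the paper's payment rule), a peer-consistency payment that rewards alignment with the other clients' average update could look like this:

```python
import torch

def peer_consistency_payments(updates, scale=1.0):
    """Illustrative payment rule: pay each client according to the inner
    product of its reported gradient with the average of everyone
    else's, so a unilateral manipulation that steers the model away
    from the consensus direction lowers that client's payment.

    updates: list of flattened gradient tensors, one per client.
    """
    stacked = torch.stack(updates)        # (n, d)
    total = stacked.sum(dim=0)
    payments = []
    for i, g in enumerate(stacked):
        others_mean = (total - g) / (len(updates) - 1)  # leave-one-out mean
        payments.append(scale * torch.dot(g, others_mean).item())
    return payments
```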
Updated: 2025-03-05 15:32:01
标题: 激励异构联邦学习中真实合作
摘要: 联邦学习(FL)是一种分布式协作学习方法,多个客户端通过共享梯度更新而不是原始数据一起学习。然而,众所周知FL容易受到客户端操纵更新的影响。本文研究了数据异质性对客户端操纵其更新的动机的影响。首先,我们提出了异质协作学习场景,客户端可以修改其更新以获得更大好处,并显示这些操纵可能导致模型性能下降。为防止此类修改,我们制定了一个游戏,客户端可以误报其梯度更新以“操纵”服务器模型以获得优势。我们开发了一种支付规则,可以证明在FedSGD协议下减少发送修改更新的动机。我们导出了客户端支付和全局模型收敛速度的明确界限,以便研究异质性、支付和收敛之间的权衡。最后,我们在计算机视觉和自然语言处理领域的三个任务上对我们的支付规则在FedSGD、基于中位数聚合的FedSGD和FedAvg协议的有效性进行了实验评估。在所有情况下,我们发现我们的方案成功地减少了修改行为的动机。
更新时间: 2025-03-05 15:32:01
领域: cs.LG,cs.GT,stat.ML
Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
Score-based diffusion models have achieved incredible performance in generating realistic images, audio, and video data. While these models produce high-quality samples with impressive details, they often introduce unrealistic artifacts, such as distorted fingers or hallucinated texts with no meaning. This paper focuses on textual hallucinations, where diffusion models correctly generate individual symbols but assemble them in a nonsensical manner. Through experimental probing, we consistently observe that this phenomenon is attributable to the network's local generation bias. Denoising networks tend to produce outputs that rely heavily on highly correlated local regions, particularly when different dimensions of the data distribution are nearly pairwise independent. This behavior leads to a generation process that decomposes the global distribution into separate, independent distributions for each symbol, ultimately failing to capture the global structure, including underlying grammar. Intriguingly, this bias persists across various denoising network architectures, including MLPs and transformers, which have the structure to model global dependency. These findings also provide insights into understanding other types of hallucinations, extending beyond text, as a result of implicit biases in the denoising models. Additionally, we theoretically analyze the training dynamics for a specific case involving a two-layer MLP learning parity points on a hypercube, offering an explanation of its underlying mechanism.
Updated: 2025-03-05 15:28:50
标题: 通过局部生成偏差理解扩散模型的文本幻觉
摘要: 基于分数的扩散模型在生成逼真的图像、音频和视频数据方面取得了令人难以置信的性能。虽然这些模型生成了具有令人印象深刻细节的高质量样本,但它们经常引入不现实的伪像,比如扭曲的手指或无意义的幻想文本。本文关注文本幻觉,即扩散模型正确生成单个符号但以一种无意义的方式组合它们。通过实验探索,我们一致观察到这种现象归因于网络的局部生成偏差。去噪网络往往会生成依赖于高度相关的局部区域的输出,特别是当数据分布的不同维度几乎是成对独立时。这种行为导致了一个分解全局分布为各个符号的独立分布的生成过程,最终无法捕捉全局结构,包括潜在的语法。有趣的是,这种偏见在各种去噪网络架构中都存在,包括MLP和transformers,这些架构具有建模全局依赖关系的结构。这些发现还提供了对理解其他类型幻觉的见解,超越了文本,成为去噪模型中隐含偏见的结果。此外,我们在理论上分析了一个特定情况下的训练动态,涉及一个在超立方体上学习奇偶点的两层MLP,提供了其潜在机制的解释。
更新时间: 2025-03-05 15:28:50
领域: cs.LG,cs.AI
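The theoretical case analysed in the entry above, a two-layer MLP learning parity points on a hypercube, is easy to set up. A minimal sketch of that task; the width, optimizer, and step count are arbitrary choices, and the paper's contribution is the analysis of the training dynamics rather than this script:

```python
import torch
import torch.nn as nn

d, n = 8, 2048
X = torch.randint(0, 2, (n, d)).float() * 2 - 1   # points on the {-1,+1}^d hypercube
y = X.prod(dim=1)                                  # parity label in {-1,+1}

model = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    loss = ((model(X).squeeze(-1) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

acc = (model(X).squeeze(-1).sign() == y).float().mean().item()
print(f"train accuracy: {acc:.3f}")
```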
Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs
While LLMs have demonstrated remarkable potential in time series forecasting, their practical deployment remains constrained by excessive computational demands and memory footprints. Existing LLM-based approaches typically suffer from three critical limitations: inefficient parameter utilization in handling numerical time series patterns; modality misalignment between continuous temporal signals and discrete text embeddings; and inflexibility for real-time expert knowledge integration. We present SMETimes, the first systematic investigation of sub-3B parameter SLMs for efficient and accurate time series forecasting. Our approach centers on three key innovations: a statistically-enhanced prompting mechanism that bridges numerical time series with textual semantics through descriptive statistical features; an adaptive fusion embedding architecture that aligns temporal patterns with language model token spaces through learnable parameters; and a dynamic mixture-of-experts framework enabled by SLMs' computational efficiency, adaptively combining base predictions with domain-specific models. Extensive evaluations across seven benchmark datasets demonstrate that our 3B-parameter SLM achieves state-of-the-art performance on five primary datasets while maintaining 3.8x faster training and 5.2x lower memory consumption compared to 7B-parameter LLM baselines. Notably, the proposed model exhibits better learning capabilities, achieving 12.3% lower MSE than conventional LLMs. Ablation studies validate that our statistical prompting and cross-modal fusion modules respectively contribute 15.7% and 18.2% error reduction in long-horizon forecasting tasks. By redefining the efficiency-accuracy trade-off landscape, this work establishes SLMs as viable alternatives to resource-intensive LLMs for practical time series forecasting. Code and models are available at https://github.com/xiyan1234567/SMETimes.
Updated: 2025-03-05 15:27:36
标题: 小而强大:利用轻量级LLMs增强时间序列预测
摘要: 尽管LLMs在时间序列预测中表现出了显著的潜力,但它们的实际部署仍受到计算需求和内存占用量过大的限制。现有基于LLM的方法通常存在三个关键限制:处理数值时间序列模式时参数利用效率低;连续时间信号与离散文本嵌入之间存在模态不匹配;以及无法实时整合专业知识。我们提出了SMETimes,这是对于高效准确的时间序列预测的首次系统研究,基于小于3B参数的SLMs。我们的方法围绕三个关键创新展开:通过描述性统计特征构建了一个统计增强提示机制,将数值时间序列与文本语义相结合;采用自适应融合嵌入架构,通过可学习的参数将时间模式与语言模型令牌空间对齐;以及通过SLMs的计算效率实现的动态专家混合框架,自适应地将基本预测与领域特定模型结合。在七个基准数据集上的广泛评估表明,我们的3B参数SLM在五个主要数据集上实现了最先进的性能,同时相比于7B参数LLM基线训练速度更快3.8倍,内存消耗更低5.2倍。值得注意的是,所提出的模型表现出更好的学习能力,比传统LLM的均方误差低12.3%。消融研究验证了我们的统计提示和跨模态融合模块在长期预测任务中分别贡献了15.7%和18.2%的误差减少。通过重新定义效率-准确性权衡的格局,本研究将SLMs确立为实际时间序列预测的资源节约型LLMs的可行替代方案。代码和模型可在https://github.com/xiyan1234567/SMETimes获得。
更新时间: 2025-03-05 15:27:36
领域: cs.CL,cs.AI
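The statistically-enhanced prompting idea can be illustrated by serializing a series together with its descriptive statistics. A sketch with a hypothetical prompt template; SMETimes' actual template and feature set are specified in the paper, not here:

```python
import numpy as np

def stats_prompt(series: np.ndarray, horizon: int) -> str:
    """Bridge a numerical series and text via descriptive statistics
    (hypothetical template; illustrative only)."""
    diffs = np.diff(series)
    trend = "rising" if diffs.mean() > 0 else "falling"
    stats = (f"mean={series.mean():.3f}, std={series.std():.3f}, "
             f"min={series.min():.3f}, max={series.max():.3f}, trend={trend}")
    values = ", ".join(f"{v:.3f}" for v in series)
    return (f"Time series statistics: {stats}\n"
            f"History: {values}\n"
            f"Predict the next {horizon} values.")

print(stats_prompt(np.sin(np.linspace(0, 6, 24)), horizon=4))
```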
English K_Quantization of LLMs Does Not Disproportionately Diminish Multilingual Performance
For consumer usage of locally deployed LLMs, the GGUF format and k_quantization are invaluable tools for maintaining the performance of the original model while reducing it to sizes deployable with consumer-grade hardware. The number of bits dedicated to each weight from the original model is reduced based on how important they are thought to be during model inference. This importance is arrived at through the application of an 'importance matrix'-a relatively small text document meant to be representative of the LLM's standard use-cases. In the vast majority of quants available online, this document is primarily written in English. It was therefore an open question whether performance on English language tasks was preserved through the sacrifice of multilingual performance and whether it can be preserved with alternate importance matrices. This article investigates these hypotheses by quantizing Llama3.3 70B on importance matrices written in three languages (English, Norwegian, and Malayalam) and evaluating them on the MixEval dataset in both English and Norwegian. All experiments related to k_quantization yielded non-significant results (In all cases p > 0.237) indicating that current quantization practices do not disproportionately harm multilingual performance.
Updated: 2025-03-05 15:26:59
标题: 英语K_量化LLMs并不会不成比例地降低多语言性能
摘要: 对于消费者使用本地部署的LLM,GGUF格式和k_quantization是无价的工具,可以在将原始模型缩减到可在消费级硬件上部署的大小的同时保持性能。根据在模型推断过程中认为它们的重要性,将用于原始模型的每个权重的位数减少。这种重要性是通过应用一个“重要性矩阵”来确定的-这是一个相对较小的文本文档,旨在代表LLM的标准用例。在线上大多数可用的量化器中,该文档主要以英语撰写。因此,一个悬而未决的问题是,通过牺牲多语言性能,是否保留了在英语语言任务上的性能,并且是否可以通过替代的重要性矩阵来保留。本文通过在三种语言(英语、挪威语和马拉雅拉姆语)编写的重要性矩阵上对Llama3.3 70B进行量化,并在MixEval数据集上对其进行评估来探讨这些假设。所有与k_quantization相关的实验均产生了非显著结果(在所有情况下p> 0.237),表明当前的量化实践并不会过度损害多语言性能。
更新时间: 2025-03-05 15:26:59
领域: cs.CL,cs.AI
From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs
The research in AI-based formal mathematical reasoning has shown an unstoppable growth trend. These studies have excelled in mathematical competitions like IMO and have made significant progress. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five formal specification languages (Coq, Lean4, Dafny, ACSL, and TLA+) by distilling gpt-4o and evaluated against ten open-sourced LLMs, including recent popular DeepSeek-R1. We also fine-tuned several 7~8B small models to achieve comparable performance with Deepseek-R1-671B. Interestingly, we observed that fine-tuning with formal data also enhances mathematics, reasoning, and coding capabilities. Fine-tuned models are released at https://huggingface.co/fm-universe.
Updated: 2025-03-05 15:26:49
标题: 从非正式到正式--将自然语言需求整合并评估到可验证的形式证明
摘要: 基于人工智能的形式数学推理研究表现出不可阻挡的增长趋势。这些研究在数学竞赛如IMO中表现出色并取得了显著进展。本文着重于形式验证,即形式推理的一个直接应用场景,并将其分解为子任务。我们通过提炼gpt-4o构建了18k个高质量的指导-响应对,涵盖了五种形式规约语言(Coq、Lean4、Dafny、ACSL和TLA+),并与包括最新流行的DeepSeek-R1在内的十个开源LLM进行了评估。我们还对几个7~8B小型模型进行了微调,以实现与Deepseek-R1-671B相当的性能。有趣的是,我们观察到,使用形式数据进行微调还增强了数学、推理和编码能力。微调后的模型可以在https://huggingface.co/fm-universe 上获得。
更新时间: 2025-03-05 15:26:49
领域: cs.AI,cs.CL,cs.PL
On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds
This paper explores the key factors that influence the performance of models working with point clouds, across different tasks of varying geometric complexity. In this work, we explore the trade-offs between flexibility and weight-sharing introduced by equivariant layers, assessing when equivariance boosts or detracts from performance. It is often argued that providing more information as input improves a model's performance. However, if this additional information breaks certain properties, such as $\mathrm{SE}(3)$ equivariance, does it remain beneficial? We identify the key aspects of equivariant and non-equivariant architectures that drive success in different tasks by benchmarking them on segmentation, regression, and generation tasks across multiple datasets with increasing complexity. We observe a positive impact of equivariance, which becomes more pronounced with increasing task complexity, even when strict equivariance is not required.
Updated: 2025-03-05 15:26:17
标题: 关于等变性和对称性破坏在点云深度学习架构中的实用性
摘要: 本文探讨了影响模型在处理不同几何复杂度任务时性能的关键因素。在这项工作中,我们探讨了等变层引入的灵活性和权重共享之间的权衡,评估等变性何时增强或削弱性能。通常认为,提供更多信息作为输入可以提高模型的性能。然而,如果这些额外信息破坏了某些属性,比如$\mathrm{SE}(3)$等变性,它是否仍然有益?我们通过在不断增加复杂性的多个数据集上对分割、回归和生成任务进行基准测试,识别了推动不同任务成功的等变和非等变体系结构的关键方面。我们观察到等变性的积极影响,在任务复杂度增加时更加显著,即使不需要严格的等变性。
更新时间: 2025-03-05 15:26:17
领域: cs.CV,cs.AI,cs.LG
PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention
Large Language Models (LLMs) face efficiency bottlenecks due to the quadratic complexity of the attention mechanism when processing long contexts. Sparse attention methods offer a promising solution, but existing approaches often suffer from incomplete effective context and/or require complex pipeline implementations. We present a comprehensive analysis of sparse attention for autoregressive LLMs from the perspective of the receptive field, recognize the suboptimal nature of existing methods for expanding the receptive field, and introduce PowerAttention, a novel sparse attention design that facilitates effective and complete context extension through theoretical analysis. PowerAttention achieves exponential receptive field growth in $d$-layer LLMs, allowing each output token to attend to $2^d$ tokens, ensuring completeness and continuity of the receptive field. Experiments demonstrate that PowerAttention outperforms existing static sparse attention methods by $5\sim 40\%$, especially on tasks demanding long-range dependencies like Passkey Retrieval and RULER, while maintaining a comparable time complexity to sliding window attention. Efficiency evaluations further highlight PowerAttention's superior speedup in both prefilling and decoding phases compared with dynamic sparse attentions and full attention ($3.0\times$ faster on 128K context), making it a highly effective and user-friendly solution for processing long sequences in LLMs.
Updated: 2025-03-05 15:24:11
标题: PowerAttention:有效稀疏注意力的指数级扩展感受野
摘要: 大型语言模型(LLMs)面临效率瓶颈,因为当处理长文本时,注意力机制的复杂度是二次的。稀疏注意力方法提供了一个有希望的解决方案,但现有方法往往存在有效上下文不完整和/或需要复杂的流水线实现。我们从感受野的角度对自回归LLMs的稀疏注意力进行了全面分析,认识到现有方法在扩展感受野方面的亚最优性质,并介绍了PowerAttention,一种新颖的稀疏注意力设计,通过理论分析促进了有效和完整的上下文扩展。PowerAttention在$d$层LLMs中实现了指数级的感受野增长,使每个输出标记能够关注$2^d$个标记,确保了感受野的完整性和连续性。实验证明,PowerAttention在长距离依赖性任务(如Passkey检索和RULER)上优于现有的静态稀疏注意力方法$5\sim 40\%$,同时与滑动窗口注意力的时间复杂度保持可比性。效率评估进一步突显了PowerAttention相对于动态稀疏注意力和全注意力在预填充和解码阶段的优越加速性能(在128K上下文中快$3.0\times$),使其成为处理LLMs中长序列的高效且用户友好的解决方案。
更新时间: 2025-03-05 15:24:11
领域: cs.CL,cs.LG
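One way to realize exponential receptive-field growth is a causal mask in which token $i$ attends to itself and to positions $i - 2^k$. A sketch of that power-of-two pattern; whether PowerAttention uses exactly this offset set is an assumption here, since the abstract only specifies the exponential-growth property:

```python
import numpy as np

def power_attention_mask(seq_len: int) -> np.ndarray:
    """Causal sparse mask: token i attends to itself and to i - 2^k.
    Stacking d such layers lets information propagate ~2^d tokens back."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        mask[i, i] = True
        k = 0
        while i - 2**k >= 0:
            mask[i, i - 2**k] = True
            k += 1
    return mask

m = power_attention_mask(16)
print(m.sum(axis=1))   # per-token attended positions grow only logarithmically
```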
Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models
A safe and trustworthy use of Large Language Models (LLMs) requires an accurate expression of confidence in their answers. We introduce a novel Reinforcement Learning (RL) approach for LLM calibration that fine-tunes LLMs to elicit calibrated confidence estimations in their answers to factual questions. We model the problem as a betting game where the model predicts a confidence score together with every answer, and design a reward function that penalizes both over and under-confidence. We prove that under our reward design an optimal policy would result in a perfectly calibrated confidence estimation. Our experiments demonstrate significantly improved confidence calibration and generalization to new tasks without re-training, indicating that our approach teaches a general confidence awareness. This approach enables the training of inherently calibrated LLMs.
Updated: 2025-03-05 15:23:16
标题: 奖励怀疑:一种强化学习方法用于大型语言模型的置信度校准
摘要: 一种安全可靠地使用大型语言模型(LLMs)的方法需要对其回答的信心进行准确表达。我们引入了一种新颖的强化学习(RL)方法,用于对LLMs进行校准,以在回答事实性问题时引发校准的信心估计。我们将问题建模为一个赌博游戏,模型预测每个答案的信心分数,并设计了一个奖励函数,惩罚过度和不足的信心。我们证明,在我们的奖励设计下,一个最优策略将导致完全校准的信心估计。我们的实验证明,信心校准显著提高,并且在不重新训练的情况下对新任务的泛化能力表现良好,表明我们的方法教授了一种一般的信心意识。这种方法使得本质上校准的LLMs的训练成为可能。
更新时间: 2025-03-05 15:23:16
领域: cs.CL,cs.AI
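The betting-game reward can be illustrated with the logarithmic scoring rule, which is proper: expected reward is maximized exactly when the stated confidence equals the true probability of being correct. The paper's exact reward design may differ; this sketch only shares that calibration property:

```python
import numpy as np

def confidence_reward(confidence: float, correct: bool, eps: float = 1e-6) -> float:
    """Log-score reward: penalizes both over- and under-confidence.
    If an answer is correct with probability p, the expected reward
    p*log(c) + (1-p)*log(1-c) is maximized exactly at c = p."""
    c = np.clip(confidence, eps, 1 - eps)
    return float(np.log(c) if correct else np.log(1 - c))

# Expected reward peaks when stated confidence matches true accuracy:
p = 0.7
for c in (0.5, 0.7, 0.9):
    expected = p * np.log(c) + (1 - p) * np.log(1 - c)
    print(f"c={c:.1f}  expected reward={expected:.3f}")   # best at c=0.7
```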
Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories
Large Language Models (LLMs) have shown promise in software vulnerability detection, particularly on function-level benchmarks like Devign and BigVul. However, real-world detection requires interprocedural analysis, as vulnerabilities often emerge through multi-hop function calls rather than isolated functions. While repository-level benchmarks like ReposVul and VulEval introduce interprocedural context, they remain computationally expensive, lack pairwise evaluation of vulnerability fixes, and explore limited context retrieval, limiting their practicality. We introduce JitVul, a JIT vulnerability detection benchmark linking each function to its vulnerability-introducing and fixing commits. Built from 879 CVEs spanning 91 vulnerability types, JitVul enables comprehensive evaluation of detection capabilities. Our results show that ReAct Agents, leveraging thought-action-observation and interprocedural context, perform better than LLMs in distinguishing vulnerable from benign code. While prompting strategies like Chain-of-Thought help LLMs, ReAct Agents require further refinement. Both methods show inconsistencies, either misidentifying vulnerabilities or over-analyzing security guards, indicating significant room for improvement.
Updated: 2025-03-05 15:22:24
标题: 在代码仓库中实际漏洞检测中基于LLMs和基于LLM的代理的基准测试
摘要: 大型语言模型(LLMs)在软件漏洞检测方面表现出潜力,特别是在像Devign和BigVul这样的函数级基准上。然而,实际检测需要进行程序间分析,因为漏洞通常是通过多跳函数调用而不是孤立的函数而出现的。虽然像ReposVul和VulEval这样的存储库级基准引入了程序间上下文,但它们仍然计算昂贵,缺乏漏洞修复的成对评估,以及探索有限的上下文检索,限制了它们的实用性。 我们介绍了JitVul,一种JIT漏洞检测基准,将每个函数与其引入漏洞和修复提交链接起来。JitVul由涵盖91种漏洞类型的879个CVE构建,可以全面评估检测能力。我们的结果表明,利用思考-行动-观察和程序间上下文的ReAct Agents在区分易受攻击的与良性代码方面表现优于LLMs。虽然像思维链这样的提示策略有助于LLMs,但ReAct Agents需要进一步改进。这两种方法都显示出不一致性,要么错误地识别漏洞,要么过度分析安全防护措施,表明还有很大的改进空间。
更新时间: 2025-03-05 15:22:24
领域: cs.CR
A Generative System for Robot-to-Human Handovers: from Intent Inference to Spatial Configuration Imagery
We propose a novel system for robot-to-human object handover that emulates human coworker interactions. Unlike most existing studies that focus primarily on grasping strategies and motion planning, our system focuses on 1. inferring human handover intents, and 2. imagining the spatial handover configuration. The first integrates multimodal perception, combining visual and verbal cues, to infer human intent. The second uses a diffusion-based model to generate the handover configuration, involving the spatial relationship among the robot's gripper, the object, and the human hand, thereby mimicking the cognitive process of motor imagery. Experimental results demonstrate that our approach effectively interprets human cues and achieves fluent, human-like handovers, offering a promising solution for collaborative robotics. Code, videos, and data are available at: https://i3handover.github.io.
Updated: 2025-03-05 15:13:54
标题: 一个用于机器人向人类交接物品的生成系统:从意图推断到空间配置图像化
摘要: 我们提出了一个新颖的系统,用于模拟人机器人之间对象交接的方式,仿照人类同事之间的互动。与大多数现有研究主要关注抓取策略和运动规划不同,我们的系统专注于1. 推断人类交接意图,2. 想象空间交接配置。第一个部分整合多模态感知-结合视觉和语音线索-来推断人类意图。第二个部分使用基于扩散的模型生成交接配置,涉及机器人夹具、物体和人手之间的空间关系,从而模拟运动形象的认知过程。实验结果表明,我们的方法有效地解释人类线索,并实现流畅、类似人类的交接,为协作机器人提供了一种有前途的解决方案。代码、视频和数据可在以下网址获取:https://i3handover.github.io。
更新时间: 2025-03-05 15:13:54
领域: cs.RO,cs.LG,I.2.9
Efficient Neural SDE Training using Wiener-Space Cubature
A neural stochastic differential equation (SDE) is an SDE with drift and diffusion terms parametrized by neural networks. The training procedure for neural SDEs consists of optimizing the SDE vector field (neural network) parameters to minimize the expected value of an objective functional on infinite-dimensional path-space. Existing training techniques focus on methods to efficiently compute path-wise gradients of the objective functional with respect to these parameters, then pair this with Monte-Carlo simulation to estimate the expectation, and stochastic gradient descent to optimize. In this work we introduce a novel training technique which bypasses and improves upon Monte-Carlo simulation; we extend results in the theory of Wiener-space cubature to approximate the expected objective functional by a weighted sum of deterministic ODE solutions. This allows us to compute gradients by efficient ODE adjoint methods. Furthermore, we exploit a high-order recombination scheme to drastically reduce the number of ODE solutions necessary to achieve a reasonable approximation. We show that this Wiener-space cubature approach can surpass the $O(1/\sqrt{n})$ rate of Monte-Carlo simulation, or the $O(\log(n)/n)$ rate of quasi-Monte-Carlo, to achieve an $O(1/n)$ rate under reasonable assumptions.
Updated: 2025-03-05 15:10:51
标题: 使用维纳空间积分法进行高效的神经随机微分方程训练
摘要: 神经随机微分方程(SDE)是具有漂移和扩散项的微分方程,其参数由神经网络参数化。神经SDE的训练过程包括优化SDE向量场(神经网络)参数,以最小化无限维路径空间上的目标函数的期望值。现有的训练技术侧重于有效计算目标函数对这些参数的路径梯度,然后与蒙特卡罗模拟配对以估计期望值,并使用随机梯度下降进行优化。在这项工作中,我们引入了一种新颖的训练技术,绕过并改进了蒙特卡罗模拟;我们将Wiener空间积分理论的结果扩展到通过确定性ODE解的加权和来近似期望的目标函数。这使我们能够通过高效的ODE伴随方法计算梯度。此外,我们利用高阶重组方案大幅减少所需的ODE解数量,以实现合理的近似。我们展示了这种Wiener空间积分方法可以超越蒙特卡罗模拟的O(1/√n)速率,或准蒙特卡罗的O(log(n)/n)速率,在合理的假设下实现O(1/n)的速率。
更新时间: 2025-03-05 15:10:51
领域: cs.LG
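In the simplest case (degree-3, one dimension, a single step), Wiener-space cubature replaces Brownian motion on $[0, T]$ by two straight-line paths reaching $\pm\sqrt{T}$, each with weight $1/2$, turning the expectation into a weighted sum of ODE solutions. A numerical sketch of that reduction; the paper's method additionally uses higher-degree formulas, high-order recombination, and neural vector fields:

```python
import numpy as np
from scipy.integrate import solve_ivp

T = 1.0
b = lambda x: -x        # drift
s = lambda x: 0.4       # constant diffusion, so the Stratonovich
                        # correction -0.5*s*s' vanishes here

def ode_endpoint(z):
    # ODE driven by the straight-line path: dX/dt = b(X) + s(X) * z / sqrt(T)
    rhs = lambda t, x: [b(x[0]) + s(x[0]) * z / np.sqrt(T)]
    return solve_ivp(rhs, (0.0, T), [1.0]).y[0, -1]

f = lambda x: x**2
approx = 0.5 * f(ode_endpoint(+1.0)) + 0.5 * f(ode_endpoint(-1.0))

# Monte-Carlo reference for E[f(X_T)] with X_0 = 1 (an Ornstein-Uhlenbeck SDE)
rng = np.random.default_rng(0)
n, dt = 20000, 1e-3
x = np.ones(n)
for _ in range(int(T / dt)):
    x += b(x) * dt + s(x) * np.sqrt(dt) * rng.standard_normal(n)
print(f"cubature: {approx:.4f}   monte-carlo: {np.mean(f(x)):.4f}")
```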
Optimal Decision Tree Pruning Revisited: Algorithms and Complexity
We present a comprehensive classical and parameterized complexity analysis of decision tree pruning operations, extending recent research on the complexity of learning small decision trees. Thereby, we offer new insights into the computational challenges of decision tree simplification, a crucial aspect of developing interpretable and efficient machine learning models. We focus on fundamental pruning operations of subtree replacement and raising, which are used in heuristics. Surprisingly, while optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising. Therefore, we identify parameters and combinations thereof that lead to fixed-parameter tractability or hardness, establishing a precise borderline between these complexity classes. For example, while subtree raising is hard for small domain size $D$ or number $d$ of features, it can be solved in $D^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the input size. We complement our theoretical findings with preliminary experimental results, demonstrating the practical implications of our analysis.
Updated: 2025-03-05 15:02:46
标题: 最佳决策树剪枝再探讨:算法与复杂性
摘要: 我们提出了一项关于决策树修剪操作的综合经典和参数化复杂性分析,扩展了最近关于学习小型决策树复杂性的研究。因此,我们提供了对决策树简化的计算挑战的新见解,这是发展可解释和高效机器学习模型的关键方面。我们专注于子树替换和提升这两种基本修剪操作,在启发式算法中使用。令人惊讶的是,虽然子树替换的最优修剪可以在多项式时间内完成,但对于子树提升,该问题是NP完全的。因此,我们确定了导致固定参数可处理性或难度的参数及其组合,建立了这些复杂性类之间的精确界限。例如,虽然对于小域大小$D$或特征数$d$,子树提升是困难的,但可以在$D^{2d} \cdot |I|^{O(1)}$时间内解决,其中$|I|$是输入大小。我们将我们的理论发现与初步实验结果相结合,展示了我们分析的实际影响。
更新时间: 2025-03-05 15:02:46
领域: cs.LG
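Subtree replacement, the operation shown above to admit polynomial-time optimal pruning, collapses a subtree into a leaf. A greedy bottom-up sketch based on training-error counts; the paper's algorithms and exact objective are more refined than this heuristic:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    prediction: int                  # majority label at this node
    errors_as_leaf: int              # training errors if collapsed to a leaf
    left: Optional["Node"] = None    # internal nodes have both children
    right: Optional["Node"] = None

def subtree_errors(t: Node) -> int:
    if t.left is None and t.right is None:
        return t.errors_as_leaf
    return subtree_errors(t.left) + subtree_errors(t.right)

def prune_by_replacement(t: Node) -> Node:
    """Bottom-up subtree replacement: collapse a subtree into a leaf whenever
    the leaf makes no more training errors than the subtree it replaces."""
    if t.left is None and t.right is None:
        return t
    t.left = prune_by_replacement(t.left)
    t.right = prune_by_replacement(t.right)
    if t.errors_as_leaf <= subtree_errors(t):
        return Node(t.prediction, t.errors_as_leaf)   # replace with a leaf
    return t
```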
Olympus: A Jumping Quadruped for Planetary Exploration Utilizing Reinforcement Learning for In-Flight Attitude Control
Exploring planetary bodies with lower gravity, such as the moon and Mars, allows legged robots to utilize jumping as an efficient form of locomotion thus giving them a valuable advantage over traditional rovers for exploration. Motivated by this fact, this paper presents the design, simulation, and learning-based "in-flight" attitude control of Olympus, a jumping legged robot tailored to the gravity of Mars. First, the design requirements are outlined followed by detailing how simulation enabled optimizing the robot's design - from its legs to the overall configuration - towards high vertical jumping, forward jumping distance, and in-flight attitude reorientation. Subsequently, the reinforcement learning policy used to track desired in-flight attitude maneuvers is presented. Successfully crossing the sim2real gap, extensive experimental studies of attitude reorientation tests are demonstrated.
Updated: 2025-03-05 15:01:56
标题: 奥林巴斯:一种用于行星探索的跳跃四足机器人,利用强化学习进行飞行中姿态控制
摘要: 探索重力较低的行星天体(如月球和火星)使有腿机器人能够利用跳跃作为一种高效的运动形式,从而使它们在探索中比传统探测车具有宝贵的优势。受到这一事实的启发,本文介绍了专为火星重力设计的跳跃式有腿机器人奥林巴斯(Olympus)的设计、模拟和基于学习的“飞行中”姿态控制。首先,概述了设计要求,然后详细介绍了模拟如何优化机器人的设计(从它的腿到整体配置),以实现高垂直跳跃、前向跳跃距离和飞行中的姿态重新定位。随后,介绍了用于跟踪期望的飞行中姿态机动的强化学习策略。实验成功跨越了从模拟到现实的差距,展示了大量姿态重新定位测试的实验研究。
更新时间: 2025-03-05 15:01:56
领域: cs.RO,cs.LG
Domain Consistent Industrial Decarbonisation of Global Coal Power Plants
Machine learning and optimisation techniques (MLOPT) hold significant potential to accelerate the decarbonisation of industrial systems by enabling data-driven operational improvements. However, the practical application of MLOPT in industrial settings is often hindered by a lack of domain compliance and system-specific consistency, resulting in suboptimal solutions with limited real-world applicability. To address this challenge, we propose a novel human-in-the-loop (HITL) constraint-based optimisation framework that integrates domain expertise with data-driven methods, ensuring solutions are both technically sound and operationally feasible. We demonstrate the efficacy of this framework through a case study focused on enhancing the thermal efficiency and reducing the turbine heat rate of a 660 MW supercritical coal-fired power plant. By embedding domain knowledge as constraints within the optimisation process, our approach yields solutions that align with the plant's operational patterns and are seamlessly integrated into its control systems. Empirical validation confirms a mean improvement in thermal efficiency of 0.64\% and a mean reduction in turbine heat rate of 93 kJ/kWh. Scaling our analysis to 59 global coal power plants with comparable capacity and fuel type, we estimate a cumulative lifetime reduction of 156.4 million tons of carbon emissions. These results underscore the transformative potential of our HITL-MLOPT framework in delivering domain-compliant, implementable solutions for industrial decarbonisation, offering a scalable pathway to mitigate the environmental impact of coal-based power generation worldwide.
Updated: 2025-03-05 15:00:39
标题: 全球煤电厂领域内一致的工业脱碳
摘要: 机器学习和优化技术(MLOPT)在加速工业系统脱碳方面具有重要潜力,通过实现基于数据的运营改进。然而,在工业环境中实际应用MLOPT通常受到领域合规性和系统特定一致性不足的阻碍,导致次优解决方案的现实适用性有限。为了解决这一挑战,我们提出了一种新颖的人机协同(HITL)基于约束的优化框架,将领域专业知识与数据驱动方法整合在一起,确保解决方案在技术上合理且在操作上可行。我们通过一个案例研究展示了该框架的有效性,重点是提高660兆瓦超临界燃煤电厂的热效率和减少涡轮热耗。通过将领域知识嵌入到优化过程中作为约束,我们的方法产生与电厂运营模式一致并无缝集成到其控制系统中的解决方案。实证验证确认热效率平均提高0.64\%,涡轮热耗平均减少93 kJ/kWh。将我们的分析扩展到59座全球燃煤电厂,这些电厂具有相似的容量和燃料类型,我们估计将减少156.4百万吨碳排放量。这些结果突显了我们的HITL-MLOPT框架在为工业脱碳提供符合领域要求、可实施的解决方案方面的变革潜力,为减轻全球燃煤发电对环境的影响提供了可扩展的途径。
更新时间: 2025-03-05 15:00:39
领域: cs.LG,cs.SY,eess.SY
DeePen: Penetration Testing for Audio Deepfake Detection
Deepfakes - manipulated or forged audio and video media - pose significant security risks to individuals, organizations, and society at large. To address these challenges, machine learning-based classifiers are commonly employed to detect deepfake content. In this paper, we assess the robustness of such classifiers through a systematic penetration testing methodology, which we introduce as DeePen. Our approach operates without prior knowledge of or access to the target deepfake detection models. Instead, it leverages a set of carefully selected signal processing modifications - referred to as attacks - to evaluate model vulnerabilities. Using DeePen, we analyze both real-world production systems and publicly available academic model checkpoints, demonstrating that all tested systems exhibit weaknesses and can be reliably deceived by simple manipulations such as time-stretching or echo addition. Furthermore, our findings reveal that while some attacks can be mitigated by retraining detection systems with knowledge of the specific attack, others remain persistently effective. We release all associated code.
Updated: 2025-03-05 14:58:33
标题: DeePen:用于音频深度伪造检测的渗透测试
摘要: Deepfakes - 被篡改或伪造的音频和视频媒体 - 对个人、组织和整个社会造成重大安全风险。为了解决这些挑战,常常使用基于机器学习的分类器来检测深度伪造内容。本文通过系统化的渗透测试方法评估这些分类器的鲁棒性,我们将其称为DeePen。我们的方法在没有先前知识或访问目标深度伪造检测模型的情况下运作。相反,它利用一组精心选择的信号处理修改 - 称为攻击 - 来评估模型的漏洞。使用DeePen,我们分析了真实世界的生产系统和公开可用的学术模型检查点,表明所有测试系统都存在弱点,并可以通过简单的操作(如时间拉伸或回声添加)可靠地欺骗。此外,我们的研究结果表明,虽然一些攻击可以通过重新训练检测系统并了解特定攻击的知识来减轻,但其他攻击仍然持续有效。我们发布了所有相关代码。
更新时间: 2025-03-05 14:58:33
领域: cs.CR,cs.AI,cs.SD,eess.AS
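Two of the simple manipulations the entry above reports as effective, echo addition and time-stretching, are easy to reproduce. A sketch using numpy and librosa; the delay, gain, and stretch rate below are illustrative, not DeePen's tested parameters:

```python
import numpy as np
import librosa

def add_echo(y: np.ndarray, sr: int, delay_s: float = 0.15, gain: float = 0.5):
    """Mix a delayed, attenuated copy of the signal into itself."""
    d = int(delay_s * sr)
    out = np.copy(y)
    out[d:] += gain * y[:-d]
    return out / max(1.0, np.abs(out).max())   # renormalize to avoid clipping

y, sr = librosa.load(librosa.ex("trumpet"))    # any mono waveform works here
y_echo = add_echo(y, sr)
y_stretch = librosa.effects.time_stretch(y, rate=1.1)  # 10% faster, pitch kept
```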
Bringing AI Participation Down to Scale: A Comment on OpenAI's Democratic Inputs to AI Project
In 2023, OpenAI's Democratic Inputs program funded 10 teams to design procedures for public participation in generative AI. In this Perspective, we review the results of the project. Drawing on interviews with some of the teams and our own experiences conducting participation exercises, we identify several shared yet largely unspoken assumptions of the Democratic Inputs program: 1. that participation must be scalable; 2. that the object of participation is a single model; 3. that there must be a single form of participation; 4. that the goal is to extract abstract principles; 5. that these principles should have consensus; 6. that publics should be representative. We encourage alternative forms of participation in AI, perhaps not undertaken by tech companies.
Updated: 2025-03-05 14:55:49
标题: 将AI参与降低到规模:对OpenAI“AI民主输入”项目的评论
摘要: 2023年,OpenAI的“民主输入”计划资助了10个团队,设计公众参与生成式人工智能的程序。在这篇观点文章中,我们回顾了该项目的结果。基于对部分团队的访谈以及我们自己开展参与活动的经验,我们发现了“民主输入”计划中一些共同但很少被明言的假设:1. 参与必须可扩展;2. 参与的对象是单一模型;3. 必须有单一的参与形式;4. 目标是提取抽象原则;5. 这些原则应达成共识;6. 公众应具有代表性。我们鼓励人工智能中其他形式的参与,这些参与或许不由科技公司开展。
更新时间: 2025-03-05 14:55:49
领域: cs.CY,cs.AI
Probabilistic Insights for Efficient Exploration Strategies in Reinforcement Learning
We investigate efficient exploration strategies of environments with unknown stochastic dynamics and sparse rewards. Specifically, we first analyze the impact of parallel simulations on the probability of reaching rare states within a finite time budget. Using simplified models based on random walks and L\'evy processes, we provide analytical results that demonstrate a phase transition in reaching probabilities as a function of the number of parallel simulations. We identify an optimal number of parallel simulations that balances exploration diversity and time allocation. Additionally, we analyze a restarting mechanism that exponentially enhances the probability of success by redirecting efforts toward more promising regions of the state space. Our findings contribute to a more qualitative and quantitative theory of some exploration schemes in reinforcement learning, offering insights into developing more efficient strategies for environments characterized by rare events.
Updated: 2025-03-05 14:53:32
标题: 强化学习中高效探索策略的概率洞察
摘要: 我们研究了在具有未知随机动态和稀疏奖励的环境中的有效探索策略。具体而言,我们首先分析了并行模拟对在有限时间预算内达到稀有状态的概率的影响。利用基于随机游走和L\'evy过程的简化模型,我们提供了分析结果,证明了达到概率作为并行模拟数量的函数发生相变。我们确定了一种平衡探索多样性和时间分配的最佳并行模拟数量。此外,我们分析了一种重新启动机制,通过将努力重定向到状态空间中更有前景的区域,指数级增加成功的概率。我们的研究结果有助于对强化学习中一些探索方案的定性和定量理论,为针对稀有事件环境开发更有效的策略提供了见解。
更新时间: 2025-03-05 14:53:32
领域: math.PR,cs.LG
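The core experiment, how splitting a fixed step budget across parallel walkers changes the chance of hitting a rare state, can be reproduced with a few lines of Monte Carlo. A sketch with symmetric random walks; the numbers are illustrative, but the estimate is typically non-monotone in the number of parallel simulations, which is why an intermediate number can be optimal:

```python
import numpy as np

def hit_probability(n_parallel: int, budget: int, target: int = 30,
                    trials: int = 2000, seed: int = 0) -> float:
    """P(at least one of n_parallel symmetric random walks reaches `target`)
    when each walk gets budget // n_parallel steps, estimated by Monte Carlo."""
    rng = np.random.default_rng(seed)
    steps = budget // n_parallel
    hits = 0
    for _ in range(trials):
        inc = rng.choice((-1, 1), size=(n_parallel, steps))
        paths = np.cumsum(inc, axis=1)
        hits += bool((paths.max(axis=1) >= target).any())
    return hits / trials

for n in (1, 4, 16, 64):
    print(n, hit_probability(n, budget=4096))
```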
VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection
This research introduces novel AI techniques, Mixture-of-Experts Transformers with Group Relative Policy Optimization (GRPO), for voice healthcare applications in voice pathology detection. Alongside these architectural innovations, we adopt advanced training paradigms inspired by reinforcement learning, namely Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), to enhance model stability and performance. Experiments conducted on a synthetically generated voice pathology dataset demonstrate that our proposed models significantly improve diagnostic accuracy, F1 score, and ROC-AUC compared to conventional approaches. These findings underscore the potential of integrating transformer architectures with novel training strategies to advance automated voice pathology detection and ultimately contribute to more effective healthcare delivery. The code we used to train and evaluate our models is available at https://github.com/enkhtogtokh/voicegrpo
Updated: 2025-03-05 14:52:57
标题: VoiceGRPO:基于组相对策略优化的现代MoE Transformers,用于AI语音健康护理应用中的声音病理检测
摘要: 这项研究引入了一种新颖的人工智能技术,即混合专家变压器与组相对策略优化(GRPO),用于语音健康护理应用中的语音病理检测。通过架构创新,我们采用受强化学习启发的先进训练范式,即近端策略优化(PPO)和组相对策略优化(GRPO),以增强模型稳定性和性能。在一个合成生成的语音病理数据集上进行的实验表明,与传统方法相比,我们提出的模型显著提高了诊断准确性、F1分数和ROC-AUC。这些发现强调了将变压器架构与新颖训练策略相结合,推进自动语音病理检测,并最终有助于更有效的医疗服务交付的潜力。我们用来训练和评估模型的代码可在https://github.com/enkhtogtokh/voicegrpo找到。
更新时间: 2025-03-05 14:52:57
领域: cs.SD,cs.AI,eess.AS
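GRPO's defining step is computing advantages relative to a group of rollouts for the same input, standardized by the group's own statistics, rather than against a learned value baseline. A minimal sketch of that computation:

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages: standardize each sampled output's reward
    against the group of outputs drawn for the same input."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

# e.g. rewards for 6 diagnoses sampled for one voice recording
print(grpo_advantages(np.array([0.9, 0.4, 0.7, 0.1, 0.6, 0.5])))
```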
A Conceptual Model for Attributions in Event-Centric Knowledge Graphs
The use of narratives as a means of fusing information from knowledge graphs (KGs) into a coherent line of argumentation has been the subject of recent investigation. Narratives are especially useful in event-centric knowledge graphs in that they provide a means to connect different real-world events and categorize them by well-known narrations. However, specifically for controversial events, a problem in information fusion arises, namely, multiple viewpoints regarding the validity of certain event aspects, e.g., regarding the role a participant takes in an event, may exist. Expressing those viewpoints in KGs is challenging because disputed information provided by different viewpoints may introduce inconsistencies. Hence, most KGs only feature a single view on the contained information, hampering the effectiveness of narrative information access. This paper is an extension of our original work and introduces attributions, i.e., parameterized predicates that allow for the representation of facts that are only valid in a specific viewpoint. For this, we develop a conceptual model that allows for the representation of viewpoint-dependent information. As an extension, we enhance the model by a conception of viewpoint-compatibility. Based on this, we deepen our original deliberations on the model's effects on information fusion and provide additional grounding in the literature.
Updated: 2025-03-05 14:51:46
标题: 一个事件中心知识图中的归因概念模型
摘要: 最近的研究探讨了将叙事作为一种将知识图谱(KGs)中的信息融合成连贯论证的手段。叙事在以事件为中心的知识图谱中特别有用,因为它们提供了一种连接不同现实世界事件并按照众所周知的叙述进行分类的方法。然而,对于有争议的事件,信息融合中存在一个问题,即关于某些事件方面的有效性的多个观点,例如某个参与者在事件中扮演的角色。在知识图谱中表达这些观点是具有挑战性的,因为不同观点提供的有争议信息可能引入不一致性。因此,大多数知识图谱只展示了包含信息的一个视角,阻碍了叙事信息访问的有效性。本文是我们原始工作的延伸,并引入了属性,即允许表示仅在特定视角下有效的事实的参数化谓词。为此,我们开发了一个概念模型,允许表示依赖于视角的信息。作为扩展,我们通过一个视角兼容性的概念加强了该模型。基于此,我们深化了对模型对信息融合的影响的原始思考,并在文献中提供了额外的基础。
更新时间: 2025-03-05 14:51:46
领域: cs.DB,cs.AI
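One simple way to encode an attribution is to parameterize each fact with the viewpoint in which it is asserted. A hypothetical Python encoding for illustration; the paper's conceptual model, including viewpoint-compatibility, is considerably richer:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributedFact:
    """A parameterized predicate: a triple asserted to hold only
    within a given viewpoint (hypothetical encoding)."""
    subject: str
    predicate: str
    obj: str
    viewpoint: str

facts = [
    AttributedFact("personX", "instigator_of", "eventY", viewpoint="source_A"),
    AttributedFact("personX", "bystander_in", "eventY", viewpoint="source_B"),
]
# Conflicting role assertions coexist without making the graph inconsistent,
# because each fact's validity is scoped to its viewpoint.
print([f for f in facts if f.viewpoint == "source_A"])
```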
FedPalm: A General Federated Learning Framework for Closed- and Open-Set Palmprint Verification
Current deep learning (DL)-based palmprint verification models rely on centralized training with large datasets, which raises significant privacy concerns due to biometric data's sensitive and immutable nature. Federated learning (FL), a privacy-preserving distributed learning paradigm, offers a compelling alternative by enabling collaborative model training without the need for data sharing. However, FL-based palmprint verification faces critical challenges, including data heterogeneity from diverse identities and the absence of standardized evaluation benchmarks. This paper addresses these gaps by establishing a comprehensive benchmark for FL-based palmprint verification, which explicitly defines and evaluates two practical scenarios: closed-set and open-set verification. We propose FedPalm, a unified FL framework that balances local adaptability with global generalization. Each client trains a personalized textural expert tailored to local data and collaboratively contributes to a shared global textural expert for extracting generalized features. To further enhance verification performance, we introduce a Textural Expert Interaction Module that dynamically routes textural features among experts to generate refined side textural features. Learnable parameters are employed to model relationships between original and side features, fostering cross-texture-expert interaction and improving feature discrimination. Extensive experiments validate the effectiveness of FedPalm, demonstrating robust performance across both scenarios and providing a promising foundation for advancing FL-based palmprint verification research.
Updated: 2025-03-05 14:49:42
标题: FedPalm:一种闭合和开放式掌纹验证的通用联邦学习框架
摘要: 目前基于深度学习(DL)的掌纹验证模型依赖于具有大型数据集的集中式训练,这引发了由于生物特征数据的敏感和不可变性而引起的重大隐私问题。联邦学习(FL)是一种保护隐私的分布式学习范式,提供了一种引人注目的替代方案,可以实现协作模型训练而无需共享数据。然而,基于FL的掌纹验证面临着关键挑战,包括来自不同身份的数据异质性和缺乏标准化评估基准。本文通过建立一个FL-based掌纹验证的全面基准来填补这些空白,明确定义和评估了两种实际场景:封闭集和开放集验证。我们提出了FedPalm,一个统一的FL框架,平衡了本地适应性和全局泛化性。每个客户端训练一个针对本地数据量身定制的个性化纹理专家,并共同为提取广义特征贡献到一个共享的全局纹理专家。为了进一步提高验证性能,我们引入了一个纹理专家交互模块,动态地将纹理特征路由到专家之间,生成精细的侧纹理特征。采用可学习参数来建模原始特征和侧特征之间的关系,促进跨纹理专家的互动,提高特征区分能力。大量实验证实了FedPalm的有效性,展示了在两种场景下的稳健性能,并为推进基于FL的掌纹验证研究奠定了有前途的基础。
更新时间: 2025-03-05 14:49:42
领域: cs.CV,cs.AI,cs.CR
Online Convex Optimisation: The Optimal Switching Regret for all Segmentations Simultaneously
We consider the classic problem of online convex optimisation. Whereas the notion of static regret is relevant for stationary problems, the notion of switching regret is more appropriate for non-stationary problems. A switching regret is defined relative to any segmentation of the trial sequence, and is equal to the sum of the static regrets of each segment. In this paper we show that, perhaps surprisingly, we can achieve the asymptotically optimal switching regret on every possible segmentation simultaneously. Our algorithm for doing so is very efficient: having a space and per-trial time complexity that is logarithmic in the time-horizon. Our algorithm also obtains novel bounds on its dynamic regret: being adaptive to variations in the rate of change of the comparator sequence.
Updated: 2025-03-05 14:49:25
标题: 在线凸优化:同时为所有分割找到最佳切换遗憾
摘要: 我们考虑在线凸优化的经典问题。对于静态问题,静态遗憾的概念是相关的,而对于非静态问题,切换遗憾的概念更为合适。切换遗憾是相对于试验序列的任何分段定义的,等于每个分段的静态遗憾之和。在本文中,我们展示了一个令人惊讶的结果,即我们可以同时在每个可能的分段上实现渐近最优的切换遗憾。我们的算法非常高效:在时间段内的空间和每次试验时间复杂度是对数的。我们的算法还获得了其动态遗憾的新界限:对比序列变化速率的适应性。
更新时间: 2025-03-05 14:49:25
领域: cs.LG,stat.ML
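For reference, the switching regret relative to a segmentation $S_1, \dots, S_K$ of the trials $1, \dots, T$ is the sum of per-segment static regrets; in standard notation (convex losses $f_t$, the algorithm's plays $x_t$, one comparator $u_k$ per segment) it reads:

```latex
\[
  \mathrm{Regret}(S_1,\dots,S_K)
  = \sum_{k=1}^{K} \left(
      \sum_{t \in S_k} f_t(x_t)
      - \min_{u_k \in \mathcal{X}} \sum_{t \in S_k} f_t(u_k)
    \right)
\]
```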
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
Humans detect real-world object anomalies by perceiving, interacting, and reasoning based on object-conditioned physical knowledge. The long-term goal of Industrial Anomaly Detection (IAD) is to enable machines to autonomously replicate this skill. However, current IAD algorithms are largely developed and tested on static, semantically simple datasets, which diverge from real-world scenarios where physical understanding and reasoning are essential. To bridge this gap, we introduce the Physics Anomaly Detection (Phys-AD) dataset, the first large-scale, real-world, physics-grounded video dataset for industrial anomaly detection. Collected using a real robot arm and motor, Phys-AD provides a diverse set of dynamic, semantically rich scenarios. The dataset includes more than 6400 videos across 22 real-world object categories, interacting with robot arms and motors, and exhibits 47 types of anomalies. Anomaly detection in Phys-AD requires visual reasoning, combining both physical knowledge and video content to determine object abnormality. We benchmark state-of-the-art anomaly detection methods under three settings: unsupervised AD, weakly-supervised AD, and video-understanding AD, highlighting their limitations in handling physics-grounded anomalies. Additionally, we introduce the Physics Anomaly Explanation (PAEval) metric, designed to assess the ability of visual-language foundation models to not only detect anomalies but also provide accurate explanations for their underlying physical causes. Our dataset and benchmark will be publicly available.
Updated: 2025-03-05 14:49:08
标题: 朝向对现实世界物理动态的视觉区分和推理:基于物理的异常检测
摘要: 人类通过感知、互动和基于物体条件的物理知识进行推理来检测真实世界物体异常。工业异常检测(IAD)的长期目标是使机器能够自主复制这种技能。然而,当前的IAD算法主要是在静态、语义简单的数据集上开发和测试的,这些数据集与物理理解和推理至关重要的真实世界场景有所偏离。为了弥合这一差距,我们引入了Physics Anomaly Detection(Phys-AD)数据集,这是第一个面向工业异常检测的大规模、真实世界、基于物理的视频数据集。Phys-AD使用真实机器人臂和电机收集,提供了一组多样化的动态、语义丰富的场景。该数据集包括22个真实世界物体类别的6400多个视频,这些物体与机器人臂和电机进行互动,并展示了47种异常类型。Phys-AD中的异常检测需要视觉推理,结合物理知识和视频内容来确定物体的异常性。我们在三种设置下对最先进的异常检测方法进行基准测试:无监督AD、弱监督AD和视频理解AD,突出它们在处理基于物理的异常方面的局限性。此外,我们引入了Physics Anomaly Explanation(PAEval)度量标准,旨在评估视觉语言基础模型不仅能够检测异常,而且还能够提供关于其潜在物理原因的准确解释。我们的数据集和基准测试将公开提供。
更新时间: 2025-03-05 14:49:08
领域: cs.CV,cs.AI
Transformer-Based Power Optimization for Max-Min Fairness in Cell-Free Massive MIMO
Power allocation is an important task in wireless communication networks. Classical optimization algorithms and deep learning methods, while effective in small and static scenarios, become either computationally demanding or unsuitable for large and dynamic networks with varying user loads. This letter explores the potential of transformer-based deep learning models to address these challenges. We propose a transformer neural network to jointly predict optimal uplink and downlink power using only user and access point positions. The max-min fairness problem in cell-free massive multiple input multiple output systems is considered. Numerical results show that the trained model provides near-optimal performance and adapts to varying numbers of users and access points without retraining, additional processing, or updating its neural network architecture. This demonstrates the effectiveness of the proposed model in achieving robust and flexible power allocation for dynamic networks.
Updated: 2025-03-05 14:49:06
标题: 基于Transformer的功率优化在无蜂窝大规模MIMO中的最大最小公平性
摘要: 功率分配是无线通信网络中的重要任务。传统的优化算法和深度学习方法在小型和静态场景中有效,但在具有不同用户负载的大型和动态网络中要么计算量过大,要么不适用。本文探讨了基于变压器的深度学习模型应对这些挑战的潜力。我们提出了一个变压器神经网络,仅使用用户和接入点位置来联合预测最佳上行和下行功率。考虑了无蜂窝大规模多输入多输出系统中的最大最小公平问题。数值结果表明,训练模型提供了接近最优性能,并适应不同数量的用户和接入点,无需重新训练、额外处理或更新其神经网络架构。这证明了所提出模型在实现动态网络的稳健和灵活功率分配方面的有效性。
更新时间: 2025-03-05 14:49:06
领域: eess.SY,cs.LG,cs.SY
Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction
Training multimodal generative models on large, uncurated datasets can result in users being exposed to harmful, unsafe and controversial or culturally-inappropriate outputs. While model editing has been proposed to remove or filter undesirable concepts in embedding and latent spaces, it can inadvertently damage learned manifolds, distorting concepts in close semantic proximity. We identify limitations in current model editing techniques, showing that even benign, proximal concepts may become misaligned. To address the need for safe content generation, we leverage safe embeddings and a modified diffusion process with tunable weighted summation in the latent space to generate safer images. Our method preserves global context without compromising the structural integrity of the learned manifolds. We achieve state-of-the-art results on safe image generation benchmarks and offer intuitive control over the level of model safety. We identify trade-offs between safety and censorship, which presents a necessary perspective in the development of ethical AI models. We will release our code. Keywords: Text-to-Image Models, Generative AI, Safety, Reliability, Model Editing
Updated: 2025-03-05 14:45:55
标题: 安全而不影响语义:通过保持上下文的双重潜在重建实现无需编辑的安全图像生成
摘要: 在大型、未经筛选的数据集上训练多模态生成模型可能导致用户暴露于有害、不安全和有争议或文化不当的输出。虽然已经提出了模型编辑来消除或过滤嵌入和潜在空间中的不良概念,但它可能无意中损害了学习的流形,扭曲了语义接近的概念。我们确定了当前模型编辑技术的局限性,表明即使是良性的、接近的概念也可能变得不一致。为了解决安全内容生成的需求,我们利用安全嵌入和在潜在空间中可调加权求和的修改扩散过程来生成更安全的图像。我们的方法保留全局上下文而不损害学习流形的结构完整性。我们在安全图像生成基准测试中取得了最先进的结果,并提供了对模型安全水平的直观控制。我们确定了安全和审查之间的权衡,这在开发道德人工智能模型中是必要的视角。我们将发布我们的代码。 关键词:文本到图像模型,生成人工智能,安全性,可靠性,模型编辑
更新时间: 2025-03-05 14:45:55
领域: cs.CV,cs.AI
LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
The rapid development of Large Language Models (LLMs) has brought significant advancements across various tasks. However, despite these achievements, LLMs still exhibit inherent safety vulnerabilities, especially when confronted with jailbreak attacks. Existing jailbreak methods suffer from two main limitations: reliance on complicated prompt engineering and iterative optimization, which lead to low attack success rate (ASR) and attack efficiency (AE). In this work, we propose an efficient jailbreak attack method, Analyzing-based Jailbreak (ABJ), which leverages the advanced reasoning capability of LLMs to autonomously generate harmful content, revealing their underlying safety vulnerabilities during complex reasoning process. We conduct comprehensive experiments on ABJ across various open-source and closed-source LLMs. In particular, ABJ achieves high ASR (82.1% on GPT-4o-2024-11-20) with exceptional AE among all target LLMs, showcasing its remarkable attack effectiveness, transferability, and efficiency. Our findings underscore the urgent need to prioritize and improve the safety of LLMs to mitigate the risks of misuse.
Updated: 2025-03-05 14:43:33
标题: LLMs可能是危险的推理者:基于分析的对大型语言模型的越狱攻击
摘要: 大型语言模型(LLMs)的快速发展在各种任务中带来了显著进展。然而,尽管取得了这些成就,LLMs在面对越狱攻击时仍然表现出固有的安全漏洞。现有的越狱方法存在两个主要限制:依赖复杂的提示工程和迭代优化,导致攻击成功率(ASR)和攻击效率(AE)较低。在这项工作中,我们提出了一种高效的越狱攻击方法,基于分析的越狱(ABJ),利用LLMs的先进推理能力自动生成有害内容,揭示它们在复杂推理过程中的潜在安全漏洞。我们对ABJ在各种开源和闭源LLMs上进行了全面实验。特别是,ABJ在所有目标LLMs中实现了高ASR(在GPT-4o-2024-11-20上为82.1%),具有卓越的AE,展示了其卓越的攻击效果、可转移性和效率。我们的发现强调了迫切需要优先考虑和改善LLMs的安全性,以减轻误用风险。
更新时间: 2025-03-05 14:43:33
领域: cs.CR,cs.AI,cs.CL,cs.LG
Online Scheduling for LLM Inference with KV Cache Constraints
Large Language Model (LLM) inference, where a trained model generates text one word at a time in response to user prompts, is a computationally intensive process requiring efficient scheduling to optimize latency and resource utilization. A key challenge in LLM inference is the management of the Key-Value (KV) cache, which reduces redundant computations but introduces memory constraints. In this work, we model LLM inference with KV cache constraints theoretically and propose novel batching and scheduling algorithms that minimize inference latency while effectively managing the KV cache's memory. We analyze both semi-online and fully online scheduling models, and our results are threefold. First, we provide a polynomial-time algorithm that achieves exact optimality in terms of average latency in the semi-online prompt arrival model. Second, in the fully online case with a stochastic prompt arrival, we introduce an efficient online scheduling algorithm with constant regret. Third, we prove that no algorithm (deterministic or randomized) can achieve a constant competitive ratio in fully online adversarial settings. Our empirical evaluations on a public LLM inference dataset, using the Llama-70B model on A100 GPUs, show that our approach significantly outperforms benchmark algorithms used currently in practice, achieving lower latency while reducing energy consumption. Overall, our results offer a path toward more sustainable and cost-effective LLM deployment.
Updated: 2025-03-05 14:43:01
标题: 基于KV缓存约束的LLM推理的在线调度
摘要: 大型语言模型(LLM)推理是一个计算密集型的过程,训练模型会根据用户提示逐字生成文本,需要有效的调度来优化延迟和资源利用率。LLM推理中的一个关键挑战是管理键值(KV)缓存,它可以减少冗余计算但引入了内存约束。本文在理论上对具有KV缓存约束的LLM推理进行建模,并提出了新颖的批处理和调度算法,可以最小化推理延迟同时有效地管理KV缓存的内存。 我们分析了半在线和全在线调度模型,并得出了三个结果。首先,在半在线提示到达模型中,我们提供了一个多项式时间算法,可以实现平均延迟的精确最优性。其次,在全在线情况下,我们引入了一种具有常数遗憾的高效在线调度算法,用于处理随机提示到达。第三,我们证明在全在线对抗环境中,没有算法(确定性或随机化)能够实现恒定的竞争比率。我们在一个公开的LLM推理数据集上进行了实证评估,使用A100 GPU上的Llama-70B模型,结果显示我们的方法明显优于目前实践中使用的基准算法,实现了更低的延迟同时降低了能耗。总体而言,我们的结果为更可持续和成本效益的LLM部署提供了一条道路。
更新时间: 2025-03-05 14:43:01
领域: cs.LG,cs.AI,math.OC
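The KV-cache constraint turns batching into an admission problem: a request can join the running batch only if its cache footprint fits in memory. A deliberately simplified greedy sketch; the paper's algorithms additionally reason about cache growth during decoding and carry optimality and regret guarantees that this toy policy does not:

```python
from collections import deque

def schedule_step(waiting: deque, running: list, kv_used: int,
                  kv_capacity: int) -> tuple[list, int]:
    """Admit queued requests in FIFO order while their worst-case
    KV footprint (prompt length + generation budget) fits the cache."""
    while waiting:
        prompt_len, max_new_tokens = waiting[0]
        footprint = prompt_len + max_new_tokens
        if kv_used + footprint > kv_capacity:
            break
        waiting.popleft()
        running.append((prompt_len, max_new_tokens))
        kv_used += footprint
    return running, kv_used

running, used = schedule_step(deque([(512, 128), (2048, 512), (256, 64)]),
                              [], kv_used=0, kv_capacity=3000)
print(running, used)   # the 2048-token request is deferred
```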
PGAD: Prototype-Guided Adaptive Distillation for Multi-Modal Learning in AD Diagnosis
Missing modalities pose a major issue in Alzheimer's Disease (AD) diagnosis, as many subjects lack full imaging data due to cost and clinical constraints. While multi-modal learning leverages complementary information, most existing methods train only on complete data, ignoring the large proportion of incomplete samples in real-world datasets like ADNI. This reduces the effective training set and limits the full use of valuable medical data. While some methods incorporate incomplete samples, they fail to effectively address inter-modal feature alignment and knowledge transfer challenges under high missing rates. To address this, we propose a Prototype-Guided Adaptive Distillation (PGAD) framework that directly incorporates incomplete multi-modal data into training. PGAD enhances missing modality representations through prototype matching and balances learning with a dynamic sampling strategy. We validate PGAD on the ADNI dataset with varying missing rates (20%, 50%, and 70%) and demonstrate that it significantly outperforms state-of-the-art approaches. Ablation studies confirm the effectiveness of prototype matching and adaptive sampling, highlighting the potential of our framework for robust and scalable AD diagnosis in real-world clinical settings.
Updated: 2025-03-05 14:39:31
标题: PGAD:原型引导的自适应蒸馏用于多模态学习在AD诊断中
摘要: 缺失的模态在阿尔茨海默病(AD)诊断中构成了一个重大问题,因为许多受试者由于成本和临床限制而缺乏完整的成像数据。虽然多模态学习利用互补信息,但大多数现有方法仅在完整数据上进行训练,忽视了真实世界数据集(如ADNI)中大量不完整样本的比例。这降低了有效训练集,并限制了有价值的医学数据的充分利用。虽然一些方法包含不完整样本,但它们未能有效解决在高缺失率下的跨模态特征对齐和知识转移挑战。为了解决这个问题,我们提出了一个原型引导自适应蒸馏(PGAD)框架,直接将不完整的多模态数据纳入训练中。PGAD通过原型匹配增强缺失模态表示,并通过动态采样策略平衡学习。我们在ADNI数据集上验证了PGAD在不同缺失率(20%、50%和70%)下的表现,并证明它明显优于最先进的方法。消融研究证实了原型匹配和自适应采样的有效性,突显了我们框架在真实世界临床环境中进行鲁棒和可扩展AD诊断的潜力。
更新时间: 2025-03-05 14:39:31
领域: eess.IV,cs.AI,cs.CV
RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars
Alignment tuning is crucial for ensuring large language models (LLMs) behave ethically and helpfully. Current alignment approaches require high-quality annotations and significant training resources. This paper proposes a low-cost, tuning-free method using in-context learning (ICL) to enhance LLM alignment. Through an analysis of high-quality ICL demos, we identified style as a key factor influencing LLM alignment capabilities and explicitly restyled ICL exemplars based on this stylistic framework. Additionally, we combined the restyled demos to achieve a balance between the two conflicting aspects of LLM alignment--factuality and safety. We packaged the restyled examples as prompts to trigger few-shot learning, improving LLM alignment. Compared to the best baseline approach (scores are out of a maximum of 5.00), our method achieves a maximum 0.10 increase on the Alpaca task (from 4.50 to 4.60), a 0.22 enhancement on the Just-eval benchmark (from 4.34 to 4.56), and a maximum improvement of 0.32 (from 3.53 to 3.85) on the MT-Bench dataset. We release the code and data at https://github.com/AnonymousCode-ComputerScience/RIDE.
Updated: 2025-03-05 14:38:19
标题: RIDE:通过重新设计的上下文学习演示示例增强大型语言模型的对齐
摘要: 调整对齐是确保大型语言模型(LLMs)行为道德和有用的关键。目前的对齐方法需要高质量的注释和大量的训练资源。本文提出了一种低成本、无需调整的方法,使用上下文学习(ICL)来增强LLM的对齐。通过对高质量ICL演示的分析,我们确定了风格作为影响LLM对齐能力的关键因素,并根据这种风格框架明确地重新设计了ICL示范。此外,我们将重新设计的演示结合起来,以在LLM对齐的两个相互冲突的方面之间实现平衡--事实性和安全性。我们将重新设计的示例打包为提示,触发少量学习,改善LLM的对齐。与最佳基准方法相比,最高平均分数为5.00,我们的方法在Alpaca任务上实现了最高0.10的增加(从4.50到4.60),在Just-eval基准上增加了0.22(从4.34到4.56),在MT-Bench数据集上最大改善为0.32(从3.53到3.85)。我们在https://github.com/AnonymousCode-ComputerScience/RIDE上发布了代码和数据。
更新时间: 2025-03-05 14:38:19
领域: cs.CL,cs.AI,I.2.7
Distilling Dataset into Neural Field
Utilizing a large-scale dataset is essential for training high-performance deep learning models, but it also comes with substantial computation and storage costs. To overcome these challenges, dataset distillation has emerged as a promising solution by compressing the large-scale dataset into a smaller synthetic dataset that retains the essential information needed for training. This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF), which leverages the neural field to store the necessary information of the large-scale dataset. Due to the unique nature of the neural field, which takes coordinates as input and output quantity, DDiF effectively preserves the information and easily generates various shapes of data. We theoretically confirm that DDiF exhibits greater expressiveness than some previous literature when the utilized budget for a single synthetic instance is the same. Through extensive experiments, we demonstrate that DDiF achieves superior performance on several benchmark datasets, extending beyond the image domain to include video, audio, and 3D voxel. We release the code at https://github.com/aailab-kaist/DDiF.
Updated: 2025-03-05 14:33:29
标题: 将数据集提炼为神经场
摘要: 利用大规模数据集对训练高性能深度学习模型至关重要,但也伴随着大量的计算和存储成本。为了克服这些挑战,数据集精炼已经成为一种有前途的解决方案,它将大规模数据集压缩成一个保留了训练所需基本信息的较小的合成数据集。本文提出了一种新颖的数据集精炼参数化框架,名为Distilling Dataset into Neural Field (DDiF),它利用神经场来存储大规模数据集的必要信息。由于神经场的独特特性,它以坐标作为输入,并输出数量,DDiF有效地保留了信息并轻松生成各种形状的数据。我们在理论上证实,当用于单个合成实例的预算相同时,DDiF比一些先前的文献表现出更大的表达能力。通过大量实验证明,DDiF在几个基准数据集上表现出优越的性能,不仅扩展到图像领域,还包括视频、音频和3D体素。我们在https://github.com/aailab-kaist/DDiF发布了代码。
更新时间: 2025-03-05 14:33:29
领域: cs.CV,cs.AI,cs.LG
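A neural field stores a distilled sample as a coordinate-to-value network, so it can be decoded at arbitrary resolution. A minimal sketch with a sinusoidal first layer (a common neural-field choice); DDiF's actual architecture and distillation objective are specified in the paper, not here:

```python
import torch
import torch.nn as nn

class NeuralField(nn.Module):
    """Coordinate network storing one distilled 'image': (x, y) -> RGB."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.first = nn.Linear(2, hidden)
        self.rest = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 3))

    def forward(self, coords):              # coords: (N, 2) in [-1, 1]
        return self.rest(torch.sin(30.0 * self.first(coords)))

# Decode the field at any resolution, one benefit of coordinate storage.
field = NeuralField()
h = w = 32
ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                        torch.linspace(-1, 1, w), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
image = field(coords).reshape(h, w, 3)      # synthetic sample for training
print(image.shape)
```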
Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm
Multi-Agent Reinforcement Learning (MARL) has shown promise in solving complex problems involving cooperation and competition among agents, such as an Unmanned Surface Vehicle (USV) swarm used in search and rescue, surveillance, and vessel protection. However, aligning system behavior with user preferences is challenging due to the difficulty of encoding expert intuition into reward functions. To address the issue, we propose a Reinforcement Learning with Human Feedback (RLHF) approach for MARL that resolves credit-assignment challenges through an Agent-Level Feedback system categorizing feedback into intra-agent, inter-agent, and intra-team types. To overcome the challenges of direct human feedback, we employ a Large Language Model (LLM) evaluator to validate our approach using feedback scenarios such as region constraints, collision avoidance, and task allocation. Our method effectively refines USV swarm policies, addressing key challenges in multi-agent systems while maintaining fairness and performance consistency.
Updated: 2025-03-05 14:33:18
标题: 无人水面舰队中基于人类内在偏好的政策微调多智能体强化学习
摘要: 多智能体强化学习(MARL)在解决涉及协作和竞争的复杂问题方面显示出潜力,例如用于搜索和救援、监视和船只保护的无人水面车(USV)群。然而,由于将专家直觉编码到奖励函数中的困难,使系统行为与用户偏好保持一致是具有挑战性的。为了解决这个问题,我们提出了一种强化学习与人类反馈(RLHF)方法,用于MARL,通过将反馈划分为单个智能体、智能体之间和团队内部类型的智能体级反馈系统来解决信用分配挑战。为了克服直接人类反馈的挑战,我们使用大型语言模型(LLM)评估器来验证我们的方法,使用反馈场景,如区域约束、避碰和任务分配。我们的方法有效地优化了USV群策略,解决了多智能体系统中的关键挑战,同时保持公平性和性能一致性。
更新时间: 2025-03-05 14:33:18
领域: cs.MA,cs.AI,cs.LG,cs.RO
Simulation-Based Performance Evaluation of 3D Object Detection Methods with Deep Learning for a LiDAR Point Cloud Dataset in a SOTIF-related Use Case
Safety of the Intended Functionality (SOTIF) addresses sensor performance limitations and deep learning-based object detection insufficiencies to ensure the intended functionality of Automated Driving Systems (ADS). This paper presents a methodology for examining the adaptability and evaluating the performance of 3D object detection methods on a LiDAR point cloud dataset generated by simulating a SOTIF-related Use Case. The major contributions of this paper include defining and modelling a SOTIF-related Use Case with 21 diverse weather conditions and generating a LiDAR point cloud dataset suitable for application of 3D object detection methods. The dataset consists of 547 frames, encompassing clear, cloudy, rainy weather conditions, corresponding to different times of the day, including noon, sunset, and night. Employing MMDetection3D and OpenPCDET toolkits, the performance of State-of-the-Art (SOTA) 3D object detection methods is evaluated and compared by testing the pre-trained Deep Learning (DL) models on the generated dataset using Average Precision (AP) and Recall metrics.
Updated: 2025-03-05 14:32:32
标题: 基于仿真的性能评估:在SOTIF相关用例中使用深度学习对LiDAR点云数据集进行3D物体检测方法
摘要: 预期功能安全(SOTIF)解决了传感器性能限制和基于深度学习的目标检测不足的问题,以确保自动驾驶系统(ADS)的预期功能。本文提出了一种方法,用于检验3D目标检测方法在通过模拟SOTIF相关用例生成的LiDAR点云数据集上的适应性并评估其性能。本文的主要贡献包括定义和建模具有21种不同天气条件的SOTIF相关用例,并生成适用于应用3D目标检测方法的LiDAR点云数据集。该数据集包括547帧,涵盖晴天、多云、雨天等不同天气条件,对应不同时间段,包括中午、日落和夜晚。通过使用MMDetection3D和OpenPCDET工具包,以平均精度(AP)和召回率指标在生成的数据集上测试预训练的深度学习(DL)模型,评估并比较了最先进(SOTA)3D目标检测方法的性能。
更新时间: 2025-03-05 14:32:32
领域: cs.CV,cs.LG,cs.SY,eess.SY
Revisiting the Role of Relearning in Semantic Dementia
Patients with semantic dementia (SD) present with remarkably consistent atrophy of neurons in the anterior temporal lobe and behavioural impairments, such as graded loss of category knowledge. While relearning of lost knowledge has been shown in acute brain injuries such as stroke, it has not been widely supported in chronic cognitive diseases such as SD. Previous research has shown that deep linear artificial neural networks exhibit stages of semantic learning akin to humans. Here, we use a deep linear network to test the hypothesis that relearning during disease progression, rather than the atrophy itself, causes the specific behavioural patterns associated with SD. After training the network to generate the common semantic features of various hierarchically organised objects, neurons are successively deleted to mimic atrophy while retraining the model. The model with relearning and deleted neurons reproduced errors specific to SD, including prototyping errors and cross-category confusions. This suggests that relearning is necessary for artificial neural networks to reproduce the behavioural patterns associated with SD in the absence of output non-linearities. Our results support a theory of SD progression that results from continuous relearning of lost information. Future research should revisit the role of relearning as a contributing factor to cognitive diseases.
Updated: 2025-03-05 14:28:38
标题: 重新考察重新学习在语义痴呆中的作用
摘要: 患有语义痴呆症(SD)的患者表现出前颞叶神经元的明显一致性萎缩和行为障碍,如分类知识的逐渐丧失。虽然在急性脑损伤如中风中已经显示出失去知识的重新学习,但在慢性认知疾病如SD中并未得到广泛支持。先前的研究表明,深度线性人工神经网络展示出类似于人类的语义学习阶段。在这里,我们使用深度线性网络来测试这一假设:疾病进展期间的重新学习而非特定的萎缩导致了与SD相关的特定行为模式。在训练网络生成各种分层组织对象的共同语义特征后,神经元被逐步删除以模拟萎缩同时重新训练模型。具有重新学习和已删除神经元的模型复制了与SD特定的错误,包括原型错误和跨类别混淆。这表明,在缺乏输出非线性的情况下,重新学习对于人工神经网络复制与SD相关的行为模式是必要的。我们的结果支持SD进展理论,即通过持续重新学习丢失的信息而导致。未来的研究应重新审视重新学习作为认知疾病的一个促发因素的作用。
更新时间: 2025-03-05 14:28:38
领域: cs.LG
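The modelling recipe in the entry above (train a deep linear network on item-feature associations, delete hidden units to mimic atrophy, then retrain) fits in a short script. A sketch with arbitrary sizes and learning rates; the paper's hierarchically structured semantic features are replaced here by random binary features:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_feat, hidden = 16, 32, 12
X = np.eye(n_items)                                        # one-hot items
Y = (rng.random((n_items, n_feat)) < 0.3).astype(float)    # semantic features

W1 = 0.05 * rng.standard_normal((n_items, hidden))
W2 = 0.05 * rng.standard_normal((hidden, n_feat))

def train(W1, W2, epochs=3000, lr=0.05):
    # Gradient descent on squared error; no output non-linearity,
    # matching the deep *linear* setting of the paper.
    for _ in range(epochs):
        H = X @ W1
        err = H @ W2 - Y
        W2 -= lr * H.T @ err / n_items
        W1 -= lr * X.T @ (err @ W2.T) / n_items
    return W1, W2

W1, W2 = train(W1, W2)
for frac in (0.25, 0.5):                  # progressive 'atrophy'
    keep = rng.random(hidden) > frac      # delete a fraction of hidden units
    W1, W2, hidden = W1[:, keep], W2[keep, :], int(keep.sum())
    W1, W2 = train(W1, W2)                # relearning with the damaged network
    print(f"{hidden} units left, loss={np.mean((X @ W1 @ W2 - Y) ** 2):.4f}")
```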
Extrapolation Merging: Keep Improving With Extrapolation and Merging
Large Language Models (LLMs) require instruction fine-tuning to perform different downstream tasks. However, the instruction fine-tuning phase still demands significant computational resources and labeled data, lacking a paradigm that can improve model performance without additional computational power and data. Model merging aims to enhance performance by combining the parameters of different models, but the lack of a clear optimization direction during the merging process does not always guarantee improved performance. In this paper, we attempt to provide a clear optimization direction for model merging. We first validate the effectiveness of the model extrapolation method during the instruction fine-tuning phase. Then, we propose Extrapolation Merging, a paradigm that can continue improving model performance without requiring extra computational resources or data. Using the extrapolation method, we provide a clear direction for model merging, achieving local optimization search, and consequently enhancing the merged model's performance. We conduct experiments on seven different tasks, and the results show that our method can consistently improve the model's performance after fine-tuning.
Updated: 2025-03-05 14:28:22
标题: 外推合并:借助外推和合并持续改进
摘要: 大型语言模型(LLMs)需要指导微调以执行不同的下游任务。然而,指导微调阶段仍然需要大量的计算资源和标记数据,缺乏一个可以在不增加额外计算资源和数据的情况下提高模型性能的范例。模型合并旨在通过结合不同模型的参数来增强性能,但在合并过程中缺乏明确的优化方向并不总能保证性能的提升。在本文中,我们尝试为模型合并提供一个明确的优化方向。我们首先验证了模型外推方法在指导微调阶段的有效性。然后,我们提出了外推合并,这是一种可以在不需要额外计算资源或数据的情况下持续提高模型性能的范例。利用外推方法,我们为模型合并提供了一个明确的方向,实现了局部优化搜索,从而增强了合并模型的性能。我们在七个不同任务上进行实验,结果显示我们的方法可以在微调后持续改善模型的性能。
更新时间: 2025-03-05 14:28:22
领域: cs.CL,cs.AI
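The abstract does not spell out the extrapolation rule; a common form of model extrapolation is theta_base + alpha * (theta_sft - theta_base) with alpha > 1, followed by parameter averaging for the merge. A minimal sketch under that assumption (tiny random tensors stand in for real checkpoints):

```python
import torch

def extrapolate(base_sd, sft_sd, alpha=1.5):
    """Move past the fine-tuned weights along the fine-tuning direction.

    alpha = 1.0 recovers the SFT model; alpha > 1.0 extrapolates. This is an
    assumed form; the paper's exact rule may differ.
    """
    return {k: base_sd[k] + alpha * (sft_sd[k] - base_sd[k]) for k in base_sd}

def merge(state_dicts, weights=None):
    """Weighted parameter averaging of models sharing one architecture."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts))
            for k in state_dicts[0]}

# Usage sketch: extrapolated checkpoints give the merge a direction.
base = {"w": torch.randn(4, 4)}
sft = {"w": base["w"] + 0.1 * torch.randn(4, 4)}
candidates = [extrapolate(base, sft, a) for a in (1.0, 1.5, 2.0)]
merged = merge(candidates)
```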
Data Sharing, Privacy and Security Considerations in the Energy Sector: A Review from Technical Landscape to Regulatory Specifications
Decarbonization, decentralization and digitalization are the three key elements driving the twin energy transition. The energy system is evolving to a more data driven ecosystem, leading to the need for communication and storage of large amounts of data of different resolutions from the prosumers and other stakeholders in the energy ecosystem. While the energy system is certainly advancing, this paradigm shift is bringing in new privacy and security issues related to collection, processing and storage of data - not only from the technical dimension, but also from the regulatory perspective. Understanding data privacy and security in the evolving energy system, regarding regulatory compliance, is an immature field of research. Contextualized knowledge of how related issues are regulated is still in its infancy, and the practical and technical basis for the regulatory framework for data privacy and security is not clear. To fill this gap, this paper conducts a comprehensive review of the data-related issues for the energy system by integrating both technical and regulatory dimensions. We start by reviewing open-access data, data communication and data-processing techniques for the energy system, and use this as the basis to connect the analysis of data-related issues from the integrated perspective. We classify the issues into three categories: (i) data-sharing among energy end users and stakeholders, (ii) privacy of end users, and (iii) cyber security, and then explore these issues from a regulatory perspective. We analyze the evolution of related regulations, and introduce the relevant regulatory initiatives for the categorized issues in terms of regulatory definitions, concepts, principles, rights and obligations in the context of energy systems. Finally, we provide reflections on the gaps that still exist, and guidelines for regulatory frameworks for a truly participatory energy system.
Updated: 2025-03-05 14:23:56
标题: 在能源领域的数据共享、隐私和安全考虑:从技术风景到监管规范的审查
摘要: 脱碳、去中心化和数字化是推动双重能源转型的三个关键要素。能源系统正在演变为一个更加数据驱动的生态系统,因而需要对来自产消者和能源生态系统中其他利益相关方的、不同分辨率的大量数据进行通信和存储。虽然能源系统确实在进步,但这种范式转变带来了与数据收集、处理和存储相关的新的隐私和安全问题,这不仅涉及技术层面,也涉及监管层面。就监管合规而言,理解不断演变的能源系统中的数据隐私和安全仍是一个不成熟的研究领域。关于相关问题如何被监管的情境化知识仍处于起步阶段,数据隐私和安全监管框架的实践和技术基础尚不明确。为填补这一空白,本文通过整合技术和监管两个维度,对能源系统中与数据相关的问题进行了全面综述。我们首先回顾了能源系统的开放获取数据、数据通信和数据处理技术,并以此为基础,从整合的视角串联对数据相关问题的分析。我们将这些问题分为三类:(i)能源终端用户和利益相关方之间的数据共享,(ii)终端用户的隐私,以及(iii)网络安全,然后从监管角度探讨这些问题。我们分析了相关法规的演变,并针对分类后的问题,从监管定义、概念、原则、权利和义务等方面介绍了能源系统背景下的相关监管举措。最后,我们对仍然存在的差距进行了反思,并为真正参与式的能源系统的监管框架提供了指导。
更新时间: 2025-03-05 14:23:56
领域: cs.CR
AI-Enabled Conversational Journaling for Advancing Parkinson's Disease Symptom Tracking
Journaling plays a crucial role in managing chronic conditions by allowing patients to document symptoms and medication intake, providing essential data for long-term care. While valuable, traditional journaling methods often rely on static, self-directed entries, lacking interactive feedback and real-time guidance. This gap can result in incomplete or imprecise information, limiting its usefulness for effective treatment. To address this gap, we introduce PATRIKA, an AI-enabled prototype designed specifically for people with Parkinson's disease (PwPD). The system incorporates cooperative conversation principles, clinical interview simulations, and personalization to create a more effective and user-friendly journaling experience. Through two user studies with PwPD and iterative refinement of PATRIKA, we demonstrate conversational journaling's significant potential in patient engagement and collecting clinically valuable information. Our results showed that, by generating probing questions, PATRIKA turned journaling into a bi-directional interaction. Additionally, we offer insights for designing journaling systems for healthcare and future directions for promoting sustained journaling.
Updated: 2025-03-05 14:14:25
标题: 基于人工智能的对话式日记记录用于推进帕金森病症状追踪
摘要: 日记记录在管理慢性疾病中扮演着至关重要的角色,允许患者记录症状和药物摄入情况,提供长期护理所需的基本数据。然而,传统的日记记录方法往往依赖于静态的、自主的条目,缺乏交互式反馈和实时指导。这种差距可能导致信息不完整或不精确,限制了其对有效治疗的用处。为了弥补这一差距,我们介绍了PATRIKA,这是一个专为帕金森病患者设计的人工智能原型。该系统结合了合作对话原则、临床访谈模拟和个性化,以创造更有效和用户友好的日记记录体验。通过与帕金森病患者进行两项用户研究,并通过对PATRIKA的迭代改进,我们展示了对话式日记记录在患者参与和收集临床有价值信息方面的重要潜力。我们的结果表明,生成深入提问使PATRIKA将日记记录转变为双向交互。此外,我们提供了为医疗保健设计日记记录系统的见解,以及促进持续日记记录的未来方向。
更新时间: 2025-03-05 14:14:25
领域: cs.HC,cs.AI
Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks
Multimodal large language models (MLLMs) have made remarkable strides in cross-modal comprehension and generation tasks. However, they remain vulnerable to jailbreak attacks, where crafted perturbations bypass security guardrails and elicit harmful outputs. In this paper, we present the first adversarial training (AT) paradigm tailored to defend against jailbreak attacks during the MLLM training phase. Extending traditional AT to this domain poses two critical challenges: efficiently tuning massive parameters and ensuring robustness against attacks across multiple modalities. To address these challenges, we introduce Projection Layer Against Adversarial Training (ProEAT), an end-to-end AT framework. ProEAT incorporates a projector-based adversarial training architecture that efficiently handles large-scale parameters while maintaining computational feasibility by focusing adversarial training on a lightweight projector layer instead of the entire model; additionally, we design a dynamic weight adjustment mechanism that optimizes the loss function's weight allocation based on task demands, streamlining the tuning process. To enhance defense performance, we propose a joint optimization strategy across visual and textual modalities, ensuring robust resistance to jailbreak attacks originating from either modality. Extensive experiments conducted on five major jailbreak attack methods across three mainstream MLLMs demonstrate the effectiveness of our approach. ProEAT achieves state-of-the-art defense performance, outperforming existing baselines by an average margin of +34% across text and image modalities, while incurring only a 1% reduction in clean accuracy. Furthermore, evaluations on real-world embodied intelligent systems highlight the practical applicability of our framework, paving the way for the development of more secure and reliable multimodal systems.
Updated: 2025-03-05 14:13:35
标题: 针对越狱攻击的多模态大型语言模型对抗训练
摘要: 多模式大型语言模型(MLLMs)在跨模态理解和生成任务中取得了显著进展。然而,它们仍然容易受到越狱攻击的影响,即通过精心设计的扰动绕过安全防护并引发有害输出。在本文中,我们提出了第一个专为在MLLM训练阶段防御越狱攻击而定制的对抗训练(AT)范例。将传统的对抗训练扩展到这个领域面临着两个关键挑战:高效调整大规模参数和确保对多种模态的攻击具有强大的鲁棒性。为了解决这些挑战,我们引入了Projection Layer Against Adversarial Training(ProEAT),这是一个端到端的对抗训练框架。ProEAT包含一个基于投影仪的对抗训练架构,可以高效处理大规模参数,同时通过将对抗训练集中在轻量级的投影层而不是整个模型上,保持计算可行性;此外,我们设计了一个动态权重调整机制,根据任务需求优化损失函数的权重分配,简化调整过程。为了提高防御性能,我们提出了跨视觉和文本模态的联合优化策略,确保对来自任一模态的越狱攻击具有强大的抵抗力。在对五种主流MLLMs的三种主要越狱攻击方法进行的大量实验中,我们的方法的有效性得到了证实。ProEAT实现了最先进的防御性能,在文本和图像模态上的平均边际超过现有基线34%,同时仅导致干净准确率下降1%。此外,在真实世界的具有智能体的系统上的评估突显了我们框架的实际适用性,为开发更安全可靠的多模式系统铺平了道路。
更新时间: 2025-03-05 14:13:35
领域: cs.CV,cs.AI,cs.CL
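A heavily simplified sketch of adversarial training confined to a lightweight projector, in the spirit of the abstract above. The module names, PGD settings, and the static 50/50 loss weighting below are placeholders and assumptions, not the paper's ProEAT recipe or its dynamic weight adjustment:

```python
import torch
import torch.nn as nn

# Placeholder components standing in for a real MLLM; only the projector trains.
vision_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
projector = nn.Linear(256, 128)        # lightweight layer that receives the AT
language_head = nn.Linear(128, 10)     # stand-in for the frozen LLM

for p in vision_encoder.parameters():
    p.requires_grad_(False)
for p in language_head.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(projector.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def pgd_attack(images, labels, eps=8 / 255, steps=3, alpha=2 / 255):
    """Craft bounded image perturbations against the current frozen-backbone model."""
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        logits = language_head(projector(vision_encoder(images + delta)))
        loss_fn(logits, labels).backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return delta.detach()

images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

for step in range(10):
    delta = pgd_attack(images, labels)
    opt.zero_grad()
    clean_loss = loss_fn(language_head(projector(vision_encoder(images))), labels)
    adv_loss = loss_fn(language_head(projector(vision_encoder(images + delta))), labels)
    # Static weighting here; the paper's dynamic mechanism would re-balance
    # these terms based on task demands.
    (0.5 * clean_loss + 0.5 * adv_loss).backward()
    opt.step()     # only the projector parameters are updated
```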
GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization
Although various visual localization approaches exist, such as scene coordinate regression and camera pose regression, these methods often struggle with optimization complexity or limited accuracy. To address these challenges, we explore the use of novel view synthesis techniques, particularly 3D Gaussian Splatting (3DGS), which enables the compact encoding of both 3D geometry and scene appearance. We propose a two-stage procedure that integrates dense and robust keypoint descriptors from the lightweight XFeat feature extractor into 3DGS, enhancing performance in both indoor and outdoor environments. The coarse pose estimates are directly obtained via 2D-3D correspondences between the 3DGS representation and query image descriptors. In the second stage, the initial pose estimate is refined by minimizing the rendering-based photometric warp loss. Benchmarking on widely used indoor and outdoor datasets demonstrates improvements over recent neural rendering-based localization methods, such as NeRFMatch and PNeRFLoc.
Updated: 2025-03-05 14:11:44
标题: GSplatLoc:将关键点描述符接地到3D高斯点云投影中,以改善视觉定位
摘要: 尽管存在各种视觉定位方法,如场景坐标回归和相机姿态回归,但这些方法通常在优化复杂性或有限准确性方面存在困难。为了解决这些挑战,我们探索了使用新颖的视图合成技术,特别是3D高斯喷洒(3DGS),它能够紧凑地编码3D几何和场景外观。我们提出了一个两阶段过程,将轻量级XFeat特征提取器中的密集和稳健的关键点描述符整合到3DGS中,增强了在室内和室外环境中的性能。粗略的姿态估计是通过3DGS表示和查询图像描述符之间的2D-3D对应关系直接获得的。在第二阶段,通过最小化基于渲染的光度变形损失来优化初始姿态估计。在广泛使用的室内和室外数据集上进行基准测试表明,相对于最近的基于神经渲染的定位方法,如NeRFMatch和PNeRFLoc,有了改进。
更新时间: 2025-03-05 14:11:44
领域: cs.CV,cs.AI,cs.LG,cs.RO
AdaSin: Enhancing Hard Sample Metrics with Dual Adaptive Penalty for Face Recognition
In recent years, the emergence of deep convolutional neural networks has positioned face recognition as a prominent research focus in computer vision. Traditional loss functions, such as margin-based, hard-sample mining-based, and hybrid approaches, have achieved notable performance improvements, with some leveraging curriculum learning to optimize training. However, these methods often fall short in effectively quantifying the difficulty of hard samples. To address this, we propose the Adaptive Sine (AdaSin) loss function, which introduces the sine of the angle between a sample's embedding feature and its ground-truth class center as a novel difficulty metric. This metric enables precise and effective penalization of hard samples. By incorporating curriculum learning, the model dynamically adjusts classification boundaries across different training stages. Unlike previous adaptive-margin loss functions, AdaSin introduces a dual adaptive penalty, applied to both the positive and negative cosine similarities of hard samples. This design imposes stronger constraints, enhancing intra-class compactness and inter-class separability. The combination of the dual adaptive penalty and curriculum learning is guided by a well-designed difficulty metric. It enables the model to focus more effectively on hard samples in later training stages, leading to the extraction of highly discriminative face features. Extensive experiments across eight benchmarks demonstrate that AdaSin achieves superior accuracy compared to other state-of-the-art methods.
Updated: 2025-03-05 14:11:13
标题: AdaSin:利用双重自适应惩罚增强人脸识别的困难样本度量
摘要: 近年来,深度卷积神经网络的出现使得人脸识别成为计算机视觉中的一个突出研究重点。传统的损失函数,如基于边界、基于难样本挖掘和混合方法,已经取得了显著的性能改进,其中一些利用课程学习来优化训练。然而,这些方法经常无法有效地量化难样本的难度。为了解决这个问题,我们提出了自适应正弦(AdaSin)损失函数,将样本的嵌入特征与其地面真实类中心之间的角度的正弦引入作为一种新颖的难度度量。这个度量能够精确有效地惩罚难样本。通过结合课程学习,模型可以动态调整不同训练阶段的分类边界。与以往的自适应边界损失函数不同,AdaSin引入了双自适应惩罚,应用于难样本的正余弦相似性。这种设计施加了更强的约束,增强了类内紧凑性和类间可分性。双自适应惩罚和课程学习的结合是由一个设计良好的难度度量引导的。它使模型能够更有效地在后期训练阶段集中精力处理难样本,并导致提取高度判别性的人脸特征。在八个基准测试中进行的广泛实验证明,AdaSin相对于其他最先进的方法实现了更高的准确性。
更新时间: 2025-03-05 14:11:13
领域: cs.CV,cs.AI
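The abstract describes sin(theta) between an embedding and its ground-truth class centre as the difficulty metric, but does not give the full dual-penalty formula. The sketch below is therefore only an assumed illustration: an ArcFace-style margin whose size is scaled by that sine-based difficulty, with `t` standing in for a curriculum factor that would grow over training:

```python
import torch
import torch.nn.functional as F

def adasin_like_loss(embeddings, centres, labels, s=64.0, m=0.5, t=1.0):
    """Margin softmax whose positive-class penalty grows with sin(theta).

    `centres` holds one class centre per identity, as in ArcFace-style heads.
    The exact AdaSin dual penalty (on both positive and negative cosines) is
    not reproduced here; this only shows the difficulty metric in use.
    """
    emb = F.normalize(embeddings, dim=1)
    w = F.normalize(centres, dim=1)
    cos = emb @ w.t()                                   # cosine to every class centre
    cos_y = cos.gather(1, labels.view(-1, 1)).clamp(-1 + 1e-7, 1 - 1e-7)
    sin_y = torch.sqrt(1.0 - cos_y ** 2)                # large for hard samples
    target_logit = cos_y - m * (1.0 + t * sin_y)        # harder sample -> larger penalty
    one_hot = F.one_hot(labels, num_classes=cos.size(1)).bool()
    logits = torch.where(one_hot, target_logit.expand_as(cos), cos)
    return F.cross_entropy(s * logits, labels)

# Usage with random tensors standing in for a backbone's output.
emb = torch.randn(16, 512, requires_grad=True)
centres = torch.randn(1000, 512, requires_grad=True)
labels = torch.randint(0, 1000, (16,))
adasin_like_loss(emb, centres, labels).backward()
```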
Intrinsic and Extrinsic Factor Disentanglement for Recommendation in Various Context Scenarios
In recommender systems, the patterns of user behaviors (e.g., purchase, click) may vary greatly in different contexts (e.g., time and location). This is because user behavior is jointly determined by two types of factors: intrinsic factors, which reflect consistent user preference, and extrinsic factors, which reflect external incentives that may vary in different contexts. Differentiating between intrinsic and extrinsic factors helps learn user behaviors better. However, existing studies have only considered differentiating them from a single, pre-defined context (e.g., time or location), ignoring the fact that a user's extrinsic factors may be influenced by the interplay of various contexts at the same time. In this paper, we propose the Intrinsic-Extrinsic Disentangled Recommendation (IEDR) model, a generic framework that differentiates intrinsic from extrinsic factors considering various contexts simultaneously, enabling more accurate differentiation of factors and hence the improvement of recommendation accuracy. IEDR contains a context-invariant contrastive learning component to capture intrinsic factors, and a disentanglement component to extract extrinsic factors under the interplay of various contexts. The two components work together to achieve effective factor learning. Extensive experiments on real-world datasets demonstrate IEDR's effectiveness in learning disentangled factors and significantly improving recommendation accuracy by up to 4% in NDCG.
Updated: 2025-03-05 14:08:53
标题: 在各种情境场景下推荐的内在和外在因素解脱
摘要: 在推荐系统中,用户行为模式(例如购买、点击)在不同的上下文(例如时间和位置)中可能有很大差异。这是因为用户行为由两种类型的因素共同决定:内在因素反映了一致的用户偏好,外在因素反映了可能在不同上下文中变化的外部激励。区分内在和外在因素有助于更好地学习用户行为。然而,现有研究只考虑从单一、预定义的上下文(例如时间或位置)区分它们,忽略了用户的外在因素可能同时受到各种上下文相互作用的影响的事实。在本文中,我们提出了内在外在解耦推荐(IEDR)模型,这是一个通用框架,可以同时考虑各种上下文区分内在和外在因素,从而更准确地区分因素,从而提高推荐准确性。IEDR包含一个与上下文无关的对比学习组件,用于捕捉内在因素,以及一个解耦组件,用于在各种上下文相互作用下提取外在因素。这两个组件共同工作,实现有效的因素学习。对真实世界数据集进行的广泛实验表明,IEDR在学习解耦因素方面的有效性,并通过NDCG最多提高了4%的推荐准确性。
更新时间: 2025-03-05 14:08:53
领域: cs.IR,cs.LG
O-RAN xApps Conflict Management using Graph Convolutional Networks
Open Radio Access Network (O-RAN) adopts a flexible, open, and virtualized structure with standardized interfaces, reducing dependency on a single supplier. Conflict management in O-RAN refers to the process of identifying and resolving conflicts between network applications. xApps are applications deployed at the RAN Intelligent Controller (RIC) that leverage advanced AI/ML algorithms to make dynamic decisions for network optimization. The lack of a unified mechanism to coordinate and prioritize the actions of different applications can create three types of conflicts (direct, indirect, and implicit). In our paper, we introduce the Graph-based xApps Conflict and Root Cause Analysis Engine (GRACE), a novel data-driven method based on Graph Convolutional Networks (GCNs). It detects three types of conflicts (direct, indirect, and implicit) and pinpoints the root causes (xApps). GRACE captures the complex and hidden dependencies among the xApps, the controlled parameters, and the KPIs in O-RAN to detect possible conflicts. Then, it identifies the root causes (xApps) contributing to the detected conflicts. The proposed method was tested on highly imbalanced datasets where the number of conflict instances ranges from 40% to 10%. The model is tested in a setting that simulates real-world scenarios where conflicts are rare, to assess its performance and generalizability. Experimental results demonstrate an exceptional performance, achieving a high F1-score greater than 98% for all the case studies.
Updated: 2025-03-05 14:07:29
标题: O-RAN xApps 冲突管理利用图卷积网络
摘要: 开放式无线接入网络(O-RAN)采用灵活、开放和虚拟化的结构,具有标准化接口,减少对单一供应商的依赖。O-RAN中的冲突管理是指识别和解决网络应用程序之间冲突的过程。xApps是部署在无线接入网络智能控制器(RIC)上的应用程序,利用先进的人工智能/机器学习算法进行网络优化的动态决策。缺乏统一的机制来协调和优先考虑不同应用程序的行动可能会产生三种类型的冲突(直接、间接和隐性)。在我们的论文中,我们介绍了一种基于图卷积网络(GCN)的新型数据驱动的基于图的xApps冲突和根本原因分析引擎(GRACE)方法。它检测三种类型的冲突(直接、间接和隐性),并指出根本原因(xApps)。GRACE捕捉了O-RAN中xApps、受控参数和KPI之间的复杂和隐藏的依赖关系,以检测可能的冲突。然后,它识别导致检测到的冲突的根本原因(xApps)。所提出的方法在高度不平衡的数据集上进行了测试,在这些数据集中,冲突实例的数量范围从40%到10%。该模型在模拟冲突罕见的真实场景中进行了测试,以评估其性能和泛化能力。实验结果表明出色的性能,对所有案例研究的F1分数均高于98%。
更新时间: 2025-03-05 14:07:29
领域: cs.NI,cs.LG
PCM Selector: Penalized Covariate-Mediator Selection Operator for Evaluating Linear Causal Effects
For a data-generating process for random variables that can be described with a linear structural equation model, we consider a situation in which (i) a set of covariates satisfying the back-door criterion cannot be observed or (ii) such a set can be observed, but standard statistical estimation methods cannot be applied to estimate causal effects because of multicollinearity/high-dimensional data problems. We propose a novel two-stage penalized regression approach, the penalized covariate-mediator selection operator (PCM Selector), to estimate the causal effects in such scenarios. Unlike existing penalized regression analyses, when a set of intermediate variables is available, PCM Selector provides a consistent or less biased estimator of the causal effect. In addition, PCM Selector provides a variable selection procedure for intermediate variables to obtain better estimation accuracy of the causal effects than does the back-door criterion.
Updated: 2025-03-05 14:05:29
标题: PCM选择器:用于评估线性因果效应的惩罚协变量-中介选择算子
摘要: 对于可以用线性结构方程模型描述的随机变量的数据生成过程,我们考虑以下情况:(i)无法观察到满足反门准则的一组协变量,或者(ii)虽然可以观察到这样一组协变量,但由于多重共线性/高维数据问题,无法应用标准统计估计方法来估计因果效应。我们提出了一种新颖的两阶段惩罚回归方法,即惩罚协变量-中介选择算子(PCM Selector),用来在这种情况下估计因果效应。与现有的惩罚回归分析不同,当一组中间变量可用时,PCM Selector提供了一致或更少偏倚的因果效应估计器。此外,PCM Selector为中间变量提供了一个变量选择过程,以获得比反门准则更好的因果效应估计精度。
更新时间: 2025-03-05 14:05:29
领域: stat.ME,cs.LG
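A rough sketch of the general two-stage idea referenced above (penalized selection of intermediate variables, followed by a front-door-style linear estimate through the selected mediators) on a simulated linear SEM with an unobserved confounder. This is an illustration under those assumptions, not the actual PCM Selector operator or its penalty:

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p = 2000, 30

# Linear SEM: unobserved confounder U, treatment A, mediators M, outcome Y.
u = rng.normal(size=n)
a = 0.8 * u + rng.normal(size=n)
alpha = np.where(rng.random(p) < 0.2, rng.normal(size=p), 0.0)   # A -> M (sparse)
m = np.outer(a, alpha) + rng.normal(size=(n, p))
gamma = np.where(alpha != 0, rng.normal(size=p), 0.0)            # M -> Y
y = m @ gamma + 0.7 * u + rng.normal(size=n)
true_effect = alpha @ gamma                                      # total effect A -> Y

# Stage 1: penalized selection of the intermediate variables that matter for Y.
sel = LassoCV(cv=5).fit(np.column_stack([m, a]), y)
keep = np.flatnonzero(np.abs(sel.coef_[:p]) > 1e-6)

# Stage 2: linear front-door-style estimate through the selected mediators:
# effect = sum_j (A -> M_j) * (M_j -> Y given A).
a2m = np.array([LinearRegression().fit(a.reshape(-1, 1), m[:, j]).coef_[0]
                for j in keep])
m2y = LinearRegression().fit(np.column_stack([m[:, keep], a]), y).coef_[:len(keep)]
print("estimated effect:", a2m @ m2y, "  true effect:", true_effect)
```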
How simple can you go? An off-the-shelf transformer approach to molecular dynamics
Most current neural networks for molecular dynamics (MD) include physical inductive biases, resulting in specialized and complex architectures. This is in contrast to most other machine learning domains, where specialist approaches are increasingly replaced by general-purpose architectures trained on vast datasets. In line with this trend, several recent studies have questioned the necessity of architectural features commonly found in MD models, such as built-in rotational equivariance or energy conservation. In this work, we contribute to the ongoing discussion by evaluating the performance of an MD model with as few specialized architectural features as possible. We present a recipe for MD using an Edge Transformer, an "off-the-shelf" transformer architecture that has been minimally modified for the MD domain, termed MD-ET. Our model implements neither built-in equivariance nor energy conservation. We use a simple supervised pre-training scheme on $\sim$30 million molecular structures from the QCML database. Using this "off-the-shelf" approach, we show state-of-the-art results on several benchmarks after fine-tuning for a small number of steps. Additionally, we examine the effects of being only approximately equivariant and energy conserving for MD simulations, proposing a novel method for distinguishing the errors resulting from non-equivariance from other sources of inaccuracies like numerical rounding errors. While our model exhibits runaway energy increases on larger structures, we show approximately energy-conserving NVE simulations for a range of small structures.
Updated: 2025-03-05 14:04:46
标题: 您能多简化吗?一种适用于分子动力学的现成变压器方法
摘要: 目前大多数用于分子动力学(MD)的神经网络都包含物理归纳偏差,导致专门化和复杂的架构。这与大多数其他机器学习领域不同,那里的专家方法越来越被在庞大数据集上训练的通用架构所取代。符合这一趋势,最近几项研究质疑了MD模型中常见的架构特征的必要性,如内置的旋转等变性或能量守恒。在本研究中,我们通过评估尽可能少的专门化架构特征的MD模型性能,对此进行了贡献。我们提出了一种使用Edge Transformer进行MD的方法,这是一种“即插即用”的变压器架构,已经经过最小程度的修改,称为MD-ET。我们的模型既不实现内置的等变性,也不实现能量守恒。我们使用了一个简单的监督预训练方案,在QCML数据库中的约3000万分子结构上进行了训练。使用这种“即插即用”的方法,我们展示了在少数步骤微调后在几个基准测试上的最新结果。此外,我们检查了在MD模拟中仅近似等变和能量保持的影响,提出了一种区分由于非等变性引起的错误与其他不准确性来源(如数值舍入误差)的新方法。虽然我们的模型在较大结构上表现出能量急剧增加的情况,但我们展示了一系列小结构的近似能量保持的NVE模拟。
更新时间: 2025-03-05 14:04:46
领域: cs.LG
Multimodal Action Quality Assessment
Action quality assessment (AQA) assesses how well an action is performed. Previous works perform modelling using only visual information, ignoring audio information. We argue that although AQA is highly dependent on visual information, the audio is useful complementary information for improving the score regression accuracy, especially for sports with background music, such as figure skating and rhythmic gymnastics. To leverage multimodal information for AQA, i.e., RGB, optical flow and audio information, we propose a Progressive Adaptive Multimodal Fusion Network (PAMFN) that separately models modality-specific information and mixed-modality information. Our model consists of three modality-specific branches that independently explore modality-specific information and a mixed-modality branch that progressively aggregates the modality-specific information from the modality-specific branches. To build the bridge between modality-specific branches and the mixed-modality branch, three novel modules are proposed. First, a Modality-specific Feature Decoder module is designed to selectively transfer modality-specific information to the mixed-modality branch. Second, when exploring the interaction between modality-specific information, we argue that using an invariant multimodal fusion policy may lead to suboptimal results, as it fails to take the potential diversity in different parts of an action into consideration. Therefore, an Adaptive Fusion Module is proposed to learn adaptive multimodal fusion policies in different parts of an action. This module consists of several FusionNets for exploring different multimodal fusion strategies and a PolicyNet for deciding which FusionNets are enabled. Third, a module called Cross-modal Feature Decoder is designed to transfer cross-modal features generated by the Adaptive Fusion Module to the mixed-modality branch.
Updated: 2025-03-05 14:02:10
标题: 多模态动作质量评估
摘要: 动作质量评估(AQA)是评估动作执行情况的方法。以往的研究仅使用视觉信息进行建模,忽略了音频信息。我们认为,虽然AQA高度依赖于视觉信息,但音频是用于提高得分回归准确性的有用补充信息,特别是对于伴有背景音乐的体育项目,如花样滑冰和韵律体操。为了利用多模态信息进行AQA,即RGB、光流和音频信息,我们提出了一个逐步自适应多模态融合网络(PAMFN),它分别对模态特定信息和混合模态信息进行建模。我们的模型包括三个独立探索模态特定信息的分支和一个逐步聚合来自模态特定分支的模态特定信息的混合模态分支。为了在模态特定分支和混合模态分支之间建立桥梁,提出了三个新颖模块。首先,设计了一个模态特定特征解码器模块,用于有选择地将模态特定信息传输到混合模态分支。其次,在探索模态特定信息之间的交互时,我们认为使用不变的多模态融合策略可能导致次优结果,因此需要考虑动作不同部分的潜在多样性。因此,提出了一个自适应融合模块,用于学习动作不同部分的自适应多模态融合策略。该模块包括几个FusionNets用于探索不同的多模态融合策略和一个PolicyNet用于决定哪些FusionNets被启用。第三,设计了一个名为跨模态特征解码器的模块,用于将自适应融合模块生成的跨模态特征传输到混合模态分支。
更新时间: 2025-03-05 14:02:10
领域: eess.SP,cs.AI,cs.CV,I.2.10
DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions
We consider the Inverse Optimal Stopping (IOS) problem, in which, based on stopped expert trajectories, one aims to recover the optimal stopping region through continuation and stopping gain function approximation. The uniqueness of the stopping region allows the use of IOS in real-world applications with safety concerns. While current state-of-the-art inverse reinforcement learning methods recover both a Q-function and the corresponding optimal policy, they fail to account for specific challenges posed by optimal stopping problems. These include data sparsity near the stopping region, the non-Markovian nature of the continuation gain, a proper treatment of boundary conditions, the need for a stable offline approach for risk-sensitive applications, and a lack of a quality evaluation metric. These challenges are addressed with the proposed Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping (DO-IQS), which incorporates temporal information by approximating the cumulative continuation gain together with the world dynamics and the Q-function without querying the environment. Moreover, a confidence-based oversampling approach is proposed to address the data sparsity problem. We demonstrate the performance of our models on real and artificial data, including an optimal intervention for critical events problem.
Updated: 2025-03-05 14:01:17
标题: DO-IQS:具有未知增益函数的动态感知离线逆Q学习用于最优停止
摘要: 我们考虑逆最优停止(IOS)问题,在这个问题中,基于停止的专家轨迹,我们旨在通过近似延续和停止增益函数来恢复最优停止区域。停止区域的唯一性使得可以在关注安全问题的实际应用中使用IOS。虽然当前最先进的逆强化学习方法可以恢复Q函数和相应的最优策略,但它们未能考虑到最优停止问题所带来的具体挑战。这些挑战包括停止区域附近的数据稀疏性,延续增益的非马尔科夫性质,边界条件的适当处理,风险敏感应用的稳定离线方法的需求,以及缺乏质量评估指标。提出了一种新的解决方案——用于最优停止的动态感知离线逆Q学习(DO-IQS),它通过近似累积延续增益连同世界动态和Q函数,而无需查询环境,以整合时间信息。此外,提出了一种基于置信度的过采样方法来解决数据稀疏问题。我们在真实和人工数据上展示了我们模型的性能,包括一个关键事件问题的最优干预。
更新时间: 2025-03-05 14:01:17
领域: stat.ML,cs.LG
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Chain-of-thought (CoT) significantly enhances the reasoning performance of large language models (LLM). While current theoretical studies often attribute this improvement to increased expressiveness and computational capacity, we argue that expressiveness is not the primary limitation in the LLM regime, as current large models will fail on simple tasks. Using a parity-learning setup, we demonstrate that CoT can substantially improve sample efficiency even when the representation power is sufficient. Specifically, with CoT, a transformer can learn the function within polynomial samples, whereas without CoT, the required sample size is exponential. Additionally, we show that CoT simplifies the learning process by introducing sparse sequential dependencies among input tokens, and leads to a sparse and interpretable attention. We validate our theoretical analysis with both synthetic and real-world experiments, confirming that sparsity in attention layers is a key factor of the improvement induced by CoT.
Updated: 2025-03-05 13:57:56
标题: 从稀疏依赖到稀疏注意力:揭示链式思维如何增强Transformer样本效率
摘要: Chain-of-thought (CoT)显著提高了大型语言模型(LLM)的推理性能。尽管当前的理论研究通常将这种改进归因于增加的表达能力和计算能力,但我们认为在LLM范围内,表达能力并不是主要限制,因为当前的大型模型在简单任务上会失败。通过使用奇偶学习设置,我们证明了即使表示能力足够,CoT也可以大大提高样本效率。具体地,使用CoT,transformer可以在多项式样本内学习函数,而没有CoT时,所需的样本量是指数级的。此外,我们展示了CoT通过引入输入令牌之间的稀疏序列依赖关系简化了学习过程,并导致了稀疏且可解释的注意力。我们通过合成和真实世界实验验证了我们的理论分析,确认了CoT引入的注意力层中的稀疏性是其所引起的改进的关键因素。
更新时间: 2025-03-05 13:57:56
领域: cs.LG,cs.CL,stat.ML
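A small illustration of the parity-learning setup referenced above: without CoT the target is the single parity of the whole bit string, while with CoT the target is the chain of running prefix parities, so each step depends only sparsely on the previous token and the next input bit. This is a toy data generator, not the paper's training code:

```python
import random

def parity_example(n_bits=8, with_cot=True, rng=None):
    """Return (input bits, target tokens) for one parity-learning sample."""
    rng = rng or random.Random()
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    if not with_cot:
        return bits, [sum(bits) % 2]         # direct answer: one dense target
    chain, acc = [], 0
    for b in bits:
        acc ^= b                             # each CoT token needs only the previous
        chain.append(acc)                    # token and one new bit (sparse dependence)
    return bits, chain                       # final chain token is the answer

rng = random.Random(0)
x, y_direct = parity_example(with_cot=False, rng=rng)
x2, y_cot = parity_example(with_cot=True, rng=rng)
print(x, "->", y_direct)
print(x2, "->", y_cot)
```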
An Aspect Extraction Framework using Different Embedding Types, Learning Models, and Dependency Structure
Aspect-based sentiment analysis has gained significant attention in recent years due to its ability to provide fine-grained insights for sentiment expressions related to specific features of entities. An important component of aspect-based sentiment analysis is aspect extraction, which involves identifying and extracting aspect terms from text. Effective aspect extraction serves as the foundation for accurate sentiment analysis at the aspect level. In this paper, we propose aspect extraction models that use different types of embeddings for words and part-of-speech tags and that combine several learning models. We also propose tree positional encoding that is based on dependency parsing output to capture better the aspect positions in sentences. In addition, a new aspect extraction dataset is built for Turkish by machine translating an English dataset in a controlled setting. The experiments conducted on two Turkish datasets showed that the proposed models mostly outperform the studies that use the same datasets, and incorporating tree positional encoding increases the performance of the models.
Updated: 2025-03-05 13:57:48
标题: 一个使用不同嵌入类型、学习模型和依赖结构的方面提取框架
摘要: 方面情感分析在近年来受到了极大关注,因为它能够提供与实体特定特征相关的情感表达的细致洞察。方面情感分析的一个重要组成部分是方面提取,它涉及从文本中识别和提取方面术语。有效的方面提取为准确的方面级情感分析奠定了基础。本文提出了使用不同类型的词嵌入和词性标签的方面提取模型,并结合了多个学习模型。我们还提出了基于依赖解析输出的树位置编码,以更好地捕捉句子中的方面位置。此外,通过在受控环境中将一个英文数据集机器翻译成土耳其语,建立了一个新的土耳其方面提取数据集。在两个土耳其数据集上进行的实验显示,所提出的模型大多优于使用相同数据集的研究,并且加入树位置编码可以提高模型的性能。
更新时间: 2025-03-05 13:57:48
领域: cs.CL,cs.LG
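The abstract only says the tree positional encoding is built from dependency parsing output; one simple assumed variant uses each token's depth in the dependency tree as its positional id. A sketch with a hand-written head array so no parser dependency is required:

```python
from typing import List

def tree_depths(heads: List[int]) -> List[int]:
    """Depth of each token in a dependency tree.

    `heads[i]` is the index of token i's head, with -1 marking the root.
    Depth-in-tree is one simple 'tree position'; the paper's encoding may
    combine this with other structural signals.
    """
    def depth(i: int) -> int:
        d = 0
        while heads[i] != -1:
            i = heads[i]
            d += 1
        return d
    return [depth(i) for i in range(len(heads))]

# "The battery life is great" with a hand-written UD-style parse:
# great(root), life -> great (nsubj), is -> great (cop), The/battery -> life.
tokens = ["The", "battery", "life", "is", "great"]
heads = [2, 2, 4, 4, -1]
print(list(zip(tokens, tree_depths(heads))))  # depths used as positional ids
```

The resulting depths could then be mapped through an embedding table and added to the token representations before the sequence model, alongside the word and part-of-speech embeddings the abstract mentions.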
NeuGrasp: Generalizable Neural Surface Reconstruction with Background Priors for Material-Agnostic Object Grasp Detection
Robotic grasping in scenes with transparent and specular objects presents great challenges for methods relying on accurate depth information. In this paper, we introduce NeuGrasp, a neural surface reconstruction method that leverages background priors for material-agnostic grasp detection. NeuGrasp integrates transformers and global prior volumes to aggregate multi-view features with spatial encoding, enabling robust surface reconstruction in narrow and sparse viewing conditions. By focusing on foreground objects through residual feature enhancement and refining spatial perception with an occupancy-prior volume, NeuGrasp excels in handling objects with transparent and specular surfaces. Extensive experiments in both simulated and real-world scenarios show that NeuGrasp outperforms state-of-the-art methods in grasping while maintaining comparable reconstruction quality. More details are available at https://neugrasp.github.io/.
Updated: 2025-03-05 13:57:37
标题: NeuGrasp:具有背景先验知识的通用神经表面重建用于材料无关物体抓取检测
摘要: 在具有透明和反光物体的场景中进行机器人抓取对依赖于准确深度信息的方法提出了巨大挑战。在本文中,我们介绍了NeuGrasp,这是一种神经表面重建方法,利用背景先验进行与材料无关的抓取检测。NeuGrasp集成了变换器和全局先验体积,以聚合多视图特征和空间编码,从而在狭窄和稀疏的视图条件下实现稳健的表面重建。通过通过残差特征增强专注于前景物体,并利用占用先验体积精细调整空间感知,NeuGrasp在处理具有透明和反光表面的物体方面表现出色。在模拟和真实世界场景中进行的大量实验表明,NeuGrasp在抓取方面优于最先进的方法,同时保持可比的重建质量。更多详细信息可在https://neugrasp.github.io/获取。
更新时间: 2025-03-05 13:57:37
领域: cs.RO,cs.AI,I.2.9; I.2.10
Rethinking Synthetic Data definitions: A privacy driven approach
Synthetic data is gaining traction as a cost-effective solution for the increasing data demands of AI development and can be generated either from existing knowledge or from derived data captured from real-world events. The source of the synthetic data generation and the technique used significantly impact its residual privacy risk and therefore its opportunity for sharing. The traditional classification of synthetic data types no longer fits the newer generation techniques and there is a need to better align the classification with practical needs. We suggest a new way of grouping synthetic data types that better supports privacy evaluations to aid regulatory policymaking. Our novel classification provides flexibility to new advancements like deep generative methods and offers a more practical framework for future applications.
Updated: 2025-03-05 13:54:13
标题: 重新思考合成数据定义:一个以隐私为驱动的方法
摘要: 合成数据作为一种经济高效的解决方案,用于满足人工智能开发日益增长的数据需求,并且可以通过现有知识或从真实世界事件中捕获的派生数据生成。合成数据生成的来源和所使用的技术显著影响其剩余隐私风险,因此影响其共享机会。传统的合成数据类型分类不再适用于新一代技术,有必要更好地将分类与实际需求对齐。我们建议一种新的合成数据类型分组方式,更好地支持隐私评估,以帮助监管政策制定。我们的新分类提供了灵活性,适应了深度生成方法等新进展,并为未来应用提供了更实用的框架。
更新时间: 2025-03-05 13:54:13
领域: cs.LG,cs.AI
Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems
Recent advancements in Large Language Model(LLM)-based Multi-Agent Systems(MAS) have demonstrated remarkable potential for tackling complex decision-making tasks. However, existing frameworks inevitably rely on serialized execution paradigms, where agents must complete sequential LLM planning before taking action. This fundamental constraint severely limits real-time responsiveness and adaptation, which is crucial in dynamic environments with ever-changing scenarios. In this paper, we propose a novel parallelized planning-acting framework for LLM-based MAS, featuring a dual-thread architecture with interruptible execution to enable concurrent planning and acting. Specifically, our framework comprises two core threads:(1) a planning thread driven by a centralized memory system, maintaining synchronization of environmental states and agent communication to support dynamic decision-making; and (2) an acting thread equipped with a comprehensive skill library, enabling automated task execution through recursive decomposition. Extensive experiments on challenging Minecraft demonstrate the effectiveness of the proposed framework.
Updated: 2025-03-05 13:53:10
标题: 并行规划-执行以提高基于LLM的多智能体系统的效率
摘要: 最近基于大型语言模型(LLM)的多Agent系统(MAS)的进展表明,具有卓越潜力来解决复杂的决策任务。然而,现有框架不可避免地依赖于串行执行范式,其中代理必须在采取行动之前完成顺序LLM规划。这种基本约束严重限制了实时响应能力和适应性,在动态环境中至关重要,因为这些环境中的情景不断变化。在本文中,我们提出了一个新颖的并行规划-执行框架,用于基于LLM的MAS,具有可中断执行的双线程架构,以实现并发规划和执行。具体而言,我们的框架包括两个核心线程:(1)由集中式内存系统驱动的规划线程,维护环境状态和代理通信的同步,以支持动态决策制定;和(2)配备全面技能库的执行线程,通过递归分解实现自动化任务执行。在具有挑战性的Minecraft上进行的大量实验证明了所提出框架的有效性。
更新时间: 2025-03-05 13:53:10
领域: cs.AI
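A toy illustration of the dual-thread idea described above (an assumption about the mechanics, not the paper's implementation): a planning thread refreshes a shared plan from centralized memory whenever the world state changes, while an acting thread executes the current plan and abandons it as soon as a newer plan version appears:

```python
import threading
import time

memory_lock = threading.Lock()
shared_memory = {"world_state": 0, "plan": None, "plan_version": 0}
stop = threading.Event()

def planning_thread():
    """Stands in for the LLM planner: re-plans whenever the world state changes."""
    seen_state = None
    while not stop.is_set():
        with memory_lock:
            state = shared_memory["world_state"]
            if state != seen_state:
                shared_memory["plan"] = [f"step-{state}-{i}" for i in range(3)]
                shared_memory["plan_version"] += 1
                seen_state = state
        time.sleep(0.05)

def acting_thread():
    """Executes plan steps, aborting the remainder when a newer plan appears."""
    while not stop.is_set():
        with memory_lock:
            plan = shared_memory["plan"]
            version = shared_memory["plan_version"]
        if not plan:
            time.sleep(0.05)
            continue
        for step in plan:
            with memory_lock:
                if shared_memory["plan_version"] != version:
                    break                      # interruptible execution
            print("executing", step)
            time.sleep(0.1)                    # stands in for a skill call
        with memory_lock:
            shared_memory["world_state"] += 1  # acting changes the environment

threads = [threading.Thread(target=planning_thread),
           threading.Thread(target=acting_thread)]
for t in threads:
    t.start()
time.sleep(1.0)
stop.set()
for t in threads:
    t.join()
```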
Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization
Molecular optimization is a crucial yet complex and time-intensive process that often acts as a bottleneck for drug development. Traditional methods rely heavily on trial and error, making multi-objective optimization both time-consuming and resource-intensive. Current AI-based methods have shown limited success in handling multi-objective optimization tasks, hampering their practical utilization. To address this challenge, we present MultiMol, a collaborative large language model (LLM) system designed to guide multi-objective molecular optimization. MultiMol comprises two agents, including a data-driven worker agent and a literature-guided research agent. The data-driven worker agent is a large language model being fine-tuned to learn how to generate optimized molecules considering multiple objectives, while the literature-guided research agent is responsible for searching task-related literature to find useful prior knowledge that facilitates identifying the most promising optimized candidates. In evaluations across six multi-objective optimization tasks, MultiMol significantly outperforms existing methods, achieving a 82.30% success rate, in sharp contrast to the 27.50% success rate of current strongest methods. To further validate its practical impact, we tested MultiMol on two real-world challenges. First, we enhanced the selectivity of Xanthine Amine Congener (XAC), a promiscuous ligand that binds both A1R and A2AR, successfully biasing it towards A1R. Second, we improved the bioavailability of Saquinavir, an HIV-1 protease inhibitor with known bioavailability limitations. Overall, these results indicate that MultiMol represents a highly promising approach for multi-objective molecular optimization, holding great potential to accelerate the drug development process and contribute to the advancement of pharmaceutical research.
Updated: 2025-03-05 13:47:55
标题: 协作专家LLMs指导的多目标分子优化
摘要: 分子优化是一项至关重要但又复杂且耗时的过程,通常作为药物开发的瓶颈。传统方法主要依赖于试错,使得多目标优化既耗时又资源密集。目前基于人工智能的方法在处理多目标优化任务方面显示出有限的成功,限制了它们的实际利用。为了解决这一挑战,我们提出了MultiMol,这是一个协作的大型语言模型(LLM)系统,旨在指导多目标分子优化。MultiMol包括两个代理,包括一个数据驱动的工作代理和一个文献引导的研究代理。数据驱动的工作代理是一个经过微调的大型语言模型,学习如何生成考虑多个目标的优化分子,而文献引导的研究代理负责搜索与任务相关的文献,寻找有用的先前知识,以便识别最有前景的优化候选者。在六个多目标优化任务的评估中,MultiMol明显优于现有方法,取得了82.30%的成功率,与当前最强方法的27.50%的成功率形成鲜明对比。为了进一步验证其实际影响,我们在两个真实挑战中测试了MultiMol。首先,我们增强了黄嘌呤胺类似物(XAC)的选择性,这是一种结合A1R和A2AR的多效配体,成功地偏向A1R。其次,我们改善了Saquinavir的生物利用度,这是一种已知生物利用度有限的HIV-1蛋白酶抑制剂。总的来说,这些结果表明MultiMol代表了一种非常有前景的多目标分子优化方法,具有加速药物开发过程并促进制药研究进展的巨大潜力。
更新时间: 2025-03-05 13:47:55
领域: q-bio.BM,cs.AI,cs.LG
WVEmbs with its Masking: A Method For Radar Signal Sorting
Our study proposes a novel embedding method, Wide-Value-Embeddings (WVEmbs), for processing Pulse Descriptor Words (PDWs) as normalized inputs to neural networks. This method adapts to the distribution of interleaved radar signals, ranking original signal features from trivial to useful and stabilizing the learning process. To address the imbalance in radar signal interleaving, we introduce a value dimension masking method on WVEmbs, which automatically and efficiently generates challenging samples, and constructs interleaving scenarios, thereby compelling the model to learn robust features. Experimental results demonstrate that our method is an efficient end-to-end approach, achieving high-granularity, sample-level pulse sorting for high-density interleaved radar pulse sequences in complex and non-ideal environments.
Updated: 2025-03-05 13:47:55
标题: 带有掩码的WVEmbs:一种雷达信号分类方法
摘要: 我们的研究提出了一种新颖的嵌入方法,Wide-Value-Embeddings(WVEmbs),用于将脉冲描述符词(PDWs)处理为归一化输入到神经网络中。该方法适应交错雷达信号的分布,将原始信号特征从琐碎到有用进行排序,并稳定学习过程。为了解决雷达信号交错中的不平衡,我们在WVEmbs上引入了一个值维度掩模方法,自动生成具有挑战性的样本,并构建交错场景,从而迫使模型学习稳健的特征。实验结果表明,我们的方法是一种高效的端到端方法,在复杂和非理想环境中实现了高密度交错雷达脉冲序列的高粒度、样本级脉冲排序。
更新时间: 2025-03-05 13:47:55
领域: eess.SP,cs.LG
CURVALID: Geometrically-guided Adversarial Prompt Detection
Adversarial prompts capable of jailbreaking large language models (LLMs) and inducing undesirable behaviours pose a significant obstacle to their safe deployment. Current mitigation strategies rely on activating built-in defence mechanisms or fine-tuning the LLMs, but the fundamental distinctions between adversarial and benign prompts are yet to be understood. In this work, we introduce CurvaLID, a novel defense framework that efficiently detects adversarial prompts by leveraging their geometric properties. It is agnostic to the type of LLM, offering a unified detection framework across diverse adversarial prompts and LLM architectures. CurvaLID builds on the geometric analysis of text prompts to uncover their underlying differences. We theoretically extend the concept of curvature via the Whewell equation into an $n$-dimensional word embedding space, enabling us to quantify local geometric properties, including semantic shifts and curvature in the underlying manifolds. Additionally, we employ Local Intrinsic Dimensionality (LID) to capture geometric features of text prompts within adversarial subspaces. Our findings reveal that adversarial prompts differ fundamentally from benign prompts in terms of their geometric characteristics. Our results demonstrate that CurvaLID delivers superior detection and rejection of adversarial queries, paving the way for safer LLM deployment. The source code can be found at https://github.com/Cancanxxx/CurvaLID
Updated: 2025-03-05 13:47:53
标题: CURVALID:几何引导的对抗性提示检测
摘要: 能够越狱大型语言模型(LLMs)并诱发不良行为的对抗性提示对它们的安全部署构成了重大障碍。当前的缓解策略依赖于激活内置的防御机制或微调LLMs,但对抗性提示与良性提示之间的基本区别尚未被理解。在这项工作中,我们引入了CurvaLID,这是一个新颖的防御框架,通过利用对抗性提示的几何特性有效地检测它们。它对LLM的类型不加区分,提供了一个统一的检测框架,可跨越各种对抗性提示和LLM架构。CurvaLID基于文本提示的几何分析来揭示它们之间的潜在差异。我们通过Whewell方程在$n$维词嵌入空间中理论上扩展了曲率的概念,使我们能够量化局部几何特性,包括语义转变和底层流形中的曲率。此外,我们采用局部内在维度(LID)来捕获对抗子空间中文本提示的几何特征。我们的研究结果显示,对抗性提示在几何特性上与良性提示有根本区别。我们的结果表明,CurvaLID提供了对抗性查询的优越检测和拒绝,为更安全地部署LLM铺平了道路。源代码可在https://github.com/Cancanxxx/CurvaLID找到。
更新时间: 2025-03-05 13:47:53
领域: cs.CL,cs.AI
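CurvaLID combines a curvature notion with Local Intrinsic Dimensionality; only the standard k-nearest-neighbour maximum-likelihood LID estimator is sketched below, as a self-contained building block. The curvature term and the full detector are not reproduced:

```python
import numpy as np

def lid_mle(query: np.ndarray, reference: np.ndarray, k: int = 20) -> float:
    """MLE estimate of Local Intrinsic Dimensionality at `query`.

    LID(x) = -( (1/k) * sum_i log(r_i / r_k) )^(-1), where r_1..r_k are the
    distances from x to its k nearest neighbours in `reference`.
    """
    dists = np.linalg.norm(reference - query, axis=1)
    dists = np.sort(dists[dists > 0])[:k]          # drop the zero self-distance
    r_k = dists[-1]
    return -1.0 / np.mean(np.log(dists / r_k + 1e-12))

# Toy check: points on a 2-D plane embedded in 10-D should give LID close to 2;
# prompt embeddings would play the role of `reference` in a detector.
rng = np.random.default_rng(0)
plane = np.zeros((5000, 10))
plane[:, :2] = rng.normal(size=(5000, 2))
print(lid_mle(plane[0], plane, k=50))
```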
State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
State Space Models (SSMs) have emerged as efficient alternatives to Transformers, mitigating their quadratic computational cost. However, the application of Parameter-Efficient Fine-Tuning (PEFT) methods to SSMs remains largely unexplored. In particular, prompt-based methods like Prompt Tuning and Prefix-Tuning, which are widely used in Transformers, do not perform well on SSMs. To address this, we propose state-based methods as a superior alternative to prompt-based methods. This new family of methods naturally stems from the architectural characteristics of SSMs. State-based methods adjust state-related features directly instead of depending on external prompts. Furthermore, we introduce a novel state-based PEFT method: State-offset Tuning. At every timestep, our method directly affects the state at the current step, leading to more effective adaptation. Through extensive experiments across diverse datasets, we demonstrate the effectiveness of our method. Code is available at https://github.com/furiosa-ai/ssm-state-tuning.
Updated: 2025-03-05 13:44:42
标题: 状态偏移调整:基于状态的参数高效微调用于状态空间模型
摘要: 状态空间模型(SSM)已成为高效的变换器的替代方案,减轻了它们的二次计算成本。然而,将参数高效微调(PEFT)方法应用于SSM仍然大多未被探索。特别是,像Prompt Tuning和Prefix-Tuning这样的基于提示的方法,在变压器中被广泛使用,但在SSM上表现不佳。为了解决这个问题,我们提出了基于状态的方法作为提示性方法的更优选。这一新方法系列自然地源自SSM的结构特征。基于状态的方法直接调整与状态相关的特征,而不依赖于外部提示。此外,我们引入了一种新颖的基于状态的PEFT方法:State-offset Tuning。在每个时间步,我们的方法直接影响当前步骤的状态,导致更有效的适应性。通过对各种数据集的广泛实验,我们展示了我们方法的有效性。代码可在https://github.com/furiosa-ai/ssm-state-tuning获取。
更新时间: 2025-03-05 13:44:42
领域: cs.LG
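The abstract says the method directly affects the state at every timestep; one assumed reading is a trainable additive offset on the recurrent state of an otherwise frozen SSM layer. A toy sketch of that idea, not the paper's parameterization:

```python
import torch
import torch.nn as nn

class TinySSM(nn.Module):
    """Toy diagonal linear SSM: h_t = A*h_{t-1} + B*x_t (+ offset), y_t = C*h_t."""
    def __init__(self, d_model=16, d_state=8):
        super().__init__()
        self.A = nn.Parameter(torch.rand(d_state) * 0.9)          # frozen base weights
        self.B = nn.Parameter(torch.randn(d_state, d_model) * 0.1)
        self.C = nn.Parameter(torch.randn(d_model, d_state) * 0.1)
        # The only trainable piece during fine-tuning: a per-step state offset.
        self.state_offset = nn.Parameter(torch.zeros(d_state))

    def forward(self, x):                                          # x: (batch, seq, d_model)
        h = torch.zeros(x.size(0), self.A.numel(), device=x.device)
        ys = []
        for t in range(x.size(1)):
            h = self.A * h + x[:, t] @ self.B.t() + self.state_offset  # offset the state
            ys.append(h @ self.C.t())
        return torch.stack(ys, dim=1)

model = TinySSM()
for name, p in model.named_parameters():
    p.requires_grad_(name == "state_offset")       # parameter-efficient fine-tuning

opt = torch.optim.AdamW([model.state_offset], lr=1e-3)
x, target = torch.randn(4, 32, 16), torch.randn(4, 32, 16)
opt.zero_grad()
nn.functional.mse_loss(model(x), target).backward()
opt.step()
```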
Oblivious Digital Tokens
A computing device typically identifies itself by exhibiting unique measurable behavior or by proving its knowledge of a secret. In both cases, the identifying device must reveal information to a verifier. Considerable research has focused on protecting identifying entities (provers) and reducing the amount of leaked data. However, little has been done to conceal the fact that the verification occurred. We show how this problem naturally arises in the context of digital emblems, which were recently proposed by the International Committee of the Red Cross to protect digital resources during cyber-conflicts. To address this new and important open problem, we define a new primitive, called an Oblivious Digital Token (ODT) that can be verified obliviously. Verifiers can use this procedure to check whether a device has an ODT without revealing to any other parties (including the device itself) that this check occurred. We demonstrate the feasibility of ODTs and present a concrete construction that provably meets the ODT security requirements, even if the prover device's software is fully compromised. We also implement a prototype of the proposed construction and evaluate its performance, thereby confirming its practicality.
Updated: 2025-03-05 13:34:31
标题: 遗忘数字代币
摘要: 计算设备通常通过展示独特的可测行为或证明其知晓一个秘密来进行身份识别。在这两种情况下,识别设备必须向验证者透露信息。大量研究集中在保护识别实体(证明者)和减少泄漏数据的数量上。然而,很少有人去掩盖验证发生的事实。 我们展示了这个问题如何在数字标记的背景下自然产生,这些数字标记最近由国际红十字会提出,用于在网络冲突期间保护数字资源。为了解决这个新的重要的开放问题,我们定义了一种新的原语,称为遗忘数字令牌(ODT),可以被遗忘地验证。验证者可以使用这个过程检查设备是否具有ODT,而不向任何其他方(包括设备本身)透露这个检查发生了。我们证明了ODT的可行性,并提出了一个具体的构造,可以证明符合ODT的安全要求,即使证明者设备的软件被完全破坏。我们还实现了提出的构造的原型,并评估其性能,从而确认了其实用性。
更新时间: 2025-03-05 13:34:31
领域: cs.CR
Federated Learning for Predicting Mild Cognitive Impairment to Dementia Conversion
Dementia is a progressive condition that impairs an individual's cognitive health and daily functioning, with mild cognitive impairment (MCI) often serving as its precursor. The prediction of MCI to dementia conversion has been well studied, but previous studies have almost always focused on traditional Machine Learning (ML) based methods that require sharing sensitive clinical information to train predictive models. This study proposes a privacy-enhancing solution using Federated Learning (FL) to train predictive models for MCI to dementia conversion without sharing sensitive data, leveraging socio demographic and cognitive measures. We simulated and compared two network architectures, Peer to Peer (P2P) and client-server, to enable collaborative learning. Our results demonstrated that FL had comparable predictive performance to centralized ML, and each clinical site showed similar performance without sharing local data. Moreover, the predictive performance of FL models was superior to site specific models trained without collaboration. This work highlights that FL can eliminate the need for data sharing without compromising model efficacy.
Updated: 2025-03-05 13:29:23
标题: 联邦学习用于预测轻度认知障碍向痴呆症转化
摘要: 痴呆症是一种逐渐恶化的状况,影响个体的认知健康和日常功能,轻度认知障碍(MCI)通常是其前驱。对MCI向痴呆症转化的预测已经得到了深入研究,但以往的研究几乎总是集中在需要共享敏感临床信息以训练预测模型的传统机器学习(ML)方法上。本研究提出了一种隐私增强解决方案,使用联邦学习(FL)来训练MCI向痴呆症转化的预测模型,而无需共享敏感数据,利用社会人口统计和认知测量。我们模拟并比较了两种网络架构,点对点(P2P)和客户端-服务器,以实现协作学习。我们的结果表明,FL与集中式ML具有可比较的预测性能,每个临床站点在不共享本地数据的情况下表现出类似的性能。此外,FL模型的预测性能优于在没有合作训练的站点特定模型。这项工作强调了FL可以在不影响模型效力的情况下消除数据共享的需求。
更新时间: 2025-03-05 13:29:23
领域: cs.LG
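A minimal FedAvg-style sketch of the client-server variant described above, with synthetic tabular features standing in for the socio-demographic and cognitive measures and a logistic model per site; only parameters leave each site, never patient records:

```python
import torch
import torch.nn as nn

def make_site_data(n=200, d=12, seed=0):
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, d, generator=g)
    w_true = torch.randn(d, generator=g)
    y = (x @ w_true + 0.5 * torch.randn(n, generator=g) > 0).float()
    return x, y

sites = [make_site_data(seed=s) for s in range(4)]   # 4 clinical sites, data never pooled
global_model = nn.Linear(12, 1)

def local_update(global_state, x, y, epochs=5, lr=0.1):
    """One site trains a copy of the global model on its private data."""
    model = nn.Linear(12, 1)
    model.load_state_dict(global_state)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x).squeeze(1), y).backward()
        opt.step()
    return model.state_dict()

for round_ in range(20):                             # client-server FedAvg rounds
    local_states = [local_update(global_model.state_dict(), x, y) for x, y in sites]
    avg_state = {k: torch.stack([s[k] for s in local_states]).mean(dim=0)
                 for k in local_states[0]}
    global_model.load_state_dict(avg_state)          # server aggregates parameters only
```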
Handling Spatial-Temporal Data Heterogeneity for Federated Continual Learning via Tail Anchor
Federated continual learning (FCL) allows each client to continually update its knowledge from task streams, enhancing the applicability of federated learning in real-world scenarios. However, FCL needs to address not only spatial data heterogeneity between clients but also temporal data heterogeneity between tasks. In this paper, empirical experiments demonstrate that such input-level heterogeneity significantly affects the model's internal parameters and outputs, leading to severe spatial-temporal catastrophic forgetting of local and previous knowledge. To this end, we propose Federated Tail Anchor (FedTA) to mix trainable Tail Anchor with the frozen output features to adjust their position in the feature space, thereby overcoming parameter-forgetting and output-forgetting. Three novel components are also included: Input Enhancement for improving the performance of pre-trained models on downstream tasks; Selective Input Knowledge Fusion for fusion of heterogeneous local knowledge on the server; and Best Global Prototype Selection for finding the best anchor point for each class in the feature space. Extensive experiments demonstrate that FedTA not only outperforms existing FCL methods but also effectively preserves the relative positions of features.
Updated: 2025-03-05 13:25:09
标题: 通过尾锚点处理联邦持续学习中的空间-时间数据异质性
摘要: 联邦持续学习(FCL)允许每个客户端不断更新其知识,从任务流中增强联邦学习在现实场景中的适用性。然而,FCL不仅需要解决客户端之间的空间数据异质性,还需要解决任务之间的时间数据异质性。本文通过实证实验表明,这种输入级别的异质性显著影响模型的内部参数和输出,导致严重的空间-时间灾难性遗忘局部和先前的知识。为此,我们提出了Federated Tail Anchor(FedTA),将可训练的Tail Anchor与冻结的输出特征混合在一起,调整它们在特征空间中的位置,从而克服参数遗忘和输出遗忘。还包括三个新颖组件:用于改进预训练模型在下游任务上性能的输入增强;用于在服务器上融合异构本地知识的选择性输入知识融合;以及用于在特征空间中找到每个类的最佳锚点的最佳全局原型选择。大量实验表明,FedTA不仅优于现有的FCL方法,而且有效地保留了特征的相对位置。
更新时间: 2025-03-05 13:25:09
领域: cs.CV,cs.AI,cs.LG
Differentially Private Learners for Heterogeneous Treatment Effects
Patient data is widely used to estimate heterogeneous treatment effects and thus understand the effectiveness and safety of drugs. Yet, patient data includes highly sensitive information that must be kept private. In this work, we aim to estimate the conditional average treatment effect (CATE) from observational data under differential privacy. Specifically, we present DP-CATE, a novel framework for CATE estimation that is Neyman-orthogonal and further ensures differential privacy of the estimates. Our framework is highly general: it applies to any two-stage CATE meta-learner with a Neyman-orthogonal loss function, and any machine learning model can be used for nuisance estimation. We further provide an extension of our DP-CATE, where we employ RKHS regression to release the complete CATE function while ensuring differential privacy. We demonstrate our DP-CATE across various experiments using synthetic and real-world datasets. To the best of our knowledge, we are the first to provide a framework for CATE estimation that is Neyman-orthogonal and differentially private.
Updated: 2025-03-05 13:24:58
标题: 异构处理效应的差分私有学习者
摘要: 患者数据被广泛用于估计异质性治疗效果,从而了解药物的有效性和安全性。然而,患者数据包含高度敏感的信息,必须保持私密。在这项工作中,我们旨在在差分隐私下估计来自观察数据的条件平均治疗效应(CATE)。具体地,我们提出了DP-CATE,这是一个新颖的用于CATE估计的框架,它是Neyman正交的,并进一步确保估计结果的差分隐私。我们的框架非常通用:它适用于任何具有Neyman正交损失函数的两阶段CATE元学习器,并且任何机器学习模型都可以用于干扰项估计。我们进一步提供了DP-CATE的扩展,其中我们使用RKHS回归来发布完整的CATE函数,同时确保差分隐私。我们通过使用合成和真实世界数据集进行的各种实验展示了我们的DP-CATE。据我们所知,我们是第一个提供Neyman正交和差分隐私的CATE估计框架。
更新时间: 2025-03-05 13:24:58
领域: cs.LG,cs.CR
TEDDY: A Family Of Foundation Models For Understanding Single Cell Biology
Understanding the biological mechanism of disease is critical for medicine, and in particular drug discovery. AI-powered analysis of genome-scale biological data hold great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models either do not improve or only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving the state-of-the-art. First, we scaled the pre-training dataset to 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the TEDDY family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on two downstream evaluation tasks -- identifying the underlying disease state of held-out donors not seen during training and distinguishing healthy cells from diseased ones for disease conditions and donors not seen during training. Scaling experiments showed that performance improved predictably with both data volume and parameter count. Our models showed substantial improvement over existing work on the first task and more muted improvements on the second.
Updated: 2025-03-05 13:24:57
标题: TEDDY:用于理解单细胞生物学的基础模型家族
摘要: 理解疾病的生物学机制对医学,尤其是药物发现至关重要。AI技术对基因组规模的生物数据进行分析在这方面具有巨大潜力。单细胞RNA测序数据的增加可促成疾病生物学基础模型的发展。然而,现有的基础模型在下游应用中要么没有改进,要么只是在任务特定模型上略微改进。在这里,我们探讨了两种改进最新技术的途径。首先,我们将预训练数据集扩展到1.16亿个细胞,比以前的模型使用的数据规模更大。其次,我们利用大规模生物标注的可用性作为预训练过程中的一种监督形式。我们训练了TEDDY家族的模型,包括六个基于Transformer的最新单细胞基础模型,参数量分别为7千万、1.6亿和4亿。我们在两个下游评估任务上验证了我们的模型——识别在训练期间未见的受试者的潜在疾病状态,以及为在训练期间未见的疾病情况和受试者区分健康细胞和患病细胞。扩展实验表明,性能随数据量和参数数量的增加而可预测地改进。我们的模型在第一个任务上表现出显著改进,对第二个任务的改进则较为温和。
更新时间: 2025-03-05 13:24:57
领域: cs.LG,q-bio.QM
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
To answer one-to-many factual queries (e.g., listing cities of a country), a language model (LM) must simultaneously recall knowledge and avoid repeating previous answers. How are these two subtasks implemented and integrated internally? Across multiple datasets and models, we identify a promote-then-suppress mechanism: the model first recalls all answers, and then suppresses previously generated ones. Specifically, LMs use both the subject and previous answer tokens to perform knowledge recall, with attention propagating subject information and MLPs promoting the answers. Then, attention attends to and suppresses previous answer tokens, while MLPs amplify the suppression signal. Our mechanism is corroborated by extensive experimental evidence: in addition to using early decoding and causal tracing, we analyze how components use different tokens by introducing both Token Lens, which decodes aggregated attention updates from specified tokens, and a knockout method that analyzes changes in MLP outputs after removing attention to specified tokens. Overall, we provide new insights into how LMs' internal components interact with different input tokens to support complex factual recall. Code is available at https://github.com/Lorenayannnnn/how-lms-answer-one-to-many-factual-queries.
Updated: 2025-03-05 13:22:47
标题: 促进、抑制、迭代:语言模型如何回答一对多的事实查询
摘要: 为了回答一对多的事实查询(例如列出一个国家的城市),语言模型(LM)必须同时回忆知识并避免重复先前的答案。这两个子任务是如何在内部实现和集成的呢?在多个数据集和模型中,我们确定了一个先提升后抑制的机制:模型首先回忆所有答案,然后抑制先前生成的答案。具体来说,LM使用主题和先前的答案标记来执行知识回忆,注意力传播主题信息,MLP促进答案。然后,注意力关注和抑制先前的答案标记,而MLP增强抑制信号。我们的机制得到了大量实验证据的支持:除了使用早期解码和因果追踪,我们通过引入Token Lens和一种分析MLP输出变化的击倒方法,分析组件如何使用不同的标记。总的来说,我们提供了关于LM的内部组件如何与不同的输入标记交互以支持复杂的事实回忆的新见解。代码可在https://github.com/Lorenayannnnn/how-lms-answer-one-to-many-factual-queries找到。
更新时间: 2025-03-05 13:22:47
领域: cs.CL,cs.AI,cs.LG
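A toy, self-contained illustration of the knockout idea on a single attention head: attention paid to specified positions (for example, previously generated answer tokens) is zeroed and the rows renormalized, and the change in the head's output is measured. Real experiments would patch a pretrained LM's attention rather than this random-weight head:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, seq = 16, 6
x = torch.randn(seq, d)                                  # token embeddings
Wq, Wk, Wv = (torch.randn(d, d) / d ** 0.5 for _ in range(3))

def attend(x, knockout_positions=()):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = F.softmax(q @ k.t() / d ** 0.5, dim=-1)
    if knockout_positions:
        attn = attn.clone()
        attn[:, list(knockout_positions)] = 0.0          # remove attention to these tokens
        attn = attn / attn.sum(dim=-1, keepdim=True)     # renormalize each row
    return attn @ v

full = attend(x)
knocked = attend(x, knockout_positions=(2, 3))           # e.g. previous-answer tokens
print("per-position output change:", (full - knocked).norm(dim=-1))
```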
Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers
We present an approach to modifying Transformer architectures by integrating graph-aware relational reasoning into the attention mechanism, merging concepts from graph neural networks and language modeling. Building on the inherent connection between attention and graph theory, we reformulate the Transformer's attention mechanism as a graph operation and propose Graph-Aware Isomorphic Attention. This method leverages advanced graph modeling strategies, including Graph Isomorphism Networks (GIN) and Principal Neighborhood Aggregation (PNA), to enrich the representation of relational structures. Our approach captures complex dependencies and generalizes across tasks, as evidenced by a reduced generalization gap and improved learning performance. Additionally, we expand the concept of graph-aware attention to introduce Sparse GIN-Attention, a fine-tuning approach that employs sparse GINs. By interpreting attention matrices as sparse adjacency graphs, this technique enhances the adaptability of pre-trained foundational models with minimal computational overhead, endowing them with graph-aware capabilities. Sparse GIN-Attention fine-tuning achieves improved training dynamics and better generalization compared to alternative methods like low-rank adaption (LoRA). We discuss latent graph-like structures within traditional attention mechanisms, offering a new lens through which Transformers can be understood as hierarchical GIN models evolved for relational reasoning. This perspective suggests profound implications for foundational model development, enabling the design of architectures that dynamically adapt to both local and global dependencies. Applications in bioinformatics, materials science, language modeling, and beyond could benefit from this synthesis of relational and sequential data modeling, setting the stage for interpretable and generalizable modeling strategies.
Updated: 2025-03-05 13:19:16
标题: 图感知同构注意力用于变压器中的自适应动态
摘要: 我们提出了一种修改Transformer架构的方法,通过将图感知关系推理整合到注意力机制中,融合了图神经网络和语言建模的概念。基于注意力和图论之间固有的联系,我们将Transformer的注意力机制重新构建为图操作,并提出了图感知同构注意力。该方法利用先进的图建模策略,包括图同构网络(GIN)和主要邻域聚合(PNA),以丰富关系结构的表示。我们的方法捕捉了复杂的依赖关系,并在任务间进行泛化,通过减少泛化差距和提高学习性能来证明。此外,我们将图感知注意力的概念扩展到引入稀疏GIN-注意力,这是一种利用稀疏GIN进行微调的方法。通过将注意力矩阵解释为稀疏邻接图,这种技术增强了预训练基础模型的适应性,同时带来最小的计算开销,赋予它们图感知能力。与低秩调整(LoRA)等替代方法相比,稀疏GIN-注意力微调实现了更好的训练动态和更好的泛化性能。我们讨论了传统注意力机制中的潜在图状结构,提供了一个新的视角,可以理解Transformer。通过将Transformer演变为用于关系推理的分层GIN模型,这种观点对基础模型开发具有深远的影响,可以设计出动态适应本地和全局依赖的架构。生物信息学、材料科学、语言建模等领域的应用可能会受益于这种关系和序列数据建模的综合,为可解释和可泛化的建模策略奠定基础。
更新时间: 2025-03-05 13:19:16
领域: cs.LG,cond-mat.mes-hall,cond-mat.mtrl-sci,cs.AI,cs.CL
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning
Vision-language-action models (VLAs) have shown great potential as generalist robot policies. However, these models pose urgent safety challenges during deployment, including the risk of physical harm to the environment, the robot itself, and humans. How can safety be explicitly incorporated into VLAs? In this work, we propose SafeVLA, a novel algorithm designed to integrate safety into VLAs, ensuring the protection of the environment, robot hardware and humans in real-world settings. SafeVLA effectively balances safety and task performance by employing large-scale constrained learning within simulated environments. We demonstrate that SafeVLA outperforms the current state-of-the-art method in both safety and task performance, achieving average improvements of 83.58% and 3.85%, respectively, in simulation. By prioritizing safety, our approach eliminates high-risk behaviors and reduces the upper bound of unsafe behaviors to 1/35 of that in the current state-of-the-art, thereby significantly mitigating long-tail risks. Furthermore, the learned safety constraints generalize to diverse, unseen scenarios, including multiple out-of-distribution perturbations and tasks. Our data, models and newly proposed benchmark environment are available at https://sites.google.com/view/pku-safevla.
Updated: 2025-03-05 13:16:55
标题: SafeVLA: 通过安全强化学习实现视觉-语言-动作模型的安全对齐
摘要: 视觉-语言-动作模型(VLAs)已经显示出作为通用机器人策略的巨大潜力。然而,这些模型在部署过程中面临紧迫的安全挑战,包括对环境、机器人本身和人类造成身体伤害的风险。如何将安全明确地纳入VLAs中?在这项工作中,我们提出了SafeVLA,一种新颖的算法,旨在将安全性整合到VLAs中,确保在真实世界环境中保护环境、机器人硬件和人类。SafeVLA通过在模拟环境中进行大规模约束学习,有效地平衡安全性和任务性能。我们展示了SafeVLA在安全性和任务性能方面均优于当前的最先进方法,在模拟中分别实现了83.58%和3.85%的平均改进。通过优先考虑安全性,我们的方法消除了高风险行为,并将不安全行为的上限降低到当前最先进方法的1/35,从而显著减轻长尾风险。此外,学习到的安全约束适用于各种不同的、未见过的场景,包括多个分布之外的扰动和任务。我们的数据、模型和新提出的基准测试环境可在https://sites.google.com/view/pku-safevla上获得。
更新时间: 2025-03-05 13:16:55
领域: cs.RO,cs.AI
Zero-Knowledge Proof-based Verifiable Decentralized Machine Learning in Communication Network: A Comprehensive Survey
Over recent decades, machine learning has significantly advanced network communication, enabling improved decision-making, user behavior analysis, and fault detection, but centralized training typically requires collecting raw, potentially sensitive data. Decentralized approaches, where participants exchange computation results instead of raw private data, mitigate these privacy risks but introduce challenges related to trust and verifiability. A critical issue arises: How can one ensure the integrity and validity of computation results shared by other participants? Existing survey articles predominantly address security and privacy concerns in decentralized machine learning, whereas this survey uniquely highlights the emerging issue of verifiability. Recognizing the critical role of zero-knowledge proofs in ensuring verifiability, we present a comprehensive review of Zero-Knowledge Proof-based Verifiable Machine Learning (ZKP-VML). To clarify the research problem, we present a definition of ZKP-VML consisting of four algorithms, along with several corresponding key security properties. Besides, we provide an overview of the current research landscape by systematically organizing the research timeline and categorizing existing schemes based on their security properties. Furthermore, through an in-depth analysis of each existing scheme, we summarize their technical contributions and optimization strategies, aiming to uncover common design principles underlying ZKP-VML schemes. Building on the reviews and analysis presented, we identify current research challenges and suggest future research directions. To the best of our knowledge, this is the most comprehensive survey to date on verifiable decentralized machine learning and ZKP-VML.
Updated: 2025-03-05 12:52:30
标题: 基于零知识证明的可验证去中心化机器学习在通信网络中的应用:一项全面调查
摘要: 在最近几十年,机器学习在网络通信方面取得了显著进展,实现了改进的决策制定、用户行为分析和故障检测。然而,传统的集中式训练通常要求参与者共享原始的私人数据,带来隐私和安全风险。去中心化方法让参与者交换计算结果而非原始私人数据,可以减轻这些风险,但引入了与信任和可验证性相关的挑战。一个关键问题是:如何确保其他参与者共享的计算结果的完整性和有效性?现有的调查文章主要关注去中心化机器学习中的安全和隐私问题,而本调查独特地突出了可验证性的新问题。认识到零知识证明在确保可验证性方面的关键作用,我们提出了基于零知识证明的可验证机器学习(ZKP-VML)的综合评估。为了阐明研究问题,我们提出了一个由四个算法组成的ZKP-VML定义,以及几个相应的关键安全属性。此外,我们通过系统组织研究时间表并根据它们的安全属性对现有方案进行分类,概述了当前研究领域的概况。此外,通过深入分析每个现有方案,我们总结了它们的技术贡献和优化策略,旨在揭示支撑ZKP-VML方案的共同设计原则。基于所提出的评论和分析,我们确定了当前的研究挑战并提出了未来的研究方向。据我们所知,这是迄今为止对可验证的去中心化机器学习和ZKP-VML进行的最全面的调查。
更新时间: 2025-03-05 12:52:30
领域: cs.LG,cs.CR
Open-Source Large Language Models as Multilingual Crowdworkers: Synthesizing Open-Domain Dialogues in Several Languages With No Examples in Targets and No Machine Translation
The prevailing paradigm in the domain of Open-Domain Dialogue agents predominantly focuses on the English language, encompassing both models and datasets. Furthermore, the financial and temporal investments required for crowdsourcing such datasets for finetuning are substantial, particularly when multiple languages are involved. Fortunately, advancements in Large Language Models (LLMs) have unveiled a plethora of possibilities across diverse tasks. Specifically, instruction-tuning has enabled LLMs to execute tasks based on natural language instructions, occasionally surpassing the performance of human crowdworkers. Additionally, these models possess the capability to function in various languages within a single thread. Consequently, to generate new samples in different languages, we propose leveraging these capabilities to replicate the data collection process. We introduce a pipeline for generating Open-Domain Dialogue data in multiple Target Languages using LLMs, with demonstrations provided in a single Source Language. By eschewing explicit Machine Translation in this approach, we enhance adherence to language-specific nuances. We apply this methodology to the PersonaChat dataset. To enhance the openness of generated dialogues and mimic real-life scenarios, we added the notion of speech events, corresponding to the type of conversation the speakers are involved in, and of common ground, which represents the premises of a conversation.
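As a rough illustration of the prompting idea, the hypothetical snippet below builds a generation request that shows English demonstrations but asks for a new dialogue directly in the target language, with assumed fields for the speech event and common ground; the paper's actual prompt template is not reproduced.

```python
# Hypothetical prompt construction: source-language demonstrations,
# target-language generation, no machine translation step. Field names
# and wording are assumptions for illustration only.

def build_prompt(source_demos, target_language, persona, speech_event, common_ground):
    demo_text = "\n\n".join(source_demos)  # demonstrations in the single source language
    return (
        f"Here are example open-domain dialogues (in English):\n\n{demo_text}\n\n"
        f"Now write a new dialogue directly in {target_language} "
        f"(do not translate the examples).\n"
        f"Speaker persona: {persona}\n"
        f"Speech event (type of conversation): {speech_event}\n"
        f"Common ground (shared premises): {common_ground}\n"
    )

prompt = build_prompt(
    source_demos=["A: Hi! I love hiking.\nB: Nice, I prefer painting."],
    target_language="French",
    persona="I am a teacher who enjoys cooking.",
    speech_event="small talk between new colleagues",
    common_ground="both speakers just joined the same school",
)
```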
Updated: 2025-03-05 12:52:14
标题: 开源大型语言模型作为多语众包工作者:在多种语言中合成开放领域对话,无目标示例和无机器翻译
摘要: 在开放领域对话代理领域,主流范式主要集中在英语,涵盖了模型和数据集。此外,为了微调这些数据集所需的财务和时间投资是相当可观的,特别是涉及到多种语言时。幸运的是,大型语言模型(LLMs)的进步揭示了各种任务的大量可能性。具体来说,指导微调使LLMs能够根据自然语言指令执行任务,有时超过人类众包工作者的表现。此外,这些模型具有在单个线程中以多种语言运行的能力。因此,为了在不同语言中生成新样本,我们提议利用这些能力来复制数据收集过程。我们介绍了使用LLMs生成多种目标语言的开放领域对话数据的流程,并提供了在独特源语言中的演示。通过在这种方法中避免明确的机器翻译,我们增强了对特定语言细微差别的遵循。我们将这种方法应用于PersonaChat数据集。为了增强生成对话的开放性并模仿现实生活场景,我们添加了与发言者所涉及对话类型相对应的语音事件的概念,以及代表对话前提的共同基础的概念。
更新时间: 2025-03-05 12:52:14
领域: cs.CL,cs.AI,cs.HC,cs.LG
Unified Mind Model: Reimagining Autonomous Agents in the LLM Era
Large language models (LLMs) have recently demonstrated remarkable capabilities across domains, tasks, and languages (e.g., ChatGPT and GPT-4), reviving the research of general autonomous agents with human-like cognitive abilities. Such human-level agents require semantic comprehension and instruction-following capabilities, which fall exactly within the strengths of LLMs. Although there have been several initial attempts to build human-level agents based on LLMs, the theoretical foundation remains a challenging open problem. In this paper, we propose a novel theoretical cognitive architecture, the Unified Mind Model (UMM), which offers guidance to facilitate the rapid creation of autonomous agents with human-level cognitive abilities. Specifically, our UMM starts from global workspace theory and further leverages LLMs to equip the agent with various cognitive abilities, such as multi-modal perception, planning, reasoning, tool use, learning, memory, reflection, and motivation. Building upon UMM, we then develop an agent-building engine, MindOS, which allows users to quickly create domain- or task-specific autonomous agents without any programming effort.
Updated: 2025-03-05 12:49:44
标题: 统一心灵模型:重新构想LLM时代的自主代理
摘要: 大型语言模型(LLMs)最近展示了在领域、任务和语言方面的卓越能力(例如ChatGPT和GPT-4),重新唤起了具有类似于人类认知能力的通用自主代理研究。这种人类水平的代理需要语义理解和遵循指令的能力,这正是LLMs的优势所在。尽管已经有几次尝试基于LLMs构建人类水平的代理,但理论基础仍然是一个具有挑战性的开放问题。在本文中,我们提出了一种新颖的理论认知架构,统一心智模型(UMM),它提供指导,以促进具有人类水平认知能力的自主代理的快速创建。具体来说,我们的UMM从全局工作空间理论开始,进一步利用LLMs使代理具有各种认知能力,例如多模态感知、规划、推理、工具使用、学习、记忆、反思和动机。在UMM的基础上,我们开发了一个代理构建引擎,MindOS,它允许用户在不需要任何编程工作的情况下快速创建领域/任务特定的自主代理。
更新时间: 2025-03-05 12:49:44
领域: cs.AI,cs.CL
Sim2Real within 5 Minutes: Efficient Domain Transfer with Stylized Gaussian Splatting for Endoscopic Images
Robot-assisted endoluminal intervention is an emerging technique for both benign and malignant luminal lesions. With vision-based navigation, when combined with pre-operative imaging data as priors, it is possible to recover the position and pose of the endoscope without the need for additional sensors. In practice, however, aligning pre-operative and intra-operative domains is complicated by significant texture differences. Although methods such as style transfer can be used to address this issue, they require large datasets from both source and target domains with prolonged training times. This paper proposes an efficient domain transfer method based on stylized Gaussian splatting, requiring only a few real images (10 images) and a very short training time. Specifically, the transfer process includes two phases. In the first phase, the 3D models reconstructed from CT scans are represented as differential Gaussian point clouds. In the second phase, only color-appearance-related parameters are optimized to transfer the style and preserve the visual content. A novel structure consistency loss is applied to latent features and depth levels to enhance the stability of the transferred images. Detailed validation was performed to demonstrate the performance advantages of the proposed method compared to the current state of the art, highlighting its potential for intra-operative surgical navigation.
Updated: 2025-03-05 12:41:05
标题: 5分钟内的Sim2Real:使用风格化的高斯分层进行内窥镜图像的高效域转移
摘要: 机器人辅助内腔介入是一种新兴的技术,用于治疗良性和恶性内腔病变。通过基于视觉的导航,结合术前影像数据作为先验知识,可以在无需额外传感器的情况下恢复内窥镜的位置和姿态。然而,在实践中,术前和术中域的对齐受到显著纹理差异的影响而变得复杂。虽然可以使用风格转移等方法来解决这个问题,但这些方法需要来自源域和目标域的大量数据集,并需要长时间的训练。本文提出了一种基于风格化高斯点云投射的高效领域转移方法,只需要少量真实图像(10张图像),训练时间非常快。具体而言,转移过程包括两个阶段。在第一阶段,从CT扫描重建的3D模型被表示为微分高斯点云。在第二阶段,只优化与颜色外观相关的参数,以转移风格并保留视觉内容。一种新颖的结构一致性损失被应用于潜在特征和深度水平,以增强转移图像的稳定性。通过详细的验证,展示了所提出方法相对于当前最先进技术的性能优势,突显了术中外科导航的潜力。
更新时间: 2025-03-05 12:41:05
领域: cs.CV,cs.AI
Data Poisoning Attacks to Locally Differentially Private Range Query Protocols
Trajectory data, which tracks movements through geographic locations, is crucial for improving real-world applications. However, collecting such sensitive data raises considerable privacy concerns. Local differential privacy (LDP) offers a solution by allowing individuals to locally perturb their trajectory data before sharing it. Despite its privacy benefits, LDP protocols are vulnerable to data poisoning attacks, where attackers inject fake data to manipulate aggregated results. In this work, we make the first attempt to analyze vulnerabilities in several representative LDP trajectory protocols. We propose TraP, a heuristic algorithm for data Poisoning attacks that uses a prefix-suffix method to optimize fake Trajectory selection, significantly reducing computational complexity. Our experimental results demonstrate that our attack can substantially increase target pattern occurrences in the perturbed trajectory dataset with few fake users. This study underscores the urgent need for robust defenses and better protocol designs to safeguard LDP trajectory data against malicious manipulation.
Updated: 2025-03-05 12:40:34
标题: 对本地差分隐私范围查询协议的数据投毒攻击
摘要: 轨迹数据跟踪通过地理位置的移动,在改进现实世界应用程序方面至关重要。然而,收集这种敏感数据引发了相当大的隐私担忧。本地差分隐私(LDP)通过允许个体在共享之前本地扰动其轨迹数据提供了一个解决方案。尽管具有隐私优势,LDP协议容易受到数据毒化攻击的影响,攻击者注入虚假数据来操纵聚合结果。在这项工作中,我们首次尝试分析几种代表性LDP轨迹协议中的漏洞。我们提出了\textsc{TraP},一种启发式算法用于使用前缀后缀方法优化虚假轨迹选择的数据毒化攻击,显著降低计算复杂性。我们的实验结果表明,我们的攻击可以在扰动的轨迹数据集中显著增加目标模式的出现次数,只需少量虚假用户。这项研究强调了迫切需要强有力的防御措施和更好的协议设计,以保护LDP轨迹数据免受恶意操纵。
更新时间: 2025-03-05 12:40:34
领域: cs.CR,cs.LG
A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
Recent advances in video generation models demonstrate their potential as world simulators, but they often struggle with videos deviating from physical laws, a key concern overlooked by most text-to-video benchmarks. We introduce a benchmark designed specifically to assess the Physical Coherence of generated videos, PhyCoBench. Our benchmark includes 120 prompts covering 7 categories of physical principles, capturing key physical laws observable in video content. We evaluated four state-of-the-art (SoTA) T2V models on PhyCoBench and conducted manual assessments. Additionally, we propose an automated evaluation model: PhyCoPredictor, a diffusion model that generates optical flow and video frames in a cascade manner. Through a consistency evaluation comparing automated and manual sorting, the experimental results show that PhyCoPredictor currently aligns most closely with human evaluation. Therefore, it can effectively evaluate the physical coherence of videos, providing insights for future model optimization. Our benchmark, including physical coherence prompts, the automatic evaluation tool PhyCoPredictor, and the generated video dataset, has been released on GitHub at https://github.com/Jeckinchen/PhyCoBench.
Updated: 2025-03-05 12:27:57
标题: 一个用于通过光流引导帧预测评估视频生成模型的物理一致性基准
摘要: 最近对视频生成模型的研究表明它们作为世界模拟器的潜力,但它们经常在视频偏离物理定律方面遇到困难,这是大多数文本到视频基准测试忽视的一个关键问题。我们引入了一个专门设计用于评估生成视频的物理一致性的基准测试,称为PhyCoBench。我们的基准测试包括120个提示,涵盖了7类物理原则,捕捉了视频内容中可观察到的关键物理定律。我们在PhyCoBench上评估了四种最先进的T2V模型,并进行了手动评估。此外,我们提出了一个自动评估模型:PhyCoPredictor,一个以级联方式生成光流和视频帧的扩散模型。通过比较自动和手动排序的一致性评估,实验结果表明PhyCoPredictor目前与人类评估最为接近。因此,它可以有效评估视频的物理一致性,为未来模型优化提供见解。我们的基准测试,包括物理一致性提示、自动评估工具PhyCoPredictor和生成的视频数据集,已在GitHub上发布,网址为https://github.com/Jeckinchen/PhyCoBench。
更新时间: 2025-03-05 12:27:57
领域: cs.CV,cs.AI
Taxation Perspectives from Large Language Models: A Case Study on Additional Tax Penalties
How capable are large language models (LLMs) in the domain of taxation? Although numerous studies have explored the legal domain in general, research dedicated to taxation remains scarce. Moreover, the datasets used in these studies are either simplified, failing to reflect real-world complexities, or unavailable as open source. To address this gap, we introduce PLAT, a new benchmark designed to assess the ability of LLMs to predict the legitimacy of additional tax penalties. PLAT is constructed to evaluate LLMs' understanding of tax law, particularly in cases where resolving the issue requires more than just applying related statutes. Our experiments with six LLMs reveal that their baseline capabilities are limited, especially when dealing with conflicting issues that demand a comprehensive understanding. However, we found that this limitation can be mitigated by enabling retrieval, self-reasoning, and discussion among multiple agents with specific role assignments.
Updated: 2025-03-05 12:24:20
标题: 大语言模型的税收观点:关于额外税收罚款的案例研究
摘要: 大型语言模型(LLMs)在税收领域有多大能力? 尽管已有许多研究探讨了法律领域,但专门研究税收的研究仍然较少。此外,这些研究中使用的数据集要么过于简化,未能反映现实世界的复杂性,要么无法作为开放源代码提供。为了填补这一空白,我们引入了PLAT,一个新的基准,旨在评估LLMs预测额外税收处罚合法性的能力。PLAT旨在评估LLMs对税法的理解,特别是在解决问题需要不仅仅是应用相关法规的情况下。我们对六个LLMs进行的实验表明,它们的基线能力是有限的,尤其是在处理需要全面理解的冲突问题时。然而,我们发现通过启用检索、自我推理和多个具有特定角色分配的代理之间的讨论,可以缓解这种限制。
更新时间: 2025-03-05 12:24:20
领域: cs.CL,cs.AI
Conceptualizing Uncertainty
Uncertainty in machine learning refers to the degree of confidence or lack thereof in a model's predictions. While uncertainty quantification methods exist, explanations of uncertainty, especially in high-dimensional settings, remain an open challenge. Existing work focuses on feature attribution approaches which are restricted to local explanations. Understanding uncertainty, its origins, and characteristics on a global scale is crucial for enhancing interpretability and trust in a model's predictions. In this work, we propose to explain the uncertainty in high-dimensional data classification settings by means of concept activation vectors which give rise to local and global explanations of uncertainty. We demonstrate the utility of the generated explanations by leveraging them to refine and improve our model.
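For readers unfamiliar with concept activation vectors, the sketch below shows the standard way one is obtained: the normal of a linear classifier separating a layer's activations on concept examples from activations on random examples. How the paper then attributes uncertainty to such concepts locally and globally is not reproduced here.

```python
# Standard CAV computation (a sketch); the uncertainty-attribution step
# from the paper is not included.
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)

rng = np.random.default_rng(0)
cav = compute_cav(rng.normal(1.0, 1.0, (50, 16)),   # activations on concept examples
                  rng.normal(0.0, 1.0, (50, 16)))   # activations on random examples
# Sensitivity of a quantity such as predictive entropy to the concept can
# then be probed via directional derivatives along `cav`.
```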
Updated: 2025-03-05 12:24:12
标题: 概念化不确定性
摘要: 机器学习中的不确定性指的是模型预测的置信度或不确定程度。尽管存在不确定性量化方法,但在高维环境下解释不确定性仍然是一个挑战。现有研究侧重于特征归因方法,这些方法局限于局部解释。理解不确定性及其起源和特征在全局范围内对于提高模型预测的可解释性和信任度至关重要。在这项工作中,我们提出通过概念激活向量来解释高维数据分类环境中的不确定性,从而产生局部和全局的不确定性解释。我们通过利用这些生成的解释来完善和改进我们的模型,展示了这些解释的实用性。
更新时间: 2025-03-05 12:24:12
领域: cs.LG,cs.AI
A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks
LLM test-time compute (or LLM inference) via search has emerged as a promising research area with rapid developments. However, current frameworks often adopt distinct perspectives on three key aspects (task definition, LLM profiling, and search procedures), making direct comparisons challenging. Moreover, the search algorithms employed often diverge from standard implementations, and their specific characteristics are not thoroughly specified. In this survey, we provide a comprehensive technical review that unifies task definitions and provides modular definitions of LLM profiling and search procedures. The definitions enable precise comparisons of various LLM inference frameworks while highlighting their departures from conventional search algorithms. We also discuss the applicability, performance, and efficiency of these methods. We have updated our content to include the latest papers, and the differences between versions are highlighted in the appendix. For further details and ongoing updates, please refer to our GitHub repository: https://github.com/xinzhel/LLM-Agent-Survey/blob/main/search.md
Updated: 2025-03-05 12:22:23
标题: 一项关于LLM测试时间计算的调查:任务、LLM配置文件、搜索算法和相关框架
摘要: LLM测试时间计算(或LLM推理)通过搜索已成为一个具有快速发展潜力的研究领域。然而,当前框架通常对三个关键方面(任务定义、LLM配置文件和搜索程序)采用不同的观点,使直接比较变得困难。此外,所采用的搜索算法通常与标准实现不同,并且它们的具体特征未经彻底说明。在本调查中,我们提供了一份全面的技术评估,统一了任务定义,并提供了LLM配置文件和搜索程序的模块化定义。这些定义使得可以对各种LLM推理框架进行精确比较,同时突出它们与传统搜索算法的不同之处。我们还讨论了这些方法的适用性、性能和效率。我们已更新内容以包含最新的论文,并在附录中突出显示版本之间的差异。有关更多详细信息和持续更新,请参考我们的GitHub存储库:https://github.com/xinzhel/LLM-Agent-Survey/blob/main/search.md
更新时间: 2025-03-05 12:22:23
领域: cs.AI
Gradient Deconfliction via Orthogonal Projections onto Subspaces For Multi-task Learning
Although multi-task learning (MTL) has been a preferred approach and successfully applied in many real-world scenarios, MTL models are not guaranteed to outperform single-task models on all tasks, mainly due to the negative effects of conflicting gradients among the tasks. In this paper, we fully examine the influence of conflicting gradients and further emphasize the importance and advantages of achieving non-conflicting gradients, which allows simple but effective trade-off strategies among the tasks with stable performance. Based on our findings, we propose Gradient Deconfliction via Orthogonal Projections onto Subspaces (GradOPS) spanned by other task-specific gradients. Our method not only resolves all conflicts among the tasks, but can also effectively search for diverse solutions towards different trade-off preferences among the tasks. Theoretical analysis on convergence is provided, and the performance of our algorithm is fully validated on multiple benchmarks in various domains. Results demonstrate that our method can effectively find multiple state-of-the-art solutions with different trade-off strategies among the tasks on multiple datasets.
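The core linear-algebra operation behind such projection-based deconfliction can be sketched as follows: a task gradient is split into the component lying in the subspace spanned by the other tasks' gradients and an orthogonal remainder. How GradOPS combines these components and realizes different trade-off preferences is not reproduced here.

```python
# Sketch of projecting a task gradient onto the subspace spanned by the
# other tasks' gradients; the actual GradOPS update rule is not shown.
import numpy as np

def project_onto_subspace(g, others):
    """Orthogonal projection of gradient g onto span(others)."""
    B = np.stack(others, axis=1)                  # d x (T-1) basis matrix
    coeffs, *_ = np.linalg.lstsq(B, g, rcond=None)
    return B @ coeffs

g1 = np.array([1.0, 2.0, -1.0])                   # task 1 gradient
g2 = np.array([0.5, -1.0, 0.0])                   # task 2 gradient
g3 = np.array([0.0, 1.0, 1.0])                    # task 3 gradient

g1_in = project_onto_subspace(g1, [g2, g3])       # component inside span{g2, g3}
g1_perp = g1 - g1_in                              # remainder orthogonal to the other tasks
```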
Updated: 2025-03-05 12:13:08
标题: 多任务学习中基于正交投影的梯度冲突解决方案
摘要: 虽然多任务学习(MTL)已成为一种首选方法,并成功应用于许多实际场景,但由于任务之间梯度冲突的负面影响,MTL模型并不保证在所有任务上都优于单任务模型。本文全面研究了梯度冲突的影响,并进一步强调了实现非冲突梯度的重要性和优势,这使得在任务之间可以采用简单但有效的权衡策略,以保持稳定的性能。基于我们的发现,我们提出了通过其他任务特定梯度构成的子空间(GradOPS)上的正交投影来解决梯度冲突。我们的方法不仅解决了所有任务之间的冲突,而且能够有效地寻找不同任务之间的不同权衡偏好的多样解决方案。我们提供了收敛性的理论分析,并在多个领域的多个基准测试中充分验证了我们算法的性能。结果表明,我们的方法可以有效地在多个数据集上找到多个最先进的解决方案,这些解决方案采用不同的任务权衡策略。
更新时间: 2025-03-05 12:13:08
领域: cs.LG
Time-bin Phase and Polarization based QKD systems performance analysis over 16Km Aerial Fibers
We analyze the performance of Time-bin Phase and Polarization based QKD systems over a mixed link of 14 km underground and 16 km aerial fiber, using plug-and-play commercial QKD systems.
Updated: 2025-03-05 12:12:04
标题: 基于时间槽相位和极化的量子密钥分发系统在16公里空中光纤中的性能分析
摘要: 我们分析了基于时间槽相位和极化的混合14公里地下和16公里空中光纤上使用即插即用商用QKD系统的性能。
更新时间: 2025-03-05 12:12:04
领域: cs.CR
RASD: Retrieval-Augmented Speculative Decoding
Speculative decoding accelerates inference in large language models (LLMs) by generating draft tokens for target model verification. Current approaches for obtaining draft tokens rely on lightweight draft models or additional model structures to generate draft tokens and retrieve context from databases. Due to the draft model's small size and limited training data, model-based speculative decoding frequently becomes less effective in out-of-domain scenarios. Additionally, the time cost of the drafting phase results in a low upper limit on acceptance length during the verification step, limiting overall efficiency. This paper proposes RASD (Retrieval-Augmented Speculative Decoding), which adopts retrieval methods to enhance model-based speculative decoding. We introduce tree pruning and tree fusion to achieve this. Specifically, we develop a pruning method based on the draft model's probability distribution to construct the optimal retrieval tree. Second, we employ the longest prefix matching algorithm to merge the tree generated by the draft model with the retrieval tree, resulting in a unified tree for verification. Experimental results demonstrate that RASD achieves state-of-the-art inference acceleration across tasks such as DocQA, Summary, Code, and In-Domain QA. Moreover, RASD exhibits strong scalability, seamlessly integrating with various speculative decoding approaches, including both generation-based and retrieval-based methods.
Updated: 2025-03-05 12:10:14
标题: RASD: 检索增强的推测解码
摘要: Speculative decoding加速了大型语言模型(LLMs)中的推理,通过为目标模型生成草稿标记进行验证。目前获取草稿标记的方法依赖于轻量级的草稿模型或额外的模型结构来生成草稿标记并从数据库中检索上下文。由于草稿模型的规模较小且训练数据有限,基于模型的猜测解码在领域外情况下频繁变得不太有效。此外,草拟阶段的时间成本导致在验证步骤中接受长度的上限较低,限制了整体效率。本文提出了RASD(检索增强的猜测解码),采用检索方法来增强基于模型的猜测解码。我们引入了树修剪和树融合来实现这一点。具体而言,我们开发了一种基于草稿模型概率分布的修剪方法来构建最佳检索树。其次,我们采用最长前缀匹配算法将草稿模型生成的树与检索树合并,形成统一的用于验证的树。实验结果表明,RASD在DocQA、摘要、代码和领域内QA等任务中实现了最先进的推理加速。此外,RASD具有很强的可扩展性,可以与各种基于生成和检索的猜测解码方法无缝集成。
更新时间: 2025-03-05 12:10:14
领域: cs.CL,cs.AI
Privacy is All You Need: Revolutionizing Wearable Health Data with Advanced PETs
In a world where data is the new currency, wearable health devices offer unprecedented insights into daily life, continuously monitoring vital signs and metrics. However, this convenience raises privacy concerns, as these devices collect sensitive data that can be misused or breached. Traditional measures often fail due to real-time data processing needs and limited device power. Users also lack awareness and control over data sharing and usage. We propose a Privacy-Enhancing Technology (PET) framework for wearable devices, integrating federated learning, lightweight cryptographic methods, and selectively deployed blockchain technology. The blockchain acts as a secure ledger triggered only upon data transfer requests, granting users real-time notifications and control. By dismantling data monopolies, this approach returns data sovereignty to individuals. Through real-world applications like secure medical data sharing, privacy-preserving fitness tracking, and continuous health monitoring, our framework reduces privacy risks by up to 70 percent while preserving data utility and performance. This innovation sets a new benchmark for wearable privacy and can scale to broader IoT ecosystems, including smart homes and industry. As data continues to shape our digital landscape, our research underscores the critical need to maintain privacy and user control at the forefront of technological progress.
Updated: 2025-03-05 12:01:22
标题: 隐私即所需:通过先进的PETs改变可穿戴健康数据
摘要: 在一个数据是新货币的世界里,可穿戴健康设备为我们提供了对日常生活的前所未有的洞察,不断监测着生命体征和指标。然而,这种便利性引发了隐私担忧,因为这些设备收集了可能被滥用或泄露的敏感数据。传统措施通常因实时数据处理需求和设备电力有限而失败。用户也缺乏对数据共享和使用的意识和控制。我们提出了一个针对可穿戴设备的隐私增强技术(PET)框架,集成了联邦学习、轻量级加密方法和选择性部署的区块链技术。区块链作为一个安全账本,只在数据传输请求时触发,给用户提供实时通知和控制。通过打破数据垄断,这种方法将数据主权重新归还给个人。通过安全医疗数据共享、保护隐私的健身追踪和持续的健康监测等实际应用,我们的框架将隐私风险降低了70%,同时保留了数据的效用和性能。这一创新为可穿戴设备的隐私设定了新的标杆,并可以扩展到更广泛的物联网生态系统,包括智能家居和工业领域。随着数据继续塑造我们的数字景观,我们的研究强调了在技术进步的前沿保持隐私和用户控制的重要性。
更新时间: 2025-03-05 12:01:22
领域: cs.CR,cs.AI,cs.ET,cs.HC
Channel-Attentive Graph Neural Networks
Graph Neural Networks (GNNs) set the state of the art in representation learning for graph-structured data. They are used in many domains, from online social networks to complex molecules. Most GNNs leverage the message-passing paradigm and achieve strong performance on various tasks. However, the message-passing mechanism used in most models suffers from over-smoothing as a GNN's depth increases. Over-smoothing degrades a GNN's performance due to the increased similarity between the representations of unrelated nodes. This study proposes an adaptive channel-wise message-passing approach to alleviate over-smoothing. The proposed model, Channel-Attentive GNN, learns how to attend to neighboring nodes and their feature channels. Thus, more diverse information can be transferred between nodes during message passing. Experiments with widely used benchmark datasets show that the proposed model is more resistant to over-smoothing than baselines and achieves state-of-the-art performance for various graphs with strong heterophily. Our code is at https://github.com/ALLab-Boun/CHAT-GNN.
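The flavor of channel-wise attention can be illustrated with a toy message-passing layer in which each edge carries a per-feature-channel gate rather than a single scalar weight; this is an assumption-laden sketch, not the CHAT-GNN implementation from the repository.

```python
# Toy channel-attentive message passing (illustrative only).
import torch
import torch.nn as nn

class ChannelAttentiveLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x, edge_index):
        src, dst = edge_index                          # messages flow src -> dst
        # One gate per feature channel, computed from both edge endpoints.
        gates = self.gate(torch.cat([x[dst], x[src]], dim=-1))
        messages = gates * x[src]
        agg = torch.zeros_like(x).index_add_(0, dst, messages)  # sum per destination node
        return torch.relu(self.update(torch.cat([x, agg], dim=-1)))

x = torch.randn(4, 8)                                  # 4 nodes, 8 feature channels
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])
out = ChannelAttentiveLayer(8)(x, edge_index)
```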
Updated: 2025-03-05 12:00:38
标题: 通道注意力图神经网络
摘要: 图神经网络(GNN)在表示学习图结构数据方面处于领先地位。它们被应用于许多领域,从在线社交网络到复杂分子。大多数GNN利用消息传递范式,在各种任务上取得了强大的表现。然而,大多数模型中使用的消息传递机制在GNN的深度增加时会出现过度平滑的问题。过度平滑会降低GNN的性能,因为不相关节点的表示之间的相似性增加。本研究提出了一种自适应的通道注意力消息传递方法来缓解过度平滑问题。所提出的模型,通道注意力GNN,学习如何关注邻近节点及其特征通道。因此,在消息传递过程中可以传递更加多样化的信息。对广泛使用的基准数据集进行的实验表明,所提出的模型比基线更具抗过度平滑的能力,并在具有强异质性的各种图上实现了最先进的性能。我们的代码位于https://github.com/ALLab-Boun/CHAT-GNN。
更新时间: 2025-03-05 12:00:38
领域: cs.LG
Early-Stopped Mirror Descent for Linear Regression over Convex Bodies
Early-stopped iterative optimization methods are widely used as alternatives to explicit regularization, and direct comparisons between early-stopping and explicit regularization have been established for many optimization geometries. However, most analyses depend heavily on the specific properties of the optimization geometry or strong convexity of the empirical objective, and it remains unclear whether early-stopping could ever be less statistically efficient than explicit regularization for some particular shape constraint, especially in the overparameterized regime. To address this question, we study the setting of high-dimensional linear regression under additive Gaussian noise when the ground truth is assumed to lie in a known convex body and the task is to minimize the in-sample mean squared error. Our main result shows that for any convex body and any design matrix, up to an absolute constant factor, the worst-case risk of unconstrained early-stopped mirror descent with an appropriate potential is at most that of the least squares estimator constrained to the convex body. We achieve this by constructing algorithmic regularizers based on the Minkowski functional of the convex body.
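For reference, the two generic objects involved can be written down as follows; the specific potential used in the paper is not reproduced, so treat the display as a sketch of the setup rather than its main result.

```latex
% Minkowski functional (gauge) of the convex body K:
\|x\|_K \;=\; \inf\{\, t > 0 \;:\; x/t \in K \,\}.
% Mirror descent on the empirical loss \widehat{L} with potential \psi,
% Bregman divergence D_\psi, step size \eta, stopped early at step T:
x_{t+1} \;=\; \arg\min_{x}\; \eta\,\big\langle \nabla \widehat{L}(x_t),\, x\big\rangle
             + D_\psi(x, x_t),
\qquad
D_\psi(x, y) \;=\; \psi(x) - \psi(y) - \big\langle \nabla \psi(y),\, x - y\big\rangle.
```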
Updated: 2025-03-05 11:59:31
标题: 线性回归在凸体上的早停镜像下降
摘要: 早停止迭代优化方法被广泛用作显式正则化的替代方法,并且早停止与显式正则化之间的直接比较已经在许多优化几何结构中建立起来。然而,大多数分析严重依赖于特定的优化几何结构属性或经验目标的强凸性,因此仍然不清楚在某些特定形状约束下(尤其是在过参数化区域中),早停止是否会比显式正则化的统计效率更低。为了解决这个问题,我们研究了加性高斯噪声下的高维线性回归设置,假设真实参数位于已知凸体中,任务是最小化样本内均方误差。我们的主要结果表明,对于任何凸体和任何设计矩阵,在一个绝对常数因子以内,采用适当势函数的无约束早停止镜像下降的最坏情况风险至多等于约束在该凸体内的最小二乘估计器的风险。我们通过构建基于该凸体的闵可夫斯基泛函的算法正则化器来实现这一点。
更新时间: 2025-03-05 11:59:31
领域: cs.LG,math.ST,stat.ML,stat.TH
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework which enables Large Language Models to autonomously improve their problem-solving capabilities through self-guided learning by recursively generating and solving progressively simpler variants of complex problems. Unlike prior approaches that require curated datasets or human feedback, LADDER leverages a model's own capabilities to generate easier question variants. We demonstrate LADDER's effectiveness in the subject of mathematical integration, improving Llama 3.2 3B's accuracy from 1% to 82% on undergraduate-level problems and enabling Qwen2.5 7B Deepseek-R1 Distilled to achieve 73% on the MIT Integration Bee qualifying examination. We also introduce TTRL (Test-Time Reinforcement Learning), where we perform reinforcement learning on variants of test problems at inference time. TTRL enables Qwen2.5 7B Deepseek-R1 Distilled to achieve a state-of-the-art score of 90% on the MIT Integration Bee qualifying examination, surpassing OpenAI o1's performance. These results show how self-directed strategic learning can achieve significant capability improvements without relying on architectural scaling or human supervision.
Updated: 2025-03-05 11:50:24
标题: 梯子:通过递归问题分解实现自我改进的LLMs
摘要: 我们介绍了LADDER(通过自主困难驱动示例递归学习)框架,该框架使大型语言模型能够通过自主学习来逐步生成和解决复杂问题的简化变体,从而不断提高其解决问题的能力。与以往需要策划数据集或人类反馈的方法不同,LADDER利用模型自身的能力生成更简单的问题变体。我们在数学积分学科展示了LADDER的有效性,将Llama 3.2 3B的准确率从1%提高到82%,并使Qwen2.5 7B Deepseek-R1 Distilled在麻省理工学院积分竞赛资格考试上达到73%。我们还介绍了TTRL(测试时间强化学习),在推理时对测试问题的变体进行强化学习。TTRL使Qwen2.5 7B Deepseek-R1 Distilled在麻省理工学院积分竞赛资格考试上达到90%的最先进分数,超过了OpenAI o1的表现。这些结果表明,自主指导的战略学习可以在不依赖架构扩展或人类监督的情况下实现显著的能力提升。
更新时间: 2025-03-05 11:50:24
领域: cs.LG,cs.AI
Synthetic Data Augmentation for Enhancing Harmful Algal Bloom Detection with Machine Learning
Harmful Algal Blooms (HABs) pose severe threats to aquatic ecosystems and public health, resulting in substantial economic losses globally. Early detection is crucial but often hindered by the scarcity of high-quality datasets necessary for training reliable machine learning (ML) models. This study investigates the use of synthetic data augmentation using Gaussian Copulas to enhance ML-based HAB detection systems. Synthetic datasets of varying sizes (100-1,000 samples) were generated using relevant environmental features (water temperature, salinity, and UVB radiation), with corrected Chlorophyll-a concentration as the target variable. Experimental results demonstrate that moderate synthetic augmentation significantly improves model performance (RMSE reduced from 0.4706 to 0.1850; $p < 0.001$). However, excessive synthetic data introduces noise and reduces predictive accuracy, emphasizing the need for a balanced approach to data augmentation. These findings highlight the potential of synthetic data to enhance HAB monitoring systems, offering a scalable and cost-effective method for early detection and mitigation of ecological and public health risks.
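The mechanics of Gaussian-copula sampling can be sketched in a few lines: map each feature to normal scores through its empirical ranks, fit the correlation of those scores, sample correlated normals, and map back through empirical quantiles. The study's exact fitting choices and preprocessing are not reproduced; this is an illustrative toy.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_sample(data, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # 1) Empirical CDF values per feature -> normal scores.
    ranks = (np.argsort(np.argsort(data, axis=0), axis=0) + 1) / (n + 1)
    z = norm.ppf(ranks)
    # 2) Fit the copula correlation and draw correlated normal samples.
    corr = np.corrcoef(z, rowvar=False)
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    # 3) Map back to the data scale via empirical quantiles.
    u_new = norm.cdf(z_new)
    return np.column_stack(
        [np.quantile(data[:, j], u_new[:, j]) for j in range(d)]
    )

# Toy stand-in for (temperature, salinity, UVB) measurements.
real = np.random.default_rng(1).normal(size=(100, 3))
synthetic = gaussian_copula_sample(real, n_samples=500)
```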
Updated: 2025-03-05 11:50:04
标题: 使用机器学习增强有害藻类水华检测的合成数据增强
摘要: 有害藻类水华(HABs)对水生态系统和公共健康造成严重威胁,全球范围内造成了重大经济损失。早期检测至关重要,但通常受到高质量数据稀缺的影响,这些数据是训练可靠的机器学习(ML)模型所必需的。本研究探讨了使用高斯Copulas进行合成数据增强以增强基于ML的HAB检测系统的可行性。使用相关环境特征(水温、盐度和紫外辐射)生成不同规模(100-1,000个样本)的合成数据集,以修正的叶绿素-a浓度作为目标变量。实验结果表明,适度的合成增强显著提高了模型性能(RMSE从0.4706降至0.1850;$p < 0.001$)。然而,过度的合成数据会引入噪音并降低预测准确性,强调了对数据增强采取平衡方法的必要性。这些发现突出了合成数据提升HAB监测系统的潜力,为早期检测和减轻生态和公共卫生风险提供了可扩展且具有成本效益的方法。
更新时间: 2025-03-05 11:50:04
领域: cs.LG,cs.AI,cs.CY
ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots
The rise of LLMs has deflected a growing portion of human-computer interactions towards LLM-based chatbots. The remarkable abilities of these models allow users to interact using long, diverse natural language text covering a wide range of topics and styles. Phrasing these messages is a time- and effort-consuming task, calling for an autocomplete solution to assist users. We introduce the task of chatbot interaction autocomplete. We present ChaI-TeA (CHat InTEraction Autocomplete), an autocomplete evaluation framework for LLM-based chatbot interactions. The framework includes a formal definition of the task, coupled with suitable datasets and metrics. Using this framework, we test 9 models on the defined autocompletion task, finding that while current off-the-shelf models perform fairly well, there is still much room for improvement, mainly in the ranking of generated suggestions. We provide insights for practitioners working on this task and open new research directions for researchers in the field. We release our framework to serve as a foundation for future research.
Updated: 2025-03-05 11:49:36
标题: ChaI-TeA:一个基于LLM聊天机器人的互动自动完成评估基准
摘要: LLMs的崛起使越来越多的人机交互转向基于LLM的聊天机器人。这些模型的卓越能力允许用户使用长篇、多样化的自然语言文本进行互动,涵盖了广泛的主题和风格。构思这些消息是一个耗时耗力的任务,需要一个自动完成解决方案来辅助用户。我们引入了聊天机器人交互自动完成任务。我们提出了ChaI-TeA(CHat InTEraction Autocomplete),一种基于LLM的聊天机器人交互自动完成评估框架。该框架包括任务的正式定义,配合合适的数据集和指标。我们使用该框架在定义的自动完成任务上测试了9个模型,发现虽然目前现成的模型表现相当不错,但在生成建议的排名方面仍有很大的改进空间。我们为从事这项任务的实践者提供见解,并为该领域的研究人员开辟新的研究方向。我们发布我们的框架,作为未来研究的基础。
更新时间: 2025-03-05 11:49:36
领域: cs.CL,cs.AI,cs.LG
Simplicial SMOTE: Oversampling Solution to the Imbalanced Learning Problem
SMOTE (Synthetic Minority Oversampling Technique) is the established geometric approach to random oversampling to balance classes in the imbalanced learning problem, followed by many extensions. Its idea is to introduce synthetic data points of the minor class, with each new point being the convex combination of an existing data point and one of its k-nearest neighbors. In this paper, by viewing SMOTE as sampling from the edges of a geometric neighborhood graph and borrowing tools from the topological data analysis, we propose a novel technique, Simplicial SMOTE, that samples from the simplices of a geometric neighborhood simplicial complex. A new synthetic point is defined by the barycentric coordinates w.r.t. a simplex spanned by an arbitrary number of data points being sufficiently close rather than a pair. Such a replacement of the geometric data model results in better coverage of the underlying data distribution compared to existing geometric sampling methods and allows the generation of synthetic points of the minority class closer to the majority class on the decision boundary. We experimentally demonstrate that our Simplicial SMOTE outperforms several popular geometric sampling methods, including the original SMOTE. Moreover, we show that simplicial sampling can be easily integrated into existing SMOTE extensions. We generalize and evaluate simplicial extensions of the classic Borderline SMOTE, Safe-level SMOTE, and ADASYN algorithms, all of which outperform their graph-based counterparts.
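The sampling step at the heart of the method can be sketched as follows: rather than interpolating between a point and one neighbor, a synthetic point is drawn with Dirichlet (barycentric) weights over a simplex spanned by several nearby minority points. The construction of the neighborhood simplicial complex itself is simplified away here.

```python
# Illustrative simplicial sampling step (simplex selection simplified).
import numpy as np

def simplicial_smote_point(X_minority, idx, k=5, simplex_size=3, seed=0):
    rng = np.random.default_rng(seed)
    x = X_minority[idx]
    # k nearest minority neighbors of x (excluding x itself).
    dists = np.linalg.norm(X_minority - x, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]
    # Vertices: x plus (simplex_size - 1) of its neighbors.
    chosen = rng.choice(neighbors, size=simplex_size - 1, replace=False)
    vertices = np.vstack([x, X_minority[chosen]])
    # Barycentric coordinates drawn from a flat Dirichlet distribution.
    weights = rng.dirichlet(np.ones(simplex_size))
    return weights @ vertices

X_min = np.random.default_rng(2).normal(size=(30, 2))
new_point = simplicial_smote_point(X_min, idx=0)
```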
Updated: 2025-03-05 11:47:41
标题: 单纯形SMOTE:不平衡学习问题的过采样解决方案
摘要: SMOTE(Synthetic Minority Oversampling Technique)是一种已建立的几何方法,用于在不平衡学习问题中平衡类别,随后有许多扩展。其思想是引入少数类的合成数据点,每个新点都是现有数据点和其k个最近邻之一的凸组合。在本文中,通过将SMOTE视为从几何邻域图的边缘进行抽样,并借鉴拓扑数据分析工具,我们提出了一种新颖的技术,称为Simplicial SMOTE,它从几何邻域单纯复合体的单纯形中进行抽样。一个新的合成点是由相对于由足够接近的任意数量的数据点展开的一个单纯形的重心坐标来定义的,而不是一对。这种对几何数据模型的替换相对于现有的几何采样方法具有更好的数据分布覆盖,并允许在决策边界上生成更接近多数类的少数类的合成点。我们通过实验证明,我们的Simplicial SMOTE优于几种流行的几何采样方法,包括原始的SMOTE。此外,我们展示单纯采样可以轻松集成到现有的SMOTE扩展中。我们推广和评估了经典的边界SMOTE、安全级别SMOTE和ADASYN算法的单纯扩展,它们都优于它们的基于图的对应物。
更新时间: 2025-03-05 11:47:41
领域: cs.LG,cs.AI
When Claims Evolve: Evaluating and Enhancing the Robustness of Embedding Models Against Misinformation Edits
Online misinformation remains a critical challenge, and fact-checkers increasingly rely on embedding-based methods to retrieve relevant fact-checks. Yet, when debunked claims reappear in edited forms, the performance of these methods is unclear. In this work, we introduce a taxonomy of six common real-world misinformation edits and propose a perturbation framework that generates valid, natural claim variations. Our multi-stage retrieval evaluation reveals that standard embedding models struggle with user-introduced edits, while LLM-distilled embeddings offer improved robustness at a higher computational cost. Although a strong reranker helps mitigate some issues, it cannot fully compensate for first-stage retrieval gaps. Addressing these retrieval gaps, our train- and inference-time mitigation approaches enhance in-domain robustness by up to 17 percentage points and boost out-of-domain generalization by 10 percentage points over baseline models. Overall, our findings provide practical improvements to claim-matching systems, enabling more reliable fact-checking of evolving misinformation.
Updated: 2025-03-05 11:47:32
标题: 当主张演变时:评估和增强嵌入模型对抗虚假信息编辑的鲁棒性
摘要: 在线错误信息仍然是一个关键挑战,事实核查人员越来越依赖基于嵌入的方法来检索相关的事实核查。然而,当经过编辑的虚假声明以新形式重新出现时,这些方法的表现尚不清楚。在这项工作中,我们引入了一个包含六种常见实际世界错误信息编辑的分类法,并提出了一个扰动框架,生成有效的、自然的声明变体。我们的多阶段检索评估显示,标准嵌入模型在用户引入的编辑方面表现不佳,而经过LLM提炼的嵌入提供了更好的鲁棒性,但计算成本更高。虽然强大的重新排名器有助于缓解一些问题,但无法完全弥补第一阶段检索的差距。解决这些检索差距,我们的训练和推断时的缓解方法可以将域内鲁棒性提高高达17个百分点,并将域外泛化性能提高10个百分点以上。总体而言,我们的研究结果为声明匹配系统提供了实际的改进,使得对逐渐演变的错误信息进行更可靠的事实核查成为可能。
更新时间: 2025-03-05 11:47:32
领域: cs.CL,cs.AI
Augmentation-Based Deep Learning for Identification of Circulating Tumor Cells
Circulating tumor cells (CTCs) are crucial biomarkers in liquid biopsy, offering a noninvasive tool for cancer patient management. However, their identification remains particularly challenging due to their limited number and heterogeneity. Labeling samples for contrast limits the generalization of fluorescence-based methods across different hospital datasets. Analyzing single-cell images enables detailed assessment of cell morphology, subcellular structures, and phenotypic variations, often hidden in clustered images. Developing a method based on bright-field single-cell analysis could overcome these limitations. CTCs can be isolated using an unbiased workflow combining Parsortix technology, which selects cells based on size and deformability, with DEPArray technology, enabling precise visualization and selection of single cells. Traditionally, DEPArray-acquired digital images are manually analyzed, making the process time-consuming and prone to variability. In this study, we present a Deep Learning-based classification pipeline designed to distinguish CTCs from leukocytes in blood samples, aimed to enhance diagnostic accuracy and optimize clinical workflows. Our approach employs images from the bright-field channel acquired through DEPArray technology leveraging a ResNet-based CNN. To improve model generalization, we applied three types of data augmentation techniques and incorporated fluorescence (DAPI) channel images into the training phase, allowing the network to learn additional CTC-specific features. Notably, only bright-field images have been used for testing, ensuring the model's ability to identify CTCs without relying on fluorescence markers. The proposed model achieved an F1-score of 0.798, demonstrating its capability to distinguish CTCs from leukocytes. These findings highlight the potential of DL in refining CTC analysis and advancing liquid biopsy applications.
Updated: 2025-03-05 11:39:15
标题: 基于增强学习的深度学习用于循环肿瘤细胞的识别
摘要: 循环肿瘤细胞(CTCs)是液体活检中至关重要的生物标志物,为癌症患者管理提供了一种无创检测工具。然而,由于其数量有限和异质性,它们的鉴定仍然具有特殊挑战性。为了提高对比度,对样品进行标记限制了基于荧光的方法在不同医院数据集中的泛化能力。分析单细胞图像能够详细评估细胞形态、亚细胞结构和表型变异,通常这些在聚类图像中是隐藏的。基于明场单细胞分析的方法的发展可以克服这些限制。CTCs可以使用结合Parsortix技术(基于大小和可变形性选择细胞)和DEPArray技术(实现单细胞的精确可视化和选择)的无偏流程进行分离。传统上,DEPArray获取的数字图像是手动分析的,这使得该过程耗时且易受变异性影响。在本研究中,我们提出了一个基于深度学习的分类流程,旨在区分血液样本中的CTCs和白细胞,以提高诊断准确性并优化临床工作流程。我们的方法利用DEPArray技术获取的明场通道图像,利用基于ResNet的CNN。为了改善模型的泛化能力,我们应用了三种数据增强技术,并将荧光(DAPI)通道图像纳入训练阶段,使网络能够学习额外的CTC特定特征。值得注意的是,仅使用明场图像进行测试,确保了模型无需依赖荧光标记即可识别CTCs的能力。所提出的模型实现了0.798的F1分数,展示了其区分CTCs和白细胞的能力。这些发现突显了深度学习在完善CTC分析和推进液体活检应用方面的潜力。
更新时间: 2025-03-05 11:39:15
领域: eess.IV,cs.AI,cs.CV,68T07, 68T10,I.2; I.4; J.3
AIArena: A Blockchain-Based Decentralized AI Training Platform
The rapid advancement of AI has underscored critical challenges in its development and implementation, largely due to centralized control by a few major corporations. This concentration of power intensifies biases within AI models, resulting from inadequate governance and oversight mechanisms. Additionally, it limits public involvement and heightens concerns about the integrity of model generation. Such monopolistic control over data and AI outputs threatens both innovation and fair data usage, as users inadvertently contribute data that primarily benefits these corporations. In this work, we propose AIArena, a blockchain-based decentralized AI training platform designed to democratize AI development and alignment through on-chain incentive mechanisms. AIArena fosters an open and collaborative environment where participants can contribute models and computing resources. Its on-chain consensus mechanism ensures fair rewards for participants based on their contributions. We instantiate and implement AIArena on the public Base blockchain Sepolia testnet, and the evaluation results demonstrate the feasibility of AIArena in real-world applications.
Updated: 2025-03-05 11:38:00
标题: AIArena:一个基于区块链的去中心化人工智能训练平台
摘要: 人工智能的快速发展突显了在其开发和实施过程中存在的关键挑战,这主要是由于少数几家主要公司的集中控制。这种权力集中加剧了人工智能模型内部的偏见,这是由于监管和监督机制不足造成的。此外,这种集中控制限制了公众的参与,并加剧了对模型生成的诚信性的担忧。对数据和人工智能输出的垄断控制威胁到创新和公平数据使用,因为用户无意中为这些公司主要受益。在这项工作中,我们提出了AIArena,这是一个基于区块链的去中心化人工智能训练平台,旨在通过链上激励机制使人工智能的开发和对齐民主化。AIArena促进了一个开放和协作的环境,在这里参与者可以贡献模型和计算资源。其链上共识机制确保参与者根据其贡献获得公平的奖励。我们在公共Base区块链Sepolia测试网上实例化和实施了AIArena,并评估结果展示了AIArena在实际应用中的可行性。
更新时间: 2025-03-05 11:38:00
领域: cs.CR,cs.AI,cs.DC,cs.LG
Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion
A continual learning agent builds on previous experiences to develop increasingly complex behaviors by adapting to non-stationary and dynamic environments while preserving previously acquired knowledge. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with the adaptation of new ones to current environments. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem, where environmental and agent behaviors are constantly changing, and the search space is vast. In this work, we propose addressing these challenges in the train scheduling problem using curriculum learning. We design a curriculum with adjacent skills that build on each other to improve generalization performance. Introducing a curriculum with distinct tasks introduces non-stationarity, which we address by proposing a new algorithm: Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically generates and adjusts Q-function subspaces to handle environmental changes and task requirements. CDE mitigates catastrophic forgetting through EWC while ensuring high plasticity using adaptive rational activation functions. Experimental results demonstrate significant improvements in learning efficiency and adaptability compared to RL baselines and other adapted methods for continual learning, highlighting the potential of our method in managing the stability-plasticity dilemma in the adaptive train scheduling setting.
Updated: 2025-03-05 11:27:17
标题: 使用课程驱动的持续DQN扩展来缓解自适应列车调度中的稳定性-可塑性困境
摘要: 一个持续学习的代理人依靠先前的经验来发展日益复杂的行为,通过适应非静态和动态的环境,同时保留先前获得的知识。然而,扩展这些系统面临着重大挑战,特别是在平衡保留先前策略与适应新策略以适应当前环境方面。这种平衡,被称为稳定性-可塑性困境,在复杂的多代理领域中特别明显,如列车调度问题,其中环境和代理行为不断变化,搜索空间巨大。在这项工作中,我们提出使用课程学习来解决列车调度问题中的这些挑战。我们设计了一个课程,其中包含相邻技能,这些技能相互增强,以提高泛化性能。引入具有不同任务的课程引入了非静态性,我们通过提出一种新算法来解决这个问题:连续深度 Q 网络(DQN)扩展(CDE)。我们的方法动态生成和调整 Q 函数子空间,以处理环境变化和任务要求。CDE 通过 EWC 缓解灾难性遗忘,同时利用自适应的合理激活函数确保高可塑性。实验结果表明,与 RL 基线和其他适应方法相比,我们的方法在学习效率和适应性方面取得了显著改进,突出了我们的方法在适应列车调度设置中管理稳定性-可塑性困境的潜力。
更新时间: 2025-03-05 11:27:17
领域: cs.LG,cs.NE
Evolutionary Prediction Games
When users decide whether to use a system based on the quality of predictions they receive, learning has the capacity to shape the population of users it serves - for better or worse. This work aims to study the long-term implications of this process through the lens of evolutionary game theory. We introduce and study evolutionary prediction games, designed to capture the role of learning as a driver of natural selection between groups of users, and hence a determinant of evolutionary outcomes. Our main theoretical results show that: (i) in settings with unlimited data and compute, learning tends to reinforce the survival of the fittest, and (ii) in more realistic settings, opportunities for coexistence emerge. We analyze these opportunities in terms of their stability and feasibility, present several mechanisms that can sustain their existence, and empirically demonstrate our findings using real and synthetic data.
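One way to picture the dynamics being studied is a replicator-style update in which the share of each user group grows with the prediction quality it receives; the coupling between shares and accuracy below is a toy assumption, not the paper's formal model.

```python
# Toy replicator dynamics with fitness tied to prediction quality.
import numpy as np

def replicator_step(shares, fitness, dt=0.1):
    """Groups with above-average fitness grow; others shrink."""
    avg_fitness = shares @ fitness
    return shares + dt * shares * (fitness - avg_fitness)

shares = np.array([0.6, 0.4])                 # two user groups
for _ in range(100):
    # Assumed coupling: the learner serves the larger group better.
    accuracies = 0.6 + 0.3 * shares
    shares = replicator_step(shares, accuracies)
    shares = shares / shares.sum()
# With this coupling the larger group keeps growing ("survival of the
# fittest"); richer couplings can instead admit coexistence.
```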
Updated: 2025-03-05 11:24:55
标题: 演化预测游戏
摘要: 当用户根据他们收到的预测质量决定是否使用系统时,学习能够塑造它所服务的用户群体-无论是好是坏。本文旨在通过进化博弈论的视角研究这一过程的长期影响。我们引入并研究了进化预测游戏,旨在捕捉学习作为用户群体之间自然选择的驱动因素,从而决定进化结果。我们的主要理论结果表明:(i)在具有无限数据和计算资源的环境中,学习倾向于强化适者生存,(ii)在更现实的环境中,存在共存的机会。我们分析这些机会的稳定性和可行性,提出几种可以维持其存在的机制,并利用真实和合成数据实证证明我们的发现。
更新时间: 2025-03-05 11:24:55
领域: cs.LG,cs.CY,cs.GT
Predicting Practically? Domain Generalization for Predictive Analytics in Real-world Environments
Predictive machine learning models are widely used in customer relationship management (CRM) to forecast customer behaviors and support decision-making. However, the dynamic nature of customer behaviors often results in significant distribution shifts between training data and serving data, leading to performance degradation in predictive models. Domain generalization, which aims to train models that can generalize to unseen environments without prior knowledge of their distributions, has become a critical area of research. In this work, we propose a novel domain generalization method tailored to handle complex distribution shifts, encompassing both covariate and concept shifts. Our method builds upon the Distributionally Robust Optimization framework, optimizing model performance over a set of hypothetical worst-case distributions rather than relying solely on the training data. Through simulation experiments, we demonstrate the working mechanism of the proposed method. We also conduct experiments on a real-world customer churn dataset, and validate its effectiveness in both temporal and spatial generalization settings. Finally, we discuss the broader implications of our method for advancing Information Systems (IS) design research, particularly in building robust predictive models for dynamic managerial environments.
Updated: 2025-03-05 11:21:37
标题: 实际预测?领域泛化在实际环境中的预测分析
摘要: 预测性机器学习模型被广泛应用于客户关系管理(CRM)中,用于预测客户行为并支持决策制定。然而,客户行为的动态性经常导致训练数据和服务数据之间出现显著的分布转移,从而导致预测模型性能下降。领域泛化旨在训练可以泛化到未见环境的模型,而无需先了解它们的分布,已成为一个关键的研究领域。在这项工作中,我们提出了一种针对处理复杂分布转移的新颖领域泛化方法,涵盖了协变量和概念转移。我们的方法建立在分布鲁棒优化框架之上,通过优化模型在一组假设的最坏情况分布上的性能,而不仅仅依赖于训练数据。通过模拟实验,我们展示了所提出方法的工作机制。我们还在一个真实的客户流失数据集上进行实验,并验证了其在时间和空间泛化设置中的有效性。最后,我们讨论了我们的方法对推进信息系统(IS)设计研究的更广泛影响,特别是在建立适用于动态管理环境的稳健预测模型方面。
更新时间: 2025-03-05 11:21:37
领域: cs.LG
AI-Driven Multi-Stage Computer Vision System for Defect Detection in Laser-Engraved Industrial Nameplates
Automated defect detection in industrial manufacturing is essential for maintaining product quality and minimizing production errors. In air disc brake manufacturing, ensuring the precision of laser-engraved nameplates is crucial for accurate product identification and quality control. Engraving errors, such as misprints or missing characters, can compromise both aesthetics and functionality, leading to material waste and production delays. This paper presents a proof of concept for an AI-driven computer vision system that inspects and verifies laser-engraved nameplates, detecting defects in logos and alphanumeric strings. The system integrates object detection using YOLOv7, optical character recognition (OCR) with Tesseract, and anomaly detection through a residual variational autoencoder (ResVAE) along with other computer vision methods to enable comprehensive inspections at multiple stages. Experimental results demonstrate the system's effectiveness, achieving 91.33% accuracy and 100% recall, ensuring that defective nameplates are consistently detected and addressed. This solution highlights the potential of AI-driven visual inspection to enhance quality control, reduce manual inspection efforts, and improve overall manufacturing efficiency.
Updated: 2025-03-05 11:19:17
标题: 基于人工智能的多阶段计算机视觉系统用于激光雕刻工业铭牌缺陷检测
摘要: 在工业制造中,自动化缺陷检测对于保持产品质量和最小化生产错误至关重要。在气动盘式制动器制造中,确保激光雕刻的铭牌精度对于准确的产品识别和质量控制至关重要。雕刻错误,如错字或缺失字符,可能会影响美观性和功能性,导致材料浪费和生产延误。本文提出了一个基于人工智能驱动的计算机视觉系统的概念验证,该系统检查和验证激光雕刻的铭牌,检测标志和字母数字串中的缺陷。该系统集成了使用YOLOv7的目标检测、使用Tesseract的光学字符识别(OCR)以及通过残差变分自动编码器(ResVAE)进行异常检测等其他计算机视觉方法,以实现多个阶段的全面检查。实验结果表明该系统的有效性,实现了91.33%的准确率和100%的召回率,确保有缺陷的铭牌被持续检测和处理。这个解决方案凸显了人工智能驱动的视觉检查对于增强质量控制、减少手动检查工作量和提高整体制造效率的潜力。
更新时间: 2025-03-05 11:19:17
领域: cs.CV,cs.AI
UniFlow: A Foundation Model for Unified Urban Spatio-Temporal Flow Prediction
Urban spatio-temporal flow prediction, encompassing traffic flows and crowd flows, is crucial for optimizing city infrastructure and managing traffic and emergency responses. Traditional approaches have relied on separate models tailored to either grid-based data, representing cities as uniform cells, or graph-based data, modeling cities as networks of nodes and edges. In this paper, we build UniFlow, a foundational model for general urban flow prediction that unifies both grid-based and graph-based data. We first design a multi-view spatio-temporal patching mechanism to standardize different data into a consistent sequential format and then introduce a spatio-temporal transformer architecture to capture complex correlations and dynamics. To leverage shared spatio-temporal patterns across different data types and facilitate effective cross-learning, we propose SpatioTemporal Memory Retrieval Augmentation (ST-MRA). By creating structured memory modules to store shared spatio-temporal patterns, ST-MRA enhances predictions through adaptive memory retrieval. Extensive experiments demonstrate that UniFlow outperforms existing models in both grid-based and graph-based flow prediction, excelling particularly in scenarios with limited data availability, showcasing its superior performance and broad applicability. The datasets and code implementation have been released on https://github.com/YuanYuan98/UniFlow.
Updated: 2025-03-05 11:18:41
标题: UniFlow:统一城市时空流量预测的基础模型
摘要: 城市时空流预测对于优化城市基础设施、管理交通和应急响应至关重要,涵盖交通流和人群流。传统方法依赖于针对基于网格的数据或基于图的数据定制的独立模型,前者将城市表示为统一单元格,后者将城市建模为节点和边的网络。在本文中,我们建立了UniFlow,这是一个通用城市流预测模型,统一了基于网格和基于图的数据。我们首先设计了一个多视图时空修补机制,将不同数据标准化为一致的序列格式,然后引入了一个时空转换器架构,以捕捉复杂的相关性和动态性。为了利用不同数据类型之间共享的时空模式并促进有效的跨学习,我们提出了时空记忆检索增强(ST-MRA)。通过创建结构化记忆模块来存储共享的时空模式,ST-MRA通过自适应记忆检索增强了预测性能。大量实验证明,UniFlow在基于网格和基于图的流预测中优于现有模型,在数据可用性有限的场景中表现突出,展示了其卓越性能和广泛适用性。数据集和代码实现已发布在https://github.com/YuanYuan98/UniFlow。
更新时间: 2025-03-05 11:18:41
领域: cs.LG
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Diffusion and flow matching models have achieved remarkable success in text-to-image generation. However, these models typically rely on the predetermined denoising schedules for all prompts. The multi-step reverse diffusion process can be regarded as a kind of chain-of-thought for generating high-quality images step by step. Therefore, diffusion models should reason for each instance to adaptively determine the optimal noise schedule, achieving high generation quality with sampling efficiency. In this paper, we introduce the Time Prediction Diffusion Model (TPDM) for this. TPDM employs a plug-and-play Time Prediction Module (TPM) that predicts the next noise level based on current latent features at each denoising step. We train the TPM using reinforcement learning to maximize a reward that encourages high final image quality while penalizing excessive denoising steps. With such an adaptive scheduler, TPDM not only generates high-quality images that are aligned closely with human preferences but also adjusts diffusion time and the number of denoising steps on the fly, enhancing both performance and efficiency. With Stable Diffusion 3 Medium architecture, TPDM achieves an aesthetic score of 5.44 and a human preference score (HPS) of 29.59, while using around 50% fewer denoising steps to achieve better performance.
Updated: 2025-03-05 11:17:18
标题: 即兴制定时间表:用于更快、更好图像生成的扩散时间预测
摘要: 扩散和流匹配模型在文本到图像生成方面取得了显著的成功。然而,这些模型通常依赖于预先确定的所有提示的去噪时间表。多步反向扩散过程可以被视为一种逐步生成高质量图像的思维链。因此,扩散模型应该针对每个实例进行推理,以自适应确定最佳的噪声时间表,实现高质量的生成同时提高采样效率。在本文中,我们引入了时间预测扩散模型(TPDM)。TPDM采用了一种即插即用的时间预测模块(TPM),根据每个去噪步骤的当前潜在特征预测下一个噪声级别。我们使用强化学习对TPM进行训练,以最大化奖励,鼓励高最终图像质量同时惩罚过多的去噪步骤。通过这样的自适应调度器,TPDM不仅生成了与人类偏好密切一致的高质量图像,还可以动态调整扩散时间和去噪步骤的数量,提高性能和效率。在Stable Diffusion 3 Medium架构下,TPDM实现了5.44的美学评分和29.59的人类偏好评分,同时使用了约50%更少的去噪步骤来实现更好的性能。
更新时间: 2025-03-05 11:17:18
领域: cs.CV,cs.AI
Prompt-Matcher: Leveraging Large Models to Reduce Uncertainty in Schema Matching Results
Schema matching is the process of identifying correspondences between the elements of two given schemata, essential for database management systems, data integration, and data warehousing. For datasets across different scenarios, the optimal schema matching algorithm differs. Even for a single algorithm, hyperparameter tuning yields multiple results. All results, assigned equal probabilities, are stored in probabilistic databases to facilitate uncertainty management. This substantial degree of uncertainty diminishes the efficiency and reliability of data processing, thereby precluding the provision of more accurate information for decision-makers. To address this problem, we introduce a new approach based on fine-grained correspondence verification with specific prompts for a Large Language Model. Our approach is an iterative loop that consists of three main components: (1) the correspondence selection algorithm, (2) correspondence verification, and (3) the update of the probability distribution. The core idea is that correspondences intersect across multiple results, thereby linking the verification of correspondences to the reduction of uncertainty in candidate results. The task of selecting an optimal correspondence set to maximize the anticipated uncertainty reduction within a fixed budgetary framework is established as an NP-hard problem. We propose a novel $(1-1/e)$-approximation algorithm that significantly outperforms the brute-force algorithm in terms of computational efficiency. To enhance correspondence verification, we have developed two prompt templates that enable GPT-4 to achieve state-of-the-art performance across two established benchmark datasets. Our comprehensive experimental evaluation demonstrates the superior effectiveness and robustness of the proposed approach.
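The budgeted selection step admits the classic greedy treatment for monotone submodular objectives, which is where (1-1/e)-style guarantees usually come from; the placeholder gain function below is only for illustration and does not reproduce the paper's objective or algorithm.

```python
# Greedy selection under a budget with a placeholder marginal-gain function.

def greedy_select(candidates, budget, marginal_gain):
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < budget:
        best = max(remaining, key=lambda c: marginal_gain(c, selected))
        if marginal_gain(best, selected) <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy diminishing-returns gain: each correspondence "covers" some candidate
# results, and the gain is the number of newly covered results.
coverage = {"c1": {1, 2, 3}, "c2": {3, 4}, "c3": {5}}

def gain(c, selected):
    covered = set().union(*(coverage[s] for s in selected)) if selected else set()
    return len(coverage[c] - covered)

chosen = greedy_select(coverage, budget=2, marginal_gain=gain)  # ['c1', 'c2']
```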
Updated: 2025-03-05 11:15:58
标题: 智能匹配器:利用大型模型减少模式匹配结果中的不确定性
摘要: 模式匹配是识别两个给定模式之间对应关系的过程,对于数据库管理系统、数据集成和数据仓库至关重要。对于不同场景下的数据集,最佳的模式匹配算法也各不相同。对于单一算法,超参数调整也会导致多个结果。所有结果都被赋予相等概率并存储在概率数据库中,以方便不确定性管理。大量的不确定性会降低数据处理的效率和可靠性,从而无法为决策者提供更准确的信息。为解决这一问题,我们提出一种基于细粒度对应验证和特定的大型语言模型提示的新方法。 我们的方法是一个包含三个主要组件的迭代循环:(1)对应选择算法,(2)对应验证和(3)概率分布更新。核心思想是对应交叉多个结果,从而将对应验证与候选结果中不确定性的减少联系起来。 选择最佳对应集合以在固定预算框架内最大化预期的不确定性减少任务被建立为NP困难问题。我们提出了一种新颖的 $(1-1/e)$-近似算法,从计算效率方面显著优于蛮力算法。为增强对应验证,我们开发了两个提示模板,使GPT-4能够在两个已建立的基准数据集上实现最先进的性能。我们的全面实验评估表明了所提出方法的卓越有效性和稳健性。
更新时间: 2025-03-05 11:15:58
领域: cs.DB,cs.AI
Explaining Vision-Language Similarities in Dual Encoders with Feature-Pair Attributions
Dual encoder architectures like CLIP models map two types of inputs into a shared embedding space and predict similarities between them. Despite their success, it is, however, not understood how these models compare their two inputs. Common first-order feature-attribution methods can only provide limited insights into dual encoders since their predictions depend on feature interactions rather than on individual features. In this paper, we first derive a second-order method enabling the attribution of predictions by any differentiable dual encoder onto feature interactions between its inputs. Second, we apply our method to CLIP models and show that they learn fine-grained correspondences between parts of captions and regions in images. They match objects across input modes and also account for mismatches. This visual-linguistic grounding ability, however, varies heavily between object classes and exhibits pronounced out-of-domain effects. We can identify individual errors as well as systematic failure categories, including object coverage, unusual scenes, and correlated contexts.
Updated: 2025-03-05 11:15:39
标题: 用特征对归因解释双编码器中视觉-语言相似性
摘要: 双编码器架构,如CLIP模型,将两种类型的输入映射到共享的嵌入空间,并预测它们之间的相似性。尽管它们取得了成功,但目前尚不清楚这些模型如何比较它们的两个输入。常见的一阶特征归因方法只能提供有限的洞见,因为它们的预测取决于特征相互作用而不是单个特征。在本文中,我们首先推导出一种二阶方法,使得任何可微分的双编码器能够将预测归因于其输入之间的特征相互作用。其次,我们将这种方法应用于CLIP模型,并展示它们学习了字幕部分与图像区域之间的细粒度对应关系。它们在输入模式间匹配对象同时也考虑不匹配情况。然而,这种视觉-语言基础能力在对象类别之间变化很大,并呈现出明显的域外效应。我们可以识别个别错误以及系统性失败类别,包括对象覆盖、异常场景和相关上下文。
更新时间: 2025-03-05 11:15:39
领域: cs.CV,cs.AI,cs.CL
Multi-Agent DRL for Queue-Aware Task Offloading in Hierarchical MEC-Enabled Air-Ground Networks
Mobile edge computing (MEC)-enabled air-ground networks are a key component of 6G, employing aerial base stations (ABSs) such as unmanned aerial vehicles (UAVs) and high-altitude platform stations (HAPS) to provide dynamic services to ground IoT devices (IoTDs). These IoTDs support real-time applications (e.g., multimedia and Metaverse services) that demand high computational resources and strict quality of service (QoS) guarantees in terms of latency and task queue management. Given their limited energy and processing capabilities, IoTDs rely on UAVs and HAPS to offload tasks for distributed processing, forming a multi-tier MEC system. This paper tackles the overall energy minimization problem in MEC-enabled air-ground integrated networks (MAGIN) by jointly optimizing UAV trajectories, computing resource allocation, and queue-aware task offloading decisions. The optimization is challenging due to the nonconvex, nonlinear nature of this hierarchical system, which renders traditional methods ineffective. We reformulate the problem as a multi-agent Markov decision process (MDP) with continuous action spaces and heterogeneous agents, and propose a novel variant of multi-agent proximal policy optimization with a Beta distribution (MAPPO-BD) to solve it. Extensive simulations show that MAPPO-BD outperforms baseline schemes, achieving superior energy savings and efficient resource management in MAGIN while meeting queue delay and edge computing constraints.
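One concrete ingredient named in the abstract, the Beta-distributed policy, can be sketched as a network head that outputs two concentration parameters per action dimension and samples actions on (0, 1); the surrounding multi-agent PPO machinery is not reproduced here.

```python
# Beta policy head for bounded continuous actions (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaPolicyHead(nn.Module):
    def __init__(self, hidden_dim, action_dim):
        super().__init__()
        self.alpha = nn.Linear(hidden_dim, action_dim)
        self.beta = nn.Linear(hidden_dim, action_dim)

    def forward(self, h):
        # softplus + 1 keeps both concentrations above 1 (unimodal density on (0, 1)).
        alpha = F.softplus(self.alpha(h)) + 1.0
        beta = F.softplus(self.beta(h)) + 1.0
        return torch.distributions.Beta(alpha, beta)

head = BetaPolicyHead(hidden_dim=64, action_dim=2)
dist = head(torch.randn(8, 64))
action = dist.rsample()                   # in (0, 1); rescale to the task's action bounds
log_prob = dist.log_prob(action).sum(-1)  # enters the PPO probability ratio
```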
Updated: 2025-03-05 11:12:40
标题: 多智能体深度强化学习用于层次化MEC启用的空地网络中的队列感知任务卸载
摘要: 移动边缘计算(MEC)启用的空地网络是6G的关键组成部分,利用无人机等空中基站(ABS)和高空平台站(HAPS)为地面物联网设备(IoTDs)提供动态服务。这些IoTDs支持实时应用程序(例如多媒体和Metaverse服务),需要高计算资源和严格的服务质量(QoS)保证,包括延迟和任务队列管理。鉴于其有限的能量和处理能力,IoTDs依赖于无人机和HAPS来卸载任务进行分布式处理,形成多层MEC系统。本文通过联合优化无人机轨迹、计算资源分配和队列感知任务卸载决策,解决了MEC启用的空地集成网络(MAGIN)中的整体能量最小化问题。优化是具有挑战性的,因为这种分层系统的非凸、非线性性质使传统方法无效。我们将问题重新构建为具有连续动作空间和异质代理的多代理马尔可夫决策过程(MDP),并提出了一种新颖的多代理近端策略优化的变体,其中包含Beta分布(MAPPO-BD)以解决问题。广泛的模拟表明,MAPPO-BD优于基线方案,在MAGIN中实现了卓越的能量节约和有效的资源管理,同时满足队列延迟和边缘计算约束。
更新时间: 2025-03-05 11:12:40
领域: cs.LG,cs.AI
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation
With the rapid advancement of Large Language Models (LLMs), the demand for robust instruction-following capabilities in code generation tasks has grown significantly. Code generation not only facilitates faster prototyping and automated testing, but also augments developer efficiency through improved maintainability and reusability of code. In this paper, we introduce CodeIF, the first benchmark specifically designed to assess the abilities of LLMs to adhere to task-oriented instructions within diverse code generation scenarios. CodeIF encompasses a broad range of tasks, including function synthesis, error debugging, algorithmic refactoring, and code explanation, thereby providing a comprehensive suite to evaluate model performance across varying complexity levels and programming domains. We conduct extensive experiments with LLMs, analyzing their strengths and limitations in meeting the demands of these tasks. The experimental results offer valuable insights into how well current models align with human instructions, as well as the extent to which they can generate consistent, maintainable, and contextually relevant code. Our findings not only underscore the critical role that instruction-following LLMs can play in modern software development, but also illuminate pathways for future research aimed at enhancing their adaptability, reliability, and overall effectiveness in automated code generation.
Updated: 2025-03-05 11:09:06
标题: CodeIF:为代码生成的大型语言模型的指令跟随能力进行基准测试
摘要: 随着大型语言模型(LLMs)的快速发展,对代码生成任务中强大的指令遵循能力的需求显著增长。代码生成不仅有助于更快的原型设计和自动化测试,还通过改善代码的可维护性和可重用性提高开发人员的效率。在本文中,我们介绍了CodeIF,这是专门设计用于评估LLMs遵循任务导向指令能力的第一个基准。CodeIF涵盖了广泛的任务范围,包括函数合成、错误调试、算法重构和代码解释,从而提供了一个全面的套件,用于评估模型在不同复杂性水平和编程领域下的性能。我们对LLMs进行了大量实验,分析它们在满足这些任务需求方面的优势和局限性。实验结果提供了宝贵的见解,展示了当前模型与人类指令的对齐程度,以及它们生成一致、可维护和上下文相关代码的能力。我们的研究结果不仅强调了指令遵循LLMs在现代软件开发中的关键作用,还为未来的研究提供了启示,旨在增强它们在自动化代码生成中的适应性、可靠性和整体效力。
更新时间: 2025-03-05 11:09:06
领域: cs.SE,cs.LG
Bounding Evidence and Estimating Log-Likelihood in VAE
Many crucial problems in deep learning and statistical inference are caused by a variational gap, i.e., a difference between model evidence (log-likelihood) and evidence lower bound (ELBO). In particular, in a classical VAE setting that involves training via an ELBO cost function, it is difficult to provide a robust comparison of the effects of training between models, since we do not know the log-likelihood of the data (only its lower bound). In this paper, to deal with this problem, we introduce a general and effective upper bound, which allows us to efficiently approximate the evidence of the data. We provide extensive theoretical and experimental studies of our approach, including its comparison to other state-of-the-art upper bounds, as well as its application as a tool for the evaluation of models that were trained on various lower bounds.
Updated: 2025-03-05 11:04:58
标题: 限定证据和估计VAE中的对数似然
摘要: 深度学习和统计推断中许多关键问题都由变分差引起,即模型证据(对数似然)和证据下界(ELBO)之间的差异。特别是,在经由ELBO成本函数进行训练的经典VAE设置中,由于我们不知道数据的对数似然(只知道其下界),因此很难提供对模型训练效果的稳健比较。为了解决这个问题,在本文中,我们引入了一个通用且有效的上界,可以帮助我们有效地近似数据的证据。我们对我们的方法进行了广泛的理论和实验研究,包括与其他最先进的上界的比较,以及将其应用作为评估在各种下界上训练的模型的工具。
更新时间: 2025-03-05 11:04:58
领域: cs.LG,cs.AI,stat.ML
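For orientation, the lower-bound side of the variational gap discussed above can be tightened with the standard importance-weighted estimator; the paper's contribution is an upper bound, which is not reproduced here. In this sketch, `encoder`, `decoder`, and `prior` are hypothetical callables assumed to return `torch.distributions` objects.

```python
# Importance-weighted estimate of log p(x) for a VAE (a standard lower
# bound that approaches log p(x) as k grows; interfaces are hypothetical).
import math
import torch

def iwae_bound(x, encoder, decoder, prior, k: int = 64):
    q = encoder(x)                        # q(z|x), a torch Distribution
    z = q.rsample((k,))                   # (k, batch, latent)
    log_w = decoder(z).log_prob(x) + prior.log_prob(z) - q.log_prob(z)
    # log (1/k) sum_i w_i, computed stably with logsumexp
    return torch.logsumexp(log_w, dim=0) - math.log(k)
```

Pairing an estimator like this with an upper bound brackets the true evidence from both sides, which is exactly the kind of model comparison the abstract argues is otherwise hard.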
GNNMerge: Merging of GNN Models Without Accessing Training Data
Model merging has gained prominence in machine learning as a method to integrate multiple trained models into a single model without accessing the original training data. While existing approaches have demonstrated success in domains such as computer vision and NLP, their application to Graph Neural Networks (GNNs) remains unexplored. These methods often rely on the assumption of shared initialization, which is seldom applicable to GNNs. In this work, we undertake the first benchmarking study of model merging algorithms for GNNs, revealing their limited effectiveness in this context. To address these challenges, we propose GNNMerge, which utilizes a task-agnostic node embedding alignment strategy to merge GNNs. Furthermore, we establish that under a mild relaxation, the proposed optimization objective admits direct analytical solutions for widely used GNN architectures, significantly enhancing its computational efficiency. Empirical evaluations across diverse datasets, tasks, and architectures establish GNNMerge to be up to 24% more accurate than existing methods while delivering over 2 orders of magnitude speed-up compared to training from scratch.
Updated: 2025-03-05 11:02:29
标题: GNNMerge: 不访问训练数据的GNN模型合并
摘要: 模型合并在机器学习中日益受到关注,作为一种方法来将多个训练好的模型整合成一个单一模型,而无需访问原始训练数据。尽管现有方法在计算机视觉和自然语言处理等领域取得了成功,但它们在图神经网络(GNNs)中的应用尚未被探索。这些方法通常依赖于共享初始化的假设,而这种假设很少适用于GNNs。在这项工作中,我们进行了第一次针对GNNs的模型合并算法的基准研究,揭示了它们在这一背景下的有限有效性。为了解决这些挑战,我们提出了GNNMerge,该方法利用一种任务无关的节点嵌入对齐策略来合并GNNs。此外,我们建立了在一个轻微放宽的条件下,所提出的优化目标允许直接解析解用于广泛使用的GNN架构,显著提高了其计算效率。通过对各种数据集、任务和架构的实证评估,我们确定GNNMerge比现有方法更准确高达24%,同时相比于从头开始训练,提供了超过2个数量级的加速。
更新时间: 2025-03-05 11:02:29
领域: cs.LG
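As a toy illustration of the node-embedding alignment idea described above, one can fit a linear map that carries one model's node embeddings onto another's before merging. This least-squares variant is our simplification; GNNMerge's actual objective and its closed-form solutions are in the paper and repository.

```python
# Toy task-agnostic alignment: find W minimizing ||Z_b W - Z_a||_F^2, then
# map model B's embeddings into model A's space (illustrative, not GNNMerge).
import numpy as np

rng = np.random.default_rng(0)
Z_a = rng.normal(size=(1000, 64))    # node embeddings from GNN A
Z_b = rng.normal(size=(1000, 64))    # same nodes, embeddings from GNN B

W, *_ = np.linalg.lstsq(Z_b, Z_a, rcond=None)   # analytic least-squares fit
aligned = Z_b @ W
print(np.linalg.norm(aligned - Z_a) / np.linalg.norm(Z_a))
```

The appeal of an analytic alignment step, as the abstract notes, is that it avoids gradient-based re-training entirely, which is where the reported speed-up over training from scratch comes from.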
Paths and Ambient Spaces in Neural Loss Landscapes
Understanding the structure of neural network loss surfaces, particularly the emergence of low-loss tunnels, is critical for advancing neural network theory and practice. In this paper, we propose a novel approach to directly embed loss tunnels into the loss landscape of neural networks. Exploring the properties of these loss tunnels offers new insights into their length and structure and sheds light on some common misconceptions. We then apply our approach to Bayesian neural networks, where we improve subspace inference by identifying pitfalls and proposing a more natural prior that better guides the sampling procedure.
Updated: 2025-03-05 10:57:34
标题: 神经损失景观中的路径和环境空间
摘要: 理解神经网络损失曲面的结构,特别是低损失通道的出现对推进神经网络理论和实践至关重要。在本文中,我们提出了一种直接将损失通道嵌入神经网络损失曲面的新方法。探索这些损失通道的特性为我们提供了对其长度和结构的新见解,并揭示了一些常见的误解。然后,我们将我们的方法应用于贝叶斯神经网络,通过识别陷阱并提出更自然的先验,从而改善了子空间推断,更好地引导了采样过程。
更新时间: 2025-03-05 10:57:34
领域: cs.LG,stat.ML
DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration
Recent progress in Large Language Models (LLMs) has drawn attention to their potential for accelerating drug discovery. However, a central problem remains: translating theoretical ideas into robust implementations in the highly specialized context of pharmaceutical research. This limitation prevents practitioners from making full use of the latest AI developments in drug discovery. To address this challenge, we introduce DrugAgent, a multi-agent framework that automates machine learning (ML) programming for drug discovery tasks. DrugAgent employs an LLM Planner that formulates high-level ideas and an LLM Instructor that identifies and integrates domain knowledge when implementing those ideas. We present case studies on three representative drug discovery tasks. Our results show that DrugAgent consistently outperforms leading baselines, including a relative improvement of 4.92% in ROC-AUC compared to ReAct for drug-target interaction (DTI). DrugAgent is publicly available at https://anonymous.4open.science/r/drugagent-5C42/.
Updated: 2025-03-05 10:54:30
标题: DrugAgent:通过LLM多智能体协作自动化AI辅助药物发现编程
摘要: 最近大型语言模型(LLMs)的进展引起了人们对它们加速药物发现的潜力的关注。然而,一个中心问题仍然存在:将理论观念转化为在高度专业化的制药研究环境中的稳健实现。这种限制阻碍了从业者充分利用药物发现中最新人工智能发展。为了解决这一挑战,我们引入了DrugAgent,一个自动化机器学习(ML)编程用于药物发现任务的多智能体框架。DrugAgent采用LLM计划者制定高层思路和LLM指导员在实现这些思路时识别和整合领域知识。我们对三个代表性药物发现任务进行了案例研究。我们的结果表明,DrugAgent始终优于主流基线,包括与ReAct相比在药物靶标相互作用(DTI)中ROC-AUC相对改进4.92%。DrugAgent可在https://anonymous.4open.science/r/drugagent-5C42/上公开获取。
更新时间: 2025-03-05 10:54:30
领域: cs.LG
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
Hierarchical organization is fundamental to biological systems and human societies, yet artificial intelligence systems often rely on monolithic architectures that limit adaptability and scalability. Current hierarchical reinforcement learning (HRL) approaches typically restrict hierarchies to two levels or require centralized training, which limits their practical applicability. We introduce TAME Agent Framework (TAG), a framework for constructing fully decentralized hierarchical multi-agent systems. TAG enables hierarchies of arbitrary depth through a novel LevelEnv concept, which abstracts each hierarchy level as the environment for the agents above it. This approach standardizes information flow between levels while preserving loose coupling, allowing for seamless integration of diverse agent types. We demonstrate the effectiveness of TAG by implementing hierarchical architectures that combine different RL agents across multiple levels, achieving improved performance over classical multi-agent RL baselines on standard benchmarks. Our results show that decentralized hierarchical organization enhances both learning speed and final performance, positioning TAG as a promising direction for scalable multi-agent systems.
Updated: 2025-03-05 10:48:42
标题: TAG:一个多智能体分层强化学习的去中心化框架
摘要: 层次结构对生物系统和人类社会至关重要,然而人工智能系统通常依赖于限制适应性和可扩展性的单一体系结构。当前的层次强化学习(HRL)方法通常将层次结构限制在两个级别,或需要集中式训练,从而限制了它们的实际适用性。我们引入了TAME Agent Framework(TAG),这是一个用于构建完全去中心化的层次多代理系统的框架。TAG通过一种新颖的LevelEnv概念实现了任意深度的层次结构,该概念将每个层次级别抽象为其上方代理的环境。这种方法标准化了不同层次之间的信息流,同时保持了松耦合,允许无缝集成不同类型的代理。我们通过在标准基准测试上实现结合不同RL代理的层次体系结构来展示TAG的有效性,其性能优于经典的多代理RL基准。我们的结果表明,去中心化的层次组织提高了学习速度和最终性能,将TAG定位为可扩展多代理系统的一个有前景的方向。
更新时间: 2025-03-05 10:48:42
领域: cs.AI,cs.LG,cs.SY,eess.SY
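The LevelEnv concept above can be pictured as a thin wrapper: a set of lower-level agents is exposed to the level above as if it were an ordinary RL environment. The sketch below is a hypothetical rendering of that interface; all method names and the feedback structure are assumptions made for illustration, not TAG's API.

```python
# Hypothetical LevelEnv: one hierarchy level behaves as the "environment"
# for the agents above it, so any off-the-shelf RL agent can train on it.
class LevelEnv:
    def __init__(self, lower_agents):
        self.lower_agents = lower_agents

    def reset(self):
        return [agent.observe() for agent in self.lower_agents]

    def step(self, goals):
        # The upper level's "action" is a goal per lower agent; each lower
        # agent pursues its goal in its own (Level)Env and reports back.
        rewards, dones = [], []
        for agent, goal in zip(self.lower_agents, goals):
            reward, done = agent.pursue(goal)
            rewards.append(reward)
            dones.append(done)
        obs = [agent.observe() for agent in self.lower_agents]
        return obs, rewards, all(dones), {}
```

Because each level only sees the standard reset/step interface, levels stay loosely coupled and can mix different agent types, which is the decentralization property the abstract emphasizes.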
Exploration Implies Data Augmentation: Reachability and Generalisation in Contextual MDPs
In the zero-shot policy transfer (ZSPT) setting for contextual Markov decision processes (MDPs), agents train on a fixed set of contexts and must generalise to new ones. Recent work has argued and demonstrated that increased exploration can improve this generalisation, by training on more states in the training contexts. In this paper, we demonstrate that training on more states can indeed improve generalisation, but can come at the cost of reduced accuracy of the learned value function, which should not in itself benefit generalisation. We introduce reachability in the ZSPT setting to define which states/contexts require generalisation and explain why exploration can improve it. We hypothesise and demonstrate that using exploration to increase the agent's coverage while also increasing the accuracy improves generalisation even more. Inspired by this, we propose a method, Explore-Go, that implements an exploration phase at the beginning of each episode, which can be combined with existing on- and off-policy RL algorithms and significantly improves generalisation even in partially observable MDPs. We demonstrate the effectiveness of Explore-Go when combined with several popular algorithms and show an increase in generalisation performance across several environments. With this, we hope to provide practitioners with a simple modification that can improve the generalisation of their agents.
Updated: 2025-03-05 10:47:17
标题: 探索意味着数据增强:在上下文MDPs中的可达性和泛化
摘要: 在上下文马尔可夫决策过程(MDP)的零-shot政策转移(ZSPT)设置中,代理在固定的上下文集上训练,必须推广到新的上下文。最近的研究认为并证明了增加探索可以改善这种泛化,通过在训练上下文中训练更多状态。在本文中,我们证明了在更多状态上训练确实可以改善泛化,但可能会以降低学习价值函数的准确性为代价,这不应有利于泛化。我们在ZSPT设置中引入可达性来定义哪些状态/上下文需要泛化,并解释了为什么探索可以改善它。我们假设并证明了利用探索来增加代理覆盖范围同时提高准确性可以进一步改善泛化。受此启发,我们提出了一种名为Explore-Go的方法,实现在每一集的开始处进行探索阶段,可以与现有的有策略和无策略RL算法结合使用,甚至在部分可观测MDP中显著改善泛化。当与几种流行算法结合使用时,我们证明了Explore-Go的有效性,并展示了在几个环境中泛化性能的提高。通过这样做,我们希望为从业者提供一种简单的修改方法,可以改善他们的代理的泛化能力。
更新时间: 2025-03-05 10:47:17
领域: cs.LG,cs.AI
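One plausible reading of the Explore-Go recipe is sketched below, assuming a Gymnasium-style environment API: the first `k_explore` steps of each episode are driven by a pure-exploration policy, and only the experience collected afterwards is handed to the RL algorithm. `explore_policy` and `task_policy` are hypothetical callables; the exact split the authors use may differ.

```python
# Sketch of an Explore-Go-style episode: explore first, train on the rest.
def run_episode(env, explore_policy, task_policy, k_explore: int):
    obs, _ = env.reset()
    transitions, done, t = [], False, 0
    while not done:
        policy = explore_policy if t < k_explore else task_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        if t >= k_explore:              # train only on post-exploration data
            transitions.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return transitions
```

The exploration prefix effectively randomises the distribution of start states seen by the task policy, which is why the title frames exploration as a form of data augmentation.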
A Novel Multi-Criteria Local Latin Hypercube Refinement System for Commutation Angle Improvement in IPMSMs
The commutation angle is defined as the angle between the fundamental of the motor phase current and the fundamental of the back-EMF. It can be utilised to provide a compensating effect in IPMSMs. This is due to the reluctance torque component being dependent on the commutation angle of the phase current even before entering the extended speed range. A real-time maximum torque per current and voltage strategy is demonstrated to find the trajectory and optimum commutation angles, gamma, where the level of accuracy depends on the application and available computational speed. A magnet volume reduction using a novel multi-criteria local Latin hypercube refinement (MLHR) sampling system is also presented to improve the optimisation process. The proposed new technique minimises the magnet mass to motor torque density whilst maintaining a similar phase current level. A mapping of gamma allows the determination of the optimum angles, as shown in this paper. The 3rd generation Toyota Prius IPMSM is considered as the reference motor, where the rotor configuration is altered to allow for an individual assessment.
Updated: 2025-03-05 10:47:06
标题: 一种新的用于改善IPMSMs换向角的多准则本地拉丁超立方细化系统
摘要: 换向角被定义为电机相电流基波与反电动势基波之间的角度。它可以用于在IPMSMs中提供补偿效果。这是因为磁阻转矩分量取决于相电流的换向角,甚至在进入扩展速度范围之前。演示了一种实时最大转矩每电流和电压策略,用于找到轨迹和最佳换向角γ,其准确度水平取决于应用程序和可用的计算速度。还提出了一种利用新颖的多标准局部拉丁超立方体精化(MLHR)采样系统来减小磁体体积,以改善优化过程的方法。所提出的新技术将磁体质量最小化至电机转矩密度,同时保持类似的相电流水平。通过对gamma的映射,可以确定最佳角度,如本文所示。将第三代丰田普锐斯IPMSM作为参考电机,其中转子配置被改变以进行个体评估。
更新时间: 2025-03-05 10:47:06
领域: cs.CE,cs.ET,cs.LG,cs.SY,eess.SY
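For context, plain (non-adaptive) Latin hypercube sampling of a design space looks as follows with SciPy; the multi-criteria local refinement logic of MLHR is the paper's contribution and is not reproduced here. The design variables and bounds below are illustrative assumptions, not the paper's values.

```python
# Baseline Latin hypercube sampling of a 3-D motor design space with SciPy.
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=42)   # e.g. magnet width, depth, gamma
unit_samples = sampler.random(n=50)          # 50 stratified points in [0, 1)^3

# Scale to engineering bounds (illustrative ranges only)
lower, upper = [2.0, 1.0, 0.0], [8.0, 6.0, 70.0]
designs = qmc.scale(unit_samples, lower, upper)
```

A local refinement scheme would then re-sample more densely around promising designs rather than covering the whole box uniformly, which is the efficiency gain the abstract targets.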
Unifying Causal Representation Learning with the Invariance Principle
Causal representation learning (CRL) aims at recovering latent causal variables from high-dimensional observations to solve causal downstream tasks, such as predicting the effect of new interventions or more robust classification. A plethora of methods have been developed, each tackling carefully crafted problem settings that lead to different types of identifiability. These different settings are widely assumed to be important because they are often linked to different rungs of Pearl's causal hierarchy, even though this correspondence is not always exact. This work shows that instead of strictly conforming to this hierarchical mapping, many causal representation learning approaches methodologically align their representations with inherent data symmetries. Identification of causal variables is guided by invariance principles that are not necessarily causal. This result allows us to unify many existing approaches in a single method that can mix and match different assumptions, including non-causal ones, based on the invariance relevant to the problem at hand. It also significantly benefits applicability, which we demonstrate by improving treatment effect estimation on real-world high-dimensional ecological data. Overall, this paper clarifies the role of causal assumptions in the discovery of causal variables and shifts the focus to preserving data symmetries.
Updated: 2025-03-05 10:42:53
标题: 用不变性原则统一因果表示学习
摘要: 因果表征学习(CRL)旨在从高维观测中恢复潜在的因果变量,以解决因果下游任务,如预测新干预的效果或更健壮的分类。已经开发了大量方法,每种方法都处理精心设计的问题设置,导致不同类型的可识别性。尽管这种对应关系并不总是精确的,但普遍认为这些不同的设置很重要,因为它们通常与Pearl的因果层次结构的不同层级相关。这项工作表明,许多因果表征学习方法在方法上与内在数据对称性保持一致,而不是严格遵循这种层次映射。因果变量的识别受到不一定是因果的不变性原则的指导。这一结果使我们能够将许多现有方法统一为一种方法,可以根据与手头问题相关的不变性混合和匹配不同的假设,包括非因果性假设。这也极大地有利于适用性,我们通过改善对现实世界高维生态数据的处理效果估计来证明这一点。总的来说,本文阐明了因果假设在发现因果变量中的作用,并将焦点转移到保持数据对称性。
更新时间: 2025-03-05 10:42:53
领域: cs.LG,stat.ML
From Infants to AI: Incorporating Infant-like Learning in Models Boosts Efficiency and Generalization in Learning Social Prediction Tasks
Early in development, infants learn a range of useful concepts, which can be challenging from a computational standpoint. This early learning comes together with an initial understanding of aspects of the meaning of concepts, e.g., their implications, causality, and using them to predict likely future events. All this is accomplished in many cases with little or no supervision, and from relatively few examples, compared with current network models. In learning about objects and human-object interactions, early acquired and possibly innate concepts are often used in the process of learning additional, more complex concepts. In the current work, we model how early-acquired concepts are used in the learning of subsequent concepts, and compare the results with standard deep network modeling. We focused in particular on the use of the concepts of animacy and goal attribution in learning to predict future events. We show that the use of early concepts in the learning of new concepts leads to better learning (higher accuracy) and more efficient learning (requiring less data). We further show that this integration of early and new concepts shapes the representation of the concepts acquired by the model. The results show that when the concepts were learned in a human-like manner, the emerging representation was more useful, as measured in terms of generalization to novel data and tasks. On a more general level, the results suggest that there are likely to be basic differences in the conceptual structures acquired by current network models compared to human learning.
Updated: 2025-03-05 10:40:19
标题: 从婴儿到人工智能:在模型中融入类婴儿学习方式提升学习社交预测任务的效率和泛化能力
摘要: 在发育的早期,婴儿学习了一系列有用的概念,这在计算方面可能具有挑战性。这种早期学习与对概念意义的一些方面有着初步的理解相结合,例如,它们的含义、因果关系,以及利用它们来预测可能发生的未来事件。在许多情况下,所有这些都是在很少或几乎没有监督的情况下完成的,与当前的网络模型相比,所需的示例相对较少。在学习物体和人类-物体交互方面,早期获得的、可能是先天的概念经常在学习额外、更复杂的概念的过程中使用。在当前的工作中,我们模拟了早期获得的概念如何用于学习随后的概念,并将结果与标准的深度网络建模进行了比较。我们特别关注了在学习预测未来事件时使用生命性和目标归因概念。我们表明,在学习新概念时利用早期概念会导致更好的学习(更高的准确性)和更高效的学习(需要更少的数据)。我们进一步表明,早期和新概念的整合塑造了模型所获得的概念的表示。结果表明,当概念以类似人类的方式学习时,新出现的表示更有用,这是通过对新数据和任务的泛化来衡量的。在更一般的层面上,结果表明,当前网络模型所获得的概念结构可能与人类学习有基本的差异。
更新时间: 2025-03-05 10:40:19
领域: cs.AI,cs.NE
Transformers for molecular property prediction: Domain adaptation efficiently improves performance
Most of the current transformer-based chemical language models are pre-trained on millions to billions of molecules. However, the improvement from such scaling in dataset size is not confidently linked to improved molecular property prediction. The aim of this study is to investigate and overcome some of the limitations of transformer models in predicting molecular properties. Specifically, we examine the impact of pre-training dataset size and diversity on the performance of transformer models and investigate the use of domain adaptation as a technique for improving model performance. First, our findings indicate that increasing pre-training dataset size beyond 400K molecules from the GuacaMol dataset does not result in a significant improvement on four ADME endpoints, namely, solubility, permeability, microsomal stability, and plasma protein binding. Second, our results demonstrate that using domain adaptation by further training the transformer model on a small set of domain-relevant molecules, i.e., a few hundred to a few thousand, using multi-task regression of physicochemical properties was sufficient to significantly improve performance for three out of the four investigated ADME endpoints (P-value < 0.001). Finally, we observe that a model pre-trained on 400K molecules and domain-adapted on a few hundred/thousand molecules performs similarly (P-value > 0.05) to more complicated transformer models like MolBERT (pre-trained on 1.3M molecules) and MolFormer (pre-trained on 100M molecules). A comparison to a random forest model trained on basic physicochemical properties showed similar performance to the examined transformer models. We believe that current transformer models can be improved through further systematic analysis of pre-training and downstream data, pre-training objectives, and scaling laws, ultimately leading to better and more helpful models.
Updated: 2025-03-05 10:40:09
标题: 转换器用于分子性质预测:领域自适应有效提高性能
摘要: 目前大多数基于当前变压器的化学语言模型都是在数百万到数十亿分子上进行预训练的。然而,从数据集规模的扩展中获得的改进并不能确信地与分子性质预测的改善相联系。本研究的目的是调查并克服变压器模型在预测分子性质方面的一些限制。具体来说,我们研究了预训练数据集大小和多样性对变压器模型性能的影响,并调查了使用领域适应作为改善模型性能的技术。首先,我们的研究结果表明,将预训练数据集大小增加到GuacaMol数据集的40万个分子以上并没有显著改善四个ADME端点,即溶解度、渗透性、微粒体稳定性和血浆蛋白结合。其次,我们的结果表明,通过在一小部分领域相关分子上进一步训练变压器模型,即几百到几千个,使用多任务回归物理化学性质可以显著改善三个受研究的四个ADME端点的性能(P值<0.001)。最后,我们观察到,一个在40万个分子上进行预训练并在几百/几千个分子上进行领域适应的模型表现类似(P值>0.05)于更复杂的变压器模型,如MolBERT(在130万个分子上进行预训练)和MolFormer(在1亿个分子上进行预训练)。与基本物理化学性质训练的随机森林模型相比,我们发现被研究的变压器模型表现类似。我们相信,通过进一步系统分析预训练和下游数据、预训练目标和缩放规律,可以改进当前的变压器模型,最终实现更好和更有益的模型。
更新时间: 2025-03-05 10:40:09
领域: cs.LG,cs.AI,cs.CL
Video Super-Resolution: All You Need is a Video Diffusion Model
We present a generic video super-resolution algorithm in this paper, based on the Diffusion Posterior Sampling framework with an unconditional video generation model in latent space. The video generation model, a diffusion transformer, functions as a space-time model. We argue that a powerful model, which learns the physics of the real world, can easily handle various kinds of motion patterns as prior knowledge, thus eliminating the need for explicit estimation of optical flows or motion parameters for pixel alignment. Furthermore, a single instance of the proposed video diffusion transformer model can adapt to different sampling conditions without re-training. Due to limited computational resources and training data, our experiments provide empirical evidence of the algorithm's strong super-resolution capabilities using synthetic data.
Updated: 2025-03-05 10:37:51
标题: 视频超分辨率:你所需要的只是一个视频扩散模型
摘要: 本文提出了一种基于扩散后验抽样框架的通用视频超分辨率算法,该算法基于潜在空间中的无条件视频生成模型。视频生成模型是一个扩散变换器,作为时空模型。我们认为一个强大的模型,可以学习现实世界的物理规律,可以轻松处理各种运动模式作为先验知识,从而消除了对像素对齐的光流或运动参数的显式估计的需要。此外,建议的视频扩散变换器模型的单个实例可以适应不同的采样条件,而无需重新训练。由于有限的计算资源和训练数据,我们的实验提供了使用合成数据验证算法强大的超分辨率能力的经验证据。
更新时间: 2025-03-05 10:37:51
领域: cs.CV,cs.LG,eess.IV
Leveraging Large Language Models to Develop Heuristics for Emerging Optimization Problems
Combinatorial optimization problems often rely on heuristic algorithms to generate efficient solutions. However, the manual design of heuristics is resource-intensive and constrained by the designer's expertise. Recent advances in artificial intelligence, particularly large language models (LLMs), have demonstrated the potential to automate heuristic generation through evolutionary frameworks. Recent works focus only on well-known combinatorial optimization problems like the traveling salesman problem and online bin packing problem when designing constructive heuristics. This study investigates whether LLMs can effectively generate heuristics for niche, not yet broadly researched optimization problems, using the unit-load pre-marshalling problem as an example case. We propose the Contextual Evolution of Heuristics (CEoH) framework, an extension of the Evolution of Heuristics (EoH) framework, which incorporates problem-specific descriptions to enhance in-context learning during heuristic generation. Through computational experiments, we evaluate CEoH and EoH and compare the results. Results indicate that CEoH enables smaller LLMs to generate high-quality heuristics more consistently and even outperform larger models. Larger models demonstrate robust performance with or without contextualized prompts. The generated heuristics exhibit scalability to diverse instance configurations.
Updated: 2025-03-05 10:22:49
标题: 利用大型语言模型开发新兴优化问题的启发式算法
摘要: 组合优化问题通常依赖于启发式算法来生成高效解决方案。然而,启发式的手动设计需要耗费大量资源,并受设计者专业知识的限制。最近人工智能的进展,尤其是大型语言模型(LLMs),已经展示了通过进化框架自动化生成启发式的潜力。最近的研究仅关注已知的组合优化问题,如旅行商问题和在线装箱问题,在设计构造性启发式时。本研究调查LLMs是否能够有效地为尚未广泛研究的优化问题生成启发式,以单元装载预编组问题为例。我们提出了上下文启发式演化(CEoH)框架,这是启发式演化(EoH)框架的扩展,它将问题特定描述纳入其中,以增强启发式生成过程中的上下文学习。通过计算实验,我们评估了CEoH和EoH,并比较了结果。结果表明,CEoH能够使较小的LLMs更一致地生成高质量启发式,甚至胜过更大的模型。较大的模型无论是否有上下文提示,都表现出稳健的性能。生成的启发式表现出对不同实例配置的可扩展性。
更新时间: 2025-03-05 10:22:49
领域: cs.AI
Coordinated Multi-Armed Bandits for Improved Spatial Reuse in Wi-Fi
Multi-Access Point Coordination (MAPC) and Artificial Intelligence and Machine Learning (AI/ML) are expected to be key features in future Wi-Fi, such as the forthcoming IEEE 802.11bn (Wi-Fi 8) and beyond. In this paper, we explore a coordinated solution based on online learning to drive the optimization of Spatial Reuse (SR), a method that allows multiple devices to perform simultaneous transmissions by controlling interference through Packet Detect (PD) adjustment and transmit power control. In particular, we focus on a Multi-Agent Multi-Armed Bandit (MA-MAB) setting, where multiple decision-making agents concurrently configure SR parameters from coexisting networks by leveraging the MAPC framework, and study various algorithms and reward-sharing mechanisms. We evaluate different MA-MAB implementations using Komondor, a well-adopted Wi-Fi simulator, and demonstrate that AI-native SR enabled by coordinated MABs can improve the network performance over current Wi-Fi operation: mean throughput increases by 15%, fairness is improved by increasing the minimum throughput across the network by 210%, while the maximum access delay is kept below 3 ms.
Updated: 2025-03-05 10:19:05
标题: 协调的多臂老虎机用于改善Wi-Fi中的空间重用
摘要: 多接入点协调(MAPC)和人工智能和机器学习(AI/ML)被认为是未来Wi-Fi(例如即将推出的IEEE 802.11bn(Wi-Fi~8)及更高版本)的关键特性。在本文中,我们探讨了一种基于在线学习的协调解决方案,以推动空间重用(SR)的优化,这种方法允许多个设备通过控制干扰来进行同时传输,通过数据包检测(PD)调整和发送功率控制。特别是,我们关注多智能体多臂老虎机(MA-MAB)设置,多个决策代理同时通过利用MAPC框架配置共存网络的SR参数,并研究各种算法和奖励共享机制。我们使用广泛采用的Wi-Fi模拟器Komondor评估不同的MA-MAB实现,并证明通过协调的MAB实现的AI本地化SR可以提高网络性能,平均吞吐量增加15%,公平性通过增加整个网络的最小吞吐量210%得到改善,同时最大接入延迟保持在3毫秒以下。
更新时间: 2025-03-05 10:19:05
领域: cs.NI,cs.AI
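As a minimal sketch of one agent in such an MA-MAB setup, an epsilon-greedy bandit choosing among (PD threshold, transmit power) configurations could look like the following. The arm set, reward signal (e.g., observed throughput), and any MAPC-style reward sharing across agents are simplified assumptions.

```python
# Epsilon-greedy bandit for one access point's Spatial Reuse configuration.
import random

class EpsilonGreedyAgent:
    def __init__(self, n_arms: int, eps: float = 0.1):
        self.eps, self.counts, self.values = eps, [0] * n_arms, [0.0] * n_arms

    def select(self) -> int:
        if random.random() < self.eps:
            return random.randrange(len(self.counts))       # explore
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        # Incremental mean of the observed reward for this configuration
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Coordination, in the spirit of the paper, would then amount to choosing how agents share or aggregate their rewards (e.g., rewarding network-wide rather than per-AP throughput) rather than changing the bandit itself.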
From Learning to Optimize to Learning Optimization Algorithms
Towards designing learned optimization algorithms that are usable beyond their training setting, we identify key principles that classical algorithms obey, but have up to now, not been used for Learning to Optimize (L2O). Following these principles, we provide a general design pipeline, taking into account data, architecture and learning strategy, and thereby enabling a synergy between classical optimization and L2O, resulting in a philosophy of Learning Optimization Algorithms. As a consequence our learned algorithms perform well far beyond problems from the training distribution. We demonstrate the success of these novel principles by designing a new learning-enhanced BFGS algorithm and provide numerical experiments evidencing its adaptation to many settings at test time.
Updated: 2025-03-05 10:17:25
标题: 从学习优化到学习优化算法
摘要: 朝着设计学习优化算法,使其在训练环境之外也可用,我们确定了经典算法遵循的关键原则,但迄今为止尚未用于学习优化(L2O)。遵循这些原则,我们提供了一个通用的设计流程,考虑到数据、架构和学习策略,从而实现经典优化和L2O之间的协同作用,形成了学习优化算法的理念。因此,我们的学习算法在训练分布之外的问题上表现良好。我们通过设计一个新的学习增强的BFGS算法来证明这些新原则的成功,并提供数值实验,证明其在测试时适应多种设置。
更新时间: 2025-03-05 10:17:25
领域: cs.LG,math.OC
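For reference, the classical BFGS inverse-Hessian update that the learned variant above builds on is shown below; the learning-enhanced components of the paper are not reproduced.

```python
# One classical BFGS update of the inverse Hessian approximation H,
# with s = x_{k+1} - x_k and y = grad_{k+1} - grad_k.
import numpy as np

def bfgs_update(H: np.ndarray, s: np.ndarray, y: np.ndarray) -> np.ndarray:
    rho = 1.0 / (y @ s)                 # requires the curvature condition y's > 0
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```

A learning-enhanced variant, following the design principles the abstract describes, would keep this well-understood structure and learn only selected components, which is what lets the method remain usable outside its training distribution.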
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
Deep neural networks based on state space models (SSMs) are attracting significant attention in sequence modeling since their computational cost is much smaller than that of Transformers. While the capabilities of SSMs have been demonstrated through experiments in various tasks, theoretical understanding of SSMs is still limited. In particular, most theoretical studies discuss the capabilities of SSM layers without nonlinear layers, and there is a lack of discussion on their combination with nonlinear layers. In this paper, we explore the capabilities of SSMs combined with fully connected neural networks, and show that they are comparable to Transformers in extracting the essential tokens depending on the input. As concrete examples, we consider two synthetic tasks, which are challenging for a single SSM layer, and demonstrate that SSMs combined with nonlinear layers can efficiently solve these tasks. Furthermore, we study the nonparametric regression task, and prove that the ability of SSMs is equivalent to that of Transformers in estimating functions belonging to a certain class.
Updated: 2025-03-05 10:15:19
标题: 状态空间模型在动态令牌选择方面可以被证明与Transformer相媲美
摘要: 基于状态空间模型(SSMs)的深度神经网络在序列建模中引起了广泛关注,因为它们的计算成本远远小于Transformer的计算成本。虽然通过各种任务的实验证明了SSMs的能力,但对于SSMs的理论理解仍然有限。特别是,大多数理论研究讨论了没有非线性层的SSM层的能力,并缺乏关于它们与非线性层结合的讨论。在本文中,我们探讨了SSMs与全连接神经网络结合的能力,并展示它们在提取依赖于输入的基本标记方面与Transformer相当。作为具体示例,我们考虑了两个对单个SSM层具有挑战性的合成任务,并证明了SSMs与非线性层结合可以有效地解决这些任务。此外,我们研究了非参数回归任务,并证明了SSMs在估计属于某一类别的函数方面的能力等同于Transformer。
更新时间: 2025-03-05 10:15:19
领域: stat.ML,cs.LG
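To fix notation, the linear state space recurrence at the core of such models is h_t = A h_{t-1} + B u_t with output y_t = C h_t; the paper's point concerns what this recurrence can do once combined with fully connected (nonlinear) layers. A minimal scan is sketched below with illustrative shapes.

```python
# Minimal discrete linear SSM scan: h_t = A h_{t-1} + B u_t, y_t = C h_t.
import numpy as np

def ssm_scan(A, B, C, u):
    h = np.zeros(A.shape[0])
    ys = []
    for u_t in u:                 # u: iterable of (input_dim,) vectors
        h = A @ h + B @ u_t
        ys.append(C @ h)
    return np.stack(ys)           # (T, output_dim)
```

Production SSM blocks (S4/Mamba-style) add careful parameterisations of A and interleave exactly the nonlinear layers whose contribution the paper analyses theoretically.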
Navigating Intelligence: A Survey of Google OR-Tools and Machine Learning for Global Path Planning in Autonomous Vehicles
We offer a new in-depth investigation of global path planning (GPP) for unmanned ground vehicles, exemplified by ROMIE, an autonomous mining sampling robot. GPP is essential for ROMIE's optimal performance, which translates into solving the traveling salesman problem, a complex graph theory challenge that is crucial for determining the most effective route to cover all sampling locations in a mining field. This problem is central to enhancing ROMIE's operational efficiency and competitiveness against human labor by optimizing cost and time. The primary aim of this research is to advance GPP by developing, evaluating, and improving a cost-efficient software and web application. We delve into an extensive comparison and analysis of Google operations research (OR)-Tools optimization algorithms. Our study is driven by the goal of applying and testing the limits of OR-Tools capabilities by integrating Reinforcement Learning techniques for the first time. This enables us to compare these methods with OR-Tools, assessing their computational effectiveness and real-world application efficiency. Our analysis seeks to provide insights into the effectiveness and practical application of each technique. Our findings indicate that Q-Learning stands out as the optimal strategy, demonstrating superior efficiency by deviating only 1.2% on average from the optimal solutions across our datasets.
Updated: 2025-03-05 10:12:22
标题: 智能导航:自动驾驶车辆全局路径规划中Google OR-Tools和机器学习调查
摘要: 我们提供了一项关于全球路径规划(GPP)的深入调查,针对无人地面车辆,一种名为ROMIE的自主采矿取样机器人。GPP对于ROMIE的最佳性能至关重要,这被转化为解决旅行推销员问题,这是一个复杂的图论挑战,对确定覆盖矿区所有取样位置的最有效路线至关重要。这个问题对于提高ROMIE的运营效率和竞争力非常重要,通过优化成本和时间。这项研究的主要目的是通过开发、评估和改进一种成本有效的软件和网络应用程序来推进GPP。我们深入比较和分析了谷歌运营研究(OR)工具优化算法。我们的研究旨在通过首次整合强化学习技术来应用和测试OR-Tools的能力极限。这使我们能够将这些方法与OR-Tools进行比较,评估它们的计算效率和现实应用效率。我们的分析旨在提供有关每种技术的有效性和实际应用的见解。我们的研究结果表明,Q学习是最佳策略,通过在我们的数据集上平均仅偏离最优解1.2%,展示出卓越的效率。
更新时间: 2025-03-05 10:12:22
领域: cs.RO,cs.AI,cs.CE,eess.SP
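To give a flavour of the Q-Learning approach compared above, here is a tiny tabular Q-learning loop for a 4-city TSP-style tour. The distance matrix and hyperparameters are illustrative, and this is a textbook formulation rather than the paper's implementation.

```python
# Tabular Q-learning on a toy TSP: state = (current city, visited set).
import random

D = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 8], [10, 4, 8, 0]]
n, Q = len(D), {}

def q(state, a):
    return Q.get((state, a), 0.0)

for _ in range(5000):
    city, visited = 0, frozenset({0})
    while len(visited) < n:
        actions = [c for c in range(n) if c not in visited]
        a = (random.choice(actions) if random.random() < 0.1
             else max(actions, key=lambda c: q((city, visited), c)))
        nxt_visited = visited | {a}
        # Negative travel cost; close the tour back to city 0 on the last step
        reward = -D[city][a] - (D[a][0] if len(nxt_visited) == n else 0)
        nxt_actions = [c for c in range(n) if c not in nxt_visited]
        target = reward + (max(q((a, nxt_visited), c) for c in nxt_actions)
                           if nxt_actions else 0.0)
        Q[((city, visited), a)] = q((city, visited), a) + 0.5 * (
            target - q((city, visited), a))
        city, visited = a, nxt_visited
```

A greedy rollout over the learned Q-table then recovers a tour; on realistic instance sizes the state space must be approximated, which is where the engineering compared against OR-Tools comes in.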
XLSTM-HVED: Cross-Modal Brain Tumor Segmentation and MRI Reconstruction Method Using Vision XLSTM and Heteromodal Variational Encoder-Decoder
Neurogliomas are among the most aggressive forms of cancer, presenting considerable challenges in both treatment and monitoring due to their unpredictable biological behavior. Magnetic resonance imaging (MRI) is currently the preferred method for diagnosing and monitoring gliomas. However, the lack of specific imaging techniques often compromises the accuracy of tumor segmentation during the imaging process. To address this issue, we introduce the XLSTM-HVED model. This model integrates a hetero-modal encoder-decoder framework with the Vision XLSTM module to reconstruct missing MRI modalities. By deeply fusing spatial and temporal features, it enhances tumor segmentation performance. The key innovation of our approach is the Self-Attention Variational Encoder (SAVE) module, which improves the integration of modal features. Additionally, it optimizes the interaction of features between segmentation and reconstruction tasks through the Squeeze-Fusion-Excitation Cross Awareness (SFECA) module. Our experiments using the BraTS 2024 dataset demonstrate that our model significantly outperforms existing advanced methods in handling cases where modalities are missing. Our source code is available at https://github.com/Quanato607/XLSTM-HVED.
Updated: 2025-03-05 10:09:25
标题: XLSTM-HVED: 使用Vision XLSTM和异模态变分编码器-解码器进行跨模态脑肿瘤分割和MRI重建方法
摘要: 神经胶质瘤是最具侵略性的癌症之一,由于其不可预测的生物行为,在治疗和监测方面存在相当大的挑战。磁共振成像(MRI)目前是诊断和监测胶质瘤的首选方法。然而,缺乏特定的成像技术往往会损害在成像过程中肿瘤分割的准确性。为了解决这个问题,我们介绍了XLSTM-HVED模型。该模型将异模态编码器-解码器框架与Vision XLSTM模块相结合,以重建缺失的MRI模态。通过深度融合空间和时间特征,它提高了肿瘤分割性能。我们方法的关键创新是自注意力变分编码器(SAVE)模块,它改善了模态特征的整合。此外,通过Squeeze-Fusion-Excitation Cross Awareness(SFECA)模块,它优化了分割和重建任务之间特征的交互。我们在BraTS 2024数据集上的实验表明,我们的模型在处理模态缺失的情况下显著优于现有的先进方法。我们的源代码可在https://github.com/Quanato607/XLSTM-HVED 上找到。
更新时间: 2025-03-05 10:09:25
领域: eess.IV,cs.AI,cs.CV
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
We consider the problem of learning in adversarial Markov decision processes (MDPs) with an oblivious adversary in a full-information setting. The agent interacts with an environment during $T$ episodes, each of which consists of $H$ stages, and each episode is evaluated with respect to a reward function that will be revealed only at the end of the episode. We propose an algorithm, called APO-MVP, that achieves a regret bound of order $\tilde{\mathcal{O}}(\mathrm{poly}(H)\sqrt{SAT})$, where $S$ and $A$ are sizes of the state and action spaces, respectively. This result improves upon the best-known regret bound by a factor of $\sqrt{S}$, bridging the gap between adversarial and stochastic MDPs, and matching the minimax lower bound $\Omega(\sqrt{H^3SAT})$ as far as the dependencies in $S,A,T$ are concerned. The proposed algorithm and analysis completely avoid the typical tool given by occupancy measures; instead, it performs policy optimization based only on dynamic programming and on a black-box online linear optimization strategy run over estimated advantage functions, making it easy to implement. The analysis leverages two recent techniques: policy optimization based on online linear optimization strategies (Jonckheere et al., 2023) and a refined martingale analysis of the impact on values of estimating transitions kernels (Zhang et al., 2023).
Updated: 2025-03-05 10:07:22
标题: 通过策略优化缩小对抗性和随机MDP之间的差距
摘要: 我们考虑在全信息环境中与一个无意识对手进行学习的问题,该对手在对抗性马尔可夫决策过程[MDPs]中。代理与环境在$T$个周期内互动,每个周期包含$H$个阶段,并且每个周期根据一个奖励函数进行评估,该函数只会在周期结束时揭示。我们提出了一种算法,称为APO-MVP,其实现了一个遗憾上限为$\tilde{\mathcal{O}}(\mathrm{poly}(H)\sqrt{SAT})$,其中$S$和$A$分别代表状态空间和动作空间的大小。这一结果比已知的最佳遗憾上限提高了一个$\sqrt{S}$的因子,弥合了对抗性和随机MDPs之间的差距,并且在$S,A,T$的依赖性方面与最小值下限$\Omega(\sqrt{H^3SAT})$相匹配。所提出的算法和分析完全避免了典型的占用度量工具;相反,它仅基于动态规划和基于黑盒在线线性优化策略对估计的优势函数进行策略优化,这使得实施变得容易。该分析利用了两种最近的技术:基于在线线性优化策略的策略优化(Jonckheere等,2023)和对估计转移核值的影响进行精细的鞅分析(Zhang等,2023)。
更新时间: 2025-03-05 10:07:22
领域: cs.LG
Leap: Inductive Link Prediction via Learnable Topology Augmentation
Link prediction is a crucial task in many downstream applications of graph machine learning. To this end, Graph Neural Network (GNN) is a widely used technique for link prediction, mainly in transductive settings, where the goal is to predict missing links between existing nodes. However, many real-life applications require an inductive setting that accommodates new nodes coming into an existing graph. Thus, recently inductive link prediction has attracted considerable attention, and a multi-layer perceptron (MLP) is the popular choice of most studies to learn node representations. However, these approaches have limited expressivity and do not fully capture the graph's structural signal. Therefore, in this work we propose LEAP, an inductive link prediction method based on LEArnable toPology augmentation. Unlike previous methods, LEAP models the inductive bias from both the structure and node features, and hence is more expressive. To the best of our knowledge, this is the first attempt to provide structural contexts for new nodes via learnable augmentation in inductive settings. Extensive experiments on seven real-world homogeneous and heterogeneous graphs demonstrate that LEAP significantly surpasses SOTA methods. The improvements are up to 22% and 17% in terms of AUC and average precision, respectively. The code and datasets are available on GitHub (https://github.com/AhmedESamy/LEAP/)
Updated: 2025-03-05 10:03:59
标题: Leap:通过可学习的拓扑增强进行归纳链接预测
摘要: 链接预测是图机器学习许多下游应用中的一个关键任务。为此,图神经网络(GNN)是一种广泛使用的技术,主要用于链接预测,在传导设置中,目标是预测现有节点之间的缺失链接。然而,许多现实生活中的应用需要一个归纳设置,以适应新节点进入现有图。因此,最近归纳链接预测引起了相当大的关注,大多数研究选择多层感知器(MLP)来学习节点表示。然而,这些方法的表达能力有限,不能完全捕捉图的结构信号。因此,在这项工作中,我们提出了LEAP,一种基于LEArnable toPology增强的归纳链接预测方法。与以前的方法不同,LEAP模型从结构和节点特征中学习的归纳偏差,因此更具表现力。据我们所知,这是第一个尝试通过可学习增强在归纳设置中为新节点提供结构上下文的尝试。对七个真实世界的同质和异质图的大量实验表明,LEAP在AUC和平均精度方面显著超越了SOTA方法。改进分别达到了22%和17%。代码和数据集可在GitHub上找到(https://github.com/AhmedESamy/LEAP/)。
更新时间: 2025-03-05 10:03:59
领域: cs.LG
A Survey on Self-play Methods in Reinforcement Learning
Self-play, characterized by agents' interactions with copies or past versions of themselves, has recently gained prominence in reinforcement learning (RL). This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts. Then, it provides a unified framework and classifies existing self-play algorithms within this framework. Moreover, the paper bridges the gap between the algorithms and their practical implications by illustrating the role of self-play in different scenarios. Finally, the survey highlights open challenges and future research directions in self-play. This paper is an essential guide map for understanding the multifaceted landscape of self-play in RL.
Updated: 2025-03-05 10:03:08
标题: 一项关于自我对弈方法在强化学习中的调查
摘要: 自我对弈是指代理与其副本或过去版本进行互动,最近在强化学习(RL)中备受关注。本文首先澄清了自我对弈的基础知识,包括多智能体强化学习框架和基本博弈论概念。然后,它提供了一个统一框架,并将现有的自我对弈算法分类在这个框架内。此外,本文通过说明自我对弈在不同场景中的作用,弥合了算法与实际影响之间的鸿沟。最后,该调查突出了自我对弈中的挑战和未来研究方向。本文是理解RL中自我对弈多方面景观的重要指南。
更新时间: 2025-03-05 10:03:08
领域: cs.AI
Distributed Differentially Private Data Analytics via Secure Sketching
We introduce the linear-transformation model, a distributed model of differentially private data analysis. Clients have access to a trusted platform capable of applying a public matrix to their inputs. Such computations can be securely distributed across multiple servers using simple and efficient secure multiparty computation techniques. The linear-transformation model serves as an intermediate model between the highly expressive central model and the minimal local model. In the central model, clients have access to a trusted platform capable of applying any function to their inputs. However, this expressiveness comes at a cost, as it is often prohibitively expensive to distribute such computations, leading to the central model typically being implemented by a single trusted server. In contrast, the local model assumes no trusted platform, which forces clients to add significant noise to their data. The linear-transformation model avoids the single point of failure for privacy present in the central model, while also mitigating the high noise required in the local model. We demonstrate that linear transformations are very useful for differential privacy, allowing for the computation of linear sketches of input data. These sketches largely preserve utility for tasks such as private low-rank approximation and private ridge regression, while introducing only minimal error, critically independent of the number of clients.
Updated: 2025-03-05 09:59:23
标题: 通过安全草图实现的分布式差分隐私数据分析
摘要: 我们介绍了线性转换模型,这是一个分布式的差分私有数据分析模型。客户可以访问一个可信平台,该平台能够对他们的输入应用一个公共矩阵。这样的计算可以使用简单而高效的安全多方计算技术在多个服务器之间安全地分布。 线性转换模型作为高度表达中心模型和最小本地模型之间的中间模型。在中心模型中,客户可以访问一个可信平台,该平台能够对他们的输入应用任何函数。然而,这种表达能力是有代价的,因为通常分发这样的计算是非常昂贵的,导致中心模型通常由一个单一可信服务器实现。相反,本地模型假设没有可信平台,这迫使客户在他们的数据中添加大量的噪音。线性转换模型避免了中心模型中存在的隐私单点故障,同时也减轻了本地模型中所需的高噪音。 我们证明了线性转换对差分隐私非常有用,允许计算输入数据的线性草图。这些草图在任务如私有低秩近似和私有岭回归等方面大部分保留了实用性,同时引入了只有最小误差,关键的是与客户数量无关。
更新时间: 2025-03-05 09:59:23
领域: cs.CR,cs.LG
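A toy, single-server view of the linear-transformation step described above: client inputs are hit with a shared public random projection (a linear sketch), and Gaussian noise calibrated to the sketch's sensitivity is added once to the aggregate. The parameters below are illustrative and not a vetted DP deployment; the secure multi-server distribution of this computation is also omitted.

```python
# Toy linear sketch with server-side Gaussian noise (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 512, 64, 100                      # input dim, sketch dim, clients
S = rng.normal(size=(k, d)) / np.sqrt(k)    # public sketching matrix
X = rng.normal(size=(n, d))                 # one row per client

sketch = X @ S.T                            # the linear transformation
sigma = 1.0                                 # would be set from (epsilon, delta)
                                            # and the sketch's L2 sensitivity
private_sketch = sketch.sum(axis=0) + rng.normal(scale=sigma, size=k)
```

Because the noise is added to a low-dimensional sketch rather than to each client's raw data, the error can stay small and, as the abstract stresses, independent of the number of clients.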
See What You Are Told: Visual Attention Sink in Large Multimodal Models
Large multimodal models (LMMs) "see" images by leveraging the attention mechanism between text and visual tokens in the transformer decoder. Ideally, these models should focus on key visual information relevant to the text token. However, recent findings indicate that LMMs have an extraordinary tendency to consistently allocate high attention weights to specific visual tokens, even when these tokens are irrelevant to the corresponding text. In this study, we investigate the property behind the appearance of these irrelevant visual tokens and examine their characteristics. Our findings show that this behavior arises due to the massive activation of certain hidden state dimensions, which resembles the attention sink found in language models. Hence, we refer to this phenomenon as the visual attention sink. In particular, our analysis reveals that removing the irrelevant visual sink tokens does not impact model performance, despite receiving high attention weights. Consequently, we recycle the attention to these tokens as surplus resources, redistributing the attention budget to enhance focus on the image. To achieve this, we introduce Visual Attention Redistribution (VAR), a method that redistributes attention in image-centric heads, which we identify as innately focusing on visual information. VAR can be seamlessly applied across different LMMs to improve performance on a wide range of tasks, including general vision-language tasks, visual hallucination tasks, and vision-centric tasks, all without the need for additional training, models, or inference steps. Experimental results demonstrate that VAR enables LMMs to process visual information more effectively by adjusting their internal attention mechanisms, offering a new direction to enhancing the multimodal capabilities of LMMs.
Updated: 2025-03-05 09:55:07
标题: 看到你所告诉的:大型多模态模型中的视觉注意力下沉
摘要: 大型多模态模型(LMMs)通过在变压器解码器中的文本和视觉标记之间的注意力机制来“看”图像。理想情况下,这些模型应该专注于与文本标记相关的关键视觉信息。然而,最近的研究表明,LMMs具有一种非凡的倾向,即即使这些标记与相应的文本无关,它们也会持续地分配高的注意力权重给特定的视觉标记。在本研究中,我们调查了这些无关视觉标记出现背后的特性,并对它们的特征进行了研究。我们的研究结果表明,这种行为是由于某些隐藏状态维度的大规模激活所导致的,这类似于语言模型中发现的注意力汇聚。因此,我们将这一现象称为视觉注意力汇聚。特别是,我们的分析表明,移除这些无关的视觉汇聚标记并不影响模型性能,尽管它们获得了高的注意力权重。因此,我们将这些标记的注意力重新分配为多余的资源,重新分配注意力预算以增强对图像的关注。为了实现这一点,我们引入了视觉注意力重新分配(VAR)方法,该方法在以图像为中心的头部中重新分配注意力,我们认为这些头部天生专注于视觉信息。VAR可以无缝地应用于不同的LMMs,以改善广泛任务的性能,包括一般视觉语言任务、视觉幻觉任务和以视觉为中心的任务,而无需额外的训练、模型或推断步骤。实验结果表明,VAR通过调整LMMs的内部注意力机制使它们更有效地处理视觉信息,为增强LMMs的多模态能力提供了一条新的方向。
更新时间: 2025-03-05 09:55:07
领域: cs.CV,cs.AI
Task-optimal data-driven surrogate models for eNMPC via differentiable simulation and optimization
Mechanistic dynamic process models may be too computationally expensive to be usable as part of a real-time capable predictive controller. We present a method for end-to-end learning of Koopman surrogate models for optimal performance in a specific control task. In contrast to previous contributions that employ standard reinforcement learning (RL) algorithms, we use a training algorithm that exploits the differentiability of environments based on mechanistic simulation models to aid the policy optimization. We evaluate the performance of our method by comparing it to that of other training algorithms on an existing economic nonlinear model predictive control (eNMPC) case study of a continuous stirred-tank reactor (CSTR) model. Compared to the benchmark methods, our method produces similar economic performance while eliminating constraint violations. Thus, for this case study, our method outperforms the others and offers a promising path toward more performant controllers that employ dynamic surrogate models.
Updated: 2025-03-05 09:54:52
标题: 通过可微分模拟和优化实现任务最优数据驱动的eNMPC代理模型
摘要: 机制动态过程模型可能在实时可预测控制器的一部分中过于计算昂贵而无法使用。我们提出了一种用于端到端学习 Koopman 代理模型以在特定控制任务中实现最佳性能的方法。与以往利用标准强化学习(RL)算法的贡献不同,我们使用一种训练算法,利用基于机制模拟模型的环境的可微性来辅助策略优化。我们通过将其与其他训练算法在现有经济非线性模型预测控制(eNMPC)的一个连续搅拌槽反应器(CSTR)模型案例研究中的表现进行比较来评估我们的方法的性能。与基准方法相比,我们的方法在消除约束违规的同时产生类似的经济性能。因此,对于这个案例研究,我们的方法优于其他方法,并为采用动态代理模型的更有效控制器提供了一个有希望的路径。
更新时间: 2025-03-05 09:54:52
领域: cs.LG,math.OC
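As background for the Koopman surrogate above, the classical data-driven baseline is an EDMD-style fit: lift states with a feature map and solve a least-squares problem for a linear operator in the lifted space. The paper instead trains its surrogate end-to-end through a differentiable simulator for the control task; the sketch below, with an illustrative polynomial lifting, is only the classical baseline.

```python
# EDMD-style fit of a linear Koopman surrogate (classical baseline only).
import numpy as np

def phi(x):                              # simple polynomial lifting (illustrative)
    return np.concatenate([x, x**2, [1.0]])

def fit_koopman(X, X_next):
    P = np.stack([phi(x) for x in X])            # (N, m) lifted states
    P_next = np.stack([phi(x) for x in X_next])  # lifted successor states
    K, *_ = np.linalg.lstsq(P, P_next, rcond=None)
    return K                 # predicts phi(x_{t+1}) ~ phi(x_t) @ K
```

The appeal for eNMPC is that once the dynamics are (approximately) linear in the lifted space, the online optimal control problem becomes much cheaper to solve than with the mechanistic model.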
Bi-Fact: A Bidirectional Factorization-based Evaluation of Intent Extraction from UI Trajectories
Evaluating intent extraction from GUIs demands accurate, fine-grained metrics. This paper introduces Bi-Fact, a novel method that decomposes intents into atomic facts and performs bidirectional comparisons to assess precision and recall. Experiments demonstrate Bi-Fact's superior correlation with human judgments compared to existing metrics, establishing a more robust evaluation framework for UI-driven intent understanding.
Updated: 2025-03-05 09:54:25
标题: Bi-Fact:基于双向因子分解的UI轨迹意图提取评估
摘要: 评估从GUI中提取意图需要准确、细粒度的度量标准。本文介绍了一种新方法Bi-Fact,它将意图分解为原子事实,并进行双向比较来评估精确度和召回率。实验证明,与现有度量标准相比,Bi-Fact与人类判断具有更高的相关性,为基于UI的意图理解建立了更健壮的评估框架。
更新时间: 2025-03-05 09:54:25
领域: cs.AI
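A bare-bones version of the bidirectional comparison reads as follows: decompose both the predicted and the reference intent into atomic facts, then score precision (predicted facts supported by the reference) and recall (reference facts covered by the prediction). Exact-string matching is our simplification; Bi-Fact's actual fact matcher is presumably more permissive.

```python
# Precision/recall over atomic fact sets, in the spirit of Bi-Fact.
def bi_fact_scores(pred_facts: set[str], ref_facts: set[str]):
    if not pred_facts or not ref_facts:
        return 0.0, 0.0
    overlap = pred_facts & ref_facts
    precision = len(overlap) / len(pred_facts)   # how much of the prediction holds
    recall = len(overlap) / len(ref_facts)       # how much of the reference is covered
    return precision, recall

p, r = bi_fact_scores({"open settings", "enable wifi"},
                      {"open settings", "enable wifi", "join network"})
print(p, r)   # 1.0, 0.666...
```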
Number Cookbook: Number Understanding of Language Models and How to Improve It
Large language models (LLMs) can solve an increasing number of complex reasoning tasks while making surprising mistakes in basic numerical understanding and processing (such as 9.11 > 9.9). The latter ability is essential for tackling complex arithmetic and mathematical problems and serves as a foundation for most reasoning tasks, but previous work paid little attention to it or only discussed several restricted tasks (like integer addition). In this paper, we comprehensively investigate the numerical understanding and processing ability (NUPA) of LLMs. Firstly, we introduce a benchmark covering four common numerical representations and 17 distinct numerical tasks in four major categories, resulting in 41 meaningful combinations in total. These tasks are derived from primary and secondary education curricula, encompassing nearly all everyday numerical understanding and processing scenarios, and the rules of these tasks are very simple and clear. Through the benchmark, we find that current LLMs fail frequently in many of the tasks. To study the problem, we train small models with existing and potential techniques for enhancing NUPA (such as tokenizers, PEs, and number formats), comprehensively evaluating their effectiveness using our testbed. We also finetune practical-scale LLMs on our proposed NUPA tasks and find that 1) naive finetuning can improve NUPA a lot on many but not all tasks, and 2) surprisingly, techniques designed to enhance NUPA prove ineffective for finetuning pretrained models. We further explore the impact of chain-of-thought techniques on NUPA. Our work provides a more detailed and comprehensive understanding of NUPA in LLMs. Our benchmark and code are released at https://github.com/GraphPKU/number_cookbook.
Updated: 2025-03-05 09:52:30
标题: 数字菜谱:语言模型的数字理解及如何改善
摘要: 大型语言模型(LLMs)可以解决越来越多的复杂推理任务,同时在基本数字理解和处理方面出现令人惊讶的错误(如9.11>9.9)。后者的能力对于解决复杂的算术和数学问题至关重要,并且是大多数推理任务的基础,但以往的研究很少关注它,或者只讨论了几个受限制的任务(如整数加法)。在本文中,我们全面调查了LLMs的数字理解和处理能力(NUPA)。首先,我们引入了一个基准,涵盖了四种常见的数字表示和四个主要类别中的17个不同的数字任务,总共产生了41个有意义的组合。这些任务源自初中和高中的课程,涵盖了几乎所有日常数字理解和处理场景,这些任务的规则非常简单明了。通过基准测试,我们发现目前的LLMs在许多任务中经常失败。为了研究这个问题,我们使用现有和潜在的增强NUPA技术(如标记器、PE和数字格式)训练小模型,并通过我们的测试平台全面评估它们的有效性。我们还在我们提出的NUPA任务上微调实际规模的LLMs,并发现1)朴素的微调可以在许多但不是所有任务上显著改善NUPA,2)令人惊讶的是,旨在增强NUPA的技术对预训练模型的微调毫无效果。我们进一步探讨了链式思维技术对NUPA的影响。我们的工作提供了对LLMs中NUPA更详细和全面的理解。我们的基准测试和代码发布在https://github.com/GraphPKU/number_cookbook。
更新时间: 2025-03-05 09:52:30
领域: cs.CL,cs.AI
DarwinLM: Evolutionary Structured Pruning of Large Language Models
Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective solution by compressing models and directly providing end-to-end speed improvements, regardless of the hardware environment. Meanwhile, different components of the model exhibit varying sensitivities towards pruning, calling for non-uniform model compression. However, a pruning method should not only identify a capable substructure, but also account for post-compression training. To this end, we propose DarwinLM, a method for training-aware structured pruning. DarwinLM builds upon an evolutionary search process, generating multiple offspring models in each generation through mutation, and selecting the fittest for survival. To assess the effect of post-training, we incorporate a lightweight, multistep training process within the offspring population, progressively increasing the number of tokens and eliminating poorly performing models in each selection stage. We validate our method through extensive experiments on Llama-2-7B, Llama-3.1-8B and Qwen-2.5-14B-Instruct, achieving state-of-the-art performance for structured pruning. For instance, DarwinLM surpasses ShearedLlama while requiring 5x less training data during post-compression training. Code is at: https://github.com/IST-DASLab/DarwinLM
Updated: 2025-03-05 09:50:16
标题: DarwinLM: 大型语言模型的进化结构修剪
摘要: 大型语言模型(LLMs)在各种NLP任务中取得了显著的成功。然而,它们庞大的计算成本限制了它们在广泛应用中的使用,特别是在实时应用中。结构化修剪通过压缩模型并直接提供端到端速度改进,为解决这一问题提供了有效的解决方案,无论硬件环境如何。同时,模型的不同组件对修剪显示出不同的敏感性,需要非均匀模型压缩。然而,修剪方法不仅应该识别出一个能够胜任的子结构,还应考虑后压缩训练。为此,我们提出了DarwinLM,一种用于训练感知的结构化修剪方法。DarwinLM基于进化搜索过程构建,通过突变在每一代生成多个后代模型,并选择适者生存。为了评估后训练的效果,我们在后代人口中加入了一个轻量级的多步训练过程,逐渐增加令牌数量,并在每个选择阶段消除表现不佳的模型。我们通过对Llama-2-7B、Llama-3.1-8B和Qwen-2.5-14B-Instruct等进行广泛实验来验证我们的方法,实现了结构化修剪的最新性能。例如,DarwinLM在后压缩训练期间需要5倍更少的训练数据,超过了ShearedLlama。代码位于:https://github.com/IST-DASLab/DarwinLM
更新时间: 2025-03-05 09:50:16
领域: cs.LG,cs.CL
"Only ChatGPT gets me": An Empirical Analysis of GPT versus other Large Language Models for Emotion Detection in Text
This work investigates the capabilities of large language models (LLMs) in detecting and understanding human emotions through text. Drawing upon emotion models from psychology, we adopt an interdisciplinary perspective that integrates computational and affective sciences insights. The main goal is to assess how accurately they can identify emotions expressed in textual interactions and compare different models on this specific task. This research contributes to broader efforts to enhance human-computer interaction, making artificial intelligence technologies more responsive and sensitive to users' emotional nuances. By employing a methodology that involves comparisons with a state-of-the-art model on the GoEmotions dataset, we aim to gauge LLMs' effectiveness as a system for emotional analysis, paving the way for potential applications in various fields that require a nuanced understanding of human language.
Updated: 2025-03-05 09:47:49
标题: “只有ChatGPT能理解我”:对GPT与其他大型语言模型在文本情感检测中的实证分析
摘要: 这项工作研究了大型语言模型(LLMs)在通过文本检测和理解人类情绪方面的能力。借鉴心理学的情感模型,我们采用了一个跨学科的视角,整合了计算和情感科学的见解。主要目标是评估它们在识别文本交互中表达的情绪时的准确性,并比较不同模型在这一特定任务上的表现。这项研究为增强人机交互的努力做出了贡献,使人工智能技术更加响应和敏感于用户的情感细微差别。通过采用一种涉及与GoEmotions数据集上最先进模型的比较的方法,我们旨在评估LLMs作为情感分析系统的有效性,为需要对人类语言有微妙理解的各个领域的潜在应用铺平道路。
更新时间: 2025-03-05 09:47:49
领域: cs.CL,cs.AI
LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models
Text-Attributed Graphs (TAGs), where each node is associated with text descriptions, are ubiquitous in real-world scenarios. They typically exhibit distinctive structure and domain-specific knowledge, motivating the development of a Graph Foundation Model (GFM) that generalizes across diverse graphs and tasks. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures with two-stage alignment, limiting their synergistic potential. Even worse, existing methods assign out-of-vocabulary (OOV) tokens to graph nodes, leading to graph-specific semantics, token explosion, and incompatibility with task-oriented prompt templates, which hinders cross-graph and cross-task transferability. To address these challenges, we propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning. PromptGFM comprises two key components: (1) Graph Understanding Module, which explicitly prompts LLMs to replicate the finest GNN workflow within the text space, facilitating seamless GNN-LLM integration and elegant graph-text alignment; (2) Graph Inference Module, which establishes a language-based graph vocabulary ensuring expressiveness, transferability, and scalability, enabling readable instructions for LLM fine-tuning. Extensive experiments demonstrate our superiority and transferability across diverse graphs and tasks. The code is available at this: https://github.com/agiresearch/PromptGFM.
Updated: 2025-03-05 09:45:22
标题: LLM作为GNN:文本属性图基础模型的图形词汇学习
摘要: 文本属性图(TAGs),其中每个节点都与文本描述相关联,在现实世界中随处可见。它们通常展现出独特的结构和领域特定的知识,促使开发一个可以横跨不同图和任务的图基础模型(GFM)。尽管已经大力整合大型语言模型(LLMs)和图神经网络(GNNs)用于TAGs,但现有方法存在分离的架构和两阶段对齐,限制了它们的协同潜力。更糟糕的是,现有方法将未登录词(OOV)标记分配给图节点,导致图特定的语义、标记爆炸,并且与面向任务的提示模板不兼容,这阻碍了跨图和跨任务的可转移性。为了解决这些挑战,我们提出了PromptGFM,这是一个基于图词汇学习的适用于TAGs的通用GFM。PromptGFM包括两个关键组件:(1)图理解模块,明确提示LLMs在文本空间内复制最精细的GNN工作流程,促进无缝的GNN-LLM集成和优雅的图文对齐;(2)图推理模块,建立了基于语言的图词汇,确保表达能力、可转移性和可扩展性,为LLM微调提供可读的指导。大量实验证明了我们在各种图和任务之间的优越性和可转移性。该代码可在此处获得:https://github.com/agiresearch/PromptGFM。
更新时间: 2025-03-05 09:45:22
领域: cs.LG,cs.CL
Differential Machine Learning for Time Series Prediction
Accurate time series prediction is challenging due to the inherent nonlinearity and sensitivity to initial conditions. We propose a novel approach that enhances neural network predictions through differential learning, which involves training models on both the original time series and its differential series. Specifically, we develop a differential long short-term memory (Diff-LSTM) network that uses a shared LSTM cell to simultaneously process both data streams, effectively capturing intrinsic patterns and temporal dynamics. Evaluated on the Mackey-Glass, Lorenz, and Rössler chaotic time series, as well as a real-world financial dataset from ACI Worldwide Inc., our results demonstrate that the Diff-LSTM network outperforms prevalent models such as recurrent neural networks, convolutional neural networks, and bidirectional and encoder-decoder LSTM networks in both short-term and long-term predictions. This framework offers a promising solution for enhancing time series prediction, even when comprehensive knowledge of the underlying dynamics of the time series is not fully available.
Updated: 2025-03-05 09:36:57
标题: 时间序列预测的差异化机器学习
摘要: 准确的时间序列预测是具有挑战性的,因为其固有的非线性和对初始条件的敏感性。我们提出了一种通过微分学习增强神经网络预测的新方法,该方法涉及在原始时间序列和其微分序列上训练模型。具体而言,我们开发了一种差分长短期记忆(Diff-LSTM)网络,该网络使用共享的LSTM单元同时处理两个数据流,有效地捕捉内在模式和时间动态。在Mackey-Glass、Lorenz和Rössler混沌时间序列以及ACI Worldwide Inc.的真实金融数据集上进行评估,我们的结果表明Diff-LSTM网络在短期和长期预测方面优于普遍的模型,如递归神经网络、卷积神经网络和双向和编码器-解码器LSTM网络。即使对于时间序列的基础动态的全面了解并不完全可用,这种框架也提供了一个有希望的解决方案来增强时间序列预测。
更新时间: 2025-03-05 09:36:57
领域: cs.LG
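A minimal sketch of the Diff-LSTM idea follows: one shared LSTM consumes both the raw series and its first-difference series, and the two final hidden states are fused for prediction. The fusion by concatenation and all layer sizes are our assumptions, not the paper's architecture details.

```python
# Diff-LSTM-style model: shared LSTM weights over raw and differenced streams.
import torch
import torch.nn as nn

class DiffLSTM(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, T, 1)
        dx = x[:, 1:] - x[:, :-1]                          # differential series
        _, (h_raw, _) = self.lstm(x)                       # same cell processes
        _, (h_diff, _) = self.lstm(dx)                     # both data streams
        return self.head(torch.cat([h_raw[-1], h_diff[-1]], dim=-1))

model = DiffLSTM()
y = model(torch.randn(8, 50, 1))    # one-step-ahead prediction per sequence
```

Differencing exposes the local rate of change directly, so the shared cell sees both the level and the velocity of the signal, which is plausibly why the combination helps on chaotic series.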
Introduction to Artificial Consciousness: History, Current Trends and Ethical Challenges
With the significant progress of artificial intelligence (AI) and consciousness science, artificial consciousness (AC) has recently gained popularity. This work provides a broad overview of the main topics and current trends in AC. The first part traces the history of this interdisciplinary field to establish context and clarify key terminology, including the distinction between Weak and Strong AC. The second part examines major trends in AC implementations, emphasising the synergy between Global Workspace and Attention Schema, as well as the problem of evaluating the internal states of artificial systems. The third part analyses the ethical dimension of AC development, revealing both critical risks and transformative opportunities. The last part offers recommendations to guide AC research responsibly, and outlines the limitations of this study as well as avenues for future research. The main conclusion is that while AC appears both indispensable and inevitable for scientific progress, serious efforts are required to address the far-reaching impact of this innovative research path.
Updated: 2025-03-05 09:34:36
标题: 人工意识简介:历史、当前趋势和伦理挑战
摘要: 随着人工智能(AI)和意识科学的重大进展,人工意识(AC)近来受到了关注。本文提供了人工意识主要主题和当前趋势的广泛概述。第一部分追溯了这一跨学科领域的历史,以建立背景并澄清关键术语,包括弱人工意识和强人工意识之间的区别。第二部分审视了人工意识实现的主要趋势,强调了全局工作空间和注意模式之间的协同作用,以及评估人工系统内部状态的问题。第三部分分析了人工意识发展的伦理维度,揭示了关键风险和变革性机遇。最后一部分提供了指导人工意识研究负责的建议,并概述了本研究的局限性以及未来研究的途径。主要结论是,虽然人工意识对科学进步既不可或缺又不可避免,但需要认真努力来应对这种创新研究路径的深远影响。
更新时间: 2025-03-05 09:34:36
领域: cs.CY,cs.AI
Verifiable and Provably Secure Machine Unlearning
Machine unlearning aims to remove points from the training dataset of a machine learning model after training: e.g., when a user requests their data to be deleted. While many unlearning methods have been proposed, none of them enable users to audit the procedure. Furthermore, recent work shows a user is unable to verify whether their data was unlearnt from an inspection of the model parameter alone. Rather than reasoning about parameters, we propose to view verifiable unlearning as a security problem. To this end, we present the first cryptographic definition of verifiable unlearning to formally capture the guarantees of an unlearning system. In this framework, the server first computes a proof that the model was trained on a dataset D. Given a user's data point d requested to be deleted, the server updates the model using an unlearning algorithm. It then provides a proof of the correct execution of unlearning and that d is not part of D', where D' is the new training dataset (i.e., d has been removed). Our framework is generally applicable to different unlearning techniques that we abstract as admissible functions. We instantiate a protocol in the framework, based on cryptographic assumptions, using SNARKs and hash chains. Finally, we implement the protocol for three different unlearning techniques and validate its feasibility for linear regression, logistic regression, and neural networks.
Updated: 2025-03-05 09:30:22
标题: 可验证和可证明安全的机器遗忘
摘要: 机器遗忘旨在在训练后从机器学习模型的训练数据集中删除点:例如,当用户请求删除他们的数据时。虽然已经提出了许多遗忘方法,但没有一种方法使用户能够审计该过程。此外,最近的工作表明,用户无法仅通过检查模型参数来验证其数据是否被遗忘。我们提出将可验证的遗忘视为安全问题,而不是对参数进行推理。为此,我们提出了第一个可验证遗忘的密码学定义,以正式捕获遗忘系统的保证。在这个框架中,服务器首先计算一个证明,即模型是在数据集D上训练的。给定用户请求删除的数据点d,服务器使用一个遗忘算法更新模型。然后提供正确执行遗忘的证明,以及d不是D'的一部分的证明,其中D'是新的训练数据集(即d已被移除)。我们的框架通常适用于不同的遗忘技术,我们将其抽象为可接受的函数。我们在该框架中基于密码学假设实例化了一个协议,使用SNARKs和哈希链。最后,我们为三种不同的遗忘技术实现了该协议,并验证了其在线性回归、逻辑回归和神经网络中的可行性。
更新时间: 2025-03-05 09:30:22
领域: cs.LG
Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs
In education, the capability of generating human-like text of Large Language Models (LLMs) inspired work on how they can increase the efficiency of learning and teaching. We study the affordability of these models for educators and students by investigating how LLMs answer multiple-choice questions (MCQs) with respect to hardware constraints and refinement techniques. We explore this space by using generic pre-trained LLMs (the 7B, 13B, and 70B variants of LLaMA-2) to answer 162 undergraduate-level MCQs from a course on Programming Languages (PL) -- the MCQ dataset is a contribution of this work, which we make publicly available. Specifically, we dissect how different factors, such as using readily-available material -- (parts of) the course's textbook -- for fine-tuning and quantisation (to decrease resource usage) can change the accuracy of the responses. The main takeaway is that smaller textbook-based fine-tuned models outperform generic larger ones (whose pre-training requires considerable resources), making the usage of LLMs for answering MCQs affordable in terms of both resources and materials.
Updated: 2025-03-05 09:18:31
标题: 经济实惠的精细调整LLM为课程特定的多项选择题提供更好的答案
摘要: 在教育领域,大型语言模型(LLMs)生成类似人类文本的能力激发了人们对它们如何提高学习和教学效率的研究。我们通过研究LLMs在硬件约束和改进技术方面如何回答多项选择题(MCQs)来探讨这些模型对教育工作者和学生的可负担性。我们使用通用预训练的LLMs(LLaMA-2的7B、13B和70B变体)来回答编程语言(PL)课程中的162个本科水平MCQs -- MCQ数据集是本研究的贡献,我们将其公开提供。具体来说,我们分析了不同因素,如使用现成材料(课程教材的部分)进行微调和量化(以减少资源使用),可以如何改变回答的准确性。主要结论是,基于较小的教科书进行微调的模型优于通用的较大模型(其预训练需要大量资源),使得使用LLMs回答MCQs在资源和材料上更加可负担。
更新时间: 2025-03-05 09:18:31
领域: cs.CL,cs.AI
Solving Inverse Problem for Multi-armed Bandits via Convex Optimization
We consider the inverse problem of multi-armed bandits (IMAB) that are widely used in neuroscience and psychology research for behavior modelling. We first show that the IMAB problem is not convex in general, but can be relaxed to a convex problem via variable transformation. Based on this result, we propose a two-step sequential heuristic for (approximately) solving the IMAB problem. We discuss a condition where our method provides global solution to the IMAB problem with certificate, as well as approximations to further save computing time. Numerical experiments indicate that our heuristic method is more robust than directly solving the IMAB problem via repeated local optimization, and can achieve the performance of Monte Carlo methods within a significantly decreased running time. We provide the implementation of our method based on CVXPY, which allows straightforward application by users not well versed in convex optimization.
Updated: 2025-03-05 09:13:02
标题: 通过凸优化解决多臂老虎机的反问题
摘要: 我们考虑了在神经科学和心理学研究中广泛用于行为建模的多臂老虎机的逆问题(IMAB)。我们首先展示了IMAB问题一般不是凸的,但可以通过变量转换放宽成一个凸问题。基于这一结果,我们提出了一个两步序贯启发式方法来(近似)解决IMAB问题。我们讨论了我们的方法在何种条件下可为IMAB问题提供带证书的全局解,并给出了可进一步节省计算时间的近似方法。数值实验表明,我们的启发式方法比通过反复局部优化直接解决IMAB问题更为稳健,并且在显著减少运行时间的情况下可以实现蒙特卡洛方法的性能。我们提供了基于CVXPY的实现方法,这使得不熟悉凸优化的用户可以直接应用。
更新时间: 2025-03-05 09:13:02
领域: cs.CE,cs.LG,math.OC,q-bio.NC
Iterative Value Function Optimization for Guided Decoding
While Reinforcement Learning from Human Feedback (RLHF) has become the predominant method for controlling language model outputs, it suffers from high computational costs and training instability. Guided decoding, especially value-guided methods, offers a cost-effective alternative by controlling outputs without re-training models. However, the accuracy of the value function is crucial for value-guided decoding, as inaccuracies can lead to suboptimal decision-making and degraded performance. Existing methods struggle with accurately estimating the optimal value function, leading to less effective control. We propose Iterative Value Function Optimization, a novel framework that addresses these limitations through two key components: Monte Carlo Value Estimation, which reduces estimation variance by exploring diverse trajectories, and Iterative On-Policy Optimization, which progressively improves value estimation through collecting trajectories from value-guided policies. Extensive experiments on text summarization, multi-turn dialogue, and instruction following demonstrate the effectiveness of value-guided decoding approaches in aligning language models. These approaches not only achieve alignment but also significantly reduce computational costs by leveraging principled value function optimization for efficient and effective control.
Updated: 2025-03-05 09:12:25
标题: 迭代值函数优化用于引导解码
摘要: 虽然从人类反馈中进行强化学习(RLHF)已成为控制语言模型输出的主要方法,但它面临着高计算成本和训练不稳定的问题。引导解码,特别是价值引导方法,提供了一种经济有效的替代方案,可以在不重新训练模型的情况下控制输出。然而,价值函数的准确性对于价值引导解码至关重要,因为不准确可能导致次优决策和性能下降。现有方法难以准确估计最优价值函数,导致控制效果不佳。我们提出了迭代价值函数优化,这是一个通过两个关键组件解决这些限制的新框架:蒙特卡洛价值估计通过探索多样轨迹来减少估计方差,以及迭代在线策略优化,通过从价值引导策略中收集轨迹逐步改进价值估计。在文本摘要、多轮对话和遵循指令等广泛实验中,证明了价值引导解码方法在对齐语言模型方面的有效性。这些方法不仅实现了对齐,还通过利用基于原则的价值函数优化显著降低了计算成本,从而实现了高效和有效的控制。
更新时间: 2025-03-05 09:12:25
领域: cs.CL,cs.AI,cs.LG
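A minimal sketch of what value-guided decoding with Monte Carlo value estimation might look like; the combination rule pi(a|s) ∝ p_base(a|s)·exp(beta·Q(s,a)), the `rollout_fn` reward oracle, and all names below are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_value_estimate(rollout_fn, state, token, n=8):
    """Monte Carlo value estimation: average the reward of n sampled
    continuations that start by appending `token` to `state`."""
    return float(np.mean([rollout_fn(state, token) for _ in range(n)]))

def guided_step(base_logits, values, beta=1.0):
    """Value-guided decoding: sample from pi(a|s) ∝ p_base(a|s) * exp(beta * Q(s,a))."""
    scores = base_logits + beta * values
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# toy demo: 4-token vocabulary, noisy rollout reward favouring token 2
rollout = lambda s, a: float(a == 2) + rng.normal(scale=0.1)
values = np.array([mc_value_estimate(rollout, [], a) for a in range(4)])
print(guided_step(np.zeros(4), values, beta=4.0))
# Iterative on-policy optimization would now refit the value function on
# trajectories sampled from this guided policy, then repeat.
```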
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations
Visual Question Answering (VQA) is a multimodal task requiring reasoning across textual and visual inputs, which becomes particularly challenging in low-resource languages like Vietnamese due to linguistic variability and the lack of high-quality datasets. Traditional methods often rely heavily on extensive annotated datasets, computationally expensive pipelines, and large pre-trained models, specifically in the domain of Vietnamese VQA, limiting their applicability in such scenarios. To address these limitations, we propose a training framework that combines a paraphrase-based feature augmentation module with a dynamic curriculum learning strategy. Explicitly, augmented samples are considered "easy" while raw samples are regarded as "hard". The framework then utilizes a mechanism that dynamically adjusts the ratio of easy to hard samples during training, progressively modifying the same dataset to increase its difficulty level. By enabling gradual adaptation to task complexity, this approach helps the Vietnamese VQA model generalize well, thus improving overall performance. Experimental results show consistent improvements on the OpenViVQA dataset and mixed outcomes on the ViVQA dataset, highlighting both the potential and challenges of our approach in advancing VQA for Vietnamese language.
Updated: 2025-03-05 09:12:16
标题: 通过课程学习增强越南VQA在原始和增强文本表示上的效果
摘要: 视觉问答(VQA)是一项多模态任务,需要跨文本和视觉输入进行推理,这在像越南语这样的低资源语言中尤为具有挑战性,原因在于语言的变化性和缺乏高质量的数据集。传统方法通常严重依赖于大量注释数据集、计算昂贵的流水线和大型预训练模型,特别是在越南语VQA领域,限制了它们在这种情况下的适用性。为了解决这些限制,我们提出了一个训练框架,结合了基于释义的特征增强模块和动态课程学习策略。明确地,增强样本被视为“简单”,而原始样本被视为“困难”。该框架然后利用一种机制,在训练过程中动态调整简单样本与困难样本的比例,逐渐修改同一数据集以增加其难度级别。通过使模型逐渐适应任务复杂性,这种方法有助于越南语VQA模型良好泛化,从而提高整体性能。实验结果表明,在OpenViVQA数据集上表现持续提升,而在ViVQA数据集上结果不一,突显了我们的方法在推动越南语VQA方面的潜力和挑战。
更新时间: 2025-03-05 09:12:16
领域: cs.CV,cs.LG
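One plausible reading of the dynamic easy-to-hard scheduling is a decaying per-batch ratio of augmented ("easy") to raw ("hard") samples; the linear schedule and all function names below are assumptions, not the paper's exact mechanism:

```python
import numpy as np

def easy_ratio(step, total_steps, start=0.9, end=0.1):
    """Fraction of 'easy' (paraphrase-augmented) samples, decayed linearly so
    the same dataset gets progressively harder during training."""
    t = min(step / max(total_steps, 1), 1.0)
    return start + t * (end - start)

def build_batch(raw, augmented, batch_size, step, total_steps, rng):
    """Mix augmented ('easy') and raw ('hard') samples at the scheduled ratio."""
    k_easy = int(round(easy_ratio(step, total_steps) * batch_size))
    easy_idx = rng.choice(len(augmented), size=k_easy, replace=False)
    hard_idx = rng.choice(len(raw), size=batch_size - k_easy, replace=False)
    return [augmented[i] for i in easy_idx] + [raw[i] for i in hard_idx]

rng = np.random.default_rng(0)
raw = [f"q{i}" for i in range(100)]
aug = [f"q{i}-paraphrased" for i in range(100)]
print(build_batch(raw, aug, 8, step=0, total_steps=1000, rng=rng))
```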
Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations
Drawing parallels with the way biological networks are studied, we adapt the treatment--control paradigm to explainable artificial intelligence research and enrich it through multi-parametric input alterations. In this study, we propose a framework for investigating the internal inference impacted by input data augmentations. The internal changes in network operation are reflected in activation changes measured by variance, which can be decomposed into components related to each augmentation, employing Sobol indices and Shapley values. These quantities enable one to visualize sensitivity to different variables and use them for guided masking of activations. In addition, we introduce a way of single-class sensitivity analysis where the candidates are filtered according to their matching to prediction bias generated by targeted damaging of the activations. Relying on the observed parallels, we assume that the developed framework can potentially be transferred to studying biological neural networks in complex environments.
Updated: 2025-03-05 09:09:01
标题: 在同时图像增强的情况下探索卷积神经网络的专业化和敏感性
摘要: 借鉴生物网络的研究方式,我们将处理-对照范式引入可解释人工智能研究,并通过多参数输入改变来丰富它。在这项研究中,我们提出了一个框架,用于研究输入数据增强对内部推理的影响。网络运行中的内部变化体现为以方差度量的激活变化,利用Sobol指数和Shapley值,可以将其分解为与每种增强相关的分量。这些量使人们能够可视化对不同变量的敏感性,并将其用于激活的引导屏蔽。此外,我们引入了一种单类敏感性分析方法,其中候选对象根据其与通过有针对性地破坏激活所产生的预测偏差的匹配程度进行过滤。基于观察到的类比,我们认为所开发的框架有望迁移到研究复杂环境中的生物神经网络。
更新时间: 2025-03-05 09:09:01
领域: stat.ML,cs.AI,cs.LG,cs.NA,math.NA,68T07,I.2.6; G.3; I.2.10
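A small sketch of the variance-decomposition idea: estimating first-order Sobol indices S_i = Var(E[Y|X_i]) / Var(Y) by binning each input, with a toy scalar "activation" standing in for a network response; the binning estimator is a generic choice, not the paper's procedure:

```python
import numpy as np

def first_order_sobol(f, d, n=100_000, bins=50, seed=0):
    """Estimate S_i = Var(E[Y|X_i]) / Var(Y): bin each input coordinate,
    average Y within bins, then take the variance of those bin means."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, d))               # d simultaneous augmentations
    Y = f(X)
    var_y = Y.var()
    S = []
    for i in range(d):
        idx = np.minimum((X[:, i] * bins).astype(int), bins - 1)
        bin_means = np.array([Y[idx == b].mean() for b in range(bins)])
        S.append(bin_means.var() / var_y)
    return np.array(S)

# toy "activation" whose variance we decompose over two augmentation parameters
f = lambda X: np.sin(2 * np.pi * X[:, 0]) + 0.3 * X[:, 1]
print(first_order_sobol(f, d=2))              # ~[0.98, 0.01]
```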
POMDP-Driven Cognitive Massive MIMO Radar: Joint Target Detection-Tracking In Unknown Disturbances
The joint detection and tracking of a moving target embedded in an unknown disturbance represents a key feature that motivates the development of the cognitive radar paradigm. Building upon recent advancements in robust target detection with multiple-input multiple-output (MIMO) radars, this work explores the application of a Partially Observable Markov Decision Process (POMDP) framework to enhance the tracking and detection tasks in a statistically unknown environment. In the POMDP setup, the radar system is considered as an intelligent agent that continuously senses the surrounding environment, optimizing its actions to maximize the probability of detection $(P_D)$ and improve the target position and velocity estimation, all this while keeping a constant probability of false alarm $(P_{FA})$. The proposed approach employs an online algorithm that does not require any apriori knowledge of the noise statistics, and it relies on a much more general observation model than the traditional range-azimuth-elevation model employed by conventional tracking algorithms. Simulation results clearly show substantial performance improvement of the POMDP-based algorithm compared to the State-Action-Reward-State-Action (SARSA)-based one that has been recently investigated in the context of massive MIMO (MMIMO) radar systems.
Updated: 2025-03-05 09:05:23
标题: POMDP驱动的认知大规模MIMO雷达:未知干扰下的联合目标检测-跟踪
摘要: 对嵌入在未知干扰中的运动目标进行联合检测和跟踪,是推动认知雷达范式发展的关键特征。借鉴最近在多输入多输出(MIMO)雷达鲁棒目标检测方面的进展,本文探讨了应用部分可观察马尔可夫决策过程(POMDP)框架来增强统计未知环境中的跟踪和检测任务。在POMDP设置中,雷达系统被视为一个智能体,不断感知周围环境,优化其行动以最大化检测概率$(P_D)$并改善目标位置和速度估计,同时保持恒定的虚警概率$(P_{FA})$。所提出的方法采用一种在线算法,不需要任何噪声统计的先验知识,并且依赖于比传统跟踪算法采用的距离-方位-仰角模型更一般的观测模型。仿真结果清楚地显示,与最近在大规模MIMO(MMIMO)雷达系统背景下研究的基于状态-动作-奖励-状态-动作(SARSA)的算法相比,基于POMDP的算法显著提高了性能。
更新时间: 2025-03-05 09:05:23
领域: cs.LG,eess.SP,stat.AP
SLTNet: Efficient Event-based Semantic Segmentation with Spike-driven Lightweight Transformer-based Networks
Event-based semantic segmentation has great potential in autonomous driving and robotics due to the advantages of event cameras, such as high dynamic range, low latency, and low power cost. Unfortunately, current artificial neural network (ANN)-based segmentation methods suffer from high computational demands, the requirements for image frames, and massive energy consumption, limiting their efficiency and application on resource-constrained edge/mobile platforms. To address these problems, we introduce SLTNet, a spike-driven lightweight transformer-based network designed for event-based semantic segmentation. Specifically, SLTNet is built on efficient spike-driven convolution blocks (SCBs) to extract rich semantic features while reducing the model's parameters. Then, to enhance long-range contextual feature interaction, we propose novel spike-driven transformer blocks (STBs) with binary mask operations. Based on these basic blocks, SLTNet employs a high-efficiency single-branch architecture while maintaining the low energy consumption of the Spiking Neural Network (SNN). Finally, extensive experiments on the DDD17 and DSEC-Semantic datasets demonstrate that SLTNet outperforms state-of-the-art (SOTA) SNN-based methods by at most 9.06% and 9.39% mIoU, respectively, with 4.58x lower energy consumption and 114 FPS inference speed. Our code is open-sourced and available at https://github.com/longxianlei/SLTNet-v1.0.
Updated: 2025-03-05 09:03:18
标题: SLTNet:基于轻量级脉冲驱动变压器网络的高效事件驱动语义分割
摘要: 基于事件的语义分割在自动驾驶和机器人领域具有巨大潜力,这是因为事件相机具有高动态范围、低延迟和低功耗等优势。不幸的是,目前基于人工神经网络(ANN)的分割方法存在高计算需求、对图像帧的要求和大量能量消耗等问题,限制了它们在资源受限的边缘/移动平台上的效率和应用。为了解决这些问题,我们引入了SLTNet,这是一个基于脉冲驱动的轻量级Transformer网络,专为基于事件的语义分割而设计。具体来说,SLTNet建立在高效的脉冲驱动卷积块(SCBs)上,可以提取丰富的语义特征同时减少模型的参数。然后,为了增强长距离的上下文特征交互,我们提出了带有二进制掩码操作的新型脉冲驱动Transformer块(STBs)。基于这些基本块,SLTNet采用高效的单分支架构,同时保持脉冲神经网络(SNN)低能耗的特点。最后,在DDD17和DSEC-Semantic数据集上进行了大量实验,结果表明SLTNet在mIoU上分别比最先进的基于SNN的方法最多提高9.06%和9.39%,同时能耗降低4.58倍,推理速度达到114 FPS。我们的代码是开源的,可在https://github.com/longxianlei/SLTNet-v1.0获取。
更新时间: 2025-03-05 09:03:18
领域: cs.CV,cs.AI
TrafficKAN-GCN: Graph Convolutional-based Kolmogorov-Arnold Network for Traffic Flow Optimization
Urban traffic optimization is critical for improving transportation efficiency and alleviating congestion, particularly in large-scale dynamic networks. Traditional methods, such as Dijkstra's and Floyd's algorithms, provide effective solutions in static settings, but they struggle with the spatial-temporal complexity of real-world traffic flows. In this work, we propose TrafficKAN-GCN, a hybrid deep learning framework combining Kolmogorov-Arnold Networks (KAN) with Graph Convolutional Networks (GCN), designed to enhance urban traffic flow optimization. By integrating KAN's adaptive nonlinear function approximation with GCN's spatial graph learning capabilities, TrafficKAN-GCN captures both complex traffic patterns and topological dependencies. We evaluate the proposed framework using real-world traffic data from the Baltimore Metropolitan area. Compared with baseline models such as MLP-GCN, standard GCN, and Transformer-based approaches, TrafficKAN-GCN achieves competitive prediction accuracy while demonstrating improved robustness in handling noisy and irregular traffic data. Our experiments further highlight the framework's ability to redistribute traffic flow, mitigate congestion, and adapt to disruptive events, such as the Francis Scott Key Bridge collapse. This study contributes to the growing body of work on hybrid graph learning for intelligent transportation systems, highlighting the potential of combining KAN and GCN for real-time traffic optimization. Future work will focus on reducing computational overhead and integrating Transformer-based temporal modeling for enhanced long-term traffic prediction. The proposed TrafficKAN-GCN framework offers a promising direction for data-driven urban mobility management, balancing predictive accuracy, robustness, and computational efficiency.
Updated: 2025-03-05 08:59:06
标题: TrafficKAN-GCN:基于图卷积的科尔莫戈洛夫-阿诺德网络用于交通流优化
摘要: 城市交通优化对于提高交通效率和缓解拥堵至关重要,特别是在大规模动态网络中。传统方法,如Dijkstra和Floyd算法,在静态环境中提供有效解决方案,但在现实世界交通流的空间-时间复杂性方面存在困难。在这项工作中,我们提出了TrafficKAN-GCN,这是一个混合深度学习框架,结合了Kolmogorov-Arnold Networks(KAN)与Graph Convolutional Networks(GCN),旨在增强城市交通流优化。通过集成KAN的自适应非线性函数逼近和GCN的空间图学习能力,TrafficKAN-GCN捕捉了复杂的交通模式和拓扑依赖关系。我们使用来自巴尔的摩大都会地区的真实交通数据评估了提出的框架。与MLP-GCN、标准GCN和基于Transformer的方法等基准模型相比,TrafficKAN-GCN在实现竞争性预测准确性的同时,展示了在处理嘈杂和不规则交通数据方面的改进鲁棒性。我们的实验进一步突显了该框架在重新分配交通流、缓解拥堵和适应破坏性事件(如弗朗西斯·斯科特·基桥倒塌)方面的能力。这项研究为智能交通系统的混合图学习领域增添了新的内容,突出了结合KAN和GCN进行实时交通优化的潜力。未来的工作将专注于减少计算开销,并集成基于Transformer的时间建模,以增强长期交通预测。提出的TrafficKAN-GCN框架为基于数据驱动的城市移动管理提供了一个有前途的方向,平衡了预测准确性、鲁棒性和计算效率。
更新时间: 2025-03-05 08:59:06
领域: cs.LG,90B20, 68T07, 05C85, 90C90,G.2.2; I.2.6; I.5.1; I.2.8; J.7
Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems
Ensuring Service Level Objectives (SLOs) in large-scale architectures, such as Distributed Computing Continuum Systems (DCCS), is challenging due to their heterogeneous nature and varying service requirements across different devices and applications. Additionally, unpredictable workloads and resource limitations lead to fluctuating performance and violated SLOs. To improve SLO compliance in DCCS, one possibility is to apply machine learning; however, the design choices are often left to the developer. To that end, we provide a benchmark of Active Inference -- an emerging method from neuroscience -- against three established reinforcement learning algorithms (Deep Q-Network, Advantage Actor-Critic, and Proximal Policy Optimization). We consider a realistic DCCS use case: an edge device running a video conferencing application alongside a WebSocket server streaming videos. Using one of the respective algorithms, we continuously monitor key performance metrics, such as latency and bandwidth usage, to dynamically adjust parameters -- including the number of streams, frame rate, and resolution -- to optimize service quality and user experience. To test algorithms' adaptability to constant system changes, we simulate dynamically changing SLOs and both instant and gradual data-shift scenarios, such as network bandwidth limitations and fluctuating device thermal states. Although the evaluated algorithms all showed advantages and limitations, our findings demonstrate that Active Inference is a promising approach for ensuring SLO compliance in DCCS, offering lower memory usage, stable CPU utilization, and fast convergence.
Updated: 2025-03-05 08:56:26
标题: 在分布式计算连续系统中基准测试动态SLO合规性
摘要: 在大规模架构(如分布式计算连续系统,DCCS)中确保服务级别目标(SLOs)具有挑战性,这源于其异构性质以及不同设备和应用程序之间各异的服务需求。此外,不可预测的工作负载和资源限制导致性能波动和SLO违规。为了提高DCCS中SLO的合规性,一种可能性是应用机器学习;然而,设计选择通常由开发人员决定。为此,我们对Active Inference(一种来自神经科学的新兴方法)与三种成熟的强化学习算法(深度Q网络、优势演员-评论家和近端策略优化)进行了基准比较。我们考虑了一个现实的DCCS用例:一台边缘设备运行视频会议应用程序,同时运行一个流式传输视频的WebSocket服务器。使用其中一种算法,我们持续监控关键性能指标(如延迟和带宽使用量),动态调整参数(包括流数量、帧率和分辨率)以优化服务质量和用户体验。为了测试算法对系统持续变化的适应性,我们模拟了动态变化的SLO以及瞬时和渐进的数据漂移场景,如网络带宽限制和设备热状态波动。尽管被评估的算法各有优势和局限,我们的研究结果表明Active Inference是确保DCCS中SLO合规性的一种有前途的方法,具有更低的内存使用、稳定的CPU利用率和快速收敛。
更新时间: 2025-03-05 08:56:26
领域: cs.DC,cs.AI,cs.LG,cs.NI,cs.PF
Reduced Spatial Dependency for More General Video-level Deepfake Detection
As one of the prominent AI-generated content, Deepfake has raised significant safety concerns. Although it has been demonstrated that temporal consistency cues offer better generalization capability, existing methods based on CNNs inevitably introduce spatial bias, which hinders the extraction of intrinsic temporal features. To address this issue, we propose a novel method called Spatial Dependency Reduction (SDR), which integrates common temporal consistency features from multiple spatially-perturbed clusters, to reduce the dependency of the model on spatial information. Specifically, we design multiple Spatial Perturbation Branch (SPB) to construct spatially-perturbed feature clusters. Subsequently, we utilize the theory of mutual information and propose a Task-Relevant Feature Integration (TRFI) module to capture temporal features residing in similar latent space from these clusters. Finally, the integrated feature is fed into a temporal transformer to capture long-range dependencies. Extensive benchmarks and ablation studies demonstrate the effectiveness and rationale of our approach.
Updated: 2025-03-05 08:51:55
标题: 减少空间依赖性以实现更普遍的视频级深度伪造检测
摘要: 作为突出的人工智能生成内容之一,Deepfake引起了重大的安全担忧。尽管已经证明时间一致性线索提供了更好的泛化能力,但基于CNN的现有方法不可避免地引入了空间偏差,这妨碍了内在时间特征的提取。为了解决这个问题,我们提出了一种称为空间依赖性降低(SDR)的新方法,它集成了来自多个空间扰动聚类的常见时间一致性特征,以减少模型对空间信息的依赖性。具体地,我们设计了多个空间扰动分支(SPB)来构建空间扰动特征聚类。随后,我们利用互信息理论并提出了一个任务相关特征整合(TRFI)模块,来捕捉这些聚类中相似潜在空间中的时间特征。最后,集成特征被输入到一个时间变换器中,以捕捉长程依赖关系。广泛的基准测试和消融研究证明了我们方法的有效性和合理性。
更新时间: 2025-03-05 08:51:55
领域: cs.CV,cs.CR
Conformal Transformations for Symmetric Power Transformers
Transformers with linear attention offer significant computational advantages over softmax-based transformers but often suffer from degraded performance. The symmetric power (sympow) transformer, a particular type of linear transformer, addresses some of this performance gap by leveraging symmetric tensor embeddings, achieving comparable performance to softmax transformers. However, the finite capacity of the recurrent state in sympow transformers limits their ability to retain information, leading to performance degradation when scaling the training or evaluation context length. To address this issue, we propose the conformal-sympow transformer, which dynamically frees up capacity using data-dependent multiplicative gating and adaptively stores information using data-dependent rotary embeddings. Preliminary experiments on the LongCrawl64 dataset demonstrate that conformal-sympow overcomes the limitations of sympow transformers, achieving robust performance across scaled training and evaluation contexts.
Updated: 2025-03-05 08:50:53
标题: 对称幂变压器的共形变换
摘要: 线性注意力的变压器相比基于softmax的变压器具有显著的计算优势,但通常会受到性能下降的影响。对称幂(sympow)变压器是一种特殊类型的线性变压器,通过利用对称张量嵌入,弥补了部分性能差距,达到了与softmax变压器相当的性能。然而,sympow变压器中循环状态的有限容量限制了其保留信息的能力,在扩展训练或评估上下文长度时导致性能下降。为了解决这个问题,我们提出了conformal-sympow变压器,它通过数据相关的乘法门控动态释放容量,并通过数据相关的旋转嵌入自适应地存储信息。在LongCrawl64数据集上的初步实验表明,conformal-sympow克服了sympow变压器的局限性,在扩展的训练和评估上下文中实现了稳健的性能。
更新时间: 2025-03-05 08:50:53
领域: cs.LG,cs.AI
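A numpy sketch of the two stated mechanisms on a generic linear-attention state: a data-dependent multiplicative gate that frees capacity, and RoPE-style rotary placement of keys; how sympow's symmetric tensor embeddings enter is omitted, and the update form below is our assumption:

```python
import numpy as np

def rotary(x, pos, theta=10000.0):
    """RoPE-style rotation: each consecutive feature pair is rotated by a
    position-dependent angle, so keys are stored at distinct 'phases'."""
    d = x.shape[-1]
    freqs = theta ** (-np.arange(0, d, 2) / d)
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def conformal_step(S, k, v, gate, pos):
    """One recurrent update of a linear-attention state S: a data-dependent
    gate in (0,1) erases old content; the rotated key stores the new
    association at a position-dependent angle."""
    return gate * S + np.outer(v, rotary(k, pos))

S = np.zeros((4, 4))
for t in range(3):                       # toy sequence of key/value pairs
    S = conformal_step(S, np.ones(4) / 2, np.arange(4.0), gate=0.9, pos=t)
print(S.round(2))
```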
Quantum-Inspired Privacy-Preserving Federated Learning Framework for Secure Dementia Classification
Dementia, a neurological disorder impacting millions globally, presents significant challenges in diagnosis and patient care. With the rise of privacy concerns and security threats in healthcare, federated learning (FL) has emerged as a promising approach to enable collaborative model training across decentralized datasets without exposing sensitive patient information. However, FL remains vulnerable to advanced security breaches such as gradient inversion and eavesdropping attacks. This paper introduces a novel framework that integrates federated learning with quantum-inspired encryption techniques for dementia classification, emphasizing privacy preservation and security. Leveraging quantum key distribution (QKD), the framework ensures secure transmission of model weights, protecting against unauthorized access and interception during training. The methodology utilizes a convolutional neural network (CNN) for dementia classification, with federated training conducted across distributed healthcare nodes, incorporating QKD-encrypted weight sharing to secure the aggregation process. Experimental evaluations conducted on MRI data from the OASIS dataset demonstrate that the proposed framework achieves identical accuracy levels to a baseline model while enhancing data security and reducing loss by almost 1% compared to the classical baseline model. The framework offers significant implications for democratizing access to AI-driven dementia diagnostics in low- and middle-income countries, addressing critical resource and privacy constraints. This work contributes a robust, scalable, and secure federated learning solution for healthcare applications, paving the way for broader adoption of quantum-inspired techniques in AI-driven medical research.
Updated: 2025-03-05 08:49:31
标题: 量子启发的隐私保护联邦学习框架用于安全痴呆分类
摘要: 痴呆症是一种影响全球数百万人的神经系统疾病,诊断和患者护理面临着重大挑战。随着医疗保健领域隐私和安全威胁的增加,联邦学习(FL)已成为一种有希望的方法,可以在分散的数据集之间进行协作模型训练,而不会暴露敏感患者信息。然而,FL仍然容易受到高级安全漏洞的攻击,如梯度反演和窃听攻击。本文介绍了一种将联邦学习与量子启发式加密技术相结合的新框架,用于痴呆症分类,强调隐私保护和安全性。借助量子密钥分发(QKD),该框架确保模型权重的安全传输,保护免受未经授权的访问和拦截。方法利用卷积神经网络(CNN)进行痴呆症分类,通过分布式的医疗保健节点进行联邦训练,结合QKD加密权重共享来保护聚合过程。在来自OASIS数据集的MRI数据上进行的实验评估表明,所提出的框架在保持与基线模型相同准确度水平的同时,增强了数据安全性,并将损失减少了近1%。该框架对于在低收入和中等收入国家实现基于人工智能的痴呆症诊断的民主化访问具有重要意义,解决了关键资源和隐私约束。这项工作为医疗应用提供了一个强大、可扩展和安全的联邦学习解决方案,为人工智能驱动的医学研究中更广泛采用量子启发式技术铺平了道路。
更新时间: 2025-03-05 08:49:31
领域: cs.CR,cs.DC
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing
Effectively incorporating external knowledge into Large Language Models (LLMs) is crucial for enhancing their capabilities and addressing real-world needs. Retrieval-Augmented Generation (RAG) offers an effective method for achieving this by retrieving the most relevant fragments and feeding them into LLMs. However, the advancements in context window size for LLMs offer an alternative approach, raising the question of whether RAG remains necessary for effectively handling external knowledge. Several existing studies provide inconclusive comparisons between RAG and long-context (LC) LLMs, largely due to limitations in the benchmark designs. In this paper, we present LaRA, a novel benchmark specifically designed to rigorously compare RAG and LC LLMs. LaRA encompasses 2326 test cases across four practical QA task categories and three types of naturally occurring long texts. Through systematic evaluation of seven open-source and four proprietary LLMs, we find that the optimal choice between RAG and LC depends on a complex interplay of factors, including the model's parameter size, long-text capabilities, context length, task type, and the characteristics of the retrieved chunks. Our findings provide actionable guidelines for practitioners to effectively leverage both RAG and LC approaches in developing and deploying LLM applications. Our code and dataset is provided at: \href{https://github.com/Alibaba-NLP/LaRA}{\textbf{https://github.com/Alibaba-NLP/LaRA}}.
Updated: 2025-03-05 08:48:25
标题: LaRA:对检索增强生成与长上下文LLMs进行基准测试——LC或RAG路由没有银弹
摘要: 将外部知识有效地整合到大型语言模型(LLMs)中对于增强其能力并满足现实需求至关重要。检索增强生成(RAG)提供了一种有效的方法,通过检索最相关的片段并将其提供给LLMs来实现这一目标。然而,LLMs上下文窗口大小的进展提供了一种替代方法,引发了一个问题:为了有效处理外部知识,RAG是否仍然必要。一些现有研究对RAG和长上下文(LC)LLMs的比较结论并不一致,这主要是由于基准设计的局限性。在本文中,我们提出了LaRA,一个专门设计的新型基准,用于严格比较RAG和LC LLMs。LaRA涵盖了2326个测试案例,覆盖4个实际QA任务类别和3种自然出现的长文本。通过对七个开源和四个专有LLMs进行系统评估,我们发现RAG和LC之间的最佳选择取决于多种因素的复杂相互作用,包括模型的参数规模、长文本能力、上下文长度、任务类型以及检索片段的特征。我们的研究结果为从业者提供了可操作的指导方针,以便在开发和部署LLM应用程序时有效地结合RAG和LC方法。我们的代码和数据集可在以下链接获得:https://github.com/Alibaba-NLP/LaRA。
更新时间: 2025-03-05 08:48:25
领域: cs.CL,cs.AI
Gated Delta Networks: Improving Mamba2 with Delta Rule
Linear Transformers have gained attention as efficient alternatives to standard Transformers, but their performance in retrieval and long-context tasks has been limited. To address these limitations, recent work has explored two distinct mechanisms: gating for adaptive memory control and the delta update rule for precise memory modifications. We observe that these mechanisms are complementary: gating enables rapid memory erasure while the delta rule facilitates targeted updates. Building on this insight, we introduce the gated delta rule and develop a parallel training algorithm optimized for modern hardware. Our proposed architecture, Gated DeltaNet, consistently surpasses existing models like Mamba2 and DeltaNet across multiple benchmarks, including language modeling, common-sense reasoning, in-context retrieval, length extrapolation, and long-context understanding. We further enhance performance by developing hybrid architectures that combine Gated DeltaNet layers with sliding window attention or Mamba2 layers, achieving both improved training efficiency and superior task performance.
Updated: 2025-03-05 08:47:27
标题: 门控增量网络:利用增量规则改进Mamba2
摘要: 线性变压器作为标准变压器的高效替代方案引起了关注,但其在检索和长上下文任务中的表现有限。为了解决这些限制,最近的研究探索了两种不同的机制:用于自适应记忆控制的门控和用于精确记忆修改的增量更新规则。我们观察到这些机制是互补的:门控使快速记忆擦除成为可能,而增量规则促进了有针对性的更新。基于这一洞察,我们引入了门控增量规则,并开发了一种针对现代硬件优化的并行训练算法。我们提出的架构,门控增量网络(Gated DeltaNet),在多个基准测试中始终超越现有模型,包括语言建模、常识推理、上下文检索、长度外推和长上下文理解。我们进一步提高性能,开发了将门控增量网络层与滑动窗口注意力或Mamba2层相结合的混合架构,实现了训练效率的提高和任务性能的卓越表现。
更新时间: 2025-03-05 08:47:27
领域: cs.CL,cs.LG
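The gated delta rule itself admits a compact sketch; the exact factorization used in Gated DeltaNet may differ, so treat the recurrence below as one plausible reading of the abstract:

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One step of a gated delta rule, as we read it:
        S_t = alpha * (S - beta * (S k) k^T) + beta * v k^T
    alpha in (0,1) is the gate (rapid erasure of the whole memory);
    the beta term is the delta rule (targeted overwrite under key k)."""
    S = alpha * (S - beta * np.outer(S @ k, k))
    return S + beta * np.outer(v, k)

S = np.zeros((4, 4))
k = np.array([1.0, 0.0, 0.0, 0.0])       # unit-norm key
S = gated_delta_step(S, k, np.array([1.0, 2.0, 3.0, 4.0]), alpha=0.95, beta=1.0)
print(S @ k)                              # retrieves ~[1, 2, 3, 4]
```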
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper, we introduce \textbf{ChemVLM}, an open-source chemical multimodal large language model specifically designed for chemical applications. ChemVLM is trained on a carefully curated bilingual multimodal dataset that enhances its ability to understand both textual and visual chemical information, including molecular structures, reactions, and chemistry examination questions. We develop three datasets for comprehensive evaluation, tailored to Chemical Optical Character Recognition (OCR), Multimodal Chemical Reasoning (MMCR), and Multimodal Molecule Understanding tasks. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks. Experimental results demonstrate that ChemVLM achieves competitive performance across all evaluated tasks. Our model can be found at https://huggingface.co/AI4Chem/ChemVLM-26B.
Updated: 2025-03-05 08:43:44
标题: ChemVLM: 探索化学领域中多模态大型语言模型的能力
摘要: 大型语言模型(LLMs)取得了显著的成功,并已应用于各种科学领域,包括化学。然而,许多化学任务需要处理视觉信息,这是现有化学LLMs无法成功处理的。这引起了对能够在化学领域集成多模态信息的模型的日益增长的需求。在本文中,我们介绍了\textbf{ChemVLM},这是一个专门为化学应用设计的开源化学多模态大型语言模型。ChemVLM在一个精心策划的双语多模态数据集上进行训练,增强了其理解文本和视觉化学信息的能力,包括分子结构、反应和化学考试问题。我们开发了三个数据集进行全面评估,定制化学光学字符识别(OCR)、多模态化学推理(MMCR)和多模态分子理解任务。我们在各种任务上对ChemVLM进行了与一系列开源和专有多模态大型语言模型的基准测试。实验结果表明,ChemVLM在所有评估任务中均取得了竞争性表现。我们的模型可以在https://huggingface.co/AI4Chem/ChemVLM-26B找到。
更新时间: 2025-03-05 08:43:44
领域: cs.LG,cs.CV
Flow-based Bayesian filtering for high-dimensional nonlinear stochastic dynamical systems
Bayesian filtering for high-dimensional nonlinear stochastic dynamical systems is a fundamental yet challenging problem in many fields of science and engineering. Existing methods face significant obstacles: Gaussian-based filters struggle with non-Gaussian distributions, while sequential Monte Carlo methods are computationally intensive and prone to particle degeneracy in high dimensions. Although generative models in machine learning have made significant progress in modeling high-dimensional non-Gaussian distributions, their inefficiency in online updating limits their applicability to filtering problems. To address these challenges, we propose a flow-based Bayesian filter (FBF) that integrates normalizing flows to construct a novel latent linear state-space model with Gaussian filtering distributions. This framework facilitates efficient density estimation and sampling using invertible transformations provided by normalizing flows, and it enables the construction of filters in a data-driven manner, without requiring prior knowledge of system dynamics or observation models. Numerical experiments demonstrate the superior accuracy and efficiency of FBF.
Updated: 2025-03-05 08:42:40
标题: 基于流的贝叶斯过滤在高维非线性随机动力系统中的应用
摘要: 贝叶斯滤波是许多科学和工程领域中一个基本但具有挑战性的问题,特别是对于高维非线性随机动力系统。现有方法面临重大障碍:基于高斯的滤波器在处理非高斯分布时困难重重,而序贯蒙特卡洛方法在高维情况下计算密集且容易出现粒子退化。尽管机器学习中的生成模型在建模高维非高斯分布方面取得了重大进展,但它们在线更新的低效性限制了它们在滤波问题中的适用性。为了解决这些挑战,我们提出了一种基于流的贝叶斯滤波器(FBF),它整合了归一化流以构建一个具有高斯滤波分布的新颖潜在线性状态空间模型。该框架利用归一化流提供的可逆变换实现了高效的密度估计和采样,并且它使得滤波器可以以数据驱动的方式构建,无需事先了解系统动态或观测模型。数值实验证明了FBF的优越准确性和效率。
更新时间: 2025-03-05 08:42:40
领域: math.NA,cs.LG,cs.NA,stat.ML
Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions
As the potential for autonomous vehicles to be integrated on a large scale into modern traffic systems continues to grow, ensuring safe navigation in dynamic environments is crucial for smooth integration. To guarantee safety and prevent collisions, autonomous vehicles must be capable of accurately predicting the trajectories of surrounding traffic agents. Over the past decade, significant efforts from both academia and industry have been dedicated to designing solutions for precise trajectory forecasting. These efforts have produced a diverse range of approaches, raising questions about the differences between these methods and whether trajectory prediction challenges have been fully addressed. This paper reviews a substantial portion of recent trajectory prediction methods and devises a taxonomy to classify existing solutions. A general overview of the prediction pipeline is also provided, covering input and output modalities, modeling features, and prediction paradigms discussed in the literature. In addition, the paper discusses active research areas within trajectory prediction, addresses the posed research questions, and highlights the remaining research gaps and challenges.
Updated: 2025-03-05 08:38:51
标题: 自动驾驶的轨迹预测:进展、限制和未来方向
摘要: 随着自动驾驶汽车被大规模整合到现代交通系统中的潜力不断增长,确保在动态环境中安全导航对于顺利整合至关重要。为了保证安全并防止碰撞,自动驾驶汽车必须能够准确预测周围交通参与者的轨迹。在过去的十年中,学术界和工业界投入了大量精力设计精确的轨迹预测解决方案。这些努力产生了各种各样的方法,引发了关于这些方法之间的差异以及轨迹预测挑战是否已经得到充分解决的问题。本文回顾了最近轨迹预测方法的大部分内容,并设计了一个分类现有解决方案的分类法。文中还提供了对预测流程的概述,涵盖了文献中讨论的输入和输出模态、建模特征和预测范式。此外,本文讨论了轨迹预测中的活跃研究领域,回答了提出的研究问题,并强调了剩余的研究空白和挑战。
更新时间: 2025-03-05 08:38:51
领域: cs.RO,cs.AI,cs.CV,cs.LG
Learning High-Degree Parities: The Crucial Role of the Initialization
Parities have become a standard benchmark for evaluating learning algorithms. Recent works show that regular neural networks trained by gradient descent can efficiently learn degree $k$ parities on uniform inputs for constant $k$, but fail to do so when $k$ and $d-k$ grow with $d$ (here $d$ is the ambient dimension). However, the case where $k=d-O_d(1)$ (almost-full parities), including the degree $d$ parity (the full parity), has remained unsettled. This paper shows that for gradient descent on regular neural networks, learnability depends on the initial weight distribution. On one hand, the discrete Rademacher initialization enables efficient learning of almost-full parities, while on the other hand, its Gaussian perturbation with large enough constant standard deviation $\sigma$ prevents it. The positive result for almost-full parities is shown to hold up to $\sigma=O(d^{-1})$, pointing to questions about a sharper threshold phenomenon. Unlike statistical query (SQ) learning, where a singleton function class like the full parity is trivially learnable, our negative result applies to a fixed function and relies on an initial gradient alignment measure of potential broader relevance to neural networks learning.
Updated: 2025-03-05 08:37:17
标题: 学习高次奇偶性:初始化的关键作用
摘要: 奇偶函数已成为评估学习算法的标准基准。最近的研究表明,通过梯度下降训练的常规神经网络可以在均匀输入上高效学习次数为$k$($k$为常数)的奇偶函数,但当$k$和$d-k$随$d$增长时则会失败(这里$d$是环境维度)。然而,$k=d-O_d(1)$(几乎满次奇偶函数)的情况,包括次数为$d$的奇偶函数(完全奇偶函数),仍未解决。本文表明,对于常规神经网络上的梯度下降,可学习性取决于初始权重分布。一方面,离散Rademacher初始化可以高效学习几乎满次奇偶函数;另一方面,带有足够大常数标准差$\sigma$的高斯扰动则会阻止这种学习。几乎满次奇偶函数的积极结果被证明在$\sigma=O(d^{-1})$内成立,这指向了关于更尖锐阈值现象的问题。不同于统计查询(SQ)学习(其中像完全奇偶函数这样的单元素函数类是平凡可学习的),我们的负面结果适用于固定函数,并依赖于一种初始梯度对齐度量,该度量可能与神经网络学习有更广泛的相关性。
更新时间: 2025-03-05 08:37:17
领域: cs.LG
Scale-Invariant Object Detection by Adaptive Convolution with Unified Global-Local Context
Dense features are important for detecting minute objects in images. Unfortunately, despite the remarkable efficacy of the CNN models in multi-scale object detection, CNN models often fail to detect smaller objects in images due to the loss of dense features during the pooling process. Atrous convolution addresses this issue by applying sparse kernels. However, sparse kernels often can lose the multi-scale detection efficacy of the CNN model. In this paper, we propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the efficientDet model. A fixed atrous rate limits the performance of the CNN models in the convolutional layers. To overcome this limitation, we introduce a switchable mechanism that allows for dynamically adjusting the atrous rate during the forward pass. The proposed SAC-Net encapsulates the benefits of both low-level and high-level features to achieve improved performance on multi-scale object detection tasks, without losing the dense features. Further, we apply a depth-wise switchable atrous rate to the proposed network, to improve the scale-invariant features. Finally, we apply global context on the proposed model. Our extensive experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms the state-of-the-art models by a significant margin in terms of accuracy.
Updated: 2025-03-05 08:36:27
标题: 通过统一的全局-局部上下文自适应卷积实现尺度不变物体检测
摘要: 密集特征对于在图像中检测微小物体非常重要。然而,尽管CNN模型在多尺度物体检测中具有显著的效果,由于在池化过程中丢失密集特征,CNN模型通常无法检测到图像中的较小物体。空洞卷积通过应用稀疏核来解决这个问题。然而,稀疏核往往会丧失CNN模型的多尺度检测效果。在本文中,我们提出了一种基于efficientDet模型的可切换(自适应)空洞卷积网络(SAC-Net)的目标检测模型。固定的空洞率限制了CNN模型在卷积层中的性能。为了克服这一限制,我们引入了一个可切换机制,允许在前向传递过程中动态调整空洞率。所提出的SAC-Net封装了低级和高级特征的优点,以实现在多尺度物体检测任务中的改进性能,而不丧失密集特征。此外,我们在所提出的网络上应用了深度可切换空洞率,以改进尺度不变特征。最后,我们在所提出的模型上应用了全局上下文。我们在基准数据集上进行了大量实验,结果表明所提出的SAC-Net在准确性方面明显优于现有模型。
更新时间: 2025-03-05 08:36:27
领域: cs.CV,cs.AI
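A hedged PyTorch sketch of a "switchable" atrous convolution: one shared 3x3 kernel applied at several dilation rates, blended by a softmax switch computed from global context; the rate set, the global-average gate, and layer sizes are illustrative assumptions, not SAC-Net's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableAtrousConv(nn.Module):
    """3x3 conv whose atrous rate is chosen adaptively per input: the shared
    kernel runs at each rate and the outputs are mixed by a learned switch."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.rates = rates
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.05)
        self.switch = nn.Linear(in_ch, len(rates))   # global-context gate

    def forward(self, x):
        g = F.softmax(self.switch(x.mean(dim=(2, 3))), dim=-1)  # (B, n_rates)
        # padding=r with dilation=r keeps spatial size for a 3x3 kernel
        outs = [F.conv2d(x, self.weight, padding=r, dilation=r) for r in self.rates]
        y = torch.stack(outs, dim=1)                 # (B, n_rates, C, H, W)
        return (g[:, :, None, None, None] * y).sum(dim=1)

x = torch.randn(2, 16, 32, 32)
print(SwitchableAtrousConv(16, 32)(x).shape)         # torch.Size([2, 32, 32, 32])
```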
Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator
Dataset distillation has emerged as a technique aiming to condense informative features from large, natural datasets into a compact and synthetic form. While recent advancements have refined this technique, its performance is bottlenecked by the prevailing class-specific synthesis paradigm. Under this paradigm, synthetic data is optimized exclusively for a pre-assigned one-hot label, creating an implicit class barrier in feature condensation. This leads to inefficient utilization of the distillation budget and oversight of inter-class feature distributions, which ultimately limits the effectiveness and efficiency, as demonstrated in our analysis. To overcome these constraints, this paper presents the Inter-class Feature Compensator (INFER), an innovative distillation approach that transcends the class-specific data-label framework widely utilized in current dataset distillation methods. Specifically, INFER leverages a Universal Feature Compensator (UFC) to enhance feature integration across classes, enabling the generation of multiple additional synthetic instances from a single UFC input. This significantly improves the efficiency of the distillation budget. Moreover, INFER enriches inter-class interactions during the distillation, thereby enhancing the effectiveness and generalizability of the distilled data. By allowing for the linear interpolation of labels similar to those in the original dataset, INFER meticulously optimizes the synthetic data and dramatically reduces the size of soft labels in the synthetic dataset to almost zero, establishing a new benchmark for efficiency and effectiveness in dataset distillation. In practice, INFER demonstrates state-of-the-art performance across benchmark datasets. For instance, in the ipc = 50 setting on ImageNet-1k with the same compression level, it outperforms SRe2L by 34.5% using ResNet18.
Updated: 2025-03-05 08:35:41
标题: 突破类别障碍:通过类间特征补偿器实现高效数据集提炼
摘要: 数据集提炼已经成为一种旨在将大型自然数据集中的信息特征压缩为紧凑且合成形式的技术。虽然最近的进展已经完善了这种技术,但其性能受到目前普遍存在的特定类别综合范式的瓶颈影响。在这种范式下,合成数据仅针对预先分配的独热标签进行优化,从而在特征压缩中产生了一个隐含的类别障碍。这导致了对提炼预算的低效利用以及对跨类别特征分布的忽视,最终限制了效果和效率,正如我们的分析所证明的那样。为了克服这些约束,本文提出了Inter-class Feature Compensator(INFER),这是一种创新的提炼方法,超越了目前数据集提炼方法中广泛采用的特定类别数据标签框架。具体而言,INFER利用Universal Feature Compensator(UFC)来增强跨类别特征集成,从而实现从单个UFC输入生成多个额外合成实例。这显著提高了提炼预算的效率。此外,INFER在提炼过程中丰富了跨类别交互,从而增强了提炼数据的效果和泛化能力。通过允许对与原始数据集中相似的标签进行线性插值,INFER精心优化了合成数据,并将合成数据集中的软标签大小几乎降至零,为数据集提炼中的效率和效果建立了新的基准。在实践中,INFER在基准数据集上展示了最先进的性能。例如,在ImageNet-1k的ipc = 50设置中,使用ResNet18,在相同的压缩级别下,它比SRe2L的表现高出34.5%。
更新时间: 2025-03-05 08:35:41
领域: cs.CV,cs.LG
DP-LDMs: Differentially Private Latent Diffusion Models
Diffusion models (DMs) are one of the most widely used generative models for producing high quality images. However, a flurry of recent papers points out that DMs are among the least private forms of image generators, by extracting a significant number of near-identical replicas of training images from DMs. Existing privacy-enhancing techniques for DMs, unfortunately, do not provide a good privacy-utility tradeoff. In this paper, we aim to improve the current state of DMs with differential privacy (DP) by adopting the $\textit{Latent}$ Diffusion Models (LDMs). LDMs are equipped with powerful pre-trained autoencoders that map the high-dimensional pixels into lower-dimensional latent representations, in which DMs are trained, yielding a more efficient and fast training of DMs. Rather than fine-tuning the entire LDMs, we fine-tune only the $\textit{attention}$ modules of LDMs with DP-SGD, reducing the number of trainable parameters by roughly $90\%$ and achieving a better privacy-accuracy trade-off. Our approach allows us to generate realistic, high-dimensional images (256x256) conditioned on text prompts with DP guarantees, which, to the best of our knowledge, has not been attempted before. Our approach provides a promising direction for training more powerful, yet training-efficient differentially private DMs, producing high-quality DP images. Our code is available at https://anonymous.4open.science/r/DP-LDM-4525.
Updated: 2025-03-05 08:34:25
标题: DP-LDMs: 差分隐私潜在扩散模型
摘要: 扩散模型(DMs)是生成高质量图像的最广泛使用的生成模型之一。然而,最近一系列论文指出,通过从DMs中提取大量接近相同的训练图像副本,DMs是最不私密的图像生成器之一。目前用于DMs的增强隐私技术,不幸的是,没有提供良好的隐私-效用平衡。在本文中,我们旨在通过采用$\textit{潜在}$扩散模型(LDMs)改进当前DMs状态。LDMs配备有强大的预训练自动编码器,将高维像素映射到较低维度的潜在表示中,其中DMs进行训练,从而实现更高效和快速的DMs训练。我们并非微调整个LDMs,而是仅使用DP-SGD对LDMs的$\textit{注意}$模块进行微调,将可训练参数数量减少约$90\%$,并获得更好的隐私-准确性平衡。我们的方法允许我们生成受文本提示条件的逼真、高维图像(256x256),并确保DP保证,据我们所知,这是以前未尝试过的。我们的方法为训练更强大、训练效率更高的差分私密DMs提供了一个有希望的方向,生成高质量的DP图像。我们的代码可在https://anonymous.4open.science/r/DP-LDM-4525找到。
更新时间: 2025-03-05 08:34:25
领域: stat.ML,cs.CR,cs.LG
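The attention-only DP-SGD idea can be sketched with manual per-sample clipping and Gaussian noise; the "attn" name filter, hyperparameters, and size-1 microbatching are assumptions, and a real run would additionally track the privacy budget with an accountant:

```python
import torch
import torch.nn as nn

def dp_sgd_step(model, loss_fn, batch, lr=1e-2, clip=1.0, sigma=0.5):
    """Manual DP-SGD over only the attention parameters: per-sample gradients
    are clipped to norm `clip`, summed, noised, then applied."""
    params = [p for n, p in model.named_parameters() if "attn" in n]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in batch:                               # microbatches of size 1
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
        scale = min(1.0, (clip / (norm + 1e-6)).item())
        for s, p in zip(summed, params):
            s += p.grad * scale
    with torch.no_grad():
        for s, p in zip(summed, params):
            p -= lr * (s + torch.randn_like(s) * sigma * clip) / len(batch)

class Tiny(nn.Module):                               # toy stand-in for an LDM
    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(4, 4)                  # only this gets DP updates
        self.head = nn.Linear(4, 2)
    def forward(self, x):
        return self.head(torch.relu(self.attn(x)))

batch = [(torch.randn(4), torch.tensor(0)) for _ in range(4)]
dp_sgd_step(Tiny(), nn.functional.cross_entropy, batch)
```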
Exploring the Potential of Large Language Models as Predictors in Dynamic Text-Attributed Graphs
With the rise of large language models (LLMs), there has been growing interest in Graph Foundation Models (GFMs) for graph-based tasks. By leveraging LLMs as predictors, GFMs have demonstrated impressive generalizability across various tasks and datasets. However, existing research on LLMs as predictors has predominantly focused on static graphs, leaving their potential in dynamic graph prediction unexplored. In this work, we pioneer using LLMs for predictive tasks on dynamic graphs. We identify two key challenges: the constraints imposed by context length when processing large-scale historical data and the significant variability in domain characteristics, both of which complicate the development of a unified predictor. To address these challenges, we propose the GraphAgent-Dynamic (GAD) Framework, a multi-agent system that leverages collaborative LLMs. In contrast to using a single LLM as the predictor, GAD incorporates global and local summary agents to generate domain-specific knowledge, enhancing its transferability across domains. Additionally, knowledge reflection agents enable adaptive updates to GAD's knowledge, maintaining a unified and self-consistent architecture. In experiments, GAD demonstrates performance comparable to or even exceeds that of full-supervised graph neural networks without dataset-specific training. Finally, to enhance the task-specific performance of LLM-based predictors, we discuss potential improvements, such as dataset-specific fine-tuning to LLMs. By developing tailored strategies for different tasks, we provide new insights for the future design of LLM-based predictors.
Updated: 2025-03-05 08:28:11
标题: 探索大型语言模型在动态文本属性图中作为预测器的潜力
摘要: 随着大型语言模型(LLMs)的兴起,对于基于图的任务,对图基础模型(GFMs)的兴趣不断增长。通过利用LLMs作为预测器,GFMs已经展示出在各种任务和数据集上的惊人泛化能力。然而,现有关于LLMs作为预测器的研究主要集中在静态图上,未探索它们在动态图预测中的潜力。在这项工作中,我们首次使用LLMs进行动态图的预测任务。我们确定了两个关键挑战:在处理大规模历史数据时所施加的上下文长度限制以及领域特征的显著变化,这两者都使得统一预测器的开发变得复杂。为了解决这些挑战,我们提出了GraphAgent-Dynamic(GAD)框架,这是一个利用协作LLMs的多代理系统。与使用单个LLM作为预测器不同,GAD整合了全局和局部摘要代理以生成领域特定知识,增强了其在不同领域之间的可传递性。此外,知识反射代理使得GAD的知识可以进行自适应更新,保持统一和自洽的架构。在实验中,GAD展示出与或甚至超过全监督图神经网络的性能,而无需特定于数据集的训练。最后,为了增强基于LLMs的预测器的任务特定性能,我们讨论了潜在的改进,例如对LLMs进行特定于数据集的微调。通过为不同任务开发量身定制的策略,我们为未来基于LLMs的预测器设计提供了新的见解。
更新时间: 2025-03-05 08:28:11
领域: cs.LG,cs.AI
Improved Performances and Motivation in Intelligent Tutoring Systems: Combining Machine Learning and Learner Choice
Large class sizes challenge personalized learning in schools, prompting the use of educational technologies such as intelligent tutoring systems. To address this, we present an AI-driven personalization system, called ZPDES, based on the Learning Progress Hypothesis - modeling curiosity-driven learning - and multi-armed bandit techniques. It sequences exercises that maximize learning progress for each student. While previous studies demonstrated its efficacy in enhancing learning compared to hand-made curricula, its impact on student motivation remained unexplored. Furthermore, ZPDES previously lacked features allowing student choice, a limitation in agency that conflicts with its foundation on models of curiosity-driven learning. This study investigates how integrating choice, as a gamification element unrelated to exercise difficulty, affects both learning outcomes and motivation. We conducted an extensive field study (265 7-8 years old children, RCT design), comparing ZPDES with and without choice against a hand-designed curriculum. Results show that ZPDES improves both learning performance and the learning experience. Moreover adding choice to ZPDES enhances intrinsic motivation and further strengthens its learning benefits. In contrast, incorporating choice into a fixed, linear curriculum negatively impacts learning outcomes. These findings highlight that the intrinsic motivation elicited by choice (gamification) is beneficial only when paired with an adaptive personalized learning system. This insight is critical as gamified features become increasingly prevalent in educational technologies.
Updated: 2025-03-05 08:23:02
标题: 智能辅导系统中的改进表现和动机:结合机器学习和学习者选择
摘要: 大班教室规模挑战了学校中个性化学习的实施,促使教育技术的使用,如智能辅导系统。为了解决这个问题,我们提出了一个基于学习进度假设、模拟以好奇心驱动的学习以及多臂赌博技术的AI驱动个性化系统,名为ZPDES。它为每个学生安排了最大化学习进度的练习顺序。尽管先前的研究证明了它相对于手工制作课程的有效性,但对学生动机的影响尚未被探索。此外,ZPDES之前缺乏允许学生选择的功能,这是一个与以好奇心驱动学习模型冲突的代理权限制。本研究调查了将选择作为与练习难度无关的游戏化元素整合到学习成果和动机上的影响。我们进行了一项广泛的现场研究(265名7-8岁的儿童,随机对照试验设计),比较了有选择和没有选择的ZPDES与手工设计的课程。结果表明,ZPDES提高了学习表现和学习体验。此外,将选择添加到ZPDES中增强了内在动机,并进一步加强了其学习效益。相比之下,在固定的线性课程中加入选择会负面影响学习成果。这些发现凸显了选择(游戏化)激发的内在动机只有在与自适应个性化学习系统配对时才是有益的。随着游戏化特征在教育技术中变得越来越普遍,这一洞察力至关重要。
更新时间: 2025-03-05 08:23:02
领域: cs.CY,cs.AI,cs.LG,I.2.1; I.2.6
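A toy sketch of the Learning Progress idea behind ZPDES: a bandit that scores each exercise by the absolute slope of recent success rates, so activities where the student is currently improving fastest get proposed; the window size, epsilon-greedy exploration, and optimistic default are our assumptions:

```python
import numpy as np

class LPBandit:
    """Multi-armed bandit over exercises, scored by absolute learning progress
    (change in success rate over a recent window)."""
    def __init__(self, n_arms, window=10, eps=0.2, seed=0):
        self.hist = [[] for _ in range(n_arms)]
        self.window, self.eps = window, eps
        self.rng = np.random.default_rng(seed)

    def learning_progress(self, a):
        h = self.hist[a][-self.window:]
        if len(h) < 4:
            return 1.0                        # optimistic: explore unseen arms
        half = len(h) // 2
        return abs(np.mean(h[half:]) - np.mean(h[:half]))

    def choose(self):
        if self.rng.random() < self.eps:      # exploration branch; the paper's
            return int(self.rng.integers(len(self.hist)))  # 'choice' feature
        lp = [self.learning_progress(a) for a in range(len(self.hist))]  # could
        return int(np.argmax(lp))             # hook in here

    def update(self, a, success):
        self.hist[a].append(float(success))

bandit, rng = LPBandit(n_arms=3), np.random.default_rng(1)
for _ in range(50):
    a = bandit.choose()
    bandit.update(a, rng.random() < 0.3 + 0.1 * a)   # toy student model
print([round(bandit.learning_progress(a), 2) for a in range(3)])
```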
Rebalanced Multimodal Learning with Data-aware Unimodal Sampling
To address the modality learning degeneration caused by modality imbalance, existing multimodal learning~(MML) approaches primarily attempt to balance the optimization process of each modality from the perspective of model learning. However, almost all existing methods ignore the modality imbalance caused by unimodal data sampling, i.e., equal unimodal data sampling often results in discrepancies in informational content, leading to modality imbalance. Therefore, in this paper, we propose a novel MML approach called \underline{D}ata-aware \underline{U}nimodal \underline{S}ampling~(\method), which aims to dynamically alleviate the modality imbalance caused by sampling. Specifically, we first propose a novel cumulative modality discrepancy to monitor the multimodal learning process. Based on the learning status, we propose a heuristic and a reinforcement learning~(RL)-based data-aware unimodal sampling approaches to adaptively determine the quantity of sampled data at each iteration, thus alleviating the modality imbalance from the perspective of sampling. Meanwhile, our method can be seamlessly incorporated into almost all existing multimodal learning approaches as a plugin. Experiments demonstrate that \method~can achieve the best performance by comparing with diverse state-of-the-art~(SOTA) baselines.
Updated: 2025-03-05 08:19:31
标题: 使用数据感知的单模态采样进行重新平衡的多模态学习
摘要: 为了解决由模态不平衡引起的模态学习退化问题,现有的多模态学习(MML)方法主要尝试从模型学习的角度平衡每个模态的优化过程。然而,几乎所有现有方法都忽视了由单模态数据采样引起的模态不平衡,即相等的单模态数据采样通常导致信息内容的差异,从而导致模态不平衡。因此,在本文中,我们提出了一种名为数据感知单模态采样(DUS)的新颖MML方法,旨在动态缓解由采样引起的模态不平衡。具体来说,我们首先提出了一种新颖的累积模态差异度来监控多模态学习过程。根据学习状态,我们提出了一种基于启发式和基于强化学习(RL)的数据感知单模态采样方法,以自适应地确定每次迭代中采样数据的数量,从而从采样的角度缓解模态不平衡。同时,我们的方法可以无缝地作为插件整合到几乎所有现有的多模态学习方法中。实验证明,与各种最新技术基线相比,DUS可以实现最佳性能。
更新时间: 2025-03-05 08:19:31
领域: cs.LG,cs.AI
Regularization-based Framework for Quantization-, Fault- and Variability-Aware Training
Efficient inference is critical for deploying deep learning models on edge AI devices. Low-bit quantization (e.g., 3- and 4-bit) with fixed-point arithmetic improves efficiency, while low-power memory technologies like analog nonvolatile memory enable further gains. However, these methods introduce non-ideal hardware behavior, including bit faults and device-to-device variability. We propose a regularization-based quantization-aware training (QAT) framework that supports fixed, learnable step-size, and learnable non-uniform quantization, achieving competitive results on CIFAR-10 and ImageNet. Our method also extends to Spiking Neural Networks (SNNs), demonstrating strong performance on 4-bit networks on CIFAR10-DVS and N-Caltech 101. Beyond quantization, our framework enables fault and variability-aware fine-tuning, mitigating stuck-at faults (fixed weight bits) and device resistance variability. Compared to prior fault-aware training, our approach significantly improves performance recovery under up to a 20% bit-fault rate and 40% device-to-device variability. Our results establish a generalizable framework for quantization and robustness-aware training, enhancing efficiency and reliability in low-power, non-ideal hardware.
Updated: 2025-03-05 08:03:39
标题: 基于正则化的量化、故障和变异感知训练框架
摘要: 高效推理对于在边缘AI设备上部署深度学习模型至关重要。采用定点运算的低位宽量化(例如3位和4位)可以提高效率,而模拟非易失性存储器等低功耗存储技术可以带来进一步的收益。然而,这些方法会引入非理想的硬件行为,包括位故障和器件间差异。我们提出了一种基于正则化的量化感知训练(QAT)框架,支持固定量化、可学习步长量化和可学习的非均匀量化,在CIFAR-10和ImageNet上取得了有竞争力的结果。我们的方法还可扩展到脉冲神经网络(SNN),在CIFAR10-DVS和N-Caltech 101上的4位网络中表现出色。除量化之外,我们的框架还支持故障和差异感知的微调,以缓解固定位故障(stuck-at faults)和器件电阻差异。与之前的故障感知训练相比,我们的方法在高达20%的位故障率和40%的器件间差异下显著改善了性能恢复。我们的结果建立了一个可推广的量化与鲁棒性感知训练框架,提高了低功耗、非理想硬件上的效率和可靠性。
更新时间: 2025-03-05 08:03:39
领域: cs.LG
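One way to read "regularization-based" QAT is a penalty pulling weights toward a fixed-point grid with a learnable step size; the squared-distance form and bit-width handling below are assumptions, not the paper's exact objective:

```python
import torch

def quantization_regularizer(w, step, n_bits=4):
    """Penalize squared distance of each weight to the nearest level of a
    signed fixed-point grid {-2^(b-1)*step, ..., (2^(b-1)-1)*step}."""
    qmax = 2 ** (n_bits - 1) - 1
    q = torch.clamp(torch.round(w / step), -qmax - 1, qmax)
    return ((w - step * q) ** 2).mean()

w = torch.randn(256, requires_grad=True)
s = torch.tensor(0.1, requires_grad=True)
reg = quantization_regularizer(w, s, n_bits=3)
reg.backward()       # in training this would be added to the task loss
# w.grad pulls weights toward grid points; s.grad adapts the step size
print(w.grad.abs().mean().item(), s.grad.item())
```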
Censor Resistant Instruction Independent Obfuscation for Multiple Programs
This work builds upon and optimizes our prior research on obfuscation as instruction decorrelation, which achieves multiple-program obfuscation. Leveraging this infrastructure, we further achieve the property of censor-resistant computation.
Updated: 2025-03-05 07:57:33
标题: 多程序的抗审查指令独立混淆
摘要: 这项工作基于并优化了我们先前将混淆视为指令去相关化的研究,该研究实现了多程序混淆。利用这一基础设施,我们进一步实现了抗审查计算的性质。
更新时间: 2025-03-05 07:57:33
领域: cs.CR,cs.PL
Less is more? Rewards in RL for Cyber Defence
The last few years have seen an explosion of interest in autonomous cyber defence agents based on deep reinforcement learning. Such agents are typically trained in a cyber gym environment, also known as a cyber simulator, at least 32 of which have already been built. Most, if not all, cyber gyms provide dense "scaffolded" reward functions which combine many penalties or incentives for a range of (un)desirable states and costly actions. Whilst dense rewards help alleviate the challenge of exploring complex environments, yielding seemingly effective strategies from relatively few environment steps, they are also known to bias the solutions an agent can find, potentially towards suboptimal solutions. Sparse rewards could offer preferable or more effective solutions and have been overlooked by cyber gyms to date. In this work we set out to evaluate whether sparse reward functions might enable training more effective cyber defence agents. Towards this goal we first break down several evaluation limitations in existing work by proposing a ground truth evaluation score that goes beyond the standard RL paradigm used to train and evaluate agents. By adapting a well-established cyber gym to accommodate our methodology and ground truth score, we propose and evaluate two sparse reward mechanisms and compare them with a typical dense reward. Our evaluation considers a range of network sizes, from 2 to 50 nodes, and both reactive and proactive defensive actions. Our results show that sparse rewards, particularly positive reinforcement for an uncompromised network state, enable the training of more effective cyber defence agents. Furthermore, we show that sparse rewards provide more stable training than dense rewards, and that both effectiveness and training stability are robust to a variety of cyber environment considerations.
Updated: 2025-03-05 07:53:39
标题: “Less is more?网络防御中的强化学习奖励”
摘要: 在过去几年中,基于深度强化学习的自主网络防御代理引起了广泛关注。这些代理通常在网络训练环境(cyber gym,也称网络模拟器)中接受训练,目前已至少建立了32个这样的环境。大多数(如果不是全部)网络训练环境都提供密集的“脚手架式”奖励函数,将针对一系列(不)理想状态和代价高昂行动的多种惩罚或激励组合在一起。虽然密集奖励有助于缓解探索复杂环境的挑战,能从相对较少的环境步骤中产生看似有效的策略,但众所周知,它们也会使代理可能找到的解产生偏差,潜在地导向次优解。稀疏奖励可能提供更可取或更有效的解决方案,但迄今为止被网络训练环境所忽视。在这项工作中,我们旨在评估稀疏奖励函数是否能够训练出更有效的网络防御代理。为实现这一目标,我们首先提出了一个超越用于训练和评估代理的标准RL范式的真实评估分数,从而剖析现有工作中的若干评估局限。通过调整一个成熟的网络训练环境以适应我们的方法和真实评估分数,我们提出并评估了两种稀疏奖励机制,并将它们与典型的密集奖励进行比较。我们的评估考虑了从2到50个节点的各种网络规模,以及反应性和主动性的防御行动。我们的结果表明,稀疏奖励,特别是对未受损网络状态的正向强化,能够训练出更有效的网络防御代理。此外,我们还表明稀疏奖励比密集奖励提供更稳定的训练,并且其有效性和训练稳定性对各种网络环境设置都具有鲁棒性。
更新时间: 2025-03-05 07:53:39
领域: cs.LG,cs.AI,cs.CR
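The contrast between the two reward designs can be made concrete; the dict-based network state and the dense coefficients below are illustrative assumptions:

```python
def sparse_reward(network_state):
    """Sparse scheme studied above: positive reinforcement only while the
    whole network remains uncompromised; no hand-tuned penalty mix."""
    return 1.0 if not any(h["compromised"] for h in network_state) else 0.0

def dense_reward(network_state, action_cost):
    """Typical dense 'scaffolded' scheme, shown for contrast; the coefficients
    are arbitrary here, as they often are in cyber gyms."""
    n_bad = sum(h["compromised"] for h in network_state)
    return -0.1 * n_bad - 0.01 * action_cost

state = [{"compromised": False}, {"compromised": True}]
print(sparse_reward(state), dense_reward(state, action_cost=1.0))
```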
Structural Entropy Guided Unsupervised Graph Out-Of-Distribution Detection
With the emergence of huge amounts of unlabeled data, unsupervised out-of-distribution (OOD) detection is vital for ensuring the reliability of graph neural networks (GNNs) by identifying OOD samples from in-distribution (ID) ones during testing, where encountering novel or unknown data is inevitable. Existing methods often suffer from compromised performance due to redundant information in graph structures, which impairs their ability to effectively differentiate between ID and OOD data. To address this challenge, we propose SEGO, an unsupervised framework that integrates structural entropy into OOD detection for graph classification. Specifically, within the architecture of contrastive learning, SEGO introduces an anchor view in the form of a coding tree obtained by minimizing structural entropy. The obtained coding tree effectively removes redundant information from graphs while preserving essential structural information, enabling the capture of distinct graph patterns between ID and OOD samples. Furthermore, we present a multi-grained contrastive learning scheme at local, global, and tree levels using triplet views, where coding trees with essential information serve as the anchor view. Extensive experiments on real-world datasets validate the effectiveness of SEGO, demonstrating superior performance over state-of-the-art baselines in OOD detection. Specifically, our method achieves the best performance on 9 out of 10 dataset pairs, with an average improvement of 3.7% on OOD detection datasets, significantly surpassing the best competitor by 10.8% on the FreeSolv/ToxCast dataset pair.
Updated: 2025-03-05 07:47:57
标题: 结构熵引导的无监督图形外分布检测
摘要: 随着大量未标记数据的出现,无监督的分布外(OOD)检测通过在测试时从分布内(ID)样本中识别出OOD样本,对确保图神经网络(GNNs)的可靠性至关重要,因为在测试中遇到新颖或未知数据是不可避免的。现有方法通常由于图结构中的冗余信息而性能受损,这削弱了它们有效区分ID和OOD数据的能力。为了解决这一挑战,我们提出了SEGO,一个将结构熵整合到图分类OOD检测中的无监督框架。具体来说,在对比学习的架构中,SEGO通过最小化结构熵引入了一个以编码树形式存在的锚视图。所获得的编码树有效地去除了图中的冗余信息,同时保留了基本的结构信息,能够捕捉ID和OOD样本之间不同的图模式。此外,我们提出了一个使用三元视图的局部、全局和树级别的多粒度对比学习方案,其中带有基本信息的编码树作为锚视图。对真实世界数据集的广泛实验验证了SEGO的有效性,在OOD检测方面显著优于现有最佳基线。具体来说,我们的方法在10对数据集中的9对上取得了最佳表现,在OOD检测数据集上平均提升3.7%,在FreeSolv/ToxCast数据集对上超过最佳竞争者10.8%。
更新时间: 2025-03-05 07:47:57
领域: cs.LG
PAIR: A Novel Large Language Model-Guided Selection Strategy for Evolutionary Algorithms
Evolutionary Algorithms (EAs) often employ random or simplistic selection methods, limiting their exploration of solution spaces and convergence to optimal solutions. The randomness in performing crossover or mutation may limit the model's ability to evolve efficiently. This paper introduces Preference-Aligned Individual Reciprocity (PAIR), a novel selection approach leveraging Large Language Models to emulate human-like mate selection, thereby introducing intelligence to the pairing process in EAs. PAIR prompts an LLM to evaluate individuals within a population based on genetic diversity, fitness level, and crossover compatibility, guiding more informed pairing decisions. We evaluated PAIR against a baseline method called LLM-driven EA (LMEA), published recently. Results indicate that PAIR significantly outperforms LMEA across various TSP instances, achieving lower optimality gaps and improved convergence. This performance is especially noticeable when combined with the flash thinking model, demonstrating increased population diversity to escape local optima. In general, PAIR provides a new strategy in the area of in-context learning for LLM-driven selection in EAs via sophisticated preference modelling, paving the way for improved solutions and further studies into LLM-guided optimization.
Updated: 2025-03-05 07:45:56
标题: PAIR:一种新颖的大型语言模型引导选择策略,用于进化算法
摘要: 进化算法(EAs)采用随机或简单的选择方法,限制了它们对解决方案空间的探索和收敛到最优解的能力。在执行交叉或变异时的随机性可能限制模型的有效演化能力。本文介绍了Preference-Aligned Individual Reciprocity(PAIR),这是一种新颖的选择方法,利用大型语言模型来模拟类似人类的配对选择,从而在EAs的配对过程中引入智能。PAIR促使一个LLM根据遗传多样性、适应度水平和交叉兼容性来评估种群中的个体,从而指导更明智的配对决策。我们将PAIR与最近发布的基线方法LLM驱动的EA(LMEA)进行了评估。结果表明,在各种TSP实例中,PAIR明显优于LMEA,达到了更低的最优性差距和改进的收敛性。当与快速思考模型结合使用时,这种性能尤为明显,表现出增加种群多样性以逃离局部最优解。总的来说,PAIR提供了一种新的策略,通过复杂的偏好建模,在EAs中为LLM驱动的选择提供了上下文学习的方法,为改进解决方案和进一步研究LLM引导优化铺平了道路。
更新时间: 2025-03-05 07:45:56
领域: cs.NE,cs.LG
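A sketch of the kind of pairing prompt such an LLM-guided selector might issue; the wording, criteria, and TSP framing below are assumptions rather than PAIR's actual template:

```python
def pairing_prompt(parent_a, parent_b, fitness_a, fitness_b, diversity):
    """Build an illustrative mate-selection prompt covering the three criteria
    named in the abstract: fitness, diversity, crossover compatibility."""
    return (
        "You are selecting mates for an evolutionary algorithm solving a TSP.\n"
        f"Candidate A (tour length {fitness_a}): {parent_a}\n"
        f"Candidate B (tour length {fitness_b}): {parent_b}\n"
        f"Genetic diversity between them (edge overlap): {diversity:.2f}\n"
        "Considering fitness, diversity, and crossover compatibility, answer\n"
        "PAIR or SKIP, with one sentence of justification."
    )

print(pairing_prompt([0, 2, 1, 3], [0, 1, 2, 3], 17.4, 16.9, 0.5))
```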
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4
Large Language Models (LLMs) have displayed astonishing abilities in various tasks, especially in text generation, classification, question answering, etc. However, the reasoning ability of LLMs still faces many debates. The inherent ambiguity of Natural Language (NL) limits LLMs' ability to perform verifiable reasoning, making its answers lack coherence and trustworthy support. To tackle the above problems, we propose a novel framework named FANS: Formal ANswer Selection for Natural Language Math Reasoning Using Lean4. To the best of our knowledge, it is the first framework that utilizes Lean4 to enhance LLMs' NL math reasoning ability. In particular, given an NL math question and LLM-generated answers, FANS first translates it into Lean4 theorem statements. Then it tries to prove it using a Lean4 prover and verify it by Lean4. Finally, it uses the FL result to assist in answer selection. It enhances LLMs' NL math ability by providing a computer-verifiable solution for the correct answer and proposes an alternative method for answer selection beyond the reward model. Extensive experiments indicate the effectiveness of our framework. It can improve the accuracy rate of reward-model-enhanced LLMs on the MATH-500 dataset by at most 1.91% and on AMC-23 by at most 8.33% on strong reward-model baselines. In some particular fields, such as number theory, in which Lean4 excels, we can even select all correct solutions. The qualitative analysis also shows our framework can make NL results formally backed by Lean4 proofs. As a pioneering work in the corresponding field, we will open-source all our models and datasets to further boost the development of the field.
Updated: 2025-03-05 07:34:53
标题: FANS -- 使用Lean4进行自然语言数学推理的形式化答案选择
摘要: 大型语言模型(LLMs)在各种任务中展示了惊人的能力,特别是在文本生成、分类、问题回答等方面。然而,LLMs的推理能力仍然面临许多争论。自然语言(NL)的固有歧义限制了LLMs执行可验证推理的能力,使其答案缺乏连贯性和可信度支持。为了解决以上问题,我们提出了一个名为FANS的新颖框架:使用Lean4进行自然语言数学推理的形式化答案选择。据我们所知,这是第一个利用Lean4增强LLMs NL数学推理能力的框架。具体来说,给定一个NL数学问题和LLM生成的答案,FANS首先将其翻译成Lean4定理陈述。然后尝试使用Lean4证明器证明并通过Lean4验证。最后,它使用形式语言结果来辅助答案选择。它通过为正确答案提供计算机可验证的解决方案提高了LLMs的NL数学能力,并提出了一种超越奖励模型的答案选择替代方法。广泛的实验证明了我们框架的有效性。在强奖励模型基线上,它可以将奖励模型增强的LLMs在MATH-500数据集上的准确率最多提高1.91%,在AMC-23上最多提高8.33%。在Lean4擅长的数论等特定领域,我们甚至可以选出所有正确的解。定性分析还显示我们的框架可以使NL结果得到Lean4证明的形式支持。作为相关领域的开创性工作,我们将开源所有模型和数据集,以进一步推动该领域的发展。
更新时间: 2025-03-05 07:34:53
Subjects: cs.CL,cs.AI
Prediction of Halo Coronal Mass Ejections Using SDO/HMI Vector Magnetic Data Products and a Transformer Model
We present a transformer model, named DeepHalo, to predict the occurrence of halo coronal mass ejections (CMEs). Our model takes as input an active region (AR) and a profile, where the profile contains a time series of data samples in the AR that are collected 24 hours before the beginning of a day, and predicts whether the AR would produce a halo CME during that day. Each data sample contains physical parameters, or features, derived from photospheric vector magnetic field data taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO). We survey and match CME events in the Space Weather Database Of Notification, Knowledge, Information (DONKI) and Large Angle and Spectrometric Coronagraph (LASCO) CME Catalog, and compile a list of CMEs including halo CMEs and non-halo CMEs associated with ARs in the period between November 2010 and August 2023. We use the information gathered above to build the labels (positive versus negative) of the data samples and profiles at hand, where the labels are needed for machine learning. Experimental results show that DeepHalo with a true skill statistics (TSS) score of 0.907 outperforms a closely related long short-term memory network with a TSS score of 0.821. To our knowledge, this is the first time that the transformer model has been used for halo CME prediction.
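For reference, the true skill statistic (TSS) used to score DeepHalo is a standard metric, computed from the confusion matrix and largely insensitive to the heavy class imbalance of halo-CME data:

    def true_skill_statistic(y_true, y_pred):
        # TSS = TP/(TP+FN) - FP/(FP+TN), ranging from -1 to 1.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        return tp / (tp + fn) - fp / (fp + tn)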
Updated: 2025-03-05 07:31:06
Subjects: astro-ph.SR,cs.LG
Grams: Gradient Descent with Adaptive Momentum Scaling
We introduce Gradient Descent with Adaptive Momentum Scaling (Grams), a novel optimization algorithm that decouples the direction and magnitude of parameter updates in deep learning. Unlike traditional optimizers that directly integrate momentum into updates, Grams separates the update direction, derived from current gradients, from momentum, which is used solely for adaptive magnitude scaling. This approach enables Grams to achieve improved loss descent compared to state-of-the-art cautious and momentum-based optimizers. We theoretically demonstrate that Grams descends faster than other state-of-the-art optimizers and establish a global convergence guarantee for Grams. We also validate its effectiveness through extensive empirical evaluations. The results demonstrate Grams' superior performance, including faster convergence and better generalization, compared to widely-used optimizers such as Adam, Lion, and their cautious variants. Our results highlight Grams' potential as a transformative approach for efficiently training and fine-tuning large language models. Code is available at https://github.com/Gunale0926/Grams.
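A minimal sketch of the decoupling idea as the abstract states it, assuming Adam-style statistics for the magnitude; the paper's exact update (bias correction, scheduling) may differ:

    import torch

    def grams_like_step(param, grad, exp_avg, exp_avg_sq, lr=1e-3,
                        betas=(0.9, 0.999), eps=1e-8):
        # Momentum statistics set only the per-coordinate step MAGNITUDE;
        # the step DIRECTION comes from the sign of the current gradient.
        exp_avg.mul_(betas[0]).add_(grad, alpha=1 - betas[0])
        exp_avg_sq.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])
        magnitude = (exp_avg / (exp_avg_sq.sqrt() + eps)).abs()  # Adam-style size
        param.add_(torch.sign(grad) * magnitude, alpha=-lr)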
Updated: 2025-03-05 07:29:42
Subjects: cs.LG,cs.AI,cs.DS,math.OC
Predicting Team Performance from Communications in Simulated Search-and-Rescue
Understanding how individual traits influence team performance is valuable, but these traits are not always directly observable. Prior research has inferred traits like trust from behavioral data. We analyze conversational data to identify team traits and their correlation with teaming outcomes. Using transcripts from a Minecraft-based search-and-rescue experiment, we apply topic modeling and clustering to uncover key interaction patterns. Our findings show that variations in teaming outcomes can be explained through these inferences, with different levels of predictive power derived from individual traits and team dynamics.
Updated: 2025-03-05 07:20:27
Subjects: cs.AI
Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation
Diffusion models are powerful generative models but often generate sensitive data that are unwanted by users, mainly because the unlabeled training data frequently contain such sensitive data. Since labeling all sensitive data in the large-scale unlabeled training data is impractical, we address this problem by using a small amount of labeled sensitive data. In this paper, we propose positive-unlabeled diffusion models, which prevent the generation of sensitive data using unlabeled and sensitive data. Our approach can approximate the evidence lower bound (ELBO) for normal (negative) data using only unlabeled and sensitive (positive) data. Therefore, even without labeled normal data, we can maximize the ELBO for normal data and minimize it for labeled sensitive data, ensuring the generation of only normal data. Through experiments across various datasets and settings, we demonstrated that our approach can prevent the generation of sensitive images without compromising image quality.
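The positive-unlabeled step can be sketched with the standard PU identity E_u = pi*E_p + (1-pi)*E_n, assuming a known class prior pi of sensitive data within the unlabeled set (the paper's estimator may add corrections we omit):

    def pu_diffusion_loss(elbo_unlabeled, elbo_positive, prior_pi):
        # Recover the ELBO of normal (negative) data from unlabeled and
        # labeled-sensitive (positive) batches, then maximize it while
        # minimizing the ELBO of sensitive data.
        elbo_normal = (elbo_unlabeled - prior_pi * elbo_positive) / (1.0 - prior_pi)
        return -elbo_normal + elbo_positive  # loss to minimize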
Updated: 2025-03-05 07:17:48
Subjects: cs.LG,cs.AI,stat.ML
StickMotion: Generating 3D Human Motions by Drawing a Stickman
Text-to-motion generation, which translates textual descriptions into human motions, struggles to accurately capture detailed, user-imagined motions from simple text inputs. This paper introduces StickMotion, an efficient diffusion-based network designed for multi-condition scenarios, which generates desired motions from traditional text and our proposed stickman conditions for global and local control of these motions, respectively. We address the challenges introduced by the user-friendly stickman from three perspectives: 1) Data generation. We develop an algorithm to generate hand-drawn stickmen automatically across different dataset formats. 2) Multi-condition fusion. We propose a multi-condition module that integrates into the diffusion process and obtains outputs for all possible condition combinations, reducing computational complexity and enhancing StickMotion's performance compared to conventional approaches built on the self-attention module. 3) Dynamic supervision. We empower StickMotion to make minor adjustments to the stickman's position within the output sequences, generating more natural movements through our proposed dynamic supervision strategy. Quantitative experiments and user studies show that sketching stickmen saves users about 51.5% of the time spent generating motions consistent with their imagination. Our codes, demos, and relevant data will be released to facilitate further research and validation within the scientific community.
Updated: 2025-03-05 07:16:14
Subjects: cs.CV,cs.AI
TimeRefine: Temporal Grounding with Time Refining Video LLM
Video temporal grounding aims to localize relevant temporal boundaries in a video given a textual prompt. Recent work has focused on enabling Video LLMs to perform video temporal grounding via next-token prediction of temporal timestamps. However, accurately localizing timestamps in videos remains challenging for Video LLMs when relying solely on temporal token prediction. Our proposed TimeRefine addresses this challenge in two ways. First, instead of directly predicting the start and end timestamps, we reformulate the temporal grounding task as a temporal refining task: the model first makes rough predictions and then refines them by predicting offsets to the target segment. This refining process is repeated multiple times, through which the model progressively self-improves its temporal localization accuracy. Second, to enhance the model's temporal perception capabilities, we incorporate an auxiliary prediction head that penalizes the model more if a predicted segment deviates further from the ground truth, thus encouraging the model to make closer and more accurate predictions. Our plug-and-play method can be integrated into most LLM-based temporal grounding approaches. The experimental results demonstrate that TimeRefine achieves 3.6% and 5.0% mIoU improvements on the ActivityNet and Charades-STA datasets, respectively. Code and pretrained models will be released.
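A toy sketch of the refining loop described above; predict_offsets stands in for the model's offset prediction and the round count is illustrative:

    def refine_segment(rough_start, rough_end, predict_offsets, n_rounds=3):
        # Start from a coarse (start, end) guess and repeatedly apply
        # model-predicted offsets toward the target segment.
        start, end = rough_start, rough_end
        for _ in range(n_rounds):
            d_start, d_end = predict_offsets(start, end)
            start, end = start + d_start, end + d_end
        return start, end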
Updated: 2025-03-05 07:06:15
Subjects: cs.CV,cs.AI,cs.CL
Revisiting Random Walks for Learning on Graphs
We revisit a simple model class for machine learning on graphs, where a random walk on a graph produces a machine-readable record, and this record is processed by a deep neural network to directly make vertex-level or graph-level predictions. We call these stochastic machines random walk neural networks (RWNNs), and through principled analysis, show that we can design them to be isomorphism invariant while capable of universal approximation of graph functions in probability. A useful finding is that almost any kind of record of random walks guarantees probabilistic invariance as long as the vertices are anonymized. This enables us, for example, to record random walks in plain text and adopt a language model to read these text records to solve graph tasks. We further establish a parallelism to message passing neural networks using tools from Markov chain theory, and show that over-smoothing in message passing is alleviated by construction in RWNNs, while over-squashing manifests as probabilistic under-reaching. We empirically demonstrate RWNNs on a range of problems, verifying our theoretical analysis and demonstrating the use of language models for separating strongly regular graphs where 3-WL test fails, and transductive classification on arXiv citation network. Code is available at https://github.com/jw9730/random-walk.
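The anonymized plain-text walk records are easy to picture; a minimal sketch, naming vertices by first-visit order (one concrete convention among the many the paper allows):

    import random

    def walk_record(adj, start, length):
        # Record a random walk with ANONYMIZED vertex ids: each vertex is
        # renamed by first-visit order, which is what guarantees the
        # probabilistic invariance noted above.
        names, walk, v = {}, [], start
        for _ in range(length):
            names.setdefault(v, len(names))
            walk.append(str(names[v]))
            v = random.choice(adj[v])
        return "-".join(walk)  # e.g. "0-1-2-1-0-3", readable by a language model

    adj = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0]}
    print(walk_record(adj, start=0, length=6))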
Updated: 2025-03-05 07:02:28
Subjects: cs.LG,cs.AI
Training a Generally Curious Agent
Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present PAPRIKA, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, PAPRIKA teaches models to explore and adapt their behavior on a new task based on in-context environment feedback, without further gradient updates. Experimental results show that models fine-tuned with PAPRIKA can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. Unlike traditional training, our approach's primary bottleneck lies in sampling useful interaction data instead of model updates. To improve sample efficiency, we propose a curriculum learning strategy that prioritizes sampling trajectories from tasks with high learning potential. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interactions with the external world.
Updated: 2025-03-05 06:53:52
Subjects: cs.LG,cs.AI,cs.CL
Affordance-Guided Reinforcement Learning via Visual Prompting
Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as human demonstrations of success and failure, to learn task-specific reward functions. Recently, there is also a growing adoption of large multi-modal foundation models for robotics that can perform visual reasoning in physical contexts and generate coarse robot motions for manipulation tasks. Motivated by this range of capability, in this work, we present Keypoint-based Affordance Guidance for Improvements (KAGI), a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL. State-of-the-art VLMs have demonstrated impressive reasoning about affordances through keypoints in zero-shot, and we use these to define dense rewards that guide autonomous robotic learning. On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 30K online fine-tuning steps. Additionally, we demonstrate the robustness of KAGI to reductions in the number of in-domain demonstrations used for pre-training, reaching similar performance in 45K online fine-tuning steps. Project website: https://sites.google.com/view/affordance-guided-rl
Updated: 2025-03-05 06:53:17
Subjects: cs.RO,cs.AI,cs.LG
$μ^2$-SGD: Stable Stochastic Optimization via a Double Momentum Mechanism
We consider stochastic convex optimization problems where the objective is an expectation over smooth functions. For this setting we suggest a novel gradient estimate that combines two recent mechanisms related to the notion of momentum. We then design an SGD-style algorithm as well as an accelerated version that make use of this new estimator, and demonstrate the robustness of these new approaches to the choice of the learning rate. Concretely, we show that these approaches obtain the optimal convergence rates for both the noiseless and the noisy case with the same choice of fixed learning rate. Moreover, for the noisy case we show that these approaches achieve the same optimal bound for a very wide range of learning rates.
Updated: 2025-03-05 06:51:11
Subjects: cs.LG,math.OC
Beyond Next Word Prediction: Developing Comprehensive Evaluation Frameworks for measuring LLM performance on real world applications
While Large Language Models (LLMs) are fundamentally next-token prediction systems, their practical applications extend far beyond this basic function. From natural language processing and text generation to conversational assistants and software use, LLMs have numerous use-cases, and have already acquired a significant degree of enterprise adoption. To evaluate such models, static evaluation datasets, consisting of a set of prompts and their corresponding ground truths, are often used to benchmark the efficacy of the model for a particular task. In this paper, we provide the basis for a more comprehensive evaluation framework, based upon a traditional game and tool-based architecture that enables a more overarching measurement of a model's capabilities. For simplicity, we provide a generalized foundation that can be extended, without significant alteration, to numerous scenarios, from specific use cases such as supply chain management or financial reasoning, to abstract measurements such as ethics or safety.
Updated: 2025-03-05 06:44:38
Subjects: cs.CL,cs.AI,cs.LG
Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems
Language is a cornerstone of cultural identity, yet globalization and the dominance of major languages have placed nearly 3,000 languages at risk of extinction. Existing AI-driven translation models prioritize efficiency but often fail to capture cultural nuances, idiomatic expressions, and historical significance, leading to translations that marginalize linguistic diversity. To address these challenges, we propose a multi-agent AI framework designed for culturally adaptive translation in underserved language communities. Our approach leverages specialized agents for translation, interpretation, content synthesis, and bias evaluation, ensuring that linguistic accuracy and cultural relevance are preserved. Using CrewAI and LangChain, our system enhances contextual fidelity while mitigating biases through external validation. Comparative analysis shows that our framework outperforms GPT-4o, producing contextually rich and culturally embedded translations, a critical advancement for Indigenous, regional, and low-resource languages. This research underscores the potential of multi-agent AI in fostering equitable, sustainable, and culturally sensitive NLP technologies, aligning with the AI Governance, Cultural NLP, and Sustainable NLP pillars of Language Models for Underserved Communities. Our full experimental codebase is publicly available at: https://github.com/ciol-researchlab/Context-Aware_Translation_MAS
Updated: 2025-03-05 06:43:59
Subjects: cs.CL,cs.AI,cs.CY,cs.MA
CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-scale Reinforcement Learning in Autonomous Driving
Trajectory planning is vital for autonomous driving, ensuring safe and efficient navigation in complex environments. While recent learning-based methods, particularly reinforcement learning (RL), have shown promise in specific scenarios, RL planners struggle with training inefficiencies and with managing large-scale, real-world driving scenarios. In this paper, we introduce CarPlanner, a Consistent auto-regressive Planner that uses RL to generate multi-modal trajectories. The auto-regressive structure enables efficient large-scale RL training, while the incorporation of consistency ensures stable policy learning by maintaining coherent temporal consistency across time steps. Moreover, CarPlanner employs a generation-selection framework with an expert-guided reward function and an invariant-view module, simplifying RL training and enhancing policy performance. Extensive analysis demonstrates that our proposed RL framework effectively addresses the challenges of training efficiency and performance enhancement, positioning CarPlanner as a promising solution for trajectory planning in autonomous driving. To the best of our knowledge, this is the first demonstration that an RL-based planner can surpass both IL- and rule-based state-of-the-art (SOTA) methods on the challenging large-scale real-world dataset nuPlan. Our proposed CarPlanner surpasses RL-, IL-, and rule-based SOTA approaches within this demanding dataset.
Updated: 2025-03-05 06:36:27
Subjects: cs.RO,cs.CV,cs.LG
CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments
Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of Wav2vec2.0-based models by upwards of 10%. More specifically, CPT improves the model's robustness to different noises, microphones and classroom conditions.
Updated: 2025-03-05 06:32:04
Subjects: cs.CL,cs.LG,cs.SD,eess.AS
VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool
Due to the growing complexity of modern Integrated Circuits (ICs), automating hardware design can remove a significant amount of human error from the engineering process and result in fewer defects. Verilog is a popular hardware description language for designing and modeling digital systems; thus, Verilog generation is one of the emerging areas of research for facilitating the design process. In this work, we propose VerilogCoder, a system of multiple Artificial Intelligence (AI) agents for Verilog code generation, which autonomously writes Verilog code and fixes syntax and functional errors using collaborative Verilog tools (i.e., a syntax checker, simulator, and waveform tracer). First, we propose a task planner that utilizes a novel Task and Circuit Relation Graph retrieval method to construct a holistic plan based on module descriptions. To debug and fix functional errors, we develop a novel and efficient abstract syntax tree (AST)-based waveform tracing tool, which is integrated within the autonomous Verilog completion flow. The proposed methodology successfully generates 94.2% syntactically and functionally correct Verilog code, surpassing the state-of-the-art methods by 33.9% on the VerilogEval-Human v2 benchmark.
Updated: 2025-03-05 06:23:52
Subjects: cs.AI,cs.CL
Unsupervised Topic Models are Data Mixers for Pre-training Language Models
The performance of large language models (LLMs) is significantly affected by the quality and composition of their pre-training data, which is inherently diverse, spanning various domains, sources, and topics. Effectively integrating these heterogeneous data sources is crucial for optimizing LLM performance. Previous research has predominantly concentrated on domain-based data mixing, often neglecting the nuanced topic-level characteristics of the data. To address this gap, we propose a simple yet effective topic-based data mixing strategy that utilizes fine-grained topics generated through our topic modeling method, DataWeave. DataWeave employs a multi-stage clustering process to group semantically similar documents and utilizes LLMs to generate detailed topics, thereby facilitating a more nuanced understanding of dataset composition. Our strategy employs heuristic methods to upsample or downsample specific topics, which significantly enhances LLM performance on downstream tasks, achieving superior results compared to previous, more complex data mixing approaches. Furthermore, we confirm that the topics Science and Relationships are particularly effective, yielding the most substantial performance improvements. We will make our code and datasets publicly available.
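A minimal sketch of the heuristic resampling step; topic_of stands in for DataWeave's clustering-plus-LLM topic labeler, and the weights are illustrative:

    import random

    def remix_by_topic(docs, topic_of, weights):
        # Sample documents with per-topic weights so favored topics
        # (e.g. "Science", "Relationships") are upsampled and others
        # downsampled before pre-training.
        probs = [weights.get(topic_of(d), 1.0) for d in docs]
        return random.choices(docs, weights=probs, k=len(docs))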
Updated: 2025-03-05 06:23:22
Subjects: cs.CL,cs.AI
COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence
Open Source Intelligence (OSINT) requires the integration and reasoning of diverse multimodal data, presenting significant challenges in deriving actionable insights. Traditional approaches, including multimodal large language models (MLLMs), often struggle to infer complex contextual relationships or deliver comprehensive intelligence from unstructured data sources. In this paper, we introduce COSINT-Agent, a knowledge-driven multimodal agent tailored to address the challenges of OSINT in the Chinese domain. COSINT-Agent seamlessly integrates the perceptual capabilities of fine-tuned MLLMs with the structured reasoning power of the Entity-Event-Scene Knowledge Graph (EES-KG). Central to COSINT-Agent is the innovative EES-Match framework, which bridges COSINT-MLLM and EES-KG, enabling systematic extraction, reasoning, and contextualization of multimodal insights. This integration facilitates precise entity recognition, event interpretation, and context retrieval, effectively transforming raw multimodal data into actionable intelligence. Extensive experiments validate the superior performance of COSINT-Agent across core OSINT tasks, including entity recognition, EES generation, and context matching. These results underscore its potential as a robust and scalable solution for advancing automated multimodal reasoning and enhancing the effectiveness of OSINT methodologies.
Updated: 2025-03-05 06:16:15
Subjects: cs.AI
Convergence Rates for Softmax Gating Mixture of Experts
Mixture of experts (MoE) has recently emerged as an effective framework to advance the efficiency and scalability of machine learning models by softly dividing complex tasks among multiple specialized sub-models termed experts. Central to the success of MoE is an adaptive softmax gating mechanism which takes responsibility for determining the relevance of each expert to a given input and then dynamically assigning experts their respective weights. Despite its widespread use in practice, a comprehensive study on the effects of the softmax gating on the MoE has been lacking in the literature. To bridge this gap in this paper, we perform a convergence analysis of parameter estimation and expert estimation under the MoE equipped with the standard softmax gating or its variants, including a dense-to-sparse gating and a hierarchical softmax gating, respectively. Furthermore, our theories also provide useful insights into the design of sample-efficient expert structures. In particular, we demonstrate that it requires polynomially many data points to estimate experts satisfying our proposed \emph{strong identifiability} condition, namely a commonly used two-layer feed-forward network. In stark contrast, estimating linear experts, which violate the strong identifiability condition, necessitates exponentially many data points as a result of intrinsic parameter interactions expressed in the language of partial differential equations. All the theoretical results are substantiated with a rigorous guarantee.
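For reference, the standard softmax-gated MoE under study has the form (notation ours; the dense-to-sparse and hierarchical variants modify the gate):

    f(x) = \sum_{k=1}^{K} \frac{\exp(\beta_k^{\top} x + \beta_{0k})}{\sum_{j=1}^{K} \exp(\beta_j^{\top} x + \beta_{0j})} \, h(x; \eta_k),

where h(\cdot; \eta_k) is the k-th expert: a two-layer feed-forward network satisfies the strong identifiability condition above, whereas linear experts h(x; \eta_k) = a_k^{\top} x + b_k violate it and incur the exponential sample complexity.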
Updated: 2025-03-05 06:11:24
Subjects: stat.ML,cs.LG
Adversarial Example Based Fingerprinting for Robust Copyright Protection in Split Learning
Deep learning models are easily exposed to data leakage risks, and Split Learning, a distributed scheme, emerged as a solution to this issue: the model is split so that raw data need not be uploaded to the server, reducing computing requirements while preserving data privacy and security. However, the transmission of data between clients and the server creates a potential vulnerability. In particular, the model is vulnerable to intellectual property (IP) infringement such as piracy. Alarmingly, a dedicated copyright protection framework tailored for Split Learning models is still lacking. To this end, we propose the first copyright protection scheme for Split Learning models, leveraging fingerprints to ensure effective and robust copyright protection. The proposed method first generates a set of specifically designed adversarial examples. Then, we select those examples that induce misclassifications to form the fingerprint set. These adversarial examples are embedded as fingerprints into the model during the training process. Exhaustive experiments highlight the effectiveness of the scheme, demonstrated by a remarkable fingerprint verification success rate (FVSR) of 100% on MNIST, 98% on CIFAR-10, and 100% on ImageNet. Meanwhile, the model's accuracy decreases only slightly, indicating that the embedded fingerprints do not compromise model performance. Even under label inference attack, our approach consistently achieves a high fingerprint verification success rate, ensuring robust verification.
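A sketch of the fingerprint-and-verify idea, using FGSM as a stand-in for the paper's specifically designed adversarial examples (the actual generation procedure may differ):

    import torch

    def make_fingerprints(model, x, y, eps=0.03, n=100):
        # Craft adversarial examples and keep the misclassified ones; the
        # (input, wrong-label) pairs form the fingerprint set embedded
        # during training and later replayed against a suspect model.
        x = x.clone().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        x_adv = (x + eps * x.grad.sign()).detach().clamp(0, 1)
        pred = model(x_adv).argmax(dim=1)
        keep = pred != y
        return x_adv[keep][:n], pred[keep][:n]

    def fvsr(model, fp_x, fp_y):
        # Fingerprint verification success rate: fraction of fingerprints
        # the suspect model reproduces.
        return (model(fp_x).argmax(dim=1) == fp_y).float().mean().item()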
Updated: 2025-03-05 06:07:16
Subjects: cs.CR
NodeReg: Mitigating the Imbalance and Distribution Shift Effects in Semi-Supervised Node Classification via Norm Consistency
Aggregating information from neighboring nodes benefits graph neural networks (GNNs) in semi-supervised node classification tasks. Nevertheless, this mechanism also renders nodes susceptible to the influence of their neighbors, for instance when the neighboring nodes are class-imbalanced or contain noise, which can even affect the GNN's ability to generalize out of distribution. We find that enforcing a consistent norm for node representations can significantly reduce the impact of these two issues on GNNs. To this end, we propose a regularized optimization method called NodeReg that enforces the consistency of node representation norms. This method is simple but effective, and satisfies Lipschitz continuity, thus facilitating stable optimization and significantly improving semi-supervised node classification performance under the above two scenarios. To illustrate, in the imbalance scenario, when training a GCN with an imbalance ratio of 0.1, NodeReg outperforms the most competitive baselines by 1.4%-25.9% in F1 score across five public datasets. Similarly, in the distribution-shift scenario, NodeReg outperforms the most competitive baseline by 1.4%-3.1% in accuracy.
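A minimal sketch of a norm-consistency penalty in PyTorch, assuming a variance-style formulation (the paper's exact regularizer may differ):

    import torch

    def node_norm_consistency(h, lam=0.1):
        # Push all node-representation norms toward their mean so that no
        # node's embedding is dominated by an imbalanced or noisy
        # neighborhood; added to the usual classification loss.
        norms = h.norm(dim=1)  # h: [num_nodes, hidden_dim]
        return lam * ((norms - norms.mean()) ** 2).mean()

    # total_loss = cross_entropy(logits[train_mask], y[train_mask]) \
    #              + node_norm_consistency(hidden)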
Updated: 2025-03-05 06:06:16
Subjects: cs.LG,cs.AI
Laminator: Verifiable ML Property Cards using Hardware-assisted Attestations
Regulations increasingly call for various assurances from machine learning (ML) model providers about their training data, training process, and model behavior. For better transparency, industry (e.g., Huggingface and Google) has adopted model cards and datasheets to describe various properties of training datasets and models. In the same vein, we introduce the notion of inference cards to describe the properties of a given inference (e.g., binding of the output to the model and its corresponding input). We coin the term ML property cards to collectively refer to these various types of cards. To prevent a malicious model provider from including false information in ML property cards, they need to be verifiable. We show how to construct verifiable ML property cards using property attestation, technical mechanisms by which a prover (e.g., a model provider) can attest to various ML properties to a verifier (e.g., an auditor). Since prior attestation mechanisms based purely on cryptography are often narrowly focused (lacking versatility) and inefficient, we need an efficient mechanism to attest different types of properties across the entire ML model pipeline. Emerging widespread support for confidential computing has made it possible to run and even train models inside hardware-assisted trusted execution environments (TEEs), which provide highly efficient attestation mechanisms. We propose Laminator, which uses TEEs to provide the first framework for verifiable ML property cards via hardware-assisted ML property attestations. Laminator is efficient in terms of overhead, scalable to large numbers of verifiers, and versatile with respect to the properties it can prove during training or inference.
Updated: 2025-03-05 06:05:14
Subjects: cs.CR
BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling
Time-series Generation (TSG) is a prominent research area with broad applications in simulations, data augmentation, and counterfactual analysis. While existing methods have shown promise in unconditional single-domain TSG, real-world applications demand cross-domain approaches capable of controlled generation tailored to domain-specific constraints and instance-level requirements. In this paper, we argue that text can provide semantic insights, domain information, and instance-specific temporal patterns to guide and improve TSG. We introduce "Text-Controlled TSG", a task focused on generating realistic time series by incorporating textual descriptions. To address data scarcity in this setting, we propose a novel LLM-based Multi-Agent framework that synthesizes diverse, realistic text-to-TS datasets. Furthermore, we introduce BRIDGE, a hybrid text-controlled TSG framework that integrates semantic prototypes with text descriptions to support domain-level guidance. This approach achieves state-of-the-art generation fidelity on 11 of 12 datasets, and improves controllability by 12.52% on MSE and 6.34% on MAE compared to generation without text input, highlighting its potential for generating tailored time-series data.
Updated: 2025-03-05 06:04:37
Subjects: cs.LG,cs.CL,cs.MA
Online Planning for Multi-UAV Pursuit-Evasion in Unknown Environments Using Deep Reinforcement Learning
Multi-UAV pursuit-evasion, where pursuers aim to capture evaders, poses a key challenge for UAV swarm intelligence. Multi-agent reinforcement learning (MARL) has demonstrated potential in modeling cooperative behaviors, but most RL-based approaches remain constrained to simplified simulations with limited dynamics or fixed scenarios. Previous attempts to deploy RL policies in real-world pursuit-evasion have largely been restricted to two-dimensional scenarios, such as ground vehicles or UAVs at fixed altitudes. In this paper, we address multi-UAV pursuit-evasion while accounting for UAV dynamics and physical constraints. We introduce an evader-prediction-enhanced network to tackle partial observability in cooperative strategy learning. Additionally, we propose an adaptive environment generator within MARL training, enabling higher exploration efficiency and better policy generalization across diverse scenarios. Simulations show our method significantly outperforms all baselines in challenging scenarios, generalizing to unseen scenarios with a 100% capture rate. Finally, we derive a feasible policy via a two-stage reward refinement and deploy the policy on real quadrotors in a zero-shot manner. To our knowledge, this is the first work to derive and deploy an RL-based policy using collective thrust and body rates control commands for multi-UAV pursuit-evasion in unknown environments. The open-source code and videos are available at https://sites.google.com/view/pursuit-evasion-rl.
Updated: 2025-03-05 05:55:45
Subjects: cs.RO,cs.LG
An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models
We developed an analytical framework for understanding how the learned distribution evolves during diffusion model training. Leveraging the Gaussian equivalence principle, we derived exact solutions for the gradient-flow dynamics of weights in one- or two-layer linear denoiser settings with arbitrary data. Remarkably, these solutions allowed us to derive the generated distribution in closed form and its KL divergence through training. These analytical results expose a pronounced power-law spectral bias, i.e., for weights and distributions, the convergence time of a mode follows an inverse power law of its variance. Empirical experiments on both Gaussian and image datasets demonstrate that the power-law spectral bias remains robust even when using deeper or convolutional architectures. Our results underscore the importance of the data covariance in dictating the order and rate at which diffusion models learn different modes of the data, providing potential explanations for why earlier stopping could lead to incorrect details in image generative models.
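A back-of-envelope version of the claimed scaling, assuming a one-layer linear denoiser under gradient flow (the exponent changes with depth and architecture):

    e_k(t) \approx e_k(0)\, \exp(-\eta \lambda_k t) \quad \Longrightarrow \quad \tau_k \propto \lambda_k^{-1},

where \lambda_k is the k-th eigenvalue (variance) of the data covariance and \eta the learning rate: high-variance modes of the distribution converge first and low-variance detail modes last, consistent with early stopping dropping fine details in image generation.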
Updated: 2025-03-05 05:50:38
Subjects: cs.LG,cs.CV,math.ST,stat.ML,stat.TH,68T07, 60G15,F.2.2; G.1.2; G.3; I.2.6
MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
Solving mathematical problems using computer-verifiable languages like Lean has significantly impacted the mathematical and computer science communities. State-of-the-art methods utilize a single Large Language Model (LLM) as an agent or prover to either generate complete proofs or perform tree searches. However, single-agent methods inherently lack a structured way to combine high-level reasoning in Natural Language (NL) with Formal Language (FL) verification feedback. To solve these issues, we propose MA-LoT, a Multi-Agent Lean-based Long Chain-of-Thought framework that is, to the best of our knowledge, the first multi-agent framework for Lean4 theorem proving to balance high-level NL reasoning and FL verification in Long CoT. Using this structured interaction, our approach enables deeper insights and long-term coherence in proof generation, with which past methods struggle. We do this by leveraging the emergent formal reasoning ability of Long CoT through our novel LoT-Transfer Learning training-inference pipeline. Extensive experiments show that our framework achieves a 54.51% accuracy rate on the Lean4 version of the MiniF2F-Test dataset, largely outperforming GPT-4 (22.95%), single-agent tree search (InternLM-Step-Prover, 50.70%), and whole-proof generation (DeepSeek-Prover-v1.5, 48.36%) baselines. Furthermore, our findings highlight the potential of combining Long CoT with formal verification for more insightful generation in a broader perspective.
Updated: 2025-03-05 05:50:31
Subjects: cs.CL,cs.AI
Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning
Recent advances in Large Language Models (LLMs) have raised interest in their formal reasoning capabilities, particularly in mathematics. While closed LLMs like GPT-4 perform well on mathematical benchmarks, e.g., GSM8K, it remains unclear whether small to medium-sized open LLMs can achieve similar performance, questioning their reliability. To close this gap, we propose a post-training approach leveraging a mixture of opinions (MoO) from weaker ancillary LLMs to enhance a (relatively) stronger LLM's reasoning. For that, each post-training sample is augmented with Chain-of-Thought (CoT) reasoning steps and answers from ancillary LLMs, enabling the main LLM to learn from diverse perspectives. We compare MoO with standard supervised fine-tuning (SFT), few-shot prompting, and the Mixture of Agents (MoA) method on mathematical reasoning benchmarks. Our results show that incorporating weaker LLMs' opinions improves mathematical reasoning by an average of 5%, highlighting the value of diverse perspectives in reasoning tasks.
Updated: 2025-03-05 05:42:39
Subjects: cs.CL,cs.AI
zsLLMCode: An Effective Approach for Code Embedding via LLM with Zero-Shot Learning
The advent of large language models (LLMs) has greatly advanced artificial intelligence (AI) in software engineering (SE), with code embeddings playing a critical role in tasks like code-clone detection and code clustering. However, existing methods for code embedding, including those based on LLMs, often depend on costly supervised training or fine-tuning for domain adaptation. This paper proposes a novel zero-shot approach, zsLLMCode, to generate code embeddings by using LLMs and sentence embedding models. This approach attempts to eliminate the need for task-specific training or fine-tuning, and to effectively address the issue of erroneous information commonly found in LLM-generated outputs. We conducted a series of experiments to evaluate the performance of the proposed approach by considering various LLMs and embedding models. The results have demonstrated the effectiveness and superiority of our method zsLLMCode over state-of-the-art unsupervised approaches such as SourcererCC, Code2vec, InferCode, and TransformCode. Our findings highlight the potential of zsLLMCode to advance the field of SE by providing robust and efficient solutions for code embedding tasks.
Updated: 2025-03-05 05:42:35
Subjects: cs.SE,cs.AI
Towards Robust Universal Information Extraction: Benchmark, Evaluation, and Solution
In this paper, we aim to enhance the robustness of Universal Information Extraction (UIE) by introducing a new benchmark dataset, a comprehensive evaluation, and a feasible solution. Existing robust benchmark datasets have two key limitations: 1) They generate only a limited range of perturbations for a single Information Extraction (IE) task, which fails to evaluate the robustness of UIE models effectively; 2) They rely on small models or handcrafted rules to generate perturbations, often resulting in unnatural adversarial examples. Considering the powerful generation capabilities of Large Language Models (LLMs), we introduce a new benchmark dataset for Robust UIE, called RUIE-Bench, which utilizes LLMs to generate more diverse and realistic perturbations across different IE tasks. Based on this dataset, we comprehensively evaluate existing UIE models and reveal that both LLM-based models and other models suffer from significant performance drops. To improve robustness and reduce training costs, we propose a data-augmentation solution that dynamically selects hard samples for iterative training based on the model's inference loss. Experimental results show that training with only 15% of the data leads to an average 7.5% relative performance improvement across three IE tasks.
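The data-augmentation loop reduces to a loss-ranked selection; a sketch, with model_loss standing in for the current model's inference loss on a candidate sample:

    def select_hard_samples(model_loss, pool, frac=0.15):
        # Each round, keep the fraction of the pool on which the current
        # model's inference loss is highest, and train on those.
        ranked = sorted(pool, key=model_loss, reverse=True)
        return ranked[: max(1, int(frac * len(ranked)))]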
Updated: 2025-03-05 05:39:29
Subjects: cs.CL,cs.AI
LoBAM: LoRA-Based Backdoor Attack on Model Merging
Model merging is an emerging technique that integrates multiple models fine-tuned on different tasks to create a versatile model that excels in multiple domains. This scheme, in the meantime, may open up backdoor attack opportunities where one single malicious model can jeopardize the integrity of the merged model. Existing works try to demonstrate the risk of such attacks by assuming substantial computational resources, focusing on cases where the attacker can fully fine-tune the pre-trained model. Such an assumption, however, may not be feasible given the increasing size of machine learning models. In practice where resources are limited and the attacker can only employ techniques like Low-Rank Adaptation (LoRA) to produce the malicious model, it remains unclear whether the attack can still work and pose threats. In this work, we first identify that the attack efficacy is significantly diminished when using LoRA for fine-tuning. Then, we propose LoBAM, a method that yields high attack success rate with minimal training resources. The key idea of LoBAM is to amplify the malicious weights in an intelligent way that effectively enhances the attack efficacy. We demonstrate that our design can lead to improved attack success rate through extensive empirical experiments across various model merging scenarios. Moreover, we show that our method is highly stealthy and is difficult to detect and defend against.
Updated: 2025-03-05 05:34:47
Subjects: cs.CR,cs.AI,cs.LG
Directly Follows Graphs Go Predictive Process Monitoring With Graph Neural Networks
In the past years, predictive process monitoring (PPM) techniques based on artificial neural networks have evolved as a method to monitor the future behavior of business processes. Existing approaches mostly focus on interpreting the processes as sequences, so-called traces, and feeding them to neural architectures designed to operate on sequential data such as recurrent neural networks (RNNs) or transformers. In this study, we investigate an alternative way to perform PPM: by transforming each process in its directly-follows-graph (DFG) representation we are able to apply graph neural networks (GNNs) for the prediction tasks. By this, we aim to develop models that are more suitable for complex processes that are long and contain an abundance of loops. In particular, we present different ways to create DFG representations depending on the particular GNN we use. The tested GNNs range from classical node-based to novel edge-based architectures. Further, we investigate the possibility of using multi-graphs. By these steps, we aim to design graph representations that minimize the information loss when transforming traces into graphs.
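Constructing the DFG itself is straightforward; a minimal sketch (the node/edge features and multi-graph variants described above are layered on top of this):

    from collections import Counter

    def directly_follows_graph(traces):
        # Nodes are activities; a weighted edge (a, b) counts how often
        # activity b directly follows activity a in the event log.
        edges = Counter()
        for trace in traces:
            edges.update(zip(trace, trace[1:]))
        return edges

    print(directly_follows_graph([["register", "check", "pay"],
                                  ["register", "check", "check", "pay"]]))
    # Counter({('register', 'check'): 2, ('check', 'pay'): 2, ('check', 'check'): 1})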
Updated: 2025-03-05 05:30:26
Subjects: cs.LG,cs.AI
Sarcasm Detection as a Catalyst: Improving Stance Detection with Cross-Target Capabilities
Stance Detection (SD) has become a critical area of interest due to its applications in various contexts, leading to increased research within NLP. Yet the subtlety and complexity of texts sourced from online platforms, which often contain sarcastic language, pose significant challenges for SD algorithms in accurately determining the author's stance. This paper addresses this by employing sarcasm detection for SD. It also tackles the issue of insufficient annotated data for training SD models on new targets by conducting Cross-Target SD (CTSD). The proposed approach involves fine-tuning BERT and RoBERTa models followed by concatenating additional deep learning layers. The approach is assessed against various state-of-the-art baselines for SD, demonstrating superior performance on publicly available datasets. Notably, our model outperforms the best SOTA models on both in-domain SD and CTSD tasks even before the incorporation of sarcasm-detection pre-training. Integrating sarcasm knowledge into the model significantly reduces misclassifications of sarcastic text elements in SD, allowing our model to accurately predict 85% of the texts that were previously misclassified without sarcasm-detection pre-training on in-domain SD. This enhancement contributes to an increase in the model's average macro F1-score. The CTSD task achieves performance comparable to that of the in-domain task despite using zero-shot fine-tuning. We also reveal that the success of the transfer-learning framework relies on the correlation between the lexical attributes of sarcasm detection and SD. This study represents the first exploration of sarcasm detection as an intermediate transfer-learning task within the context of SD, while also leveraging the concatenation of BERT or RoBERTa with other deep-learning techniques. The proposed approach establishes a foundational baseline for future research in this domain.
Updated: 2025-03-05 05:27:16
标题: 讽刺检测作为催化剂:通过跨目标能力改善立场检测
摘要: 立场检测(SD)已经成为一个关键的研究领域,因为它在各种上下文中的应用导致了自然语言处理领域内的研究增加。然而,来源于在线平台的文本通常包含讽刺语言,其微妙和复杂性给SD算法在准确确定作者立场方面带来了重大挑战。本文通过使用讽刺来解决这一问题。它还通过进行交叉目标SD(CTSD)来解决在新目标上训练SD模型所需的标注数据不足的问题。所提出的方法包括对BERT和RoBERTa模型进行微调,然后连接额外的深度学习层。该方法通过使用公开可用的数据集针对各种SD的最新基线进行评估,展示了卓越的性能。值得注意的是,我们的模型在领域内SD和CTSD任务上的表现均优于最佳SOTA模型,甚至在整合讽刺检测预训练之前。将讽刺知识整合到模型中显著减少了SD中讽刺文本元素的错误分类,使我们的模型能够准确预测85%此前在无讽刺检测预训练的领域内SD中被错误分类的文本。这种改进有助于提高模型的平均宏F1分数。尽管使用了零样本微调,CTSD任务的表现与领域内任务相媲美。我们还揭示了迁移学习框架的成功取决于讽刺检测与SD的词汇属性之间的相关性。这项研究代表了对SD上下文中讽刺检测作为中间迁移学习任务的首次探索,同时利用了BERT或RoBERTa与其他深度学习技术的连接。所提出的方法为未来该领域的研究奠定了基础。
更新时间: 2025-03-05 05:27:16
领域: cs.CL,cs.AI
Online Bidding under RoS Constraints without Knowing the Value
We consider the problem of bidding in online advertising, where an advertiser aims to maximize value while adhering to budget and Return-on-Spend (RoS) constraints. Unlike prior work that assumes knowledge of the value generated by winning each impression ({e.g.,} conversions), we address the more realistic setting where the advertiser must simultaneously learn the optimal bidding strategy and the value of each impression opportunity. This introduces a challenging exploration-exploitation dilemma: the advertiser must balance exploring different bids to estimate impression values with exploiting current knowledge to bid effectively. To address this, we propose a novel Upper Confidence Bound (UCB)-style algorithm that carefully manages this trade-off. Via a rigorous theoretical analysis, we prove that our algorithm achieves $\widetilde{O}(\sqrt{T\log(|\mathcal{B}|T)})$ regret and constraint violation, where $T$ is the number of bidding rounds and $\mathcal{B}$ is the domain of possible bids. This establishes the first optimal regret and constraint violation bounds for bidding in the online setting with unknown impression values. Moreover, our algorithm is computationally efficient and simple to implement. We validate our theoretical findings through experiments on synthetic data, demonstrating that our algorithm exhibits strong empirical performance compared to existing approaches.
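To make the exploration-exploitation trade-off concrete, here is a minimal UCB-style bid-selection rule over a discrete bid set. The names (values_sum, counts, cost_per_win) and the simplified RoS feasibility check are assumptions for illustration; the paper's actual algorithm manages budget and constraint violation far more carefully.

    import numpy as np

    def ucb_bid(values_sum, counts, t, bids, cost_per_win, ros_target):
        """Optimistic bid selection: estimated value per impression plus a UCB
        bonus, restricted (when possible) to bids whose optimistic value still
        satisfies the RoS constraint. A simplification of the paper's rule."""
        mean = values_sum / np.maximum(counts, 1)
        bonus = np.sqrt(2 * np.log(max(t, 2)) / np.maximum(counts, 1))
        optimistic = mean + bonus
        feasible = optimistic >= ros_target * cost_per_win
        pool = np.where(feasible)[0] if feasible.any() else np.arange(len(bids))
        return bids[pool[np.argmax(optimistic[pool])]]

    bids = np.linspace(0.1, 1.0, 10)
    b = ucb_bid(np.zeros(10), np.zeros(10), t=1, bids=bids,
                cost_per_win=bids, ros_target=1.0)   # pure exploration at t=1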
Updated: 2025-03-05 05:25:54
标题: 在不知晓价值的情况下受RoS约束的在线竞标
摘要: 我们考虑在线广告投放中的竞标问题,其中广告主旨在最大化价值,同时遵守预算和投放回报(RoS)约束。与以往假设知道每次展示获得的价值(例如,转化)的工作不同,我们处理更现实的情形,即广告主必须同时学习最佳竞标策略和每次展示机会的价值。这引入了一个具有挑战性的探索-利用困境:广告主必须在探索不同出价以估计展示价值与利用当前知识有效竞标之间取得平衡。为了解决这个问题,我们提出了一种新颖的上置信界(UCB)风格算法,可以精心管理这种权衡。通过严格的理论分析,我们证明我们的算法实现了$\widetilde{O}(\sqrt{T\log(|\mathcal{B}|T)})$的遗憾(regret)和约束违反,其中$T$是竞标轮数,$\mathcal{B}$是可能出价的集合。这为在线环境中具有未知展示价值的竞标建立了首个最优遗憾和约束违反界。此外,我们的算法计算高效且易于实现。我们通过在合成数据上的实验验证了理论发现,结果表明我们的算法相对于现有方法表现出强大的实证性能。
更新时间: 2025-03-05 05:25:54
领域: cs.LG
Structured Outputs Enable General-Purpose LLMs to be Medical Experts
Medical question-answering (QA) is a critical task for evaluating how effectively large language models (LLMs) encode clinical knowledge and assessing their potential applications in medicine. Despite showing promise on multiple-choice tests, LLMs frequently struggle with open-ended medical questions, producing responses with dangerous hallucinations or lacking comprehensive coverage of critical aspects. Existing approaches attempt to address these challenges through domain-specific fine-tuning, but this proves resource-intensive and difficult to scale across models. To improve the comprehensiveness and factuality of medical responses, we propose a novel approach utilizing structured medical reasoning. Our method guides LLMs through a seven-step cognitive process inspired by clinical diagnosis, enabling more accurate and complete answers without additional training. Experiments on the MedLFQA benchmark demonstrate that our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models. Notably, this improvement transfers to smaller models, highlighting the method's efficiency and scalability. Our code and datasets are available.
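A hedged sketch of what such a structured scaffold could look like at inference time. The seven step names below are invented placeholders, since the abstract does not list the paper's actual steps.

    STEPS = [  # hypothetical step names; the paper defines its own seven steps
        "Restate the question", "List key findings", "Recall relevant knowledge",
        "Generate differential considerations", "Weigh the evidence",
        "State the final answer", "Note caveats and red flags",
    ]

    def build_structured_prompt(question: str) -> str:
        """Wrap an open-ended medical question in a fixed step-by-step scaffold
        so the model must produce a structured, checkable answer (no training)."""
        numbered = "\n".join(f"{i + 1}. {s}:" for i, s in enumerate(STEPS))
        return f"Question: {question}\n\nAnswer using exactly these steps:\n{numbered}"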
Updated: 2025-03-05 05:24:55
标题: 结构化输出使通用型LLM成为医学专家
摘要: 医学问答(QA)是评估大型语言模型(LLMs)如何有效地编码临床知识并评估它们在医学中潜在应用的关键任务。尽管在多项选择测试中表现出潜力,LLMs经常在开放式医学问题上遇到困难,产生具有危险幻觉或缺乏关键方面的全面覆盖的回答。现有方法试图通过领域特定的微调来解决这些挑战,但这证明是资源密集型且难以在模型之间扩展。为了提高医学回答的全面性和事实性,我们提出了一种利用结构化医学推理的新方法。我们的方法通过受临床诊断启发的七步认知过程引导LLMs,使其能够更准确和完整地回答问题,而无需额外训练。在MedLFQA基准上的实验表明,我们的方法实现了最高的事实性分数为85.8,超过了经过微调的模型。值得注意的是,这种改进可以转移到较小的模型上,突显了该方法的效率和可扩展性。我们的代码和数据集可供使用。
更新时间: 2025-03-05 05:24:55
领域: cs.CL,cs.AI
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Large Language Models (LLMs) have demonstrated improved generation performance by incorporating externally retrieved knowledge, a process known as retrieval-augmented generation (RAG). Despite the potential of this approach, existing studies evaluate RAG effectiveness by 1) assessing retrieval and generation components jointly, which obscures retrieval's distinct contribution, or 2) examining retrievers using traditional metrics such as NDCG, which creates a gap in understanding retrieval's true utility in the overall generation process. To address the above limitations, in this work, we introduce an automatic evaluation method that measures retrieval quality through the lens of information gain within the RAG framework. Specifically, we propose Semantic Perplexity (SePer), a metric that captures the LLM's internal belief about the correctness of the retrieved information. We quantify the utility of retrieval by the extent to which it reduces semantic perplexity post-retrieval. Extensive experiments demonstrate that SePer not only aligns closely with human preferences but also offers a more precise and efficient evaluation of retrieval utility across diverse RAG scenarios.
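The arithmetic behind "perplexity reduction" fits in a few lines. This toy assumes per-token log-probabilities of a reference answer are already available, with and without retrieved context; the real SePer metric is defined over the model's semantic beliefs about answer correctness rather than raw token perplexity.

    import math

    def perplexity(token_logprobs):
        """Perplexity = exp(-mean log-probability) of the answer tokens."""
        return math.exp(-sum(token_logprobs) / len(token_logprobs))

    def seper_style_gain(logprobs_no_ctx, logprobs_with_ctx):
        """Retrieval utility as the drop in perplexity once context is added."""
        return perplexity(logprobs_no_ctx) - perplexity(logprobs_with_ctx)

    # Toy numbers: retrieved context makes the answer tokens more likely.
    print(seper_style_gain([-2.0, -1.5, -2.2], [-0.4, -0.3, -0.5]))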
Updated: 2025-03-05 05:24:54
标题: SePer:通过语义困惑减少的视角衡量信息检索效用
摘要: 大型语言模型(LLMs)通过整合外部检索到的知识,即检索增强生成(RAG)的过程,展示了提高生成性能的能力。尽管这种方法有潜力,但现有研究评估RAG的有效性方法有限:1)评估检索和生成组件时混合在一起,模糊了检索的独特贡献;或者2)使用传统指标如NDCG检查检索器,造成对检索在整体生成过程中真正效用的理解缺失。为解决上述限制,本文引入了一种自动评估方法,通过RAG框架内的信息增益视角衡量检索质量。具体地,我们提出了语义困惑(SePer)指标,捕捉了LLM对检索信息正确性的内部信念。我们通过检索减少语义困惑程度来量化检索的效用。大量实验表明,SePer不仅与人类偏好密切相关,而且在各种RAG场景下提供了更精确和高效的检索效用评估。
更新时间: 2025-03-05 05:24:54
领域: cs.CL,cs.AI,cs.LG
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
Reliable embodied perception from an egocentric perspective is challenging yet essential for autonomous navigation technology of intelligent mobile agents. With the growing demand of social robotics, near-field scene understanding becomes an important research topic in the areas of egocentric perceptual tasks related to navigation in both crowded and unstructured environments. Due to the complexity of environmental conditions and difficulty of surrounding obstacles owing to truncation and occlusion, the perception capability under this circumstance is still inferior. To further enhance the intelligence of mobile robots, in this paper, we setup an egocentric multi-sensor data collection platform based on 3 main types of sensors (Camera, LiDAR and Fisheye), which supports flexible sensor configurations to enable dynamic sight of view from ego-perspective, capturing either near or farther areas. Meanwhile, a large-scale multimodal dataset is constructed, named RoboSense, to facilitate egocentric robot perception. Specifically, RoboSense contains more than 133K synchronized data with 1.4M 3D bounding box and IDs annotated in the full $360^{\circ}$ view, forming 216K trajectories across 7.6K temporal sequences. It has $270\times$ and $18\times$ as many annotations of surrounding obstacles within near ranges as the previous datasets collected for autonomous driving scenarios such as KITTI and nuScenes. Moreover, we define a novel matching criterion for near-field 3D perception and prediction metrics. Based on RoboSense, we formulate 6 popular tasks to facilitate the future research development, where the detailed analysis as well as benchmarks are also provided accordingly. Data desensitization measures have been conducted for privacy protection.
Updated: 2025-03-05 05:14:34
标题: RoboSense:拥挤和非结构化环境中自主机器人感知和导航的大规模数据集和基准测试
摘要: 从自我中心的视角获得可靠的体现知觉对于智能移动代理的自主导航技术来说是具有挑战性但又必不可少的。随着社交机器人需求的增长,近场景理解成为自我中心感知任务相关导航领域中的重要研究课题,涉及拥挤和无结构环境中的任务。由于环境条件的复杂性和由于截断和遮挡引起的周围障碍物的困难性,此种情况下的感知能力仍然较低。为了进一步提升移动机器人的智能性,在本文中,我们建立了一个基于三种主要类型传感器(相机、LiDAR和鱼眼)的自我中心多传感器数据收集平台,支持灵活的传感器配置,实现从自我视角动态视野,捕捉近距离或更远区域。同时,构建了一个名为RoboSense的大规模多模态数据集,以促进自我中心机器人感知。具体来说,RoboSense包含超过133K个同步数据,其中1.4M个三维边界框和标识在完整的$360^{\circ}$视图中注释,形成跨7.6K个时间序列的216K个轨迹。与以往为自主驾驶场景收集的数据集(如KITTI和nuScenes)相比,它在近距离范围内周围障碍物的注释数量分别是其$270\times$和$18\times$。此外,我们为近场三维感知和预测指标定义了一种新的匹配标准。基于RoboSense,我们制定了6个流行任务以促进未来研究发展,同时也相应提供了详细分析和基准测试。数据脱敏措施已经进行以保护隐私。
更新时间: 2025-03-05 05:14:34
领域: cs.CV,cs.AI
PAC Learning with Improvements
One of the most basic lower bounds in machine learning is that in nearly any nontrivial setting, it takes $\textit{at least}$ $1/\epsilon$ samples to learn to error $\epsilon$ (and more, if the classifier being learned is complex). However, suppose that data points are agents who have the ability to improve by a small amount if doing so will allow them to receive a (desired) positive classification. In that case, we may actually be able to achieve $\textit{zero}$ error by just being "close enough". For example, imagine a hiring test used to measure an agent's skill at some job such that for some threshold $\theta$, agents who score above $\theta$ will be successful and those who score below $\theta$ will not (i.e., learning a threshold on the line). Suppose also that by putting in effort, agents can improve their skill level by some small amount $r$. In that case, if we learn an approximation $\hat{\theta}$ of $\theta$ such that $\theta \leq \hat{\theta} \leq \theta + r$ and use it for hiring, we can actually achieve error zero, in the sense that (a) any agent classified as positive is truly qualified, and (b) any agent who truly is qualified can be classified as positive by putting in effort. Thus, the ability for agents to improve has the potential to allow for a goal one could not hope to achieve in standard models, namely zero error. In this paper, we explore this phenomenon more broadly, giving general results and examining under what conditions the ability of agents to improve can allow for a reduction in the sample complexity of learning, or alternatively, can make learning harder. We also examine both theoretically and empirically what kinds of improvement-aware algorithms can take into account agents who have the ability to improve to a limited extent when it is in their interest to do so.
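The hiring-threshold example can be simulated directly. A minimal sketch, assuming uniform skills and the learning rule "set theta_hat to the smallest skill observed among positive examples" (one simple way to guarantee theta <= theta_hat; with enough samples, theta_hat <= theta + r holds with high probability):

    import numpy as np

    rng = np.random.default_rng(0)
    theta, r = 0.6, 0.1                  # true hiring threshold, improvement budget
    skills = rng.uniform(0, 1, 200)      # observed agent skills
    qualified = skills >= theta          # true labels

    # Smallest skill seen among positives guarantees theta <= theta_hat.
    theta_hat = skills[qualified].min()

    # Agents improve (by at most r) exactly when that gets them accepted; since
    # improvement raises real skill, anyone reaching theta_hat >= theta is qualified.
    final_skill = np.where(skills + r >= theta_hat,
                           np.maximum(skills, theta_hat), skills)
    accepted = final_skill >= theta_hat
    false_pos = (accepted & (final_skill < theta)).sum()
    false_neg = (qualified & ~accepted).sum()
    print(false_pos, false_neg)          # 0 0: zero error, as the abstract argues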
Updated: 2025-03-05 05:03:14
标题: 具有改进能力的PAC学习
摘要: 在机器学习中最基本的一个下界是,在几乎任何非平凡的情境中,学习到错误率 $\epsilon$ 需要至少 $1/\epsilon$ 个样本(如果被学习的分类器复杂,则需要更多)。然而,假设数据点是代理,他们有能力稍微改进自己,如果这样做会让他们获得(期望的)正分类。在这种情况下,我们实际上可能通过只要"足够接近"就能实现零错误。例如,想象一种用于衡量代理在某种工作中技能的招聘测试,对于某个阈值 $\theta$,得分高于 $\theta$ 的代理将成功,得分低于 $\theta$ 的代理则不会成功(即在一条线上学习阈值)。还假设通过努力,代理可以将自己的技能水平提高一小部分 $r$。在这种情况下,如果我们学习到一个 $\theta$ 的近似值 $\hat{\theta}$,使得 $\theta \leq \hat{\theta} \leq \theta + r$,并将其用于招聘,实际上可以实现零错误,意味着(a)任何被分类为正的代理确实合格,(b)任何确实合格的代理可以通过努力被分类为正。因此,代理改进的能力有可能实现在标准模型中无法希望实现的目标,即零错误。 在本文中,我们更广泛地探讨了这一现象,提出了一般结果,并研究了在什么条件下代理改进的能力可以减少学习的样本复杂性,或者反之,可能使学习变得更加困难。我们还从理论和实证两方面研究了哪些考虑到代理具有在利益需要时有限程度改进能力的改进感知算法。
更新时间: 2025-03-05 05:03:14
领域: stat.ML,cs.GT,cs.LG
Enhancing Cybersecurity in Critical Infrastructure with LLM-Assisted Explainable IoT Systems
Ensuring the security of critical infrastructure has become increasingly vital with the proliferation of Internet of Things (IoT) systems. However, the heterogeneous nature of IoT data and the lack of human-comprehensible insights from anomaly detection models remain significant challenges. This paper presents a hybrid framework that combines numerical anomaly detection using Autoencoders with Large Language Models (LLMs) for enhanced preprocessing and interpretability. Two preprocessing approaches are implemented: a traditional method utilizing Principal Component Analysis (PCA) to reduce dimensionality and an LLM-assisted method where GPT-4 dynamically recommends feature selection, transformation, and encoding strategies. Experimental results on the KDDCup99 10% corrected dataset demonstrate that the LLM-assisted preprocessing pipeline significantly improves anomaly detection performance. The macro-average F1 score increased from 0.49 in the traditional PCA-based approach to 0.98 with LLM-driven insights. Additionally, the LLM generates natural language explanations for detected anomalies, providing contextual insights into their causes and implications. This framework highlights the synergy between numerical AI models and LLMs, delivering an accurate, interpretable, and efficient solution for IoT cybersecurity in critical infrastructure.
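As a stand-in for the numerical half of the pipeline, the sketch below scores anomalies by reconstruction error, using PCA for brevity in place of the paper's autoencoder; the LLM-assisted feature-selection step is not reproduced here, and all data is synthetic.

    import numpy as np
    from sklearn.decomposition import PCA

    def reconstruction_anomaly_scores(X_train, X_test, n_components=10):
        """Score anomalies by how badly a low-dimensional model reconstructs them,
        mirroring the autoencoder criterion in the abstract with PCA for brevity."""
        pca = PCA(n_components=n_components).fit(X_train)
        X_hat = pca.inverse_transform(pca.transform(X_test))
        return np.mean((X_test - X_hat) ** 2, axis=1)

    rng = np.random.default_rng(1)
    normal = rng.normal(size=(500, 20))
    odd = rng.normal(loc=4.0, size=(5, 20))          # injected anomalies
    scores = reconstruction_anomaly_scores(normal, np.vstack([normal[:50], odd]))
    print(scores[-5:].mean() > scores[:50].mean())   # anomalies score higher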
Updated: 2025-03-05 04:53:07
标题: 用LLM辅助可解释的物联网系统增强关键基础设施的网络安全
摘要: 确保关键基础设施的安全性随着物联网系统的增加变得越来越重要。然而,物联网数据的异构性以及异常检测模型缺乏人类可理解的洞察力仍然是重大挑战。本文提出了一个混合框架,将使用自动编码器进行数值异常检测与大型语言模型(LLMs)相结合,以增强预处理和可解释性。实施了两种预处理方法:一种传统方法利用主成分分析(PCA)来降低维度,另一种是LLM辅助方法,其中GPT-4动态推荐特征选择、转换和编码策略。 对KDDCup99 10%校正数据集的实验结果表明,LLM辅助的预处理流程显著改善了异常检测性能。宏平均F1分数从传统的基于PCA的方法的0.49增加到LLM驱动的洞察力的0.98。此外,LLM生成了对检测到的异常的自然语言解释,提供了有关其原因和影响的上下文洞察。这个框架突出了数值人工智能模型和LLMs之间的协同作用,为关键基础设施中物联网网络安全提供了准确、可解释和高效的解决方案。
更新时间: 2025-03-05 04:53:07
领域: cs.CR
ProReflow: Progressive Reflow with Decomposed Velocity
Diffusion models have achieved significant progress in both image and video generation while still suffering from huge computation costs. As an effective solution, flow matching aims to reflow the diffusion process of diffusion models into a straight line for a few-step and even one-step generation. However, in this paper, we suggest that the original training pipeline of flow matching is not optimal and introduce two techniques to improve it. Firstly, we introduce progressive reflow, which progressively reflows the diffusion models in local timesteps until the whole diffusion progresses, reducing the difficulty of flow matching. Second, we introduce aligned v-prediction, which highlights the importance of direction matching in flow matching over magnitude matching. Experimental results on SDv1.5 and SDXL demonstrate the effectiveness of our method, for example, conducting on SDv1.5 achieves an FID of 10.70 on MSCOCO2014 validation set with only 4 sampling steps, close to our teacher model (32 DDIM steps, FID = 10.05).
Updated: 2025-03-05 04:50:53
标题: ProReflow: 采用分解速度的渐进回流
摘要: 扩散模型在图像和视频生成方面取得了重要进展,但仍然面临巨大的计算成本。作为一种有效的解决方案,流匹配旨在将扩散模型的扩散过程拉直为一条直线,以实现少步甚至一步生成。然而,在本文中,我们认为流匹配的原始训练流程并不是最佳的,并引入了两种技术来改进它。首先,我们引入了渐进回流(progressive reflow),在局部时间步内逐步回流扩散模型,直至覆盖整个扩散过程,从而降低流匹配的难度。其次,我们引入了对齐的v-预测,强调了在流匹配中方向匹配相对于幅度匹配的重要性。在SDv1.5和SDXL上的实验结果表明了我们方法的有效性,例如在SDv1.5上仅使用4个采样步骤即可在MSCOCO2014验证集上取得10.70的FID,接近我们的教师模型(32个DDIM步骤,FID = 10.05)。
更新时间: 2025-03-05 04:50:53
领域: cs.GR,cs.AI,cs.CV
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
We introduce ArcPro, a novel learning framework built on architectural programs to recover structured 3D abstractions from highly sparse and low-quality point clouds. Specifically, we design a domain-specific language (DSL) to hierarchically represent building structures as a program, which can be efficiently converted into a mesh. We bridge feedforward and inverse procedural modeling by using a feedforward process for training data synthesis, allowing the network to make reverse predictions. We train an encoder-decoder on the points-program pairs to establish a mapping from unstructured point clouds to architectural programs, where a 3D convolutional encoder extracts point cloud features and a transformer decoder autoregressively predicts the programs in a tokenized form. Inference by our method is highly efficient and produces plausible and faithful 3D abstractions. Comprehensive experiments demonstrate that ArcPro outperforms both traditional architectural proxy reconstruction and learning-based abstraction methods. We further explore its potential to work with multi-view image and natural language inputs.
Updated: 2025-03-05 04:49:18
标题: ArcPro:用于稀疏点结构化三维抽象的建筑程序
摘要: 我们介绍了ArcPro,这是一个基于建筑程序构建的新型学习框架,用于从高度稀疏和低质量的点云中恢复结构化的3D抽象。具体而言,我们设计了一个领域特定语言(DSL),以分层方式将建筑结构表示为一个程序,该程序可以高效地转换为网格。我们通过使用前馈过程进行训练数据合成来连接前馈和逆向过程建模,使网络能够进行逆向预测。我们在点云-程序对上训练编码器-解码器,建立从无结构点云到建筑程序的映射,其中3D卷积编码器提取点云特征,变换器解码器自回归地预测以标记形式呈现的程序。我们的方法推理效率高,并产生可信且忠实的3D抽象。全面的实验表明,ArcPro优于传统的建筑代理重建和基于学习的抽象方法。我们进一步探讨了其结合多视图图像和自然语言输入的潜力。
更新时间: 2025-03-05 04:49:18
领域: cs.GR,cs.CV,cs.LG
Active operator learning with predictive uncertainty quantification for partial differential equations
In this work, we develop a method for uncertainty quantification in deep operator networks (DeepONets) using predictive uncertainty estimates calibrated to model errors observed during training. The uncertainty framework operates using a single network, in contrast to existing ensemble approaches, and introduces minimal overhead during training and inference. We also introduce an optimized implementation for DeepONet inference (reducing evaluation times by a factor of five) to provide models well-suited for real-time applications. We evaluate the uncertainty-equipped models on a series of partial differential equation (PDE) problems, and show that the model predictions are unbiased, non-skewed, and accurately reproduce solutions to the PDEs. To assess how well the models generalize, we evaluate the network predictions and uncertainty estimates on in-distribution and out-of-distribution test datasets. We find the predictive uncertainties accurately reflect the observed model errors over a range of problems with varying complexity; simpler out-of-distribution examples are assigned low uncertainty estimates, consistent with the observed errors, while more complex out-of-distribution examples are properly assigned higher uncertainties. We also provide a statistical analysis of the predictive uncertainties and verify that these estimates are well-aligned with the observed error distributions at the tail-end of training. Finally, we demonstrate how predictive uncertainties can be used within an active learning framework to yield improvements in accuracy and data-efficiency for outer-loop optimization procedures.
Updated: 2025-03-05 04:48:14
标题: 面向偏微分方程的具有预测不确定性量化的主动算子学习
摘要: 在这项工作中,我们开发了一种用于深度算子网络(DeepONets)中的不确定性量化的方法,该方法使用根据训练期间观察到的模型误差进行校准的预测不确定性估计。该不确定性框架仅使用单个网络,与现有的集成方法相比,训练和推断过程中引入的开销最小。我们还引入了一种针对DeepONet推断的优化实现(将评估时间缩短了五倍),以提供适用于实时应用的模型。我们在一系列偏微分方程(PDE)问题上评估了配备不确定性的模型,并显示模型预测是无偏的,非倾斜的,并且准确重现了PDE的解。为了评估模型的泛化能力,我们在分布内和分布外的测试数据集上评估了网络的预测和不确定性估计。我们发现,预测的不确定性准确地反映了在一系列具有不同复杂性的问题上观察到的模型错误;简单的分布外示例被分配了低不确定性估计,与观察到的错误一致,而更复杂的分布外示例被适当地分配了更高的不确定性。我们还对预测的不确定性进行了统计分析,并验证这些估计与训练末端观察到的错误分布是一致的。最后,我们演示了如何在主动学习框架内使用预测的不确定性,以提高外循环优化过程的准确性和数据效率。
更新时间: 2025-03-05 04:48:14
领域: cs.LG,math.PR
Transformer Block Coupling and its Correlation with Generalization in LLMs
Large Language Models (LLMs) have made significant strides in natural language processing, and a precise understanding of the internal mechanisms driving their success is essential. In this work, we analyze the trajectories of token embeddings as they pass through transformer blocks, linearizing the system along these trajectories through their Jacobian matrices. By examining the relationships between these block Jacobians, we uncover the phenomenon of \textbf{transformer block coupling} in a multitude of LLMs, characterized by the coupling of their top singular vectors across tokens and depth. Our findings reveal that coupling \textit{positively correlates} with model performance, and that this relationship is stronger than with other hyperparameters such as parameter count, model depth, and embedding dimension. We further investigate how these properties emerge during training, observing a progressive development of coupling, increased linearity, and layer-wise exponential growth in token trajectories. Additionally, experiments with Vision Transformers (ViTs) corroborate the emergence of coupling and its relationship with generalization, reinforcing our findings in LLMs. Collectively, these insights offer a novel perspective on token interactions in transformers, opening new directions for studying their mechanisms as well as improving training and generalization.
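One simple way to quantify how "coupled" two block Jacobians are is to measure the alignment of their top singular subspaces, sketched below with NumPy. This is an illustrative proxy for the coupling notion the abstract describes, not necessarily the paper's exact metric.

    import numpy as np

    def top_singular_alignment(J1, J2, k=5):
        """Alignment of the top-k left singular subspaces of two Jacobians.
        Returns a value in [0, 1]; 1 means the subspaces coincide."""
        U1 = np.linalg.svd(J1, full_matrices=False)[0][:, :k]
        U2 = np.linalg.svd(J2, full_matrices=False)[0][:, :k]
        # squared Frobenius norm of U1^T U2, normalized by k
        return np.linalg.norm(U1.T @ U2) ** 2 / k

    rng = np.random.default_rng(0)
    J = rng.normal(size=(64, 64))
    print(top_singular_alignment(J, J))                          # 1.0 (identical)
    print(top_singular_alignment(J, rng.normal(size=(64, 64))))  # small (unrelated)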
Updated: 2025-03-05 04:47:05
标题: Transformer块耦合及其与LLM泛化能力的相关性
摘要: 大型语言模型(LLMs)在自然语言处理方面取得了显著进展,对于驱动其成功的内部机制的准确理解至关重要。在这项工作中,我们分析了令牌嵌入经过变换器块时的轨迹,通过它们的雅可比矩阵沿着这些轨迹线性化系统。通过检查这些块雅可比矩阵之间的关系,我们发现了在许多LLMs中的\textbf{变换器块耦合}现象,其特征是它们的顶部奇异向量在令牌和深度之间的耦合。我们的发现表明,耦合与模型性能\textit{正相关},并且这种关系比与其他超参数(如参数数量、模型深度和嵌入维度)的关系更强。我们进一步研究了这些特性在训练过程中如何出现,观察到耦合的逐渐发展、线性程度的提高以及令牌轨迹的逐层指数增长。此外,对视觉变换器(ViTs)的实验证实了耦合的出现及其与泛化的关系,从而加强了我们在LLMs中的发现。总的来说,这些见解为变换器中的令牌交互提供了一种新的视角,为研究它们的机制以及改善训练和泛化开辟了新的方向。
更新时间: 2025-03-05 04:47:05
领域: cs.LG,cs.AI,cs.CL
An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus
Autonomous vehicles represent a revolutionary advancement driven by the integration of artificial intelligence within intelligent transportation systems. However, they remain vulnerable due to the absence of robust security mechanisms in the Controller Area Network (CAN) bus. In order to mitigate the security issue, many machine learning models and strategies have been proposed, which primarily focus on a subset of dominant patterns of anomalies and lack rigorous evaluation in terms of reliability and robustness. Therefore, to address the limitations of previous works and mitigate the security vulnerability in CAN bus, the current study develops a model based on the intrinsic nature of the problem to cover all dominant patterns of anomalies. To achieve this, a cascade feature-level fusion strategy optimized by a two-parameter genetic algorithm is proposed to combine temporal and spatial information. Subsequently, the model is evaluated using a paired t-test to ensure reliability and robustness. Finally, a comprehensive comparative analysis conducted on two widely used datasets advocates that the proposed model outperforms other models and achieves superior accuracy and F1-score, demonstrating the best performance among all models presented to date.
Updated: 2025-03-05 04:45:03
标题: 一种用于CAN总线异常检测的最佳级联特征级时空融合策略
摘要: 自动驾驶车辆代表了一项革命性的进步,这是由人工智能在智能交通系统中的整合推动的。然而,由于控制器局域网(CAN)总线中缺乏强大的安全机制,它们仍然容易受到攻击。为了缓解安全问题,已经提出了许多机器学习模型和策略,这些模型主要关注异常的主要模式子集,并缺乏可靠性和稳健性方面的严格评估。因此,为了解决以往研究的局限性并缓解CAN总线的安全漏洞,本研究开发了一个基于问题固有特性的模型,以覆盖所有主要异常模式。为了实现这一目标,提出了一个由双参数遗传算法优化的级联特征级融合策略,将时空信息结合起来。随后,利用成对t检验对模型进行评估,以确保可靠性和稳健性。最后,对两个广泛使用的数据集进行了全面的比较分析,支持所提出的模型优于其他模型,并实现了更高的准确性和F1分数,表明该模型是迄今为止所有模型中表现最佳的。
更新时间: 2025-03-05 04:45:03
领域: cs.LG,cs.AI,cs.CR
Multiaccuracy and Multicalibration via Proxy Groups
As the use of predictive machine learning algorithms increases in high-stakes decision-making, it is imperative that these algorithms are fair across sensitive groups. Unfortunately, measuring and enforcing fairness in real-world applications can be challenging due to missing or incomplete sensitive group data. Proxy-sensitive attributes have been proposed as a practical and effective solution in these settings, but only for parity-based fairness notions. Knowing how to evaluate and control for fairness with missing sensitive group data for newer and more flexible frameworks, such as multiaccuracy and multicalibration, remains unexplored. In this work, we address this gap by demonstrating that in the absence of sensitive group data, proxy-sensitive attributes can provably be used to derive actionable upper bounds on the true multiaccuracy and multicalibration, providing insights into a model's potential worst-case fairness violations. Additionally, we show that adjusting models to satisfy multiaccuracy and multicalibration across proxy-sensitive attributes can significantly mitigate these violations for the true, but unknown, sensitive groups. Through several experiments on real-world datasets, we illustrate that approximate multiaccuracy and multicalibration can be achieved even when sensitive group information is incomplete or unavailable.
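For intuition, the empirical multiaccuracy violation with respect to a (proxy) grouping is just the worst absolute mean residual across groups; the sketch below computes it on synthetic data, leaving the paper's upper-bound derivation aside.

    import numpy as np

    def multiaccuracy_violation(y_true, y_pred, group_ids):
        """Worst absolute mean residual over groups: small values mean the model
        is (empirically) multiaccurate with respect to the given grouping."""
        resid = y_true - y_pred
        return max(abs(resid[group_ids == g].mean()) for g in np.unique(group_ids))

    rng = np.random.default_rng(0)
    y = rng.binomial(1, 0.5, 1000).astype(float)
    p = np.clip(y * 0.7 + rng.normal(0, 0.1, 1000), 0, 1)  # imperfect predictor
    proxy = rng.integers(0, 4, 1000)                       # proxy-sensitive groups
    print(multiaccuracy_violation(y, p, proxy))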
Updated: 2025-03-05 04:41:11
标题: 通过代理群体实现多准确性和多校准
摘要: 随着在高风险决策中使用预测机器学习算法的增加,这些算法在敏感群体之间的公平性至关重要。不幸的是,在现实世界应用中测量和执行公平性可能会面临挑战,因为敏感群体数据缺失或不完整。代理敏感属性被提议作为这些情况下的实际和有效解决方案,但仅适用于基于平等的公平性概念。如何评估和控制针对新型和更灵活框架(如多准确度和多校准)的公平性,其中存在缺失敏感群体数据,尚未探讨。在这项工作中,我们通过展示,在没有敏感群体数据的情况下,可以明确地使用代理敏感属性来推导真实多准确度和多校准的可操作上限,为模型潜在最坏情况下的公平性违规提供见解。此外,我们展示了调整模型以满足代理敏感属性上的多准确度和多校准可以显著减轻这些违规对真正但未知的敏感群体的影响。通过对真实数据集进行几项实验,我们说明即使敏感群体信息不完整或不可用,也可以实现近似的多准确度和多校准。
更新时间: 2025-03-05 04:41:11
领域: stat.ML,cs.LG
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
It is generally perceived that Dynamic Sparse Training opens the door to a new era of scalability and efficiency for artificial neural networks at, perhaps, some cost in accuracy on the classification task. At the same time, Dense Training is widely accepted as the "de facto" approach to train artificial neural networks if one would like to maximize their robustness against image corruption. In this paper, we question this general practice. Consequently, we claim that, contrary to what is commonly thought, Dynamic Sparse Training methods can consistently outperform Dense Training in terms of robustness accuracy, particularly if the efficiency aspect is not considered as a main objective (i.e., sparsity levels between 10% and up to 50%), without adding (or even reducing) resource cost. We validate our claim on two types of data, images and videos, using several traditional and modern deep learning architectures for computer vision and three widely studied Dynamic Sparse Training algorithms. Our findings reveal a new yet-unknown benefit of Dynamic Sparse Training and open new possibilities for improving deep learning robustness beyond the current state of the art.
Updated: 2025-03-05 04:37:07
标题: 动态稀疏训练与密集训练:在图像损坏鲁棒性中的意外赢家
摘要: 一般认为,动态稀疏训练为人工神经网络开启了一个新的可扩展性和效率时代,尽管在分类任务的准确性性能方面可能会付出一些代价。与此同时,密集训练被广泛认为是训练人工神经网络的"事实上"方法,如果想要最大化其抵抗图像破坏的稳健性。本文质疑了这种普遍做法。因此,我们声称,与常见看法相反,动态稀疏训练方法在稳健性准确性方面可以持续优于密集训练,特别是在效率方面并非主要目标时(即稀疏度在10%到50%之间),而不增加(甚至减少)资源成本。我们在图像和视频两种数据类型上验证了我们的说法,使用了几种传统和现代深度学习架构进行计算机视觉,并使用了三种广泛研究的动态稀疏训练算法。我们的研究结果揭示了动态稀疏训练的一种新的、尚未被发现的好处,并为改善深度学习的稳健性打开了新的可能性,超越了当前的技术水平。
更新时间: 2025-03-05 04:37:07
领域: cs.CV,cs.AI
The impact of AI and peer feedback on research writing skills: a study using the CGScholar platform among Kazakhstani scholars
This research studies the impact of AI and peer feedback on the academic writing development of Kazakhstani scholars using the CGScholar platform - a product of research into collaborative learning, big data, and artificial intelligence developed by educators and computer scientists at the University of Illinois at Urbana-Champaign (UIUC). The study aimed to find out how familiarity with AI tools and peer feedback processes impacts participants' openness to incorporating feedback into their academic writing. The study involved 36 scholars enrolled in a scientific internship focused on education at UIUC. A survey with 15 multiple-choice questions, a Likert scale, and open-ended questions was used to collect data. The survey was conducted via Google Forms in both English and Russian to ensure linguistic accessibility. Demographic information such as age, gender, and first language was collected to provide a detailed understanding of the data. The analysis revealed a moderate positive correlation between familiarity with AI tools and openness to making changes based on feedback, and a strong positive correlation between research writing experience and expectations of peer feedback, especially in the area of research methodology. These results show that participants are open-minded to AI-assisted feedback; however, they still highly appreciate peer input, especially regarding methodological guidance. This study demonstrates the potential benefits of integrating AI tools with traditional feedback mechanisms to improve research writing quality in academic settings.
Updated: 2025-03-05 04:34:25
标题: 人工智能和同行反馈对哈萨克斯坦学者研究写作技能的影响:利用CGScholar平台进行的研究
摘要: 本研究考察了AI和同行反馈对哈萨克斯坦学者在CGScholar平台上的学术写作发展的影响 - 该平台是伊利诺伊大学厄巴纳-香槟分校(UIUC)的教育工作者和计算机科学家在协作学习、大数据和人工智能方面的研究成果。该研究旨在了解对AI工具和同行反馈过程的熟悉程度如何影响参与者将反馈纳入学术写作中的开放程度。该研究涉及36名参加UIUC以教育为主题的科研实习的学者。使用了包含15个多项选择题、李克特量表和开放性问题的调查来收集数据。调查通过Google表单以英语和俄语进行,以确保语言可访问性。收集了年龄、性别和母语等人口统计信息,以提供对数据的详细理解。分析显示,熟悉AI工具与根据反馈进行改变的开放程度之间存在中等正相关,研究写作经验与同行反馈期望之间存在强烈正相关,特别是在研究方法论领域。这些结果表明,参与者对AI辅助反馈持开放态度;然而,他们仍然高度重视同行输入,特别是在方法指导方面。该研究展示了将AI工具与传统反馈机制相结合以提高学术环境中研究写作质量的潜在好处。
更新时间: 2025-03-05 04:34:25
领域: cs.CY,cs.AI
A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement
This article discusses the evolving role of artificial intelligence (AI) in the legal profession, focusing on its potential to streamline tasks such as document review, research, and contract drafting. However, challenges persist, particularly the occurrence of "hallucinations" in AI models, where they generate inaccurate or misleading information, undermining their reliability in legal contexts. To address this, the article proposes a novel framework combining a mixture of expert systems with a knowledge-based architecture to improve the precision and contextual relevance of AI-driven legal services. This framework utilizes specialized modules, each focusing on specific legal areas, and incorporates structured operational guidelines to enhance decision-making. Additionally, it leverages advanced AI techniques like Retrieval-Augmented Generation (RAG), Knowledge Graphs (KG), and Reinforcement Learning from Human Feedback (RLHF) to improve the system's accuracy. The proposed approach demonstrates significant improvements over existing AI models, showcasing enhanced performance in legal tasks and offering a scalable solution to provide more accessible and affordable legal services. The article also outlines the methodology, system architecture, and promising directions for future research in AI applications for the legal sector.
Updated: 2025-03-05 04:32:02
标题: 可靠法律人工智能的综合框架:结合专业专家系统和自适应改进
摘要: 这篇文章讨论了人工智能(AI)在法律行业中不断发展的作用,重点关注其在简化文件审查、研究和合同起草等任务方面的潜力。然而,挑战仍然存在,特别是AI模型中出现"幻觉"的情况,导致它们生成不准确或误导性信息,削弱了它们在法律环境中的可靠性。为了解决这个问题,文章提出了一个将专家系统混合体与基于知识的架构相结合的新颖框架,以提高AI驱动的法律服务的准确性和上下文相关性。这个框架利用专门的模块,每个模块都专注于特定的法律领域,并结合了结构化的操作指南来增强决策制定。此外,它利用了检索增强生成(RAG)、知识图谱(KG)和基于人类反馈的强化学习(RLHF)等先进的AI技术来提高系统的准确性。提出的方法相对于现有的AI模型展示了显著的改进,展示了在法律任务中性能的提升,并提供了一个可扩展的解决方案,以提供更加易于获取和负担得起的法律服务。文章还概述了方法论、系统架构以及AI在法律领域的未来研究方向。
更新时间: 2025-03-05 04:32:02
领域: cs.AI,cs.CY
Intermediate-Task Transfer Learning: Leveraging Sarcasm Detection for Stance Detection
Stance Detection (SD) on social media has emerged as a prominent area of interest, with implications for social, business, and political applications, thereby garnering escalating research attention within NLP. The inherent subtlety and complexity of texts procured from online platforms pose challenges for SD algorithms in accurately discerning the author's stance. In particular, the inclusion of sarcastic and figurative language drastically impacts the performance of SD models. This paper addresses this by employing sarcasm-detection intermediate-task transfer learning tailored for SD. The proposed methodology involves the fine-tuning of BERT and RoBERTa and the concatenation of convolutional BiLSTM and dense layers. Rigorous experiments are conducted on publicly available datasets to evaluate our transfer-learning framework. The performance of the approach is assessed against various state-of-the-art baselines for SD, providing empirical evidence of its effectiveness. Notably, our model outperforms the best SOTA models even prior to sarcasm-detection pretraining. The integration of sarcasm knowledge into the model proves instrumental in mitigating misclassifications of sarcastic textual elements in SD. Our model accurately predicts 85% of texts that were previously misclassified by the model without sarcasm-detection pretraining, thereby amplifying the average F1-score of the model. Our experiments also revealed that the success of the transfer-learning framework is contingent upon the correlation of lexical attributes between the intermediate task and the target task. This study represents the first exploration of sarcasm detection as an intermediate transfer-learning task in the context of SD, and it simultaneously uses the concatenation of BERT or RoBERTa with other deep-learning techniques, establishing the proposed approach as a foundational baseline for future research endeavors in this domain.
Updated: 2025-03-05 04:30:53
标题: 中间任务的迁移学习:利用讽刺检测进行立场检测
摘要: 社交媒体上的立场检测(SD)已成为一个备受关注的领域,对社会商业和政治应用具有重要影响,因此在自然语言处理领域引起了越来越多的研究关注。从在线平台获取的文本的微妙和复杂性对SD算法在准确识别作者立场方面提出了挑战。大多数情况下,讽刺和比喻语言的使用会严重影响SD模型的性能。本文通过采用针对SD定制的讽刺检测中间任务迁移学习来解决这一问题。所提出的方法涉及对BERT和RoBERTa进行微调,并将卷积BiLSTM和密集层连接在一起。我们在公开可用的数据集上进行了严格实验,以评估我们的迁移学习框架的性能。该方法的表现与各种最新技术基线进行了比较,为SD提供了其有效性的实证证据。值得注意的是,我们的模型在讽刺检测预训练之前甚至就超过了最佳的SOTA模型。将讽刺知识整合到模型中对于减轻SD中讽刺文本元素的误分类起到了关键作用。我们的模型准确预测了之前被模型错误分类的85%的文本,从而提高了模型的平均F1分数。我们的实验还揭示了迁移学习框架的成功取决于中间任务和目标任务之间的词汇属性之间的相关性。本研究代表了在SD背景下将讽刺检测作为中间迁移学习任务的首次探索,并同时使用BERT或RoBERTa与其他深度学习技术的连接,将所提出的方法建立为未来研究努力在这一领域的基础基线。
更新时间: 2025-03-05 04:30:53
领域: cs.CL,cs.AI
Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging
Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness and efficiency of our method. Experimental results demonstrate a significant reduction in computation time, with an approximate 83% decrease for dense matrices and up to 30% for sparse matrices.
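The first, parallelizable stage can be sketched as: split the matrix into a grid of submatrices and co-cluster each independently. The snippet below uses scikit-learn's SpectralCoclustering as a generic per-block solver purely for illustration; the paper's probabilistic partition optimization and hierarchical merging of per-block co-clusters are omitted.

    import numpy as np
    from sklearn.cluster import SpectralCoclustering

    def blockwise_cocluster(X, row_parts=2, col_parts=2, k=3):
        """Split X into a grid of submatrices and co-cluster each independently
        (the stage that can run in parallel); merging is left out for brevity."""
        results = []
        for rows in np.array_split(np.arange(X.shape[0]), row_parts):
            for cols in np.array_split(np.arange(X.shape[1]), col_parts):
                model = SpectralCoclustering(n_clusters=k, random_state=0)
                model.fit(X[np.ix_(rows, cols)])
                results.append((rows, cols, model.row_labels_, model.column_labels_))
        return results

    # Non-negative toy matrix, as spectral co-clustering expects.
    X = np.abs(np.random.default_rng(0).normal(size=(200, 120))) + 0.1
    blocks = blockwise_cocluster(X)   # 4 submatrices, each with k co-clusters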
Updated: 2025-03-05 04:30:02
标题: 通过动态分区与分层合并实现大规模数据的可扩展共聚类
摘要: 共聚类同时对行和列进行聚类,揭示更细粒度的群组。然而,现有的共聚类方法存在可扩展性差,无法处理大规模数据。本文提出了一种新颖且可扩展的共聚类方法,旨在揭示高维大规模数据集中的复杂模式。具体来说,我们首先提出了一种大矩阵分区算法,将大矩阵分割成较小的子矩阵,实现并行共聚类。该方法采用概率模型优化子矩阵的配置,平衡计算效率和分析深度。此外,我们提出了一种层次化共聚类合并算法,有效识别和合并这些子矩阵中的共聚类,增强了过程的鲁棒性和可靠性。大量评估验证了我们方法的有效性和效率。实验结果表明,计算时间显著减少,对于密集矩阵减少约83%,对于稀疏矩阵高达30%。
更新时间: 2025-03-05 04:30:02
领域: cs.DC,cs.LG,H.2.8
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks
The observations documented in Cyber Threat Intelligence (CTI) reports play a critical role in describing adversarial behaviors, providing valuable insights for security practitioners to respond to evolving threats. Recent advancements of Large Language Models (LLMs) have demonstrated significant potential in various cybersecurity applications, including CTI report understanding and attack knowledge graph construction. While previous works have proposed benchmarks that focus on the CTI extraction ability of LLMs, the sequential characteristic of adversarial behaviors within CTI reports remains largely unexplored, which holds considerable significance in developing a comprehensive understanding of how adversaries operate. To address this gap, we introduce AttackSeqBench, a benchmark tailored to systematically evaluate LLMs' capability to understand and reason about attack sequences in CTI reports. Our benchmark encompasses three distinct Question Answering (QA) tasks, with each task focusing on a different granularity of adversarial behavior. To alleviate the laborious effort of QA construction, we carefully design an automated dataset construction pipeline to create scalable and well-formulated QA datasets based on real-world CTI reports. To ensure the quality of our dataset, we adopt a hybrid approach combining human evaluation and systematic evaluation metrics. We conduct extensive experiments and analysis with both fast-thinking and slow-thinking LLMs, while highlighting their strengths and limitations in analyzing the sequential patterns in cyber attacks. The overarching goal of this work is to provide a benchmark that advances LLM-driven CTI report understanding and fosters its application in real-world cybersecurity operations. Our dataset and code are available at https://github.com/Javiery3889/AttackSeqBench .
Updated: 2025-03-05 04:25:21
标题: AttackSeqBench:评估大型语言模型对网络攻击中序列模式理解能力的基准
摘要: 网络威胁情报(CTI)报告中记录的观察结果在描述对手行为方面发挥着至关重要的作用,为安全从业者提供了宝贵的见解,帮助其应对不断演变的威胁。最近大型语言模型(LLMs)的进展在各种网络安全应用中展示出了显著的潜力,包括CTI报告理解和攻击知识图谱构建。虽然先前的研究提出了专注于LLMs的CTI提取能力的基准,但CTI报告中对手行为的顺序特征仍然很大程度上未被探索,这在开发对对手操作方式的全面理解方面具有重要意义。为了填补这一空白,我们引入了AttackSeqBench,这是一个专门设计用于系统评估LLMs在理解和推理CTI报告中攻击序列的能力的基准。我们的基准包括三个不同的问答(QA)任务,每个任务侧重于对手行为中的不同粒度。为了减轻QA构建的繁重工作,我们精心设计了一个自动化数据集构建流程,基于现实世界的CTI报告创建可扩展且构思良好的QA数据集。为确保数据集的质量,我们采用了结合人工评估和系统评估指标的混合方法。我们进行了广泛的实验和分析,同时强调了快速思考和慢思考LLMs在分析网络攻击中的顺序模式时的优势和局限性。本工作的总体目标是提供一个推动LLM驱动的CTI报告理解的基准,并促进其在现实世界网络安全运营中的应用。我们的数据集和代码可在https://github.com/Javiery3889/AttackSeqBench 上找到。
更新时间: 2025-03-05 04:25:21
领域: cs.CR,cs.AI
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
In this paper, we present a first study investigating the trust and ethical implications of on-device artificial intelligence (AI), focusing on small language models (SLMs) amenable to personal devices like smartphones. While on-device SLMs promise enhanced privacy, reduced latency, and improved user experience compared to cloud-based services, we posit that they might also introduce significant risks and vulnerabilities compared to their on-server counterparts. As part of our trust assessment study, we conduct a systematic evaluation of state-of-the-art on-device SLMs, contrasted with their on-server counterparts, based on a well-established trustworthiness measurement framework. Our results show on-device SLMs to be significantly less trustworthy, specifically demonstrating more stereotypical, unfair, and privacy-breaching behavior. Informed by these findings, we then perform our ethics assessment study using a dataset of unethical questions that depict harmful scenarios. Our results illustrate the lack of ethical safeguards in on-device SLMs, emphasizing their capability to generate harmful content. Further, the broken safeguards and exploitable nature of on-device SLMs are demonstrated using potentially unethical vanilla prompts, to which the on-device SLMs answer with valid responses without any filters and without the need for any jailbreaking or prompt engineering. These responses can be abused for various harmful and unethical scenarios, such as societal harm, illegal activities, hate, self-harm, and exploitable phishing content, all of which indicates the severe vulnerability and exploitability of these on-device SLMs.
Updated: 2025-03-05 04:18:08
标题: 设备端人工智能是否存在缺陷且可被利用?评估小型语言模型中的信任与伦理
摘要: 在这篇论文中,我们提出了一项首次研究,旨在调查设备上人工智能(AI)的信任和伦理影响,重点关注适用于个人设备(如智能手机)的小型语言模型(SLMs)。虽然设备上的SLMs承诺提供增强的隐私保护、降低的延迟和改善的用户体验,与基于云的服务相比,但我们认为它们可能也引入了与其基于服务器的对应物相比更大的风险和漏洞。作为我们信任评估研究的一部分,我们对最新的设备上SLMs进行了系统评估,与它们的基于服务器的对应物进行对比,基于一个既定的信誉度测量框架。我们的结果显示,设备上的SLMs的信誉度明显较低,具体表现为更具刻板印象、不公平和侵犯隐私的行为。在这些发现的基础上,我们使用一个不道德问题的数据集进行了伦理评估研究,展示了有害场景。我们的结果说明设备上的SLMs缺乏伦理保障,强调了它们产生有害内容的能力。此外,我们使用潜在不道德的普通提示展示了设备上SLMs的破损保护和可利用性,对这些提示,设备上的SLMs会以有效的回答而不需要任何过滤器,也不需要越狱或提示工程。这些回答可以被滥用用于各种有害和不道德的场景,如:社会伤害、非法活动、仇恨、自残、可利用的网络钓鱼内容等,所有这些都表明了这些设备上SLMs的严重漏洞和可利用性。
更新时间: 2025-03-05 04:18:08
领域: cs.CR,cs.AI
A Predict-Then-Optimize Customer Allocation Framework for Online Fund Recommendation
With the rapid growth of online investment platforms, funds can be distributed to individual customers online. The central issue is to match funds with potential customers under constraints. Most mainstream platforms adopt the recommendation formulation to tackle the problem. However, the traditional recommendation regime has its inherent drawbacks when applying the fund-matching problem with multiple constraints. In this paper, we model the fund matching under the allocation formulation. We design PTOFA, a Predict-Then-Optimize Fund Allocation framework. This data-driven framework consists of two stages, i.e., prediction and optimization, which aim to predict expected revenue based on customer behavior and optimize the impression allocation to achieve the maximum revenue under the necessary constraints, respectively. Extensive experiments on real-world datasets from an industrial online investment platform validate the effectiveness and efficiency of our solution. Additionally, the online A/B tests demonstrate PTOFA's effectiveness in the real-world fund recommendation scenario.
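The "optimize" stage can be cast as a linear program over the predicted revenues. A hedged sketch with SciPy, assuming a hypothetical constraint structure (each customer is shown at most one fund, each fund has an impression capacity); the paper's actual constraints may differ.

    import numpy as np
    from scipy.optimize import linprog

    def allocate(expected_revenue, fund_capacity):
        """Maximize total predicted revenue subject to per-customer and per-fund
        limits. x[i, j] is the (relaxed) fraction of customer i shown fund j."""
        n_cust, n_fund = expected_revenue.shape
        c = -expected_revenue.ravel()                 # linprog minimizes
        A, b = [], []
        for i in range(n_cust):                       # sum_j x[i, j] <= 1
            row = np.zeros(n_cust * n_fund); row[i * n_fund:(i + 1) * n_fund] = 1
            A.append(row); b.append(1)
        for j in range(n_fund):                       # sum_i x[i, j] <= cap_j
            row = np.zeros(n_cust * n_fund); row[j::n_fund] = 1
            A.append(row); b.append(fund_capacity[j])
        res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=(0, 1))
        return res.x.reshape(n_cust, n_fund)

    rev = np.random.default_rng(0).uniform(size=(6, 3))   # stage-1 predictions
    x = allocate(rev, fund_capacity=[2, 2, 2])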
Updated: 2025-03-05 04:16:36
标题: 一种用于在线基金推荐的预测-再优化客户分配框架
摘要: 随着在线投资平台的快速增长,资金可以在线分发给个人客户。核心问题是在约束条件下将资金与潜在客户匹配。大多数主流平台采用推荐公式来解决这个问题。然而,在应用具有多个约束条件的基金匹配问题时,传统的推荐制度具有固有的缺点。在本文中,我们对基金匹配进行了分配公式建模。我们设计了PTOFA,一种预测-优化基金分配框架。这个数据驱动的框架包括两个阶段,即预测和优化,旨在基于客户行为预测预期收入,并优化印象分配以在必要的约束条件下实现最大收入。来自工业在线投资平台的真实数据集上的大量实验验证了我们解决方案的有效性和效率。此外,在线A/B测试证明了PTOFA在现实世界的基金推荐场景中的有效性。
更新时间: 2025-03-05 04:16:36
领域: cs.CE,cs.IR,cs.LG
SpinML: Customized Synthetic Data Generation for Private Training of Specialized ML Models
Specialized machine learning (ML) models tailored to users' needs and requests are increasingly being deployed on smart devices with cameras, to provide personalized intelligent services taking advantage of camera data. However, two primary challenges hinder the training of such models: the lack of publicly available labeled data suitable for specialized tasks and the inaccessibility of labeled private data due to concerns about user privacy. To address these challenges, we propose a novel system, SpinML, where the server generates customized Synthetic image data to Privately traIN a specialized ML model tailored to the user request, using only a few sanitized reference images from the user. SpinML offers users fine-grained, object-level control over the reference images, which allows users to trade between the privacy and utility of the generated synthetic data according to their privacy preferences. Through experiments on three specialized model training tasks, we demonstrate that our proposed system can enhance the performance of specialized models without compromising users' privacy preferences.
Updated: 2025-03-05 04:05:09
标题: SpinML:定制合成数据生成用于专门机器学习模型的私人训练
摘要: 针对用户需求和请求定制的专门的机器学习(ML)模型越来越多地部署在带有摄像头的智能设备上,以利用摄像头数据提供个性化智能服务。然而,训练这种模型面临两个主要挑战:缺乏适用于专门任务的公开可用的标记数据,以及由于用户隐私顾虑而无法访问标记的私有数据。为了解决这些挑战,我们提出了一个新颖的系统SpinML,其中服务器生成定制的合成图像数据,以私密地训练一个针对用户请求定制的专门的ML模型,仅使用用户提供的几张经过处理的参考图像。SpinML为用户提供了对参考图像的细粒度、物体级别的控制,这使用户可以根据其隐私偏好在生成的合成数据的隐私和效用之间进行权衡。通过对三个专门模型训练任务的实验,我们证明了我们提出的系统可以提升专门模型的性能,同时不会损害用户的隐私偏好。
更新时间: 2025-03-05 04:05:09
领域: cs.LG,cs.CR
PROVGEN: A Privacy-Preserving Approach for Outcome Validation in Genomic Research
As genomic research has become increasingly popular in recent years, the sharing of datasets has remained limited due to privacy concerns. This limitation hinders the reproduction and validation of research outcomes, which are essential for identifying computation errors during the research process. In this paper, we introduce PROVGEN, a privacy-preserving method for sharing genomic datasets that facilitates reproducibility and outcome validation in genome-wide association studies (GWAS). Our approach encodes genomic data into binary space and applies a two-stage process. First, we generate a differentially private version of the dataset using an XOR-based mechanism that incorporates biological characteristics. Second, we restore data utility by adjusting the Minor Allele Frequency (MAF) values in the noisy dataset to align with published MAFs using optimal transport. Finally, we decode the processed data back into its genomic form for further use. We evaluate PROVGEN on three real-world genomic datasets and compare it with local differential privacy and three synthesis-based methods. We show that our proposed scheme outperforms all existing methods in detecting GWAS outcome errors, achieves better utility, provides higher privacy protection against membership inference attacks (MIAs). By adopting our method, genomic researchers will be inclined to share differentially private datasets while maintaining high data quality.
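On binary-encoded genotypes, an XOR-based DP mechanism amounts to flipping each bit with a probability set by epsilon (randomized response). The sketch below shows that step plus a simple unbiased frequency debiasing; the paper's version additionally conditions the noise on biological characteristics and aligns frequencies with published MAFs via optimal transport, both omitted here.

    import numpy as np

    def xor_dp(bits, epsilon, rng):
        """Flip each bit independently with prob 1/(1+e^eps) -- the XOR /
        randomized-response mechanism, eps-local-DP per bit."""
        p_flip = 1.0 / (1.0 + np.exp(epsilon))
        return bits ^ (rng.random(bits.shape) < p_flip), p_flip

    rng = np.random.default_rng(0)
    G = rng.random((2000, 50)) < 0.2          # binary-encoded genotypes, MAF ~ 0.2
    noisy, p = xor_dp(G, epsilon=2.0, rng=rng)

    # Debias the per-variant allele frequencies (a stand-in for the paper's
    # optimal-transport alignment to published MAFs).
    maf_est = (noisy.mean(axis=0) - p) / (1 - 2 * p)
    print(abs(maf_est.mean() - G.mean()))      # close to 0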
Updated: 2025-03-05 04:02:30
标题: PROVGEN:一种用于基因组研究结果验证的隐私保护方法
摘要: 随着基因组学研究在近年来变得越来越流行,由于隐私问题,数据集的共享仍然受到限制。这种限制阻碍了研究结果的再现和验证,而这对于在研究过程中识别计算错误是至关重要的。在本文中,我们介绍了PROVGEN,这是一种用于分享基因组数据集的隐私保护方法,可以促进全基因组关联研究(GWAS)中的再现性和结果验证。我们的方法将基因组数据编码为二进制空间,并应用了一个两阶段过程。首先,我们使用基于XOR的机制生成数据集的差分隐私版本,该机制包含了生物特征。其次,我们通过使用最优传输将嘈杂数据集中的次要等位基因频率(MAF)值调整为与已发表的MAF值对齐来恢复数据效用。最后,我们将经过处理的数据解码回其基因组形式以供进一步使用。我们在三个真实基因组数据集上评估了PROVGEN,并将其与本地差分隐私和三种基于合成的方法进行了比较。我们展示了我们提出的方案在检测GWAS结果错误方面优于所有现有方法,实现了更好的效用,提供了更高的隐私保护防止成员推理攻击(MIAs)。通过采用我们的方法,基因组研究人员将倾向于分享具有差分隐私的数据集,同时保持高质量的数据。
更新时间: 2025-03-05 04:02:30
领域: cs.CR
Error Correction Code Transformer: From Non-Unified to Unified
Channel coding is vital for reliable data transmission in modern wireless systems, and its significance will increase with the emergence of sixth-generation (6G) networks, which will need to support various error correction codes. However, traditional decoders were typically designed as fixed hardware circuits tailored to specific decoding algorithms, leading to inefficiencies and limited flexibility. To address these challenges, this paper proposes a unified, code-agnostic Transformer-based decoding architecture capable of handling multiple linear block codes, including Polar, Low-Density Parity-Check (LDPC), and Bose-Chaudhuri-Hocquenghem (BCH), within a single framework. To achieve this, standardized units are employed to harmonize parameters across different code types, while the redesigned unified attention module compresses the structural information of various codewords. Additionally, a sparse mask, derived from the sparsity of the parity-check matrix, is introduced to enhance the model's ability to capture inherent constraints between information and parity-check bits, resulting in improved decoding accuracy and robustness. Extensive experimental results demonstrate that the proposed unified Transformer-based decoder not only outperforms existing methods but also provides a flexible, efficient, and high-performance solution for next-generation wireless communication systems.
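One plausible way to derive an attention mask from a parity-check matrix H is to allow attention only between bit positions that share at least one check (Tanner-graph adjacency). The construction below is an assumption for illustration, not necessarily the paper's exact mask.

    import numpy as np

    def sparse_attention_mask(H):
        """Mask from a parity-check matrix H (checks x bits): position i may
        attend to position j iff some check involves both bits, so H's sparsity
        gates the attention pattern."""
        adj = (H.T @ H) > 0            # bits sharing at least one check
        np.fill_diagonal(adj, True)
        return adj                     # boolean (n_bits x n_bits) attention mask

    # Parity-check matrix of the (7,4) Hamming code.
    H = np.array([[1, 1, 0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0, 0, 1]])
    mask = sparse_attention_mask(H)
    print(mask.sum(), "allowed pairs of", mask.size)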
Updated: 2025-03-05 04:01:38
标题: 纠错码Transformer:从非统一到统一
摘要: 通道编码对于现代无线系统中可靠的数据传输至关重要,随着第六代(6G)网络的出现,其重要性将会增加,这些网络将需要支持各种纠错码。然而,传统的解码器通常被设计为针对特定解码算法的固定硬件电路,导致效率低下且灵活性有限。为了解决这些挑战,本文提出了一种统一的、与编码无关的基于Transformer的解码架构,能够在一个框架内处理多种线性分组码,包括Polar码、低密度奇偶校验(LDPC)码和Bose-Chaudhuri-Hocquenghem(BCH)码。为实现这一目标,采用了标准化单元来协调不同编码类型之间的参数,同时重新设计的统一注意力模块压缩了各种码字的结构信息。此外,还引入了一个稀疏掩码,从奇偶校验矩阵的稀疏性中导出,以增强模型捕捉信息与奇偶校验位之间固有约束的能力,从而提高解码准确性和鲁棒性。广泛的实验结果表明,所提出的统一Transformer解码器不仅优于现有方法,还为下一代无线通信系统提供了灵活、高效和高性能的解决方案。
更新时间: 2025-03-05 04:01:38
领域: cs.IT,cs.LG,math.IT
DiRe-JAX: A JAX based Dimensionality Reduction Algorithm for Large-scale Data
DiRe-JAX is a new dimensionality reduction toolkit designed to address some of the challenges faced by traditional methods like UMAP and tSNE such as loss of global structure and computational efficiency. Built on the JAX framework, DiRe leverages modern hardware acceleration to provide an efficient, scalable, and interpretable solution for visualizing complex data structures, and for quantitative analysis of lower-dimensional embeddings. The toolkit shows considerable promise in preserving both local and global structures within the data as compare to state-of-the-art UMAP and tSNE implementations. This makes it suitable for a wide range of applications in machine learning, bioinformatics, and data science.
Updated: 2025-03-05 03:56:01
标题: DiRe-JAX:一种基于JAX的大规模数据降维算法
摘要: DiRe-JAX是一个新的降维工具包,旨在解决传统方法如UMAP和tSNE所面临的一些挑战,例如全局结构丢失和计算效率低下。建立在JAX框架上,DiRe利用现代硬件加速来提供一种高效、可扩展且可解释的解决方案,用于可视化复杂数据结构和对低维嵌入的定量分析。与最先进的UMAP和tSNE实现相比,该工具包在保留数据中的局部和全局结构方面表现出了相当大的潜力。这使得它适用于机器学习、生物信息学和数据科学的各种应用。
更新时间: 2025-03-05 03:56:01
领域: cs.LG,cs.AI,cs.MS,H.1.1; G.4
Position: Model Collapse Does Not Mean What You Think
The proliferation of AI-generated content online has fueled concerns over \emph{model collapse}, a degradation in future generative models' performance when trained on synthetic data generated by earlier models. Industry leaders, premier research journals and popular science publications alike have prophesied catastrophic societal consequences stemming from model collapse. In this position piece, we contend this widespread narrative fundamentally misunderstands the scientific evidence. We highlight that research on model collapse actually encompasses eight distinct and at times conflicting definitions of model collapse, and argue that inconsistent terminology within and between papers has hindered building a comprehensive understanding of model collapse. To assess how significantly different interpretations of model collapse threaten future generative models, we posit what we believe are realistic conditions for studying model collapse and then conduct a rigorous assessment of the literature's methodologies through this lens. While we leave room for reasonable disagreement, our analysis of research studies, weighted by how faithfully each study matches real-world conditions, leads us to conclude that certain predicted claims of model collapse rely on assumptions and conditions that poorly match real-world conditions, and in fact several prominent collapse scenarios are readily avoidable. Altogether, this position paper argues that model collapse has been warped from a nuanced multifaceted consideration into an oversimplified threat, and that the evidence suggests specific harms more likely under society's current trajectory have received disproportionately less attention.
Updated: 2025-03-05 03:47:17
标题: 立场:模型崩溃并不是你想象的那样
摘要: 网络上人工智能生成内容的增加引发了对“模型崩溃”的担忧,即当未来生成模型在早期模型生成的合成数据上训练时,性能会下降。行业领袖、顶尖研究期刊和流行科学出版物均预言了模型崩溃可能带来的灾难性社会后果。在这篇立场文章中,我们认为这种广泛的叙事根本误解了科学证据。我们强调,关于模型崩溃的研究实际上涵盖了八个不同且有时相互矛盾的定义,并认为在论文内部和之间使用不一致的术语阻碍了对模型崩溃的全面理解。为了评估不同对模型崩溃的解释对未来生成模型构成的威胁有多大,我们提出了我们认为是研究模型崩溃的现实条件,然后通过这个角度对文献的方法进行了严格评估。尽管我们留出合理分歧的余地,但我们对研究研究的分析,根据每项研究与现实条件的匹配程度,导致我们得出结论,即某些关于模型崩溃的预测性声明依赖于与现实条件不符的假设和条件,实际上几个突出的崩溃场景是可以避免的。总的来说,这篇立场文章认为,模型崩溃已经从一个细致多面的考虑被扭曲成一个过于简化的威胁,而证据表明,在社会当前的轨迹下更有可能出现的具体危害受到了不成比例的关注。
更新时间: 2025-03-05 03:47:17
领域: cs.LG,cs.AI,cs.CY
Partial Convolution Meets Visual Attention
Designing an efficient and effective neural network has remained a prominent topic in computer vision research. Depthwise convolution (DWConv) is widely used in efficient CNNs or ViTs, but it needs frequent memory access during inference, which leads to low throughput. FasterNet attempts to introduce partial convolution (PConv) as an alternative to DWConv but compromises the accuracy due to underutilized channels. To remedy this shortcoming and consider the redundancy between feature map channels, we introduce a novel Partial visual ATtention mechanism (PAT) that can efficiently combine PConv with visual attention. Our exploration indicates that the partial attention mechanism can completely replace the full attention mechanism and reduce model parameters and FLOPs. Our PAT can derive three types of blocks: Partial Channel-Attention block (PAT_ch), Partial Spatial-Attention block (PAT_sp) and Partial Self-Attention block (PAT_sf). First, PAT_ch integrates the enhanced Gaussian channel attention mechanism to infuse global distribution information into the untouched channels of PConv. Second, we introduce the spatial-wise attention to the MLP layer to further improve model accuracy. Finally, we replace PAT_ch in the last stage with the self-attention mechanism to extend the global receptive field. Building upon PAT, we propose a novel hybrid network family, named PATNet, which achieves superior top-1 accuracy and inference speed compared to FasterNet on ImageNet-1K classification and excels in both detection and segmentation on the COCO dataset. Particularly, our PATNet-T2 achieves 1.3% higher accuracy than FasterNet-T2, while exhibiting 25% higher GPU throughput and 24% lower CPU latency.
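A minimal PyTorch sketch of the PConv idea with a squeeze-excite-style channel-attention branch on the untouched channels, loosely in the spirit of PAT_ch. It is simplified relative to the paper (no Gaussian-enhanced attention, for instance), and the module name is ours.

    import torch
    import torch.nn as nn

    class PartialConv(nn.Module):
        """Convolve only the first dim // ratio channels and pass the rest
        through a lightweight channel-attention gate instead of a convolution."""
        def __init__(self, dim, ratio=4):
            super().__init__()
            self.k = dim // ratio
            self.conv = nn.Conv2d(self.k, self.k, 3, padding=1)
            self.attn = nn.Sequential(            # squeeze-excite style gating
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(dim - self.k, dim - self.k, 1),
                nn.Sigmoid(),
            )

        def forward(self, x):
            a, b = x[:, :self.k], x[:, self.k:]
            return torch.cat([self.conv(a), b * self.attn(b)], dim=1)

    y = PartialConv(64)(torch.randn(2, 64, 32, 32))   # shape preserved: (2,64,32,32)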
Updated: 2025-03-05 03:42:59
标题: 部分卷积遇见视觉注意力
摘要: 设计高效有效的神经网络一直是计算机视觉研究中一个突出的话题。深度卷积(DWConv)被广泛应用于高效的CNN或ViTs中,但在推理过程中需要频繁的内存访问,导致吞吐量较低。FasterNet尝试引入部分卷积(PConv)作为DWConv的替代方案,但由于通道利用不足而牺牲了准确性。为了弥补这一缺陷并考虑特征图通道之间的冗余,我们引入了一种新颖的部分视觉注意机制(PAT),可以有效地将PConv与视觉注意相结合。我们的探索表明,部分注意机制可以完全取代完全注意机制,并减少模型参数和FLOPs。我们的PAT可以派生出三种类型的块:部分通道注意块(PAT_ch),部分空间注意块(PAT_sp)和部分自注意块(PAT_sf)。首先,PAT_ch将增强的高斯通道注意机制整合到PConv的未触及通道中,以注入全局分布信息。其次,我们在MLP层引入了空间注意来进一步提高模型的准确性。最后,我们在最后阶段用自注意机制替换了PAT_ch,以扩展全局感受野。基于PAT,我们提出了一个新颖的混合网络家族,名为PATNet,与ImageNet-1K分类中的FasterNet相比,PATNet在顶级准确性和推理速度方面表现出色,并在COCO数据集上在检测和分割方面表现出色。特别是,我们的PATNet-T2比FasterNet-T2准确度高出1.3%,同时GPU吞吐量高出25%,CPU延迟低了24%。
更新时间: 2025-03-05 03:42:59
领域: cs.CV,cs.AI
PriFFT: Privacy-preserving Federated Fine-tuning of Large Language Models via Function Secret Sharing
Fine-tuning large language models (LLMs) raises privacy concerns due to the risk of exposing sensitive training data. Federated learning (FL) mitigates this risk by keeping training samples on local devices, but recent studies show that adversaries can still infer private information from model updates in FL. Additionally, LLM parameters are typically shared publicly during federated fine-tuning, while developers are often reluctant to disclose these parameters, posing further security challenges. Inspired by the above problems, we propose PriFFT, a privacy-preserving federated fine-tuning mechanism, to protect both the model updates and parameters. In PriFFT, clients and the server share model inputs and parameters by secret sharing, performing secure fine-tuning on shared values without accessing plaintext data. Due to considerable LLM parameters, privacy-preserving federated fine-tuning invokes complex secure calculations and requires substantial communication and computation resources. To optimize the efficiency of privacy-preserving federated fine-tuning of LLMs, we introduce function secret-sharing protocols for various operations, including reciprocal calculation, tensor products, natural exponentiation, softmax, hyperbolic tangent, and dropout. The proposed protocols achieve up to 4.02X speed improvement and reduce 7.19X communication overhead compared to the implementation based on existing secret sharing methods. Besides, PriFFT achieves a 2.23X speed improvement and reduces 4.08X communication overhead in privacy-preserving fine-tuning without accuracy drop compared to the existing secret sharing methods.
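The underlying idea of "computing on shares without seeing plaintext" can be shown with plain additive secret sharing, as in this toy. Note the paper's contribution is function secret-sharing protocols for the nonlinear operations (softmax, tanh, dropout, and so on), which this linear example does not cover.

    import numpy as np

    MOD = 2**31 - 1   # toy modulus; real systems fix a ring or field

    def share(x, rng):
        """Split x into two additive shares; neither share alone reveals x."""
        r = rng.integers(0, MOD, size=x.shape)
        return r, (x - r) % MOD

    def reveal(s0, s1):
        return (s0 + s1) % MOD

    rng = np.random.default_rng(0)
    x = np.array([5, 17, 42]); y = np.array([1, 2, 3])
    x0, x1 = share(x, rng); y0, y1 = share(y, rng)
    # Each party adds its own shares locally; no plaintext is ever exchanged.
    print(reveal((x0 + y0) % MOD, (x1 + y1) % MOD))   # [ 6 19 45 ]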
Updated: 2025-03-05 03:41:57
标题: PriFFT:通过功能秘密共享实现隐私保护的大型语言模型联邦微调
摘要: 大型语言模型(LLMs)的微调引起了隐私问题,因为存在泄露敏感训练数据的风险。联邦学习(FL)通过将训练样本保留在本地设备上来减轻这一风险,但最近的研究表明对手仍然可以从FL中的模型更新中推断出私人信息。此外,LLM参数通常在联邦微调期间公开共享,而开发人员通常不愿透露这些参数,进一步提出了安全挑战。受以上问题启发,我们提出了PriFFT,一种保护模型更新和参数的隐私保护联邦微调机制。在PriFFT中,客户端和服务器通过秘密共享模型输入和参数,对共享值进行安全微调,而无需访问明文数据。由于LLM参数相当多,隐私保护的联邦微调引发复杂的安全计算,并需要大量的通信和计算资源。为了优化隐私保护的联邦微调LLMs的效率,我们引入了各种操作的函数秘密共享协议,包括倒数计算、张量积、自然指数、softmax、双曲正切和dropout。与基于现有秘密共享方法的实现相比,所提出的协议实现了高达4.02倍的速度改进,并减少了7.19倍的通信开销。此外,与现有的秘密共享方法相比,PriFFT在隐私保护微调中实现了2.23倍的速度改进,并减少了4.08倍的通信开销,而不降低准确性。
更新时间: 2025-03-05 03:41:57
领域: cs.CR
Mixtraining: A Better Trade-Off Between Compute and Performance
Incorporating self-supervised learning (SSL) before standard supervised learning (SL) has become a widely used strategy to enhance model performance, particularly in data-limited scenarios. However, this approach introduces a trade-off between computation and performance: while SSL helps with representation learning, it requires a separate, often time-consuming training phase, increasing computational overhead and limiting efficiency in resource-constrained settings. To address these challenges, we propose MixTraining, a novel framework that interleaves several SSL and SL epochs within a unified mixtraining training phase, featuring a smooth transition between two learning objectives. MixTraining enhances synergy between SSL and SL for improved accuracy and consolidates shared computation steps to reduce computation overhead. MixTraining is versatile and applicable to both single-task and multi-task learning scenarios. Extensive experiments demonstrate that MixTraining offers a superior compute-performance trade-off compared to conventional pipelines, achieving an 8.81% absolute accuracy gain (18.89% relative accuracy gain) on the TinyImageNet dataset while accelerating training by up to 1.29x with the ViT-Tiny model.
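A hedged sketch of interleaving the two objectives with a smooth transition weight, using stub losses and random tensors so it runs standalone; the paper's actual schedule, objectives, and shared-computation consolidation differ.

    import torch
    import torch.nn as nn

    model = nn.Linear(32, 10)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    def ssl_loss(m, x):            # stand-in self-supervised (consistency) objective
        return ((m(x) - m(x + 0.01 * torch.randn_like(x))) ** 2).mean()

    def sl_loss(m, x, y):          # supervised objective
        return nn.functional.cross_entropy(m(x), y)

    EPOCHS = 10
    for epoch in range(EPOCHS):
        # Smooth transition: weight shifts from SSL toward SL across training.
        w = epoch / (EPOCHS - 1)
        x = torch.randn(64, 32); y = torch.randint(0, 10, (64,))
        loss = (1 - w) * ssl_loss(model, x) + w * sl_loss(model, x, y)
        opt.zero_grad(); loss.backward(); opt.step()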
Updated: 2025-03-05 03:40:47
Domain: cs.LG,cs.AI
AutoG: Towards automatic graph construction from tabular data
Recent years have witnessed significant advancements in graph machine learning (GML), with applications spanning numerous domains. However, the focus of GML has predominantly been on developing powerful models, often overlooking a crucial initial step: constructing suitable graphs from common data formats, such as tabular data. This construction process is fundamental to applying graph-based models, yet it remains largely understudied and lacks formalization. Our research aims to address this gap by formalizing the graph construction problem and proposing an effective solution. We identify two critical challenges to achieving this goal: 1. the absence of dedicated datasets to formalize and evaluate the effectiveness of graph construction methods, and 2. existing automatic construction methods apply only to specific cases, while generating high-quality graphs otherwise requires tedious human engineering. To tackle these challenges, we present a two-fold contribution. First, we introduce a set of datasets to formalize and evaluate graph construction methods. Second, we propose an LLM-based solution, AutoG, which automatically generates high-quality graph schemas without human intervention. Experimental results demonstrate that the quality of constructed graphs is critical to downstream task performance, and that AutoG can generate high-quality graphs that rival those produced by human experts. Our code is available at https://github.com/amazon-science/Automatic-Table-to-Graph-Generation.
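AutoG delegates schema design to an LLM; the kind of construction it automates can be shown by hand. Below is a toy example of one candidate schema for a paper table, using networkx (the schema choice is illustrative, not AutoG's output):

```python
import networkx as nx

# Toy tabular data: papers with an author column acting as an implicit key.
rows = [
    {"paper_id": "p1", "author": "alice", "venue": "KDD"},
    {"paper_id": "p2", "author": "alice", "venue": "WWW"},
    {"paper_id": "p3", "author": "bob",   "venue": "KDD"},
]

# One possible schema: paper and author nodes, "writes" edges via the author column.
g = nx.Graph()
for r in rows:
    g.add_node(r["paper_id"], type="paper", venue=r["venue"])
    g.add_node(r["author"], type="author")
    g.add_edge(r["author"], r["paper_id"], relation="writes")

print(g.number_of_nodes(), g.number_of_edges())  # 5 nodes, 3 edges
```

Whether "venue" should also become a node type, or remain an attribute, is exactly the kind of schema decision the paper argues matters for downstream performance.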
Updated: 2025-03-05 03:38:57
Domain: cs.LG
A Survey of Foundation Models for Environmental Science
Modeling environmental ecosystems is essential for effective resource management, sustainable development, and understanding complex ecological processes. However, traditional methods frequently struggle with the inherent complexity, interconnectedness, and limited data of such systems. Foundation models, with their large-scale pre-training and universal representations, offer transformative opportunities by integrating diverse data sources, capturing spatiotemporal dependencies, and adapting to a broad range of tasks. This survey presents a comprehensive overview of foundation model applications in environmental science, highlighting advancements in forward prediction, data generation, data assimilation, downscaling, model ensembling, and decision-making across domains. We also detail the development process of these models, covering data collection, architecture design, training, tuning, and evaluation. By showcasing these emerging methods, we aim to foster interdisciplinary collaboration and advance the integration of cutting-edge machine learning for sustainable solutions in environmental science.
Updated: 2025-03-05 03:33:31
Domain: cs.LG
Implicit U-KAN2.0: Dynamic, Efficient and Interpretable Medical Image Segmentation
Image segmentation is a fundamental task in both image analysis and medical applications. State-of-the-art methods predominantly rely on encoder-decoder architectures with a U-shaped design, commonly referred to as U-Net. Recent advancements integrating transformers and MLPs improve performance but still face key limitations, such as poor interpretability, difficulty handling intrinsic noise, and constrained expressiveness due to discrete layer structures, often lacking a solid theoretical foundation. In this work, we introduce Implicit U-KAN 2.0, a novel U-Net variant that adopts a two-phase encoder-decoder structure. In the SONO phase, we use second-order neural ordinary differential equations (NODEs), in what we call the SONO block, for a more efficient, expressive, and theoretically grounded modeling approach. In the SONO-MultiKAN phase, we integrate the second-order NODEs and a MultiKAN layer as the core computational block to enhance interpretability and representation power. Our contributions are threefold. First, U-KAN 2.0 is an implicit deep neural network incorporating MultiKAN and second-order NODEs, improving interpretability and performance while reducing computational costs. Second, we provide a theoretical analysis demonstrating that the approximation ability of the MultiKAN block is independent of the input dimension. Third, we conduct extensive experiments on a variety of 2D datasets and a single 3D dataset, demonstrating that our model consistently outperforms existing segmentation networks.
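The SONO block's exact formulation is in the paper; generically, any second-order neural ODE x'' = f(x, x') can be integrated as a first-order system in position and velocity. A didactic PyTorch stand-in with fixed-step Euler integration (not the paper's solver or dynamics):

```python
import torch
import torch.nn as nn

class SecondOrderODEBlock(nn.Module):
    """Generic second-order neural ODE: x'' = f(x, x'), integrated as the
    first-order system (x, v)' = (v, f(x, v)) with fixed-step Euler.
    A didactic stand-in, not the paper's SONO block."""
    def __init__(self, dim, steps=10, dt=0.1):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.steps, self.dt = steps, dt

    def forward(self, x):
        v = torch.zeros_like(x)                        # initial velocity
        for _ in range(self.steps):
            a = self.f(torch.cat([x, v], dim=-1))      # acceleration from current state
            x = x + self.dt * v
            v = v + self.dt * a
        return x

block = SecondOrderODEBlock(dim=64)
out = block(torch.randn(8, 64))                        # features evolved through the ODE
```

The appeal of such blocks is that depth becomes integration time, giving a continuous and analyzable notion of feature evolution rather than a stack of discrete layers.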
Updated: 2025-03-05 03:31:05
Domain: eess.IV,cs.CV,cs.LG
Knowledge Augmentation in Federation: Rethinking What Collaborative Learning Can Bring Back to Decentralized Data
Data, as an observable form of knowledge, has become one of the most important factors of production for the development of Artificial Intelligence (AI). Meanwhile, increasing legislation and regulation of private and proprietary information has resulted in scattered data sources, also known as "data islands". Although collaborative learning paradigms such as Federated Learning (FL) enable privacy-preserving training over decentralized data, they have inherent deficiencies in fairness, cost, and reproducibility because they are learning-centric, which greatly limits how participants can cooperate with each other. In light of this, we present a knowledge-centric paradigm termed Knowledge Augmentation in Federation (KAF), with a focus on how to enhance local knowledge through collaborative effort. We provide a suggested system architecture, formulate the prototypical optimization objective, and review emerging studies that employ methodologies suitable for KAF. On our roadmap, we use a three-way categorization to describe methods for knowledge expansion, knowledge filtering, and label and feature space correction in the federation. Further, we highlight several challenges and open questions that deserve more attention from the community. With our investigation, we intend to offer new insights into what collaborative learning can bring back to decentralized data.
Updated: 2025-03-05 03:26:54
Domain: cs.DC,cs.AI,cs.LG
Convergence Analysis of Federated Learning Methods Using Backward Error Analysis
Backward error analysis makes it possible to find a modified loss function that the parameter updates actually follow under the influence of an optimization method. The additional loss terms included in this modified function are called the implicit regularizer. In this paper, we attempt to find the implicit regularizer for various federated learning algorithms on non-IID data distributions, and explain why each method shows different convergence behavior. We first show that the implicit regularizer of FedAvg disperses the gradient of each client from the average gradient, thus increasing the gradient variance. We also empirically show that this implicit regularizer hampers convergence. Similarly, we compute the implicit regularizers of FedSAM and SCAFFOLD, and explain why they converge better. While existing convergence analyses focus on pointing out the advantages of FedSAM and SCAFFOLD, our approach can explain their limitations in complex non-convex settings. Specifically, we demonstrate that FedSAM can partially remove the bias in the first-order term of the implicit regularizer in FedAvg, whereas SCAFFOLD can fully eliminate the bias in the first-order term, but not in the second-order term. Consequently, the implicit regularizer can provide useful insight into the convergence behavior of federated learning from a different theoretical perspective.
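For intuition, the classical single-worker result that this style of analysis builds on: gradient descent on a loss L with step size eta follows, to second order in eta, the gradient flow of a modified loss. This is the standard implicit-gradient-regularization form, not the paper's federated derivation:

```latex
\tilde{L}(\theta) \;=\; L(\theta) \;+\; \frac{\eta}{4}\,\bigl\lVert \nabla L(\theta) \bigr\rVert^{2}
```

The paper's contribution is the analogous modified losses for FedAvg, FedSAM, and SCAFFOLD, whose extra terms make visible how each method handles client drift.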
Updated: 2025-03-05 03:26:48
Domain: cs.LG,cs.AI,cs.DC
L2R: Learning to Reduce Search Space for Generalizable Neural Routing Solver
Constructive neural combinatorial optimization (NCO) has attracted growing research attention due to its ability to solve complex routing problems without relying on handcrafted rules. However, existing NCO methods face significant challenges in generalizing to large-scale problems due to high computational complexity and inefficient capture of structural patterns. To address this issue, we propose a novel learning-based search space reduction method that adaptively selects a small set of promising candidate nodes at each step of the constructive NCO process. Unlike traditional methods that rely on fixed heuristics, our selection model dynamically prioritizes nodes based on learned patterns, significantly reducing the search space while maintaining solution quality. Experimental results demonstrate that our method, trained solely on 100-node instances from a uniform distribution, generalizes remarkably well to large-scale Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) instances with up to 1 million nodes from the uniform distribution and over 80K nodes from other distributions.
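The abstract leaves the selection mechanism abstract; one way to read it is that at each construction step a learned scorer ranks the unvisited nodes and only the top-k enter the policy's candidate set. A hypothetical sketch (the scorer and all sizes are assumptions):

```python
import torch

def select_candidates(scorer, node_feats, visited_mask, k=16):
    """Score unvisited nodes and keep only the top-k as the reduced search space.
    `scorer` is any module mapping (n, d) features to (n, 1) scores (assumed)."""
    scores = scorer(node_feats).squeeze(-1)               # (n,)
    scores = scores.masked_fill(visited_mask, float("-inf"))
    k = min(k, int((~visited_mask).sum()))
    return torch.topk(scores, k).indices                  # candidate node ids this step

scorer = torch.nn.Linear(8, 1)                            # toy learned scorer
feats = torch.randn(1000, 8)                              # 1000-node instance (toy)
visited = torch.zeros(1000, dtype=torch.bool)
visited[:3] = True                                        # partial tour so far
cands = select_candidates(scorer, feats, visited)         # 16 promising next nodes
```

Attending over 16 candidates instead of ~1000 nodes is where the scalability gain comes from, provided the scorer rarely excludes the truly best next node.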
Updated: 2025-03-05 03:25:09
Domain: cs.AI
Bridging Molecular Graphs and Large Language Models
While Large Language Models (LLMs) have shown exceptional generalization capabilities, their ability to process graph data, such as molecular structures, remains limited. To bridge this gap, this paper proposes Graph2Token, an efficient solution that aligns graph tokens to LLM tokens. The key idea is to represent a graph token with the LLM token vocabulary, without fine-tuning the LLM backbone. To achieve this goal, we first construct a molecule-text paired dataset from multiple sources, including CHEBI and HMDB, to train a graph structure encoder, which reduces the distance between graph and text representations in the feature space. Then, we propose a novel alignment strategy that associates a graph token with LLM tokens. To further unleash the potential of LLMs, we collect molecular IUPAC name identifiers, which are incorporated into the LLM prompts. By aligning molecular graphs as special tokens, we can activate the LLM's generalization ability for molecular few-shot learning. Extensive experiments on molecular classification and regression tasks demonstrate the effectiveness of our proposed Graph2Token.
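One way to read "represent a graph token with the LLM token vocabulary" is as a softmax-weighted mixture of frozen vocabulary embeddings. The sketch below shows that reading, with small assumed dimensions and a hypothetical projection head:

```python
import torch
import torch.nn as nn

vocab_size, d_llm, d_graph = 1000, 512, 256   # small numbers for illustration
llm_embed = nn.Embedding(vocab_size, d_llm)   # stands in for frozen LLM embeddings
llm_embed.weight.requires_grad_(False)        # the LLM backbone is not fine-tuned
proj = nn.Linear(d_graph, vocab_size)         # trainable alignment head (assumed)

def graph_to_token(graph_repr):
    """Map a graph encoder output to a soft token in the LLM embedding space:
    a softmax-weighted mixture of existing vocabulary embeddings."""
    weights = torch.softmax(proj(graph_repr), dim=-1)   # (batch, vocab)
    return weights @ llm_embed.weight                   # (batch, d_llm)

soft_token = graph_to_token(torch.randn(2, d_graph))    # injectable into a prompt
```

Because the soft token lives in the span of existing vocabulary embeddings, it stays in a region of input space the frozen LLM already understands.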
Updated: 2025-03-05 03:15:38
Domain: cs.LG
HARMONIC: Cognitive and Control Collaboration in Human-Robotic Teams
This paper introduces HARMONIC, a cognitive-robotic architecture that integrates the OntoAgent cognitive framework with general-purpose robot control systems applied to human-robot teaming (HRT). We also present a cognitive strategy for robots that incorporates metacognition, natural language communication, and explainability capabilities required for collaborative partnerships in HRT. Through simulation experiments involving a joint search task performed by a heterogeneous team of a UGV, a drone, and a human operator, we demonstrate the system's ability to coordinate actions between robots with heterogeneous capabilities, adapt to complex scenarios, and facilitate natural human-robot communication. Evaluation results show that robots using the OntoAgent architecture within the HARMONIC framework can reason about plans, goals, and team member attitudes while providing clear explanations for their decisions, which are essential prerequisites for realistic human-robot teaming.
Updated: 2025-03-05 03:08:12
Domain: cs.RO,cs.AI,cs.MA
Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks
As speech translation (ST) systems become increasingly prevalent, understanding their vulnerabilities is crucial for ensuring robust and reliable communication. However, limited work has explored this issue in depth. This paper explores methods of compromising these systems through imperceptible audio manipulations. Specifically, we present two innovative approaches: (1) the injection of perturbations into source audio, and (2) the generation of adversarial music designed to guide targeted translation, while also conducting more practical over-the-air attacks in the physical world. Our experiments reveal that carefully crafted audio perturbations can mislead translation models into producing targeted, harmful outputs, while adversarial music achieves this goal more covertly, exploiting the natural imperceptibility of music. These attacks prove effective across multiple languages and translation models, highlighting a systemic vulnerability in current ST architectures. The implications of this research extend beyond immediate security concerns, shedding light on the interpretability and robustness of neural speech processing systems. Our findings underscore the need for advanced defense mechanisms and more resilient architectures in the realm of audio systems. More details and samples can be found at https://adv-st.github.io.
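The paper's attacks are more sophisticated, but the basic mechanics of a bounded audio perturbation can be shown with a single FGSM-style step against any differentiable model. The model and target loss below are placeholders, not an actual ST system:

```python
import torch

def fgsm_audio(model, waveform, target_loss_fn, epsilon=1e-3):
    """One gradient-sign step nudging audio toward an attacker's target output
    while keeping the perturbation small (an L-infinity bound of `epsilon`).
    `model` and `target_loss_fn` are placeholders for a real ST pipeline."""
    x = waveform.clone().detach().requires_grad_(True)
    loss = target_loss_fn(model(x))          # low loss = closer to the target output
    loss.backward()
    x_adv = x - epsilon * x.grad.sign()      # descend toward the target
    return x_adv.clamp(-1.0, 1.0).detach()   # keep a valid audio range

# Toy stand-ins: a linear "model" over 1 second of 16 kHz audio, and a loss
# that pulls its outputs toward zero.
model = torch.nn.Linear(16000, 10)
x_adv = fgsm_audio(model, torch.randn(1, 16000), lambda y: (y ** 2).mean())
```

Real targeted attacks iterate many such steps under perceptual constraints; over-the-air variants additionally have to survive playback and room acoustics.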
Updated: 2025-03-05 03:07:49
Domain: cs.SD,cs.AI,cs.CR,eess.AS
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
This paper investigates visual analogical reasoning in large multimodal models (LMMs) compared to human adults and children. A "visual analogy" is an abstract rule inferred from one image and applied to another. While benchmarks exist for testing visual reasoning in LMMs, they require advanced skills and omit basic visual analogies that even young children can make. Inspired by developmental psychology, we propose a new benchmark of 4,300 visual transformations of everyday objects to test LMMs on visual analogical reasoning and compare them to children (ages three to five) and to adults. We structure the evaluation into three stages: identifying what changed (e.g., color, number, etc.), how it changed (e.g., added one object), and applying the rule to new scenarios. Our findings show that while GPT-o1, GPT-4V, LLaVA-1.5, and MANTIS identify the "what" effectively, they struggle with quantifying the "how" and extrapolating this rule to new objects. In contrast, children and adults exhibit much stronger analogical reasoning at all three stages. Additionally, the strongest tested model, GPT-o1, performs better in tasks involving simple surface-level visual attributes like color and size, correlating with quicker human adult response times. Conversely, more complex tasks such as number, rotation, and reflection, which necessitate extensive cognitive processing and understanding of extrinsic spatial properties in the physical world, present more significant challenges. Altogether, these findings highlight the limitations of training models on data that primarily consists of 2D images and text.
Updated: 2025-03-05 03:07:12
Domain: cs.CV,cs.AI,cs.CL,cs.LG
An Optimization Algorithm for Multimodal Data Alignment
In the data era, the integration of multiple data types, known as multimodality, has become a key area of interest in the research community. This interest is driven by the goal of developing cutting-edge multimodal models capable of serving as adaptable reasoning engines across a wide range of modalities and domains. Despite fervent development efforts, the challenge of optimally representing different forms of data within a single unified latent space, a crucial step for enabling effective multimodal reasoning, has not been fully addressed. To bridge this gap, we introduce AlignXpert, an optimization algorithm inspired by Kernel CCA and crafted to maximize the similarities between N modalities while imposing additional constraints. This work demonstrates the impact on improving data representation for a variety of reasoning tasks, such as retrieval and classification, underlining the pivotal importance of data representation.
Updated: 2025-03-05 03:07:07
Domain: cs.LG,cs.AI
Exploring Neural Ordinary Differential Equations as Interpretable Healthcare classifiers
Deep Learning has emerged as one of the most significant innovations in machine learning. However, a notable limitation of this field lies in its "black box" decision-making processes, which have led to skepticism within groups such as healthcare and scientific communities regarding its applicability. In response, this study introduces an interpretable approach using Neural Ordinary Differential Equations (NODEs), a category of neural network models that exploit the dynamics of differential equations for representation learning. Leveraging their foundation in differential equations, we illustrate the capability of these models to continuously process textual data, marking the first such model of its kind, and thereby propose a promising direction for future research in this domain. The primary objective of this research is to propose a novel architecture for groups such as healthcare that require the predictive capabilities of deep learning while emphasizing the importance of the model transparency demonstrated in NODEs.
Updated: 2025-03-05 02:51:50
Domain: cs.LG,cs.AI
Towards Understanding Multi-Round Large Language Model Reasoning: Approximability, Learnability and Generalizability
Recent advancements in cognitive science and multi-round reasoning techniques for Large Language Models (LLMs) suggest that iterative thinking processes improve problem-solving performance in complex tasks. Inspired by this, approaches like Chain-of-Thought, debating, and self-refinement have been applied to auto-regressive LLMs, achieving significant successes in tasks such as mathematical reasoning, commonsense reasoning, and multi-hop question answering. Despite these successes, the theoretical basis for how multi-round reasoning enhances problem-solving abilities remains underexplored. In this work, we investigate the approximation, learnability, and generalization properties of multi-round auto-regressive models. We show that Transformers with finite context windows are universal approximators for steps of Turing-computable functions and can approximate any Turing-computable sequence-to-sequence function through multi-round reasoning. We extend PAC learning to sequence generation and demonstrate that multi-round generation is learnable even when the sequence length exceeds the model's context window. Finally, we examine how generalization error propagates across rounds, and show how the aforementioned approaches can help constrain this error, ensuring outputs stay within an expectation boundary. This work sheds light on the systemic theoretical foundations of multi-round sequence learning and reasoning, emphasizing its role in inference complexity.
Updated: 2025-03-05 02:50:55
Domain: cs.AI,cs.CL,cs.LG,stat.ML
Long-Sequence Recommendation Models Need Decoupled Embeddings
Lifelong user behavior sequences are crucial for capturing user interests and predicting user responses in modern recommendation systems. A two-stage paradigm is typically adopted to handle these long sequences: a subset of relevant behaviors is first searched from the original long sequences via an attention mechanism in the first stage and then aggregated with the target item to construct a discriminative representation for prediction in the second stage. In this work, we identify and characterize, for the first time, a neglected deficiency in existing long-sequence recommendation models: a single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. Initial attempts to address this issue with some common methods (e.g., linear projections -- a technique borrowed from language processing) proved ineffective, shedding light on the unique challenges of recommendation models. To overcome this, we propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are initialized and learned separately to fully decouple attention and representation. Extensive experiments and analysis demonstrate that DARE provides more accurate searches of correlated behaviors and outperforms baselines with AUC gains up to 0.9% on public datasets and notable improvements on Tencent's advertising platform. Furthermore, decoupling embedding spaces allows us to reduce the attention embedding dimension and accelerate the search procedure by 50% without significant performance impact, enabling more efficient, high-performance online serving. Code in PyTorch for experiments, including model analysis, is available at https://github.com/thuml/DARE.
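The structural change is easy to state: one embedding table serves the attention (search) stage, a separate one serves the representation (aggregation) stage, and the attention table can be lower-dimensional. A minimal sketch with assumed dimensions (not the paper's full model):

```python
import torch
import torch.nn as nn

n_items, d_attn, d_repr = 10000, 16, 64        # attention dim can be smaller (assumed)
emb_attn = nn.Embedding(n_items, d_attn)       # used only to score behavior relevance
emb_repr = nn.Embedding(n_items, d_repr)       # used only for the final representation

def search_and_aggregate(behavior_ids, target_id, k=8):
    # Stage 1: search for relevant behaviors with the attention embeddings.
    q = emb_attn(target_id)                              # (d_attn,)
    keys = emb_attn(behavior_ids)                        # (seq, d_attn)
    scores = keys @ q                                    # (seq,)
    top = torch.topk(scores, k).indices
    # Stage 2: aggregate with the *separate* representation embeddings.
    w = torch.softmax(scores[top], dim=0)
    pooled = w @ emb_repr(behavior_ids[top])             # (d_repr,)
    return torch.cat([pooled, emb_repr(target_id)])      # input to the prediction MLP

feat = search_and_aggregate(torch.randint(0, n_items, (200,)), torch.tensor(7))
```

With a single table, gradients from the attention scores and from the pooled representation fight over the same parameters; two tables let each objective shape its own space, and the attention table can shrink to speed up the search.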
Updated: 2025-03-05 02:48:49
Domain: cs.IR,cs.LG
MUSE-Net: Missingness-aware mUlti-branching Self-attention Encoder for Irregular Longitudinal Electronic Health Records
The era of big data has made vast amounts of clinical data readily available, particularly in the form of electronic health records (EHRs), providing unprecedented opportunities for developing data-driven diagnostic tools to enhance clinical decision making. However, the application of EHRs in data-driven modeling faces challenges such as irregularly spaced multi-variate time series, incompleteness, and data imbalance. Realizing the full data potential of EHRs hinges on the development of advanced analytical models. In this paper, we propose a novel Missingness-aware mUlti-branching Self-attention Encoder (MUSE-Net) to cope with the challenges in modeling longitudinal EHRs for data-driven disease prediction. The proposed MUSE-Net is composed of four novel modules: (1) a multi-task Gaussian process (MGP) with missing value masks for data imputation; (2) a multi-branching architecture to address the data imbalance problem; (3) a time-aware self-attention encoder to account for the irregularly spaced time intervals in longitudinal EHRs; and (4) an interpretable multi-head attention mechanism that provides insights into the importance of different time points in disease prediction, allowing clinicians to trace model decisions. We evaluate the proposed MUSE-Net using both synthetic and real-world datasets. Experimental results show that our MUSE-Net outperforms existing methods that are widely used to investigate longitudinal signals.
Updated: 2025-03-05 02:39:47
Domain: cs.LG
The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models
Multimodal Reward Models (MM-RMs) are crucial for aligning Large Language Models (LLMs) with human preferences, particularly as LLMs increasingly interact with multimodal data. However, we find that MM-RMs trained on existing datasets often struggle to generalize to out-of-distribution data due to their reliance on unimodal spurious correlations, primarily text-only shortcuts within the training distribution, which prevents them from leveraging true multimodal reward functions. To address this, we introduce a Shortcut-aware MM-RM learning algorithm that mitigates this issue by dynamically reweighting training samples, shifting the distribution toward better multimodal understanding, and reducing dependence on unimodal spurious correlations. Our experiments demonstrate significant improvements in generalization, downstream task performance, and scalability, establishing a more robust framework for multimodal reward modeling.
Updated: 2025-03-05 02:37:41
Domain: cs.CL,cs.AI
Detecting Adversarial Data using Perturbation Forgery
As a defense strategy against adversarial attacks, adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data. Although previous detection methods achieve high performance in detecting gradient-based adversarial attacks, new attacks based on generative models with imbalanced and anisotropic noise patterns evade detection. Even worse, the significant inference time overhead and limited performance against unseen attacks make existing techniques impractical for real-world use. In this paper, we explore the proximity relationship among adversarial noise distributions and demonstrate the existence of an open covering for these distributions. By training on the open covering of adversarial noise distributions, a detector with strong generalization performance against various types of unseen attacks can be developed. Based on this insight, we heuristically propose Perturbation Forgery, which includes noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting any unseen gradient-based, generative-based, and physical adversarial attacks. Comprehensive experiments conducted on multiple general and facial datasets, with a wide spectrum of attacks, validate the strong generalization of our method.
Updated: 2025-03-05 02:30:54
Domain: cs.CV,cs.LG
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts
The answering quality of an aligned large language model (LLM) can be drastically improved with proper crafting of prompts. In this paper, we propose ExpertPrompting to elicit the potential of LLMs to answer as distinguished experts. We first utilize in-context learning to automatically synthesize detailed and customized descriptions of an expert identity for each specific instruction, and then ask the LLM to provide an answer conditioned on this agent background. Based on this augmented prompting strategy, we produce a new set of instruction-following data using GPT-3.5, and train a competitive open-source chat assistant called ExpertLLaMA. We employ GPT4-based evaluation to show that 1) the expert data is of significantly higher quality than vanilla answers, and 2) ExpertLLaMA outperforms existing open-source opponents and achieves 96% of the original ChatGPT's capability. All data and the ExpertLLaMA model will be made publicly available at https://github.com/OFA-Sys/ExpertLLaMA.
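The mechanics are simple enough to show directly: synthesize an expert identity for the instruction (the paper does this automatically with in-context learning) and prepend it. The template below is an illustrative paraphrase, not the paper's exact prompt:

```python
def expert_prompt(instruction: str, expert_identity: str) -> str:
    """Condition the model on a detailed expert persona before the instruction.
    In ExpertPrompting the identity itself is written by an LLM via in-context
    learning; here it is supplied by hand for illustration."""
    return (
        f"{expert_identity}\n\n"
        f"Now, answer the following instruction as this expert would, "
        f"with precision and depth:\n{instruction}"
    )

identity = ("You are a board-certified cardiologist with 20 years of clinical "
            "experience and a research background in arrhythmia.")
print(expert_prompt("Explain what the ECG QT interval measures.", identity))
```

The key finding is that answers generated under such personas make better instruction-tuning data than vanilla answers to the same instructions.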
Updated: 2025-03-05 02:28:39
Domain: cs.CL,cs.AI
Advancing Highway Work Zone Safety: A Comprehensive Review of Sensor Technologies for Intrusion and Proximity Hazards
Highway work zones are critical areas where accidents frequently occur, often due to the proximity of workers to heavy machinery and ongoing traffic. With technological advancements in sensor technologies and the Internet of Things, promising solutions are emerging to address these safety concerns. This paper provides a systematic review of existing studies on the application of sensor technologies for enhancing highway work zone safety, particularly for preventing intrusion and proximity hazards. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol, the review examines a broad spectrum of publications on various sensor technologies, including GPS, radar, laser, RFID, Bluetooth, ultrasonic, and infrared sensors, detailing their application in reducing intrusion and proximity incidents. The review also assesses these technologies in terms of accuracy, range, power consumption, cost, and user-friendliness, with a specific emphasis on their suitability for highway work zones. The findings highlight the potential of sensor technologies to significantly enhance work zone safety. Because there is a wide range of sensor technologies to choose from, the review also reveals that selecting sensors for a particular application requires careful consideration of several factors. Finally, while sensor technologies offer promising solutions for enhancing highway work zone safety, their effective implementation requires comprehensive consideration of factors beyond technological capabilities, including developing integrated, cost-effective, user-friendly, and secure systems, and creating regulatory frameworks that support the rapid development of these technologies.
Updated: 2025-03-05 02:23:06
Domain: eess.SP,cs.CR,cs.CY
Predicting Space Tourism Demand Using Explainable AI
Comprehensive forecasts of space tourism demand are crucial for businesses to optimize strategies and customer experiences in this burgeoning industry. Traditional methods struggle to capture the complex factors influencing an individual's decision to travel to space. In this paper, we propose an explainable and trustworthy artificial intelligence framework to address the challenge of predicting space tourism demand, following the National Institute of Standards and Technology guidelines. We develop a novel machine learning network, called SpaceNet, capable of learning wide-range dependencies in data and allowing us to analyze the relationships between various factors such as age, income, and risk tolerance. We investigate space travel demand in the US, categorizing it into four types: no travel, moon travel, suborbital travel, and orbital travel. To this end, we collected 1,860 data points from respondents of different ages across many states and cities and then conducted our experiments on the data. In our experiments, SpaceNet achieves an average ROC-AUC of 0.82 ± 0.088, indicating strong classification performance. Our investigation demonstrates that travel price, age, annual income, gender, and fatality probability are important features in deciding whether a person wants to travel or not. Beyond demand forecasting, we use explainable AI to interpret an individual's travel-type decisions, offering insights into the factors driving interest in space travel, which is not possible with traditional classification methods. This knowledge enables businesses to tailor marketing strategies and optimize service offerings in this rapidly evolving market. To the best of our knowledge, this is the first work to implement an explainable and interpretable AI framework for investigating the factors influencing space tourism.
Updated: 2025-03-05 02:18:31
Domain: cs.LG
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Language model pretraining with next-token prediction has proved effective for scaling compute but is limited by the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long-context scaling and improved policy optimization methods are key ingredients of our approach, which establishes a simple, effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Notably, our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities -- e.g., 77.5 on AIME, 96.2 on MATH 500, 94th percentile on Codeforces, and 74.9 on MathVista -- matching OpenAI's o1. Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results -- e.g., 60.8 on AIME, 94.6 on MATH500, and 47.3 on LiveCodeBench -- outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).
Updated: 2025-03-05 02:16:32
Domain: cs.AI,cs.LG
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned. Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment. In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger. It's important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.
Updated: 2025-03-05 02:15:50
Domain: cs.CR,cs.AI,cs.CL,cs.LG
A Multimodal Framework for Topic Propagation Classification in Social Networks
The rapid proliferation of the Internet and the widespread adoption of social networks have significantly accelerated information dissemination. However, this transformation has introduced complexities in information capture and processing, posing substantial challenges for researchers and practitioners. Predicting the dissemination of topic-related information within social networks has thus become a critical research focus. This paper proposes a predictive model for topic dissemination in social networks by integrating multidimensional features derived from key dissemination characteristics. Specifically, we introduce two novel indicators, user relationship breadth and user authority, into the PageRank algorithm to quantify user influence more effectively. Additionally, we employ a Text-CNN model for sentiment classification, extracting sentiment features from textual content. Temporal embeddings of nodes are encoded using a Bi-LSTM model to capture temporal dynamics. Furthermore, we refine the measurement of user interaction traces with topics, replacing traditional topic-view metrics with a more precise measure of communication characteristics. Finally, we integrate the extracted multidimensional features using a Transformer model, significantly enhancing predictive performance. Experimental results demonstrate that our proposed model outperforms traditional machine learning and unimodal deep learning models in terms of F1-Score, AUC, and Recall, validating its effectiveness in predicting topic propagation within social networks.
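As one concrete reading of the influence component: standard PageRank can be biased toward users with high combined indicators through its personalization vector. The mixing rule and all numbers below are assumptions, not the paper's exact formula:

```python
import networkx as nx

g = nx.DiGraph([("u1", "u2"), ("u2", "u3"), ("u3", "u1"), ("u1", "u3")])

# Hypothetical per-user indicators (relationship breadth, authority), normalized.
breadth = {"u1": 0.7, "u2": 0.2, "u3": 0.1}
authority = {"u1": 0.5, "u2": 0.3, "u3": 0.2}
mix = 0.5  # assumed mixing coefficient between the two indicators
bias = {u: mix * breadth[u] + (1 - mix) * authority[u] for u in g}

# Bias the random-jump distribution of PageRank with the combined indicator.
influence = nx.pagerank(g, alpha=0.85, personalization=bias)
print(sorted(influence.items(), key=lambda kv: -kv[1]))
```

The paper folds the two indicators directly into PageRank; the personalization vector is simply the most accessible off-the-shelf way to inject per-node weights.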
Updated: 2025-03-05 02:12:23
Domain: cs.SI,cs.AI,cs.NE
WarmFed: Federated Learning with Warm-Start for Globalization and Personalization Via Personalized Diffusion Models
Federated Learning (FL) stands as a prominent distributed learning paradigm among multiple clients to achieve a unified global model without privacy leakage. In contrast, personalized federated learning aims to provide each client with a personalized model. However, previous FL frameworks have grappled with a dilemma: the choice between developing a single global model at the server to bolster globalization or nurturing personalized models at the clients to accommodate personalization. Instead of making trade-offs, this paper starts from a pre-trained initialization, obtaining resilient global information and facilitating the development of both global and personalized models. Specifically, we propose a novel method called WarmFed to achieve this. WarmFed customizes the warm start through personalized diffusion models, which are generated by local efficient fine-tuning (LoRA). Building upon the warm start, we advance a server-side fine-tuning strategy to derive the global model, and propose dynamic self-distillation (DSD) to obtain more resilient personalized models simultaneously. Comprehensive experiments underscore the substantial gains of our approach across both global and personalized models, achieved within as few as one to five communication rounds.
Updated: 2025-03-05 02:10:04
Domain: cs.LG,cs.CV
Tackling Few-Shot Segmentation in Remote Sensing via Inpainting Diffusion Model
Limited data is a common problem in remote sensing due to the high cost of obtaining annotated samples. In the few-shot segmentation task, models are typically trained on base classes with abundant annotations and later adapted to novel classes with limited examples. However, this often necessitates specialized model architectures or complex training strategies. Instead, we propose a simple approach that leverages diffusion models to generate diverse variations of novel-class objects within a given scene, conditioned on the limited examples of the novel classes. By framing the problem as an image inpainting task, we synthesize plausible instances of novel classes under various environments, effectively increasing the number of samples for the novel classes and mitigating overfitting. The generated samples are then assessed using a cosine similarity metric to ensure semantic consistency with the novel classes. Additionally, we employ the Segment Anything Model (SAM) to segment the generated samples and obtain precise annotations. By using high-quality synthetic data, we can directly fine-tune off-the-shelf segmentation models. Experimental results demonstrate that our method significantly enhances segmentation performance in low-data regimes, highlighting its potential for real-world remote sensing applications.
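The consistency check is straightforward to sketch: embed the generated samples and the few real support examples with any pretrained encoder, then keep generations whose cosine similarity to the class prototype clears a threshold. The encoder choice and threshold below are assumptions:

```python
import numpy as np

def filter_by_cosine(gen_feats, support_feats, threshold=0.8):
    """Keep generated samples semantically consistent with the novel class.
    Features come from any pretrained image encoder (assumed, e.g. CLIP)."""
    proto = support_feats.mean(axis=0)                       # class prototype
    proto = proto / np.linalg.norm(proto)
    g = gen_feats / np.linalg.norm(gen_feats, axis=1, keepdims=True)
    sims = g @ proto                                         # cosine similarity
    return np.nonzero(sims >= threshold)[0], sims

# Toy call: 50 generated samples, 5 real support examples, 512-d features.
keep_idx, sims = filter_by_cosine(np.random.randn(50, 512), np.random.randn(5, 512))
```

Filtering before fine-tuning matters because a diffusion inpainter conditioned on only a handful of examples will occasionally produce off-class objects.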
Updated: 2025-03-05 02:08:51
Domain: eess.IV,cs.LG
Property Enhanced Instruction Tuning for Multi-task Molecule Generation with Large Language Models
Large language models (LLMs) are widely applied in various natural language processing tasks such as question answering and machine translation. However, due to the lack of labeled data and the difficulty of manual annotation for biochemical properties, the performance for molecule generation tasks is still limited, especially for tasks involving multi-properties constraints. In this work, we present a two-step framework PEIT (Property Enhanced Instruction Tuning) to improve LLMs for molecular-related tasks. In the first step, we use textual descriptions, SMILES, and biochemical properties as multimodal inputs to pre-train a model called PEIT-GEN, by aligning multi-modal representations to synthesize instruction data. In the second step, we fine-tune existing open-source LLMs with the synthesized data, the resulting PEIT-LLM can handle molecule captioning, text-based molecule generation, molecular property prediction, and our newly proposed multi-constraint molecule generation tasks. Experimental results show that our pre-trained PEIT-GEN outperforms MolT5 and BioT5 in molecule captioning, demonstrating modalities align well between textual descriptions, structures, and biochemical properties. Furthermore, PEIT-LLM shows promising improvements in multi-task molecule generation, proving the scalability of the PEIT framework for various molecular tasks. We release the code, constructed instruction data, and model checkpoints in https://github.com/chenlong164/PEIT.
Updated: 2025-03-05 02:08:32
Domain: cs.AI
SoK: Knowledge is All You Need: Last Mile Delivery for Automated Provenance-based Intrusion Detection with LLMs
Recently, provenance-based intrusion detection systems (PIDSes) have been widely proposed for endpoint threat analysis. However, due to the lack of systematic integration and utilization of knowledge, existing PIDSes still require significant manual intervention for practical deployment, making full automation challenging. This paper presents a disruptive innovation by categorizing PIDSes according to the types of knowledge they utilize. In response to the prevalent "knowledge silo" problem in existing research, we introduce a novel knowledge-driven, provenance-based intrusion detection framework powered by large language models (LLMs). We also present OmniSec, a best-practice system built upon this framework. By integrating attack representation knowledge, threat intelligence knowledge, and benign behavior knowledge, OmniSec outperforms state-of-the-art approaches on public benchmark datasets. OmniSec is available online at https://anonymous.4open.science/r/PIDS-with-LLM-613B.
Updated: 2025-03-05 02:08:12
Domain: cs.CR,cs.AI
External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection
With the rapid development of the Internet, the information dissemination paradigm has changed and its efficiency has improved greatly. However, this also enables the rapid spread of fake news, with negative impacts on cyberspace. Meanwhile, information presentation formats have evolved, with news shifting from text to multimodal content. As a result, detecting multimodal fake news has become one of the research hotspots. However, the multimodal fake news detection field still faces two main challenges: the inability to fully and effectively utilize multimodal information for detection, and the low credibility or static nature of the introduced external information, which limits dynamic updates. To bridge these gaps, we propose ERIC-FND, an external reliable information-enhanced multimodal contrastive learning framework for fake news detection. ERIC-FND strengthens the representation of news content through an entity-enriched external information enhancement method. It also enriches the multimodal news information via a multimodal semantic interaction method, where multimodal contrastive learning is employed so that representations of different modalities learn from each other. Moreover, an adaptive fusion method is used to integrate the news representations from different dimensions for the eventual classification. Experiments are conducted on two commonly used datasets in different languages, X (Twitter) and Weibo. Experimental results demonstrate that our proposed ERIC-FND outperforms existing state-of-the-art fake news detection methods under the same settings.
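The cross-modal contrastive component can be written generically: with a batch of aligned text and image embeddings, a symmetric InfoNCE loss pulls matched pairs together. This is the standard formulation (as popularized by CLIP), with the temperature value assumed; the paper's exact loss may differ:

```python
import torch
import torch.nn.functional as F

def infonce(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of aligned (text, image) pairs:
    row i of each modality is the positive for row i of the other."""
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    logits = t @ v.T / temperature                    # (batch, batch) similarities
    labels = torch.arange(t.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))

loss = infonce(torch.randn(32, 256), torch.randn(32, 256))
```

Training the two encoders this way gives each modality a representation informed by the other, which the framework then fuses for the final real/fake decision.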
Updated: 2025-03-05 02:07:38
Domain: cs.CL,cs.AI
Simple Alternating Minimization Provably Solves Complete Dictionary Learning
This paper focuses on the noiseless complete dictionary learning problem, where the goal is to represent a set of given signals as linear combinations of a small number of atoms from a learned dictionary. There are two main challenges faced by theoretical and practical studies of dictionary learning: the lack of theoretical guarantees for practically-used heuristic algorithms and their poor scalability when dealing with huge-scale datasets. Towards addressing these issues, we propose a simple and efficient algorithm that provably recovers the ground truth when applied to the nonconvex and discrete formulation of the problem in the noiseless setting. We also extend our proposed method to mini-batch and online settings where the data is huge-scale or arrives continuously over time. At the core of our proposed method lies an efficient preconditioning technique that transforms the unknown dictionary to a near-orthonormal one, for which we prove a simple alternating minimization technique converges linearly to the ground truth under minimal conditions. Our numerical experiments on synthetic and real datasets showcase the superiority of our method compared with the existing techniques.
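The paper's specific updates and preconditioning are in the full text; the generic alternating structure it builds on (fix D and sparsify X, then fix X and refit D) looks like the following, with hard-thresholding as an assumed sparse-coding step:

```python
import numpy as np

def alternating_dictionary_learning(Y, n_atoms, sparsity, iters=50, seed=0):
    """Generic alternating minimization for Y ~= D @ X with column-sparse X.
    Sparse coding uses hard-thresholding; the dictionary step is least squares."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(iters):
        # Sparse-coding step: keep the `sparsity` largest coefficients per signal.
        X = np.linalg.lstsq(D, Y, rcond=None)[0]
        small = np.argsort(-np.abs(X), axis=0)[sparsity:]
        np.put_along_axis(X, small, 0.0, axis=0)
        # Dictionary step: least-squares refit, then renormalize the atoms.
        D = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D, X

Y = np.random.randn(20, 200)                     # 200 signals in R^20 (complete case)
D, X = alternating_dictionary_learning(Y, n_atoms=20, sparsity=3)
```

The paper's result is that, after its preconditioning makes the unknown dictionary near-orthonormal, such a simple alternation provably converges linearly to the ground truth.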
Updated: 2025-03-05 02:01:02
Domain: cs.LG,eess.SP,math.OC
HeTGB: A Comprehensive Benchmark for Heterophilic Text-Attributed Graphs
Graph neural networks (GNNs) have demonstrated success in modeling relational data primarily under the assumption of homophily. However, many real-world graphs exhibit heterophily, where linked nodes belong to different categories or possess diverse attributes. Additionally, nodes in many domains are associated with textual descriptions, forming heterophilic text-attributed graphs (TAGs). Despite their significance, the study of heterophilic TAGs remains underexplored due to the lack of comprehensive benchmarks. To address this gap, we introduce the Heterophilic Text-attributed Graph Benchmark (HeTGB), a novel benchmark comprising five real-world heterophilic graph datasets from diverse domains, with nodes enriched by extensive textual descriptions. HeTGB enables systematic evaluation of GNNs, pre-trained language models (PLMs) and co-training methods on the node classification task. Through extensive benchmarking experiments, we showcase the utility of text attributes in heterophilic graphs, analyze the challenges posed by heterophilic TAGs and the limitations of existing models, and provide insights into the interplay between graph structures and textual attributes. We have publicly released HeTGB with baseline implementations to facilitate further research in this field.
Updated: 2025-03-05 02:00:32
Categories: cs.CL,cs.AI
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Large Language Models (LLMs) are often described as instances of foundation models that possess strong generalization obeying scaling laws, and therefore transfer robustly across various conditions in a few- or zero-shot manner. Such claims rely on standardized benchmarks that are supposed to measure generalization and reasoning, where state-of-the-art (SOTA) models score high. We demonstrate here a dramatic breakdown of generalization and basic reasoning in all SOTA models claiming strong capabilities, including large-scale advanced models like GPT-4 or Claude 3 Opus, using a simple, short common-sense math problem formulated in concise natural language and easily solvable by humans (the AIW problem). The breakdown is dramatic because it manifests on a simple problem both as low average performance and as strong performance fluctuations across natural variations of the problem template that change neither the problem's structure nor its difficulty. By testing models on further control problems of similar form, we rule out that the breakdown is rooted in minor low-level issues such as natural-language or number parsing. We also observe strong overconfidence in the wrong solutions, expressed in the form of plausible-sounding, explanation-like confabulations. Various standard interventions aimed at obtaining the right solution, such as chain-of-thought prompting or urging the models to reconsider their wrong solutions through multi-step re-evaluation, fail. We use these observations to stimulate re-assessment of the capabilities of the current generation of LLMs as claimed by standardized benchmarks. Such re-assessment also requires common action to create standardized benchmarks that would allow proper detection of deficits in generalization and reasoning that evidently remain undiscovered by current state-of-the-art evaluation procedures, on which SOTA LLMs manage to score high. Code: https://github.com/LAION-AI/AIW
Updated: 2025-03-05 01:58:08
Categories: cs.LG,cs.AI,cs.CL
Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation
While large language models have demonstrated exceptional performance across a wide range of tasks, they remain susceptible to hallucinations -- generating plausible yet factually incorrect content. Existing methods for mitigating this risk often rely on sampling multiple full-length generations, which introduces significant response latency and becomes ineffective when the model consistently produces hallucinated outputs with high confidence. To address these limitations, we introduce Monitoring Decoding (MD), a novel framework that dynamically monitors the generation process and selectively applies in-process interventions, focusing on revising the crucial tokens responsible for hallucinations. Instead of waiting for the completion of multiple full-length generations, we identify hallucination-prone tokens during generation using a monitor function, and further refine these tokens through a tree-based decoding strategy. This approach ensures enhanced factual accuracy and coherence in the generated output while maintaining efficiency. Experimental results demonstrate that MD consistently outperforms self-consistency-based approaches in both effectiveness and efficiency, achieving higher factual accuracy while significantly reducing computational overhead.
Updated: 2025-03-05 01:51:03
Categories: cs.CL,cs.LG
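A minimal sketch of the in-process intervention loop follows. Here `step` and `monitor` are hypothetical callables standing in for the language model's top next-token candidates and the paper's monitor function, and the tree-based refinement is reduced to a one-step re-ranking for brevity; treat it as an illustration of the control flow, not the method itself.

```python
from typing import Callable, List, Tuple

def monitoring_decode(step: Callable[[List[int]], List[Tuple[int, float]]],
                      monitor: Callable[[List[int], int], float],
                      prompt: List[int], max_len: int = 64,
                      tau: float = 0.5, alpha: float = 1.0,
                      eos: int = 0) -> List[int]:
    """step(seq) returns top candidate (token, logprob) pairs for the next
    position; monitor(seq, token) returns a factuality score in [0, 1].
    Both are hypothetical stand-ins for the model and the monitor function."""
    seq = list(prompt)
    for _ in range(max_len):
        candidates = step(seq)
        token, _ = candidates[0]            # default: greedy continuation
        if monitor(seq, token) < tau:       # flagged as hallucination-prone
            # In-process intervention: re-rank the local candidates by a mix
            # of model logprob and monitor score (the paper refines flagged
            # tokens with a richer tree-based strategy instead).
            token = max(candidates,
                        key=lambda c: c[1] + alpha * monitor(seq, c[0]))[0]
        seq.append(token)
        if token == eos:
            break
    return seq
```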
RVAFM: Re-parameterizing Vertical Attention Fusion Module for Handwritten Paragraph Text Recognition
Handwritten Paragraph Text Recognition (HPTR) is a challenging task in Computer Vision, requiring the transformation of a paragraph text image, rich in handwritten text, into text encoding sequences. One of the most advanced models for this task is Vertical Attention Network (VAN), which utilizes a Vertical Attention Module (VAM) to implicitly segment paragraph text images into text lines, thereby reducing the difficulty of the recognition task. However, from a network structure perspective, VAM is a single-branch module, which is less effective in learning compared to multi-branch modules. In this paper, we propose a new module, named Re-parameterizing Vertical Attention Fusion Module (RVAFM), which incorporates structural re-parameterization techniques. RVAFM decouples the structure of the module during training and inference stages. During training, it uses a multi-branch structure for more effective learning, and during inference, it uses a single-branch structure for faster processing. The features learned by the multi-branch structure are fused into the single-branch structure through a special fusion method named Re-parameterization Fusion (RF) without any loss of information. As a result, we achieve a Character Error Rate (CER) of 4.44% and a Word Error Rate (WER) of 14.37% on the IAM paragraph-level test set. Additionally, the inference speed is slightly faster than VAN.
Updated: 2025-03-05 01:41:59
Categories: cs.CV,cs.AI
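The underlying trick, structural re-parameterization, is easy to demonstrate with linear layers: train with several parallel branches, then fold their weights into a single layer for inference with identical outputs. The PyTorch toy below shows that lossless fusion in its simplest form; RVAFM applies the idea inside an attention module, which is not reproduced here.

```python
import torch
import torch.nn as nn

class MultiBranchLinear(nn.Module):
    """Train-time module: parallel linear branches whose outputs are summed."""
    def __init__(self, dim: int, branches: int = 3):
        super().__init__()
        self.branches = nn.ModuleList([nn.Linear(dim, dim) for _ in range(branches)])

    def forward(self, x):
        return sum(b(x) for b in self.branches)

    def reparameterize(self) -> nn.Linear:
        """Fuse all branches into one linear layer, losslessly, for inference."""
        dim = self.branches[0].in_features
        fused = nn.Linear(dim, dim)
        with torch.no_grad():
            fused.weight.copy_(sum(b.weight for b in self.branches))
            fused.bias.copy_(sum(b.bias for b in self.branches))
        return fused

x = torch.randn(2, 8)
m = MultiBranchLinear(8)
# The single-branch module reproduces the multi-branch outputs exactly.
assert torch.allclose(m(x), m.reparameterize()(x), atol=1e-6)
```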
Fast Jet Tagging with MLP-Mixers on FPGAs
We explore the innovative use of MLP-Mixer models for real-time jet tagging and establish their feasibility on resource-constrained hardware like FPGAs. MLP-Mixers excel in processing sequences of jet constituents, achieving state-of-the-art performance on datasets mimicking Large Hadron Collider conditions. By using advanced optimization techniques such as High-Granularity Quantization and Distributed Arithmetic, we achieve unprecedented efficiency. These models match or surpass the accuracy of previous architectures, reduce hardware resource usage by up to 97%, double the throughput, and halve the latency. Additionally, non-permutation-invariant architectures enable smart feature prioritization and efficient FPGA deployment, setting a new benchmark for machine learning in real-time data processing at particle colliders.
Updated: 2025-03-05 01:37:47
Categories: physics.ins-det,cs.LG
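For reference, a standard MLP-Mixer block alternates a token-mixing MLP (across the ordered jet constituents, which is why the architecture is not permutation-invariant) with a channel-mixing MLP (across features). The PyTorch sketch below shows the block itself; the paper's FPGA-specific ingredients, High-Granularity Quantization and Distributed Arithmetic, are omitted.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One MLP-Mixer block over (constituents, features) jet inputs."""
    def __init__(self, n_tokens: int, dim: int, hidden: int = 64):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(n_tokens, hidden), nn.GELU(), nn.Linear(hidden, n_tokens))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):                          # x: (batch, n_tokens, dim)
        y = self.norm1(x).transpose(1, 2)          # mix across constituents
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x)) # mix across features

block = MixerBlock(n_tokens=16, dim=8)
out = block(torch.randn(4, 16, 8))  # e.g. 16 constituents, 8 features each
```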
Cross-modal Causal Relation Alignment for Video Question Grounding
Video question grounding (VideoQG) requires models to answer the questions and simultaneously infer the relevant video segments to support the answers. However, existing VideoQG methods usually suffer from spurious cross-modal correlations, leading to a failure to identify the dominant visual scenes that align with the intended question. Moreover, vision-language models exhibit unfaithful generalization performance and lack robustness on challenging downstream tasks such as VideoQG. In this work, we propose a novel VideoQG framework named Cross-modal Causal Relation Alignment (CRA), to eliminate spurious correlations and improve the causal consistency between question-answering and video temporal grounding. Our CRA involves three essential components: i) Gaussian Smoothing Grounding (GSG) module for estimating the time interval via cross-modal attention, which is de-noised by an adaptive Gaussian filter, ii) Cross-Modal Alignment (CMA) enhances the performance of weakly supervised VideoQG by leveraging bidirectional contrastive learning between estimated video segments and QA features, iii) Explicit Causal Intervention (ECI) module for multimodal deconfounding, which involves front-door intervention for vision and back-door intervention for language. Extensive experiments on two VideoQG datasets demonstrate the superiority of our CRA in discovering visually grounded content and achieving robust question reasoning. Codes are available at https://github.com/WissingChen/CRA-GQA.
Updated: 2025-03-05 01:36:32
Categories: cs.LG,cs.CL,cs.CV
RTFusion: A depth estimation network based on multimodal fusion in challenging scenarios
Depth estimation in complex real-world scenarios is a challenging task, especially when relying solely on a single modality such as visible light or thermal infrared (THR) imagery. This paper proposes a novel multimodal depth estimation model, RTFusion, which enhances depth estimation accuracy and robustness by integrating the complementary strengths of RGB and THR data. The RGB modality provides rich texture and color information, while the THR modality captures thermal patterns, ensuring stability under adverse lighting conditions such as extreme illumination. The model incorporates a unique fusion mechanism, EGFusion, consisting of the Mutual Complementary Attention (MCA) module for cross-modal feature alignment and the Edge Saliency Enhancement Module (ESEM) to improve edge detail preservation. Comprehensive experiments on the MS2 and ViViD++ datasets demonstrate that the proposed model consistently produces high-quality depth maps across various challenging environments, including nighttime, rainy, and high-glare conditions. The experimental results highlight the potential of the proposed method in applications requiring reliable depth estimation, such as autonomous driving, robotics, and augmented reality.
Updated: 2025-03-05 01:35:14
Categories: eess.IV,cs.AI,cs.CV
Perceptual Motor Learning with Active Inference Framework for Robust Lateral Control
This paper presents a novel Perceptual Motor Learning (PML) framework integrated with Active Inference (AIF) to enhance lateral control in Highly Automated Vehicles (HAVs). PML, inspired by human motor learning, emphasizes the seamless integration of perception and action, enabling efficient decision-making in dynamic environments. Traditional autonomous driving approaches--including modular pipelines, imitation learning, and reinforcement learning--struggle with adaptability, generalization, and computational efficiency. In contrast, PML with AIF leverages a generative model to minimize prediction error ("surprise") and actively shape vehicle control based on learned perceptual-motor representations. Our approach unifies deep learning with active inference principles, allowing HAVs to perform lane-keeping maneuvers with minimal data and without extensive retraining across different environments. Extensive experiments in the CARLA simulator demonstrate that PML with AIF enhances adaptability without increasing computational overhead while achieving performance comparable to conventional methods. These findings highlight the potential of PML-driven active inference as a robust alternative for real-world autonomous driving applications.
Updated: 2025-03-05 01:27:57
Categories: cs.RO,cs.AI,cs.LG,cs.NE
MobileViM: A Light-weight and Dimension-independent Vision Mamba for 3D Medical Image Analysis
Efficient evaluation of three-dimensional (3D) medical images is crucial for diagnostic and therapeutic practices in healthcare. Recent years have seen a substantial uptake in applying deep learning and computer vision to analyse and interpret medical images. Traditional approaches, such as convolutional neural networks (CNNs) and vision transformers (ViTs), face significant computational challenges, prompting the need for architectural advancements. Recent efforts have led to the introduction of novel architectures like the ``Mamba'' model as alternative solutions to traditional CNNs or ViTs. The Mamba model excels in the linear processing of one-dimensional data with low computational demands. However, Mamba's potential for 3D medical image analysis remains underexplored and could face significant computational challenges as the dimension increases. This manuscript presents MobileViM, a streamlined architecture for efficient segmentation of 3D medical images. In the MobileViM network, we invent a new dimension-independent mechanism and a dual-direction traversing approach to incorporate with a vision-Mamba-based framework. MobileViM also features a cross-scale bridging technique to improve efficiency and accuracy across various medical imaging modalities. With these enhancements, MobileViM achieves segmentation speeds exceeding 90 frames per second (FPS) on a single graphics processing unit (i.e., NVIDIA RTX 4090). This performance is over 24 FPS faster than the state-of-the-art deep learning models for processing 3D images with the same computational resources. In addition, experimental evaluations demonstrate that MobileViM delivers superior performance, with Dice similarity scores reaching 92.72%, 86.69%, 80.46%, and 77.43% for PENGWIN, BraTS2024, ATLAS, and Toothfairy2 datasets, respectively, which significantly surpasses existing models.
Updated: 2025-03-05 01:21:38
Categories: cs.CV,cs.AI,cs.LG,cs.NI
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) has contributed to performance improvements in large language models. To tackle its reliance on substantial amounts of human-labeled data, a successful approach is multi-task representation learning, which involves learning a high-quality, low-dimensional representation from a wide range of source tasks. In this paper, we formulate RLHF as the contextual dueling bandit problem and assume a common linear representation. We demonstrate that the sample complexity of source tasks in multi-task RLHF can be reduced by considering task relevance and allocating different sample sizes to source tasks with varying task relevance. We further propose an algorithm to estimate task relevance from a small amount of additional data and then learn a policy. We prove that to achieve $\varepsilon$-optimality, the sample complexity of the source tasks can be significantly reduced compared to uniform sampling. Additionally, the sample complexity of the target task is only linear in the dimension of the latent space, thanks to representation learning.
Updated: 2025-03-05 01:09:08
Categories: cs.LG
AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model
The Segment Anything Model (SAM) has demonstrated strong versatility across various visual tasks. However, its large storage requirements and high computational cost pose challenges for practical deployment. Post-training quantization (PTQ) has emerged as an effective strategy for efficient deployment, but we identify two key challenges in SAM that hinder the effectiveness of existing PTQ methods: the heavy-tailed and skewed distribution of post-GELU activations, and significant inter-channel variation in linear projection activations. To address these challenges, we propose AHCPTQ, an accurate and hardware-efficient PTQ method for SAM. AHCPTQ introduces hardware-compatible Hybrid Log-Uniform Quantization (HLUQ) to manage post-GELU activations, employing log2 quantization for dense small values and uniform quantization for sparse large values to enhance quantization resolution. Additionally, AHCPTQ incorporates Channel-Aware Grouping (CAG) to mitigate inter-channel variation by progressively clustering activation channels with similar distributions, enabling them to share quantization parameters and improving hardware efficiency. The combination of HLUQ and CAG not only enhances quantization effectiveness but also ensures compatibility with efficient hardware execution. For instance, under the W4A4 configuration on the SAM-L model, AHCPTQ achieves 36.6% mAP on instance segmentation with the DINO detector, while achieving a 7.89x speedup and 8.64x energy efficiency over its floating-point counterpart in FPGA implementation.
Updated: 2025-03-05 01:04:45
Categories: cs.CV,cs.AR,cs.LG
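One plausible reading of HLUQ is sketched below: a log2 grid resolves the dense small post-GELU activations finely, while a uniform grid covers the sparse large ones above a threshold. The threshold, bit-widths, and clipping range are illustrative placeholders, not the paper's calibrated values.

```python
import numpy as np

def hluq(x, t=0.5, log_bits=3, uni_bits=3):
    """Hybrid Log-Uniform Quantization sketch: log2 levels for dense small
    values, a uniform grid for sparse large ones. t, the bit-widths, and
    the [t, 4t] clipping range are illustrative assumptions."""
    sign, a = np.sign(x), np.abs(x)
    out = np.empty_like(a, dtype=float)
    small = a <= t
    # Log domain: quantize log2(a / t) to 2**log_bits negative exponents.
    n_exp = 2 ** log_bits
    e = np.clip(np.round(np.log2(np.maximum(a[small], 1e-12) / t)), -n_exp, 0)
    out[small] = t * 2.0 ** e
    # Uniform domain: 2**uni_bits levels over the clipped range [t, 4t].
    hi = 4.0 * t
    step = (hi - t) / (2 ** uni_bits - 1)
    out[~small] = t + step * np.round((np.clip(a[~small], t, hi) - t) / step)
    out[a == 0] = 0.0
    return sign * out

acts = np.random.randn(10000) ** 3          # heavy-tailed, GELU-like stand-in
print(np.unique(np.abs(hluq(acts))).size)   # small discrete codebook
```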
Hopfield Networks Meet Big Data: A Brain-Inspired Deep Learning Framework for Semantic Data Linking
The exponential rise in data generation has led to vast, heterogeneous datasets crucial for predictive analytics and decision-making. Ensuring data quality and semantic integrity remains a challenge. This paper presents a brain-inspired distributed cognitive framework that integrates deep learning with Hopfield networks to identify and link semantically related attributes across datasets. Modeled on the dual-hemisphere functionality of the human brain, the right hemisphere assimilates new information while the left retrieves learned representations for association. Our architecture, implemented on MapReduce with Hadoop Distributed File System (HDFS), leverages deep Hopfield networks as an associative memory mechanism to enhance recall of frequently co-occurring attributes and dynamically adjust relationships based on evolving data patterns. Experiments show that associative imprints in Hopfield memory are reinforced over time, ensuring linked datasets remain contextually meaningful and improving data disambiguation and integration accuracy. Our results indicate that combining deep Hopfield networks with distributed cognitive processing offers a scalable, biologically inspired approach to managing complex data relationships in large-scale environments.
Updated: 2025-03-05 00:53:22
Categories: cs.LG,cs.AI,cs.DC,cs.NE
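The associative-memory mechanism at the core of the framework can be illustrated with a classical binary Hopfield network: Hebbian storage imprints co-occurring attribute patterns, and recall from a corrupted cue settles back into the nearest stored pattern. The numpy toy below shows only these recall dynamics; the paper's deep Hopfield networks and the MapReduce/HDFS machinery are well beyond it.

```python
import numpy as np

def store(patterns):
    """Hebbian weight matrix for bipolar (+1/-1) patterns, zero diagonal."""
    P = np.array(patterns, dtype=float)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, probe, n_steps=10):
    """Iterated sign updates: the probe settles into a stored attractor."""
    s = np.where(np.asarray(probe, dtype=float) >= 0, 1.0, -1.0)
    for _ in range(n_steps):
        s_new = np.where(W @ s >= 0, 1.0, -1.0)
        if np.array_equal(s_new, s):
            break
        s = s_new
    return s

# Two stored attribute-co-occurrence patterns and a corrupted cue.
p1 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
p2 = np.array([1, -1, 1, -1, 1, -1, 1, -1])
W = store([p1, p2])
cue = p1.copy()
cue[0] = -1                                  # one corrupted attribute
print(np.array_equal(recall(W, cue), p1))    # True: associative recall
```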
Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing
Data-Independent Acquisition (DIA) was introduced to improve sensitivity, covering all peptides in a range rather than only sampling high-intensity peaks as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is not very clear how useful DIA data are for de novo peptide sequencing, as DIA data are marred by coeluted peptides, high noise, and varying data quality. We present a new deep learning method, DIANovo, that addresses each of these difficulties and improves on the previously established system DeepNovo-DIA by 25% to 81% (averaging 48%) in amino acid recall and by 27% to 89% (averaging 57%) in peptide recall, by equipping the model with a deeper understanding of coeluted DIA spectra. This paper also provides criteria for when DIA data can and cannot be used for de novo peptide sequencing, by comparing DDA and DIA in both de novo and database search modes. We find that while DIA excels with narrow isolation windows on older-generation instruments, it loses its advantage with wider windows. However, with the Orbitrap Astral, DIA consistently outperforms DDA thanks to its narrow-window mode. We also provide a theoretical explanation of this phenomenon, emphasizing the critical role of the signal-to-noise profile in the successful application of de novo sequencing.
Updated: 2025-03-05 00:46:26
Categories: q-bio.BM,cs.LG
A Linear Theory of Multi-Winner Voting
We introduce a general linear framework that unifies the study of multi-winner voting rules and proportionality axioms, demonstrating that many prominent multi-winner voting rules--including Thiele methods, their sequential variants, and approval-based committee scoring rules--are linear. Similarly, key proportionality axioms such as Justified Representation (JR), Extended JR (EJR), and their strengthened variants (PJR+, EJR+), along with core stability, fit within this linear structure as well. Leveraging PAC learning theory, we establish general and novel upper bounds on the sample complexity of learning linear mappings. Our approach yields near-optimal guarantees for diverse classes of rules, including Thiele methods and ordered weighted average rules, and can be applied to analyze the sample complexity of learning proportionality axioms such as approximate core stability. Furthermore, the linear structure allows us to leverage prior work to extend our analysis beyond worst-case scenarios and study the likelihood of various properties of linear rules and axioms. We introduce a broad class of distributions that extend Impartial Culture for approval preferences, and show that under these distributions, with high probability, any Thiele method is resolute, CORE is non-empty, and any Thiele method satisfies CORE, among other observations about the likelihood of commonly studied properties in social choice. We believe that this linear theory offers a new perspective and powerful new tools for designing and analyzing multi-winner rules in modern social choice applications.
Updated: 2025-03-05 00:44:56
Categories: cs.GT,cs.LG,econ.TH
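To make the objects of study concrete, the snippet below scores committees under Proportional Approval Voting, a Thiele method with harmonic weights 1, 1/2, 1/3, ..., and finds an exact winner by enumeration. It illustrates the kind of rule the linear framework covers, not the paper's learning-theoretic machinery.

```python
from itertools import combinations

def pav_score(committee, approvals):
    """Thiele score with PAV weights 1, 1/2, 1/3, ... (harmonic)."""
    score = 0.0
    for ballot in approvals:
        k = len(ballot & set(committee))
        score += sum(1.0 / j for j in range(1, k + 1))
    return score

def pav_winner(candidates, approvals, k):
    """Exact PAV winner by enumeration (fine for small instances)."""
    return max(combinations(candidates, k),
               key=lambda c: pav_score(c, approvals))

approvals = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"c"}, {"d"}]
print(pav_winner(["a", "b", "c", "d"], approvals, k=2))  # ('a', 'b')
```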
Neural Models of Task Adaptation: A Tutorial on Spiking Networks for Executive Control
Understanding cognitive flexibility and task-switching mechanisms in neural systems requires biologically plausible computational models. This tutorial presents a step-by-step approach to constructing a spiking neural network (SNN) that simulates task-switching dynamics within the cognitive control network. The model incorporates biologically realistic features, including lateral inhibition, adaptive synaptic weights through unsupervised Spike Timing-Dependent Plasticity (STDP), and precise neuronal parameterization within physiologically relevant ranges. The SNN is implemented using Leaky Integrate-and-Fire (LIF) neurons, which represent excitatory (glutamatergic) and inhibitory (GABAergic) populations. We utilize two real-world datasets as tasks, demonstrating how the network learns and dynamically switches between them. Experimental design follows cognitive psychology paradigms to analyze neural adaptation, synaptic weight modifications, and emergent behaviors such as Long-Term Potentiation (LTP), Long-Term Depression (LTD), and Task-Set Reconfiguration (TSR). Through a series of structured experiments, this tutorial illustrates how variations in task-switching intervals affect performance and multitasking efficiency. The results align with empirically observed neuronal responses, offering insights into the computational underpinnings of executive function. By following this tutorial, researchers can develop and extend biologically inspired SNN models for studying cognitive processes and neural adaptation.
Updated: 2025-03-05 00:44:34
Categories: q-bio.NC,cs.LG,cs.NE
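A minimal Leaky Integrate-and-Fire simulation, the basic building block the tutorial assembles, might look as follows. The parameter values are generic textbook choices rather than the tutorial's, and lateral inhibition and STDP are omitted.

```python
import numpy as np

def simulate_lif(I, dt=1e-4, tau=0.02, v_rest=-0.065,
                 v_thresh=-0.050, v_reset=-0.065, R=1e7):
    """Leaky Integrate-and-Fire: tau * dV/dt = -(V - v_rest) + R * I(t),
    with a hard reset on threshold crossing (Euler integration)."""
    v, trace, spike_times = v_rest, [], []
    for step, i_t in enumerate(I):
        v += dt / tau * (-(v - v_rest) + R * i_t)
        if v >= v_thresh:                 # spike and reset
            spike_times.append(step * dt)
            v = v_reset
        trace.append(v)
    return np.array(trace), spike_times

I = np.full(2000, 2e-9)                   # 200 ms of constant 2 nA input
_, spikes = simulate_lif(I)
print(len(spikes), "spikes")              # regular tonic firing
```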
The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD
We propose a simple heuristic privacy analysis of noisy clipped stochastic gradient descent (DP-SGD) in the setting where only the last iterate is released and the intermediate iterates remain hidden. Namely, our heuristic assumes a linear structure for the model. We show experimentally that our heuristic is predictive of the outcome of privacy auditing applied to various training procedures. Thus it can be used prior to training as a rough estimate of the final privacy leakage. We also probe the limitations of our heuristic by providing some artificial counterexamples where it underestimates the privacy leakage. The standard composition-based privacy analysis of DP-SGD effectively assumes that the adversary has access to all intermediate iterates, which is often unrealistic. However, this analysis remains the state of the art in practice. While our heuristic does not replace a rigorous privacy analysis, it illustrates the large gap between the best theoretical upper bounds and the privacy auditing lower bounds and sets a target for further work to improve the theoretical privacy analyses. We also empirically support our heuristic and show existing privacy auditing attacks are bounded by our heuristic analysis in both vision and language tasks.
Updated: 2025-03-05 00:39:17
Categories: cs.CR,cs.LG
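For context, one DP-SGD update clips each per-example gradient, aggregates, and adds Gaussian noise, as in the sketch below (shown on a toy linear model, echoing the linear structure the heuristic assumes). The paper's contribution is the analysis of releasing only the last iterate, which the sketch does not encode.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip=1.0,
                noise_mult=1.0, rng=None):
    """One noisy clipped SGD step: clip each per-example gradient to norm
    <= clip, sum, add Gaussian noise with std noise_mult * clip, average."""
    rng = rng or np.random.default_rng()
    g_sum = np.zeros_like(w)
    for g in per_example_grads:
        g_sum += g / max(1.0, np.linalg.norm(g) / clip)  # per-example clip
    noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
    return w - lr * (g_sum + noise) / len(per_example_grads)

# Toy linear-regression batch: gradients of 0.5*(w.x - y)^2 per example.
rng = np.random.default_rng(0)
w = np.zeros(3)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
grads = [(xi @ w - yi) * xi for xi, yi in zip(X, y)]
w = dp_sgd_step(w, grads, rng=rng)
```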
Periodontal Bone Loss Analysis via Keypoint Detection With Heuristic Post-Processing
Calculating percentage bone loss is a critical test for periodontal disease staging but is sometimes imprecise and time-consuming when calculated manually. This study evaluates the application of a deep learning keypoint and object detection model, YOLOv8-pose, for the automatic identification of localised periodontal bone loss landmarks, conditions and staging. YOLOv8-pose was fine-tuned on 193 annotated periapical radiographs. We propose a keypoint detection metric, Percentage of Relative Correct Keypoints (PRCK), which normalises keypoint error to the average tooth size in the image. We propose a heuristic post-processing module that adjusts certain keypoint predictions to align with the edge of the related tooth, using a supporting instance segmentation model trained on an open-source auxiliary dataset. The model can sufficiently detect bone loss keypoints, tooth boxes, and alveolar ridge resorption, but performs insufficiently at detecting detached periodontal ligament and furcation involvement. The model with post-processing demonstrated a PRCK 0.25 of 0.726 and PRCK 0.05 of 0.401 for keypoint detection, mAP 0.5 of 0.715 for tooth object detection, a mesial dice score of 0.593 for periodontal staging, and a dice score of 0.280 for furcation involvement. Our annotation methodology provides a stage-agnostic approach to periodontal disease detection, by ensuring most keypoints are present for each tooth in the image, allowing small imbalanced datasets. Our PRCK metric allows accurate evaluation of keypoints in dental domains. Our post-processing module adjusts predicted keypoints correctly but is dependent on a minimum quality of prediction by the pose detection and segmentation models. Code: https://anonymous.4open.science/r/Bone-Loss-Keypoint-Detection-Code. Dataset: https://bit.ly/4hJ3aE7.
Updated: 2025-03-05 00:34:29
Categories: q-bio.TO,cs.AI,cs.CV,I.2.1; I.2.10; J.3
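Under one plausible reading of the description above, PRCK is the fraction of keypoints whose error falls below a threshold proportional to the average tooth size in the image; the sketch below encodes that reading. The paper's exact normalisation may differ, so treat this as an interpretation rather than a reference implementation.

```python
import numpy as np

def prck(pred, gt, tooth_boxes, alpha=0.25):
    """PRCK (one plausible reading): a predicted keypoint counts as
    'relatively correct' when its distance to ground truth is below alpha
    times the average tooth size in the image.
    pred, gt: (N, 2) arrays; tooth_boxes: (M, 4) as (x1, y1, x2, y2)."""
    w = tooth_boxes[:, 2] - tooth_boxes[:, 0]
    h = tooth_boxes[:, 3] - tooth_boxes[:, 1]
    avg_tooth_size = np.sqrt(w * h).mean()
    dists = np.linalg.norm(pred - gt, axis=1)
    return float((dists < alpha * avg_tooth_size).mean())
```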
Efficient Sparse PCA via Block-Diagonalization
Sparse Principal Component Analysis (Sparse PCA) is a pivotal tool in data analysis and dimensionality reduction. However, Sparse PCA is a challenging problem in both theory and practice: it is known to be NP-hard and current exact methods generally require exponential runtime. In this paper, we propose a novel framework to efficiently approximate Sparse PCA by (i) approximating the general input covariance matrix with a re-sorted block-diagonal matrix, (ii) solving the Sparse PCA sub-problem in each block, and (iii) reconstructing the solution to the original problem. Our framework is simple and powerful: it can leverage any off-the-shelf Sparse PCA algorithm and achieve significant computational speedups, with a minor additive error that is linear in the approximation error of the block-diagonal matrix. Suppose $g(k, d)$ is the runtime of an algorithm (approximately) solving Sparse PCA in dimension $d$ and with sparsity constant $k$. Our framework, when integrated with this algorithm, reduces the runtime to $\mathcal{O}\left(\frac{d}{d^\star} \cdot g(k, d^\star) + d^2\right)$, where $d^\star \leq d$ is the largest block size of the block-diagonal matrix. For instance, integrating our framework with the Branch-and-Bound algorithm reduces the complexity from $g(k, d) = \mathcal{O}(k^3\cdot d^k)$ to $\mathcal{O}(k^3\cdot d \cdot (d^\star)^{k-1})$, demonstrating exponential speedups if $d^\star$ is small. We perform large-scale evaluations on many real-world datasets: for exact Sparse PCA algorithm, our method achieves an average speedup factor of 100.50, while maintaining an average approximation error of 0.61%; for approximate Sparse PCA algorithm, our method achieves an average speedup factor of 6.00 and an average approximation error of -0.91%, meaning that our method oftentimes finds better solutions.
Updated: 2025-03-05 00:31:22
Categories: cs.LG,math.OC,stat.ML
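Steps (ii) and (iii) of the framework are straightforward once the blocks are given, as the sketch below shows; step (i), re-sorting the covariance into a block-diagonal approximation, is the paper's contribution and is not reproduced. `solve_block` can be any off-the-shelf Sparse PCA routine; a simple truncated power iteration is included as a stand-in.

```python
import numpy as np

def block_sparse_pca(Sigma, blocks, solve_block):
    """Run a Sparse PCA solver on each diagonal block of the (already
    re-sorted) covariance, then lift the best block solution back to the
    full dimension. `blocks` is a list of integer index arrays."""
    d = Sigma.shape[0]
    best_val, best_x = -np.inf, None
    for idx in blocks:
        sub = Sigma[np.ix_(idx, idx)]
        x_sub = solve_block(sub)
        val = float(x_sub @ sub @ x_sub)     # explained variance in block
        if val > best_val:
            x = np.zeros(d)
            x[idx] = x_sub
            best_val, best_x = val, x
    return best_x

def truncated_power(S, k, iters=100):
    """Stand-in per-block solver: truncated power iteration."""
    x = np.ones(S.shape[0]) / np.sqrt(S.shape[0])
    for _ in range(iters):
        y = S @ x
        keep = np.argsort(np.abs(y))[-k:]    # k largest entries survive
        mask = np.zeros_like(y)
        mask[keep] = 1.0
        y = y * mask
        x = y / (np.linalg.norm(y) + 1e-12)
    return x

Sigma = np.diag([3.0, 2.0, 1.0, 1.0])        # trivially block-diagonal
blocks = [np.array([0, 1]), np.array([2, 3])]
print(block_sparse_pca(Sigma, blocks, lambda S: truncated_power(S, k=1)))
```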
Impact of Level 2/3 Automated Driving Technology on Road Work Zone Safety
As China's road network enters the maintenance era, work zones will become a common sight on the roads. With the development of automated driving, vehicles equipped with Level 2/3 automated driving capabilities will also become a common presence on the roads. When these vehicles pass through work zones, automated driving may disengage, which can have complex effects on traffic safety. This paper explores the impact of Level 2/3 automated driving technology on road safety in high-speed highway work zone environments. Through microscopic traffic simulation method and using full-type traffic conflict technique, factors such as market penetration rate (MPR), traffic volume level, disengagement threshold, and driver takeover style are studied to understand their impact on work zone safety. The study found that the impact of automated driving technology on work zone safety is complex. Disengagement of automated vehicles in work zones reduces the proportion of vehicles that can maintain automated driving status. If takeover is not timely or adequate, it can easily lead to new traffic conflicts. Different factors have varying degrees of impact on work zone safety. Increasing MPR helps reduce the occurrence of single-vehicle conflicts, but it also increases the possibility of multi-vehicle conflicts. Therefore, future research and improvement directions should focus on optimizing the disengagement detection and takeover mechanisms of automated driving systems.
Updated: 2025-03-05 00:26:53
Categories: cs.AI,cs.MA,cs.RO
An Undetectable Watermark for Generative Image Models
We present the first undetectable watermarking scheme for generative image models. Undetectability ensures that no efficient adversary can distinguish between watermarked and un-watermarked images, even after making many adaptive queries. In particular, an undetectable watermark does not degrade image quality under any efficiently computable metric. Our scheme works by selecting the initial latents of a diffusion model using a pseudorandom error-correcting code (Christ and Gunn, 2024), a strategy which guarantees undetectability and robustness. We experimentally demonstrate that our watermarks are quality-preserving and robust using Stable Diffusion 2.1. Our experiments verify that, in contrast to every prior scheme we tested, our watermark does not degrade image quality. Our experiments also demonstrate robustness: existing watermark removal attacks fail to remove our watermark from images without significantly degrading the quality of the images. Finally, we find that we can robustly encode 512 bits in our watermark, and up to 2500 bits when the images are not subjected to watermark removal attacks. Our code is available at https://github.com/XuandongZhao/PRC-Watermark.
Updated: 2025-03-05 00:06:53
Categories: cs.CR,cs.AI,cs.LG,cs.MM
Deep Learning without Global Optimization by Random Fourier Neural Networks
We introduce a new training algorithm for deep neural networks that utilize random complex exponential activation functions. Our approach employs a Markov Chain Monte Carlo sampling procedure to iteratively train network layers, avoiding global and gradient-based optimization while maintaining error control. It consistently attains the theoretical approximation rate for residual networks with complex exponential activation functions, determined by network complexity. Additionally, it enables efficient learning of multiscale and high-frequency features, producing interpretable parameter distributions. Despite using sinusoidal basis functions, we do not observe Gibbs phenomena in approximating discontinuous target functions.
Updated: 2025-03-05 00:01:35
Categories: cs.LG,cs.NA,math.NA,stat.ML,65T40, 90C15, 65C05, 65C40, 60J22, 68T07
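The gradient-free flavor of the approach can be conveyed by fitting a single shallow random-Fourier model: sample frequencies at random, then solve a linear least-squares problem for the complex output weights. The paper instead trains layers iteratively with an MCMC sampling procedure; the i.i.d. frequency sampling, the frequency scale, and the ridge term below are simplifying assumptions.

```python
import numpy as np

def fit_random_fourier_layer(x, y, n_features=200, scale=5.0, reg=1e-8, seed=0):
    """One shallow random-Fourier fit: random frequencies plus a ridge-
    regularized linear solve for complex output weights -- no gradient-based
    or global optimization. (The paper resamples frequencies with MCMC;
    plain i.i.d. Gaussian sampling is used here instead.)"""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, scale, size=n_features)
    Phi = np.exp(1j * np.outer(x, omega))        # complex exponential features
    A = Phi.conj().T @ Phi + reg * np.eye(n_features)
    c = np.linalg.solve(A, Phi.conj().T @ y)     # least-squares coefficients
    return lambda x_new: (np.exp(1j * np.outer(x_new, omega)) @ c).real

x = np.linspace(-1, 1, 400)
y = np.sign(np.sin(3 * np.pi * x))               # discontinuous target
f = fit_random_fourier_layer(x, y)
print(np.mean((f(x) - y) ** 2))                  # small training error
```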