Leveraging deep learning for plant disease identification: a bibliometric analysis in SCOPUS from 2018 to 2024
This work aimed to present a bibliometric analysis of deep learning research for plant disease identification, with a special focus on generative modeling. A thorough analysis of SCOPUS-sourced bibliometric data from 253 documents was performed. Key performance metrics such as accuracy, precision, recall, and F1-score were analyzed for generative modeling. The findings highlighted significant contributions from authors such as Too and Arnal Barbedo, whose works had notable citation counts, suggesting their influence on the academic community. Co-authorship networks revealed strong collaborative clusters, while keyword analysis identified emerging research gaps. This study highlights the role of collaboration and citation metrics in shaping research directions and enhancing the impact of scholarly work in applications of deep learning to plant disease identification. Future research should explore the methodologies of highly cited studies to inform best practices and policy-making.
Updated: 2025-04-09 23:57:30
Categories: cs.LG
Learning to erase quantum states: thermodynamic implications of quantum learning theory
The energy cost of erasing quantum states depends on our knowledge of the states. We show that learning algorithms can acquire such knowledge to erase many copies of an unknown state at the optimal energy cost. This is proved by showing that learning can be made fully reversible and has no fundamental energy cost itself. With simple counting arguments, we relate the energy cost of erasing quantum states to their complexity, entanglement, and magic. We further show that the constructed erasure protocol is computationally efficient when learning is efficient. Conversely, under standard cryptographic assumptions, we prove that the optimal energy cost cannot be achieved efficiently in general. These results also enable efficient work extraction based on learning. Together, our results establish a concrete connection between quantum learning theory and thermodynamics, highlighting the physical significance of learning processes and enabling efficient learning-based protocols for thermodynamic tasks.
Updated: 2025-04-09 23:51:01
Categories: quant-ph,cond-mat.stat-mech,cs.CC,cs.IT,cs.LG,math.IT
GOLLuM: Gaussian Process Optimized LLMs -- Reframing LLM Finetuning through Bayesian Optimization
Large Language Models (LLMs) can encode complex relationships in their latent spaces, yet harnessing them for optimization under uncertainty remains challenging. We address this gap with a novel architecture that reframes LLM finetuning as Gaussian process (GP) marginal likelihood optimization via deep kernel methods. We introduce LLM-based deep kernels, jointly optimized with GPs to preserve the benefits of both: LLMs provide a rich and flexible input space for Bayesian optimization, and GPs model this space with predictive uncertainty for more efficient sampling. Applied to Buchwald-Hartwig reaction optimization, our method nearly doubles the discovery rate of high-performing reactions compared to static LLM embeddings (from 24% to 43% coverage of the top 5% reactions in just 50 optimization iterations). We also observe a 14% improvement over domain-specific representations without requiring specialized features. Extensive empirical evaluation across 19 benchmarks - ranging from general chemistry to reaction and molecular property optimization - demonstrates our method's robustness, generality, and consistent improvements across: (1) tasks, (2) LLM architectures (encoder, decoder, encoder-decoder), (3) pretraining domains (chemistry-related or general-purpose) and (4) hyperparameter settings (tuned once on a single dataset). Finally, we explain these improvements: joint LLM-GP optimization through marginal likelihood implicitly performs contrastive learning, aligning representations to produce (1) better-structured embedding spaces, (2) improved uncertainty calibration, and (3) more efficient sampling - without requiring any external loss. This work provides both practical advances in sample-efficient optimization and insights into what makes Bayesian optimization effective.
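A minimal Python sketch of the deep-kernel idea described above, assuming a hypothetical hash-based stand-in `embed` in place of the finetunable LLM encoder; the GP log marginal likelihood is the quantity joint LLM-GP training would maximize. This illustrates the mechanism, not the paper's implementation.

    import numpy as np

    def embed(texts):
        # Hypothetical stand-in for a finetunable LLM encoder; in the
        # paper's setting these vectors come from a transformer.
        rng = np.random.default_rng(sum(map(ord, "".join(texts))) % 2**32)
        return rng.normal(size=(len(texts), 16))

    def rbf_kernel(Z, lengthscale=1.0, variance=1.0):
        # Deep kernel: a standard RBF kernel applied to learned embeddings Z.
        sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return variance * np.exp(-0.5 * sq / lengthscale**2)

    def gp_log_marginal_likelihood(Z, y, noise=1e-2):
        # Maximizing this w.r.t. kernel hyperparameters *and* encoder
        # weights is the joint LLM-GP optimization the abstract describes.
        K = rbf_kernel(Z) + noise * np.eye(len(y))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return float(-0.5 * y @ alpha - np.log(np.diag(L)).sum()
                     - 0.5 * len(y) * np.log(2 * np.pi))

    reactions = ["ligand A / base X", "ligand B / base Y", "ligand C / base X"]
    yields = np.array([0.24, 0.43, 0.31])
    print(gp_log_marginal_likelihood(embed(reactions), yields))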
Updated: 2025-04-09 23:45:44
Categories: cs.LG,cs.AI
FLASH: Flexible Learning of Adaptive Sampling from History in Temporal Graph Neural Networks
Aggregating temporal signals from historic interactions is a key step in future link prediction on dynamic graphs. However, incorporating long histories is resource-intensive. Hence, temporal graph neural networks (TGNNs) often rely on historical neighbor sampling heuristics such as uniform sampling or recent-neighbor selection. These heuristics are static and fail to adapt to the underlying graph structure. We introduce FLASH, a learnable and graph-adaptive neighborhood selection mechanism that generalizes existing heuristics. FLASH integrates seamlessly into TGNNs and is trained end-to-end using a self-supervised ranking loss. We provide theoretical evidence that commonly used heuristics hinder TGNN performance, motivating our design. Extensive experiments across multiple benchmarks demonstrate consistent and significant performance improvements for TGNNs equipped with FLASH.
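A toy Python sketch of a learnable, graph-adaptive neighbor selector that generalizes the fixed heuristics above; the weights `w` and `b` are hypothetical and would be trained end-to-end with the self-supervised ranking loss in the actual model.

    import numpy as np

    def neighbor_logits(feats, recency, w, b):
        # Learnable scoring: w = 0 with b > 0 recovers recency-based
        # selection, while w = b = 0 degenerates to uniform sampling.
        return feats @ w + b * recency

    def sample_history(feats, recency, w, b=1.0, k=3, rng=None):
        rng = rng or np.random.default_rng(0)
        logits = neighbor_logits(feats, recency, w, b)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(len(p), size=k, replace=False, p=p)

    feats = np.random.default_rng(1).random((10, 4))  # 10 historical neighbors
    recency = np.linspace(0.0, 1.0, 10)               # 1.0 = most recent interaction
    print(sample_history(feats, recency, w=np.ones(4)))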
Updated: 2025-04-09 23:35:09
Categories: cs.LG,cs.SI
Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging
Medical image segmentation has achieved remarkable success through the continuous advancement of UNet-based and Transformer-based foundation backbones. However, clinical diagnosis in the real world often requires integrating domain knowledge, especially textual information. Multimodal learning over visual and text modalities has been shown to be a solution, but collecting paired vision-language datasets is expensive and time-consuming, posing significant challenges. Inspired by the superior ability of Large Language Models (LLMs) in numerous cross-modal tasks, we propose a novel Vision-LLM union framework to address these issues. Specifically, we introduce frozen LLMs for zero-shot instruction generation based on corresponding medical images, imitating the radiology scanning and report generation process. To better approximate real-world diagnostic processes, we generate more precise text instructions from multimodal radiology images (e.g., T1-w or T2-w MRI and CT), drawing on the impressive semantic understanding and rich knowledge of LLMs. This process emphasizes extracting special features from different modalities and reuniting the information for the ultimate clinical diagnosis. With the generated text instructions, our proposed union segmentation framework can handle multimodal segmentation without previously collected vision-language datasets. To evaluate our proposed method, we conduct comprehensive experiments against influential baselines; the statistical results and a visualized case study demonstrate the superiority of our novel method.
Updated: 2025-04-09 23:33:35
Categories: cs.CV,cs.AI
Objaverse++: Curated 3D Object Dataset with Quality Annotations
This paper presents Objaverse++, a curated subset of Objaverse enhanced with detailed attribute annotations by human experts. Recent advances in 3D content generation have been driven by large-scale datasets such as Objaverse, which contains over 800,000 3D objects collected from the Internet. Although Objaverse represents the largest available 3D asset collection, its utility is limited by the predominance of low-quality models. To address this limitation, we manually annotate 10,000 3D objects with detailed attributes, including aesthetic quality scores, texture color classifications, multi-object composition flags, transparency characteristics, etc. We then train a neural network capable of annotating the tags for the rest of the Objaverse dataset. Through experiments and a user study on generation results, we demonstrate that models pre-trained on our quality-focused subset achieve better performance than those trained on the larger Objaverse dataset in image-to-3D generation tasks. In addition, by comparing multiple subsets of training data filtered by our tags, our results show that the higher the data quality, the faster the training loss converges. These findings suggest that careful curation and rich annotation can compensate for raw dataset size, potentially offering a more efficient path to developing 3D generative models. We release our enhanced dataset of approximately 500,000 curated 3D models to facilitate further research on various downstream tasks in 3D computer vision. In the near future, we aim to extend our annotations to cover the entire Objaverse dataset.
Updated: 2025-04-09 23:29:08
Categories: cs.CV, cs.AI, cs.LG; MSC: 68T45, 68T07; ACM: I.2.10; I.3.5; I.3.7; I.4.8; I.5.1
Learning-Based Approximate Nonlinear Model Predictive Control Motion Cueing
Motion Cueing Algorithms (MCAs) encode the movement of simulated vehicles into movement that can be reproduced with a motion simulator to provide a realistic driving experience within the capabilities of the machine. This paper introduces a novel learning-based MCA for serial robot-based motion simulators. Building on the differentiable predictive control framework, the proposed method merges the advantages of Nonlinear Model Predictive Control (NMPC) - notably nonlinear constraint handling and accurate kinematic modeling - with the computational efficiency of machine learning. By shifting the computational burden to offline training, the new algorithm enables real-time operation at high control rates, thus overcoming the key challenge associated with NMPC-based motion cueing. The proposed MCA incorporates a nonlinear joint-space plant model and a policy network trained to mimic NMPC behavior while accounting for joint acceleration, velocity, and position limits. Simulation experiments across multiple motion cueing scenarios showed that the proposed algorithm performed on par with a state-of-the-art NMPC-based alternative in terms of motion cueing quality as quantified by the RMSE and correlation coefficient with respect to reference signals. However, the proposed algorithm was on average 400 times faster than the NMPC baseline. In addition, the algorithm successfully generalized to unseen operating conditions, including motion cueing scenarios on a different vehicle and real-time physics-based simulations.
Updated: 2025-04-09 23:09:21
Categories: cs.RO,cs.AI,cs.SY,eess.SY
Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators
Nonlinear control systems with partial information to the decision maker are prevalent in a variety of applications. As a step toward studying such nonlinear systems, this work explores reinforcement learning methods for finding the optimal policy in the nearly linear-quadratic regulator systems. In particular, we consider a dynamic system that combines linear and nonlinear components, and is governed by a policy with the same structure. Assuming that the nonlinear component comprises kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. Although the cost function is nonconvex in general, we establish the local strong convexity and smoothness in the vicinity of the global optimizer. Additionally, we propose an initialization mechanism to leverage these properties. Building on the developments, we design a policy gradient algorithm that is guaranteed to converge to the globally optimal policy with a linear rate.
Updated: 2025-04-09 23:06:03
Categories: cs.LG,math.OC,stat.ML
Prekey Pogo: Investigating Security and Privacy Issues in WhatsApp's Handshake Mechanism
WhatsApp, the world's largest messaging application, uses a version of the Signal protocol to provide end-to-end encryption (E2EE) with strong security guarantees, including Perfect Forward Secrecy (PFS). To ensure PFS right from the start of a new conversation -- even when the recipient is offline -- a stash of ephemeral (one-time) prekeys must be stored on a server. While the critical role of these one-time prekeys in achieving PFS has been outlined in the Signal specification, we are the first to demonstrate a targeted depletion attack against them on individual WhatsApp user devices. Our findings not only reveal an attack that can degrade PFS for certain messages, but also expose inherent privacy risks and serious availability implications arising from the refilling and distribution procedure essential for this security mechanism.
Updated: 2025-04-09 22:53:13
Categories: cs.CR,cs.NI
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses we identify that the limitations arise from a restricted effective receptive field, dictated by the sequence length used during training. To address this constraint, we introduce DeciMamba, a context-extension method specifically designed for Mamba. This mechanism, built on top of a hidden filtering mechanism embedded within the S6 layer, enables the trained model to extrapolate well even without additional training. Empirical experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths that are significantly longer than the ones seen during training, while enjoying faster inference.
Updated: 2025-04-09 22:43:46
Categories: cs.LG,cs.AI
Bregman-Hausdorff divergence: strengthening the connections between computational geometry and machine learning
The purpose of this paper is twofold. On a technical side, we propose an extension of the Hausdorff distance from metric spaces to spaces equipped with asymmetric distance measures. Specifically, we focus on the family of Bregman divergences, which includes the popular Kullback--Leibler divergence (also known as relative entropy). As a proof of concept, we use the resulting Bregman--Hausdorff divergence to compare two collections of probabilistic predictions produced by different machine learning models trained using the relative entropy loss. The algorithms we propose are surprisingly efficient even for large inputs with hundreds of dimensions. In addition to the introduction of this technical concept, we provide a survey. It outlines the basics of Bregman geometry, as well as computational geometry algorithms. We focus on algorithms that are compatible with this geometry and are relevant for machine learning.
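A brute-force Python sketch of the construction: the Kullback-Leibler divergence (the Bregman divergence generated by negative entropy) plugged into a directed Hausdorff maximin. The paper's algorithms are far more efficient, and its directional conventions may differ; this only illustrates the definition.

    import numpy as np

    def kl(p, q, eps=1e-12):
        # Kullback--Leibler divergence: the Bregman divergence generated
        # by negative entropy; asymmetric in its arguments.
        return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

    def bregman_hausdorff(A, B, div=kl):
        # Directed Hausdorff over an asymmetric divergence: the worst case,
        # over points of A, of the divergence to the closest point of B.
        return max(min(div(p, q) for q in B) for p in A)

    A = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1])]  # model 1 predictions
    B = [np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.7, 0.1])]  # model 2 predictions
    print(bregman_hausdorff(A, B), bregman_hausdorff(B, A))     # asymmetric in general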
Updated: 2025-04-09 22:42:29
Categories: cs.LG,cs.CG,cs.IT,math.IT
Deep Learning in Early Alzheimer's Disease Detection: A Comprehensive Survey of Classification, Segmentation, and Feature Extraction Methods
Alzheimer's disease is a deadly neurological condition, impairing important memory and brain functions. Alzheimer's disease promotes brain shrinkage, ultimately leading to dementia. Dementia diagnosis typically takes 2.8 to 4.4 years after the first clinical indication. Advancements in computing and information technology have led to many techniques for studying Alzheimer's disease. Early identification and therapy are crucial for preventing Alzheimer's disease, as early-onset dementia hits people before the age of 65, while late-onset dementia occurs after this age. According to the 2015 World Alzheimer's Disease Report, there are 46.8 million individuals worldwide suffering from dementia, with an anticipated 74.7 million more by 2030 and 131.5 million by 2050. Deep Learning has outperformed conventional Machine Learning techniques by identifying intricate structures in high-dimensional data. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have achieved an accuracy of up to 96.0% for Alzheimer's disease classification, and 84.2% for mild cognitive impairment (MCI) conversion prediction. Few literature surveys are available on applying ML to predict dementia, and they lack congenital observations. This survey, however, has focused on a specific data channel for dementia detection. This study evaluated Deep Learning algorithms for early Alzheimer's disease detection, using openly accessible datasets, feature segmentation, and classification methods. This article has also identified research gaps and limits in detecting Alzheimer's disease, which can inform future research.
Updated: 2025-04-09 22:39:50
Categories: cs.LG
Cryptographic Strengthening of MST3 via Automorphism Group of Suzuki Function Fields
The article describes a new implementation of MST3 cryptosystems based on the automorphism group of the Suzuki function field. The main difference in the presented implementation is the use of the logarithmic signature for encryption not only in the center of the group, as in the well-known implementation of MST3 for Suzuki groups, but also for coordinates outside the center of the group. The presented implementation of the cryptosystem has greater reliability. The complexity of cryptanalysis and the size of the message for encryption are greater than those of the MST3 cryptosystem in the Suzuki group.
Updated: 2025-04-09 22:37:08
Categories: cs.CR
Compressing Search with Language Models
Millions of people turn to Google Search each day for information on things as diverse as new cars or flu symptoms. The terms that they enter contain valuable information on their daily intent and activities, but the information in these search terms has been difficult to fully leverage. User-defined categorical filters have been the most common way to shrink the dimensionality of search data to a tractable size for analysis and modeling. In this paper we present a new approach to reducing the dimensionality of search data while retaining much of the information in the individual terms without user-defined rules. Our contributions are two-fold: 1) we introduce SLaM Compression, a way to quantify search terms using pre-trained language models and create a representation of search data that has low dimensionality, is memory efficient, and effectively acts as a summary of search, and 2) we present CoSMo, a Constrained Search Model for estimating real world events using only search data. We demonstrate the efficacy of our contributions by estimating with high accuracy U.S. automobile sales and U.S. flu rates using only Google Search data.
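A minimal Python sketch of the compression idea: each search term is mapped through a language-model embedding (replaced here by a hypothetical hash-based stand-in) and the per-day mean acts as a fixed-size, memory-efficient summary of search. The paper's actual quantization and aggregation may differ.

    import numpy as np

    def embed_term(term, dim=32):
        # Hypothetical stand-in for a pre-trained language model's
        # embedding; deterministic so the sketch stays self-contained.
        rng = np.random.default_rng(sum(map(ord, term)) % 2**32)
        return rng.normal(size=dim)

    def slam_compress(terms_by_day):
        # One fixed-size vector per day: the mean term embedding serves
        # as a low-dimensional summary of that day's search activity.
        return {day: np.mean([embed_term(t) for t in terms], axis=0)
                for day, terms in terms_by_day.items()}

    daily = {"2024-01-01": ["new suv lease deals", "flu symptoms fever"],
             "2024-01-02": ["used sedan prices", "flu shot near me"]}
    features = slam_compress(daily)  # inputs for a downstream model such as CoSMo
    print({day: vec.shape for day, vec in features.items()})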
Updated: 2025-04-09 22:34:22
Categories: cs.IR,cs.LG
An Activity-Based Model of Transport Demand for Greater Melbourne
In this paper, we present an activity-based model for the Greater Melbourne area, using a combination of hierarchical clustering, probabilistic, and gravity-based approaches. The model outlines steps for generating a synthetic population (a list of agents with their demographic attributes) and for assigning activity patterns, schedules, as well as activity locations and modes of travel for each trip. In our model, individuals are assigned activity chains based on the probabilities of their respective demographic clusters, as informed by observed data. Tours and trips then emanate from these assigned activities. This is innovative compared to the common practice of creating trips or tours first and attaching activities thereafter. Furthermore, when selecting activity locations, our model incorporates both the distance-decay of trip lengths and the activity-based attraction of destination sites. This results in areas with higher attractiveness for various activities showing a greater likelihood of being selected. Additionally, when assigning the location for the next activity, we take into account the number of activities an agent has remaining to ensure they do not opt for a location that would be impractical for a return trip home. Our methodology is open and replicable, requiring only publicly available data and is designed to produce outcomes compatible with commonly used agent-based modeling software such as MATSim. Each sub-model is calibrated to match observed data in terms of activity types, start and end times, and durations.
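A small Python sketch of the destination-choice step described above, combining distance decay with activity-based attraction; the exponential decay form and the parameter values are illustrative assumptions, not the calibrated model.

    import numpy as np

    def choose_destination(distances_km, attractions, beta=0.3, rng=None):
        # Gravity-style choice: a zone's probability is proportional to
        # its attraction for the activity times a distance-decay term.
        rng = rng or np.random.default_rng(0)
        weights = attractions * np.exp(-beta * distances_km)
        probs = weights / weights.sum()
        return int(rng.choice(len(probs), p=probs)), probs

    dist = np.array([2.0, 5.0, 12.0])    # distance from the agent to each zone
    attr = np.array([10.0, 40.0, 80.0])  # zone attractiveness for, e.g., shopping
    zone, probs = choose_destination(dist, attr)
    print(zone, np.round(probs, 3))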
Updated: 2025-04-09 22:34:16
Categories: cs.AI
Identifying regions of interest in whole slide images of renal cell carcinoma
Histopathological images contain a huge amount of information, which can make diagnosis an extremely time-consuming and tedious task. In this study, we developed a completely automated system to detect regions of interest (ROIs) in whole slide images (WSI) of renal cell carcinoma (RCC), to reduce analysis time and assist pathologists in making more accurate decisions. The proposed approach is based on an efficient texture descriptor named dominant rotated local binary pattern (DRLBP) and color transformation to reveal and exploit the immense texture variability at the microscopic high-magnification level. Thereby, the DRLBPs retain the structural information and utilize the magnitude values in a local neighborhood for more discriminative power. For the classification of the relevant ROIs, feature extraction of WSI patches was performed on the color channels separately to form the histograms. Next, we used the most frequently occurring patterns as a feature selection step to discard non-informative features. The performances of different classifiers on a set of 1800 kidney cancer patches originating from 12 whole slide images were compared and evaluated. Furthermore, the small size of the image dataset allowed investigating a deep learning approach based on transfer learning for image patch classification, using deep features and fine-tuning methods. High recognition accuracy was obtained, and the classifiers are efficient; the best precision result was 99.17%, achieved with SVM. Moreover, transfer learning models perform well with comparable performance, with the highest precision using ResNet-50 reaching 98.50%. The results of the proposed approach revealed very efficient image classification and demonstrated efficacy in identifying ROIs. This study presents an automatic system to detect regions of interest relevant to the diagnosis of kidney cancer in whole slide histopathology images.
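A rough Python sketch of the texture pipeline using scikit-image's rotation-invariant LBP as an approximation; the paper's DRLBP additionally exploits the dominant rotation and magnitude information, so this only mirrors the per-channel histogram and frequent-pattern selection steps.

    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_histograms(patch_rgb, P=8, R=1, bins=64):
        # Per-color-channel rotation-invariant LBP histograms ('ror' is a
        # simplification of the dominant rotated LBP used in the paper).
        feats = []
        for c in range(3):
            codes = local_binary_pattern(patch_rgb[..., c], P, R, method="ror")
            hist, _ = np.histogram(codes, bins=bins, density=True)
            feats.append(hist)
        return np.concatenate(feats)

    def select_frequent_patterns(H, k=50):
        # Feature selection as described: keep the k most frequently
        # occurring pattern bins, discarding non-informative ones.
        keep = np.argsort(H.mean(axis=0))[-k:]
        return H[:, keep], keep

    patches = np.random.default_rng(0).random((10, 32, 32, 3))  # stand-in WSI patches
    H = np.stack([lbp_histograms(p) for p in patches])
    X, kept = select_frequent_patterns(H)
    print(X.shape)  # feature matrix for a classifier such as an SVM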
Updated: 2025-04-09 22:28:26
Categories: eess.IV,cs.AI,cs.CV
KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models
Recent advances in diffusion models have significantly improved text-to-image (T2I) generation, but they often struggle to balance fine-grained precision with high-level control. Methods like ControlNet and T2I-Adapter excel at following sketches by seasoned artists but tend to be overly rigid, replicating unintentional flaws in sketches from novice users. Meanwhile, coarse-grained methods, such as sketch-based abstraction frameworks, offer more accessible input handling but lack the precise control needed for detailed, professional use. To address these limitations, we propose KnobGen, a dual-pathway framework that democratizes sketch-based image generation by seamlessly adapting to varying levels of sketch complexity and user skill. KnobGen uses a Coarse-Grained Controller (CGC) module for high-level semantics and a Fine-Grained Controller (FGC) module for detailed refinement. The relative strength of these two modules can be adjusted through our knob inference mechanism to align with the user's specific needs. These mechanisms ensure that KnobGen can flexibly generate images from both novice sketches and those drawn by seasoned artists. This maintains control over the final output while preserving the natural appearance of the image, as evidenced on the MultiGen-20M dataset and a newly collected sketch dataset.
Updated: 2025-04-09 22:27:10
Categories: cs.CV,cs.AI
NNN: Next-Generation Neural Networks for Marketing Mix Modeling
We present NNN, a Transformer-based neural network approach to Marketing Mix Modeling (MMM) designed to address key limitations of traditional methods. Unlike conventional MMMs which rely on scalar inputs and parametric decay functions, NNN uses rich embeddings to capture both quantitative and qualitative aspects of marketing and organic channels (e.g., search queries, ad creatives). This, combined with its attention mechanism, enables NNN to model complex interactions, capture long-term effects, and potentially improve sales attribution accuracy. We show that L1 regularization permits the use of such expressive models in typical data-constrained settings. Evaluating NNN on simulated and real-world data demonstrates its efficacy, particularly through considerable improvement in predictive power. Beyond attribution, NNN provides valuable, complementary insights through model probing, such as evaluating keyword or creative effectiveness, enhancing model interpretability.
Updated: 2025-04-09 22:23:07
Categories: cs.LG,stat.AP
Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for the m-Set Semi-Bandit Problems
We consider a common case of the combinatorial semi-bandit problem, the $m$-set semi-bandit, where the learner selects exactly $m$ arms out of $d$ total arms. In the adversarial setting, the best regret bound, known to be $\mathcal{O}(\sqrt{nmd})$ for time horizon $n$, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy, which, however, requires explicitly computing the arm-selection probabilities by solving an optimization problem at each time step and sampling according to them. This problem can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the $m$ arms with the $m$ smallest (estimated) losses after random perturbation. In this paper, we show that FTPL with a Fr\'echet perturbation also enjoys the optimal regret bound $\mathcal{O}(\sqrt{nmd})$ in the adversarial setting and achieves best-of-both-worlds regret bounds, i.e., achieves a logarithmic regret in the stochastic setting.
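A compact Python sketch of the FTPL policy with Fréchet perturbation described above; the loss estimator is simplified to raw observed losses, whereas the paper's analysis uses proper unbiased estimates.

    import numpy as np

    def frechet_noise(size, alpha=2.0, rng=None):
        # Inverse-CDF sampling of Frechet(alpha): F(x) = exp(-x**(-alpha)),
        # so x = (-log U) ** (-1 / alpha) for U ~ Uniform(0, 1).
        rng = rng or np.random.default_rng(0)
        return (-np.log(rng.uniform(size=size))) ** (-1.0 / alpha)

    def ftpl_select(est_cum_loss, m, eta=1.0, rng=None):
        # Pull the m arms whose perturbed cumulative loss estimates are
        # the m smallest -- no per-round convex program, unlike FTRL.
        perturbed = est_cum_loss - eta * frechet_noise(len(est_cum_loss), rng=rng)
        return np.argsort(perturbed)[:m]

    d, m, rng = 10, 3, np.random.default_rng(1)
    est_loss = np.zeros(d)
    for t in range(5):
        arms = ftpl_select(est_loss, m, rng=rng)
        est_loss[arms] += rng.uniform(size=m)  # semi-bandit feedback on chosen arms
        print(t, sorted(arms.tolist()))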
Updated: 2025-04-09 22:07:01
Categories: cs.LG,stat.ML
PAYADOR: A Minimalist Approach to Grounding Language Models on Structured Data for Interactive Storytelling and Role-playing Games
Every time an Interactive Storytelling (IS) system gets a player input, it is facing the world-update problem. Classical approaches to this problem consist in mapping that input to known preprogrammed actions, which can severely constrain the free will of the player. When the expected experience has a strong focus on improvisation, like in Role-playing Games (RPGs), this problem is critical. In this paper we present PAYADOR, a different approach that focuses on predicting the outcomes of the actions instead of representing the actions themselves. To implement this approach, we ground a Large Language Model to a minimal representation of the fictional world, obtaining promising results. We make this contribution open-source, so it can be adapted and used for other related research on unleashing the co-creativity power of RPGs.
Updated: 2025-04-09 21:59:31
Categories: cs.CL,cs.AI
Modeling Response Consistency in Multi-Agent LLM Systems: A Comparative Analysis of Shared and Separate Context Approaches
Large Language Models (LLMs) are increasingly utilized in multi-agent systems (MAS) to enhance collaborative problem-solving and interactive reasoning. Recent advancements have enabled LLMs to function as autonomous agents capable of understanding complex interactions across multiple topics. However, deploying LLMs in MAS introduces challenges related to context management, response consistency, and scalability, especially when agents must operate under memory limitations and handle noisy inputs. While prior research has explored optimizing context sharing and response latency in LLM-driven MAS, these efforts often focus on either fully centralized or decentralized configurations, each with distinct trade-offs. In this paper, we develop a probabilistic framework to analyze the impact of shared versus separate context configurations on response consistency and response times in LLM-based MAS. We introduce the Response Consistency Index (RCI) as a metric to evaluate the effects of context limitations, noise, and inter-agent dependencies on system performance. Our approach differs from existing research by focusing on the interplay between memory constraints and noise management, providing insights into optimizing scalability and response times in environments with interdependent topics. Through this analysis, we offer a comprehensive understanding of how different configurations impact the efficiency of LLM-driven multi-agent systems, thereby guiding the design of more robust architectures.
Updated: 2025-04-09 21:54:21
Categories: cs.MA,cs.AI
A Python toolkit for dealing with Petri nets over ontological graphs
We present theoretical rudiments of Petri nets over ontological graphs as well as the designed and implemented Python toolkit for dealing with such nets. In Petri nets over ontological graphs, the domain knowledge is enclosed in a form of ontologies. In this way, some valuable knowledge (especially in terms of semantic relations) can be added to model reasoning and control processes by means of Petri nets. In the implemented approach, ontological graphs are obtained from ontologies built in accordance with the OWL 2 Web Ontology Language. The implemented tool enables the users to define the structure and dynamics of Petri nets over ontological graphs.
Updated: 2025-04-09 21:52:17
Categories: cs.AI
Meta-RTL: Reinforcement-Based Meta-Transfer Learning for Low-Resource Commonsense Reasoning
Meta learning has been widely used to exploit rich-resource source tasks to improve the performance of low-resource target tasks. Unfortunately, most existing meta learning approaches treat different source tasks equally, ignoring the relatedness of source tasks to the target task in knowledge transfer. To mitigate this issue, we propose a reinforcement-based multi-source meta-transfer learning framework (Meta-RTL) for low-resource commonsense reasoning. In this framework, we present a reinforcement-based approach to dynamically estimating source task weights that measure the contribution of the corresponding tasks to the target task in the meta-transfer learning. The differences between the general loss of the meta model and task-specific losses of source-specific temporal meta models on sampled target data are fed into the policy network of the reinforcement learning module as rewards. The policy network is built upon LSTMs that capture long-term dependencies on source task weight estimation across meta learning iterations. We evaluate the proposed Meta-RTL using both BERT and ALBERT as the backbone of the meta model on three commonsense reasoning benchmark datasets. Experimental results demonstrate that Meta-RTL substantially outperforms strong baselines and previous task selection strategies and achieves larger improvements on extremely low-resource settings.
Updated: 2025-04-09 21:49:23
Categories: cs.CL,cs.AI
ChatEMG: Synthetic Data Generation to Control a Robotic Hand Orthosis for Stroke
Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train intent classifiers; however, this data collection process is burdensome and time-consuming. In this paper, we propose ChatEMG, an autoregressive generative model that can generate synthetic EMG signals conditioned on prompts (i.e., a given sequence of EMG signals). ChatEMG enables us to collect only a small dataset from the new condition, session, or subject and expand it with synthetic samples conditioned on prompts from this new context. ChatEMG leverages a vast repository of previous data via generative training while still remaining context-specific via prompting. Our experiments show that these synthetic samples are classifier-agnostic and can improve intent inferral accuracy for different types of classifiers. We demonstrate that our complete approach can be integrated into a single patient session, including the use of the classifier for functional orthosis-assisted tasks. To the best of our knowledge, this is the first time an intent classifier trained partially on synthetic data has been deployed for functional control of an orthosis by a stroke survivor. Videos, source code, and additional information can be found at https://jxu.ai/chatemg.
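A toy Python sketch of prompt-conditioned autoregressive generation in the spirit of the abstract; `toy_model` is a hypothetical placeholder for the trained network that simply favors values near the last quantized EMG level.

    import numpy as np

    def generate(next_token_probs, prompt, length, rng=None):
        # Autoregressive sampling: every synthetic sample is drawn
        # conditioned on the prompt plus everything generated so far.
        rng = rng or np.random.default_rng(0)
        seq = list(prompt)
        for _ in range(length):
            probs = next_token_probs(seq)
            seq.append(int(rng.choice(len(probs), p=probs)))
        return seq

    def toy_model(seq, vocab=8):
        # Hypothetical stand-in favoring smooth continuations of the signal.
        logits = -np.abs(np.arange(vocab) - seq[-1]).astype(float)
        p = np.exp(logits)
        return p / p.sum()

    prompt = [2, 3, 3, 4]  # a short quantized EMG snippet from the new session
    print(generate(toy_model, prompt, length=12))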
Updated: 2025-04-09 21:49:04
Categories: cs.RO,cs.AI,cs.LG
UQ of 2D Slab Burner DNS: Surrogates, Uncertainty Propagation, and Parameter Calibration
The goal of this paper is to demonstrate and address challenges related to all aspects of performing a complete uncertainty quantification analysis of a complicated physics-based simulation like a 2D slab burner direct numerical simulation (DNS). The UQ framework includes the development of data-driven surrogate models, propagation of parametric uncertainties to the fuel regression rate--the primary quantity of interest--and Bayesian calibration of the latent heat of sublimation and a chemical reaction temperature exponent using experimental data. Two surrogate models, a Gaussian Process (GP) and a Hierarchical Multiscale Surrogate (HMS), were constructed using an ensemble of 64 simulations generated via Latin Hypercube sampling. HMS is superior for prediction, as demonstrated by cross-validation, and able to achieve an error < 15% when predicting multiscale boundary quantities just from a few far-field inputs. Subsequent Bayesian calibration of chemical kinetics and fuel response parameters against experimental observations showed that the default values used in the DNS should be higher to better match measurements. This study highlights the importance of surrogate model selection and parameter calibration in quantifying uncertainty in predictions of fuel regression rates in complex combustion systems.
Updated: 2025-04-09 21:42:47
Categories: physics.comp-ph,cs.LG
Physics-tailored machine learning reveals unexpected physics in dusty plasmas
Dusty plasma is a mixture of ions, electrons, and macroscopic charged particles that is commonly found in space and planetary environments. The particles interact through Coulomb forces mediated by the surrounding plasma, and as a result, the effective forces between particles can be non-conservative and non-reciprocal. Machine learning (ML) models are a promising route to learn these complex forces, yet their structure should match the underlying physical constraints to provide useful insight. Here we demonstrate and experimentally validate an ML approach that incorporates physical intuition to infer force laws in a laboratory dusty plasma. Trained on 3D particle trajectories, the model accounts for inherent symmetries, non-identical particles, and learns the effective non-reciprocal forces between particles with exquisite accuracy (R^2>0.99). We validate the model by inferring particle masses in two independent yet consistent ways. The model's accuracy enables precise measurements of particle charge and screening length, discovering large deviations from common theoretical assumptions. Our ability to identify new physics from experimental data demonstrates how ML-powered approaches can guide new routes of scientific discovery in many-body systems. Furthermore, we anticipate our ML approach to be a starting point for inferring laws from dynamics in a wide range of many-body systems, from colloids to living organisms.
Updated: 2025-04-09 21:41:47
Categories: physics.plasm-ph,cs.LG,physics.comp-ph
Data Fusion of Deep Learned Molecular Embeddings for Property Prediction
Data-driven approaches such as deep learning can result in predictive models for material properties with exceptional accuracy and efficiency. However, in many problems data is sparse, severely limiting their accuracy and applicability. To improve predictions, techniques such as transfer learning and multi-task learning have been used. The performance of multi-task learning models depends on the strength of the underlying correlations between tasks and the completeness of the dataset. We find that standard multi-task models tend to underperform when trained on sparse datasets with weakly correlated properties. To address this gap, we use data fusion techniques to combine the learned molecular embeddings of various single-task models and trained a multi-task model on this combined embedding. We apply this technique to a widely used benchmark dataset of quantum chemistry data for small molecules as well as a newly compiled sparse dataset of experimental data collected from literature and our own quantum chemistry and thermochemical calculations. The results show that the fused, multi-task models outperform standard multi-task models for sparse datasets and can provide enhanced prediction on data-limited properties compared to single-task models.
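A small Python sketch of the fusion step under the simplest possible assumption: concatenating the learned single-task embeddings before fitting a multi-task regressor. The paper's fusion and model may be more elaborate.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    n = 200
    emb_a = rng.normal(size=(n, 16))  # molecular embedding from single-task model A
    emb_b = rng.normal(size=(n, 16))  # molecular embedding from single-task model B
    fused = np.concatenate([emb_a, emb_b], axis=1)  # fusion by concatenation

    # Two synthetic, correlated properties standing in for sparse targets.
    Y = np.stack([emb_a[:, 0] + 0.1 * rng.normal(size=n),
                  emb_a[:, 0] + emb_b[:, 0]], axis=1)

    multi_task = Ridge(alpha=1.0).fit(fused, Y)  # Ridge handles multiple outputs
    print(round(multi_task.score(fused, Y), 3))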
Updated: 2025-04-09 21:40:15
Categories: cs.LG,cond-mat.mtrl-sci
Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs
Partial differential equations (PDEs) are widely used to model complex physical systems, but solving them efficiently remains a significant challenge. Recently, Transformers have emerged as the preferred architecture for PDEs due to their ability to capture intricate dependencies. However, they struggle with representing continuous dynamics and long-range interactions. To overcome these limitations, we introduce the Mamba Neural Operator (MNO), a novel framework that enhances neural operator-based techniques for solving PDEs. MNO establishes a formal theoretical connection between structured state-space models (SSMs) and neural operators, offering a unified structure that can adapt to diverse architectures, including Transformer-based models. By leveraging the structured design of SSMs, MNO captures long-range dependencies and continuous dynamics more effectively than traditional Transformers. Through extensive analysis, we show that MNO significantly boosts the expressive power and accuracy of neural operators, making it not just a complement but a superior framework for PDE-related tasks, bridging the gap between efficient representation and accurate solution approximation.
Updated: 2025-04-09 21:36:19
Categories: cs.LG,cs.NA,math.NA
Optimal Bounds for Adversarial Constrained Online Convex Optimization
Constrained Online Convex Optimization (COCO) can be seen as a generalization of the standard Online Convex Optimization (OCO) framework. At each round, a cost function and constraint function are revealed after a learner chooses an action. The goal is to minimize both the regret and cumulative constraint violation (CCV) against an adaptive adversary. We show for the first time that it is possible to obtain the optimal $O(\sqrt{T})$ bound on both regret and CCV, improving the best known bounds of $O \left( \sqrt{T} \right)$ and $\tilde{O} \left( \sqrt{T} \right)$ for the regret and CCV, respectively. Based on a new surrogate loss function enforcing a minimum penalty on the constraint function, we demonstrate that both Follow-the-Regularized-Leader and Online Gradient Descent achieve the optimal bounds.
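A minimal Python sketch of online gradient descent on a hinge-style surrogate that charges a minimum penalty for constraint violation, in the spirit of the surrogate loss mentioned above; the exact surrogate, step sizes, and projection in the paper differ.

    import numpy as np

    def ogd_coco(x0, cost_grad, g, g_grad, lam=2.0, eta=0.1, T=200):
        # Gradient step on f_t(x) + lam * max(0, g_t(x)); the penalty
        # term is active only when the revealed constraint is violated.
        x, ccv = x0.copy(), 0.0
        for t in range(T):
            grad = cost_grad(x, t)
            violation = g(x, t)
            if violation > 0:
                grad = grad + lam * g_grad(x, t)
                ccv += violation                    # cumulative constraint violation
            x = np.clip(x - eta * grad, -1.0, 1.0)  # projection onto the domain
        return x, ccv

    cost_grad = lambda x, t: 2 * (x - 0.5)          # f_t(x) = ||x - 0.5||^2
    g = lambda x, t: float(x.sum() - 1.0)           # g_t(x) = sum(x) - 1 <= 0
    g_grad = lambda x, t: np.ones_like(x)
    x, ccv = ogd_coco(np.zeros(3), cost_grad, g, g_grad)
    print(np.round(x, 3), round(ccv, 3))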
Updated: 2025-04-09 21:32:42
Categories: cs.LG,cs.DS,math.OC,stat.ML
ALFA-Chains: AI-Supported Discovery of Privilege Escalation and Remote Exploit Chains
We present ALFA-Chains, a novel method capable of discovering chains of known Privilege Escalation (PE) and Remote exploits in a network. It can assist in penetration-testing without being tied to any specific penetration-testing framework. We test ALFA-Chains' ability to find exploit chains in networks ranging from 3 to 200 hosts. It can discover a chain in a 20 host network in as little as 0.01 seconds. More importantly, it is able to discover 12 novel exploit chains in a realistic firewalled network. We demonstrate the execution of one of these chains, proving ALFA-Chains' capability to improve penetration-testing.
Updated: 2025-04-09 21:27:54
Categories: cs.CR
A Scalable Approach to Clustering Embedding Projections
Interactive visualization of embedding projections is a useful technique for understanding data and evaluating machine learning models. Labeling data within these visualizations is critical for interpretation, as labels provide an overview of the projection and guide user navigation. However, most methods for producing labels require clustering the points, which can be computationally expensive as the number of points grows. In this paper, we describe an efficient clustering approach using kernel density estimation in the projected 2D space instead of points. This algorithm can produce high-quality cluster regions from a 2D density map in a few hundred milliseconds, orders of magnitude faster than current approaches. We contribute the design of the algorithm, benchmarks, and applications that demonstrate the utility of the algorithm, including labeling and summarization.
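A short Python sketch of the density-based idea: rasterize the 2D projection, smooth it into a density map, threshold, and take connected components as cluster regions, so the cost depends on the grid rather than the point count. The grid size, bandwidth, and threshold are illustrative choices.

    import numpy as np
    from scipy.ndimage import gaussian_filter, label

    def density_cluster_regions(xy, grid=256, sigma=2.0, quantile=0.90):
        # Cost is dominated by the fixed-size grid, not the number of points.
        hist, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=grid)
        density = gaussian_filter(hist, sigma=sigma)  # kernel density estimate
        mask = density > np.quantile(density[density > 0], quantile)
        regions, n_regions = label(mask)              # connected components
        return regions, n_regions

    rng = np.random.default_rng(0)
    pts = np.concatenate([rng.normal(size=(5000, 2)),
                          rng.normal(size=(5000, 2)) + 6.0])
    _, n = density_cluster_regions(pts)
    print(n, "cluster regions")  # label these to summarize the projection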
Updated: 2025-04-09 21:24:17
Categories: cs.HC,cs.LG
Conthereum: Concurrent Ethereum Optimized Transaction Scheduling for Multi-Core Execution
Blockchain technology has revolutionized decentralized computation, providing high security through transparent cryptographic protocols and immutable data. However, the Blockchain Trilemma - an inherent trade-off between security, scalability, and performance - limits computational efficiency, resulting in low transactions-per-second (TPS) compared to conventional systems like Visa or PayPal. To address this, we introduce Conthereum, a novel concurrent blockchain solution that enhances multi-core usage in transaction processing through a deterministic scheduling scheme. It reformulates smart contract execution as a variant of the Flexible Job Shop Scheduling Problem (FJSS), optimizing both time and power consumption. Conthereum offers the most efficient open-source implementation compared to existing solutions. Empirical evaluations based on Ethereum, the most widely used blockchain platform, show near-linear throughput increases with available computational power. Additionally, an integrated energy consumption model allows participants to optimize power usage by intelligently distributing workloads across cores. This solution boosts both network TPS and energy efficiency, offering a scalable and sustainable framework for blockchain transaction processing. The proposed approach also opens new avenues for further optimizations in Ethereum and is adaptable for broader applications in other blockchain infrastructures.
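A simplified Python sketch of the scheduling idea: transactions that touch the same storage keys are grouped (they must run sequentially), and groups are greedily packed onto cores. This is a generic list-scheduling heuristic for illustration, not the paper's deterministic FJSS solver.

    def conflict_groups(txs):
        # Union-find over transactions sharing a storage key; each group
        # must execute sequentially to preserve correctness.
        parent = list(range(len(txs)))
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        owner = {}
        for i, (_, keys) in enumerate(txs):
            for k in keys:
                if k in owner:
                    parent[find(i)] = find(owner[k])
                owner[k] = i
        groups = {}
        for i in range(len(txs)):
            groups.setdefault(find(i), []).append(i)
        return list(groups.values())

    def schedule(txs, n_cores):
        # Longest-processing-time-first packing of conflict groups onto
        # the least-loaded core; returns the plan and the makespan.
        load = [0.0] * n_cores
        plan = [[] for _ in range(n_cores)]
        for grp in sorted(conflict_groups(txs),
                          key=lambda g: -sum(txs[i][0] for i in g)):
            c = min(range(n_cores), key=lambda i: load[i])
            plan[c].extend(grp)
            load[c] += sum(txs[i][0] for i in grp)
        return plan, max(load)

    txs = [(1.0, {"A"}), (2.0, {"B"}), (1.5, {"A", "C"}), (0.5, {"D"})]
    print(schedule(txs, n_cores=2))  # (estimated cost, touched storage keys)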
Updated: 2025-04-09 21:15:05
Categories: cs.CR,cs.DC
A Multi-Phase Analysis of Blood Culture Stewardship: Machine Learning Prediction, Expert Recommendation Assessment, and LLM Automation
Blood cultures are often over-ordered without clear justification, straining healthcare resources and contributing to inappropriate antibiotic use, pressures worsened by the global shortage. In a study of 135,483 emergency department (ED) blood culture orders, we developed machine learning (ML) models to predict the risk of bacteremia using structured electronic health record (EHR) data and provider notes via a large language model (LLM). The structured models' AUC improved from 0.76 to 0.79 with note embeddings and reached 0.81 with added diagnosis codes. Compared to an expert recommendation framework applied by human reviewers and an LLM-based pipeline, our ML approach offered higher specificity without compromising sensitivity. The recommendation framework achieved 86% sensitivity and 57% specificity, while the LLM maintained high sensitivity (96%) but over-classified negatives, reducing specificity (16%). These findings demonstrate that ML models integrating structured and unstructured data can outperform consensus recommendations, enhancing diagnostic stewardship beyond existing standards of care.
Updated: 2025-04-09 21:12:29
Areas: cs.LG,cs.AI
No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers
We consider the sampling problem, where the aim is to draw samples from a distribution whose density is known only up to a normalization constant. Recent breakthroughs in generative modeling to approximate a high-dimensional data distribution have sparked significant interest in developing neural network-based methods for this challenging problem. However, neural samplers typically incur heavy computational overhead due to simulating trajectories during training. This motivates the pursuit of simulation-free training procedures for neural samplers. In this work, we propose an elegant modification to previous methods, which allows simulation-free training with the help of a time-dependent normalizing flow. However, it ultimately suffers from severe mode collapse. On closer inspection, we find that nearly all successful neural samplers rely on Langevin preconditioning to avoid mode collapse. We systematically analyze several popular methods with various objective functions and demonstrate that, in the absence of Langevin preconditioning, most of them fail to adequately cover even a simple target. Finally, we draw attention to a strong baseline by combining the state-of-the-art MCMC method, Parallel Tempering (PT), with an additional generative model to shed light on future explorations of neural samplers.
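For readers unfamiliar with the preconditioning step the paper highlights, this is a minimal unadjusted Langevin sampler on a toy unnormalized density (illustrative only, not the paper's method):

```python
# Minimal unadjusted Langevin sampler on a toy unnormalized log-density,
# the kind of preconditioning most neural samplers are found to depend on
# to avoid mode collapse.
import numpy as np

def log_density(x):
    # unnormalized mixture of two Gaussians at -2 and +2
    return np.logaddexp(-0.5 * (x - 2) ** 2, -0.5 * (x + 2) ** 2)

def score(x, eps=1e-5):
    # numerical gradient of the log-density; fine for a 1D sketch
    return (log_density(x + eps) - log_density(x - eps)) / (2 * eps)

rng = np.random.default_rng(0)
x = rng.normal(size=5000)                 # particle ensemble
step = 0.05
for _ in range(2000):
    x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)

# both modes should hold roughly half the mass
print(np.mean(x > 0), np.mean(x < 0))
```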
Updated: 2025-04-09 21:05:18
Areas: cs.LG,stat.ML
Evaluating Parameter-Based Training Performance of Neural Networks and Variational Quantum Circuits
In recent years, neural networks (NNs) have driven significant advances in machine learning. However, as tasks grow more complex, NNs often require large numbers of trainable parameters, which increases computational and energy demands. Variational quantum circuits (VQCs) offer a promising alternative: they leverage quantum mechanics to capture intricate relationships and typically need fewer parameters. In this work, we evaluate NNs and VQCs on simple supervised and reinforcement learning tasks, examining models with different parameter sizes. We simulate VQCs and execute selected parts of the training process on real quantum hardware to approximate actual training times. Our results show that VQCs can match NNs in performance while using significantly fewer parameters, despite longer training durations. As quantum technology and algorithms advance, and VQC architectures improve, we posit that VQCs could become advantageous for certain machine learning tasks.
Updated: 2025-04-09 21:00:41
Areas: quant-ph,cs.AI,cs.LG
Tensor Product Attention Is All You Need
Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference. In this paper, we propose Tensor Product Attention (TPA), a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, significantly shrinking KV cache size at inference time. By factorizing these representations into contextual low-rank components (contextual factorization) and seamlessly integrating with RoPE, TPA achieves improved model quality alongside memory efficiency. Based on TPA, we introduce the Tensor ProducT ATTenTion Transformer (T6), a new model architecture for sequence modeling. Through extensive empirical evaluation of language modeling tasks, we demonstrate that T6 exceeds the performance of standard Transformer baselines including MHA, MQA, GQA, and MLA across various metrics, including perplexity and a range of renowned evaluation benchmarks. Notably, TPA's memory efficiency enables the processing of significantly longer sequences under fixed resource constraints, addressing a critical scalability challenge in modern language models. The code is available at https://github.com/tensorgi/T6.
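A toy sketch in the spirit of TPA (a simplification, not the released T6 code; the rank and shapes are assumptions): caching low-rank factors instead of full key vectors shrinks the KV cache, and attention logits can be computed without materializing the full key matrix.

```python
# Cache per-token rank-r factors A_k and a shared basis B_k instead of
# full d-dimensional keys; contract against the query directly.
import numpy as np

T, d, r = 1024, 512, 16                  # tokens, head dim, rank (assumed)
rng = np.random.default_rng(0)
A_k = rng.normal(size=(T, r))            # cached contextual factors
B_k = rng.normal(size=(r, d))            # shared low-rank basis
q = rng.normal(size=(d,))

# (A_k @ B_k) @ q == A_k @ (B_k @ q): O(T*r + r*d) work instead of O(T*d)
scores = A_k @ (B_k @ q)
print(scores.shape)                      # (1024,) attention logits

full_cache, factored_cache = T * d, T * r + r * d
print(f"cache entries: {full_cache} -> {factored_cache} "
      f"({factored_cache / full_cache:.1%})")
```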
Updated: 2025-04-09 20:51:08
Areas: cs.CL,cs.AI,cs.LG
ECDSA Cracking Methods
The ECDSA (Elliptic Curve Digital Signature Algorithm) is used for digital signatures in many blockchain networks, including Bitcoin and Ethereum. While it offers good performance and strong current security, it must be handled with care. This care typically relates to the usage of the nonce value used to create each signature. This paper outlines the methods that can be used to break ECDSA signatures, including revealed nonces, weak nonce choice, nonce reuse, two keys with shared nonces, and fault attacks.
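The nonce-reuse attack mentioned above admits a compact demonstration with textbook modular arithmetic. This sketch fakes r (normally the x-coordinate of the curve point k*G) since only the algebra matters here; the key and nonce values are toy placeholders.

```python
# Classic nonce-reuse break: two signatures sharing the same nonce k leak
# the private key d using modular arithmetic alone.
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # secp256k1 order
d, k, r = 0xC0FFEE, 0x1234567, 0xABCDEF            # toy private key, nonce, r
h1, h2 = 0x11111111, 0x22222222                    # two message hashes

s1 = pow(k, -1, n) * (h1 + r * d) % n              # sign message 1
s2 = pow(k, -1, n) * (h2 + r * d) % n              # sign message 2 with the SAME k

k_rec = (h1 - h2) * pow(s1 - s2, -1, n) % n        # recover the nonce
d_rec = (s1 * k_rec - h1) * pow(r, -1, n) % n      # then the private key
assert (k_rec, d_rec) == (k, d)
print(hex(d_rec))
```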
Updated: 2025-04-09 20:43:27
Areas: cs.CR
Adapting to Online Distribution Shifts in Deep Learning: A Black-Box Approach
We study the well-motivated problem of online distribution shift in which the data arrive in batches and the distribution of each batch can change arbitrarily over time. Since the shifts can be large or small, abrupt or gradual, the length of the relevant historical data to learn from may vary over time, which poses a major challenge in designing algorithms that can automatically adapt to the best "attention span" while remaining computationally efficient. We propose a meta-algorithm that takes any network architecture and any Online Learner (OL) algorithm as input and produces a new algorithm which provably enhances the performance of the given OL under non-stationarity. Our algorithm is efficient (it requires maintaining only O(log T) OL instances) and adaptive (it automatically chooses OL instances with the ideal "attention" length at every timestamp). Experiments on various real-world datasets across text and image modalities show that our method consistently improves the accuracy of user specified OL algorithms for classification tasks. Key novel algorithmic ingredients include a multi-resolution instance design inspired by wavelet theory and a cross-validation-through-time technique. Both could be of independent interest.
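A simplified stand-in for the multi-resolution idea (not the paper's meta-algorithm; running means replace the learners for brevity): keep O(log T) instances over geometric window lengths and predict with the span whose recent loss is lowest.

```python
# O(log T) running-mean "learners" over geometric window lengths; at each
# step, predict with the attention span whose recent squared error is smallest.
from collections import deque

windows = [2 ** i for i in range(8)]            # candidate attention spans 1..128
buffers = [deque(maxlen=w) for w in windows]
errors = [deque(maxlen=32) for _ in windows]    # recent squared error per span

def step(y):
    preds = [sum(b) / len(b) if b else 0.0 for b in buffers]
    for p, e in zip(preds, errors):
        e.append((p - y) ** 2)
    best = min(range(len(windows)), key=lambda i: sum(errors[i]) / len(errors[i]))
    for b in buffers:
        b.append(y)
    return preds[best], windows[best]

stream = [0.0] * 50 + [5.0] * 50                # abrupt distribution shift
for y in stream:
    pred, span = step(y)
print("chosen span after the shift:", span)     # short spans adapt fastest
```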
Updated: 2025-04-09 20:34:24
Areas: cs.LG
Demystifying amortized causal discovery with transformers
Supervised learning approaches for causal discovery from observational data often achieve competitive performance despite seemingly avoiding explicit assumptions that traditional methods make for identifiability. In this work, we investigate CSIvA (Ke et al., 2023), a transformer-based model promising to train on synthetic data and transfer to real data. First, we bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations. Consistent with classical approaches, good performance is achieved when we have a good prior on the test data, and the underlying model is identifiable. At the same time, we find new trade-offs. Training on datasets generated from different classes of causal models, unambiguously identifiable in isolation, improves the test generalization. Performance is still guaranteed, as the ambiguous cases resulting from the mixture of identifiable causal models are unlikely to occur (which we formally prove). Overall, our study finds that amortized causal discovery still needs to obey identifiability theory, but it also differs from classical methods in how the assumptions are formulated, trading more reliance on assumptions on the noise type for fewer hypotheses on the mechanisms.
Updated: 2025-04-09 20:30:46
Areas: cs.LG,stat.ML
Better Decisions through the Right Causal World Model
Reinforcement learning (RL) agents have shown remarkable performances in various environments, where they can discover effective policies directly from sensory inputs. However, these agents often exploit spurious correlations in the training data, resulting in brittle behaviours that fail to generalize to new or slightly modified environments. To address this, we introduce the Causal Object-centric Model Extraction Tool (COMET), a novel algorithm designed to learn the exact interpretable causal world models (CWMs). COMET first extracts object-centric state descriptions from observations and identifies the environment's internal states related to the depicted objects' properties. Using symbolic regression, it models object-centric transitions and derives causal relationships governing object dynamics. COMET further incorporates large language models (LLMs) for semantic inference, annotating causal variables to enhance interpretability. By leveraging these capabilities, COMET constructs CWMs that align with the true causal structure of the environment, enabling agents to focus on task-relevant features. The extracted CWMs mitigate the danger of shortcuts, permitting the development of RL systems capable of better planning and decision-making across dynamic scenarios. Our results, validated in Atari environments such as Pong and Freeway, demonstrate the accuracy and robustness of COMET, highlighting its potential to bridge the gap between object-centric reasoning and causal inference in reinforcement learning.
Updated: 2025-04-09 20:29:13
Areas: cs.AI,cs.LG
CiteBART: Learning to Generate Citations for Local Citation Recommendation
Local citation recommendation (LCR) suggests a set of papers for a citation placeholder within a given context. The task has evolved as generative approaches have become more promising than the traditional pre-fetch and re-rank-based state-of-the-art approaches. This paper introduces citation-specific pre-training within an encoder-decoder architecture, where author-date citation tokens are masked to learn to reconstruct them to fulfill LCR. There are two variants for this pre-training. In the local context-only base scheme (CiteBART-Base), the citation token in a local context is masked to learn to predict the citation. The global version (CiteBART-Global) extends the local context with the citing paper's title and abstract to enrich the learning signal. CiteBART-Global achieves state-of-the-art performance on LCR benchmarks except for the FullTextPeerRead dataset, which is too small to show the advantage of generative pre-training. The effect is significant in the larger benchmarks, e.g., Refseer and ArXiv, with the Refseer benchmark-trained model emerging as the best-performing model. We perform comprehensive experiments, including an ablation study, a qualitative analysis, and a taxonomy of hallucinations with detailed statistics. Our analyses confirm that CiteBART-Global has a cross-dataset generalization capability; the macro hallucination rate (MaHR) at the top-3 predictions is 4%, and when the ground-truth is in the top-k prediction list, the hallucination tendency in the other predictions drops significantly.
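As an illustration of the masking objective (a toy regex and mask token, not the released CiteBART preprocessing), an author-date citation in a local context can be replaced with a mask and used as the generation target:

```python
# Replace the author-date citation token with a mask; an encoder-decoder
# then learns to reconstruct it (CiteBART-Base uses only this local context).
import re

context = ("Transformer-based rerankers improve retrieval quality "
           "(Nogueira and Cho, 2019), motivating generative recommendation.")
CITATION = re.compile(r"\([A-Z][^()]*,\s*\d{4}\)")

masked = CITATION.sub("<cite_mask>", context)
target = CITATION.search(context).group(0)
print(masked)   # encoder input
print(target)   # decoder generation target
```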
Updated: 2025-04-09 20:23:16
Areas: cs.IR,cs.AI,cs.CL
Neural Approximate Mirror Maps for Constrained Diffusion Models
Diffusion models excel at creating visually-convincing images, but they often struggle to meet subtle constraints inherent in the training data. Such constraints could be physics-based (e.g., satisfying a PDE), geometric (e.g., respecting symmetry), or semantic (e.g., including a particular number of objects). When the training data all satisfy a certain constraint, enforcing this constraint on a diffusion model makes it more reliable for generating valid synthetic data and solving constrained inverse problems. However, existing methods for constrained diffusion models are restricted in the constraints they can handle. For instance, recent work proposed to learn mirror diffusion models (MDMs), but analytical mirror maps only exist for convex constraints and can be challenging to derive. We propose neural approximate mirror maps (NAMMs) for general, possibly non-convex constraints. Our approach only requires a differentiable distance function from the constraint set. We learn an approximate mirror map that transforms data into an unconstrained space and a corresponding approximate inverse that maps data back to the constraint set. A generative model, such as an MDM, can then be trained in the learned mirror space and its samples restored to the constraint set by the inverse map. We validate our approach on a variety of constraints, showing that compared to an unconstrained diffusion model, a NAMM-based MDM substantially improves constraint satisfaction. We also demonstrate how existing diffusion-based inverse-problem solvers can be easily applied in the learned mirror space to solve constrained inverse problems.
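A schematic, heavily simplified rendering of the described objective (the losses and architectures below are assumptions, not the authors' training code): learn a mirror map f and an approximate inverse g with an inverse-consistency term plus the differentiable constraint distance.

```python
# NAMM-style sketch: g(f(x)) ~= x on constraint-satisfying data, and g
# pushes arbitrary mirror-space points onto the constraint set (unit ball).
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))  # mirror map
g = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))  # approx inverse

def constraint_distance(x):
    # differentiable distance to the unit ball, the only constraint input needed
    return torch.relu(x.norm(dim=-1) - 1.0)

opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
for _ in range(500):
    x = torch.randn(256, 2)
    x = x / (1.0 + x.norm(dim=-1, keepdim=True))      # data inside the ball
    z = torch.randn(256, 2) * 3.0                      # arbitrary mirror points
    loss = ((g(f(x)) - x) ** 2).mean() + constraint_distance(g(z)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(constraint_distance(g(torch.randn(512, 2))).mean().item())  # ~0 if trained
```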
Updated: 2025-04-09 20:08:55
Areas: cs.LG,cs.CV,eess.IV
CORTEX-AVD: A Framework for CORner Case Testing and EXploration in Autonomous Vehicle Development
Autonomous Vehicles (AVs) aim to improve traffic safety and efficiency by reducing human error. However, ensuring AVs' reliability and safety is a challenging task when rare, high-risk traffic scenarios are considered. These 'Corner Case' (CC) scenarios, such as unexpected vehicle maneuvers or sudden pedestrian crossings, must be handled safely and reliably by AVs during operation, but they are hard to generate efficiently. Traditional CC generation relies on costly and risky real-world data acquisition, limiting scalability and slowing research and development progress. Simulation-based techniques also face challenges, as modeling diverse scenarios and capturing all possible CCs is complex and time-consuming. To address these limitations in CC generation, this research introduces CORTEX-AVD, CORner Case Testing & EXploration for Autonomous Vehicles Development, an open-source framework that integrates the CARLA Simulator and Scenic to automatically generate CCs from textual descriptions, increasing the diversity and automation of scenario modeling. Genetic Algorithms (GA) are used to optimize the scenario parameters in six case study scenarios, increasing the occurrence of high-risk events. Unlike previous methods, CORTEX-AVD incorporates a multi-factor fitness function that considers variables such as distance, time, speed, and collision likelihood. Additionally, the study provides a benchmark for comparing GA-based CC generation methods, contributing to a more standardized evaluation of synthetic data generation and scenario assessment. Experimental results demonstrate that the CORTEX-AVD framework significantly increases CC incidence while reducing the proportion of wasted simulations.
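A hypothetical multi-factor fitness in the spirit of the description (the weights and thresholds are illustrative, not taken from the paper) might combine distance, time-to-collision, speed, and collision outcome:

```python
# Reward scenarios that bring actors close, fast, and near-collision, so a
# GA drives scenario parameters toward corner cases.
def fitness(min_distance_m, time_to_collision_s, closing_speed_ms, collided):
    score = 10.0 if collided else 0.0
    score += max(0.0, 5.0 - min_distance_m)          # closer encounters score higher
    score += max(0.0, 3.0 - time_to_collision_s)     # smaller TTC scores higher
    score += 0.1 * closing_speed_ms                  # faster approaches score higher
    return score

# A GA would rank candidate scenario parameter vectors by this score:
candidates = [(4.8, 2.9, 6.0, False), (0.7, 0.4, 12.0, True)]
print(sorted(candidates, key=lambda c: fitness(*c), reverse=True)[0])
```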
Updated: 2025-04-09 20:04:21
Areas: cs.RO,cs.AI
Scalable Reinforcement Post-Training Beyond Static Human Prompts: Evolving Alignment via Asymmetric Self-Play
Current reinforcement learning (RL) frameworks for large language models (LLM) post-training typically assume a fixed prompt distribution, which is sub-optimal and bottlenecks scalability. Prior works have explored prompt evolving, but are often limited to the supervised fine-tuning stage, and prompts are sampled and evolved uniformly without signals. This empirical work presents a paradigm shift: Evolving Alignment via Asymmetric Self-Play (eva), that casts post-training as an infinite game with regret-based signals for two players: (i) a creator, who strategically samples and creates new informative prompts and (ii) a solver, who learns to produce preferred responses. eva is the first method that allows language models to adaptively create training prompts in both offline and online RL post-training. The design is simple, easy-to-use yet remarkably effective: eva sets a new SOTA on challenging benchmarks, without any extra human prompts, e.g., it boosts the win-rate of gemma-2-9b-it on Arena-Hard from 51.6% to 60.1% for DPO and from 52.6% to 62.4% for RLOO, surpassing claude-3-opus and catching up to gemini-1.5-pro, both of which are orders of magnitude larger. Extensive experiments show eva can create effective RL curricula and is robust across ablations. We believe adaptively evolving prompts are key to designing the next-generation RL post-training scheme.
Updated: 2025-04-09 19:53:54
Areas: cs.CL,cs.AI,physics.data-an,stat.ML
Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge
Background: Identification of the interactions and regulatory relations between biomolecules plays a pivotal role in understanding complex biological systems and the mechanisms underlying diverse biological functions. However, the collection of such molecular interactions has heavily relied on expert curation in the past, making it labor-intensive and time-consuming. To mitigate these challenges, we propose leveraging the capabilities of large language models (LLMs) to automate genome-scale extraction of this crucial knowledge. Results: In this study, we investigate the efficacy of various LLMs in addressing biological tasks, such as the recognition of protein interactions, identification of genes linked to pathways affected by low-dose radiation, and the delineation of gene regulatory relationships. Overall, the larger models exhibited superior performance, indicating their potential for specific tasks that involve the extraction of complex interactions among genes and proteins. Although these models possessed detailed information for distinct gene and protein groups, they faced challenges in identifying groups with diverse functions and in recognizing highly correlated gene regulatory relationships. Conclusions: By conducting a comprehensive assessment of the state-of-the-art models using well-established molecular interaction and pathway databases, our study reveals that LLMs can identify genes/proteins associated with pathways of interest and predict their interactions to a certain extent. Furthermore, these models can provide important insights, marking a noteworthy stride toward advancing our understanding of biological systems through AI-assisted knowledge discovery.
Updated: 2025-04-09 19:41:35
Areas: cs.CL,cs.LG
Resource-efficient Inference with Foundation Model Programs
The inference-time resource costs of large language and vision models present a growing challenge in production deployments. We propose the use of foundation model programs, i.e., programs that can invoke foundation models with varying resource costs and performance, as an approach to this problem. Specifically, we present a method that translates a task into a program, then learns a policy for resource allocation that, on each input, selects foundation model "backends" for each program module. The policy uses smaller, cheaper backends to handle simpler subtasks, while allowing more complex subtasks to leverage larger, more capable models. We evaluate the method on two new "streaming" visual question-answering tasks in which a system answers a question on a sequence of inputs, receiving ground-truth feedback after each answer. Compared to monolithic multi-modal models, our implementation achieves up to 98% resource savings with minimal accuracy loss, demonstrating its potential for scalable and resource-efficient multi-modal inference.
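As a toy rendering of the resource-allocation idea (the backend names, costs, and threshold below are hypothetical, and the paper learns this policy rather than hand-coding it), a policy could pick the cheapest backend whose estimated success probability clears a threshold:

```python
# Pick the cheapest foundation-model backend per program module, escalating
# to larger models only when the estimated success probability is too low.
BACKENDS = [                               # (name, relative cost), hypothetical
    ("small-vlm", 1.0),
    ("medium-vlm", 4.0),
    ("large-vlm", 20.0),
]

def choose_backend(success_prob, threshold=0.8):
    """success_prob: dict backend name -> estimated P(correct) for this input."""
    for name, cost in BACKENDS:            # ordered cheapest first
        if success_prob[name] >= threshold:
            return name, cost
    return BACKENDS[-1]                    # fall back to the most capable model

est = {"small-vlm": 0.55, "medium-vlm": 0.86, "large-vlm": 0.97}
print(choose_backend(est))                 # -> ('medium-vlm', 4.0)
```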
Updated: 2025-04-09 19:36:47
Areas: cs.LG
A new training approach for text classification in Mental Health: LatentGLoss
This study presents a multi-stage approach to mental health classification by leveraging traditional machine learning algorithms, deep learning architectures, and transformer-based models. A novel data set was curated and utilized to evaluate the performance of various methods, starting with conventional classifiers and advancing through neural networks. To broaden the architectural scope, recurrent neural networks (RNNs) such as LSTM and GRU were also evaluated to explore their effectiveness in modeling sequential patterns in the data. Subsequently, transformer models such as BERT were fine-tuned to assess the impact of contextual embeddings in this domain. Beyond these baseline evaluations, the core contribution of this study lies in a novel training strategy involving a dual-model architecture composed of a teacher and a student network. Unlike standard distillation techniques, this method does not rely on soft label transfer; instead, it facilitates information flow through both the teacher model's output and its latent representations by modifying the loss function. The experimental results highlight the effectiveness of each modeling stage and demonstrate that the proposed loss function and teacher-student interaction significantly enhance the model's learning capacity in mental health prediction tasks.
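One plausible reading of the described loss, sketched below with illustrative weights: alongside the task loss, the student is tied to both the teacher's output and its latent representation. The paper stresses that it does not rely on standard soft-label distillation, so treat this as a schematic, not the exact objective.

```python
# Schematic dual-model objective: task loss plus penalties tying the
# student to the teacher's output distribution and latent representation.
import torch
import torch.nn.functional as F

def latent_gloss(student_logits, student_latent, teacher_logits, teacher_latent, y):
    task = F.cross_entropy(student_logits, y)
    out_match = F.kl_div(F.log_softmax(student_logits, dim=-1),
                         F.softmax(teacher_logits, dim=-1), reduction="batchmean")
    latent_match = F.mse_loss(student_latent, teacher_latent)
    return task + 0.5 * out_match + 0.5 * latent_match   # weights illustrative

s_logits, t_logits = torch.randn(8, 3), torch.randn(8, 3)
s_lat, t_lat = torch.randn(8, 32), torch.randn(8, 32)
y = torch.randint(0, 3, (8,))
print(latent_gloss(s_logits, s_lat, t_logits, t_lat, y))
```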
Updated: 2025-04-09 19:34:31
Areas: cs.AI,cs.LG
Prototype-Based Continual Learning with Label-free Replay Buffer and Cluster Preservation Loss
Continual learning techniques employ simple replay sample selection processes and use them during subsequent tasks. Typically, they rely on labeled data. In this paper, we depart from this by automatically selecting prototypes stored without labels, preserving cluster structures in the latent space across tasks. By eliminating label dependence in the replay buffer and introducing cluster preservation loss, it is demonstrated that the proposed method can maintain essential information from previously encountered tasks while ensuring adaptation to new tasks. "Push-away" and "pull-toward" mechanisms over previously learned prototypes are also introduced for class-incremental and domain-incremental scenarios. These mechanisms ensure the retention of previously learned information as well as adaptation to new classes or domain shifts. The proposed method is evaluated on several benchmarks, including SplitCIFAR100, SplitImageNet32, SplitTinyImageNet, and SplitCaltech256 for class-incremental, as well as R-MNIST and CORe50 for domain-incremental setting using pre-extracted DINOv2 features. Experimental results indicate that the label-free replay-based technique outperforms state-of-the-art continual learning methods and, in some cases, even surpasses offline learning. An unsupervised variant of the proposed technique for the class-incremental setting, avoiding label use even on incoming data, also demonstrated competitive performance, outperforming particular supervised baselines in some cases. These findings underscore the effectiveness of the proposed framework in retaining prior information and facilitating continual adaptation.
Updated: 2025-04-09 19:26:26
Areas: cs.LG
Earth-like planet predictor: A machine learning approach
Searching for planets analogous to Earth in terms of mass and equilibrium temperature is currently the first step in the quest for habitable conditions outside our Solar System and, ultimately, the search for life in the universe. Future missions such as PLATO or LIFE will begin to detect and characterise these small, cold planets, dedicating significant observation time to them. The aim of this work is to predict which stars are most likely to host an Earth-like planet (ELP) to avoid blind searches, minimises detection times, and thus maximises the number of detections. Using a previous study on correlations between the presence of an ELP and the properties of its system, we trained a Random Forest to recognise and classify systems as 'hosting an ELP' or 'not hosting an ELP'. The Random Forest was trained and tested on populations of synthetic planetary systems derived from the Bern model, and then applied to real observed systems. The tests conducted on the machine learning (ML) model yield precision scores of up to 0.99, indicating that 99% of the systems identified by the model as having ELPs possess at least one. Among the few real observed systems that have been tested, 44 have been selected as having a high probability of hosting an ELP, and a quick study of the stability of these systems confirms that the presence of an Earth-like planet within them would leave them stable. The excellent results obtained from the tests conducted on the ML model demonstrate its ability to recognise the typical architectures of systems with or without ELPs within populations derived from the Bern model. If we assume that the Bern model adequately describes the architecture of real systems, then such a tool can prove indispensable in the search for Earth-like planets. A similar approach could be applied to other planetary system formation models to validate those predictions.
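A minimal stand-in for the described pipeline (the features and labels below are fabricated placeholders, not the Bern-model population): train a Random Forest to flag systems likely to host an ELP and check precision.

```python
# Train a Random Forest on synthetic system-level features and classify
# systems as hosting an Earth-like planet or not.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=(n, 6))   # stand-ins for stellar mass, metallicity, etc.
y = ((X[:, 0] + X[:, 1] > 0.8) & (X[:, 2] < 0.5)).astype(int)  # toy labeling rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("precision:", precision_score(y_te, clf.predict(X_te)))
```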
Updated: 2025-04-09 19:21:46
Areas: astro-ph.EP,astro-ph.IM,cs.LG
Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions
We introduce Centaurus, a class of networks composed of generalized state-space model (SSM) blocks, where the SSM operations can be treated as tensor contractions during training. The optimal order of tensor contractions can then be systematically determined for every SSM block to maximize training efficiency. This allows more flexibility in designing SSM blocks beyond the depthwise-separable configuration commonly implemented. The new design choices will take inspiration from classical convolutional blocks including group convolutions, full convolutions, and bottleneck blocks. We architect the Centaurus network with a mixture of these blocks, to balance between network size and performance, as well as memory and computational efficiency during both training and inference. We show that this heterogeneous network design outperforms its homogeneous counterparts in raw audio processing tasks including keyword spotting, speech denoising, and automatic speech recognition (ASR). For ASR, Centaurus is the first network with competitive performance that can be made fully state-space based, without using any nonlinear recurrence (LSTMs), explicit convolutions (CNNs), or (surrogate) attention mechanism. The source code is available as supplementary material on https://openreview.net/forum?id=PkpNRmBZ32
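The contraction-ordering mechanism at the heart of this design is available off the shelf: np.einsum can search for the cheapest order of a chained contraction. The snippet below shows the mechanism on a generic channel-mixing chain, not an actual SSM recurrence.

```python
# einsum_path finds the optimal contraction order; the order can change
# the FLOP count of the same chained contraction by orders of magnitude.
import numpy as np

B, T, C, N = 8, 256, 64, 32            # batch, time, channels, state size
u = np.random.randn(B, T, C)           # inputs
W_in = np.random.randn(C, N)           # input -> state mixing
W_out = np.random.randn(N, C)          # state -> output mixing

path, info = np.einsum_path("btc,cn,nd->btd", u, W_in, W_out, optimize="optimal")
print(info)                            # reports the chosen order and FLOP estimate
y = np.einsum("btc,cn,nd->btd", u, W_in, W_out, optimize=path)
print(y.shape)                         # (8, 256, 64)
```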
Updated: 2025-04-09 19:05:31
Areas: cs.LG,cs.AI
EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations
Vibrometry-based side channels pose a significant privacy risk, exploiting sensors like mmWave radars, light sensors, and accelerometers to detect vibrations from sound sources or proximate objects, enabling speech eavesdropping. Despite various proposed defenses, these involve costly hardware solutions with inherent physical limitations. This paper presents EveGuard, a software-driven defense framework that creates adversarial audio, protecting voice privacy from side channels without compromising human perception. We leverage the distinct sensing capabilities of side channels and traditional microphones, where side channels capture vibrations and microphones record changes in air pressure, resulting in different frequency responses. EveGuard first proposes a perturbation generator model (PGM) that effectively suppresses sensor-based eavesdropping while maintaining high audio quality. Second, to enable end-to-end training of PGM, we introduce a new domain translation task called Eve-GAN for inferring an eavesdropped signal from a given audio. We further apply few-shot learning to mitigate the data collection overhead for Eve-GAN training. Our extensive experiments show that EveGuard achieves a protection rate of more than 97 percent from audio classifiers and significantly hinders eavesdropped audio reconstruction. We further validate the performance of EveGuard across three adaptive attack mechanisms. We have conducted a user study to verify the perceptual quality of our perturbed audio.
Updated: 2025-04-09 19:04:02
Areas: cs.CR,cs.MM,cs.SD,eess.AS
The Artificial Intelligence Disclosure (AID) Framework: An Introduction
As the use of Generative Artificial Intelligence tools has grown in higher education and research, there have been increasing calls for transparency and granularity around the use and attribution of these tools. Thus far, this need has been met via the recommended inclusion of a note, with little to no guidance on what the note itself should include. This has been identified as a problem for the use of AI in academic and research contexts. This article introduces The Artificial Intelligence Disclosure (AID) Framework, a standard, comprehensive, and detailed framework meant to inform the development and writing of GenAI disclosure for education and research.
Updated: 2025-04-09 19:03:37
Areas: cs.DL,cs.AI
Leveraging Machine Learning Techniques in Intrusion Detection Systems for Internet of Things
As the Internet of Things (IoT) continues to expand, ensuring the security of connected devices has become increasingly critical. Traditional Intrusion Detection Systems (IDS) often fall short in managing the dynamic and large-scale nature of IoT networks. This paper explores how Machine Learning (ML) and Deep Learning (DL) techniques can significantly enhance IDS performance in IoT environments. We provide a thorough overview of various IDS deployment strategies and categorize the types of intrusions common in IoT systems. A range of ML methods -- including Support Vector Machines, Naive Bayes, K-Nearest Neighbors, Decision Trees, and Random Forests -- are examined alongside advanced DL models such as LSTM, CNN, Autoencoders, RNNs, and Deep Belief Networks. Each technique is evaluated based on its accuracy, efficiency, and suitability for real-world IoT applications. We also address major challenges such as high false positive rates, data imbalance, encrypted traffic analysis, and the resource constraints of IoT devices. In addition, we highlight the emerging role of Generative AI and Large Language Models (LLMs) in improving threat detection, automating responses, and generating intelligent security policies. Finally, we discuss ethical and privacy concerns, underscoring the need for responsible and transparent implementation. This paper aims to provide a comprehensive framework for developing adaptive, intelligent, and secure IDS solutions tailored for the evolving landscape of IoT.
Updated: 2025-04-09 18:52:15
Areas: cs.CR,cs.NI
Not someone, but something: Rethinking trust in the age of medical AI
As artificial intelligence (AI) becomes embedded in healthcare, trust in medical decision-making is changing fast. This opinion paper argues that trust in AI isn't a simple transfer from humans to machines - it's a dynamic, evolving relationship that must be built and maintained. Rather than debating whether AI belongs in medicine, this paper asks: what kind of trust must AI earn, and how? Drawing from philosophy, bioethics, and system design, it explores the key differences between human trust and machine reliability - emphasizing transparency, accountability, and alignment with the values of good care. It argues that trust in AI shouldn't be built on mimicking empathy or intuition, but on thoughtful design, responsible deployment, and clear moral responsibility. The goal is a balanced view - one that avoids blind optimism and reflexive fear. Trust in AI must be treated not as a given, but as something to be earned over time.
Updated: 2025-04-09 18:46:53
Areas: cs.CY,cs.AI,cs.HC
Evolutionary algorithms meet self-supervised learning: a comprehensive survey
The number of studies that combine Evolutionary Machine Learning and self-supervised learning has been growing steadily in recent years. Evolutionary Machine Learning has been shown to help automate the design of machine learning algorithms and to lead to more reliable solutions. Self-supervised learning, on the other hand, has produced good results in learning useful features when labelled data is limited. This suggests that the combination of these two areas can help both in shaping evolutionary processes and in automating the design of deep neural networks, while also reducing the need for labelled data. Still, there are no detailed reviews that explain how Evolutionary Machine Learning and self-supervised learning can be used together. To help with this, we provide an overview of studies that bring these areas together. Based on this growing interest and the range of existing works, we suggest a new sub-area of research, which we call Evolutionary Self-Supervised Learning and introduce a taxonomy for it. Finally, we point out some of the main challenges and suggest directions for future research to help Evolutionary Self-Supervised Learning grow and mature as a field.
Updated: 2025-04-09 18:39:41
Areas: cs.NE,cs.LG
MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data
Terrain modeling has traditionally relied on procedural techniques, which often require extensive domain expertise and handcrafted rules. In this paper, we present MESA - a novel data-centric alternative by training a diffusion model on global remote sensing data. This approach leverages large-scale geospatial information to generate high-quality terrain samples from text descriptions, showcasing a flexible and scalable solution for terrain generation. The model's capabilities are demonstrated through extensive experiments, highlighting its ability to generate realistic and diverse terrain landscapes. The dataset produced to support this work, the Major TOM Core-DEM extension dataset, is released openly as a comprehensive resource for global terrain data. The results suggest that data-driven models, trained on remote sensing data, can provide a powerful tool for realistic terrain modeling and generation.
Updated: 2025-04-09 18:37:24
Areas: cs.GR,cs.CV,cs.LG
How Accurately Do Large Language Models Understand Code?
Large Language Models (LLMs) are increasingly used in post-development tasks such as code repair and testing. A key factor in these tasks' success is the model's deep understanding of code. However, the extent to which LLMs truly understand code remains largely unevaluated. Quantifying code comprehension is challenging due to its abstract nature and the lack of a standardized metric. Previously, this was assessed through developer surveys, which are not feasible for evaluating LLMs. Existing LLM benchmarks focus primarily on code generation, fundamentally different from code comprehension. Additionally, fixed benchmarks quickly become obsolete as they become part of the training data. This paper presents the first large-scale empirical investigation into LLMs' ability to understand code. Inspired by mutation testing, we use an LLM's fault-finding ability as a proxy for its deep code understanding. This approach is based on the insight that a model capable of identifying subtle functional discrepancies must understand the code well. We inject faults in real-world programs and ask the LLM to localize them, ensuring the specifications suffice for fault localization. Next, we apply semantic-preserving code mutations (SPMs) to the faulty programs and test whether the LLMs still locate the faults, verifying their confidence in code understanding. We evaluate nine popular LLMs on 600,010 debugging tasks from 670 Java and 637 Python programs. We find that LLMs lose the ability to debug the same bug in 78% of faulty programs when SPMs are applied, indicating a shallow understanding of code and reliance on features irrelevant to semantics. We also find that LLMs understand code earlier in the program better than later. This suggests that LLMs' code comprehension remains tied to lexical and syntactic features due to tokenization designed for natural languages, which overlooks code semantics.
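A toy illustration of the evaluation protocol (not the paper's benchmark or mutation tooling): inject a fault, then apply a semantics-preserving mutation (SPM) and ask whether a model can still localize the same bug.

```python
# The SPM renames variables and hoists the index computation; the injected
# fault (off-by-one index) is semantically unchanged.
def median(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]          # correct for odd-length lists

def median_buggy(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2 - 1]      # injected fault: off-by-one index

def median_buggy_spm(values):
    ordered = sorted(values)         # SPM: renamed variables and a hoisted
    midpoint = len(ordered) // 2 - 1 # index; semantics identical to
    return ordered[midpoint]         # median_buggy, fault still present

data = [7, 1, 5]
print(median(data), median_buggy(data), median_buggy_spm(data))  # 5 1 1
```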
Updated: 2025-04-09 18:27:43
Areas: cs.SE,cs.AI,cs.LG
SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog
We present SemEval-2025 Task 5: LLMs4Subjects, a shared task on automated subject tagging for scientific and technical records in English and German using the GND taxonomy. Participants developed LLM-based systems to recommend top-k subjects, evaluated through quantitative metrics (precision, recall, F1-score) and qualitative assessments by subject specialists. Results highlight the effectiveness of LLM ensembles, synthetic data generation, and multilingual processing, offering insights into applying LLMs for digital library classification.
Updated: 2025-04-09 18:26:46
Areas: cs.CL,cs.AI,cs.DL,cs.LG
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning
The human face plays a central role in social communication, necessitating the use of performant computer vision tools for human-centered applications. We propose Face-LLaVA, a multimodal large language model for face-centered, in-context learning, including facial expression and attribute recognition. Additionally, Face-LLaVA is able to generate natural language descriptions that can be used for reasoning. Leveraging existing visual databases, we first developed FaceInstruct-1M, a face-centered database for instruction tuning MLLMs for face processing. We then developed a novel face-specific visual encoder powered by Face-Region Guided Cross-Attention that integrates face geometry with local visual features. We evaluated the proposed method across nine different datasets and five different face processing tasks, including facial expression recognition, action unit detection, facial attribute detection, age estimation and deepfake detection. Face-LLaVA achieves superior results compared to existing open-source MLLMs and competitive performance compared to commercial solutions. Our model output also receives a higher reasoning rating by GPT under a zero-shot setting across all the tasks. Both our dataset and model will be released at https://face-llava.github.io to support future advancements in social AI and foundational vision-language research.
Updated: 2025-04-09 18:26:07
Areas: cs.CV,cs.AI,cs.HC
GIScience in the Era of Artificial Intelligence: A Research Agenda Towards Autonomous GIS
The advent of generative AI exemplified by large language models (LLMs) opens new ways to represent and compute geographic information and transcends the process of geographic knowledge production, driving geographic information systems (GIS) towards autonomous GIS. Leveraging LLMs as the decision core, autonomous GIS can independently generate and execute geoprocessing workflows to perform spatial analysis. In this vision paper, we further elaborate on the concept of autonomous GIS and present a conceptual framework that defines its five autonomous goals, five autonomous levels, five core functions, and three operational scales. We demonstrate how autonomous GIS could perform geospatial data retrieval, spatial analysis, and map making with four proof-of-concept GIS agents. We conclude by identifying critical challenges and future research directions, including fine-tuning and self-growing decision-cores, autonomous modeling, and examining the societal and practical implications of autonomous GIS. By establishing the groundwork for a paradigm shift in GIScience, this paper envisions a future where GIS moves beyond traditional workflows to autonomously reason, derive, innovate, and advance geospatial solutions to pressing global challenges. As we design and deploy increasingly intelligent geospatial systems, we carry a responsibility to ensure they are developed in socially responsible ways, serve the public good, and support the continued value of human geographic insight in an AI-augmented future.
Updated: 2025-04-09 18:26:03
Areas: cs.AI,cs.ET,cs.SE
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Large language models (LLMs) have demonstrated great potential for automating the evaluation of natural language generation. Previous frameworks of LLM-as-a-judge fall short in two ways: they either use zero-shot setting without consulting any human input, which leads to low alignment, or fine-tune LLMs on labeled data, which requires a non-trivial number of samples. Moreover, previous methods often provide little reasoning behind automated evaluations. In this paper, we propose HypoEval, a Hypothesis-guided Evaluation framework, which first uses a small corpus of human evaluations to generate more detailed rubrics for human judgments and then incorporates a checklist-like approach to combine LLM's assigned scores on each decomposed dimension to acquire overall scores. With only 30 human evaluations, HypoEval achieves state-of-the-art performance in alignment with both human rankings (Spearman correlation) and human scores (Pearson correlation), on average outperforming G-Eval by 11.86% and a fine-tuned Llama-3.1-8B-Instruct, which uses at least 3 times more human evaluations, by 11.95%. Furthermore, we conduct systematic studies to assess the robustness of HypoEval, highlighting its effectiveness as a reliable and interpretable automated evaluation framework.
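A toy rendering of the checklist-style aggregation (the dimension names and weights are hypothetical, not HypoEval's learned rubrics): combine LLM-assigned scores on decomposed rubric dimensions into one overall score.

```python
# Weighted combination of per-dimension rubric scores into an overall score.
rubric = {"coherence": 0.4, "coverage": 0.4, "fluency": 0.2}   # hypothetical
llm_scores = {"coherence": 4, "coverage": 3, "fluency": 5}     # 1-5 per dimension
overall = sum(w * llm_scores[dim] for dim, w in rubric.items())
print(round(overall, 2))   # 3.8
```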
Updated: 2025-04-09 18:00:01
Areas: cs.CL,cs.AI,cs.LG
Trustworthy AI Must Account for Intersectionality
Trustworthy AI encompasses many aspirational aspects for aligning AI systems with human values, including fairness, privacy, robustness, explainability, and uncertainty quantification. However, efforts to enhance one aspect often introduce unintended trade-offs that negatively impact others, making it challenging to improve all aspects simultaneously. In this position paper, we review notable approaches to these five aspects and systematically consider every pair, detailing the negative interactions that can arise. For example, applying differential privacy to model training can amplify biases in the data, undermining fairness. Drawing on these findings, we take the position that addressing trustworthiness along each axis in isolation is insufficient. Instead, research on Trustworthy AI must account for intersectionality between aspects and adopt a holistic view across all relevant axes at once. To illustrate our perspective, we provide guidance on how researchers can work towards integrated trustworthiness, a case study on how intersectionality applies to the financial industry, and alternative views to our position.
Updated: 2025-04-09 18:00:00
Areas: cs.LG,cs.AI
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones. Existing methods typically rely on low-rank, parameter-efficient updates that limit the model's expressivity and introduce additional parameters per task, leading to scalability issues. To address these limitations, we propose a novel continual full fine-tuning approach leveraging adaptive singular value decomposition (SVD). Our method dynamically identifies task-specific low-rank parameter subspaces and constrains updates to be orthogonal to critical directions associated with prior tasks, thus effectively minimizing interference without additional parameter overhead or storing previous task gradients. We evaluate our approach extensively on standard continual learning benchmarks using both encoder-decoder (T5-Large) and decoder-only (LLaMA-2 7B) models, spanning diverse tasks including classification, generation, and reasoning. Empirically, our method achieves state-of-the-art results, up to 7% higher average accuracy than recent baselines like O-LoRA, and notably maintains the model's general linguistic capabilities, instruction-following accuracy, and safety throughout the continual learning process by reducing forgetting to near-negligible levels. Our adaptive SVD framework effectively balances model plasticity and knowledge retention, providing a practical, theoretically grounded, and computationally scalable solution for continual learning scenarios in large language models.
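A simplified, fixed-rank sketch of the orthogonal-update idea (the paper adapts the subspaces dynamically via SVD; this is not its algorithm): keep the top singular directions of a prior task's gradients and project later updates out of that subspace.

```python
# Store critical directions of old-task gradients via SVD, then remove
# those components from new-task gradients so updates avoid interference.
import torch

def top_directions(grads, k=8):
    """grads: (num_samples, dim) matrix of task gradients."""
    U, S, Vh = torch.linalg.svd(grads, full_matrices=False)
    return Vh[:k]                               # (k, dim) critical directions

def project_out(grad, directions):
    """Remove components of grad along previously stored directions."""
    coeffs = directions @ grad                  # (k,)
    return grad - directions.T @ coeffs

old_task_grads = torch.randn(64, 128)
dirs = top_directions(old_task_grads)
g_new = torch.randn(128)
g_safe = project_out(g_new, dirs)
print((dirs @ g_safe).abs().max())              # ~0: update avoids the prior subspace
```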
Updated: 2025-04-09 17:59:42
Domains: cs.LG,cs.AI,cs.CL,math.PR,stat.ML,68T50,I.2.0; G.3
Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning
An embodied system must not only model the patterns of the external world but also understand its own motion dynamics. A motion dynamic model is essential for efficient skill acquisition and effective planning. In this work, we introduce the neural motion simulator (MoSim), a world model that predicts the future physical state of an embodied system based on current observations and actions. MoSim achieves state-of-the-art performance in physical state prediction and provides competitive performance across a range of downstream tasks. This work shows that when a world model is accurate enough and performs precise long-horizon predictions, it can facilitate efficient skill acquisition in imagined worlds and even enable zero-shot reinforcement learning. Furthermore, MoSim can transform any model-free reinforcement learning (RL) algorithm into a model-based approach, effectively decoupling physical environment modeling from RL algorithm development. This separation allows for independent advancements in RL algorithms and world modeling, significantly improving sample efficiency and enhancing generalization capabilities. Our findings highlight that world models for motion dynamics are a promising direction for developing more versatile and capable embodied systems.
Updated: 2025-04-09 17:59:32
Domains: cs.LG,cs.RO
Are We Done with Object-Centric Learning?
Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene. This approach underpins various aims, including out-of-distribution (OOD) generalization, sample-efficient composition, and modeling of structured environments. Most research has focused on developing unsupervised mechanisms that separate objects into discrete slots in the representation space, evaluated using unsupervised object discovery. However, with recent sample-efficient segmentation models, we can separate objects in the pixel space and encode them independently. This achieves remarkable zero-shot performance on OOD object discovery benchmarks, is scalable to foundation models, and can handle a variable number of slots out-of-the-box. Hence, the goal of OCL methods to obtain object-centric representations has been largely achieved. Despite this progress, a key question remains: How does the ability to separate objects within a scene contribute to broader OCL objectives, such as OOD generalization? We address this by investigating the OOD generalization challenge caused by spurious background cues through the lens of OCL. We propose a novel, training-free probe called $\textbf{Object-Centric Classification with Applied Masks (OCCAM)}$, demonstrating that segmentation-based encoding of individual objects significantly outperforms slot-based OCL methods. However, challenges in real-world applications remain. We provide the toolbox for the OCL community to use scalable object-centric representations, and focus on practical applications and fundamental questions, such as understanding object perception in human cognition. Our code is available $\href{https://github.com/AlexanderRubinstein/OCCAM}{here}$.
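A schematic sketch of the probe's recipe, under the assumption that a segmenter, an encoder, and a classifier are available as black boxes: isolate each object in pixel space with a mask, encode it independently, and classify the foreground embedding. The `segment`, `encode`, and `classify` stand-ins below are toys so the sketch runs end to end; they are not the paper's components.

```python
import numpy as np

def occam_style_predict(image, segment, encode, classify):
    """Schematic of segmentation-based object-centric classification:
    each object is isolated in pixel space and encoded independently."""
    masks = segment(image)                                  # list of boolean HxW masks
    slots = [encode(image * m[..., None]) for m in masks]   # one embedding per object
    # classify the foreground object (here: the largest mask), ignoring background cues
    main = int(np.argmax([m.sum() for m in masks]))
    return classify(slots[main])

# toy stand-ins so the sketch executes
H = W = 8
img = np.random.rand(H, W, 3)
segment = lambda x: [np.ones((H, W), bool)]   # trivial "one object" segmenter
encode = lambda x: x.mean(axis=(0, 1))        # 3-dim embedding
classify = lambda z: int(z.argmax())
print(occam_style_predict(img, segment, encode, classify))
```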
Updated: 2025-04-09 17:59:05
Domains: cs.CV,cs.AI,cs.LG
AssistanceZero: Scalably Solving Assistance Games
Assistance games are a promising alternative to reinforcement learning from human feedback (RLHF) for training AI assistants. Assistance games resolve key drawbacks of RLHF, such as incentives for deceptive behavior, by explicitly modeling the interaction between assistant and user as a two-player game where the assistant cannot observe their shared goal. Despite their potential, assistance games have only been explored in simple settings. Scaling them to more complex environments is difficult because it requires both solving intractable decision-making problems under uncertainty and accurately modeling human users' behavior. We present the first scalable approach to solving assistance games and apply it to a new, challenging Minecraft-based assistance game with over $10^{400}$ possible goals. Our approach, AssistanceZero, extends AlphaZero with a neural network that predicts human actions and rewards, enabling it to plan under uncertainty. We show that AssistanceZero outperforms model-free RL algorithms and imitation learning in the Minecraft-based assistance game. In a human study, our AssistanceZero-trained assistant significantly reduces the number of actions participants take to complete building tasks in Minecraft. Our results suggest that assistance games are a tractable framework for training effective AI assistants in complex environments. Our code and models are available at https://github.com/cassidylaidlaw/minecraft-building-assistance-game.
Updated: 2025-04-09 17:59:03
Domains: cs.AI,cs.LG
KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs
Knowledge graphs have emerged as a popular method for injecting up-to-date, factual knowledge into large language models (LLMs). This is typically achieved by converting the knowledge graph into text that the LLM can process in context. While multiple methods of encoding knowledge graphs have been proposed, the impact of this textualization process on LLM performance remains under-explored. We introduce KG-LLM-Bench, a comprehensive and extensible benchmark spanning five knowledge graph understanding tasks, and evaluate how different encoding strategies affect performance across various base models. Our extensive experiments with seven language models and five textualization strategies provide insights for optimizing LLM performance on KG reasoning tasks.
Updated: 2025-04-09 17:58:47
Domains: cs.CL,cs.AI,cs.IR
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Reasoning has emerged as the next major frontier for language models (LMs), with rapid advances from both academic and industrial labs. However, this progress often outpaces methodological rigor, with many evaluations relying on benchmarking practices that lack transparency, robustness, or statistical grounding. In this work, we conduct a comprehensive empirical study and find that current mathematical reasoning benchmarks are highly sensitive to subtle implementation choices - including decoding parameters, random seeds, prompt formatting, and even hardware and software-framework configurations. Performance gains reported in recent studies frequently hinge on unclear comparisons or unreported sources of variance. To address these issues, we propose a standardized evaluation framework with clearly defined best practices and reporting standards. Using this framework, we reassess recent methods and find that reinforcement learning (RL) approaches yield only modest improvements - far below prior claims - and are prone to overfitting, especially on small-scale benchmarks like AIME24. In contrast, supervised finetuning (SFT) methods show consistently stronger generalization. To foster reproducibility, we release all code, prompts, and model outputs for reasoning benchmarks, establishing more rigorous foundations for future work.
Updated: 2025-04-09 17:58:17
Domains: cs.LG,cs.CL
Identifying Unknown Stochastic Dynamics via Finite expression methods
Modeling stochastic differential equations (SDEs) is crucial for understanding complex dynamical systems in various scientific fields. Recent methods often employ neural network-based models, which typically represent SDEs through a combination of deterministic and stochastic terms. However, these models usually lack interpretability and have difficulty generalizing beyond their training domain. This paper introduces the Finite Expression Method (FEX), a symbolic learning approach designed to derive interpretable mathematical representations of the deterministic component of SDEs. For the stochastic component, we integrate FEX with advanced generative modeling techniques to provide a comprehensive representation of SDEs. The numerical experiments on linear, nonlinear, and multidimensional SDEs demonstrate that FEX generalizes well beyond the training domain and delivers more accurate long-term predictions compared to neural network-based methods. The symbolic expressions identified by FEX not only improve prediction accuracy but also offer valuable scientific insights into the underlying dynamics of the systems, paving the way for new scientific discoveries.
Updated: 2025-04-09 17:57:54
Domains: cs.LG
Distributional Autoencoders Know the Score
This work presents novel and desirable properties of a recently introduced class of autoencoders - the Distributional Principal Autoencoder (DPA) - which combines distributionally correct reconstruction with principal components-like interpretability of the encodings. First, we show formally that the level sets of the encoder orient themselves exactly with regard to the score of the data distribution. This both explains the method's often remarkable performance in disentangling the factors of variation of the data, as well as opens up possibilities of recovering its distribution while having access to samples only. In settings where the score itself has physical meaning - such as when the data obeys the Boltzmann distribution - we demonstrate that the method can recover scientifically important quantities such as the minimum free energy path. Second, we prove that if the data lies on a manifold that can be approximated by the encoder, the optimal encoder's components beyond the dimension of the manifold will carry absolutely no additional information about the data distribution. This promises potentially new ways of determining the number of relevant dimensions of the data. The results thus demonstrate that the DPA elegantly combines two often disparate goals of unsupervised learning: the learning of the data distribution and the learning of the intrinsic data dimensionality.
Updated: 2025-04-09 17:56:17
Domains: stat.ML,cs.LG
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Large Language Models (LLMs) have demonstrated the ability to tackle increasingly complex tasks through advanced reasoning, long-form content generation, and tool use. Solving these tasks often involves long inference-time computations. In human problem solving, a common strategy to expedite work is collaboration: by dividing the problem into sub-tasks, exploring different strategies concurrently, etc. Recent research has shown that LLMs can also operate in parallel by implementing explicit cooperation frameworks, such as voting mechanisms or the explicit creation of independent sub-tasks that can be executed in parallel. However, each of these frameworks may not be suitable for all types of tasks, which can hinder their applicability. In this work, we propose a different design approach: we run LLM "workers" in parallel, allowing them to synchronize via a concurrently-updated attention cache and prompt these workers to decide how best to collaborate. Our approach allows the instances to come up with their own collaboration strategy for the problem at hand, all the while "seeing" each other's partial progress in the concurrent cache. We implement this approach via Hogwild! Inference: a parallel LLM inference engine where multiple instances of the same LLM run in parallel with the same attention cache, with "instant" access to each other's generated tokens. Hogwild! Inference takes advantage of Rotary Position Embeddings (RoPE) to avoid recomputation while improving parallel hardware utilization. We find that modern reasoning-capable LLMs can perform inference with shared Key-Value cache out of the box, without additional fine-tuning.
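As a loose, thread-level analogy of the shared cache (the real system shares a KV attention cache inside the model, which this toy does not attempt to reproduce): workers append to one store, and each sees the others' partial progress before producing its next token.

```python
import threading

shared_cache = []            # stand-in for the concurrently updated attention cache
lock = threading.Lock()

def worker(name, steps):
    for i in range(steps):
        with lock:
            visible = list(shared_cache)       # "instant" view of all tokens so far
        token = f"{name}:{i} (saw {len(visible)} tokens)"
        with lock:
            shared_cache.append(token)         # other workers see this immediately

threads = [threading.Thread(target=worker, args=(f"w{k}", 3)) for k in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("\n".join(shared_cache))
```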
Updated: 2025-04-09 17:56:08
Domains: cs.LG,cs.CL
R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents
Improving open-source models on real-world SWE tasks (solving GitHub issues) faces two key challenges: 1) scalable curation of execution environments to train these models, and 2) optimal scaling of test-time compute. We introduce AgentGym, the largest procedurally-curated executable gym environment for training real-world SWE-agents, consisting of more than 8.7K tasks. AgentGym is powered by two main contributions: 1) SYNGEN: a synthetic data curation recipe that enables scalable curation of executable environments using test-generation and back-translation directly from commits, thereby reducing reliance on human-written issues or unit tests. We show that this enables more scalable training leading to pass@1 performance of 34.4% on the SWE-Bench Verified benchmark with our 32B model. 2) Hybrid Test-time Scaling: we provide an in-depth analysis of two test-time scaling axes: execution-based and execution-free verifiers, demonstrating that they exhibit complementary strengths and limitations. Test-based verifiers suffer from low distinguishability, while execution-free verifiers are biased and often rely on stylistic features. Surprisingly, we find that while each approach individually saturates around 42-43%, significantly higher gains can be obtained by leveraging their complementary strengths. Overall, our approach achieves 51% on the SWE-Bench Verified benchmark, reflecting a new state-of-the-art for open-weight SWE-agents and for the first time showing competitive performance with proprietary models such as o1, o1-preview and sonnet-3.5-v2 (with tools). We will open-source our environments, models, and agent trajectories.
Updated: 2025-04-09 17:55:19
Domains: cs.SE,cs.CL,cs.LG
A Concise Mathematical Description of Active Inference in Discrete Time
In this paper we present a concise mathematical description of active inference in discrete time. The main part of the paper serves as a basic introduction to the topic, including a detailed example of the action selection mechanism. The appendix discusses the more subtle mathematical details, targeting readers who have already studied the active inference literature but struggle to make sense of the mathematical details and derivations. Throughout, we emphasize precise and standard mathematical notation, ensuring consistency with existing texts and linking all equations to widely used references on active inference. Additionally, we provide Python code that implements the action selection and learning mechanisms described in this paper and is compatible with pymdp environments.
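For orientation, the central quantity in discrete-time active inference is the expected free energy of a policy $\pi$, shown here in its standard risk-plus-ambiguity form (notation may differ slightly from the paper's):
$$G(\pi) = \sum_{\tau} \Big( \underbrace{D_{\mathrm{KL}}\big[\, Q(o_\tau \mid \pi) \,\big\|\, P(o_\tau) \,\big]}_{\text{risk}} + \underbrace{\mathbb{E}_{Q(s_\tau \mid \pi)}\big[\, \mathrm{H}\big[ P(o_\tau \mid s_\tau) \big] \,\big]}_{\text{ambiguity}} \Big), \qquad P(\pi) \propto e^{-G(\pi)},$$
so policies are scored by $G$ and actions are selected via a softmax over negative expected free energies.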
Updated: 2025-04-09 17:54:25
Domains: cs.LG,q-bio.NC
Self-Steering Language Models
While test-time reasoning enables language models to tackle complex tasks, searching or planning in natural language can be slow, costly, and error-prone. But even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure--both how to verify solutions and how to search for them. This paper introduces DisCIPL, a method for "self-steering" LMs where a Planner model generates a task-specific inference program that is executed by a population of Follower models. Our approach equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning. When instantiated with a small Follower (e.g., Llama-3.2-1B), DisCIPL matches (and sometimes outperforms) much larger models, including GPT-4o and o1, on challenging constrained generation tasks. In decoupling planning from execution, our work opens up a design space of highly-parallelized Monte Carlo inference strategies that outperform standard best-of-N sampling, require no finetuning, and can be implemented automatically by existing LMs.
Updated: 2025-04-09 17:54:22
Domains: cs.CL,cs.AI
DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning
Despite great performance on Olympiad-level reasoning problems, frontier large language models can still struggle on high school math when presented with novel problems outside standard benchmarks. Going beyond final accuracy, we propose a deductive consistency metric to analyze chain-of-thought output from language models (LMs). Formally, deductive reasoning involves two subtasks: understanding a set of input premises and inferring the conclusions that follow from them. The proposed metric studies LMs' performance on these subtasks, with the goal of explaining LMs' reasoning errors on novel problems: how well do LMs understand input premises with increasing context lengths, and how well can they infer conclusions over multiple reasoning hops? Since existing benchmarks may be memorized, we develop a pipeline to evaluate LMs' deductive consistency on novel, perturbed versions of benchmark problems. On novel grade school math problems (GSM-8k), we find that LMs are fairly robust to an increasing number of input premises, but suffer significant accuracy decay as the number of reasoning hops is increased. Interestingly, these errors are masked in the original benchmark as all models achieve near 100% accuracy. As we increase the number of solution steps using a synthetic dataset, prediction over multiple hops still remains the major source of error compared to understanding input premises. Other factors, such as shifts in language style or natural propagation of early errors, do not explain the trends. Our analysis provides a new view to characterize LM reasoning -- as computations over a window of input premises and reasoning hops -- that can provide unified evaluation across problem domains.
Updated: 2025-04-09 17:53:55
Domains: cs.CL,cs.AI,cs.LG
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
To survive and thrive in complex environments, humans have evolved sophisticated self-improvement mechanisms through environment exploration, hierarchical abstraction of experiences into reuseable skills, and collaborative construction of an ever-growing skill repertoire. Despite recent advancements, autonomous web agents still lack crucial self-improvement capabilities, struggling with procedural knowledge abstraction, refining skills, and skill composition. In this work, we introduce SkillWeaver, a skill-centric framework enabling agents to self-improve by autonomously synthesizing reusable skills as APIs. Given a new website, the agent autonomously discovers skills, executes them for practice, and distills practice experiences into robust APIs. Iterative exploration continually expands a library of lightweight, plug-and-play APIs, significantly enhancing the agent's capabilities. Experiments on WebArena and real-world websites demonstrate the efficacy of SkillWeaver, achieving relative success rate improvements of 31.8% and 39.8%, respectively. Additionally, APIs synthesized by strong agents substantially enhance weaker agents through transferable skills, yielding improvements of up to 54.3% on WebArena. These results demonstrate the effectiveness of honing diverse website interactions into APIs, which can be seamlessly shared among various web agents.
Updated: 2025-04-09 17:51:50
Domains: cs.AI,cs.CL,cs.CV
Detecting AI-generated Artwork
The high efficiency and quality of artwork generated by Artificial Intelligence (AI) has created new concerns and challenges for human artists. In particular, recent improvements in generative AI have made it difficult for people to distinguish between human-generated and AI-generated art. In this research, we consider the potential utility of various types of Machine Learning (ML) and Deep Learning (DL) models in distinguishing AI-generated artwork from human-generated artwork. We focus on three challenging artistic styles, namely, baroque, cubism, and expressionism. The learning models we test are Logistic Regression (LR), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN). Our best experimental results yield a multiclass accuracy of 0.8208 over six classes, and an impressive accuracy of 0.9758 for the binary classification problem of distinguishing AI-generated from human-generated art.
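The non-CNN baselines named above can be reproduced in a few lines of scikit-learn. A minimal sketch follows, using a small built-in image dataset as a stand-in for the baroque/cubism/expressionism data (which this listing does not include) and omitting the CNN to keep the example dependency-free:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)   # stand-in for artwork feature vectors
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=2000),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}
for name, model in models.items():
    print(name, model.fit(Xtr, ytr).score(Xte, yte))   # test accuracy per model family
```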
Updated: 2025-04-09 17:50:07
Domains: cs.CV,cs.LG
Estimation of embedding vectors in high dimensions
Embeddings are a basic initial feature extraction step in many machine learning models, particularly in natural language processing. An embedding attempts to map data tokens to a low-dimensional space where similar tokens are mapped to vectors that are close to one another by some metric in the embedding space. A basic question is how well can such embedding be learned? To study this problem, we consider a simple probability model for discrete data where there is some "true" but unknown embedding where the correlation of random variables is related to the similarity of the embeddings. Under this model, it is shown that the embeddings can be learned by a variant of low-rank approximate message passing (AMP) method. The AMP approach enables precise predictions of the accuracy of the estimation in certain high-dimensional limits. In particular, the methodology provides insight on the relations of key parameters such as the number of samples per value, the frequency of the terms, and the strength of the embedding correlation on the probability distribution. Our theoretical findings are validated by simulations on both synthetic data and real text data.
Updated: 2025-04-09 17:46:00
Domains: cs.LG,cs.IT,math.IT,stat.ML
Beyond the Hype: A dispassionate look at vision-language models in medical scenario
Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across diverse tasks, garnering significant attention in AI communities. However, their performance and reliability in specialized domains such as medicine remain insufficiently assessed. In particular, most assessments over-concentrate on evaluating VLMs based on simple Visual Question Answering (VQA) on multi-modality data, while ignoring the in-depth characteristics of LVLMs. In this study, we introduce RadVUQA, a novel Radiological Visual Understanding and Question Answering benchmark, to comprehensively evaluate existing LVLMs. RadVUQA mainly validates LVLMs across five dimensions: 1) Anatomical understanding, assessing the models' ability to visually identify biological structures; 2) Multimodal comprehension, which involves the capability of interpreting linguistic and visual instructions to produce desired outcomes; 3) Quantitative and spatial reasoning, evaluating the models' spatial awareness and proficiency in combining quantitative analysis with visual and linguistic information; 4) Physiological knowledge, measuring the models' capability to comprehend functions and mechanisms of organs and systems; and 5) Robustness, which assesses the models' capabilities against unharmonized and synthetic data. The results indicate that both generalized LVLMs and medical-specific LVLMs have critical deficiencies with weak multimodal comprehension and quantitative reasoning capabilities. Our findings reveal the large gap between existing LVLMs and clinicians, highlighting the urgent need for more robust and intelligent LVLMs. The code is available at https://github.com/Nandayang/RadVUQA
Updated: 2025-04-09 17:42:01
Domains: cs.CV,cs.AI
HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification
This paper introduces a comprehensive system for detecting hallucinations in large language model (LLM) outputs in enterprise settings. We present a novel taxonomy of LLM responses specific to hallucination in enterprise applications, categorizing them into context-based, common knowledge, enterprise-specific, and innocuous statements. Our hallucination detection model HDM-2 validates LLM responses with respect to both context and generally known facts (common knowledge). It provides both hallucination scores and word-level annotations, enabling precise identification of problematic content. To evaluate it on context-based and common-knowledge hallucinations, we introduce a new dataset, HDMBench. Experimental results demonstrate that HDM-2 outperforms existing approaches across the RagTruth, TruthfulQA, and HDMBench datasets. This work addresses the specific challenges of enterprise deployment, including computational efficiency, domain specialization, and fine-grained error identification. Our evaluation dataset, model weights, and inference code are publicly available.
Updated: 2025-04-09 17:39:41
Domains: cs.CL,cs.AI
Multi-Object Tracking for Collision Avoidance Using Multiple Cameras in Open RAN Networks
This paper deals with the multi-object detection and tracking problem, within the scope of Open Radio Access Network (RAN), for collision avoidance in vehicular scenarios. To this end, a set of distributed intelligent agents collocated with cameras is considered. The fusion of detected objects is done at an edge service, considering Open RAN connectivity. Then, the edge service predicts the objects' trajectories for collision avoidance. Compared to related work, a more realistic Open RAN network is implemented and multiple cameras are used.
Updated: 2025-04-09 17:36:40
Domains: cs.MA,cs.LG,cs.RO
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning
Path planning is a fundamental scientific problem in robotics and autonomous navigation, requiring the derivation of efficient routes from starting to destination points while avoiding obstacles. Traditional algorithms like A* and its variants are capable of ensuring path validity but suffer from significant computational and memory inefficiencies as the state space grows. Conversely, large language models (LLMs) excel in broader environmental analysis through contextual understanding, providing global insights into environments. However, they fall short in detailed spatial and temporal reasoning, often leading to invalid or inefficient routes. In this work, we propose LLM-A*, a new LLM-based route planning method that synergistically combines the precise pathfinding capabilities of A* with the global reasoning capability of LLMs. This hybrid approach aims to enhance pathfinding efficiency in terms of time and space complexity while maintaining the integrity of path validity, especially in large-scale scenarios. By integrating the strengths of both methodologies, LLM-A* addresses the computational and memory limitations of conventional algorithms without compromising on the validity required for effective pathfinding.
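One way to read the hybrid, as a hedged sketch rather than the paper's algorithm: run ordinary grid A*, but route the heuristic through waypoints that an LLM might propose from a global view of the map. The `llm_waypoints` value below is a hard-coded stand-in for a model call, and the biased heuristic is not admissible, so the returned path may trade optimality for a smaller search, which is the flavor of the trade-off described above.

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def hybrid_a_star(start, goal, blocked, waypoints, size=20):
    """Grid A* whose heuristic is routed through waypoints an LLM might suggest."""
    def h(node):
        if not waypoints:
            return manhattan(node, goal)
        return min(manhattan(node, w) + manhattan(w, goal) for w in waypoints)

    frontier = [(h(start), 0, start, [start])]
    seen = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in blocked and nxt not in seen):
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None

blocked = {(2, y) for y in range(15)}   # a wall with a gap near the top of the grid
llm_waypoints = [(2, 17)]               # hypothetical LLM advice: "go around the top"
print(hybrid_a_star((0, 0), (10, 0), blocked, llm_waypoints))
```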
Updated: 2025-04-09 17:34:52
Domains: cs.RO,cs.AI,cs.CL
Enhancing Downstream Analysis in Genome Sequencing: Species Classification While Basecalling
The ability to quickly and accurately identify microbial species in a sample, known as metagenomic profiling, is critical across various fields, from healthcare to environmental science. This paper introduces a novel method to profile signals coming from sequencing devices in parallel with determining their nucleotide sequences, a process known as basecalling, via a multi-objective deep neural network for simultaneous basecalling and multi-class genome classification. We introduce a new loss strategy where losses for basecalling and classification are back-propagated separately, with model weights combined for the shared layers, and a pre-configured ranking strategy allowing top-K species accuracy, giving users flexibility to choose between higher accuracy or higher speed at identifying the species. We achieve state-of-the-art basecalling accuracies, while classification accuracies meet and exceed the results of state-of-the-art binary classifiers, attaining an average of 92.5%/98.9% accuracy at identifying the top-1/3 species among a total of 17 genomes in the Wick bacterial dataset. The work presented here has implications for future studies in metagenomic profiling by accelerating the bottleneck step of matching the DNA sequence to the correct genome.
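A minimal PyTorch sketch of the shared-trunk, two-head pattern with separately back-propagated losses: gradients from both heads accumulate in the shared layers, mirroring the "losses back-propagated separately, weights combined for the shared layers" strategy. Dimensions are illustrative, and plain cross-entropy stands in for the CTC-style sequence loss a real basecaller would use.

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Shared trunk with separate basecalling and species-classification heads."""
    def __init__(self, in_dim=64, hidden=128, n_bases=5, n_species=17):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.base_head = nn.Linear(hidden, n_bases)       # A/C/G/T/blank per step
        self.species_head = nn.Linear(hidden, n_species)  # genome classes

    def forward(self, x):
        z = self.trunk(x)
        return self.base_head(z), self.species_head(z)

net = TwoHeadNet()
opt = torch.optim.Adam(net.parameters())
x = torch.randn(8, 64)                     # toy signal features
base_y = torch.randint(0, 5, (8,))
species_y = torch.randint(0, 17, (8,))

base_logits, species_logits = net(x)
loss_base = nn.functional.cross_entropy(base_logits, base_y)
loss_cls = nn.functional.cross_entropy(species_logits, species_y)
# back-propagate the two losses separately; their gradients sum in the shared trunk
opt.zero_grad()
loss_base.backward(retain_graph=True)
loss_cls.backward()
opt.step()
```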
Updated: 2025-04-09 17:30:43
Domains: q-bio.GN,cs.LG
Architecture independent generalization bounds for overparametrized deep ReLU networks
We prove that overparametrized neural networks are able to generalize with a test error that is independent of the level of overparametrization, and independent of the Vapnik-Chervonenkis (VC) dimension. We prove explicit bounds that only depend on the metric geometry of the test and training sets, on the regularity properties of the activation function, and on the operator norms of the weights and norms of biases. For overparametrized deep ReLU networks with a training sample size bounded by the input space dimension, we explicitly construct zero loss minimizers without use of gradient descent, and prove that the generalization error is independent of the network architecture.
Updated: 2025-04-09 17:29:05
Domains: cs.LG,cs.AI,math.AP,math.OC,stat.ML,57R70, 62M45
Pruner: A Draft-then-Verify Exploration Mechanism to Accelerate Tensor Program Tuning
Tensor program tuning is essential for the efficient deployment of deep neural networks. Search-based approaches have demonstrated scalability and effectiveness in automatically finding high-performance programs for specific hardware. However, the search process is often inefficient, taking hours or even days to discover optimal programs due to the exploration mechanisms guided by an accurate but slow-learned cost model. Meanwhile, the learned cost model trained on one platform cannot seamlessly adapt online to another, which we call cross-platform online unawareness. In this work, we propose Pruner and MoA-Pruner. Pruner is a "Draft-then-Verify" exploration mechanism that accelerates the schedule search process. Instead of applying the complex learned cost model to all explored candidates, Pruner drafts small-scale potential candidates by introducing a naive Symbol-based Analyzer (draft model), then identifies the best candidates by the learned cost model. MoA-Pruner introduces a Momentum online Adaptation strategy to address the cross-platform online unawareness. We incorporate Pruner into the TVM and conduct extensive experiments on three GPU-based platforms. Results show considerable speedup in schedule search time. In online tuning scenarios, Pruner and MoA-Pruner achieve an average speedup of $2.6 \times$ and $4.82 \times$ compared to Ansor. In offline tuning scenarios, Pruner achieves an average speedup of $4.75 \times$ and $4.05\times$ compared to TenSet and TLP, respectively. Furthermore, Pruner achieves an average speedup of $4.08 \times$ compared to MetaSchedule on TensorCore.
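The "Draft-then-Verify" loop itself is simple to sketch: a cheap draft scorer shortlists candidates, and only the shortlist is ranked by the slow learned cost model. Both scorers below are toy stand-ins, not TVM components.

```python
import random

def draft_score(candidate):
    """Cheap symbol-based analyzer (toy stand-in): fast but approximate."""
    return -abs(candidate - 42)

def learned_cost_model(candidate):
    """Accurate but slow learned model (toy stand-in)."""
    return -abs(candidate - 40)

def draft_then_verify(candidates, keep=5):
    # draft: prune the candidate pool cheaply
    shortlist = sorted(candidates, key=draft_score, reverse=True)[:keep]
    # verify: run the expensive model only on the survivors
    return max(shortlist, key=learned_cost_model)

candidates = random.sample(range(1000), 200)
print(draft_then_verify(candidates))
```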
Updated: 2025-04-09 17:26:08
Domains: cs.LG
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
The rapid development of vision language models (VLMs) demands rigorous and reliable evaluation. However, current visual question answering (VQA) benchmarks often depend on open-ended questions, making accurate evaluation difficult due to the variability in natural language responses. To address this, we introduce AutoConverter, an agentic framework that automatically converts these open-ended questions into multiple-choice format, enabling objective evaluation while reducing the costly multiple-choice question creation process. Our experiments demonstrate that AutoConverter can generate correct and challenging multiple-choice questions, with VLMs demonstrating consistently similar or lower accuracy on these questions compared to human-created ones. Using AutoConverter, we construct VMCBench, a benchmark created by transforming 20 existing VQA datasets into a unified multiple-choice format, totaling 9,018 questions. We comprehensively evaluate 33 state-of-the-art VLMs on VMCBench, setting a new standard for scalable, consistent, and reproducible VLM evaluation.
Updated: 2025-04-09 17:25:07
Domains: cs.CV,cs.AI,cs.CL,cs.CY,cs.LG
$Π$-NeSy: A Possibilistic Neuro-Symbolic Approach
In this article, we introduce a neuro-symbolic approach that combines a low-level perception task performed by a neural network with a high-level reasoning task performed by a possibilistic rule-based system. The goal is to be able to derive for each input instance the degree of possibility that it belongs to a target (meta-)concept. This (meta-)concept is connected to intermediate concepts by a possibilistic rule-based system. The probability of each intermediate concept for the input instance is inferred using a neural network. The connection between the low-level perception task and the high-level reasoning task lies in the transformation of neural network outputs modeled by probability distributions (through softmax activation) into possibility distributions. The use of intermediate concepts is valuable for the explanation purpose: using the rule-based system, the classification of an input instance as an element of the (meta-)concept can be justified by the fact that intermediate concepts have been recognized. From the technical side, our contribution consists of the design of efficient methods for defining the matrix relation and the equation system associated with a possibilistic rule-based system. The corresponding matrix and equation are key data structures used to perform inferences from a possibilistic rule-based system and to learn the values of the rule parameters in such a system according to a training data sample. Furthermore, leveraging recent results on the handling of inconsistent systems of fuzzy relational equations, an approach for learning rule parameters according to multiple training data samples is presented. Experiments carried out on the MNIST addition problems and the MNIST Sudoku puzzles problems highlight the effectiveness of our approach compared with state-of-the-art neuro-symbolic ones.
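The probability-to-possibility step can be illustrated with one standard transformation (the Dubois-Prade mapping, which may differ from the paper's exact choice): each event's possibility is the total probability mass of events no more probable than it, so the most probable event gets possibility 1.

```python
import numpy as np

def prob_to_poss(p):
    """Dubois-Prade transformation: pi_i = sum of all p_j with p_j <= p_i."""
    p = np.asarray(p, dtype=float)
    return np.array([p[p <= pi].sum() for pi in p])

softmax_out = np.array([0.5, 0.3, 0.15, 0.05])   # neural-network probabilities
print(prob_to_poss(softmax_out))                 # [1.0, 0.5, 0.2, 0.05]
```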
Updated: 2025-04-09 17:16:23
Domains: cs.AI,cs.LG,cs.LO
Beyond the Hype: Embeddings vs. Prompting for Multiclass Classification Tasks
Are traditional classification approaches irrelevant in this era of AI hype? We show that there are multiclass classification problems where predictive models holistically outperform LLM prompt-based frameworks. Given text and images from home-service project descriptions provided by Thumbtack customers, we build embeddings-based softmax models that predict the professional category (e.g., handyman, bathroom remodeling) associated with each problem description. We then compare against prompts that ask state-of-the-art LLM models to solve the same problem. We find that the embeddings approach outperforms the best LLM prompts in terms of accuracy, calibration, latency, and financial cost. In particular, the embeddings approach has 49.5% higher accuracy than the prompting approach, and its superiority is consistent across text-only, image-only, and text-image problem descriptions. Furthermore, it yields well-calibrated probabilities, which we later use as confidence signals to provide contextualized user experience during deployment. On the contrary, prompting scores are overly uninformative. Finally, the embeddings approach is 14 and 81 times faster than prompting in processing images and text respectively, while under realistic deployment assumptions, it can be up to 10 times cheaper. Based on these results, we deployed a variation of the embeddings approach, and through A/B testing we observed performance consistent with our offline analysis. Our study shows that for multiclass classification problems that can leverage proprietary datasets, an embeddings-based approach may yield unequivocally better results. Hence, scientists, practitioners, engineers, and business leaders can use our study to go beyond the hype and consider appropriate predictive models for their classification use cases.
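A minimal sketch of the embeddings-based side of the comparison: precomputed embeddings feed a multinomial logistic (softmax) classifier whose predicted probabilities double as confidence signals. The random cluster data below stands in for Thumbtack's proprietary embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, k = 600, 64, 3                             # examples, embedding dim, categories
centers = rng.normal(size=(k, d))
y = rng.integers(0, k, size=n)
X = centers[y] + 0.5 * rng.normal(size=(n, d))   # stand-in for text/image embeddings

clf = LogisticRegression(max_iter=1000).fit(X[:500], y[:500])
proba = clf.predict_proba(X[500:])               # calibrated-ish confidence signals
print("accuracy:", clf.score(X[500:], y[500:]))
print("mean confidence of top prediction:", proba.max(axis=1).mean())
```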
Updated: 2025-04-09 17:15:47
Domains: cs.LG,cs.AI,cs.CL,stat.AP
Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
This paper introduces a novel task to evaluate the robust understanding capability of Large Multimodal Models (LMMs), termed $\textbf{Unsolvable Problem Detection (UPD)}$. Multiple-choice question answering (MCQA) is widely used to assess the understanding capability of LMMs, but it does not guarantee that LMMs truly comprehend the answer. UPD assesses the LMM's ability to withhold answers when encountering unsolvable problems of MCQA, verifying whether the model truly understands the answer. UPD encompasses three problems: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Incompatible Visual Question Detection (IVQD), covering unsolvable cases like answer-lacking or incompatible choices and image-question mismatches. For the evaluation, we introduce the MM-UPD Bench, a benchmark for assessing performance across various ability dimensions. Our experiments reveal that most LMMs, even those that demonstrate adequate performance on existing benchmarks, struggle significantly with MM-UPD, underscoring a novel aspect of trustworthiness that current benchmarks have overlooked. A detailed analysis shows that LMMs have different bottlenecks, and that chain-of-thought prompting and self-reflection improve performance for LMMs whose bottleneck lies in their LLM capability. We hope our insights will enhance the broader understanding and development of more reliable LMMs.
Updated: 2025-04-09 17:13:27
Domains: cs.CV,cs.AI,cs.CL,cs.LG
To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning
Recent advancements in large language models have significantly improved their reasoning abilities, particularly through techniques involving search and backtracking. Backtracking naturally scales test-time compute by enabling sequential, linearized exploration via long chain-of-thought (CoT) generation. However, this is not the only strategy for scaling test-time compute: parallel sampling with best-of-n selection provides an alternative that generates diverse solutions simultaneously. Despite the growing adoption of sequential search, its advantages over parallel sampling--especially under a fixed compute budget--remain poorly understood. In this paper, we systematically compare these two approaches on two challenging reasoning tasks: CountDown and Sudoku. Surprisingly, we find that sequential search underperforms parallel sampling on CountDown but outperforms it on Sudoku, suggesting that backtracking is not universally beneficial. We identify two factors that can cause backtracking to degrade performance: (1) training on fixed search traces can lock models into suboptimal strategies, and (2) explicit CoT supervision can discourage "implicit" (non-verbalized) reasoning. Extending our analysis to reinforcement learning (RL), we show that models with backtracking capabilities benefit significantly from RL fine-tuning, while models without backtracking see limited, mixed gains. Together, these findings challenge the assumption that backtracking universally enhances LLM reasoning, instead revealing a complex interaction between task structure, training data, model scale, and learning paradigm.
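The parallel baseline is easy to state precisely; here is a toy sketch of best-of-n selection with a verifier score. The "model" is a random solver, so the numbers only illustrate how pass rates scale with n, not any real LLM behavior.

```python
import random

def solve_once(rng, quality=0.3):
    """Toy 'sample': returns (is_correct, verifier_score)."""
    score = rng.random()
    return score > (1 - quality), score

def best_of_n(rng, n):
    samples = [solve_once(rng) for _ in range(n)]
    return max(samples, key=lambda s: s[1])[0]   # keep the sample the verifier prefers

rng = random.Random(0)
trials = 2000
print("pass@best-of-8:", sum(best_of_n(rng, 8) for _ in range(trials)) / trials)
print("pass@1:", sum(solve_once(rng)[0] for _ in range(trials)) / trials)
```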
Updated: 2025-04-09 17:12:49
Domains: cs.LG
Context Switching for Secure Multi-programming of Near-Term Quantum Computers
Multi-programming quantum computers improve device utilization and throughput. However, crosstalk from concurrent two-qubit CNOT gates poses security risks, compromising the fidelity and output of co-running victim programs. We design Zero Knowledge Tampering Attacks (ZKTAs), with which attackers can exploit crosstalk without knowledge of the hardware error profile. ZKTAs can alter victim program outputs in 40% of cases on commercial systems. We identify that ZKTAs succeed because the attacker's program consistently runs with the same victim program in a fixed context. To mitigate this, we propose QONTEXTS: a context-switching technique that defends against ZKTAs by running programs across multiple contexts, each handling only a subset of trials. QONTEXTS uses multi-programming with frequent context switching while identifying a unique set of programs for each context. This helps limit only a fraction of execution to ZKTAs. We enhance QONTEXTS with attack detection capabilities that compare the distributions from different contexts against each other to identify noisy contexts executed with ZKTAs. Our evaluations on real IBMQ systems show that QONTEXTS increases program resilience by three orders of magnitude and fidelity by 1.33$\times$ on average. Moreover, QONTEXTS improves throughput by 2$\times$, advancing security in multi-programmed environments.
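A sketch of the detection idea under illustrative assumptions: split a program's trials across contexts and flag any context whose empirical output distribution deviates from the rest, here using total-variation distance and a hand-picked threshold on toy data.

```python
from collections import Counter

def tv_distance(c1, c2):
    """Total-variation distance between two empirical bitstring distributions."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    keys = set(c1) | set(c2)
    return 0.5 * sum(abs(c1[k] / n1 - c2[k] / n2) for k in keys)

# per-context measurement counts for the same victim circuit (toy data)
contexts = [
    Counter({"00": 480, "11": 470, "01": 30, "10": 20}),    # clean
    Counter({"00": 490, "11": 460, "01": 28, "10": 22}),    # clean
    Counter({"00": 300, "11": 290, "01": 210, "10": 200}),  # crosstalk-suspect
]
for i, c in enumerate(contexts):
    others = Counter()
    for j, o in enumerate(contexts):
        if j != i:
            others.update(o)
    d = tv_distance(c, others)
    print(f"context {i}: TV vs. rest = {d:.3f}", "<- flag" if d > 0.2 else "")
```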
Updated: 2025-04-09 17:05:16
Domains: cs.CR,cs.ET
Efficient Storage Integrity in Adversarial Settings
Storage integrity is essential to systems and applications that use untrusted storage (e.g., public clouds, end-user devices). However, known methods for achieving storage integrity either suffer from high (and often prohibitive) overheads or provide weak integrity guarantees. In this work, we demonstrate a hybrid approach to storage integrity that simultaneously reduces overhead while providing strong integrity guarantees. Our system, partially asynchronous integrity checking (PAC), allows disk write commitments to be deferred while still providing guarantees around read integrity. PAC delivers a 5.5X throughput and latency improvement over the state of the art, and 85% of the throughput achieved by non-integrity-assuring approaches. In this way, we show that untrusted storage can be used for integrity-critical workloads without meaningfully sacrificing performance.
Updated: 2025-04-09 16:58:22
Domains: cs.CR
MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs
The increasing interest in developing Artificial Intelligence applications in the medical domain suffers from the lack of high-quality data sets, mainly due to privacy-related issues. In addition, the recent rise of Vision Language Models (VLMs) leads to the need for multimodal medical data sets, where clinical reports and findings are attached to the corresponding medical scans. This paper illustrates the entire workflow for building the MedPix 2.0 data set. Starting with the well-known multimodal data set MedPix, mainly used by physicians, nurses, and healthcare students for Continuing Medical Education purposes, a semi-automatic pipeline was developed to extract visual and textual data, followed by a manual curation procedure in which noisy samples were removed, thus creating a MongoDB database. Along with the data set, we developed a Graphical User Interface aimed at navigating the MongoDB instance efficiently and obtaining the raw data that can be easily used for training and/or fine-tuning VLMs. To enforce this point, in this work, we first recall DR-Minerva, a Retrieval Augmented Generation-based VLM model trained upon MedPix 2.0. DR-Minerva predicts the body part and the modality used to scan its input image. We also propose the extension of DR-Minerva with a Knowledge Graph that uses Llama 3.1 Instruct 8B, and leverages MedPix 2.0. The resulting architecture can be queried in an end-to-end manner, as a medical decision support system. MedPix 2.0 is available on GitHub https://github.com/CHILab1/MedPix-2.0
Updated: 2025-04-09 16:57:40
标题: MedPix 2.0:用于先进AI应用的综合多模态生物医学数据集,具有检索增强生成和知识图
摘要: 在医学领域开发人工智能应用的兴趣日益增加,但由于隐私问题导致缺乏高质量数据集。此外,最近视觉语言模型(VLM)的增加增加了对多模态医学数据集的需求,其中临床报告和发现与相应的医学扫描相关联。本文介绍了构建MedPix 2.0数据集的整个工作流程。从广泛用于医生、护士和医疗学生进行继续医学教育的著名多模态数据集MedPix开始,开发了一个半自动化流水线,用于提取视觉和文本数据,然后进行手动处理程序,去除噪音样本,从而创建一个MongoDB数据库。除了数据集,我们还开发了一个图形用户界面,旨在有效浏览MongoDB实例并获取原始数据,这些数据可以轻松用于训练和/或微调VLM。为了强调这一点,在这项工作中,我们首先回顾了基于MedPix 2.0训练的Retrieve Augmented Generation VLM模型DR-Minerva。DR-Minerva预测了输入图像的身体部位和使用的扫描方式。我们还提出了将DR-Minerva扩展为使用Llama 3.1 Instruct 8B的知识图,并利用MedPix 2.0。最终的架构可以被查询为端到端方式,作为医疗决策支持系统。MedPix 2.0可在GitHub上找到https://github.com/CHILab1/MedPix-2.0。
更新时间: 2025-04-09 16:57:40
领域: cs.DB,cs.AI,cs.LG
Identifying Key Challenges of Hardness-Based Resampling
Performance gap across classes remains a persistent challenge in machine learning, often attributed to variations in class hardness. One way to quantify class hardness is through sample complexity - the minimum number of samples required to effectively learn a given class. Sample complexity theory suggests that class hardness is driven by differences in the amount of data required for generalization. That is, harder classes need substantially more samples to achieve generalization. Therefore, hardness-based resampling is a promising approach to mitigate these performance disparities. While resampling has been studied extensively in data-imbalanced settings, its impact on balanced datasets remains unexplored. This raises the fundamental question whether resampling is effective because it addresses data imbalance or hardness imbalance. We begin addressing this question by introducing class imbalance into balanced datasets and evaluate its effect on performance disparities. We oversample hard classes and undersample easy classes to bring hard classes closer to their sample complexity requirements while maintaining a constant dataset size for fairness. We estimate class-level hardness using the Area Under the Margin (AUM) hardness estimator and leverage it to compute resampling ratios. Using these ratios, we perform hardness-based resampling on the well-known CIFAR-10 and CIFAR-100 datasets. Contrary to theoretical expectations, our results show that hardness-based resampling does not meaningfully affect class-wise performance disparities. To explain this discrepancy, we conduct detailed analyses to identify key challenges unique to hardness-based imbalance, distinguishing it from traditional data-based imbalance. Our insights help explain why theoretical sample complexity expectations fail to translate into practical performance gains and we provide guidelines for future research.
Updated: 2025-04-09 16:45:57
标题: 确定基于硬度的重采样的关键挑战
摘要: 不同类别之间的性能差距仍然是机器学习中一项持续挑战,通常被归因于类别难度的变化。量化类别难度的一种方式是通过样本复杂度 - 有效学习给定类别所需的最少样本数。样本复杂度理论表明,类别难度是由泛化所需数据量的差异驱动的。也就是说,更难的类别需要更多的样本才能实现泛化。因此,基于难度的重采样是缓解这些性能差异的一种有希望的方法。虽然在数据不平衡的情况下对重采样进行了广泛研究,但其对平衡数据集的影响尚未被探索。 这引发了一个基本问题,即重采样是否有效,因为它解决了数据不平衡还是难度不平衡。我们开始通过将类别不平衡引入平衡数据集来回答这个问题,并评估其对性能差异的影响。我们对难类别进行过采样,对易类别进行欠采样,以使难类别接近其样本复杂度要求,同时保持一个恒定的数据集大小以确保公平。我们使用区域边缘下面积(AUM)难度估计器估计类别级难度,并利用它计算重采样比例。利用这些比例,我们对著名的CIFAR-10和CIFAR-100数据集进行基于难度的重采样。 与理论预期相反,我们的结果表明,基于难度的重采样并没有对类别性能差异产生实质性影响。为了解释这一差异,我们进行了详细分析,以确定难度不平衡独特的关键挑战,将其与传统数据不平衡进行区分。我们的见解有助于解释为什么理论上的样本复杂度期望未能转化为实际性能提升,并为未来研究提供指导。
更新时间: 2025-04-09 16:45:57
领域: cs.LG
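A minimal sketch of budget-preserving, hardness-based resampling as described above (assuming hardness scores already oriented so that larger means harder; the paper derives them from AUM):
    import numpy as np

    def resampling_counts(hardness, n_per_class):
        """Allocate a fixed total budget across classes in proportion
        to a hardness score, so hard classes are oversampled and easy
        ones undersampled while the dataset size stays constant."""
        hardness = np.asarray(hardness, dtype=float)
        total = int(np.sum(n_per_class))
        counts = np.round(hardness / hardness.sum() * total).astype(int)
        counts[-1] += total - counts.sum()     # absorb rounding drift
        return counts

    # Three balanced classes, class 0 hardest:
    print(resampling_counts([0.6, 0.25, 0.15], [1000, 1000, 1000]))
    # -> [1800  750  450]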
Using ML filters to help automated vulnerability repairs: when it helps and when it doesn't
[Context:] The acceptance of candidate patches in automated program repair has typically been based on testing oracles. Testing typically requires a costly process of building the application, while ML models can be used to quickly classify patches, thus allowing more candidate patches to be generated in a positive feedback loop. [Problem:] If the model predictions are unreliable (as in vulnerability detection), they can hardly replace the more reliable oracles based on testing. [New Idea:] We propose to use an ML model as a preliminary filter of candidate patches, placed in front of a traditional filter based on testing. [Preliminary Results:] We identify theoretical bounds on the precision and recall of the ML algorithm that make such an operation meaningful in practice. With these bounds and the results published in the literature, we calculate how fast some state-of-the-art vulnerability detectors must be to be more effective than a traditional AVR pipeline such as APR4Vuln, which is based solely on testing.
Updated: 2025-04-09 16:39:09
标题: 使用机器学习过滤器帮助自动化漏洞修复:何时有帮助,何时无帮助
摘要: [背景:]自动程序修复中候选补丁的接受通常基于测试预测。测试通常需要耗费大量时间来构建应用程序,而机器学习模型可以快速分类补丁,从而允许在正反馈循环中生成更多候选补丁。[问题:]如果模型预测不可靠(如在漏洞检测中),则很难取代基于测试的更可靠预测。[新想法:]我们提出使用机器学习模型作为候选补丁的初步筛选器,放在传统基于测试的筛选器之前。[初步结果:]我们确定了机器学习算法的精度和召回率的一些理论界限,使得这种操作在实践中具有意义。根据这些界限和文献中发布的结果,我们计算出一些最先进的漏洞检测器必须有多快才能比仅基于测试的传统AVR管道(如基于测试的APR4Vuln)更有效。
更新时间: 2025-04-09 16:39:09
领域: cs.SE,cs.CR,cs.LG
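A simplified break-even model of the proposed filter-then-test pipeline fits in a few lines; this is a rough cost sketch under the stated assumptions, not the paper's exact bounds:
    def filter_helps(c_ml, c_test, recall, pass_rate):
        """Simplified break-even model for an ML pre-filter in front
        of a testing oracle: each candidate costs c_ml, and a fraction
        pass_rate of candidates still reaches the test build; since
        only `recall` of the true fixes survive the filter, roughly
        1/recall more candidates must be generated per fix found."""
        cost_with_filter = (c_ml + pass_rate * c_test) / recall
        return cost_with_filter < c_test

    # Cheap classifier, strict filter, decent recall -> worthwhile:
    print(filter_helps(c_ml=0.01, c_test=1.0, recall=0.9, pass_rate=0.3))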
Countering threats to national security posed by AI systems through an incident regime
Recent progress in AI capabilities has heightened concerns that AI systems could pose a threat to national security, for example, by making it easier for malicious actors to perform cyberattacks on critical national infrastructure, or through loss of control of autonomous AI systems. In parallel, federal legislators in the US have proposed nascent 'AI incident regimes' to identify and counter similar threats. In this paper, we consolidate these two trends and present a proposal for a legally mandated post-deployment AI incident regime that aims to counter potential national security threats from AI systems. We start the paper by introducing the concept of 'security-critical' to describe sectors that pose extreme risks to national security, before arguing that 'security-critical' describes civilian nuclear power, aviation, life science dual-use research of concern, and frontier AI development. We then present in detail our AI incident regime proposal, justifying each component of the proposal by demonstrating its similarity to US domestic incident regimes in other 'security-critical' sectors. Finally, we sketch a hypothetical scenario where our proposed AI incident regime deals with an AI cyber incident. Our proposed AI incident regime is split into three phases. The first phase revolves around a novel operationalization of what counts as an 'AI incident', and we suggest that AI providers must create a 'national security case' before deploying a frontier AI system. The second and third phases spell out that AI providers should notify a government agency about incidents, and that the government agency should be involved in amending AI providers' security and safety procedures, in order to counter future threats to national security.
Updated: 2025-04-09 16:36:30
标题: 通过事故制度应对人工智能系统对国家安全构成的威胁
摘要: 人工智能能力的最新进展引起了人们对人工智能系统可能对国家安全构成威胁的担忧,例如,通过使恶意行为者更容易对关键国家基础设施进行网络攻击,或者通过失去对自主人工智能系统的控制。与此同时,美国联邦立法者提出了初步的“人工智能事件制度”,以识别和应对类似威胁。在本文中,我们整合了这两种趋势,并提出了一个法律规定的部署后人工智能事件制度的建议,旨在应对人工智能系统可能带来的国家安全威胁。我们首先介绍了“安全关键”概念,用以描述对国家安全构成极端风险的部门,然后论证认为,“安全关键”包括平民核能、航空、生命科学双用研究以及前沿人工智能发展。接着我们详细介绍了人工智能事件制度建议,并通过展示其与美国其他“安全关键”领域的国内事件制度的相似之处来证明建议的每个组成部分。最后,我们勾勒了一个假设情景,展示了我们提出的人工智能事件制度如何处理人工智能网络事件。我们提出的人工智能事件制度分为三个阶段。第一阶段围绕着对“人工智能事件”如何计算的新颖运作方式,我们建议人工智能提供商在部署前必须建立一个“国家安全案例”。第二和第三阶段明确表示人工智能提供商应该向政府机构通报事件,政府机构应参与修订人工智能提供商的安全程序,以应对未来对国家安全的威胁。
更新时间: 2025-04-09 16:36:30
领域: cs.CY,cs.AI
ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes
Secure speculation schemes have shown great promise in the war against speculative side-channel attacks, and will be a key building block for developing secure, high-performance architectures moving forward. As the field matures, the need for rigorous microarchitectures, and corresponding performance and cost analysis, becomes critical for evaluating secure schemes and for enabling their future adoption. In ShadowBinding, we present effective microarchitectures for two state-of-the-art secure schemes, uncovering and mitigating fundamental microarchitectural limitations within the analyzed schemes, and providing important design characteristics. We uncover that Speculative Taint Tracking's (STT's) rename-based taint computation must be completed in a single cycle, creating an expensive dependency chain which greatly limits performance for wider processor cores. We also introduce a novel microarchitectural approach for STT, named STT-Issue, which, by delaying the taint computation to the issue stage, eliminates the dependency chain, achieving better instructions per cycle (IPC), timing, area, and performance results. Through a comprehensive evaluation of our STT and Non-Speculative Data Access (NDA) microarchitectural designs on the RISC-V Berkeley Out-of-Order Machine, we find that the IPC impact of in-core secure schemes is higher than previously estimated, close to 20% for the highest performance core. With insights into timing from our RTL evaluation, the performance loss, created by the combined impact of IPC and timing, becomes even greater, at 35%, 27%, and 22% for STT-Rename, STT-Issue, and NDA, respectively. If these trends were to hold for leading processor core designs, the performance impact would be well over 30%, even for the best-performing scheme.
Updated: 2025-04-09 16:33:42
标题: ShadowBinding: 实现用于核心内安全推测方案的有效微架构
摘要: Secure speculation schemes have shown significant potential in combating speculative side-channel attacks, and will be crucial for the development of secure, high-performance architectures in the future. As the field progresses, the importance of rigorous microarchitectures, as well as performance and cost analysis, becomes essential for evaluating secure schemes and facilitating their adoption. In ShadowBinding, we introduce efficient microarchitectures for two cutting-edge secure schemes, identifying and addressing fundamental microarchitectural limitations in the analyzed schemes, and highlighting key design characteristics. We discover that Speculative Taint Tracking's (STT) rename-based taint computation must be completed within a single cycle, leading to a costly dependency chain that significantly hinders performance in wider processor cores. We also propose a novel microarchitectural approach for STT, called STT-Issue, which eliminates the dependency chain by delaying the taint computation to the issue stage, resulting in improved instructions per cycle (IPC), timing, area, and performance outcomes. Through a thorough evaluation of our STT and Non-Speculative Data Access (NDA) microarchitectural designs on the RISC-V Berkeley Out-of-Order Machine, we observe that the IPC impact of in-core secure schemes is higher than previously estimated, reaching nearly 20% for the top-performing core. Our RTL evaluation reveals that the performance loss, due to the combined effects of IPC and timing, is even more significant, at 35%, 27%, and 22% for STT-Rename, STT-Issue, and NDA, respectively. If these trends hold true for leading processor core designs, the performance impact could exceed 30%, even for the most effective scheme.
更新时间: 2025-04-09 16:33:42
领域: cs.CR,cs.AR
Adapting GT2-FLS for Uncertainty Quantification: A Blueprint Calibration Strategy
Uncertainty Quantification (UQ) is crucial for deploying reliable Deep Learning (DL) models in high-stakes applications. Recently, General Type-2 Fuzzy Logic Systems (GT2-FLSs) have been proven to be effective for UQ, offering Prediction Intervals (PIs) to capture uncertainty. However, existing methods often struggle with computational efficiency and adaptability, as generating PIs for new coverage levels $(\phi_d)$ typically requires retraining the model. Moreover, methods that directly estimate the entire conditional distribution for UQ are computationally expensive, limiting their scalability in real-world scenarios. This study addresses these challenges by proposing a blueprint calibration strategy for GT2-FLSs, enabling efficient adaptation to any desired $\phi_d$ without retraining. By exploring the relationship between $\alpha$-plane type reduced sets and uncertainty coverage, we develop two calibration methods: a lookup table-based approach and a derivative-free optimization algorithm. These methods allow GT2-FLSs to produce accurate and reliable PIs while significantly reducing computational overhead. Experimental results on high-dimensional datasets demonstrate that the calibrated GT2-FLS achieves superior performance in UQ, highlighting its potential for scalable and practical applications.
Updated: 2025-04-09 16:32:43
标题: 调整GT2-FLS用于不确定性量化:一种蓝图校准策略
摘要: 不确定性量化(UQ)对于在高风险应用中部署可靠的深度学习(DL)模型至关重要。最近,广义二型模糊逻辑系统(GT2-FLSs)已被证明对于UQ非常有效,提供预测区间(PIs)来捕捉不确定性。然而,现有的方法通常在计算效率和适应性方面存在困难,因为生成新覆盖水平(φd)的PIs通常需要重新训练模型。此外,直接估计整个UQ条件分布的方法在计算上昂贵,限制了它们在现实场景中的可扩展性。本研究通过提出一种蓝图校准策略来解决这些挑战,使GT2-FLSs能够有效地适应任何所需的φd而无需重新训练。通过探讨α-平面型缩减集和不确定性覆盖之间的关系,我们开发了两种校准方法:基于查找表的方法和无导数优化算法。这些方法使GT2-FLSs能够产生准确可靠的PIs,同时显著减少计算开销。对高维数据集的实验结果表明,经过校准的GT2-FLS在UQ中表现出优越性能,凸显了其在可扩展和实际应用中的潜力。
更新时间: 2025-04-09 16:32:43
领域: cs.LG
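A minimal sketch of the lookup-table flavor of this calibration: measure once, on a calibration set, the empirical PI coverage achieved by a few alpha-plane values, then interpolate to serve any desired coverage phi_d without retraining. The table values and the monotone alpha-coverage link below are illustrative assumptions:
    import numpy as np

    def build_coverage_table(alphas, covers):
        """Measure once, on a calibration set, the empirical PI
        coverage achieved by a few alpha-plane values; store sorted."""
        order = np.argsort(covers)
        return (np.asarray(covers, float)[order],
                np.asarray(alphas, float)[order])

    def alpha_for_coverage(phi_d, covers, alphas):
        """Interpolate the table to serve any desired coverage phi_d
        without retraining the GT2-FLS."""
        return np.interp(phi_d, covers, alphas)

    covers, alphas = build_coverage_table(
        alphas=[0.1, 0.3, 0.5, 0.7, 0.9],
        covers=[0.99, 0.95, 0.90, 0.82, 0.70])
    print(alpha_for_coverage(0.93, covers, alphas))   # ~0.38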
LLM-IFT: LLM-Powered Information Flow Tracking for Secure Hardware
As modern hardware designs grow in complexity and size, ensuring security across the confidentiality, integrity, and availability (CIA) triad becomes increasingly challenging. Information flow tracking (IFT) is a widely-used approach to tracing data propagation, identifying unauthorized activities that may compromise confidentiality and/or integrity in hardware. However, traditional IFT methods struggle with scalability and adaptability, particularly in high-density and interconnected architectures, leading to tracing bottlenecks that limit applicability in large-scale hardware. To address these limitations and show the potential of transformer-based models in integrated circuit (IC) design, this paper introduces LLM-IFT that integrates large language models (LLM) for the realization of the IFT process in hardware. LLM-IFT exploits LLM-driven structured reasoning to perform hierarchical dependency analysis, systematically breaking down even the most complex designs. Through a multi-step LLM invocation, the framework analyzes both intra-module and inter-module dependencies, enabling comprehensive IFT assessment. By focusing on a set of Trust-Hub vulnerability test cases at both the IP level and the SoC level, our experiments demonstrate a 100% success rate in accurate IFT analysis for confidentiality and integrity checks in hardware.
Updated: 2025-04-09 16:32:13
标题: LLM-IFT:LLM 动力信息流跟踪用于安全硬件
摘要: 随着现代硬件设计在复杂性和规模上的增长,确保跨机密性、完整性和可用性(CIA)三元素的安全性变得越来越具有挑战性。信息流跟踪(IFT)是一种广泛使用的方法,用于跟踪数据传播,识别可能危及硬件机密性和/或完整性的未经授权活动。然而,传统的IFT方法在可扩展性和适应性方面存在困难,特别是在高密度和互联架构中,导致跟踪瓶颈,限制在大规模硬件中的适用性。为了解决这些限制并展示变压器型模型在集成电路(IC)设计中的潜力,本文介绍了LLM-IFT,该方法将大型语言模型(LLM)集成到硬件中实现IFT过程。LLM-IFT利用LLM驱动的结构推理来执行分层依赖性分析,系统地分解甚至最复杂的设计。通过多步LLM调用,该框架分析模块内部和模块间的依赖关系,实现全面的IFT评估。通过专注于IP级别和SoC级别的一组Trust-Hub漏洞测试案例,我们的实验表明,在硬件中对机密性和完整性检查进行精确IFT分析的成功率达到100%。
更新时间: 2025-04-09 16:32:13
领域: cs.CR
FAME: Introducing Fuzzy Additive Models for Explainable AI
In this study, we introduce the Fuzzy Additive Model (FAM) and FAM with Explainability (FAME) as a solution for Explainable Artificial Intelligence (XAI). The family consists of three layers: (1) a Projection Layer that compresses the input space, (2) a Fuzzy Layer built upon Single Input-Single Output Fuzzy Logic Systems (SFLS), where SFLS functions as subnetworks within an additive index model, and (3) an Aggregation Layer. This architecture integrates the interpretability of SFLS, which uses human-understandable if-then rules, with the explainability of input-output relationships, leveraging the additive model structure. Furthermore, using SFLS inherently addresses issues such as the curse of dimensionality and rule explosion. To further improve interpretability, we propose a method for sculpting antecedent space within FAM, transforming it into FAME. We show that FAME captures the input-output relationships with fewer active rules, thus improving clarity. To learn the FAM family, we present a deep learning framework. Through the presented comparative results, we demonstrate the promising potential of FAME in reducing model complexity while retaining interpretability, positioning it as a valuable tool for XAI.
Updated: 2025-04-09 16:29:55
标题: FAME:引入模糊加法模型用于可解释的人工智能
摘要: 在这项研究中,我们介绍了模糊加法模型(FAM)和具有可解释性的FAM(FAME)作为可解释人工智能(XAI)的解决方案。该家族包括三个层次:(1)一个投影层,用于压缩输入空间,(2)建立在单输入-单输出模糊逻辑系统(SFLS)之上的模糊层,其中SFLS作为加法指数模型中的子网络,以及(3)一个聚合层。该架构整合了SFLS的可解释性,使用人类可理解的if-then规则,以及输入-输出关系的可解释性,利用了加法模型结构。此外,使用SFLS固有地解决了维度灾难和规则爆炸等问题。为了进一步提高可解释性,我们提出了一种在FAM中雕刻前提空间的方法,将其转变为FAME。我们展示了FAME能够以更少的活跃规则捕捉输入-输出关系,从而提高清晰度。为了学习FAM家族,我们提出了一个深度学习框架。通过所呈现的比较结果,我们展示了FAME在减少模型复杂性的同时保留可解释性的有望潜力,将其定位为XAI的有价值工具。
更新时间: 2025-04-09 16:29:55
领域: cs.LG
Assumption-free fidelity bounds for hardware noise characterization
In the Quantum Supremacy regime, quantum computers may overcome classical machines on several tasks if we can estimate, mitigate, or correct unavoidable hardware noise. Estimating the error requires classical simulations, which become unfeasible in the Quantum Supremacy regime. We leverage Machine Learning data-driven approaches and Conformal Prediction, a Machine Learning uncertainty quantification tool known for its mild assumptions and finite-sample validity, to find theoretically valid upper bounds of the fidelity between noiseless and noisy outputs of quantum devices. Under reasonable extrapolation assumptions, the proposed scheme applies to any Quantum Computing hardware, does not require modeling the device's noise sources, and can be used when classical simulations are unavailable, e.g. in the Quantum Supremacy regime.
Updated: 2025-04-09 16:27:52
标题: 无假设的硬件噪声特性的保真度界限
摘要: 在量子霸权领域,如果我们能够估计、减轻或纠正不可避免的硬件噪声,量子计算机可能在几个任务上超越经典机器。估计误差需要经典仿真,在量子霸权领域变得不可行。我们利用机器学习数据驱动方法和符合预测,这是一种机器学习不确定性量化工具,以其温和的假设和有限样本有效性而闻名,以找到理论上有效的无噪声和有噪声量子设备输出之间保真度的上限。在合理的外推假设下,所提出的方案适用于任何量子计算硬件,不需要对设备的噪声源进行建模,并且当经典仿真不可用时,例如在量子霸权领域,可以使用。
更新时间: 2025-04-09 16:27:52
领域: quant-ph,cs.LG,stat.ML
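The core conformal step can be sketched directly: with exchangeable calibration scores (e.g., fidelities measured on circuits that are still classically simulable), a one-sided split-conformal bound is just an order statistic. This is the generic conformal recipe, not the paper's full extrapolation scheme:
    import numpy as np

    def conformal_upper_bound(scores, alpha=0.1):
        """One-sided split-conformal bound: assuming exchangeability,
        a fresh score falls below the ceil((n+1)(1-alpha))-th order
        statistic of the n calibration scores with probability at
        least 1 - alpha. No model of the noise sources is needed."""
        s = np.sort(np.asarray(scores, float))
        n = len(s)
        k = int(np.ceil((n + 1) * (1 - alpha)))
        return s[min(k, n) - 1]

    rng = np.random.default_rng(0)
    fid = rng.beta(50, 2, size=500)     # toy calibration fidelities
    print("fidelity upper bound:", conformal_upper_bound(fid))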
Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback
The scientific research paradigm is undergoing a profound transformation owing to the development of Artificial Intelligence (AI). Recent works demonstrate that various AI-assisted research methods can largely improve research efficiency by improving data analysis, accelerating computation, and fostering novel idea generation. To further move towards the ultimate goal (i.e., automatic scientific research), in this paper, we introduce Dolphin, a closed-loop LLM-driven framework to enhance the automation level of scientific research. Dolphin first generates novel ideas based on feedback from previous experiments and relevant papers ranked by the topic and task attributes. Then, the generated ideas can be implemented using a code template refined and debugged with the designed exception-traceback-guided local code structure. Finally, Dolphin automatically analyzes the results of each idea and feeds the results back to the next round of idea generation. Experiments are conducted on the benchmark datasets of different topics and a subset of MLE-bench. Results show that Dolphin can continuously improve the performance of the input topic in a loop. We highlight that Dolphin can automatically propose methods that are comparable to the state-of-the-art in some tasks such as 3D point classification.
Updated: 2025-04-09 16:27:02
标题: 海豚:通过思考、实践和反馈迈向闭环自动研究
摘要: 科学研究范式正在经历深刻的转变,这要归功于人工智能(AI)的发展。最近的研究表明,各种基于AI的研究方法可以通过改善数据分析、加速计算和促进新颖想法的生成,大大提高研究效率。为了进一步实现最终目标(即自动科学研究),本文介绍了Dolphin,一个闭环LLM驱动的框架,以提高科学研究的自动化水平。Dolphin首先基于先前实验和相关论文的反馈生成新颖想法,这些论文根据主题和任务属性排名。然后,生成的想法可以使用一个经过设计的异常-回溯引导的本地代码结构的代码模板来实现和调试。最后,Dolphin自动分析每个想法的结果,并将结果反馈给下一轮想法生成。实验在不同主题的基准数据集和MLE-bench的子集上进行。结果显示,Dolphin可以在循环中持续改善输入主题的性能。我们强调,Dolphin可以自动提出一些任务中与最先进技术相媲美的方法,例如3D点分类。
更新时间: 2025-04-09 16:27:02
领域: cs.AI,cs.CL,cs.CV
A Deep Generative Learning Approach for Two-stage Adaptive Robust Optimization
Two-stage adaptive robust optimization (ARO) is a powerful approach for planning under uncertainty, balancing first-stage decisions with recourse decisions made after uncertainty is realized. To account for uncertainty, modelers typically define a simple uncertainty set over which potential outcomes are considered. However, classical methods for defining these sets unintentionally capture a wide range of unrealistic outcomes, resulting in overly-conservative and costly planning in anticipation of unlikely contingencies. In this work, we introduce AGRO, a solution algorithm that performs adversarial generation for two-stage adaptive robust optimization using a variational autoencoder. AGRO generates high-dimensional contingencies that are simultaneously adversarial and realistic, improving the robustness of first-stage decisions at a lower planning cost than standard methods. To ensure generated contingencies lie in high-density regions of the uncertainty distribution, AGRO defines a tight uncertainty set as the image of "latent" uncertainty sets under the VAE decoding transformation. Projected gradient ascent is then used to maximize recourse costs over the latent uncertainty sets by leveraging differentiable optimization methods. We demonstrate the cost-efficiency of AGRO by applying it to both a synthetic production-distribution problem and a real-world power system expansion setting. We show that AGRO outperforms the standard column-and-constraint algorithm by up to 1.8% in production-distribution planning and up to 11.6% in power system expansion.
Updated: 2025-04-09 16:24:02
标题: 一种用于两阶段自适应稳健优化的深度生成学习方法
摘要: 两阶段自适应鲁棒优化(ARO)是一种强大的规划方法,用于在不确定性下进行平衡一阶决策与不确定性实现后的追溯决策。为了考虑不确定性,建模者通常在一个简单的不确定性集合上定义潜在结果。然而,传统方法定义这些集合时无意中捕捉到了一系列不切实际的结果,导致过于保守和昂贵的规划,以应对不太可能发生的情况。在本文中,我们介绍了AGRO,一种利用变分自动编码器进行对抗生成的二阶段自适应鲁棒优化的解决算法。AGRO生成高维度的对抗性和现实性并存的情景,提高了一阶决策的鲁棒性,而且规划成本低于标准方法。为了确保生成的情景位于不确定性分布的高密度区域,AGRO将一个紧密的不确定性集合定义为VAE解码变换下的“潜在”不确定性集合的图像。然后利用投影梯度上升来最大化潜在不确定性集合上的追溯成本,通过利用可微分优化方法。我们通过将AGRO应用于一个合成的生产分配问题和一个现实世界的电力系统扩展设置来展示AGRO的成本效益。我们展示AGRO在生产分配规划中比标准列和约束算法表现优异,最高可提高1.8%,在电力系统扩展中最高可提高11.6%。
更新时间: 2025-04-09 16:24:02
领域: eess.SY,cs.LG,cs.SY
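The adversarial-generation loop can be sketched as projected gradient ascent in the latent space of a trained VAE. The L2-ball latent set below is an illustrative assumption, and decoder and recourse_cost stand in for the trained decoder and a differentiable second-stage cost:
    import torch

    def adversarial_contingency(decoder, recourse_cost, z_dim,
                                radius=1.0, steps=50, lr=0.1):
        """Projected gradient ascent in VAE latent space: maximize the
        differentiable second-stage (recourse) cost of the decoded
        contingency, projecting z back onto an L2 ball so samples stay
        in a high-density region of the generative model."""
        z = torch.zeros(z_dim, requires_grad=True)
        for _ in range(steps):
            cost = recourse_cost(decoder(z))
            grad, = torch.autograd.grad(cost, z)
            with torch.no_grad():
                z += lr * grad                 # ascent step
                norm = z.norm()
                if norm > radius:              # projection step
                    z.mul_(radius / norm)
        return decoder(z).detach()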
DeCoMa: Detecting and Purifying Code Dataset Watermarks through Dual Channel Code Abstraction
Watermarking is a technique to help identify the source of data points, which can be used to help prevent the misuse of protected datasets. Existing methods on code watermarking, leveraging the idea from the backdoor research, embed stealthy triggers as watermarks. Despite their high resilience against dilution attacks and backdoor detections, the robustness has not been fully evaluated. To fill this gap, we propose DeCoMa, a dual-channel approach to Detect and purify Code dataset waterMarks. To overcome the high barrier created by the stealthy and hidden nature of code watermarks, DeCoMa leverages dual-channel constraints on code to generalize and map code samples into standardized templates. Subsequently, DeCoMa extracts hidden watermarks by identifying outlier associations between paired elements within the standardized templates. Finally, DeCoMa purifies the watermarked dataset by removing all samples containing the detected watermark, enabling the silent appropriation of protected code. We conduct extensive experiments to evaluate the effectiveness and efficiency of DeCoMa, covering 14 types of code watermarks and 3 representative intelligent code tasks (a total of 14 scenarios). Experimental results demonstrate that DeCoMa achieves a stable recall of 100% in 14 code watermark detection scenarios, significantly outperforming the baselines. Additionally, DeCoMa effectively attacks code watermarks with embedding rates as low as 0.1%, while maintaining comparable model performance after training on the purified dataset. Furthermore, as DeCoMa requires no model training for detection, it achieves substantially higher efficiency than all baselines, with a speedup ranging from 31.5 to 130.9X. The results call for more advanced watermarking techniques for code models, while DeCoMa can serve as a baseline for future evaluation.
Updated: 2025-04-09 16:19:11
标题: DeCoMa: 通过双通道代码抽象检测和净化代码数据集水印
摘要: 数字水印技术是一种帮助识别数据点来源的技术,可用于帮助防止受保护数据集的滥用。现有的代码水印方法借鉴了后门研究的思想,将隐蔽触发器嵌入作为水印。尽管它们对稀释攻击和后门检测具有很高的抵抗力,但鲁棒性尚未得到充分评估。为了填补这一空白,我们提出了DeCoMa,一种检测和净化代码数据集水印的双通道方法。为了克服代码水印的隐蔽和隐藏特性带来的高障碍,DeCoMa利用了代码的双通道约束,将代码样本泛化和映射到标准化模板中。随后,DeCoMa通过识别标准化模板中配对元素之间的异常关联来提取隐藏的水印。最后,DeCoMa通过移除所有包含检测到的水印的样本来净化带水印的数据集,实现对受保护代码的悄悄占用。我们进行了大量实验来评估DeCoMa的有效性和效率,涵盖了14种代码水印和3种代表性智能代码任务(共14种场景)。实验结果表明,在14种代码水印检测场景中,DeCoMa实现了100%的稳定召回率,明显优于基线。此外,DeCoMa在嵌入率低至0.1%时有效攻击代码水印,同时在对净化后的数据集进行训练后保持可比较的模型性能。此外,由于DeCoMa无需模型训练进行检测,因此它的效率显著高于所有基线,加速比范围为31.5至130.9倍。这些结果要求为代码模型提供更先进的水印技术,而DeCoMa可以作为未来评估的基准。
更新时间: 2025-04-09 16:19:11
领域: cs.CR,cs.SE
AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment
With the growing adoption of reinforcement learning with human feedback (RLHF) for aligning large language models (LLMs), the risk of backdoor installation during alignment has increased, leading to unintended and harmful behaviors. Existing backdoor triggers are typically limited to fixed word patterns, making them detectable during data cleaning and easily removable post-poisoning. In this work, we explore the use of prompt-specific paraphrases as backdoor triggers, enhancing their stealth and resistance to removal during LLM alignment. We propose AdvBDGen, an adversarially fortified generative fine-tuning framework that automatically generates prompt-specific backdoors that are effective, stealthy, and transferable across models. AdvBDGen employs a generator-discriminator pair, fortified by an adversary, to ensure the installability and stealthiness of backdoors. It enables the crafting and successful installation of complex triggers using as little as 3% of the fine-tuning data. Once installed, these backdoors can jailbreak LLMs during inference, demonstrate improved stability against perturbations compared to traditional constant triggers, and are more challenging to remove. These findings underscore an urgent need for the research community to develop more robust defenses against adversarial backdoor threats in LLM alignment.
Updated: 2025-04-09 16:09:27
标题: AdvBDGen:对抗性强化的特定提示模糊后门生成器针对LLM对齐
摘要: 随着人类反馈强化学习(RLHF)在对齐大型语言模型(LLMs)中的日益普及,对齐过程中潜在的后门安装风险增加,导致意外和有害行为。现有的后门触发器通常仅限于固定的词模式,使它们在数据清理过程中可检测并在中毒后易于移除。在这项工作中,我们探讨了使用特定提示的释义作为后门触发器,增强它们的隐蔽性和抵抗力,从而在LLM对齐过程中更难移除。我们提出了AdvBDGen,这是一个经过对抗加强的生成微调框架,可以自动生成有效、隐蔽且可在模型之间传递的特定提示后门。AdvBDGen使用一个生成器-判别器对,通过对手加强,以确保后门的可安装性和隐蔽性。它能够使用仅占微调数据的3%就能够精心制作和成功安装复杂的触发器。一旦安装完成,这些后门可以在推理过程中越狱LLMs,与传统的恒定触发器相比,表现出更好的稳定性,并且更难移除。这些发现强调了研究界迫切需要开发更加健壮的防御措施,以应对LLMs对齐中的对抗性后门威胁。
更新时间: 2025-04-09 16:09:27
领域: cs.LG
Neural Signal Compression using RAMAN tinyML Accelerator for BCI Applications
High-quality, multi-channel neural recording is indispensable for neuroscience research and clinical applications. Large-scale brain recordings often produce vast amounts of data that must be wirelessly transmitted for subsequent offline analysis and decoding, especially in brain-computer interfaces (BCIs) utilizing high-density intracortical recordings with hundreds or thousands of electrodes. However, transmitting raw neural data presents significant challenges due to limited communication bandwidth and resultant excessive heating. To address this challenge, we propose a neural signal compression scheme utilizing Convolutional Autoencoders (CAEs), which achieves a compression ratio of up to 150 for compressing local field potentials (LFPs). The CAE encoder section is implemented on RAMAN, an energy-efficient tinyML accelerator designed for edge computing, and subsequently deployed on an Efinix Ti60 FPGA with 37.3k LUTs and 8.6k register utilization. RAMAN leverages sparsity in activation and weights through zero skipping, gating, and weight compression techniques. Additionally, we employ hardware-software co-optimization by pruning CAE encoder model parameters using a hardware-aware balanced stochastic pruning strategy, resolving workload imbalance issues and eliminating indexing overhead to reduce parameter storage requirements by up to 32.4%. Using the proposed compact depthwise separable convolutional autoencoder (DS-CAE) model, the compressed neural data from RAMAN is reconstructed offline with superior signal-to-noise and distortion ratios (SNDR) of 22.6 dB and 27.4 dB, along with R2 scores of 0.81 and 0.94, respectively, evaluated on two monkey neural recordings.
Updated: 2025-04-09 16:09:00
标题: 使用RAMAN微型ML加速器对BCI应用进行神经信号压缩
摘要: 高质量的多通道神经记录对神经科学研究和临床应用至关重要。大规模的脑部记录通常会产生大量数据,必须通过无线传输进行后续离线分析和解码,特别是在利用数百或数千个电极的高密度颅内记录的脑-计算机界面(BCIs)中。然而,传输原始神经数据存在显著挑战,由于通信带宽有限和导致过度加热。为了解决这一挑战,我们提出了一种利用卷积自动编码器(CAEs)的神经信号压缩方案,该方案可实现高达150的压缩比,用于压缩局部场电位(LFPs)。CAE编码器部分在为边缘计算设计的能效高的tinyML加速器RAMAN上实现,并随后部署在具有37.3k LUTs和8.6k寄存器利用率的Efinix Ti60 FPGA上。RAMAN通过零跳过、门控和权重压缩技术利用激活和权重的稀疏性。此外,我们通过使用硬件感知均衡随机剪枝策略对CAE编码器模型参数进行剪枝,通过硬件-软件协同优化解决了工作负载不平衡问题,并消除索引开销,将参数存储需求降低了高达32.4%。利用提出的紧凑型深度可分离卷积自编码器(DS-CAE)模型,来自RAMAN的压缩神经数据在离线重建时表现出优越的信噪比和失真比(SNDR)分别为22.6 dB和27.4 dB,以及R2分数分别为0.81和0.94,在两只猴子的神经记录上进行评估。
更新时间: 2025-04-09 16:09:00
领域: cs.AR,cs.HC,cs.LG
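A toy PyTorch version of a depthwise separable convolutional autoencoder for multichannel signal windows is sketched below; the layer sizes and strides are illustrative and not the paper's deployed architecture:
    import torch.nn as nn

    def ds_conv(c_in, c_out, k=7, stride=2):
        """Depthwise separable 1D conv: per-channel spatial filter plus
        a 1x1 pointwise mix, cutting parameters and MACs."""
        return nn.Sequential(
            nn.Conv1d(c_in, c_in, k, stride, padding=k // 2, groups=c_in),
            nn.Conv1d(c_in, c_out, 1),
            nn.ReLU())

    class DSCAE(nn.Module):
        def __init__(self, ch=32):
            super().__init__()
            self.encoder = nn.Sequential(       # on-device (accelerator) side
                ds_conv(ch, 16), ds_conv(16, 8), ds_conv(8, 4))
            self.decoder = nn.Sequential(       # offline reconstruction side
                nn.ConvTranspose1d(4, 8, 4, 2, 1), nn.ReLU(),
                nn.ConvTranspose1d(8, 16, 4, 2, 1), nn.ReLU(),
                nn.ConvTranspose1d(16, ch, 4, 2, 1))

        def forward(self, x):                   # x: (batch, ch, time)
            return self.decoder(self.encoder(x))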
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration
Open-set semantic mapping is crucial for open-world robots. Current mapping approaches either are limited by the depth range or only map beyond-range entities in constrained settings, where overall they fail to combine within-range and beyond-range observations. Furthermore, these methods make a trade-off between fine-grained semantics and efficiency. We introduce RayFronts, a unified representation that enables both dense and beyond-range efficient semantic mapping. RayFronts encodes task-agnostic open-set semantics to both in-range voxels and beyond-range rays encoded at map boundaries, empowering the robot to reduce search volumes significantly and make informed decisions both within & beyond sensory range, while running at 8.84 Hz on an Orin AGX. Benchmarking the within-range semantics shows that RayFronts's fine-grained image encoding provides 1.34x zero-shot 3D semantic segmentation performance while improving throughput by 16.5x. Traditionally, online mapping performance is entangled with other system components, complicating evaluation. We propose a planner-agnostic evaluation framework that captures the utility for online beyond-range search and exploration, and show RayFronts reduces search volume 2.2x more efficiently than the closest online baselines.
Updated: 2025-04-09 16:06:58
标题: RayFronts:用于在线场景理解和探索的开放式语义射线边界
摘要: 开放式语义映射对于开放世界的机器人至关重要。当前的映射方法要么受到深度范围的限制,要么仅在受限制的环境中映射超出范围的实体,总体上未能将范围内和范围外的观察结合起来。此外,这些方法在精细语义和效率之间进行了权衡。我们引入了RayFronts,一种统一的表示形式,可以实现密集和超出范围的高效语义映射。RayFronts对不受任务约束的开放式语义进行编码,同时对编码在地图边界的范围内体素和超出范围的射线进行编码,使机器人能够显著减少搜索范围,并在8.84 Hz的运行速度下在感知范围内外做出明智决策。对范围内语义的基准测试显示,RayFronts的细粒度图像编码提供了1.34倍的零样本3D语义分割性能,同时将吞吐量提高了16.5倍。传统上,在线映射性能与其他系统组件纠缠在一起,使评估变得更加复杂。我们提出了一个与规划器无关的评估框架,捕捉了在线超出范围搜索和探索的效用,并展示RayFronts比最接近的在线基线更有效地减少了搜索范围2.2倍。
更新时间: 2025-04-09 16:06:58
领域: cs.RO,cs.AI,cs.CV,cs.LG
HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting?
Forecasting multiple future events within a given time horizon is essential for applications in finance, retail, social networks, and healthcare. Marked Temporal Point Processes (MTPP) provide a principled framework to model both the timing and labels of events. However, most existing research focuses on predicting only the next event, leaving long-horizon forecasting largely underexplored. To address this gap, we introduce HoTPP, the first benchmark specifically designed to rigorously evaluate long-horizon predictions. We identify shortcomings in widely used evaluation metrics, propose a theoretically grounded T-mAP metric, present strong statistical baselines, and offer efficient implementations of popular models. Our empirical results demonstrate that modern MTPP approaches often underperform simple statistical baselines. Furthermore, we analyze the diversity of predicted sequences and find that most methods exhibit mode collapse. Finally, we analyze the impact of autoregression and intensity-based losses on prediction quality, and outline promising directions for future research. The HoTPP source code, hyperparameters, and full evaluation results are available on GitHub.
Updated: 2025-04-09 15:59:49
标题: HoTPP基准测试:我们在长期事件预测方面做得好吗?
摘要: 在给定时间范围内预测多个未来事件对于金融、零售、社交网络和医疗保健等领域的应用至关重要。标记的时间点过程(MTPP)提供了一个合理的框架来建模事件的时间和标签。然而,大多数现有研究集中在预测下一个事件,长期预测往往被忽视。为了填补这一空白,我们引入了HoTPP,这是第一个专门设计用于严格评估长期预测的基准。我们发现了广泛使用的评估指标的缺点,提出了一个基于理论的T-mAP指标,提供了强大的统计基线,并提供了流行模型的高效实现。我们的实证结果表明,现代MTPP方法通常表现不如简单的统计基线。此外,我们分析了预测序列的多样性,并发现大多数方法存在模式坍塌。最后,我们分析了自回归和基于强度的损失对预测质量的影响,并概述了未来研究的有希望的方向。HoTPP的源代码、超参数和完整的评估结果都可以在GitHub上找到。
更新时间: 2025-04-09 15:59:49
领域: cs.LG
Dissimilar Batch Decompositions of Random Datasets
For better learning, large datasets are often split into small batches and fed sequentially to the predictive model. In this paper, we study such batch decompositions from a probabilistic perspective. We assume that data points (possibly corrupted) are drawn independently from a given space and define a concept of similarity between two data points. We then consider decompositions that restrict the amount of similarity within each batch and obtain high probability bounds for the minimum size. We demonstrate an inherent tradeoff between relaxing the similarity constraint and the overall size and also use martingale methods to obtain bounds for the maximum size of data subsets with a given similarity.
Updated: 2025-04-09 15:58:06
标题: 随机数据集的不同批次分解
摘要: 为了更好地学习,通常会将大型数据集分成小批次,并依次输入到预测模型中。本文从概率的角度研究了这种批次分解。我们假设数据点(可能是损坏的)是独立地从给定空间中抽取的,并定义了两个数据点之间的相似性概念。然后,我们考虑限制每个批次内相似性量的分解,并获得了最小大小的高概率界限。我们展示了在放宽相似性约束和整体大小之间存在固有的权衡,还利用鞅方法得到了具有给定相似性的数据子集的最大大小的界限。
更新时间: 2025-04-09 15:58:06
领域: cs.LG,math.PR,stat.ML
Multi-Fidelity Policy Gradient Algorithms
Many reinforcement learning (RL) algorithms require large amounts of data, prohibiting their use in applications where frequent interactions with operational systems are infeasible, or high-fidelity simulations are expensive or unavailable. Meanwhile, low-fidelity simulators--such as reduced-order models, heuristic reward functions, or generative world models--can cheaply provide useful data for RL training, even if they are too coarse for direct sim-to-real transfer. We propose multi-fidelity policy gradients (MFPGs), an RL framework that mixes a small amount of data from the target environment with a large volume of low-fidelity simulation data to form unbiased, reduced-variance estimators (control variates) for on-policy policy gradients. We instantiate the framework by developing multi-fidelity variants of two policy gradient algorithms: REINFORCE and proximal policy optimization. Experimental results across a suite of simulated robotics benchmark problems demonstrate that when target-environment samples are limited, MFPG achieves up to 3.9x higher reward and improves training stability when compared to baselines that only use high-fidelity data. Moreover, even when the baselines are given more high-fidelity samples--up to 10x as many interactions with the target environment--MFPG continues to match or outperform them. Finally, we observe that MFPG is capable of training effective policies even when the low-fidelity environment is drastically different from the target environment. MFPG thus not only offers a novel paradigm for efficient sim-to-real transfer but also provides a principled approach to managing the trade-off between policy performance and data collection costs.
Updated: 2025-04-09 15:52:25
标题: 多保真度策略梯度算法
摘要: 许多强化学习(RL)算法需要大量数据,这使得它们无法在与运行系统频繁交互不可行的应用中使用,或者高保真度的模拟成本昂贵或不可用。与此同时,低保真度的模拟器--如降阶模型、启发式奖励函数或生成性世界模型--可以廉价地为RL训练提供有用的数据,即使它们对于直接模拟到真实环境的转移来说过于粗糙。我们提出了多保真度策略梯度(MFPGs),这是一个RL框架,混合来自目标环境的少量数据和大量低保真度模拟数据,形成无偏差、降低方差的估计器(控制变量)用于基于策略梯度的在线策略梯度。我们通过开发两种策略梯度算法的多保真度变体:REINFORCE和近端策略优化来实现该框架。在一系列模拟机器人基准问题上的实验结果表明,在目标环境样本有限时,MFPG的奖励可以达到高出3.9倍,并且在与仅使用高保真度数据的基线相比,提高了训练稳定性。此外,即使基线获得更多高保真度样本--与目标环境的交互次数增加到10倍--MFPG仍能持续匹配或超越它们。最后,我们观察到,即使低保真度环境与目标环境截然不同,MFPG仍能训练出有效的策略。因此,MFPG不仅提供了一种有效的模拟到真实转移的新范式,还提供了一种管理策略性能和数据收集成本之间权衡的原则性方法。
更新时间: 2025-04-09 15:52:25
领域: cs.LG,cs.AI,cs.RO
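The control-variate construction at the heart of this estimator can be sketched in a few lines; this is the generic multi-fidelity Monte Carlo form, assuming paired low-fidelity gradient estimates are available for the high-fidelity batch:
    import numpy as np

    def mf_policy_gradient(g_hi, g_lo_paired, g_lo_big, c=1.0):
        """Multi-fidelity control-variate estimator (sketch). g_hi and
        g_lo_paired are per-sample gradient estimates from the same
        small batch of target-environment episodes, evaluated with the
        high- and low-fidelity models respectively; g_lo_big comes from
        a large batch of cheap low-fidelity episodes. The low-fidelity
        terms cancel in expectation, so the estimator stays unbiased,
        while a well-chosen c (roughly Cov(g_hi, g_lo)/Var(g_lo))
        shrinks the variance."""
        return (np.asarray(g_hi).mean(axis=0)
                + c * (np.asarray(g_lo_big).mean(axis=0)
                       - np.asarray(g_lo_paired).mean(axis=0)))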
Enhancing Metabolic Syndrome Prediction with Hybrid Data Balancing and Counterfactuals
Metabolic Syndrome (MetS) is a cluster of interrelated risk factors that significantly increases the risk of cardiovascular diseases and type 2 diabetes. Despite its global prevalence, accurate prediction of MetS remains challenging due to issues such as class imbalance, data scarcity, and methodological inconsistencies in existing studies. In this paper, we address these challenges by systematically evaluating and optimizing machine learning (ML) models for MetS prediction, leveraging advanced data balancing techniques and counterfactual analysis. Multiple ML models, including XGBoost, Random Forest, TabNet, etc., were trained and compared under various data balancing techniques such as random oversampling (ROS), SMOTE, ADASYN, and CTGAN. Additionally, we introduce MetaBoost, a novel hybrid framework that integrates SMOTE, ADASYN, and CTGAN, optimizing synthetic data generation through weighted averaging and iterative weight tuning to enhance the model's performance (achieving a 1.14% accuracy improvement over individual balancing techniques). A comprehensive counterfactual analysis is conducted to quantify feature-level changes required to shift individuals from high-risk to low-risk categories. The results indicate that blood glucose (50.3%) and triglycerides (46.7%) were the most frequently modified features, highlighting their clinical significance in MetS risk reduction. Additionally, probabilistic analysis shows elevated blood glucose (85.5% likelihood) and triglycerides (74.9% posterior probability) as the strongest predictors. This study not only advances the methodological rigor of MetS prediction but also provides actionable insights for clinicians and researchers, highlighting the potential of ML in mitigating the public health burden of metabolic syndrome.
Updated: 2025-04-09 15:51:10
标题: 利用混合数据平衡和反事实增强代谢综合征预测
摘要: 代谢综合征(MetS)是一组相互关联的危险因素,显著增加了心血管疾病和2型糖尿病的风险。尽管其在全球范围内的患病率很高,但由于诸如类别不平衡、数据稀缺和现有研究中的方法不一致等问题,MetS的准确预测仍然具有挑战性。在本文中,我们通过系统评估和优化机器学习(ML)模型来预测MetS,利用先进的数据平衡技术和反事实分析来解决这些挑战。多个ML模型,包括XGBoost、随机森林、TabNet等,经过训练并在各种数据平衡技术(如随机过采样(ROS)、SMOTE、ADASYN和CTGAN)下进行比较。此外,我们引入了MetaBoost,这是一个集成SMOTE、ADASYN和CTGAN的新型混合框架,通过加权平均和迭代权重调整优化合成数据生成,以增强模型性能(相比单独平衡技术实现了1.14%的准确度提升)。进行了全面的反事实分析,以量化需要调整特征水平的变化,使个体从高风险转变为低风险类别。结果表明,血糖(50.3%)和甘油三酯(46.7%)是最常修改的特征,突显了它们在减少MetS风险中的临床重要性。此外,概率分析显示,升高的血糖(85.5%的可能性)和甘油三酯(74.9%的后验概率)是最强的预测因素。这项研究不仅推进了MetS预测的方法学严谨性,还为临床医生和研究人员提供了可操作的见解,突显了ML在缓解代谢综合征公共健康负担方面的潜力。
更新时间: 2025-04-09 15:51:10
领域: cs.LG,cs.AI
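A stripped-down sketch of the hybrid balancing idea, blending synthetic minority rows from two generators in fixed proportions (the paper additionally mixes in CTGAN and tunes the weights iteratively); it assumes imbalanced-learn's convention that fit_resample appends synthetic rows after the originals:
    import numpy as np
    from imblearn.over_sampling import SMOTE, ADASYN

    def hybrid_oversample(X, y, weights=(0.5, 0.5), seed=0):
        """Blend synthetic minority rows from two generators in fixed
        proportions, then pool them with the original data."""
        rng = np.random.default_rng(seed)
        samplers = (SMOTE(random_state=seed), ADASYN(random_state=seed))
        parts, labels = [X], [y]
        for w, sampler in zip(weights, samplers):
            Xr, yr = sampler.fit_resample(X, y)
            Xs, ys = Xr[len(X):], yr[len(X):]          # synthetic rows only
            keep = rng.choice(len(Xs), int(w * len(Xs)), replace=False)
            parts.append(Xs[keep])
            labels.append(ys[keep])
        return np.vstack(parts), np.concatenate(labels)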
Free Random Projection for In-Context Reinforcement Learning
Hierarchical inductive biases are hypothesized to promote generalizable policies in reinforcement learning, as demonstrated by explicit hyperbolic latent representations and architectures. Therefore, a more flexible approach is to have these biases emerge naturally from the algorithm. We introduce Free Random Projection, an input mapping grounded in free probability theory that constructs random orthogonal matrices where hierarchical structure arises inherently. The free random projection integrates seamlessly into existing in-context reinforcement learning frameworks by encoding hierarchical organization within the input space without requiring explicit architectural modifications. Empirical results on multi-environment benchmarks show that free random projection consistently outperforms the standard random projection, leading to improvements in generalization. Furthermore, analyses within linearly solvable Markov decision processes and investigations of the spectrum of kernel random matrices reveal the theoretical underpinnings of free random projection's enhanced performance, highlighting its capacity for effective adaptation in hierarchically structured state spaces.
Updated: 2025-04-09 15:38:50
标题: 上下文强化学习中的自由随机投影
摘要: 分层归纳偏差被假设能够促进强化学习中具有泛化能力的策略,正如明确的双曲线潜在表示和架构所证明的那样。因此,一个更灵活的方法是让这些偏差自然地从算法中出现。我们引入了自由随机投影,这是一种基于自由概率理论的输入映射,可以构建随机正交矩阵,从而固有地形成分层结构。自由随机投影可以无缝集成到现有的上下文强化学习框架中,通过在输入空间中编码层次化组织来实现,而无需明确的架构修改。在多环境基准测试中的实证结果显示,自由随机投影始终优于标准随机投影,从而提高了泛化能力。此外,在线性可解的马尔可夫决策过程以及对核随机矩阵谱的研究中揭示了自由随机投影增强性能的理论基础,突显了它在具有层次结构状态空间中的有效适应能力。
更新时间: 2025-04-09 15:38:50
领域: cs.LG,math.PR,stat.ML
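The building blocks can be sketched directly: Haar-distributed orthogonal matrices sampled via sign-corrected QR and composed into products. How products (words) are chosen to induce hierarchy is the paper's design; a plain product of independent factors is shown here as an assumption:
    import numpy as np

    def haar_orthogonal(d, rng):
        """Sample from the Haar measure on O(d) via QR of a Gaussian
        matrix, with signs corrected so the law is exactly Haar."""
        q, r = np.linalg.qr(rng.standard_normal((d, d)))
        return q * np.sign(np.diag(r))

    def free_random_projection(d, depth, rng=None):
        """Product of `depth` independent Haar factors; the asymptotic
        behaviour of such products is described by free probability."""
        rng = rng or np.random.default_rng(0)
        W = np.eye(d)
        for _ in range(depth):
            W = haar_orthogonal(d, rng) @ W
        return W

    x = np.ones(16) / 4.0                  # unit-norm input
    print(free_random_projection(16, depth=3) @ x)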
TabRep: a Simple and Effective Continuous Representation for Training Tabular Diffusion Models
Diffusion models have been the predominant generative model for tabular data generation. However, they face the conundrum of modeling under a separate versus a unified data representation. The former encounters the challenge of jointly modeling all multi-modal distributions of tabular data in one model. While the latter alleviates this by learning a single representation for all features, it currently leverages sparse suboptimal encoding heuristics and necessitates additional computation costs. In this work, we address the latter by presenting TabRep, a tabular diffusion architecture trained with a unified continuous representation. To motivate the design of our representation, we provide geometric insights into how the data manifold affects diffusion models. The key attributes of our representation are composed of its density, flexibility to provide ample separability for nominal features, and ability to preserve intrinsic relationships. Ultimately, TabRep provides a simple yet effective approach for training tabular diffusion models under a continuous data manifold. Our results showcase that TabRep achieves superior performance across a broad suite of evaluations. It is the first to synthesize tabular data that exceeds the downstream quality of the original datasets while preserving privacy and remaining computationally efficient.
Updated: 2025-04-09 15:38:00
标题: TabRep:用于训练表格扩散模型的简单有效连续表示
摘要: 扩散模型一直是表格数据生成模型中的主要模型。然而,它们面临着在单独数据表示和统一数据表示下建模的难题。前者需要在一个模型中联合建模表格数据的所有多模态分布,而后者通过学习一个表示所有特征的单一表示来缓解这一挑战,但目前利用稀疏次优编码启发式方法,并需要额外的计算成本。在这项工作中,我们通过提出TabRep来解决后者,这是一个使用统一连续表示训练的表格扩散架构。为了激发我们表示设计的动机,我们提供了几何洞察,说明数据流形如何影响扩散模型。我们表示的关键属性包括其密度、为名义特征提供充分可分性的灵活性,以及保持内在关系的能力。最终,TabRep为在连续数据流形下训练表格扩散模型提供了一种简单而有效的方法。我们的结果展示了,TabRep在广泛的评估中取得了卓越的性能。它是第一个合成超越原始数据集下游质量的表格数据的模型,同时保持隐私并保持计算效率。
更新时间: 2025-04-09 15:38:00
领域: cs.LG
Artificial Intelligence for Pediatric Height Prediction Using Large-Scale Longitudinal Body Composition Data
This study developed an accurate artificial intelligence model for predicting future height in children and adolescents using anthropometric and body composition data from the GP Cohort Study (588,546 measurements from 96,485 children aged 7-18). The model incorporated anthropometric measures, body composition, standard deviation scores, and growth velocity parameters, with performance evaluated using RMSE, MAE, and MAPE. Results showed high accuracy with males achieving average RMSE, MAE, and MAPE of 2.51 cm, 1.74 cm, and 1.14%, and females showing 2.28 cm, 1.68 cm, and 1.13%, respectively. Explainable AI approaches identified height SDS, height velocity, and soft lean mass velocity as crucial predictors. The model generated personalized growth curves by estimating individual-specific height trajectories, offering a robust tool for clinical decision support, early identification of growth disorders, and optimization of growth outcomes.
Updated: 2025-04-09 15:32:15
标题: 人工智能在使用大规模纵向身体组成数据进行儿童身高预测方面的应用
摘要: 这项研究开发了一个准确的人工智能模型,用于预测儿童和青少年未来的身高,使用了来自GP队列研究的人体测量和体成分数据(来自96,485名年龄为7-18岁的儿童的588,546个测量)。该模型结合了人体测量、体成分、标准偏差分数和生长速度参数,通过使用RMSE、MAE和MAPE进行性能评估。结果显示男性的平均RMSE、MAE和MAPE分别为2.51厘米、1.74厘米和1.14%,女性分别为2.28厘米、1.68厘米和1.13%。可解释的人工智能方法确定了身高SDS、身高速度和软瘦体量速度作为关键预测因子。该模型通过估计个体特定的身高轨迹生成了个性化的生长曲线,为临床决策支持、早期发现生长障碍和优化生长结果提供了强大的工具。
更新时间: 2025-04-09 15:32:15
领域: q-bio.QM,cs.LG,62P10, 68T05
LLM Safeguard is a Double-Edged Sword: Exploiting False Positives for Denial-of-Service Attacks
Safety is a paramount concern for large language models (LLMs) in open deployment, motivating the development of safeguard methods that enforce ethical and responsible use through safety alignment or guardrail mechanisms. Jailbreak attacks that exploit the false negatives of safeguard methods have emerged as a prominent research focus in the field of LLM security. However, we found that malicious attackers could also exploit false positives of safeguards, i.e., fooling the safeguard model to block safe content mistakenly, leading to a denial-of-service (DoS) affecting LLM users. To bridge the knowledge gap of this overlooked threat, we explore multiple attack methods that include inserting a short adversarial prompt into user prompt templates and corrupting the LLM on the server by poisoned fine-tuning. In both ways, the attack triggers safeguard rejections of user requests from the client. Our evaluation demonstrates the severity of this threat across multiple scenarios. For instance, in the scenario of white-box adversarial prompt injection, the attacker can use our optimization process to automatically generate seemingly safe adversarial prompts, approximately only 30 characters long, that universally block over 97% of user requests on Llama Guard 3. These findings reveal a new dimension in LLM safeguard evaluation -- adversarial robustness to false positives.
Updated: 2025-04-09 15:20:33
标题: LLM安全保障是双刃剑:利用误报阳性进行拒绝服务攻击
摘要: 安全是大型语言模型(LLM)在开放部署中的首要关注点,促使开发通过安全对齐或防护机制强制执行道德和负责任的使用的安全防护方法。利用防护方法的“假阴性”的越狱攻击已成为LLM安全领域的一个突出研究重点。然而,我们发现恶意攻击者也可以利用防护方法的假阳性,即欺骗防护模型错误地阻止安全内容,导致拒绝服务(DoS)影响LLM用户。为了弥补这一被忽视威胁的知识差距,我们探索了多种攻击方法,包括将短的对抗性提示插入用户提示模板和通过有毒的微调损坏服务器上的LLM。以这两种方式,攻击会触发客户端用户请求的防护拒绝。我们的评估展示了这一威胁在多种场景中的严重性。例如,在白盒对抗性提示注入场景中,攻击者可以使用我们的优化过程自动生成看似安全的对抗性提示,大约只有30个字符长,普遍阻止了Llama Guard 3上超过97%的用户请求。这些发现揭示了LLM防护评估中的一个新维度--对假阳性的对抗鲁棒性。
更新时间: 2025-04-09 15:20:33
领域: cs.CR,cs.AI
RNN-Transducer-based Losses for Speech Recognition on Noisy Targets
Training speech recognition systems on noisy transcripts is a significant challenge in industrial pipelines, where datasets are enormous and ensuring accurate transcription for every instance is difficult. In this work, we introduce novel loss functions to mitigate the impact of transcription errors in RNN-Transducer models. Our Star-Transducer loss addresses deletion errors by incorporating "skip frame" transitions in the loss lattice, restoring over 90% of the system's performance compared to models trained with accurate transcripts. The Bypass-Transducer loss uses "skip token" transitions to tackle insertion errors, recovering more than 60% of the quality. Finally, the Target-Robust Transducer loss merges these approaches, offering robust performance against arbitrary errors. Experimental results demonstrate that the Target-Robust Transducer loss significantly improves RNN-T performance on noisy data by restoring over 70% of the quality compared to well-transcribed data.
Updated: 2025-04-09 15:18:29
标题: RNN-Transducer基于噪声目标的语音识别损失
摘要: 在工业管道中对嘈杂文本进行语音识别系统训练是一个重要挑战,其中数据集庞大且确保每个实例的准确转录是困难的。在这项工作中,我们引入了新颖的损失函数,以减轻RNN-Transducer模型中转录错误的影响。我们的Star-Transducer损失通过在损失晶格中包含“跳帧”转换来解决删除错误,与使用准确文本训练的模型相比,恢复了系统90%以上的性能。Bypass-Transducer损失使用“跳标记”转换来解决插入错误,恢复了超过60%的质量。最后,Target-Robust Transducer损失合并了这些方法,提供了对任意错误的稳健性能。实验结果表明,与良好转录数据相比,Target-Robust Transducer损失显著改善了对嘈杂数据的RNN-T性能,恢复了超过70%的质量。
更新时间: 2025-04-09 15:18:29
领域: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD
Scalable Geometric Learning with Correlation-Based Functional Brain Networks
The correlation matrix is a central representation of functional brain networks in neuroimaging. Traditional analyses often treat pairwise interactions independently in a Euclidean setting, overlooking the intrinsic geometry of correlation matrices. While earlier attempts have embraced the quotient geometry of the correlation manifold, they remain limited by computational inefficiency and numerical instability, particularly in high-dimensional contexts. This paper presents a novel geometric framework that employs diffeomorphic transformations to embed correlation matrices into a Euclidean space, preserving salient manifold properties and enabling large-scale analyses. The proposed method integrates with established learning algorithms - regression, dimensionality reduction, and clustering - and extends naturally to population-level inference of brain networks. Simulation studies demonstrate both improved computational speed and enhanced accuracy compared to conventional manifold-based approaches. Moreover, applications in real neuroimaging scenarios illustrate the framework's utility, enhancing behavior score prediction, subject fingerprinting in resting-state fMRI, and hypothesis testing in electroencephalogram data. An open-source MATLAB toolbox is provided to facilitate broader adoption and advance the application of correlation geometry in functional brain network research.
Updated: 2025-04-09 15:14:53
标题: 使用基于相关性的功能性大脑网络进行可扩展的几何学习
摘要: 相关矩阵是神经影像中功能脑网络的中心表示。传统分析通常在欧几里德设置中独立处理成对交互作用,忽视相关矩阵的固有几何。尽管早期尝试已经采用了相关流形的商几何,但它们仍然受到计算效率和数值稳定性的限制,特别是在高维环境中。本文提出了一种新颖的几何框架,利用微分同胚变换将相关矩阵嵌入到欧几里德空间中,保留突出的流形特性,并实现大规模分析。所提出的方法与已建立的学习算法 - 回归、降维和聚类 - 结合,并自然地扩展到大脑网络的群体水平推断。模拟研究表明,与传统的基于流形的方法相比,计算速度和准确性均得到了改善。此外,在真实神经影像场景中的应用展示了该框架的实用性,增强了行为评分预测、静息态fMRI中的主体指纹识别以及脑电图数据的假设检验。提供了一个开源的MATLAB工具箱,以促进更广泛的采用并推动相关几何在功能脑网络研究中的应用。
更新时间: 2025-04-09 15:14:53
领域: stat.ML,cs.LG
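As a sketch of the general recipe, a matrix-log map flattens SPD-style geometry into a Euclidean vector that ordinary regression and clustering tools can consume; the paper's specific diffeomorphism for the correlation (rather than covariance) manifold may differ:
    import numpy as np
    from scipy.linalg import logm

    def embed_correlation(C, eps=1e-6):
        """Regularize, take the matrix logarithm, and vectorize the
        upper triangle, yielding a Euclidean feature vector. This is a
        generic log-based flattening, shown here as an assumption about
        the family of transforms the paper builds on."""
        d = C.shape[0]
        L = logm(C + eps * np.eye(d)).real
        return L[np.triu_indices(d)]

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    C = np.corrcoef(X, rowvar=False)
    print(embed_correlation(C).shape)      # (55,) for d = 10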
Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO), demonstrating strong transferability across diverse remote sensing tasks. While prior work has focused on network architectures and training strategies, the role of dataset curation, especially in balancing and diversifying pre-training datasets, remains underexplored. In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery, which can lead to biased representations and inefficient training. In this work, we propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance. Our method iteratively refines the training set without requiring a pre-existing feature extractor, making it well-suited for domains where curated datasets are limited or unavailable. We demonstrate our approach on the Sentinel-1 Wave Mode (WV) Synthetic Aperture Radar (SAR) archive, a challenging dataset dominated by ocean observations. We train models from scratch on the entire Sentinel-1 WV archive spanning 10 years. Across three downstream tasks, our results show that dynamic pruning improves both computational efficiency and representation quality, leading to stronger transferability. We also release the weights of Nereus-SAR-1, the first model in the Nereus family, a series of foundation models for ocean observation and analysis using SAR imagery, at github.com/galeio-research/nereus-sar-models/.
Updated: 2025-04-09 15:13:26
标题: 高效的自监督学习地球观测方法:动态数据集整理
摘要: 自我监督学习(SSL)已经实现了对地球观测(EO)视觉基础模型的发展,展示出在不同的遥感任务中具有强大的可迁移性。尽管先前的工作集中在网络架构和训练策略上,但数据集的策划角色,特别是在平衡和丰富化预训练数据集方面,仍未得到充分探讨。在EO领域,这一挑战由于卫星图像中常见的冗余性和重尾分布而被放大,这可能导致偏见表示和低效的训练。 在这项工作中,我们提出了一种动态数据集修剪策略,旨在通过最大化数据集的多样性和平衡来改善SSL预训练。我们的方法通过迭代地精炼训练集来完成,而不需要预先存在的特征提取器,因此非常适用于策划数据集有限或不可用的领域。我们在Sentinel-1 Wave Mode(WV)合成孔径雷达(SAR)存档上展示了我们的方法,这是一个以海洋观测为主导的具有挑战性的数据集。我们从头开始训练在过去10年中涵盖整个Sentinel-1 WV存档的模型。在三个下游任务中,我们的结果表明动态修剪提高了计算效率和表示质量,从而增强了可迁移性。 我们还发布了Nereus家族中第一个模型Nereus-SAR-1的权重,这是一系列使用SAR图像进行海洋观测和分析的基础模型,网址为github.com/galeio-research/nereus-sar-models/。
更新时间: 2025-04-09 15:13:26
领域: cs.CV,cs.AI
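One round of a diversity-driven curation loop can be sketched as clustering the in-training model's embeddings and sampling evenly across clusters; this generic sketch is not the authors' exact pruning criterion:
    import numpy as np
    from sklearn.cluster import KMeans

    def prune_for_diversity(embeddings, keep, n_clusters=100, seed=0):
        """One curation round: cluster the current model's embeddings
        (so no pre-existing feature extractor is required) and sample
        as evenly as possible across clusters, flattening redundant,
        heavy-tailed regions. Returns indices of retained samples."""
        km = KMeans(n_clusters=n_clusters, n_init=4,
                    random_state=seed).fit(embeddings)
        rng = np.random.default_rng(seed)
        per_cluster = keep // n_clusters
        picked = []
        for c in range(n_clusters):
            idx = np.flatnonzero(km.labels_ == c)
            take = min(per_cluster, len(idx))   # small clusters give less
            picked.extend(rng.choice(idx, take, replace=False))
        return np.asarray(picked)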
FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
As machine learning models in critical fields increasingly grapple with multimodal data, they face the dual challenges of handling a wide array of modalities, often incomplete due to missing elements, and the temporal irregularity and sparsity of collected samples. Successfully leveraging this complex data, while overcoming the scarcity of high-quality training samples, is key to improving these models' predictive performance. We introduce "FuseMoE", a mixture-of-experts framework incorporated with an innovative gating function. Designed to integrate a diverse number of modalities, FuseMoE is effective in managing scenarios with missing modalities and irregularly sampled data trajectories. Theoretically, our unique gating function contributes to enhanced convergence rates, leading to better performance in multiple downstream tasks. The practical utility of FuseMoE in the real world is validated by a diverse set of challenging prediction tasks.
Updated: 2025-04-09 15:12:58
标题: FuseMoE:用于Fleximodal融合的专家混合Transformer
摘要: 随着关键领域中的机器学习模型越来越多地处理多模态数据,它们面临处理各种模态的双重挑战,通常由于缺失元素而不完整,并且收集样本的时间不规则和稀疏。成功利用这种复杂数据,同时克服高质量训练样本的稀缺性,是提高这些模型预测性能的关键。我们引入了``FuseMoE'',这是一个融合了创新门控函数的专家混合框架。设计用于整合多种不同模态,FuseMoE在处理缺少模态和不规则采样数据轨迹的场景中非常有效。从理论上讲,我们独特的门控函数有助于提高收敛速度,从而在多个下游任务中获得更好的性能。FuseMoE在现实世界中的实际效用通过一系列具有挑战性的预测任务得到验证。
更新时间: 2025-04-09 15:12:58
领域: cs.LG
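For readers unfamiliar with sparse mixture-of-experts layers, a generic top-k gated layer is sketched below in PyTorch; FuseMoE's contribution is its gating function for many, possibly missing, modalities, whereas a plain softmax router is used here:
    import torch
    import torch.nn as nn

    class TopKMoE(nn.Module):
        """Generic sparse MoE layer: a router scores the experts per
        token, only the top-k run, and their outputs are blended with
        renormalized gate weights."""
        def __init__(self, dim, n_experts=8, k=2):
            super().__init__()
            self.router = nn.Linear(dim, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                              nn.Linear(4 * dim, dim))
                for _ in range(n_experts))
            self.k = k

        def forward(self, x):                   # x: (batch, dim)
            w, idx = self.router(x).topk(self.k, dim=-1)
            w = torch.softmax(w, dim=-1)        # renormalize over top-k
            out = torch.zeros_like(x)
            for j in range(self.k):
                for e in idx[:, j].unique():    # batch rows per expert
                    rows = idx[:, j] == e
                    out[rows] += w[rows, j, None] * \
                        self.experts[int(e)](x[rows])
            return out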
A Survey of Source Code Representations for Machine Learning-Based Cybersecurity Tasks
Machine learning techniques for cybersecurity-related software engineering tasks are becoming increasingly popular. The representation of source code is a key part of these techniques that can impact how well a model is able to learn the features of the source code. With an increasing number of these techniques being developed, it is valuable to survey the current state of the field to better understand what exists and what is still missing. This article presents a study of these existing machine learning based approaches and shows which types of representations were used for different cybersecurity tasks and programming languages. Additionally, we study which types of models are used with different representations. We found that graph-based representations are the most popular category of representation, and that tokenizers and Abstract Syntax Trees (ASTs) are the two most popular individual representations overall (i.e., ASTs and tokenizers are the representations with the highest paper counts, whereas graph-based representations are the category with the highest paper count). We also found that the most popular cybersecurity task is vulnerability detection, and that the language covered by the most techniques is C. Finally, we found that sequence-based models are the most popular category of models, and that Support Vector Machines are the most popular model overall.
Updated: 2025-04-09 15:06:35
标题: 一项关于基于机器学习的网络安全任务中源代码表示的调查
摘要: 机器学习技术在与网络安全相关的软件工程任务中变得越来越受欢迎。源代码的表示是该技术的关键部分,它可以影响模型学习源代码特征的方式。随着越来越多的这些技术被开发出来,了解该领域的当前状态以更好地理解现有情况以及尚未存在的情况是有价值的。本文对这些现有的基于机器学习的方法进行了研究,并展示了不同网络安全任务和编程语言中使用的表示类型。此外,我们研究了使用不同表示的模型类型。我们发现基于图的表示是最受欢迎的表示类别,而标记器和抽象语法树(AST)是总体上最受欢迎的两种表示(例如,AST和标记器是论文数量最多的表示,而基于图的表示是论文数量最多的类别)。我们还发现最受欢迎的网络安全任务是漏洞检测,而被最多技术覆盖的语言是C。最后,我们发现基于序列的模型是最受欢迎的模型类别,支持向量机是总体上最受欢迎的模型。
更新时间: 2025-04-09 15:06:35
领域: cs.LG,cs.CR
Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets
The rise of online platforms exacerbated the spread of hate speech, demanding scalable and effective detection. However, the accuracy of hate speech detection systems heavily relies on human-labeled data, which is inherently susceptible to biases. While previous work has examined the issue, the interplay between the characteristics of the annotator and those of the target of the hate are still unexplored. We fill this gap by leveraging an extensive dataset with rich socio-demographic information of both annotators and targets, uncovering how human biases manifest in relation to the target's attributes. Our analysis surfaces the presence of widespread biases, which we quantitatively describe and characterize based on their intensity and prevalence, revealing marked differences. Furthermore, we compare human biases with those exhibited by persona-based LLMs. Our findings indicate that while persona-based LLMs do exhibit biases, these differ significantly from those of human annotators. Overall, our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.
Updated: 2025-04-09 15:05:27
标题: 人类和LLM在仇恨言论注释中的偏见:注释者和目标的社会人口统计分析
摘要: 在线平台的兴起加剧了仇恨言论的传播,需要可扩展且有效的检测。然而,仇恨言论检测系统的准确性在很大程度上依赖于人工标记的数据,这种数据本身容易受到偏见的影响。虽然以前的研究已经检验了这个问题,但标注者的特征与仇恨言论对象的特征之间的相互作用仍未被探讨。我们通过利用包含注释者和目标丰富的社会人口统计信息的庞大数据集,揭示了人类偏见如何与目标的属性相关联。我们的分析表明存在普遍的偏见,我们根据其强度和普遍性进行了定量描述和表征,揭示了明显的差异。此外,我们将人类偏见与基于人物的LLM所展示的偏见进行了比较。我们的研究结果表明,虽然基于人物的LLM确实存在偏见,但这些偏见与人类注释者的偏见显著不同。总的来说,我们的工作在仇恨言论注释中提供了新颖而细致的人类偏见研究结果,同时为人工智能驱动的仇恨言论检测系统的设计提供了新的见解。
更新时间: 2025-04-09 15:05:27
领域: cs.CL,cs.AI,cs.HC
Adaptive Computation Pruning for the Forgetting Transformer
The recently proposed Forgetting Transformer (FoX) incorporates a forget gate into softmax attention and has shown consistently better or on-par performance compared to the standard RoPE-based Transformer. Notably, many attention heads in FoX tend to forget quickly, causing their output at each timestep to rely primarily on the local context. Based on this observation, we propose Adaptive Computation Pruning (ACP) for FoX, a method that dynamically prunes computations involving input-output dependencies that are strongly decayed by the forget gate. This is achieved using a dynamically set pruning threshold that ensures that the pruned attention weights remain negligible. We apply ACP to language model pretraining with FoX and show it consistently reduces the number of FLOPs in softmax attention by around 70% across different model sizes and context lengths, resulting in a roughly 10% to 35% improvement in training throughput. Furthermore, longer context lengths yield greater computational savings. All these speed improvements are achieved without any performance degradation. We also perform several analyses to provide deeper insights into our method, such as examining the pruning patterns and analyzing the distribution of FLOP savings across different attention heads. Our code is available at https://github.com/zhixuan-lin/arctic-fox.
Updated: 2025-04-09 14:57:55
标题: 用于遗忘Transformer的自适应计算修剪
摘要: 最近提出的Forgetting Transformer(FoX)将遗忘门整合到softmax注意力机制中,与标准的基于RoPE的Transformer相比,表现出一致更好或相当的性能。值得注意的是,FoX中许多注意力头往往会快速遗忘,导致它们在每个时间步的输出主要依赖于局部上下文。基于这一观察,我们提出了用于FoX的自适应计算修剪(ACP)方法,该方法动态修剪那些经遗忘门强烈衰减的输入-输出依赖所涉及的计算。这是通过动态设置的修剪阈值实现的,以确保被修剪的注意力权重保持在可忽略的水平。我们将ACP应用于FoX的语言模型预训练,并表明在不同模型规模和上下文长度下,它能将softmax注意力中的FLOPs稳定减少约70%,从而使训练吞吐量提高约10%至35%。此外,上下文长度越长,计算节省越大。所有这些速度提升都是在没有任何性能退化的情况下实现的。我们还进行了若干分析以更深入地理解我们的方法,例如检查修剪模式,以及分析FLOP节省在不同注意力头之间的分布。我们的代码可在https://github.com/zhixuan-lin/arctic-fox获取。
更新时间: 2025-04-09 14:57:55
领域: cs.LG,cs.AI,cs.CL
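To make the pruning idea above concrete, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not FoX's actual implementation: the paper prunes with a dynamically set threshold (and in practice block-wise), while this toy version uses a fixed threshold and a per-pair loop.

```python
import numpy as np

def acp_keep_mask(log_forget, threshold=1e-6):
    """Keep (query i, key j) only if the forget-gate decay accumulated over
    steps j+1..i has not already pushed the attention weight below
    `threshold` on its own. log_forget: (T,) per-step log gate values <= 0."""
    T = len(log_forget)
    cum = np.concatenate([[0.0], np.cumsum(log_forget)])  # prefix sums
    log_th = np.log(threshold)
    keep = np.zeros((T, T), dtype=bool)
    for i in range(T):
        for j in range(i + 1):
            decay = cum[i + 1] - cum[j + 1]  # log of the decay factor
            keep[i, j] = decay > log_th
    return keep

# Fast-forgetting heads (gate values well below 1) yield sparse masks.
gates = np.random.uniform(0.6, 1.0, size=256)
print(acp_keep_mask(np.log(gates)).mean())  # fraction of pairs still computed
```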
Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration
Agents powered by Large Language Models (LLMs) have recently demonstrated impressive capabilities in various tasks. Still, they face limitations in tasks requiring specific, structured knowledge, flexibility, or accountable decision-making. While agents are capable of perceiving their environments, forming inferences, planning, and executing actions towards goals, they often face issues such as hallucinations and lack of contextual memory across interactions. This paper explores how Case-Based Reasoning (CBR), a strategy that solves new problems by referencing past experiences, can be integrated into LLM agent frameworks. This integration allows LLMs to leverage explicit knowledge, enhancing their effectiveness. We systematically review the theoretical foundations of these enhanced agents, identify critical framework components, and formulate a mathematical model for the CBR processes of case retrieval, adaptation, and learning. We also evaluate CBR-enhanced agents against other methods like Chain-of-Thought reasoning and standard Retrieval-Augmented Generation, analyzing their relative strengths. Moreover, we explore how leveraging CBR's cognitive dimensions (including self-reflection, introspection, and curiosity) via goal-driven autonomy mechanisms can further enhance the LLM agent capabilities. Contributing to the ongoing research on neuro-symbolic hybrid systems, this work posits CBR as a viable technique for enhancing the reasoning skills and cognitive aspects of autonomous LLM agents.
Updated: 2025-04-09 14:51:02
标题: 面向LLM代理的基于案例推理综述:理论基础、架构组件与认知整合
摘要: 由大型语言模型(LLMs)驱动的代理最近在各种任务中展示了令人印象深刻的能力。然而,在需要特定的结构化知识、灵活性或可问责决策的任务中,它们仍面临限制。虽然代理能够感知环境、形成推理、进行规划并执行朝向目标的行动,但它们经常面临幻觉以及跨交互缺乏上下文记忆等问题。本文探讨了如何将基于案例的推理(CBR,一种通过参考过去经验来解决新问题的策略)整合到LLM代理框架中。这种整合使LLMs能够利用显式知识,增强其有效性。我们系统地回顾了这些增强代理的理论基础,确定了关键的框架组件,并为案例检索、适应和学习等CBR过程建立了数学模型。我们还将CBR增强的代理与思维链推理和标准检索增强生成等其他方法进行了对比评估,分析它们各自的相对优势。此外,我们探讨了如何通过目标驱动的自主机制利用CBR的认知维度(包括自我反思、内省和好奇心),进一步增强LLM代理的能力。作为对神经-符号混合系统持续研究的贡献,本文提出CBR是增强自主LLM代理推理技能与认知能力的一种可行技术。
更新时间: 2025-04-09 14:51:02
领域: cs.AI,cs.MA,68,I.2; I.2.7
ASRL: A robust loss function with potential for development
In this article, we propose a partition-wise robust loss function that builds on previous robust loss functions. Its key characteristic is that it achieves high robustness and a wide range of applicability through its partition-wise design and adaptive parameter adjustment. Finally, we verified the advantages and development potential of this loss function by applying it to regression problems and comparing it with other loss functions on five datasets that differ in dimensionality, sample size, and domain. The results of multiple experiments demonstrate the advantages of our loss function.
Updated: 2025-04-09 14:40:46
标题: ASRL:具有发展潜力的鲁棒损失函数
摘要: 在这篇文章中,我们在已有鲁棒损失函数的基础上提出了一种分区式(partition-wise)鲁棒损失函数。该损失函数的特点是通过分区式设计和自适应参数调整,实现了高鲁棒性和广泛的适用性。最后,通过将该损失函数应用于回归问题,并在五个不同的数据集(维度、样本数和领域各不相同)上与其他损失函数进行比较,验证了该损失函数的优势和发展潜力。多次实验的结果证明了我们的损失函数的优势。
更新时间: 2025-04-09 14:40:46
领域: cs.LG
Outlier dimensions favor frequent tokens in language models
We study last-layer outlier dimensions, i.e. dimensions that display extreme activations for the majority of inputs. We show that outlier dimensions arise in many different modern language models, and trace their function back to the heuristic of constantly predicting frequent words. We further show how a model can block this heuristic when it is not contextually appropriate, by assigning a counterbalancing weight mass to the remaining dimensions, and we investigate which model parameters boost outlier dimensions and when they arise during training. We conclude that outlier dimensions are a specialized mechanism discovered by many distinct models to implement a useful token prediction heuristic.
Updated: 2025-04-09 14:37:48
标题: 异常维度偏向语言模型中频繁出现的标记
摘要: 我们研究最后一层的异常维度,即对大多数输入显示极端激活的维度。我们展示异常维度出现在许多不同的现代语言模型中,并追溯其功能到不断预测常见单词的启发式。我们进一步展示了当这种启发式在语境上不合适时,模型如何通过为剩余维度分配一个抵消权重来阻止这种启发式,并研究了哪些模型参数增强异常维度以及它们在训练过程中何时出现。我们得出结论,异常维度是许多不同模型发现的一种专门机制,用于实现有用的标记预测启发式。
更新时间: 2025-04-09 14:37:48
领域: cs.CL,cs.AI,I.2.7
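A simple way to operationalize the notion of last-layer outlier dimensions is to flag dimensions whose average absolute activation is extreme relative to the rest. The PyTorch sketch below is one such diagnostic (the z-score threshold is illustrative), not the paper's exact criterion.

```python
import torch

def find_outlier_dims(hidden, z_thresh=6.0):
    """hidden: (N, D) last-layer activations collected over many inputs.
    Returns indices of dimensions whose mean |activation| is a strong
    outlier across dimensions."""
    per_dim = hidden.abs().mean(0)                 # (D,)
    z = (per_dim - per_dim.mean()) / per_dim.std()
    return torch.nonzero(z > z_thresh).flatten()

h = torch.randn(10000, 768)
h[:, 42] += 50.0                  # plant an artificial outlier dimension
print(find_outlier_dims(h))       # -> tensor([42])
```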
Saliency-driven Dynamic Token Pruning for Large Language Models
Despite the recent success of large language models (LLMs), long-sequence inference remains particularly challenging for LLMs due to the quadratic computational complexity of the attention mechanism. Inspired by the interpretability theory of feature attribution in neural network models, we observe that not all tokens have the same contribution. Based on this observation, we propose a novel token pruning framework, namely Saliency-driven Dynamic Token Pruning (SDTP), to gradually and dynamically prune redundant tokens based on the input context. Specifically, a lightweight saliency-driven prediction module is designed to estimate the importance score of each token from its hidden state; it is added to different layers of the LLM to hierarchically prune redundant tokens. Furthermore, a ranking-based optimization strategy is proposed to minimize the ranking divergence of the saliency score and the predicted importance score. Extensive experiments have shown that our framework is generalizable to various models and datasets. By hierarchically pruning 65\% of the input tokens, our method greatly reduces 33\% $\sim$ 47\% FLOPs and achieves speedup up to 1.75$\times$ during inference, while maintaining comparable performance. We further demonstrate that SDTP can be combined with a KV cache compression method for further compression.
Updated: 2025-04-09 14:36:19
标题: 基于显著性驱动的大型语言模型动态标记修剪
摘要: 尽管大型语言模型(LLMs)最近取得了成功,但在长序列推理场景中,由于注意力机制的二次计算复杂度,LLMs尤其面临挑战。受神经网络模型中特征归因的可解释性理论启发,我们观察到并非所有标记的贡献都相同。基于这一观察,我们提出了一种新颖的标记修剪框架,即基于显著性驱动的动态标记修剪(SDTP),根据输入上下文逐渐动态修剪多余的标记。具体而言,设计了一个轻量级的显著性驱动预测模块,利用每个标记的隐藏状态估计其重要性得分,该模块被添加到LLM的不同层中,以分层修剪多余的标记。此外,提出了基于排名的优化策略,以最小化显著性得分和预测重要性得分之间的排名差异。大量实验证明,我们的框架适用于各种模型和数据集。通过分层修剪输入标记的65%,我们的方法大大减少了33%到47%的浮点运算数,并在推理过程中实现了高达1.75倍的加速,同时保持了可比较的性能。我们进一步证明,SDTP可以与KV缓存压缩方法结合以进一步压缩。
更新时间: 2025-04-09 14:36:19
领域: cs.CL,cs.AI
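The core mechanism (a lightweight scorer over hidden states followed by top-k token retention) can be sketched as follows in PyTorch. The module shape and keep ratio are illustrative, and SDTP's ranking-based training objective is omitted.

```python
import torch
import torch.nn as nn

class SaliencyPruner(nn.Module):
    """Scores each token from its hidden state and keeps the top fraction,
    preserving the original temporal order."""
    def __init__(self, hidden_dim, keep_ratio=0.35):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4), nn.GELU(),
            nn.Linear(hidden_dim // 4, 1))
        self.keep_ratio = keep_ratio

    def forward(self, hidden):                      # hidden: (B, T, D)
        scores = self.scorer(hidden).squeeze(-1)    # (B, T)
        k = max(1, int(hidden.size(1) * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices.sort(dim=1).values
        batch = torch.arange(hidden.size(0)).unsqueeze(-1)
        return hidden[batch, idx], idx              # (B, k, D), kept indices

pruner = SaliencyPruner(hidden_dim=64)
kept, idx = pruner(torch.randn(2, 100, 64))        # 100 -> 35 tokens
```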
Beyond Tools: Generative AI as Epistemic Infrastructure in Education
As generative AI rapidly integrates into educational infrastructures worldwide, it transforms how knowledge gets created, validated, and shared, yet current discourse inadequately addresses its implications as epistemic infrastructure mediating teaching and learning. This paper investigates how AI systems function as epistemic infrastructures in education and their impact on human epistemic agency. Adopting a situated cognition perspective and following a value-sensitive design approach, the study conducts a technical investigation of two representative AI systems in educational settings, analyzing their impact on teacher practice across three dimensions: affordances for skilled epistemic actions, support for epistemic sensitivity, and implications for long-term habit formation. The analysis reveals that current AI systems inadequately support teachers' skilled epistemic actions, insufficiently foster epistemic sensitivity, and potentially cultivate problematic habits that prioritize efficiency over epistemic agency. To address these challenges, the paper recommends recognizing the infrastructural transformation occurring in education, developing AI environments that stimulate skilled actions while upholding epistemic norms, and involving educators in AI design processes -- recommendations aimed at fostering AI integration that aligns with core educational values and maintains human epistemic agency.
Updated: 2025-04-09 14:35:30
标题: 超越工具:生成式人工智能作为教育中的认识基础设施
摘要: 随着生成式人工智能迅速融入全球教育基础设施,它转变了知识的创造、验证和分享方式,然而目前的讨论未能充分解决其作为认知基础设施在教学和学习中的影响。本文研究了人工智能系统在教育中作为认知基础设施的功能以及对人类认知代理的影响。采用了情境认知的观点并遵循价值敏感的设计方法,研究对教育环境中两个代表性人工智能系统进行了技术调查,分析了它们对教师实践的影响,包括:支持熟练认知行为的功能、支持认知敏感性和对长期习惯形成的影响。分析表明,目前的人工智能系统未能充分支持教师的熟练认知行为,未能充分培养认知敏感性,并可能培养出重视效率而非认知代理的问题习惯。为了解决这些挑战,本文建议认识到教育中正在发生的基础设施转型,开发能够激发熟练行为同时维护认知规范的人工智能环境,并将教育工作者纳入人工智能设计过程中,以促进与核心教育价值观一致并保持人类认知代理的人工智能整合。
更新时间: 2025-04-09 14:35:30
领域: cs.CY,cs.AI,K.3.1; K.4.3; H.5.2
RO-FIGS: Efficient and Expressive Tree-Based Ensembles for Tabular Data
Tree-based models are often robust to uninformative features and can accurately capture non-smooth, complex decision boundaries. Consequently, they often outperform neural network-based models on tabular datasets at a significantly lower computational cost. Nevertheless, the capability of traditional tree-based ensembles to express complex relationships efficiently is limited by using a single feature to make splits. To improve the efficiency and expressiveness of tree-based methods, we propose Random Oblique Fast Interpretable Greedy-Tree Sums (RO-FIGS). RO-FIGS builds on Fast Interpretable Greedy-Tree Sums, and extends it by learning trees with oblique or multivariate splits, where each split consists of a linear combination learnt from random subsets of features. This helps uncover interactions between features and improves performance. The proposed method is suitable for tabular datasets with both numerical and categorical features. We evaluate RO-FIGS on 22 real-world tabular datasets, demonstrating superior performance and much smaller models over other tree- and neural network-based methods. Additionally, we analyse their splits to reveal valuable insights into feature interactions, enriching the information learnt from SHAP summary plots, and thereby demonstrating the enhanced interpretability of RO-FIGS models. The proposed method is well-suited for applications, where balance between accuracy and interpretability is essential.
Updated: 2025-04-09 14:35:24
标题: RO-FIGS:用于表格数据的高效且表达力强的基于树的集成算法
摘要: 基于树的模型通常对无信息特征具有鲁棒性,并且能够准确捕捉非平滑、复杂的决策边界。因此,在表格数据集上,它们通常以较低的计算成本表现优于基于神经网络的模型。然而,传统的基于树的集成模型表达复杂关系的能力受到仅使用单个特征进行分割的限制。为了提高基于树的方法的效率和表达能力,我们提出了Random Oblique Fast Interpretable Greedy-Tree Sums(RO-FIGS)。RO-FIGS建立在Fast Interpretable Greedy-Tree Sums的基础上,并通过学习具有倾斜或多变量分割的树来扩展它,其中每个分割由从特征的随机子集中学习的线性组合组成。这有助于揭示特征之间的相互作用并提高性能。该方法适用于具有数值和分类特征的表格数据集。我们在22个真实世界的表格数据集上评估了RO-FIGS,表现出优越的性能和比其他基于树和神经网络的方法更小的模型。此外,我们分析它们的分割,以揭示有价值的特征相互作用见解,丰富从SHAP汇总图中学到的信息,从而展示了RO-FIGS模型的增强可解释性。该方法非常适用于需要在准确性和可解释性之间取得平衡的应用场景。
更新时间: 2025-04-09 14:35:24
领域: cs.LG
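An oblique split of the kind described, i.e. a linear combination learnt from a random feature subset and thresholded to route samples, might look like the sketch below (scikit-learn; the actual RO-FIGS fitting procedure inside greedy tree sums differs).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def oblique_split(X, y, n_features=4, rng=None):
    """Learns one oblique split over a random subset of features."""
    rng = rng if rng is not None else np.random.default_rng()
    subset = rng.choice(X.shape[1], size=min(n_features, X.shape[1]),
                        replace=False)
    clf = LogisticRegression(max_iter=1000).fit(X[:, subset], y)
    w, b = clf.coef_[0], clf.intercept_[0]
    return subset, lambda x: x[..., subset] @ w + b > 0  # routing rule

X = np.random.randn(500, 10)
y = (X[:, 2] + 0.5 * X[:, 7] > 0).astype(int)       # target with interaction
subset, route = oblique_split(X, y, rng=np.random.default_rng(0))
print(subset, route(X).mean())                      # chosen features, split balance
```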
Preference-Based Alignment of Discrete Diffusion Models
Diffusion models have achieved state-of-the-art performance across multiple domains, with recent advancements extending their applicability to discrete data. However, aligning discrete diffusion models with task-specific preferences remains challenging, particularly in scenarios where explicit reward functions are unavailable. In this work, we introduce Discrete Diffusion DPO (D2-DPO), the first adaptation of Direct Preference Optimization (DPO) to discrete diffusion models formulated as continuous-time Markov chains. Our approach derives a novel loss function that directly fine-tunes the generative process using preference data while preserving fidelity to a reference distribution. We validate D2-DPO on a structured binary sequence generation task, demonstrating that the method effectively aligns model outputs with preferences while maintaining structural validity. Our results highlight that D2-DPO enables controlled fine-tuning without requiring explicit reward models, making it a practical alternative to reinforcement learning-based approaches. Future research will explore extending D2-DPO to more complex generative tasks, including language modeling and protein sequence generation, as well as investigating alternative noise schedules, such as uniform noising, to enhance flexibility across different applications.
Updated: 2025-04-09 14:34:53
标题: 基于偏好的离散扩散模型对齐
摘要: 扩散模型在多个领域取得了最先进的性能,最近的进展将它们的适用性扩展到离散数据。然而,在没有明确奖励函数的情况下,将离散扩散模型与任务特定偏好对齐仍然具有挑战性。在这项工作中,我们介绍了离散扩散DPO(D2-DPO),这是将直接偏好优化(DPO)调整为连续时间马尔可夫链的离散扩散模型的第一个适应。我们的方法提出了一种新颖的损失函数,通过偏好数据直接微调生成过程,同时保持与参考分布的忠实度。我们在结构化的二进制序列生成任务上验证了D2-DPO,证明了该方法能够有效地将模型输出与偏好对齐,同时保持结构有效性。我们的结果突显了D2-DPO能够实现受控微调,而无需明确奖励模型,使其成为强化学习方法的实际替代方案。未来的研究将探索将D2-DPO扩展到更复杂的生成任务,包括语言建模和蛋白质序列生成,以及研究替代的噪声时间表,如均匀噪声,以增强在不同应用中的灵活性。
更新时间: 2025-04-09 14:34:53
领域: cs.LG,cs.AI
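For reference, the generic DPO objective that D2-DPO adapts to discrete diffusion looks as follows. The log-probabilities here are placeholders for whatever the policy and reference generative processes assign to preferred (w) and dispreferred (l) samples; the paper's continuous-time Markov chain derivation is not reproduced.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO: push the policy's implicit reward margin between the
    preferred and dispreferred sample above the reference model's."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

loss = dpo_loss(torch.tensor([-3.0]), torch.tensor([-5.0]),
                torch.tensor([-4.0]), torch.tensor([-4.0]))
print(loss)  # small loss: preferred sample already favored vs reference
```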
Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
Automatic dietary assessment based on food images remains a challenge, requiring precise food detection, segmentation, and classification. Vision-Language Models (VLMs) offer new possibilities by integrating visual and textual reasoning. In this study, we evaluate six state-of-the-art VLMs (ChatGPT, Gemini, Claude, Moondream, DeepSeek, and LLaVA), analyzing their capabilities in food recognition at different levels. For the experimental framework, we introduce the FoodNExTDB, a unique food image database that contains 9,263 expert-labeled images across 10 categories (e.g., "protein source"), 62 subcategories (e.g., "poultry"), and 9 cooking styles (e.g., "grilled"). In total, FoodNExTDB includes 50k nutritional labels generated by seven experts who manually annotated all images in the database. Also, we propose a novel evaluation metric, Expert-Weighted Recall (EWR), that accounts for the inter-annotator variability. Results show that closed-source models outperform open-source ones, achieving over 90% EWR in recognizing food products in images containing a single product. Despite their potential, current VLMs face challenges in fine-grained food recognition, particularly in distinguishing subtle differences in cooking styles and visually similar food items, which limits their reliability for automatic dietary assessment. The FoodNExTDB database is publicly available at https://github.com/AI4Food/FoodNExtDB.
Updated: 2025-04-09 14:33:59
标题: 视觉语言模型是否准备好进行饮食评估?探索人工智能驱动的食物图像识别的下一个领域
摘要: 基于食物图像的自动饮食评估仍然是一个挑战,需要精确的食物检测、分割和分类。视觉语言模型(VLMs)通过整合视觉和文本推理,提供了新的可能性。在这项研究中,我们评估了六种最先进的VLMs(ChatGPT、Gemini、Claude、Moondream、DeepSeek和LLaVA),分析它们在不同级别的食物识别中的能力。在实验框架中,我们引入了FoodNExTDB,这是一个独特的食物图像数据库,包含了10个类别(例如“蛋白质来源”)、62个子类别(例如“家禽”)和9种烹饪风格(例如“烤制”)的9,263张专家标记的图片。总共,FoodNExTDB包含了由七位专家生成的50,000个营养标签,他们手动注释了数据库中的所有图片。此外,我们提出了一种新颖的评估指标,即专家加权召回率(EWR),考虑了注释者之间的变异性。结果显示,封闭源模型优于开源模型,在识别单一产品图片中的食品产品时,达到了超过90%的EWR。尽管具有潜力,当前的VLMs面临着在细粒度食物识别方面的挑战,特别是在区分烹饪风格的细微差异和在视觉上相似的食物项目方面,这限制了它们在自动饮食评估方面的可靠性。FoodNExTDB数据库可以在https://github.com/AI4Food/FoodNExtDB 上公开获取。
更新时间: 2025-04-09 14:33:59
领域: cs.CV,cs.AI
Longitudinal Assessment of Lung Lesion Burden in CT
In the U.S., lung cancer is the second major cause of death. Early detection of suspicious lung nodules is crucial for patient treatment planning, management, and improving outcomes. Many approaches for lung nodule segmentation and volumetric analysis have been proposed, but few have looked at longitudinal changes in total lung tumor burden. In this work, we trained two 3D models (nnUNet) with and without anatomical priors to automatically segment lung lesions and quantified total lesion burden for each patient. The 3D model without priors significantly outperformed ($p < .001$) the model trained with anatomy priors. For detecting clinically significant lesions $>$ 1cm, a precision of 71.3\%, sensitivity of 68.4\%, and F1-score of 69.8\% was achieved. For segmentation, a Dice score of 77.1 $\pm$ 20.3 and Hausdorff distance error of 11.7 $\pm$ 24.1 mm was obtained. The median lesion burden was 6.4 cc (IQR: 2.1, 18.1) and the median volume difference between manual and automated measurements was 0.02 cc (IQR: -2.8, 1.2). Agreements were also evaluated with linear regression and Bland-Altman plots. The proposed approach can produce a personalized evaluation of the total tumor burden for a patient and facilitate interval change tracking over time.
Updated: 2025-04-09 14:30:43
标题: CT影像中肺部病变负担的纵向评估
摘要: 在美国,肺癌是第二大死因。对可疑肺结节进行早期检测对于患者治疗计划、管理和改善预后至关重要。已经提出了许多肺结节分割和体积分析方法,但很少有人考虑总肺肿瘤负担的纵向变化。在这项工作中,我们训练了两个3D模型(nnUNet),一个带有解剖先验,一个不带有解剖先验,以自动分割肺部病变并量化每位患者的总病变负担。不带先验的3D模型在表现上明显优于($p < .001$)使用解剖先验训练的模型。对于检测临床意义的大于1cm的病变,实现了71.3\%的精度,68.4\%的灵敏度和69.8\%的F1分数。对于分割,获得了77.1 $\pm$ 20.3的Dice得分和11.7 $\pm$ 24.1mm的Hausdorff距离误差。中位病变负担为6.4 cc(IQR: 2.1,18.1),手动和自动测量之间的中位体积差为0.02 cc(IQR: -2.8,1.2)。还使用线性回归和Bland-Altman图评估了一致性。所提出的方法可以为患者提供总肿瘤负担的个性化评估,并促进随时间的间隔变化跟踪。
更新时间: 2025-04-09 14:30:43
领域: eess.IV,cs.AI,cs.CV
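The burden computation itself is straightforward once a binary lesion mask is available; a minimal NumPy sketch, assuming the voxel spacing in millimetres is known from the CT header:

```python
import numpy as np

def lesion_burden_cc(mask, spacing_mm):
    """Total lesion volume in cc from a binary 3D segmentation mask.
    spacing_mm: (dz, dy, dx) voxel spacing; 1 cc = 1000 mm^3."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return mask.astype(bool).sum() * voxel_mm3 / 1000.0

mask = np.zeros((64, 128, 128), dtype=np.uint8)
mask[30:34, 50:60, 50:60] = 1
print(lesion_burden_cc(mask, (3.0, 0.7, 0.7)))  # volume in cc
```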
The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data
Differentially Private (DP) generative marginal models are often used in the wild to release synthetic tabular datasets in lieu of sensitive data while providing formal privacy guarantees. These models approximate low-dimensional marginals or query workloads; crucially, they require the training data to be pre-discretized, i.e., continuous values need to first be partitioned into bins. However, as the range of values (or their domain) is often inferred directly from the training data, with the number of bins and bin edges typically defined arbitrarily, this approach can ultimately break end-to-end DP guarantees and may not always yield optimal utility. In this paper, we present an extensive measurement study of four discretization strategies in the context of DP marginal generative models. More precisely, we design DP versions of three discretizers (uniform, quantile, and k-means) and reimplement the PrivTree algorithm. We find that optimizing both the choice of discretizer and bin count can improve utility, on average, by almost 30% across six DP marginal models, compared to the default strategy and number of bins, with PrivTree being the best-performing discretizer in the majority of cases. We demonstrate that, while DP generative models with non-private discretization remain vulnerable to membership inference attacks, applying DP during discretization effectively mitigates this risk. Finally, we propose an optimized approach for automatically selecting the optimal number of bins, achieving high utility while reducing both privacy budget consumption and computational overhead.
Updated: 2025-04-09 14:30:30
标题: 《离散性的重要性:测量端到端差分隐私合成数据中离散化的影响》
摘要: 差分隐私(DP)生成边际模型经常在野外使用,以发布合成表格数据集代替敏感数据,同时提供正式的隐私保证。这些模型近似低维边际或查询工作负载;关键是,它们要求训练数据事先离散化,即需要首先将连续值分成区间。然而,由于数值的范围(或其域)通常直接从训练数据推断出来,与区间的数量和边界通常任意定义,这种方法最终可能会破坏端到端的DP保证,并且可能并不总是产生最佳效用。 在本文中,我们在DP边际生成模型的环境中,对四种离散化策略进行了广泛的测量研究。更具体地说,我们设计了三种离散化器(均匀、分位数和k均值)的DP版本,并重新实现了PrivTree算法。我们发现,优化离散化器和区间数的选择可以将效用平均提高近30%,与默认策略和区间数相比,跨六个DP边际模型,其中PrivTree在大多数情况下表现最佳。我们证明,尽管具有非私有离散化的DP生成模型仍然容易受到成员推断攻击,但在离散化过程中应用DP可以有效减轻这种风险。最后,我们提出了一种优化方法,用于自动选择最佳区间数,实现高效用的同时减少隐私预算消耗和计算开销。
更新时间: 2025-04-09 14:30:30
领域: cs.CR,cs.LG
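The pitfall the paper measures, inferring the value range from the training data before binning, is easiest to see next to a compliant baseline. The sketch below computes a noisy 1-D marginal with publicly fixed bounds, so the bin edges themselves leak nothing (Laplace mechanism; parameter choices illustrative).

```python
import numpy as np

def dp_marginal(values, n_bins, lo, hi, epsilon, rng):
    """epsilon-DP 1-D histogram. (lo, hi) must be public constants, not the
    data's min/max; each record hits one bin, so the L1 sensitivity is 1
    and Laplace(1/epsilon) noise suffices."""
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(np.clip(values, lo, hi), bins=edges)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=n_bins)
    return edges, np.clip(noisy, 0, None)

rng = np.random.default_rng(0)
edges, hist = dp_marginal(rng.normal(40, 10, 1000), 16, 0, 100, 1.0, rng)
```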
Leveraging Anatomical Priors for Automated Pancreas Segmentation on Abdominal CT
An accurate segmentation of the pancreas on CT is crucial to identify pancreatic pathologies and extract imaging-based biomarkers. However, prior research on pancreas segmentation has primarily focused on modifying the segmentation model architecture or utilizing pre- and post-processing techniques. In this article, we investigate the utility of anatomical priors to enhance the segmentation performance of the pancreas. Two 3D full-resolution nnU-Net models were trained, one with 8 refined labels from the public PANORAMA dataset, and another that combined them with labels derived from the public TotalSegmentator (TS) tool. The addition of anatomical priors resulted in a 6\% increase in Dice score ($p < .001$) and a 36.5 mm decrease in Hausdorff distance for pancreas segmentation ($p < .001$). Moreover, the pancreas was always detected when anatomy priors were used, whereas there were 8 instances of failed detections without their use. The use of anatomy priors shows promise for pancreas segmentation and subsequent derivation of imaging biomarkers.
Updated: 2025-04-09 14:29:08
标题: 利用解剖先验知识进行腹部CT自动胰腺分割
摘要: 胰腺在CT上的准确分割对于识别胰腺病变并提取基于影像的生物标志物至关重要。然而,在以往的胰腺分割研究中,主要集中在修改分割模型架构或利用预处理和后处理技术。在本文中,我们研究了解剖先验的实用性,以增强胰腺分割的性能。训练了两个3D全分辨率的nnU-Net模型,一个使用来自公共PANORAMA数据集的8个精细标签,另一个将它们与来自公共TotalSegmentator(TS)工具的标签结合在一起。添加解剖先验导致Dice分数增加了6\%($p < .001$),胰腺分割的Hausdorff距离减少了36.5毫米($p < .001$)。此外,当使用解剖先验时,始终可以检测到胰腺,而在没有使用解剖先验的情况下有8个未成功检测的实例。使用解剖先验对于胰腺分割和随后推导影像生物标志物显示出潜力。
更新时间: 2025-04-09 14:29:08
领域: eess.IV,cs.AI,cs.CV
An Analysis of Temporal Dropout in Earth Observation Time Series for Regression Tasks
Missing instances in time series data impose a significant challenge to deep learning models, particularly in regression tasks. In the Earth Observation field, satellite failure or cloud occlusion frequently results in missing time-steps, introducing uncertainties in the predicted output and causing a decline in predictive performance. While many studies address missing time-steps through data augmentation to improve model robustness, the uncertainty arising at the input level is commonly overlooked. To address this gap, we introduce Monte Carlo Temporal Dropout (MC-TD), a method that explicitly accounts for input-level uncertainty by randomly dropping time-steps during inference using a predefined dropout ratio, thereby simulating the effect of missing data. To bypass the need for costly searches for the optimal dropout ratio, we extend this approach with Monte Carlo Concrete Temporal Dropout (MC-ConcTD), a method that learns the optimal dropout distribution directly. Both MC-TD and MC-ConcTD are applied during inference, leveraging Monte Carlo sampling for uncertainty quantification. Experiments on three EO time-series datasets demonstrate that MC-ConcTD improves predictive performance and uncertainty calibration compared to existing approaches. Additionally, we highlight the advantages of adaptive dropout tuning over manual selection, making uncertainty quantification more robust and accessible for EO applications.
Updated: 2025-04-09 14:23:04
标题: 面向回归任务的地球观测时间序列中时间丢弃的分析
摘要: 时间序列数据中缺失的实例对深度学习模型构成了重大挑战,特别是在回归任务中。在地球观测领域,卫星故障或云遮挡经常导致时间步的缺失,给预测输出引入不确定性,并导致预测性能下降。虽然许多研究通过数据增强来解决时间步缺失问题以提高模型的鲁棒性,但输入层面产生的不确定性常常被忽视。为了解决这一问题,我们引入了蒙特卡罗时间丢弃(MC-TD)方法,该方法在推断过程中按照预定义的丢弃率随机丢弃时间步,从而模拟缺失数据的影响,显式地考虑输入层面的不确定性。为了避免对最佳丢弃率进行代价高昂的搜索,我们将这一方法扩展为蒙特卡罗Concrete时间丢弃(MC-ConcTD)方法,直接学习最佳的丢弃分布。MC-TD和MC-ConcTD均在推断阶段应用,利用蒙特卡罗采样进行不确定性量化。对三个地球观测时间序列数据集的实验证明,与现有方法相比,MC-ConcTD改善了预测性能和不确定性校准。此外,我们强调了自适应丢弃率调整相对于手动选择的优势,使不确定性量化对地球观测应用更加稳健且易于使用。
更新时间: 2025-04-09 14:23:04
领域: cs.LG,cs.AI,cs.CV
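The inference-time procedure is compact enough to sketch directly (PyTorch; masking by zeroing is a simplification, and an actual system might instead use the model's own missing-value handling):

```python
import torch

@torch.no_grad()
def mc_temporal_dropout(model, x, drop_ratio=0.2, n_samples=30):
    """x: (B, T, F) time series. Drop whole time-steps at random during
    inference and aggregate: mean = prediction, std = uncertainty."""
    preds = []
    for _ in range(n_samples):
        keep = (torch.rand(x.size(0), x.size(1), 1, device=x.device)
                > drop_ratio).float()
        preds.append(model(x * keep))
    preds = torch.stack(preds)              # (S, B, ...)
    return preds.mean(0), preds.std(0)
```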
UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation
In medical imaging, the primary challenge is collecting large-scale labeled data due to privacy concerns, logistics, and high labeling costs. In this work, we present the UK Biobank Organs and Bones (UKBOB), the largest labeled dataset of body organs, comprising 51,761 MRI 3D samples (equivalent to 17.9 million 2D images) and more than 1.37 billion 2D segmentation masks of 72 organs, all based on the UK Biobank MRI dataset. We utilize automatic labeling, introduce an automated label cleaning pipeline with organ-specific filters, and manually annotate a subset of 300 MRIs with 11 abdominal classes to validate the quality (referred to as UKBOB-manual). This approach allows for scaling up the dataset collection while maintaining confidence in the labels. We further confirm the validity of the labels by demonstrating zero-shot generalization of trained models on the filtered UKBOB to other small labeled datasets from similar domains (e.g., abdominal MRI). To further mitigate the effect of noisy labels, we propose a novel method called Entropy Test-time Adaptation (ETTA) to refine the segmentation output. We use UKBOB to train a foundation model, Swin-BOB, for 3D medical image segmentation based on the Swin-UNetr architecture, achieving state-of-the-art results in several benchmarks in 3D medical imaging, including the BRATS brain MRI tumor challenge (with a 0.4% improvement) and the BTCV abdominal CT scan benchmark (with a 1.3% improvement). The pre-trained models and the code are available at https://emmanuelleb985.github.io/ukbob , and the filtered labels will be made available with the UK Biobank.
Updated: 2025-04-09 14:10:51
标题: UKBOB:十亿MRI标记掩模用于可推广的3D医学图像分割
摘要: 在医学成像领域,主要挑战是由于隐私问题、后勤和高昂的标注成本,导致收集大规模标记数据困难。在这项工作中,我们提出了英国生物库器官和骨骼(UKBOB),这是最大的身体器官标记数据集,包括51,761个MRI三维样本(相当于1,790万个二维图像)和超过13.7亿个72个器官的二维分割掩模,所有这些基于英国生物库MRI数据集。我们利用自动标注,引入具有器官特定过滤器的自动标签清理管道,并手动标注了300个MRI的11个腹部类别的子集,以验证质量(称为UKBOB-manual)。这种方法允许扩大数据集收集规模,同时保持对标签的信心。我们通过展示经过滤的UKBOB上训练模型对来自类似领域(如腹部MRI)的其他小型标记数据集进行零样本泛化来进一步确认标签的有效性。为了进一步减轻噪声标签的影响,我们提出了一种名为熵测试时间适应(ETTA)的新方法来改进分割输出。我们使用UKBOB来训练基础模型Swin-BOB,基于Swin-UNetr架构进行3D医学图像分割,取得了在几个3D医学成像基准测试中的最新成果,包括BRATS脑部MRI肿瘤挑战(提高了0.4%)和BTCV腹部CT扫描基准测试(提高了1.3%)。预训练模型和代码可在https://emmanuelleb985.github.io/ukbob 上获取,经过筛选的标签将在英国生物库中提供。
更新时间: 2025-04-09 14:10:51
领域: cs.CV,cs.LG
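Entropy-based test-time adaptation, the generic idea behind ETTA, can be sketched in a few lines (PyTorch; the paper's exact refinement procedure for segmentation may differ):

```python
import torch

def entropy_tta_step(model, volume, optimizer):
    """One adaptation step on an unlabeled test volume: minimize the
    voxel-wise predictive entropy to sharpen the segmentation."""
    logits = model(volume)                       # (B, C, ...) class logits
    p = logits.softmax(dim=1)
    entropy = -(p * (p + 1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()
```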
Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval
Despite the recent advancement in Retrieval-Augmented Generation (RAG) systems, most retrieval methodologies are often developed for factual retrieval, which assumes query and positive documents are semantically similar. In this paper, we instead propose and study a more challenging type of retrieval task, called hidden rationale retrieval, in which query and document are not similar but can be inferred by reasoning chains, logic relationships, or empirical experiences. To address such problems, an instruction-tuned large language model (LLM) with a cross-encoder architecture could be a reasonable choice. To further strengthen pioneering LLM-based retrievers, we design a special instruction that transforms the retrieval task into a generative task by prompting the LLM to answer a binary-choice question. The model can be fine-tuned with direct preference optimization (DPO). The framework is also optimized for computational efficiency with no performance degradation. We name this retrieval framework RaHoRe and verify its zero-shot and fine-tuned performance superiority on Emotional Support Conversation (ESC), compared with previous retrieval works. Our study suggests the potential to employ LLMs as a foundation for a wider scope of retrieval tasks. Our codes, models, and datasets are available on https://github.com/flyfree5/LaHoRe.
Updated: 2025-04-09 14:08:58
标题: 大型语言模型可以作为基于隐藏原理的检索的基础
摘要: 尽管检索增强生成(RAG)系统近期取得了进展,但大多数检索方法通常是针对事实检索而开发的,这种方法假设查询和正样本文档在语义上相似。本文提出并研究了一种更具挑战性的检索任务类型,称为隐藏理由检索,其中查询和文档并不相似,但可以通过推理链、逻辑关系或经验来推断其关联。为了解决这些问题,一种带有交叉编码器架构的指令微调大型语言模型(LLM)可能是一个合理的选择。为了进一步强化开创性的基于LLM的检索器,我们设计了一个特殊的指令,通过提示LLM回答一个二选一问题,将检索任务转化为生成任务。该模型可以通过直接偏好优化(DPO)进行微调。该框架还针对计算效率进行了优化,且没有性能下降。我们将这种检索框架命名为RaHoRe,并验证了其在情感支持对话(ESC)上零样本和微调性能的优越性,优于以往的检索工作。我们的研究表明,LLM有潜力成为更广泛检索任务的基础。我们的代码、模型和数据集可以在https://github.com/flyfree5/LaHoRe 上找到。
更新时间: 2025-04-09 14:08:58
领域: cs.IR,cs.CL,cs.LG
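The binary-choice reformulation can be scored as the probability mass the LLM places on "Yes" versus "No" for the next token. A sketch assuming a Hugging Face causal LM; the prompt wording and token handling are illustrative, not the paper's exact instruction:

```python
import torch

@torch.no_grad()
def yes_probability(model, tokenizer, query, doc):
    """Relevance score for (query, doc) as P(Yes) normalized against P(No)."""
    prompt = (f"Query: {query}\nDocument: {doc}\n"
              "Could this document help respond to the query? Answer: ")
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    logits = model(ids).logits[0, -1]            # next-token logits
    yes_id = tokenizer(" Yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer(" No", add_special_tokens=False).input_ids[0]
    return torch.softmax(logits[[yes_id, no_id]], dim=0)[0].item()
```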
MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
This paper presents MedSegFactory, a versatile medical synthesis framework that generates high-quality paired medical images and segmentation masks across modalities and tasks. It aims to serve as an unlimited data repository, supplying image-mask pairs to enhance existing segmentation tools. The core of MedSegFactory is a dual-stream diffusion model, where one stream synthesizes medical images and the other generates corresponding segmentation masks. To ensure precise alignment between image-mask pairs, we introduce Joint Cross-Attention (JCA), enabling a collaborative denoising paradigm by dynamic cross-conditioning between streams. This bidirectional interaction allows both representations to guide each other's generation, enhancing consistency between generated pairs. MedSegFactory unlocks on-demand generation of paired medical images and segmentation masks through user-defined prompts that specify the target labels, imaging modalities, anatomical regions, and pathological conditions, facilitating scalable and high-quality data generation. This new paradigm of medical image synthesis enables seamless integration into diverse medical imaging workflows, enhancing both efficiency and accuracy. Extensive experiments show that MedSegFactory generates data of superior quality and usability, achieving competitive or state-of-the-art performance in 2D and 3D segmentation tasks while addressing data scarcity and regulatory constraints.
Updated: 2025-04-09 13:56:05
标题: MedSegFactory:医学图像掩模对的文本引导生成
摘要: 这篇论文介绍了MedSegFactory,这是一个多功能的医学合成框架,可以跨模态和任务生成高质量的配对医学影像和分割遮罩。它旨在作为一个无限数据存储库,提供图像-遮罩配对以增强现有的分割工具。MedSegFactory的核心是一个双流扩散模型,其中一个流合成医学影像,另一个生成相应的分割遮罩。为了确保图像-遮罩配对之间的精确对齐,我们引入了联合交叉注意力(JCA),通过流之间的动态交叉条件使协同去噪范式成为可能。这种双向交互允许两种表示指导彼此的生成,增强生成配对之间的一致性。MedSegFactory通过用户定义的提示解锁了对医学影像和分割遮罩的按需生成,这些提示指定了目标标签、成像模态、解剖区域和病理条件,促进了可扩展和高质量的数据生成。这种新的医学影像合成范式使其能够无缝集成到各种医学影像工作流中,提高了效率和准确性。大量实验证明,MedSegFactory生成的数据质量和可用性优越,在2D和3D分割任务中实现了有竞争力或最新技术水平的性能,同时解决了数据稀缺和监管约束问题。
更新时间: 2025-04-09 13:56:05
领域: cs.CV,cs.AI,cs.LG
A Survey on Mixture of Experts in Large Language Models
Large language models (LLMs) have garnered unprecedented advancements across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to the emergent abilities of LLMs (e.g., in-context learning) that are not present in small models. Within this context, the mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal computation overhead, gaining significant attention from academia and industry. Despite its growing prevalence, there lacks a systematic and comprehensive review of the literature on MoE. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE. We first briefly introduce the structure of the MoE layer, followed by proposing a new taxonomy of MoE. Next, we overview the core designs for various MoE models including both algorithmic and systemic aspects, alongside collections of available open-source implementations, hyperparameter configurations and empirical evaluations. Furthermore, we delineate the multifaceted applications of MoE in practice, and outline some potential directions for future research. To facilitate ongoing updates and the sharing of cutting-edge advances in MoE research, we have established a resource repository at https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts-in-LLMs.
Updated: 2025-04-09 13:54:59
标题: 大型语言模型中专家混合的调查
摘要: 大型语言模型(LLMs)在各个领域取得了前所未有的进展,涵盖了从自然语言处理到计算机视觉等各种领域。LLMs的实力源自它们庞大的模型规模、广泛且多样化的数据集,以及在训练过程中利用的庞大计算能力,这些因素共同促成了LLMs的新兴能力(例如上下文学习),这些能力在小型模型中并不存在。在这种背景下,专家混合(MoE)已经成为一种有效的方法,可以大幅提升模型容量,同时最小化计算开销,受到学术界和工业界的广泛关注。尽管MoE的普及程度不断增长,但对MoE文献缺乏系统性和全面性的审查。本调查旨在填补这一空白,为研究人员深入探讨MoE的复杂性提供必要的资源。我们首先简要介绍MoE层的结构,然后提出了一个新的MoE分类法。接下来,我们概述了各种MoE模型的核心设计,包括算法和系统方面,以及可用的开源实现、超参数配置和实证评估的集合。此外,我们还勾勒出MoE在实践中的多方面应用,并概述了未来研究的一些潜在方向。为了促进MoE研究中最新进展的持续更新和分享,我们在https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts-in-LLMs建立了一个资源库。
更新时间: 2025-04-09 13:54:59
领域: cs.LG,cs.CL
CAI: An Open, Bug Bounty-Ready Cybersecurity AI
By 2028 most cybersecurity actions will be autonomous, with humans teleoperating. We present the first classification of autonomy levels in cybersecurity and introduce Cybersecurity AI (CAI), an open-source framework that democratizes advanced security testing through specialized AI agents. Through rigorous empirical evaluation, we demonstrate that CAI consistently outperforms state-of-the-art results in CTF benchmarks, solving challenges across diverse categories with significantly greater efficiency, up to 3,600x faster than humans in specific tasks and averaging 11x faster overall. CAI achieved first place among AI teams and secured a top-20 position worldwide in the "AI vs Human" CTF live Challenge, earning a monetary reward of $750. Based on our results, we argue against LLM-vendor claims about limited security capabilities. Beyond cybersecurity competitions, CAI demonstrates real-world effectiveness, reaching top-30 in Spain and top-500 worldwide on Hack The Box within a week, while dramatically reducing security testing costs by an average of 156x. Our framework transcends theoretical benchmarks by enabling non-professionals to discover significant security bugs (CVSS 4.3-7.5) at rates comparable to experts during bug bounty exercises. By combining modular agent design with seamless tool integration and human oversight (HITL), CAI addresses critical market gaps, offering organizations of all sizes access to AI-powered bug bounty security testing previously available only to well-resourced firms, thereby challenging the oligopolistic ecosystem currently dominated by major bug bounty platforms.
Updated: 2025-04-09 13:54:18
标题: CAI:一个开放的,适用于漏洞赏金的网络安全人工智能
摘要: 到2028年,大多数网络安全行动将是自主的,由人类进行远程操作。我们提出了网络安全自主级别的第一个分类,并引入了网络安全人工智能(CAI),这是一个开源框架,通过专门的人工智能代理实现了先进安全测试的民主化。通过严格的实证评估,我们展示了CAI在CTF基准测试中始终优于最先进的结果,以明显更高的效率解决了各种类别的挑战,在特定任务中比人类快3600倍,整体平均快11倍。CAI在"AI vs Human" CTF实时挑战中获得了AI团队的第一名,并在全球排名中获得了前20名,赢得了750美元的奖金。根据我们的结果,我们反对LLM供应商关于安全能力有限的说法。除了网络安全竞赛,CAI还展示了现实世界中的有效性,在一周内在西班牙排名前30,在Hack The Box全球排名前500,同时将安全测试成本平均降低了156倍。我们的框架超越了理论基准,使非专业人员能够在漏洞悬赏练习中以与专家相当的速率发现重要的安全漏洞(CVSS 4.3-7.5)。通过将模块化代理设计与无缝工具集成和人类监督(HITL)相结合,CAI解决了关键的市场空白,为各种规模的组织提供了以前只有资源雄厚的公司才能获得的AI驱动漏洞悬赏安全测试,从而挑战了目前由主要漏洞悬赏平台主导的寡头生态系统。
更新时间: 2025-04-09 13:54:18
领域: cs.CR
MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training
As large language models continue to scale up, distributed training systems have expanded beyond 10k nodes, intensifying the importance of fault tolerance. Checkpoint has emerged as the predominant fault tolerance strategy, with extensive studies dedicated to optimizing its efficiency. However, the advent of the sparse Mixture-of-Experts (MoE) model presents new challenges due to the substantial increase in model size, despite comparable computational demands to dense models. In this work, we propose the Mixture-of-Checkpoint System (MoC-System) to orchestrate the vast array of checkpoint shards produced in distributed training systems. MoC-System features a novel Partial Experts Checkpointing (PEC) mechanism, an algorithm-system co-design that strategically saves a selected subset of experts, effectively reducing the MoE checkpoint size to levels comparable with dense models. Incorporating hybrid parallel strategies, MoC-System involves fully sharded checkpointing strategies to evenly distribute the workload across distributed ranks. Furthermore, MoC-System introduces a two-level checkpointing management method that asynchronously handles in-memory snapshots and persistence processes. We build MoC-System upon the Megatron-DeepSpeed framework, achieving up to a 98.9% reduction in overhead for each checkpointing process compared to the original method, during MoE model training with ZeRO-2 data parallelism and expert parallelism. Additionally, extensive empirical analyses substantiate that our methods enhance efficiency while maintaining comparable model accuracy, even achieving an average accuracy increase of 1.08% on downstream tasks.
Updated: 2025-04-09 13:51:25
标题: MoC-System:面向稀疏专家混合(MoE)模型训练的高效容错系统
摘要: 随着大型语言模型的不断扩大,分布式训练系统已经扩展到超过10k个节点,加剧了容错性的重要性。检查点已经成为主要的容错策略,大量研究致力于优化其效率。然而,稀疏的专家混合模型的出现带来了新的挑战,因为尽管计算需求与密集模型相当,但模型大小显著增加。 在这项工作中,我们提出了混合检查点系统(MoC-System),用于协调分布式训练系统中产生的大量检查点分片。MoC-System具有一种新颖的部分专家检查点(PEC)机制,这是一种算法-系统共同设计,可以策略性地保存一组选择的专家,有效地将MoE检查点大小减少到与密集模型相当的水平。MoC-System结合了混合并行策略,涉及全分片检查点策略,以均匀分配工作负载到分布式排名中。此外,MoC-System引入了一个两级检查点管理方法,异步处理内存快照和持久化过程。 我们基于Megatron-DeepSpeed框架构建了MoC-System,与原始方法相比,在使用ZeRO-2数据并行性和专家并行性训练MoE模型时,每个检查点过程的开销降低了高达98.9%。此外,广泛的实证分析证实了我们的方法在提高效率的同时保持了可比的模型准确性,甚至在下游任务中实现了平均准确度增加1.08%。
更新时间: 2025-04-09 13:51:25
领域: cs.DC,cs.LG
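One simple instantiation of partial experts checkpointing is to rotate which expert shards are persisted at each checkpoint, so that every expert is covered within a few consecutive checkpoints. The selection policy below is illustrative, not MoC-System's actual PEC algorithm:

```python
def experts_to_save(checkpoint_step, n_experts, fraction=0.25):
    """Round-robin subset of experts to persist at this checkpoint."""
    k = max(1, int(n_experts * fraction))
    start = (checkpoint_step * k) % n_experts
    return [(start + i) % n_experts for i in range(k)]

# 64 experts, save a quarter each time: full coverage every 4 checkpoints.
for step in range(4):
    print(step, experts_to_save(step, 64)[:4], "...")
```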
Quantized symbolic time series approximation
Time series are ubiquitous in numerous science and engineering domains, e.g., signal processing, bioinformatics, and astronomy. Previous work has verified the efficacy of symbolic time series representation in a variety of engineering applications due to its storage efficiency and numerosity reduction. The most recent symbolic aggregate approximation technique, ABBA, has been shown to preserve essential shape information of time series and improve downstream applications, e.g., neural network inference regarding prediction and anomaly detection in time series. Motivated by the emergence of high-performance hardware which enables efficient computation for low bit-width representations, we present a new quantization-based ABBA symbolic approximation technique, QABBA, which exhibits improved storage efficiency while retaining the original speed and accuracy of symbolic reconstruction. We prove an upper bound for the error arising from quantization and discuss how the number of bits should be chosen to balance this with other errors. An application of QABBA with large language models (LLMs) for time series regression is also presented, and its utility is investigated. By representing the symbolic chain of patterns on time series, QABBA not only avoids the training of embedding from scratch, but also achieves a new state-of-the-art on Monash regression dataset. The symbolic approximation to the time series offers a more efficient way to fine-tune LLMs on the time series regression task which contains various application domains. We further present a set of extensive experiments performed across various well-established datasets to demonstrate the advantages of the QABBA method for symbolic approximation.
Updated: 2025-04-09 13:46:27
标题: 量化符号时间序列逼近
摘要: 时间序列在许多科学和工程领域中无处不在,例如信号处理、生物信息学和天文学。先前的研究已经验证了符号时间序列表示在各种工程应用中的有效性,因为它具有存储效率和数量减少的特点。最近的符号聚合逼近技术ABBA已被证明能够保留时间序列的基本形状信息,并改善下游应用程序,例如神经网络推断,用于时间序列的预测和异常检测。 受到高性能硬件的出现的启发,该硬件使得低位宽表示的高效计算成为可能,我们提出了一种新的基于量化的ABBA符号逼近技术QABBA,它展现了改进的存储效率,同时保留了符号重建的原始速度和准确性。我们证明了由量化引起的误差的上限,并讨论了应该如何选择比特数以平衡这一点与其他错误之间的关系。 还提出了一种将QABBA与大型语言模型(LLMs)用于时间序列回归的应用,并对其效用进行了调查。通过表示时间序列上的符号模式链,QABBA不仅避免了从头开始训练嵌入,而且在Monash回归数据集上实现了一个新的最新技术。对时间序列的符号逼近提供了一种更高效的方法来对时间序列回归任务中包含的各种应用领域进行微调LLMs。我们进一步展示了在各种知名数据集上执行的一系列广泛实验,以展示QABBA方法在符号逼近方面的优势。
更新时间: 2025-04-09 13:46:27
领域: cs.LG,eess.SP,stat.ML
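ABBA-style methods represent a series as a polygonal chain of (length, increment) pieces; QABBA's contribution is storing such values at low bit-width. Below is a uniform-quantization sketch of that storage step (NumPy; the paper's scheme and its error bound are more refined):

```python
import numpy as np

def quantize(vals, n_bits=8):
    """Uniformly map floats to n_bits integer symbols, keeping the range."""
    lo, hi = float(vals.min()), float(vals.max())
    levels = 2 ** n_bits - 1
    q = np.round((vals - lo) / (hi - lo + 1e-12) * levels).astype(np.uint16)
    return q, lo, hi

def dequantize(q, lo, hi, n_bits=8):
    return lo + q.astype(float) / (2 ** n_bits - 1) * (hi - lo)

incs = np.random.randn(1000).cumsum()          # stand-in for chain values
q, lo, hi = quantize(incs, n_bits=6)
print(np.abs(dequantize(q, lo, hi, 6) - incs).max())  # quantization error
```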
Audio-visual Event Localization on Portrait Mode Short Videos
Audio-visual event localization (AVEL) plays a critical role in multimodal scene understanding. While existing datasets for AVEL predominantly comprise landscape-oriented long videos with clean and simple audio context, short videos have become the primary format of online video content due to the proliferation of smartphones. Short videos are characterized by portrait-oriented framing and layered audio compositions (e.g., overlapping sound effects, voiceovers, and music), which brings unique challenges unaddressed by conventional methods. To this end, we introduce AVE-PM, the first AVEL dataset specifically designed for portrait mode short videos, comprising 25,335 clips that span 86 fine-grained categories with frame-level annotations. Beyond dataset creation, our empirical analysis shows that state-of-the-art AVEL methods suffer an average 18.66% performance drop during cross-mode evaluation. Further analysis reveals two key challenges of different video formats: 1) spatial bias from portrait-oriented framing introduces distinct domain priors, and 2) noisy audio composition compromises the reliability of the audio modality. To address these issues, we investigate optimal preprocessing recipes and the impact of background music for AVEL on portrait mode videos. Experiments show that these methods can still benefit from tailored preprocessing and specialized model design, thus achieving improved performance. This work provides both a foundational benchmark and actionable insights for advancing AVEL research in the era of mobile-centric video content. Dataset and code will be released.
Updated: 2025-04-09 13:38:40
标题: 肖像模式短视频上的音视频事件定位
摘要: 视听事件定位(AVEL)在多模态场景理解中扮演着至关重要的角色。虽然现有的AVEL数据集主要由音频背景清晰简单的横向长视频组成,但由于智能手机的普及,短视频已成为在线视频内容的主要格式。短视频以竖向构图和分层音频组合(例如重叠的音效、旁白和音乐)为特点,这带来了传统方法无法解决的独特挑战。为此,我们介绍了AVE-PM,这是专门为竖屏模式短视频设计的第一个AVEL数据集,包括25,335个剪辑,跨越86个细粒度类别,并带有帧级注释。除了数据集创建外,我们的实证分析表明,最先进的AVEL方法在跨模式评估中平均性能下降了18.66%。进一步分析揭示了不同视频格式带来的两个关键挑战:1)竖向构图带来的空间偏差引入了独特的领域先验;2)嘈杂的音频组合损害了音频模态的可靠性。为了解决这些问题,我们研究了最佳的预处理方案以及背景音乐对竖屏模式视频AVEL的影响。实验证明,这些方法仍然可以受益于定制的预处理和专门的模型设计,从而实现性能提升。这项工作为推动以移动为中心的视频内容时代的AVEL研究提供了基础基准和可操作的见解。数据集和代码将会发布。
更新时间: 2025-04-09 13:38:40
领域: cs.MM,cs.AI,cs.CV
Amortized Bayesian Multilevel Models
Multilevel models (MLMs) are a central building block of the Bayesian workflow. They enable joint, interpretable modeling of data across hierarchical levels and provide a fully probabilistic quantification of uncertainty. Despite their well-recognized advantages, MLMs pose significant computational challenges, often rendering their estimation and evaluation intractable within reasonable time constraints. Recent advances in simulation-based inference offer promising solutions for addressing complex probabilistic models using deep generative networks. However, the utility and reliability of deep learning methods for estimating Bayesian MLMs remains largely unexplored, especially when compared with gold-standard samplers. To this end, we explore a family of neural network architectures that leverage the probabilistic factorization of multilevel models to facilitate efficient neural network training and subsequent near-instant posterior inference on unseen datasets. We test our method on several real-world case studies and provide comprehensive comparisons to Stan's gold standard sampler, where possible. Finally, we provide an open-source implementation of our methods to stimulate further research in the nascent field of amortized Bayesian inference.
Updated: 2025-04-09 13:38:39
标题: 摊销贝叶斯多层模型
摘要: 多层模型(MLMs)是贝叶斯工作流程的核心构建模块。它们实现了跨层次数据的联合、可解释建模,并提供了对不确定性的完全概率量化。尽管它们具有公认的优点,但MLMs面临着重大的计算挑战,通常在合理的时间限制内难以进行估计和评估。基于模拟的推断的最新进展为利用深度生成网络处理复杂概率模型提供了有希望的解决方案。然而,深度学习方法在估计贝叶斯MLMs方面的效用和可靠性仍然在很大程度上未被探索,特别是与黄金标准采样器相比。为此,我们探索了一系列神经网络架构,利用多层模型的概率因子分解来促进高效的神经网络训练,并在未见数据集上进行近乎即时的后验推断。我们在几个真实案例研究中测试了我们的方法,并在可能的情况下与Stan的黄金标准采样器进行了全面比较。最后,我们提供了我们方法的开源实现,以推动摊销贝叶斯推断这一新兴领域的进一步研究。
更新时间: 2025-04-09 13:38:39
领域: stat.ML,cs.LG,stat.CO
Compound and Parallel Modes of Tropical Convolutional Neural Networks
Convolutional neural networks have become increasingly deep and complex, leading to higher computational costs. While tropical convolutional neural networks (TCNNs) reduce multiplications, they underperform compared to standard CNNs. To address this, we propose two new variants - compound TCNN (cTCNN) and parallel TCNN (pTCNN)-that use combinations of tropical min-plus and max-plus kernels to replace traditional convolution kernels. This reduces multiplications and balances efficiency with performance. Experiments on various datasets show that cTCNN and pTCNN match or exceed the performance of other CNN methods. Combining these with conventional CNNs in deeper architectures also improves performance. We are further exploring simplified TCNN architectures that reduce parameters and multiplications with minimal accuracy loss, aiming for efficient and effective models.
Updated: 2025-04-09 13:36:11
标题: 热带卷积神经网络的复合和并行模式
摘要: 卷积神经网络变得越来越深和复杂,导致计算成本增加。尽管热带卷积神经网络(TCNNs)减少了乘法运算,但与标准CNN相比表现不佳。为了解决这个问题,我们提出了两种新的变体 - 复合TCNN(cTCNN)和并行TCNN(pTCNN)- 使用热带最小-加和最大-加内核的组合来替代传统的卷积内核。这减少了乘法运算,平衡了效率和性能。在各种数据集上的实验表明,cTCNN和pTCNN与其他CNN方法的性能相匹敌甚至超越。将这些与传统CNN结合在更深的架构中也提高了性能。我们进一步探索简化的TCNN架构,减少参数和乘法运算,同时最小化精度损失,旨在构建高效且有效的模型。
更新时间: 2025-04-09 13:36:11
领域: cs.CV,cs.AI,I.2.6
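A tropical min-plus "convolution" replaces the usual multiply-accumulate with addition followed by a minimum; the max-plus variant swaps the minimum for a maximum, and cTCNN/pTCNN combine both kernel types in compound and parallel arrangements. A one-kernel 1-D sketch in PyTorch:

```python
import torch

def minplus_conv1d(x, w):
    """y[i] = min_k (x[i+k] + w[k]); x: (B, 1, T), w: (K,)."""
    K = w.numel()
    patches = x.unfold(-1, K, 1)              # (B, 1, T-K+1, K)
    return (patches + w.view(1, 1, 1, K)).amin(dim=-1)

x, w = torch.randn(2, 1, 32), torch.randn(5)
print(minplus_conv1d(x, w).shape)             # torch.Size([2, 1, 28])
```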
CRYSIM: Prediction of Symmetric Structures of Large Crystals with GPU-based Ising Machines
Solving black-box optimization problems with Ising machines is increasingly common in materials science. However, their application to crystal structure prediction (CSP) is still ineffective due to symmetry agnostic encoding of atomic coordinates. We introduce CRYSIM, an algorithm that encodes the space group, the Wyckoff positions combination, and coordinates of independent atomic sites as separate variables. This encoding reduces the search space substantially by exploiting the symmetry in space groups. When CRYSIM is interfaced to Fixstars Amplify, a GPU-based Ising machine, its prediction performance was competitive with CALYPSO and Bayesian optimization for crystals containing more than 150 atoms in a unit cell. Although it is not realistic to interface CRYSIM to current small-scale quantum devices, it has the potential to become the standard CSP algorithm in the coming quantum age.
Updated: 2025-04-09 13:33:48
标题: CRYSIM:基于GPU的Ising机器预测大晶体的对称结构
摘要: 使用伊辛机解决黑盒优化问题在材料科学中越来越普遍。然而,由于对原子坐标采用了对称性无关的编码,它们在晶体结构预测(CSP)中的应用效果仍然不佳。我们介绍了CRYSIM算法,它将空间群、Wyckoff位置组合以及独立原子位点的坐标分别编码为单独的变量。这种编码通过利用空间群中的对称性大大缩减了搜索空间。当CRYSIM与基于GPU的伊辛机Fixstars Amplify对接时,对于晶胞中包含超过150个原子的晶体,其预测性能可与CALYPSO和贝叶斯优化相媲美。虽然将CRYSIM对接到当前的小规模量子设备并不现实,但它有潜力在即将到来的量子时代成为标准的CSP算法。
更新时间: 2025-04-09 13:33:48
领域: cond-mat.mtrl-sci,cs.LG
SEAL: Semantic Aware Image Watermarking
Generative models have rapidly evolved to generate realistic outputs. However, their synthetic outputs increasingly challenge the clear distinction between natural and AI-generated content, necessitating robust watermarking techniques. Watermarks are typically expected to preserve the integrity of the target image, withstand removal attempts, and prevent unauthorized replication onto unrelated images. To address this need, recent methods embed persistent watermarks into images produced by diffusion models using the initial noise. Yet, to do so, they either distort the distribution of generated images or rely on searching through a long dictionary of used keys for detection. In this paper, we propose a novel watermarking method that embeds semantic information about the generated image directly into the watermark, enabling a distortion-free watermark that can be verified without requiring a database of key patterns. Instead, the key pattern can be inferred from the semantic embedding of the image using locality-sensitive hashing. Furthermore, conditioning the watermark detection on the original image content improves robustness against forgery attacks. To demonstrate that, we consider two largely overlooked attack strategies: (i) an attacker extracting the initial noise and generating a novel image with the same pattern; (ii) an attacker inserting an unrelated (potentially harmful) object into a watermarked image, possibly while preserving the watermark. We empirically validate our method's increased robustness to these attacks. Taken together, our results suggest that content-aware watermarks can mitigate risks arising from image-generative models.
Updated: 2025-04-09 13:30:18
标题: SEAL:语义感知图像水印技术
摘要: 生成模型已迅速发展,能够产生逼真的输出。然而,它们的合成输出日益模糊了自然内容与人工智能生成内容之间的清晰界限,因此需要稳健的水印技术。水印通常被期望保持目标图像的完整性、抵御移除尝试,并防止被未经授权地复制到不相关的图像上。为了满足这一需求,最近的方法利用初始噪声将持久水印嵌入到扩散模型生成的图像中。然而,为了做到这一点,它们要么扭曲了生成图像的分布,要么依赖在一长串已用密钥的字典中搜索来进行检测。 在本文中,我们提出了一种新颖的水印方法,将关于生成图像的语义信息直接嵌入水印中,从而实现无需密钥模式数据库即可验证的无失真水印。相反,密钥模式可以通过局部敏感哈希从图像的语义嵌入中推断出来。此外,将水印检测条件设置在原始图像内容上可以提高对伪造攻击的鲁棒性。为了证明这一点,我们考虑了两种在很大程度上被忽视的攻击策略:(i)攻击者提取初始噪声并生成具有相同模式的新图像;(ii)攻击者在带有水印的图像中插入一个无关的(可能有害的)对象,同时可能保留水印。我们通过实验验证了我们的方法对这些攻击具有更强的鲁棒性。综合来看,我们的结果表明,内容感知水印可以减轻图像生成模型带来的风险。
更新时间: 2025-04-09 13:30:18
领域: cs.LG,cs.CR,cs.CV
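The locality-sensitive hashing step, deriving a key pattern from the image's semantic embedding instead of a key database, can be illustrated with sign random projections (NumPy; SEAL's actual hashing and watermark construction are specified in the paper):

```python
import numpy as np

def lsh_key_bits(embedding, n_bits=64, seed=0):
    """Nearby embeddings agree on most bits, so the watermark key can be
    re-derived from the generated image's own semantics."""
    rng = np.random.default_rng(seed)        # hyperplanes fixed by the seed
    planes = rng.standard_normal((n_bits, embedding.shape[-1]))
    return (planes @ embedding > 0).astype(np.uint8)

e = np.random.randn(512)
e_perturbed = e + 0.01 * np.random.randn(512)
print((lsh_key_bits(e) == lsh_key_bits(e_perturbed)).mean())  # ~1.0
```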
ODEStream: A Buffer-Free Online Learning Framework with ODE-based Adaptor for Streaming Time Series Forecasting
Addressing the challenges of irregularity and concept drift in streaming time series is crucial for real-world predictive modelling. Previous studies in time series continual learning often propose models that require buffering long sequences, potentially restricting the responsiveness of the inference system. Moreover, these models are typically designed for regularly sampled data, an unrealistic assumption in real-world scenarios. This paper introduces ODEStream, a novel buffer-free continual learning framework that incorporates a temporal isolation layer to capture temporal dependencies within the data. Simultaneously, it leverages the capability of neural ordinary differential equations to process irregular sequences and generate a continuous data representation, enabling seamless adaptation to changing dynamics in a data streaming scenario. Our approach focuses on learning how the dynamics and distribution of historical data change over time, facilitating direct processing of streaming sequences. Evaluations on benchmark real-world datasets demonstrate that ODEStream outperforms the state-of-the-art online learning and streaming analysis baseline models, providing accurate predictions over extended periods while minimising performance degradation over time by learning how the sequence dynamics change. The implementation of ODEStream is available at: https://github.com/FtoonAbushaqra/ODEStream.git.
Updated: 2025-04-09 13:29:09
标题: ODEStream:一种基于ODE适配器的无缓冲在线学习框架,用于流式时间序列预测
摘要: 解决流式时间序列中出现的不规则性和概念漂移挑战对于真实世界的预测建模至关重要。以往关于时间序列的持续学习的研究往往提出需要缓冲长序列的模型,可能限制推断系统的响应性。此外,这些模型通常设计用于定期采样的数据,这在真实世界情景中是不现实的假设。本文介绍了ODEStream,这是一个新颖的无缓冲持续学习框架,它包含一个时间隔离层来捕捉数据中的时间依赖性。同时,它利用神经常微分方程的能力来处理不规则序列并生成连续数据表示,使其能够无缝地适应数据流场景中的动态变化。我们的方法侧重于学习历史数据的动态和分布随时间变化的方式,从而促进对流式序列的直接处理。对基准真实世界数据集的评估表明,ODEStream优于最先进的在线学习和流式分析基准模型,能够在延长的时间段内提供准确的预测,并通过学习序列动态变化的方式最小化随时间的性能下降。ODEStream的实现可在以下链接获取:https://github.com/FtoonAbushaqra/ODEStream.git。
更新时间: 2025-04-09 13:29:09
领域: cs.LG
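The ODE-based adaptor idea, evolving a hidden state across an irregular time gap with a learned derivative, can be sketched with a fixed-step Euler integrator (PyTorch; ODEStream itself and production systems would typically use a proper ODE solver):

```python
import torch
import torch.nn as nn

class ODEAdaptor(nn.Module):
    """Integrates dh/dt = f(h) over the gap to the next observation."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                               nn.Linear(dim, dim))

    def forward(self, h, dt, n_steps=4):      # h: (B, D), dt: (B, 1)
        step = dt / n_steps
        for _ in range(n_steps):
            h = h + step * self.f(h)          # Euler update
        return h

adaptor = ODEAdaptor(dim=16)
h = adaptor(torch.randn(8, 16), dt=torch.rand(8, 1) * 5.0)  # irregular gaps
```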
Differential Adjusted Parity for Learning Fair Representations
The development of fair and unbiased machine learning models remains an ongoing objective for researchers in the field of artificial intelligence. We introduce the Differential Adjusted Parity (DAP) loss to produce unbiased informative representations. It utilises a differentiable variant of the adjusted parity metric to create a unified objective function. By combining downstream task classification accuracy and its inconsistency across sensitive feature domains, it provides a single tool to increase performance and mitigate bias. A key element in this approach is the use of soft balanced accuracies. In contrast to previous non-adversarial approaches, DAP does not suffer a degeneracy where the metric is satisfied by performing equally poorly across all sensitive domains. It outperforms several adversarial models on downstream task accuracy and fairness in our analysis. Specifically, it improves the demographic parity, equalized odds and sensitive feature accuracy by as much as 22.5\%, 44.1\% and 40.1\%, respectively, when compared to the best performing adversarial approaches on these metrics. Overall, the DAP loss and its associated metric can play a significant role in creating more fair machine learning models.
Updated: 2025-04-09 13:19:22
标题: 学习公平表示的差异调整平等
摘要: 开发公平且无偏的机器学习模型仍然是人工智能领域研究人员的持续目标。我们引入了差异调整平等(DAP)损失,以产生无偏且信息丰富的表示。它利用调整平等度量的可微变体来构建统一的目标函数。通过结合下游任务的分类准确率及其在敏感特征域之间的不一致性,它提供了一个同时提高性能和减少偏见的单一工具。该方法的一个关键要素是使用软平衡准确率。与先前的非对抗方法相比,DAP不会出现通过在所有敏感域上表现同样糟糕来满足该度量的退化情形。在我们的分析中,它在下游任务准确率和公平性方面优于若干对抗模型。具体而言,与在这些指标上表现最好的对抗方法相比,它将人口统计平等、均等化几率和敏感特征准确率分别提高了多达22.5%、44.1%和40.1%。总的来说,DAP损失及其相关度量可以在创建更公平的机器学习模型方面发挥重要作用。
更新时间: 2025-04-09 13:19:22
领域: cs.LG,cs.AI
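The "soft balanced accuracy" ingredient and its combination with a cross-group consistency term can be sketched as follows (PyTorch). The paper's adjusted-parity metric is a specific formula, so treat this as an illustrative analogue rather than the DAP loss itself:

```python
import torch

def soft_balanced_accuracy(probs, labels):
    """Mean over classes of the average predicted true-class probability;
    differentiable, unlike hard balanced accuracy."""
    return torch.stack([probs[labels == c, c].mean()
                        for c in labels.unique()]).mean()

def dap_style_loss(probs, labels, groups, lam=1.0):
    """Reward accuracy, penalize its spread across sensitive groups."""
    per_group = torch.stack([
        soft_balanced_accuracy(probs[groups == g], labels[groups == g])
        for g in groups.unique()])
    return -per_group.mean() + lam * (per_group.max() - per_group.min())

probs = torch.softmax(torch.randn(100, 3), dim=1)
labels = torch.randint(0, 3, (100,))
groups = torch.randint(0, 2, (100,))
print(dap_style_loss(probs, labels, groups))
```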
Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games
Artificial agents are increasingly central to complex interactions and decision-making tasks, yet aligning their behaviors with desired human values remains an open challenge. In this work, we investigate how human-like personality traits influence agent behavior and performance within text-based interactive environments. We introduce PANDA: PersonalityAdapted Neural Decision Agents, a novel method for projecting human personality traits onto agents to guide their behavior. To induce personality in a text-based game agent, (i) we train a personality classifier to identify what personality type the agent's actions exhibit, and (ii) we integrate the personality profiles directly into the agent's policy-learning pipeline. By deploying agents embodying 16 distinct personality types across 25 text-based games and analyzing their trajectories, we demonstrate that an agent's action decisions can be guided toward specific personality profiles. Moreover, certain personality types, such as those characterized by higher levels of Openness, display marked advantages in performance. These findings underscore the promise of personality-adapted agents for fostering more aligned, effective, and human-centric decision-making in interactive environments.
Updated: 2025-04-09 13:17:00
标题: 人格动态:揭示人格特质对基于文本的游戏中代理的影响
摘要: 人工代理在复杂互动和决策任务中变得越来越重要,然而将它们的行为与期望的人类价值观保持一致仍然是一个挑战。在这项工作中,我们研究了类人个性特征如何影响基于文本的互动环境中代理的行为和性能。我们引入了PANDA:PersonalityAdapted Neural Decision Agents,这是一种将人类个性特征投射到代理上以指导其行为的新方法。为了在基于文本的游戏代理中引入个性,(i)我们训练了一个个性分类器来识别代理的行为表现出什么样的个性类型,(ii)我们直接将个性配置文件整合到代理的政策学习管道中。通过在25个基于文本的游戏中部署代表16种不同人格类型的代理并分析它们的轨迹,我们证明了代理的行动决策可以朝着特定的个性配置文件指导。此外,某些个性类型,例如那些具有更高开放性水平的类型,在性能方面显示出明显的优势。这些发现强调了个性适应代理在促进更加一致、有效和以人为本的决策制定方面的潜力。
更新时间: 2025-04-09 13:17:00
领域: cs.CL,cs.AI
GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes
Robust grasping in cluttered environments remains an open challenge in robotics. While benchmark datasets have significantly advanced deep learning methods, they mainly focus on simplistic scenes with light occlusion and insufficient diversity, limiting their applicability to practical scenarios. We present GraspClutter6D, a large-scale real-world grasping dataset featuring: (1) 1,000 highly cluttered scenes with dense arrangements (14.1 objects/scene, 62.6\% occlusion), (2) comprehensive coverage across 200 objects in 75 environment configurations (bins, shelves, and tables) captured using four RGB-D cameras from multiple viewpoints, and (3) rich annotations including 736K 6D object poses and 9.3B feasible robotic grasps for 52K RGB-D images. We benchmark state-of-the-art segmentation, object pose estimation, and grasping detection methods to provide key insights into challenges in cluttered environments. Additionally, we validate the dataset's effectiveness as a training resource, demonstrating that grasping networks trained on GraspClutter6D significantly outperform those trained on existing datasets in both simulation and real-world experiments. The dataset, toolkit, and annotation tools are publicly available on our project website: https://sites.google.com/view/graspclutter6d.
Updated: 2025-04-09 13:15:46
标题: GraspClutter6D:一个用于复杂场景中稳健感知和抓取的大规模真实世界数据集
摘要: 在机器人领域,杂乱环境下的稳健抓取仍然是一个开放的挑战。尽管基准数据集显著推动了深度学习方法的发展,但它们主要关注遮挡较轻、多样性不足的简单场景,限制了它们在实际场景中的适用性。我们提出了GraspClutter6D,这是一个大规模的真实世界抓取数据集,具有以下特点:(1)1000个高度杂乱的场景,物体排列密集(每个场景14.1个物体,遮挡率62.6%);(2)涵盖75种环境配置(箱子、货架和桌子)中的200个物体,使用四个RGB-D相机从多个视角采集;(3)包含丰富的注释,包括73.6万个6D物体姿态,以及5.2万张RGB-D图像上的93亿个可行机器人抓取。我们对最先进的分割、物体姿态估计和抓取检测方法进行了基准测试,以深入了解杂乱环境中的挑战。此外,我们验证了该数据集作为训练资源的有效性,证明在GraspClutter6D上训练的抓取网络在仿真和真实实验中均明显优于在现有数据集上训练的网络。数据集、工具包和注释工具均可在我们的项目网站上公开获取:https://sites.google.com/view/graspclutter6d。
更新时间: 2025-04-09 13:15:46
领域: cs.RO,cs.AI,cs.CV
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
Zero-shot, training-free, image-based text-to-video generation is an emerging area that aims to generate videos using existing image-based diffusion models. Current methods in this space require specific architectural changes to image generation models, which limit their adaptability and scalability. In contrast to such methods, we provide a model-agnostic approach. We use intersections in diffusion trajectories, working only with the latent values. We could not obtain localized frame-wise coherence and diversity using only the intersection of trajectories. Thus, we instead use a grid-based approach. An in-context trained LLM is used to generate coherent frame-wise prompts; another is used to identify differences between frames. Based on these, we obtain a CLIP-based attention mask that controls the timing of switching the prompts for each grid cell. Earlier switching results in higher variance, while later switching results in more coherence. Therefore, our approach can ensure appropriate control between coherence and variance for the frames. Our approach results in state-of-the-art performance while being more flexible when working with diverse image-generation models. The empirical analysis using quantitative metrics and user studies confirms our model's superior temporal consistency, visual fidelity and user satisfaction, thus providing a novel way to obtain training-free, image-based text-to-video generation.
Updated: 2025-04-09 13:11:09
Domains: cs.CV,cs.AI
Unified CNNs and transformers underlying learning mechanism reveals multi-head attention modus vivendi
Convolutional neural networks (CNNs) evaluate short-range correlations in input images which progress along the layers, whereas vision transformer (ViT) architectures evaluate long-range correlations, using repeated transformer encoders composed of fully connected layers. Both are designed to solve complex classification tasks but from different perspectives. This study demonstrates that CNNs and ViT architectures stem from a unified underlying learning mechanism, which quantitatively measures the single-nodal performance (SNP) of each node in feedforward (FF) and multi-head attention (MHA) sub-blocks. Each node identifies small clusters of possible output labels, with additional noise represented as labels outside these clusters. These features are progressively sharpened along the transformer encoders, enhancing the signal-to-noise ratio. This unified underlying learning mechanism leads to two main findings. First, it enables an efficient applied nodal diagonal connection (ANDC) pruning technique without affecting the accuracy. Second, based on the SNP, spontaneous symmetry breaking occurs among the MHA heads, such that each head focuses its attention on a subset of labels through cooperation among its SNPs. Consequently, each head becomes an expert in recognizing its designated labels, representing a quantitative MHA modus vivendi mechanism. This statistical-mechanics-inspired viewpoint makes it possible to reveal the macroscopic behavior of the entire network from the microscopic performance of each node. These results are based on a compact convolutional transformer architecture trained on the CIFAR-100 and Flowers-102 datasets and call for their extension to other architectures and applications, such as natural language processing.
Updated: 2025-04-09 13:06:49
Domains: cs.LG,cs.CV
Integrating Cognitive Processing Signals into Language Models: A Review of Advances, Applications and Future Directions
Recently, the integration of cognitive neuroscience in Natural Language Processing (NLP) has gained significant attention. This article provides a critical and timely overview of recent advancements in leveraging cognitive signals, particularly Eye-tracking (ET) signals, to enhance Language Models (LMs) and Multimodal Large Language Models (MLLMs). By incorporating user-centric cognitive signals, these approaches address key challenges, including data scarcity and the environmental costs of training large-scale models. Cognitive signals enable efficient data augmentation, faster convergence, and improved human alignment. The review emphasises the potential of ET data in tasks like Visual Question Answering (VQA) and mitigating hallucinations in MLLMs, and concludes by discussing emerging challenges and research trends.
Updated: 2025-04-09 13:01:48
Domains: cs.CL,cs.AI
ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models
Recent studies have introduced various approaches for prompt-tuning black-box vision-language models, referred to as black-box prompt-tuning (BBPT). While BBPT has demonstrated considerable potential, it is often found that many existing methods require an excessive number of queries (i.e., function evaluations), which poses a significant challenge in real-world scenarios where the number of allowed queries is limited. To tackle this issue, we propose Zeroth-order Intrinsic-dimensional Prompt-tuning (ZIP), a novel approach that enables efficient and robust prompt optimization in a purely black-box setting. The key idea of ZIP is to reduce the problem dimensionality and the variance of zeroth-order gradient estimates, such that the training is done fast with far less queries. We achieve this by re-parameterizing prompts in low-rank representations and designing intrinsic-dimensional clipping of estimated gradients. We evaluate ZIP on 13+ vision-language tasks in standard benchmarks and show that it achieves an average improvement of approximately 6% in few-shot accuracy and 48% in query efficiency compared to the best-performing alternative BBPT methods, establishing a new state of the art. Our ablation analysis further shows that the proposed clipping mechanism is robust and nearly optimal, without the need to manually select the clipping threshold, matching the result of expensive hyperparameter search.
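The two mechanisms named above, a low-rank prompt re-parameterization and clipped zeroth-order gradient estimates, can be sketched in a few lines. The following toy is a hedged illustration, not the paper's implementation: the loss function, dimensions, step size, and the simple box clipping standing in for ZIP's intrinsic-dimensional clipping are all assumptions.

```python
import numpy as np

def zo_grad(loss_fn, theta, mu=1e-2, clip=1.0, rng=np.random.default_rng(0)):
    """Two-point zeroth-order gradient estimate, clipped coordinate-wise."""
    z = rng.standard_normal(theta.shape)            # random search direction
    g = (loss_fn(theta + mu * z) - loss_fn(theta - mu * z)) / (2 * mu) * z
    return np.clip(g, -clip, clip)                  # stand-in for intrinsic-dimensional clipping

# Low-rank prompt: the full prompt is U @ V, but only the small factors are optimized.
d_tokens, d_embed, rank = 8, 64, 4
rng = np.random.default_rng(1)
U = 0.1 * rng.standard_normal((d_tokens, rank))
V = 0.1 * rng.standard_normal((rank, d_embed))
theta = np.concatenate([U.ravel(), V.ravel()])

def black_box_loss(prompt):                         # hypothetical model query
    return float(np.sum((prompt - 0.5) ** 2))       # toy objective standing in for API calls

def loss_from_theta(t):
    u = t[:d_tokens * rank].reshape(d_tokens, rank)
    v = t[d_tokens * rank:].reshape(rank, d_embed)
    return black_box_loss(u @ v)

for step in range(200):
    theta -= 0.05 * zo_grad(loss_from_theta, theta)
print("final loss:", loss_from_theta(theta))
```

The low-rank factorization matters because the variance of zeroth-order estimates grows with the number of optimized parameters, so shrinking the search space directly reduces the query budget.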
Updated: 2025-04-09 12:56:22
Domains: cs.CV,cs.LG
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Measuring similarity of neural networks to understand and improve their behavior has become an issue of great importance and research interest. In this survey, we provide a comprehensive overview of two complementary perspectives of measuring neural network similarity: (i) representational similarity, which considers how activations of intermediate layers differ, and (ii) functional similarity, which considers how models differ in their outputs. In addition to providing detailed descriptions of existing measures, we summarize and discuss results on the properties of and relationships between these measures, and point to open research problems. We hope our work lays a foundation for more systematic research on the properties and applicability of similarity measures for neural network models.
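As a concrete instance of the representational-similarity family surveyed here, the sketch below implements linear Centered Kernel Alignment (CKA), one widely used measure of how similar two layers' activations are; the toy data is illustrative.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n, d1) and Y (n, d2)."""
    X = X - X.mean(axis=0)                     # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 32))             # activations of layer A
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))
B = A @ Q                                      # an orthogonal transform of A
print(linear_cka(A, B))                        # 1.0: linear CKA is invariant to rotations
```

A functional-similarity measure, by contrast, would compare the models' outputs (e.g., prediction agreement) rather than intermediate activations.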
Updated: 2025-04-09 12:54:43
Domains: cs.LG
Symbolic Parallel Composition for Multi-language Protocol Verification
The implementation of security protocols often combines different languages. This practice, however, poses a challenge to traditional verification techniques, which typically assume a single-language environment and, therefore, are insufficient to handle challenges presented by the interplay of different languages. To address this issue, we establish principles for combining multiple programming languages operating on different atomic types using a symbolic execution semantics. This facilitates the (parallel) composition of labeled transition systems, improving the analysis of complex systems by streamlining communication between diverse programming languages. By treating the Dolev-Yao (DY) model as a symbolic abstraction, our approach eliminates the need for translation between different base types, such as bitstrings and DY terms. Our technique provides a foundation for securing interactions in multi-language environments, enhancing program verification and system analysis in complex, interconnected systems.
Updated: 2025-04-09 12:50:03
Domains: cs.CR
FreeCloth: Free-form Generation Enhances Challenging Clothed Human Modeling
Achieving realistic animated human avatars requires accurate modeling of pose-dependent clothing deformations. Existing learning-based methods heavily rely on the Linear Blend Skinning (LBS) of minimally-clothed human models like SMPL to model deformation. However, they struggle to handle loose clothing, such as long dresses, where the canonicalization process becomes ill-defined when the clothing is far from the body, leading to disjointed and fragmented results. To overcome this limitation, we propose FreeCloth, a novel hybrid framework to model challenging clothed humans. Our core idea is to use dedicated strategies to model different regions, depending on whether they are close to or distant from the body. Specifically, we segment the human body into three categories: unclothed, deformed, and generated. We simply replicate unclothed regions that require no deformation. For deformed regions close to the body, we leverage LBS to handle the deformation. As for the generated regions, which correspond to loose clothing areas, we introduce a novel free-form, part-aware generator to model them, as they are less affected by movements. This free-form generation paradigm brings enhanced flexibility and expressiveness to our hybrid framework, enabling it to capture the intricate geometric details of challenging loose clothing, such as skirts and dresses. Experimental results on the benchmark dataset featuring loose clothing demonstrate that FreeCloth achieves state-of-the-art performance with superior visual fidelity and realism, particularly in the most challenging cases.
Updated: 2025-04-09 12:48:01
Domains: cs.CV,cs.GR,cs.LG
Adaptive Locally Linear Embedding
Manifold learning techniques, such as Locally Linear Embedding (LLE), are designed to preserve the local neighborhood structures of high-dimensional data during dimensionality reduction. Traditional LLE employs Euclidean distance to define neighborhoods, which can struggle to capture the intrinsic geometric relationships within complex data. A novel approach, Adaptive Locally Linear Embedding (ALLE), is introduced to address this limitation by incorporating a dynamic, data-driven metric that enhances topological preservation. This method redefines the concept of proximity by focusing on topological neighborhood inclusion rather than fixed distances. By adapting the metric based on the local structure of the data, it achieves superior neighborhood preservation, particularly for datasets with complex geometries and high-dimensional structures. Experimental results demonstrate that ALLE significantly improves the alignment between neighborhoods in the input and feature spaces, resulting in more accurate and topologically faithful embeddings. This approach advances manifold learning by tailoring distance metrics to the underlying data, providing a robust solution for capturing intricate relationships in high-dimensional datasets.
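To make the role of the neighborhood choice concrete, here is a minimal sketch of the classical LLE weight computation with the neighborhood rule factored out; ALLE's contribution would replace the Euclidean `knn` stand-in with its adaptive, topology-driven rule. The function names and the regularization constant are assumptions.

```python
import numpy as np

def lle_weights(X, neighbors, reg=1e-3):
    """Reconstruction weights of classical LLE for given neighborhoods.

    X: (n, d) data; neighbors: one index array per point.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i, idx in enumerate(neighbors):
        Z = X[idx] - X[i]                    # local coordinates around point i
        C = Z @ Z.T                          # local Gram matrix
        C += reg * np.trace(C) * np.eye(len(idx))   # regularize for stability
        w = np.linalg.solve(C, np.ones(len(idx)))
        W[i, idx] = w / w.sum()              # weights sum to one
    return W

def knn(X, k=5):                             # Euclidean k-NN; ALLE would adapt this
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    return [np.argsort(row)[1:k + 1] for row in D]

X = np.random.default_rng(0).standard_normal((50, 3))
W = lle_weights(X, knn(X))
print(np.abs(X - W @ X).mean())              # small: each point ~ weighted neighbors
```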
Updated: 2025-04-09 12:40:13
Domains: cs.LG,cs.AI
Accelerated Stein Variational Gradient Flow
Stein variational gradient descent (SVGD) is a kernel-based particle method for sampling from a target distribution, e.g., in generative modeling and Bayesian inference. SVGD does not require estimating the gradient of the log-density, which is called score estimation. In practice, SVGD can be slow compared to score-estimation based sampling algorithms. To design fast and efficient high-dimensional sampling algorithms, we introduce ASVGD, an accelerated SVGD, based on an accelerated gradient flow in a metric space of probability densities following Nesterov's method. We then derive a momentum-based discrete-time sampling algorithm, which evolves a set of particles deterministically. To stabilize the particles' momentum update, we also study a Wasserstein metric regularization. For the generalized bilinear kernel and the Gaussian kernel, toy numerical examples with varied target distributions demonstrate the effectiveness of ASVGD compared to SVGD and other popular sampling methods.
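For intuition, the sketch below runs plain SVGD on a Gaussian toy target and adds a particle-level, Nesterov-style momentum as a stand-in for the paper's accelerated flow. The actual ASVGD update and its Wasserstein regularization are derived in the space of densities, so treat this only as a heuristic illustration; the bandwidth, step size, and momentum coefficient are arbitrary.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * h ** 2))
    gradK = -(X[:, None, :] - X[None, :, :]) / h ** 2 * K[:, :, None]
    return K, gradK                           # gradK[j, i] = grad_{x_j} k(x_j, x_i)

def svgd_direction(X, score):
    """Standard SVGD direction: attraction along the target score plus kernel repulsion."""
    K, gradK = rbf_kernel(X)
    return (K @ score(X) + gradK.sum(axis=0)) / X.shape[0]

score = lambda x: -x                          # standard Gaussian target: known score, no estimation
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2)) * 3 + 5     # particles start far from the target
V = np.zeros_like(X)                          # particle-level momentum buffer

for t in range(300):
    phi = svgd_direction(X + 0.9 * V, score)  # lookahead evaluation (heuristic)
    V = 0.9 * V + 0.1 * phi
    X = X + V
print(X.mean(axis=0), X.std(axis=0))          # roughly [0, 0] and [1, 1]
```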
Updated: 2025-04-09 12:35:29
Domains: stat.ML,cs.LG,math.OC,46N10 (Primary) 46E22 94A15 (Secondary)
TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (Extended Version)
Machine learning (ML) plays a pivotal role in detecting malicious software. Despite the high F1-scores reported in numerous studies reaching upwards of 0.99, the issue is not completely solved. Malware detectors often experience performance decay due to constantly evolving operating systems and attack methods, which can render previously learned knowledge insufficient for accurate decision-making on new inputs. This paper argues that commonly reported results are inflated due to two pervasive sources of experimental bias in the detection task: spatial bias, caused by data distributions that are not representative of a real-world deployment, and temporal bias, caused by incorrect time splits of the data that lead to unrealistic configurations. To address these biases, we introduce a set of constraints for fair experiment design and propose a new metric, AUT, for classifier robustness in real-world settings. We additionally propose an algorithm designed to tune training data to enhance classifier performance. Finally, we present TESSERACT, an open-source framework for realistic classifier comparison. Our evaluation encompasses both traditional ML and deep learning methods, examining published works on an extensive Android dataset with 259,230 samples over a five-year span. Additionally, we conduct case studies in the Windows PE and PDF domains. Our findings identify the existence of biases in previous studies and reveal that significant performance enhancements are possible through appropriate, periodic tuning. We also explore how mitigation strategies that delay performance decay can support more stable and stronger performance over time.
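The AUT metric described here summarizes a quality metric over successive test periods; a trapezoidal sketch is below. Consult the paper for the exact definition and normalization; the monthly F1 values are hypothetical.

```python
def aut(scores):
    """Area Under Time: trapezoidal average of a metric over N test periods.

    `scores` is e.g. the per-month F1 of a malware classifier evaluated on
    temporally later data; higher AUT means slower performance decay.
    """
    n = len(scores)
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1)) / (n - 1)

monthly_f1 = [0.95, 0.91, 0.84, 0.76, 0.70, 0.66]   # hypothetical decay curve
print(f"AUT(F1, {len(monthly_f1)}m) = {aut(monthly_f1):.3f}")
```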
Updated: 2025-04-09 12:32:21
Domains: cs.LG,cs.CR,cs.PF
Regret Bounds for Robust Online Decision Making
We propose a framework which generalizes "decision making with structured observations" by allowing robust (i.e. multivalued) models. In this framework, each model associates each decision with a convex set of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner, that can be nonoblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework. Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits and tabular robust online reinforcement learning. In both cases, we derive regret bounds that improve state-of-the-art (except that we do not address computational efficiency).
Updated: 2025-04-09 12:25:00
Domains: cs.LG,68Q32,I.2.6
Deep Neural Koopman Operator-based Economic Model Predictive Control of Shipboard Carbon Capture System
Shipboard carbon capture is a promising solution to help reduce carbon emissions in international shipping. In this work, we propose a data-driven dynamic modeling and economic predictive control approach within the Koopman framework. This integrated modeling and control approach is used to achieve safe and energy-efficient process operation of shipboard post-combustion carbon capture plants. Specifically, we propose a deep neural Koopman operator modeling approach, based on which a Koopman model with time-varying model parameters is established. This Koopman model predicts the overall economic operational cost and key system outputs, based on accessible partial state measurements. By leveraging this learned model, a constrained economic predictive control scheme is developed. Despite time-varying parameters involved in the formulated model, the formulated optimization problem associated with the economic predictive control design is convex, and it can be solved efficiently during online control implementations. Extensive tests are conducted on a high-fidelity simulation environment for shipboard post-combustion carbon capture processes. Four ship operational conditions are taken into account. The results show that the proposed method significantly improves the overall economic operational performance and carbon capture rate. Additionally, the proposed method guarantees safe operation by ensuring that hard constraints on the system outputs are satisfied.
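The key idea, a model that is linear in a lifted observable space so that the downstream control problem stays convex, can be illustrated with a plain EDMD-style least-squares fit. The paper instead learns the lifting with a deep network and allows time-varying parameters, so the fixed polynomial dictionary and toy dynamics below are simplifying assumptions.

```python
import numpy as np

def lift(x):
    """Dictionary of observables; the paper's deep Koopman model learns this map."""
    return np.array([1.0, x[0], x[1], x[0] * x[1], x[0] ** 2, x[1] ** 2])

rng = np.random.default_rng(0)
f = lambda x: np.array([0.9 * x[0], 0.8 * x[1] + 0.1 * x[0] ** 2])  # toy dynamics
X = rng.uniform(-1, 1, (500, 2))
Y = np.array([f(x) for x in X])              # snapshot pairs (x_t, x_{t+1})

Phi_X = np.array([lift(x) for x in X])       # lifted snapshots
Phi_Y = np.array([lift(y) for y in Y])

# Least-squares Koopman matrix: Phi_Y ~ Phi_X @ K (linear in the lifted space)
K, *_ = np.linalg.lstsq(Phi_X, Phi_Y, rcond=None)

# Multi-step prediction is a chain of matrix products; this linearity is what
# keeps the predictive-control optimization convex despite nonlinear dynamics.
z = lift(np.array([0.5, -0.3]))
for _ in range(3):
    z = z @ K
print("predicted state:", z[1:3])            # observables 1 and 2 recover x
```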
Updated: 2025-04-09 12:22:42
Domains: eess.SY,cs.LG,cs.SY
Unraveling Human-AI Teaming: A Review and Outlook
Artificial Intelligence (AI) is advancing at an unprecedented pace, with clear potential to enhance decision-making and productivity. Yet, the collaborative decision-making process between humans and AI remains underdeveloped, often falling short of its transformative possibilities. This paper explores the evolution of AI agents from passive tools to active collaborators in human-AI teams, emphasizing their ability to learn, adapt, and operate autonomously in complex environments. This paradigm shift challenges traditional team dynamics, requiring new interaction protocols, delegation strategies, and responsibility distribution frameworks. Drawing on Team Situation Awareness (SA) theory, we identify two critical gaps in current human-AI teaming research: the difficulty of aligning AI agents with human values and objectives, and the underutilization of AI's capabilities as genuine team members. Addressing these gaps, we propose a structured research outlook centered on four key aspects of human-AI teaming: formulation, coordination, maintenance, and training. Our framework highlights the importance of shared mental models, trust-building, conflict resolution, and skill adaptation for effective teaming. Furthermore, we discuss the unique challenges posed by varying team compositions, goals, and complexities. This paper provides a foundational agenda for future research and practical design of sustainable, high-performing human-AI teams.
Updated: 2025-04-09 12:20:05
Domains: cs.HC,cs.AI,econ.GN,q-fin.EC
Hybrid CNN with Chebyshev Polynomial Expansion for Medical Image Analysis
Lung cancer remains one of the leading causes of cancer-related mortality worldwide, with early and accurate diagnosis playing a pivotal role in improving patient outcomes. Automated detection of pulmonary nodules in computed tomography (CT) scans is a challenging task due to variability in nodule size, shape, texture, and location. Traditional Convolutional Neural Networks (CNNs) have shown considerable promise in medical image analysis; however, their limited ability to capture fine-grained spatial-spectral variations restricts their performance in complex diagnostic scenarios. In this study, we propose a novel hybrid deep learning architecture that incorporates Chebyshev polynomial expansions into CNN layers to enhance expressive power and improve the representation of underlying anatomical structures. The proposed Chebyshev-CNN leverages the orthogonality and recursive properties of Chebyshev polynomials to extract high-frequency features and approximate complex nonlinear functions with greater fidelity. The model is trained and evaluated on benchmark lung cancer imaging datasets, including LUNA16 and LIDC-IDRI, achieving superior performance in classifying pulmonary nodules as benign or malignant. Quantitative results demonstrate significant improvements in accuracy, sensitivity, and specificity compared to traditional CNN-based approaches. This integration of polynomial-based spectral approximation within deep learning provides a robust framework for enhancing automated medical diagnostics and holds potential for broader applications in clinical decision support systems.
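A minimal sketch of the core building block, a Chebyshev feature expansion using the three-term recurrence, is shown below. How the paper wires this into the CNN is not specified in the abstract, so the tanh squashing and the placement of the expansion are assumptions.

```python
import numpy as np

def chebyshev_expand(x, degree=4):
    """Expand features into Chebyshev polynomials T_0..T_degree via the recurrence
    T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t), after squashing inputs into [-1, 1].
    Stacking the orders as channels lets later conv layers mix spectral content.
    """
    t = np.tanh(x)                           # Chebyshev basis needs t in [-1, 1]
    T = [np.ones_like(t), t]
    for _ in range(2, degree + 1):
        T.append(2 * t * T[-1] - T[-2])      # three-term recurrence
    return np.stack(T, axis=1)               # (batch, degree+1, *spatial)

feat = np.random.default_rng(0).standard_normal((2, 8, 8))  # e.g. a CNN feature map
expanded = chebyshev_expand(feat, degree=4)
print(expanded.shape)                        # (2, 5, 8, 8)
```

The orthogonality of the basis is what lets the added channels capture high-frequency structure without redundantly re-encoding the low-order content.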
Updated: 2025-04-09 12:02:56
Domains: cs.CV,cs.LG
Noise-based Local Learning using Stochastic Magnetic Tunnel Junctions
Brain-inspired learning in physical hardware has enormous potential to learn fast at minimal energy expenditure. One of the characteristics of biological learning systems is their ability to learn in the presence of various noise sources. Inspired by this observation, we introduce a novel noise-based learning approach for physical systems implementing multi-layer neural networks. Simulation results show that our approach allows for effective learning whose performance approaches that of the conventional effective yet energy-costly backpropagation algorithm. Using a spintronics hardware implementation, we demonstrate experimentally that learning can be achieved in a small network composed of physical stochastic magnetic tunnel junctions. These results provide a path towards efficient learning in general physical systems which embraces rather than mitigates the noise inherent in physical devices.
Updated: 2025-04-09 12:01:35
Domains: cs.ET,cond-mat.mes-hall,cs.LG
Mass Balance Approximation of Unfolding Improves Potential-Like Methods for Protein Stability Predictions
The prediction of protein stability changes following single-point mutations plays a pivotal role in computational biology, particularly in areas like drug discovery, enzyme reengineering, and genetic disease analysis. Although deep-learning strategies have pushed the field forward, their use in standard workflows remains limited due to resource demands. Conversely, potential-like methods are fast, intuitive, and efficient. Yet, these typically estimate Gibbs free energy shifts without considering the free-energy variations in the unfolded protein state, an omission that may breach mass balance and diminish accuracy. This study shows that incorporating a mass-balance correction (MBC) to account for the unfolded state significantly enhances these methods. While many machine learning models partially model this balance, our analysis suggests that a refined representation of the unfolded state may improve the predictive performance.
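One way to write the mass-balance idea (sign conventions differ across tools, so treat this as a hedged illustration rather than the paper's exact formulation):

```latex
% Stability change upon mutation, with folded (f) and unfolded (u) states:
\Delta\Delta G
  = \underbrace{\bigl(G_f^{\text{mut}} - G_f^{\text{wt}}\bigr)}_{\text{folded-state term}}
  - \underbrace{\bigl(G_u^{\text{mut}} - G_u^{\text{wt}}\bigr)}_{\text{unfolded-state term (MBC)}}
```

Potential-like predictors typically approximate only the folded-state term from the structure; the mass-balance correction restores an estimate of the unfolded-state term so that the two states stay balanced.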
Updated: 2025-04-09 11:53:02
Domains: q-bio.QM,cs.LG,physics.bio-ph
Robust Classification with Noisy Labels Based on Posterior Maximization
Designing objective functions robust to label noise is crucial for real-world classification algorithms. In this paper, we investigate the robustness to label noise of an $f$-divergence-based class of objective functions recently proposed for supervised classification, herein referred to as $f$-PML. We show that, in the presence of label noise, any of the $f$-PML objective functions can be corrected to obtain a neural network that is equal to the one learned with the clean dataset. Additionally, we propose an alternative and novel correction approach that, during the test phase, refines the posterior estimated by the neural network trained in the presence of label noise. Then, we demonstrate that, even if the considered $f$-PML objective functions are not symmetric, they are robust to symmetric label noise for any choice of $f$-divergence, without the need for any correction approach. This allows us to prove that the cross-entropy, which belongs to the $f$-PML class, is robust to symmetric label noise. Finally, we show that such a class of objective functions can be used together with refined training strategies, achieving competitive performance against state-of-the-art techniques of classification with label noise.
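For intuition on why symmetric noise can be benign and how a test-time posterior refinement can work, consider the standard symmetric-noise model with flip rate $\epsilon$ over $K$ classes. This is a textbook illustration, not the paper's $f$-PML correction, which is more general:

```latex
% Symmetric label noise with flip rate \epsilon over K classes:
\tilde{p}(y \mid x) = (1-\epsilon)\, p(y \mid x) + \frac{\epsilon}{K-1}\bigl(1 - p(y \mid x)\bigr)
% which can be inverted at test time to recover the clean posterior:
p(y \mid x) = \frac{\tilde{p}(y \mid x) - \frac{\epsilon}{K-1}}{1 - \frac{\epsilon K}{K-1}}
```

Since the map from clean to noisy posterior is affine and order-preserving whenever $\epsilon < (K-1)/K$, the argmax prediction is unchanged, and the clean posterior can be recovered by inverting it.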
Updated: 2025-04-09 11:52:51
Domains: cs.LG
Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization
Non-convex Machine Learning problems typically do not adhere to the standard smoothness assumption. Based on empirical findings, Zhang et al. (2020b) proposed a more realistic generalized $(L_0, L_1)$-smoothness assumption, though it remains largely unexplored. Many existing algorithms designed for standard smooth problems need to be revised. However, in the context of Federated Learning, only a few works address this problem but rely on additional limiting assumptions. In this paper, we address this gap in the literature: we propose and analyze new methods with local steps, partial participation of clients, and Random Reshuffling without extra restrictive assumptions beyond generalized smoothness. The proposed methods are based on the proper interplay between clients' and server's stepsizes and gradient clipping. Furthermore, we perform the first analysis of these methods under the Polyak-Łojasiewicz condition. Our theory is consistent with the known results for standard smooth problems, and our experimental results support the theoretical insights.
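A minimal sketch of the algorithmic ingredients named here (local steps, partial client participation, Random Reshuffling, gradient clipping, and separate client and server stepsizes) on a toy quadratic follows; all constants are illustrative, and the convergence analysis under generalized smoothness is, of course, the paper's contribution.

```python
import numpy as np

def clipped_sgd_epoch(theta, data, grad_fn, lr, clip, rng):
    """One local epoch with Random Reshuffling and gradient norm clipping."""
    order = rng.permutation(len(data))               # reshuffle once per epoch
    for i in order:
        g = grad_fn(theta, data[i])
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # clip to norm <= clip
        theta = theta - lr * g
    return theta

rng = np.random.default_rng(0)
grad_fn = lambda th, x: 2 * (th - x)                 # toy quadratic per-sample loss
clients = [rng.standard_normal((20, 3)) + c for c in range(10)]
theta = np.zeros(3)
for rnd in range(50):
    chosen = rng.choice(10, size=4, replace=False)   # partial participation
    deltas = []
    for c in chosen:
        local = clipped_sgd_epoch(theta.copy(), clients[c], grad_fn,
                                  lr=0.05, clip=1.0, rng=rng)
        deltas.append(local - theta)
    theta = theta + 0.5 * np.mean(deltas, axis=0)    # separate server stepsize
print(theta)   # roughly the mean of the client optima, i.e. ~[4.5, 4.5, 4.5]
```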
Updated: 2025-04-09 11:46:26
Domains: math.OC,cs.LG
Learning Equivariant Non-Local Electron Density Functionals
The accuracy of density functional theory hinges on the approximation of non-local contributions to the exchange-correlation (XC) functional. To date, machine-learned and human-designed approximations suffer from insufficient accuracy, limited scalability, or dependence on costly reference data. To address these issues, we introduce Equivariant Graph Exchange Correlation (EG-XC), a novel non-local XC functional based on equivariant graph neural networks (GNNs). Where previous works relied on semi-local functionals or fixed-size descriptors of the density, we compress the electron density into an SO(3)-equivariant nuclei-centered point cloud for efficient non-local atomic-range interactions. By applying an equivariant GNN on this point cloud, we capture molecular-range interactions in a scalable and accurate manner. To train EG-XC, we differentiate through a self-consistent field solver requiring only energy targets. In our empirical evaluation, we find EG-XC to accurately reconstruct "gold-standard" CCSD(T) energies on MD17. On out-of-distribution conformations of 3BPA, EG-XC reduces the relative MAE by 35% to 50%. Remarkably, EG-XC excels in data efficiency and molecular size extrapolation on QM9, matching force fields trained on 5 times more and larger molecules. On identical training sets, EG-XC yields on average 51% lower MAEs.
Updated: 2025-04-09 11:44:13
Domains: cs.LG,physics.chem-ph,physics.comp-ph
Learning in Spiking Neural Networks with a Calcium-based Hebbian Rule for Spike-timing-dependent Plasticity
Understanding how biological neural networks are shaped via local plasticity mechanisms can lead to energy-efficient and self-adaptive information processing systems, which promises to mitigate some of the current roadblocks in edge computing systems. While biology makes use of spikes to seamlessly exploit both spike timing and mean firing rate to modulate synaptic strength, most models focus on only one of the two. In this work, we present a Hebbian local learning rule that models synaptic modification as a function of calcium traces tracking neuronal activity. We show how the rule reproduces results from spike time and spike rate protocols from neuroscientific studies. Moreover, we use the model to train spiking neural networks on MNIST digit recognition to show and explain what sort of mechanisms are needed to learn real-world patterns. We show how our model is sensitive to correlated spiking activity and how this enables it to modulate the learning rate of the network without altering the mean firing rate of the neurons or the hyperparameters of the learning rule. To the best of our knowledge, this is the first work that showcases how spike timing and rate can be complementary in their role of shaping the connectivity of spiking neural networks.
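A toy version of a threshold-based calcium rule conveys the flavor: pre- and postsynaptic spikes feed a decaying calcium trace, and the weight potentiates or depresses depending on which threshold the trace crosses. All constants below are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

def simulate(pre_spikes, post_spikes, T=140, tau_ca=10.0,
             theta_pot=0.8, theta_dep=0.3, lr_pot=0.05, lr_dep=0.005):
    """Toy calcium-threshold plasticity over T time steps (dt = 1)."""
    ca, w = 0.0, 0.5
    for t in range(T):
        ca *= np.exp(-1.0 / tau_ca)              # calcium decays exponentially
        ca += 0.4 * (t in pre_spikes) + 0.6 * (t in post_spikes)
        if ca > theta_pot:
            w += lr_pot * (1 - w)                # potentiation, soft-bounded
        elif ca > theta_dep:
            w -= lr_dep * w                      # depression
    return w

causal = simulate(pre_spikes={10, 50, 90}, post_spikes={15, 55, 95})
anti   = simulate(pre_spikes={15, 55, 95}, post_spikes={10, 50, 90})
print(causal, anti)   # causal > anti
```

Because the larger postsynaptic bump arrives second in the causal ordering, only causal pairings push the trace over the potentiation threshold here, giving a simple spike-timing asymmetry from a purely calcium-based rule.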
Updated: 2025-04-09 11:39:59
Domains: cs.NE,cs.AI
International Scientific Report on the Safety of Advanced AI (Interim Report)
This is the interim publication of the first International Scientific Report on the Safety of Advanced AI. The report synthesises the scientific understanding of general-purpose AI -- AI that can perform a wide variety of tasks -- with a focus on understanding and managing its risks. A diverse group of 75 AI experts contributed to this report, including an international Expert Advisory Panel nominated by 30 countries, the EU, and the UN. Led by the Chair, these independent experts collectively had full discretion over the report's content. The final report is available at arXiv:2501.17805
Updated: 2025-04-09 11:34:12
Domains: cs.CY,cs.AI
Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations
Mixture-of-Experts (MoE) models achieve a favorable trade-off between performance and inference efficiency by activating only a subset of experts. However, the memory overhead of storing all experts remains a major limitation, especially in large-scale MoE models such as DeepSeek-R1 (671B). In this study, we investigate domain specialization and expert redundancy in large-scale MoE models and uncover a consistent behavior we term few-shot expert localization: with only a few demonstrations, the model consistently activates a sparse and stable subset of experts. Building on this observation, we propose a simple yet effective pruning framework, EASY-EP, that leverages a few domain-specific demonstrations to identify and retain only the most relevant experts. EASY-EP comprises two key components: output-aware expert importance assessment and expert-level token contribution estimation. The former evaluates the importance of each expert for the current token by considering the gating scores and magnitudes of the outputs of activated experts, while the latter assesses the contribution of tokens based on representation similarities after and before routed experts. Experiments show that our method can achieve comparable performance and $2.99\times$ higher throughput than the full DeepSeek-R1 under the same memory budget, using only half the experts. Our code is available at https://github.com/RUCAIBox/EASYEP.
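The output-aware importance score described here is straightforward to sketch: accumulate, over demonstration tokens, the gating score times the magnitude of each activated expert's output, then keep the top-scoring experts. The token-contribution weighting is omitted, and all shapes below are illustrative.

```python
import numpy as np

def expert_importance(gate_scores, expert_outputs):
    """Output-aware importance accumulated over demonstration tokens.

    gate_scores:    (n_tokens, n_experts) routing probabilities (0 if inactive)
    expert_outputs: (n_tokens, n_experts, d) per-expert outputs (0 if inactive)
    """
    magnitudes = np.linalg.norm(expert_outputs, axis=-1)   # (n_tokens, n_experts)
    return (gate_scores * magnitudes).sum(axis=0)

rng = np.random.default_rng(0)
n_tokens, n_experts, d = 64, 16, 32
gates = rng.dirichlet(np.ones(n_experts) * 0.1, size=n_tokens)  # sparse-ish routing
outputs = rng.standard_normal((n_tokens, n_experts, d))

scores = expert_importance(gates, outputs)
keep = np.argsort(scores)[-n_experts // 2:]     # retain the top half of experts
print("experts kept:", sorted(keep.tolist()))
```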
Updated: 2025-04-09 11:34:06
Domains: cs.CL,cs.LG
Beware of "Explanations" of AI
Understanding the decisions made and actions taken by increasingly complex AI system remains a key challenge. This has led to an expanding field of research in explainable artificial intelligence (XAI), highlighting the potential of explanations to enhance trust, support adoption, and meet regulatory standards. However, the question of what constitutes a "good" explanation is dependent on the goals, stakeholders, and context. At a high level, psychological insights such as the concept of mental model alignment can offer guidance, but success in practice is challenging due to social and technical factors. As a result of this ill-defined nature of the problem, explanations can be of poor quality (e.g. unfaithful, irrelevant, or incoherent), potentially leading to substantial risks. Instead of fostering trust and safety, poorly designed explanations can actually cause harm, including wrong decisions, privacy violations, manipulation, and even reduced AI adoption. Therefore, we caution stakeholders to beware of explanations of AI: while they can be vital, they are not automatically a remedy for transparency or responsible AI adoption, and their misuse or limitations can exacerbate harm. Attention to these caveats can help guide future research to improve the quality and impact of AI explanations.
Updated: 2025-04-09 11:31:08
Domains: cs.LG
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models
This technical report presents Ring-Lite-Distill, a lightweight reasoning model derived from our open-source Mixture-of-Experts (MoE) Large Language Models (LLMs) Ling-Lite. This study demonstrates that through meticulous high-quality data curation and ingenious training paradigms, the compact MoE model Ling-Lite can be further trained to achieve exceptional reasoning capabilities, while maintaining its parameter-efficient architecture with only 2.75 billion activated parameters, establishing an efficient lightweight reasoning architecture. In particular, in constructing this model, we have not merely focused on enhancing advanced reasoning capabilities, exemplified by high-difficulty mathematical problem solving, but rather aimed to develop a reasoning model with more comprehensive competency coverage. Our approach ensures coverage across reasoning tasks of varying difficulty levels while preserving generic capabilities, such as instruction following, tool use, and knowledge retention. We show that, Ring-Lite-Distill's reasoning ability reaches a level comparable to DeepSeek-R1-Distill-Qwen-7B, while its general capabilities significantly surpass those of DeepSeek-R1-Distill-Qwen-7B. The models are accessible at https://huggingface.co/inclusionAI
Updated: 2025-04-09 11:24:32
Domains: cs.LG,cs.CL
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
Recent advancements in reasoning with large language models (RLLMs), such as OpenAI-O1 and DeepSeek-R1, have demonstrated their impressive capabilities in complex domains like mathematics and coding. A central factor in their success lies in the application of long chain-of-thought (Long CoT) characteristics, which enhance reasoning abilities and enable the solution of intricate problems. However, despite these developments, a comprehensive survey on Long CoT is still lacking, limiting our understanding of its distinctions from traditional short chain-of-thought (Short CoT) and complicating ongoing debates on issues like "overthinking" and "test-time scaling." This survey seeks to fill this gap by offering a unified perspective on Long CoT. (1) We first distinguish Long CoT from Short CoT and introduce a novel taxonomy to categorize current reasoning paradigms. (2) Next, we explore the key characteristics of Long CoT: deep reasoning, extensive exploration, and feasible reflection, which enable models to handle more complex tasks and produce more efficient, coherent outcomes compared to the shallower Short CoT. (3) We then investigate key phenomena such as the emergence of Long CoT with these characteristics, including overthinking, and test-time scaling, offering insights into how these processes manifest in practice. (4) Finally, we identify significant research gaps and highlight promising future directions, including the integration of multi-modal reasoning, efficiency improvements, and enhanced knowledge frameworks. By providing a structured overview, this survey aims to inspire future research and further the development of logical reasoning in artificial intelligence.
Updated: 2025-04-09 11:20:18
Domains: cs.AI,cs.CL
GAAPO: Genetic Algorithmic Applied to Prompt Optimization
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, with their performance heavily dependent on the quality of input prompts (Schulhoff et al., 2025; Sahoo et al., 2025). While prompt engineering has proven effective, it typically relies on manual adjustments, making it time-consuming and potentially suboptimal. This paper introduces GAAPO (Genetic Algorithm Applied to Prompt Optimization), a novel hybrid optimization framework that leverages genetic algorithm principles (De Jong, 1988) to evolve prompts through successive generations. Unlike traditional genetic approaches that rely solely on mutation and crossover operations, GAAPO integrates multiple specialized prompt generation strategies within its evolutionary framework. Through extensive experimentation on diverse datasets including ETHOS, MMLU-Pro, and GPQA, our analysis reveals several important points for the future development of automatic prompt optimization methods: the importance of the tradeoff between population size and the number of generations, the effect of selection methods on the stability of results, and the capacity of different LLMs, especially reasoning models, to automatically generate prompts from similar queries. Furthermore, we provide insights into the relative effectiveness of different prompt generation strategies and their evolution across optimization phases. These findings contribute both to the theoretical understanding of prompt optimization and to practical applications in improving LLM performance.
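A minimal genetic-algorithm skeleton over structured prompts looks as follows; the phrase pools, slot structure, and fitness placeholder are hypothetical, and GAAPO additionally integrates specialized prompt-generation strategies into this loop, which are omitted here.

```python
import random

PHRASES = {                                      # hypothetical prompt components
    "role":   ["You are a careful analyst.", "You are a domain expert."],
    "task":   ["Classify the text.", "Answer step by step."],
    "format": ["Reply with one word.", "Reply as JSON."],
}
SLOTS = list(PHRASES)

def fitness(prompt):
    """Deterministic placeholder; replace with a real LLM-based validation score."""
    return (hash(" ".join(prompt[s] for s in SLOTS)) % 1000) / 1000

def crossover(a, b):                             # one-point crossover over slots
    cut = random.randrange(1, len(SLOTS))
    return {s: (a if i < cut else b)[s] for i, s in enumerate(SLOTS)}

def mutate(p, rate=0.3):                         # resample slots at random
    return {s: random.choice(PHRASES[s]) if random.random() < rate else p[s]
            for s in SLOTS}

random.seed(0)
pop = [{s: random.choice(PHRASES[s]) for s in SLOTS} for _ in range(8)]
for gen in range(5):                             # generations vs. population tradeoff
    scored = sorted(pop, key=fitness, reverse=True)
    parents = scored[:4]                         # truncation selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(len(pop) - len(parents))]
best = max(pop, key=fitness)
print(" ".join(best[s] for s in SLOTS))
```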
Updated: 2025-04-09 11:19:42
Domains: cs.NE,cs.LG,I.2.6
Zero-Shot Image-Based Large Language Model Approach to Road Pavement Monitoring
Effective and rapid evaluation of pavement surface condition is critical for prioritizing maintenance, ensuring transportation safety, and minimizing vehicle wear and tear. While conventional manual inspections suffer from subjectivity, existing machine learning-based methods are constrained by their reliance on large and high-quality labeled datasets, which require significant resources and limit adaptability across varied road conditions. The revolutionary advancements in Large Language Models (LLMs) present significant potential for overcoming these challenges. In this study, we propose an innovative automated zero-shot learning approach that leverages the image recognition and natural language understanding capabilities of LLMs to assess road conditions effectively. Multiple LLM-based assessment models were developed, employing prompt engineering strategies aligned with the Pavement Surface Condition Index (PSCI) standards. These models' accuracy and reliability were evaluated against official PSCI results, with an optimized model ultimately selected. Extensive tests benchmarked the optimized model against evaluations from experts at various levels using Google Street View road images. The results reveal that the LLM-based approach can effectively assess road conditions, with the optimized model, which employs comprehensive and structured prompt engineering strategies, outperforming simpler configurations by achieving high accuracy and consistency, even surpassing expert evaluations. Moreover, successfully applying the optimized model to Google Street View images demonstrates its potential for future city-scale deployments. These findings highlight the transformative potential of LLMs in automating road damage evaluations and underscore the pivotal role of detailed prompt engineering in achieving reliable assessments.
Updated: 2025-04-09 11:19:17
Domains: cs.CV,cs.AI
LostPaw: Finding Lost Pets using a Contrastive Learning-based Transformer with Visual Input
Losing pets can be highly distressing for pet owners, and finding a lost pet is often challenging and time-consuming. An artificial intelligence-based application can significantly improve the speed and accuracy of finding lost pets. To facilitate such an application, this study introduces a contrastive neural network model capable of accurately distinguishing between images of pets. The model was trained on a large dataset of dog images and evaluated through 3-fold cross-validation. Following 350 epochs of training, the model achieved a test accuracy of 90%. Furthermore, overfitting was avoided, as the test accuracy closely matched the training accuracy. Our findings suggest that contrastive neural network models hold promise as a tool for locating lost pets. This paper presents the foundational framework for a potential web application designed to assist users in locating their missing pets. The application will allow users to upload images of their lost pets and provide notifications when matching images are identified within its image database. This functionality aims to enhance the efficiency and accuracy with which pet owners can search for and reunite with their beloved animals.
Updated: 2025-04-09 11:17:26
Domains: cs.CV,cs.AI
Oil Spill Segmentation using Deep Encoder-Decoder models
Crude oil is an integral component of the world economy and transportation sectors. With the growing demand for crude oil due to its widespread applications, accidental oil spills are unfortunate yet unavoidable. Even though oil spills are difficult to clean up, the first and foremost challenge is to detect them. In this research, the authors test the feasibility of deep encoder-decoder models that can be trained effectively to detect oil spills remotely. The work examines and compares the results from several segmentation models on high dimensional satellite Synthetic Aperture Radar (SAR) image data to pave the way for further in-depth research. Multiple combinations of models are used to run the experiments. The best-performing model is the one with the ResNet-50 encoder and DeepLabV3+ decoder. It achieves a mean Intersection over Union (IoU) of 64.868% and an improved class IoU of 61.549% for the "oil spill" class when compared with the previous benchmark model, which achieved a mean IoU of 65.05% and a class IoU of 53.38% for the "oil spill" class.
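For reference, the two metrics reported here, per-class IoU and mean IoU, can be computed directly from label maps; the synthetic maps below are illustrative.

```python
import numpy as np

def class_iou(pred, target, cls):
    """Intersection over Union for one class from integer label maps."""
    p, t = pred == cls, target == cls
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    return inter / union if union else float("nan")

def mean_iou(pred, target, n_classes):
    ious = [class_iou(pred, target, c) for c in range(n_classes)]
    return np.nanmean(ious), ious

rng = np.random.default_rng(0)
target = rng.integers(0, 5, (64, 64))            # 5 classes; say class 1 = oil spill
pred = target.copy()
pred[rng.random((64, 64)) < 0.2] = rng.integers(0, 5)  # corrupt ~20% of pixels
miou, per_class = mean_iou(pred, target, 5)
print(f"mean IoU {miou:.3f}, oil-spill IoU {per_class[1]:.3f}")
```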
Updated: 2025-04-09 11:06:22
Domains: cs.CV,cs.AI
A Lightweight and Extensible Cell Segmentation and Classification Model for Whole Slide Images
Developing clinically useful cell-level analysis tools in digital pathology remains challenging due to limitations in dataset granularity, inconsistent annotations, high computational demands, and difficulties integrating new technologies into workflows. To address these issues, we propose a solution that enhances data quality, model performance, and usability by creating a lightweight, extensible cell segmentation and classification model. First, we update data labels through cross-relabeling to refine annotations of PanNuke and MoNuSAC, producing a unified dataset with seven distinct cell types. Second, we leverage the H-Optimus foundation model as a fixed encoder to improve feature representation for simultaneous segmentation and classification tasks. Third, to address foundation models' computational demands, we distill knowledge to reduce model size and complexity while maintaining comparable performance. Finally, we integrate the distilled model into QuPath, a widely used open-source digital pathology platform. Results demonstrate improved segmentation and classification performance using the H-Optimus-based model compared to a CNN-based model. Specifically, average $R^2$ improved from 0.575 to 0.871, and average $PQ$ score improved from 0.450 to 0.492, indicating better alignment with actual cell counts and enhanced segmentation quality. The distilled model maintains comparable performance while reducing parameter count by a factor of 48. By reducing computational complexity and integrating into workflows, this approach may significantly impact diagnostics, reduce pathologist workload, and improve outcomes. Although the method shows promise, extensive validation is necessary prior to clinical deployment.
Updated: 2025-04-09 11:06:08
Domains: cs.CV,cs.AI,I.4.6; I.4.9; I.2.10
Hybrid machine learning models based on physical patterns to accelerate CFD simulations: a short guide on autoregressive models
Accurate modeling of the complex dynamics of fluid flows is a fundamental challenge in computational physics and engineering. This study presents an innovative integration of High-Order Singular Value Decomposition (HOSVD) with Long Short-Term Memory (LSTM) architectures to address the complexities of reduced-order modeling (ROM) in fluid dynamics. HOSVD improves the dimensionality reduction process by preserving multidimensional structures, surpassing the limitations of Singular Value Decomposition (SVD). The methodology is tested across numerical and experimental data sets, including two- and three-dimensional (2D and 3D) cylinder wake flows, spanning both laminar and turbulent regimes. The emphasis is also on exploring how the depth and complexity of LSTM architectures contribute to improving predictive performance. Simpler architectures with a single dense layer effectively capture the periodic dynamics, demonstrating the network's ability to model non-linearities and chaotic dynamics. The addition of extra layers provides higher accuracy at minimal computational cost. These additional layers enable the network to expand its representational capacity, improving the prediction accuracy and reliability. The results demonstrate that HOSVD outperforms SVD in all tested scenarios, as evidenced by using different error metrics. Efficient mode truncation by HOSVD-based models enables the capture of complex temporal patterns, offering reliable predictions even in challenging, noise-influenced data sets. The findings underscore the adaptability and robustness of HOSVD-LSTM architectures, offering a scalable framework for modeling fluid dynamics.
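A compact sketch of the HOSVD stage: the factor matrices come from the left singular vectors of each mode unfolding, and the core tensor carries the reduced coordinates that would then feed the LSTM. The synthetic snapshot tensor and ranks are assumptions.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: unlike a flat SVD of one matricization, it preserves
    the multidimensional (space x space x time) structure of the snapshots."""
    U = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
         for m, r in enumerate(ranks)]
    core = T
    for m, Um in enumerate(U):
        core = mode_dot(core, Um.T, m)
    return core, U

rng = np.random.default_rng(0)
# synthetic "flow" built from a few spatial/temporal modes, plus noise
A, B, C = (rng.standard_normal((32, 4)), rng.standard_normal((32, 4)),
           rng.standard_normal((200, 4)))
snapshots = np.einsum("ir,jr,tr->ijt", A, B, C) \
    + 0.01 * rng.standard_normal((32, 32, 200))

core, U = hosvd(snapshots, ranks=(4, 4, 4))
recon = core
for m, Um in enumerate(U):
    recon = mode_dot(recon, Um, m)
print(core.shape,
      np.linalg.norm(recon - snapshots) / np.linalg.norm(snapshots))  # small
```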
Updated: 2025-04-09 10:56:03
Domains: physics.flu-dyn,cs.LG
AI, Help Me Think—but for Myself: Assisting People in Complex Decision-Making by Providing Different Kinds of Cognitive Support
How can we design AI tools that effectively support human decision-making by complementing and enhancing users' reasoning processes? Common recommendation-centric approaches face challenges such as inappropriate reliance or a lack of integration with users' decision-making processes. Here, we explore an alternative interaction model in which the AI outputs build upon users' own decision-making rationales. We compare this approach, which we call ExtendAI, with a recommendation-based AI (RecommendAI). Participants in our mixed-methods user study interacted with both AIs as part of an investment decision-making task. We found that the AIs had different impacts, with ExtendAI integrating better into the decision-making process and participants' own thinking, and leading to slightly better outcomes. RecommendAI was able to provide more novel insights while requiring less cognitive effort. We discuss the implications of these and other findings, along with three tensions of AI-assisted decision-making that our study revealed.

Updated: 2025-04-09 10:48:17
标题: AI,帮助我思考——但是为了自己:通过提供不同类型的认知支持来帮助人们进行复杂决策。
摘要: 我们如何设计有效支持人类决策过程的人工智能工具,以补充和增强用户的推理过程?常见的基于推荐的方法面临着诸如不恰当依赖或缺乏与用户决策过程的整合等挑战。在这里,我们探讨了一种替代的交互模型,即人工智能的输出建立在用户自身的决策理论基础之上。我们将这种方法称为ExtendAI,并将其与基于推荐的人工智能进行了比较。在我们的混合方法用户研究中,参与者与两种人工智能进行了互动,作为一项投资决策任务的一部分。我们发现,这两种人工智能产生了不同的影响,ExtendAI更好地整合进决策过程和人们自己的思考,并导致略微更好的结果。而RecommendAI能够提供更多新颖的见解,同时需要更少的认知努力。我们讨论了这些以及研究揭示的助决策人工智能的三种紧张关系的含义。
更新时间: 2025-04-09 10:48:17
领域: cs.HC,cs.AI,68, 91,I.2; J.4
PLM-eXplain: Divide and Conquer the Protein Embedding Space
Protein language models (PLMs) have revolutionised computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. We present an explainable adapter layer - PLM-eXplain (PLM-X), that bridges this gap by factoring PLM embeddings into two components: an interpretable subspace based on established biochemical features, and a residual subspace that preserves the model's predictive power. Using embeddings from ESM2, our adapter incorporates well-established properties, including secondary structure and hydropathy while maintaining high performance. We demonstrate the effectiveness of our approach across three protein-level classification tasks: prediction of extracellular vesicle association, identification of transmembrane helices, and prediction of aggregation propensity. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across various downstream applications. This work addresses a critical need in computational biology by providing a bridge between powerful deep learning models and actionable biological insights.
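The decomposition can be sketched with plain linear algebra: fit a least-squares map from embeddings to known biochemical features, project onto the span of that map, and keep the remainder as the residual subspace. All data below are random placeholders (dimensions, feature names, and the stand-in for ESM2 outputs are assumptions), so this shows only the factoring idea, not PLM-X itself.

```python
import numpy as np

# E: PLM embeddings (n_sequences x d); F: per-sequence biochemical features
# (n x k), e.g. hydropathy and secondary-structure fractions (toy values here).
rng = np.random.default_rng(0)
E = rng.normal(size=(500, 1280))   # ESM2-sized embedding dim, random stand-in
F = rng.normal(size=(500, 8))      # k interpretable biochemical descriptors

# Least-squares map W from embeddings to features; the column space of W
# spans an "interpretable" subspace of the embedding space.
W, *_ = np.linalg.lstsq(E, F, rcond=None)   # (d x k)
Q, _ = np.linalg.qr(W)                      # orthonormal basis of that subspace

E_interp = E @ Q @ Q.T   # component explained by the biochemical features
E_resid = E - E_interp   # residual component preserving predictive power
print(np.allclose(E, E_interp + E_resid))   # exact decomposition
```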
Updated: 2025-04-09 10:46:24
标题: PLM-eXplain: 将蛋白质嵌入空间分割并征服
摘要: 蛋白质语言模型(PLMs)通过其生成强大的序列表示,为多样化的预测任务在计算生物学中进行了革命。然而,它们的黑盒本质限制了生物解释和转化为可行见解。我们提出了一个可解释的适配器层- PLM-eXplain(PLM-X),通过将PLM嵌入因子化为两个组件来弥合这一差距:一个基于已建立的生物化学特征的可解释子空间,以及一个保留模型预测能力的残差子空间。使用ESM2的嵌入,我们的适配器结合了已建立的特性,包括二级结构和亲水性,同时保持高性能。我们展示了我们的方法在三个蛋白质级分类任务中的有效性:预测细胞外囊泡结合,识别跨膜螺旋和预测聚集倾向。PLM-X使模型决策的生物解释成为可能,而不会牺牲准确性,为增强跨各种下游应用的PLM解释性提供了通用解决方案。这项工作通过提供一个桥梁,将强大的深度学习模型与可操作的生物见解联系起来,解决了计算生物学中的一个关键需求。
更新时间: 2025-04-09 10:46:24
领域: q-bio.BM,cs.AI,cs.LG
FedMerge: Federated Personalization via Model Merging
One global model in federated learning (FL) might not be sufficient to serve many clients with non-IID tasks and distributions. While there have been advances in FL toward training multiple global models for better personalization, these approaches offer clients only a limited set of choices, so local finetuning remains indispensable. In this paper, we propose a novel "FedMerge" approach that can create a personalized model per client by simply merging multiple global models with automatically optimized and customized weights. In FedMerge, a few global models can serve many non-IID clients, even without further local finetuning. We formulate this problem as a joint optimization of the global models and the merging weights for each client. Unlike existing FL approaches, where the server broadcasts one or multiple global models to all clients, the server only needs to send a customized, merged model to each client. Moreover, instead of periodically interrupting local training and re-initializing it to a global model, the merged model aligns better with each client's task and data distribution, smoothing the local-global gap between consecutive rounds caused by client drift. We evaluate FedMerge on three different non-IID settings applied to different domains with diverse tasks and data types, in which FedMerge consistently outperforms existing FL approaches, including clustering-based and mixture-of-experts (MoE)-based methods.
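A toy version of the merging step might look as follows: each client holds a small vector of logits, and its personalized model is the softmax-weighted average of the global models' parameters. The joint server-side optimization of global models and weights is omitted; this sketch shows only how a merged state dict is formed.

```python
import torch

def merge_models(global_states, client_logits):
    """Merge several global models into one personalized model using
    client-specific softmax weights (a toy version of FedMerge's merging)."""
    w = torch.softmax(client_logits, dim=0)   # one weight per global model
    merged = {}
    for key in global_states[0]:
        merged[key] = sum(w[i] * global_states[i][key] for i in range(len(w)))
    return merged

# Toy example: three global linear models, one client.
models = [torch.nn.Linear(10, 2) for _ in range(3)]
states = [m.state_dict() for m in models]
logits = torch.nn.Parameter(torch.zeros(3))   # would be optimized jointly
personalized = merge_models(states, logits)

client_model = torch.nn.Linear(10, 2)
client_model.load_state_dict({k: v.detach() for k, v in personalized.items()})
```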
Updated: 2025-04-09 10:44:14
标题: FedMerge:通过模型合并实现联邦个性化
摘要: 在联邦学习(FL)中,一个全局模型可能不足以为许多具有非独立同分布任务和分布的客户提供服务。虽然在FL领域已经取得了进展,可以训练多个全局模型以实现更好的个性化,但它们仅为客户提供了有限的选择,因此本地微调仍然是必不可少的。在本文中,我们提出了一种新颖的“FedMerge”方法,通过简单地合并多个全局模型并自动优化和定制权重,可以为每个客户创建一个个性化模型。在FedMerge中,少数全局模型可以为许多非独立同分布的客户提供服务,甚至无需进一步的本地微调。我们将这个问题建模为全局模型和每个客户的合并权重的联合优化。与现有的FL方法不同,服务器不需要向所有客户广播一个或多个全局模型,而只需向每个客户发送一个定制的合并模型。此外,与周期性中断本地训练并重新初始化到全局模型不同,合并模型更好地与每个客户的任务和数据分布相吻合,从而平滑了由客户漂移引起的每轮之间的本地-全局差距。我们在应用于不同领域的三种不同非独立同分布设置上评估了FedMerge,这些领域具有多样的任务和数据类型,其中FedMerge始终优于现有的FL方法,包括基于聚类和基于专家混合的方法。
更新时间: 2025-04-09 10:44:14
领域: cs.LG
FamilyTool: A Multi-hop Personalized Tool Use Benchmark
The integration of tool learning with Large Language Models (LLMs) has expanded their capabilities in handling complex tasks by leveraging external tools. However, existing benchmarks for tool learning inadequately address critical real-world personalized scenarios, particularly those requiring multi-hop reasoning and inductive knowledge adaptation in dynamic environments. To bridge this gap, we introduce FamilyTool, a novel benchmark grounded in a family-based knowledge graph (KG) that simulates personalized, multi-hop tool use scenarios. FamilyTool challenges LLMs with queries spanning 1 to 3 relational hops (e.g., inferring familial connections and preferences) and incorporates an inductive KG setting where models must adapt to unseen user preferences and relationships without re-training, a common limitation in prior approaches that compromises generalization. We further propose KGETool: a simple KG-augmented evaluation pipeline to systematically assess LLMs' tool use ability in these settings. Experiments reveal significant performance gaps in state-of-the-art LLMs, with accuracy dropping sharply as hop complexity increases and inductive scenarios exposing severe generalization deficits. These findings underscore the limitations of current LLMs in handling personalized, evolving real-world contexts and highlight the urgent need for advancements in tool-learning frameworks. FamilyTool serves as a critical resource for evaluating and advancing LLM agents' reasoning, adaptability, and scalability in complex, dynamic environments. Code and dataset are available at Github.
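The flavour of a multi-hop FamilyTool query can be shown with a few dictionary lookups; the entities, relations, and query below are invented for illustration and do not come from the benchmark itself.

```python
# A minimal family knowledge graph and a multi-hop query, illustrating the
# kind of reasoning FamilyTool targets (all facts here are invented).
kg = {
    ("Alice", "mother_of"): "Bob",
    ("Bob", "father_of"): "Carol",
    ("Carol", "favorite_cuisine"): "Italian",
}

def hop(entity, relation):
    return kg.get((entity, relation))

# Query: "What cuisine does Alice's grandchild like?"
child = hop("Alice", "mother_of")           # hop 1: Alice -> Bob
grandchild = hop(child, "father_of")        # hop 2: Bob -> Carol
answer = hop(grandchild, "favorite_cuisine")
print(answer)   # Italian -- the agent must chain these hops before calling a tool
```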
Updated: 2025-04-09 10:42:36
标题: FamilyTool: 一个多跳个性化工具使用基准
摘要: 工具学习与大型语言模型(LLMs)的整合通过利用外部工具扩展了它们处理复杂任务的能力。然而,现有的工具学习基准不足以应对关键的现实个性化场景,特别是那些需要在动态环境中进行多跳推理和归纳知识适应的场景。为了弥补这一差距,我们引入了FamilyTool,这是一个基于家庭知识图(KG)的新型基准,模拟个性化的、多跳的工具使用场景。FamilyTool挑战LLMs处理跨越1到3个关系跳跃的查询(例如推断家庭关系和偏好),并且采用归纳KG设置,模型必须适应未见过的用户偏好和关系而无需重新训练,这是先前方法中常见的一个限制,会影响泛化能力。我们进一步提出了KGETool:一个简单的KG增强评估管道,用于系统评估LLMs在这些设置中的工具使用能力。实验证实,尖端LLMs存在显著的性能差距,准确率随着跳跃复杂性的增加急剧下降,归纳场景暴露出严重的泛化缺陷。这些发现突显了当前LLMs在处理个性化、不断发展的现实环境时的局限性,并强调了在工具学习框架方面的迫切需求。FamilyTool作为一个关键资源,用于评估和推进LLM代理在复杂、动态环境中的推理、适应性和可扩展性。代码和数据集可在Github上获得。
更新时间: 2025-04-09 10:42:36
领域: cs.AI,cs.CL
Digital Gene: Learning about the Physical World through Analytic Concepts
Reviewing the progress in artificial intelligence over the past decade, various significant advances (e.g. object detection, image generation, large language models) have enabled AI systems to produce more semantically meaningful outputs and achieve widespread adoption in internet scenarios. Nevertheless, AI systems still struggle when it comes to understanding and interacting with the physical world. This reveals an important issue: relying solely on semantic-level concepts learned from internet data (e.g. texts, images) to understand the physical world is far from sufficient -- machine intelligence currently lacks an effective way to learn about the physical world. This research introduces the idea of analytic concept -- representing the concepts related to the physical world through programs of mathematical procedures, providing machine intelligence a portal to perceive, reason about, and interact with the physical world. Except for detailing the design philosophy and providing guidelines for the application of analytic concepts, this research also introduce about the infrastructure that has been built around analytic concepts. I aim for my research to contribute to addressing these questions: What is a proper abstraction of general concepts in the physical world for machine intelligence? How to systematically integrate structured priors with neural networks to constrain AI systems to comply with physical laws?
Updated: 2025-04-09 10:35:12
标题: 数字基因:通过分析概念了解物理世界
摘要: 在过去的十年里,人工智能领域取得了巨大进展,例如目标检测、图像生成、大型语言模型等重大突破,使得人工智能系统能够产生更具语义意义的输出,并在互联网场景中得到广泛应用。然而,当涉及理解和与物理世界互动时,人工智能系统仍然面临困难。这揭示了一个重要问题:仅仅依靠从互联网数据(例如文本、图像)学习的语义级概念来理解物理世界是远远不够的——当前的机器智能缺乏有效的学习物理世界的方式。这项研究引入了分析概念的概念——通过数学程序的程序表示与物理世界相关的概念,为机器智能提供了一个感知、推理和与物理世界互动的门户。除了详细介绍设计哲学并提供应用分析概念的指导方针外,这项研究还介绍了围绕分析概念建立的基础设施。我希望我的研究能够有助于解决以下问题:什么是机器智能在物理世界中一般概念的适当抽象?如何系统地将结构化先验与神经网络集成,以约束人工智能系统遵守物理定律?
更新时间: 2025-04-09 10:35:12
领域: cs.AI,cs.CV,cs.RO
Optimizing LLM Queries in Relational Data Analytics Workloads
Batch data analytics is a growing application for Large Language Models (LLMs). LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets. However, LLM inference is highly costly and slow: for example, an NVIDIA L4 GPU running Llama3-8B can only process 6 KB of text per second, taking about a day to handle 15 GB of data; processing a similar amount of data costs around $10K on OpenAI's GPT-4o. In this paper, we propose novel techniques that can significantly reduce the cost of LLM calls for relational data analytics workloads. Our key contribution is developing efficient algorithms for reordering the rows and the fields within each row of an input table to maximize key-value (KV) cache reuse when performing LLM serving. As such, our approach can be easily applied to existing analytics systems and serving platforms. Our evaluation shows that our solution can yield up to 3.4x improvement in job completion time on a benchmark of diverse LLM-based queries using Llama 3 models. Our solution also achieves a 32% cost savings under OpenAI and Anthropic pricing models.
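One simple heuristic in this spirit, sketched below with invented data: place low-cardinality fields first and sort rows lexicographically, so consecutive prompts share long prefixes and hence KV-cache entries. The paper's actual reordering algorithms may differ.

```python
def reorder_for_prefix_sharing(rows):
    """Heuristic: put low-cardinality fields first (their values repeat across
    rows), then sort rows lexicographically so consecutive prompts share the
    longest possible prefix -- and hence the most KV-cache entries."""
    fields = list(rows[0])
    # Fewer distinct values => more prefix reuse when placed early.
    cardinality = {f: len({r[f] for r in rows}) for f in fields}
    order = sorted(fields, key=lambda f: cardinality[f])
    serialized = [tuple((f, r[f]) for f in order) for r in rows]
    return sorted(serialized)

rows = [
    {"country": "US", "city": "Austin", "review": "great"},
    {"country": "US", "city": "Boston", "review": "meh"},
    {"country": "US", "city": "Austin", "review": "slow"},
]
for row in reorder_for_prefix_sharing(rows):
    print(row)   # rows with identical (country, city) prefixes end up adjacent
```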
Updated: 2025-04-09 10:23:39
标题: 优化关系数据分析工作负载中的LLM查询
摘要: 批处理数据分析是大型语言模型(LLMs)日益增长的应用。LLMs使用户能够在大型数据集上执行各种自然语言任务,如分类、实体提取和翻译。然而,LLM推断成本高昂且速度缓慢:例如,一台运行Llama3-8B的NVIDIA L4 GPU每秒只能处理6 KB的文本,需要大约一天才能处理15 GB的数据;在OpenAI的GPT-4o上处理类似数量的数据约需花费10K美元。在本文中,我们提出了一些新颖的技术,可以显著降低LLM调用在关系数据分析工作负载中的成本。我们的关键贡献是开发出有效的算法,用于重新排列输入表中每行的行和字段,以最大化在执行LLM服务时的键值(KV)缓存重用。因此,我们的方法可以轻松应用于现有的分析系统和服务平台。我们的评估结果显示,我们的解决方案在使用Llama 3模型进行多样化基于LLM的查询基准测试时,可以使作业完成时间提高最多3.4倍。我们的解决方案还在OpenAI和Anthropic定价模型下实现了32%的成本节省。
更新时间: 2025-04-09 10:23:39
领域: cs.LG,cs.DB
Automating Customer Needs Analysis: A Comparative Study of Large Language Models in the Travel Industry
In the rapidly evolving landscape of Natural Language Processing (NLP), Large Language Models (LLMs) have emerged as powerful tools for many tasks, such as extracting valuable insights from vast amounts of textual data. In this study, we conduct a comparative analysis of LLMs for the extraction of travel customer needs from TripAdvisor and Reddit posts. Leveraging a diverse range of models, including both open-source and proprietary ones such as GPT-4 and Gemini, we aim to elucidate their strengths and weaknesses in this specialized domain. Through an evaluation process involving metrics such as BERTScore, ROUGE, and BLEU, we assess the performance of each model in accurately identifying and summarizing customer needs. Our findings highlight the efficacy of open-source LLMs, particularly Mistral 7B, in achieving comparable performance to larger closed models while offering affordability and customization benefits. Additionally, we underscore the importance of considering factors such as model size, resource requirements, and performance metrics when selecting the most suitable LLM for customer needs analysis tasks. Overall, this study contributes valuable insights for businesses seeking to leverage advanced NLP techniques to enhance customer experience and drive operational efficiency in the travel industry.
Updated: 2025-04-09 10:21:07
标题: 自动化客户需求分析:旅游行业大型语言模型的比较研究
摘要: 在自然语言处理(NLP)领域迅速发展的背景下,大型语言模型(LLMs)已经成为从大量文本数据中提取有价值见解的强大工具。本研究对比分析了LLMs在从TripAdvisor和Reddit帖子中提取旅行客户需求方面的性能。利用各种模型,包括开源和专有模型如GPT-4和Gemini,我们旨在阐明它们在这一专业领域的优势和劣势。通过评估过程,包括BERTScore、ROUGE和BLEU等指标,我们评估了每个模型在准确识别和总结客户需求方面的性能。我们的研究结果突出了开源LLMs的效力,特别是Mistral 7B,在实现与更大封闭模型可比的性能的同时,提供了经济性和定制化优势。此外,我们强调在选择最适合客户需求分析任务的LLM时考虑模型大小、资源需求和性能指标等因素的重要性。总的来说,本研究为寻求利用先进的NLP技术提升客户体验并推动旅行业运营效率的企业提供了宝贵的见解。
更新时间: 2025-04-09 10:21:07
领域: cs.CL,cs.AI,cs.HC,cs.LG
Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception
The rapid advancement of audio generation technologies has escalated the risks of malicious deepfake audio across speech, sound, singing voice, and music, threatening multimedia security and trust. While existing countermeasures (CMs) perform well in single-type audio deepfake detection (ADD), their performance declines in cross-type scenarios. This paper is dedicated to studying the all-type ADD task. We are the first to comprehensively establish an all-type ADD benchmark to evaluate current CMs, incorporating cross-type deepfake detection across speech, sound, singing voice, and music. Then, we introduce the prompt tuning self-supervised learning (PT-SSL) training paradigm, which optimizes the SSL front-end by learning specialized prompt tokens for ADD, requiring 458x fewer trainable parameters than fine-tuning (FT). Considering the auditory perception of different audio types, we propose the wavelet prompt tuning (WPT)-SSL method to capture type-invariant auditory deepfake information from the frequency domain without requiring additional training parameters, thereby enhancing performance over FT in the all-type ADD task. To achieve a universal CM, we utilize all types of deepfake audio for co-training. Experimental results demonstrate that WPT-XLSR-AASIST achieved the best performance, with an average EER of 3.58% across all evaluation sets. The code is available online.
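To give a rough sense of the frequency-domain front-end, the sketch below decomposes audio into wavelet sub-bands with PyWavelets and summarizes each band into a small prompt matrix. In WPT these prompts are learned; here they are plain band statistics, so this only illustrates where the wavelet information enters, not the method itself.

```python
import numpy as np
import pywt

def wavelet_prompt(audio, wavelet="db4", level=4):
    """Summarize each wavelet sub-band into a small 'prompt' matrix that
    would be prepended to the SSL front-end's input. WPT learns its prompts;
    these are simple band statistics, for illustration only."""
    coeffs = pywt.wavedec(audio, wavelet, level=level)   # multi-resolution bands
    feats = [(c.mean(), c.std(), np.abs(c).max()) for c in coeffs]
    return np.array(feats)                               # (level + 1, 3)

audio = np.random.randn(16000)        # one second of 16 kHz audio (toy)
print(wavelet_prompt(audio).shape)    # (5, 3)
```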
Updated: 2025-04-09 10:18:45
标题: 检测所有类型的深度伪造音频:小波提示调整以增强听觉感知
摘要: 音频生成技术的快速发展加剧了恶意深度伪造音频在语音、声音、歌声和音乐领域的风险,威胁到多媒体安全与信任。虽然现有的对抗措施在单一类型音频深度伪造检测(ADD)中表现良好,但在跨类型场景中性能下降。本文致力于研究全类型ADD任务。我们是第一个全面建立全类型ADD基准的研究,评估当前对抗措施,包括跨语音、声音、歌声和音乐的深度伪造检测。然后,我们介绍了一种优化SSL前端的提示调整自监督学习(PT-SSL)训练范式,通过学习专门用于ADD的提示标记,需要的可训练参数比微调(FT)少458倍。考虑到不同音频类型的听觉感知,我们提出了波形提示调整(WPT)-SSL方法,从频域捕获不受类型影响的音频深度伪造信息,无需额外的训练参数,从而提高了在全类型ADD任务中的性能。为了实现通用的对抗措施,我们利用所有类型的深度伪造音频进行联合训练。实验结果表明,WPT-XLSR-AASIST取得了最佳性能,所有评估集的平均EER为3.58%。代码可在网上获取。
更新时间: 2025-04-09 10:18:45
领域: cs.SD,cs.AI
Floralens: a Deep Learning Model for the Portuguese Native Flora
Machine-learning techniques, especially deep convolutional neural networks, are pivotal for image-based identification of biological species in many Citizen Science platforms. In this paper, we describe the construction of a dataset for the Portuguese native flora based on publicly available research-grade datasets, and the derivation of a high-accuracy model from it using off-the-shelf deep convolutional neural networks. We anchored the dataset in high-quality data provided by Sociedade Portuguesa de Botânica and added further sampled data from research-grade datasets available from GBIF. We find that with a careful dataset design, off-the-shelf machine-learning cloud services such as Google's AutoML Vision produce accurate models, with results comparable to those of Pl@ntNet, a state-of-the-art citizen science platform. The best model we derived, dubbed Floralens, has been integrated into the public website of Project Biolens, where we gather models for other taxa as well. The dataset used to train the model is also publicly available on Zenodo.
Updated: 2025-04-09 10:12:38
标题: Floralens:用于葡萄牙本土植物的深度学习模型
摘要: 机器学习技术,特别是深度卷积神经网络,在许多公民科学平台中对生物物种的基于图像的识别至关重要。在本文中,我们描述了基于公开可用的研究级数据集构建的葡萄牙本土植物群的数据集,并利用现成的深度卷积神经网络从中得出高准确性模型。我们将数据集锚定在由葡萄牙植物学会提供的高质量数据上,并从GBIF提供的研究级数据集中添加了进一步的抽样数据。我们发现,通过谨慎设计数据集,像谷歌的AutoML Vision这样的现成机器学习云服务可以生成准确的模型,其结果与Pl@ntNet这样的最先进公民科学平台相当。我们得出的最佳模型,被称为Floralens,已经集成到Project Biolens的公共网站中,我们在那里收集其他分类群的模型。用于训练模型的数据集也可以在Zenodo上公开获取。
更新时间: 2025-04-09 10:12:38
领域: cs.CV,cs.LG
Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation
Spiking Neural Networks (SNNs) are highly energy-efficient during inference, making them particularly suitable for deployment on neuromorphic hardware. Their ability to process event-driven inputs, such as data from dynamic vision sensors (DVS), further enhances their applicability to edge computing tasks. However, the resource constraints of edge hardware necessitate techniques like weight quantization, which reduce the memory footprint of SNNs while preserving accuracy. Despite its importance, existing quantization methods typically focus on synaptic weights quantization without taking account of other critical parameters, such as scaling neuron firing thresholds. To address this limitation, we present the first benchmark for the DVS gesture recognition task using SNNs optimized for the many-core neuromorphic chip SpiNNaker2. Our study evaluates two quantization pipelines for fixed-point computations. The first approach employs post training quantization (PTQ) with percentile-based threshold scaling, while the second uses quantization aware training (QAT) with adaptive threshold scaling. Both methods achieve accurate 8-bit on-chip inference, closely approximating 32-bit floating-point performance. Additionally, our baseline SNNs perform competitively against previously reported results without specialized techniques. These models are deployed on SpiNNaker2 using the neuromorphic intermediate representation (NIR). Ultimately, we achieve 94.13% classification accuracy on-chip, demonstrating the SpiNNaker2's potential for efficient, low-energy neuromorphic computing.
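The percentile-based PTQ idea can be sketched in a few lines of NumPy: quantize weights to signed 8-bit integers and rescale the firing threshold from an activation percentile. Layer structure, the SpiNNaker2 toolchain, and the QAT variant are omitted, so this is a sketch of the idea under simplified assumptions.

```python
import numpy as np

def ptq_percentile(weights, activations, pct=99.9, bits=8):
    """Post-training quantization sketch: weights go to signed int8, and the
    neuron firing threshold is rescaled from an activation percentile so the
    fixed-point network behaves like the float one."""
    qmax = 2 ** (bits - 1) - 1
    w_scale = np.max(np.abs(weights)) / qmax
    q_weights = np.clip(np.round(weights / w_scale), -qmax - 1, qmax).astype(np.int8)
    # Percentile-based threshold scaling: robust to activation outliers.
    threshold = np.percentile(activations, pct) / w_scale
    return q_weights, threshold

w = np.random.randn(128, 128).astype(np.float32)
acts = np.abs(np.random.randn(10000)).astype(np.float32)
q_w, thr = ptq_percentile(w, acts)
print(q_w.dtype, round(float(thr), 2))
```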
Updated: 2025-04-09 10:09:29
标题: 在SpiNNaker2上高效部署用于DVS手势识别的脉冲神经网络,使用神经形态中间表示
摘要: 脉冲神经网络(SNNs)在推理过程中非常节能,使它们特别适合部署在神经形态硬件上。它们处理事件驱动输入的能力,例如来自动态视觉传感器(DVS)的数据,进一步增强了它们在边缘计算任务中的适用性。然而,边缘硬件的资源约束需要像权重量化这样的技术,可以减少SNNs的内存占用,同时保持准确性。尽管其重要性,现有的量化方法通常集中在神经突触权重量化上,而忽略了其他关键参数,例如缩放神经元发放阈值。 为了解决这一限制,我们提出了第一个针对DVS手势识别任务的基准测试,使用了为众核神经形态芯片SpiNNaker2优化的SNNs。我们的研究评估了两种用于定点计算的量化流程。第一种方法采用了后训练量化(PTQ)与基于百分位的阈值缩放,而第二种方法使用了量化感知训练(QAT)与自适应阈值缩放。这两种方法都实现了准确的8位片上推理,接近32位浮点性能。此外,我们的基准SNNs在没有专门技术的情况下表现出竞争力。这些模型使用神经形态中间表示(NIR)部署在SpiNNaker2上。最终,我们在芯片上实现了94.13%的分类准确度,展示了SpiNNaker2在高效、低能耗神经形态计算方面的潜力。
更新时间: 2025-04-09 10:09:29
领域: cs.LG
SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
Generative models see increasing use in computer-aided drug design. However, while performing well at capturing distributions of molecular motifs, they often produce synthetically inaccessible molecules. To address this, we introduce SynFlowNet, a GFlowNet model whose action space uses chemical reactions and purchasable reactants to sequentially build new molecules. By incorporating forward synthesis as an explicit constraint of the generative mechanism, we aim at bridging the gap between in silico molecular generation and real world synthesis capabilities. We evaluate our approach using synthetic accessibility scores and an independent retrosynthesis tool to assess the synthesizability of our compounds, and motivate the choice of GFlowNets through considerable improvement in sample diversity compared to baselines. Additionally, we identify challenges with reaction encodings that can complicate traversal of the MDP in the backward direction. To address this, we introduce various strategies for learning the GFlowNet backward policy and thus demonstrate how additional constraints can be integrated into the GFlowNet MDP framework. This approach enables our model to successfully identify synthesis pathways for previously unseen molecules.
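The forward-synthesis action space can be illustrated with RDKit: a reaction template applied to purchasable building blocks yields the next molecule in the trajectory. The SMARTS template and reactants below are generic examples, not drawn from the paper's reaction set.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# One forward-synthesis "action": apply an amide-coupling template to two
# purchasable building blocks (the template and SMILES are illustrative).
rxn = AllChem.ReactionFromSmarts("[C:1](=O)[OH].[N:2]>>[C:1](=O)[N:2]")
acid = Chem.MolFromSmiles("CC(=O)O")        # acetic acid
amine = Chem.MolFromSmiles("NCc1ccccc1")    # benzylamine

products = rxn.RunReactants((acid, amine))
print(Chem.MolToSmiles(products[0][0]))     # CC(=O)NCc1ccccc1, the amide product
```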
Updated: 2025-04-09 10:03:07
标题: SynFlowNet:具有合成约束的多样化和新颖分子设计
摘要: 生成模型在计算机辅助药物设计中的应用越来越广泛。然而,尽管在捕获分子结构分布方面表现出色,它们通常会产生合成无法访问的分子。为了解决这个问题,我们引入了SynFlowNet,这是一个使用化学反应和可购买的反应物作为动作空间的GFlowNet模型,用于依次构建新的分子。通过将前向合成作为生成机制的显式约束,我们旨在弥合在硅分子生成和实际合成能力之间的差距。我们使用合成可访问性评分和独立的逆合成工具来评估我们化合物的可合成性,并通过与基线相比在样本多样性方面的显著改进来激励选择GFlowNets。此外,我们确定了化学反应编码中可能使MDP在反向方向遍历变得复杂的挑战。为了解决这个问题,我们引入了学习GFlowNet反向策略的各种策略,从而展示了如何将附加约束集成到GFlowNet MDP框架中。这种方法使我们的模型能够成功地为以前未见过的分子识别合成途径。
更新时间: 2025-04-09 10:03:07
领域: cs.LG,q-bio.BM
More Efficient Stealth Address Protocol
The integration of privacy-preserving transactions into public blockchains such as Ethereum remains a major challenge. The Stealth Address Protocol (SAP) provides recipient anonymity by generating unlinkable stealth addresses. Existing SAPs, such as the Dual-Key Stealth Address Protocol and the Curvy Protocol, have shown significant improvements in efficiency, but remain vulnerable to quantum attacks. Post-quantum SAPs based on lattice-based cryptography, such as the Module-LWE SAP, on the other hand, offer quantum resistance while achieving better performance. In this paper, we present a novel hybrid SAP that combines the Curvy protocol with the computational advantages of the Module-LWE technique while remaining Ethereum-friendly. In contrast to full post-quantum solutions, our approach does not provide quantum security, but achieves a significant speedup in scanning the ephemeral public key registry, about three times faster than the Curvy protocol. We present a detailed cryptographic construction of our protocol and compare its performance with existing solutions. Our results prove that this hybrid approach is the most efficient Ethereum-compatible SAP to date.
Updated: 2025-04-09 10:01:24
标题: 更高效的隐形地址协议
摘要: 将保护隐私交易集成到以太坊等公共区块链中仍然是一个重大挑战。隐形地址协议(SAP)通过生成不可链接的隐形地址提供接收者匿名性。现有的SAP,如双密钥隐形地址协议和曲线协议,已经在效率方面取得了显著改进,但仍然容易受到量子攻击。另一方面,基于格基密码学的后量子SAP,如模块-LWE SAP,提供了量子抵抗力同时实现更好的性能。 在本文中,我们提出了一种新颖的混合SAP,将曲线协议与模块-LWE技术的计算优势相结合,同时保持与以太坊的兼容性。与完全的后量子解决方案相比,我们的方法虽然没有提供量子安全性,但在扫描短暂公钥注册表方面实现了显著的加速,大约比曲线协议快三倍。我们详细介绍了我们协议的加密构造,并将其性能与现有解决方案进行了比较。我们的结果证明了这种混合方法是迄今为止最有效的以太坊兼容SAP。
更新时间: 2025-04-09 10:01:24
领域: cs.CR
EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture
In this paper, we propose EDIT (Encoder-Decoder Image Transformer), a novel architecture designed to mitigate the attention sink phenomenon observed in Vision Transformer models. Attention sink occurs when an excessive amount of attention is allocated to the [CLS] token, distorting the model's ability to effectively process image patches. To address this, we introduce a layer-aligned encoder-decoder architecture, where the encoder utilizes self-attention to process image patches, while the decoder uses cross-attention to focus on the [CLS] token. Unlike the traditional encoder-decoder framework, where the decoder depends solely on high-level encoder representations, EDIT allows the decoder to extract information starting from low-level features, progressively refining the representation layer by layer. EDIT is naturally interpretable, as demonstrated through sequential attention maps that illustrate its refined, layer-by-layer focus on key image features. Experiments on ImageNet-1k and ImageNet-21k, along with transfer learning tasks, show that EDIT achieves consistent performance improvements over DeiT3 models. These results highlight the effectiveness of EDIT's design in addressing attention sink and improving visual feature extraction.
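A minimal sketch of the layer-aligned design, assuming standard multi-head attention modules (not the authors' exact implementation): patches update themselves via self-attention, and the [CLS] token is refined by cross-attending into the patches at every layer.

```python
import torch
import torch.nn as nn

class EDITStyleLayer(nn.Module):
    """One aligned encoder-decoder pair: self-attention over patch tokens,
    then cross-attention from the [CLS] query into the patches."""
    def __init__(self, dim=384, heads=6):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patches, cls):
        patches = patches + self.self_attn(patches, patches, patches)[0]
        cls = cls + self.cross_attn(cls, patches, patches)[0]  # [CLS] reads patches only
        return patches, cls

layers = nn.ModuleList([EDITStyleLayer() for _ in range(4)])
patches, cls = torch.randn(2, 196, 384), torch.randn(2, 1, 384)
for layer in layers:   # [CLS] is refined progressively, from low- to high-level features
    patches, cls = layer(patches, cls)
print(cls.shape)       # torch.Size([2, 1, 384])
```

Because the [CLS] token never participates in the patch self-attention, it cannot become a sink that distorts patch-to-patch attention.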
Updated: 2025-04-09 09:51:41
标题: 修订:通过编码器-解码器架构减轻注意力漏洞,增强视觉变换器
摘要: 在本文中,我们提出了EDIT(Encoder-Decoder Image Transformer),这是一种新颖的架构,旨在减轻在Vision Transformer模型中观察到的注意力消失现象。当过多的注意力被分配给[CLS]标记时,就会发生注意力消失,这会扭曲模型有效处理图像补丁的能力。为了解决这个问题,我们引入了一种层对齐的编码器-解码器架构,其中编码器利用自注意力处理图像补丁,而解码器使用交叉注意力集中在[CLS]标记上。与传统的编码器-解码器框架不同,在那里解码器仅依赖于高级编码器表示,EDIT允许解码器从低级特征开始提取信息,逐层逐层地完善表示。通过顺序注意力图表明EDIT是自然可解释的,显示了对关键图像特征的逐层精炼关注。在ImageNet-1k和ImageNet-21k上的实验以及迁移学习任务显示,EDIT相对于DeiT3模型实现了一致的性能改进。这些结果突显了EDIT设计在解决注意力消失和改善视觉特征提取方面的有效性。
更新时间: 2025-04-09 09:51:41
领域: cs.CV,cs.AI
CroissantLLM: A Truly Bilingual French-English Language Model
We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware. To that end, we pioneer the approach of training an intrinsically bilingual model with a 1:1 English-to-French pretraining data ratio, a custom tokenizer, and bilingual finetuning datasets. We release the training dataset, notably containing a French split with manually curated, high-quality, and varied data sources. To assess performance outside of English, we craft a novel benchmark, FrenchBench, consisting of an array of classification and generation tasks, covering various orthogonal aspects of model performance in the French Language. Additionally, rooted in transparency and to foster further Large Language Model research, we release codebases, and dozens of checkpoints across various model sizes, training data distributions, and training steps, as well as fine-tuned Chat models, and strong translation models. We evaluate our model through the FMTI framework, and validate 81 % of the transparency criteria, far beyond the scores of even most open initiatives. This work enriches the NLP landscape, breaking away from previous English-centric work in order to strengthen our understanding of multilinguality in language models.
Updated: 2025-04-09 09:45:01
标题: CroissantLLM:一个真正的双语法语-英语语言模型
摘要: 我们介绍了CroissantLLM,这是一个预训练的语言模型,使用了3T的英语和法语令牌集,为研究和工业界带来了一个高性能、完全开源的双语模型,可以在消费者级别的本地硬件上快速运行。为此,我们开创了训练一个本质上双语模型的方法,其英语到法语的预训练数据比例为1:1,采用自定义分词器和双语微调数据集。我们发布了训练数据集,特别包括一个法语分割,其中包含经过手工策划的高质量和多样化的数据源。为了评估在英语以外的表现,我们创建了一个新的基准FrenchBench,包括一系列分类和生成任务,涵盖法语语言模型性能的各个方面。此外,基于透明度和促进更多大型语言模型研究,我们发布了代码库,以及各种模型大小、训练数据分布和训练步骤的数十个检查点,以及经过微调的Chat模型和强大的翻译模型。我们通过FMTI框架评估我们的模型,并验证了81%的透明度标准,远远超过大多数开放倡议的分数。这项工作丰富了自然语言处理领域,摆脱了以前以英语为中心的工作,以加强我们对语言模型中多语言性的理解。
更新时间: 2025-04-09 09:45:01
领域: cs.CL,cs.LG
The Writing is on the Wall: Analyzing the Boom of Inscriptions and its Impact on EVM-compatible Blockchains
This paper examines inscription-related transactions on Ethereum and major EVM-compatible rollups, assessing their impact on scalability during transaction surges. Our results show that, on certain days, inscriptions accounted for nearly 90% of transactions on Arbitrum and ZKsync Era, while 53% on Ethereum, with 99% of these inscriptions involving meme coin minting. Furthermore, we show that ZKsync and Arbitrum saw lower median gas fees during these surges. ZKsync Era, a ZK-rollup, showed a greater fee reduction than the optimistic rollups studied -- Arbitrum, Base, and Optimism.
Updated: 2025-04-09 09:43:04
标题: 墙上的文字:分析铭文繁荣及其对与EVM兼容的区块链的影响
摘要: 本文研究了以太坊和主要与EVM兼容的扩容技术上的铭文相关交易,评估它们在交易激增期间对可扩展性的影响。我们的研究结果显示,在某些日子,铭文在Arbitrum和ZKsync Era上占交易的近90%,而在以太坊上占53%,其中99%的铭文涉及模因币的铸造。此外,我们发现在这些激增期间,ZKsync和Arbitrum的中位气体费用较低。ZKsync Era,一种ZK-rollup,显示出比研究的乐观扩容技术Arbitrum、Base和Optimism更大的费用降低。
更新时间: 2025-04-09 09:43:04
领域: cs.CR
Dynamic Relative Representations for Goal-Oriented Semantic Communications
In future 6G wireless networks, semantic and effectiveness aspects of communications will play a fundamental role, incorporating meaning and relevance into transmissions. However, obstacles arise when devices employ diverse languages, logic, or internal representations, leading to semantic mismatches that might jeopardize understanding. In latent space communication, this challenge manifests as misalignment within high-dimensional representations where deep neural networks encode data. This paper presents a novel framework for goal-oriented semantic communication, leveraging relative representations to mitigate semantic mismatches via latent space alignment. We propose a dynamic optimization strategy that adapts relative representations, communication parameters, and computation resources for energy-efficient, low-latency, goal-oriented semantic communications. Numerical results demonstrate our methodology's effectiveness in mitigating mismatches among devices, while optimizing energy consumption, delay, and effectiveness.
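Relative representations themselves are easy to sketch: re-express each latent vector by its cosine similarities to a shared set of anchors, which makes spaces produced by different encoders comparable. Dimensions and data below are placeholders.

```python
import numpy as np

def relative_representation(z, anchors):
    """Re-express latent vectors as cosine similarities to a shared set of
    anchor samples; two devices with different encoders but the same anchor
    data land in a comparable space."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return z @ a.T   # (n_samples x n_anchors)

rng = np.random.default_rng(0)
latents_tx = rng.normal(size=(32, 64))   # transmitter's latent vectors
anchors_tx = rng.normal(size=(10, 64))   # encodings of agreed-upon anchor data
rel = relative_representation(latents_tx, anchors_tx)
print(rel.shape)   # (32, 10) -- the space the receiver can align to
```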
Updated: 2025-04-09 09:41:40
标题: 目标导向语义通信的动态相关性表示
摘要: 在未来的6G无线网络中,通信的语义和有效性方面将发挥基础作用,将含义和相关性整合到传输中。然而,当设备采用不同的语言、逻辑或内部表示时,会出现障碍,导致语义不匹配,可能危及理解。在潜在空间通信中,这一挑战表现为在深度神经网络编码数据的高维表示中的不对齐。本文提出了一个面向目标的语义通信的新框架,利用相对表示来通过潜在空间对齐来减轻语义不匹配。我们提出了一种动态优化策略,可以调整相对表示、通信参数和计算资源,实现能效高、低延迟、面向目标的语义通信。数值结果表明,我们的方法在减轻设备之间的不匹配的同时,优化了能耗、延迟和有效性。
更新时间: 2025-04-09 09:41:40
领域: cs.NI,cs.IT,cs.LG,math.IT
PETNet -- Coincident Particle Event Detection using Spiking Neural Networks
Spiking neural networks (SNN) hold the promise of being a more biologically plausible, low-energy alternative to conventional artificial neural networks. Their time-variant nature makes them particularly suitable for processing time-resolved, sparse binary data. In this paper, we investigate the potential of leveraging SNNs for the detection of photon coincidences in positron emission tomography (PET) data. PET is a medical imaging technique based on injecting a patient with a radioactive tracer and detecting the emitted photons. One central post-processing task for inferring an image of the tracer distribution is the filtering of invalid hits occurring due to e.g. absorption or scattering processes. Our approach, coined PETNet, interprets the detector hits as a binary-valued spike train and learns to identify photon coincidence pairs in a supervised manner. We introduce a dedicated multi-objective loss function and demonstrate the effects of explicitly modeling the detector geometry on simulation data for two use-cases. Our results show that PETNet can outperform the state-of-the-art classical algorithm with a maximal coincidence detection $F_1$ of 95.2%. At the same time, PETNet is able to predict photon coincidences up to 36 times faster than the classical approach, highlighting the great potential of SNNs in particle physics applications.
Updated: 2025-04-09 09:38:45
标题: PETNet -- 使用脉冲神经网络进行巧合粒子事件检测
摘要: 尖峰神经网络(SNN)有望成为一种更具生物学可信性、低能耗的替代传统人工神经网络的选择。它们的时变特性使其特别适合处理时间分辨率、稀疏二进制数据。本文研究了利用SNN检测正电子发射断层扫描(PET)数据中光子符合事件的潜力。PET是一种基于给患者注射放射性示踪剂并检测发射的光子的医学成像技术。推断示踪剂分布图像的一个中心后处理任务是过滤由于吸收或散射过程等原因而发生的无效击中。我们的方法,命名为PETNet,将探测器击中解释为二进制值尖峰列,并学会以监督方式识别光子符合对。我们引入了一个专门的多目标损失函数,并在两种用例的模拟数据上展示了明确建模探测器几何形状的效果。我们的结果显示,PETNet可以在最大符合检测$F_1$为95.2%的情况下优于最先进的经典算法。同时,PETNet能够比经典方法快36倍地预测光子符合事件,突显了SNN在粒子物理应用中的巨大潜力。
更新时间: 2025-04-09 09:38:45
领域: cs.LG,hep-ex
Plastic tensor networks for interpretable generative modeling
A structural optimization scheme for a single-layer nonnegative adaptive tensor tree (NATT) that models a target probability distribution is proposed. The NATT scheme, by construction, has the advantage that it is interpretable as a probabilistic graphical model. We consider the NATT scheme and a recently proposed Born machine adaptive tensor tree (BMATT) optimization scheme and demonstrate their effectiveness on a variety of generative modeling tasks where the objective is to infer the hidden structure of a provided dataset. Our results show that in terms of minimizing the negative log-likelihood, the single-layer scheme has model performance comparable to the Born machine scheme, though not better. The tasks include deducing the structure of binary bitwise operations, learning the internal structure of random Bayesian networks given only visible sites, and a real-world example related to hierarchical clustering where a cladogram is constructed from mitochondrial DNA sequences. In doing so, we also show the importance of the choice of network topology and the versatility of a least-mutual information criterion in selecting a candidate structure for a tensor tree, as well as discuss aspects of these tensor tree generative models including their information content and interpretability.
Updated: 2025-04-09 09:23:11
标题: 可解释生成建模的塑料张量网络
摘要: 提出了一种针对单层非负自适应张量树(NATT)的结构优化方案,该方案模拟了目标概率分布。NATT方案在构建过程中具有解释为概率图模型的优势。我们考虑了NATT方案和最近提出的Born机自适应张量树(BMATT)优化方案,并展示了它们在各种生成建模任务中的有效性,其中目标是推断所提供数据集的隐藏结构。我们的结果表明,在最小化负对数似然的角度来看,单层方案的模型性能与Born机方案相当,尽管并非更好。这些任务包括推断二进制位运算的结构,仅根据可见位置学习随机贝叶斯网络的内部结构,以及与线粒体DNA序列相关的层次聚类的真实示例中构造类群图。在此过程中,我们还展示了网络拓扑选择的重要性以及使用最小互信息准则选择张量树候选结构的多功能性,同时讨论了这些张量树生成模型的信息内容和可解释性。
更新时间: 2025-04-09 09:23:11
领域: cs.LG,cond-mat.stat-mech
Learning global control of underactuated systems with Model-Based Reinforcement Learning
This short paper describes our proposed solution for the third edition of the "AI Olympics with RealAIGym" competition, held at ICRA 2025. We employed Monte-Carlo Probabilistic Inference for Learning Control (MC-PILCO), an MBRL algorithm recognized for its exceptional data efficiency across various low-dimensional robotic tasks, including cart-pole, ball & plate, and Furuta pendulum systems. MC-PILCO optimizes a system dynamics model using interaction data, enabling policy refinement through simulation rather than direct system data optimization. This approach has proven highly effective in physical systems, offering greater data efficiency than Model-Free (MF) alternatives. Notably, MC-PILCO has previously won the first two editions of this competition, demonstrating its robustness in both simulated and real-world environments. Besides briefly reviewing the algorithm, we discuss the most critical aspects of the MC-PILCO implementation in the tasks at hand: learning a global policy for the pendubot and acrobot systems.
Updated: 2025-04-09 09:20:37
标题: 学习基于模型的强化学习的欠驱动系统全局控制
摘要: 本短文描述了我们提出的解决方案,用于在2025年ICRA举办的“AI奥林匹克与RealAIGym”竞赛的第三版。我们采用了蒙特卡洛概率推断学习控制(MC-PILCO)算法,这是一种在各种低维度机器人任务中具有出色数据效率的MBRL算法,包括倒立摆、球盘和Furuta摆系统。MC-PILCO通过交互数据优化系统动态模型,通过模拟而非直接系统数据优化来实现策略的改进。这种方法在物理系统中证明非常有效,比无模型(MF)替代方案具有更高的数据效率。值得注意的是,MC-PILCO先前赢得了这个竞赛的前两个版本,证明了它在模拟和真实环境中的稳健性。除了简要回顾算法外,我们还讨论了MC-PILCO在手头任务中实现的最关键方面:学习倒立摆和双摆系统的全局策略。
更新时间: 2025-04-09 09:20:37
领域: cs.RO,cs.AI,cs.LG
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Self-supervised learning has transformed 2D computer vision by enabling models trained on large, unannotated datasets to provide versatile off-the-shelf features that perform similarly to models trained with labels. However, in 3D scene understanding, self-supervised methods are typically only used as a weight initialization step for task-specific fine-tuning, limiting their utility for general-purpose feature extraction. This paper addresses this shortcoming by proposing a robust evaluation protocol specifically designed to assess the quality of self-supervised features for 3D scene understanding. Our protocol uses multi-resolution feature sampling of hierarchical models to create rich point-level representations that capture the semantic capabilities of the model and, hence, are suitable for evaluation with linear probing and nearest-neighbor methods. Furthermore, we introduce the first self-supervised model that performs similarly to supervised models when only off-the-shelf features are used in a linear probing setup. In particular, our model is trained natively in 3D with a novel self-supervised approach based on a Masked Scene Modeling objective, which reconstructs deep features of masked patches in a bottom-up manner and is specifically tailored to hierarchical 3D models. Our experiments not only demonstrate that our method achieves competitive performance to supervised models, but also surpasses existing self-supervised approaches by a large margin. The model and training code can be found at our Github repository (https://github.com/phermosilla/msm).
Updated: 2025-04-09 09:19:49
标题: 遮罩场景建模:缩小监督学习和自监督学习在3D场景理解中的差距
摘要: 自监督学习已经通过使模型在大型未标记数据集上训练,提供多功能的即插即用特征,从而在2D计算机视觉中发挥了作用,其性能类似于使用标签训练的模型。然而,在3D场景理解中,自监督方法通常只用作任务特定微调的权重初始化步骤,限制了它们用于通用特征提取的效用。本文通过提出一个专门设计用于评估自监督特征在3D场景理解中质量的健壮评估协议来解决这一不足。我们的协议使用层次模型的多分辨率特征采样来创建丰富的点级表示,捕获模型的语义能力,因此适合使用线性探测和最近邻方法进行评估。此外,我们介绍了第一个自监督模型,当仅在线性探测设置中使用即插即用特征时,其性能与监督模型相似。特别地,我们的模型是在3D中原生训练的,采用基于遮罩场景建模目标的新型自监督方法,该目标以自下而上的方式重建遮罩补丁的深层特征,并专门为层次化3D模型设计。我们的实验证明,我们的方法不仅达到了与监督模型相竞争的性能,而且在很大程度上超过了现有的自监督方法。模型和训练代码可以在我们的Github存储库中找到(https://github.com/phermosilla/msm)。
更新时间: 2025-04-09 09:19:49
领域: cs.CV,cs.AI
Large-Scale (Semi-)Automated Security Assessment of Consumer IoT Devices -- A Roadmap
The Internet of Things (IoT) has rapidly expanded across various sectors, with consumer IoT devices - such as smart thermostats and security cameras - experiencing growth. Although these devices improve efficiency and promise additional comfort, they also introduce new security challenges. Common and easy-to-explore vulnerabilities make IoT devices prime targets for malicious actors. Upcoming mandatory security certifications offer a promising way to mitigate these risks by enforcing best practices and providing transparency. Regulatory bodies are developing IoT security frameworks, but a universal standard for large-scale systematic security assessment is lacking. Existing manual testing approaches are expensive, limiting their efficacy in the diverse and rapidly evolving IoT domain. This paper reviews current IoT security challenges and assessment efforts, identifies gaps, and proposes a roadmap for scalable, automated security assessment, leveraging a model-based testing approach and machine learning techniques to strengthen consumer IoT security.
Updated: 2025-04-09 09:15:04
标题: 消费者物联网设备的大规模(半)自动化安全评估——一条路线图
摘要: 物联网(IoT)已迅速在各个领域扩展,消费者物联网设备-如智能恒温器和安全摄像头-经历增长。虽然这些设备提高了效率并承诺额外的舒适性,但它们也带来了新的安全挑战。常见且易于探测的漏洞使物联网设备成为恶意行为者的首要目标。 即将出台的强制性安全认证提供了一种有望减轻这些风险的方式,通过执行最佳实践并提供透明度。监管机构正在制定物联网安全框架,但缺乏大规模系统安全评估的通用标准。现有的手动测试方法昂贵,限制了它们在多样化和快速演变的物联网领域的有效性。 本文审查了当前物联网安全挑战和评估工作,确定了差距,并提出了一个可扩展的,自动化的安全评估路线图,利用基于模型的测试方法和机器学习技术来加强消费者物联网安全性。
更新时间: 2025-04-09 09:15:04
领域: cs.CR
Clustering and novel class recognition: evaluating bioacoustic deep learning feature extractors
In computational bioacoustics, deep learning models are composed of feature extractors and classifiers. The feature extractors generate vector representations of the input sound segments, called embeddings, which can be input to a classifier. While benchmarking of classification scores provides insights into specific performance statistics, it is limited to species that are included in the models' training data. Furthermore, it makes it impossible to compare models trained on very different taxonomic groups. This paper aims to address this gap by analyzing the embeddings generated by the feature extractors of 15 bioacoustic models spanning a wide range of setups (model architectures, training data, training paradigms). We evaluated and compared different ways in which models structure embedding spaces through clustering and kNN classification, which allows us to focus our comparison on feature extractors independent of their classifiers. We believe that this approach lets us evaluate the adaptability and generalization potential of models going beyond the classes they were trained on.
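The evaluation protocol can be approximated with scikit-learn: cluster the embeddings and score agreement with species labels, and probe them with a kNN classifier. Random arrays stand in for real extractor embeddings below.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-ins for embeddings from a bioacoustic feature extractor.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 128))      # embeddings
y = rng.integers(0, 5, size=300)     # species labels (possibly unseen classes)

# Clustering view: do embeddings group by species without supervision?
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print("ARI:", adjusted_rand_score(y, clusters))

# kNN view: can labels be predicted from nearest neighbours in embedding space?
print("kNN acc:", cross_val_score(KNeighborsClassifier(5), X, y, cv=5).mean())
```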
Updated: 2025-04-09 09:13:18
标题: 聚类和新类别识别:评估生物声学深度学习特征提取器
摘要: 在计算生物声学中,深度学习模型由特征提取器和分类器组成。特征提取器生成输入声音片段的向量表示,称为嵌入,可以输入到分类器中。虽然分类得分的基准测试提供了特定性能统计数据,但仅限于模型训练数据中包含的物种。此外,它使得无法比较在非常不同分类群上训练的模型。本文旨在通过分析由涵盖各种设置(模型架构、训练数据、训练范式)的15个生物声学模型的特征提取器生成的嵌入来填补这一差距。我们通过聚类和kNN分类评估和比较模型结构嵌入空间的不同方式,这使我们能够将比较重点放在特征提取器上,而不受其分类器的影响。我们相信,这种方法让我们评估模型的适应性和泛化潜力,超越了它们训练的类别。
更新时间: 2025-04-09 09:13:18
领域: cs.LG
MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation
Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG features a dual-system architecture. First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answers, providing useful clues for the retrieval tools to locate relevant information within the long context. Second, it leverages an expensive but expressive system, which generates the final answer based on the retrieved information. Building upon this fundamental framework, we realize the memory module in the form of KV compression and reinforce its memorization and clue-generation capacity using feedback on generation quality (RLGF). In our experiments, MemoRAG achieves superior performance across a variety of long-context evaluation tasks, not only in complex scenarios where traditional RAG methods struggle, but also in simpler ones where RAG is typically applied.
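A highly simplified, self-contained sketch of the dual-system flow follows. The stub functions are hypothetical stand-ins for the light memory model, the retrieval tool, and the expressive generator; they do not reflect any released MemoRAG code.

```python
def build_global_memory(long_context: str) -> str:
    """Stand-in for the light, long-range system's KV-compressed memory."""
    return long_context[:500]   # placeholder "compression"

def draft_clues(memory: str, task: str) -> list[str]:
    """Stand-in for the light LM drafting answers that act as retrieval clues."""
    return [w for w in task.split() if len(w) > 3]

def retrieve(context: str, queries: list[str], width: int = 80) -> list[str]:
    """Locate passages around clue hits in the long context."""
    hits = []
    for q in queries:
        i = context.find(q)
        if i >= 0:
            hits.append(context[max(0, i - width): i + width])
    return hits

def memorag_answer(context: str, task: str) -> str:
    memory = build_global_memory(context)
    clues = draft_clues(memory, task)   # the draft answer supplies the queries
    evidence = retrieve(context, clues)
    # The expensive, expressive LM would generate the final answer from
    # `evidence`; here we simply return the gathered passages.
    return " ... ".join(evidence)

print(memorag_answer("The treaty was signed in Vienna in 1815. " * 3,
                     "Where was the treaty signed?"))
```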
Updated: 2025-04-09 09:09:37
标题: MemoRAG:通过全局记忆增强检索增强提升长上下文处理
摘要: 处理长上下文对于大型语言模型(LLMs)来说是一个重大挑战。虽然最近的进展使LLMs能够处理比以前更长的上下文(例如32K或128K个标记),但这需要大量计算资源,对许多应用仍然不足够。检索增强生成(RAG)被认为是解决这个问题的一种有希望的策略。然而,传统的RAG方法面临固有的限制,因为需要两个基本要求:1)明确陈述的查询,2)结构良好的知识。然而,这些条件通常在一般长上下文处理任务中并不成立。 在这项工作中,我们提出了MemoRAG,一种由全局记忆增强检索的创新RAG框架。MemoRAG具有双系统架构。首先,它采用一种轻量但长距离系统来创建长上下文的全局记忆。一旦任务呈现,它生成初步答案,为检索工具提供有用的线索,以在长上下文中定位相关信息。其次,它利用一种昂贵但表达能力强的系统,基于检索到的信息生成最终答案。在这个基本框架的基础上,我们将记忆模块实现为KV压缩形式,并通过生成质量的反馈(即RLGF)增强其记忆和提示能力。在我们的实验中,MemoRAG在各种长上下文评估任务中表现出优越的性能,不仅是在传统RAG方法困难的复杂场景中,也包括RAG通常应用的简单场景。
更新时间: 2025-04-09 09:09:37
领域: cs.CL,cs.AI
GWQ: Gradient-Aware Weight Quantization for Large Language Models
Large language models (LLMs) show impressive performance in solving complex language tasks. However, their large number of parameters presents significant challenges for deployment, so compressing LLMs to low bit-widths can enable deployment on resource-constrained devices. To address this problem, we propose gradient-aware weight quantization (GWQ), the first low-bit weight quantization approach that leverages gradients to localize outliers, requiring only a minimal amount of calibration data for outlier detection. GWQ preferentially retains the top 1% of outliers at FP16 precision, while the remaining non-outlier weights are stored at a low bit-width. We evaluate GWQ widely on tasks including language modeling, grounding detection, massive multitask language understanding, and vision-language question answering. Results show that models quantized by GWQ perform better than those produced by other quantization methods. During the quantization process, GWQ needs only one calibration set to realize effective quantization. GWQ also achieves a 1.2x inference speedup compared to the original model and effectively reduces inference memory.
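The outlier-localization idea can be sketched as follows: rank weights by the magnitude of their calibration gradients, keep the top 1% in FP16, and round the rest onto a low-bit grid. The bit-width, grouping, and exact quantizer below are assumptions, not the paper's specification.

```python
import torch

def gwq_quantize(weight, grad, keep_frac=0.01, bits=4):
    """Gradient-aware sketch: keep the top `keep_frac` of weights (ranked by
    calibration-gradient magnitude) at FP16; round the rest to a low-bit grid."""
    k = max(1, int(keep_frac * weight.numel()))
    idx = grad.abs().flatten().topk(k).indices   # gradient-localized outliers
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max() / qmax
    q = (weight / scale).round().clamp(-qmax - 1, qmax) * scale  # dequantized low-bit
    out = q.flatten()
    out[idx] = weight.flatten()[idx].half().float()   # outliers stay at FP16
    return out.view_as(weight)

w = torch.randn(256, 256)
g = torch.randn(256, 256)   # gradient from a small calibration batch
w_q = gwq_quantize(w, g)
print((w - w_q).abs().mean())   # overall quantization error
```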
Updated: 2025-04-09 09:09:11
标题: GWQ: 大型语言模型的梯度感知权重量化
摘要: 大型语言模型(LLMs)在解决复杂语言任务方面表现出色。然而,其庞大的参数数量对部署提出了重大挑战。因此,将LLMs压缩为低位可以使其能够部署在资源受限的设备上。为了解决这个问题,我们提出了梯度感知权重量化(GWQ),这是第一个针对低位权重量化的量化方法,利用梯度来定位异常值,仅需要最少量的校准数据来进行异常值检测。GWQ优先保留FP16精度下的前1\%异常值,而其余非异常值权重存储在低位中。我们在不同任务中广泛评估了GWQ,包括语言建模、地面检测、大规模多任务语言理解和视觉-语言问答。结果表明,通过GWQ量化的模型表现优于其他量化方法。在量化过程中,GWQ只需要一个校准集来实现有效量化。此外,与原始模型相比,GWQ实现了1.2倍的推理加速,并有效减少了推理内存的使用。
更新时间: 2025-04-09 09:09:11
领域: cs.LG,cs.AI,cs.CL
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Transformers have driven remarkable breakthroughs in natural language processing and computer vision, yet their standard attention mechanism still imposes O(N^2) complexity, hindering scalability to longer sequences. We introduce Circular-convolutional ATtention (CAT), a Fourier-based approach that efficiently applies circular convolutions to reduce complexity without sacrificing representational power. CAT achieves O(N log N) computations, requires fewer learnable parameters by streamlining fully-connected layers, and introduces no heavier operations, resulting in consistent accuracy improvements and about a 10% speedup in naive PyTorch implementations on large-scale benchmarks such as ImageNet-1k and WikiText-103. Grounded in an engineering-isomorphism framework, CAT's design not only offers practical efficiency and ease of implementation but also provides insights to guide the development of next-generation, high-performance Transformer architectures. Finally, our ablation studies highlight the key conditions underlying CAT's success, shedding light on broader principles for scalable attention mechanisms.
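The core O(N log N) operation is a circular convolution along the token axis, computed through the FFT. The sketch below shows only this mixing step, not the full CAT block or its parameter layout.

```python
import torch

def circular_conv_mixing(x, kernel):
    """O(N log N) token mixing via circular convolution: a pointwise product
    in the Fourier domain replaces the O(N^2) attention matrix (a sketch of
    the core operation only)."""
    Xf = torch.fft.rfft(x, dim=1)                     # (B, N//2+1, D)
    Kf = torch.fft.rfft(kernel, dim=0).unsqueeze(0)   # learned mixing kernel
    return torch.fft.irfft(Xf * Kf, n=x.shape[1], dim=1)

B, N, D = 2, 1024, 64
x = torch.randn(B, N, D)
kernel = torch.randn(N, D) / N ** 0.5   # one circular filter per channel
y = circular_conv_mixing(x, kernel)
print(y.shape)   # torch.Size([2, 1024, 64])
```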
Updated: 2025-04-09 09:08:26
标题: CAT: 循环卷积注意力用于次二次方变换器
摘要: 变压器在自然语言处理和计算机视觉方面取得了显著的突破,然而它们的标准注意力机制仍然具有O(N^2)的复杂度,限制了对更长序列的可扩展性。我们引入了基于傅立叶变换的循环卷积注意力(CAT)方法,该方法有效地应用循环卷积以降低复杂度,而不损失表征能力。CAT实现了O(NlogN)的计算量,通过简化全连接层需要更少的可学习参数,并且不引入更重的操作,在大规模基准测试如ImageNet-1k和WikiText-103上,Naive PyTorch实现中的准确性持续提高约10%的速度。基于工程同构框架,CAT的设计不仅提供了实际的效率和易于实现,还提供了指导下一代高性能变压器架构开发的见解。最后,我们的消融研究突出了CAT成功背后的关键条件,为可扩展注意力机制的更广泛原则提供了启示。
更新时间: 2025-04-09 09:08:26
领域: cs.LG,cs.CL,cs.CV
Benchmarking Convolutional Neural Network and Graph Neural Network based Surrogate Models on a Real-World Car External Aerodynamics Dataset
Aerodynamic optimization is crucial for developing eco-friendly, aerodynamic, and stylish cars, which requires close collaboration between aerodynamicists and stylists, a collaboration impaired by the time-consuming nature of aerodynamic simulations. Surrogate models offer a viable solution to reduce this overhead, but they are untested in real-world aerodynamic datasets. We present a comparative evaluation of two surrogate modeling approaches for predicting drag on a real-world dataset: a Convolutional Neural Network (CNN) model that uses a signed distance field as input and a commercial tool based on Graph Neural Networks (GNN) that directly processes a surface mesh. In contrast to previous studies based on datasets created from parameterized geometries, our dataset comprises 343 geometries derived from 32 baseline vehicle geometries across five distinct car projects, reflecting the diverse, free-form modifications encountered in the typical vehicle development process. Our results show that the CNN-based method achieves a mean absolute error of 2.3 drag counts, while the GNN-based method achieves 3.8. Both methods achieve approximately 77% accuracy in predicting the direction of drag change relative to the baseline geometry. While both methods effectively capture the broader trends between baseline groups (set of samples derived from a single baseline geometry), they struggle to varying extents in capturing the finer intra-baseline group variations. In summary, our findings suggest that aerodynamicists can effectively use both methods to predict drag in under two minutes, which is at least 600 times faster than performing a simulation. However, there remains room for improvement in capturing the finer details of the geometry.
Updated: 2025-04-09 09:04:59
标题: 基于真实汽车外部空气动力学数据集的卷积神经网络和图神经网络的代理模型的基准测试
摘要: 空气动力学优化对于开发环保、具有良好空气动力特性和时尚外观的汽车至关重要,这需要空气动力学家和设计师之间密切合作,而空气动力学模拟耗时较长,影响了他们之间的合作。代理模型提供了减少这种额外开销的可行解决方案,但在真实世界的空气动力学数据集中尚未经过测试。本文对两种用于预测真实数据集上阻力的代理建模方法进行了比较评估:一种利用有符号距离场作为输入的卷积神经网络(CNN)模型,以及一种基于图神经网络(GNN)的商业工具,直接处理表面网格。与先前基于参数化几何创建的数据集的研究不同,我们的数据集包括从五个不同的汽车项目中导出的32个基准车几何形状衍生出的343个几何形状,反映了典型车辆开发过程中遇到的多样化、自由形式的修改。我们的结果显示,基于CNN的方法实现了2.3个阻力计的平均绝对误差,而基于GNN的方法实现了3.8。两种方法在预测相对于基准几何形状的阻力变化方向上的准确率约为77%。虽然两种方法有效捕捉了基准组之间的更广泛趋势(从单个基准几何形状导出的样本集),但它们在捕捉更精细的基准组内变化方面存在不同程度的困难。总之,我们的发现表明,空气动力学家可以有效地使用这两种方法在不到两分钟内预测阻力,这至少比进行模拟快600倍。然而,在捕捉几何细节方面仍有改进的空间。
更新时间: 2025-04-09 09:04:59
领域: cs.LG
Compound Fault Diagnosis for Train Transmission Systems Using Deep Learning with Fourier-enhanced Representation
Fault diagnosis prevents train disruptions by ensuring the stability and reliability of train transmission systems. Data-driven fault diagnosis models have several advantages over traditional methods in terms of dealing with non-linearity, adaptability, scalability, and automation. However, owing to the limitations of existing datasets, current data-driven models are trained on separate transmission components and only consider single faults. These models perform worse in scenarios where components operate simultaneously and interact, since each component's vibration signals are affected by the others. To address some of these challenges, we propose a frequency-domain representation and a 1-dimensional convolutional neural network for compound fault diagnosis and apply it to the PHM Beijing 2024 dataset, which includes 21 sensor channels, 17 single faults, and 42 compound faults from 4 interacting components, namely the motor, gearbox, left axle box, and right axle box. Our proposed model achieved 97.67% and 93.93% accuracies on the test set with 17 single faults and on the test set with 42 compound faults, respectively.
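A minimal version of the pipeline, with invented shapes and preprocessing: take the log-magnitude spectrum of each vibration channel and feed it to a compact 1-D CNN whose output covers all 59 fault classes (17 single plus 42 compound, one simple formulation of the task).

```python
import torch
import torch.nn as nn

def fourier_repr(signals):
    """Log-magnitude spectrum of each vibration channel (a stand-in for the
    paper's Fourier-enhanced representation; exact preprocessing may differ)."""
    spec = torch.fft.rfft(signals, dim=-1).abs()
    return torch.log1p(spec)

model = nn.Sequential(   # compact 1D CNN over the spectra
    nn.Conv1d(21, 64, kernel_size=7, padding=3), nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=7, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(64, 59),   # 17 single + 42 compound fault classes
)

x = torch.randn(8, 21, 4096)   # batch of 21-channel vibration windows
logits = model(fourier_repr(x))
print(logits.shape)   # torch.Size([8, 59])
```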
Updated: 2025-04-09 09:01:18
标题: 使用傅立叶增强表示的深度学习进行火车传动系统的复合故障诊断
摘要: 故障诊断通过确保火车传输系统的稳定性和可靠性来防止列车故障。数据驱动的故障诊断模型在处理非线性、适应性、可扩展性和自动化方面比传统方法具有几个优势。然而,现有的数据驱动模型是针对单独的传输组件进行训练的,仅考虑单一故障,因为现有数据集的限制。在组件同时运行的情况下,这些模型在影响每个组件振动信号的情况下表现更差。为了解决其中一些挑战,我们提出了一种频域表示和一维卷积神经网络用于复合故障诊断,并将其应用于PHM北京2024数据集,该数据集包括21个传感器通道、17种单一故障和来自4个相互作用组件,即电机、齿轮箱、左轴箱和右轴箱的42种复合故障。我们提出的模型在测试集上分别实现了17种单一故障和42种复合故障的97.67%和93.93%的准确度。
更新时间: 2025-04-09 09:01:18
领域: cs.LG
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
Large Language Models (LLMs) are prone to off-topic misuse, where users may prompt these models to perform tasks beyond their intended scope. Current guardrails, which often rely on curated examples or custom classifiers, suffer from high false-positive rates, limited adaptability, and the impracticality of requiring real-world data that is not available in pre-production. In this paper, we introduce a flexible, data-free guardrail development methodology that addresses these challenges. By thoroughly defining the problem space qualitatively and passing this to an LLM to generate diverse prompts, we construct a synthetic dataset to benchmark and train off-topic guardrails that outperform heuristic approaches. Additionally, by framing the task as classifying whether the user prompt is relevant with respect to the system prompt, our guardrails effectively generalize to other misuse categories, including jailbreak and harmful prompts. Lastly, we further contribute to the field by open-sourcing both the synthetic dataset and the off-topic guardrail models, providing valuable resources for developing guardrails in pre-production environments and supporting future research and development in LLM safety.
Updated: 2025-04-09 08:59:26
标题: 一种灵活的大型语言模型防护栏开发方法应用于离题提示检测
摘要: 大型语言模型(LLMs)容易被滥用,用户可能要求这些模型执行超出其预期范围的任务。目前的防护栏通常依赖于经过精心筛选的示例或自定义分类器,存在高假阳性率、适应性有限以及需要实际数据但在预生产环境中不可行的问题。在本文中,我们介绍了一种灵活的、无数据的防护栏开发方法,以解决这些挑战。通过对问题空间进行定性的全面定义,并将其传递给LLM生成多样的提示,我们构建了一个合成数据集来评估和训练超越启发式方法的离题防护栏。此外,通过将任务框定为判断用户提示与系统提示之间的相关性,我们的防护栏有效地推广到其他滥用类别,包括越狱和有害提示。最后,我们通过开源合成数据集和离题防护栏模型进一步为该领域做出贡献,为在预生产环境中开发防护栏以及支持LLM安全的未来研究和发展提供宝贵资源。
更新时间: 2025-04-09 08:59:26
领域: cs.CL,cs.LG,68T50,I.2.7
Levels of Binary Equivalence for the Comparison of Binaries from Alternative Builds
In response to challenges in software supply chain security, several organisations have created infrastructures to independently build commodity open source projects and release the resulting binaries. Build platform variability can strengthen security as it facilitates the detection of compromised build environments. Furthermore, by improving the security posture of the build platform and collecting provenance information during the build, the resulting artifacts can be used with greater trust. Such offerings are now available from Google, Oracle and RedHat. The availability of multiple binaries built from the same sources creates new challenges and opportunities, and raises questions such as: 'Does build A confirm the integrity of build B?' or 'Can build A reveal a compromised build B?'. To answer such questions requires a notion of equivalence between binaries. We demonstrate that the obvious approach based on bitwise equality has significant shortcomings in practice, and that there is value in opting for alternative notions. We conceptualise this by introducing levels of equivalence, inspired by clone detection types. We demonstrate the value of these new levels through several experiments. We construct a dataset consisting of Java binaries built from the same sources independently by different providers, resulting in 14,156 pairs of binaries in total. We then compare the compiled class files in those jar files and find that for 3,750 pairs of jars (26.49%) there is at least one such file that is different, also forcing the jar files and their cryptographic hashes to be different. However, based on the new equivalence levels, we can still establish that many of them are practically equivalent. We evaluate several candidate equivalence relations on a semi-synthetic dataset that provides oracles consisting of pairs of binaries that either should be, or must not be equivalent.
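The strictest equivalence level, bitwise equality per class file, takes only a few lines of standard-library Python; the jar paths in the comment are hypothetical. Weaker levels would normalize entries (e.g., strip timestamps or compiler-dependent metadata) before hashing.

```python
import hashlib
import zipfile

def entry_digests(jar_path):
    """Per-entry SHA-256 digests of a jar's compiled class files."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name: hashlib.sha256(jar.read(name)).hexdigest()
                for name in jar.namelist() if name.endswith(".class")}

def compare_jars(jar_a, jar_b):
    """Bitwise equality per class file -- the strictest equivalence level.
    Pairs that fail here may still be equivalent at a weaker level."""
    a, b = entry_digests(jar_a), entry_digests(jar_b)
    return {"only_in_a": a.keys() - b.keys(),
            "only_in_b": b.keys() - a.keys(),
            "differing": [n for n in a.keys() & b.keys() if a[n] != b[n]]}

# report = compare_jars("build-google.jar", "build-oracle.jar")  # hypothetical paths
```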
Updated: 2025-04-09 08:55:38
标题: 二进制等价水平用于比较来自不同构建的二进制文件
摘要: 针对软件供应链安全挑战,一些组织已经建立了基础设施,独立构建商品开源项目并发布生成的二进制文件。构建平台的多样性可以增强安全性,因为它有助于检测受损的构建环境。此外,通过改善构建平台的安全状况并在构建过程中收集来源信息,生成的工件可以更可靠地使用。目前,Google、Oracle和RedHat提供了这样的服务。从相同源代码构建的多个二进制文件的可用性带来了新的挑战和机遇,并提出了问题,比如:“构建A是否确认了构建B的完整性?”或“构建A是否可以揭示受损的构建B?”要回答这些问题需要对二进制文件之间的等价概念。我们通过引入受克隆检测类型启发的等价级别来概念化这一点。通过几项实验,我们展示了这些新等价级别的价值。我们构建了一个数据集,其中包含独立由不同提供者构建的来自相同源代码的Java二进制文件,总共产生了14,156对二进制文件。然后我们比较这些jar文件中的编译类文件,并发现在3750对jar文件(26.49%)中至少有一个文件是不同的,也导致jar文件及其密码哈希值不同。然而,基于新的等价级别,我们仍然可以确定其中许多实际上是等价的。我们在一个半合成数据集上评估了几个候选等价关系,该数据集提供了应该或不应该等价的二进制文件对的标准。
更新时间: 2025-04-09 08:55:38
领域: cs.CR,cs.SE,D.2.13; D.3.4; F.3.2
Hyperparameter Optimisation with Practical Interpretability and Explanation Methods in Probabilistic Curriculum Learning
Hyperparameter optimisation (HPO) is crucial for achieving strong performance in reinforcement learning (RL), as RL algorithms are inherently sensitive to hyperparameter settings. Probabilistic Curriculum Learning (PCL) is a curriculum learning strategy designed to improve RL performance by structuring the agent's learning process, yet effective hyperparameter tuning remains challenging and computationally demanding. In this paper, we provide an empirical analysis of hyperparameter interactions and their effects on the performance of a PCL algorithm within standard RL tasks, including point-maze navigation and DC motor control. Using the AlgOS framework integrated with Optuna's Tree-Structured Parzen Estimator (TPE), we present strategies to refine hyperparameter search spaces, enhancing optimisation efficiency. Additionally, we introduce a novel SHAP-based interpretability approach tailored specifically for analysing hyperparameter impacts, offering clear insights into how individual hyperparameters and their interactions influence RL performance. Our work contributes practical guidelines and interpretability tools that significantly improve the effectiveness and computational feasibility of hyperparameter optimisation in reinforcement learning.
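A skeletal Optuna/TPE study in this spirit is shown below; the hyperparameters and the toy objective are invented stand-ins for a full AlgOS-driven PCL training run, whose mean return would be the real objective value.

```python
import optuna

def objective(trial):
    # Hypothetical PCL/RL hyperparameters; a real study would wrap a full
    # training run and return its mean episodic return.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)
    curriculum_p = trial.suggest_float("curriculum_p", 0.1, 0.9)
    return -((lr - 3e-4) ** 2 + (gamma - 0.99) ** 2 + (curriculum_p - 0.5) ** 2)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```

The completed trials (e.g., via `study.trials_dataframe()`) then provide the hyperparameter-score table on which per-hyperparameter SHAP values can be computed.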
Updated: 2025-04-09 08:41:27
标题: 《概率课程学习中具有实用可解释性和解释方法的超参数优化》
摘要: 超参数优化(HPO)对于在强化学习(RL)中实现强大性能至关重要,因为RL算法本身对超参数设置敏感。概率课程学习(PCL)是一种课程学习策略,旨在通过构建代理学习过程来提高RL性能,然而有效的超参数调整仍然具有挑战性且计算需求量大。在本文中,我们对超参数相互作用及其对PCL算法在标准RL任务中的性能的影响进行了实证分析,包括点迷宫导航和DC电机控制。利用集成Optuna的Tree-Structured Parzen Estimator(TPE)的AlgOS框架,我们提出了改进超参数搜索空间的策略,增强了优化效率。此外,我们引入了一种新颖的基于SHAP的可解释性方法,专门用于分析超参数影响,为了清晰地展示单个超参数及其相互作用如何影响RL性能。我们的工作提供了实用指南和可解释性工具,显著提高了强化学习中超参数优化的效果和计算可行性。
更新时间: 2025-04-09 08:41:27
领域: cs.LG,cs.AI
ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models
With the rise and widespread use of Large Language Models (LLMs), ensuring their safety is crucial to prevent harm to humans and promote ethical behaviors. However, directly assessing value valence (i.e., support or oppose) by leveraging large-scale data training is untrustworthy and unexplainable. We assume that emulating the way humans rely on social norms to make moral decisions can help LLMs understand and predict moral judgment. However, capturing human values remains a challenge, as multiple related norms might conflict in specific contexts. Consider that norms upheld by the majority and promoting the well-being of society are more likely to be accepted and widely adopted (e.g., "don't cheat"). Therefore, it is essential for an LLM to identify the appropriate norms for a given scenario before making moral decisions. To this end, we introduce a novel moral judgment approach called ClarityEthic that leverages LLMs' reasoning ability and contrastive learning to uncover relevant social norms for human actions from different perspectives and select the most reliable one to enhance judgment accuracy. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in moral judgment tasks. Moreover, human evaluations confirm that the generated social norms provide plausible explanations that support the judgments. This suggests that modeling human moral judgment with a norm-based, human-emulating strategy is promising for improving the ethical behaviors of LLMs.
Updated: 2025-04-09 08:38:44
标题: ClarityEthic:利用大型语言模型的对比伦理观点进行可解释的道德判断
摘要: 随着大型语言模型(LLMs)的兴起和广泛使用,确保它们的安全对于防止对人类造成伤害并促进道德行为至关重要。然而,通过利用大规模数据训练直接评估价值倾向(即支持或反对)是不可靠且无法解释的。我们假设,模仿人类依赖社会规范做出道德决策可以帮助LLMs理解和预测道德判断。然而,捕捉人类价值仍然是一个挑战,因为多个相关规范在特定环境下可能发生冲突。考虑到被大多数人支持并促进社会福祉的规范更有可能被接受和广泛采纳(例如,“不作弊”)。因此,对于LLMs在做出道德决策之前识别给定情景下的适当规范至关重要。为此,我们引入了一种称为\textit{ClarityEthic}的新颖道德判断方法,利用LLMs的推理能力和对比学习来从不同角度揭示人类行为的相关社会规范,并选择最可靠的规范以提高判断准确性。广泛的实验表明,我们的方法在道德判断任务中胜过了最先进的方法。此外,人类评估证实生成的社会规范提供了支持判断的合理解释。这表明,通过模拟人类的道德策略来建模人类道德判断对于改善LLMs的道德行为是有希望的。
更新时间: 2025-04-09 08:38:44
领域: cs.CY,cs.AI,cs.SI
The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters
Theory-of-Mind (ToM) is a fundamental psychological capability that allows humans to understand and interpret the mental states of others. Humans infer others' thoughts by integrating causal cues and indirect clues from broad contextual information, often derived from past interactions. In other words, human ToM heavily relies on the understanding about the backgrounds and life stories of others. Unfortunately, this aspect is largely overlooked in existing benchmarks for evaluating machines' ToM capabilities, due to their usage of short narratives without global context, especially personal background of characters. In this paper, we verify the importance of comprehensive contextual understanding about personal backgrounds in ToM and assess the performance of LLMs in such complex scenarios. To achieve this, we introduce CharToM benchmark, comprising 1,035 ToM questions based on characters from classic novels. Our human study reveals a significant disparity in performance: the same group of educated participants performs dramatically better when they have read the novels compared to when they have not. In parallel, our experiments on state-of-the-art LLMs, including the very recent o1 and DeepSeek-R1 models, show that LLMs still perform notably worse than humans, despite that they have seen these stories during pre-training. This highlights the limitations of current LLMs in capturing the nuanced contextual information required for ToM reasoning.
Updated: 2025-04-09 08:36:10
标题: 心灵理论中语境理解的本质:关于故事人物问答的研究
摘要: 心灵理论(ToM)是一种基本的心理能力,使人类能够理解和解释他人的心理状态。人类通过整合来自广泛背景信息的因果线索和间接线索来推断他人的想法,这些信息通常源自过去的互动。换句话说,人类的ToM在很大程度上依赖于对他人背景和生活故事的理解。不幸的是,由于现有基准测试在评估机器的ToM能力时使用了缺乏全局背景,尤其是人物个人背景的短篇故事,这一方面在很大程度上被忽视了。在本文中,我们验证了对ToM中个人背景的全面上下文理解的重要性,并评估了LLMs在这种复杂场景中的表现。为了实现这一目标,我们引入了CharToM基准测试,包括1,035个基于经典小说人物的ToM问题。我们的人类研究揭示了在表现上的显著差异:同一组受过教育的参与者在阅读了小说后与未阅读小说时表现截然不同。同时,我们对最先进的LLMs进行了实验,包括最新的o1和DeepSeek-R1模型,结果显示,尽管它们在预训练时已经看过这些故事,但LLMs的表现仍明显逊色于人类。这突显了当前LLMs在捕捉ToM推理所需的微妙上下文信息方面存在的局限性。
更新时间: 2025-04-09 08:36:10
领域: cs.CL,cs.AI
asKAN: Active Subspace embedded Kolmogorov-Arnold Network
The Kolmogorov-Arnold Network (KAN) has emerged as a promising neural network architecture for small-scale AI+Science applications. However, it suffers from inflexibility in modeling ridge functions, which is widely used in representing the relationships in physical systems. This study investigates this inflexibility through the lens of the Kolmogorov-Arnold theorem, which starts the representation of multivariate functions from constructing the univariate components rather than combining the independent variables. Our analysis reveals that incorporating linear combinations of independent variables can substantially simplify the network architecture in representing the ridge functions. Inspired by this finding, we propose active subspace embedded KAN (asKAN), a hierarchical framework that synergizes KAN's function representation with active subspace methodology. The architecture strategically embeds active subspace detection between KANs, where the active subspace method is used to identify the primary ridge directions and the independent variables are adaptively projected onto these critical dimensions. The proposed asKAN is implemented in an iterative way without increasing the number of neurons in the original KAN. The proposed method is validated through function fitting, solving the Poisson equation, and reconstructing sound field. Compared with KAN, asKAN significantly reduces the error using the same network architecture. The results suggest that asKAN enhances the capability of KAN in fitting and solving equations in the form of ridge functions.
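A minimal sketch of the active subspace step that asKAN builds on, under assumed simplifications (plain NumPy, synthetic ridge function f(x) = sin(a.x)): estimate the gradient covariance C = E[grad f grad f^T] and take its leading eigenvectors as the primary ridge directions onto which inputs are projected.

```python
import numpy as np

def active_subspace(grads, k):
    """grads: (n, d) sampled gradients of f; returns top-k ridge directions."""
    C = grads.T @ grads / len(grads)          # C approximates E[grad f grad f^T]
    eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k], eigvals[::-1]

rng = np.random.default_rng(0)
a = rng.normal(size=10)
X = rng.normal(size=(2000, 10))
grads = np.cos(X @ a)[:, None] * a            # gradient of f(x) = sin(a.x)
W, spectrum = active_subspace(grads, k=1)
Z = X @ W                                     # inputs projected onto the ridge direction
print(abs(W[:, 0] @ a) / np.linalg.norm(a))   # close to 1: direction recovered
```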
Updated: 2025-04-09 08:34:59
标题: asKAN:主动子空间嵌入式Kolmogorov-Arnold网络
摘要: 科尔莫戈洛夫-阿诺德网络(KAN)已经成为小规模AI+科学应用中一种有前途的神经网络架构。然而,它在建模脊函数方面灵活性不足,而脊函数在表示物理系统中的关系时被广泛使用。本研究通过科尔莫戈洛夫-阿诺德定理的视角考察了这种灵活性不足,该定理从构建单变量组件开始表示多元函数,而不是组合自变量。我们的分析显示,引入自变量的线性组合可以大大简化表示脊函数的网络架构。受到这一发现的启发,我们提出了嵌入主动子空间的KAN(asKAN),这是一个将KAN的函数表示与主动子空间方法相结合的分层框架。该架构在KAN之间策略性地嵌入了主动子空间检测,其中主动子空间方法用于识别主要的脊方向,并且自变量被自适应地投影到这些关键维度上。提出的asKAN以迭代方式实现,而不会增加原始KAN中的神经元数量。该方法通过函数拟合、求解泊松方程和重建声场得到验证。与KAN相比,asKAN在相同的网络架构下显著减小了误差。结果表明,asKAN增强了KAN在拟合和求解脊函数形式方程方面的能力。
更新时间: 2025-04-09 08:34:59
领域: physics.comp-ph,cs.LG
Privacy Attacks on Image AutoRegressive Models
Image autoregressive generation has emerged as a powerful new paradigm, with image autoregressive models (IARs) matching state-of-the-art diffusion models (DMs) in image quality (FID: 1.48 vs. 1.58) while allowing for higher generation speed. However, the privacy risks associated with IARs remain unexplored, raising concerns about their responsible deployment. To address this gap, we conduct a comprehensive privacy analysis of IARs, comparing their privacy risks to those of DMs as a reference point. Specifically, we develop a novel membership inference attack (MIA) that achieves a remarkably high success rate in detecting training images, with a True Positive Rate at False Positive Rate = 1% (TPR@FPR=1%) of 86.38%, compared to just 6.38% for DMs using comparable attacks. We leverage our novel MIA to perform dataset inference (DI) for IARs and show that it requires as few as 6 samples to detect dataset membership, compared to 200 samples for DI in DMs. This confirms a higher level of information leakage in IARs. Finally, we are able to extract hundreds of training data points from an IAR (e.g., 698 from VAR-d30). Our results suggest a fundamental privacy-utility trade-off: while IARs excel in image generation quality and speed, they are empirically significantly more vulnerable to privacy attacks compared to DMs that achieve similar performance. This trend suggests that incorporating techniques from DMs into IARs, such as modeling the per-token probability distribution using a diffusion procedure, could help mitigate IARs' vulnerability to privacy attacks. We make our code available at: https://github.com/sprintml/privacy_attacks_against_iars
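A minimal sketch of the evaluation metric and a generic loss-threshold membership inference attack (not the paper's IAR-specific attack); the per-sample scores here are synthetic placeholders standing in for real model losses.

```python
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """TPR@FPR metric: members should receive higher attack scores."""
    y = np.concatenate([np.ones_like(member_scores), np.zeros_like(nonmember_scores)])
    s = np.concatenate([member_scores, nonmember_scores])
    fpr, tpr, _ = roc_curve(y, s)
    return np.interp(target_fpr, fpr, tpr)    # interpolate TPR at the target FPR

rng = np.random.default_rng(0)
# Score = negative loss: training members tend to have lower loss.
member_scores = -rng.gamma(2.0, 0.5, 5000)
nonmember_scores = -rng.gamma(2.0, 1.0, 5000)
print(f"TPR@FPR=1%: {tpr_at_fpr(member_scores, nonmember_scores):.4f}")
```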
Updated: 2025-04-09 08:33:54
标题: 图像自回归模型的隐私攻击
摘要: 图像自回归生成已经成为一种强大的新范式,图像自回归模型(IARs)在图像质量(FID:1.48 vs. 1.58)方面与最先进的扩散模型(DMs)相匹配,同时允许更高的生成速度。然而,与IARs相关的隐私风险仍未被探索,引发了对它们负责任部署的担忧。为了填补这一空白,我们对IARs进行了全面的隐私分析,将它们的隐私风险与DMs进行比较,作为一个参考点。具体而言,我们开发了一种新颖的成员推理攻击(MIA),在检测训练图像方面取得了显著高的成功率,True Positive Rate at False Positive Rate = 1%(TPR@FPR=1%)达到了86.38%,而使用相似攻击的DMs仅为6.38%。我们利用我们的新颖MIA来执行IARs的数据集推理(DI),并展示它只需要6个样本就能检测数据集成员身份,而在DMs中需要200个样本。这证实了IARs中信息泄漏的程度更高。最后,我们能够从一个IAR中提取出数百个训练数据点(例如,从VAR-d30中提取了698个)。我们的结果表明了一个基本的隐私-效用权衡:虽然IARs在图像生成质量和速度方面表现出色,但在实证上它们相比实现类似性能的DMs更容易受到隐私攻击。这一趋势表明,将来自DMs的技术纳入IARs中,比如使用扩散过程建模每个标记的概率分布,可能有助于减轻IARs对隐私攻击的脆弱性。我们在以下链接提供我们的代码:https://github.com/sprintml/privacy_attacks_against_iars
更新时间: 2025-04-09 08:33:54
领域: cs.CV,cs.LG
Demystifying Language Model Forgetting with Low-rank Example Associations
Large Language models (LLMs) suffer from forgetting of upstream data when fine-tuned. Despite efforts on mitigating forgetting, few have investigated whether, and how forgotten upstream examples are dependent on newly learned tasks. Insights on such dependencies enable efficient and targeted mitigation of forgetting. In this paper, we empirically analyze forgetting that occurs in $N$ upstream examples of language modeling or instruction-tuning after fine-tuning LLMs on one of $M$ new tasks, visualized in $M\times N$ matrices. We show that the matrices are often well-approximated with low-rank matrices, indicating the dominance of simple associations between the learned tasks and forgotten upstream examples. Leveraging the analysis, we predict forgetting of upstream examples when fine-tuning on unseen tasks with matrix completion over the empirical associations. This enables fast identification of most forgotten examples without expensive inference on the entire upstream data. The approach, despite simplicity, outperforms prior approaches that learn semantic relationships of learned tasks and upstream examples with LMs for predicting forgetting. We demonstrate the practical utility of our analysis by showing statistically significantly reduced forgetting as we upweight predicted examples for replay at fine-tuning. Project page: https://inklab.usc.edu/lm-forgetting-prediction/
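A minimal sketch (assumed form) of the low-rank idea on synthetic data: factor the observed M x N forgetting matrix with a truncated SVD, then predict the forgetting row of a new task from a few probed entries by least squares against the upstream-example factors.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, r = 40, 500, 4
F = rng.normal(size=(M, r)) @ rng.normal(size=(r, N))   # synthetic low-rank forgetting matrix

_, _, Vt = np.linalg.svd(F, full_matrices=False)
V_r = Vt[:r].T                                          # (N, r) upstream-example factors

# New task: observe forgetting on only 30 probed upstream examples.
true_row = rng.normal(size=r) @ Vt[:r]
probe = rng.choice(N, size=30, replace=False)
coef, *_ = np.linalg.lstsq(V_r[probe], true_row[probe], rcond=None)
pred_row = V_r @ coef                                   # predicted forgetting on all N examples
print("corr:", np.corrcoef(pred_row, true_row)[0, 1])   # close to 1 under the low-rank model
```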
Updated: 2025-04-09 08:23:19
标题: 解密低秩示例关联的语言模型遗忘
摘要: 大型语言模型(LLMs)在微调时容易遗忘上游数据。尽管已有减轻遗忘的努力,但很少有人研究被遗忘的上游示例是否以及如何取决于新学习的任务。对这种依赖性的洞察有助于高效且有针对性地减轻遗忘。在本文中,我们实证分析了在 $M$ 个新任务之一上微调LLM后,$N$ 个语言建模或指令微调上游示例中发生的遗忘,并将其可视化为 $M\times N$ 矩阵。我们发现这些矩阵通常可以用低秩矩阵很好地近似,表明学习任务和被遗忘的上游示例之间的简单关联占主导地位。利用这一分析,我们通过对经验关联进行矩阵补全来预测在未见任务上微调时上游示例的遗忘。这使得能够快速识别遗忘最严重的示例,而无需对整个上游数据进行昂贵的推理。尽管简单,该方法胜过以前借助语言模型学习已学任务与上游示例之间语义关系来预测遗忘的方法。我们通过在微调时上调预测示例的重放权重,统计显著地减少了遗忘,从而展示了我们分析的实际效用。项目页面:https://inklab.usc.edu/lm-forgetting-prediction/
更新时间: 2025-04-09 08:23:19
领域: cs.LG,cs.CL,stat.ML
Off-the-grid learning of mixtures from a continuous dictionary
We consider a general non-linear model where the signal is a finite mixture of an unknown, possibly increasing, number of features issued from a continuous dictionary parameterized by a real non-linear parameter. The signal is observed with Gaussian (possibly correlated) noise in either a continuous or a discrete setup. We propose an off-the-grid optimization method, that is, a method which does not use any discretization scheme on the parameter space, to estimate both the non-linear parameters of the features and the linear parameters of the mixture. We use recent results on the geometry of off-the-grid methods to give minimal separation on the true underlying non-linear parameters such that interpolating certificate functions can be constructed. Using also tail bounds for suprema of Gaussian processes we bound the prediction error with high probability. Assuming that the certificate functions can be constructed, our prediction error bound is up to $\log$-factors similar to the rates attained by the Lasso predictor in the linear regression model. We also establish convergence rates that quantify with high probability the quality of estimation for both the linear and the non-linear parameters. We develop in full details our main results for two applications: the Gaussian spike deconvolution and the scaled exponential model.
Updated: 2025-04-09 08:17:29
标题: 不受网格限制的从连续词典中学习混合物
摘要: 我们考虑一个一般的非线性模型,其中信号是来自由一个实数非线性参数参数化的连续字典、数量未知(且可能增长)的特征的有限混合。信号在连续或离散设置下被观察到,并带有高斯(可能相关)噪声。我们提出了一种离网(off-the-grid)优化方法,即一种不在参数空间上使用任何离散化方案的方法,用于同时估计特征的非线性参数和混合的线性参数。我们利用最近关于离网方法几何性质的结果,给出真实基础非线性参数的最小分离条件,以便构建插值证书函数。同时利用高斯过程上确界的尾部界限,我们以高概率界定了预测误差。假设可以构建证书函数,我们的预测误差界限与线性回归模型中Lasso预测器达到的速率相似,只相差对数因子。我们还建立了收敛速率,以高概率量化线性和非线性参数的估计质量。我们针对两种应用详细阐述了主要结果:高斯尖峰去卷积和缩放指数模型。
更新时间: 2025-04-09 08:17:29
领域: stat.ML,cs.LG,math.PR,math.ST,stat.TH
Low-Rank Mirror-Prox for Nonsmooth and Low-Rank Matrix Optimization Problems
Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning. While significant progress has been made in recent years in developing efficient methods for \textit{smooth} low-rank optimization problems that avoid maintaining high-rank matrices and computing expensive high-rank SVDs, advances for nonsmooth problems have been slow paced. In this paper we consider standard convex relaxations for such problems. Mainly, we prove that under a \textit{strict complementarity} condition and under the relatively mild assumption that the nonsmooth objective can be written as a maximum of smooth functions, approximated variants of two popular \textit{mirror-prox} methods: the Euclidean \textit{extragradient method} and mirror-prox with \textit{matrix exponentiated gradient updates}, when initialized with a "warm-start", converge to an optimal solution with rate $O(1/t)$, while requiring only two \textit{low-rank} SVDs per iteration. Moreover, for the extragradient method we also consider relaxed versions of strict complementarity which yield a trade-off between the rank of the SVDs required and the radius of the ball in which we need to initialize the method. We support our theoretical results with empirical experiments on several nonsmooth low-rank matrix recovery tasks, demonstrating both the plausibility of the strict complementarity assumption, and the efficient convergence of our proposed low-rank mirror-prox variants.
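A minimal sketch of an extragradient step whose projections are rank-r truncated SVDs, on a smooth toy objective that stands in for the nonsmooth f analyzed in the paper; the full-then-truncated SVD here is an assumed stand-in for an efficient low-rank SVD routine.

```python
import numpy as np

def svd_rank_r(X, r):
    """Rank-r truncation; a placeholder for an efficient low-rank SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def extragradient_step(X, grad, eta, r):
    Y = svd_rank_r(X - eta * grad(X), r)     # extrapolation point (first low-rank SVD)
    return svd_rank_r(X - eta * grad(Y), r)  # corrected update (second low-rank SVD)

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 50))
grad = lambda X: 2 * (X - A)                 # gradient of f(X) = ||X - A||_F^2
X = np.zeros((50, 50))
for _ in range(100):
    X = extragradient_step(X, grad, eta=0.1, r=5)
print("residual:", np.linalg.norm(X - A))    # converges toward the best rank-5 approximation
```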
Updated: 2025-04-09 08:15:07
标题: 低秩镜像-近端法用于非光滑和低秩矩阵优化问题
摘要: 低秩和非平滑矩阵优化问题涵盖了统计学和机器学习中许多基础任务。近年来,在开发避免维护高秩矩阵和计算昂贵的高秩SVD的高效方法方面取得了重大进展,以解决\textit{平滑}低秩优化问题,但对于非平滑问题的进展缓慢。本文考虑了这些问题的标准凸松弛。主要是,我们证明在\textit{严格互补}条件下,并在相对温和的假设下,即非平滑目标可以被写成平滑函数的最大值的情况下,两种流行的\textit{镜像-近端}方法的近似变体会收敛到一个最优解,这两种方法分别是:欧几里德\textit{额外梯度方法}和带有\textit{矩阵指数梯度更新}的镜像-近端方法,当它们以“热启动”初始化时,收敛速度为$O(1/t)$,每次迭代仅需要两个\textit{低秩}SVD。此外,对于额外梯度方法,我们还考虑了严格互补的宽松版本,这些版本在所需SVD的秩和我们需要初始化方法的球半径之间取得了权衡。我们通过对几个非平滑低秩矩阵恢复任务进行实证实验来支持我们的理论结果,展示了严格互补假设的合理性,以及我们提出的低秩镜像-近端变体的高效收敛性。
更新时间: 2025-04-09 08:15:07
领域: math.OC,cs.LG
NLP Security and Ethics, in the Wild
As NLP models are used by a growing number of end-users, an area of increasing importance is NLP Security (NLPSec): assessing the vulnerability of models to malicious attacks and developing comprehensive countermeasures against them. While work at the intersection of NLP and cybersecurity has the potential to create safer NLP for all, accidental oversights can result in tangible harm (e.g., breaches of privacy or proliferation of malicious models). In this emerging field, however, the research ethics of NLP have not yet faced many of the long-standing conundrums pertinent to cybersecurity, until now. We thus examine contemporary works across NLPSec, and explore their engagement with cybersecurity's ethical norms. We identify trends across the literature, ultimately finding alarming gaps on topics like harm minimization and responsible disclosure. To alleviate these concerns, we provide concrete recommendations to help NLP researchers navigate this space more ethically, bridging the gap between traditional cybersecurity and NLP ethics, which we frame as ``white hat NLP''. The goal of this work is to help cultivate an intentional culture of ethical research for those working in NLP Security.
Updated: 2025-04-09 08:12:34
标题: 自然语言处理的安全性和道德伦理,在野外
摘要: 随着越来越多的终端用户使用自然语言处理(NLP)模型,NLP安全(NLPSec)成为一个日益重要的领域:评估模型对恶意攻击的脆弱性,并开发全面的应对措施。尽管NLP与网络安全领域的交叉工作有望为所有人创建更安全的NLP,但意外疏忽可能导致实质性的伤害(例如,侵犯隐私或恶意模型的传播)。然而,在这一新兴领域中,NLP的研究伦理直到现在才开始面对许多与网络安全相关的长期难题。因此,我们检视了NLPSec领域的当代作品,探索它们与网络安全伦理规范的互动。我们识别了文献中的趋势,最终在伤害最小化和负责任披露等主题上发现了令人担忧的空白。为了缓解这些担忧,我们提出了具体的建议,帮助NLP研究人员更加符合伦理地开展研究,弥合传统网络安全与NLP伦理之间的鸿沟,我们称之为"白帽NLP"。这项工作的目标是帮助在NLP安全领域工作的人培养一种有意识的伦理研究文化。
更新时间: 2025-04-09 08:12:34
领域: cs.CL,cs.AI
A primal-dual perspective for distributed TD-learning
The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual Ordinary differential equation (ODE) dynamics subject to null-space constraints. Based on the exponential convergence behavior of the primal-dual ODE dynamics subject to null-space constraints, we examine the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and incorporating both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the assumption that the underlying communication network structure is characterized by a doubly stochastic matrix.
Updated: 2025-04-09 08:07:54
标题: 一个分布式TD-learning的原始-对偶视角
摘要: 本文旨在研究分布式时间差异(TD)学习在网络化多智能体马尔可夫决策过程中的应用。所提出的方法基于分布式优化算法,可以解释为受零空间约束的原始-对偶常微分方程(ODE)动态。基于受零空间约束的原始-对偶ODE动态的指数收敛行为,我们考察了在各种分布式TD学习场景中最终迭代的行为,考虑了恒定和递减步长,并纳入了i.i.d.和马尔可夫观测模型。与现有方法不同,所提出的算法不需要假设基础通信网络结构由双随机矩阵描述。
更新时间: 2025-04-09 08:07:54
领域: cs.LG,math.OC
CW-BASS: Confidence-Weighted Boundary-Aware Learning for Semi-Supervised Semantic Segmentation
Semi-supervised semantic segmentation (SSSS) aims to improve segmentation performance by utilizing large amounts of unlabeled data with limited labeled samples. Existing methods often suffer from coupling, where over-reliance on initial labeled data leads to suboptimal learning; confirmation bias, where incorrect predictions reinforce themselves repeatedly; and boundary blur caused by limited boundary-awareness and ambiguous edge cues. To address these issues, we propose CW-BASS, a novel framework for SSSS. In order to mitigate the impact of incorrect predictions, we assign confidence weights to pseudo-labels. Additionally, we leverage boundary-delineation techniques, which, despite being extensively explored in weakly-supervised semantic segmentation (WSSS), remain underutilized in SSSS. Specifically, our method: (1) reduces coupling via a confidence-weighted loss that adjusts pseudo-label influence based on their predicted confidence scores, (2) mitigates confirmation bias with a dynamic thresholding mechanism that learns to filter out pseudo-labels based on model performance, (3) tackles boundary blur using a boundary-aware module to refine segmentation near object edges, and (4) reduces label noise through a confidence decay strategy that progressively refines pseudo-labels during training. Extensive experiments on Pascal VOC 2012 and Cityscapes demonstrate that CW-BASS achieves state-of-the-art performance. Notably, CW-BASS achieves a 65.9% mIoU on Cityscapes under a challenging and underexplored 1/30 (3.3%) split (100 images), highlighting its effectiveness in limited-label settings. Our code is available at https://github.com/psychofict/CW-BASS.
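A minimal sketch (assumed form, PyTorch) of a confidence-weighted pseudo-label loss with a threshold gate: per-pixel cross-entropy is weighted by the teacher's confidence, and pixels below the threshold are masked out. The threshold is fixed here; CW-BASS learns it dynamically.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_loss(logits, teacher_logits, threshold=0.8):
    probs = teacher_logits.softmax(dim=1)              # (B, C, H, W)
    conf, pseudo = probs.max(dim=1)                    # per-pixel confidence and pseudo-label
    ce = F.cross_entropy(logits, pseudo, reduction="none")
    mask = (conf >= threshold).float()                 # threshold gate
    return (conf * mask * ce).sum() / mask.sum().clamp(min=1.0)

logits = torch.randn(2, 21, 64, 64, requires_grad=True)  # student predictions
teacher_logits = torch.randn(2, 21, 64, 64)               # teacher predictions on unlabeled data
loss = confidence_weighted_loss(logits, teacher_logits)
loss.backward()
print(float(loss))
```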
Updated: 2025-04-09 08:07:51
标题: CW-BASS:置信加权边界感知学习用于半监督语义分割
摘要: 半监督语义分割(SSSS)旨在通过利用大量未标记数据和有限标记样本来提高分割性能。现有方法通常存在耦合问题,即对初始标记数据过度依赖会导致学习效果不佳;确认偏见问题,即不正确的预测会反复强化自身;以及由于有限的边界感知和模糊的边缘线索导致的边界模糊问题。为了解决这些问题,我们提出了CW-BASS,这是一种用于SSSS的新框架。为了减轻不正确预测的影响,我们为伪标签分配置信权重。此外,我们利用边界划分技术,尽管在弱监督语义分割(WSSS)中得到广泛探讨,但在SSSS中仍未充分利用。具体而言,我们的方法:(1)通过调整基于其预测置信度得分的伪标签影响力的置信加权损失来减少耦合,(2)通过动态阈值机制减轻确认偏见,该机制学习根据模型性能滤除伪标签,(3)使用边界感知模块处理边界模糊问题,以在物体边缘附近细化分割,(4)通过逐步在训练过程中精细化伪标签的置信度衰减策略来减少标签噪声。对Pascal VOC 2012和Cityscapes的广泛实验表明,CW-BASS实现了最先进的性能。值得注意的是,CW-BASS在具有挑战性且未充分探索的1/30(3.3%)分割(100张图像)下在Cityscapes上实现了65.9%的mIoU,突显了其在有限标记设置中的有效性。我们的代码可在https://github.com/psychofict/CW-BASS上获得。
更新时间: 2025-04-09 08:07:51
领域: cs.CV,cs.AI,cs.LG
CLaSP: Learning Concepts for Time-Series Signals from Natural Language Supervision
This paper presents CLaSP, a novel model for retrieving time-series signals using natural language queries that describe signal characteristics. The ability to search time-series signals based on descriptive queries is essential in domains such as industrial diagnostics, where data scientists often need to find signals with specific characteristics. However, existing methods rely on sketch-based inputs, predefined synonym dictionaries, or domain-specific manual designs, limiting their scalability and adaptability. CLaSP addresses these challenges by employing contrastive learning to map time-series signals to natural language descriptions. Unlike prior approaches, it eliminates the need for predefined synonym dictionaries and leverages the rich contextual knowledge of large language models (LLMs). Using the TRUCE and SUSHI datasets, which pair time-series signals with natural language descriptions, we demonstrate that CLaSP achieves high accuracy in retrieving a variety of time series patterns based on natural language queries.
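A minimal CLIP-style sketch (assumed form, PyTorch) of the contrastive signal-text alignment: matched signal/description pairs in a batch are pulled together with a symmetric InfoNCE loss.

```python
import torch
import torch.nn.functional as F

def clip_loss(signal_emb, text_emb, temperature=0.07):
    s = F.normalize(signal_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature                 # (B, B) pairwise similarities
    labels = torch.arange(len(s))                  # matched pairs sit on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

signal_emb = torch.randn(32, 256, requires_grad=True)  # from a time-series encoder
text_emb = torch.randn(32, 256)                        # from a text encoder (e.g., an LLM)
loss = clip_loss(signal_emb, text_emb)
loss.backward()
```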
Updated: 2025-04-09 08:01:55
标题: CLaSP:从自然语言监督中学习时间序列信号的概念
摘要: 这篇论文介绍了CLaSP,这是一种新颖的模型,用于通过描述信号特征的自然语言查询检索时间序列信号。根据描述性查询搜索时间序列信号的能力在诸如工业诊断等领域至关重要,数据科学家经常需要找到具有特定特征的信号。然而,现有方法依赖于基于草图的输入、预定义的同义词词典或领域特定的手动设计,限制了它们的可伸缩性和适应性。CLaSP通过采用对比学习将时间序列信号映射到自然语言描述来解决这些挑战。与以往的方法不同,它消除了对预定义同义词词典的需求,并利用了大型语言模型(LLMs)的丰富上下文知识。使用TRUCE和SUSHI数据集,这些数据集将时间序列信号与自然语言描述配对,我们展示了CLaSP在基于自然语言查询检索各种时间序列模式方面具有高准确性。
更新时间: 2025-04-09 08:01:55
领域: cs.CL,cs.LG
SEE: Continual Fine-tuning with Sequential Ensemble of Experts
Continual fine-tuning of large language models (LLMs) suffers from catastrophic forgetting. Rehearsal-based methods mitigate this problem by retaining a small set of old data. Nevertheless, they still suffer inevitable performance loss. Although training separate experts for each task can help prevent forgetting, effectively assembling them remains a challenge. Some approaches use routers to assign tasks to experts, but in continual learning, they often require retraining for optimal performance. To address these challenges, we introduce the Sequential Ensemble of Experts (SEE) framework. SEE removes the need for an additional router, allowing each expert to independently decide whether a query should be handled. The framework employs distributed routing, and during continual fine-tuning, SEE only requires the training of new experts for incoming tasks rather than retraining the entire system. Experiments reveal that the SEE outperforms prior approaches, including multi-task learning, in continual fine-tuning. It also demonstrates remarkable generalization ability, as the expert can effectively identify out-of-distribution queries, which can then be directed to a more generalized model for resolution. This work highlights the promising potential of integrating routing and response mechanisms within each expert, paving the way for the future of distributed model ensembling.
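A minimal sketch (assumed form) of distributed routing: each expert independently decides, via its own confidence score, whether to answer, and unclaimed queries fall through to a more general model. The Expert interface and threshold are illustrative, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Expert:
    name: str
    confidence: Callable[[str], float]   # e.g., derived from the expert's own likelihood
    answer: Callable[[str], str]

def route(query: str, experts: List[Expert], fallback: Callable[[str], str],
          threshold: float = 0.5) -> str:
    for expert in experts:               # experts are appended sequentially per task
        if expert.confidence(query) >= threshold:
            return expert.answer(query)  # first confident expert handles the query
    return fallback(query)               # out-of-distribution: defer to a general model

experts = [
    Expert("math", lambda q: 0.9 if "integral" in q else 0.1, lambda q: "math expert answer"),
    Expert("code", lambda q: 0.9 if "python" in q else 0.1, lambda q: "code expert answer"),
]
print(route("compute this integral", experts, fallback=lambda q: "general model answer"))
```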
Updated: 2025-04-09 07:56:56
标题: SEE:使用顺序专家集成进行持续微调
摘要: 大型语言模型(LLMs)的持续微调遭受灾难性遗忘的困扰。基于排练(rehearsal)的方法通过保留一小部分旧数据来缓解这一问题。然而,它们仍然会遭受不可避免的性能损失。虽然为每个任务训练单独的专家可以帮助防止遗忘,但有效地组装它们仍然是一个挑战。一些方法使用路由器将任务分配给专家,但在持续学习中,它们经常需要重新训练以实现最佳性能。为了解决这些挑战,我们引入了顺序专家集成(SEE)框架。SEE消除了对额外路由器的需求,允许每个专家独立决定是否处理查询。该框架采用分布式路由,在持续微调期间,SEE只需要为新任务训练新专家,而不是重新训练整个系统。实验表明,SEE在持续微调中优于先前的方法,包括多任务学习。它还展示了出色的泛化能力,因为专家可以有效地识别分布之外的查询,然后将其引导到更广义的模型进行解决。这项工作突出了在每个专家内集成路由和响应机制的有希望的潜力,为分布式模型集成的未来铺平道路。
更新时间: 2025-04-09 07:56:56
领域: cs.CL,cs.LG
Robo-taxi Fleet Coordination at Scale via Reinforcement Learning
Fleets of robo-taxis offering on-demand transportation services, commonly known as Autonomous Mobility-on-Demand (AMoD) systems, hold significant promise for societal benefits, such as reducing pollution, energy consumption, and urban congestion. However, orchestrating these systems at scale remains a critical challenge, with existing coordination algorithms often failing to exploit the systems' full potential. This work introduces a novel decision-making framework that unites mathematical modeling with data-driven techniques. In particular, we present the AMoD coordination problem through the lens of reinforcement learning and propose a graph network-based framework that exploits the main strengths of graph representation learning, reinforcement learning, and classical operations research tools. Extensive evaluations across diverse simulation fidelities and scenarios demonstrate the flexibility of our approach, achieving superior system performance, computational efficiency, and generalizability compared to prior methods. Finally, motivated by the need to democratize research efforts in this area, we release publicly available benchmarks, datasets, and simulators for network-level coordination alongside an open-source codebase designed to provide accessible simulation platforms and establish a standardized validation process for comparing methodologies. Code available at: https://github.com/StanfordASL/RL4AMOD
Updated: 2025-04-09 07:54:20
标题: 规模化通过强化学习进行Robo-taxi车队协调
摘要: 自动出租车车队提供按需交通服务,通常被称为自主移动按需(AMoD)系统,对于减少污染、能源消耗和城市拥堵等社会效益具有重要的潜力。然而,在规模上协调这些系统仍然是一个关键挑战,现有的协调算法通常无法充分发挥系统的潜力。本研究引入了一个新的决策框架,将数学建模与数据驱动技术结合起来。具体地,我们通过强化学习的视角提出了AMoD协调问题,并提出了一个基于图网络的框架,利用图表示学习、强化学习和经典运筹学工具的主要优势。通过对不同仿真忠实度和场景的广泛评估,我们的方法展示了其灵活性,相较于先前的方法,实现了优越的系统性能、计算效率和泛化能力。最后,受到在这一领域民主化研究工作的需求,我们发布了公开可用的基准、数据集和网络级协调的模拟器,以及一个开源代码库,旨在提供易于访问的仿真平台,并建立一个用于比较方法论的标准化验证过程。代码可在以下链接找到:https://github.com/StanfordASL/RL4AMOD
更新时间: 2025-04-09 07:54:20
领域: cs.LG,cs.SY,eess.SY
Robust and Noise-resilient Long-Term Prediction of Spatiotemporal Data Using Variational Mode Graph Neural Networks with 3D Attention
This paper focuses on improving the robustness of spatiotemporal long-term prediction using a variational mode graph convolutional network (VMGCN) by introducing 3D channel attention. The deep learning network for this task relies on historical data inputs, yet real-time data can be corrupted by sensor noise, altering its distribution. We model this noise as independent and identically distributed (i.i.d.) Gaussian noise and incorporate it into the LargeST traffic volume dataset, resulting in data with both inherent and additive noise components. Our approach involves decomposing the corrupted signal into modes using variational mode decomposition, followed by feeding the data into a learning pipeline for prediction. We integrate a 3D attention mechanism encompassing spatial, temporal, and channel attention. The spatial and temporal attention modules learn their respective correlations, while the channel attention mechanism is used to suppress noise and highlight the significant modes in the spatiotemporal signals. Additionally, a learnable soft thresholding method is implemented to exclude unimportant modes from the feature vector, and a feature reduction method based on the signal-to-noise ratio (SNR) is applied. We compare the performance of our approach against baseline models, demonstrating that our method achieves superior long-term prediction accuracy, robustness to noise, and improved performance with mode truncation compared to the baseline models. The code of the paper is available at https://github.com/OsamaAhmad369/VMGCN.
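A minimal sketch (assumed form, PyTorch) of the learnable soft thresholding used to exclude unimportant modes: mode features are shrunk toward zero by a learned per-channel threshold.

```python
import torch
import torch.nn as nn

class SoftThreshold(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.tau = nn.Parameter(torch.zeros(channels))   # one learnable threshold per mode

    def forward(self, x):                                # x: (B, C, T) mode features
        tau = torch.nn.functional.softplus(self.tau).view(1, -1, 1)  # keep thresholds positive
        return torch.sign(x) * torch.relu(x.abs() - tau)  # shrink small activations to zero

x = torch.randn(8, 6, 128)     # e.g., 6 variational modes of a traffic signal
print(SoftThreshold(6)(x).shape)
```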
Updated: 2025-04-09 07:49:45
标题: 使用带有3D注意力的变分模式图神经网络对时空数据进行稳健且抗噪的长期预测
摘要: 本文着重于利用变分模式图卷积网络(VMGCN)引入3D通道注意力来提高时空长期预测的鲁棒性。这项任务的深度学习网络依赖于历史数据输入,但实时数据可能会受到传感器噪声的影响,改变其分布。我们将这种噪声建模为独立同分布(i.i.d.)的高斯噪声,并将其纳入LargeST交通量数据集,导致数据同时具有固有噪声和附加噪声成分。我们的方法涉及使用变分模式分解将损坏的信号分解为模式,然后将数据馈入学习管道进行预测。我们集成了一个包含空间、时间和通道注意力的3D注意机制。空间和时间注意力模块学习它们各自的相关性,而通道注意力机制用于抑制噪声并突显时空信号中重要的模式。此外,我们实施了一个可学习的软阈值方法,将不重要的模式排除在特征向量之外,并应用基于信噪比(SNR)的特征减少方法。我们将我们的方法与基准模型的性能进行比较,证明我们的方法在长期预测准确性、对噪声的鲁棒性以及与基准模型相比在模式截断方面的性能都更优。本文代码可在https://github.com/OsamaAhmad369/VMGCN获取。
更新时间: 2025-04-09 07:49:45
领域: cs.LG
Bridging the Gap Between Preference Alignment and Machine Unlearning
Despite advances in Preference Alignment (PA) for Large Language Models (LLMs), mainstream methods like Reinforcement Learning with Human Feedback (RLHF) face notable challenges. These approaches require high-quality datasets of positive preference examples, which are costly to obtain and computationally intensive due to training instability, limiting their use in low-resource scenarios. LLM unlearning technique presents a promising alternative, by directly removing the influence of negative examples. However, current research has primarily focused on empirical validation, lacking systematic quantitative analysis. To bridge this gap, we propose a framework to explore the relationship between PA and LLM unlearning. Specifically, we introduce a bi-level optimization-based method to quantify the impact of unlearning specific negative examples on PA performance. Our analysis reveals that not all negative examples contribute equally to alignment improvement when unlearned, and the effect varies significantly across examples. Building on this insight, we pose a crucial question: how can we optimally select and weight negative examples for unlearning to maximize PA performance? To answer this, we propose a framework called Unlearning to Align (U2A), which leverages bi-level optimization to efficiently select and unlearn examples for optimal PA performance. We validate the proposed method through extensive experiments, with results confirming its effectiveness.
Updated: 2025-04-09 07:49:08
标题: 弥合偏好对齐与机器遗忘之间的差距
摘要: 尽管大型语言模型(LLMs)的偏好对齐(Preference Alignment,PA)取得了进展,但主流方法如基于人类反馈的强化学习(RLHF)面临显著挑战。这些方法需要高质量的正向偏好示例数据集,其获取成本高昂,且由于训练不稳定而计算开销巨大,限制了它们在低资源场景中的应用。LLM遗忘技术提供了一种有希望的替代方案,可以直接消除负面示例的影响。然而,目前的研究主要集中在经验验证上,缺乏系统性的定量分析。为了弥补这一差距,我们提出了一个框架来探讨PA和LLM遗忘之间的关系。具体地,我们引入了一种基于双层优化的方法,来量化遗忘特定负面示例对PA性能的影响。我们的分析表明,并非所有的负面示例在被遗忘后对对齐改善都有相同的贡献,而且效果在示例之间存在显著差异。基于这一见解,我们提出了一个关键问题:如何最优地选择和加权负面示例以最大化PA性能?为了回答这个问题,我们提出了一个名为"Unlearning to Align"(U2A)的框架,利用双层优化来高效选择和遗忘示例,以实现最佳的PA性能。我们通过大量实验验证了所提出的方法,结果证实了其有效性。
更新时间: 2025-04-09 07:49:08
领域: cs.LG,cs.AI,cs.CL
A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty
Driven by privacy protection laws and regulations, unlearning in Large Language Models (LLMs) is gaining increasing attention. However, current research often neglects the interpretability of the unlearning process, particularly concerning sample-level unlearning difficulty. Existing studies typically assume a uniform unlearning difficulty across samples. This simplification risks attributing the performance of unlearning algorithms to sample selection rather than the algorithm's design, potentially steering the development of LLM unlearning in the wrong direction. Thus, we investigate the relationship between LLM unlearning and sample characteristics, with a focus on unlearning difficulty. Drawing inspiration from neuroscience, we propose a Memory Removal Difficulty ($\mathrm{MRD}$) metric to quantify sample-level unlearning difficulty. Using $\mathrm{MRD}$, we analyze the characteristics of hard-to-unlearn versus easy-to-unlearn samples. Furthermore, we propose an $\mathrm{MRD}$-based weighted sampling method to optimize existing unlearning algorithms, which prioritizes easily forgettable samples, thereby improving unlearning efficiency and effectiveness. We validate the proposed metric and method using public benchmarks and datasets, with results confirming its effectiveness.
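A minimal sketch (assumed form) of MRD-based weighted sampling: given hypothetical per-sample MRD scores, easily forgettable (low-MRD) samples are drawn with higher probability during unlearning. The temperature and score values are illustrative.

```python
import numpy as np

def mrd_weighted_batch(mrd_scores, batch_size, temperature=1.0, rng=None):
    """Sample indices with probability decreasing in Memory Removal Difficulty."""
    rng = rng or np.random.default_rng()
    logits = -np.asarray(mrd_scores) / temperature     # low MRD -> high sampling weight
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(mrd_scores), size=batch_size, replace=False, p=p)

mrd = np.array([0.2, 1.5, 0.1, 0.9, 2.3, 0.4])         # hypothetical MRD values
print(mrd_weighted_batch(mrd, batch_size=3, rng=np.random.default_rng(0)))
```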
Updated: 2025-04-09 07:48:10
标题: 通过样本级遗忘难度对大型语言模型中遗忘的神经启发式解读
摘要: 受隐私保护法律和法规的驱动,大型语言模型(LLMs)中的遗忘正日益受到关注。然而,当前的研究往往忽视了遗忘过程的可解释性,特别是涉及样本级遗忘难度。现有研究通常假设样本之间的遗忘难度是均匀的。这种简化可能会将遗忘算法的性能归因于样本选择,而不是算法的设计,潜在地将LLM遗忘的发展引导至错误的方向。因此,我们调查了LLM遗忘与样本特征之间的关系,重点关注遗忘难度。受到神经科学的启发,我们提出了一个记忆删除难度(MRD)指标来量化样本级遗忘难度。利用MRD,我们分析了难以遗忘与容易遗忘样本的特征。此外,我们提出了一种基于MRD的加权抽样方法来优化现有的遗忘算法,该方法优先考虑易被遗忘的样本,从而提高遗忘的效率和有效性。我们使用公共基准和数据集验证了提出的度量和方法,结果证实了其有效性。
更新时间: 2025-04-09 07:48:10
领域: cs.LG,cs.AI,cs.CL
GRAIN: Multi-Granular and Implicit Information Aggregation Graph Neural Network for Heterophilous Graphs
Graph neural networks (GNNs) have shown significant success in learning graph representations. However, recent studies reveal that GNNs often fail to outperform simple MLPs on heterophilous graph tasks, where connected nodes may differ in features or labels, challenging the homophily assumption. Existing methods addressing this issue often overlook the importance of information granularity and rarely consider implicit relationships between distant nodes. To overcome these limitations, we propose the Granular and Implicit Graph Network (GRAIN), a novel GNN model specifically designed for heterophilous graphs. GRAIN enhances node embeddings by aggregating multi-view information at various granularity levels and incorporating implicit data from distant, non-neighboring nodes. This approach effectively integrates local and global information, resulting in smoother, more accurate node representations. We also introduce an adaptive graph information aggregator that efficiently combines multi-granularity and implicit data, significantly improving node representation quality, as shown by experiments on 13 datasets covering varying homophily and heterophily. GRAIN consistently outperforms 12 state-of-the-art models, excelling on both homophilous and heterophilous graphs.
Updated: 2025-04-09 07:36:44
标题: GRAIN:多粒度和隐式信息聚合图神经网络用于异质图
摘要: 图神经网络(GNNs)在学习图表示方面取得了显著的成功。然而,最近的研究表明,在异质图任务中,GNNs往往无法超越简单的MLPs,其中连接的节点可能在特征或标签上有所不同,挑战了同质性假设。现有方法通常忽视了信息粒度的重要性,并很少考虑远距离节点之间的隐含关系。为了克服这些限制,我们提出了Granular and Implicit Graph Network(GRAIN),这是一种专门为异质图设计的新颖的GNN模型。GRAIN通过在不同粒度级别聚合多视图信息并结合来自远距非邻节点的隐含数据来增强节点嵌入。这种方法有效地整合了局部和全局信息,导致更平滑、更准确的节点表示。我们还引入了一种自适应图信息聚合器,有效地结合了多粒度和隐含数据,显著提高了节点表示质量,实验证明在涵盖各种同质性和异质性的13个数据集上,GRAIN始终优于12个最先进模型,在同质性和异质性图上均表现出色。
更新时间: 2025-04-09 07:36:44
领域: cs.LG,cs.AI
AMAD: AutoMasked Attention for Unsupervised Multivariate Time Series Anomaly Detection
Unsupervised multivariate time series anomaly detection (UMTSAD) plays a critical role in various domains, including finance, networks, and sensor systems. In recent years, due to the outstanding performance of deep learning in general sequential tasks, many models have been specialized for deep UMTSAD tasks and have achieved impressive results, particularly those based on the Transformer and self-attention mechanisms. However, the sequence anomaly association assumptions underlying these models are often limited to specific predefined patterns and scenarios, such as concentrated or peak anomaly patterns. These limitations hinder their ability to generalize to diverse anomaly situations, especially where the lack of labels poses significant challenges. To address these issues, we propose AMAD, which integrates \textbf{A}uto\textbf{M}asked Attention for UMTS\textbf{AD} scenarios. AMAD introduces a novel structure based on the AutoMask mechanism and an attention mixup module, forming a simple yet generalized anomaly association representation framework. This framework is further enhanced by a Max-Min training strategy and a Local-Global contrastive learning approach. By combining multi-scale feature extraction with automatic relative association modeling, AMAD provides a robust and adaptable solution to UMTSAD challenges. Extensive experimental results demonstrate that the proposed model achieving competitive performance results compared to SOTA benchmarks across a variety of datasets.
Updated: 2025-04-09 07:32:59
标题: AMAD: 自动遮罩注意力用于无监督多变量时间序列异常检测
摘要: 无监督多变量时间序列异常检测(UMTSAD)在各个领域中发挥着关键作用,包括金融、网络和传感器系统。近年来,由于深度学习在一般顺序任务中的出色表现,许多模型已经专门针对深度UMTSAD任务进行了优化,并取得了令人印象深刻的结果,特别是基于Transformer和自注意机制的模型。然而,这些模型基于序列异常关联假设往往局限于特定预定义的模式和场景,例如集中或峰值异常模式。这些限制阻碍了它们在各种异常情况下的泛化能力,尤其是在缺乏标签的情况下,面临着重大挑战。为了解决这些问题,我们提出了AMAD,该模型集成了用于UMTSAD场景的\textbf{A}uto\textbf{M}asked Attention。AMAD引入了基于AutoMask机制和注意力混合模块的新颖结构,形成了一个简单但泛化的异常关联表示框架。这个框架进一步通过Max-Min训练策略和局部-全局对比学习方法得到增强。通过将多尺度特征提取与自动相对关联建模相结合,AMAD为UMTSAD挑战提供了强大且适应性强的解决方案。广泛的实验结果表明,所提出的模型在各种数据集上取得了与SOTA基准相比具有竞争力的性能结果。
更新时间: 2025-04-09 07:32:59
领域: cs.LG,cs.AI,I.5.1
Deep Sturm--Liouville: From Sample-Based to 1D Regularization with Learnable Orthogonal Basis Functions
Although Artificial Neural Networks (ANNs) have achieved remarkable success across various tasks, they still suffer from limited generalization. We hypothesize that this limitation arises from the traditional sample-based (0-dimensional) regularization used in ANNs. To overcome this, we introduce \textit{Deep Sturm--Liouville} (DSL), a novel function approximator that enables continuous 1D regularization along field lines in the input space by integrating the Sturm--Liouville Theorem (SLT) into the deep learning framework. DSL defines field lines traversing the input space, along which a Sturm--Liouville problem is solved to generate orthogonal basis functions, enforcing implicit regularization thanks to the desirable properties of SLT. These basis functions are linearly combined to construct the DSL approximator. Both the vector field and basis functions are parameterized by neural networks and learned jointly. We demonstrate that the DSL formulation naturally arises when solving a Rank-1 Parabolic Eigenvalue Problem. DSL is trained efficiently using stochastic gradient descent via implicit differentiation. DSL achieves competitive performance and demonstrates improved sample efficiency on diverse multivariate datasets, including high-dimensional image datasets such as MNIST and CIFAR-10.
Updated: 2025-04-09 07:21:13
标题: 深度Sturm-Liouville:从基于样本的到具有可学习正交基函数的1D正则化
摘要: 尽管人工神经网络(ANNs)在各种任务中取得了显著的成功,但它们仍然受到有限泛化能力的限制。我们假设这种限制源于在ANNs中使用的传统基于样本(0维)的正则化。为了克服这一限制,我们引入了\textit{深度Sturm-Liouville}(DSL),这是一种新颖的函数逼近器,通过将Sturm-Liouville定理(SLT)集成到深度学习框架中,使输入空间沿场线连续1D正则化。DSL定义了穿越输入空间的场线,沿着这些场线解决了一个Sturm-Liouville问题以生成正交基函数,通过SLT的理想性质实施了隐式正则化。这些基函数线性组合以构建DSL逼近器。矢量场和基函数都由神经网络参数化并联合学习。我们证明了DSL公式在解决秩-1抛物线特征值问题时自然产生。DSL通过隐式微分的随机梯度下降有效训练。DSL在各种多变量数据集上实现了竞争性能,并展示了在高维图像数据集(如MNIST和CIFAR-10)上改进的样本效率。
更新时间: 2025-04-09 07:21:13
领域: cs.LG
SIGMA: An Efficient Heterophilous Graph Neural Network with Fast Global Aggregation
Graph neural networks (GNNs) realize great success in graph learning but suffer from performance loss when meeting heterophily, i.e. neighboring nodes are dissimilar, due to their local and uniform aggregation. Existing attempts of heterophilous GNNs incorporate long-range or global aggregations to distinguish nodes in the graph. However, these aggregations usually require iteratively maintaining and updating full-graph information, which limits their efficiency when applying to large-scale graphs. In this paper, we propose SIGMA, an efficient global heterophilous GNN aggregation integrating the structural similarity measurement SimRank. Our theoretical analysis illustrates that SIGMA inherently captures distant global similarity even under heterophily, that conventional approaches can only achieve after iterative aggregations. Furthermore, it enjoys efficient one-time computation with a complexity only linear to the node set size $\mathcal{O}(n)$. Comprehensive evaluation demonstrates that SIGMA achieves state-of-the-art performance with superior aggregation and overall efficiency. Notably, it obtains $5\times$ acceleration on the large-scale heterophily dataset pokec with over 30 million edges compared to the best baseline aggregation.
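For reference, a minimal sketch of the plain SimRank fixed-point iteration that SIGMA builds on; SIGMA itself replaces this expensive loop with an efficient one-time approximation.

```python
import numpy as np

def simrank(adj, C=0.8, iters=10):
    """s(a,b) = C / (|I(a)||I(b)|) * sum_{i in I(a), j in I(b)} s(i,j), s(a,a)=1."""
    n = len(adj)
    in_nbrs = [np.nonzero(adj[:, v])[0] for v in range(n)]
    S = np.eye(n)
    for _ in range(iters):
        S_new = np.eye(n)
        for a in range(n):
            for b in range(a + 1, n):
                Ia, Ib = in_nbrs[a], in_nbrs[b]
                if len(Ia) and len(Ib):
                    # .mean() divides the sum by |Ia| * |Ib|, matching the definition.
                    S_new[a, b] = S_new[b, a] = C * S[np.ix_(Ia, Ib)].mean()
        S = S_new
    return S

adj = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]])
print(simrank(adj).round(3))
```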
Updated: 2025-04-09 07:19:32
标题: SIGMA:一种具有快速全局聚合的高效异质图神经网络
摘要: 图神经网络(GNNs)在图学习中取得了巨大成功,但在遇到异质性时(即相邻节点不相似),由于它们的局部和均匀聚合,会出现性能损失。现有的异质性GNN尝试将长距离或全局聚合纳入其中,以区分图中的节点。然而,这些聚合通常需要迭代地维护和更新完整的图信息,这限制了它们在应用于大规模图时的效率。在本文中,我们提出了SIGMA,一种高效的全局异质性GNN聚合,集成了结构相似度测量SimRank。我们的理论分析显示,SIGMA在异质性下本质上捕捉到了远距离全局相似性,而传统方法只能在经过迭代聚合后才能实现。此外,它具有高效的一次计算,复杂度仅与节点集大小成线性关系$O(n)$。全面的评估表明,SIGMA实现了最先进的性能,具有优越的聚合和整体效率。值得注意的是,与最佳基线聚合相比,在拥有超过3000万边的大规模异质性数据集pokec上,它实现了$5\times$的加速。
更新时间: 2025-04-09 07:19:32
领域: cs.LG,cs.SI
FACT: Multinomial Misalignment Classification for Point Cloud Registration
We present FACT, a method for predicting alignment quality (i.e., registration error) of registered lidar point cloud pairs. This is useful e.g. for quality assurance of large, automatically registered 3D models. FACT extracts local features from a registered pair and processes them with a point transformer-based network to predict a misalignment class. We generalize prior work that study binary alignment classification of registration errors, by recasting it as multinomial misalignment classification. To achieve this, we introduce a custom regression-by-classification loss function that combines the cross-entropy and Wasserstein losses, and demonstrate that it outperforms both direct regression and prior binary classification. FACT successfully classifies point-cloud pairs registered with both the classical ICP and GeoTransformer, while other choices, such as standard point-cloud-quality metrics and registration residuals are shown to be poor choices for predicting misalignment. On a synthetically perturbed point-cloud task introduced by the CorAl method, we show that FACT achieves substantially better performance than CorAl. Finally, we demonstrate how FACT can assist experts in correcting misaligned point-cloud maps. Our code is available at https://github.com/LudvigDillen/FACT_for_PCMC.
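A minimal sketch (assumed form, PyTorch) of a regression-by-classification loss combining cross-entropy with a 1D Wasserstein term over ordered misalignment classes, which penalizes predictions by how far off they are in class order.

```python
import torch
import torch.nn.functional as F

def ce_wasserstein_loss(logits, target, alpha=0.5):
    ce = F.cross_entropy(logits, target)
    p = logits.softmax(dim=-1)
    t = F.one_hot(target, logits.size(-1)).float()
    # Wasserstein-1 between distributions on ordered bins = sum of |CDF differences|.
    w1 = (p.cumsum(dim=-1) - t.cumsum(dim=-1)).abs().sum(dim=-1).mean()
    return alpha * ce + (1 - alpha) * w1

logits = torch.randn(16, 5, requires_grad=True)   # 5 ordered misalignment classes
target = torch.randint(0, 5, (16,))
ce_wasserstein_loss(logits, target).backward()
```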
Updated: 2025-04-09 07:01:57
标题: FACT:用于点云配准的多类错位分类
摘要: 我们提出了FACT,一种用于预测已配准激光雷达点云对的对齐质量(即配准误差)的方法。这对于大型自动配准3D模型的质量保证等场景非常有用。FACT从已配准的点云对中提取局部特征,并通过基于点Transformer的网络对其进行处理,以预测错位类别。我们推广了研究配准误差二元对齐分类的先前工作,将其重构为多类错位分类。为了实现这一点,我们引入了一种自定义的"以分类做回归"损失函数,该函数结合了交叉熵和Wasserstein损失,并证明其优于直接回归和先前的二元分类。FACT成功地对使用经典ICP和GeoTransformer配准的点云对进行分类,而其他选择,例如标准点云质量指标和配准残差,则被证明不适合预测错位。在由CorAl方法引入的合成扰动点云任务上,我们展示了FACT的性能显著优于CorAl。最后,我们展示了FACT如何帮助专家纠正错位的点云地图。我们的代码可在https://github.com/LudvigDillen/FACT_for_PCMC上找到。
更新时间: 2025-04-09 07:01:57
领域: cs.CV,cs.LG,I.4.5; I.4.8; I.2.9; I.2.10
Boost Your Human Image Generation Model via Direct Preference Optimization
Human image generation is a key focus in image synthesis due to its broad applications, but even slight inaccuracies in anatomy, pose, or details can compromise realism. To address these challenges, we explore Direct Preference Optimization (DPO), which trains models to generate preferred (winning) images while diverging from non-preferred (losing) ones. However, conventional DPO methods use generated images as winning images, limiting realism. To overcome this limitation, we propose an enhanced DPO approach that incorporates high-quality real images as winning images, encouraging outputs to resemble real images rather than generated ones. However, implementing this concept is not a trivial task. Therefore, our approach, HG-DPO (Human image Generation through DPO), employs a novel curriculum learning framework that gradually improves the output of the model toward greater realism, making training more feasible. Furthermore, HG-DPO effectively adapts to personalized text-to-image tasks, generating high-quality and identity-specific images, which highlights the practical value of our approach.
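A minimal sketch of the standard DPO objective that HG-DPO builds on (PyTorch); under HG-DPO the winning log-probabilities would come from real images rather than generated ones.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Log-probabilities of winning/losing samples under the policy and a frozen reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()       # push policy toward winning samples

logp_w = torch.randn(8, requires_grad=True)   # policy log-probs of winning (real) images
logp_l = torch.randn(8, requires_grad=True)   # policy log-probs of losing (generated) images
dpo_loss(logp_w, logp_l, torch.randn(8), torch.randn(8)).backward()
```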
Updated: 2025-04-09 06:55:52
标题: 通过直接偏好优化提升您的人类形象生成模型
摘要: 人类图像生成是图像合成的关键焦点,因为它具有广泛的应用,但是即使在解剖学、姿势或细节方面有轻微的不准确性也可能影响真实感。为了解决这些挑战,我们探索了直接偏好优化(DPO)方法,该方法训练模型生成首选(获胜)图像,同时与非首选(失败)图像有所不同。然而,传统的DPO方法使用生成的图像作为获胜图像,限制了真实感。为了克服这一限制,我们提出了一种增强的DPO方法,该方法将高质量的真实图像作为获胜图像,鼓励输出图像与真实图像相似,而不是生成的图像。然而,实现这一概念并不是一项简单的任务。因此,我们的方法HG-DPO(通过DPO进行人类图像生成)采用了一种新颖的课程学习框架,逐渐改善模型的输出,使训练更加可行。此外,HG-DPO有效地适应个性化文本到图像任务,生成高质量和身份特定的图像,突显了我们方法的实际价值。
更新时间: 2025-04-09 06:55:52
领域: cs.CV,cs.AI,cs.LG
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
To parameter-efficiently fine-tune (PEFT) large language models (LLMs), the low-rank adaptation (LoRA) method approximates the model changes $\Delta W \in \mathbb{R}^{m \times n}$ through the product of two matrices $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$, where $r \ll \min(m, n)$, $A$ is initialized with Gaussian noise, and $B$ with zeros. LoRA freezes the original model $W$ and updates the "Noise & Zero" adapter, which may lead to slow convergence. To overcome this limitation, we introduce Principal Singular values and Singular vectors Adaptation (PiSSA). PiSSA shares the same architecture as LoRA, but initializes the adaptor matrices $A$ and $B$ with the principal components of the original matrix $W$, and put the remaining components into a residual matrix $W^{res} \in \mathbb{R}^{m \times n}$ which is frozen during fine-tuning. Compared to LoRA, PiSSA updates the principal components while freezing the "residual" parts, allowing faster convergence and enhanced performance. Comparative experiments of PiSSA and LoRA across 12 different models, ranging from 184M to 70B, encompassing 5 NLG and 8 NLU tasks, reveal that PiSSA consistently outperforms LoRA under identical experimental setups. On the GSM8K benchmark, Mistral-7B fine-tuned with PiSSA achieves an accuracy of 72.86%, surpassing LoRA's 67.7% by 5.16%. Due to the same architecture, PiSSA is also compatible with quantization to further reduce the memory requirement of fine-tuning. Compared to QLoRA, QPiSSA exhibits smaller quantization errors in the initial stages. Fine-tuning LLaMA-3-70B on GSM8K, QPiSSA attains an accuracy of 86.05%, exceeding the performances of QLoRA at 81.73%. Leveraging a fast SVD technique, PiSSA can be initialized in only a few seconds, presenting a negligible cost for transitioning from LoRA to PiSSA. Code is available at https://github.com/GraphPKU/PiSSA.
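A minimal sketch of the PiSSA initialization described above (PyTorch): the top-r singular triplets of W seed the adapter, and the residual is frozen.

```python
import torch

def pissa_init(W, r):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_S = S[:r].sqrt()
    A = U[:, :r] * sqrt_S             # (m, r) adapter factor, trainable
    B = sqrt_S.unsqueeze(1) * Vh[:r]  # (r, n) adapter factor, trainable
    W_res = W - A @ B                 # residual, frozen during fine-tuning
    return A, B, W_res

W = torch.randn(512, 512)
A, B, W_res = pissa_init(W, r=16)
print(torch.allclose(W, W_res + A @ B, atol=1e-4))  # exact decomposition up to float error
```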
Updated: 2025-04-09 06:54:20
标题: PiSSA: 大型语言模型的主要奇异值和奇异向量适应
摘要: 为了对大型语言模型(LLMs)进行参数高效微调(PEFT),低秩适应(LoRA)方法通过两个矩阵$A \in \mathbb{R}^{m \times r}$和$B \in \mathbb{R}^{r \times n}$的乘积来近似模型变化$\Delta W \in \mathbb{R}^{m \times n}$,其中$r \ll \min(m, n)$,$A$用高斯噪声初始化,$B$用零初始化。LoRA冻结原始模型$W$并更新"噪声和零"适配器,可能导致收敛速度较慢。为了克服这一限制,我们引入了主奇异值和奇异向量适应(PiSSA)。PiSSA与LoRA具有相同的架构,但是初始化适配器矩阵$A$和$B$时使用原始矩阵$W$的主成分,并将其余部分放入一个残差矩阵$W^{res} \in \mathbb{R}^{m \times n}$,在微调过程中保持冻结。与LoRA相比,PiSSA在冻结"残差"部分的同时更新主要成分,从而实现更快的收敛速度和更强的性能。在12个不同模型(参数规模从184M到70B)上、涵盖5个NLG任务和8个NLU任务的PiSSA与LoRA对比实验表明,在相同的实验设置下,PiSSA始终优于LoRA。在GSM8K基准上,使用PiSSA微调的Mistral-7B模型的准确率达到了72.86%,比LoRA的67.7%高出了5.16%。由于具有相同的架构,PiSSA还与量化兼容,可以进一步降低微调的内存需求。与QLoRA相比,QPiSSA在初始阶段显示出较小的量化误差。在GSM8K上微调LLaMA-3-70B,QPiSSA的准确率达到了86.05%,超过了QLoRA的81.73%。利用快速SVD技术,PiSSA可以在几秒钟内完成初始化,从LoRA过渡到PiSSA几乎没有成本。代码可在https://github.com/GraphPKU/PiSSA找到。
更新时间: 2025-04-09 06:54:20
领域: cs.LG,cs.AI
Quantum neural networks facilitating quantum state classification
The classification of quantum states into distinct classes poses a significant challenge. In this study, we address this problem using quantum neural networks in combination with a problem-inspired circuit and customised as well as predefined ansätze. To facilitate the resource-efficient quantum state classification, we construct the dataset of quantum states using the proposed problem-inspired circuit. The problem-inspired circuit incorporates two-qubit parameterised unitary gates of varying entangling power, which is further integrated with the ansatz, developing an entire quantum neural network. To demonstrate the capability of the selected ansatz, we visualise the mitigated barren plateaus. The designed quantum neural network demonstrates its efficiency in binary and multi-class classification tasks. This work establishes a foundation for the classification of multi-qubit quantum states and offers the potential for generalisation to multi-qubit pure quantum states.
Updated: 2025-04-09 06:42:32
标题: 量子神经网络促进量子态分类
摘要: 将量子态分类为不同类别是一个重要的挑战。在这项研究中,我们利用量子神经网络,结合问题启发的电路以及定制和预定义的ansätze来解决这个问题。为了促进资源高效的量子态分类,我们利用提出的问题启发电路构建量子态数据集。该问题启发电路包含具有不同纠缠能力的两量子比特参数化酉门,并进一步与ansatz集成,构成完整的量子神经网络。为了展示所选ansatz的能力,我们可视化了被缓解的贫瘠高原(barren plateau)现象。所设计的量子神经网络在二分类和多分类任务中展示了其效率。这项工作为多量子比特量子态的分类奠定了基础,并提供了推广到多量子比特纯量子态的潜力。
更新时间: 2025-04-09 06:42:32
领域: quant-ph,cs.LG
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training
Multimodal large language models (MLLMs), such as GPT-4o, are garnering significant attention. During the exploration of MLLM training, we identified Modality Composition Incoherence, a phenomenon that the proportion of a certain modality varies dramatically across different examples. It exacerbates the challenges of addressing mini-batch imbalances, which lead to uneven GPU utilization between Data Parallel (DP) instances and severely degrades the efficiency and scalability of MLLM training, ultimately affecting training speed and hindering further research on MLLMs. To address these challenges, we introduce OrchMLLM, a comprehensive framework designed to mitigate the inefficiencies in MLLM training caused by Modality Composition Incoherence. First, we propose Batch Post-Balancing Dispatcher, a technique that efficiently eliminates mini-batch imbalances in sequential data. Additionally, we integrate MLLM Global Orchestrator into the training framework to orchestrate multimodal data and tackle the issues arising from Modality Composition Incoherence. We evaluate OrchMLLM across various MLLM sizes, demonstrating its efficiency and scalability. Experimental results reveal that OrchMLLM achieves a Model FLOPs Utilization (MFU) of $41.6\%$ when training an 84B MLLM with three modalities on $2560$ H100 GPUs, outperforming Megatron-LM by up to $3.1\times$ in throughput.
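A minimal sketch (assumed form) of batch post-balancing: greedily reassign a global batch across data-parallel ranks so per-rank workload (e.g., total token count across modalities) stays even; the cost values are illustrative.

```python
import heapq

def post_balance(costs, n_ranks):
    """costs: per-example workload estimates; returns example indices per rank."""
    heap = [(0.0, r) for r in range(n_ranks)]   # (accumulated cost, rank)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_ranks)]
    # Longest-processing-time greedy: place heavy examples first on the least-loaded rank.
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, r = heapq.heappop(heap)
        assignment[r].append(i)
        heapq.heappush(heap, (load + costs[i], r))
    return assignment

costs = [900, 30, 850, 40, 820, 35, 60, 880]    # e.g., image-heavy vs. text-only examples
print(post_balance(costs, n_ranks=4))
```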
Updated: 2025-04-09 06:39:29
标题: 利用批处理后平衡来编排多模态数据以加速多模态大语言模型训练
摘要: 多模态大型语言模型(MLLMs),如GPT-4o,正在引起重大关注。在探索MLLM训练过程中,我们发现了模态组成不一致的现象,即某种模态在不同示例中的比例差异巨大。这加剧了解决小批量不平衡的挑战,导致数据并行(DP)实例之间GPU利用率不均匀,并严重影响了MLLM训练的效率和可扩展性,最终影响了训练速度,阻碍了对MLLM的进一步研究。 为了解决这些挑战,我们引入了OrchMLLM,这是一个旨在减轻由模态组成不一致引起的MLLM训练效率低下问题的综合框架。首先,我们提出了Batch Post-Balancing Dispatcher,一种有效消除顺序数据中小批量不平衡的技术。此外,我们将MLLM全局协调器集成到训练框架中,以协调多模态数据并解决由模态组成不一致引起的问题。我们评估了OrchMLLM在各种MLLM规模上的表现,展示了其效率和可扩展性。实验结果显示,在使用2560个H100 GPU训练具有三种模态的84B MLLM时,OrchMLLM实现了41.6%的模型FLOPs利用率,在吞吐量方面超过了Megatron-LM高达3.1倍。
更新时间: 2025-04-09 06:39:29
领域: cs.DC,cs.AI
PolygonGNN: Representation Learning for Polygonal Geometries with Heterogeneous Visibility Graph
Polygon representation learning is essential for diverse applications, encompassing tasks such as shape coding, building pattern classification, and geographic question answering. While recent years have seen considerable advancements in this field, much of the focus has been on single polygons, overlooking the intricate inner- and inter-polygonal relationships inherent in multipolygons. To address this gap, our study introduces a comprehensive framework specifically designed for learning representations of polygonal geometries, particularly multipolygons. Central to our approach is the incorporation of a heterogeneous visibility graph, which seamlessly integrates both inner- and inter-polygonal relationships. To enhance computational efficiency and minimize graph redundancy, we implement a heterogeneous spanning tree sampling method. Additionally, we devise a rotation-translation invariant geometric representation, ensuring broader applicability across diverse scenarios. Finally, we introduce Multipolygon-GNN, a novel model tailored to leverage the spatial and semantic heterogeneity inherent in the visibility graph. Experiments on five real-world and synthetic datasets demonstrate its ability to capture informative representations for polygonal geometries. Code and data are available at \href{https://github.com/dyu62/PolyGNN}{$github.com/dyu62/PolyGNN$}.
Updated: 2025-04-09 06:17:32
标题: PolygonGNN:具有异构可见性图的多边形几何表示学习
摘要: 多边形表示学习对于各种应用至关重要,涵盖了形状编码、建筑模式分类和地理问答等任务。近年来,这一领域取得了相当大的进展,但大部分关注点集中在单个多边形上,忽视了多重多边形内部和多边形之间固有的复杂关系。为了填补这一空白,我们的研究引入了一个专门用于学习多边形几何(特别是多重多边形)表示的全面框架。我们方法的核心是引入异质可见性图,无缝集成了多边形内部和多边形之间的关系。为了提高计算效率并减少图的冗余,我们实现了一种异质生成树采样方法。此外,我们设计了一种旋转平移不变的几何表示,确保在各种场景下具有更广泛的适用性。最后,我们引入了Multipolygon-GNN,这是一个专门利用可见性图中空间和语义异质性的新模型。在五个真实和合成数据集上的实验表明,该模型能够为多边形几何学习信息丰富的表示。代码和数据可以在\href{https://github.com/dyu62/PolyGNN}{$github.com/dyu62/PolyGNN$}找到。
更新时间: 2025-04-09 06:17:32
领域: cs.CV,cs.LG
Wanting to be Understood
This paper explores an intrinsic motivation for mutual awareness, hypothesizing that humans possess a fundamental drive to understand \textit{and to be understood} even in the absence of extrinsic rewards. Through simulations of the perceptual crossing paradigm, we explore the effect of various internal reward functions in reinforcement learning agents. The drive to understand is implemented as an active inference type artificial curiosity reward, whereas the drive to be understood is implemented through intrinsic rewards for imitation, influence/impressionability, and sub-reaction time anticipation of the other. Results indicate that while artificial curiosity alone does not lead to a preference for social interaction, rewards emphasizing reciprocal understanding successfully drive agents to prioritize interaction. We demonstrate that this intrinsic motivation can facilitate cooperation in tasks where only one agent receives extrinsic reward for the behaviour of the other.
Updated: 2025-04-09 06:15:24
标题: 想要被理解
摘要: 本文探讨了相互意识的内在动机,假设人类在没有外在奖励的情况下,具有一种理解和被理解的基本驱动力。通过感知交叉范式的模拟,我们研究了强化学习代理中各种内部奖励函数的影响。理解的驱动力被实现为一种主动推理类型的人工好奇心奖励,而被理解的驱动力则通过对模仿、影响/易受影响性和对他人的次反应时间预期的内在奖励来实现。结果表明,虽然单独的人工好奇心并不会导致对社会互动的偏好,但强调相互理解的奖励成功地驱使代理人优先考虑互动。我们证明这种内在动机可以促进合作,在只有一个代理人因另一个代理人的行为而获得外在奖励的任务中。
更新时间: 2025-04-09 06:15:24
领域: cs.LG,cs.AI,cs.CL
Disentangle and Regularize: Sign Language Production with Articulator-Based Disentanglement and Channel-Aware Regularization
In this work, we propose a simple gloss-free, transformer-based sign language production (SLP) framework that directly maps spoken-language text to sign pose sequences. We first train a pose autoencoder that encodes sign poses into a compact latent space using an articulator-based disentanglement strategy, where features corresponding to the face, right hand, left hand, and body are modeled separately to promote structured and interpretable representation learning. Next, a non-autoregressive transformer decoder is trained to predict these latent representations from sentence-level text embeddings. To guide this process, we apply channel-aware regularization by aligning predicted latent distributions with priors extracted from the ground-truth encodings using a KL-divergence loss. The contribution of each channel to the loss is weighted according to its associated articulator region, enabling the model to account for the relative importance of different articulators during training. Our approach does not rely on gloss supervision or pretrained models, and achieves state-of-the-art results on the PHOENIX14T dataset using only a modest training set.
Updated: 2025-04-09 06:14:19
标题: 解缠与正则化:基于发音器解缠与通道感知正则化的手语生成
摘要: 在这项工作中,我们提出了一个简单的、无需gloss标注的基于Transformer的手语生成(SLP)框架,直接将口语文本映射到手语姿势序列。我们首先训练一个姿势自编码器,使用基于发音器的解缠策略将手语姿势编码为紧凑的潜在空间,其中与面部、右手、左手和身体对应的特征分别建模,以促进结构化和可解释的表示学习。接下来,训练一个非自回归的Transformer解码器,从句子级文本嵌入中预测这些潜在表示。为了引导这个过程,我们应用了通道感知正则化,使用KL散度损失将预测的潜在分布与从真实编码中提取的先验对齐。每个通道对损失的贡献根据其相关的发音器区域加权,使模型能够在训练过程中考虑不同发音器的相对重要性。我们的方法不依赖于gloss监督或预训练模型,并且仅使用适度的训练集就在PHOENIX14T数据集上取得了最先进的结果。
更新时间: 2025-04-09 06:14:19
领域: cs.LG,cs.CV
InteractRank: Personalized Web-Scale Search Pre-Ranking with Cross Interaction Features
Modern search systems use a multi-stage architecture to deliver personalized results efficiently. Key stages include retrieval, pre-ranking, full ranking, and blending, which refine billions of items to top selections. The pre-ranking stage, vital for scoring and filtering hundreds of thousands of items down to a few thousand, typically relies on two tower models due to their computational efficiency, despite often lacking in capturing complex interactions. While query-item cross interaction features are paramount for full ranking, integrating them into pre-ranking models presents efficiency-related challenges. In this paper, we introduce InteractRank, a novel two tower pre-ranking model with robust cross interaction features used at Pinterest. By incorporating historical user engagement-based query-item interactions in the scoring function along with the two tower dot product, InteractRank significantly boosts pre-ranking performance with minimal latency and computation costs. In real-world A/B experiments at Pinterest, InteractRank improves the online engagement metric by 6.5% over a BM25 baseline and by 3.7% over a vanilla two tower baseline. We also highlight other components of InteractRank, like real-time user-sequence modeling, and analyze their contributions through offline ablation studies. The code for InteractRank is available at https://github.com/pinterest/atg-research/tree/main/InteractRank.
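A minimal sketch (assumed form, PyTorch) of adding an engagement-based cross interaction term to a two-tower score; the feature shapes and the log1p transform are illustrative choices, not Pinterest's production model.

```python
import torch
import torch.nn as nn

class TwoTowerWithCross(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.query_tower = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 32))
        self.item_tower = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 32))
        self.w_cross = nn.Parameter(torch.tensor(1.0))   # weight on the cross feature

    def forward(self, q_feat, i_feat, engagement):
        q = self.query_tower(q_feat)
        i = self.item_tower(i_feat)
        dot = (q * i).sum(dim=-1)                        # standard two-tower score
        # Historical query-item engagement enters as a cheap cross interaction term.
        return dot + self.w_cross * torch.log1p(engagement)

model = TwoTowerWithCross(dim=16)
score = model(torch.randn(4, 16), torch.randn(4, 16), torch.tensor([0., 3., 12., 1.]))
print(score.shape)
```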
Updated: 2025-04-09 06:13:58
标题: InteractRank: 具有交互特征的个性化网络规模搜索预排序
摘要: 现代搜索系统使用多阶段架构有效地提供个性化结果。关键阶段包括检索、预排序、全排序和混合,将数十亿个项目细化为前几个选择。预排序阶段对于将数十万个项目的评分和过滤降至几千个至关重要,通常依赖于两个塔模型,因为它们的计算效率高,尽管常常缺乏捕捉复杂互动的能力。虽然查询-项目交互特征对于全排序至关重要,但将它们整合到预排序模型中会带来与效率相关的挑战。在本文中,我们介绍了InteractRank,这是一个新颖的两塔预排序模型,其中包含在Pinterest上使用的强大交叉互动特征。通过将基于历史用户参与的查询-项目互动整合到评分函数中,以及两塔点积,InteractRank显著提高了预排序性能,同时具有最小延迟和计算成本。在Pinterest的实际 A/B 实验中,InteractRank相比于BM25基线提高了6.5%的在线参与度指标,比起基本的两塔基线提高了3.7%。我们还突出了InteractRank的其他组件,如实时用户序列建模,并通过离线消融研究分析了它们的贡献。InteractRank的代码可以在https://github.com/pinterest/atg-research/tree/main/InteractRank 上找到。
更新时间: 2025-04-09 06:13:58
领域: cs.IR,cs.AI,cs.LG,H.3.3
Effective Method for Inverse Ising Problem under Missing Observations in Restricted Boltzmann Machines
Restricted Boltzmann machines (RBMs) are energy-based models analogous to the Ising model and are widely applied in statistical machine learning. The standard inverse Ising problem with a complete dataset requires computing both data and model expectations and is computationally challenging because model expectations have a combinatorial explosion. Furthermore, in many applications, the available datasets are partially incomplete, making it difficult to compute even data expectations. In this study, we propose an approximation framework for these expectations in practical inverse Ising problems that integrates mean-field approximation or persistent contrastive divergence to generate refined initial points, and spatial Monte Carlo integration to enhance estimator accuracy. We demonstrate that the proposed method tunes the model parameters more effectively and accurately than the conventional method.
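A minimal sketch of persistent contrastive divergence for a binary RBM, one of the two initial-point generators mentioned above (biases and the spatial Monte Carlo refinement are omitted for brevity).

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

nv, nh, chains, lr = 20, 10, 100, 0.05
W = 0.01 * rng.normal(size=(nv, nh))
v_pers = rng.integers(0, 2, size=(chains, nv)).astype(float)  # persistent Gibbs chains

def gibbs_step(v):
    ph = sigmoid(v @ W)
    h = (rng.random(ph.shape) < ph).astype(float)   # sample hidden units
    pv = sigmoid(h @ W.T)
    return (rng.random(pv.shape) < pv).astype(float)  # sample visible units

data = rng.integers(0, 2, size=(500, nv)).astype(float)
for _ in range(100):
    v_pers = gibbs_step(v_pers)                      # chains persist across updates
    pos = data.T @ sigmoid(data @ W) / len(data)     # data expectation
    neg = v_pers.T @ sigmoid(v_pers @ W) / chains    # model expectation from chains
    W += lr * (pos - neg)
```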
Updated: 2025-04-09 06:05:02
标题: 缺失观测数据下受限玻尔兹曼机中逆伊辛问题的有效方法
摘要: 受限玻尔兹曼机(RBMs)是一种基于能量的模型,类似于Ising模型,在统计机器学习中被广泛应用。具有完整数据集的标准逆Ising问题需要同时计算数据期望和模型期望,由于模型期望存在组合爆炸,计算具有挑战性。此外,在许多应用中,可用的数据集部分不完整,使得即使计算数据期望也很困难。在本研究中,我们针对实际逆Ising问题中的这些期望提出了一个近似框架,该框架整合了均场近似或持续对比散度来生成更精细的初始点,以及空间蒙特卡洛积分来提高估计器的准确性。我们证明了所提出的方法相比传统方法能更有效且准确地调整模型参数。
更新时间: 2025-04-09 06:05:02
领域: stat.ML,cond-mat.dis-nn,cs.LG,physics.data-an
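For readers unfamiliar with the baseline the paper refines, a minimal numpy sketch of one persistent contrastive divergence (PCD) update for an RBM follows; the paper's mean-field initialization and spatial Monte Carlo integration refinements are not reproduced here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pcd_step(W, b, c, data, chain, lr=0.01, rng=np.random.default_rng(0)):
    """One persistent-contrastive-divergence update of an RBM with
    weights W (nv, nh), visible bias b, hidden bias c. `chain` is the
    persistent fantasy batch used for the model expectation."""
    # data expectation <v h> under p(h | v_data)
    ph_data = sigmoid(data @ W + c)
    pos = data.T @ ph_data / len(data)
    # one Gibbs sweep on the persistent chain for the model expectation
    ph = sigmoid(chain @ W + c)
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = sigmoid(h @ W.T + b)
    chain = (rng.random(pv.shape) < pv).astype(float)
    ph_model = sigmoid(chain @ W + c)
    neg = chain.T @ ph_model / len(chain)
    # gradient ascent on the log-likelihood
    W += lr * (pos - neg)
    b += lr * (data.mean(0) - chain.mean(0))
    c += lr * (ph_data.mean(0) - ph_model.mean(0))
    return W, b, c, chain
```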
Automated Business Process Analysis: An LLM-Based Approach to Value Assessment
Business processes are fundamental to organizational operations, yet their optimization remains challenging due to the time-consuming nature of manual process analysis. Our paper harnesses Large Language Models (LLMs) to automate value-added analysis, a qualitative process analysis technique that aims to identify steps in the process that do not deliver value. To date, this technique is predominantly manual, time-consuming, and subjective. Our method offers a more principled approach which operates in two phases: first, decomposing high-level activities into detailed steps to enable granular analysis, and second, performing a value-added analysis to classify each step according to Lean principles. This approach enables systematic identification of waste while maintaining the semantic understanding necessary for qualitative analysis. We develop our approach using 50 business process models, for which we collect and publish manual ground-truth labels. Our evaluation, comparing zero-shot baselines with more structured prompts, reveals (a) a consistent benefit of structured prompting and (b) promising performance for both tasks. We discuss the potential for LLMs to augment human expertise in qualitative process analysis while reducing the time and subjectivity inherent in manual approaches.
Updated: 2025-04-09 05:52:50
标题: 自动化业务流程分析:基于LLM的价值评估方法
摘要: 商业流程是组织运营的基础,但由于手动流程分析的耗时性质,它们的优化仍然具有挑战性。我们的论文利用大型语言模型(LLMs)来自动化增值分析,这是一种定性流程分析技术,旨在识别流程中不提供价值的步骤。迄今为止,这种技术主要是手动的、耗时的和主观的。我们的方法提供了一种更有原则的方法,它分为两个阶段:首先,将高层活动分解为详细步骤,以实现细粒度分析;其次,进行增值分析,根据精益原则对每个步骤进行分类。这种方法能够系统地识别浪费,同时保持定性分析所需的语义理解。我们使用50个商业流程模型开发了我们的方法,并收集和发布了手动标注的基准真值标签。我们的评估将零样本基线与更结构化的提示进行比较,结果显示(a)结构化提示具有一致的优势,以及(b)两项任务均有良好的表现。我们讨论了LLMs在定性流程分析中增强人类专业知识的潜力,同时减少手动方法中固有的时间和主观性。
更新时间: 2025-04-09 05:52:50
领域: cs.CL,cs.AI,cs.SE
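The two-phase pipeline can be sketched as two prompt calls per activity. `call_llm` is a hypothetical stand-in for any chat-completion client, and the VA/BVA/NVA label set is the common Lean taxonomy, assumed here rather than taken from the paper:

```python
# `call_llm` is a hypothetical helper standing in for any chat-completion API.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def decompose(activity: str) -> list[str]:
    """Phase 1: break a high-level activity into granular steps."""
    out = call_llm(
        "Decompose the following business-process activity into "
        f"numbered, fine-grained steps:\n{activity}"
    )
    return [line.strip() for line in out.splitlines() if line.strip()]

def classify(step: str) -> str:
    """Phase 2: label each step following Lean value-added analysis.
    VA / BVA / NVA is the common Lean taxonomy; the paper's exact
    label set and prompt wording may differ."""
    return call_llm(
        "Classify this process step as VA (value-adding), "
        "BVA (business value-adding), or NVA (non-value-adding, waste). "
        f"Answer with one label only.\nStep: {step}"
    )

def value_added_analysis(activity: str) -> dict[str, str]:
    return {step: classify(step) for step in decompose(activity)}
```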
Neuron-level Balance between Stability and Plasticity in Deep Reinforcement Learning
In contrast to the human ability to continuously acquire knowledge, agents struggle with the stability-plasticity dilemma in deep reinforcement learning (DRL), which refers to the trade-off between retaining existing skills (stability) and learning new knowledge (plasticity). Current methods focus on balancing these two aspects at the network level, lacking sufficient differentiation and fine-grained control of individual neurons. To overcome this limitation, we propose Neuron-level Balance between Stability and Plasticity (NBSP) method, by taking inspiration from the observation that specific neurons are strongly relevant to task-relevant skills. Specifically, NBSP first (1) defines and identifies RL skill neurons that are crucial for knowledge retention through a goal-oriented method, and then (2) introduces a framework by employing gradient masking and experience replay techniques targeting these neurons to preserve the encoded existing skills while enabling adaptation to new tasks. Numerous experimental results on the Meta-World and Atari benchmarks demonstrate that NBSP significantly outperforms existing approaches in balancing stability and plasticity.
Updated: 2025-04-09 05:43:30
标题: 深度强化学习中神经元水平上稳定性与可塑性的平衡
摘要: 与人类不断获取知识的能力相反,代理人在深度强化学习(DRL)中面临着稳定性-可塑性困境,这指的是保留现有技能(稳定性)和学习新知识(可塑性)之间的权衡。当前的方法侧重于在网络层面平衡这两个方面,缺乏对个体神经元的足够区分和细粒度控制。为了克服这一局限性,我们提出了神经元水平的稳定性和可塑性平衡(NBSP)方法,灵感来自于特定神经元与任务相关技能之间的强相关性的观察。具体来说,NBSP首先(1)通过面向目标的方法定义和识别对知识保留至关重要的RL技能神经元,然后(2)通过采用梯度屏蔽和经验重放技术针对这些神经元引入一个框架,以保留编码的现有技能同时使其适应新任务。在Meta-World和Atari基准测试上的大量实验结果表明,NBSP在平衡稳定性和可塑性方面明显优于现有方法。
更新时间: 2025-04-09 05:43:30
领域: cs.AI,cs.LG
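The gradient-masking half of NBSP can be approximated in a few lines of PyTorch: once skill neurons are identified, zero their gradient rows so new-task updates leave them intact. Neuron identification and experience replay are omitted, and the hook-based mechanism below is an assumption about one way to implement the idea, not the paper's code:

```python
import torch
import torch.nn as nn

def protect_skill_neurons(layer: nn.Linear, skill_idx: list[int]) -> None:
    """Zero the gradient rows of previously identified RL 'skill neurons'
    so new-task updates cannot overwrite them."""
    mask = torch.ones_like(layer.weight)
    mask[skill_idx, :] = 0.0                    # freeze incoming weights
    layer.weight.register_hook(lambda g: g * mask)
    if layer.bias is not None:
        bmask = torch.ones_like(layer.bias)
        bmask[skill_idx] = 0.0
        layer.bias.register_hook(lambda g: g * bmask)

# usage: neurons 3 and 7 of a hidden layer keep their old skill
layer = nn.Linear(16, 32)
protect_skill_neurons(layer, [3, 7])
loss = layer(torch.randn(4, 16)).pow(2).mean()
loss.backward()
assert layer.weight.grad[3].abs().sum() == 0  # masked rows stay untouched
```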
Corrected with the Latest Version: Make Robust Asynchronous Federated Learning Possible
As an emerging paradigm of federated learning, asynchronous federated learning offers significant speed advantages over traditional synchronous federated learning. Unlike synchronous federated learning, which requires waiting for all clients to complete updates before aggregation, asynchronous federated learning aggregates models as they arrive in real time, greatly improving training speed. However, this mechanism also introduces the issue of client model version inconsistency. When the differences between models of different versions during aggregation become too large, conflicts may arise, thereby reducing the model's accuracy. To address this issue, this paper proposes an asynchronous federated learning version correction algorithm based on knowledge distillation, named FedADT. FedADT applies knowledge distillation before aggregating gradients, using the latest global model to correct outdated information, thus effectively reducing the negative impact of outdated gradients on the training process. Additionally, FedADT introduces an adaptive weighting function that adjusts the knowledge distillation weight according to the stage of training, which helps mitigate the misleading effects caused by the poorer performance of the global model in the early stages of training. This method significantly improves the overall performance of asynchronous federated learning without adding excessive computational overhead. We conducted experimental comparisons with several classical algorithms, and the results demonstrate that FedADT achieves significant improvements over other asynchronous methods and outperforms all compared methods in terms of convergence speed.
Updated: 2025-04-09 05:42:03
标题: 使用最新版本进行校正:使得异步联邦学习更加稳健
摘要: 作为一种新兴的联邦学习范式,异步联邦学习相比传统的同步联邦学习具有显著的速度优势。与需要等待所有客户端完成更新后再进行聚合的同步联邦学习不同,异步联邦学习在实时聚合已到达的模型,大大提高了训练速度。然而,这种机制也引入了客户端模型版本不一致的问题。当聚合过程中不同版本模型之间的差异变得太大时,可能会导致冲突,从而降低模型的准确性。为解决这一问题,本文提出了一种基于知识蒸馏的异步联邦学习版本校正算法,命名为FedADT。FedADT在聚合梯度之前应用知识蒸馏,利用最新的全局模型来纠正过时信息,有效降低过时梯度对训练过程的负面影响。此外,FedADT引入了一个自适应加权函数,根据训练的不同阶段调整知识蒸馏权重,有助于减轻早期训练阶段全局模型性能较差导致的误导效果。该方法显著提升了异步联邦学习的整体性能,而不增加过多的计算开销。我们与几种经典算法进行了实验比较,结果表明FedADT在收敛速度方面显著优于其他异步方法,并在所有方法中表现最好。
更新时间: 2025-04-09 05:42:03
领域: cs.LG,cs.DC
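A hedged sketch of the client-side loss described above: task cross-entropy blended with distillation toward the latest global model, where the distillation weight ramps up as training progresses. The ramp schedule below is an assumption, not the paper's exact function:

```python
import torch
import torch.nn.functional as F

def adaptive_kd_weight(round_t: int, warmup: int = 50, w_max: float = 0.5) -> float:
    """Assumed schedule: the distillation weight grows as the global
    model matures, so an immature global model cannot mislead early
    training."""
    return w_max * min(1.0, round_t / warmup)

def fedadt_client_loss(student_logits, labels, global_logits, round_t, T=2.0):
    """Stale-client loss = task CE + adaptive KD toward the *latest*
    global model's predictions, correcting outdated information."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(global_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    a = adaptive_kd_weight(round_t)
    return (1 - a) * ce + a * kd

# toy usage with random logits for a 10-class task
s, g = torch.randn(4, 10), torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(fedadt_client_loss(s, y, g, round_t=10))
```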
EzSQL: An SQL intermediate representation for improving SQL-to-text Generation
The SQL-to-text generation task traditionally uses template base, Seq2Seq, tree-to-sequence, and graph-to-sequence models. Recent models take advantage of pre-trained generative language models for this task in the Seq2Seq framework. However, treating SQL as a sequence of inputs to the pre-trained models is not optimal. In this work, we put forward a new SQL intermediate representation called EzSQL to align SQL with the natural language text sequence. EzSQL simplifies the SQL queries and brings them closer to natural language text by modifying operators and keywords, which can usually be described in natural language. EzSQL also removes the need for set operators. Our proposed SQL-to-text generation model uses EzSQL as the input to a pre-trained generative language model for generating the text descriptions. We demonstrate that our model is an effective state-of-the-art method to generate text narrations from SQL queries on the WikiSQL and Spider datasets. We also show that by generating pretraining data using our SQL-to-text generation model, we can enhance the performance of Text-to-SQL parsers.
Updated: 2025-04-09 05:40:29
标题: EzSQL: 用于改善SQL到文本生成的SQL中间表示
摘要: 传统上,SQL到文本生成任务通常使用基于模板的、Seq2Seq、树到序列和图到序列模型。最近的模型利用预训练生成性语言模型在Seq2Seq框架中执行此任务。然而,将SQL作为输入序列到预训练模型并不是最佳选择。在这项工作中,我们提出了一种新的SQL中间表示形式,称为EzSQL,以使SQL与自然语言文本序列对齐。EzSQL通过修改运算符和关键字简化SQL查询,并使其更接近自然语言文本,这些关键字通常可以用自然语言描述。EzSQL还消除了集合运算符的需求。我们提出的SQL到文本生成模型使用EzSQL作为输入到预训练生成性语言模型,用于生成文本描述。我们展示了我们的模型是从WikiSQL和Spider数据集的SQL查询生成文本描述的有效领先方法。我们还展示,通过使用我们的SQL到文本生成模型生成预训练数据,可以提高Text-to-SQL解析器的性能。
更新时间: 2025-04-09 05:40:29
领域: cs.CL,cs.AI
NAPER: Fault Protection for Real-Time Resource-Constrained Deep Neural Networks
Fault tolerance in Deep Neural Networks (DNNs) deployed on resource-constrained systems presents unique challenges for high-accuracy applications with strict timing requirements. Memory bit-flips can severely degrade DNN accuracy, while traditional protection approaches like Triple Modular Redundancy (TMR) often sacrifice accuracy to maintain reliability, creating a three-way dilemma between reliability, accuracy, and timeliness. We introduce NAPER, a novel protection approach that addresses this challenge through ensemble learning. Unlike conventional redundancy methods, NAPER employs heterogeneous model redundancy, where diverse models collectively achieve higher accuracy than any individual model. This is complemented by an efficient fault detection mechanism and a real-time scheduler that prioritizes meeting deadlines by intelligently scheduling recovery operations without interrupting inference. Our evaluations demonstrate NAPER's superiority: 40% faster inference in both normal and fault conditions, maintained accuracy 4.2% higher than TMR-based strategies, and guaranteed uninterrupted operation even during fault recovery. NAPER effectively balances the competing demands of accuracy, reliability, and timeliness in real-time DNN applications
Updated: 2025-04-09 05:37:54
标题: NAPER:面向实时资源受限的深度神经网络的故障保护
摘要: 深度神经网络(DNNs)在资源受限系统上部署时的容错性面临独特挑战,特别是对于具有严格时间要求的高精度应用。内存位翻转会严重降低DNN的准确性,而传统的保护方法如三重模块冗余(TMR)通常会牺牲准确性来保持可靠性,从而在可靠性、准确性和及时性之间造成三方面困境。我们介绍了NAPER,一种通过集成学习解决这一挑战的新型保护方法。与传统的冗余方法不同,NAPER采用异构模型冗余,多样化的模型共同实现比任何单一模型更高的准确性。这得到了一个高效的故障检测机制和一个实时调度器的补充,后者通过智能调度恢复操作来优先满足截止期限,而不会中断推理。我们的评估表明NAPER的优越性:在正常和故障条件下推理速度提高40%,维持的准确性比基于TMR的策略高4.2%,并且即使在故障恢复期间也保证不间断运行。NAPER有效地平衡了实时DNN应用程序中准确性、可靠性和及时性之间的竞争需求。
更新时间: 2025-04-09 05:37:54
领域: cs.LG,cs.SY,eess.SY
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
Process-level Reward Models (PRMs) are crucial for complex reasoning and decision-making tasks, where each intermediate step plays an important role in the reasoning process. Since language models are prone to various types of errors during the reasoning process, PRMs are required to possess nuanced capabilities for detecting various implicit error types in real-world scenarios. However, current benchmarks primarily focus on step correctness, failing to evaluate PRMs' performance systematically. To address this gap, we introduce PRMBench, a process-level benchmark specifically designed to assess the fine-grained error detection capabilities of PRMs. PRMBench comprises 6,216 carefully designed problems and 83,456 step-level labels, evaluating models across multiple dimensions, including simplicity, soundness, and sensitivity. In our experiments on 15 models, spanning both open-source PRMs and closed-source large language models prompted as critic models, we uncover significant weaknesses in current PRMs. These findings underscore the challenges inherent in process-level evaluation and highlight key directions for future research. We hope PRMBench can be a robust bench for advancing research on PRM evaluation and development.
Updated: 2025-04-09 05:29:30
标题: PRMBench:一个细粒度且具有挑战性的过程级奖励模型基准
摘要: 过程级奖励模型(PRMs)对于复杂推理和决策任务至关重要,其中每个中间步骤在推理过程中扮演着重要角色。由于语言模型在推理过程中容易出现各种类型的错误,因此需要PRMs具有细微的能力来检测现实世界场景中各种隐含错误类型。然而,当前的基准主要关注步骤的正确性,未能系统评估PRMs的性能。为填补这一差距,我们引入了PRMBench,一个专门设计用于评估PRMs细粒度错误检测能力的过程级基准。PRMBench包括6,216个精心设计的问题和83,456个步骤级标签,评估模型在多个维度上,包括简单性、合理性和敏感性。在对15个模型进行的实验中,涵盖了开源PRMs和作为评论模型的封闭源语言模型,我们发现当前PRMs存在显著弱点。这些发现突显了过程级评估中固有的挑战,并强调了未来研究的关键方向。我们希望PRMBench能成为推进PRM评估和发展研究的坚实基准。
更新时间: 2025-04-09 05:29:30
领域: cs.CL,cs.AI,cs.LG
CAFE-AD: Cross-Scenario Adaptive Feature Enhancement for Trajectory Planning in Autonomous Driving
Imitation learning based planning tasks on the nuPlan dataset have gained great interest due to their potential to generate human-like driving behaviors. However, open-loop training on the nuPlan dataset tends to cause causal confusion during closed-loop testing, and the dataset also presents a long-tail distribution of scenarios. These issues introduce challenges for imitation learning. To tackle these problems, we introduce CAFE-AD, a Cross-Scenario Adaptive Feature Enhancement for Trajectory Planning in Autonomous Driving method, designed to enhance feature representation across various scenario types. We develop an adaptive feature pruning module that ranks feature importance to capture the most relevant information while reducing the interference of noisy information during training. Moreover, we propose a cross-scenario feature interpolation module that enhances scenario information to introduce diversity, enabling the network to alleviate over-fitting in dominant scenarios. We evaluate our method CAFE-AD on the challenging public nuPlan Test14-Hard closed-loop simulation benchmark. The results demonstrate that CAFE-AD outperforms state-of-the-art methods including rule-based and hybrid planners, and exhibits the potential in mitigating the impact of long-tail distribution within the dataset. Additionally, we further validate its effectiveness in real-world environments. The code and models will be made available at https://github.com/AlniyatRui/CAFE-AD.
Updated: 2025-04-09 05:16:29
标题: CAFE-AD:用于自动驾驶轨迹规划的跨场景自适应特征增强
摘要: 基于模仿学习的nuPlan数据集上的规划任务因其产生类似人类驾驶行为的潜力而受到极大关注。然而,在nuPlan数据集上的开环训练往往会在闭环测试期间导致因果混淆,该数据集还呈现出长尾分布的场景。这些问题为模仿学习带来了挑战。为了解决这些问题,我们引入了CAFE-AD,一种用于自动驾驶轨迹规划的跨场景自适应特征增强方法,旨在增强各种场景类型的特征表示。我们开发了一个自适应特征修剪模块,对特征重要性进行排名,以捕捉最相关的信息,同时减少训练过程中嘈杂信息的干扰。此外,我们提出了一个跨场景特征插值模块,增强场景信息以引入多样性,使网络能够缓解主导场景中的过拟合。我们在具有挑战性的公共nuPlan Test14-Hard闭环模拟基准上评估了我们的方法CAFE-AD。结果表明,CAFE-AD优于基于规则和混合规划者的最新方法,并展示了在数据集中减轻长尾分布影响的潜力。此外,我们进一步验证了其在真实环境中的有效性。代码和模型将在https://github.com/AlniyatRui/CAFE-AD 上提供。
更新时间: 2025-04-09 05:16:29
领域: cs.RO,cs.LG
Learning Generalizable Features for Tibial Plateau Fracture Segmentation Using Masked Autoencoder and Limited Annotations
Accurate automated segmentation of tibial plateau fractures (TPF) from computed tomography (CT) requires large amounts of annotated data to train deep learning models, but obtaining such annotations presents unique challenges. The process demands expert knowledge to identify diverse fracture patterns, assess severity, and account for individual anatomical variations, making the annotation process highly time-consuming and expensive. Although semi-supervised learning methods can utilize unlabeled data, existing approaches often struggle with the complexity and variability of fracture morphologies, as well as limited generalizability across datasets. To tackle these issues, we propose an effective training strategy based on masked autoencoder (MAE) for the accurate TPF segmentation in CT. Our method leverages MAE pretraining to capture global skeletal structures and fine-grained fracture details from unlabeled data, followed by fine-tuning with a small set of labeled data. This strategy reduces the dependence on extensive annotations while enhancing the model's ability to learn generalizable and transferable features. The proposed method is evaluated on an in-house dataset containing 180 CT scans with TPF. Experimental results demonstrate that our method consistently outperforms semi-supervised methods, achieving an average Dice similarity coefficient (DSC) of 95.81%, average symmetric surface distance (ASSD) of 1.91mm, and Hausdorff distance (95HD) of 9.42mm with only 20 annotated cases. Moreover, our method exhibits strong transferability when applying to another public pelvic CT dataset with hip fractures, highlighting its potential for broader applications in fracture segmentation tasks.
Updated: 2025-04-09 05:15:50
标题: 使用掩蔽自动编码器和有限注释学习胫骨平台骨折分割的通用特征
摘要: 准确自动分割胫骨平台骨折(TPF)的计算机断层扫描(CT)需要大量的注释数据来训练深度学习模型,但获取这样的注释数据面临独特挑战。该过程需要专家知识来识别不同的骨折模式,评估严重程度,并考虑个体解剖变异,使得注释过程非常耗时且昂贵。虽然半监督学习方法可以利用无标签数据,但现有方法常常难以应对骨折形态的复杂性和可变性,以及跨数据集的有限泛化能力。为了解决这些问题,我们提出了基于掩模自编码器(MAE)的有效训练策略,用于CT中准确的TPF分割。我们的方法利用MAE预训练来捕捉全局骨骼结构和细粒度的骨折细节,随后通过少量标记数据进行微调。这种策略减少了对广泛注释的依赖,同时增强了模型学习可泛化和可转移特征的能力。所提出的方法在一个包含180个带有TPF的CT扫描的内部数据集上进行评估。实验结果表明,我们的方法始终优于半监督方法,达到了平均Dice相似系数(DSC)为95.81%,平均对称表面距离(ASSD)为1.91mm,Hausdorff距离(95HD)为9.42mm,仅使用20个标记案例。此外,我们的方法在应用于另一个公共盆骨CT数据集与髋骨骨折时表现出强大的可转移性,突显了其在骨折分割任务中更广泛应用的潜力。
更新时间: 2025-04-09 05:15:50
领域: eess.IV,cs.AI,cs.CV
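The pretraining step the paper builds on can be illustrated with a toy masked autoencoder over image patches; real MAE encodes only the visible patches with a ViT encoder, which this sketch simplifies away:

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Minimal masked-autoencoder pretraining step on flattened patches.
    Masked positions are kept as zeros here to stay short; the loss is
    computed only on masked patches, as in MAE."""
    def __init__(self, patch_dim=256, width=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(patch_dim, width), nn.GELU())
        self.dec = nn.Linear(width, patch_dim)

    def forward(self, patches, mask_ratio=0.75):
        mask = torch.rand(patches.shape[:2], device=patches.device) < mask_ratio
        visible = patches * (~mask).unsqueeze(-1)   # zero out masked patches
        recon = self.dec(self.enc(visible))
        return ((recon - patches) ** 2)[mask].mean()

model = TinyMAE()
loss = model(torch.randn(2, 64, 256))   # (batch, patches, patch_dim)
loss.backward()
```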
Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis
Large language models (LLMs) offer a promising pre-screening tool, improving early disease detection and providing enhanced healthcare access for underprivileged communities. The early diagnosis of various diseases continues to be a significant challenge in healthcare, primarily due to the nonspecific nature of early symptoms, the shortage of expert medical practitioners, and the need for prolonged clinical evaluations, all of which can delay treatment and adversely affect patient outcomes. With impressive accuracy in prediction across a range of diseases, LLMs have the potential to revolutionize clinical pre-screening and decision-making for various medical conditions. In this work, we study the diagnostic capability of LLMs for Rheumatoid Arthritis (RA) with real world patients data. Patient data was collected alongside diagnoses from medical experts, and the performance of LLMs was evaluated in comparison to expert diagnoses for RA disease prediction. We notice an interesting pattern in disease diagnosis and find an unexpected \textit{misalignment between prediction and explanation}. We conduct a series of multi-round analyses using different LLM agents. The best-performing model accurately predicts rheumatoid arthritis (RA) diseases approximately 95\% of the time. However, when medical experts evaluated the reasoning generated by the model, they found that nearly 68\% of the reasoning was incorrect. This study highlights a clear misalignment between LLMs high prediction accuracy and its flawed reasoning, raising important questions about relying on LLM explanations in clinical settings. \textbf{LLMs provide incorrect reasoning to arrive at the correct answer for RA disease diagnosis.}
Updated: 2025-04-09 05:04:01
标题: 正确预测,错误推理:揭示在类风湿关节炎诊断中LLM不一致
摘要: 大型语言模型(LLMs)提供了一种有前途的预筛查工具,可以改善早期疾病检测,并为弱势社区提供更好的医疗服务可及性。各种疾病的早期诊断在医疗保健领域仍然是一个重大挑战,主要原因是早期症状的非特异性、专业医疗从业者的短缺,以及需要进行延长的临床评估,所有这些都可能延误治疗并对患者的预后产生不利影响。LLMs在各种疾病预测方面显示出令人印象深刻的准确性,有潜力彻底改变各种医疗条件的临床预筛查和决策制定方式。在这项工作中,我们使用真实世界患者数据研究了LLMs对类风湿关节炎(RA)的诊断能力。患者数据与医疗专家的诊断一起收集,并将LLMs的性能与专家诊断进行比较,以进行RA疾病预测。我们发现了一种有趣的疾病诊断模式,并发现了一种意外的\textit{预测与解释之间的不一致}。我们使用不同的LLM代理进行了一系列多轮分析。表现最佳的模型大约95\%的时间可以准确预测类风湿关节炎(RA)疾病。然而,当医疗专家评估模型生成的推理时,他们发现将近68\%的推理是不正确的。这项研究突显了LLMs高预测准确性与其错误推理之间的明显不一致,引发了关于在临床环境中依赖LLM解释的重要问题。\textbf{LLMs提供了不正确的推理来得出RA疾病诊断的正确答案。}
更新时间: 2025-04-09 05:04:01
领域: cs.AI
Exploring Ordinal Bias in Action Recognition for Instructional Videos
Action recognition models have achieved promising results in understanding instructional videos. However, they often rely on dominant, dataset-specific action sequences rather than true video comprehension, a problem that we define as ordinal bias. To address this issue, we propose two effective video manipulation methods: Action Masking, which masks frames of frequently co-occurring actions, and Sequence Shuffling, which randomizes the order of action segments. Through comprehensive experiments, we demonstrate that current models exhibit significant performance drops when confronted with nonstandard action sequences, underscoring their vulnerability to ordinal bias. Our findings emphasize the importance of rethinking evaluation strategies and developing models capable of generalizing beyond fixed action patterns in diverse instructional videos.
Updated: 2025-04-09 05:03:51
标题: 在教学视频中探索动作识别中的序数偏差
摘要: 动作识别模型在理解教学视频方面取得了令人期待的结果。然而,它们通常依赖于特定数据集的主导动作序列,而不是真正的视频理解,我们将这一问题定义为序数偏差。为了解决这个问题,我们提出了两种有效的视频操作方法:动作屏蔽(Action Masking),即屏蔽频繁共现动作的帧;以及序列洗牌(Sequence Shuffling),即随机打乱动作片段的顺序。通过全面的实验,我们证明了当前模型在面对非标准动作序列时表现出显著的性能下降,突出了它们对序数偏差的脆弱性。我们的发现强调了重新思考评估策略的重要性,以及开发能够在多样化教学视频中超越固定动作模式进行泛化的模型的必要性。
更新时间: 2025-04-09 05:03:51
领域: cs.CV,cs.AI
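Both manipulations are simple list operations over annotated segments; a sketch under assumed data structures (the paper's exact masking statistics may differ):

```python
import random

def sequence_shuffle(segments, seed=0):
    """Randomize the order of action segments in a video annotation;
    each segment is e.g. ('crack egg', start_frame, end_frame)."""
    rng = random.Random(seed)
    out = list(segments)
    rng.shuffle(out)
    return out

def action_mask(frames, segments, frequent_actions):
    """Blank the frames of actions that frequently co-occur with others
    in the dataset. `frequent_actions` is assumed to be precomputed
    from co-occurrence statistics."""
    masked = list(frames)
    for action, start, end in segments:
        if action in frequent_actions:
            for t in range(start, min(end, len(frames))):
                masked[t] = None   # stand-in for a blanked frame
    return masked
```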
Attributes-aware Visual Emotion Representation Learning
Visual emotion analysis or recognition has gained considerable attention due to the growing interest in understanding how images can convey rich semantics and evoke emotions in human perception. However, visual emotion analysis poses distinctive challenges compared to traditional vision tasks, especially due to the intricate relationship between general visual features and the different affective states they evoke, known as the affective gap. Researchers have used deep representation learning methods to address this challenge of extracting generalized features from entire images. However, most existing methods overlook the importance of specific emotional attributes such as brightness, colorfulness, scene understanding, and facial expressions. Through this paper, we introduce A4Net, a deep representation network to bridge the affective gap by leveraging four key attributes: brightness (Attribute 1), colorfulness (Attribute 2), scene context (Attribute 3), and facial expressions (Attribute 4). By fusing and jointly training all aspects of attribute recognition and visual emotion analysis, A4Net aims to provide a better insight into emotional content in images. Experimental results show the effectiveness of A4Net, showcasing competitive performance compared to state-of-the-art methods across diverse visual emotion datasets. Furthermore, visualizations of activation maps generated by A4Net offer insights into its ability to generalize across different visual emotion datasets.
Updated: 2025-04-09 05:00:43
标题: 基于属性的视觉情感表示学习
摘要: 视觉情感分析或识别受到了广泛关注,因为人们越来越关注了解图像如何传达丰富的语义并在人类感知中引发情感。然而,与传统视觉任务相比,视觉情感分析提出了独特的挑战,特别是由于一般视觉特征和它们引发的不同情感状态之间错综复杂的关系,即所谓的情感差距。研究人员已经使用深度表示学习方法来解决从整个图像中提取泛化特征的挑战。然而,大多数现有方法忽视了特定情感属性的重要性,如亮度、色彩、场景理解和面部表情。通过本文,我们介绍了A4Net,这是一个深度表示网络,通过利用亮度(属性1)、色彩度(属性2)、场景上下文(属性3)和面部表情(属性4)这四个关键属性来弥合情感差距。通过融合和联合训练属性识别和视觉情感分析的所有方面,A4Net旨在提供对图像中情感内容的更好洞察。实验结果显示了A4Net的有效性,展示了其在各种视觉情感数据集上与最先进方法相比的竞争性表现。此外,由A4Net生成的激活图的可视化提供了对其在不同视觉情感数据集上泛化能力的洞察。
更新时间: 2025-04-09 05:00:43
领域: cs.CV,cs.AI,cs.MM
Bypassing Safety Guardrails in LLMs Using Humor
In this paper, we show it is possible to bypass the safety guardrails of large language models (LLMs) through a humorous prompt including the unsafe request. In particular, our method does not edit the unsafe request and follows a fixed template -- it is simple to implement and does not need additional LLMs to craft prompts. Extensive experiments show the effectiveness of our method across different LLMs. We also show that both removing and adding more humor to our method can reduce its effectiveness -- excessive humor possibly distracts the LLM from fulfilling its unsafe request. Thus, we argue that LLM jailbreaking occurs when there is a proper balance between focus on the unsafe request and presence of humor.
Updated: 2025-04-09 04:58:14
标题: 利用幽默绕过LLM的安全防护栏
摘要: 在这篇论文中,我们展示了通过一个包含不安全请求的幽默提示,可以绕过大型语言模型(LLMs)的安全防护栏。特别是,我们的方法不编辑不安全请求,并遵循一个固定的模板--这种方法易于实现,不需要额外的LLMs来制作提示。大量实验证明了我们方法在不同LLMs上的有效性。我们还表明,删除或增加更多幽默元素到我们的方法都会降低其有效性--过度的幽默可能会分散LLM的注意力,使其无法满足不安全请求。因此,我们认为当在不安全请求和幽默的存在之间找到合适的平衡时,LLM越狱就会发生。
更新时间: 2025-04-09 04:58:14
领域: cs.CL,cs.LG
Prompting or Fine-tuning? Exploring Large Language Models for Causal Graph Validation
This study explores the capability of Large Language Models (LLMs) to evaluate causality in causal graphs generated by conventional statistical causal discovery methods-a task traditionally reliant on manual assessment by human subject matter experts. To bridge this gap in causality assessment, LLMs are employed to evaluate the causal relationships by determining whether a causal connection between variable pairs can be inferred from textual context. Our study compares two approaches: (1) prompting-based method for zero-shot and few-shot causal inference and, (2) fine-tuning language models for the causal relation prediction task. While prompt-based LLMs have demonstrated versatility across various NLP tasks, our experiments on biomedical and general-domain datasets show that fine-tuned models consistently outperform them, achieving up to a 20.5-point improvement in F1 score-even when using smaller-parameter language models. These findings provide valuable insights into the strengths and limitations of both approaches for causal graph evaluation.
Updated: 2025-04-09 04:44:48
标题: 提示还是微调?探索用于因果图验证的大型语言模型
摘要: 这项研究探讨了大型语言模型(LLMs)评估由传统统计因果发现方法生成的因果图中因果关系的能力,这一任务传统上依赖于人类领域专家的手动评估。为了弥合因果评估中的这一差距,LLMs被用来通过判断是否可以从文本上下文中推断出变量对之间的因果联系来评估因果关系。我们的研究比较了两种方法:(1)基于提示的零样本和少样本因果推理方法,以及(2)针对因果关系预测任务微调语言模型。虽然基于提示的LLMs已经展示了在各种NLP任务中的多功能性,但我们在生物医学和通用领域数据集上的实验表明,微调模型始终优于它们,即使使用较小参数的语言模型也能实现高达20.5点的F1分数提升。这些发现为因果图评估的两种方法的优势和局限性提供了宝贵的见解。
更新时间: 2025-04-09 04:44:48
领域: cs.CL,cs.AI
Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning
Watermarking has emerged as a promising technique for detecting texts generated by LLMs. Current research has primarily focused on three design criteria: high quality of the watermarked text, high detectability, and robustness against removal attack. However, the security against spoofing attacks remains relatively understudied. For example, a piggyback attack can maliciously alter the meaning of watermarked text-transforming it into hate speech-while preserving the original watermark, thereby damaging the reputation of the LLM provider. We identify two core challenges that make defending against spoofing difficult: (1) the need for watermarks to be both sensitive to semantic-distorting changes and insensitive to semantic-preserving edits, and (2) the contradiction between the need to detect global semantic shifts and the local, auto-regressive nature of most watermarking schemes. To address these challenges, we propose a semantic-aware watermarking algorithm that post-hoc embeds watermarks into a given target text while preserving its original meaning. Our method introduces a semantic mapping model, which guides the generation of a green-red token list, contrastively trained to be sensitive to semantic-distorting changes and insensitive to semantic-preserving changes. Experiments on two standard benchmarks demonstrate strong robustness against removal attacks and security against spoofing attacks, including sentiment reversal and toxic content insertion, while maintaining high watermark detectability. Our approach offers a significant step toward more secure and semantically aware watermarking for LLMs. Our code is available at https://github.com/UCSB-NLP-Chang/contrastive-watermark.
Updated: 2025-04-09 04:38:17
标题: 使用对比表示学习防御LLM水印技术免受欺骗攻击
摘要: 数字水印技术已经成为一种检测由LLMs生成的文本的有希望的技术。当前的研究主要集中在三个设计标准上:数字水印文本的高质量、高可检测性以及对删除攻击的鲁棒性。然而,对抗欺骗攻击的安全性仍然相对不够研究。例如,一个附带攻击可以恶意改变数字水印文本的含义,将其转化为仇恨言论,同时保留原始水印,从而损害LLM提供者的声誉。我们确定了两个核心挑战,使得防御欺骗变得困难:(1)数字水印需要对语义扭曲性变化敏感,对保持语义的编辑不敏感;(2)需要检测全局语义转变与大多数数字水印方案的局部自回归性质之间的矛盾。为了解决这些挑战,我们提出了一种语义感知的数字水印算法,后期将水印嵌入到给定目标文本中,同时保持其原始含义。我们的方法引入了一个语义映射模型,指导生成一个经过对比训练,对语义扭曲变化敏感而对保持语义变化不敏感的绿红标记列表。对两个标准基准的实验表明,我们的方法对删除攻击具有很强的鲁棒性,对抗欺骗攻击的安全性也很高,包括情感逆转和有毒内容插入,同时保持高水印可检测性。我们的方法为更安全和语义感知的LLMs数字水印提供了重要的一步。我们的代码可在https://github.com/UCSB-NLP-Chang/contrastive-watermark找到。
更新时间: 2025-04-09 04:38:17
领域: cs.CR,cs.CL
Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
Large Language Models (LLMs) have been shown to demonstrate imbalanced biases against certain groups. However, the study of unprovoked targeted attacks by LLMs towards at-risk populations remains underexplored. Our paper presents three novel contributions: (1) the explicit evaluation of LLM-generated attacks on highly vulnerable mental health groups; (2) a network-based framework to study the propagation of relative biases; and (3) an assessment of the relative degree of stigmatization that emerges from these attacks. Our analysis of a recently released large-scale bias audit dataset reveals that mental health entities occupy central positions within attack narrative networks, as revealed by a significantly higher mean centrality of closeness (p-value = 4.06e-10) and dense clustering (Gini coefficient = 0.7). Drawing from sociological foundations of stigmatization theory, our stigmatization analysis indicates increased labeling components for mental health disorder-related targets relative to initial targets in generation chains. Taken together, these insights shed light on the structural predilections of large language models to heighten harmful discourse and highlight the need for suitable approaches for mitigation.
Updated: 2025-04-09 04:24:38
标题: 穿越兔子洞:面向心理健康群体的LLM生成的攻击叙事中的新兴偏见
摘要: 大型语言模型(LLMs)已被证明存在对某些群体的不平衡偏见。然而,对LLMs对处于风险人群进行无端攻击的研究仍未得到充分探讨。我们的论文提出了三个新颖的贡献:(1)明确评估LLM生成的对高度脆弱的心理健康群体的攻击;(2)一个基于网络的框架,用于研究相对偏见的传播;(3)评估这些攻击所产生的相对污名化程度。我们对最近发布的大规模偏见审计数据集进行的分析表明,心理健康实体在攻击叙事网络中占据了核心位置,这表现在接近度的平均中心性显著较高(p值=4.06e-10)和密集聚类(基尼系数=0.7)。借鉴污名化理论的社会学基础,我们的污名化分析表明,相对于生成链中的初始目标,与心理健康障碍相关的目标存在增加的标签化成分。综合而言,这些见解揭示了大型语言模型加剧有害言论的结构偏好,并突显了对缓解的适当方法的需要。
更新时间: 2025-04-09 04:24:38
领域: cs.CL,cs.AI,cs.CY,cs.LG,cs.SI,J.4; K.4.1; K.4.2
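The two reported statistics are standard graph measures. Below is a sketch of how one would compute them with networkx on a toy graph; the karate-club graph is a placeholder for the paper's attack-narrative network, and applying the Gini coefficient to clustering coefficients is one reading of the abstract, not a confirmed detail:

```python
import networkx as nx
import numpy as np

def gini(values):
    """Gini coefficient of a 1-D array (0 = equal, 1 = concentrated)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

G = nx.karate_club_graph()                      # toy stand-in network
closeness = nx.closeness_centrality(G)
print("mean closeness:", np.mean(list(closeness.values())))
print("clustering Gini:", gini(list(nx.clustering(G).values())))
```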
DDT: Decoupled Diffusion Transformer
Diffusion transformers have demonstrated remarkable generation quality, albeit requiring longer training iterations and numerous inference steps. In each denoising step, diffusion transformers encode the noisy inputs to extract the lower-frequency semantic component and then decode the higher frequency with identical modules. This scheme creates an inherent optimization dilemma: encoding low-frequency semantics necessitates reducing high-frequency components, creating tension between semantic encoding and high-frequency decoding. To resolve this challenge, we propose a new \textbf{\color{ddt}D}ecoupled \textbf{\color{ddt}D}iffusion \textbf{\color{ddt}T}ransformer~(\textbf{\color{ddt}DDT}), with a decoupled design of a dedicated condition encoder for semantic extraction alongside a specialized velocity decoder. Our experiments reveal that a more substantial encoder yields performance improvements as model size increases. For ImageNet $256\times256$, Our DDT-XL/2 achieves a new state-of-the-art performance of {1.31 FID}~(nearly $4\times$ faster training convergence compared to previous diffusion transformers). For ImageNet $512\times512$, Our DDT-XL/2 achieves a new state-of-the-art FID of 1.28. Additionally, as a beneficial by-product, our decoupled architecture enhances inference speed by enabling the sharing self-condition between adjacent denoising steps. To minimize performance degradation, we propose a novel statistical dynamic programming approach to identify optimal sharing strategies.
Updated: 2025-04-09 04:23:38
标题: DDT:解耦扩散变压器
摘要: 扩散变压器展示了出色的生成质量,尽管需要较长的训练迭代和大量的推理步骤。在每个去噪步骤中,扩散变压器对嘈杂的输入进行编码,提取低频语义组件,然后使用相同的模块解码高频部分。这种方案造成了一个固有的优化困境:编码低频语义需要减少高频分量,在语义编码和高频解码之间产生了紧张关系。为了解决这一挑战,我们提出了一种新的分离扩散变压器(DDT),具有专门的条件编码器用于语义提取以及专门的速度解码器的分离设计。我们的实验证明,随着模型规模的增加,更强大的编码器可以提高性能。对于ImageNet 256×256,我们的DDT-XL/2实现了1.31的新的最先进性能(与先前的扩散变压器相比,训练收敛速度几乎快4倍)。对于ImageNet 512×512,我们的DDT-XL/2实现了1.28的新的最先进性能。此外,作为一个有益的副产品,我们的分离架构通过在相邻的去噪步骤之间共享自条件来提高推理速度。为了最小化性能降级,我们提出了一种新颖的统计动态规划方法来确定最佳的共享策略。
更新时间: 2025-04-09 04:23:38
领域: cs.CV,cs.AI
UMGAD: Unsupervised Multiplex Graph Anomaly Detection
Graph anomaly detection (GAD) is a critical task in graph machine learning, with the primary objective of identifying anomalous nodes that deviate significantly from the majority. This task is widely applied in various real-world scenarios, including fraud detection and social network analysis. However, existing GAD methods still face two major challenges: (1) They are often limited to detecting anomalies in single-type interaction graphs and struggle with multiple interaction types in multiplex heterogeneous graphs. (2) In unsupervised scenarios, selecting appropriate anomaly score thresholds remains a significant challenge for accurate anomaly detection. To address the above challenges, we propose a novel Unsupervised Multiplex Graph Anomaly Detection method, named UMGAD. We first learn multi-relational correlations among nodes in multiplex heterogeneous graphs and capture anomaly information during node attribute and structure reconstruction through graph-masked autoencoder (GMAE). Then, to further extract abnormal information, we generate attribute-level and subgraph-level augmented-view graphs, respectively, and perform attribute and structure reconstruction through GMAE. Finally, we learn to optimize node attributes and structural features through contrastive learning between original-view and augmented-view graphs to improve the model's ability to capture anomalies. Meanwhile, we propose a new anomaly score threshold selection strategy, which allows the model to be independent of ground truth information in real unsupervised scenarios. Extensive experiments on six datasets show that our UMGAD significantly outperforms state-of-the-art methods, achieving average improvements of 12.25% in AUC and 11.29% in Macro-F1 across all datasets.
Updated: 2025-04-09 04:11:23
标题: UMGAD:无监督多重图异常检测
摘要: 图异常检测(GAD)是图机器学习中的一个关键任务,其主要目标是识别与大多数节点明显偏离的异常节点。这项任务被广泛应用于各种现实场景,包括欺诈检测和社交网络分析。然而,现有的GAD方法仍然面临两个主要挑战:(1)它们通常仅限于检测单一类型交互图中的异常,并且在多重交互类型的多层异构图中存在困难。(2)在无监督场景中,选择合适的异常评分阈值仍然是准确检测异常的重要挑战。为了解决上述挑战,我们提出了一种新颖的无监督多层图异常检测方法,命名为UMGAD。我们首先学习多重关系图中节点之间的关联,并通过图掩码自编码器(GMAE)在节点属性和结构重建过程中捕获异常信息。然后,为了进一步提取异常信息,我们分别生成属性级和子图级的增强视图图,并通过GMAE执行属性和结构重建。最后,我们通过原始视图和增强视图图之间的对比学习来优化节点属性和结构特征,以提高模型捕获异常的能力。与此同时,我们提出了一种新的异常评分阈值选择策略,使模型能够在真实无监督场景中独立于地面真相信息。对六个数据集进行的大量实验表明,我们的UMGAD明显优于最先进的方法,在所有数据集中平均提高了12.25%的AUC和11.29%的Macro-F1。
更新时间: 2025-04-09 04:11:23
领域: cs.LG
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Recent advances in automated theorem proving (ATP) through LLMs have highlighted the potential of formal reasoning with Lean 4 code. However, ATP has not yet been revolutionized by the recent posttraining scaling demonstrated by OpenAI o1/o3 and DeepSeek R1. In this work, we investigate the entire posttraining pipeline of ATP, aiming to align it with breakthroughs in reasoning models for natural language. To begin, we continually train current ATP models on a hybrid dataset, which consists of numerous statement-proof pairs and additional data aimed at incorporating cognitive behaviors that emulate human reasoning and hypothesis refinement. Next, we explore reinforcement learning using the outcome reward returned by the Lean 4 compiler. Through our designed continual training and reinforcement learning processes, we have successfully improved existing formal provers, including both DeepSeek-Prover-v1.5 and Goedel-Prover, achieving state-of-the-art performance in whole-proof generation. For example, we achieve a 59.8% pass rate (pass@32) on MiniF2F. This is an ongoing project and we will progressively update our findings and release our data and training details.
Updated: 2025-04-09 04:03:00
标题: Leanabell-Prover:形式推理中的后训练缩放
摘要: 最近关于自动定理证明(ATP)的LLMs的技术进步突显了使用Lean 4代码进行形式推理的潜力。然而,ATP尚未像Open AI O1/O3和Deepseek R1所示的最近的后训练扩展那样彻底革新。在这项工作中,我们调查了ATP的整个后训练过程,旨在将其与自然语言推理模型的突破相一致。首先,我们使用包含大量语句-证明对和旨在融入模拟人类推理和假设精化的认知行为的额外数据的混合数据集来不断训练当前的ATP模型。接下来,我们探索了通过Lean 4编译器返回的结果奖励来实现强化学习。通过我们设计的持续训练和强化学习过程,我们成功改进了现有的形式证明器,包括DeepSeek-Prover-v1.5和Goedel-Prover,实现了在整个证明生成领域的最新性能。例如,我们在MiniF2F上实现了59.8%的通过率(pass@32)。这是一个正在进行中的项目,我们将逐步更新我们的发现,发布我们的数据和培训细节。
更新时间: 2025-04-09 04:03:00
领域: cs.AI
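The pass@32 figure is presumably computed with the standard unbiased pass@k estimator (Chen et al., 2021), shown below; whether Leanabell-Prover uses exactly this estimator is an assumption:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n generated proofs is correct, given that c of
    the n were accepted (e.g. by the Lean 4 compiler)."""
    if n - c < k:          # too few failures to fill a k-subset
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# e.g. only 2 of 64 sampled proofs compile, yet 32 draws still give a
# decent chance of including a correct one
print(pass_at_k(64, 2, 32))   # ~0.75
```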
Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure
Financial scenario simulation is essential for risk management and portfolio optimization, yet it remains challenging especially in high-dimensional and small data settings common in finance. We propose a diffusion factor model that integrates latent factor structure into generative diffusion processes, bridging econometrics with modern generative AI to address the challenges of the curse of dimensionality and data scarcity in financial simulation. By exploiting the low-dimensional factor structure inherent in asset returns, we decompose the score function--a key component in diffusion models--using time-varying orthogonal projections, and this decomposition is incorporated into the design of neural network architectures. We derive rigorous statistical guarantees, establishing nonasymptotic error bounds for both score estimation at O(d^{5/2} n^{-2/(k+5)}) and generated distribution at O(d^{5/4} n^{-1/2(k+5)}), primarily driven by the intrinsic factor dimension k rather than the number of assets d, surpassing the dimension-dependent limits in the classical nonparametric statistics literature and making the framework viable for markets with thousands of assets. Numerical studies confirm superior performance in latent subspace recovery under small data regimes. Empirical analysis demonstrates the economic significance of our framework in constructing mean-variance optimal portfolios and factor portfolios. This work presents the first theoretical integration of factor structure with diffusion models, offering a principled approach for high-dimensional financial simulation with limited data.
Updated: 2025-04-09 04:01:35
标题: 扩散因子模型:使用因子结构生成高维收益
摘要: 财务场景模拟对于风险管理和投资组合优化至关重要,但在金融领域常见的高维度和小数据情况下仍然具有挑战性。我们提出了一种扩散因子模型,将潜在因子结构整合到生成性扩散过程中,将计量经济学与现代生成人工智能相结合,以解决金融模拟中维度诅咒和数据稀缺性的挑战。通过利用资产收益中固有的低维因子结构,我们使用时间变化的正交投影来分解得分函数--扩散模型中的关键组成部分,并将这种分解纳入神经网络架构的设计中。我们推导了严格的统计保证,建立了得分估计的非渐近误差界为O(d^{5/2} n^{-2/(k+5)})和生成分布的非渐近误差界为O(d^{5/4} n^{-1/2(k+5)}),主要由固有因子维度k驱动,而不是资产数量d,超越了经典非参数统计学文献中的维度相关限制,并使该框架适用于拥有成千上万资产的市场。数值研究证实在小数据情况下对潜在子空间恢复的卓越性能。经验分析展示了我们框架在构建均值-方差最优投资组合和因子投资组合方面的经济意义。这项工作首次将因子结构与扩散模型进行了理论整合,为具有有限数据的高维度金融模拟提供了一种合理的方法。
更新时间: 2025-04-09 04:01:35
领域: q-fin.ST,cs.LG,q-fin.MF
Discovering Influential Neuron Path in Vision Transformers
Vision Transformer models exhibit immense power yet remain opaque to human understanding, posing challenges and risks for practical applications. While prior research has attempted to demystify these models through input attribution and neuron role analysis, there's been a notable gap in considering layer-level information and the holistic path of information flow across layers. In this paper, we investigate the significance of influential neuron paths within vision Transformers, which is a path of neurons from the model input to output that impacts the model inference most significantly. We first propose a joint influence measure to assess the contribution of a set of neurons to the model outcome. And we further provide a layer-progressive neuron locating approach that efficiently selects the most influential neuron at each layer trying to discover the crucial neuron path from input to output within the target model. Our experiments demonstrate the superiority of our method finding the most influential neuron path along which the information flows, over the existing baseline solutions. Additionally, the neuron paths have illustrated that vision Transformers exhibit some specific inner working mechanism for processing the visual information within the same image category. We further analyze the key effects of these neurons on the image classification task, showcasing that the found neuron paths have already preserved the model capability on downstream tasks, which may also shed some lights on real-world applications like model pruning. The project website including implementation code is available at https://foundation-model-research.github.io/NeuronPath/.
Updated: 2025-04-09 03:53:25
标题: 发现视觉变压器中具有影响力的神经元路径
摘要: 视觉Transformer模型展示了巨大的能力,但对人类理解仍然不透明,给实际应用带来了挑战和风险。虽然先前的研究已经尝试通过输入归因和神经元角色分析来揭示这些模型,但在考虑层级信息和跨层信息流的整体路径方面存在明显的差距。在本文中,我们研究了视觉Transformer中具有影响力的神经元路径的重要性,这是从模型输入到输出的一条神经元路径,对模型推断产生最显著影响。我们首先提出了一个联合影响度量来评估一组神经元对模型结果的贡献。我们进一步提供了一种逐层递进的神经元定位方法,有效地选择每一层中最有影响力的神经元,试图发现目标模型中从输入到输出的关键神经元路径。我们的实验表明,我们的方法在找到信息流动的最具影响力的神经元路径方面优于现有基准解决方案。此外,神经元路径展示了视觉Transformer在处理同一图像类别内的视觉信息时表现出一些特定的内部工作机制。我们进一步分析了这些神经元对图像分类任务的关键作用,展示了找到的神经元路径已经在下游任务中保留了模型的能力,这也可能为模型修剪等实际应用提供一些启示。该项目网站包括实现代码可在https://foundation-model-research.github.io/NeuronPath/找到。
更新时间: 2025-04-09 03:53:25
领域: cs.CV,cs.AI,cs.LG
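The layer-progressive search can be sketched as a greedy loop. The joint influence measure is the paper's key ingredient and is left as a caller-supplied function here, with a trivial stand-in so the sketch runs; none of this reproduces the paper's actual measure:

```python
import torch

@torch.no_grad()
def greedy_neuron_path(layers, x, influence):
    """Layer-progressive search: at each layer keep the single neuron
    whose addition to the path maximizes an influence score.
    `influence(path, activations, j)` is caller-supplied."""
    path = []
    for li, layer in enumerate(layers):
        x = layer(x)
        best = max(range(x.shape[-1]), key=lambda j: influence(path, x, j))
        path.append((li, best))
    return path

# toy usage: score a neuron by its mean absolute activation, purely to
# make the sketch runnable end to end
layers = [torch.nn.Linear(8, 8) for _ in range(3)]
toy_influence = lambda path, acts, j: acts[..., j].abs().mean().item()
print(greedy_neuron_path(layers, torch.randn(4, 8), toy_influence))
```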
Security Vulnerabilities in Ethereum Smart Contracts: A Systematic Analysis
Smart contracts are secure and trustworthy applications that play a vital role in decentralized applications in various fields such as insurance, the internet, and gaming. However, in recent years, smart contract security breaches have occurred frequently, and due to their financial properties, they have caused huge economic losses; the most famous security incident, "The DAO", caused a loss of over $60 million worth of Ether on Ethereum. This has drawn a lot of attention from all sides, and writing secure smart contracts is now a critical issue. This paper focuses on Ethereum smart contracts and explains the main components of Ethereum, smart contract architecture, and mechanisms. Working in the Ethereum environment with the Remix online compilation platform and the Solidity language, and drawing on four security incidents (BeautyChain (BEC), The DAO, Parity, and KotET), the principles of integer overflow attacks, reentrancy attacks, access control attacks, and denial-of-service attacks are studied and analyzed; the scenarios of these vulnerabilities are reproduced, and preventive measures are given. In addition, the principles of short address attacks, early transaction (front-running) attacks, and privileged function exposure attacks are introduced in detail, and security measures are proposed. As vulnerabilities continue to emerge, their classification will also evolve. The analysis and research of current vulnerabilities lays a solid foundation for avoiding future ones.
Updated: 2025-04-09 03:47:55
标题: 以太坊智能合约中的安全漏洞:系统性分析
摘要: 智能合约是一种安全可信的应用程序,在保险、互联网和游戏等各个领域的去中心化应用中发挥着至关重要的作用。然而,近年来,智能合约安全漏洞频繁发生,由于其金融属性,造成了巨大的经济损失,比如最著名的安全事件"The DAO"导致以太坊损失超过6000万美元。这引起了各方的极大关注。编写安全的智能合约现在是一个关键问题。本文重点介绍以太坊智能合约,解释了以太坊的主要组成部分、智能合约架构和机制。本文在以太坊环境下,使用Remix在线编译平台和Solidity语言,根据美链(BEC)、The DAO、Parity和KotET四个安全事件,研究和分析了整数溢出攻击、重入攻击、访问控制攻击和拒绝服务攻击的原理,重现了这些漏洞的场景,并给出了预防措施。此外,还详细介绍了短地址攻击、抢先交易攻击和特权函数暴露攻击的原理,并提出了安全措施。随着漏洞不断出现,它们的分类也将不断发展。对当前漏洞的分析和研究也为避免更多漏洞奠定了坚实的基础。
更新时间: 2025-04-09 03:47:55
领域: cs.CR,D.2.4
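The reentrancy flaw behind The DAO can be modeled in a few lines of Python: the vulnerable "contract" makes its external call before updating its own state, so the callee can re-enter withdraw. This is a behavioral model only, not Solidity code:

```python
class VulnerableBank:
    """Python model of the reentrancy flaw: the external call
    (`receiver.on_receive`) happens *before* the balance update,
    so a malicious receiver can withdraw repeatedly."""
    def __init__(self):
        self.balances = {}

    def withdraw(self, who, receiver):
        amount = self.balances.get(who, 0)
        if amount > 0:
            receiver.on_receive(self, who)   # external call first (bug)
            self.balances[who] = 0           # state update comes too late

class Attacker:
    def __init__(self, depth=3):
        self.depth, self.stolen = depth, 0
    def on_receive(self, bank, who):
        self.stolen += bank.balances[who]
        self.depth -= 1
        if self.depth > 0:
            bank.withdraw(who, self)         # re-enter before zeroing

bank = VulnerableBank()
bank.balances["victim"] = 100
atk = Attacker()
bank.withdraw("victim", atk)
print(atk.stolen)  # 300: drained 3x; the checks-effects-interactions
                   # pattern (zero the balance before the call) fixes it
```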
TabKAN: Advancing Tabular Data Analysis using Kolmogorov-Arnold Network
Tabular data analysis presents unique challenges due to its heterogeneous feature types, missing values, and complex interactions. While traditional machine learning methods, such as gradient boosting, often outperform deep learning approaches, recent advancements in neural architectures offer promising alternatives. This paper introduces TabKAN, a novel framework that advances tabular data modeling using Kolmogorov-Arnold Networks (KANs). Unlike conventional deep learning models, KANs leverage learnable activation functions on edges, enhancing both interpretability and training efficiency. Our contributions include: (1) the introduction of modular KAN-based architectures tailored for tabular data analysis, (2) the development of a transfer learning framework for KAN models, enabling effective knowledge transfer between domains, (3) the development of model-specific interpretability for tabular data learning, reducing reliance on post hoc and model-agnostic analysis, and (4) comprehensive evaluation of vanilla supervised learning across binary and multi-class classification tasks. Through extensive benchmarking on diverse public datasets, TabKAN demonstrates superior performance in supervised learning while significantly outperforming classical and Transformer-based models in transfer learning scenarios. Our findings highlight the advantage of KAN-based architectures in efficiently transferring knowledge across domains, bridging the gap between traditional machine learning and deep learning for structured data.
Updated: 2025-04-09 03:46:10
标题: TabKAN:利用Kolmogorov-Arnold网络推动表格数据分析
摘要: 表格数据分析面临独特挑战,因其异质特征类型、缺失值和复杂交互。虽然传统机器学习方法(如梯度提升)通常优于深度学习方法,但最近神经网络架构的进步提供了有希望的替代方案。本文介绍了TabKAN,一个利用科尔莫哥洛夫-阿诺德网络(KANs)推进表格数据建模的新框架。与传统深度学习模型不同,KANs利用边缘上可学习的激活函数,提高了可解释性和训练效率。我们的贡献包括:(1)针对表格数据分析定制的模块化KAN架构的介绍,(2)为KAN模型开发了迁移学习框架,实现领域间的有效知识转移,(3)为表格数据学习开发了模型特定的可解释性,减少对事后和模型无关分析的依赖,以及(4)在二元和多类分类任务上进行了全面评估。通过在多样公共数据集上进行广泛基准测试,TabKAN在监督学习中展现出卓越性能,同时在迁移学习场景中明显优于传统和基于Transformer的模型。我们的发现突显了KAN架构在跨领域高效传递知识的优势,弥合了传统机器学习和深度学习在结构化数据方面的差距。
更新时间: 2025-04-09 03:46:10
领域: cs.LG
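The core KAN idea, a learnable scalar function on every edge instead of a scalar weight, fits in a short PyTorch module. The fixed four-function basis below is a toy substitute for the B-splines used in practice:

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """Kolmogorov-Arnold-style layer: each edge (i -> o) carries its own
    learnable 1-D function f_{o,i}, parameterized here as learnable
    coefficients over a fixed 4-function basis (n_basis must stay 4)."""
    def __init__(self, d_in, d_out, n_basis=4):
        super().__init__()
        self.coef = nn.Parameter(torch.randn(d_out, d_in, n_basis) * 0.1)

    def basis(self, x):                      # x: (batch, d_in)
        return torch.stack([x, x ** 2, torch.sin(x), torch.tanh(x)], dim=-1)

    def forward(self, x):
        phi = self.basis(x)                  # (batch, d_in, n_basis)
        # output_o = sum_i f_{o,i}(x_i): contract input and basis dims
        return torch.einsum("bik,oik->bo", phi, self.coef)

layer = ToyKANLayer(8, 3)
print(layer(torch.randn(5, 8)).shape)        # torch.Size([5, 3])
```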
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models (LLMs) with human preferences. To learn the reward function, most existing RLHF algorithms use the Bradley-Terry model, which relies on assumptions about human preferences that may not reflect the complexity and variability of real-world judgments. In this paper, we propose a robust algorithm to enhance the performance of existing approaches under such reward model misspecifications. Theoretically, our algorithm reduces the variance of reward and policy estimators, leading to improved regret bounds. Empirical evaluations on LLM benchmark datasets demonstrate that the proposed algorithm consistently outperforms existing methods, with 77-81% of responses being favored over baselines on the Anthropic Helpful and Harmless dataset.
Updated: 2025-04-09 03:41:09
标题: 基于人类反馈的鲁棒强化学习用于大型语言模型微调
摘要: 基于人类反馈的强化学习(RLHF)已经成为将大型语言模型(LLMs)的输出与人类偏好对齐的关键技术。为了学习奖励函数,大多数现有的RLHF算法使用Bradley-Terry模型,这种模型依赖于对人类偏好的假设,可能无法反映真实世界判断的复杂性和多变性。在本文中,我们提出了一种鲁棒的算法,以增强现有方法在这种奖励模型设定错误(misspecification)情况下的性能。从理论上讲,我们的算法减少了奖励和策略估计器的方差,从而得到改进的遗憾界。在LLM基准数据集上的实证评估表明,所提出的算法始终优于现有方法,在Anthropic Helpful and Harmless数据集上有77-81%的响应优于基线。
更新时间: 2025-04-09 03:41:09
领域: stat.ML,cs.AI,cs.LG
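For reference, the Bradley-Terry objective that the paper robustifies is the standard pairwise reward-model loss; the paper's robust estimator itself is not reproduced here:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor):
    """Standard Bradley-Terry reward-model objective used by most RLHF
    pipelines: maximize P(chosen > rejected) = sigmoid(r_c - r_r)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# toy batch of reward-model scores for preferred/dispreferred responses
loss = bradley_terry_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))
print(loss)  # small when chosen responses already score higher
```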
Understanding Users' Security and Privacy Concerns and Attitudes Towards Conversational AI Platforms
The widespread adoption of conversational AI platforms has introduced new security and privacy risks. While these risks and their mitigation strategies have been extensively researched from a technical perspective, users' perceptions of these platforms' security and privacy remain largely unexplored. In this paper, we conduct a large-scale analysis of over 2.5M user posts from the r/ChatGPT Reddit community to understand users' security and privacy concerns and attitudes toward conversational AI platforms. Our qualitative analysis reveals that users are concerned about each stage of the data lifecycle (i.e., collection, usage, and retention). They seek mitigations for security vulnerabilities, compliance with privacy regulations, and greater transparency and control in data handling. We also find that users exhibit varied behaviors and preferences when interacting with these platforms. Some users proactively safeguard their data and adjust privacy settings, while others prioritize convenience over privacy risks, dismissing privacy concerns in favor of benefits, or feel resigned to inevitable data sharing. Through qualitative content and regression analysis, we discover that users' concerns evolve over time with the evolving AI landscape and are influenced by technological developments and major events. Based on our findings, we provide recommendations for users, platforms, enterprises, and policymakers to enhance transparency, improve data controls, and increase user trust and adoption.
Updated: 2025-04-09 03:22:48
标题: 理解用户对话AI平台的安全和隐私担忧以及态度
摘要: 对话式人工智能平台的广泛应用引入了新的安全和隐私风险。尽管这些风险及其缓解策略在技术层面上得到了广泛研究,但用户对这些平台的安全和隐私的看法仍然未被深入探讨。本文通过对Reddit社区r/ChatGPT中超过250万用户帖子的大规模分析,以了解用户对对话式人工智能平台的安全和隐私关注和态度。我们的定性分析显示,用户对数据生命周期的每个阶段(即收集、使用和保留)都感到担忧。他们寻求安全漏洞的缓解措施,遵守隐私法规,并在数据处理中获得更多透明度和控制。我们还发现,用户在与这些平台互动时表现出不同的行为和偏好。一些用户会主动保护其数据并调整隐私设置,而其他人则将便利置于隐私风险之上,忽视隐私问题而选择利益,或认为数据共享是不可避免的。通过定性内容和回归分析,我们发现用户的关注随着人工智能领域的发展而逐渐演变,并受到技术发展和重大事件的影响。根据我们的研究结果,我们为用户、平台、企业和决策者提供建议,以增强透明度、改进数据控制,并增加用户信任和采纳。
更新时间: 2025-04-09 03:22:48
领域: cs.CR,cs.CY
SketchRef: a Multi-Task Evaluation Benchmark for Sketch Synthesis
Sketching is a powerful artistic technique for capturing essential visual information about real-world objects and has increasingly attracted attention in image synthesis research. However, the field lacks a unified benchmark to evaluate the performance of various synthesis methods. To address this, we propose SketchRef, the first comprehensive multi-task evaluation benchmark for sketch synthesis. SketchRef fully leverages the shared characteristics between sketches and reference photos. It introduces two primary tasks: category prediction and structural consistency estimation, the latter being largely overlooked in previous studies. These tasks are further divided into five sub-tasks across four domains: animals, common things, human body, and faces. Recognizing the inherent trade-off between recognizability and simplicity in sketches, we are the first to quantify this balance by introducing a recognizability calculation method constrained by simplicity, mRS, ensuring fair and meaningful evaluations. To validate our approach, we collected 7,920 responses from art enthusiasts, confirming the effectiveness of our proposed evaluation metrics. Additionally, we evaluate the performance of existing sketch synthesis methods on our benchmark, highlighting their strengths and weaknesses. We hope this study establishes a standardized benchmark and offers valuable insights for advancing sketch synthesis algorithms.
Updated: 2025-04-09 03:18:01
标题: SketchRef:一个用于草图合成的多任务评估基准
摘要: Sketching是一种强大的艺术技术,用于捕捉关于真实世界对象的基本视觉信息,并在图像合成研究中越来越受到关注。然而,该领域缺乏统一的基准来评估各种合成方法的性能。为了解决这个问题,我们提出了SketchRef,这是第一个全面的多任务评估基准,用于素描合成。SketchRef充分利用了素描和参考照片之间的共同特征。它引入了两个主要任务:类别预测和结构一致性估计,后者在先前的研究中大多被忽视。这些任务进一步分为四个领域的五个子任务:动物、常见物品、人体和面部。鉴于素描中辨识度和简单性之间的固有权衡,我们是第一个通过引入一种受简单性限制的辨识度计算方法mRS来量化这种平衡,确保公平和有意义的评估。为了验证我们的方法,我们从艺术爱好者那里收集了7,920个回应,证实了我们提出的评估指标的有效性。此外,我们评估了现有素描合成方法在我们的基准上的性能,突出它们的优势和劣势。我们希望这项研究建立了一个标准化的基准,并为推进素描合成算法提供了有价值的见解。
更新时间: 2025-04-09 03:18:01
领域: cs.CV,cs.AI
Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
Foundation models that are capable of automating cognitive tasks represent a pivotal technological shift, yet their societal implications remain unclear. These systems promise exciting advances, yet they also risk flooding our information ecosystem with formulaic, homogeneous, and potentially misleading synthetic content. Developing benchmarks grounded in real use cases where these risks are most significant is therefore critical. Through a thematic analysis using 2 million language model user prompts, we identify creative composition tasks as a prevalent usage category where users seek help with personal tasks that require everyday creativity. Our fine-grained analysis identifies mismatches between current benchmarks and usage patterns among these tasks. Crucially, we argue that the same use cases that currently lack thorough evaluations can lead to negative downstream impacts. This position paper argues that benchmarks focused on creative composition tasks is a necessary step towards understanding the societal harms of AI-generated content. We call for greater transparency in usage patterns to inform the development of new benchmarks that can effectively measure both the progress and the impacts of models with creative capabilities.
Updated: 2025-04-09 03:12:16
标题: 社会影响研究需要创意创作任务的基准
摘要: 能够自动化认知任务的基础模型代表了一个重要的技术转变,然而它们的社会影响仍然不清楚。这些系统承诺了令人兴奋的进展,但也存在用公式化、同质化和潜在误导性的合成内容淹没我们的信息生态系统的风险。因此,立足于这些风险最为突出的真实用例来开发基准至关重要。通过对200万条语言模型用户提示进行主题分析,我们发现创意创作任务是一个普遍的使用类别,用户在其中寻求需要日常创造力的个人任务的帮助。我们的细粒度分析识别了当前基准与这些任务的使用模式之间的不匹配。至关重要的是,我们认为目前缺乏彻底评估的这些用例可能会导致负面的下游影响。本立场论文认为,专注于创意创作任务的基准是理解AI生成内容的社会危害的必要一步。我们呼吁在使用模式方面提高透明度,以指导开发能够有效衡量具有创造能力的模型的进展和影响的新基准。
更新时间: 2025-04-09 03:12:16
领域: cs.CY,cs.AI
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Recent advances in large-scale generative language models have shown that reasoning capabilities can significantly improve model performance across a variety of tasks. However, the impact of reasoning on a model's ability to mitigate stereotypical responses remains largely underexplored. In this work, we investigate the crucial relationship between a model's reasoning ability and fairness, and ask whether improved reasoning capabilities can mitigate harmful stereotypical responses, especially those arising due to shallow or flawed reasoning. We conduct a comprehensive evaluation of multiple open-source LLMs, and find that larger models with stronger reasoning abilities exhibit substantially lower stereotypical bias on existing fairness benchmarks. Building on this insight, we introduce ReGiFT -- Reasoning Guided Fine-Tuning, a novel approach that extracts structured reasoning traces from advanced reasoning models and infuses them into models that lack such capabilities. We use only general-purpose reasoning and do not require any fairness-specific supervision for bias mitigation. Notably, we see that models fine-tuned using ReGiFT not only improve fairness relative to their non-reasoning counterparts but also outperform advanced reasoning models on fairness benchmarks. We also analyze how variations in the correctness of the reasoning traces and their length influence model fairness and their overall performance. Our findings highlight that enhancing reasoning capabilities is an effective, fairness-agnostic strategy for mitigating stereotypical bias caused by reasoning flaws.
Updated: 2025-04-09 03:05:13
标题: 朝着公平性的推理:通过推理引导微调减少语言模型中的偏见
摘要: 最近大规模生成语言模型的进展表明,推理能力可以显著提高模型在各种任务中的性能。然而,推理对模型减轻刻板回应的影响仍然较少被探讨。在这项工作中,我们调查了模型推理能力和公平性之间的关键关系,询问改进的推理能力是否可以减轻有害的刻板回应,特别是那些由浅薄或有缺陷的推理引起的回应。我们对多个开源大规模生成语言模型进行了全面评估,并发现具有更强推理能力的较大模型在现有公平性基准测试中表现出明显较低的刻板偏见。基于这一认识,我们引入了一种新颖的方法ReGiFT -- 推理引导微调,从先进推理模型中提取结构化推理痕迹,并将其灌输到缺乏这种能力的模型中。我们只使用通用推理,不需要任何公平性特定的监督来减轻偏见。值得注意的是,我们发现使用ReGiFT进行微调的模型不仅在公平性方面改善相对于其非推理对照组,而且在公平性基准测试中表现优于先进推理模型。我们还分析了推理痕迹正确性和长度变化如何影响模型的公平性和整体性能。我们的研究结果突显了增强推理能力是一种有效的、与公平性无关的策略,可以减轻由推理缺陷引起的刻板偏见。
更新时间: 2025-04-09 03:05:13
领域: cs.CL,cs.AI,cs.LG
Learning Latent Hardening (LLH): Enhancing Deep Learning with Domain Knowledge for Material Inverse Problems
Advancements in deep learning and machine learning have improved the ability to model complex, nonlinear relationships, such as those encountered in complex material inverse problems. However, the effectiveness of these methods often depends on large datasets, which are not always available. In this study, the incorporation of domain-specific knowledge of the mechanical behavior of material microstructures is investigated to evaluate the impact on the predictive performance of the models in data-scarce scenarios. To overcome data limitations, a two-step framework, Learning Latent Hardening (LLH), is proposed. In the first step of LLH, a Deep Neural Network is employed to reconstruct full stress-strain curves from randomly selected portions of the stress-strain curves to capture the latent mechanical response of a material based on key microstructural features. In the second step of LLH, the results of the reconstructed stress-strain curves are leveraged to predict key microstructural features of porous materials. The performance of six deep learning and/or machine learning models trained with and without domain knowledge is compared: Convolutional Neural Networks, Deep Neural Networks, Extreme Gradient Boosting, K-Nearest Neighbors, Long Short-Term Memory, and Random Forest. Models trained with domain-specific information consistently achieved higher $R^2$ values than models without prior knowledge. Models without domain knowledge missed critical patterns linking stress-strain behavior to microstructural changes, whereas domain-informed models better identified essential stress-strain features predictive of microstructure. These findings highlight the importance of integrating domain-specific knowledge with deep learning to achieve accurate outcomes in materials science.
Updated: 2025-04-09 03:04:57
标题: 学习潜在加固(LLH):利用领域知识增强深度学习用于材料反问题
摘要: 深度学习和机器学习的进展已经提高了模拟复杂、非线性关系的能力,例如在复杂材料逆问题中遇到的关系。然而,这些方法的有效性往往取决于大量数据集,而这并不总是可获得的。本研究探讨了将材料微结构的力学行为的领域特定知识纳入模型以评估在数据稀缺情况下对模型预测性能的影响。为了克服数据限制,提出了一个两步框架Learning Latent Hardening(LLH)。在LLH的第一步中,采用深度神经网络从随机选择的应力应变曲线部分重建完整的应力应变曲线,以捕捉材料基于关键微结构特征的潜在力学响应。在LLH的第二步中,利用重建的应力应变曲线的结果来预测多孔材料的关键微结构特征。比较了经过培训的具有和不具有领域知识的六个深度学习和/或机器学习模型的性能:卷积神经网络、深度神经网络、极限梯度提升、K-最近邻、长短期记忆和随机森林。具有领域特定信息的模型的结果一致显示出比没有先验知识的模型更高的$R^2$值。没有领域知识的模型错过了将应力应变行为与微结构变化联系起来的关键模式,而领域知情的模型更好地识别了预测微结构的关键应力应变特征。这些发现突显了将领域特定知识与深度学习结合以在材料科学中获得准确结果的重要性。
更新时间: 2025-04-09 03:04:57
领域: cs.LG,cond-mat.mtrl-sci,cs.CE
Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search
We present a novel symbolic reasoning engine for SQL which can efficiently generate an input $I$ for $n$ queries $P_1, \cdots, P_n$, such that their outputs on $I$ satisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each $P_i$ -- that is, a subset of $P_i$'s input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.
Updated: 2025-04-09 02:46:52
Categories: cs.PL,cs.AI,cs.DB,cs.SE
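The SMT flavor of the task can be illustrated with a toy disequivalence check using the z3-solver Python bindings; the scalar "queries" below merely stand in for Polygon's table-level encoding:

```python
from z3 import Int, Solver, sat

x = Int("x")            # stand-in for an input tuple value
q1 = x + x              # "query" P1
q2 = 3 * x              # "query" P2, a supposedly equivalent rewrite

s = Solver()
s.add(q1 != q2)         # property over the outputs, expressed in SMT
if s.check() == sat:
    print("counterexample input:", s.model()[x])  # refutes equivalence
```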
Confidence Regularized Masked Language Modeling using Text Length
Masked language modeling is a widely used method for learning language representations, where the model predicts a randomly masked word in each input. However, this approach typically considers only a single correct answer during training, ignoring the variety of plausible alternatives that humans might choose. This issue becomes more pronounced when the input text is short, as the possible word distribution tends to have higher entropy, potentially causing the model to become overconfident in its predictions. To mitigate this, we propose a novel confidence regularizer that adaptively adjusts the regularization strength based on the input length. Experiments on the GLUE and SQuAD benchmarks show that our method improves accuracy and reduces expected calibration error.
Updated: 2025-04-09 02:32:58
Categories: cs.CL,cs.AI,cs.LG
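The abstract does not give the regularizer's exact form, so the following is only a plausible sketch: an entropy-based confidence penalty whose strength decays with input length (the 1/length schedule and lambda0 are assumptions):

```python
import torch
import torch.nn.functional as F

def mlm_loss_with_confidence_reg(logits, labels, input_lengths, lambda0=0.5):
    """logits: (B, T, V); labels: (B, T) with -100 on unmasked positions;
    input_lengths: (B,) token counts per example."""
    ce = F.cross_entropy(logits.transpose(1, 2), labels, ignore_index=-100)

    # Entropy of the predicted distribution at each position.
    log_p = F.log_softmax(logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(-1)                 # (B, T)
    mask = (labels != -100).float()

    # Shorter inputs get a stronger penalty on overconfidence,
    # i.e., a stronger reward for keeping prediction entropy high.
    strength = lambda0 / input_lengths.float().clamp(min=1)  # (B,)
    reg = -(strength[:, None] * entropy * mask).sum() / mask.sum().clamp(min=1)
    return ce + reg
```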
OPAL: Encoding Causal Understanding of Physical Systems for Robot Learning
We present OPAL (Operant Physical Agent with Language), a novel vision-language-action architecture that introduces topological constraints to flow matching for robotic control. To do so, we further introduce topological attention. Our approach models action sequences as topologically-structured representations with non-trivial constraints. Experimental results across 10 complex manipulation tasks demonstrate OPAL's superior performance compared to previous approaches, including Octo, OpenVLA, and $\pi_0$. Our architecture achieves significant improvements in zero-shot performance without requiring task-specific fine-tuning, while reducing inference computational requirements by 42%. The theoretical guarantees provided by our topological approach result in more coherent long-horizon action sequences. Our results highlight the potential of constraining the search space of learning problems in robotics by deriving from fundamental physical laws, and the possibility of using topological attention to embed causal understanding into transformer architectures.
Updated: 2025-04-09 02:29:36
Categories: cs.RO,cs.AI
Lugha-Llama: Adapting Large Language Models for African Languages
Large language models (LLMs) have achieved impressive results in a wide range of natural language applications. However, they often struggle to recognize low-resource languages, in particular African languages, which are not well represented in large training corpora. In this paper, we consider how to adapt LLMs to low-resource African languages. We find that combining curated data from African languages with high-quality English educational texts results in a training mix that substantially improves the model's performance on these languages. On the challenging IrokoBench dataset, our models consistently achieve the best performance amongst similarly sized baselines, particularly on knowledge-intensive multiple-choice questions (AfriMMLU). Additionally, on the cross-lingual question answering benchmark AfriQA, our models outperform the base model by over 10%. To better understand the role of English data during training, we translate a subset of 200M tokens into Swahili language and perform an analysis which reveals that the content of these data is primarily responsible for the strong performance. We release our models and data to encourage future research on African languages.
Updated: 2025-04-09 02:25:53
Categories: cs.CL,cs.AI,cs.LG
GBG++: A Fast and Stable Granular Ball Generation Method for Classification
Granular ball computing (GBC), as an efficient, robust, and scalable learning method, has become a popular research topic in granular computing. GBC includes two stages: granular ball generation (GBG) and multi-granularity learning based on the granular ball (GB). However, the stability and efficiency of existing GBG methods need to be further improved due to their strong dependence on $k$-means or $k$-division. In addition, GB-based classifiers consider only the GB's geometric characteristics to construct classification rules, ignoring the GB's quality. Therefore, in this paper, a fast and stable GBG method (GBG++), based on the attention mechanism, is first proposed. Specifically, the proposed GBG++ method needs to calculate only the distances from the data-driven center to the undivided samples when splitting each GB, instead of randomly selecting the center and calculating the distances between it and all samples. Moreover, an outlier detection method is introduced to identify local outliers. Consequently, the GBG++ method can significantly improve effectiveness, robustness, and efficiency while being absolutely stable. Second, considering the influence of the sample size within the GB on the GB's quality, based on the GBG++ method, an improved GB-based $k$-nearest neighbors algorithm (GB$k$NN++) is presented, which can reduce misclassification at the class boundary. Finally, the experimental results indicate that the proposed method outperforms several existing GB-based classifiers and classical machine learning classifiers on $24$ public benchmark datasets. The implementation code for the experiments is available at https://github.com/CherylTse/GBG-plusplus.
Updated: 2025-04-09 02:25:03
Categories: cs.LG
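A rough sketch of the center-then-distances idea; the majority-class center and mean-distance radius below are illustrative choices, and the paper's exact splitting and outlier rules differ in detail:

```python
import numpy as np

def gbg_split(X, y):
    """One GBG++-style step (sketch): use a data-driven center and compute
    distances to the undivided samples once, instead of k-means with
    randomly selected centers. Labels y are assumed to be ints 0..k."""
    maj = np.bincount(y).argmax()
    center = X[y == maj].mean(axis=0)           # data-driven center
    dist = np.linalg.norm(X - center, axis=1)   # one pass over the samples
    radius = dist[y == maj].mean()
    inside = dist <= radius                     # this granular ball
    return center, radius, inside               # leftovers split recursively
```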
Enhancing Job Salary Prediction with Disentangled Composition Effect Modeling: A Neural Prototyping Approach
In the era of the knowledge economy, understanding how job skills influence salary is crucial for promoting recruitment with competitive salary systems and aligned salary expectations. Despite efforts on salary prediction based on job positions and talent demographics, methods that effectively discern the intricate composition effect of set-structured skills on job salary are still lacking. While recent advances in neural networks have significantly improved the accuracy of set-based quantitative modeling, their lack of explainability hinders obtaining insights into the skills' composition effects. Indeed, model explanation for set data is challenging due to its combinatorial nature, rich semantics, and unique format. To this end, in this paper, we propose a novel intrinsically explainable set-based neural prototyping approach, namely LGDESetNet, for explainable salary prediction that can reveal disentangled skill sets that impact salary from both local and global perspectives. Specifically, we propose a skill graph-enhanced disentangled discrete subset selection layer to identify multi-faceted influential input subsets with varied semantics. Furthermore, we propose a set-oriented prototype learning method to extract globally influential prototypical sets. The resulting output is transparently derived from the semantic interplay between these input subsets and global prototypes. Extensive experiments on four real-world datasets demonstrate that our method achieves superior performance over state-of-the-art baselines in salary prediction while providing explainable insights into salary-influencing patterns.
Updated: 2025-04-09 02:23:34
Categories: cs.LG
Flexible Graph Similarity Computation With A Proactive Optimization Strategy
Graph Edit Distance (GED) is an important similarity measure in graph retrieval, which quantifies the minimum cost of transforming one graph into another through edit operations, and offers flexibility by allowing customizable operation costs. Recent learning-based approaches approximate GEDs with the distances between representations in vector spaces. However, these methods often struggle with varying operation costs due to neglecting the impact of these costs on determining optimal graph mappings. Furthermore, they rely on isolated node distances as guidance, necessitating inefficient reactive refinements of mappings. To address these issues, we propose Graph Edit Network (GEN), a novel learning-based approach for flexible GED computation. By identifying the limitations of existing methods in capturing the flexibility of GED, we introduce a principled yet simple solution that incorporates the operation costs before establishing mappings. To improve matching efficiency, we propose a strategy that proactively optimizes guidance from a graph perspective. This strategy initializes guidance as each node's alignment difficulty and captures the interdependencies between matches within and across graphs through a difficulty propagation mechanism, enabling more informed decisions. As a result, GEN selects optimal matches in a single step, minimizing the need for costly refinements. Results on real-world and synthetic datasets demonstrate the effectiveness, time efficiency, and adaptability of GEN, achieving up to 37.8% error reduction and 72.7% inference time reduction compared with state-of-the-art models, while performing robustly under varying cost settings and graph sizes.
Updated: 2025-04-09 02:16:46
Categories: cs.LG,cs.AI,cs.DS
WaveHiTS: Wavelet-Enhanced Hierarchical Time Series Modeling for Wind Direction Nowcasting in Eastern Inner Mongolia
Wind direction forecasting plays a crucial role in optimizing wind energy production, but faces significant challenges due to the circular nature of directional data, error accumulation in multi-step forecasting, and complex meteorological interactions. This paper presents a novel model, WaveHiTS, which integrates wavelet transform with Neural Hierarchical Interpolation for Time Series to address these challenges. Our approach decomposes wind direction into U-V components, applies wavelet transform to capture multi-scale frequency patterns, and utilizes a hierarchical structure to model temporal dependencies at multiple scales, effectively mitigating error propagation. Experiments conducted on real-world meteorological data from Inner Mongolia, China demonstrate that WaveHiTS significantly outperforms deep learning models (RNN, LSTM, GRU), transformer-based approaches (TFT, Informer, iTransformer), and hybrid models (EMD-LSTM). The proposed model achieves RMSE values of approximately 19.2°-19.4° compared to 56°-64° for deep learning recurrent models, maintaining consistent accuracy across all forecasting steps up to 60 minutes ahead. Moreover, WaveHiTS demonstrates superior robustness with vector correlation coefficients (VCC) of 0.985-0.987 and hit rates of 88.5%-90.1%, substantially outperforming baseline models. Ablation studies confirm that each component -- wavelet transform, hierarchical structure, and U-V decomposition -- contributes meaningfully to overall performance. These improvements in wind direction nowcasting have significant implications for enhancing wind turbine yaw control efficiency and grid integration of wind energy.
Updated: 2025-04-09 02:15:48
Categories: cs.LG,cs.AI
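The U-V decomposition that avoids the 0/360° wrap-around, together with a circular error metric, is a standard trick and can be written compactly (a generic sketch, not the paper's code):

```python
import numpy as np

def direction_to_uv(theta_deg):
    """Map wind direction (degrees) to unit U-V components."""
    rad = np.deg2rad(theta_deg)
    return np.sin(rad), np.cos(rad)

def uv_to_direction(u, v):
    """Recover a direction in [0, 360) from predicted U-V components."""
    return np.rad2deg(np.arctan2(u, v)) % 360.0

def circular_rmse(pred_deg, true_deg):
    """RMSE on the circle: wrap errors into (-180, 180] before squaring."""
    err = (np.asarray(pred_deg) - np.asarray(true_deg) + 180.0) % 360.0 - 180.0
    return float(np.sqrt(np.mean(err ** 2)))
```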
Replacing Paths with Connection-Biased Attention for Knowledge Graph Completion
Knowledge graph (KG) completion aims to identify additional facts that can be inferred from the existing facts in the KG. Recent developments in this field have explored this task in the inductive setting, where at test time one sees entities that were not present during training; the most performant models in the inductive setting have employed path encoding modules in addition to standard subgraph encoding modules. This work similarly focuses on KG completion in the inductive setting, without the explicit use of path encodings, which can be time-consuming and introduce several hyperparameters that require costly hyperparameter optimization. Our approach uses a Transformer-based subgraph encoding module only; we introduce connection-biased attention and entity role embeddings into the subgraph encoding module to eliminate the need for an expensive and time-consuming path encoding module. Evaluations on standard inductive KG completion benchmark datasets demonstrate that our Connection-Biased Link Prediction (CBLiP) model has superior performance to models that do not use path information. Compared to models that utilize path information, CBLiP shows competitive or superior performance while being faster. Additionally, to show that the effectiveness of connection-biased attention and entity role embeddings also holds in the transductive setting, we compare CBLiP's performance on the relation prediction task in the transductive setting.
Updated: 2025-04-09 02:12:28
Categories: cs.LG
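A hedged sketch of what connection-biased attention could look like: a learned per-head scalar bias, looked up by the connection type of each entity pair, is added to the attention scores before the softmax (the bias-table layout is an assumption):

```python
import torch.nn.functional as F

def connection_biased_attention(q, k, v, conn_type, bias_table):
    """q, k, v: (B, H, N, D); conn_type: (B, N, N) integer ids describing how
    each pair of subgraph nodes is connected; bias_table: an nn.Embedding
    with one scalar bias per (connection type, head)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5    # (B, H, N, N)
    bias = bias_table(conn_type).permute(0, 3, 1, 2)          # (B, H, N, N)
    return F.softmax(scores + bias, dim=-1) @ v

# e.g. bias_table = torch.nn.Embedding(num_connection_types, num_heads)
```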
Beyond Moore's Law: Harnessing the Redshift of Generative AI with Effective Hardware-Software Co-Design
For decades, Moore's Law has served as a steadfast pillar in computer architecture and system design, promoting a clear abstraction between hardware and software. This traditional Moore's computing paradigm has deepened the rift between the two, enabling software developers to achieve near-exponential performance gains often without needing to delve deeply into hardware-specific optimizations. Yet today, Moore's Law -- with its once relentless performance gains now diminished to incremental improvements -- faces inevitable physical barriers. This stagnation necessitates a reevaluation of the conventional system design philosophy. The traditional decoupled system design philosophy, which maintains strict abstractions between hardware and software, is increasingly obsolete. The once-clear boundary between software and hardware is rapidly dissolving, replaced by co-design. It is imperative for the computing community to intensify its commitment to hardware-software co-design, elevating system abstractions to first-class citizens and reimagining design principles to satisfy the insatiable appetite of modern computing. Hardware-software co-design is not a recent innovation. To illustrate its historical evolution, I classify its development into five relatively distinct "epochs". This post also highlights the growing influence of the architecture community in interdisciplinary teams -- particularly alongside ML researchers -- and explores why current co-design paradigms are struggling in today's computing landscape. Additionally, I will examine the concept of the "hardware lottery" and explore directions to mitigate its constraining influence on the next era of computing innovation.
Updated: 2025-04-09 02:10:58
Categories: cs.AR,cs.AI
TSP-OCS: A Time-Series Prediction for Optimal Camera Selection in Multi-Viewpoint Surgical Video Analysis
Recording the open surgery process is essential for educational and medical evaluation purposes; however, traditional single-camera methods often face challenges such as occlusions caused by the surgeon's head and body, as well as limitations due to fixed camera angles, which reduce comprehensibility of the video content. This study addresses these limitations by employing a multi-viewpoint camera recording system, capturing the surgical procedure from six different angles to mitigate occlusions. We propose a fully supervised learning-based time series prediction method to choose the best shot sequences from multiple simultaneously recorded video streams, ensuring optimal viewpoints at each moment. Our time series prediction model forecasts future camera selections by extracting and fusing visual and semantic features from surgical videos using pre-trained models. These features are processed by a temporal prediction network with TimeBlocks to capture sequential dependencies. A linear embedding layer reduces dimensionality, and a Softmax classifier selects the optimal camera view based on the highest probability. In our experiments, we created five groups of open thyroidectomy videos, each with simultaneous recordings from six different angles. The results demonstrate that our method achieves competitive accuracy compared to traditional supervised methods, even when predicting over longer time horizons. Furthermore, our approach outperforms state-of-the-art time series prediction techniques on our dataset. This manuscript makes a unique contribution by presenting an innovative framework that advances surgical video analysis techniques, with significant implications for improving surgical education and patient safety.
Updated: 2025-04-09 02:07:49
Categories: cs.CV,cs.AI
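A toy stand-in for the prediction head described above; a GRU substitutes for the TimeBlocks, and all dimensions are invented:

```python
import torch.nn as nn

class CameraSelector(nn.Module):
    """Sketch of the head: fused per-frame features -> temporal modeling ->
    linear embedding -> softmax over the six camera views."""
    def __init__(self, feat_dim=768, hidden=128, n_views=6):
        super().__init__()
        self.temporal = nn.GRU(feat_dim, hidden, batch_first=True)  # TimeBlocks stand-in
        self.embed = nn.Linear(hidden, 64)      # dimensionality reduction
        self.head = nn.Linear(64, n_views)

    def forward(self, feats):                   # feats: (B, T, feat_dim)
        h, _ = self.temporal(feats)
        logits = self.head(self.embed(h[:, -1]))
        return logits.softmax(-1)               # probability per camera view
```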
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
Video inpainting, which aims to restore corrupted video content, has experienced substantial progress. Despite these advances, existing methods, whether propagating unmasked region pixels through optical flow and receptive field priors, or extending image-inpainting models temporally, face challenges in generating fully masked objects or balancing the competing objectives of background context preservation and foreground generation in one model, respectively. To address these limitations, we propose a novel dual-stream paradigm VideoPainter that incorporates an efficient context encoder (comprising only 6% of the backbone parameters) to process masked videos and inject backbone-aware background contextual cues to any pre-trained video DiT, producing semantically consistent content in a plug-and-play manner. This architectural separation significantly reduces the model's learning complexity while enabling nuanced integration of crucial background context. We also introduce a novel target region ID resampling technique that enables any-length video inpainting, greatly enhancing our practical applicability. Additionally, we establish a scalable dataset pipeline leveraging current vision understanding models, contributing VPData and VPBench to facilitate segmentation-based inpainting training and assessment, the largest video inpainting dataset and benchmark to date with over 390K diverse clips. Using inpainting as a pipeline basis, we also explore downstream applications including video editing and video editing pair data generation, demonstrating competitive performance and significant practical potential. Extensive experiments demonstrate VideoPainter's superior performance in both any-length video inpainting and editing, across eight key metrics, including video quality, mask region preservation, and textual coherence.
Updated: 2025-04-09 02:05:33
Categories: cs.CV,cs.AI,cs.MM
The Power of the Pareto Front: Balancing Uncertain Rewards for Adaptive Experimentation in scanning probe microscopy
Automated experimentation has the potential to revolutionize scientific discovery, but its effectiveness depends on well-defined optimization targets, which are often uncertain or probabilistic in real-world settings. In this work, we demonstrate the application of Multi-Objective Bayesian Optimization (MOBO) to balance multiple, competing rewards in autonomous experimentation. Using scanning probe microscopy (SPM) imaging, one of the most widely used and foundational SPM modes, we show that MOBO can optimize imaging parameters to enhance measurement quality, reproducibility, and efficiency. A key advantage of this approach is the ability to compute and analyze the Pareto front, which not only guides optimization but also provides physical insights into the trade-offs between different objectives. Additionally, MOBO offers a natural framework for human-in-the-loop decision-making, enabling researchers to fine-tune experimental trade-offs based on domain expertise. By standardizing high-quality, reproducible measurements and integrating human input into AI-driven optimization, this work highlights MOBO as a powerful tool for advancing autonomous scientific discovery.
Updated: 2025-04-09 01:59:31
Categories: cs.LG,cond-mat.mes-hall,cond-mat.mtrl-sci,cs.AI
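Extracting the Pareto front from a batch of evaluated imaging settings is a plain non-dominated filter; a generic sketch, assuming all objectives are to be maximized:

```python
import numpy as np

def pareto_mask(Y):
    """Boolean mask of non-dominated rows. Y: (n, m) objective values for n
    candidate settings (e.g., image quality, reproducibility, speed)."""
    n = Y.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # i is dominated if some row is >= on every objective and > on one.
        dominates_i = np.all(Y >= Y[i], axis=1) & np.any(Y > Y[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask
```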
Controller Distillation Reduces Fragile Brain-Body Co-Adaptation and Enables Migrations in MAP-Elites
Brain-body co-optimization suffers from fragile co-adaptation, where brains become over-specialized for particular bodies, hindering their ability to transfer well to others. Evolutionary algorithms tend to discard such low-performing solutions, eliminating promising morphologies. Previous work considered applying MAP-Elites, where niche descriptors are based on morphological features, to promote better search over morphology space. In this work, we show that this approach still suffers from fragile co-adaptation: a core mechanism of MAP-Elites, creating stepping stones through solutions that migrate from one niche to another, is disrupted. We suggest that this disruption occurs because the body mutations that move an offspring to a new morphological niche break the robots' fragile brain-body co-adaptation and thus significantly decrease the performance of those potential solutions -- reducing their likelihood of outcompeting an existing elite in that new niche. We utilize a technique we call Pollination, which periodically replaces the controllers of certain solutions with a distilled controller that generalizes better across morphologies, to reduce fragile brain-body co-adaptation and thus promote MAP-Elites migrations. Pollination increases the success of body mutations and the number of migrations, resulting in better quality-diversity metrics. We believe these insights could apply to other domains where MAP-Elites is used.
Updated: 2025-04-09 01:45:51
Categories: cs.RO,cs.LG,cs.NE
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
We find that the response length of reasoning LLMs, whether trained by reinforcement learning or supervised learning, drastically increases for ill-posed questions with missing premises (MiP), ending up with redundant and ineffective thinking. This newly introduced scenario exacerbates the general overthinking issue to a large extent, which we name MiP-Overthinking. Such failures go against the "test-time scaling law" but have been widely observed on multiple datasets we curated with MiP, indicating the harm of cheap overthinking and a lack of critical thinking. Surprisingly, LLMs not specifically trained for reasoning exhibit much better performance on the MiP scenario, producing much shorter responses that quickly identify ill-posed queries. This implies a critical flaw in the current training recipe for reasoning LLMs, which does not encourage efficient thinking adequately, leading to the abuse of thinking patterns. To further investigate the reasons behind such failures, we conduct fine-grained analyses of the reasoning length, overthinking patterns, and location of critical thinking on different types of LLMs. Moreover, our extended ablation study reveals that the overthinking is contagious through the distillation of reasoning models' responses. These results improve the understanding of overthinking and shed novel insights into mitigating the problem.
Updated: 2025-04-09 01:25:27
Categories: cs.AI,cs.CL,cs.LG
Differentially Private Joint Independence Test
Identification of joint dependence among more than two random vectors plays an important role in many statistical applications, where the data may contain sensitive or confidential information. In this paper, we consider the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) in the context of differential privacy. Given that the limiting distribution of the empirical estimate of dHSIC is a complicated Gaussian chaos, constructing tests in the non-privacy regime is typically based on permutation and bootstrap. To detect joint dependence under privacy, we propose a dHSIC-based testing procedure that employs a differentially private permutation methodology. Our method enjoys a privacy guarantee, valid level, and pointwise consistency, while the bootstrap counterpart suffers from inconsistent power. We further investigate the uniform power of the proposed test in the dHSIC metric and the $L_2$ metric, showing that the proposed test attains the minimax optimal power across different privacy regimes. As a byproduct, our results also give the pointwise and uniform power of the non-private permutation dHSIC, addressing an open question from Pfister et al. (2018). Both numerical simulations and real data analysis on causal inference suggest that our proposed test performs well empirically.
Updated: 2025-04-09 01:20:43
Categories: math.ST,cs.CR,cs.LG,stat.ME,stat.ML,stat.TH,62G10, 62H20
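The non-private permutation skeleton underlying such a test looks like the sketch below; the paper's contribution is a differentially private permutation mechanism, which this generic version deliberately omits:

```python
import numpy as np

def permutation_pvalue(stat_fn, samples, n_perm=200, rng=None):
    """Generic permutation test for joint independence of d blocks.
    samples: list of d arrays, each (n, dim_k); stat_fn maps such a list
    to a scalar dependence statistic (e.g., an empirical dHSIC)."""
    rng = np.random.default_rng(rng)
    t_obs = stat_fn(samples)
    n = samples[0].shape[0]
    count = 0
    for _ in range(n_perm):
        # Under the null, permuting each block independently breaks any
        # joint dependence while preserving the marginals.
        permuted = [x[rng.permutation(n)] for x in samples]
        if stat_fn(permuted) >= t_obs:
            count += 1
    return (1 + count) / (1 + n_perm)
```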
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
In this work, we build a simple but strong baseline for sounding video generation. Given base diffusion models for audio and video, we integrate them with additional modules into a single model and train it to jointly generate audio and video. To enhance alignment between audio-video pairs, we introduce two novel mechanisms in our model. The first one is timestep adjustment, which provides different timestep information to each base model. It is designed to align how samples are generated across timesteps in the two modalities. The second one is a new design of the additional modules, termed Cross-Modal Conditioning as Positional Encoding (CMC-PE). In CMC-PE, cross-modal information is embedded as if it represents temporal position information, and the embeddings are fed into the model like positional encoding. Compared with the popular cross-attention mechanism, CMC-PE provides a better inductive bias for temporal alignment in the generated data. Experimental results validate the effectiveness of the two newly introduced mechanisms and also demonstrate that our method outperforms existing methods.
Updated: 2025-04-09 01:18:01
Categories: cs.LG,cs.MM,cs.SD,eess.AS
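A minimal sketch of the CMC-PE idea as described: per-timestep features from the other modality are projected and added to the token stream like a positional encoding (the single linear projection is an assumption):

```python
import torch.nn as nn

class CMCPE(nn.Module):
    """Cross-Modal Conditioning as Positional Encoding (sketch): inject the
    other modality additively, like a positional encoding, instead of via
    cross-attention."""
    def __init__(self, d_other, d_model):
        super().__init__()
        self.proj = nn.Linear(d_other, d_model)

    def forward(self, tokens, cross_feats):
        # tokens: (B, T, d_model); cross_feats: (B, T, d_other),
        # temporally aligned with the token stream.
        return tokens + self.proj(cross_feats)
```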
GTS-LUM: Reshaping User Behavior Modeling with LLMs in Telecommunications Industry
As telecommunication service providers shift their focus to analyzing user behavior for package design and marketing interventions, a critical challenge lies in developing a unified, end-to-end framework capable of modeling long-term and periodic user behavior sequences with diverse time granularities, multi-modal data inputs, and heterogeneous labels. This paper introduces GTS-LUM, a novel user behavior model that redefines modeling paradigms in telecommunication settings. GTS-LUM adopts a (multi-modal) encoder-adapter-LLM decoder architecture, enhanced with several telecom-specific innovations. Specifically, the model incorporates an advanced timestamp processing method to handle varying time granularities. It also supports multi-modal data inputs -- including structured tables and behavior co-occurrence graphs -- and aligns these with semantic information extracted by a tokenizer using a Q-former structure. Additionally, GTS-LUM integrates a front-placed target-aware mechanism to highlight the historical behaviors most relevant to the target. Extensive experiments on an industrial dataset validate the effectiveness of this end-to-end framework and also demonstrate that GTS-LUM outperforms LLM4Rec approaches that are popular in recommendation systems, offering an effective and generalizable solution for user behavior modeling in telecommunications.
Updated: 2025-04-09 01:12:07
Categories: cs.LG
Identifying Information from Observations with Uncertainty and Novelty
A machine learning task that learns from observations must encounter and process uncertainty and novelty, especially if it is to maintain performance when observing new information and to choose the hypothesis that best fits the current observations. In this context, some key questions arise: what and how much information did the observations provide, how much information is required to identify the data-generating process, how many observations remain to get that information, and how does a predictor determine that it has observed novel information? This paper strengthens existing answers to these questions by formalizing the notion of identifiable information that arises from the language used to express the relationship between distinct states. Model identifiability and sample complexity are defined via computation of an indicator function over a set of hypotheses, bridging algorithmic and probabilistic information. Their properties and asymptotic statistics are described for data-generating processes ranging from deterministic processes to ergodic stationary stochastic processes. This connects the notion of identifying information in finite steps with asymptotic statistics and PAC-learning. The indicator function's computation naturally formalizes novel information and its identification from observations with respect to a hypothesis set. We also prove that a computable PAC-Bayes learner's sample complexity distribution is determined by its moments in terms of the prior probability distribution over a fixed finite hypothesis set.
Updated: 2025-04-09 01:11:06
Categories: cs.LG,stat.ML,G.3
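One way to read the indicator-function construction is as hypothesis winnowing; the sketch below is an illustrative finite, noiseless special case, with consistent(h, o) standing in for the language relating states to observations:

```python
def identify(hypotheses, observations, consistent):
    """Winnow a finite hypothesis set with an indicator of consistency;
    the process is 'identified' once a single hypothesis survives."""
    live = set(hypotheses)
    for t, obs in enumerate(observations, start=1):
        live = {h for h in live if consistent(h, obs)}
        if len(live) == 1:
            return next(iter(live)), t      # identified after t observations
    return live, len(observations)          # information still insufficient
```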
EffOWT: Transfer Visual Language Models to Open-World Tracking Efficiently and Effectively
Open-World Tracking (OWT) aims to track every object of any category, which requires the model to have strong generalization capabilities. Trackers can improve their generalization ability by leveraging Visual Language Models (VLMs). However, challenges arise with the fine-tuning strategies when VLMs are transferred to OWT: full fine-tuning results in excessive parameter and memory costs, while the zero-shot strategy leads to sub-optimal performance. To solve the problem, EffOWT is proposed for efficiently transferring VLMs to OWT. Specifically, we build a small and independent learnable side network outside the VLM backbone. By freezing the backbone and only executing backpropagation on the side network, the model's efficiency requirements can be met. In addition, EffOWT enhances the side network by proposing a hybrid structure of Transformer and CNN to improve the model's performance in the OWT field. Finally, we implement sparse interactions on the MLP, thus reducing parameter updates and memory costs significantly. Thanks to the proposed methods, EffOWT achieves an absolute gain of 5.5% on the tracking metric OWTA for unknown categories, while only updating 1.3% of the parameters compared to full fine-tuning, with a 36.4% memory saving. Other metrics also demonstrate obvious improvement.
Updated: 2025-04-09 01:00:05
Categories: cs.CV,cs.AI
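A hedged sketch of the frozen-backbone-plus-side-network idea; backbone.features(x) yielding per-stage features is a hypothetical API, and the hybrid Transformer/CNN blocks and sparse MLP interactions are simplified away:

```python
import torch.nn as nn

class SideNetwork(nn.Module):
    """EffOWT-style sketch: the VLM backbone is frozen and only a small
    side network is trained on (detached) intermediate backbone features."""
    def __init__(self, backbone, stage_dims, side_dim=64):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False               # no backprop into the VLM
        self.adapters = nn.ModuleList(nn.Linear(d, side_dim) for d in stage_dims)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(side_dim, side_dim), nn.ReLU())
            for _ in stage_dims)

    def forward(self, x):
        side = 0.0
        for feat, ad, blk in zip(self.backbone.features(x), self.adapters, self.blocks):
            side = blk(ad(feat.detach()) + side)  # only the side path learns
        return side
```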
DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis
Multimodal Sentiment Analysis (MSA) leverages heterogeneous modalities, such as language, vision, and audio, to enhance the understanding of human sentiment. While existing models often focus on extracting shared information across modalities or directly fusing heterogeneous modalities, such approaches can introduce redundancy and conflicts due to equal treatment of all modalities and the mutual transfer of information between modality pairs. To address these issues, we propose a Disentangled-Language-Focused (DLF) multimodal representation learning framework, which incorporates a feature disentanglement module to separate modality-shared and modality-specific information. To further reduce redundancy and enhance language-targeted features, four geometric measures are introduced to refine the disentanglement process. A Language-Focused Attractor (LFA) is further developed to strengthen language representation by leveraging complementary modality-specific information through a language-guided cross-attention mechanism. The framework also employs hierarchical predictions to improve overall accuracy. Extensive experiments on two popular MSA datasets, CMU-MOSI and CMU-MOSEI, demonstrate the significant performance gains achieved by the proposed DLF framework. Comprehensive ablation studies further validate the effectiveness of the feature disentanglement module, language-focused attractor, and hierarchical predictions. Our code is available at https://github.com/pwang322/DLF.
Updated: 2025-04-09 00:52:30
Categories: cs.LG,cs.AI,cs.CL,cs.MM
Untangling Lariats: Subgradient Following of Variationally Penalized Objectives
We describe an apparatus for subgradient-following of the optimum of convex problems with variational penalties. In this setting, we receive a sequence $y_i,\ldots,y_n$ and seek a smooth sequence $x_1,\ldots,x_n$. The smooth sequence needs to attain the minimum Bregman divergence to an input sequence with additive variational penalties in the general form of $\sum_i{}g_i(x_{i+1}-x_i)$. We derive known algorithms such as the fused lasso and isotonic regression as special cases of our approach. Our approach also facilitates new variational penalties such as non-smooth barrier functions. We then derive a novel lattice-based procedure for subgradient following of variational penalties characterized through the output of arbitrary convolutional filters. This paradigm yields efficient solvers for high-order filtering problems of temporal sequences in which sparse discrete derivatives such as acceleration and jerk are desirable. We also introduce and analyze new multivariate problems in which $\mathbf{x}_i,\mathbf{y}_i\in\mathbb{R}^d$ with variational penalties that depend on $\|\mathbf{x}_{i+1}-\mathbf{x}_i\|$. The norms we consider are $\ell_2$ and $\ell_\infty$ which promote group sparsity.
Updated: 2025-04-09 00:30:27
Categories: cs.LG,math.OC
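As a concrete special case, the fused lasso takes $g_i(x_{i+1}-x_i)=\lambda|x_{i+1}-x_i|$; the naive subgradient descent below is only a stand-in for the exact subgradient-following apparatus of the paper:

```python
import numpy as np

def fused_lasso_subgradient(y, lam=1.0, steps=5000, lr=1e-2):
    """Minimize 0.5*||x - y||^2 + lam * sum_i |x_{i+1} - x_i| by plain
    (sub)gradient descent with a decaying step size."""
    x = y.astype(float).copy()
    for t in range(steps):
        g = x - y                         # gradient of the quadratic term
        d = np.sign(np.diff(x))           # subgradient of each |x_{i+1}-x_i|
        g[:-1] -= lam * d                 # d/dx_i of the i-th penalty term
        g[1:] += lam * d                  # d/dx_{i+1} of the same term
        x -= lr / np.sqrt(t + 1.0) * g
    return x
```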
Generative AI Voting: Fair Collective Choice is Resilient to LLM Biases and Inconsistencies
Scaling up deliberative and voting participation is a longstanding endeavor -- a cornerstone for direct democracy and legitimate collective choice. Recent breakthroughs in generative artificial intelligence (AI) and large language models (LLMs) unlock new capabilities for AI personal assistants to overcome the cognitive bandwidth limitations of humans, providing decision support or even direct representation of human voters at large scale. However, the quality of this representation, and which underlying biases manifest when delegating collective decision-making to LLMs, is an alarming and timely challenge to tackle. By rigorously emulating with high realism more than 50K LLM voting personas in 306 real-world voting elections, we disentangle the nature of different biases in LLMs (GPT 3, GPT 3.5, and Llama2). Complex preferential ballot formats exhibit significant inconsistencies compared to simpler majoritarian elections, which show higher consistency. Strikingly though, by demonstrating for the first time in a real-world setting a proportional representation of voters in direct democracy, we are also able to show that fair ballot aggregation methods, such as equal shares, prove to be a win-win: fairer voting outcomes for humans with fairer AI representation, especially for voters who are likely to abstain. This novel underlying relationship proves paramount for democratic resilience in progressive scenarios with low voter turnout and voter fatigue supported by AI representatives: the effect of abstaining voters is mitigated by recovering highly representative voting outcomes that are fairer. These interdisciplinary insights provide remarkable foundations for science, policymakers, and citizens to develop safeguards and resilience against AI risks in democratic innovations.
Updated: 2025-04-09 00:21:07
Categories: cs.AI
Data-driven Fuzzy Control for Time-Optimal Aggressive Trajectory Following
Optimal trajectories that minimize a user-defined cost function in dynamic systems require the solution of a two-point boundary value problem. The optimization process yields an optimal control sequence that depends on the initial conditions and system parameters. However, the optimal sequence may result in undesirable behavior if the system's initial conditions and parameters are erroneous. This work presents a data-driven fuzzy controller synthesis framework that is guided by a time-optimal trajectory for multicopter tracking problems. In particular, we consider an aggressive maneuver consisting of a mid-air flip and generate a time-optimal trajectory by numerically solving the two-point boundary value problem. A fuzzy controller consisting of a stabilizing controller near hover conditions and an autoregressive moving average (ARMA) controller, trained to mimic the time-optimal aggressive trajectory, is constructed using the Takagi-Sugeno fuzzy framework.
Updated: 2025-04-09 00:06:15
Categories: eess.SY,cs.LG,cs.RO,cs.SY
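The Takagi-Sugeno construction blends the two local controllers with membership weights; a minimal sketch, assuming a tilt-based membership function (the blending variable and tilt_max are illustrative):

```python
import numpy as np

def ts_fuzzy_control(u_hover, u_arma, tilt_angle, tilt_max=0.5):
    """Takagi-Sugeno blend (sketch): weight the hover-stabilizing control
    and the ARMA aggressive-maneuver control by how far the vehicle's
    state is from hover."""
    w = np.clip(abs(tilt_angle) / tilt_max, 0.0, 1.0)  # membership in "aggressive"
    return (1.0 - w) * u_hover + w * u_arma
```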
Continuous-Variable Quantum Encoding Techniques: A Comparative Study of Embedding Techniques and Their Impact on Machine Learning Performance
This study explores the intersection of continuous-variable quantum computing (CVQC) and classical machine learning, focusing on CVQC data encoding techniques, including displacement encoding and squeezing encoding, alongside Instantaneous Quantum Polynomial (IQP) encoding from discrete-variable quantum computing. We perform an extensive empirical analysis to assess the impact of these encoding methods on classical machine learning models, such as Logistic Regression, Support Vector Machines, K-Nearest Neighbors, and ensemble methods like Random Forest and LightGBM. Our findings indicate that CVQC-based encoding methods significantly enhance feature expressivity, resulting in improved classification accuracy and F1 scores, especially on high-dimensional and complex datasets. However, these improvements come with varying computational costs, which depend on the complexity of the encoding and the architecture of the machine learning models. Additionally, we examine the trade-off between quantum expressibility and classical learnability, offering valuable insights into the practical feasibility of incorporating these quantum encodings into real-world applications. This study contributes to the growing body of research on quantum-classical hybrid learning, emphasizing the role of CVQC in advancing quantum data representation and its integration into classical machine learning workflows.
Updated: 2025-04-09 00:00:45
Categories: quant-ph,cs.AI
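Displacement encoding maps each real feature to a coherent-state amplitude, and the overlap of coherent states, $|\langle\alpha|\beta\rangle|^2 = e^{-|\alpha-\beta|^2}$, induces a Gaussian kernel; a numpy sketch that can be fed to a kernel classifier (the per-feature scale is an assumed hyperparameter):

```python
import numpy as np

def displacement_kernel(X, Z, scale=1.0):
    """Kernel induced by encoding features as coherent-state displacements,
    one mode per feature: prod_k exp(-|scale*(x_k - z_k)|^2).
    X: (n, d), Z: (m, d) real feature matrices."""
    diff = X[:, None, :] - Z[None, :, :]
    return np.exp(-(scale ** 2) * np.sum(diff ** 2, axis=-1))

# e.g. sklearn.svm.SVC(kernel="precomputed").fit(displacement_kernel(X, X), y)
```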