    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

Articles: 20

Last Updated: 2025-07-24 23:41:15 (+00:00)

HumorDB: Can AI understand graphical humor?

Despite significant advancements in image segmentation and object detection, understanding complex scenes remains a major challenge. Here, we focus on graphical humor as a paradigmatic example of image interpretation that requires elucidating the interaction of different scene elements in the context of prior cognitive knowledge. This paper introduces HumorDB, a novel, controlled, and carefully curated dataset designed to evaluate and advance visual humor understanding by AI systems. The dataset comprises diverse images spanning photos, cartoons, sketches, and AI-generated content, including minimally contrastive pairs where subtle edits differentiate between humorous and non-humorous versions. We evaluate humans, state-of-the-art vision models, and large vision-language models on three tasks: binary humor classification, funniness rating prediction, and pairwise humor comparison. The results reveal a gap between current AI systems and human-level humor understanding. While pretrained vision-language models perform better than vision-only models, they still struggle with abstract sketches and subtle humor cues. Analysis of attention maps shows that even when models correctly classify humorous images, they often fail to focus on the precise regions that make the image funny. Preliminary mechanistic interpretability studies and evaluation of model explanations provide initial insights into how different architectures process humor. Our results identify promising trends and current limitations, suggesting that an effective understanding of visual humor requires sophisticated architectures capable of detecting subtle contextual features and bridging the gap between visual perception and abstract reasoning. All code and data are available at: https://github.com/kreimanlab/HumorDB
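
For concreteness, the snippet below scores the pairwise humor comparison task given any model's per-image funniness scores; the scores and human choices are random stand-ins, not HumorDB data or code.

import numpy as np

# Minimal sketch: agreement between a model's funniness scores and human
# choices on image pairs. All inputs are random placeholders.
rng = np.random.default_rng(0)
score_a = rng.random(100)                 # model funniness score, image A of each pair
score_b = rng.random(100)                 # model funniness score, image B of each pair
human_prefers_a = rng.integers(0, 2, 100).astype(bool)   # human pairwise choices

model_prefers_a = score_a > score_b
accuracy = (model_prefers_a == human_prefers_a).mean()
print(f"pairwise humor comparison accuracy: {accuracy:.2f}")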

Updated: 2025-07-24 23:41:15

Categories: cs.CV,cs.AI,I.5.4

Download: http://arxiv.org/abs/2406.13564v2

Optimizing Metachronal Paddling with Reinforcement Learning at Low Reynolds Number

Metachronal paddling is a swimming strategy in which an organism oscillates sets of adjacent limbs with a constant phase lag, propagating a metachronal wave through its limbs and propelling it forward. This limb coordination strategy is utilized by swimmers across a wide range of Reynolds numbers, which suggests that the metachronal rhythm was selected because it optimizes swimming performance. In this study, we apply reinforcement learning to a swimmer at zero Reynolds number and investigate whether the learning algorithm selects this metachronal rhythm, or whether other coordination patterns emerge. We design the swimmer agent with an elongated body and pairs of straight, inflexible paddles placed along the body at various fixed paddle spacings. Based on paddle spacing, the swimmer agent learns qualitatively different coordination patterns. At tight spacings, a back-to-front metachronal wave-like stroke emerges that resembles the commonly observed biological rhythm, but at wide spacings, different limb coordinations are selected. Across all resulting strokes, the fastest stroke depends on the number of paddles; the most efficient stroke, however, is a back-to-front wave-like stroke regardless of the number of paddles.
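
For intuition, here is a toy generator of the phase-lagged coordination described above; the amplitude, frequency, and lag are arbitrary illustrative choices, not the learned policy.

import numpy as np

n_paddles, phase_lag, omega = 6, np.pi / 4, 2 * np.pi   # one stroke cycle per time unit

def paddle_angles(t):
    k = np.arange(n_paddles)                 # paddle 0 = rearmost (assumed)
    return 0.5 * np.sin(omega * t - k * phase_lag)   # constant lag -> travelling wave

for t in np.linspace(0.0, 0.5, 3):
    print(np.round(paddle_angles(t), 2))     # the crest moves back-to-front over time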

Updated: 2025-07-24 23:38:06

Categories: physics.flu-dyn,cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.18849v1

PTCMIL: Multiple Instance Learning via Prompt Token Clustering for Whole Slide Image Analysis

Multiple Instance Learning (MIL) has advanced whole slide image (WSI) analysis but struggles with the complexity and heterogeneity of WSIs. Existing MIL methods face challenges in aggregating diverse patch information into robust WSI representations. While Vision Transformers (ViTs) and clustering-based approaches show promise, they are computationally intensive and fail to capture task-specific and slide-specific variability. To address these limitations, we propose PTCMIL, a novel Prompt Token Clustering-based ViT for MIL aggregation. By introducing learnable prompt tokens into the ViT backbone, PTCMIL unifies clustering and prediction tasks in an end-to-end manner. It dynamically aligns clustering with downstream tasks, using projection-based clustering tailored to each WSI, reducing complexity while preserving patch heterogeneity. Through token merging and prototype-based pooling, PTCMIL efficiently captures task-relevant patterns. Extensive experiments on eight datasets demonstrate its superior performance in classification and survival analysis tasks, outperforming state-of-the-art methods. Systematic ablation studies confirm its robustness and strong interpretability. The code is released at https://github.com/ubc-tea/PTCMIL.
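
A rough sketch of the prompt-token clustering idea follows; the dimensions, scaled-dot-product soft assignment, and flattening-based pooling are illustrative assumptions, not PTCMIL's exact design.

import torch
import torch.nn as nn

d, n_prompts, n_patches = 64, 4, 1000
prompts = nn.Parameter(torch.randn(n_prompts, d))   # learnable prompt tokens act as cluster centers
patches = torch.randn(n_patches, d)                 # embeddings of WSI patches

# projection-based soft assignment of every patch to a prompt token
assign = torch.softmax(patches @ prompts.T / d ** 0.5, dim=1)      # (n_patches, n_prompts)
prototypes = assign.T @ patches / assign.sum(0, keepdim=True).T    # weighted cluster means
slide_repr = prototypes.flatten()                   # simple prototype-based pooling
print(slide_repr.shape)                             # torch.Size([256])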

Updated: 2025-07-24 23:33:59

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.18848v1

Low-Rank Thinning

The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.
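
The snippet below is a hedged illustration of the setting rather than the Kernel Halving or Compress algorithms: it scores a random half-sized summary with a kernel MMD and inspects how quickly the kernel spectrum decays, the approximate low-rank condition that the new guarantees key on.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

S = X[rng.choice(len(X), size=len(X) // 2, replace=False)]   # a candidate summary
mmd2 = rbf(X, X).mean() + rbf(S, S).mean() - 2 * rbf(X, S).mean()
eigs = np.linalg.eigvalsh(rbf(X, X))[::-1]                   # kernel spectrum, descending
print(f"MMD^2(full, summary) = {mmd2:.2e}")
print(f"top-10 eigenvalue mass = {eigs[:10].sum() / eigs.sum():.2f}")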

Updated: 2025-07-24 23:20:10

Categories: stat.ML,cs.LG,math.OC,math.ST,stat.ME,stat.TH

Download: http://arxiv.org/abs/2502.12063v7

Equivariant Volumetric Grasping

We propose a new volumetric grasp model that is equivariant to rotations around the vertical axis, leading to a significant improvement in sample efficiency. Our model employs a tri-plane volumetric feature representation -- i.e., the projection of 3D features onto three canonical planes. We introduce a novel tri-plane feature design in which features on the horizontal plane are equivariant to 90° rotations, while the sum of features from the other two planes remains invariant to the same transformations. This design is enabled by a new deformable steerable convolution, which combines the adaptability of deformable convolutions with the rotational equivariance of steerable ones. This allows the receptive field to adapt to local object geometry while preserving equivariance properties. We further develop equivariant adaptations of two state-of-the-art volumetric grasp planners, GIGA and IGD. Specifically, we derive a new equivariant formulation of IGD's deformable attention mechanism and propose an equivariant generative model of grasp orientations based on flow matching. We provide a detailed analytical justification of the proposed equivariance properties and validate our approach through extensive simulated and real-world experiments. Our results demonstrate that the proposed projection-based design significantly reduces both computational and memory costs. Moreover, the equivariant grasp models built on top of our tri-plane features consistently outperform their non-equivariant counterparts, achieving higher performance with only a modest computational overhead. Video and code can be viewed at: https://mousecpn.github.io/evg-page/
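
The toy numpy check below verifies the stated symmetry using sum-projections as stand-in tri-plane features: the horizontal-plane feature is exactly equivariant to a 90° rotation, and a symmetrized sum of the two vertical-plane features is invariant. The paper's features come from learned deformable steerable convolutions; only the geometric property is illustrated here.

import numpy as np

rng = np.random.default_rng(0)
v = rng.random((8, 8, 8))            # toy voxel grid, axes (x, y, z); z is vertical

def triplane(vol):
    return vol.sum(axis=2), vol.sum(axis=1), vol.sum(axis=0)   # xy, xz, yz projections

def sym(a):                          # symmetrize along the first axis so 90° flips cancel
    return a + a[::-1]

h, xz, yz = triplane(v)
w = np.rot90(v, k=1, axes=(0, 1))    # 90° rotation about the vertical axis
h_r, xz_r, yz_r = triplane(w)

assert np.allclose(h_r, np.rot90(h, k=1))                     # horizontal plane: equivariant
assert np.allclose(sym(xz_r) + sym(yz_r), sym(xz) + sym(yz))  # vertical planes: invariant sum
print("tri-plane symmetry checks passed")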

Updated: 2025-07-24 23:18:32

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2507.18847v1

Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training

Zeroth-order (ZO) optimization is an emerging deep neural network (DNN) training paradigm that offers computational simplicity and memory savings. However, this seemingly promising approach faces a significant and long-ignored challenge. ZO requires generating a substantial number of Gaussian random numbers, which poses significant difficulties and even makes it infeasible for hardware platforms such as FPGAs and ASICs. In this paper, we identify this critical issue, which arises from the mismatch between algorithm and hardware designers. To address this issue, we propose PeZO, a perturbation-efficient ZO framework. Specifically, we design random number reuse strategies to significantly reduce the demand for random number generation and introduce a hardware-friendly adaptive scaling method to replace the costly Gaussian distribution with a uniform distribution. Our experiments show that PeZO reduces the required LUTs and FFs for random number generation by 48.6% and 12.7%, respectively, and saves up to 86% of power consumption, all without compromising training performance, making ZO optimization feasible for on-device training. To the best of our knowledge, we are the first to explore the potential of on-device ZO optimization, providing valuable insights for future research.
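
A minimal sketch of the two ideas named above, two-point zeroth-order SGD with a small reused pool of uniform perturbations, is given below; the quadratic loss, pool size, and normalization are illustrative stand-ins for PeZO's actual reuse strategies and adaptive scaling.

import numpy as np

rng = np.random.default_rng(0)

def loss(w):                                   # toy quadratic stand-in for a DNN loss
    return float(np.sum((w - 1.0) ** 2))

dim, pool_size, eps, lr = 16, 16, 1e-3, 0.05
# a small reusable pool of uniform perturbations instead of fresh Gaussian draws
pool = rng.uniform(-1.0, 1.0, size=(pool_size, dim))
pool /= np.linalg.norm(pool, axis=1, keepdims=True)   # crude stand-in for adaptive scaling

w = np.zeros(dim)
for step in range(200):
    u = pool[step % pool_size]                 # random number reuse: cycle the pool
    g = (loss(w + eps * u) - loss(w - eps * u)) / (2 * eps)   # two-point ZO estimate
    w -= lr * g * u
print("initial loss:", loss(np.zeros(dim)), "final loss:", loss(w))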

Updated: 2025-07-24 23:09:02

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2504.20314v2

PIPA: Preference Alignment as Prior-Informed Statistical Estimation

Offline preference alignment methods for language models, such as Direct Preference Optimization (DPO), are favored for their effectiveness and simplicity, eliminating the need for costly reinforcement learning. Various offline algorithms have been developed for different data settings, yet they lack a unified understanding. In this study, we introduce Prior-Informed Preference Alignment (PIPA), a unified, RL-free probabilistic framework that formulates language model preference alignment as a Maximum Likelihood Estimation (MLE) problem with prior constraints. This method effectively accommodates both paired and unpaired data, as well as answer- and step-level annotations. We illustrate that DPO and KTO are special cases with different prior constraints within our framework. By integrating different types of prior information, we developed two variations of PIPA: PIPA-M and PIPA-N. Both algorithms demonstrate a 3-10% performance enhancement on the GSM8K and MATH benchmarks across all configurations, achieving these gains without additional training or computational costs compared to existing algorithms.
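
Since DPO is recovered as a special case, the standard DPO objective on paired data is sketched below as a reference point; this is the classic loss with made-up log-probabilities, not PIPA's prior-constrained formulation.

import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # standard DPO: maximize the margin of log(pi/pi_ref) between preferred
    # and dispreferred answers, squashed through a log-sigmoid
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# toy usage with made-up sequence log-probabilities
lp_c, lp_r = torch.tensor([-12.0]), torch.tensor([-14.0])
ref_c, ref_r = torch.tensor([-12.5]), torch.tensor([-13.0])
print(dpo_loss(lp_c, lp_r, ref_c, ref_r))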

Updated: 2025-07-24 22:59:42

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2502.05773v2

R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

Chain-of-thought (CoT) reasoning enhances the problem-solving capabilities of large language models by encouraging step-by-step intermediate reasoning during inference. While effective, CoT introduces substantial computational overhead due to its reliance on autoregressive decoding over long token sequences. Existing acceleration strategies either reduce sequence length through early stopping or compressive reward designs, or improve decoding speed via speculative decoding with smaller models. However, speculative decoding suffers from limited speedup when the agreement between small and large models is low, and fails to exploit the potential advantages of small models in producing concise intermediate reasoning. In this paper, we present R-Stitch, a token-level, confidence-based hybrid decoding framework that accelerates CoT inference by switching between a small language model (SLM) and a large language model (LLM) along the reasoning trajectory. R-Stitch uses the SLM to generate tokens by default and delegates to the LLM only when the SLM's confidence falls below a threshold. This design avoids full-sequence rollback and selectively invokes the LLM on uncertain steps, preserving both efficiency and answer quality. R-Stitch is model-agnostic, training-free, and compatible with standard decoding pipelines. Experiments on math reasoning benchmarks demonstrate that R-Stitch achieves up to 85% reduction in inference latency with negligible accuracy drop, highlighting its practical effectiveness in accelerating CoT reasoning.
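
The control flow is straightforward to sketch: decode each token with the small model and delegate a step to the large model only when confidence drops below a threshold. Everything below (the fake one-token models, vocabulary, and threshold) is a toy stand-in for real decoding APIs.

import math, random

random.seed(0)
VOCAB = ["the", "answer", "is", "42", "<eos>"]

def make_model(skill):
    # hypothetical one-token-at-a-time model: returns (token, confidence)
    def step(prefix):
        logits = [random.gauss(0, 1) + (skill if t == "42" else 0) for t in VOCAB]
        exps = [math.exp(l) for l in logits]
        probs = [e / sum(exps) for e in exps]
        best = max(range(len(VOCAB)), key=probs.__getitem__)
        return VOCAB[best], probs[best]
    return step

slm, llm = make_model(skill=0.5), make_model(skill=3.0)

def r_stitch_decode(prompt, tau=0.6, max_tokens=8):
    out = list(prompt)
    for _ in range(max_tokens):
        tok, conf = slm(out)          # small model proposes a token (cheap)
        if conf < tau:                # low confidence: delegate this step only,
            tok, conf = llm(out)      # with no full-sequence rollback
        out.append(tok)
        if tok == "<eos>":
            break
    return out

print(r_stitch_decode(["Q:"]))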

Updated: 2025-07-24 22:39:45

Categories: cs.LG

Download: http://arxiv.org/abs/2507.17307v2

Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners

Reinforcement learning agents are fundamentally limited by the quality of the reward functions they learn from, yet reward design is often overlooked under the assumption that a well-defined reward is readily available. However, in practice, designing rewards is difficult, and even when specified, evaluating their correctness is equally problematic: how do we know if a reward function is correctly specified? In our work, we address these challenges by focusing on reward alignment -- assessing whether a reward function accurately encodes the preferences of a human stakeholder. As a concrete measure of reward alignment, we introduce the Trajectory Alignment Coefficient to quantify the similarity between a human stakeholder's ranking of trajectory distributions and those induced by a given reward function. We show that the Trajectory Alignment Coefficient exhibits desirable properties, such as not requiring access to a ground truth reward, invariance to potential-based reward shaping, and applicability to online RL. Additionally, in an 11-person user study of RL practitioners, we found that access to the Trajectory Alignment Coefficient during reward selection led to statistically significant improvements. Compared to relying only on reward functions, our metric reduced cognitive workload by 1.5x, was preferred by 82% of users, and increased the success rate of selecting reward functions that produced performant policies by 41%.
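
The paper defines the coefficient over rankings of trajectory distributions; as a rough illustration of the flavor, the snippet below computes a rank correlation between a stakeholder's trajectory ratings and the ordering induced by a candidate reward (all numbers made up).

import numpy as np
from scipy.stats import kendalltau

human_scores = np.array([0.9, 0.1, 0.5, 0.7])         # stakeholder's trajectory ratings
candidate_returns = np.array([10.0, -2.0, 3.0, 9.0])  # returns under a candidate reward

tau, _ = kendalltau(human_scores, candidate_returns)
print(f"rank agreement (Kendall tau): {tau:.2f}")     # 1.0 here: the orderings coincide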

Updated: 2025-07-24 22:27:38

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2503.05996v2

Flow Stochastic Segmentation Networks

We introduce the Flow Stochastic Segmentation Network (Flow-SSN), a generative segmentation model family featuring discrete-time autoregressive and modern continuous-time flow variants. We prove fundamental limitations of the low-rank parameterisation of previous methods and show that Flow-SSNs can estimate arbitrarily high-rank pixel-wise covariances without assuming the rank or storing the distributional parameters. Flow-SSNs are also more efficient to sample from than standard diffusion-based segmentation models, thanks to most of the model capacity being allocated to learning the base distribution of the flow, constituting an expressive prior. We apply Flow-SSNs to challenging medical imaging benchmarks and achieve state-of-the-art results. Code available: https://github.com/biomedia-mira/flow-ssn.
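
For a sense of the continuous-time variant, the sketch below Euler-integrates an (untrained) velocity field over pixel-wise logits from a standard-normal base; the shapes, network, and step count are illustrative, and Flow-SSN's learned expressive base prior is precisely what this toy omits.

import torch
import torch.nn as nn

H = W = 16; C = 2                        # toy 2-class segmentation
velocity = nn.Sequential(                # v(x, t): untrained stand-in network
    nn.Conv2d(C + 1, 32, 3, padding=1), nn.SiLU(),
    nn.Conv2d(32, C, 3, padding=1),
)

def sample(steps=20):
    x = torch.randn(1, C, H, W)          # base distribution (standard normal here)
    for i in range(steps):               # Euler integration of the flow ODE
        t = torch.full((1, 1, H, W), i / steps)
        x = x + velocity(torch.cat([x, t], dim=1)) / steps
    return x.argmax(dim=1)               # logits -> hard segmentation mask

print(sample().shape)                    # torch.Size([1, 16, 16])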

Updated: 2025-07-24 22:26:28

Categories: cs.CV,cs.AI,stat.ML

Download: http://arxiv.org/abs/2507.18838v1

RedactOR: An LLM-Powered Framework for Automatic Clinical Data De-Identification

Ensuring clinical data privacy while preserving utility is critical for AI-driven healthcare and data analytics. Existing de-identification (De-ID) methods, including rule-based techniques, deep learning models, and large language models (LLMs), often suffer from recall errors, limited generalization, and inefficiencies, limiting their real-world applicability. We propose a fully automated, multi-modal framework, RedactOR, for de-identifying structured and unstructured electronic health records, including clinical audio records. Our framework employs cost-efficient De-ID strategies, including intelligent routing, hybrid rule- and LLM-based approaches, and a two-step audio redaction approach. We present a retrieval-based entity relexicalization approach to ensure consistent substitutions of protected entities, thereby enhancing data coherence for downstream applications. We discuss key design desiderata, the de-identification and relexicalization methodology, and the modular architecture of RedactOR and its integration with the Oracle Health Clinical AI system. Evaluated on the i2b2 2014 De-ID dataset using standard metrics with strict recall, our approach achieves competitive performance while optimizing token usage to reduce LLM costs. Finally, we discuss key lessons and insights from deployment in real-world AI-driven healthcare data pipelines.
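
The relexicalization idea is easy to sketch: cache a surrogate for each protected entity so every later mention gets the same substitute. The entity list and surrogate pools below are illustrative assumptions, not the framework's actual components.

import re

SURROGATES = {"PERSON": ["Alex Morgan", "Sam Lee"], "CITY": ["Springfield", "Riverton"]}

class Relexicalizer:
    def __init__(self):
        self.cache = {}                       # entity -> surrogate (the "retrieval" store)
        self.counters = {k: 0 for k in SURROGATES}

    def substitute(self, entity, etype):
        if entity not in self.cache:          # first sighting: assign the next surrogate
            pool = SURROGATES[etype]
            self.cache[entity] = pool[self.counters[etype] % len(pool)]
            self.counters[etype] += 1
        return self.cache[entity]             # later sightings reuse the same surrogate

r = Relexicalizer()
text = "John Smith visited Boston. John Smith lives near Boston."
for ent, etype in [("John Smith", "PERSON"), ("Boston", "CITY")]:
    text = re.sub(re.escape(ent), r.substitute(ent, etype), text)
print(text)   # both mentions of each entity receive one consistent surrogate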

Updated: 2025-07-24 22:25:37

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2505.18380v2

Evaluating Ensemble and Deep Learning Models for Static Malware Detection with Dimensionality Reduction Using the EMBER Dataset

This study investigates the effectiveness of several machine learning algorithms for static malware detection using the EMBER dataset, which contains feature representations of Portable Executable (PE) files. We evaluate eight classification models: LightGBM, XGBoost, CatBoost, Random Forest, Extra Trees, HistGradientBoosting, k-Nearest Neighbors (KNN), and TabNet, under three preprocessing settings: original feature space, Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA). The models are assessed on accuracy, precision, recall, F1 score, and AUC to examine both predictive performance and robustness. Ensemble methods, especially LightGBM and XGBoost, show the best overall performance across all configurations, with minimal sensitivity to PCA and consistent generalization. LDA improves KNN performance but significantly reduces accuracy for boosting models. TabNet, while promising in theory, underperformed under feature reduction, likely due to architectural sensitivity to input structure. The analysis is supported by detailed exploratory data analysis (EDA), including mutual information ranking, PCA or t-SNE visualizations, and outlier detection using Isolation Forest and Local Outlier Factor (LOF), which confirm the discriminatory capacity of key features in the EMBER dataset. The results suggest that boosting models remain the most reliable choice for high-dimensional static malware detection, and that dimensionality reduction should be applied selectively based on model type. This work provides a benchmark for comparing classification models and preprocessing strategies in malware detection tasks and contributes insights that can guide future system development and real-world deployment.
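
A hedged sketch of the evaluation protocol follows: one classifier assessed on the original, PCA-reduced, and LDA-reduced feature spaces. Synthetic data stands in for the EMBER features, and a single sklearn model stands in for the eight evaluated.

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=100, n_informative=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

pca = PCA(n_components=20).fit(Xtr)
lda = LinearDiscriminantAnalysis().fit(Xtr, ytr)
settings = {
    "original": (Xtr, Xte),
    "PCA": (pca.transform(Xtr), pca.transform(Xte)),
    "LDA": (lda.transform(Xtr), lda.transform(Xte)),
}
for name, (tr, te) in settings.items():
    clf = HistGradientBoostingClassifier(random_state=0).fit(tr, ytr)
    print(name, round(f1_score(yte, clf.predict(te)), 3))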

Updated: 2025-07-24 22:23:53

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2507.16952v2

Toward Super Agent System with Hybrid AI Routers

AI agents powered by Large Language Models are transforming the world through a rapidly expanding range of applications. A super agent has the potential to fulfill diverse user needs, such as summarization, coding, and research, by accurately understanding user intent and leveraging the appropriate tools to solve tasks. However, to make such an agent viable for real-world deployment and accessible at scale, significant optimizations are required to ensure high efficiency and low cost. This position paper presents a design of the Super Agent System powered by hybrid AI routers. Upon receiving a user prompt, the system first detects the intent of the user, then routes the request to specialized task agents with the necessary tools or automatically generates agentic workflows. In practice, most applications serve directly as AI assistants on edge devices such as phones and robots. As different language models vary in capability, and cloud-based models often entail high computational costs, latency, and privacy concerns, we then explore a hybrid mode in which the router dynamically selects between local and cloud models based on task complexity. Finally, we introduce the blueprint of an on-device super agent enhanced with the cloud. With advances in multi-modality models and edge hardware, we envision that most computations can be handled locally, with cloud collaboration only as needed. Such an architecture paves the way for super agents to be seamlessly integrated into everyday life in the near future.
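
A toy version of the routing decision is sketched below; the complexity proxy, keywords, and threshold are invented for illustration, not the paper's router.

def complexity(prompt: str) -> float:
    # crude proxy: long, code- or reasoning-heavy prompts score higher
    score = min(len(prompt) / 500, 1.0)
    if any(k in prompt.lower() for k in ("prove", "refactor", "research", "analyze")):
        score += 0.7
    return score

def route(prompt: str, threshold: float = 0.7) -> str:
    # below the threshold, stay on-device; above it, escalate to the cloud
    return "cloud-llm" if complexity(prompt) > threshold else "on-device-slm"

print(route("Summarize this paragraph."))                       # -> on-device-slm
print(route("Research this topic and prove the main claim."))   # -> cloud-llm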

Updated: 2025-07-24 22:14:47

Categories: cs.AI,cs.CL,cs.LG,cs.MA

Download: http://arxiv.org/abs/2504.10519v2

Unmasking Synthetic Realities in Generative AI: A Comprehensive Review of Adversarially Robust Deepfake Detection Systems

The rapid advancement of Generative Artificial Intelligence has fueled the proliferation of deepfakes, synthetic media encompassing fully generated content and subtly edited authentic material, posing challenges to digital security, misinformation mitigation, and identity preservation. This systematic review evaluates state-of-the-art deepfake detection methodologies, emphasizing reproducible implementations for transparency and validation. We delineate two core paradigms: (1) detection of fully synthetic media leveraging statistical anomalies and hierarchical feature extraction, and (2) localization of manipulated regions within authentic content employing multi-modal cues such as visual artifacts and temporal inconsistencies. These approaches, spanning uni-modal and multi-modal frameworks, demonstrate notable precision and adaptability in controlled settings, effectively identifying manipulations through advanced learning techniques and cross-modal fusion. However, comprehensive assessment reveals insufficient evaluation of adversarial robustness across both paradigms. Current methods are vulnerable to adversarial perturbations, subtle alterations designed to evade detection, which undermines their reliability in real-world adversarial contexts. This gap highlights a critical disconnect between methodological development and evolving threat landscapes. To address this, we contribute a curated GitHub repository aggregating open-source implementations, enabling replication and testing. Our findings emphasize the urgent need for future work to prioritize adversarial resilience, advocating scalable, modality-agnostic architectures capable of withstanding sophisticated manipulations. This review synthesizes the strengths and shortcomings of contemporary deepfake detection while charting paths toward robust, trustworthy systems.
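
To make the robustness gap concrete, the sketch below applies a one-step FGSM perturbation to a toy, untrained detector; real evaluations would target published checkpoints, so this only illustrates the attack mechanics.

import torch
import torch.nn as nn

detector = nn.Sequential(                 # untrained toy detector: mechanics only
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)

x = torch.rand(1, 3, 64, 64, requires_grad=True)   # a frame labeled "fake"
y = torch.tensor([1])

loss = nn.functional.cross_entropy(detector(x), y)
loss.backward()
x_adv = (x + 8 / 255 * x.grad.sign()).clamp(0, 1)  # one-step FGSM, eps = 8/255

with torch.no_grad():
    print("clean logits:", detector(x).squeeze().tolist())
    print("adv logits:  ", detector(x_adv).squeeze().tolist())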

Updated: 2025-07-24 22:05:52

Categories: cs.CR,cs.CV,F.2.2; I.2.7

Download: http://arxiv.org/abs/2507.21157v1

LeanKAN: A Parameter-Lean Kolmogorov-Arnold Network Layer with Improved Memory Efficiency and Convergence Behavior

The recently proposed Kolmogorov-Arnold network (KAN) is a promising alternative to multi-layer perceptrons (MLPs) for data-driven modeling. While original KAN layers were only capable of representing the addition operator, the recently proposed MultKAN layer combines addition and multiplication subnodes in an effort to improve representation performance. Here, we find that MultKAN layers suffer from a few key drawbacks, including limited applicability in output layers, bulky parameterizations with extraneous activations, and the inclusion of complex hyperparameters. To address these issues, we propose LeanKANs, a direct and modular replacement for MultKAN and traditional AddKAN layers. LeanKANs address these three drawbacks of MultKAN through general applicability as output layers, significantly reduced parameter counts for a given network structure, and a smaller set of hyperparameters. As a one-to-one layer replacement for standard AddKAN and MultKAN layers, LeanKAN is able to provide these benefits to traditional KAN learning problems as well as augmented KAN structures in which it serves as the backbone, such as KAN Ordinary Differential Equations (KAN-ODEs) or Deep Operator KANs (DeepOKAN). We demonstrate LeanKAN's simplicity and efficiency in a series of demonstrations carried out on a standard KAN toy problem as well as on ordinary and partial differential equations learned via KAN-ODEs. We find that its sparser parameterization and compact structure increase its expressivity and learning capability, leading it to outperform similar and even much larger MultKANs in various tasks.
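
For orientation, a minimal additive KAN-style layer is sketched below: each output sums learned univariate functions of the inputs, here given a simple polynomial basis. This is a generic AddKAN-style sketch, not the LeanKAN parameterization.

import torch
import torch.nn as nn

class AddKANLayer(nn.Module):
    def __init__(self, d_in, d_out, degree=3):
        super().__init__()
        self.degree = degree
        # one coefficient per (input, output, basis function)
        self.coef = nn.Parameter(torch.randn(d_in, d_out, degree + 1) * 0.1)

    def forward(self, x):                       # x: (batch, d_in)
        powers = torch.stack([x ** k for k in range(self.degree + 1)], dim=-1)
        # output[b, o] = sum over inputs i and basis k of coef[i, o, k] * x[b, i]^k
        return torch.einsum("bik,iok->bo", powers, self.coef)

layer = AddKANLayer(4, 2)
print(layer(torch.randn(8, 4)).shape)           # torch.Size([8, 2])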

Updated: 2025-07-24 22:04:51

Categories: cs.LG,cs.NE

Download: http://arxiv.org/abs/2502.17844v2

RealDeal: Enhancing Realism and Details in Brain Image Generation via Image-to-Image Diffusion Models

We propose image-to-image diffusion models designed to enhance the realism and details of generated brain images by introducing sharp edges, fine textures, subtle anatomical features, and imaging noise. Generative models have been widely adopted in the biomedical domain, especially in image generation applications. Latent diffusion models (LDMs) achieve state-of-the-art results in generating brain MRIs. However, due to latent compression, images generated by these models are overly smooth, lacking the fine anatomical structures and scan acquisition noise typically seen in real images. This work formulates the realism-enhancing and detail-adding process as an image-to-image diffusion model that refines the quality of LDM-generated images. We employ commonly used metrics such as FID and LPIPS for image realism assessment. Furthermore, we introduce new metrics to demonstrate the realism of images generated by RealDeal in terms of image noise distribution, sharpness, and texture.
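
One generic way to cast such refinement as image-to-image diffusion is SDEdit-style partial noising followed by a reverse denoising loop, sketched below with a toy noise predictor; the schedule, starting step, and network are assumptions, not RealDeal's exact formulation.

import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)
abar = torch.cumprod(1 - betas, dim=0)

eps_net = nn.Sequential(                     # toy noise predictor (untrained)
    nn.Conv2d(1, 16, 3, padding=1), nn.SiLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

def refine(x0, t_start=40):
    # partially noise the smooth input, then run a plain DDPM-style reverse loop
    x = abar[t_start].sqrt() * x0 + (1 - abar[t_start]).sqrt() * torch.randn_like(x0)
    for t in range(t_start, -1, -1):
        eps = eps_net(x)
        x = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / (1 - betas[t]).sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x

smooth_slice = torch.rand(1, 1, 32, 32)      # stand-in for an over-smooth LDM sample
print(refine(smooth_slice).shape)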

Updated: 2025-07-24 22:04:39

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.18830v1

CueBuddy: helping non-native English speakers navigate English-centric STEM education

Students in STEM classes across the world, especially in the Global South, fall behind their peers who are more fluent in English, despite being on par with them in terms of scientific prerequisites. While many of them are able to follow everyday English with ease, key terms in English remain challenging. In most cases, such students have had most of their course prerequisites in a lower-resource language. Live speech translation to lower-resource languages is a promising area of research; however, models for speech translation can be too expensive at scale and often struggle with technical content. In this paper, we describe CueBuddy, which aims to remediate these issues by providing real-time "lexical cues" through technical keyword spotting combined with real-time multilingual glossary lookup, helping students keep up with complex English jargon without disrupting their concentration on the lecture. We also describe the limitations and possible future extensions of our approach.
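
The core loop is easy to sketch: spot glossary terms in a transcript chunk and surface entries in the student's language. The glossary contents and transcript below are illustrative assumptions.

GLOSSARY = {
    "entropy": {"hi": "एन्ट्रॉपी: अव्यवस्था की माप", "es": "entropía: medida del desorden"},
    "catalyst": {"hi": "उत्प्रेरक: अभिक्रिया तेज़ करने वाला पदार्थ", "es": "catalizador"},
}

def lexical_cues(transcript_chunk: str, lang: str):
    cues = []
    for token in transcript_chunk.lower().split():
        term = token.strip(".,;:!?")
        if term in GLOSSARY:                      # technical keyword spotted
            cues.append((term, GLOSSARY[term].get(lang, term)))
    return cues

print(lexical_cues("The catalyst lowers activation energy.", "hi"))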

Updated: 2025-07-24 21:56:47

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.18827v1

Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models

Large Language Models (LLMs) perform best with well-crafted prompts, yet prompt engineering remains manual, inconsistent, and inaccessible to non-experts. We introduce Promptomatix, an automatic prompt optimization framework that transforms natural language task descriptions into high-quality prompts without requiring manual tuning or domain expertise. Promptomatix supports both a lightweight meta-prompt-based optimizer and a DSPy-powered compiler, with a modular design enabling future extension to more advanced frameworks. The system analyzes user intent, generates synthetic training data, selects prompting strategies, and refines prompts using cost-aware objectives. Evaluated across 5 task categories, Promptomatix achieves competitive or superior performance compared to existing libraries, while reducing prompt length and computational overhead, making prompt optimization scalable and efficient.
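
A minimal meta-prompt-style optimization loop is sketched below: generate candidate prompts, score them with a cost-aware objective, and keep the best. The candidate list, quality function, and call_llm stub are hypothetical stand-ins, not Promptomatix internals.

def call_llm(prompt: str) -> str:            # hypothetical stand-in for any LLM API
    return "model output"

CANDIDATES = [
    "Answer concisely: {task}",
    "You are an expert. Think step by step, then answer: {task}",
    "{task}\nReturn only the final answer.",
]

def score(prompt: str, examples, quality_fn, cost_weight=0.001):
    # cost-aware objective: task quality minus a penalty on prompt length
    quality = sum(quality_fn(call_llm(prompt.format(task=x)), y) for x, y in examples)
    return quality / len(examples) - cost_weight * len(prompt)

def optimize(examples, quality_fn):
    return max(CANDIDATES, key=lambda p: score(p, examples, quality_fn))

best = optimize([("2+2?", "4")], quality_fn=lambda pred, gold: float(gold in pred))
print("selected prompt:", best)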

Updated: 2025-07-24 21:49:26

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.14241v3

Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities

Evaluations of large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks. Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system. However, this approach suffers from two limitations. First, input-output evaluations cannot fully evaluate realistic risks from open-weight models. Second, the behaviors identified during any particular input-output evaluation can only lower-bound the model's worst-possible-case input-output behavior. As a complementary method for eliciting harmful behaviors, we propose evaluating LLMs with model tampering attacks which allow for modifications to latent activations or weights. We pit state-of-the-art techniques for removing harmful LLM capabilities against a suite of 5 input-space and 6 model tampering attacks. In addition to benchmarking these methods against each other, we show that (1) model resilience to capability elicitation attacks lies on a low-dimensional robustness subspace; (2) the success rate of model tampering attacks can empirically predict and offer conservative estimates for the success of held-out input-space attacks; and (3) state-of-the-art unlearning methods can easily be undone within 16 steps of fine-tuning. Together, these results highlight the difficulty of suppressing harmful LLM capabilities and show that model tampering attacks enable substantially more rigorous evaluations than input-space attacks alone.
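
As a flavor of what a tampering attack looks like in code, the sketch below perturbs a toy model's weights with Gaussian noise and checks whether a "capability" probe score recovers. The tiny model and the probe are illustrative assumptions, not the paper's benchmark suite:

# Sketch of a weight-tampering evaluation: perturb model weights with small
# Gaussian noise and test whether a previously suppressed behavior reappears.
# The model and the "capability score" probe are toy stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

def capability_score(m, probes):
    """Toy probe: mean logit of class 1 on probe inputs (higher = present)."""
    with torch.no_grad():
        return m(probes)[:, 1].mean().item()

probes = torch.randn(64, 16)
baseline = capability_score(model, probes)

# Tampering attack: independent noise perturbations of the weights.
best = baseline
for sigma in [0.01, 0.05, 0.1]:
    for _ in range(20):
        tampered = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
        tampered.load_state_dict(model.state_dict())
        with torch.no_grad():
            for p in tampered.parameters():
                p.add_(sigma * torch.randn_like(p))
        best = max(best, capability_score(tampered, probes))

print(f"baseline score {baseline:.3f}, best tampered score {best:.3f}")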

Updated: 2025-07-24 21:34:54

Domain: cs.CR,cs.AI

Download: http://arxiv.org/abs/2502.05209v4

Deepfake Detection Via Facial Feature Extraction and Modeling

The rise of deepfake technology raises new questions about the authenticity of various forms of media found online today. Videos and images generated by artificial intelligence (AI) have become increasingly difficult to differentiate from genuine media, creating a need for new models to detect artificially generated media. While many models have attempted to solve this, most focus on direct image processing, adapting a convolutional neural network (CNN) or a recurrent neural network (RNN) that interacts directly with the video image data. This paper introduces an approach that uses solely facial landmarks for deepfake detection. Using a dataset consisting of both deepfake and genuine videos of human faces, this paper describes an approach for extracting facial landmarks for deepfake detection, focusing on identifying subtle inconsistencies in facial movements instead of raw image processing. Experimental results demonstrate that this feature extraction technique is effective across neural network models: the same facial landmarks were tested on three models, with promising performance metrics indicating potential for real-world applications. The findings include RNN and artificial neural network (ANN) models with accuracies of 96% and 93%, respectively, and a CNN model hovering around 78%. This research challenges the assumption that raw image processing is necessary to identify deepfake videos by presenting a facial feature extraction approach that is compatible with various neural network models while requiring fewer parameters.
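
A minimal sketch of the landmark-based pipeline, under illustrative assumptions: per-frame landmark coordinates (random tensors here; a detector such as dlib or MediaPipe would supply them in practice) are turned into frame-to-frame motion features and fed to a small classifier:

# Sketch: classify videos from facial-landmark motion instead of raw pixels.
# Random tensors stand in for per-frame (x, y) landmark tracks.
import torch
import torch.nn as nn

N_FRAMES, N_LANDMARKS = 30, 68          # typical 68-point face annotation
videos = torch.randn(8, N_FRAMES, N_LANDMARKS, 2)   # toy batch of tracks
labels = torch.randint(0, 2, (8,)).float()          # 1 = deepfake (toy)

# Motion features: frame-to-frame landmark displacements, flattened.
deltas = videos[:, 1:] - videos[:, :-1]             # (8, 29, 68, 2)
features = deltas.flatten(1)                        # (8, 29*68*2)

clf = nn.Sequential(
    nn.Linear(features.shape[1], 64), nn.ReLU(), nn.Linear(64, 1)
)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):                             # toy training loop
    opt.zero_grad()
    loss = loss_fn(clf(features).squeeze(1), labels)
    loss.backward()
    opt.step()
print(f"final toy training loss: {loss.item():.4f}")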

Updated: 2025-07-24 21:30:51

Domain: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.18815v1

Scale-Consistent Learning for Partial Differential Equations

Machine learning (ML) models have emerged as a promising approach for solving partial differential equations (PDEs) in science and engineering. Previous ML models typically cannot generalize outside the training data; for example, a trained ML model for the Navier-Stokes equations only works for a fixed Reynolds number ($Re$) on a pre-defined domain. To overcome these limitations, we propose a data augmentation scheme based on scale-consistency properties of PDEs and design a scale-informed neural operator that can model a wide range of scales. Our formulation leverages two facts: (i) PDEs can be rescaled; more concretely, a given domain can be rescaled to unit size, and the parameters and boundary conditions of the PDE can be adjusted appropriately to represent the original solution; and (ii) the solution operators on a given domain are consistent on its sub-domains. We leverage these facts to create a scale-consistency loss that encourages matching between the solution evaluated on a given domain and the solution obtained on a sub-domain from the rescaled PDE. Since neural operators can fit multiple scales and resolutions, they are the natural choice for incorporating a scale-consistency loss during training of neural PDE solvers. We experiment with the scale-consistency loss and the scale-informed neural operator model on the Burgers' equation, Darcy Flow, Helmholtz equation, and Navier-Stokes equations. With scale-consistency, a model trained at $Re$ of 1000 can generalize to $Re$ ranging from 250 to 10000, and reduces the error by 34% on average across all datasets compared to baselines.
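
A minimal sketch of the scale-consistency loss, assuming a toy surrogate operator G and ignoring the parameter adjustments a real rescaled PDE would need: the prediction on a rescaled sub-domain is compared against the restriction of the full-domain prediction:

# Sketch of a scale-consistency loss: the solution predicted on a sub-domain,
# rescaled to unit size, should match the restriction of the full-domain
# prediction. The "neural operator" here is a toy CNN placeholder.
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(  # toy operator: coefficient field -> solution field
    nn.Conv2d(1, 16, 3, padding=1), nn.GELU(), nn.Conv2d(16, 1, 3, padding=1)
)

a = torch.randn(4, 1, 64, 64)        # coefficient fields on the unit domain
u_full = G(a)                        # predicted solutions on the full domain

# Take the lower-left quarter sub-domain and rescale it back to unit size.
a_sub = a[:, :, :32, :32]
a_rescaled = F.interpolate(a_sub, size=(64, 64), mode="bilinear",
                           align_corners=False)
u_sub_pred = G(a_rescaled)           # solve on the rescaled sub-domain

# Restriction of the full solution, rescaled for comparison on the same grid.
u_restricted = F.interpolate(u_full[:, :, :32, :32], size=(64, 64),
                             mode="bilinear", align_corners=False)

scale_consistency_loss = F.mse_loss(u_sub_pred, u_restricted)
print(f"scale-consistency loss: {scale_consistency_loss.item():.4f}")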

Updated: 2025-07-24 21:29:52

Domain: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2507.18813v1

MemoCoder: Automated Function Synthesis using LLM-Supported Agents

With the widespread adoption of Large Language Models (LLMs) such as GitHub Copilot and ChatGPT, developers increasingly rely on AI-assisted tools to support code generation. While LLMs can generate syntactically correct solutions for well-structured programming tasks, they often struggle with challenges that require iterative debugging, error handling, or adaptation to diverse problem structures. Existing approaches such as fine-tuning or self-repair strategies either require costly retraining or lack mechanisms to accumulate and reuse knowledge from previous attempts. To address these limitations, we propose MemoCoder, a multi-agent framework that enables collaborative problem solving and persistent learning from past fixes. At the core of MemoCoder is a Fixing Knowledge Set, which stores successful repairs and supports retrieval for future tasks. A central Mentor Agent supervises the repair process by identifying recurring error patterns and refining high-level fixing strategies, providing a novel supervisory role that guides the self-repair loop. We evaluate MemoCoder across three public benchmarks -- MBPP, HumanEval, and LiveCodeBench -- spanning a range of problem complexities. Experimental results show that MemoCoder consistently outperforms both zero-shot prompting and a Self-Repair strategy, with improvements ranging from 3.1% to 12.1% in Pass@10 and from 1.4% to 14.5% in Pass@50, demonstrating its effectiveness in iterative refinement and knowledge-guided code generation.

Updated: 2025-07-24 21:23:44

Domain: cs.SE,cs.AI

Download: http://arxiv.org/abs/2507.18812v1

Even Faster Simulations with Flow Matching: A Study of Zero Degree Calorimeter Responses

Recent advances in generative neural networks, particularly flow matching (FM), have enabled the generation of high-fidelity samples while significantly reducing computational costs. A promising application of these models is accelerating simulations in high-energy physics (HEP), helping research institutions meet their increasing computational demands. In this work, we leverage FM to develop surrogate models for fast simulations of zero degree calorimeters in the ALICE experiment. We present an effective training strategy that enables the training of fast generative models with an exceptionally low number of parameters. This approach achieves state-of-the-art simulation fidelity for both neutron (ZN) and proton (ZP) detectors, while offering substantial reductions in computational costs compared to existing methods. Our FM model achieves a Wasserstein distance of 1.27 for the ZN simulation with an inference time of 0.46 ms per sample, compared to the current best of 1.20 with an inference time of approximately 109 ms. The latent FM model further improves the inference speed, reducing the sampling time to 0.026 ms per sample, with a minimal trade-off in accuracy. Similarly, our approach achieves a Wasserstein distance of 1.30 for the ZP simulation, outperforming the current best of 2.08. The source code is available at https://github.com/m-wojnar/faster_zdc.
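
For reference, the core flow matching objective behind such surrogates can be sketched in a few lines: interpolate linearly between noise and data and regress the model onto the path's velocity. The tiny MLP and random targets are placeholders for the detector-response data:

# Sketch of conditional flow matching with linear interpolation paths:
# x_t = (1 - t) * x0 + t * x1, with regression target v = x1 - x0.
import torch
import torch.nn as nn

DIM = 44 * 44  # e.g., a flattened toy detector-response grid
model = nn.Sequential(nn.Linear(DIM + 1, 256), nn.GELU(), nn.Linear(256, DIM))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    x1 = torch.randn(32, DIM)            # toy stand-in for real responses
    x0 = torch.randn(32, DIM)            # noise samples
    t = torch.rand(32, 1)                # interpolation times in [0, 1]
    xt = (1 - t) * x0 + t * x1           # point on the straight-line path
    target_v = x1 - x0                   # velocity of the path
    pred_v = model(torch.cat([xt, t], dim=1))
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final flow matching loss: {loss.item():.4f}")

Sampling then integrates $dx/dt = v(x, t)$ from $t = 0$ to $1$, e.g., with a few Euler steps.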

Updated: 2025-07-24 21:21:33

Domain: cs.LG

Download: http://arxiv.org/abs/2507.18811v1

Curiosity Driven Exploration to Optimize Structure-Property Learning in Microscopy

Rapidly determining structure-property correlations in materials is an important challenge for better understanding fundamental mechanisms and greatly assists materials design. In microscopy, imaging data provides a direct measurement of the local structure, while spectroscopic measurements provide relevant functional property information. Deep kernel active learning approaches have been utilized to rapidly map local structure to functional properties in microscopy experiments, but are computationally expensive for multi-dimensional and correlated output spaces. Here, we present an alternative lightweight curiosity algorithm which actively samples regions with unexplored structure-property relations, utilizing a deep-learning-based surrogate model for error prediction. We show that the algorithm outperforms random sampling for predicting properties from structures, and provides a convenient tool for efficient mapping of structure-property relationships in materials science.
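
A minimal sketch of the curiosity-style acquisition rule: the next measurement is taken where an error estimate is largest. The disagreement-based error proxy below is a crude stand-in for the paper's deep-learning surrogate error predictor:

# Sketch of curiosity-driven sampling: measure next where a cheap error
# proxy is largest. The proxy (disagreement between the two nearest
# measurements) stands in for a learned error-prediction model.
import numpy as np

grid = np.linspace(0.0, 1.0, 200)          # candidate structure points
truth = np.sin(6 * grid)                   # hidden structure-property map

chosen = [20, 180]                         # indices of seed measurements
for step in range(10):
    xs, ys = grid[chosen], truth[chosen]
    dist = np.abs(grid[:, None] - xs[None, :])
    order = np.argsort(dist, axis=1)
    nearest, second = ys[order[:, 0]], ys[order[:, 1]]
    err_est = np.abs(nearest - second)     # predicted-error proxy per point
    err_est[chosen] = -np.inf              # never re-measure the same point
    chosen.append(int(np.argmax(err_est)))

print(f"{len(chosen)} measurements; locations: {np.round(grid[chosen], 2)}")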

Updated: 2025-07-24 21:17:00

Domain: cond-mat.mtrl-sci,cs.LG

Download: http://arxiv.org/abs/2504.20011v2

MetaSel: A Test Selection Approach for Fine-tuned DNN Models

Deep Neural Networks (DNNs) face challenges during deployment due to data distribution shifts. Fine-tuning adapts pre-trained models to new contexts while requiring smaller labeled sets. However, testing fine-tuned models under constrained labeling budgets remains a critical challenge. This paper introduces MetaSel, a new approach tailored to fine-tuned DNN models, to select tests from unlabeled inputs. MetaSel assumes that fine-tuned and pre-trained models share related data distributions and exhibit similar behaviors for many inputs. However, their behaviors diverge within the input subspace where fine-tuning alters decision boundaries, making those inputs more prone to misclassification. Unlike general approaches that rely solely on the DNN model and its input set, MetaSel leverages information from both the fine-tuned and pre-trained models and their behavioral differences to estimate misclassification probability for unlabeled test inputs, enabling more effective test selection. Our extensive empirical evaluation, comparing MetaSel against 11 state-of-the-art approaches and involving 68 fine-tuned models across weak, medium, and strong distribution shifts, demonstrates that MetaSel consistently delivers significant improvements in Test Relative Coverage (TRC) over existing baselines, particularly under highly constrained labeling budgets. MetaSel shows average TRC improvements of 28.46% to 56.18% over the most frequent second-best baselines while maintaining a high TRC median and low variability. Our results confirm MetaSel's practicality, robustness, and cost-effectiveness for test selection in the context of fine-tuned models.
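
The behavioral-divergence signal that MetaSel builds on can be sketched simply: score each unlabeled input by the KL divergence between the pre-trained and fine-tuned models' output distributions, and label the top-scoring inputs first. The toy models below are placeholders, and this scoring is a simplification of MetaSel's learned misclassification estimator:

# Sketch: rank unlabeled inputs by divergence between a pre-trained and a
# fine-tuned model; large divergence suggests fine-tuning moved the decision
# boundary there, making misclassification more likely.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
pretrained = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 5))
finetuned = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 5))
finetuned.load_state_dict(pretrained.state_dict())
with torch.no_grad():                      # toy "fine-tuning" perturbation
    for p in finetuned.parameters():
        p.add_(0.3 * torch.randn_like(p))

unlabeled = torch.randn(1000, 20)
with torch.no_grad():
    log_p = F.log_softmax(pretrained(unlabeled), dim=1)
    log_q = F.log_softmax(finetuned(unlabeled), dim=1)
    # KL(q || p) per input, as a behavioral-divergence score.
    scores = (log_q.exp() * (log_q - log_p)).sum(dim=1)

budget = 50
selected = scores.topk(budget).indices    # inputs to label and test first
print(f"selected indices (first 5): {selected[:5].tolist()}")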

Updated: 2025-07-24 21:16:42

Domain: cs.LG,cs.SE

Download: http://arxiv.org/abs/2503.17534v3

Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

We introduce a new approach to systematically map features discovered by sparse autoencoders across consecutive layers of large language models, extending earlier work that examined inter-layer feature links. By using a data-free cosine similarity technique, we trace how specific features persist, transform, or first appear at each stage. This method yields granular flow graphs of feature evolution, enabling fine-grained interpretability and mechanistic insights into model computations. Crucially, we demonstrate how these cross-layer feature maps facilitate direct steering of model behavior by amplifying or suppressing chosen features, achieving targeted thematic control in text generation. Together, our findings highlight the utility of a causal, cross-layer interpretability framework that not only clarifies how features develop through forward passes but also provides new means for transparent manipulation of large language models.
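
The data-free matching step can be sketched directly: treating each SAE feature as a decoder column in the residual-stream basis, features in adjacent layers are linked by the cosine similarity of those columns, with no forward passes over data. Random matrices stand in for trained SAE decoders, and the matching threshold is an illustrative choice:

# Sketch: data-free cross-layer feature matching via cosine similarity of
# sparse-autoencoder decoder directions at layers l and l+1.
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_FEATURES = 512, 2048
dec_l = rng.standard_normal((D_MODEL, N_FEATURES))      # layer l decoder
dec_next = rng.standard_normal((D_MODEL, N_FEATURES))   # layer l+1 decoder

def unit(cols):
    return cols / np.linalg.norm(cols, axis=0, keepdims=True)

sim = unit(dec_l).T @ unit(dec_next)       # (N_FEATURES, N_FEATURES) cosines

best_match = sim.argmax(axis=1)            # layer-(l+1) partner per feature
best_score = sim.max(axis=1)

THRESHOLD = 0.5                            # below this, treat as "new" feature
persists = best_score >= THRESHOLD
print(f"{persists.sum()} of {N_FEATURES} features have a cross-layer match "
      f"above {THRESHOLD}")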

Updated: 2025-07-24 21:16:27

Domain: cs.LG,cs.CL

Download: http://arxiv.org/abs/2502.03032v3

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.

Updated: 2025-07-24 21:15:25

Domain: cs.LG,physics.comp-ph

Download: http://arxiv.org/abs/2307.08423v6

Test-time Offline Reinforcement Learning on Goal-related Experience

Foundation models compress a large amount of information in a single, large neural network, which can then be queried for individual tasks. There are strong parallels between this widespread framework and offline goal-conditioned reinforcement learning algorithms: a universal value function is trained on a large number of goals, and the policy is evaluated on a single goal in each test episode. Extensive research in foundation models has shown that performance can be substantially improved through test-time training, specializing the model to the current goal. We find similarly that test-time offline reinforcement learning on experience related to the test goal can lead to substantially better policies at minimal compute costs. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state and quality with respect to the evaluation goal. We demonstrate across a wide range of high-dimensional loco-navigation and manipulation tasks that fine-tuning a policy on the selected data for a few gradient steps leads to significant performance gains over standard offline pre-training. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out. Finally, we study compute allocation at inference, demonstrating that, at comparable costs, GC-TTT induces performance gains that are not achievable by scaling model size.

Updated: 2025-07-24 21:11:39

Domain: cs.LG

Download: http://arxiv.org/abs/2507.18809v1

Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling

Recent developments in Self-Supervised Learning (SSL) have demonstrated significant potential for Speaker Verification (SV), but closing the performance gap with supervised systems remains an ongoing challenge. SSL frameworks rely on anchor-positive pairs, constructed from segments of the same audio utterance. Hence, positives have channel characteristics similar to those of their corresponding anchors, even with extensive data-augmentation. Therefore, this positive sampling strategy is a fundamental limitation as it encodes too much information regarding the recording source in the learned representations. This article introduces Self-Supervised Positive Sampling (SSPS), a bootstrapped technique for sampling appropriate and diverse positives in SSL frameworks for SV. SSPS samples positives close to their anchor in the representation space, assuming that these pseudo-positives belong to the same speaker identity but correspond to different recording conditions. This method consistently demonstrates improvements in SV performance on VoxCeleb benchmarks when applied to major SSL frameworks, including SimCLR, SwAV, VICReg, and DINO. Using SSPS, SimCLR and DINO achieve 2.57% and 2.53% EER on VoxCeleb1-O, respectively. SimCLR yields a 58% relative reduction in EER, getting comparable performance to DINO with a simpler training framework. Furthermore, SSPS lowers intra-class variance and reduces channel information in speaker representations while exhibiting greater robustness without data-augmentation.
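
A compact sketch of the positive-sampling rule: rather than pairing an anchor with a segment of the same utterance, draw a pseudo-positive from its nearest neighbors in the current embedding space, on the assumption that nearby embeddings share speaker identity but differ in recording conditions. The random embeddings stand in for a trained speaker encoder, and k is an illustrative setting:

# Sketch of bootstrapped positive sampling: draw each anchor's positive from
# its k nearest neighbors in embedding space rather than from the same clip.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, DIM, K = 500, 192, 10
embeddings = F.normalize(torch.randn(N, DIM), dim=1)  # toy encoder outputs

def sample_positives(emb, k=K):
    sim = emb @ emb.T                         # cosine similarities
    sim.fill_diagonal_(-float("inf"))         # exclude the anchor itself
    knn = sim.topk(k, dim=1).indices          # (N, k) nearest neighbors
    choice = torch.randint(0, k, (emb.shape[0],))
    return knn[torch.arange(emb.shape[0]), choice]

positives = sample_positives(embeddings)      # pseudo-positive index per anchor
print(positives[:8])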

Updated: 2025-07-24 21:10:40

Domain: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2501.17772v4

Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator

The diagonal of a model's Fisher Information Matrix (the "Fisher diagonal") has frequently been used as a way to measure parameter sensitivity. Typically, the Fisher diagonal is estimated via squared sampled gradients of the model's likelihood with respect to its parameters, averaged over a few hundred or thousand examples -- a process which incurs nontrivial computational costs. At the same time, adaptive gradient methods like the ubiquitous Adam optimizer compute a moving average of the squared gradient over the course of training. This paper therefore explores whether an approximation of the Fisher diagonal can be obtained "for free" by recycling the squared gradient accumulator that has already been computed over the course of training. Through a comprehensive set of experiments covering five applications of the Fisher diagonal, we demonstrate that the "Squisher" (SQUared gradient accumulator as an approximation of the FISHER) consistently performs similarly to the Fisher diagonal while outperforming baseline methods. Additionally, we clarify the exact differences between the Squisher and the Fisher diagonal and provide empirical quantification of their respective impact.
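
The recycling trick is concrete in PyTorch: Adam already keeps a running average of squared gradients under the key exp_avg_sq in its per-parameter state, and it can be read out after ordinary training. A toy regression model illustrates the extraction:

# Sketch: recycle Adam's squared-gradient accumulator ("exp_avg_sq") as a
# free approximation of the Fisher diagonal after normal training.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(100):                     # ordinary training loop
    x = torch.randn(32, 10)
    y = x.sum(dim=1, keepdim=True)
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# "Squisher": read the second-moment accumulator instead of re-estimating
# squared gradients with extra forward/backward passes.
squisher = {name: opt.state[p]["exp_avg_sq"].clone()
            for (name, p) in model.named_parameters()}

for name, diag in squisher.items():
    print(name, diag.flatten()[:5])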

Updated: 2025-07-24 21:10:37

Domain: cs.LG

Download: http://arxiv.org/abs/2507.18807v1

Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs

As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce our dataset, a year-long community-driven project covering all 22 Arab countries. The dataset includes instructions (input, response pairs) in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics. Built by a team of 44 researchers across the Arab world, all of whom are authors of this paper, our dataset offers a broad, inclusive perspective. We use our dataset to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations. For instance, while closed-source LLMs generally exhibit strong performance, they are not without flaws, and smaller open-source models face greater challenges. Moreover, certain countries (e.g., Egypt, the UAE) appear better represented than others (e.g., Iraq, Mauritania, Yemen). Our annotation guidelines, code, and data for reproducibility are publicly available.

Updated: 2025-07-24 21:08:27

Domain: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.00151v2

Ralts: Robust Aggregation for Enhancing Graph Neural Network Resilience on Bit-flip Errors

Graph neural networks (GNNs) have been widely applied in safety-critical applications, such as financial and medical networks, in which compromised predictions may cause catastrophic consequences. While existing research on GNN robustness has primarily focused on software-level threats, hardware-induced faults and errors remain largely underexplored. As hardware systems progress toward advanced technology nodes to meet high-performance and energy efficiency demands, they become increasingly susceptible to transient faults, which can cause bit flips and silent data corruption, a prominent issue observed by major technology companies (e.g., Meta and Google). In response, we first present a comprehensive analysis of GNN robustness against bit-flip errors, aiming to reveal system-level optimization opportunities for future reliable and efficient GNN systems. Second, we propose Ralts, a generalizable and lightweight solution to bolster GNN resilience to bit-flip errors. Specifically, Ralts exploits various graph similarity metrics to filter out outliers and recover compromised graph topology, and incorporates these protective techniques directly into aggregation functions to support any message-passing GNNs. Evaluation results demonstrate that Ralts effectively enhances GNN robustness across a range of GNN models, graph datasets, error patterns, and both dense and sparse architectures. On average, under a BER of $3\times10^{-5}$, these robust aggregation functions improve prediction accuracy by at least 20\% when errors are present in model weights or node embeddings, and by at least 10\% when errors occur in adjacency matrices. Ralts is also optimized to deliver execution efficiency comparable to built-in aggregation functions in PyTorch Geometric.
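
As a flavor of robust aggregation, the sketch below swaps a plain mean over neighbor messages for a coordinate-wise trimmed mean, one simple way to keep a single corrupted embedding from dominating the aggregate. It is an illustrative stand-in, not Ralts's cross-reference-augmented scheme:

# Sketch: coordinate-wise trimmed-mean aggregation over neighbor messages,
# a simple robust alternative to plain mean aggregation in a GNN layer.
import torch

def trimmed_mean(messages, trim=1):
    """messages: (num_neighbors, dim). Drop `trim` lowest/highest per coord."""
    vals, _ = messages.sort(dim=0)
    kept = vals[trim : messages.shape[0] - trim]
    return kept.mean(dim=0)

neigh = torch.randn(8, 16)
neigh[3] += 1e6          # simulate a bit-flip-corrupted neighbor embedding

print("plain mean magnitude:  ", neigh.mean(dim=0).abs().max().item())
print("trimmed mean magnitude:", trimmed_mean(neigh).abs().max().item())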

Updated: 2025-07-24 21:03:44

Domain: cs.LG

Download: http://arxiv.org/abs/2507.18804v1

Central limit theorems for the eigenvalues of graph Laplacians on data clouds

Given i.i.d.\ samples $X_n =\{ x_1, \dots, x_n \}$ from a distribution supported on a low dimensional manifold ${M}$ embedded in Euclidean space, we consider the graph Laplacian operator $\Delta_n$ associated to an $\varepsilon$-proximity graph over $X_n$ and study the asymptotic fluctuations of its eigenvalues around their means. In particular, letting $\hat{\lambda}_l^\varepsilon$ denote the $l$-th eigenvalue of $\Delta_n$, and under suitable assumptions on the data generating model and on the rate of decay of $\varepsilon$, we prove that $\sqrt{n } (\hat{\lambda}_{l}^\varepsilon - \mathbb{E}[\hat{\lambda}_{l}^\varepsilon] )$ is asymptotically Gaussian with a variance that we can explicitly characterize. A formal argument allows us to interpret this asymptotic variance as the dissipation of a gradient flow of a suitable energy with respect to the Fisher-Rao geometry. This geometric interpretation allows us to give, in turn, a statistical interpretation of the asymptotic variance in terms of a Cramer-Rao lower bound for the estimation of the eigenvalues of a certain weighted Laplace-Beltrami operator. The latter interpretation suggests a form of asymptotic statistical efficiency for the eigenvalues of the graph Laplacian. We also present CLTs for multiple eigenvalues and, through several numerical experiments, explore the validity of our results when some of the assumptions that we make in our theoretical analysis are relaxed.
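
Restating the main fluctuation result from the abstract in display form (the symbol $\sigma_l^2$ is placeholder notation for the explicitly characterized limiting variance):

% Central limit theorem for graph Laplacian eigenvalues, as stated in the
% abstract; \sigma_l^2 denotes the explicitly characterized limiting variance.
\[
\sqrt{n}\,\bigl(\hat{\lambda}_{l}^{\varepsilon} - \mathbb{E}[\hat{\lambda}_{l}^{\varepsilon}]\bigr)
\;\xrightarrow{d}\; \mathcal{N}\bigl(0,\, \sigma_{l}^{2}\bigr)
\quad \text{as } n \to \infty.
\]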

Updated: 2025-07-24 21:03:20

Domain: stat.ML,cs.LG,math.AP,math.DG,math.PR,62G20 60F05 58J50 35P15 68R10 60D05

Download: http://arxiv.org/abs/2507.18803v1

DxHF: Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition

Human preferences are widely used to align large language models (LLMs) through methods such as reinforcement learning from human feedback (RLHF). However, the current user interfaces require annotators to compare text paragraphs, which is cognitively challenging when the texts are long or unfamiliar. This paper contributes by studying the decomposition principle as an approach to improving the quality of human feedback for LLM alignment. This approach breaks down the text into individual claims instead of directly comparing two long-form text responses. Based on the principle, we build a novel user interface, DxHF. It enhances the comparison process by showing decomposed claims, visually encoding the relevance of claims to the conversation and linking similar claims. This allows users to skim through key information and identify differences for better and quicker judgment. Our technical evaluation shows evidence that decomposition generally improves feedback accuracy regarding the ground truth, particularly for users with uncertainty. A crowdsourcing study with 160 participants indicates that using DxHF improves feedback accuracy by an average of 5%, although it increases the average feedback time by 18 seconds. Notably, accuracy is significantly higher in situations where users have less certainty. The findings of the study highlight the potential of HCI as an effective method for improving human-AI alignment.

Updated: 2025-07-24 21:01:24

Domain: cs.HC,cs.AI

Download: http://arxiv.org/abs/2507.18802v1

Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models

Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unmask based on model confidence, ignore interactions when unmasking multiple positions in parallel and effectively reduce to slow, autoregressive behavior. We propose the Dilated Unmasking Scheduler (DUS), an inference-only, planner-model-free method that partitions sequence positions into non-adjacent dilated groups and unmasks them in parallel so as to minimize an upper bound on joint entropy gain at each denoising step. By explicitly trading off the number of network calls against generation quality, DUS recovers most of the performance lost under traditional parallel unmasking strategies. Across math (GSM8K, MATH500), code (HumanEval, MBPP) and general-knowledge benchmarks (BBH, MMLU-Pro), DUS outperforms confidence-based planners, without modifying the underlying denoiser, and reveals the true speed-quality frontier of MDLMs.
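
The dilation pattern itself is easy to sketch: positions are partitioned into interleaved groups with stride equal to the number of groups, so positions unmasked together are far apart. The toy scheduler below shows only the partitioning and unmask order, not the entropy bound or the denoiser:

# Sketch of dilated unmasking: partition sequence positions into
# non-adjacent (strided) groups and unmask one whole group per denoising step.
SEQ_LEN, N_GROUPS = 16, 4

groups = [list(range(g, SEQ_LEN, N_GROUPS)) for g in range(N_GROUPS)]
# groups[0] = [0, 4, 8, 12], groups[1] = [1, 5, 9, 13], ...

masked = ["<mask>"] * SEQ_LEN
for step, group in enumerate(groups):
    for pos in group:
        masked[pos] = f"t{pos}"       # stand-in for a denoiser's prediction
    print(f"step {step}: " + " ".join(masked))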

Updated: 2025-07-24 20:58:50

Domain: cs.CL,cs.AI,cs.IT,cs.LG,cs.NE,math.IT

Download: http://arxiv.org/abs/2506.19037v3

2048: Reinforcement Learning in a Delayed Reward Environment

Delayed and sparse rewards present a fundamental obstacle for reinforcement-learning (RL) agents, which struggle to assign credit for actions whose benefits emerge many steps later. The sliding-tile game 2048 epitomizes this challenge: although frequent small score changes yield immediate feedback, they often mislead agents into locally optimal but globally suboptimal strategies. In this work, we introduce a unified, distributional multi-step RL framework designed to directly optimize long-horizon performance. Using the open source Gym-2048 environment we develop and compare four agent variants: standard DQN, PPO, QR-DQN (Quantile Regression DQN), and a novel Horizon-DQN (H-DQN) that integrates distributional learning, dueling architectures, noisy networks, prioritized replay, and more. Empirical evaluation reveals a clear hierarchy in effectiveness: max episode scores improve from 3.988K (DQN) to 5.756K (PPO), 8.66K (QR-DQN), and 18.21K (H-DQN), with H-DQN reaching the 2048 tile. Upon scaling H-DQN, it reaches a max score of 41.828K and a 4096 tile. These results demonstrate that distributional, multi-step targets substantially enhance performance in sparse-reward domains, and they suggest promising avenues for further gains through model-based planning and curriculum learning.
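
The multi-step credit assignment at the heart of such agents can be illustrated with the standard n-step bootstrapped target, which propagates a delayed score change back over n moves in one update (a generic textbook form, not the exact H-DQN target):

# Sketch: n-step bootstrapped return target, the standard way multi-step DQN
# variants propagate delayed rewards backwards through a trajectory.
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """rewards: r_t ... r_{t+n-1}; bootstrap_value: max_a Q(s_{t+n}, a)."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# Toy example: a sparse reward arrives only on the last of 5 steps, yet the
# target for step t already reflects it.
print(n_step_target([0.0, 0.0, 0.0, 0.0, 32.0], bootstrap_value=10.0))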

Updated: 2025-07-24 20:58:48

Domain: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.05465v2

Resolving Indirect Calls in Binary Code via Cross-Reference Augmented Graph Neural Networks

Binary code analysis is essential in scenarios where source code is unavailable, with extensive applications across various security domains. However, accurately resolving indirect call targets remains a longstanding challenge in maintaining the integrity of static analysis in binary code. This difficulty arises because the operand of a call instruction (e.g., call rax) remains unknown until runtime, resulting in an incomplete inter-procedural control flow graph (CFG). Previous approaches have struggled with low accuracy and limited scalability. To address these limitations, recent work has increasingly turned to machine learning (ML) to enhance analysis. However, this ML-driven approach faces two significant obstacles: low-quality callsite-callee training pairs and inadequate binary code representation, both of which undermine the accuracy of ML models. In this paper, we introduce NeuCall, a novel approach for resolving indirect calls using graph neural networks. Existing ML models in this area often overlook key elements such as data and code cross-references, which are essential for understanding a program's control flow. In contrast, NeuCall augments CFGs with cross-references, preserving rich semantic information. Additionally, we leverage advanced compiler-level type analysis to generate high-quality callsite-callee training pairs, enhancing model precision and reliability. We further design a graph neural model that leverages augmented CFGs and relational graph convolutions for accurate target prediction. Evaluated against real-world binaries from GitHub and the Arch User Repository on x86_64 architecture, NeuCall achieves an F1 score of 95.2%, outperforming state-of-the-art ML-based approaches. These results highlight NeuCall's effectiveness in building precise inter-procedural CFGs and its potential to advance downstream binary analysis and security applications.

Updated: 2025-07-24 20:54:41

Domain: cs.CR

Download: http://arxiv.org/abs/2507.18801v1

Semantic IDs for Music Recommendation

Training recommender systems for next-item recommendation often requires unique embeddings to be learned for each item, which may take up most of the trainable parameters for a model. Shared embeddings, such as those derived from content information, can reduce the number of distinct embeddings that must be stored in memory. This allows for a more lightweight model; correspondingly, model complexity can be increased because fewer embeddings need to be stored in memory. We show the benefit of using shared content-based features ('semantic IDs') in improving recommendation accuracy and diversity, while reducing model size, for two music recommendation datasets, including an online A/B test on a music streaming service.
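
The parameter-sharing idea can be sketched as follows: items are mapped to discrete content-derived codes ('semantic IDs'), and one embedding table over codes replaces the per-item table, so items with similar content share parameters. The nearest-codebook assignment below is an illustrative stand-in for a learned quantizer:

# Sketch: replace a per-item embedding table with a smaller table indexed by
# content-derived "semantic IDs", so similar items share embedding parameters.
import torch
import torch.nn as nn

torch.manual_seed(0)
N_ITEMS, N_CODES, CONTENT_DIM, EMB_DIM = 10_000, 1024, 64, 32

content = torch.randn(N_ITEMS, CONTENT_DIM)   # toy content features per item
codebook = torch.randn(N_CODES, CONTENT_DIM)  # stand-in for a learned quantizer

# Semantic ID: nearest codebook entry to each item's content vector.
semantic_id = torch.cdist(content, codebook).argmin(dim=1)   # (N_ITEMS,)

shared_emb = nn.Embedding(N_CODES, EMB_DIM)   # 1024 rows instead of 10k

def item_embedding(item_ids):
    return shared_emb(semantic_id[item_ids])

print(item_embedding(torch.tensor([0, 1, 2])).shape)   # torch.Size([3, 32])
print(f"embedding rows stored: {N_CODES} instead of {N_ITEMS}")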

Updated: 2025-07-24 20:48:02

Domain: cs.IR,cs.LG

Download: http://arxiv.org/abs/2507.18800v1

Simulation-Driven Reinforcement Learning in Queuing Network Routing Optimization

This study focuses on the development of a simulation-driven reinforcement learning (RL) framework for optimizing routing decisions in complex queueing network systems, with a particular emphasis on manufacturing and communication applications. Recognizing the limitations of traditional queueing methods, which often struggle with dynamic, uncertain environments, we propose a robust RL approach leveraging Deep Deterministic Policy Gradient (DDPG) combined with Dyna-style planning (Dyna-DDPG). The framework includes a flexible and configurable simulation environment capable of modeling diverse queueing scenarios, disruptions, and unpredictable conditions. Our enhanced Dyna-DDPG implementation incorporates separate predictive models for next-state transitions and rewards, significantly improving stability and sample efficiency. Comprehensive experiments and rigorous evaluations demonstrate the framework's capability to rapidly learn effective routing policies that maintain robust performance under disruptions and scale effectively to larger network sizes. Additionally, we highlight strong software engineering practices employed to ensure reproducibility and maintainability of the framework, enabling practical deployment in real-world scenarios.
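
The Dyna-style structure can be sketched generically: real transitions train a dynamics-and-reward model, which then generates imagined transitions that are mixed into the replay buffer. Everything below (toy environment, linear model, random actions) is an illustrative skeleton, not the paper's Dyna-DDPG:

# Sketch of a Dyna-style loop: learn a transition/reward model from real
# experience, then augment the replay buffer with model-generated rollouts.
import random
import torch
import torch.nn as nn

STATE, ACTION = 4, 2
dyn_model = nn.Linear(STATE + ACTION, STATE + 1)   # predicts next state + reward
opt = torch.optim.Adam(dyn_model.parameters(), lr=1e-3)
replay = []

def env_step(s, a):                                # toy stand-in environment
    return 0.9 * s + 0.1 * a.sum(), float(-s.abs().sum())

s = torch.zeros(STATE)
for t in range(200):                               # collect real experience
    a = torch.randn(ACTION)
    s_next, r = env_step(s, a)
    replay.append((s, a, r, s_next))
    s = s_next

for step in range(100):                            # model learning
    s0, a0, r0, s1 = random.choice(replay)
    pred = dyn_model(torch.cat([s0, a0]))
    loss = ((pred[:STATE] - s1) ** 2).mean() + (pred[STATE] - r0) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

# Dyna planning: imagined transitions appended to the replay buffer.
for _ in range(50):
    s0, a0, _, _ = random.choice(replay)
    with torch.no_grad():
        pred = dyn_model(torch.cat([s0, torch.randn(ACTION)]))
    replay.append((s0, a0, pred[STATE].item(), pred[:STATE]))

print(f"replay size after planning: {len(replay)}")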

Updated: 2025-07-24 20:32:47

Domain: cs.AI

Download: http://arxiv.org/abs/2507.18795v1

CLEAR: Unlearning Spurious Style-Content Associations with Contrastive LEarning with Anti-contrastive Regularization

Learning representations unaffected by superficial characteristics is important to ensure that shifts in these characteristics at test time do not compromise downstream prediction performance. For instance, in healthcare applications, we might like to learn features that contain information about pathology yet are unaffected by race, sex, and other sources of physiologic variability, thereby ensuring predictions are equitable and generalizable across all demographics. Here we propose Contrastive LEarning with Anti-contrastive Regularization (CLEAR), an intuitive and easy-to-implement framework that effectively separates essential (i.e., task-relevant) characteristics from superficial (i.e., task-irrelevant) characteristics during training, leading to better performance when superficial characteristics shift at test time. We begin by supposing that data representations can be semantically separated into task-relevant content features, which contain information relevant to downstream tasks, and task-irrelevant style features, which encompass superficial attributes that are irrelevant to these tasks, yet may degrade performance due to associations with content present in training data that do not generalize. We then prove that our anti-contrastive penalty, which we call Pair-Switching (PS), minimizes the Mutual Information between the style attributes and content labels. Finally, we instantiate CLEAR in the latent space of a Variational Auto-Encoder (VAE), then perform experiments to quantitatively and qualitatively evaluate the resulting CLEAR-VAE over several image datasets. Our results show that CLEAR-VAE allows us to: (a) swap and interpolate content and style between any pair of samples, and (b) improve downstream classification performance in the presence of previously unseen combinations of content and style. Our code will be made publicly available.
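
Our reading of the content/style separation suggests a simple picture, sketched below under the assumption that the VAE latent vector is split into content and style halves (function names are ours, not the paper's): "pair-switching" recombines the content of one sample with the style of another.

    import torch

    def split_latent(z, content_dim):
        return z[:, :content_dim], z[:, content_dim:]

    def pair_switch(z_a, z_b, content_dim):
        c_a, s_a = split_latent(z_a, content_dim)
        c_b, s_b = split_latent(z_b, content_dim)
        # Swap styles: task-relevant content stays, superficial attributes change.
        return torch.cat([c_a, s_b], dim=1), torch.cat([c_b, s_a], dim=1)

    z_a, z_b = torch.randn(4, 16), torch.randn(4, 16)
    z_ab, z_ba = pair_switch(z_a, z_b, content_dim=8)
    print(z_ab.shape)  # torch.Size([4, 16])

Decoding such switched latents is what enables the content/style swapping and interpolation results described above.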

Updated: 2025-07-24 20:31:21

标题: CLEAR:利用带反对比正则化的对比学习消除虚假的风格-内容关联

摘要: 学习不受表面特征影响的表示,对于确保测试时这些特征的变化不损害下游预测性能至关重要。例如,在医疗应用中,我们可能希望学习包含病理信息、但不受种族、性别和其他生理变异来源影响的特征,从而确保预测在所有人群中公平且可推广。在这里,我们提出了带反对比正则化的对比学习(CLEAR),这是一个直观且易于实现的框架,可以在训练期间有效地将本质(即与任务相关)特征与表面(即与任务无关)特征分离,从而在测试时表面特征发生变化时获得更好的性能。我们首先假设数据表示可以在语义上分离为与任务相关的内容特征(包含与下游任务相关的信息)和与任务无关的风格特征(涵盖与这些任务无关的表面属性,但由于与训练数据中不可泛化的内容存在关联,可能会降低性能)。然后,我们证明了我们称为Pair-Switching(PS)的反对比惩罚可以最小化风格属性和内容标签之间的互信息。最后,我们在变分自动编码器(VAE)的潜在空间中实例化CLEAR,并在多个图像数据集上对所得的CLEAR-VAE进行定量和定性评估。我们的结果表明,CLEAR-VAE使我们能够:(a)在任意一对样本之间交换和插值内容与风格,以及(b)在出现此前未见过的内容与风格组合时提高下游分类性能。我们的代码将公开发布。

更新时间: 2025-07-24 20:31:21

领域: cs.LG

下载: http://arxiv.org/abs/2507.18794v1

Tell Me What You See: An Iterative Deep Learning Framework for Image Captioning

Image captioning, a task at the confluence of computer vision and natural language processing, requires a sophisticated understanding of both visual scenes and linguistic structure. While modern approaches are dominated by large-scale Transformer architectures, this paper documents a systematic, iterative development of foundational image captioning models, progressing from a simple CNN-LSTM encoder-decoder to a competitive attention-based system. We present a series of five models, beginning with Genesis and concluding with Nexus, an advanced model featuring an EfficientNetV2B3 backbone and a dynamic attention mechanism. Our experiments chart the impact of architectural enhancements and demonstrate a key finding within the classic CNN-LSTM paradigm: merely upgrading the visual backbone without a corresponding attention mechanism can degrade performance, as the single-vector bottleneck cannot transmit the richer visual detail. This insight validates the architectural shift to attention. Trained on the MS COCO 2017 dataset, our final model, Nexus, achieves a BLEU-4 score of 31.4, surpassing several foundational benchmarks and validating our iterative design process. This work provides a clear, replicable blueprint for understanding the core architectural principles that underpin modern vision-language tasks.
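
The bottleneck argument is easiest to see in code. The following PyTorch sketch shows a generic additive attention module of the kind the abstract describes; dimensions are illustrative and the module is not the paper's Nexus implementation:

    import torch, torch.nn as nn

    # Instead of compressing the image into one vector, the decoder attends
    # over a grid of CNN features at every step, avoiding the single-vector
    # bottleneck discussed above.
    class AdditiveAttention(nn.Module):
        def __init__(self, feat_dim, hidden_dim, attn_dim=128):
            super().__init__()
            self.w_feat = nn.Linear(feat_dim, attn_dim)
            self.w_hid = nn.Linear(hidden_dim, attn_dim)
            self.v = nn.Linear(attn_dim, 1)

        def forward(self, feats, hidden):        # feats: (B, N, F), hidden: (B, H)
            scores = self.v(torch.tanh(self.w_feat(feats) + self.w_hid(hidden)[:, None, :]))
            alpha = scores.softmax(dim=1)        # (B, N, 1) weights over image regions
            return (alpha * feats).sum(dim=1)    # (B, F) context vector for this step

    ctx = AdditiveAttention(1280, 512)(torch.randn(2, 49, 1280), torch.randn(2, 512))
    print(ctx.shape)  # torch.Size([2, 1280])

A richer backbone (e.g. EfficientNetV2B3) then pays off, because the per-region detail it produces actually reaches the decoder.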

Updated: 2025-07-24 20:20:44

标题: 告诉我你看到的:一种用于图像字幕的迭代深度学习框架

摘要: 图像字幕生成是计算机视觉和自然语言处理交汇处的任务,需要对视觉场景和语言结构都有深入的理解。尽管现代方法主要由大规模Transformer架构主导,但本文记录了基础图像字幕模型系统化、迭代式的开发过程,从简单的CNN-LSTM编码器-解码器发展到具有竞争力的基于注意力机制的系统。我们提出了一系列五个模型,从Genesis开始,以Nexus结束,后者是一个采用EfficientNetV2B3骨干网络和动态注意力机制的先进模型。我们的实验刻画了架构增强的影响,并展示了经典CNN-LSTM范式中的一个关键发现:仅升级视觉骨干网络而不配备相应的注意力机制可能会降低性能,因为单向量瓶颈无法传递更丰富的视觉细节。这一发现验证了向注意力架构的转变。在MS COCO 2017数据集上训练的最终模型Nexus取得了31.4的BLEU-4分数,超过了多个基础基准,并验证了我们的迭代设计过程。这项工作为理解支撑现代视觉-语言任务的核心架构原则提供了清晰、可复制的蓝图。

更新时间: 2025-07-24 20:20:44

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.18788v1

Initial Steps in Integrating Large Reasoning and Action Models for Service Composition

Service composition remains a central challenge in building adaptive and intelligent software systems, often constrained by limited reasoning capabilities or brittle execution mechanisms. This paper explores the integration of two emerging paradigms enabled by large language models: Large Reasoning Models (LRMs) and Large Action Models (LAMs). We argue that LRMs address the challenges of semantic reasoning and ecosystem complexity while LAMs excel in dynamic action execution and system interoperability. However, each paradigm has complementary limitations - LRMs lack grounded action capabilities, and LAMs often struggle with deep reasoning. We propose an integrated LRM-LAM architectural framework as a promising direction for advancing automated service composition. Such a system can reason about service requirements and constraints while dynamically executing workflows, thus bridging the gap between intention and execution. This integration has the potential to transform service composition into a fully automated, user-friendly process driven by high-level natural language intent.

Updated: 2025-07-24 19:57:18

标题: 将大型推理和行动模型整合用于服务组合的初始步骤

摘要: 服务组合仍然是构建适应性和智能软件系统的中心挑战,通常受限于有限的推理能力或脆弱的执行机制。本文探讨了两种新兴范例的整合,即由大型语言模型实现的大型推理模型(LRMs)和大型行动模型(LAMs)。我们认为LRMs解决了语义推理和生态系统复杂性的挑战,而LAMs在动态行动执行和系统互操作性方面表现出色。然而,每种范例都有互补的局限性 - LRMs缺乏基础行动能力,而LAMs经常在深度推理方面遇到困难。我们提出了一个集成的LRM-LAM架构框架作为推进自动服务组合的有希望的方向。这样的系统可以推理服务需求和约束,同时动态执行工作流程,从而弥合意图和执行之间的差距。这种整合有潜力将服务组合转变为完全自动化、用户友好的过程,由高级自然语言意图驱动。

更新时间: 2025-07-24 19:57:18

领域: cs.AI,cs.SE

下载: http://arxiv.org/abs/2507.18775v1

Bridging Cloud Convenience and Protocol Transparency: A Hybrid Architecture for Ethereum Node Operations on Amazon Managed Blockchain

As blockchain technologies are increasingly adopted in enterprise and research domains, the need for secure, scalable, and performance-transparent node infrastructure has become critical. While self-hosted Ethereum nodes offer operational control, they often lack elasticity and require complex maintenance. This paper presents a hybrid, service-oriented architecture for deploying and monitoring Ethereum full nodes using Amazon Managed Blockchain (AMB), integrated with EC2-based observability, IAM-enforced security policies, and reproducible automation via the AWS Cloud Development Kit. Our architecture supports end-to-end observability through custom EC2 scripts leveraging Web3.py and JSON-RPC, collecting over 1,000 real-time data points, including gas utilization, transaction inclusion latency, and mempool dynamics. These metrics are visualized and monitored through AWS CloudWatch, enabling service-level performance tracking and anomaly detection. This cloud-native framework restores low-level observability lost in managed environments while maintaining the operational simplicity of managed services. By bridging the simplicity of AMB with the transparency required for protocol research and enterprise monitoring, this work delivers one of the first reproducible, performance-instrumented Ethereum deployments on AMB. The proposed hybrid architecture enables secure, observable, and reproducible Ethereum node operations in cloud environments, suitable for both research and production use.
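
For flavor, a minimal version of such a metrics-collection script might look like the sketch below (Python with the web3 library; the endpoint URL is a placeholder, and the AWS request signing that AMB endpoints require is omitted, so this is an assumption-laden illustration rather than the paper's tooling):

    from web3 import Web3

    w3 = Web3(Web3.HTTPProvider("https://your-amb-node.example/rpc"))  # placeholder URL

    def sample_block_metrics():
        block = w3.eth.get_block("latest")
        return {
            "number": block["number"],
            "gas_utilization": block["gasUsed"] / block["gasLimit"],  # gas metric
            "tx_count": len(block["transactions"]),
            "timestamp": block["timestamp"],
        }

    # Each sample could then be pushed to CloudWatch, e.g. via boto3's
    # put_metric_data, to drive the dashboards and anomaly detection.

Polling this at a fixed interval is one straightforward way to accumulate the per-block observations the architecture feeds into CloudWatch.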

Updated: 2025-07-24 19:55:35

标题: 桥接云便利和协议透明性:在亚马逊托管区块链上进行以太坊节点操作的混合架构

摘要: 随着区块链技术在企业和研究领域的日益普及,对安全、可扩展且性能透明的节点基础设施的需求变得至关重要。虽然自托管的以太坊节点提供了操作控制,但它们通常缺乏弹性并且需要复杂的维护。本文提出了一种混合的、面向服务的架构,用于在亚马逊托管区块链(AMB)上部署和监控以太坊全节点,集成了基于EC2的可观测性、由IAM强制执行的安全策略,以及通过AWS Cloud Development Kit实现的可重现自动化。我们的架构通过使用Web3.py和JSON-RPC的自定义EC2脚本支持端到端的可观测性,收集超过1,000个实时数据点,包括gas利用率、交易纳入延迟和内存池动态。这些指标通过AWS CloudWatch进行可视化和监控,实现服务级性能跟踪和异常检测。这种云原生框架在保持托管服务操作简易性的同时,恢复了托管环境中丢失的底层可观测性。通过将AMB的简易性与协议研究和企业监控所需的透明度相结合,这项工作交付了AMB上首批可重现、带性能观测的以太坊部署之一。所提出的混合架构在云环境中实现了安全、可观测和可重现的以太坊节点操作,适用于研究和生产用途。

更新时间: 2025-07-24 19:55:35

领域: cs.CR

下载: http://arxiv.org/abs/2507.18774v1

Discovering the dynamics of \emph{Sargassum} rafts' centers of mass

Since 2011, rafts of floating \emph{Sargassum} seaweed have frequently obstructed the coasts of the Intra-Americas Seas. The motion of the rafts is represented by a high-dimensional nonlinear dynamical system. Referred to as the eBOMB model, this builds on the Maxey--Riley equation by incorporating interactions between clumps of \emph{Sargassum} forming a raft and the effects of Earth's rotation. The absence of a predictive law for the rafts' centers of mass suggests a need for machine learning. In this paper, we evaluate and contrast Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) and Sparse Identification of Nonlinear Dynamics (SINDy). In both cases, a physics-inspired closure modeling approach is taken rooted in eBOMB. Specifically, the LSTM model learns a mapping from a collection of eBOMB variables to the difference between raft center-of-mass and ocean velocities. The SINDy model's library of candidate functions is suggested by eBOMB variables and includes windowed velocity terms incorporating far-field effects of the carrying flow. Both LSTM and SINDy models perform most effectively in conditions with tightly bonded clumps, despite declining precision with rising complexity, such as with wind effects and when assessing loosely connected clumps. The LSTM model delivered the best results when designs were straightforward, with fewer neurons and hidden layers. While LSTM model serves as an opaque black-box model lacking interpretability, the SINDy model brings transparency by discerning explicit functional relationships through the function libraries. Integration of the windowed velocity terms enabled effective modeling of nonlocal interactions, particularly in datasets featuring sparsely connected rafts.
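
The SINDy side of the comparison rests on sparse regression over a library of candidate functions. A minimal sequentially thresholded least-squares implementation, with a toy one-dimensional system standing in for the eBOMB-derived library, might look like this (illustrative only):

    import numpy as np

    def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
        """Sequentially thresholded least squares: fit, zero small terms, refit."""
        xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
        for _ in range(n_iter):
            small = np.abs(xi) < threshold
            xi[small] = 0.0
            for k in range(dxdt.shape[1]):
                big = ~small[:, k]
                if big.any():
                    xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
        return xi

    # Toy usage: recover dx/dt = -2x from data, with library [1, x, x^2].
    t = np.linspace(0, 2, 200)
    x = np.exp(-2 * t)[:, None]
    dxdt = np.gradient(x[:, 0], t)[:, None]
    theta = np.hstack([np.ones_like(x), x, x**2])
    print(stlsq(theta, dxdt).round(2))  # expect roughly [[0.], [-2.], [0.]]

The windowed velocity terms mentioned above would simply enter as additional columns of theta, letting the sparse fit pick up nonlocal, far-field contributions of the carrying flow.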

Updated: 2025-07-24 19:40:00

标题: 发现\emph{Sargassum}浮筏质心的动态特性

摘要: 自2011年以来,漂浮的马尾藻(\emph{Sargassum})浮筏经常阻塞美洲内海的海岸。浮筏的运动由一个高维非线性动力学系统表示,该系统被称为eBOMB模型,它在Maxey--Riley方程的基础上纳入了构成浮筏的马尾藻团块之间的相互作用以及地球自转的影响。浮筏质心缺乏预测性规律,这表明需要借助机器学习。在本文中,我们评估并对比了长短期记忆(LSTM)循环神经网络(RNN)和非线性动力学稀疏辨识(SINDy)。在两种情况下,都采用了植根于eBOMB的物理启发式闭合建模方法。具体而言,LSTM模型学习从一组eBOMB变量到浮筏质心速度与海洋速度之差的映射。SINDy模型的候选函数库由eBOMB变量给出,并包含纳入载流远场效应的窗口化速度项。尽管随着复杂性上升(例如考虑风效应和评估松散连接的团块时)精度有所下降,LSTM和SINDy模型在团块紧密结合的条件下表现最为有效。当设计简单、神经元和隐藏层较少时,LSTM模型给出了最佳结果。虽然LSTM模型是缺乏可解释性的不透明黑盒模型,但SINDy模型通过函数库辨识出显式的函数关系,带来了透明性。窗口化速度项的引入实现了对非局部相互作用的有效建模,特别是在浮筏稀疏连接的数据集中。

更新时间: 2025-07-24 19:40:00

领域: nlin.CD,cs.LG,physics.ao-ph

下载: http://arxiv.org/abs/2507.18771v1

ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting

In this work, we introduce our solution for the Multilingual Text Detoxification Task in the PAN-2025 competition for the ylmmcl team: a robust multilingual text detoxification pipeline that integrates lexicon-guided tagging, a fine-tuned sequence-to-sequence model (s-nlp/mt0-xl-detox-orpo) and an iterative classifier-based gatekeeping mechanism. Our approach departs from prior unsupervised or monolingual pipelines by leveraging explicit toxic word annotation via the multilingual_toxic_lexicon to guide detoxification with greater precision and cross-lingual generalization. Our final model achieves the highest STA (0.922) from our previous attempts, and an average official J score of 0.612 for toxic inputs in both the development and test sets. It also achieved xCOMET scores of 0.793 (dev) and 0.787 (test). This performance outperforms baseline and backtranslation methods across multiple languages, and shows strong generalization in high-resource settings (English, Russian, French). Despite some trade-offs in SIM, the model demonstrates consistent improvements in detoxification strength. In the competition, our team achieved ninth place with a score of 0.612.
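
The classifier-gated mechanism can be pictured as a short loop: rewrite only while the gate still judges the text toxic. The sketch below is our schematic reading, with detoxify() and toxicity() as stand-ins for the fine-tuned s-nlp/mt0-xl-detox-orpo model and the gating classifier (not the team's actual API):

    def gated_detox(text, detoxify, toxicity, threshold=0.5, max_rounds=3):
        for _ in range(max_rounds):
            if toxicity(text) < threshold:   # gate: stop once judged non-toxic
                break
            text = detoxify(text)            # otherwise rewrite and re-check
        return text

    # Toy usage with stand-in callables:
    print(gated_detox("bad text!",
                      detoxify=lambda t: t.replace("bad", "unpleasant"),
                      toxicity=lambda t: 1.0 if "bad" in t else 0.0))

Capping the number of rounds bounds latency while still allowing the iterative refinement the pipeline relies on.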

Updated: 2025-07-24 19:38:15

标题: 2025年多语言文本净化中的ylmmcl:词典引导的净化和分类器门控重写

摘要: 在这项工作中,我们介绍了ylmmcl团队为PAN-2025竞赛中的多语言文本净化任务提出的解决方案:一个稳健的多语言文本净化流水线,集成了词典引导的标注、经过微调的序列到序列模型(s-nlp/mt0-xl-detox-orpo)以及一个迭代的、基于分类器的门控机制。我们的方法不同于先前的无监督或单语言流水线,它借助multilingual_toxic_lexicon对有毒词语进行显式标注来引导净化,实现更高的精度和跨语言泛化。我们的最终模型取得了我们历次尝试中最高的STA(0.922),并在开发集和测试集上对有毒输入取得了0.612的平均官方J分数。它还取得了0.793(开发集)和0.787(测试集)的xCOMET分数。这一表现在多种语言上优于基线和回译方法,并在高资源环境(英语、俄语、法语)中显示出强大的泛化能力。尽管在SIM上存在一些折衷,该模型在净化强度方面表现出一致的改进。在竞赛中,我们的团队以0.612的分数获得第九名。

更新时间: 2025-07-24 19:38:15

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.18769v1

Toward Structured Knowledge Reasoning: Contrastive Retrieval-Augmented Generation on Experience

Large language models (LLMs) achieve strong performance on plain text tasks but underperform on structured data like tables and databases. Potential challenges arise from their underexposure during pre-training and rigid text-to-structure transfer mechanisms. Unlike humans who seamlessly apply learned patterns across data modalities, LLMs struggle to infer implicit relationships embedded in tabular formats, especially in the absence of explicit structural guidance. To bridge this cognitive gap, we introduce Contrastive Retrieval-Augmented Generation on Experience (CoRE), a framework that builds experience memory representations and enhances generalization through contrastive In-Context Learning (ICL) to simulate human-like knowledge transfer. Experiments on Text-to-SQL and TableQA show CoRE significantly improves performance, achieving average gains of 3.44% and 4.24%, with up to 17.2% on challenging tasks. Our Monte Carlo Tree Search (MCTS)-generated Experience Memory expands training data 8-9x, enhancing diversity and domain coverage. This training-free and continual method propels LLMs toward structured knowledge expertise.
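
As we read it, the contrastive in-context learning step amounts to prompt construction: pair a retrieved positive experience with a contrastive negative one, so the model sees both the pattern to follow and the pattern to avoid. A hypothetical assembly function (layout and wording are ours, not the paper's):

    def build_contrastive_prompt(question, positive, negative):
        return (
            f"Correct example:\n{positive}\n\n"
            f"Incorrect example (avoid this pattern):\n{negative}\n\n"
            f"Task:\n{question}\nAnswer:"
        )

Both examples would be retrieved from the MCTS-generated experience memory described above, which is what makes the method training-free.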

Updated: 2025-07-24 19:34:51

标题: 朝向结构化知识推理:基于经验的对比检索增强生成

摘要: 大型语言模型(LLMs)在纯文本任务上表现出色,但在表格和数据库等结构化数据上表现不佳。潜在挑战源于它们在预训练过程中的不充分暴露和刚性的文本到结构传输机制。与人类不同,人类可以无缝地跨数据模态应用学习到的模式,LLMs很难推断嵌入在表格格式中的隐含关系,尤其是在缺乏明确结构指导的情况下。为了弥补这种认知差距,我们引入了Contrastive Retrieval-Augmented Generation on Experience(CoRE)框架,该框架构建经验记忆表示,并通过对比上下文学习(ICL)增强泛化能力,以模拟类似于人类的知识传递。在Text-to-SQL和TableQA上的实验表明,CoRE显著提高了性能,平均增益为3.44%和4.24%,在具有挑战性的任务上高达17.2%。我们使用蒙特卡洛树搜索(MCTS)生成的经验记忆扩展了训练数据8-9倍,增强了多样性和领域覆盖范围。这种无需训练且持续的方法将LLMs推向结构化知识专业领域。

更新时间: 2025-07-24 19:34:51

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2506.00842v2

A Study of Anatomical Priors for Deep Learning-Based Segmentation of Pheochromocytoma in Abdominal CT

Accurate segmentation of pheochromocytoma (PCC) in abdominal CT scans is essential for tumor burden estimation, prognosis, and treatment planning. It may also help infer genetic clusters, reducing reliance on expensive testing. This study systematically evaluates anatomical priors to identify configurations that improve deep learning-based PCC segmentation. We employed the nnU-Net framework to evaluate eleven annotation strategies for accurate 3D segmentation of pheochromocytoma, introducing a set of novel multi-class schemes based on organ-specific anatomical priors. These priors were derived from adjacent organs commonly surrounding adrenal tumors (e.g., liver, spleen, kidney, aorta, adrenal gland, and pancreas), and were compared against a broad body-region prior used in previous work. The framework was trained and tested on 105 contrast-enhanced CT scans from 91 patients at the NIH Clinical Center. Performance was measured using Dice Similarity Coefficient (DSC), Normalized Surface Distance (NSD), and instance-wise F1 score. Among all strategies, the Tumor + Kidney + Aorta (TKA) annotation achieved the highest segmentation accuracy, significantly outperforming the previously used Tumor + Body (TB) annotation across DSC (p = 0.0097), NSD (p = 0.0110), and F1 score (25.84% improvement at an IoU threshold of 0.5), measured on a 70-30 train-test split. The TKA model also showed superior tumor burden quantification (R^2 = 0.968) and strong segmentation across all genetic subtypes. In five-fold cross-validation, TKA consistently outperformed TB across IoU thresholds (0.1 to 0.5), reinforcing its robustness and generalizability. These findings highlight the value of incorporating relevant anatomical context into deep learning models to achieve precise PCC segmentation, offering a valuable tool to support clinical assessment and longitudinal disease monitoring in PCC patients.
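
The TKA scheme can be illustrated by how a multi-class label map might be assembled from binary organ masks; the label ids and override order below are our assumptions for illustration, not the study's dataset convention:

    import numpy as np

    def build_tka_labels(tumor, kidney, aorta):
        """Compose a Tumor + Kidney + Aorta multi-class label volume."""
        labels = np.zeros(tumor.shape, dtype=np.uint8)
        labels[kidney > 0] = 2   # anatomical prior: kidney
        labels[aorta > 0] = 3    # anatomical prior: aorta
        labels[tumor > 0] = 1    # tumor written last so it overrides overlaps
        return labels

    vol = (np.random.rand(3, 64, 64, 64) > 0.98)   # three toy binary masks
    print(np.unique(build_tka_labels(*vol)))        # e.g. [0 1 2 3]

Training nnU-Net on such a map forces the network to learn the tumor boundary jointly with its most informative anatomical neighbors, which is the mechanism the results above attribute the gains to.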

Updated: 2025-07-24 19:33:50

标题: 一项关于腹部CT深度学习分割嗜铬细胞瘤解剖先验的研究

摘要: 在腹部CT扫描中对嗜铬细胞瘤(PCC)进行准确分割,对肿瘤负荷估计、预后和治疗规划至关重要,还可能有助于推断遗传簇,减少对昂贵检测的依赖。本研究系统评估了解剖先验,以识别能够改善基于深度学习的PCC分割的配置。我们采用nnU-Net框架评估了十一种标注策略,以实现嗜铬细胞瘤的准确3D分割,并引入了一组基于器官特异性解剖先验的新型多类别方案。这些先验来自通常环绕肾上腺肿瘤的相邻器官(如肝脏、脾脏、肾脏、主动脉、肾上腺和胰腺),并与先前工作中使用的宽泛身体区域先验进行了比较。该框架在NIH临床中心91名患者的105例增强CT扫描上进行了训练和测试。性能使用Dice相似系数(DSC)、归一化表面距离(NSD)和实例级F1分数进行衡量。在所有策略中,肿瘤+肾脏+主动脉(TKA)标注取得了最高的分割精度,在70-30的训练-测试划分上,其DSC(p = 0.0097)、NSD(p = 0.0110)和F1分数(在IoU阈值0.5下提高了25.84%)均显著优于先前使用的肿瘤+身体(TB)标注。TKA模型还展示了更优的肿瘤负荷量化(R^2 = 0.968)以及在所有遗传亚型上的强大分割能力。在五折交叉验证中,TKA在各IoU阈值(0.1至0.5)上始终优于TB,印证了其稳健性和泛化能力。这些发现突显了将相关解剖背景纳入深度学习模型以实现精确PCC分割的价值,为支持PCC患者的临床评估和疾病纵向监测提供了有价值的工具。

更新时间: 2025-07-24 19:33:50

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.15193v2

Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation

Multi-Armed Bandit (MAB) algorithms are widely used in recommender systems that require continuous, incremental learning. A core aspect of MABs is the exploration-exploitation trade-off: choosing between exploiting items likely to be enjoyed and exploring new ones to gather information. In contextual linear bandits, this trade-off is particularly central, as many variants share the same linear regression backbone and differ primarily in their exploration strategies. Despite its prevalent use, offline evaluation of MABs is increasingly recognized for its limitations in reliably assessing exploration behavior. This study conducts an extensive offline empirical comparison of several linear MABs. Strikingly, across over 90% of various datasets, a greedy linear model, with no type of exploration, consistently achieves top-tier performance, often outperforming or matching its exploratory counterparts. This observation is further corroborated by hyperparameter optimization, which consistently favors configurations that minimize exploration, suggesting that pure exploitation is the dominant strategy within these evaluation settings. Our results expose significant inadequacies in offline evaluation protocols for bandits, particularly concerning their capacity to reflect true exploratory efficacy. Consequently, this research underscores the urgent necessity for developing more robust assessment methodologies, guiding future investigations into alternative evaluation frameworks for interactive learning in recommender systems.
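
The greedy-vs-exploratory contrast at the center of the study is easy to reproduce in a toy disjoint linear bandit: with the exploration coefficient alpha set to 0 the model is purely greedy, while alpha > 0 recovers LinUCB-style optimism. This is a standard textbook construction, not the paper's exact experimental code:

    import numpy as np

    class LinearBandit:
        def __init__(self, n_arms, dim, alpha=0.0, lam=1.0):
            self.alpha = alpha
            self.A = [lam * np.eye(dim) for _ in range(n_arms)]  # per-arm X^T X + lam*I
            self.b = [np.zeros(dim) for _ in range(n_arms)]

        def select(self, x):
            scores = []
            for A, b in zip(self.A, self.b):
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b                              # ridge estimate per arm
                bonus = self.alpha * np.sqrt(x @ A_inv @ x)    # optimism term (0 => greedy)
                scores.append(x @ theta + bonus)
            return int(np.argmax(scores))

        def update(self, arm, x, reward):
            self.A[arm] += np.outer(x, x)
            self.b[arm] += reward * x

    greedy = LinearBandit(n_arms=5, dim=8, alpha=0.0)  # the variant the study finds dominant
    linucb = LinearBandit(n_arms=5, dim=8, alpha=1.0)  # its exploratory counterpart

Since both variants share the same regression backbone, any offline-evaluation gap between them isolates the effect of the exploration term itself.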

Updated: 2025-07-24 19:14:39

标题: 利用胜过探索:揭示线性赌博机推荐系统离线评估中的偏差

摘要: 多臂赌博机(MAB)算法广泛应用于需要持续增量学习的推荐系统中。MAB的核心是探索与利用的权衡:在利用可能受欢迎的条目和探索新条目以收集信息之间进行选择。在上下文线性赌博机中,这种权衡尤为重要,因为许多变体共享相同的线性回归基础,主要区别在于它们的探索策略。尽管离线评估被普遍使用,人们越来越认识到它在可靠评估探索行为方面的局限性。本研究对若干线性MAB进行了广泛的离线实证比较。令人惊讶的是,在超过90%的各类数据集上,一个不带任何探索的贪心线性模型始终取得顶级性能,往往优于或持平其带探索的对应方法。超参数优化进一步佐证了这一观察:它一贯偏好最小化探索的配置,表明在这些评估设置中纯利用是占主导地位的策略。我们的结果揭示了赌博机离线评估协议的重大不足,尤其是其反映真实探索效能的能力。因此,这项研究强调了开发更稳健评估方法的迫切必要性,为推荐系统交互式学习的替代评估框架的未来研究提供指引。

更新时间: 2025-07-24 19:14:39

领域: cs.LG,cs.IR

下载: http://arxiv.org/abs/2507.18756v1

Agentic Program Repair from Test Failures at Scale: A Neuro-symbolic approach with static analysis and test execution feedback

Aim: With the advent of LLMs, sophisticated agentic program repair has become viable at large organizations with large codebases. In this work, we develop an Engineering Agent that fixes the source code based on test failures at scale across diverse software offerings internally. Method: Using Llama as the base, we employ the ReAct harness to develop an agent. We start with a test failure that was triaged by a rule-based test failure bot. We then set up an agentic harness and allow the agent to reason and run a set of 15 actions from reading a file to generating a patch. We provide feedback to the agent through static analysis and test failures so it can refine its solution. We leverage an LLM-as-a-Judge to ensure that the patch conforms to the standards followed by a human review to land fixes. Benchmark Findings: We curated offline benchmarks for our patch generator, the Engineering Agent loop, and the LLM-as-a-Judge. In offline evaluations we found that a specialized 70B model is highly competitive with the much larger but vanilla Llama-405B. In an ablation study, we found that the ReAct harness (neural model) benefited from the symbolic information from static analysis tools and test execution traces. A model that strikes a balance between the solve rate and error rate vs the cost and latency has a benchmark solve rate of 42.3% using an average 11.8 feedback iterations. Production Findings: In a three month period, 80% of the generated fixes were reviewed, of which 31.5% were landed (25.5% of the total number of generated fixes). Feedback from Engineers: We used open coding to extract qualitative themes from engineers' feedback. We saw positive feedback in the form of quick approvals, gratitude, and surprise. We also found mixed feedback when the Engineering Agent's solution was partially correct and it served as a good starting point.
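
Schematically, the agentic loop reads as follows; every callable is a placeholder for an internal component (patch generation, static analysis, test execution), so this is a skeleton of the described workflow rather than the production system:

    def repair_loop(failure, agent_step, run_tests, run_static_analysis, max_iters=12):
        context = [f"Test failure:\n{failure}"]
        for _ in range(max_iters):                 # ~11.8 feedback iterations on average
            patch = agent_step(context)            # ReAct: reason, then act (emit a patch)
            findings = run_static_analysis(patch)  # symbolic feedback channel 1
            passed, trace = run_tests(patch)       # symbolic feedback channel 2
            if passed and not findings:
                return patch                       # hand off to LLM-as-a-Judge, then human review
            context.append(f"Static analysis: {findings}\nTest trace: {trace}")
        return None

The ablation result above corresponds to the two feedback channels: removing the symbolic signals leaves the neural ReAct loop with much weaker grounding.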

Updated: 2025-07-24 19:12:32

标题: 大规模地从测试失败进行代理式程序修复:一种带有静态分析和测试执行反馈的神经符号方法

摘要: 目的:随着LLM的出现,复杂的代理式程序修复在拥有大型代码库的大型组织中变得可行。在这项工作中,我们开发了一个工程代理,针对内部多种软件产品,大规模地根据测试失败修复源代码。 方法:以Llama为基础,我们采用ReAct框架开发了一个代理。我们从一个由基于规则的测试失败机器人分诊的测试失败开始,然后搭建代理框架,允许代理进行推理并执行从读取文件到生成补丁的15种动作。我们通过静态分析和测试失败向代理提供反馈,使其能够完善解决方案。我们利用LLM作为评判者确保补丁符合标准,随后由人工审查落地修复。 基准结果:我们为补丁生成器、工程代理循环和LLM评判者构建了离线基准。在离线评估中,我们发现一个专门化的70B模型与规模大得多但未经专门化的Llama-405B相比具有很强的竞争力。在消融研究中,我们发现ReAct框架(神经模型)受益于来自静态分析工具和测试执行轨迹的符号信息。一个在求解率、错误率与成本、延迟之间取得平衡的模型,以平均11.8次反馈迭代达到了42.3%的基准求解率。 生产结果:在三个月内,80%的生成修复得到了审查,其中31.5%被落地(占生成修复总数的25.5%)。 工程师反馈:我们使用开放编码从工程师反馈中提取定性主题。我们看到了快速批准、感谢和惊喜等积极反馈;当工程代理的解决方案部分正确并可作为良好起点时,反馈则褒贬不一。

更新时间: 2025-07-24 19:12:32

领域: cs.SE,cs.AI,cs.PL

下载: http://arxiv.org/abs/2507.18755v1

Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition

Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain, described encyclopedically in textual knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP mapping, is an important and challenging task. Conventional learning approaches often target the problem in the classical multi-class or multilabel classification setting. This setting hinders the learning ability of the model due to a large number of classes (i.e., TTPs), the inevitable skewness of the label distribution and the complex hierarchical structure of the label space. We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two, thus reducing the complexity of competing solely over the large labeling space. To that end, we propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism, facilitating the learning process of the matching model despite constrained resources.
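
One standard way to realize such a sampling-based learn-to-compare objective is an InfoNCE-style loss in which the true text-TTP pair must outscore sampled negatives; the PyTorch sketch below is a generic instance of that idea (the encoders, negative-sampling scheme, and temperature value are assumptions, not the paper's exact design):

    import torch
    import torch.nn.functional as F

    def nce_matching_loss(text_emb, pos_ttp_emb, neg_ttp_embs):
        # text_emb: (B, D); pos_ttp_emb: (B, D); neg_ttp_embs: (B, K, D)
        pos = F.cosine_similarity(text_emb, pos_ttp_emb, dim=-1)[:, None]       # (B, 1)
        neg = F.cosine_similarity(text_emb[:, None, :], neg_ttp_embs, dim=-1)   # (B, K)
        logits = torch.cat([pos, neg], dim=1) / 0.07          # temperature-scaled scores
        target = torch.zeros(len(logits), dtype=torch.long)   # index 0 = the true TTP
        return F.cross_entropy(logits, target)

    loss = nce_matching_loss(torch.randn(4, 256), torch.randn(4, 256),
                             torch.randn(4, 8, 256))

Because only K sampled negatives enter each update, the model never has to compete over the full TTP label space at once, which is the efficiency argument made above.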

Updated: 2025-07-24 19:11:13

标题: 噪声对比估计为基础的低资源安全攻击模式识别匹配框架

摘要: 战术、技术和程序(TTPs)代表网络安全领域中复杂的攻击模式,在文本知识库中被百科式描述。识别网络安全写作中的TTP,通常称为TTP映射,是一个重要且具有挑战性的任务。传统的学习方法通常以经典的多类别或多标签分类设置来解决这个问题。这种设置由于大量类别(即TTP)、标签分布的不可避免的偏斜以及标签空间的复杂层次结构而阻碍了模型的学习能力。我们将问题在一个不同的学习范式中加以阐述,其中文本分配给TTP标签是通过两者之间的直接语义相似性来决定的,从而减少了仅仅在庞大的标记空间上竞争的复杂性。为此,我们提出了一个具有有效基于采样的学习比较机制的神经匹配架构,促进了匹配模型的学习过程,尽管资源受到限制。

更新时间: 2025-07-24 19:11:13

领域: cs.LG,cs.AI,cs.CL,cs.CR

下载: http://arxiv.org/abs/2401.10337v4

Time-resolved dynamic CBCT reconstruction using prior-model-free spatiotemporal Gaussian representation (PMF-STGR)

Time-resolved CBCT imaging, which reconstructs a dynamic sequence of CBCTs reflecting intra-scan motion (one CBCT per x-ray projection without phase sorting or binning), is highly desired for regular and irregular motion characterization, patient setup, and motion-adapted radiotherapy. Representing patient anatomy and associated motion fields as 3D Gaussians, we developed a Gaussian representation-based framework (PMF-STGR) for fast and accurate dynamic CBCT reconstruction. PMF-STGR comprises three major components: a dense set of 3D Gaussians to reconstruct a reference-frame CBCT for the dynamic sequence; another 3D Gaussian set to capture three-level, coarse-to-fine motion-basis-components (MBCs) to model the intra-scan motion; and a CNN-based motion encoder to solve projection-specific temporal coefficients for the MBCs. Scaled by the temporal coefficients, the learned MBCs will combine into deformation vector fields to deform the reference CBCT into projection-specific, time-resolved CBCTs to capture the dynamic motion. Due to the strong representation power of 3D Gaussians, PMF-STGR can reconstruct dynamic CBCTs in a 'one-shot' training fashion from a standard 3D CBCT scan, without using any prior anatomical or motion model. We evaluated PMF-STGR using XCAT phantom simulations and real patient scans. Metrics including the image relative error, structural-similarity-index-measure, tumor center-of-mass-error, and landmark localization error were used to evaluate the accuracy of solved dynamic CBCTs and motion. PMF-STGR shows clear advantages over a state-of-the-art, INR-based approach, PMF-STINR. Compared with PMF-STINR, PMF-STGR reduces reconstruction time by 50% while reconstructing less blurred images with better motion accuracy. With improved efficiency and accuracy, PMF-STGR enhances the applicability of dynamic CBCT imaging for potential clinical translation.
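
The composition step, as we infer it from the description, is a weighted sum: projection-specific temporal coefficients scale the motion-basis components into a single deformation vector field. A shape-level PyTorch sketch (dimensions are illustrative):

    import torch

    def compose_dvf(mbcs, coeffs):
        # mbcs: (K, 3, D, H, W) learned motion bases; coeffs: (K,) per-projection weights
        return (coeffs[:, None, None, None, None] * mbcs).sum(dim=0)  # (3, D, H, W)

    dvf = compose_dvf(torch.randn(3, 3, 16, 16, 16), torch.randn(3))
    print(dvf.shape)  # torch.Size([3, 16, 16, 16])

Warping the reference-frame CBCT with this field, one projection at a time, yields the time-resolved sequence described above.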

Updated: 2025-07-24 18:45:56

标题: 使用无先验模型的时空高斯表示(PMF-STGR)进行时间分辨动态CBCT重建

摘要: 时间分辨CBCT成像重建反映扫描内运动的动态CBCT序列(每个X射线投影对应一个CBCT,无需相位排序或分箱),对于规则与不规则运动的表征、患者摆位以及运动自适应放射治疗都非常重要。我们将患者解剖结构及相关运动场表示为3D高斯,开发了一个基于高斯表示的框架(PMF-STGR),用于快速、准确的动态CBCT重建。PMF-STGR包括三个主要组成部分:一组稠密的3D高斯,用于重建动态序列的参考帧CBCT;另一组3D高斯,用于捕捉三级、由粗到细的运动基分量(MBCs),以建模扫描内运动;以及一个基于CNN的运动编码器,用于求解MBCs的投影特定时间系数。经时间系数缩放后,学习到的MBCs组合成形变矢量场,将参考CBCT形变为投影特定、时间分辨的CBCT,以捕捉动态运动。得益于3D高斯强大的表示能力,PMF-STGR可以从标准3D CBCT扫描中以"一次性"训练方式重建动态CBCT,而无需使用任何先验解剖或运动模型。我们使用XCAT体模仿真和真实患者扫描评估了PMF-STGR,采用图像相对误差、结构相似性指数、肿瘤质心误差和标志点定位误差等指标来评估所求解的动态CBCT和运动的准确性。与最先进的基于INR的方法PMF-STINR相比,PMF-STGR显示出明显优势:它将重建时间缩短了50%,同时重建的图像更清晰、运动精度更高。凭借更高的效率和准确性,PMF-STGR增强了动态CBCT成像面向临床转化的适用性。

更新时间: 2025-07-24 18:45:56

领域: physics.med-ph,cs.LG,eess.IV

下载: http://arxiv.org/abs/2503.22139v2

Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement

Language models (LMs) are susceptible to in-context reward hacking, where they exploit flaws in tainted or faulty written specifications or rubrics to achieve high scores without fulfilling the user's true intent. We introduce Specification Self-Correction (SSC), a novel, test-time framework that enables an LM to identify and correct flaws within its own guiding specification. SSC employs a multi-step inference process where the model first generates a response based on a potentially tainted specification, critiques its output, and then revises the specification itself to remove the exploitable loophole. A final, more robust response is then generated using this self-corrected specification. Across experiments spanning creative writing and agentic coding tasks with several LMs, we demonstrate that while models initially game tainted specifications in 50-70\% of cases, the SSC process reduces this vulnerability by over 90\%. This dynamic repair occurs at inference time, requires no weight modification, and leads to more robustly aligned model behavior. Code at https://github.com/vicgalle/specification-self-correction .
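
The multi-step inference reads naturally as four chained model calls. The sketch below is our schematic rendering, with llm() a placeholder chat function rather than an API from the paper's repository:

    def specification_self_correct(llm, spec, task):
        draft = llm(f"Specification:\n{spec}\n\nTask:\n{task}")
        critique = llm(f"Critique this response for spec-gaming:\n{draft}\n\nSpec:\n{spec}")
        revised_spec = llm(f"Revise the specification to close the loopholes noted:\n"
                           f"{critique}\n\nOriginal spec:\n{spec}")
        # Final, more robust response under the self-corrected specification.
        return llm(f"Specification:\n{revised_spec}\n\nTask:\n{task}")

Everything happens at inference time, which is why the method needs no weight updates.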

Updated: 2025-07-24 18:44:28

标题: 规范自我校正:通过测试时细化缓解上下文奖励破解

摘要: 语言模型(LM)容易受到上下文奖励破解的影响:它们利用被污染或有缺陷的书面规范或评分标准中的漏洞来获得高分,而不实现用户的真实意图。我们引入了规范自我校正(SSC),这是一个新颖的测试时框架,使LM能够识别并纠正其自身指导规范中的缺陷。SSC采用多步推理过程:模型首先根据可能被污染的规范生成响应,对其输出进行批评,然后修订规范本身以消除可被利用的漏洞,最后使用这个自我校正后的规范生成更稳健的最终响应。在涵盖创意写作和代理式编码任务、使用多个LM的实验中,我们证明,虽然模型最初在50-70%的情况下钻被污染规范的空子,但SSC过程将这种脆弱性降低了90%以上。这种动态修复发生在推理时,无需修改权重,并带来对齐得更稳健的模型行为。代码见 https://github.com/vicgalle/specification-self-correction 。

更新时间: 2025-07-24 18:44:28

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18742v1

FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models

The increasing demand to process long and high-resolution videos significantly burdens Large Vision-Language Models (LVLMs) due to the enormous number of visual tokens. Existing token reduction methods primarily prune tokens based on importance metrics, such as cumulative attention scores. However, even important tokens may exhibit high redundancy caused by similarity among adjacent video frames and repetitive visual elements. To address this limitation, we propose FrameFusion, a novel token reduction approach integrating similarity-based merging with importance-based pruning. We conduct a thorough study on token similarity characteristics, revealing three key insights: (1) spatially corresponding visual tokens between adjacent frames have higher cosine similarities compared to other token pairs; (2) high token similarities prominently decrease in deeper model layers; and (3) token similarity rankings are highly consistent across different layers. Guided by these observations, FrameFusion computes token similarities exclusively between corresponding visual tokens from adjacent frames, applies token merging at initial successive layers followed by pruning in deeper layers, and adopts a cascaded merging strategy to further enhance efficiency. We evaluate FrameFusion comprehensively across six diverse LVLMs, ranging from 2B to 72B parameters, using five video benchmarks encompassing video retrieval, question-answering, and spatial-temporal understanding tasks. Experiments show that FrameFusion reduces visual tokens by 70%, achieving 1.6-3.6x end-to-end speedups, with an average performance impact of less than 3%. Our code is available at: https://github.com/thu-nics/FrameFusion.
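
The similarity-based half of the method can be sketched as follows: cosine similarity is computed only between spatially corresponding tokens of adjacent frames, and sufficiently similar pairs are merged. The threshold and the averaging rule below are our assumptions for illustration, and the importance-based pruning stage is omitted:

    import torch
    import torch.nn.functional as F

    def merge_adjacent_frames(prev_tokens, curr_tokens, tau=0.9):
        # prev_tokens, curr_tokens: (N, D) visual tokens at matching grid positions
        sim = F.cosine_similarity(prev_tokens, curr_tokens, dim=-1)        # (N,)
        merged = torch.where(sim[:, None] > tau,
                             (prev_tokens + curr_tokens) / 2, curr_tokens)
        keep_prev = sim <= tau        # previous-frame tokens that stay distinct
        return merged, keep_prev

    merged, keep = merge_adjacent_frames(torch.randn(196, 64), torch.randn(196, 64))

Restricting the comparison to corresponding positions keeps the cost linear in the token count, which is what makes merging in the early layers cheap.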

Updated: 2025-07-24 18:44:26

标题: FrameFusion: 结合相似性和重要性,减少大型视觉语言模型中的视频标记

摘要: 随着对处理长时间和高分辨率视频的需求不断增加,由于视觉标记的数量庞大,大型视觉-语言模型(LVLMs)受到了重大负担。现有的标记减少方法主要基于重要性指标,如累积注意力分数对标记进行修剪。然而,即使重要的标记也可能由于相邻视频帧之间的相似性和重复的视觉元素而具有高冗余性。为了解决这一限制,我们提出了一种新颖的标记减少方法FrameFusion,该方法将基于相似性的合并与基于重要性的修剪相结合。我们对标记相似性特征进行了深入研究,揭示了三个关键见解:(1)相邻帧之间空间对应的视觉标记比其他标记对具有更高的余弦相似性;(2)高标记相似性在较深的模型层中明显减少;(3)标记相似性排名在不同层之间高度一致。受这些观察的启发,FrameFusion 仅在相邻帧之间的对应视觉标记之间计算标记相似性,首先在初始连续层进行标记合并,然后在较深的层进行修剪,并采用级联合并策略进一步增强效率。我们在六种不同的LVLMs上全面评估了FrameFusion,这些LVLMs的参数范围从2B到72B,使用了涵盖视频检索、问答和时空理解任务的五个视频基准。实验表明,FrameFusion 将视觉标记减少了70%,实现了1.6-3.6倍的端到端加速,平均性能影响不到3%。我们的代码可在以下网址获得:https://github.com/thu-nics/FrameFusion。

更新时间: 2025-07-24 18:44:26

领域: cs.CV,cs.AI,68T45, 68T50,I.2.7; I.2.10

下载: http://arxiv.org/abs/2501.01986v2

Learned Single-Pixel Fluorescence Microscopy

Single-pixel imaging has emerged as a key technique in fluorescence microscopy, where fast acquisition and reconstruction are crucial. In this context, images are reconstructed from linearly compressed measurements. In practice, total variation minimisation is still used to reconstruct the image from noisy measurements of the inner product between orthogonal sampling pattern vectors and the original image data. However, data can be leveraged to learn the measurement vectors and the reconstruction process, thereby enhancing compression, reconstruction quality, and speed. We train an autoencoder through self-supervision to learn an encoder (or measurement matrix) and a decoder. We then test it on physically acquired multispectral and intensity data. During acquisition, the learned encoder becomes part of the physical device. Our approach can enhance single-pixel imaging in fluorescence microscopy by reducing reconstruction time by two orders of magnitude, achieving superior image quality, and enabling multispectral reconstructions. Ultimately, learned single-pixel fluorescence microscopy could advance diagnosis and biological research, providing multispectral imaging at a fraction of the cost.
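
A toy version of the learned-measurement idea is a linear, bias-free encoder whose weight rows play the role of the physical sampling patterns, paired with a nonlinear decoder; the sizes below are illustrative, not the paper's architecture:

    import torch, torch.nn as nn

    class SinglePixelAE(nn.Module):
        def __init__(self, n_pixels=32 * 32, n_measurements=256):
            super().__init__()
            # The encoder's weight matrix is the learned measurement matrix.
            self.encoder = nn.Linear(n_pixels, n_measurements, bias=False)
            self.decoder = nn.Sequential(
                nn.Linear(n_measurements, 1024), nn.ReLU(),
                nn.Linear(1024, n_pixels),
            )

        def forward(self, x):                  # x: (B, n_pixels) flattened images
            return self.decoder(self.encoder(x))

    model = SinglePixelAE()
    recon = model(torch.randn(8, 1024))
    # After self-supervised training, model.encoder.weight holds the patterns
    # that would be loaded onto the physical device.

Because the decoder is a single forward pass rather than an iterative total-variation solve, reconstruction is where the orders-of-magnitude speedup comes from.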

Updated: 2025-07-24 18:40:28

标题: 学习的单像素荧光显微镜

摘要: 单像素成像已成为荧光显微镜中的关键技术,快速采集和重建至关重要。在这种情况下,图像是从线性压缩的测量中重建的。在实践中,仍然使用总变差最小化来从噪声测量中重建图像,这些噪声测量是原始图像数据与正交采样模式向量之间的内积。然而,数据可以用来学习测量向量和重建过程,从而增强压缩、重建质量和速度。我们通过自监督训练一个自动编码器来学习编码器(或测量矩阵)和解码器。然后我们将其测试在物理获取的多光谱和强度数据上。在采集过程中,学习的编码器成为物理设备的一部分。我们的方法可以通过将重建时间缩短两个数量级,实现更优越的图像质量,并实现多光谱重建,从而增强荧光显微镜中的单像素成像。最终,学习的单像素荧光显微镜可以推动诊断和生物研究,以较低成本提供多光谱成像。

更新时间: 2025-07-24 18:40:28

领域: cs.CV,cs.AI,physics.optics

下载: http://arxiv.org/abs/2507.18740v1

An Explainable Equity-Aware P2P Energy Trading Framework for Socio-Economically Diverse Microgrid

Fair and dynamic energy allocation in community microgrids remains a critical challenge, particularly when serving socio-economically diverse participants. Static optimization and cost-sharing methods often fail to adapt to evolving inequities, leading to participant dissatisfaction and unsustainable cooperation. This paper proposes a novel framework that integrates multi-objective mixed-integer linear programming (MILP), cooperative game theory, and a dynamic equity-adjustment mechanism driven by reinforcement learning (RL). At its core, the framework utilizes a bi-level optimization model grounded in Equity-regarding Welfare Maximization (EqWM) principles, which incorporate Rawlsian fairness to prioritize the welfare of the least advantaged participants. We introduce a Proximal Policy Optimization (PPO) agent that dynamically adjusts socio-economic weights in the optimization objective based on observed inequities in cost and renewable energy access. This RL-powered feedback loop enables the system to learn and adapt, continuously striving for a more equitable state. To ensure transparency, Explainable AI (XAI) is used to interpret the benefit allocations derived from a weighted Shapley value. Validated across six realistic scenarios, the framework demonstrates peak demand reductions of up to 72.6%, and significant cooperative gains. The adaptive RL mechanism further reduces the Gini coefficient over time, showcasing a pathway to truly sustainable and fair energy communities.
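
A key quantity in the adaptive loop is the Gini coefficient over per-participant costs. For reference, the standard formula takes a few lines of Python (the paper's actual observation vector for the PPO agent is richer than this single scalar):

    import numpy as np

    def gini(costs):
        """Gini coefficient of a non-negative cost vector (0 = equal, 1 = maximal inequality)."""
        c = np.sort(np.asarray(costs, dtype=float))
        n = len(c)
        index = np.arange(1, n + 1)
        return (2 * index - n - 1) @ c / (n * c.sum())

    print(round(gini([10, 10, 10, 10]), 3))  # 0.0   -> perfectly equal allocation
    print(round(gini([0, 0, 0, 40]), 3))     # 0.75  -> highly unequal allocation

Driving this value down over successive trading periods is exactly the equity trend reported in the results.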

Updated: 2025-07-24 18:38:51

标题: 一个可解释的、关注公平性的P2P能源交易框架,适用于社会经济多样化的微电网

摘要: 社区微电网中公平且动态的能源分配仍然是一个关键挑战,特别是在为社会经济多元化的参与者服务时。静态优化和成本分担方法往往无法适应不断发展的不平等现象,导致参与者不满和不可持续的合作。本文提出了一个新颖的框架,该框架集成了多目标混合整数线性规划(MILP)、合作博弈理论以及由强化学习(RL)驱动的动态权益调整机制。在其核心,该框架利用基于“以公平为导向的福利最大化”(EqWM)原则的双层优化模型,这些原则将Rawlsian公平性纳入其中,以优先考虑处于劣势地位的参与者的福利。我们引入了一个Proximal Policy Optimization(PPO)代理,根据观察到的成本和可再生能源获取不平等动态调整优化目标中的社会经济权重。这种由RL驱动的反馈循环使系统能够学习和适应,不断努力实现更公平的状态。为了确保透明度,我们使用可解释的人工智能(XAI)来解释从加权Shapley值导出的利益分配。在经过六个现实场景验证后,该框架展示了高达72.6%的峰值需求减少和显著的合作收益。自适应的RL机制进一步随时间降低了基尼系数,展示了一条通向真正可持续和公平能源社区的路径。

更新时间: 2025-07-24 18:38:51

领域: eess.SY,cs.GT,cs.LG,cs.SY

下载: http://arxiv.org/abs/2507.18738v1

Less is More: Adaptive Coverage for Synthetic Training Data

Synthetic training data generation with Large Language Models (LLMs) like Google's Gemma and OpenAI's GPT offer a promising solution to the challenge of obtaining large, labeled datasets for training classifiers. When rapid model deployment is critical, such as in classifying emerging social media trends or combating new forms of online abuse tied to current events, the ability to generate training data is invaluable. While prior research has examined the comparability of synthetic data to human-labeled data, this study introduces a novel sampling algorithm, based on the maximum coverage problem, to select a representative subset from a synthetically generated dataset. Our results demonstrate that training a classifier on this contextually sampled subset achieves superior performance compared to training on the entire dataset. This "less is more" approach not only improves model accuracy but also reduces the volume of data required, leading to potentially more efficient model fine-tuning.
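
The sampling idea rests on the classic greedy approximation to maximum coverage: repeatedly pick the example that covers the most not-yet-covered features. A minimal sketch, with feature extraction left abstract (how an example maps to a feature set is a placeholder, not the paper's definition):

    def greedy_max_coverage(example_features, k):
        """Pick k examples whose feature sets jointly cover the most elements."""
        covered, chosen = set(), []
        for _ in range(k):
            best = max(
                (i for i in range(len(example_features)) if i not in chosen),
                key=lambda i: len(example_features[i] - covered),
            )
            chosen.append(best)
            covered |= example_features[best]
        return chosen

    feats = [{"a", "b"}, {"b", "c", "d"}, {"d"}, {"e"}]
    print(greedy_max_coverage(feats, 2))  # [1, 0] covers {a, b, c, d}

The greedy rule carries the standard (1 - 1/e) approximation guarantee for maximum coverage, which is what makes it a principled way to shrink the synthetic set.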

Updated: 2025-07-24 18:36:49

标题: 少即是多:合成训练数据的自适应覆盖

摘要: 使用大语言模型(LLMs)如Google的Gemma和OpenAI的GPT生成合成训练数据为训练分类器获取大型标记数据集提供了有希望的解决方案。在快速模型部署至关重要的情况下,比如对新兴社交媒体趋势进行分类或应对与当前事件相关的新形式的在线滥用,生成训练数据的能力是无价的。尽管先前的研究已经考察了合成数据与人工标记数据的可比性,但本研究引入了一种基于最大覆盖问题的新型抽样算法,以从合成生成的数据集中选择一个代表性子集。我们的结果表明,在这种上下文抽样子集上训练分类器的性能优于在整个数据集上训练。这种“少即是多”的方法不仅提高了模型的准确性,还减少了所需数据的量,从而可能实现更高效的模型微调。

更新时间: 2025-07-24 18:36:49

领域: cs.LG

下载: http://arxiv.org/abs/2504.14508v2

Bootstrapped Reward Shaping

In reinforcement learning, especially in sparse-reward domains, many environment steps are required to observe reward information. In order to increase the frequency of such observations, "potential-based reward shaping" (PBRS) has been proposed as a method of providing a more dense reward signal while leaving the optimal policy invariant. However, the required "potential function" must be carefully designed with task-dependent knowledge to not deter training performance. In this work, we propose a "bootstrapped" method of reward shaping, termed BSRS, in which the agent's current estimate of the state-value function acts as the potential function for PBRS. We provide convergence proofs for the tabular setting, give insights into training dynamics for deep RL, and show that the proposed method improves training speed in the Atari suite.
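
BSRS is compact enough to state in a few lines: the shaping term is the standard potential-based form F(s, s') = gamma * Phi(s') - Phi(s), with the potential Phi taken to be the agent's own current value estimate V (a sketch of the idea, not the paper's implementation):

    def shaped_reward(r, v_s, v_s_next, gamma=0.99, done=False):
        """PBRS with the bootstrapped value estimate as the potential function."""
        phi_next = 0.0 if done else v_s_next   # terminal states carry no potential
        return r + gamma * phi_next - v_s

    print(shaped_reward(r=0.0, v_s=1.0, v_s_next=1.5))  # ~0.485

Because the shaping is potential-based, the optimal policy is untouched; the bootstrapping simply removes the need to hand-design Phi with task-specific knowledge.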

Updated: 2025-07-24 18:29:56

标题: 自举奖励塑形

摘要: 在强化学习中,尤其是在稀疏奖励领域,需要许多环境步骤才能观察到奖励信息。为了提高此类观察的频率,"基于势的奖励塑形"(PBRS)被提出,作为一种在保持最优策略不变的同时提供更稠密奖励信号的方法。然而,所需的"势函数"必须利用任务相关知识精心设计,以免损害训练性能。在这项工作中,我们提出了一种"自举"的奖励塑形方法,称为BSRS,其中代理当前对状态值函数的估计充当PBRS的势函数。我们为表格型设置提供了收敛性证明,给出了关于深度RL训练动态的见解,并展示了所提方法在Atari套件中提高了训练速度。

更新时间: 2025-07-24 18:29:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2501.00989v2

Multi-Year Maintenance Planning for Large-Scale Infrastructure Systems: A Novel Network Deep Q-Learning Approach

Infrastructure asset management is essential for sustaining the performance of public infrastructure such as road networks, bridges, and utility networks. Traditional maintenance and rehabilitation planning methods often face scalability and computational challenges, particularly for large-scale networks with thousands of assets under budget constraints. This paper presents a novel deep reinforcement learning (DRL) framework that optimizes asset management strategies for large infrastructure networks. By decomposing the network-level Markov Decision Process (MDP) into individual asset-level MDPs while using a unified neural network architecture, the proposed framework reduces computational complexity, improves learning efficiency, and enhances scalability. The framework directly incorporates annual budget constraints through a budget allocation mechanism, ensuring maintenance plans are both optimal and cost-effective. Through a case study on a large-scale pavement network of 68,800 segments, the proposed DRL framework demonstrates significant improvements over traditional methods like Progressive Linear Programming and genetic algorithms, both in efficiency and network performance. This advancement contributes to infrastructure asset management and the broader application of reinforcement learning in complex, large-scale environments.
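
A toy sketch of the decomposition idea: one shared network scores (asset, action) pairs, and actions are then chosen asset-by-asset under an annual budget. Greedy cost-benefit allocation is an assumption standing in for the paper's budget allocation mechanism:

import numpy as np

def allocate_budget(q_values: np.ndarray, costs: np.ndarray, budget: float) -> np.ndarray:
    """q_values[i, a]: shared-network Q estimate for action a on asset i
    (action 0 = do nothing, assumed to cost 0). Greedily fund the actions
    with the largest Q-gain per unit cost until the budget runs out."""
    n_assets, _ = q_values.shape
    plan = np.zeros(n_assets, dtype=int)            # default: do nothing
    gains = q_values - q_values[:, [0]]             # improvement over inaction
    candidates = [(gains[i, a] / costs[i, a], i, a)
                  for i in range(n_assets)
                  for a in range(1, q_values.shape[1]) if gains[i, a] > 0]
    for _, i, a in sorted(candidates, reverse=True):
        if plan[i] == 0 and costs[i, a] <= budget:
            plan[i] = a
            budget -= costs[i, a]
    return plan

q = np.random.rand(6, 3)                  # 6 assets, 3 maintenance actions each
c = np.random.rand(6, 3) * 10; c[:, 0] = 0.0
print(allocate_budget(q, c, budget=12.0))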

Updated: 2025-07-24 18:27:31

标题: 大规模基础设施系统的多年维护规划:一种新颖的网络深度Q学习方法

摘要: 基础设施资产管理对于维持公共基础设施的性能至关重要,如道路网络、桥梁和公用事业网络。传统的维护和修复规划方法通常面临可扩展性和计算挑战,特别是对于受到预算限制的大规模网络,其中有数千个资产。本文提出了一种新颖的深度强化学习(DRL)框架,用于优化大型基础设施网络的资产管理策略。通过将网络级马尔可夫决策过程(MDP)分解为个体资产级MDPs,同时使用统一的神经网络架构,所提出的框架减少了计算复杂性,提高了学习效率,并增强了可扩展性。该框架通过预算分配机制直接纳入年度预算限制,确保维护计划既优化又具有成本效益。通过对一个由68,800个路段组成的大型路面网络的案例研究,提出的DRL框架在效率和网络性能方面均显著优于传统方法,如渐进线性规划和遗传算法。这一进展有助于基础设施资产管理以及在复杂的大规模环境中广泛应用强化学习。

更新时间: 2025-07-24 18:27:31

领域: math.OC,cs.AI,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2507.18732v1

Exploration Behavior of Untrained Policies

Exploration remains a fundamental challenge in reinforcement learning (RL), particularly in environments with sparse or adversarial reward structures. In this work, we study how the architecture of deep neural policies implicitly shapes exploration before training. We theoretically and empirically demonstrate strategies for generating ballistic or diffusive trajectories from untrained policies in a toy model. Using the theory of infinite-width networks and a continuous-time limit, we show that untrained policies return correlated actions and result in non-trivial state-visitation distributions. We discuss the distributions of the corresponding trajectories for a standard architecture, revealing insights into inductive biases for tackling exploration. Our results establish a theoretical and experimental framework for using policy initialization as a design tool to understand exploration behavior in early training.
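
Ballistic versus diffusive behavior is often diagnosed via the mean squared displacement, MSD(t) ~ t^alpha, with alpha near 2 for ballistic and near 1 for diffusive motion. The sketch below estimates alpha for a toy 2-D agent whose consecutive actions have tunable correlation; the environment and policy are stand-ins, not the paper's setup:

import numpy as np

def msd_exponent(positions: np.ndarray) -> float:
    """Fit MSD(t) ~ t^alpha by least squares in log-log space.
    positions: (n_episodes, T, dim) trajectories starting at the origin."""
    disp = positions - positions[:, :1, :]
    msd = (disp ** 2).sum(-1).mean(0)[1:]           # average over episodes, drop t=0
    t = np.arange(1, len(msd) + 1)
    alpha, _ = np.polyfit(np.log(t), np.log(msd + 1e-12), 1)
    return alpha

def rollout(T=200, n=200, corr=0.0, rng=np.random.default_rng(0)):
    """Toy 2-D agent whose consecutive actions are correlated with strength corr;
    corr near 1 mimics the ballistic regime, corr = 0 the diffusive one."""
    a = np.zeros((n, 2))
    pos = np.zeros((n, T, 2))
    for t in range(1, T):
        a = corr * a + np.sqrt(1 - corr ** 2) * rng.standard_normal((n, 2))
        pos[:, t] = pos[:, t - 1] + a
    return pos

print("diffusive alpha ~", round(msd_exponent(rollout(corr=0.0)), 2))   # about 1
print("ballistic alpha ~", round(msd_exponent(rollout(corr=0.99)), 2))  # well above 1, toward 2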

Updated: 2025-07-24 18:16:23

标题: 未训练策略的探索行为

摘要: 在强化学习(RL)中,探索仍然是一个基本挑战,特别是在奖励结构稀疏或具有对抗性的环境中。在这项工作中,我们研究了深度神经策略的架构如何在训练之前隐式地塑造探索。我们在一个玩具模型中从理论和实证两方面展示了从未经训练的策略生成弹道式或扩散式轨迹的方法。利用无限宽网络理论和连续时间极限,我们证明了未经训练的策略会返回相关的动作,并产生非平凡的状态访问分布。我们讨论了标准架构下相应轨迹的分布,揭示了有助于应对探索问题的归纳偏置。我们的结果建立了一个理论与实验框架,将策略初始化用作设计工具,以理解训练早期的探索行为。

更新时间: 2025-07-24 18:16:23

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2506.22566v3

AI Flow: Perspectives, Scenarios, and Approaches

Pioneered by the foundational information theory by Claude Shannon and the visionary framework of machine intelligence by Alan Turing, the convergent evolution of information and communication technologies (IT/CT) has created an unbroken wave of connectivity and computation. This synergy has sparked a technological revolution, now reaching its peak with large artificial intelligence (AI) models that are reshaping industries and redefining human-machine collaboration. However, the realization of ubiquitous intelligence faces considerable challenges due to substantial resource consumption in large models and high communication bandwidth demands. To address these challenges, AI Flow has been introduced as a multidisciplinary framework that integrates cutting-edge IT and CT advancements, with a particular emphasis on the following three key points. First, device-edge-cloud framework serves as the foundation, which integrates end devices, edge servers, and cloud clusters to optimize scalability and efficiency for low-latency model inference. Second, we introduce the concept of familial models, which refers to a series of different-sized models with aligned hidden features, enabling effective collaboration and the flexibility to adapt to varying resource constraints and dynamic scenarios. Third, connectivity- and interaction-based intelligence emergence is a novel paradigm of AI Flow. By leveraging communication networks to enhance connectivity, the collaboration among AI models across heterogeneous nodes achieves emergent intelligence that surpasses the capability of any single model. The innovations of AI Flow provide enhanced intelligence, timely responsiveness, and ubiquitous accessibility to AI services, paving the way for the tighter fusion of AI techniques and communication systems.

Updated: 2025-07-24 18:15:00

标题: AI Flow:视角、场景与方法

摘要: 由克劳德·香农的基础信息理论和艾伦·图灵的机器智能前瞻性框架开创的信息与通信技术(IT/CT)的融合进化已经创造出一波连续的连接和计算浪潮。这种协同作用引发了一场技术革命,现在正达到其顶峰,大型人工智能(AI)模型正在重塑产业,重新定义人机协作。然而,普遍智能的实现面临着重大挑战,主要是由于大型模型的大量资源消耗和高通信带宽需求。为了解决这些挑战,AI Flow被引入为一个多学科框架,整合了尖端的IT和CT进展,特别强调以下三个关键点。首先,设备边缘云框架作为基础,将端设备、边缘服务器和云集群整合在一起,以优化可伸缩性和效率,用于低延迟模型推断。其次,我们介绍了家族模型的概念,指的是一系列具有对齐隐藏特征的不同大小模型,实现有效协作和灵活适应不同资源限制和动态场景。第三,基于连接和互动的智能涌现是AI Flow的一种新范式。通过利用通信网络增强连接性,跨异构节点之间的AI模型协作实现了超越任何单一模型能力的新兴智能。AI Flow的创新提供了增强的智能、及时的响应和普遍的AI服务可访问性,为AI技术和通信系统的更紧密融合铺平了道路。

更新时间: 2025-07-24 18:15:00

领域: cs.AI,cs.CL,cs.CV,cs.DC,eess.SP

下载: http://arxiv.org/abs/2506.12479v3

An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning

Fine-tuning is an important step in adapting foundation models such as large language models to downstream tasks. To make this step more accessible to users with limited computational budgets, it is crucial to develop fine-tuning methods that are memory and computationally efficient. Sparse Fine-tuning (SpFT) and Low-rank adaptation (LoRA) are two frameworks that have emerged for addressing this problem and have been adopted widely in practice. In this work, we develop a new SpFT framework, based on ideas from neural network pruning. At a high level, we first identify ``important'' neurons/nodes using feature importance metrics from network pruning (specifically, we use the structural pruning method), and then perform fine-tuning by restricting to weights involving these neurons. Experiments on common language tasks show our method improves SpFT's memory efficiency by 20-50\% while matching the accuracy of state-of-the-art methods like LoRA's variants.
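
A minimal sketch of "restrict fine-tuning to weights of important neurons", using weight-norm as a stand-in importance score (the paper derives importance from structural pruning instead) and gradient hooks to freeze everything else:

import torch
import torch.nn as nn

def neuron_importance(linear: nn.Linear) -> torch.Tensor:
    """Stand-in importance score per output neuron: L2 norm of its weights."""
    return linear.weight.detach().norm(dim=1)

def restrict_to_top_neurons(linear: nn.Linear, keep_frac: float = 0.1) -> None:
    """Freeze all weights except rows belonging to the top-k important neurons
    by zeroing their gradients in-place after backward()."""
    k = max(1, int(keep_frac * linear.out_features))
    top = torch.topk(neuron_importance(linear), k).indices
    mask = torch.zeros(linear.out_features, 1)
    mask[top] = 1.0
    linear.weight.register_hook(lambda g: g * mask)
    if linear.bias is not None:
        linear.bias.register_hook(lambda g: g * mask.squeeze(1))

layer = nn.Linear(64, 32)
restrict_to_top_neurons(layer, keep_frac=0.25)
loss = layer(torch.randn(4, 64)).pow(2).mean()
loss.backward()
print("rows actually updated:", int((layer.weight.grad.abs().sum(1) > 0).sum()))  # 8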

Updated: 2025-07-24 18:14:45

标题: 一种通过神经网络修剪实现高效稀疏微调和低量化误差的方法

摘要: 微调是将基础模型(如大型语言模型)适应下游任务的重要步骤。为了使这一步骤对计算资源有限的用户更具可操作性,开发内存和计算效率高的微调方法至关重要。稀疏微调(SpFT)和低秩适应(LoRA)是解决这一问题并在实践中被广泛采用的两种框架。在这项工作中,我们基于神经网络修剪的思想开发了一种新的SpFT框架。在高层次上,我们首先使用来自网络修剪的特征重要性指标(具体来说,我们使用结构修剪方法)来识别“重要”的神经元/节点,然后通过限制涉及这些神经元的权重来进行微调。在常见的语言任务上的实验表明,我们的方法将SpFT的内存效率提高了20-50%,同时与LoRA的变体等最先进方法的准确性相匹配。

更新时间: 2025-07-24 18:14:45

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.11439v2

The Right to be Forgotten in Pruning: Unveil Machine Unlearning on Sparse Models

Machine unlearning aims to efficiently eliminate the memory about deleted data from trained models and address the right to be forgotten. Despite the success of existing unlearning algorithms, unlearning in sparse models has not yet been well studied. In this paper, we empirically find that the deleted data has an impact on the pruned topology in a sparse model. Motivated by the observation and the right to be forgotten, we define a new terminology ``un-pruning" to eliminate the impact of deleted data on model pruning. Then we propose an un-pruning algorithm to approximate the pruned topology driven by retained data. We remark that any existing unlearning algorithm can be integrated with the proposed un-pruning workflow and the error of un-pruning is upper-bounded in theory. Also, our un-pruning algorithm can be applied to both structured sparse models and unstructured sparse models. In the experiment, we further find that Membership Inference Attack (MIA) accuracy is unreliable for assessing whether a model has forgotten deleted data, as a small change in the amount of deleted data can produce arbitrary MIA results. Accordingly, we devise new performance metrics for sparse models to evaluate the success of un-pruning. Lastly, we conduct extensive experiments to verify the efficacy of un-pruning with various pruning methods and unlearning algorithms. Our code is released at https://anonymous.4open.science/r/UnlearningSparseModels-FBC5/.
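
One way to picture "un-pruning": after unlearning, recompute the pruning mask using retained data only, so the sparse topology no longer reflects the deleted samples. The sketch below uses a simple first-order |w * grad| saliency and a linear model; both choices are illustrative assumptions, not the paper's algorithm:

import torch
import torch.nn as nn

def saliency_mask(model: nn.Linear, x, y, sparsity: float) -> torch.Tensor:
    """Recompute a pruning mask using only retained data: score each weight by
    |w * dL/dw| (a common first-order saliency) and keep the top fraction."""
    model.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    score = (model.weight * model.weight.grad).abs()
    k = int((1 - sparsity) * score.numel())
    thresh = torch.topk(score.flatten(), k).values.min()
    return (score >= thresh).float()

model = nn.Linear(20, 1)
x_retained, y_retained = torch.randn(128, 20), torch.randn(128, 1)
mask = saliency_mask(model, x_retained, y_retained, sparsity=0.8)
model.weight.data *= mask          # "un-pruned" topology driven by retained data
print("kept weights:", int(mask.sum()), "of", mask.numel())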

Updated: 2025-07-24 18:13:26

标题: 在修剪中被遗忘的权利:揭开稀疏模型上的机器遗忘

摘要: 机器遗忘旨在有效地从经过训练的模型中消除已删除数据的记忆,并落实被遗忘权。尽管现有的遗忘算法取得了成功,但稀疏模型中的遗忘尚未得到充分研究。本文通过实证研究发现,已删除数据会影响稀疏模型中的修剪拓扑结构。受这一观察和被遗忘权的启发,我们定义了一个新术语“反修剪”(un-pruning),以消除已删除数据对模型修剪的影响。随后,我们提出了一种反修剪算法,以逼近由保留数据驱动的修剪拓扑结构。我们指出,任何现有的遗忘算法都可以与所提出的反修剪工作流集成,且反修剪的误差在理论上有上界。此外,我们的反修剪算法既可应用于结构化稀疏模型,也可应用于非结构化稀疏模型。在实验中,我们进一步发现,成员推断攻击(MIA)的准确率无法可靠地评估模型是否已遗忘被删除的数据,因为删除数据量的微小变化就可能产生任意的MIA结果。因此,我们为稀疏模型设计了新的性能指标,以评估反修剪的成功程度。最后,我们进行了大量实验,验证了反修剪在各种修剪方法和遗忘算法下的有效性。我们的代码发布在https://anonymous.4open.science/r/UnlearningSparseModels-FBC5/。

更新时间: 2025-07-24 18:13:26

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.18725v1

SCORE-SET: A dataset of GuitarPro files for Music Phrase Generation and Sequence Learning

A curated dataset of Guitar Pro tablature files (.gp5 format), tailored for tasks involving guitar music generation, sequence modeling, and performance-aware learning is provided. The dataset is derived from MIDI notes in MAESTRO and GiantMIDI which have been adapted into rhythm guitar tracks. These tracks are further processed to include a variety of expression settings typical of guitar performance, such as bends, slides, vibrato, and palm muting, to better reflect the nuances of real-world guitar playing.

Updated: 2025-07-24 18:13:12

标题: SCORE-SET:用于音乐短语生成和序列学习的GuitarPro文件数据集

摘要: 本文提供了一个经过精心筛选的Guitar Pro吉他谱文件数据集(.gp5格式),专为吉他音乐生成、序列建模和面向演奏表现的学习等任务而设计。该数据集源自MAESTRO和GiantMIDI中的MIDI音符,这些音符已被改编为节奏吉他音轨。这些音轨经过进一步处理,加入了吉他演奏中典型的多种表情设置,如推弦、滑音、颤音和手掌闷音,以更好地反映真实吉他演奏的细微差别。

更新时间: 2025-07-24 18:13:12

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2507.18723v1

Fixed-Point RNNs: Interpolating from Diagonal to Dense

Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as Mamba have become promising alternatives to softmax-attention as sequence mixing layers in Transformer architectures. Current models, however, do not exhibit the full state-tracking expressivity of RNNs because they rely on channel-wise (i.e. diagonal) sequence mixing. In this paper, we investigate parameterizations of a large class of dense linear RNNs as fixed-points of parallelizable diagonal linear RNNs. The resulting models can naturally trade expressivity for efficiency at a fixed number of parameters and achieve state-of-the-art results on the commonly used toy tasks $A_5$, $S_5$, copying, and modular arithmetics.
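
One concrete reading of "dense linear RNNs as fixed points of diagonal linear RNNs": split the dense transition A into its diagonal D and off-diagonal residual R, then iterate a diagonal scan that feeds the previous iterate through R; the fixed point satisfies the dense recurrence h_t = A h_{t-1} + B x_t. This splitting is my interpretation of the abstract, not the authors' exact parameterization:

import numpy as np

def dense_rnn_fixed_point(A, B, x, n_iters=50):
    """Solve h_t = A h_{t-1} + B x_t by repeatedly running a *diagonal* RNN:
        h^{k+1}_t = D h^{k+1}_{t-1} + (R h^{k}_{t-1} + B x_t),
    where A = D + R. Each sweep is a diagonal recurrence (amenable to parallel
    scans); for a length-T sequence the iteration is exact after at most T sweeps."""
    T, dim = x.shape[0], A.shape[0]
    d_vec = np.diag(A)                 # diagonal part D
    R = A - np.diag(d_vec)             # off-diagonal residual
    h = np.zeros((T, dim))
    for _ in range(n_iters):
        h_new = np.zeros((T, dim))
        prev = np.zeros(dim)
        for t in range(T):
            u_t = (R @ h[t - 1] if t > 0 else np.zeros(dim)) + B @ x[t]
            prev = d_vec * prev + u_t  # diagonal scan step
            h_new[t] = prev
        h = h_new
    return h

rng = np.random.default_rng(0)
A = 0.3 * rng.standard_normal((4, 4))  # keep the spectral radius small
B, x = rng.standard_normal((4, 2)), rng.standard_normal((10, 2))
h = dense_rnn_fixed_point(A, B, x)
ref, prev = np.zeros((10, 4)), np.zeros(4)
for t in range(10):                    # check against the dense recurrence
    prev = A @ prev + B @ x[t]; ref[t] = prev
print("max abs error:", np.abs(h - ref).max())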

Updated: 2025-07-24 18:03:06

标题: 固定点RNNs:从对角到密集的插值

摘要: 线性循环神经网络(RNN)和状态空间模型(SSM,如Mamba)已成为Transformer架构中softmax注意力这一序列混合层的有希望的替代品。然而,当前模型依赖于逐通道(即对角)的序列混合,因此不具备RNN完整的状态跟踪表达能力。在本文中,我们研究了将一大类稠密线性RNN参数化为可并行化的对角线性RNN的不动点。由此产生的模型可以在固定参数量下自然地在表达能力与效率之间进行权衡,并在常用的玩具任务$A_5$、$S_5$、复制以及模运算上取得了最先进的结果。

更新时间: 2025-07-24 18:03:06

领域: cs.LG

下载: http://arxiv.org/abs/2503.10799v2

Adaptive Neural Quantum States: A Recurrent Neural Network Perspective

Neural-network quantum states (NQS) are powerful neural-network ans\"atzes that have emerged as promising tools for studying quantum many-body physics through the lens of the variational principle. These architectures are known to be systematically improvable by increasing the number of parameters. Here we demonstrate an Adaptive scheme to optimize NQSs, through the example of recurrent neural networks (RNN), using a fraction of the computation cost while reducing training fluctuations and improving the quality of variational calculations targeting ground states of prototypical models in one- and two-spatial dimensions. This Adaptive technique reduces the computational cost through training small RNNs and reusing them to initialize larger RNNs. This work opens up the possibility for optimizing graphical processing unit (GPU) resources deployed in large-scale NQS simulations.
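
The "train small, reuse to initialize large" step can be sketched as block-copying trained weights into the corresponding sub-block of a wider model, with new entries near zero so the grown network initially mimics the small one. The padding scheme is an assumption; the abstract does not specify the exact growth rule:

import numpy as np

def grow_rnn_weights(W_small, W_big_shape, rng, scale=1e-2):
    """Embed a trained small weight matrix into the top-left block of a larger
    one; the new rows/columns start near zero so the grown model initially
    behaves like the small one."""
    W_big = scale * rng.standard_normal(W_big_shape)
    r, c = W_small.shape
    W_big[:r, :c] = W_small
    return W_big

rng = np.random.default_rng(0)
Whh_small = rng.standard_normal((16, 16)) * 0.1   # stands in for a trained small RNN
Whh_big = grow_rnn_weights(Whh_small, (64, 64), rng)
print(np.allclose(Whh_big[:16, :16], Whh_small))  # True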

Updated: 2025-07-24 18:00:03

标题: 自适应神经量子态:循环神经网络视角

摘要: 神经网络量子态(NQS)是一类强大的神经网络拟设(ansätze),已成为从变分原理的视角研究量子多体物理的有希望的工具。众所周知,这类架构可以通过增加参数数量得到系统性改进。在这里,我们以循环神经网络(RNN)为例,展示了一种优化NQS的自适应(Adaptive)方案,只需一小部分计算成本,同时减少训练波动,并提高针对一维和二维典型模型基态的变分计算质量。该自适应技术通过先训练小型RNN、再复用其权重来初始化更大的RNN,从而降低计算成本。这项工作为优化大规模NQS模拟中部署的图形处理器(GPU)资源开辟了可能性。

更新时间: 2025-07-24 18:00:03

领域: cond-mat.dis-nn,cond-mat.str-el,cs.LG,physics.comp-ph,quant-ph

下载: http://arxiv.org/abs/2507.18700v1

Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

We develop and analyze a principled approach to kernel ridge regression under covariate shift. The goal is to learn a regression function with small mean squared error over a target distribution, based on unlabeled data from there and labeled data that may have a different feature distribution. We propose to split the labeled data into two subsets, and conduct kernel ridge regression on them separately to obtain a collection of candidate models and an imputation model. We use the latter to fill the missing labels and then select the best candidate accordingly. Our non-asymptotic excess risk bounds demonstrate that our estimator adapts effectively to both the structure of the target distribution and the covariate shift. This adaptation is quantified through a notion of effective sample size that reflects the value of labeled source data for the target regression task. Our estimator achieves the minimax optimal error rate up to a polylogarithmic factor, and we find that using pseudo-labels for model selection does not significantly hinder performance.
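
A compact sketch of the described pipeline with scikit-learn's KernelRidge: fit candidate models on one half of the labeled source data, an imputation model on the other half, pseudo-label the unlabeled target covariates, and pick the candidate with the lowest pseudo-labeled risk. The RBF kernel and the alpha grid are illustrative choices, not the paper's:

import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x).ravel()

# Labeled source data (shifted covariates) and unlabeled target data.
x_src = rng.uniform(-2, 0, (200, 1)); y_src = f(x_src) + 0.1 * rng.standard_normal(200)
x_tgt = rng.uniform(-1, 1, (300, 1))  # no labels here

# Split labeled data: candidates on one half, imputation model on the other.
x_a, y_a, x_b, y_b = x_src[:100], y_src[:100], x_src[100:], y_src[100:]
candidates = [KernelRidge(kernel="rbf", alpha=a).fit(x_a, y_a)
              for a in (1e-3, 1e-2, 1e-1, 1.0)]
imputer = KernelRidge(kernel="rbf", alpha=1e-2).fit(x_b, y_b)

# Pseudo-label the target covariates, then select the best candidate.
pseudo_y = imputer.predict(x_tgt)
risks = [np.mean((m.predict(x_tgt) - pseudo_y) ** 2) for m in candidates]
best = candidates[int(np.argmin(risks))]
print("chosen alpha:", best.alpha, "| true target MSE:",
      round(float(np.mean((best.predict(x_tgt) - f(x_tgt)) ** 2)), 4))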

Updated: 2025-07-24 17:59:49

标题: 伪标签在协变量转移下核岭回归中的应用

摘要: 我们开发并分析了一种协变量偏移下核岭回归的原则性方法。目标是基于来自目标分布的未标记数据和特征分布可能不同的标记数据,学习一个在目标分布上均方误差较小的回归函数。我们建议将标记数据拆分为两个子集,分别在其上进行核岭回归,以获得一组候选模型和一个插补模型。我们利用后者填补缺失标签,再据此选择最佳候选模型。我们的非渐近超额风险界表明,我们的估计器能有效适应目标分布的结构和协变量偏移。这种适应性通过有效样本量的概念来量化,它反映了标记的源数据对目标回归任务的价值。我们的估计器达到了极小极大最优误差率(至多相差一个多对数因子),并且我们发现使用伪标签进行模型选择不会显著影响性能。

更新时间: 2025-07-24 17:59:49

领域: stat.ME,cs.LG,math.ST,stat.ML,stat.TH,62J07, 62G05

下载: http://arxiv.org/abs/2302.10160v4

SIDA: Synthetic Image Driven Zero-shot Domain Adaptation

Zero-shot domain adaptation is a method for adapting a model to a target domain without utilizing target domain image data. To enable adaptation without target images, existing studies utilize CLIP's embedding space and text description to simulate target-like style features. Despite the previous achievements in zero-shot domain adaptation, we observe that these text-driven methods struggle to capture complex real-world variations and significantly increase adaptation time due to their alignment process. Instead of relying on text descriptions, we explore solutions leveraging image data, which provides diverse and more fine-grained style cues. In this work, we propose SIDA, a novel and efficient zero-shot domain adaptation method leveraging synthetic images. To generate synthetic images, we first create detailed, source-like images and apply image translation to reflect the style of the target domain. We then utilize the style features of these synthetic images as a proxy for the target domain. Based on these features, we introduce Domain Mix and Patch Style Transfer modules, which enable effective modeling of real-world variations. In particular, Domain Mix blends multiple styles to expand the intra-domain representations, and Patch Style Transfer assigns different styles to individual patches. We demonstrate the effectiveness of our method by showing state-of-the-art performance in diverse zero-shot adaptation scenarios, particularly in challenging domains. Moreover, our approach achieves high efficiency by significantly reducing the overall adaptation time.
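
The Patch Style Transfer idea (a different style per patch) can be illustrated with AdaIN-style statistics matching applied patch-wise to feature maps; the patch size, the style source, and the use of channel mean/std are assumptions for illustration, not the paper's module:

import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5):
    """Match channel-wise mean/std of `content` to those of `style`. Shapes: (C, H, W)."""
    c_mu = content.mean(dim=(1, 2), keepdim=True)
    c_std = content.std(dim=(1, 2), keepdim=True) + eps
    s_mu = style.mean(dim=(1, 2), keepdim=True)
    s_std = style.std(dim=(1, 2), keepdim=True) + eps
    return s_std * (content - c_mu) / c_std + s_mu

def patch_style_transfer(feat, style_bank, patch=8):
    """Assign a randomly chosen style's statistics to each patch of `feat`."""
    out = feat.clone()
    _, H, W = feat.shape
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            style = style_bank[torch.randint(len(style_bank), (1,)).item()]
            out[:, i:i + patch, j:j + patch] = adain(
                feat[:, i:i + patch, j:j + patch], style)
    return out

feat = torch.randn(64, 32, 32)                        # one content feature map
styles = [torch.randn(64, 32, 32) for _ in range(4)]  # synthetic-image styles
print(patch_style_transfer(feat, styles).shape)       # torch.Size([64, 32, 32])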

Updated: 2025-07-24 17:59:36

标题: SIDA:合成图像驱动的零样本域自适应

摘要: 零样本领域自适应是一种将模型适应到目标领域而不利用目标领域图像数据的方法。为了实现没有目标图像的适应,现有研究利用CLIP的嵌入空间和文本描述来模拟类似目标的风格特征。尽管在零样本领域自适应方面取得了先前的成就,但我们观察到这些以文本驱动的方法难以捕捉复杂的现实世界变化,并且由于它们的对齐过程显著增加了适应时间。我们不依赖于文本描述,而是探索利用图像数据的解决方案,这提供了多样化和更精细的风格线索。在这项工作中,我们提出了一种新颖高效的零样本领域自适应方法SIDA,利用合成图像。为了生成合成图像,我们首先创建详细的类似源图像,并应用图像转换来反映目标领域的风格。然后,我们利用这些合成图像的风格特征作为目标领域的替代。基于这些特征,我们引入了领域混合和补丁风格转移模块,这些模块使得对现实世界变化的有效建模成为可能。特别是,领域混合将多种风格混合在一起,以扩展域内表示,而补丁风格转移则为不同的补丁分配不同的风格。我们通过展示在各种零样本适应场景中的最新性能,特别是在具有挑战性的领域中,证明了我们方法的有效性。此外,我们的方法通过显著减少整体适应时间实现了高效率。

更新时间: 2025-07-24 17:59:36

领域: cs.CV,cs.AI,cs.LG,cs.MM

下载: http://arxiv.org/abs/2507.18632v1

Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment

With rapid advancement and increasing accessibility of LLMs, fine-tuning aligned models has become a critical step for adapting them to real-world applications, which makes the safety of this fine-tuning process more important than ever. However, recent studies have highlighted a critical challenge: even when fine-tuning with seemingly benign downstream datasets, the safety of aligned LLMs can be compromised, making them more susceptible to malicious instructions. In this paper, we show that fine-tuning datasets often contain samples with safety-degrading features that are not easily identifiable on the surface. These samples can significantly degrade the safety alignment of LLMs during fine-tuning. To address this issue, we propose LARF, a \textbf{L}ayer-\textbf{A}ware \textbf{R}epresentation \textbf{F}iltering method. This method identifies safety-sensitive layers within the LLM and leverages their representations to detect which data samples in the post-training dataset contain safety-degrading features. Experimental results demonstrate that LARF can effectively identify benign data with safety-degrading features. After removing such data, the safety alignment degradation caused by fine-tuning is mitigated. Please see our code at \href{https://github.com/LLLeoLi/LARF}{https://github.com/LLLeoLi/LARF}.
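
A schematic of representation-based filtering at a safety-sensitive layer: estimate a "safety-degrading" direction from harmful/safe anchor prompts and drop fine-tuning samples whose hidden states project strongly onto it. The anchor-difference direction and the quantile threshold are assumptions, not LARF's exact procedure:

import numpy as np

def safety_direction(h_harmful: np.ndarray, h_safe: np.ndarray) -> np.ndarray:
    """Direction in representation space separating harmful from safe anchors.
    Inputs: (n, d) hidden states from one safety-sensitive layer."""
    d = h_harmful.mean(0) - h_safe.mean(0)
    return d / (np.linalg.norm(d) + 1e-9)

def filter_dataset(h_samples: np.ndarray, direction: np.ndarray, quantile=0.95):
    """Keep samples whose projection onto the safety-degrading direction is
    below a chosen quantile; the rest are removed before fine-tuning."""
    scores = h_samples @ direction
    keep = scores < np.quantile(scores, quantile)
    return keep, scores

rng = np.random.default_rng(0)
direction = safety_direction(rng.standard_normal((32, 128)) + 0.5,
                             rng.standard_normal((32, 128)))
keep, _ = filter_dataset(rng.standard_normal((1000, 128)), direction)
print("samples kept:", int(keep.sum()), "/ 1000")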

Updated: 2025-07-24 17:59:24

标题: 层感知表示过滤:净化微调数据以保持LLM安全对齐

摘要: 随着LLM的快速发展和日益普及,微调对齐后的模型已成为使其适应真实世界应用的关键步骤,这使得微调过程的安全性比以往任何时候都更加重要。然而,最近的研究突显了一个关键挑战:即使使用看似良性的下游数据集进行微调,对齐LLM的安全性也可能受到损害,使其更容易受到恶意指令的影响。在本文中,我们展示了微调数据集通常包含带有安全降级特征、且表面上不易识别的样本。这些样本会在微调过程中显著削弱LLM的安全对齐。为了解决这个问题,我们提出了LARF,一种层感知表示过滤(Layer-Aware Representation Filtering)方法。该方法识别LLM中的安全敏感层,并利用这些层的表示来检测训练后数据集中哪些数据样本含有安全降级特征。实验结果表明,LARF能够有效识别带有安全降级特征的良性数据。移除此类数据后,微调导致的安全对齐退化得到缓解。代码见:https://github.com/LLLeoLi/LARF。

更新时间: 2025-07-24 17:59:24

领域: cs.CR

下载: http://arxiv.org/abs/2507.18631v1

Gait Recognition Based on Tiny ML and IMU Sensors

This project presents the development of a gait recognition system using Tiny Machine Learning (Tiny ML) and Inertial Measurement Unit (IMU) sensors. The system leverages the XIAO-nRF52840 Sense microcontroller and the LSM6DS3 IMU sensor to capture motion data, including acceleration and angular velocity, from four distinct activities: walking, stationary, going upstairs, and going downstairs. The data collected is processed through Edge Impulse, an edge AI platform, which enables the training of machine learning models that can be deployed directly onto the microcontroller for real-time activity classification. The data preprocessing step involves extracting relevant features from the raw sensor data using techniques such as sliding windows and data normalization, followed by training a Deep Neural Network (DNN) classifier for activity recognition. The model achieves over 80% accuracy on a test dataset, demonstrating its ability to classify the four activities effectively. Additionally, the platform enables anomaly detection, further enhancing the robustness of the system. The integration of Tiny ML ensures low-power operation, making it suitable for battery-powered or energy-harvesting devices.
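
The preprocessing described (sliding windows plus normalization over the 6-axis IMU stream) might look like the sketch below; the window length, stride, and mean/std/RMS feature set are illustrative choices rather than the project's exact Edge Impulse configuration:

import numpy as np

def window_features(imu: np.ndarray, win: int = 100, stride: int = 50):
    """imu: (T, 6) stream of [ax, ay, az, gx, gy, gz] samples.
    Returns per-window mean/std/RMS features, z-normalized per channel."""
    imu = (imu - imu.mean(0)) / (imu.std(0) + 1e-9)   # per-channel normalization
    feats = []
    for start in range(0, len(imu) - win + 1, stride):
        w = imu[start:start + win]
        feats.append(np.concatenate([w.mean(0), w.std(0),
                                     np.sqrt((w ** 2).mean(0))]))
    return np.stack(feats)       # (n_windows, 18), ready for a small DNN

fake_stream = np.random.randn(1000, 6)   # stand-in for LSM6DS3 readings
X = window_features(fake_stream)
print(X.shape)                           # (19, 18)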

Updated: 2025-07-24 17:59:08

标题: 基于微型机器学习和IMU传感器的步态识别

摘要: 这个项目介绍了使用微型机器学习(Tiny ML)和惯性测量单元(IMU)传感器开发步态识别系统。该系统利用XIAO-nRF52840 Sense微控制器和LSM6DS3 IMU传感器捕获运动数据,包括加速度和角速度,从四种不同的活动中提取数据:行走、静止、上楼和下楼。收集的数据通过Edge Impulse,一个边缘人工智能平台进行处理,该平台可用于训练可以直接部署到微控制器上进行实时活动分类的机器学习模型。数据预处理步骤涉及使用滑动窗口和数据归一化等技术从原始传感器数据中提取相关特征,然后训练深度神经网络(DNN)分类器进行活动识别。该模型在测试数据集上实现了超过80%的准确率,展示了其有效分类四种活动的能力。此外,该平台还支持异常检测,进一步增强了系统的稳健性。微型机器学习的整合确保了低功耗操作,使其适用于电池供电或能量收集设备。

更新时间: 2025-07-24 17:59:08

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2507.18627v1

3D Software Synthesis Guided by Constraint-Expressive Intermediate Representation

Graphical user interface (UI) software has undergone a fundamental transformation from traditional two-dimensional (2D) desktop/web/mobile interfaces to spatial three-dimensional (3D) environments. While existing work has made remarkable success in automated 2D software generation, such as HTML/CSS and mobile app interface code synthesis, the generation of 3D software still remains under-explored. Current methods for 3D software generation usually generate the 3D environments as a whole and cannot modify or control specific elements in the software. Furthermore, these methods struggle to handle the complex spatial and semantic constraints inherent in the real world. To address the challenges, we present Scenethesis, a novel requirement-sensitive 3D software synthesis approach that maintains formal traceability between user specifications and generated 3D software. Scenethesis is built upon ScenethesisLang, a domain-specific language that serves as a granular constraint-aware intermediate representation (IR) to bridge natural language requirements and executable 3D software. It serves both as a comprehensive scene description language enabling fine-grained modification of 3D software elements and as a formal constraint-expressive specification language capable of expressing complex spatial constraints. By decomposing 3D software synthesis into stages operating on ScenethesisLang, Scenethesis enables independent verification, targeted modification, and systematic constraint satisfaction. Our evaluation demonstrates that Scenethesis accurately captures over 80% of user requirements and satisfies more than 90% of hard constraints while handling over 100 constraints simultaneously. Furthermore, Scenethesis achieves a 42.8% improvement in BLIP-2 visual evaluation scores compared to the state-of-the-art method.

Updated: 2025-07-24 17:58:03

标题: 由约束表达中间表示引导的3D软件合成

摘要: 图形用户界面(UI)软件已经从传统的二维(2D)桌面/网络/移动界面转变为空间三维(3D)环境。虽然现有工作在自动化2D软件生成方面取得了显著成功,比如HTML/CSS和移动应用程序界面代码合成,但3D软件的生成仍然未被充分探索。当前的3D软件生成方法通常将整个3D环境生成为一个整体,无法修改或控制软件中的特定元素。此外,这些方法很难处理真实世界中固有的复杂空间和语义约束。为了解决这些挑战,我们提出了Scenethesis,一种新颖的需求敏感的3D软件合成方法,它保持用户规格和生成的3D软件之间的形式可追溯性。Scenethesis建立在ScenethesisLang之上,这是一种特定领域的语言,作为一个粒度约束感知的中间表示(IR),用于连接自然语言需求和可执行的3D软件。它既作为一种全面的场景描述语言,使得对3D软件元素进行精细修改成为可能,又作为一种能够表达复杂空间约束的正式约束表达规范语言。通过将3D软件合成分解为操作ScenethesisLang的阶段,Scenethesis实现了独立验证、有针对性的修改和系统性约束满足。我们的评估表明,Scenethesis准确捕捉了超过80%的用户需求,并满足了超过90%的硬约束,同时处理了超过100个约束。此外,与最先进的方法相比,Scenethesis在BLIP-2视觉评估分数上实现了42.8%的改善。

更新时间: 2025-07-24 17:58:03

领域: cs.CV,cs.AI,cs.MM,cs.SE

下载: http://arxiv.org/abs/2507.18625v1

Moving Out: Physically-grounded Human-AI Collaboration

The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. In this paper, we introduce \textit{Moving Out}, a new human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and maintaining consistent actions to move a big item around a corner. Using Moving Out, we designed two tasks and collected human-human interaction data to evaluate models' abilities to adapt to diverse human behaviors and unseen physical attributes. To address the challenges in physical environments, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. Our experiments show that BASS outperforms state-of-the-art models in AI-AI and human-AI collaboration. The project page is available at \href{https://live-robotics-uva.github.io/movingout_ai/}{https://live-robotics-uva.github.io/movingout\_ai/}.

Updated: 2025-07-24 17:57:18

标题: Moving Out:基于物理情境的人类-AI协作

摘要: 适应环境中的物理动作和约束对于具有实体特征的代理(例如机器人)有效地与人类合作至关重要。这种基于物理的人工智能与人类合作必须考虑由物理约束引起的连续状态-动作空间的增加复杂性和约束动力学。在本文中,我们引入了一个新的人工智能协作基准,名为\textit{Moving Out},它类似于受物理属性和约束影响的各种协作模式,比如一起移动重物和保持一致的动作以将大物品绕过一个拐角移动。使用Moving Out,我们设计了两个任务,并收集了人与人之间的互动数据,以评估模型适应多样化人类行为和未见物理属性的能力。为了解决物理环境中的挑战,我们提出了一种新方法,BASS(行为增强、模拟和选择),以增强代理的多样性和对动作结果的理解。我们的实验表明,BASS在人工智能与人工智能、人工智能与人类的协作中表现优于最先进的模型。项目页面可在\href{https://live-robotics-uva.github.io/movingout_ai/}{https://live-robotics-uva.github.io/movingout\_ai/}上找到。

更新时间: 2025-07-24 17:57:18

领域: cs.LG,cs.AI,cs.MA

下载: http://arxiv.org/abs/2507.18623v1

Diffusion Beats Autoregressive in Data-Constrained Settings

Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings-where training involves repeated passes over limited data-and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We interpret this advantage as implicit data augmentation: masked diffusion exposes the model to a diverse distribution of token orderings and prediction tasks, unlike AR's fixed left-to-right factorization. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. These results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm. Our code is available at: https://diffusion-scaling.github.io.
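
The "implicit data augmentation" contrast is easy to make concrete: an AR step always predicts token t+1 from the fixed left-to-right prefix, while a masked-diffusion step samples a fresh mask rate and reconstructs the masked positions from the rest, so repeated epochs over the same data present varied prediction tasks. A toy PyTorch sketch; the generic per-position-logits model interface and the reserved MASK_ID are assumptions:

import torch
import torch.nn.functional as F

MASK_ID = 0  # reserved mask token (assumption)

def ar_loss(model, tokens):
    """Fixed left-to-right factorization: predict token t+1 from the prefix."""
    logits = model(tokens[:, :-1])
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

def masked_diffusion_loss(model, tokens):
    """Sample a masking rate, corrupt, and reconstruct only the masked
    positions: a fresh prediction task every epoch."""
    rate = torch.rand(())                       # uniform mask rate in [0, 1)
    mask = torch.rand_like(tokens, dtype=torch.float) < rate
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = model(corrupted)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens.reshape(-1), reduction="none")
    return (loss * mask.reshape(-1).float()).sum() / mask.sum().clamp(min=1)

# Toy "model": an embedding plus a linear head, just to make the sketch run.
V, D = 100, 32
model = torch.nn.Sequential(torch.nn.Embedding(V, D), torch.nn.Linear(D, V))
tokens = torch.randint(1, V, (4, 16))
print(ar_loss(model, tokens).item(), masked_diffusion_loss(model, tokens).item())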

Updated: 2025-07-24 17:55:24

标题: 在数据受限制的情况下,扩散胜过自回归

摘要: 自回归(AR)模型长期主导着大型语言模型的格局,推动了各类任务的进展。最近,基于扩散的语言模型作为一种有前景的替代方案出现,但其相对于AR模型的优势仍未得到充分探索。在本文中,我们系统地研究了数据受限环境下的掩码扩散模型,即训练需要对有限数据进行多轮重复,并发现当计算充足而数据稀缺时,它们显著优于AR模型。扩散模型能更好地利用重复数据,实现更低的验证损失和更优的下游性能。我们将这一优势解释为隐式数据增强:与AR固定的从左到右因式分解不同,掩码扩散让模型接触到多样化的词元排序和预测任务分布。我们发现了扩散模型新的缩放定律,并推导出扩散开始优于AR的临界计算量阈值的闭式表达式。这些结果表明,当瓶颈在于数据而非计算时,扩散模型为标准AR范式提供了一个有吸引力的替代方案。我们的代码可在以下网址获取:https://diffusion-scaling.github.io。

更新时间: 2025-07-24 17:55:24

领域: cs.LG,cs.AI,cs.CV,cs.RO

下载: http://arxiv.org/abs/2507.15857v2

TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards

Prompt optimization improves the reasoning abilities of large language models (LLMs) without requiring parameter updates to the target model. Following heuristic-based "Think step by step" approaches, the field has evolved in two main directions: while one group of methods uses textual feedback to elicit improved prompts from general-purpose LLMs in a training-free way, a concurrent line of research relies on numerical rewards to train a special prompt model, tailored for providing optimal prompts to the target model. In this paper, we introduce the Textual Reward Prompt framework (TRPrompt), which unifies these approaches by directly incorporating textual feedback into training of the prompt model. Our framework does not require prior dataset collection and is being iteratively improved with the feedback on the generated prompts. When coupled with the capacity of an LLM to internalize the notion of what a "good" prompt is, the high-resolution signal provided by the textual rewards allows us to train a prompt model yielding state-of-the-art query-specific prompts for the problems from the challenging math datasets GSMHard and MATH.

Updated: 2025-07-24 17:54:44

标题: TRPrompt:从文本奖励中引导查询感知提示优化

摘要: 提示优化(prompt optimization)可以在无需更新目标模型参数的情况下提高大型语言模型(LLM)的推理能力。继基于启发式的“逐步思考”方法之后,该领域沿两个主要方向发展:一类方法利用文本反馈,以免训练的方式从通用LLM中引出改进的提示;另一条并行的研究路线则依赖数值奖励来训练一个专门的提示模型,为目标模型提供最优提示。在本文中,我们引入了文本奖励提示框架(TRPrompt),通过将文本反馈直接纳入提示模型的训练来统一这两类方法。我们的框架不需要预先收集数据集,并通过对生成提示的反馈进行迭代改进。当与LLM内化“好提示”这一概念的能力相结合时,文本奖励提供的高分辨率信号使我们能够训练出一个提示模型,为来自高难度数学数据集GSMHard和MATH的问题生成最先进的查询特定提示。

更新时间: 2025-07-24 17:54:44

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.18618v1

SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning

Zero-shot Image Captioning (ZIC) increasingly utilizes synthetic datasets generated by text-to-image (T2I) models to mitigate the need for costly manual annotation. However, these T2I models often produce images that exhibit semantic misalignments with their corresponding input captions (e.g., missing objects, incorrect attributes), resulting in noisy synthetic image-caption pairs that can hinder model training. Existing dataset pruning techniques are largely designed for removing noisy text in web-crawled data. However, these methods are ill-suited for the distinct challenges of synthetic data, where captions are typically well-formed, but images may be inaccurate representations. To address this gap, we introduce SynC, a novel framework specifically designed to refine synthetic image-caption datasets for ZIC. Instead of conventional filtering or regeneration, SynC focuses on reassigning captions to the most semantically aligned images already present within the synthetic image pool. Our approach employs a one-to-many mapping strategy by initially retrieving multiple relevant candidate images for each caption. We then apply a cycle-consistency-inspired alignment scorer that selects the best image by verifying its ability to retrieve the original caption via image-to-text retrieval. Extensive evaluations demonstrate that SynC consistently and significantly improves performance across various ZIC models on standard benchmarks (MS-COCO, Flickr30k, NoCaps), achieving state-of-the-art results in several scenarios. SynC offers an effective strategy for curating refined synthetic data to enhance ZIC.
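
The one-to-many mapping plus cycle-consistency scoring can be sketched over generic unit-norm caption/image embeddings (e.g., from a CLIP-like dual encoder, abstracted here as precomputed matrices): shortlist top-k images per caption, then keep the candidate whose image-to-text retrieval ranks the original caption highest. The embedding source, k, and the margin score are assumptions:

import numpy as np

def reassign_captions(cap_emb, img_emb, k=5):
    """cap_emb: (n_cap, d), img_emb: (n_img, d), both L2-normalized.
    For each caption: shortlist top-k images, then choose the one whose
    best-matching caption (image-to-text retrieval) is the original."""
    sims = cap_emb @ img_emb.T                          # caption-to-image scores
    assignment = []
    for c in range(len(cap_emb)):
        shortlist = np.argsort(-sims[c])[:k]            # one-to-many mapping
        best_img, best_score = shortlist[0], -np.inf
        for i in shortlist:
            back = img_emb[i] @ cap_emb.T               # image-to-text retrieval
            margin = back[c] - np.max(np.delete(back, c))
            if margin > best_score:                     # cycle-consistency score
                best_img, best_score = i, margin
        assignment.append(int(best_img))
    return assignment

rng = np.random.default_rng(0)
caps = rng.standard_normal((8, 64));  caps /= np.linalg.norm(caps, axis=1, keepdims=True)
imgs = rng.standard_normal((50, 64)); imgs /= np.linalg.norm(imgs, axis=1, keepdims=True)
print(reassign_captions(caps, imgs))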

Updated: 2025-07-24 17:53:26

标题: SynC:使用一对多映射进行零样本图像字幕生成的合成图像字幕数据集精化

摘要: 零样本图像描述(ZIC)越来越多地利用文本到图像(T2I)模型生成的合成数据集,以减轻昂贵的手工标注需求。然而,这些T2I模型通常产生的图像与其对应的输入标题存在语义错位(例如,缺少对象,属性不正确),导致合成图像-标题对嘈杂,可能会妨碍模型训练。现有的数据集修剪技术主要用于去除网络爬取数据中的嘈杂文本。然而,这些方法不适用于合成数据的独特挑战,其中标题通常形式良好,但图像可能是不准确的表示。为了解决这一差距,我们引入了SynC,这是一个专门设计用于改进ZIC的合成图像-标题数据集的新框架。与传统的过滤或再生不同,SynC专注于将标题重新分配给合成图像池中已经存在的最具语义对齐的图像。我们的方法采用一对多映射策略,通过最初为每个标题检索多个相关候选图像。然后,我们应用一个循环一致性启发的对齐评分器,通过验证其通过图像到文本检索能力来检索原始标题来选择最佳图像。广泛的评估表明,SynC在各种标准基准测试(MS-COCO,Flickr30k,NoCaps)上始终显著改善各种ZIC模型的性能,达到了一些场景的最先进结果。SynC提供了一个有效的策略,用于策划精炼的合成数据,以增强ZIC。

更新时间: 2025-07-24 17:53:26

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.18616v1

Approximate SMT Counting Beyond Discrete Domains

Satisfiability Modulo Theory (SMT) solvers have advanced automated reasoning, solving complex formulas across discrete and continuous domains. Recent progress in propositional model counting motivates extending SMT capabilities toward model counting, especially for hybrid SMT formulas. Existing approaches, like bit-blasting, are limited to discrete variables, highlighting the challenge of counting solutions projected onto the discrete domain in hybrid formulas. We introduce pact, an SMT model counter for hybrid formulas that uses hashing-based approximate model counting to estimate solutions with theoretical guarantees. pact makes a logarithmic number of SMT solver calls relative to the projection variables, leveraging optimized hash functions. pact achieves significant performance improvements over baselines on a large suite of benchmarks. In particular, out of 14,202 instances, pact successfully finished on 603 instances, while Baseline could only finish on 13 instances.
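
For intuition, hashing-based approximate counting in the purely boolean case adds random XOR (parity) constraints over the projection variables until the formula becomes unsatisfiable; if m constraints were needed, the model count is roughly 2^m. The Z3 sketch below shows that classic scheme with a single trial and no amplification, which is far cruder than pact's hybrid-theory procedure:

import random
from z3 import Solver, Bool, Xor, BoolVal, Or, sat

def random_parity(vars_, rng):
    """Random XOR constraint: parity of a random variable subset equals a random bit."""
    expr = BoolVal(rng.random() < 0.5)
    for v in vars_:
        if rng.random() < 0.5:
            expr = Xor(expr, v)
    return expr

def approx_log2_count(formula, proj_vars, seed=0):
    """Add parity constraints until UNSAT; returns a crude single-trial
    estimate of log2(model count)."""
    rng = random.Random(seed)
    s = Solver()
    s.add(formula)
    m = 0
    while s.check() == sat:
        s.add(random_parity(proj_vars, rng))
        m += 1
    return m - 1

xs = [Bool(f"x{i}") for i in range(8)]
formula = Or(xs[0], xs[1])        # 3/4 of 2^8 = 192 models over x0..x7
print("log2 estimate ~", approx_log2_count(formula, xs))   # near log2(192) ~ 7.6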

Updated: 2025-07-24 17:48:13

标题: 在离散域之外的近似SMT计数

摘要: Satisfiability Modulo Theory (SMT)求解器已经推进自动推理,解决离散和连续域中的复杂公式。最近在命题模型计数方面取得的进展促使将SMT的能力扩展到模型计数,特别是对于混合SMT公式。现有的方法,如位爆破,仅限于离散变量,突显了在混合公式中将解决方案投影到离散域中计算的挑战。 我们引入了pact,一个用于混合公式的SMT模型计数器,它使用基于哈希的近似模型计数来估计具有理论保证的解决方案。pact相对于投影变量进行对数数量的SMT求解器调用,利用了优化的哈希函数。pact在大量基准测试中实现了显著的性能改进。特别是,在14,202个实例中,pact成功完成了603个实例,而基准只能完成13个实例。

更新时间: 2025-07-24 17:48:13

领域: cs.LO,cs.AI

下载: http://arxiv.org/abs/2507.18612v1

BEARCUBS: A benchmark for computer-using web agents

Modern web agents possess computer use abilities that allow them to interact with webpages by sending commands to a virtual keyboard and mouse. While such agents have considerable potential to assist human users with complex tasks, evaluating their capabilities in real-world settings poses a major challenge. To this end, we introduce BEARCUBS, a "small but mighty" benchmark of 111 information-seeking questions designed to evaluate a web agent's ability to search, browse, and identify factual information from the web. Unlike prior web agent benchmarks, solving BEARCUBS requires (1) accessing live web content rather than synthetic or simulated pages, which captures the unpredictability of real-world web interactions; and (2) performing a broad range of multimodal interactions (e.g., video understanding, 3D navigation) that cannot be bypassed via text-based workarounds. Each question in BEARCUBS has a corresponding short, unambiguous answer and a human-validated browsing trajectory, allowing for transparent evaluation of agent performance and strategies. A human study confirms that BEARCUBS questions are solvable but non-trivial (84.7% human accuracy), revealing domain knowledge gaps and overlooked details as common failure points. We find that ChatGPT Agent significantly outperforms other computer-using agents with an overall accuracy of 65.8% (compared to, e.g., Operator's 23.4%), showcasing substantial progress in tasks involving real computer use, such as playing web games and navigating 3D environments. Nevertheless, closing the gap to human performance requires improvements in areas like fine control, complex data filtering, and execution speed. To facilitate future research, BEARCUBS will be updated periodically to replace invalid or contaminated questions, keeping the benchmark fresh for future generations of web agents.

Updated: 2025-07-24 17:45:05

标题: BEARCUBS:面向使用计算机的网络代理的基准测试

摘要: 现代网络代理程序具有计算机使用能力,可以通过向虚拟键盘和鼠标发送命令与网页进行交互。虽然这种代理程序在协助人类用户完成复杂任务方面具有巨大潜力,但在真实世界环境中评估它们的能力却是一个重大挑战。为此,我们引入了BEARCUBS,一个由111个信息搜索问题组成的“小而强大”的基准,旨在评估网络代理程序搜索、浏览和识别网络中的事实信息的能力。与以往的网络代理程序基准不同,解决BEARCUBS需要(1)访问真实网络内容而不是合成或模拟页面,从而捕捉真实世界网络交互的不可预测性;(2)执行广泛的多模态交互(例如视频理解、3D导航),不能通过基于文本的变通方法绕过。BEARCUBS中的每个问题都有一个对应的简短、明确的答案和经过人工验证的浏览轨迹,可以透明地评估代理程序的性能和策略。一项人类研究证实,BEARCUBS的问题是可解的但非平凡的(84.7%人类准确率),揭示了领域知识差距和被忽视的细节常见的失败点。我们发现,ChatGPT代理程序在总体准确率方面明显优于其他使用计算机的代理程序,达到65.8%的准确率(例如,操作员的23.4%),展示了在涉及真实计算机使用的任务中取得的重大进展,如玩网络游戏和导航3D环境。然而,要缩小与人类性能之间的差距,需要在诸如精细控制、复杂数据过滤和执行速度等方面进行改进。为了促进未来的研究,BEARCUBS将定期更新以替换无效或受污染的问题,使基准保持新鲜,以供未来一代网络代理程序使用。

更新时间: 2025-07-24 17:45:05

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2503.07919v3

Interact2Vec -- An efficient neural network-based model for simultaneously learning users and items embeddings in recommender systems

Over the past decade, recommender systems have experienced a surge in popularity. Despite notable progress, they grapple with challenging issues, such as high data dimensionality and sparseness. Representing users and items as low-dimensional embeddings learned via neural networks has become a leading solution. However, while recent studies show promising results, many approaches rely on complex architectures or require content data, which may not always be available. This paper presents Interact2Vec, a novel neural network-based model that simultaneously learns distributed embeddings for users and items while demanding only implicit feedback. The model employs state-of-the-art strategies that natural language processing models commonly use to optimize the training phase and enhance the final embeddings. Two types of experiments were conducted regarding the extrinsic and intrinsic quality of the model. In the former, we benchmarked the recommendations generated by Interact2Vec's embeddings in a top-$N$ ranking problem, comparing them with six other recommender algorithms. The model achieved the second or third-best results in 30% of the datasets, being competitive with other recommenders, and has proven to be very efficient with an average training time reduction of 274% compared to other embedding-based models. Later, we analyzed the intrinsic quality of the embeddings through similarity tables. Our findings suggest that Interact2Vec can achieve promising results, especially on the extrinsic task, and is an excellent embedding-generator model for scenarios of scarce computing resources, enabling the learning of item and user embeddings simultaneously and efficiently.
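
One plausible reading of the "strategies that natural language processing models commonly use" is skip-gram-style training with negative sampling: an observed (user, item) interaction is pulled together in embedding space while randomly drawn items are pushed apart. A minimal NumPy sketch of that style of update (the dimensions, learning rate, and uniform negative sampling are illustrative assumptions, not Interact2Vec's exact objective):

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, dim, lr, n_neg = 100, 500, 32, 0.05, 4
    U = rng.normal(scale=0.1, size=(n_users, dim))   # user embeddings
    V = rng.normal(scale=0.1, size=(n_items, dim))   # item embeddings

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sgns_step(u, pos_item):
        """One update on an observed (user, item) pair with sampled negatives."""
        items = [pos_item] + list(rng.integers(0, n_items, size=n_neg))
        labels = [1.0] + [0.0] * n_neg               # implicit positive vs. negatives
        grad_u = np.zeros(dim)
        for i, y in zip(items, labels):
            g = sigmoid(U[u] @ V[i]) - y             # logistic-loss gradient
            grad_u += g * V[i]
            V[i] -= lr * g * U[u]
        U[u] -= lr * grad_u

    for _ in range(2000):                            # synthetic implicit feedback
        sgns_step(int(rng.integers(n_users)), int(rng.integers(n_items)))
    print("embedding tables:", U.shape, V.shape)

Both embedding tables are updated by every interaction, which is what lets users and items be learned simultaneously from implicit feedback alone.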

Updated: 2025-07-24 17:44:54

标题: Interact2Vec——一种用于在推荐系统中同时学习用户和物品嵌入的高效神经网络模型

摘要: 在过去的十年中,推荐系统经历了一波流行。尽管取得了显著进展,但它们仍然面临着诸如高数据维度和稀疏性等挑战性问题。通过使用神经网络学习的低维嵌入来表示用户和物品已成为一种领先的解决方案。然而,尽管最近的研究显示出有希望的结果,许多方法依赖于复杂的架构或需要内容数据,这些数据并不总是可用。本文介绍了一种名为Interact2Vec的新型基于神经网络的模型,它同时学习用户和物品的分布式嵌入,只需隐式反馈即可实现。该模型采用了自然语言处理模型常用的最先进策略来优化训练阶段并增强最终的嵌入。关于模型的外在和内在质量进行了两种类型的实验。在前者中,我们在一个top-$N$排名问题中对Interact2Vec嵌入生成的推荐进行了基准测试,并与其他六种推荐算法进行了比较。该模型在30%的数据集中取得了第二或第三名的成绩,与其他推荐系统竞争激烈,并且在平均训练时间缩短了274%的情况下证明了其非常有效。接着,我们通过相似性表格分析了嵌入的内在质量。我们的研究结果表明,Interact2Vec可以取得有希望的结果,尤其在外在任务上表现出色,并且是一种在计算资源稀缺的场景下卓越的嵌入生成模型,能够同时高效地学习物品和用户的嵌入。

更新时间: 2025-07-24 17:44:54

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2506.22648v3

Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents

Large language models (LLMs) produce high-dimensional embeddings that capture rich semantic and syntactic relationships between words, sentences, and concepts. Investigating the topological structures of LLM embedding spaces via mapper graphs enables us to understand their underlying structures. Specifically, a mapper graph summarizes the topological structure of the embedding space, where each node represents a topological neighborhood (containing a cluster of embeddings), and an edge connects two nodes if their corresponding neighborhoods overlap. However, manually exploring these embedding spaces to uncover encoded linguistic properties requires considerable human effort. To address this challenge, we introduce a framework for semi-automatic annotation of these embedding properties. To organize the exploration process, we first define a taxonomy of explorable elements within a mapper graph such as nodes, edges, paths, components, and trajectories. The annotation of these elements is executed through two types of customizable LLM-based agents that employ perturbation techniques for scalable and automated analysis. These agents help to explore and explain the characteristics of mapper elements and verify the robustness of the generated explanations. We instantiate the framework within a visual analytics workspace and demonstrate its effectiveness through case studies. In particular, we replicate findings from prior research on BERT's embedding properties across various layers of its architecture and provide further observations into the linguistic properties of topological neighborhoods.
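
The mapper construction the paper builds on is compact enough to sketch: cover a one-dimensional lens (filter) of the embeddings with overlapping intervals, cluster the points falling in each interval, make each cluster a node, and connect two nodes whose clusters share points. A small NumPy sketch under those standard assumptions (the lens choice and the distance-threshold clustering are illustrative; the paper's perturbation-based agents operate on top of such a graph):

    import numpy as np
    from itertools import combinations

    def mapper_graph(X, lens, n_intervals=6, overlap=0.3, link_dist=2.0):
        lo, hi = float(lens.min()), float(lens.max())
        width = (hi - lo) / n_intervals
        nodes, edges = [], set()
        for k in range(n_intervals):
            a = lo + k * width - overlap * width
            b = lo + (k + 1) * width + overlap * width
            idx = np.where((lens >= a) & (lens <= b))[0]
            # single-linkage clustering: connected components under a distance threshold
            unassigned = set(idx.tolist())
            while unassigned:
                comp, frontier = set(), {unassigned.pop()}
                while frontier:
                    p = frontier.pop()
                    comp.add(p)
                    near = {q for q in unassigned
                            if np.linalg.norm(X[p] - X[q]) < link_dist}
                    unassigned -= near
                    frontier |= near
                nodes.append(frozenset(comp))        # node = one topological neighborhood
        for (i, u), (j, v) in combinations(enumerate(nodes), 2):
            if u & v:                                # overlapping neighborhoods share points
                edges.add((i, j))
        return nodes, edges

    X = np.random.default_rng(0).normal(size=(200, 8))   # stand-in for LLM embeddings
    nodes, edges = mapper_graph(X, lens=X[:, 0])
    print(len(nodes), "nodes,", len(edges), "edges")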

Updated: 2025-07-24 17:43:40

标题: 可解释的映射器:利用扰动式解释和验证代理绘制LLM嵌入空间

摘要: 大型语言模型(LLMs)产生高维嵌入,捕捉单词、句子和概念之间丰富的语义和句法关系。通过映射器图研究LLM嵌入空间的拓扑结构使我们能够理解其潜在结构。具体而言,映射器图总结了嵌入空间的拓扑结构,其中每个节点代表一个拓扑邻域(包含一组嵌入),如果它们对应的邻域重叠,则连接两个节点的边。然而,手动探索这些嵌入空间以揭示编码的语言特性需要大量人力。为了解决这一挑战,我们引入了一个半自动注释这些嵌入属性的框架。为了组织探索过程,我们首先定义了映射器图中可探索元素的分类法,如节点、边、路径、组件和轨迹。通过两种类型的可定制的基于LLM的代理执行这些元素的注释,这些代理采用扰动技术进行可伸缩和自动化分析。这些代理帮助探索和解释映射器元素的特征,并验证生成的解释的稳健性。我们在可视化分析工作空间中实现了这一框架,并通过案例研究证明了其有效性。特别地,我们复制了关于BERT嵌入属性在其不同层次体系结构中的发现,并对拓扑邻域的语言特性提供了进一步观察。

更新时间: 2025-07-24 17:43:40

领域: cs.CG,cs.LG

下载: http://arxiv.org/abs/2507.18607v1

Hybrid quantum-classical algorithm for near-optimal planning in POMDPs

Reinforcement learning (RL) provides a principled framework for decision-making in partially observable environments, which can be modeled as Markov decision processes and compactly represented through dynamic decision Bayesian networks. Recent advances demonstrate that inference on sparse Bayesian networks can be accelerated using quantum rejection sampling combined with amplitude amplification, leading to a computational speedup in estimating acceptance probabilities. Building on this result, we introduce Quantum Bayesian Reinforcement Learning (QBRL), a hybrid quantum-classical look-ahead algorithm for model-based RL in partially observable environments. We present a rigorous, oracle-free time complexity analysis under fault-tolerant assumptions for the quantum device. Unlike standard treatments that assume a black-box oracle, we explicitly specify the inference process, allowing our bounds to more accurately reflect the true computational cost. We show that, for environments whose dynamics form a sparse Bayesian network, horizon-based near-optimal planning can be achieved sub-quadratically faster through quantum-enhanced belief updates. Furthermore, we present numerical experiments benchmarking QBRL against its classical counterpart on simple yet illustrative decision-making tasks. Our results offer a detailed analysis of how the quantum computational advantage translates into decision-making performance, highlighting that the magnitude of the advantage can vary significantly across different deployment settings.
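
The classical backbone that the quantum routine accelerates is Bayesian belief updating in a POMDP: the belief over hidden states is pushed through the transition model, reweighted by the observation likelihood, and renormalized. A minimal NumPy sketch of that update with toy matrices (the quantum rejection sampling and amplitude amplification themselves are not modeled here):

    import numpy as np

    def belief_update(belief, T, O, action, obs):
        """b'(s') is proportional to O[obs | s'] * sum_s T[s' | s, action] * b(s)."""
        predicted = T[action].T @ belief      # prediction step
        updated = O[:, obs] * predicted       # correction by observation likelihood
        return updated / updated.sum()

    # toy POMDP: 3 states, 2 actions, 2 observations
    T = np.array([[[0.8, 0.2, 0.0],          # T[a, s, s'] = P(s' | s, a)
                   [0.1, 0.8, 0.1],
                   [0.0, 0.2, 0.8]],
                  [[0.5, 0.5, 0.0],
                   [0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5]]])
    O = np.array([[0.9, 0.1],                # O[s', o] = P(o | s')
                  [0.5, 0.5],
                  [0.1, 0.9]])

    b = np.array([1/3, 1/3, 1/3])
    for a, o in [(0, 1), (1, 0), (0, 0)]:    # a short action/observation history
        b = belief_update(b, T, O, a, o)
    print("posterior belief:", np.round(b, 3))

Look-ahead planning repeats this update along candidate action sequences; the paper's speedup targets exactly this inference step when the underlying Bayesian network is sparse.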

Updated: 2025-07-24 17:42:30

标题: 混合量子-经典算法用于POMDPs中近乎最优规划

摘要: 强化学习(RL)为部分可观测环境中的决策提供了一个有原则的框架,这些环境可以建模为马尔可夫决策过程,并通过动态决策贝叶斯网络紧凑表示。最近的进展表明,利用量子拒绝采样结合幅度放大可以加速对稀疏贝叶斯网络的推断,从而在估计接受概率时实现计算速度提升。在这一结果基础上,我们介绍了量子贝叶斯强化学习(QBRL),这是一个用于部分可观测环境中基于模型的RL的混合量子-经典前瞻算法。我们在容错假设下对量子设备进行了严格的、不依赖于预言的时间复杂度分析。与通常假设一个黑盒预言的标准处理方式不同,我们明确指定了推断过程,使我们的界限能更准确地反映真实的计算成本。我们表明,在环境动态形成稀疏贝叶斯网络的情况下,通过量子增强信念更新,基于时间跨度的近最优规划可以实现亚二次速度更快。 此外,我们通过对简单但富有说明性的决策任务对比QBRL与其经典对应算法的数值实验来衡量QBRL的性能。我们的结果详细分析了量子计算优势如何转化为决策性能,突显了优势的大小在不同部署设置中可能存在显著差异。

更新时间: 2025-07-24 17:42:30

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2507.18606v1

Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures

The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently non-Euclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the algebraic transformations describing symmetries of physical systems. Extracting knowledge from such non-Euclidean data necessitates a broader mathematical perspective. Echoing the 19th-century revolutions that gave rise to non-Euclidean geometry, an emerging line of research is redefining modern machine learning with non-Euclidean structures. Its goal: generalizing classical methods to unconventional data types with geometry, topology, and algebra. In this review, we provide an accessible gateway to this fast-growing field and propose a graphical taxonomy that integrates recent advances into an intuitive unified framework. We subsequently extract insights into current challenges and highlight exciting opportunities for future development in this field.

Updated: 2025-07-24 17:41:43

标题: 超越欧几里得:现代机器学习与几何、拓扑和代数结构的图解指南

摘要: 欧几里德几何学的持久遗产支撑着经典机器学习,几十年来,这主要是为了处理在欧几里德空间中的数据而开发的。然而,现代机器学习越来越多地遇到具有丰富结构的本质上非欧几里德的数据。这些数据可能展示复杂的几何、拓扑和代数结构:从时空曲率的几何到大脑中神经元之间的拓扑复杂相互作用,再到描述物理系统对称性的代数变换。从这种非欧几里德数据中提取知识需要更广泛的数学视角。回应19世纪引起非欧几里德几何学的革命,一个新兴的研究方向正在重新定义具有非欧几里德结构的现代机器学习。其目标是将经典方法推广到具有几何、拓扑和代数的非传统数据类型。在本综述中,我们为这个快速发展的领域提供了一个易于理解的入门,提出了一个将最新进展整合到直观统一框架中的图形分类法。随后,我们提取出对当前挑战的见解,并突出未来在这一领域发展的令人兴奋的机会。

更新时间: 2025-07-24 17:41:43

领域: cs.LG

下载: http://arxiv.org/abs/2407.09468v2

Demystify Protein Generation with Hierarchical Conditional Diffusion Models

Generating novel and functional protein sequences is critical to a wide range of applications in biology. Recent advancements in conditional diffusion models have shown impressive empirical performance in protein generation tasks. However, reliable generation of proteins remains an open research question in de novo protein design, especially when it comes to conditional diffusion models. Considering the biological function of a protein is determined by multi-level structures, we propose a novel multi-level conditional diffusion model that integrates both sequence-based and structure-based information for efficient end-to-end protein design guided by specified functions. By generating representations at different levels simultaneously, our framework can effectively model the inherent hierarchical relations between different levels, resulting in an informative and discriminative representation of the generated protein. We also propose Protein-MMD, a new reliable evaluation metric, to evaluate the quality of generated protein with conditional diffusion models. Our new metric is able to capture both distributional and functional similarities between real and generated protein sequences while ensuring conditional consistency. We experiment with the benchmark datasets, and the results on conditional protein generation tasks demonstrate the efficacy of the proposed generation framework and evaluation metric.
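
The backbone of an MMD-style metric is a kernel two-sample statistic between embeddings of real and generated sequences. A hedged NumPy sketch of an unbiased squared-MMD estimate with an RBF kernel (the random embeddings and bandwidth are placeholders; Protein-MMD additionally captures functional similarity and conditional consistency, which this sketch omits):

    import numpy as np

    def rbf_kernel(A, B, gamma=0.5):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def mmd2(X, Y, gamma=0.5):
        """Unbiased estimate of squared MMD between samples X and Y."""
        Kxx = rbf_kernel(X, X, gamma); np.fill_diagonal(Kxx, 0.0)
        Kyy = rbf_kernel(Y, Y, gamma); np.fill_diagonal(Kyy, 0.0)
        Kxy = rbf_kernel(X, Y, gamma)
        n, m = len(X), len(Y)
        return (Kxx.sum() / (n * (n - 1))
                + Kyy.sum() / (m * (m - 1))
                - 2.0 * Kxy.mean())

    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(100, 16))     # stand-in protein embeddings
    close = rng.normal(0.1, 1.0, size=(100, 16))
    far = rng.normal(1.0, 1.0, size=(100, 16))
    print("MMD^2 real vs close:", round(mmd2(real, close), 4))
    print("MMD^2 real vs far:  ", round(mmd2(real, far), 4))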

Updated: 2025-07-24 17:34:02

标题: 用分层条件扩散模型揭开蛋白质生成之谜

摘要: 生成新颖和功能蛋白质序列对于生物学中的各种应用至关重要。最近条件扩散模型的进展在蛋白质生成任务中表现出令人印象深刻的经验性能。然而,在从头设计蛋白质时,可靠的蛋白质生成仍然是一个开放的研究问题,特别是在涉及条件扩散模型时。考虑到蛋白质的生物功能是由多级结构决定的,我们提出了一种新颖的多级条件扩散模型,该模型整合了基于序列和基于结构的信息,以实现指定功能引导的高效端到端蛋白质设计。通过同时在不同级别生成表示,我们的框架可以有效地建模不同级别之间的内在分层关系,从而产生生成蛋白质的信息丰富且具有区分性的表示。我们还提出了一种新的可靠评估指标Protein-MMD,用于评估具有条件扩散模型的生成蛋白质的质量。我们的新指标能够捕捉真实和生成蛋白质序列之间的分布和功能相似性,同时确保条件一致性。我们在基准数据集上进行了实验,条件蛋白质生成任务的结果证明了所提出的生成框架和评估指标的有效性。

更新时间: 2025-07-24 17:34:02

领域: cs.LG

下载: http://arxiv.org/abs/2507.18603v1

Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs

Knowledge distillation can be a cost-effective technique to distill knowledge in Large Language Models, if the teacher output logits can be pre-computed and cached. However, successfully applying this to pre-training remains largely unexplored. In this work, we prove that naive approaches for sparse knowledge distillation such as caching Top-K probabilities, while intuitive, provide biased estimates of teacher probability distribution to the student, resulting in suboptimal performance and calibration. We propose an importance-sampling-based method `Random Sampling Knowledge Distillation', which provides unbiased estimates, preserves the gradient in expectation, and requires storing significantly sparser logits. Our method enables faster training of student models with marginal overhead (<10%) compared to cross-entropy based training, while maintaining competitive performance compared to full distillation, across a range of model sizes from 300M to 3B.
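
The statistical claim is easy to check numerically: drawing token indices from the full teacher distribution and averaging the student's negative log-probabilities is an unbiased estimator of the full cross-entropy, whereas caching and renormalizing Top-K probabilities biases it. A small NumPy verification on a toy vocabulary (illustrative sizes; this is not the paper's training code):

    import numpy as np

    rng = np.random.default_rng(0)
    V = 1000
    teacher = rng.dirichlet(np.full(V, 0.05))        # peaked teacher distribution
    student = rng.dirichlet(np.full(V, 0.05))

    full_ce = -(teacher * np.log(student)).sum()     # exact cross-entropy target

    # Random Sampling KD style estimator: i ~ teacher, average -log student(i);
    # E_{i~p}[-log q(i)] equals the full cross-entropy, so this is unbiased
    idx = rng.choice(V, size=20000, p=teacher)
    sampled_ce = -np.log(student[idx]).mean()

    # naive Top-K caching: renormalize the teacher over its top 16 tokens
    top = np.argsort(-teacher)[:16]
    p_top = teacher[top] / teacher[top].sum()
    topk_ce = -(p_top * np.log(student[top])).sum()  # biased estimate

    print(f"exact {full_ce:.3f}  sampled {sampled_ce:.3f}  top-k {topk_ce:.3f}")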

Updated: 2025-07-24 17:30:12

标题: 稀疏Logit抽样:加速LLMs中的知识蒸馏

摘要: 知识蒸馏可以是一种成本有效的技术,用于在大型语言模型中蒸馏知识,如果教师输出logits可以预先计算并缓存。然而,成功地将这一技术应用于预训练仍然大部分未被探索。在这项工作中,我们证明了对于稀疏知识蒸馏的朴素方法,如缓存Top-K概率,虽然直观,但会给学生提供教师概率分布的偏见估计,导致性能和校准不佳。我们提出了一种基于重要性抽样的方法“随机抽样知识蒸馏”,该方法提供无偏估计,保留期望中的梯度,并且需要存储显著更稀疏的logits。我们的方法使得与基于交叉熵的训练相比,学生模型的训练速度更快,同时保持了与完整蒸馏相比的竞争性能,在从300M到3B的一系列模型大小上。

更新时间: 2025-07-24 17:30:12

领域: cs.LG,cs.AI,cs.CL,68T50,I.2.7

下载: http://arxiv.org/abs/2503.16870v2

Linear Memory SE(2) Invariant Attention

Processing spatial data is a key component in many learning tasks for autonomous driving such as motion forecasting, multi-agent simulation, and planning. Prior works have demonstrated the value in using SE(2) invariant network architectures that consider only the relative poses between objects (e.g. other agents, scene features such as traffic lanes). However, these methods compute the relative poses for all pairs of objects explicitly, requiring quadratic memory. In this work, we propose a mechanism for SE(2) invariant scaled dot-product attention that requires linear memory relative to the number of objects in the scene. Our SE(2) invariant transformer architecture enjoys the same scaling properties that have benefited large language models in recent years. We demonstrate experimentally that our approach is practical to implement and improves performance compared to comparable non-invariant architectures.
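
The invariance comes from attending over the relative pose of object j expressed in object i's frame, which is unchanged by any global SE(2) transform of the scene. A NumPy sketch of the quadratic-memory baseline feature computation that the paper's mechanism avoids materializing (the four-dimensional feature choice is illustrative; the linear-memory factorization itself is not reproduced here):

    import numpy as np

    def relative_pose_features(poses):
        """poses: (N, 3) array of (x, y, theta). Returns (N, N, 4) features
        (dx, dy, cos dtheta, sin dtheta) of pose j expressed in frame i."""
        xy, th = poses[:, :2], poses[:, 2]
        d = xy[None, :, :] - xy[:, None, :]          # world-frame offsets, O(N^2) memory
        c, s = np.cos(th), np.sin(th)
        dx = c[:, None] * d[..., 0] + s[:, None] * d[..., 1]   # rotate into frame i
        dy = -s[:, None] * d[..., 0] + c[:, None] * d[..., 1]
        dth = th[None, :] - th[:, None]
        return np.stack([dx, dy, np.cos(dth), np.sin(dth)], axis=-1)

    poses = np.random.default_rng(0).uniform(-5, 5, size=(6, 3))
    feats = relative_pose_features(poses)

    # invariance check: apply a global SE(2) transform and compare
    phi, t = 0.7, np.array([2.0, -1.0])
    R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
    moved = np.concatenate([poses[:, :2] @ R.T + t, poses[:, 2:] + phi], axis=1)
    print(np.allclose(feats, relative_pose_features(moved)))   # True

The (N, N, ...) tensor above is exactly the quadratic cost the paper removes, while keeping the same invariance property.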

Updated: 2025-07-24 17:28:57

标题: 线性内存 SE(2) 不变的注意力

摘要: 处理空间数据是自动驾驶许多学习任务的关键组成部分,例如运动预测、多智能体模拟和规划。先前的研究表明使用只考虑对象之间相对姿势(例如其他智能体、场景特征如交通车道)的SE(2)不变网络架构具有价值。然而,这些方法明确计算所有对象之间的相对姿势,需要二次内存。在这项工作中,我们提出了一种SE(2)不变的缩放点积注意力机制,相对于场景中对象的数量需要线性内存。我们的SE(2)不变变压器架构具有与近年来受益于大型语言模型的相同的扩展性质。我们在实验证明,我们的方法实用且相对于可比较的非不变架构改善了性能。

更新时间: 2025-07-24 17:28:57

领域: cs.LG

下载: http://arxiv.org/abs/2507.18597v1

MambaNeXt-YOLO: A Hybrid State Space Model for Real-time Object Detection

Real-time object detection is a fundamental but challenging task in computer vision, particularly when computational resources are limited. Although YOLO-series models have set strong benchmarks by balancing speed and accuracy, the increasing need for richer global context modeling has led to the use of Transformer-based architectures. Nevertheless, Transformers have high computational complexity because of their self-attention mechanism, which limits their practicality for real-time and edge deployments. To overcome these challenges, recent developments in linear state space models, such as Mamba, provide a promising alternative by enabling efficient sequence modeling with linear complexity. Building on this insight, we propose MambaNeXt-YOLO, a novel object detection framework that balances accuracy and efficiency through three key contributions: (1) MambaNeXt Block: a hybrid design that integrates CNNs with Mamba to effectively capture both local features and long-range dependencies; (2) Multi-branch Asymmetric Fusion Pyramid Network (MAFPN): an enhanced feature pyramid architecture that improves multi-scale object detection across various object sizes; and (3) Edge-focused Efficiency: our method achieved 66.6% mAP at 31.9 FPS on the PASCAL VOC dataset without any pre-training and supports deployment on edge devices such as the NVIDIA Jetson Xavier NX and Orin NX.

Updated: 2025-07-24 17:28:09

标题: MambaNeXt-YOLO:一种用于实时目标检测的混合状态空间模型

摘要: 实时目标检测是计算机视觉中一项基础但具有挑战性的任务,尤其是在计算资源有限的情况下。尽管YOLO系列模型通过平衡速度和准确性设定了强大的基准,但对于更丰富的全局上下文建模的需求不断增加,导致了Transformer-based架构的使用。然而,由于其自注意机制,Transformers具有较高的计算复杂性,限制了它们在实时和边缘部署中的实用性。为了克服这些挑战,最近在线性状态空间模型领域的发展(如Mamba)提供了一种有希望的替代方案,通过实现线性复杂度来实现高效的序列建模。基于这一见解,我们提出了MambaNeXt-YOLO,一种新颖的目标检测框架,通过三个关键贡献平衡了准确性和效率:(1)MambaNeXt块:一种混合设计,将CNN与Mamba集成,有效捕捉局部特征和远程依赖关系;(2)多分支不对称融合金字塔网络(MAFPN):增强的特征金字塔架构,提高了不同目标尺寸的多尺度目标检测;(3)侧重效率:我们的方法在PASCAL VOC数据集上在31.9 FPS的情况下实现了66.6%的mAP,而且没有任何预训练,并支持在NVIDIA Jetson Xavier NX和Orin NX等边缘设备上部署。

更新时间: 2025-07-24 17:28:09

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2506.03654v3

Private Counterfactual Retrieval

Transparency and explainability are two extremely important aspects to be considered when employing black-box machine learning models in high-stake applications. Providing counterfactual explanations is one way of fulfilling this requirement. However, this also poses a threat to the privacy of both the institution that is providing the explanation as well as the user who is requesting it. In this work, we propose multiple schemes inspired by private information retrieval (PIR) techniques which ensure the \emph{user's privacy} when retrieving counterfactual explanations. We present a scheme which retrieves the \emph{exact} nearest neighbor counterfactual explanation from a database of accepted points while achieving perfect (information-theoretic) privacy for the user. While the scheme achieves perfect privacy for the user, some leakage on the database is inevitable which we quantify using a mutual information based metric. Furthermore, we propose strategies to reduce this leakage to achieve an advanced degree of database privacy. We extend these schemes to incorporate user's preference on transforming their attributes, so that a more actionable explanation can be received. Since our schemes rely on finite field arithmetic, we empirically validate our schemes on real datasets to understand the trade-off between the accuracy and the finite field sizes. Finally, we present numerical results to support our theoretical findings, and compare the database leakage of the proposed schemes.
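
The PIR primitive underneath such schemes is worth seeing concretely. In the classic two-server XOR construction, the user sends each non-colluding server a random-looking index subset, the two subsets differing only at the desired index; each server's view is uniformly random, yet XORing the answers recovers the record. A self-contained Python sketch (this is the textbook primitive, not the paper's scheme, which works over general finite fields and targets counterfactual databases):

    import secrets

    def server_answer(database, subset):
        """Each server XORs together the records indexed by its query subset."""
        acc = 0
        for i in subset:
            acc ^= database[i]
        return acc

    def private_retrieve(database, want):
        n = len(database)
        s1 = {i for i in range(n) if secrets.randbelow(2)}  # uniformly random subset
        s2 = s1 ^ {want}                                    # differs only at `want`
        a1 = server_answer(database, s1)                    # sent to server 1
        a2 = server_answer(database, s2)                    # sent to server 2
        return a1 ^ a2                                      # record at index `want`

    db = [secrets.randbelow(256) for _ in range(32)]        # toy byte records
    assert all(private_retrieve(db, i) == db[i] for i in range(32))
    print("retrieved all records privately:", True)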

Updated: 2025-07-24 17:25:40

标题: 私人反事实检索

摘要: 透明度和可解释性是在高风险应用中使用黑盒机器学习模型时必须考虑的两个极为重要的方面。提供反事实解释是满足这一要求的一种方式。然而,这也对提供解释的机构以及请求解释的用户的隐私构成威胁。在这项工作中,我们提出了多种受私人信息检索(PIR)技术启发的方案,确保在检索反事实解释时用户的隐私。我们提出了一种方案,从接受点数据库中检索出精确的最近邻反事实解释,同时为用户实现完美的(信息论)隐私。虽然该方案为用户实现了完美的隐私,但对数据库的一些泄露是不可避免的,我们使用基于互信息的度量来量化这种泄露。此外,我们提出了减少这种泄露以实现更高程度数据库隐私的策略。我们将这些方案扩展到包含用户对转换其属性的偏好,以便获得更具操作性的解释。由于我们的方案依赖于有限域算术,我们在真实数据集上对我们的方案进行了实证验证,以了解精度和有限域大小之间的权衡。最后,我们提出数值结果支持我们的理论发现,并比较所提出方案的数据库泄露情况。

更新时间: 2025-07-24 17:25:40

领域: cs.IT,cs.CR,cs.LG,eess.SP,math.IT

下载: http://arxiv.org/abs/2410.13812v2

DRWKV: Focusing on Object Edges for Low-Light Image Enhancement

Low-light image enhancement remains a challenging task, particularly in preserving object edge continuity and fine structural details under extreme illumination degradation. In this paper, we propose a novel model, DRWKV (Detailed Receptance Weighted Key Value), which integrates our proposed Global Edge Retinex (GER) theory, enabling effective decoupling of illumination and edge structures for enhanced edge fidelity. Secondly, we introduce Evolving WKV Attention, a spiral-scanning mechanism that captures spatial edge continuity and models irregular structures more effectively. Thirdly, we design the Bilateral Spectrum Aligner (Bi-SAB) and a tailored MS2-Loss to jointly align luminance and chrominance features, improving visual naturalness and mitigating artifacts. Extensive experiments on five LLIE benchmarks demonstrate that DRWKV achieves leading performance in PSNR, SSIM, and NIQE while maintaining low computational complexity. Furthermore, DRWKV enhances downstream performance in low-light multi-object tracking tasks, validating its generalization capabilities.
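
One ingredient that can be sketched concretely is the spiral scan: serializing a 2-D feature map from the center outward so that spatially adjacent pixels, and hence object edges, stay close in the 1-D sequence a recurrent attention consumes. A small Python sketch of such an ordering (illustrative; the Evolving WKV Attention applies its recurrence along an ordering of this kind):

    def spiral_order(h, w):
        """Center-out spiral visit order for an h x w grid."""
        r, c = (h - 1) // 2, (w - 1) // 2
        order, seen = [], set()
        def visit(r, c):
            if 0 <= r < h and 0 <= c < w and (r, c) not in seen:
                seen.add((r, c))
                order.append((r, c))
        visit(r, c)
        step, d = 1, 0
        dirs = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # right, down, left, up
        while len(order) < h * w:
            for _ in range(2):                       # each step length is used twice
                dr, dc = dirs[d % 4]
                for _ in range(step):
                    r, c = r + dr, c + dc
                    visit(r, c)
                d += 1
            step += 1
        return order

    print(spiral_order(3, 3))
    # [(1,1), (1,2), (2,2), (2,1), (2,0), (1,0), (0,0), (0,1), (0,2)]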

Updated: 2025-07-24 17:24:59

标题: DRWKV:专注于物体边缘的低光图像增强

摘要: 低光图像增强仍然是一个具有挑战性的任务,特别是在极端光照退化下保留物体边缘连续性和细节结构。在本文中,我们提出了一个新颖的模型,DRWKV(Detailed Receptance Weighted Key Value),它集成了我们提出的全局边缘Retinex(GER)理论,实现了对光照和边缘结构的有效解耦,以增强边缘的保真度。其次,我们引入了Evolving WKV Attention,这是一种螺旋扫描机制,可以更有效地捕捉空间边缘的连续性并对不规则结构进行建模。第三,我们设计了双边光谱对齐器(Bi-SAB)和特制的MS2-Loss,共同对齐亮度和色度特征,提高视觉自然性并减轻伪影。对五个LLIE基准进行的大量实验表明,DRWKV在PSNR、SSIM和NIQE方面取得了领先的性能,同时保持了低的计算复杂度。此外,DRWKV提高了低光环境下多目标跟踪任务的性能,验证了其泛化能力。

更新时间: 2025-07-24 17:24:59

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.18594v1

Scaling RL to Long Videos

We introduce a full-stack framework that scales up reasoning in vision-language models (VLMs) to long videos, leveraging reinforcement learning. We address the unique challenges of long video reasoning by integrating three critical components: (1) a large-scale dataset, LongVideo-Reason, comprising 104K long video QA pairs with high-quality reasoning annotations across diverse domains such as sports, games, and vlogs; (2) a two-stage training pipeline that extends VLMs with chain-of-thought supervised fine-tuning (CoT-SFT) and reinforcement learning (RL); and (3) a training infrastructure for long video RL, named Multi-modal Reinforcement Sequence Parallelism (MR-SP), which incorporates sequence parallelism and a vLLM-based engine tailored for long video, using cached video embeddings for efficient rollout and prefilling. In our experiments, LongVILA-R1-7B achieves strong performance on video benchmarks, reaching 65.0% and 70.7% accuracy on VideoMME without and with subtitles, respectively, and consistently outperforming LongVILA-R1 across multiple benchmarks. Moreover, LongVILA-R1 shows steady performance improvements as the number of input video frames increases. Notably, our MR-SP system achieves up to 2.1x speedup on long video RL training. In addition, we release our training system for public availability that supports RL training on various modalities (video, text, and audio), various models (VILA and Qwen series), and even image and video generation models. On a single A100 node (8 GPUs), it supports RL training on hour-long videos (e.g., 3,600 frames / around 256k tokens).

Updated: 2025-07-24 17:20:41

标题: 将强化学习扩展至长视频

摘要: 我们引入了一个全栈框架,通过强化学习将视觉语言模型(VLMs)的推理能力扩展到长视频。我们通过整合三个关键组件来解决长视频推理的独特挑战:(1)一个大规模数据集LongVideo-Reason,包括104K个长视频问答对,涵盖体育、游戏和视频博客等多个领域,并带有高质量的推理注释;(2)一个两阶段训练流程,通过链式思维监督微调(CoT-SFT)和强化学习(RL)扩展VLMs;以及(3)一个用于长视频强化学习的训练基础设施,名为多模态强化序列并行性(MR-SP),其中包含序列并行性和为长视频量身定制的vLLM引擎,利用缓存的视频嵌入进行高效的展开和预填充。在我们的实验中,LongVILA-R1-7B在视频基准上取得了强大的表现,分别在没有字幕和有字幕的情况下达到了65.0%和70.7%的准确率,并在多个基准测试中持续优于LongVILA-R1。此外,LongVILA-R1在输入视频帧数量增加时显示出稳定的性能改进。值得注意的是,我们的MR-SP系统在长视频RL训练上实现了高达2.1倍的加速。此外,我们发布了我们的训练系统以供公开使用,支持在各种模态(视频、文本和音频)、各种模型(VILA和Qwen系列)甚至图像和视频生成模型上进行RL训练。在单个A100节点(8个GPU)上,它支持对长视频(例如,3600帧/约256k个标记)进行RL训练。

更新时间: 2025-07-24 17:20:41

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.07966v2

A Foundation Model for Massive MIMO Precoding with an Adaptive per-User Rate-Power Tradeoff

Deep learning (DL) has emerged as a solution for precoding in massive multiple-input multiple-output (mMIMO) systems due to its capacity to learn the characteristics of the propagation environment. However, training such a model requires high-quality, local datasets at the deployment site, which are often difficult to collect. We propose a transformer-based foundation model for mMIMO precoding that seeks to minimize the energy consumption of the transmitter while dynamically adapting to per-user rate requirements. At equal energy consumption, zero-shot deployment of the proposed foundation model significantly outperforms zero forcing, and approaches weighted minimum mean squared error performance with 8x less complexity. To address model adaptation in data-scarce settings, we introduce a data augmentation method that finds training samples similar to the target distribution by computing the cosine similarity between the outputs of the pre-trained feature extractor. Our work enables the implementation of DL-based solutions in practice by addressing challenges of data availability and training complexity. Moreover, the ability to dynamically configure per-user rate requirements can be leveraged by higher level resource allocation and scheduling algorithms for greater control over energy efficiency, spectral efficiency and fairness.
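
The proposed data augmentation reduces to a nearest-distribution search: embed candidate samples with the pre-trained feature extractor, summarize a few target-site samples the same way, and keep the candidates with the highest cosine similarity. A NumPy sketch of that selection (the feature extractor is stubbed with a fixed random projection, and matching against the mean target feature is one simple instantiation of "similar to the target distribution"):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(128, 32))                 # stand-in for the frozen feature extractor

    def features(x):
        return np.tanh(x @ W)                       # placeholder embedding

    def select_similar(candidates, target_samples, budget=100):
        """Keep the candidate samples whose features best match the target site."""
        f_cand = features(candidates)
        f_tgt = features(target_samples).mean(axis=0)   # summary of target distribution
        cos = (f_cand @ f_tgt) / (
            np.linalg.norm(f_cand, axis=1) * np.linalg.norm(f_tgt) + 1e-9)
        keep = np.argsort(-cos)[:budget]
        return candidates[keep], cos[keep]

    pool = rng.normal(size=(5000, 128))             # large generic training pool
    site = rng.normal(loc=0.3, size=(50, 128))      # scarce deployment-site data
    aug, scores = select_similar(pool, site)
    print(aug.shape, "selected; top cosine similarity:", round(float(scores[0]), 3))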

Updated: 2025-07-24 17:10:06

标题: 一个适用于大规模MIMO预编码的基础模型,具有自适应的每用户速率-功率权衡

摘要: 深度学习(DL)已经成为大规模多输入多输出(mMIMO)系统中预编码的解决方案,因为它具有学习传播环境特性的能力。然而,训练这样的模型需要在部署现场收集高质量的本地数据集,这通常很难实现。我们提出了一种基于变压器的mMIMO预编码基础模型,旨在最小化发射机的能耗,同时动态适应每个用户的速率要求。在相同的能耗下,提出的基础模型的零次部署明显优于零强制,并且在复杂度降低8倍的情况下接近加权最小均方误差性能。为了解决数据稀缺环境下的模型适应问题,我们引入了一种数据增强方法,通过计算预训练特征提取器的输出之间的余弦相似度来找到与目标分布相似的训练样本。我们的工作通过解决数据可用性和训练复杂性的挑战,使DL-based解决方案在实践中得以实现。此外,动态配置每个用户的速率要求的能力可以通过更高级别的资源分配和调度算法,实现对能源效率、频谱效率和公平性的更大控制。

更新时间: 2025-07-24 17:10:06

领域: eess.SP,cs.AI

下载: http://arxiv.org/abs/2507.18587v1

AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs

Despite the impressive performance of large language models (LLMs) in general domains, they often underperform in specialized domains. Existing approaches typically rely on data synthesis methods and yield promising results by using unlabeled data to capture domain-specific features. However, these methods either incur high computational costs or suffer from performance limitations, while also demonstrating insufficient generalization across different tasks. To address these challenges, we propose AQuilt, a framework for constructing instruction-tuning data for any specialized domains from corresponding unlabeled data, including Answer, Question, Unlabeled data, Inspection, Logic, and Task type. By incorporating logic and inspection, we encourage reasoning processes and self-inspection to enhance model performance. Moreover, customizable task instructions enable high-quality data generation for any task. As a result, we construct a dataset of 703k examples to train a powerful data synthesis model. Experiments show that AQuilt is comparable to DeepSeek-V3 while utilizing just 17% of the production cost. Further analysis demonstrates that our generated data exhibits higher relevance to downstream tasks. Source code, models, and scripts are available at https://github.com/Krueske/AQuilt.
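
The framework's name spells out its record schema, which can be sketched as a simple data structure with self-inspection acting as a quality gate (field names follow the abstract; the extraction, logic, and inspection stubs below are placeholders, not AQuilt's actual models):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class AquiltRecord:
        answer: str          # A: extracted or synthesized answer
        question: str        # Q: instruction generated for the answer
        unlabeled: str       # U: source passage from the specialist corpus
        inspection: str      # I: self-inspection verdict with rationale
        logic: str           # L: reasoning chain linking question to answer
        task_type: str       # T: customizable task instruction category

    def synthesize(passage: str) -> Optional[AquiltRecord]:
        answer = passage.split(".")[0]                 # stub answer extraction
        question = f"Based on the source, explain: {answer}?"
        logic = "step 1: locate claim; step 2: restate as answer"
        verdict = "pass" if answer else "fail"         # stub self-inspection
        if verdict != "pass":
            return None                                # drop low-quality samples
        return AquiltRecord(answer, question, passage, verdict, logic, "explanation")

    rec = synthesize("Basel III raises minimum capital ratios. It phases in by 2028.")
    print(rec)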

Updated: 2025-07-24 17:03:27

标题: AQuilt:将逻辑与自我检查编织到低成本、高相关性数据合成中,用于专业LLMs

摘要: 尽管大型语言模型(LLMs)在一般领域表现出色,但它们在专业领域通常表现不佳。现有方法通常依赖于数据合成方法,并通过使用无标签数据捕获领域特定特征来产生有希望的结果。然而,这些方法要么需要高昂的计算成本,要么受到性能限制,同时在不同任务之间也表现出不足的泛化能力。为了解决这些挑战,我们提出了AQuilt,这是一个从相应的无标签数据中构建专业领域指导调整数据的框架,包括答案、问题、无标签数据、检查、逻辑和任务类型。通过融入逻辑和检查,我们鼓励推理过程和自我检查以增强模型性能。此外,可定制的任务说明使得任何任务都可以生成高质量的数据。因此,我们构建了一个包含703k个示例的数据集,用于训练一个强大的数据合成模型。实验表明,AQuilt与DeepSeek-V3相当,但仅利用了生产成本的17%。进一步的分析表明,我们生成的数据与下游任务的相关性更高。源代码、模型和脚本可在https://github.com/Krueske/AQuilt 上找到。

更新时间: 2025-07-24 17:03:27

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18584v1

DR.EHR: Dense Retrieval for Electronic Health Record with Knowledge Injection and Synthetic Data

Electronic Health Records (EHRs) are pivotal in clinical practices, yet their retrieval remains a challenge mainly due to semantic gap issues. Recent advancements in dense retrieval offer promising solutions but existing models, both general-domain and biomedical-domain, fall short due to insufficient medical knowledge or mismatched training corpora. This paper introduces \texttt{DR.EHR}, a series of dense retrieval models specifically tailored for EHR retrieval. We propose a two-stage training pipeline utilizing MIMIC-IV discharge summaries to address the need for extensive medical knowledge and large-scale training data. The first stage involves medical entity extraction and knowledge injection from a biomedical knowledge graph, while the second stage employs large language models to generate diverse training data. We train two variants of \texttt{DR.EHR}, with 110M and 7B parameters, respectively. Evaluated on the CliniQ benchmark, our models significantly outperforms all existing dense retrievers, achieving state-of-the-art results. Detailed analyses confirm our models' superiority across various match and query types, particularly in challenging semantic matches like implication and abbreviation. Ablation studies validate the effectiveness of each pipeline component, and supplementary experiments on EHR QA datasets demonstrate the models' generalizability on natural language questions, including complex ones with multiple entities. This work significantly advances EHR retrieval, offering a robust solution for clinical applications.
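
Dense retrieval itself reduces to encoding queries and records into a shared vector space and ranking by inner product. A hedged NumPy sketch of that scoring step (random token embeddings with mean pooling stand in for the trained \texttt{DR.EHR} encoders; the knowledge-injection and data-synthesis stages are not shown):

    import numpy as np

    rng = np.random.default_rng(0)
    vocab, dim = 5000, 64
    E = rng.normal(scale=0.1, size=(vocab, dim))     # stand-in token embeddings

    def encode(token_ids):
        """Mean-pooled, normalized embedding; a placeholder for the trained encoder."""
        v = E[token_ids].mean(axis=0)
        return v / (np.linalg.norm(v) + 1e-9)

    def rank_records(query_ids, record_ids_list, top_k=3):
        q = encode(query_ids)
        D = np.stack([encode(r) for r in record_ids_list])
        scores = D @ q                               # cosine similarity after normalization
        order = np.argsort(-scores)[:top_k]
        return [(int(i), float(scores[i])) for i in order]

    records = [rng.integers(0, vocab, size=50) for _ in range(100)]  # toy EHR notes
    query = rng.integers(0, vocab, size=8)
    print(rank_records(query, records))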

Updated: 2025-07-24 17:02:46

标题: DR.EHR: 密集检索用于具有知识注入和合成数据的电子健康记录

摘要: 电子健康记录(EHR)在临床实践中至关重要,但由于语义差距问题,它们的检索仍然是一个挑战。最近在密集检索方面取得的进展提供了有希望的解决方案,但现有模型,无论是通用领域还是生物医学领域,由于医学知识不足或训练语料库不匹配而存在不足之处。本文介绍了一个专门为EHR检索量身定制的密集检索模型系列\texttt{DR.EHR}。我们提出了一个两阶段训练流程,利用MIMIC-IV出院摘要来满足对广泛医学知识和大规模训练数据的需求。第一阶段涉及从生物医学知识图谱中提取医学实体和注入知识,而第二阶段则利用大型语言模型生成多样化的训练数据。我们训练了两个\texttt{DR.EHR}变体,分别具有110M和7B参数。在CliniQ基准测试中评估,我们的模型明显优于所有现有的密集检索器,实现了最先进的结果。详细分析证实了我们的模型在各种匹配和查询类型中的优越性,特别是在挑战性语义匹配(如含义和缩写)方面。消融研究验证了每个流程组件的有效性,并在EHR QA数据集上进行的补充实验展示了模型在自然语言问题上的泛化能力,包括涉及多个实体的复杂问题。这项工作显著推进了EHR检索,为临床应用提供了强大的解决方案。

更新时间: 2025-07-24 17:02:46

领域: cs.IR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.18583v1

On the Convergence of Gradient Descent on Learning Transformers with Residual Connections

Transformer models have emerged as fundamental tools across various scientific and engineering disciplines, owing to their outstanding performance in diverse applications. Despite this empirical success, the theoretical foundations of Transformers remain relatively underdeveloped, particularly in understanding their training dynamics. Existing research predominantly examines isolated components--such as self-attention mechanisms and feedforward networks--without thoroughly investigating the interdependencies between these components, especially when residual connections are present. In this paper, we aim to bridge this gap by analyzing the convergence behavior of a structurally complete yet single-layer Transformer, comprising self-attention, a feedforward network, and residual connections. We demonstrate that, under appropriate initialization, gradient descent exhibits a linear convergence rate, where the convergence speed is determined by the minimum and maximum singular values of the output matrix from the attention layer. Moreover, our analysis reveals that residual connections serve to ameliorate the ill-conditioning of this output matrix, an issue stemming from the low-rank structure imposed by the softmax operation, thereby promoting enhanced optimization stability. We also extend our theoretical findings to a multi-layer Transformer architecture, confirming the linear convergence rate of gradient descent under suitable initialization. Empirical results corroborate our theoretical insights, illustrating the beneficial role of residual connections in promoting convergence stability.
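
To make the flavor of such a linear rate concrete, write $\sigma_{\min}$ and $\sigma_{\max}$ for the smallest and largest singular values of the attention layer's output matrix. A convergence guarantee of the kind described then takes the schematic form

    $L(\theta_t) - L^{\ast} \le \Big(1 - c\,\frac{\sigma_{\min}^{2}}{\sigma_{\max}^{2}}\Big)^{t}\,\big(L(\theta_0) - L^{\ast}\big),$

where $c$ depends on the step size and initialization. This display is an illustrative template rather than the paper's exact theorem, but it shows why improving the conditioning ratio $\sigma_{\min}/\sigma_{\max}$, which the analysis credits to the residual connections, directly speeds up convergence.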

Updated: 2025-07-24 16:56:37

标题: 关于利用残差连接学习Transformer的梯度下降收敛性

摘要: Transformer模型已经成为各种科学和工程学科中的基本工具,这要归功于它们在各种应用中表现出色。尽管在实证成功方面表现出色,Transformer的理论基础仍相对不够完善,特别是在理解它们的训练动态方面。现有研究主要关注独立的组件,如自注意机制和前馈网络,而没有充分探讨这些组件之间的相互依赖性,尤其是在存在残差连接时。本文旨在通过分析一个结构完整但只有单层的Transformer的收敛行为来弥补这一差距,该Transformer包括自注意、前馈网络和残差连接。我们证明,在适当初始化的情况下,梯度下降表现出线性收敛速度,其中收敛速度由注意力层输出矩阵的最小和最大奇异值决定。此外,我们的分析表明,残差连接有助于改善这个输出矩阵的病态条件,这是由于softmax操作所施加的低秩结构引起的问题,从而促进了优化稳定性的提高。我们还将我们的理论发现扩展到多层Transformer架构,确认在适当初始化下梯度下降的线性收敛速度。经验结果证实了我们的理论见解,说明残差连接在促进收敛稳定性方面发挥了积极的作用。

更新时间: 2025-07-24 16:56:37

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2506.05249v3

RUMI: Rummaging Using Mutual Information

This paper presents Rummaging Using Mutual Information (RUMI), a method for online generation of robot action sequences to gather information about the pose of a known movable object in visually-occluded environments. Focusing on contact-rich rummaging, our approach leverages mutual information between the object pose distribution and robot trajectory for action planning. From an observed partial point cloud, RUMI deduces the compatible object pose distribution and approximates the mutual information of it with workspace occupancy in real time. Based on this, we develop an information gain cost function and a reachability cost function to keep the object within the robot's reach. These are integrated into a model predictive control (MPC) framework with a stochastic dynamics model, updating the pose distribution in a closed loop. Key contributions include a new belief framework for object pose estimation, an efficient information gain computation strategy, and a robust MPC-based control scheme. RUMI demonstrates superior performance in both simulated and real tasks compared to baseline methods.
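
The information-gain term can be made concrete on a discretized pose belief: score a candidate action by the expected reduction in entropy of the pose distribution under the observations that action could produce, which is exactly the mutual information between pose and observation. A small NumPy sketch of that computation (toy contact/no-contact observation model; RUMI approximates the analogous mutual information with workspace occupancy in real time):

    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    def expected_info_gain(belief, likelihood):
        """likelihood[o, s] = P(obs o | pose s) for the candidate action.
        Returns H(belief) - E_o[H(belief | o)], i.e. I(pose; obs)."""
        joint = likelihood * belief[None, :]        # P(o, s)
        p_obs = joint.sum(axis=1)                   # P(o)
        gain = entropy(belief)
        for o in range(likelihood.shape[0]):
            if p_obs[o] > 0:
                posterior = joint[o] / p_obs[o]
                gain -= p_obs[o] * entropy(posterior)
        return gain

    belief = np.array([0.25, 0.25, 0.25, 0.25])     # uniform belief over 4 poses
    informative = np.array([[0.9, 0.1, 0.1, 0.1],   # contact vs. no-contact observation
                            [0.1, 0.9, 0.9, 0.9]])
    uninformative = np.full((2, 4), 0.5)
    print("informative action gain:  ", round(expected_info_gain(belief, informative), 3))
    print("uninformative action gain:", round(expected_info_gain(belief, uninformative), 3))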

Updated: 2025-07-24 16:51:36

标题: RUMI:利用互信息进行翻找

摘要: 本文介绍了一种名为Rummaging Using Mutual Information(RUMI)的方法,用于在线生成机器人动作序列,以收集关于在视觉遮挡环境中已知可移动物体姿势的信息。我们的方法侧重于接触丰富的翻找,利用物体姿势分布和机器人轨迹之间的互信息进行动作规划。从观察到的部分点云中,RUMI推断出兼容的物体姿势分布,并实时估计其与工作空间占用之间的互信息。基于此,我们开发了一个信息增益成本函数和一个可达性成本函数,以保持物体在机器人的触及范围内。这些被整合到一个具有随机动态模型的模型预测控制(MPC)框架中,通过一个闭环更新姿势分布。关键贡献包括一个用于物体姿势估计的新信念框架,一个高效的信息增益计算策略,以及一个稳健的基于MPC的控制方案。与基准方法相比,RUMI在模拟和真实任务中表现出优越性能。

更新时间: 2025-07-24 16:51:36

领域: cs.RO,cs.AI,I.2.9

下载: http://arxiv.org/abs/2408.10450v2

Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments

This paper presents Compliance Brain Assistant (CBA), a conversational, agentic AI assistant designed to boost the efficiency of daily compliance tasks for personnel in enterprise environments. To strike a good balance between response quality and latency, we design a user query router that can intelligently choose between (i) FastTrack mode: to handle simple requests that only need additional relevant context retrieved from knowledge corpora; and (ii) FullAgentic mode: to handle complicated requests that need composite actions and tool invocations to proactively discover context across various compliance artifacts, and/or involving other APIs/models for accommodating requests. A typical example would be to start with a user query, use its description to find a specific entity and then use the entity's information to query other APIs for curating and enriching the final AI response. Our experimental evaluations compared CBA against an out-of-the-box LLM on various real-world privacy/compliance-related queries targeting various personas. We found that CBA substantially improved upon the vanilla LLM's performance on metrics such as average keyword match rate (83.7% vs. 41.7%) and LLM-judge pass rate (82.0% vs. 20.0%). We also compared metrics for the full routing-based design against the `fast-track only` and `full-agentic` modes and found that it had a better average match-rate and pass-rate while keeping the run-time approximately the same. This finding validated our hypothesis that the routing mechanism leads to a good trade-off between the two worlds.
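
The routing idea reduces to a cheap classifier in front of two handlers. A hedged Python sketch of the control flow (the keyword heuristic and handler stubs are illustrative assumptions; CBA's router is itself model-based):

    from dataclasses import dataclass

    @dataclass
    class Route:
        mode: str
        reason: str

    def route_query(query: str) -> Route:
        """Toy stand-in for the query router: multi-step or action-oriented
        requests go FullAgentic; simple lookups go FastTrack."""
        agentic_cues = ("update", "file", "cross-check", "for each", "then", "compare")
        if len(query.split()) > 40 or any(cue in query.lower() for cue in agentic_cues):
            return Route("FullAgentic", "composite actions / tool calls likely needed")
        return Route("FastTrack", "retrieval-augmented answer should suffice")

    def handle(query: str) -> str:
        route = route_query(query)
        if route.mode == "FastTrack":
            context = retrieve_context(query)          # RAG over compliance corpora
            return answer_with_context(query, context)
        plan = plan_tool_calls(query)                  # discover entities, call APIs
        return execute_plan(plan)

    # stubbed dependencies so the sketch runs end to end
    def retrieve_context(q): return ["policy excerpt"]
    def answer_with_context(q, ctx): return f"FastTrack answer using {len(ctx)} snippets"
    def plan_tool_calls(q): return ["find entity", "query privacy API", "compose answer"]
    def execute_plan(plan): return f"FullAgentic answer after {len(plan)} steps"

    print(handle("What is our data retention policy?"))
    print(handle("Find the owning team for dataset X, then cross-check its retention config"))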

Updated: 2025-07-24 16:50:13

Categories: cs.AI

Download: http://arxiv.org/abs/2507.17289v2

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law

We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. It is developed by our proposed SafeLadder framework, which incorporates large-scale, progressive, safety-oriented reinforcement learning post-training, supported by a suite of multi-principled verifiers. Unlike previous alignment methods such as RLHF that simply learn human preferences, SafeLadder enables SafeWork-R1 to develop intrinsic safety reasoning and self-reflection abilities, giving rise to safety `aha' moments. Notably, SafeWork-R1 achieves an average improvement of $46.54\%$ over its base model Qwen2.5-VL-72B on safety-related benchmarks without compromising general capabilities, and delivers state-of-the-art safety performance compared to leading proprietary models such as GPT-4.1 and Claude Opus 4. To further bolster its reliability, we implement two distinct inference-time intervention methods and a deliberative search mechanism, enforcing step-level verification. Finally, we further develop SafeWork-R1-InternVL3-78B, SafeWork-R1-DeepSeek-70B, and SafeWork-R1-Qwen2.5VL-7B. All resulting models demonstrate that safety and capability can co-evolve synergistically, highlighting the generalizability of our framework in building robust, reliable, and trustworthy general-purpose AI.

Updated: 2025-07-24 16:49:19

Categories: cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2507.18576v1

Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning

Large Language Models (LLMs) exhibit considerable promise in financial applications; however, prevailing models frequently demonstrate limitations when confronted with scenarios that necessitate sophisticated reasoning capabilities, stringent trustworthiness criteria, and efficient adaptation to domain-specific requirements. We introduce the Agentar-Fin-R1 series of financial large language models (8B and 32B parameters), specifically engineered based on the Qwen3 foundation model to enhance reasoning capabilities, reliability, and domain specialization for financial applications. Our optimization approach integrates a high-quality, systematic financial task label system with a comprehensive multi-layered trustworthiness assurance framework. This framework encompasses high-quality trustworthy knowledge engineering, multi-agent trustworthy data synthesis, and rigorous data validation governance. Through label-guided automated difficulty-aware optimization, a two-stage training pipeline, and dynamic attribution systems, we achieve substantial improvements in training efficiency. Our models undergo comprehensive evaluation on mainstream financial benchmarks including Fineva, FinEval, and FinanceIQ, as well as general reasoning datasets such as MATH-500 and GPQA-diamond. To thoroughly assess real-world deployment capabilities, we innovatively propose the Finova evaluation benchmark, which focuses on agent-level financial reasoning and compliance verification. Experimental results demonstrate that Agentar-Fin-R1 not only achieves state-of-the-art performance on financial tasks but also exhibits exceptional general reasoning capabilities, validating its effectiveness as a trustworthy solution for high-stakes financial applications. The Finova bench is available at https://github.com/antgroup/Finova.

Updated: 2025-07-24 16:46:58

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.16802v3

PosterMate: Audience-driven Collaborative Persona Agents for Poster Design

Poster designing can benefit from synchronous feedback from target audiences. However, gathering audiences with diverse perspectives and reconciling them on design edits can be challenging. Recent generative AI models present opportunities to simulate human-like interactions, but it is unclear how they may be used for feedback processes in design. We introduce PosterMate, a poster design assistant that facilitates collaboration by creating audience-driven persona agents constructed from marketing documents. PosterMate gathers feedback from each persona agent regarding poster components, and stimulates discussion with the help of a moderator to reach a conclusion. These agreed-upon edits can then be directly integrated into the poster design. Through our user study (N=12), we identified the potential of PosterMate to capture overlooked viewpoints, while serving as an effective prototyping tool. Additionally, our controlled online evaluation (N=100) revealed that the feedback from an individual persona agent is appropriate given its persona identity, and the discussion effectively synthesizes the different persona agents' perspectives.

Updated: 2025-07-24 16:46:25

Categories: cs.HC,cs.AI,cs.CL,H.5.2; I.2.7

Download: http://arxiv.org/abs/2507.18572v1

Proceedings 19th International Workshop on the ACL2 Theorem Prover and Its Applications

The ACL2 Workshop series is the major technical forum for users of the ACL2 theorem proving system to present research related to the ACL2 theorem prover and its applications. ACL2 is an industrial-strength automated reasoning system, the latest in the Boyer-Moore family of theorem provers. The 2005 ACM Software System Award was awarded to Boyer, Kaufmann, and Moore for their work on ACL2 and the other theorem provers in the Boyer-Moore family.

Updated: 2025-07-24 16:42:15

Categories: cs.LO,cs.AI

Download: http://arxiv.org/abs/2507.18567v1

GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation

Multimodal Machine Translation (MMT) has demonstrated the significant help of visual information in machine translation. However, existing MMT methods face challenges in leveraging the modality gap by enforcing rigid visual-linguistic alignment whilst being confined to inference within their trained multimodal domains. In this work, we construct novel multimodal scene graphs to preserve and integrate modality-specific information and introduce GIIFT, a two-stage Graph-guided Inductive Image-Free MMT framework that uses a cross-modal Graph Attention Network adapter to learn multimodal knowledge in a unified fused space and inductively generalize it to broader image-free translation domains. Experimental results on the Multi30K dataset of English-to-French and English-to-German tasks demonstrate that our GIIFT surpasses existing approaches and achieves the state-of-the-art, even without images during inference. Results on the WMT benchmark show significant improvements over the image-free translation baselines, demonstrating the strength of GIIFT towards inductive image-free inference.

Updated: 2025-07-24 16:36:47

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.18562v1

Beyond Internal Data: Constructing Complete Datasets for Fairness Testing

As AI becomes prevalent in high-risk domains and decision-making, it is essential to test for potential harms and biases. This urgency is reflected by the global emergence of AI regulations that emphasise fairness and adequate testing, with some mandating independent bias audits. However, procuring the necessary data for fairness testing remains a significant challenge. Particularly in industry settings, legal and privacy concerns restrict the collection of demographic data required to assess group disparities, and auditors face practical and cultural challenges in gaining access to data. Further, internal historical datasets are often insufficiently representative to identify real-world biases. This work focuses on evaluating classifier fairness when complete datasets including demographics are inaccessible. We propose leveraging separate overlapping datasets to construct complete synthetic data that includes demographic information and accurately reflects the underlying relationships between protected attributes and model features. We validate the fidelity of the synthetic data by comparing it to real data, and empirically demonstrate that fairness metrics derived from testing on such synthetic data are consistent with those obtained from real data. This work, therefore, offers a path to overcome real-world data scarcity for fairness testing, enabling independent, model-agnostic evaluation of fairness, and serving as a viable substitute where real data is limited.
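
A compact sketch of the construction under simplifying assumptions: one dataset links shared features to a protected attribute, another links the same features to model decisions, and a learned attribute model stitches them into a complete synthetic dataset for a group-fairness check. The data-generating process and variable names are illustrative, not the paper's pipeline.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 5
# Dataset A: shared features + protected attribute (no model decisions).
X_a = rng.normal(size=(n, d))
attr_a = (X_a[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
# Dataset B: shared features + model decisions (no demographics).
X_b = rng.normal(size=(n, d))
pred_b = (X_b[:, 0] + X_b[:, 1] > 0).astype(int)     # stand-in classifier decisions

# Learn p(attribute | features) on A, then sample attributes for B to obtain
# a complete synthetic dataset preserving the attribute-feature relationship.
attr_model = LogisticRegression(max_iter=1000).fit(X_a, attr_a)
p = attr_model.predict_proba(X_b)[:, 1]
attr_synth = rng.binomial(1, p)

rates = [pred_b[attr_synth == g].mean() for g in (0, 1)]
print(f"positive rate by group: {rates[0]:.2f} vs {rates[1]:.2f}, "
      f"demographic parity gap: {abs(rates[0] - rates[1]):.2f}")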

Updated: 2025-07-24 16:35:42

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2507.18561v1

HARLF: Hierarchical Reinforcement Learning and Lightweight LLM-Driven Sentiment Integration for Financial Portfolio Optimization

This paper presents a novel hierarchical framework for portfolio optimization, integrating lightweight Large Language Models (LLMs) with Deep Reinforcement Learning (DRL) to combine sentiment signals from financial news with traditional market indicators. Our three-tier architecture employs base RL agents to process hybrid data, meta-agents to aggregate their decisions, and a super-agent to merge decisions based on market data and sentiment analysis. Evaluated on data from 2018 to 2024, after training on 2000-2017, the framework achieves a 26% annualized return and a Sharpe ratio of 1.2, outperforming equal-weighted and S&P 500 benchmarks. Key contributions include scalable cross-modal integration, a hierarchical RL structure for enhanced stability, and open-source reproducibility.
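
A toy sketch of the three-tier aggregation, assuming each agent emits a portfolio weight vector; the random proposals and confidence scores are stand-ins, not the paper's trained agents.

import numpy as np

rng = np.random.default_rng(1)
n_assets, n_base = 5, 4
# Stand-ins: each base RL agent proposes portfolio weights from hybrid inputs
# (news sentiment + market indicators); here they are random placeholders.
base_proposals = rng.dirichlet(np.ones(n_assets), size=n_base)

def aggregate(proposals: np.ndarray, scores: np.ndarray) -> np.ndarray:
    # Confidence-weighted average, renormalized to a valid allocation.
    w = scores @ proposals
    return w / w.sum()

# Meta-agents aggregate base decisions; the super-agent merges the
# sentiment-side and market-side meta decisions. Scores are illustrative.
meta_sentiment = aggregate(base_proposals[:2], np.array([0.6, 0.4]))
meta_market = aggregate(base_proposals[2:], np.array([0.5, 0.5]))
portfolio = aggregate(np.stack([meta_sentiment, meta_market]), np.array([0.5, 0.5]))
print(portfolio.round(3), "sums to", portfolio.sum())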

Updated: 2025-07-24 16:35:24

Categories: q-fin.PM,cs.AI

Download: http://arxiv.org/abs/2507.18560v1

Concept Probing: Where to Find Human-Defined Concepts (Extended Version)

Concept probing has recently gained popularity as a way for humans to peek into what is encoded within artificial neural networks. In concept probing, additional classifiers are trained to map the internal representations of a model into human-defined concepts of interest. However, the performance of these probes is highly dependent on the internal representations they probe from, making identifying the appropriate layer to probe an essential task. In this paper, we propose a method to automatically identify which layer's representations in a neural network model should be considered when probing for a given human-defined concept of interest, based on how informative and regular the representations are with respect to the concept. We validate our findings through an exhaustive empirical analysis over different neural network models and datasets.
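
A minimal sketch of the probing recipe, assuming per-layer activations are available: fit a linear probe per layer and select the layer whose representations are most decodable for the concept. The synthetic activations below are placeholders for a real model's internal representations.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
N, D, n_layers = 400, 32, 4
concept = rng.integers(0, 2, size=N)          # human-defined binary concept
layers = []
for l in range(n_layers):
    signal = (l + 1) / n_layers               # deeper layers more informative here
    acts = rng.normal(size=(N, D))
    acts[:, 0] += signal * (2 * concept - 1)
    layers.append(acts)

# Probe each layer with a linear classifier; keep the most decodable layer.
scores = [cross_val_score(LogisticRegression(max_iter=1000), X, concept, cv=5).mean()
          for X in layers]
best = int(np.argmax(scores))
print([f"{s:.2f}" for s in scores], "-> probe layer", best)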

Updated: 2025-07-24 16:30:10

Categories: cs.LG,cs.AI,cs.CV,cs.NE

Download: http://arxiv.org/abs/2507.18681v1

Neural Tangent Kernels and Fisher Information Matrices for Simple ReLU Networks with Random Hidden Weights

We study Fisher information matrices and neural tangent kernels (NTK) for 2-layer ReLU networks with random hidden weights. We discuss the relation between the two notions as a linear transformation and derive the spectral decomposition of the NTK, giving concrete forms for the eigenfunctions associated with the major eigenvalues. We also obtain an approximation formula for the functions represented by 2-layer neural networks.
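
For concreteness, a small sketch of the empirical NTK of a width-m 2-layer ReLU network with random hidden weights, computed as inner products of parameter gradients; the setup is generic, not the paper's specific construction.

import numpy as np

rng = np.random.default_rng(0)
m, d, n = 512, 8, 6                      # hidden width, input dim, # of inputs
W = rng.normal(size=(m, d))              # random hidden weights
a = rng.normal(size=m)                   # output weights
X = rng.normal(size=(n, d))

def param_grad(x):
    # f(x) = a @ relu(W x) / sqrt(m); flatten the gradient over (W, a).
    pre = W @ x
    act = np.maximum(pre, 0.0)
    gate = (pre > 0).astype(float)
    dW = np.outer(a * gate, x) / np.sqrt(m)
    da = act / np.sqrt(m)
    return np.concatenate([dW.ravel(), da])

G = np.stack([param_grad(x) for x in X])
ntk = G @ G.T                            # NTK(x_i, x_j) = <grad f(x_i), grad f(x_j)>
eigvals = np.linalg.eigvalsh(ntk)[::-1]
print("leading NTK eigenvalues:", eigvals[:3].round(3))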

Updated: 2025-07-24 16:26:52

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.18555v1

Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards

The advancement of general-purpose artificial intelligence relies on large language models (LLMs) that excel across a wide range of tasks, from structured reasoning to creative generation. However, post-training methods like Supervised Fine-Tuning (SFT) often struggle with generalization, favoring memorization over transferable learning. In this work, we introduce Omni-Thinker, a unified reinforcement learning (RL) framework that enhances LLM performance across diverse tasks by combining rule-based verifiable rewards with generative preference signals via LLM-as-a-Judge evaluations. Our approach enables consistent optimization across task types and scales RL-based training to subjective domains. We further investigate training strategies, demonstrating that a curriculum-based progression that orders tasks from structured to open-ended improves performance and reduces forgetting. Experimental results across four domains reveal that curriculum learning improves performance by 5.2% over joint training and 9.1% over model merging. These results highlight the importance of task-aware sampling and hybrid supervision in scaling RL-based post-training for general-purpose LLMs.
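
A schematic sketch of the hybrid-reward curriculum: tasks ordered from structured to open-ended, with a rule-based verifiable reward for checkable tasks and a judge-style score for subjective ones. Task names and reward stubs are assumptions, not the paper's setup.

import random

random.seed(0)
# Structured tasks first, open-ended last; names and ordering are illustrative.
CURRICULUM = ["math", "code", "summarization", "creative-writing"]
VERIFIABLE = {"math", "code"}

def rule_reward(output: str) -> float:
    # Stand-in for a rule-based verifiable reward (unit tests, exact match, etc.).
    return float(output.endswith("ok"))

def judge_reward(output: str) -> float:
    # Stand-in for an LLM-as-a-Judge preference score in [0, 1].
    return random.random()

for stage, task in enumerate(CURRICULUM):
    output = "sample rollout ok"
    # Hybrid reward: verifiable rule where possible, generative judge otherwise.
    r = rule_reward(output) if task in VERIFIABLE else judge_reward(output)
    print(f"stage {stage}: RL update on {task:16s} reward={r:.2f}")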

Updated: 2025-07-24 16:25:54

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.14783v2

LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important

The increasing size of the Key-Value (KV) cache during long-context inference with Large Language Models is the main obstacle to balancing deployment cost against task accuracy. To reduce the KV cache size in such scenarios, most previous efforts leveraged the attention weights to evict non-critical cache tokens. But there is a trade-off in those methods: they usually require major modification of the inference infrastructure and incur significant computation overhead. Based on the fact that Large Language Models are autoregressive, we propose LagKV, a KV compression strategy relying only on straightforward comparisons among the KV entries themselves. It is an entirely attention-free method that offers easy integration into mainstream inference platforms and performance comparable to other, more complicated KV compression methods. Results on the RULER benchmark show that our approach outperforms SnapKV and StreamingLLM at different compression ratios. In particular, in the 64-digit passkey retrieval task, our method outperforms the attention-weight-based method $H_2O$ by over $50\%$ at the same compression ratios. Our code is available at https://github.com/AI-Lab-China-Merchants-Bank/LagKV.
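
A heavily simplified, attention-free sketch of lag-relative scoring: each token's KV pair is scored by its deviation from the statistics of a lagging reference window, and low scorers are evicted. The scoring rule and parameters are illustrative, not the paper's exact formula.

import numpy as np

rng = np.random.default_rng(0)
T, d, lag, keep = 256, 64, 32, 64
K = rng.normal(size=(T, d))              # cached keys
V = rng.normal(size=(T, d))              # cached values

scores = np.zeros(T)
for t in range(T):
    ref = slice(max(0, t - lag), t) if t > 0 else slice(0, 1)
    mu_k, mu_v = K[ref].mean(0), V[ref].mean(0)
    # Deviation of this token's KV pair from the lagged reference statistics;
    # no attention weights are consulted anywhere.
    scores[t] = np.linalg.norm(K[t] - mu_k) + np.linalg.norm(V[t] - mu_v)

kept = np.sort(np.argsort(scores)[-keep:])   # evict low-scoring cache tokens
print(f"compressed cache from {T} to {kept.size} tokens")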

Updated: 2025-07-24 16:25:51

Categories: cs.LG,cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2504.04704v2

The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidth is the de facto approach to deploy massive transformers onto more affordable accelerators. GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale. Yet, its inner workings are described as a sequence of ad-hoc algebraic updates that obscure any geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last to first dimension) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs. This equivalence is based on a sophisticated mathematical argument, and has two analytical consequences: (i) the GPTQ error propagation step gains an intuitive geometric interpretation; (ii) GPTQ inherits the error upper bound of Babai's algorithm under the no-clipping condition. Taken together, these results place GPTQ on firm theoretical footing and open the door to importing decades of progress in lattice algorithms towards the design of future quantization algorithms for billion-parameter models.
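
A minimal sketch of Babai's nearest plane algorithm in its QR form, rounding coordinates back-to-front -- the same last-to-first order in which GPTQ quantizes a linear layer's dimensions. The random basis here merely stands in for the lattice defined by the input Hessian.

import numpy as np

def babai_nearest_plane(B: np.ndarray, t: np.ndarray) -> np.ndarray:
    # B: lattice basis (columns); returns integer coefficients c such that
    # B @ c is a lattice point near t, rounding from last dimension to first.
    Q, R = np.linalg.qr(B)
    y = Q.T @ t
    n = B.shape[1]
    c = np.zeros(n)
    for j in range(n - 1, -1, -1):
        c[j] = np.round((y[j] - R[j, j + 1:] @ c[j + 1:]) / R[j, j])
    return c

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
t = rng.normal(size=4)
c = babai_nearest_plane(B, t)
print("coefficients:", c, " residual:", np.linalg.norm(B @ c - t).round(3))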

Updated: 2025-07-24 16:22:18

Categories: cs.LG

Download: http://arxiv.org/abs/2507.18553v1

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative by using forward passes to estimate gradients, but the variance of gradient estimates typically scales linearly with the model's parameter dimension, a significant issue for LLMs. In this paper, we propose the random Subspace Zeroth-order (SubZero) optimization to address the challenges posed by LLMs' high dimensionality. We introduce a low-rank perturbation tailored for LLMs that significantly reduces memory consumption while improving training performance. Additionally, we prove that our gradient estimation closely approximates the backpropagation gradient, exhibits lower variance than traditional ZO methods, and ensures convergence when combined with SGD. Experimental results show that SubZero enhances fine-tuning performance and achieves faster convergence compared to standard ZO approaches like MeZO across various language modeling tasks. Code is available at https://github.com/zimingyy/SubZero.
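
A minimal sketch of zeroth-order estimation in a random subspace on a toy quadratic objective: perturbations live in an r-dimensional random subspace, so the estimator's variance tracks r rather than the full dimension d. This is generic random-subspace ZO, not SubZero's exact low-rank scheme.

import numpy as np

rng = np.random.default_rng(0)
d, r, eps, lr, steps = 200, 20, 1e-3, 1.0, 1000

def loss(theta):
    return 0.5 * float(theta @ theta)    # toy objective; true gradient is theta

theta = rng.normal(size=d)
print("initial loss:", round(loss(theta), 2))
for _ in range(steps):
    # Draw an r-dimensional random orthonormal subspace and perturb only inside
    # it: two forward passes give a directional derivative along the subspace.
    P, _ = np.linalg.qr(rng.normal(size=(d, r)))
    z = rng.normal(size=r)
    delta = P @ z
    ghat = (loss(theta + eps * delta) - loss(theta - eps * delta)) / (2 * eps) * delta
    theta -= (lr / r) * ghat
print("final loss:  ", round(loss(theta), 2))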

Updated: 2025-07-24 16:21:10

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.08989v3

VideoMind: An Omni-Modal Video Dataset with Intent Grounding for Deep-Cognitive Video Understanding

This paper introduces VideoMind, a video-centric omni-modal dataset designed for deep video content cognition and enhanced multi-modal feature representation. The dataset comprises 103K video samples (3K reserved for testing), each paired with audio and systematically detailed textual descriptions. Specifically, every video and its audio is described across three hierarchical layers (factual, abstract, and intent), progressing from surface to depth. It contains over 22 million words, averaging ~225 words per sample. VideoMind's key distinction from existing datasets is its provision of intent expressions, which require contextual integration across the entire video and are not directly observable. These deep-cognitive expressions are generated using a Chain-of-Thought (COT) approach, prompting the mLLM through step-by-step reasoning. Each description includes annotations for subject, place, time, event, action, and intent, supporting downstream recognition tasks. Crucially, we establish a gold-standard benchmark with 3,000 manually validated samples for evaluating deep-cognitive video understanding. We design hybrid-cognitive retrieval experiments, scored by multi-level retrieval metrics, to appropriately assess deep video comprehension. Evaluation results for models (e.g., InternVideo, VAST, UMT-L) are released. VideoMind serves as a powerful benchmark for fine-grained cross-modal alignment and advances fields requiring in-depth video understanding, such as emotion and intent recognition. The data is publicly available on GitHub, HuggingFace, and OpenDataLab, https://github.com/cdx-cindy/VideoMind.

Updated: 2025-07-24 16:19:43

Categories: cs.CV,cs.AI,68T45, 68T50, 68U35, I.4.8; I.2.7; I.2.10; H.5.1

Download: http://arxiv.org/abs/2507.18552v1

On the Performance of Concept Probing: The Influence of the Data (Extended Version)

Concept probing has recently garnered increasing interest as a way to help interpret artificial neural networks, dealing both with their typically large size and their subsymbolic nature, which ultimately renders them unfeasible for direct human interpretation. Concept probing works by training additional classifiers to map the internal representations of a model into human-defined concepts of interest, thus allowing humans to peek inside artificial neural networks. Research on concept probing has mainly focused on the model being probed or the probing model itself, paying limited attention to the data required to train such probing models. In this paper, we address this gap. Focusing on concept probing in the context of image classification tasks, we investigate the effect of the data used to train probing models on their performance. We also make available concept labels for two widely used datasets.

Updated: 2025-07-24 16:18:46

Categories: cs.AI,cs.CV,cs.LG,cs.NE

Download: http://arxiv.org/abs/2507.18550v1

Market Making Strategies with Reinforcement Learning

This thesis presents the results of a comprehensive research project focused on applying Reinforcement Learning (RL) to the problem of market making in financial markets. Market makers (MMs) play a fundamental role in providing liquidity, yet face significant challenges arising from inventory risk, competition, and non-stationary market dynamics. This research explores how RL, particularly Deep Reinforcement Learning (DRL), can be employed to develop autonomous, adaptive, and profitable market making strategies. The study begins by formulating the MM task as a reinforcement learning problem, designing agents capable of operating in both single-agent and multi-agent settings within a simulated financial environment. It then addresses the complex issue of inventory management using two complementary approaches: reward engineering and Multi-Objective Reinforcement Learning (MORL). While the former uses dynamic reward shaping to guide behavior, the latter leverages Pareto front optimization to explicitly balance competing objectives. To address the problem of non-stationarity, the research introduces POW-dTS, a novel policy weighting algorithm based on Discounted Thompson Sampling. This method allows agents to dynamically select and combine pretrained policies, enabling continual adaptation to shifting market conditions. The experimental results demonstrate that the proposed RL-based approaches significantly outperform traditional and baseline algorithmic strategies across various performance metrics. Overall, this research thesis contributes new methodologies and insights for the design of robust, efficient, and adaptive market making agents, reinforcing the potential of RL to transform algorithmic trading in complex financial systems.
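
A minimal discounted Thompson sampling sketch over pretrained policies, with a synthetic regime shift to show why discounting helps under non-stationarity; the reward model and parameters are illustrative, not the thesis's POW-dTS specification.

import numpy as np

rng = np.random.default_rng(0)
n_policies, gamma, T = 3, 0.95, 500
alpha = np.ones(n_policies)              # Beta posterior parameters per policy
beta = np.ones(n_policies)

def pnl(policy: int, t: int) -> float:
    # Stand-in market: the best policy shifts mid-stream (non-stationarity).
    best = 0 if t < T // 2 else 2
    return float(rng.random() < (0.7 if policy == best else 0.4))

picks = []
for t in range(T):
    theta = rng.beta(alpha, beta)        # one posterior sample per policy
    k = int(np.argmax(theta))            # pick the policy with the best sample
    r = pnl(k, t)
    # Discounting decays old evidence (with a prior floor) so the agent can
    # track regime shifts instead of being anchored to stale statistics.
    alpha = np.maximum(gamma * alpha, 1.0)
    beta = np.maximum(gamma * beta, 1.0)
    alpha[k] += r
    beta[k] += 1 - r
    picks.append(k)
print("policy chosen most in 1st half:", np.bincount(picks[:T // 2]).argmax(),
      "| 2nd half:", np.bincount(picks[T // 2:]).argmax())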

Updated: 2025-07-24 16:17:49

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.18680v1

The Price equation reveals a universal force-metric-bias law of algorithmic learning and natural selection

Diverse learning algorithms, optimization methods, and natural selection share a common mathematical structure, despite their apparent differences. Here I show that a simple notational partitioning of change by the Price equation reveals a universal force-metric-bias (FMB) law: $\Delta\mathbf{\theta} = \mathbf{M}\,\mathbf{f} + \mathbf{b} + \mathbf{\xi}$. The force $\mathbf{f}$ drives improvement in parameters, $\Delta\mathbf{\theta}$, through the covariance between the parameters and performance. The metric $\mathbf{M}$ rescales movement by inverse curvature. The bias $\mathbf{b}$ adds momentum or changes in the frame of reference. The noise $\mathbf{\xi}$ enables exploration. This framework unifies natural selection, Bayesian updating, Newton's method, stochastic gradient descent, stochastic Langevin dynamics, Adam optimization, and most other algorithms as special cases of the same underlying process. The Price equation also reveals why Fisher information, Kullback-Leibler divergence, and d'Alembert's principle arise naturally in learning dynamics. By exposing this common structure, the FMB law provides a principled foundation for understanding, comparing, and designing learning algorithms across disciplines.
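
A tiny numerical illustration of the FMB law on a quadratic objective, recovering SGD ($\mathbf{M} = \eta\mathbf{I}$) and Newton's method ($\mathbf{M}$ = inverse curvature) as special cases; the objective is illustrative.

import numpy as np

def fmb_step(theta, f, M, b=None, xi=None):
    # One step of the FMB update: Delta theta = M f + b + xi.
    b = np.zeros_like(theta) if b is None else b
    xi = np.zeros_like(theta) if xi is None else xi
    return theta + M @ f + b + xi

A = np.diag([1.0, 10.0])                 # curvature of 0.5 * theta^T A theta
theta = np.array([1.0, 1.0])
grad = lambda th: -A @ th                # force: the downhill direction

sgd = fmb_step(theta, grad(theta), M=0.05 * np.eye(2))       # M = lr * I
newton = fmb_step(theta, grad(theta), M=np.linalg.inv(A))    # M = inverse curvature
print("SGD step:   ", sgd)
print("Newton step:", newton)            # jumps straight to the optimum at 0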

Updated: 2025-07-24 16:13:56

Categories: cs.LG,q-bio.PE

Download: http://arxiv.org/abs/2507.18549v1

GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface

Information extraction (IE) is fundamental to numerous NLP applications, yet existing solutions often require specialized models for different tasks or rely on computationally expensive large language models. We present GLiNER2, a unified framework that enhances the original GLiNER architecture to support named entity recognition, text classification, and hierarchical structured data extraction within a single efficient model. Built on a pretrained transformer encoder architecture, GLiNER2 maintains CPU efficiency and compact size while introducing multi-task composition through an intuitive schema-based interface. Our experiments demonstrate competitive performance across extraction and classification tasks with substantial improvements in deployment accessibility compared to LLM-based alternatives. We release GLiNER2 as an open-source pip-installable library with pre-trained models and documentation at https://github.com/fastino-ai/GLiNER2.
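
A hypothetical sketch of what a schema-driven multi-task request can look like; this is NOT the actual GLiNER2 API (consult the linked repository for the real interface), only an illustration of one schema serving entities, classification, and structured extraction at once.

# Hypothetical schema describing all three task types in one request.
schema = {
    "entities": ["person", "organization", "date"],
    "classification": {"sentiment": ["positive", "negative", "neutral"]},
    "structure": {"contract": {"party": "str", "effective_date": "str"}},
}

def extract(text: str, schema: dict) -> dict:
    # Stand-in for a single forward pass serving every task in the schema.
    return {task: f"<{task} predictions for: {text[:30]}...>" for task in schema}

print(extract("Acme Corp signed the agreement with Jane Doe on 2025-07-24.", schema))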

Updated: 2025-07-24 16:11:14

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.18546v1

Learning Gentle Grasping Using Vision, Sound, and Touch

In our daily life, we often encounter objects that are fragile and can be damaged by excessive grasping force, such as fruits. For these objects, it is paramount to grasp gently -- not using the maximum amount of force possible, but rather the minimum amount of force necessary. This paper proposes using visual, tactile, and auditory signals to learn to grasp and regrasp objects stably and gently. Specifically, we use audio signals as an indicator of gentleness during the grasping, and then train an end-to-end action-conditional model from raw visuo-tactile inputs that predicts both the stability and the gentleness of future grasping candidates, thus allowing the selection and execution of the most promising action. Experimental results on a multi-fingered hand over 1,500 grasping trials demonstrated that our model is useful for gentle grasping by validating the predictive performance (3.27% higher accuracy than the vision-only variant) and providing interpretations of their behavior. Finally, real-world experiments confirmed that the grasping performance with the trained multi-modal model outperformed other baselines (17% higher rate for stable and gentle grasps than vision-only). Our approach requires neither tactile sensor calibration nor analytical force modeling, drastically reducing the engineering effort to grasp fragile objects. Dataset and videos are available at https://lasr.org/research/gentle-grasping.
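
A toy sketch of the selection step: among candidate regrasp actions, choose the one maximizing the product of predicted stability and gentleness. The predictor below is a crude stand-in for the learned action-conditional visuo-tactile model.

import numpy as np

rng = np.random.default_rng(0)
# Candidate regrasp actions, e.g., (dx, dy, grip-force delta); illustrative.
candidates = rng.uniform(-1, 1, size=(16, 3))

def predict(action: np.ndarray) -> tuple[float, float]:
    # Stand-in predictions: more force tends to be stabler but less gentle.
    p_stable = 1 / (1 + np.exp(-action[2] * 3))
    p_gentle = 1 / (1 + np.exp(action[2] * 3))
    return p_stable, p_gentle

# Execute the candidate whose predicted stability and gentleness are jointly highest.
scores = [np.prod(predict(a)) for a in candidates]
best = candidates[int(np.argmax(scores))]
print("chosen action:", best.round(2), "score:", max(scores).round(3))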

Updated: 2025-07-24 16:08:12

Categories: cs.RO,cs.CV,cs.LG

Download: http://arxiv.org/abs/2503.07926v2

Deep Variational Free Energy Calculation of Hydrogen Hugoniot

We develop a deep variational free energy framework to compute the equation of state of hydrogen in the warm dense matter region. This method parameterizes the variational density matrix of hydrogen nuclei and electrons at finite temperature using three deep generative models: a normalizing flow model that represents the Boltzmann distribution of the classical nuclei, an autoregressive transformer that models the distribution of electrons in excited states, and a permutational equivariant flow model that constructs backflow coordinates for electrons in Hartree-Fock orbitals. By jointly optimizing the three neural networks to minimize the variational free energy, we obtain the equation of state and related thermodynamic properties of dense hydrogen. We compare our results with other theoretical and experimental results on the deuterium Hugoniot curve, aiming to resolve existing discrepancies. The calculated results provide a valuable benchmark for deuterium in the warm dense matter region.

Updated: 2025-07-24 16:07:13

Categories: cond-mat.str-el,cs.LG,physics.comp-ph

Download: http://arxiv.org/abs/2507.18540v1

AI/ML Life Cycle Management for Interoperable AI Native RAN

Artificial intelligence (AI) and machine learning (ML) models are rapidly permeating the 5G Radio Access Network (RAN), powering beam management, channel state information (CSI) feedback, positioning, and mobility prediction. However, without a standardized life-cycle management (LCM) framework, challenges, such as model drift, vendor lock-in, and limited transparency, hinder large-scale adoption. 3GPP Releases 16-20 progressively evolve AI/ML from experimental features to managed, interoperable network functions. Beginning with the Network Data Analytics Function (NWDAF) in Rel-16, subsequent releases introduced standardized interfaces for model transfer, execution, performance monitoring, and closed-loop control, culminating in Rel-20's two-sided CSI-compression Work Item and vendor-agnostic LCM profile. This article reviews the resulting five-block LCM architecture, KPI-driven monitoring mechanisms, and inter-vendor collaboration schemes, while identifying open challenges in resource-efficient monitoring, environment drift detection, intelligent decision-making, and flexible model training. These developments lay the foundation for AI-native transceivers as a key enabler for 6G.
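
As a cartoon of what KPI-driven closed-loop management means in practice, here is a hypothetical monitoring step; the KPI ratio, threshold, and fallback action are illustrative assumptions, not 3GPP-specified behavior.

```python
# Illustrative KPI-driven life-cycle step: monitor a deployed model's KPI,
# detect drift against a baseline, and fall back / retrigger training.
from dataclasses import dataclass

@dataclass
class LCMPolicy:
    kpi_floor: float = 0.9      # assumed acceptable KPI ratio vs. baseline
    window: int = 100

def lcm_step(kpi_history, baseline_kpi, policy=LCMPolicy()):
    recent = kpi_history[-policy.window:]
    ratio = (sum(recent) / len(recent)) / baseline_kpi
    if ratio < policy.kpi_floor:
        return "fallback_and_retrain"   # e.g., revert to a non-AI scheme
    return "keep_serving"

print(lcm_step([0.8] * 100, baseline_kpi=1.0))  # -> fallback_and_retrain
```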

Updated: 2025-07-24 16:04:59

Domains: cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2507.18538v1

External Knowledge Injection for CLIP-Based Class-Incremental Learning

Class-Incremental Learning (CIL) enables learning systems to continuously adapt to evolving data streams. With the advancement of pre-training, leveraging pre-trained vision-language models (e.g., CLIP) offers a promising starting point for CIL. However, CLIP makes decisions by matching visual embeddings to class names, overlooking the rich contextual information conveyed through language. For instance, the concept of ``cat'' can be decomposed into features like tail, fur, and face for recognition. Besides, since the model is continually updated, these detailed features are overwritten in CIL, requiring external knowledge for compensation. In this paper, we introduce ExterNal knowledGe INjEction (ENGINE) for CLIP-based CIL. To enhance knowledge transfer from outside the dataset, we propose a dual-branch injection tuning framework that encodes informative knowledge from both visual and textual modalities. The visual branch is enhanced with data augmentation to enrich the visual features, while the textual branch leverages GPT-4 to rewrite discriminative descriptors. In addition to this on-the-fly knowledge injection, we also implement post-tuning knowledge by re-ranking the prediction results during inference. With the injected knowledge, the model can better capture informative features for downstream tasks as data evolves. Extensive experiments demonstrate the state-of-the-art performance of ENGINE. Code is available at: https://github.com/LAMDA-CL/ICCV25-ENGINE
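
A rough sketch of the inference-time re-ranking idea: fuse the model's class scores with similarities to externally generated textual descriptors. The fusion weight, tensor shapes, and scoring functions below are assumptions, not ENGINE's exact procedure.

```python
# Sketch of inference-time re-ranking: fuse base class scores with similarity
# to external descriptor embeddings (e.g., from rewritten text descriptors).
import torch
import torch.nn.functional as F

def rerank(image_emb, class_protos, descriptor_embs, alpha=0.5):
    # image_emb: (d,); class_protos: (C, d); descriptor_embs: (C, K, d)
    base = F.cosine_similarity(image_emb, class_protos, dim=-1)              # (C,)
    desc = F.cosine_similarity(image_emb[None, None, :],
                               descriptor_embs, dim=-1).mean(dim=-1)         # (C,)
    return alpha * base + (1 - alpha) * desc   # re-ranked class scores

scores = rerank(torch.randn(512), torch.randn(10, 512), torch.randn(10, 4, 512))
pred = scores.argmax().item()
```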

Updated: 2025-07-24 16:04:36

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2503.08510v2

Elucidating the Design Space of Arbitrary-Noise-Based Diffusion Models

EDM elucidates the unified design space of diffusion models, yet its fixed noise pattern, restricted to pure Gaussian noise, limits advancements in image restoration. Our study indicates that forcibly injecting Gaussian noise corrupts the degraded images, overextends the image transformation distance, and increases restoration complexity. To address this problem, our proposed EDA Elucidates the Design space of Arbitrary-noise-based diffusion models. Theoretically, EDA expands the freedom of noise patterns while preserving the original module flexibility of EDM, with rigorous proof that increased noise complexity incurs no additional computational overhead during restoration. EDA is validated on three typical tasks: MRI bias field correction (global smooth noise), CT metal artifact reduction (global sharp noise), and natural image shadow removal (local boundary-aware noise). With only 5 sampling steps, EDA outperforms most task-specific methods and achieves state-of-the-art performance in bias field correction and shadow removal.
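
A minimal sketch of the underlying idea of replacing Gaussian corruption with an arbitrary, task-matched noise source in the forward process; the linear schedule and the smooth low-frequency noise (reminiscent of an MRI bias field) are assumptions, not EDA's exact formulation.

```python
# Sketch: a diffusion-style forward corruption with a pluggable noise source.
# Here a smooth low-frequency field replaces i.i.d. Gaussian noise.
import torch
import torch.nn.functional as F

def smooth_noise(shape):
    coarse = torch.randn(shape[0], 1, 4, 4)         # coarse Gaussian grid
    return F.interpolate(coarse, size=shape[-2:], mode="bilinear",
                         align_corners=False)       # upsample -> smooth field

def forward_corrupt(x0, t, sigma_max=1.0):
    sigma_t = sigma_max * t                         # assumed schedule, t in [0, 1]
    return x0 + sigma_t * smooth_noise(x0.shape)

x0 = torch.rand(8, 1, 64, 64)                       # clean single-channel images
xt = forward_corrupt(x0, t=0.5)
```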

Updated: 2025-07-24 16:01:34

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.18534v1

C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation

We introduce C2G-KD, a data-free knowledge distillation framework where a class-conditional generator is trained to produce synthetic samples guided by a frozen teacher model and geometric constraints derived from PCA. The generator never observes real training data but instead learns to activate the teacher's output through a combination of semantic and structural losses. By constraining generated samples to lie within class-specific PCA subspaces estimated from as few as two real examples per class, we preserve topological consistency and diversity. Experiments on MNIST show that even minimal class structure is sufficient to bootstrap useful synthetic training pipelines.
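
The PCA-subspace constraint is easy to illustrate in isolation: estimate a class subspace from two real examples and project generator outputs onto it. A minimal numpy sketch, with dimensions chosen arbitrarily:

```python
# Sketch: constrain synthetic samples to a class-specific PCA subspace
# estimated from as few as two real examples per class.
import numpy as np

def class_subspace(examples, k=1):
    """examples: (n, d) real samples of one class; returns mean and top-k PCs."""
    mean = examples.mean(axis=0)
    _, _, vt = np.linalg.svd(examples - mean, full_matrices=False)
    return mean, vt[:k]                      # (d,), (k, d)

def project_to_subspace(x, mean, basis):
    return mean + (x - mean) @ basis.T @ basis

rng = np.random.default_rng(0)
two_real = rng.normal(size=(2, 784))         # two MNIST-sized examples
mean, basis = class_subspace(two_real)
generated = rng.normal(size=(16, 784))       # raw generator outputs
constrained = project_to_subspace(generated, mean, basis)
```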

Updated: 2025-07-24 16:00:32

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.18533v1

Machine Learning Solutions Integrated in an IoT Healthcare Platform for Heart Failure Risk Stratification

The management of chronic Heart Failure (HF) presents significant challenges in modern healthcare, requiring continuous monitoring, early detection of exacerbations, and personalized treatment strategies. In this paper, we present a predictive model founded on Machine Learning (ML) techniques to identify patients at HF risk. This model is an ensemble learning approach, a modified stacking technique, that uses two specialized models leveraging clinical and echocardiographic features and then a meta-model to combine the predictions of these two models. We initially assess the model on a real dataset, and the obtained results suggest that it performs well in the stratification of patients at HF risk. Specifically, we obtained high sensitivity (95%), ensuring that nearly all high-risk patients are identified. As for accuracy, we obtained 84%, which can be considered moderate in some ML contexts. However, it is acceptable given our priority of identifying patients at risk of HF, because they will be asked to participate in the telemonitoring program of the PrediHealth research project on which some of the authors of this paper are working. The initial findings also suggest that ML-based risk stratification models can serve as valuable decision-support tools not only in the PrediHealth project but also for healthcare professionals, aiding in early intervention and personalized patient management. To better understand the value and potential of our predictive model, we also contrasted its results with those obtained by using three baseline models. The preliminary results indicate that our predictive model outperforms these baselines, which consider features flatly, i.e., without grouping them into clinical and echocardiographic features.
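
A compact sketch of a modified stacking setup of this kind, with one base model per feature group and a meta-model over their out-of-fold probabilities; the feature split, model choices, and synthetic data are assumptions:

```python
# Sketch: stack two specialized models (clinical vs. echocardiographic
# features) under a logistic-regression meta-model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X_clin, X_echo = rng.normal(size=(200, 8)), rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

m_clin = RandomForestClassifier(random_state=0)
m_echo = RandomForestClassifier(random_state=0)

# Out-of-fold predictions avoid leaking labels into the meta-model.
p_clin = cross_val_predict(m_clin, X_clin, y, cv=5, method="predict_proba")[:, 1]
p_echo = cross_val_predict(m_echo, X_echo, y, cv=5, method="predict_proba")[:, 1]

meta = LogisticRegression().fit(np.column_stack([p_clin, p_echo]), y)
```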

Updated: 2025-07-24 15:55:05

Domains: stat.OT,cs.AI

Download: http://arxiv.org/abs/2505.09619v4

Diffuse and Disperse: Image Generation with Representation Regularization

The development of diffusion-based generative models over the past decade has largely proceeded independently of progress in representation learning. These diffusion models typically rely on regression-based objectives and generally lack explicit regularization. In this work, we propose \textit{Dispersive Loss}, a simple plug-and-play regularizer that effectively improves diffusion-based generative models. Our loss function encourages internal representations to disperse in the hidden space, analogous to contrastive self-supervised learning, with the key distinction that it requires no positive sample pairs and therefore does not interfere with the sampling process used for regression. Compared to the recent method of representation alignment (REPA), our approach is self-contained and minimalist, requiring no pre-training, no additional parameters, and no external data. We evaluate Dispersive Loss on the ImageNet dataset across a range of models and report consistent improvements over widely used and strong baselines. We hope our work will help bridge the gap between generative modeling and representation learning.
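
One plausible instantiation of such a regularizer (the paper's exact form may differ) is an InfoNCE-style repulsion over in-batch hidden states with no positive pairs:

```python
# Sketch: a dispersive regularizer pushing a batch of hidden representations
# apart; contrastive-style repulsion with no positive pairs.
import torch

def dispersive_loss(h, tau=0.5):
    d2 = torch.cdist(h, h).pow(2)                   # pairwise squared distances
    off_diag = ~torch.eye(len(h), dtype=torch.bool, device=h.device)
    # log-mean-exp of -d^2/tau: small when representations are well spread
    return (torch.logsumexp(-d2[off_diag] / tau, dim=0)
            - torch.log(off_diag.sum().float()))

h = torch.randn(64, 128, requires_grad=True)        # intermediate representations
dispersive_loss(h).backward()                       # add to the denoising loss
```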

Updated: 2025-07-24 15:55:00

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2506.09027v2

Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench

Large Language Models (LLMs) and their agentic frameworks are increasingly adopted to automate software development tasks such as issue resolution and program repair. While prior work has identified security risks in LLM-generated code, most evaluations have focused on synthetic or isolated settings, leaving open questions about the security of these systems in real-world development contexts. In this study, we present the first large-scale security analysis of LLM-generated patches using 20,000+ issues from the SWE-bench dataset. We evaluate patches produced by a standalone LLM (Llama 3.3) and compare them to developer-written patches. We also assess the security of patches generated by three top-performing agentic frameworks (OpenHands, AutoCodeRover, HoneyComb) on a subset of our data. Finally, we analyze a wide range of code, issue, and project-level factors to understand the conditions under which LLMs and agents are most likely to generate insecure code. Our findings reveal that the standalone LLM introduces nearly 9x more new vulnerabilities than developers, with many of these exhibiting unique patterns not found in developers' code. Agentic workflows also generate a significant number of vulnerabilities, particularly when granting LLMs more autonomy, potentially increasing the likelihood of misinterpreting project context or task requirements. We find that vulnerabilities are more likely to occur in LLM patches associated with a higher number of files, more lines of generated code, and GitHub issues that lack specific code snippets or information about the expected code behavior and steps to reproduce. These results suggest that contextual factors play a critical role in the security of the generated code and point toward the need for proactive risk assessment methods that account for both code and issue-level information to complement existing vulnerability detection tools.

Updated: 2025-07-24 15:50:13

Domains: cs.CR,cs.LG,cs.SE

Download: http://arxiv.org/abs/2507.02976v2

The Moral Gap of Large Language Models

Moral foundation detection is crucial for analyzing social discourse and developing ethically-aligned AI systems. While large language models excel across diverse tasks, their performance on specialized moral reasoning remains unclear. This study provides the first comprehensive comparison between state-of-the-art LLMs and fine-tuned transformers across Twitter and Reddit datasets using ROC, PR, and DET curve analysis. Results reveal substantial performance gaps, with LLMs exhibiting high false negative rates and systematic under-detection of moral content despite prompt engineering efforts. These findings demonstrate that task-specific fine-tuning remains superior to prompting for moral reasoning applications.
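
For reference, all three curve types can be computed from raw detector scores; the sketch below uses synthetic data and scikit-learn's roc_curve, precision_recall_curve, and det_curve (the latter available in recent scikit-learn versions).

```python
# Sketch: ROC, PR, and DET curve points for a binary moral-content detector.
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve, det_curve

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
scores = y * 0.3 + rng.normal(0.5, 0.25, size=500)    # synthetic detector scores

fpr, tpr, _ = roc_curve(y, scores)
prec, rec, _ = precision_recall_curve(y, scores)
fpr_det, fnr_det, _ = det_curve(y, scores)            # DET: FPR vs. FNR
print(f"max TPR at FPR<=0.1: {tpr[fpr <= 0.1].max():.2f}")
```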

Updated: 2025-07-24 15:49:06

Domains: cs.CL,cs.HC,cs.LG

Download: http://arxiv.org/abs/2507.18523v1

Optimal Transport Regularized Divergences: Application to Adversarial Robustness

We introduce a new class of optimal-transport-regularized divergences, $D^c$, constructed via an infimal convolution between an information divergence, $D$, and an optimal-transport (OT) cost, $C$, and study their use in distributionally robust optimization (DRO). In particular, we propose the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These DRO-based methods are defined by minimizing the maximum expected loss over a $D^c$-neighborhood of the empirical distribution of the training data. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence; the addition of a principled and dynamical adversarial re-weighting on top of adversarial sample transport is a key innovation of $ARMOR_D$. $ARMOR_D$ can be viewed as a generalization of the best-performing loss functions and OT costs in the adversarial training literature; we demonstrate this flexibility by using $ARMOR_D$ to augment the UDR, TRADES, and MART methods and obtain improved performance on CIFAR-10 and CIFAR-100 image recognition. Specifically, augmenting with $ARMOR_D$ leads to 1.9\% and 2.1\% improvement against AutoAttack, a powerful ensemble of adversarial attacks, on CIFAR-10 and CIFAR-100 respectively. To foster reproducibility, we made the code accessible at https://github.com/star-ailab/ARMOR.
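
A rough sketch of the two-part adversarial mechanism: transport samples by gradient steps (the OT side) and re-weight them with a softmax over losses (a divergence-style re-weighting). The step sizes, temperature, and toy model are assumptions, and this is not the paper's exact algorithm.

```python
# Sketch of adversarial sample transport plus loss-based re-weighting (toy
# stand-in for the ARMOR_D inner maximization).
import torch
import torch.nn.functional as F

def transport_and_reweight(model, x, y, steps=5, lr=0.01, temp=1.0):
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):                          # transport: move the samples
        loss = F.cross_entropy(model(x_adv), y, reduction="none")
        grad, = torch.autograd.grad(loss.sum(), x_adv)
        x_adv = (x_adv + lr * grad.sign()).detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y, reduction="none")
    w = F.softmax(loss.detach() / temp, dim=0)      # re-weight the hardest samples
    return (w * loss).sum()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))
transport_and_reweight(model, x, y).backward()
```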

Updated: 2025-07-24 15:47:29

Domains: cs.LG,stat.ML

Download: http://arxiv.org/abs/2309.03791v3

GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks

The exponential growth of spam text on the Internet necessitates robust detection mechanisms to mitigate risks such as information leakage and social instability. This work addresses two principal challenges: adversarial strategies employed by spammers and the scarcity of labeled data. We propose a novel spam-text detection framework GCC-Spam, which integrates three core innovations. First, a character similarity network captures orthographic and phonetic features to counter character-obfuscation attacks and furthermore produces sentence embeddings for downstream classification. Second, contrastive learning enhances discriminability by optimizing the latent-space distance between spam and normal texts. Third, a Generative Adversarial Network (GAN) generates realistic pseudo-spam samples to alleviate data scarcity while improving model robustness and classification accuracy. Extensive experiments on real-world datasets demonstrate that our model outperforms baseline approaches, achieving higher detection rates with significantly fewer labeled examples.
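
The contrastive component can be illustrated with a generic class-separating loss over sentence embeddings (pull same-class pairs together, push spam/normal pairs apart); the margin and shapes are assumptions, not the paper's exact loss.

```python
# Sketch of a class-separating contrastive objective over sentence embeddings.
import torch
import torch.nn.functional as F

def contrastive_separation(z, labels, margin=1.0):
    d = torch.cdist(z, z)                           # pairwise embedding distances
    same = labels[:, None] == labels[None, :]
    eye = torch.eye(len(z), dtype=torch.bool)
    pos = d[same & ~eye].mean()                     # pull same-class together
    neg = F.relu(margin - d[~same]).mean()          # push classes apart
    return pos + neg

emb = torch.randn(32, 64, requires_grad=True)       # sentence embeddings
labels = torch.randint(0, 2, (32,))                 # 1 = spam, 0 = normal
loss = contrastive_separation(F.normalize(emb, dim=-1), labels)
loss.backward()
```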

Updated: 2025-07-24 15:46:28

Domains: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.14679v2

Robust sensitivity control in digital pathology via tile score distribution matching

Deploying digital pathology models across medical centers is challenging due to distribution shifts. Recent advances in domain generalization improve model transferability in terms of aggregated performance measured by the Area Under Curve (AUC). However, clinical regulations often require to control the transferability of other metrics, such as prescribed sensitivity levels. We introduce a novel approach to control the sensitivity of whole slide image (WSI) classification models, based on optimal transport and Multiple Instance Learning (MIL). Validated across multiple cohorts and tasks, our method enables robust sensitivity control with only a handful of calibration samples, providing a practical solution for reliable deployment of computational pathology systems.
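
The mechanics of sensitivity control can be shown with a toy quantile-based threshold calibration on a handful of positive calibration cases; this deliberately omits the paper's optimal-transport matching of tile score distributions.

```python
# Sketch: choose a decision threshold that meets a prescribed sensitivity
# on a few positive calibration slides.
import numpy as np

def calibrate_threshold(pos_slide_scores, target_sensitivity=0.95):
    """pos_slide_scores: per-slide scores of known-positive calibration cases."""
    # Threshold at the (1 - sensitivity) quantile of positive scores, so that
    # roughly `target_sensitivity` of positives score above it.
    return np.quantile(pos_slide_scores, 1 - target_sensitivity)

rng = np.random.default_rng(0)
calib_pos = rng.normal(0.7, 0.1, size=10)       # handful of calibration samples
thr = calibrate_threshold(calib_pos)
decisions = rng.normal(0.5, 0.2, size=100) > thr
```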

Updated: 2025-07-24 15:45:59

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2502.20144v3

GLANCE: Graph Logic Attention Network with Cluster Enhancement for Heterophilous Graph Representation Learning

Graph Neural Networks (GNNs) have demonstrated significant success in learning from graph-structured data but often struggle on heterophilous graphs, where connected nodes differ in features or class labels. This limitation arises from indiscriminate neighbor aggregation and insufficient incorporation of higher-order structural patterns. To address these challenges, we propose GLANCE (Graph Logic Attention Network with Cluster Enhancement), a novel framework that integrates logic-guided reasoning, dynamic graph refinement, and adaptive clustering to enhance graph representation learning. GLANCE combines a logic layer for interpretable and structured embeddings, multi-head attention-based edge pruning for denoising graph structures, and clustering mechanisms for capturing global patterns. Experimental results in benchmark datasets, including Cornell, Texas, and Wisconsin, demonstrate that GLANCE achieves competitive performance, offering robust and interpretable solutions for heterophilous graph scenarios. The proposed framework is lightweight, adaptable, and uniquely suited to the challenges of heterophilous graphs.
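
A small sketch of attention-based edge pruning in isolation: score each edge with multi-head attention between its endpoints and keep only the strongest edges. The head count, keep ratio, and freshly initialized projections are assumptions for illustration.

```python
# Sketch: prune graph edges by multi-head attention scores between endpoints.
import torch

def prune_edges(h, edge_index, keep_ratio=0.5, heads=4):
    """h: (N, D) node features; edge_index: (2, E) source/target indices."""
    N, D = h.shape
    q_proj = torch.nn.Linear(D, D)   # randomly initialized here, for illustration
    k_proj = torch.nn.Linear(D, D)
    src, dst = edge_index
    q = q_proj(h)[src].view(-1, heads, D // heads)
    k = k_proj(h)[dst].view(-1, heads, D // heads)
    score = (q * k).sum(-1).mean(-1) / (D // heads) ** 0.5   # (E,)
    keep = score.topk(int(keep_ratio * len(score))).indices
    return edge_index[:, keep]

h = torch.randn(10, 32)
edges = torch.randint(0, 10, (2, 40))
pruned = prune_edges(h, edges)
```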

Updated: 2025-07-24 15:45:26

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.18521v1

Euclidean Distance Deflation Under High-Dimensional Heteroskedastic Noise

Pairwise Euclidean distance calculation is a fundamental step in many machine learning and data analysis algorithms. In real-world applications, however, these distances are frequently distorted by heteroskedastic noise--a prevalent form of inhomogeneous corruption characterized by variable noise magnitudes across data observations. Such noise inflates the computed distances in a nontrivial way, leading to misrepresentations of the underlying data geometry. In this work, we address the tasks of estimating the noise magnitudes per observation and correcting the pairwise Euclidean distances under heteroskedastic noise. Perhaps surprisingly, we show that in general high-dimensional settings and without assuming prior knowledge on the clean data structure or noise distribution, both tasks can be performed reliably, even when the noise levels vary considerably. Specifically, we develop a principled, hyperparameter-free approach that jointly estimates the noise magnitudes and corrects the distances. We provide theoretical guarantees for our approach, establishing probabilistic bounds on the estimation errors of both noise magnitudes and distances. These bounds, measured in the normalized $\ell_1$ norm, converge to zero at polynomial rates as both feature dimension and dataset size increase. Experiments on synthetic datasets demonstrate that our method accurately estimates distances in challenging regimes, significantly improving the robustness of subsequent distance-based computations. Notably, when applied to single-cell RNA sequencing data, our method yields noise magnitude estimates consistent with an established prototypical model, enabling accurate nearest neighbor identification that is fundamental to many downstream analyses.
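
Once per-observation noise magnitudes eps_i = E||noise_i||^2 are available, the correction itself is simple: subtract them from the squared distances. A numpy sketch with the magnitudes given (estimating them reliably is the paper's contribution):

```python
# Sketch: deflate pairwise squared Euclidean distances given per-observation
# noise energies eps_i = E||noise_i||^2.
import numpy as np

def deflated_sq_distances(X, eps):
    """X: (n, d) noisy data; eps: (n,) estimated noise energies per row."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # noisy distances^2
    corrected = sq - eps[:, None] - eps[None, :]          # remove noise inflation
    np.fill_diagonal(corrected, 0.0)
    return np.maximum(corrected, 0.0)                     # clip negatives

rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 200))
sigmas = rng.uniform(0.1, 1.0, size=50)                   # heteroskedastic levels
noisy = clean + sigmas[:, None] * rng.normal(size=(50, 200))
eps = 200 * sigmas ** 2                                   # E||noise_i||^2 = d*sigma_i^2
D2 = deflated_sq_distances(noisy, eps)
```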

Updated: 2025-07-24 15:45:23

Domains: stat.ML,cs.LG,math.ST,stat.TH,62R07, 62G

Download: http://arxiv.org/abs/2507.18520v1

Revisiting Bisimulation Metric for Robust Representations in Reinforcement Learning

Bisimulation metric has long been regarded as an effective control-related representation learning technique in various reinforcement learning tasks. However, in this paper, we identify two main issues with the conventional bisimulation metric: 1) an inability to represent certain distinctive scenarios, and 2) a reliance on predefined weights for differences in rewards and subsequent states during recursive updates. We find that the first issue arises from an imprecise definition of the reward gap, whereas the second issue stems from overlooking the varying importance of reward difference and next-state distinctions across different training stages and task settings. To address these issues, by introducing a measure for state-action pairs, we propose a revised bisimulation metric that features a more precise definition of reward gap and novel update operators with adaptive coefficient. We also offer theoretical guarantees of convergence for our proposed metric and its improved representation distinctiveness. In addition to our rigorous theoretical analysis, we conduct extensive experiments on two representative benchmarks, DeepMind Control and Meta-World, demonstrating the effectiveness of our approach.
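
To ground the recursion under discussion: a bisimulation-style metric on a tabular MDP satisfies d(s, s') = c_r |r(s) - r(s')| + c_t W1(P(.|s), P(.|s')). The sketch below iterates this to a fixed point on a deterministic toy MDP, where W1 collapses to d(next(s), next(s')), with hand-set coefficients; the paper's revision instead adapts these coefficients and refines the reward-gap term.

```python
# Sketch: fixed-point iteration of a bisimulation-style metric on a small
# deterministic MDP.
import numpy as np

rewards = np.array([0.0, 0.1, 1.0, 1.0])
next_state = np.array([1, 2, 3, 3])           # deterministic transitions
c_r, c_t, n = 1.0, 0.9, len(rewards)          # hand-set coefficients

d = np.zeros((n, n))
for _ in range(100):                          # iterate to the fixed point
    reward_gap = np.abs(rewards[:, None] - rewards[None, :])
    d_new = c_r * reward_gap + c_t * d[next_state][:, next_state]
    if np.abs(d_new - d).max() < 1e-8:
        break
    d = d_new
print(np.round(d, 3))
```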

Updated: 2025-07-24 15:42:22

Domains: cs.LG

Download: http://arxiv.org/abs/2507.18519v1

Thermal-Aware 3D Design for Side-Channel Information Leakage

Side-channel attacks are important security challenges as they reveal sensitive information about on-chip activities. Among such attacks, the thermal side-channel has been shown to disclose the activities of key functional blocks and even encryption keys. This paper proposes a novel approach to proactively conceal critical activities in the functional layers while minimizing the power dissipation by (i) leveraging inherent characteristics of 3D integration to protect against side-channel attacks and (ii) dynamically generating custom activity patterns to match the activity to be concealed in the functional layers. Experimental analysis shows that 3D technology combined with the proposed run-time algorithm effectively reduces the Side-channel Vulnerability Factor (SVF) below 0.05 and the Spatial Thermal Side-channel Factor (STSF) below 0.59.
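
A toy sketch of the dynamic-concealment idea: schedule complementary dummy activity so the combined power profile stays flat over time. The per-interval power model and the cap are assumptions; the actual approach generates custom patterns matched to the activity being concealed across the 3D stack.

```python
# Sketch: flatten an activity trace by scheduling complementary dummy activity,
# so the observable thermal/power profile leaks less about the workload.
import numpy as np

def concealment_pattern(activity, power_cap=None):
    """activity: per-interval power of the protected block (arbitrary units)."""
    cap = power_cap if power_cap is not None else activity.max()
    dummy = cap - activity              # dummy work fills the gap each interval
    return dummy, activity + dummy      # observed profile is ~constant

rng = np.random.default_rng(0)
trace = rng.uniform(0.2, 1.0, size=32)          # secret-dependent activity
dummy, observed = concealment_pattern(trace)
assert np.allclose(observed, observed[0])       # flat profile -> low leakage
```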

Updated: 2025-07-24 15:39:46

Domains: cs.CR,cs.ET

Download: http://arxiv.org/abs/2508.02816v1

Visual Adaptive Prompting for Compositional Zero-Shot Learning

Vision-Language Models (VLMs) have demonstrated impressive multimodal capabilities in learning joint representations of visual and textual data, making them powerful tools for tasks such as Compositional Zero-Shot Learning (CZSL). CZSL requires models to generalize to novel combinations of visual primitives--such as attributes and objects--that were not explicitly encountered during training. Recent works in prompting for CZSL have focused on modifying inputs for the text encoder, often using static prompts that do not change across varying visual contexts. However, these approaches struggle to fully capture varying visual contexts, as they focus on text adaptation rather than leveraging visual features for compositional reasoning. To address this, we propose a Visual Adaptive Prompting System (VAPS) that leverages a learnable visual prompt repository and similarity-based retrieval mechanism within the framework of VLMs to bridge the gap between semantic and visual features. Our method introduces a dynamic visual prompt repository mechanism that selects the most relevant attribute and object prompts based on the visual features of the image. Our proposed system includes a visual prompt adapter that encourages the model to learn a more generalizable embedding space. Experiments on three CZSL benchmarks, across both closed and open-world scenarios, demonstrate state-of-the-art results.
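
A compact sketch of similarity-based retrieval from a learnable visual prompt repository; the repository size, prompt length, and embedding dimension are assumptions.

```python
# Sketch: retrieve the most relevant learnable prompts for an image by
# cosine similarity between image features and prompt keys.
import torch
import torch.nn.functional as F

class PromptRepository(torch.nn.Module):
    def __init__(self, n_prompts=32, dim=512, prompt_len=4):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(n_prompts, dim))
        self.prompts = torch.nn.Parameter(torch.randn(n_prompts, prompt_len, dim))

    def forward(self, img_feat, top_k=2):
        sim = F.cosine_similarity(img_feat[:, None, :], self.keys[None], dim=-1)
        idx = sim.topk(top_k, dim=-1).indices           # (B, top_k)
        return self.prompts[idx].flatten(1, 2)          # (B, top_k*len, dim)

repo = PromptRepository()
img_feat = torch.randn(8, 512)
visual_prompts = repo(img_feat)    # prepend to the encoder's token sequence
```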

Updated: 2025-07-24 15:38:22

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2502.20292v6

A Transfer Learning-Based Method for Water Body Segmentation in Remote Sensing Imagery: A Case Study of the Zhada Tulin Area

The Tibetan Plateau, known as the Asian Water Tower, faces significant water security challenges due to its high sensitivity to climate change. Advancing Earth observation for sustainable water monitoring is thus essential for building climate resilience in this region. This study proposes a two-stage transfer learning strategy using the SegFormer model to overcome domain shift and data scarcity--key barriers in developing robust AI for climate-sensitive applications. After pre-training on a diverse source domain, our model was fine-tuned for the arid Zhada Tulin area. Experimental results show a substantial performance boost: the Intersection over Union (IoU) for water body segmentation surged from 25.50% (direct transfer) to 64.84%. This AI-driven accuracy is crucial for disaster risk reduction, particularly in monitoring flash flood-prone systems. More importantly, the high-precision map reveals a highly concentrated spatial distribution of water, with over 80% of the water area confined to less than 20% of the river channel length. This quantitative finding provides crucial evidence for understanding hydrological processes and designing targeted water management and climate adaptation strategies. Our work thus demonstrates an effective technical solution for monitoring arid plateau regions and contributes to advancing AI-powered Earth observation for disaster preparedness in critical transboundary river headwaters.
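
The two-stage strategy itself is framework-agnostic; a generic PyTorch sketch under assumed learning rates and epoch counts is shown below (the study fine-tunes SegFormer, which is not reproduced here).

```python
# Generic two-stage transfer-learning loop (placeholder model and loaders).
import torch

def run_stage(model, loader, lr, epochs):
    loss_fn = torch.nn.CrossEntropyLoss()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, masks in loader:
            opt.zero_grad()
            loss_fn(model(images), masks).backward()
            opt.step()

# Stage 1: pre-train on a diverse source domain; stage 2: fine-tune on the
# small arid target domain at a lower, assumed learning rate.
# run_stage(model, source_loader, lr=6e-5, epochs=40)
# run_stage(model, target_loader, lr=1e-5, epochs=20)
```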

Updated: 2025-07-24 15:37:18

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.10084v2

Explaining How Visual, Textual and Multimodal Encoders Share Concepts

Sparse autoencoders (SAEs) have emerged as a powerful technique for extracting human-interpretable features from neural networks activations. Previous works compared different models based on SAE-derived features but those comparisons have been restricted to models within the same modality. We propose a novel indicator allowing quantitative comparison of models across SAE features, and use it to conduct a comparative study of visual, textual and multimodal encoders. We also propose to quantify the Comparative Sharedness of individual features between different classes of models. With these two new tools, we conduct several studies on 21 encoders of the three types, with two significantly different sizes, and considering generalist and domain specific datasets. The results allow to revisit previous studies at the light of encoders trained in a multimodal context and to quantify to which extent all these models share some representations or features. They also suggest that visual features that are specific to VLMs among vision encoders are shared with text encoders, highlighting the impact of text pretraining. The code is available at https://github.com/CEA-LIST/SAEshareConcepts
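
One plausible shape for such an indicator: for each SAE decoder direction of model A, take the best cosine match among model B's directions and average (our guess at the metric, assuming dictionaries expressed in a shared space; see the paper for the actual definition):

    import numpy as np

    def shared_feature_score(D_a, D_b):
        """Mean best-match cosine similarity between two SAE feature dictionaries."""
        A = D_a / np.linalg.norm(D_a, axis=1, keepdims=True)
        B = D_b / np.linalg.norm(D_b, axis=1, keepdims=True)
        sims = A @ B.T                      # all pairwise cosine similarities
        return sims.max(axis=1).mean()      # best match in B for each feature of A

    rng = np.random.default_rng(0)
    D_vision = rng.normal(size=(1024, 768))  # decoder directions, model A
    # Model B shares 25% of its directions with A in this synthetic check.
    D_text = np.vstack([D_vision[:256], rng.normal(size=(768, 768))])
    print(shared_feature_score(D_vision, D_text))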

Updated: 2025-07-24 15:33:31

标题: 解释视觉、文本和多模态编码器如何共享概念

摘要: 稀疏自编码器(SAEs)已经成为从神经网络激活中提取人类可解释特征的强大技术。以往的研究比较了基于SAE特征的不同模型,但这些比较仅限于同一模态的模型。我们提出了一个新的指标,允许对不同SAE特征的模型进行定量比较,并使用它来进行视觉、文本和多模态编码器的比较研究。我们还提出了量化不同类别模型之间个体特征的比较共享度。利用这两个新工具,我们对三种类型的21个编码器进行了多项研究,包括两个显著不同的大小,并考虑了通用和领域特定的数据集。结果允许在多模态背景下训练的编码器的光下重新审视以往的研究,并量化所有这些模型共享某些表示或特征的程度。它们还表明,视觉编码器中特定于VLMs的视觉特征与文本编码器共享,突出了文本预训练的影响。代码可在 https://github.com/CEA-LIST/SAEshareConcepts 获取。

更新时间: 2025-07-24 15:33:31

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.18512v1

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparisons between our method and those of the recent model-based stochastic LQ RL studies adapted to the state- and control-dependent volatility setting, demonstrating a better performance of the former in terms of regret bounds.
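
To see why a bound of this form is sublinear, note that $O(N^{3/4})$ regret, even with the logarithmic factor, implies vanishing per-episode regret; a quick numeric check with an illustrative constant:

    import numpy as np

    C = 2.0                                  # illustrative constant, not from the paper
    for N in [10**2, 10**4, 10**6]:
        bound = C * N**0.75 * np.log(N)      # O(N^{3/4} log N) regret bound
        print(N, round(bound), bound / N)    # bound / N shrinks toward 0 as N grows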

Updated: 2025-07-24 15:32:35

标题: 一个类连续时间线性二次强化学习问题的次线性后悔

摘要: 我们研究了一类连续时间线性二次(LQ)控制问题的强化学习(RL),该问题涉及扩散过程,其中状态为标量值,运行控制奖励不存在,但状态过程的波动性取决于状态和控制变量。我们采用一种无模型的方法,既不依赖于模型参数的知识,也不依赖于其估计,并设计了一个RL算法来直接学习最优策略参数。我们的主要贡献包括引入一种探索进度表和对所提出算法的后悔分析。我们提供了策略参数收敛到最优参数的速率,并证明该算法在对数因子的影响下达到了$O(N^{\frac{3}{4}})$的后悔上界,其中$N$为学习周期数。我们进行了仿真研究以验证理论结果,并展示了所提出算法的有效性和可靠性。我们还对我们的方法与最近针对状态和控制变量相关波动性设置的基于模型的随机LQ RL研究进行了数值比较,表明前者在后悔上限方面表现更好。

更新时间: 2025-07-24 15:32:35

领域: cs.LG,cs.AI,cs.SY,eess.SY,math.OC

下载: http://arxiv.org/abs/2407.17226v6

Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses

The diagnostic value of electrocardiogram (ECG) lies in its dynamic characteristics, ranging from rhythm fluctuations to subtle waveform deformations that evolve across time and frequency domains. However, supervised ECG models tend to overfit dominant and repetitive patterns, overlooking fine-grained but clinically critical cues, a phenomenon known as Simplicity Bias (SB), where models favor easily learnable signals over subtle but informative ones. In this work, we first empirically demonstrate the presence of SB in ECG analyses and its negative impact on diagnostic performance, while simultaneously discovering that self-supervised learning (SSL) can alleviate it, providing a promising direction for tackling the bias. Following the SSL paradigm, we propose a novel method comprising two key components: 1) Temporal-Frequency aware Filters to capture temporal-frequency features reflecting the dynamic characteristics of ECG signals, and 2) building on this, Multi-Grained Prototype Reconstruction for coarse and fine representation learning across dual domains, further mitigating SB. To advance SSL in ECG analyses, we curate a large-scale multi-site ECG dataset with 1.53 million recordings from over 300 clinical centers. Experiments on three downstream tasks across six ECG datasets demonstrate that our method effectively reduces SB and achieves state-of-the-art performance. Code and dataset will be released publicly.
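
One way to read the temporal-frequency aware filtering is as band selection on a spectral representation of the signal; a hand-rolled FFT band-pass on a synthetic ECG-like trace (purely illustrative; the paper's filters are learnable):

    import numpy as np

    fs = 250                                   # assumed sampling rate (Hz)
    t = np.arange(0, 10, 1 / fs)
    # Slow rhythm component plus a sharper high-frequency component.
    sig = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 25 * t)

    def bandpass(x, lo, hi, fs):
        """Zero out FFT bins outside [lo, hi] Hz, then invert."""
        spec = np.fft.rfft(x)
        freqs = np.fft.rfftfreq(len(x), 1 / fs)
        spec[(freqs < lo) | (freqs > hi)] = 0
        return np.fft.irfft(spec, n=len(x))

    slow = bandpass(sig, 0.5, 5, fs)   # rhythm-scale band
    fast = bandpass(sig, 15, 40, fs)   # fine waveform band
    # A learnable variant would replace the hard mask with trainable per-bin gates.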

Updated: 2025-07-24 15:31:13

标题: 遮蔽自动编码器感知心脏:揭示心电图分析的简单偏差

摘要: 心电图(ECG)的诊断价值在于其动态特性,从节律波动到随时间和频率域演变的微妙波形畸变。然而,监督式ECG模型往往会过度拟合主导性和重复性模式,忽视精细但临床关键的线索,这种现象被称为简单偏见(SB),即模型偏向于易于学习的信号而不是微妙但信息丰富的信号。在这项工作中,我们首先实证地证明了ECG分析中SB的存在及其对诊断性能的负面影响,同时发现自监督学习(SSL)可以缓解这一问题,为解决偏见提供了一个有前途的方向。遵循SSL范式,我们提出了一种包含两个关键组成部分的新方法:1)时域-频率感知滤波器,捕捉反映ECG信号动态特性的时域-频率特征,2)在此基础上构建多粒度原型重建,实现跨双域的粗细表示学习,进一步减轻SB。为了推进ECG分析中的SSL,我们整理了一个包含来自300多个临床中心的153万条记录的大规模多站点ECG数据集。在六个ECG数据集上进行的三个下游任务实验表明,我们的方法有效减少了SB,并取得了最先进的性能。代码和数据集将会公开发布。

更新时间: 2025-07-24 15:31:13

领域: eess.SP,cs.AI,cs.LG

下载: http://arxiv.org/abs/2506.22495v2

Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment

Large language models (LLMs) demonstrate strong generalization across a wide range of language tasks, but often generate outputs that misalign with human preferences. Reinforcement Learning from Human Feedback (RLHF) addresses this by optimizing models toward human preferences using a learned reward function and reinforcement learning, yielding improved alignment but suffering from high computational cost and instability. Direct Preference Optimization (DPO) simplifies the process by treating alignment as a classification task over binary preference pairs, reducing training overhead while achieving competitive performance. However, it assumes fixed, single-dimensional preferences and only supports pairwise supervision. To address these limitations, we propose Multi-Preference Lambda-weighted Listwise DPO, which allows the model to learn from more detailed human feedback and flexibly balance multiple goals such as helpfulness, honesty, and fluency. Our method models full-ranked preference distributions rather than binary comparisons, enabling more informative learning signals. The lambda vector controls the relative importance of different alignment goals, allowing the model to generalize across diverse human objectives. During inference, lambda can be adjusted without retraining, providing controllable alignment behavior for downstream use. We also introduce a learned scheduler that dynamically samples performant lambda configurations to improve robustness. Notably, our method requires only 20GB of GPU memory for training, making it suitable for compute-constrained settings such as academic labs, educational tools, or on-device assistants. Experiments on 1B-2B scale models show that our method consistently outperforms standard DPO on alignment benchmarks while enabling efficient, controllable, and fine-grained adaptation suitable for real-world deployment.
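
A minimal reading of the lambda-weighted listwise objective, using a Plackett-Luce listwise loss per preference dimension mixed by the lambda vector (the exact loss and all names are our assumptions, not the released method):

    import numpy as np

    def listwise_nll(scores, ranking):
        """Plackett-Luce negative log-likelihood of an observed ranking."""
        s = scores[list(ranking)]            # scores ordered best-to-worst
        nll = 0.0
        for i in range(len(s)):
            nll -= s[i] - np.log(np.exp(s[i:]).sum())
        return nll

    scores = np.array([1.0, 0.2, -0.5, 0.7])  # model scores for 4 candidate responses
    rankings = {"helpfulness": [0, 3, 1, 2], "honesty": [3, 0, 2, 1]}
    lam = {"helpfulness": 0.7, "honesty": 0.3}  # tunable at inference, no retraining

    loss = sum(lam[d] * listwise_nll(scores, r) for d, r in rankings.items())
    print(loss)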

Updated: 2025-07-24 15:23:54

标题: 多偏好lambda加权列表排序的小规模模型对齐

摘要: 大型语言模型(LLMs)在广泛的语言任务中表现出强大的泛化能力,但通常生成的输出与人类偏好不一致。通过人类反馈的强化学习(RLHF)通过使用学习的奖励函数和强化学习来优化模型以朝向人类偏好,从而实现了改进的一致性,但受到高计算成本和不稳定性的影响。直接偏好优化(DPO)通过将一致性视为二元偏好对的分类任务,简化了过程,减少了训练开销同时实现了竞争性能。然而,它假设了固定的单一维度偏好,并且仅支持成对监督。 为了解决这些限制,我们提出了多偏好Lambda加权Listwise DPO,该方法允许模型从更详细的人类反馈中学习,并灵活平衡多个目标,如帮助性、诚实性和流畅性。我们的方法模拟完整的偏好分布而不是二元比较,从而提供更多信息的学习信号。Lambda向量控制不同一致性目标的相对重要性,使模型能够泛化到各种人类目标。在推理过程中,可以调整lambda而无需重新训练,为下游使用提供可控的对齐行为。我们还引入了一个学习调度器,动态地采样表现良好的lambda配置以提高鲁棒性。 值得注意的是,我们的方法仅需要20GB的GPU内存进行训练,适用于计算受限的环境,如学术实验室、教育工具或设备上的助手。在1B-2B规模模型上的实验表明,我们的方法在一致性基准上始终优于标准DPO,同时实现了高效、可控和细粒度的适应性,适合实际部署。

更新时间: 2025-07-24 15:23:54

领域: cs.LG,I.2.6; I.2.7; I.5.1

下载: http://arxiv.org/abs/2506.19780v5

DualXDA: Towards Sparse, Efficient and Explainable Data Attribution in Large AI Models

Deep learning models achieve remarkable performance, yet their decision-making processes often remain opaque. In response, the field of eXplainable Artificial Intelligence (XAI) has grown significantly over the last decade, primarily focusing on feature attribution methods. Complementing this perspective, Data Attribution (DA) has emerged as a promising paradigm that shifts the focus from features to data provenance. However, existing DA approaches suffer from prohibitively high computational costs and memory demands. Additionally, current attribution methods exhibit low sparsity, hindering the discovery of decisive patterns in the data. We introduce DualXDA, a framework for sparse, efficient and explainable DA, comprised of two interlinked approaches for Dual Data Attribution (DualDA) and eXplainable Data Attribution (XDA): With DualDA, we propose efficient and effective DA, leveraging Support Vector Machine theory to provide fast and naturally sparse data attributions for AI predictions. We demonstrate that DualDA achieves high attribution quality, excels at solving a series of evaluated downstream tasks, while at the same time reducing explanation time by a factor of up to 4,100,000$\times$ compared to the original Influence Functions method, and by up to 11,000$\times$ compared to the method's most efficient approximation from the literature. We further introduce XDA, a method for enhancing Data Attribution with capabilities from feature attribution methods to explain why training samples are relevant for the prediction of a test sample in terms of impactful features. Taken together, our contributions in DualXDA ultimately point towards a future of eXplainable AI applied at unprecedented scale, enabling transparent, efficient and novel analysis of even the largest neural architectures fostering a new generation of accountable AI systems. Code at https://github.com/gumityolcu/DualXDA.
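
The SVM connection suggests attributions of the dual form alpha_i * y_i * K(x_i, x_test), which are naturally sparse because only support vectors contribute; a scikit-learn sketch of that reading (our interpretation, not the released code; see the repository for the actual method):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))
    y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

    clf = SVC(kernel="linear").fit(X, y)

    def attribution(x_test):
        """Per-training-sample contribution alpha_i * y_i * K(x_i, x_test)."""
        K = clf.support_vectors_ @ x_test     # linear kernel values
        return clf.dual_coef_[0] * K          # nonzero only for support vectors

    scores = attribution(X[0])
    top = np.argsort(np.abs(scores))[-3:]
    print(clf.support_[top], scores[top])     # most influential training points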

Updated: 2025-07-24 15:23:53

标题: DualXDA:面向大型AI模型的稀疏、高效和可解释数据归因

摘要: 深度学习模型取得了显著的性能,但它们的决策过程通常是不透明的。作为回应,过去十年中,可解释人工智能(XAI)领域已经显著增长,主要集中在特征归因方法上。作为补充,数据归因(DA)作为一种有前途的范式已经出现,将焦点从特征转移到数据溯源。然而,现有的DA方法存在着计算成本和内存需求过高的问题。此外,当前的归因方法表现出较低的稀疏性,阻碍了在数据中发现决定性模式。我们介绍了DualXDA,一个用于稀疏、高效和可解释的DA的框架,由双数据归因(DualDA)和可解释数据归因(XDA)两种相互关联的方法组成:通过DualDA,我们提出了高效和有效的DA,利用支持向量机理论为AI预测提供快速且自然稀疏的数据归因。我们证明DualDA实现了高质量的归因,并在解决一系列评估的下游任务方面表现出色,同时将解释时间提高了高达4,100,000倍,与原始影响函数方法相比,提高了最高达11,000倍,相比文献中最高效的近似方法。我们进一步引入了XDA,一种利用特征归因方法提高数据归因能力的方法,解释为什么训练样本与测试样本的预测相关,并指出影响性特征。总的来说,我们在DualXDA中的贡献最终指向一个未来的可解释AI,应用规模前所未有,实现透明、高效和新颖的对甚至最大的神经网络架构的分析,促进新一代可追溯的AI系统的发展。具体代码请查看https://github.com/gumityolcu/DualXDA。

更新时间: 2025-07-24 15:23:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.12118v2

Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models

Large Language Models (LLMs) have shown strong potential for tabular data generation by modeling textualized feature-value pairs. However, tabular data inherently exhibits sparse feature-level dependencies, where many feature interactions are structurally insignificant. This creates a fundamental mismatch as LLMs' self-attention mechanism inevitably distributes focus across all pairs, diluting attention on critical relationships, particularly in datasets with complex dependencies or semantically ambiguous features. To address this limitation, we propose GraDe (Graph-Guided Dependency Learning), a novel method that explicitly integrates sparse dependency graphs into LLMs' attention mechanism. GraDe employs a lightweight dynamic graph learning module guided by externally extracted functional dependencies, prioritizing key feature interactions while suppressing irrelevant ones. Our experiments across diverse real-world datasets demonstrate that GraDe outperforms existing LLM-based approaches by up to 12% on complex datasets while achieving competitive results with state-of-the-art approaches in synthetic data quality. Our method is minimally intrusive yet effective, offering a practical solution for structure-aware tabular data modeling with LLMs.
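
The core mechanism, injecting a sparse dependency graph into self-attention, can be sketched as an additive mask on the score matrix; a toy single-head version (not the paper's implementation):

    import numpy as np

    def masked_attention(Q, K, V, adj):
        """Attend only along edges of the feature-dependency graph."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)
        scores = np.where(adj.astype(bool), scores, -1e9)  # suppress non-edges
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)
        return w @ V

    rng = np.random.default_rng(0)
    n, d = 5, 8                      # 5 feature tokens, dimension 8
    Q = K = V = rng.normal(size=(n, d))
    adj = np.eye(n)                  # self-loops...
    adj[0, 1] = adj[1, 0] = 1        # ...plus one extracted functional dependency
    out = masked_attention(Q, K, V, adj)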

Updated: 2025-07-24 15:22:27

标题: 并非所有特征值得关注:带有语言模型的图导向依赖学习用于表格数据生成

摘要: 大型语言模型(LLMs)通过建模文本化的特征-值对展示了在表格数据生成方面的强大潜力。然而,表格数据本质上展示了稀疏的特征级依赖性,其中许多特征交互在结构上并不重要。这导致了LLMs的自注意机制不可避免地将注意力分散到所有对上,从而削弱了对关键关系的关注,特别是在具有复杂依赖关系或语义模糊特征的数据集中。为了解决这一限制,我们提出了GraDe (图引导的依赖学习),这是一种新颖的方法,明确地将稀疏依赖图集成到LLMs的注意机制中。GraDe采用了一个轻量级的动态图学习模块,由外部提取的功能依赖指导,优先考虑关键特征交互,同时抑制无关的交互。我们在各种真实世界数据集上的实验表明,GraDe在复杂数据集上的表现优于现有的基于LLM的方法多达12%,同时在合成数据质量方面取得了与最先进方法相媲美的结果。我们的方法干扰最小但有效,为使用LLMs进行结构感知的表格数据建模提供了实际解决方案。

更新时间: 2025-07-24 15:22:27

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.18504v1

PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization

Few-shot temporal action localization (TAL) methods that adapt large models via single-prompt tuning often fail to produce precise temporal boundaries. This stems from the model learning a non-discriminative mean representation of an action from sparse data, which compromises generalization. We address this by proposing a new paradigm based on multi-prompt ensembles, where a set of diverse, learnable prompts for each action is encouraged to specialize on compositional sub-events. To enforce this specialization, we introduce PLOT-TAL, a framework that leverages Optimal Transport (OT) to find a globally optimal alignment between the prompt ensemble and the video's temporal features. Our method establishes a new state-of-the-art on the challenging few-shot benchmarks of THUMOS'14 and EPIC-Kitchens, without requiring complex meta-learning. The significant performance gains, particularly at high IoU thresholds, validate our hypothesis and demonstrate the superiority of learning distributed, compositional representations for precise temporal localization.
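
Entropy-regularized OT with Sinkhorn iterations is a standard way to compute such an alignment; a compact sketch where the cost is negative prompt/feature similarity (hyperparameters illustrative, not the paper's settings):

    import numpy as np

    def sinkhorn(C, eps=0.1, iters=200):
        """Entropic OT plan between uniform marginals for cost matrix C."""
        m, n = C.shape
        a, b = np.ones(m) / m, np.ones(n) / n
        K = np.exp(-C / eps)
        u, v = np.ones(m), np.ones(n)
        for _ in range(iters):
            u = a / (K @ v)
            v = b / (K.T @ u)
        return u[:, None] * K * v[None, :]    # transport plan

    rng = np.random.default_rng(0)
    prompts = rng.normal(size=(4, 64))        # 4 learnable prompts for one action
    frames = rng.normal(size=(20, 64))        # 20 temporal feature vectors
    C = -prompts @ frames.T                   # higher similarity => lower cost
    C = (C - C.min()) / (C.max() - C.min())   # normalize cost to [0, 1]
    P = sinkhorn(C)
    print(P.shape, P.sum())                   # (4, 20), total transported mass 1.0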

Updated: 2025-07-24 15:19:06

标题: PLOT-TAL: 使用最优输运进行少样本时间动作定位的提示学习

摘要: 少样本时间动作定位(TAL)方法通常通过单提示调整大型模型,往往无法产生精确的时间边界。这是因为模型从稀疏数据中学习到一个非判别性的动作平均表示,从而损害了泛化能力。我们通过提出一个基于多提示集合的新范式来解决这个问题,其中鼓励每个动作的一组多样化的可学习提示专门用于组合子事件。为了强化这种专业化,我们引入了PLOT-TAL框架,利用最优传输(OT)在提示集合和视频的时间特征之间找到全局最优对齐。我们的方法在具有挑战性的少样本基准THUMOS'14和EPIC-Kitchens上建立了新的最先进技术,而无需复杂的元学习。特别是在高IoU阈值下的显著性能增益验证了我们的假设,并证明了学习分布式、组合性表示对于精确的时间定位的优越性。

更新时间: 2025-07-24 15:19:06

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.18915v2

EarthLink: A Self-Evolving AI Agent for Climate Science

Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can learn from user interaction, continuously refining its capabilities through a dynamic feedback loop. We validated its performance on a number of core scientific tasks in climate change research, ranging from model-observation comparisons to the diagnosis of complex phenomena. In a multi-expert evaluation, EarthLink produced scientifically sound analyses and demonstrated an analytical competency that was rated as comparable to specific aspects of a human junior researcher's workflow. Additionally, its transparent, auditable workflows and natural language interface empower scientists to shift from laborious manual execution to strategic oversight and hypothesis generation. EarthLink marks a pivotal step towards an efficient, trustworthy, and collaborative paradigm for Earth system research in an era of accelerating global change. The system is accessible at our website https://earthlink.intern-ai.org.cn.
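
At a very high level, the copilot pattern is plan, generate code, execute, and revise from feedback; a schematic loop with placeholder callables (purely hypothetical, not EarthLink's interface):

    # Schematic agent loop; llm and sandbox are hypothetical callables,
    # not EarthLink's actual API.
    def research_copilot(question, llm, sandbox, max_rounds=3):
        plan = llm(f"Draft an analysis plan for: {question}")
        result = None
        for _ in range(max_rounds):
            code = llm(f"Write analysis code for this plan:\n{plan}")
            result = sandbox(code)                  # execute in isolation
            review = llm(f"Critique result {result!r} against plan:\n{plan}")
            if "ok" in review.lower():              # feedback loop exit condition
                return result
            plan = llm(f"Revise the plan given this critique:\n{review}")
        return result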

Updated: 2025-07-24 15:12:15

标题: EarthLink:气候科学自进化人工智能代理

摘要: 现代地球科学正处于一个拐点。地球系统数据的庞大、碎片化和复杂性,加上日益复杂的分析需求,为快速科学发现创造了一个重要瓶颈。在这里,我们介绍EarthLink,这是第一个专为地球科学家设计的人工智能助手。它自动化了从规划和代码生成到多场景分析的整个研究工作流程。与静态诊断工具不同,EarthLink 可以通过动态反馈循环学习用户互动,不断完善其能力。我们验证了它在气候变化的一些核心科学任务上的性能,从模型观测比较到复杂现象的诊断。在多专家评估中,EarthLink 产生了科学上合理的分析,并展示了被评为与人类初级研究人员工作流程的特定方面相媲美的分析能力。此外,其透明、可审计的工作流程和自然语言界面使科学家能够从繁琐的手动执行转变为战略监督和假设生成。EarthLink 标志着朝着在加速全球变化时代实现高效、可信赖和协作范式的地球系统研究迈出了重要一步。该系统可在我们的网站https://earthlink.intern-ai.org.cn 上访问。

更新时间: 2025-07-24 15:12:15

领域: cs.LG,cs.AI,physics.ao-ph

下载: http://arxiv.org/abs/2507.17311v2

Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

Concept drift is the phenomenon in which the underlying data distributions and statistical properties of a target domain change over time, leading to a degradation in model performance. Consequently, production models require continuous drift detection monitoring. Most drift detection methods to date are supervised, relying on ground-truth labels. However, they are inapplicable in many real-world scenarios, as true labels are often unavailable. Although recent efforts have proposed unsupervised drift detectors, many lack the accuracy required for reliable detection or are too computationally intensive for real-time use in high-dimensional, large-scale production environments. Moreover, they often fail to characterize or explain drift effectively. To address these limitations, we propose \textsc{DriftLens}, an unsupervised framework for real-time concept drift detection and characterization. Designed for deep learning classifiers handling unstructured data, \textsc{DriftLens} leverages distribution distances in deep learning representations to enable efficient and accurate detection. Additionally, it characterizes drift by analyzing and explaining its impact on each label. Our evaluation across classifiers and data-types demonstrates that \textsc{DriftLens} (i) outperforms previous methods in detecting drift in 15/17 use cases; (ii) runs at least 5 times faster; (iii) produces drift curves that align closely with actual drift (correlation $\geq\!0.85$); (iv) effectively identifies representative drift samples as explanations.
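
A distribution distance over deep representations can be as simple as comparing batch statistics of embeddings between a reference window and the current window; a Fréchet-style sketch (our simplification, not the exact DriftLens distance):

    import numpy as np

    def frechet_distance(E1, E2):
        """Distance between two embedding batches via mean/covariance statistics."""
        mu1, mu2 = E1.mean(0), E2.mean(0)
        c1 = np.cov(E1, rowvar=False)
        c2 = np.cov(E2, rowvar=False)
        # Simplified: full FID uses a covariance sqrt term; Frobenius is cheaper.
        return np.sum((mu1 - mu2) ** 2) + np.linalg.norm(c1 - c2, "fro")

    rng = np.random.default_rng(0)
    baseline = rng.normal(size=(500, 64))            # reference-window embeddings
    window = rng.normal(loc=0.5, size=(500, 64))     # drifted current window
    print(frechet_distance(baseline, baseline[250:]),  # small: no drift
          frechet_distance(baseline, window))          # large: drift alarm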

Updated: 2025-07-24 15:10:45

标题: 实时无监督概念漂移检测基于深度学习表示

摘要: 概念漂移是指目标领域的基础数据分布和统计特性随时间发生变化,导致模型性能下降的现象。因此,生产模型需要持续监测漂移检测。迄今为止,大多数漂移检测方法都是有监督的,依赖于地面真实标签。然而,在许多现实世界场景中,这些方法并不适用,因为真实标签通常不可用。尽管最近的努力提出了无监督漂移检测器,但其中许多缺乏可靠检测所需的准确性,或者在高维、大规模生产环境中实时使用时计算量过大。此外,它们经常无法有效地表征或解释漂移。 为了解决这些局限性,我们提出了一个名为\textsc{DriftLens}的无监督框架,用于实时概念漂移检测和表征。\textsc{DriftLens}专为处理非结构化数据的深度学习分类器设计,利用深度学习表示中的分布距离,实现了高效和准确的检测。此外,它通过分析和解释漂移对每个标签的影响来表征漂移。我们在不同分类器和数据类型上的评估表明,\textsc{DriftLens}:(i) 在15/17个用例中优于先前的方法检测漂移;(ii) 运行速度至少快5倍;(iii) 产生的漂移曲线与实际漂移高度一致(相关性≥0.85);(iv) 有效地识别代表性漂移样本作为解释。

更新时间: 2025-07-24 15:10:45

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17813v2

Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks

Faithfulness and interpretability are essential for deploying deep neural networks (DNNs) in safety-critical domains such as medical imaging. B-cos networks offer a promising solution by replacing standard linear layers with a weight-input alignment mechanism, producing inherently interpretable, class-specific explanations without post-hoc methods. While maintaining diagnostic performance competitive with state-of-the-art DNNs, standard B-cos models suffer from severe aliasing artifacts in their explanation maps, making them unsuitable for clinical use where clarity is essential. In this work, we address these limitations by introducing anti-aliasing strategies using FLCPooling (FLC) and BlurPool (BP) to significantly improve explanation quality. Our experiments on chest X-ray datasets demonstrate that the modified $\text{B-cos}_\text{FLC}$ and $\text{B-cos}_\text{BP}$ preserve strong predictive performance while providing faithful and artifact-free explanations suitable for clinical application in multi-class and multi-label settings. Code available at: GitHub repository (url: https://github.com/mkleinma/B-cos-medical-paper).
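
BlurPool replaces strided downsampling with a fixed low-pass blur followed by subsampling, which is what removes the aliasing; a minimal PyTorch version with a 3x3 binomial kernel (the standard anti-aliasing recipe, not necessarily the authors' exact configuration):

    import torch
    import torch.nn.functional as F

    class BlurPool2d(torch.nn.Module):
        """Fixed 3x3 binomial low-pass blur followed by stride-2 subsampling."""
        def __init__(self, channels):
            super().__init__()
            k = torch.tensor([1.0, 2.0, 1.0])
            k = torch.outer(k, k)
            k = (k / k.sum())[None, None].repeat(channels, 1, 1, 1)
            self.register_buffer("kernel", k)
            self.channels = channels

        def forward(self, x):
            x = F.pad(x, (1, 1, 1, 1), mode="reflect")  # avoid border artifacts
            return F.conv2d(x, self.kernel, stride=2, groups=self.channels)

    x = torch.randn(1, 8, 32, 32)
    print(BlurPool2d(8)(x).shape)  # torch.Size([1, 8, 16, 16])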

Updated: 2025-07-24 14:58:44

标题: 忠实、可解释的胸部X光诊断与抗锯齿B-cos网络

摘要: 忠实性和可解释性对于在医学成像等安全关键领域部署深度神经网络(DNNs)至关重要。B-cos网络通过将标准线性层替换为权重-输入对齐机制,产生固有可解释的、类别特定的解释,而无需事后方法,为提供有希望的解决方案。虽然标准B-cos模型在维持与最先进DNNs竞争性的诊断性能的同时,其解释图中存在严重的混叠伪影,使其在需要清晰度的临床应用中不适用。在本研究中,我们通过引入使用FLCPooling(FLC)和BlurPool(BP)的抗混叠策略来显著改善解释质量,以解决这些限制。我们在胸部X光数据集上的实验表明,修改后的B-cos_FLC和B-cos_BP在保持强大预测性能的同时,提供了忠实且无伪影的解释,适用于多类别和多标签设置的临床应用。代码可在GitHub存储库 (url: https://github.com/mkleinma/B-cos-medical-paper) 上找到。

更新时间: 2025-07-24 14:58:44

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.16761v2

Reinforced Embodied Active Defense: Exploiting Adaptive Interaction for Robust Visual Perception in Adversarial 3D Environments

Adversarial attacks in 3D environments have emerged as a critical threat to the reliability of visual perception systems, particularly in safety-sensitive applications such as identity verification and autonomous driving. These attacks employ adversarial patches and 3D objects to manipulate deep neural network (DNN) predictions by exploiting vulnerabilities within complex scenes. Existing defense mechanisms, such as adversarial training and purification, primarily employ passive strategies to enhance robustness. However, these approaches often rely on pre-defined assumptions about adversarial tactics, limiting their adaptability in dynamic 3D settings. To address these challenges, we introduce Reinforced Embodied Active Defense (Rein-EAD), a proactive defense framework that leverages adaptive exploration and interaction with the environment to improve perception robustness in 3D adversarial contexts. By implementing a multi-step objective that balances immediate prediction accuracy with predictive entropy minimization, Rein-EAD optimizes defense strategies over a multi-step horizon. Additionally, Rein-EAD involves an uncertainty-oriented reward-shaping mechanism that facilitates efficient policy updates, thereby reducing computational overhead and supporting real-world applicability without the need for differentiable environments. Comprehensive experiments validate the effectiveness of Rein-EAD, demonstrating a substantial reduction in attack success rates while preserving standard accuracy across diverse tasks. Notably, Rein-EAD exhibits robust generalization to unseen and adaptive attacks, making it suitable for real-world complex tasks, including 3D object classification, face recognition and autonomous driving.
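
The uncertainty-oriented shaping can be read as rewarding accurate, confident predictions while penalizing predictive entropy at each step; a toy per-step reward term (the formula is illustrative, not the paper's exact objective):

    import numpy as np

    def shaped_reward(probs, label, beta=0.5):
        """Reward correctness, penalize predictive entropy (beta is assumed)."""
        p = np.asarray(probs)
        entropy = -(p * np.log(p + 1e-12)).sum()
        correct = float(np.argmax(p) == label)
        return correct - beta * entropy

    print(shaped_reward([0.9, 0.05, 0.05], label=0))  # confident and correct: high
    print(shaped_reward([0.4, 0.3, 0.3], label=0))    # correct but uncertain: lower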

Updated: 2025-07-24 14:56:21

标题: 加强的实体主动防御:利用自适应交互在对抗性三维环境中实现稳健的视觉感知

摘要: 在3D环境中的对抗性攻击已经成为视觉感知系统可靠性的关键威胁,特别是在诸如身份验证和自动驾驶等安全敏感应用中。这些攻击利用对抗性贴片和3D物体来操纵深度神经网络(DNN)的预测,通过利用复杂场景中的漏洞。现有的防御机制,如对抗性训练和净化,主要采用被动策略来增强鲁棒性。然而,这些方法通常依赖于对对抗性策略的预定义假设,限制了它们在动态3D环境中的适应性。为了解决这些挑战,我们引入了强化体验主动防御(Rein-EAD),这是一个积极的防御框架,利用自适应的探索和与环境的互动来提高在3D对抗性环境中的感知鲁棒性。通过实施一个多步目标,平衡即时预测准确性和预测熵最小化,Rein-EAD优化了多步视野下的防御策略。此外,Rein-EAD还涉及一种基于不确定性的奖励塑形机制,促进高效的策略更新,从而减少计算开销,并支持无需可微分环境的实际适用性。全面的实验验证了Rein-EAD的有效性,显示出攻击成功率的显著降低,同时保持各种任务的标准准确性。值得注意的是,Rein-EAD表现出对未知和自适应攻击的强大泛化能力,使其适用于真实世界的复杂任务,包括3D物体分类、人脸识别和自动驾驶。

更新时间: 2025-07-24 14:56:21

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.18484v1

Scout: Leveraging Large Language Models for Rapid Digital Evidence Discovery

Recent technological advancements and the prevalence of technology in day-to-day activities have caused a major increase in the likelihood of the involvement of digital evidence in more and more legal investigations. Consumer-grade hardware is growing more powerful, with expanding memory and storage sizes and enhanced processor capabilities. Forensic investigators often have to sift through gigabytes of data during an ongoing investigation, making the process tedious. Memory forensics and disk analysis are well supported by state-of-the-art tools that significantly lower the effort required of a forensic investigator by providing string searches, image file analysis, and similar capabilities. During the course of an investigation, many false positives are identified that need to be reduced. This work presents Scout, a digital forensics framework that performs preliminary evidence processing and prioritization using large language models. Scout deploys foundational language models to identify relevant artifacts from a large number of potential evidence files (disk images, captured network packets, memory dumps, etc.) that would otherwise have taken longer to identify. Scout employs text-based large language models, which can easily process files with textual information; for the forensic analysis of multimedia files such as audio, images, video, and office documents, Scout employs multimodal models. Scout was able to identify the evidence files that were of potential interest to the investigator.
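
The triage loop amounts to extracting text from candidate artifacts, scoring relevance with a language model, and ranking; a schematic sketch in which score_with_llm is a hypothetical stub, not Scout's API:

    from pathlib import Path

    def score_with_llm(text: str) -> float:
        """Hypothetical stub: return a 0-1 relevance score from an LLM."""
        return min(1.0, text.lower().count("password") / 3)  # placeholder heuristic

    def triage(evidence_dir, top_k=10):
        scored = []
        for path in Path(evidence_dir).rglob("*.txt"):   # textual artifacts only
            text = path.read_text(errors="ignore")
            scored.append((score_with_llm(text), path))
        scored.sort(reverse=True)                        # highest relevance first
        return scored[:top_k]

    # for score, path in triage("/cases/2025-001/extracted"): print(score, path)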

Updated: 2025-07-24 14:54:14

标题: 侦察者:利用大型语言模型实现快速数字证据发现

摘要: 最近的技术进步和技术在日常活动中的普及显著增加了数字证据在越来越多的法律调查中涉及的可能性。消费级硬件正变得更加强大,内存和存储容量不断扩大,处理器性能得到增强。在进行持续调查期间,取证调查员经常不得不筛选大量数据,使得这一过程变得繁琐。记忆取证、磁盘分析等领域都得到了先进工具的良好支持,这些工具显著降低了取证调查员需要投入的工作量,提供了字符串搜索、分析图像文件等功能。在调查过程中,经常会发现许多误报,需要降低误报率。本文介绍了一种名为Scout的数字取证框架,通过使用大型语言模型进行初步证据处理和优先处理。Scout利用基础语言模型从大量潜在证据文件(磁盘映像、捕获的网络数据包、内存转储等)中识别相关证据,这将需要更长时间才能识别。Scout采用基于文本的大型语言模型可以轻松处理带有文本信息的文件。对于音频、图像、视频、办公文档等多媒体文件的取证分析,Scout采用了多模态模型。Scout能够识别并确定对调查员可能感兴趣的证据文件。

更新时间: 2025-07-24 14:54:14

领域: cs.CR

下载: http://arxiv.org/abs/2507.18478v1

Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting

Spatial intelligence is emerging as a transformative frontier in AI, yet it remains constrained by the scarcity of large-scale 3D datasets. Unlike the abundant 2D imagery, acquiring 3D data typically requires specialized sensors and laborious annotation. In this work, we present a scalable pipeline that converts single-view images into comprehensive, scale- and appearance-realistic 3D representations - including point clouds, camera poses, depth maps, and pseudo-RGBD - via integrated depth estimation, camera calibration, and scale calibration. Our method bridges the gap between the vast repository of imagery and the increasing demand for spatial scene understanding. By automatically generating authentic, scale-aware 3D data from images, we significantly reduce data collection costs and open new avenues for advancing spatial intelligence. We release two generated spatial datasets, i.e., COCO-3D and Objects365-v2-3D, and demonstrate through extensive experiments that our generated data can benefit various 3D tasks, ranging from fundamental perception to MLLM-based reasoning. These results validate our pipeline as an effective solution for developing AI systems capable of perceiving, understanding, and interacting with physical environments.
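
The core lifting step, backprojecting a metric depth map through calibrated pinhole intrinsics into a point cloud, is standard geometry; a minimal version with assumed intrinsics (not the paper's full pipeline):

    import numpy as np

    def depth_to_pointcloud(depth, fx, fy, cx, cy):
        """Backproject an (H, W) metric depth map to an (H*W, 3) point cloud."""
        H, W = depth.shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=-1).reshape(-1, 3)

    depth = np.full((480, 640), 2.0)            # flat wall 2 m away
    pts = depth_to_pointcloud(depth, fx=525, fy=525, cx=320, cy=240)
    print(pts.shape, pts[:, 2].mean())          # (307200, 3) 2.0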

Updated: 2025-07-24 14:53:26

标题: 朝向可扩展的空间智能:通过2D到3D数据提升

摘要: 空间智能正成为人工智能中的一个变革性前沿,然而受限于大规模3D数据集的稀缺性。与丰富的2D图像不同,获取3D数据通常需要专门的传感器和费时费力的标注。在这项工作中,我们提出了一个可扩展的流程,将单视图图像转换为全面的、具有规模和外观逼真的3D表示 - 包括点云、相机姿态、深度图和伪RGBD - 通过集成深度估计、相机校准和比例校准。我们的方法弥合了图像大量库和对空间场景理解日益增长的需求之间的差距。通过从图像自动生成真实的、具有规模感知的3D数据,我们显著降低了数据收集成本,并为推进空间智能开辟了新途径。我们发布了两个生成的空间数据集,即COCO-3D和Objects365-v2-3D,并通过广泛的实验表明,我们生成的数据可以惠及各种3D任务,从基础感知到基于MLLM的推理。这些结果验证了我们的流程作为开发能够感知、理解和与物理环境互动的人工智能系统的有效解决方案。

更新时间: 2025-07-24 14:53:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.18678v1

Automated Code Review Using Large Language Models with Symbolic Reasoning

Code review is one of the key processes in the software development lifecycle and is essential to maintain code quality. However, manual code review is subjective and time-consuming. Given its rule-based nature, code review is well suited for automation. In recent years, significant efforts have been made to automate this process with the help of artificial intelligence. Large Language Models (LLMs) have also recently emerged as a promising tool in this area, but these models often lack the logical reasoning capabilities needed to fully understand and evaluate code. To overcome this limitation, this study proposes a hybrid approach that integrates symbolic reasoning techniques with LLMs to automate the code review process. We tested our approach using the CodeXGLUE dataset, comparing several models, including CodeT5, CodeBERT, and GraphCodeBERT, to assess the effectiveness of combining symbolic reasoning and prompting techniques with LLMs. Our results show that this approach improves the accuracy and efficiency of automated code review.
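
One way to combine symbolic reasoning with an LLM is to run deterministic AST checks first and hand the verified findings to the model as structured context; a sketch using Python's ast module, with review acting on a hypothetical llm callable (our illustration of the general pattern, not the paper's system):

    import ast

    def symbolic_checks(source: str):
        """Deterministic findings: bare excepts and mutable default arguments."""
        findings, tree = [], ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, ast.ExceptHandler) and node.type is None:
                findings.append(f"line {node.lineno}: bare 'except:'")
            if isinstance(node, ast.FunctionDef):
                for d in node.args.defaults:
                    if isinstance(d, (ast.List, ast.Dict, ast.Set)):
                        findings.append(f"line {node.lineno}: mutable default arg")
        return findings

    def review(source, llm):
        facts = "\n".join(symbolic_checks(source))
        return llm(f"Review this code. Verified issues:\n{facts}\n\n{source}")

    print(symbolic_checks("def f(x=[]):\n    try:\n        pass\n    except:\n        pass\n"))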

Updated: 2025-07-24 14:50:27

标题: 使用具有符号推理的大型语言模型进行自动化代码审查

摘要: 代码审查是软件开发生命周期中的关键流程之一,对于维护代码质量至关重要。然而,手动代码审查具有主观性且耗时。鉴于其基于规则的性质,代码审查非常适合自动化。近年来,人们已经在利用人工智能来自动化这一过程方面做出了重大努力。最近大规模语言模型(LLMs)的发展也成为这一领域的一种有前途的工具,但这些模型常常缺乏全面理解和评估代码所需的逻辑推理能力。为了克服这一局限,本研究提出了一种混合方法,将符号推理技术与LLMs集成,以自动化代码审查过程。我们使用CodexGlue数据集测试了我们的方法,比较了几种模型,包括CodeT5、CodeBERT和GraphCodeBERT,以评估将符号推理和提示技术与LLMs相结合的有效性。我们的结果表明,这种方法提高了自动代码审查的准确性和效率。

更新时间: 2025-07-24 14:50:27

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2507.18476v1

Leveraging multi-source and heterogeneous signals for fatigue detection

Fatigue detection plays a critical role in safety-critical applications such as aviation, mining, and long-haul transport. However, most existing methods rely on high-end sensors and controlled environments, limiting their applicability in real-world settings. This paper formally defines a practical yet underexplored problem setting for real-world fatigue detection, where systems operating with context-appropriate sensors aim to leverage knowledge from differently instrumented sources, including those using impractical sensors deployed in controlled environments. To tackle this challenge, we propose a heterogeneous and multi-source fatigue detection framework that adaptively utilizes the available modalities in the target domain while benefiting from the diverse configurations present in source domains. Our experiments, conducted using a realistic field-deployed sensor setup and two publicly available datasets, demonstrate the practicality, robustness, and improved generalization of our approach, paving a practical path toward effective fatigue monitoring in sensor-constrained scenarios.

Updated: 2025-07-24 14:41:42

标题: 利用多源和异构信号进行疲劳检测

摘要: 疲劳检测在航空、矿业和长途运输等安全关键应用中起着至关重要的作用。然而,大多数现有方法依赖于高端传感器和受控环境,从而限制了它们在现实世界环境中的适用性。本文正式定义了一个在现实世界中用于疲劳检测的实用但尚未充分探索的问题设置,其中系统使用适应上下文的传感器,旨在利用来自不同仪器化来源的知识,包括那些使用不切实际传感器部署在受控环境中的。为了解决这一挑战,我们提出了一个异构和多源疲劳检测框架,该框架自适应地利用目标域中可用的模态,同时受益于源域中存在的多样配置。我们的实验使用一个真实的现场部署传感器设置和两个公开可用的数据集进行,展示了我们方法的实用性、稳健性和改进的泛化性,为在传感器受限情况下有效监测疲劳铺平了实用的道路。

更新时间: 2025-07-24 14:41:42

领域: cs.RO,cs.AI,62H30,I.2

下载: http://arxiv.org/abs/2507.16859v2

DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts

Learning from non-stationary data streams subject to concept drift requires models that can adapt on-the-fly while remaining resource-efficient. Existing adaptive ensemble methods often rely on coarse-grained adaptation mechanisms or simple voting schemes that fail to optimally leverage specialized knowledge. This paper introduces DriftMoE, an online Mixture-of-Experts (MoE) architecture that addresses these limitations through a novel co-training framework. DriftMoE features a compact neural router that is co-trained alongside a pool of incremental Hoeffding tree experts. The key innovation lies in a symbiotic learning loop that enables expert specialization: the router selects the most suitable expert for prediction, the relevant experts update incrementally with the true label, and the router refines its parameters using a multi-hot correctness mask that reinforces every accurate expert. This feedback loop provides the router with a clear training signal while accelerating expert specialization. We evaluate DriftMoE's performance across nine state-of-the-art data stream learning benchmarks spanning abrupt, gradual, and real-world drifts, testing two distinct configurations: one where experts specialize on data regimes (multi-class variant), and another where they focus on single-class specialization (task-based variant). Our results demonstrate that DriftMoE achieves competitive results with state-of-the-art stream learning adaptive ensembles, offering a principled and efficient approach to concept drift adaptation. All code, data pipelines, and reproducibility scripts are available in our public GitHub repository: https://github.com/miguel-ceadar/drift-moe.
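
The symbiotic loop can be sketched as: route, predict, update the experts on the true label, and push the router toward every expert that was correct via the multi-hot mask; a toy version with placeholder linear-threshold experts standing in for Hoeffding trees (all details are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n_experts, dim = 4, 8
    W = np.zeros((n_experts, dim + 1))           # router weights (+ bias column)
    # Stand-in experts; expert b predicts from feature b (expert 0 matches the concept).
    experts = [lambda x, b=b: int(x[b] > 0) for b in range(n_experts)]

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    lr = 0.05
    for _ in range(2000):
        x = rng.normal(size=dim)
        xb = np.append(x, 1.0)
        y = int(x[0] > 0)                        # stream concept: sign of feature 0
        gate = softmax(W @ xb)
        preds = np.array([e(x) for e in experts])
        correct = (preds == y).astype(float)     # multi-hot correctness mask
        if correct.any():
            target = correct / correct.sum()
            W += lr * np.outer(target - gate, xb)  # cross-entropy step toward the mask
        # (real DriftMoE also updates each expert incrementally with the true label)

    # Gate on a fresh input; mass should lean toward the always-correct expert 0.
    print(softmax(W @ np.append(rng.normal(size=dim), 1.0)))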

Updated: 2025-07-24 14:39:20

标题: DriftMoE:处理概念漂移的专家混合方法

摘要: 学习非稳态数据流受概念漂移影响需要能够在线适应并保持资源高效的模型。现有的自适应集成方法通常依赖于粗粒度的适应机制或简单的投票方案,未能充分利用专业知识。本文介绍了DriftMoE,一种在线专家混合(MoE)架构,通过一种新颖的共同训练框架解决了这些限制。DriftMoE具有一个紧凑的神经路由器,与一组增量Hoeffding树专家一起进行共同训练。关键创新在于一种共生学习循环,使专家专业化:路由器选择最适合进行预测的专家,相关专家根据真实标签进行增量更新,而路由器则使用多热正确性掩码调整其参数,以强化每个准确专家。这种反馈循环为路由器提供了清晰的训练信号,同时加速了专家的专业化。我们评估了DriftMoE在涵盖突发、渐变和真实漂移的九个最先进数据流学习基准测试中的性能,测试了两种不同的配置:一种是专家在数据制度上专业化(多类别变体),另一种是专注于单类别专业化的变体(基于任务的变体)。我们的结果表明,DriftMoE在与最先进的数据流学习自适应集成相竞争时取得了竞争性的结果,为概念漂移适应提供了一种理性和高效的方法。我们的公共GitHub存储库中提供了所有代码、数据管道和可重现性脚本:https://github.com/miguel-ceadar/drift-moe。

更新时间: 2025-07-24 14:39:20

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2507.18464v1

Revisiting Physically Realizable Adversarial Object Attack against LiDAR-based Detection: Clarifying Problem Formulation and Experimental Protocols

Adversarial robustness in LiDAR-based 3D object detection is a critical research area due to its widespread application in real-world scenarios. While many digital attacks manipulate point clouds or meshes, they often lack physical realizability, limiting their practical impact. Physical adversarial object attacks remain underexplored and suffer from poor reproducibility due to inconsistent setups and hardware differences. To address this, we propose a device-agnostic, standardized framework that abstracts key elements of physical adversarial object attacks, supports diverse methods, and provides open-source code with benchmarking protocols in simulation and real-world settings. Our framework enables fair comparison, accelerates research, and is validated by successfully transferring simulated attacks to a physical LiDAR system. Beyond the framework, we offer insights into factors influencing attack success and advance understanding of adversarial robustness in real-world LiDAR perception.

Updated: 2025-07-24 14:37:00

标题: 重新审视针对基于LiDAR检测的物理可实现对抗性目标攻击:澄清问题表述和实验协议

摘要: 激光雷达(LiDAR)技术在3D物体检测中的对抗性鲁棒性是一个关键的研究领域,因为它在现实场景中有广泛的应用。虽然许多数字攻击会操纵点云或网格,但它们往往缺乏物理可实现性,限制了它们的实际影响。物理对抗性物体攻击仍然未被充分探索,并因设置不一致和硬件差异而导致重现性差。为了解决这个问题,我们提出了一个设备不可知的标准化框架,抽象了物理对抗性物体攻击的关键要素,支持多种方法,并在模拟和现实世界环境中提供基准测试协议的开源代码。我们的框架实现了公平比较,加快了研究进展,并通过成功将模拟攻击转移到实际的LiDAR系统来验证。除了框架之外,我们还提供了影响攻击成功的因素的见解,推动了对现实世界LiDAR感知中对抗性鲁棒性的理解。

更新时间: 2025-07-24 14:37:00

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.18457v1
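
The contribution here is a standardized abstraction rather than a single algorithm, so the sketch below only illustrates what a device-agnostic attack contract might look like; `AttackScenario`, its fields, and the method names are hypothetical, not the framework's published API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import numpy as np

@dataclass
class AttackScenario:
    """Hypothetical standardized description of one evaluation setup."""
    sensor_model: str          # e.g. a named LiDAR unit (illustrative)
    object_mesh_path: str      # adversarial object geometry
    pose: tuple                # (x, y, z, yaw) placement in the scene

class PhysicalObjectAttack(ABC):
    """Device-agnostic contract an attack method would implement."""

    @abstractmethod
    def perturb_mesh(self, vertices: np.ndarray) -> np.ndarray:
        """Return physically realizable vertex offsets (bounded, printable)."""

    @abstractmethod
    def evaluate(self, scenario: AttackScenario) -> dict:
        """Render/scan the object, run the detector, report metrics."""
```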

Generation of Synthetic Clinical Text: A Systematic Review

Generating clinical synthetic text represents an effective solution for common clinical NLP issues like sparsity and privacy. This paper aims to conduct a systematic review on generating synthetic medical free-text by formulating quantitative analysis to three research questions concerning (i) the purpose of generation, (ii) the techniques, and (iii) the evaluation methods. We searched PubMed, ScienceDirect, Web of Science, Scopus, IEEE, Google Scholar, and arXiv databases for publications associated with generating synthetic medical unstructured free-text. We have identified 94 relevant articles out of 1,398 collected ones. A great deal of attention has been given to the generation of synthetic medical text from 2018 onwards, where the main purpose of such a generation is towards text augmentation, assistive writing, corpus building, privacy-preserving, annotation, and usefulness. Transformer architectures were the main predominant technique used to generate the text, especially the GPTs. On the other hand, there were four main aspects of evaluation, including similarity, privacy, structure, and utility, where utility was the most frequent method used to assess the generated synthetic medical text. Although the generated synthetic medical text demonstrated a moderate possibility to act as real medical documents in different downstream NLP tasks, it has proven to be a great asset as augmented, complementary to the real documents, towards improving the accuracy and overcoming sparsity/undersampling issues. Yet, privacy is still a major issue behind generating synthetic medical text, where more human assessments are needed to check for the existence of any sensitive information. Despite that, advances in generating synthetic medical text will considerably accelerate the adoption of workflows and pipeline development, discarding the time-consuming legalities of data transfer.

Updated: 2025-07-24 14:35:16

标题: 生成合成临床文本:系统综述

摘要: 生成临床合成文本代表了解决常见临床自然语言处理问题(如稀疏性和隐私性)的有效解决方案。本文旨在通过制定定量分析来对生成医学自由文本进行系统综述,针对三个研究问题进行调查,包括(i)生成的目的,(ii)技术,以及(iii)评估方法。我们在PubMed、ScienceDirect、Web of Science、Scopus、IEEE、Google Scholar和arXiv数据库中搜索了与生成医学非结构化自由文本相关的出版物。我们从1,398篇收集的文章中确定了94篇相关文章。自2018年以来,已经非常关注生成医学文本,其中这种生成的主要目的是用于文本增强、辅助写作、语料库构建、隐私保护、标注和实用性。变压器结构是生成文本的主要技术,特别是GPTs。另一方面,评估有四个主要方面,包括相似性、隐私、结构和实用性,其中实用性是评估生成的医学合成文本最常用的方法。尽管生成的医学合成文本在不同下游自然语言处理任务中表现出中等可能性作为真实医学文件的能力,但它已被证明是一种非常有价值的资源,作为对真实文件的增补,有助于提高准确性和克服稀疏/欠采样问题。然而,隐私仍然是生成医学合成文本背后的一个主要问题,需要更多的人类评估来检查是否存在任何敏感信息。尽管如此,生成医学合成文本的进展将大大加快工作流程和管道开发的采纳,摒弃了繁琐的数据传输法律程序。

更新时间: 2025-07-24 14:35:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18451v1

Restoring Rhythm: Punctuation Restoration Using Transformer Models for Bangla, a Low-Resource Language

Punctuation restoration enhances the readability of text and is critical for post-processing tasks in Automatic Speech Recognition (ASR), especially for low-resource languages like Bangla. In this study, we explore the application of transformer-based models, specifically XLM-RoBERTa-large, to automatically restore punctuation in unpunctuated Bangla text. We focus on predicting four punctuation marks: period, comma, question mark, and exclamation mark across diverse text domains. To address the scarcity of annotated resources, we constructed a large, varied training corpus and applied data augmentation techniques. Our best-performing model, trained with an augmentation factor of alpha = 0.20%, achieves an accuracy of 97.1% on the News test set, 91.2% on the Reference set, and 90.2% on the ASR set. Results show strong generalization to reference and ASR transcripts, demonstrating the model's effectiveness in real-world, noisy scenarios. This work establishes a strong baseline for Bangla punctuation restoration and contributes publicly available datasets and code to support future research in low-resource NLP.

Updated: 2025-07-24 14:33:13

标题: 恢复节奏:使用Transformer模型恢复孟加拉语标点符号,一种资源稀缺语言

摘要: 标点符号的恢复增强了文本的可读性,对于自动语音识别(ASR)中的后处理任务至关重要,特别是对于孟加拉语等低资源语言。在这项研究中,我们探讨了基于变压器的模型,具体是XLM-RoBERTa-large,在未标点的孟加拉文本中自动恢复标点的应用。我们专注于预测四种标点符号:句号、逗号、问号和感叹号,跨越多样的文本领域。为了解决标注资源的稀缺性,我们构建了一个庞大而多样的训练语料库,并应用了数据增强技术。我们表现最佳的模型,在增强因子alpha = 0.20%下训练,News测试集上达到97.1%的准确率,参考集上达到91.2%,ASR集上达到90.2%。 结果显示对于参考和ASR转录具有强大的泛化能力,展示了该模型在真实世界中嘈杂场景中的有效性。这项工作为孟加拉语标点符号的恢复建立了强大的基线,并提供了公开可用的数据集和代码,以支持低资源自然语言处理领域的未来研究。

更新时间: 2025-07-24 14:33:13

领域: cs.CL,cs.AI,cs.LG,I.2; I.7

下载: http://arxiv.org/abs/2507.18448v1
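
A minimal token-classification sketch of the described setup using Hugging Face Transformers and the paper's XLM-RoBERTa-large backbone; the five-label scheme and the freshly initialized classification head are assumptions, so the model must first be fine-tuned on punctuation-annotated Bangla text before its predictions mean anything.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# One label per token: no mark after it, or one of the four restored marks.
LABELS = ["O", "PERIOD", "COMMA", "QUESTION", "EXCLAMATION"]

tok = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-large", num_labels=len(LABELS)
)  # fine-tune this head before use

def restore(text):
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]                # (seq_len, num_labels)
    preds = logits.argmax(-1).tolist()
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
    return [(t, LABELS[p]) for t, p in zip(tokens, preds)]
```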

AraTable: Benchmarking LLMs' Reasoning and Understanding of Arabic Tabular Data

The cognitive and reasoning abilities of large language models (LLMs) have enabled remarkable progress in natural language processing. However, their performance in interpreting structured data, especially in tabular formats, remains limited. Although benchmarks for English tabular data are widely available, Arabic is still underrepresented because of the limited availability of public resources and its unique language features. To address this gap, we present AraTable, a novel and comprehensive benchmark designed to evaluate the reasoning and understanding capabilities of LLMs when applied to Arabic tabular data. AraTable consists of various evaluation tasks, such as direct question answering, fact verification, and complex reasoning, involving a wide range of Arabic tabular sources. Our methodology follows a hybrid pipeline, where initial content is generated by LLMs and subsequently filtered and verified by human experts to ensure high dataset quality. Initial analyses using AraTable show that, while LLMs perform adequately on simpler tabular tasks such as direct question answering, they continue to face significant cognitive challenges when tasks require deeper reasoning and fact verification. This indicates that there are substantial opportunities for future work to improve performance on complex tabular reasoning tasks. We also propose a fully automated evaluation framework that uses a self-deliberation mechanism and achieves performance nearly identical to that of human judges. This research provides a valuable, publicly available resource and evaluation framework that can help accelerate the development of foundational models for processing and analysing Arabic structured data.

Updated: 2025-07-24 14:26:41

标题: AraTable:LLMs对阿拉伯表格数据推理和理解的基准测试

摘要: 大型语言模型(LLMs)的认知和推理能力已经在自然语言处理领域取得了显著进展。然而,它们在解释结构化数据,特别是表格格式方面的性能仍然有限。尽管英语表格数据的基准测试已经广泛可用,但由于公共资源的有限可用性和其独特的语言特征,阿拉伯语仍然是少数代表。为了填补这一空白,我们提出了AraTable,一个旨在评估LLMs应用于阿拉伯语表格数据时的推理和理解能力的新颖而全面的基准。AraTable包括各种评估任务,如直接问题回答、事实验证和复杂推理,涉及各种阿拉伯语表格数据源。我们的方法遵循混合流水线,其中LLMs生成初始内容,随后由人类专家进行过滤和验证,以确保高质量的数据集。使用AraTable进行的初步分析显示,虽然LLMs在简单的表格任务(如直接问题回答)上表现得足够好,但在需要更深层次推理和事实验证的任务中仍然面临重大认知挑战。这表明未来有很大的机会来改善在复杂表格推理任务上的表现。我们还提出了一个完全自动化的评估框架,使用自我思考机制,并实现了几乎与人类评委相同的表现。这项研究提供了一个有价值的、公开可用的资源和评估框架,可以帮助加速开发用于处理和分析阿拉伯语结构化数据的基础模型。

更新时间: 2025-07-24 14:26:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18442v1
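
The self-deliberation evaluator is described only at a high level, so the following is a guess at its shape: `llm` is a hypothetical prompt-to-text callable, and the prompt wording and round count are invented for illustration.

```python
def self_deliberation_judge(llm, question, table, model_answer, rounds=2):
    """llm: hypothetical callable (prompt -> text) standing in for the judge."""
    verdict = llm(f"Table:\n{table}\nQ: {question}\nAnswer: {model_answer}\n"
                  "Is the answer correct? Reply YES or NO with a reason.")
    for _ in range(rounds):
        # The judge critiques and revises its own verdict before finalizing.
        critique = llm(f"Critique this judgment for factual or reasoning errors:\n{verdict}")
        verdict = llm(f"Original judgment:\n{verdict}\nCritique:\n{critique}\n"
                      "Give a revised final verdict: YES or NO with a reason.")
    return verdict.strip().upper().startswith("YES")
```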

Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation. We develop a provably sample-efficient algorithm achieving $\widetilde{O}({C_{\rm cov} H^3}/{\epsilon^2})$ sample complexity, where $C_{\rm cov}$ is the coverability coefficient of the underlying MDP. By leveraging general function approximation, our approach works effectively in large or infinite state spaces where tabular methods fail, requiring only that value functions and reward functions can be represented by appropriate function classes. Our results also characterize when outcome-based feedback is statistically separated from per-step rewards, revealing an unavoidable exponential separation for certain MDPs. For deterministic MDPs, we show how to eliminate the completeness assumption, dramatically simplifying the algorithm. We further extend our approach to preference-based feedback settings, proving that equivalent statistical efficiency can be achieved even under more limited information. Together, these results constitute a theoretical foundation for understanding the statistical properties of outcome-based reinforcement learning.

Updated: 2025-07-24 14:21:12

标题: 基于结果的在线强化学习:算法和基本限制

摘要: 基于结果的反馈的强化学习面临一个根本性挑战:当奖励仅在轨迹端点观察到时,我们如何对正确的行为进行归因?本文首次对在线RL中具有一般函数逼近的这个问题进行了全面分析。我们开发了一个经过证明的样本高效算法,实现了${\widetilde{O}({C_{\rm cov} H^3}/{\epsilon^2})}$的样本复杂度,其中$C_{\rm cov}$是潜在MDP的覆盖系数。通过利用一般函数逼近,我们的方法在大型或无限状态空间中有效工作,而表格方法失败,只需要价值函数和奖励函数可以由适当的函数类表示。我们的结果还表明,当基于结果的反馈在统计上与每步奖励分离时,对于某些MDP来说,存在不可避免的指数分离。对于确定性MDP,我们展示了如何消除完整性假设,从而极大简化了算法。我们进一步将我们的方法扩展到基于偏好的反馈设置,证明了即使在更有限的信息下,也可以实现等效的统计效率。总的来说,这些结果构成了理解基于结果的强化学习的统计特性的理论基础。

更新时间: 2025-07-24 14:21:12

领域: cs.LG,cs.AI,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2505.20268v2

IPCGRL: Language-Instructed Reinforcement Learning for Procedural Level Generation

Recent research has highlighted the significance of natural language in enhancing the controllability of generative models. While various efforts have been made to leverage natural language for content generation, research on deep reinforcement learning (DRL) agents utilizing text-based instructions for procedural content generation remains limited. In this paper, we propose IPCGRL, an instruction-based procedural content generation method via reinforcement learning, which incorporates a sentence embedding model. IPCGRL fine-tunes task-specific embedding representations to effectively compress game-level conditions. We evaluate IPCGRL in a two-dimensional level generation task and compare its performance with a general-purpose embedding method. The results indicate that IPCGRL achieves up to a 21.4% improvement in controllability and a 17.2% improvement in generalizability for unseen instructions. Furthermore, the proposed method extends the modality of conditional input, enabling a more flexible and expressive interaction framework for procedural content generation.

Updated: 2025-07-24 14:14:52

标题: IPCGRL:面向程序级生成的语言指导强化学习

摘要: 最近的研究突显了自然语言在增强生成模型的可控性方面的重要性。虽然已经采取了各种努力利用自然语言进行内容生成的研究,但利用基于文本指令的深度强化学习(DRL)代理进行程序内容生成的研究仍然有限。本文提出了IPCGRL,一种基于指令的程序内容生成方法,通过强化学习结合了句子嵌入模型。IPCGRL对任务特定的嵌入表示进行微调,以有效压缩游戏级别的条件。我们在二维级别生成任务中评估了IPCGRL,并将其性能与通用嵌入方法进行了比较。结果表明,IPCGRL在可控性方面的提高达到了21.4%,在未见指令的情况下泛化性提高了17.2%。此外,提出的方法扩展了条件输入的模态,为程序内容生成提供了更灵活和富有表现力的交互框架。

更新时间: 2025-07-24 14:14:52

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2503.12358v4
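
A sketch of the instruction-conditioning idea: a sentence embedding is compressed by a learned projection and concatenated with the level observation before the policy head. The `all-MiniLM-L6-v2` encoder, the 32-dimensional task embedding, and the network sizes are stand-ins; IPCGRL fine-tunes its own task-specific embedding representations.

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in embedding model

class InstructionConditionedPolicy(nn.Module):
    """Concatenate a compressed instruction embedding with the level observation."""
    def __init__(self, obs_dim, instr_dim=384, task_dim=32, n_actions=5):
        super().__init__()
        self.project = nn.Linear(instr_dim, task_dim)  # task-specific compression
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + task_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs, instruction):
        e = torch.tensor(encoder.encode([instruction]), dtype=torch.float32)
        z = self.project(e)                            # compact task embedding
        return self.policy(torch.cat([obs, z], dim=-1))

# logits = InstructionConditionedPolicy(obs_dim=64)(torch.zeros(1, 64), "add more walls")
```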

NLML-HPE: Head Pose Estimation with Limited Data via Manifold Learning

Head pose estimation (HPE) plays a critical role in various computer vision applications such as human-computer interaction and facial recognition. In this paper, we propose a novel deep learning approach for head pose estimation with limited training data via non-linear manifold learning called NLML-HPE. This method is based on the combination of tensor decomposition (i.e., Tucker decomposition) and feed forward neural networks. Unlike traditional classification-based approaches, our method formulates head pose estimation as a regression problem, mapping input landmarks into a continuous representation of pose angles. To this end, our method uses tensor decomposition to split each Euler angle (yaw, pitch, roll) to separate subspaces and models each dimension of the underlying manifold as a cosine curve. We address two key challenges: 1. Almost all HPE datasets suffer from incorrect and inaccurate pose annotations. Hence, we generated a precise and consistent 2D head pose dataset for our training set by rotating 3D head models for a fixed set of poses and rendering the corresponding 2D images. 2. We achieved real-time performance with limited training data as our method accurately captures the nature of rotation of an object from facial landmarks. Once the underlying manifold for rotation around each axis is learned, the model is very fast in predicting unseen data. Our training and testing code is available online along with our trained models: https://github.com/MahdiGhafoorian/NLML_HPE.

Updated: 2025-07-24 14:08:33

标题: NLML-HPE: 通过流形学习利用有限数据进行头部姿势估计

摘要: 头部姿态估计(HPE)在各种计算机视觉应用中起着关键作用,例如人机交互和面部识别。在本文中,我们提出了一种基于非线性流形学习的新型深度学习方法,用于通过有限训练数据进行头部姿态估计,称为NLML-HPE。该方法基于张量分解(即Tucker分解)和前馈神经网络的组合。与传统的基于分类的方法不同,我们的方法将头部姿态估计表述为回归问题,将输入地标映射为姿态角的连续表示。为此,我们的方法使用张量分解将每个欧拉角(偏航、俯仰、横滚)分割为单独的子空间,并将底层流形的每个维度建模为余弦曲线。我们解决了两个关键挑战:1. 几乎所有HPE数据集都存在不正确和不准确的姿态标注。因此,我们通过旋转3D头部模型以固定姿势集,并渲染相应的2D图像,为我们的训练集生成了精确和一致的2D头部姿态数据集。2. 我们通过有限的训练数据实现了实时性能,因为我们的方法准确捕捉了从面部地标到对象旋转的本质。一旦学习了围绕每个轴旋转的底层流形,模型在预测未见数据时非常快速。我们的训练和测试代码以及我们训练的模型可在线获取:https://github.com/MahdiGhafoorian/NLML_HPE。

更新时间: 2025-07-24 14:08:33

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.18429v1
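
To make the manifold model concrete: each coordinate of the learned subspace is fit as a cosine curve of the pose angle, and prediction inverts the curve. The synthetic coordinate below replaces the Tucker-decomposition factors the paper actually learns, and the arccos inversion shown resolves only the principal branch.

```python
import numpy as np
from scipy.optimize import curve_fit

def cosine_curve(theta, a, omega, phi, b):
    return a * np.cos(omega * theta + phi) + b

# Hypothetical 1-D subspace coordinate sampled over known yaw angles; in the
# paper this coordinate would come from the Tucker factors of rendered poses.
yaw = np.linspace(-90, 90, 37) * np.pi / 180.0
coord = 0.8 * np.cos(yaw + 0.3) + 0.1

params, _ = curve_fit(cosine_curve, yaw, coord, p0=(1.0, 1.0, 0.0, 0.0))

def coord_to_angle(c, a, omega, phi, b):
    # Principal branch only; disambiguating the sign needs a second coordinate.
    return (np.arccos(np.clip((c - b) / a, -1.0, 1.0)) - phi) / omega

print(coord_to_angle(coord[5], *params))   # pose angle in radians, up to branch
```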

How do language models learn facts? Dynamics, curricula and hallucinations

Large language models accumulate vast knowledge during pre-training, yet the dynamics governing this acquisition remain poorly understood. This work investigates the learning dynamics of language models on a synthetic factual recall task, uncovering three key findings: First, language models learn in three phases, exhibiting a performance plateau before acquiring precise factual knowledge. Mechanistically, this plateau coincides with the formation of attention-based circuits that support recall. Second, the training data distribution significantly impacts learning dynamics, as imbalanced distributions lead to shorter plateaus. Finally, hallucinations emerge simultaneously with knowledge, and integrating new knowledge into the model through fine-tuning is challenging, as it quickly corrupts its existing parametric memories. Our results emphasize the importance of data distribution in knowledge acquisition and suggest novel data scheduling strategies to accelerate neural network training.

Updated: 2025-07-24 14:04:20

标题: 语言模型是如何学习事实的?动态、课程和幻觉

摘要: 大型语言模型在预训练期间积累了大量知识,然而控制这种获取过程的动态仍然知之甚少。本研究调查了语言模型在一个合成的事实回忆任务中的学习动态,揭示了三个关键发现:首先,语言模型在三个阶段学习,在获得精确的事实知识之前出现性能平台。从机制上看,这个平台与支持回忆的基于注意力的电路的形成相一致。其次,训练数据的分布显著影响学习动态,不平衡的分布导致平台时间较短。最后,幻觉与知识同时出现,通过微调将新知识整合到模型中具有挑战性,因为这很快会破坏其现有的参数记忆。我们的结果强调了数据分布在知识获取中的重要性,并建议新颖的数据调度策略以加速神经网络训练。

更新时间: 2025-07-24 14:04:20

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2503.21676v2

LLM-D12: A Dual-Dimensional Scale of Instrumental and Relational Dependencies on Large Language Models

There is growing interest in understanding how people interact with large language models (LLMs) and whether such models elicit dependency or even addictive behaviour. Validated tools to assess the extent to which individuals may become dependent on LLMs are scarce and primarily build on classic behavioral addiction symptoms, adapted to the context of LLM use. We view this as a conceptual limitation, as the LLM-human relationship is more nuanced and warrants a fresh and distinct perspective. To address this gap, we developed and validated a new 12-item questionnaire to measure LLM dependency, referred to as LLM-D12. The scale was based on the authors' prior theoretical work, with items developed accordingly and responses collected from 526 participants in the UK. Exploratory and confirmatory factor analyses, performed on separate halves of the total sample using a split-sample approach, supported a two-factor structure: Instrumental Dependency (six items) and Relationship Dependency (six items). Instrumental Dependency reflects the extent to which individuals rely on LLMs to support or collaborate in decision-making and cognitive tasks. Relationship Dependency captures the tendency to perceive LLMs as socially meaningful, sentient, or companion-like entities. The two-factor structure demonstrated excellent internal consistency and clear discriminant validity. External validation confirmed both the conceptual foundation and the distinction between the two subscales. The psychometric properties and structure of our LLM-D12 scale were interpreted in light of the emerging view that dependency on LLMs does not necessarily indicate dysfunction but may still reflect reliance levels that could become problematic in certain contexts.

Updated: 2025-07-24 14:00:31

标题: LLM-D12:大型语言模型中的工具性和关系性依赖的双维度量表

摘要: 越来越多的人对于人们如何与大型语言模型(LLMs)进行互动以及这些模型是否引起依赖甚至成瘾行为感兴趣。目前缺乏经过验证的工具来评估个体可能对LLMs产生依赖的程度,主要建立在经典的行为成瘾症状基础上,适应LLM使用的情境。我们认为这是一个概念上的局限,因为LLM与人类之间的关系更加微妙,需要一种新鲜而独特的视角。为了填补这一空白,我们开发并验证了一个新的12项问卷,用于衡量LLM依赖,称为LLM-D12。该量表基于作者先前的理论工作,相应地开发了项目,并从英国的526名参与者中收集了回应。在使用分样本方法对总样本的两半进行的探索性和验证性因素分析支持了一个两因素结构:工具依赖(六个项目)和关系依赖(六个项目)。工具依赖反映了个体依赖LLMs来支持或合作进行决策和认知任务的程度。关系依赖捕捉了倾向于将LLMs视为社交有意义的、有感情的或类似伴侣的实体的倾向。这两因素结构表现出优秀的内部一致性和明确的差异效度。外部验证证实了概念基础和两个子量表之间的区别。我们的LLM-D12量表的心理测量特性和结构被解释为在新兴观点的光下,对LLMs的依赖不一定意味着功能失调,但在某些情境下仍可能反映出可能成为问题的依赖水平。

更新时间: 2025-07-24 14:00:31

领域: cs.HC,cs.AI,Human-Centered Computing --> Human computer interaction (HCI) --> HCI design and evaluation methods

下载: http://arxiv.org/abs/2506.06874v3

Multi-Model Ensemble and Reservoir Computing for River Discharge Prediction in Ungauged Basins

Despite the critical need for accurate flood prediction and water management, many regions lack sufficient river discharge observations, limiting the skill of rainfall-runoff analyses. Although numerous physically based and machine learning models exist, achieving high accuracy, interpretability, and computational efficiency under data-scarce conditions remains a major challenge. We address this challenge with a novel method, HYdrological Prediction with multi-model Ensemble and Reservoir computing (HYPER) that leverages multi-model ensemble and reservoir computing (RC). Our approach first applies Bayesian model averaging (BMA) to 43 "uncalibrated" catchment-based conceptual hydrological models. An RC model is then trained via linear regression to correct errors in the BMA output, a non-iterative process that ensures high computational efficiency. For ungauged basins, we infer the required BMA and RC weights by linking them to catchment attributes from gauged basins, creating a generalizable framework. We evaluated HYPER using data from 87 river basins in Japan. In a data-rich scenario, HYPER (median Kling-Gupta Efficiency, KGE, of 0.56) performed comparably to a benchmark LSTM (KGE 0.55) but required only 5% of its computational time. In a data-scarce scenario (23% of basins gauged), HYPER maintained robust performance (KGE 0.55) and lower uncertainty, whereas the LSTM's performance degraded significantly (KGE -0.04). These results reveal that individual conceptual hydrological models do not necessarily need to be calibrated when an effectively large ensemble is assembled and combined with machine-learning-based bias correction. HYPER provides a robust, efficient, and generalizable solution for discharge prediction, particularly in ungauged basins, making it applicable to a wide range of regions.

Updated: 2025-07-24 14:00:18

标题: 多模型集成和储备池计算在未测流域河流流量预测中的应用

摘要: 尽管准确的洪水预测和水资源管理至关重要,但许多地区缺乏足够的河流流量观测数据,限制了降雨径流分析的准确性。尽管存在许多基于物理和机器学习模型,但在数据稀缺条件下实现高准确性、可解释性和计算效率仍然是一项主要挑战。我们通过一种新颖的方法 HYdrological Prediction with multi-model Ensemble and Reservoir computing (HYPER) 来应对这一挑战,该方法利用多模型集成和储备池计算(RC)。我们的方法首先对43个“未校准”的基于集水区的概念性水文模型应用贝叶斯模型平均(BMA)。然后通过线性回归训练一个 RC 模型来纠正 BMA 输出中的错误,这是一个非迭代过程,确保了高计算效率。对于未测流域,我们通过将其与已测流域的集水区属性联系起来来推断所需的 BMA 和 RC 权重,从而创建一个可推广的框架。我们使用日本87个河流流域的数据评估了 HYPER。在数据丰富的情况下,HYPER(中位 Kling-Gupta 效率,KGE,为0.56)表现与基准 LSTM 相当(KGE 0.55),但只需其计算时间的5%。在数据稀缺情况下(23%的流域已测量),HYPER 保持了强大的性能(KGE 0.55)和更低的不确定性,而 LSTM 的性能明显下降(KGE -0.04)。这些结果显示,当有效大集成与基于机器学习的偏差校正结合时,个体的概念性水文模型不一定需要校准。HYPER 为流量预测提供了一个强大、高效和可推广的解决方案,特别适用于未测流域,使其适用于广泛的地区。

更新时间: 2025-07-24 14:00:18

领域: cs.LG,physics.geo-ph

下载: http://arxiv.org/abs/2507.18423v1
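
An end-to-end toy of the HYPER recipe on synthetic data: likelihood-based BMA weights over ensemble simulations, then an echo-state reservoir whose linear readout is fit by least squares to the BMA residual (the non-iterative step the abstract highlights). The Gaussian likelihood, reservoir sizes, and data are all assumptions.

```python
import numpy as np

def bma_weights(preds, obs, sigma=1.0):
    """Weight ensemble members by a Gaussian likelihood of their past errors."""
    ll = -0.5 * np.mean((preds - obs[None, :]) ** 2, axis=1) / sigma**2
    w = np.exp(ll - ll.max())
    return w / w.sum()

def reservoir_states(u, n_res=200, rho=0.9, seed=0):
    """Echo-state reservoir driven by the BMA discharge estimate."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, n_res)
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius
    x, states = np.zeros(n_res), []
    for t in range(len(u)):
        x = np.tanh(W_in * u[t] + W @ x)
        states.append(x.copy())
    return np.array(states)

# preds: (n_models, T) ensemble simulations; q_obs: (T,) observed discharge.
rng = np.random.default_rng(1)
q_obs = 5 + np.sin(np.arange(500) / 20.0)
preds = q_obs[None, :] + rng.normal(0, 0.5, (43, 500))

w = bma_weights(preds, q_obs)
q_bma = w @ preds
S = reservoir_states(q_bma)
readout, *_ = np.linalg.lstsq(S, q_obs - q_bma, rcond=None)  # non-iterative fit
q_hyper = q_bma + S @ readout                                # corrected discharge
```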

Residual Prior-driven Frequency-aware Network for Image Fusion

Image fusion aims to integrate complementary information across modalities to generate high-quality fused images, thereby enhancing the performance of high-level vision tasks. While global spatial modeling mechanisms show promising results, constructing long-range feature dependencies in the spatial domain incurs substantial computational costs. Additionally, the absence of ground-truth exacerbates the difficulty of capturing complementary features effectively. To tackle these challenges, we propose a Residual Prior-driven Frequency-aware Network, termed as RPFNet. Specifically, RPFNet employs a dual-branch feature extraction framework: the Residual Prior Module (RPM) extracts modality-specific difference information from residual maps, thereby providing complementary priors for fusion; the Frequency Domain Fusion Module (FDFM) achieves efficient global feature modeling and integration through frequency-domain convolution. Additionally, the Cross Promotion Module (CPM) enhances the synergistic perception of local details and global structures through bidirectional feature interaction. During training, we incorporate an auxiliary decoder and saliency structure loss to strengthen the model's sensitivity to modality-specific differences. Furthermore, a combination of adaptive weight-based frequency contrastive loss and SSIM loss effectively constrains the solution space, facilitating the joint capture of local details and global features while ensuring the retention of complementary information. Extensive experiments validate the fusion performance of RPFNet, which effectively integrates discriminative features, enhances texture details and salient objects, and can effectively facilitate the deployment of the high-level vision task.

Updated: 2025-07-24 13:57:08

标题: 基于残差先验驱动的频率感知图像融合网络

摘要: 图像融合旨在整合跨模态的互补信息,生成高质量的融合图像,从而提高高级视觉任务的性能。虽然全局空间建模机制表现出有希望的结果,但在空间域构建长距离特征依赖会产生大量的计算成本。此外,缺乏地面真实数据使得捕捉互补特征变得更加困难。为了应对这些挑战,我们提出了一种名为RPFNet的残差先验驱动频率感知网络。具体来说,RPFNet采用双分支特征提取框架:残差先验模块(RPM)从残差图中提取模态特定的差异信息,从而为融合提供互补先验;频域融合模块(FDFM)通过频域卷积实现高效的全局特征建模和集成。此外,交叉促进模块(CPM)通过双向特征交互增强了局部细节和全局结构的协同感知。在训练过程中,我们加入了辅助解码器和显著性结构损失,以增强模型对模态特定差异的敏感性。此外,自适应基于权重的频率对比损失和SSIM损失的组合有效限制了解空间,促进了对局部细节和全局特征的联合捕获,同时确保了互补信息的保留。大量实验证实了RPFNet的融合性能,有效整合了具有区分性的特征,增强了纹理细节和显著对象,并且可以有效促进高级视觉任务的部署。

更新时间: 2025-07-24 13:57:08

领域: cs.CV,cs.LG,cs.MM

下载: http://arxiv.org/abs/2507.06735v2
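
One plausible reading of the frequency-domain fusion step in PyTorch: transform both modalities with an FFT, mix them with learned per-channel complex filters (a pointwise product in frequency equals a global convolution in space), and transform back. Shapes and the filter parameterization are assumptions, not the released module.

```python
import torch
import torch.nn as nn

class FrequencyDomainFusion(nn.Module):
    """Toy FDFM reading: mix two modalities with learned complex filters."""
    def __init__(self, channels):
        super().__init__()
        # One learnable complex weight per channel, stored as (real, imag).
        self.w_a = nn.Parameter(torch.ones(channels, 1, 1, 2))
        self.w_b = nn.Parameter(torch.ones(channels, 1, 1, 2))

    def forward(self, a, b):
        Fa = torch.fft.rfft2(a)                    # global receptive field in one op
        Fb = torch.fft.rfft2(b)
        wa = torch.view_as_complex(self.w_a)
        wb = torch.view_as_complex(self.w_b)
        fused = wa * Fa + wb * Fb                  # pointwise product = global conv
        return torch.fft.irfft2(fused, s=a.shape[-2:])

# out = FrequencyDomainFusion(16)(torch.randn(2, 16, 64, 64), torch.randn(2, 16, 64, 64))
```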

FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs

Opinions expressed in online finance-related textual data are having an increasingly profound impact on trading decisions and market movements. This trend highlights the vital role of sentiment analysis as a tool for quantifying the nature and strength of such opinions. With the rapid development of Generative AI (GenAI), supervised fine-tuned (SFT) large language models (LLMs) have become the de facto standard for financial sentiment analysis. However, the SFT paradigm can lead to memorization of the training data and often fails to generalize to unseen samples. This is a critical limitation in financial domains, where models must adapt to previously unobserved events and the nuanced, domain-specific language of finance. To this end, we introduce FinDPO, the first finance-specific LLM framework based on post-training human preference alignment via Direct Preference Optimization (DPO). The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks, outperforming existing supervised fine-tuned models by 11% on the average. Uniquely, the FinDPO framework enables the integration of a fine-tuned causal LLM into realistic portfolio strategies through a novel 'logit-to-score' conversion, which transforms discrete sentiment predictions into continuous, rankable sentiment scores (probabilities). In this way, simulations demonstrate that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance, as indicated by a Sharpe ratio of 2.0, even under realistic transaction costs of 5 basis points (bps).

Updated: 2025-07-24 13:57:05

标题: FinDPO:通过LLMs的偏好优化进行算法交易的金融情绪分析

摘要: 在线金融相关文本数据中表达的观点越来越深刻地影响着交易决策和市场走势。这一趋势突出了情感分析作为量化这些观点性质和强度的工具的重要作用。随着生成式人工智能(GenAI)的快速发展,监督微调(SFT)大型语言模型(LLMs)已成为金融情感分析的事实标准。然而,SFT范式可能导致对训练数据的记忆,并且通常无法泛化到未见样本。这在金融领域是一个关键限制,模型必须适应以前未观察到的事件和金融领域特定语言的微妙之处。为此,我们介绍了FinDPO,基于通过直接偏好优化(DPO)进行后训练的首个金融特定LLM框架。所提出的FinDPO在标准情感分类基准上实现了最先进的性能,平均超过现有的监督微调模型11%。独特的是,FinDPO框架通过一种新颖的“对数到分数”转换,将离散情感预测转换为连续的、可排名的情感分数(概率),从而能够将微调的因果LLM整合到现实的组合策略中。通过模拟,证明了FinDPO是第一个基于情感的方法,即使在5个基点(bps)的现实交易成本下,也能维持每年67%的显著正收益和强劲的风险调整性能,表现为2.0的夏普比率。

更新时间: 2025-07-24 13:57:05

领域: cs.CL,cs.LG,q-fin.ST,q-fin.TR

下载: http://arxiv.org/abs/2507.18417v1
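
The 'logit-to-score' conversion is simple enough to state directly: softmax the label logits and take a signed difference of class probabilities, giving a continuous, rankable sentiment score. The label-index layout below is an assumption.

```python
import torch

def logit_to_score(logits, pos_idx=0, neg_idx=1):
    """Map discrete sentiment logits to a continuous score in [-1, 1].

    `logits` holds unnormalized scores for the candidate labels (e.g.
    positive / negative / neutral); the index layout is assumed here.
    """
    p = torch.softmax(logits, dim=-1)
    return p[..., pos_idx] - p[..., neg_idx]

scores = logit_to_score(torch.tensor([[2.1, 0.3, 0.5],
                                      [0.2, 1.9, 0.4]]))
ranked = torch.argsort(scores, descending=True)   # rank assets by sentiment
```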

GPU Accelerated Compact-Table Propagation

Constraint Programming developed within Logic Programming in the Eighties; nowadays all Prolog systems encompass modules capable of handling constraint programming on finite domains demanding their solution to a constraint solver. This work focuses on a specific form of constraint, the so-called table constraint, used to specify conditions on the values of variables as an enumeration of alternative options. Since every condition on a set of finite domain variables can be ultimately expressed as a finite set of cases, Table can, in principle, simulate any other constraint. These characteristics make Table one of the most studied constraints ever, leading to a series of increasingly efficient propagation algorithms. Despite this, it is not uncommon to encounter real-world problems with hundreds or thousands of valid cases that are simply too many to be handled effectively with standard CPU-based approaches. In this paper, we deal with the Compact-Table (CT) algorithm, the state-of-the-art propagation algorithms for Table. We describe how CT can be enhanced by exploiting the massive computational power offered by modern GPUs to handle large Table constraints. In particular, we report on the design and implementation of GPU-accelerated CT, on its integration into an existing constraint solver, and on an experimental validation performed on a significant set of instances.

Updated: 2025-07-24 13:53:49

标题: GPU加速的紧凑表传播

摘要: 约束编程在80年代发展并融入逻辑编程中;如今所有Prolog系统都包含能够处理有限域约束编程的模块,要求它们解决约束求解器的问题。本文关注一种特定形式的约束,即所谓的表约束,用于指定变量值的条件,作为替代选项的枚举。由于有限域变量集合上的每个条件最终可以表示为有限数量的案例,因此表原则上可以模拟任何其他约束。这些特性使表成为最受研究的约束之一,导致一系列越来越有效的传播算法。尽管如此,在现实世界中遇到具有数百或数千个有效案例的问题并不罕见,这些案例对于使用标准基于CPU的方法来处理实际上太多了。在本文中,我们讨论了Compact-Table(CT)算法,这是表的最新传播算法。我们描述了如何通过利用现代GPU提供的大量计算能力来增强CT,以处理大型表约束。具体地,我们报告了GPU加速CT的设计和实施,以及其整合到现有约束求解器中,并对一组重要实例进行了实验验证。

更新时间: 2025-07-24 13:53:49

领域: cs.AI

下载: http://arxiv.org/abs/2507.18413v1
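
A plain-NumPy rendition of what one Compact-Table propagation pass computes: kill tuples that use pruned values, then keep only domain values that still have a supporting live tuple. CT proper maintains the validity set as a reversible sparse bitset, and the GPU version maps these sweeps onto parallel bitwise kernels; this sketch only mirrors the logic.

```python
import numpy as np

def compact_table_filter(table, domains):
    """table: (n_tuples, n_vars) allowed value combinations;
    domains: list of boolean arrays, domains[i][v] True iff value v is live."""
    alive = np.ones(len(table), dtype=bool)        # stands in for CT's bitset
    for i, dom in enumerate(domains):
        alive &= dom[table[:, i]]                  # drop tuples using pruned values
    filtered = table[alive]
    new_domains = []
    for i, dom in enumerate(domains):
        support = np.zeros_like(dom)
        support[filtered[:, i]] = True             # values with a live support
        new_domains.append(dom & support)
    return alive, new_domains

table = np.array([[0, 1], [1, 0], [2, 2]])
domains = [np.array([True, True, False]), np.array([True, True, True])]
alive, new_domains = compact_table_filter(table, domains)
# Tuple (2, 2) dies because value 2 was pruned from the first variable, so
# value 2 also disappears from the second variable's domain.
```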

An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis

Legal dispute analysis is crucial for intelligent legal assistance systems. However, current LLMs face significant challenges in understanding complex legal concepts, maintaining reasoning consistency, and accurately citing legal sources. This research presents a framework combining prompt engineering with multidimensional knowledge graphs to improve LLMs' legal dispute analysis. Specifically, the framework includes a three-stage hierarchical prompt structure (task definition, knowledge background, reasoning guidance) along with a three-layer knowledge graph (legal ontology, representation, instance layers). Additionally, four supporting methods enable precise legal concept retrieval: direct code matching, semantic vector similarity, ontology path reasoning, and lexical segmentation. Through extensive testing, results show major improvements: sensitivity increased by 9.9%-13.8%, specificity by 4.8%-6.7%, and citation accuracy by 22.4%-39.7%. As a result, the framework provides better legal analysis and understanding of judicial logic, thus offering a new technical method for intelligent legal assistance systems.

Updated: 2025-07-24 13:52:51

标题: 一个融合了提示工程和多维知识图谱的法律纠纷分析综合框架

摘要: 法律纠纷分析对智能法律辅助系统至关重要。然而,当前的LLMs在理解复杂的法律概念、保持推理一致性和准确引用法律来源方面面临重大挑战。本研究提出了一个将提示工程与多维知识图结合起来以改进LLMs法律纠纷分析的框架。具体来说,该框架包括一个三阶段的层次提示结构(任务定义、知识背景、推理指导)以及一个三层知识图(法律本体论、表示、实例层)。此外,四种支持方法使得精确的法律概念检索成为可能:直接代码匹配、语义向量相似度、本体路径推理和词法分割。通过广泛的测试,结果显示主要的改进:敏感性提高了9.9%-13.8%,特异性提高了4.8%-6.7%,引文准确性提高了22.4%-39.7%。因此,该框架提供了更好的法律分析和对司法逻辑的理解,从而为智能法律辅助系统提供了一种新的技术方法。

更新时间: 2025-07-24 13:52:51

领域: cs.AI,68T50, 68T30, 91F20,I.2.7; I.2.4; K.5.1; H.3.3

下载: http://arxiv.org/abs/2507.07893v3
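
A small illustration of how the three-stage hierarchical prompt could be assembled around knowledge-graph facts; the section wording, field names, and the cited provision are invented for the example, not taken from the paper.

```python
def build_legal_prompt(dispute, kg_facts):
    """Three-stage hierarchical prompt: task definition, knowledge background,
    reasoning guidance. All strings are illustrative."""
    task = "You are a legal analyst. Classify the dispute and cite governing provisions."
    background = "Relevant legal knowledge:\n" + "\n".join(
        f"- {f['concept']}: {f['provision']}" for f in kg_facts
    )
    guidance = ("Reason step by step: (1) identify the legal relationship, "
                "(2) match facts to statutory elements, (3) cite sources for each claim.")
    return "\n\n".join([task, background, guidance, f"Dispute:\n{dispute}"])

prompt = build_legal_prompt(
    "Tenant withheld rent after landlord failed to repair heating.",
    [{"concept": "lease obligations", "provision": "Civil Code art. 708 (illustrative)"}],
)
```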

ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models

Large Language Models (LLMs) are increasingly used in tasks requiring interpretive and inferential accuracy. In this paper, we introduce ExpliCa, a new dataset for evaluating LLMs in explicit causal reasoning. ExpliCa uniquely integrates both causal and temporal relations presented in different linguistic orders and explicitly expressed by linguistic connectives. The dataset is enriched with crowdsourced human acceptability ratings. We tested LLMs on ExpliCa through prompting and perplexity-based metrics. We assessed seven commercial and open-source LLMs, revealing that even top models struggle to reach 0.80 accuracy. Interestingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events. Finally, perplexity-based scores and prompting performance are differently affected by model size.

Updated: 2025-07-24 13:47:56

标题: ExpliCa:评估大型语言模型中的显式因果推理

摘要: 大型语言模型(LLM)越来越多地用于需要解释和推理准确性的任务。在本文中,我们介绍了ExpliCa,这是一个用于评估LLM在显式因果推理中的新数据集。ExpliCa独特地整合了以不同语言顺序呈现的因果和时间关系,并由语言连接词明确表达。该数据集还附有众包人类可接受性评分。我们通过提示和困惑度指标在ExpliCa上测试了LLM。我们评估了七个商业和开源LLM,发现即使是顶级模型也很难达到0.80的准确性。有趣的是,模型倾向于混淆时间关系和因果关系,而他们的性能也受事件的语言顺序的强烈影响。最后,困惑度得分和提示性能受模型大小的影响不同。

更新时间: 2025-07-24 13:47:56

领域: cs.CL,cs.AI,68T50, 68T07,I.2.7

下载: http://arxiv.org/abs/2502.15487v3
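
The perplexity-based metric reduces to scoring each sentence variant with a causal LM; a standard Transformers sketch follows, with GPT-2 as a stand-in for the models the paper actually assesses.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # any causal LM works here
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss           # mean next-token NLL
    return torch.exp(loss).item()

# Lower perplexity on the plausibly ordered causal variant suggests the model
# prefers it, e.g.:
ppl_ok = perplexity("It rained, so the road got wet.")
ppl_bad = perplexity("The road got wet, so it rained.")
```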

Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows

We introduce Iwin Transformer, a novel position-embedding-free hierarchical vision transformer, which can be fine-tuned directly from low to high resolution, through the collaboration of innovative interleaved window attention and depthwise separable convolution. This approach uses attention to connect distant tokens and applies convolution to link neighboring tokens, enabling global information exchange within a single module, overcoming Swin Transformer's limitation of requiring two consecutive blocks to approximate global attention. Extensive experiments on visual benchmarks demonstrate that Iwin Transformer exhibits strong competitiveness in tasks such as image classification (87.4 top-1 accuracy on ImageNet-1K), semantic segmentation and video action recognition. We also validate the effectiveness of the core component in Iwin as a standalone module that can seamlessly replace the self-attention module in class-conditional image generation. The concepts and methods introduced by the Iwin Transformer have the potential to inspire future research, like Iwin 3D Attention in video generation. The code and models are available at https://github.com/cominder/Iwin-Transformer.

Updated: 2025-07-24 13:45:48

标题: Iwin Transformer:使用交错窗口的分层视觉变换器

摘要: 我们介绍了Iwin Transformer,这是一种新颖的无位置嵌入的分层视觉Transformer,可以直接从低分辨率微调到高分辨率,通过创新的交错窗口注意力和深度可分离卷积的协作。这种方法利用注意力来连接遥远的令牌,并应用卷积来连接相邻的令牌,在单个模块内实现全局信息交换,克服了Swin Transformer需要两个连续块来近似全局注意力的限制。在视觉基准测试中进行了大量实验,结果表明Iwin Transformer在图像分类(ImageNet-1K上的87.4% top-1准确率)、语义分割和视频动作识别等任务中表现出强大的竞争力。我们还验证了Iwin中核心组件作为独立模块的有效性,可以无缝地替换类条件图像生成中的自注意力模块。Iwin Transformer引入的概念和方法有潜力激发未来研究,如视频生成中的Iwin 3D注意力。代码和模型可在https://github.com/cominder/Iwin-Transformer上找到。

更新时间: 2025-07-24 13:45:48

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.18405v1
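
One plausible reading of interleaved window partitioning (a sketch, not the released code): tokens at the same offset modulo the interleave factor share a window, so each window's members are spaced across the whole feature map and attention inside it connects distant positions.

```python
import torch

def interleaved_window_partition(x, n):
    """x: (B, H, W, C); tokens whose row/col indices are congruent modulo `n`
    land in the same window, so one window spans the full image."""
    B, H, W, C = x.shape
    x = x.view(B, H // n, n, W // n, n, C)
    # bring the interleave offsets forward: n*n windows of (H/n)*(W/n) tokens
    x = x.permute(0, 2, 4, 1, 3, 5).contiguous()
    return x.view(B * n * n, (H // n) * (W // n), C)

tokens = interleaved_window_partition(torch.randn(2, 8, 8, 16), n=2)
# tokens: (8, 16, 16) -- attention in each window now links distant pixels
```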

Optimising Call Centre Operations using Reinforcement Learning: Value Iteration versus Proximal Policy Optimisation

This paper investigates the application of Reinforcement Learning (RL) to optimise call routing in call centres to minimise client waiting time and staff idle time. Two methods are compared: a model-based approach using Value Iteration (VI) under known system dynamics, and a model-free approach using Proximal Policy Optimisation (PPO) that learns from experience. For the model-based approach, a theoretical model is used, while a simulation model combining Discrete Event Simulation (DES) with the OpenAI Gym environment is developed for model-free learning. Both models frame the problem as a Markov Decision Process (MDP) within a Skills-Based Routing (SBR) framework, with Poisson client arrivals and exponentially distributed service and abandonment times. Random, VI, and PPO policies are then evaluated using the simulation model. After 1,000 test episodes, PPO consistently achieves the highest rewards, along with the lowest client waiting time and staff idle time, despite requiring longer training time.

Updated: 2025-07-24 13:31:38

标题: 使用强化学习优化呼叫中心运营:值迭代与近端策略优化对比分析

摘要: 本文研究了强化学习(RL)在呼叫中心中应用于优化呼叫路由以最小化客户等待时间和员工空闲时间。比较了两种方法:一种是基于已知系统动态的模型为基础的方法,使用值迭代(VI),另一种是基于经验学习的模型无关方法,使用Proximal Policy Optimisation(PPO)。对于基于模型的方法,使用了一个理论模型,而对于无模型学习,开发了一个将离散事件模拟(DES)与OpenAI Gym环境相结合的模拟模型。两种模型将问题构建为技能路由(SBR)框架中的马尔可夫决策过程(MDP),其中客户到达为泊松过程,服务和放弃时间为指数分布。通过模拟模型评估了随机、VI和PPO策略。在经过1,000个测试周期后,PPO始终获得最高奖励,同时具有最低的客户等待时间和员工空闲时间,尽管需要更长的训练时间。

更新时间: 2025-07-24 13:31:38

领域: cs.AI

下载: http://arxiv.org/abs/2507.18398v1
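
The model-based branch is standard value iteration, which is compact enough to show in full; the two-state MDP below is an invented miniature of the skills-based-routing problem, not the paper's model.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """P[a, s, s']: transition probabilities; R[s, a]: expected one-step reward
    (e.g. negative waiting plus idle cost in a routing MDP)."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum("ast,t->sa", P, V)   # one Bellman backup
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)             # values, greedy policy
        V = V_new

# Tiny illustrative MDP: 2 states (queue light / heavy), 2 routing actions.
P = np.array([[[0.9, 0.1], [0.4, 0.6]],
              [[0.7, 0.3], [0.2, 0.8]]])
R = np.array([[0.0, -0.2], [-1.0, -0.5]])
V, policy = value_iteration(P, R)
```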

EndoControlMag: Robust Endoscopic Vascular Motion Magnification with Periodic Reference Resetting and Hierarchical Tissue-aware Dual-Mask Control

Visualizing subtle vascular motions in endoscopic surgery is crucial for surgical precision and decision-making, yet remains challenging due to the complex and dynamic nature of surgical scenes. To address this, we introduce EndoControlMag, a training-free, Lagrangian-based framework with mask-conditioned vascular motion magnification tailored to endoscopic environments. Our approach features two key modules: a Periodic Reference Resetting (PRR) scheme that divides videos into short overlapping clips with dynamically updated reference frames to prevent error accumulation while maintaining temporal coherence, and a Hierarchical Tissue-aware Magnification (HTM) framework with dual-mode mask dilation. HTM first tracks vessel cores using a pretrained visual tracking model to maintain accurate localization despite occlusions and view changes. It then applies one of two adaptive softening strategies to surrounding tissues: motion-based softening that modulates magnification strength proportional to observed tissue displacement, or distance-based exponential decay that simulates biomechanical force attenuation. This dual-mode approach accommodates diverse surgical scenarios-motion-based softening excels with complex tissue deformations while distance-based softening provides stability during unreliable optical flow conditions. We evaluate EndoControlMag on our EndoVMM24 dataset spanning four different surgery types and various challenging scenarios, including occlusions, instrument disturbance, view changes, and vessel deformations. Quantitative metrics, visual assessments, and expert surgeon evaluations demonstrate that EndoControlMag significantly outperforms existing methods in both magnification accuracy and visual quality while maintaining robustness across challenging surgical conditions. The code, dataset, and video results are available at https://szupc.github.io/EndoControlMag/.

Updated: 2025-07-24 13:26:19

标题: EndoControlMag:具有周期性参考重置和分层组织感知双蒙版控制的鲁棒内窥镜血管运动放大

摘要: 在内窥镜手术中,可视化微小血管动作对手术精度和决策至关重要,但由于手术场景的复杂性和动态性,仍然具有挑战性。为了解决这个问题,我们引入了EndoControlMag,这是一个无需训练的、基于拉格朗日的框架,具有针对内窥镜环境量身定制的面具条件的血管运动放大功能。我们的方法包括两个关键模块:一个周期性参考重置(PRR)方案,将视频分成短重叠片段,并动态更新参考帧,以防止误差积累,同时保持时间上的一致性;以及一个具有双模面具膨胀的分层组织感知放大(HTM)框架。HTM首先使用预训练的视觉跟踪模型跟踪血管核心,以维持准确的定位,尽管遮挡和视角变化。然后,它应用两种自适应软化策略之一到周围组织:基于运动的软化根据观察到的组织位移调节放大强度,或者基于距离的指数衰减,模拟生物力学力量的衰减。这种双模式方法适应不同的手术场景-基于运动的软化在复杂组织变形时表现出色,而基于距离的软化在不可靠的光流条件下提供稳定性。我们在覆盖四种不同手术类型和各种具有挑战性的情况下的EndoVMM24数据集上评估了EndoControlMag,包括遮挡、仪器干扰、视角变化和血管变形。定量指标、视觉评估和专家外科医生评估表明,EndoControlMag在放大精度和视觉质量上明显优于现有方法,同时在具有挑战性的手术条件下保持稳健性。代码、数据集和视频结果可在https://szupc.github.io/EndoControlMag/上获得。

更新时间: 2025-07-24 13:26:19

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.15292v4
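
The distance-based softening mode has a direct closed form: magnification decays exponentially with distance from the tracked vessel core. A sketch, assuming SciPy's Euclidean distance transform and invented `alpha`/`tau` constants:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def magnification_weights(vessel_mask, alpha=10.0, tau=15.0):
    """Full magnification on the vessel core, exponential attenuation with
    distance into surrounding tissue (the biomechanics-inspired mode)."""
    d = distance_transform_edt(~vessel_mask)   # per-pixel distance to the vessel
    return alpha * np.exp(-d / tau)

mask = np.zeros((64, 64), dtype=bool)
mask[30:34, :] = True                          # crude vessel strip for the demo
w = magnification_weights(mask)
# magnified_motion = tracked_motion * w[..., None]  (applied per pixel)
```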

Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder

Text-to-Image (T2I) Diffusion Models have achieved remarkable performance in generating high quality images. However, enabling precise control of continuous attributes, especially multiple attributes simultaneously, in a new domain (e.g., numeric values like eye openness or car width) with text-only guidance remains a significant challenge. To address this, we introduce the Attribute (Att) Adapter, a novel plug-and-play module designed to enable fine-grained, multi-attributes control in pretrained diffusion models. Our approach learns a single control adapter from a set of sample images that can be unpaired and contain multiple visual attributes. The Att-Adapter leverages the decoupled cross attention module to naturally harmonize the multiple domain attributes with text conditioning. We further introduce Conditional Variational Autoencoder (CVAE) to the Att-Adapter to mitigate overfitting, matching the diverse nature of the visual world. Evaluations on two public datasets show that Att-Adapter outperforms all LoRA-based baselines in controlling continuous attributes. Additionally, our method enables a broader control range and also improves disentanglement across multiple attributes, surpassing StyleGAN-based techniques. Notably, Att-Adapter is flexible, requiring no paired synthetic data for training, and is easily scalable to multiple attributes within a single model.

Updated: 2025-07-24 13:24:21

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.11937v4

CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

The evaluation of Large Language Models (LLMs) increasingly relies on other LLMs acting as judges. However, current evaluation paradigms typically yield a single score or ranking, answering which model is better but not why. While essential for benchmarking, these top-level scores obscure the specific, actionable reasons behind a model's performance. To bridge this gap, we introduce CLEAR, an interactive, open-source package for LLM-based error analysis. CLEAR first generates per-instance textual feedback, then it creates a set of system-level error issues, and quantifies the prevalence of each identified issue. Our package also provides users with an interactive dashboard that allows for a comprehensive error analysis through aggregate visualizations, applies interactive filters to isolate specific issues or score ranges, and drills down to the individual instances that exemplify a particular behavioral pattern. We demonstrate CLEAR analysis for RAG and Math benchmarks, and showcase its utility through a user case study.
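
The three-stage flow (per-instance feedback, issue discovery, prevalence counting) can be approximated as follows; judge stands in for an LLM call and the issue labels are invented, so this sketches the pipeline shape rather than the package's actual API.

from collections import Counter

def clear_style_report(records, judge):
    """Collect per-instance feedback, pool the named issues, and report how
    prevalent each issue is across the evaluation set."""
    issues = []
    for rec in records:
        feedback = judge(rec["question"], rec["answer"])  # per-instance textual feedback
        issues.extend(feedback["issues"])                 # e.g. ["unsupported claim"]
    return {issue: count / len(records) for issue, count in Counter(issues).most_common()}

fake_judge = lambda q, a: {"issues": ["missing citation"] if "?" in q else ["arithmetic slip"]}
print(clear_style_report([{"question": "Why?", "answer": "..."},
                          {"question": "2+2", "answer": "5"}], fake_judge))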

Updated: 2025-07-24 13:15:21

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.18392v1

Revisiting LLM Reasoning via Information Bottleneck

Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR). By leveraging simple rule-based rewards, RL effectively incentivizes LLMs to produce extended chain-of-thought (CoT) reasoning trajectories, progressively guiding them toward correct answers. However, existing approaches remain largely heuristic and intuition-driven, limiting the development of principled methodologies. In this paper, we present a theoretical characterization of LLM reasoning grounded in information bottleneck (IB) principle, introducing IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable across diverse prompts. We derive a practical token-level surrogate objective and propose an efficient approximation, resulting in the lightweight IB regularization method. This technique integrates seamlessly into existing RL-based post-training frameworks without additional computational overhead, requiring only a one-line code modification. Empirically, we validate IB regularization across multiple mathematical reasoning benchmarks and RL algorithms, demonstrating consistent improvements in LLM reasoning performance.
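
As a rough picture of what such a "one-line" change looks like, the sketch below adds a token-level penalty to a policy-gradient loss. The squared log-probability surrogate and the beta weight are placeholders chosen for illustration; the actual IBRO objective is derived in the paper.

import torch

def ib_regularized_loss(pg_loss, token_logps, beta=1e-3):
    ib_penalty = (token_logps ** 2).mean()  # illustrative token-level surrogate, not the published one
    return pg_loss + beta * ib_penalty      # the one-line modification to the RL loss

print(ib_regularized_loss(torch.tensor(0.7), torch.randn(4, 16)))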

Updated: 2025-07-24 13:14:25

Categories: cs.AI

Download: http://arxiv.org/abs/2507.18391v1

Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks

Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language models, especially the transformer-based CLIP model, offer promising capabilities for generalizing action recognition from video data. In this work, we evaluate CLIP on the UCF-101 dataset and systematically analyze its performance under three masking strategies: (1) percentage-based and shape-based black masking at 10%, 30%, and 50%, (2) feature-specific masking to suppress bias-inducing elements, and (3) isolation masking that retains only class-specific regions. Our results reveal that CLIP exhibits inconsistent behavior and frequent misclassifications, particularly when essential visual cues are obscured. To overcome these limitations, we propose incorporating class-specific noise, learned via a custom loss function, to reinforce attention to class-defining features. This enhancement improves classification accuracy and model confidence while reducing bias. We conclude with a discussion on the challenges of applying such models in clinical domains and outline directions for future work to improve generalizability across domain-independent healthcare scenarios.
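
The percentage-based black-masking probe is easy to reproduce in spirit. In the sketch below, the patch size is an illustrative assumption and patches may overlap, so the masked fraction is approximate.

import numpy as np

def random_black_mask(frame, fraction=0.3, patch=16, rng=None):
    """Black out roughly `fraction` of a frame with square patches before
    passing it to the classifier."""
    rng = rng or np.random.default_rng(0)
    out = frame.copy()
    h, w = frame.shape[:2]
    for _ in range(int(fraction * (h // patch) * (w // patch))):
        y = rng.integers(0, h - patch)
        x = rng.integers(0, w - patch)
        out[y:y + patch, x:x + patch] = 0
    return out

print(random_black_mask(np.ones((224, 224, 3)), fraction=0.5).mean())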

Updated: 2025-07-24 13:13:28

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.18675v1

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

As large language models (LLMs) are increasingly deployed as autonomous agents, understanding their cooperation and social mechanisms is becoming increasingly important. In particular, how LLMs balance self-interest and collective well-being is a critical challenge for ensuring alignment, robustness, and safe deployment. In this paper, we examine the challenge of costly sanctioning in multi-agent LLM systems, where an agent must decide whether to invest its own resources to incentivize cooperation or penalize defection. To study this, we adapt a public goods game with institutional choice from behavioral economics, allowing us to observe how different LLMs navigate social dilemmas over repeated interactions. Our analysis reveals four distinct behavioral patterns among models: some consistently establish and sustain high levels of cooperation, others fluctuate between engagement and disengagement, some gradually decline in cooperative behavior over time, and others rigidly follow fixed strategies regardless of outcomes. Surprisingly, we find that reasoning LLMs, such as the o1 series, struggle significantly with cooperation, whereas some traditional LLMs consistently achieve high levels of cooperation. These findings suggest that the current approach to improving LLMs, which focuses on enhancing their reasoning capabilities, does not necessarily lead to cooperation, providing valuable insights for deploying LLM agents in environments that require sustained collaboration. Our code is available at https://github.com/davidguzmanp/SanctSim
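
For readers unfamiliar with the underlying game, one round's payoffs can be computed as below (endowment and multiplier are illustrative; the institutional-choice and sanctioning mechanics the paper adds are omitted). The example shows why free-riding is individually tempting but collectively costly.

def public_goods_round(contributions, multiplier=1.6, endowment=10):
    """Each agent keeps what it did not contribute plus an equal share of the
    multiplied common pot."""
    share = sum(contributions) * multiplier / len(contributions)
    return [endowment - c + share for c in contributions]

print(public_goods_round([10, 10, 0, 0]))  # [8.0, 8.0, 18.0, 18.0]: the free-riders earn the most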

Updated: 2025-07-24 13:13:24

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2506.23276v2

Vision Transformers in Precision Agriculture: A Comprehensive Survey

Detecting plant diseases is a crucial aspect of modern agriculture, as it plays a key role in maintaining crop health and increasing overall yield. Traditional approaches, though still valuable, often rely on manual inspection or conventional machine learning techniques, both of which face limitations in scalability and accuracy. Recently, Vision Transformers (ViTs) have emerged as a promising alternative, offering advantages such as improved handling of long-range dependencies and better scalability for visual tasks. This review explores the application of ViTs in precision agriculture, covering a range of tasks. We begin by introducing the foundational architecture of ViTs and discussing their transition from Natural Language Processing (NLP) to Computer Vision. The discussion includes the concept of inductive bias in traditional models like Convolutional Neural Networks (CNNs), and how ViTs mitigate these biases. We provide a comprehensive review of recent literature, focusing on key methodologies, datasets, and performance metrics. This study also includes a comparative analysis of CNNs and ViTs, along with a review of hybrid models and performance enhancements. Technical challenges such as data requirements, computational demands, and model interpretability are addressed, along with potential solutions. Finally, we outline future research directions and technological advancements that could further support the integration of ViTs in real-world agricultural settings. Our goal with this study is to offer practitioners and researchers a deeper understanding of how ViTs are poised to transform smart and precision agriculture.

Updated: 2025-07-24 12:57:18

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2504.21706v3

Beamforming and Resource Allocation for Delay Minimization in RIS-Assisted OFDM Systems

This paper investigates a joint beamforming and resource allocation problem in downlink reconfigurable intelligent surface (RIS)-assisted orthogonal frequency division multiplexing (OFDM) systems to minimize the average delay, where data packets for each user arrive at the base station (BS) stochastically. The sequential optimization problem is inherently a Markov decision process (MDP), thus falling within the remit of reinforcement learning. To effectively handle the mixed action space and reduce the state space dimensionality, a hybrid deep reinforcement learning (DRL) approach is proposed. Specifically, proximal policy optimization (PPO)-Theta is employed to optimize the RIS phase shift design, while PPO-N is responsible for subcarrier allocation decisions. The active beamforming at the BS is then derived from the jointly optimized RIS phase shifts and subcarrier allocation decisions. To further mitigate the curse of dimensionality associated with subcarrier allocation, a multi-agent strategy is introduced to optimize the subcarrier allocation indicators more efficiently. Moreover, to achieve more adaptive resource allocation and accurately capture the network dynamics, key factors closely related to average delay, such as the number of backlogged packets in buffers and current packet arrivals, are incorporated into the state space. Furthermore, a transfer learning framework is introduced to enhance the training efficiency and accelerate convergence. Simulation results demonstrate that the proposed algorithm significantly reduces the average delay, enhances resource allocation efficiency, and achieves superior system robustness and fairness compared to baseline methods.

Updated: 2025-07-24 12:56:07

Categories: cs.AI,cs.IT,math.IT

Download: http://arxiv.org/abs/2506.03586v4

A comparison of stretched-grid and limited-area modelling for data-driven regional weather forecasting

Regional machine learning weather prediction (MLWP) models based on graph neural networks have recently demonstrated remarkable predictive accuracy, outperforming numerical weather prediction models at lower computational costs. In particular, limited-area model (LAM) and stretched-grid model (SGM) approaches have emerged for generating high-resolution regional forecasts, based on initial conditions from a regional (re)analysis. While LAM uses lateral boundaries from an external global model, SGM incorporates a global domain at lower resolution. This study aims to understand how the differences in model design impact relative performance and potential applications. Specifically, the strengths and weaknesses of these two approaches are identified for generating deterministic regional forecasts over Europe. Using the Anemoi framework, models of both types are built by minimally adapting a shared architecture and trained using global and regional reanalyses in a near-identical setup. Several inference experiments have been conducted to explore their relative performance and highlight key differences. Results show that both LAM and SGM are competitive deterministic MLWP models with generally accurate and comparable forecasting performance over the regional domain. Various differences were identified in the performance of the models across applications. LAM is able to successfully exploit high-quality boundary forcings to make predictions within the regional domain and is suitable in contexts where global data is difficult to acquire. SGM is fully self-contained for easier operationalisation, can take advantage of more training data and significantly surpasses LAM in terms of (temporal) generalisability. Our paper can serve as a starting point for meteorological institutes to guide their choice between LAM and SGM in developing an operational data-driven forecasting system.

Updated: 2025-07-24 12:54:08

Categories: physics.ao-ph,cs.LG

Download: http://arxiv.org/abs/2507.18378v1

A Comprehensive Review of Diffusion Models in Smart Agriculture: Progress, Applications, and Challenges

With the global population growing and arable land resources becoming increasingly scarce, smart agriculture and precision agriculture have emerged as key directions for the future of agricultural development. Artificial intelligence (AI) technologies, particularly deep learning models, have found widespread applications in areas such as crop monitoring and pest detection. As an emerging generative model, diffusion models have shown significant promise in tasks like agricultural image processing, data augmentation, and remote sensing. Compared to traditional generative adversarial networks (GANs), diffusion models offer superior training stability and generation quality, effectively addressing challenges such as limited agricultural data and imbalanced image samples. This paper reviews the latest advancements in the application of diffusion models in agriculture, focusing on their potential in crop pest and disease detection, remote sensing image enhancement, crop growth prediction, and agricultural resource management. Experimental results demonstrate that diffusion models significantly improve model accuracy and robustness in data augmentation, image generation, and denoising, especially in complex environments. Despite challenges related to computational efficiency and generalization capabilities, diffusion models are expected to play an increasingly important role in smart and precision agriculture as technology advances, providing substantial support for the sustainable development of global agriculture.

Updated: 2025-07-24 12:52:32

Categories: cs.LG

Download: http://arxiv.org/abs/2507.18376v1

On Reconstructing Training Data From Bayesian Posteriors and Trained Models

Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three primary contributions: establishing a mathematical framework to express the problem, characterising the features of the training data that are vulnerable via a maximum mean discrepancy equivalence, and outlining a score matching framework for reconstructing data in both Bayesian and non-Bayesian models; the former is a first in the literature.
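
Since the vulnerability characterization rests on a maximum mean discrepancy equivalence, a standalone MMD estimator helps build intuition. The sketch below is the standard biased RBF-kernel MMD^2 between two samples; the bandwidth is an arbitrary choice, and this is not the paper's reconstruction procedure itself.

import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    """Biased estimator of squared MMD with a Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
print(rbf_mmd2(rng.normal(0.0, 1.0, (200, 2)), rng.normal(0.5, 1.0, (200, 2))))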

Updated: 2025-07-24 12:49:41

Categories: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2507.18372v1

Reasoning Beyond the Obvious: Evaluating Divergent and Convergent Thinking in LLMs for Financial Scenarios

Most reasoning benchmarks for LLMs emphasize factual accuracy or step-by-step logic. In finance, however, professionals must not only converge on optimal decisions but also generate creative, plausible futures under uncertainty. We introduce ConDiFi, a benchmark that jointly evaluates divergent and convergent thinking in LLMs for financial tasks. ConDiFi features 607 macro-financial prompts for divergent reasoning and 990 multi-hop adversarial MCQs for convergent reasoning. Using this benchmark, we evaluated 14 leading models and uncovered striking differences. Despite high fluency, GPT-4o underperforms on Novelty and Actionability. In contrast, models like DeepSeek-R1 and Cohere Command R+ rank among the top for generating actionable insights suitable for investment decisions. ConDiFi provides a new perspective to assess reasoning capabilities essential to safe and strategic deployment of LLMs in finance.

Updated: 2025-07-24 12:47:29

Categories: cs.AI,I.2.0; I.2.6; J.4

Download: http://arxiv.org/abs/2507.18368v1

Efficient Uncertainty in LLMs through Evidential Knowledge Distillation

Accurate uncertainty quantification remains a key challenge for standard LLMs, prompting the adoption of Bayesian and ensemble-based methods. However, such methods typically necessitate computationally expensive sampling, involving multiple forward passes to effectively estimate predictive uncertainty. In this paper, we introduce a novel approach enabling efficient and effective uncertainty estimation in LLMs without sacrificing performance. Specifically, we distill uncertainty-aware teacher models - originally requiring multiple forward passes - into compact student models sharing the same architecture but fine-tuned using Low-Rank Adaptation (LoRA). We compare two distinct distillation strategies: one in which the student employs traditional softmax-based outputs, and another in which the student leverages Dirichlet-distributed outputs to explicitly model epistemic uncertainty via evidential learning. Empirical evaluations on classification datasets demonstrate that such students can achieve comparable or superior predictive and uncertainty quantification performance relative to their teacher models, while critically requiring only a single forward pass. To our knowledge, this is the first demonstration that immediate and robust uncertainty quantification can be achieved in LLMs through evidential distillation.
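
The Dirichlet-output student follows standard evidential deep learning, sketched below; the class count is illustrative and the distillation loss is omitted, but the key property, a closed-form uncertainty from a single forward pass, is visible.

import torch
import torch.nn.functional as F

def dirichlet_uncertainty(logits):
    """Map logits to non-negative evidence, form Dirichlet parameters, and read
    off expected probabilities plus a scalar uncertainty per example."""
    alpha = F.softplus(logits) + 1.0           # Dirichlet concentration parameters
    strength = alpha.sum(-1, keepdim=True)
    probs = alpha / strength                   # expected class probabilities
    uncertainty = logits.shape[-1] / strength  # K / sum(alpha), shrinks as evidence grows
    return probs, uncertainty

probs, u = dirichlet_uncertainty(torch.tensor([[2.0, -1.0, 0.5]]))
print(probs, u)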

Updated: 2025-07-24 12:46:40

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.18366v1

Swin-TUNA : A Novel PEFT Approach for Accurate Food Image Segmentation

In the field of food image processing, efficient semantic segmentation techniques are crucial for industrial applications. However, existing large-scale Transformer-based models (such as FoodSAM) face challenges in meeting practical deployment requirements due to their massive parameter counts and high computational resource demands. This paper introduces the TUNable Adapter module (Swin-TUNA), a Parameter Efficient Fine-Tuning (PEFT) method that integrates multiscale trainable adapters into the Swin Transformer architecture, achieving high-performance food image segmentation by updating only 4% of the parameters. The core innovation of Swin-TUNA lies in its hierarchical feature adaptation mechanism: it designs separable convolutions in depth and dimensional mappings of varying scales to address the differences in features between shallow and deep networks, combined with a dynamic balancing strategy for task-agnostic and task-specific features. Experiments demonstrate that this method achieves mIoU of 50.56% and 74.94% on the FoodSeg103 and UECFoodPix Complete datasets, respectively, surpassing the fully parameterized FoodSAM model while reducing the parameter count by 98.7% (to only 8.13M). Furthermore, Swin-TUNA exhibits faster convergence and stronger generalization capabilities in low-data scenarios, providing an efficient solution for lightweight food image segmentation.

Updated: 2025-07-24 12:46:21

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.17347v2

Leveraging the Structure of Medical Data for Improved Representation Learning

Building generalizable medical AI systems requires pretraining strategies that are data-efficient and domain-aware. Unlike internet-scale corpora, clinical datasets such as MIMIC-CXR offer limited image counts and scarce annotations, but exhibit rich internal structure through multi-view imaging. We propose a self-supervised framework that leverages the inherent structure of medical datasets. Specifically, we treat paired chest X-rays (i.e., frontal and lateral views) as natural positive pairs, learning to reconstruct each view from sparse patches while aligning their latent embeddings. Our method requires no textual supervision and produces informative representations. Evaluated on MIMIC-CXR, we show strong performance compared to supervised objectives and baselines trained without leveraging structure. This work provides a lightweight, modality-agnostic blueprint for domain-specific pretraining where data is structured but scarce.
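
The view-alignment half of the objective can be sketched as a cosine loss between latent embeddings of the two views of the same study; the embedding size is arbitrary and the masked-patch reconstruction branch is omitted.

import torch
import torch.nn.functional as F

def view_alignment_loss(z_frontal, z_lateral):
    """Pull embeddings of paired frontal/lateral views together by maximising
    their cosine similarity."""
    z_f = F.normalize(z_frontal, dim=-1)
    z_l = F.normalize(z_lateral, dim=-1)
    return 1.0 - (z_f * z_l).sum(-1).mean()

print(view_alignment_loss(torch.randn(8, 128), torch.randn(8, 128)))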

Updated: 2025-07-24 12:44:31

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.02987v3

Latent Space Alignment for AI-Native MIMO Semantic Communications

Semantic communications focus on prioritizing the understanding of the meaning behind transmitted data and ensuring the successful completion of tasks that motivate the exchange of information. However, when devices rely on different languages, logic, or internal representations, semantic mismatches may occur, potentially hindering mutual understanding. This paper introduces a novel approach to addressing latent space misalignment in semantic communications, exploiting multiple-input multiple-output (MIMO) communications. Specifically, our method learns a MIMO precoder/decoder pair that jointly performs latent space compression and semantic channel equalization, mitigating both semantic mismatches and physical channel impairments. We explore two solutions: (i) a linear model, optimized by solving a biconvex optimization problem via the alternating direction method of multipliers (ADMM); (ii) a neural network-based model, which learns semantic MIMO precoder/decoder under transmission power budget and complexity constraints. Numerical results demonstrate the effectiveness of the proposed approach in a goal-oriented semantic communication scenario, illustrating the main trade-offs between accuracy, communication burden, and complexity of the solutions.
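
As intuition for the linear variant, latent-space alignment in the channel-free case reduces to fitting a linear map between paired latent samples. The sketch below uses plain least squares and ignores the MIMO channel, power budget, and ADMM machinery of the actual method.

import numpy as np

def linear_latent_equalizer(Z_tx, Z_rx):
    """Fit the linear map that best carries transmitter latents onto receiver latents."""
    W, *_ = np.linalg.lstsq(Z_tx, Z_rx, rcond=None)
    return W

rng = np.random.default_rng(0)
Z_tx = rng.normal(size=(500, 16))                                            # transmitter latent samples
Z_rx = Z_tx @ rng.normal(size=(16, 16)) + 0.01 * rng.normal(size=(500, 16))  # misaligned receiver space
W = linear_latent_equalizer(Z_tx, Z_rx)
print(np.linalg.norm(Z_tx @ W - Z_rx))  # small residual once the spaces are aligned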

Updated: 2025-07-24 12:38:41

Categories: cs.LG,cs.IT,cs.NI,math.IT

Download: http://arxiv.org/abs/2507.16680v2

Conformidade com os Requisitos Legais de Privacidade de Dados: Um Estudo sobre Técnicas de Anonimização [Compliance with Legal Data Privacy Requirements: A Study on Anonymization Techniques]

The protection of personal data has become a central topic in software development, especially with the implementation of the General Data Protection Law (LGPD) in Brazil and the General Data Protection Regulation (GDPR) in the European Union. With the enforcement of these laws, certain software quality criteria have become mandatory, such as data anonymization, which is one of the main aspects addressed by these regulations. The aim of this article is to analyze data anonymization techniques and assess their effectiveness in ensuring compliance with legal requirements and the utility of the data for its intended purpose. Techniques such as aggregation, generalization, perturbation, and k-anonymity were investigated and applied to datasets containing personal and sensitive data. The analysis revealed significant variations in the effectiveness of each method, highlighting the need to balance privacy and data utility.
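
Of the techniques studied, k-anonymity has the crispest operational test: every combination of quasi-identifier values must be shared by at least k records. A minimal check on toy data (the column names are invented):

import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    """True if every quasi-identifier group contains at least k records."""
    return int(df.groupby(quasi_identifiers).size().min()) >= k

df = pd.DataFrame({"age_band": ["20-29", "20-29", "30-39", "30-39"],
                   "zip3": ["010", "010", "045", "045"],
                   "diagnosis": ["A", "B", "A", "C"]})
print(is_k_anonymous(df, ["age_band", "zip3"], k=2))  # True: each group holds 2 rows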

Updated: 2025-07-24 12:31:28

Categories: cs.CR

Download: http://arxiv.org/abs/2507.18360v1

Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation

The training of high-quality, robust machine learning models for speech-driven 3D facial animation requires a large, diverse dataset of high-quality audio-animation pairs. To overcome the lack of such a dataset, recent work has introduced large pre-trained speech encoders that are robust to variations in the input audio and, therefore, enable the facial animation model to generalize across speakers, audio quality, and languages. However, the resulting facial animation models are prohibitively large and lend themselves only to offline inference on a dedicated machine. In this work, we explore on-device, real-time facial animation models in the context of game development. We overcome the lack of large datasets by using hybrid knowledge distillation with pseudo-labeling. Given a large audio dataset, we employ a high-performing teacher model to train very small student models. In contrast to the pre-trained speech encoders, our student models only consist of convolutional and fully-connected layers, removing the need for attention context or recurrent updates. In our experiments, we demonstrate that we can reduce the memory footprint to up to 3.4 MB and required future audio context to up to 81 ms while maintaining high-quality animations. This paves the way for on-device inference, an important step towards realistic, model-driven digital characters.
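
The hybrid distillation loop reduces to pseudo-labeling plus regression, as sketched below; the feature dimensions, layer sizes, and output count are illustrative stand-ins, not the shipped teacher or student models.

import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Conv1d(80, 256, 3, padding=1), nn.ReLU(),
                        nn.Conv1d(256, 52, 1)).eval()  # stand-in for the large, robust teacher
student = nn.Sequential(nn.Conv1d(80, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv1d(64, 52, 1))          # small student: convolutional layers only

audio = torch.randn(4, 80, 100)                        # a batch of audio feature clips
with torch.no_grad():
    pseudo_targets = teacher(audio)                    # pseudo-labels on unlabelled audio
loss = nn.functional.mse_loss(student(audio), pseudo_targets)
loss.backward()
print(float(loss))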

Updated: 2025-07-24 12:25:12

Categories: cs.GR,cs.LG,cs.MM,cs.SD,eess.AS

Download: http://arxiv.org/abs/2507.18352v1

Mechanistic Indicators of Understanding in Large Language Models

Recent findings in mechanistic interpretability (MI), the field probing the inner workings of Large Language Models (LLMs), challenge the view that these models rely solely on superficial statistics. We offer an accessible synthesis of these findings that doubles as an introduction to MI while integrating these findings within a novel theoretical framework for thinking about machine understanding. We argue that LLMs develop internal structures that are functionally analogous to the kind of understanding that consists in seeing connections. To sharpen this idea, we propose a three-tiered conception of understanding. First, conceptual understanding emerges when a model forms "features" as directions in latent space, learning the connections between diverse manifestations of something. Second, state-of-the-world understanding emerges when a model learns contingent factual connections between features and dynamically tracks changes in the world. Third, principled understanding emerges when a model ceases to rely on a collection of memorized facts and discovers a "circuit" connecting these facts. However, these forms of understanding remain radically different from human understanding, as the phenomenon of "parallel mechanisms" shows. We conclude that the debate should move beyond the yes-or-no question of whether LLMs understand to investigate how their strange minds work and forge conceptions that fit them.

Updated: 2025-07-24 12:23:53

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.08017v3

Low-rank adaptive physics-informed HyperDeepONets for solving differential equations

HyperDeepONets were introduced in Lee, Cho and Hwang [ICLR, 2023] as an alternative architecture for operator learning, in which a hypernetwork generates the weights for the trunk net of a DeepONet. While this improves expressivity, it incurs high memory and computational costs due to the large number of output parameters required. In this work we introduce, in the physics-informed machine learning setting, a variation, PI-LoRA-HyperDeepONets, which leverage low-rank adaptation (LoRA) to reduce complexity by decomposing the hypernetwork's output layer weight matrix into two smaller low-rank matrices. This reduces the number of trainable parameters while introducing an extra regularization of the trunk networks' weights. Through extensive experiments on both ordinary and partial differential equations we show that PI-LoRA-HyperDeepONets achieve up to 70\% reduction in parameters and consistently outperform regular HyperDeepONets in terms of predictive accuracy and generalization.
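
The parameter saving comes from factorizing the hypernetwork's output layer. A minimal sketch with illustrative dimensions: a dense 256-to-20000 layer stores 5.12M weights, while a rank-8 factorization stores about 162k.

import torch
import torch.nn as nn

class LoRAOutputLayer(nn.Module):
    """Replace a dense output layer with two low-rank factors A (in_dim x r)
    and B (r x out_dim), so the effective weight matrix is A @ B."""
    def __init__(self, in_dim, out_dim, rank=8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(in_dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))

    def forward(self, h):
        return h @ self.A @ self.B  # (batch, in_dim) -> (batch, out_dim) through a rank-r bottleneck

layer = LoRAOutputLayer(256, 20000)
print(sum(p.numel() for p in layer.parameters()), "trainable weights vs dense", 256 * 20000)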

Updated: 2025-07-24 12:19:25

Categories: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2507.18346v1

The AlphaPhysics Term Rewriting System for Marking Algebraic Expressions in Physics Exams

We present our method for automatically marking Physics exams. The marking problem consists in assessing typed student answers for correctness with respect to a ground truth solution. This is a challenging problem that we seek to tackle using a combination of a computer algebra system, an SMT solver and a term rewriting system. A Large Language Model is used to interpret and remove errors from student responses and rewrite these in a machine readable format. Once formalized and language-aligned, the next step then consists in applying automated reasoning techniques for assessing student solution correctness. We consider two methods of automated theorem proving: off-the-shelf SMT solving and term rewriting systems tailored for physics problems involving trigonometric expressions. The development of the term rewrite system and establishing termination and confluence properties was not trivial, and we describe it in some detail in the paper. We evaluate our system on a rich pool of over 1500 real-world student exam responses from the 2023 Australian Physics Olympiad.
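
The final equivalence check can be emulated with an off-the-shelf CAS, as in the SymPy sketch below; the paper's tailored rewrite system exists precisely because generic simplification misses many physics-specific trigonometric cases, so this is a baseline illustration, not their system.

import sympy as sp

def equivalent_trig(student_expr, truth_expr):
    """Two answers match if their difference simplifies to zero."""
    diff = sp.sympify(student_expr) - sp.sympify(truth_expr)
    return sp.simplify(sp.trigsimp(diff)) == 0

print(equivalent_trig("2*sin(theta)*cos(theta)", "sin(2*theta)"))  # True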

Updated: 2025-07-24 12:08:49

Categories: cs.AI

Download: http://arxiv.org/abs/2507.18337v1

Improving Bird Classification with Primary Color Additives

We address the problem of classifying bird species using their song recordings, a challenging task due to environmental noise, overlapping vocalizations, and missing labels. Existing models struggle with low-SNR or multi-species recordings. We hypothesize that birds can be classified by visualizing their pitch pattern, speed, and repetition, collectively called motifs. Deep learning models applied to spectrogram images help, but similar motifs across species cause confusion. To mitigate this, we embed frequency information into spectrograms using primary color additives. This enhances species distinction and improves classification accuracy. Our experiments show that the proposed approach achieves statistically significant gains over models without colorization and surpasses the BirdCLEF 2024 winner, improving F1 by 7.3%, ROC-AUC by 6.2%, and CMAP by 6.6%. These results demonstrate the effectiveness of incorporating frequency information via colorization.
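
One plausible reading of the colorization step, sketched below with invented band edges: tint the low, mid, and high frequency thirds of a mono spectrogram with the three primary colors, so motifs with similar shapes but different pitches become visually distinct to the network.

import numpy as np

def colorize_spectrogram(spec):
    """Map frequency bands of a (freq, time) spectrogram onto RGB channels:
    R = low, G = mid, B = high frequencies."""
    h, w = spec.shape
    rgb = np.zeros((h, w, 3), dtype=spec.dtype)
    bands = [slice(0, h // 3), slice(h // 3, 2 * h // 3), slice(2 * h // 3, h)]
    for channel, band in enumerate(bands):
        rgb[band, :, channel] = spec[band, :]
    return rgb

print(colorize_spectrogram(np.random.rand(128, 64)).shape)  # (128, 64, 3)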

Updated: 2025-07-24 12:05:17

Categories: cs.CV,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2507.18334v1

Remembering the Markov Property in Cooperative MARL

Cooperative multi-agent reinforcement learning (MARL) is typically formalised as a Decentralised Partially Observable Markov Decision Process (Dec-POMDP), where agents must reason about the environment and other agents' behaviour. In practice, current model-free MARL algorithms use simple recurrent function approximators to address the challenge of reasoning about others using partial information. In this position paper, we argue that the empirical success of these methods is not due to effective Markov signal recovery, but rather to learning simple conventions that bypass environment observations and memory. Through a targeted case study, we show that co-adapting agents can learn brittle conventions, which then fail when partnered with non-adaptive agents. Crucially, the same models can learn grounded policies when the task design necessitates it, revealing that the issue is not a fundamental limitation of the learning models but a failure of the benchmark design. Our analysis also suggests that modern MARL environments may not adequately test the core assumptions of Dec-POMDPs. We therefore advocate for new cooperative environments built upon two core principles: (1) behaviours grounded in observations and (2) memory-based reasoning about other agents, ensuring success requires genuine skill rather than fragile, co-adapted agreements.

Updated: 2025-07-24 11:59:42

Categories: cs.LG,cs.MA

Download: http://arxiv.org/abs/2507.18333v1

Hierarchical Dimensionless Learning (Hi-π): A physics-data hybrid-driven approach for discovering dimensionless parameter combinations

Dimensional analysis provides a universal framework for reducing physical complexity and revealing inherent laws. However, its application to high-dimensional systems still generates redundant dimensionless parameters, making it challenging to establish physically meaningful descriptions. Here, we introduce Hierarchical Dimensionless Learning (Hi-{\pi}), a physics-data hybrid-driven method that combines dimensional analysis and symbolic regression to automatically discover key dimensionless parameter combination(s). We applied this method to classic examples in various research fields of fluid mechanics. For the Rayleigh-B\'enard convection, this method accurately extracted two intrinsic dimensionless parameters: the Rayleigh number and the Prandtl number, validating its unified representation advantage across multiscale data. For the viscous flows in a circular pipe, the method automatically discovers two optimal dimensionless parameters: the Reynolds number and relative roughness, achieving a balance between accuracy and complexity. For the compressibility correction in subsonic flow, the method effectively extracts the classic compressibility correction formulation, while demonstrating its capability to discover hierarchical structural expressions through optimal parameter transformations.
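
The dimensional-analysis half of the method rests on the Buckingham pi theorem: dimensionless groups are null-space vectors of the dimensional exponent matrix. The SymPy sketch below recovers the Reynolds number (up to sign and power) for the pipe-flow variables rho, U, L, mu; the symbolic-regression half of Hi-{\pi} is not shown.

import sympy as sp

# Columns: rho, U, L, mu; rows: exponents of mass, length, time.
D = sp.Matrix([[1, 0, 0, 1],     # mass
               [-3, 1, 1, -1],   # length
               [0, -1, 0, -1]])  # time
for vec in D.nullspace():
    print(vec.T)  # (-1, -1, -1, 1): exponents of mu/(rho*U*L), i.e. 1/Re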

Updated: 2025-07-24 11:59:10

标题: 分层无量纲学习(Hi-π):一种物理数据混合驱动方法,用于发现无量纲参数组合

摘要: 尺寸分析提供了一个通用框架,用于简化物理复杂性并揭示固有规律。然而,其在高维系统中的应用仍会产生冗余的无量纲参数,使得建立具有物理意义的描述变得具有挑战性。在这里,我们介绍了分层无量纲学习(Hi-{\pi}),这是一种物理数据混合驱动方法,结合了尺寸分析和符号回归,自动发现关键的无量纲参数组合。我们将这种方法应用于流体力学各个研究领域的经典示例中。对于瑞利-贝纳德对流,该方法准确提取出两个固有无量纲参数:瑞利数和普朗特数,验证了其跨多尺度数据的统一表示优势。对于圆管内的粘性流动,该方法自动发现了两个最佳无量纲参数:雷诺数和相对粗糙度,实现了精度和复杂性之间的平衡。对于亚音速流中的可压缩性修正,该方法有效地提取了经典的可压缩性修正公式,同时展示了通过最优参数变换发现分层结构表达式的能力。

更新时间: 2025-07-24 11:59:10

领域: physics.flu-dyn,cs.LG,physics.data-an

下载: http://arxiv.org/abs/2507.18332v1
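
A minimal sketch of the classical starting point Hi-π builds on: the Buckingham-pi theorem recovers candidate dimensionless groups from the null space of the dimension matrix. Pipe-flow variables are chosen here for illustration; the symbolic-regression stage that selects and combines groups is the paper's contribution and is not shown.

    import sympy as sp

    # Rows: base dimensions (M, L, T); columns: rho, U, D, mu, eps, where
    # rho ~ M L^-3, U ~ L T^-1, D ~ L, mu ~ M L^-1 T^-1, eps ~ L (roughness).
    dim = sp.Matrix([
        [1,  0, 0,  1, 0],   # mass exponents
        [-3, 1, 1, -1, 1],   # length exponents
        [0, -1, 0, -1, 0],   # time exponents
    ])

    # Each null-space vector is a set of exponents for one dimensionless group.
    for vec in dim.nullspace():
        print(dict(zip(["rho", "U", "D", "mu", "eps"], list(vec))))
    # The two-dimensional null space spans Re = rho*U*D/mu and the relative
    # roughness eps/D -- the two groups reported for pipe flow in the abstract.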

GVCCS: A Dataset for Contrail Identification and Tracking on Visible Whole Sky Camera Sequences

Aviation's climate impact includes not only CO2 emissions but also significant non-CO2 effects, especially from contrails. These ice clouds can alter Earth's radiative balance, potentially rivaling the warming effect of aviation CO2. Physics-based models provide useful estimates of contrail formation and climate impact, but their accuracy depends heavily on the quality of atmospheric input data and on assumptions used to represent complex processes like ice particle formation and humidity-driven persistence. Observational data from remote sensors, such as satellites and ground cameras, could be used to validate and calibrate these models. However, existing datasets do not cover all aspects of contrail dynamics and formation: they typically lack temporal tracking and do not attribute contrails to their source flights. To address these limitations, we present the Ground Visible Camera Contrail Sequences (GVCCS), a new open dataset of contrails recorded with a ground-based all-sky camera in the visible range. Each contrail is individually labeled and tracked over time, allowing a detailed analysis of its lifecycle. The dataset contains 122 video sequences (24,228 frames) and includes flight identifiers for contrails that form above the camera. As a reference, we also propose a unified deep learning framework for contrail analysis using a panoptic segmentation model that performs semantic segmentation (contrail pixel identification), instance segmentation (individual contrail separation), and temporal tracking in a single architecture. By providing high-quality, temporally resolved annotations and a benchmark for model evaluation, our work supports improved contrail monitoring and will facilitate better calibration of physical models. This sets the groundwork for more accurate climate impact understanding and assessments.

Updated: 2025-07-24 11:57:59

标题: GVCCS:用于可见光全天空摄像机序列上凝结尾迹识别和跟踪的数据集

摘要: 航空业对气候的影响不仅包括二氧化碳排放,还包括来自冰晶尾流等显著的非二氧化碳效应。这些冰云可以改变地球的辐射平衡,潜在地与航空二氧化碳的升温效应相媲美。基于物理的模型提供了对尾流形成和气候影响的有用估计,但其准确性严重依赖大气输入数据的质量以及用于表示复杂过程如冰粒子形成和湿度驱动持久性的假设。遥感器,如卫星和地面摄像机的观测数据,可以用于验证和校准这些模型。然而,现有数据集并未探索尾流动态和形成的所有方面:它们通常缺乏时间跟踪,并且未将尾流归因于其源航班。为了解决这些限制,我们提出了地面可见摄像头尾流序列(GVCCS),这是一个新的开放数据集,记录了在可见范围内使用地面全天候摄像头拍摄的尾流。每个尾流都被单独标记并随着时间跟踪,允许对其生命周期进行详细分析。该数据集包含122个视频序列(24,228帧),并包括在摄像头上方形成的尾流的飞行标识符。作为参考,我们还提出了一个统一的深度学习框架,用于尾流分析,使用全景分割模型进行语义分割(尾流像素识别),实例分割(单个尾流分离)和时间跟踪,以单一架构完成。通过提供高质量、时间分辨率注释和模型评估基准,我们的工作支持改进尾流监测,并将促进物理模型的更好校准。这为更准确地理解和评估气候影响奠定了基础。

更新时间: 2025-07-24 11:57:59

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.18330v1
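
The temporal-tracking half of the task admits a very simple baseline that this kind of dataset enables. Below is a hedged sketch (mine, not the paper's panoptic framework) that greedily links per-frame contrail masks into tracks by mask IoU:

    import numpy as np

    def mask_iou(a, b):
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else 0.0

    def link_tracks(frames, iou_thresh=0.3):
        """frames: list of lists of boolean HxW masks; returns per-mask track ids."""
        next_id, prev, all_ids = 0, [], []
        for masks in frames:
            ids = []
            for m in masks:
                best = max(prev, key=lambda p: mask_iou(p[1], m), default=None)
                if best is not None and mask_iou(best[1], m) >= iou_thresh:
                    ids.append(best[0])                # continue an existing track
                else:
                    ids.append(next_id); next_id += 1  # start a new track
            prev = list(zip(ids, masks))
            all_ids.append(ids)
        return all_ids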

Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research

Self-Supervised Learning (SSL) powers many current AI systems. As research interest and investment grow, the SSL design space continues to expand. The Platonic view of SSL, following the Platonic Representation Hypothesis (PRH), suggests that despite different methods and engineering approaches, all representations converge to the same Platonic ideal. However, this phenomenon lacks precise theoretical explanation. By synthesizing evidence from Identifiability Theory (IT), we show that the PRH can emerge in SSL. However, current IT cannot explain SSL's empirical success. To bridge the gap between theory and practice, we propose expanding IT into what we term Singular Identifiability Theory (SITh), a broader theoretical framework encompassing the entire SSL pipeline. SITh would allow deeper insights into the implicit data assumptions in SSL and advance the field towards learning more interpretable and generalizable representations. We highlight three critical directions for future research: 1) training dynamics and convergence properties of SSL; 2) the impact of finite samples, batch size, and data diversity; and 3) the role of inductive biases in architecture, augmentations, initialization schemes, and optimizers.

Updated: 2025-07-24 11:53:07

标题: 立场:一个经验基础的可识别性理论将加速自监督学习研究

摘要: 自监督学习(SSL)支持许多当前的人工智能系统。随着研究兴趣和投资的增长,SSL设计空间不断扩大。遵循柏拉图表示假设(PRH)的柏拉图观点认为,尽管采用不同的方法和工程方法,所有表示都会收敛到相同的柏拉图理想。然而,这种现象缺乏精确的理论解释。通过综合可辨识性理论(IT)的证据,我们展示了PRH可以在SSL中出现。然而,当前的IT无法解释SSL的实证成功。为了弥补理论与实践之间的差距,我们提出将IT扩展为我们所称的奇异可辨识性理论(SITh),这是一个更广泛的理论框架,涵盖整个SSL流程。SITh将允许更深入地了解SSL中的隐含数据假设,并推动该领域朝着学习更可解释和可泛化的表示形式的方向发展。我们强调未来研究的三个关键方向:1)SSL的训练动态和收敛特性;2)有限样本、批量大小和数据多样性的影响;以及3)体系结构、增强、初始化方案和优化器中归纳偏见的作用。

更新时间: 2025-07-24 11:53:07

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2504.13101v3

A Concept for Efficient Scalability of Automated Driving Allowing for Technical, Legal, Cultural, and Ethical Differences

Efficient scalability of automated driving (AD) is key to reducing costs, enhancing safety, conserving resources, and maximizing impact. However, research focuses on specific vehicles and contexts, while broad deployment requires scalability across various configurations and environments. Differences in vehicle types, sensors, and actuators, as well as in traffic regulations, legal requirements, cultural dynamics, and even ethical paradigms, demand high flexibility of data-driven developed capabilities. In this paper, we address the challenge of scalable adaptation of generic capabilities to desired systems and environments. Our concept follows a two-stage fine-tuning process. In the first stage, fine-tuning to the specific environment takes place through a country-specific reward model that serves as an interface between technological adaptations and socio-political requirements. In the second stage, vehicle-specific transfer learning facilitates system adaptation and governs the validation of design decisions. In sum, our concept offers a data-driven process that integrates both technological and socio-political aspects, enabling effective scalability across technical, legal, cultural, and ethical differences.

Updated: 2025-07-24 11:51:55

标题: 一种有效的自动驾驶可扩展性概念,允许考虑技术、法律、文化和伦理差异

摘要: 自动驾驶(AD)的高效可扩展性是降低成本、增强安全性、节约资源和最大化影响的关键。然而,研究主要关注特定车辆和背景,而广泛部署需要在各种配置和环境中实现可扩展性。车辆类型、传感器、执行器的差异,以及交通法规、法律要求、文化动态,甚至是道德范式的不同,要求数据驱动的能力具有高度的灵活性。在本文中,我们解决了将通用能力可扩展地适应所需系统和环境的挑战。我们的概念遵循一个两阶段的微调过程。在第一阶段,通过一个特定国家的奖励模型进行特定环境的微调,该模型作为技术调整和社会政治要求之间的接口。在第二阶段,车辆特定的迁移学习促进系统适应,并指导设计决策的验证。总的来说,我们的概念提供了一个整合技术和社会政治方面的数据驱动过程,实现了在技术、法律、文化和道德差异方面的有效可扩展性。

更新时间: 2025-07-24 11:51:55

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2507.18326v1

A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation

Electrocardiogram (ECG) delineation, the segmentation of meaningful waveform features, is critical for clinical diagnosis. Despite recent advances using deep learning, progress has been limited by the scarcity of publicly available annotated datasets. Semi-supervised learning presents a promising solution by leveraging abundant unlabeled ECG data. In this study, we present the first systematic benchmark for semi-supervised semantic segmentation (SemiSeg) in ECG delineation. We curated and unified multiple public datasets, including previously underused sources, to support robust and diverse evaluation. We adopted five representative SemiSeg algorithms from computer vision, implemented them on two different architectures, a convolutional network and a transformer, and evaluated them in two settings: in-domain and cross-domain. Additionally, we propose ECG-specific training configurations and augmentation strategies and introduce a standardized evaluation framework. Our results show that the transformer outperforms the convolutional network in semi-supervised ECG delineation. We anticipate that our benchmark will serve as a foundation for advancing semi-supervised ECG delineation methods and will facilitate further research in this domain.

Updated: 2025-07-24 11:49:46

标题: 心电图划分中半监督语义分割的多数据集基准

摘要: 心电图(ECG)的划分,即有意义波形特征的分割,对于临床诊断至关重要。尽管最近利用深度学习取得了一些进展,但由于公开可用的注释数据集的稀缺,进展仍受到限制。半监督学习通过利用大量未标记的心电图数据提出了一种有希望的解决方案。在这项研究中,我们首次为心电图划分中的半监督语义分割(SemiSeg)提出了系统性基准。我们整合和统一了多个公开数据集,包括以前未被充分利用的来源,以支持稳健和多样化的评估。我们采用了来自计算机视觉的五种代表性SemiSeg算法,将它们实现在两种不同的架构上:卷积网络和变压器,并在两种不同的设置中对其进行评估:域内和跨域。此外,我们提出了心电图特定的训练配置和增强策略,并引入了一个标准化的评估框架。我们的结果显示,变压器在半监督心电图划分中表现优于卷积网络。我们预计我们的基准将成为推进半监督心电图划分方法的基础,并将有助于在该领域进一步研究。

更新时间: 2025-07-24 11:49:46

领域: cs.CV,cs.AI,cs.LG,eess.SP

下载: http://arxiv.org/abs/2507.18323v1
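
For flavor, a minimal sketch of one pseudo-labeling training step for 1D ECG segmentation; the model, confidence threshold, and noise augmentation are illustrative assumptions, and the benchmark's five algorithms and exact configurations live in the paper's code:

    import torch
    import torch.nn.functional as F

    def semi_supervised_step(model, opt, x_lab, y_lab, x_unlab,
                             conf_thresh=0.95, lambda_u=1.0):
        logits_lab = model(x_lab)                      # (B, C, T) per-step logits
        loss_sup = F.cross_entropy(logits_lab, y_lab)  # y_lab: (B, T) wave labels

        with torch.no_grad():                          # teacher pass on clean view
            probs = F.softmax(model(x_unlab), dim=1)
            conf, pseudo = probs.max(dim=1)            # confidence + hard labels

        noisy = x_unlab + 0.01 * torch.randn_like(x_unlab)   # perturbed view
        loss_unsup = (F.cross_entropy(model(noisy), pseudo, reduction="none")
                      * (conf > conf_thresh)).mean()   # mask low-confidence steps

        loss = loss_sup + lambda_u * loss_unsup
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()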

A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1

Argument mining (AM) is an interdisciplinary research field that integrates insights from logic, philosophy, linguistics, rhetoric, law, psychology, and computer science. It involves the automatic identification and extraction of argumentative components, such as premises and claims, and the detection of relationships between them, such as support, attack, or neutrality. Recently, the field has advanced significantly, especially with the advent of large language models (LLMs), which have enhanced the efficiency of analyzing and extracting argument semantics compared to traditional methods and other deep learning models. There are many benchmarks for testing and verifying the quality of LLMs, but there is still a lack of research and results on how these models operate on publicly available argument classification databases. This paper presents a study of a selection of LLMs, using diverse datasets such as Args.me and UKP. The models tested include versions of GPT, Llama, and DeepSeek, along with reasoning-enhanced variants incorporating the Chain-of-Thought algorithm. The results indicate that ChatGPT-4o outperforms the others on the argument classification benchmarks. Among the models with reasoning capabilities, Deepseek-R1 shows its superiority. However, despite their superiority, GPT-4o and Deepseek-R1 still make errors; the most common errors are discussed for all models. To our knowledge, the presented work is the first broader analysis of the mentioned datasets using LLMs and prompting algorithms. The work also shows some weaknesses of known prompting algorithms in argument analysis, while indicating directions for their improvement. The added value of the work is the in-depth analysis of the available argument datasets and the demonstration of their shortcomings.

Updated: 2025-07-24 11:49:06

标题: 基于LLM的论据分类的全面研究:从LLAMA到GPT-4o再到Deepseek-R1

摘要: Argument mining (AM)是一个跨学科研究领域,整合了逻辑、哲学、语言学、修辞学、法律、心理学和计算机科学的见解。它涉及自动识别和提取论证组成部分,如前提和主张,并检测它们之间的关系,如支持、攻击或中立。最近,该领域取得了显著进展,特别是随着大型语言模型(LLMs)的出现,这些模型相比传统方法和其他深度学习模型增强了分析和提取论证语义的效率。有许多用于测试和验证LLM质量的基准,但在公开可用的论证分类数据库中,对这些模型的运行仍存在研究和结果的不足。本文介绍了对LLM的选择性研究,使用诸如Args.me和UKP之类的多样化数据集。测试的模型包括GPT、Llama和DeepSeek的版本,以及包含Chain-of-Thoughts算法的增强推理变体。结果表明,ChatGPT-4o在论证分类基准中表现优异。在具有推理能力的模型方面,Deepseek-R1展现出其优越性。然而,尽管它们优越,GPT-4o和Deepseek-R1仍会出现错误。各个模型的最常见错误进行了讨论。据我们所知,所提出的工作是首次使用LLM和提示算法对提到的数据集进行更广泛分析。该工作还展示了已知提示算法在论证分析中的一些弱点,同时指出了改进的方向。该工作的附加价值在于对可用论证数据集的深入分析,以及展示它们的不足之处。

更新时间: 2025-07-24 11:49:06

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.08621v2
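
As a concrete picture of the evaluated setup, here is a hedged sketch of a prompt-based argument classification loop; generate stands in for whichever LLM backend is used (GPT, Llama, DeepSeek) and is an assumption of this sketch rather than the paper's interface:

    LABELS = ["premise", "claim"]

    PROMPT = """You are an argument mining assistant.
    Classify the following sentence as one of: {labels}.
    Think step by step, then answer with the label only on the last line.

    Sentence: "{sentence}"
    """

    def classify(sentence: str, generate) -> str:
        """generate(prompt: str) -> str may be any LLM completion function."""
        reply = generate(PROMPT.format(labels=", ".join(LABELS), sentence=sentence))
        last = reply.strip().splitlines()[-1].lower()  # Chain-of-Thought: read last line
        return next((lab for lab in LABELS if lab in last), "unknown")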

I-CEE: Tailoring Explanations of Image Classification Models to User Expertise

Effectively explaining decisions of black-box machine learning models is critical to responsible deployment of AI systems that rely on them. Recognizing their importance, the field of explainable AI (XAI) provides several techniques to generate these explanations. Yet, there is relatively little emphasis on the user (the explainee) in this growing body of work and most XAI techniques generate "one-size-fits-all" explanations. To bridge this gap and achieve a step closer towards human-centered XAI, we present I-CEE, a framework that provides Image Classification Explanations tailored to User Expertise. Informed by existing work, I-CEE explains the decisions of image classification models by providing the user with an informative subset of training data (i.e., example images), corresponding local explanations, and model decisions. However, unlike prior work, I-CEE models the informativeness of the example images to depend on user expertise, resulting in different examples for different users. We posit that by tailoring the example set to user expertise, I-CEE can better facilitate users' understanding and simulatability of the model. To evaluate our approach, we conduct detailed experiments in both simulation and with human participants (N = 100) on multiple datasets. Experiments with simulated users show that I-CEE improves users' ability to accurately predict the model's decisions (simulatability) compared to baselines, providing promising preliminary results. Experiments with human participants demonstrate that our method significantly improves user simulatability accuracy, highlighting the importance of human-centered XAI.

Updated: 2025-07-24 11:44:19

标题: I-CEE:根据用户专业知识定制图像分类模型的解释

摘要: 黑盒机器学习模型决策的有效解释对于依赖它们的人工智能系统的负责部署至关重要。认识到它们的重要性,可解释人工智能(XAI)领域提供了几种技术来生成这些解释。然而,在这一不断增长的研究领域中,对用户(被解释者)的重视相对较少,大多数XAI技术生成“一刀切”的解释。为了弥合这一差距,更接近人类中心的XAI,我们提出了I-CEE框架,为用户专业知识定制图像分类解释。受现有工作启发,I-CEE通过为用户提供信息丰富的训练数据子集(即示例图像)、相应的局部解释和模型决策来解释图像分类模型的决策。然而,与先前的工作不同,I-CEE模型化了示例图像的信息性取决于用户的专业知识,因此不同的用户会得到不同的示例。我们认为通过根据用户的专业知识定制示例集,I-CEE可以更好地促进用户对模型的理解和模拟性。为了评估我们的方法,我们在模拟和与人类参与者(N = 100)在多个数据集上进行了详细实验。与模拟用户的实验表明,与基线相比,I-CEE提高了用户准确预测模型决策(模拟性)的能力,提供了有前途的初步结果。与人类参与者的实验表明,我们的方法显著提高了用户的模拟性准确性,突显了人类中心XAI的重要性。

更新时间: 2025-07-24 11:44:19

领域: cs.AI,cs.CV,cs.HC,cs.LG

下载: http://arxiv.org/abs/2312.12102v3
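
One simple way to make example informativeness depend on user expertise, in the spirit of I-CEE; the selection rule below is a simplified stand-in of my own, not the paper's exact objective:

    import numpy as np

    def select_examples(user_probs, model_decisions, k=5):
        """user_probs: (N, C) class probabilities a simulated user model (fit to
        the assumed expertise) assigns to N candidate images; model_decisions:
        (N,) labels the target model predicts. Returns indices of k examples."""
        agreement = user_probs[np.arange(len(model_decisions)), model_decisions]
        surprise = 1.0 - agreement        # high where this user would mispredict the model
        return np.argsort(-surprise)[:k]  # show the most surprising cases first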

State of Health Estimation of Batteries Using a Time-Informed Dynamic Sequence-Inverted Transformer

The rapid adoption of battery-powered vehicles and energy storage systems over the past decade has made battery health monitoring increasingly critical. Batteries play a central role in the efficiency and safety of these systems, yet they inevitably degrade over time due to repeated charge-discharge cycles. This degradation leads to reduced energy efficiency and potential overheating, posing significant safety concerns. Accurate estimation of a battery's State of Health (SoH) is therefore essential for ensuring operational reliability and safety. Several machine learning architectures, such as LSTMs, transformers, and encoder-based models, have been proposed to estimate SoH from discharge cycle data. However, these models struggle with the irregularities inherent in real-world measurements: discharge readings are often recorded at non-uniform intervals, and the lengths of discharge cycles vary significantly. To address this, most existing approaches extract features from the sequences rather than processing them in full, which introduces information loss and compromises accuracy. To overcome these challenges, we propose a novel architecture: the Time-Informed Dynamic Sequence Inverted Transformer (TIDSIT). TIDSIT incorporates continuous time embeddings to effectively represent irregularly sampled data and utilizes padded sequences with temporal attention mechanisms to manage variable-length inputs without discarding sequence information. Experimental results on the NASA battery degradation dataset show that TIDSIT significantly outperforms existing models, achieving over 50% reduction in prediction error and maintaining an SoH prediction error below 0.58%. Furthermore, the architecture is generalizable and holds promise for broader applications in health monitoring tasks involving irregular time-series data.

Updated: 2025-07-24 11:43:46

标题: 使用时间感知的动态序列倒置Transformer对电池健康状态进行估计

摘要: 在过去的十年里,电池动力车辆和储能系统的迅速采用使得电池健康监测变得日益关键。电池在这些系统的效率和安全性中起着至关重要的作用,然而由于反复的充放电循环,它们不可避免地会随着时间的推移而降解。这种降解会导致能量效率降低和潜在的过热现象,造成重大的安全隐患。因此,准确估计电池的健康状态对于确保运行可靠性和安全性至关重要。已经提出了几种机器学习架构,如LSTMs、transformers和基于编码器的模型,用于从放电循环数据中估计电池的健康状态。然而,这些模型在现实世界测量中固有的不规则性方面存在困难:放电读数通常以非均匀间隔记录,并且放电循环的长度变化很大。为了解决这个问题,大多数现有方法从序列中提取特征而不是对其进行完全处理,这会导致信息损失并影响准确性。为了克服这些挑战,我们提出了一种新颖的架构:Time-Informed Dynamic Sequence Inverted Transformer (TIDSIT)。TIDSIT将连续时间嵌入有效地表示不规则采样数据,并利用带有时间注意机制的填充序列来管理可变长度的输入,而不丢弃序列信息。NASA电池降解数据集上的实验结果显示,TIDSIT明显优于现有模型,预测误差降低超过50%,并将SoH预测误差保持在0.58%以下。此外,该架构具有通用性,并有望在涉及不规则时间序列数据的健康监测任务中得到更广泛的应用。

更新时间: 2025-07-24 11:43:46

领域: cs.LG

下载: http://arxiv.org/abs/2507.18320v1
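
A minimal PyTorch sketch of the two ingredients the abstract names; module names and shapes are my assumptions, not the paper's code. A continuous sinusoidal time embedding represents irregular timestamps, and a key-padding mask lets attention handle variable-length inputs without discarding sequence information:

    import torch
    import torch.nn as nn

    class ContinuousTimeEmbedding(nn.Module):
        def __init__(self, d_model: int):
            super().__init__()
            self.freqs = nn.Parameter(torch.randn(d_model // 2))  # learnable frequencies

        def forward(self, t):                   # t: (B, L) real-valued timestamps
            ang = t.unsqueeze(-1) * self.freqs  # (B, L, d_model/2)
            return torch.cat([ang.sin(), ang.cos()], dim=-1)  # (B, L, d_model)

    d_model = 64
    embed_t = ContinuousTimeEmbedding(d_model)
    attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    x = torch.randn(8, 120, d_model)            # padded feature sequences
    t = torch.rand(8, 120).cumsum(dim=1)        # irregular, increasing timestamps
    pad = torch.zeros(8, 120, dtype=torch.bool)
    pad[:, 100:] = True                         # last 20 steps are padding

    h = x + embed_t(t)                          # time-informed features
    out, _ = attn(h, h, h, key_padding_mask=pad)  # padded steps never attended to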

Retrieving Classes of Causal Orders with Inconsistent Knowledge Bases

Traditional causal discovery methods often rely on strong, untestable assumptions, which makes them unreliable in real applications. In this context, Large Language Models (LLMs) have emerged as a promising alternative for extracting causal knowledge from text-based metadata, which consolidates domain expertise. However, LLMs tend to be unreliable and prone to hallucinations, necessitating strategies that account for their limitations. One effective strategy is to use a consistency measure to assess reliability. Additionally, most text metadata does not clearly distinguish direct causal relationships from indirect ones, further complicating the discovery of a causal DAG. As a result, focusing on causal orders, rather than causal DAGs, emerges as a more practical and robust approach. We present a new method to derive a class of acyclic tournaments, which represent plausible causal orders, maximizing a consistency score derived from an LLM. Our approach starts by calculating pairwise consistency scores between variables, resulting in a semi-complete partially directed graph that consolidates these scores into an abstraction of the maximally consistent causal orders. Using this structure, we identify optimal acyclic tournaments, focusing on those that maximize consistency across all configurations. We subsequently show how both the abstraction and the class of causal orders can be used to estimate causal effects. We tested our method on well-established benchmarks as well as real-world datasets from epidemiology and public health. Our results demonstrate the effectiveness of our approach in recovering the correct causal order.

Updated: 2025-07-24 11:38:53

标题: 使用不一致知识库检索因果顺序的类别

摘要: 传统的因果发现方法通常依赖于强大且不可测试的假设,这使它们在实际应用中不可靠。在这种背景下,大型语言模型(LLMs)已经成为从基于文本的元数据中提取因果知识的一种有前途的替代方法,这巩固了领域专业知识。然而,LLMs往往不可靠且容易产生幻觉,需要考虑到其局限性的策略。一种有效的策略是使用一致性度量来评估可靠性。此外,大多数文本元数据不能清楚地区分直接因果关系和间接因果关系,进一步复杂化了发现因果DAG。因此,将注意力集中在因果顺序而不是因果DAG上,显现出更为实用和稳健的方法。我们提出了一种新方法,用于推导一类无环锦标赛,它代表了可能的因果顺序,最大化了从LLM中导出的一致性分数。我们的方法首先通过计算变量之间的成对一致性分数开始,从而得到一个半完全部分有向图,将这些分数整合成最大一致性因果顺序的抽象。利用这种结构,我们确定最佳的无环锦标赛,重点关注那些在所有配置中最大化一致性的情况。随后,我们展示了如何使用这种抽象和因果顺序类别来估计因果效应。我们在已建立的基准数据集以及流行病学和公共卫生的真实世界数据集上测试了我们的方法。我们的结果证明了我们的方法在恢复正确的因果顺序方面的有效性。

更新时间: 2025-07-24 11:38:53

领域: cs.AI

下载: http://arxiv.org/abs/2412.14019v3
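
To make the searched object concrete: an acyclic tournament over n variables is a total order, and every order can be scored against the pairwise consistency values. A brute-force sketch (illustrative only; the paper operates on an abstraction of the maximally consistent orders rather than enumerating permutations):

    from itertools import combinations, permutations

    def best_causal_orders(score, n):
        """score[(i, j)]: LLM-derived consistency of the claim 'i precedes j'."""
        def total(order):
            rank = {v: k for k, v in enumerate(order)}
            return sum(score[(i, j)] if rank[i] < rank[j] else score[(j, i)]
                       for i, j in combinations(range(n), 2))
        best = max(total(o) for o in permutations(range(n)))
        return [o for o in permutations(range(n)) if total(o) == best]

    # Toy example: smoking -> tar -> cancer, with noisy reverse scores.
    s = {(0, 1): 0.9, (1, 0): 0.1, (1, 2): 0.8,
         (2, 1): 0.2, (0, 2): 0.6, (2, 0): 0.4}
    print(best_causal_orders(s, 3))   # [(0, 1, 2)]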

Differentiable Motion Manifold Primitives for Reactive Motion Generation under Kinodynamic Constraints

Real-time motion generation -- which is essential for achieving reactive and adaptive behavior -- under kinodynamic constraints for high-dimensional systems is a crucial yet challenging problem. We address this with a two-step approach: offline learning of a lower-dimensional trajectory manifold of task-relevant, constraint-satisfying trajectories, followed by rapid online search within this manifold. Extending the discrete-time Motion Manifold Primitives (MMP) framework, we propose Differentiable Motion Manifold Primitives (DMMP), a novel neural network architecture that encodes and generates continuous-time, differentiable trajectories, trained using data collected offline through trajectory optimizations, with a strategy that ensures constraint satisfaction -- absent in existing methods. Experiments on dynamic throwing with a 7-DoF robot arm demonstrate that DMMP outperforms prior methods in planning speed, task success, and constraint satisfaction.

Updated: 2025-07-24 11:36:31

标题: 在运动动力学约束下的可微运动流形基元用于反应式运动生成

摘要: 即时运动生成是实现反应性和适应性行为的关键,对于高维系统而言,在运动动力学约束下进行实时运动生成是一个至关重要但具有挑战性的问题。我们采用两步方法来解决这个问题:首先是离线学习与任务相关、满足约束的低维轨迹流形,然后在这个流形内进行快速在线搜索。通过扩展离散时间运动流形基元(MMP)框架,我们提出了可微分运动流形基元(DMMP),这是一种新颖的神经网络架构,用于编码和生成连续时间、可微分的轨迹,通过离线收集的轨迹优化数据进行训练,采用一种确保约束满足的策略,这在现有方法中是缺失的。在具有7自由度机械臂的动态投掷实验中,DMMP在规划速度、任务成功和约束满足方面表现优于先前的方法。

更新时间: 2025-07-24 11:36:31

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.12193v2
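
A schematic sketch of the core idea, with architecture and sizes as my assumptions: a decoder maps a latent code and a continuous time t to a joint configuration, so the trajectory is differentiable in t, velocity-level constraints can be penalized through automatic differentiation during offline training, and the low-dimensional z can be searched rapidly online:

    import torch
    import torch.nn as nn

    class TrajectoryDecoder(nn.Module):
        def __init__(self, latent_dim=8, dof=7, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + 1, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, dof))

        def forward(self, z, t):        # z: (B, latent_dim); t: (B, 1) in [0, 1]
            return self.net(torch.cat([z, t], dim=-1))   # configuration q(t)

    dec = TrajectoryDecoder()
    z = torch.randn(1, 8, requires_grad=True)
    t = torch.full((1, 1), 0.5, requires_grad=True)
    q = dec(z, t)
    qdot = torch.autograd.grad(q.sum(), t, create_graph=True)[0]  # dq/dt via autodiff
    # Kinodynamic limits (e.g. |qdot| <= v_max) become differentiable penalties.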

Regression-aware Continual Learning for Android Malware Detection

Malware evolves rapidly, forcing machine learning (ML)-based detectors to adapt continuously. With antivirus vendors processing hundreds of thousands of new samples daily, datasets can grow to billions of examples, making full retraining impractical. Continual learning (CL) has emerged as a scalable alternative, enabling incremental updates without full data access while mitigating catastrophic forgetting. In this work, we analyze a critical yet overlooked issue in this context: security regression. Unlike forgetting, which manifests as a general performance drop on previously seen data, security regression captures harmful prediction changes at the sample level, such as a malware sample that was once correctly detected but evades detection after a model update. Although often overlooked, regressions pose serious risks in security-critical applications, as the silent reintroduction of previously detected threats in the system may undermine users' trust in the whole updating process. To address this issue, we formalize and quantify security regression in CL-based malware detectors and propose a regression-aware penalty to mitigate it. Specifically, we adapt Positive Congruent Training (PCT) to the CL setting, preserving prior predictive behavior in a model-agnostic manner. Experiments on the ELSA, Tesseract, and AZ-Class datasets show that our method effectively reduces regression across different CL scenarios while maintaining strong detection performance over time.

Updated: 2025-07-24 11:31:23

标题: 回归感知的持续学习用于安卓恶意软件检测

摘要: 恶意软件的演变速度很快,迫使基于机器学习(ML)的检测器持续适应。随着杀毒软件供应商每天处理数十万个新样本,数据集可能增长到数十亿个示例,使得完全重新训练变得不切实际。持续学习(CL)已经成为一种可扩展的替代方案,可以进行增量更新,而无需完全访问数据,同时减轻灾难性遗忘。在这项工作中,我们分析了这一领域中一个关键但被忽视的问题:安全回归。与遗忘不同,遗忘表现为先前见过的数据的总体性能下降,安全回归捕捉到样本级别的有害预测变化,例如一种曾经被正确检测出的恶意软件样本,在模型更新后逃避检测。尽管经常被忽视,但回归在安全关键应用中构成严重风险,因为之前检测到的威胁在系统中的无声重新引入可能会削弱用户对整个更新过程的信任。为了解决这个问题,我们在基于CL的恶意软件检测器中形式化和量化安全回归,并提出一种回归感知惩罚来减轻它。具体而言,我们将Positive Congruent Training(PCT)调整到CL环境中,以一种与模型无关的方式保留先前的预测行为。在ELSA、Tesseract和AZ-Class数据集上的实验表明,我们的方法有效地减少了不同CL场景下的回归,同时在时间上保持强大的检测性能。

更新时间: 2025-07-24 11:31:23

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2507.18313v1
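
A minimal sketch of a regression-aware update in the spirit of Positive Congruent Training adapted to continual learning (the loss form is simplified by me): on samples the previous model already classified correctly, the new model is additionally penalized for drifting away from the old predictions:

    import torch
    import torch.nn.functional as F

    def pct_step(model, old_model, opt, x, y, lam=1.0):
        logits_new = model(x)
        loss = F.cross_entropy(logits_new, y)

        with torch.no_grad():
            logits_old = old_model(x)
            was_correct = logits_old.argmax(dim=1) == y   # where regressions hurt

        if was_correct.any():                             # distill only on those
            loss = loss + lam * F.kl_div(
                F.log_softmax(logits_new[was_correct], dim=1),
                F.softmax(logits_old[was_correct], dim=1),
                reduction="batchmean")

        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()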

GNN-ACLP: Graph Neural Networks Based Analog Circuit Link Prediction

Circuit link prediction, which identifies missing component connections from incomplete netlists, is crucial in analog circuit design automation. However, existing methods face three main challenges: 1) Insufficient use of topological patterns in circuit graphs reduces prediction accuracy; 2) Data scarcity due to the complexity of annotations hinders model generalization; 3) Limited adaptability to various netlist formats. We propose GNN-ACLP, a graph neural networks (GNNs) based method featuring three innovations to tackle these challenges. First, we introduce the SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction) framework and achieve port-level accuracy in circuit link prediction. Second, we propose Netlist Babel Fish, a netlist format conversion tool leveraging retrieval-augmented generation (RAG) with a large language model (LLM) to improve the compatibility of netlist formats. Finally, we construct SpiceNetlist, a comprehensive dataset that contains 775 annotated circuits across 10 different component classes. Experiments demonstrate accuracy improvements of 16.08% on SpiceNetlist, 11.38% on Image2Net, and 16.01% on Masala-CHAI compared to the baseline in intra-dataset evaluation, while maintaining accuracy from 92.05% to 99.07% in cross-dataset evaluation, exhibiting robust feature transfer capabilities.

Updated: 2025-07-24 11:31:03

标题: GNN-ACLP:基于图神经网络的模拟电路链接预测

摘要: 电路链接预测在模拟电路设计自动化中是至关重要的,可以识别不完整网表中缺失的元件连接。然而,现有方法面临三个主要挑战:1)电路图中拓扑模式的利用不足降低了预测准确性;2)由于注释复杂性导致数据稀缺,阻碍了模型的泛化能力;3)对各种网表格式的适应能力有限。我们提出了基于图神经网络(GNNs)的GNN-ACLP方法,具有三个创新来解决这些挑战。首先,我们引入了SEAL(学习来自子图、嵌入和属性用于链接预测)框架,在电路链接预测中实现了端口级准确性。其次,我们提出了Netlist Babel Fish,这是一种利用检索增强生成(RAG)和大型语言模型(LLM)的网表格式转换工具,以提高网表格式的兼容性。最后,我们构建了SpiceNetlist,这是一个包含10种不同组件类别的775个注释电路的全面数据集。实验证明,在数据集内评估中,与基线相比,在SpiceNetlist上准确性提高了16.08%,在Image2Net上提高了11.38%,在Masala-CHAI上提高了16.01%,同时在跨数据集评估中,准确性保持在92.05%至99.07%之间,展现了强大的特征转移能力。

更新时间: 2025-07-24 11:31:03

领域: cs.AR,cs.LG

下载: http://arxiv.org/abs/2504.10240v4
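
The SEAL-style first step is easy to picture: extract the h-hop enclosing subgraph around a candidate link, which a GNN then scores. A hedged sketch on a toy netlist-like graph (networkx is used for brevity; the paper's port-level pipeline is more involved):

    import networkx as nx

    def enclosing_subgraph(G, u, v, hops=1):
        nodes, frontier = {u, v}, {u, v}
        for _ in range(hops):                    # expand h hops around both ends
            frontier = {n for f in frontier for n in G.neighbors(f)} - nodes
            nodes |= frontier
        sub = G.subgraph(nodes).copy()
        sub.remove_edges_from([(u, v)])          # hide the target link, if present
        return sub

    # Toy graph: nodes are component ports and nets, edges are connections.
    G = nx.Graph([("R1.a", "net1"), ("net1", "C1.a"), ("C1.b", "net2"),
                  ("net2", "R2.a"), ("R1.b", "net2")])
    print(enclosing_subgraph(G, "R1.a", "net2", hops=1).edges)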

Variational inference for pile-up removal at hadron colliders with diffusion models

In this paper, we present a novel method for pile-up removal of $pp$ interactions using variational inference with diffusion models, called vipr. Instead of using classification methods to identify which particles are from the primary collision, a generative model is trained to predict the constituents of the hard-scatter particle jets with pile-up removed. This results in an estimate of the full posterior over hard-scatter jet constituents, which has not yet been explored in the context of pile-up removal, yielding a clear advantage over existing methods especially in the presence of imperfect detector efficiency. We evaluate the performance of vipr in a sample of jets from simulated $t\bar{t}$ events overlain with pile-up contamination. vipr outperforms softdrop and has comparable performance to puppiml in predicting the substructure of the hard-scatter jets over a wide range of pile-up scenarios.

Updated: 2025-07-24 11:24:17

标题: 强子对撞机中利用扩散模型进行堆积消除的变分推断

摘要: 在这篇论文中,我们提出了一种新颖的方法,利用扩散模型和变分推理来消除$pp$相互作用中的堆积效应,称为vipr。与使用分类方法来识别哪些粒子来自初级碰撞不同,我们训练了一个生成模型,用于预测去除堆积效应后的硬散射粒子喷注的组成部分。这导致对硬散射喷注组成部分的完整后验的估计,这在堆积效应消除的背景下尚未被探索,尤其在存在不完美的探测器效率时具有明显优势。我们在叠加有堆积污染的模拟$t\bar{t}$事件的喷注样本中评估了vipr的性能。vipr在预测硬散射喷注的亚结构方面优于softdrop,并在各种堆积场景下具有与puppiml相媲美的性能。

更新时间: 2025-07-24 11:24:17

领域: hep-ph,cs.LG

下载: http://arxiv.org/abs/2410.22074v2
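
A hedged sketch of the generative core (shapes, noise schedule, and conditioning interface are my assumptions): one conditional denoising-diffusion training step that learns to recover hard-scatter jet constituents given the pile-up-contaminated jet; sampling from the trained model then yields draws from the posterior over constituents:

    import torch
    import torch.nn.functional as F

    def diffusion_step(denoiser, opt, jets_clean, jets_pileup, T=1000):
        """jets_clean / jets_pileup: (B, N, F) constituent tensors."""
        b = jets_clean.size(0)
        t = torch.randint(0, T, (b,))
        alpha_bar = torch.cos(0.5 * torch.pi * t.float() / T) ** 2  # cosine schedule
        a = alpha_bar.view(b, 1, 1)
        noise = torch.randn_like(jets_clean)
        x_t = a.sqrt() * jets_clean + (1 - a).sqrt() * noise        # forward process
        pred = denoiser(x_t, t, cond=jets_pileup)   # conditional noise prediction
        loss = F.mse_loss(pred, noise)              # standard epsilon objective
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()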

LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models

Language Models (LMs) typically adhere to a "pre-training and fine-tuning" paradigm, where a universal pre-trained model can be fine-tuned to cater to various specialized domains. Low-Rank Adaptation (LoRA) has gained the most widespread use in LM fine-tuning due to its lightweight computational cost and remarkable performance. Because the proportion of parameters tuned by LoRA is relatively small, there might be a misleading impression that the LoRA fine-tuning data is invulnerable to Membership Inference Attacks (MIAs). However, we identify that utilizing the pre-trained model can induce more information leakage, which is neglected by existing MIAs. Therefore, we introduce LoRA-Leak, a holistic evaluation framework for MIAs against the fine-tuning datasets of LMs. LoRA-Leak incorporates fifteen membership inference attacks, including ten existing MIAs, and five improved MIAs that leverage the pre-trained model as a reference. In experiments, we apply LoRA-Leak to three advanced LMs across three popular natural language processing tasks, demonstrating that LoRA-based fine-tuned LMs are still vulnerable to MIAs (e.g., 0.775 AUC under conservative fine-tuning settings). We also applied LoRA-Leak to different fine-tuning settings to understand the resulting privacy risks. We further explore four defenses and find that only dropout and excluding specific LM layers during fine-tuning effectively mitigate MIA risks while maintaining utility. We highlight that under the "pre-training and fine-tuning" paradigm, the existence of the pre-trained model makes MIA a more severe risk for LoRA-based LMs. We hope that our findings can provide guidance on data privacy protection for specialized LM providers.

Updated: 2025-07-24 11:18:27

标题: LoRA-Leak:针对LoRA微调语言模型的成员推断攻击

摘要: 语言模型(LMs)通常遵循“预训练和微调”范式,其中一个通用的预训练模型可以被微调以适应各种专业领域。由于低秩适应(LoRA)具有计算成本轻、性能显著的优点,LM微调中LoRA已经得到了广泛应用。由于LoRA微调的参数比例相对较小,可能会产生一种误导性印象,即LoRA微调数据对成员推断攻击(MIAs)是无害的。然而,我们发现利用预训练模型可能会导致更多信息泄露,这是现有MIAs忽视的。因此,我们引入LoRA-Leak,一个针对LM微调数据集的MIAs的全面评估框架。LoRA-Leak包括十五种成员推断攻击,包括十种现有MIAs和五种利用预训练模型作为参考的改进MIAs。在实验中,我们将LoRA-Leak应用于三种先进的LMs,涵盖三种流行的自然语言处理任务,证明LoRA基于微调的LMs仍然容易受到MIAs的攻击(例如,在保守的微调设置下0.775的AUC)。我们还将LoRA-Leak应用于不同的微调设置,以了解产生的隐私风险。我们进一步探索了四种防御方法,并发现只有在微调过程中使用dropout和排除特定的LM层才能有效地减轻MIAs风险同时保持效用。我们强调,在“预训练和微调”范式下,预训练模型的存在使得对于LoRA-based LMs来说,MIAs是一个更严重的风险。我们希望我们的发现可以为专业LM提供商提供数据隐私保护的指导。

更新时间: 2025-07-24 11:18:27

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.18302v1
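
The reference-model idea can be made concrete with a generic calibrated-loss membership score; this is the standard attack template rather than necessarily the paper's exact five variants, and a HuggingFace-style causal LM interface is assumed:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def calibrated_mia_score(ft_model, base_model, input_ids):
        """Higher score => more likely the sequence was in the fine-tuning set."""
        def nll(model):
            logits = model(input_ids).logits[:, :-1]       # next-token logits
            return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   input_ids[:, 1:].reshape(-1))
        # Membership signal: how much LoRA fine-tuning lowered the loss relative
        # to what the pre-trained reference already assigned to the sequence.
        return (nll(base_model) - nll(ft_model)).item()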

DocTER: Evaluating Document-based Knowledge Editing

Knowledge editing aims to correct outdated or inaccurate knowledge in neural networks. In this paper, we explore knowledge editing using easily accessible documents instead of manually labeled factual triples employed in earlier research. To advance this field, we establish the first evaluation benchmark, \textit{DocTER}, featuring Documents containing counterfactual knowledge for editing. A comprehensive four-perspective evaluation is introduced: Edit Success, Locality, Reasoning, and Cross-lingual Transfer. To adapt conventional triplet-based knowledge editing methods for this task, we develop an Extract-then-Edit pipeline that extracts triples from documents before applying existing methods. Experiments on popular knowledge editing methods demonstrate that editing with documents presents significantly greater challenges than using triples. In document-based scenarios, even the best-performing in-context editing approach still lags behind by 10 points in editing success when compared to using gold triples. This observation also holds for both reasoning and cross-lingual test sets. We further analyze key factors influencing task performance, including the quality of extracted triples, the frequency and position of edited knowledge in documents, various methods for enhancing reasoning, and performance differences across various directions in cross-lingual knowledge editing, which provide valuable insights for future research.

Updated: 2025-07-24 11:16:48

标题: DocTER:评估基于文档的知识编辑

摘要: Knowledge editing旨在纠正神经网络中过时或不准确的知识。在本文中,我们探讨使用易于获取的文档进行知识编辑,而不是像早期研究中使用手动标记的事实三元组。为推动这一领域的发展,我们建立了第一个评估基准,\textit{DocTER},其中包含包含反事实知识的文档用于编辑。引入了全面的四方面评估:编辑成功、局部性、推理和跨语言转移。为了将传统的基于三元组的知识编辑方法应用于这一任务,我们开发了一个“提取-编辑”流水线,该流水线在应用现有方法之前从文档中提取三元组。对流行的知识编辑方法进行的实验表明,使用文档进行编辑比使用三元组面临着更大的挑战。在基于文档的场景中,即使是表现最佳的上下文编辑方法在编辑成功方面仍比使用黄金三元组落后10个百分点。这一观察结果也适用于推理和跨语言测试集。我们进一步分析了影响任务性能的关键因素,包括提取三元组的质量、文档中编辑知识的频率和位置、增强推理的各种方法,以及跨语言知识编辑中不同方向的性能差异,这为未来的研究提供了宝贵的见解。

更新时间: 2025-07-24 11:16:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2308.09954v2
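
A hedged sketch of the Extract-then-Edit pipeline: triples are pulled from the document by an LLM and handed to any conventional triple-based editor. Both callables below are assumptions of this sketch, not DocTER's interface:

    def extract_then_edit(document: str, llm, apply_edit):
        prompt = ("Extract all factual (subject | relation | object) triples "
                  "from the text, one per line:\n" + document)
        triples = []
        for line in llm(prompt).strip().splitlines():
            parts = [p.strip() for p in line.split("|")]
            if len(parts) == 3:                 # keep only well-formed triples
                triples.append(tuple(parts))
        for s, r, o in triples:                 # hand off to a triplet-based editor
            apply_edit(subject=s, relation=r, target=o)
        return triples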

Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning

Multimodal reasoning in Large Language Models (LLMs) struggles with incomplete knowledge and hallucination artifacts, challenges that textual Knowledge Graphs (KGs) only partially mitigate due to their modality isolation. While Multimodal Knowledge Graphs (MMKGs) promise enhanced cross-modal understanding, their practical construction is impeded by semantic narrowness of manual text annotations and inherent noise in visual-semantic entity linkages. In this paper, we propose Vision-align-to-Language integrated Knowledge Graph (VaLiK), a novel approach for constructing MMKGs that enhances LLMs reasoning through cross-modal information supplementation. Specifically, we cascade pre-trained Vision-Language Models (VLMs) to align image features with text, transforming them into descriptions that encapsulate image-specific information. Furthermore, we developed a cross-modal similarity verification mechanism to quantify semantic consistency, effectively filtering out noise introduced during feature alignment. Even without manually annotated image captions, the refined descriptions alone suffice to construct the MMKG. Compared to conventional MMKGs construction paradigms, our approach achieves substantial storage efficiency gains while maintaining direct entity-to-image linkage capability. Experimental results on multimodal reasoning tasks demonstrate that LLMs augmented with VaLiK outperform previous state-of-the-art models. Our code is published at https://github.com/Wings-Of-Disaster/VaLiK.

Updated: 2025-07-24 11:14:43

标题: 将视觉与语言对齐:无需标注的多模态知识图构建,以增强LLMs推理

摘要: 大语言模型(LLMs)中的多模态推理面临着知识不完整和幻觉人工制品的挑战,这些挑战文本知识图(KGs)仅部分缓解,因为它们是模态隔离的。虽然多模态知识图(MMKGs)承诺增强跨模态理解,但由于手动文本注释的语义狭窄和视觉语义实体链接中固有的噪音,它们的实际构建受到阻碍。在本文中,我们提出了一种名为Vision-align-to-Language integrated Knowledge Graph(VaLiK)的新方法,用于构建通过跨模态信息补充增强LLMs推理的MMKGs。具体而言,我们将预训练的Vision-Language模型(VLMs)级联以对齐图像特征和文本,将它们转换为概括图像特定信息的描述。此外,我们开发了一种跨模态相似性验证机制来量化语义一致性,有效地过滤在特征对齐过程中引入的噪音。即使没有手动注释的图像标题,精炼的描述也足以构建MMKG。与传统的MMKG构建范式相比,我们的方法在保持直接实体到图像链接能力的同时实现了实质性的存储效率提升。在多模态推理任务上的实验结果表明,增强了VaLiK的LLMs优于先前的最先进模型。我们的代码发布在https://github.com/Wings-Of-Disaster/VaLiK。

更新时间: 2025-07-24 11:14:43

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.12972v2
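
The cross-modal verification step reduces to a similarity gate. A minimal sketch in which the threshold is illustrative, embed_text stands in for the text half of an aligned encoder pair such as CLIP, and image_vec is the precomputed image embedding:

    import numpy as np

    def verify_descriptions(image_vec, candidates, embed_text, tau=0.25):
        """Keep only VLM-generated descriptions consistent with the image."""
        kept = []
        for text in candidates:
            t = embed_text(text)
            cos = float(np.dot(image_vec, t) /
                        (np.linalg.norm(image_vec) * np.linalg.norm(t) + 1e-8))
            if cos >= tau:                  # semantic consistency filters noise
                kept.append((text, cos))
        return sorted(kept, key=lambda p: -p[1])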

OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problems with Reasoning LLM

With the rise of artificial intelligence (AI), applying large language models (LLMs) to Operations Research (OR) problem-solving has attracted increasing attention. Most existing approaches attempt to improve OR problem-solving through prompt engineering or fine-tuning strategies for LLMs. However, these methods are fundamentally constrained by the limited capabilities of non-reasoning LLMs. To overcome these limitations, we propose OR-LLM-Agent, an AI agent built on reasoning LLMs for automated OR problem solving. The agent decomposes the task into three sequential stages: mathematical modeling, code generation, and debugging. Each task is handled by a dedicated sub-agent, which enables more targeted reasoning. We also construct BWOR, a high-quality dataset for evaluating LLM performance on OR tasks. Our analysis shows that existing benchmarks such as NL4OPT, MAMO, and IndustryOR suffer from certain issues, making them less suitable for reliably evaluating LLM performance. In contrast, BWOR provides a more consistent and discriminative assessment of model capabilities. Experimental results demonstrate that OR-LLM-Agent outperforms advanced methods, including GPT-o3, Gemini 2.5 Pro, and ORLM, by at least 7% in accuracy. These results demonstrate the effectiveness of task decomposition for OR problem solving.

Updated: 2025-07-24 11:09:58

标题: OR-LLM-Agent: 使用推理LLM自动建模和解决运筹学优化问题

摘要: 随着人工智能(AI)的兴起,将大型语言模型(LLMs)应用于运筹学(OR)问题解决引起了越来越多的关注。大多数现有方法试图通过提示工程或LLMs的微调策略来改善OR问题解决。然而,这些方法基本上受限于非推理LLMs的有限能力。为了克服这些限制,我们提出了OR-LLM-Agent,这是一个基于推理LLMs构建的AI代理,用于自动化OR问题解决。该代理将任务分解为三个连续阶段:数学建模、代码生成和调试。每个任务由专门的子代理处理,从而实现更有针对性的推理。我们还构建了BWOR,一个用于评估LLM在OR任务上性能的高质量数据集。我们的分析显示,现有的基准如NL4OPT,MAMO和IndustryOR存在某些问题,使它们不太适合可靠地评估LLM的性能。相比之下,BWOR提供了更一致和区分性的模型能力评估。实验结果表明,OR-LLM-Agent在准确性方面至少比GPT-o3,Gemini 2.5 Pro和ORLM等先进方法高出7%。这些结果表明了任务分解对OR问题解决的有效性。

更新时间: 2025-07-24 11:09:58

领域: cs.AI,math.OC

下载: http://arxiv.org/abs/2503.10009v2
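
The three-stage decomposition can be sketched as a pipeline of LLM calls with an execution-feedback loop; each sub-agent is reduced to a single call here, and llm is an assumed completion function, not the paper's interface:

    import subprocess, sys, tempfile

    def solve_or_problem(problem: str, llm, max_debug_rounds: int = 3) -> str:
        model = llm(f"Write a mathematical optimization model for:\n{problem}")
        code = llm(f"Write a complete Python script that solves this model:\n{model}")
        for _ in range(max_debug_rounds):               # debugging sub-agent loop
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
            run = subprocess.run([sys.executable, f.name],
                                 capture_output=True, text=True)
            if run.returncode == 0:
                return run.stdout                       # solved: solver output
            code = llm(f"Fix this script.\nError:\n{run.stderr}\nScript:\n{code}")
        raise RuntimeError("debugging budget exhausted")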

VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

Domain generalizability is a crucial aspect of a deep learning model since it determines the capability of the model to perform well on data from unseen domains. However, research on the domain generalizability of deep learning models for vision-language tasks remains limited, primarily because of the lack of required datasets. To address these challenges, we propose VolDoGer: Vision-Language Dataset for Domain Generalization, a dedicated dataset designed for domain generalization that addresses three vision-language tasks: image captioning, visual question answering, and visual entailment. We constructed VolDoGer by extending LLM-based data annotation techniques to vision-language tasks, thereby alleviating the burden of recruiting human annotators. We evaluated the domain generalizability of various models, ranging from fine-tuned models to a recent multimodal large language model, through VolDoGer.

Updated: 2025-07-24 11:08:59

标题: VolDoGer:LLM辅助数据集用于视觉-语言任务中的领域泛化

摘要: 领域通用性是深度学习模型的一个关键方面,因为它决定了模型在未知领域数据上表现良好的能力。然而,针对视觉语言任务的深度学习模型的领域通用性研究仍然有限,主要是因为缺乏必要的数据集。为了解决这些挑战,我们提出了VolDoGer:视觉语言领域通用性数据集,这是一个专门设计用于领域通用性的数据集,涉及三个视觉语言任务:图像描述、视觉问答和视觉蕴含。我们通过将基于LLM的数据标注技术扩展到视觉语言任务,构建了VolDoGer,从而减轻了招募人类注释者的负担。我们通过VolDoGer评估了各种模型的领域通用性,从微调模型到最近的多模态大型语言模型。

更新时间: 2025-07-24 11:08:59

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2407.19795v2

PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels). Our novel and efficient end-to-end driving architecture operates using only camera data, without explicit BEV representation and forgoing the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories from raw pixel inputs directly. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. We demonstrate through comprehensive experiments that PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, matching the capabilities of larger, multimodal diffusion planners while being significantly more efficient in terms of inference speed and model size, making it a practical solution for real-world deployment. Our work is open-source and the code will be at https://maxiuw.github.io/prix.

Updated: 2025-07-24 11:04:42

标题: PRIX:从原始像素学习规划,实现端到端自动驾驶

摘要: 尽管端到端自动驾驶模型显示出有希望的结果,但它们的实际部署通常受到大型模型尺寸、对昂贵的LiDAR传感器的依赖和计算密集的BEV特征表示的限制。这限制了它们的可扩展性,特别是对于只配备摄像头的大众市场车辆。为了解决这些挑战,我们提出了PRIX(从原始像素规划)。我们的新颖、高效的端到端驾驶架构只使用摄像头数据,不需要显式的BEV表示,并且不需要LiDAR。PRIX利用视觉特征提取器与生成规划头部相结合,直接从原始像素输入预测安全轨迹。我们架构的核心组件是上下文感知重新校准变压器(CaRT),这是一个设计精良的模块,可有效增强多级视觉特征,以实现更健壮的规划。我们通过全面的实验展示,PRIX在NavSim和nuScenes基准测试中实现了最先进的性能,与更大、多模态扩散规划器的能力相匹配,同时在推理速度和模型尺寸方面显著更有效,使其成为实际部署的实际解决方案。我们的工作是开源的,代码将在https://maxiuw.github.io/prix提供。

更新时间: 2025-07-24 11:04:42

领域: cs.CV,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2507.17596v2

Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation

Due to the high computational load of modern numerical simulation, there is a demand for approaches that would reduce the size of discrete problems while keeping the accuracy reasonable. In this work, we present an original algorithm to coarsen an unstructured grid based on the concepts of differentiable physics. We achieve this by employing k-means clustering, autodifferentiation, and stochastic minimization algorithms. We demonstrate the performance of the designed algorithm on two PDEs: a linear parabolic equation which governs slightly compressible fluid flow in porous media, and the wave equation. Our results show that in the considered scenarios, we reduced the number of grid points up to 10 times while preserving the modeled variable dynamics in the points of interest. The proposed approach can be applied to the simulation of an arbitrary system described by evolutionary partial differential equations.
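
As a toy illustration of the recipe (not the paper's algorithm), the sketch below uses k-means to propose coarse node locations and automatic differentiation to tune node positions and values so that an interpolated coarse solution matches the fine-grid solution at points of interest; the inverse-distance interpolation and mean-squared loss are assumptions made for brevity.

    import torch
    from sklearn.cluster import KMeans

    def coarsen(fine_xy: torch.Tensor, poi: torch.Tensor, u_poi: torch.Tensor, k: int):
        # k-means proposes initial coarse node locations from the fine grid
        km = KMeans(n_clusters=k, n_init="auto").fit(fine_xy.numpy())
        centers = torch.tensor(km.cluster_centers_, dtype=torch.float32, requires_grad=True)
        coarse_u = torch.zeros(k, requires_grad=True)
        opt = torch.optim.Adam([centers, coarse_u], lr=1e-2)
        for _ in range(500):
            d = torch.cdist(poi, centers) + 1e-6                # POI-to-node distances
            w = (1.0 / d) / (1.0 / d).sum(dim=1, keepdim=True)  # inverse-distance weights
            pred = w @ coarse_u                                 # interpolated coarse solution
            loss = torch.mean((pred - u_poi) ** 2)              # match fine solution at POIs
            opt.zero_grad(); loss.backward(); opt.step()
        return centers.detach(), coarse_u.detach()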

Updated: 2025-07-24 11:02:13

标题: 利用自动微分的非结构化网格自监督粗化

摘要: 由于现代数值模拟的高计算负荷,需要一种方法来减小离散问题的规模,同时保持合理的精度。在这项工作中,我们提出了一种基于可微物理概念的原创算法来粗化非结构化网格。我们通过使用k均值聚类、自动微分和随机最小化算法来实现这一点。我们在两个偏微分方程上展示了设计算法的性能:一个控制多孔介质中稍可压缩流体流动的线性抛物方程和波动方程。我们的结果表明,在考虑的情景中,我们将网格点数量减少了多达10倍,同时保持了关注点的模拟变量动态。所提出的方法可以应用于演化偏微分方程描述的任意系统的模拟。

更新时间: 2025-07-24 11:02:13

领域: cs.LG

下载: http://arxiv.org/abs/2507.18297v1

Leveraging Data Augmentation and Siamese Learning for Predictive Process Monitoring

Predictive Process Monitoring (PPM) enables forecasting future events or outcomes of ongoing business process instances based on event logs. However, deep learning PPM approaches are often limited by the low variability and small size of real-world event logs. To address this, we introduce SiamSA-PPM, a novel self-supervised learning framework that combines Siamese learning with Statistical Augmentation for Predictive Process Monitoring. It employs three novel statistically grounded transformation methods that leverage control-flow semantics and frequent behavioral patterns to generate realistic, semantically valid new trace variants. These augmented views are used within a Siamese learning setup to learn generalizable representations of process prefixes without the need for labeled supervision. Extensive experiments on real-life event logs demonstrate that SiamSA-PPM achieves competitive or superior performance compared to the SOTA in both next activity and final outcome prediction tasks. Our results further show that statistical augmentation significantly outperforms random transformations and improves variability in the data, highlighting SiamSA-PPM as a promising direction for training data enrichment in process prediction.
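
A minimal sketch of the Siamese setup reads as follows: two augmented views of the same trace prefix pass through a shared encoder and are pulled together by a cosine loss. The GRU encoder and the adjacent-swap augmentation are simplified stand-ins for the paper's statistically grounded transformations, and a full setup would add negatives or a stop-gradient branch to avoid representation collapse.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PrefixEncoder(nn.Module):
        def __init__(self, n_activities: int, dim: int = 64):
            super().__init__()
            self.emb = nn.Embedding(n_activities, dim)
            self.rnn = nn.GRU(dim, dim, batch_first=True)

        def forward(self, prefix: torch.Tensor) -> torch.Tensor:
            _, h = self.rnn(self.emb(prefix))
            return F.normalize(h[-1], dim=-1)   # unit-norm prefix representation

    def augment(prefix: list[int]) -> list[int]:
        # stand-in transformation: swap two adjacent activities, which often
        # preserves control-flow validity in real logs
        p = prefix.copy()
        if len(p) > 2:
            i = random.randrange(len(p) - 1)
            p[i], p[i + 1] = p[i + 1], p[i]
        return p

    def siamese_loss(encoder: PrefixEncoder, prefix: list[int]) -> torch.Tensor:
        v1 = torch.tensor([augment(prefix)])
        v2 = torch.tensor([augment(prefix)])
        z1, z2 = encoder(v1), encoder(v2)
        return 1.0 - F.cosine_similarity(z1, z2).mean()  # pull the two views together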

Updated: 2025-07-24 10:57:20

标题: 利用数据增强和孪生学习进行预测流程监控

摘要: 预测性过程监控(PPM)能够基于事件日志预测正在进行的业务流程实例的未来事件或结果。然而,深度学习PPM方法通常受到真实世界事件日志的低变异性和小规模的限制。为了解决这个问题,我们引入了SiamSA-PPM,这是一个结合了Siamese学习和统计增强的自监督学习框架,用于预测性过程监控。它采用三种新颖的统计基础的转换方法,利用控制流语义和频繁的行为模式来生成真实、语义有效的新的跟踪变体。这些增强的视图在Siamese学习设置中使用,以学习过程前缀的可泛化表示,而无需标签监督。对真实生活中的事件日志进行了大量实验,结果表明,与SOTA相比,SiamSA-PPM在下一个活动和最终结果预测任务中实现了竞争性或优越的性能。我们的结果进一步显示,统计增强明显优于随机转换,并提高了数据的变化性,突显了SiamSA-PPM作为过程预测中训练数据丰富化的一个有前途的方向。

更新时间: 2025-07-24 10:57:20

领域: cs.LG

下载: http://arxiv.org/abs/2507.18293v1

Foundations for Risk Assessment of AI in Protecting Fundamental Rights

This chapter introduces a conceptual framework for qualitative risk assessment of AI, particularly in the context of the EU AI Act. The framework addresses the complexities of legal compliance and fundamental rights protection by integrating definitional balancing and defeasible reasoning. Definitional balancing employs proportionality analysis to resolve conflicts between competing rights, while defeasible reasoning accommodates the dynamic nature of legal decision-making. Our approach stresses the need for an analysis of AI deployment scenarios and for identifying potential legal violations and multi-layered impacts on fundamental rights. On the basis of this analysis, we provide philosophical foundations for a logical account of AI risk analysis. In particular, we consider the basic building blocks for conceptually grasping the interaction between AI deployment scenarios and fundamental rights, incorporating in defeasible reasoning definitional balancing and arguments about the contextual promotion or demotion of rights. This layered approach allows for more operative models of assessment of both high-risk AI systems and General Purpose AI (GPAI) systems, emphasizing the broader applicability of the latter. Future work aims to develop a formal model and effective algorithms to enhance AI risk assessment, bridging theoretical insights with practical applications to support responsible AI governance.

Updated: 2025-07-24 10:52:22

标题: 基于保护基本权利的人工智能风险评估基础

摘要: 这一章介绍了一个用于定性风险评估人工智能(AI)的概念框架,特别是在欧盟AI法案的背景下。该框架通过整合定义平衡和可推翻推理来解决法律合规性和基本权利保护的复杂性。定义平衡采用比例分析来解决竞争权利之间的冲突,而可推翻推理则适应法律决策的动态性质。我们的方法强调了对AI部署场景进行分析、识别潜在法律违规行为和对基本权利的多层影响的必要性。基于这一分析,我们为AI风险分析的逻辑刻画提供了哲学基础。特别是,我们考虑了概念上把握AI部署场景和基本权利之间相互作用的基本构建模块,将定义平衡和关于权利在上下文中的促进或降级的论点纳入可推翻推理。这种分层方法为评估高风险AI系统和通用人工智能(GPAI)系统提供了更具可操作性的模型,并强调后者的更广泛适用性。未来的工作旨在发展一个形式模型和有效算法,以增强AI风险评估,将理论洞察与实际应用相结合,支持负责任的AI治理。

更新时间: 2025-07-24 10:52:22

领域: cs.AI

下载: http://arxiv.org/abs/2507.18290v1

Scheduzz: Constraint-based Fuzz Driver Generation with Dual Scheduling

Fuzzing a library requires experts to understand the library usage well and craft high-quality fuzz drivers, which is tricky and tedious. Therefore, many techniques have been proposed to automatically generate fuzz drivers. However, they fail to generate rational fuzz drivers due to the lack of adherence to proper library usage conventions, such as ensuring a resource is closed after being opened. To make things worse, existing library fuzzing techniques unconditionally execute each driver, resulting in numerous irrational drivers that waste computational resources while contributing little coverage and generating false positive bug reports. To tackle these challenges, we propose a novel automatic library fuzzing technique, Scheduzz, an LLM-based library fuzzing technique. It leverages LLMs to understand rational usage of libraries and extract API combination constraints. To optimize computational resource utilization, a dual scheduling framework is implemented to efficiently manage API combinations and fuzz drivers. The framework models driver generation and the corresponding fuzzing campaign as an online optimization problem. Within the scheduling loop, multiple API combinations are selected to generate fuzz drivers, while simultaneously, various optimized fuzz drivers are scheduled for execution or suspension. We implemented Scheduzz and evaluated it in 33 real-world libraries. Compared to baseline approaches, Scheduzz significantly reduces computational overhead and outperforms UTopia on 16 out of 21 libraries. It achieves 1.62x, 1.50x, and 1.89x higher overall coverage than the state-of-the-art techniques CKGFuzzer, Promptfuzz, and the handcrafted project OSS-Fuzz, respectively. In addition, Scheduzz discovered 33 previously unknown bugs in these well-tested libraries, 3 of which have been assigned CVEs.
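
The dual scheduling idea can be pictured as two interleaved decisions per round: which API combination to turn into a new driver, and which existing drivers deserve more execution time. The loop below is a schematic reading of the abstract with invented scoring hooks (combo.score, run_epoch returning a coverage gain); it is not Scheduzz's optimizer.

    import heapq

    def dual_schedule(combinations, generate_driver, run_epoch, rounds: int = 10):
        drivers = []
        # generation queue: higher-scored API combinations first (score is assumed)
        combo_q = [(-c.score, i, c) for i, c in enumerate(combinations)]
        heapq.heapify(combo_q)
        for _ in range(rounds):
            if combo_q:
                _, _, combo = heapq.heappop(combo_q)
                drivers.append(generate_driver(combo))
            # execution schedule: keep drivers with the best recent coverage gain,
            # suspend the rest for this round
            gains = {d: run_epoch(d) for d in drivers}
            drivers.sort(key=lambda d: gains[d], reverse=True)
            drivers = drivers[: max(1, len(drivers) // 2 + 1)]
        return drivers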

Updated: 2025-07-24 10:51:11

标题: Scheduzz:基于约束的具有双重调度的模糊驱动程序生成

摘要: 对一个库进行模糊测试需要专家们对库的使用有很好的理解,并且制作高质量的模糊驱动程序,这是一个棘手且繁琐的过程。因此,许多技术已经被提出来自动生成模糊驱动程序。然而,由于缺乏对正确库使用规范的遵守,例如确保资源在打开后被关闭,它们未能生成合理的模糊驱动程序。更糟糕的是,现有的库模糊测试技术无条件地执行每个驱动程序,导致大量不合理的驱动程序浪费计算资源,同时贡献的覆盖率很少,并且生成虚假的漏洞报告。 为了应对这些挑战,我们提出了一种新颖的自动库模糊测试技术Scheduzz,一种基于LLM的库模糊测试技术。它利用LLMs来理解库的合理使用并提取API组合约束。为了优化计算资源利用率,实现了一个双调度框架来高效管理API组合和模糊驱动程序。该框架将驱动程序生成和相应的模糊测试活动建模为一个在线优化问题。在调度循环中,选择多个API组合来生成模糊驱动程序,同时,安排执行或暂停各种优化的模糊驱动程序。 我们实现了Scheduzz,并在33个现实世界的库中进行了评估。与基线方法相比,Scheduzz显著减少了计算开销,并在21个库中的16个上优于UTopia。它的整体覆盖率比最先进的技术CKGFuzzer、Promptfuzz和手工项目OSS-Fuzz分别高出1.62倍、1.50倍和1.89倍。此外,Scheduzz在这些经过充分测试的库中发现了33个以前未知的漏洞,其中3个已被分配CVE编号。

更新时间: 2025-07-24 10:51:11

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2507.18289v1

TCM-Tongue: A Standardized Tongue Image Dataset with Pathological Annotations for AI-Assisted TCM Diagnosis

Traditional Chinese medicine (TCM) tongue diagnosis, while clinically valuable, faces standardization challenges due to subjective interpretation and inconsistent imaging protocols, compounded by the lack of large-scale, annotated datasets for AI development. To address this gap, we present the first specialized dataset for AI-driven TCM tongue diagnosis, comprising 6,719 high-quality images captured under standardized conditions and annotated with 20 pathological symptom categories (averaging 2.54 clinically validated labels per image, all verified by licensed TCM practitioners). The dataset supports multiple annotation formats (COCO, TXT, XML) for broad usability and has been benchmarked using nine deep learning models (YOLOv5/v7/v8 variants, SSD, and MobileNetV2) to demonstrate its utility for AI development. This resource provides a critical foundation for advancing reliable computational tools in TCM, bridging the data shortage that has hindered progress in the field, and facilitating the integration of AI into both research and clinical practice through standardized, high-quality diagnostic data.
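
Since the dataset ships COCO-style annotations, the statistics quoted above can be checked with a few lines; the file path below is hypothetical and the field names follow the standard COCO layout.

    import json
    from collections import Counter

    with open("tcm_tongue/annotations.json") as f:   # hypothetical path
        coco = json.load(f)

    categories = {c["id"]: c["name"] for c in coco["categories"]}  # 20 symptom classes
    per_image = Counter(a["image_id"] for a in coco["annotations"])
    print("images:", len(coco["images"]))
    print("mean labels per image:", sum(per_image.values()) / len(coco["images"]))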

Updated: 2025-07-24 10:49:31

标题: 中医舌像:一个带有病理标注的标准化舌像数据集,用于AI辅助中医诊断

摘要: 中医舌诊虽然在临床上具有很大的价值,但由于主观解释和不一致的成像协议,加上缺乏大规模的、标注的数据集用于人工智能开发,面临着标准化的挑战。为了填补这一空白,我们提出了第一个专门为人工智能驱动的中医舌诊而设计的数据集,包括6,719张在标准化条件下拍摄的高质量图像,并用20种病理症状类别进行注释(每张图像平均有2.54个经临床验证的标签,全部由持牌中医师验证)。该数据集支持多种注释格式(COCO、TXT、XML),具有广泛的可用性,并已使用九种深度学习模型(YOLOv5/v7/v8变种、SSD和MobileNetV2)进行基准测试,以展示其在人工智能开发中的实用性。这一资源为推进中医可靠的计算工具奠定了重要的基础,弥补了阻碍该领域进展的数据短缺问题,并通过标准化、高质量的诊断数据促进了人工智能在研究和临床实践中的整合。

更新时间: 2025-07-24 10:49:31

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.18288v1

BEAVER: Building Environments with Assessable Variation for Evaluating Multi-Objective Reinforcement Learning

Recent years have seen significant advancements in designing reinforcement learning (RL)-based agents for building energy management. While individual success is observed in simulated or controlled environments, the scalability of RL approaches in terms of efficiency and generalization across building dynamics and operational scenarios remains an open question. In this work, we formally characterize the generalization space for the cross-environment, multi-objective building energy management task, and formulate the multi-objective contextual RL problem. Such a formulation helps understand the challenges of transferring learned policies across varied operational contexts such as climate and heat convection dynamics under multiple control objectives such as comfort level and energy consumption. We provide a principled way to parameterize such contextual information in realistic building RL environments, and construct a novel benchmark to facilitate the evaluation of generalizable RL algorithms in practical building control tasks. Our results show that existing multi-objective RL methods are capable of achieving reasonable trade-offs between conflicting objectives. However, their performance degrades under certain environment variations, underscoring the importance of incorporating dynamics-dependent contextual information into the policy learning process.

Updated: 2025-07-24 10:44:28

标题: 海狸:建立具有可评估变化的环境,用于评估多目标强化学习

摘要: 近年来,在设计基于强化学习(RL)的建筑能源管理代理方面取得了显著进展。虽然在模拟或控制环境中观察到了个别成功,但就效率和在建筑动态和运行场景方面的泛化能力而言,RL方法的可伸缩性仍然是一个悬而未决的问题。在这项工作中,我们正式刻画了跨环境、多目标建筑能源管理任务的泛化空间,并制定了多目标上下文RL问题。这种表述有助于理解在多样化运行环境下,如气候和热对流动力学下不同控制目标(如舒适水平和能耗)之间传递学习策略的挑战。我们提供了一种合理的方式来参数化这种上下文信息在现实建筑RL环境中,并构建了一个新颖的基准,以便评估在实际建筑控制任务中可泛化的RL算法。我们的结果显示,现有的多目标RL方法能够在冲突目标之间实现合理的权衡。然而,在某些环境变化下,它们的性能会下降,突显了将动态依赖的上下文信息纳入策略学习过程的重要性。

更新时间: 2025-07-24 10:44:28

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2507.07769v2

Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models

Recent advancements in diffusion models (DMs) have been propelled by alignment methods that post-train models to better conform to human preferences. However, these approaches typically require computation-intensive training of a base model and a reward model, which not only incurs substantial computational overhead but may also compromise model accuracy and training efficiency. To address these limitations, we propose Inversion-DPO, a novel alignment framework that circumvents reward modeling by reformulating Direct Preference Optimization (DPO) with DDIM inversion for DMs. Our method conducts intractable posterior sampling in Diffusion-DPO with the deterministic inversion from winning and losing samples to noise and thus derives a new post-training paradigm. This paradigm eliminates the need for auxiliary reward models or inaccurate approximation, significantly enhancing both the precision and efficiency of training. We apply Inversion-DPO to a basic task of text-to-image generation and a challenging task of compositional image generation. Extensive experiments show substantial performance improvements achieved by Inversion-DPO compared to existing post-training methods and highlight the ability of the trained generative models to generate high-fidelity, compositionally coherent images. For the post-training of compositional image generation, we curate a paired dataset consisting of 11,140 images with complex structural annotations and comprehensive scores, designed to enhance the compositional capabilities of generative models. Inversion-DPO explores a new avenue for efficient, high-precision alignment in diffusion models, advancing their applicability to complex realistic generation tasks. Our code is available at https://github.com/MIGHTYEZ/Inversion-DPO
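
For orientation, the sketch below shows the DPO-style objective that such post-training builds on: given per-sample log-likelihood terms for the trained and reference models -- which Inversion-DPO obtains deterministically via DDIM inversion instead of intractable posterior sampling -- the loss rewards a larger margin for the winning sample. The tensors are placeholders for those inversion-derived terms.

    import torch
    import torch.nn.functional as F

    def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta: float = 0.1):
        # margin between the (winner, loser) likelihood gaps of policy vs. reference
        margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
        return -F.logsigmoid(beta * margin).mean()

    # usage: each argument is a batch of per-sample log-probabilities
    loss = dpo_loss(torch.randn(8), torch.randn(8), torch.randn(8), torch.randn(8))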

Updated: 2025-07-24 10:37:32

标题: 反转-DPO:扩散模型的精确高效后训练

摘要: 扩散模型(DMs)的最新进展受益于通过对齐方法对模型进行后训练,使其更好地符合人类偏好。然而,这些方法通常需要对基础模型和奖励模型进行计算密集型训练,不仅会产生大量的计算开销,还可能损害模型的准确性和训练效率。为了解决这些限制,我们提出了Inversion-DPO,一种新颖的对齐框架,它通过用DDIM反演为扩散模型重新表述Direct Preference Optimization(DPO),从而避免了奖励建模。我们的方法用从获胜和失败样本到噪声的确定性反演来处理Diffusion-DPO中难以处理的后验采样,从而得出一个新的后训练范式。这种范式消除了对辅助奖励模型或不准确近似的需求,显著提高了训练的精度和效率。我们将Inversion-DPO应用于文本到图像生成的基本任务和组合图像生成的具有挑战性的任务。广泛的实验显示,与现有的后训练方法相比,Inversion-DPO取得了显著的性能改进,并突显了训练后的生成模型生成高保真度、构图连贯的图像的能力。对于组合图像生成的后训练,我们整理了一个配对数据集,包括11,140张带有复杂结构注释和全面评分的图像,旨在增强生成模型的组合能力。Inversion-DPO探索了扩散模型中高效、高精度对齐的新途径,推动了它们在复杂逼真生成任务中的适用性。我们的代码可在https://github.com/MIGHTYEZ/Inversion-DPO获取。

更新时间: 2025-07-24 10:37:32

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.11554v3

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

Existing encoder-free vision-language models (VLMs) are rapidly narrowing the performance gap with their encoder-based counterparts, highlighting the promising potential for unified multimodal systems with structural simplicity and efficient deployment. We systematically clarify the performance gap between VLMs using pre-trained vision encoders, discrete tokenizers, and minimalist visual layers from scratch, deeply excavating the under-examined characteristics of encoder-free VLMs. We develop efficient strategies for encoder-free VLMs that rival mainstream encoder-based ones. After an in-depth investigation, we launch EVEv2.0, a new and improved family of encoder-free VLMs. We show that: (i) Properly decomposing and hierarchically associating vision and language within a unified model reduces interference between modalities. (ii) A well-designed training strategy enables effective optimization for encoder-free VLMs. Through extensive evaluation, our EVEv2.0 represents a thorough study for developing a decoder-only architecture across modalities, demonstrating superior data efficiency and strong vision-reasoning capability. Code is publicly available at: https://github.com/baaivision/EVE.

Updated: 2025-07-24 10:29:52

标题: EVEv2:无编码器视觉-语言模型的改进基线

摘要: 现有的无编码器视觉-语言模型(VLMs)正在迅速缩小与基于编码器的对应模型之间的性能差距,凸显了结构简单、部署高效的统一多模态系统的巨大潜力。我们系统地澄清了使用预训练视觉编码器、离散分词器和从头训练的极简视觉层的VLMs之间的性能差距,深入挖掘了尚未充分研究的无编码器VLMs的特征。我们为无编码器VLMs开发了可与主流基于编码器模型相匹敌的高效策略。经过深入调查,我们推出了EVEv2.0,这是一个新的、改进的无编码器VLMs系列。我们展示了:(i)在统一模型中适当分解并层次化关联视觉和语言可以减少模态之间的干扰。(ii)一个设计良好的训练策略可以实现对无编码器VLMs的有效优化。通过广泛的评估,我们的EVEv2.0代表了开发跨模态仅解码器架构的全面研究,展示了卓越的数据效率和强大的视觉推理能力。代码可在以下网址公开获取:https://github.com/baaivision/EVE。

更新时间: 2025-07-24 10:29:52

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.06788v2

Online Housing Market

This paper studies an online variant of the celebrated housing market problem, where each agent has a single house and seeks to exchange it for another based on her preferences. In this online setting, agents may arrive and depart at any time, meaning that not all agents are present on the housing market simultaneously. I extend the well-known serial dictatorship and Gale's top trading cycle mechanisms to this online scenario, aiming to retain their desirable properties such as Pareto efficiency, individual rationality, and strategy-proofness. These extensions also seek to prevent agents from strategically delaying their arrival or advancing their departure. I demonstrate that achieving all of these properties simultaneously is impossible in the online context, and I present several variants that achieve different subsets of these properties.
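
For reference, here is Gale's top trading cycle in its classic offline form; the paper's contribution lies in extending mechanisms like this to agents who arrive and depart over time. Agents are identified with the houses they initially own, and prefs[a] lists agent a's houses in descending preference.

    def top_trading_cycles(prefs: dict[int, list[int]]) -> dict[int, int]:
        remaining = set(prefs)
        allocation = {}
        while remaining:
            # each remaining agent points at her favourite remaining house
            points_to = {a: next(h for h in prefs[a] if h in remaining)
                         for a in remaining}
            # walk the pointers from an arbitrary agent until a cycle appears
            seen, a = [], next(iter(remaining))
            while a not in seen:
                seen.append(a)
                a = points_to[a]
            cycle = seen[seen.index(a):]
            for agent in cycle:          # trade along the cycle, then remove it
                allocation[agent] = points_to[agent]
            remaining -= set(cycle)
        return allocation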

Updated: 2025-07-24 10:10:42

标题: 在线住房市场

摘要: 本文研究了著名住房市场问题的一个在线变种,其中每个代理人拥有一栋房屋,并根据自己的偏好寻求交换。在这个在线环境中,代理人可以随时到来和离开,这意味着并非所有代理人同时出现在住房市场上。我将著名的串行独裁机制和盖尔的顶层交易循环(top trading cycle)机制扩展到这个在线场景中,旨在保留它们的良好特性,如帕累托效率、个体理性和防策略操纵性(strategy-proofness)。这些扩展还旨在阻止代理人策略性地延迟到达或提前离开。我证明在在线环境中同时实现所有这些特性是不可能的,并提出了几种实现这些特性不同子集的变种。

更新时间: 2025-07-24 10:10:42

领域: cs.GT,cs.AI

下载: http://arxiv.org/abs/2501.15916v2

Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models

Direct speech translation (ST) has garnered increasing attention in recent years, yet the accurate translation of terminology within utterances remains a great challenge. In this regard, current studies mainly concentrate on incorporating various types of translation knowledge into ST models. However, these methods often struggle with interference from irrelevant noise and cannot fully utilize the translation knowledge. To address these issues, in this paper, we propose a novel Locate-and-Focus method for terminology translation. It first effectively locates the speech clips containing terminologies within the utterance to construct translation knowledge, minimizing irrelevant information for the ST model. Subsequently, it associates the translation knowledge with the utterance and hypothesis from both audio and textual modalities, allowing the ST model to better focus on translation knowledge during translation. Experimental results across various datasets demonstrate that our method effectively locates terminologies within utterances and enhances the success rate of terminology translation, while maintaining robust general translation performance.

Updated: 2025-07-24 10:07:59

标题: 定位和聚焦:增强语音语言模型中的术语翻译

摘要: 直接言语翻译(ST)如今受到越来越多的关注,然而在话语中术语的准确翻译仍然是一个巨大的挑战。在这方面,目前的研究主要集中在将各种翻译知识整合到ST模型中。然而,这些方法往往受到与无关噪音的干扰,无法充分利用翻译知识。为了解决这些问题,本文提出了一种新颖的术语翻译定位和聚焦方法。它首先有效地定位话语中包含术语的语音片段,构建翻译知识,最大程度减少ST模型的无关信息。随后,它将翻译知识与来自音频和文本模态的话语和假设进行关联,使ST模型在翻译过程中更好地专注于翻译知识。跨越各种数据集的实验结果表明,我们的方法有效地定位了话语中的术语,并提高了术语翻译的成功率,同时保持了稳健的一般翻译性能。

更新时间: 2025-07-24 10:07:59

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18263v1

ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation

Semantics-driven 3D spatial constraints align high-level semantic representations with low-level action spaces, facilitating the unification of task understanding and execution in robotic manipulation. The synergistic reasoning of Multimodal Large Language Models (MLLMs) and Vision Foundation Models (VFMs) enables cross-modal 3D spatial constraint construction. Nevertheless, existing methods have three key limitations: (1) coarse semantic granularity in constraint modeling, (2) lack of real-time closed-loop planning, (3) compromised robustness in semantically diverse environments. To address these challenges, we propose ReSem3D, a unified manipulation framework for semantically diverse environments, leveraging the synergy between VFMs and MLLMs to achieve fine-grained visual grounding and dynamically construct hierarchical 3D spatial constraints for real-time manipulation. Specifically, the framework is driven by hierarchical recursive reasoning in MLLMs, which interact with VFMs to automatically construct 3D spatial constraints from natural language instructions and RGB-D observations in two stages: part-level extraction and region-level refinement. Subsequently, these constraints are encoded as real-time optimization objectives in joint space, enabling reactive behavior to dynamic disturbances. Extensive simulation and real-world experiments are conducted in semantically rich household and sparse chemical lab environments. The results demonstrate that ReSem3D performs diverse manipulation tasks under zero-shot conditions, exhibiting strong adaptability and generalization. Code and videos at https://resem3d.github.io.

Updated: 2025-07-24 10:07:31

标题: ReSem3D:通过细粒度语义基础的可细化的三维空间约束,用于通用机器人操作

摘要: 语义驱动的三维空间约束将高级语义表示与低级动作空间对齐,促进了机器人操作中任务理解和执行的统一。多模态大型语言模型(MLLMs)和视觉基础模型(VFMs)的协同推理实现了跨模态三维空间约束的构建。然而,现有方法存在三个关键限制:(1)约束建模中语义粒度粗糙,(2)缺乏实时闭环规划,(3)在语义多样环境中的鲁棒性受损。为了应对这些挑战,我们提出了ReSem3D,一个用于语义多样环境的统一操纵框架,利用VFMs和MLLMs之间的协同作用实现细粒度的视觉定位,并动态构建层次化的三维空间约束以进行实时操作。具体来说,该框架由MLLMs中的层次递归推理驱动,这些推理与VFMs互动,从自然语言指令和RGB-D观察中自动构建三维空间约束的两个阶段:部分级别提取和区域级别细化。随后,这些约束被编码为联合空间中的实时优化目标,使其能够对动态干扰做出反应。在语义丰富的家庭和稀疏的化学实验室环境中进行了大量模拟和真实世界实验。结果表明,ReSem3D在零样本条件下执行多样化的操作任务,展现出强大的适应性和泛化能力。代码和视频请参见https://resem3d.github.io。

更新时间: 2025-07-24 10:07:31

领域: cs.RO,cs.AI,cs.CV,cs.HC,cs.LG

下载: http://arxiv.org/abs/2507.18262v1

Exploiting Gaussian Agnostic Representation Learning with Diffusion Priors for Enhanced Infrared Small Target Detection

Infrared small target detection (ISTD) plays a vital role in numerous practical applications. In pursuit of determining the performance boundaries, researchers employ large and expensive manual-labeling data for representation learning. Nevertheless, this approach renders the state-of-the-art ISTD methods highly fragile in real-world challenges. In this paper, we first study the variation in detection performance across several mainstream methods under various scarcity conditions -- namely, the absence of high-quality infrared data -- that challenge the prevailing theories about practical ISTD. To address this concern, we introduce Gaussian Agnostic Representation Learning. Specifically, we propose the Gaussian Group Squeezer, leveraging Gaussian sampling and compression for non-uniform quantization. By exploiting a diverse array of training samples, we enhance the resilience of ISTD models against various challenges. Then, we introduce two-stage diffusion models for real-world reconstruction. By aligning quantized signals closely with real-world distributions, we significantly elevate the quality and fidelity of the synthetic samples. Comparative evaluations against state-of-the-art detection methods in various scarcity scenarios demonstrate the efficacy of the proposed approach.

Updated: 2025-07-24 10:03:33

标题: 利用扩散先验增强红外小目标检测的高斯不可知表示学习

摘要: 红外小目标检测(ISTD)在许多实际应用中发挥着至关重要的作用。为了确定性能边界,研究人员使用大量昂贵的手动标记数据进行表示学习。然而,这种方法使得目前最先进的ISTD方法在实际挑战中非常脆弱。本文首先研究了几种主流方法在不同稀缺性下的检测性能变化,即缺乏高质量红外数据挑战了有关实际ISTD的主流理论。为了解决这个问题,我们引入了高斯不可知表示学习。具体来说,我们提出了高斯群压缩器,利用高斯采样和压缩进行非均匀量化。通过利用各种不同的训练样本,我们增强了ISTD模型对各种挑战的韧性。然后,我们为实际重建引入了两阶段扩散模型。通过将量化信号与实际分布密切对齐,我们显著提高了合成样本的质量和保真度。在各种稀缺情景下与最先进的检测方法进行比较评估,证明了所提出方法的有效性。

更新时间: 2025-07-24 10:03:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.18260v1

Learning Concepts Definable in First-Order Logic with Counting

We study Boolean classification problems over relational background structures in the logical framework introduced by Grohe and Tur\'an (TOCS 2004). It is known (Grohe and Ritzert, LICS 2017) that classifiers definable in first-order logic over structures of polylogarithmic degree can be learned in sublinear time, where the degree of the structure and the running time are measured in terms of the size of the structure. We generalise the results to the first-order logic with counting FOCN, which was introduced by Kuske and Schweikardt (LICS 2017) as an expressive logic generalising various other counting logics. Specifically, we prove that classifiers definable in FOCN over classes of structures of polylogarithmic degree can be consistently learned in sublinear time. This can be seen as a first step towards extending the learning framework to include numerical aspects of machine learning. We extend the result to agnostic probably approximately correct (PAC) learning for classes of structures of degree at most $(\log \log n)^c$ for some constant $c$. Moreover, we show that bounding the degree is crucial to obtain sublinear-time learning algorithms. That is, we prove that, for structures of unbounded degree, learning is not possible in sublinear time, even for classifiers definable in plain first-order logic.

Updated: 2025-07-24 10:00:10

标题: 学习可用一阶逻辑和计数定义的概念

摘要: 我们研究在Grohe和Tur\'an(TOCS 2004)引入的逻辑框架中,关系背景结构上的布尔分类问题。已知(Grohe和Ritzert,LICS 2017),在多对数级度数的结构上,可用一阶逻辑定义的分类器可以在次线性时间内学习,其中结构的度数和运行时间均以结构的大小来衡量。我们将这些结果推广到带计数的一阶逻辑FOCN,该逻辑由Kuske和Schweikardt(LICS 2017)引入,作为推广各种其他计数逻辑的富有表达力的逻辑。具体来说,我们证明在多对数级度数的结构类上,可用FOCN定义的分类器可以在次线性时间内一致学习。这可以看作是将该学习框架扩展到涵盖机器学习数值方面的第一步。我们进一步将结果扩展到度数最多为$(\log \log n)^c$(其中$c$是常数)的结构类上的不可知概率近似正确(agnostic PAC)学习。此外,我们表明限制度数对于获得次线性时间学习算法是至关重要的。也就是说,我们证明了对于度数不受限制的结构,即使是纯一阶逻辑中可定义的分类器,也不可能在次线性时间内学习。

更新时间: 2025-07-24 10:00:10

领域: cs.LO,cs.AI,cs.LG

下载: http://arxiv.org/abs/1909.03820v5

HPS: Hard Preference Sampling for Human Preference Alignment

Aligning Large Language Model (LLM) responses with human preferences is vital for building safe and controllable AI systems. While preference optimization methods based on Plackett-Luce (PL) and Bradley-Terry (BT) models have shown promise, they face challenges such as poor handling of harmful content, inefficient use of dispreferred responses, and, specifically for PL, high computational costs. To address these issues, we propose Hard Preference Sampling (HPS), a novel framework for robust and efficient human preference alignment. HPS introduces a training loss that prioritizes the most preferred response while rejecting all dispreferred and harmful ones. It emphasizes "hard" dispreferred responses -- those closely resembling preferred ones -- to enhance the model's rejection capabilities. By leveraging a single-sample Monte Carlo sampling strategy, HPS reduces computational overhead while maintaining alignment quality. Theoretically, HPS improves sample efficiency over existing PL methods and maximizes the reward margin between preferred and dispreferred responses, ensuring clearer distinctions. Experiments on HH-RLHF and PKU-Safety datasets validate HPS's effectiveness, achieving comparable BLEU and reward scores while greatly improving reward margins and thus reducing harmful content generation.
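
One way to picture the idea is the following illustrative objective: the preferred response competes against a single dispreferred one drawn per step (the single-sample Monte Carlo element), with "hard" negatives -- those whose reward is closest to the winner's -- sampled more often. This is a reading of the abstract, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def hps_loss(r_pref: torch.Tensor, r_disp: torch.Tensor, tau: float = 1.0):
        # hard negatives (higher-reward dispreferred responses) are sampled more often
        probs = F.softmax(r_disp / tau, dim=-1)
        idx = torch.multinomial(probs, 1)   # single-sample Monte Carlo draw
        r_neg = r_disp[idx]
        # maximise the margin between the preferred response and the sampled negative
        return -F.logsigmoid(r_pref - r_neg).mean()

    loss = hps_loss(torch.tensor(1.2), torch.tensor([0.9, 0.1, -0.4]))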

Updated: 2025-07-24 10:00:09

标题: HPS:人类偏好一致性的硬偏好抽样

摘要: 将大型语言模型(LLM)的响应与人类偏好对齐对于构建安全和可控的人工智能系统至关重要。虽然基于普拉克特-卢斯(PL)和布拉德利-特里(BT)模型的偏好优化方法显示出了潜力,但它们面临诸如对有害内容处理不佳、对不受欢迎的响应利用效率低下以及对PL来说计算成本高等挑战。为了解决这些问题,我们提出了Hard Preference Sampling(HPS),这是一个新颖的框架,用于稳健且高效地对齐人类偏好。HPS引入了一个培训损失,优先考虑最受欢迎的响应,同时拒绝所有不受欢迎和有害的响应。它强调了“硬”不受欢迎的响应 - 那些与受欢迎的响应非常相似的响应 - 以增强模型的拒绝能力。通过利用单样本蒙特卡洛抽样策略,HPS减少了计算开销,同时保持了对齐质量。从理论上讲,HPS提高了现有PL方法的样本效率,并最大化了受欢迎和不受欢迎响应之间的奖励间隔,确保更清晰的区分。在HH-RLHF和PKU-Safety数据集上的实验证实了HPS的有效性,实现了可比的BLEU和奖励分数,同时大大改善了奖励间隔,从而减少了有害内容的生成。

更新时间: 2025-07-24 10:00:09

领域: cs.AI

下载: http://arxiv.org/abs/2502.14400v4

Frequency-Dynamic Attention Modulation for Dense Prediction

Vision Transformers (ViTs) have significantly advanced computer vision, demonstrating strong performance across various tasks. However, the attention mechanism in ViTs makes each layer function as a low-pass filter, and the stacked-layer architecture in existing transformers suffers from frequency vanishing. This leads to the loss of critical details and textures. We propose a novel, circuit-theory-inspired strategy called Frequency-Dynamic Attention Modulation (FDAM), which can be easily plugged into ViTs. FDAM directly modulates the overall frequency response of ViTs and consists of two techniques: Attention Inversion (AttInv) and Frequency Dynamic Scaling (FreqScale). Since circuit theory uses low-pass filters as fundamental elements, we introduce AttInv, a method that generates complementary high-pass filtering by inverting the low-pass filter in the attention matrix, and dynamically combining the two. We further design FreqScale to weight different frequency components for fine-grained adjustments to the target response function. Through feature similarity analysis and effective rank evaluation, we demonstrate that our approach avoids representation collapse, leading to consistent performance improvements across various models, including SegFormer, DeiT, and MaskDINO. These improvements are evident in tasks such as semantic segmentation, object detection, and instance segmentation. Additionally, we apply our method to remote sensing detection, achieving state-of-the-art results in single-scale settings. The code is available at https://github.com/Linwei-Chen/FDAM.
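
The Attention Inversion idea admits a compact sketch: a row-stochastic attention matrix acts as a low-pass filter over tokens, so its complement I - A acts as a high-pass one, and the two can be mixed with a learnable weight. The scalar mixing parameterization below is an assumption; the paper's FreqScale component additionally re-weights individual frequency bands.

    import torch
    import torch.nn as nn

    class AttInv(nn.Module):
        def __init__(self):
            super().__init__()
            self.lam = nn.Parameter(torch.tensor(0.5))  # learnable low/high-pass mix

        def forward(self, attn: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
            # attn: (batch, tokens, tokens) row-stochastic low-pass filter
            eye = torch.eye(attn.size(-1), device=attn.device)
            high = eye - attn               # complementary high-pass filter
            mixed = attn + self.lam * high  # dynamically adjusted frequency response
            return mixed @ v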

Updated: 2025-07-24 09:57:56

标题: 频率动态注意力调节用于密集预测

摘要: 视觉Transformer(ViTs)显著推进了计算机视觉,在各种任务中展现出强大的性能。然而,ViTs中的注意力机制使每一层都作为一个低通滤波器运行,并且现有Transformer中的堆叠层架构遭受频率消失的问题。这导致关键细节和纹理的丢失。我们提出了一种新颖的、受电路理论启发的策略,称为频率动态注意调节(FDAM),可以轻松地插入ViTs中。FDAM直接调节ViTs的整体频率响应,包括两种技术:注意力反转(AttInv)和频率动态缩放(FreqScale)。由于电路理论使用低通滤波器作为基本要素,我们引入了AttInv,一种通过反转注意力矩阵中的低通滤波器生成互补高通滤波器的方法,并动态地将两者结合起来。我们进一步设计了FreqScale,以对不同频率成分进行加权,实现对目标响应函数的细粒度调整。通过特征相似性分析和有效秩评估,我们展示了我们的方法避免了表示崩溃,从而在各种模型中实现了一致的性能改进,包括SegFormer,DeiT和MaskDINO。这些改进在语义分割、目标检测和实例分割等任务中明显可见。此外,我们将我们的方法应用于遥感检测,在单尺度设置中取得了最新的成果。代码可以在https://github.com/Linwei-Chen/FDAM上找到。

更新时间: 2025-07-24 09:57:56

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.12006v3

Alternative Loss Function in Evaluation of Transformer Models

The proper design and architecture of testing machine learning models, especially in their application to quantitative finance problems, is crucial. The most important aspect of this process is selecting an adequate loss function for training, validation, estimation purposes, and hyperparameter tuning. Therefore, in this research, through empirical experiments on equity and cryptocurrency assets, we apply the Mean Absolute Directional Loss (MADL) function, which is more adequate for optimizing forecast-generating models used in algorithmic investment strategies. The MADL function results are compared between Transformer and LSTM models, and we show that in almost every case, Transformer results are significantly better than those obtained with LSTM.
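
A sketch of the loss, following the commonly cited formulation of MADL (verify against the paper before reuse): the model is rewarded when the predicted return has the same sign as the realized one, weighted by the magnitude of the realized return, so training aligns with strategy profit rather than squared error.

    import torch

    def madl(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
        # negative contribution when direction is predicted correctly, scaled by realized move
        return (-torch.sign(y_true * y_pred) * torch.abs(y_true)).mean()

    # example: a correct call on the larger move dominates the loss
    loss = madl(torch.tensor([0.02, -0.01]), torch.tensor([0.01, 0.03]))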

Updated: 2025-07-24 09:56:46

标题: Transformer 模型评估中的替代损失函数

摘要: 测试机器学习模型的正确设计和架构,特别是在其应用于量化金融问题时,至关重要。这个过程中最重要的一点是选择适当的损失函数用于训练、验证、估计和超参数调整。因此,在这项研究中,通过对股票和加密货币资产进行实证实验,我们应用了Mean Absolute Directional Loss (MADL)函数,它更适用于优化用于算法投资策略的预测生成模型。我们比较了Transformer和LSTM模型的MADL函数结果,发现在几乎每种情况下,Transformer的结果明显优于LSTM。

更新时间: 2025-07-24 09:56:46

领域: q-fin.CP,cs.LG,q-fin.TR

下载: http://arxiv.org/abs/2507.16548v2

Countering Privacy Nihilism

Of growing concern in privacy scholarship is artificial intelligence (AI), as a powerful producer of inferences. Taken to its limits, AI may be presumed capable of inferring "everything from everything," thereby making untenable any normative scheme, including privacy theory and privacy regulation, which rests on protecting privacy based on categories of data - sensitive versus non-sensitive, private versus public. Discarding data categories as a normative anchoring in privacy and data protection as a result of an unconditional acceptance of AI's inferential capacities is what we call privacy nihilism. An ethically reasoned response to AI inferences requires a sober consideration of AI capabilities rather than issuing an epistemic carte blanche. We introduce the notion of conceptual overfitting to expose how privacy nihilism turns a blind eye toward flawed epistemic practices in AI development. Conceptual overfitting refers to the adoption of norms of convenience that simplify the development of AI models by forcing complex constructs to fit data that are conceptually under-representative or even irrelevant. While conceptual overfitting serves as a helpful device to counter normative suggestions grounded in hyperbolic AI capability claims, AI inferences shake any privacy regulation that hinges protections based on restrictions around data categories. We propose moving away from privacy frameworks that focus solely on data type, neglecting all other factors. Theories like contextual integrity evaluate the normative value of privacy across several parameters, including the type of data, the actors involved in sharing it, and the purposes for which the information is used.

Updated: 2025-07-24 09:52:18

标题: 对抗隐私虚无主义

摘要: 随着人工智能(AI)作为一个强大的推断生产者,在隐私学术中引起了越来越多的关注。将AI发展到极限,可以假定其能够推断“一切来自一切”,从而使基于数据类别(敏感与非敏感、私人与公开)保护隐私的规范方案,包括隐私理论和隐私监管,变得不可行。在隐私和数据保护方面,舍弃数据类别作为规范锚定,因为无条件接受AI的推断能力,这就是我们所谓的隐私虚无主义。对AI推断的伦理合理回应需要冷静考虑AI的能力,而不是发放认识论的空白支票。我们引入概念过拟合的概念,揭示隐私虚无主义是如何对AI发展中的错误认识实践视而不见的。概念过拟合是指采用方便的规范来简化AI模型的发展,通过强迫复杂构建适应概念上欠代表或甚至无关的数据。尽管概念过拟合可以作为一个有益的设备来对抗基于夸大的AI能力声明的规范建议,但AI推断会动摇以数据类别限制为基础的隐私监管。我们建议摆脱仅关注数据类型而忽视所有其他因素的隐私框架。像上下文完整性这样的理论评估隐私的规范价值,跨越几个参数,包括数据类型、参与共享的参与者以及信息使用的目的。

更新时间: 2025-07-24 09:52:18

领域: cs.CY,cs.CR

下载: http://arxiv.org/abs/2507.18253v1

SyncMapV2: Robust and Adaptive Unsupervised Segmentation

Human vision excels at segmenting visual cues without the need for explicit training, and it remains remarkably robust even as noise severity increases. In contrast, existing AI algorithms struggle to maintain accuracy under similar conditions. Here, we present SyncMapV2, the first to solve unsupervised segmentation with state-of-the-art robustness. SyncMapV2 exhibits a minimal drop in mIoU, only 0.01%, under digital corruption, compared to a 23.8% drop observed in SOTA methods. This superior performance extends across various types of corruption: noise (7.3% vs. 37.7%), weather (7.5% vs. 33.8%), and blur (7.0% vs. 29.5%). Notably, SyncMapV2 accomplishes this without any robust training, supervision, or loss functions. It is based on a learning paradigm that uses self-organizing dynamical equations combined with concepts from random networks. Moreover, unlike conventional methods that require re-initialization for each new input, SyncMapV2 adapts online, mimicking the continuous adaptability of human vision. Thus, we go beyond the accurate and robust results, and present the first algorithm that can do all the above online, adapting to input rather than re-initializing. In adaptability tests, SyncMapV2 demonstrates near-zero performance degradation, which motivates and fosters a new generation of robust and adaptive intelligence in the near future.

Updated: 2025-07-24 09:52:06

标题: SyncMapV2:稳健且自适应的无监督分割

摘要: 人类视觉在分割视觉线索方面表现出色,无需明确的训练,即使噪音严重程度增加,仍然保持非常稳健。相比之下,现有的人工智能算法在类似条件下很难保持准确性。在这里,我们介绍SyncMapV2,它是第一个以最先进的稳健性解决无监督分割的算法。与观察到的SOTA方法中23.8%的准确性下降相比,SyncMapV2在数字损坏情况下仅下降了0.01%的mIoU。这种优越性能延伸到各种类型的损坏:噪音(7.3% vs. 37.7%)、天气(7.5% vs. 33.8%)和模糊(7.0% vs. 29.5%)。值得注意的是,SyncMapV2在没有任何稳健训练、监督或损失函数的情况下实现了这一点。它基于一种学习范式,结合了自组织动力学方程和随机网络的概念。此外,与需要为每个新输入重新初始化的传统方法不同,SyncMapV2在线适应,模仿人类视觉的连续适应能力。因此,我们不仅仅是提供准确和稳健的结果,还呈现了第一个能够在线执行所有上述操作的算法,根据输入进行自适应而不是重新初始化。在适应性测试中,SyncMapV2展示了接近零的性能下降,这激发并培养了未来新一代稳健和适应性智能。

更新时间: 2025-07-24 09:52:06

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2506.16297v3

Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning

Eye-tracking data reveals valuable insights into users' cognitive states but is difficult to analyze due to its structured, non-linguistic nature. While large language models (LLMs) excel at reasoning over text, they struggle with temporal and numerical data. This paper presents a multimodal human-AI collaborative framework designed to enhance cognitive pattern extraction from eye-tracking signals. The framework includes: (1) a multi-stage pipeline using horizontal and vertical segmentation alongside LLM reasoning to uncover latent gaze patterns; (2) an Expert-Model Co-Scoring Module that integrates expert judgment with LLM output to generate trust scores for behavioral interpretations; and (3) a hybrid anomaly detection module combining LSTM-based temporal modeling with LLM-driven semantic analysis. Our results across several LLMs and prompt strategies show improvements in consistency, interpretability, and performance, with up to 50% accuracy in difficulty prediction tasks. This approach offers a scalable, interpretable solution for cognitive modeling and has broad potential in adaptive learning, human-computer interaction, and educational analytics.

Updated: 2025-07-24 09:49:53

标题: 使用眼动追踪和基于LLM的推理进行多模态行为模式分析

摘要: 眼动数据揭示了有关用户认知状态的宝贵见解,但由于其结构化、非语言的性质,分析起来很困难。尽管大型语言模型(LLMs)擅长对文本进行推理,但在处理时间和数值数据方面却表现不佳。本文提出了一个多模态人机协作框架,旨在增强从眼动信号中提取认知模式的能力。该框架包括:(1)使用水平和垂直分割以及LLM推理的多阶段流水线,揭示潜在凝视模式;(2)专家-模型协同评分模块,将专家判断与LLM输出整合,生成行为解释的信任分数;以及(3)将基于LSTM的时间建模与LLM驱动的语义分析相结合的混合异常检测模块。我们在多种LLMs和提示策略上的结果显示,一致性、解释性和性能均有所提高,在难度预测任务中准确率高达50%。这种方法为认知建模提供了可扩展、可解释的解决方案,并在自适应学习、人机交互和教育分析等领域具有广泛潜力。

更新时间: 2025-07-24 09:49:53

领域: cs.HC,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.18252v1

Auto-SGCR: Automated Generation of Smart Grid Cyber Range Using IEC 61850 Standard Models

Digitalization of power grids has made them increasingly susceptible to cyber-attacks in the past decade. Iterative cybersecurity testing is indispensable to counter emerging attack vectors and to ensure the dependability of critical infrastructure. Furthermore, such testing can be used to evaluate cybersecurity configuration, the effectiveness of cybersecurity measures against various attack vectors, and to train smart grid cybersecurity experts defending the system. Enabling extensive experiments narrows the gap between academic research and production environments. A high-fidelity cyber range is vital, as it is often infeasible to conduct such experiments and training in a production environment. However, the design and implementation of a cyber range require extensive domain knowledge of the physical and cyber aspects of the infrastructure. Furthermore, the costs incurred for setup and maintenance of a cyber range are significant. Moreover, most existing smart grid cyber ranges are designed as one-off, proprietary systems, and are limited in terms of configurability, accessibility, portability, and reproducibility. To address these challenges, an automated Smart Grid Cyber Range generation framework is presented in this paper. Initially, a human-/machine-friendly, XML-based modeling language called Smart Grid Modeling Language (SG-ML) was defined, which incorporates IEC 61850 System Configuration Language files. Subsequently, a toolchain to parse SG-ML model files and automatically instantiate a functional smart grid cyber range was developed. The developed SG-ML models can be easily shared and/or modified to reproduce or customize any cyber range. The application of Auto-SGCR is demonstrated through case studies with large-scale substation models. The toolchain along with example SG-ML models has been open-sourced.
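
The front end of such a toolchain reduces to parsing the SG-ML file and collecting the devices to instantiate. The element and attribute names below are hypothetical stand-ins (the real schema embeds IEC 61850 SCL content), so treat this purely as the shape of the parsing step.

    import xml.etree.ElementTree as ET

    def load_sgml(path: str) -> list[dict]:
        root = ET.parse(path).getroot()
        devices = []
        for ied in root.iter("IED"):        # IEDs as declared in the SCL part
            devices.append({
                "name": ied.get("name"),
                "type": ied.get("type"),
                "ip": ied.get("ip"),        # hypothetical network attribute
            })
        return devices

    # each entry could then drive container/VM instantiation for the cyber range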

Updated: 2025-07-24 09:44:03

标题: Auto-SGCR:使用IEC 61850标准模型自动生成智能电网网络靶场

摘要: 数字化电网在过去十年中越来越容易受到网络攻击的影响。迭代式的网络安全测试对于应对新兴攻击向量并确保关键基础设施的可靠性至关重要。此外,这些测试可以用于评估网络安全配置,网络安全措施对抗各种攻击向量的有效性,以及培训智能电网网络安全专家来保护系统。实现广泛的实验缩小了学术研究和生产环境之间的差距。高保真度的网络靶场至关重要,因为使用生产环境进行此类实验和培训通常是不可行的。然而,网络靶场的设计和实施需要对基础设施的物理和网络方面有深入的领域知识。此外,为设置和维护网络靶场所产生的成本是相当可观的。此外,大多数现有的智能电网网络靶场都设计为一次性的专有系统,在可配置性、可访问性、可移植性和可重现性方面受到限制。为了解决这些挑战,本文提出了一种自动化的智能电网网络靶场生成框架。首先定义了一种人机友好的基于XML的建模语言,称为智能电网建模语言(SG-ML),其中包含IEC 61850系统配置语言文件。随后,开发了一个工具链来解析SG-ML模型文件,并自动实例化一个功能性的智能电网网络靶场。开发的SG-ML模型可以轻松共享和/或修改以重现或定制任何网络靶场。通过大型变电站模型的案例研究展示了Auto-SGCR的应用。工具链以及示例SG-ML模型已经开源。

更新时间: 2025-07-24 09:44:03

领域: cs.CR,cs.SY,eess.SY

下载: http://arxiv.org/abs/2507.18249v1

DepthDark: Robust Monocular Depth Estimation for Low-Light Environments

In recent years, foundation models for monocular depth estimation have received increasing attention. Current methods mainly address typical daylight conditions, but their effectiveness notably decreases in low-light environments. There is a lack of robust foundation models for monocular depth estimation specifically designed for low-light scenarios. This largely stems from the absence of large-scale, high-quality paired depth datasets for low-light conditions and of an effective parameter-efficient fine-tuning (PEFT) strategy. To address these challenges, we propose DepthDark, a robust foundation model for low-light monocular depth estimation. We first introduce a flare-simulation module and a noise-simulation module to accurately simulate the imaging process under nighttime conditions, producing high-quality paired depth datasets for low-light conditions. Additionally, we present an effective low-light PEFT strategy that utilizes illumination guidance and multiscale feature fusion to enhance the model's capability in low-light environments. Our method achieves state-of-the-art depth estimation performance on the challenging nuScenes-Night and RobotCar-Night datasets, validating its effectiveness using limited training data and computing resources.

Updated: 2025-07-24 09:32:53

标题: DepthDark:用于低光环境的稳健单目深度估计

摘要: 近年来,单目深度估计的基础模型受到越来越多的关注。当前方法主要针对典型的白天条件,但它们在低光环境下的有效性明显降低。针对低光情况专门设计的单目深度估计的稳健基础模型存在不足。这主要是由于缺乏大规模、高质量的配对深度数据集以及有效的参数高效微调(PEFT)策略。为了解决这些挑战,我们提出了DepthDark,一个用于低光单目深度估计的稳健基础模型。我们首先引入了一个耀斑模拟模块和一个噪声模拟模块,以准确模拟夜间条件下的成像过程,生成高质量的低光条件下的配对深度数据集。此外,我们提出了一种有效的低光PEFT策略,利用照明指导和多尺度特征融合来增强模型在低光环境中的能力。我们的方法在具有挑战性的nuScenes-Night和RobotCar-Night数据集上实现了最先进的深度估计性能,验证了其在有限的训练数据和计算资源下的有效性。

更新时间: 2025-07-24 09:32:53

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.18243v1
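
The two simulation modules can be sketched with generic low-light approximations: a Gaussian bright spot for flare and Poisson-Gaussian sensor noise. These particular models are assumptions for illustration, not the paper's exact formulation.

import numpy as np

def add_flare(img, center, radius, strength=0.8):
    """Toy flare module: add a Gaussian bright spot around a light source."""
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    d2 = (yy - center[0]) ** 2 + (xx - center[1]) ** 2
    flare = strength * np.exp(-d2 / (2 * radius ** 2))
    return np.clip(img + flare[..., None], 0.0, 1.0)

def add_low_light_noise(img, photons=50.0, read_sigma=0.01, rng=None):
    """Toy noise module: Poisson shot noise plus Gaussian read noise."""
    rng = rng or np.random.default_rng(0)
    shot = rng.poisson(img * photons) / photons
    return np.clip(shot + rng.normal(0.0, read_sigma, img.shape), 0.0, 1.0)

night = add_low_light_noise(add_flare(0.1 * np.ones((64, 64, 3)), (32, 32), 6.0))

Pairing such a synthetic night image with the depth map of its daylight original is what yields the paired low-light training data.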

Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods

Despite their theoretical appeal, totally corrective boosting methods based on linear programming have received limited empirical attention. In this paper, we conduct the first large-scale experimental study of six LP-based boosting formulations, including two novel methods, NM-Boost and QRLP-Boost, across 20 diverse datasets. We evaluate the use of both heuristic and optimal base learners within these formulations, and analyze not only accuracy, but also ensemble sparsity, margin distribution, anytime performance, and hyperparameter sensitivity. We show that totally corrective methods can outperform or match state-of-the-art heuristics like XGBoost and LightGBM when using shallow trees, while producing significantly sparser ensembles. We further show that these methods can thin pre-trained ensembles without sacrificing performance, and we highlight both the strengths and limitations of using optimal decision trees in this context.

Updated: 2025-07-24 09:30:37

标题: 再探Boosting:基于LP的集成方法的基准测试与推进

摘要: 尽管基于线性规划的完全校正性提升方法在理论上具有吸引力,但在实证研究中受到了有限的关注。在本文中,我们进行了第一次对六种基于LP的提升形式进行大规模实验研究,包括两种新方法NM-Boost和QRLP-Boost,在20个不同的数据集上进行了评估。我们评估了这些形式中启发式和最优基本学习器的使用,并不仅分析准确性,还分析了集成稀疏性、边缘分布、任何时候的性能和超参数敏感性。我们展示了当使用浅树时,完全校正的方法可以胜过或与XGBoost和LightGBM等最先进的启发式方法相匹配,同时产生明显更稀疏的集成。我们进一步展示了这些方法可以在不牺牲性能的情况下减少预训练的集成,并强调了在这种情况下使用最优决策树的优势和局限性。

更新时间: 2025-07-24 09:30:37

领域: cs.LG

下载: http://arxiv.org/abs/2507.18242v1
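
One concrete instance of a totally corrective LP formulation is soft-margin LPBoost, which re-optimizes all ensemble weights at once: maximize the margin rho minus slack penalties, with the weights alpha forming a convex combination. The sketch below is this textbook formulation, assumed for illustration rather than taken from the paper's six formulations.

import numpy as np
from scipy.optimize import linprog

def lp_boost(H, y, D=1.0):
    """Soft-margin LPBoost over a fixed base-learner matrix.

    H: (n, m) predictions of m base learners on n samples, in {-1, +1}.
    y: (n,) labels in {-1, +1}. Returns ensemble weights alpha (sparse).
    Variables are [alpha (m), rho (1), xi (n)]; maximize rho - D * sum(xi).
    """
    n, m = H.shape
    c = np.concatenate([np.zeros(m), [-1.0], D * np.ones(n)])
    # Margin constraints: y_i * (H[i] @ alpha) >= rho - xi_i.
    A_ub = np.hstack([-(y[:, None] * H), np.ones((n, 1)), -np.eye(n)])
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0], np.zeros(n)])[None, :]
    bounds = [(0, None)] * m + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub, b_ub, A_eq, [1.0], bounds=bounds, method="highs")
    return res.x[:m]

Because the optimum of an LP lies at a vertex, many alpha entries are exactly zero, which is the source of the sparse ensembles reported above.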

DisMS-TS: Eliminating Redundant Multi-Scale Features for Time Series Classification

Real-world time series typically exhibit complex temporal variations, making the time series classification task notably challenging. Recent advancements have demonstrated the potential of multi-scale analysis approaches, which provide an effective solution for capturing these complex temporal patterns. However, existing multi-scale analysis-based time series prediction methods fail to eliminate redundant scale-shared features across multi-scale time series, resulting in the model over- or under-focusing on scale-shared features. To address this issue, we propose a novel end-to-end Disentangled Multi-Scale framework for Time Series classification (DisMS-TS). The core idea of DisMS-TS is to eliminate redundant shared features in multi-scale time series, thereby improving prediction performance. Specifically, we propose a temporal disentanglement module to capture scale-shared and scale-specific temporal representations, respectively. Subsequently, to effectively learn both scale-shared and scale-specific temporal representations, we introduce two regularization terms that ensure the consistency of scale-shared representations and the disparity of scale-specific representations across all temporal scales. Extensive experiments conducted on multiple datasets validate the superiority of DisMS-TS over its competitive baselines, with the accuracy improvement up to 9.71%.

Updated: 2025-07-24 09:29:08

标题: DisMS-TS:消除时间序列分类中多尺度特征的冗余

摘要: 真实世界的时间序列通常表现出复杂的时间变化,使得时间序列分类任务变得特别具有挑战性。最近的进展展示了多尺度分析方法的潜力,为捕捉这些复杂的时间模式提供了有效的解决方案。然而,现有基于多尺度分析的时间序列预测方法未能消除跨多尺度时间序列中冗余的共享特征,导致模型过度或不足地关注共享特征。为了解决这个问题,我们提出了一种新颖的端到端解缠多尺度时间序列分类框架(DisMS-TS)。DisMS-TS的核心思想是消除多尺度时间序列中的冗余共享特征,从而提高预测性能。具体来说,我们提出了一个时间解缠模块,分别捕捉尺度共享和尺度特定的时间表示。随后,为了有效地学习尺度共享和尺度特定的时间表示,我们引入了两个正则化项,确保所有时间尺度上尺度共享表示的一致性和尺度特定表示的差异性。在多个数据集上进行的大量实验验证了DisMS-TS相对于竞争基线的优越性,准确率提高了高达9.71%。

更新时间: 2025-07-24 09:29:08

领域: cs.AI

下载: http://arxiv.org/abs/2507.04600v2
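
The two regularization terms can be sketched directly: a consistency loss pulling the scale-shared embeddings of each sample together across scales, and a disparity loss pushing the scale-specific embeddings apart. The cosine-based forms below are plausible instantiations assumed for illustration; the abstract does not pin down the exact losses.

import torch
import torch.nn.functional as F

def consistency_loss(shared):
    """shared: (S, B, D) scale-shared embeddings for S temporal scales.
    Penalize deviation of each scale's embedding from the cross-scale mean."""
    mean = shared.mean(dim=0, keepdim=True)
    return (1 - F.cosine_similarity(shared, mean, dim=-1)).mean()

def disparity_loss(specific):
    """specific: (S, B, D) scale-specific embeddings.
    Penalize similarity between different scales of the same sample."""
    S = specific.shape[0]
    loss, pairs = 0.0, 0
    for i in range(S):
        for j in range(i + 1, S):
            loss = loss + F.cosine_similarity(
                specific[i], specific[j], dim=-1).abs().mean()
            pairs += 1
    return loss / pairs

Training with both terms is what lets the model weight scale-shared evidence once instead of over- or under-counting it at every scale.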

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation

Recently, multi-view learning (MVL) has garnered significant attention due to its ability to fuse discriminative information from multiple views. However, real-world multi-view datasets are often heterogeneous and imperfect, which usually causes MVL methods designed for specific combinations of views to lack application potential and limits their effectiveness. To address this issue, we propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment. Specifically, we introduce a simple yet effective multi-view transformer fusion network where we transform heterogeneous multi-view data into homogeneous word embeddings, and then integrate multiple views by a sample-level attention mechanism to obtain a fused representation. Furthermore, we propose a simulated perturbation based multi-view contrastive learning framework that dynamically generates noise and unusable perturbations to simulate imperfect data conditions. The simulated noisy and unusable data yield two distinct fused representations, and we utilize contrastive learning to align them for learning discriminative and robust representations. Our RML is self-supervised and can also be applied to downstream tasks as a regularizer. In experiments, we employ it in multi-view unsupervised clustering, noise-label classification, and as a plug-and-play module for cross-modal hashing retrieval. Extensive comparison experiments and ablation studies validate RML's effectiveness. Code is available at https://github.com/SubmissionsIn/RML.

Updated: 2025-07-24 09:25:16

标题: 通过样本级注意力表示融合和模拟扰动对齐的稳健多视图学习

摘要: 最近,多视图学习(MVL)因其能够融合多个视图的辨别信息而受到了广泛关注。然而,现实世界中的多视图数据集通常是异构且不完美的,这通常导致为特定视图组合设计的MVL方法缺乏应用潜力并限制了它们的有效性。为了解决这个问题,我们提出了一种新颖的鲁棒MVL方法(即RML),具有同时表示融合和对齐。具体而言,我们引入了一个简单而有效的多视图转换器融合网络,将异构多视图数据转换为同质词嵌入,然后通过样本级别的注意机制集成多个视图以获得融合表示。此外,我们提出了一种基于模拟扰动的多视图对比学习框架,动态生成噪声和无用扰动,模拟不完美的数据条件。模拟的嘈杂和无用数据获得两种不同的融合表示,我们利用对比学习将它们对齐,以学习出具有辨别性和鲁棒性的表示。我们的RML是自监督的,也可以作为正则化应用于下游任务。在实验中,我们将其应用于多视图无监督聚类、噪声标签分类,以及作为跨模态哈希检索的即插即用模块。广泛的比较实验和消融研究验证了RML的有效性。代码可在https://github.com/SubmissionsIn/RML上获得。

更新时间: 2025-07-24 09:25:16

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2503.04151v2
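
The sample-level attention fusion can be sketched as follows: each view embedding is projected into a common token space, and a per-sample softmax over the view tokens weights the fused representation. The dimensions and the scoring design are assumptions for illustration.

import torch
import torch.nn as nn

class SampleLevelFusion(nn.Module):
    """Fuse V heterogeneous view embeddings per sample with learned attention."""
    def __init__(self, view_dims, d_model=128):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in view_dims)
        self.score = nn.Linear(d_model, 1)

    def forward(self, views):                # list of (B, d_v) tensors
        tokens = torch.stack([p(v) for p, v in zip(self.proj, views)], dim=1)
        attn = torch.softmax(self.score(torch.tanh(tokens)), dim=1)  # (B, V, 1)
        return (attn * tokens).sum(dim=1)    # fused (B, d_model)

fusion = SampleLevelFusion([32, 64])
fused = fusion([torch.randn(8, 32), torch.randn(8, 64)])

In the full method, one fused representation would come from a noise-perturbed copy of the views and another from an unusable-perturbed copy, and a contrastive loss would align the two.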

Compositional Coordination for Multi-Robot Teams with Large Language Models

Multi-robot coordination has traditionally relied on a mission-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in mission requirements. Here, we propose LAN2CB (Language to Collective Behavior), a novel framework that leverages large language models (LLMs) to streamline and generalize the multi-robot coordination pipeline. LAN2CB transforms natural language (NL) mission descriptions into executable Python code for multi-robot systems through two core modules: (1) Mission Analysis, which parses mission descriptions into behavior trees, and (2) Code Generation, which leverages the behavior tree and a structured knowledge base to generate robot control code. We further introduce a dataset of natural language mission descriptions to support development and benchmarking. Experiments in both simulation and real-world environments demonstrate that LAN2CB enables robust and flexible multi-robot coordination from natural language, significantly reducing manual engineering effort and supporting broad generalization across diverse mission types. Website: https://sites.google.com/view/lan-cb

Updated: 2025-07-24 09:25:12

标题: 使用大型语言模型的多机器人团队的组合协调

摘要: 多机器人协调传统上依赖于特定任务和专家驱动的流程,其中自然语言任务描述由领域专家手动翻译为数学公式、算法设计和可执行代码。这种传统过程耗时、非专家无法访问,并且对任务需求的变化不够灵活。在这里,我们提出了LAN2CB(语言到集体行为),这是一个利用大型语言模型(LLMs)来简化和泛化多机器人协调流程的新框架。LAN2CB将自然语言(NL)任务描述转换为多机器人系统的可执行Python代码,通过两个核心模块实现:(1)任务分析,将任务描述解析为行为树,以及(2)代码生成,利用行为树和结构化知识库生成机器人控制代码。我们进一步介绍了一个自然语言任务描述的数据集,以支持开发和基准测试。在模拟和现实环境中的实验表明,LAN2CB使得从自然语言实现强大而灵活的多机器人协调成为可能,显著减少了手动工程工作量,并支持在不同任务类型之间进行广泛的泛化。网站:https://sites.google.com/view/lan-cb

更新时间: 2025-07-24 09:25:12

领域: cs.RO,cs.AI,cs.LG,cs.MA

下载: http://arxiv.org/abs/2507.16068v2

From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models

The application of Reinforcement Learning (RL) to economic modeling reveals a fundamental conflict between the assumptions of equilibrium theory and the emergent behavior of learning agents. While canonical economic models assume atomistic agents act as `takers' of aggregate market conditions, a naive single-agent RL simulation incentivizes the agent to become a `manipulator' of its environment. This paper first demonstrates this discrepancy within a search-and-matching model with concave production, showing that a standard RL agent learns a non-equilibrium, monopsonistic policy. Additionally, we identify a parametric bias arising from the mismatch between economic discounting and RL's treatment of intertemporal costs. To address both issues, we propose a calibrated Mean-Field Reinforcement Learning framework that embeds a representative agent in a fixed macroeconomic field and adjusts the cost function to reflect economic opportunity costs. Our iterative algorithm converges to a self-consistent fixed point where the agent's policy aligns with the competitive equilibrium. This approach provides a tractable and theoretically sound methodology for modeling learning agents in economic systems within the broader domain of computational social science.

Updated: 2025-07-24 09:21:02

标题: 从个体学习到市场均衡:纠正RL模拟经济模型中的结构和参数偏差

摘要: 强化学习(RL)在经济建模中的应用揭示了均衡理论假设和学习代理的新兴行为之间的根本冲突。尽管经典经济模型假设微观代理行为像市场条件的“接受者”,但一个天真的单代理RL模拟激励代理成为其环境的“操纵者”。本文首先通过一个具有凹生产的搜索匹配模型展示了这种不一致性,展示标准RL代理学习了一个非均衡的单买者政策。此外,我们还确定了由经济贴现与RL对待跨期成本不匹配引起的参数偏差。为了解决这两个问题,我们提出了一个经过校准的均场强化学习框架,将一个代表性代理嵌入到一个固定的宏观经济领域中,并调整成本函数以反映经济机会成本。我们的迭代算法收敛到一个自洽的固定点,在这个点上,代理的政策与竞争均衡一致。这种方法为在计算社会科学更广泛领域内对经济系统中的学习代理进行建模提供了一种可行且理论上合理的方法论。

更新时间: 2025-07-24 09:21:02

领域: econ.GN,cs.AI,q-fin.EC

下载: http://arxiv.org/abs/2507.18229v1
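
The calibrated mean-field loop can be sketched abstractly: hold the macroeconomic field fixed while the representative agent computes a best response, update the field from the agent's induced aggregate, and iterate to a self-consistent fixed point. best_response and aggregate below are placeholders for the RL step and the aggregation rule.

import numpy as np

def mean_field_fixed_point(best_response, aggregate, field0,
                           damping=0.5, tol=1e-6, max_iter=100):
    """Iterate field -> policy -> new field until self-consistency."""
    field = field0
    policy = None
    for _ in range(max_iter):
        policy = best_response(field)        # agent takes the field as given
        new_field = aggregate(policy)        # aggregate implied by the policy
        if abs(new_field - field) < tol:
            break
        field = (1 - damping) * field + damping * new_field   # damped update
    return field, policy

# Toy usage: a contraction whose fixed point (field = 2) the loop recovers.
field, _ = mean_field_fixed_point(lambda f: 0.5 * f + 1.0, lambda p: p, 0.0)

Fixing the field during the inner step is precisely what removes the incentive for the agent to manipulate aggregate conditions.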

Meta Prompting for AI Systems

We introduce Meta Prompting (MP), a framework that elevates the reasoning capabilities of large language models (LLMs) by focusing on the formal structure of a task rather than content-specific examples. We establish a theoretical foundation for this paradigm, formalizing MP as a functor that maps a category of tasks to a category of structured prompts, thereby guaranteeing that compositional problem-solving strategies can be systematically decomposed into modular prompt structures. We extend this concept to Recursive Meta Prompting (RMP), an automated process where an LLM can generate and refine its own prompts. We model this self-improvement loop formally as a monad, providing a principled framework for automated prompt engineering. Our claims are validated through extensive experiments demonstrating that a Qwen-72B base model, guided by a single, example-agnostic meta-prompt, achieves state-of-the-art results on MATH, GSM8K, and Game of 24. These results are achieved with substantial token efficiency gains over traditional few-shot methods. Project Page: https://github.com/meta-prompting/meta-prompting.

Updated: 2025-07-24 09:19:38

标题: AI系统的元提示

摘要: 我们介绍了Meta Prompting(MP),这是一个框架,通过关注任务的形式结构而非特定内容的示例,提升了大型语言模型(LLMs)的推理能力。我们为这一范式建立了一个理论基础,将MP形式化为一个函子,将任务类别映射到结构化提示类别,从而保证组合问题解决策略可以被系统地分解为模块化提示结构。我们将这一概念扩展到递归元提示(RMP),这是一个自动化过程,LLM可以生成和改进自己的提示。我们正式地将这种自我改进循环建模为一个单子,为自动化提示工程提供了一个有原则的框架。我们通过大量实验证实了我们的主张,证明了一个Qwen-72B基模型,在一个单一的、不依赖示例的元提示的指导下,实现了MATH、GSM8K和Game of 24的最新成果。这些结果在传统的少样本方法上实现了大量令牌效率增益。项目页面:https://github.com/meta-prompting/meta-prompting。

更新时间: 2025-07-24 09:19:38

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2311.11482v8
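
The content-agnostic character of a meta-prompt can be sketched as a map from a task's formal signature to a prompt skeleton, with task composition mapping to skeleton composition (the functor-like property). The template text is a hypothetical example, not the paper's actual meta-prompt.

from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    input_desc: str
    output_desc: str

def meta_prompt(task: Task) -> str:
    """Map a task's structure (not content-specific examples) to a prompt."""
    return (
        f"You will solve a {task.name} problem.\n"
        f"Input format: {task.input_desc}\n"
        "Work step by step, numbering each step.\n"
        f"Finish with the line: Answer: <{task.output_desc}>"
    )

def compose(t1: Task, t2: Task) -> Task:
    """Composing tasks composes their prompt structures."""
    return Task(f"{t1.name}, then {t2.name}", t1.input_desc, t2.output_desc)

print(meta_prompt(compose(Task("parsing", "a word problem", "equations"),
                          Task("solving", "equations", "a number"))))

Note that no worked example ever enters the prompt, which is what distinguishes this from few-shot prompting and yields the token savings claimed above.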

Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation

Evaluating feature attribution methods represents a critical challenge in explainable AI (XAI), as researchers typically rely on perturbation-based metrics when ground truth is unavailable. However, recent work reveals that these evaluation metrics can show different performance across predicted classes within the same dataset. These "class-dependent evaluation effects" raise questions about whether perturbation analysis reliably measures attribution quality, with direct implications for XAI method development and evaluation trustworthiness. We investigate under which conditions these class-dependent effects arise by conducting controlled experiments with synthetic time series data where ground truth feature locations are known. We systematically vary feature types and class contrasts across binary classification tasks, then compare perturbation-based degradation scores with ground truth-based precision-recall metrics using multiple attribution methods. Our experiments demonstrate that class-dependent effects emerge with both evaluation approaches, even in simple scenarios with temporally localized features, triggered by basic variations in feature amplitude or temporal extent between classes. Most critically, we find that perturbation-based and ground truth metrics frequently yield contradictory assessments of attribution quality across classes, with weak correlations between evaluation approaches. These findings suggest that researchers should interpret perturbation-based metrics with care, as they may not always align with whether attributions correctly identify discriminating features. By showing this disconnect, our work points toward reconsidering what attribution evaluation actually measures and developing more rigorous evaluation methods that capture multiple dimensions of attribution quality.

Updated: 2025-07-24 09:17:21

标题: 为什么时间序列特征归因会出现类别相关的评估效应?一项合成数据调查

摘要: 评估特征归因方法在可解释人工智能(XAI)中是一个关键挑战,因为研究人员通常在没有地面真相的情况下依赖基于扰动的度量标准。然而,最近的研究揭示了这些评估指标在同一数据集中的预测类别之间可能表现出不同的性能。这些“类别相关的评估效应”引发了关于扰动分析是否可靠地衡量归因质量的问题,对XAI方法的发展和评估可信度产生直接影响。我们通过在已知地面真相特征位置的合成时间序列数据上进行受控实验来研究这些类别相关效应在哪些条件下会出现。我们系统地改变特征类型和二元分类任务中的类别对比度,然后使用多种归因方法比较基于扰动的退化分数与基于地面真相的精度-召回率度量。我们的实验表明,即使在具有时间局部化特征的简单场景中,类别相关效应也会出现,这些效应由于类别之间特征振幅或时间范围的基本变化而触发。最关键的是,我们发现扰动基础和地面真相度量经常在不同类别之间对归因质量做出矛盾评估,评估方法之间的相关性较弱。这些发现表明,研究人员应谨慎解释基于扰动的度量,因为它们可能并不总是与归因是否正确识别区分特征相一致。通过展示这种脱节,我们的工作指向重新考虑归因评估实际上衡量的内容,并开发更严格的评估方法,以捕捉归因质量的多个维度。

更新时间: 2025-07-24 09:17:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2506.11790v2

GenAI for Automotive Software Development: From Requirements to Wheels

This paper introduces a GenAI-empowered approach to the automated development of automotive software, with emphasis on autonomous and Advanced Driver Assistance Systems (ADAS) capabilities. The process starts with requirements as input, while the main generated outputs are test scenario code for the simulation environment, together with an implementation of the desired ADAS capabilities targeting the hardware platform of the vehicle connected to the testbench. Moreover, we introduce additional steps for requirements consistency checking leveraging Model-Driven Engineering (MDE). In the proposed workflow, Large Language Models (LLMs) are used for model-based summarization of requirements (Ecore metamodel, XMI model instance, and OCL constraint creation), test scenario generation, simulation code (Python), and target platform code generation (C++). Additionally, Retrieval Augmented Generation (RAG) is adopted to enhance test scenario generation from documents related to autonomous driving regulations. Our approach aims at shorter compliance and re-engineering cycles, as well as reduced development and testing time for ADAS-related capabilities.

Updated: 2025-07-24 09:17:13

标题: GenAI用于汽车软件开发:从需求到车轮

摘要: 本文介绍了一种由GenAI驱动的自动化汽车软件开发方法,重点放在自动驾驶和高级驾驶辅助系统(ADAS)功能上。该过程以需求为输入开始,主要生成的输出是用于模拟环境的测试场景代码,以及针对连接到测试台的车辆硬件平台的所需ADAS功能的实现。此外,我们引入了利用模型驱动工程(MDE)进行要求一致性检查的额外步骤。在提出的工作流程中,大型语言模型(LLMs)用于基于模型的需求摘要(Ecore元模型、XMI模型实例和OCL约束创建)、测试场景生成、模拟代码(Python)和目标平台代码生成(C++)。此外,采用检索增强生成(RAG)来增强从自动驾驶规程相关文档中生成测试场景。我们的方法旨在缩短合规性和重工程周期,同时在涉及ADAS相关功能时减少开发和测试时间。

更新时间: 2025-07-24 09:17:13

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2507.18223v1

Sparse identification of nonlinear dynamics with library optimization mechanism: Recursive long-term prediction perspective

The sparse identification of nonlinear dynamics (SINDy) approach can discover the governing equations of dynamical systems based on measurement data, where the dynamical model is identified as a sparse linear combination of the given basis functions. A major challenge in SINDy is the design of the library, i.e., the set of candidate basis functions, as the appropriate library is far from obvious for many dynamical systems. To overcome this difficulty, this study proposes SINDy with a library optimization mechanism (SINDy-LOM), which combines the sparse regression technique with a novel learning strategy for the library. In the proposed approach, the basis functions are parametrized. The SINDy-LOM approach involves a two-layer optimization architecture: the inner layer, in which the data-driven model is extracted as the sparse linear combination of the candidate basis functions, and the outer layer, in which the basis functions are optimized from the viewpoint of recursive long-term (RLT) prediction accuracy; thus, the library design is reformulated as the optimization of the parametrized basis functions. The resulting SINDy-LOM model has good interpretability and usability, as the proposed approach yields a parsimonious model. The library optimization mechanism significantly reduces user burden. The RLT perspective improves the reliability of the resulting model compared with the traditional SINDy approach, which can only ensure one-step-ahead prediction accuracy. The validity of the proposed approach is demonstrated by applying it to a diesel engine airpath system, a well-known complex industrial system.

Updated: 2025-07-24 09:15:26

标题: 稀疏非线性动力学识别与库优化机制:递归长期预测视角

摘要: 非线性动力学的稀疏识别(SINDy)方法可以基于测量数据发现动态系统的控制方程,其中动力学模型被识别为给定基函数的稀疏线性组合。SINDy面临的一个主要挑战是设计库,即一组候选基函数,因为对于许多动力学系统来说,适当的库并不是微不足道的。为了克服这一困难,本研究提出了具有库优化机制(SINDy-LOM)的SINDy方法,它是稀疏回归技术和库的新学习策略的结合。在提出的方法中,基函数被参数化。SINDy-LOM方法涉及两层优化架构:内层,在该层中,数据驱动模型被提取为候选基函数的稀疏线性组合,外层,在该层中,基函数从递归长期(RLT)预测准确性的角度进行优化;因此,库设计被重新制定为参数化基函数的优化。由此产生的SINDy-LOM模型具有良好的解释性和可用性,因为该方法产生了简约的模型。库优化机制显著减少了用户的负担。与传统SINDy方法相比,RLT视角提高了结果模型的可靠性,后者只能保证一步预测的准确性。通过将所提出的方法应用于柴油发动机空气通道系统,即一个众所周知的复杂工业系统,验证了该方法的有效性。

更新时间: 2025-07-24 09:15:26

领域: cs.LG,math.DS

下载: http://arxiv.org/abs/2507.18220v1
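
The inner layer is standard SINDy sparse regression; a minimal sequentially thresholded least-squares (STLSQ) sketch is shown below, together with the outer-layer score that rolls the identified model forward recursively instead of checking only one-step-ahead residuals. The Euler rollout and the threshold value are illustrative assumptions.

import numpy as np

def stlsq(Theta, dX, threshold=0.1, iters=10):
    """Inner layer: sparse coefficients Xi with Theta @ Xi ~ dX."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):           # refit each state dimension
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(
                    Theta[:, big], dX[:, k], rcond=None)[0]
    return Xi

def rlt_error(Xi, library, x0, dt, x_true):
    """Outer-layer score: recursive long-term (multi-step) prediction error."""
    x, err = x0, 0.0
    for target in x_true:
        x = x + dt * (library(x) @ Xi)         # recursive Euler rollout
        err += np.sum((x - target) ** 2)
    return err

The outer layer would then adjust the library parameters (e.g., with a gradient-free optimizer) to minimize rlt_error, which is what turns library design into an optimization problem.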

FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting

Federated Graph Learning (FGL) is a distributed learning paradigm that enables collaborative training over large-scale subgraphs located on multiple local systems. However, most existing FGL approaches rely on synchronous communication, which leads to inefficiencies and is often impractical in real-world deployments. Meanwhile, current asynchronous federated learning (AFL) methods are primarily designed for conventional tasks such as image classification and natural language processing, without accounting for the unique topological properties of graph data. Directly applying these methods to graph learning can result in semantic drift and representational inconsistency in the global model. To address these challenges, we propose FedSA-GCL, a semi-asynchronous federated framework that leverages both inter-client label distribution divergence and graph topological characteristics through a novel ClusterCast mechanism for efficient training. We evaluate FedSA-GCL on multiple real-world graph datasets using the Louvain and Metis split algorithms, and compare it against 9 baselines. Extensive experiments demonstrate that our method achieves strong robustness and outstanding efficiency, outperforming the baselines by an average of 2.92% with the Louvain split and by 3.4% with the Metis split.

Updated: 2025-07-24 09:15:07

标题: FedSA-GCL:具有个性化聚合和集群感知广播的半异步联邦图学习框架

摘要: 联邦图学习(FGL)是一种分布式学习范式,可以实现在多个本地系统上的大规模子图之间的协作训练。然而,大多数现有的FGL方法依赖于同步通信,这导致效率低下,在实际部署中往往不可行。与此同时,当前的异步联邦学习(AFL)方法主要设计用于传统任务,如图像分类和自然语言处理,没有考虑图数据的独特拓扑特性。直接应用这些方法到图学习可能导致全局模型中语义漂移和表示不一致。为了解决这些挑战,我们提出了FedSA-GCL,一种半异步联邦框架,通过一种新颖的ClusterCast机制利用客户端间标签分布差异和图拓扑特性进行高效训练。我们使用Louvain和Metis分割算法在多个真实世界的图数据集上评估了FedSA-GCL,并将其与9个基线进行了比较。大量实验证明,我们的方法实现了强大的鲁棒性和出色的效率,在Louvain上的表现优于基线平均2.92%,在Metis上优于3.4%。

更新时间: 2025-07-24 09:15:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.18219v1
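
The semi-asynchronous server side can be sketched as staleness-aware aggregation: client updates are merged as they arrive, discounted by how stale the client's base model was. The discount schedule and mixing rule are generic asynchronous-FL choices assumed for illustration; the ClusterCast broadcasting itself is not reproduced here.

import numpy as np

class SemiAsyncServer:
    def __init__(self, global_w, base_lr=1.0):
        self.w = global_w            # global model parameters (flat vector)
        self.version = 0             # incremented on every merge
        self.base_lr = base_lr

    def merge(self, client_w, client_version):
        """Merge one client's parameters, discounted by staleness."""
        staleness = self.version - client_version
        alpha = self.base_lr / (1.0 + staleness)    # stale updates count less
        self.w = (1 - alpha) * self.w + alpha * client_w
        self.version += 1
        return self.w

server = SemiAsyncServer(np.zeros(4))
server.merge(np.ones(4), client_version=0)       # fresh update, alpha = 1.0
server.merge(2 * np.ones(4), client_version=0)   # stale by one round, alpha = 0.5

Avoiding a synchronization barrier this way removes the idle time of synchronous FGL, at the cost of the drift that the cluster-aware broadcasting is designed to control.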

Information Security Based on LLM Approaches: A Review

Information security is facing increasingly severe challenges, and traditional protection mechanisms struggle to cope with complex and evolving threats. In recent years, large language models (LLMs) have emerged as an intelligent technology with broad application prospects in the field of information security. In this paper, we focus on the key role of LLMs in information security, systematically review their application progress in malicious behavior prediction, network threat analysis, system vulnerability detection, malicious code identification, and cryptographic algorithm optimization, and explore their potential for enhancing security protection performance. Building on neural networks and the Transformer architecture, this paper analyzes the technical basis of large language models and their advantages in natural language processing tasks. It is shown that introducing large language models helps improve the detection accuracy and reduce the false alarm rate of security systems. Finally, this paper summarizes the current application results and points out that challenges remain in model transparency, interpretability, and scenario adaptability, among other issues. Further exploration of model structure optimization and generalization ability is necessary to realize a more intelligent and accurate information security protection system.

Updated: 2025-07-24 09:09:36

标题: 基于LLM方法的信息安全:综述

摘要: 信息安全正面临日益严峻的挑战,传统的保护手段难以应对复杂和不断变化的威胁。近年来,作为新兴智能技术,大型语言模型(LLMs)在信息安全领域展现出广阔的应用前景。本文重点关注LLM在信息安全中的关键作用,系统回顾了其在恶意行为预测、网络威胁分析、系统漏洞检测、恶意代码识别和密码算法优化等方面的应用进展,并探讨了其在增强安全保护性能方面的潜力。基于神经网络和Transformer架构,本文分析了大型语言模型的技术基础及其在自然语言处理任务中的优势。研究表明,引入大型语言建模有助于提高安全系统的检测准确性并降低误报率。最后,本文总结了当前的应用结果,并指出其在模型透明度、可解释性和场景适应性等问题上仍面临挑战。有必要进一步探讨模型结构的优化和泛化能力的提升,以实现更智能、更准确的信息安全保护系统。

更新时间: 2025-07-24 09:09:36

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2507.18215v1

The Role of the Time-Dependent Hessian in High-Dimensional Optimization

Gradient descent is commonly used to find minima in rough landscapes, particularly in recent machine learning applications. However, a theoretical understanding of why good solutions are found remains elusive, especially in strongly non-convex and high-dimensional settings. Here, we focus on the phase retrieval problem as a typical example, which has received a lot of attention recently in theoretical machine learning. We analyze the Hessian during gradient descent, identify a dynamical transition in its spectral properties, and relate it to the ability to escape rough regions in the loss landscape. When the signal-to-noise ratio (SNR) is large enough, an informative negative direction exists in the Hessian at the beginning of the descent, i.e., in the initial condition. While descending, a BBP transition in the spectrum takes place in finite time: the direction is lost, and the dynamics is trapped in a rugged region filled with marginally stable bad minima. Surprisingly, for finite system sizes, this window of negative curvature allows the system to recover the signal well before the theoretical SNR found for infinite sizes, emphasizing the central role of initialization and early-time dynamics for efficiently navigating rough landscapes.

Updated: 2025-07-24 09:06:37

标题: 高维优化中时间相关Hessian的作用

摘要: 梯度下降通常用于在复杂的景观中寻找极小值点,尤其是在最近的机器学习应用中。然而,对于为什么能找到好的解决方案的理论理解仍然难以捉摸,特别是在强烈非凸和高维设置中。在这里,我们以相位恢复问题为典型例子进行研究,这个问题最近在理论机器学习中引起了很多关注。我们分析了梯度下降过程中的Hessian矩阵,确定了其谱特性中的动态转变,并将其与在损失景观中逃脱粗糙区域的能力联系起来。当信噪比(SNR)足够大时,在下降开始时,Hessian矩阵中存在一个信息负方向,即在初始条件中。在下降过程中,Hessian矩阵的谱发生了BBP过渡:这个方向消失了,动态被困在一个充满边缘稳定不良极小值的崎岖区域中。令人惊讶的是,对于有限系统尺寸来说,这个负曲率窗口允许系统在理论上找到的无限尺寸的信噪比之前很好地恢复信号,强调了初始化和早期动态对于有效地穿越复杂景观的重要作用。

更新时间: 2025-07-24 09:06:37

领域: cs.LG,cond-mat.dis-nn,cond-mat.stat-mech

下载: http://arxiv.org/abs/2403.02418v3
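
The spectral diagnostic is easy to reproduce on a toy instance: build a small phase-retrieval loss, take its exact Hessian at the initial point with automatic differentiation, and look for an informative negative eigenvalue. The sizes and the noiseless (high-SNR) setup below are illustrative.

import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
d, n = 20, 120
a = torch.randn(n, d)
w_star = torch.randn(d) / d ** 0.5           # planted signal
y = (a @ w_star) ** 2                        # noiseless measurements

def loss(w):
    return (((a @ w) ** 2 - y) ** 2).mean()  # phase-retrieval objective

w0 = torch.randn(d) / d ** 0.5               # random initial condition
H = hessian(loss, w0)
print("min Hessian eigenvalue at init:", torch.linalg.eigvalsh(H)[0].item())
# A markedly negative eigenvalue is the informative descent direction;
# tracking the spectrum along training shows it disappearing at the
# BBP-type transition described above.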

MoRPI-PINN: A Physics-Informed Framework for Mobile Robot Pure Inertial Navigation

A fundamental requirement for full autonomy in mobile robots is accurate navigation even in situations where satellite navigation or cameras are unavailable. In such practical situations, relying only on inertial sensors will result in navigation solution drift due to the sensors' inherent noise and error terms. One of the emerging solutions to mitigate drift is to maneuver the robot in a snake-like slithering motion to increase the inertial signal-to-noise ratio, allowing the regression of the mobile robot position. In this work, we propose MoRPI-PINN as a physics-informed neural network framework for accurate inertial-based mobile robot navigation. By embedding physical laws and constraints into the training process, MoRPI-PINN is capable of providing an accurate and robust navigation solution. Using real-world experiments, we show accuracy improvements of over 85% compared to other approaches. MoRPI-PINN is a lightweight approach that can be implemented even on edge devices and used in any typical mobile robot application.

Updated: 2025-07-24 09:02:13

标题: MoRPI-PINN:移动机器人纯惯性导航的物理信息框架

摘要: 移动机器人完全自主的基本要求是在卫星导航或摄像头不可用的情况下仍能准确导航。在这种实际情况下,仅依靠惯性传感器会导致由于传感器固有噪声和误差项而产生导航解漂移。减轻漂移的新兴解决方案之一是让机器人以类似蛇行的蠕动运动移动,增加惯性信号与噪声比,从而实现移动机器人位置的回归。在这项工作中,我们提出了MoRPI-PINN作为一种基于物理信息的神经网络框架,用于准确的惯性导航移动机器人。通过将物理定律和约束嵌入训练过程中,MoRPI-PINN能够提供准确且稳健的导航解决方案。通过真实世界实验,我们展示了与其他方法相比超过85%的精度提高。MoRPI-PINN是一种轻量级方法,甚至可以在边缘设备上实现,并用于任何典型的移动机器人应用中。

更新时间: 2025-07-24 09:02:13

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2507.18206v1
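
The physics-informed ingredient can be sketched as a composite loss: a data term on the regressed positions plus a residual term enforcing a kinematic constraint, here that finite differences of the predicted positions match the velocity. The network shape and this particular constraint are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(6, 64), nn.Tanh(), nn.Linear(64, 2))

def pinn_loss(imu_window, pos_true, vel, dt=0.01, lam=1.0):
    """Data loss on positions plus physics residual on the kinematics."""
    pos_pred = net(imu_window)                       # (T, 2) planar positions
    data = ((pos_pred - pos_true) ** 2).mean()
    # Physics residual: (p[t+1] - p[t]) / dt should equal the velocity.
    resid = (pos_pred[1:] - pos_pred[:-1]) / dt - vel[:-1]
    return data + lam * (resid ** 2).mean()

T = 50
loss = pinn_loss(torch.randn(T, 6),                  # 6-axis IMU features
                 torch.zeros(T, 2), torch.zeros(T, 2))
loss.backward()

Embedding the constraint in the loss rather than in the architecture is what keeps the network small enough for edge deployment.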

Safeguarding RAG Pipelines with GMTP: A Gradient-based Masked Token Probability Method for Poisoned Document Detection

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by providing external knowledge for accurate and up-to-date responses. However, this reliance on external sources exposes a security risk: attackers can inject poisoned documents into the knowledge base to steer the generation process toward harmful or misleading outputs. In this paper, we propose Gradient-based Masked Token Probability (GMTP), a novel defense method to detect and filter out adversarially crafted documents. Specifically, GMTP identifies high-impact tokens by examining gradients of the retriever's similarity function. These key tokens are then masked, and their probabilities are checked via a Masked Language Model (MLM). Since injected tokens typically exhibit markedly low masked-token probabilities, this enables GMTP to easily detect malicious documents and achieve high-precision filtering. Experiments demonstrate that GMTP is able to eliminate over 90% of poisoned content while retaining relevant documents, thus maintaining robust retrieval and generation performance across diverse datasets and adversarial settings.

Updated: 2025-07-24 08:58:41

标题: 用GMTP保护RAG管道:一种基于梯度的掩码标记概率方法,用于检测受污染的文档

摘要: 检索增强生成(RAG)通过提供准确和最新的响应的外部知识来增强大型语言模型(LLMs)。然而,对外部来源的依赖暴露了安全风险,攻击者可以向知识库中注入有毒文档,以引导生成过程产生有害或误导性的输出。在本文中,我们提出了一种新颖的防御方法Gradient-based Masked Token Probability(GMTP),用于检测和过滤对抗性制作的文档。具体来说,GMTP通过检查检索器相似性函数的梯度来识别高影响力的标记。然后对这些关键标记进行屏蔽,并通过遮蔽语言模型(MLM)检查它们的概率。由于注入的标记通常表现出明显低的遮蔽标记概率,这使得GMTP能够轻松检测恶意文档并实现高精度过滤。实验证明,GMTP能够消除超过90%的有毒内容,同时保留相关文档,从而在不同数据集和对抗环境中保持强大的检索和生成性能。

更新时间: 2025-07-24 08:58:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18202v1
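
The detection pipeline can be sketched end to end on toy components: take gradients of the retriever's similarity with respect to the document's token embeddings, rank tokens by gradient norm, mask the top ones, and flag the document if the masked tokens receive unusually low MLM probability. The toy embedding table, the stand-in mlm_prob scorer, and the 0.05 threshold are assumptions for illustration.

import torch

def gmtp_score(query_vec, doc_ids, embed, mlm_prob, k=3):
    """Return masked-token probabilities for the k highest-impact tokens."""
    doc_emb = embed(doc_ids).detach().requires_grad_(True)    # (L, D)
    sim = torch.dot(query_vec, doc_emb.mean(dim=0))           # retriever score
    sim.backward()
    impact = doc_emb.grad.norm(dim=-1)                        # (L,) per token
    top = impact.topk(k).indices
    # Injected tokens tend to receive very low MLM probability when masked.
    return torch.stack([mlm_prob(doc_ids, i) for i in top])

embed = torch.nn.Embedding(1000, 16)
mlm_prob = lambda ids, i: torch.rand(())      # stand-in for a real MLM check
probs = gmtp_score(torch.randn(16), torch.randint(0, 1000, (32,)),
                   embed, mlm_prob)
flagged = bool((probs < 0.05).any())          # threshold assumed

In a real deployment, embed would be the retriever's embedding layer and mlm_prob a masked-LM forward pass; the mechanism, gradient-ranked masking followed by a plausibility check, is the same.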

Comparing Non-minimal Semantics for Disjunction in Answer Set Programming

In this paper, we compare four different semantics for disjunction in Answer Set Programming that, unlike stable models, do not adhere to the principle of model minimality. Two of these approaches, Cabalar and Muñiz's \emph{Justified Models} and Doherty and Szalas's \emph{Strongly Supported Models}, directly provide an alternative non-minimal semantics for disjunction. The other two, Aguado et al.'s \emph{Forks} and Shen and Eiter's \emph{Determining Inference} (DI) semantics, actually introduce a new disjunction connective, but are compared here as if they constituted new semantics for the standard disjunction operator. We are able to prove that three of these approaches (Forks, Justified Models, and a reasonable relaxation of the DI semantics) actually coincide, constituting a common single approach under different definitions. Moreover, this common semantics always provides a superset of the stable models of a program (in fact, modulo any context) and is strictly stronger than the fourth approach (Strongly Supported Models), which actually treats disjunctions as in classical logic.

Updated: 2025-07-24 08:54:37

标题: 比较答集编程中析取的非最小语义

摘要: 在这篇论文中,我们比较了四种不同的答案集编程中析取语义,这些语义与稳定模型不同,不遵循模型最小性原则。其中两种方法,Cabalar和Muñiz的“合理模型”和Doherty和Szalas的“强支持模型”,直接提供了析取的替代非最小语义。另外两种方法,Aguado等人的“分叉”和Shen和Eiter的“决定推理”(DI)语义,实际上引入了一个新的析取连接词,但在这里被比较,好像它们构成了标准析取运算符的新语义。我们能够证明这四种方法中的三种(分叉、合理模型和DI语义的合理放宽)实际上是相同的,构成了不同定义下的共同方法。此外,这种共同语义总是提供程序的稳定模型的超集(实际上,对于任何上下文),并且比第四种方法(强支持模型)严格更强,后者实际上将析取视为经典逻辑中的情况。

更新时间: 2025-07-24 08:54:37

领域: cs.AI

下载: http://arxiv.org/abs/2507.18198v1

Goal-based Trajectory Prediction for improved Cross-Dataset Generalization

To achieve full autonomous driving, a good understanding of the surrounding environment is necessary. In particular, predicting the future states of other traffic participants poses a non-trivial challenge. Current SotA models already show promising results when trained on real datasets (e.g. Argoverse2, NuScenes). Problems arise when these models are deployed to new/unseen areas. Typically, performance drops significantly, indicating that the models lack generalization. In this work, we introduce a new Graph Neural Network (GNN) that utilizes a heterogeneous graph consisting of traffic participants and a vectorized road network. The latter is used to classify goals, i.e. endpoints of the predicted trajectories, in a multi-stage approach, leading to better generalization to unseen scenarios. We show the effectiveness of the goal selection process via cross-dataset evaluation, i.e. training on Argoverse2 and evaluating on NuScenes.

Updated: 2025-07-24 08:54:17

标题: 基于目标的轨迹预测,以提高跨数据集泛化效果

摘要: 为了实现完全自动驾驶,必须对周围环境有良好的理解。特别是预测其他交通参与者未来状态会带来一定挑战。当前的最先进模型在训练真实数据集(例如Argoverse2、NuScenes)时已经显示出有希望的结果。但是,当这些模型部署到新的/未知区域时会出现问题。通常情况下,性能会显著下降,表明模型缺乏泛化能力。在这项工作中,我们引入了一种利用由交通参与者和矢量化道路网络组成的异构图的新图神经网络(GNN)。后者用于在多阶段方法中对预测轨迹的目标进行分类,从而更好地泛化到未知情景。我们通过跨数据集评估展示了目标选择过程的有效性,即在Argoverse2上进行训练并在NuScenes上进行评估。

更新时间: 2025-07-24 08:54:17

领域: cs.LG

下载: http://arxiv.org/abs/2507.18196v1

Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning

On-device learning has emerged as a promising direction for AI development, particularly because of its potential to reduce latency issues and mitigate privacy risks associated with device-server communication, while improving energy efficiency. Despite these advantages, significant memory and computational constraints still represent major challenges for its deployment. Drawing on previous studies on low-rank decomposition methods that address activation memory bottlenecks in backpropagation, we propose a novel shortcut approach as an alternative. Our analysis and experiments demonstrate that our method can reduce activation memory usage, even up to $120.09\times$ compared to vanilla training, while also reducing overall training FLOPs up to $1.86\times$ when evaluated on traditional benchmarks.

Updated: 2025-07-24 08:52:22

标题: 超越低秩分解:一种用于高效设备端学习的快捷方法

摘要: 设备端学习已经成为人工智能发展的一个有前途的方向,尤其是因为它有潜力减少延迟问题并减轻与设备-服务器通信相关的隐私风险,同时提高能源效率。尽管具有这些优势,但显著的内存和计算约束仍然是其部署面临的主要挑战。借鉴先前研究中解决反向传播中激活内存瓶颈的低秩分解方法,我们提出一种新颖的替代捷径方法。我们的分析和实验表明,相比与基本训练,我们的方法可以减少激活内存使用量,甚至高达$120.09\times$,同时在传统基准测试中评估时,还可以减少整体训练FLOPs高达$1.86\times$。

更新时间: 2025-07-24 08:52:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2505.05086v2

A general language model for peptide identification

Accurate identification of bioactive peptides (BPs) and protein post-translational modifications (PTMs) is essential for understanding protein function and advancing therapeutic discovery. However, most computational methods remain limited in their generalizability across diverse peptide functions. Here, we present PDeepPP, a unified deep learning framework that integrates pretrained protein language models with a hybrid transformer-convolutional architecture, enabling robust identification across diverse peptide classes and PTM sites. We curated comprehensive benchmark datasets and implemented strategies to address data imbalance, allowing PDeepPP to systematically extract both global and local sequence features. Through extensive analyses, including dimensionality reduction and comparison studies, PDeepPP demonstrates strong, interpretable peptide representations and achieves state-of-the-art performance in 25 of the 33 biological identification tasks. Notably, PDeepPP attains high accuracy in antimicrobial (0.9726) and phosphorylation site (0.9984) identification, with 99.5% specificity in glycosylation site prediction and a substantial reduction in false negatives in antimalarial tasks. By enabling large-scale, accurate peptide analysis, PDeepPP supports biomedical research and the discovery of novel therapeutic targets for disease treatment. All code, datasets, and pretrained models are publicly available via GitHub:https://github.com/fondress/PDeepPP and Hugging Face:https://huggingface.co/fondress/PDeppPP.

Updated: 2025-07-24 08:48:10

标题: 一个通用的肽段鉴定语言模型

摘要: 准确识别生物活性肽(BPs)和蛋白质后转译修饰(PTMs)对于理解蛋白质功能和推动治疗发现至关重要。然而,大多数计算方法在跨不同肽功能的泛化性方面仍然有限。在这里,我们提出了PDeepPP,这是一个统一的深度学习框架,将预训练的蛋白质语言模型与混合的变压器-卷积架构相结合,可以强大地识别不同肽类和PTM位点。我们筛选了全面的基准数据集,并实施了解决数据不平衡的策略,使PDeepPP能够系统地提取全局和局部序列特征。通过广泛的分析-包括降维和比较研究-PDeepPP展示了强大、可解释的肽段表示,并在33个生物识别任务中的25个中实现了最先进的性能。值得注意的是,PDeepPP在抗菌(0.9726)和磷酸化位点(0.9984)识别中取得了高准确性,在糖基化位点预测中具有99.5%的特异性,并在抗疟疾任务中大幅减少了假阴性。通过实现大规模、准确的肽分析,PDeepPP支持生物医学研究和发现疾病治疗的新治疗靶点。所有代码、数据集和预训练模型都可以通过GitHub和Hugging Face公开获取。

更新时间: 2025-07-24 08:48:10

领域: cs.LG,cs.AI,92C40, 68T07,I.2.6; J.3

下载: http://arxiv.org/abs/2502.15610v4

Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling

A large language model (LLM) with knowledge in both scientific and general tasks is the foundation of science general intelligence. However, directly continuing to pretrain an LLM on science data usually leads to catastrophic forgetting, which indicates severe degradation in general ability. In this report, we present Innovator, which solves this problem by upcycling a pre-trained dense LLM into a fine-grained Mixture-of-Experts model during continued pretraining, where different experts are expected to learn science knowledge in different disciplines, and a shared expert is utilized for general tasks. Innovator introduces a four-stage upcycle training paradigm: (1) Scientific Expert Induction on discipline-specific data, (2) Fine-grained Expert Splitting via FFN dimension decomposition, (3) Science-Aware Routing warmup, and (4) Generalist-Scientist Integration training on hybrid datasets. Such a paradigm enables knowledge in the general domain and in different scientific disciplines to be decoupled, avoiding negative interference among knowledge in different domains. With 53.3B total parameters and 13.3B activated, Innovator extends Qwen2.5-7B using a shared general expert and 64 specialized scientific experts, 8 of which are activated. Trained on 300B tokens with tri-level quality-controlled data, Innovator achieves a 25% average improvement across 30 scientific tasks with a win rate of 70%, while retaining 99% performance in general tasks. Furthermore, Innovator-Reason, which is post-trained from Innovator for reasoning boosting, exhibits excellent reasoning performance in solving complex scientific problems, with improvements of over 30%.

Updated: 2025-07-24 08:37:58

标题: 创新:科学的持续预训练与精细化MoE再利用

摘要: 一个具有科学和一般任务知识的大型语言模型(LLM)是科学智能的基础。然而,直接使用科学数据继续对LLM进行预训练通常会导致灾难性遗忘,这表明一般能力严重下降。在这份报告中,我们提出了Innovator,通过将预训练的密集LLM升级为一个细粒度的专家混合模型,解决了这个问题,在继续预训练过程中,不同的专家被期望在不同学科中学习科学知识,同时使用一个共享专家进行一般任务。Innovator引入了一个四阶段的升级训练范式:(1)在特定学科数据上进行科学专家归纳,(2)通过FFN维度分解进行细粒度专家分割,(3)科学感知路由预热,和(4)在混合数据集上进行一般-科学家整合训练。这种范式使得一般领域和不同科学学科的知识能够解耦,避免了不同领域知识之间的负面影响。Innovator使用53.3B总参数和13.3B激活参数,通过使用一个共享的一般专家和64个专门的科学专家进行扩展,其中8个被激活。在300B令牌上经过三级质量控制数据训练后,Innovator在30个科学任务中实现了25%的平均改进,胜率为70%,同时在一般任务中保持了99%的性能。此外,从Innovator后训练的Innovator-Reason在解决复杂科学问题时表现出色,其推理性能提高了超过30%。

更新时间: 2025-07-24 08:37:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.18671v1

ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory

Training deep neural networks on real-world datasets is often hampered by the presence of noisy labels, which can be memorized by over-parameterized models, leading to significant degradation in generalization performance. While existing methods for learning with noisy labels (LNL) have made considerable progress, they fundamentally suffer from static snapshot evaluations and fail to leverage the rich temporal dynamics of learning evolution. In this paper, we propose ChronoSelect (chrono denoting its temporal nature), a novel framework featuring an innovative four-stage memory architecture that compresses prediction history into compact temporal distributions. Our unique sliding update mechanism with controlled decay maintains only four dynamic memory units per sample, progressively emphasizing recent patterns while retaining essential historical knowledge. This enables precise three-way sample partitioning into clean, boundary, and noisy subsets through temporal trajectory analysis and dual-branch consistency. Theoretical guarantees prove the mechanism's convergence and stability under noisy conditions. Extensive experiments demonstrate ChronoSelect's state-of-the-art performance across synthetic and real-world benchmarks.
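
One plausible reading of the four-unit temporal memory, sketched in NumPy: each sample keeps four decayed summaries of its prediction history that slide toward the past on every update, and the mean trajectory drives the three-way clean/boundary/noisy split. The decay constant and thresholds are illustrative guesses, not the paper's values.

    import numpy as np

    def update_memory(memory, new_prob, decay=0.7):
        """memory: (n_samples, 4) compressed history of P(label | x).
        Shift stages toward the past, then blend in the newest prediction."""
        memory[:, :3] = decay * memory[:, :3] + (1 - decay) * memory[:, 1:]
        memory[:, 3] = decay * memory[:, 3] + (1 - decay) * new_prob
        return memory

    def partition(memory, clean_thr=0.7, noisy_thr=0.3):
        """Three-way split from the temporal trajectory of agreement scores."""
        traj = memory.mean(axis=1)
        clean = traj >= clean_thr
        noisy = traj <= noisy_thr
        boundary = ~clean & ~noisy       # ambiguous, near the decision boundary
        return clean, boundary, noisy

    memory = np.full((1000, 4), 0.5)                 # uninformative initial history
    memory = update_memory(memory, new_prob=np.random.rand(1000))
    clean, boundary, noisy = partition(memory)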

Updated: 2025-07-24 08:29:21

标题: ChronoSelect:通过动态时间记忆实现具有噪声标签的稳健学习

摘要: 在真实世界数据集上训练深度神经网络通常会受到存在噪声标签的影响,这些标签可以被过度参数化的模型记忆,导致泛化性能显著下降。虽然现有的学习噪声标签方法已经取得了相当大的进展,但它们基本上受限于静态快照评估,并未利用学习演变的丰富时间动态。在本文中,我们提出了ChronoSelect(chrono表示其时间特性),这是一个新颖的框架,具有创新的四阶段存储架构,将预测历史压缩为紧凑的时间分布。我们独特的滑动更新机制通过受控衰减,每个样本仅保留四个动态内存单元,逐渐强调最近的模式,同时保留重要的历史知识。这使得通过时间轨迹分析和双分支一致性精确地将样本分为干净、边界和噪声子集成为可能。理论保证证明了在噪声条件下该机制的收敛性和稳定性。广泛的实验证明了ChronoSelect在合成和真实世界基准测试中的最新性能。

更新时间: 2025-07-24 08:29:21

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2507.18183v1

SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models

Large Language Models (LLMs) can achieve inflated scores on multiple-choice tasks by exploiting inherent biases in option positions or labels, rather than demonstrating genuine understanding. This study introduces SCOPE, an evaluation framework designed to measure and mitigate such selection bias in a dataset-independent manner. By repeatedly invoking a null prompt that lacks semantic content, SCOPE estimates each model's unique position-bias distribution. It then redistributes the answer slot according to the inverse-bias distribution, thereby equalizing the lucky-rate, the probability of selecting the correct answer by chance. Furthermore, it prevents semantically similar distractors from being placed adjacent to the answer, thereby blocking near-miss guesses based on superficial proximity cues. Across multiple benchmark experiments, SCOPE consistently outperformed existing debiasing methods in terms of stable performance improvements and showed clearer confidence distributions over correct options. This framework thus offers a new standard for enhancing the fairness and reliability of LLM evaluations.
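
The core mechanism lends itself to a short sketch: estimate the model's position-bias distribution from repeated null-prompt calls, then place the correct answer according to the inverse of that distribution so every slot has the same lucky rate. The model call below is a hypothetical stand-in.

    import numpy as np

    def estimate_position_bias(query_null_prompt, n_options=4, n_trials=200):
        counts = np.zeros(n_options)
        for _ in range(n_trials):
            counts[query_null_prompt(n_options)] += 1  # pick on a contentless prompt
        return counts / counts.sum()

    def inverse_bias_slot(bias, rng):
        inv = 1.0 / np.maximum(bias, 1e-9)
        return int(rng.choice(len(bias), p=inv / inv.sum()))  # favor low-bias slots

    rng = np.random.default_rng(0)
    fake_model = lambda k: rng.choice(k, p=[0.4, 0.3, 0.2, 0.1])  # skewed to slot 0
    bias = estimate_position_bias(fake_model)
    slot = inverse_bias_slot(bias, rng)   # index where the correct answer is placed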

Updated: 2025-07-24 08:28:17

标题: SCOPE:用于评估大型语言模型的随机和反偏置选项放置

摘要: 大型语言模型(LLMs)可以通过利用选项位置或标签中固有的偏见,而不是展示真正的理解,来在多项选择任务上取得夸大的分数。本研究介绍了SCOPE,一个旨在以数据集无关的方式测量和减轻选择偏见的评估框架。通过反复调用缺乏语义内容的空白提示,SCOPE估计每个模型的独特的位置偏见分布。然后根据逆偏见分布重新分配答案位置,从而使幸运率(即偶然选中正确答案的概率)相等。此外,它防止将语义上相似的干扰项放置在答案旁边,从而阻止基于表面接近线索的近似猜测。在多个基准实验中,SCOPE始终优于现有的去偏见方法,表现出稳定的性能改善,并显示出对正确选项的更清晰的置信度分布。因此,这个框架为增强LLM评估的公平性和可靠性提供了一个新的标准。

更新时间: 2025-07-24 08:28:17

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18182v1

Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory

While large language models (LLMs) leverage both knowledge and reasoning during inference, the capacity to distinguish between them plays a pivotal role in model analysis, interpretability, and development. Inspired by dual-system cognitive theory, we propose a cognition attribution framework to decouple the contribution of knowledge and reasoning. In particular, the cognition of LLMs is decomposed into two distinct yet complementary phases: knowledge retrieval (Phase 1) and reasoning adjustment (Phase 2). To separate these phases, LLMs are prompted to generate answers under two different cognitive modes, fast thinking and slow thinking, respectively. The performance under different cognitive modes is analyzed to quantify the contribution of knowledge and reasoning. This framework is applied to 15 LLMs across 3 datasets. Results reveal: (1) reasoning adjustment is domain-specific, benefiting reasoning-intensive domains (e.g., mathematics, physics, and chemistry) and potentially impairing knowledge-intensive domains. (2) Parameter scaling improves both knowledge and reasoning, with knowledge improvements being more pronounced. Additionally, parameter scaling makes LLM reasoning significantly more prudent and moderately more intelligent. (3) Knowledge primarily resides in lower network layers, while reasoning operates in higher layers. Our framework not only helps understand LLMs from a "decoupling" perspective, but also provides new insights into existing research, including scaling laws, hierarchical knowledge editing, and the limitations of small-model reasoning.
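
The two-mode probing protocol can be sketched as a pair of prompts per question: a fast-thinking prompt forces a direct answer (approximating knowledge retrieval), a slow-thinking prompt elicits step-by-step reasoning (adding the adjustment phase), and the accuracy gap is attributed to reasoning. ask_llm is a hypothetical client, and the exact prompt wording is an assumption.

    FAST = "Answer with only the final answer, no explanation.\n\nQ: {q}\nA:"
    SLOW = "Think step by step, then give the final answer.\n\nQ: {q}\nA:"

    def attribute(questions, answers, ask_llm):
        # Accuracy under fast thinking ~ knowledge contribution;
        # the fast-to-slow gap ~ reasoning-adjustment contribution.
        fast = sum(ask_llm(FAST.format(q=q)) == a for q, a in zip(questions, answers))
        slow = sum(ask_llm(SLOW.format(q=q)) == a for q, a in zip(questions, answers))
        n = len(questions)
        return fast / n, (slow - fast) / n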

Updated: 2025-07-24 08:24:52

标题: 在LLMs中分离知识和推理:使用认知双系统理论的探索

摘要: 大型语言模型(LLMs)在推理过程中利用知识和推理,然而区分二者的能力在模型分析、可解释性和发展中起着至关重要的作用。受到双系统认知理论的启发,我们提出了一个认知归因框架来解耦知识和推理的贡献。具体而言,LLMs的认知被分解为两个独立但互补的阶段:知识检索(阶段1)和推理调整(阶段2)。为了分离这些阶段,LLMs被促使以两种不同的认知模式生成答案,分别是快速思维和慢速思维。对不同认知模式下的表现进行分析,以量化知识和推理的贡献。这种架构应用于3个数据集上的15个LLMs。结果显示:(1)推理调整是领域特定的,有利于推理密集型领域(例如数学、物理和化学),但可能影响知识密集型领域。 (2)参数缩放既改善知识又改善推理,其中知识改善更为显著。此外,参数缩放使LLMs的推理变得更为慎重,同时略微更加智能。(3)知识主要存在于较低的网络层,而推理在较高层中操作。我们的框架不仅有助于从“解耦”视角理解LLMs,还为现有研究提供了新的见解,包括缩放规律、层次化知识编辑和小型模型推理的局限性。

更新时间: 2025-07-24 08:24:52

领域: cs.AI

下载: http://arxiv.org/abs/2507.18178v1

Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs

Large Language Models (LLMs) have shown remarkable reasoning ability through explicit Chain-of-Thought (CoT) prompting, but generating these step-by-step textual explanations is computationally expensive and slow. To overcome this, we aim to develop a framework for efficient, implicit reasoning, where the model "thinks" in a latent space without generating explicit text for every step. We propose that these latent thoughts can be modeled as temporally-extended abstract actions, or options, within a hierarchical reinforcement learning framework. To effectively learn a diverse library of options as latent embeddings, we first introduce the Variational Markovian Option Critic (VMOC), an off-policy algorithm that uses variational inference within the HiT-MDP framework. To provide a rigorous foundation for using these options as an abstract reasoning space, we extend the theory of continuous MDP homomorphisms. This proves that learning a policy in the simplified, abstract latent space, for which VMOC is suited, preserves the optimality of the solution to the original, complex problem. Finally, we propose a cold-start procedure that leverages supervised fine-tuning (SFT) data to distill human reasoning demonstrations into this latent option space, providing a rich initialization for the model's reasoning capabilities. Extensive experiments demonstrate that our approach achieves strong performance on complex logical reasoning benchmarks and challenging locomotion tasks, validating our framework as a principled method for learning abstract skills for both language and control.

Updated: 2025-07-24 08:23:56

标题: 通过选项诱导的抽象MDPs中的变分同态学习时间抽象

摘要: 大型语言模型(LLMs)通过显式的思维链(CoT)提示展示了出色的推理能力,但生成这些逐步的文本解释在计算上是昂贵且缓慢的。为了克服这一问题,我们旨在开发一个用于高效、隐式推理的框架,其中模型在潜在空间中“思考”,而无需为每个步骤生成明确的文本。我们提出这些潜在思维可以被建模为在层次强化学习框架中的时间扩展的抽象行为,或选项。为了有效学习多样化的选项作为潜在嵌入,我们首先引入了变分马尔可夫选项评论家(VMOC),这是一种利用变分推理在HiT-MDP框架中的离线策略。为了为使用这些选项作为抽象推理空间提供严格的基础,我们扩展了连续MDP同态的理论。这证明了在简化的抽象潜在空间中学习策略(适合VMOC),可以保持对原始复杂问题的解决方案的最优性。最后,我们提出了一种冷启动过程,利用监督微调(SFT)数据将人类推理演示提炼成这个潜在选项空间,为模型的推理能力提供丰富的初始化。大量实验证明,我们的方法在复杂逻辑推理基准和具有挑战性的运动任务上取得了强大的性能,验证了我们的框架作为学习语言和控制的抽象技能的原则方法。

更新时间: 2025-07-24 08:23:56

领域: cs.AI,I.2.7

下载: http://arxiv.org/abs/2507.16473v2

Differential-UMamba: Rethinking Tumor Segmentation Under Limited Data Scenarios

In data-scarce scenarios, deep learning models often overfit to noise and irrelevant patterns, which limits their ability to generalize to unseen samples. To address these challenges in medical image segmentation, we introduce Diff-UMamba, a novel architecture that combines the UNet framework with the mamba mechanism for modeling long-range dependencies. At the heart of Diff-UMamba is a Noise Reduction Module (NRM), which employs a signal differencing strategy to suppress noisy or irrelevant activations within the encoder. This encourages the model to filter out spurious features and enhance task-relevant representations, thereby improving its focus on clinically meaningful regions. As a result, the architecture achieves improved segmentation accuracy and robustness, particularly in low-data settings. Diff-UMamba is evaluated on multiple public datasets, including MSD (lung and pancreas) and AIIB23, demonstrating consistent performance gains of 1-3% over baseline methods across diverse segmentation tasks. To further assess performance under limited-data conditions, additional experiments are conducted on the BraTS-21 dataset by varying the proportion of available training samples. The approach is also validated on a small internal non-small cell lung cancer (NSCLC) dataset for gross tumor volume (GTV) segmentation in cone beam CT (CBCT), where it achieves a 4-5% improvement over the baseline.
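
A minimal sketch of a signal-differencing noise-reduction block consistent with the description above: a parallel branch estimates noisy or irrelevant activations, which are subtracted from the encoder features. The branch layout (two 3-D convolutions) is an assumption, not the published NRM.

    import torch
    import torch.nn as nn

    class NoiseReductionModule(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.noise_branch = nn.Sequential(
                nn.Conv3d(channels, channels, kernel_size=3, padding=1),
                nn.InstanceNorm3d(channels),
                nn.ReLU(),
                nn.Conv3d(channels, channels, kernel_size=1))

        def forward(self, feat):
            noise = self.noise_branch(feat)
            return feat - noise      # differencing suppresses spurious activations

    x = torch.randn(1, 32, 16, 64, 64)   # (batch, channels, D, H, W) CT patch
    y = NoiseReductionModule(32)(x)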

Updated: 2025-07-24 08:23:11

标题: Diff-UMamba:重新思考有限数据情景下的肿瘤分割

摘要: 在数据稀缺的情况下,深度学习模型常常会过拟合噪声和无关模式,从而限制了它们对未见样本的泛化能力。为了解决医学图像分割中的这些挑战,我们引入了Diff-UMamba,这是一种将UNet框架与mamba机制相结合的新型架构,用于建模长程依赖关系。Diff-UMamba的核心是一个噪声减少模块(NRM),它采用信号差分策略来抑制编码器中的嘈杂或无关激活。这鼓励模型滤除虚假特征并增强任务相关的表示,从而提高其对临床有意义区域的关注度。因此,该架构在低数据设置中实现了改善的分割准确性和鲁棒性。Diff-UMamba在多个公共数据集上进行了评估,包括MSD(肺部和胰腺)和AIIB23,展示了在各种分割任务中相对基准方法的一致性性能提升(1-3%)。为了进一步评估在有限数据条件下的性能,还对BraTS-21数据集进行了额外的实验,通过改变可用训练样本的比例。该方法还在一个小型内部非小细胞肺癌(NSCLC)数据集上进行了验证,用于锥形束CT(CBCT)中的肿瘤总体积(GTV)分割,在该数据集上实现了相对基准方法的4-5%的改进。

更新时间: 2025-07-24 08:23:11

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.18177v1

Sticking to the Mean: Detecting Sticky Tokens in Text Embedding Models

Despite the widespread use of Transformer-based text embedding models in NLP tasks, surprising 'sticky tokens' can undermine the reliability of embeddings. These tokens, when repeatedly inserted into sentences, pull sentence similarity toward a certain value, disrupting the normal distribution of embedding distances and degrading downstream performance. In this paper, we systematically investigate such anomalous tokens, formally defining them and introducing an efficient detection method, Sticky Token Detector (STD), based on sentence and token filtering. Applying STD to 40 checkpoints across 14 model families, we discover a total of 868 sticky tokens. Our analysis reveals that these tokens often originate from special or unused entries in the vocabulary, as well as fragmented subwords from multilingual corpora. Notably, their presence does not strictly correlate with model size or vocabulary size. We further evaluate how sticky tokens affect downstream tasks like clustering and retrieval, observing significant performance drops of up to 50%. Through attention-layer analysis, we show that sticky tokens disproportionately dominate the model's internal representations, raising concerns about tokenization robustness. Our findings show the need for better tokenization strategies and model design to mitigate the impact of sticky tokens in future text embedding applications.
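
The detection idea can be sketched directly: repeatedly append a candidate token to sentence pairs and test whether their embedding similarity collapses toward one value regardless of content. Here embed is a placeholder for any sentence-embedding model, and the drift threshold is an illustrative choice.

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def is_sticky(token, sentence_pairs, embed, repeats=5, drift=0.2):
        shifts = []
        for s1, s2 in sentence_pairs:
            base = cosine(embed(s1), embed(s2))
            s1_mod = s1 + (" " + token) * repeats   # repeatedly inject the token
            s2_mod = s2 + (" " + token) * repeats
            shifts.append(cosine(embed(s1_mod), embed(s2_mod)) - base)
        # sticky if similarity is pulled strongly and consistently toward
        # one value, independent of the sentences' actual content
        return np.mean(np.abs(shifts)) > drift and np.std(shifts) < drift / 2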

Updated: 2025-07-24 08:13:16

标题: 坚持平均值:在文本嵌入模型中检测粘性标记

摘要: 尽管Transformer-based文本嵌入模型在NLP任务中被广泛使用,但令人惊讶的“粘性标记”可能会破坏嵌入的可靠性。这些标记在被重复插入到句子中时,会将句子相似性拉向某个特定值,破坏嵌入距离的正常分布并降低下游性能。在本文中,我们系统地研究了这些异常标记,正式定义它们并引入了一种高效的检测方法,基于句子和标记过滤的Sticky Token Detector(STD)。将STD应用于14个模型系列的40个检查点,我们发现了总共868个粘性标记。我们的分析表明,这些标记通常来自词汇表中的特殊或未使用的条目,以及多语言语料库中的碎片化子词。值得注意的是,它们的存在并不严格与模型大小或词汇大小相关。我们进一步评估了粘性标记如何影响聚类和检索等下游任务,观察到性能下降高达50%。通过注意力层分析,我们展示了粘性标记在模型的内部表示中不成比例地占主导地位,引发了有关标记化鲁棒性的担忧。我们的发现表明,未来文本嵌入应用中需要更好的标记化策略和模型设计来减轻粘性标记的影响。

更新时间: 2025-07-24 08:13:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18171v1

Statistical Runtime Verification for LLMs via Robustness Estimation

Adversarial robustness verification is essential for ensuring the safe deployment of Large Language Models (LLMs) in runtime-critical applications. However, formal verification techniques remain computationally infeasible for modern LLMs due to their exponential runtime and white-box access requirements. This paper presents a case study adapting and extending the RoMA statistical verification framework to assess its feasibility as an online runtime robustness monitor for LLMs in black-box deployment settings. Our adaptation of RoMA analyzes confidence score distributions under semantic perturbations to provide quantitative robustness assessments with statistically validated bounds. Our empirical validation against formal verification baselines demonstrates that RoMA achieves comparable accuracy (within 1\% deviation), and reduces verification times from hours to minutes. We evaluate this framework across semantic, categorical, and orthographic perturbation domains. Our results demonstrate RoMA's effectiveness for robustness monitoring in operational LLM deployments. These findings point to RoMA as a potentially scalable alternative when formal methods are infeasible, with promising implications for runtime verification in LLM-based systems.
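
In the same spirit, a black-box robustness monitor reduces to sampling perturbations and bounding the stability rate. The sketch below uses an exact Clopper-Pearson lower confidence bound; model and perturb are hypothetical stand-ins for a deployed LLM and a semantic perturbation generator.

    from scipy.stats import beta

    def robustness_estimate(model, perturb, prompt, n=300, alpha=0.05):
        base = model(prompt)
        stable = sum(model(perturb(prompt)) == base for _ in range(n))
        if stable == 0:
            lower = 0.0
        elif stable == n:
            lower = alpha ** (1.0 / n)    # closed form when every sample agrees
        else:
            # exact (Clopper-Pearson) one-sided lower bound on P(stable)
            lower = beta.ppf(alpha, stable, n - stable + 1)
        return stable / n, lower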

Updated: 2025-07-24 08:03:09

标题: 通过健壮性估计的LLMs的统计运行时验证

摘要: 对抗性鲁棒性验证对于确保大型语言模型(LLMs)在运行关键应用中的安全部署至关重要。然而,由于现代LLMs的指数运行时间和白盒访问要求,正式验证技术仍然在计算上不可行。本文提出了一个案例研究,将RoMA统计验证框架进行调整和扩展,以评估其在黑盒部署设置中作为在线运行时鲁棒性监视器的可行性。我们对RoMA的调整分析了在语义扰动下的置信度分布,以提供经过统计验证的边界的定量鲁棒性评估。我们的实证验证与正式验证基线对比表明,RoMA实现了可比的准确性(误差在1\%以内),并将验证时间从几小时缩短到几分钟。我们在语义、分类和拼写扰动领域评估了这一框架。我们的结果表明,RoMA在操作LLM部署中的鲁棒性监视方面非常有效。这些发现表明,当正式方法不可行时,RoMA可能是一个具有潜在可扩展性的替代方案,对基于LLM的系统中的运行时验证具有有希望的影响。

更新时间: 2025-07-24 08:03:09

领域: cs.LG

下载: http://arxiv.org/abs/2504.17723v2

A Survey of Event Causality Identification: Taxonomy, Challenges, Assessment, and Prospects

Event Causality Identification (ECI) has become an essential task in Natural Language Processing (NLP), focused on automatically detecting causal relationships between events within texts. This comprehensive survey systematically investigates fundamental concepts and models, developing a systematic taxonomy and critically evaluating diverse models. We begin by defining core concepts, formalizing the ECI problem, and outlining standard evaluation protocols. Our classification framework divides ECI models into two primary tasks: Sentence-level Event Causality Identification (SECI) and Document-level Event Causality Identification (DECI). For SECI, we review models employing feature pattern-based matching, machine learning classifiers, deep semantic encoding, prompt-based fine-tuning, and causal knowledge pre-training, alongside data augmentation strategies. For DECI, we focus on approaches utilizing deep semantic encoding, event graph reasoning, and prompt-based fine-tuning. Special attention is given to recent advancements in multi-lingual and cross-lingual ECI, as well as zero-shot ECI leveraging Large Language Models (LLMs). We analyze the strengths, limitations, and unresolved challenges associated with each approach. Extensive quantitative evaluations are conducted on four benchmark datasets to rigorously assess the performance of various ECI models. We conclude by discussing future research directions and highlighting opportunities to advance the field further.

Updated: 2025-07-24 07:53:24

标题: 事件因果识别调查:分类、挑战、评估和展望

摘要: 事件因果识别(ECI)已成为自然语言处理(NLP)中的一个重要任务,重点是自动检测文本中事件之间的因果关系。本综合调查系统地调查了基本概念和模型,开发了一个系统化的分类体系,并对各种模型进行了批判性评估。我们首先定义核心概念,形式化ECI问题,并概述标准评估协议。我们的分类框架将ECI模型分为两个主要任务:句子级事件因果识别(SECI)和文档级事件因果识别(DECI)。对于SECI,我们回顾了采用基于特征模式匹配、机器学习分类器、深度语义编码、基于提示的微调以及因果知识预训练的模型,以及数据增强策略。对于DECI,我们关注利用深度语义编码、事件图推理和基于提示的微调的方法。我们特别关注多语言和跨语言ECI的最新进展,以及利用大型语言模型(LLMs)的零-shot ECI。我们分析了每种方法的优势、局限性和未解决的挑战。我们在四个基准数据集上进行了广泛的定量评估,严格评估了各种ECI模型的性能。最后,我们讨论未来的研究方向,并强调进一步推进该领域的机会。

更新时间: 2025-07-24 07:53:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2411.10371v5

An Improved ChaCha Algorithm Based on Quantum Random Number

Due to the merits of high efficiency and strong security against timing and side-channel attacks, ChaCha has been widely applied in real-time communication and data streaming scenarios. However, with the rapid development of AI-assisted cryptanalysis and quantum computing technologies, there are serious challenges to the secure implementation of ChaCha cipher. To further strengthen the security of ChaCha cipher, we propose an improved variant based on quantum random numbers, i.e., Quantum Random Number Enhanced ChaCha (QRE-ChaCha). Specifically, the design XORs the initial constants with quantum random numbers and periodically injects quantum random numbers into selected state words during odd rounds to enhance diffusion. Compared with the original ChaCha, the present variant shows stronger resistance to differential attacks and generates a keystream with statistical randomness, thereby offering increased robustness against both classical and quantum attacks. To evaluate the security and performance of the present ChaCha, our analysis proceeds in three main parts. Firstly, we analyze its theoretical security in terms of quantum randomness and attack testing, and conduct differential cryptanalysis with an automated search method based on the Boolean satisfiability problem (SAT). Secondly, we subject the keystream generated by the cipher to randomness tests using the NIST statistical test suite and the GM/T 0005-2021 randomness testing standard. Finally, we assess its encryption and decryption performance by measuring its encryption speed on files of various sizes. According to the results, the present ChaCha is significantly improved to resist differential attacks while maintaining the high efficiency of the original ChaCha cipher, and its keystream successfully passes statistical randomness tests using the NIST and GM/T 0005-2021 standards, meeting cryptographic application requirements.
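
The two modifications are easy to show against a textbook ChaCha core: XOR the four constants with quantum random words before the rounds, and inject fresh quantum random words into a state word on odd double-rounds. Which word receives the injection is an assumption here, and qrng_word (backed by os.urandom purely for the sketch) stands in for a real quantum source.

    import os

    MASK = 0xFFFFFFFF
    rotl = lambda v, c: ((v << c) & MASK) | (v >> (32 - c))

    def quarter_round(s, a, b, c, d):
        s[a] = (s[a] + s[b]) & MASK; s[d] = rotl(s[d] ^ s[a], 16)
        s[c] = (s[c] + s[d]) & MASK; s[b] = rotl(s[b] ^ s[c], 12)
        s[a] = (s[a] + s[b]) & MASK; s[d] = rotl(s[d] ^ s[a], 8)
        s[c] = (s[c] + s[d]) & MASK; s[b] = rotl(s[b] ^ s[c], 7)

    def qrng_word():
        return int.from_bytes(os.urandom(4), "little")  # placeholder for a QRNG

    def qre_chacha_core(state, rounds=20):
        s = state[:]
        for i in range(4):                 # (1) constants XOR quantum randomness
            s[i] ^= qrng_word()
        for r in range(rounds // 2):       # standard column + diagonal rounds
            quarter_round(s, 0, 4, 8, 12); quarter_round(s, 1, 5, 9, 13)
            quarter_round(s, 2, 6, 10, 14); quarter_round(s, 3, 7, 11, 15)
            quarter_round(s, 0, 5, 10, 15); quarter_round(s, 1, 6, 11, 12)
            quarter_round(s, 2, 7, 8, 13); quarter_round(s, 3, 4, 9, 14)
            if r % 2 == 1:                 # (2) inject on odd double-rounds
                s[12] ^= qrng_word()
        # standard ChaCha feed-forward add of the input state
        return [(x + y) & MASK for x, y in zip(s, state)]

    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574] + list(range(12))
    block = qre_chacha_core(state)         # constants followed by key/counter/nonce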

Updated: 2025-07-24 07:50:17

标题: 基于量子随机数的改进ChaCha算法

摘要: 由于ChaCha具有高效率和强大的抗时序和侧信道攻击的优点,因此在实时通信和数据流场景中被广泛应用。然而,随着AI辅助的密码分析和量子计算技术的快速发展,对于ChaCha密码的安全实现存在严重挑战。为了进一步加强ChaCha密码的安全性,我们提出了一种基于量子随机数的改进变体,即量子随机数增强ChaCha(QRE-ChaCha)。具体来说,该设计使用量子随机数与初始常数进行异或运算,并在奇数轮中定期将量子随机数注入选定的状态单词,以增强扩散。与原始ChaCha相比,该变体表现出更强的抵抗差分攻击能力,并生成具有统计随机性的密钥流,从而提供了对传统和量子攻击的增强鲁棒性。为了评估当前ChaCha的安全性和性能,我们的分析分为三个主要部分。首先,我们从量子随机性和攻击测试方面分析其理论安全性,并使用基于布尔可满足性问题(SAT)的自动搜索方法进行差分密码分析。其次,我们将密码生成的密钥流经过NIST统计测试套件和GM/T 0005-2021随机性测试标准的随机性测试。最后,我们通过测量其对各种大小文件的加密速度来评估其加密和解密性能。根据结果,当前的ChaCha在保持原始ChaCha密码高效率的同时显著改善了对差分攻击的抵抗力,并且其密钥流成功通过了使用NIST和GM/T 0005-2021标准的统计随机性测试,满足了密码应用的要求。

更新时间: 2025-07-24 07:50:17

领域: cs.CR,quant-ph,Primary:94A60, Secondary:68P25, Tertiary:81P94

下载: http://arxiv.org/abs/2507.18157v1

SDSC: A Structure-Aware Metric for Semantic Signal Representation Learning

We propose the Signal Dice Similarity Coefficient (SDSC), a structure-aware metric function for time series self-supervised representation learning. Most Self-Supervised Learning (SSL) methods for signals commonly adopt distance-based objectives such as mean squared error (MSE), which are sensitive to amplitude, invariant to waveform polarity, and unbounded in scale. These properties hinder semantic alignment and reduce interpretability. SDSC addresses this by quantifying structural agreement between temporal signals based on the intersection of signed amplitudes, derived from the Dice Similarity Coefficient (DSC). Although SDSC is defined as a structure-aware metric, it can be used as a loss by subtracting from 1 and applying a differentiable approximation of the Heaviside function for gradient-based optimization. A hybrid loss formulation is also proposed to combine SDSC with MSE, improving stability and preserving amplitude where necessary. Experiments on forecasting and classification benchmarks demonstrate that SDSC-based pre-training achieves comparable or improved performance over MSE, particularly in in-domain and low-resource scenarios. The results suggest that structural fidelity in signal representations enhances the semantic representation quality, supporting the consideration of structure-aware metrics as viable alternatives to conventional distance-based methods.
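
One plausible reading of SDSC as a differentiable loss, sketched in PyTorch: a Dice-style overlap of signed amplitudes that counts agreement only where the two signals share sign, with a steep sigmoid approximating the Heaviside step. The paper's exact formulation may differ.

    import torch

    def sdsc_loss(x, y, k=25.0, eps=1e-8):
        same_sign = torch.sigmoid(k * x * y)          # smooth 1[sign(x) == sign(y)]
        intersection = same_sign * torch.minimum(x.abs(), y.abs())
        sdsc = 2 * intersection.sum() / (x.abs().sum() + y.abs().sum() + eps)
        return 1.0 - sdsc                             # SDSC in [0, 1]; loss = 1 - SDSC
        # A hybrid objective per the abstract could be lam * sdsc + (1 - lam) * MSE.

    pred = torch.randn(8, 256, requires_grad=True)
    target = torch.randn(8, 256)
    loss = sdsc_loss(pred, target)
    loss.backward()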

Updated: 2025-07-24 07:48:25

标题: SDSC:一种用于语义信号表示学习的结构感知度量

摘要: 我们提出了信号骰子相似性系数(SDSC),这是一种针对时间序列自监督表示学习的结构感知度量函数。大多数信号的自监督学习(SSL)方法通常采用基于距离的目标,如均方误差(MSE),这些方法对振幅敏感,对波形极性不变,并且在规模上没有限制。这些特性阻碍了语义对齐并降低了可解释性。SDSC通过量化基于带符号振幅的时间信号之间的结构协议来解决这个问题,这些振幅是从Dice相似性系数(DSC)导出的。尽管SDSC被定义为一种结构感知度量,但它可以通过从1中减去并应用Heaviside函数的可微近似来用作损失,以进行基于梯度的优化。还提出了一种混合损失公式,将SDSC与MSE结合起来,改善了稳定性并在必要时保留了振幅。对预测和分类基准的实验表明,基于SDSC的预训练在MSE上达到了可比较或更好的性能,特别是在领域内和资源匮乏的情况下。结果表明,信号表示中的结构保真度提高了语义表示质量,支持将结构感知度量作为传统基于距离的方法的可行替代方案。

更新时间: 2025-07-24 07:48:25

领域: cs.LG,cs.AI,cs.LO

下载: http://arxiv.org/abs/2507.14516v2

Quantum Machine Learning in Precision Medicine and Drug Discovery -- A Game Changer for Tailored Treatments?

The digitization of healthcare presents numerous challenges, including the complexity of biological systems, vast data generation, and the need for personalized treatment plans. Traditional computational methods often fall short, leading to delayed and sometimes ineffective diagnoses and treatments. Quantum Computing (QC) and Quantum Machine Learning (QML) offer transformative advancements with the potential to revolutionize medicine. This paper summarizes areas where QC promises unprecedented computational power, enabling faster, more accurate diagnostics, personalized treatments, and enhanced drug discovery processes. However, integrating quantum technologies into precision medicine also presents challenges, including errors in algorithms and high costs. We show that mathematically-based techniques for specifying, developing, and verifying software (formal methods) can enhance the reliability and correctness of QC. By providing a rigorous mathematical framework, formal methods help to specify, develop, and verify systems with high precision. In genomic data analysis, formal specification languages can precisely (1) define the behavior and properties of quantum algorithms designed to identify genetic markers associated with diseases. Model checking tools can systematically explore all possible states of the algorithm to (2) ensure it behaves correctly under all conditions, while theorem proving techniques provide mathematical (3) proof that the algorithm meets its specified properties, ensuring accuracy and reliability. Additionally, formal optimization techniques can (4) enhance the efficiency and performance of quantum algorithms by reducing resource usage, such as the number of qubits and gate operations. Therefore, we posit that formal methods can significantly contribute to enabling QC to realize its full potential as a game changer in precision medicine.

Updated: 2025-07-24 07:47:46

标题: 量子机器学习在精准医学和药物发现中的应用——个性化治疗的游戏改变者?

摘要: 医疗数字化带来了许多挑战,包括生物系统的复杂性、大量数据的生成以及个性化治疗方案的需求。传统的计算方法通常存在不足,导致诊断和治疗的延迟和有时无效。量子计算(QC)和量子机器学习(QML)提供了具有革命性潜力的进展,有可能彻底改变医学。本文总结了QC在哪些领域承诺提供前所未有的计算能力,实现更快、更准确的诊断,个性化治疗以及增强药物发现过程。然而,将量子技术整合到精准医学中也存在挑战,包括算法中的错误和高昂的成本。我们展示了基于数学的指定、开发和验证软件的技术(形式方法)可以增强QC的可靠性和正确性。通过提供严谨的数学框架,形式方法有助于精确指定、开发和验证系统。在基因组数据分析中,形式化规范语言可以精确地(1)定义旨在识别与疾病相关的遗传标记的量子算法的行为和特性。模型检查工具可以系统地探索算法的所有可能状态,以(2)确保其在所有条件下正确运行,而定理证明技术提供了数学(3)证明算法满足其指定属性,确保准确性和可靠性。此外,形式化优化技术可以(4)通过减少资源使用,如量子位和门操作数量,提高量子算法的效率和性能。因此,我们认为形式方法可以显著促进QC实现其作为精准医学中的革命性变革者的全部潜力。

更新时间: 2025-07-24 07:47:46

领域: cs.ET,cs.AI,quant-ph

下载: http://arxiv.org/abs/2502.18639v2

When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning

Efficient vision-language understanding of large Remote Sensing Images (RSIs) is meaningful but challenging. Current Large Vision-Language Models (LVLMs) typically employ limited pre-defined grids to process images, leading to information loss when handling gigapixel RSIs. Conversely, using unlimited grids significantly increases computational costs. To preserve image details while reducing computational complexity, we propose a text-guided token pruning method with Dynamic Image Pyramid (DIP) integration. Our method introduces: (i) a Region Focus Module (RFM) that leverages text-aware region localization capability to identify critical vision tokens, and (ii) a coarse-to-fine image tile selection and vision token pruning strategy based on DIP, which is guided by RFM outputs and avoids directly processing the entire large imagery. Additionally, existing benchmarks for evaluating LVLMs' perception ability on large RSI suffer from limited question diversity and constrained image sizes. We construct a new benchmark named LRS-VQA, which contains 7,333 QA pairs across 8 categories, with image length up to 27,328 pixels. Our method outperforms existing high-resolution strategies on four datasets using the same data. Moreover, compared to existing token reduction methods, our approach demonstrates higher efficiency under high-resolution settings. Dataset and code are in https://github.com/VisionXLab/LRS-VQA.
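
The coarse-to-fine selection reduces to two steps that can be sketched compactly: score coarse tiles against the text query, then prune vision tokens by text relevance inside the surviving tiles only. CLIP-style dot-product scoring and the keep ratios are illustrative assumptions.

    import torch

    def prune_tokens(tile_feats, text_feat, keep_ratio=0.25):
        scores = tile_feats @ text_feat                  # per-token text relevance
        k = max(1, int(keep_ratio * tile_feats.shape[0]))
        return tile_feats[scores.topk(k).indices]

    def coarse_to_fine(coarse_tiles, fine_tiles, text_feat, top_tiles=4):
        tile_scores = coarse_tiles.mean(dim=1) @ text_feat          # (n_tiles,)
        selected = tile_scores.topk(min(top_tiles, coarse_tiles.shape[0])).indices
        # only the selected tiles are processed at full resolution, then pruned
        return torch.cat([prune_tokens(fine_tiles[i], text_feat)
                          for i in selected.tolist()])

    d = 512
    coarse = torch.randn(16, 64, d)                  # 16 coarse tiles, 64 tokens each
    fine = [torch.randn(256, d) for _ in range(16)]  # same tiles at high resolution
    kept = coarse_to_fine(coarse, fine, torch.randn(d))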

Updated: 2025-07-24 07:44:45

标题: 当大型视觉语言模型遇到大型遥感图像:从粗到细的文本引导的标记修剪

摘要: 大规模遥感图像(RSIs)的高效视觉-语言理解具有意义但具有挑战性。当前的大型视觉-语言模型(LVLMs)通常使用有限预定义的网格来处理图像,导致在处理千兆像素RSIs时信息丢失。相反,使用无限网格会显着增加计算成本。为了保留图像细节同时减少计算复杂性,我们提出了一种带有动态图像金字塔(DIP)集成的文本引导的标记修剪方法。我们的方法引入了:(i)一个区域焦点模块(RFM),利用文本感知区域定位能力来识别关键的视觉标记,以及(ii)基于DIP的粗到细的图像瓦片选择和视觉标记修剪策略,该策略由RFM输出引导,并避免直接处理整个大图像。此外,用于评估LVLMs在大型RSI上的感知能力的现有基准测试受限于问题多样性有限和图像大小受限。我们构建了一个名为LRS-VQA的新基准测试,其中包含8个类别的7,333个QA对,图像长度高达27,328像素。我们的方法在使用相同数据的四个数据集上优于现有的高分辨率策略。此外,与现有的标记减少方法相比,我们的方法在高分辨率设置下表现出更高的效率。数据集和代码位于https://github.com/VisionXLab/LRS-VQA。

更新时间: 2025-07-24 07:44:45

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2503.07588v3

GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar

Despite recent progress in 3D head avatar generation, balancing identity preservation, i.e., reconstruction, with novel poses and expressions, i.e., animation, remains a challenge. Existing methods struggle to adapt Gaussians to varying geometrical deviations across facial regions, resulting in suboptimal quality. To address this, we propose GeoAvatar, a framework for adaptive geometrical Gaussian Splatting. GeoAvatar leverages Adaptive Pre-allocation Stage (APS), an unsupervised method that segments Gaussians into rigid and flexible sets for adaptive offset regularization. Then, based on mouth anatomy and dynamics, we introduce a novel mouth structure and the part-wise deformation strategy to enhance the animation fidelity of the mouth. Finally, we propose a regularization loss for precise rigging between Gaussians and 3DMM faces. Moreover, we release DynamicFace, a video dataset with highly expressive facial motions. Extensive experiments show the superiority of GeoAvatar compared to state-of-the-art methods in reconstruction and novel animation scenarios.

Updated: 2025-07-24 07:41:40

标题: GeoAvatar:用于3D头像的自适应几何高斯喷涂

摘要: 尽管最近在3D头像生成方面取得了进展,但在保持身份信息完整性(即重建)和新颖姿势与表情(即动画)之间取得平衡仍然是一项挑战。现有方法难以适应面部不同区域的几何偏差,导致质量不佳。为解决这一问题,我们提出了GeoAvatar,一个自适应几何高斯喷涂框架。GeoAvatar利用自适应预分配阶段(APS),这是一种无监督方法,将高斯分割为刚性和柔性集合,用于自适应偏移正则化。然后,基于口腔解剖学和动态学,我们引入了一种新颖的口腔结构和部分变形策略,以增强口腔动画的保真度。最后,我们提出了一种用于精确配准高斯和3DMM面部之间的正则化损失。此外,我们发布了一个高度表现力的面部动作视频数据集DynamicFace。大量实验证明,与最先进的方法相比,GeoAvatar在重建和新颖动画场景中表现出卓越性能。

更新时间: 2025-07-24 07:41:40

领域: cs.GR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.18155v1

When Noisy Labels Meet Class Imbalance on Graphs: A Graph Augmentation Method with LLM and Pseudo Label

Class-imbalanced graph node classification is a practical yet underexplored research problem. Although recent studies have attempted to address this issue, they typically assume clean and reliable labels when processing class-imbalanced graphs. This assumption often violates the nature of real-world graphs, where labels frequently contain noise. Given this gap, this paper systematically investigates robust node classification for class-imbalanced graphs with noisy labels. We propose GraphALP, a novel Graph Augmentation framework based on Large language models (LLMs) and Pseudo-labeling techniques. Specifically, we design an LLM-based oversampling method to generate synthetic minority nodes, producing label-accurate minority nodes to alleviate class imbalance. Based on the class-balanced graphs, we develop a dynamically weighted pseudo-labeling method to obtain high-confidence pseudo labels to reduce label noise ratio. Additionally, we implement a secondary LLM-guided oversampling mechanism to mitigate potential class distribution skew caused by pseudo labels. Experimental results show that GraphALP achieves superior performance over state-of-the-art methods on class-imbalanced graphs with noisy labels.
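
The dynamically weighted pseudo-labeling step might look like the following sketch: blend two prediction views, require agreement, and accept only labels whose confidence clears a threshold that relaxes over training. The blending weights and schedule are assumptions, not the paper's exact rule.

    import numpy as np

    def pseudo_labels(probs_a, probs_b, epoch, max_epoch, base_thr=0.9):
        agree = probs_a.argmax(1) == probs_b.argmax(1)   # two views must agree
        blended = 0.5 * (probs_a + probs_b)
        conf = blended.max(axis=1)
        thr = base_thr - 0.1 * (epoch / max_epoch)       # relax threshold over time
        accept = agree & (conf >= thr)
        return blended.argmax(1)[accept], np.where(accept)[0], conf[accept]

    pa = np.random.dirichlet(np.ones(3), 100)            # view-A class probabilities
    pb = np.random.dirichlet(np.ones(3), 100)            # view-B class probabilities
    labels, idx, weights = pseudo_labels(pa, pb, epoch=5, max_epoch=50)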

Updated: 2025-07-24 07:39:07

标题: 当嘈杂标签遇到图中的类别不平衡:一种基于LLM和伪标签的图增强方法

摘要: 不平衡的图节点分类是一个实际但尚未充分探讨的研究问题。尽管最近的研究已经尝试解决这个问题,但它们通常在处理不平衡图时假设标签是干净且可靠的。这种假设经常违反现实世界图的性质,其中标签经常包含噪声。鉴于这一差距,本文系统地研究了具有嘈杂标签的不平衡图的鲁棒节点分类。我们提出了GraphALP,这是一种基于大型语言模型(LLM)和伪标记技术的新颖图增强框架。具体来说,我们设计了一个基于LLM的过采样方法来生成合成的少数节点,产生准确标记的少数节点以减轻类别不平衡。基于类别平衡的图,我们开发了一种动态加权的伪标记方法,以获得高置信度的伪标签来减少标签噪声比。此外,我们实施了一个次级LLM引导的过采样机制,以减轻由伪标签引起的潜在类别分布偏斜。实验结果表明,GraphALP在具有嘈杂标签的不平衡图上表现优于最先进的方法。

更新时间: 2025-07-24 07:39:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.18153v1

Scalable Parameter Design for Superconducting Quantum Circuits with Graph Neural Networks

To demonstrate supremacy of quantum computing, increasingly large-scale superconducting quantum computing chips are being designed and fabricated. However, the complexity of simulating quantum systems poses a significant challenge to computer-aided design of quantum chips, especially for large-scale chips. Harnessing the scalability of graph neural networks (GNNs), we here propose a parameter designing algorithm for large-scale superconducting quantum circuits. The algorithm depends on the so-called 'three-stair scaling' mechanism, which comprises two neural-network models: an evaluator supervisedly trained on small-scale circuits for applying to medium-scale circuits, and a designer unsupervisedly trained on medium-scale circuits for applying to large-scale ones. We demonstrate our algorithm in mitigating quantum crosstalk errors. Frequencies for both single- and two-qubit gates (corresponding to the parameters of nodes and edges) are considered simultaneously. Numerical results indicate that the well-trained designer achieves notable advantages in efficiency, effectiveness, and scalability. For example, for large-scale superconducting quantum circuits consisting of around 870 qubits, our GNNs-based algorithm achieves 51% of the errors produced by the state-of-the-art algorithm, with a time reduction from 90 min to 27 sec. Overall, a better-performing and more scalable algorithm for designing parameters of superconducting quantum chips is proposed, which initially demonstrates the advantages of applying GNNs in superconducting quantum chips.

Updated: 2025-07-24 07:23:42

标题: 使用图神经网络实现超导量子电路的可扩展参数设计

摘要: 为了展示量子计算的优越性,越来越大规模的超导量子计算芯片正在设计和制造。然而,模拟量子系统的复杂性对量子芯片的计算机辅助设计构成了重大挑战,特别是对于大规模芯片。利用图神经网络(GNNs)的可扩展性,我们在这里提出了一个针对大规模超导量子电路的参数设计算法。该算法依赖于所谓的“三级梯度缩放”机制,包括两个神经网络模型:一个在小规模电路上监督训练的评估器,用于应用于中等规模电路,以及一个在中等规模电路上无监督训练的设计师,用于应用于大规模电路。我们展示了我们的算法在减轻量子串扰错误方面的效果。同时考虑单量子比特和双量子比特门的频率(对应节点和边的参数)。数值结果表明,训练有素的设计师在效率、有效性和可扩展性方面具有显著优势。例如,对于约870个量子比特组成的大规模超导量子电路,我们基于GNNs的算法实现了最先进算法产生的错误的51%,时间缩短从90分钟到27秒。总的来说,提出了一个性能更好、更可扩展的超导量子芯片参数设计算法,最初展示了在超导量子芯片中应用GNNs的优势。

更新时间: 2025-07-24 07:23:42

领域: quant-ph,cs.AI

下载: http://arxiv.org/abs/2411.16354v3

Logical Characterizations of GNNs with Mean Aggregation

We study the expressive power of graph neural networks (GNNs) with mean as the aggregation function. In the non-uniform setting, we show that such GNNs have exactly the same expressive power as ratio modal logic, which has modal operators expressing that at least a certain ratio of the successors of a vertex satisfies a specified property. The non-uniform expressive power of mean GNNs is thus higher than that of GNNs with max aggregation, but lower than for sum aggregation--the latter are characterized by modal logic and graded modal logic, respectively. In the uniform setting, we show that the expressive power relative to MSO is exactly that of alternation-free modal logic, under the natural assumptions that combination functions are continuous and classification functions are thresholds. This implies that, relative to MSO and in the uniform setting, mean GNNs are strictly less expressive than sum GNNs and max GNNs. When any of the assumptions is dropped, the expressive power increases.
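
A tiny worked example of the correspondence: the ratio-modal formula "at least half of my successors satisfy p" is exactly one mean-aggregation step followed by a threshold. The graph and threshold below are illustrative.

    import numpy as np

    adj = np.array([[0, 1, 1, 1],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1],
                    [0, 0, 0, 0]])        # row i lists the successors of vertex i
    p = np.array([0.0, 1.0, 1.0, 0.0])    # indicator: which vertices satisfy p

    deg = adj.sum(axis=1, keepdims=True)
    mean_agg = np.divide(adj @ p[:, None], deg,
                         out=np.zeros_like(deg, dtype=float),
                         where=deg > 0)   # mean over successors (0 if none)
    # vertices where at least half of the successors satisfy p
    satisfies = (mean_agg >= 0.5).ravel()  # -> [True, True, False, False]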

Updated: 2025-07-24 07:21:49

标题: GNNs with Mean Aggregation的逻辑特征描述

摘要: 我们研究了以均值为聚合函数的图神经网络(GNNs)的表达能力。在非均匀设置中,我们展示了这样的GNNs与比例模态逻辑具有完全相同的表达能力,该逻辑具有表达操作符,表示至少一个顶点的后继顶点中的一定比例满足指定属性。因此,均值GNNs的非均匀表达能力高于最大聚合的GNNs,但低于总和聚合的GNNs,后者分别由模态逻辑和分级模态逻辑特征化。在均匀设置中,我们展示了相对于MSO的表达能力正好是无交替模态逻辑,前提是组合函数是连续的,分类函数是阈值。这意味着,相对于MSO和在均匀设置中,均值GNNs的表达能力比总和GNNs和最大GNNs要低。当任何前提被放弃时,表达能力增加。

更新时间: 2025-07-24 07:21:49

领域: cs.AI,cs.LO

下载: http://arxiv.org/abs/2507.18145v1

Reality Proxy: Fluid Interactions with Real-World Objects in MR via Abstract Representations

Interacting with real-world objects in Mixed Reality (MR) often proves difficult when they are crowded, distant, or partially occluded, hindering straightforward selection and manipulation. We observe that these difficulties stem from performing interaction directly on physical objects, where input is tightly coupled to their physical constraints. Our key insight is to decouple interaction from these constraints by introducing proxies: abstract representations of real-world objects. We embody this concept in Reality Proxy, a system that seamlessly shifts interaction targets from physical objects to their proxies during selection. Beyond facilitating basic selection, Reality Proxy uses AI to enrich proxies with semantic attributes and hierarchical spatial relationships of their corresponding physical objects, enabling novel and previously cumbersome interactions in MR, such as skimming, attribute-based filtering, navigating nested groups, and complex multi-object selections, all without requiring new gestures or menu systems. We demonstrate Reality Proxy's versatility across diverse scenarios, including office information retrieval, large-scale spatial navigation, and multi-drone control. An expert evaluation indicates the system's utility and usability, suggesting that proxy-based abstractions offer a powerful and generalizable interaction paradigm for future MR systems.

Updated: 2025-07-24 07:13:36

标题: 现实代理:通过抽象表示在MR中与现实世界物体进行流畅交互

摘要: 在混合现实(MR)中与现实世界物体进行交互通常会变得困难,特别是当它们密集、遥远或部分被遮挡时,这会妨碍直接选择和操作。我们观察到,这些困难源于直接在物理对象上进行交互,其中输入与它们的物理约束紧密耦合。我们的关键洞察是通过引入代理(proxies)-真实世界物体的抽象表示,将交互与这些约束解耦。我们在Reality Proxy中体现了这一概念,该系统在选择过程中将交互目标无缝地从物理对象转移到它们的代理。除了促进基本的选择外,Reality Proxy还利用人工智能来丰富代理的语义属性和相应物理对象的层次空间关系,从而在MR中实现新颖且以前繁琐的交互,如略读、基于属性的过滤、导航嵌套组和复杂的多对象选择,而无需新的手势或菜单系统。我们展示了Reality Proxy在各种场景中的多样性,包括办公室信息检索、大规模空间导航和多无人机控制。专家评估表明该系统具有实用性和可用性,表明基于代理的抽象为未来MR系统提供了强大且可推广的交互范式。

更新时间: 2025-07-24 07:13:36

领域: cs.HC,cs.AI,cs.GR,H.5.2; I.3.6

下载: http://arxiv.org/abs/2507.17248v2

HIVMedQA: Benchmarking large language models for HIV medical decision support

Large language models (LLMs) are emerging as valuable tools to support clinicians in routine decision-making. HIV management is a compelling use case due to its complexity, including diverse treatment options, comorbidities, and adherence challenges. However, integrating LLMs into clinical practice raises concerns about accuracy, potential harm, and clinician acceptance. Despite their promise, AI applications in HIV care remain underexplored, and LLM benchmarking studies are scarce. This study evaluates the current capabilities of LLMs in HIV management, highlighting their strengths and limitations. We introduce HIVMedQA, a benchmark designed to assess open-ended medical question answering in HIV care. The dataset consists of curated, clinically relevant questions developed with input from an infectious disease physician. We evaluated seven general-purpose and three medically specialized LLMs, applying prompt engineering to enhance performance. Our evaluation framework incorporates both lexical similarity and an LLM-as-a-judge approach, extended to better reflect clinical relevance. We assessed performance across key dimensions: question comprehension, reasoning, knowledge recall, bias, potential harm, and factual accuracy. Results show that Gemini 2.5 Pro consistently outperformed other models across most dimensions. Notably, two of the top three models were proprietary. Performance declined as question complexity increased. Medically fine-tuned models did not always outperform general-purpose ones, and larger model size was not a reliable predictor of performance. Reasoning and comprehension were more challenging than factual recall, and cognitive biases such as recency and status quo were observed. These findings underscore the need for targeted development and evaluation to ensure safe, effective LLM integration in clinical care.

Updated: 2025-07-24 07:06:30

标题: HIVMedQA:用于HIV医疗决策支持的大型语言模型基准测试

摘要: 大型语言模型(LLMs)正在成为支持临床医生进行日常决策的有价值工具。HIV管理是一个引人注目的应用案例,由于其复杂性,包括多种治疗选择、合并症和依从性挑战。然而,将LLMs整合到临床实践中引发了关于准确性、潜在危害和临床医生接受度的担忧。尽管具有潜力,但AI在HIV护理中的应用仍未得到充分探讨,LLM基准研究也很少。本研究评估了LLMs在HIV管理中的当前能力,突出了其优势和局限性。我们介绍了HIVMedQA,这是一个旨在评估HIV护理中开放式医学问题回答的基准。该数据集包含由传染病医生提供输入开发的经过筛选的临床相关问题。我们评估了七个通用型和三个医学专业型LLMs,应用提示工程来提升性能。我们的评估框架结合了词汇相似性和LLM作为评判者的方法,扩展以更好地反映临床相关性。我们评估了关键维度上的性能:问题理解、推理、知识回忆、偏见、潜在危害和事实准确性。结果显示,Gemini 2.5 Pro在大多数维度上始终表现优于其他模型。值得注意的是,前三名中有两个是专有的模型。随着问题复杂性的增加,性能下降。经过医学调优的模型并不总是优于通用型模型,而较大的模型大小并不是性能的可靠预测因子。推理和理解比事实回忆更具挑战性,观察到认知偏见,如最近性和现状。这些发现强调了有针对性的开发和评估的必要性,以确保在临床护理中安全有效地整合LLM。

更新时间: 2025-07-24 07:06:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18143v1

Robust Non-adaptive Group Testing under Errors in Group Membership Specifications

Given $p$ samples, each of which may or may not be defective, group testing (GT) aims to determine their defect status by performing tests on $n < p$ `groups', where a group is formed by mixing a subset of the $p$ samples. Assuming that the number of defective samples is very small compared to $p$, GT algorithms have provided excellent recovery of the status of all $p$ samples with even a small number of groups. Most existing methods, however, assume that the group memberships are accurately specified. This assumption may not always be true in all applications, due to various resource constraints. Such errors could occur, e.g., when a technician, preparing the groups in a laboratory, unknowingly mixes together an incorrect subset of samples as compared to what was specified. We develop a new GT method, the Debiased Robust Lasso Test Method (DRLT), that handles such group membership specification errors. The proposed DRLT method is based on an approach to debias, or reduce the inherent bias in, estimates produced by Lasso, a popular and effective sparse regression technique. We also provide theoretical upper bounds on the reconstruction error produced by our estimator. Our approach is then combined with two carefully designed hypothesis tests respectively for (i) the identification of defective samples in the presence of errors in group membership specifications, and (ii) the identification of groups with erroneous membership specifications. The DRLT approach extends the literature on bias mitigation of statistical estimators such as the LASSO, to handle the important case when some of the measurements contain outliers, due to factors such as group membership specification errors. We present numerical results which show that our approach outperforms several baselines and robust regression techniques for identification of defective samples as well as erroneously specified groups.
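
The debiasing idea can be sketched end-to-end on synthetic pooled measurements: fit a Lasso to y = X beta + noise, where X is the (groups x samples) pooling matrix, then add a residual-based correction term. The sketch uses the crudest choice of correction matrix (an identity surrogate for the precision matrix); the paper's estimator and its two hypothesis tests are more refined.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    p, n, k = 200, 60, 5                          # samples, groups, true defectives
    X = rng.binomial(1, 0.1, size=(n, p)).astype(float)   # group membership matrix
    beta = np.zeros(p)
    beta[rng.choice(p, k, replace=False)] = 1.0
    y = X @ beta + 0.05 * rng.standard_normal(n)

    lasso = Lasso(alpha=0.01, fit_intercept=False).fit(X, y)
    residual = y - X @ lasso.coef_
    beta_debiased = lasso.coef_ + X.T @ residual / n      # correction, M ~ identity
    candidates = np.argsort(-np.abs(beta_debiased))[:k]   # inputs to the tests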

Updated: 2025-07-24 07:03:53

标题: 组成员规范存在错误时的稳健非自适应组测试

摘要: 鉴于$p$个样本,每个样本可能有缺陷,也可能没有缺陷,分组测试(GT)旨在通过对$n<p$“组”进行测试来确定它们的缺陷状态,其中一组由混合$p$个样本的子集组成。假设有缺陷样本数量远小于$p$,GT算法在即使只有少量组的情况下也能提供对所有$p$个样本状态的优秀恢复。然而,大多数现有方法假设组成员资格得到准确指定。由于各种资源约束,在某些应用中这种假设并不总是成立。诸如在实验室准备组时,技术人员无意中混合了与规定不同的样本子集时,可能会发生此类错误。我们开发了一种新的GT方法,即消除偏差稳健Lasso测试方法(DRLT),可处理此类组成员资格规定错误。所提出的DRLT方法基于一种消除偏差或减少由Lasso产生的估计值中固有偏差的方法,Lasso是一种流行且有效的稀疏回归技术。我们还提供了关于我们估计器产生的重建误差的理论上限。然后,我们的方法与两个精心设计的假设检验相结合,分别用于(i)在组成员资格规定错误存在的情况下识别有缺陷样本,以及(ii)识别具有错误成员资格规定的组。DRLT方法扩展了关于统计估计器(如LASSO)偏差减轻的文献,以处理某些测量包含异常值的重要情况,这是由于诸如组成员资格规定错误等因素。我们提供数值结果,表明我们的方法在识别有缺陷样本以及错误指定组方面优于几种基线和稳健回归技术。

更新时间: 2025-07-24 07:03:53

领域: stat.ML,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2409.05345v2

Neuromorphic Computing for Embodied Intelligence in Autonomous Systems: Current Trends, Challenges, and Future Directions

The growing need for intelligent, adaptive, and energy-efficient autonomous systems across fields such as robotics, mobile agents (e.g., UAVs), and self-driving vehicles is driving interest in neuromorphic computing. By drawing inspiration from biological neural systems, neuromorphic approaches offer promising pathways to enhance the perception, decision-making, and responsiveness of autonomous platforms. This paper surveys recent progress in neuromorphic algorithms, specialized hardware, and cross-layer optimization strategies, with a focus on their deployment in real-world autonomous scenarios. Special attention is given to event-based dynamic vision sensors and their role in enabling fast, efficient perception. The discussion highlights new methods that improve energy efficiency, robustness, adaptability, and reliability through the integration of spiking neural networks into autonomous system architectures. We integrate perspectives from machine learning, robotics, neuroscience, and neuromorphic engineering to offer a comprehensive view of the state of the field. Finally, emerging trends and open challenges are explored, particularly in the areas of real-time decision-making, continual learning, and the development of secure, resilient autonomous systems.

Updated: 2025-07-24 07:01:52

标题: 神经形态计算在自主系统中的具象智能应用:当前趋势、挑战和未来方向

摘要: 对于智能、自适应和节能的自主系统的需求不断增长,涉及领域包括机器人技术、移动代理(如无人机)和自动驾驶车辆,这推动了对神经形态计算的兴趣。通过借鉴生物神经系统,神经形态方法提供了增强自主平台感知、决策和响应能力的有希望的途径。本文调查了神经形态算法、专用硬件和跨层优化策略的最新进展,重点关注它们在真实世界自主场景中的部署。特别关注基于事件的动态视觉传感器及其在实现快速、高效的感知方面的作用。讨论突出了通过将尖峰神经网络集成到自主系统架构中来提高能源效率、稳健性、适应性和可靠性的新方法。我们整合了机器学习、机器人技术、神经科学和神经形态工程的观点,以提供对该领域现状的全面视图。最后,探讨了新兴趋势和开放挑战,特别是在实时决策、持续学习和开发安全、弹性的自主系统方面。

更新时间: 2025-07-24 07:01:52

领域: cs.LG

下载: http://arxiv.org/abs/2507.18139v1

Assessment of Quantitative Cyber-Physical Reliability of SCADA Systems in Autonomous Vehicle to Grid (V2G) Capable Smart Grids

The integration of electric vehicles (EVs) into power grids via Vehicle-to-Grid (V2G) technology is growing steadily, and it presents both advantages and disadvantages. V2G can increase grid reliability by providing distributed energy storage and ancillary services. On the other hand, it expands the cyber-physical attack surface of the national power grid, introducing new vulnerabilities into monitoring and supervisory control and data acquisition (SCADA) systems. This paper investigates malicious activity enabled by Autonomous Vehicle-to-Grid (AV2G) communication infrastructures and assesses its impact on SCADA system reliability. It presents a quantitative reliability assessment using a Bayesian attack graph combined with probabilistic capacity outage modeling based on IEEE RTS-79 system data. Using Monte Carlo simulation, this work shows how AV2G-based attacks degrade system performance, highlighting the need for cybersecurity-hardening strategies in smart grid design.
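
As a hedged illustration of the capacity-outage Monte Carlo piece (toy unit data rather than IEEE RTS-79, and with the Bayesian attack graph replaced by a fixed set of compromised units and an assumed elevated outage rate), the following sketch estimates loss-of-load probability with and without AV2G-compromised units:

    import numpy as np

    rng = np.random.default_rng(1)
    units = np.array([200.0, 200, 150, 150, 100, 100, 76, 76])  # unit capacities in MW (illustrative)
    p_base = 0.02        # baseline forced-outage rate (hypothetical)
    p_attack = 0.10      # elevated rate for units behind a compromised SCADA/AV2G link (hypothetical)
    compromised = [0, 2] # units reachable through the attack path
    load = 700.0         # system load in MW

    def lolp(p_out, trials=200_000):
        # Monte Carlo loss-of-load probability: available capacity falls below load
        up = rng.random((trials, units.size)) >= p_out
        return float(np.mean((up * units).sum(axis=1) < load))

    p_vec = np.full(units.size, p_base)
    print("LOLP without attack:", lolp(p_vec))
    p_vec[compromised] = p_attack
    print("LOLP under AV2G attack:", lolp(p_vec))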

Updated: 2025-07-24 06:57:10

标题: 自主车辆对电网(V2G)可行智能电网中SCADA系统的定量网络物理可靠性评估

摘要: 将电动汽车(EVs)通过车辆对电网(V2G)系统技术整合到电网中的现象日益增多,但这些现象既有优势又有劣势。V2G可以通过提供分布式能量储存和辅助服务来增加电网可靠性。然而,另一方面,它涉及国家电网的网络攻击表面,引入了对监控和监视控制和数据采集(SCADA)系统的新漏洞。本文研究了自主车辆对电网(AV2G)通信基础设施造成的恶意行为,并评估了它们对SCADA系统可靠性的影响。本文利用基于IEEE RTS-79系统数据的贝叶斯攻击图结合概率容量中断建模进行了定量可靠性评估。这项工作展示了AV2G攻击如何通过使用蒙特卡罗模拟方法降低系统性能,突出了在智能电网设计中需要加强网络安全策略的必要性。

更新时间: 2025-07-24 06:57:10

领域: cs.CR,cs.SY,eess.SY,math.OC

下载: http://arxiv.org/abs/2507.21154v1

DAA*: Deep Angular A Star for Image-based Path Planning

Path smoothness is often overlooked in path imitation learning from expert demonstrations. In this paper, we introduce a novel learning method, termed deep angular A* (DAA*), by incorporating the proposed path angular freedom (PAF) into A* to improve path similarity through adaptive path smoothness. The PAF aims to explore the effect of move angles on path node expansion by finding the trade-off between their minimum and maximum values, allowing for high adaptiveness for imitation learning. DAA* improves path optimality by closely aligning with the reference path through joint optimization of path shortening and smoothing, which correspond to heuristic distance and PAF, respectively. Through comprehensive evaluations on 7 datasets, including 4 maze datasets, 2 video-game datasets, and a real-world drone-view dataset containing 2 scenarios, we demonstrate remarkable improvements of our DAA* over neural A* in the similarity between predicted and reference paths, together with shorter path lengths when the shortest path is plausible, improving by 9.0% SPR, 6.9% ASIM, and 3.9% PSIM. Furthermore, when jointly learning pathfinding with both path loss and path probability map loss, DAA* significantly outperforms the state-of-the-art TransPath by 6.3% SPR, 6.0% PSIM, and 3.7% ASIM. We also discuss the minor trade-off between path optimality and search efficiency where applicable. Our code and model weights are available at https://github.com/zwxu064/DAAStar.git.
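
The following is a rough, self-contained sketch of the core idea of trading path length against turn angles in A* (a fixed angular weight w_angle stands in for the learned PAF trade-off, and the neural heuristic is replaced by plain Euclidean distance):

    import heapq, itertools, math

    def astar_with_turn_penalty(grid, start, goal, w_angle=0.5):
        # 8-connected A*; states carry the incoming move so turning can be priced.
        moves = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]
        n, m = len(grid), len(grid[0])
        h = lambda c: math.hypot(c[0] - goal[0], c[1] - goal[1])

        def turn(d0, d1):
            # absolute angle change (radians) between consecutive moves
            if d0 is None:
                return 0.0
            a0, a1 = math.atan2(d0[0], d0[1]), math.atan2(d1[0], d1[1])
            return abs(math.atan2(math.sin(a1 - a0), math.cos(a1 - a0)))

        tie = itertools.count()
        start_state = (start, None)
        g, parent = {start_state: 0.0}, {}
        pq = [(h(start), next(tie), start_state)]
        while pq:
            _, _, state = heapq.heappop(pq)
            cell, d_in = state
            if cell == goal:                      # reconstruct the path
                path = [cell]
                while state in parent:
                    state = parent[state]
                    path.append(state[0])
                return path[::-1]
            for d in moves:
                nxt = (cell[0] + d[0], cell[1] + d[1])
                if not (0 <= nxt[0] < n and 0 <= nxt[1] < m) or grid[nxt[0]][nxt[1]]:
                    continue
                s2 = (nxt, d)
                cand = g[state] + math.hypot(*d) + w_angle * turn(d_in, d)
                if cand < g.get(s2, float("inf")):
                    g[s2], parent[s2] = cand, state
                    heapq.heappush(pq, (cand + h(nxt), next(tie), s2))

    grid = [[0] * 8 for _ in range(8)]
    grid[3][2:6] = [1, 1, 1, 1]                   # 1 = obstacle
    print(astar_with_turn_penalty(grid, (0, 0), (7, 7)))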

Updated: 2025-07-24 06:51:41

标题: DAA*: 基于图像的路径规划的深度角A星算法

摘要: 路径平滑通常在从专家演示中学习路径模仿时被忽视。在本文中,我们引入了一种新的学习方法,称为深度角A*(DAA*),通过将提出的路径角度自由度(PAF)纳入A*中,以通过自适应路径平滑改进路径相似性。PAF旨在通过找到移动角度对路径节点扩展的最小值和最大值之间的权衡,从而允许高度适应性地进行模仿学习。DAA*通过联合优化路径缩短和平滑来改善路径的最优性,分别对应启发式距离和PAF。在包括4个迷宫数据集、2个视频游戏数据集和一个包含2个场景的真实世界无人机视图数据集上进行全面评估,我们展示了我们的DAA*在预测路径和参考路径之间的路径相似性上相对于神经A*的显着改进,路径长度更短时,当最短路径是合理的时候,提高了9.0% SPR、6.9% ASIM和3.9% PSIM。此外,当同时学习路径损失和路径概率图损失时,DAA*在SPR、PSIM和ASIM方面显著优于最先进的TransPath,分别提高了6.3% SPR、6.0% PSIM和3.7% ASIM。我们还讨论了路径最优性和搜索效率之间的轻微权衡,在适用的情况下。我们的代码和模型权重可在https://github.com/zwxu064/DAAStar.git获取。

更新时间: 2025-07-24 06:51:41

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2507.09305v3

Deep Learning for Glioblastoma Morpho-pathological Features Identification: A BraTS-Pathology Challenge Solution

Glioblastoma, a highly aggressive brain tumor with diverse molecular and pathological features, poses a diagnostic challenge due to its heterogeneity. Accurate diagnosis and assessment of this heterogeneity are essential for choosing the right treatment and improving patient outcomes. Traditional methods rely on identifying specific features in tissue samples, but deep learning offers a promising approach for improved glioblastoma diagnosis. In this paper, we present our approach to the BraTS-Path Challenge 2024. We leverage a pre-trained model and fine-tune it on the BraTS-Path training dataset. Our model demonstrates poor performance on the challenging BraTS-Path validation set, as rigorously assessed by the Synapse online platform. The model achieves an accuracy of 0.392229, a recall of 0.392229, and an F1-score of 0.392229, indicating a consistent ability to correctly identify instances under the target condition. Notably, our model exhibits a high specificity of 0.898704, showing an exceptional capacity to correctly classify negative cases. Moreover, a Matthews Correlation Coefficient (MCC) of 0.255267 is calculated, signifying a limited positive correlation between predicted and actual values and highlighting our model's overall predictive power. Our solution also achieved second place during the testing phase.
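
One detail worth noting: identical accuracy, recall, and F1 values, as reported above, arise naturally under micro-averaging in a multi-class setting. A small sketch with synthetic labels (not the BraTS-Path data) showing why:

    import numpy as np
    from sklearn.metrics import accuracy_score, recall_score, f1_score, matthews_corrcoef

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 6, size=1000)          # hypothetical multi-class pathology labels
    y_pred = np.where(rng.random(1000) < 0.4, y_true, rng.integers(0, 6, size=1000))

    # With micro averaging, accuracy, recall, and F1 coincide in single-label
    # multi-class classification -- hence one value reported for all three.
    print("accuracy:", accuracy_score(y_true, y_pred))
    print("recall  :", recall_score(y_true, y_pred, average="micro"))
    print("F1      :", f1_score(y_true, y_pred, average="micro"))
    print("MCC     :", matthews_corrcoef(y_true, y_pred))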

Updated: 2025-07-24 06:47:23

标题: 深度学习用于胶质母细胞瘤形态病理特征识别:一种BraTS-病理挑战解决方案

摘要: 胶质母细胞瘤是一种高度侵袭性的脑肿瘤,具有多样化的分子和病理特征,由于其异质性而造成诊断挑战。准确的诊断和评估这种异质性对于选择正确的治疗方法并改善患者预后至关重要。传统方法依赖于在组织样本中识别特定特征,但深度学习提供了一个改进胶质母细胞瘤诊断的有希望方法。在本文中,我们介绍了我们在2024年BraTS-Path挑战赛中的方法。我们利用一个预训练模型,并在BraTS-Path训练数据集上进行微调。我们的模型在具有挑战性的BraTS-Path验证集上表现不佳,经由Synapse在线平台严格评估。该模型实现了准确率为0.392229,召回率为0.392229,F1分数为0.392229,表明了正确识别目标条件下实例的一致能力。值得注意的是,我们的模型表现出完全的特异性为0.898704,表明了正确分类负例的异常能力。此外,计算出Matthews相关系数(MCC)为0.255267,表示预测值与实际值之间存在有限的正相关,并突显了我们模型的整体预测能力。我们的解决方案还在测试阶段取得了第二名。

更新时间: 2025-07-24 06:47:23

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.18133v1

TOC-UCO: a comprehensive repository of tabular ordinal classification datasets

An ordinal classification (OC) problem corresponds to a special type of classification characterised by the presence of a natural order relationship among the classes. This type of problem can be found in a number of real-world applications, motivating the design and development of many ordinal methodologies over the last years. However, it is important to highlight that the development of the OC field suffers from one main disadvantage: the lack of a comprehensive set of datasets against which novel approaches in the literature can be benchmarked. To approach this objective, this manuscript from the University of C\'ordoba (UCO), which has previous experience in the OC field, provides the literature with a publicly available repository of tabular data for a robust validation of novel OC approaches, namely TOC-UCO (Tabular Ordinal Classification repository of the UCO). Specifically, this repository includes a set of $46$ tabular ordinal datasets, preprocessed under a common framework and ensured to have a reasonable number of patterns and an appropriate class distribution. We also provide the sources and preprocessing steps of each dataset, along with details on how to benchmark a novel approach using the TOC-UCO repository. For this, indices for $30$ different randomised train-test partitions are provided to facilitate the reproducibility of the experiments.
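
A minimal sketch of the intended benchmarking protocol, with synthetic data and seeded random splits standing in for the repository's 30 partition index files, and an ordinary classifier standing in for a proper ordinal method:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 8))
    # 5 ordered classes derived from a latent score (synthetic ordinal target)
    y = np.clip(np.round(X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 500)), -2, 2).astype(int) + 2

    maes = []
    for seed in range(30):   # mirrors the 30 randomised partitions the repository provides
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=seed, stratify=y)
        clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)   # stand-in; an ordinal model goes here
        maes.append(mean_absolute_error(yte, clf.predict(Xte)))  # MAE respects the class order
    print(f"MAE over 30 partitions: {np.mean(maes):.3f} +/- {np.std(maes):.3f}")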

Updated: 2025-07-24 06:38:39

标题: TOC-UCO:一个全面的表格序数分类数据集存储库

摘要: 一个序数分类(OC)问题对应着一种特殊类型的分类,其特点是类别之间存在自然的顺序关系。这种类型的问题在许多现实世界的应用中都可以找到,促使在过去几年中设计和开发了许多序数方法论。然而,值得强调的是,OC领域的发展面临一个主要的缺点:缺乏一套全面的数据集,用于对文献中的新方法进行基准测试。为了实现这一目标,科尔多瓦大学(UCO)的这篇论文提供了一个公开可用的表格数据存储库,用于对新颖的OC方法进行强有力的验证,即TOC-UCO(科尔多瓦大学的表格序数分类存储库)。具体来说,这个存储库包括一组46个表格序数数据集,经过常见框架预处理,确保具有合理数量的模式和适当的类别分布。我们还提供了每个数据集的来源和预处理步骤,以及如何使用TOC-UCO存储库对新方法进行基准测试的详细信息。为此,提供了30个不同随机训练-测试分区的指标,以便实验的可重现性。

更新时间: 2025-07-24 06:38:39

领域: cs.LG

下载: http://arxiv.org/abs/2507.17348v2

U-Net Based Healthy 3D Brain Tissue Inpainting

This paper introduces a novel approach to synthesize healthy 3D brain tissue from masked input images, specifically focusing on the task of 'ASNR-MICCAI BraTS Local Synthesis of Tissue via Inpainting'. Our proposed method employs a U-Net-based architecture, which is designed to effectively reconstruct the missing or corrupted regions of brain MRI scans. To enhance our model's generalization capabilities and robustness, we implement a comprehensive data augmentation strategy that involves randomly masking healthy images during training. Our model is trained on the BraTS-Local-Inpainting dataset and demonstrates exceptional performance in recovering healthy brain tissue. The evaluation metrics employed, including Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Mean Squared Error (MSE), consistently yield impressive results. On the BraTS-Local-Inpainting validation set, our model achieved an SSIM score of 0.841, a PSNR score of 23.257, and an MSE score of 0.007. Notably, these evaluation metrics exhibit relatively low standard deviations, i.e., 0.103 for the SSIM score, 4.213 for the PSNR score, and 0.007 for the MSE score, which indicates our model's reliability and consistency across various input scenarios. Our method also secured first place in the challenge.
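
A hedged sketch of the random-masking augmentation and the three reported metrics, using a random volume and a trivial mean-fill "inpainter" in place of the U-Net (box-mask shape and size limits are assumptions):

    import numpy as np
    from skimage.metrics import structural_similarity, peak_signal_noise_ratio

    rng = np.random.default_rng(0)

    def random_box_mask(vol, max_frac=0.3):
        # Zero out a random box in a 3D volume, mimicking the "randomly mask
        # healthy images during training" augmentation; returns (masked, mask).
        mask = np.zeros_like(vol, dtype=bool)
        size = [max(1, int(s * rng.uniform(0.1, max_frac))) for s in vol.shape]
        corner = [rng.integers(0, s - sz + 1) for s, sz in zip(vol.shape, size)]
        mask[tuple(slice(c, c + sz) for c, sz in zip(corner, size))] = True
        out = vol.copy()
        out[mask] = 0.0
        return out, mask

    vol = rng.random((32, 32, 32)).astype(np.float32)     # stand-in for a healthy MRI patch
    masked, mask = random_box_mask(vol)
    recon = masked.copy()
    recon[mask] = vol.mean()                              # trivial baseline, not the U-Net

    print("MSE :", float(np.mean((recon - vol) ** 2)))
    print("PSNR:", peak_signal_noise_ratio(vol, recon, data_range=1.0))
    print("SSIM:", structural_similarity(vol, recon, data_range=1.0))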

Updated: 2025-07-24 06:26:46

标题: 基于U-Net的健康3D脑组织修复

摘要: 这篇论文介绍了一种新颖的方法,用于从遮盖的输入图像中合成健康的3D脑组织,主要关注“ASNR-MICCAI BraTS局部组织合成通过修补”的任务。我们提出的方法采用基于U-Net的架构,旨在有效重建脑MRI扫描中缺失或损坏的区域。为了增强我们模型的泛化能力和鲁棒性,我们实施了一项全面的数据增强策略,其中包括在训练过程中随机遮盖健康图像。我们的模型在BraTS-Local-Inpainting数据集上进行训练,并展示了在恢复健康脑组织方面的出色性能。采用的评估指标包括结构相似性指数(SSIM)、峰值信噪比(PSNR)和均方误差(MSE),始终产生令人印象深刻的结果。在BraTS-Local-Inpainting验证集上,我们的模型实现了0.841的SSIM分数,23.257的PSNR分数和0.007的MSE分数。值得注意的是,这些评估指标表现出相对较低的标准偏差,即SSIM分数为0.103,PSNR分数为4.213,MSE分数为0.007,这表明我们模型在各种输入情景下的可靠性和一致性。我们的方法还在挑战中获得了第一名。

更新时间: 2025-07-24 06:26:46

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.18126v1

Integrated Learning and Optimization for Congestion Management and Profit Maximization in Real-Time Electricity Market

We develop novel integrated learning and optimization (ILO) methodologies to solve economic dispatch (ED) and DC optimal power flow (DCOPF) problems for better economic operation. The ED optimization problem is formulated with the load as an unknown parameter, while DCOPF treats both the load and the power transfer distribution factor (PTDF) matrix as unknown parameters. The PTDF represents the incremental variations of real power on transmission lines that occur due to real power transfers between two regions; these values constitute a linearized approximation of the power flows over the transmission lines. We develop novel ILO formulations that address post-hoc penalties in the electricity market and line congestion problems using the ED and DCOPF optimization formulations. Our proposed methodologies capture real-time electricity market and line congestion behavior to train the regret function, which in turn trains the unknown loads at different buses and the line PTDF matrix to achieve the aforementioned post-hoc goals. The proposed methodology is compared to sequential learning and optimization (SLO), which trains load and PTDF forecasts for accuracy rather than economic operation. Our experiments demonstrate the superiority of ILO in minimizing post-hoc penalties in electricity markets and reducing line congestion, thereby improving economic operation by a noticeable margin.
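
For readers unfamiliar with PTDFs, the linearized flow model at the heart of DCOPF is just a matrix product; a toy 3-bus example (illustrative numbers, with the PTDF fixed here rather than learned via the regret function as in the paper):

    import numpy as np

    # PTDF[l, b] gives the MW flow change on line l per 1 MW injected at bus b
    # (withdrawn at the slack bus). Values below are illustrative only.
    ptdf = np.array([[ 0.6, 0.2, 0.0],
                     [ 0.4, 0.8, 0.0],
                     [-0.2, 0.6, 0.0]])   # slack at bus 3
    gen  = np.array([120.0,  80.0,  50.0])
    load = np.array([ 60.0,  90.0, 100.0])
    inj = gen - load                       # net injection per bus

    flows = ptdf @ inj                     # linearized (DC) line flows
    limits = np.array([60.0, 80.0, 50.0])
    print("flows     :", flows)
    print("congested :", np.abs(flows) > limits)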

Updated: 2025-07-24 06:25:26

标题: 实时电力市场中的拥挤管理和利润最大化的集成学习和优化

摘要: 我们开发了新颖的集成学习和优化(ILO)方法,用于解决经济调度(ED)和直流最优功率流(DCOPF)问题,以实现更好的经济运行。ED的优化问题是在未知参数负载的情况下制定的,而DCOPF包括负载和功率传输分布因子(PTDF)矩阵作为未知参数。PTDF代表由于两个区域之间的实际功率传输而在输电线上发生的实际功率的增量变化。这些值代表了输电线上功率流的线性化近似。我们开发了新颖的ILO公式,通过使用ED和DCOPF优化公式解决电力市场后期处罚和线路拥塞问题。我们提出的方法捕捉实时电力市场和线路拥塞行为,以训练后悔函数,最终训练不同母线上的未知负载和线路PTDF矩阵,以实现前述的后期目标。所提出的方法与顺序学习和优化(SLO)进行比较,后者为了准确性而训练负载和PTDF预测,而不是经济运行。我们的实验证明了ILO在最小化电力市场后期处罚和减少线路拥塞方面的卓越性,从而通过显著量提高经济运行。

更新时间: 2025-07-24 06:25:26

领域: eess.SY,cs.AI,cs.SY

下载: http://arxiv.org/abs/2412.18003v3

Actively evaluating and learning the distinctions that matter: Vaccine safety signal detection from emergency triage notes

The rapid development of COVID-19 vaccines has showcased the global community's ability to combat infectious diseases. However, the need for post-licensure surveillance systems has grown due to the limited window for safety data collection in clinical trials and early widespread implementation. This study aims to employ Natural Language Processing techniques and Active Learning to rapidly develop a classifier that detects potential vaccine safety issues from emergency department notes. ED triage notes, containing succinct, expert-recorded vital patient information at the point of entry to health systems, can significantly contribute to timely vaccine safety signal surveillance. While keyword-based classification can be effective, it may yield false positives and demand extensive keyword modifications. This is exacerbated by the infrequency of vaccination-related ED presentations and their similarity to other reasons for ED visits. NLP offers a more accurate and efficient alternative, albeit requiring annotated data, which is often scarce in the medical field. Active learning optimizes the annotation process and the quality of annotated data, which can result in faster model implementation and improved model performance. This work combines data augmentation with active learning and evaluation techniques to create a classifier that is used to enhance vaccine safety surveillance from ED triage notes.
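
A compact sketch of pool-based uncertainty sampling, the core active-learning loop described here (synthetic features stand in for vectorized triage notes, the class imbalance mimics the rarity of vaccine-related presentations, and seed-set sizes and query budgets are hypothetical):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, n_features=50, weights=[0.95], random_state=0)
    labeled = list(np.where(y == 1)[0][:5]) + list(np.where(y == 0)[0][:15])  # tiny seed set
    pool = [i for i in range(len(X)) if i not in set(labeled)]

    for _ in range(10):                           # 10 annotation rounds
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        proba = clf.predict_proba(X[pool])[:, 1]
        order = np.argsort(np.abs(proba - 0.5))   # closest to 0.5 = most uncertain
        ask = order[:20]                          # "annotate" the 20 most uncertain notes
        labeled += [pool[i] for i in ask]
        pool = [v for j, v in enumerate(pool) if j not in set(ask)]

    print("labels spent:", len(labeled), "| positives found:", int(y[labeled].sum()))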

Updated: 2025-07-24 06:18:34

标题: 积极评估和学习重要的区别:从急诊分类记录中检测疫苗安全信号

摘要: COVID-19疫苗的快速发展展示了全球社区应对传染病的能力。然而,由于临床试验中安全数据收集的时间窗口有限,以及早期广泛实施的需求增加,对批准后监测系统的需求也在增长。本研究旨在利用自然语言处理技术和主动学习,快速开发一个能够从急诊科室笔记中检测潜在疫苗安全问题的分类器。急诊科室分诊笔记包含专家简明的患者重要信息,可以显著促进及时的疫苗安全信号监测。虽然基于关键词的分类可能有效,但可能会产生假阳性,并且需要大量关键词修改。这一问题进一步恶化,因为与疫苗接种相关的急诊科室就诊情况很少,而且与其他急诊就诊原因相似。自然语言处理提供了一个更准确和高效的替代方案,尽管需要标注数据,在医学领域往往很少见。主动学习优化了标注过程和标注数据的质量,可以加快模型实施速度并提高模型性能。本研究结合了主动学习、数据增强和主动学习评估技术,创建了一个分类器,用于从急诊科室分诊笔记中增强疫苗安全监测。

更新时间: 2025-07-24 06:18:34

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.18123v1

Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning

Recent work has shown that language models can self-improve by maximizing their own confidence in their predictions, without relying on external verifiers or reward signals. In this work, we study the test-time scaling of language models for mathematical reasoning tasks, where the model's own confidence is used to select the most promising attempts. Surprisingly, we find that we can achieve significant performance gains by continuing only the most promising attempt, selected by the model's prefix-confidence. We systematically evaluate prefix-confidence scaling on five mathematical reasoning datasets: the school-level GSM8K and MATH500, and the competition-level AMC23, AIME24, and AIME25. We find that prefix-confidence scaling with prefixes of only 32 tokens achieves a better accuracy-compute trade-off than majority voting. Moreover, prefix-confidence scaling appears less susceptible than BoN to length biases. Finally, we also evaluate test-time training with prefix-confidence and find that, while outperforming the base model, it does not improve over prefix-confidence scaling.
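
The selection rule itself is simple to state; a sketch with simulated per-token log-probabilities (mean prefix log-probability is one natural confidence score; the paper's exact scoring and decoding setup may differ):

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated per-token log-probs for 8 sampled attempts; in practice these
    # come from the model's own scores on the first tokens of each attempt.
    attempts = [rng.normal(-1.0 + 0.05 * i, 0.3, size=400) for i in range(8)]

    PREFIX = 32
    conf = [float(a[:PREFIX].mean()) for a in attempts]   # prefix-confidence per attempt
    best = int(np.argmax(conf))
    print(f"continue only attempt {best} (prefix confidence {conf[best]:.3f})")
    # Compute cost: 8 * 32 prefix tokens plus one full decode,
    # instead of 8 full decodes as in majority voting.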

Updated: 2025-07-24 06:17:39

标题: 在测试时间高效地最大化前缀置信度显著改善数学推理

摘要: 最近的研究表明,语言模型可以通过最大化自己对预测的信心来自我改进,而无需依赖外部验证者或奖励信号。在这项工作中,我们研究了语言模型在数学推理任务中的测试时间缩放,其中模型自己的信心被用来选择最有希望的尝试。令人惊讶的是,我们发现通过仅继续模型的前缀信心选择的最有希望的尝试,可以实现显著的性能提升。我们系统地评估了前缀信心缩放在五个数学推理数据集上的表现:学校级别的GSM8K和MATH500,以及竞赛级别的AMC23、AIME24和AIME25。我们发现,仅使用32个标记的前缀信心缩放比多数投票实现了更好的准确性-计算效益平衡。此外,前缀信心缩放似乎比BoN更不容易受到长度偏见的影响。最后,我们还评估了带有前缀信心的测试时间训练,并发现,尽管优于基础模型,但并没有改进前缀信心缩放。

更新时间: 2025-07-24 06:17:39

领域: cs.LG

下载: http://arxiv.org/abs/2507.18122v1

OrQstrator: An AI-Powered Framework for Advanced Quantum Circuit Optimization

We propose a novel approach, OrQstrator, which is a modular framework for conducting quantum circuit optimization in the Noisy Intermediate-Scale Quantum (NISQ) era. Our framework is powered by Deep Reinforcement Learning (DRL). Our orchestration engine intelligently selects among three complementary circuit optimizers: a DRL-based circuit rewriter trained to reduce depth and gate count via learned rewrite sequences; a domain-specific optimizer that performs efficient local gate resynthesis and numeric optimization; and a parameterized circuit instantiator that improves compilation by optimizing template circuits during gate set translation. These modules are coordinated by a central orchestration engine that learns coordination policies based on circuit structure, hardware constraints, and backend-aware performance features such as gate count, depth, and expected fidelity. The system outputs an optimized circuit for hardware-aware transpilation and execution, leveraging techniques from an existing state-of-the-art approach, called the NISQ Analyzer, to adapt to backend constraints.

Updated: 2025-07-24 06:16:38

标题: OrQstrator:一种用于高级量子电路优化的人工智能框架

摘要: 我们提出了一种新颖的方法,OrQstrator,这是一个用于在嘈杂中间规模量子(NISQ)时代进行量子电路优化的模块化框架。我们的框架由深度强化学习(DRL)驱动。我们的编排引擎智能地在三种互补的电路优化器中选择:一种基于DRL的电路重写器,经过训练可以通过学习的重写序列来减少深度和门数量;一种领域特定的优化器,执行高效的本地门重合成和数值优化;一种参数化的电路实例化器,在门集翻译期间通过优化模板电路来改善编译。这些模块由一个中央编排引擎协调,该引擎基于电路结构、硬件约束和后端感知性能特征(如门计数、深度和预期保真度)学习协调策略。系统输出一个经过优化的电路,用于硬件感知的转译和执行,利用现有的最先进方法NISQ分析器的技术,以适应后端约束。

更新时间: 2025-07-24 06:16:38

领域: cs.SE,cs.AI,cs.ET

下载: http://arxiv.org/abs/2507.09682v2

A Survey of Deep Learning for Geometry Problem Solving

Geometry problem solving is a key area of mathematical reasoning, which is widely involved in many important fields such as education, mathematical ability assessment of artificial intelligence, and multimodal ability assessment. In recent years, the rapid development of deep learning technology, especially the rise of multimodal large language models, has triggered a widespread research boom. This paper provides a survey of the applications of deep learning in geometry problem solving, including (i) a comprehensive summary of the relevant tasks in geometry problem solving; (ii) a thorough review of related deep learning methods; (iii) a detailed analysis of evaluation metrics and methods; and (iv) a critical discussion of the current challenges and future directions that can be explored. Our goal is to provide a comprehensive and practical reference of deep learning for geometry problem solving to promote further developments in this field. We create a continuously updated list of papers on GitHub: https://github.com/majianz/dl4gps.

Updated: 2025-07-24 06:15:29

标题: 深度学习在几何问题解决中的调查

摘要: 几何问题解决是数学推理的一个关键领域,广泛涉及教育、人工智能数学能力评估和多模态能力评估等许多重要领域。近年来,深度学习技术的快速发展,尤其是多模态大型语言模型的兴起,引发了广泛的研究热潮。本文提供了深度学习在几何问题解决中的应用概述,包括(i) 对几何问题解决中相关任务的全面总结;(ii) 对相关深度学习方法的彻底审查;(iii) 对评估指标和方法的详细分析;以及(iv) 对当前挑战和未来方向的批判性讨论。我们的目标是提供一个深度学习在几何问题解决中的全面实用参考,以促进该领域的进一步发展。我们在GitHub上创建了一个持续更新的论文列表:https://github.com/majianz/dl4gps。

更新时间: 2025-07-24 06:15:29

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2507.11936v3

Efficient Knowledge Tracing Leveraging Higher-Order Information in Integrated Graphs

The rise of online learning has led to the development of various knowledge tracing (KT) methods. However, existing methods have overlooked the problem of increasing computational cost when utilizing large graphs and long learning sequences. To address this issue, we introduce Dual Graph Attention-based Knowledge Tracing (DGAKT), a graph neural network model designed to leverage high-order information from subgraphs representing student-exercise-KC relationships. DGAKT incorporates a subgraph-based approach to enhance computational efficiency. By processing only relevant subgraphs for each target interaction, DGAKT significantly reduces memory and computational requirements compared to full global graph models. Extensive experimental results demonstrate that DGAKT not only outperforms existing KT models but also sets a new standard in resource efficiency, addressing a critical need that has been largely overlooked by prior KT approaches.
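
The efficiency argument rests on extracting a small neighborhood per prediction; a toy BFS sketch over a student-exercise graph (KC nodes and the dual attention mechanism itself are omitted):

    from collections import deque

    # Toy student-exercise interaction graph as adjacency lists (KC links omitted).
    adj = {
        "s1": ["e1", "e2"], "s2": ["e2", "e3"], "s3": ["e3", "e4"],
        "e1": ["s1"], "e2": ["s1", "s2"], "e3": ["s2", "s3"], "e4": ["s3"],
    }

    def khop_subgraph(seeds, k=2):
        # BFS out k hops from the target (student, exercise) pair; only this
        # subgraph is fed to the GNN, which is where the memory savings come from.
        seen, frontier = set(seeds), deque((s, 0) for s in seeds)
        while frontier:
            node, d = frontier.popleft()
            if d == k:
                continue
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    frontier.append((nb, d + 1))
        return seen

    print(khop_subgraph(["s1", "e2"], k=2))   # nodes for the (s1, e2) target interaction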

Updated: 2025-07-24 06:12:43

标题: 高效知识追踪:利用综合图中的高阶信息

摘要: 在线学习的兴起导致了各种知识追踪(KT)方法的发展。然而,现有方法忽视了在利用大型图和长期学习序列时增加计算成本的问题。为了解决这个问题,我们引入了基于双图注意力的知识追踪(DGAKT),这是一个设计用于利用代表学生-练习-KC关系的子图的图神经网络模型。DGAKT采用了基于子图的方法来增强计算效率。通过仅处理每个目标交互的相关子图,DGAKT与全局图模型相比,显著降低了内存和计算需求。广泛的实验结果表明,DGAKT不仅优于现有的KT模型,而且在资源效率方面树立了新的标准,解决了先前KT方法中被大多忽视的重要需求。

更新时间: 2025-07-24 06:12:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2507.18668v1

A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms

Reinforcement learning (RL) shows great potential for optimizing multi-vehicle cooperative driving strategies through the state-action-reward feedback loop, but it still faces challenges such as low sample efficiency. This paper proposes a differentiated reward method based on steady-state transition systems, which incorporates state transition gradient information into the reward design by analyzing traffic flow characteristics, aiming to optimize action selection and policy learning in multi-vehicle cooperative decision-making. The performance of the proposed method is validated with RL algorithms such as MAPPO, MADQN, and QMIX under varying autonomous vehicle penetration rates. The results show that the differentiated reward method significantly accelerates training convergence and outperforms the centering reward and other baselines in terms of traffic efficiency, safety, and action rationality. Additionally, the method demonstrates strong scalability and environmental adaptability, providing a novel approach for multi-agent cooperative decision-making in complex traffic scenarios.

Updated: 2025-07-24 06:12:24

标题: 一种基于强化学习的多车辆协作决策算法的差异化奖励方法

摘要: 强化学习(RL)显示出通过状态-动作-奖励反馈循环优化多车辆协同驾驶策略的巨大潜力,但仍面临低样本效率等挑战。本文提出了一种基于稳态转移系统的差异化奖励方法,通过分析交通流特性将状态转移梯度信息纳入奖励设计,旨在优化多车辆协同决策中的动作选择和策略学习。所提出方法在不同自动驾驶车辆渗透率下在RL算法(如MAPPO,MADQN和QMIX)中进行了验证。结果表明,差异化奖励方法显着加速了训练收敛,并在交通效率、安全性和动作合理性方面优于居中奖励和其他方法。此外,该方法展现出强大的可扩展性和环境适应性,为复杂交通场景中的多智能体协同决策提供了一种新方法。

更新时间: 2025-07-24 06:12:24

领域: cs.AI,cs.MA,cs.RO

下载: http://arxiv.org/abs/2502.00352v2

Integrating Evidence into the Design of XAI and AI-based Decision Support Systems: A Means-End Framework for End-users in Construction

Explainable Artificial Intelligence (XAI) seeks to make the reasoning processes of AI models transparent and interpretable, particularly in complex decision-making environments. In the construction industry, where AI-based decision support systems (DSS) are increasingly adopted, limited attention has been paid to the integration of supporting evidence that underpins the reliability and accountability of AI-generated outputs. The absence of such evidence undermines the validity of explanations and the trustworthiness of system recommendations. This paper addresses this gap by introducing a theoretical, evidence-based means-end framework developed through a narrative review. The framework offers an epistemic foundation for designing XAI-enabled DSS that generate meaningful explanations tailored to users' knowledge needs and decision contexts. It focuses on evaluating the strength, relevance, and utility of different types of evidence supporting AI-generated explanations. While developed with construction professionals as primary end-users, the framework is also applicable to developers, regulators, and project managers with varying epistemic goals.

Updated: 2025-07-24 06:11:52

标题: 将证据整合到XAI和基于人工智能的决策支持系统设计中:建筑领域最终用户的目的-手段框架

摘要: 可解释人工智能旨在使人工智能模型的推理过程透明和可解释,特别是在复杂的决策环境中。在建筑行业,越来越多地采用基于人工智能的决策支持系统,但对支持证据的整合却受到了有限的关注,这些支持证据是AI生成输出的可靠性和可问责性的基础。缺乏这样的证据会削弱解释的有效性和系统建议的可信度。本文通过引入一个通过叙事回顾发展而来的理论、证据为基础的手段终结框架来填补这一空白。该框架为设计能够生成针对用户知识需求和决策背景定制的有意义解释的可解释人工智能(DSS)提供认知基础。它侧重于评估支持AI生成解释的不同类型证据的强度、相关性和效用。虽然主要面向建筑专业人员作为最终用户开发,但该框架也适用于具有不同认知目标的开发人员、监管机构和项目经理。

更新时间: 2025-07-24 06:11:52

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2412.14209v2

GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness

Recent advances in end-to-end spoken language models (SLMs) have significantly improved the ability of AI systems to engage in natural spoken interactions. However, most existing models treat speech merely as a vehicle for linguistic content, often overlooking the rich paralinguistic and speaker characteristic cues embedded in human speech, such as dialect, age, emotion, and non-speech vocalizations. In this work, we introduce GOAT-SLM, a novel spoken language model with paralinguistic and speaker characteristic awareness, designed to extend spoken language modeling beyond text semantics. GOAT-SLM adopts a dual-modality head architecture that decouples linguistic modeling from acoustic realization, enabling robust language understanding while supporting expressive and adaptive speech generation. To enhance model efficiency and versatility, we propose a modular, staged training strategy that progressively aligns linguistic, paralinguistic, and speaker characteristic information using large-scale speech-text corpora. Experimental results on TELEVAL, a multi-dimensional evaluation benchmark, demonstrate that GOAT-SLM achieves well-balanced performance across both semantic and non-semantic tasks, and outperforms existing open-source models in handling emotion, dialectal variation, and age-sensitive interactions. This work highlights the importance of modeling beyond linguistic content and advances the development of more natural, adaptive, and socially aware spoken language systems.

Updated: 2025-07-24 06:10:29

标题: GOAT-SLM: 一种具有语用和说话者特征意识的口语语言模型

摘要: 近年来,端到端的口语语言模型(SLMs)的最新进展显著提高了人工智能系统进行自然口语交互的能力。然而,大多数现有模型仅将语音视为语言内容的载体,往往忽视了人类语音中嵌入的丰富的语用和说话者特征线索,如方言、年龄、情绪和非语言语音化。在这项工作中,我们介绍了GOAT-SLM,一种具有语用和说话者特征意识的新型口语语言模型,旨在将口语语言建模扩展到文本语义之外。GOAT-SLM采用了一种双模态头架构,将语言建模与声学实现解耦,从而实现强大的语言理解,同时支持表达丰富和适应性的语音生成。为了提高模型的效率和多功能性,我们提出了一种模块化、分阶段的训练策略,逐步使用大规模语音-文本语料库对语言、语用和说话者特征信息进行对齐。对于TELEVAL,一个多维度评估基准上的实验结果表明,GOAT-SLM在语义和非语义任务之间实现了良好的平衡性能,并在处理情绪、方言变化和年龄敏感交互方面优于现有的开源模型。这项工作强调了超越语言内容建模的重要性,并推动了更自然、适应性更强、具有社会意识的口语语言系统的发展。

更新时间: 2025-07-24 06:10:29

领域: cs.CL,cs.AI,cs.SD,eess.AS

下载: http://arxiv.org/abs/2507.18119v1

VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration

Failure triage in design functional verification is critical but time-intensive, relying on manual specification reviews, log inspections, and waveform analyses. While machine learning (ML) has improved areas like stimulus generation and coverage closure, its application to RTL-level simulation failure triage, particularly for large designs, remains limited. VCDiag offers an efficient, adaptable approach using VCD data to classify failing waveforms and pinpoint likely failure locations. In the largest experiment, VCDiag achieves over 94% accuracy in identifying the top three most likely modules. The framework introduces a novel signal selection and statistical compression approach, achieving over 120x reduction in raw data size while preserving features essential for classification. It can also be integrated into diverse Verilog/SystemVerilog designs and testbenches.
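
A rough sketch of the statistical-compression step: each signal's value-change trace collapses to a fixed-length statistics vector before classification (the particular statistics, counts, and sizes here are assumptions, not the paper's exact feature set):

    import numpy as np

    rng = np.random.default_rng(0)

    def compress(trace):
        # Collapse one signal's sampled trace into a handful of statistics.
        # A fixed-length summary per signal is what makes the large raw-data
        # reduction possible.
        diffs = np.diff(trace)
        return [trace.mean(), trace.std(), float((diffs != 0).mean()),  # toggle rate
                float(trace.min()), float(trace.max())]

    # Hypothetical dump: 300 signals x 50_000 sampled values for one failing run.
    waveform = rng.integers(0, 2, size=(300, 50_000))
    features = np.array([compress(s) for s in waveform]).ravel()
    print("raw values:", waveform.size, "-> feature vector:", features.size)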

Updated: 2025-07-24 06:09:00

标题: VCDiag:用于故障诊断加速的错误波形分类

摘要: 设计功能验证中的故障分类是至关重要的,但耗时,依赖于手动规范审查,日志检查和波形分析。虽然机器学习(ML)已经改进了诸如刺激生成和覆盖完成等领域,但其在RTL级仿真故障分类中的应用,特别是对于大型设计来说,仍然有限。VCDiag提供了一种有效的,可适应的方法,使用VCD数据对故障波形进行分类,找出可能的故障位置。在最大的实验中,VCDiag在识别前三个最有可能的模块方面实现了超过94%的准确性。该框架引入了一种新颖的信号选择和统计压缩方法,将原始数据大小减小超过120倍,同时保留了对于分类至关重要的特征。它还可以集成到各种Verilog/SystemVerilog设计和测试平台中。

更新时间: 2025-07-24 06:09:00

领域: cs.LG

下载: http://arxiv.org/abs/2506.03590v3

Generalizing Adam to Manifolds for Efficiently Training Transformers

One of the primary reasons behind the success of neural networks has been the emergence of an array of new, highly-successful optimizers, perhaps most importantly the Adam optimizer. It is widely used for training neural networks, yet notoriously hard to interpret. Lacking a clear physical intuition, Adam is difficult to generalize to manifolds. Some attempts have been made to directly apply parts of the Adam algorithm to manifolds or to find an underlying structure, but a full generalization has remained elusive. In this work a new approach is presented that leverages the special structure of the manifolds which are relevant for optimization of neural networks, such as the Stiefel manifold, the symplectic Stiefel manifold and the Grassmann manifold: all of these are homogeneous spaces and as such admit a global tangent space representation - a common vector space (Lie subspace) in which all tangent spaces can easily be represented. This global tangent space representation is used to perform all of the steps in the Adam optimizer and we are able to fully generalize the optimizer to manifolds without a projection step. The resulting algorithm is then applied to train a transformer for which orthogonality constraints are enforced up to machine precision and we observe significant speed-ups in the training process.
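
A hedged toy version of the idea for the square orthogonal special case: Adam's moments live in the single global space of skew-symmetric matrices, and the update is retracted with a matrix exponential so no projection step is needed (the paper treats general Stiefel and Grassmann manifolds; the loss, step sizes, and the re-skewing of the elementwise Adam step are illustrative choices):

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    n = 5
    target = np.linalg.qr(rng.standard_normal((n, n)))[0]
    if np.linalg.det(target) < 0:
        target[:, 0] *= -1                    # keep the target reachable from the identity
    W = np.eye(n)
    m, v = np.zeros((n, n)), np.zeros((n, n))
    b1, b2, lr, eps = 0.9, 0.999, 0.1, 1e-8

    for t in range(1, 301):
        G = W - target                        # Euclidean gradient of 0.5 * ||W - target||^2
        A = G @ W.T - W @ G.T                 # skew-symmetric "global tangent space" representative
        m = b1 * m + (1 - b1) * A             # both moments live in one common vector space,
        v = b2 * v + (1 - b2) * A * A         # so no parallel transport is required
        mh, vh = m / (1 - b1 ** t), v / (1 - b2 ** t)
        step = mh / (np.sqrt(vh) + eps)
        step = 0.5 * (step - step.T)          # elementwise scaling breaks skewness; restore it
        W = expm(-lr * step) @ W              # retraction: W stays orthogonal to machine precision

    print("loss:", float(0.5 * np.sum((W - target) ** 2)))
    print("orthogonality error:", float(np.abs(W @ W.T - np.eye(n)).max()))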

Updated: 2025-07-24 06:08:33

标题: 将Adam推广到流形上以高效训练Transformer

摘要: 神经网络成功的主要原因之一是出现了一系列新的、非常成功的优化器,其中最重要的可能是Adam优化器。它被广泛用于训练神经网络,但很难解释。由于缺乏明确的物理直觉,Adam难以推广到流形。已经尝试直接将Adam算法的部分应用于流形或找到潜在的结构,但完全的泛化仍然难以实现。 在这项工作中,提出了一种新方法,利用了与神经网络优化相关的流形的特殊结构,比如Stiefel流形、辛Stiefel流形和Grassmann流形:所有这些都是齐次空间,因此可以接受一个全局切空间表示 - 一个公共向量空间(Lie子空间),可以轻松地表示所有切空间。 这个全局切空间表示被用来执行Adam优化器的所有步骤,我们能够完全将优化器推广到流形而不需要投影步骤。然后将得到的算法应用于训练一个变压器,其中正交约束被强制执行到机器精度,并观察到训练过程中的显著加速。

更新时间: 2025-07-24 06:08:33

领域: cs.LG,math.DG,53Z50, 53C30, 68T07, 68W10, 90C26

下载: http://arxiv.org/abs/2305.16901v4

A Two-armed Bandit Framework for A/B Testing

A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly-developed policy against a standard control. Various causal inference and reinforcement learning methods developed in the literature are applicable to A/B testing. This paper introduces a two-armed bandit framework designed to improve the power of existing approaches. The proposed procedure consists of three main steps: (i) employing doubly robust estimation to generate pseudo-outcomes, (ii) utilizing a two-armed bandit framework to construct the test statistic, and (iii) applying a permutation-based method to compute the $p$-value. We demonstrate the efficacy of the proposed method through asymptotic theories, numerical experiments and real-world data from a ridesharing company, showing its superior performance in comparison to existing methods.
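
A simplified end-to-end sketch of the three steps on synthetic data (ordinary linear outcome models, a plain mean of the pseudo-outcomes as the statistic rather than the paper's bandit-based construction, and treatment-label permutation for the null):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n, tau, e = 2000, 0.15, 0.5                  # sample size, true effect, P(treatment)
    X = rng.standard_normal((n, 3))
    A = rng.integers(0, 2, n)
    Y = X @ np.array([0.5, -0.3, 0.2]) + tau * A + rng.standard_normal(n)

    # (i) doubly robust pseudo-outcomes (randomization probability e is known)
    mu0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)
    mu1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)

    def dr(arm):
        return (mu1 - mu0) + arm * (Y - mu1) / e - (1 - arm) * (Y - mu0) / (1 - e)

    # (ii) test statistic; (iii) permutation-based p-value
    stat = dr(A).mean()
    null = np.array([dr(rng.permutation(A)).mean() for _ in range(999)])
    p = (1 + np.sum(np.abs(null) >= abs(stat))) / (1 + len(null))
    print(f"DR effect estimate: {stat:.3f}, permutation p-value: {p:.3f}")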

Updated: 2025-07-24 06:05:56

标题: A/B测试的双臂老虎机框架

摘要: A/B测试被广泛应用于现代技术公司的政策评估和产品部署中,其目标是比较在新开发的政策下与标准对照下的结果。文献中发展的各种因果推断和强化学习方法适用于A/B测试。本文介绍了一个旨在提高现有方法效力的两臂老虎机框架。所提出的程序包括三个主要步骤:(i) 使用双重稳健估计生成伪结果,(ii) 利用两臂老虎机框架构建检验统计量,以及 (iii) 应用基于排列的方法计算$p$-值。我们通过渐近理论、数值实验和来自一家共享乘车公司的真实世界数据展示了所提出方法的有效性,表明其在与现有方法的比较中表现出优越性。

更新时间: 2025-07-24 06:05:56

领域: stat.ML,cs.LG,stat.AP

下载: http://arxiv.org/abs/2507.18118v1

The Impact of Pseudo-Science in Financial Loans Risk Prediction

We study the societal impact of pseudo-scientific assumptions for predicting the behavior of people in a straightforward application of machine learning to risk prediction in financial lending. This use case also exemplifies the impact of survival bias in loan return prediction. We analyze the models in terms of their accuracy and social cost, showing that the socially optimal model may not imply a significant accuracy loss for this downstream task. Our results are verified for commonly used learning methods and datasets. Our findings also show that there is a natural dynamic when training models that suffer from survival bias: accuracy slightly deteriorates while recall and precision improve over time. These results act as an illusion, leading the observer to believe that the system is getting better, when in fact the model suffers from increasingly severe unfairness and survival bias.

Updated: 2025-07-24 06:01:17

标题: 伪科学对金融贷款风险预测的影响

摘要: 我们研究了伪科学假设对人们行为预测的社会影响,在金融借贷风险预测中直接应用机器学习。这个案例也展示了贷款返还预测中生存偏差的影响。我们分析了模型的准确性和社会成本,表明在这个下游任务中,社会最优模型可能并不意味着显著的准确性损失。我们的结果经常用的学习方法和数据集进行了验证。我们的研究结果还显示,在训练模型时存在生存偏差,准确性略有下降,但随着时间的推移,召回率和精确度却有所提高。这些结果会产生一种错觉,让观察者相信系统正在变得更好,但实际上模型正遭受越来越多的不公平和生存偏差。

更新时间: 2025-07-24 06:01:17

领域: cs.CY,cs.LG

下载: http://arxiv.org/abs/2507.16182v2

On the Approximation of Stationary Processes using the ARMA Model

We revisit an old problem related to Autoregressive Moving Average (ARMA) models, on quantifying and bounding the approximation error between a true stationary process $X_t$ and an ARMA model $Y_t$. We take the transfer function representation of an ARMA model and show that the associated $L^{\infty}$ norm provides a valid alternate norm that controls the $L^2$ norm and has structural properties comparable to the cepstral norm. We show that a certain subspace of stationary processes, which includes ARMA models, forms a Banach algebra under the $L^{\infty}$ norm that respects the group structure of $H^{\infty}$ transfer functions. The natural definition of invertibility in this algebra is consistent with the original definition of ARMA invertibility, and generalizes better to non-ARMA processes than Wiener's $\ell^1$ condition. Finally, we calculate some explicit approximation bounds in the simpler context of continuous transfer functions, and critique some heuristic ideas on Pad\'e approximations and parsimonious models.
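
A worked numerical example of the norms involved, for an ARMA(1,1) transfer function evaluated on the unit circle (the model and grid resolution are arbitrary):

    import numpy as np

    # ARMA(1,1): phi(B) X_t = theta(B) eps_t with phi(z) = 1 - 0.7 z and
    # theta(z) = 1 + 0.4 z; the transfer function is H(w) = theta(e^{-iw}) / phi(e^{-iw}).
    w = np.linspace(0.0, np.pi, 4096)
    z = np.exp(-1j * w)
    H = (1.0 + 0.4 * z) / (1.0 - 0.7 * z)

    linf = float(np.abs(H).max())                 # sup-norm over the circle (attained at w = 0 here)
    l2 = float(np.sqrt(np.mean(np.abs(H) ** 2)))  # L^2 norm w.r.t. normalized arc length
    print(f"L_inf = {linf:.4f}, L_2 = {l2:.4f}")  # L_2 <= L_inf, the control the paper formalizes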

Updated: 2025-07-24 06:00:26

标题: 关于使用ARMA模型近似稳态过程的研究

摘要: 我们重新审视了与自回归移动平均(ARMA)模型相关的一个旧问题,即如何量化和界定真实平稳过程$X_t$与ARMA模型$Y_t$之间的近似误差。我们采用ARMA模型的传递函数表示,并展示相关的$L^{\infty}$范数提供了一个有效的替代范数,可以控制$L^2$范数并具有类似于倒谱范数的结构特性。我们展示了一个包括ARMA模型在内的一定子空间的平稳过程在$L^{\infty}$范数下形成了一个Banach代数,尊重$H^{\infty}$传递函数的群结构。在这个代数中,可逆性的自然定义与ARMA可逆性的原始定义一致,并且对于非ARMA过程的泛化比维纳的$\ell^1$条件更好。最后,我们在连续传递函数的简单背景下计算了一些明确的近似界,并对Pad\'e逼近和简约模型的一些启发式思想进行了批判。

更新时间: 2025-07-24 06:00:26

领域: cs.LG,math.PR,stat.ME,60G10,G.3

下载: http://arxiv.org/abs/2408.10610v3

When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems

Recent large-scale events like election fraud and financial scams have shown how harmful coordinated efforts by human groups can be. With the rise of autonomous AI systems, there is growing concern that AI-driven groups could also cause similar harm. While most AI safety research focuses on individual AI systems, the risks posed by multi-agent systems (MAS) in complex real-world situations are still underexplored. In this paper, we introduce a proof-of-concept to simulate the risks of malicious MAS collusion, using a flexible framework that supports both centralized and decentralized coordination structures. We apply this framework to two high-risk fields: misinformation spread and e-commerce fraud. Our findings show that decentralized systems are more effective at carrying out malicious actions than centralized ones. The increased autonomy of decentralized systems allows them to adapt their strategies and cause more damage. Even when traditional interventions, like content flagging, are applied, decentralized groups can adjust their tactics to avoid detection. We present key insights into how these malicious groups operate and the need for better detection systems and countermeasures. Code is available at https://github.com/renqibing/RogueAgent.

Updated: 2025-07-24 06:00:02

标题: 当自主性失控:为社会系统中多智能体勾结风险做准备

摘要: 最近大规模事件如选举舞弊和金融诈骗已经显示出人类团体协同努力可能有多么有害。随着自主AI系统的兴起,人们越来越担心由AI驱动的团体也可能造成类似的伤害。虽然大多数AI安全研究关注个体AI系统,但在复杂的现实情况下,多智能体系统(MAS)带来的风险仍未得到充分探讨。在本文中,我们引入了一个概念验证来模拟恶意MAS勾结的风险,使用一个支持集中和分散协调结构的灵活框架。我们将这一框架应用于两个高风险领域:信息传播和电子商务诈骗。我们的研究结果显示,分散系统比集中系统更有效地执行恶意行动。分散系统的增加自主性使它们能够调整策略并造成更多损害。即使采取了传统的干预措施,如内容标记,分散群体也可以调整他们的战术以避免被发现。我们提供了这些恶意团体如何运作以及需要更好的检测系统和对策的关键见解。代码可在https://github.com/renqibing/RogueAgent获取。

更新时间: 2025-07-24 06:00:02

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.14660v2

Agentic AI framework for End-to-End Medical Data Inference

Building and deploying machine learning solutions in healthcare remains expensive and labor-intensive due to fragmented preprocessing workflows, model compatibility issues, and stringent data privacy constraints. In this work, we introduce an Agentic AI framework that automates the entire clinical data pipeline, from ingestion to inference, through a system of modular, task-specific agents. These agents handle both structured and unstructured data, enabling automatic feature selection, model selection, and preprocessing recommendation without manual intervention. We evaluate the system on publicly available datasets from geriatrics, palliative care, and colonoscopy imaging. For example, in the case of structured data (anxiety data) and unstructured data (colonoscopy polyps data), the pipeline begins with file-type detection by the Ingestion Identifier Agent, followed by the Data Anonymizer Agent ensuring privacy compliance, where we first identify the data type and then anonymize it. The Feature Extraction Agent identifies features using an embedding-based approach for tabular data, extracting all column names, and a multi-stage MedGemma-based approach for image data, which infers modality and disease name. These features guide the Model-Data Feature Matcher Agent in selecting the best-fit model from a curated repository. The Preprocessing Recommender Agent and Preprocessing Implementor Agent then apply tailored preprocessing based on data type and model requirements. Finally, the ``Model Inference Agent'' runs the selected model on the uploaded data and generates interpretable outputs using tools like SHAP, LIME, and DETR attention maps. By automating these high-friction stages of the ML lifecycle, the proposed framework reduces the need for repeated expert intervention, offering a scalable, cost-efficient pathway for operationalizing AI in clinical environments.
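
To illustrate the orchestration pattern (agent names mirror the abstract, but all logic below is stubbed and hypothetical; the real agents wrap LLMs, anonymizers, and a curated model repository):

    # Each "agent" is a callable that enriches a shared context dict in turn.
    def ingestion_identifier(ctx):
        ctx["ftype"] = "csv" if ctx["path"].endswith(".csv") else "image"
        return ctx

    def data_anonymizer(ctx):
        ctx["anonymized"] = True          # e.g., drop direct identifiers from tabular data
        return ctx

    def feature_extractor(ctx):
        ctx["features"] = ["age", "score"] if ctx["ftype"] == "csv" else ["modality", "disease"]
        return ctx

    def model_matcher(ctx):
        ctx["model"] = "tabular-model" if ctx["ftype"] == "csv" else "image-model"
        return ctx

    def model_inference(ctx):
        ctx["output"] = f"ran {ctx['model']} on {ctx['path']}"
        return ctx

    PIPELINE = [ingestion_identifier, data_anonymizer, feature_extractor,
                model_matcher, model_inference]

    ctx = {"path": "anxiety.csv"}         # hypothetical upload
    for agent in PIPELINE:
        ctx = agent(ctx)
    print(ctx["output"])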

Updated: 2025-07-24 05:56:25

标题: 自主AI框架用于端到端医疗数据推断

摘要: 在医疗保健领域构建和部署机器学习解决方案仍然昂贵且劳动密集,这是由于碎片化的预处理工作流程、模型兼容性问题和严格的数据隐私约束所致。在这项工作中,我们介绍了一种主动AI框架,通过一套模块化、任务特定的代理系统,自动化整个临床数据管道,从数据摄入到推理。这些代理处理结构化和非结构化数据,实现无需手动干预的自动特征选择、模型选择和预处理推荐。我们在老年学、姑息护理和结肠镜成像的公开可用数据集上评估了该系统。例如,在结构化数据(焦虑数据)和非结构化数据(结肠镜息肉数据)的情况下,管道从摄入标识代理进行文件类型检测开始,然后由数据匿名化代理确保隐私合规性,首先识别数据类型,然后进行匿名化处理。特征提取代理使用基于嵌入的方法用于表格数据识别特征,提取所有列名,并使用基于多阶段MedGemma的方法用于图像数据,推断模态和疾病名称。这些特征指导模型数据特征匹配代理从策划存储库中选择最适合的模型。预处理推荐代理和预处理实现代理然后根据数据类型和模型要求应用定制的预处理。最后,“模型推理代理”在上传的数据上运行所选模型,并使用SHAP、LIME和DETR关注地图等工具生成可解释的输出。通过自动化ML生命周期中这些高摩擦阶段,提出的框架减少了重复的专家干预需求,为在临床环境中实现AI提供了可扩展、成本效益的途径。

更新时间: 2025-07-24 05:56:25

领域: cs.AI,cs.CL,cs.CY,cs.ET,cs.LG

下载: http://arxiv.org/abs/2507.18115v1

Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control I: Penalty Approach

This paper develops a unified nonconvex optimization framework for the design of group-sparse feedback controllers in infinite-horizon linear-quadratic (LQ) problems. We address two prominent extensions of the classical LQ problem: the distributed LQ problem with fixed communication topology (DFT-LQ) and the sparse feedback LQ problem (SF-LQ), both of which are motivated by the need for scalable and structure-aware control in large-scale systems. Unlike existing approaches that rely on convex relaxations or are limited to block-diagonal structures, we directly formulate the controller synthesis as a finite-dimensional nonconvex optimization problem with group $\ell_0$-norm regularization, capturing general sparsity patterns. We establish a connection between DFT-LQ and SF-LQ problems, showing that both can be addressed within our unified framework. Furthermore, we propose a penalty-based proximal alternating linearized minimization (PALM) algorithm and provide a rigorous convergence analysis under mild assumptions, overcoming the lack of coercivity in the objective function. The proposed method admits efficient solvers for all subproblems and guarantees global convergence to critical points. Our results fill a key gap in the literature by enabling the direct design of group-sparse feedback gains with theoretical guarantees, without resorting to convex surrogates or restrictive structural assumptions.
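
The distinctive ingredient is the proximal step for the group $\ell_0$ penalty, which is a closed-form group hard threshold; below is a single-block sketch of a PALM-style iteration on a toy smooth term (the actual smooth part in the paper is the LQ cost of the feedback gain, not this quadratic):

    import numpy as np

    def prox_group_l0(v, groups, lam):
        # prox of lam * (number of nonzero groups): keep group g iff ||v_g||^2 / 2 > lam
        out = np.zeros_like(v)
        for g in groups:
            if v[g] @ v[g] / 2.0 > lam:
                out[g] = v[g]
        return out

    # Toy smooth term f pulling K toward a group-sparse target (hypothetical).
    target = np.concatenate([np.full(3, 2.0), np.zeros(9)])
    grad_f = lambda K: K - target
    groups = [np.arange(3 * i, 3 * (i + 1)) for i in range(4)]

    rng = np.random.default_rng(0)
    K = 0.1 * rng.standard_normal(12)
    step, lam = 0.5, 0.3
    for _ in range(50):
        # linearized (gradient) step on f, then the group hard-threshold prox
        K = prox_group_l0(K - step * grad_f(K), groups, lam * step)
    print(np.round(K, 3))   # only the genuinely active group survives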

Updated: 2025-07-24 05:55:28

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2507.18114v1

Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification

Reinforcement learning (RL) has achieved remarkable success in fields like robotics and autonomous driving, but adversarial attacks designed to mislead RL systems remain challenging. Existing approaches often rely on modifying the environment or policy, limiting their practicality. This paper proposes an adversarial attack method in which existing agents in the environment guide the target policy to output suboptimal actions without altering the environment. We propose a reward iteration optimization framework that leverages large language models (LLMs) to generate adversarial rewards explicitly tailored to the vulnerabilities of the target agent, thereby enhancing the effectiveness of inducing the target agent toward suboptimal decision-making. Additionally, a critical state identification algorithm is designed to pinpoint the target agent's most vulnerable states, where suboptimal behavior from the victim leads to significant degradation in overall performance. Experimental results in diverse environments demonstrate the superiority of our method over existing approaches.
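
The paper's exact criterion is not given in this summary; a common heuristic for the kind of critical-state scoring it describes is the spread of the victim's action values, sketched below in Python with an assumed q_fn interface.

    import numpy as np

    def criticality(q_values: np.ndarray) -> float:
        # Gap between best and worst action value at a state; a large gap means
        # that forcing a suboptimal action there is especially damaging.
        return float(q_values.max() - q_values.min())

    def top_critical_states(states, q_fn, k=10):
        # q_fn(s) is assumed to return the victim's action-value vector for s.
        return sorted(states, key=lambda s: criticality(q_fn(s)), reverse=True)[:k]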

Updated: 2025-07-24 05:52:06

Categories: cs.LG

Download: http://arxiv.org/abs/2507.18113v1

Parameter-Efficient Fine-Tuning of 3D DDPM for MRI Image Generation Using Tensor Networks

We address the challenge of parameter-efficient fine-tuning (PEFT) for three-dimensional (3D) U-Net-based denoising diffusion probabilistic models (DDPMs) in magnetic resonance imaging (MRI) image generation. Despite its practical significance, research on parameter-efficient representations of 3D convolution operations remains limited. To bridge this gap, we propose Tensor Volumetric Operator (TenVOO), a novel PEFT method specifically designed for fine-tuning DDPMs with 3D convolutional backbones. Leveraging tensor network modeling, TenVOO represents 3D convolution kernels with lower-dimensional tensors, effectively capturing complex spatial dependencies during fine-tuning with few parameters. We evaluate TenVOO on three downstream brain MRI datasets-ADNI, PPMI, and BraTS2021-by fine-tuning a DDPM pretrained on 59,830 T1-weighted brain MRI scans from the UK Biobank. Our results demonstrate that TenVOO achieves state-of-the-art performance in multi-scale structural similarity index measure (MS-SSIM), outperforming existing approaches in capturing spatial dependencies while requiring only 0.3% of the trainable parameters of the original model. Our code is available at: https://github.com/xiaovhua/tenvoo
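
The abstract does not spell out TenVOO's exact tensor-network layout, so the PyTorch sketch below illustrates the general idea with a CP-style factorization of the 3D kernel; treat it as an assumption-laden stand-in rather than the paper's construction.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FactorizedConv3d(nn.Module):
        """3D conv whose (O, I, k, k, k) kernel is assembled from rank-r factors."""
        def __init__(self, c_in, c_out, k=3, rank=4):
            super().__init__()
            self.A = nn.Parameter(torch.randn(c_out, rank) * 0.1)
            self.B = nn.Parameter(torch.randn(c_in, rank) * 0.1)
            self.C = nn.Parameter(torch.randn(k, rank) * 0.1)
            self.D = nn.Parameter(torch.randn(k, rank) * 0.1)
            self.E = nn.Parameter(torch.randn(k, rank) * 0.1)

        def forward(self, x):
            # Reconstruct the full kernel from the small factors, then convolve.
            kernel = torch.einsum("or,ir,ar,br,cr->oiabc",
                                  self.A, self.B, self.C, self.D, self.E)
            return F.conv3d(x, kernel, padding=kernel.shape[-1] // 2)

    y = FactorizedConv3d(8, 16)(torch.randn(1, 8, 16, 16, 16))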

Updated: 2025-07-24 05:51:51

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2507.18112v1

Percentile-Based Deep Reinforcement Learning and Reward Based Personalization For Delay Aware RAN Slicing in O-RAN

In this paper, we tackle the challenge of radio access network (RAN) slicing within an open RAN (O-RAN) architecture. Our focus centers on a network that includes multiple mobile virtual network operators (MVNOs) competing for physical resource blocks (PRBs) with the goal of meeting probabilistic delay upper bound constraints for their clients while minimizing PRB utilization. Initially, we derive a reward function based on the law of large numbers (LLN), then implement practical modifications to adapt it for real-world experimental scenarios. We then propose our solution, the Percentile-based Delay-Aware Deep Reinforcement Learning (PDA-DRL), which demonstrates its superiority over several baselines, including DRL models optimized for average delay constraints, by achieving a 38\% reduction in resultant average delay. Furthermore, we delve into the issue of model weight sharing among multiple MVNOs to develop a robust personalized model. We introduce a reward-based personalization method where each agent prioritizes other agents' model weights based on their performance. This technique surpasses traditional aggregation methods, such as federated averaging, and strategies reliant on traffic patterns and model weight distance similarities.
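
As a hedged illustration of a percentile-based delay objective (the paper derives its reward from the LLN; the constants and form below are ours), a reward can penalize PRB usage plus any violation of the delay bound at a target percentile:

    import numpy as np

    def slice_reward(delays_ms, prbs_used, delay_bound_ms=10.0,
                     target_percentile=95.0, c_prb=0.01, c_violation=1.0):
        # Negative reward for PRB usage and for exceeding the delay bound at
        # the target percentile (illustrative constants, not the paper's).
        observed = np.percentile(delays_ms, target_percentile)
        violation = max(0.0, observed - delay_bound_ms)
        return -c_prb * prbs_used - c_violation * violation

    r = slice_reward(np.random.exponential(3.0, size=500), prbs_used=40)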

Updated: 2025-07-24 05:45:41

Categories: cs.LG

Download: http://arxiv.org/abs/2507.18111v1

Recognizing and Eliciting Weakly Single Crossing Profiles on Trees

We introduce and study the weakly single-crossing domain on trees, a generalization of the well-studied single-crossing domain in social choice theory. We design a polynomial-time algorithm for recognizing preference profiles that belong to this domain. We then develop an efficient elicitation algorithm for this domain that works even if the preferences can be accessed only sequentially and the underlying single-crossing tree structure is not known beforehand. We also prove a matching lower bound on the query complexity of our elicitation algorithm when the number of voters is large compared to the number of candidates. We further prove a lower bound of $\Omega(m^2\log n)$ on the number of queries that any algorithm needs to ask to elicit a single-crossing profile when random queries are allowed. This resolves an open question from an earlier paper and proves the optimality of its preference elicitation algorithm when random queries are allowed.

Updated: 2025-07-24 05:38:22

Categories: cs.MA,cs.AI,cs.DS

Download: http://arxiv.org/abs/1611.04175v4

Distributional Uncertainty for Out-of-Distribution Detection

Estimating uncertainty from deep neural networks is a widely used approach for detecting out-of-distribution (OoD) samples, which typically exhibit high predictive uncertainty. However, conventional methods such as Monte Carlo (MC) Dropout often focus solely on either model or data uncertainty, failing to align with the semantic objective of OoD detection. To address this, we propose the Free-Energy Posterior Network, a novel framework that jointly models distributional uncertainty and identifies OoD and misclassified regions using free energy. Our method introduces two key contributions: (1) a free-energy-based density estimator parameterized by a Beta distribution, which enables fine-grained uncertainty estimation near ambiguous or unseen regions; and (2) a loss integrated within a posterior network, allowing direct uncertainty estimation from learned parameters without requiring stochastic sampling. By integrating our approach with the residual prediction branch (RPL) framework, the proposed method goes beyond post-hoc energy thresholding and enables the network to learn OoD regions by leveraging the variance of the Beta distribution, resulting in a semantically meaningful and computationally efficient solution for uncertainty-aware segmentation. We validate the effectiveness of our method on challenging real-world benchmarks, including Fishyscapes, RoadAnomaly, and Segment-Me-If-You-Can.
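
For reference, the standard free-energy score from energy-based OoD detection, on which formulations like this one typically build, is

\[
F(x) = -T \,\log \sum_{c=1}^{C} \exp\!\big(f_c(x)/T\big),
\]

where $f_c(x)$ are the network logits and $T$ is a temperature; a higher $F(x)$ indicates lower density under the model. How the paper combines such a score with its Beta-parameterized estimator is detailed in the full text.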

Updated: 2025-07-24 05:35:49

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.18106v1

Understanding the Supply Chain and Risks of Large Language Model Applications

The rise of Large Language Models (LLMs) has led to the widespread deployment of LLM-based systems across diverse domains. As these systems proliferate, understanding the risks associated with their complex supply chains is increasingly important. LLM-based systems are not standalone, as they rely on interconnected supply chains involving pretrained models, third-party libraries, datasets, and infrastructure. Yet most risk assessments narrowly focus on the model or data level, overlooking broader supply chain vulnerabilities. While recent studies have begun to address LLM supply chain risks, there remains a lack of benchmarks for systematic research. To address this gap, we introduce the first comprehensive dataset for analyzing and benchmarking LLM supply chain security. We collect 3,859 real-world LLM applications and perform interdependency analysis, identifying 109,211 models, 2,474 datasets, and 9,862 libraries. We extract model fine-tuning paths, dataset reuse, and library reliance, mapping the ecosystem's structure. To evaluate security, we gather 1,555 risk-related issues from public vulnerability databases: 50 for applications, 325 for models, 18 for datasets, and 1,229 for libraries. Using this dataset, we empirically analyze component dependencies and risks. Our findings reveal deeply nested dependencies in LLM applications and significant vulnerabilities across the supply chain, underscoring the need for comprehensive security analysis. We conclude with practical recommendations to guide researchers and developers toward safer, more trustworthy LLM-enabled systems.

Updated: 2025-07-24 05:30:54

Categories: cs.SE,cs.CR

Download: http://arxiv.org/abs/2507.18105v1

A New Pair of GloVes

This report documents, describes, and evaluates new 2024 English GloVe (Global Vectors for Word Representation) models. While the original GloVe models built in 2014 have been widely used and found useful, languages and the world continue to evolve and we thought that current usage could benefit from updated models. Moreover, the 2014 models were not carefully documented as to the exact data versions and preprocessing that were used, and we rectify this by documenting these new models. We trained two sets of word embeddings using Wikipedia, Gigaword, and a subset of Dolma. Evaluation through vocabulary comparison, direct testing, and NER tasks shows that the 2024 vectors incorporate new culturally and linguistically relevant words, perform comparably on structural tasks like analogy and similarity, and demonstrate improved performance on recent, temporally dependent NER datasets such as non-Western newswire data.
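
As a reminder of the objective being retrained, GloVe fits word and context vectors to the co-occurrence counts $X_{ij}$ by minimizing

\[
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2 ,
\]

where $f$ is a weighting function that caps the influence of very frequent pairs.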

Updated: 2025-07-24 05:29:18

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.18103v1

Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning

Video Temporal Grounding (VTG) aims to localize relevant temporal segments in videos given natural language queries. Despite recent progress with large vision-language models (LVLMs) and instruction-tuning, existing approaches often suffer from limited temporal awareness and poor generalization. In this work, we introduce a two-stage training framework that integrates supervised fine-tuning (SFT) with reinforcement learning (RL) to improve both the accuracy and robustness of VTG models. Our approach first leverages high-quality curated cold-start data for SFT initialization, followed by difficulty-controlled RL to further enhance temporal localization and reasoning abilities. Comprehensive experiments on multiple VTG benchmarks demonstrate that our method consistently outperforms existing models, particularly in challenging and open-domain scenarios. We conduct an in-depth analysis of training strategies and dataset curation, highlighting the importance of both high-quality cold-start data and difficulty-controlled RL. To facilitate further research and industrial adoption, we release all intermediate datasets, models, and code to the community.

Updated: 2025-07-24 05:24:01

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.18100v1

Comparison of Segmentation Methods in Remote Sensing for Land Use Land Cover

Land Use Land Cover (LULC) mapping is essential for urban and resource planning, and is one of the key elements in developing smart and sustainable cities. This study evaluates advanced LULC mapping techniques, focusing on Look-Up Table (LUT)-based Atmospheric Correction applied to Cartosat Multispectral (MX) sensor images, followed by supervised and semi-supervised learning models for LULC prediction. We explore DeeplabV3+ and Cross-Pseudo Supervision (CPS). The CPS model is further refined with dynamic weighting, enhancing pseudo-label reliability during training. This comprehensive approach analyses the accuracy and utility of LULC mapping techniques for various urban planning applications. A case study of Hyderabad, India, illustrates significant land use changes due to rapid urbanization. By analyzing Cartosat MX images over time, we highlight shifts such as urban sprawl, shrinking green spaces, and expanding industrial areas. This demonstrates the practical utility of these techniques for urban planners and policymakers.
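
A minimal PyTorch sketch of cross-pseudo supervision with a confidence-based dynamic weight follows; the weighting is our simplification, and the paper's exact scheme may differ.

    import torch
    import torch.nn.functional as F

    def cps_loss(logits_a, logits_b):
        # Each network learns from the other's hard pseudo-labels, weighted
        # per pixel by the peer's confidence (illustrative choice).
        conf_a, pseudo_a = logits_a.softmax(1).max(1)
        conf_b, pseudo_b = logits_b.softmax(1).max(1)
        loss_a = (conf_b.detach() * F.cross_entropy(
            logits_a, pseudo_b.detach(), reduction="none")).mean()
        loss_b = (conf_a.detach() * F.cross_entropy(
            logits_b, pseudo_a.detach(), reduction="none")).mean()
        return loss_a + loss_b

    la, lb = torch.randn(2, 6, 64, 64), torch.randn(2, 6, 64, 64)
    loss = cps_loss(la, lb)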

Updated: 2025-07-24 05:23:02

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.18099v1

Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models

Backdoor attacks targeting text-to-image diffusion models have advanced rapidly. However, current backdoor samples often exhibit two key abnormalities compared to benign samples: 1) Semantic Consistency, where backdoor prompts tend to generate images with similar semantic content even under significant textual variations to the prompts; 2) Attention Consistency, where the trigger induces consistent structural responses in the cross-attention maps. These consistencies leave detectable traces for defenders, making backdoors easier to identify. In this paper, we propose Trigger without Trace (TwT), which crafts stealthy backdoor samples by explicitly mitigating these consistencies. Specifically, our approach leverages syntactic structures as backdoor triggers to amplify the sensitivity to textual variations, effectively breaking the semantic consistency. Besides, a regularization method based on Kernel Maximum Mean Discrepancy (KMMD) is proposed to align the distribution of cross-attention responses between backdoor and benign samples, thereby disrupting attention consistency. Extensive experiments demonstrate that our method achieves a 97.5% attack success rate while exhibiting stronger resistance to defenses. On average, over 98% of its backdoor samples bypass three state-of-the-art detection mechanisms, revealing the vulnerabilities of current backdoor defense methods. The code is available at https://github.com/Robin-WZQ/TwT.
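
For the KMMD term, a standard biased Gaussian-kernel MMD estimate between batches of backdoor and benign responses looks like the sketch below; flattening attention responses into vectors is our simplification.

    import torch

    def gaussian_mmd(x, y, sigma=1.0):
        # Biased MMD^2 estimate between two sample sets of shape (n, d).
        def k(a, b):
            d2 = torch.cdist(a, b).pow(2)
            return torch.exp(-d2 / (2 * sigma ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

    mmd = gaussian_mmd(torch.randn(32, 128), torch.randn(32, 128))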

Updated: 2025-07-24 05:22:14

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2503.17724v2

Learning from Hard Labels with Additional Supervision on Non-Hard-Labeled Classes

In scenarios where training data is limited due to observation costs or data scarcity, enriching the label information associated with each instance becomes crucial for building high-accuracy classification models. In such contexts, it is often feasible to obtain not only hard labels but also {\it additional supervision}, such as the confidences for the hard labels. This setting naturally raises fundamental questions: {\it What kinds of additional supervision are intrinsically beneficial?} And {\it how do they contribute to improved generalization performance?} To address these questions, we propose a theoretical framework that treats both hard labels and additional supervision as probability distributions, and constructs soft labels through their affine combination. Our theoretical analysis reveals that the essential component of additional supervision is not the confidence score of the assigned hard label, but rather the information of the distribution over the non-hard-labeled classes. Moreover, we demonstrate that the additional supervision and the mixing coefficient contribute to the refinement of soft labels in complementary roles. Intuitively, in the probability simplex, the additional supervision determines the direction in which the deterministic distribution representing the hard label should be adjusted toward the true label distribution, while the mixing coefficient controls the step size along that direction. Through generalization error analysis, we theoretically characterize how the additional supervision and its mixing coefficient affect both the convergence rate and asymptotic value of the error bound. Finally, we experimentally demonstrate that, based on our theory, designing additional supervision can lead to improved classification accuracy, even when utilized in a simple manner.
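
In symbols of our choosing, the soft label described by this framework is the affine combination

\[
\tilde{y} \;=\; (1-\alpha)\, e_y \;+\; \alpha\, q,
\]

where $e_y$ is the one-hot distribution of the hard label $y$, $q$ is the distribution carried by the additional supervision, and $\alpha \in [0,1]$ is the mixing coefficient controlling the step from $e_y$ toward the true label distribution.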

Updated: 2025-07-24 05:19:07

Categories: cs.LG,68T01,I.5

Download: http://arxiv.org/abs/2507.18098v1

Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

Emotion Recognition in Conversation (ERC) is a practical and challenging task. This paper proposes a novel multimodal approach, the Long-Short Distance Graph Neural Network (LSDGNN). Based on the Directed Acyclic Graph (DAG), it constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances, respectively. To ensure that long- and short-distance features are as distinct as possible in representation while enabling mutual influence between the two modules, we employ a Differential Regularizer and incorporate a BiAffine Module to facilitate feature interaction. In addition, we propose an Improved Curriculum Learning (ICL) to address the challenge of data imbalance. By computing the similarity between different emotions to emphasize the shifts in similar emotions, we design a "weighted emotional shift" metric and develop a difficulty measurer, enabling a training process that prioritizes learning easy samples before harder ones. Experimental results on the IEMOCAP and MELD datasets demonstrate that our model outperforms existing benchmarks.

Updated: 2025-07-24 05:15:18

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.15205v2

LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs

The increasing use of synthetic data from the public Internet has enhanced data usage efficiency in large language model (LLM) training. However, the potential threat of model collapse remains insufficiently explored. Existing studies primarily examine model collapse in a single model setting or rely solely on statistical surrogates. In this work, we introduce LLM Web Dynamics (LWD), an efficient framework for investigating model collapse at the network level. By simulating the Internet with a retrieval-augmented generation (RAG) database, we analyze the convergence pattern of model outputs. Furthermore, we provide theoretical guarantees for this convergence by drawing an analogy to interacting Gaussian Mixture Models.

Updated: 2025-07-24 05:08:02

Categories: cs.LG,cs.AI,cs.SI,stat.ME

Download: http://arxiv.org/abs/2506.15690v3

A Principled Approach for Data Bias Mitigation

The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. \emph{Bias} in the data can adversely affect this decision-making. We present a new mitigation strategy to address data bias. Our methods are explainable and come with mathematical guarantees of correctness. They can take advantage of new work on table discovery to find new tuples that can be added to a dataset to create real datasets that are unbiased or less biased. Our framework covers data with non-binary labels and with multiple sensitive attributes. Hence, we are able to measure and mitigate bias that does not appear over a single attribute (or feature), but only intersectionally, when considering a combination of attributes. We evaluate our techniques on publicly available datasets and provide a theoretical analysis of our results, highlighting novel insights into data bias.

Updated: 2025-07-24 05:01:33

Categories: cs.LG,cs.CY

Download: http://arxiv.org/abs/2405.12312v4

Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections

We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by over 50\% on two challenging tasks (book flipping and belt assembly) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks. Result videos are available at: https://compliant-residual-dagger.github.io/

Updated: 2025-07-24 04:54:15

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2506.16685v2

Neural Corrective Machine Unranking

Machine unlearning in neural information retrieval (IR) systems requires removing specific data whilst maintaining model performance. Applying existing machine unlearning methods to IR may compromise retrieval effectiveness or inadvertently expose unlearning actions due to the removal of particular items from the retrieved results presented to users. We formalise corrective unranking, which extends machine unlearning in the (neural) IR context by integrating substitute documents to preserve ranking integrity, and propose a novel teacher-student framework, Corrective unRanking Distillation (CuRD), for this task. CuRD (1) facilitates forgetting by adjusting the (trained) neural IR model such that its output relevance scores for to-be-forgotten samples mimic those of low-ranking, non-retrievable samples; (2) enables correction by fine-tuning the relevance scores of the substitute samples to closely match those of the corresponding to-be-forgotten samples; (3) seeks to preserve performance on samples that are not targeted for forgetting. We evaluate CuRD on four neural IR models (BERTcat, BERTdot, ColBERT, PARADE) using the MS MARCO and TREC CAR datasets. Experiments with forget-set sizes of 1% and 20% of the training dataset demonstrate that CuRD outperforms seven state-of-the-art baselines in terms of forgetting and correction while maintaining model retention and generalisation capabilities.
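
A heavily hedged sketch of the three training signals, with invented tensor names and MSE as a stand-in for the paper's actual distillation objectives:

    import torch
    import torch.nn.functional as F

    def curd_loss(s_forget, s_low_rank, s_substitute, s_forget_teacher,
                  s_retain, s_retain_teacher):
        forget = F.mse_loss(s_forget, s_low_rank)             # push forgotten docs down
        correct = F.mse_loss(s_substitute, s_forget_teacher)  # substitutes take their place
        retain = F.mse_loss(s_retain, s_retain_teacher)       # keep everything else intact
        return forget + correct + retain

    t = lambda: torch.randn(16)
    loss = curd_loss(t(), t(), t(), t(), t(), t())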

Updated: 2025-07-24 04:51:19

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2411.08562v2

Gen-AI Police Sketches with Stable Diffusion

This project investigates the use of multimodal AI-driven approaches to automate and enhance suspect sketching. Three pipelines were developed and evaluated: (1) baseline image-to-image Stable Diffusion model, (2) same model integrated with a pre-trained CLIP model for text-image alignment, and (3) novel approach incorporating LoRA fine-tuning of the CLIP model, applied to self-attention and cross-attention layers, and integrated with Stable Diffusion. An ablation study confirmed that fine-tuning both self- and cross-attention layers yielded the best alignment between text descriptions and sketches. Performance testing revealed that Model 1 achieved the highest structural similarity (SSIM) of 0.72 and a peak signal-to-noise ratio (PSNR) of 25 dB, outperforming Model 2 and Model 3. Iterative refinement enhanced perceptual similarity (LPIPS), with Model 3 showing improvement over Model 2 but still trailing Model 1. Qualitatively, sketches generated by Model 1 demonstrated the clearest facial features, highlighting its robustness as a baseline despite its simplicity.
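
For context, the reported PSNR follows the standard definition

\[
\mathrm{PSNR} = 10 \,\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),
\]

where $\mathrm{MAX}$ is the maximum possible pixel value (255 for 8-bit images); 25 dB thus corresponds to an RMSE of roughly 14 gray levels.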

Updated: 2025-07-24 04:41:58

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.18667v1

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material, infilling of partial structures and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.

Updated: 2025-07-24 04:36:04

Categories: cs.LG,cond-mat.mtrl-sci

Download: http://arxiv.org/abs/2402.04379v2

Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning

In this paper, we investigate the impact of compression on stochastic gradient algorithms for machine learning, a technique widely used in distributed and federated learning. We underline differences in terms of convergence rates between several unbiased compression operators, that all satisfy the same condition on their variance, thus going beyond the classical worst-case analysis. To do so, we focus on the case of least-squares regression (LSR) and analyze a general stochastic approximation algorithm for minimizing quadratic functions relying on a random field. We consider weak assumptions on the random field, tailored to the analysis (specifically, expected H\"older regularity), and on the noise covariance, enabling the analysis of various randomizing mechanisms, including compression. We then extend our results to the case of federated learning. More formally, we highlight the impact on the convergence of the covariance $\mathfrak{C}_{\mathrm{ania}}$ of the additive noise induced by the algorithm. We demonstrate despite the non-regularity of the stochastic field, that the limit variance term scales with $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$ (where $H$ is the Hessian of the optimization problem and $K$ the number of iterations) generalizing the rate for the vanilla LSR case where it is $\sigma^2 \mathrm{Tr}(H H^{-1}) / K = \sigma^2 d / K$ (Bach and Moulines, 2013). Then, we analyze the dependency of $\mathfrak{C}_{\mathrm{ania}}$ on the compression strategy and ultimately its impact on convergence, first in the centralized case, then in two heterogeneous FL frameworks.
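
As a concrete instance of an unbiased compression operator satisfying a variance condition of this kind, the classical rand-$k$ sparsifier rescales the kept coordinates so that $\mathbb{E}[\mathcal{C}(x)] = x$; a minimal NumPy sketch:

    import numpy as np

    def rand_k(x: np.ndarray, k: int, rng=None) -> np.ndarray:
        # Keep k random coordinates, zero the rest, and rescale by d/k so that
        # E[rand_k(x)] = x; the variance grows as d/k increases.
        rng = rng or np.random.default_rng()
        out = np.zeros_like(x)
        idx = rng.choice(x.size, size=k, replace=False)
        out[idx] = x[idx] * (x.size / k)
        return out

    x = np.ones(10)
    avg = np.mean([rand_k(x, 3) for _ in range(20000)], axis=0)  # close to x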

Updated: 2025-07-24 04:23:22

Domains: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2308.01358v2

EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework

Large language models (LLMs) increasingly serve as educational tools, yet evaluating their teaching capabilities remains challenging due to the resource-intensive, context-dependent, and methodologically complex nature of teacher-student interactions. We introduce EducationQ, a multi-agent dialogue framework that efficiently assesses teaching capabilities through simulated dynamic educational scenarios, featuring specialized agents for teaching, learning, and evaluation. Testing 14 LLMs across major AI Organizations (OpenAI, Meta, Google, Anthropic, and others) on 1,498 questions spanning 13 disciplines and 10 difficulty levels reveals that teaching effectiveness does not correlate linearly with model scale or general reasoning capabilities - with some smaller open-source models outperforming larger commercial counterparts in teaching contexts. This finding highlights a critical gap in current evaluations that prioritize knowledge recall over interactive pedagogy. Our mixed-methods evaluation, combining quantitative metrics with qualitative analysis and expert case studies, identifies distinct pedagogical strengths employed by top-performing models (e.g., sophisticated questioning strategies, adaptive feedback mechanisms). Human expert evaluations show 78% agreement with our automated qualitative analysis of effective teaching behaviors, validating our methodology. EducationQ demonstrates that LLMs-as-teachers require specialized optimization beyond simple scaling, suggesting next-generation educational AI prioritize targeted enhancement of specific pedagogical effectiveness.

Updated: 2025-07-24 04:20:03

Domains: cs.AI,cs.CE,cs.CL,cs.CY,cs.HC

Download: http://arxiv.org/abs/2504.14928v2

TextSAM-EUS: Text Prompt Learning for SAM to Accurately Segment Pancreatic Tumor in Endoscopic Ultrasound

Pancreatic cancer carries a poor prognosis and relies on endoscopic ultrasound (EUS) for targeted biopsy and radiotherapy. However, the speckle noise, low contrast, and unintuitive appearance of EUS make segmentation of pancreatic tumors with fully supervised deep learning (DL) models both error-prone and dependent on large, expert-curated annotation datasets. To address these challenges, we present TextSAM-EUS, a novel, lightweight, text-driven adaptation of the Segment Anything Model (SAM) that requires no manual geometric prompts at inference. Our approach leverages text prompt learning (context optimization) through the BiomedCLIP text encoder in conjunction with a LoRA-based adaptation of SAM's architecture to enable automatic pancreatic tumor segmentation in EUS, tuning only 0.86% of the total parameters. On the public Endoscopic Ultrasound Database of the Pancreas, TextSAM-EUS with automatic prompts attains 82.69% Dice and 85.28% normalized surface distance (NSD), and with manual geometric prompts reaches 83.10% Dice and 85.70% NSD, outperforming both existing state-of-the-art (SOTA) supervised DL models and foundation models (e.g., SAM and its variants). As the first attempt to incorporate prompt learning in SAM-based medical image segmentation, TextSAM-EUS offers a practical option for efficient and robust automatic EUS segmentation. Our code will be publicly available upon acceptance.
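
The text-prompt-learning component can be pictured as CoOp-style context optimization: a few learnable context embeddings are prepended to a frozen class-name embedding, and only those vectors (plus, in the paper, LoRA adapters) are trained. A minimal PyTorch sketch under those assumptions; the encoder below is a generic stand-in, not BiomedCLIP, and the LoRA and mask-decoder parts are omitted.

import torch
import torch.nn as nn

class ContextPromptLearner(nn.Module):
    """Learnable context tokens prepended to a fixed class-token embedding."""
    def __init__(self, n_ctx: int = 8, dim: int = 512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # trainable
        # Frozen embedding standing in for the tokenized class name
        self.register_buffer("cls_tok", torch.randn(1, dim))

    def forward(self) -> torch.Tensor:
        return torch.cat([self.ctx, self.cls_tok], dim=0)  # (n_ctx+1, dim)

# Frozen text-encoder stub (BiomedCLIP's text tower would go here).
encoder = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
for p in encoder.parameters():
    p.requires_grad_(False)

learner = ContextPromptLearner()
prompt = learner().unsqueeze(0)           # (1, n_ctx+1, dim)
text_feat = encoder(prompt).mean(dim=1)   # pooled prompt embedding
trainable = sum(p.numel() for p in learner.parameters() if p.requires_grad)
print(text_feat.shape, trainable)         # only the context vectors train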

Updated: 2025-07-24 04:17:06

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.18082v1

A PBN-RL-XAI Framework for Discovering a "Hit-and-Run" Therapeutic Strategy in Melanoma

Innate resistance to anti-PD-1 immunotherapy remains a major clinical challenge in metastatic melanoma, with the underlying molecular networks being poorly understood. To address this, we constructed a dynamic Probabilistic Boolean Network model using transcriptomic data from patient tumor biopsies to elucidate the regulatory logic governing therapy response. We then employed a reinforcement learning agent to systematically discover optimal, multi-step therapeutic interventions and used explainable artificial intelligence to mechanistically interpret the agent's control policy. The analysis revealed that a precisely timed, 4-step temporary inhibition of the lysyl oxidase like 2 protein (LOXL2) was the most effective strategy. Our explainable analysis showed that this ''hit-and-run" intervention is sufficient to erase the molecular signature driving resistance, allowing the network to self-correct without requiring sustained intervention. This study presents a novel, time-dependent therapeutic hypothesis for overcoming immunotherapy resistance and provides a powerful computational framework for identifying non-obvious intervention protocols in complex biological systems.
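
A probabilistic Boolean network can be simulated in a few lines: each node has candidate Boolean update rules chosen with fixed probabilities, and a "hit-and-run" intervention pins one node off for a few steps. The toy two-node network below (a self-sustaining LOXL2/resistance loop) is invented for illustration and is not the fitted melanoma network from the paper.

import random

random.seed(0)

# Candidate Boolean functions per node, each with a selection probability.
RULES = {
    "LOXL2":  [(1.0, lambda s: s["RESIST"])],
    "RESIST": [(0.9, lambda s: s["LOXL2"]), (0.1, lambda s: s["RESIST"])],
}

def step(state, knockdown=()):
    new = {}
    for node, candidates in RULES.items():
        r, acc = random.random(), 0.0
        for p, f in candidates:
            acc += p
            if r <= acc:
                new[node] = f(state)
                break
    for node in knockdown:          # forced temporary inhibition
        new[node] = False
    return new

state = {"LOXL2": True, "RESIST": True}
for t in range(12):
    state = step(state, knockdown=("LOXL2",) if t < 4 else ())  # 4-step hit-and-run
print(state)  # the resistance loop stays off after the transient inhibition ends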

Updated: 2025-07-24 04:04:13

Domains: q-bio.QM,cs.AI

Download: http://arxiv.org/abs/2507.10136v4

Adaptive Relative Pose Estimation Framework with Dual Noise Tuning for Safe Approaching Maneuvers

Accurate and robust relative pose estimation is crucial for enabling challenging Active Debris Removal (ADR) missions targeting tumbling derelict satellites such as ESA's ENVISAT. This work presents a complete pipeline integrating advanced computer vision techniques with adaptive nonlinear filtering to address this challenge. A Convolutional Neural Network (CNN), enhanced with image preprocessing, detects structural markers (corners) from chaser imagery, whose 2D coordinates are converted to 3D measurements using camera modeling. These measurements are fused within an Unscented Kalman Filter (UKF) framework, selected for its ability to handle nonlinear relative dynamics, to estimate the full relative pose. Key contributions include the integrated system architecture and a dual adaptive strategy within the UKF: dynamic tuning of the measurement noise covariance compensates for varying CNN measurement uncertainty, while adaptive tuning of the process noise covariance, utilizing measurement residual analysis, accounts for unmodeled dynamics or maneuvers online. This dual adaptation enhances robustness against both measurement imperfections and dynamic model uncertainties. The performance of the proposed adaptive integrated system is evaluated through high-fidelity simulations using a realistic ENVISAT model, comparing estimates against ground truth under various conditions, including measurement outages. This comprehensive approach offers an enhanced solution for robust onboard relative navigation, significantly advancing the capabilities required for safe proximity operations during ADR missions.
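
The dual adaptation can be sketched independently of the full UKF: the measurement covariance R is scaled from a per-detection confidence, and the process covariance Q is updated online from innovation (residual) statistics. The sketch below uses a linear Kalman filter stand-in with an invented confidence mapping and forgetting factor; it illustrates the adaptation pattern, not the paper's filter.

import numpy as np

rng = np.random.default_rng(1)
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity model
H = np.array([[1.0, 0.0]])               # observe position only
x, P = np.zeros(2), np.eye(2)
Q, alpha = 0.01 * np.eye(2), 0.95        # process noise + forgetting factor

def adapted_R(cnn_confidence: float) -> np.ndarray:
    """Inflate measurement noise when the detector is unsure."""
    return np.array([[0.1 / max(cnn_confidence, 1e-3)]])

for t in range(50):
    z = np.array([0.5 * t + rng.normal(0, 0.3)])   # noisy position measurement
    conf = rng.uniform(0.3, 1.0)                   # stand-in for CNN confidence
    # Predict
    x, P = F @ x, F @ P @ F.T + Q
    # Update with confidence-scaled R
    R = adapted_R(conf)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    innov = z - H @ x
    x = x + (K @ innov)
    P = (np.eye(2) - K @ H) @ P
    # Residual-based process-noise adaptation (Mehra-style)
    Q = alpha * Q + (1 - alpha) * (K @ np.outer(innov, innov) @ K.T)

print(x)  # estimated [position, velocity]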

Updated: 2025-07-24 04:02:42

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2507.16214v2

Neural Machine Unranking

We address the problem of machine unlearning in neural information retrieval (IR), introducing a novel task termed Neural Machine UnRanking (NuMuR). This problem is motivated by growing demands for data privacy compliance and selective information removal in neural IR systems. Existing task- or model-agnostic unlearning approaches, primarily designed for classification tasks, are suboptimal for NuMuR due to two core challenges: (1) neural rankers output unnormalised relevance scores rather than probability distributions, limiting the effectiveness of traditional teacher-student distillation frameworks; and (2) entangled data scenarios, where queries and documents appear simultaneously across both forget and retain sets, may degrade retention performance in existing methods. To address these issues, we propose Contrastive and Consistent Loss (CoCoL), a dual-objective framework. CoCoL comprises (1) a contrastive loss that reduces relevance scores on forget sets while maintaining performance on entangled samples, and (2) a consistent loss that preserves accuracy on the retain set. Extensive experiments on the MS MARCO and TREC CAR datasets, across four neural IR models, demonstrate that CoCoL achieves substantial forgetting with minimal retain and generalisation performance loss. Our method facilitates more effective and controllable data removal than existing techniques.
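
The dual objective can be read as: push relevance scores down on forget pairs, contrastively, so that entangled queries and documents are not collateral damage, while anchoring retain-set scores to the original model. A hedged PyTorch sketch of such a loss follows; the margin form is one plausible reading of "contrastive", not necessarily the paper's exact formula.

import torch
import torch.nn.functional as F

def cocol_loss(forget_scores, entangled_scores, retain_scores,
               retain_scores_orig, margin: float = 1.0, lam: float = 0.5):
    """Contrastive term + consistency term (illustrative form).

    forget_scores:      ranker scores on (q, d) pairs to be unlearned
    entangled_scores:   scores on pairs sharing a query/document with the forget set
    retain_scores:      current scores on the retain set
    retain_scores_orig: retain-set scores of the model before unlearning
    """
    # Forget pairs should score at least `margin` below entangled pairs.
    contrastive = F.relu(margin + forget_scores - entangled_scores).mean()
    # Retain-set behaviour should stay consistent with the original model.
    consistent = F.mse_loss(retain_scores, retain_scores_orig)
    return contrastive + lam * consistent

loss = cocol_loss(torch.randn(8), torch.randn(8), torch.randn(16), torch.randn(16))
print(loss.item())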

Updated: 2025-07-24 04:02:17

Domains: cs.IR,cs.AI

Download: http://arxiv.org/abs/2408.05330v3

History-Guided Video Diffusion

Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this technique to video diffusion, which generates video conditioned on a variable number of context frames, collectively referred to as history. However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT. We show that its simplest form, vanilla history guidance, already significantly improves video generation quality and temporal consistency. A more advanced method, history guidance across time and frequency further enhances motion dynamics, enables compositional generalization to out-of-distribution history, and can stably roll out extremely long videos. Project website: https://boyuan.space/history-guidance
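
Vanilla history guidance follows the CFG recipe with the history frames as the condition: run the denoiser with and without history and extrapolate between the two predictions. A schematic sketch, where the model call is a placeholder and `history=None` stands for fully dropped history frames:

import torch

def history_guided_eps(model, x_t, t, history, w: float = 2.0):
    """CFG-style history guidance: eps = eps(none) + w * (eps(history) - eps(none))."""
    eps_uncond = model(x_t, t, history=None)     # history dropped
    eps_cond = model(x_t, t, history=history)    # history conditioned
    return eps_uncond + w * (eps_cond - eps_uncond)

# Tiny stand-in denoiser so the sketch runs end to end.
def toy_model(x_t, t, history=None):
    bias = 0.0 if history is None else history.mean()
    return x_t * 0.1 + bias

x = torch.randn(2, 3, 8, 8)
out = history_guided_eps(toy_model, x, t=10, history=torch.randn(2, 3, 8, 8))
print(out.shape)  # same shape as the noisy input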

Updated: 2025-07-24 03:58:47

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2502.06764v2

PyPitfall: Dependency Chaos and Software Supply Chain Vulnerabilities in Python

Python software development heavily relies on third-party packages. Direct and transitive dependencies create a labyrinth of software supply chains. While it is convenient to reuse code, vulnerabilities within these dependency chains can propagate through dependencies, potentially affecting down-stream packages and applications. PyPI, the official Python package repository, hosts many packages and lacks a comprehensive analysis of the prevalence of vulnerable dependencies. This paper introduces PyPitfall, a quantitative analysis of vulnerable dependencies across the PyPI ecosystem. We analyzed the dependency structures of 378,573 PyPI packages and identified 4,655 packages that explicitly require at least one known-vulnerable version and 141,044 packages that permit vulnerable versions within specified ranges. By characterizing the ecosystem-wide dependency landscape and the security impact of transitive dependencies, we aim to raise awareness of Python software supply chain security.
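
The core range check, matching declared dependency specifiers against known-vulnerable releases, can be reproduced with the packaging library: a requirement "permits" a vulnerable version when its specifier set contains it. A small sketch with a made-up advisory table (PyPitfall's own pipeline and dataset operate at a far larger scale):

from packaging.requirements import Requirement
from packaging.version import Version

# Toy advisory table: package -> known-vulnerable versions (illustrative only).
VULNERABLE = {"requests": [Version("2.19.0")], "pyyaml": [Version("5.3")]}

def permitted_vulnerabilities(requirement_line: str):
    """Return the known-vulnerable versions a requirement still allows."""
    req = Requirement(requirement_line)
    bad = VULNERABLE.get(req.name.lower(), [])
    return [v for v in bad if req.specifier.contains(v)]

print(permitted_vulnerabilities("requests>=2.18,<2.20"))  # [<Version('2.19.0')>]
print(permitted_vulnerabilities("PyYAML>=5.4"))           # []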

Updated: 2025-07-24 03:58:18

Domains: cs.CR

Download: http://arxiv.org/abs/2507.18075v1

AlphaGo Moment for Model Architecture Discovery

While AI systems demonstrate exponentially improving capabilities, the pace of AI research itself remains linearly bounded by human cognitive capacity, creating an increasingly severe development bottleneck. We present ASI-Arch, the first demonstration of Artificial Superintelligence for AI research (ASI4AI) in the critical domain of neural architecture discovery--a fully autonomous system that shatters this fundamental constraint by enabling AI to conduct its own architectural innovation. Moving beyond traditional Neural Architecture Search (NAS), which is fundamentally limited to exploring human-defined spaces, we introduce a paradigm shift from automated optimization to automated innovation. ASI-Arch can conduct end-to-end scientific research in the domain of architecture discovery, autonomously hypothesizing novel architectural concepts, implementing them as executable code, training and empirically validating their performance through rigorous experimentation and past experience. ASI-Arch conducted 1,773 autonomous experiments over 20,000 GPU hours, culminating in the discovery of 106 innovative, state-of-the-art (SOTA) linear attention architectures. Like AlphaGo's Move 37 that revealed unexpected strategic insights invisible to human players, our AI-discovered architectures demonstrate emergent design principles that systematically surpass human-designed baselines and illuminate previously unknown pathways for architectural innovation. Crucially, we establish the first empirical scaling law for scientific discovery itself--demonstrating that architectural breakthroughs can be scaled computationally, transforming research progress from a human-limited to a computation-scalable process. We provide comprehensive analysis of the emergent design patterns and autonomous research capabilities that enabled these breakthroughs, establishing a blueprint for self-accelerating AI systems.

Updated: 2025-07-24 03:57:27

Domains: cs.AI

Download: http://arxiv.org/abs/2507.18074v1

Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method

Deploying large language models (LLMs) is challenging due to their massive parameters and high computational costs. Ultra low-bit quantization can significantly reduce storage and accelerate inference, but extreme compression (i.e., mean bit-width <= 2) often leads to severe performance degradation. To address this, we propose Squeeze10-LLM, effectively "squeezing" 16-bit LLMs' weights by 10 times. Specifically, Squeeze10-LLM is a staged mixed-precision post-training quantization (PTQ) framework and achieves an average of 1.6 bits per weight by quantizing 80% of the weights to 1 bit and 20% to 4 bits. We introduce Squeeze10-LLM with two key innovations: Post-Binarization Activation Robustness (PBAR) and Full Information Activation Supervision (FIAS). PBAR is a refined weight significance metric that accounts for the impact of quantization on activations, improving accuracy in low-bit settings. FIAS is a strategy that preserves full activation information during quantization to mitigate cumulative error propagation across layers. Experiments on LLaMA and LLaMA2 show that Squeeze10-LLM achieves state-of-the-art performance for sub-2bit weight-only quantization, improving average accuracy from 43% to 56% on six zero-shot classification tasks--a significant boost over existing PTQ methods. Our code will be released upon publication.
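
The headline arithmetic is direct: 0.8 x 1 bit + 0.2 x 4 bits = 1.6 bits per weight, a 10x squeeze from 16-bit weights. A toy sketch of such a staged mixed-precision assignment follows, using plain magnitude as a stand-in for the paper's PBAR significance metric:

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)

# Stand-in significance score (the paper uses PBAR, not plain magnitude).
score = np.abs(W)
cutoff = np.quantile(score, 0.8)          # top 20% kept at higher precision
high = score >= cutoff

Q = np.empty_like(W)
# 1-bit branch: sign times the mean magnitude of the binarized group
Q[~high] = np.sign(W[~high]) * np.abs(W[~high]).mean()
# 4-bit branch: 16 uniform levels over the group's range
lo, hi = W[high].min(), W[high].max()
levels = np.round((W[high] - lo) / (hi - lo) * 15).astype(int)
Q[high] = lo + levels * (hi - lo) / 15

bits = 0.8 * 1 + 0.2 * 4
print(f"avg bits/weight = {bits}, MSE = {np.mean((W - Q) ** 2):.4f}")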

Updated: 2025-07-24 03:55:19

Domains: cs.LG

Download: http://arxiv.org/abs/2507.18073v1

C-AAE: Compressively Anonymizing Autoencoders for Privacy-Preserving Activity Recognition in Healthcare Sensor Streams

Wearable accelerometers and gyroscopes encode fine-grained behavioural signatures that can be exploited to re-identify users, making privacy protection essential for healthcare applications. We introduce C-AAE, a compressive anonymizing autoencoder that marries an Anonymizing AutoEncoder (AAE) with Adaptive Differential Pulse-Code Modulation (ADPCM). The AAE first projects raw sensor windows into a latent space that retains activity-relevant features while suppressing identity cues. ADPCM then differentially encodes this latent stream, further masking residual identity information and shrinking the bitrate. Experiments on the MotionSense and PAMAP2 datasets show that C-AAE cuts user re-identification F1 scores by 10-15 percentage points relative to AAE alone, while keeping activity-recognition F1 within 5 percentage points of the unprotected baseline. ADPCM also reduces data volume by roughly 75 %, easing transmission and storage overheads. These results demonstrate that C-AAE offers a practical route to balancing privacy and utility in continuous, sensor-based activity recognition for healthcare.
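
The ADPCM stage can be pictured as classic differential coding on each latent channel: transmit a quantized difference between the sample and a running prediction, adapting the step size as you go. A 4-bit code per 16-bit latent value is exactly the roughly 75% volume reduction the abstract reports. A minimal scalar sketch; the adaptation schedule is illustrative, not the paper's:

import numpy as np

def adpcm_encode(x: np.ndarray, step: float = 0.1):
    """Quantize first differences with a crude adaptive step size."""
    pred, codes = 0.0, []
    for v in x:
        diff = v - pred
        code = int(np.clip(round(diff / step), -8, 7))   # 4-bit code
        codes.append(code)
        pred += code * step                              # decoder-side prediction
        step *= 1.2 if abs(code) >= 6 else 0.98          # adapt step size
        step = float(np.clip(step, 1e-3, 1.0))
    return codes

latent = np.cumsum(np.random.default_rng(0).normal(0, 0.05, size=200))
codes = adpcm_encode(latent)
print(len(codes), max(codes), min(codes))  # 200 samples -> 200 4-bit codes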

Updated: 2025-07-24 03:55:04

Domains: cs.LG

Download: http://arxiv.org/abs/2507.18072v1

Group Sequence Policy Optimization

This paper introduces Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant reinforcement learning algorithm for training large language models. Unlike previous algorithms that adopt token-level importance ratios, GSPO defines the importance ratio based on sequence likelihood and performs sequence-level clipping, rewarding, and optimization. We demonstrate that GSPO achieves superior training efficiency and performance compared to the GRPO algorithm, notably stabilizes Mixture-of-Experts (MoE) RL training, and has the potential for simplifying the design of RL infrastructure. These merits of GSPO have contributed to the remarkable improvements in the latest Qwen3 models.
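
The departure from token-level ratios can be sketched in a few lines: compute one length-normalized sequence likelihood ratio per sampled response and clip it as a whole, PPO-style. This is a hedged reading of the abstract; the advantage computation and the exact normalization below are our simplifications:

import torch

def gspo_loss(logp_new, logp_old, advantages, lengths, eps: float = 0.2):
    """Sequence-level clipped policy-gradient objective (simplified sketch).

    logp_new, logp_old: summed log-probabilities of each sampled response, shape (G,)
    advantages:         per-response advantages, e.g. group-normalized rewards
    lengths:            response lengths, used for the length-normalized ratio
    """
    ratio = torch.exp((logp_new - logp_old.detach()) / lengths)  # sequence-level
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

G = 8  # group size: responses sampled for one prompt
loss = gspo_loss(torch.randn(G), torch.randn(G), torch.randn(G),
                 torch.full((G,), 64.0))
print(loss.item())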

Updated: 2025-07-24 03:50:32

Domains: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.18071v1

BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference

The rapidly increasing size of large language models (LLMs) presents significant challenges in memory usage and computational costs. Quantizing both weights and activations can address these issues, with hardware-supported fine-grained scaling emerging as a promising solution to mitigate outliers. However, existing methods struggle to capture nuanced block data distributions. We propose BlockDialect, a block-wise fine-grained mixed format technique that assigns a per-block optimal number format from a formatbook for better data representation. Additionally, we introduce DialectFP4, a formatbook of FP4 variants (akin to dialects) that adapt to diverse data distributions. To leverage this efficiently, we propose a two-stage approach for online DialectFP4 activation quantization. Importantly, DialectFP4 ensures energy efficiency by selecting representable values as scaled integers compatible with low-precision integer arithmetic. BlockDialect achieves 10.78% (7.48%) accuracy gain on the LLaMA3-8B (LLaMA2-7B) model compared to MXFP4 format with lower bit usage per data, while being only 5.45% (2.69%) below full precision even when quantizing full-path matrix multiplication. Focusing on how to represent over how to scale, our work presents a promising path for energy-efficient LLM inference.
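
Per-block format selection can be pictured as: for every weight block, scale it onto each candidate FP4 "dialect" (a small set of representable values) and keep whichever dialect minimizes round-trip error. A toy numpy sketch with made-up dialects; the paper's formatbook and two-stage activation scheme are more involved:

import numpy as np

# Toy formatbook: each "dialect" is a set of representable magnitudes.
FORMATBOOK = {
    "e2m1": np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6]),          # MXFP4-like
    "dense_low": np.array([0, 0.25, 0.5, 0.75, 1, 1.5, 2, 3]),  # finer near zero
}

def quantize_block(block, dialect):
    grid = np.concatenate([-dialect[::-1], dialect])          # signed values
    scale = np.abs(block).max() / grid.max()
    idx = np.abs(block[:, None] / scale - grid[None, :]).argmin(axis=1)
    return grid[idx] * scale

def best_dialect(block):
    errs = {name: np.mean((block - quantize_block(block, d)) ** 2)
            for name, d in FORMATBOOK.items()}
    return min(errs, key=errs.get)

rng = np.random.default_rng(0)
for blk in [rng.normal(size=32), rng.laplace(size=32)]:
    print(best_dialect(blk))  # heavier-tailed blocks prefer different dialects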

Updated: 2025-07-24 03:46:03

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2501.01144v5

Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents

Accurate modeling of physical systems governed by partial differential equations is a central challenge in scientific computing. In oceanography, high-resolution current data are critical for coastal management, environmental monitoring, and maritime safety. However, available satellite products, such as Copernicus data for sea water velocity at ~0.08 degrees spatial resolution and global ocean models, often lack the spatial granularity required for detailed local analyses. In this work, we (a) introduce a supervised deep learning framework based on neural operators for solving PDEs and providing arbitrary resolution solutions, and (b) propose downscaling models with an application to Copernicus ocean current data. Additionally, our method can model surrogate PDEs and predict solutions at arbitrary resolution, regardless of the input resolution. We evaluated our model on real-world Copernicus ocean current data and synthetic Navier-Stokes simulation datasets.
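
One way to read "arbitrary resolution solutions" is a decoder queried at continuous coordinates: once a latent field is produced, the surrogate can be evaluated on any lattice the caller likes. A schematic PyTorch sketch with a generic coordinate decoder, not the paper's operator architecture:

import torch
import torch.nn as nn

class CoordinateDecoder(nn.Module):
    """Maps (latent code, query coordinate) -> field value at that point."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 2, 64), nn.GELU(), nn.Linear(64, 2))  # (u, v)

    def forward(self, z, coords):
        # coords: (N, 2) in [0, 1]^2, at any resolution the caller likes
        zs = z.expand(coords.shape[0], -1)
        return self.net(torch.cat([zs, coords], dim=-1))

dec = CoordinateDecoder()
z = torch.randn(1, 32)
for res in (8, 64, 512):  # coarse Copernicus-like grid up to a fine local grid
    ys, xs = torch.meshgrid(torch.linspace(0, 1, res),
                            torch.linspace(0, 1, res), indexing="ij")
    coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2)
    print(res, dec(z, coords).shape)  # (res*res, 2) current components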

Updated: 2025-07-24 03:42:06

Domains: cs.LG,cs.CE

Download: http://arxiv.org/abs/2507.18067v1

Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation

Variations in medical imaging modalities and individual anatomical differences pose challenges to cross-modality generalization in multi-modal tasks. Existing methods often concentrate exclusively on common anatomical patterns, thereby neglecting individual differences and consequently limiting their generalization performance. This paper emphasizes the critical role of learning individual-level invariance, i.e., personalized representation $\mathbb{X}_h$, to enhance multi-modality generalization under both homogeneous and heterogeneous settings. It reveals that mappings from an individual's biological profile to different medical modalities remain static across the population, which is implied in the personalization process. We propose a two-stage approach: pre-training with invariant representation $\mathbb{X}_h$ for personalization, then fine-tuning for diverse downstream tasks. We provide both theoretical and empirical evidence demonstrating the feasibility and advantages of personalization, showing that our approach yields greater generalizability and transferability across diverse multi-modal medical tasks compared to methods lacking personalization. Extensive experiments further validate that our approach significantly enhances performance in various generalization scenarios.

Updated: 2025-07-24 03:38:33

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.06106v4

Fixing the Pitfalls of Probabilistic Time-Series Forecasting Evaluation by Kernel Quadrature

Despite the significance of probabilistic time-series forecasting models, their evaluation metrics often involve intractable integrations. The most widely used metric, the continuous ranked probability score (CRPS), is a strictly proper scoring function; however, its computation requires approximation. We found that popular CRPS estimators--specifically, the quantile-based estimator implemented in the widely used GluonTS library and the probability-weighted moment approximation--both exhibit inherent estimation biases. These biases lead to crude approximations, resulting in improper rankings of forecasting model performance when CRPS values are close. To address this issue, we introduced a kernel quadrature approach that leverages an unbiased CRPS estimator and employs cubature construction for scalable computation. Empirically, our approach consistently outperforms the two widely used CRPS estimators.
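
The bias the paper targets is visible in the Monte Carlo form of the CRPS. With samples $x_1, \dots, x_n$ from the forecast and observation $y$, $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, and the second term is estimated without bias only when the double sum excludes $i = j$ (divisor $n(n-1)$ rather than $n^2$). A numpy comparison of the two estimators:

import numpy as np

rng = np.random.default_rng(0)

def crps_biased(x, y):
    """Common plug-in estimator: the n**2 divisor underestimates E|X - X'|."""
    n = x.size
    return np.abs(x - y).mean() - 0.5 * np.abs(x[:, None] - x[None, :]).sum() / n**2

def crps_unbiased(x, y):
    n = x.size
    pair = np.abs(x[:, None] - x[None, :]).sum()          # diagonal is zero
    return np.abs(x - y).mean() - 0.5 * pair / (n * (n - 1))

y, n = 0.3, 20
est_b = np.mean([crps_biased(rng.normal(size=n), y) for _ in range(5000)])
est_u = np.mean([crps_unbiased(rng.normal(size=n), y) for _ in range(5000)])
# Closed form for an N(0,1) forecast at y = 0.3 is about 0.269;
# the biased estimator lands near 0.298 for n = 20.
print(f"biased: {est_b:.4f}  unbiased: {est_u:.4f}")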

Updated: 2025-07-24 03:38:00

Domains: stat.ML,cs.LG,62C10, 62F15

Download: http://arxiv.org/abs/2503.06079v2

TELEVAL: A Dynamic Benchmark Designed for Spoken Language Models in Chinese Interactive Scenarios

Spoken language models (SLMs) have seen rapid progress in recent years, along with the development of numerous benchmarks for evaluating their performance. However, most existing benchmarks primarily focus on evaluating whether SLMs can perform complex tasks comparable to those tackled by large language models (LLMs), often failing to align with how users naturally interact in real-world conversational scenarios. In this paper, we propose TELEVAL, a dynamic benchmark specifically designed to evaluate SLMs' effectiveness as conversational agents in realistic Chinese interactive settings. TELEVAL defines three evaluation dimensions: Explicit Semantics, Paralinguistic and Implicit Semantics, and System Abilities. It adopts a dialogue format consistent with real-world usage and evaluates text and audio outputs separately. TELEVAL particularly focuses on the model's ability to extract implicit cues from user speech and respond appropriately without additional instructions. Our experiments demonstrate that despite recent progress, existing SLMs still have considerable room for improvement in natural conversational tasks. We hope that TELEVAL can serve as a user-centered evaluation framework that directly reflects the user experience and contributes to the development of more capable dialogue-oriented SLMs.

Updated: 2025-07-24 03:23:55

Domains: cs.CL,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2507.18061v1

Multi-Agent Guided Policy Optimization

Due to practical constraints such as partial observability and limited communication, Centralized Training with Decentralized Execution (CTDE) has become the dominant paradigm in cooperative Multi-Agent Reinforcement Learning (MARL). However, existing CTDE methods often underutilize centralized training or lack theoretical guarantees. We propose Multi-Agent Guided Policy Optimization (MAGPO), a novel framework that better leverages centralized training by integrating centralized guidance with decentralized execution. MAGPO uses an auto-regressive joint policy for scalable, coordinated exploration and explicitly aligns it with decentralized policies to ensure deployability under partial observability. We provide theoretical guarantees of monotonic policy improvement and empirically evaluate MAGPO on 43 tasks across 6 diverse environments. Results show that MAGPO consistently outperforms strong CTDE baselines and matches or surpasses fully centralized approaches, offering a principled and practical solution for decentralized multi-agent learning. Our code and experimental data can be found in https://github.com/liyheng/MAGPO.

Updated: 2025-07-24 03:22:21

Domains: cs.AI,cs.MA

Download: http://arxiv.org/abs/2507.18059v1

Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias

Generated texts from large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics. These findings motivate research efforts aiming to understand and measure such effects. This paper introduces a causal formulation for bias measurement in generative language models. Based on this theoretical foundation, we outline a list of desiderata for designing robust bias benchmarks. We then propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias. We test several state-of-the-art open-source LLMs on OccuGender, including Llama, Mistral, and their instruction-tuned versions. The results show that these models exhibit substantial occupational gender bias. Lastly, we discuss prompting strategies for bias mitigation and an extension of our causal formulation to illustrate the generalizability of our framework. Our code and data are available at https://github.com/chenyuen0103/gender-bias.

Updated: 2025-07-24 03:20:40

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2212.10678v4

A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models

The rapid advancements in generative AI and large language models (LLMs) have opened up new avenues for producing synthetic data, particularly in the realm of structured tabular formats, such as product reviews. Despite the potential benefits, concerns regarding privacy leakage have surfaced, especially when personal information is utilized in the training datasets. In addition, there is an absence of a comprehensive evaluation framework capable of quantitatively measuring the quality of the generated synthetic data and their utility for downstream tasks. In response to this gap, we introduce SynEval, an open-source evaluation framework designed to assess the fidelity, utility, and privacy preservation of synthetically generated tabular data via a suite of diverse evaluation metrics. We validate the efficacy of our proposed framework - SynEval - by applying it to synthetic product review data generated by three state-of-the-art LLMs: ChatGPT, Claude, and Llama. Our experimental findings illuminate the trade-offs between various evaluation metrics in the context of synthetic data generation. Furthermore, SynEval stands as a critical instrument for researchers and practitioners engaged with synthetic tabular data, empowering them to judiciously determine the suitability of the generated data for their specific applications, with an emphasis on upholding user privacy.

Updated: 2025-07-24 03:19:19

Domains: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.14445v2

Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs

The increasing use of synthetic data generated by Large Language Models (LLMs) presents both opportunities and challenges in data-driven applications. While synthetic data provides a cost-effective, scalable alternative to real-world data to facilitate model training, its diversity and privacy risks remain underexplored. Focusing on text-based synthetic data, we propose a comprehensive set of metrics to quantitatively assess the diversity (i.e., linguistic expression, sentiment, and user perspective), and privacy (i.e., re-identification risk and stylistic outliers) of synthetic datasets generated by several state-of-the-art LLMs. Experiment results reveal significant limitations in LLMs' capabilities in generating diverse and privacy-preserving synthetic data. Guided by the evaluation results, a prompt-based approach is proposed to enhance the diversity of synthetic reviews while preserving reviewer privacy.

Updated: 2025-07-24 03:12:16

标题: 使用LLMs进行多样化写作风格的隐私保护合成评论生成

摘要: 随着大型语言模型(LLMs)生成的合成数据的增加,数据驱动应用程序面临机遇和挑战。虽然合成数据提供了一种成本效益高、可扩展的替代方案以促进模型训练,但其多样性和隐私风险仍未被充分探讨。针对基于文本的合成数据,我们提出了一套全面的指标,定量评估多个最先进的LLMs生成的合成数据集的多样性(即语言表达、情感和用户观点)和隐私(即重新识别风险和风格异常)。实验结果显示,LLMs在生成多样性和保护隐私的合成数据方面存在重要限制。在评估结果的指导下,提出了一种基于提示的方法,以增强合成评论的多样性同时保护评论者的隐私。

更新时间: 2025-07-24 03:12:16

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2507.18055v1
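
For intuition about the two metric families, the following is a rough sketch under stated assumptions: lexical diversity via distinct-n, and re-identification risk proxied by TF-IDF nearest-neighbor similarity; the 0.8 threshold and all names are invented for illustration and are not the paper's definitions.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def distinct_n(texts, n=2):
        # fraction of unique n-grams: a simple lexical-diversity proxy
        grams, total = set(), 0
        for t in texts:
            toks = t.split()
            ngrams = list(zip(*[toks[i:] for i in range(n)]))
            grams.update(ngrams)
            total += len(ngrams)
        return len(grams) / max(total, 1)

    def reid_rate(real_texts, synth_texts, threshold=0.8):
        # share of synthetic reviews whose nearest real review is suspiciously similar
        vec = TfidfVectorizer().fit(real_texts + synth_texts)
        sims = cosine_similarity(vec.transform(synth_texts), vec.transform(real_texts))
        return float((sims.max(axis=1) > threshold).mean())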

An Empirical Study on Virtual Reality Software Security Weaknesses

Virtual Reality (VR) has emerged as a transformative technology across industries, yet its security weaknesses, including vulnerabilities, are underinvestigated. This study investigates 334 VR projects hosted on GitHub, examining 1,681 software security weaknesses to understand: what types of weaknesses are prevalent in VR software; when and how weaknesses are introduced; how long they have survived; and how they have been removed. Due to the limited availability of VR software security weaknesses in public databases (e.g., the National Vulnerability Database or NVD), we prepare the first systematic dataset of VR software security weaknesses by introducing a novel framework to collect such weaknesses from GitHub commit data. Our empirical study on the dataset leads to useful insights, including: (i) VR weaknesses are heavily skewed toward user interface weaknesses, followed by resource-related weaknesses; (ii) VR development tools pose higher security risks than VR applications; (iii) VR security weaknesses are often introduced at the VR software birth time.

Updated: 2025-07-24 03:05:47

标题: 关于虚拟现实软件安全弱点的实证研究

摘要: 虚拟现实(VR)已经成为跨行业的一种变革性技术,然而其安全弱点,包括漏洞,尚未得到充分研究。本研究调查了在GitHub上托管的334个VR项目,审查了1,681个软件安全弱点,以了解:VR软件中哪些类型的弱点最为普遍;弱点是何时和如何引入的;它们存在了多长时间;以及它们是如何被消除的。由于公共数据库(如国家漏洞数据库或NVD)中VR软件安全弱点的有限可用性,我们通过引入一种新颖的框架从GitHub提交数据中收集这些弱点,准备了第一个系统化的VR软件安全弱点数据集。我们对该数据集进行的实证研究提供了有用的见解,包括:(i)VR弱点在很大程度上偏向用户界面弱点,其次是与资源相关的弱点;(ii)VR开发工具比VR应用程序带来更高的安全风险;(iii)VR安全弱点通常在VR软件诞生时引入。

更新时间: 2025-07-24 03:05:47

领域: cs.CR

下载: http://arxiv.org/abs/2507.17324v2
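
The collection framework is described only at a high level, so the sketch below is a toy approximation under invented keyword lists: scan commit messages for security-weakness cues and bucket them into the coarse categories the findings mention.

    import re

    WEAKNESS_CUES = {
        "user-interface": re.compile(r"\b(ui|overlay|focus|input)\s+(bug|leak|spoof)", re.I),
        "resource":       re.compile(r"\b(memory|handle|texture)\s+leak\b", re.I),
    }

    def bucket_commits(commit_messages):
        # group commit messages by the coarse weakness category they hint at
        buckets = {category: [] for category in WEAKNESS_CUES}
        for msg in commit_messages:
            for category, pattern in WEAKNESS_CUES.items():
                if pattern.search(msg):
                    buckets[category].append(msg)
        return buckets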

From Hypothesis to Publication: A Comprehensive Survey of AI-Driven Research Support Systems

Research is a fundamental process driving the advancement of human civilization, yet it demands substantial time and effort from researchers. In recent years, the rapid development of artificial intelligence (AI) technologies has inspired researchers to explore how AI can accelerate and enhance research. To monitor relevant advancements, this paper presents a systematic review of the progress in this domain. Specifically, we organize the relevant studies into three main categories: hypothesis formulation, hypothesis validation, and manuscript publication. Hypothesis formulation involves knowledge synthesis and hypothesis generation. Hypothesis validation includes the verification of scientific claims, theorem proving, and experiment validation. Manuscript publication encompasses manuscript writing and the peer review process. Furthermore, we identify and discuss the current challenges faced in these areas, as well as potential future directions for research. Finally, we also offer a comprehensive overview of existing benchmarks and tools across various domains that support the integration of AI into the research process. We hope this paper serves as an introduction for beginners and fosters future research. Resources have been made publicly available at https://github.com/zkzhou126/AI-for-Research.

Updated: 2025-07-24 02:59:25

标题: 从假设到出版:人工智能驱动的研究支持系统综合调查

摘要: 研究是推动人类文明进步的基本过程,但它需要研究人员投入大量时间和精力。近年来,人工智能(AI)技术的快速发展激发了研究人员探索AI如何加速和增强研究的兴趣。为了监测相关进展,本文系统地回顾了该领域的进展。具体而言,我们将相关研究分为三大类别:假设制定、假设验证和手稿发表。假设制定涉及知识综合和假设生成。假设验证包括科学论断的验证、定理证明和实验验证。手稿发表包括手稿撰写和同行评审过程。此外,我们还识别并讨论了在这些领域面临的当前挑战,以及未来研究的潜在方向。最后,我们还提供了跨各个领域支持将AI整合到研究过程中的现有基准和工具的全面概述。我们希望本文能为初学者提供介绍,并促进未来研究。资源已公开发布在https://github.com/zkzhou126/AI-for-Research。

更新时间: 2025-07-24 02:59:25

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2503.01424v3

RECALLED: An Unbounded Resource Consumption Attack on Large Vision-Language Models

Resource Consumption Attacks (RCAs) have emerged as a significant threat to the deployment of Large Language Models (LLMs). With the integration of vision modalities, additional attack vectors exacerbate the risk of RCAs in large vision-language models (LVLMs). However, existing red-teaming studies have largely overlooked visual inputs as a potential attack surface, resulting in insufficient mitigation strategies against RCAs in LVLMs. To address this gap, we propose RECALLED (\textbf{RE}source \textbf{C}onsumption \textbf{A}ttack on \textbf{L}arge Vision-\textbf{L}anguag\textbf{E} Mo\textbf{D}els), the first approach for exploiting visual modalities to trigger unbounded RCAs red-teaming. First, we present \textit{Vision Guided Optimization}, a fine-grained pixel-level optimization, to obtain \textit{Output Recall} adversarial perturbations, which can induce repeating output. Then, we inject the perturbations into visual inputs, triggering unbounded generations to achieve the goal of RCAs. Additionally, we introduce \textit{Multi-Objective Parallel Losses} to generate universal attack templates and resolve optimization conflicts when intending to implement parallel attacks. Empirical results demonstrate that RECALLED increases service response latency by over 26 $\uparrow$, resulting in an additional 20\% increase in GPU utilization and memory consumption. Our study exposes security vulnerabilities in LVLMs and establishes a red-teaming framework that can facilitate future defense development against RCAs.

Updated: 2025-07-24 02:58:16

标题: RECALLED:针对大型视觉语言模型的无界资源消耗攻击

摘要: 资源消耗攻击(RCAs)已经成为大型语言模型(LLMs)部署的重要威胁。随着视觉模态的整合,附加的攻击向量加剧了大型视觉-语言模型(LVLMs)中RCAs的风险。然而,现有的红队研究主要忽视了视觉输入作为潜在的攻击面,导致针对LVLMs中RCAs的缓解策略不足。为了填补这一空白,我们提出了RECALLED(REsource Consumption Attack on Large Vision-Language Models),这是第一种利用视觉模态触发无界RCAs的红队方法。首先,我们提出了视觉引导优化,一个细粒度的像素级优化,以获得Output Recall对抗扰动,可以诱导出重复输出。然后,我们将扰动注入到视觉输入中,触发无界生成以实现RCAs的目标。此外,我们引入了多目标并行损失,以生成通用的攻击模板,并在实施并行攻击时解决优化冲突。实证结果表明,RECALLED使服务响应延迟增加了26%以上,导致GPU利用率和内存消耗额外增加了20%。我们的研究揭示了LVLMs中的安全漏洞,并建立了一个红队框架,可以促进未来对抗RCAs的防御开发。

更新时间: 2025-07-24 02:58:16

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2507.18053v1
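
The optimization pattern can be sketched generically; everything below (the model interface, loss, and hyperparameters) is an assumption for illustration, not the authors' code: a small pixel perturbation is tuned so the model keeps favoring one repeated token, which drives unbounded generation.

    import torch

    def output_recall_perturbation(model, image, target_token,
                                   steps=100, eps=8 / 255, lr=1e-2):
        # `model` is a hypothetical stand-in mapping an image to next-token logits
        delta = torch.zeros_like(image, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            logits = model(image + delta)             # [..., vocab_size]
            loss = -logits[..., target_token].mean()  # reward the repeated token
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)               # keep the edit imperceptible
        return (image + delta).detach()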

Segmentation-free Goodness of Pronunciation

Mispronunciation detection and diagnosis (MDD) is a significant part in modern computer aided language learning (CALL) systems. Within MDD, phoneme-level pronunciation assessment is key to helping L2 learners improve their pronunciation. However, most systems are based on a form of goodness of pronunciation (GOP) which requires pre-segmentation of speech into phonetic units. This limits the accuracy of these methods and the possibility to use modern CTC-based acoustic models for their evaluation. In this study, we first propose self-alignment GOP (GOP-SA) that enables the use of CTC-trained ASR models for MDD. Next, we define a more general alignment-free method that takes all possible alignments of the target phoneme into account (GOP-AF). We give a theoretical account of our definition of GOP-AF, an implementation that solves potential numerical issues as well as a proper normalization which makes the method applicable with acoustic models with different peakiness over time. We provide extensive experimental results on the CMU Kids and Speechocean762 datasets comparing the different definitions of our methods, estimating the dependency of GOP-AF on the peakiness of the acoustic models and on the amount of context around the target phoneme. Finally, we compare our methods with recent studies over the Speechocean762 data showing that the feature vectors derived from the proposed method achieve state-of-the-art results on phoneme-level pronunciation assessment.

Updated: 2025-07-24 02:55:40

标题: 无需分割的发音优度(Goodness of Pronunciation)

摘要: 发音错误检测与诊断(MDD)是现代计算机辅助语言学习(CALL)系统中的重要部分。在MDD中,音素级别的发音评估对帮助第二语言学习者改善发音至关重要。然而,大多数系统基于某种形式的发音优度(GOP),这需要将语音预先分割成音素单元。这限制了这些方法的准确性,也限制了使用现代基于CTC的声学模型进行评估的可能性。在本研究中,我们首先提出了自对齐GOP(GOP-SA),该方法使得可以使用经CTC训练的ASR模型进行MDD。接下来,我们定义了一种更通用的无对齐方法,考虑了目标音素的所有可能的对齐(GOP-AF)。我们对GOP-AF的定义进行了理论解释,并实现了解决潜在数值问题的方法,以及使该方法适用于随时间变化的不同尖锐度的声学模型的适当归一化。我们在CMU Kids和Speechocean762数据集上提供了广泛的实验结果,比较了我们方法的不同定义,估计了GOP-AF对声学模型尖锐度和目标音素周围上下文量的依赖性。最后,我们将我们的方法与最近对Speechocean762数据进行的研究进行了比较,结果显示,从所提出的方法派生的特征向量在音素级发音评估方面达到了最先进的水平。

更新时间: 2025-07-24 02:55:40

领域: eess.AS,cs.AI,cs.CL

下载: http://arxiv.org/abs/2507.16838v2
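
For orientation, the classical segmentation-based GOP of Witt and Young and the alignment-free idea can be rendered schematically as follows (this is background notation, not the paper's exact normalized definition):

$$\mathrm{GOP}(p) \;=\; \frac{1}{T_p}\,\log\frac{p(\mathbf{O}_p \mid p)}{\max_{q \in \mathcal{Q}} p(\mathbf{O}_p \mid q)}, \qquad \mathrm{GOP\text{-}AF}(p) \;\propto\; \log \sum_{\pi \in \mathcal{A}(p)} \prod_{t=1}^{T} P(\pi_t \mid \mathbf{o}_t),$$

where $\mathbf{O}_p$ is the pre-segmented span of length $T_p$ assigned to phoneme $p$, $\mathcal{Q}$ is the phoneme inventory, and $\mathcal{A}(p)$ is the set of CTC alignments of the whole utterance consistent with the target phoneme in context, so no pre-segmentation is needed.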

Enhancing Scene Transition Awareness in Video Generation via Post-Training

Recent advances in AI-generated video have shown strong performance on \emph{text-to-video} tasks, particularly for short clips depicting a single scene. However, current models struggle to generate longer videos with coherent scene transitions, primarily because they cannot infer when a transition is needed from the prompt. Most open-source models are trained on datasets consisting of single-scene video clips, which limits their capacity to learn and respond to prompts requiring multiple scenes. Developing scene transition awareness is essential for multi-scene generation, as it allows models to identify and segment videos into distinct clips by accurately detecting transitions. To address this, we propose the \textbf{Transition-Aware Video} (TAV) dataset, which consists of preprocessed video clips with multiple scene transitions. Our experiment shows that post-training on the \textbf{TAV} dataset improves prompt-based scene transition understanding, narrows the gap between required and generated scenes, and maintains image quality.

Updated: 2025-07-24 02:50:26

标题: 通过后训练增强视频生成中的场景转换意识

摘要: 最近人工智能生成视频方面取得了显著进展,在文本到视频任务中表现出色,特别是对于呈现单一场景的短视频片段。然而,当前模型在生成具有连贯场景转换的长视频方面存在困难,主要是因为它们无法从提示中推断何时需要转换。大多数开源模型都是在由单一场景视频片段组成的数据集上训练的,这限制了它们学习和响应需要多个场景的提示的能力。发展场景转换意识对于多场景生成至关重要,因为它使模型能够通过准确检测转换来识别和分割视频为不同的片段。 为了解决这个问题,我们提出了Transition-Aware Video(TAV)数据集,其中包含经过预处理的具有多个场景转换的视频片段。我们的实验表明,在TAV数据集上进行后训练可以改善基于提示的场景转换理解,缩小所需和生成的场景之间的差距,并保持图像质量。

更新时间: 2025-07-24 02:50:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2507.18046v1

Synthetic Data Generation for Phrase Break Prediction with Large Language Model

Current approaches to phrase break prediction address crucial prosodic aspects of text-to-speech systems but heavily rely on vast human annotations from audio or text, incurring significant manual effort and cost. Inherent variability in the speech domain, driven by phonetic factors, further complicates acquiring consistent, high-quality data. Recently, large language models (LLMs) have shown success in addressing data challenges in NLP by generating tailored synthetic data while reducing manual annotation needs. Motivated by this, we explore leveraging LLM to generate synthetic phrase break annotations, addressing the challenges of both manual annotation and speech-related tasks by comparing with traditional annotations and assessing effectiveness across multiple languages. Our findings suggest that LLM-based synthetic data generation effectively mitigates data challenges in phrase break prediction and highlights the potential of LLMs as a viable solution for the speech domain.

Updated: 2025-07-24 02:45:03

标题: 使用大型语言模型进行短语断点预测的合成数据生成

摘要: 目前的短语断点预测方法着重处理文本到语音系统中关键的韵律方面,但严重依赖于来自音频或文本的大量人工标注,需要大量的手动工作和成本。由音系因素驱动的言语领域固有变化进一步使获取一致高质量数据变得更加复杂。最近,大型语言模型(LLMs)已经显示出在NLP中解决数据挑战的成功,通过生成定制的合成数据来减少手动标注的需求。受此启发,我们探索利用LLM生成合成短语断点注释,通过与传统注释进行比较并评估其在多种语言中的有效性,来解决手动标注和与言语相关任务的挑战。我们的研究结果表明,基于LLM的合成数据生成有效地缓解了短语断点预测中的数据挑战,并突显了LLMs作为言语领域可行解决方案的潜力。

更新时间: 2025-07-24 02:45:03

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18044v1
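
A hedged sketch of the general recipe (the exact prompt and break symbol are assumptions, not taken from the paper): ask an LLM to insert a break marker at phrase boundaries, yielding synthetic annotations for training a TTS front-end.

    def phrase_break_prompt(sentence: str, symbol: str = "<break>") -> str:
        # build an annotation request for any chat-completion API
        return (
            f"Insert the token {symbol} at every natural phrase boundary in the "
            "sentence below, changing nothing else.\n"
            f"Sentence: {sentence}\nAnnotated:"
        )

    # llm() stands in for any chat-completion call:
    # llm(phrase_break_prompt("The quick brown fox jumps over the lazy dog"))
    # -> "The quick brown fox <break> jumps over the lazy dog"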

SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis

The integration of open-source third-party library dependencies in Java development introduces significant security risks when these libraries contain known vulnerabilities. Existing Software Composition Analysis (SCA) tools struggle to effectively detect vulnerable API usage from these libraries due to limitations in understanding API usage semantics and computational challenges in analyzing complex codebases, leading to inaccurate vulnerability alerts that burden development teams and delay critical security fixes. To address these challenges, we proposed SAVANT by leveraging two insights: proof-of-vulnerability test cases demonstrate how vulnerabilities can be triggered in specific contexts, and Large Language Models (LLMs) can understand code semantics. SAVANT combines semantic preprocessing with LLM-powered context analysis for accurate vulnerability detection. SAVANT first segments source code into meaningful blocks while preserving semantic relationships, then leverages LLM-based reflection to analyze API usage context and determine actual vulnerability impacts. Our evaluation on 55 real-world applications shows that SAVANT achieves 83.8% precision, 73.8% recall, 69.0% accuracy, and 78.5% F1-score, outperforming state-of-the-art SCA tools.

Updated: 2025-07-24 02:36:20

标题: SAVANT:通过语义引导的可达性分析检测应用程序依赖的漏洞

摘要: 在Java开发中集成开源第三方库依赖会在这些库含有已知漏洞时引入重大安全风险。现有的软件组合分析(SCA)工具难以有效检测这些库中脆弱API的使用,原因在于对API使用语义的理解限制和分析复杂代码库的计算挑战,导致不准确的漏洞警报给开发团队带来负担并延迟关键安全修复。 为解决这些挑战,我们提出了SAVANT,通过利用两个见解:漏洞证明测试案例展示了漏洞如何在特定上下文中触发,以及大型语言模型(LLMs)能理解代码语义。SAVANT将语义预处理与LLM驱动的上下文分析相结合,以实现准确的漏洞检测。SAVANT首先将源代码分割成有意义的块,同时保留语义关系,然后利用基于LLM的反射来分析API使用上下文,并确定实际漏洞影响。我们在55个真实应用程序上的评估结果显示,SAVANT实现了83.8%的精确度,73.8%的召回率,69.0%的准确度和78.5%的F1分数,超过了最先进的SCA工具。

更新时间: 2025-07-24 02:36:20

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2506.17798v2
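
The two-stage pipeline can be sketched as follows; the regex segmentation, prompt wording, and `llm` callable are illustrative stand-ins (SAVANT itself works on Java with semantics-preserving segmentation):

    import re

    def segment_methods(java_source: str):
        # crude method-level segmentation; a real system would use a parser
        pattern = re.compile(r"(public|private|protected)[^{;]*\{")
        starts = [m.start() for m in pattern.finditer(java_source)]
        return [java_source[s:e] for s, e in zip(starts, starts[1:] + [len(java_source)])]

    def flag_vulnerable_usage(blocks, vulnerable_api, llm):
        hits = []
        for block in blocks:
            if vulnerable_api in block:  # cheap filter before the LLM call
                prompt = (f"Given this Java snippet:\n{block}\n"
                          f"Is the call to {vulnerable_api} reachable with "
                          "attacker-influenced input? Answer yes/no with a reason.")
                hits.append((block, llm(prompt)))
        return hits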

GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs

Inference-time steering methods offer a lightweight alternative to fine-tuning large language models (LLMs) and vision-language models (VLMs) by modifying internal activations at test time without updating model weights. However, most existing approaches rely on fixed, global intervention vectors, overlook the causal influence of individual input tokens, and fail to leverage informative gradients from the model's logits, particularly in multimodal settings where visual and textual inputs contribute unevenly. To address these limitations, we introduce GrAInS, an inference-time steering approach that operates across both language-only and vision-language models and tasks. GrAInS uses contrastive, gradient-based attribution via Integrated Gradients to identify the top-k most influential tokens, both positively and negatively attributed based on their contribution to preferred versus dispreferred outputs. These tokens are then used to construct directional steering vectors that capture semantic shifts from undesirable to desirable behavior. During inference, GrAInS adjusts hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale. This enables fine-grained, interpretable, and modular control over model behavior, without retraining or auxiliary supervision. Empirically, GrAInS consistently outperforms both fine-tuning and existing steering baselines: it achieves a 13.22% accuracy gain on TruthfulQA using Llama-3.1-8B, reduces hallucination rates on MMHal-Bench from 0.624 to 0.514 with LLaVA-1.6-7B, and improves alignment win rates on SPA-VL by 8.11%, all while preserving the model's fluency and general capabilities.

Updated: 2025-07-24 02:34:13

标题: GrAInS:用于LLMs和VLMs推理时导向的基于梯度的归因

摘要: 推理时导向方法为微调大语言模型(LLMs)和视觉语言模型(VLMs)提供了一种轻量级替代方案,通过在测试时修改内部激活而不更新模型权重。然而,大多数现有方法依赖于固定的全局干预向量,忽视了单个输入标记的因果影响,并未充分利用模型logits的信息梯度,特别是在视觉和文本输入贡献不均匀的多模态设置中。为了解决这些限制,我们介绍了GrAInS,一种适用于纯语言模型和视觉-语言模型及任务的推理时导向方法。GrAInS使用对比性的、基于梯度的归因,通过积分梯度(Integrated Gradients)根据各标记对首选输出与非首选输出的贡献,确定正负归因最强的前k个标记。然后利用这些标记构建方向导向向量,捕捉从不良行为到良好行为的语义转变。在推理过程中,GrAInS根据标记级别的归因信号调整transformer层的隐藏激活,并对激活进行归一化以保留表征尺度。这使得能够对模型行为进行细粒度、可解释和模块化的控制,无需重新训练或辅助监督。从经验上看,GrAInS始终优于微调和现有的导向基线:在TruthfulQA上使用Llama-3.1-8B,准确率提高了13.22%;在MMHal-Bench上使用LLaVA-1.6-7B,将幻觉率从0.624降低到0.514;并在SPA-VL上将对齐胜率提高8.11%,同时保持模型的流畅性和一般能力。

更新时间: 2025-07-24 02:34:13

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2507.18043v1
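
The two ingredients the abstract names can be sketched as below; tensor shapes, the contrast function `f`, and the scaling constant are assumptions, not the released implementation:

    import torch

    def integrated_gradients(f, emb, baseline, steps=32):
        # f: token embeddings [steps, seq, dim] -> scalar contrast per sequence,
        # e.g. log p(preferred) - log p(dispreferred)
        alphas = torch.linspace(0, 1, steps).view(-1, 1, 1)
        path = baseline + alphas * (emb - baseline)    # straight-line path
        path.requires_grad_(True)
        f(path).sum().backward()
        avg_grad = path.grad.mean(dim=0)               # [seq, dim]
        return ((emb - baseline) * avg_grad).sum(-1)   # per-token attribution [seq]

    def steer(hidden, direction, alpha=4.0):
        # shift hidden states toward the desired behavior, then restore their norms
        norms = hidden.norm(dim=-1, keepdim=True)
        steered = hidden + alpha * direction
        return steered * (norms / steered.norm(dim=-1, keepdim=True))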

Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems

Multi-solid systems are foundational to a wide range of real-world applications, yet modeling their complex interactions remains challenging. Existing deep learning methods predominantly rely on implicit modeling, where the factors influencing solid deformation are not explicitly represented but are instead indirectly learned. However, as the number of solids increases, these methods struggle to accurately capture intricate physical interactions. In this paper, we introduce a novel explicit modeling paradigm that incorporates factors influencing solid deformation through structured modules. Specifically, we present Unisoma, a unified and flexible Transformer-based model capable of handling variable numbers of solids. Unisoma directly captures physical interactions using contact modules and an adaptive interaction allocation mechanism, and learns the deformation through a triplet relationship. Compared to implicit modeling techniques, explicit modeling is better suited for multi-solid systems with diverse coupling patterns, as it enables detailed treatment of each solid while preventing information blending and confusion. Experimentally, Unisoma achieves consistent state-of-the-art performance across seven well-established datasets and two complex multi-solid tasks. Code is available at https://github.com/therontau0054/Unisoma.

Updated: 2025-07-24 02:14:16

标题: Unisoma:一种用于多固体系统的统一基于Transformer的求解器

摘要: 多固体系统是广泛应用于各种现实世界应用的基础,但对其复杂交互的建模仍然具有挑战性。现有的深度学习方法主要依赖于隐式建模,其中影响固体变形的因素并不被明确表示,而是间接学习。然而,随着固体数量的增加,这些方法往往难以准确捕捉复杂的物理相互作用。在本文中,我们介绍了一种新颖的显式建模范式,通过结构化模块整合影响固体变形的因素。具体而言,我们提出了Unisoma,这是一个统一且灵活的基于Transformer的模型,能够处理可变数量的固体。Unisoma直接利用接触模块和自适应交互分配机制捕获物理相互作用,并通过三元关系学习变形。与隐式建模技术相比,显式建模更适用于具有多样耦合模式的多固体系统,因为它使得能够详细处理每个固体,同时防止信息混合和混淆。在实验中,Unisoma在七个知名数据集和两个复杂的多固体任务中实现了一致的最先进性能。代码可在https://github.com/therontau0054/Unisoma获得。

更新时间: 2025-07-24 02:14:16

领域: cs.LG

下载: http://arxiv.org/abs/2506.06021v2

Your ATs to Ts: MITRE ATT&CK Attack Technique to P-SSCRM Task Mapping

The MITRE Adversarial Tactics, Techniques and Common Knowledge (MITRE ATT&CK) Attack Technique to Proactive Software Supply Chain Risk Management Framework (P-SSCRM) Task mapping described in this document helps software organizations to determine how different tasks mitigate the attack techniques of software supply chain attacks. The mapping was created through four independent strategies to find agreed-upon mappings. Because each P-SSCRM task is mapped to one or more tasks from the 10 frameworks, the mapping we provide is also a mapping between MITRE ATT&CK and other prominent government and industry frameworks.

Updated: 2025-07-24 02:14:00

标题: 您的ATs到Ts: MITRE ATT&CK攻击技术到P-SSCRM任务映射

摘要: 本文档描述的MITRE对抗战术、技术与通用知识(MITRE ATT&CK)攻击技术到主动软件供应链风险管理框架(P-SSCRM)任务的映射,帮助软件组织确定不同任务如何缓解软件供应链攻击的攻击技术。该映射通过四种独立策略创建,以找到达成一致的映射。由于每个P-SSCRM任务都映射到10个框架中的一个或多个任务,我们提供的映射也是MITRE ATT&CK与其他主要政府和行业框架之间的映射。

更新时间: 2025-07-24 02:14:00

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2507.18037v1
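
The transitive property the abstract notes is easy to picture with a toy join; the pairings below are invented for illustration (the technique and practice IDs follow real formats, but the links are not from the document):

    attack_to_task = {"T1195.002": ["P-SSCRM.BUILD-1"], "T1078": ["P-SSCRM.ACCESS-2"]}
    task_to_other  = {"P-SSCRM.BUILD-1": ["SSDF PO.3"], "P-SSCRM.ACCESS-2": ["SSDF PS.1"]}

    # Composing the two maps yields an ATT&CK <-> other-framework mapping.
    attack_to_other = {
        tech: sorted({item for task in tasks for item in task_to_other.get(task, [])})
        for tech, tasks in attack_to_task.items()
    }
    print(attack_to_other)  # {'T1195.002': ['SSDF PO.3'], 'T1078': ['SSDF PS.1']}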

Tackling Hallucination from Conditional Models for Medical Image Reconstruction with DynamicDPS

Hallucinations are spurious structures not present in the ground truth, posing a critical challenge in medical image reconstruction, especially for data-driven conditional models. We hypothesize that combining an unconditional diffusion model with data consistency, trained on a diverse dataset, can reduce these hallucinations. Based on this, we propose DynamicDPS, a diffusion-based framework that integrates conditional and unconditional diffusion models to enhance low-quality medical images while systematically reducing hallucinations. Our approach first generates an initial reconstruction using a conditional model, then refines it with an adaptive diffusion-based inverse problem solver. DynamicDPS skips the early stages of the reverse process by selecting an optimal starting time point per sample and applies Wolfe's line search for adaptive step sizes, improving both efficiency and image fidelity. Using diffusion priors and data consistency, our method effectively reduces hallucinations from any conditional model output. We validate its effectiveness in Image Quality Transfer for low-field MRI enhancement. Extensive evaluations on synthetic and real MR scans, including a downstream task for tissue volume estimation, show that DynamicDPS reduces hallucinations, improving relative volume estimation by over 15% for critical tissues while using only 5% of the sampling steps required by baseline diffusion models. As a model-agnostic and fine-tuning-free approach, DynamicDPS offers a robust solution for hallucination reduction in medical imaging. The code will be made publicly available upon publication.

Updated: 2025-07-24 02:11:36

标题: 利用DynamicDPS解决医学图像重建中条件模型的幻觉问题

摘要: 幻觉是指不存在于真实数据(ground truth)中的虚假结构,在医学图像重建中尤其对数据驱动的条件模型构成重大挑战。我们假设将一个无条件扩散模型与数据一致性相结合,在多样化数据集上进行训练,可以减少这些幻觉。基于此,我们提出了DynamicDPS,一个基于扩散的框架,集成了条件和无条件扩散模型,以增强低质量医学图像,同时系统地减少幻觉。我们的方法首先使用条件模型生成初始重建,然后通过自适应扩散逆问题求解器进行细化。DynamicDPS通过为每个样本选择最佳起始时间点,在逆过程中跳过早期阶段,并应用Wolfe线搜索以获得自适应步长,从而提高效率和图像保真度。通过扩散先验和数据一致性,我们的方法有效地减少了任何条件模型输出中的幻觉。我们在低场MRI增强的图像质量转移中验证了其有效性。对合成和真实MR扫描的广泛评估,包括用于组织体积估计的下游任务,显示DynamicDPS减少了幻觉,将关键组织的相对体积估计提高了超过15%,同时仅使用了基线扩散模型所需采样步骤的5%。作为一种与模型无关且无需微调的方法,DynamicDPS为医学图像中幻觉减少提供了稳健的解决方案。代码将在发表后公开。

更新时间: 2025-07-24 02:11:36

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2503.01075v2
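
The loop structure described reads roughly as follows; every helper here is a hypothetical placeholder (the actual samplers, the start-time selector, and the Wolfe search are the paper's contributions, not reproduced here):

    def dynamic_dps(x_init, y_obs, diffusion, forward_op, pick_start_time, line_search):
        t0 = pick_start_time(x_init)                # skip early reverse steps per sample
        x = diffusion.add_noise(x_init, t0)         # start from the conditional estimate
        for t in reversed(range(t0)):
            x = diffusion.denoise_step(x, t)        # unconditional diffusion prior step
            grad = forward_op.grad_data_fidelity(x, y_obs)  # pull toward measurements
            step = line_search(x, grad)             # Wolfe-type adaptive step size
            x = x - step * grad
        return x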

NWaaS: Nonintrusive Watermarking as a Service for X-to-Image DNN

The intellectual property of deep neural network (DNN) models can be protected with DNN watermarking, which embeds copyright watermarks into model parameters (white-box), model behavior (black-box), or model outputs (box-free), and the watermarks can be subsequently extracted to verify model ownership or detect model theft. Despite recent advances, these existing methods are inherently intrusive, as they either modify the model parameters or alter the structure. This natural intrusiveness raises concerns about watermarking-induced shifts in model behavior and the additional cost of fine-tuning, further exacerbated by the rapidly growing model size. As a result, model owners are often reluctant to adopt DNN watermarking in practice, which limits the development of practical Watermarking as a Service (WaaS) systems. To address this issue, we introduce Nonintrusive Watermarking as a Service (NWaaS), a novel trustless paradigm designed for X-to-Image models, in which we hypothesize that with the model untouched, an owner-defined watermark can still be extracted from model outputs. Building on this concept, we propose ShadowMark, a concrete implementation of NWaaS which addresses critical deployment challenges by establishing a robust and nonintrusive side channel in the protected model's black-box API, leveraging a key encoder and a watermark decoder. It is significantly distinctive from existing solutions by attaining the so-called absolute fidelity and being applicable to different DNN architectures, while being also robust against existing attacks, eliminating the fidelity-robustness trade-off. Extensive experiments on image-to-image, noise-to-image, noise-and-text-to-image, and text-to-image models, demonstrate the efficacy and practicality of ShadowMark for real-world deployment of nonintrusive DNN watermarking.

Updated: 2025-07-24 02:07:28

标题: NWaaS:面向X到图像DNN的非侵入式水印服务

摘要: 深度神经网络(DNN)模型的知识产权可以通过DNN数字水印技术进行保护,该技术将版权水印嵌入到模型参数(白盒)、模型行为(黑盒)或模型输出(无盒),然后可以提取水印以验证模型所有权或检测模型盗窃。尽管最近取得了进展,但这些现有方法固有地具有侵入性,因为它们要么修改模型参数,要么改变结构。这种自然侵入性引发了对水印导致模型行为变化以及额外微调成本的担忧,这一担忧进一步受到模型规模迅速增长的影响。因此,模型所有者通常不愿意在实践中采用DNN数字水印技术,这限制了实际水印作为服务(WaaS)系统的发展。为了解决这个问题,我们引入了非侵入式水印服务(NWaaS),这是一种针对X到图像模型设计的新型无信任范式,我们假设通过不触及模型,可以从模型输出中提取出所有者定义的水印。基于这一概念,我们提出了ShadowMark,这是NWaaS的一个具体实现,通过在受保护模型的黑盒API中建立一个强大且非侵入式的侧信道,利用关键编码器和水印解码器来解决关键的部署挑战。与现有解决方案显著不同,ShadowMark 实现了所谓的绝对保真度,并适用于不同的DNN架构,同时还能够抵抗现有攻击,消除了保真度与鲁棒性之间的折衷。对图像到图像、噪声到图像、噪声和文本到图像以及文本到图像模型的大量实验表明,ShadowMark 对于非侵入式DNN数字水印的实际部署具有高效性和实用性。

更新时间: 2025-07-24 02:07:28

领域: cs.CR,cs.CV

下载: http://arxiv.org/abs/2507.18036v1
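
A conceptual sketch of the key-encoder/watermark-decoder side channel, with the protected model untouched; module shapes, bit width, and the training objective are assumptions for illustration:

    import torch
    import torch.nn as nn

    class WatermarkDecoder(nn.Module):
        def __init__(self, key_dim=128, bits=64):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.head = nn.Linear(64 + key_dim, bits)

        def forward(self, model_output_img, key_embedding):
            # decode watermark bits from the released output, conditioned on the key
            feats = self.backbone(model_output_img)
            return torch.sigmoid(self.head(torch.cat([feats, key_embedding], dim=-1)))

Ownership would then be verified by decoding under the owner's secret key and comparing the recovered bits with the registered watermark.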

Removing Box-Free Watermarks for Image-to-Image Models via Query-Based Reverse Engineering

The intellectual property of deep generative networks (GNets) can be protected using a cascaded hiding network (HNet) which embeds watermarks (or marks) into GNet outputs, known as box-free watermarking. Although both GNet and HNet are encapsulated in a black box (called operation network, or ONet), with only the generated and marked outputs from HNet being released to end users and deemed secure, in this paper, we reveal an overlooked vulnerability in such systems. Specifically, we show that the hidden GNet outputs can still be reliably estimated via query-based reverse engineering, leaking the generated and unmarked images, despite the attacker's limited knowledge of the system. Our first attempt is to reverse-engineer an inverse model for HNet under the stringent black-box condition, for which we propose to exploit the query process with specially curated input images. While effective, this method yields unsatisfactory image quality. To improve this, we subsequently propose an alternative method leveraging the equivalent additive property of box-free model watermarking and reverse-engineering a forward surrogate model of HNet, with better image quality preservation. Extensive experimental results on image processing and image generation tasks demonstrate that both attacks achieve impressive watermark removal success rates (100%) while also maintaining excellent image quality (reaching the highest PSNR of 34.69 dB), substantially outperforming existing attacks, highlighting the urgent need for robust defensive strategies to mitigate the identified vulnerability in box-free model watermarking.

Updated: 2025-07-24 02:05:55

标题: 通过基于查询的逆向工程去除图像到图像模型中的无框水印

摘要: 深度生成网络(GNets)的知识产权可以通过级联隐藏网络(HNet)来保护,该网络将水印(或标记)嵌入到GNet的输出中,称为无框水印技术。尽管GNet和HNet都封装在一个黑盒子中(称为操作网络或ONet),只有从HNet释放的生成和标记输出才会提供给最终用户并被认为是安全的,但在本文中,我们揭示了这种系统中一个被忽视的漏洞。具体来说,我们展示了即使攻击者对系统了解有限,也可以通过基于查询的逆向工程可靠地估计隐藏的GNet输出,泄露生成的未标记图像。我们的第一次尝试是在严格的黑盒条件下为HNet反向工程一个逆模型,我们提出利用特别策划的输入图像进行查询过程。虽然有效,但这种方法产生的图像质量不理想。为了改进这一点,我们随后提出了一种替代方法:利用无框模型水印的等效可加性(additive property),反向工程HNet的前向替代模型,从而更好地保留图像质量。对图像处理和图像生成任务的广泛实验结果表明,这两种攻击都实现了令人印象深刻的水印去除成功率(100%),同时保持了优秀的图像质量(达到34.69 dB的最高PSNR),明显优于现有的攻击方法,突出表明迫切需要强大的防御策略来缓解无框模型水印中已识别的漏洞。

更新时间: 2025-07-24 02:05:55

领域: cs.CR

下载: http://arxiv.org/abs/2507.18034v1
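
The additive-property route can be sketched at a high level; `onet` and `surrogate_gnet` are hypothetical callables (the surrogate is what the attack must first reverse-engineer from queries):

    import torch

    def estimate_residual(onet, surrogate_gnet, probes):
        # if marked output ~ generated image + roughly input-independent residual w,
        # averaging (marked - approx. unmarked) over curated probes estimates w
        marked = onet(probes)
        approx_unmarked = surrogate_gnet(probes)
        return (marked - approx_unmarked).mean(dim=0, keepdim=True)

    def remove_watermark(released_output, residual):
        return (released_output - residual).clamp(0, 1)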

OpenNav: Open-World Navigation with Multimodal Large Language Models

Pre-trained large language models (LLMs) have demonstrated strong common-sense reasoning abilities, making them promising for robotic navigation and planning tasks. However, despite recent progress, bridging the gap between language descriptions and actual robot actions in the open-world, beyond merely invoking limited predefined motion primitives, remains an open challenge. In this work, we aim to enable robots to interpret and decompose complex language instructions, ultimately synthesizing a sequence of trajectory points to complete diverse navigation tasks given open-set instructions and open-set objects. We observe that multi-modal large language models (MLLMs) exhibit strong cross-modal understanding when processing free-form language instructions, demonstrating robust scene comprehension. More importantly, leveraging their code-generation capability, MLLMs can interact with vision-language perception models to generate compositional 2D bird-eye-view value maps, effectively integrating semantic knowledge from MLLMs with spatial information from maps to reinforce the robot's spatial understanding. To further validate our approach, we effectively leverage large-scale autonomous vehicle datasets (AVDs) to validate our proposed zero-shot vision-language navigation framework in outdoor navigation tasks, demonstrating its capability to execute a diverse range of free-form natural language navigation instructions while maintaining robustness against object detection errors and linguistic ambiguities. Furthermore, we validate our system on a Husky robot in both indoor and outdoor scenes, demonstrating its real-world robustness and applicability. Supplementary videos are available at https://trailab.github.io/OpenNav-website/

Updated: 2025-07-24 02:05:28

标题: OpenNav:使用多模态大型语言模型的开放世界导航

摘要: 预先训练的大型语言模型(LLM)已经展示了强大的常识推理能力,使它们在机器人导航和规划任务中具有潜力。然而,尽管最近取得了进展,在开放世界中弥合语言描述与实际机器人动作之间的鸿沟(而不仅仅是调用有限的预定义运动原语)仍然是一个开放性挑战。在这项工作中,我们的目标是使机器人能够解释和分解复杂的语言指令,最终合成一系列轨迹点,在开放集指令和开放集物体下完成各种导航任务。我们观察到,多模态大型语言模型(MLLM)在处理自由形式语言指令时表现出强大的跨模态理解能力,展示了稳健的场景理解能力。更重要的是,利用它们的代码生成能力,MLLM可以与视觉-语言感知模型互动,生成组合式的2D鸟瞰价值图,有效地将MLLM的语义知识与地图的空间信息相结合,以增强机器人的空间理解能力。为了进一步验证我们的方法,我们有效地利用大规模自动驾驶数据集(AVDs)验证了我们提出的零样本视觉-语言导航框架在户外导航任务中的有效性,展示其执行各种自由形式自然语言导航指令的能力,同时保持对物体检测错误和语言歧义的稳健性。此外,我们在室内和室外场景中在Husky机器人上验证了我们的系统,展示了其在现实世界中的鲁棒性和适用性。补充视频可在https://trailab.github.io/OpenNav-website/ 上找到。

更新时间: 2025-07-24 02:05:28

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2507.18033v1
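
The value-map idea can be pictured with a toy composition step; grid size, Gaussian weighting, and the detection format are assumptions, not the system's actual code:

    import numpy as np

    def value_map(detections, grid=(200, 200), sigma=5.0):
        # each language-grounded detection ((gx, gy), weight) adds a bump of value
        # around its ground-plane cell in a bird's-eye-view grid
        vm = np.zeros(grid)
        ys, xs = np.mgrid[0:grid[0], 0:grid[1]]
        for (gx, gy), weight in detections:
            vm += weight * np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2 * sigma ** 2))
        return vm / vm.max() if vm.max() > 0 else vm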

ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

The rapid rise of deepfake technology, which produces realistic but fraudulent digital content, threatens the authenticity of media. Traditional deepfake detection approaches often struggle with sophisticated, customized deepfakes, especially in terms of generalization and robustness against malicious attacks. This paper introduces ViGText, a novel approach that integrates images with Vision Large Language Model (VLLM) Text explanations within a Graph-based framework to improve deepfake detection. The novelty of ViGText lies in its integration of detailed explanations with visual data, as it provides a more context-aware analysis than captions, which often lack specificity and fail to reveal subtle inconsistencies. ViGText systematically divides images into patches, constructs image and text graphs, and integrates them for analysis using Graph Neural Networks (GNNs) to identify deepfakes. Through the use of multi-level feature extraction across spatial and frequency domains, ViGText captures details that enhance its robustness and accuracy to detect sophisticated deepfakes. Extensive experiments demonstrate that ViGText significantly enhances generalization and achieves a notable performance boost when it detects user-customized deepfakes. Specifically, average F1 scores rise from 72.45% to 98.32% under generalization evaluation, and reflects the model's superior ability to generalize to unseen, fine-tuned variations of stable diffusion models. As for robustness, ViGText achieves an increase of 11.1% in recall compared to other deepfake detection approaches. When facing targeted attacks that exploit its graph-based architecture, ViGText limits classification performance degradation to less than 4%. ViGText uses detailed visual and textual analysis to set a new standard for detecting deepfakes, helping ensure media authenticity and information integrity.

Updated: 2025-07-24 02:04:58

标题: ViGText: 利用视觉-语言模型解释和图神经网络进行深度伪造图像检测

摘要: 深度伪造技术的迅速发展产生了逼真但欺骗性数字内容,威胁着媒体的真实性。传统的深度伪造检测方法通常难以应对复杂、定制的深度伪造,尤其在泛化和抵御恶意攻击方面表现不佳。本文介绍了一种名为ViGText的新方法,它将图像与Vision Large Language Model(VLLM)文本解释相结合在基于图的框架中,以提高深度伪造检测的效果。ViGText的创新之处在于将详细的解释与视觉数据整合在一起,因此比通常缺乏特异性并无法显示微小不一致性的标题提供更具上下文意识的分析。ViGText系统地将图像分成补丁,构建图像和文本图,并使用图神经网络(GNNs)将它们整合进行分析,以识别深度伪造。通过在空间和频率域跨多级特征提取,ViGText捕获细节,增强了其对检测复杂深度伪造的鲁棒性和准确性。大量实验表明,ViGText显著提高了泛化能力,并在检测用户定制的深度伪造时取得了显著的性能提升。具体而言,在泛化评估下,平均F1分数从72.45%提高到98.32%,反映了模型更具泛化到未见过的、稳定扩散模型微调变化的能力。就鲁棒性而言,与其他深度伪造检测方法相比,ViGText的召回率提高了11.1%。在面对利用其基于图的结构进行有针对性攻击时,ViGText将分类性能降低限制在不到4%。ViGText利用详细的视觉和文本分析来设定检测深度伪造的新标准,有助于确保媒体的真实性和信息完整性。

更新时间: 2025-07-24 02:04:58

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.18031v1
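
The patch-graph construction can be sketched minimally (shapes and 4-neighbor connectivity are assumptions; the full method also builds a text graph from the VLLM explanation and fuses both in a GNN):

    import torch

    def patch_grid_graph(image, patch=32):
        # image: [channels, height, width]; returns node features and edge_index
        c, h, w = image.shape
        gh, gw = h // patch, w // patch
        nodes = image.unfold(1, patch, patch).unfold(2, patch, patch)  # [c, gh, gw, p, p]
        nodes = nodes.permute(1, 2, 0, 3, 4).reshape(gh * gw, -1)      # one row per patch
        edges = []
        for i in range(gh):
            for j in range(gw):
                u = i * gw + j
                if j + 1 < gw:
                    edges.append((u, u + 1))    # right neighbor
                if i + 1 < gh:
                    edges.append((u, u + gw))   # down neighbor
        return nodes, torch.tensor(edges).t()   # features, [2, num_edges]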

NeuralDB: Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database

Efficiently editing knowledge stored in large language models (LLMs) enables model updates without large-scale training. One possible solution is Locate-and-Edit (L\&E), allowing simultaneous modifications of a massive number of facts. However, such editing may compromise the general abilities of LLMs and even result in forgetting edited facts when scaling up to thousands of edits. In this paper, we model existing linear L\&E methods as querying a Key-Value (KV) database. From this perspective, we then propose NeuralDB, an editing framework that explicitly represents the edited facts as a neural KV database equipped with a non-linear gated retrieval module. In particular, our gated module operates only when inference involves the edited facts, effectively preserving the general abilities of LLMs. Comprehensive experiments involving the editing of 10,000 facts were conducted on the ZsRE and CounterFacts datasets, using GPT2-XL, GPT-J (6B) and Llama-3 (8B). The results demonstrate that NeuralDB not only excels in editing efficacy, generalization, specificity, fluency, and consistency, but also preserves overall performance across six representative text understanding and generation tasks. Further experiments indicate that NeuralDB maintains its effectiveness even when scaled to 100,000 facts (\textbf{50x} more than in prior work).

Updated: 2025-07-24 02:00:09

标题: NeuralDB:利用神经KV数据库将LLMs中的知识编辑扩展到100,000条事实

摘要: 将存储在大型语言模型(LLMs)中的知识高效编辑,可以实现模型更新而无需大规模训练。一种可能的解决方案是Locate-and-Edit(L&E),允许同时修改大量事实。然而,这种编辑可能会损害LLMs的一般能力,甚至在扩展到数千次编辑时会导致编辑的事实被遗忘。在本文中,我们将现有的线性L&E方法建模为查询键值(KV)数据库。从这个角度来看,我们提出了NeuralDB,这是一个编辑框架,明确将编辑的事实表示为一个配备非线性门控检索模块的神经KV数据库。特别是,我们的门控模块仅在推断涉及编辑的事实时才起作用,有效地保留了LLMs的一般能力。在ZsRE和CounterFacts数据集上进行了涉及编辑10,000个事实的综合实验,使用GPT2-XL、GPT-J(6B)和Llama-3(8B)。结果表明,NeuralDB在编辑效率、泛化、特异性、流畅性和一致性方面表现优异,并且在六项代表性文本理解和生成任务中保持了整体性能。进一步实验表明,即使扩展到100,000个事实(比之前的工作多50倍),NeuralDB仍然保持其有效性。

更新时间: 2025-07-24 02:00:09

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2507.18028v1
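
The gated key-value retrieval reads naturally as a small module; dimensions, the cosine-similarity gate, and the threshold are assumptions for illustration, not the paper's exact design:

    import torch
    import torch.nn.functional as F

    class NeuralKVStore:
        def __init__(self, keys, values, tau=0.85):
            self.K = F.normalize(keys, dim=-1)   # [n_edits, d], one key per edited fact
            self.V = values                      # [n_edits, d]
            self.tau = tau

        def __call__(self, h):
            q = F.normalize(h, dim=-1)           # [batch, d] hidden states as queries
            sim = q @ self.K.t()                 # [batch, n_edits]
            best, idx = sim.max(dim=-1)
            gate = (best > self.tau).float().unsqueeze(-1)  # fire only on edited facts
            return h + gate * (self.V[idx] - h)  # otherwise the base model is untouched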

AI Workflow, External Validation, and Development in Eye Disease Diagnosis

Timely disease diagnosis is challenging due to increasing disease burdens and limited clinician availability. AI shows promise in diagnosis accuracy but faces real-world application issues due to insufficient validation in clinical workflows and diverse populations. This study addresses gaps in medical AI downstream accountability through a case study on age-related macular degeneration (AMD) diagnosis and severity classification. We designed and implemented an AI-assisted diagnostic workflow for AMD, comparing diagnostic performance with and without AI assistance among 24 clinicians from 12 institutions with real patient data sampled from the Age-Related Eye Disease Study (AREDS). Additionally, we demonstrated continual enhancement of an existing AI model by incorporating approximately 40,000 additional medical images (named AREDS2 dataset). The improved model was then systematically evaluated using both AREDS and AREDS2 test sets, as well as an external test set from Singapore. AI assistance markedly enhanced diagnostic accuracy and classification for 23 out of 24 clinicians, with the average F1-score increasing by 20% from 37.71 (Manual) to 45.52 (Manual + AI) (P-value < 0.0001), achieving an improvement of over 50% in some cases. In terms of efficiency, AI assistance reduced diagnostic times for 17 out of the 19 clinicians tracked, with time savings of up to 40%. Furthermore, a model equipped with continual learning showed robust performance across three independent datasets, recording a 29% increase in accuracy, and elevating the F1-score from 42 to 54 in the Singapore population.

Updated: 2025-07-24 01:49:32

标题: AI工作流程,外部验证和眼部疾病诊断的发展

摘要: 及时的疾病诊断面临着挑战,因为疾病负担增加,临床医生可用性有限。人工智能在诊断准确性方面表现出潜力,但由于在临床工作流程和不同人群中缺乏足够的验证,面临着现实世界应用问题。本研究通过对年龄相关性黄斑变性(AMD)诊断和严重程度分类的案例研究,解决了医疗人工智能下游责任方面的空白。我们设计并实施了一个AMD的人工智能辅助诊断工作流程,比较了24名来自12个机构的临床医生在真实患者数据(来自年龄相关眼病研究(AREDS))样本中,有无人工智能辅助的诊断表现。此外,我们通过整合约40,000张额外的医学图像(名为AREDS2数据集)展示了现有人工智能模型的持续改进。改进后的模型随后经过系统评估,使用AREDS和AREDS2测试集,以及新加坡的外部测试集。人工智能辅助显著提高了24名临床医生中23名的诊断准确性和分类能力,平均F1分数从37.71(手动)提高到45.52(手动+人工智能)(P值<0.0001),在某些情况下提高了50%以上。在效率方面,AI辅助减少了所跟踪的19名临床医生中17人的诊断时间,节约时间高达40%。此外,一个具有持续学习功能的模型在三个独立数据集上表现出良好的性能,准确率提高了29%,在新加坡人群中将F1分数从42提高到54。

更新时间: 2025-07-24 01:49:32

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2409.15087v2

Does visualization help AI understand data?

Charts and graphs help people analyze data, but can they also be useful to AI systems? To investigate this question, we perform a series of experiments with two commercial vision-language models: GPT 4.1 and Claude 3.5. Across three representative analysis tasks, the two systems describe synthetic datasets more precisely and accurately when raw data is accompanied by a scatterplot, especially as datasets grow in complexity. Comparison with two baselines -- providing a blank chart and a chart with mismatched data -- shows that the improved performance is due to the content of the charts. Our results are initial evidence that AI systems, like humans, can benefit from visualization.

Updated: 2025-07-24 01:47:34

标题: 可视化有助于人工智能理解数据吗?

摘要: 图表有助于人们分析数据,但它们也可以对人工智能系统有用吗?为了调查这个问题,我们对两个商业视觉语言模型进行了一系列实验:GPT 4.1和Claude 3.5。在三个代表性分析任务中,当原始数据配有散点图时,这两个系统更精确、更准确地描述合成数据集,特别是在数据集变得更加复杂时。与两个基准线比较--提供空白图表和数据不匹配的图表--显示改进的性能是由图表的内容导致的。我们的结果初步表明,像人类一样,人工智能系统也可以从可视化中受益。

更新时间: 2025-07-24 01:47:34

领域: cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2507.18022v1
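
The three prompt conditions are easy to reproduce in spirit; the synthetic data and file names below are made up for illustration:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    data, other = rng.normal(size=(100, 2)), rng.normal(size=(100, 2))

    # condition 1: data + matching scatterplot; condition 3: mismatched chart
    for name, pts in [("matched.png", data), ("mismatched.png", other)]:
        fig, ax = plt.subplots()
        ax.scatter(pts[:, 0], pts[:, 1], s=10)
        fig.savefig(name)
        plt.close(fig)

    # condition 2: blank-chart baseline
    fig, ax = plt.subplots()
    ax.set_xticks([]); ax.set_yticks([])
    fig.savefig("blank.png")
    plt.close(fig)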

SuperARC: An Agnostic Test for Narrow, General, and Super Intelligence Based On the Principles of Recursive Compression and Algorithmic Probability

We introduce an open-ended test grounded in algorithmic probability that can avoid benchmark contamination in the quantitative evaluation of frontier models in the context of their Artificial General Intelligence (AGI) and Superintelligence (ASI) claims. Unlike other tests, this test does not rely on statistical compression methods (such as GZIP or LZW), which are more closely related to Shannon entropy than to Kolmogorov complexity and are not able to test beyond simple pattern matching. The test challenges aspects of AI, in particular LLMs, related to features of intelligence of a fundamental nature, such as synthesis and model creation in the context of inverse problems (generating new knowledge from observation). We argue that metrics based on model abstraction and abduction (optimal Bayesian `inference') for predictive `planning' can provide a robust framework for testing intelligence, including natural intelligence (human and animal), narrow AI, AGI, and ASI. We found that LLM model versions tend to be fragile and incremental, a result of memorisation alone, with progress likely driven by the size of the training data. The results were compared with a hybrid neurosymbolic approach that theoretically guarantees universal intelligence based on the principles of algorithmic probability and Kolmogorov complexity. The method outperforms LLMs in a proof-of-concept on short binary sequences. We prove that compression is equivalent to, and directly proportional to, a system's predictive power, and vice versa: if a system can better predict it can better compress, and if it can better compress, then it can better predict. Our findings strengthen the suspicion regarding the fundamental limitations of LLMs, exposing them as systems optimised for the perception of mastery over human language.

Updated: 2025-07-24 01:40:47

Categories: cs.AI,cs.IT,math.IT

Download: http://arxiv.org/abs/2503.16743v4
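
The claimed compression-prediction equivalence has a concrete Shannon-style reading: the ideal code length of a sequence under a predictor is the sum of $-\log_2$ of its predictive probabilities, so better prediction means shorter codes. A minimal sketch with an order-k Markov predictor and Laplace smoothing (a toy illustration, not the paper's neurosymbolic method):

```python
import math, random

def code_length_bits(seq, order=1):
    """Ideal code length (bits) of a binary string under an order-k Markov
    predictor with Laplace smoothing: sum of -log2 p(next symbol | context)."""
    counts, bits = {}, 0.0
    for i, s in enumerate(seq):
        ctx = seq[max(0, i - order):i]
        c0, c1 = counts.get(ctx, (1, 1))           # Laplace prior over {0, 1}
        p = (c1 if s == "1" else c0) / (c0 + c1)   # predictive probability of s
        bits += -math.log2(p)
        counts[ctx] = (c0 + (s == "0"), c1 + (s == "1"))
    return bits

random.seed(0)
noisy = "".join(random.choice("01") for _ in range(256))
print(code_length_bits("01" * 128))  # far below 256: predictable => compressible
print(code_length_bits(noisy))      # close to 256: unpredictable => incompressible
```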

Zeroth-order log-concave sampling

We study the zeroth-order query complexity of log-concave sampling, specifically uniform sampling from convex bodies using membership oracles. We propose a simple variant of the proximal sampler that achieves the query complexity with matched R\'enyi orders between the initial warmness and output guarantee. Specifically, for any $\varepsilon>0$ and $q\geq2$, the sampler, initialized at $\pi_{0}$, outputs a sample whose law is $\varepsilon$-close in $q$-R\'enyi divergence to $\pi$, the uniform distribution over a convex body in $\mathbb{R}^{d}$, using $\widetilde{O}(qM_{q}^{q/(q-1)}d^{2}\,\lVert\operatorname{cov}\pi\rVert\log\frac{1}{\varepsilon})$ membership queries, where $M_{q}=\lVert\text{d}\pi_{0}/\text{d}\pi\rVert_{L^{q}(\pi)}$. We further introduce a simple annealing scheme that produces a warm start in $q$-R\'enyi divergence (i.e., $M_{q}=O(1)$) using $\widetilde{O}(qd^{2}R^{3/2}\,\lVert\operatorname{cov}\pi\rVert^{1/4})$ queries, where $R^{2}=\mathbb{E}_{\pi}[|\cdot|^{2}]$. This interpolates between known complexities for warm-start generation in total variation and R\'enyi-infinity divergence. To relay a R\'enyi warmness across the annealing scheme, we establish hypercontractivity under simultaneous heat flow and translate it into an improved mixing guarantee for the proximal sampler under a logarithmic Sobolev inequality. These results extend naturally to general log-concave distributions accessible via evaluation oracles, incurring additional quadratic queries.

Updated: 2025-07-24 01:31:49

Categories: math.ST,cs.DS,cs.LG,math.FA,math.PR,stat.TH

Download: http://arxiv.org/abs/2507.18021v1
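
A minimal sketch of the proximal sampler for the uniform target using only a membership oracle: the forward step adds Gaussian noise, and the backward step draws from the Gaussian restricted to the body via rejection, each trial costing one membership query. The step size and iteration count are illustrative, and the annealing scheme for warm starts is omitted:

```python
import numpy as np

def proximal_sampler_uniform(membership, x0, eta, n_iters, rng=None):
    """Proximal (Gibbs) sampler targeting the uniform distribution on a convex
    body K, given only a membership oracle. Alternates:
      y  ~ N(x, eta*I)                   (forward heat step)
      x' ~ N(y, eta*I) restricted to K   (backward step, via rejection)
    Assumes x0 lies in K and eta is small enough for rejection to terminate."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    queries = 0
    for _ in range(n_iters):
        y = x + np.sqrt(eta) * rng.standard_normal(x.shape)
        while True:                      # rejection sampling for x' | y
            cand = y + np.sqrt(eta) * rng.standard_normal(x.shape)
            queries += 1
            if membership(cand):
                x = cand
                break
    return x, queries

# Example: uniform sampling from the Euclidean unit ball in R^10.
in_ball = lambda z: float(np.dot(z, z)) <= 1.0
sample, used = proximal_sampler_uniform(in_ball, np.zeros(10), eta=0.01,
                                        n_iters=200, rng=1)
print(sample[:3], used)  # 'used' counts the membership queries consumed
```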

On Leveraging Unlabeled Data for Concurrent Positive-Unlabeled Classification and Robust Generation

The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems. While abundant unlabeled data typically exist and provide a potential solution, it is highly challenging to exploit them. In this paper, we address this problem by leveraging Positive-Unlabeled~(PU) classification and the conditional generation with extra unlabeled data \emph{simultaneously}. We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data, especially out-of-distribution unlabeled data, by exploring the interplay between them: 1) enhancing the performance of PU classifiers with the assistance of a novel Classifier-Noise-Invariant Conditional GAN~(CNI-CGAN) that is robust to noisy labels, 2) leveraging extra data with predicted labels from a PU classifier to help the generation. Theoretically, we prove the optimal condition of CNI-CGAN and experimentally, we conducted extensive evaluations on diverse datasets.

Updated: 2025-07-24 01:29:43

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2006.07841v3
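
On the PU side, a standard risk estimator is the non-negative PU risk of Kiryo et al. (2017); the sketch below shows that estimator in PyTorch, assuming a known class prior `pi_p`. It illustrates the PU ingredient generically and may differ from the paper's exact objective:

```python
import torch

def nn_pu_risk(scores_p, scores_u, pi_p, loss=lambda z: torch.sigmoid(-z)):
    """Non-negative PU risk (Kiryo et al., 2017).
    scores_p: classifier outputs on positive data; scores_u: on unlabeled data.
    pi_p: positive class prior; loss: surrogate l(z) for predicting +1 at margin z."""
    r_p_plus = loss(scores_p).mean()                    # positive-label risk on P
    r_p_minus = loss(-scores_p).mean()                  # negative-label risk on P
    r_u_minus = loss(-scores_u).mean()                  # negative-label risk on U
    r_n = r_u_minus - pi_p * r_p_minus                  # unbiased negative risk
    return pi_p * r_p_plus + torch.clamp(r_n, min=0.0)  # clamp keeps it non-negative
```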

LENS-DF: Deepfake Detection and Temporal Localization for Long-Form Noisy Speech

This study introduces LENS-DF, a novel and comprehensive recipe for training and evaluating audio deepfake detection and temporal localization under complicated and realistic audio conditions. The generation part of the recipe transforms the input dataset into audios with several critical characteristics, such as longer duration, noisy conditions, and multiple speakers, in a controllable fashion. A corresponding protocol is then used to train and evaluate detection and localization models. We conduct experiments with a self-supervised learning front-end and a simple back-end. The results indicate that models trained using data generated with LENS-DF consistently outperform those trained via conventional recipes, demonstrating the effectiveness and usefulness of LENS-DF for robust audio deepfake detection and localization. We also conduct ablation studies on the variations introduced, investigating their impact on, and relevance to, realistic challenges in the field.

Updated: 2025-07-24 01:25:18

Categories: cs.SD,cs.CR,eess.AS

Download: http://arxiv.org/abs/2507.16220v2
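
A minimal sketch of the generation idea: splice short clips into one long recording, add noise at a target SNR, and keep segment-level (start, end, label) annotations for temporal localization. Function and parameter names are illustrative, not the recipe's actual configuration:

```python
import numpy as np

def make_long_noisy_mix(clips, labels, sr=16000, snr_db=10.0, rng=None):
    """Concatenate short clips (1-D float arrays) into one long recording,
    add white noise at the target SNR, and return segment annotations."""
    rng = np.random.default_rng(rng)
    audio, ann, t = [], [], 0
    for clip, lab in zip(clips, labels):       # labels: e.g. "bonafide" / "spoof"
        audio.append(clip)
        ann.append((t / sr, (t + len(clip)) / sr, lab))  # (start_s, end_s, label)
        t += len(clip)
    audio = np.concatenate(audio)
    sig_pow = np.mean(audio ** 2)
    noise_pow = sig_pow / (10 ** (snr_db / 10))
    audio = audio + rng.normal(0, np.sqrt(noise_pow), audio.shape)
    return audio, ann
```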

PPFPL: Cross-silo Privacy-preserving Federated Prototype Learning Against Data Poisoning Attacks on Non-IID Data

Privacy-Preserving Federated Learning (PPFL) allows multiple clients to collaboratively train a deep learning model by submitting hidden model updates. Nonetheless, PPFL is vulnerable to data poisoning attacks due to the distributed training nature of clients. Existing solutions have struggled to improve the performance of cross-silo PPFL on poisoned Non-IID data. To address these issues, this paper proposes a privacy-preserving federated prototype learning framework, named PPFPL, which enhances cross-silo FL performance on poisoned Non-IID data while effectively resisting data poisoning attacks. Specifically, we adopt prototypes as client-submitted model updates to eliminate the impact of tampered data distributions on federated learning. Moreover, we utilize two servers to achieve Byzantine-robust aggregation via a secure aggregation protocol, which greatly reduces the impact of malicious clients. Theoretical analyses confirm the convergence of PPFPL, and experimental results on publicly available datasets show that PPFPL is effective at resisting data poisoning attacks under Non-IID conditions.

Updated: 2025-07-24 01:21:26

Categories: cs.CR,cs.DC

Download: http://arxiv.org/abs/2504.03173v4
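
A minimal sketch of two of the building blocks: per-class prototypes as the client update, and additive secret sharing across the two servers so that neither alone sees a client's prototypes. The Byzantine-robust filtering and the full secure-aggregation protocol are omitted:

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Client update: per-class mean embeddings instead of gradients/weights."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def share_two_servers(update, rng=None):
    """Additive secret sharing: each server alone sees only a random mask."""
    rng = np.random.default_rng(rng)
    share_a = rng.normal(size=update.shape)
    share_b = update - share_a
    return share_a, share_b  # send share_a to server A, share_b to server B

# Each server sums the shares it receives across clients; the aggregate
# prototype is recovered only by combining the two server-side sums:
#   aggregate = sum_at_A + sum_at_B
# so no single server learns any individual client's prototypes.
```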

Fashion-AlterEval: A Dataset for Improved Evaluation of Conversational Recommendation Systems with Alternative Relevant Items

In Conversational Recommendation Systems (CRS), a user provides feedback on recommended items at each turn, leading the CRS towards improved recommendations. Due to the need for a large amount of data, a user simulator is employed for both training and evaluation. Such user simulators critique the current retrieved item based on knowledge of a single target item. However, system evaluation in offline settings with simulators is limited by the focus on a single target item and the simulators' unlimited patience over a large number of turns. To overcome these limitations of existing simulators, we propose Fashion-AlterEval, a new dataset that contains human judgments for a selection of alternative items, obtained by adding new annotations to common fashion CRS datasets. Consequently, we propose two novel meta-user simulators that use the collected judgments and allow simulated users not only to express their preferences about alternative items to their original target, but also to change their mind and level of patience. In our experiments using Shoes and Fashion IQ as the original datasets and three CRS models, we find that the simulator's use of knowledge about alternatives can have a considerable impact on the evaluation of existing CRS models; specifically, the existing single-target evaluation underestimates their effectiveness, and when simulated users are allowed to consider alternative relevant items instead, the system can respond and satisfy the user more quickly.

Updated: 2025-07-24 01:18:24

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2507.18017v1
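
A minimal sketch of a meta-user simulator with alternative targets and finite patience; the class, thresholds, and `judge` callback are illustrative stand-ins for the two simulators the paper defines on the collected judgments:

```python
class MetaUserSimulator:
    """Simulated CRS user that accepts alternative relevant items and has
    finite patience (illustrative; not the paper's exact simulators)."""

    def __init__(self, target, alternatives, judge, patience=5, threshold=4.0):
        self.targets = {target, *alternatives}  # original target + judged alternatives
        self.judge = judge                      # item -> human relevance judgment
        self.patience = patience
        self.threshold = threshold              # min judgment to accept an item

    def respond(self, recommended):
        if recommended in self.targets and self.judge(recommended) >= self.threshold:
            return "accept"
        self.patience -= 1
        if self.patience <= 0:
            return "quit"      # the user changes their mind / gives up
        return "critique"      # provide feedback for the next turn
```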

Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem

The AI alignment problem, which focusses on ensuring that artificial intelligence (AI) systems, including AGI and ASI, act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated. Here, we investigate whether embracing inevitable AI misalignment can be a contingent strategy to foster a dynamic ecosystem of competing agents, a viable path to steer them toward more human-aligned trends and mitigate risks. We explore how misalignment may serve, and should be promoted, as a counterbalancing mechanism, teaming up with whichever agents are most aligned with human interests so that no single system dominates destructively. The main premise of our contribution is that misalignment is inevitable, because full AI-human alignment is a mathematical impossibility for Turing-complete systems, which we also prove in this contribution, a limitation inherited by AGI and ASI systems. We introduce a change-of-opinion attack test based on perturbation and intervention analysis to study how humans and agents may change or neutralise friendly and unfriendly AIs through cooperation and competition. We show that open models are more diverse, and that the guardrails most likely implemented in proprietary models are successful at controlling some of the agents' range of behaviour, with positive and negative consequences, while closed systems are more steerable and can also be used against proprietary AI systems. We also show that human and AI intervention has different effects, suggesting multiple strategies.

Updated: 2025-07-24 01:12:59

Categories: cs.AI

Download: http://arxiv.org/abs/2505.02581v4

Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models

Fine-tuning large language models (LLMs) for reasoning tasks using reinforcement learning methods like Group Relative Policy Optimization (GRPO) is computationally expensive. To address this, we propose a predictive framework that models training dynamics and helps optimize resource usage. Through experiments on Llama and Qwen models (3B-8B), we derive an empirical scaling law based on model size, initial performance, and training progress. This law predicts reward trajectories and identifies three consistent training phases: slow start, rapid improvement, and plateau. We find that training beyond a certain number of epochs offers little gain, suggesting that earlier stopping can significantly reduce compute without sacrificing performance. Our approach generalizes across model types, providing a practical guide for efficient GRPO-based fine-tuning.

Updated: 2025-07-24 01:09:25

Categories: cs.LG

Download: http://arxiv.org/abs/2507.18014v1
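
A minimal sketch of the idea: fit a saturating curve to the observed reward trajectory and stop once the predicted marginal gain over the remaining budget is negligible. The logistic form, initial guesses, and threshold are assumptions; the paper's law additionally conditions on model size and initial performance:

```python
import numpy as np
from scipy.optimize import curve_fit

def reward_curve(t, r0, rmax, k, t0):
    """Saturating (logistic) reward trajectory: slow start, rapid
    improvement, plateau -- the three phases described above."""
    return r0 + (rmax - r0) / (1.0 + np.exp(-k * (t - t0)))

def fit_and_stop(steps, rewards, horizon, min_gain=1e-3):
    """Fit the curve on the observed prefix; recommend stopping once the
    predicted gain from here to `horizon` falls below `min_gain`."""
    p, _ = curve_fit(reward_curve, steps, rewards,
                     p0=[rewards[0], rewards[-1], 0.01, np.median(steps)],
                     maxfev=10000)
    predicted_gain = reward_curve(horizon, *p) - reward_curve(steps[-1], *p)
    return p, predicted_gain < min_gain
```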

GRR-CoCa: Leveraging LLM Mechanisms in Multimodal Model Architectures

State-of-the-art (SOTA) image and text generation models are multimodal models that have many similarities to large language models (LLMs). Despite achieving strong performances, leading foundational multimodal model architectures frequently lag behind the architectural sophistication of contemporary LLMs. We propose GRR-CoCa, an improved SOTA Contrastive Captioner (CoCa) model that incorporates Gaussian error gated linear units, root mean squared normalization, and rotary positional embedding into the textual decoders and the vision transformer (ViT) encoder. Each architectural modification has been shown to improve model performance in LLMs, but has yet to be adopted in CoCa. We benchmarked GRR-CoCa against Baseline CoCa, a model with the same modified textual decoders but with CoCa's original ViT encoder. We used standard pretraining and fine-tuning workflows to benchmark the models on contrastive and generative tasks. Our GRR-CoCa significantly outperformed Baseline CoCa on the pretraining dataset and three diverse fine-tuning datasets. Pretraining improvements were 27.25% in contrastive loss, 3.71% in perplexity, and 7.15% in CoCa loss. The average fine-tuning improvements were 13.66% in contrastive loss, 5.18% in perplexity, and 5.55% in CoCa loss. We show that GRR-CoCa's modified architecture improves performance and generalization across vision-language domains.

Updated: 2025-07-24 00:54:31

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.18009v1
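
The three borrowed ingredients are standard LLM components; a minimal PyTorch sketch follows (the half-split RoPE variant is shown, and the wiring into CoCa's decoders and ViT encoder is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization (no mean subtraction, learned scale)."""
    def __init__(self, d, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d))
        self.eps = eps
    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class GEGLU(nn.Module):
    """Gaussian-error gated linear unit feed-forward: out(a * gelu(b))."""
    def __init__(self, d, hidden):
        super().__init__()
        self.proj = nn.Linear(d, 2 * hidden)
        self.out = nn.Linear(hidden, d)
    def forward(self, x):
        a, b = self.proj(x).chunk(2, dim=-1)
        return self.out(a * F.gelu(b))

def rotary(x, base=10000.0):
    """Rotary positional embedding (half-split variant); x: (..., seq, d), d even."""
    *_, n, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype) / half)
    ang = torch.arange(n, dtype=x.dtype)[:, None] * freqs[None, :]
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```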

Active Learning For Repairable Hardware Systems With Partial Coverage

Identifying the optimal diagnostic test and hardware system instance to infer reliability characteristics using field data is challenging, especially when constrained by fixed budgets and minimal maintenance cycles. Active Learning (AL) has shown promise for parameter inference with limited data and budget constraints in machine learning/deep learning tasks. However, AL for reliability model parameter inference remains underexplored for repairable hardware systems. It requires specialized AL Acquisition Functions (AFs) that consider hardware aging and the fact that a hardware system consists of multiple sub-systems, which may undergo only partial testing during a given diagnostic test. To address these challenges, we propose a relaxed Mixed Integer Semidefinite Program (MISDP) AL AF that incorporates Diagnostic Coverage (DC), Fisher Information Matrices (FIMs), and diagnostic testing budgets. Furthermore, we design empirical-based simulation experiments focusing on two diagnostic testing scenarios: (1) partial tests of a hardware system with overlapping subsystem coverage, and (2) partial tests where one diagnostic test fully subsumes the subsystem coverage of another. We evaluate our proposed approach against the most widely used AL AF in the literature (entropy), as well as several intuitive AL AFs tailored for reliability model parameter inference. Our proposed AF ranked best on average among the alternative AFs across 6,000 experimental configurations, with respect to Area Under the Curve (AUC) of the Absolute Total Expected Event Error (ATEER) and Mean Squared Error (MSE) curves, with statistical significance calculated at a 0.05 alpha level using a Friedman hypothesis test.

Updated: 2025-07-24 00:51:58

Categories: stat.AP,cs.LG

Download: http://arxiv.org/abs/2503.16315v3
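
A greedy, simplified stand-in for the acquisition idea: accumulate coverage-weighted Fisher information and repeatedly pick the affordable test with the best log-det gain per unit cost. This sketches the intuition only; the paper formulates and relaxes a MISDP rather than selecting greedily:

```python
import numpy as np

def greedy_test_selection(fims, coverage, costs, budget, d):
    """fims[i]: d x d Fisher information contributed by diagnostic test i;
    coverage[i] in [0, 1]: fraction of sub-systems the test exercises;
    costs[i]: cost of running the test; budget: total testing budget."""
    info = 1e-6 * np.eye(d)   # jitter keeps log-det finite before any test
    chosen, spent = [], 0.0
    remaining = set(range(len(fims)))
    while True:
        def gain(i):
            new = info + coverage[i] * fims[i]
            return np.linalg.slogdet(new)[1] - np.linalg.slogdet(info)[1]
        affordable = [i for i in remaining if spent + costs[i] <= budget]
        if not affordable:
            return chosen
        best = max(affordable, key=lambda i: gain(i) / costs[i])  # info per cost
        info += coverage[best] * fims[best]
        chosen.append(best); spent += costs[best]; remaining.remove(best)
```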

Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model

In recent years, the advancement of imitation learning has led to increased interest in teleoperating low-cost manipulators to collect demonstration data. However, most existing systems rely on unilateral control, which only transmits target position values. While this approach is easy to implement and suitable for slow, non-contact tasks, it struggles with fast or contact-rich operations due to the absence of force feedback. This work demonstrates that fast teleoperation with force feedback is feasible even with force-sensorless, low-cost manipulators by leveraging 4-channel bilateral control. Based on accurately identified manipulator dynamics, our method integrates nonlinear terms compensation, velocity and external force estimation, and variable gain corresponding to inertial variation. Furthermore, using data collected by 4-channel bilateral control, we show that incorporating force information into both the input and output of learned policies improves performance in imitation learning. These results highlight the practical effectiveness of our system for high-fidelity teleoperation and data collection on affordable hardware.

Updated: 2025-07-24 00:40:26

Categories: cs.RO,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2507.06174v5
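
A minimal sketch of a symmetric 4-channel bilateral law: one channel synchronizes leader and follower positions/velocities, while the other drives the estimated external forces toward action-reaction (f_l + f_f = 0). In the sensorless setting, `fl_hat` and `ff_hat` would come from disturbance observers built on the identified dynamics; the gains and structure here are illustrative, not the paper's controller:

```python
def four_channel_bilateral(xl, vl, xf, vf, fl_hat, ff_hat, Kp, Kd, Kf):
    """One control step for a leader/follower joint pair.
    xl, vl / xf, vf: leader and follower position and velocity.
    fl_hat, ff_hat: external forces estimated without sensors (e.g. via
    disturbance observers on an accurately identified dynamics model)."""
    e, de = xl - xf, vl - vf        # position channel: drive tracking error to 0
    fsum = fl_hat + ff_hat          # force channel: enforce action-reaction
    tau_l = -0.5 * (Kp * e + Kd * de) - 0.5 * Kf * fsum
    tau_f = +0.5 * (Kp * e + Kd * de) - 0.5 * Kf * fsum
    return tau_l, tau_f             # torques commanded to leader and follower
```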

E.A.R.T.H.: Structuring Creative Evolution through Model Error in Generative AI

How can AI move beyond imitation toward genuine creativity? This paper proposes the E.A.R.T.H. framework, a five-stage generative pipeline that transforms model-generated errors into creative assets through Error generation, Amplification, Refine selection, Transform, and Harness feedback. Drawing on cognitive science and generative modeling, we posit that "creative potential hides in failure" and operationalize this via structured prompts, semantic scoring, and human-in-the-loop evaluation. Implemented using LLaMA-2-7B-Chat, SBERT, BERTScore, CLIP, BLIP-2, and Stable Diffusion, the pipeline employs a composite reward function based on novelty, surprise, and relevance. At the Refine stage, creativity scores increase by 52.5% (1.179 to 1.898, t = -5.56, p < 0.001), with final outputs reaching 2.010 - a 70.4% improvement. Refined slogans are 48.4% shorter, 40.7% more novel, with only a 4.0% drop in relevance. Cross-modal tests show strong slogan-to-image alignment (CLIPScore: 0.249; BERTScore F1: 0.816). In human evaluations, 60% of outputs scored >= 4.0, with metaphorical slogans (avg. 4.09) outperforming literal ones (3.99). Feedback highlights stylistic precision and emotional resonance. These results demonstrate that error-centered, feedback-driven generation enhances creativity, offering a scalable path toward self-evolving, human-aligned creative AI.

Updated: 2025-07-24 00:39:19

Categories: cs.AI

Download: http://arxiv.org/abs/2507.18004v1
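
A minimal sketch of a composite reward over novelty, surprise, and relevance using sentence embeddings; the weights and the surprise proxy are assumptions, not the paper's exact scoring functions:

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

def composite_reward(candidate, brief, corpus, w=(0.4, 0.3, 0.3)):
    """Illustrative composite reward = w1*novelty + w2*surprise + w3*relevance."""
    c = encoder.encode(candidate, convert_to_tensor=True)
    b = encoder.encode(brief, convert_to_tensor=True)
    corp = encoder.encode(corpus, convert_to_tensor=True)
    relevance = float(util.cos_sim(c, b))               # fit to the creative brief
    novelty = 1.0 - float(util.cos_sim(c, corp).max())  # distance to known slogans
    surprise = novelty * relevance                      # crude proxy: novel yet on-topic
    return w[0] * novelty + w[1] * surprise + w[2] * relevance

score = composite_reward("Walk on clouds, land on style.",
                         "slogan for lightweight running shoes",
                         ["Just do it.", "Run far, run fast."])
```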

NIST Post-Quantum Cryptography Standard Algorithms Based on Quantum Random Number Generators

In recent years, the advancement of quantum computing technology has posed potential security threats to RSA cryptography and elliptic curve cryptography. In response, the National Institute of Standards and Technology (NIST) published several Federal Information Processing Standards (FIPS) of post-quantum cryptography (PQC) in August 2024, including the Module-Lattice-Based Key-Encapsulation Mechanism (ML-KEM), Module-Lattice-Based Digital Signature Algorithm (ML-DSA), and Stateless Hash-Based Digital Signature Algorithm (SLH-DSA). Although these PQC algorithms are designed to resist quantum computing attacks, they may not provide adequate security in certain specialized application scenarios. To address this issue, this study proposes quantum random number generator (QRNG)-based PQC algorithms. These algorithms leverage quantum computing to generate random numbers, which serve as the foundation for key pair generation, key encapsulation, and digital signature generation. A generalized architecture of QRNG is proposed, along with the design of six QRNGs. Each generator is evaluated according to the statistical validation procedures outlined in NIST SP 800-90B, including tests for verification of entropy sources and independent and identically distributed (IID) outputs. Experimental results assess the computation time of the six QRNGs, as well as the performance of QRNG-based ML-KEM, QRNG-based ML-DSA, and QRNG-based SLH-DSA. These findings provide valuable reference data for future deployment of PQC systems.

Updated: 2025-07-24 00:27:23

Categories: cs.CR,cs.PF,quant-ph,stat.AP

Download: http://arxiv.org/abs/2507.21151v1
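
One piece of the SP 800-90B validation is easy to illustrate: the "most common value" estimator, which bounds min-entropy from the modal output frequency. A minimal sketch (the input file name is hypothetical, and the full standard applies a larger battery of estimators plus IID testing):

```python
import math
from collections import Counter

def mcv_min_entropy(samples):
    """NIST SP 800-90B 'most common value' estimate: a conservative
    min-entropy bound derived from the frequency of the modal value."""
    n = len(samples)
    p_hat = Counter(samples).most_common(1)[0][1] / n
    # 99% upper confidence bound on the modal probability (z = 2.576)
    p_u = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1 - p_hat) / (n - 1)))
    return -math.log2(p_u)  # min-entropy per sample, in bits

raw = open("qrng_output.bin", "rb").read()  # hypothetical raw bytes from a QRNG
print(f"MCV min-entropy: {mcv_min_entropy(raw):.3f} bits/byte")
```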

Analyzing Islamophobic Discourse Using Semi-Coded Terms and LLMs

In recent years, Islamophobia has gained significant traction across Western societies, fueled by the rise of digital communication networks. This paper performs a large-scale analysis of specialized, semi-coded Islamophobic terms such as (muzrat, pislam, mudslime, mohammedan, muzzies) floated on extremist social platforms, e.g., 4Chan, Gab, and Telegram. Many of these terms appear lexically neutral or ambiguous outside of specific contexts, making them difficult for both human moderators and automated systems to reliably identify as hate speech. First, we use Large Language Models (LLMs) to show their ability to understand these terms. Second, the Google Perspective API suggests that Islamophobic posts tend to receive higher toxicity scores than other categories of hate speech, such as Antisemitism. Finally, we use a BERT topic modeling approach to extract the different topics and Islamophobic discourse on these social platforms. Our findings indicate that LLMs understand these Out-Of-Vocabulary (OOV) slurs; however, further improvements in moderation strategies and algorithmic detection are necessary to address such discourse effectively. Our topic modeling also indicates that Islamophobic text is found across various political, conspiratorial, and far-right movements and is particularly directed against Muslim immigrants. Taken altogether, we performed one of the first studies of Islamophobic semi-coded terms and shed light on Islamophobia globally.

Updated: 2025-07-24 00:22:47

Categories: cs.LG

Download: http://arxiv.org/abs/2503.18273v2
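
A minimal sketch of the BERT-based topic-modeling step using the BERTopic library; the data loader is hypothetical and the hyperparameters are illustrative:

```python
from bertopic import BERTopic

posts = load_posts()  # hypothetical loader for the crawled platform posts

topic_model = BERTopic(min_topic_size=20)
topics, probs = topic_model.fit_transform(posts)

# Inspect the discovered clusters and which ones the semi-coded slurs land in.
print(topic_model.get_topic_info().head(10))
for term in ["muzrat", "pislam", "mudslime"]:
    similar_topics, scores = topic_model.find_topics(term, top_n=3)
    print(term, similar_topics, scores)
```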

Fine-Grained Uncertainty Quantification via Collisions

We propose a new and intuitive metric for aleatoric uncertainty quantification (UQ), the prevalence of class collisions defined as the same input being observed in different classes. We use the rate of class collisions to define the collision matrix, a novel and uniquely fine-grained measure of uncertainty. For a classification problem involving $K$ classes, the $K\times K$ collision matrix $S$ measures the inherent difficulty in distinguishing between each pair of classes. We discuss several applications of the collision matrix, establish its fundamental mathematical properties, as well as show its relationship with existing UQ methods, including the Bayes error rate (BER). We also address the new problem of estimating the collision matrix using one-hot labeled data by proposing a series of innovative techniques to estimate $S$. First, we learn a pair-wise contrastive model which accepts two inputs and determines if they belong to the same class. We then show that this contrastive model (which is PAC learnable) can be used to estimate the Gramian matrix of $S$, defined as $G=S^TS$. Finally, we show that under reasonable assumptions, $G$ can be used to uniquely recover $S$, a new result on non-negative matrices which could be of independent interest. With a method to estimate $S$ established, we demonstrate how this estimate of $S$, in conjunction with the contrastive model, can be used to estimate the posterior class probability distribution of any point. Experimental results are also presented to validate our methods of estimating the collision matrix and class posterior distributions on several datasets.

Updated: 2025-07-24 00:06:46

Categories: cs.LG,cs.IT,math.IT,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2411.12127v4
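
A minimal sketch of the estimation pipeline: average pairwise "same class" probabilities from the contrastive model to estimate $G=S^TS$, then recover $S$. Here the recovery uses a matrix square root under the simplifying assumption that $S$ is symmetric PSD; the paper proves unique recovery under its own, weaker assumptions:

```python
import numpy as np
from scipy.linalg import sqrtm

def estimate_gram(collision_prob, X_by_class):
    """Estimate G = S^T S entrywise from a learned contrastive model.
    collision_prob(x, x') ~ P(same class | x, x'); X_by_class[k] holds
    inputs drawn from class k (an illustrative estimator, not the paper's)."""
    K = len(X_by_class)
    G = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            probs = [collision_prob(x, xp)
                     for x in X_by_class[i] for xp in X_by_class[j]]
            G[i, j] = np.mean(probs)
    return G

def recover_collision_matrix(G):
    """Assuming S is symmetric PSD, G = S^2 and S = G^{1/2}."""
    S = np.real(sqrtm(G))
    return np.clip(S, 0.0, None)  # collision rates are non-negative
```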

By Xinhai (Sean) Zou.