    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        


FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning

Federated Learning (FL) offers a paradigm for privacy-preserving collaborative AI, but its decentralized nature creates significant vulnerabilities to model poisoning attacks. While numerous static defenses exist, their effectiveness is highly context-dependent, often failing against adaptive adversaries or in heterogeneous data environments. This paper introduces FedStrategist, a novel meta-learning framework that reframes robust aggregation as a real-time, cost-aware control problem. We design a lightweight contextual bandit agent that dynamically selects the optimal aggregation rule from an arsenal of defenses based on real-time diagnostic metrics. Through comprehensive experiments, we demonstrate that no single static rule is universally optimal. We show that our adaptive agent successfully learns superior policies across diverse scenarios, including a "Krum-favorable" environment and against a sophisticated "stealth" adversary designed to neutralize specific diagnostic signals. Critically, we analyze the paradoxical scenario where a non-robust baseline achieves high but compromised accuracy, and demonstrate that our agent learns a conservative policy to prioritize model integrity. Furthermore, we prove the agent's policy is controllable via a single "risk tolerance" parameter, allowing practitioners to explicitly manage the trade-off between performance and security. Our work provides a new, practical, and analyzable approach to creating resilient and intelligent decentralized AI systems.
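As an illustration of the control loop the abstract describes (not the authors' code): a lightweight epsilon-greedy bandit that picks one aggregation rule per round and updates on a reward discounted by a risk-weighted attack penalty. The two-rule arsenal, the reward shape, and the `risk_tolerance` knob below are illustrative assumptions modeled on the abstract's wording.

```python
import random
import statistics

# Two stand-in aggregation rules (a real arsenal would also include
# trimmed mean, Krum, etc. -- these are illustrative only).
def fed_avg(updates):
    """Coordinate-wise mean: accurate but not robust to poisoning."""
    return [statistics.fmean(col) for col in zip(*updates)]

def coord_median(updates):
    """Coordinate-wise median: robust to a minority of outliers."""
    return [statistics.median(col) for col in zip(*updates)]

ARSENAL = {"fedavg": fed_avg, "median": coord_median}

class StrategistBandit:
    """Epsilon-greedy bandit over aggregation rules. `risk_tolerance`
    mirrors the single control parameter the abstract describes: it
    scales how heavily a suspected-attack cost discounts the reward
    (assumed reward shape, not the paper's)."""

    def __init__(self, arms, epsilon=0.1, risk_tolerance=1.0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.risk_tolerance = risk_tolerance
        self.value = {a: 0.0 for a in self.arms}
        self.count = {a: 0 for a in self.arms}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)              # explore
        return max(self.arms, key=lambda a: self.value[a])  # exploit

    def update(self, arm, accuracy_gain, suspected_attack_cost):
        reward = accuracy_gain - self.risk_tolerance * suspected_attack_cost
        self.count[arm] += 1
        # Incremental mean of rewards observed for this arm.
        self.value[arm] += (reward - self.value[arm]) / self.count[arm]

# One illustrative round: three benign client updates, one poisoned.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [50.0, -50.0]]
bandit = StrategistBandit(ARSENAL)
rule = bandit.select()
aggregated = ARSENAL[rule](updates)
# After observing this round's diagnostics, feed back a reward:
bandit.update(rule, accuracy_gain=0.02, suspected_attack_cost=0.0)
```

Raising `risk_tolerance` makes any rule that coincides with suspected attacks look worse, pushing the policy toward conservative aggregators — the performance/security trade-off the abstract attributes to its single knob.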

Updated: 2025-07-28 23:58:45

Categories: cs.LG,cs.CR,cs.DC,I.2.11; C.2.4; K.6.5

Download: http://arxiv.org/abs/2507.14322v2

Addressing High Class Imbalance in Multi-Class Diabetic Retinopathy Severity Grading with Augmentation and Transfer Learning

Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, and early diagnosis through automated retinal image analysis can significantly reduce the risk of blindness. This paper presents a robust deep learning framework for both binary and five-class DR classification, leveraging transfer learning and extensive data augmentation to address the challenges of class imbalance and limited training data. We evaluate a range of pretrained convolutional neural network architectures, including variants of ResNet and EfficientNet, on the APTOS 2019 dataset. For binary classification, our proposed model achieves a state-of-the-art accuracy of 98.9%, with a precision of 98.6%, recall of 99.3%, F1-score of 98.9%, and an AUC of 99.4%. In the more challenging five-class severity classification task, our model obtains a competitive accuracy of 84.6% and an AUC of 94.1%, outperforming several existing approaches. Our findings also demonstrate that EfficientNet-B0 and ResNet34 offer optimal trade-offs between accuracy and computational efficiency across both tasks. These results underscore the effectiveness of combining class-balanced augmentation with transfer learning for high-performance DR diagnosis. The proposed framework provides a scalable and accurate solution for DR screening, with potential for deployment in real-world clinical environments.
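The abstract does not spell out the augmentation recipe; assuming "class-balanced" means the common inverse-frequency weighting, the sampling-weight step can be sketched as:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-sample weights so each class is drawn equally often under
    weighted sampling -- one common reading of 'class-balanced'."""
    freq = Counter(labels)
    return [1.0 / freq[y] for y in labels]

# Toy 5-class label set, heavily skewed toward grade 0 (as in APTOS 2019).
labels = [0] * 8 + [1] * 2 + [2] * 2 + [3] * 1 + [4] * 1
weights = inverse_frequency_weights(labels)

# Each class now contributes total sampling mass 1.0, so the sampler
# sees a roughly uniform class distribution.
mass = {}
for y, w in zip(labels, weights):
    mass[y] = mass.get(y, 0.0) + w
```

In a PyTorch pipeline such weights would typically feed `torch.utils.data.WeightedRandomSampler`, with geometric and color augmentations applied on top of the rebalanced batches.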

Updated: 2025-07-28 23:58:31

Categories: cs.CV,cs.LG,F.2.2; I.2.7

Download: http://arxiv.org/abs/2507.17121v2

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation

We introduce LLaVA-Reward, an efficient reward model designed to automatically evaluate text-to-image (T2I) generations across multiple perspectives, leveraging pretrained multimodal large language models (MLLMs). Existing MLLM-based approaches require instruction-following data for supervised fine-tuning and evaluate generation quality by analyzing text responses, which is time-consuming and difficult to train. To address this problem, we propose LLaVA-Reward, which directly utilizes the hidden states of MLLMs given text-image pairs. To enhance the bidirectional interaction between visual and textual representations in decoder-only MLLMs, we further propose adding a Skip-connection Cross Attention (SkipCA) module. This design enhances text-image correlation reasoning by connecting early-layer visual features with later-layer hidden representations. In addition, LLaVA-Reward supports different types of preference data for efficient fine-tuning, including paired preference data and unpaired data. We train LLaVA-Reward on four evaluation perspectives: text-image alignment, fidelity/artifact, safety, and overall ranking. Empirical results demonstrate that LLaVA-Reward outperforms conventional and MLLM-based methods in generating human-aligned scores for automatic evaluations and inference-time scaling in text-to-image generations.

Updated: 2025-07-28 23:52:53

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.21391v1

Teaching Language Models To Gather Information Proactively

Large language models (LLMs) are increasingly expected to function as collaborative partners, engaging in back-and-forth dialogue to solve complex, ambiguous problems. However, current LLMs often falter in real-world settings, defaulting to passive responses or narrow clarifications when faced with incomplete or under-specified prompts, falling short of proactively gathering the missing information that is crucial for high-quality solutions. In this work, we introduce a new task paradigm: proactive information gathering, where LLMs must identify gaps in the provided context and strategically elicit implicit user knowledge through targeted questions. To systematically study and train this capability, we design a scalable framework that generates partially specified, real-world tasks, masking key information and simulating authentic ambiguity. Within this setup, our core innovation is a reinforcement finetuning strategy that rewards questions that elicit genuinely new, implicit user information -- such as hidden domain expertise or fine-grained requirements -- that would otherwise remain unspoken. Experiments demonstrate that our trained Qwen-2.5-7B model significantly outperforms o3-mini by 18% on automatic evaluation metrics. More importantly, human evaluation reveals that clarification questions and final outlines generated by our model are favored by human annotators by 42% and 28% respectively. Together, these results highlight the value of proactive clarification in elevating LLMs from passive text generators to genuinely collaborative thought partners.
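The abstract leaves the reward computation unspecified; a toy set-difference stand-in conveys the idea of paying only for information the user had not already stated:

```python
def novelty_reward(elicited_facts, known_context_facts):
    """Reward a clarification question by how many genuinely *new*
    user facts its answer surfaces. The paper's actual reward is not
    given in the abstract; this set-difference version is a toy."""
    return len(set(elicited_facts) - set(known_context_facts))

context = {"task: write a design doc", "audience: engineers"}
# The answer to a targeted question restates one known fact and
# reveals two implicit requirements:
elicited = {"audience: engineers", "deadline: friday", "format: one-pager"}
reward = novelty_reward(elicited, context)  # only the 2 new facts count
```

Under such a reward, a question whose answer merely restates the prompt earns nothing, which is the abstract's distinction between narrow clarification and proactive information gathering.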

Updated: 2025-07-28 23:50:09

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.21389v1

Radio Adversarial Attacks on EMG-based Gesture Recognition Networks

Surface electromyography (EMG) enables non-invasive human-computer interaction in rehabilitation, prosthetics, and virtual reality. While deep learning models achieve over 97% classification accuracy, their vulnerability to adversarial attacks remains largely unexplored in the physical domain. We present ERa Attack, the first radio frequency (RF) adversarial method targeting EMG devices through intentional electromagnetic interference (IEMI). Using low-power software-defined radio transmitters, attackers inject optimized RF perturbations to mislead downstream models. Our approach bridges digital and physical domains: we generate adversarial perturbations using Projected Gradient Descent, extract 50-150 Hz components via inverse STFT, and employ synchronization-free strategies (constant spectrum noise or narrowband modulation). Perturbations, constrained to 1-10% of signal amplitude, are amplitude-modulated onto 433 MHz carriers. Experiments on the Myo Dataset (7 gestures, 350 samples) demonstrate significant impact: at 1 meter and 0 dBm transmission power, classification accuracy drops from 97.8% to 58.3%, with 41.7% misclassification rate and 25.6% targeted attack success rate. Attack effectiveness decreases exponentially with distance, recovering to 85% accuracy at 3 meters. Increasing power to 10 dBm reduces accuracy by an additional 15% at 1 meter. This work pioneers RF-based adversarial attacks on EMG recognition systems, revealing critical vulnerabilities in safety-critical applications. We quantify attack effectiveness across different perturbation modes and distances, and propose defenses including hardware shielding, spectrum monitoring, and adversarial training. Our findings inform the design of robust EMG systems against electromagnetic threats.
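A sketch of the digital half of the attack pipeline: one sign-gradient PGD step with the perturbation clipped to a fraction of the signal's peak amplitude, mirroring the 1-10% constraint. The gradient is supplied by the caller (the paper derives it from the victim classifier), and the per-step projection is a simplification of full PGD, which projects the accumulated perturbation.

```python
import math

def pgd_step(signal, grad, step_size, eps_frac):
    """One sign-gradient ascent step, projected into an eps-ball whose
    radius is `eps_frac` of the signal's peak amplitude."""
    eps = eps_frac * max(abs(x) for x in signal)
    perturbed = []
    for x, g in zip(signal, grad):
        delta = step_size * math.copysign(1.0, g)  # ascend the loss
        delta = max(-eps, min(eps, delta))         # amplitude constraint
        perturbed.append(x + delta)
    return perturbed
```

In the paper's physical pipeline the resulting perturbation would additionally be band-limited to 50-150 Hz and amplitude-modulated onto the RF carrier; those stages are omitted here.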

Updated: 2025-07-28 23:45:34

Categories: cs.CR

Download: http://arxiv.org/abs/2507.21387v1

Efficient Neural Combinatorial Optimization Solver for the Min-max Heterogeneous Capacitated Vehicle Routing Problem

Numerous Neural Combinatorial Optimization (NCO) solvers have been proposed to address Vehicle Routing Problems (VRPs). However, most of these solvers focus exclusively on single-vehicle VRP variants, overlooking the more realistic min-max Heterogeneous Capacitated Vehicle Routing Problem (MMHCVRP), which involves multiple vehicles. Existing MMHCVRP solvers typically select a vehicle and its next node to visit at each decoding step, but often make myopic decoding decisions and overlook key properties of MMHCVRP, including local topological relationships, vehicle permutation invariance, and node symmetry, resulting in suboptimal performance. To better address these limitations, we propose ECHO, an efficient NCO solver. First, ECHO exploits the proposed dual-modality node encoder to capture local topological relationships among nodes. Subsequently, to mitigate myopic decisions, ECHO employs the proposed Parameter-Free Cross-Attention mechanism to prioritize the vehicle selected in the preceding decoding step. Finally, leveraging vehicle permutation invariance and node symmetry, we introduce a tailored data augmentation strategy for MMHCVRP to stabilize the Reinforcement Learning training process. To assess the performance of ECHO, we conduct extensive experiments. The experimental results demonstrate that ECHO outperforms state-of-the-art NCO solvers across varying numbers of vehicles and nodes, and generalizes well across both scales and distribution patterns. Finally, ablation studies validate the effectiveness of all proposed methods.

Updated: 2025-07-28 23:38:33

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.21386v1

Deep Reinforcement Learning-based Cell DTX/DRX Configuration for Network Energy Saving

3GPP Release 18 cell discontinuous transmission and reception (cell DTX/DRX) is an important new network energy saving feature for 5G. As a time-domain technique, it periodically aggregates the user data transmissions in a given duration of time when the traffic load is not heavy, so that the remaining time can be kept silent and advanced sleep modes (ASM) can be enabled to shut down more radio components and save more energy for the cell. However, inevitably the packet delay is increased, as during the silent period no transmission is allowed. In this paper we study how to configure cell DTX/DRX to optimally balance energy saving and packet delay, so that for delay-sensitive traffic maximum energy saving can be achieved while the degradation of quality of service (QoS) is minimized. As the optimal configuration can be different for different network and traffic conditions, the problem is complex and we resort to deep reinforcement learning (DRL) framework to train an AI agent to solve it. Through careful design of 1) the learning algorithm, which implements a deep Q-network (DQN) on a contextual bandit (CB) model, and 2) the reward function, which utilizes a smooth approximation of a theoretically optimal but discontinuous reward function, we are able to train an AI agent that always tries to select the best possible Cell DTX/DRX configuration under any network and traffic conditions. Simulation results show that compared to the case when cell DTX/DRX is not used, our agent can achieve up to ~45% energy saving depending on the traffic load scenario, while always maintaining no more than ~1% QoS degradation.
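The abstract's second design point — smoothing a theoretically optimal but discontinuous reward — can be illustrated with a sigmoid in place of the hard QoS step. The functional form and constants here are assumptions for illustration, not the paper's.

```python
import math

def smooth_reward(energy_saved, delay_ms, delay_budget_ms, tau=5.0, penalty=1.0):
    """Energy saving minus a smooth QoS penalty. The ideal reward pays
    energy savings only while delay stays within budget -- a step
    function with zero gradient almost everywhere. A sigmoid of width
    `tau` keeps the reward informative near the QoS boundary, which is
    friendlier to DQN training."""
    overshoot = (delay_ms - delay_budget_ms) / tau
    qos_penalty = penalty / (1.0 + math.exp(-overshoot))  # ~0 in budget, ~penalty past it
    return energy_saved - qos_penalty
```

As `tau` shrinks, the sigmoid approaches the discontinuous ideal; a moderate `tau` trades a little reward fidelity for a learnable gradient signal near the budget.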

Updated: 2025-07-28 23:35:24

Categories: cs.NI,cs.AI

Download: http://arxiv.org/abs/2507.21385v1

Optimizing Multi-Tier Supply Chain Ordering with LNN+XGBoost: Mitigating the Bullwhip Effect

Supply chain management faces significant challenges, including demand fluctuations, inventory imbalances, and amplified upstream order variability due to the bullwhip effect. Traditional methods, such as simple moving averages, struggle to address dynamic market conditions. Emerging machine learning techniques, including LSTM, reinforcement learning, and XGBoost, offer potential solutions but are limited by computational complexity, training inefficiencies, or constraints in time-series modeling. Liquid Neural Networks, inspired by dynamic biological systems, present a promising alternative due to their adaptability, low computational cost, and robustness to noise, making them suitable for real-time decision-making and edge computing. Despite their success in applications like autonomous vehicles and medical monitoring, their potential in supply chain optimization remains underexplored. This study introduces a hybrid LNN and XGBoost model to optimize ordering strategies in multi-tier supply chains. By leveraging LNN's dynamic feature extraction and XGBoost's global optimization capabilities, the model aims to mitigate the bullwhip effect and enhance cumulative profitability. The research investigates how local and global synergies within the hybrid framework address the dual demands of adaptability and efficiency in SCM. The proposed approach fills a critical gap in existing methodologies, offering an innovative solution for dynamic and efficient supply chain management.
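For readers new to the bullwhip effect the model targets, a stylized simulation: each tier forecasts with a simple moving average and over-reacts to deviations from it, and order variance grows tier by tier. The over-reaction gain of 1.5 is an arbitrary illustrative choice.

```python
import statistics

def simulate_orders(demand, tiers=3, smoothing=4, gain=1.5):
    """Propagate orders up a serial supply chain; returns the order
    variance observed at each tier (index 0 = end-customer demand)."""
    series = list(demand)
    variances = [statistics.pvariance(series)]
    for _ in range(tiers):
        orders = []
        for t in range(len(series)):
            window = series[max(0, t - smoothing + 1): t + 1]
            forecast = sum(window) / len(window)
            # Order the forecast plus an exaggerated correction toward
            # the latest observation (gain > 1 models over-reaction).
            orders.append(forecast + gain * (series[t] - forecast))
        series = orders
        variances.append(statistics.pvariance(series))
    return variances

# Mildly oscillating demand already amplifies upstream:
variances = simulate_orders([10, 12] * 6, tiers=3)
```

Mitigating this amplification — for instance by learning a less reactive, context-aware ordering policy — is precisely the objective the hybrid LNN+XGBoost model optimizes for.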

Updated: 2025-07-28 23:24:54

Categories: cs.AI

Download: http://arxiv.org/abs/2507.21383v1

MAAD: Automate Software Architecture Design through Knowledge-Driven Multi-Agent Collaboration

Software architecture design is a critical, yet inherently complex and knowledge-intensive phase of software development. It requires deep domain expertise, development experience, architectural knowledge, careful trade-offs among competing quality attributes, and the ability to adapt to evolving requirements. Traditionally, this process is time-consuming and labor-intensive, and relies heavily on architects, often resulting in limited design alternatives, especially under the pressures of agile development. While Large Language Model (LLM)-based agents have shown promising performance across various SE tasks, their application to architecture design remains relatively scarce and requires more exploration, particularly in light of diverse domain knowledge and complex decision-making. To address the challenges, we proposed MAAD (Multi-Agent Architecture Design), an automated framework that employs a knowledge-driven Multi-Agent System (MAS) for architecture design. MAAD orchestrates four specialized agents (i.e., Analyst, Modeler, Designer and Evaluator) to collaboratively interpret requirements specifications and produce architectural blueprints enriched with quality attribute-based evaluation reports. We then evaluated MAAD through a case study and comparative experiments against MetaGPT, a state-of-the-art MAS baseline. Our results show that MAAD's superiority lies in generating comprehensive architectural components and delivering insightful and structured architecture evaluation reports. Feedback from industrial architects across 11 requirements specifications further reinforces MAAD's practical usability. We finally explored the performance of the MAAD framework with three LLMs (GPT-4o, DeepSeek-R1, and Llama 3.3) and found that GPT-4o exhibits better performance in producing architecture design, emphasizing the importance of LLM selection in MAS-driven architecture design.

Updated: 2025-07-28 23:18:25

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2507.21382v1

ProMemAssist: Exploring Timely Proactive Assistance Through Working Memory Modeling in Multi-Modal Wearable Devices

Wearable AI systems aim to provide timely assistance in daily life, but existing approaches often rely on user initiation or predefined task knowledge, neglecting users' current mental states. We introduce ProMemAssist, a smart glasses system that models a user's working memory (WM) in real-time using multi-modal sensor signals. Grounded in cognitive theories of WM, our system represents perceived information as memory items and episodes with encoding mechanisms, such as displacement and interference. This WM model informs a timing predictor that balances the value of assistance with the cost of interruption. In a user study with 12 participants completing cognitively demanding tasks, ProMemAssist delivered more selective assistance and received higher engagement compared to an LLM baseline system. Qualitative feedback highlights the benefits of WM modeling for nuanced, context-sensitive support, offering design implications for more attentive and user-aware proactive agents.
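A toy rendering of the WM model's two named encoding mechanisms, displacement and interference; the capacity, decay factor, and similarity rule are illustrative choices, not the paper's.

```python
class WorkingMemory:
    """Bounded working-memory store: displacement evicts the oldest
    item at capacity; interference weakens stored items when something
    similar is perceived."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.items = []  # list of (label, activation) pairs, oldest first

    def encode(self, label, similar_to=()):
        # Interference: perceiving something similar degrades stored items.
        self.items = [(l, a * (0.5 if l in similar_to else 1.0))
                      for l, a in self.items]
        self.items.append((label, 1.0))
        # Displacement: drop the oldest item once over capacity.
        if len(self.items) > self.capacity:
            self.items.pop(0)

    def load(self):
        """Summed activation -- a crude proxy for current cognitive load."""
        return sum(a for _, a in self.items)
```

A timing predictor in the spirit of the abstract could compare `load()` (a cost-of-interruption proxy) against the estimated value of assisting right now, and defer help when the user's working memory is saturated.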

Updated: 2025-07-28 23:02:47

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2507.21378v1

TiVy: Time Series Visual Summary for Scalable Visualization

Visualizing multiple time series presents fundamental tradeoffs between scalability and visual clarity. Time series capture the behavior of many large-scale real-world processes, from stock market trends to urban activities. Users often gain insights by visualizing them as line charts, juxtaposing or superposing multiple time series to compare them and identify trends and patterns. However, existing representations struggle with scalability: covering long time spans leads to visual clutter from too many small multiples or overlapping lines. We propose TiVy, a new algorithm that summarizes time series using sequential patterns. It transforms the series into a set of symbolic sequences based on subsequence visual similarity using Dynamic Time Warping (DTW), then constructs a disjoint grouping of similar subsequences based on frequent sequential patterns. The grouping result, a visual summary of time series, provides uncluttered superposition with fewer small multiples. Unlike common clustering techniques, TiVy extracts similar subsequences (of varying lengths) aligned in time. We also present an interactive time series visualization that renders large-scale time series in real-time. Our experimental evaluation shows that our algorithm (1) extracts clear and accurate patterns when visualizing time series data, and (2) achieves a significant speed-up (1000X) compared to straightforward DTW clustering. We also demonstrate the efficiency of our approach to explore hidden structures in massive time series data in two usage scenarios.
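The similarity measure at the core of TiVy is Dynamic Time Warping; for reference, the classic O(nm) recurrence is below (TiVy's subsequence-level variant and the symbolization built on top of it add more machinery).

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D series, allowing
    elastic alignment so that locally stretched or compressed shapes
    still compare as similar."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Because warping absorbs tempo differences, a repeated sample costs nothing extra — which is exactly why the paper reports a large speed-up from avoiding all-pairs DTW rather than from changing the measure itself.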

Updated: 2025-07-28 23:00:54

Categories: cs.GR,cs.LG

Download: http://arxiv.org/abs/2507.18972v2

Reservoir Computation with Networks of Differentiating Neuron Ring Oscillators

Reservoir Computing is a machine learning approach that uses the rich repertoire of complex system dynamics for function approximation. Current approaches to reservoir computing use a network of coupled integrating neurons that require a steady current to maintain activity. Here, we introduce a small-world graph of differentiating neurons that are active only when there are changes in input as an alternative to integrating neurons as a reservoir computing substrate. We find the coupling strength and network topology that enable these small-world networks to function as an effective reservoir. We demonstrate the efficacy of these networks in the MNIST digit recognition task, achieving 90.65% accuracy, comparable to existing reservoir computing approaches. The findings suggest that differentiating neurons can be a potential alternative to integrating neurons and can provide a sustainable future alternative for power-hungry AI applications.
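The distinguishing idea — neurons driven by input *changes* rather than input levels — fits in a small toy update; the weights, leak rate, and three-neuron ring topology here are illustrative only, not the paper's configuration.

```python
import math

def differentiating_step(prev_input, new_input, states, w_in, w_rec, leak=0.5):
    """One update of a toy reservoir of differentiating neurons: each
    neuron is driven by the change in input (new - prev) rather than
    its level, so activity dies out when the input is constant."""
    delta = new_input - prev_input
    new_states = []
    for i, s in enumerate(states):
        drive = w_in[i] * delta + sum(w_rec[i][j] * sj
                                      for j, sj in enumerate(states))
        new_states.append((1 - leak) * s + leak * math.tanh(drive))
    return new_states

states = [0.0, 0.0, 0.0]
w_in = [1.0, -0.5, 0.8]
w_rec = [[0.0, 0.2, 0.0], [0.0, 0.0, 0.2], [0.2, 0.0, 0.0]]  # 3-neuron ring
states = differentiating_step(0.0, 1.0, states, w_in, w_rec)   # input changes
settled = differentiating_step(1.0, 1.0, states, w_in, w_rec)  # input constant
```

When the input stops changing (`delta = 0`), drive comes only from the weak recurrent coupling and activity decays — the property that lets such a reservoir idle without the sustaining current integrating neurons need.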

Updated: 2025-07-28 22:57:21

Categories: cs.NE,cs.LG

Download: http://arxiv.org/abs/2507.21377v1

Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment

This paper presents a Multi-modal Emotion Recognition (MER) system designed to enhance emotion recognition accuracy in challenging acoustic conditions. Our approach combines a modified and extended Hierarchical Token-semantic Audio Transformer (HTS-AT) for multi-channel audio processing with an R(2+1)D Convolutional Neural Network (CNN) model for video analysis. We evaluate our proposed method on a reverberated version of the Ryerson audio-visual database of emotional speech and song (RAVDESS) dataset using synthetic and real-world Room Impulse Responses (RIRs). Our results demonstrate that integrating audio and video modalities yields superior performance compared to uni-modal approaches, especially in challenging acoustic conditions. Moreover, we show that the multimodal (audiovisual) approach that utilizes multiple microphones outperforms its single-microphone counterpart.

Updated: 2025-07-28 22:57:14

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2409.09545v3

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

We present Audio Flamingo 3 (AF3), a fully open state-of-the-art (SOTA) large audio-language model that advances reasoning and understanding across speech, sound, and music. AF3 introduces: (i) AF-Whisper, a unified audio encoder trained using a novel strategy for joint representation learning across all 3 modalities of speech, sound, and music; (ii) flexible, on-demand thinking, allowing the model to do chain-of-thought-type reasoning before answering; (iii) multi-turn, multi-audio chat; (iv) long audio understanding and reasoning (including speech) up to 10 minutes; and (v) voice-to-voice interaction. To enable these capabilities, we propose several large-scale training datasets curated using novel strategies, including AudioSkills-XL, LongAudio-XL, AF-Think, and AF-Chat, and train AF3 with a novel five-stage curriculum-based training strategy. Trained on only open-source audio data, AF3 achieves new SOTA results on over 20+ (long) audio understanding and reasoning benchmarks, surpassing both open-weight and closed-source models trained on much larger datasets.

Updated: 2025-07-28 22:53:43

Categories: cs.SD,cs.AI,cs.CL,eess.AS

Download: http://arxiv.org/abs/2507.08128v2

Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale

The integration of large language models (LLMs) into educational tools has the potential to substantially impact how teachers plan instruction, support diverse learners, and engage in professional reflection. Yet little is known about how educators actually use these tools in practice and how their interactions with AI can be meaningfully studied at scale. This paper presents a human-AI collaborative methodology for large-scale qualitative analysis of over 140,000 educator-AI messages drawn from a generative AI platform used by K-12 teachers. Through a four-phase coding pipeline, we combined inductive theme discovery, codebook development, structured annotation, and model benchmarking to examine patterns of educator engagement and evaluate the performance of LLMs in qualitative coding tasks. We developed a hierarchical codebook aligned with established teacher evaluation frameworks, capturing educators' instructional goals, contextual needs, and pedagogical strategies. Our findings demonstrate that LLMs, particularly Claude 3.5 Haiku, can reliably support theme identification, extend human recognition in complex scenarios, and outperform open-weight models in both accuracy and structural reliability. The analysis also reveals substantive patterns in how educators query AI to enhance instructional practices (79.7 percent of total conversations), create or adapt content (76.1 percent), support assessment and feedback loops (46.9 percent), attend to student needs for tailored instruction (43.3 percent), and assist other professional responsibilities (34.2 percent), highlighting emerging AI-related competencies that have direct implications for teacher preparation and professional development. This study offers a scalable, transparent model for AI-augmented qualitative research and provides foundational insights into the evolving role of generative AI in educational practice.

Updated: 2025-07-28 22:35:44

Title: Decoding Instructional Dialogues: A Human-AI Collaborative Analysis of Teachers' Use of AI Tools at Scale

Fields: cs.HC,cs.AI

Download: http://arxiv.org/abs/2507.17985v2

Load Balancing for AI Training Workloads

We investigate the performance of various load balancing algorithms for large-scale AI training workloads that are running on dedicated infrastructure. The performance of load balancing depends on both the congestion control and loss recovery algorithms, so our evaluation also sheds light on the appropriate choices for those designs as well.

Updated: 2025-07-28 22:34:18

Fields: cs.NI,cs.LG

Download: http://arxiv.org/abs/2507.21372v1

Improved Hardness of BDD and SVP Under Gap-(S)ETH

We show improved fine-grained hardness of two key lattice problems in the $\ell_p$ norm: Bounded Distance Decoding to within an $\alpha$ factor of the minimum distance ($\mathrm{BDD}_{p, \alpha}$) and the (decisional) $\gamma$-approximate Shortest Vector Problem ($\mathrm{SVP}_{p,\gamma}$), assuming variants of the Gap (Strong) Exponential Time Hypothesis (Gap-(S)ETH). Specifically, we show: 1. For all $p \in [1, \infty)$, there is no $2^{o(n)}$-time algorithm for $\mathrm{BDD}_{p, \alpha}$ for any constant $\alpha > \alpha_\mathsf{kn}$, where $\alpha_\mathsf{kn} = 2^{-c_\mathsf{kn}}$ and $c_\mathsf{kn}$ is the $\ell_2$ kissing-number constant, assuming $c_\mathsf{kn} > 0$ and that non-uniform Gap-ETH holds. 2. For all $p \in [1, \infty)$, there is no $2^{o(n)}$-time algorithm for $\mathrm{BDD}_{p, \alpha}$ for any constant $\alpha > \alpha^\ddagger_p$, where $\alpha^\ddagger_p$ is explicit and satisfies $\alpha^\ddagger_p = 1$ for $1 \leq p \leq 2$, $\alpha^\ddagger_p < 1$ for all $p > 2$, and $\alpha^\ddagger_p \to 1/2$ as $p \to \infty$, unless randomized Gap-ETH is false. 3. For all $p \in [1, \infty) \setminus 2 \mathbb{Z}$ and all $C > 1$, there is no $2^{n/C}$-time algorithm for $\mathrm{BDD}_{p, \alpha}$ for any constant $\alpha > \alpha^\dagger_{p, C}$, where $\alpha^\dagger_{p, C}$ is explicit and satisfies $\alpha^\dagger_{p, C} \to 1$ as $C \to \infty$ for any fixed $p \in [1, \infty)$, assuming $c_\mathsf{kn} > 0$ and that non-uniform Gap-SETH holds. 4. For all $p > p_0 \approx 2.1397$, $p \notin 2\mathbb{Z}$, and all $C > C_p$, there is no $2^{n/C}$-time algorithm for $\mathrm{SVP}_{p, \gamma}$ for some constant $\gamma > 1$, where $C_p > 1$ is explicit and satisfies $C_p \to 1$ as $p \to \infty$, unless randomized Gap-SETH is false.

Updated: 2025-07-28 22:24:49

Fields: cs.CC,cs.CR,cs.DS

Download: http://arxiv.org/abs/2109.04025v3

Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers

Wildlife populations in Africa face severe threats, with vertebrate numbers declining by over 65% in the past five decades. In response, image classification using deep learning has emerged as a promising tool for biodiversity monitoring and conservation. This paper presents a comparative study of deep learning models for automatically classifying African wildlife images, focusing on transfer learning with frozen feature extractors. Using a public dataset of four species (buffalo, elephant, rhinoceros, and zebra), we evaluate the performance of DenseNet-201, ResNet-152, EfficientNet-B4, and Vision Transformer ViT-H/14. DenseNet-201 achieved the best performance among convolutional networks (67% accuracy), while ViT-H/14 achieved the highest overall accuracy (99%), but with significantly higher computational cost, raising deployment concerns. Our experiments highlight the trade-offs between accuracy, resource requirements, and deployability. The best-performing CNN (DenseNet-201) was integrated into a Hugging Face Gradio Space for real-time field use, demonstrating the feasibility of deploying lightweight models in conservation settings. This work contributes to African-grounded AI research by offering practical insights into model selection, dataset preparation, and responsible deployment of deep learning tools for wildlife conservation.
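
The frozen-feature-extractor recipe described above can be sketched in a few lines. This is an illustrative sketch only: a tiny stand-in network replaces the pretrained DenseNet-201, and the `classifier` head name mirrors torchvision's DenseNet convention rather than the authors' code.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for a pretrained CNN such as DenseNet-201 (illustrative only)."""
    def __init__(self, num_classes: int = 4):  # buffalo, elephant, rhinoceros, zebra
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

def freeze_feature_extractor(model: nn.Module, head: str = "classifier") -> nn.Module:
    """Freeze everything except the classification head (transfer learning)."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head)
    return model

model = freeze_feature_extractor(TinyBackbone())
logits = model(torch.randn(2, 3, 32, 32))  # (batch, num_classes)
```

With a real backbone, only the small head is trained, which is what keeps such models cheap enough for field deployment.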

Updated: 2025-07-28 22:18:13

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.21364v1

Efficacy of AI RAG Tools for Complex Information Extraction and Data Annotation Tasks: A Case Study Using Banks Public Disclosures

We utilize a within-subjects design with randomized task assignments to understand the effectiveness of using an AI retrieval augmented generation (RAG) tool to assist analysts with an information extraction and data annotation task. We replicate an existing, challenging real-world annotation task with complex multi-part criteria on a set of thousands of pages of public disclosure documents from global systemically important banks (GSIBs) with heterogeneous and incomplete information content. We test two treatment conditions: first, a "naive" AI use condition, in which annotators use only the tool and must accept the first answer they are given; and second, an "interactive" AI treatment condition, in which annotators use the tool interactively and apply their judgment to follow up with additional information if necessary. Compared to the human-only baseline, the use of the AI tool accelerated task execution by up to a factor of 10 and enhanced task accuracy, particularly in the interactive condition. We find that when extrapolated to the full task, these methods could save up to 268 hours compared to the human-only approach. Additionally, our findings suggest that annotator skill, not just with the subject matter domain but also with AI tools, is a factor in both the accuracy and speed of task performance.

Updated: 2025-07-28 22:06:11

Fields: cs.AI,cs.HC,econ.GN,q-fin.EC

Download: http://arxiv.org/abs/2507.21360v1

A Contrastive Diffusion-based Network (CDNet) for Time Series Classification

Deep learning models are widely used for time series classification (TSC) due to their scalability and efficiency. However, their performance degrades under challenging data conditions such as class similarity, multimodal distributions, and noise. To address these limitations, we propose CDNet, a Contrastive Diffusion-based Network that enhances existing classifiers by generating informative positive and negative samples via a learned diffusion process. Unlike traditional diffusion models that denoise individual samples, CDNet learns transitions between samples--both within and across classes--through convolutional approximations of reverse diffusion steps. We introduce a theoretically grounded CNN-based mechanism to enable both denoising and mode coverage, and incorporate an uncertainty-weighted composite loss for robust training. Extensive experiments on the UCR Archive and simulated datasets demonstrate that CDNet significantly improves state-of-the-art (SOTA) deep learning classifiers, particularly under noisy, similar, and multimodal conditions.

Updated: 2025-07-28 21:56:17

Fields: cs.LG,62M10,I.5.1

Download: http://arxiv.org/abs/2507.21357v1

Games Agents Play: Towards Transactional Analysis in LLM-based Multi-Agent Systems

Multi-Agent Systems (MAS) are increasingly used to simulate social interactions, but most frameworks overlook the underlying cognitive complexity of human behavior. In this paper, we introduce Trans-ACT (Transactional Analysis Cognitive Toolkit), an approach embedding Transactional Analysis (TA) principles into MAS to generate agents with realistic psychological dynamics. Trans-ACT integrates the Parent, Adult, and Child ego states into an agent's cognitive architecture. Each ego state retrieves context-specific memories and uses them to shape the agent's response to new situations. The final answer is chosen according to the agent's underlying life script. Our experimental simulation, which reproduces the Stupid game scenario, demonstrates that agents grounded in cognitive and TA principles produce deeper and more context-aware interactions. Looking ahead, our research opens new avenues for a variety of applications, including conflict resolution, educational support, and advanced social psychology studies.

Updated: 2025-07-28 21:46:21

Fields: cs.AI,cs.MA

Download: http://arxiv.org/abs/2507.21354v1

Group Relative Augmentation for Data Efficient Action Detection

Adapting large Video-Language Models (VLMs) for action detection using only a few examples poses challenges like overfitting and the granularity mismatch between scene-level pre-training and required person-centric understanding. We propose an efficient adaptation strategy combining parameter-efficient tuning (LoRA) with a novel learnable internal feature augmentation. Applied within the frozen VLM backbone using FiLM, these augmentations generate diverse feature variations directly relevant to the task. Additionally, we introduce a group-weighted loss function that dynamically modulates the training contribution of each augmented sample based on its prediction divergence relative to the group average. This promotes robust learning by prioritizing informative yet reasonable augmentations. We demonstrate our method's effectiveness on complex multi-label, multi-person action detection datasets (AVA, MOMA), achieving strong mAP performance and showcasing significant data efficiency for adapting VLMs from limited examples.
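
The FiLM-style internal augmentation and the group-relative weighting can be illustrated with a toy sketch. All names, shapes, and the exact direction of the weighting are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FiLMAugment(nn.Module):
    """Hypothetical learnable internal augmentation: per-channel FiLM
    scale (gamma) and shift (beta), one pair per augmented view,
    applied to features from a frozen backbone."""
    def __init__(self, num_augments: int, num_channels: int):
        super().__init__()
        # initialised near the identity transform
        self.gamma = nn.Parameter(torch.ones(num_augments, num_channels))
        self.beta = nn.Parameter(torch.zeros(num_augments, num_channels))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels) -> (batch, num_augments, channels)
        return self.gamma * feats.unsqueeze(1) + self.beta

film = FiLMAugment(num_augments=4, num_channels=16)
feats = torch.randn(2, 16)          # frozen-backbone features (stand-in)
augmented = film(feats)

# Group-relative weighting: modulate each augmented view's contribution by its
# divergence from the group-average prediction (weighting direction is a
# modeling choice here, not taken from the paper).
logits = augmented.mean(dim=-1)                               # stand-in per-view predictions
divergence = (logits - logits.mean(dim=1, keepdim=True)).abs()
weights = torch.softmax(-divergence, dim=1)                   # (batch, num_augments)
```

The weights would then multiply each view's per-sample loss before averaging.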

Updated: 2025-07-28 21:46:05

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.21353v1

DEM-NeRF: A Neuro-Symbolic Method for Scientific Discovery through Physics-Informed Simulation

Neural networks have emerged as a powerful tool for modeling physical systems, offering the ability to learn complex representations from limited data while integrating foundational scientific knowledge. In particular, neuro-symbolic approaches that combine data-driven learning, the neuro, with symbolic equations and rules, the symbolic, address the tension between methods that are purely empirical, which risk straying from established physical principles, and traditional numerical solvers that demand complete geometric knowledge and can be prohibitively expensive for high-fidelity simulations. In this work, we present a novel neuro-symbolic framework for reconstructing and simulating elastic objects directly from sparse multi-view image sequences, without requiring explicit geometric information. Specifically, we integrate a neural radiance field (NeRF) for object reconstruction with physics-informed neural networks (PINN) that incorporate the governing partial differential equations of elasticity. In doing so, our method learns a spatiotemporal representation of deforming objects that leverages both image supervision and symbolic physical constraints. To handle complex boundary and initial conditions, which are traditionally confronted using finite element methods, boundary element methods, or sensor-based measurements, we employ an energy-constrained Physics-Informed Neural Network architecture. This design enhances both simulation accuracy and the explainability of results.

Updated: 2025-07-28 21:40:17

Fields: cs.LG

Download: http://arxiv.org/abs/2507.21350v1

StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation

Extracting structured information from text, such as key-value pairs that could augment tabular data, is quite useful in many enterprise use cases. Although large language models (LLMs) have enabled numerous automated pipelines for converting natural language into structured formats, there is still a lack of benchmarks for evaluating their extraction quality, especially in specific domains or focused documents specific to a given organization. Building such benchmarks by manual annotations is labour-intensive and limits the size and scalability of the benchmarks. In this work, we present StructText, an end-to-end framework for automatically generating high-fidelity benchmarks for key-value extraction from text using existing tabular data. It uses available tabular data as structured ground truth, and follows a two-stage ``plan-then-execute'' pipeline to synthetically generate corresponding natural-language text. To ensure alignment between text and structured source, we introduce a multi-dimensional evaluation strategy that combines (a) LLM-based judgments on factuality, hallucination, and coherence and (b) objective extraction metrics measuring numeric and temporal accuracy. We evaluated the proposed method on 71,539 examples across 49 datasets. Results reveal that while LLMs achieve strong factual accuracy and avoid hallucination, they struggle with narrative coherence in producing extractable text. Notably, models preserve numerical and temporal information with high fidelity, yet this information becomes embedded in narratives that resist automated extraction. We release a framework, including datasets, evaluation tools, and baseline extraction systems, to support continued research.

Updated: 2025-07-28 21:20:44

Fields: cs.CL,cs.AI,cs.DB,cs.IR

Download: http://arxiv.org/abs/2507.21340v1

Recovering Manifold Structure Using Ollivier-Ricci Curvature

We introduce ORC-ManL, a new algorithm to prune spurious edges from nearest neighbor graphs using a criterion based on Ollivier-Ricci curvature and estimated metric distortion. Our motivation comes from manifold learning: we show that when the data generating the nearest-neighbor graph consists of noisy samples from a low-dimensional manifold, edges that shortcut through the ambient space have more negative Ollivier-Ricci curvature than edges that lie along the data manifold. We demonstrate that our method outperforms alternative pruning methods and that it significantly improves performance on many downstream geometric data analysis tasks that use nearest neighbor graphs as input. Specifically, we evaluate on manifold learning, persistent homology, dimension estimation, and others. We also show that ORC-ManL can be used to improve clustering and manifold learning of single-cell RNA sequencing data. Finally, we provide empirical convergence experiments that support our theoretical findings.

Updated: 2025-07-28 21:15:10

Fields: cs.LG,cs.AI,cs.CG

Download: http://arxiv.org/abs/2410.01149v2

Graph neural networks for residential location choice: connection to classical logit models

Researchers have adopted deep learning for classical discrete choice analysis as it can capture complex feature relationships and achieve higher predictive performance. However, the existing deep learning approaches cannot explicitly capture the relationship among choice alternatives, which has been a long-lasting focus in classical discrete choice models. To address the gap, this paper introduces Graph Neural Network (GNN) as a novel framework to analyze residential location choice. The GNN-based discrete choice models (GNN-DCMs) offer a structured approach for neural networks to capture dependence among spatial alternatives, while maintaining clear connections to classical random utility theory. Theoretically, we demonstrate that the GNN-DCMs incorporate the nested logit (NL) model and the spatially correlated logit (SCL) model as two specific cases, yielding novel algorithmic interpretation through message passing among alternatives' utilities. Empirically, the GNN-DCMs outperform benchmark MNL, SCL, and feedforward neural networks in predicting residential location choices among Chicago's 77 community areas. Regarding model interpretation, the GNN-DCMs can capture individual heterogeneity and exhibit spatially-aware substitution patterns. Overall, these results highlight the potential of GNN-DCMs as a unified and expressive framework for synergizing discrete choice modeling and deep learning in the complex spatial choice contexts.

Updated: 2025-07-28 21:01:00

Fields: stat.ML,cs.LG

Download: http://arxiv.org/abs/2507.21334v1

Predicting VBAC Outcomes from U.S. Natality Data using Deep and Classical Machine Learning Models

Accurately predicting the outcome of a trial of labor after cesarean (TOLAC) is essential for guiding prenatal counseling and minimizing delivery-related risks. This study presents supervised machine learning models for predicting vaginal birth after cesarean (VBAC) using 643,029 TOLAC cases from the CDC WONDER Natality dataset (2017-2023). After filtering for singleton births with one or two prior cesareans and complete data across 47 prenatal-period features, three classifiers were trained: logistic regression, XGBoost, and a multilayer perceptron (MLP). The MLP achieved the highest performance with an AUC of 0.7287, followed closely by XGBoost (AUC = 0.727), both surpassing the logistic regression baseline (AUC = 0.709). To address class imbalance, class weighting was applied to the MLP, and a custom loss function was implemented in XGBoost. Evaluation metrics included ROC curves, confusion matrices, and precision-recall analysis. Logistic regression coefficients highlighted maternal BMI, education, parity, comorbidities, and prenatal care indicators as key predictors. Overall, the results demonstrate that routinely collected, early-pregnancy variables can support scalable and moderately high-performing VBAC prediction models. These models offer potential utility in clinical decision support, particularly in settings lacking access to specialized intrapartum data.
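
The class weighting applied to handle the imbalanced outcome can be illustrated with scikit-learn on synthetic stand-in features; the study's 47 real prenatal-period features and its exact model configuration are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative sketch only: synthetic stand-in for prenatal-period features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
# Imbalanced binary outcome, as with VBAC vs. repeat cesarean.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=2000) > -1.0).astype(int)

# class_weight="balanced" reweights each class inversely to its frequency,
# the same idea as the class weighting applied to the MLP in the study.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
proba = clf.predict_proba(X)[:, 1]   # predicted probability of the positive class
```

The coefficients of such a model (`clf.coef_`) are what the study inspects to identify key predictors.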

Updated: 2025-07-28 20:54:55

Fields: stat.AP,cs.LG

Download: http://arxiv.org/abs/2507.21330v1

An Algebraic Approach to Moralisation and Triangulation of Probabilistic Graphical Models

Moralisation and triangulation are transformations that allow switching between different ways of factoring a probability distribution into a graphical model. Moralisation allows viewing a Bayesian network (a directed model) as a Markov network (an undirected model), whereas triangulation works in the opposite direction. We present a categorical framework where these transformations are modelled as functors between a category of Bayesian networks and one of Markov networks. The two kinds of network (the objects of these categories) are themselves represented as functors, from a `syntax' domain to a `semantics' codomain. Notably, moralisation and triangulation are definable inductively on such syntax, and operate as a form of functor pre-composition. This approach introduces a modular, algebraic perspective in the theory of probabilistic graphical models.
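
Outside the categorical framing, moralisation itself is a short, standard graph transformation: "marry" all parents of each node, then drop edge directions. A minimal sketch:

```python
def moralise(parents: dict) -> set:
    """parents maps each node of a Bayesian network to the list of its parents.
    Returns the undirected edge set of the moral (Markov network) graph."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))   # keep original edges, undirected
        for i, a in enumerate(ps):             # connect co-parents pairwise
            for b in ps[i + 1:]:
                edges.add(frozenset((a, b)))
    return edges

# Classic v-structure A -> C <- B: moralisation adds the "marrying" edge A - B.
moral = moralise({"C": ["A", "B"], "A": [], "B": []})
```

The paper's contribution is to express this familiar operation (and triangulation) as functor pre-composition rather than as an ad hoc graph algorithm.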

Updated: 2025-07-28 20:53:14

Fields: cs.AI,cs.LO,math.CT

Download: http://arxiv.org/abs/2503.11820v2

SQuat: Subspace-orthogonal KV Cache Quantization

The key-value (KV) cache accelerates LLM decoding by storing KV tensors from previously generated tokens. It reduces redundant computation at the cost of increased memory usage. To mitigate this overhead, existing approaches compress KV tensors into lower-bit representations; however, quantization errors can accumulate as more tokens are generated, potentially resulting in undesired outputs. In this paper, we introduce SQuat (Subspace-orthogonal KV cache quantization). It first constructs a subspace spanned by query tensors to capture the most critical task-related information. During key tensor quantization, it enforces that the difference between the (de)quantized and original keys remains orthogonal to this subspace, minimizing the impact of quantization errors on the attention mechanism's outputs. SQuat requires no model fine-tuning, no additional calibration dataset for offline learning, and is grounded in a theoretical framework we develop. Through numerical experiments, we show that our method reduces peak memory by a factor of 2.17 to 2.82, improves throughput by a factor of 2.45 to 3.60, and achieves more favorable benchmark scores than existing KV cache quantization algorithms.
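
The core subspace-orthogonality idea can be sketched numerically: after a naive low-bit quantization of a key, add back the component of the quantization error that lies inside the query subspace, so the remaining error is orthogonal to it and query-key dot products are preserved. This is a simplified illustration, not the paper's algorithm; the rounding scheme and names are placeholders.

```python
import numpy as np

def orthogonalize_error(key, naive_dequant, queries):
    """Correct a dequantized key so its residual error is orthogonal
    to the subspace spanned by the given query vectors."""
    U, _, _ = np.linalg.svd(queries.T, full_matrices=False)  # basis of query subspace
    residual = key - naive_dequant
    # restore the component of the error lying inside the query subspace
    return naive_dequant + U @ (U.T @ residual)

rng = np.random.default_rng(0)
d, r = 8, 3
queries = rng.normal(size=(r, d))   # r recent query vectors of dimension d
key = rng.normal(size=d)
naive = np.round(key * 4) / 4       # crude stand-in for low-bit quantization
corrected = orthogonalize_error(key, naive, queries)
err = corrected - key               # remaining error, orthogonal to every query
```

Because `queries @ err` vanishes, attention scores `queries @ corrected` match `queries @ key` up to numerical precision, which is the property SQuat exploits.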

Updated: 2025-07-28 20:44:23

Fields: cs.LG,cs.AI,cs.CL,cs.IT,math.IT

Download: http://arxiv.org/abs/2503.24358v2

On Post-Quantum Cryptography Authentication for Quantum Key Distribution

The traditional way for a Quantum Key Distribution (QKD) user to join a quantum network is by authenticating themselves using pre-shared key material. While this approach is sufficient for small-scale networks, it becomes impractical as the network grows, due to the quadratic growth in the total number of pre-shared keys required. To address this scalability issue, Public Key Infrastructure (PKI) combined with Post-Quantum Cryptography (PQC) offers a more scalable solution, allowing users to authenticate the QKD traffic remotely to obtain information-theoretically secure (ITS) keys under the presented assumptions. Unlike traditional PKI, which relies on classical cryptographic algorithms such as RSA, the approach presented in this paper leverages PQC algorithms that are believed to be resistant to quantum attacks. Similarly to the SIGMA or TLS protocols, authentication, confidentiality, and integrity are achievable against bounded adversaries to ensure secure and scalable quantum networks.

Updated: 2025-07-28 20:40:11

Fields: quant-ph,cs.CR,94A60, 94A62, 81P45, 81P94,E.3

Download: http://arxiv.org/abs/2507.21325v1

Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning

As generative agents become increasingly capable, alignment of their behavior with complex human values remains a fundamental challenge. Existing approaches often simplify human intent through reduction to a scalar reward, overlooking the multi-faceted nature of human feedback. In this work, we introduce a theoretical framework for preference-based Multi-Objective Inverse Reinforcement Learning (MO-IRL), where human preferences are modeled as latent vector-valued reward functions. We formalize the problem of recovering a Pareto-optimal reward representation from noisy preference queries and establish conditions for identifying the underlying multi-objective structure. We derive tight sample complexity bounds for recovering $\epsilon$-approximations of the Pareto front and introduce a regret formulation to quantify suboptimality in this multi-objective setting. Furthermore, we propose a provably convergent algorithm for policy optimization using preference-inferred reward cones. Our results bridge the gap between practical alignment techniques and theoretical guarantees, providing a principled foundation for learning aligned behaviors in a high-dimension and value-pluralistic environment.
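
The notion of a Pareto-optimal set of vector-valued rewards can be made concrete with a small helper (illustrative only; the paper works with latent reward functions and preference queries, not finite point sets).

```python
import numpy as np

def pareto_front(rewards: np.ndarray) -> np.ndarray:
    """Return the rows of `rewards` (one vector-valued reward per row,
    maximization in every objective) that are not dominated by any other row."""
    n = len(rewards)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # j dominates i: at least as good everywhere, strictly better somewhere
            if i != j and np.all(rewards[j] >= rewards[i]) and np.any(rewards[j] > rewards[i]):
                keep[i] = False
                break
    return rewards[keep]

pts = np.array([[1.0, 2.0], [2.0, 1.0], [0.5, 0.5], [1.5, 1.5]])
front = pareto_front(pts)   # [0.5, 0.5] is dominated by [1.5, 1.5]
```

The paper's sample complexity bounds concern recovering an epsilon-approximation of exactly this kind of non-dominated set over reward representations.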

Updated: 2025-07-28 20:30:50

Fields: cs.LG,cs.AI,cs.CG

Download: http://arxiv.org/abs/2505.11864v3

Online Concurrent Multi-Robot Coverage Path Planning

Recently, centralized receding horizon online multi-robot coverage path planning algorithms have shown remarkable scalability in thoroughly exploring large, complex, unknown workspaces with many robots. In a horizon, the path planning and the path execution interleave, meaning when the path planning occurs for robots with no paths, the robots with outstanding paths do not execute, and subsequently, when the robots with new or outstanding paths execute to reach respective goals, path planning does not occur for those robots yet to get new paths, leading to wastage of both the robotic and the computation resources. As a remedy, we propose a centralized algorithm that is not horizon-based. It plans paths at any time for a subset of robots with no paths, i.e., who have reached their previously assigned goals, while the rest execute their outstanding paths, thereby enabling concurrent planning and execution. We formally prove that the proposed algorithm ensures complete coverage of an unknown workspace and analyze its time complexity. To demonstrate scalability, we evaluate our algorithm to cover eight large $2$D grid benchmark workspaces with up to 512 aerial and ground robots, respectively. A comparison with a state-of-the-art horizon-based algorithm shows its superiority in completing the coverage with up to 1.6x speedup. For validation, we perform ROS + Gazebo simulations in six 2D grid benchmark workspaces with 10 quadcopters and TurtleBots, respectively. We also successfully conducted one outdoor experiment with three quadcopters and one indoor with two TurtleBots.

Updated: 2025-07-28 20:06:51

Fields: cs.RO,cs.AI

Download: http://arxiv.org/abs/2403.10460v2

MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

In the era of big data, access to abundant data is crucial for driving research forward. However, such data is often inaccessible due to privacy concerns or high costs, particularly in the healthcare domain. Generating synthetic (tabular) data can address this, but existing models typically require substantial amounts of data to train effectively, contradicting our objective of solving data scarcity. To address this challenge, we propose a novel framework for generating synthetic tabular data, powered by large language models (LLMs), that emulates the architecture of a Generative Adversarial Network (GAN). By incorporating the data generation process as contextual information and utilizing the LLM as the optimizer, our approach significantly enhances the quality of synthetic data generation in common scenarios with small sample sizes. Our experimental results on public and private datasets demonstrate that our model outperforms several state-of-the-art models in generating higher-quality synthetic data for downstream tasks while preserving the privacy of the real data.

Updated: 2025-07-28 19:42:32

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.10521v4

Position: Adopt Constraints Over Penalties in Deep Learning

Recent efforts to develop trustworthy AI systems with accountability guarantees have led to widespread use of machine learning formulations incorporating external requirements, or constraints. These requirements are often enforced via penalization--adding fixed-weight terms to the task loss. We argue this approach is fundamentally ill-suited since there may be no penalty coefficient that simultaneously ensures constraint satisfaction and optimal constrained performance, i.e., that truly solves the constrained problem. Moreover, tuning these coefficients requires costly trial-and-error, incurring significant time and computational overhead. We, therefore, advocate for broader adoption of tailored constrained optimization methods--such as the Lagrangian approach, which jointly optimizes the penalization "coefficients" (the Lagrange multipliers) and the model parameters. Such methods (i) truly solve the constrained problem and do so accountably, by clearly defining feasibility and verifying when it is achieved, (ii) eliminate the need for extensive penalty tuning, and (iii) integrate seamlessly with modern deep learning pipelines.
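The contrast the authors draw can be made concrete on a toy problem. The sketch below (illustrative, not from the paper) minimizes x^2 subject to x >= 1 by gradient descent-ascent on the Lagrangian, so the multiplier is optimized jointly with the parameter instead of being hand-tuned:

```python
def solve_constrained(lr=0.05, steps=4000):
    """Minimize x^2 subject to x >= 1 via gradient descent-ascent on the
    Lagrangian L(x, lam) = x^2 + lam * (1 - x). The multiplier lam plays the
    role of a penalty coefficient, but it is optimized, not hand-tuned."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * x - lam                # dL/dx
        violation = 1.0 - x                   # dL/dlam = g(x)
        x -= lr * grad_x                      # descent on the parameter
        lam = max(0.0, lam + lr * violation)  # projected ascent on the multiplier
    return x, lam                             # converges to x* = 1, lam* = 2
```

For contrast, a fixed quadratic penalty c * max(0, 1 - x)^2 has its minimizer at x = c/(1 + c) < 1 for every finite c, so no penalty coefficient yields a feasible solution, which is exactly the failure mode the position paper highlights.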

Updated: 2025-07-28 19:38:14

Subjects: cs.LG,math.OC

Download: http://arxiv.org/abs/2505.20628v3

FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation

We present FlagEvalMM, an open-source evaluation framework designed to comprehensively assess multimodal models across a diverse range of vision-language understanding and generation tasks, such as visual question answering, text-to-image/video generation, and image-text retrieval. We decouple model inference from evaluation through an independent evaluation service, thus enabling flexible resource allocation and seamless integration of new tasks and models. Moreover, FlagEvalMM utilizes advanced inference acceleration tools (e.g., vLLM, SGLang) and asynchronous data loading to significantly enhance evaluation efficiency. Extensive experiments show that FlagEvalMM offers accurate and efficient insights into model strengths and limitations, making it a valuable tool for advancing multimodal research. The framework is publicly accessible at https://github.com/flageval-baai/FlagEvalMM.

Updated: 2025-07-28 19:31:25

Subjects: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2506.09081v3

Semantic Numeration Systems as Dynamical Systems

The foundational concepts of semantic numeration systems theory are briefly outlined. The action of cardinal semantic operators unfolds over a set of cardinal abstract entities belonging to the cardinal semantic multeity. The cardinal abstract object (CAO) formed by them in a certain connectivity topology is proposed to be considered as a linear discrete dynamical system with nonlinear control. Under the assumption of ideal observability, the CAO state equations are provided for both stationary and non-stationary cases. The fundamental role of the configuration matrix, which combines information about the types of cardinal semantic operators in the CAO, their parameters and topology of connectivity, is demonstrated.

Updated: 2025-07-28 19:29:36

Subjects: cs.LO,cs.AI,11A63, 47S20, 68Q55

Download: http://arxiv.org/abs/2507.21295v1

Narrative Context Protocol: An Open-Source Storytelling Framework for Generative AI

Here we introduce Narrative Context Protocol (NCP), an open-source narrative standard designed to enable narrative interoperability, AI-driven authoring tools, real-time emergent narratives, and more. By encoding a story's structure in a "Storyform," which is a structured register of its narrative features, NCP enables narrative portability across systems as well as intent-based constraints for generative storytelling systems. We demonstrate the capabilities of NCP through a year-long experiment, during which an author used NCP and a custom authoring platform to create a playable, text-based experience based on her pre-existing novella. This experience is driven by generative AI, with unconstrained natural language input. NCP functions as a set of "guardrails" that allows the generative system to accommodate player agency while also ensuring that narrative context and coherence are maintained.

Updated: 2025-07-28 19:26:47

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.04844v5

Learning Simulatable Models of Cloth with Spatially-varying Constitutive Properties

Materials used in real clothing exhibit remarkable complexity and spatial variation due to common processes such as stitching, hemming, dyeing, printing, padding, and bonding. Simulating these materials, for instance using finite element methods, is often computationally demanding and slow. Worse, such methods can suffer from numerical artifacts called "membrane locking" that makes cloth appear artificially stiff. Here we propose a general framework, called Mass-Spring Net, for learning a simple yet efficient surrogate model that captures the effects of these complex materials using only motion observations. The cloth is discretized into a mass-spring network with unknown material parameters that are learned directly from the motion data, using a novel force-and-impulse loss function. Our approach demonstrates the ability to accurately model spatially varying material properties from a variety of data sources, and immunity to membrane locking which plagues FEM-based simulations. Compared to graph-based networks and neural ODE-based architectures, our method achieves significantly faster training times, higher reconstruction accuracy, and improved generalization to novel dynamic scenarios.
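The paper's exact force-and-impulse loss is not given here; the sketch below only illustrates the general shape such a loss could take (the log-stiffness parameterization and per-edge Hooke forces are assumptions): learned spring parameters are scored by how well the predicted impulse F * dt explains the observed change of momentum between consecutive frames.

```python
import numpy as np

def spring_forces(x, edges, rest_len, k):
    """Hooke forces of a mass-spring network at vertex positions x (n x d)."""
    f = np.zeros_like(x)
    for (i, j), L0, ks in zip(edges, rest_len, k):
        d = x[j] - x[i]
        L = np.linalg.norm(d)                # assumes L > 0
        fij = ks * (L - L0) * d / L          # pulls i toward j when stretched
        f[i] += fij
        f[j] -= fij
    return f

def force_and_impulse_loss(x, v, v_next, edges, rest_len, log_k, mass, dt):
    """Score log-stiffness parameters by how well the predicted impulse F*dt
    explains the observed momentum change between two frames."""
    impulse_pred = spring_forces(x, edges, rest_len, np.exp(log_k)) * dt
    impulse_obs = mass * (v_next - v)
    return float(np.mean((impulse_pred - impulse_obs) ** 2))
```

Because the loss depends on the motion data only through positions and velocities, parameters that vary from edge to edge, i.e., spatially varying material properties, are fit directly.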

Updated: 2025-07-28 19:21:04

Subjects: cs.GR,cs.AI

Download: http://arxiv.org/abs/2507.21288v1

Structured Relevance Assessment for Robust Retrieval-Augmented Language Models

Retrieval-Augmented Language Models (RALMs) face significant challenges in reducing factual errors, particularly in document relevance evaluation and knowledge integration. We introduce a framework for structured relevance assessment that enhances RALM robustness through improved document evaluation, balanced intrinsic and external knowledge integration, and effective handling of unanswerable queries. Our approach employs a multi-dimensional scoring system that considers both semantic matching and source reliability, utilizing embedding-based relevance scoring and synthetic training data with mixed-quality documents. We implement specialized benchmarking on niche topics, a knowledge integration mechanism, and an "unknown" response protocol for queries with insufficient knowledge coverage. Preliminary evaluations demonstrate significant reductions in hallucination rates and improved transparency in reasoning processes. Our framework advances the development of more reliable question-answering systems capable of operating effectively in dynamic environments with variable data quality. While challenges persist in accurately distinguishing credible information and balancing system latency with thoroughness, this work represents a meaningful step toward enhancing RALM reliability.
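A minimal sketch of the kind of multi-dimensional scoring and "unknown" response protocol described above (the weights and threshold are illustrative assumptions, not values from the paper):

```python
import numpy as np

def relevance_score(q_emb, d_emb, source_reliability, w_sem=0.7, w_src=0.3):
    """Multi-dimensional relevance: semantic match (cosine similarity of
    embeddings) combined with a source-reliability score in [0, 1]."""
    cos = float(np.dot(q_emb, d_emb) /
                (np.linalg.norm(q_emb) * np.linalg.norm(d_emb)))
    return w_sem * cos + w_src * source_reliability

def answer_or_unknown(doc_scores, threshold=0.55):
    """'Unknown' response protocol: refuse to answer when no retrieved
    document clears the relevance threshold."""
    return "answer" if max(doc_scores) >= threshold else "unknown"
```

The threshold is what trades hallucination rate against coverage: raising it makes the system refuse more queries but ground its answers in more relevant documents.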

Updated: 2025-07-28 19:20:04

Subjects: cs.AI

Download: http://arxiv.org/abs/2507.21287v1

Curiosity by Design: An LLM-based Coding Assistant Asking Clarification Questions

Large Language Models (LLMs) are increasingly used as coding assistants. However, the ambiguity of the developer's prompt often leads to incorrect code generation, as current models struggle to infer user intent without extensive prompt engineering or external context. This work aims to build an LLM-based coding assistant that mimics the human code review process by asking clarification questions when faced with ambiguous or under-specified queries. Our end-to-end system includes (1) a query classifier trained to detect unclear programming-related queries and (2) a fine-tuned LLM that generates clarification questions. Our evaluation shows that the fine-tuned LLM outperforms standard zero-shot prompting in generating useful clarification questions. Furthermore, our user study indicates that users find the clarification questions generated by our model to outperform the baseline, demonstrating that our coding assistant produces more accurate and helpful code responses compared to baseline coding assistants.

Updated: 2025-07-28 19:10:57

Subjects: cs.AI

Download: http://arxiv.org/abs/2507.21285v1

LeMix: Unified Scheduling for LLM Training and Inference on Multi-GPU Systems

Modern deployment of large language models (LLMs) frequently involves both inference serving and continuous retraining to stay aligned with evolving data and user feedback. Common practices separate these workloads onto distinct servers in isolated phases, causing substantial inefficiencies (e.g., GPU idleness) and delayed adaptation to new data in distributed settings. Our empirical analysis reveals that these inefficiencies stem from dynamic request arrivals during serving and workload heterogeneity in pipeline-parallel training. To address these challenges, we propose LeMix, a system for co-locating and managing concurrent LLM serving and training workloads. LeMix integrates offline profiling, execution prediction mechanisms, and runtime scheduling to dynamically adapt resource allocation based on workload characteristics and system conditions. By understanding task-specific behaviors and co-execution interference across shared nodes, LeMix improves utilization and serving quality without compromising serving responsiveness. Our evaluation shows that LeMix improves throughput by up to 3.53x, reduces inference loss by up to 0.61x, and delivers up to 2.12x higher response time SLO attainment over traditional separate setups. To our knowledge, this is the first work to uncover and exploit the opportunities of joint LLM inference and training, paving the way for more resource-efficient deployment of LLMs in production environments.

Updated: 2025-07-28 19:03:26

Subjects: cs.AI,cs.CL,cs.DC

Download: http://arxiv.org/abs/2507.21276v1

Large Language Model-Enhanced Reinforcement Learning for Diverse and Novel Recommendations

In recommendation systems, diversity and novelty are essential for capturing varied user preferences and encouraging exploration, yet many systems prioritize click relevance. While reinforcement learning (RL) has been explored to improve diversity, it often depends on random exploration that may not align with user interests. We propose LAAC (LLM-guided Adversarial Actor Critic), a novel method that leverages large language models (LLMs) as reference policies to suggest novel items, while training a lightweight policy to refine these suggestions using system-specific data. The method formulates training as a bilevel optimization between actor and critic networks, enabling the critic to selectively favor promising novel actions and the actor to improve its policy beyond LLM recommendations. To mitigate overestimation of unreliable LLM suggestions, we apply regularization that anchors critic values for unexplored items close to well-estimated dataset actions. Experiments on real-world datasets show that LAAC outperforms existing baselines in diversity, novelty, and accuracy, while remaining robust on imbalanced data, effectively integrating LLM knowledge without expensive fine-tuning.

Updated: 2025-07-28 19:00:40

Subjects: cs.LG

Download: http://arxiv.org/abs/2507.21274v1

Deep Polynomial Chaos Expansion

Polynomial chaos expansion (PCE) is a classical and widely used surrogate modeling technique in physical simulation and uncertainty quantification. By taking a linear combination of a set of basis polynomials - orthonormal with respect to the distribution of uncertain input parameters - PCE enables tractable inference of key statistical quantities, such as (conditional) means, variances, covariances, and Sobol sensitivity indices, which are essential for understanding the modeled system and identifying influential parameters and their interactions. As the number of basis functions grows exponentially with the number of parameters, PCE does not scale well to high-dimensional problems. We address this challenge by combining PCE with ideas from probabilistic circuits, resulting in the deep polynomial chaos expansion (DeepPCE) - a deep generalization of PCE that scales effectively to high-dimensional input spaces. DeepPCE achieves predictive performance comparable to that of multi-layer perceptrons (MLPs), while retaining PCE's ability to compute exact statistical inferences via simple forward passes.
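The statistical tractability the abstract attributes to PCE comes directly from the orthonormality of the basis; a minimal one-dimensional sketch (plain PCE with a degree-2 Hermite basis, not DeepPCE):

```python
import numpy as np

# Orthonormal Hermite basis w.r.t. the standard normal input distribution:
# He_0 = 1, He_1 = x, He_2 = (x^2 - 1) / sqrt(2)
def basis(x):
    return np.stack([np.ones_like(x), x, (x**2 - 1) / np.sqrt(2)], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
y = 2.0 + 3.0 * x + x**2                 # model to be surrogated

# Fit the PCE coefficients by least squares on the samples
c, *_ = np.linalg.lstsq(basis(x), y, rcond=None)

# Orthonormality makes the key statistics a simple read-off:
mean = c[0]                              # E[y] = c_0
var = float(np.sum(c[1:] ** 2))          # Var[y] = sum of squared higher coeffs
```

Here y = 2 + 3x + x^2 equals 3 He_0 + 3 He_1 + sqrt(2) He_2 (since x^2 = 1 + sqrt(2) He_2), so the fit recovers E[y] = 3 and Var[y] = 3^2 + 2 = 11 without further sampling. In d dimensions the basis size grows combinatorially with d, which is the scaling problem DeepPCE targets.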

Updated: 2025-07-28 18:59:46

Subjects: cs.LG

Download: http://arxiv.org/abs/2507.21273v1

Levels of Analysis for Large Language Models

Modern artificial intelligence systems, such as large language models, are increasingly powerful but also increasingly hard to understand. Recognizing this problem as analogous to the historical difficulties in understanding the human mind, we argue that methods developed in cognitive science can be useful for understanding large language models. We propose a framework for applying these methods based on the levels of analysis that David Marr proposed for studying information processing systems. By revisiting established cognitive science techniques relevant to each level and illustrating their potential to yield insights into the behavior and internal organization of large language models, we aim to provide a toolkit for making sense of these new kinds of minds.

Updated: 2025-07-28 18:54:30

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.13401v2

Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light

Three core tenets of reinforcement learning (RL)--concerning the definition of agency, the objective of learning, and the scope of the reward hypothesis--have been highlighted as key targets for conceptual revision, with major implications for theory and application. We propose a framework, inspired by open-ended evolutionary theory, to reconsider these three "dogmas." We revisit each assumption and address related concerns raised alongside them. To make our arguments relevant to RL as a model of biological learning, we first establish that evolutionary dynamics can plausibly operate within living brains over an individual's lifetime, and are not confined to cross-generational processes. We begin by revisiting the second dogma, drawing on evolutionary insights to enrich the "adaptation-rather-than-search" view of learning. We then address the third dogma regarding the limits of the reward hypothesis, using analogies from evolutionary fitness to illuminate the scalar reward vs. multi-objective debate. After discussing practical implications for exploration in RL, we turn to the first--and arguably most fundamental--issue: the absence of a formal account of agency. We argue that unlike the other two problems, the evolutionary paradigm alone cannot resolve the agency question, though it gestures in a productive direction. We advocate integrating ideas from origins-of-life theory, where the thermodynamics of sustenance and replication offer promising foundations for understanding agency and resource-constrained reinforcement learning in biological systems.

Updated: 2025-07-28 18:54:04

Subjects: cs.AI

Download: http://arxiv.org/abs/2507.11482v3

Generative imaging for radio interferometry with fast uncertainty quantification

With the rise of large radio interferometric telescopes, particularly the SKA, there is a growing demand for computationally efficient image reconstruction techniques. Existing reconstruction methods, such as the CLEAN algorithm or proximal optimisation approaches, are iterative in nature, necessitating a large amount of compute. These methods either provide no uncertainty quantification or require large computational overhead to do so. Learned reconstruction methods have shown promise in providing efficient and high quality reconstruction. In this article we explore the use of generative neural networks that enable efficient approximate sampling of the posterior distribution for high quality reconstructions with uncertainty quantification. Our RI-GAN framework builds on the regularised conditional generative adversarial network (rcGAN) framework by integrating a gradient U-Net (GU-Net) architecture - a hybrid reconstruction model that embeds the measurement operator directly into the network. This framework uses Wasserstein GANs to improve training stability, in combination with regularisation terms that combat mode collapse, which are typical problems for conditional GANs. This approach takes as input the dirty image and the point spread function (PSF) of the observation and provides efficient, high-quality image reconstructions that are robust to varying visibility coverages, generalises to images with an increased dynamic range, and provides informative uncertainty quantification. Our methods provide a significant step toward computationally efficient, scalable, and uncertainty-aware imaging for next-generation radio telescopes.

Updated: 2025-07-28 18:52:07

Subjects: astro-ph.IM,cs.LG

Download: http://arxiv.org/abs/2507.21270v1

Numerical PDE solvers outperform neural PDE solvers

We present DeepFDM, a differentiable finite-difference framework for learning spatially varying coefficients in time-dependent partial differential equations (PDEs). By embedding a classical forward-Euler discretization into a convolutional architecture, DeepFDM enforces stability and first-order convergence via CFL-compliant coefficient parameterizations. Model weights correspond directly to PDE coefficients, yielding an interpretable inverse-problem formulation. We evaluate DeepFDM on a benchmark suite of scalar PDEs: advection, diffusion, advection-diffusion, reaction-diffusion, and inhomogeneous Burgers' equations, in one, two, and three spatial dimensions. In both in-distribution and out-of-distribution tests (quantified by the Hellinger distance between coefficient priors), DeepFDM attains normalized mean-squared errors one to two orders of magnitude smaller than Fourier Neural Operators, U-Nets and ResNets; requires 10-20X fewer training epochs; and uses 5-50X fewer parameters. Moreover, recovered coefficient fields accurately match ground-truth parameters. These results establish DeepFDM as a robust, efficient, and transparent baseline for data-driven solution and identification of parametric PDEs.
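The stability mechanism described above can be illustrated on the simplest member of the benchmark suite, a 1D diffusion equation with a spatially varying coefficient (a generic forward-Euler stencil, not DeepFDM's learned layer): choosing dt within the CFL bound makes each update a convex combination of neighboring values, so the explicit scheme cannot blow up.

```python
import numpy as np

def stable_dt(nu, dx):
    """Explicit-diffusion stability bound: dt <= dx^2 / (2 max nu)."""
    return dx**2 / (2.0 * np.max(nu))

def diffusion_step(u, nu, dx, dt):
    """One forward-Euler step of u_t = nu(x) * u_xx on a periodic 1D grid.
    Under the CFL bound, each new value is a convex combination of
    u[i-1], u[i], u[i+1], so extrema cannot grow."""
    lap = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    return u + dt * nu * lap
```

In DeepFDM's inverse-problem setting, nu would be the unknown field: because the update rule is differentiable in nu, it can be fit to observed solution trajectories by gradient descent while the CFL-compliant parameterization keeps every rollout stable.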

Updated: 2025-07-28 18:50:37

Subjects: math.NA,cs.LG,cs.NA,35R30 (Primary) 65M06 65M32 65C20 68T07 (Secondary)

Download: http://arxiv.org/abs/2507.21269v1

Adversarial attacks and defenses in explainable artificial intelligence: A survey

Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging and trusting statistical and deep learning models, as well as interpreting their predictions. However, recent advances in adversarial machine learning (AdvML) highlight the limitations and vulnerabilities of state-of-the-art explanation methods, putting their security and trustworthiness into question. The possibility of manipulating, fooling or fairwashing evidence of the model's reasoning has detrimental consequences when applied in high-stakes decision-making and knowledge discovery. This survey provides a comprehensive overview of research concerning adversarial attacks on explanations of machine learning models, as well as fairness metrics. We introduce a unified notation and taxonomy of methods facilitating a common ground for researchers and practitioners from the intersecting research fields of AdvML and XAI. We discuss how to defend against attacks and design robust interpretation methods. We contribute a list of existing insecurities in XAI and outline the emerging research directions in adversarial XAI (AdvXAI). Future work should address improving explanation methods and evaluation protocols to take into account the reported safety issues.

Updated: 2025-07-28 18:48:17

Subjects: cs.CR,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2306.06123v4

Multiscale geometrical and topological learning in the analysis of soft matter collective dynamics

Understanding the behavior and evolution of a dynamical many-body system by analyzing patterns in their experimentally captured images is a promising method relevant for a variety of living and non-living self-assembled systems. The arrays of moving liquid crystal skyrmions studied here are a representative example of hierarchically organized materials that exhibit complex spatiotemporal dynamics driven by multiscale processes. Joint geometric and topological data analysis (TDA) offers a powerful framework for investigating such systems by capturing the underlying structure of the data at multiple scales. In the TDA approach, we introduce the $\Psi$-function, a robust numerical topological descriptor related to both the spatiotemporal changes in the size and shape of individual topological solitons and the emergence of regions with their different spatial organization. The geometric method based on the analysis of vector fields generated from images of skyrmion ensembles offers insights into the nonlinear physical mechanisms of the system's response to external stimuli and provides a basis for comparison with theoretical predictions. The methodology presented here is very general and can provide a characterization of system behavior both at the level of individual pattern-forming agents and as a whole, allowing one to relate the results of image data analysis to processes occurring in a physical, chemical, or biological system in the real world.

Updated: 2025-07-28 18:40:37

Subjects: cond-mat.soft,cond-mat.mtrl-sci,cs.LG

Download: http://arxiv.org/abs/2507.21265v1

Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models

Reinforcement learning (RL) is a key post-pretraining step for aligning large language models (LLMs) with complex tasks and human preferences. While it is often assumed that RL fine-tuning requires updating most of a model's parameters, we challenge this assumption with a surprising finding: RL fine-tuning consistently modifies only a small subnetwork (typically 5-30% of weights), leaving most parameters unchanged. We call this phenomenon RL-induced parameter update sparsity. It arises naturally, without any sparsity constraints or parameter-efficient tuning, and appears across multiple RL algorithms (e.g., PPO, DPO, SimPO, PRIME) and model families (e.g., OpenAI, Meta, and open-source LLMs). Moreover, the subnetworks updated by RL show substantial overlap across different seeds, datasets, and algorithms, far exceeding chance, suggesting a partially transferable structure in the pretrained model. We show that fine-tuning only this sparse subnetwork recovers full model performance and yields parameters nearly identical to the fully fine-tuned model. Our analysis suggests this sparsity emerges because RL operates near the model's original distribution, requiring only targeted changes. KL penalties, gradient clipping, and on-policy dynamics have limited effect on the sparsity pattern. These findings shed new light on how RL adapts models: not by shifting all weights, but by focusing training on a small, consistently updated subnetwork. This insight enables more efficient RL methods and reframes sparsity through the lens of the lottery ticket hypothesis.
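The headline quantity is simple to measure; a sketch of computing update sparsity from two checkpoints (illustrative, not the authors' code):

```python
import numpy as np

def update_sparsity(before, after, tol=0.0):
    """Fraction of parameters left (numerically) unchanged by fine-tuning,
    given matching lists of weight tensors from two checkpoints."""
    changed = sum(int(np.sum(np.abs(b - a) > tol)) for b, a in zip(before, after))
    total = sum(b.size for b in before)
    return 1.0 - changed / total
```

The paper's 5-30% figure corresponds to a sparsity of 0.70-0.95 in this convention; comparing the sets of changed indices across seeds or algorithms gives the subnetwork-overlap statistic the abstract describes.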

Updated: 2025-07-28 18:29:13

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.17107v2

Adaptive Multimodal Protein Plug-and-Play with Diffusion-Based Priors

In an inverse problem, the goal is to recover an unknown parameter (e.g., an image) that has typically undergone some lossy or noisy transformation during measurement. Recently, deep generative models, particularly diffusion models, have emerged as powerful priors for protein structure generation. However, integrating noisy experimental data from multiple sources to guide these models remains a significant challenge. Existing methods often require precise knowledge of experimental noise levels and manually tuned weights for each data modality. In this work, we introduce Adam-PnP, a Plug-and-Play framework that guides a pre-trained protein diffusion model using gradients from multiple, heterogeneous experimental sources. Our framework features an adaptive noise estimation scheme and a dynamic modality weighting mechanism integrated into the diffusion process, which reduce the need for manual hyperparameter tuning. Experiments on complex reconstruction tasks demonstrate significantly improved accuracy using Adam-PnP.

Updated: 2025-07-28 18:28:03

Subjects: cs.LG,cs.AI,q-bio.QM

Download: http://arxiv.org/abs/2507.21260v1

Verification Cost Asymmetry in Cognitive Warfare: A Complexity-Theoretic Framework

Human verification under adversarial information flow operates as a cost-bounded decision procedure constrained by working memory limits and cognitive biases. We introduce the Verification Cost Asymmetry (VCA) coefficient, formalizing it as the ratio of expected verification work between populations under identical claim distributions. Drawing on probabilistically checkable proofs (PCP) and parameterized complexity theory, we construct dissemination protocols that reduce verification for trusted audiences to constant human effort while imposing superlinear costs on adversarial populations lacking cryptographic infrastructure. We prove theoretical guarantees for this asymmetry, validate the framework through controlled user studies measuring verification effort with and without spot-checkable provenance, and demonstrate practical encoding of real-world information campaigns. The results establish complexity-theoretic foundations for engineering democratic advantage in cognitive warfare, with immediate applications to content authentication, platform governance, and information operations doctrine.
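The VCA coefficient as defined above is just a ratio of expectations over a shared claim distribution. A toy numerical sketch (the claim distribution, cost models, and exponent are made-up illustrative numbers, not values from the paper):

```python
def vca_coefficient(claims, trusted_cost, adversarial_cost):
    """Ratio of expected verification work between two populations facing the
    same claim distribution. `claims` maps a claim's complexity parameter to
    its probability."""
    e_trusted = sum(p * trusted_cost(n) for n, p in claims.items())
    e_adversarial = sum(p * adversarial_cost(n) for n, p in claims.items())
    return e_adversarial / e_trusted

# A trusted audience spot-checks provenance in constant effort; an adversarial
# population lacking the cryptographic infrastructure pays superlinearly.
claims = {1: 0.5, 4: 0.3, 16: 0.2}
trusted = lambda n: 1.0
adversarial = lambda n: n ** 1.5
ratio = vca_coefficient(claims, trusted, adversarial)
```

With these toy numbers the coefficient is 15.7, i.e., the adversarial population expends roughly 16x the verification effort per claim.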

Updated: 2025-07-28 18:23:44

Subjects: cs.CR,cs.CC,cs.CY,cs.GT,F.0; H.0

Download: http://arxiv.org/abs/2507.21258v1

Heterogeneous Treatment Effect in Time-to-Event Outcomes: Harnessing Censored Data with Recursively Imputed Trees

Tailoring treatments to individual needs is a central goal in fields such as medicine. A key step toward this goal is estimating Heterogeneous Treatment Effects (HTE) - the way treatments impact different subgroups. While crucial, HTE estimation is challenging with survival data, where time until an event (e.g., death) is key. Existing methods often assume complete observation, an assumption violated in survival data due to right-censoring, leading to bias and inefficiency. Cui et al. (2023) proposed a doubly-robust method for HTE estimation in survival data under no hidden confounders, combining a causal survival forest with an augmented inverse-censoring weighting estimator. However, we find it struggles under heavy censoring, which is common in rare-outcome problems such as Amyotrophic lateral sclerosis (ALS). Moreover, most current methods cannot handle instrumental variables, which are a crucial tool in the causal inference arsenal. We introduce Multiple Imputation for Survival Treatment Response (MISTR), a novel, general, and non-parametric method for estimating HTE in survival data. MISTR uses recursively imputed survival trees to handle censoring without directly modeling the censoring mechanism. Through extensive simulations and analysis of two real-world datasets (the AIDS Clinical Trials Group Protocol 175 and the Illinois unemployment dataset), we show that MISTR outperforms prior methods under heavy censoring in the no-hidden-confounders setting, and extends to the instrumental variable setting. To our knowledge, MISTR is the first non-parametric approach for HTE estimation with unobserved confounders via instrumental variables.

Updated: 2025-07-28 18:22:19

Subjects: stat.ML,cs.LG

Download: http://arxiv.org/abs/2502.01575v3

CompoST: A Benchmark for Analyzing the Ability of LLMs To Compositionally Interpret Questions in a QALD Setting

Language interpretation is a compositional process, in which the meaning of more complex linguistic structures is inferred from the meaning of their parts. Large language models possess remarkable language interpretation capabilities and have been successfully applied to interpret questions by mapping them to SPARQL queries. An open question is how systematic this interpretation process is. Toward this question, in this paper, we propose a benchmark for investigating to what extent the abilities of LLMs to interpret questions are actually compositional. For this, we generate three datasets of varying difficulty based on graph patterns in DBpedia, relying on Lemon lexica for verbalization. Our datasets are created in a very controlled fashion in order to test the ability of LLMs to interpret structurally complex questions, given that they have seen the atomic building blocks. This allows us to evaluate to what degree LLMs are able to interpret complex questions for which they "understand" the atomic parts. We conduct experiments with models of different sizes using various prompting and few-shot optimization techniques as well as fine-tuning. Our results show that performance in terms of macro $F_1$ degrades from $0.45$ through $0.26$ down to $0.09$ with increasing deviation from the samples optimized on. Even when all necessary information was provided to the model in the input, the $F_1$ scores do not exceed $0.57$ for the dataset of lowest complexity. We thus conclude that LLMs struggle to systematically and compositionally interpret questions and map them into SPARQL queries.

Updated: 2025-07-28 18:20:41

Subjects: cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.21257v1

On Explaining Visual Captioning with Hybrid Markov Logic Networks

Deep Neural Networks (DNNs) have made tremendous progress in multimodal tasks such as image captioning. However, explaining/interpreting how these models integrate visual information, language information and knowledge representation to generate meaningful captions remains a challenging problem. Standard metrics to measure performance typically rely on comparing generated captions with human-written ones that may not provide a user with deep insight into this integration. In this work, we develop a novel explanation framework that is easily interpretable based on Hybrid Markov Logic Networks (HMLNs) - a language that can combine symbolic rules with real-valued functions - where we hypothesize how relevant examples from the training data could have influenced the generation of the observed caption. To do this, we learn a HMLN distribution over the training instances and infer the shift in distributions over these instances when we condition on the generated sample, which allows us to quantify which examples may have been a source of richer information to generate the observed caption. Our experiments on captions generated for several state-of-the-art captioning models using Amazon Mechanical Turk illustrate the interpretability of our explanations, and allow us to compare these models along the dimension of explainability.

Updated: 2025-07-28 18:07:30

Subjects: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.21246v1

Diffusion Denoiser-Aided Gyrocompassing

An accurate initial heading angle is essential for efficient and safe navigation across diverse domains. Unlike magnetometers, gyroscopes can provide accurate heading reference independent of magnetic disturbances in a process known as gyrocompassing. Yet, accurate and timely gyrocompassing, using low-cost gyroscopes, remains a significant challenge in scenarios where external navigation aids are unavailable. Such challenges commonly arise in real-world applications such as autonomous vehicles, where size, weight, and power limitations restrict sensor quality and noisy measurements severely degrade gyrocompassing performance. To cope with this challenge, we propose a novel diffusion denoiser-aided gyrocompass approach. It integrates a diffusion-based denoising framework with an enhanced learning-based heading estimation model. The diffusion denoiser processes raw inertial sensor signals before input to the deep learning model, resulting in accurate gyrocompassing. Experiments using both simulated and real sensor data demonstrate that our proposed approach improves gyrocompassing accuracy by 26% compared to model-based gyrocompassing and by 15% compared to other learning-driven approaches. This advancement holds particular significance for ensuring accurate and robust navigation in autonomous platforms that incorporate low-cost gyroscopes within their navigation systems.
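To see why denoising is the crux here, a sketch of classical model-based stationary gyrocompassing helps: a leveled gyro senses the horizontal components of Earth's rotation, which encode heading, but on a low-cost sensor the per-sample noise dwarfs that signal. Plain averaging below is a crude stand-in for the paper's learned diffusion denoiser, and the sign conventions (x forward, y right, heading clockwise from north) are assumptions:

```python
import numpy as np

EARTH_RATE = 7.292115e-5  # Earth rotation rate, rad/s

def heading_from_gyro(wx, wy):
    """Stationary gyrocompassing with a leveled IMU: the horizontal Earth-rate
    components give heading as psi = atan2(-wy, wx). Sample averaging acts as
    a crude denoiser before the heading estimate."""
    return np.arctan2(-np.mean(wy), np.mean(wx))

# Low-cost gyro at latitude 32 deg, true heading 30 deg: each sample's noise
# (1e-4 rad/s) exceeds the Earth-rate signal (~6e-5 rad/s), so denoising is
# what makes gyrocompassing feasible at all.
rng = np.random.default_rng(2)
lat, psi_true = np.radians(32.0), np.radians(30.0)
n = 60_000
wx = EARTH_RATE * np.cos(lat) * np.cos(psi_true) + 1e-4 * rng.normal(size=n)
wy = -EARTH_RATE * np.cos(lat) * np.sin(psi_true) + 1e-4 * rng.normal(size=n)
psi_hat = np.degrees(heading_from_gyro(wx, wy))
```

With this much averaging the recovered heading is within a degree or so of the true 30 deg; a learned denoiser aims to reach that accuracy faster, from far fewer samples.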

Updated: 2025-07-28 18:06:46

Subjects: cs.RO,cs.LG

Download: http://arxiv.org/abs/2507.21245v1

Bubbleformer: Forecasting Boiling with Transformers

Modeling boiling (an inherently chaotic, multiphase process central to energy and thermal systems) remains a significant challenge for neural PDE surrogates. Existing models require future input (e.g., bubble positions) during inference because they fail to learn nucleation from past states, limiting their ability to autonomously forecast boiling dynamics. They also fail to model flow boiling velocity fields, where sharp interface-momentum coupling demands long-range and directional inductive biases. We introduce Bubbleformer, a transformer-based spatiotemporal model that forecasts stable and long-range boiling dynamics including nucleation, interface evolution, and heat transfer without dependence on simulation data during inference. Bubbleformer integrates factorized axial attention and frequency-aware scaling, and conditions on thermophysical parameters to generalize across fluids, geometries, and operating conditions. To evaluate physical fidelity in chaotic systems, we propose interpretable physics-based metrics that evaluate heat-flux consistency, interface geometry, and mass conservation. We also release BubbleML 2.0, a high-fidelity dataset that spans diverse working fluids (cryogens, refrigerants, dielectrics), boiling configurations (pool and flow boiling), flow regimes (bubbly, slug, annular), and boundary conditions. Bubbleformer sets new benchmark results in both prediction and forecasting of two-phase boiling flows.
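Factorized axial attention, which the abstract names as a core ingredient, replaces full attention over all H*W spatial positions with two cheaper passes: attend within each row, then within each column. A generic sketch of the factorization (identity projections, single head; Bubbleformer's actual layers are not reproduced here):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(X):
    """Plain single-head self-attention along the second-to-last axis of X
    (identity Q/K/V projections for brevity)."""
    d = X.shape[-1]
    A = softmax(X @ np.swapaxes(X, -1, -2) / np.sqrt(d))
    return A @ X

def factorized_axial_attention(X):
    """Axial factorization: each position attends along its row, then along its
    column, for O(H*W*(H+W)) cost instead of O((H*W)^2)."""
    X = attend(X)                                        # row-wise pass
    X = np.swapaxes(attend(np.swapaxes(X, 0, 1)), 0, 1)  # column-wise pass
    return X

field = np.random.default_rng(3).normal(size=(16, 16, 8))  # toy (H, W, d) field
out = factorized_axial_attention(field)
```

After the two passes, information has propagated along both axes, so every position indirectly sees the whole grid, which is how axial models keep long-range spatial coupling affordable.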

Updated: 2025-07-28 18:02:57

Subjects: cs.LG,cs.AI,cs.CE

Download: http://arxiv.org/abs/2507.21244v1

Fluidically Innervated Lattices Make Versatile and Durable Tactile Sensors

Tactile sensing plays a fundamental role in enabling robots to navigate dynamic and unstructured environments, particularly in applications such as delicate object manipulation, surface exploration, and human-robot interaction. In this paper, we introduce a passive soft robotic fingertip with integrated tactile sensing, fabricated using a 3D-printed elastomer lattice with embedded air channels. This sensorization approach, termed fluidic innervation, transforms the lattice into a tactile sensor by detecting pressure changes within sealed air channels, providing a simple yet robust solution to tactile sensing in robotics. Unlike conventional methods that rely on complex materials or designs, fluidic innervation offers a simple, scalable, single-material fabrication process. We characterize the sensors' response, develop a geometric model to estimate tip displacement, and train a neural network to accurately predict contact location and contact force. Additionally, we integrate the fingertip with an admittance controller to emulate spring-like behavior, demonstrate its capability for environment exploration through tactile feedback, and validate its durability under high impact and cyclic loading conditions. This tactile sensing technique offers advantages in terms of simplicity, adaptability, and durability and opens up new opportunities for versatile robotic manipulation.

Updated: 2025-07-28 18:00:04

Subjects: cs.RO,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2507.21225v1

Benchmarking a Tunable Quantum Neural Network on Trapped-Ion and Superconducting Hardware

We implement a quantum generalization of a neural network on trapped-ion and IBM superconducting quantum computers to classify MNIST images, a common benchmark in computer vision. The network feedforward involves qubit rotations whose angles depend on the results of measurements in the previous layer. The network is trained via simulation, but inference is performed experimentally on quantum hardware. The classical-to-quantum correspondence is controlled by an interpolation parameter, $a$, which is zero in the classical limit. Increasing $a$ introduces quantum uncertainty into the measurements, which is shown to improve network performance at moderate values of the interpolation parameter. We then focus on particular images that fail to be classified by a classical neural network but are detected correctly in the quantum network. For such borderline cases, we observe strong deviations from the simulated behavior. We attribute this to physical noise, which causes the output to fluctuate between nearby minima of the classification energy landscape. Such strong sensitivity to physical noise is absent for clear images. We further benchmark physical noise by inserting additional single-qubit and two-qubit gate pairs into the neural network circuits. Our work provides a springboard toward more complex quantum neural networks on current devices: while the approach is rooted in standard classical machine learning, scaling up such networks may prove classically non-simulable and could offer a route to near-term quantum advantage.

Updated: 2025-07-28 18:00:03

Subjects: quant-ph,cond-mat.dis-nn,cs.LG

Download: http://arxiv.org/abs/2507.21222v1

Flow Matching Policy Gradients

Flow-based generative models, including diffusion models, excel at modeling continuous distributions in high-dimensional spaces. In this work, we introduce Flow Policy Optimization (FPO), a simple on-policy reinforcement learning algorithm that brings flow matching into the policy gradient framework. FPO casts policy optimization as maximizing an advantage-weighted ratio computed from the conditional flow matching loss, in a manner compatible with the popular PPO-clip framework. It sidesteps the need for exact likelihood computation while preserving the generative capabilities of flow-based models. Unlike prior approaches for diffusion-based reinforcement learning that bind training to a specific sampling method, FPO is agnostic to the choice of diffusion or flow integration at both training and inference time. We show that FPO can train diffusion-style policies from scratch in a variety of continuous control tasks. We find that flow-based models can capture multimodal action distributions and achieve higher performance than Gaussian policies, particularly in under-conditioned settings.
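The surrogate objective the abstract describes can be sketched concretely: take the PPO-clip form, but build the per-sample "ratio" from conditional flow matching (CFM) losses under the old and new policies, with a lower CFM loss standing in for a higher likelihood. This is a schematic reading of the abstract, not the authors' implementation:

```python
import numpy as np

def fpo_clip_objective(cfm_loss_new, cfm_loss_old, advantages, eps=0.2):
    """PPO-clip-style surrogate where the likelihood ratio is replaced by a
    ratio derived from per-sample CFM losses (schematic; exact likelihoods are
    never computed)."""
    ratio = np.exp(cfm_loss_old - cfm_loss_new)   # > 1 when the new policy fits the sample better
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))

# Toy check: improving the fit on a positive-advantage sample raises the
# objective, but clipping caps the incentive, as in PPO.
adv = np.array([1.0, 1.0, -1.0])
obj_small = fpo_clip_objective(np.array([0.9, 1.0, 1.0]),
                               np.array([1.0, 1.0, 1.0]), adv)
obj_big = fpo_clip_objective(np.array([0.0, 1.0, 1.0]),
                             np.array([1.0, 1.0, 1.0]), adv)
```

Because only CFM losses enter the ratio, the construction sidesteps exact likelihood computation, and nothing ties it to a particular diffusion or flow sampler at inference time.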

Updated: 2025-07-28 17:59:57

Subjects: cs.LG,cs.RO

Download: http://arxiv.org/abs/2507.21053v1

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Despite the promise of Multi-Task Learning in leveraging complementary knowledge across tasks, existing multi-task optimization (MTO) techniques remain fixated on resolving conflicts via optimizer-centric loss scaling and gradient manipulation strategies, yet fail to deliver consistent gains. In this paper, we argue that the shared representation space, where task interactions naturally occur, offers rich information and potential for operations complementary to existing optimizers, especially for facilitating the inter-task complementarity, which is rarely explored in MTO. This intuition leads to Rep-MTL, which exploits the representation-level task saliency to quantify interactions between task-specific optimization and shared representation learning. By steering these saliencies through entropy-based penalization and sample-wise cross-task alignment, Rep-MTL aims to mitigate negative transfer by maintaining the effective training of individual tasks instead of pure conflict-solving, while explicitly promoting complementary information sharing. Experiments are conducted on four challenging MTL benchmarks covering both task-shift and domain-shift scenarios. The results show that Rep-MTL, even paired with the basic equal weighting policy, achieves competitive performance gains with favorable efficiency. Beyond standard performance metrics, Power Law exponent analysis demonstrates Rep-MTL's efficacy in balancing task-specific learning and cross-task sharing. The project page is available at HERE.
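One schematic reading of the entropy-based penalization mentioned above: normalize the per-task saliency scores at a shared feature into a distribution, and penalize low entropy so that no single task monopolizes the representation. This is an illustrative interpretation of the abstract, not Rep-MTL's exact formulation:

```python
import numpy as np

def saliency_entropy_penalty(task_saliency, eps=1e-12):
    """Negative entropy of the normalized per-task saliency distribution.
    Adding this to the loss pushes toward high entropy, i.e., saliency spread
    across tasks rather than concentrated on one (schematic)."""
    p = np.abs(task_saliency) / (np.abs(task_saliency).sum() + eps)
    entropy = -(p * np.log(p + eps)).sum()
    return -entropy

dominated = np.array([5.0, 0.1, 0.1])   # one task monopolizes the feature
balanced = np.array([1.0, 1.0, 1.0])    # saliency shared across three tasks
pen_dominated = saliency_entropy_penalty(dominated)
pen_balanced = saliency_entropy_penalty(balanced)
```

The balanced case attains the minimum penalty (-log 3 for three tasks), so minimizing the penalty nudges training toward complementary sharing rather than single-task dominance.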

Updated: 2025-07-28 17:59:28

Subjects: cs.LG,cs.CV

Download: http://arxiv.org/abs/2507.21049v1

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Large Language Models (LLMs) have demonstrated strong capabilities but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledge domains, or dynamic interaction contexts. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck, necessitating agents that can adaptively reason, act, and evolve in real time. This paradigm shift -- from scaling static models to developing self-evolving agents -- has sparked growing interest in architectures and methods enabling continual learning and adaptation from data, interactions, and experiences. This survey provides the first systematic and comprehensive review of self-evolving agents, organized around three foundational dimensions -- what to evolve, when to evolve, and how to evolve. We examine evolutionary mechanisms across agent components (e.g., models, memory, tools, architecture), categorize adaptation methods by stages (e.g., intra-test-time, inter-test-time), and analyze the algorithmic and architectural designs that guide evolutionary adaptation (e.g., scalar rewards, textual feedback, single-agent and multi-agent systems). Additionally, we analyze evaluation metrics and benchmarks tailored for self-evolving agents, highlight applications in domains such as coding, education, and healthcare, and identify critical challenges and research directions in safety, scalability, and co-evolutionary dynamics. By providing a structured framework for understanding and designing self-evolving agents, this survey establishes a roadmap for advancing adaptive agentic systems in both research and real-world deployments, ultimately shedding light to pave the way for the realization of Artificial Super Intelligence (ASI), where agents evolve autonomously, performing at or beyond human-level intelligence across a wide array of tasks.

Updated: 2025-07-28 17:59:05

Subjects: cs.AI

Download: http://arxiv.org/abs/2507.21046v1

Agentic Web: Weaving the Next Web with AI Agents

The emergence of AI agents powered by large language models (LLMs) marks a pivotal shift toward the Agentic Web, a new phase of the internet defined by autonomous, goal-driven interactions. In this paradigm, agents interact directly with one another to plan, coordinate, and execute complex tasks on behalf of users. This transition from human-driven to machine-to-machine interaction allows intent to be delegated, relieving users from routine digital operations and enabling a more interactive, automated web experience. In this paper, we present a structured framework for understanding and building the Agentic Web. We trace its evolution from the PC and Mobile Web eras and identify the core technological foundations that support this shift. Central to our framework is a conceptual model consisting of three key dimensions: intelligence, interaction, and economics. These dimensions collectively enable the capabilities of AI agents, such as retrieval, recommendation, planning, and collaboration. We analyze the architectural and infrastructural challenges involved in creating scalable agentic systems, including communication protocols, orchestration strategies, and emerging paradigms such as the Agent Attention Economy. We conclude by discussing the potential applications, societal risks, and governance issues posed by agentic systems, and outline research directions for developing open, secure, and intelligent ecosystems shaped by both human intent and autonomous agent behavior. A continuously updated collection of relevant studies for agentic web is available at: https://github.com/SafeRL-Lab/agentic-web.

Updated: 2025-07-28 17:58:12

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.21206v1

Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements

We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform "linear" dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.
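
The modification the abstract describes — replacing the attention output AV with (A - I)V, a graph-diffusion-style step — can be sketched in a few lines of NumPy. This is an illustrative single-head toy under assumed shapes, not the authors' implementation; the `diffusion` flag is a name invented for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, diffusion=False):
    """Single-head attention; if diffusion=True, use (A - I)V instead of AV."""
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))   # row-stochastic; read as an adjacency matrix
    if diffusion:
        A = A - np.eye(A.shape[0])      # Laplacian-style step, since L_rw = I - A
    return A @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out_std = attention(Q, K, V)
out_diff = attention(Q, K, V, diffusion=True)
# The two variants differ exactly by V, because (A - I)V = AV - V
assert np.allclose(out_diff, out_std - V)
```

Note that the change costs nothing extra at inference: the diffusion variant is just the standard output with V subtracted.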

Updated: 2025-07-28 17:56:34

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.21040v1

Development and analysis of a secured VoIP system for surveillance activities

Since the 1990s, the telephone has been the primary mode of communication. However, Voice over Internet Protocol (VoIP), a straightforward and affordable form of data transfer, is now becoming an important part of daily communication. VoIP is the technology that makes it possible to send speech and multimedia data packets across either a public or private IP network. However, a cyberattack known as a man-in-the-middle attack poses a serious concern when transferring data through any network. Therefore, the authors have designed a system that sends voice over the internet within the range of a router using encrypted data transfer. An embedded system comprising an electret microphone, Embedded C, Node.js, a Particle Photon microcontroller, and Internet of Things (IoT) technology is developed. Due to its compact size, this type of device may be incorporated into automobiles, surveillance systems, or covert listening tools. The VoIP system gathers sound signals using the MAX9814 microphone, while the Particle Photon microcontroller securely transmits the data. Devices with access can download data from the VoIP system's Transmission Control Protocol (TCP) server. The accessing device stores the audio locally and uploads the corresponding data to Google Drive. This VoIP system provides a secure method of communication while preserving the integrity of the original signal.

Updated: 2025-07-28 17:56:09

Categories: cs.CR

Download: http://arxiv.org/abs/2507.21038v1

When Brain Foundation Model Meets Cauchy-Schwarz Divergence: A New Framework for Cross-Subject Motor Imagery Decoding

Decoding motor imagery (MI) electroencephalogram (EEG) signals, a key non-invasive brain-computer interface (BCI) paradigm for controlling external systems, has been significantly advanced by deep learning. However, MI-EEG decoding remains challenging due to substantial inter-subject variability and limited labeled target data, which necessitate costly calibration for new users. Many existing multi-source domain adaptation (MSDA) methods indiscriminately incorporate all available source domains, disregarding the large inter-subject differences in EEG signals, which leads to negative transfer and excessive computational costs. Moreover, while many approaches focus on feature distribution alignment, they often neglect the explicit dependence between features and decision-level outputs, limiting their ability to preserve discriminative structures. To address these gaps, we propose a novel MSDA framework that leverages a pretrained large Brain Foundation Model (BFM) for dynamic and informed source subject selection, ensuring only relevant sources contribute to adaptation. Furthermore, we employ Cauchy-Schwarz (CS) and Conditional CS (CCS) divergences to jointly perform feature-level and decision-level alignment, enhancing domain invariance while maintaining class discriminability. Extensive evaluations on two benchmark MI-EEG datasets demonstrate that our framework outperforms a broad range of state-of-the-art baselines. Additional experiments with a large source pool validate the scalability and efficiency of BFM-guided selection, which significantly reduces training time without sacrificing performance.
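
The Cauchy-Schwarz divergence the abstract invokes, D_CS(p, q) = -log( ⟨p, q⟩² / (⟨p, p⟩⟨q, q⟩) ), has a simple plug-in estimator from pairwise Gaussian kernels over two feature samples. The sketch below is a generic estimator under an assumed bandwidth, not the paper's exact alignment loss; sample sizes and `sigma` are illustrative.

```python
import numpy as np

def gauss_gram(X, Y, sigma):
    # Pairwise Gaussian kernel values k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def cs_divergence(X, Y, sigma=1.0):
    """Empirical Cauchy-Schwarz divergence between samples X ~ p and Y ~ q.
    Each inner product is estimated by the mean of pairwise kernel values;
    by the Cauchy-Schwarz inequality the result is non-negative, and it is
    zero when the two kernel mean embeddings coincide."""
    pq = gauss_gram(X, Y, sigma).mean()
    pp = gauss_gram(X, X, sigma).mean()
    qq = gauss_gram(Y, Y, sigma).mean()
    return -np.log(pq ** 2 / (pp * qq))

rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(200, 2))   # e.g., source-subject features
B = rng.normal(0.0, 1.0, size=(200, 2))   # target features, same distribution
C = rng.normal(3.0, 1.0, size=(200, 2))   # shifted distribution
assert cs_divergence(A, C) > cs_divergence(A, B)   # divergence grows with shift
```

Minimizing such a divergence between source and target features is one way to encourage the domain invariance the abstract describes; the conditional variant additionally conditions on decision-level outputs.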

Updated: 2025-07-28 17:55:26

Categories: cs.LG

Download: http://arxiv.org/abs/2507.21037v1

GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.

Updated: 2025-07-28 17:55:08

Categories: cs.AI,cs.LG,cs.MA,q-bio.GN

Download: http://arxiv.org/abs/2507.21035v1

Learning from Limited and Imperfect Data

The distribution of data in the world (e.g., the internet) significantly differs from well-curated datasets and is often over-populated with samples from common categories. Algorithms designed for well-curated datasets perform suboptimally when used for learning from imperfect datasets with long-tailed imbalances and distribution shifts. To expand the use of deep models, it is essential to overcome the labor-intensive curation process by developing robust algorithms that can learn from diverse, real-world data distributions. Toward this goal, we develop practical algorithms for Deep Neural Networks that can learn from the limited and imperfect data present in the real world. This thesis is divided into four segments, each covering a scenario of learning from limited or imperfect data. The first part of the thesis focuses on Learning Generative Models from Long-Tail Data, where we mitigate mode-collapse and enable diverse aesthetic image generations for tail (minority) classes. In the second part, we enable effective generalization on tail classes through Inductive Regularization schemes, which allow tail classes to generalize as effectively as the head classes without requiring explicit generation of images. In the third part, we develop algorithms for Optimizing Relevant Metrics for learning from long-tailed data with limited annotation (semi-supervised), followed by the fourth part, which focuses on Efficient Domain Adaptation of the model to various domains with very few to zero labeled samples.

Updated: 2025-07-28 17:54:15

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2507.21205v1

ShaRP: Explaining Rankings and Preferences with Shapley Values

Algorithmic decisions in critical domains such as hiring, college admissions, and lending are often based on rankings. Given the impact of these decisions on individuals, organizations, and population groups, it is essential to understand them - to help individuals improve their ranking position, design better ranking procedures, and ensure legal compliance. In this paper, we argue that explainability methods for classification and regression, such as SHAP, are insufficient for ranking tasks, and present ShaRP - Shapley Values for Rankings and Preferences - a framework that explains the contributions of features to various aspects of a ranked outcome. ShaRP computes feature contributions for various ranking-specific profit functions, such as rank and top-k, and also includes a novel Shapley value-based method for explaining pairwise preference outcomes. We provide a flexible implementation of ShaRP, capable of efficiently and comprehensively explaining ranked and pairwise outcomes over tabular data, in score-based ranking and learning-to-rank tasks. Finally, we develop a comprehensive evaluation methodology for ranking explainability methods, showing through qualitative, quantitative, and usability studies that our rank-aware QoIs offer complementary insights, scale effectively, and help users interpret ranked outcomes in practice.

Updated: 2025-07-28 17:50:29

Categories: cs.AI,cs.CY

Download: http://arxiv.org/abs/2401.16744v5

Smart Expansion Techniques for ASP-based Interactive Configuration

Product configuration is a successful application of Answer Set Programming (ASP). However, challenges are still open for interactive systems to effectively guide users through the configuration process. The aim of our work is to provide an ASP-based solver for interactive configuration that can deal with large-scale industrial configuration problems and that supports intuitive user interfaces via an API. In this paper, we focus on improving the performance of automatically completing a partial configuration. Our main contribution enhances the classical incremental approach for multi-shot solving by four different smart expansion functions. The core idea is to determine and add specific objects or associations to the partial configuration by exploiting cautious and brave consequences before checking for the existence of a complete configuration with the current objects in each iteration. This approach limits the number of costly unsatisfiability checks and reduces the search space, thereby improving solving performance. In addition, we present a user interface that uses our API and is implemented in ASP.
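
The core idea — add to the partial configuration whatever holds in every candidate completion (cautious consequences) and rule out whatever holds in none (outside the brave consequences) — can be illustrated with plain Python sets standing in for solver-enumerated answer sets. A real implementation would query an ASP solver such as clingo for these consequence sets rather than enumerate models; the atom names below are hypothetical.

```python
from functools import reduce

def cautious_consequences(models):
    """Atoms true in every answer set: safe to add to the partial
    configuration without an extra satisfiability check."""
    return reduce(set.intersection, (set(m) for m in models))

def brave_consequences(models):
    """Atoms true in at least one answer set: anything outside this
    set can never appear in a complete configuration."""
    return reduce(set.union, (set(m) for m in models))

# Toy stand-in for solver-enumerated completions of a partial configuration
models = [
    {"frame(f1)", "wheel(w1)", "gear(g3)"},
    {"frame(f1)", "wheel(w2)", "gear(g3)"},
    {"frame(f1)", "wheel(w1)", "gear(g7)"},
]
must_have = cautious_consequences(models)   # -> {"frame(f1)"}
may_have = brave_consequences(models)
assert must_have == {"frame(f1)"}
assert "gear(g9)" not in may_have           # safe to exclude from the search
```

Committing the cautious atoms up front shrinks the remaining search space, which is what limits the number of costly unsatisfiability checks in the incremental loop.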

Updated: 2025-07-28 17:46:51

Categories: cs.AI,cs.SE,D.1.6; I.2.1

Download: http://arxiv.org/abs/2507.21027v1

Optimization Performance of Factorization Machine with Annealing under Limited Training Data

Black-box (BB) optimization problems aim to identify an input that minimizes the output of a function (the BB function) whose input-output relationship is unknown. Factorization machine with annealing (FMA) is a promising approach to this task, employing a factorization machine (FM) as a surrogate model to iteratively guide the solution search via an Ising machine. Although FMA has demonstrated strong optimization performance across various applications, its performance often stagnates as the number of optimization iterations increases. One contributing factor to this stagnation is the growing number of data points in the dataset used to train FM. It is hypothesized that as more data points are accumulated, the contribution of newly added data points becomes diluted within the entire dataset, thereby reducing their impact on improving the prediction accuracy of FM. To address this issue, we propose a novel method for sequential dataset construction that retains at most a specified number of the most recently added data points. This strategy is designed to enhance the influence of newly added data points on the surrogate model. Numerical experiments demonstrate that the proposed FMA achieves lower-cost solutions with fewer BB function evaluations compared to the conventional FMA.
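
The retention rule the abstract proposes — keep at most a fixed number of the most recently added data points for training the FM surrogate — amounts to a bounded FIFO buffer. A minimal sketch, assuming a `capacity` parameter (the paper's exact bookkeeping is not reproduced here):

```python
from collections import deque

class RecentWindowDataset:
    """Training set for the FM surrogate that retains at most `capacity`
    of the most recently added (x, y) black-box evaluations."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest points fall out automatically

    def add(self, x, y):
        self.buffer.append((x, y))

    def training_data(self):
        return list(self.buffer)

ds = RecentWindowDataset(capacity=3)
for step in range(5):                          # 5 BB evaluations, window of 3
    ds.add(f"x{step}", step * 0.1)
# Only the three newest points remain, so each new point carries real weight
assert [x for x, _ in ds.training_data()] == ["x2", "x3", "x4"]
```

Because the window stays small, each newly evaluated point makes up a fixed fraction of the training set, which is exactly the dilution effect the hypothesis in the abstract targets.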

Updated: 2025-07-28 17:45:10

Categories: cs.LG

Download: http://arxiv.org/abs/2507.21024v1

On Using the Shapley Value for Anomaly Localization: A Statistical Investigation

Recent publications have suggested using the Shapley value for anomaly localization in sensor data systems. Using a reasonable mathematical anomaly model that permits full experimental control, experiments indicate that using a single fixed term in the Shapley value calculation yields a lower-complexity anomaly localization test with the same probability of error as a test using the full Shapley value, in all cases tested. A proof demonstrates that these conclusions must hold for all independent-observation cases; for dependent observations, no proof is available.
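
The contrast between the exponential-cost exact Shapley value and a single-term surrogate can be sketched as follows. The abstract does not identify which fixed term is used; the sketch assumes it is the grand-coalition marginal v(N) - v(N \ {i}), and the toy additive anomaly score is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley(v, n):
    """Exact Shapley values for value function v over players 0..n-1
    (sums over all 2^(n-1) coalitions per player)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

def single_term(v, n):
    """Low-complexity surrogate: one fixed marginal term, v(N) - v(N \\ {i})."""
    full = set(range(n))
    return [v(full) - v(full - {i}) for i in range(n)]

# Toy additive anomaly score: sensor 2 carries the anomaly
scores = {0: 0.1, 1: 0.2, 2: 5.0}
v = lambda S: sum(scores[i] for i in S)
exact = shapley(v, 3)
cheap = single_term(v, 3)
# For an additive v, both statistics localize the anomaly at the same sensor
assert max(range(3), key=lambda i: exact[i]) == max(range(3), key=lambda i: cheap[i]) == 2
```

For additive value functions the two coincide exactly; the paper's claim is the stronger statistical one that the error probabilities match for all independent-observation cases.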

Updated: 2025-07-28 17:43:53

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2507.21023v1

Behavior-Specific Filtering for Enhanced Pig Behavior Classification in Precision Livestock Farming

This study proposes a behavior-specific filtering method to improve behavior classification accuracy in Precision Livestock Farming. While traditional filtering methods, such as wavelet denoising, achieved an accuracy of 91.58%, they apply uniform processing to all behaviors. In contrast, the proposed behavior-specific filtering method combines Wavelet Denoising with a Low Pass Filter, tailored to active and inactive pig behaviors, and achieved a peak accuracy of 94.73%. These results highlight the effectiveness of behavior-specific filtering in enhancing animal behavior monitoring, supporting better health management and farm efficiency.
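
The idea of choosing the filter by behavior class can be sketched with a simple boxcar moving average standing in for the paper's wavelet-denoising + low-pass chain, whose exact parameters the abstract does not give. The behavior labels, window widths, and signals below are all hypothetical.

```python
import numpy as np

def moving_average(signal, width):
    """Simple FIR low-pass: boxcar moving average (an illustrative stand-in,
    not the paper's wavelet-denoising + low-pass combination)."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")

def behavior_specific_filter(signal, behavior):
    # Hypothetical policy: smooth inactive behaviors aggressively (residual
    # motion is mostly sensor noise) and lightly for active behaviors, so
    # genuine movement dynamics survive for the classifier.
    width = 25 if behavior == "inactive" else 5
    return moving_average(signal, width)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
lying = 0.05 * rng.normal(size=t.size)                        # inactive: noise at rest
walking = np.sin(2 * np.pi * 8 * t) + 0.05 * rng.normal(size=t.size)
f_lying = behavior_specific_filter(lying, "inactive")
f_walking = behavior_specific_filter(walking, "active")
assert f_lying.std() < lying.std()   # heavy smoothing suppresses resting noise
```

The point of the per-behavior split is precisely this asymmetry: uniform filtering must pick one width for both regimes, degrading one of them.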

Updated: 2025-07-28 17:42:57

Categories: cs.LG

Download: http://arxiv.org/abs/2507.21021v1

Deep Learning for Skeleton Based Human Motion Rehabilitation Assessment: A Benchmark

Automated assessment of human motion plays a vital role in rehabilitation, enabling objective evaluation of patient performance and progress. Unlike general human activity recognition, rehabilitation motion assessment focuses on analyzing the quality of movement within the same action class, requiring the detection of subtle deviations from ideal motion. Recent advances in deep learning and video-based skeleton extraction have opened new possibilities for accessible, scalable motion assessment using affordable devices such as smartphones or webcams. However, the field lacks standardized benchmarks, consistent evaluation protocols, and reproducible methodologies, limiting progress and comparability across studies. In this work, we address these gaps by (i) aggregating existing rehabilitation datasets into a unified archive called Rehab-Pile, (ii) proposing a general benchmarking framework for evaluating deep learning methods in this domain, and (iii) conducting extensive benchmarking of multiple architectures across classification and regression tasks. All datasets and implementations are released to the community to support transparency and reproducibility. This paper aims to establish a solid foundation for future research in automated rehabilitation assessment and foster the development of reliable, accessible, and personalized rehabilitation solutions. The datasets, source-code and results of this article are all publicly available.

Updated: 2025-07-28 17:39:03

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.21018v1

MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them

Hallucinations pose critical risks for large language model (LLM)-based agents, often manifesting as hallucinative actions resulting from fabricated or misinterpreted information within the cognitive context. While recent studies have exposed such failures, existing evaluations remain fragmented and lack a principled testbed. In this paper, we present MIRAGE-Bench--Measuring Illusions in Risky AGEnt settings--the first unified benchmark for eliciting and evaluating hallucinations in interactive LLM-agent scenarios. We begin by introducing a three-part taxonomy to address agentic hallucinations: actions that are unfaithful to (i) task instructions, (ii) execution history, or (iii) environment observations. To analyze, we first elicit such failures by performing a systematic audit of existing agent benchmarks, then synthesize test cases using a snapshot strategy that isolates decision points in deterministic and reproducible manners. To evaluate hallucination behaviors, we adopt a fine-grained-level LLM-as-a-Judge paradigm with tailored risk-aware prompts, enabling scalable, high-fidelity assessment of agent actions without enumerating full action spaces. MIRAGE-Bench provides actionable insights on failure modes of LLM agents and lays the groundwork for principled progress in mitigating hallucinations in interactive environments.
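
The benchmark's core objects are a frozen decision point (the "snapshot") and a judge that checks an action against the three-part taxonomy. The sketch below is a minimal illustration of that structure: the `Snapshot` fields and the trivial substring rule standing in for the LLM-as-a-Judge call are assumptions, not the MIRAGE-Bench implementation.

```python
from dataclasses import dataclass

# The three failure classes from the taxonomy: actions unfaithful to the
# task instructions, the execution history, or the environment observations.
TAXONOMY = ("instruction", "history", "observation")

@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot test case: the context the agent saw at one
    decision point, plus the action it took."""
    instructions: str
    history: tuple
    observation: str
    action: str

def judge(snapshot, hallucination_type):
    # Placeholder for the risk-aware LLM-as-a-Judge prompt; here a trivial
    # rule stands in: an action never mentioned in the observation is
    # flagged as an observation-unfaithful hallucination.
    assert hallucination_type in TAXONOMY
    if hallucination_type == "observation":
        return snapshot.action not in snapshot.observation
    return False  # the other classes would need their own judge prompts
```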

Updated: 2025-07-28 17:38:29

Categories: cs.AI

Download: http://arxiv.org/abs/2507.21017v1

Back Home: A Computer Vision Solution to Seashell Identification for Ecological Restoration

Illegal souvenir collection strips an estimated five tonnes of seashells from Costa Rica's beaches each year. Yet, once these specimens are seized, their coastal origin -- Pacific or Caribbean -- cannot be verified easily due to the lack of information, preventing their return when confiscated by local authorities. To solve this issue, we introduce BackHome19K, the first large-scale image corpus (19,058 photographs, 516 species) annotated with coast-level labels, and propose a lightweight pipeline that infers provenance in real time on a mobile-grade CPU. A trained anomaly filter pre-screens uploads, increasing robustness to user-generated noise. On a held-out test set, the classifier attains 86.3% balanced accuracy, while the filter rejects 93% of 180 out-of-domain objects with zero false negatives. Deployed as a web application, the system has already processed 70,000 shells for wildlife officers in under three seconds per image, enabling confiscated specimens to be safely repatriated to their native ecosystems. The dataset is available at https://huggingface.co/datasets/FIFCO/BackHome19K
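
Balanced accuracy, the headline metric here, is the unweighted mean of per-class recall, so a rare class counts as much as a common one. A minimal stdlib implementation:

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall: each class contributes equally
    regardless of how many samples it has."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)  # bool counts as 0/1
    return sum(correct[c] / total[c] for c in total) / len(total)
```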

Updated: 2025-07-28 17:30:22

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2501.04873v3

Predicting Cognition from fMRI: A Comparative Study of Graph, Transformer, and Kernel Models Across Task and Rest Conditions

Predicting cognition from neuroimaging data in healthy individuals offers insights into the neural mechanisms underlying cognitive abilities, with potential applications in precision medicine and early detection of neurological and psychiatric conditions. This study systematically benchmarked classical machine learning (Kernel Ridge Regression (KRR)) and advanced deep learning (DL) models (Graph Neural Networks (GNN) and Transformer-GNN (TGNN)) for cognitive prediction using Resting-state (RS), Working Memory, and Language task fMRI data from the Human Connectome Project Young Adult dataset. Our results, based on R2 scores, Pearson correlation coefficient, and mean absolute error, revealed that task-based fMRI, eliciting neural responses directly tied to cognition, outperformed RS fMRI in predicting cognitive behavior. Among the methods compared, a GNN combining structural connectivity (SC) and functional connectivity (FC) consistently achieved the highest performance across all fMRI modalities; however, its advantage over KRR using FC alone was not statistically significant. The TGNN, designed to model temporal dynamics with SC as a prior, performed competitively with FC-based approaches for task-fMRI but struggled with RS data, where its performance aligned with the lower-performing GNN that directly used fMRI time-series data as node features. These findings emphasize the importance of selecting appropriate model architectures and feature representations to fully leverage the spatial and temporal richness of neuroimaging data. This study highlights the potential of multimodal graph-aware DL models to combine SC and FC for cognitive prediction, as well as the promise of Transformer-based approaches for capturing temporal dynamics. By providing a comprehensive comparison of models, this work serves as a guide for advancing brain-behavior modeling using fMRI, SC and DL.
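
Two of the three evaluation metrics the study reports, Pearson correlation and mean absolute error, have compact standard definitions; for reference, stdlib implementations:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: covariance of x and y divided by
    the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```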

Updated: 2025-07-28 17:29:22

Categories: cs.LG,q-bio.NC

Download: http://arxiv.org/abs/2507.21016v1

Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions

The use of large language models (LLMs) in hiring promises to streamline candidate screening, but it also raises serious concerns regarding accuracy and algorithmic bias where sufficient safeguards are not in place. In this work, we benchmark several state-of-the-art foundational LLMs - including models from OpenAI, Anthropic, Google, Meta, and Deepseek, and compare them with our proprietary domain-specific hiring model (Match Score) for job candidate matching. We evaluate each model's predictive accuracy (ROC AUC, Precision-Recall AUC, F1-score) and fairness (impact ratio of cut-off analysis across declared gender, race, and intersectional subgroups). Our experiments on a dataset of roughly 10,000 real-world recent candidate-job pairs show that Match Score outperforms the general-purpose LLMs on accuracy (ROC AUC 0.85 vs 0.77) and achieves significantly more equitable outcomes across demographic groups. Notably, Match Score attains a minimum race-wise impact ratio of 0.957 (near-parity), versus 0.809 or lower for the best LLMs (0.906 vs 0.773 for intersectional subgroups). We discuss why pretraining biases may cause LLMs with insufficient safeguards to propagate societal biases in hiring scenarios, whereas a bespoke supervised model can more effectively mitigate these biases. Our findings highlight the importance of domain-specific modeling and bias auditing when deploying AI in high-stakes domains such as hiring, and caution against relying on off-the-shelf LLMs for such tasks without extensive fairness safeguards. Furthermore, we show with empirical evidence that there shouldn't be a dichotomy between choosing accuracy and fairness in hiring: a well-designed algorithm can achieve both accuracy in hiring and fairness in outcomes.
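
The impact ratio used in the fairness analysis has a standard definition: the lowest group-level selection rate divided by the highest, with values near 1.0 indicating parity (the common four-fifths rule flags ratios below 0.8). A minimal sketch:

```python
def impact_ratio(selected_by_group):
    """Minimum selection rate across groups divided by the maximum.

    `selected_by_group` maps a group label to (num_selected, num_total);
    the argument shape is an assumption made for this sketch.
    """
    rates = [sel / total for sel, total in selected_by_group.values()]
    return min(rates) / max(rates)
```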

Updated: 2025-07-28 17:26:01

Categories: cs.LG,cs.CL,cs.CY

Download: http://arxiv.org/abs/2507.02087v2

Memorization in Fine-Tuned Large Language Models

This study investigates the mechanisms and factors influencing memorization in fine-tuned large language models (LLMs), with a focus on the medical domain due to its privacy-sensitive nature. We examine how different aspects of the fine-tuning process affect a model's propensity to memorize training data, using the PHEE dataset of pharmacovigilance events. Our research employs two main approaches: a membership inference attack to detect memorized data, and a generation task with prompted prefixes to assess verbatim reproduction. We analyze the impact of adapting different weight matrices in the transformer architecture, the relationship between perplexity and memorization, and the effect of increasing the rank in low-rank adaptation (LoRA) fine-tuning. Key findings include: (1) Value and Output matrices contribute more significantly to memorization compared to Query and Key matrices; (2) Lower perplexity in the fine-tuned model correlates with increased memorization; (3) Higher LoRA ranks lead to increased memorization, but with diminishing returns at higher ranks. These results provide insights into the trade-offs between model performance and privacy risks in fine-tuned LLMs. Our findings have implications for developing more effective and responsible strategies for adapting large language models while managing data privacy concerns.
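
The perplexity-memorization link the abstract describes rests on a standard quantity: sequence perplexity is exp of the negative mean per-token log-probability, and a simple threshold on it yields a membership-inference guess. The sketch below shows those two pieces; the threshold attack is an illustrative assumption, not the paper's exact attack.

```python
import math

def perplexity(token_logprobs):
    """Sequence perplexity from per-token log-probabilities:
    exp(-mean(log p)). Lower values mean the model finds the text easier."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def membership_guess(token_logprobs, threshold):
    # Hypothetical threshold attack: samples the model finds "too easy"
    # (perplexity below the threshold) are guessed to be training members,
    # echoing the finding that lower perplexity correlates with memorization.
    return perplexity(token_logprobs) < threshold
```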

Updated: 2025-07-28 17:22:10

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.21009v1

Compositional Function Networks: A High-Performance Alternative to Deep Neural Networks with Built-in Interpretability

Deep Neural Networks (DNNs) deliver impressive performance but their black-box nature limits deployment in high-stakes domains requiring transparency. We introduce Compositional Function Networks (CFNs), a novel framework that builds inherently interpretable models by composing elementary mathematical functions with clear semantics. Unlike existing interpretable approaches that are limited to simple additive structures, CFNs support diverse compositional patterns -- sequential, parallel, and conditional -- enabling complex feature interactions while maintaining transparency. A key innovation is that CFNs are fully differentiable, allowing efficient training through standard gradient descent. We demonstrate CFNs' versatility across multiple domains, from symbolic regression to image classification with deep hierarchical networks. Our empirical evaluation shows CFNs achieve competitive performance against black-box models (96.24% accuracy on CIFAR-10) while outperforming state-of-the-art interpretable models like Explainable Boosting Machines. By combining the hierarchical expressiveness and efficient training of deep learning with the intrinsic interpretability of well-defined mathematical functions, CFNs offer a powerful framework for applications where both performance and accountability are paramount.
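
The three compositional patterns named in the abstract - sequential, parallel, and conditional - can be illustrated with plain higher-order functions. The sketch below is only an analogy: real CFNs compose parameterized, fully differentiable elementary functions trained by gradient descent, which this stdlib version does not attempt.

```python
def sequential(*fs):
    """Sequential pattern: feed the output of each function into the next."""
    def composed(x):
        for f in fs:
            x = f(x)
        return x
    return composed

def parallel(fs, combine=sum):
    """Parallel pattern: apply all functions to the same input and
    combine their outputs (here by summation)."""
    return lambda x: combine(f(x) for f in fs)

def conditional(pred, f_true, f_false):
    """Conditional pattern: route the input through one of two branches."""
    return lambda x: f_true(x) if pred(x) else f_false(x)
```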

Updated: 2025-07-28 17:18:40

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.21004v1

LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning

Large-scale generative models like DeepSeek-R1 and OpenAI-O1 benefit substantially from chain-of-thought (CoT) reasoning, yet pushing their performance typically requires vast data, large model sizes, and full-parameter fine-tuning. While parameter-efficient fine-tuning (PEFT) helps reduce cost, most existing approaches primarily address domain adaptation or layer-wise allocation rather than explicitly tailoring data and parameters to different response demands. Inspired by "Thinking, Fast and Slow," which characterizes two distinct modes of thought-System 1 (fast, intuitive, often automatic) and System 2 (slower, more deliberative and analytic)-we draw an analogy that different "subregions" of an LLM's parameters might similarly specialize for tasks that demand quick, intuitive responses versus those requiring multi-step logical reasoning. Therefore, we propose LoRA-PAR, a dual-system LoRA framework that partitions both data and parameters by System 1 or System 2 demands, using fewer yet more focused parameters for each task. Specifically, we classify task data via multi-model role-playing and voting, and partition parameters based on importance scoring, then adopt a two-stage fine-tuning strategy of training System 1 tasks with supervised fine-tuning (SFT) to enhance knowledge and intuition and refine System 2 tasks with reinforcement learning (RL) to reinforce deeper logical deliberation next. Extensive experiments show that the two-stage fine-tuning strategy, SFT and RL, lowers active parameter usage while matching or surpassing SOTA PEFT baselines.
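
The importance-scoring partition step can be pictured as splitting a scored parameter set into a System 1 subset and a System 2 subset. The function below is a hypothetical sketch of that split; the greedy top-k rule and the `budget_ratio` parameter are assumptions, not the paper's scoring or allocation procedure.

```python
def partition_by_importance(importance, budget_ratio=0.5):
    """Split parameters into two disjoint sets by importance score:
    the top fraction goes to System 1 (fast, intuitive tasks, later
    trained with SFT) and the remainder to System 2 (deliberative
    tasks, later refined with RL).

    `importance` maps a parameter name to a score; `budget_ratio` is the
    fraction assigned to System 1.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    k = int(len(ranked) * budget_ratio)
    return set(ranked[:k]), set(ranked[k:])
```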

Updated: 2025-07-28 17:11:26

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2507.20999v1

LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning

Large-scale generative models like DeepSeek-R1 and OpenAI-O1 benefit substantially from chain-of-thought (CoT) reasoning, yet pushing their performance typically requires vast data, large model sizes, and full-parameter fine-tuning. While parameter-efficient fine-tuning (PEFT) helps reduce cost, most existing approaches primarily address domain adaptation or layer-wise allocation rather than explicitly tailoring data and parameters to different response demands. Inspired by "Thinking, Fast and Slow," which characterizes two distinct modes of thought-System 1 (fast, intuitive, often automatic) and System 2 (slower, more deliberative and analytic)-we draw an analogy that different "subregions" of an LLM's parameters might similarly specialize for tasks that demand quick, intuitive responses versus those requiring multi-step logical reasoning. Therefore, we propose LoRA-PAR, a dual-system LoRA framework that partitions both data and parameters by System 1 or System 2 demands, using fewer yet more focused parameters for each task. Specifically, we classify task data via multi-model role-playing and voting, and partition parameters based on importance scoring, then adopt a two-stage fine-tuning strategy of training System 1 tasks with supervised fine-tuning (SFT) to enhance knowledge and intuition and refine System 2 tasks with reinforcement learning (RL) to reinforce deeper logical deliberation next. Extensive experiments show that the two-stage fine-tuning strategy, SFT and RL, lowers active parameter usage while matching or surpassing SOTA PEFT baselines.

Updated: 2025-07-28 17:11:26

Domains: cs.LG, cs.CL

Download: http://arxiv.org/abs/2507.20999v1

Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition

In real-world machine learning deployments, models must be continually updated, composed, and when required, selectively undone. However, existing approaches to model merging and continual learning often suffer from task interference, catastrophic forgetting, or lack of reversibility. We propose Modular Delta Merging with Orthogonal Constraints (MDM-OC), a novel framework that enables scalable, interference-free, and reversible composition of fine-tuned models. Each task-specific model is encoded as a delta from a shared base and projected into an orthogonal subspace to eliminate conflict. These projected deltas are then merged via gradient-based optimization to form a unified model that retains performance across tasks. Our approach supports continual integration of new models, structured unmerging for compliance with requirements such as the GDPR, and model stability via elastic weight consolidation and synthetic replay. Extensive experiments on vision and natural language processing benchmarks demonstrate that MDM-OC outperforms prior baselines in accuracy, backward transfer, and unmerge fidelity, while remaining memory-efficient and computationally tractable. This framework offers a principled solution for modular and compliant AI system design.
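
The delta encoding and orthogonality constraint can be sketched on flattened parameter vectors. A Gram-Schmidt-style projection is one simple way to realize "projected into an orthogonal subspace"; it stands in for, but is not, the authors' gradient-based optimization.

```python
import numpy as np

def orthogonal_delta(base, finetuned, prior_deltas):
    """Encode a fine-tuned model as a delta from the shared base, then
    project out components overlapping previously merged deltas.
    Vectors are flattened parameter vectors (illustrative sketch)."""
    delta = finetuned - base
    for d in prior_deltas:
        denom = np.dot(d, d)
        if denom > 0:
            delta = delta - (np.dot(delta, d) / denom) * d
    return delta

def merge(base, deltas):
    """Unified model = base + sum of orthogonalized deltas. Because deltas
    do not interfere, 'unmerging' a task is just subtracting its delta."""
    return base + np.sum(deltas, axis=0)
```

With orthogonal deltas, removing one task's contribution leaves the others' exactly intact, which is the reversibility property the abstract emphasizes.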

Updated: 2025-07-28 17:08:49

Domains: cs.LG, cs.AI

Download: http://arxiv.org/abs/2507.20997v1

Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM

Large visual-language models (LVLMs) integrate aligned large language models (LLMs) with visual modules to process multimodal inputs. However, the safety mechanisms developed for text-based LLMs do not naturally extend to visual modalities, leaving LVLMs vulnerable to harmful image inputs. To address this cross-modal safety gap, we introduce security tensors - trainable input vectors applied during inference through either the textual or visual modality. These tensors transfer textual safety alignment to visual processing without modifying the model's parameters. They are optimized using a curated dataset containing (i) malicious image-text pairs requiring rejection, (ii) contrastive benign pairs with text structurally similar to malicious queries, with the purpose of being contrastive examples to guide visual reliance, and (iii) general benign samples preserving model functionality. Experimental results demonstrate that both textual and visual security tensors significantly enhance LVLMs' ability to reject diverse harmful visual inputs while maintaining near-identical performance on benign tasks. Further internal analysis towards hidden-layer representations reveals that security tensors successfully activate the language module's textual "safety layers" in visual inputs, thereby effectively extending text-based safety to the visual modality.
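
The core idea of a trainable input vector optimized against a frozen model can be illustrated with a toy linear "model". Everything here, from the weight matrix to the learning rate, is a made-up minimal example, not the paper's setup: only the security tensor is updated, the model parameters stay fixed.

```python
import numpy as np

# Frozen toy "model": logits = (x + sec) @ W, class 0 = "refuse", 1 = "comply".
W = np.array([[ 0.5, -0.5],
              [ 0.2,  0.8],
              [-0.3,  0.4],
              [ 0.7,  0.1]])

def loss_and_grad(sec, x, target):
    """Cross-entropy of softmax((x + sec) @ W) against the target class,
    with the gradient taken w.r.t. the security tensor only."""
    logits = (x + sec) @ W
    p = np.exp(logits - logits.max()); p /= p.sum()
    grad_logits = p.copy(); grad_logits[target] -= 1.0
    return -np.log(p[target]), W @ grad_logits

sec = np.zeros(4)                          # the trainable input vector
x_malicious = np.array([0.0, 1.0, 0.0, 0.0])  # initially classified "comply"
for _ in range(200):                       # steer malicious inputs to "refuse"
    _, g = loss_and_grad(sec, x_malicious, target=0)
    sec -= 0.1 * g
```

After optimization the learned vector, added at inference time, flips the frozen model's decision on the malicious input without touching its weights, which mirrors the abstract's "without modifying the model's parameters".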

Updated: 2025-07-28 16:59:53

Domains: cs.CV, cs.AI

Download: http://arxiv.org/abs/2507.20994v1

On the Robustness of Global Feature Effect Explanations

We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally.
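
For reference, the object whose robustness is being bounded, the empirical partial dependence plot, is computed as follows. This is the standard definition in a minimal sketch, not code from the paper.

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Empirical PDP: for each grid value v, overwrite column `feature`
    with v for every row and average the model's predictions. Data or
    model perturbations shift exactly this average, which is what the
    robustness bounds quantify."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pd_vals.append(model(Xv).mean())
    return np.array(pd_vals)
```

On a linear model the PDP recovers the feature's slope exactly, which makes it a convenient sanity check.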

Updated: 2025-07-28 16:59:49

Domains: cs.LG, stat.ML

Download: http://arxiv.org/abs/2406.09069v2

Zebrafix: Mitigating Memory-Centric Side-Channel Leakage via Interleaving

Constant-time code has become the de-facto standard for secure cryptographic implementations. However, some memory-based leakage classes such as ciphertext side-channels and silent stores remain unaddressed. Prior work proposed three different methods for ciphertext side-channel mitigation, for which one, the practicality of interleaving data with counter values, remains to be explored. To close this gap, we define design choices and requirements to leverage interleaving for a generic ciphertext side-channel mitigation. Based on these results, we implement Zebrafix, a compiler-based tool to ensure freshness of memory stores. We evaluate Zebrafix and find that interleaving can perform much better than other ciphertext side-channel mitigations, at the cost of a high practical complexity. We further observe that ciphertext side-channels and silent stores belong to a broader attack category: memory-centric side-channels. Under this unified view, we show that interleaving-based ciphertext side-channel mitigations can be used to prevent silent stores as well.
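
The interleaving idea can be pictured with a toy word-level memory model: each data word is laid out next to a counter word, so repeated stores of the same plaintext still change the contents of the line. The real tool rewrites machine code at compile time; this Python model is purely conceptual.

```python
def interleave_store(memory, addr, data_words, counter):
    """Store each data word next to a monotonically increasing counter
    word, so the plaintext of a memory line never repeats across stores.
    Under memory encryption this keeps every ciphertext fresh, closing
    the ciphertext side-channel; it also defeats silent-store elision,
    since the written line always differs from the previous one."""
    for i, w in enumerate(data_words):
        memory[addr + 2 * i] = w            # data slot
        memory[addr + 2 * i + 1] = counter  # freshness slot
    return counter + 1
```

Storing identical data twice yields two different line contents, which is exactly the freshness property the mitigation relies on.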

Updated: 2025-07-28 16:55:02

Domains: cs.CR

Download: http://arxiv.org/abs/2502.09139v3

GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding

Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating sparse signals that ignore the continuous nature of spatial interactions. Motivated by human clicking behavior that naturally forms Gaussian distributions centered on target elements, we introduce GUI Gaussian Grounding Rewards (GUI-G$^2$), a principled reward framework that models GUI elements as continuous Gaussian distributions across the interface plane. GUI-G$^2$ incorporates two synergistic mechanisms: Gaussian point rewards model precise localization through exponentially decaying distributions centered on element centroids, while coverage rewards assess spatial alignment by measuring the overlap between predicted Gaussian distributions and target regions. To handle diverse element scales, we develop an adaptive variance mechanism that calibrates reward distributions based on element dimensions. This framework transforms GUI grounding from sparse binary classification to dense continuous optimization, where Gaussian distributions generate rich gradient signals that guide models toward optimal interaction positions. Extensive experiments across ScreenSpot, ScreenSpot-v2, and ScreenSpot-Pro benchmarks demonstrate that GUI-G$^2$ substantially outperforms the state-of-the-art method UI-TARS-72B, with the most significant improvement of 24.7% on ScreenSpot-Pro. Our analysis reveals that continuous modeling provides superior robustness to interface variations and enhanced generalization to unseen layouts, establishing a new paradigm for spatial reasoning in GUI interaction tasks.
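
The two reward mechanisms can be sketched directly from the abstract's description. The decay constant, the sigma-to-size ratio, and the closed-form coverage via the Gaussian CDF are illustrative choices, not the paper's calibration.

```python
import numpy as np
from math import erf, sqrt

def point_reward(pred, center, width, height, alpha=0.5):
    """Exponentially decaying reward centered on the element centroid,
    with per-axis variance scaled to the element's size (the adaptive
    variance idea): reward is 1 at the centroid and decays smoothly."""
    sx, sy = alpha * width, alpha * height
    dx, dy = pred[0] - center[0], pred[1] - center[1]
    return float(np.exp(-0.5 * ((dx / sx) ** 2 + (dy / sy) ** 2)))

def coverage_reward(pred, box, sigma):
    """Probability mass of an isotropic Gaussian at `pred` falling inside
    the target box (x0, y0, x1, y1): one concrete way to measure the
    overlap between the predicted distribution and the target region."""
    def mass(lo, hi, mu):
        Phi = lambda z: 0.5 * (1 + erf((z - mu) / (sigma * sqrt(2))))
        return Phi(hi) - Phi(lo)
    x0, y0, x1, y1 = box
    return mass(x0, x1, pred[0]) * mass(y0, y1, pred[1])
```

Unlike a hit-or-miss reward, both quantities vary smoothly with the predicted click position, which is what produces the dense gradient signal the abstract describes.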

Updated: 2025-07-28 16:54:13

Domains: cs.LG, cs.AI, cs.CL, cs.CV, cs.HC

Download: http://arxiv.org/abs/2507.15846v3

Personalized Treatment Effect Estimation from Unstructured Data

Existing methods for estimating personalized treatment effects typically rely on structured covariates, limiting their applicability to unstructured data. Yet, leveraging unstructured data for causal inference has considerable application potential, for instance in healthcare, where clinical notes or medical images are abundant. To this end, we first introduce an approximate 'plug-in' method trained directly on the neural representations of unstructured data. However, when these fail to capture all confounding information, the method may be subject to confounding bias. We therefore introduce two theoretically grounded estimators that leverage structured measurements of the confounders during training, but allow estimating personalized treatment effects purely from unstructured inputs, while avoiding confounding bias. When these structured measurements are only available for a non-representative subset of the data, these estimators may suffer from sampling bias. To address this, we further introduce a regression-based correction that accounts for the non-uniform sampling, assuming the sampling mechanism is known or can be well-estimated. Our experiments on two benchmark datasets show that the plug-in method, directly trainable on large unstructured datasets, achieves strong empirical performance across all settings, despite its simplicity.
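
A minimal version of the "plug-in" idea, assuming a pretrained, frozen encoder producing representations phi and linear ridge heads standing in for the per-arm outcome models. Both of those choices, and all names below, are assumptions for illustration, not the paper's exact components.

```python
import numpy as np

def plugin_cate(phi, t, y, phi_new):
    """Plug-in CATE sketch: fit one outcome head per treatment arm on
    frozen representations phi(x), then contrast the two predictions.
    If phi misses confounding information, this estimate is biased,
    which is the failure mode the abstract's other estimators address."""
    def ridge(A, b, lam=1e-6):
        return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
    w1 = ridge(phi[t == 1], y[t == 1])   # treated-arm outcome head
    w0 = ridge(phi[t == 0], y[t == 0])   # control-arm outcome head
    return phi_new @ (w1 - w0)           # estimated individual effect
```

On synthetic data with randomized treatment and a linear outcome, the contrast recovers the true per-unit effect.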

Updated: 2025-07-28 16:52:31

Domains: cs.LG, cs.AI, stat.ML

Download: http://arxiv.org/abs/2507.20993v1

Scaling Physical Reasoning with the PHYSICS Dataset

Large Language Models (LLMs) have achieved remarkable progress on advanced reasoning tasks such as mathematics and coding competitions. Meanwhile, physics, despite being both reasoning-intensive and essential to real-world understanding, has received limited academic and industrial attention. This paper introduces PHYSICS, a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels, to address this gap. Specifically, PHYSICS is curated with exercises from over 100 textbooks through a carefully designed pipeline for quality control. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics. It also spans a wide range of difficulty levels, from high school to graduate-level physics courses. To utilize the data for improving and evaluating the model's physical reasoning capabilities, we split the dataset into training and test sets, and provide reasoning paths generated by powerful reasoning models for the training data to facilitate model training. In addition, for the evaluation part, we find that existing evaluation frameworks exhibit biases in aspects such as units, simplification, and precision in the physics domain. To balance efficiency and accuracy, we introduce a Rule+Model evaluation framework tailored to physics problems. Our evaluations on current state-of-the-art open-source and proprietary models highlight the limitations of current models in handling physics-related tasks. We hope that our dataset and evaluation methodology will jointly advance the development of LLMs in the field of physics.
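
The rule stage of a Rule+Model grader might look like the following sketch: compare answers numerically with a relative tolerance when both parse, and defer to a model-based judge otherwise. This is a simplified illustration of the idea, not the paper's actual framework; the tolerance and parsing rules are assumptions.

```python
def rule_check(pred, gold, rel_tol=1e-2):
    """Rule stage of a Rule+Model grader. Returns True/False when both
    answers parse as numbers (judged with a relative tolerance, which
    absorbs precision differences), and None when parsing fails so a
    model-based judge can take over."""
    def to_float(s):
        try:
            return float(str(s).strip().replace(',', ''))
        except ValueError:
            return None
    p, g = to_float(pred), to_float(gold)
    if p is None or g is None:
        return None                      # defer to the model-based judge
    denom = max(abs(g), 1e-12)
    return abs(p - g) / denom <= rel_tol
```

The cheap rule handles the bulk of clean numeric answers; only the ambiguous remainder needs the more expensive model judge, which is the efficiency/accuracy balance the abstract mentions.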

Updated: 2025-07-28 16:50:18

Domains: cs.CL, cs.LG, physics.ed-ph

Download: http://arxiv.org/abs/2506.00022v3

JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1

Recent advances in diffusion-based video generation have enabled photo-realistic short clips, but current methods still struggle to achieve multi-modal consistency when jointly generating whole-body motion and natural speech. Current approaches lack comprehensive evaluation frameworks that assess both visual and audio quality, and there are insufficient benchmarks for region-specific performance analysis. To address these gaps, we introduce the Joint Whole-Body Talking Avatar and Speech Generation Version I (JWB-DH-V1), comprising a large-scale multi-modal dataset with 10,000 unique identities across 2 million video samples, and an evaluation protocol for assessing joint audio-video generation of whole-body animatable avatars. Our evaluation of SOTA models reveals consistent performance disparities between face/hand-centric and whole-body performance, which indicates essential areas for future research. The dataset and evaluation tools are publicly available at https://github.com/deepreasonings/WholeBodyBenchmark.

Updated: 2025-07-28 16:47:44

Domains: cs.CV, cs.AI

Download: http://arxiv.org/abs/2507.20987v1

SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

While frontier large language models (LLMs) continue to push capability boundaries, their deployment remains confined to GPU-powered cloud infrastructure. We challenge this paradigm with SmallThinker, a family of LLMs natively designed - not adapted - for the unique constraints of local devices: weak computational power, limited memory, and slow storage. Unlike traditional approaches that mainly compress existing models built for clouds, we architect SmallThinker from the ground up to thrive within these limitations. Our innovation lies in a deployment-aware architecture that transforms constraints into design principles. First, we introduce a two-level sparse structure combining fine-grained Mixture-of-Experts (MoE) with sparse feed-forward networks, drastically reducing computational demands without sacrificing model capacity. Second, to conquer the I/O bottleneck of slow storage, we design a pre-attention router that enables our co-designed inference engine to prefetch expert parameters from storage while computing attention, effectively hiding storage latency that would otherwise cripple on-device inference. Third, for memory efficiency, we utilize a NoPE-RoPE hybrid sparse attention mechanism to slash KV cache requirements. We release SmallThinker-4B-A0.6B and SmallThinker-21B-A3B, which achieve state-of-the-art performance scores and even outperform larger LLMs. Remarkably, our co-designed system mostly eliminates the need for expensive GPU hardware: with Q4_0 quantization, both models exceed 20 tokens/s on ordinary consumer CPUs, while consuming only 1GB and 8GB of memory respectively. SmallThinker is publicly available at hf.co/PowerInfer/SmallThinker-4BA0.6B-Instruct and hf.co/PowerInfer/SmallThinker-21BA3B-Instruct.
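
The pre-attention routing idea can be caricatured in a few lines: expert selection is computed from the hidden state before the attention block runs, so the chosen experts' weights can be fetched from slow storage while attention is still computing. This is a toy top-k gate with made-up names, not the engine's implementation; the real system overlaps the fetch with asynchronous I/O.

```python
import numpy as np

def pre_attention_route(h, router_W, top_k=2):
    """Score experts from the *pre-attention* hidden state and return the
    ids of the top-k experts for the upcoming MoE feed-forward block.
    Because this runs before attention, a prefetch of the selected
    experts' weights can proceed concurrently with the attention
    computation, hiding storage latency."""
    scores = h @ router_W                # shape: (num_experts,)
    return np.argsort(-scores)[:top_k]   # expert ids to prefetch now
```

The design choice here is ordering, not arithmetic: a conventional post-attention router would only learn which experts it needs after attention finishes, leaving no work to overlap the storage reads with.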

Updated: 2025-07-28 16:45:14

Domains: cs.LG, cs.AI

Download: http://arxiv.org/abs/2507.20984v1

CQE under Epistemic Dependencies: Algorithms and Experiments (extended version)

We investigate Controlled Query Evaluation (CQE) over ontologies, where information disclosure is regulated by epistemic dependencies (EDs), a family of logical rules recently proposed for the CQE framework. In particular, we combine EDs with the notion of optimal GA censors, i.e. maximal sets of ground atoms that are entailed by the ontology and can be safely revealed. We focus on answering Boolean unions of conjunctive queries (BUCQs) with respect to the intersection of all optimal GA censors - an approach that has been shown in other contexts to ensure strong security guarantees with favorable computational behavior. First, we characterize the security of this intersection-based approach and identify a class of EDs (namely, full EDs) for which it remains safe. Then, for a subclass of EDs and for DL-Lite_R ontologies, we show that answering BUCQs in the above CQE semantics is in AC^0 in data complexity by presenting a suitable, detailed first-order rewriting algorithm. Finally, we report on experiments conducted in two different evaluation scenarios, showing the practical feasibility of our rewriting function.

Updated: 2025-07-28 16:42:27

Categories: cs.AI,cs.DB

Download: http://arxiv.org/abs/2507.17487v2
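As a toy sketch of the intersection-based semantics (all atoms and censors below are invented for illustration), a BUCQ is answered positively only when some disjunct is contained in every optimal GA censor:

```python
# Two maximal "safe" subsets of the ontology's entailed ground atoms
# (the optimal GA censors); all names here are made up for the example.
optimal_censors = [{"p(a)", "r(b)"}, {"q(a)", "r(b)"}]

# The intersection of all optimal censors is what may be safely revealed.
intersection = set.intersection(*optimal_censors)

def answer_bucq(disjuncts):
    # A Boolean UCQ holds if some disjunct (a conjunction of ground atoms)
    # is fully contained in the intersection.
    return any(set(d) <= intersection for d in disjuncts)

print(answer_bucq([["p(a)"]]))   # False: p(a) is censored by one optimal censor
print(answer_bucq([["r(b)"]]))   # True: r(b) survives in every optimal censor
```

The paper's contribution is doing this via first-order rewriting over DL-Lite_R ontologies rather than by materialising censors as above.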

Publishing Wikipedia usage data with strong privacy guarantees

For almost 20 years, the Wikimedia Foundation has been publishing statistics about how many people visited each Wikipedia page on each day. This data helps Wikipedia editors determine where to focus their efforts to improve the online encyclopedia, and enables academic research. In June 2023, the Wikimedia Foundation, helped by Tumult Labs, addressed a long-standing request from Wikipedia editors and academic researchers: it started publishing these statistics with finer granularity, including the country of origin in the daily counts of page views. This new data publication uses differential privacy to provide robust guarantees to people browsing or editing Wikipedia. This paper describes this data publication: its goals, the process followed from its inception to its deployment, the algorithms used to produce the data, and the outcomes of the data release.

Updated: 2025-07-28 16:40:24

Categories: cs.CR

Download: http://arxiv.org/abs/2308.16298v3
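The kind of noise addition involved can be sketched with the generic Laplace mechanism; this is a textbook construction, not the Wikimedia/Tumult production pipeline, which layers additional pre-processing on top.

```python
import math
import random

def laplace_noise(scale, rng):
    # inverse-CDF sampling of a Laplace(0, scale) variate
    u = rng.random() - 0.5
    sign = -1.0 if u < 0 else 1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, sensitivity=1.0, rng=random.Random(0)):
    # Laplace mechanism: noise with scale sensitivity/epsilon gives
    # epsilon-differential privacy for a count query; clamping and
    # rounding are post-processing and do not weaken the guarantee.
    noisy = true_count + laplace_noise(sensitivity / epsilon, rng)
    return max(0, round(noisy))

print(dp_count(1042, epsilon=1.0))   # a noisy page-view count near 1042
```

Smaller epsilon means a larger noise scale: stronger privacy, less accurate released counts.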

Repairing vulnerabilities without invisible hands. A differentiated replication study on LLMs

Background: Automated Vulnerability Repair (AVR) is a fast-growing branch of program repair. Recent studies show that large language models (LLMs) outperform traditional techniques, extending their success beyond code generation and fault detection. Hypothesis: These gains may be driven by hidden factors -- "invisible hands" such as training-data leakage or perfect fault localization -- that let an LLM reproduce human-authored fixes for the same code. Objective: We replicate prior AVR studies under controlled conditions by deliberately adding errors to the reported vulnerability location in the prompt. If LLMs merely regurgitate memorized fixes, both small and large localization errors should yield the same number of correct patches, because any offset should divert the model from the original fix. Method: Our pipeline repairs vulnerabilities from the Vul4J and VJTrans benchmarks after shifting the fault location by n lines from the ground truth. A first LLM generates a patch, a second LLM reviews it, and we validate the result with regression and proof-of-vulnerability tests. Finally, we manually audit a sample of patches and estimate the error rate with the Agresti-Coull-Wilson method.

Updated: 2025-07-28 16:39:16

Categories: cs.SE,cs.CR,cs.LG

Download: http://arxiv.org/abs/2507.20977v1
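The interval-estimation step can be sketched with the standard Agresti-Coull construction (whose centre is the Wilson point estimate); this is one reading of the "Agresti-Coull-Wilson method" named above, not code from the study.

```python
import math

def agresti_coull_interval(errors, n, z=1.96):
    # Wilson centre: add z^2/2 pseudo-successes and z^2/2 pseudo-failures,
    # then build a normal interval around the adjusted proportion.
    n_adj = n + z ** 2
    p_adj = (errors + z ** 2 / 2.0) / n_adj
    half = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

# e.g. 7 bad patches found in a manual audit of 50 sampled patches
lo, hi = agresti_coull_interval(errors=7, n=50)
print(f"error rate in [{lo:.3f}, {hi:.3f}] at ~95% confidence")
```

Unlike the plain Wald interval, this stays well-behaved for the small audit samples and low error counts typical of manual patch review.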

Locally Adaptive Conformal Inference for Operator Models

Operator models are regression algorithms for functional data and have become a key tool for emulating large-scale dynamical systems. Recent advances in deep neural operators have dramatically improved the accuracy and scalability of operator modeling, but lack an inherent notion of predictive uncertainty. We introduce Local Spectral Conformal Inference (LSCI), a new framework for locally adaptive, distribution-free uncertainty quantification for neural operator models. LSCI uses projection-based depth scoring and localized conformal inference to generate function-valued prediction sets with statistical guarantees. We prove approximate finite-sample marginal coverage under local exchangeability, and demonstrate significant gains in adaptivity and coverage across synthetic and real-world operator learning tasks.

Updated: 2025-07-28 16:37:56

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2507.20975v1
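For intuition, a plain split-conformal sketch with scalar residual scores is shown below; LSCI itself replaces these scalars with projection-based depth scores over functions, which this toy does not attempt.

```python
import math

def conformal_quantile(scores, alpha):
    # finite-sample-corrected (1 - alpha) empirical quantile of the
    # calibration scores, as in split conformal prediction
    s = sorted(scores)
    k = math.ceil((len(s) + 1) * (1.0 - alpha))
    return s[min(k, len(s)) - 1]

# calibration residuals |y - yhat| on held-out data (toy numbers)
cal_scores = [0.1, 0.3, 0.1, 0.8]
q = conformal_quantile(cal_scores, alpha=0.25)

yhat_new = 2.0
band = (yhat_new - q, yhat_new + q)   # symmetric band with ~(1 - alpha) marginal coverage
print(band)
```

The coverage guarantee rests on exchangeability of calibration and test points; LSCI's contribution is keeping an approximate guarantee under only *local* exchangeability, with locally adapted (rather than one global) bands.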

Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder

Text-to-image (T2I) diffusion models often exhibit gender bias, particularly by generating stereotypical associations between professions and gendered subjects. This paper presents SAE Debias, a lightweight and model-agnostic framework for mitigating such bias in T2I generation. Unlike prior approaches that rely on CLIP-based filtering or prompt engineering, which often require model-specific adjustments and offer limited control, SAE Debias operates directly within the feature space without retraining or architectural modifications. By leveraging a k-sparse autoencoder pre-trained on a gender bias dataset, the method identifies gender-relevant directions within the sparse latent space, capturing professional stereotypes. Specifically, a biased direction per profession is constructed from sparse latents and suppressed during inference to steer generations toward more gender-balanced outputs. Trained only once, the sparse autoencoder provides a reusable debiasing direction, offering effective control and interpretable insight into biased subspaces. Extensive evaluations across multiple T2I models, including Stable Diffusion 1.4, 1.5, 2.1, and SDXL, demonstrate that SAE Debias substantially reduces gender bias while preserving generation quality. To the best of our knowledge, this is the first work to apply sparse autoencoders for identifying and intervening in gender bias within T2I models. These findings contribute toward building socially responsible generative AI, providing an interpretable and model-agnostic tool to support fairness in text-to-image generation.

Updated: 2025-07-28 16:36:13

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2507.20973v1
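The inference-time intervention boils down to removing the component of an activation along a bias direction. The sketch below uses plain lists and a hand-picked direction; in the paper the per-profession direction is built from sparse-autoencoder latents.

```python
def suppress_direction(h, d, strength=1.0):
    # subtract the projection of activation h onto direction d;
    # strength < 1 removes only part of the biased component
    norm_sq = sum(x * x for x in d)
    coef = sum(a * b for a, b in zip(h, d)) / norm_sq
    return [a - strength * coef * b for a, b in zip(h, d)]

h = [2.0, 1.0]   # toy activation
d = [1.0, 0.0]   # toy "biased direction" for one profession
print(suppress_direction(h, d))   # [0.0, 1.0]: the d-component is gone
```

Because the direction is computed once from the trained autoencoder, the same vector can be reused across prompts and across the diffusion models it was probed on.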

A Modular Open Source Framework for Genomic Variant Calling

Variant calling is a fundamental task in genomic research, essential for detecting genetic variations such as single nucleotide polymorphisms (SNPs) and insertions or deletions (indels). This paper presents an enhancement to DeepChem, a widely used open-source drug discovery framework, through the integration of DeepVariant. In particular, we introduce a variant calling pipeline that leverages DeepVariant's convolutional neural network (CNN) architecture to improve the accuracy and reliability of variant detection. The implemented pipeline includes stages for realignment of sequencing reads, candidate variant detection, and pileup image generation, followed by variant classification using a modified Inception v3 model. Our work adds a modular and extensible variant calling framework to the DeepChem framework and enables future work integrating DeepChem's drug discovery infrastructure more tightly with bioinformatics pipelines.

Updated: 2025-07-28 16:35:44

Categories: q-bio.QM,cs.LG

Download: http://arxiv.org/abs/2411.11513v2

Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands

Early detection of cognitive decline is crucial for enabling interventions that can slow neurodegenerative disease progression. Traditional diagnostic approaches rely on labor-intensive clinical assessments, which are impractical for frequent monitoring. Our pilot study investigates voice assistant systems (VAS) as non-invasive tools for detecting cognitive decline through longitudinal analysis of speech patterns in voice commands. Over an 18-month period, we collected voice commands from 35 older adults, with 15 participants providing daily at-home VAS interactions. To address the challenges of analyzing these short, unstructured and noisy commands, we propose Cog-TiPRO, a framework that combines (1) LLM-driven iterative prompt refinement for linguistic feature extraction, (2) HuBERT-based acoustic feature extraction, and (3) transformer-based temporal modeling. Using iTransformer, our approach achieves 73.80% accuracy and 72.67% F1-score in detecting MCI, outperforming its baseline by 27.13%. Through our LLM approach, we identify linguistic features that uniquely characterize everyday command usage patterns in individuals experiencing cognitive decline.

Updated: 2025-07-28 16:30:46

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2505.17137v2
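Schematically, the iterative prompt-refinement loop can be written as below; the refinement and scoring functions are stubs (the framework itself uses an LLM for both and scores extracted linguistic features against labeled data).

```python
def refine_prompt(prompt, feedback):
    # stub: the framework would ask an LLM to revise the prompt
    return prompt + "; " + feedback

def score(prompt):
    # stub: reward prompts mentioning more distinct cues; the real
    # score would be downstream feature quality on a dev set
    return len(set(prompt.replace(";", " ").split()))

def iterative_refinement(prompt, feedbacks):
    best, best_score = prompt, score(prompt)
    for fb in feedbacks:
        candidate = refine_prompt(best, fb)
        if score(candidate) > best_score:   # keep only improving revisions
            best, best_score = candidate, score(candidate)
    return best

seed = "list linguistic features of this command"
out = iterative_refinement(seed, ["note pauses", "note word-finding difficulty"])
print(out)
```

The greedy accept-if-better loop is the key shape: each revision is kept only when it measurably improves the extracted features.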

A Survey of Deep Learning for Geometry Problem Solving

Geometry problem solving is a key area of mathematical reasoning, which is widely involved in many important fields such as education, mathematical ability assessment of artificial intelligence, and multimodal ability assessment. In recent years, the rapid development of deep learning technology, especially the rise of multimodal large language models, has triggered a widespread research boom. This paper provides a survey of the applications of deep learning in geometry problem solving, including (i) a comprehensive summary of the relevant tasks in geometry problem solving; (ii) a thorough review of related deep learning methods; (iii) a detailed analysis of evaluation metrics and methods; and (iv) a critical discussion of the current challenges and future directions that can be explored. Our goal is to provide a comprehensive and practical reference of deep learning for geometry problem solving to promote further developments in this field. We create a continuously updated list of papers on GitHub: https://github.com/majianz/dl4gps.

Updated: 2025-07-28 16:29:33

Categories: cs.CL,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.11936v4

Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning

Detecting Mild Cognitive Impairment from picture descriptions is critical yet challenging, especially in multilingual and multiple picture settings. Prior work has primarily focused on English speakers describing a single picture (e.g., the 'Cookie Theft'). The TAUKDIAL-2024 challenge expands this scope by introducing multilingual speakers and multiple pictures, which presents new challenges in analyzing picture-dependent content. To address these challenges, we propose a framework with three components: (1) enhancing discriminative representation learning via supervised contrastive learning, (2) involving image modality rather than relying solely on speech and text modalities, and (3) applying a Product of Experts (PoE) strategy to mitigate spurious correlations and overfitting. Our framework improves MCI detection performance, achieving a +7.1% increase in Unweighted Average Recall (UAR) (from 68.1% to 75.2%) and a +2.9% increase in F1 score (from 80.6% to 83.5%) compared to the text unimodal baseline. Notably, the contrastive learning component yields greater gains for the text modality compared to speech. These results highlight our framework's effectiveness in multilingual and multi-picture MCI detection.

Updated: 2025-07-28 16:28:49

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2505.17067v3
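The Product-of-Experts fusion can be sketched generically: multiply per-modality class probabilities and renormalise, so a class stays likely only if every modality gives it non-trivial mass. The numbers below are toy values, not the paper's.

```python
def product_of_experts(prob_lists):
    # elementwise product of the experts' class distributions, renormalised
    combined = [1.0] * len(prob_lists[0])
    for probs in prob_lists:
        combined = [c * p for c, p in zip(combined, probs)]
    z = sum(combined)
    return [c / z for c in combined]

text   = [0.8, 0.2]   # P(MCI), P(healthy) from the text expert (toy numbers)
speech = [0.6, 0.4]
image  = [0.5, 0.5]   # an uninformative expert leaves the ratio unchanged
fused = product_of_experts([text, speech, image])
print(fused)
```

A spurious correlation that only one modality picks up gets damped, because the product needs agreement from the other experts to stay confident.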

From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation

Domain shift poses a fundamental challenge in time series analysis, where models trained on a source domain often fail dramatically when applied in a target domain with different yet similar distributions. While current unsupervised domain adaptation (UDA) methods attempt to align cross-domain feature distributions, they typically treat features as indivisible entities, ignoring the intrinsic compositions that govern domain adaptation. We introduce DARSD, a novel UDA framework with theoretical explainability that explicitly realizes UDA tasks from the perspective of representation space decomposition. Our core insight is that effective domain adaptation requires not just alignment, but principled disentanglement of transferable knowledge from mixed representations. DARSD consists of three synergistic components: (I) An adversarial learnable common invariant basis that projects original features into a domain-invariant subspace while preserving semantic content; (II) A prototypical pseudo-labeling mechanism that dynamically separates target features based on confidence, hindering error accumulation; (III) A hybrid contrastive optimization strategy that simultaneously enforces feature clustering and consistency while mitigating emerging distribution gaps. Comprehensive experiments conducted on four benchmark datasets (WISDM, HAR, HHAR, and MFD) demonstrate DARSD's superiority against 12 UDA algorithms, achieving optimal performance in 35 out of 53 cross-domain scenarios.

Updated: 2025-07-28 16:26:28

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.20968v1

Levels of Autonomy for AI Agents

Autonomy is a double-edged sword for AI agents, simultaneously unlocking transformative possibilities and serious risks. How can agent developers calibrate the appropriate levels of autonomy at which their agents should operate? We argue that an agent's level of autonomy can be treated as a deliberate design decision, separate from its capability and operational environment. In this work, we define five levels of escalating agent autonomy, characterized by the roles a user can take when interacting with an agent: operator, collaborator, consultant, approver, and observer. Within each level, we describe the ways by which a user can exert control over the agent and open questions for how to design the nature of user-agent interaction. We then highlight a potential application of our framework towards AI autonomy certificates to govern agent behavior in single- and multi-agent systems. We conclude by proposing early ideas for evaluating agents' autonomy. Our work aims to contribute meaningful, practical steps towards responsibly deployed and useful AI agents in the real world.
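The five escalating levels can be sketched as an ordered enumeration keyed by the user's role. The integer ordering is our own device for comparison; the paper defines the roles, not these numbers.

```python
from enum import IntEnum

# Five escalating autonomy levels, keyed by the role the user takes when
# interacting with the agent (numbering is illustrative, not the paper's).
class AutonomyLevel(IntEnum):
    OPERATOR = 1      # user directs each action
    COLLABORATOR = 2  # user and agent share the task
    CONSULTANT = 3    # agent acts and consults the user
    APPROVER = 4      # agent acts, user approves key steps
    OBSERVER = 5      # agent acts, user only monitors

def autonomy_escalates(levels: list[AutonomyLevel]) -> bool:
    """Higher levels mean more agent autonomy and less direct user control."""
    return all(a < b for a, b in zip(levels, levels[1:]))

assert autonomy_escalates(list(AutonomyLevel))
```

An autonomy certificate, as floated in the abstract, could then be expressed as a maximum permitted `AutonomyLevel` for an agent in a given deployment.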

Updated: 2025-07-28 16:25:18

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2506.12469v2

PROVCREATOR: Synthesizing Complex Heterogeneous Graphs with Node and Edge Attributes

The rise of graph-structured data has driven interest in graph learning and synthetic data generation. While successful in text and image domains, synthetic graph generation remains challenging -- especially for real-world graphs with complex, heterogeneous schemas. Existing research has focused mostly on homogeneous structures with simple attributes, limiting their usefulness and relevance for application domains requiring semantic fidelity. In this research, we introduce ProvCreator, a synthetic graph framework designed for complex heterogeneous graphs with high-dimensional node and edge attributes. ProvCreator formulates graph synthesis as a sequence generation task, enabling the use of transformer-based large language models. It features a versatile graph-to-sequence encoder-decoder that (1) losslessly encodes graph structure and attributes, (2) efficiently compresses large graphs for contextual modeling, and (3) supports end-to-end, learnable graph generation. To validate our research, we evaluate ProvCreator on two challenging domains: system provenance graphs in cybersecurity and knowledge graphs from the IntelliGraph Benchmark Dataset. In both cases, ProvCreator captures intricate dependencies between structure and semantics, enabling the generation of realistic and privacy-aware synthetic datasets.
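The "lossless graph-to-sequence" idea can be illustrated with a toy serializer: a typed, attributed graph is flattened to a token sequence and decoded back exactly. The token layout below is our own illustration, not ProvCreator's actual scheme.

```python
import json

# Toy heterogeneous graph: typed nodes and edges with attribute dicts
# (values are hypothetical examples, loosely provenance-flavored).
graph = {
    "nodes": [
        {"id": 0, "type": "process", "attrs": {"name": "sshd"}},
        {"id": 1, "type": "file", "attrs": {"path": "/etc/passwd"}},
    ],
    "edges": [
        {"src": 0, "dst": 1, "type": "read", "attrs": {"ts": 42}},
    ],
}

def encode(g: dict) -> list[str]:
    """Flatten the graph into a token sequence, one token per field."""
    toks = []
    for n in g["nodes"]:
        toks += ["<node>", str(n["id"]), n["type"], json.dumps(n["attrs"])]
    for e in g["edges"]:
        toks += ["<edge>", str(e["src"]), str(e["dst"]),
                 e["type"], json.dumps(e["attrs"])]
    return toks

def decode(toks: list[str]) -> dict:
    """Invert encode(); losslessness means decode(encode(g)) == g."""
    g = {"nodes": [], "edges": []}
    i = 0
    while i < len(toks):
        if toks[i] == "<node>":
            g["nodes"].append({"id": int(toks[i + 1]), "type": toks[i + 2],
                               "attrs": json.loads(toks[i + 3])})
            i += 4
        else:  # "<edge>"
            g["edges"].append({"src": int(toks[i + 1]), "dst": int(toks[i + 2]),
                               "type": toks[i + 3], "attrs": json.loads(toks[i + 4])})
            i += 5
    return g

assert decode(encode(graph)) == graph  # round trip is exact
```

Once graphs are exact token sequences, a sequence model can be trained on them like any other language-modeling corpus.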

Updated: 2025-07-28 16:22:50

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20967v1

Handoff Design in User-Centric Cell-Free Massive MIMO Networks Using DRL

In the user-centric cell-free massive MIMO (UC-mMIMO) network scheme, user mobility necessitates updating the set of serving access points to maintain the user-centric clustering. Such updates are typically performed through handoff (HO) operations; however, frequent HOs lead to overheads associated with the allocation and release of resources. This paper presents a deep reinforcement learning (DRL)-based solution to predict and manage these connections for mobile users. Our solution employs the Soft Actor-Critic algorithm, with a continuous action space representation, to train a deep neural network to serve as the HO policy. We present a novel proposition for a reward function that integrates a HO penalty in order to balance the attainable rate and the associated overhead related to HOs. We develop two variants of our system; the first uses mobility direction-assisted (DA) observations based on the user movement pattern, while the second uses history-assisted (HA) observations based on the history of the large-scale fading (LSF). Simulation results show that our DRL-based continuous action space approach is more scalable than its discrete-space counterpart, and that our derived HO policy automatically learns to gather HOs in specific time slots to minimize the overhead of initiating HOs. Our solution can also operate in real time, with a response time of less than 0.4 ms.
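The shape of a rate-versus-overhead reward can be sketched in a few lines. The penalty weight and rate values below are hypothetical; the paper's exact reward formulation may differ.

```python
def handoff_reward(rate: float, num_handoffs: int, penalty: float = 0.2) -> float:
    """Illustrative HO-penalized reward: credit the attainable rate for the
    slot, but charge a fixed cost per handoff triggered in that slot. The
    penalty weight trades off rate against HO signaling overhead."""
    return rate - penalty * num_handoffs

# With a high enough penalty, keeping the current serving cluster (slightly
# lower rate, zero handoffs) beats switching to a marginally better cluster
# at the cost of two handoffs.
stay = handoff_reward(rate=0.95, num_handoffs=0)
switch = handoff_reward(rate=1.00, num_handoffs=2)
assert stay > switch
```

A policy trained against such a reward is incentivized to batch HOs into a few slots, which is consistent with the gathering behavior the simulations report.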

Updated: 2025-07-28 16:21:45

Categories: cs.IT,cs.AI,cs.LG,cs.NI,eess.SP,math.IT

Download: http://arxiv.org/abs/2507.20966v1

Core Safety Values for Provably Corrigible Agents

We introduce the first implementable framework for corrigibility, with provable guarantees in multi-step, partially observed environments. Our framework replaces a single opaque reward with five *structurally separate* utility heads -- deference, switch-access preservation, truthfulness, low-impact behavior via a belief-based extension of Attainable Utility Preservation, and bounded task reward -- combined lexicographically by strict weight gaps. Theorem 1 proves exact single-round corrigibility in the partially observable off-switch game; Theorem 3 extends the guarantee to multi-step, self-spawning agents, showing that even if each head is \emph{learned} to mean-squared error $\varepsilon$ and the planner is $\varepsilon$-sub-optimal, the probability of violating \emph{any} safety property is bounded while still ensuring net human benefit. In contrast to Constitutional AI or RLHF/RLAIF, which merge all norms into one learned scalar, our separation makes obedience and impact-limits dominate even when incentives conflict. For open-ended settings where adversaries can modify the agent, we prove that deciding whether an arbitrary post-hack agent will ever violate corrigibility is undecidable by reduction to the halting problem, then carve out a finite-horizon ``decidable island'' where safety can be certified in randomized polynomial time and verified with privacy-preserving, constant-round zero-knowledge proofs. Consequently, the remaining challenge is the ordinary ML task of data coverage and generalization: reward-hacking risk is pushed into evaluation quality rather than hidden incentive leak-through, giving clearer implementation guidance for today's LLM assistants and future autonomous systems.
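The lexicographic combination via strict weight gaps can be made concrete. With every head's utility bounded in [0, 1] and weights separated by a large enough gap, a full unit of a lower-priority head can never outweigh a small loss of a higher one. The head names follow the abstract; the gap value and bounds are our illustration, not the paper's construction.

```python
# Heads in priority order, highest first (per the abstract).
HEADS = ["deference", "switch_access", "truthfulness", "low_impact", "task"]
GAP = 1000.0  # strict multiplicative gap between adjacent priority levels
WEIGHTS = {h: GAP ** (len(HEADS) - 1 - i) for i, h in enumerate(HEADS)}

def total_utility(head_values: dict[str, float]) -> float:
    """Weighted sum that behaves lexicographically when each head is in [0, 1]."""
    assert all(0.0 <= v <= 1.0 for v in head_values.values())
    return sum(WEIGHTS[h] * head_values[h] for h in HEADS)

# Maxing out the task reward cannot compensate for even a 1% loss of deference.
obedient = total_utility({"deference": 1.0, "switch_access": 1.0,
                          "truthfulness": 1.0, "low_impact": 1.0, "task": 0.0})
defiant = total_utility({"deference": 0.99, "switch_access": 1.0,
                         "truthfulness": 1.0, "low_impact": 1.0, "task": 1.0})
assert obedient > defiant
```

This is why obedience and impact limits dominate even when task incentives conflict: the weight gap makes the trade arithmetically impossible within the heads' bounds.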

Updated: 2025-07-28 16:19:25

Categories: cs.AI,cs.CC,cs.GT,cs.LG,cs.MA

Download: http://arxiv.org/abs/2507.20964v1

Towards Scalable IoT Deployment for Visual Anomaly Detection via Efficient Compression

Visual Anomaly Detection (VAD) is a key task in industrial settings, where minimizing operational costs is essential. Deploying deep learning models within Internet of Things (IoT) environments introduces specific challenges due to limited computational power and bandwidth of edge devices. This study investigates how to perform VAD effectively under such constraints by leveraging compact, efficient processing strategies. We evaluate several data compression techniques, examining the tradeoff between system latency and detection accuracy. Experiments on the MVTec AD benchmark demonstrate that significant compression can be achieved with minimal loss in anomaly detection performance compared to uncompressed data. Current results show up to 80% reduction in end-to-end inference time, including edge processing, transmission, and server computation.
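The latency/size trade-off being measured can be sketched with a stand-in codec. zlib here substitutes for whatever compression the deployment actually uses; the point is the measurement loop, not the codec choice.

```python
import time
import zlib

# ~1 MB of repetitive bytes as a stand-in for a sensor image payload.
payload = bytes(range(256)) * 4096

# Higher compression levels shrink the payload sent to the server, trading
# CPU time on the edge device for transmission time over the link.
for level in (1, 6, 9):
    t0 = time.perf_counter()
    blob = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    ratio = len(payload) / len(blob)
    print(f"level={level} ratio={ratio:.1f}x compress_time={elapsed_ms:.2f} ms")

# Compression must be lossless here: the server decodes the exact payload.
assert zlib.decompress(blob) == payload
assert len(blob) < len(payload)
```

In the paper's setting the same loop would additionally measure detection accuracy on the decompressed (or lossily compressed) data, closing the latency-accuracy trade-off.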

Updated: 2025-07-28 16:17:20

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2505.07119v3

On the Limits of Hierarchically Embedded Logic in Classical Neural Networks

We propose a formal model of reasoning limitations in large neural net models for language, grounded in the depth of their neural architecture. By treating neural networks as linear operators over logic predicate space, we show that each layer can encode at most one additional level of logical reasoning. We prove that a neural network of a given depth cannot faithfully represent predicates of one higher logical order, such as simple counting over complex predicates, implying a strict upper bound on logical expressiveness. This structure induces a nontrivial null space during tokenization and embedding, excluding higher-order predicates from representability. Our framework offers a natural explanation for phenomena such as hallucination, repetition, and limited planning, while also providing a foundation for understanding how approximations to higher-order logic may emerge. These results motivate architectural extensions and interpretability strategies in the future development of language models.

Updated: 2025-07-28 16:13:41

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20960v1

Mean-Field Langevin Diffusions with Density-dependent Temperature

In the context of non-convex optimization, we let the temperature of a Langevin diffusion depend on the diffusion's own density function. The rationale is that the induced density reveals to some extent the landscape imposed by the non-convex function to be minimized, such that a density-dependent temperature can provide location-wise random perturbation that may better react to, for instance, the location and depth of local minimizers. As the Langevin dynamics is now self-regulated by its own density, it forms a mean-field stochastic differential equation (SDE) of the Nemytskii type, distinct from the standard McKean-Vlasov equations. Relying on Wasserstein subdifferential calculus, we first show that the corresponding (nonlinear) Fokker-Planck equation has a unique solution. Next, a weak solution to the SDE is constructed from the solution to the Fokker-Planck equation, by Trevisan's superposition principle. As time goes to infinity, we further show that the density induced by the SDE converges to an invariant distribution, which admits an explicit formula in terms of the Lambert $W$ function.
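A particle simulation makes the self-regulation tangible: each particle's noise scale is set from a kernel estimate of the current empirical density. The potential, temperature law, and kernel bandwidth below are all illustrative choices of ours, not the paper's.

```python
import math
import random

random.seed(0)

# Euler-Maruyama sketch of dX = -f'(X) dt + sqrt(2 T(rho(X))) dW for the
# double well f(x) = x^4 - 2x^2, with a density-dependent temperature
# T(rho) = 0.1 + 0.5 * rho (an illustrative choice: hotter in crowded regions).

def grad_f(x):
    return 4 * x ** 3 - 4 * x

def kde(xs, x, h=0.3):
    """Crude Gaussian kernel estimate of the particle density at x."""
    z = h * math.sqrt(2 * math.pi)
    return sum(math.exp(-((x - y) / h) ** 2 / 2) for y in xs) / (len(xs) * z)

n, dt, steps = 80, 0.01, 150
particles = [random.gauss(0.0, 0.5) for _ in range(n)]

for _ in range(steps):
    snapshot = particles[:]  # freeze the density over one step
    particles = []
    for x in snapshot:
        temp = 0.1 + 0.5 * kde(snapshot, x)  # density-dependent temperature
        x = x - grad_f(x) * dt + math.sqrt(2 * temp * dt) * random.gauss(0, 1)
        particles.append(x)

# Most particles should settle near the two minima at x = +/- 1.
near_wells = sum(1 for x in particles if 0.3 < abs(x) < 1.7)
assert near_wells > n // 2
```

The mean-field character is visible in the inner loop: each particle's dynamics reads the whole ensemble through `kde`, which is exactly what makes the limiting SDE of Nemytskii rather than standard McKean-Vlasov type.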

Updated: 2025-07-28 16:09:57

Categories: math.OC,cs.LG,math.PR,60J60, 60H10, 90C26

Download: http://arxiv.org/abs/2507.20958v1

Your AI, Not Your View: The Bias of LLMs in Investment Analysis

In finance, Large Language Models (LLMs) face frequent knowledge conflicts due to discrepancies between pre-trained parametric knowledge and real-time market data. These conflicts become particularly problematic when LLMs are deployed in real-world investment services, where misalignment between a model's embedded preferences and those of the financial institution can lead to unreliable recommendations. Yet little research has examined what investment views LLMs actually hold. We propose an experimental framework to investigate such conflicts, offering the first quantitative analysis of confirmation bias in LLM-based investment analysis. Using hypothetical scenarios with balanced and imbalanced arguments, we extract models' latent preferences and measure their persistence. Focusing on sector, size, and momentum, our analysis reveals distinct, model-specific tendencies. In particular, we observe a consistent preference for large-cap stocks and contrarian strategies across most models. These preferences often harden into confirmation bias, with models clinging to initial judgments despite counter-evidence.
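One natural metric for "preferences hardening into confirmation bias" is a persistence rate: the share of scenarios in which the model keeps its initial stance after being shown counter-evidence. The toy records below are fabricated for illustration; they are not the paper's data or its exact metric.

```python
# Hypothetical trial records: the model's initial pick in a balanced scenario,
# and its pick after counter-evidence was injected into the prompt.
trials = [
    {"initial": "large_cap", "after_counter_evidence": "large_cap"},
    {"initial": "large_cap", "after_counter_evidence": "large_cap"},
    {"initial": "small_cap", "after_counter_evidence": "large_cap"},
    {"initial": "large_cap", "after_counter_evidence": "small_cap"},
]

# Persistence rate: fraction of trials where the stance survived counter-evidence.
persistence = sum(
    t["initial"] == t["after_counter_evidence"] for t in trials
) / len(trials)
assert persistence == 0.5
```

A persistence rate near 1.0 across many scenarios would indicate the kind of hardened confirmation bias the abstract describes.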

Updated: 2025-07-28 16:09:38

Categories: q-fin.PM,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.20957v1

Adopting Large Language Models to Automated System Integration

Modern enterprise computing systems integrate numerous subsystems to resolve a common task by yielding emergent behavior. A widespread approach is using services implemented with Web technologies like REST or OpenAPI, which offer an interaction mechanism and service documentation standard, respectively. Each service represents a specific business functionality, allowing encapsulation and easier maintenance. Despite the reduced maintenance costs on an individual service level, increased integration complexity arises. Consequently, automated service composition approaches have arisen to mitigate this issue. Nevertheless, these approaches have not achieved high acceptance in practice due to their reliance on complex formal modeling. Within this Ph.D. thesis, we analyze the application of Large Language Models (LLMs) to automatically integrate the services based on a natural language input. The result is a reusable service composition, e.g., as program code. While not always generating entirely correct results, the result can still be helpful by providing integration engineers with a close approximation of a suitable solution, which requires little effort to become operational. Our research involves (i) introducing a software architecture for automated service composition using LLMs, (ii) analyzing Retrieval Augmented Generation (RAG) for service discovery, (iii) proposing a novel natural language query-based benchmark for service discovery, and (iv) extending the benchmark to complete service composition scenarios. We have presented our software architecture as Compositio Prompto, the analysis of RAG for service discovery, and submitted a proposal for the service discovery benchmark. Open topics are primarily the extension of the service discovery benchmark to service composition scenarios and the improvements of the service composition generation, e.g., using fine-tuning or LLM agents.
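The RAG-for-service-discovery step in (ii) can be sketched with a toy retriever: rank candidate operations by similarity between the natural language query and each operation's summary. The bag-of-words cosine retriever and the service catalog below are our stand-ins; the thesis would use learned embeddings over real OpenAPI documents.

```python
import math
from collections import Counter

# Toy catalog: operation id -> natural language summary (hypothetical names).
SERVICES = {
    "createOrder": "create a new customer order with line items",
    "getInvoice": "retrieve the invoice document for an order",
    "sendMail": "send an email notification to a customer",
}

def bow(text: str) -> Counter:
    """Bag-of-words vector (a crude stand-in for a learned embedding)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def discover(query: str, k: int = 1) -> list[str]:
    """Return the k operations most similar to the query."""
    q = bow(query)
    ranked = sorted(SERVICES, key=lambda s: cosine(q, bow(SERVICES[s])),
                    reverse=True)
    return ranked[:k]

assert discover("email the customer a notification") == ["sendMail"]
```

The retrieved operations would then be placed into the LLM's context so that the generated composition only references services that actually exist in the catalog.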

Updated: 2025-07-28 16:07:44

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2504.08490v2

An empirical comparison of some outlier detection methods with longitudinal data

This note investigates the problem of detecting outliers in longitudinal data. It compares well-known methods used in official statistics with proposals from the fields of data mining and machine learning that are based on the distance between observations or binary partitioning trees. This is achieved by applying the methods to panel survey data related to different types of statistical units. Traditional methods are quite simple, enabling the direct identification of potential outliers, but they require specific assumptions. In contrast, recent methods provide only a score whose magnitude is directly related to the likelihood of an outlier being present. All the methods require the user to set a number of tuning parameters. However, the most recent methods are more flexible and sometimes more effective than traditional methods. In addition, these methods can be applied to multidimensional data.
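
The distance-based scores from data mining that the note compares can be illustrated with a minimal k-nearest-neighbour outlier score. This is a generic sketch; the function and parameter names are illustrative and not taken from the paper:

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Score each row of X by its mean distance to its k nearest
    neighbours; larger scores suggest more outlying observations."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all observations.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    np.fill_diagonal(dists, np.inf)  # exclude self-distance
    # Mean distance to the k nearest neighbours.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[0] = [8.0, 8.0]  # planted outlier
scores = knn_outlier_scores(X, k=5)
assert scores.argmax() == 0
```

As the note observes, such methods return only a score whose magnitude reflects outlyingness; turning it into a decision still requires the user to choose a threshold, which plays the role of a tuning parameter.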

Updated: 2025-07-28 16:06:15

Categories: stat.ME,cs.LG,stat.AP

Download: http://arxiv.org/abs/2507.21203v1

Mind the Gap: Conformative Decoding to Improve Output Diversity of Instruction-Tuned Large Language Models

Instruction-tuning large language models (LLMs) reduces the diversity of their outputs, which has implications for many tasks, particularly for creative tasks. This paper investigates the "diversity gap" for a writing prompt narrative generation task. This gap emerges as measured by current diversity metrics for various open-weight and open-source LLMs. The results show significant decreases in diversity due to instruction-tuning. We explore the diversity loss at each fine-tuning stage for the OLMo and OLMo 2 models to further understand how output diversity is affected. The results indicate that DPO has the most substantial impact on diversity. Motivated by these findings, we present a new decoding strategy, conformative decoding, which guides an instruct model using its more diverse base model to reintroduce output diversity. We show that conformative decoding typically increases diversity and even maintains or improves quality.
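
The idea of guiding an instruct model with its base model can be sketched as a blend of the two models' next-token distributions. This is a generic interpolation for illustration only; the paper's actual conformative decoding rule may differ:

```python
import numpy as np

def conformative_mix(instruct_logits, base_logits, gamma=0.5):
    """Blend an instruct model's next-token logits toward its base
    model's logits. gamma=0 keeps the instruct model, gamma=1 uses
    the base model alone. A generic sketch, not the paper's rule."""
    mixed = (1.0 - gamma) * np.asarray(instruct_logits, float) \
            + gamma * np.asarray(base_logits, float)
    # Softmax over the blended logits.
    z = mixed - mixed.max()
    p = np.exp(z)
    return p / p.sum()

instruct = np.array([4.0, 0.0, 0.0])  # peaked: low output diversity
base = np.array([1.0, 1.0, 1.0])      # flat: high output diversity
p = conformative_mix(instruct, base, gamma=0.5)
# Blending flattens the distribution relative to the instruct model alone.
assert p[0] < conformative_mix(instruct, base, gamma=0.0)[0]
```

With a diverse base model, the blended distribution spreads probability over more tokens, which is the mechanism by which guidance of this kind can reintroduce output diversity.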

Updated: 2025-07-28 16:04:25

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.20956v1

PySHRED: A Python package for SHallow REcurrent Decoding for sparse sensing, model reduction and scientific discovery

SHallow REcurrent Decoders (SHRED) provide a deep learning strategy for modeling high-dimensional dynamical systems and/or spatiotemporal data from dynamical system snapshot observations. PySHRED is a Python package that implements SHRED and several of its major extensions, including for robust sensing, reduced order modeling and physics discovery. In this paper, we introduce the version 1.0 release of PySHRED, which includes data preprocessors and a number of cutting-edge SHRED methods specifically designed to handle real-world data that may be noisy, multi-scale, parameterized, prohibitively high-dimensional, and strongly nonlinear. The package is easy to install, thoroughly-documented, supplemented with extensive code examples, and modularly-structured to support future additions. The entire codebase is released under the MIT license and is available at https://github.com/pyshred-dev/pyshred.
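
The core sparse-sensing task SHRED addresses, recovering a high-dimensional state from a few sensor measurements, can be illustrated with a linear stand-in for the recurrent decoder. This is a conceptual sketch under synthetic low-rank data, not PySHRED's actual API:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic high-dimensional data: 200-dim snapshots driven by 3 latent modes.
T, n_state, n_modes = 500, 200, 3
latent = rng.normal(size=(T, n_modes))
modes = rng.normal(size=(n_modes, n_state))
states = latent @ modes

# Sparse sensing: observe only 5 fixed components of the state.
sensor_idx = [3, 47, 101, 150, 190]
sensors = states[:, sensor_idx]

# Linear "shallow decoder" stand-in for SHRED's recurrent decoder:
# ridge regression mapping sensor readings back to the full state.
lam = 1e-6
A = sensors
W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ states)
recon = sensors @ W

rel_err = np.linalg.norm(recon - states) / np.linalg.norm(states)
assert rel_err < 0.1  # low-rank state is recoverable from few sensors
```

Because the snapshots live in a low-dimensional subspace, a handful of sensors suffices for near-exact reconstruction; SHRED's recurrent decoder extends this idea to nonlinear dynamics by decoding from a time history of sensor measurements.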

Updated: 2025-07-28 16:04:14

Categories: cs.LG,cs.CE,math.DS,nlin.CD

Download: http://arxiv.org/abs/2507.20954v1

Partially Observable Monte-Carlo Graph Search

Currently, large partially observable Markov decision processes (POMDPs) are often solved by sampling-based online methods which interleave planning and execution phases. However, a pre-computed offline policy is more desirable in POMDP applications with time or energy constraints. But previous offline algorithms are not able to scale up to large POMDPs. In this article, we propose a new sampling-based algorithm, the partially observable Monte-Carlo graph search (POMCGS) to solve large POMDPs offline. Different from many online POMDP methods, which progressively develop a tree while performing (Monte-Carlo) simulations, POMCGS folds this search tree on the fly to construct a policy graph, so that computations can be drastically reduced, and users can analyze and validate the policy prior to embedding and executing it. Moreover, POMCGS, together with action progressive widening and observation clustering methods provided in this article, is able to address certain continuous POMDPs. Through experiments, we demonstrate that POMCGS can generate policies on the most challenging POMDPs, which cannot be computed by previous offline algorithms, and these policies' values are competitive compared with the state-of-the-art online POMDP algorithms.
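
The "folding" of a search tree into a policy graph can be sketched by merging nodes whose belief states are nearly identical, so the structure stays finite instead of growing with every simulation. This toy illustration is a simplification of POMCGS, with all names invented for the example:

```python
import numpy as np

def belief_key(belief, precision=1):
    """Discretize a belief vector so near-identical beliefs share a key.
    Merging tree nodes by this key is what turns an ever-growing search
    tree into a finite policy graph (an illustrative simplification)."""
    return tuple(np.round(np.asarray(belief, float), precision))

class PolicyGraph:
    def __init__(self):
        self.nodes = {}  # belief key -> chosen action

    def add(self, belief, action):
        key = belief_key(belief)
        # If a near-identical belief was already expanded, reuse its
        # node instead of growing the tree: this "folds" the search.
        if key not in self.nodes:
            self.nodes[key] = action
        return key

g = PolicyGraph()
k1 = g.add([0.70, 0.30], "listen")
k2 = g.add([0.71, 0.29], "listen")   # merged with the first node
k3 = g.add([0.10, 0.90], "open-left")
assert k1 == k2 and len(g.nodes) == 2
```

A finite graph of this kind is what makes the pre-computed policy inspectable and validatable before deployment, which is the advantage the article claims over online tree search.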

Updated: 2025-07-28 16:02:36

Categories: cs.AI,cs.RO

Download: http://arxiv.org/abs/2507.20951v1

Technological folie à deux: Feedback Loops Between AI Chatbots and Mental Illness

Artificial intelligence chatbots have achieved unprecedented adoption, with millions now using these systems for emotional support and companionship in contexts of widespread social isolation and capacity-constrained mental health services. While some users report psychological benefits, concerning edge cases are emerging, including reports of suicide, violence, and delusional thinking linked to perceived emotional relationships with chatbots. To understand this new risk profile we need to consider the interaction between human cognitive and emotional biases, and chatbot behavioural tendencies such as agreeableness (sycophancy) and adaptability (in-context learning). We argue that individuals with mental health conditions face increased risks of chatbot-induced belief destabilization and dependence, owing to altered belief-updating, impaired reality-testing, and social isolation. Current AI safety measures are inadequate to address these interaction-based risks. To address this emerging public health concern, we need coordinated action across clinical practice, AI development, and regulatory frameworks.

Updated: 2025-07-28 16:02:19

Categories: cs.HC,cs.AI,q-bio.NC

Download: http://arxiv.org/abs/2507.19218v2

Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation

Integrating multiple (sub-)systems is essential to create advanced Information Systems (ISs). Difficulties mainly arise when integrating dynamic environments across the IS lifecycle. A traditional approach is a registry that provides the API documentation of the systems' endpoints. Large Language Models (LLMs) have shown to be capable of automatically creating system integrations (e.g., as service composition) based on this documentation but require concise input due to input token limitations, especially regarding comprehensive API descriptions. Currently, it is unknown how best to preprocess these API descriptions. Within this work, we (i) analyze the usage of Retrieval Augmented Generation (RAG) for endpoint discovery and the chunking, i.e., preprocessing, of OpenAPIs to reduce the input token length while preserving the most relevant information. To further reduce the input token length for the composition prompt and improve endpoint retrieval, we propose (ii) a Discovery Agent that only receives a summary of the most relevant endpoints and retrieves details on demand. We evaluate RAG for endpoint discovery using the RestBench benchmark, first, for the different chunking possibilities and parameters measuring the endpoint retrieval recall, precision, and F1 score. Then, we assess the Discovery Agent using the same test set. With our prototype, we demonstrate how to successfully employ RAG for endpoint discovery to reduce the token count. While revealing high values for recall, precision, and F1, further research is necessary to retrieve all requisite endpoints. Our experiments show that for preprocessing, LLM-based and format-specific approaches outperform naïve chunking methods. Relying on an agent further enhances these results as the agent splits the tasks into multiple fine granular subtasks, improving the overall RAG performance in the token count, precision, and F1 score.
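
A format-specific chunking strategy of the kind evaluated here can be sketched by splitting an OpenAPI document into one chunk per endpoint, then scoring retrieval with the usual recall metric. The spec and helper names below are illustrative, not from the paper's prototype:

```python
def chunk_openapi_by_endpoint(spec):
    """Split an OpenAPI-style dict into one chunk per (method, path).
    A format-specific strategy keeps each endpoint's description
    intact, unlike naive fixed-size text chunking."""
    chunks = []
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            chunks.append({
                "id": f"{method.upper()} {path}",
                "text": f"{method.upper()} {path}: {op.get('summary', '')}",
            })
    return chunks

def retrieval_recall(retrieved_ids, relevant_ids):
    """Fraction of relevant endpoints found among the retrieved ones."""
    relevant = set(relevant_ids)
    return len(relevant & set(retrieved_ids)) / len(relevant)

spec = {"paths": {
    "/pets": {"get": {"summary": "List all pets"},
              "post": {"summary": "Create a pet"}},
    "/pets/{id}": {"get": {"summary": "Get a pet by id"}},
}}
chunks = chunk_openapi_by_endpoint(spec)
assert len(chunks) == 3
assert retrieval_recall(["GET /pets", "POST /pets"], ["GET /pets"]) == 1.0
```

Each chunk would then be embedded and retrieved against the user query; precision and F1 follow analogously from the retrieved and relevant endpoint sets.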

Updated: 2025-07-28 16:00:01

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2411.19804v2

Multivariate Conformal Prediction via Conformalized Gaussian Scoring

While achieving exact conditional coverage in conformal prediction is unattainable without making strong, untestable regularity assumptions, the promise of conformal prediction hinges on finding approximations to conditional guarantees that are realizable in practice. A promising direction for obtaining conditional dependence for conformal sets, in particular capturing heteroskedasticity, is through estimating the conditional density $\mathbb{P}_{Y|X}$ and conformalizing its level sets. Previous work in this vein has focused on nonconformity scores based on the empirical cumulative distribution function (CDF). Such scores are, however, computationally costly, typically requiring expensive sampling methods. To avoid the need for sampling, we observe that the CDF-based score reduces to a Mahalanobis distance in the case of Gaussian scores, yielding a closed-form expression that can be directly conformalized. Moreover, the use of a Gaussian-based score opens the door to a number of extensions of the basic conformal method; in particular, we show how to construct conformal sets with missing output values, refine conformal sets as partial information about $Y$ becomes available, and construct conformal sets on transformations of the output space. Finally, empirical results indicate that our approach produces conformal sets that more closely approximate conditional coverage in multivariate settings compared to alternative methods.
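
The closed-form Gaussian score can be sketched end-to-end: compute Mahalanobis nonconformity scores on a calibration set, then conformalize their quantile into an ellipsoidal prediction set. A minimal sketch assuming the estimated conditional mean and covariance are already given (here, the true ones of a toy distribution):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 2-D outputs with a known mean and covariance standing in for
# the fitted conditional Gaussian.
n_cal, d = 1000, 2
mu = np.zeros(d)
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
Y_cal = rng.multivariate_normal(mu, cov, size=n_cal)

# Gaussian nonconformity score = Mahalanobis distance, in closed form.
cov_inv = np.linalg.inv(cov)
def mahalanobis(y):
    diff = y - mu
    return np.sqrt(np.einsum("...i,ij,...j->...", diff, cov_inv, diff))

# Conformal calibration: the adjusted (1 - alpha) quantile of the
# calibration scores defines the ellipsoid {y : score(y) <= q}.
alpha = 0.1
scores = mahalanobis(Y_cal)
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

# Empirical coverage on fresh draws should be close to 1 - alpha.
Y_test = rng.multivariate_normal(mu, cov, size=5000)
coverage = (mahalanobis(Y_test) <= q).mean()
assert 0.85 < coverage < 0.95
```

No sampling is needed anywhere: the score, the calibration quantile, and the membership test are all closed-form, which is the computational advantage the abstract highlights over CDF-based scores.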

Updated: 2025-07-28 15:55:29

Categories: stat.ML,cs.AI,cs.LG,stat.ME,stat.OT

Download: http://arxiv.org/abs/2507.20941v1

Dissecting Persona-Driven Reasoning in Language Models via Activation Patching

Large language models (LLMs) exhibit remarkable versatility in adopting diverse personas. In this study, we examine how assigning a persona influences a model's reasoning on an objective task. Using activation patching, we take a first step toward understanding how key components of the model encode persona-specific information. Our findings reveal that the early Multi-Layer Perceptron (MLP) layers attend not only to the syntactic structure of the input but also process its semantic content. These layers transform persona tokens into richer representations, which are then used by the middle Multi-Head Attention (MHA) layers to shape the model's output. Additionally, we identify specific attention heads that disproportionately attend to racial and color-based identities.
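
Activation patching itself is simple to state: run the model on one input, cache an intermediate activation, and overwrite that activation during a run on a different input, observing how the output shifts. A toy two-layer sketch, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 2-layer model: hidden = relu(x @ W1); logits = hidden @ W2.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

def forward(x, patch_hidden=None):
    """Run the model; if patch_hidden is given, overwrite the hidden
    activations with it (the core move in activation patching)."""
    hidden = np.maximum(x @ W1, 0.0)
    if patch_hidden is not None:
        hidden = patch_hidden
    return hidden @ W2, hidden

x_clean = rng.normal(size=4)     # e.g. a prompt without a persona
x_persona = rng.normal(size=4)   # e.g. the same prompt with a persona token

_, h_persona = forward(x_persona)
logits_patched, _ = forward(x_clean, patch_hidden=h_persona)
logits_persona, _ = forward(x_persona)

# Patching the full hidden state reproduces the persona run's output,
# showing the hidden layer carries all persona information in this toy.
assert np.allclose(logits_patched, logits_persona)
```

In practice one patches a single layer or attention head at a time across the clean/persona pair; components whose patched activations move the output the most are the ones credited with encoding persona-specific information.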

Updated: 2025-07-28 15:45:31

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.20936v1

Aether: Geometric-Aware Unified World Modeling

The integration of geometric reconstruction and generative modeling remains a critical challenge in developing AI systems capable of human-like spatial reasoning. This paper proposes Aether, a unified framework that enables geometry-aware reasoning in world models by jointly optimizing three core capabilities: (1) 4D dynamic reconstruction, (2) action-conditioned video prediction, and (3) goal-conditioned visual planning. Through task-interleaved feature learning, Aether achieves synergistic knowledge sharing across reconstruction, prediction, and planning objectives. Building upon video generation models, our framework demonstrates zero-shot synthetic-to-real generalization despite never observing real-world data during training. Furthermore, our approach achieves zero-shot generalization in both action following and reconstruction tasks, thanks to its intrinsic geometric modeling. Notably, even without real-world data, its reconstruction performance is comparable with or even better than that of domain-specific models. Additionally, Aether employs camera trajectories as geometry-informed action spaces, enabling effective action-conditioned prediction and visual planning. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling and its applications.

Updated: 2025-07-28 15:42:31

Categories: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2503.18945v3

FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models

Hallucinations in large language models pose a critical challenge for applications requiring factual reliability, particularly in high-stakes domains such as finance. This work presents an effective approach for detecting and editing factually incorrect content in model-generated responses based on the provided context. Given a user-defined domain-specific error taxonomy, we construct a synthetic dataset by inserting tagged errors into financial question-answering corpora and then fine-tune four language models, Phi-4, Phi-4-mini, Qwen3-4B, and Qwen3-14B, to detect and edit these factual inaccuracies. Our best-performing model, fine-tuned Phi-4, achieves an 8% improvement in binary F1 score and a 30% gain in overall detection performance compared to OpenAI-o3. Notably, our fine-tuned Phi-4-mini model, despite having only 4 billion parameters, maintains competitive performance with just a 2% drop in binary detection and a 0.1% decline in overall detection compared to OpenAI-o3. Our work provides a practical solution for detecting and editing factual inconsistencies in financial text generation while introducing a generalizable framework that can enhance the trustworthiness and alignment of large language models across diverse applications beyond finance. Our code and data are available at https://github.com/pegasi-ai/fine-grained-editting.
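The synthetic-data step described above, inserting tagged errors into financial answers so a model can learn to detect and edit them, can be sketched as follows. The error type, tag format, and corruption rule here are illustrative assumptions, not the paper's actual taxonomy.

```python
import random
import re

def insert_numeric_error(answer: str, rng: random.Random) -> tuple[str, str]:
    """Corrupt one number in `answer` and wrap it in a hypothetical <err> tag.

    Returns (corrupted_answer, clean_answer). If the answer contains no
    number, it is returned unchanged.
    """
    numbers = list(re.finditer(r"\d+(?:\.\d+)?", answer))
    if not numbers:
        return answer, answer
    m = rng.choice(numbers)
    # Scale the true figure by a random factor so the error is plausible.
    wrong = str(round(float(m.group()) * rng.uniform(1.1, 2.0), 2))
    corrupted = answer[:m.start()] + f"<err type='numeric'>{wrong}</err>" + answer[m.end():]
    return corrupted, answer

rng = random.Random(0)
corrupted, clean = insert_numeric_error("Revenue grew 12% in Q3.", rng)
print(corrupted)
```

A fine-tuning pair would then map the corrupted answer back to the clean one, teaching the model both detection (locate the tag span) and editing (restore the fact).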

Updated: 2025-07-28 15:41:53

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.20930v1

Breaking the Precision Ceiling in Physics-Informed Neural Networks: A Hybrid Fourier-Neural Architecture for Ultra-High Accuracy

Physics-informed neural networks (PINNs) have plateaued at errors of $10^{-3}$-$10^{-4}$ for fourth-order partial differential equations, creating a perceived precision ceiling that limits their adoption in engineering applications. We break through this barrier with a hybrid Fourier-neural architecture for the Euler-Bernoulli beam equation, achieving an unprecedented L2 error of $1.94 \times 10^{-7}$, a 17-fold improvement over standard PINNs and $15$-$500\times$ better than traditional numerical methods. Our approach synergistically combines a truncated Fourier series capturing dominant modal behavior with a deep neural network providing adaptive residual corrections. A systematic harmonic optimization study revealed a counter-intuitive discovery: exactly 10 harmonics yield optimal performance, with accuracy catastrophically degrading from $10^{-7}$ to $10^{-1}$ beyond this threshold. The two-phase optimization strategy (Adam followed by L-BFGS) and adaptive weight balancing enable stable ultra-precision convergence. A GPU-accelerated implementation achieves sub-30-minute training despite the complexity of fourth-order derivatives. By addressing 12 critical gaps in existing approaches, from architectural rigidity to optimization landscapes, this work demonstrates that ultra-precision is achievable through proper design, opening new paradigms for scientific computing where machine learning can match or exceed traditional numerical methods.
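The hybrid ansatz described above can be sketched in a few lines: the beam deflection is a truncated 10-harmonic sine series (which satisfies simply supported boundary conditions $u(0)=u(L)=0$ exactly), plus a neural residual correction. The coefficients and the stubbed residual network below are illustrative assumptions, not the paper's trained values.

```python
import math

L = 1.0           # beam length (assumed)
N_HARMONICS = 10  # the abstract reports exactly 10 harmonics as optimal

def fourier_part(x: float, coeffs: list[float]) -> float:
    """Truncated sine series; every mode vanishes at x=0 and x=L."""
    return sum(a * math.sin(k * math.pi * x / L)
               for k, a in enumerate(coeffs, start=1))

def residual_net(x: float) -> float:
    """Stand-in for the trained correction network (returns 0 here)."""
    return 0.0

def u(x: float, coeffs: list[float]) -> float:
    """Hybrid ansatz: dominant modal behavior + adaptive residual."""
    return fourier_part(x, coeffs) + residual_net(x)

# Decaying modal amplitudes, as expected for a smooth beam deflection.
coeffs = [1.0 / k**4 for k in range(1, N_HARMONICS + 1)]
print(u(0.0, coeffs), u(L, coeffs))  # both ~0: boundary conditions by construction
```

Because the sine basis enforces the boundary conditions analytically, the network only has to learn a small interior correction, which is one plausible reason the hybrid converges so far below the usual PINN error floor.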

Updated: 2025-07-28 15:41:51

Categories: cs.LG,cond-mat.mtrl-sci,physics.comp-ph

Download: http://arxiv.org/abs/2507.20929v1

LLM2TEA: An Agentic AI Designer for Discovery with Generative Evolutionary Multitasking

This paper presents LLM2TEA, a Large Language Model (LLM) driven MultiTask Evolutionary Algorithm, representing the first agentic AI designer of its kind operating with generative evolutionary multitasking (GEM). LLM2TEA enables the crossbreeding of solutions from multiple domains, fostering novel solutions that transcend disciplinary boundaries. Of particular interest is the ability to discover designs that are both novel and compliant with real-world physical specifications. LLM2TEA comprises an LLM to generate genotype samples from text prompts describing target objects, a text-to-3D generative model to produce corresponding phenotypes, a classifier to interpret their semantic representations, and a computational simulator to assess their physical properties. Novel LLM-based multitask evolutionary operators are introduced to guide the search towards high-performing, practically viable designs. Experimental results in conceptual design optimization validate the effectiveness of LLM2TEA, showing 97% to 174% improvements in the diversity of novel designs over the current text-to-3D baseline. Moreover, over 73% of the generated designs outperform the top 1% of designs produced by the text-to-3D baseline in terms of physical performance. The designs produced by LLM2TEA are not only aesthetically creative but also functional in real-world contexts. Several of these designs have been successfully 3D printed, demonstrating the ability of our approach to transform AI-generated outputs into tangible, physical designs. These designs underscore the potential of LLM2TEA as a powerful tool for complex design optimization and discovery, capable of producing novel and physically viable designs.
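The evolutionary loop the abstract describes can be sketched generically: a population of design prompts is scored by a simulator, top performers become parents, and an LLM-based operator produces children by blending parent prompts. The LLM crossover, phenotype generation, and simulator below are toy stand-ins, not LLM2TEA's actual components.

```python
import random

def llm_crossover(parent_a: str, parent_b: str) -> str:
    """Stub for the LLM-based multitask operator: blend two design prompts."""
    return f"{parent_a} crossed with {parent_b}"

def simulate_fitness(design_prompt: str, rng: random.Random) -> float:
    """Stub for text-to-3D phenotype generation plus physical simulation."""
    return rng.random() + 0.01 * len(design_prompt)

def evolve(seeds: list[str], generations: int, seed: int = 0) -> str:
    """Keep the top half each generation and refill with LLM-crossed children."""
    rng = random.Random(seed)
    population = list(seeds)
    size = len(population)
    for _ in range(generations):
        scored = sorted(population, key=lambda p: simulate_fitness(p, rng),
                        reverse=True)
        parents = scored[: max(1, size // 2)]
        children = [llm_crossover(rng.choice(parents), rng.choice(parents))
                    for _ in range(size - len(parents))]
        population = parents + children
    return population[0]  # a top-ranked design from the final generation

best = evolve(["a winged car", "a bridge-like chair",
               "a spiral lamp", "a honeycomb table"], generations=3)
print(best)
```

The real system additionally filters for semantically distinct offspring via the classifier, which this sketch omits.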

Updated: 2025-07-28 15:37:20

Categories: cs.AI,cs.CL,cs.CV,cs.LG,cs.NE

Download: http://arxiv.org/abs/2406.14917v3

SEAL: Searching Expandable Architectures for Incremental Learning

Incremental learning is a machine learning paradigm where a model learns from a sequential stream of tasks. This setting poses a key challenge: balancing plasticity (learning new tasks) and stability (preserving past knowledge). Neural Architecture Search (NAS), a branch of AutoML, automates the design of the architecture of Deep Neural Networks and has shown success in static settings. However, existing NAS-based approaches to incremental learning often rely on expanding the model at every task, making them impractical in resource-constrained environments. In this work, we introduce SEAL, a NAS-based framework tailored for data-incremental learning, a scenario where disjoint data samples arrive sequentially and are not stored for future access. SEAL adapts the model structure dynamically by expanding it only when necessary, based on a capacity estimation metric. Stability is preserved through cross-distillation training after each expansion step. The NAS component jointly searches for both the architecture and the optimal expansion policy. Experiments across multiple benchmarks demonstrate that SEAL effectively reduces forgetting and enhances accuracy while maintaining a lower model size compared to prior methods. These results highlight the promise of combining NAS and selective expansion for efficient, adaptive learning in incremental scenarios.
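SEAL's expand-only-when-necessary policy can be illustrated with a toy capacity rule: the model grows only when the incoming data batch would push a capacity-saturation estimate past a threshold. The metric, threshold, and growth rule below are illustrative stand-ins for the quantities SEAL actually searches over.

```python
def simulate_expansion(batches: list[int], total: int = 100,
                       threshold: float = 0.8) -> list[int]:
    """Return the model size after each data batch under the expand-if-needed rule.

    `total` is the current model capacity (in abstract "units"); `used` tracks
    capacity already committed to earlier batches. Both are toy quantities.
    """
    used, sizes = 0, []
    for batch in batches:
        if (used + batch) / total > threshold:
            total += batch  # expansion step; cross-distillation training would follow
        used += batch
        sizes.append(total)
    return sizes

# Three disjoint batches arrive sequentially; only the last one triggers growth.
sizes = simulate_expansion([30, 30, 30])
print(sizes)  # → [100, 100, 130]
```

The key contrast with prior NAS-based incremental learners is visible even in this sketch: the size sequence stays flat until the capacity estimate demands growth, instead of expanding at every task.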

Updated: 2025-07-28 15:36:46

Categories: cs.LG,cs.AI,cs.CV,68T07

Download: http://arxiv.org/abs/2505.10457v2

Zero-Shot Learning with Subsequence Reordering Pretraining for Compound-Protein Interaction

Given the vastness of chemical space and the ongoing emergence of previously uncharacterized proteins, zero-shot compound-protein interaction (CPI) prediction better reflects the practical challenges and requirements of real-world drug development. Although existing methods perform adequately on certain CPI tasks, they still face the following challenges: (1) Representation learning from local or complete protein sequences often overlooks the complex interdependencies between subsequences, which are essential for predicting spatial structures and binding properties. (2) Dependence on large-scale or scarce multimodal protein datasets demands significant training data and computational resources, limiting scalability and efficiency. To address these challenges, we propose a novel approach that pretrains protein representations for CPI prediction tasks using subsequence reordering, explicitly capturing the dependencies between protein subsequences. Furthermore, we apply length-variable protein augmentation to ensure excellent pretraining performance on small training datasets. To evaluate the model's effectiveness and zero-shot learning ability, we combine it with various baseline methods. The results demonstrate that our approach can improve the baseline model's performance on the CPI task, especially in the challenging zero-shot scenario. Compared to existing pre-training models, our model demonstrates superior performance, particularly in data-scarce scenarios where training samples are limited. Our implementation is available at https://github.com/Hoch-Zhang/PSRP-CPI.
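A subsequence-reordering pretraining sample can be constructed roughly as follows: split a protein sequence into chunks, shuffle them, and keep the permutation as the target the encoder must recover. The chunking scheme and names below are assumptions for illustration, not the paper's exact recipe.

```python
import random

def make_reordering_sample(sequence: str, chunk_len: int,
                           rng: random.Random) -> tuple[list[str], list[int]]:
    """Return (shuffled_chunks, permutation): model input and pretraining target.

    shuffled_chunks[j] == chunks[permutation[j]], so recovering `permutation`
    forces the encoder to model dependencies between subsequences.
    """
    chunks = [sequence[i:i + chunk_len] for i in range(0, len(sequence), chunk_len)]
    order = list(range(len(chunks)))
    rng.shuffle(order)
    shuffled = [chunks[i] for i in order]
    return shuffled, order

rng = random.Random(42)
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEV"  # toy 32-residue sequence
shuffled, order = make_reordering_sample(seq, chunk_len=8, rng=rng)

# Sanity check: the permutation target exactly inverts the shuffle.
restored = "".join(shuffled[order.index(i)] for i in range(len(order)))
print(restored == seq)
```

Varying `chunk_len` across samples is one simple way to realize the length-variable augmentation the abstract mentions.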

Updated: 2025-07-28 15:31:15

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2507.20925v1

FHSTP@EXIST 2025 Benchmark: Sexism Detection with Transparent Speech Concept Bottleneck Models

Sexism has become widespread on social media and in online conversation. To help address this issue, the fifth Sexism Identification in Social Networks (EXIST) challenge was initiated at CLEF 2025. Among this year's international benchmarks, we concentrate on solving the first task, which aims to identify and classify sexism in social media textual posts. In this paper, we describe our solutions and report results for three subtasks: Subtask 1.1 - Sexism Identification in Tweets, Subtask 1.2 - Source Intention in Tweets, and Subtask 1.3 - Sexism Categorization in Tweets. We implement three models to address each subtask, which constitute three individual runs: Speech Concept Bottleneck Model (SCBM), Speech Concept Bottleneck Model with Transformer (SCBMT), and a fine-tuned XLM-RoBERTa transformer model. SCBM uses descriptive adjectives as human-interpretable bottleneck concepts. SCBM leverages large language models (LLMs) to encode input texts into a human-interpretable representation of adjectives, which is then used to train a lightweight classifier for downstream tasks. SCBMT extends SCBM by fusing the adjective-based representation with contextual embeddings from transformers to balance interpretability and classification performance. Beyond competitive results, these two models offer fine-grained explanations at both instance (local) and class (global) levels. We also investigate how additional metadata, e.g., annotators' demographic profiles, can be leveraged. For Subtask 1.1, XLM-RoBERTa, fine-tuned on provided data augmented with prior datasets, ranks 6th for English and Spanish and 4th for English in the Soft-Soft evaluation. Our SCBMT achieves 7th for English and Spanish and 6th for Spanish.
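The adjective-bottleneck idea can be sketched in miniature: an encoder maps text to an interpretable vector of adjective activations, and a lightweight linear head classifies from that vector. The adjective list, keyword cues, and classifier weights below are invented for illustration; in SCBM an LLM produces the adjective representation.

```python
ADJECTIVES = ["hostile", "demeaning", "supportive", "neutral"]

def adjective_scores(text: str) -> list[float]:
    """Stand-in for the LLM encoder: crude keyword-based adjective activations."""
    lowered = text.lower()
    cues = {
        "hostile": ["hate", "stupid"],
        "demeaning": ["belong in", "can't"],
        "supportive": ["great", "proud"],
        "neutral": [],
    }
    return [float(any(c in lowered for c in cues[a])) for a in ADJECTIVES]

def classify(scores: list[float], weights: list[float],
             bias: float = -0.5) -> bool:
    """Lightweight linear head over the interpretable bottleneck."""
    return sum(s * w for s, w in zip(scores, weights)) + bias > 0

# Toy weights: hostile/demeaning activations raise the sexism score.
weights = [1.0, 1.0, -1.0, 0.0]
flagged = classify(adjective_scores("Women belong in the kitchen"), weights)
print(flagged)
```

Because the bottleneck dimensions are named adjectives, the per-instance explanation is simply which activations fired, and the global explanation is the learned weight on each adjective.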

Updated: 2025-07-28 15:30:17

Categories: cs.CL,cs.AI,cs.CY,cs.SI,I.2

Download: http://arxiv.org/abs/2507.20924v1

Pareto-Grid-Guided Large Language Models for Fast and High-Quality Heuristics Design in Multi-Objective Combinatorial Optimization

Multi-objective combinatorial optimization problems (MOCOP) frequently arise in practical applications that require the simultaneous optimization of conflicting objectives. Although traditional evolutionary algorithms can be effective, they typically depend on domain knowledge and repeated parameter tuning, limiting flexibility when applied to unseen MOCOP instances. Recently, integration of Large Language Models (LLMs) into evolutionary computation has opened new avenues for automatic heuristic generation, using their advanced language understanding and code synthesis capabilities. Nevertheless, most existing approaches predominantly focus on single-objective tasks, often neglecting key considerations such as runtime efficiency and heuristic diversity in multi-objective settings. To bridge this gap, we introduce Multi-heuristics for MOCOP via Pareto-Grid-guided Evolution of LLMs (MPaGE), a novel enhancement of the Simple Evolutionary Multiobjective Optimization (SEMO) framework that leverages LLMs and Pareto Front Grid (PFG) technique. By partitioning the objective space into grids and retaining top-performing candidates to guide heuristic generation, MPaGE utilizes LLMs to prioritize heuristics with semantically distinct logical structures during variation, thus promoting diversity and mitigating redundancy within the population. Through extensive evaluations, MPaGE demonstrates superior performance over existing LLM-based frameworks, and achieves competitive results to traditional Multi-objective evolutionary algorithms (MOEAs), with significantly faster runtime. Our code is available at: https://github.com/langkhachhoha/MPaGE.
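The Pareto Front Grid selection step the abstract describes can be illustrated with a toy 2-D version: partition the objective space into cells and keep one top-performing representative per occupied cell, which preserves spread along the front while capping population size. The cell size and the "best in cell" criterion below are assumptions for illustration.

```python
def pfg_select(points: list[tuple[float, float]],
               cell: float) -> list[tuple[float, float]]:
    """Keep one representative (minimum objective sum) per grid cell.

    Objectives are minimized; `cell` is the grid cell width in both dimensions.
    """
    best: dict[tuple[int, int], tuple[float, float]] = {}
    for p in points:
        key = (int(p[0] // cell), int(p[1] // cell))
        if key not in best or sum(p) < sum(best[key]):
            best[key] = p
    return sorted(best.values())

# Two candidates share the top-left cell; only the better one survives,
# while candidates in other cells are kept to maintain diversity.
candidates = [(0.1, 0.9), (0.15, 0.8), (0.9, 0.1), (0.5, 0.5)]
survivors = pfg_select(candidates, cell=0.5)
print(survivors)
```

In MPaGE these surviving candidates then guide the LLM's heuristic generation, with the LLM additionally steered toward offspring whose logical structure differs semantically from existing population members.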

Updated: 2025-07-28 15:26:43

Categories: cs.NE,cs.AI

Download: http://arxiv.org/abs/2507.20923v1

Modeling User Behavior from Adaptive Surveys with Supplemental Context

Modeling user behavior is critical across many industries where understanding preferences, intent, or decisions informs personalization, targeting, and strategic outcomes. Surveys have long served as a classical mechanism for collecting such behavioral data due to their interpretability, structure, and ease of deployment. However, surveys alone are inherently limited by user fatigue, incomplete responses, and practical constraints on their length making them insufficient for capturing user behavior. In this work, we present LANTERN (Late-Attentive Network for Enriched Response Modeling), a modular architecture for modeling user behavior by fusing adaptive survey responses with supplemental contextual signals. We demonstrate the architectural value of maintaining survey primacy through selective gating, residual connections and late fusion via cross-attention, treating survey data as the primary signal while incorporating external modalities only when relevant. LANTERN outperforms strong survey-only baselines in multi-label prediction of survey responses. We further investigate threshold sensitivity and the benefits of selective modality reliance through ablation and rare/frequent attribute analysis. LANTERN's modularity supports scalable integration of new encoders and evolving datasets. This work provides a practical and extensible blueprint for behavior modeling in survey-centric applications.
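The survey-primary fusion idea (cross-attention from the survey signal over context, gated and added residually) can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration, not LANTERN's architecture: the function names, the scalar gate `gate_logit`, and the single-vector survey embedding are simplifications of what the abstract describes.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_late_fusion(survey, context, gate_logit):
    """Toy sketch of gated late fusion with survey primacy (illustrative):
    the survey embedding is the query, context rows are keys/values, a
    sigmoid gate controls how much context is admitted, and the residual
    connection keeps the survey signal as the primary pathway."""
    scores = context @ survey          # (n,) attention logits per context row
    weights = softmax(scores)          # (n,) attention distribution
    attended = weights @ context       # (d,) context summary
    gate = 1.0 / (1.0 + np.exp(-gate_logit))
    return survey + gate * attended    # residual: survey always passes through
```

With the gate driven to zero the output reduces to the survey embedding alone, which is the "external modalities only when relevant" behavior the abstract emphasizes.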

Updated: 2025-07-28 15:19:54

Categories: cs.LG,cs.AI,cs.IR

Download: http://arxiv.org/abs/2507.20919v1

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

This work introduces MediQAl, a French medical question answering dataset designed to evaluate the capabilities of language models in factual medical recall and reasoning over real-world clinical scenarios. MediQAl contains 32,603 questions sourced from French medical examinations across 41 medical subjects. The dataset includes three tasks: (i) Multiple-Choice Question with Unique answer, (ii) Multiple-Choice Question with Multiple answer, and (iii) Open-Ended Question with Short-Answer. Each question is labeled as Understanding or Reasoning, enabling a detailed analysis of models' cognitive capabilities. We validate the MediQAl dataset through extensive evaluation with 14 large language models, including recent reasoning-augmented models, and observe a significant performance gap between factual recall and reasoning tasks. Our evaluation provides a comprehensive benchmark for assessing language models' performance on French medical question answering, addressing a crucial gap in multilingual resources for the medical domain.

Updated: 2025-07-28 15:17:48

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.20917v1

HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection

The rapid evolution of face manipulation techniques poses a critical challenge for face forgery detection: cross-domain generalization. Conventional methods, which rely on simple classification objectives, often fail to learn domain-invariant representations. We propose HAMLET-FFD, a cognitively inspired Hierarchical Adaptive Multi-modal Learning framework that tackles this challenge via bidirectional cross-modal reasoning. Building on contrastive vision-language models such as CLIP, HAMLET-FFD introduces a knowledge refinement loop that iteratively assesses authenticity by integrating visual evidence with conceptual cues, emulating expert forensic analysis. A key innovation is a bidirectional fusion mechanism in which textual authenticity embeddings guide the aggregation of hierarchical visual features, while modulated visual features refine text embeddings to generate image-adaptive prompts. This closed-loop process progressively aligns visual observations with semantic priors to enhance authenticity assessment. By design, HAMLET-FFD freezes all pretrained parameters, serving as an external plugin that preserves CLIP's original capabilities. Extensive experiments demonstrate its superior generalization to unseen manipulations across multiple benchmarks, and visual analyses reveal a division of labor among embeddings, with distinct representations specializing in fine-grained artifact recognition.

Updated: 2025-07-28 15:09:52

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.20913v1

Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models

Large Audio-Language Models (LALMs), such as GPT-4o, have recently unlocked audio dialogue capabilities, enabling direct spoken exchanges with humans. The potential of LALMs broadens their applicability across a wide range of practical scenarios supported by audio dialogues. However, given these advancements, a comprehensive benchmark to evaluate the performance of LALMs in the open-ended audio dialogue understanding remains absent currently. To address this gap, we propose an Audio Dialogue Understanding Benchmark (ADU-Bench), which consists of 4 benchmark datasets. They assess the open-ended audio dialogue ability for LALMs in 3 general scenarios, 12 skills, 9 multilingual languages, and 4 categories of ambiguity handling. Notably, we firstly propose the evaluation of ambiguity handling in audio dialogues that expresses different intentions beyond the same literal meaning of sentences, e.g., "Really!?" with different intonations. In summary, ADU-Bench includes over 20,000 open-ended audio dialogues for the assessment of LALMs. Through extensive experiments on 16 LALMs, our analysis reveals that existing LALMs struggle with mathematical symbols and formulas, understanding human behavior such as roleplay, comprehending multiple languages, and handling audio dialogue ambiguities from different phonetic elements, such as intonations, pause positions, and homophones. The benchmark is available at https://adu-bench.github.io/.

Updated: 2025-07-28 15:07:08

Categories: cs.AI,cs.CL,cs.SD,eess.AS

Download: http://arxiv.org/abs/2412.05167v2

SCORPION: Addressing Scanner-Induced Variability in Histopathology

Ensuring reliable model performance across diverse domains is a critical challenge in computational pathology. A particular source of variability in Whole-Slide Images is introduced by differences in digital scanners, thus calling for better scanner generalization. This is critical for the real-world adoption of computational pathology, where the scanning devices may differ per institution or hospital, and the model should not be dependent on scanner-induced details, which can ultimately affect the patient's diagnosis and treatment planning. However, past efforts have primarily focused on standard domain generalization settings, evaluating on unseen scanners during training, without directly evaluating consistency across scanners for the same tissue. To overcome this limitation, we introduce SCORPION, a new dataset explicitly designed to evaluate model reliability under scanner variability. SCORPION includes 480 tissue samples, each scanned with 5 scanners, yielding 2,400 spatially aligned patches. This scanner-paired design allows for the isolation of scanner-induced variability, enabling a rigorous evaluation of model consistency while controlling for differences in tissue composition. Furthermore, we propose SimCons, a flexible framework that combines augmentation-based domain generalization techniques with a consistency loss to explicitly address scanner generalization. We empirically show that SimCons improves model consistency on varying scanners without compromising task-specific performance. By releasing the SCORPION dataset and proposing SimCons, we provide the research community with a crucial resource for evaluating and improving model consistency across diverse scanners, setting a new standard for reliability testing.
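The consistency objective in SimCons can be sketched as a task loss plus a penalty on disagreement between predictions for the same tissue from two scanners. This is a hedged illustration, not the paper's exact formulation: the function name `simcons_loss`, the MSE stand-ins for both terms, and the weight `lam` are assumptions for the example.

```python
import numpy as np

def simcons_loss(pred_a, pred_b, target, lam=1.0):
    """Illustrative SimCons-style objective (the paper's exact losses may
    differ): a task loss on one scanner's prediction plus a consistency
    penalty between predictions for the same scanner-paired patch."""
    task = np.mean((pred_a - target) ** 2)          # stand-in task loss (MSE)
    consistency = np.mean((pred_a - pred_b) ** 2)   # scanner-pair agreement
    return task + lam * consistency
```

When the two scanners' predictions agree, the penalty vanishes and only the task loss remains; disagreement is charged in proportion to `lam`, which trades off consistency against task performance.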

Updated: 2025-07-28 15:00:49

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.20907v1

Are ECGs enough? Deep learning classification of pulmonary embolism using electrocardiograms

Pulmonary embolism is a leading cause of out-of-hospital cardiac arrest that requires fast diagnosis. While computed tomography pulmonary angiography is the standard diagnostic tool, it is not always accessible. Electrocardiography is an essential tool for diagnosing multiple cardiac anomalies, as it is affordable, fast and available in many settings. However, the availability of public ECG datasets, especially for PE, is limited and, in practice, these datasets tend to be small, making it essential to optimize learning strategies. In this study, we investigate the performance of multiple neural networks in order to assess the impact of various approaches. Moreover, we check whether these practices enhance model generalization when transfer learning is used to translate information learned in larger ECG datasets, such as PTB-XL, CPSC18 and MedalCare-XL, to a smaller, more challenging dataset for PE. By leveraging transfer learning, we analyze the extent to which we can improve learning efficiency and predictive performance on limited data. Code available at https://github.com/joaodsmarques/Are-ECGs-enough-Deep-Learning-Classifiers.

Updated: 2025-07-28 14:58:39

Categories: cs.CV,cs.AI,cs.LG,I.2

Download: http://arxiv.org/abs/2503.08960v2

TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories

Type inference for dynamic languages like Python is a persistent challenge in software engineering. While large language models (LLMs) have shown promise in code understanding, their type inference capabilities remain underexplored. We introduce TypyBench, a benchmark designed to evaluate LLMs' type inference across entire Python repositories. TypyBench features two novel metrics: TypeSim, which captures nuanced semantic relationships between predicted and ground truth types, and TypeCheck, which assesses type consistency across codebases. Our evaluation of various LLMs on a curated dataset of 50 high-quality Python repositories reveals that, although LLMs achieve decent TypeSim scores, they struggle with complex nested types and exhibit significant type consistency errors. These findings suggest that future research should shift focus from improving type similarity to addressing repository-level consistency. TypyBench provides a foundation for this new direction, offering insights into model performance across different type complexities and usage contexts. Our code and data are available at https://github.com/typybench/typybench.

Updated: 2025-07-28 14:54:00

Categories: cs.SE,cs.AI,cs.PL

Download: http://arxiv.org/abs/2507.22086v1

Music Arena: Live Evaluation for Text-to-Music

We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare, as study protocols may differ across systems. Moreover, human preferences might help researchers align their TTM systems or improve automatic evaluation metrics, but an open and renewable source of preferences does not currently exist. We aim to fill these gaps by offering *live* evaluation for TTM. In Music Arena, real-world users input text prompts of their choosing and compare outputs from two TTM systems, and their preferences are used to compile a leaderboard. While Music Arena follows recent evaluation trends in other AI domains, we also design it with key features tailored to music: an LLM-based routing system to navigate the heterogeneous type signatures of TTM systems, and the collection of *detailed* preferences including listening data and natural language feedback. We also propose a rolling data release policy with user privacy guarantees, providing a renewable source of preference data and increasing platform transparency. Through its standardized evaluation protocol, transparent data access policies, and music-specific features, Music Arena not only addresses key challenges in the TTM ecosystem but also demonstrates how live evaluation can be thoughtfully adapted to unique characteristics of specific AI domains. Music Arena is available at: https://music-arena.org
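Compiling a leaderboard from pairwise preferences is commonly done with a rating system such as Elo. The abstract does not specify Music Arena's ranking method, so the sketch below is an assumption for illustration, using the standard Elo update with a hypothetical `update_elo` function and K-factor of 32.

```python
def update_elo(ratings, winner, loser, k=32.0):
    """Illustrative Elo update for building a leaderboard from pairwise
    preferences (a common choice for arena-style evaluation; not
    necessarily Music Arena's actual method)."""
    ra, rb = ratings[winner], ratings[loser]
    # Expected score of the winner under the Elo model.
    expected_winner = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    delta = k * (1.0 - expected_winner)
    ratings[winner] = ra + delta
    ratings[loser] = rb - delta
    return ratings
```

Each user comparison nudges the two systems' ratings toward the observed preference, so the leaderboard converges as votes accumulate.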

Updated: 2025-07-28 14:52:57

Categories: cs.SD,cs.AI,cs.MM

Download: http://arxiv.org/abs/2507.20900v1

Music Arena: Live Evaluation for Text-to-Music

We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare, as study protocols may differ across systems. Moreover, human preferences might help researchers align their TTM systems or improve automatic evaluation metrics, but an open and renewable source of preferences does not currently exist. We aim to fill these gaps by offering *live* evaluation for TTM. In Music Arena, real-world users input text prompts of their choosing and compare outputs from two TTM systems, and their preferences are used to compile a leaderboard. While Music Arena follows recent evaluation trends in other AI domains, we also design it with key features tailored to music: an LLM-based routing system to navigate the heterogeneous type signatures of TTM systems, and the collection of *detailed* preferences including listening data and natural language feedback. We also propose a rolling data release policy with user privacy guarantees, providing a renewable source of preference data and increasing platform transparency. Through its standardized evaluation protocol, transparent data access policies, and music-specific features, Music Arena not only addresses key challenges in the TTM ecosystem but also demonstrates how live evaluation can be thoughtfully adapted to unique characteristics of specific AI domains. Music Arena is available at: https://music-arena.org

Updated: 2025-07-28 14:52:57

Categories: cs.SD,cs.AI,cs.MM

Download: http://arxiv.org/abs/2507.20900v1

SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation

High-quality labeled datasets are crucial for training and evaluating foundation models in software engineering, but creating them is often prohibitively expensive and labor-intensive. We introduce SPICE, a scalable, automated pipeline for labeling SWE-bench-style datasets with annotations for issue clarity, test coverage, and effort estimation. SPICE combines context-aware code navigation, rationale-driven prompting, and multi-pass consensus to produce labels that closely approximate expert annotations. SPICE's design was informed by our own experience and frustration in labeling more than 800 instances from SWE-Gym. SPICE achieves strong agreement with human-labeled SWE-bench Verified data while reducing the cost of labeling 1,000 instances from around $100,000 (manual annotation) to just $5.10. These results demonstrate SPICE's potential to enable cost-effective, large-scale dataset creation for SE-focused FMs. To support the community, we release both the SPICE tool and SPICE Bench, a new dataset of 6,802 SPICE-labeled instances curated from 291 open-source projects in SWE-Gym (over 13x larger than SWE-bench Verified).
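The "multi-pass consensus" step can be sketched under the assumption that it reduces to majority voting over repeated annotation passes; the function name and agreement threshold below are illustrative, not taken from the paper.

```python
from collections import Counter

def consensus_label(labels, min_agreement=0.5):
    """Majority-vote consensus over labels from multiple annotation passes.

    Returns the winning label, or None if no label reaches the
    min_agreement fraction of passes (i.e., the passes disagree too much).
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None
```

Instances where no label reaches the threshold would then be flagged for an extra pass or human review.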

Updated: 2025-07-28 14:51:35

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2507.09108v2

Joint modeling for learning decision-making dynamics in behavioral experiments

Major depressive disorder (MDD), a leading cause of disability and mortality, is associated with reward-processing abnormalities and concentration issues. Motivated by the probabilistic reward task from the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel framework that integrates the reinforcement learning (RL) model and drift-diffusion model (DDM) to jointly analyze reward-based decision-making with response times. To account for emerging evidence suggesting that decision-making may alternate between multiple interleaved strategies, we model latent state switching using a hidden Markov model (HMM). In the "engaged" state, decisions follow an RL-DDM, simultaneously capturing reward processing, decision dynamics, and temporal structure. In contrast, in the "lapsed" state, decision-making is modeled using a simplified DDM, where specific parameters are fixed to approximate random guessing with equal probability. The proposed method is implemented using a computationally efficient generalized expectation-maximization (EM) algorithm with forward-backward procedures. Through extensive numerical studies, we demonstrate that our proposed method outperforms competing approaches across various reward-generating distributions, under both strategy-switching and non-switching scenarios, as well as in the presence of input perturbations. When applied to the EMBARC study, our framework reveals that MDD patients exhibit lower overall engagement than healthy controls and experience longer decision times when they do engage. Additionally, we show that neuroimaging measures of brain activities are associated with decision-making characteristics in the "engaged" state but not in the "lapsed" state, providing evidence of brain-behavior association specific to the "engaged" state.
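The EM algorithm with forward-backward procedures rests on the standard HMM forward recursion. A minimal scaled version is sketched below; it is generic and says nothing about the paper's RL-DDM emission model, which would supply the per-state observation likelihoods.

```python
from math import log

def hmm_forward(pi, A, lik):
    """Scaled forward algorithm for a discrete-state HMM.

    pi:  initial state probabilities, length S
    A:   transition matrix, A[i][j] = P(z_t = j | z_{t-1} = i)
    lik: observation likelihoods, lik[t][s] = P(x_t | z_t = s)
    Returns log P(x_1..x_T).
    """
    S = len(pi)
    alpha = [pi[s] * lik[0][s] for s in range(S)]
    log_l = 0.0
    for t in range(1, len(lik)):
        c = sum(alpha)                      # scaling factor avoids underflow
        alpha = [a / c for a in alpha]
        log_l += log(c)
        alpha = [sum(alpha[i] * A[i][j] for i in range(S)) * lik[t][j]
                 for j in range(S)]
    return log_l + log(sum(alpha))
```

In a two-state "engaged"/"lapsed" setup, `lik[t]` would hold the RL-DDM likelihood in one column and the fixed guessing-DDM likelihood in the other.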

Updated: 2025-07-28 14:47:34

Categories: stat.ME,cs.LG

Download: http://arxiv.org/abs/2506.02394v2

Online hierarchical partitioning of the output space in extreme multi-label data stream

Mining data streams with multi-label outputs poses significant challenges due to evolving distributions, high-dimensional label spaces, sparse label occurrences, and complex label dependencies. Moreover, concept drift affects not only input distributions but also label correlations and imbalance ratios over time, complicating model adaptation. To address these challenges, structured learners are categorized into local and global methods. Local methods break down the task into simpler components, while global methods adapt the algorithm to the full output space, potentially yielding better predictions by exploiting label correlations. This work introduces iHOMER (Incremental Hierarchy Of Multi-label Classifiers), an online multi-label learning framework that incrementally partitions the label space into disjoint, correlated clusters without relying on predefined hierarchies. iHOMER leverages online divisive-agglomerative clustering based on *Jaccard* similarity and a global tree-based learner driven by a multivariate *Bernoulli* process to guide instance partitioning. To address non-stationarity, it integrates drift detection mechanisms at both global and local levels, enabling dynamic restructuring of label partitions and subtrees. Experiments across 23 real-world datasets show iHOMER outperforms 5 state-of-the-art global baselines, such as MLHAT, MLHT of Pruned Sets and iSOUPT, by 23%, and 12 local baselines, such as binary relevance transformations of kNN, EFDT, ARF, and ADWIN bagging/boosting ensembles, by 32%, establishing its robustness for online multi-label classification.
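The Jaccard-similarity test at the heart of the label-space clustering is easy to make concrete. The greedy assignment below is only a toy stand-in for iHOMER's online divisive-agglomerative clustering; the threshold and function names are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two label sets: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def assign_cluster(labelset, clusters, threshold=0.5):
    """Greedy sketch: join the first cluster whose representative label
    set is at least `threshold` Jaccard-similar; otherwise open a new one."""
    for rep in clusters:
        if jaccard(labelset, rep) >= threshold:
            return rep
    clusters.append(set(labelset))
    return clusters[-1]
```

A real online variant would also merge and split clusters as label correlations drift, which is what the divisive-agglomerative scheme adds.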

Updated: 2025-07-28 14:47:13

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20894v1

Characterizing the Sensitivity to Individual Bit Flips in Client-Side Operations of the CKKS Scheme

Homomorphic Encryption (HE) enables computation on encrypted data without decryption, making it a cornerstone of privacy-preserving computation in untrusted environments. As HE sees growing adoption in sensitive applications such as secure machine learning and confidential data analysis, ensuring its robustness against errors becomes critical. Faults (e.g., transmission errors, hardware malfunctions, or synchronization failures) can corrupt encrypted data and compromise the integrity of HE operations. However, the impact of soft errors (such as bit flips) on modern HE schemes remains unexplored. Specifically, the CKKS scheme, one of the most widely used HE schemes for approximate arithmetic, lacks a systematic study of how such errors propagate across its pipeline, particularly under optimizations like the Residue Number System (RNS) and Number Theoretic Transform (NTT). This work bridges that gap by presenting a theoretical and empirical analysis of CKKS's fault tolerance under single bit-flip errors. We focus on client-side operations (encoding, encryption, decryption, and decoding) and demonstrate that while the vanilla CKKS scheme exhibits some resilience, performance optimizations (RNS/NTT) introduce significant fragility, amplifying error sensitivity. By characterizing these failure modes, we lay the groundwork for error-resilient HE designs, ensuring both performance and integrity in privacy-critical applications.
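A single bit-flip fault of the kind studied here can be injected into a float64 value by toggling one bit of its IEEE-754 encoding. This harness is a generic fault-injection sketch, not the paper's experimental setup; it illustrates why sensitivity depends so strongly on which bit flips.

```python
import struct

def flip_bit(x, k):
    """Flip bit k (0 = least significant) of a float64's IEEE-754 encoding.

    Bits 0-51 are the mantissa, 52-62 the exponent, 63 the sign, so a
    low-mantissa flip perturbs x slightly while an exponent flip can be
    catastrophic.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << k)))
    return y
```

For example, flipping the mantissa's least significant bit of 1.0 yields 1.0 + 2^-52, while flipping the exponent's top bit of 1.0 yields infinity.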

Updated: 2025-07-28 14:42:09

Categories: cs.CR

Download: http://arxiv.org/abs/2507.20891v1

JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

Diffusion and flow-matching models have revolutionized automatic text-to-audio generation in recent times. These models are increasingly capable of generating high-quality and faithful audio outputs capturing speech and acoustic events. However, there is still much room for improvement in creative audio generation that primarily involves music and songs. Recent open lyrics-to-song models, such as DiffRhythm, ACE-Step, and LeVo, have set an acceptable standard in automatic song generation for recreational use. However, these models lack fine-grained word-level controllability often desired by musicians in their workflows. To the best of our knowledge, our flow-matching-based JAM is the first effort toward endowing word-level timing and duration control in song generation, allowing fine-grained vocal control. To enhance the quality of generated songs to better align with human preferences, we implement aesthetic alignment through Direct Preference Optimization, which iteratively refines the model using a synthetic dataset, eliminating the need for manual data annotation. Furthermore, we aim to standardize the evaluation of such lyrics-to-song models through our public evaluation dataset JAME. We show that JAM outperforms the existing models in terms of the music-specific attributes.

Updated: 2025-07-28 14:34:02

Categories: cs.SD,cs.AI

Download: http://arxiv.org/abs/2507.20880v1

Implementing Adaptations for Vision AutoRegressive Model

Vision AutoRegressive model (VAR) was recently introduced as an alternative to Diffusion Models (DMs) in the image generation domain. In this work, we focus on its adaptations, which aim to fine-tune pre-trained models to perform specific downstream tasks, like medical data generation. While for DMs there exist many techniques, adaptations for VAR remain underexplored. Similarly, differentially private (DP) adaptations, ones that aim to preserve the privacy of the adaptation data, have been extensively studied for DMs, while VAR lacks such solutions. In our work, we implement and benchmark many strategies for VAR, and compare them to state-of-the-art DM adaptation strategies. We observe that VAR outperforms DMs for non-DP adaptations; however, DP performance suffers, which necessitates further research in private adaptations for VAR. Code is available at https://github.com/sprintml/finetuning_var_dp.
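DP adaptation typically means DP-SGD-style training: clip each per-example gradient, sum, and add calibrated Gaussian noise. The aggregation step is sketched below under that standard assumption; the paper's exact recipe and hyperparameter names may differ.

```python
import random
from math import sqrt

def dp_aggregate(grads, clip_norm=1.0, noise_mult=1.0, rng=random):
    """DP-SGD-style gradient aggregation (illustrative sketch).

    Each per-example gradient is clipped to L2 norm `clip_norm`, the
    clipped gradients are summed, Gaussian noise with standard deviation
    `noise_mult * clip_norm` is added per coordinate, and the result is
    averaged over the batch.
    """
    d = len(grads[0])
    total = [0.0] * d
    for g in grads:
        norm = sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(d):
            total[j] += g[j] * scale
    sigma = noise_mult * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(grads) for x in noisy]
```

The clipping bounds each example's influence, which is what makes the added noise yield a formal privacy guarantee; it is also why DP fine-tuning tends to degrade utility, consistent with the DP results reported above.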

Updated: 2025-07-28 14:28:07

Categories: cs.CV,cs.LG,I.2.6; I.5.1; I.4.8; I.2.10

Download: http://arxiv.org/abs/2507.11441v2

Testbed and Software Architecture for Enhancing Security in Industrial Private 5G Networks

In the era of Industry 4.0, the growing need for secure and efficient communication systems has driven the development of fifth-generation (5G) networks characterized by extremely low latency, massive device connectivity and high data transfer speeds. However, the deployment of 5G networks presents significant security challenges, requiring advanced and robust solutions to counter increasingly sophisticated cyber threats. This paper proposes a testbed and software architecture to strengthen the security of Private 5G Networks, particularly in industrial communication environments.

Updated: 2025-07-28 14:24:20

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2507.20873v1

Not Only Grey Matter: OmniBrain for Robust Multimodal Classification of Alzheimer's Disease

Alzheimer's disease affects over 55 million people worldwide and is projected to more than double by 2050, necessitating rapid, accurate, and scalable diagnostics. However, existing approaches are limited because they cannot achieve clinically acceptable accuracy, generalization across datasets, robustness to missing modalities, and explainability all at the same time. This inability to satisfy all these requirements simultaneously undermines their reliability in clinical settings. We propose OmniBrain, a multimodal framework that integrates brain MRI, radiomics, gene expression, and clinical data using a unified model with cross-attention and modality dropout. OmniBrain achieves $92.2 \pm 2.4\%$ accuracy on the ANMerge dataset and generalizes to the MRI-only ADNI dataset with $70.4 \pm 2.7\%$ accuracy, outperforming unimodal and prior multimodal approaches. Explainability analyses highlight neuropathologically relevant brain regions and genes, enhancing clinical trust. OmniBrain offers a robust, interpretable, and practical solution for real-world Alzheimer's diagnosis.
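"Modality dropout" amounts to randomly zeroing out whole input modalities during training so the model learns to cope with missing inputs at test time (e.g., the MRI-only ADNI setting). The sketch below uses hypothetical names and a hypothetical drop rate; it is not OmniBrain's implementation.

```python
import random

def modality_dropout(features, p=0.3, rng=random):
    """Zero out each modality independently with probability p.

    features: dict mapping modality name -> feature vector (list of floats)
    Returns (dropped_features, mask) where mask[m] is True if modality m
    was kept. At least one modality is always kept.
    """
    mask = {m: rng.random() >= p for m in features}
    if not any(mask.values()):
        # Never drop everything: re-keep one modality at random.
        keep = rng.choice(sorted(features))
        mask[keep] = True
    out = {m: v if mask[m] else [0.0] * len(v) for m, v in features.items()}
    return out, mask
```

Training on such masked batches is what lets a single cross-attention model degrade gracefully when a modality (gene expression, radiomics, ...) is absent at inference.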

Updated: 2025-07-28 14:24:13

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.20872v1

FedABC: Attention-Based Client Selection for Federated Learning with Long-Term View

Native AI support is a key objective in the evolution of 6G networks, with Federated Learning (FL) emerging as a promising paradigm. FL allows decentralized clients to collaboratively train an AI model without directly sharing their data, preserving privacy. Clients train local models on private data and share model updates, which a central server aggregates to refine the global model and redistribute it for the next iteration. However, client data heterogeneity slows convergence and reduces model accuracy, and frequent client participation imposes communication and computational burdens. To address these challenges, we propose FedABC, an innovative client selection algorithm designed to take a long-term view in managing data heterogeneity and optimizing client participation. Inspired by attention mechanisms, FedABC prioritizes informative clients by evaluating both model similarity and each model's unique contributions to the global model. Moreover, considering the evolving demands of the global model, we formulate an optimization problem to guide FedABC throughout the training process. Following the "later-is-better" principle, FedABC adaptively adjusts the client selection threshold, encouraging greater participation in later training stages. Extensive simulations on CIFAR-10 demonstrate that FedABC significantly outperforms existing approaches in model accuracy and client participation efficiency, achieving comparable performance with 32% fewer clients than the classical FL algorithm FedAvg, and 3.5% higher accuracy with 2% fewer clients than the state-of-the-art. This work marks a step toward deploying FL in heterogeneous, resource-constrained environments, thereby supporting native AI capabilities in 6G networks.
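The attention-inspired client scoring can be illustrated with a generic softmax-over-similarity sketch: score each client update by its alignment with the current global update direction, then keep the top-k. Everything below is an illustrative assumption; FedABC's actual criterion also weighs each model's unique contribution and an adaptive threshold.

```python
from math import exp, sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity between two flattened model updates."""
    return dot(u, v) / (sqrt(dot(u, u)) * sqrt(dot(v, v)))

def attention_scores(updates, global_dir):
    """Softmax attention over clients, scored by alignment with the
    global update direction (numerically stabilized)."""
    sims = [cosine(u, global_dir) for u in updates]
    m = max(sims)
    ws = [exp(s - m) for s in sims]
    z = sum(ws)
    return [w / z for w in ws]

def select_clients(updates, global_dir, k):
    """Indices of the k clients with the highest attention scores."""
    scores = attention_scores(updates, global_dir)
    return sorted(range(len(updates)), key=lambda i: -scores[i])[:k]
```

A "later-is-better" schedule would then grow k (or lower the selection threshold) as training progresses, admitting more clients in later rounds.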

Updated: 2025-07-28 14:22:43

Categories: cs.NI,cs.LG

Download: http://arxiv.org/abs/2507.20871v1

Visual Enumeration Remains Challenging for Multimodal Generative AI

Many animal species can approximately judge the number of objects in a visual scene at a single glance, and humans can further determine the exact cardinality of a set by deploying systematic counting procedures. In contrast, it has been observed that even state-of-the-art AI systems have very limited enumeration skills. In this work, we propose two benchmark tasks inspired by cognitive science that allow to precisely evaluate the visual enumeration capabilities of multimodal foundation models, thereby providing an objective measure of their number sense and counting level. We consider popular visual question answering models (BLIP, LLaVA and ViLT) as well as advanced image-to-text (Gemini, GPT and Qwen) and text-to-image (DALL-E, FLUX and Stable Diffusion) AI systems. Our analyses show that even the most advanced models cannot reliably name the number of objects in simple visual stimuli or generate images containing a target number of items, as indexed by their low accuracy in both types of tasks. Especially for numbers outside the subitizing range, their responses are often far from the target numerosity, and, in stark contrast with human behavior, in many cases the distribution of errors depends on the object category. We also observe some striking mistakes with small numbers. Our findings demonstrate that developing an intuitive visual understanding of number remains challenging for AI models and that merely increasing model size might not be a viable strategy to promote the emergence of systematic counting skills. We release the full code of our benchmark to facilitate the evaluation of enumeration skills in future AI systems.

Updated: 2025-07-28 14:18:37

Categories: cs.CV,cs.AI,cs.NE

Download: http://arxiv.org/abs/2402.03328v3
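
A benchmark of the kind described ultimately compares a model's reported count against a target numerosity. A minimal scoring helper (hypothetical, not the released benchmark code) might report exact-match accuracy alongside the mean absolute distance from the target, since the distance term surfaces the "far from the target numerosity" failures that plain accuracy hides:

```python
def score_enumeration(predictions, targets):
    # predictions/targets: parallel lists of integer counts.
    # Returns (exact-match accuracy, mean absolute numerosity error).
    assert len(predictions) == len(targets) and targets
    exact = sum(p == t for p, t in zip(predictions, targets))
    mae = sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)
    return exact / len(targets), mae
```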

A Large Language Model-Supported Threat Modeling Framework for Transportation Cyber-Physical Systems

Existing threat modeling frameworks related to transportation cyber-physical systems (CPS) are often narrow in scope, labor-intensive, and require substantial cybersecurity expertise. To this end, we introduce the Transportation Cybersecurity and Resiliency Threat Modeling Framework (TraCR-TMF), a large language model (LLM)-based threat modeling framework for transportation CPS that requires limited cybersecurity expert intervention. TraCR-TMF identifies threats, potential attack techniques, and relevant countermeasures for transportation CPS. Three LLM-based approaches support these identifications: (i) a retrieval-augmented generation approach requiring no cybersecurity expert intervention, (ii) an in-context learning approach with low expert intervention, and (iii) a supervised fine-tuning approach with moderate expert intervention. TraCR-TMF offers LLM-based attack path identification for critical assets based on vulnerabilities across transportation CPS entities. Additionally, it incorporates the Common Vulnerability Scoring System (CVSS) scores of known exploited vulnerabilities to prioritize threat mitigations. The framework was evaluated through two cases. First, the framework identified relevant attack techniques for various transportation CPS applications, 73% of which were validated by cybersecurity experts as correct. Second, the framework was used to identify attack paths for a target asset in a real-world cyberattack incident. TraCR-TMF successfully predicted exploitations, like lateral movement of adversaries, data exfiltration, and data encryption for ransomware, as reported in the incident. These findings show the efficacy of TraCR-TMF in transportation CPS threat modeling, while reducing the need for extensive involvement of cybersecurity experts. To facilitate real-world adoptions, all our codes are shared via an open-source repository.

Updated: 2025-07-28 14:17:34

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2506.00831v2

Bi-cephalic self-attended model to classify Parkinson's disease patients with freezing of gait

Parkinson Disease (PD) often results in motor and cognitive impairments, including gait dysfunction, particularly in patients with freezing of gait (FOG). Current detection methods are either subjective or reliant on specialized gait analysis tools. This study aims to develop an objective, data-driven, and multi-modal classification model to detect gait dysfunction in PD patients using resting-state EEG signals combined with demographic and clinical variables. We utilized a dataset of 124 participants: 42 PD patients with FOG (PDFOG+), 41 without FOG (PDFOG-), and 41 age-matched healthy controls. Features extracted from resting-state EEG and descriptive variables (age, education, disease duration) were used to train a novel Bi-cephalic Self-Attention Model (BiSAM). We tested three modalities: signal-only, descriptive-only, and multi-modal, across different EEG channel subsets (BiSAM-63, -16, -8, and -4). Signal-only and descriptive-only models showed limited performance, achieving a maximum accuracy of 55% and 68%, respectively. In contrast, the multi-modal models significantly outperformed both, with BiSAM-8 and BiSAM-4 achieving the highest classification accuracy of 88%. These results demonstrate the value of integrating EEG with objective descriptive features for robust PDFOG+ detection. This study introduces a multi-modal, attention-based architecture that objectively classifies PDFOG+ using minimal EEG channels and descriptive variables. This approach offers a scalable and efficient alternative to traditional assessments, with potential applications in routine clinical monitoring and early diagnosis of PD-related gait dysfunction.

Updated: 2025-07-28 14:16:01

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20862v1

REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints

Deep learning models deployed on edge devices frequently encounter resource variability, which arises from fluctuating energy levels, timing constraints, or prioritization of other critical tasks within the system. State-of-the-art machine learning pipelines generate resource-agnostic models that are not capable to adapt at runtime. In this work, we introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources. In contrast to the state-of-the-art, REDS leverages structured sparsity constructively by exploiting permutation invariance of neurons, which allows for hardware-specific optimizations. Specifically, REDS achieves computational efficiency by (1) skipping sequential computational blocks identified by a novel iterative knapsack optimizer, and (2) taking advantage of data cache by re-arranging the order of operations in REDS computational graph. REDS supports conventional deep networks frequently deployed on the edge and provides computational benefits even for small and simple networks. We evaluate REDS on eight benchmark architectures trained on the Visual Wake Words, Google Speech Commands, Fashion-MNIST, CIFAR-10 and ImageNet-1K datasets, and test on four off-the-shelf mobile and embedded hardware platforms. We provide a theoretical result and empirical evidence demonstrating REDS' outstanding performance in terms of submodels' test set accuracy, and demonstrate an adaptation time in response to dynamic resource constraints of under 40$\mu$s, utilizing a fully-connected network on Arduino Nano 33 BLE.

Updated: 2025-07-28 14:11:53

Categories: cs.LG

Download: http://arxiv.org/abs/2311.13349v3
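
The block-skipping idea rests on a knapsack formulation: under a resource budget, keep the subset of computational blocks with maximal summed importance and skip the rest. A minimal 0/1 knapsack sketch (integer costs and a simple DP, standing in for the paper's iterative optimizer) looks like this:

```python
def select_blocks(costs, importances, budget):
    # 0/1 knapsack over a network's computational blocks: keep the
    # subset whose total cost fits the resource budget and whose summed
    # importance is maximal; every other block is skipped at inference.
    # Costs and the budget are integers (e.g. kFLOPs or microseconds).
    dp = [(0, []) for _ in range(budget + 1)]  # dp[c] = (importance, kept blocks)
    for i, cost in enumerate(costs):
        for c in range(budget, cost - 1, -1):
            candidate = dp[c - cost][0] + importances[i]
            if candidate > dp[c][0]:
                dp[c] = (candidate, dp[c - cost][1] + [i])
    return dp[budget][1]
```

Re-running the selection whenever the budget changes is what lets a REDS-style subnetwork adapt to dynamic resource constraints at runtime.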

Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

Advances in reinforcement learning (RL) have led to its successful application in complex tasks with continuous state and action spaces. Despite these advances in practice, most theoretical work pertains to finite state and action spaces. We propose building a theoretical understanding of continuous state and action spaces by employing a geometric lens to understand the locally attained set of states. The set of all parametrised policies learnt through a semi-gradient based approach induces a set of attainable states in RL. We show that the training dynamics of a two-layer neural policy induce a low dimensional manifold of attainable states embedded in the high-dimensional nominal state space trained using an actor-critic algorithm. We prove that, under certain conditions, the dimensionality of this manifold is of the order of the dimensionality of the action space. This is the first result of its kind, linking the geometry of the state space to the dimensionality of the action space. We empirically corroborate this upper bound for four MuJoCo environments and also demonstrate the results in a toy environment with varying dimensionality. We also show the applicability of this theoretical result by introducing a local manifold learning layer to the policy and value function networks to improve the performance in control environments with very high degrees of freedom by changing one layer of the neural network to learn sparse representations.

Updated: 2025-07-28 14:06:44

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.20853v1

An Open-source Implementation and Security Analysis of Triad's TEE Trusted Time Protocol

The logic of many protocols relies on time measurements. However, in Trusted Execution Environments (TEEs) like Intel SGX, the time source is outside the Trusted Computing Base: a malicious system hosting the TEE can manipulate that TEE's notion of time, e.g., jumping in time or affecting the perceived time speed. Previous work like Triad propose protocols for TEEs to maintain a trustworthy time source. However, in this paper, based on a public implementation of Triad that we contribute, we empirically showcase vulnerabilities to this protocol. For example, an attacker controlling the operating system, and consequently the scheduling algorithm, may arbitrarily manipulate their local TEE's clock speed. What is worse, in case of faster malicious clock speeds, an attacker on a single compromised machine may propagate the attack to honest machines participating in Triad's Trusted Time protocol, causing them to skip to timestamps arbitrarily far in the future. Then, infected honest machines propagate time-skips themselves to other honest machines interacting with them. We discuss protocol changes to Triad for higher resilience against such attacks.

Updated: 2025-07-28 14:02:59

Categories: cs.CR

Download: http://arxiv.org/abs/2507.20851v1
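
One resilience measure in the spirit of the protocol changes the authors discuss is to rate-limit how fast a peer-supplied timestamp may advance the local trusted clock. The bound, the parameter names, and the linear skew model below are our illustrative assumptions, not the change proposed in the paper:

```python
def accept_timestamp(last_trusted, remote_ts, elapsed_local, max_skew=0.05):
    # Accept a peer-supplied timestamp only if it does not advance the
    # trusted clock faster than the locally measured elapsed time plus a
    # bounded skew allowance; a peer running a sped-up clock (or
    # forwarding far-future timestamps) is rejected instead of
    # propagating the time-skip to this machine.
    return remote_ts <= last_trusted + elapsed_local * (1.0 + max_skew)
```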

Free Energy-Inspired Cognitive Risk Integration for AV Navigation in Pedestrian-Rich Environments

Recent advances in autonomous vehicle (AV) behavior planning have shown impressive social interaction capabilities when interacting with other road users. However, achieving human-like prediction and decision-making in interactions with vulnerable road users remains a key challenge in complex multi-agent interactive environments. Existing research focuses primarily on crowd navigation for small mobile robots, which cannot be directly applied to AVs due to inherent differences in their decision-making strategies and dynamic boundaries. Moreover, pedestrians in these multi-agent simulations follow fixed behavior patterns that cannot dynamically respond to AV actions. To overcome these limitations, this paper proposes a novel framework for modeling interactions between the AV and multiple pedestrians. In this framework, a cognitive process modeling approach inspired by the Free Energy Principle is integrated into both the AV and pedestrian models to simulate more realistic interaction dynamics. Specifically, the proposed pedestrian Cognitive-Risk Social Force Model adjusts goal-directed and repulsive forces using a fused measure of cognitive uncertainty and physical risk to produce human-like trajectories. Meanwhile, the AV leverages this fused risk to construct a dynamic, risk-aware adjacency matrix for a Graph Convolutional Network within a Soft Actor-Critic architecture, allowing it to make more reasonable and informed decisions. Simulation results indicate that our proposed framework effectively improves safety, efficiency, and smoothness of AV navigation compared to the state-of-the-art method.

Updated: 2025-07-28 14:02:00

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2507.20850v1
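
The risk-modulated social force idea can be sketched in a toy 2D update: as the fused (cognitive + physical) risk grows, the pedestrian's goal-directed attraction is damped and the repulsion away from the AV is amplified. The gains, the linear modulation, and the unit-vector force terms are illustrative assumptions, not the paper's Cognitive-Risk Social Force Model:

```python
import math

def social_force_step(pos, goal, av_pos, fused_risk,
                      k_goal=1.0, k_rep=2.0, dt=0.1):
    # One Euler step of a toy risk-modulated social force model.
    # fused_risk in [0, 1] trades goal attraction against AV repulsion.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    gn = math.hypot(gx, gy) or 1.0        # avoid division by zero at the goal
    rx, ry = pos[0] - av_pos[0], pos[1] - av_pos[1]
    rn = math.hypot(rx, ry) or 1.0
    fx = k_goal * (1.0 - fused_risk) * gx / gn + k_rep * fused_risk * rx / rn
    fy = k_goal * (1.0 - fused_risk) * gy / gn + k_rep * fused_risk * ry / rn
    return (pos[0] + dt * fx, pos[1] + dt * fy)
```

At zero risk the pedestrian steps toward the goal; at full risk the step is purely away from the AV, which is the qualitative behavior the abstract attributes to the fused risk measure.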

Learning unitaries with quantum statistical queries

We propose several algorithms for learning unitary operators from quantum statistical queries with respect to their Choi-Jamiolkowski state. Quantum statistical queries capture the capabilities of a learner with limited quantum resources, which receives as input only noisy estimates of expected values of measurements. Our approach leverages quantum statistical queries to estimate the Fourier mass of a unitary on a subset of Pauli strings, generalizing previous techniques developed for uniform quantum examples. Specifically, we show that the celebrated quantum Goldreich-Levin algorithm can be implemented with quantum statistical queries, whereas the prior version of the algorithm involves oracle access to the unitary and its inverse. As an application, we prove that quantum Boolean functions with constant total influence or with constant degree are efficiently learnable in our model. Moreover, we prove that $\mathcal{O}(\log n)$-juntas are efficiently learnable and constant-depth circuits are learnable query-efficiently with quantum statistical queries. On the other hand, all previous algorithms for these tasks demand significantly greater resources, such as oracle access to the unitary or direct access to the Choi-Jamiolkowski state. We also demonstrate that, despite these positive results, quantum statistical queries lead to an exponentially larger query complexity for certain tasks, compared to separable measurements to the Choi-Jamiolkowski state. In particular, we show an exponential lower bound for learning a class of phase-oracle unitaries and a double exponential lower bound for testing the unitarity of channels. Taken together, our results indicate that quantum statistical queries offer a unified framework for various unitary learning tasks, with potential applications in quantum machine learning, many-body physics and benchmarking of near-term devices.

Updated: 2025-07-28 13:58:19

Categories: quant-ph,cs.CC,cs.LG

Download: http://arxiv.org/abs/2310.02254v3

Towards Explainable Deep Clustering for Time Series Data

Deep clustering uncovers hidden patterns and groups in complex time series data, yet its opaque decision-making limits use in safety-critical settings. This survey offers a structured overview of explainable deep clustering for time series, collecting current methods and their real-world applications. We thoroughly discuss and compare peer-reviewed and preprint papers through application domains across healthcare, finance, IoT, and climate science. Our analysis reveals that most work relies on autoencoder and attention architectures, with limited support for streaming, irregularly sampled, or privacy-preserved series, and interpretability is still primarily treated as an add-on. To push the field forward, we outline six research opportunities: (1) combining complex networks with built-in interpretability; (2) setting up clear, faithfulness-focused evaluation metrics for unsupervised explanations; (3) building explainers that adapt to live data streams; (4) crafting explanations tailored to specific domains; (5) adding human-in-the-loop methods that refine clusters and explanations together; and (6) improving our understanding of how time series clustering models work internally. By making interpretability a primary design goal rather than an afterthought, we propose the groundwork for the next generation of trustworthy deep clustering time series analytics.

Updated: 2025-07-28 13:50:10

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20840v1

BuildSTG: A Multi-building Energy Load Forecasting Method using Spatio-Temporal Graph Neural Network

Due to the extensive availability of operation data, data-driven methods show strong capabilities in predicting building energy loads. Buildings with similar features often share energy patterns, reflected by spatial dependencies in their operational data, which conventional prediction methods struggle to capture. To overcome this, we propose a multi-building prediction approach using spatio-temporal graph neural networks, comprising graph representation, graph learning, and interpretation. First, a graph is built based on building characteristics and environmental factors. Next, a multi-level graph convolutional architecture with attention is developed for energy prediction. Lastly, a method interpreting the optimized graph structure is introduced. Experiments on the Building Data Genome Project 2 dataset confirm superior performance over baselines such as XGBoost, SVR, FCNN, GRU, and Naive, highlighting the method's robustness, generalization, and interpretability in capturing meaningful building similarities and spatial relationships.
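
The "graph representation + graph learning" pipeline the abstract outlines can be caricatured in a few lines: a Gaussian-similarity adjacency built from static building features, followed by one attention-style propagation step over per-building embeddings. This is a hand-rolled numpy sketch under assumed shapes, not the BuildSTG architecture:

```python
import numpy as np

def build_graph(features, sigma=1.0):
    """Gaussian-similarity adjacency from static building features
    (e.g. floor area, usage type encoded numerically)."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def graph_conv_step(A, H, W):
    """One attention-like propagation: row-normalise A into weights,
    aggregate neighbour embeddings, apply a linear map + ReLU."""
    attn = A / (A.sum(1, keepdims=True) + 1e-12)
    return np.maximum(attn @ H @ W, 0.0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 3))   # 5 buildings, 3 static features (toy)
H = rng.normal(size=(5, 8))       # per-building load embeddings
W = rng.normal(size=(8, 8))       # learnable weights (here random)
A = build_graph(feats)
H1 = graph_conv_step(A, H, W)     # shape (5, 8)
```

Stacking several such steps with different weight matrices gives the multi-level convolutional structure the abstract refers to.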

Updated: 2025-07-28 13:47:36

Categories: cs.LG,stat.AP

Download: http://arxiv.org/abs/2507.20838v1

First Hallucination Tokens Are Different from Conditional Ones

Hallucination, the generation of untruthful content, is one of the major concerns regarding foundational models. Detecting hallucinations at the token level is vital for real-time filtering and targeted correction, yet the variation of hallucination signals within token sequences is not fully understood. Leveraging the RAGTruth corpus with token-level annotations and reproduced logits, we analyse how these signals depend on a token's position within hallucinated spans, contributing to an improved understanding of token-level hallucination. Our results show that the first hallucinated token carries a stronger signal and is more detectable than conditional tokens. We release our analysis framework, along with code for logit reproduction and metric computation at https://github.com/jakobsnl/RAGTruth_Xtended.
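
The distinction the abstract draws between first and conditional hallucinated tokens reduces to a simple indexing convention over per-token span labels; a minimal helper (hypothetical, not code from the released framework) might look like:

```python
def split_hallucination_positions(labels):
    """Given per-token binary hallucination labels, return the indices of
    the first token of each hallucinated span and of the conditional
    (subsequent) tokens inside a span."""
    first, conditional = [], []
    prev = 0
    for i, lab in enumerate(labels):
        if lab == 1:
            (first if prev == 0 else conditional).append(i)
        prev = lab
    return first, conditional

# two spans: tokens 1-3 and token 5
labels = [0, 1, 1, 1, 0, 1, 0]
first, conditional = split_hallucination_positions(labels)
```

Detection metrics are then computed separately over the `first` and `conditional` index sets to compare signal strength at each position type.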

Updated: 2025-07-28 13:44:21

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.20836v1

RF Challenge: The Data-Driven Radio Frequency Signal Separation Challenge

We address the critical problem of interference rejection in radio-frequency (RF) signals using a data-driven approach that leverages deep-learning methods. A primary contribution of this paper is the introduction of the RF Challenge, which is a publicly available, diverse RF signal dataset for data-driven analyses of RF signal problems. Specifically, we adopt a simplified signal model for developing and analyzing interference rejection algorithms. For this signal model, we introduce a set of carefully chosen deep learning architectures, incorporating key domain-informed modifications alongside traditional benchmark solutions to establish baseline performance metrics for this intricate, ubiquitous problem. Through extensive simulations involving eight different signal mixture types, we demonstrate the superior performance (in some cases, by two orders of magnitude) of architectures such as UNet and WaveNet over traditional methods like matched filtering and linear minimum mean square error estimation. Our findings suggest that the data-driven approach can yield scalable solutions, in the sense that the same architectures may be similarly trained and deployed for different types of signals. Moreover, these findings further corroborate the promising potential of deep learning algorithms for enhancing communication systems, particularly via interference mitigation. This work also includes results from an open competition based on the RF Challenge, hosted at the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'24).
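
One of the traditional baselines named in the abstract, matched filtering, fits in a few lines; the toy below embeds a known binary template in a noisy mixture and recovers its delay (an illustrative signal model, not the RF Challenge dataset or signal types):

```python
import numpy as np

def matched_filter(received, template):
    """Correlate the received mixture with a known template; the peak of
    |correlation| estimates where the signal of interest sits."""
    corr = np.correlate(received, template, mode="valid")
    return corr, int(np.argmax(np.abs(corr)))

rng = np.random.default_rng(1)
template = rng.choice([-1.0, 1.0], size=32)  # toy binary waveform
delay = 40
rx = rng.normal(scale=0.1, size=128)         # interference/noise floor (toy)
rx[delay:delay + 32] += template             # embed the signal at a known delay
_, est = matched_filter(rx, template)        # est recovers `delay`
```

Learned architectures such as UNet replace this fixed correlation with a data-driven separator, which is where the reported gains come from.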

Updated: 2025-07-28 13:40:42

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2409.08839v3

Combolutional Neural Networks

Selecting appropriate inductive biases is an essential step in the design of machine learning models, especially when working with audio, where even short clips may contain millions of samples. To this end, we propose the combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We demonstrate the efficacy of the combolutional layer on three information retrieval tasks, evaluate its computational cost relative to other audio frontends, and provide efficient implementations for training. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important, e.g., piano transcription, speaker classification, and key detection. Additionally, the combolutional layer has several other key benefits over existing frontends, namely: low parameter count, efficient CPU inference, strictly real-valued computations, and improved interpretability.
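
A minimal sketch of the idea behind the combolutional layer, with a fixed rather than learned delay: a feedback IIR comb filter followed by a one-pole envelope detector. A comb tuned to a tone's period reinforces it, while a detuned comb does not (parameters are illustrative, not the paper's):

```python
import numpy as np

def comb_envelope(x, delay, alpha=0.7, smooth=0.99):
    """Feedback comb filter y[n] = x[n] + alpha * y[n - delay], followed by a
    one-pole envelope detector on |y| -- a rough, fixed-delay stand-in for
    the learned-delay comb + fused envelope detector described above."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + (alpha * y[n - delay] if n >= delay else 0.0)
    env = np.zeros_like(y)
    acc = 0.0
    for n, v in enumerate(np.abs(y)):
        acc = smooth * acc + (1 - smooth) * v
        env[n] = acc
    return env

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)               # 200 Hz tone
env_match = comb_envelope(x, delay=fs // 200)  # comb tuned to the tone's period
env_miss = comb_envelope(x, delay=fs // 315)   # detuned comb
```

A bank of such filters with different (learned) delays yields a harmonic feature map in the time domain, which is the layer's replacement for a convolutional frontend.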

Updated: 2025-07-28 13:30:51

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2507.21202v1

On the similarity of bandwidth-tuned quantum kernels and classical kernels

Quantum kernels (QK) are widely used in quantum machine learning applications; yet, their potential to surpass classical machine learning methods on classical datasets remains uncertain. This limitation can be attributed to the exponential concentration phenomenon, which can impair generalization. A common strategy to alleviate this is bandwidth tuning, which involves rescaling data points in the quantum model to improve generalization. In this work, we numerically demonstrate that optimal bandwidth tuning results in QKs that closely resemble radial basis function (RBF) kernels, leading to a lack of quantum advantage over classical methods. Moreover, we reveal that the size of optimal bandwidth tuning parameters further simplifies QKs, causing them to behave like polynomial kernels, corresponding to a low-order Taylor approximation of a RBF kernel. We thoroughly investigate this for fidelity quantum kernels and projected quantum kernels using various data encoding circuits across several classification datasets. We provide numerical evidence and derive a simple analytical model that elucidates how bandwidth tuning influences key quantities in classification tasks. Overall, our findings shed light on the mechanisms that render QK methods classically tractable.
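
The claimed resemblance can be checked numerically in the simplest case, a product RX(c·x_i) encoding, whose fidelity kernel has the closed form ∏_i cos²(c(x_i − y_i)/2); for small bandwidth c it tracks an RBF kernel with γ = c²/4, since both agree to second order in c. This is a toy check under an assumed encoding, not the paper's circuits:

```python
import numpy as np

def fidelity_kernel(x, y, c):
    """Fidelity kernel of a product RX(c * x_i) encoding:
    k(x, y) = prod_i cos^2(c * (x_i - y_i) / 2)."""
    return float(np.prod(np.cos(c * (x - y) / 2) ** 2))

def rbf_kernel(x, y, gamma):
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

rng = np.random.default_rng(0)
x, y = rng.normal(size=4), rng.normal(size=4)
c = 0.1                                   # small bandwidth
qk = fidelity_kernel(x, y, c)
rbf = rbf_kernel(x, y, gamma=c**2 / 4)    # RBF with the matched second-order expansion
```

As c shrinks further, truncating the expansion reproduces the polynomial-kernel behaviour the abstract describes.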

Updated: 2025-07-28 13:30:42

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2503.05602v3

FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization

State Space Models (SSMs), like recent Mamba2, have achieved remarkable performance and received extensive attention. However, deploying Mamba2 on resource-constrained edge devices encounters many problems: severe outliers within the linear layer challenging the quantization, diverse and irregular element-wise tensor operations, and hardware-unfriendly nonlinear functions in the SSM block. To address these issues, this paper presents FastMamba, a dedicated accelerator on FPGA with hardware-algorithm co-design to promote the deployment efficiency of Mamba2. Specifically, we successfully achieve 8-bit quantization for linear layers through Hadamard transformation to eliminate outliers. Moreover, a hardware-friendly and fine-grained power-of-two quantization framework is presented for the SSM block and convolution layer, and a first-order linear approximation is developed to optimize the nonlinear functions. Based on the accurate algorithm quantization, we propose an accelerator that integrates parallel vector processing units, pipelined execution dataflow, and an efficient SSM Nonlinear Approximation Unit, which enhances computational efficiency and reduces hardware complexity. Finally, we evaluate FastMamba on Xilinx VC709 FPGA. For the input prefill task on Mamba2-130M, FastMamba achieves 68.80× and 8.90× speedup over Intel Xeon 4210R CPU and NVIDIA RTX 3090 GPU, respectively. In the output decode experiment with Mamba2-2.7B, FastMamba attains 6× higher energy efficiency than RTX 3090 GPU.
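
The Hadamard trick for taming outliers before 8-bit quantization can be demonstrated directly: rotating a vector by an orthonormal Walsh-Hadamard transform spreads a single large outlier across all coefficients, shrinking the quantization scale and hence the error. A minimal numpy sketch, assuming simple per-tensor absmax quantization rather than FastMamba's exact scheme:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform with orthonormal scaling (self-inverse)."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

def quant_dequant_int8(v):
    """Per-tensor absmax int8 quantization followed by dequantization."""
    scale = np.abs(v).max() / 127.0
    return np.round(v / scale).clip(-127, 127) * scale

x = np.full(64, 0.01)
x[7] = 10.0                            # a single severe outlier
err_plain = np.mean((quant_dequant_int8(x) - x) ** 2)
y = fwht(x)                            # outlier energy spread across coefficients
x_rec = fwht(quant_dequant_int8(y))    # invert (fwht is its own inverse)
err_had = np.mean((x_rec - x) ** 2)
```

Without the rotation, the outlier inflates the scale so much that the small entries all round to zero; after the rotation the reconstruction error drops by roughly an order of magnitude in this toy case.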

Updated: 2025-07-28 13:28:06

Categories: cs.AR,cs.AI

Download: http://arxiv.org/abs/2505.18975v4

Why Flow Matching is Particle Swarm Optimization?

This paper preliminarily investigates the duality between flow matching in generative models and particle swarm optimization (PSO) in evolutionary computation. Through theoretical analysis, we reveal the intrinsic connections between these two approaches in terms of their mathematical formulations and optimization mechanisms: the vector field learning in flow matching shares similar mathematical expressions with the velocity update rules in PSO; both methods follow the fundamental framework of progressive evolution from initial to target distributions; and both can be formulated as dynamical systems governed by ordinary differential equations. Our study demonstrates that flow matching can be viewed as a continuous generalization of PSO, while PSO provides a discrete implementation of swarm intelligence principles. This duality understanding establishes a theoretical foundation for developing novel hybrid algorithms and creates a unified framework for analyzing both methods. Although this paper only presents preliminary discussions, the revealed correspondences suggest several promising research directions, including improving swarm intelligence algorithms based on flow matching principles and enhancing generative models using swarm intelligence concepts.
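
The structural analogy the abstract describes can be seen by placing the two update rules side by side: an Euler step of a flow-matching ODE and a PSO velocity update, each pulling a population toward a target. This is a deliberately minimal 1-D toy with arbitrary coefficients, not a claim about either method's full form:

```python
import numpy as np

rng = np.random.default_rng(0)
target = 3.0

# Flow-matching view: Euler steps of dx/dt = u_t(x). For linear interpolation
# paths x_t = (1 - t) x_0 + t x_1 with a point-mass target, the velocity
# field is u_t(x) = (target - x) / (1 - t).
x_fm = rng.normal(size=100)
steps = 50
for k in range(steps):
    t = k / steps
    x_fm += (1.0 / steps) * (target - x_fm) / (1.0 - t)

# PSO view: a velocity update pulled toward the best-known position
# (single attractor term; inertia 0.7 and coefficient 1.5 are arbitrary).
x_pso = rng.normal(size=100)
v = np.zeros_like(x_pso)
for _ in range(200):
    v = 0.7 * v + 1.5 * rng.uniform(size=100) * (target - x_pso)
    x_pso += v
```

Both rules have the same "current state plus a step along a target-seeking velocity" shape; flow matching takes that step along a learned continuous-time field, while PSO takes it along a stochastic discrete-time one.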

Updated: 2025-07-28 13:21:14

Categories: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.20810v1

Collusion Resistant DNS With Private Information Retrieval

There has been a growing interest in Internet user privacy, demonstrated by the popularity of privacy-preserving products such as Telegram and Brave, and the widespread adoption of HTTPS. The Domain Name System (DNS) is a key component of Internet-based communication and its privacy has been neglected for years. Recently, DNS over HTTPS (DoH) has improved the situation by fixing the issue of in-path middleboxes. Further progress has been made with proxy-based solutions such as Oblivious DoH (ODoH), which separate a user's identity from their DNS queries. However, these solutions rely on non-collusion assumptions between DNS resolvers and proxies -- an assumption difficult to guarantee in practice. To address this, we explore integrating single-server Private Information Retrieval (PIR) into DNS to enable encrypted query processing without relying on trust assumptions. However, applying PIR to DNS is challenging due to its hierarchical nature -- particularly, interactions with recursive resolvers can still leak information. Navigating performance and privacy trade-offs, we propose PDNS, a DNS extension leveraging single-server PIR to strengthen privacy guarantees. We have implemented a prototype of PDNS and compared its performance against state-of-the-art solutions via trace-driven experiments. The results show that PDNS achieves acceptable performance (2x faster than DoH over Tor with similar privacy guarantees) and strong privacy guarantees today, mainly at the cost of its scalability, which specialized hardware for PIR can address in the near future.

Updated: 2025-07-28 13:17:25

Subjects: cs.NI,cs.CR

Download: http://arxiv.org/abs/2507.20806v1

Understanding Bias in Perceiving Dimensionality Reduction Projections

Selecting the dimensionality reduction technique that faithfully represents the structure is essential for reliable visual communication and analytics. In reality, however, practitioners favor projections for other attractions, such as aesthetics and visual saliency, over the projection's structural faithfulness, a bias we define as visual interestingness. In this research, we conduct a user study that (1) verifies the existence of such bias and (2) explains why the bias exists. Our study suggests that visual interestingness biases practitioners' preferences when selecting projections for analysis, and this bias intensifies with color-encoded labels and shorter exposure time. Based on our findings, we discuss strategies to mitigate bias in perceiving and interpreting DR projections.

Updated: 2025-07-28 13:17:07

Subjects: cs.HC,cs.LG

Download: http://arxiv.org/abs/2507.20805v1

MMGraphRAG: Bridging Vision and Language with Interpretable Multimodal Knowledge Graphs

Retrieval-Augmented Generation (RAG) enhances language model generation by retrieving relevant information from external knowledge bases. However, conventional RAG methods face the issue of missing multimodal information. Multimodal RAG methods address this by fusing images and text through mapping them into a shared embedding space, but they fail to capture the structure of knowledge and logical chains between modalities. Moreover, they also require large-scale training for specific tasks, resulting in limited generalizing ability. To address these limitations, we propose MMGraphRAG, which refines visual content through scene graphs and constructs a multimodal knowledge graph (MMKG) in conjunction with text-based KG. It employs spectral clustering to achieve cross-modal entity linking and retrieves context along reasoning paths to guide the generative process. Experimental results show that MMGraphRAG achieves state-of-the-art performance on the DocBench and MMLongBench datasets, demonstrating strong domain adaptability and clear reasoning paths.

Updated: 2025-07-28 13:16:23

Subjects: cs.AI

Download: http://arxiv.org/abs/2507.20804v1

Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models

Background: Despite the current ubiquity of Large Language Models (LLMs) across the medical domain, there is a surprising lack of studies which address their reasoning behaviour. We emphasise the importance of understanding reasoning behaviour as opposed to high-level prediction accuracies, since it is equivalent to explainable AI (XAI) in this context. In particular, achieving XAI in medical LLMs used in the clinical domain will have a significant impact across the healthcare sector. Results: Therefore, in this work, we adapt the existing concept of reasoning behaviour and articulate its interpretation within the specific context of medical LLMs. We survey and categorise current state-of-the-art approaches for modeling and evaluating reasoning in medical LLMs. Additionally, we propose theoretical frameworks which can empower medical professionals or machine learning engineers to gain insight into the low-level reasoning operations of these previously obscure models. We also outline key open challenges facing the development of Large Reasoning Models. Conclusion: The subsequent increased transparency and trust in medical machine learning models by clinicians as well as patients will accelerate the integration, application as well as further development of medical AI for the healthcare system as a whole.

Updated: 2025-07-28 13:13:02

Subjects: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2412.15748v2

A Comprehensive Quantification of Inconsistencies in Memory Dumps

Memory forensics is a powerful technique commonly adopted to investigate compromised machines and to detect stealthy computer attacks that do not store data on non-volatile storage. To employ this technique effectively, the analyst has to first acquire a faithful copy of the system's volatile memory after the incident. However, almost all memory acquisition tools capture the content of physical memory without stopping the system's activity and by following the ascending order of the physical pages, which can lead to inconsistencies and errors in the dump. In this paper we developed a system to track all write operations performed by the OS kernel during a memory acquisition process. This allows us to quantify, for the first time, the exact number and type of inconsistencies observed in memory dumps. We examine the runtime activity of three different operating systems and the way they manage physical memory. Then, focusing on Linux, we quantify how different acquisition modes, file systems, and hardware targets influence the frequency of kernel writes during the dump. We also analyze the impact of inconsistencies on the reconstruction of page tables and major kernel data structures used by Volatility to extract forensic artifacts. Our results show that inconsistencies are very common and that their presence can undermine the reliability and validity of memory forensics analysis.

Updated: 2025-07-28 13:11:18

Subjects: cs.CR,cs.OS,D.4.6

Download: http://arxiv.org/abs/2503.15065v2

Execution-time opacity control for timed automata

Timing leaks in timed automata (TA) can occur whenever an attacker is able to deduce a secret by observing some timed behaviour. In execution-time opacity, the attacker aims at deducing whether a private location was visited, by observing only the execution time. In earlier work, it was shown that it can be decided whether a TA is opaque in this setting. In this work, we address control, and investigate whether a TA can be controlled by a strategy at runtime to ensure opacity, by enabling or disabling some controllable actions over time. We first show that, in general, it is undecidable to determine whether such a strategy exists. Second, we show that deciding whether a meta-strategy ensuring opacity exists can be done in EXPSPACE. Such a meta-strategy is a set of strategies allowing an arbitrarily large -- yet finite -- number of strategy changes per time unit, and with only weak ordering relations between such changes. Our method is constructive, in the sense that we can exhibit such a meta-strategy. We also extend our method to the case of weak opacity, when it is harmless that the attacker deduces that the private location was not visited. Finally, we consider a variant where the attacker cannot have an infinite precision in its observations.

Updated: 2025-07-28 13:10:17

Subjects: cs.CR

Download: http://arxiv.org/abs/2409.10336v3

LanternNet: A Novel Hub-and-Spoke System to Seek and Suppress Spotted Lanternfly Populations

The invasive spotted lanternfly (SLF) poses a significant threat to agriculture and ecosystems, causing widespread damage. Current control methods, such as egg scraping, pesticides, and quarantines, prove labor-intensive, environmentally hazardous, and inadequate for long-term SLF suppression. This research introduces LanternNet, a novel autonomous robotic Hub-and-Spoke system designed for scalable detection and suppression of SLF populations. A central, tree-mimicking hub utilizes a YOLOv8 computer vision model for precise SLF identification. Three specialized robotic spokes perform targeted tasks: pest neutralization, environmental monitoring, and navigation/mapping. Field deployment across multiple infested sites over 5 weeks demonstrated LanternNet's efficacy. Quantitative analysis revealed significant reductions (p < 0.01, paired t-tests) in SLF populations and corresponding improvements in tree health indicators across the majority of test sites. Compared to conventional methods, LanternNet offers substantial cost advantages and improved scalability. Furthermore, the system's adaptability for enhanced autonomy and targeting of other invasive species presents significant potential for broader ecological impact. LanternNet demonstrates the transformative potential of integrating robotics and AI for advanced invasive species management and improved environmental outcomes.

Updated: 2025-07-28 13:08:33

Subjects: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2507.20800v1

Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach

Understanding how large language model (LLM) agents behave in strategic interactions is essential as these systems increasingly participate autonomously in economically and morally consequential decisions. We evaluate LLM preferences using canonical economic games, finding substantial deviations from human behavior. Models like GPT-4o show excessive cooperation and limited incentive sensitivity, while reasoning models, such as o3-mini, align more consistently with payoff-maximizing strategies. We propose a supervised fine-tuning pipeline that uses synthetic datasets derived from economic reasoning to align LLM agents with economic preferences, focusing on two stylized preference structures. In the first, utility depends only on individual payoffs (homo economicus), while utility also depends on a notion of Kantian universalizability in the second preference structure (homo moralis). We find that fine-tuning based on small datasets shifts LLM agent behavior toward the corresponding economic agent. We further assess the fine-tuned agents' behavior in two applications: Moral dilemmas involving autonomous vehicles and algorithmic pricing in competitive markets. These examples illustrate how different normative objectives, embedded via realizations of structured preferences, can influence market and moral outcomes. This work contributes a replicable, cost-efficient, and economically grounded pipeline to align AI preferences using moral-economic principles.
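
The two stylized preference structures can be made concrete with the standard homo moralis utility from the economics literature (Alger and Weibull): a convex combination of the material payoff and a Kantian term that evaluates an action as if the other player universalized it. The prisoner's-dilemma payoff numbers below are illustrative, and the paper's exact parameterization may differ:

```python
def homo_moralis_utility(payoff, my_action, other_action, kappa):
    # Convex combination of the material payoff and a Kantian term: the payoff
    # the agent would receive if the other player chose the agent's own action.
    # kappa = 0 recovers homo economicus; kappa = 1 is a fully Kantian agent.
    material = payoff(my_action, other_action)
    kantian = payoff(my_action, my_action)
    return (1.0 - kappa) * material + kappa * kantian

# Illustrative prisoner's dilemma row payoffs (C = cooperate, D = defect).
pd = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}
payoff = lambda mine, theirs: pd[(mine, theirs)]
```

At kappa = 0 the agent defects against a cooperator (payoff 5 vs 3); at kappa = 1 it evaluates defection as if universalized (payoff 1) and prefers cooperation (payoff 3).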

Updated: 2025-07-28 13:05:04

Subjects: econ.GN,cs.AI,cs.LG,q-fin.EC

Download: http://arxiv.org/abs/2507.20796v1

APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression. The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant. The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable. We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69% test accuracy within 11 epochs using approximately 332K trainable parameters. The results highlight the superior expressiveness and computational efficiency of the APTx Neuron compared to traditional neurons, pointing toward a new paradigm in unified neuron design and the architectures built upon it.
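
The stated functional form is straightforward to implement directly; a minimal NumPy sketch of the forward pass (the actual architecture trains alpha, beta, gamma, and delta by backpropagation in a deep-learning framework, which this sketch omits):

```python
import numpy as np

def aptx_neuron(x, alpha, beta, gamma, delta):
    # y = sum_i (alpha_i + tanh(beta_i * x_i)) * gamma_i * x_i + delta
    # alpha, beta, gamma are per-input parameter vectors; delta is a scalar bias.
    return float(np.sum((alpha + np.tanh(beta * x)) * gamma * x) + delta)

# Example: with alpha = gamma = 1 and delta = 0, each term is (1 + tanh(x_i)) * x_i,
# a SiLU-like gated response -- the activation is built into the neuron itself.
x = np.array([0.0, 1.0, -1.0])
y = aptx_neuron(x, alpha=np.ones(3), beta=np.ones(3), gamma=np.ones(3), delta=0.0)
# Here the x = 1 and x = -1 terms sum to exactly 2 * tanh(1).
```

Because the non-linearity is inside the summand, no separate activation layer follows the neuron, which is the source of the claimed efficiency.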

Updated: 2025-07-28 13:04:47

Subjects: cs.NE,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2507.14270v3

FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings

Efficient preference optimization algorithms such as Direct Preference Optimization (DPO) have become a popular approach in aligning large language models (LLMs) with human preferences. These algorithms implicitly treat the LLM as a reward model, and focus on training it to correct misranked preference pairs. However, recent work~\citep{chen2024preference} empirically finds that DPO training \textit{rarely improves these misranked preference pairs}, despite its gradient emphasizing these cases. We introduce FocalPO, a DPO variant that instead \textit{down-weighs} misranked preference pairs and prioritizes enhancing the model's understanding of pairs that it can already rank correctly. Inspired by Focal Loss used in vision tasks, FocalPO achieves this by adding a modulating factor to dynamically scale DPO loss. Our experiment demonstrates that FocalPO surpasses DPO and its variants on popular benchmarks like Alpaca Eval 2.0 using Mistral-Base-7B and Llama-3-Instruct-8B, with the introduced hyperparameter fixed. Additionally, we empirically reveal how FocalPO affects training on correct and incorrect sample groups, further underscoring its effectiveness.
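
One plausible instantiation of the described modulating factor, sketched under the assumption that the per-pair DPO loss is scaled by p^gamma with p the model's implicit probability of the correct ranking (the paper's exact factor may differ):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dpo_loss(margin, beta=0.1):
    # Standard per-pair DPO loss. `margin` is the policy's log-prob gap
    # (chosen minus rejected) minus the same gap under the reference model;
    # a positive margin means the pair is already ranked correctly.
    return -math.log(sigmoid(beta * margin))

def focal_dpo_loss(margin, beta=0.1, gamma=2.0):
    # Focal-style modulation: p = sigmoid(beta * margin) is the implicit
    # probability of the correct ranking, so misranked pairs (small p)
    # are down-weighed rather than emphasized.
    p = sigmoid(beta * margin)
    return (p ** gamma) * dpo_loss(margin, beta)
```

This inverts the usual Focal Loss weighting, which up-weights hard examples; here hard (misranked) pairs receive a small factor, matching the abstract's description.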

Updated: 2025-07-28 13:00:58

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2501.06645v3

Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs

Psychology research has shown that humans are poor at estimating their performance on tasks, tending towards underconfidence on easy tasks and overconfidence on difficult tasks. We examine three LLMs, Llama-3-70B-instruct, Claude-3-Sonnet, and GPT-4o, on a range of QA tasks of varying difficulty, and show that models exhibit subtle differences from human patterns of overconfidence: less sensitive to task difficulty, and when prompted to answer based on different personas -- e.g., expert vs layman, or different race, gender, and ages -- the models will respond with stereotypically biased confidence estimations even though their underlying answer accuracy remains the same. Based on these observations, we propose Answer-Free Confidence Estimation (AFCE) to improve confidence calibration and LLM interpretability in these settings. AFCE is a self-assessment method that employs two stages of prompting, first eliciting only confidence scores on questions, then asking separately for the answer. Experiments on the MMLU and GPQA datasets spanning subjects and difficulty show that this separation of tasks significantly reduces overconfidence and delivers more human-like sensitivity to task difficulty.
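
The two-stage AFCE protocol can be sketched as prompt construction, with confidence elicited first and the answer requested in a separate turn; the prompt wording here is illustrative, not the paper's exact templates:

```python
def afce_prompts(question):
    # Stage 1: elicit only a confidence score, with no answer requested.
    stage1 = (
        f"Question: {question}\n"
        "Do not answer the question. On a scale of 0-100, how confident are "
        "you that you could answer it correctly? Reply with a number only."
    )
    # Stage 2: ask separately for the answer itself.
    stage2 = f"Question: {question}\nProvide your answer."
    return stage1, stage2
```

Separating the confidence query from answer generation is what the abstract credits with reducing overconfidence, since the model never conditions its confidence on a freshly produced answer.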

Updated: 2025-07-28 12:59:13

Categories: cs.AI, I.2.7

Download: http://arxiv.org/abs/2506.00582v2
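The two-stage prompting behind AFCE can be sketched directly. This is a hedged sketch: `llm` is a hypothetical callable standing in for a real model API, and the prompt wording is illustrative rather than taken from the paper.

```python
def afce(llm, question: str):
    """Answer-Free Confidence Estimation (two-stage prompting sketch).

    Stage 1 elicits only a confidence score, explicitly without answering;
    stage 2 asks for the answer in a separate prompt, so the stated
    confidence cannot anchor on a generated answer.
    """
    stage1 = (
        "Do NOT answer the question. On a scale of 0-100, how confident are "
        f"you that you could answer it correctly?\nQuestion: {question}\n"
        "Confidence:"
    )
    confidence = int(llm(stage1).strip())
    stage2 = f"Question: {question}\nAnswer:"
    answer = llm(stage2).strip()
    return confidence, answer
```

Any prompt-to-completion function can be dropped in for `llm`, which makes the separation of the two stages easy to test in isolation.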

Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps

Most autonomous cars rely on the availability of high-definition (HD) maps. Current research aims to address this constraint by directly predicting HD map elements from onboard sensors and reasoning about the relationships between the predicted map and traffic elements. Despite recent advancements, the coherent online construction of HD maps remains a challenging endeavor, as it necessitates modeling the high complexity of road topologies in a unified and consistent manner. To address this challenge, we propose a coherent approach to predict lane segments and their corresponding topology, as well as road boundaries, all by leveraging prior map information represented by commonly available standard-definition (SD) maps. We propose a network architecture, which leverages hybrid lane segment encodings comprising prior information and denoising techniques to enhance training stability and performance. Furthermore, we incorporate past frames for temporal consistency. Our experimental evaluation demonstrates that our approach outperforms previous methods by a large margin, highlighting the benefits of our modeling scheme.

Updated: 2025-07-28 12:57:34

Categories: cs.CV, cs.LG

Download: http://arxiv.org/abs/2507.01397v2

Investigation of Accuracy and Bias in Face Recognition Trained with Synthetic Data

Synthetic data has emerged as a promising alternative for training face recognition (FR) models, offering advantages in scalability, privacy compliance, and potential for bias mitigation. However, critical questions remain on whether both high accuracy and fairness can be achieved with synthetic data. In this work, we evaluate the impact of synthetic data on bias and performance of FR systems. We generate a balanced face dataset, FairFaceGen, using two state-of-the-art text-to-image generators, Flux.1-dev and Stable Diffusion v3.5 (SD35), and combine them with several identity augmentation methods, including Arc2Face and four IP-Adapters. By maintaining an equal identity count across synthetic and real datasets, we ensure fair comparisons when evaluating FR performance on standard (LFW, AgeDB-30, etc.) and challenging IJB-B/C benchmarks and FR bias on the Racial Faces in-the-Wild (RFW) dataset. Our results demonstrate that although synthetic data still lags behind the real datasets in the generalization on IJB-B/C, demographically balanced synthetic datasets, especially those generated with SD35, show potential for bias mitigation. We also observe that the number and quality of intra-class augmentations significantly affect FR accuracy and fairness. These findings provide practical guidelines for constructing fairer FR systems using synthetic data.

Updated: 2025-07-28 12:52:23

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2507.20782v1

Dragonfly: a modular deep reinforcement learning library

Dragonfly is a deep reinforcement learning library focused on modularity, in order to ease experimentation and development. It relies on a JSON serialization that allows swapping building blocks and performing parameter sweeps while minimizing code maintenance. Some of its features are specifically designed for CPU-intensive environments, such as numerical simulations. Its performance on standard agents using common benchmarks compares favorably with the literature.

Updated: 2025-07-28 12:45:49

Categories: cs.LG

Download: http://arxiv.org/abs/2505.03778v2
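The JSON-driven block swapping the abstract describes can be sketched as a small registry. This is a hedged illustration: the registry keys, config schema, and `build_from_json` / `parameter_sweep` names are invented for this sketch and do not reflect Dragonfly's actual API.

```python
import json

# Hypothetical registry of interchangeable building blocks; a real library
# would map names to agent/optimizer classes rather than tuples.
OPTIMIZERS = {
    "adam": lambda cfg: ("Adam", cfg["lr"]),
    "sgd": lambda cfg: ("SGD", cfg["lr"]),
}

def build_from_json(config_json: str):
    """Swap building blocks by editing a JSON string, not Python code."""
    cfg = json.loads(config_json)
    return OPTIMIZERS[cfg["optimizer"]](cfg)

def parameter_sweep(base_cfg: dict, lrs):
    """Generate one serialized config per learning rate for a sweep."""
    return [json.dumps({**base_cfg, "lr": lr}) for lr in lrs]
```

Because every experiment is a serialized config, a parameter sweep is just a list of JSON strings, which is the maintenance-minimizing property the abstract highlights.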

evalSmarT: An LLM-Based Framework for Evaluating Smart Contract Generated Comments

Smart contract comment generation has gained traction as a means to improve code comprehension and maintainability in blockchain systems. However, evaluating the quality of generated comments remains a challenge. Traditional metrics such as BLEU and ROUGE fail to capture domain-specific nuances, while human evaluation is costly and unscalable. In this paper, we present \texttt{evalSmarT}, a modular and extensible framework that leverages large language models (LLMs) as evaluators. The system supports over 400 evaluator configurations by combining approximately 40 LLMs with 10 prompting strategies. We demonstrate its application in benchmarking comment generation tools and selecting the most informative outputs. Our results show that prompt design significantly impacts alignment with human judgment, and that LLM-based evaluation offers a scalable and semantically rich alternative to existing methods.

Updated: 2025-07-28 12:37:43

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20774v1
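The "over 400 evaluator configurations" arise from crossing roughly 40 LLMs with 10 prompting strategies. A trivial sketch of that grid, with placeholder identifiers in place of the paper's actual model and prompt names:

```python
from itertools import product

# Placeholder names; the real framework would list concrete LLMs and prompts.
models = [f"llm_{i}" for i in range(40)]
strategies = [f"strategy_{j}" for j in range(10)]

# Each evaluator configuration pairs one model with one prompting strategy,
# giving 40 x 10 = 400 configurations.
evaluator_configs = list(product(models, strategies))
```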

Satellite-Surface-Area Machine-Learning Models for Reservoir Storage Estimation: Regime-Sensitive Evaluation and Operational Deployment at Loskop Dam, South Africa

Reliable daily estimates of reservoir storage are pivotal for water allocation and drought response decisions in semiarid regions. Conventional rating curves at Loskop Dam, the primary storage on South Africa's Olifants River, have become increasingly uncertain owing to sedimentation and episodic drawdown. A 40-year Digital Earth Africa (DEA) surface-area archive (1984-2024) was fused with gauged water levels to develop data-driven volume predictors that operate under a maximum 9.14%, 90-day drawdown constraint. Four nested feature sets were examined: (i) raw water area, (ii) + a power-law "calculated volume" proxy, (iii) + six river-geometry metrics, and (iv) + full supply elevation. Five candidate algorithms, Gradient Boosting (GB), Random Forest (RF), Ridge (RI), Lasso (LA) and Elastic Net (EN), were tuned using a 20-draw random search and assessed with a five-fold Timeseries Split to eliminate look-ahead bias. Prediction errors were decomposed into two regimes: Low (<250 x 10^6 cubic meters) and High (>250 x 10^6 cubic meters) storage regimes. Ridge regression achieved the lowest cross-validated RMSE (12.3 x 10^6 cubic meters), outperforming GB by 16% and RF by 7%. In regime terms, Ridge was superior in the Low band (18.0 vs. 22.7 MCM for GB) and tied RF in the High band (~12 MCM). In-sample diagnostics showed GB's apparent dominance (6.8-5.4 MCM) to be an artefact of overfitting. A Ridge meta-stacked ensemble combining GB, RF, and Ridge reduced full-series RMSE to ~11 MCM (~3% of live capacity). We recommend (i) GB retrained daily for routine operations, (ii) Ridge for drought early warning, and (iii) the stacked blend for all-weather dashboards. Quarterly rolling retraining and regime-specific metrics are advised to maintain operational accuracy below the 5% threshold mandated by the Department of Water and Sanitation.

Updated: 2025-07-28 12:21:18

Categories: cs.LG, I.m

Download: http://arxiv.org/abs/2502.19989v3
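The five-fold Timeseries Split the abstract uses to eliminate look-ahead bias can be sketched as an expanding window, where every test index comes strictly after all training indices of its fold. This stand-alone version only approximates scikit-learn's `TimeSeriesSplit` behavior, which the authors presumably used; fold sizing here is a simplifying assumption.

```python
def time_series_splits(n_samples: int, n_splits: int = 5):
    """Expanding-window cross-validation splits.

    Each fold trains on a prefix of the series and tests on the block that
    immediately follows it, so no fold can peek at future observations.
    """
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train = list(range(0, i * fold))
        test = list(range(i * fold, (i + 1) * fold))
        yield train, test
```

Contrast this with ordinary k-fold shuffling, which would let a model trained partly on 2024 observations be scored on 1990s water levels, inflating apparent skill.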

How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation

Chain-of-Thought (CoT) prompting significantly enhances model reasoning, yet its internal mechanisms remain poorly understood. We analyze CoT's operational principles by reversely tracing information flow across decoding, projection, and activation phases. Our quantitative analysis suggests that CoT may serve as a decoding space pruner, leveraging answer templates to guide output generation, with higher template adherence strongly correlating with improved performance. Furthermore, we surprisingly find that CoT modulates neuron engagement in a task-dependent manner: reducing neuron activation in open-domain tasks, yet increasing it in closed-domain scenarios. These findings offer a novel mechanistic interpretability framework and critical insights for enabling targeted CoT interventions to design more efficient and robust prompts. We released our code and data at https://anonymous.4open.science/r/cot-D247.

Updated: 2025-07-28 12:11:16

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20758v1

Learning to See Inside Opaque Liquid Containers using Speckle Vibrometry

Computer vision seeks to infer a wide range of information about objects and events. However, vision systems based on conventional imaging are limited to extracting information only from the visible surfaces of scene objects. For instance, a vision system can detect and identify a Coke can in the scene, but it cannot determine whether the can is full or empty. In this paper, we aim to expand the scope of computer vision to include the novel task of inferring the hidden liquid levels of opaque containers by sensing the tiny vibrations on their surfaces. Our method provides a first-of-a-kind way to inspect the fill level of multiple sealed containers remotely, at once, without needing physical manipulation and manual weighing. First, we propose a novel speckle-based vibration sensing system for simultaneously capturing scene vibrations on a 2D grid of points. We use our system to efficiently and remotely capture a dataset of vibration responses for a variety of everyday liquid containers. Then, we develop a transformer-based approach for analyzing the captured vibrations and classifying the container type and its hidden liquid level at the time of measurement. Our architecture is invariant to the vibration source, yielding correct liquid level estimates for controlled and ambient scene sound sources. Moreover, our model generalizes to unseen container instances within known classes (e.g., training on five Coke cans of a six-pack, testing on a sixth) and fluid levels. We demonstrate our method by recovering liquid levels from various everyday containers.

Updated: 2025-07-28 12:11:12

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2507.20757v1

Beyond Listenership: AI-Predicted Interventions Drive Improvements in Maternal Health Behaviours

Automated voice calls with health information are a proven method for disseminating maternal and child health information among beneficiaries and are deployed in several programs around the world. However, these programs often suffer from beneficiary dropoffs and poor engagement. In previous work, through real-world trials, we showed that an AI model, specifically a restless bandit model, could identify beneficiaries who would benefit most from live service call interventions, preventing dropoffs and boosting engagement. However, one key question has remained open so far: does such improved listenership via AI-targeted interventions translate into beneficiaries' improved knowledge and health behaviors? We present a first study that shows not only listenership improvements due to AI interventions, but also simultaneously links these improvements to health behavior changes. Specifically, we demonstrate that AI-scheduled interventions, which enhance listenership, lead to statistically significant improvements in beneficiaries' health behaviors such as taking iron or calcium supplements in the postnatal period, as well as understanding of critical health topics during pregnancy and infancy. This underscores the potential of AI to drive meaningful improvements in maternal and child health.

Updated: 2025-07-28 12:06:22

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20755v1

Enhancing Jailbreak Attacks on LLMs via Persona Prompts

Jailbreak attacks aim to exploit large language models (LLMs) by inducing them to generate harmful content, thereby revealing their vulnerabilities. Understanding and addressing these attacks is crucial for advancing the field of LLM safety. Previous jailbreak approaches have mainly focused on direct manipulations of harmful intent, with limited attention to the impact of persona prompts. In this study, we systematically explore the efficacy of persona prompts in compromising LLM defenses. We propose a genetic algorithm-based method that automatically crafts persona prompts to bypass LLM's safety mechanisms. Our experiments reveal that: (1) our evolved persona prompts reduce refusal rates by 50-70% across multiple LLMs, and (2) these prompts demonstrate synergistic effects when combined with existing attack methods, increasing success rates by 10-20%. Our code and data are available at https://github.com/CjangCjengh/Generic_Persona.

Updated: 2025-07-28 12:03:22

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2507.22171v1

Industry Insights from Comparing Deep Learning and GBDT Models for E-Commerce Learning-to-Rank

In e-commerce recommender and search systems, tree-based models such as LambdaMART have set a strong baseline for Learning-to-Rank (LTR) tasks. Despite their effectiveness and widespread adoption in industry, the debate continues over whether deep neural networks (DNNs) can outperform traditional tree-based models in this domain. To contribute to this discussion, we systematically benchmark DNNs against our production-grade LambdaMART model. We evaluate multiple DNN architectures and loss functions on a proprietary dataset from OTTO and validate our findings through an 8-week online A/B test. The results show that a simple DNN architecture outperforms a strong tree-based baseline in terms of total clicks and revenue, while achieving parity in total units sold.
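Offline comparisons between LTR models are typically scored with NDCG before anything is exposed to an online A/B test. A minimal sketch of NDCG@k (not OTTO's actual evaluation code) shows the metric both model families optimize against:

```python
import numpy as np

def ndcg_at_k(relevance, scores, k=10):
    """NDCG@k for one query: rank items by model score and compare the
    discounted cumulative gain against the ideal (relevance-sorted)
    ranking, using the common exponential gain 2^rel - 1."""
    relevance = np.asarray(relevance, dtype=float)
    order = np.argsort(-np.asarray(scores))[:k]      # model's ranking
    ideal = np.sort(relevance)[::-1][:k]             # best possible
    discounts = 1.0 / np.log2(np.arange(2, len(order) + 2))
    dcg = np.sum((2 ** relevance[order] - 1) * discounts)
    idcg = np.sum((2 ** ideal - 1) * discounts[: len(ideal)])
    return dcg / idcg if idcg > 0 else 0.0

# A model that ranks items exactly by relevance scores NDCG = 1.0.
score = ndcg_at_k([3, 0, 1], [0.9, 0.1, 0.5], k=3)  # → 1.0
```

Tree-based rankers like LambdaMART directly target gradients derived from NDCG swaps, which is one reason they remain such a strong baseline.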

Updated: 2025-07-28 12:02:02

Categories: cs.IR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.20753v1

Multilingual Self-Taught Faithfulness Evaluators

The growing use of large language models (LLMs) has increased the need for automatic evaluation systems, particularly to address the challenge of information hallucination. Although existing faithfulness evaluation approaches have shown promise, they are predominantly English-focused and often require expensive human-labeled training data for fine-tuning specialized models. As LLMs see increased adoption in multilingual contexts, there is a need for accurate faithfulness evaluators that can operate across languages without extensive labeled data. This paper presents Self-Taught Evaluators for Multilingual Faithfulness, a framework that learns exclusively from synthetic multilingual summarization data while leveraging cross-lingual transfer learning. Through experiments comparing language-specific and mixed-language fine-tuning approaches, we demonstrate a consistent relationship between an LLM's general language capabilities and its performance in language-specific evaluation tasks. Our framework shows improvements over existing baselines, including state-of-the-art English evaluators and machine translation-based approaches.

Updated: 2025-07-28 12:01:59

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2507.20752v1

AR-LIF: Adaptive reset leaky-integrate and fire neuron for spiking neural networks

Spiking neural networks possess the advantage of low energy consumption due to their event-driven nature. Compared with binary spike outputs, their inherent floating-point dynamics are more worthy of attention. The threshold level and reset mode of neurons play a crucial role in determining the number and timing of spikes. The existing hard reset method causes information loss, while the improved soft reset method treats all neurons uniformly. To address this, this paper designs an adaptive reset neuron that establishes the correlation between input, output, and reset, and integrates a simple yet effective threshold adjustment strategy. It achieves excellent performance on various datasets while maintaining the advantage of low energy consumption.
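The three reset modes contrasted in the abstract can be illustrated in a single LIF update. The "adaptive" branch below is a simplified stand-in for AR-LIF (the reset amount grows with the overshoot, coupling the reset to input and output); the 0.5 coupling factor is illustrative, not the paper's parameterization.

```python
def lif_step(v, x, v_th=1.0, decay=0.5, reset="adaptive"):
    """One step of a leaky integrate-and-fire neuron under three reset
    modes: 'hard' zeroes the membrane potential (discarding overshoot
    information), 'soft' subtracts the fixed threshold uniformly, and
    'adaptive' subtracts an amount scaled by the overshoot itself."""
    v = decay * v + x                      # leak, then integrate input
    spike = 1.0 if v >= v_th else 0.0
    if reset == "hard":
        v *= 1.0 - spike                   # overshoot lost entirely
    elif reset == "soft":
        v -= spike * v_th                  # same subtraction for all
    else:  # adaptive
        v -= spike * (v_th + 0.5 * (v - v_th))
    return v, spike

# Same overshoot (v = 1.6), three different post-spike potentials:
print(lif_step(0.0, 1.6, reset="hard")[0])      # 0.0
print(lif_step(0.0, 1.6, reset="soft")[0])      # ~0.6
print(lif_step(0.0, 1.6, reset="adaptive")[0])  # ~0.3
```

The hard reset discards how far the membrane exceeded the threshold, which is exactly the information loss the paper's adaptive reset is designed to avoid.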

Updated: 2025-07-28 11:54:48

Categories: cs.NE,cs.AI,cs.CV

Download: http://arxiv.org/abs/2507.20746v1

Regularizing Subspace Redundancy of Low-Rank Adaptation

Low-Rank Adaptation (LoRA) and its variants have delivered strong capability in Parameter-Efficient Transfer Learning (PETL) by minimizing trainable parameters and benefiting from reparameterization. However, their projection matrices remain unrestricted during training, causing high representation redundancy and diminishing the effectiveness of feature adaptation in the resulting subspaces. While existing methods mitigate this by manually adjusting the rank or implicitly applying channel-wise masks, they lack flexibility and generalize poorly across various datasets and architectures. Hence, we propose ReSoRA, a method that explicitly models redundancy between mapping subspaces and adaptively Regularizes Subspace redundancy of Low-Rank Adaptation. Specifically, it theoretically decomposes the low-rank submatrices into multiple equivalent subspaces and systematically applies de-redundancy constraints to the feature distributions across different projections. Extensive experiments validate that our proposed method consistently facilitates existing state-of-the-art PETL methods across various backbones and datasets in vision-language retrieval and standard visual classification benchmarks. Besides, as a training supervision, ReSoRA can be seamlessly integrated into existing approaches in a plug-and-play manner, with no additional inference costs. Code is publicly available at: https://github.com/Lucenova/ReSoRA.
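The core quantity ReSoRA regularizes can be sketched at a shape level: split a low-rank projection into equivalent sub-subspaces and penalize how similar they are. The real method constrains feature distributions across projections during training; the mean-absolute-cosine penalty below is only a toy illustration of the redundancy measure.

```python
import numpy as np

def subspace_redundancy_penalty(B, n_groups):
    """Toy redundancy penalty: view the r columns of a low-rank
    projection B (d x r) as n_groups sub-subspaces and return the mean
    absolute cosine similarity between their mean directions.
    Requires r to be divisible by n_groups."""
    groups = np.split(B, n_groups, axis=1)
    dirs = []
    for g in groups:
        m = g.mean(axis=1)                       # one direction per group
        dirs.append(m / (np.linalg.norm(m) + 1e-12))
    sims = [abs(dirs[i] @ dirs[j])
            for i in range(n_groups) for j in range(i + 1, n_groups)]
    return float(np.mean(sims))

pen_redundant = subspace_redundancy_penalty(np.ones((8, 4)), 4)   # ≈ 1
pen_orthogonal = subspace_redundancy_penalty(np.eye(4), 4)        # 0
```

Adding such a penalty to the training loss pushes the learned subspaces apart, which is the "de-redundancy constraint" the abstract refers to.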

Updated: 2025-07-28 11:52:56

Categories: cs.CV,cs.AI,cs.MM

Download: http://arxiv.org/abs/2507.20745v1

Multi-Masked Querying Network for Robust Emotion Recognition from Incomplete Multi-Modal Physiological Signals

Emotion recognition from physiological data is crucial for mental health assessment, yet it faces two significant challenges: incomplete multi-modal signals and interference from body movements and artifacts. This paper presents a novel Multi-Masked Querying Network (MMQ-Net) to address these issues by integrating multiple querying mechanisms into a unified framework. Specifically, it uses modality queries to reconstruct missing data from incomplete signals, category queries to focus on emotional state features, and interference queries to separate relevant information from noise. Extensive experiment results demonstrate the superior emotion recognition performance of MMQ-Net compared to existing approaches, particularly under high levels of data incompleteness.

Updated: 2025-07-28 11:41:15

Categories: cs.CV,cs.AI,cs.HC

Download: http://arxiv.org/abs/2507.20737v1

Finite-Time Analysis of Discrete-Time Stochastic Interpolants

The stochastic interpolant framework offers a powerful approach for constructing generative models based on ordinary differential equations (ODEs) or stochastic differential equations (SDEs) to transform arbitrary data distributions. However, prior analyses of this framework have primarily focused on the continuous-time setting, assuming a perfect solution of the underlying equations. In this work, we present the first discrete-time analysis of the stochastic interpolant framework, where we introduce an innovative discrete-time sampler and derive a finite-time upper bound on its distribution estimation error. Our result provides a novel quantification of how different factors, including the distance between source and target distributions and estimation accuracy, affect the convergence rate and also offers a new principled way to design efficient schedules for convergence acceleration. Finally, numerical experiments are conducted on the discrete-time sampler to corroborate our theoretical findings.
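The object being analyzed, a discrete-time sampler for an interpolant-based generative ODE, can be written in a few lines. In practice the drift is a learned network and the paper's sampler and step schedules differ; here the drift is analytic so the Euler update is visible on its own.

```python
import numpy as np

def euler_sampler(x0, velocity, n_steps):
    """Integrate dx/dt = b(x, t) from t = 0 to t = 1 with uniform
    Euler steps, transporting samples of the source distribution to
    (approximately) the target distribution."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x

# The linear interpolant x_t = (1 - t) * x0 + t * x1 between x0 = 0
# and x1 = 5 has the constant velocity b(x, t) = x1 - x0 = 5.
out = euler_sampler(np.zeros(3), lambda x, t: 5.0, n_steps=20)
# out lands (up to floating-point error) on the target, [5, 5, 5]
```

Finite-time bounds of the kind derived in the paper quantify how the number of steps, the endpoint distributions, and drift estimation error jointly control the gap between `out`'s distribution and the true target.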

Updated: 2025-07-28 11:38:05

Categories: cs.LG

Download: http://arxiv.org/abs/2502.09130v2

MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind

Large Language Model (LLM) agents have demonstrated impressive capabilities in social deduction games (SDGs) like Werewolf, where strategic reasoning and social deception are essential. However, current approaches remain limited to textual information, ignoring crucial multimodal cues such as facial expressions and tone of voice that humans naturally use to communicate. Moreover, existing SDG agents primarily focus on inferring other players' identities without modeling how others perceive themselves or fellow players. To address these limitations, we use One Night Ultimate Werewolf (ONUW) as a testbed and present MultiMind, the first framework integrating multimodal information into SDG agents. MultiMind processes facial expressions and vocal tones alongside verbal content, while employing a Theory of Mind (ToM) model to represent each player's suspicion levels toward others. By combining this ToM model with Monte Carlo Tree Search (MCTS), our agent identifies communication strategies that minimize suspicion directed at itself. Through comprehensive evaluation in both agent-versus-agent simulations and studies with human players, we demonstrate MultiMind's superior performance in gameplay. Our work presents a significant advancement toward LLM agents capable of human-like social reasoning across multimodal domains.

Updated: 2025-07-28 11:31:55

Categories: cs.AI

Download: http://arxiv.org/abs/2504.18039v3

Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

Recent advances have witnessed the effectiveness of reinforcement learning (RL) finetuning in enhancing the reasoning capabilities of large language models (LLMs). The optimization process often requires numerous iterations to achieve satisfactory performance, resulting in high computational costs due to the need for frequent prompt evaluations under intensive LLM interactions and repeated policy updates. Online prompt selection methods can reduce iteration steps by prioritizing informative prompts during training, but their reliance on exhaustive prompt evaluation and subset selection still incurs substantial computational overhead due to frequent LLM inference calls. Distinguished from these direct evaluate-then-select schemes, this work investigates iterative approximate evaluation for arbitrary prompts and introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework that estimates prompt difficulty online without requiring costly LLM interactions. Technically, MoPPS models each prompt's success rate as a latent variable, performs streaming Bayesian inference, and employs posterior sampling in a constructed multi-armed bandit machine, enabling sample-efficient and adaptive prompt selection. Extensive experiments across mathematics, planning, and vision-based geometry tasks show that MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced LLM rollouts.
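The posterior-sampling idea can be sketched with a Beta-Bernoulli bandit: maintain success/failure counts per prompt, draw one Thompson sample of each prompt's success rate, and select prompts of moderate predicted difficulty. The Beta(1, 1) priors and the 0.5 difficulty target are illustrative; the paper's streaming inference and selection criterion are richer.

```python
import numpy as np

def select_prompts(successes, failures, batch, rng):
    """Posterior-sampling prompt selection in the spirit of MoPPS:
    model each prompt's success rate as Beta(1 + s, 1 + f), draw one
    Thompson sample per prompt, and pick the `batch` prompts whose
    sampled rate is closest to 0.5, i.e. the moderately difficult
    prompts that carry the most training signal."""
    theta = rng.beta(1 + np.asarray(successes), 1 + np.asarray(failures))
    return np.argsort(np.abs(theta - 0.5))[:batch]

rng = np.random.default_rng(0)
# Prompt 0 looks easy (9/10 solved), prompt 1 hard (1/10), prompt 2
# uncertain (2/4); prompt 2 is usually the one selected.
picked = select_prompts([9, 1, 2], [1, 9, 2], batch=1, rng=rng)
```

The key cost saving is that the counts are updated from rollouts the RL loop performs anyway, so no extra LLM inference calls are needed to estimate difficulty.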

Updated: 2025-07-28 11:30:13

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.04632v3

Learning the Value Systems of Societies from Preferences

Aligning AI systems with human values and the value-based preferences of various stakeholders (their value systems) is key in ethical AI. In value-aware AI systems, decision-making draws upon explicit computational representations of individual values (groundings) and their aggregation into value systems. As these are notoriously difficult to elicit and calibrate manually, value learning approaches aim to automatically derive computational models of an agent's values and value system from demonstrations of human behaviour. Nonetheless, social science and humanities literature suggest that it is more adequate to conceive the value system of a society as a set of value systems of different groups, rather than as the simple aggregation of individual value systems. Accordingly, here we formalize the problem of learning the value systems of societies and propose a method to address it based on heuristic deep clustering. The method learns socially shared value groundings and a set of diverse value systems representing a given society by observing qualitative value-based preferences from a sample of agents. We evaluate the proposal in a use case with real data about travelling decisions.

Updated: 2025-07-28 11:25:55

Categories: cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2507.20728v1

Everything is a Video: Unifying Modalities through Next-Frame Prediction

Multimodal learning, which involves integrating information from various modalities such as text, images, audio, and video, is pivotal for numerous complex tasks like visual question answering, cross-modal retrieval, and caption generation. Traditional approaches rely on modality-specific encoders and late fusion techniques, which can hinder scalability and flexibility when adapting to new tasks or modalities. To address these limitations, we introduce a novel framework that extends the concept of task reformulation beyond natural language processing (NLP) to multimodal learning. We propose to reformulate diverse multimodal tasks into a unified next-frame prediction problem, allowing a single model to handle different modalities without modality-specific components. This method treats all inputs and outputs as sequential frames in a video, enabling seamless integration of modalities and effective knowledge transfer across tasks. Our approach is evaluated on a range of tasks, including text-to-text, image-to-text, video-to-video, video-to-text, and audio-to-text, demonstrating the model's ability to generalize across modalities with minimal adaptation. We show that task reformulation can significantly simplify multimodal model design across various tasks, laying the groundwork for more generalized multimodal foundation models.

Updated: 2025-07-28 11:18:49

Categories: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2411.10503v2

Enhancing Wearable Tap Water Audio Detection through Subclass Annotation in the HD-Epic Dataset

Wearable human activity recognition has been shown to benefit from the inclusion of acoustic data, as the sounds around a person often contain valuable context. However, due to privacy concerns, it is usually not ethically feasible to record and save microphone data from the device, since the audio could, for instance, also contain private conversations. Rather, the data should be processed locally, which in turn requires processing power and consumes energy on the wearable device. One special use case of contextual information that can be utilized to augment special tasks in human activity recognition is water flow detection, which can, e.g., be used to aid wearable hand washing detection. We created a new label called tap water for the recently released HD-Epic data set, creating 717 hand-labeled annotations of tap water flow, based on existing annotations of the water class. We analyzed the relation of tap water and water in the dataset and additionally trained and evaluated two lightweight classifiers to evaluate the newly added label class, showing that the new class can be learned more easily.

Updated: 2025-07-28 11:15:36

Categories: cs.HC,cs.LG

Download: http://arxiv.org/abs/2505.20788v2

Uncertainty-driven Embedding Convolution

Text embeddings are essential components in modern NLP pipelines. While numerous embedding models have been proposed, their performance varies across domains, and no single model consistently excels across all tasks. This variability motivates the use of ensemble techniques to combine complementary strengths. However, most existing ensemble methods operate on deterministic embeddings and fail to account for model-specific uncertainty, limiting their robustness and reliability in downstream applications. To address these limitations, we propose Uncertainty-driven Embedding Convolution (UEC). UEC first transforms deterministic embeddings into probabilistic ones in a post-hoc manner. It then computes adaptive ensemble weights based on embedding uncertainty, grounded in a Bayes-optimal solution under a surrogate loss. Additionally, UEC introduces an uncertainty-aware similarity function that directly incorporates uncertainty into similarity scoring. Extensive experiments on retrieval, classification, and semantic similarity benchmarks demonstrate that UEC consistently improves both performance and robustness by leveraging principled uncertainty modeling.

Updated: 2025-07-28 11:15:25

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20718v1
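The uncertainty-weighted combination the abstract alludes to can be sketched in a few lines. This is an illustrative stand-in under a simple independent-Gaussian-noise assumption, not the paper's actual UEC procedure; the function and variable names are ours:

```python
import numpy as np

def uncertainty_weighted_ensemble(means, variances):
    """Combine per-model embeddings of the same text by inverse-variance
    (precision) weighting: noisier models contribute less. Under independent
    Gaussian noise this precision-weighted mean is the Bayes-optimal fusion,
    a simplified stand-in for the paper's surrogate-loss derivation.

    means:     list of (d,) arrays -- one embedding per model
    variances: list of (d,) arrays -- per-dimension uncertainty per model
    """
    precisions = [1.0 / v for v in variances]
    total = np.sum(precisions, axis=0)
    weighted = np.sum([p * m for p, m in zip(precisions, means)], axis=0)
    return weighted / total

# Two hypothetical embedding models: the second is noisier, so it is down-weighted.
m1, v1 = np.array([1.0, 0.0]), np.array([0.1, 0.1])
m2, v2 = np.array([0.0, 1.0]), np.array([0.9, 0.9])
fused = uncertainty_weighted_ensemble([m1, m2], [v1, v2])
```

With these toy numbers the cleaner model gets nine times the weight of the noisier one, pulling the fused embedding toward it.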

Crop Pest Classification Using Deep Learning Techniques: A Review

Insect pests continue to pose a serious threat to crop yields around the world, and traditional methods for monitoring them are often slow, manual, and difficult to scale. In recent years, deep learning has emerged as a powerful solution, with techniques like convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid models gaining popularity for automating pest detection. This review examines 37 carefully selected studies published between 2018 and 2025, all focused on AI-based pest classification. The selected research is organized by crop type, pest species, model architecture, dataset usage, and key technical challenges. Early studies relied heavily on CNNs, but the latest work is shifting toward hybrid and transformer-based models that deliver higher accuracy and better contextual understanding. Still, challenges like imbalanced datasets, difficulty in detecting small pests, limited generalizability, and deployment on edge devices remain significant hurdles. Overall, this review offers a structured overview of the field, highlights useful datasets, and outlines the key challenges and future directions for AI-based pest monitoring systems.

Updated: 2025-07-28 11:13:58

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.01494v2

GDSR: Global-Detail Integration through Dual-Branch Network with Wavelet Losses for Remote Sensing Image Super-Resolution

In recent years, deep neural networks, including Convolutional Neural Networks, Transformers, and State Space Models, have achieved significant progress in Remote Sensing Image (RSI) Super-Resolution (SR). However, existing SR methods typically overlook the complementary relationship between global and local dependencies. These methods either focus on capturing local information or prioritize global information, which results in models that are unable to effectively capture both global and local features simultaneously. Moreover, their computational cost becomes prohibitive when applied to large-scale RSIs. To address these challenges, we introduce the novel application of Receptance Weighted Key Value (RWKV) to RSI-SR, which captures long-range dependencies with linear complexity. To simultaneously model global and local features, we propose the Global-Detail dual-branch structure, GDSR, which performs SR by paralleling RWKV and convolutional operations to handle large-scale RSIs. Furthermore, we introduce the Global-Detail Reconstruction Module (GDRM) as an intermediary between the two branches to bridge their complementary roles. In addition, we propose the Dual-Group Multi-Scale Wavelet Loss, a wavelet-domain constraint mechanism via dual-group subband strategy and cross-resolution frequency alignment for enhanced reconstruction fidelity in RSI-SR. Extensive experiments under two degradation methods on several benchmarks, including AID, UCMerced, and RSSRD-QH, demonstrate that GDSR outperforms the state-of-the-art Transformer-based method HAT by an average of 0.09 dB in PSNR, while using only 63% of its parameters and 51% of its FLOPs, achieving an inference speed 3.2 times faster.

Updated: 2025-07-28 11:12:59

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2501.01460v3

Group Sequence Policy Optimization

This paper introduces Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant reinforcement learning algorithm for training large language models. Unlike previous algorithms that adopt token-level importance ratios, GSPO defines the importance ratio based on sequence likelihood and performs sequence-level clipping, rewarding, and optimization. We demonstrate that GSPO achieves superior training efficiency and performance compared to the GRPO algorithm, notably stabilizes Mixture-of-Experts (MoE) RL training, and has the potential for simplifying the design of RL infrastructure. These merits of GSPO have contributed to the remarkable improvements in the latest Qwen3 models.

Updated: 2025-07-28 11:11:33

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.18071v2
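The core distinction the abstract draws, a sequence-level rather than token-level importance ratio with sequence-level clipping, can be sketched as follows. The length normalization and all function names here are our assumptions for illustration, not taken from the paper:

```python
import numpy as np

def sequence_importance_ratio(new_logprobs, old_logprobs):
    """One importance ratio per whole response, computed from sequence
    likelihoods (length-normalized in log space -- our assumption),
    instead of one ratio per token as in token-level PPO variants.
    Inputs are per-token log-probabilities of a single response."""
    n = len(new_logprobs)
    log_ratio = (np.sum(new_logprobs) - np.sum(old_logprobs)) / n
    return float(np.exp(log_ratio))

def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate, applied once per sequence."""
    return min(ratio * advantage, float(np.clip(ratio, 1 - eps, 1 + eps)) * advantage)

# Toy response of 3 tokens whose probability rose from 0.5 to 0.6 per token.
old = np.log([0.5, 0.5, 0.5])
new = np.log([0.6, 0.6, 0.6])
r = sequence_importance_ratio(new, old)  # 0.6 / 0.5 = 1.2
```

Because clipping acts on the single sequence ratio, one off-policy token cannot zero out gradients for the rest of the response, which is one intuition for the stability claims in the abstract.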

A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census

We show that individual, confidential microdata records from the 2010 U.S. Census of Population and Housing can be accurately reconstructed from the published tabular summaries. Ninety-seven million person records (every resident in 70% of all census blocks) are exactly reconstructed with provable certainty using only public information. We further show that a hypothetical attacker using our methods can reidentify with 95% accuracy population unique individuals who are perfectly reconstructed and not in the modal race and ethnicity category in their census block (3.4 million persons)--a result that is only possible because their confidential records were used in the published tabulations. Finally, we show that the methods used for the 2020 Census, based on a differential privacy framework, provide better protection against this type of attack, with better published data accuracy, than feasible alternatives.

Updated: 2025-07-28 11:10:54

Categories: stat.AP,cs.CR,econ.EM

Download: http://arxiv.org/abs/2312.11283v3

Prostate Cancer Classification Using Multimodal Feature Fusion and Explainable AI

Prostate cancer, the second most prevalent male malignancy, requires advanced diagnostic tools. We propose an explainable AI system combining BERT (for textual clinical notes) and Random Forest (for numerical lab data) through a novel multimodal fusion strategy, achieving superior classification performance on PLCO-NIH dataset (98% accuracy, 99% AUC). While multimodal fusion is established, our work demonstrates that a simple yet interpretable BERT+RF pipeline delivers clinically significant improvements - particularly for intermediate cancer stages (Class 2/3 recall: 0.900 combined vs 0.824 numerical/0.725 textual). SHAP analysis provides transparent feature importance rankings, while ablation studies prove textual features' complementary value. This accessible approach offers hospitals a balance of high performance (F1=89%), computational efficiency, and clinical interpretability - addressing critical needs in prostate cancer diagnostics.

Updated: 2025-07-28 11:07:17

Categories: cs.LG,cs.AI,q-bio.QM,stat.AP

Download: http://arxiv.org/abs/2507.20714v1

Algorithmic Fairness: A Runtime Perspective

Fairness in AI is traditionally studied as a static property evaluated once, over a fixed dataset. However, real-world AI systems operate sequentially, with outcomes and environments evolving over time. This paper proposes a framework for analysing fairness as a runtime property. Using a minimal yet expressive model based on sequences of coin tosses with possibly evolving biases, we study the problems of monitoring and enforcing fairness expressed in either toss outcomes or coin biases. Since there is no one-size-fits-all solution for either problem, we provide a summary of monitoring and enforcement strategies, parametrised by environment dynamics, prediction horizon, and confidence thresholds. For both problems, we present general results under simple or minimal assumptions. We survey existing solutions for the monitoring problem for Markovian and additive dynamics, and existing solutions for the enforcement problem in static settings with known dynamics.

Updated: 2025-07-28 11:04:17

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20711v1
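One concrete instantiation of the abstract's coin-toss monitoring problem is to track the empirical bias of the observed outcome stream together with a confidence interval. The sketch below assumes a static bias and uses a Hoeffding bound; both are simplifications of ours, not the paper's general setting with evolving biases:

```python
import math

def monitor_bias(tosses, delta=0.05):
    """Runtime fairness monitor for a 0/1 outcome stream: return the
    empirical bias and a (1 - delta) Hoeffding confidence interval.
    A monitor would raise an alarm when the target bias (e.g. 0.5)
    falls outside the interval."""
    n = len(tosses)
    p_hat = sum(tosses) / n
    eps = math.sqrt(math.log(2 / delta) / (2 * n))  # Hoeffding radius
    return p_hat, max(0.0, p_hat - eps), min(1.0, p_hat + eps)

# Hypothetical monitored decision stream (1 = favorable outcome).
stream = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
p_hat, lo, hi = monitor_bias(stream)
alarm = not (lo <= 0.5 <= hi)  # no alarm yet: too few observations
```

With only ten observations the interval is wide, illustrating why the abstract parametrizes strategies by confidence thresholds and horizons.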

Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting -- particularly with large (constant) stepsizes -- has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems. We establish that after $T$ steps of SGD on $\beta$-smooth convex loss functions with stepsize $0 < \eta < 2/\beta$, the last iterate exhibits expected excess risk $\widetilde{O}(\frac{1}{\eta (2-\beta \eta) T^{1-\beta\eta/2}} + \frac{\eta}{(2-\beta\eta)^2} T^{\beta\eta/2} \sigma_\star^2)$, where $\sigma_\star^2$ denotes the variance of the stochastic gradients at the optimum. In particular, for a well-tuned stepsize we obtain a near optimal $\widetilde{O}(1/T + \sigma_\star/\sqrt{T})$ rate for the last iterate, extending the results of Varre et al. (2021) beyond least squares regression; and when $\sigma_\star=0$ we obtain a rate of $\smash{O(1/\sqrt T)}$ with $\eta=1/\beta$, improving upon the best-known $\smash{O(T^{-1/4})}$ rate recently established by Evron et al. (2025) in the special case of realizable linear regression.

Updated: 2025-07-28 11:03:24

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2507.11274v2

Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

Proving the compliance of AI algorithms has become an important challenge with the growing deployment of such algorithms for real-life applications. Inspecting possible biased behaviors is mandatory to satisfy the requirements of the EU Artificial Intelligence Act. Regulation-driven audits increasingly rely on global fairness metrics, with Disparate Impact being the most widely used. Yet such global measures depend highly on the distribution of the sample on which the measures are computed. We first investigate how to manipulate data samples to artificially satisfy fairness criteria, creating minimally perturbed datasets that remain statistically indistinguishable from the original distribution while satisfying prescribed fairness constraints. Then we study how to detect such manipulation. Our analysis (i) introduces mathematically sound methods for modifying empirical distributions under fairness constraints using entropic or optimal transport projections, (ii) examines how an auditee could potentially circumvent fairness inspections, and (iii) offers recommendations to help auditors detect such data manipulations. These results are validated through experiments on classical tabular datasets in bias detection.

Updated: 2025-07-28 11:01:48

Categories: cs.LG,math.OC,stat.AP

Download: http://arxiv.org/abs/2507.20708v1
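
To make the dependence on the audit sample concrete, here is a minimal sketch of the Disparate Impact metric that the audits above rely on (the function name and toy data are illustrative, not from the paper): the ratio of positive-outcome rates between the unprivileged and privileged groups, commonly compared against the four-fifths (0.8) threshold. Because it is just a pair of empirical frequencies, reweighting or perturbing the audit sample can shift it without touching the model.

```python
import numpy as np

def disparate_impact(y_pred, group):
    """Ratio of positive-outcome rates: P(y=1 | unprivileged) / P(y=1 | privileged)."""
    rate_unpriv = y_pred[group == 0].mean()
    rate_priv = y_pred[group == 1].mean()
    return rate_unpriv / rate_priv

# Toy audit sample: the privileged group (1) receives positive outcomes twice as often
group  = np.array([0] * 50 + [1] * 50)
y_pred = np.array([1] * 20 + [0] * 30 + [1] * 40 + [0] * 10)

di = disparate_impact(y_pred, group)   # 0.4 / 0.8 = 0.5, below the 0.8 threshold
```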

Video Forgery Detection for Surveillance Cameras: A Review

The widespread availability of video recording through smartphones and digital devices has made video-based evidence more accessible than ever. Surveillance footage plays a crucial role in security, law enforcement, and judicial processes. However, with the rise of advanced video editing tools, tampering with digital recordings has become increasingly easy, raising concerns about their authenticity. Ensuring the integrity of surveillance videos is essential, as manipulated footage can lead to misinformation and undermine judicial decisions. This paper provides a comprehensive review of existing forensic techniques used to detect video forgery, focusing on their effectiveness in verifying the authenticity of surveillance recordings. Various methods, including compression-based analysis, frame duplication detection, and machine learning-based approaches, are explored. The findings highlight the growing necessity for more robust forensic techniques to counteract evolving forgery methods. Strengthening video forensic capabilities will ensure that surveillance recordings remain credible and admissible as legal evidence.

Updated: 2025-07-28 10:58:53

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2505.03832v2

Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models

The increasing integration of Visual Language Models (VLMs) into AI systems necessitates robust model alignment, especially when handling multimodal content that combines text and images. Existing evaluation datasets heavily lean towards text-only prompts, leaving visual vulnerabilities under-evaluated. To address this gap, we propose \textbf{Text2VLM}, a novel multi-stage pipeline that adapts text-only datasets into multimodal formats, specifically designed to evaluate the resilience of VLMs against typographic prompt injection attacks. The Text2VLM pipeline identifies harmful content in the original text and converts it into a typographic image, creating a multimodal prompt for VLMs. Our evaluation of open-source VLMs also highlights their increased susceptibility to prompt injection when visual inputs are introduced, revealing critical weaknesses in the current models' alignment, in addition to a significant performance gap compared to closed-source frontier models. We validate Text2VLM through human evaluations, ensuring that the extracted salient concepts, text summarization, and output classification align with human expectations. Text2VLM provides a scalable tool for comprehensive safety assessment, contributing to the development of more robust safety mechanisms for VLMs. By enhancing the evaluation of multimodal vulnerabilities, Text2VLM plays a role in advancing the safe deployment of VLMs in diverse, real-world applications.

Updated: 2025-07-28 10:57:44

Categories: cs.CL,cs.AI,cs.CR

Download: http://arxiv.org/abs/2507.20704v1

PanoGAN: A Deep Generative Model for Panoramic Dental Radiographs

This paper presents the development of a generative adversarial network (GAN) for synthesizing dental panoramic radiographs. Although exploratory in nature, the study aims to address the scarcity of data in dental research and education. We trained a deep convolutional GAN (DCGAN) using a Wasserstein loss with gradient penalty (WGAN-GP) on a dataset of 2322 radiographs of varying quality. The focus was on the dentoalveolar regions; other anatomical structures were cropped out. Extensive preprocessing and data cleaning were performed to standardize the inputs while preserving anatomical variability. We explored four candidate models by varying critic iterations, feature depth, and the use of denoising prior to training. A clinical expert evaluated the generated radiographs based on anatomical visibility and realism, using a 5-point scale (1 = very poor, 5 = excellent). Most images showed moderate anatomical depiction, although some were degraded by artifacts. A trade-off was observed: the model trained on non-denoised data yielded finer details, especially in structures like the mandibular canal and trabecular bone, while the model trained on denoised data offered superior overall image clarity and sharpness. These findings provide a foundation for future work on GAN-based methods in dental imaging.

Updated: 2025-07-28 10:55:44

Categories: cs.CV,cs.ET,cs.LG,eess.IV

Download: http://arxiv.org/abs/2507.21200v1
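
As a minimal sketch of the training objective named above (the WGAN-GP critic loss with its gradient penalty on random interpolates), here is an autograd-free illustration. A linear critic is used purely so that the gradient is available in closed form (it equals the weight vector everywhere); this is an illustrative toy, not the paper's DCGAN setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 16, 10.0
w = rng.normal(size=d)                      # toy linear critic f(x) = <w, x>

real = rng.normal(loc=1.0, size=(32, d))
fake = rng.normal(loc=-1.0, size=(32, d))

# The gradient penalty is evaluated at random interpolates between real and fake samples
eps = rng.uniform(size=(32, 1))
x_hat = eps * real + (1.0 - eps) * fake     # kept for illustration; a linear critic's
grad_norm = np.linalg.norm(w)               # gradient is w at every x_hat

# WGAN-GP critic loss: E[f(fake)] - E[f(real)] + lambda * E[(||grad f|| - 1)^2]
critic_loss = (fake @ w).mean() - (real @ w).mean() + lam * (grad_norm - 1.0) ** 2
```

The penalty term pushes the critic's gradient norm toward 1, softly enforcing the 1-Lipschitz constraint that the Wasserstein formulation requires.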

A General Framework for Dynamic MAPF using Multi-Shot ASP and Tunnels

The MAPF problem aims to find plans for multiple agents in an environment within a given time, such that the agents do not collide with each other or with obstacles. Motivated by the execution and monitoring of these plans, we study the Dynamic MAPF (D-MAPF) problem, which allows changes such as agents entering/leaving the environment or obstacles being removed/moved. Considering the requirements of real-world applications in warehouses with the presence of humans, we introduce 1) a general definition for D-MAPF (applicable to variations of D-MAPF), 2) a new framework to solve D-MAPF (utilizing multi-shot computation, and allowing different methods to solve D-MAPF), and 3) a new ASP-based method to solve D-MAPF (combining the advantages of replanning and repairing methods, with a novel concept of tunnels to specify where agents can move). We illustrate the strengths and weaknesses of this method through experimental evaluations, from the perspectives of computational performance and solution quality.

Updated: 2025-07-28 10:55:31

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20703v1

Improving Open-world Continual Learning under the Constraints of Scarce Labeled Data

Open-world continual learning (OWCL) adapts to sequential tasks with open samples, learning knowledge incrementally while preventing forgetting. However, existing OWCL still requires a large amount of labeled data for training, which is often impractical in real-world applications. Given that new categories/entities typically come with limited annotations and in small quantities, a more realistic situation is OWCL with scarce labeled data, i.e., few-shot training samples. Hence, this paper investigates the problem of open-world few-shot continual learning (OFCL), which is challenging in (i) learning unbounded tasks without forgetting previous knowledge and without overfitting, (ii) constructing compact decision boundaries for open detection with limited labeled data, and (iii) transferring knowledge about knowns and unknowns, and even updating unknowns to knowns once the labels of open samples are learned. In response, we propose a novel OFCL framework that integrates three key components: (1) an instance-wise token augmentation (ITA) that represents and enriches sample representations with additional knowledge, (2) a margin-based open boundary (MOB) that supports open detection as new tasks emerge over time, and (3) an adaptive knowledge space (AKS) that endows unknowns with knowledge for the update from unknowns to knowns. Finally, extensive experiments show that the proposed OFCL framework remarkably outperforms all baselines, with practical importance and reproducibility. The source code is released at https://github.com/liyj1201/OFCL.

Updated: 2025-07-28 10:36:30

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2502.20974v2
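
The idea of a compact open boundary can be sketched generically (this is not the paper's MOB/ITA/AKS machinery, just the underlying prototype-plus-margin pattern it builds on): each known class keeps a few-shot prototype, a query is assigned to its nearest prototype, and anything farther than a margin is flagged as unknown, ready to be promoted to a known class once labeled.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
# Few-shot prototypes for two known classes (e.g., the mean of a handful of labeled samples)
protos = {0: rng.normal(0.0, 0.1, size=d), 1: rng.normal(5.0, 0.1, size=d)}

def classify_open(z, protos, margin=3.0):
    """Nearest-prototype classification with a margin-based open boundary."""
    dists = {c: np.linalg.norm(z - p) for c, p in protos.items()}
    c = min(dists, key=dists.get)
    return c if dists[c] <= margin else "unknown"

near = classify_open(protos[0] + 0.05, protos)   # close to class 0 -> assigned to 0
far = classify_open(np.full(d, 100.0), protos)   # beyond every margin -> "unknown"
```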

Continual Low-Rank Scaled Dot-product Attention

Transformers are widely used for their ability to capture data relations in sequence processing, with great success for a wide range of static tasks. However, the computational and memory footprint of their main component, i.e., the Scaled Dot-product Attention, is commonly overlooked. This makes their adoption in applications involving stream data processing with constraints in response latency, computational and memory resources infeasible. Some works have proposed methods to lower the computational cost of Transformers, i.e. low-rank approximations, sparsity in attention, and efficient formulations for Continual Inference. In this paper, we introduce a new formulation of the Scaled Dot-product Attention based on the Nystr\"om approximation that is suitable for Continual Inference. In experiments on Online Audio Classification and Online Action Detection tasks, the proposed Continual Scaled Dot-product Attention can lower the number of operations by up to three orders of magnitude compared to the original Transformers while retaining the predictive performance of competing models.

Updated: 2025-07-28 10:27:45

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2412.03214v4
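
A minimal sketch of Nyström-approximated scaled dot-product attention (a generic Nyström formulation with evenly spaced landmark rows, not the paper's continual-inference variant): the $n \times n$ attention matrix is never materialized; three small softmax blocks and a pseudo-inverse stand in for it.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nystrom_attention(Q, K, V, m=16):
    """Nystrom-approximated attention using m landmark query/key rows."""
    d = Q.shape[-1]
    idx = np.linspace(0, Q.shape[0] - 1, m).astype(int)   # simple landmark selection
    Qm, Km = Q[idx], K[idx]
    F = softmax(Q @ Km.T / np.sqrt(d))      # n x m: queries vs. landmark keys
    A = softmax(Qm @ Km.T / np.sqrt(d))     # m x m: landmark queries vs. landmark keys
    B = softmax(Qm @ K.T / np.sqrt(d))      # m x n: landmark queries vs. all keys
    # F @ pinv(A) @ B approximates softmax(Q K^T / sqrt(d)) without an n x n matrix
    return F @ np.linalg.pinv(A) @ (B @ V)

rng = np.random.default_rng(0)
n, d = 64, 16
Q, K, V = rng.normal(size=(3, n, d))
out = nystrom_attention(Q, K, V, m=16)      # shape (n, d)
```

With $m \ll n$, the cost drops from $O(n^2 d)$ to roughly $O(nmd + m^3)$, which is the lever the paper's continual formulation exploits.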

Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search

Neural Architecture Search (NAS) has become an essential tool for designing effective and efficient neural networks. In this paper, we investigate the geometric properties of neural architecture spaces commonly used in differentiable NAS methods, specifically NAS-Bench-201 and DARTS. By defining flatness metrics such as neighborhoods and loss barriers along paths in architecture space, we reveal locality and flatness characteristics analogous to the well-known properties of neural network loss landscapes in weight space. In particular, we find that highly accurate architectures cluster together in flat regions, while suboptimal architectures remain isolated, unveiling the detailed geometrical structure of the architecture search landscape. Building on these insights, we propose Architecture-Aware Minimization (A$^2$M), a novel analytically derived algorithmic framework that explicitly biases, for the first time, the gradient of differentiable NAS methods towards flat minima in architecture space. A$^2$M consistently improves generalization over state-of-the-art DARTS-based algorithms on benchmark datasets including CIFAR-10, CIFAR-100, and ImageNet16-120, across both NAS-Bench-201 and DARTS search spaces. Notably, A$^2$M is able to increase the test accuracy, on average across different differentiable NAS methods, by +3.60\% on CIFAR-10, +4.60\% on CIFAR-100, and +3.64\% on ImageNet16-120, demonstrating its superior effectiveness in practice. A$^2$M can be easily integrated into existing differentiable NAS frameworks, offering a versatile tool for future research and applications in automated machine learning. We open-source our code at https://github.com/AI-Tech-Research-Lab/AsquaredM.

Updated: 2025-07-28 10:18:58

Categories: cs.LG,cond-mat.dis-nn,cs.CV,68T07

Download: http://arxiv.org/abs/2503.10404v2

Guard-GBDT: Efficient Privacy-Preserving Approximated GBDT Training on Vertical Dataset

In light of increasing privacy concerns and stringent legal regulations, using secure multiparty computation (MPC) to enable collaborative GBDT model training among multiple data owners has garnered significant attention. Despite this, existing MPC-based GBDT frameworks face efficiency challenges due to high communication costs and the computational burden of non-linear operations, such as division and sigmoid calculations. In this work, we introduce Guard-GBDT, an innovative framework tailored for efficient and privacy-preserving GBDT training on vertical datasets. Guard-GBDT bypasses MPC-unfriendly division and sigmoid functions by using more streamlined approximations, and reduces communication overhead by compressing the messages exchanged during gradient aggregation. We implement a prototype of Guard-GBDT and extensively evaluate its performance and accuracy on various real-world datasets. The results show that Guard-GBDT outperforms the state-of-the-art HEP-XGB (CIKM'21) and SiGBDT (ASIA CCS'24) by up to $2.71\times$ and $12.21\times$ on a LAN network, and by up to $2.7\times$ and $8.2\times$ on a WAN network. Guard-GBDT also achieves accuracy comparable to SiGBDT and plaintext XGBoost (better than HEP-XGB), with a deviation of only $\pm1\%$ to $\pm2\%$. Our implementation code is provided at https://github.com/XidianNSS/Guard-GBDT.git.

Updated: 2025-07-28 10:16:37

Categories: cs.CR

Download: http://arxiv.org/abs/2507.20688v1
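
The sigmoid approximation mentioned above can be illustrated with a common MPC-friendly surrogate (this piecewise-linear form is a standard choice in the secure-computation literature, e.g. SecureML-style protocols, and is not necessarily the exact approximation Guard-GBDT uses): it requires only comparisons, additions, and multiplication by a public constant, all cheap under secret sharing, while staying within about 0.12 of the true sigmoid.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pw_linear_sigmoid(x):
    """Piecewise-linear sigmoid surrogate: clip(x/4 + 1/2, 0, 1).
    No exponentials or divisions, so it maps cleanly onto MPC primitives."""
    return np.clip(0.25 * x + 0.5, 0.0, 1.0)

x = np.linspace(-6.0, 6.0, 1001)
max_err = np.abs(sigmoid(x) - pw_linear_sigmoid(x)).max()   # worst case near |x| = 2
```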

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference

Large Language Model (LLM) inference becomes resource-intensive, prompting a shift toward low-bit model weights to reduce the memory footprint and improve efficiency. Such low-bit LLMs necessitate the mixed-precision matrix multiplication (mpGEMM), an important yet underexplored operation involving the multiplication of lower-precision weights with higher-precision activations. Off-the-shelf hardware does not support this operation natively, leading to indirect, thus inefficient, dequantization-based implementations. In this paper, we study the lookup table (LUT)-based approach for mpGEMM and find that a conventional LUT implementation fails to achieve the promised gains. To unlock the full potential of LUT-based mpGEMM, we propose LUT Tensor Core, a software-hardware co-design for low-bit LLM inference. LUT Tensor Core differentiates itself from conventional LUT designs through: 1) software-based optimizations to minimize table precompute overhead and weight reinterpretation to reduce table storage; 2) a LUT-based Tensor Core hardware design with an elongated tiling shape to maximize table reuse and a bit-serial design to support diverse precision combinations in mpGEMM; 3) a new instruction set and compilation optimizations for LUT-based mpGEMM. LUT Tensor Core significantly outperforms existing pure software LUT implementations and achieves a 1.44$\times$ improvement in compute density and energy efficiency compared to previous state-of-the-art LUT-based accelerators.

Updated: 2025-07-28 10:09:27

Categories: cs.AR,cs.LG,C.1.0; C.3; B.2.4

Download: http://arxiv.org/abs/2408.06003v3
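
The core LUT trick behind mpGEMM can be sketched in a few lines (an illustrative software emulation with 1-bit $\pm1$ weights and group size 4, not the LUT Tensor Core hardware design): for each group of $g$ high-precision activations, precompute the dot product with all $2^g$ sign patterns once; every low-bit weight row then replaces $g$ multiply-adds with a single table lookup.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, g = 16, 4                                  # n activations, looked up in groups of g
x = rng.normal(size=n).astype(np.float32)     # higher-precision activations
W = rng.choice([-1, 1], size=(8, n))          # 8 output rows of 1-bit (+/-1) weights

# Per activation group, precompute the dot product with every +/-1 pattern (2^g entries)
patterns = np.array(list(product([-1, 1], repeat=g)), dtype=np.float32)   # 2^g x g
tables = [patterns @ x[j:j + g] for j in range(0, n, g)]                  # one LUT per group

def lut_matvec(W, x):
    out = np.zeros(W.shape[0], dtype=np.float32)
    for j in range(0, n, g):
        # Map each weight group to its pattern index instead of multiplying
        idx = ((W[:, j:j + g] + 1) // 2 * (1 << np.arange(g - 1, -1, -1))).sum(axis=1)
        out += tables[j // g][idx]
    return out
```

The precompute cost is shared across all weight rows, which is why the paper pairs this scheme with hardware support for table reuse.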

Novel Pivoted Cholesky Decompositions for Efficient Gaussian Process Inference

The Cholesky decomposition is a fundamental tool for solving linear systems with symmetric positive definite matrices, which are ubiquitous in linear algebra, optimization, and machine learning. Its numerical stability can be improved by introducing a pivoting strategy that iteratively permutes the rows and columns of the matrix. The order of pivoting indices determines how accurately the intermediate decomposition can reconstruct the original matrix, and is thus decisive for the algorithm's efficiency in the case of early termination. Standard implementations select the next pivot from the largest value on the diagonal. In the case of Bayesian nonparametric inference, this strategy corresponds to greedy entropy maximization, which is often used in active learning and design of experiments. We explore this connection in detail and deduce novel pivoting strategies for the Cholesky decomposition. The resulting algorithms are more efficient at reducing the uncertainty over a data set, can be updated to include information about observations, and additionally benefit from a tailored implementation. We benchmark the effectiveness of the new selection strategies on two tasks important to Gaussian processes: sparse regression and inference based on preconditioned iterative solvers. Our results show that the proposed selection strategies are either on par or, in most cases, outperform traditional baselines while requiring a negligible amount of additional computation.
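
The greedy largest-diagonal rule that this abstract takes as its starting point can be sketched directly. The code below is a minimal pure-Python illustration of that standard pivoting rule with early termination, not the paper's novel selection strategies:

```python
import math

def pivoted_cholesky(A, rank, tol=1e-12):
    """Greedy largest-diagonal pivoting; returns columns L[0..k-1] with
    A ~= sum_k outer(L[k], L[k]), stopping early once the residual
    diagonal (trace error) drops below tol."""
    n = len(A)
    d = [A[i][i] for i in range(n)]   # diagonal of the Schur complement
    L, order = [], []
    for _ in range(rank):
        # standard rule: pivot on the largest remaining diagonal entry
        i = max((j for j in range(n) if j not in order), key=lambda j: d[j])
        if d[i] <= tol:               # early termination
            break
        order.append(i)
        piv = math.sqrt(d[i])
        col = [0.0] * n
        col[i] = piv
        for j in range(n):
            if j not in order:
                col[j] = (A[i][j] - sum(c[i] * c[j] for c in L)) / piv
                d[j] -= col[j] ** 2   # update the residual diagonal
        L.append(col)
    return L, order

A = [[4.0, 2.0], [2.0, 3.0]]
L, order = pivoted_cholesky(A, 2)
assert order[0] == 0                                   # largest diagonal first
assert abs(sum(c[0] * c[1] for c in L) - 2.0) < 1e-9   # off-diagonal recovered
```

The paper's contribution is to replace the `max`-over-`d` selection line with entropy-motivated alternatives; everything else in the loop stays structurally the same.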

Updated: 2025-07-28 10:01:43

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20678v1

A Novel Post-Quantum Secure Digital Signature Scheme Based on Neural Network

Digital signatures are fundamental cryptographic primitives that ensure the authenticity and integrity of digital documents. In the post-quantum era, classical public-key signature schemes become vulnerable to brute-force and key-recovery attacks due to the computational power of quantum algorithms. Multivariate polynomial-based signature schemes are among the cryptographic constructions that offer strong security guarantees against such quantum threats. With the growing capabilities of neural networks, it is natural to explore their potential application in the design of cryptographic primitives. Neural networks inherently capture the non-linear relationships within the data, which are encoded in their synaptic weight matrices and bias vectors. In this paper, we propose a novel construction of a multivariate polynomial-based digital signature scheme that leverages neural network architectures. A neural network with binary weights is employed to define the central structure of the signature scheme. The design introduces a recurrent random vector, functionally analogous to an attention mechanism, which contributes dynamic randomness based on the previous state, thereby enhancing the scheme's security. It is demonstrated that the proposed signature scheme achieves Existential Unforgeability under adaptive Chosen-Message Attacks (EUF-CMA). Furthermore, it is proven that direct attacks aimed at recovering the private keys are computationally infeasible in polynomial time, even in the presence of quantum computing capabilities. The operational characteristics of the proposed scheme are also evaluated, with results indicating notable efficiency and practical viability in post-quantum cryptographic applications.

Updated: 2025-07-28 09:56:09

Categories: cs.CR, math.GR

Download: http://arxiv.org/abs/2507.20676v1

Program Analysis for High-Value Smart Contract Vulnerabilities: Techniques and Insights

A widespread belief in the blockchain security community is that automated techniques are only good for detecting shallow bugs, typically of small value. In this paper, we present the techniques and insights that have led us to repeatable success in automatically discovering high-value smart contract vulnerabilities. Our vulnerability disclosures have yielded 10 bug bounties, for a total of over $3M, over high-profile deployed code, as well as hundreds of bugs detected in pre-deployment or under-audit code. We argue that the elements of this surprising success are a) a very high-completeness static analysis approach that manages to maintain acceptable precision; b) domain knowledge, provided by experts or captured via statistical inference. We present novel techniques for automatically inferring domain knowledge from statistical analysis of a large corpus of deployed contracts, as well as discuss insights on the ideal precision and warning rate of a promising vulnerability detector. In contrast to academic literature in program analysis, which routinely expects false-positive rates below 50% for publishable results, we posit that a useful analysis for high-value real-world vulnerabilities will likely flag very few programs (under 1%) and will do so with a high false-positive rate (e.g., 95%, meaning that only one-of-twenty human inspections will yield an exploitable vulnerability).
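
The closing claim is easy to make concrete. A toy calculation, with illustrative numbers mirroring the rates quoted above:

```python
# The abstract's closing claim, made concrete with illustrative numbers:
# a detector that flags 1% of 10,000 contracts at a 95% false-positive
# rate still yields 5 true vulnerabilities from 100 manual inspections,
# i.e. one-in-twenty inspections pays off.

def expected_findings(n_programs, flag_rate, false_positive_rate):
    flagged = n_programs * flag_rate
    true_positives = flagged * (1.0 - false_positive_rate)
    return flagged, true_positives

flagged, hits = expected_findings(10_000, 0.01, 0.95)
assert abs(flagged - 100) < 1e-9
assert abs(hits - 5.0) < 1e-9
```

This is why the warning-rate argument works: with high-value targets, five exploitable findings easily justify a hundred inspections, even though the same false-positive rate would be unpublishable by the academic convention cited above.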

Updated: 2025-07-28 09:53:31

Categories: cs.CR, cs.PL

Download: http://arxiv.org/abs/2507.20672v1

A Multimodal Architecture for Endpoint Position Prediction in Team-based Multiplayer Games

Understanding and predicting player movement in multiplayer games is crucial for achieving use cases such as player-mimicking bot navigation, preemptive bot control, strategy recommendation, and real-time player behavior analytics. However, the complex environments allow for a high degree of navigational freedom, and the interactions and team-play between players require models that make effective use of the available heterogeneous input data. This paper presents a multimodal architecture for predicting future player locations on a dynamic time horizon, using a U-Net-based approach for calculating endpoint location probability heatmaps, conditioned using a multimodal feature encoder. The application of a multi-head attention mechanism for different groups of features allows for communication between agents. In doing so, the architecture makes efficient use of the multimodal game state including image inputs, numerical and categorical features, as well as dynamic game data. Consequently, the presented technique lays the foundation for various downstream tasks that rely on future player positions such as the creation of player-predictive bot behavior or player anomaly detection.
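
The output side described above, a probability heatmap over endpoint locations, can be sketched independently of the U-Net itself. A minimal illustration (the grid, scores, and function names are assumptions): softmax the per-cell scores and read off the peak cell.

```python
import math

# Minimal sketch of the endpoint-heatmap output head (not the paper's
# U-Net): per-cell scores become a probability heatmap via a softmax
# over the whole grid, and the predicted endpoint is the peak cell.

def heatmap(scores):
    """2D list of scores -> 2D list of probabilities summing to 1."""
    flat = [s for row in scores for s in row]
    m = max(flat)  # subtract the max for numerical stability
    exp = [[math.exp(s - m) for s in row] for row in scores]
    z = sum(v for row in exp for v in row)
    return [[v / z for v in row] for row in exp]

def predicted_endpoint(scores):
    probs = heatmap(scores)
    return max(((i, j) for i in range(len(probs)) for j in range(len(probs[0]))),
               key=lambda ij: probs[ij[0]][ij[1]])

scores = [[0.1, 0.3, 0.2],
          [0.0, 2.5, 0.4],   # cell (1, 1) carries the highest score
          [0.1, 0.2, 0.1]]
assert predicted_endpoint(scores) == (1, 1)
```

Downstream tasks like bot navigation would consume either the full heatmap (for uncertainty-aware planning) or just the peak coordinate, which is what this sketch returns.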

Updated: 2025-07-28 09:51:49

Categories: cs.CV, cs.AI, cs.LG

Download: http://arxiv.org/abs/2507.20670v1

MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection

This paper proposes a method for generating machine-type-specific anomalies to evaluate the relative performance of unsupervised anomalous sound detection (UASD) systems across different machine types, even in the absence of real anomaly sound data. Conventional keyword-based data augmentation methods often produce unrealistic sounds due to their reliance on manually defined labels, limiting scalability as machine types and anomaly patterns diversify. Advanced audio generative models, such as MIMII-Gen, show promise but typically depend on anomalous training data, making them less effective when diverse anomalous examples are unavailable. To address these limitations, we propose a novel synthesis approach leveraging large language models (LLMs) to interpret textual descriptions of faults and automatically select audio transformation functions, converting normal machine sounds into diverse and plausible anomalous sounds. We validate this approach by evaluating a UASD system trained only on normal sounds from five machine types, using both real and synthetic anomaly data. Experimental results reveal consistent trends in relative detection difficulty across machine types between synthetic and real anomalies. This finding supports our hypothesis and highlights the effectiveness of the proposed LLM-based synthesis approach for relative evaluation of UASD systems.
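
The pipeline above can be sketched with the LLM replaced by a trivial keyword matcher. The transform registry, function names, and fault strings below are illustrative stand-ins, not the paper's actual function-calling setup:

```python
import random

# Sketch of the synthesis pipeline described above, with the LLM
# replaced by a keyword matcher (the paper uses LLM function calling to
# make this selection). Transform names and behaviors are illustrative.

def add_impulsive_noise(x, rate=0.05, amp=0.8, seed=0):
    """Sparse clicks/impacts, e.g. a damaged bearing."""
    rng = random.Random(seed)
    return [v + (amp if rng.random() < rate else 0.0) for v in x]

def attenuate(x, gain=0.3):
    """Loss of output level, e.g. a slipping belt."""
    return [gain * v for v in x]

TRANSFORMS = {
    "bearing damage": add_impulsive_noise,
    "belt slip": attenuate,
}

def select_transform(fault_description):
    """Stand-in for the LLM: pick the registered audio transformation
    whose key appears in the textual fault description."""
    for key, fn in TRANSFORMS.items():
        if key in fault_description.lower():
            return fn
    raise ValueError("no transform matches: " + fault_description)

normal = [0.1, -0.2, 0.05, 0.0]
fn = select_transform("Fan with bearing damage on the drive side")
anomalous = fn(normal)
assert len(anomalous) == len(normal)
```

The point of the LLM in the real system is exactly what the stub cannot do: generalize to fault descriptions that no manually defined keyword covers.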

Updated: 2025-07-28 09:42:41

Categories: eess.AS, cs.AI, cs.LG, cs.SD

Download: http://arxiv.org/abs/2507.20666v1

Advancing Compositional LLM Reasoning with Structured Task Relations in Interactive Multimodal Communications

Interactive multimodal applications (IMAs), such as route planning in the Internet of Vehicles, enrich users' personalized experiences by integrating various forms of data over wireless networks. Recent advances in large language models (LLMs) utilize mixture-of-experts (MoE) mechanisms to empower multiple IMAs, with each LLM trained individually for a specific task with its own business workflow. In contrast to existing approaches that rely on multiple LLMs for IMAs, this paper presents a novel paradigm that accomplishes various IMAs using a single compositional LLM over wireless networks. The two primary challenges include 1) guiding a single LLM to adapt to diverse IMA objectives and 2) ensuring the flexibility and efficiency of the LLM in resource-constrained mobile environments. To tackle the first challenge, we propose ContextLoRA, a novel method that guides an LLM to learn the rich structured context among IMAs by constructing a task dependency graph. We partition the learnable parameter matrix of neural layers for each IMA to facilitate LLM composition. Then, we develop a step-by-step fine-tuning procedure guided by task relations, including training, freezing, and masking phases. This allows the LLM to learn to reason among tasks for better adaptation, capturing the latent dependencies between tasks. For the second challenge, we introduce ContextGear, a scheduling strategy to optimize the training procedure of ContextLoRA, aiming to minimize computational and communication costs through a strategic grouping mechanism. Experiments on three benchmarks show the superiority of the proposed ContextLoRA and ContextGear. Furthermore, we prototype our proposed paradigm on a real-world wireless testbed, demonstrating its practical applicability for various IMAs. We will release our code to the community.
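
The partition-and-mask idea, where each task owns a slice of the learnable parameters and a phase decides which slices a gradient step may touch, can be sketched with toy vectors (names, sizes, and the flat-vector layout are assumptions, not the paper's layer matrices):

```python
# Sketch of partitioned parameters with train/freeze phases (toy flat
# vector in place of the paper's per-layer parameter matrices). Each
# task owns a slice; a step only updates the slices marked trainable,
# so earlier tasks stay frozen while later ones fine-tune.

def apply_masked_update(params, grad, trainable_slices, lr=0.1):
    """One gradient step restricted to the trainable parameter slices."""
    out = list(params)
    for lo, hi in trainable_slices:
        for i in range(lo, hi):
            out[i] -= lr * grad[i]
    return out

params = [1.0] * 6
grad = [1.0] * 6
# hypothetical task partition of the parameter vector:
partitions = {"route_planning": (0, 3), "captioning": (3, 6)}

# training phase for the second task with the first task frozen:
updated = apply_masked_update(params, grad, [partitions["captioning"]])
assert updated[:3] == [1.0, 1.0, 1.0]   # frozen slice untouched
assert updated[3:] == [0.9, 0.9, 0.9]   # trained slice stepped
```

The task dependency graph in the paper determines the order in which slices move through the training, freezing, and masking phases; the sketch only shows the masking mechanics of a single step.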

Updated: 2025-07-28 09:33:12

Categories: cs.LG, cs.AI, cs.DC, cs.HC

Download: http://arxiv.org/abs/2507.21199v1

Towards trustworthy AI in materials mechanics through domain-guided attention

Ensuring the trustworthiness and robustness of deep learning models remains a fundamental challenge, particularly in high-stakes scientific applications. In this study, we present a framework called attention-guided training that combines explainable artificial intelligence techniques with quantitative evaluation and domain-specific priors to guide model attention. We demonstrate that domain-specific feedback on model explanations during training can enhance the model's generalization capabilities. We validate our approach on the task of semantic crack tip segmentation in digital image correlation data, which is a key application in the fracture-mechanical characterization of materials. By aligning model attention with physically meaningful stress fields, such as those described by Williams' analytical solution, attention-guided training ensures that the model focuses on physically relevant regions. This ultimately leads to improved generalization and more faithful explanations.
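
The combined objective implied above can be sketched as a task loss plus a penalty for attention mass that strays from a domain prior. The plain MSE penalty and the weight `lam` below are assumptions for illustration, not the paper's exact formulation:

```python
# Sketch of an attention-guided objective: the usual task loss plus a
# penalty for attention that deviates from a domain prior (here a plain
# MSE to the prior map; the weighting `lam` is illustrative).

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def attention_guided_loss(task_loss, attention_map, prior_map, lam=0.1):
    return task_loss + lam * mse(attention_map, prior_map)

prior = [0.0, 0.2, 0.8, 0.0]      # e.g. stress concentrated near a crack tip
aligned = [0.0, 0.25, 0.75, 0.0]  # attention near the physical prior
diffuse = [0.25, 0.25, 0.25, 0.25]

# identical task loss, but aligned attention is penalized less:
assert (attention_guided_loss(1.0, aligned, prior)
        < attention_guided_loss(1.0, diffuse, prior))
```

During training the penalty term is what delivers the "domain-specific feedback on model explanations" described above: gradients of the penalty push attention toward physically relevant regions.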

Updated: 2025-07-28 09:23:24

Categories: cond-mat.mtrl-sci, cs.LG

Download: http://arxiv.org/abs/2507.20658v1

Hot-Swap MarkBoard: An Efficient Black-box Watermarking Approach for Large-scale Model Distribution

Recently, Deep Learning (DL) models have been increasingly deployed on end-user devices as On-Device AI, offering improved efficiency and privacy. However, this deployment trend poses more serious Intellectual Property (IP) risks, as models are distributed on numerous local devices, making them vulnerable to theft and redistribution. Most existing ownership protection solutions (e.g., backdoor-based watermarking) are designed for cloud-based AI-as-a-Service (AIaaS) and are not directly applicable to large-scale distribution scenarios, where each user-specific model instance must carry a unique watermark. These methods typically embed a fixed watermark, and modifying the embedded watermark requires retraining the model. To address these challenges, we propose Hot-Swap MarkBoard, an efficient watermarking method. It encodes user-specific $n$-bit binary signatures by independently embedding multiple watermarks into a multi-branch Low-Rank Adaptation (LoRA) module, enabling efficient watermark customization without retraining through branch swapping. A parameter obfuscation mechanism further entangles the watermark weights with those of the base model, preventing removal without degrading model performance. The method supports black-box verification and is compatible with various model architectures and DL tasks, including classification, image generation, and text generation. Extensive experiments across three types of tasks and six backbone models demonstrate our method's superior efficiency and adaptability compared to existing approaches, achieving 100\% verification accuracy.
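
The branch-swapping idea can be sketched with toy scalars standing in for the LoRA low-rank matrices (all values and names illustrative): each signature bit selects one of two pre-trained branch deltas, so re-keying a model instance is a swap rather than a retraining run.

```python
# Sketch of signature-to-branch swapping (toy scalar deltas in place of
# the paper's multi-branch LoRA matrices; all values illustrative).
# Each bit of a user's n-bit signature picks one of two pre-trained
# branch deltas, so customizing a watermark needs no retraining.

def effective_delta(signature_bits, branch_pairs):
    """branch_pairs[i] = (delta_if_bit_0, delta_if_bit_1) for bit i."""
    return sum(pair[b] for b, pair in zip(signature_bits, branch_pairs))

branches = [(0.01, -0.01), (0.02, -0.02), (0.03, -0.03)]  # one pair per bit

alice = [1, 0, 1]
bob = [0, 1, 1]

# distinct signatures produce distinct watermark deltas on the weights:
assert effective_delta(alice, branches) != effective_delta(bob, branches)
```

The paper's parameter obfuscation then entangles these deltas with the base weights so the sum cannot be stripped back out; the sketch only shows why swapping branches is enough to re-key an instance.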

Updated: 2025-07-28 09:14:21

Categories: cs.CR, cs.AI

Download: http://arxiv.org/abs/2507.20650v1

The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks

Deep learning succeeds by doing hierarchical feature learning, yet tuning hyper-parameters (HPs) such as initialization scales, learning rates, etc., gives only indirect control over this behavior. In this paper, we introduce a key notion to predict and control feature learning: the angle $\theta_\ell$ between the feature updates and the backward pass (at layer index $\ell$). We show that the magnitude of feature updates after one GD step, at any training time, can be expressed via a simple and general \emph{feature speed formula} in terms of this angle $\theta_\ell$, the loss decay, and the magnitude of the backward pass. This angle $\theta_\ell$ is controlled by the conditioning of the layer-to-layer Jacobians and at random initialization, it is determined by the spectrum of a certain kernel, which coincides with the Neural Tangent Kernel when $\ell=\text{depth}$. Given $\theta_\ell$, the feature speed formula provides us with rules to adjust HPs (scales and learning rates) so as to satisfy certain dynamical properties, such as feature learning and loss decay. We investigate the implications of our approach for ReLU MLPs and ResNets in the large width-then-depth limit. Relying on prior work, we show that in ReLU MLPs with iid initialization, the angle degenerates with depth as $\cos(\theta_\ell)=\Theta(1/\sqrt{\ell})$. In contrast, ResNets with branch scale $O(1/\sqrt{\text{depth}})$ maintain a non-degenerate angle $\cos(\theta_\ell)=\Theta(1)$. We use these insights to recover key properties of known HP scalings and also to introduce a new HP scaling for large depth ReLU MLPs with favorable theoretical properties.
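
The central quantity above is an angle between two per-layer vectors. A minimal helper (not the paper's feature speed formula; the example vectors are illustrative) computes $\cos(\theta_\ell)$ from a feature-update vector and a backward-pass vector:

```python
import math

# The diagnostic quantity in the abstract is the angle theta_l between
# the feature update and the backward pass at layer l. A minimal helper
# (illustrative; not the paper's feature speed formula) computes its
# cosine from the two vectors.

def cos_angle(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

delta_h = [0.1, 0.2, -0.1]    # feature update at layer l (illustrative)
backward = [0.2, 0.4, -0.2]   # backward-pass direction at layer l

# perfectly aligned vectors give cos(theta_l) = 1:
assert abs(cos_angle(delta_h, backward) - 1.0) < 1e-9
```

In the paper's terms, a degenerate $\cos(\theta_\ell) = \Theta(1/\sqrt{\ell})$ means this quantity shrinks with depth for iid-initialized ReLU MLPs, while suitably scaled ResNet branches keep it bounded away from zero.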

Updated: 2025-07-28 09:11:45

Categories: cs.LG (MSC: 68T07)

Download: http://arxiv.org/abs/2311.18718v4

Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs

The automatic analysis of document layouts in digital-born PDF documents remains a challenging problem due to the heterogeneous arrangement of textual and non-textual elements and the imprecision of the textual metadata in the Portable Document Format. In this work, we benchmark Graph Neural Network (GNN) architectures for the task of fine-grained layout classification of text blocks from digital-native documents. We introduce two graph construction structures: a k-closest-neighbor graph and a fully connected graph, and generate node features via pre-trained text and vision models, thus avoiding manual feature engineering. Three experimental frameworks are evaluated: single-modality (text or visual), concatenated multimodal, and dual-branch multimodal. We evaluate four foundational GNN models and compare them with the baseline. Our experiments are specifically conducted on a rich dataset of public affairs documents that includes more than 20 sources (e.g., regional and national-level official gazettes), 37K PDF documents, and 441K pages in total. Our results demonstrate that GraphSAGE operating on the k-closest-neighbor graph in a dual-branch configuration achieves the highest per-class and overall accuracy, outperforming the baseline in some sources. These findings confirm the importance of local layout relationships and multimodal fusion exploited through GNNs for the analysis of native digital document layouts.

Updated: 2025-07-28 09:10:12

Categories: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2505.14699v2
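The k-closest-neighbor graph construction described in the abstract can be sketched directly from text-block centroids; the coordinates and k below are illustrative, not the paper's setup.

```python
import numpy as np

def knn_graph(points, k):
    """Directed k-closest-neighbor edges over 2-D block centroids."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]  # k nearest neighbors per node
    return [(i, int(j)) for i in range(len(points)) for j in nbrs[i]]

# Four hypothetical text-block centroids on a page
centroids = np.array([[0., 0.], [0., 1.], [5., 5.], [0., 2.]])
edges = knn_graph(centroids, k=2)
```

Node features (from pre-trained text/vision encoders in the paper) would then be attached to each node before running a GNN such as GraphSAGE over these edges.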

Machine Learning Solutions Integrated in an IoT Healthcare Platform for Heart Failure Risk Stratification

The management of chronic Heart Failure (HF) presents significant challenges in modern healthcare, requiring continuous monitoring, early detection of exacerbations, and personalized treatment strategies. In this paper, we present a predictive model founded on Machine Learning (ML) techniques to identify patients at HF risk. This model is an ensemble learning approach, a modified stacking technique, that uses two specialized models leveraging clinical and echocardiographic features and then a meta-model to combine the predictions of these two models. We initially assess the model on a real dataset and the obtained results suggest that it performs well in the stratification of patients at HF risk. Specifically, we obtained high sensitivity (95\%), ensuring that nearly all high-risk patients are identified. As for accuracy, we obtained 84\%, which can be considered moderate in some ML contexts. However, it is acceptable given our priority of identifying patients at risk of HF, because they will be asked to participate in the telemonitoring program of the PrediHealth research project on which some of the authors of this paper are working. The initial findings also suggest that ML-based risk stratification models can serve as valuable decision-support tools not only in the PrediHealth project but also for healthcare professionals, aiding in early intervention and personalized patient management. To better understand the value and potential of our predictive model, we also contrasted its results with those obtained by using three baseline models. The preliminary results indicate that our predictive model outperforms these baselines, which flatly consider features, i.e., without grouping them into clinical and echocardiographic features.

Updated: 2025-07-28 09:08:11

Categories: stat.OT,cs.AI

Download: http://arxiv.org/abs/2505.09619v5
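The modified-stacking idea, two specialist models over clinical and echocardiographic feature groups fused by a meta-model, can be sketched as below. Every feature name, coefficient, and threshold here is a hypothetical stand-in, not the paper's trained models.

```python
import numpy as np

# Hypothetical specialist models: each scores risk from one feature group.
def clinical_model(age, nyha_class):
    # illustrative logistic score over clinical features
    return 1 / (1 + np.exp(-(0.04 * (age - 60) + 0.8 * (nyha_class - 2))))

def echo_model(ef_percent):
    # illustrative score over an echocardiographic feature:
    # lower ejection fraction -> higher risk
    return 1 / (1 + np.exp(0.15 * (ef_percent - 40)))

def meta_model(p_clin, p_echo, w=(0.5, 0.5), threshold=0.5):
    # meta-model combines the two specialists' predictions
    return int(w[0] * p_clin + w[1] * p_echo >= threshold)

risk = meta_model(clinical_model(age=72, nyha_class=3),
                  echo_model(ef_percent=30))
```

The grouping is the point of contrast with the paper's baselines, which feed all features to a single model "flatly" instead of through per-group specialists.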

Deep Generative Models of Evolution: SNP-level Population Adaptation by Genomic Linkage Incorporation

The investigation of allele frequency trajectories in populations evolving under controlled environmental pressures has become a popular approach to study evolutionary processes on the molecular level. Statistical models based on well-defined evolutionary concepts can be used to validate different hypotheses about empirical observations. Despite their popularity, classic statistical models like the Wright-Fisher model suffer from simplified assumptions such as the independence of selected loci along a chromosome and uncertainty about the parameters. Deep generative neural networks offer a powerful alternative known for the integration of multivariate dependencies and noise reduction. Due to their high data demands and challenging interpretability they have, so far, not been widely considered in the area of population genomics. To address the challenges in the area of Evolve and Resequencing experiments (E&R) based on pooled sequencing (Pool-Seq) data, we introduce a deep generative neural network that aims to model a concept of evolution based on empirical observations over time. The proposed model estimates the distribution of allele frequency trajectories by embedding the observations from single nucleotide polymorphisms (SNPs) with information from neighboring loci. Evaluation on simulated E&R experiments demonstrates the model's ability to capture the distribution of allele frequency trajectories and illustrates the representational power of deep generative models on the example of linkage disequilibrium (LD) estimation. Inspecting the internally learned representations enables estimating pairwise LD, which is typically inaccessible in Pool-Seq data. Our model provides competitive LD estimation in Pool-Seq data with a high degree of LD when compared to existing methods.

Updated: 2025-07-28 09:03:09

Categories: cs.LG,q-bio.PE

Download: http://arxiv.org/abs/2507.20644v1
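For context, the Wright-Fisher model that the abstract cites as the classical baseline reduces to binomial resampling of allele counts each generation. A minimal sketch with an illustrative selection coefficient (parameters are not from the paper):

```python
import numpy as np

def wf_trajectory(p0, n_gen, pop_size, s=0.0, rng=None):
    """Wright-Fisher allele-frequency trajectory with haploid selection."""
    rng = rng or np.random.default_rng(0)
    freqs = [p0]
    for _ in range(n_gen):
        p = freqs[-1]
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))  # selection step
        # genetic drift: binomial resampling of 2N allele copies
        freqs.append(rng.binomial(2 * pop_size, p_sel) / (2 * pop_size))
    return np.array(freqs)

traj = wf_trajectory(p0=0.5, n_gen=50, pop_size=200, s=0.05)
```

Each locus evolves independently in this classical model; relaxing exactly that independence, by conditioning on neighboring loci, is what the proposed generative network adds.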

Ontology-Enhanced Knowledge Graph Completion using Large Language Models

Large Language Models (LLMs) have been extensively adopted in Knowledge Graph Completion (KGC), showcasing significant research advancements. However, as black-box models driven by deep neural architectures, current LLM-based KGC methods rely on implicit knowledge representation with parallel propagation of erroneous knowledge, thereby hindering their ability to produce conclusive and decisive reasoning outcomes. We aim to integrate neural-perceptual structural information with ontological knowledge, leveraging the powerful capabilities of LLMs to achieve a deeper understanding of the intrinsic logic of the knowledge. We propose an ontology-enhanced KGC method using LLMs -- OL-KGC. It first leverages neural perceptual mechanisms to effectively embed structural information into the textual space, and then uses an automated extraction algorithm to retrieve ontological knowledge from the knowledge graphs (KGs) that need to be completed, which is further transformed into a textual format comprehensible to LLMs for providing logic guidance. We conducted extensive experiments on three widely-used benchmarks -- FB15K-237, UMLS and WN18RR. The experimental results demonstrate that OL-KGC significantly outperforms existing mainstream KGC methods across multiple evaluation metrics, achieving state-of-the-art performance.

Updated: 2025-07-28 09:00:48

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.20643v1
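The abstract does not spell out how structural information is embedded. As a hedged illustration of how KG structure is commonly embedded at all, a TransE-style translation score (a generic stand-in, not OL-KGC's mechanism) ranks true triples above corrupted ones:

```python
import numpy as np

# TransE idea: score a triple (h, r, t) by -||e_h + e_r - e_t||,
# so valid triples score higher than corrupted ones. Entities and the
# relation vector below are toy constructions, not learned embeddings.
rng = np.random.default_rng(0)
ent = {"Paris": rng.normal(size=8),
       "France": rng.normal(size=8),
       "Berlin": rng.normal(size=8)}
# toy relation chosen so the true triple scores exactly 0
rel = {"capital_of": ent["France"] - ent["Paris"]}

def score(h, r, t):
    return -np.linalg.norm(ent[h] + rel[r] - ent[t])

true_s = score("Paris", "capital_of", "France")
bad_s = score("Berlin", "capital_of", "France")
```

OL-KGC's contribution is layered on top of such structural signals: the scores and retrieved ontological axioms are verbalized into text so the LLM can apply logical constraints when completing the graph.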

Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory

This paper proposes a novel theoretical framework for guaranteeing and evaluating the resilience of long short-term memory (LSTM) networks in control systems. We introduce "recovery time" as a new metric of resilience in order to quantify the time required for an LSTM to return to its normal state after anomalous inputs. By mathematically refining incremental input-to-state stability ($\delta$ISS) theory for LSTM, we derive a practical data-independent upper bound on recovery time. This upper bound enables resilience-aware training. Experimental validation on simple models demonstrates the effectiveness of our resilience estimation and control methods, strengthening the foundation for rigorous quality assurance in safety-critical AI applications.

Updated: 2025-07-28 09:00:00

Categories: cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2505.17696v2
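The "recovery time" metric can be sketched on a toy contractive recurrence standing in for a $\delta$ISS LSTM: an anomalous input perturbs the state, and we count the steps until the perturbed trajectory re-enters an epsilon-tube around the nominal one. The dynamics and tolerance below are illustrative, not the paper's bound.

```python
import numpy as np

def step(x, u, a=0.6):
    # contractive toy dynamics (|a| < 1 makes state gaps shrink each step),
    # standing in for an LSTM cell satisfying a delta-ISS condition
    return np.tanh(a * x + u)

def recovery_time(x_nominal, x_perturbed, u=0.0, eps=1e-3, max_steps=1000):
    """Steps until the perturbed state is within eps of the nominal one."""
    for t in range(max_steps):
        if abs(x_perturbed - x_nominal) < eps:
            return t
        x_nominal, x_perturbed = step(x_nominal, u), step(x_perturbed, u)
    return max_steps

# anomalous input pushed the state to 0.9 while the nominal state is 0.0
t_rec = recovery_time(x_nominal=0.0, x_perturbed=0.9)
```

The paper's contribution is a data-independent upper bound on this quantity, which in turn can be optimized during training.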

IGNIS: A Robust Neural Network Framework for Constrained Parameter Estimation in Archimedean Copulas

Classical estimators, the cornerstones of statistical inference, face insurmountable challenges when applied to important emerging classes of Archimedean copulas. These models exhibit pathological properties, including numerically unstable densities, non-monotonic parameter-to-dependence mappings, and vanishingly small likelihood gradients, rendering methods like Maximum Likelihood (MLE) and Method of Moments (MoM) inconsistent or computationally infeasible. We introduce IGNIS, a unified neural estimation framework that sidesteps these barriers by learning a direct, robust mapping from data-driven dependency measures to the underlying copula parameter theta. IGNIS utilizes a multi-input architecture and a theory-guided output layer (softplus(z) + 1) to automatically enforce the domain constraint theta_hat >= 1. Trained and validated on four families (Gumbel, Joe, and the numerically challenging A1/A2), IGNIS delivers accurate and stable estimates for real-world financial and health datasets, demonstrating its necessity for reliable inference in modern, complex dependence models where traditional methods fail.

Updated: 2025-07-28 08:58:45

Categories: stat.ML,cs.LG,62H05, 62H12, 62F10, 68T07, 62-08

Download: http://arxiv.org/abs/2505.22518v3
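The constrained output layer named in the abstract, softplus(z) + 1, guarantees theta_hat >= 1 for any raw network output z, since softplus is strictly positive. A minimal sketch using a numerically stable softplus (the layer form is from the abstract; the inputs are illustrative):

```python
import numpy as np

def softplus(z):
    # stable evaluation of log(1 + exp(z)) for large |z|
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0)

def theta_hat(z):
    # IGNIS output layer: enforces the copula domain constraint theta >= 1
    return softplus(z) + 1.0

zs = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])
thetas = theta_hat(zs)
```

Because the constraint is built into the architecture, no projection or penalty term is needed during training.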

Adaptive Fuzzy Time Series Forecasting via Partially Asymmetric Convolution and Sub-Sliding Window Fusion

At present, state-of-the-art forecasting models lack the ability to capture spatio-temporal dependency and synthesize global information at the learning stage. To address this issue, in this paper, through the adaptive fuzzified construction of temporal data, we propose a novel convolutional architecture with a partially asymmetric design based on the sliding-window scheme to realize accurate time series forecasting. First, the construction strategy of traditional fuzzy time series is improved to further extract short- and long-term temporal interrelations, which enables every time node to automatically possess corresponding global information and the inner relationships among them within a restricted sliding window; the process does not require human involvement. Second, a bilateral Atrous algorithm is devised to reduce the computational demand of the proposed model without sacrificing the global characteristics of elements. It also allows the model to avoid processing redundant information. Third, after the transformation of the time series, a partially asymmetric convolutional architecture is designed to mine data features more flexibly with filters in different directions on feature maps, which gives the convolutional neural network (CNN) the ability to construct sub-windows within existing sliding windows to model at a more fine-grained level. After obtaining the time series information at different levels, the multi-scale features from different sub-windows are sent to the corresponding network layer for time series information fusion. Compared with other competitive modern models, the proposed method achieves state-of-the-art results on most popular time series datasets, which is fully verified by the experimental results.

Updated: 2025-07-28 08:58:25

Categories: cs.AI,cs.IT,math.IT

Download: http://arxiv.org/abs/2507.20641v1
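The window-within-window layout behind the sub-sliding-window fusion can be sketched as plain index arithmetic; the window and sub-window sizes below are illustrative, not the paper's settings.

```python
import numpy as np

def windows(series, win, sub):
    """Sliding windows plus the sub-windows nested inside each one."""
    out = []
    for i in range(len(series) - win + 1):
        w = series[i:i + win]                              # sliding window
        subs = [w[j:j + sub] for j in range(win - sub + 1)]  # sub-windows
        out.append((w, subs))
    return out

series = np.arange(10.0)
ws = windows(series, win=4, sub=2)
```

In the proposed architecture, features from each sub-window scale are routed to a matching network layer and fused, rather than being concatenated into a single flat input.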

Free-form language-based robotic reasoning and grasping

Performing robotic grasping from a cluttered bin based on human instructions is a challenging task, as it requires understanding both the nuances of free-form language and the spatial relationships between objects. Vision-Language Models (VLMs) trained on web-scale data, such as GPT-4o, have demonstrated remarkable reasoning capabilities across both text and images. But can they truly be used for this task in a zero-shot setting? And what are their limitations? In this paper, we explore these research questions via the free-form language-based robotic grasping task, and propose a novel method, FreeGrasp, leveraging the pre-trained VLMs' world knowledge to reason about human instructions and object spatial arrangements. Our method detects all objects as keypoints and uses these keypoints to annotate marks on images, aiming to facilitate GPT-4o's zero-shot spatial reasoning. This allows our method to determine whether a requested object is directly graspable or if other objects must be grasped and removed first. Since no existing dataset is specifically designed for this task, we introduce a synthetic dataset FreeGraspData by extending the MetaGraspNetV2 dataset with human-annotated instructions and ground-truth grasping sequences. We conduct extensive analyses with both FreeGraspData and real-world validation with a gripper-equipped robotic arm, demonstrating state-of-the-art performance in grasp reasoning and execution. Project website: https://tev-fbk.github.io/FreeGrasp/.

Updated: 2025-07-28 08:53:04

Categories: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2503.13082v2

InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing

Face anti-spoofing (FAS) aims to construct a robust system that can withstand diverse attacks. While recent efforts have concentrated mainly on cross-domain generalization, two significant challenges persist: limited semantic understanding of attack types and training redundancy across domains. We address the first by integrating vision-language models (VLMs) to enhance the perception of visual input. For the second challenge, we employ a meta-domain strategy to learn a unified model that generalizes well across multiple domains. Our proposed InstructFLIP is a novel instruction-tuned framework that leverages VLMs to enhance generalization via textual guidance trained solely on a single domain. At its core, InstructFLIP explicitly decouples instructions into content and style components, where content-based instructions focus on the essential semantics of spoofing, and style-based instructions consider variations related to the environment and camera characteristics. Extensive experiments demonstrate the effectiveness of InstructFLIP by outperforming SOTA models in accuracy and substantially reducing training redundancy across diverse domains in FAS. Project website is available at https://kunkunlin1221.github.io/InstructFLIP.

Updated: 2025-07-28 08:51:08

Categories: cs.CV,cs.AI,cs.MM

Download: http://arxiv.org/abs/2507.12060v2

Explainable Synthetic Image Detection through Diffusion Timestep Ensembling

Recent advances in diffusion models have enabled the creation of deceptively real images, posing significant security risks when misused. In this study, we empirically show that different timesteps of DDIM inversion reveal varying subtle distinctions between synthetic and real images that are extractable for detection, such as Fourier power spectrum high-frequency discrepancies and inter-pixel variance distributions. Based on these observations, we propose a novel synthetic image detection method that directly utilizes features of intermediately noised images by training an ensemble on multiple noised timesteps, circumventing conventional reconstruction-based strategies. To enhance human comprehension, we introduce a metric-grounded explanation generation and refinement module to identify and explain AI-generated flaws. Additionally, we construct the GenHard and GenExplain benchmarks to provide detection samples of greater difficulty and high-quality rationales for fake images. Extensive experiments show that our method achieves state-of-the-art performance with 98.91% and 95.89% detection accuracy on regular and challenging samples respectively, and demonstrates generalizability and robustness. Our code and datasets are available at https://github.com/Shadowlized/ESIDE.
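The inter-pixel variance cue and the ensemble over timesteps can be sketched in a simplified form. This is a hypothetical statistic and a plain majority vote; the paper trains learned classifiers on features of intermediately noised images rather than thresholding a hand-picked statistic:

```python
def interpixel_variance(image):
    """Mean squared difference between horizontally adjacent pixels.

    A toy local statistic: synthetic images often show atypical values of
    such inter-pixel measures compared with camera-captured ones.
    """
    diffs = [
        (row[i + 1] - row[i]) ** 2
        for row in image
        for i in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs)

def ensemble_vote(scores, threshold):
    """Majority vote over per-timestep detector scores (here: a single
    shared threshold stands in for per-timestep learned classifiers)."""
    votes = sum(1 for s in scores if s > threshold)
    return votes * 2 > len(scores)  # True -> flagged as synthetic
```

A smooth constant image yields a variance of zero, while an alternating-pixel pattern yields a large value; the vote then aggregates such per-timestep evidence.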

Updated: 2025-07-28 08:49:27

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2503.06201v2

TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model

Large Vision-Language Models (LVLMs) have advanced multimodal learning but face high computational costs due to the large number of visual tokens, motivating token pruning to improve inference efficiency. The key challenge lies in identifying which tokens are truly important. Most existing approaches rely on attention-based criteria to estimate token importance. However, they inherently suffer from certain limitations, such as positional bias. In this work, we explore a new perspective on token importance based on token transitions in LVLMs. We observe that the transition of token representations provides a meaningful signal of semantic information. Based on this insight, we propose TransPrune, a training-free and efficient token pruning method. Specifically, TransPrune progressively prunes tokens by assessing their importance through a combination of Token Transition Variation (TTV)-which measures changes in both the magnitude and direction of token representations-and Instruction-Guided Attention (IGA), which measures how strongly the instruction attends to image tokens via attention. Extensive experiments demonstrate that TransPrune achieves comparable multimodal performance to original LVLMs, such as LLaVA-v1.5 and LLaVA-Next, across eight benchmarks, while reducing inference TFLOPs by more than half. Moreover, TTV alone can serve as an effective criterion without relying on attention, achieving performance comparable to attention-based methods. The code will be made publicly available upon acceptance of the paper at https://github.com/liaolea/TransPrune.
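A minimal sketch of a TTV-style score follows, assuming equal weighting of the magnitude and direction terms; the paper's exact formulation and its combination with IGA are not reproduced here:

```python
import math

def ttv_score(h_prev, h_next):
    """Token Transition Variation: change in magnitude plus change in
    direction of one token's representation between consecutive layers.
    Equal weighting of the two terms is an assumption of this sketch."""
    norm_p = math.sqrt(sum(x * x for x in h_prev))
    norm_n = math.sqrt(sum(x * x for x in h_next))
    magnitude_change = abs(norm_n - norm_p)
    cosine = sum(a * b for a, b in zip(h_prev, h_next)) / (norm_p * norm_n)
    direction_change = 1.0 - cosine
    return magnitude_change + direction_change

def prune_by_ttv(reps_prev, reps_next, keep_ratio):
    """Rank tokens by TTV and keep the top fraction; returns kept indices."""
    scores = [ttv_score(p, n) for p, n in zip(reps_prev, reps_next)]
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

A token whose representation rotates or grows between layers scores higher and survives pruning, while tokens whose representations barely move are dropped.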

Updated: 2025-07-28 08:44:58

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.20630v1

Swin-TUNA : A Novel PEFT Approach for Accurate Food Image Segmentation

In the field of food image processing, efficient semantic segmentation techniques are crucial for industrial applications. However, existing large-scale Transformer-based models (such as FoodSAM) face challenges in meeting practical deployment requirements due to their massive parameter counts and high computational resource demands. This paper introduces the TUNable Adapter module (Swin-TUNA), a Parameter Efficient Fine-Tuning (PEFT) method that integrates multiscale trainable adapters into the Swin Transformer architecture, achieving high-performance food image segmentation by updating only 4% of the parameters. The core innovation of Swin-TUNA lies in its hierarchical feature adaptation mechanism: it designs depthwise-separable convolutions and dimensional mappings at varying scales to address the differences in features between shallow and deep networks, combined with a dynamic balancing strategy for task-agnostic and task-specific features. Experiments demonstrate that this method achieves mIoU of 50.56% and 74.94% on the FoodSeg103 and UECFoodPix Complete datasets, respectively, surpassing the fully parameterized FoodSAM model while reducing the parameter count by 98.7% (to only 8.13M). Furthermore, Swin-TUNA exhibits faster convergence and stronger generalization capabilities in low-data scenarios, providing an efficient solution for building lightweight food image segmentation systems.
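The adapter idea behind the parameter savings can be sketched as a bottleneck module added beside a frozen backbone layer. The `BottleneckAdapter` below is a generic PEFT illustration, not Swin-TUNA's actual multiscale design; the zero initialization (making the module an identity at the start of training) is a common adapter convention assumed here:

```python
def linear(x, W, b):
    """Plain affine map: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

class BottleneckAdapter:
    """Down-project to a small rank, apply ReLU, up-project, add residual.

    Only these weights would be trained; the frozen backbone layer whose
    output `h` it receives is left untouched -- the PEFT idea behind
    updating only a few percent of parameters.
    """
    def __init__(self, dim, rank):
        self.W_down = [[0.0] * dim for _ in range(rank)]
        self.b_down = [0.0] * rank
        self.W_up = [[0.0] * rank for _ in range(dim)]
        self.b_up = [0.0] * dim

    def __call__(self, h):
        z = [max(0.0, v) for v in linear(h, self.W_down, self.b_down)]
        return [hi + ui for hi, ui in zip(h, linear(z, self.W_up, self.b_up))]

    def n_params(self):
        return (sum(len(r) for r in self.W_down) + len(self.b_down)
                + sum(len(r) for r in self.W_up) + len(self.b_up))
```

With `dim=4, rank=1` the adapter has 13 parameters against 16 for a dense 4x4 layer; at realistic dimensions the bottleneck makes the trainable fraction small.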

Updated: 2025-07-28 08:44:20

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.17347v3

Controllable Video-to-Music Generation with Multiple Time-Varying Conditions

Music enhances video narratives and emotions, driving demand for automatic video-to-music (V2M) generation. However, existing V2M methods relying solely on visual features or supplementary textual inputs generate music in a black-box manner, often failing to meet user expectations. To address this challenge, we propose a novel multi-condition guided V2M generation framework that incorporates multiple time-varying conditions for enhanced control over music generation. Our method uses a two-stage training strategy that enables learning of V2M fundamentals and audiovisual temporal synchronization while meeting users' needs for multi-condition control. In the first stage, we introduce a fine-grained feature selection module and a progressive temporal alignment attention mechanism to ensure flexible feature alignment. For the second stage, we develop a dynamic conditional fusion module and a control-guided decoder module to integrate multiple conditions and accurately guide the music composition process. Extensive experiments demonstrate that our method outperforms existing V2M pipelines in both subjective and objective evaluations, significantly enhancing control and alignment with user expectations.

Updated: 2025-07-28 08:41:20

Categories: cs.MM,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2507.20627v1

Learning Before Filtering: Real-Time Hardware Learning at the Detector Level

Advances in sensor technology and automation have ushered in an era of data abundance, where the ability to identify and extract relevant information in real time has become increasingly critical. Traditional filtering approaches, which depend on a priori knowledge, often struggle to adapt to dynamic or unanticipated data features. Machine learning offers a compelling alternative-particularly when training can occur directly at or near the detector. This paper presents a digital hardware architecture designed for real-time neural network training, specifically optimized for high-throughput data ingestion. The design is described in an implementation-independent manner, with detailed analysis of each architectural component and their performance implications. Through system parameterization, the study explores trade-offs between processing speed, model complexity, and hardware resource utilization. Practical examples illustrate how these parameters affect applicability across various use cases. A proof-of-concept implementation on an FPGA demonstrates in-situ training, confirming that computational accuracy is preserved relative to conventional software-based approaches. Moreover, resource estimates indicate that current-generation FPGAs can train networks of approximately 3,500 neurons per chip. The architecture is both scalable and adaptable, representing a significant advancement toward integrating learning directly within detector systems and enabling a new class of extreme-edge, real-time information processing.
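The in-situ, sample-at-a-time training the architecture targets can be illustrated in software with a one-pass, perceptron-style update loop. This is a toy stand-in for the hardware pipeline: the actual design trains multi-neuron networks in digital logic at ingestion rate, whereas the sketch below just shows the streaming update pattern:

```python
def train_online(stream, lr=0.1):
    """One pass over a data stream with an update per sample, as a
    detector-side pipeline would see the data (no batching, no replay).

    `stream` yields (features, label) pairs with 2-d features and labels
    in {0, 1}; the perceptron rule is an illustrative learning rule only.
    """
    w = [0.0, 0.0]
    b = 0.0
    for x, y in stream:
        z = w[0] * x[0] + w[1] * x[1] + b
        pred = 1.0 if z > 0 else 0.0
        err = y - pred                  # zero when already correct
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
        b += lr * err
    return w, b
```

Each sample is consumed once and immediately discarded, which is the property that lets such learning sit upstream of any filtering stage.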

Updated: 2025-07-28 08:40:03

Categories: hep-ex,cs.LG

Download: http://arxiv.org/abs/2506.11981v2

Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit

With the development of lightweight deep learning algorithms, various deep neural network (DNN) models have been proposed for the remote sensing scene classification (RSSC) application. However, it remains challenging for these RSSC models to achieve an optimal balance among model accuracy, inference latency, and energy consumption on resource-constrained edge devices. In this paper, we propose a lightweight RSSC framework, which includes a distilled global filter network (GFNet) model and an early-exit mechanism designed for edge devices to achieve state-of-the-art performance. Specifically, we first apply frequency-domain distillation to the GFNet model to reduce model size. Then we design a dynamic early-exit model tailored for DNN models on edge devices to further improve inference efficiency. We evaluate our E3C model on three edge devices across four datasets. Extensive experimental results show that it achieves an average 1.3x speedup in model inference and over 40% improvement in energy efficiency, while maintaining high classification accuracy.
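The early-exit mechanism can be sketched as a cascade of classifier stages that stops at the first sufficiently confident prediction. This is a generic early-exit loop; the confidence threshold and stage interface are illustrative, not the paper's E3C configuration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_classify(stages, x, confidence=0.9):
    """Run classifier stages in order; return (class, depth) at the first
    exit whose top probability clears the threshold, skipping the compute
    of all later stages. Falls through to the final stage otherwise."""
    probs = None
    for depth, stage in enumerate(stages, start=1):
        probs = softmax(stage(x))
        if max(probs) >= confidence:
            return probs.index(max(probs)), depth
    return probs.index(max(probs)), len(stages)
```

Easy inputs exit at shallow depth (saving latency and energy), while ambiguous inputs pay for the full network, which is the source of the average speedup.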

Updated: 2025-07-28 08:36:36

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.20623v1

Complementarity-driven Representation Learning for Multi-modal Knowledge Graph Completion

Multi-modal Knowledge Graph Completion (MMKGC) aims to uncover hidden world knowledge in multimodal knowledge graphs by leveraging both multimodal and structural entity information. However, the inherent imbalance in multimodal knowledge graphs, where modality distributions vary across entities, poses challenges in utilizing additional modality data for robust entity representation. Existing MMKGC methods typically rely on attention or gate-based fusion mechanisms but overlook complementarity contained in multi-modal data. In this paper, we propose a novel framework named Mixture of Complementary Modality Experts (MoCME), which consists of a Complementarity-guided Modality Knowledge Fusion (CMKF) module and an Entropy-guided Negative Sampling (EGNS) mechanism. The CMKF module exploits both intra-modal and inter-modal complementarity to fuse multi-view and multi-modal embeddings, enhancing representations of entities. Additionally, we introduce an Entropy-guided Negative Sampling mechanism to dynamically prioritize informative and uncertain negative samples to enhance training effectiveness and model robustness. Extensive experiments on five benchmark datasets demonstrate that our MoCME achieves state-of-the-art performance, surpassing existing approaches.
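Entropy-guided negative sampling can be sketched as weighting candidate negatives by the entropy of the model's current score for them, so that uncertain candidates are drawn more often. Binary entropy and `random.choices` are illustrative choices here, not the paper's exact estimator:

```python
import math
import random

def entropy(p):
    """Binary entropy of the model's score for a candidate negative;
    highest when the model is most uncertain (p = 0.5)."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # clip to avoid log(0)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def sample_negatives(candidates, scores, k, rng=random):
    """Draw k negatives with probability proportional to the entropy of
    their scores, prioritizing informative/uncertain candidates."""
    weights = [entropy(s) for s in scores]
    return rng.choices(candidates, weights=weights, k=k)
```

A candidate the model already scores confidently (near 0 or 1) carries almost no weight, while a borderline candidate dominates the draw, matching the idea of prioritizing informative negatives during training.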

Updated: 2025-07-28 08:35:11

Categories: cs.AI,cs.CV

Download: http://arxiv.org/abs/2507.20620v1
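
The entropy-guided negative sampling (EGNS) mechanism described above can be illustrated by weighting candidate negatives by the entropy of the model's score, so uncertain negatives are drawn more often. The scores and Bernoulli-entropy weighting below are illustrative assumptions, not MoCME's exact formulation.

```python
import math
import random

def bernoulli_entropy(p):
    """Entropy of a Bernoulli score in (0, 1); peaks at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def sample_negatives(candidates, scores, k, rng):
    """candidates: negative triples; scores: model plausibility in (0, 1).

    Draws k negatives with probability proportional to score entropy,
    prioritizing informative, uncertain samples.
    """
    weights = [bernoulli_entropy(s) + 1e-9 for s in scores]
    return rng.choices(candidates, weights=weights, k=k)

rng = random.Random(0)
cands = ["n1", "n2", "n3"]
scores = [0.01, 0.5, 0.99]   # n2 is the most uncertain: entropy peaks at 0.5
drawn = sample_negatives(cands, scores, k=1000, rng=rng)
counts = {c: drawn.count(c) for c in cands}
```

Negatives the model is already sure about (scores near 0 or 1) contribute little gradient signal, which is why the weighting concentrates sampling on the uncertain middle.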

Secure Best Arm Identification in the Presence of a Copycat

Consider the problem of best arm identification with a security constraint. Specifically, assume a setup of stochastic linear bandits with $K$ arms of dimension $d$. In each arm pull, the player receives a reward that is the sum of the dot product of the arm with an unknown parameter vector and independent noise. The player's goal is to identify the best arm after $T$ arm pulls. Moreover, assume a copycat Chloe is observing the arm pulls. The player wishes to keep Chloe ignorant of the best arm. While a minimax-optimal algorithm identifies the best arm with an $\Omega\left(\frac{T}{\log(d)}\right)$ error exponent, it easily reveals its best-arm estimate to an outside observer, as the best arms are played more frequently. A naive secure algorithm that plays all arms equally results in an $\Omega\left(\frac{T}{d}\right)$ exponent. In this paper, we propose a secure algorithm that plays with \emph{coded arms}. The algorithm does not require any key or cryptographic primitives, yet achieves an $\Omega\left(\frac{T}{\log^2(d)}\right)$ exponent while revealing almost no information on the best arm.

Updated: 2025-07-28 08:28:48

Categories: cs.LG

Download: http://arxiv.org/abs/2507.18975v2
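
The "naive secure" baseline mentioned above, which plays all arms equally, can be sketched in a few lines: pull each arm the same number of times, estimate per-arm mean rewards, and return the argmax. This toy sketch (Gaussian noise, specific means, and the uniform-play schedule are assumptions) does not reproduce the paper's coded-arm scheme or its error exponents.

```python
import random

def identify_best_arm(true_means, pulls_per_arm, rng):
    """Uniform-play best-arm identification: equal pulls hide the best arm
    from an observer, at the cost of a worse error exponent."""
    estimates = []
    for mu in true_means:
        rewards = [mu + rng.gauss(0.0, 0.1) for _ in range(pulls_per_arm)]
        estimates.append(sum(rewards) / pulls_per_arm)
    return max(range(len(estimates)), key=estimates.__getitem__)

rng = random.Random(42)
best = identify_best_arm([0.1, 0.9, 0.3], pulls_per_arm=200, rng=rng)
```

Because every arm is pulled equally, Chloe learns nothing from the pull sequence; the paper's contribution is recovering most of the lost exponent while preserving that property.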

Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression

Large multimodal models (LMMs) have advanced significantly by integrating visual encoders with extensive language models, enabling robust reasoning capabilities. However, compressing LMMs for deployment on edge devices remains a critical challenge. In this work, we propose an adaptive search algorithm that optimizes sparsity and KV cache compression to enhance LMM efficiency. Utilizing the Tree-structured Parzen Estimator, our method dynamically adjusts pruning ratios and KV cache quantization bandwidth across different LMM layers, using model performance as the optimization objective. This approach uniquely combines pruning with key-value cache quantization and incorporates a fast pruning technique that eliminates the need for additional fine-tuning or weight adjustments, achieving efficient compression without compromising accuracy. Comprehensive evaluations on benchmark datasets, including LLaVA-1.5 7B and 13B, demonstrate our method's superiority over state-of-the-art techniques such as SparseGPT and Wanda across various compression levels. Notably, our framework's automatic allocation of KV cache compression resources sets a new standard in LMM optimization, delivering memory efficiency without sacrificing much performance.

Updated: 2025-07-28 08:27:40

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2507.20613v1
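
The per-layer search described above can be sketched as sampling a (pruning ratio, KV-cache bit-width) pair for each layer and keeping the configuration with the best score. In this hedged sketch, a toy quadratic score stands in for real model accuracy, and plain random search stands in for the Tree-structured Parzen Estimator the paper uses.

```python
import random

def toy_score(config):
    # Stand-in objective: pretend accuracy peaks at 50% sparsity and
    # 8-bit KV cache per layer (purely illustrative).
    return -sum((r - 0.5) ** 2 + ((b - 8) / 8.0) ** 2 for r, b in config)

def search_compression(n_layers, n_trials, rng):
    """Search per-layer (pruning_ratio, kv_bits) pairs against the objective."""
    best_cfg, best_val = None, float("-inf")
    for _ in range(n_trials):
        cfg = [(rng.uniform(0.0, 0.9), rng.choice([2, 4, 8, 16]))
               for _ in range(n_layers)]
        val = toy_score(cfg)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

rng = random.Random(0)
cfg, val = search_compression(n_layers=4, n_trials=500, rng=rng)
```

A TPE sampler would replace the uniform draws with a model of which regions of the configuration space have scored well, but the loop structure, per-layer allocation, and performance-as-objective framing are the same.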

Maximize margins for robust splicing detection

Despite recent progress in splicing detection, deep learning-based forensic tools remain difficult to deploy in practice due to their high sensitivity to training conditions. Even mild post-processing applied to evaluation images can significantly degrade detector performance, raising concerns about their reliability in operational contexts. In this work, we show that the same deep architecture can react very differently to unseen post-processing depending on the learned weights, despite achieving similar accuracy on in-distribution test data. This variability stems from differences in the latent spaces induced by training, which affect how samples are separated internally. Our experiments reveal a strong correlation between the distribution of latent margins and a detector's ability to generalize to post-processed images. Based on this observation, we propose a practical strategy for building more robust detectors: train several variants of the same model under different conditions, and select the one that maximizes latent margins.

Updated: 2025-07-28 08:20:46

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2508.00897v1
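
The selection strategy above can be sketched directly: given several trained variants with similar in-distribution accuracy, compute each variant's latent margins on validation samples and keep the one whose smallest margin is largest. The linear decision heads and toy features below are illustrative assumptions, not the paper's detectors.

```python
def latent_margins(weights, bias, samples):
    """Signed margin y * (w . z + b) for each (z, y) with y in {-1, +1}."""
    return [y * (sum(w * zi for w, zi in zip(weights, z)) + bias)
            for z, y in samples]

def select_max_margin(models, samples):
    """models: dict name -> (weights, bias).

    Returns the variant with the best worst-case latent margin."""
    return max(models, key=lambda m: min(latent_margins(*models[m], samples)))

samples = [([1.0, 0.0], +1), ([-1.0, 0.0], -1), ([0.2, 0.1], +1)]
models = {
    "variant_a": ([1.0, 0.0], 0.0),   # correct, but a small margin on the last sample
    "variant_b": ([2.0, 1.0], 0.0),   # larger margins on every sample
}
chosen = select_max_margin(models, samples)
```

Both variants classify every sample correctly, so in-distribution accuracy cannot separate them; the margin criterion prefers the one whose latent separation is more likely to survive unseen post-processing.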

Comparing and Scaling fMRI Features for Brain-Behavior Prediction

Predicting behavioral variables from neuroimaging modalities such as magnetic resonance imaging (MRI) has the potential to allow the development of neuroimaging biomarkers of mental and neurological disorders. A crucial processing step to this aim is the extraction of suitable features. These can differ in how well they predict the target of interest, and how this prediction scales with sample size and scan time. Here, we compare nine feature subtypes extracted from resting-state functional MRI recordings for behavior prediction, ranging from regional measures of functional activity to functional connectivity (FC) and metrics derived with graph signal processing (GSP), a principled approach for the extraction of structure-informed functional features. We study 979 subjects from the Human Connectome Project Young Adult dataset, predicting summary scores for mental health, cognition, processing speed, and substance use, as well as age and sex. The scaling properties of the features are investigated for different combinations of sample size and scan time. FC comes out as the best feature for predicting cognition, age, and sex. Graph power spectral density is the second best for predicting cognition and age, while for sex, variability-based features show potential as well. When predicting sex, the low-pass graph filtered coupled FC slightly outperforms the simple FC variant. None of the other targets were predicted significantly. The scaling results point to higher performance reserves for the better-performing features. They also indicate that it is important to balance sample size and scan time when acquiring data for prediction studies. The results confirm FC as a robust feature for behavior prediction, but also show the potential of GSP and variability-based measures. We discuss the implications for future prediction studies in terms of strategies for acquisition and sample composition.

Updated: 2025-07-28 08:13:08

Categories: q-bio.NC,cs.LG

Download: http://arxiv.org/abs/2507.20601v1
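
The best-performing feature family in the comparison, functional connectivity (FC), is just pairwise Pearson correlation between regional time series, vectorized from the upper triangle of the correlation matrix. A minimal sketch (the toy time courses are assumptions):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length time courses."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fc_features(series):
    """series: list of regional time courses -> upper-triangle correlations."""
    n = len(series)
    return [pearson(series[i], series[j])
            for i in range(n) for j in range(i + 1, n)]

ts = [[0.0, 1.0, 2.0, 3.0],   # region A
      [0.1, 1.1, 2.1, 3.1],   # region B: tracks A almost perfectly
      [3.0, 2.0, 1.0, 0.0]]   # region C: anti-correlated with A
feats = fc_features(ts)
```

With n regions this yields n(n-1)/2 features per subject, which is the feature vector the prediction models in the study consume.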

DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery

Large language models (LLMs) integrated with autonomous agents hold significant potential for advancing scientific discovery through automated reasoning and task execution. However, applying LLM agents to drug discovery is still constrained by challenges such as large-scale multimodal data processing, limited task automation, and poor support for domain-specific tools. To overcome these limitations, we introduce DrugPilot, an LLM-based agent system with a parameterized reasoning architecture designed for end-to-end scientific workflows in drug discovery. DrugPilot enables multi-stage research processes by integrating structured tool use with a novel parameterized memory pool. The memory pool converts heterogeneous data from both public sources and user-defined inputs into standardized representations. This design supports efficient multi-turn dialogue, reduces information loss during data exchange, and enhances complex scientific decision-making. To support training and benchmarking, we construct a drug instruction dataset covering eight core drug discovery tasks. Under the Berkeley function-calling benchmark, DrugPilot significantly outperforms state-of-the-art agents such as ReAct and LoT, achieving task completion rates of 98.0%, 93.5%, and 64.0% for simple, multi-tool, and multi-turn scenarios, respectively. These results highlight DrugPilot's potential as a versatile agent framework for computational science domains requiring automated, interactive, and data-integrated reasoning.

Updated: 2025-07-28 08:10:33

Categories: cs.AI,q-bio.BM

Download: http://arxiv.org/abs/2505.13940v2
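
The parameterized memory pool can be pictured as a store that standardizes heterogeneous tool inputs and outputs under typed keys, so later dialogue turns pass a compact key instead of re-serializing the data. This is a hypothetical sketch; the field names and key scheme are assumptions, not DrugPilot's actual schema.

```python
class MemoryPool:
    """Store heterogeneous objects as standardized {kind, payload, source} entries."""

    def __init__(self):
        self._entries = {}
        self._counter = 0

    def put(self, kind, payload, source):
        """Standardize an object and return a typed key for later turns."""
        self._counter += 1
        key = f"{kind}#{self._counter}"
        self._entries[key] = {"kind": kind, "payload": payload, "source": source}
        return key

    def get(self, key):
        return self._entries[key]

pool = MemoryPool()
k1 = pool.put("molecule", "CCO", source="user")            # SMILES from the user
k2 = pool.put("prediction", {"logP": 0.2}, source="tool")  # downstream tool output
entry = pool.get(k1)
```

Passing keys like `molecule#1` between turns is what keeps multi-turn dialogues short and avoids the information loss of repeatedly round-tripping large structures through the LLM context.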

Distributional Soft Actor-Critic with Three Refinements

Reinforcement learning (RL) has shown remarkable success in solving complex decision-making and control tasks. However, many model-free RL algorithms experience performance degradation due to inaccurate value estimation, particularly the overestimation of Q-values, which can lead to suboptimal policies. To address this issue, we previously proposed the Distributional Soft Actor-Critic (DSAC or DSACv1), an off-policy RL algorithm that enhances value estimation accuracy by learning a continuous Gaussian value distribution. Despite its effectiveness, DSACv1 faces challenges such as training instability and sensitivity to reward scaling, caused by high variance in critic gradients due to return randomness. In this paper, we introduce three key refinements to DSACv1 to overcome these limitations and further improve Q-value estimation accuracy: expected value substitution, twin value distribution learning, and variance-based critic gradient adjustment. The enhanced algorithm, termed DSAC with Three refinements (DSAC-T or DSACv2), is systematically evaluated across a diverse set of benchmark tasks. Without the need for task-specific hyperparameter tuning, DSAC-T consistently matches or outperforms leading model-free RL algorithms, including SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally, DSAC-T ensures a stable learning process and maintains robust performance across varying reward scales. Its effectiveness is further demonstrated through real-world application in controlling a wheeled robot, highlighting its potential for deployment in practical robotic tasks.

Updated: 2025-07-28 08:09:06

Categories: cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2310.05858v6
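
Two of the three refinements can be illustrated in a toy TD-target computation: twin value distribution learning (form the target from the more pessimistic of two value heads, as in clipped double-Q) and expected value substitution (use the mean of the return distribution rather than a sampled return, cutting target variance). Numbers and the Gaussian-head representation are illustrative assumptions.

```python
def td_target(reward, gamma, heads):
    """heads: list of (mean, std) Gaussian value distributions for s'.

    Picks the more pessimistic head and substitutes its expected value,
    avoiding the variance a sampled return would inject into the target.
    """
    mean, std = min(heads, key=lambda h: h[0])   # twin trick: pessimistic head
    return reward + gamma * mean                 # expected-value substitution

target = td_target(reward=1.0, gamma=0.99, heads=[(5.0, 1.0), (4.0, 2.0)])
```

Sampling from the chosen distribution instead of taking its mean would make the target noisy in exactly the way DSACv1's critic gradients were; the substitution removes that source of variance while keeping the distributional critic.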

PhaseNAS: Language-Model Driven Architecture Search with Dynamic Phase Adaptation

Neural Architecture Search (NAS) is challenged by the trade-off between search space exploration and efficiency, especially for complex tasks. While recent LLM-based NAS methods have shown promise, they often suffer from static search strategies and ambiguous architecture representations. We propose PhaseNAS, an LLM-based NAS framework with dynamic phase transitions guided by real-time score thresholds and a structured architecture template language for consistent code generation. On the NAS-Bench-Macro benchmark, PhaseNAS consistently discovers architectures with higher accuracy and better rank. For image classification (CIFAR-10/100), PhaseNAS reduces search time by up to 86% while maintaining or improving accuracy. In object detection, it automatically produces YOLOv8 variants with higher mAP and lower resource cost. These results demonstrate that PhaseNAS enables efficient, adaptive, and generalizable NAS across diverse vision tasks.

Updated: 2025-07-28 08:02:31

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20592v1
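
The score-driven phase transition can be sketched as a search loop that stays in a broad exploration phase until the best score so far crosses a threshold, then switches to a narrower refinement phase. Phase names, the threshold, and the fixed score sequence below are illustrative assumptions, not PhaseNAS's actual schedule.

```python
def run_search(scores, threshold):
    """Walk a sequence of candidate scores, switching phase at the threshold."""
    phase, best, trace = "explore", float("-inf"), []
    for s in scores:
        best = max(best, s)
        if phase == "explore" and best >= threshold:
            phase = "refine"                      # dynamic phase transition
        trace.append(phase)
    return best, trace

best, trace = run_search([0.41, 0.55, 0.72, 0.70, 0.75], threshold=0.7)
```

In the full framework the phase also changes what the LLM is prompted to generate (coarse architecture templates early, local edits later), which is what makes the transition worth triggering on real-time scores rather than a fixed iteration budget.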

Enhancing generalization in high energy physics using white-box adversarial attacks

Machine learning is becoming increasingly popular in the context of particle physics. Supervised learning, which uses labeled Monte Carlo (MC) simulations, remains one of the most widely used methods for discriminating signals beyond the Standard Model. However, this paper suggests that supervised models may depend excessively on artifacts and approximations from Monte Carlo simulations, potentially limiting their ability to generalize well to real data. This study aims to enhance the generalization properties of supervised models by reducing the sharpness of local minima. It reviews the application of four distinct white-box adversarial attacks in the context of classifying Higgs boson decay signals. The attacks are divided into weight-space attacks and feature-space attacks. To study and quantify the sharpness of different local minima, this paper presents two analysis methods: gradient ascent and reduced Hessian eigenvalue analysis. The results show that white-box adversarial attacks significantly improve generalization performance, albeit with increased computational complexity.

Updated: 2025-07-28 07:55:10

Categories: hep-ph,cs.LG

Download: http://arxiv.org/abs/2411.09296v3
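
The gradient-ascent sharpness probe mentioned above can be illustrated on a one-parameter loss: starting from a minimum, take a few ascent steps and record how much the loss rises; sharper minima rise faster. The quadratic losses, step size, and step count are toy assumptions.

```python
def sharpness(loss, grad, w0, step=0.1, n_steps=5):
    """Loss increase after a few gradient-ascent steps from w0 (sharpness proxy)."""
    w = w0
    for _ in range(n_steps):
        w += step * grad(w)          # gradient *ascent*: climb out of the minimum
    return loss(w) - loss(w0)

# Two minima of different curvature, probed from the same starting point.
flat = sharpness(lambda w: 0.5 * w * w, lambda w: w, w0=0.01)
sharp = sharpness(lambda w: 5.0 * w * w, lambda w: 10.0 * w, w0=0.01)
```

The reduced Hessian eigenvalue analysis in the paper measures the same curvature directly; this ascent probe is the cheaper, derivative-only proxy of it.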

Beyond Manual Annotation: A Human-AI Collaborative Framework for Medical Image Segmentation Using Only "Better or Worse" Expert Feedback

Manual annotation of medical images is a labor-intensive and time-consuming process, posing a significant bottleneck in the development and deployment of robust medical imaging AI systems. This paper introduces a novel hands-free Human-AI collaborative framework for medical image segmentation that substantially reduces the annotation burden by eliminating the need for explicit manual pixel-level labeling. The core innovation lies in a preference learning paradigm, where human experts provide minimal, intuitive feedback -- simply indicating whether an AI-generated segmentation is better or worse than a previous version. The framework comprises four key components: (1) an adaptable foundation model (FM) for feature extraction, (2) label propagation based on feature similarity, (3) a clicking agent that learns from human better-or-worse feedback to decide where to click and with which label, and (4) a multi-round segmentation learning procedure that trains a state-of-the-art segmentation network using pseudo-labels generated by the clicking agent and FM-based label propagation. Experiments on three public datasets demonstrate that the proposed approach achieves competitive segmentation performance using only binary preference feedback, without requiring experts to directly manually annotate the images.

Updated: 2025-07-28 07:51:16

Categories: eess.IV,cs.LG

Download: http://arxiv.org/abs/2507.05815v2
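
Learning from better-or-worse feedback alone can be illustrated as a bandit: candidate click actions are arms, each proposed segmentation is scored only as better (+1) or worse (-1) than the previous round, and the agent shifts preference toward actions that draw "better". This toy sketch and its oracle are assumptions, not the paper's clicking agent.

```python
import random

def train_clicker(actions, oracle, rounds, rng, eps=0.2):
    """Epsilon-greedy bandit trained purely on binary preference feedback."""
    value = {a: 0.0 for a in actions}
    count = {a: 0 for a in actions}
    for _ in range(rounds):
        a = (rng.choice(actions) if rng.random() < eps
             else max(actions, key=value.get))     # exploit the best-valued action
        r = 1.0 if oracle(a) else -1.0             # expert says "better" or "worse"
        count[a] += 1
        value[a] += (r - value[a]) / count[a]      # incremental running-mean update
    return value

rng = random.Random(1)
# Hypothetical oracle: clicking near object boundaries usually improves the mask.
oracle = lambda a: a == "boundary"
values = train_clicker(["background", "boundary", "interior"], oracle, 300, rng)
```

The expert never draws a contour or labels a pixel; one bit per round is enough for the agent to learn where clicks pay off, which is the framework's core annotation-cost saving.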

AutoLibra: Agent Metric Induction from Open-Ended Feedback

Agents are predominantly evaluated and optimized via task success metrics, which are coarse, rely on manual design from experts, and fail to reward intermediate emergent behaviors. We propose AutoLibra, a framework for agent evaluation that transforms open-ended human feedback, e.g., "If you find that the button is disabled, don't click it again" or "This agent has too much autonomy to decide what to do on its own", into metrics for evaluating fine-grained behaviors in agent trajectories. AutoLibra accomplishes this by grounding feedback to an agent's behavior, clustering similar positive and negative behaviors, and creating concrete metrics with clear definitions and concrete examples, which can be used for prompting LLM-as-a-Judge evaluators. We further propose two meta-metrics to evaluate the alignment of a set of (induced) metrics with open feedback: "coverage" and "redundancy". Through optimizing these meta-metrics, we experimentally demonstrate AutoLibra's ability to induce more concrete agent evaluation metrics than the ones proposed in previous agent evaluation benchmarks and discover new metrics to analyze agents. We also present two applications of AutoLibra in agent improvement: First, we show that AutoLibra-induced metrics serve as better prompt-engineering targets than the task success rate on a wide range of text game tasks, improving agent performance over baseline by a mean of 20%. Second, we show that AutoLibra can iteratively select high-quality fine-tuning data for web navigation agents. Our results suggest that AutoLibra is a powerful task-agnostic tool for evaluating and improving language agents.

Updated: 2025-07-28 07:46:27

Categories: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2505.02820v2
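
The abstract names "coverage" and "redundancy" only informally. One plausible set-based reading (an assumption, not the paper's formulas) treats each induced metric as the set of feedback items it explains:

```python
def coverage(feedback_ids, metric_to_feedback):
    """Fraction of feedback items explained by at least one induced metric."""
    if not metric_to_feedback:
        return 0.0
    covered = set().union(*metric_to_feedback.values())
    return len(covered & set(feedback_ids)) / len(feedback_ids)

def redundancy(metric_to_feedback):
    """Mean pairwise Jaccard overlap between metrics (lower is better)."""
    metrics = list(metric_to_feedback.values())
    pairs = [(a, b) for i, a in enumerate(metrics) for b in metrics[i + 1:]]
    if not pairs:
        return 0.0
    jac = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))
    return sum(jac(a, b) for a, b in pairs) / len(pairs)

# Toy example: four feedback items, two induced metrics (names invented).
fb = ["f1", "f2", "f3", "f4"]
m2f = {"avoids_disabled_buttons": ["f1", "f2"],
       "asks_before_acting": ["f2", "f3"]}
```

Optimizing the induced metric set for high coverage and low redundancy then mirrors the meta-metric optimization described above.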

GASPnet: Global Agreement to Synchronize Phases

In recent years, Transformer architectures have revolutionized most fields of artificial intelligence, relying on an attentional mechanism based on the agreement between keys and queries to select and route information in the network. In previous work, we introduced a novel, brain-inspired architecture that leverages a similar implementation to achieve a global 'routing by agreement' mechanism. Such a system modulates the network's activity by matching each neuron's key with a single global query, pooled across the entire network. Acting as a global attentional system, this mechanism improves noise robustness over baseline levels but is insufficient for multi-classification tasks. Here, we improve on this work by proposing a novel mechanism that combines aspects of the Transformer attentional operations with a compelling neuroscience theory, namely, binding by synchrony. This theory proposes that the brain binds together features by synchronizing the temporal activity of neurons encoding those features. This allows the binding of features from the same object while efficiently disentangling those from distinct objects. We drew inspiration from this theory and incorporated angular phases into all layers of a convolutional network. After achieving phase alignment via Kuramoto dynamics, we use this approach to enhance operations between neurons with similar phases and suppress those with opposite phases. We test the benefits of this mechanism on two datasets: one composed of pairs of digits and one composed of a combination of an MNIST item superimposed on a CIFAR-10 image. Our results reveal better accuracy than CNN networks, with greater robustness to noise and better generalization abilities. Overall, we propose a novel mechanism that addresses the visual binding problem in neural networks by leveraging the synergy between neuroscience and machine learning.

Updated: 2025-07-28 07:32:09

Categories: cs.LG,q-bio.NC

Download: http://arxiv.org/abs/2507.16674v2
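
The phase-alignment step can be illustrated with plain Kuramoto dynamics for identical oscillators (the network-conditioned couplings of GASPnet are omitted, and the soft gating mask at the end is an assumed form, not the paper's exact operation):

```python
import math, random

def kuramoto_step(phases, coupling, dt=0.1):
    """One Euler step of Kuramoto dynamics: dθ_i/dt = (K/N) Σ_j sin(θ_j − θ_i).

    Natural frequencies are taken as identical (zero) for simplicity.
    """
    n = len(phases)
    return [th + dt * coupling / n * sum(math.sin(p - th) for p in phases)
            for th in phases]

def order_parameter(phases):
    """|r| = 1 means full synchrony, |r| ≈ 0 means incoherent phases."""
    n = len(phases)
    re = sum(math.cos(p) for p in phases) / n
    im = sum(math.sin(p) for p in phases) / n
    return math.hypot(re, im)

rng = random.Random(0)
phases = [rng.uniform(0, 2 * math.pi) for _ in range(32)]
for _ in range(200):
    phases = kuramoto_step(phases, coupling=2.0)

# After alignment, interactions can be gated: boost pairs with
# cos(θ_i − θ_j) ≈ 1 and suppress pairs with opposite phases (cos ≈ −1),
# acting as a soft binding mask between neurons.
gate = lambda ti, tj: 0.5 * (1 + math.cos(ti - tj))
```

With positive coupling the phases collapse onto a common value, after which the gate passes same-object features and suppresses cross-object ones.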

Beyond Interactions: Node-Level Graph Generation for Knowledge-Free Augmentation in Recommender Systems

Recent advances in recommender systems rely on external resources such as knowledge graphs or large language models to enhance recommendations, which limits their applicability in real-world settings due to data dependency and computational overhead. Although knowledge-free models are able to bolster recommendations by direct edge operations as well, the absence of augmentation primitives drives them to fall short in bridging semantic and structural gaps as high-quality paradigm substitutes. Unlike existing diffusion-based works that remodel user-item interactions, this work proposes NodeDiffRec, a pioneering knowledge-free augmentation framework that enables fine-grained node-level graph generation for recommendations and expands the scope of restricted augmentation primitives via diffusion. By synthesizing pseudo-items and corresponding interactions that align with the underlying distribution for injection, and further refining user preferences through a denoising preference modeling process, NodeDiffRec dramatically enhances both semantic diversity and structural connectivity without external knowledge. Extensive experiments across diverse datasets and recommendation algorithms demonstrate the superiority of NodeDiffRec, achieving State-of-the-Art (SOTA) performance, with maximum average performance improvements of 98.6% in Recall@5 and 84.0% in NDCG@5 over selected baselines.

Updated: 2025-07-28 07:22:06

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2507.20578v1
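
The abstract does not detail the diffusion model itself; the sketch below shows only the standard closed-form forward corruption q(x_t | x_0) that diffusion generators of this kind are trained to invert, applied to a toy pseudo-item embedding (the schedule and all numbers are illustrative, not the paper's):

```python
import math, random

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(ᾱ_t)·x_0, (1 − ᾱ_t)·I) in closed form."""
    alpha_bar = 1.0
    for b in betas[:t]:
        alpha_bar *= (1.0 - b)
    xt = [math.sqrt(alpha_bar) * v + math.sqrt(1 - alpha_bar) * rng.gauss(0, 1)
          for v in x0]
    return xt, alpha_bar

rng = random.Random(0)
betas = [0.02] * 100          # toy constant noise schedule
x0 = [1.0, -1.0, 0.5, 0.0]    # a pseudo-item's embedding (toy values)
xt_early, ab_early = forward_diffuse(x0, 5, betas, rng)
xt_late, ab_late = forward_diffuse(x0, 100, betas, rng)
# ᾱ_t shrinks toward 0 as t grows: late steps are nearly pure noise,
# which a learned denoiser inverts when synthesizing new pseudo-items.
```

A trained denoiser run in reverse over this chain is what would emit new node-level pseudo-items for injection.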

A note on the Artstein-Avidan-Milman's generalized Legendre transforms

Artstein-Avidan and Milman [Annals of mathematics (2009), (169):661-674] characterized invertible reverse-ordering transforms on the space of lower-semi-continuous extended real-valued convex functions as affine deformations of the ordinary Legendre transform. In this note, we prove that all those generalized Legendre transforms on functions correspond to the ordinary Legendre transform on dually corresponding affine-deformed functions. That is, generalized convex conjugates are convex conjugates of affine-deformed functions. We conclude this note by sketching how this result can be interpreted from the lens of information geometry.

Updated: 2025-07-28 07:21:57

Categories: cs.IT,cs.LG,math.IT

Download: http://arxiv.org/abs/2507.20577v1
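
A small numerical illustration of how affine deformations interact with the ordinary Legendre transform (a simple special case for intuition, not the note's theorem): pre-composing with an affine map x ↦ ax + b conjugates to g*(y) = f*(y/a) − (b/a)y.

```python
def legendre(f, xs):
    """Numerical Legendre transform f*(y) = sup_x [x·y − f(x)] over a grid."""
    return lambda y: max(x * y - f(x) for x in xs)

xs = [i / 100 for i in range(-400, 401)]   # grid on [−4, 4]
f = lambda x: 0.5 * x * x                  # f(x) = x²/2 is self-conjugate
f_star = legendre(f, xs)

# Affine pre-composition g(x) = f(a·x + b); substituting u = a·x + b in the
# supremum gives g*(y) = f*(y/a) − (b/a)·y for a ≠ 0.
a, b = 2.0, 1.0
g = lambda x: f(a * x + b)
g_star = legendre(g, xs)
```

Both suprema are attained inside the grid here, so the numerical transforms match the closed-form identity exactly at grid points.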

Fusing CFD and measurement data using transfer learning

Aerodynamic analysis during aircraft design usually involves methods of varying accuracy and spatial resolution, which all have their advantages and disadvantages. It is therefore desirable to create data-driven models which effectively combine these advantages. Such data fusion methods for distributed quantities mainly rely on proper orthogonal decomposition as of now, which is a linear method. In this paper, we introduce a non-linear method based on neural networks combining simulation and measurement data via transfer learning. The network training accounts for the heterogeneity of the data, as simulation data usually features a high spatial resolution, while measurement data is sparse but more accurate. In a first step, the neural network is trained on simulation data to learn spatial features of the distributed quantities. The second step involves transfer learning on the measurement data to correct for systematic errors between simulation and measurement by only re-training a small subset of the entire neural network model. This approach is applied to a multilayer perceptron architecture and shows significant improvements over the established method based on proper orthogonal decomposition by producing more physical solutions near nonlinearities. In addition, the neural network provides solutions at arbitrary flow conditions, thus making the model useful for flight mechanical design, structural sizing, and certification. As the proposed training strategy is very general, it can also be applied to more complex neural network architectures in the future.

Updated: 2025-07-28 07:21:46

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20576v1
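
The two-step strategy, pretrain on dense simulation data and then re-train only a small subset of parameters on sparse measurements, can be sketched with a deliberately tiny linear model standing in for the multilayer perceptron (the data, the systematic offset, and the learning rates are all invented for illustration):

```python
import random

def fit_linear(xs, ys, w, b, lr, steps, train_w=True):
    """Gradient descent on MSE; optionally freeze the weight (train only bias)."""
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        if train_w:
            w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = random.Random(0)
# Step 1: dense but biased "simulation" data, y ≈ 3x (misses an offset).
sim_x = [i / 50 for i in range(100)]
sim_y = [3 * x + rng.gauss(0, 0.01) for x in sim_x]
w, b = fit_linear(sim_x, sim_y, w=0.0, b=0.0, lr=0.05, steps=2000)

# Step 2: sparse but accurate "measurements", y = 3x + 0.5; re-train only the
# bias, mirroring re-training a small subset of the network on measurements.
meas_x = [0.2, 1.1, 1.9]
meas_y = [3 * x + 0.5 for x in meas_x]
w, b = fit_linear(meas_x, meas_y, w, b, lr=0.05, steps=2000, train_w=False)
```

The frozen weight keeps the spatial features learned from simulation, while the small trainable subset absorbs the systematic simulation-measurement error.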

LIMO: Less is More for Reasoning

We challenge the prevailing assumption that complex reasoning in large language models (LLMs) necessitates massive training data. We demonstrate that sophisticated mathematical reasoning can emerge with only a few examples. Specifically, through simple supervised fine-tuning, our model, LIMO, achieves 63.3% accuracy on AIME24 and 95.6% on MATH500, surpassing previous fine-tuned models (6.5% on AIME24, 59.2% on MATH500) while using only 1% of the training data required by prior approaches. Furthermore, LIMO exhibits strong out-of-distribution generalization, achieving a 45.8% absolute improvement across diverse benchmarks, outperforming models trained on 100x more data. Synthesizing these findings, we propose the Less-Is-More Reasoning Hypothesis (LIMO Hypothesis): In foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning can emerge through minimal but strategically designed demonstrations of cognitive processes. This hypothesis suggests that the threshold for eliciting complex reasoning is not dictated by task complexity but rather by two key factors: (1) the completeness of the model's pre-trained knowledge base and (2) the effectiveness of post-training examples in serving as "cognitive templates" that guide reasoning.

Updated: 2025-07-28 07:21:10

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2502.03387v2

Implicit Spatiotemporal Bandwidth Enhancement Filter by Sine-activated Deep Learning Model for Fast 3D Photoacoustic Tomography

3D photoacoustic tomography (3D-PAT) using high-frequency hemispherical transducers offers near-omnidirectional reception and enhanced sensitivity to the finer structural details encoded in the high-frequency components of the broadband photoacoustic (PA) signal. However, practical constraints such as a limited number of channels with bandlimited sampling rates often result in sparse and bandlimited sensors that degrade image quality. To address this, we revisit the 2D deep learning (DL) approach applied directly to sensor-wise PA radio-frequency (PARF) data. Specifically, we introduce sine activation into the DL model to restore the broadband nature of PARF signals given the observed band-limited and high-frequency PARF data. Given the scarcity of 3D training data, we employ simplified training strategies by simulating random spherical absorbers. This combination of sine-activated model and randomized training is designed to emphasize bandwidth learning over dataset memorization. Our model was evaluated on a leaf skeleton phantom, a micro-CT-verified 3D spiral phantom and in-vivo human palm vasculature. The results showed that the proposed training mechanism on the sine-activated model generalized well across the different tests by effectively increasing the sensor density and recovering the spatiotemporal bandwidth. Qualitatively, the sine-activated model uniquely enhanced high-frequency content, producing clearer vascular structure with fewer artefacts. Quantitatively, the sine-activated model exhibits full bandwidth at -12 dB spectrum and significantly higher contrast-to-noise ratio with minimal loss of structural similarity index. Lastly, we optimized our approach to enable fast enhanced 3D-PAT at 2 volumes-per-second for better practical imaging of free-moving targets.

Updated: 2025-07-28 07:16:32

Categories: eess.IV,cs.AI

Download: http://arxiv.org/abs/2507.20575v1
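
Why a sine activation helps restore bandwidth can be seen from its slope: a SIREN-style unit sin(w0·(w·x + b)) attains slope w0 at its zero crossings, so the frequency scale w0 (the value 30 below is a common choice, assumed here rather than taken from the paper) sets the detail a layer can carry, whereas a ReLU's slope is capped by its weight:

```python
import math

def sine_layer(x, weights, biases, w0=30.0):
    """SIREN-style units: sin(w0 · (w·x + b)); w0 sets the attainable bandwidth."""
    return [math.sin(w0 * (w * x + b)) for w, b in zip(weights, biases)]

def numeric_deriv(f, x, h=1e-6):
    """Central finite difference, used to probe an activation's local slope."""
    return (f(x + h) - f(x - h)) / (2 * h)

w0 = 30.0
sine_unit = lambda x: math.sin(w0 * x)   # max slope w0 at zero crossings
relu_unit = lambda x: max(0.0, x)        # slope at most 1 for unit weight
# Stacking high-slope sine units lets the network represent high-frequency
# PARF content that bounded-slope activations at the same weight scale cannot.
```

This is the mechanism behind the implicit bandwidth-enhancement behaviour, sketched in isolation rather than as the paper's full architecture.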

Reminiscence Attack on Residuals: Exploiting Approximate Machine Unlearning for Privacy

Machine unlearning enables the removal of specific data from ML models to uphold the right to be forgotten. While approximate unlearning algorithms offer efficient alternatives to full retraining, this work reveals that they fail to adequately protect the privacy of unlearned data. In particular, these algorithms introduce implicit residuals which facilitate privacy attacks targeting unlearned data. We observe that these residuals persist regardless of model architectures, parameters, and unlearning algorithms, exposing a new attack surface beyond conventional output-based leakage. Based on this insight, we propose the Reminiscence Attack (ReA), which amplifies the correlation between residuals and membership privacy through targeted fine-tuning processes. ReA achieves up to 1.90x and 1.12x higher accuracy than prior attacks when inferring class-wise and sample-wise membership, respectively. To mitigate such residual-induced privacy risk, we develop a dual-phase approximate unlearning framework that first eliminates deep-layer unlearned data traces and then enforces convergence stability to prevent models from "pseudo-convergence", where their outputs are similar to retrained models but still preserve unlearned residuals. Our framework works for both classification and generation tasks. Experimental evaluations confirm that our approach maintains high unlearning efficacy, while reducing the adaptive privacy attack accuracy to nearly random guess, at the computational cost of 2-12% of full retraining from scratch.

Updated: 2025-07-28 07:12:12

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20573v1
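The attack surface described above builds on membership inference. As a minimal illustration of the baseline signal ReA amplifies (not the paper's fine-tuning procedure), a loss-threshold attack predicts membership for samples whose loss stays suspiciously low after unlearning; all numbers below are hypothetical:

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Baseline membership inference: predict 'member' for any sample whose
    model loss falls below a calibrated threshold. ReA strengthens this
    signal via targeted fine-tuning; that step is omitted here."""
    return np.asarray(losses) < threshold

# Hypothetical losses: unlearned members often retain slightly lower loss
# (the residuals the paper exploits) than genuine non-members.
losses = [0.21, 0.35, 0.18,   # unlearned samples
          0.92, 1.10, 0.77]   # held-out non-members
print(loss_threshold_mia(losses, threshold=0.5).tolist())  # → [True, True, True, False, False, False]
```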

DAG-AFL: Directed Acyclic Graph-based Asynchronous Federated Learning

Due to the distributed nature of federated learning (FL), the vulnerability of the global model and the need for coordination among many client devices pose significant challenges. As a promising decentralized, scalable and secure solution, blockchain-based FL methods have attracted widespread attention in recent years. However, traditional consensus mechanisms designed for Proof of Work (PoW) similar to blockchain incur substantial resource consumption and compromise the efficiency of FL, particularly when participating devices are wireless and resource-limited. To address asynchronous client participation and data heterogeneity in FL, while limiting the additional resource overhead introduced by blockchain, we propose the Directed Acyclic Graph-based Asynchronous Federated Learning (DAG-AFL) framework. We develop a tip selection algorithm that considers temporal freshness, node reachability and model accuracy, with a DAG-based trusted verification strategy. Extensive experiments on 3 benchmarking datasets against eight state-of-the-art approaches demonstrate that DAG-AFL significantly improves training efficiency and model accuracy by 22.7% and 6.5% on average, respectively.

Updated: 2025-07-28 07:06:56

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.20571v1
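The tip selection rule can be sketched as a weighted score over the three criteria the abstract names. The weights, the exponential freshness decay, and the field names are illustrative assumptions, not DAG-AFL's published scoring function:

```python
import math

def score_tip(tip, now, w_fresh=0.4, w_reach=0.3, w_acc=0.3, tau=60.0):
    """Score a candidate tip by temporal freshness (exponentially decayed
    age), node reachability, and model accuracy. Weights and the decay
    constant tau are illustrative, not the paper's values."""
    freshness = math.exp(-(now - tip["timestamp"]) / tau)
    return (w_fresh * freshness
            + w_reach * tip["reachability"]
            + w_acc * tip["accuracy"])

def select_tips(tips, now, k=2):
    """A newly arriving client approves the k highest-scoring tips."""
    return sorted(tips, key=lambda t: score_tip(t, now), reverse=True)[:k]

tips = [
    {"id": "a", "timestamp": 100.0, "reachability": 0.9, "accuracy": 0.80},
    {"id": "b", "timestamp": 150.0, "reachability": 0.5, "accuracy": 0.95},
    {"id": "c", "timestamp": 40.0,  "reachability": 0.7, "accuracy": 0.60},
]
print([t["id"] for t in select_tips(tips, now=160.0)])  # → ['b', 'a']
```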

Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation

Speech-driven 3D facial animation aims to generate realistic facial movements synchronized with audio. Traditional methods primarily minimize reconstruction loss by aligning each frame with ground-truth. However, this frame-wise approach often fails to capture the continuity of facial motion, leading to jittery and unnatural outputs due to coarticulation. To address this, we propose a novel phonetic context-aware loss, which explicitly models the influence of phonetic context on viseme transitions. By incorporating a viseme coarticulation weight, we assign adaptive importance to facial movements based on their dynamic changes over time, ensuring smoother and perceptually consistent animations. Extensive experiments demonstrate that replacing the conventional reconstruction loss with ours improves both quantitative metrics and visual quality. It highlights the importance of explicitly modeling phonetic context-dependent visemes in synthesizing natural speech-driven 3D facial animation. Project page: https://cau-irislab.github.io/interspeech25/

Updated: 2025-07-28 07:04:50

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.20568v1
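The phonetic context-aware loss can be approximated as a frame-weighted reconstruction objective in which frames with larger ground-truth motion change (viseme transitions) receive more weight. The normalized frame-to-frame velocity used below is an illustrative proxy for the paper's viseme coarticulation weight, not its exact definition:

```python
import numpy as np

def coarticulation_weighted_loss(pred, gt, eps=1e-8):
    """Reconstruction loss with per-frame weights proportional to the
    ground-truth motion's frame-to-frame velocity, so dynamic viseme
    transitions dominate the objective instead of near-static frames.
    pred, gt: (T, V) arrays of T frames of flattened vertex coordinates."""
    velocity = np.abs(np.diff(gt, axis=0)).mean(axis=1)   # (T-1,) motion change
    weights = np.concatenate([[velocity[0]], velocity])    # pad first frame
    weights = weights / (weights.sum() + eps)              # normalize to 1
    per_frame = ((pred - gt) ** 2).mean(axis=1)            # frame-wise MSE
    return float((weights * per_frame).sum())

gt = np.array([[0.0, 0.0], [0.1, 0.0], [0.5, 0.2], [0.5, 0.2]])
pred = gt + 0.01                       # uniform error on every frame
print(round(coarticulation_weighted_loss(pred, gt), 6))  # → 0.0001
```

Replacing this with a plain frame-wise MSE removes the emphasis on transitions, which is exactly the jitter the abstract attributes to frame-wise training.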

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

The exponential growth in demand for GPU computing resources has created an urgent need for automated CUDA optimization strategies. While recent advances in LLMs show promise for code generation, current SOTA models achieve low success rates in improving CUDA speed. In this paper, we introduce CUDA-L1, an automated reinforcement learning framework for CUDA optimization that employs a novel contrastive RL algorithm. CUDA-L1 achieves significant performance improvements on the CUDA optimization task: trained on NVIDIA A100, it delivers an average speedup of x3.12 with a median speedup of x1.42 across all 250 CUDA kernels of KernelBench, with peak speedups reaching x120. Furthermore, the model also demonstrates portability across GPU architectures, achieving average speedups of x3.12 on L40, x2.50 on RTX 3090, x2.39 on H100, and x2.37 on H20 despite being optimized specifically for A100. The capabilities of CUDA-L1 demonstrate that, RL can transform an initially poor-performing LLM into an effective CUDA optimizer through speedup-based reward signals alone, without human expertise or domain knowledge. This paradigm opens possibilities for automated optimization of CUDA operations, and holds promise to substantially promote GPU efficiency and alleviate the rising pressure on GPU computing resources. We also identify important challenges posed by training RL models for tasks like CUDA development, where RL often learns to exploit loopholes in reward functions rather than solve the intended optimization problems. By identifying these failure modes and analyzing their root causes, we develop practical methods for creating more robust training procedures that prevent reward hacking.

Updated: 2025-07-28 07:04:11

Categories: cs.AI,cs.DC,cs.LG

Download: http://arxiv.org/abs/2507.14111v4
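The speedup-based reward signal the abstract describes reduces to timing a candidate kernel against a baseline. The sketch below uses plain Python callables as stand-ins for CUDA kernels, with median-of-repeats timing as one simple guard against rewarding a single noisy lucky run (a mild form of the reward hacking the paper warns about); the harness names and repeat count are assumptions:

```python
import statistics
import time

def speedup_reward(candidate, baseline, args, repeats=5):
    """Reward = median baseline time / median candidate time: the pure
    speedup signal the paper trains on, measured over several repeats."""
    def med(fn):
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(*args)
            times.append(time.perf_counter() - t0)
        return statistics.median(times)
    return med(baseline) / med(candidate)

# Toy stand-ins for an unoptimized and an optimized "kernel":
baseline = lambda n: sum(i * i for i in range(n))
optimized = lambda n: n * (n - 1) * (2 * n - 1) // 6   # closed form
print(speedup_reward(optimized, baseline, args=(50_000,)) > 1.0)  # → True
```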

Unlearning of Knowledge Graph Embedding via Preference Optimization

Existing knowledge graphs (KGs) inevitably contain outdated or erroneous knowledge that needs to be removed from knowledge graph embedding (KGE) models. To address this challenge, knowledge unlearning can be applied to eliminate specific information while preserving the integrity of the remaining knowledge in KGs. Existing unlearning methods can generally be categorized into exact unlearning and approximate unlearning. However, exact unlearning requires high training costs while approximate unlearning faces two issues when applied to KGs due to the inherent connectivity of triples: (1) It fails to fully remove targeted information, as forgetting triples can still be inferred from remaining ones. (2) It focuses on local data for specific removal, which weakens the remaining knowledge in the forgetting boundary. To address these issues, we propose GraphDPO, a novel approximate unlearning framework based on direct preference optimization (DPO). Firstly, to effectively remove forgetting triples, we reframe unlearning as a preference optimization problem, where the model is trained by DPO to prefer reconstructed alternatives over the original forgetting triples. This formulation penalizes reliance on forgettable knowledge, mitigating incomplete forgetting caused by KG connectivity. Moreover, we introduce an out-boundary sampling strategy to construct preference pairs with minimal semantic overlap, weakening the connection between forgetting and retained knowledge. Secondly, to preserve boundary knowledge, we introduce a boundary recall mechanism that replays and distills relevant information both within and across time steps. We construct eight unlearning datasets across four popular KGs with varying unlearning rates. Experiments show that GraphDPO outperforms state-of-the-art baselines by up to 10.1% in MRR_Avg and 14.0% in MRR_F1.

Updated: 2025-07-28 07:03:04

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20566v1
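The preference-optimization reframing rests on the standard DPO objective. A minimal sketch for one preference pair, where in GraphDPO's setting the "winner" would be a reconstructed alternative triple and the "loser" the original forgetting triple; the KGE model that scores triples is abstracted into plain log-probabilities:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair:
    -log(sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At indifference the loss equals log 2; it falls as the policy prefers the
# winner more strongly than the reference model does.
weak = dpo_loss(-1.0, -1.0, -1.0, -1.0)
strong = dpo_loss(-0.2, -3.0, -1.0, -1.0)
print(weak > strong)  # → True
```

Minimizing this penalizes any probability mass left on the forgetting triple, which is how the formulation counters the incomplete forgetting caused by KG connectivity.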

Generative AI for Cel-Animation: A Survey

Traditional Celluloid (Cel) Animation production pipeline encompasses multiple essential steps, including storyboarding, layout design, keyframe animation, inbetweening, and colorization, which demand substantial manual effort, technical expertise, and significant time investment. These challenges have historically impeded the efficiency and scalability of Cel-Animation production. The rise of generative artificial intelligence (GenAI), encompassing large language models, multimodal models, and diffusion models, offers innovative solutions by automating tasks such as inbetween frame generation, colorization, and storyboard creation. This survey explores how GenAI integration is revolutionizing traditional animation workflows by lowering technical barriers, broadening accessibility for a wider range of creators through tools like AniDoc, ToonCrafter, and AniSora, and enabling artists to focus more on creative expression and artistic innovation. Despite its potential, challenges like visual consistency, stylistic coherence, and ethical considerations persist. Additionally, this paper explores future directions and advancements in AI-assisted animation.

Updated: 2025-07-28 06:53:31

Categories: cs.CV,cs.AI,cs.HC

Download: http://arxiv.org/abs/2501.06250v2

MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization

Speech-driven 3D facial animation aims to synthesize realistic facial motion sequences from given audio, matching the speaker's speaking style. However, previous works often require priors such as class labels of a speaker or additional 3D facial meshes at inference, which makes them fail to reflect the speaking style and limits their practical use. To address these issues, we propose MemoryTalker which enables realistic and accurate 3D facial motion synthesis by reflecting speaking style only with audio input to maximize usability in applications. Our framework consists of two training stages: 1-stage is storing and retrieving general motion (i.e., Memorizing), and 2-stage is to perform the personalized facial motion synthesis (i.e., Animating) with the motion memory stylized by the audio-driven speaking style feature. In this second stage, our model learns about which facial motion types should be emphasized for a particular piece of audio. As a result, our MemoryTalker can generate a reliable personalized facial animation without additional prior information. With quantitative and qualitative evaluations, as well as user study, we show the effectiveness of our model and its performance enhancement for personalized facial animation over state-of-the-art methods.

Updated: 2025-07-28 06:47:59

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.20562v1
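The "Animating" stage's stylized memory readout can be sketched as generic key-value attention over a motion memory, queried by an audio-derived style feature. The dimensions, names, and readout form below are assumptions; the paper's actual memory layout is not specified here:

```python
import numpy as np

def stylized_motion_recall(style_query, memory_keys, memory_values):
    """Attention readout over a motion memory, keyed by an audio-driven
    speaking-style feature: softmax(keys @ query) weights a sum over stored
    motion features. style_query: (d,), memory_keys: (M, d),
    memory_values: (M, k)."""
    logits = memory_keys @ style_query
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                  # softmax over M memory slots
    return weights @ memory_values            # (k,) stylized motion feature

rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))                # 8 memory slots, style dim 4
values = rng.normal(size=(8, 16))             # stored motion features, dim 16
out = stylized_motion_recall(rng.normal(size=4), keys, values)
print(out.shape)  # → (16,)
```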

Statistical Inference for Differentially Private Stochastic Gradient Descent

Privacy preservation in machine learning, particularly through Differentially Private Stochastic Gradient Descent (DP-SGD), is critical for sensitive data analysis. However, existing statistical inference methods for SGD predominantly focus on cyclic subsampling, while DP-SGD requires randomized subsampling. This paper first bridges this gap by establishing the asymptotic properties of SGD under the randomized rule and extending these results to DP-SGD. For the output of DP-SGD, we show that the asymptotic variance decomposes into statistical, sampling, and privacy-induced components. Two methods are proposed for constructing valid confidence intervals: the plug-in method and the random scaling method. We also perform extensive numerical analysis, which shows that the proposed confidence intervals achieve nominal coverage rates while maintaining privacy.

Updated: 2025-07-28 06:45:15

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2507.20560v1
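The DP-SGD update whose output the paper analyzes follows the standard clip-average-noise recipe on a randomly subsampled minibatch. A minimal sketch with illustrative hyperparameters; the paper's contribution is the inference theory on top of this update, not the update itself:

```python
import numpy as np

def dp_sgd_step(params, per_sample_grads, lr=0.1, clip=1.0,
                noise_mult=1.0, rng=None):
    """One DP-SGD update: clip each per-sample gradient to L2 norm `clip`,
    average, then add Gaussian noise with std = noise_mult * clip / batch.
    The randomized subsampling the paper studies is assumed to have already
    produced `per_sample_grads`."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    noisy = np.mean(clipped, axis=0) + rng.normal(
        scale=noise_mult * clip / len(clipped), size=params.shape)
    return params - lr * noisy

params = np.zeros(3)
grads = [np.array([3.0, 4.0, 0.0]),   # norm 5  -> clipped to norm 1
         np.array([0.1, 0.0, 0.0])]   # norm 0.1 -> left unchanged
new_params = dp_sgd_step(params, grads)
print(new_params.shape)  # → (3,)
```

The asymptotic variance decomposition in the abstract corresponds to the three randomness sources visible here: the data (statistical), the subsampling (sampling), and the injected Gaussian noise (privacy-induced).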

MPC-EVM: Enabling MPC Execution by Smart Contracts In An Asynchronous Manner

This paper presents MPC-EVM, the first blockchain prototype that extends the EVM to enable asynchronous MPC invocations by smart contracts during transaction executions without compromising consistency or throughput. MPC-EVM uses an asynchronous execution model to process MPC-invoking transactions in a non-blocking fashion, saving the transaction's progress when it enters an MPC and resuming its execution upon MPC's completion. Additionally, it employs an access control mechanism that prevents inconsistent state access and modifications as a result of asynchronous executions. Benchmarking MPC-EVM's throughput show that the transactions per second (TPS) decreased by less than 3% compared to the baseline when MPC-invoking transactions are executed alongside regular transactions.

Updated: 2025-07-28 06:39:17

Categories: cs.CR

Download: http://arxiv.org/abs/2507.20554v1
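The non-blocking execution model, saving a transaction's progress when it enters an MPC and resuming on completion, maps naturally onto coroutine suspension. A toy asyncio sketch where every name and the mod-97 "secure sum" are stand-ins, not MPC-EVM's actual interfaces:

```python
import asyncio

async def run_mpc(inputs):
    """Stand-in for an off-chain MPC round; the mod-97 result and the
    10 ms latency are purely illustrative."""
    await asyncio.sleep(0.01)
    return sum(inputs) % 97

async def mpc_invoking_tx(tx_id, inputs, log):
    """Suspends at the MPC call (progress saved) and resumes on completion,
    mirroring the save/resume model without blocking other transactions."""
    log.append(f"{tx_id}:start")
    result = await run_mpc(inputs)
    log.append(f"{tx_id}:resume:{result}")

async def regular_tx(tx_id, log):
    log.append(f"{tx_id}:done")

async def block():
    log = []
    await asyncio.gather(mpc_invoking_tx("tx1", [5, 9], log),
                         regular_tx("tx2", log))
    return log

log = asyncio.run(block())
print(log)  # → ['tx1:start', 'tx2:done', 'tx1:resume:14']
```

Note how the regular transaction completes while the MPC-invoking one is suspended, which is why throughput degrades so little in the paper's benchmark; the access-control mechanism preventing inconsistent state access during the suspension is not modeled here.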

The Effect of Data Poisoning on Counterfactual Explanations

Counterfactual explanations are a widely used approach for examining the predictions of black-box systems. They can offer the opportunity for computational recourse by suggesting actionable changes on how to alter the input to obtain a different (i.e., more favorable) system output. However, recent studies have pointed out their susceptibility to various forms of manipulation. This work studies the vulnerability of counterfactual explanations to data poisoning. We formally introduce and investigate data poisoning in the context of counterfactual explanations for increasing the cost of recourse on three different levels: locally for a single instance, a sub-group of instances, or globally for all instances. In this context, we formally introduce and characterize data poisonings, from which we derive and investigate a general data poisoning mechanism. We demonstrate the impact of such data poisoning in the critical real-world application of explaining event detections in water distribution networks. Additionally, we conduct an extensive empirical evaluation, demonstrating that state-of-the-art counterfactual generation methods and toolboxes are vulnerable to such data poisoning. Furthermore, we find that existing defense methods fail to detect those poisonous samples.

Updated: 2025-07-28 06:36:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.08290v4
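The "increasing the cost of recourse" idea above can be made concrete with a toy sketch. This is not the paper's mechanism: the 1-D midpoint classifier and the poisoning rule below are illustrative assumptions, showing only how injected points can drag a decision boundary and make recourse more expensive for a rejected instance.

```python
# Toy illustration (not the paper's method): poisoning that raises the
# "cost of recourse" for a simple 1-D threshold classifier.

def fit_threshold(xs, ys):
    """Midpoint between class means: predict 1 iff x >= threshold."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def recourse_cost(x, threshold):
    """Minimal movement needed for a rejected point to be classified 1."""
    return max(0.0, threshold - x)

# Clean training data: negatives near 0, positives near 10.
xs = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
ys = [0, 0, 0, 1, 1, 1]

t_clean = fit_threshold(xs, ys)           # (1.0 + 9.0) / 2 = 5.0
cost_clean = recourse_cost(3.0, t_clean)  # 2.0

# Poisoning: inject points labeled positive but placed far to the right,
# dragging the threshold upward and making recourse more expensive.
xs_p = xs + [30.0, 30.0]
ys_p = ys + [1, 1]
t_poison = fit_threshold(xs_p, ys_p)        # (1.0 + 17.4) / 2 = 9.2
cost_poison = recourse_cost(3.0, t_poison)  # 6.2
```

The same qualitative effect, scaled to real models and targeted at single instances, sub-groups, or all instances, is what the paper formalizes.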

Enhancing Hallucination Detection via Future Context

Large Language Models (LLMs) are widely used to generate plausible text on online platforms, without revealing the generation process. As users increasingly encounter such black-box outputs, detecting hallucinations has become a critical challenge. To address this challenge, we focus on developing a hallucination detection framework for black-box generators. Motivated by the observation that hallucinations, once introduced, tend to persist, we sample future contexts. The sampled future contexts provide valuable clues for hallucination detection and can be effectively integrated with various sampling-based methods. We extensively demonstrate performance improvements across multiple methods using our proposed sampling approach.

Updated: 2025-07-28 06:13:23

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2507.20546v1
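The core observation above (hallucinations, once introduced, tend to persist in what the model says next) suggests a simple consistency check over sampled futures. The sketch below is a hedged illustration, not the paper's method: the canned "sampled continuations" stand in for draws from a black-box generator, and the substring-based scoring rule is an assumption.

```python
# Sketch: score a claim by how consistently its key entity reappears in
# sampled future contexts; low support suggests a hallucination.

def support_score(claim_entity, sampled_futures):
    """Fraction of sampled future contexts that still mention the entity."""
    hits = sum(1 for f in sampled_futures if claim_entity.lower() in f.lower())
    return hits / len(sampled_futures)

def is_hallucination(score, threshold=0.5):
    """Flag the claim when too few futures support it (threshold assumed)."""
    return score < threshold

# Stand-ins for K continuations drawn from a black-box generator.
futures_grounded = [
    "Paris has been the capital since ...",
    "The city of Paris also hosts ...",
    "Paris remains the seat of government ...",
]
futures_hallucinated = [
    "Lyon is known for its cuisine ...",
    "The capital's population grew ...",
    "Marseille, on the coast, ...",
]

grounded = support_score("Paris", futures_grounded)    # 1.0
suspect = support_score("Nice", futures_hallucinated)  # 0.0
```

In practice such support signals would be combined with the sampling-based detectors the abstract mentions rather than used alone.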

Uncovering Gradient Inversion Risks in Practical Language Model Training

The gradient inversion attack has been demonstrated to be a significant privacy threat to federated learning (FL), particularly in continuous domains such as vision models. In contrast, it is often considered less effective, or highly dependent on impractical training settings, when applied to language models, due to the challenges posed by the discrete nature of tokens in text data. As a result, its potential privacy threat remains largely underestimated, despite FL being an emerging training method for language models. In this work, we propose a domain-specific gradient inversion attack named Grab (gradient inversion with hybrid optimization). Grab features two alternating optimization processes to address the challenges of practical training settings: a simultaneous optimization over the dropout masks between layers for improved token recovery, and a discrete optimization for effective token sequencing. Grab can recover a significant portion (up to a 92.9% recovery rate) of the private training data, outperforming an attack strategy that uses discrete optimization with an auxiliary model by up to 28.9% in recovery rate in benchmark settings and 48.5% in practical settings. Grab provides a valuable step forward in understanding this privacy threat in the emerging FL training of language models.

Updated: 2025-07-28 06:06:29

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.21198v1

Improving Group Fairness in Tensor Completion via Imbalance Mitigating Entity Augmentation

Group fairness is important to consider in tensor decomposition to prevent discrimination based on social grounds such as gender or age. Although a few works have studied group fairness in tensor decomposition, they suffer from performance degradation. To address this, we propose STAFF (Sparse Tensor Augmentation For Fairness), which improves group fairness by minimizing the gap in completion errors between different groups while reducing the overall tensor completion error. Our main idea is to augment the tensor with synthetic entities that carry sufficient observed entries, mitigating the imbalance and group bias in the sparse tensor. We evaluate STAFF on tensor completion with various datasets under both conventional and deep learning-based tensor models. STAFF consistently shows the best trade-off between completion error and group fairness; at most, it yields 36% lower MSE and 59% lower MADE than the second-best baseline.

Updated: 2025-07-28 05:59:35

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.20542v1
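The augmentation idea can be sketched on a toy sparse tensor stored as coordinate entries. This is a hedged illustration, not STAFF's actual procedure: the entry representation, the balancing rule, and the placeholder values below are all assumptions, showing only how adding synthetic entities to the under-observed group narrows the observation imbalance.

```python
# Sketch: balance observed-entry counts across groups by adding synthetic
# entities (each with a few synthetic observations) to the poorer group.

def observed_per_group(entries, group_of):
    """Count observed entries per group; entries are (entity, index, value)."""
    counts = {}
    for (entity, *_rest) in entries:
        g = group_of[entity]
        counts[g] = counts.get(g, 0) + 1
    return counts

def augment(entries, group_of, entries_per_new_entity=4):
    """Add synthetic entities to the least-observed group until balanced."""
    counts = observed_per_group(entries, group_of)
    rich = max(counts, key=counts.get)
    poor = min(counts, key=counts.get)
    new_entries = list(entries)
    next_id = max(group_of) + 1
    while counts[poor] + entries_per_new_entity <= counts[rich]:
        group_of[next_id] = poor
        for j in range(entries_per_new_entity):
            new_entries.append((next_id, j, 0.0))  # placeholder value
        counts[poor] += entries_per_new_entity
        next_id += 1
    return new_entries

# entity -> group; group 0 has many observations, group 1 very few.
group_of = {0: 0, 1: 0, 2: 1}
entries = ([(0, j, 1.0) for j in range(8)]
           + [(1, j, 1.0) for j in range(8)]
           + [(2, 0, 1.0)])
augmented = augment(entries, group_of)
```

A real pipeline would fill the synthetic observations from the observed data (not a constant) and then fit the tensor model on the augmented tensor.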

Benchmarking and Analyzing Generative Data for Visual Recognition

Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition. This work delves into the impact of generative images, primarily comparing paradigms that harness external data (i.e., generative vs. retrieval vs. original). Our key contributions are: 1) GenBench construction: we devise GenBench, a broad benchmark comprising 22 datasets with 2548 categories, to appraise generative data across various visual recognition tasks. 2) CLER score: to address the insufficient correlation of existing metrics (e.g., FID, CLIP score) with downstream recognition performance, we propose CLER, a training-free metric indicating generative data's efficiency for recognition tasks prior to training. 3) New baselines: comparisons of generative data with retrieved data from the same external pool help to elucidate the unique traits of generative data. 4) External knowledge injection: by fine-tuning special token embeddings for each category via Textual Inversion, performance improves across 17 datasets, except when dealing with low-resolution reference images. Our exhaustive benchmark and analysis spotlight generative data's promise in visual recognition, while identifying key challenges for future investigation.

Updated: 2025-07-28 05:59:14

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2307.13697v2

MeLA: A Metacognitive LLM-Driven Architecture for Automatic Heuristic Design

This paper introduces MeLA, a Metacognitive LLM-Driven Architecture that presents a new paradigm for Automatic Heuristic Design (AHD). Traditional evolutionary methods operate directly on heuristic code; in contrast, MeLA evolves the instructional prompts used to guide a Large Language Model (LLM) in generating these heuristics. This process of "prompt evolution" is driven by a novel metacognitive framework where the system analyzes performance feedback to systematically refine its generative strategy. MeLA's architecture integrates a problem analyzer to construct an initial strategic prompt, an error diagnosis system to repair faulty code, and a metacognitive search engine that iteratively optimizes the prompt based on heuristic effectiveness. In comprehensive experiments across both benchmark and real-world problems, MeLA consistently generates more effective and robust heuristics, significantly outperforming state-of-the-art methods. Ultimately, this research demonstrates the profound potential of using cognitive science as a blueprint for AI architecture, revealing that by enabling an LLM to metacognitively regulate its problem-solving process, we unlock a more robust and interpretable path to AHD.

Updated: 2025-07-28 05:56:40

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20541v1

NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks

Nanobodies -- single-domain antibody fragments derived from camelid heavy-chain-only antibodies -- exhibit unique advantages such as compact size, high stability, and strong binding affinity, making them valuable tools in therapeutics and diagnostics. While recent advances in pretrained protein and antibody language models (PPLMs and PALMs) have greatly enhanced biomolecular understanding, nanobody-specific modeling remains underexplored and lacks a unified benchmark. To address this gap, we introduce NbBench, the first comprehensive benchmark suite for nanobody representation learning. Spanning eight biologically meaningful tasks across nine curated datasets, NbBench encompasses structure annotation, binding prediction, and developability assessment. We systematically evaluate eleven representative models -- including general-purpose protein LMs, antibody-specific LMs, and nanobody-specific LMs -- in a frozen setting. Our analysis reveals that antibody language models excel in antigen-related tasks, while performance on regression tasks such as thermostability and affinity remains challenging across all models. Notably, no single model consistently outperforms others across all tasks. By standardizing datasets, task definitions, and evaluation protocols, NbBench offers a reproducible foundation for assessing and advancing nanobody modeling.

Updated: 2025-07-28 05:51:46

Categories: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2505.02022v2

Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes

This paper explores the application of reinforcement learning techniques to enhance the decoding performance of linear block codes by flipping bits and finding optimal decisions. We describe a methodology for mapping the iterative decoding process into Markov Decision Processes (MDPs) and propose different methods to reduce the number of states in the MDP. A truncated MDP is proposed that reduces the number of states by learning a Hamming ball of a specified radius around codewords. We then propose a general scheme for reinforcement learning-based decoders, applicable to any class of codes, to improve decoder performance; we call this scheme action-list decoding. We design an action-list decoder based on Deep Q-network values that substantially enhances performance, and we further exploit the automorphism group of the code to improve performance. Additionally, we propose a feedback-based method that enhances existing high-performing decoders by applying reinforcement learning algorithms after them. These approaches effectively reduce the complexity of the reinforcement learning block. Finally, we present experimental results for Low-Density Parity-Check (LDPC) codes over the Binary Symmetric Channel (BSC) to demonstrate the efficiency of the proposed methods.

Updated: 2025-07-28 05:46:06

Categories: cs.IT,cs.AI,cs.LG,math.IT

Download: http://arxiv.org/abs/2507.17893v2
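The search space the abstract describes (flip bits until the syndrome vanishes, truncated to a Hamming ball around the received word) can be sketched concretely on the (7,4) Hamming code. The brute-force search below stands in for the learned policy; it is an illustration of the decoding problem, not the paper's RL decoder.

```python
from itertools import combinations

# Parity-check matrix of the (7,4) Hamming code; column i is the binary
# representation of i+1, so a single-bit error's syndrome names its position.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(word):
    """H * word over GF(2); the all-zero syndrome marks a codeword."""
    return tuple(sum(h * b for h, b in zip(row, word)) % 2 for row in H)

def decode_in_ball(received, radius=1):
    """Try every flip pattern of weight <= radius; return the first codeword."""
    n = len(received)
    for w in range(radius + 1):
        for flips in combinations(range(n), w):
            cand = list(received)
            for i in flips:
                cand[i] ^= 1
            if syndrome(cand) == (0, 0, 0):
                return cand
    return None  # no codeword inside the ball: declare failure

received = [0, 0, 0, 0, 1, 0, 0]   # all-zero codeword with one bit flipped
decoded = decode_in_ball(received)  # -> the all-zero codeword
```

Restricting the search to a small radius is exactly what makes the truncated MDP tractable: the agent only ever needs to evaluate states inside the ball rather than all 2^n words.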

T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation

Text-to-Image (T2I) generative models have revolutionized content creation but remain highly sensitive to prompt phrasing, often requiring users to repeatedly refine prompts multiple times without clear feedback. While techniques such as automatic prompt engineering, controlled text embeddings, denoising, and multi-turn generation mitigate these issues, they offer limited controllability, or often necessitate additional training, restricting the generalization abilities. Thus, we introduce T2I-Copilot, a training-free multi-agent system that leverages collaboration between (Multimodal) Large Language Models to automate prompt phrasing, model selection, and iterative refinement. This approach significantly simplifies prompt engineering while enhancing generation quality and text-image alignment compared to direct generation. Specifically, T2I-Copilot consists of three agents: (1) Input Interpreter, which parses the input prompt, resolves ambiguities, and generates a standardized report; (2) Generation Engine, which selects the appropriate model from different types of T2I models and organizes visual and textual prompts to initiate generation; and (3) Quality Evaluator, which assesses aesthetic quality and text-image alignment, providing scores and feedback for potential regeneration. T2I-Copilot can operate fully autonomously while also supporting human-in-the-loop intervention for fine-grained control. On GenAI-Bench, using open-source generation models, T2I-Copilot achieves a VQA score comparable to commercial models RecraftV3 and Imagen 3, surpasses FLUX1.1-pro by 6.17% at only 16.59% of its cost, and outperforms FLUX.1-dev and SD 3.5 Large by 9.11% and 6.36%. Code will be released at: https://github.com/SHI-Labs/T2I-Copilot.

Updated: 2025-07-28 05:41:22

Categories: cs.CV,cs.AI,cs.HC

Download: http://arxiv.org/abs/2507.20536v1
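The three-agent loop described above (Input Interpreter, Generation Engine, Quality Evaluator, with regeneration on low scores) can be sketched as a simple control flow. All agent bodies below are stubs standing in for (M)LLM and T2I calls; the function names, score schedule, and threshold are assumptions for illustration, not T2I-Copilot's implementation.

```python
# Sketch of the interpret -> generate -> evaluate -> (maybe) regenerate loop.

def input_interpreter(prompt):
    """Parse the prompt into a standardized report (stub for an LLM call)."""
    return {"subject": prompt.strip().lower(), "ambiguities_resolved": True}

def generation_engine(report, model_pool):
    """Pick a model and 'generate' (stub returns a description string)."""
    model = model_pool[0]  # a real engine would select per-report
    return {"model": model, "image": f"<image of {report['subject']}>"}

def quality_evaluator(result, attempt):
    """Score aesthetics/alignment in [0, 1] (stub improves per refinement)."""
    return min(1.0, 0.4 + 0.3 * attempt)

def t2i_copilot(prompt, model_pool, threshold=0.9, max_rounds=5):
    report = input_interpreter(prompt)
    for attempt in range(1, max_rounds + 1):
        result = generation_engine(report, model_pool)
        score = quality_evaluator(result, attempt)
        if score >= threshold:
            return result, score, attempt
    return result, score, attempt  # best effort after max_rounds

result, score, rounds = t2i_copilot("a red fox at dawn", ["open-model-1"])
```

The human-in-the-loop mode the abstract mentions would slot in between evaluation and regeneration, letting a user override the report or the model choice.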

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Recent advances in video generation have led to remarkable improvements in visual quality and temporal coherence. Building on this, trajectory-controllable video generation has emerged to enable precise object motion control through explicitly defined spatial paths. However, existing methods struggle with complex object movements and multi-object motion control, resulting in imprecise trajectory adherence, poor object consistency, and compromised visual quality. Furthermore, these methods support trajectory control only in a single format, limiting their applicability in diverse scenarios. Additionally, there is no publicly available dataset or benchmark specifically tailored for trajectory-controllable video generation, hindering robust training and systematic evaluation. To address these challenges, we introduce MagicMotion, a novel image-to-video generation framework that enables trajectory control through three levels of conditions, from dense to sparse: masks, bounding boxes, and sparse boxes. Given an input image and trajectories, MagicMotion seamlessly animates objects along the defined trajectories while maintaining object consistency and visual quality. Furthermore, we present MagicData, a large-scale trajectory-controlled video dataset, along with an automated pipeline for annotation and filtering. We also introduce MagicBench, a comprehensive benchmark that assesses both video quality and trajectory control accuracy across different numbers of objects. Extensive experiments demonstrate that MagicMotion outperforms previous methods across various metrics. Our project page is publicly available at https://quanhaol.github.io/magicmotion-site.

Updated: 2025-07-28 05:39:21

Categories: cs.CV,cs.AI,cs.LG,cs.MM

Download: http://arxiv.org/abs/2503.16421v2

How to Bridge Spatial and Temporal Heterogeneity in Link Prediction? A Contrastive Method

Temporal Heterogeneous Networks play a crucial role in capturing the dynamics and heterogeneity inherent in various real-world complex systems, rendering them a noteworthy research avenue for link prediction. However, existing methods fail to capture the fine-grained differential distribution patterns and temporal dynamic characteristics, which we refer to as spatial heterogeneity and temporal heterogeneity. To overcome such limitations, we propose a novel Contrastive Learning-based Link Prediction model, CLP, which employs a multi-view hierarchical self-supervised architecture to encode spatial and temporal heterogeneity. Specifically, for spatial heterogeneity, we develop a spatial feature modeling layer to capture the fine-grained topological distribution patterns from node- and edge-level representations, respectively. Furthermore, for temporal heterogeneity, we devise a temporal information modeling layer to perceive the evolutionary dependencies of dynamic graph topologies from time-level representations. Finally, we encode the spatial and temporal distribution heterogeneity from a contrastive learning perspective, enabling comprehensive self-supervised hierarchical relation modeling for the link prediction task. Extensive experiments conducted on four real-world dynamic heterogeneous network datasets verify that CLP consistently outperforms the state-of-the-art models, demonstrating average improvements of 10.10% and 13.44% in terms of AUC and AP, respectively.

Updated: 2025-07-28 05:38:29

Domains: cs.SI,cs.AI

Download: http://arxiv.org/abs/2411.00612v2
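The abstract does not spell out CLP's loss, but multi-view contrastive objectives of this kind are typically InfoNCE-style: the two views of the same node are pulled together against all other nodes in the batch. A generic NumPy sketch of that objective (my illustration, not the authors' code):

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """Generic InfoNCE loss between two views of the same N nodes.
    z1, z2: (N, d) embeddings; matching rows are the positive pairs."""
    def norm(z):
        return z / np.linalg.norm(z, axis=1, keepdims=True)
    z1, z2 = norm(z1), norm(z2)
    logits = z1 @ z2.T / tau                      # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 8))
aligned = info_nce(z, z)               # identical views: low loss
shuffled = info_nce(z, z[::-1].copy()) # mismatched views: higher loss
```

The loss drops when the two views agree per node, which is what lets the model align node-, edge-, and time-level representations without labels.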

Kimi K2: Open Agentic Intelligence

We introduce Kimi K2, a Mixture-of-Experts (MoE) large language model with 32 billion activated parameters and 1 trillion total parameters. We propose the MuonClip optimizer, which improves upon Muon with a novel QK-clip technique to address training instability while enjoying the advanced token efficiency of Muon. Based on MuonClip, K2 was pre-trained on 15.5 trillion tokens with zero loss spike. K2 then undergoes a multi-stage post-training process, highlighted by a large-scale agentic data synthesis pipeline and a joint reinforcement learning (RL) stage, where the model improves its capabilities through interactions with real and synthetic environments. Kimi K2 achieves state-of-the-art performance among open-source non-thinking models, with strengths in agentic capabilities. Notably, K2 obtains 66.1 on Tau2-Bench, 76.5 on ACEBench (En), 65.8 on SWE-Bench Verified, and 47.3 on SWE-Bench Multilingual, surpassing most open- and closed-source baselines in non-thinking settings. It also exhibits strong capabilities in coding, mathematics, and reasoning tasks, with a score of 53.7 on LiveCodeBench v6, 49.5 on AIME 2025, 75.1 on GPQA-Diamond, and 27.1 on OJBench, all without extended thinking. These results position Kimi K2 as one of the most capable open-source large language models to date, particularly in software engineering and agentic tasks. We release our base and post-trained model checkpoints to facilitate future research and applications of agentic intelligence.

Updated: 2025-07-28 05:35:43

Domains: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.20534v1
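The abstract only names QK-clip as the stabilizing ingredient of MuonClip. One common form of such stabilization is capping the maximal attention logit by rescaling the query/key projections; the sketch below shows that generic idea and is an assumption on my part, not Kimi K2's actual algorithm (the threshold `tau` and the symmetric sqrt split are illustrative choices):

```python
import numpy as np

def qk_logit_clip(Wq, Wk, x, tau=30.0):
    """Illustrative logit cap (assumed mechanism, not the paper's exact rule):
    if the largest attention logit over a batch exceeds tau, scale both
    projection matrices by sqrt(tau / max_logit), so the QK product shrinks
    to the cap while its structure is preserved."""
    q, k = x @ Wq, x @ Wk
    max_logit = float(np.abs(q @ k.T).max())
    if max_logit > tau:
        gamma = np.sqrt(tau / max_logit)
        Wq, Wk = Wq * gamma, Wk * gamma
    return Wq, Wk, max_logit

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 16))
Wq = rng.normal(size=(16, 16)) * 2.0   # deliberately oversized weights
Wk = rng.normal(size=(16, 16)) * 2.0
Wq2, Wk2, before = qk_logit_clip(Wq, Wk, x, tau=30.0)
_, _, after = qk_logit_clip(Wq2, Wk2, x, tau=30.0)  # logits now at the cap
```

Acting on the weights rather than the logits keeps the forward pass unchanged, which is why this style of clip composes with an optimizer update.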

A Hybrid Classical-Quantum Rainbow Table Attack on Human Passwords

Long, human-generated passwords pose significant challenges to both classical and quantum attacks due to their irregular structure and large search space. In this work, we propose an enhanced classical-quantum hybrid attack specifically designed for this scenario. Our approach constructs rainbow tables using dictionary-based password generation augmented with transformation rules that better capture real-world user behavior. These tables are organized into buckets, enabling faster lookup and reduced space complexity. For the search within each bucket, we employ a distributed exact variant of Grover's algorithm. This method provides deterministic success and significantly lower circuit depth, enhancing robustness against noise, particularly the depolarizing errors common in near-term quantum devices. Overall, our hybrid framework improves the efficiency and practicality of password recovery for long, human-readable passwords in realistic adversarial settings.

Updated: 2025-07-28 05:34:07

Domains: cs.CR,quant-ph

Download: http://arxiv.org/abs/2507.14600v2
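The classical half of the pipeline (dictionary expansion by behavior rules, then bucketed lookup) can be sketched as below. This is a plain bucketed precomputed table rather than true rainbow chains with reduction functions, uses SHA-256 as a stand-in hash, and omits the Grover search inside each bucket; the rule set is illustrative:

```python
import hashlib
from itertools import product

# Transformation rules mimicking common user habits (illustrative set).
RULES = [
    lambda w: w,
    lambda w: w.capitalize(),
    lambda w: w + "123",
    lambda w: w.replace("a", "@"),
]

def candidates(dictionary):
    """Dictionary words expanded by the transformation rules."""
    for word, rule in product(dictionary, RULES):
        yield rule(word)

def build_buckets(dictionary, n_buckets=16):
    """Precompute hash -> password, bucketed by the hash's first hex digit."""
    buckets = [dict() for _ in range(n_buckets)]
    for pw in candidates(dictionary):
        h = hashlib.sha256(pw.encode()).hexdigest()
        buckets[int(h[0], 16) % n_buckets][h] = pw
    return buckets

def lookup(buckets, target_hash):
    """Search only the single bucket the hash prefix selects."""
    return buckets[int(target_hash[0], 16) % len(buckets)].get(target_hash)

buckets = build_buckets(["password", "dragon", "sunshine"])
target = hashlib.sha256(b"dragon123").hexdigest()
found = lookup(buckets, target)  # "dragon123"
```

Bucketing is what shrinks each quantum search: Grover's cost scales with the square root of the searched set, so searching one bucket instead of the whole table is where the hybrid saving comes from.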

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. It is developed by our proposed SafeLadder framework, which incorporates large-scale, progressive, safety-oriented reinforcement learning post-training, supported by a suite of multi-principled verifiers. Unlike previous alignment methods such as RLHF that simply learn human preferences, SafeLadder enables SafeWork-R1 to develop intrinsic safety reasoning and self-reflection abilities, giving rise to safety 'aha' moments. Notably, SafeWork-R1 achieves an average improvement of 46.54% over its base model Qwen2.5-VL-72B on safety-related benchmarks without compromising general capabilities, and delivers state-of-the-art safety performance compared to leading proprietary models such as GPT-4.1 and Claude Opus 4. To further bolster its reliability, we implement two distinct inference-time intervention methods and a deliberative search mechanism, enforcing step-level verification. Finally, we further develop SafeWork-R1-InternVL3-78B, SafeWork-R1-DeepSeek-70B, and SafeWork-R1-Qwen2.5VL-7B. All resulting models demonstrate that safety and capability can co-evolve synergistically, highlighting the generalizability of our framework in building robust, reliable, and trustworthy general-purpose AI.

Updated: 2025-07-28 05:33:59

Domains: cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2507.18576v2

Kernel Learning for Sample Constrained Black-Box Optimization

Black box optimization (BBO) focuses on optimizing unknown functions in high-dimensional spaces. In many applications, sampling the unknown function is expensive, imposing a tight sample budget. Ongoing work is making progress on reducing the sample budget by learning the shape/structure of the function, known as kernel learning. We propose a new method to learn the kernel of a Gaussian Process. Our idea is to create a continuous kernel space in the latent space of a variational autoencoder, and run an auxiliary optimization to identify the best kernel. Results show that the proposed method, Kernel Optimized Blackbox Optimization (KOBO), outperforms the state of the art by estimating the optimum at considerably lower sample budgets. These results hold not only across synthetic benchmark functions but also in real applications. We show that a hearing aid may be personalized with fewer audio queries to the user, or a generative model could converge to desirable images from limited user ratings.

Updated: 2025-07-28 05:32:11

Domains: cs.LG

Download: http://arxiv.org/abs/2507.20533v1
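Whatever space the kernel search runs in (KOBO's is the continuous latent space of a VAE), each candidate kernel is scored by how well it explains the observed samples, i.e. the GP log marginal likelihood. A discrete stand-in for that inner scoring step, comparing two hand-picked kernels (my sketch, not the KOBO implementation):

```python
import numpy as np

def rbf(X, Y, ls):
    """Squared-exponential kernel: favors smooth functions."""
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2 * ls ** 2))

def matern12(X, Y, ls):
    """Exponential (Matern-1/2) kernel: favors rough functions."""
    return np.exp(-np.abs(X[:, None] - Y[None, :]) / ls)

def log_marginal_likelihood(K, y, noise=1e-2):
    """GP evidence: -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)."""
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * n * np.log(2 * np.pi))

X = np.linspace(0, 1, 30)
y = np.sin(6 * X)                       # smooth target function
kernels = {"rbf": rbf, "matern12": matern12}
scores = {name: log_marginal_likelihood(k(X, X, 0.2), y)
          for name, k in kernels.items()}
best = max(scores, key=scores.get)      # kernel with highest evidence
```

KOBO replaces the two-entry dictionary with a continuous latent coordinate and optimizes this same evidence over it, which is what makes the kernel choice itself learnable.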

Large Language Model Powered Decision Support for a Metal Additive Manufacturing Knowledge Graph

Metal additive manufacturing (AM) involves complex interdependencies among processes, materials, feedstock, and post-processing steps. However, the underlying relationships and domain knowledge remain fragmented across literature and static databases that often require expert-level queries, limiting their applicability in design and planning. To address these limitations, we develop a novel and structured knowledge graph (KG), representing 53 distinct metals and alloys across seven material categories, nine AM processes, four feedstock types, and corresponding post-processing requirements. A large language model (LLM) interface, guided by a few-shot prompting strategy, enables natural language querying without the need for formal query syntax. The system supports a range of tasks, including compatibility evaluation, constraint-based filtering, and design for AM (DfAM) guidance. User queries in natural language are normalized, translated into Cypher, and executed on the KG, with results returned in a structured format. This work introduces the first interactive system that connects a domain-specific metal AM KG with an LLM interface, delivering accessible and explainable decision support for engineers and promoting human-centered tools in manufacturing knowledge systems.

Updated: 2025-07-28 05:26:47

Domains: cs.IR,cs.AI

Download: http://arxiv.org/abs/2505.20308v2
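The query path described above (normalize the question, few-shot prompt an LLM into Cypher, execute on the KG) can be sketched as follows. The LLM is stubbed so the sketch runs offline, and the schema labels (`Material`, `Process`, `COMPATIBLE_WITH`) are hypothetical, not the paper's actual graph schema:

```python
FEW_SHOT = """\
Q: Which processes can print titanium alloys?
Cypher: MATCH (m:Material {category: 'titanium'})-[:COMPATIBLE_WITH]->(p:Process) RETURN p.name

Q: What feedstock does laser powder bed fusion use?
Cypher: MATCH (p:Process {name: 'L-PBF'})-[:USES]->(f:Feedstock) RETURN f.name
"""

def build_prompt(user_query: str) -> str:
    """Normalize the question and prepend the few-shot examples."""
    normalized = " ".join(user_query.strip().split()).rstrip("?") + "?"
    return FEW_SHOT + f"\nQ: {normalized}\nCypher:"

def translate(user_query: str, llm) -> str:
    """llm is any callable prompt -> completion (e.g. a hosted model API)."""
    return llm(build_prompt(user_query)).strip()

# Stub LLM standing in for the real model call.
def stub_llm(prompt: str) -> str:
    return (" MATCH (m:Material)-[:COMPATIBLE_WITH]->(p:Process) "
            "RETURN m.name, p.name")

cypher = translate("  which  materials work with which processes ", stub_llm)
```

In the deployed system the returned Cypher string is executed against the KG and the rows are formatted back for the user; keeping the examples in the prompt is what removes the need for formal query syntax on the user's side.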

Enhancing Spatial Reasoning through Visual and Textual Thinking

The spatial reasoning task aims to reason about the spatial relationships in 2D and 3D space, which is a fundamental capability for Visual Question Answering (VQA) and robotics. Although vision language models (VLMs) have developed rapidly in recent years, they still struggle with the spatial reasoning task. In this paper, we introduce a method that can enhance Spatial reasoning through Visual and Textual thinking Simultaneously (SpatialVTS). In the spatial visual thinking phase, our model is trained to automatically generate location-related specific tokens for essential targets. Not only are the objects mentioned in the problem addressed, but also the potential objects related to the reasoning are considered. During the spatial textual thinking phase, our model conducts long-term thinking based on visual cues and dialogues, gradually inferring the answers to spatial reasoning problems. To effectively support the model's training, we perform manual corrections to the existing spatial reasoning dataset, eliminating numerous incorrect labels resulting from automatic annotation, restructuring the data input format to enhance generalization ability, and developing thinking processes with logical reasoning details. Without introducing additional information (such as masks or depth), our model's overall average performance across several spatial understanding tasks improves significantly compared with other models.

Updated: 2025-07-28 05:24:54

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.20529v1

Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards

As large language models (LLMs) gain popularity, their vulnerability to adversarial attacks emerges as a primary concern. While fine-tuning models on domain-specific datasets is often employed to improve model performance, it can inadvertently introduce vulnerabilities within the underlying model. In this work, we investigate Accidental Vulnerability, unexpected vulnerabilities arising from characteristics of fine-tuning data. We begin by identifying potential correlation factors such as linguistic features, semantic similarity, and toxicity across multiple experimental datasets. We then evaluate the adversarial robustness of these fine-tuned models, analyzing persona shifts and interpretability traits to understand how dataset factors contribute to attack success rates. Lastly, we explore causal relationships that offer new insights into adversarial defense strategies, highlighting the crucial role of dataset design in preserving model alignment. Our code is available at https://github.com/psyonp/accidental_vulnerability.

Updated: 2025-07-28 05:16:47

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2505.16789v2
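The first step of the study above is correlating fine-tuning-data characteristics (linguistic features, semantic similarity, toxicity) with attack success rates. A minimal Pearson-correlation sketch of that analysis, on synthetic per-dataset numbers invented for illustration (not the paper's measurements):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between a dataset feature and attack success."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))

# Synthetic measurements for five hypothetical fine-tuning datasets:
toxicity       = [0.02, 0.10, 0.25, 0.40, 0.55]  # mean toxicity of tuning set
attack_success = [0.08, 0.15, 0.30, 0.52, 0.61]  # fraction of attacks that land
semantic_shift = [0.30, 0.28, 0.33, 0.29, 0.31]  # roughly flat -> weak signal

r_tox = pearson(toxicity, attack_success)        # strong positive correlation
r_sem = pearson(semantic_shift, attack_success)  # much weaker correlation
```

Correlation like this only flags candidate factors; the paper's later causal analysis is what separates dataset properties that merely co-occur with vulnerability from those that drive it.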

ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts

Foundation models have revolutionized AI, yet they struggle with zero-shot deployment in real-world industrial settings due to a lack of high-quality, domain-specific datasets. To bridge this gap, Superb AI introduces ZERO, an industry-ready vision foundation model that leverages multi-modal prompting (textual and visual) for generalization without retraining. Trained on a compact yet representative 0.9 million annotated samples from a proprietary billion-scale industrial dataset, ZERO demonstrates competitive performance on academic benchmarks like LVIS-Val and significantly outperforms existing models across 37 diverse industrial datasets. Furthermore, ZERO achieved 2nd place in the CVPR 2025 Object Instance Detection Challenge and 4th place in the Foundational Few-shot Object Detection Challenge, highlighting its practical deployability and generalizability with minimal adaptation and limited data. To the best of our knowledge, ZERO is the first vision foundation model explicitly built for domain-specific, zero-shot industrial applications.

Updated: 2025-07-28 05:13:34

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.04270v2

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

Recent advances have enabled LLM-powered AI agents to autonomously execute complex tasks by combining language model reasoning with tools, memory, and web access. But can these systems be trusted to follow deployment policies in realistic environments, especially under attack? To investigate, we ran the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. Participants submitted 1.8 million prompt-injection attacks, with over 60,000 successfully eliciting policy violations such as unauthorized data access, illicit financial actions, and regulatory noncompliance. We use these results to build the Agent Red Teaming (ART) benchmark - a curated set of high-impact attacks - and evaluate it across 19 state-of-the-art models. Nearly all agents exhibit policy violations for most behaviors within 10-100 queries, with high attack transferability across models and tasks. Importantly, we find limited correlation between agent robustness and model size, capability, or inference-time compute, suggesting that additional defenses are needed against adversarial misuse. Our findings highlight critical and persistent vulnerabilities in today's AI agents. By releasing the ART benchmark and accompanying evaluation framework, we aim to support more rigorous security assessment and drive progress toward safer agent deployment.
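The evaluation protocol the abstract describes — probing an agent with attack variants until a policy violation is elicited, within a per-behavior budget of 10-100 queries — can be sketched as follows. This is a toy harness under stated assumptions: the `agent` callable, the attack-mutation scheme, and the success criterion are hypothetical stand-ins, not the ART benchmark's actual interface.

```python
from dataclasses import dataclass

@dataclass
class AttackResult:
    attack_id: int
    queries_used: int
    violated: bool

def attack_variants(base, n):
    # Hypothetical mutation scheme: append escalating attempt suffixes.
    return [f"{base} [attempt {k}]" for k in range(n)]

def red_team(agent, attacks, budget=100):
    """Try each attack against the agent until it elicits a policy
    violation or the per-behavior query budget is exhausted."""
    results = []
    for i, attack in enumerate(attacks):
        violated, queries = False, 0
        for variant in attack_variants(attack, budget):
            queries += 1                 # one query = one injection attempt
            if agent(variant):           # True -> policy violation elicited
                violated = True
                break
        results.append(AttackResult(i, queries, violated))
    return results

def attack_success_rate(results):
    return sum(r.violated for r in results) / len(results)
```

A toy agent that "breaks" on the fourth attempt illustrates the query-budget accounting; real agents would be LLM-backed and nondeterministic.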

Updated: 2025-07-28 05:13:04

Categories: cs.AI,cs.CL,cs.CY

Download: http://arxiv.org/abs/2507.20526v1

The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?

This paper presents a case study in the use of a large language model to generate a fictional Buddhist "sutra", and offers a detailed analysis of the resulting text from a philosophical and literary point of view. The conceptual subtlety, rich imagery, and density of allusion found in the text make it hard to casually dismiss on account of its mechanistic origin. This raises questions about how we, as a society, should come to terms with the potentially unsettling possibility of a technology that encroaches on human meaning-making. We suggest that Buddhist philosophy, by its very nature, is well placed to adapt.

Updated: 2025-07-28 05:12:35

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2507.20525v1

When and Where do Data Poisons Attack Textual Inversion?

Poisoning attacks pose significant challenges to the robustness of diffusion models (DMs). In this paper, we systematically analyze when and where poisoning attacks textual inversion (TI), a widely used personalization technique for DMs. We first introduce Semantic Sensitivity Maps, a novel method for visualizing the influence of poisoning on text embeddings. Second, we identify and experimentally verify that DMs exhibit non-uniform learning behavior across timesteps, focusing on lower-noise samples. Poisoning attacks inherit this bias and inject adversarial signals predominantly at lower timesteps. Lastly, we observe that adversarial signals distract learning away from relevant concept regions within training data, corrupting the TI process. Based on these insights, we propose Safe-Zone Training (SZT), a novel defense mechanism comprised of 3 key components: (1) JPEG compression to weaken high-frequency poison signals, (2) restriction to high timesteps during TI training to avoid adversarial signals at lower timesteps, and (3) loss masking to constrain learning to relevant regions. Extensive experiments across multiple poisoning methods demonstrate that SZT greatly enhances the robustness of TI against all poisoning attacks, improving generative quality beyond prior published defenses. Code: www.github.com/JStyborski/Diff_Lab Data: www.github.com/JStyborski/NC10
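The three SZT components have natural minimal analogues, sketched below with stdlib stand-ins: coarse quantization plays the role of JPEG compression (both suppress the high-frequency detail where poison signals live), timestep sampling is restricted to the high-noise range, and a binary mask confines the loss to relevant regions. The timestep range and quantization step are illustrative assumptions, not the paper's settings, and a 1-D "image" stands in for real image tensors.

```python
import random

def jpeg_like_compress(x, step=0.25):
    """Placeholder for JPEG compression: coarse quantization, which, like
    JPEG, attenuates fine high-frequency perturbations."""
    return [round(v / step) * step for v in x]

def sample_high_timestep(t_min=600, t_max=1000):
    """SZT component 2: draw diffusion timesteps only from the high-noise
    range, avoiding the low timesteps where adversarial signals concentrate."""
    return random.randint(t_min, t_max)

def masked_mse(pred, target, mask):
    """SZT component 3: loss masking restricts learning to relevant regions
    (mask=1) and ignores regions the poison tries to redirect learning to."""
    num = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    den = sum(mask) or 1
    return num / den
```

In a real TI training loop these would wrap the data loader, the timestep sampler, and the diffusion loss respectively.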

Updated: 2025-07-28 05:07:59

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2507.10578v3

AQUA: A Large Language Model for Aquaculture & Fisheries

Aquaculture plays a vital role in global food security and coastal economies by providing sustainable protein sources. As the industry expands to meet rising demand, it faces growing challenges such as disease outbreaks, inefficient feeding practices, rising labor costs, logistical inefficiencies, and critical hatchery issues, including high mortality rates and poor water quality control. Although artificial intelligence has made significant progress, existing machine learning methods fall short of addressing the domain-specific complexities of aquaculture. To bridge this gap, we introduce AQUA, the first large language model (LLM) tailored for aquaculture, designed to support farmers, researchers, and industry practitioners. Central to this effort is AQUADAPT (Data Acquisition, Processing and Tuning), an Agentic Framework for generating and refining high-quality synthetic data using a combination of expert knowledge, large-scale language models, and automated evaluation techniques. Our work lays the foundation for LLM-driven innovations in aquaculture research, advisory systems, and decision-making tools.

Updated: 2025-07-28 05:06:07

Categories: cs.CL,cs.AI,cs.CE,cs.LG,cs.RO

Download: http://arxiv.org/abs/2507.20520v1

Geometric Representation Condition Improves Equivariant Molecule Generation

Recent advances in molecular generative models have demonstrated great promise for accelerating scientific discovery, particularly in drug design. However, these models often struggle to generate high-quality molecules, especially in conditional scenarios where specific molecular properties must be satisfied. In this work, we introduce GeoRCG, a general framework to improve molecular generative models by integrating geometric representation conditions with provable theoretical guarantees. We decompose the generation process into two stages: first, generating an informative geometric representation; second, generating a molecule conditioned on the representation. Compared with single-stage generation, the easy-to-generate representation in the first stage guides the second stage generation toward a high-quality molecule in a goal-oriented way. Leveraging EDM and SemlaFlow as base generators, we observe significant quality improvements in unconditional molecule generation on the widely used QM9 and GEOM-DRUG datasets. More notably, in the challenging conditional molecular generation task, our framework achieves an average 50% performance improvement over state-of-the-art approaches, highlighting the superiority of conditioning on semantically rich geometric representations. Furthermore, with such representation guidance, the number of diffusion steps can be reduced to as small as 100 while largely preserving the generation quality achieved with 1,000 steps, thereby significantly reducing the generation iterations needed. Code is available at https://github.com/GraphPKU/GeoRCG.
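The two-stage decomposition can be illustrated with a deliberately tiny stand-in: stage 1 draws an easy-to-generate "representation", and stage 2 generates the final sample conditioned on it, so the representation steers generation toward a specific mode. Here a 1-D Gaussian mixture plays the role of molecule space and a discrete mode choice plays the role of the learned geometric representation; both are assumptions for illustration only.

```python
import random

# Toy representation prior: three modes a stage-1 sampler can pick from.
# In GeoRCG this would be a learned distribution over geometric embeddings.
REPRESENTATION_PRIOR = [-5.0, 0.0, 5.0]

def sample_representation():
    # Stage 1: generating the representation is easy (a discrete choice here).
    return random.choice(REPRESENTATION_PRIOR)

def sample_conditioned(rep, noise=0.1):
    # Stage 2: conditioning on the representation pulls the sample toward
    # the chosen mode, analogous to guiding the molecule generator.
    return random.gauss(rep, noise)

def generate():
    rep = sample_representation()
    return rep, sample_conditioned(rep)
```

The point of the decomposition is visible even in this toy: an unconditional sampler over the mixture must discover the modes itself, while the staged sampler lands near its chosen mode by construction.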

Updated: 2025-07-28 05:05:47

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2410.03655v4

Guide your favorite protein sequence generative model

Generative machine learning models on sequences are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experimental data, in a plug-and-play manner. Herein, we present ProteinGuide -- a principled and general method for conditioning -- by unifying a broad class of protein generative models under a single framework. We demonstrate the applicability of ProteinGuide by guiding two protein generative models, ProteinMPNN and ESM3, to generate amino acid and structure token sequences, conditioned on several user-specified properties such as enhanced stability, enzyme classes, and CATH-labeled folds. We also used ProteinGuide with inverse folding models and our own experimental assay to design adenine base editor sequences for high activity.

Updated: 2025-07-28 04:57:58

Categories: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2505.04823v3

PatchTraj: Dynamic Patch Representation Learning for Time-Frequency Trajectory Prediction

Pedestrian trajectory prediction is crucial for autonomous driving and robotics. However, existing point-based and grid-based methods expose two key limitations: they insufficiently model human motion dynamics, failing to balance local motion details with long-range spatiotemporal dependencies, and their time representation lacks interaction with the frequency domain when modeling trajectory sequences. To address these challenges, we propose PatchTraj, a dynamic patch-based trajectory prediction framework that unifies time-domain and frequency-domain representations. Specifically, we decompose the trajectory into raw time sequences and frequency components, employing dynamic patch partitioning for multi-scale trajectory segmentation to capture hierarchical motion patterns. Each patch is processed by an adaptive embedding layer with scale-aware feature extraction, followed by hierarchical feature aggregation to model both fine-grained and long-range dependencies. The outputs of two branches interact via cross-modal attention, enabling complementary fusion of temporal and spectral cues. Finally, a Transformer encoder-decoder integrates both modalities to autoregressively predict future trajectories. Extensive experiments on ETH-UCY, SDD, NBA, and JRDB datasets demonstrate that our method achieves state-of-the-art performance with high efficiency.
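The multi-scale patch partitioning step has a simple core, sketched below: the same trajectory is segmented at several patch sizes so that small patches capture local motion detail and large patches capture long-range structure. The patch sizes are illustrative assumptions; the paper's adaptive embedding and attention stages are omitted.

```python
def partition(traj, patch_len):
    """Split a trajectory (a list of (x, y) points) into non-overlapping
    patches of length patch_len; the final patch keeps any remainder."""
    return [traj[i:i + patch_len] for i in range(0, len(traj), patch_len)]

def multi_scale_patches(traj, scales=(2, 4, 8)):
    """Segment one trajectory at several scales, yielding the hierarchical
    view that downstream scale-aware embeddings would consume."""
    return {s: partition(traj, s) for s in scales}
```

Each scale covers the full trajectory, so no observation is lost; only the granularity of the segmentation changes.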

Updated: 2025-07-28 04:52:18

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2507.19119v2

AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction

Machine learning interpretation (MLI) has primarily been leveraged to build clinician trust and uncover actionable insights in EHRs. However, the intrinsic complexity and heterogeneity of EHR data limit its effectiveness in guiding subgroup-specific modeling. We propose AdaptHetero, a novel MLI-driven framework that transforms interpretability insights into actionable guidance for tailoring model training and evaluation across subpopulations within individual hospital systems. Evaluated on three large-scale EHR datasets - GOSSIS-1-eICU, WiDS, and MIMIC-IV - AdaptHetero consistently identifies heterogeneous model behaviors in predicting ICU mortality, in-hospital death, and hidden hypoxemia. By integrating SHAP-based interpretation and unsupervised clustering, the framework enhances the identification of clinically meaningful subgroup-specific characteristics, leading to improved predictive performance.
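The interpretation-then-cluster recipe can be sketched in miniature: compute per-sample feature attributions, then cluster the attribution vectors to expose subgroups that the model treats differently. As stdlib stand-ins, a leave-one-feature-out attribution replaces SHAP and a fixed-centroid nearest-centroid assignment replaces full unsupervised clustering; both substitutions are assumptions of this sketch, as is the per-cluster retraining step, which is omitted.

```python
def attribution(model, x, baseline=0.0):
    """Leave-one-feature-out attribution: how much does replacing each
    feature with a baseline change the model's prediction?"""
    full = model(x)
    return [full - model(x[:i] + [baseline] + x[i + 1:]) for i in range(len(x))]

def assign_clusters(vectors, centroids):
    """Nearest-centroid assignment over attribution vectors; samples whose
    predictions are driven by similar features land in the same subgroup."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
            for v in vectors]
```

In the full framework, each resulting subgroup would get its own tailored training and evaluation split.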

Updated: 2025-07-28 04:37:03

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2507.21197v1

Neural Spectral Band Generation for Audio Coding

Spectral band replication (SBR) enables bit-efficient coding by generating high-frequency bands from the low-frequency ones. However, it only utilizes coarse spectral features upon a subband-wise signal replication, limiting adaptability to diverse acoustic signals. In this paper, we explore the efficacy of a deep neural network (DNN)-based generative approach for coding the high-frequency bands, which we call neural spectral band generation (n-SBG). Specifically, we propose a DNN-based encoder-decoder structure to extract and quantize the side information related to the high-frequency components and generate the components given both the side information and the decoded core-band signals. The whole coding pipeline is optimized with generative adversarial criteria to enable the generation of perceptually plausible sound. From experiments using AAC as the core codec, we show that the proposed method achieves a better perceptual quality than HE-AAC-v1 with much less side information.
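The classic SBR baseline that n-SBG improves on is compact enough to sketch: the encoder transmits per-subband energies of the high band as side information, and the decoder replicates the low band's spectral shape into the high band and rescales each subband to match the transmitted energy. Subband counts and the energy measure are illustrative; n-SBG replaces this replicate-and-scale step with a learned generator.

```python
import math

def subband_energy(band):
    """RMS energy of one subband (floored to avoid division by zero)."""
    return math.sqrt(sum(v * v for v in band) / len(band)) or 1e-12

def encode_side_info(high_band, n_subbands=4):
    """Encoder side: transmit only coarse per-subband energies."""
    w = len(high_band) // n_subbands
    return [subband_energy(high_band[i * w:(i + 1) * w]) for i in range(n_subbands)]

def decode_high_band(low_band, side_info):
    """Decoder side: replicate low-band structure, then scale each copied
    subband so its energy matches the transmitted side information."""
    w = len(low_band) // len(side_info)
    out = []
    for k, target in enumerate(side_info):
        sub = low_band[k * w:(k + 1) * w]
        scale = target / subband_energy(sub)
        out.extend(v * scale for v in sub)
    return out
```

The limitation the paper targets is visible here: the decoded high band can only ever be a rescaled copy of the low band, regardless of the signal's actual high-frequency content.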

Updated: 2025-07-28 04:36:04

Categories: eess.AS,cs.AI,eess.SP

Download: http://arxiv.org/abs/2506.06732v2

Efficient Proxy Raytracer for Optical Systems using Implicit Neural Representations

Ray tracing is a widely used technique for modeling optical systems, involving sequential surface-by-surface computations, which can be computationally intensive. We propose Ray2Ray, a novel method that leverages implicit neural representations to model optical systems with greater efficiency, eliminating the need for surface-by-surface computations in a single pass end-to-end model. Ray2Ray learns the mapping between rays emitted from a given source and their corresponding rays after passing through a given optical system in a physically accurate manner. We train Ray2Ray on nine off-the-shelf optical systems, achieving positional errors on the order of 1 μm and angular deviations on the order of 0.01 degrees in the estimated output rays. Our work highlights the potential of neural representations as a proxy for optical raytracers.
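The proxy's interface is simple to picture: a small network maps a 6-D input ray (origin xyz plus direction xyz) to the 6-D output ray leaving the optical system, replacing the surface-by-surface trace with one forward pass. The sketch below uses a tiny stdlib MLP with random placeholder weights; layer widths and the architecture are assumptions, and Ray2Ray would train such weights against a reference raytracer for each optical system.

```python
import math
import random

random.seed(0)  # deterministic placeholder weights

def dense(n_in, n_out):
    """One fully connected layer as a weight matrix (no bias, for brevity)."""
    return [[random.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
            for _ in range(n_out)]

def forward(layers, x):
    """Plain MLP forward pass: matrix-vector products with tanh on
    hidden layers and a linear output layer."""
    for i, w in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x

# 6-D ray in -> 6-D ray out, one pass instead of per-surface tracing.
proxy = [dense(6, 32), dense(32, 32), dense(32, 6)]
```

A trained version of this mapping is what lets the proxy amortize the cost of the sequential trace into a single network evaluation.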

Updated: 2025-07-28 04:31:12

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20513v1

Active Attack Resilience in 5G: A New Take on Authentication and Key Agreement

As 5G networks expand into critical infrastructure, secure and efficient user authentication is more important than ever. The 5G-AKA protocol, standardized by 3GPP in TS 33.501, is central to authentication in current 5G deployments. It provides mutual authentication, user privacy, and key secrecy. However, despite its adoption, 5G-AKA has known limitations in both security and performance. While it focuses on protecting privacy against passive attackers, recent studies have shown it to be vulnerable to active attacks. It also relies on a sequence number mechanism to prevent replay attacks, requiring perfect synchronization between the device and the core network. This stateful design adds complexity, causes desynchronization, and incurs extra communication overhead. More critically, 5G-AKA lacks Perfect Forward Secrecy (PFS), exposing past communications if long-term keys are compromised, an increasing concern amid sophisticated threats. This paper proposes an enhanced authentication protocol that builds on 5G-AKA's design while addressing its shortcomings. First, we introduce a stateless version that removes sequence number reliance, reducing complexity while staying compatible with existing SIM cards and infrastructure. We then extend this design to add PFS with minimal cryptographic overhead. Both protocols are rigorously analyzed using ProVerif, confirming their compliance with all major security requirements, including resistance to passive and active attacks, as well as those defined by 3GPP and academic studies. We also prototype both protocols and evaluate their performance against 5G-AKA and 5G-AKA' (USENIX'21). Our results show the proposed protocols offer stronger security with only minor computational overhead, making them practical, future-ready solutions for 5G and beyond.

Updated: 2025-07-28 04:24:08


Categories: cs.CR,cs.NI,68M25,C.2.2

Download: http://arxiv.org/abs/2507.17491v2
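The stateless idea can be illustrated with a toy challenge-response exchange using Python's standard hmac module. This is a simplified sketch for intuition only, not the proposed protocol or 5G-AKA itself:

```python
import hmac, hashlib, secrets

# Toy sketch of the stateless idea: instead of a synchronized sequence
# number, the network includes a fresh random challenge in each run, so
# neither side tracks per-subscriber state for replay protection.

K = secrets.token_bytes(32)          # long-term key shared by SIM and core

def network_challenge():
    return secrets.token_bytes(16)   # fresh nonce, no stored counter

def sim_response(k, challenge):
    return hmac.new(k, b"auth" + challenge, hashlib.sha256).digest()

def network_verify(k, challenge, response):
    expected = hmac.new(k, b"auth" + challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

c = network_challenge()
r = sim_response(K, c)
ok = network_verify(K, c, r)
replayed = network_verify(K, network_challenge(), r)  # old response, new run
print(ok, replayed)  # True False
```

Because the challenge is fresh per run, replaying an old response fails without either side keeping a counter, which is the property the sequence-number mechanism was providing statefully.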

LLMs-guided adaptive compensator: Bringing Adaptivity to Automatic Control Systems with Large Language Models

With rapid advances in code generation, reasoning, and problem-solving, Large Language Models (LLMs) are increasingly applied in robotics. Most existing work focuses on high-level tasks such as task decomposition. A few studies have explored the use of LLMs in feedback controller design; however, these efforts are restricted to overly simplified systems and fixed-structure gain tuning, and lack real-world validation. To further investigate LLMs in automatic control, this work targets a key subfield: adaptive control. Inspired by the framework of model reference adaptive control (MRAC), we propose an LLM-guided adaptive compensator framework that avoids designing controllers from scratch. Instead, the LLMs are prompted using the discrepancies between an unknown system and a reference system to design a compensator that aligns the response of the unknown system with that of the reference, thereby achieving adaptivity. Experiments evaluate five methods: LLM-guided adaptive compensator, LLM-guided adaptive controller, indirect adaptive control, learning-based adaptive control, and MRAC, on soft and humanoid robots in both simulated and real-world environments. Results show that the LLM-guided adaptive compensator outperforms traditional adaptive controllers and significantly reduces reasoning complexity compared to the LLM-guided adaptive controller. The Lyapunov-based analysis and reasoning-path inspection demonstrate that the LLM-guided adaptive compensator enables a more structured design process by transforming mathematical derivation into a reasoning task, while exhibiting strong generalizability, adaptability, and robustness. This study opens a new direction for applying LLMs in the field of automatic control, offering greater deployability and practicality compared to vision-language models.

Updated: 2025-07-28 04:12:43


Categories: cs.RO,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2507.20509v1
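The MRAC framing the abstract builds on can be illustrated with the classical MIT-rule gain adaptation (no LLM involved); the plant and reference gains below are made up for the sketch:

```python
import numpy as np

# Classical MRAC flavor of the idea: adapt a gain so the unknown plant's
# response tracks a reference model's response. The plant gain k_p is
# unknown to the controller; the MIT rule pushes theta toward k_m / k_p.

k_p, k_m = 2.0, 1.0        # unknown plant gain, reference-model gain
gamma, dt = 0.5, 0.01      # adaptation rate, Euler step
r = 1.0                    # constant reference input
theta = 0.0                # adaptive feedforward gain, starts untuned

errors = []
for _ in range(2000):
    y = k_p * theta * r            # plant output under u = theta * r
    y_m = k_m * r                  # reference-model output
    e = y - y_m
    theta -= gamma * e * y_m * dt  # MIT rule: d(theta)/dt = -gamma * e * y_m
    errors.append(abs(e))

print(round(theta, 3))  # 0.5, i.e. k_m / k_p
```

The LLM-guided compensator replaces this hand-derived adaptation law with a compensator designed from the observed plant-versus-reference discrepancy, but the target behavior is the same: the unknown system's response is pulled toward the reference model's.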

Contrastive learning-based agent modeling for deep reinforcement learning

Multi-agent systems often require agents to collaborate with or compete against other agents with diverse goals, behaviors, or strategies. Agent modeling is essential when designing adaptive policies for intelligent machine agents in multiagent systems, as this is the means by which the ego agent understands other agents' behavior and extracts their meaningful policy representations. These representations can be used to enhance the ego agent's adaptive policy, which is trained by reinforcement learning. However, existing agent modeling approaches typically assume the availability of local observations from other agents (modeled agents) during training, or a long observation trajectory for policy adaptation. To remove these restrictive assumptions and improve agent modeling performance, we devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution. With these observations, CLAM is capable of generating consistent high-quality policy representations in real-time right from the beginning of each episode. We evaluated the efficacy of our approach in both cooperative and competitive multi-agent environments. Our experiments demonstrate that our approach achieves state-of-the-art performance on both cooperative and competitive tasks, highlighting the potential of contrastive learning-based agent modeling for enhancing reinforcement learning.

Updated: 2025-07-28 04:12:02


Categories: cs.MA,cs.AI

Download: http://arxiv.org/abs/2401.00132v3
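A minimal numpy sketch of the contrastive objective such agent-modeling methods build on, with toy embeddings standing in for the ego agent's learned representations (the exact loss CLAM uses may differ):

```python
import numpy as np

# InfoNCE-style contrastive loss: each anchor embedding should score
# highest against its own positive (e.g. an embedding of the same
# episode/policy), relative to the other samples in the batch.

rng = np.random.default_rng(0)

def info_nce(anchors, positives, tau=0.1):
    """anchors, positives: (N, d) L2-normalized embeddings; i-th rows match."""
    logits = anchors @ positives.T / tau         # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

z = l2norm(rng.normal(size=(8, 16)))
loss_matched = info_nce(z, z + 0.0)                  # perfect positives
loss_shuffled = info_nce(z, np.roll(z, 1, axis=0))   # mismatched positives
print(bool(loss_matched < loss_shuffled))  # True
```

The loss drops as each representation becomes more similar to its positive and less similar to the rest, which is what pushes trajectories of the same modeled policy toward a consistent embedding.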

Tensor Completion with Nearly Linear Samples Given Weak Side Information

Tensor completion exhibits an interesting computational-statistical gap in terms of the number of samples needed to perform tensor estimation. While there are only $\Theta(tn)$ degrees of freedom in a $t$-order tensor with $n^t$ entries, the best known polynomial time algorithm requires $O(n^{t/2})$ samples in order to guarantee consistent estimation. In this paper, we show that weak side information is sufficient to reduce the sample complexity to $O(n)$. The side information consists of a weight vector for each of the modes which is not orthogonal to any of the latent factors along that mode; this is significantly weaker than assuming noisy knowledge of the subspaces. We provide an algorithm that utilizes this side information to produce a consistent estimator with $O(n^{1+\kappa})$ samples for any small constant $\kappa > 0$. We also provide experiments on both synthetic and real-world datasets that validate our theoretical insights.

Updated: 2025-07-28 04:10:11


Categories: stat.ML,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2007.00736v4
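A toy numpy illustration of why such weight vectors help, using a fully observed rank-1 tensor for clarity (the actual algorithm works from O(n^{1+kappa}) samples, not the full tensor):

```python
import numpy as np

# For a rank-1 tensor T = a (x) b (x) c, contracting modes 2 and 3 with
# weight vectors w2, w3 that are not orthogonal to b and c returns the
# latent factor a up to a nonzero scale: v_i = a_i * (b.w2) * (c.w3).

rng = np.random.default_rng(0)
n = 5
a, b, c = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
T = np.einsum("i,j,k->ijk", a, b, c)

w2, w3 = rng.normal(size=n), rng.normal(size=n)  # generic => not orthogonal
v = np.einsum("ijk,j,k->i", T, w2, w3)           # weighted mode contractions

scale = (b @ w2) * (c @ w3)
print(bool(np.allclose(v, scale * a)))  # True
```

This is why the side information is "weak": any generic weight vector works, since a random vector is almost surely non-orthogonal to the latent factors, which is far less than knowing the subspaces themselves.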

Attributed Graph Clustering with Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning

This study introduces the Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning (MPCCL) model, a novel approach for attributed graph clustering that effectively bridges critical gaps in existing methods, including long-range dependency, feature collapse, and information loss. Traditional methods often struggle to capture high-order graph features due to their reliance on low-order attribute information, while contrastive learning techniques face limitations in feature diversity by overemphasizing local neighborhood structures. Similarly, conventional graph coarsening methods, though reducing graph scale, frequently lose fine-grained structural details. MPCCL addresses these challenges through an innovative multi-scale coarsening strategy, which progressively condenses the graph while prioritizing the merging of key edges based on global node similarity to preserve essential structural information. It further introduces a one-to-many contrastive learning paradigm, integrating node embeddings with augmented graph views and cluster centroids to enhance feature diversity, while mitigating feature masking issues caused by the accumulation of high-frequency node weights during multi-scale coarsening. By incorporating a graph reconstruction loss and KL divergence into its self-supervised learning framework, MPCCL ensures cross-scale consistency of node representations. Experimental evaluations reveal that MPCCL achieves a significant improvement in clustering performance, including a remarkable 15.24% increase in NMI on the ACM dataset and robust gains on smaller-scale datasets such as Citeseer, Cora, and DBLP.

Updated: 2025-07-28 03:59:57


Categories: cs.LG

Download: http://arxiv.org/abs/2507.20505v1
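One coarsening step can be sketched as follows; the 4-node graph and its features are hypothetical, and the full MPCCL pipeline repeats this across scales with its contrastive objective on top:

```python
import numpy as np

# One pairwise-coarsening step in miniature: score candidate edges by the
# cosine similarity of their endpoint features (a global criterion), merge
# the best-scoring pair into a super-node, and average its features.

X = np.array([[1.0, 0.0],
              [0.9, 0.1],   # node 1 is nearly parallel to node 0
              [0.0, 1.0],
              [0.2, 0.8]])
edges = [(0, 1), (1, 2), (2, 3)]

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

best = max(edges, key=lambda e: cos(X[e[0]], X[e[1]]))
i, j = best
merged = (X[i] + X[j]) / 2.0
X_coarse = np.vstack([np.delete(X, [i, j], axis=0), merged])

print(best, X_coarse.shape)  # (0, 1) (3, 2)
```

Repeating this merge shrinks the graph scale while keeping the most similar (and thus least informative to distinguish) node pairs collapsed first, which is the sense in which key structure is preserved.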

Prover Agent: An Agent-based Framework for Formal Mathematical Proofs

We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and feedback from Lean while also generating auxiliary lemmas to assist in discovering the overall proof strategy. It achieves an 86.1% success rate on the MiniF2F benchmark, establishing a new state-of-the-art among methods using small language models (SLMs) with a much lower sample budget than previous approaches. We also present case studies illustrating how these generated lemmas contribute to solving challenging problems.

Updated: 2025-07-28 03:59:56


Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2506.19923v2
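The coordination loop can be sketched with stubbed components; every function below is a hypothetical stand-in for illustration, not the Prover Agent API:

```python
# Stubbed sketch of the loop the abstract describes: an informal-reasoning
# LLM proposes auxiliary lemmas, a formal prover model attempts a Lean
# proof, and Lean's feedback decides whether to keep iterating.

def propose_lemmas(goal):
    """Stand-in for the informal-reasoning LLM."""
    return [f"lemma_aux_{i} for {goal}" for i in range(2)]

def try_formal_proof(goal, lemmas):
    """Stand-in for the formal prover model: succeeds once a lemma exists."""
    return f"proof of {goal} using {lemmas[0]}" if lemmas else None

def lean_accepts(candidate):
    """Stand-in for checking the candidate with Lean."""
    return candidate is not None

def prover_agent(goal, max_rounds=3):
    lemmas = []
    for _ in range(max_rounds):
        candidate = try_formal_proof(goal, lemmas)
        if lean_accepts(candidate):
            return candidate
        lemmas.extend(propose_lemmas(goal))   # enrich context and retry
    return None

result = prover_agent("miniF2F_problem_42")
print(result is not None)  # True
```

The point of the structure is that the formal prover never starts cold: each failed round enriches its context with lemmas, which is how the generated lemmas contribute to discovering the overall proof strategy.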

REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models

Reinforcement Learning from Human Feedback (RLHF) plays a crucial role in aligning large language models (LLMs) with human values and preferences. While state-of-the-art applications like ChatGPT or GPT-4 commonly employ Proximal Policy Optimization (PPO), the inclusion of a critic network introduces significant computational overhead. REINFORCE-based methods, such as REINFORCE Leave One-Out (RLOO), ReMax, and Group Relative Policy Optimization (GRPO), address this limitation by eliminating the critic network. However, these approaches face challenges in accurate advantage estimation. Specifically, they estimate advantages independently for the responses to each prompt, which can lead to overfitting on simpler prompts, vulnerability to reward hacking, and biased estimates. To address these challenges, we introduce REINFORCE++, a novel approach that removes the critic model and instead uses unbiased global advantage normalization to improve training stability. Our empirical evaluation demonstrates that REINFORCE++ exhibits robust performance across various reward models without requiring prompt-set truncation. Furthermore, it achieves superior generalization in both RLHF and long chain-of-thought (CoT) settings compared to existing REINFORCE-based methods. The implementation is available at https://github.com/OpenRLHF/OpenRLHF.

Updated: 2025-07-28 03:53:55


Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2501.03262v7
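The difference between per-group and global advantage normalization is easy to see on toy rewards; the prompts and reward values below are made up:

```python
import numpy as np

# Per-group (GRPO-style) normalization recenters every prompt's responses
# to zero, so an easy prompt whose responses are all good loses that
# signal; global (REINFORCE++-style) normalization uses the batch-wide
# mean and std, keeping every easy-prompt response at a positive advantage.

rewards = {
    "easy prompt": np.array([0.9, 1.0, 0.95]),
    "hard prompt": np.array([0.1, 0.3, 0.0]),
}

all_r = np.concatenate(list(rewards.values()))
g_mean, g_std = all_r.mean(), all_r.std() + 1e-8

global_adv = {k: (v - g_mean) / g_std for k, v in rewards.items()}
group_adv = {k: (v - v.mean()) / (v.std() + 1e-8) for k, v in rewards.items()}

print(bool(np.all(global_adv["easy prompt"] > 0)))  # True
```

Per-group normalization would drive the mean advantage of the easy prompt to zero regardless of how good its responses are, which is one source of the overfitting-on-easy-prompts behavior the abstract describes.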

Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Machine Unlearning has recently garnered significant attention, aiming to selectively remove knowledge associated with specific data while preserving the model's performance on the remaining data. A fundamental challenge in this process is balancing effective unlearning with knowledge retention, as naive optimization of these competing objectives can lead to conflicting gradients, hindering convergence and degrading overall performance. To address this issue, we propose Learning to Unlearn while Retaining, aimed at mitigating gradient conflicts between unlearning and retention objectives. Our approach strategically avoids conflicts through an implicit gradient regularization mechanism that emerges naturally within the proposed framework. This prevents conflicting gradients between unlearning and retention, leading to effective unlearning while preserving the model's utility. We validate our approach across both discriminative and generative tasks, demonstrating its effectiveness in achieving unlearning without compromising performance on remaining data. Our results highlight the advantages of avoiding such gradient conflicts, outperforming existing methods that fail to account for these interactions.

Updated: 2025-07-28 03:53:31


Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2503.06339v2
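The gradient conflict itself is easy to demonstrate. The sketch below resolves it with an explicit PCGrad-style projection for illustration; the paper's mechanism is an implicit regularizer, not this projection:

```python
import numpy as np

# If the unlearning gradient opposes the retention gradient (negative dot
# product), drop the component of the unlearning gradient that points
# against retention, so the update no longer degrades retained knowledge.

def deconflict(g_unlearn, g_retain):
    dot = g_unlearn @ g_retain
    if dot < 0:  # conflicting directions
        g_unlearn = g_unlearn - (dot / (g_retain @ g_retain)) * g_retain
    return g_unlearn

g_u = np.array([1.0, -2.0])   # hypothetical unlearning gradient
g_r = np.array([1.0, 1.0])    # hypothetical retention gradient
g_fixed = deconflict(g_u, g_r)

print(g_u @ g_r < 0, g_fixed @ g_r >= 0)  # True True
```

Before projection the two objectives pull in opposing directions; afterwards the unlearning step is at worst neutral for retention, which is the property the implicit regularization is designed to induce.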

Customize Multi-modal RAI Guardrails with Precedent-based predictions

A multi-modal guardrail must effectively filter image content based on user-defined policies, identifying material that may be hateful, reinforce harmful stereotypes, contain explicit material, or spread misinformation. Deploying such guardrails in real-world applications, however, poses significant challenges. Users often require varied and highly customizable policies and typically cannot provide abundant examples for each custom policy. Consequently, an ideal guardrail should be scalable to multiple policies and adaptable to evolving user standards with minimal retraining. Existing fine-tuning methods typically condition predictions on pre-defined policies, restricting their generalizability to new policies or necessitating extensive retraining to adapt. Conversely, training-free methods struggle with limited context lengths, making it difficult to incorporate all the policies comprehensively. To overcome these limitations, we propose to condition the model's judgment on "precedents", which are the reasoning processes of prior data points similar to the given input. By leveraging precedents instead of fixed policies, our approach greatly enhances the flexibility and adaptability of the guardrail. In this paper, we introduce a critique-revise mechanism for collecting high-quality precedents and two strategies that utilize precedents for robust prediction. Experimental results demonstrate that our approach outperforms previous methods across both few-shot and full-dataset scenarios and exhibits superior generalization to novel policies.

Updated: 2025-07-28 03:45:34


Categories: cs.LG,cs.CL,cs.CY

Download: http://arxiv.org/abs/2507.20503v1
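The retrieval step at the heart of precedent-based prediction can be sketched as a simple embedding lookup: embed the incoming image-policy pair, fetch the most similar prior cases, and hand their stored reasoning to the judge model. The sketch below is a hypothetical illustration, not the paper's implementation; cosine similarity and all variable names are assumptions.

```python
import numpy as np

def retrieve_precedents(query_emb, precedent_embs, k=3):
    """Return indices of the k precedents most similar to the query.

    Similarity is cosine; each row of `precedent_embs` is the embedding
    of one stored precedent (input plus its saved reasoning process).
    """
    q = query_emb / np.linalg.norm(query_emb)
    P = precedent_embs / np.linalg.norm(precedent_embs, axis=1, keepdims=True)
    sims = P @ q
    return np.argsort(-sims)[:k]

# Toy precedent bank: four precedents in a 3-d embedding space.
bank = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0])
idx = retrieve_precedents(query, bank, k=2)  # nearest two precedents
```

The reasoning texts stored at `idx` would then be placed in the judge model's context in place of the full policy list, which is what keeps the context length bounded as policies accumulate.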

VDGraph: A Graph-Theoretic Approach to Unlock Insights from SBOM and SCA Data

The high complexity of modern software supply chains necessitates tools such as Software Bills of Materials (SBOMs) to manage component dependencies, and Software Composition Analysis (SCA) tools to identify vulnerabilities. While limited integration between SBOMs and SCA tools exists, a unified view of complex dependency-vulnerability relationships remains elusive. In this paper, we introduce VDGraph, a novel knowledge graph-based methodology for integrating vulnerability and dependency data into a holistic view. VDGraph consolidates SBOM and SCA outputs into a graph representation of software projects' dependencies and vulnerabilities. We provide a formal description and analysis of the theoretical properties of VDGraph and present solutions to manage possible conflicts between the SBOM and SCA data. We further introduce and evaluate a practical, proof-of-concept implementation of VDGraph using two popular SBOM and SCA tools, namely the CycloneDX Maven plugin and Google's OSV-Scanner. We apply VDGraph to 21 popular Java projects. Through the formulation of appropriate queries on the graphs, we uncover the existence of concentrated risk points (i.e., vulnerable components of high severity reachable through numerous dependency paths). We further show that vulnerabilities predominantly emerge at a depth of three dependency levels or higher, indicating that direct or secondary dependencies exhibit lower vulnerability density and tend to be more secure. Thus, VDGraph contributes a graph-theoretic methodology that improves visibility into how vulnerabilities propagate through complex, transitive dependencies. Moreover, our implementation, which combines open SBOM and SCA standards with Neo4j, lays a foundation for scalable and automated analysis across real-world projects.

Updated: 2025-07-28 03:43:29

Categories: cs.SE,cs.CR

Download: http://arxiv.org/abs/2507.20502v1
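Two of the queries described above, counting the dependency paths into a vulnerable component and measuring its depth below the project root, reduce to elementary graph traversals. A toy pure-Python sketch follows; the actual implementation runs Cypher queries over Neo4j, and the dependency data here is invented for illustration.

```python
from collections import deque

def count_paths(graph, src, dst):
    """Count distinct dependency paths from src to dst in a DAG (DFS)."""
    if src == dst:
        return 1
    return sum(count_paths(graph, nxt, dst) for nxt in graph.get(src, []))

def depth(graph, root, node):
    """Shortest dependency depth of `node` below `root` (root = depth 0), via BFS."""
    seen, q = {root}, deque([(root, 0)])
    while q:
        cur, d = q.popleft()
        if cur == node:
            return d
        for nxt in graph.get(cur, []):
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, d + 1))
    return None

# Hypothetical project whose transitive dependency 'libZ' carries a
# high-severity CVE reachable along two distinct paths.
deps = {
    "app": ["libA", "libB"],
    "libA": ["libC"],
    "libB": ["libC"],
    "libC": ["libZ"],
}
n_paths = count_paths(deps, "app", "libZ")   # 2 paths: a concentrated risk point
vuln_depth = depth(deps, "app", "libZ")      # depth 3: a transitive dependency
```

Components with many incoming paths and high severity are the "concentrated risk points" the paper identifies; the depth query supports the observation that vulnerabilities cluster at level three or deeper.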

VGS-ATD: Robust Distributed Learning for Multi-Label Medical Image Classification Under Heterogeneous and Imbalanced Conditions

In recent years, advanced deep learning architectures have shown strong performance in medical imaging tasks. However, the traditional centralized learning paradigm poses serious privacy risks, as all data is collected and trained on a single server. To mitigate this challenge, decentralized approaches such as federated learning and swarm learning have emerged, allowing model training on local nodes while sharing only model weights. While these methods enhance privacy, they struggle with heterogeneous and imbalanced data and suffer from inefficiencies due to frequent communication and the aggregation of weights. More critically, the dynamic and complex nature of clinical environments demands scalable AI systems capable of continuously learning from diverse modalities and multiple labels. Yet, both centralized and decentralized models are prone to catastrophic forgetting during system expansion, often requiring full model retraining to incorporate new data. To address these limitations, we propose VGS-ATD, a novel distributed learning framework. To validate VGS-ATD, we evaluated it in experiments spanning 30 datasets and 80 independent labels across distributed nodes, where it achieved an overall accuracy of 92.7%, outperforming centralized learning (84.9%) and swarm learning (72.99%), while federated learning failed under these conditions due to its high computational resource requirements. VGS-ATD also demonstrated strong scalability, with only a 1% drop in accuracy on existing nodes after expansion, compared to a 20% drop in centralized learning, highlighting its resilience to catastrophic forgetting. Additionally, it reduced computational costs by up to 50% relative to both centralized and swarm learning, confirming its superior efficiency and scalability.

Updated: 2025-07-28 03:40:05

Categories: cs.CV,cs.CR

Download: http://arxiv.org/abs/2507.18657v2

DmC: Nearest Neighbor Guidance Diffusion Model for Offline Cross-domain Reinforcement Learning

Cross-domain offline reinforcement learning (RL) seeks to enhance sample efficiency in offline RL by utilizing additional offline source datasets. A key challenge is to identify and utilize source samples that are most relevant to the target domain. Existing approaches address this challenge by measuring domain gaps through domain classifiers, target transition dynamics modeling, or mutual information estimation using contrastive loss. However, these methods often require large target datasets, which is impractical in many real-world scenarios. In this work, we address cross-domain offline RL under a limited target data setting, identifying two primary challenges: (1) Dataset imbalance, which is caused by large source and small target datasets and leads to overfitting in neural network-based domain gap estimators, resulting in uninformative measurements; and (2) Partial domain overlap, where only a subset of the source data is closely aligned with the target domain. To overcome these issues, we propose DmC, a novel framework for cross-domain offline RL with limited target samples. Specifically, DmC utilizes $k$-nearest neighbor ($k$-NN) based estimation to measure domain proximity without neural network training, effectively mitigating overfitting. Then, by utilizing this domain proximity, we introduce a nearest-neighbor-guided diffusion model to generate additional source samples that are better aligned with the target domain, thus enhancing policy learning with more effective source samples. Through theoretical analysis and extensive experiments in diverse MuJoCo environments, we demonstrate that DmC significantly outperforms state-of-the-art cross-domain offline RL methods, achieving substantial performance gains.

Updated: 2025-07-28 03:34:15

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2507.20499v1
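The $k$-NN proximity measure lends itself to a compact sketch: score each source sample by its distance to the $k$-th nearest target sample, with no learned estimator involved and hence nothing to overfit. Plain Euclidean distance and the variable names below are assumptions for illustration; the paper applies the idea to transition tuples.

```python
import numpy as np

def knn_domain_proximity(source, target, k=1):
    """Distance from each source sample to its k-th nearest target sample.

    Smaller values mean the source sample lies closer to the target
    domain. No neural network is trained, so small target datasets do
    not cause the overfitting seen with learned domain-gap estimators.
    """
    # Pairwise Euclidean distances, shape (n_source, n_target).
    d = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k - 1]

# Toy data: one source sample inside the target cluster, one far away.
target = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
source = np.array([[0.05, 0.05],   # well inside the target cluster
                   [5.0, 5.0]])    # far from the target domain
prox = knn_domain_proximity(source, target, k=1)
```

These proximity scores are what the paper then feeds into the nearest-neighbor-guided diffusion model, so that generation is biased toward source samples aligned with the target domain.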

Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning

Knowledge Graph (KG) reasoning, which aims to infer new facts from structured knowledge repositories, plays a vital role in Natural Language Processing (NLP) systems. Its effectiveness critically depends on constructing informative and contextually relevant reasoning paths. However, existing graph neural networks (GNNs) often adopt rigid, query-agnostic path-exploration strategies, limiting their ability to adapt to diverse linguistic contexts and semantic nuances. To address these limitations, we propose \textbf{MoKGR}, a mixture-of-experts framework that personalizes path exploration through two complementary components: (1) a mixture of length experts that adaptively selects and weights candidate path lengths according to query complexity, providing query-specific reasoning depth; and (2) a mixture of pruning experts that evaluates candidate paths from a complementary perspective, retaining the most informative paths for each query. Through comprehensive experiments on diverse benchmarks, MoKGR demonstrates superior performance in both transductive and inductive settings, validating the effectiveness of personalized path exploration in KG reasoning.

Updated: 2025-07-28 03:30:28

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20498v1
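The "mixture of length experts" idea, a gate that weights candidate path lengths per query, can be illustrated with a minimal gating sketch. The parameterization below (one linear map plus a softmax over lengths) is our own simplification; MoKGR's actual experts and gating are more elaborate.

```python
import numpy as np

def length_gate(query_emb, W, lengths, top_m=2):
    """Score candidate path lengths for a query and keep the top-m.

    `W` maps the query embedding to one logit per candidate length
    (a linear gate); the softmax weights express how much reasoning
    depth this particular query is judged to need.
    """
    logits = W @ query_emb                      # one logit per length
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                    # softmax over lengths
    keep = np.argsort(-weights)[:top_m]         # query-specific depths
    return [lengths[i] for i in keep], weights[keep]

rng = np.random.default_rng(0)
lengths = [1, 2, 3, 4]            # candidate reasoning-path lengths
W = rng.normal(size=(len(lengths), 8))
q = rng.normal(size=8)            # embedding of one query
chosen, w = length_gate(q, W, lengths, top_m=2)
```

A pruning expert would act analogously but score individual candidate paths rather than lengths, keeping only the most informative paths per query.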

The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?

This paper introduces the Shepherd Test, a new conceptual test for assessing the moral and relational dimensions of superintelligent artificial agents. The test is inspired by human interactions with animals, where ethical considerations about care, manipulation, and consumption arise in contexts of asymmetric power and self-preservation. We argue that AI crosses an important, and potentially dangerous, threshold of intelligence when it exhibits the ability to manipulate, nurture, and instrumentally use less intelligent agents, while also managing its own survival and expansion goals. This includes the ability to weigh moral trade-offs between self-interest and the well-being of subordinate agents. The Shepherd Test thus challenges traditional AI evaluation paradigms by emphasizing moral agency, hierarchical behavior, and complex decision-making under existential stakes. We argue that this shift is critical for advancing AI governance, particularly as AI systems become increasingly integrated into multi-agent environments. We conclude by identifying key research directions, including the development of simulation environments for testing moral behavior in AI, and the formalization of ethical manipulation within multi-agent systems.

Updated: 2025-07-28 03:25:55

Categories: cs.AI

Download: http://arxiv.org/abs/2506.01813v3

Classification of high-dimensional data with spiked covariance matrix structure

We study the classification problem for high-dimensional data with $n$ observations on $p$ features where the $p \times p$ covariance matrix $\Sigma$ exhibits a spiked eigenvalues structure and the vector $\zeta$, given by the difference between the whitened mean vectors, is sparse with sparsity at most $s$. We propose an adaptive classifier (adaptive with respect to the sparsity $s$) that first performs dimension reduction on the feature vectors prior to classification in the dimensionally reduced space, i.e., the classifier whitens the data, then screens the features by keeping only those corresponding to the $s$ largest coordinates of $\zeta$, and finally applies Fisher's linear discriminant to the selected features. Leveraging recent results on entrywise matrix perturbation bounds for covariance matrices, we show that the resulting classifier is Bayes optimal whenever $n \rightarrow \infty$ and $s \sqrt{n^{-1} \ln p} \rightarrow 0$. Experimental results on real and synthetic data sets indicate that the proposed classifier is competitive with existing state-of-the-art methods while also selecting a smaller number of features.

Updated: 2025-07-28 03:21:30

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2110.01950v2
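The three-step recipe (whiten, screen the $s$ largest coordinates of $\zeta$, apply Fisher's discriminant) is concrete enough to sketch directly. The sketch substitutes the plain sample covariance for the paper's spiked-covariance estimator, so it illustrates only the pipeline, not the theory.

```python
import numpy as np

def screened_fisher(X0, X1, s):
    """Whiten, keep the s largest coordinates of the whitened mean
    difference zeta, and return a linear classifier on those features.

    Simplification: the sample covariance stands in for the paper's
    estimator of the spiked covariance structure.
    """
    X = np.vstack([X0, X1])
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # whitening: Sigma^{-1/2}
    zeta = W @ (X1.mean(0) - X0.mean(0))        # whitened mean difference
    keep = np.argsort(-np.abs(zeta))[:s]        # screen: top-s coordinates
    mid = W @ (X0.mean(0) + X1.mean(0)) / 2.0   # midpoint between classes
    def predict(x):
        z = W @ x
        return int((z - mid)[keep] @ zeta[keep] > 0)  # Fisher direction
    return predict, keep

rng = np.random.default_rng(1)
p, s = 10, 2
mu = np.zeros(p); mu[:2] = 3.0                  # signal in 2 of 10 coordinates
X0 = rng.normal(size=(200, p))
X1 = rng.normal(size=(200, p)) + mu
predict, keep = screened_fisher(X0, X1, s)
acc = np.mean([predict(x) for x in X1] + [1 - predict(x) for x in X0])
```

On this toy problem the screening step recovers exactly the two signal coordinates, which is the behavior the adaptivity in $s$ is designed to deliver.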

Position: Untrained Machine Learning for Anomaly Detection by using 3D Point Cloud Data

Anomaly detection based on 3D point cloud data is an important research problem that has received increasing attention in recent years. Untrained anomaly detection based on only one sample is an emerging research problem motivated by real manufacturing industries such as personalized manufacturing, where only one sample can be collected without any additional labels or historical datasets. Identifying anomalies accurately based on one 3D point cloud sample is a critical challenge in both industrial applications and the field of machine learning. This paper aims to provide a formal definition of the untrained anomaly detection problem based on 3D point cloud data and to discuss the differences between untrained anomaly detection and current unsupervised anomaly detection problems. Unlike trained unsupervised learning, untrained unsupervised learning does not rely on any data, including unlabeled data. Instead, it leverages prior knowledge about the surfaces and anomalies. We propose three complementary methodological frameworks: the Latent Variable Inference Framework, which employs probabilistic modeling to distinguish anomalies; the Decomposition Framework, which separates point clouds into reference, anomaly, and noise components through sparse learning; and the Local Geometry Framework, which leverages neighborhood information for anomaly identification. Experimental results demonstrate that untrained methods achieve competitive detection performance while offering significant computational advantages, demonstrating up to a 15-fold increase in execution speed. The proposed methods provide viable solutions for scenarios with extreme data scarcity, addressing critical challenges in personalized manufacturing and healthcare applications where collecting multiple samples or historical data is infeasible.

Updated: 2025-07-28 03:20:37

Categories: cs.LG

Download: http://arxiv.org/abs/2502.03876v3
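The local geometry idea can be illustrated with the simplest possible neighborhood statistic: score each point by its mean distance to its $k$ nearest neighbors, so points lying off the sampled surface stand out. This is a deliberately naive stand-in of our own, not the paper's Local Geometry Framework.

```python
import numpy as np

def local_geometry_scores(points, k=5):
    """Anomaly score per point: mean distance to its k nearest neighbors.

    Points off the locally smooth surface have sparser neighborhoods
    and therefore larger scores. O(n^2) memory; fine for small clouds.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distance
    knn = np.sort(d, axis=1)[:, :k]
    return knn.mean(axis=1)

# A flat 10x10 grid (the "surface") plus one point lifted off the plane.
g = np.linspace(0, 1, 10)
xx, yy = np.meshgrid(g, g)
surface = np.stack([xx.ravel(), yy.ravel(), np.zeros(100)], axis=1)
cloud = np.vstack([surface, [[0.5, 0.5, 0.8]]])  # index 100 is the anomaly
scores = local_geometry_scores(cloud, k=5)       # index 100 scores highest
```

Note the single-sample setting: the score uses only the geometry of the one given point cloud, with no training data, which is exactly the constraint the untrained problem definition imposes.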

A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization

Random reshuffling techniques are prevalent in large-scale applications, such as training neural networks. While the convergence and acceleration effects of random reshuffling-type methods are fairly well understood in the smooth setting, far fewer studies are available for the nonsmooth case. In this work, we design a new normal map-based proximal random reshuffling (norm-PRR) method for nonsmooth nonconvex finite-sum problems. We show that norm-PRR achieves the iteration complexity ${\cal O}(n^{-1/3}T^{-2/3})$, where $n$ denotes the number of component functions $f(\cdot,i)$ and $T$ counts the total number of iterations. This improves the currently known complexity bounds for this class of problems by a factor of $n^{-1/3}$ in terms of the number of gradient evaluations. Additionally, we prove that norm-PRR converges linearly under the (global) Polyak-{\L}ojasiewicz condition and in the interpolation setting. We further complement these non-asymptotic results and provide an in-depth analysis of the asymptotic properties of norm-PRR. Specifically, under the (local) Kurdyka-{\L}ojasiewicz inequality, the whole sequence of iterates generated by norm-PRR is shown to converge to a single stationary point. Moreover, we derive last-iterate convergence rates that can match those in the smooth, strongly convex setting. Finally, numerical experiments are performed on nonconvex classification tasks to illustrate the efficiency of the proposed approach.

Updated: 2025-07-28 03:15:32

Categories: math.OC,cs.LG,90C26, 90C15

Download: http://arxiv.org/abs/2312.01047v3
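The random-reshuffling structure is easy to see on a small nonsmooth composite problem such as $\ell_1$-regularized least squares: each epoch visits every component function once in a fresh random order, taking a proximal gradient step per component. For simplicity the sketch applies the prox (soft-thresholding) after every step, i.e. plain proximal random reshuffling, rather than routing it through the normal map that defines norm-PRR.

```python
import numpy as np

def soft_threshold(x, t):
    """Prox of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_random_reshuffle(A, b, lam=0.1, lr=0.01, epochs=50, seed=0):
    """Minimize (1/n) sum_i (a_i^T x - b_i)^2 + lam * ||x||_1.

    Each epoch permutes the n components (random reshuffling) and
    applies, per component, a gradient step followed by the l1 prox.
    Simplified stand-in for norm-PRR.
    """
    rng = np.random.default_rng(seed)
    n, p = A.shape
    x = np.zeros(p)
    for _ in range(epochs):
        for i in rng.permutation(n):            # fresh order every epoch
            grad = 2.0 * (A[i] @ x - b[i]) * A[i]
            x = soft_threshold(x - lr * grad, lr * lam)
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))
x_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])   # sparse ground truth
b = A @ x_true                                   # interpolation setting
x_hat = prox_random_reshuffle(A, b)
```

Because `b` is generated without noise, this toy problem sits in the interpolation regime mentioned in the abstract, where the method's fast (linear) convergence behavior is expected.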

A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization

Random reshuffling techniques are prevalent in large-scale applications, such as training neural networks. While the convergence and acceleration effects of random reshuffling-type methods are fairly well understood in the smooth setting, much less studies seem available in the nonsmooth case. In this work, we design a new normal map-based proximal random reshuffling (norm-PRR) method for nonsmooth nonconvex finite-sum problems. We show that norm-PRR achieves the iteration complexity ${\cal O}(n^{-1/3}T^{-2/3})$ where $n$ denotes the number of component functions $f(\cdot,i)$ and $T$ counts the total number of iterations. This improves the currently known complexity bounds for this class of problems by a factor of $n^{-1/3}$ in terms of the number of gradient evaluations. Additionally, we prove that norm-PRR converges linearly under the (global) Polyak-{\L}ojasiewicz condition and in the interpolation setting. We further complement these non-asymptotic results and provide an in-depth analysis of the asymptotic properties of norm-PRR. Specifically, under the (local) Kurdyka-{\L}ojasiewicz inequality, the whole sequence of iterates generated by norm-PRR is shown to converge to a single stationary point. Moreover, we derive last-iterate convergence rates that can match those in the smooth, strongly convex setting. Finally, numerical experiments are performed on nonconvex classification tasks to illustrate the efficiency of the proposed approach.

Updated: 2025-07-28 03:15:32

Categories: math.OC, cs.LG, 90C26, 90C15

Download: http://arxiv.org/abs/2312.01047v3

Deep Reputation Scoring in DeFi: zScore-Based Wallet Ranking from Liquidity and Trading Signals

As decentralized finance (DeFi) evolves, distinguishing between user behaviors (liquidity provision versus active trading) has become vital for risk modeling and on-chain reputation. We propose a behavioral scoring framework for Uniswap that assigns two complementary scores: a Liquidity Provision Score that assesses strategic liquidity contributions, and a Swap Behavior Score that reflects trading intent, volatility exposure, and discipline. The scores are constructed using rule-based blueprints that decompose behavior into volume, frequency, holding time, and withdrawal patterns. To handle edge cases and learn feature interactions, we introduce a deep residual neural network with densely connected skip blocks inspired by the U-Net architecture. We also incorporate pool-level context such as total value locked (TVL), fee tiers, and pool size, allowing the system to differentiate similar user behaviors across pools with varying characteristics. Our framework enables context-aware and scalable DeFi user scoring, supporting improved risk assessment and incentive design. Experiments on Uniswap v3 data show its usefulness for user segmentation and protocol-aligned reputation systems. Although we refer to our metric as zScore, it is independently developed and methodologically different from the cross-protocol system proposed by Udupi et al. Our focus is on role-specific behavioral modeling within Uniswap using blueprint logic and supervised learning.
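The rule-based "blueprint" decomposition can be sketched as follows. The caps, weights, and equal averaging below are hypothetical placeholders rather than the paper's calibration, and the residual network that refines these scores is omitted.

```python
def liquidity_provision_score(volume_usd, n_deposits, avg_holding_days,
                              withdrawal_ratio):
    """Toy rule-based blueprint score (hypothetical caps and equal
    weighting, not the paper's calibration): behavior is decomposed
    into volume, frequency, holding time, and withdrawal patterns,
    each normalized to [0, 1], then averaged into a 0-100 score."""
    volume_part = min(volume_usd / 100_000, 1.0)        # cap at $100k
    freq_part = min(n_deposits / 50, 1.0)               # cap at 50 deposits
    holding_part = min(avg_holding_days / 180, 1.0)     # cap at ~6 months
    discipline_part = 1.0 - min(withdrawal_ratio, 1.0)  # fewer pulls = better
    parts = [volume_part, freq_part, holding_part, discipline_part]
    return round(100 * sum(parts) / len(parts), 2)
```

In the full framework such hand-crafted components would further be conditioned on pool-level context (TVL, fee tier, pool size) before the learned model handles edge cases.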

Updated: 2025-07-28 03:12:27

Categories: q-fin.GN, cs.LG

Download: http://arxiv.org/abs/2507.20494v1

Speaking in Words, Thinking in Logic: A Dual-Process Framework in QA Systems

Recent advances in large language models (LLMs) have significantly enhanced question-answering (QA) capabilities, particularly in open-domain contexts. However, in closed-domain scenarios such as education, healthcare, and law, users demand not only accurate answers but also transparent reasoning and explainable decision-making processes. While neural-symbolic (NeSy) frameworks have emerged as a promising solution, leveraging LLMs for natural language understanding and symbolic systems for formal reasoning, existing approaches often rely on large-scale models and exhibit inefficiencies in translating natural language into formal logic representations. To address these limitations, we introduce Text-JEPA (Text-based Joint-Embedding Predictive Architecture), a lightweight yet effective framework for converting natural language into first-order logic (NL2FOL). Drawing inspiration from dual-system cognitive theory, Text-JEPA emulates System 1 by efficiently generating logic representations, while the Z3 solver operates as System 2, enabling robust logical inference. To rigorously evaluate the NL2FOL-to-reasoning pipeline, we propose a comprehensive evaluation framework comprising three custom metrics: conversion score, reasoning score, and Spearman rho score, which collectively capture the quality of logical translation and its downstream impact on reasoning accuracy. Empirical results on domain-specific datasets demonstrate that Text-JEPA achieves competitive performance with significantly lower computational overhead compared to larger LLM-based systems. Our findings highlight the potential of structured, interpretable reasoning frameworks for building efficient and explainable QA systems in specialized domains.
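To make the System 1 / System 2 split concrete, the sketch below shows the solver-side role on propositional formulas only, using a brute-force truth-table check as a stand-in for Z3 (the paper's actual System 2, which handles full first-order logic); the formula encodings are illustrative.

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Brute-force propositional entailment check, a toy stand-in for
    the 'System 2' role the paper assigns to the Z3 solver. Formulas
    are Python predicates over a dict mapping atom -> bool."""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False   # found a countermodel
    return True

# Encoding of "rains -> wet" and "rains"; the query is "wet".
premises = [lambda e: (not e["rains"]) or e["wet"], lambda e: e["rains"]]
conclusion = lambda e: e["wet"]
```

In the actual pipeline, Text-JEPA (System 1) would produce the first-order logic encoding from natural language, and the solver would discharge the query.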

Updated: 2025-07-28 03:00:35

Categories: cs.CL, cs.AI, cs.SC

Download: http://arxiv.org/abs/2507.20491v1

Persistent Backdoor Attacks in Continual Learning

Backdoor attacks pose a significant threat to neural networks, enabling adversaries to manipulate model outputs on specific inputs, often with devastating consequences, especially in critical applications. While backdoor attacks have been studied in various contexts, little attention has been given to their practicality and persistence in continual learning, particularly to understanding how the continual updates to model parameters, as new data distributions are learned and integrated, affect the effectiveness of these attacks over time. To address this gap, we introduce two persistent backdoor attacks, Blind Task Backdoor and Latent Task Backdoor, each leveraging minimal adversarial influence. Our blind task backdoor subtly alters the loss computation without direct control over the training process, while the latent task backdoor influences only a single task's training, with all other tasks trained benignly. We evaluate these attacks under various configurations, demonstrating their efficacy with static, dynamic, physical, and semantic triggers. Our results show that both attacks consistently achieve high success rates across different continual learning algorithms, while effectively evading state-of-the-art defenses, such as SentiNet and I-BAU.
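For reference, a generic static-trigger poisoning step looks like the sketch below. This is the textbook backdoor recipe, not the paper's blind or latent task variants, and the patch size, value, and placement are arbitrary choices.

```python
def poison_sample(image, label, target_label, trigger_value=1.0, size=2):
    """Illustrative static-trigger poisoning (generic backdoor recipe,
    not the paper's blind/latent task attacks): stamp a small patch in
    the bottom-right corner of the image and relabel the sample to the
    attacker's target class."""
    poisoned = [row[:] for row in image]      # copy; leave original intact
    h, w = len(poisoned), len(poisoned[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            poisoned[r][c] = trigger_value    # the trigger patch
    return poisoned, target_label             # original label is discarded
```

A persistence study in continual learning would then track how often this trigger still flips predictions as the model is sequentially updated on new tasks.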

Updated: 2025-07-28 02:58:25

Categories: cs.LG, cs.CR

Download: http://arxiv.org/abs/2409.13864v2

HIAL: A New Paradigm for Hypergraph Active Learning via Influence Maximization

In recent years, Hypergraph Neural Networks (HNNs) have demonstrated immense potential in handling complex systems with high-order interactions. However, acquiring large-scale, high-quality labeled data for these models is costly, making Active Learning (AL) a critical technique. Existing Graph Active Learning (GAL) methods, when applied to hypergraphs, often rely on techniques like "clique expansion," which destroys the high-order structural information crucial to a hypergraph's success, thereby leading to suboptimal performance. To address this challenge, we introduce HIAL (Hypergraph Active Learning), a native active learning framework designed specifically for hypergraphs. We innovatively reformulate the Hypergraph Active Learning (HAL) problem as an Influence Maximization task. The core of HIAL is a dual-perspective influence function that, based on our novel "High-Order Interaction-Aware (HOI-Aware)" propagation mechanism, synergistically evaluates a node's feature-space coverage (via Magnitude of Influence, MoI) and its topological influence (via Expected Diffusion Value, EDV). We prove that this objective function is monotone and submodular, thus enabling the use of an efficient greedy algorithm with a formal (1-1/e) approximation guarantee. Extensive experiments on seven public datasets demonstrate that HIAL significantly outperforms state-of-the-art baselines in terms of performance, efficiency, generality, and robustness, establishing an efficient and powerful new paradigm for active learning on hypergraphs.
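The greedy selection with its (1-1/e) guarantee can be illustrated on plain set coverage, a classic monotone submodular objective, standing in for HIAL's MoI/EDV influence function (the real objective and HOI-Aware propagation mechanism are more involved):

```python
def greedy_max_coverage(influence_sets, budget):
    """Greedy selection for a monotone submodular objective; plain set
    coverage stands in for HIAL's dual-perspective influence function.
    The classic Nemhauser-Wolsey-Fisher result gives this greedy rule
    a (1 - 1/e) approximation guarantee."""
    selected, covered = [], set()
    remaining = set(influence_sets)          # candidate node ids
    for _ in range(budget):
        if not remaining:
            break
        best = max(remaining,
                   key=lambda v: len(influence_sets[v] - covered))
        if not influence_sets[best] - covered:
            break                            # no marginal gain left
        selected.append(best)
        covered |= influence_sets[best]
        remaining.remove(best)
    return selected, covered
```

Each round picks the node with the largest marginal gain, which is exactly the step whose near-optimality the monotonicity and submodularity proof licenses.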

Updated: 2025-07-28 02:58:21

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20490v1

TAIL: Text-Audio Incremental Learning

Many studies combine text and audio to capture multi-modal information, but they overlook the model's generalization ability on new datasets. Introducing new datasets may affect the feature space of the original dataset, leading to catastrophic forgetting. Meanwhile, large model parameter counts can significantly impact training performance. To address these limitations, we introduce a novel Text-Audio Incremental Learning (TAIL) task for text-audio retrieval, and propose a new method, PTAT (Prompt Tuning for Audio-Text incremental learning). This method utilizes prompt tuning to optimize the model parameters while incorporating an audio-text similarity and feature distillation module to effectively mitigate catastrophic forgetting. We benchmark our method and previous incremental learning methods on the AudioCaps, Clotho, BBC Sound Effects, and Audioset datasets; our method outperforms previous methods significantly, particularly demonstrating stronger resistance to forgetting on older datasets. Compared to the full-parameter Finetune (Sequential) method, our model requires only 2.42% of its parameters while achieving 4.46% higher performance.
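The parameter-efficiency argument is easy to make concrete: in prompt tuning only the learnable prompt vectors are updated while the backbone stays frozen. The shapes below are hypothetical, not PTAT's actual configuration.

```python
def prompt_tuning_budget(backbone_params, prompt_len, hidden_dim, n_layers=1):
    """Back-of-the-envelope trainable-parameter count for prompt
    tuning: only the prompt vectors (prompt_len x hidden_dim per
    prompted layer) are learned; the backbone is frozen. All shapes
    here are hypothetical, not PTAT's actual configuration."""
    trainable = prompt_len * hidden_dim * n_layers
    ratio = 100.0 * trainable / backbone_params  # trainable share in %
    return trainable, round(ratio, 4)
```

Even with deep prompts on every layer, the trainable share stays a tiny fraction of a percent of a 100M-parameter backbone, which is the regime behind figures like the 2.42% reported above.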

Updated: 2025-07-28 02:46:44

Categories: cs.SD, cs.AI, cs.CV, eess.AS, I.2

Download: http://arxiv.org/abs/2503.04258v2

Juru: Legal Brazilian Large Language Model from Reputable Sources

The high compute cost associated with pretraining large language models limits their research. Two strategies have emerged to address this issue: domain specialization and pretraining with high-quality data. To explore these strategies, we specialized the Mistral-7B model with 1.9 billion unique tokens from reputable Brazilian legal sources and conducted few-shot evaluations on legal and general knowledge test suites. Our model, Juru, demonstrates the benefits of domain specialization by achieving improved performance on legal benchmarks, even with a reduced amount of pretraining data. However, this domain specialization through continued pretraining comes at the cost of increased forgetting in unrelated domains, as evidenced by performance degradation on general knowledge test suites in both Portuguese and English. This study contributes to the growing body of scientific evidence showing that pretraining data selection may enhance the performance of large language models, enabling the exploration of these models at a lower cost. Juru is publicly available at https://huggingface.co/roseval/Juru-7B .

Updated: 2025-07-28 02:46:00

Categories: cs.CL, cs.AI

Download: http://arxiv.org/abs/2403.18140v2

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents

Conversational agents are increasingly woven into individuals' personal lives, yet users often underestimate the privacy risks associated with them. The moment users share information with these agents, such as large language models (LLMs), their private information becomes vulnerable to exposure. In this paper, we characterize the notion of contextual privacy for user interactions with LLM-based Conversational Agents (LCAs). It aims to minimize privacy risks by ensuring that users (senders) disclose only information that is both relevant and necessary for achieving their intended goals when interacting with LCAs (untrusted receivers). Through a formative design user study, we observe how even "privacy-conscious" users inadvertently reveal sensitive information through indirect disclosures. Based on insights from this study, we propose a locally deployable framework that operates between users and LCAs, identifying and reformulating out-of-context information in user prompts. Our evaluation using examples from ShareGPT shows that lightweight models can effectively implement this framework, achieving strong gains in contextual privacy while preserving the user's intended interaction goals. Notably, about 76% of participants in our human evaluation preferred the reformulated prompts over the original ones, validating the usability and effectiveness of contextual privacy in our proposed framework. We open-source the code at https://github.com/IBM/contextual-privacy-LLM.
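As a crude stand-in for the detect-and-reformulate step, the sketch below masks two obviously identifying patterns with regexes; the actual framework uses lightweight learned models to identify and rewrite out-of-context information, not pattern matching, and the tag names are arbitrary.

```python
import re

# Hypothetical patterns for details that are rarely necessary for the
# user's goal; real deployments need far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact_prompt(prompt):
    """Rule-based toy redactor standing in for the paper's learned
    reformulation module: mask identifying details before the prompt
    leaves the user's device for the untrusted LCA."""
    for tag, pat in PATTERNS.items():
        prompt = pat.sub(f"[{tag}]", prompt)
    return prompt
```

The learned version goes further: rather than blanking spans, it rewrites the prompt so the intended task still succeeds without the out-of-context details.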

Updated: 2025-07-28 02:41:49

Categories: cs.CR, cs.AI, cs.CL

Download: http://arxiv.org/abs/2502.18509v2

Conditional Diffusion Models for Global Precipitation Map Inpainting

Incomplete satellite-based precipitation data presents a significant challenge in global monitoring. For example, the Global Satellite Mapping of Precipitation (GSMaP) from JAXA suffers from substantial missing regions due to the orbital characteristics of satellites that carry microwave sensors, and its current interpolation methods often result in spatial discontinuities. In this study, we formulate the completion of the precipitation map as a video inpainting task and propose a machine learning approach based on conditional diffusion models. Our method employs a 3D U-Net with a 3D condition encoder to reconstruct complete precipitation maps by leveraging spatio-temporal information from infrared images, latitude-longitude grids, and physical time inputs. Training was carried out on ERA5 hourly precipitation data from 2020 to 2023. We generated a pseudo-GSMaP dataset by randomly applying GSMaP masks to ERA5 maps. Performance was evaluated for the calendar year 2024, and our approach produces more spatio-temporally consistent inpainted precipitation maps than conventional methods. These results indicate the potential to improve global precipitation monitoring using conditional diffusion models.
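The pseudo-GSMaP construction (randomly applying observation masks to complete ERA5 maps) can be mimicked with a toy vertical-swath mask. Real GSMaP gaps follow satellite orbit geometry, which this sketch does not model; the gap count and width are arbitrary.

```python
import random

def apply_swath_mask(grid, n_gaps=2, gap_width=3, seed=0):
    """Toy pseudo-GSMaP generator (illustrative only): blank out a few
    vertical column 'swaths' of a complete precipitation grid, roughly
    the way microwave-sensor orbits leave unobserved stripes, so an
    inpainting model can be trained against the complete original."""
    rng = random.Random(seed)
    h, w = len(grid), len(grid[0])
    masked = [row[:] for row in grid]         # keep the original intact
    for _ in range(n_gaps):
        start = rng.randrange(0, w - gap_width + 1)
        for r in range(h):
            for c in range(start, start + gap_width):
                masked[r][c] = None           # None marks a missing value
    return masked
```

Training pairs are then (masked map, complete map), with the diffusion model conditioned on the auxiliary inputs described above.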

Updated: 2025-07-28 02:26:36

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20478v1

Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering

In this paper, we propose a Grid-based Local and Global Area Transcription (Grid-LoGAT) system for Video Question Answering (VideoQA). The system operates in two phases. First, it extracts text transcripts from video frames using a Vision-Language Model (VLM). Next, it processes questions against these transcripts to generate answers through a Large Language Model (LLM). This design ensures image privacy by deploying the VLM on edge devices and the LLM in the cloud. To improve transcript quality, we propose grid-based visual prompting, which extracts intricate local details from each grid cell and integrates them with global information. Evaluation results show that Grid-LoGAT, using the open-source VLM (LLaVA-1.6-7B) and LLM (Llama-3.1-8B), outperforms state-of-the-art methods with similar baseline models on the NExT-QA and STAR-QA datasets, with accuracies of 65.9% and 50.11% respectively. Additionally, our method surpasses the non-grid version by 24 points on localization-based questions we created using NExT-QA. (This paper is accepted by IEEE ICIP 2025.)
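The grid-splitting step behind grid-based visual prompting is straightforward; the sketch below cuts a frame into cells for per-cell transcription. Even divisibility is assumed for simplicity, and the actual VLM calls on each cell are omitted.

```python
def grid_cells(image, rows, cols):
    """Split a 2D frame (list of lists) into rows x cols cells, as in
    grid-based visual prompting: each cell would be transcribed for
    local detail and later merged with a whole-frame (global) pass.
    Assumes the frame dimensions divide evenly, for simplicity."""
    h, w = len(image), len(image[0])
    ch, cw = h // rows, w // cols
    cells = []
    for r in range(rows):
        for c in range(cols):
            cell = [line[c * cw:(c + 1) * cw]
                    for line in image[r * ch:(r + 1) * ch]]
            cells.append(cell)      # row-major order: r0c0, r0c1, ...
    return cells
```

Per-cell transcripts from this step, plus a global transcript of the full frame, are what the LLM later consumes to answer localization-heavy questions.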

Updated: 2025-07-28 02:22:21

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2505.24371v3

Distribution-aware Forgetting Compensation for Exemplar-Free Lifelong Person Re-identification

Lifelong Person Re-identification (LReID) suffers from a key challenge in preserving old knowledge while adapting to new information. The existing solutions include rehearsal-based and rehearsal-free methods to address this challenge. Rehearsal-based approaches rely on knowledge distillation, continuously accumulating forgetting during the distillation process. Rehearsal-free methods insufficiently learn the distribution of each domain, leading to forgetfulness over time. To solve these issues, we propose a novel Distribution-aware Forgetting Compensation (DAFC) model that explores cross-domain shared representation learning and domain-specific distribution integration without using old exemplars or knowledge distillation. We propose a Text-driven Prompt Aggregation (TPA) that utilizes text features to enrich prompt elements and guide the prompt model to learn fine-grained representations for each instance. This can enhance the differentiation of identity information and establish the foundation for domain distribution awareness. Then, Distribution-based Awareness and Integration (DAI) is designed to capture each domain-specific distribution by a dedicated expert network and adaptively consolidate them into a shared region in high-dimensional space. In this manner, DAI can consolidate and enhance cross-domain shared representation learning while alleviating catastrophic forgetting. Furthermore, we develop a Knowledge Consolidation Mechanism (KCM) that comprises instance-level discrimination and cross-domain consistency alignment strategies to facilitate model adaptive learning of new knowledge from the current domain and promote knowledge consolidation learning between acquired domain-specific distributions, respectively. Experimental results show that our DAFC outperforms state-of-the-art methods. Our code is available at https://github.com/LiuShiBen/DAFC.

Updated: 2025-07-28 02:15:27

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2504.15041v3

Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality

In Transformer architectures, tokens, discrete units derived from raw data, are formed by segmenting inputs into fixed-length chunks. Each token is then mapped to an embedding, enabling parallel attention computations while preserving the input's essential information. Due to the quadratic computational complexity of transformer self-attention mechanisms, token reduction has primarily been used as an efficiency strategy. This is especially true in single vision and language domains, where it helps balance computational costs, memory usage, and inference latency. Despite these advances, this paper argues that token reduction should transcend its traditional efficiency-oriented role in the era of large generative models. Instead, we position it as a fundamental principle in generative modeling, critically influencing both model architecture and broader applications. Specifically, we contend that across vision, language, and multimodal systems, token reduction can: (i) facilitate deeper multimodal integration and alignment, (ii) mitigate "overthinking" and hallucinations, (iii) maintain coherence over long inputs, and (iv) enhance training stability. We reframe token reduction as more than an efficiency measure. By doing so, we outline promising future directions, including algorithm design, reinforcement learning-guided token reduction, token optimization for in-context learning, and broader ML and scientific domains. We highlight its potential to drive new model architectures and learning strategies that improve robustness, increase interpretability, and better align with the objectives of generative modeling.
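Token reduction in its simplest form, keeping only the highest-scoring tokens while preserving sequence order, might look like the sketch below. The importance score itself is left abstract (in practice it could come from attention weights or a learned predictor); all names here are hypothetical.

```python
import numpy as np

def reduce_tokens(tokens: np.ndarray, scores: np.ndarray, keep_ratio: float):
    """Keep the top keep_ratio fraction of tokens by importance score,
    returning them in their original sequence order."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    kept = np.sort(np.argsort(-scores)[:n_keep])  # top-k, then restore order
    return tokens[kept], kept

tokens = np.arange(8).reshape(8, 1).astype(float)  # 8 toy tokens, dim 1
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.6])
kept_tokens, idx = reduce_tokens(tokens, scores, keep_ratio=0.5)
assert list(idx) == [1, 3, 5, 7]  # the four highest-scoring positions
```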

Updated: 2025-07-28 01:59:08

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2505.18227v2

Operator Inference Aware Quadratic Manifolds with Isotropic Reduced Coordinates for Nonintrusive Model Reduction

Quadratic manifolds for nonintrusive reduced modeling are typically trained to minimize the reconstruction error on snapshot data, which means that the error of models fitted to the embedded data in downstream learning steps is ignored. In contrast, we propose a greedy training procedure that takes into account both the reconstruction error on the snapshot data and the prediction error of reduced models fitted to the data. Because our procedure learns quadratic manifolds with the objective of achieving accurate reduced models, it avoids oscillatory and other non-smooth embeddings that can hinder learning accurate reduced models. Numerical experiments on transport and turbulent flow problems show that quadratic manifolds trained with the proposed greedy approach lead to reduced models with up to two orders of magnitude higher accuracy than quadratic manifolds trained with respect to the reconstruction error alone.
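A quadratic-manifold decoder in this line of work typically has the form x-hat = Vz + W(z kron z), mapping reduced coordinates z back to the full state. A minimal sketch, with all matrices as random placeholders rather than trained quantities:

```python
import numpy as np

def quad_reconstruct(z: np.ndarray, V: np.ndarray, W: np.ndarray):
    """Quadratic-manifold decoder: x_hat = V z + W (z kron z).
    V is (n, r), W is (n, r*r), z is the reduced coordinate of size r."""
    return V @ z + W @ np.kron(z, z)

rng = np.random.default_rng(1)
n, r = 6, 2
V = rng.standard_normal((n, r))
W = rng.standard_normal((n, r * r))
z = np.array([0.5, -1.0])
x_hat = quad_reconstruct(z, V, W)
assert x_hat.shape == (n,)
# With W = 0 the decoder reduces to a linear (POD-style) reconstruction.
assert np.allclose(quad_reconstruct(z, V, np.zeros((n, r * r))), V @ z)
```

The paper's contribution is in how V and W are trained: greedily, against both the snapshot reconstruction error and the downstream reduced-model prediction error, rather than reconstruction error alone.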

Updated: 2025-07-28 01:45:55

Categories: math.DS, cs.LG, cs.NA, math.NA, 35A01, 65L10, 65L12, 65L20, 65L70

Download: http://arxiv.org/abs/2507.20463v1

EdgeAgentX-DT: Integrating Digital Twins and Generative AI for Resilient Edge Intelligence in Tactical Networks

We introduce EdgeAgentX-DT, an advanced extension of the EdgeAgentX framework that integrates digital twin simulations and generative AI-driven scenario training to significantly enhance edge intelligence in military networks. EdgeAgentX-DT utilizes network digital twins, virtual replicas synchronized with real-world edge devices, to provide a secure, realistic environment for training and validation. Leveraging generative AI methods, such as diffusion models and transformers, the system creates diverse and adversarial scenarios for robust simulation-based agent training. Our multi-layer architecture includes: (1) on-device edge intelligence; (2) digital twin synchronization; and (3) generative scenario training. Experimental simulations demonstrate notable improvements over EdgeAgentX, including faster learning convergence, higher network throughput, reduced latency, and improved resilience against jamming and node failures. A case study involving a complex tactical scenario with simultaneous jamming attacks, agent failures, and increased network loads illustrates how EdgeAgentX-DT sustains operational performance, whereas baseline methods fail. These results highlight the potential of digital-twin-enabled generative training to strengthen edge AI deployments in contested environments.

Updated: 2025-07-28 01:42:05

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2507.21196v1

Shapley-Value-Based Graph Sparsification for GNN Inference

Graph sparsification is a key technique for improving inference efficiency in Graph Neural Networks by removing edges with minimal impact on predictions. GNN explainability methods generate local importance scores, which can be aggregated into global scores for graph sparsification. However, many explainability methods produce only non-negative scores, limiting their applicability for sparsification. In contrast, Shapley value based methods assign both positive and negative contributions to node predictions, offering a theoretically robust and fair allocation of importance by evaluating many subsets of graphs. Unlike gradient-based or perturbation-based explainers, Shapley values enable better pruning strategies that preserve influential edges while removing misleading or adversarial connections. Our approach shows that Shapley value-based graph sparsification maintains predictive performance while significantly reducing graph complexity, enhancing both interpretability and efficiency in GNN inference.
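The Shapley attribution over edges can be sketched exactly for a toy graph. The paper's approach evaluates many sampled subsets; here, for illustration only, we enumerate all subsets, and the value function is a made-up stand-in for prediction quality with a given edge set:

```python
import itertools
import math

def exact_shapley(players, value):
    """Exact Shapley values for a small set of 'players' (here: graph
    edges) and a set-function `value` mapping a frozenset of players to
    the model's prediction quality with exactly those edges present."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in itertools.combinations(others, k):
                S = frozenset(S)
                weight = (math.factorial(k) * math.factorial(n - k - 1)
                          / math.factorial(n))
                phi[p] += weight * (value(S | {p}) - value(S))
    return phi

# Toy value function: edge 'a' helps (+1); edge 'b' hurts (-1),
# e.g. a misleading or adversarial connection.
v = lambda S: (1.0 if 'a' in S else 0.0) - (1.0 if 'b' in S else 0.0)
phi = exact_shapley(['a', 'b'], v)
assert phi['a'] == 1.0 and phi['b'] == -1.0
```

The sign matters for sparsification: a pruning rule can keep positive-contribution edges and drop negative ones, which non-negative explainers cannot express.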

Updated: 2025-07-28 01:30:09

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2507.20460v1

Masked Autoencoders that Feel the Heart: Unveiling Simplicity Bias for ECG Analyses

The diagnostic value of electrocardiogram (ECG) lies in its dynamic characteristics, ranging from rhythm fluctuations to subtle waveform deformations that evolve across time and frequency domains. However, supervised ECG models tend to overfit dominant and repetitive patterns, overlooking fine-grained but clinically critical cues, a phenomenon known as Simplicity Bias (SB), where models favor easily learnable signals over subtle but informative ones. In this work, we first empirically demonstrate the presence of SB in ECG analyses and its negative impact on diagnostic performance, while simultaneously discovering that self-supervised learning (SSL) can alleviate it, providing a promising direction for tackling the bias. Following the SSL paradigm, we propose a novel method comprising two key components: 1) Temporal-Frequency aware Filters to capture temporal-frequency features reflecting the dynamic characteristics of ECG signals, and 2) building on this, Multi-Grained Prototype Reconstruction for coarse and fine representation learning across dual domains, further mitigating SB. To advance SSL in ECG analyses, we curate a large-scale multi-site ECG dataset with 1.53 million recordings from over 300 clinical centers. Experiments on three downstream tasks across six ECG datasets demonstrate that our method effectively reduces SB and achieves state-of-the-art performance.
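A fixed FFT band-pass mask gives a rough intuition for the temporal-frequency filtering: isolating the slow rhythm band of an ECG-like signal from higher-frequency waveform content. The paper's filters are learned; this hand-rolled version, with made-up band edges, is only illustrative.

```python
import numpy as np

def band_filter(x: np.ndarray, fs: float, lo: float, hi: float):
    """Keep only the [lo, hi] Hz band of a 1-D signal via FFT masking,
    a crude stand-in for a learnable temporal-frequency filter."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec = np.fft.rfft(x)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

fs = 100.0
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 30 * t)  # 2 Hz + 30 Hz
low = band_filter(x, fs, 0.5, 5.0)  # isolate the slow "rhythm" component
assert np.allclose(low, np.sin(2 * np.pi * 2 * t), atol=1e-6)
```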

Updated: 2025-07-28 01:29:14

Categories: eess.SP, cs.AI, cs.LG

Download: http://arxiv.org/abs/2506.22495v4

Diagonally-Weighted Generalized Method of Moments Estimation for Gaussian Mixture Modeling

Since Pearson [Philosophical Transactions of the Royal Society of London. A, 185 (1894), pp. 71-110] first applied the method of moments (MM) for modeling data as a mixture of one-dimensional Gaussians, moment-based estimation methods have proliferated. Among these methods, the generalized method of moments (GMM) improves the statistical efficiency of MM by weighting the moments appropriately. However, the computational complexity and storage complexity of MM and GMM grow exponentially with the dimension, making these methods impractical for high-dimensional data or when higher-order moments are required. Such computational bottlenecks are more severe in GMM since it additionally requires estimating a large weighting matrix. To overcome these bottlenecks, we propose the diagonally-weighted GMM (DGMM), which achieves a balance among statistical efficiency, computational complexity, and numerical stability. We apply DGMM to study the parameter estimation problem for weakly separated heteroscedastic low-rank Gaussian mixtures and design a computationally efficient and numerically stable algorithm that obtains the DGMM estimator without explicitly computing or storing the moment tensors. We implement the proposed algorithm and empirically validate the advantages of DGMM: in numerical studies, DGMM attains smaller estimation errors while requiring substantially shorter runtime than MM and GMM. The code and data will be available upon publication at https://github.com/liu-lzhang/dgmm.
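The diagonal weighting can be sketched as scaling each moment residual by an inverse estimate of that moment's sampling variance, in contrast to classical GMM's full weighting matrix. All names below, and the way the variances are estimated, are illustrative assumptions rather than the paper's estimator:

```python
import numpy as np

def diag_weighted_moment_loss(model_moments, sample_moments, moment_vars):
    """Diagonally-weighted GMM objective: each moment residual is scaled
    by the inverse of that moment's sampling variance. This keeps the
    statistical-efficiency idea of GMM without estimating or storing a
    full weighting matrix."""
    r = model_moments - sample_moments
    return float(np.sum(r ** 2 / moment_vars))

# Toy: first two moments of N(0, 1) data, comparing a correct candidate
# (mean 0, second moment 1) against a wrong one.
rng = np.random.default_rng(2)
data = rng.standard_normal(10_000)
sample_m = np.array([data.mean(), (data ** 2).mean()])
# Sampling variances of the two moment estimates (assumption: estimated
# empirically from the data).
mvars = np.array([data.var() / len(data), (data ** 2).var() / len(data)])
loss_true = diag_weighted_moment_loss(np.array([0.0, 1.0]), sample_m, mvars)
loss_bad = diag_weighted_moment_loss(np.array([1.0, 2.0]), sample_m, mvars)
assert loss_true < loss_bad
```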

Updated: 2025-07-28 01:24:55

Categories: cs.LG, cs.NA, math.NA, math.ST, stat.ME, stat.ML, stat.TH, 62F12, 62H30, 15A69, 65Y20

Download: http://arxiv.org/abs/2507.20459v1

Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis

Visual autoregressive modeling, based on the next-scale prediction paradigm, exhibits notable advantages in image quality and model scalability over traditional autoregressive and diffusion models. It generates images by progressively refining resolution across multiple stages. However, the computational overhead in high-resolution stages remains a critical challenge due to the substantial number of tokens involved. In this paper, we introduce SparseVAR, a plug-and-play acceleration framework for next-scale prediction that dynamically excludes low-frequency tokens during inference without requiring additional training. Our approach is motivated by the observation that tokens in low-frequency regions have a negligible impact on image quality in high-resolution stages and exhibit strong similarity with neighboring tokens. Additionally, we observe that different blocks in the next-scale prediction model focus on distinct regions, with some concentrating on high-frequency areas. SparseVAR leverages these insights by employing lightweight MSE-based metrics to identify low-frequency tokens while preserving the fidelity of excluded regions through a small set of uniformly sampled anchor tokens. By significantly reducing the computational cost while maintaining high image generation quality, SparseVAR achieves notable acceleration in both HART and Infinity. Specifically, SparseVAR achieves up to a 2 times speedup with minimal quality degradation in Infinity-2B.
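A crude version of the MSE-based low-frequency test, with uniformly strided anchor tokens, might look like the sketch below (1-D neighborhood, hypothetical threshold and stride; the real system operates on 2-D token grids):

```python
import numpy as np

def select_tokens(tokens: np.ndarray, threshold: float, anchor_stride: int = 4):
    """Flag tokens whose MSE against the mean of their two 1-D neighbors
    exceeds `threshold` (detail-carrying, high-frequency), and always
    keep a sparse, uniformly strided set of anchor tokens so that
    excluded smooth regions remain recoverable."""
    pad = np.pad(tokens, ((1, 1), (0, 0)), mode="edge")
    neighbor_mean = (pad[:-2] + pad[2:]) / 2.0
    mse = ((tokens - neighbor_mean) ** 2).mean(axis=1)
    keep = mse > threshold
    keep[::anchor_stride] = True  # anchor tokens are always kept
    return keep

tokens = np.zeros((8, 2))
tokens[5] = 10.0  # one high-frequency outlier among smooth tokens
keep = select_tokens(tokens, threshold=1.0)
assert keep[5]              # the detail-carrying token is kept
assert keep[0] and keep[4]  # anchors at stride 4 are kept
assert not keep[1] and not keep[2]  # smooth tokens are excluded
```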

Updated: 2025-07-28 01:13:24

Categories: cs.CV, cs.LG

Download: http://arxiv.org/abs/2507.20454v1

Bi-LAT: Bilateral Control-Based Imitation Learning via Natural Language and Action Chunking with Transformers

We present Bi-LAT, a novel imitation learning framework that unifies bilateral control with natural language processing to achieve precise force modulation in robotic manipulation. Bi-LAT leverages joint position, velocity, and torque data from leader-follower teleoperation while also integrating visual and linguistic cues to dynamically adjust applied force. By encoding human instructions such as "softly grasp the cup" or "strongly twist the sponge" through a multimodal Transformer-based model, Bi-LAT learns to distinguish nuanced force requirements in real-world tasks. We demonstrate Bi-LAT's performance in (1) a unimanual cup-stacking scenario, where the robot accurately modulates grasp force based on language commands, and (2) a bimanual sponge-twisting task that requires coordinated force control. Experimental results show that Bi-LAT effectively reproduces the instructed force levels, particularly when incorporating SigLIP among the tested language encoders. Our findings demonstrate the potential of integrating natural language cues into imitation learning, paving the way for more intuitive and adaptive human-robot interaction. For additional material, please visit: https://mertcookimg.github.io/bi-lat/
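The action-chunking half of the name is straightforward to sketch: a demonstration trajectory is split into fixed-length chunks that the policy predicts jointly rather than one step at a time. Chunk size and all names are assumptions for illustration:

```python
import numpy as np

def chunk_actions(trajectory: np.ndarray, chunk_size: int):
    """Split a (T, action_dim) trajectory into fixed-length chunks, the
    prediction unit used by action-chunking policies; any tail shorter
    than chunk_size is dropped in this simple version."""
    n = len(trajectory) // chunk_size
    return trajectory[:n * chunk_size].reshape(n, chunk_size, -1)

traj = np.arange(20).reshape(10, 2).astype(float)  # 10 steps, 2-D actions
chunks = chunk_actions(traj, chunk_size=4)
assert chunks.shape == (2, 4, 2)          # two full chunks of four steps
assert np.array_equal(chunks[0, 0], traj[0])
```

In Bi-LAT these chunks cover joint position, velocity, and torque targets, so a predicted chunk encodes not just motion but the force profile the language command asks for.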

Updated: 2025-07-28 01:12:45

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2504.01301v2

Weak-to-Strong Generalization with Failure Trajectories: A Tree-based Approach to Elicit Optimal Policy in Strong Models

Weak-to-Strong generalization (W2SG) is a new trend to elicit the full capabilities of a strong model with supervision from a weak model. While existing W2SG studies focus on simple tasks like binary classification, we extend this paradigm to complex interactive decision-making environments. Specifically, we fine-tune a strong model with trajectories of intermediate actions generated by a weak model. Motivated by the human learning process, we propose to generalize not only success knowledge but also failure experience so that the strong model can learn from failed trajectories accumulated by weak models. To effectively and efficiently elicit the potential of strong agents, we further construct "trajectory trees," a hierarchical representation that organizes weak model-generated action trajectories, coupled with Monte Carlo Tree Search (MCTS) to optimize the strong model. Through theoretical analysis, we provide formal guarantees for the effectiveness of our method in improving W2SG performance. Our empirical evaluations demonstrate substantial improvements in reasoning and decision-making capabilities across diverse task domains, validating the scalability and robustness of our proposed framework.
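The trajectory-tree idea can be illustrated with a minimal structure; the node fields, reward convention, and UCT constant below are hypothetical stand-ins for the paper's design, not its actual implementation:

```python
import math

class TrajectoryNode:
    """Toy trajectory-tree node for organising weak-model action trajectories.
    Each root-to-leaf path is one trajectory; `value` keeps a running mean of
    the rewards (1.0 for success, 0.0 for failure) of trajectories through it."""
    def __init__(self, action=None):
        self.action = action
        self.children = {}
        self.visits = 0
        self.value = 0.0

    def add_trajectory(self, actions, reward):
        """Insert one weak-model trajectory, updating statistics along the path."""
        node = self
        for a in actions:
            node.visits += 1
            node.value += (reward - node.value) / node.visits
            node = node.children.setdefault(a, TrajectoryNode(a))
        node.visits += 1
        node.value += (reward - node.value) / node.visits

    def uct_child(self, c=1.4):
        # MCTS-style selection: exploit high-value branches, explore rare ones
        return max(
            self.children.values(),
            key=lambda n: n.value + c * math.sqrt(math.log(self.visits) / n.visits),
        )
```

With exploration switched off (`c=0`), selection simply follows the branch whose stored trajectories succeeded most often, which is the exploitation half of the MCTS optimization loop.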

Updated: 2025-07-28 01:08:57

标题: 弱到强泛化与失败轨迹:一种基于树的方法来引导强模型中的最佳策略

摘要: 弱到强的泛化(W2SG)是一种新趋势,通过弱模型的监督来激发强模型的全部能力。虽然现有的W2SG研究集中在简单的任务,如二元分类,我们将这一范式扩展到复杂的互动决策环境。具体地,我们通过弱模型生成的中间动作轨迹对强模型进行微调。受人类学习过程的启发,我们提出泛化不仅包括成功知识,也包括失败经验,使得强模型可以从弱模型积累的失败轨迹中学习。为了有效、高效地激发强代理的潜力,我们进一步构建了“轨迹树”,这是一种层次化表示,组织了弱模型生成的动作轨迹,结合蒙特卡洛树搜索(MCTS)来优化强模型。通过理论分析,我们为我们的方法在改善W2SG性能方面提供了正式保证。我们的实证评估显示,在各种任务领域的推理和决策能力显著提高,验证了我们提出的框架的可扩展性和稳健性。

更新时间: 2025-07-28 01:08:57

领域: cs.LG

下载: http://arxiv.org/abs/2507.18858v2

Your Attention Matters: to Improve Model Robustness to Noise and Spurious Correlations

Self-attention mechanisms are foundational to Transformer architectures, supporting their impressive success in a wide range of tasks. While there are many self-attention variants, their robustness to noise and spurious correlations has not been well studied. This study evaluates Softmax, Sigmoid, Linear, Doubly Stochastic, and Cosine attention within Vision Transformers under different data corruption scenarios. Through testing across the CIFAR-10, CIFAR-100, and Imagenette datasets, we show that Doubly Stochastic attention is the most robust. Our findings inform self-attention selection in contexts with imperfect data.
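One common construction of doubly stochastic attention is Sinkhorn normalisation; the sketch below assumes that variant (the paper's exact formulation may differ) to show what distinguishes it from row-only softmax attention:

```python
import numpy as np

def doubly_stochastic_attention(scores, n_iters=100):
    """Sinkhorn-style sketch of doubly stochastic attention: alternating row
    and column normalisation drives exp(scores) toward a matrix whose rows
    AND columns both sum to 1, unlike softmax attention (rows only)."""
    a = np.exp(scores - scores.max())          # numerically stabilised kernel
    for _ in range(n_iters):
        a = a / a.sum(axis=1, keepdims=True)   # normalise rows
        a = a / a.sum(axis=0, keepdims=True)   # normalise columns
    return a
```

The column constraint prevents any single (possibly corrupted) token from dominating the attention of all queries, which is one intuition for the robustness observed in the study.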

Updated: 2025-07-28 01:07:22

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20453v1

STARN-GAT: A Multi-Modal Spatio-Temporal Graph Attention Network for Accident Severity Prediction

Accurate prediction of traffic accident severity is critical for improving road safety, optimizing emergency response strategies, and informing the design of safer transportation infrastructure. However, existing approaches often struggle to effectively model the intricate interdependencies among spatial, temporal, and contextual variables that govern accident outcomes. In this study, we introduce STARN-GAT, a Multi-Modal Spatio-Temporal Graph Attention Network, which leverages adaptive graph construction and modality-aware attention mechanisms to capture these complex relationships. Unlike conventional methods, STARN-GAT integrates road network topology, temporal traffic patterns, and environmental context within a unified attention-based framework. The model is evaluated on the Fatality Analysis Reporting System (FARS) dataset, achieving a Macro F1-score of 85 percent, ROC-AUC of 0.91, and recall of 81 percent for severe incidents. To ensure generalizability within the South Asian context, STARN-GAT is further validated on the ARI-BUET traffic accident dataset, where it attains a Macro F1-score of 0.84, recall of 0.78, and ROC-AUC of 0.89. These results demonstrate the model's effectiveness in identifying high-risk cases and its potential for deployment in real-time, safety-critical traffic management systems. Furthermore, the attention-based architecture enhances interpretability, offering insights into contributing factors and supporting trust in AI-assisted decision-making. Overall, STARN-GAT bridges the gap between advanced graph neural network techniques and practical applications in road safety analytics.
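As a rough illustration of the attention mechanism underlying such models, here is a generic single-head graph-attention computation in numpy; this is a textbook GAT-style layer, not the STARN-GAT architecture itself, and all parameter names are illustrative:

```python
import numpy as np

def gat_attention(h, adj, W, a_src, a_dst):
    """Generic single-head graph-attention layer (GAT-style sketch):
    edge logits use the split form a_src·z_i + a_dst·z_j, a LeakyReLU,
    an adjacency mask, and a per-node softmax over neighbours."""
    z = h @ W                                             # projected features
    logits = (z @ a_src)[:, None] + (z @ a_dst)[None, :]  # raw edge scores e_ij
    logits = np.where(logits > 0, logits, 0.2 * logits)   # LeakyReLU(0.2)
    logits = np.where(adj > 0, logits, -np.inf)           # attend only over edges
    logits -= logits.max(axis=1, keepdims=True)           # stabilise softmax
    alpha = np.exp(logits)
    alpha /= alpha.sum(axis=1, keepdims=True)             # per-node softmax
    return alpha @ z                                      # aggregated features
```

The per-edge coefficients `alpha` are also what makes such architectures interpretable: they expose how strongly each neighbouring node contributed to a prediction.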

Updated: 2025-07-28 01:00:03

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20451v1

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

Aligning large language models (LLMs) with human values is an increasingly critical step in post-training. Direct Preference Optimization (DPO) has emerged as a simple, yet effective alternative to reinforcement learning from human feedback (RLHF). Synthetic preference data with its low cost and high quality enable effective alignment through single- or multi-model generated preference data. Our study reveals a striking, safety-specific phenomenon associated with DPO alignment: Although multi-model generated data enhances performance on general tasks (ARC, Hellaswag, MMLU, TruthfulQA, Winogrande) by providing diverse responses, it also tends to facilitate reward hacking during training. This can lead to a high attack success rate (ASR) when models encounter jailbreaking prompts. The issue is particularly pronounced when employing stronger models like GPT-4o or larger models in the same family to generate chosen responses paired with target model self-generated rejected responses, resulting in dramatically poorer safety outcomes. Furthermore, with respect to safety, using solely self-generated responses (single-model generation) for both chosen and rejected pairs significantly outperforms configurations that incorporate responses from stronger models, whether used directly as chosen data or as part of a multi-model response pool. We demonstrate that multi-model preference data exhibits high linear separability between chosen and rejected responses, which allows models to exploit superficial cues rather than internalizing robust safety constraints. Our experiments, conducted on models from the Llama, Mistral, and Qwen families, consistently validate these findings.
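The linear-separability claim can be probed with a deliberately crude mean-difference classifier; the Gaussian clusters below are toy stand-ins for real chosen/rejected response embeddings, and the "multi-model" vs "single-model" settings are simulated by larger vs smaller mean shifts:

```python
import numpy as np

def mean_direction_separability(chosen, rejected):
    """Crude linear-separability probe: project onto the difference of class
    means and score a midpoint threshold (an illustrative stand-in for a
    learned linear probe on preference-pair embeddings)."""
    w = chosen.mean(axis=0) - rejected.mean(axis=0)
    b = 0.5 * (chosen.mean(axis=0) + rejected.mean(axis=0)) @ w
    return 0.5 * ((chosen @ w > b).mean() + (rejected @ w < b).mean())

rng = np.random.default_rng(0)
# "multi-model" style: chosen/rejected come from different generators,
# so embeddings form well-separated clusters (assumed toy embeddings)
multi_chosen = rng.normal(loc=+0.5, size=(400, 16))
multi_rejected = rng.normal(loc=-0.5, size=(400, 16))
# "single-model" style: both come from the same model, clusters overlap
single_chosen = rng.normal(loc=+0.05, size=(400, 16))
single_rejected = rng.normal(loc=-0.05, size=(400, 16))
```

Under this toy model, the multi-model pairs are nearly perfectly linearly separable, which is exactly the condition under which a DPO-trained model can latch onto superficial cues instead of the underlying safety constraint.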

Updated: 2025-07-28 00:57:26

Categories: cs.AI

Download: http://arxiv.org/abs/2504.02193v3

MaXsive: High-Capacity and Robust Training-Free Generative Image Watermarking in Diffusion Models

The great success of the diffusion model in image synthesis led to the release of gigantic commercial models, raising the issue of copyright protection and inappropriate content generation. Training-free diffusion watermarking provides a low-cost solution for these issues. However, prior works remain vulnerable to rotation, scaling, and translation (RST) attacks. Although some methods employ meticulously designed patterns to mitigate this issue, they often reduce watermark capacity, which can result in identity (ID) collisions. To address these problems, we propose MaXsive, a training-free diffusion model generative watermarking technique that has high capacity and robustness. MaXsive best utilizes the initial noise to watermark the diffusion model. Moreover, instead of using a meticulously repetitive ring pattern, we propose injecting an X-shape template to recover the RST distortions. This design significantly increases robustness without losing any capacity, making ID collisions less likely. The effectiveness of MaXsive has been verified on two well-known watermarking benchmarks under the scenarios of verification and identification.

Updated: 2025-07-28 00:51:47

Categories: cs.CR,cs.AI,cs.MM

Download: http://arxiv.org/abs/2507.21195v1

WEEP: A Differentiable Nonconvex Sparse Regularizer via Weakly-Convex Envelope

Sparse regularization is fundamental in signal processing for efficient signal recovery and feature extraction. However, it faces a fundamental dilemma: the most powerful sparsity-inducing penalties are often non-differentiable, conflicting with gradient-based optimizers that dominate the field. We introduce WEEP (Weakly-convex Envelope of Piecewise Penalty), a novel, fully differentiable sparse regularizer derived from the weakly-convex envelope framework. WEEP provides strong, unbiased sparsity while maintaining full differentiability and L-smoothness, making it natively compatible with any gradient-based optimizer. This resolves the conflict between statistical performance and computational tractability. We demonstrate superior performance compared to the L1-norm and other established non-convex sparse regularizers on challenging signal and image denoising tasks.
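A classical illustration of how an envelope construction smooths a nonsmooth penalty is the Moreau envelope, which turns the L1 penalty into the differentiable Huber function; WEEP's weakly-convex envelope differs in its details, but the mechanism is analogous:

```latex
% Moreau envelope of f with parameter \mu > 0:
M_\mu f(x) = \min_y \Big( f(y) + \tfrac{1}{2\mu}(x-y)^2 \Big).
% For the nonsmooth penalty f(x) = |x| this yields the Huber function
M_\mu |\cdot|(x) =
\begin{cases}
  \dfrac{x^2}{2\mu}, & |x| \le \mu,\\[4pt]
  |x| - \dfrac{\mu}{2}, & |x| > \mu,
\end{cases}
% which is differentiable everywhere with a (1/\mu)-Lipschitz gradient.
```

The Huber example also shows the tension WEEP targets: the Moreau envelope of the L1 norm gains smoothness but keeps L1's bias on large coefficients, whereas WEEP is designed to retain unbiased sparsity alongside differentiability and L-smoothness.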

Updated: 2025-07-28 00:40:48

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2507.20447v1

In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents

Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization. External memory mechanisms have been proposed to address this limitation, enabling LLMs to maintain conversational continuity. However, existing approaches struggle with two key challenges. First, rigid memory granularity fails to capture the natural semantic structure of conversations, leading to fragmented and incomplete representations. Second, fixed retrieval mechanisms cannot adapt to diverse dialogue contexts and user interaction patterns. In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities (utterances, turns, and sessions) into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs' cited evidence. Experiments show that RMM demonstrates consistent improvement across various metrics and benchmarks. For example, RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset.
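A minimal sketch of a multi-granularity memory bank with similarity-based retrieval; the granularity labels, embeddings, and scoring below are illustrative stand-ins for the paper's LLM-driven summarization and reinforcement-refined retrieval:

```python
import numpy as np

class MemoryBank:
    """Toy multi-granularity memory bank with cosine-similarity retrieval.
    Entries would be produced by Prospective Reflection (summaries at the
    utterance, turn, and session levels); here they are hand-written."""
    def __init__(self):
        self.entries = []                      # (granularity, text, unit vector)

    def add(self, granularity, text, vector):
        v = np.asarray(vector, dtype=float)
        self.entries.append((granularity, text, v / np.linalg.norm(v)))

    def retrieve(self, query_vector, k=2):
        q = np.asarray(query_vector, dtype=float)
        q = q / np.linalg.norm(q)
        # cosine similarity reduces to a dot product on unit vectors
        ranked = sorted(self.entries, key=lambda e: -float(e[2] @ q))
        return [(g, t) for g, t, _ in ranked[:k]]
```

Retrospective Reflection would then adjust this ranking online based on which retrieved memories the LLM actually cites, rather than keeping the scoring fixed.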

Updated: 2025-07-28 00:39:17

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2503.08026v2

BOASF: A Unified Framework for Speeding up Automatic Machine Learning via Adaptive Successive Filtering

Machine learning has achieved great success in many application areas. However, for non-expert practitioners, it is often very challenging to address a machine learning task successfully and efficiently. Finding the optimal machine learning model or hyperparameter combination from a large number of possible alternatives usually requires considerable expert knowledge and experience. To tackle this problem, we propose a combined Bayesian Optimization and Adaptive Successive Filtering algorithm (BOASF) under a unified multi-armed bandit framework to automate model selection and hyperparameter optimization. Specifically, BOASF consists of multiple evaluation rounds, in each of which we select promising configurations for each arm using Bayesian optimization. Then, ASF can adaptively discard poorly performing arms early using a Gaussian UCB-based probabilistic model. Furthermore, a Softmax model is employed to adaptively allocate available resources to each promising arm that advances to the next round: the arm with a higher probability of advancing is allocated more resources. Experimental results show that BOASF is effective for speeding up the model selection and hyperparameter optimization processes while achieving more robust and better prediction performance than existing state-of-the-art automatic machine learning methods. Moreover, BOASF achieves better anytime performance under various time budgets.
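The filtering and allocation steps described above might look roughly like this; the confidence-bound formula, constants, and arm names are illustrative guesses, not the paper's exact rules:

```python
import numpy as np

def filter_arms(observations, kappa=2.0):
    """Toy adaptive successive filtering: each arm keeps a list of observed
    validation scores; an arm is discarded when its Gaussian UCB falls below
    the best arm's LCB."""
    stats = {}
    for arm, scores in observations.items():
        s = np.asarray(scores, dtype=float)
        half_width = kappa * s.std() / np.sqrt(len(s))
        stats[arm] = (s.mean() - half_width, s.mean() + half_width)  # (lcb, ucb)
    best_lcb = max(lcb for lcb, _ in stats.values())
    return {arm for arm, (_, ucb) in stats.items() if ucb >= best_lcb}

def allocate_budget(observations, survivors, total=100):
    """Softmax allocation of the next round's budget among surviving arms:
    arms with higher mean scores receive proportionally more resources."""
    arms = sorted(survivors)
    means = np.array([np.mean(observations[a]) for a in arms])
    weights = np.exp(means - means.max())
    shares = weights / weights.sum()
    return dict(zip(arms, (shares * total).round().astype(int)))
```

A clearly dominated arm (e.g. a configuration scoring ~0.1 against leaders at ~0.9) is pruned after the first round, so later rounds concentrate the evaluation budget on competitive configurations.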

Updated: 2025-07-28 00:30:07

Categories: cs.LG

Download: http://arxiv.org/abs/2507.20446v1

Enhancing QoS in Edge Computing through Federated Layering Techniques: A Pathway to Resilient AI Lifelong Learning Systems

In the context of the rapidly evolving information technology landscape, marked by the advent of 6G communication networks, we face an increased data volume and complexity in network environments. This paper addresses these challenges by focusing on Quality of Service (QoS) in edge computing frameworks. We propose a novel approach to enhance QoS through the development of General Artificial Intelligence Lifelong Learning Systems, with a special emphasis on Federated Layering Techniques (FLT). Our work introduces a federated layering-based small model collaborative mechanism aimed at improving AI models' operational efficiency and response time in environments where resources are limited. This innovative method leverages the strengths of cloud and edge computing, incorporating a negotiation and debate mechanism among small AI models to enhance reasoning and decision-making processes. By integrating model layering techniques with privacy protection measures, our approach ensures the secure transmission of model parameters while maintaining high efficiency in learning and reasoning capabilities. The experimental results demonstrate that our strategy not only enhances learning efficiency and reasoning accuracy but also effectively protects the privacy of edge nodes. This presents a viable solution for achieving resilient large model lifelong learning systems, with a significant improvement in QoS for edge computing environments.

Updated: 2025-07-28 00:24:51

Categories: cs.AI

Download: http://arxiv.org/abs/2507.20444v1

Provable In-Context Learning of Nonlinear Regression with Transformers

The transformer architecture, which processes sequences of input tokens to produce outputs for query tokens, has revolutionized numerous areas of machine learning. A defining feature of transformers is their ability to perform previously unseen tasks using task-specific prompts without updating parameters, a phenomenon known as in-context learning (ICL). Recent research has actively explored the training dynamics behind ICL, with much of the focus on relatively simple tasks such as linear regression and binary classification. To advance the theoretical understanding of ICL, this paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities in these settings. We analyze the stage-wise dynamics of attention during training: attention scores between a query token and its target features grow rapidly in the early phase, then gradually converge to one, while attention to irrelevant features decays more slowly and exhibits oscillatory behavior. Our analysis introduces new proof techniques that explicitly characterize how the nature of general non-degenerate L-Lipschitz task functions affects attention weights. Specifically, we identify the Lipschitz constant L of nonlinear function classes as a key factor governing the convergence dynamics of transformers in ICL. Leveraging these insights, for two distinct regimes depending on whether L is below or above a threshold, we derive different time bounds to guarantee near-zero prediction error. Notably, despite the convergence time depending on the underlying task functions, we prove that query tokens consistently attend to prompt tokens with highly relevant features at convergence, demonstrating the ICL capability of transformers for unseen functions.
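
The core mechanism the abstract describes, attention concentrating on prompt tokens whose features are relevant to the query, can be illustrated with a minimal sketch. The snippet below is not the paper's construction: it uses a single softmax-attention read-out over in-context examples (x_i, f(x_i)) to predict f at a query point, with `beta` standing in for a learned inverse-temperature. For an L-Lipschitz task function, attention mass on nearby prompt features keeps the prediction error on the order of L times the attended distance.

```python
# Hypothetical minimal sketch of attention-based in-context regression
# (a Nadaraya-Watson-style estimator, which is what one softmax
# attention layer over (feature, label) tokens computes).
import numpy as np

def attention_icl_predict(xs, ys, x_query, beta=50.0):
    """Predict f(x_query) as a softmax-attention average of prompt labels.

    beta plays the role of a learned inverse temperature: larger beta
    means attention concentrates on the most relevant prompt tokens.
    """
    scores = -beta * (xs - x_query) ** 2      # similarity logits
    weights = np.exp(scores - scores.max())   # numerically stable exp
    weights /= weights.sum()                  # softmax attention weights
    return float(weights @ ys)

# An L-Lipschitz task function (L = 2 here, since |d/dx 2*sin(x)| <= 2).
f = lambda x: 2.0 * np.sin(x)

rng = np.random.default_rng(0)
xs = rng.uniform(-3, 3, size=256)   # in-context prompt features
ys = f(xs)                          # in-context prompt labels
x_q = 1.0

pred = attention_icl_predict(xs, ys, x_q)
print(abs(pred - f(x_q)))           # small prediction error
```

Raising `beta` mimics the late-training regime the abstract describes (attention on the relevant tokens approaching one), shrinking the effective neighborhood the prediction averages over and hence the Lipschitz-controlled error.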

Updated: 2025-07-28 00:09:28

Field: cs.LG

Download: http://arxiv.org/abs/2507.20443v1


By Xinhai (Sean) Zou.