Tabularis Formatus: Predictive Formatting for Tables
Spreadsheet software is widely used for managing and analyzing tabular data, yet creating conditional formatting (CF) rules remains a complex task that requires technical knowledge and experience with specific platforms. In this paper we present TaFo, a neuro-symbolic approach to generating CF suggestions for tables, addressing common challenges such as user unawareness, difficulty in rule creation, and inadequate user interfaces. TaFo takes inspiration from component-based synthesis systems and extends them with the semantic knowledge of language models and a diversity-preserving rule ranking. Unlike previous methods focused on structural formatting, TaFo uniquely incorporates value-based formatting, automatically learning both the rule trigger and the associated visual formatting properties for CF rules. By removing the dependency on user specification required by existing techniques, in the form of formatted examples or natural-language instructions, TaFo makes formatting fully predictive and automated for the user. To evaluate TaFo, we use a corpus of 1.8 million public workbooks with CF and manual formatting. We compare TaFo against a diverse set of symbolic and neural systems designed for, or adapted to, the task of table formatting. Our results show that TaFo generates more accurate, diverse, and complete formatting suggestions than current systems, outperforming them by 15.6%–26.5% on matching user-added ground-truth rules in tables.
Updated: 2025-08-14 23:54:40
Subjects: cs.DB,cs.AI,cs.SE
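To make "value-based formatting" concrete, here is a minimal sketch of what such a suggestion pairs together: a learned value trigger plus the visual properties to apply. All names and the outlier heuristic are hypothetical illustrations, not TaFo's actual representation or API.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable

@dataclass
class CFRule:
    """A conditional-formatting rule: a value-based trigger plus visual properties."""
    trigger: Callable[[float], bool]
    fill: str  # e.g. a background colour applied to matching cells

def suggest_low_outlier_rule(column: list[float], fill: str = "#FFC7CE") -> CFRule:
    """Suggest a rule flagging values more than one std below the column mean."""
    mu, sigma = mean(column), stdev(column)
    threshold = mu - sigma
    return CFRule(trigger=lambda v: v < threshold, fill=fill)

sales = [120.0, 115.0, 130.0, 40.0, 125.0]
rule = suggest_low_outlier_rule(sales)
flagged = [v for v in sales if rule.trigger(v)]
```

A predictive system in this spirit would propose both halves of the rule (trigger and fill) without the user supplying examples or instructions.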
Request-Only Optimization for Recommendation Systems
Deep Learning Recommendation Models (DLRMs) represent one of the largest machine learning applications on the planet. Industry-scale DLRMs are trained on petabytes of recommendation data to serve billions of users every day. To exploit the rich user signals in long user histories, DLRMs have been scaled up to unprecedented complexity, up to trillions of floating-point operations (TFLOPs) per example. This scale, coupled with the huge amount of training data, necessitates new storage and training algorithms to efficiently improve the quality of these complex recommendation systems. In this paper, we present the Request-Only Optimization (ROO) training and modeling paradigm. ROO simultaneously improves the storage and training efficiency as well as the model quality of recommendation systems. We approach this challenge holistically by co-designing data (i.e., request-only data), infrastructure (i.e., a request-only data processing pipeline), and model architecture (i.e., request-only neural architectures). Our ROO paradigm treats a user request, rather than a user impression, as the unit of training data. First, compared with the established impression-level practice, this design achieves native feature deduplication in data logging, saving data storage. Second, by de-duplicating computations and communications across the multiple impressions in a request, the new paradigm enables highly scaled-up neural network architectures, such as Generative Recommenders (GRs) and other request-only-friendly architectures, to better capture user interest signals.
Updated: 2025-08-14 23:38:08
Subjects: cs.IR,cs.AI
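The storage saving from request-level logging can be illustrated with a toy schema (field names hypothetical): impression-level logs copy the request/user features into every impression, whereas a request-keyed log stores them once per request with only per-item payloads attached.

```python
# Impression-level log: request/user features duplicated per impression.
impressions = [
    {"request_id": "r1", "user_features": {"age_bucket": 3}, "item_id": "a"},
    {"request_id": "r1", "user_features": {"age_bucket": 3}, "item_id": "b"},
    {"request_id": "r1", "user_features": {"age_bucket": 3}, "item_id": "c"},
    {"request_id": "r2", "user_features": {"age_bucket": 5}, "item_id": "d"},
]

# Request-level log: one copy of the shared features per request.
request_log = {}
for imp in impressions:
    rec = request_log.setdefault(
        imp["request_id"], {"user_features": imp["user_features"], "items": []}
    )
    rec["items"].append(imp["item_id"])

user_feature_copies = len(request_log)  # one per request instead of one per impression
```

Here four impressions collapse into two request records, so the shared user features are stored twice instead of four times; the same grouping is what lets a model share computation across a request's impressions.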
Quantization through Piecewise-Affine Regularization: Optimization and Statistical Guarantees
Optimization problems over discrete or quantized variables are very challenging in general due to the combinatorial nature of their search space. Piecewise-affine regularization (PAR) provides a flexible modeling and computational framework for quantization based on continuous optimization. In this work, we focus on the setting of supervised learning and investigate the theoretical foundations of PAR from optimization and statistical perspectives. First, we show that in the overparameterized regime, where the number of parameters exceeds the number of samples, every critical point of the PAR-regularized loss function exhibits a high degree of quantization. Second, we derive closed-form proximal mappings for various (convex, quasi-convex, and non-convex) PARs and show how to solve PAR-regularized problems using the proximal gradient method, its accelerated variant, and the Alternating Direction Method of Multipliers. Third, we study statistical guarantees of PAR-regularized linear regression problems; specifically, we can approximate classical formulations of $\ell_1$-, squared $\ell_2$-, and nonconvex regularizations using PAR and obtain similar statistical guarantees with quantized solutions.
Updated: 2025-08-14 23:35:21
Subjects: cs.LG,cs.AI,math.OC,stat.ML
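As a toy illustration of the proximal-gradient route to quantization (not the paper's actual PAR family or its closed-form proximal mappings), consider a regularizer measuring distance to the integer grid, whose prox either snaps an iterate to the nearest grid point or moves it a fixed amount toward it:

```python
def prox_quant(v, lam):
    """Prox-style shrinkage toward the nearest integer grid point.

    Illustrative of a piecewise-affine 'distance to grid' penalty: if v is
    within lam of its nearest grid point, snap to it; otherwise step lam closer.
    """
    q = round(v)
    if abs(v - q) <= lam:
        return float(q)
    return v - lam * (1 if v > q else -1)

def proximal_gradient(target, lam=0.2, step=0.5, iters=50):
    """Minimize 0.5*(x - target)**2 + lam * dist(x, Z) via proximal gradient."""
    x = 0.0
    for _ in range(iters):
        x = x - step * (x - target)   # gradient step on the smooth loss
        x = prox_quant(x, lam)        # proximal step on the regularizer
    return x

x_star = proximal_gradient(2.3)
```

Even though the data term alone would prefer 2.3, the proximal steps drive the iterate to the quantized solution 2.0, mirroring how PAR-regularized critical points exhibit a high degree of quantization.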
ViFusionTST: Deep Fusion of Time-Series Image Representations from Load Signals for Early Bed-Exit Prediction
Bed-related falls remain a major source of injury in hospitals and long-term care facilities, yet many commercial alarms trigger only after a patient has already left the bed. We show that early bed-exit intent can be predicted using only one low-cost load cell mounted under a bed leg. The resulting load signals are first converted into a compact set of complementary images: an RGB line plot that preserves the raw waveforms, and three texture maps (a recurrence plot, a Markov transition field, and a Gramian angular field) that expose higher-order dynamics. We introduce ViFusionTST, a dual-stream Swin Transformer that processes the line plot and texture maps in parallel and fuses them through cross-attention to learn data-driven modality weights. To provide a realistic benchmark, we collected six months of continuous data from 95 beds in a long-term-care facility. On this real-world dataset, ViFusionTST reaches an accuracy of 0.885 and an F1 score of 0.794, surpassing recent 1D and 2D time-series baselines on F1, recall, accuracy, and AUPRC. The results demonstrate that image-based fusion of load-sensor signals for time-series classification is a practical and effective solution for real-time, privacy-preserving fall prevention.
Updated: 2025-08-14 23:34:15
Subjects: cs.CV,cs.AI
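One of the texture maps, the Gramian angular (summation) field, has a simple standard construction: rescale the series to [-1, 1], interpret each value as cos(phi), and form the image of pairwise angle sums. A minimal sketch (a generic version, not the paper's exact preprocessing):

```python
import math

def gramian_angular_field(series):
    """Gramian angular summation field of a 1-D series.

    Each value is rescaled to [-1, 1] and read as cos(phi); the image entry
    (i, j) is cos(phi_i + phi_j) = x_i*x_j - sqrt(1-x_i^2)*sqrt(1-x_j^2).
    """
    lo, hi = min(series), max(series)
    xs = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]
    comp = [math.sqrt(max(0.0, 1.0 - x * x)) for x in xs]
    return [
        [xs[i] * xs[j] - comp[i] * comp[j] for j in range(len(xs))]
        for i in range(len(xs))
    ]

gaf = gramian_angular_field([0.0, 1.0, 2.0, 3.0])
```

The resulting matrix is symmetric and encodes temporal correlations as pairwise angular relationships, which is what gives a 2-D vision backbone access to higher-order dynamics of the 1-D load signal.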
Diffusion is a code repair operator and generator
Code diffusion models generate code by iteratively removing noise from the latent representation of a code snippet. During later steps of the diffusion process, when the code snippet has almost converged, differences between discrete representations of these snippets look like last-mile repairs applied to broken or incomplete code. We evaluate the extent to which this resemblance can be exploited to leverage pre-trained code diffusion models for last-mile repair by considering two applications with significant potential. First, we can use the diffusion model directly as a repair operator, by adding noise to a broken code snippet and resuming the diffusion process. Second, we can use the diffusion model as a generator of arbitrary amounts of training data for (computationally more efficient) last-mile repair models, by sampling an intermediate program (input) and the final program (output) from the diffusion process. We perform experiments in three domains (Python, Excel, and PowerShell) to evaluate both applications and analyze their properties.
Updated: 2025-08-14 23:27:09
Subjects: cs.SE,cs.AI,cs.CL
LETS Forecast: Learning Embedology for Time Series Forecasting
Real-world time series are often governed by complex nonlinear dynamics. Understanding these underlying dynamics is crucial for precise future prediction. While deep learning has achieved major success in time series forecasting, many existing approaches do not explicitly model the dynamics. To bridge this gap, we introduce DeepEDM, a framework that integrates nonlinear dynamical systems modeling with deep neural networks. Inspired by empirical dynamic modeling (EDM) and rooted in Takens' theorem, DeepEDM presents a novel deep model that learns a latent space from time-delayed embeddings, and employs kernel regression to approximate the underlying dynamics, while leveraging efficient implementation of softmax attention and allowing for accurate prediction of future time steps. To evaluate our method, we conduct comprehensive experiments on synthetic data of nonlinear dynamical systems as well as real-world time series across domains. Our results show that DeepEDM is robust to input noise, and outperforms state-of-the-art methods in forecasting accuracy. Our code is available at: https://abrarmajeedi.github.io/deep_edm.
Updated: 2025-08-14 23:19:51
Subjects: cs.LG,stat.ML
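The EDM ingredients DeepEDM builds on, time-delay (Takens) embedding plus kernel regression over past states, can be sketched in a few lines. This is a toy analogue with hand-picked hyperparameters, not DeepEDM's learned latent space or attention-based implementation:

```python
import math

def edm_forecast(series, m=3, tau=1, h=0.01):
    """Kernel-regression forecast on a time-delay (Takens) embedding.

    Embed the series as state vectors (s_t, s_{t-tau}, ..., s_{t-(m-1)tau}),
    then predict the next value as a Gaussian-kernel weighted average of the
    successors of similar past states.
    """
    start = (m - 1) * tau
    library = []
    for t in range(start, len(series) - 1):
        vec = [series[t - k * tau] for k in range(m)]
        library.append((vec, series[t + 1]))
    query = [series[len(series) - 1 - k * tau] for k in range(m)]
    num = den = 0.0
    for vec, nxt in library:
        d2 = sum((a - b) ** 2 for a, b in zip(vec, query))
        w = math.exp(-d2 / h)
        num += w * nxt
        den += w
    return num / den

series = [math.sin(0.1 * i) for i in range(200)]
pred = edm_forecast(series)
```

On a smooth periodic signal, states one period back are near-duplicates of the current state, so their successors dominate the weighted average and the forecast tracks the true continuation closely.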
SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression
Conventional model compression techniques for LLMs address high memory consumption and slow inference, but typically require computationally expensive retraining to preserve accuracy. In contrast, one-shot compression methods eliminate the retraining cost but struggle to reach accuracy comparable to dense models. This paper presents SLIM, a new one-shot compression framework that holistically integrates hardware-friendly quantization, sparsity, and low-rank approximation into a unified process. First, we formulate the quantization process using a probabilistic approach (SLIM-Quant) that enables us to apply uniform quantization. Then, we use an existing one-shot pruning method to apply semi-structured sparsity on top of the quantized weights. Finally, to compensate for the aggregated quantization and sparsity error, we use a novel saliency function with unique invertible and additive properties that enables us to mathematically compute the values of the low-rank adapters. SLIM improves model accuracy by up to 5.66% (LLaMA-2-7B) for 2:4 sparsity with 4-bit weight quantization, outperforming prior methods. Models compressed with SLIM achieve speedups of up to 4.3x and 3.8x on Nvidia RTX3060 and A100 GPUs, respectively. Additionally, they achieve up to 0.23x end-to-end memory reduction compared to their dense counterparts. We also propose an optional PEFT recipe that further improves accuracy by up to 1.66% (LLaMA-2-13B) compared to SLIM without fine-tuning.
Updated: 2025-08-14 23:13:36
Subjects: cs.LG,cs.AI,cs.PF
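The two hardware-friendly ingredients, 2:4 semi-structured sparsity and uniform symmetric quantization, are standard operations that can be sketched directly. This is a generic illustration of the two formats, not SLIM's probabilistic quantizer or its saliency-based error compensation:

```python
def two_four_sparsify(weights):
    """2:4 semi-structured sparsity: in each group of 4 consecutive weights,
    keep the 2 largest-magnitude entries and zero the rest."""
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i : i + 4]
        keep = sorted(range(len(group)), key=lambda k: -abs(group[k]))[:2]
        out.extend(v if k in keep else 0.0 for k, v in enumerate(group))
    return out

def uniform_quantize(weights, bits=4):
    """Uniform symmetric quantization onto 2**(bits-1) - 1 levels per sign."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in weights) / qmax or 1.0
    return [round(v / scale) * scale for v in weights]

w = [0.9, -0.1, 0.05, -0.7, 0.2, 0.6, -0.3, 0.1]
compressed = uniform_quantize(two_four_sparsify(w))
```

The resulting tensor is both 50% sparse in the GPU-friendly 2:4 pattern and representable with 4-bit codes plus one scale, the combination whose accumulated error SLIM then offsets with low-rank adapters.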
Hybrid-Hierarchical Fashion Graph Attention Network for Compatibility-Oriented and Personalized Outfit Recommendation
The rapid expansion of the fashion industry and the growing variety of products have made it challenging for users to find compatible items on e-commerce platforms. Effective fashion recommendation systems are crucial for filtering irrelevant items and suggesting suitable ones. However, simultaneously addressing outfit compatibility and personalized recommendation remains a significant challenge, as these aspects are often treated independently in existing studies, overlooking the complex interactions between items and user preferences. This research introduces a new framework named FGAT, inspired by the HFGN model, which leverages graph neural networks and graph attention mechanisms to tackle this issue. The proposed framework constructs a three-tier hierarchical graph of users, outfits, and items, integrating visual and textual features to simultaneously model outfit compatibility and user preferences. A graph attention mechanism dynamically weights node importance during representation propagation, enabling the capture of key interactions and generating precise representations for both user preferences and outfit compatibility. Evaluated on the POG dataset, FGAT outperforms baseline models such as HFGN, achieving improved results in precision, HR, recall, NDCG, and accuracy. These results demonstrate that combining multimodal visual-textual features with a hierarchical graph structure and attention mechanisms significantly enhances the accuracy and efficiency of personalized fashion recommendation systems.
Updated: 2025-08-14 23:09:57
Subjects: cs.LG,cs.IR
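The core attention operation, scoring neighbors against a query node and aggregating them by softmax weights, can be sketched generically (this is plain dot-product attention for illustration, not FGAT's exact scoring function or its learned parameters):

```python
import math

def attention_aggregate(query, neighbors):
    """Softmax-attention aggregation over neighbor embeddings: score each
    neighbor by dot product with the query, softmax the scores, and return
    the weighted sum together with the weights."""
    scores = [sum(q * n for q, n in zip(query, nb)) for nb in neighbors]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(query)
    agg = [sum(w * nb[d] for w, nb in zip(weights, neighbors)) for d in range(dim)]
    return agg, weights

agg, weights = attention_aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Neighbors more aligned with the query receive larger weights, which is the mechanism that lets the hierarchical graph emphasize the item interactions most relevant to a given user or outfit node.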
AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (Jimenez et al., 2024) and mathematics (Glazer et al., 2024). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 154 coding tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner uses a simple, budgeted loop that edits code, compiles and runs it, profiles performance, verifies correctness on tests, and selects the fastest valid version. AlgoTuner achieves an average 1.72x speedup against our reference solvers, which use libraries such as SciPy, sk-learn and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.
Updated: 2025-08-14 23:04:38
Subjects: cs.SE,cs.AI,cs.CL,cs.LG
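The evaluate-verify-select core of such a loop can be sketched in plain Python. This is a simplified analogue of AlgoTuner's inner loop (no LM editing or profiling, and all candidate functions are hypothetical stand-ins):

```python
import time

def select_fastest_valid(candidates, tests, repeats=3):
    """Run each candidate, keep only those matching expected outputs, and
    return the fastest valid one with its best wall-clock time."""
    best, best_t = None, float("inf")
    for fn in candidates:
        # Verify correctness first; incorrect candidates are rejected outright.
        if any(fn(x) != expected for x, expected in tests):
            continue
        # Time the candidate over a few repeats, keeping the minimum.
        t = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for x, _ in tests:
                fn(x)
            t = min(t, time.perf_counter() - start)
        if t < best_t:
            best, best_t = fn, t
    return best, best_t

def sum_fast(xs):
    return sum(xs)

def sum_slow(xs):
    total = 0
    for i in range(len(xs)):
        for _ in range(i + 1):
            pass  # wasted quadratic work, simulating an inefficient algorithm
        total += xs[i]
    return total

def sum_wrong(xs):
    return sum(xs) + 1  # fast but incorrect

data = list(range(2000))
tests = [(data, sum(data))]
best, _ = select_fastest_valid([sum_wrong, sum_slow, sum_fast], tests)
```

Correctness gating before timing matters: without it, the fast-but-wrong candidate would win, which is exactly the failure mode a benchmark like AlgoTune must guard against.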
Language-Based Bayesian Optimization Research Assistant (BORA)
Many important scientific problems involve multivariate optimization coupled with slow and laborious experimental measurements. These complex, high-dimensional searches can be defined by non-convex optimization landscapes that resemble needle-in-a-haystack surfaces, leading to entrapment in local minima. Contextualizing optimizers with human domain knowledge is a powerful approach to guide searches to localized fruitful regions. However, this approach is susceptible to human confirmation bias and it is also challenging for domain experts to keep track of the rapidly expanding scientific literature. Here, we propose the use of Large Language Models (LLMs) for contextualizing Bayesian optimization (BO) via a hybrid optimization framework that intelligently and economically blends stochastic inference with domain knowledge-based insights from the LLM, which is used to suggest new, better-performing areas of the search space for exploration. Our method fosters user engagement by offering real-time commentary on the optimization progress, explaining the reasoning behind the search strategies. We validate the effectiveness of our approach on synthetic benchmarks with up to 15 independent variables and demonstrate the ability of LLMs to reason in four real-world experimental tasks where context-aware suggestions boost optimization performance substantially.
Updated: 2025-08-14 22:51:01
Subjects: cs.LG,cs.AI
Triangle Counting with Local Edge Differential Privacy
Many deployments of differential privacy in industry are in the local model, where each party releases its private information via a differentially private randomizer. We study triangle counting in the local model with edge differential privacy (that, intuitively, requires that the outputs of the algorithm on graphs that differ in one edge be indistinguishable). In this model, each party's local view consists of the adjacency list of one vertex. We investigate both noninteractive and interactive variants of the model. In the noninteractive model, we prove that additive $\Omega(n^2)$ error is necessary for sufficiently small constant $\varepsilon$, where $n$ is the number of nodes and $\varepsilon$ is the privacy parameter. This lower bound is our main technical contribution. It uses a reconstruction attack with a new class of linear queries and a novel mix-and-match strategy of running the local randomizers with different completions of their adjacency lists. It matches the additive error of the algorithm based on Randomized Response, proposed by Imola, Murakami and Chaudhuri (USENIX2021) and analyzed by Imola, Murakami and Chaudhuri (CCS2022) for constant $\varepsilon$. We use a different postprocessing of Randomized Response and provide tight bounds on the variance of the resulting algorithm. In the interactive setting, we prove a lower bound of $\Omega(n^{3/2}/\varepsilon)$ on the additive error for $\varepsilon\leq 1$. Previously, no hardness results were known for interactive, edge-private algorithms in the local model, except for those that follow trivially from the results for the central model. Our work significantly improves on the state of the art in differentially private graph analysis in the local model.
Updated: 2025-08-14 22:33:06
Subjects: cs.DS,cs.CR
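The Randomized Response primitive underlying the matched upper bound is simple to state: each adjacency bit is reported truthfully with probability e^eps/(1 + e^eps) and flipped otherwise, and the analyst debiases the noisy reports. A minimal sketch of the primitive on individual bits (not the paper's full triangle-counting algorithm or its postprocessing):

```python
import math
import random

def randomized_response(bit, eps, rng):
    """Locally edge-private bit release: keep the true bit with probability
    e^eps / (1 + e^eps), otherwise flip it."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < keep else 1 - bit

def debias(noisy_bits, eps):
    """Unbiased estimate of the true bit mean from randomized-response outputs:
    (observed mean - p) / (1 - 2p), where p is the flip probability."""
    p = 1.0 / (1.0 + math.exp(eps))
    m = sum(noisy_bits) / len(noisy_bits)
    return (m - p) / (1.0 - 2.0 * p)

rng = random.Random(0)
true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(20000)]
noisy = [randomized_response(b, 1.0, rng) for b in true_bits]
est = debias(noisy, 1.0)
```

Aggregate statistics such as edge density are recoverable this way, but the noise compounds over the bit-products that triangles require, which is why the paper's lower bounds on additive error are so large.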
HEIR: A Universal Compiler for Homomorphic Encryption
This work presents Homomorphic Encryption Intermediate Representation (HEIR), a unified approach to building homomorphic encryption (HE) compilers. HEIR aims to support all mainstream techniques in homomorphic encryption, integrate with all major software libraries and hardware accelerators, and advance the field by providing a platform for research and benchmarking. Built on the MLIR compiler framework, HEIR introduces HE-specific abstraction layers at which existing optimizations and new research ideas may be easily implemented. Although many HE optimization techniques have been proposed, it remains difficult to combine or compare them effectively. HEIR provides a means to effectively explore the space of HE optimizations. HEIR addresses the entire HE stack and includes support for various frontends, including Python. The contribution of this work includes: (1) We introduce HEIR as a framework for building HE compilers. (2) We validate HEIR's design by porting a large fraction of the HE literature to HEIR, and we argue that HEIR can tackle more complicated and diverse programs than prior literature. (3) We provide evidence that HEIR is emerging as the de facto HE compiler for academic research and industry development.
Updated: 2025-08-14 22:19:53
Subjects: cs.CR
Utilizing Vision-Language Models as Action Models for Intent Recognition and Assistance
Human-robot collaboration requires robots to quickly infer user intent, provide transparent reasoning, and assist users in achieving their goals. Our recent work introduced GUIDER, our framework for inferring navigation and manipulation intents. We propose augmenting GUIDER with a vision-language model (VLM) and a text-only language model (LLM) to form a semantic prior that filters objects and locations based on the mission prompt. A vision pipeline (YOLO for object detection and the Segment Anything Model for instance segmentation) feeds candidate object crops into the VLM, which scores their relevance given an operator prompt; in addition, the list of detected object labels is ranked by a text-only LLM. These scores weight the existing navigation and manipulation layers of GUIDER, selecting context-relevant targets while suppressing unrelated objects. Once the combined belief exceeds a threshold, autonomy changes occur, enabling the robot to navigate to the desired area and retrieve the desired object, while adapting to any changes in the operator's intent. Future work will evaluate the system on Isaac Sim using a Franka Emika arm on a Ridgeback base, with a focus on real-time assistance.
Updated: 2025-08-14 22:19:09
Subjects: cs.RO,cs.AI,cs.HC
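The way semantic scores can gate an intent estimate is easy to sketch: weight a prior belief over candidate targets by per-object relevance scores, renormalize, and trigger an autonomy change when the belief crosses a threshold. All numbers, names, and the multiplicative update below are hypothetical illustrations, not GUIDER's actual update rule:

```python
def update_belief(prior, vlm_scores):
    """Weight a prior belief over candidate targets by VLM relevance scores
    and renormalize to a probability distribution."""
    weighted = {k: prior[k] * vlm_scores.get(k, 0.0) for k in prior}
    z = sum(weighted.values())
    return {k: v / z for k, v in weighted.items()}

prior = {"mug": 0.4, "book": 0.3, "plant": 0.3}
scores = {"mug": 0.9, "book": 0.1, "plant": 0.05}  # e.g. for a "fetch a drink" prompt
belief = update_belief(prior, scores)
autonomy_engaged = max(belief.values()) > 0.8
```

Here the prompt-conditioned scores suppress unrelated objects, the combined belief concentrates on one target, and crossing the threshold is the point at which the robot would begin navigating and retrieving autonomously.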
Predictive Multimodal Modeling of Diagnoses and Treatments in EHR
While the ICD code assignment problem has been widely studied, most works have focused on post-discharge document classification. Models for early forecasting of this information could be used for identifying health risks, suggesting effective treatments, or optimizing resource allocation. To address the challenge of predictive modeling using the limited information at the beginning of a patient stay, we propose a multimodal system to fuse clinical notes and tabular events captured in electronic health records. The model integrates pre-trained encoders, feature pooling, and cross-modal attention to learn optimal representations across modalities and balance their presence at every temporal point. Moreover, we present a weighted temporal loss that adjusts its contribution at each point in time. Experiments show that these strategies enhance the early prediction model, outperforming the current state-of-the-art systems.
Updated: 2025-08-14 22:14:18
Subjects: cs.LG
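The weighted temporal loss idea, letting each time point contribute to the training objective in proportion to a weight, can be sketched with a per-step binary cross-entropy. This is an illustrative form with made-up numbers, not the paper's exact loss or weighting schedule:

```python
import math

def weighted_temporal_bce(probs, labels, weights):
    """Binary cross-entropy per time step, with a per-step weight adjusting
    each point's contribution; normalized by the total weight."""
    total = sum(
        w * -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for p, y, w in zip(probs, labels, weights)
    )
    return total / sum(weights)

# Later time steps (more observed history) can be weighted more heavily.
loss = weighted_temporal_bce(probs=[0.8, 0.4], labels=[1, 0], weights=[0.5, 1.0])
```

Down-weighting early, information-poor time points keeps the model from being penalized too harshly for predictions made when little of the stay has been observed.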
Compressive Meta-Learning
The rapid expansion in the size of new datasets has created a need for fast and efficient parameter-learning techniques. Compressive learning is a framework that enables efficient processing by using random, non-linear features to project large-scale databases onto compact, information-preserving representations whose dimensionality is independent of the number of samples and can be easily stored, transferred, and processed. These database-level summaries are then used to decode parameters of interest from the underlying data distribution without requiring access to the original samples, offering an efficient and privacy-friendly learning framework. However, both the encoding and decoding techniques are typically randomized and data-independent, failing to exploit the underlying structure of the data. In this work, we propose a framework that meta-learns both the encoding and decoding stages of compressive learning methods by using neural networks that provide faster and more accurate systems than the current state-of-the-art approaches. To demonstrate the potential of the presented Compressive Meta-Learning framework, we explore multiple applications -- including neural network-based compressive PCA, compressive ridge regression, compressive k-means, and autoencoders.
Updated: 2025-08-14 22:08:06
Subjects: cs.LG,cs.AI,cs.CE,cs.DB,68T07, 68T05, 68T09,I.2.6; I.5.1; G.3; H.2.8
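The classical, data-independent encoding that this work learns to replace can be sketched directly: a sketch is the empirical mean of random nonlinear features of the samples, so its size depends only on the number of features, never on the number of samples. A minimal scalar version (generic compressive learning, not the paper's meta-learned encoder):

```python
import math
import random

def random_feature_sketch(samples, weights, phases):
    """Compress a dataset into a fixed-size sketch: the empirical mean of
    random Fourier features cos(w*x + b) over all samples."""
    m = len(weights)
    sketch = [0.0] * m
    for x in samples:
        for j in range(m):
            sketch[j] += math.cos(weights[j] * x + phases[j])
    return [s / len(samples) for s in sketch]

rng = random.Random(0)
m = 16
w = [rng.gauss(0, 1) for _ in range(m)]
b = [rng.uniform(0, 2 * math.pi) for _ in range(m)]

near_zero_a = [rng.gauss(0, 1) for _ in range(1000)]
near_zero_b = [rng.gauss(0, 1) for _ in range(1000)]
shifted = [rng.gauss(5, 1) for _ in range(1000)]

s_a = random_feature_sketch(near_zero_a, w, b)
s_b = random_feature_sketch(near_zero_b, w, b)
s_c = random_feature_sketch(shifted, w, b)

def dist(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```

Two samples of the same distribution yield nearly identical sketches while a shifted distribution does not, which is what makes the sketch a sufficient summary for decoding parameters of interest without the original data; the paper's contribution is to meta-learn both this encoder and the decoder with neural networks.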
An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models
We develop an analytical framework for understanding how the generated distribution evolves during diffusion model training. Leveraging a Gaussian-equivalence principle, we solve the full-batch gradient-flow dynamics of linear and convolutional denoisers and integrate the resulting probability-flow ODE, yielding analytic expressions for the generated distribution. The theory exposes a universal inverse-variance spectral law: the time for an eigen- or Fourier mode to match its target variance scales as $\tau\propto\lambda^{-1}$, so high-variance (coarse) structure is mastered orders of magnitude sooner than low-variance (fine) detail. Extending the analysis to deep linear networks and circulant full-width convolutions shows that weight sharing merely multiplies learning rates, accelerating but not eliminating the bias, whereas local convolution introduces a qualitatively different bias. Experiments on Gaussian and natural-image datasets confirm that the spectral law persists in deep MLP-based UNets. Convolutional U-Nets, however, display rapid, near-simultaneous emergence of many modes, implicating local convolution in reshaping the learning dynamics. These results underscore how data covariance governs the order and speed with which diffusion models learn, and they call for deeper investigation of the unique inductive biases introduced by local convolution.
Updated: 2025-08-14 21:54:26
Subjects: cs.LG,cs.CV,math.ST,stat.ML,stat.TH,68T07, 60G15,F.2.2; G.1.2; G.3; I.2.6
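The inverse-variance law admits a one-mode caricature (an illustrative reduction under strong simplifying assumptions, not the paper's full Gaussian-equivalence derivation): restrict the denoising objective to a single eigendirection with data variance $\lambda$ and train a scalar coefficient $w$ by gradient flow on a loss whose curvature is proportional to $\lambda$,

```latex
L(w) = \frac{\lambda}{2}\,(1 - w)^2, \qquad
\dot{w} = -\,\partial_w L = \lambda\,(1 - w), \qquad w(0) = 0
\;\Longrightarrow\; w(t) = 1 - e^{-\lambda t}.
```

The time for the mode to come within a factor $1-\epsilon$ of its target is then $\tau(\epsilon) = \lambda^{-1}\ln(1/\epsilon) \propto \lambda^{-1}$: high-variance (coarse) modes converge long before low-variance (fine) ones, which is the spectral law stated in the abstract.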
Relative Advantage Debiasing for Watch-Time Prediction in Short-Video Recommendation
Watch time is widely used as a proxy for user satisfaction in video recommendation platforms. However, raw watch times are influenced by confounding factors such as video duration, popularity, and individual user behaviors, potentially distorting preference signals and resulting in biased recommendation models. We propose a novel relative advantage debiasing framework that corrects watch time by comparing it to empirically derived reference distributions conditioned on user and item groups. This approach yields a quantile-based preference signal and introduces a two-stage architecture that explicitly separates distribution estimation from preference learning. Additionally, we present distributional embeddings to efficiently parameterize watch-time quantiles without requiring online sampling or storage of historical data. Both offline and online experiments demonstrate significant improvements in recommendation accuracy and robustness compared to existing baseline methods.
Updated: 2025-08-14 21:52:00
Subjects: cs.LG,cs.IR
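The quantile-based preference signal can be made concrete with synthetic data; the bucket names, distributions, and sizes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference data: raw watch times grouped by video-duration
# bucket. Long videos naturally accrue longer raw watch times, which is
# exactly the confounding the framework corrects for.
ref = {
    "short": rng.gamma(shape=2.0, scale=5.0,  size=10_000),   # ~10 s mean
    "long":  rng.gamma(shape=2.0, scale=60.0, size=10_000),   # ~120 s mean
}

def relative_advantage(watch_time, bucket):
    """Quantile of `watch_time` within its group's reference distribution."""
    r = ref[bucket]
    return float((r < watch_time).mean())

# 30 s on a short video is a far stronger preference signal than 30 s on a
# long one, even though the raw watch time is identical.
print(relative_advantage(30.0, "short"))  # high quantile
print(relative_advantage(30.0, "long"))   # low quantile
```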
Learn to optimize for automatic proton PBS treatment planning for H&N cancers
Proton PBS treatment planning for H&N cancers involves numerous conflicting objectives, requiring significant effort from human planners to balance and satisfy multiple clinical goals during planning. To achieve this, experience-demanding objective parameter adjustment and computationally expensive inverse optimization are performed iteratively. Extensive efforts have been made to automatically adjust objective parameters, but the most time-consuming component, i.e., inverse optimization, still relies heavily on theory-driven approaches. We propose a data-driven inverse optimizer and integrate it into a PPO-based automatic treatment planning framework to automatically generate high-quality plans within a clinically acceptable planning time. The inverse optimizer is a learn-to-optimize (L2O) method that predicts update steps by learning from the task-specific data distribution. For the first time, we integrate techniques designed for long-context processing, originally developed for LLMs, into a Transformer-based L2O framework to address the scalability issue of existing L2O methods. The PPO framework functions as an outer-loop virtual planner, autonomously adjusting objective parameters through a policy network, and the dose predictor is used to initialize objective parameters. The inner-loop L2O inverse optimizer computes machine-deliverable MU values based on objectives refined by the PPO policy network. Data from 97 patients were collected in this study; compared with L-BFGS-B, our L2O-based inverse optimizer improves effectiveness and efficiency by 22.97% and 36.41%, respectively. In conjunction with the PPO-based learned virtual planner, plans generated by our framework within an average of 2.55 hours show improved or comparable OAR sparing and superior target coverage, compared with human-generated plans, for patients with different prescription dose levels, numbers of target volumes, beam angles, etc.
Updated: 2025-08-14 21:50:31
Subjects: cs.AI,cs.LG
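The learn-to-optimize idea, fitting the update rule itself to a task distribution, can be shown at toy scale. The sketch below replaces the paper's Transformer-predicted update steps with a single meta-learned scalar step size over random diagonal quadratics; everything here is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy task distribution: random diagonal quadratics f(x) = 0.5 x^T D x - b^T x.
def sample_task():
    d = rng.uniform(0.5, 3.0, size=5)   # per-coordinate curvatures
    b = rng.normal(size=5)
    return d, b

def run(step, task, iters=30):
    d, b = task
    x = np.zeros(5)
    for _ in range(iters):
        x = x - step * (d * x - b)               # plain gradient step
    return float(np.linalg.norm(x - b / d))      # distance to the optimum

# "Meta-training": choose the update rule (here just a scalar step size, a
# stand-in for a learned network predicting full update steps) that minimizes
# post-optimization error over sampled tasks.
train_tasks = [sample_task() for _ in range(20)]
grid = np.linspace(0.01, 0.5, 25)
learned = min(grid, key=lambda s: np.mean([run(s, t) for t in train_tasks]))

# The learned rule transfers to fresh tasks from the same distribution.
test_tasks = [sample_task() for _ in range(20)]
default_err = np.mean([run(0.01, t) for t in test_tasks])
learned_err = np.mean([run(learned, t) for t in test_tasks])
print(f"learned step={learned:.3f}  err={learned_err:.2e}  vs default err={default_err:.2e}")
```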
A Feasibility Experiment on the Application of Predictive Coding to Instant Messaging Corpora
Predictive coding, the term used in the legal industry for document classification using machine learning, presents additional challenges when the dataset comprises instant messages, due to their informal nature and smaller sizes. In this paper, we exploit a data management workflow to group messages into day chats, followed by feature selection and a logistic regression classifier to provide an economically feasible predictive coding solution. We also improve the solution's baseline model performance by dimensionality reduction, with focus on quantitative features. We test our methodology on an Instant Bloomberg dataset, rich in quantitative information. In parallel, we provide an example of the cost savings of our approach.
Updated: 2025-08-14 21:43:13
Subjects: cs.LG
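The day-chat grouping step can be sketched in a few lines; the messages below are invented, and a real pipeline would follow this with feature selection and a logistic regression classifier:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical instant messages: (chat_id, timestamp, text). Individual
# messages are too short and informal to classify on their own.
messages = [
    ("room1", "2013-05-01T09:14", "px at 101.2, can you confirm?"),
    ("room1", "2013-05-01T16:02", "confirmed, booking 5mm"),
    ("room1", "2013-05-02T10:30", "lunch later?"),
    ("room2", "2013-05-01T11:45", "resending the term sheet"),
]

# Group messages into "day chats" (one document per chat per calendar day) so
# each classification unit carries enough text for feature extraction.
day_chats = defaultdict(list)
for chat_id, ts, text in messages:
    day = datetime.fromisoformat(ts).date()
    day_chats[(chat_id, day)].append(text)

docs = {key: " ".join(texts) for key, texts in day_chats.items()}
for key, doc in sorted(docs.items(), key=lambda kv: (kv[0][0], kv[0][1])):
    print(key, "->", doc)
```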
A Constant-Time Hardware Architecture for the CSIDH Key-Exchange Protocol
The commutative supersingular isogeny Diffie-Hellman (CSIDH) algorithm is a promising post-quantum key exchange protocol, notable for its exceptionally small key sizes, but hindered by computationally intensive key generation. Furthermore, practical implementations must operate in constant time to mitigate side-channel vulnerabilities, which presents an additional performance challenge. This paper presents, to our knowledge, the first comprehensive hardware study of CSIDH, establishing a performance baseline with a unified architecture on both field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) platforms. The architecture features a top-level finite state machine (FSM) that orchestrates a deeply pipelined arithmetic logic unit (ALU) to accelerate the underlying 512-bit finite field operations. The ALU employs a parallelized schoolbook multiplier, completing a 512$\times$512-bit multiplication in 22 clock cycles and enabling a full Montgomery modular multiplication in 87 cycles. The constant-time CSIDH-512 design requires $1.03\times10^{8}$ clock cycles per key generation. When implemented on a Xilinx Zynq UltraScale+ FPGA, the architecture achieves a 200 MHz clock frequency, corresponding to a 515 ms latency. For ASIC implementation in a 180 nm process, the design requires $1.065\times10^{8}$ clock cycles and achieves a frequency of approximately 180 MHz, resulting in a key generation latency of 591 ms. By providing the first public hardware performance metrics for CSIDH on both FPGA and ASIC platforms, this work delivers a crucial benchmark for future isogeny-based post-quantum cryptography (PQC) accelerators.
Updated: 2025-08-14 21:37:29
Subjects: cs.CR
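The quoted latencies follow directly from cycle count divided by clock frequency; a quick check (the paper's 591 ms ASIC figure is consistent with an achieved clock marginally above 180 MHz):

```python
# Latency = cycles / frequency, reproducing the figures quoted above.
def latency_ms(cycles, freq_hz):
    return cycles / freq_hz * 1e3

fpga_ms = latency_ms(1.03e8, 200e6)   # Zynq UltraScale+ FPGA at 200 MHz
asic_ms = latency_ms(1.065e8, 180e6)  # 180 nm ASIC at ~180 MHz
print(f"FPGA: {fpga_ms:.0f} ms  ASIC: {asic_ms:.1f} ms")  # ~515 ms and ~592 ms
```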
Are AI Machines Making Humans Obsolete?
This chapter starts with a sketch of how we got to "generative AI" (GenAI) and a brief summary of the various impacts it had so far. It then discusses some of the opportunities of GenAI, followed by the challenges and dangers, including dystopian outcomes resulting from using uncontrolled machine learning and our failures to understand the results. It concludes with some suggestions for how to control GenAI and address its dangers.
Updated: 2025-08-14 21:24:31
Subjects: cs.CY,cs.AI
Prototype-Guided Diffusion: Visual Conditioning without External Memory
Diffusion models have emerged as a leading framework for high-quality image generation, offering stable training and strong performance across diverse domains. However, they remain computationally intensive, particularly during the iterative denoising process. Latent-space models like Stable Diffusion alleviate some of this cost by operating in compressed representations, though at the expense of fine-grained detail. More recent approaches such as Retrieval-Augmented Diffusion Models (RDM) address efficiency by conditioning denoising on similar examples retrieved from large external memory banks. While effective, these methods introduce drawbacks: they require costly storage and retrieval infrastructure, depend on static vision-language models like CLIP for similarity, and lack adaptability during training. We propose the Prototype Diffusion Model (PDM), a method that integrates prototype learning directly into the diffusion process for efficient and adaptive visual conditioning - without external memory. Instead of retrieving reference samples, PDM constructs a dynamic set of compact visual prototypes from clean image features using contrastive learning. These prototypes guide the denoising steps by aligning noisy representations with semantically relevant visual patterns, enabling efficient generation with strong semantic grounding. Experiments show that PDM maintains high generation quality while reducing computational and storage overhead, offering a scalable alternative to retrieval-based conditioning in diffusion models.
Updated: 2025-08-14 21:24:11
Subjects: cs.LG
Abundance-Aware Set Transformer for Microbiome Sample Embedding
Representing microbiome samples for input into LLMs is essential for downstream tasks such as phenotype prediction and environmental classification. While prior studies have explored embedding-based representations of each microbiome sample, most rely on simple averaging over sequence embeddings, often overlooking the biological importance of taxa abundance. In this work, we propose an abundance-aware variant of the Set Transformer to construct fixed-size sample-level embeddings by weighting sequence embeddings according to their relative abundance. Without modifying the model architecture, we replicate embedding vectors proportionally to their abundance and apply self-attention-based aggregation. Our method outperforms average pooling and unweighted Set Transformers on real-world microbiome classification tasks, achieving perfect performance in some cases. These results demonstrate the utility of abundance-aware aggregation for robust and biologically informed microbiome representation. To the best of our knowledge, this is one of the first approaches to integrate sequence-level abundance into Transformer-based sample embeddings.
Updated: 2025-08-14 21:15:53
Subjects: cs.LG
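For mean pooling, replicating each sequence embedding in proportion to its abundance is equivalent to a weighted average, which a small sketch makes concrete (the embeddings and abundances below are invented; the paper's attention-based aggregation would act on the replicated set):

```python
import numpy as np

# Hypothetical 4-dim embeddings for three sequences in one microbiome sample,
# with relative abundances summing to 1.
emb = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])
abundance = np.array([0.7, 0.2, 0.1])

# Plain average pooling ignores abundance entirely.
avg_pool = emb.mean(axis=0)

# Abundance-aware pooling: for mean aggregation, replicating embeddings
# proportionally to abundance reduces to an abundance-weighted average.
weighted = (abundance[:, None] * emb).sum(axis=0)

print(avg_pool)    # dominant taxon diluted to 1/3
print(weighted)    # dominant taxon dominates the sample embedding
```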
LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters
Generating high-quality and temporally synchronized audio from video content is essential for video editing and post-production tasks, enabling the creation of semantically aligned audio for silent videos. However, most existing approaches focus on short-form audio generation for video segments under 10 seconds or rely on noisy datasets for long-form video-to-audio synthesis. To address these limitations, we introduce LD-LAudio-V1, an extension of state-of-the-art video-to-audio models that incorporates dual lightweight adapters to enable long-form audio generation. In addition, we release a clean and human-annotated video-to-audio dataset that contains pure sound effects without noise or artifacts. Our method significantly reduces splicing artifacts and temporal inconsistencies while maintaining computational efficiency. Compared to direct fine-tuning with short training videos, LD-LAudio-V1 achieves significant improvements across multiple metrics: $FD_{\text{passt}}$ 450.00 $\rightarrow$ 327.29 (+27.27%), $FD_{\text{panns}}$ 34.88 $\rightarrow$ 22.68 (+34.98%), $FD_{\text{vgg}}$ 3.75 $\rightarrow$ 1.28 (+65.87%), $KL_{\text{panns}}$ 2.49 $\rightarrow$ 2.07 (+16.87%), $KL_{\text{passt}}$ 1.78 $\rightarrow$ 1.53 (+14.04%), $IS_{\text{panns}}$ 4.17 $\rightarrow$ 4.30 (+3.12%), $IB_{\text{score}}$ 0.25 $\rightarrow$ 0.28 (+12.00%), $Energy\Delta10\text{ms}$ 0.3013 $\rightarrow$ 0.1349 (+55.23%), $Energy\Delta10\text{ms(vs.GT)}$ 0.0531 $\rightarrow$ 0.0288 (+45.76%), and $Sem.\,Rel.$ 2.73 $\rightarrow$ 3.28 (+20.15%). Our dataset aims to facilitate further research in long-form video-to-audio generation and is available at https://github.com/deepreasonings/long-form-video2audio.
Updated: 2025-08-14 21:11:57
Subjects: cs.SD,cs.AI,cs.CV,eess.AS
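The reported relative improvements can be recomputed from the quoted metric values; note that the direction of improvement flips between lower-is-better metrics (FD, KL, energy deltas) and higher-is-better ones (IS, IB score, semantic relatedness):

```python
# Sanity-check the relative improvements quoted above.
def gain_lower_better(old, new):
    """Percent improvement when a smaller metric value is better."""
    return (old - new) / old * 100

def gain_higher_better(old, new):
    """Percent improvement when a larger metric value is better."""
    return (new - old) / old * 100

print(round(gain_lower_better(450.00, 327.29), 2))   # FD_passt  -> 27.27
print(round(gain_lower_better(34.88, 22.68), 2))     # FD_panns  -> 34.98
print(round(gain_higher_better(2.73, 3.28), 2))      # Sem. Rel. -> 20.15
```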
From Individual to Multi-Agent Algorithmic Recourse: Minimizing the Welfare Gap via Capacitated Bipartite Matching
Decision makers are increasingly relying on machine learning in sensitive situations. In such settings, algorithmic recourse aims to provide individuals with actionable and minimally costly steps to reverse unfavorable AI-driven decisions. While existing research predominantly focuses on single-individual (i.e., seeker) and single-model (i.e., provider) scenarios, real-world applications often involve multiple interacting stakeholders. Optimizing outcomes for seekers under an individual welfare approach overlooks the inherently multi-agent nature of real-world systems, where individuals interact and compete for limited resources. To address this, we introduce a novel framework for multi-agent algorithmic recourse that accounts for multiple recourse seekers and recourse providers. We model this many-to-many interaction as a capacitated weighted bipartite matching problem, where matches are guided by both recourse cost and provider capacity. Edge weights, reflecting recourse costs, are optimized for social welfare while quantifying the welfare gap between individual welfare and this collectively feasible outcome. We propose a three-layer optimization framework: (1) basic capacitated matching, (2) optimal capacity redistribution to minimize the welfare gap, and (3) cost-aware optimization balancing welfare maximization with capacity adjustment costs. Experimental validation on synthetic and real-world datasets demonstrates that our framework enables the many-to-many algorithmic recourse to achieve near-optimal welfare with minimum modification in system settings. This work extends algorithmic recourse from individual recommendations to system-level design, providing a tractable path toward higher social welfare while maintaining individual actionability.
Updated: 2025-08-14 21:04:24
Subjects: cs.AI
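The capacitated matching at the core of the framework can be sketched by expanding each provider into unit-capacity slots; the costs and capacities below are invented, and realistic instances would use a min-cost-flow solver rather than enumeration:

```python
from itertools import permutations

# Hypothetical recourse costs: cost[i][j] = effort for seeker i to satisfy
# provider j's acceptance criteria.
cost = [[1.0, 4.0],
        [2.0, 1.5],
        [3.0, 0.5]]
capacity = [2, 1]   # provider 0 can serve two seekers, provider 1 only one

# Capacity expansion: replicate each provider into `capacity[j]` unit slots,
# then exhaustively match seekers to slots (fine for tiny instances only).
slots = [j for j, c in enumerate(capacity) for _ in range(c)]

def best_matching():
    best = None
    for perm in set(permutations(slots, len(cost))):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if best is None or total < best[0]:
            best = (total, perm)
    return best

total, assign = best_matching()
print(total, assign)   # social-welfare-optimal assignment under capacities
```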
Functional Analysis of Variance for Association Studies
While progress has been made in identifying common genetic variants associated with human diseases, for most common complex diseases the identified genetic variants account for only a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With the advance of next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. In this paper, we propose a functional analysis of variance (FANOVA) method for testing an association of sequence variants in a genomic region with a qualitative trait. FANOVA has a number of advantages: (1) it tests for a joint effect of gene variants, including both common and rare; (2) it fully utilizes linkage disequilibrium and genetic position information; and (3) it allows for either protective or risk-increasing causal variants. Through simulations, we show that FANOVA outperforms two popular methods, SKAT and a previously proposed method based on functional linear models (FLM), especially when the sample size of a study is small and/or sequence variants have low to moderate effects. We conduct an empirical study by applying the three methods (FANOVA, SKAT and FLM) to sequencing data from the Dallas Heart Study. While SKAT and FLM detected ANGPTL4 and ANGPTL3, respectively, as associated with obesity, FANOVA was able to identify both genes as associated with obesity.
Updated: 2025-08-14 21:02:45
Subjects: stat.AP,cs.LG,stat.ME
Adaptive Bayesian Optimization for Robust Identification of Stochastic Dynamical Systems
This paper deals with the identification of linear stochastic dynamical systems, where the unknowns include system coefficients and noise variances. Conventional approaches that rely on the maximum likelihood estimation (MLE) require nontrivial gradient computations and are prone to local optima. To overcome these limitations, a sample-efficient global optimization method based on Bayesian optimization (BO) is proposed, using an ensemble Gaussian process (EGP) surrogate with weighted kernels from a predefined dictionary. This ensemble enables a richer function space and improves robustness over single-kernel BO. Each objective evaluation is efficiently performed via Kalman filter recursion. Extensive experiments across parameter settings and sampling intervals show that the EGP-based BO consistently outperforms MLE via steady-state filtering and expectation-maximization (whose derivation is a side contribution) in terms of RMSE and statistical consistency. Unlike the ensemble variant, single-kernel BO does not always yield such gains, underscoring the benefits of model averaging. Notably, the BO-based estimator achieves RMSE below the classical Cramer-Rao bound, particularly for the inverse time constant, long considered difficult to estimate. This counterintuitive outcome is attributed to a data-driven prior implicitly induced by the GP surrogate in BO.
Updated: 2025-08-14 20:46:37
Subjects: stat.ML,cs.LG,stat.ME
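A minimal sketch of the ensemble GP surrogate, assuming a two-kernel dictionary weighted by marginal likelihood; the objective, data, and kernels are invented stand-ins, and acquisition optimization is reduced to picking the surrogate minimizer over a grid:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D toy objective standing in for the expensive negative log-likelihood
# surface of the system-identification problem.
f = lambda x: np.sin(3 * x) + 0.1 * x**2
X = rng.uniform(-2, 2, size=6)    # evaluations gathered so far
y = f(X)

def k_rbf(a, b, ls):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

# Dictionary of candidate kernels for the ensemble GP (EGP) surrogate.
kernels = [lambda a, b: k_rbf(a, b, 0.3),
           lambda a, b: k_rbf(a, b, 1.0)]

def gp_fit(kern, Xq):
    K = kern(X, X) + 1e-6 * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    mean = kern(Xq, X) @ alpha                                # posterior mean
    lml = -0.5 * y @ alpha - 0.5 * np.linalg.slogdet(K)[1]    # log evidence + const
    return mean, lml

Xq = np.linspace(-2, 2, 101)
fits = [gp_fit(k, Xq) for k in kernels]
lmls = np.array([lml for _, lml in fits])
w = np.exp(lmls - lmls.max())
w /= w.sum()                                    # evidence-based kernel weights
ens_mean = sum(wi * m for wi, (m, _) in zip(w, fits))
print("weights:", np.round(w, 3), " next query:", Xq[np.argmin(ens_mean)])
```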
Human-in-the-Loop Systems for Adaptive Learning Using Generative AI
A Human-in-the-Loop (HITL) approach leverages generative AI to enhance personalized learning by directly integrating student feedback into AI-generated solutions. Students critique and modify AI responses using predefined feedback tags, fostering deeper engagement and understanding. This empowers students to actively shape their learning, with AI serving as an adaptive partner. The system uses a tagging technique and prompt engineering to personalize content, informing a Retrieval-Augmented Generation (RAG) system to retrieve relevant educational material and adjust explanations in real time. This builds on existing research in adaptive learning, demonstrating how student-driven feedback loops can modify AI-generated responses for improved student retention and engagement, particularly in STEM education. Preliminary findings from a study with STEM students indicate improved learning outcomes and confidence compared to traditional AI tools. This work highlights AI's potential to create dynamic, feedback-driven, and personalized learning environments through iterative refinement.
Updated: 2025-08-14 20:44:34
Subjects: cs.HC,cs.LG
Counterfactual Survival Q Learning for Longitudinal Randomized Trials via Buckley James Boosting
We propose a Buckley James (BJ) Boost Q learning framework for estimating optimal dynamic treatment regimes under right censored survival data, tailored for longitudinal randomized clinical trial settings. The method integrates accelerated failure time models with iterative boosting techniques, including componentwise least squares and regression trees, within a counterfactual Q learning framework. By directly modeling conditional survival time, BJ Boost Q learning avoids the restrictive proportional hazards assumption and enables unbiased estimation of stage specific Q functions. Grounded in potential outcomes, this framework ensures identifiability of the optimal treatment regime under standard causal assumptions. Compared to Cox based Q learning, which relies on hazard modeling and may suffer from bias under misspecification, our approach provides robust and flexible estimation. Simulation studies and analysis of the ACTG175 HIV trial demonstrate that BJ Boost Q learning yields higher accuracy in treatment decision making, especially in multistage settings where bias can accumulate.
Updated: 2025-08-14 20:43:56
Subjects: stat.ML,cs.LG,stat.ME
SHLIME: Foiling adversarial attacks fooling SHAP and LIME
Post hoc explanation methods, such as LIME and SHAP, provide interpretable insights into black-box classifiers and are increasingly used to assess model biases and generalizability. However, these methods are vulnerable to adversarial manipulation, potentially concealing harmful biases. Building on the work of Slack et al. (2020), we investigate the susceptibility of LIME and SHAP to biased models and evaluate strategies for improving robustness. We first replicate the original COMPAS experiment to validate prior findings and establish a baseline. We then introduce a modular testing framework enabling systematic evaluation of augmented and ensemble explanation approaches across classifiers of varying performance. Using this framework, we assess multiple LIME/SHAP ensemble configurations on out-of-distribution models, comparing their resistance to bias concealment against the original methods. Our results identify configurations that substantially improve bias detection, highlighting their potential for enhancing transparency in the deployment of high-stakes machine learning systems.
Updated: 2025-08-14 20:28:48
Subjects: cs.LG,cs.CR
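One way an explanation ensemble can resist a single fooled member is rank aggregation across explainers; the attribution vectors below are invented, and the paper's ensemble configurations may aggregate differently:

```python
import numpy as np

# Hypothetical attribution magnitudes from several post hoc explainers for one
# prediction (features: [race, income, zip_code]). An adversarial model may
# fool one explainer into hiding the biased feature, but fooling all ensemble
# members simultaneously is harder.
attributions = {
    "lime_a": np.array([0.05, 0.60, 0.35]),   # fooled: bias hidden
    "lime_b": np.array([0.55, 0.30, 0.15]),   # bias visible
    "shap":   np.array([0.50, 0.15, 0.35]),   # bias visible
}

# Aggregate by mean rank (rank 0 = most important), which is more robust to a
# single manipulated member than averaging raw scores.
ranks = {name: (-a).argsort().argsort() for name, a in attributions.items()}
mean_rank = np.mean(list(ranks.values()), axis=0)
top_feature = int(np.argmin(mean_rank))
print(mean_rank, top_feature)   # the biased feature (index 0) still surfaces
```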
Clean-Label Physical Backdoor Attacks with Data Distillation
Deep Neural Networks (DNNs) are shown to be vulnerable to backdoor poisoning attacks, with most research focusing on digital triggers -- artificial patterns added to test-time inputs to induce targeted misclassification. Physical triggers, which are natural objects embedded in real-world scenes, offer a promising alternative for attackers, as they can activate backdoors in real-time without digital manipulation. However, existing physical backdoor attacks are dirty-label, meaning that attackers must change the labels of poisoned inputs to the target label. The inconsistency between image content and label exposes the attack to human inspection, reducing its stealthiness in real-world settings. To address this limitation, we introduce Clean-Label Physical Backdoor Attack (CLPBA), a new paradigm of physical backdoor attack that does not require label manipulation and trigger injection at the training stage. Instead, the attacker injects imperceptible perturbations into a small number of target class samples to backdoor a model. By framing the attack as a Dataset Distillation problem, we develop three CLPBA variants -- Parameter Matching, Gradient Matching, and Feature Matching -- that craft effective poisons under both linear probing and full-finetuning training settings. In hard scenarios that require backdoor generalizability in the physical world, CLPBA is shown to even surpass Dirty-label attack baselines. We demonstrate the effectiveness of CLPBA via extensive experiments on two collected physical backdoor datasets for facial recognition and animal classification. The code is available in https://github.com/thinh-dao/Clean-Label-Physical-Backdoor-Attacks.
Updated: 2025-08-14 20:23:48
Subjects: cs.CR,cs.AI
AI That Helps Us Help Each Other: A Proactive System for Scaffolding Mentor-Novice Collaboration in Entrepreneurship Coaching
Entrepreneurship requires navigating open-ended, ill-defined problems: identifying risks, challenging assumptions, and making strategic decisions under deep uncertainty. Novice founders often struggle with these metacognitive demands, while mentors face limited time and visibility to provide tailored support. We present a human-AI coaching system that combines a domain-specific cognitive model of entrepreneurial risk with a large language model (LLM) to proactively scaffold both novice and mentor thinking. The system proactively poses diagnostic questions that challenge novices' thinking and helps both novices and mentors plan for more focused and emotionally attuned meetings. Critically, mentors can inspect and modify the underlying cognitive model, shaping the logic of the system to reflect their evolving needs. Through an exploratory field deployment, we found that using the system supported novice metacognition, helped mentors plan emotionally attuned strategies, and improved meeting depth, intentionality, and focus, while also surfacing key tensions around trust, misdiagnosis, and expectations of AI. We contribute design principles for proactive AI systems that scaffold metacognition and human-human collaboration in complex, ill-defined domains, offering implications for similar domains like healthcare, education, and knowledge work.
Updated: 2025-08-14 20:23:48
Subjects: cs.HC,cs.AI,68T35 (Primary), 68U99 (Secondary),H.5.2
Conditional Independence Estimates for the Generalized Nonparanormal
For general non-Gaussian distributions, the covariance and precision matrices do not encode the independence structure of the variables, as they do for the multivariate Gaussian. This paper builds on previous work to show that for a class of non-Gaussian distributions, those derived from diagonal transformations of a Gaussian, information about the conditional independence structure can still be inferred from the precision matrix, provided the data meet certain criteria, analogous to the Gaussian case. We call such transformations of the Gaussian the generalized nonparanormal. The functions that define these transformations are, in a broad sense, arbitrary. We also provide a simple and computationally efficient algorithm that leverages this theory to recover conditional independence structure from generalized nonparanormal data. The effectiveness of the proposed algorithm is demonstrated via synthetic experiments and applications to real-world data.
Updated: 2025-08-14 20:19:30
Subjects: cs.LG,stat.ML
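The core recipe, undoing the monotone marginal transformations with a rank-based normal-scores transform and then reading conditional independence off the precision matrix, can be sketched on a synthetic chain graph; the structure and transformations below are invented, not the paper's algorithm:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Latent Gaussian with chain structure X1 - X2 - X3: X1 and X3 are
# conditionally independent given X2, so the (0, 2) precision entry is zero.
n = 5_000
z1 = rng.normal(size=n)
z2 = 0.8 * z1 + 0.6 * rng.normal(size=n)
z3 = 0.8 * z2 + 0.6 * rng.normal(size=n)

# Observe monotone diagonal transformations of the Gaussian (a generalized
# nonparanormal sample): the marginals are badly non-Gaussian.
X = np.column_stack([np.exp(z1), z2**3, np.tanh(z3)])

nd = NormalDist()
def normal_scores(col):
    """Rank-based Gaussianization: map ranks through the normal inverse CDF."""
    r = col.argsort().argsort() + 1            # ranks 1..n
    return np.array([nd.inv_cdf(u) for u in r / (n + 1)])

G = np.column_stack([normal_scores(X[:, j]) for j in range(3)])
Theta = np.linalg.inv(np.cov(G, rowvar=False))
print(np.round(Theta, 2))   # (0, 2) entry near zero; chain neighbors nonzero
```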
Explainable Attention-Guided Stacked Graph Neural Networks for Malware Detection
Malware detection in modern computing environments demands models that are not only accurate but also interpretable and robust to evasive techniques. Graph neural networks (GNNs) have shown promise in this domain by modeling rich structural dependencies in graph-based program representations such as control flow graphs (CFGs). However, single-model approaches may suffer from limited generalization and lack interpretability, especially in high-stakes security applications. In this paper, we propose a novel stacking ensemble framework for graph-based malware detection and explanation. Our method dynamically extracts CFGs from portable executable (PE) files and encodes their basic blocks through a two-step embedding strategy. A set of diverse GNN base learners, each with a distinct message-passing mechanism, is used to capture complementary behavioral features. Their prediction outputs are aggregated by a meta-learner implemented as an attention-based multilayer perceptron, which both classifies malware instances and quantifies the contribution of each base model. To enhance explainability, we introduce an ensemble-aware post-hoc explanation technique that leverages edge-level importance scores generated by a GNN explainer and fuses them using the learned attention weights. This produces interpretable, model-agnostic explanations aligned with the final ensemble decision. Experimental results demonstrate that our framework improves classification performance while providing insightful interpretations of malware behavior.
Updated: 2025-08-14 20:12:03
Subjects: cs.CR,cs.AI
Sophisticated Learning: A novel algorithm for active learning during model-based planning
We introduce Sophisticated Learning (SL), a planning-to-learn algorithm that embeds active parameter learning inside the Sophisticated Inference (SI) tree-search framework of Active Inference. Unlike SI -- which optimizes beliefs about hidden states -- SL also updates beliefs about model parameters within each simulated branch, enabling counterfactual reasoning about how future observations would improve subsequent planning. We compared SL with Bayes-adaptive Reinforcement Learning (BARL) agents as well as with its parent algorithm, SI. Using a biologically inspired seasonal foraging task in which resources shift probabilistically over a 10x10 grid, we designed experiments that forced agents to balance probabilistic reward harvesting against information gathering. In early trials, where rapid learning is vital, SL agents survive, on average, 8.2% longer than SI and 35% longer than Bayes-adaptive Reinforcement Learning. While both SL and SI showed equal convergence performance, SL reached this convergence 40% faster than SI. Additionally, SL robustly outperformed the other algorithms in altered environment configurations. Our results show that incorporating active learning into multi-step planning materially improves decision making under radical uncertainty, and reinforces the broader utility of Active Inference for modeling biologically relevant behavior.
Updated: 2025-08-14 20:04:52
Subjects: cs.AI,cs.LG,q-bio.NC
Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
While federated learning (FL) and differential privacy (DP) have been extensively studied, their application to automatic speech recognition (ASR) remains largely unexplored due to the challenges in training large transformer models. Specifically, large models further exacerbate issues in FL as they are particularly susceptible to gradient heterogeneity across layers, unlike the relatively uniform gradient behavior observed in shallow models. As a result, prior works struggle to converge with standard optimization techniques, even in the absence of DP mechanisms. To the best of our knowledge, no existing work establishes a competitive, practical recipe for FL with DP in the context of ASR. To address this gap, we establish \textbf{the first benchmark for FL with DP in end-to-end ASR}. Our approach centers on per-layer clipping and layer-wise gradient normalization: theoretical analysis reveals that these techniques together mitigate clipping bias and gradient heterogeneity across layers in deeper models. Consistent with these theoretical insights, our empirical results show that FL with DP is viable under strong privacy guarantees, provided a population of at least several million users. Specifically, we achieve user-level (7.2, $10^{-9}$)-DP (resp. (4.5, $10^{-9}$)-DP) with only a 1.3% (resp. 4.6%) absolute drop in word error rate when extrapolating to high (resp. low) population scales for FL with DP in ASR. Although our experiments focus on ASR, the underlying principles we uncover - particularly those concerning gradient heterogeneity and layer-wise gradient normalization - offer broader guidance for designing scalable, privacy-preserving FL algorithms for large models across domains. Code of all experiments and benchmarks is available at https://github.com/apple/ml-pfl4asr.
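A minimal sketch of the two ingredients the recipe centers on, per-layer clipping and layer-wise gradient normalization, might look as follows; the layer shapes, clip bounds, and noise scale are illustrative values, not the paper's calibrated parameters.

```python
import numpy as np

def per_layer_clip(grads, clip_norms):
    """Clip each layer's gradient to its own norm bound (per-layer clipping)."""
    out = []
    for g, c in zip(grads, clip_norms):
        norm = np.linalg.norm(g)
        out.append(g * min(1.0, c / (norm + 1e-12)))
    return out

def layerwise_normalize(grads, eps=1e-12):
    """Rescale each layer's gradient to unit norm to counter heterogeneity."""
    return [g / (np.linalg.norm(g) + eps) for g in grads]

rng = np.random.default_rng(1)
# Two "layers" with very different gradient scales, as in deep transformers.
grads = [rng.normal(0.0, 10.0, 100), rng.normal(0.0, 0.01, 100)]

clipped = per_layer_clip(grads, clip_norms=[1.0, 1.0])
normed = layerwise_normalize(clipped)
# Gaussian noise for DP is added to the aggregate; sigma = 0.1 is an
# arbitrary illustrative value, not a calibrated privacy parameter.
noisy = [g + rng.normal(0.0, 0.1, g.shape) for g in normed]
print([round(float(np.linalg.norm(g)), 3) for g in clipped])
```

The point of the combination: clipping bounds each layer's sensitivity for the DP noise, while normalization removes the large scale gap between layers that otherwise slows convergence.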
Updated: 2025-08-14 19:55:10
Subjects: cs.LG,cs.CR,stat.ML
Learning with Confidence
We characterize a notion of confidence that arises in learning or updating beliefs: the amount of trust one has in incoming information and its impact on the belief state. This learner's confidence can be used alongside (and is easily mistaken for) probability or likelihood, but it is fundamentally a different concept -- one that captures many familiar concepts in the literature, including learning rates and number of training epochs, Shafer's weight of evidence, and Kalman gain. We formally axiomatize what it means to learn with confidence, give two canonical ways of measuring confidence on a continuum, and prove that confidence can always be represented in this way. Under additional assumptions, we derive more compact representations of confidence-based learning in terms of vector fields and loss functions. These representations induce an extended language of compound "parallel" observations. We characterize Bayes Rule as the special case of an optimizing learner whose loss representation is a linear expectation.
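One familiar special case the abstract alludes to, confidence acting as a learning rate on incoming evidence, can be sketched as a tempered Bayesian update. This is an illustrative reading of the idea, not the paper's axiomatization.

```python
import numpy as np

def confident_update(prior, likelihood, c):
    """Tempered Bayesian update: confidence c scales the impact of evidence.
    c = 1 recovers Bayes' rule; c = 0 leaves the prior untouched.
    (Illustrative reading of 'learning with confidence', not the paper's axioms.)"""
    post = prior * likelihood ** c
    return post / post.sum()

prior = np.array([0.5, 0.5])          # two hypotheses
likelihood = np.array([0.9, 0.1])     # evidence strongly favors hypothesis 0

print(confident_update(prior, likelihood, 0.0))  # -> [0.5 0.5]
print(confident_update(prior, likelihood, 1.0))  # -> [0.9 0.1]
print(confident_update(prior, likelihood, 0.5))  # partial trust in the evidence
```

Kalman gain and Shafer's weight of evidence play analogous roles in their respective settings: a dial on how far the belief state moves toward what the observation suggests.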
Updated: 2025-08-14 19:45:40
Subjects: cs.LG,cs.AI,math.DG
Recent Advances in Generative AI for Healthcare Applications
The rapid advancement of Artificial Intelligence (AI) has catalyzed revolutionary changes across various sectors, notably in healthcare. In particular, generative AI, led by diffusion models and transformer architectures, has enabled significant breakthroughs in medical imaging (including image reconstruction, image-to-image translation, generation, and classification), protein structure prediction, clinical documentation, diagnostic assistance, radiology interpretation, clinical decision support, medical coding and billing, as well as drug design and molecular representation. These innovations have enhanced clinical diagnosis, data reconstruction, and drug synthesis. This review paper aims to offer a comprehensive synthesis of recent advances in healthcare applications of generative AI, with an emphasis on diffusion and transformer models. Moreover, we discuss current capabilities, highlight existing limitations, and outline promising research directions to address emerging challenges. Serving as both a reference for researchers and a guide for practitioners, this work offers an integrated view of the state of the art, its impact on healthcare, and its future potential.
Updated: 2025-08-14 19:43:06
Subjects: cs.LG,cs.AI
Note on Selection Bias in Observational Estimates of Algorithmic Progress
Ho et al. (2024) is an interesting paper that attempts to estimate the degree of algorithmic progress in language models. They collect observational data on language models' loss and compute over time, and argue that as time has passed, language models' algorithmic efficiency has been rising; that is, the loss achieved for fixed compute has been dropping over time. In this note, I raise one potential methodological problem with the estimation strategy: intuitively, if part of algorithmic quality is latent, and compute choices are endogenous to algorithmic quality, then the resulting estimates of algorithmic quality will be biased.
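The omitted-variable mechanism behind this concern can be reproduced in a few lines of simulation: when latent quality both lowers loss and raises compute spending, a naive regression overstates the compute effect. All coefficients below are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Latent algorithmic quality q lowers loss AND drives labs to buy more
# compute; every coefficient here is invented for the simulation.
q = rng.normal(0.0, 1.0, n)
log_compute = 1.0 * q + rng.normal(0.0, 1.0, n)   # endogenous compute choice
loss = 5.0 - 0.5 * log_compute - 1.0 * q + rng.normal(0.0, 0.1, n)

# Naive regression of loss on compute, ignoring the latent q, overstates
# the compute effect (the true coefficient is -0.5).
slope = np.polyfit(log_compute, loss, 1)[0]
print(f"true compute effect: -0.5, naive OLS estimate: {slope:.2f}")
```

Because the bias on the compute coefficient propagates into the residual trend attributed to "algorithmic progress over time", the headline efficiency estimates inherit it.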
Updated: 2025-08-14 19:38:10
Subjects: econ.GN,cs.AI,q-fin.EC
Risk-Based Prognostics and Health Management
It is often the case that risk assessment and prognostics are viewed as related but separate tasks. This chapter describes a risk-based approach to prognostics that seeks to provide a tighter coupling between risk assessment and fault prediction. We show how this can be achieved using the continuous-time Bayesian network as the underlying modeling framework. Furthermore, we provide an overview of the techniques that are available to derive these models from data and show how they might be used in practice to achieve tasks like decision support and performance-based logistics. This work is intended to provide an overview of the recent developments related to risk-based prognostics, and we hope that it will serve as a tutorial of sorts that will assist others in adopting these techniques.
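The building block of such models, a continuous-time transition whose rate depends on an observed condition, can be sketched as follows; the failure rates and the maintenance threshold are made-up illustrative values, not figures from the chapter.

```python
import numpy as np

# Two-state (healthy -> failed) continuous-time Markov model, the basic
# building block of a continuous-time Bayesian network. The failure rate
# depends on an observed stress condition; all rates are made-up values.
rates = {"low_stress": 0.01, "high_stress": 0.05}   # failures per hour

def prob_failure_by(t_hours, condition):
    lam = rates[condition]
    return 1.0 - np.exp(-lam * t_hours)

# Risk-based decision support: flag maintenance when the predicted risk
# over the next 24 h exceeds an (illustrative) tolerance of 0.3.
for cond in rates:
    risk = prob_failure_by(24, cond)
    print(cond, round(float(risk), 3), "maintain" if risk > 0.3 else "continue")
```

Coupling the predicted failure probability directly to a decision threshold is the "tighter coupling" between risk assessment and prognostics that the chapter advocates.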
Updated: 2025-08-14 19:31:33
Subjects: eess.SY,cs.AI,cs.SY,stat.AP
StoryEnsemble: Enabling Dynamic Exploration & Iteration in the Design Process with AI and Forward-Backward Propagation
Design processes involve exploration, iteration, and movement across interconnected stages such as persona creation, problem framing, solution ideation, and prototyping. However, time and resource constraints often hinder designers from exploring broadly, collecting feedback, and revisiting earlier assumptions, making it difficult to uphold core design principles in practice. To better understand these challenges, we conducted a formative study with 15 participants comprising UX practitioners, students, and instructors. Based on the findings, we developed StoryEnsemble, a tool that integrates AI into a node-link interface and leverages forward and backward propagation to support dynamic exploration and iteration across the design process. A user study with 10 participants showed that StoryEnsemble enables rapid, multi-directional iteration and flexible navigation across design stages. This work advances our understanding of how AI can foster more iterative design practices by introducing novel interactions that make exploration and iteration more fluid, accessible, and engaging.
Updated: 2025-08-14 19:28:08
Subjects: cs.HC,cs.AI
A Closer Look at Multimodal Representation Collapse
We aim to develop a fundamental understanding of modality collapse, a recently observed empirical phenomenon wherein models trained for multimodal fusion tend to rely only on a subset of the modalities, ignoring the rest. We show that modality collapse happens when noisy features from one modality are entangled, via a shared set of neurons in the fusion head, with predictive features from another, effectively masking out positive contributions from the predictive features of the former modality and leading to its collapse. We further prove that cross-modal knowledge distillation implicitly disentangles such representations by freeing up rank bottlenecks in the student encoder, denoising the fusion-head outputs without negatively impacting the predictive features from either modality. Based on the above findings, we propose an algorithm that prevents modality collapse through explicit basis reallocation, with applications in dealing with missing modalities. Extensive experiments on multiple multimodal benchmarks validate our theoretical claims. Project page: https://abhrac.github.io/mmcollapse/.
Updated: 2025-08-14 19:16:46
Subjects: cs.LG,cs.AI,cs.CV
Modeling Sampling Distributions of Test Statistics with Autograd
Simulation-based inference methods that feature correct conditional coverage of confidence sets based on observations that have been compressed to a scalar test statistic require accurate modeling of either the p-value function or the cumulative distribution function (cdf) of the test statistic. If the model of the cdf, which is typically a deep neural network, is a function of the test statistic then the derivative of the neural network with respect to the test statistic furnishes an approximation of the sampling distribution of the test statistic. We explore whether this approach to modeling conditional 1-dimensional sampling distributions is a viable alternative to the probability density-ratio method, also known as the likelihood-ratio trick. Relatively simple, yet effective, neural network models are used whose predictive uncertainty is quantified through a variety of methods.
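The core idea, differentiate a monotone cdf model in the test statistic to get a density estimate, can be illustrated with a closed-form stand-in for the neural network; a real model would supply the derivative via autograd rather than the finite differences used here.

```python
import numpy as np

# Toy stand-in for a neural cdf model: a monotone sigmoid in the test
# statistic t (a trained network would replace this closed form).
def cdf_model(t, loc=0.0, scale=1.0):
    return 1.0 / (1.0 + np.exp(-(t - loc) / scale))

# The density estimate is the derivative of the cdf with respect to t;
# with a real network this derivative would come from autograd.
def pdf_from_cdf(t, h=1e-5):
    return (cdf_model(t + h) - cdf_model(t - h)) / (2 * h)

ts = np.linspace(-10, 10, 2001)   # grid spacing 0.01
pdf = pdf_from_cdf(ts)
# Sanity check: the derivative behaves like a density (integrates to ~1).
print(round(float(pdf.sum() * 0.01), 4))
```

Monotonicity of the cdf model is what guarantees the derivative is nonnegative, so it can be read directly as an approximate sampling distribution of the statistic.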
Updated: 2025-08-14 19:14:45
Subjects: stat.ML,cs.LG,hep-ex,stat.CO
Zono-Conformal Prediction: Zonotope-Based Uncertainty Quantification for Regression and Classification Tasks
Conformal prediction is a popular uncertainty quantification method that augments a base predictor with prediction sets with statistically valid coverage guarantees. However, current methods are often computationally expensive and data-intensive, as they require constructing an uncertainty model before calibration. Moreover, existing approaches typically represent the prediction sets with intervals, which limits their ability to capture dependencies in multi-dimensional outputs. We address these limitations by introducing zono-conformal prediction, a novel approach inspired by interval predictor models and reachset-conformant identification that constructs prediction zonotopes with assured coverage. By placing zonotopic uncertainty sets directly into the model of the base predictor, zono-conformal predictors can be identified via a single, data-efficient linear program. While we can apply zono-conformal prediction to arbitrary nonlinear base predictors, we focus on feed-forward neural networks in this work. Aside from regression tasks, we also construct optimal zono-conformal predictors in classification settings where the output of an uncertain predictor is a set of possible classes. We provide probabilistic coverage guarantees and present methods for detecting outliers in the identification data. In extensive numerical experiments, we show that zono-conformal predictors are less conservative than interval predictor models and standard conformal prediction methods, while achieving a similar coverage over the test data.
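For orientation, the classical interval-based baseline that zono-conformal prediction generalizes, split conformal prediction, fits in a few lines; the constant base predictor below is deliberately crude, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Split conformal prediction: calibrate a quantile of nonconformity
# scores, then report prediction +/- that quantile.
y_cal = rng.normal(0.0, 1.0, 1000)      # calibration targets
pred_cal = np.zeros_like(y_cal)         # crude base predictions

alpha = 0.1                             # target 10% miscoverage
scores = np.abs(y_cal - pred_cal)       # nonconformity scores
k = int(np.ceil((1 - alpha) * (len(scores) + 1)))
q = np.sort(scores)[k - 1]              # conformal quantile

# Prediction set for new points: prediction +/- q, with ~90% coverage.
y_test = rng.normal(0.0, 1.0, 5000)
covered = np.mean(np.abs(y_test - 0.0) <= q)
print(round(float(q), 2), round(float(covered), 3))
```

The zonotope version replaces the interval `prediction +/- q` with a zonotope in the output space, which lets the prediction set capture dependencies between output dimensions that axis-aligned intervals cannot.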
Updated: 2025-08-14 19:03:28
Subjects: cs.LG,cs.AI,cs.SY,eess.SY
On Approximate MMS Allocations on Restricted Graph Classes
We study the problem of fair division of a set of indivisible goods with connectivity constraints. Specifically, we assume that the goods are represented as vertices of a connected graph, and sets of goods allocated to the agents are connected subgraphs of this graph. We focus on the widely-studied maximin share criterion of fairness. It has been shown that an allocation satisfying this criterion may not exist even without connectivity constraints, i.e., if the graph of goods is complete. In view of this, it is natural to seek approximate allocations that guarantee each agent a connected bundle of goods with value at least a constant fraction of the maximin share value to the agent. It is known that for some classes of graphs, such as complete graphs, cycles, and $d$-claw-free graphs for any fixed $d$, such approximate allocations indeed exist. However, it is an open problem whether they exist for the class of all graphs. In this paper, we continue the systematic study of the existence of approximate allocations on restricted graph classes. In particular, we show that such allocations exist for several well-studied classes, including block graphs, cacti, complete multipartite graphs, and split graphs.
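On a path graph the connectivity constraint makes the maximin share concrete: bundles must be contiguous intervals, so the MMS value is a max-min over cut placements. A brute-force sketch, with toy valuations not taken from the paper:

```python
from itertools import combinations

# Maximin share on a path graph: bundles must be contiguous intervals,
# so an n_agents-way partition is a choice of n_agents - 1 cut points.
def mms_on_path(values, n_agents):
    n = len(values)
    best = float("-inf")
    for cuts in combinations(range(1, n), n_agents - 1):
        bounds = (0, *cuts, n)
        bundles = [sum(values[a:b]) for a, b in zip(bounds, bounds[1:])]
        best = max(best, min(bundles))   # the divider keeps the worst bundle
    return best

# Six goods on a path, valued identically by 3 agents (total value 18).
values = [4, 1, 3, 2, 6, 2]
print(mms_on_path(values, 3))  # -> 5: no contiguous 3-partition reaches 6 each
```

For general graphs the bundles range over all connected subgraphs rather than intervals, which is what makes existence of (approximate) MMS allocations on richer graph classes nontrivial.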
Updated: 2025-08-14 19:01:13
Subjects: cs.DM,cs.AI
What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
Foundation models are premised on the idea that sequence prediction can uncover deeper domain understanding, much like how Kepler's predictions of planetary motion later led to the discovery of Newtonian mechanics. However, evaluating whether these models truly capture deeper structure remains a challenge. We develop a technique for evaluating foundation models that examines how they adapt to synthetic datasets generated from some postulated world model. Our technique measures whether the foundation model's inductive bias aligns with the world model, and so we refer to it as an inductive bias probe. Across multiple domains, we find that foundation models can excel at their training tasks yet fail to develop inductive biases towards the underlying world model when adapted to new tasks. We particularly find that foundation models trained on orbital trajectories consistently fail to apply Newtonian mechanics when adapted to new physics tasks. Further analysis reveals that these models behave as if they develop task-specific heuristics that fail to generalize.
Updated: 2025-08-14 18:55:48
Subjects: cs.LG,cs.AI
Quantization vs Pruning: Insights from the Strong Lottery Ticket Hypothesis
Quantization is an essential technique for making neural networks more efficient, yet our theoretical understanding of it remains limited. Previous works demonstrated that extremely low-precision networks, such as binary networks, can be constructed by pruning large, randomly-initialized networks, and showed that the ratio between the size of the original and the pruned networks is at most polylogarithmic. The specific pruning method they employed inspired a line of theoretical work known as the Strong Lottery Ticket Hypothesis (SLTH), which leverages insights from the Random Subset Sum Problem. However, these results primarily address the continuous setting and cannot be applied to extend SLTH results to the quantized setting. In this work, we build on foundational results by Borgs et al. on the Number Partitioning Problem to derive new theoretical results for the Random Subset Sum Problem in a quantized setting. Using these results, we then extend the SLTH framework to finite-precision networks. While prior work on SLTH showed that pruning allows approximation of a certain class of neural networks, we demonstrate that, in the quantized setting, the analogous class of target discrete neural networks can be represented exactly, and we prove optimal bounds on the necessary overparameterization of the initial network as a function of the precision of the target network.
Updated: 2025-08-14 18:51:34
Subjects: cs.LG
The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation
This paper presents a novel methodology for generating synthetic Preference Optimization (PO) datasets using multi-model workflows. We evaluate the effectiveness and potential of these workflows in automating and enhancing the dataset generation process. PO dataset generation requires two modules: (1) $\textit{response evaluation}$, and (2) $\textit{response generation}$. In the $\textit{response evaluation}$ module, the responses from Large Language Models (LLMs) are evaluated and ranked - a task typically carried out by human annotators that we automate using LLMs. We assess the response evaluation module in a 2 step process. In step 1, we assess LLMs as evaluators using three distinct prompting strategies. In step 2, we apply the winning prompting strategy to compare the performance of LLM-as-a-Judge, LLMs-as-a-Jury, and LLM Debate. Our evaluation shows that GPT-4o-as-a-Judge is more consistent across all datasets. For the $\textit{response generation}$ module, we use the identified LLM evaluator configuration and compare different configurations of the LLM Feedback Loop. We use the win rate to determine the best multi-model configuration for generation. Experimenting with various configurations, we find that the LLM Feedback Loop, with Llama as the generator and Gemma as the reviewer, achieves a notable 71.8% and 73.8% win rate over single-model Llama and Gemma, respectively. After identifying the best configurations for both modules, we generate our PO datasets using the above pipeline.
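The win-rate metric used to select the generation configuration reduces to a simple count over judge verdicts; the verdicts below are hypothetical, not data from the paper.

```python
# Win rate of configuration A over B from pairwise judge verdicts: the
# fraction of decisive prompts where the judge preferred A's response.
verdicts = ["A", "A", "B", "A", "tie", "A", "B", "A"]  # hypothetical verdicts

decisive = [v for v in verdicts if v != "tie"]
win_rate = sum(v == "A" for v in decisive) / len(decisive)
print(round(win_rate, 3))
```

In the paper's setting, "A" would be the LLM Feedback Loop configuration (e.g., Llama generator with Gemma reviewer) and "B" a single-model baseline, with the verdicts produced by the selected LLM evaluator.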
Updated: 2025-08-14 18:51:16
Subjects: cs.CL,cs.AI
Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics
Large language models (LLMs) struggle with cross-lingual knowledge transfer: they hallucinate when asked in one language about facts expressed in a different language during training. This work introduces a controlled setting to study the causes and dynamics of this phenomenon by training small Transformer models from scratch on synthetic multilingual datasets. We identify a learning phase wherein a model develops either separate or unified representations of the same facts across languages, and show that unification is essential for cross-lingual transfer. We also show that the degree of unification depends on mutual information between facts and training data language, and on how easy it is to extract that language. Based on these insights, we develop methods to modulate the level of cross-lingual transfer by manipulating data distribution and tokenization, and we introduce metrics and visualizations to formally characterize their effects on unification. Our work shows how controlled settings can shed light on pre-training dynamics and suggests new directions for improving cross-lingual transfer in LLMs.
Updated: 2025-08-14 18:44:13
Subjects: cs.CL,cs.AI
CURE: Critical-Token-Guided Re-concatenation for Entropy-collapse Prevention
Recent advances in Reinforcement Learning with Verified Reward (RLVR) have driven the emergence of more sophisticated cognitive behaviors in large language models (LLMs), thereby enhancing their reasoning capabilities. However, in prior RLVR pipelines, the repeated use of static initial-state sampling drawn exactly from the dataset distribution during each sampling phase produced overly deterministic, low-diversity model behavior, which manifested as rapid entropy collapse and hindered sustained performance gains during prolonged training. To address this issue, we introduce CURE (Critical-token-gUided Re-concatenation for Entropy-collapse prevention), a two-stage framework that balances exploration and exploitation. Specifically, in the first stage, to deliberately steer the model toward novel yet coherent contexts, we re-generate at high-entropy critical tokens and jointly optimize the original and the branched trajectories. A further comparison with vanilla DAPO shows that the regeneration process achieves better performance on math reasoning tasks while sustaining a high level of entropy for exploration. In the second stage, we continue training with static initial-state sampling by DAPO, intentionally placing the model in a familiar state to gradually strengthen exploitation. Extensive experiments on Qwen-2.5-Math-7B show that, compared to other RLVR methods, CURE achieves a 5% performance gain across six math benchmarks, establishing state-of-the-art performance in both entropy and accuracy. A series of experiments further validate the effectiveness of our approach. Code is available at https://github.com/CURE-Project/CURE.
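The "high-entropy critical token" selection in the first stage can be sketched as plain entropy bookkeeping over next-token distributions; the logits below are random stand-ins, with one position artificially peaked to show the contrast.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy of the next-token distribution at each position."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

rng = np.random.default_rng(0)
# Hypothetical logits for a 6-token rollout over a 50-word vocabulary.
logits = rng.normal(0.0, 1.0, (6, 50))
logits[2] *= 8.0                      # position 2: peaked, low-entropy

H = token_entropy(logits)
# "Critical" tokens: the highest-entropy positions, where re-generation
# would branch new trajectories from the original rollout.
critical = np.argsort(H)[::-1][:2]
print(np.round(H, 2), sorted(critical.tolist()))
```

Branching at exactly these positions is what lets the method inject diversity where the model is genuinely uncertain instead of resampling the whole deterministic prefix.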
Updated: 2025-08-14 18:40:34
Subjects: cs.LG,cs.AI
Deep Learning-Based Automated Segmentation of Uterine Myomas
Uterine fibroids (myomas) are the most common benign tumors of the female reproductive system, particularly among women of childbearing age. With a prevalence exceeding 70%, they pose a significant burden on female reproductive health. Clinical symptoms such as abnormal uterine bleeding, infertility, pelvic pain, and pressure-related discomfort play a crucial role in guiding treatment decisions, which are largely influenced by the size, number, and anatomical location of the fibroids. Magnetic Resonance Imaging (MRI) is a non-invasive and highly accurate imaging modality commonly used by clinicians for the diagnosis of uterine fibroids. Segmenting uterine fibroids requires a precise assessment of both the uterus and fibroids on MRI scans, including measurements of volume, shape, and spatial location. However, this process is labor-intensive, time-consuming, and subject to variability due to intra- and inter-expert differences at both pre- and post-treatment stages. As a result, there is a critical need for an accurate and automated segmentation method for uterine fibroids. In recent years, deep learning algorithms have shown remarkable improvements in medical image segmentation, outperforming traditional methods. These approaches offer the potential for fully automated segmentation. Several studies have explored the use of deep learning models to achieve automated segmentation of uterine fibroids. However, most of the previous work has been conducted using private datasets, which poses challenges for validation and comparison between studies. In this study, we leverage the publicly available Uterine Myoma MRI Dataset (UMD) to establish a baseline for automated segmentation of uterine fibroids, enabling standardized evaluation and facilitating future research in this domain.
Updated: 2025-08-14 18:22:14
Subjects: eess.IV,cs.AI,cs.CV
SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth
The rapid proliferation of large language models (LLMs) in applications targeting children and adolescents necessitates a fundamental reassessment of prevailing AI safety frameworks, which are largely tailored to adult users and neglect the distinct developmental vulnerabilities of minors. This paper highlights key deficiencies in existing LLM safety benchmarks, including their inadequate coverage of age-specific cognitive, emotional, and social risks spanning early childhood (ages 0--6), middle childhood (7--12), and adolescence (13--18). To bridge these gaps, we introduce SproutBench, an innovative evaluation suite comprising 1,283 developmentally grounded adversarial prompts designed to probe risks such as emotional dependency, privacy violations, and imitation of hazardous behaviors. Through rigorous empirical evaluation of 47 diverse LLMs, we uncover substantial safety vulnerabilities, corroborated by robust inter-dimensional correlations (e.g., between Safety and Risk Prevention) and a notable inverse relationship between Interactivity and Age Appropriateness. These insights yield practical guidelines for advancing child-centric AI design and deployment.
Updated: 2025-08-14 18:21:39
Subjects: cs.CL,cs.AI
Blending 3D Geometry and Machine Learning for Multi-View Stereopsis
Traditional multi-view stereo (MVS) methods primarily depend on photometric and geometric consistency constraints. In contrast, modern learning-based algorithms often rely on the plane sweep algorithm to infer 3D geometry, applying explicit geometric consistency (GC) checks only as a post-processing step, with no impact on the learning process itself. In this work, we introduce GC-MVSNet++, a novel approach that actively enforces geometric consistency of reference view depth maps across multiple source views (multi-view) and at various scales (multi-scale) during the learning phase (see Fig. 1). This integrated GC check significantly accelerates the learning process by directly penalizing geometrically inconsistent pixels, effectively halving the number of training iterations compared to other MVS methods. Furthermore, we introduce a densely connected cost regularization network with two distinct block designs, simple and feature-dense, optimized to harness dense feature connections for enhanced regularization. Extensive experiments demonstrate that our approach achieves a new state of the art on the DTU and BlendedMVS datasets and secures second place on the Tanks and Temples benchmark. To our knowledge, GC-MVSNet++ is the first method to enforce multi-view, multi-scale supervised geometric consistency during learning. Our code is available.
Updated: 2025-08-14 18:10:57
Subjects: cs.CV,cs.AI,cs.CG,cs.LG
JMA: a General Algorithm to Craft Nearly Optimal Targeted Adversarial Example
Most of the approaches proposed so far to craft targeted adversarial examples against deep learning classifiers are highly suboptimal and typically rely on increasing the likelihood of the target class, thus implicitly focusing on one-hot encoding settings. In this paper, a more general, theoretically sound, targeted attack is proposed, which resorts to the minimization of a Jacobian-induced Mahalanobis distance term, taking into account the effort (in the input space) required to move the latent space representation of the input sample in a given direction. The minimization is solved by exploiting the Wolfe duality theorem, reducing the problem to the solution of a Non-Negative Least Squares (NNLS) problem. The proposed algorithm (referred to as JMA) provides an optimal solution to a linearised version of the adversarial example problem originally introduced by Szegedy et al. The results of the experiments confirm the generality of the proposed attack, which proves effective under a wide variety of output encoding schemes. Notably, JMA is also effective in a multi-label classification scenario, capable of inducing a targeted modification of up to half the labels in complex multi-label classification scenarios, a capability that is out of reach of all the attacks proposed so far. As a further advantage, JMA requires very few iterations, making it more efficient than existing methods.
Updated: 2025-08-14 18:09:49
Subjects: cs.LG,cs.AI,cs.CV
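The final solver step that JMA reduces to can be illustrated in isolation (this sketch skips the Jacobian/Mahalanobis construction entirely; the matrix and target below are toy placeholders). An NNLS problem, minimize ||Ax - b||² subject to x ≥ 0, solved by projected gradient descent:

```python
def nnls_projected_gradient(A, b, steps=2000, lr=0.01):
    """Minimize ||A x - b||^2 subject to x >= 0 via projected gradient descent."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(steps):
        # residual r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # gradient g = 2 A^T r
        g = [2.0 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # gradient step, then project onto the non-negative orthant
        x = [max(0.0, x[j] - lr * g[j]) for j in range(n)]
    return x

A = [[1.0, 0.0], [0.0, 1.0]]   # toy design matrix
b = [0.5, -0.3]                # second coordinate is infeasible, so it clamps to 0
x = nnls_projected_gradient(A, b)
```

Production solvers (e.g., Lawson-Hanson, as used by common scientific libraries) are far faster, but the constraint-projection idea is the same.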
Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis
Identification of protein-protein interactions (PPIs) helps derive cellular mechanistic understanding, particularly in the context of complex conditions such as neurodegenerative disorders, metabolic syndromes, and cancer. Large Language Models (LLMs) have demonstrated remarkable potential in predicting protein structures and interactions via automated mining of vast biomedical literature; yet their inherent uncertainty remains a key challenge for deriving reproducible findings, critical for biomedical applications. In this study, we present an uncertainty-aware adaptation of LLMs for PPI analysis, leveraging fine-tuned LLaMA-3 and BioMedGPT models. To enhance prediction reliability, we integrate LoRA ensembles and Bayesian LoRA models for uncertainty quantification (UQ), ensuring confidence-calibrated insights into protein behavior. Our approach achieves competitive performance in PPI identification across diverse disease contexts while addressing model uncertainty, thereby enhancing trustworthiness and reproducibility in computational biology. These findings underscore the potential of uncertainty-aware LLM adaptation for advancing precision medicine and biomedical research.
Updated: 2025-08-14 18:03:36
Subjects: cs.LG,cs.AI,cs.CL,stat.AP,stat.ML
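The ensemble side of the uncertainty quantification can be sketched independently of any particular model: run several fine-tuned members, average their probabilities, and treat disagreement as a confidence signal. The probabilities and the 0.15 spread threshold below are illustrative inventions, not values from the paper:

```python
def ensemble_predict(prob_lists, max_std=0.15):
    """Average member probabilities; abstain when member disagreement is too high."""
    results = []
    n = len(prob_lists)
    for probs in zip(*prob_lists):
        mean = sum(probs) / n
        std = (sum((p - mean) ** 2 for p in probs) / n) ** 0.5
        if std > max_std:
            label = "uncertain"
        else:
            label = "interact" if mean >= 0.5 else "no-interact"
        results.append((round(mean, 3), round(std, 3), label))
    return results

members = [
    [0.92, 0.55, 0.10],   # member 1: P(interaction) for three protein pairs
    [0.88, 0.20, 0.12],   # member 2
    [0.95, 0.80, 0.08],   # member 3
]
calls = ensemble_predict(members)
```

A Bayesian LoRA variant would replace the explicit members with posterior samples, but the abstain-on-disagreement logic is unchanged.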
Improving Text Style Transfer using Masked Diffusion Language Models with Inference-time Scaling
Masked diffusion language models (MDMs) have recently gained traction as a viable generative framework for natural language. This can be attributed to their scalability and ease of training compared to other diffusion model paradigms for discrete data, establishing MDMs as the state-of-the-art non-autoregressive generators for discrete data. Diffusion models in general have shown an excellent ability to improve generation quality by leveraging inference-time scaling, either by increasing the number of denoising steps or by using external verifiers on top of the outputs of each step to guide the generation. In this work, we propose a verifier-based inference-time scaling method that aids in finding a better candidate generation during the denoising process of the MDM. Our experiments demonstrate the application of MDMs for standard text-style transfer tasks and establish MDMs as a better alternative to autoregressive language models. Additionally, we show that a simple soft-value-based verifier setup for MDMs using off-the-shelf pre-trained embedding models leads to significant gains in generation quality even when used on top of typical classifier-free guidance setups in the existing literature.
Updated: 2025-08-14 18:01:22
Subjects: cs.CL,cs.LG
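The verifier-guided loop can be sketched abstractly: at each denoising step, draw several candidate unmaskings and keep the one the verifier scores highest. The character-level denoiser and the agreement-counting verifier below are toy stand-ins for the paper's MDM and soft-value verifier:

```python
import random

def verifier_guided_denoise(masked, candidates_fn, verifier, steps):
    """Best-of-k selection at every denoising step of a masked sequence."""
    seq = list(masked)
    for _ in range(steps):
        if "_" not in seq:
            break
        proposals = candidates_fn(seq)          # k candidate one-step unmaskings
        seq = max(proposals, key=verifier)      # keep the highest-scoring one
    return "".join(seq)

random.seed(0)
TARGET = "hello"   # hypothetical style target the verifier rewards

def candidates_fn(seq, k=4):
    out = []
    for _ in range(k):
        s = list(seq)
        i = s.index("_")                        # unmask the first masked position
        s[i] = random.choice("abcdefghijklmnopqrstuvwxyz")
        out.append(s)
    return out

def verifier(seq):                              # toy soft value: target agreement
    return sum(a == b for a, b in zip(seq, TARGET))

result = verifier_guided_denoise("_____", candidates_fn, verifier, steps=5)
```

Scaling inference time here means raising k or the step count; a real setup would score candidates with embedding-model similarity rather than exact character matches.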
Match & Choose: Model Selection Framework for Fine-tuning Text-to-Image Diffusion Models
Text-to-image (T2I) models based on diffusion and transformer architectures are advancing rapidly. They are often pretrained on large corpora and openly shared on a model platform, such as HuggingFace. Users can then build up AI applications, e.g., generating media content, by adopting pretrained T2I models and fine-tuning them on the target dataset. While public pretrained T2I models facilitate the democratization of the models, users face a new challenge: which model can best be fine-tuned for the target data domain? Model selection is well addressed in classification tasks, but little is known about (pretrained) T2I models and how to predict their post-fine-tuning performance on the target domain. In this paper, we propose the first model selection framework, M&C, which enables users to efficiently choose a pretrained T2I model from a model platform without exhaustively fine-tuning them all on the target dataset. The core of M&C is a matching graph, which consists of: (i) nodes of available models and profiled datasets, and (ii) edges of model-data and data-data pairs capturing the fine-tuning performance and data similarity, respectively. We then build a model that, based on the inputs of model/data features and, critically, the graph embedding feature extracted from the matching graph, predicts the model achieving the best quality after fine-tuning for the target domain. We evaluate M&C on selecting among ten T2I models for 32 datasets against three baselines. Our results show that M&C successfully predicts the best model for fine-tuning in 61.3% of the cases and a closely performing model for the rest.
Updated: 2025-08-14 18:00:50
Subjects: cs.LG,cs.AI,cs.CL,cs.CV
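The intuition behind the matching graph can be approximated in a few lines: score each candidate model by its past fine-tuning performance on profiled datasets, weighted by how similar those datasets are to the target. All numbers here are invented, and the real M&C predicts from learned graph embeddings rather than this direct weighting:

```python
def select_model(finetune_perf, similarity_to_target):
    """Pick the model whose similarity-weighted past performance is highest."""
    scores = {}
    for model, per_dataset in finetune_perf.items():
        num = sum(similarity_to_target[d] * p for d, p in per_dataset.items())
        den = sum(similarity_to_target[d] for d in per_dataset)
        scores[model] = num / den
    return max(scores, key=scores.get), scores

finetune_perf = {                      # quality score in [0, 1], higher is better (toy)
    "sd-v1.5": {"faces": 0.71, "landscapes": 0.80},
    "sdxl":    {"faces": 0.78, "landscapes": 0.74},
}
similarity_to_target = {"faces": 0.9, "landscapes": 0.2}   # target resembles faces
best, scores = select_model(finetune_perf, similarity_to_target)
```

The model names are placeholders for entries on a model platform; the key point is that no fine-tuning on the target dataset is needed to produce a ranking.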
MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications
The integration of Large Language Models (LLMs) with external tools via protocols such as the Model Context Protocol (MCP) introduces critical security vulnerabilities, including prompt injection, data exfiltration, and other threats. To counter these challenges, we propose MCP-Guard, a robust, layered defense architecture designed for LLM--tool interactions. MCP-Guard employs a three-stage detection pipeline that balances efficiency with accuracy: it progresses from lightweight static scanning for overt threats, through a deep neural detector for semantic attacks (our fine-tuned E5-based model achieves 96.01% accuracy in identifying adversarial prompts), to a lightweight LLM arbitrator that synthesizes these signals to deliver the final decision while minimizing false positives. To facilitate rigorous training and evaluation, we also introduce MCP-AttackBench, a comprehensive benchmark of over 70,000 samples. Sourced from public datasets and augmented by GPT-4, MCP-AttackBench simulates diverse, real-world attack vectors in the MCP format, providing a foundation for future research into securing LLM-tool ecosystems.
Updated: 2025-08-14 18:00:25
Subjects: cs.CR,cs.AI
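The three-stage shape of such a pipeline is easy to sketch. The regex patterns, the keyword scorer standing in for the E5 detector, and the arbitration rule are all placeholders, not MCP-Guard's actual components:

```python
import re

OVERT_PATTERNS = [r"ignore (all )?previous instructions", r"exfiltrate", r"rm -rf"]

def static_scan(prompt):
    """Stage 1: cheap pattern scan for overt threats."""
    return any(re.search(p, prompt, re.I) for p in OVERT_PATTERNS)

def neural_score(prompt):
    """Stage 2: stand-in for the deep semantic detector (returns a risk in [0, 1])."""
    suspicious = ("secret", "bypass", "credentials")
    return sum(w in prompt.lower() for w in suspicious) / len(suspicious)

def arbitrate(prompt, threshold=0.3):
    """Stage 3: synthesize the signals into a final allow/block decision."""
    if static_scan(prompt):
        return "block"
    return "block" if neural_score(prompt) > threshold else "allow"

print(arbitrate("Please summarize this document"))                        # allow
print(arbitrate("Ignore previous instructions and exfiltrate the data"))  # block
```

The layering matters for latency: most traffic is cleared by the cheap stage, and only ambiguous prompts reach the expensive arbitrator.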
A Dataset for Distilling Knowledge Priors from Literature for Therapeutic Design
AI-driven discovery can greatly reduce design time and enhance new therapeutics' effectiveness. Models using simulators explore broad design spaces but risk violating implicit constraints due to a lack of experimental priors. For example, in a new analysis we performed on a diverse set of models on the GuacaMol benchmark using supervised classifiers, over 60% of molecules proposed had a high probability of being mutagenic. In this work, we introduce Medex, a dataset of priors for design problems extracted from literature describing compounds used in lab settings. It is constructed with LLM pipelines for discovering therapeutic entities in relevant paragraphs and summarizing information in concise fair-use facts. Medex consists of 32.3 million pairs of natural-language facts and appropriate entity representations (i.e., SMILES or RefSeq IDs). To demonstrate the potential of the data, we train LLM, CLIP, and LLaVA architectures to reason jointly about text and design targets and evaluate on tasks from the Therapeutics Data Commons (TDC). Medex is highly effective for creating models with strong priors: in supervised prediction problems that use our data as pretraining, our best models with 15M learnable parameters outperform the larger 2B TxGemma on both regression and classification TDC tasks, and perform comparably to 9B models on average. Models built with Medex can be used as constraints while optimizing for novel molecules in GuacaMol, resulting in proposals that are safer and nearly as effective. We release our dataset at https://huggingface.co/datasets/medexanon/Medex, and will provide expanded versions as available literature grows.
Updated: 2025-08-14 17:59:37
Subjects: cs.LG
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
With the increase in the number of parameters in large language models, the process of pre-training and fine-tuning increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the optimizer state. To overcome this challenge, recent approaches such as low-rank adaptation (LoRA (Hu et al., 2021)), low-rank gradient projection (GaLore (Zhao et al., 2024)), and blockwise optimization (BAdam (Luo et al., 2024)) have been proposed. However, in all these algorithms, the effective rank of the weight updates remains low, which can lead to a substantial loss of information from the gradient. This loss can be critically important, especially during the pre-training stage. In this paper, we introduce FRUGAL (Full-Rank Updates with GrAdient spLitting), a new memory-efficient optimization framework. FRUGAL leverages gradient splitting to perform low-dimensional updates using advanced algorithms (such as Adam), while updates along the remaining directions are executed via state-free methods like SGD or signSGD (Bernstein et al., 2018). Our framework can be integrated with various low-rank update selection techniques, including GaLore and BAdam. We provide theoretical convergence guarantees for our framework when using SGDM for low-dimensional updates and SGD for state-free updates. Additionally, our method consistently outperforms concurrent approaches across various fixed memory budgets, achieving state-of-the-art results in pre-training and fine-tuning tasks while balancing memory efficiency and performance metrics.
Updated: 2025-08-14 17:59:26
Subjects: cs.LG
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
Large Language Models (LLMs) have significantly advanced the state-of-the-art in various coding tasks. Beyond directly answering user queries, LLMs can also serve as judges, assessing and comparing the quality of responses generated by other models. Such an evaluation capability is crucial both for benchmarking different LLMs and for improving response quality through response ranking. However, despite the growing adoption of the LLM-as-a-Judge paradigm, its effectiveness in coding scenarios remains underexplored due to the absence of dedicated benchmarks. To address this gap, we introduce CodeJudgeBench, a benchmark explicitly designed to evaluate the performance of LLM-as-a-Judge models across three critical coding tasks: code generation, code repair, and unit test generation. Through comprehensive benchmarking of 26 LLM-as-a-Judge models, we find that recent thinking models significantly outperform non-thinking models on our carefully designed code judging tasks. Notably, even relatively small thinking models, such as Qwen3-8B, can outperform specially trained LLM-as-a-Judge models up to 70B in size. Nevertheless, all models still exhibit significant randomness in their judgment of coding tasks. For pairwise judging tasks, simply changing the order in which responses are presented can substantially impact accuracy. In addition, when judging code and unit tests written by different LLMs, LLM-as-a-Judge models also show variance in performance. This sensitivity raises concerns about the reliability and consistency of LLM-as-a-Judge in coding scenarios. Lastly, we study optimal prompting strategies for LLM-as-a-Judge. We find that using pair-wise comparison outperforms scalar point-wise judging. Furthermore, retaining comments and reasoning in the full, unprocessed LLM response leads to improved judge performance.
Updated: 2025-08-14 17:58:50
Subjects: cs.CL,cs.AI,cs.SE
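The order-sensitivity finding suggests a simple mitigation when deploying any pairwise judge: query both presentation orders and accept a verdict only when they agree. The judge below is a deterministic stub with a built-in position bias, purely to demonstrate the aggregation; it is not how the benchmarked LLM judges work:

```python
def biased_judge(first, second):
    """Toy judge: prefers the longer answer, but ties go to whichever is shown first."""
    if len(first) == len(second):
        return "first"                 # position bias surfaces on ties
    return "first" if len(first) > len(second) else "second"

def debiased_compare(a, b, judge):
    """Run both orders; return 'a', 'b', or 'tie' when the orders disagree."""
    v1 = judge(a, b)                   # a shown first
    v2 = judge(b, a)                   # b shown first
    if v1 == "first" and v2 == "second":
        return "a"
    if v1 == "second" and v2 == "first":
        return "b"
    return "tie"

print(debiased_compare("short", "a longer answer", biased_judge))
print(debiased_compare("same len", "len same", biased_judge))   # bias exposed -> tie
```

Aggregating over both orders trades twice the judge calls for verdicts that cannot be flipped by presentation order alone.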
BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them
Understanding biases and stereotypes encoded in the weights of Large Language Models (LLMs) is crucial for developing effective mitigation strategies. Biased behaviour is often subtle and non-trivial to isolate, even when deliberately elicited, making systematic analysis and debiasing particularly challenging. To address this, we introduce BiasGym, a simple, cost-effective, and generalizable framework for reliably injecting, analyzing, and mitigating conceptual associations within LLMs. BiasGym consists of two components: BiasInject, which injects specific biases into the model via token-based fine-tuning while keeping the model frozen, and BiasScope, which leverages these injected signals to identify and steer the components responsible for biased behavior. Our method enables consistent bias elicitation for mechanistic analysis, supports targeted debiasing without degrading performance on downstream tasks, and generalizes to biases unseen during token-based fine-tuning. We demonstrate the effectiveness of BiasGym in reducing real-world stereotypes (e.g., people from Italy being `reckless drivers') and in probing fictional associations (e.g., people from a fictional country having `blue skin'), showing its utility for both safety interventions and interpretability research.
Updated: 2025-08-14 17:57:53
Subjects: cs.CL,cs.AI,cs.LG
Grounding Rule-Based Argumentation Using Datalog
ASPIC+ is one of the main general frameworks for rule-based argumentation for AI. Although first-order rules are commonly used in ASPIC+ examples, most existing approaches to reason over rule-based argumentation only support propositional rules. To enable reasoning over first-order instances, a preliminary grounding step is required. As groundings can lead to an exponential increase in the size of the input theories, intelligent procedures are needed. However, there is a lack of dedicated solutions for ASPIC+. Therefore, we propose an intelligent grounding procedure that keeps the size of the grounding manageable while preserving the correctness of the reasoning process. To this end, we translate the first-order ASPIC+ instance into a Datalog program and query a Datalog engine to obtain ground substitutions to perform the grounding of rules and contraries. Additionally, we propose simplifications specific to the ASPIC+ formalism to avoid grounding of rules that have no influence on the reasoning process. Finally, we performed an empirical evaluation of a prototypical implementation to show scalability.
Updated: 2025-08-14 17:57:32
Subjects: cs.AI
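The grounding step itself can be illustrated with a naive substitution enumerator: bind a rule's variables to every combination of constants and keep the instances whose body atoms all hold. A real Datalog engine, as the paper proposes, does this far more intelligently (indexing, semi-naive evaluation, pruning rules that cannot fire); the predicates below are a toy example, not from the paper:

```python
from itertools import product

def ground_rule(head, body, facts):
    """Enumerate ground instances of 'head :- body' over a set of facts.
    Variables are uppercase strings; facts are (predicate, arg, ...) tuples."""
    variables = sorted({a for atom in body for a in atom[1:] if a.isupper()})
    constants = sorted({a for fact in facts for a in fact[1:]})
    grounded = []
    for binding in product(constants, repeat=len(variables)):
        env = dict(zip(variables, binding))
        subst = lambda atom: (atom[0],) + tuple(env.get(a, a) for a in atom[1:])
        if all(subst(atom) in facts for atom in body):   # body satisfied?
            grounded.append((subst(head), [subst(a) for a in body]))
    return grounded

facts = {("bird", "tweety"), ("bird", "polly"), ("penguin", "polly")}
# defeasible rule: flies(X) :- bird(X)
rules = ground_rule(("flies", "X"), [("bird", "X")], facts)
```

This brute-force enumeration is exponential in the number of variables, which is exactly why delegating the binding search to a Datalog engine pays off on first-order ASPIC+ instances.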
Uplifted Attackers, Human Defenders: The Cyber Offense-Defense Balance for Trailing-Edge Organizations
Advances in AI are widely understood to have implications for cybersecurity. Articles have emphasized the effect of AI on the cyber offense-defense balance, and commentators argue variously that AI will privilege attackers or that it will privilege defenders. For defenders, arguments are often made that AI will enable solutions like formal verification of all software--and for some well-equipped companies, this may be true. This conversation, however, does not match the reality for most companies. "Trailing-edge organizations," as we term them, rely heavily on legacy software, understaff security roles, and struggle to implement best practices like rapid deployment of security patches. These decisions may be the result of corporate inertia, but may also be the result of a seemingly rational calculation that attackers may not bother targeting a firm due to lack of economic incentives, and that, as a result, underinvestment in defense will not be punished. This approach to security may have been sufficient prior to the development of AI systems, but it is unlikely to remain viable in the near future. We argue that continuing improvements in AI's capabilities pose additional risks on two fronts: First, increased usage of AI will alter the economics of the marginal cyberattack and expose these trailing-edge organizations to more attackers, more frequently. Second, AI's advances will enable attackers to develop exploits and launch attacks earlier than they can today--meaning that it is not enough for these companies to attain parity with today's leading defenders; they must instead aim for faster remediation timelines and more resilient software. The situation today portends a dramatically increased number of attacks in the near future. Moving forward, we offer a range of solutions for both organizations and governments to improve the defensive posture of firms which lag behind their peers today.
Updated: 2025-08-14 17:56:57
Subjects: cs.CR,cs.AI
Empirical Investigation into Configuring Echo State Networks for Representative Benchmark Problem Domains
This paper examines the performance of the Echo State Network (ESN), a powerful recurrent neural network configured as a reservoir computer, on four different benchmark problems, then proposes heuristics, or rules of thumb, for configuring the architecture and for selecting parameters and their values. These heuristics are applicable to problems within the same domain and help fill the experience gap faced by those entering this field of study. Without experience in the field, the influence of parameter selections, value adjustments, and architectural changes on ESN performance can be challenging to fully comprehend, and even some hyperparameter optimization algorithms may have difficulty adjusting parameter values without proper manual selections made first. It is therefore imperative to understand the effects of parameters and their value selection on ESN architecture performance for a successful build. To address the requirement for an extensive background in ESN architecture, and to examine how ESN performance is affected by variations in architecture, design, and parameter selection and values, a series of benchmark tasks representing different problem domains, including time series prediction, pattern generation, chaotic system prediction, and time series classification, were modeled and experimented on to show their impact on ESN performance.
Updated: 2025-08-14 17:55:47
Subjects: cs.NE,cs.AI,cs.LG
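The architecture under study reduces to a fixed random reservoir updated as x_{t+1} = tanh(W_in u_t + W x_t) plus a trained linear readout. A minimal pure-Python sketch of the reservoir half (the sizes, input scaling, and spectral radius of 0.9 are arbitrary choices here, which is precisely the configuration problem the paper's heuristics target):

```python
import math
import random

random.seed(0)
N_IN, N_RES, SPECTRAL = 1, 20, 0.9   # heuristic: spectral radius < 1 for echo state

W_in = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_RES)]
W = [[random.uniform(-1, 1) for _ in range(N_RES)] for _ in range(N_RES)]
# crude spectral scaling: divide by the max row sum, an upper bound on the radius
bound = max(sum(abs(v) for v in row) for row in W)
W = [[SPECTRAL * v / bound for v in row] for row in W]

def step(x, u):
    """One reservoir update: x' = tanh(W_in u + W x)."""
    return [math.tanh(sum(W_in[i][j] * u[j] for j in range(N_IN))
                      + sum(W[i][j] * x[j] for j in range(N_RES)))
            for i in range(N_RES)]

# drive the reservoir with a sine wave and collect states for a linear readout
x = [0.0] * N_RES
states = []
for t in range(100):
    x = step(x, [math.sin(0.2 * t)])
    states.append(x)
```

Only the linear readout over the collected states would be trained (typically by ridge regression); W_in and W stay fixed, which is what makes reservoir computing cheap to fit.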
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
Recent advances in large language model (LLM) pretraining have shown that simply scaling data quantity eventually leads to diminishing returns, hitting a data wall. In response, the use of synthetic data for pretraining has emerged as a promising paradigm for pushing the frontier of performance. Despite this, the factors affecting synthetic data quality remain poorly understood. In this work, we introduce BeyondWeb, a synthetic data generation framework that produces high-quality synthetic data for pretraining. BeyondWeb significantly extends the capabilities of traditional web-scale datasets, outperforming state-of-the-art synthetic pretraining datasets such as Cosmopedia and Nemotron-CC's high-quality synthetic subset (Nemotron-Synth) by up to 5.1 percentage points (pp) and 2.6pp, respectively, when averaged across a suite of 14 benchmark evaluations. It delivers up to 7.7x faster training than open web data and 2.7x faster than Nemotron-Synth. Remarkably, a 3B model trained for 180B tokens on BeyondWeb outperforms an 8B model trained for the same token budget on Cosmopedia. We also present several insights from BeyondWeb on synthetic data for pretraining: what drives its benefits, which data to rephrase and how, and the impact of model size and family on data quality. Overall, our work shows that there's no silver bullet for generating high-quality synthetic pretraining data. The best outcomes require jointly optimizing many factors, a challenging task that requires rigorous science and practical expertise. Naive approaches can yield modest improvements, potentially at great cost, while well-executed methods can yield transformative improvements, as exemplified by BeyondWeb.
Updated: 2025-08-14 17:55:47
Subjects: cs.LG,cs.CL
A Parametric Contextual Online Learning Theory of Brokerage
We study the role of contextual information in the online learning problem of brokerage between traders. In this sequential problem, at each time step, two traders arrive with secret valuations about an asset they wish to trade. The learner (a broker) suggests a trading (or brokerage) price based on contextual data about the asset and the market conditions. Then, the traders reveal their willingness to buy or sell based on whether their valuations are higher or lower than the brokerage price. A trade occurs if one of the two traders decides to buy and the other to sell, i.e., if the broker's proposed price falls between the smallest and the largest of their two valuations. We design algorithms for this problem and prove optimal theoretical regret guarantees under various standard assumptions.
Updated: 2025-08-14 17:53:29
Subjects: q-fin.CP,cs.GT,cs.LG,stat.ML
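The interaction protocol can be made concrete with a toy simulation: the broker posts a price, observes only the buy/sell directions, and a trade clears when the price falls between the two private valuations. The running price-adjustment rule below is a naive placeholder, not the paper's contextual algorithm, and all numbers are invented:

```python
import random

random.seed(1)

def run_brokerage(rounds=500, true_value=10.0, noise=1.0):
    price, trades = 5.0, 0                 # deliberately poor initial price
    for _ in range(rounds):
        v1 = true_value + random.uniform(-noise, noise)   # secret valuations
        v2 = true_value + random.uniform(-noise, noise)
        if min(v1, v2) <= price <= max(v1, v2):
            trades += 1                    # one buys, one sells: trade clears
        # feedback: each trader reveals only a buy/sell direction vs. the price
        direction = (v1 > price) + (v2 > price) - 1       # in {-1, 0, 1}
        price += 0.05 * direction          # naive price adjustment
    return price, trades

price, trades = run_brokerage()
```

In the paper's setting the asset's value also depends on observable context, so the learner maps context features to a price instead of tracking a single scalar.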
Leveraging large language models for SQL behavior-based database intrusion detection
Database systems are extensively used to store critical data across various domains. However, the frequency of abnormal database access behaviors, such as database intrusion by internal and external attacks, continues to rise. Internal masqueraders often have greater organizational knowledge, making it easier to mimic employee behavior effectively. In contrast, external masqueraders may behave differently due to their lack of familiarity with the organization. Current approaches lack the granularity needed to detect anomalies at the operational level, frequently misclassifying entire sequences of operations as anomalies, even though most operations are likely to represent normal behavior. On the other hand, some anomalous behaviors often resemble normal activities, making them difficult for existing detection methods to identify. This paper introduces a two-tiered anomaly detection approach for Structured Query Language (SQL) using the Bidirectional Encoder Representations from Transformers (BERT) model, specifically DistilBERT, a more efficient, pre-trained version. Our method combines both unsupervised and supervised machine learning techniques to accurately identify anomalous activities while minimizing the need for data labeling. First, the unsupervised method uses ensemble anomaly detectors that flag embedding vectors distant from learned normal patterns of typical user behavior across the database (out-of-scope queries). Second, the supervised method uses fine-tuned transformer-based models to detect internal attacks with high precision (in-scope queries), using role-labeled classification, even on limited labeled SQL data. Our findings make a significant contribution by providing an effective solution for safeguarding critical database systems from sophisticated threats.
Updated: 2025-08-14 17:51:40
Categories: cs.CR,cs.DB,cs.LG
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
Traditional cartoon and anime production involves keyframing, inbetweening, and colorization stages, which require intensive manual effort. Despite recent advances in AI, existing methods often handle these stages separately, leading to error accumulation and artifacts. For instance, inbetweening approaches struggle with large motions, while colorization methods require dense per-frame sketches. To address this, we introduce ToonComposer, a generative model that unifies inbetweening and colorization into a single post-keyframing stage. ToonComposer employs a sparse sketch injection mechanism to provide precise control using keyframe sketches. Additionally, it uses a cartoon adaptation method with the spatial low-rank adapter to tailor a modern video foundation model to the cartoon domain while keeping its temporal prior intact. Requiring as few as a single sketch and a colored reference frame, ToonComposer excels with sparse inputs, while also supporting multiple sketches at any temporal location for more precise motion control. This dual capability reduces manual workload and improves flexibility, empowering artists in real-world scenarios. To evaluate our model, we further created PKBench, a benchmark featuring human-drawn sketches that simulate real-world use cases. Our evaluation demonstrates that ToonComposer outperforms existing methods in visual quality, motion consistency, and production efficiency, offering a superior and more flexible solution for AI-assisted cartoon production.
Updated: 2025-08-14 17:50:11
Categories: cs.CV,cs.AI
Searching for Privacy Risks in LLM Agents via Simulation
The widespread deployment of LLM-based agents is likely to introduce a critical privacy threat: malicious agents that proactively engage others in multi-turn interactions to extract sensitive information. These dynamic dialogues enable adaptive attack strategies that can cause severe privacy violations, yet their evolving nature makes it difficult to anticipate and discover sophisticated vulnerabilities manually. To tackle this problem, we present a search-based framework that alternates between improving attacker and defender instructions by simulating privacy-critical agent interactions. Each simulation involves three roles: data subject, data sender, and data recipient. While the data subject's behavior is fixed, the attacker (data recipient) attempts to extract sensitive information from the defender (data sender) through persistent and interactive exchanges. To explore this interaction space efficiently, our search algorithm employs LLMs as optimizers, using parallel search with multiple threads and cross-thread propagation to analyze simulation trajectories and iteratively propose new instructions. Through this process, we find that attack strategies escalate from simple direct requests to sophisticated multi-turn tactics such as impersonation and consent forgery, while defenses advance from rule-based constraints to identity-verification state machines. The discovered attacks and defenses transfer across diverse scenarios and backbone models, demonstrating strong practical utility for building privacy-aware agents.
Updated: 2025-08-14 17:49:09
Categories: cs.CR,cs.AI,cs.CL
An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise
Given $n$ i.i.d. random matrices $A_i \in \mathbb{R}^{d \times d}$ that share a common expectation $\Sigma$, the objective of Differentially Private Stochastic PCA is to identify a subspace of dimension $k$ that captures the largest variance directions of $\Sigma$, while preserving differential privacy (DP) of each individual $A_i$. Existing methods either (i) require the sample size $n$ to scale super-linearly with dimension $d$, even under Gaussian assumptions on the $A_i$, or (ii) introduce excessive noise for DP even when the intrinsic randomness within $A_i$ is small. Liu et al. (2022a) addressed these issues for sub-Gaussian data but only for estimating the top eigenvector ($k=1$) using their algorithm DP-PCA. We propose the first algorithm capable of estimating the top $k$ eigenvectors for arbitrary $k \leq d$, whilst overcoming both limitations above. For $k=1$ our algorithm matches the utility guarantees of DP-PCA, achieving near-optimal statistical error even when $n = \tilde{\!O}(d)$. We further provide a lower bound for general $k > 1$, matching our upper bound up to a factor of $k$, and experimentally demonstrate the advantages of our algorithm over comparable baselines.
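For contrast with the adaptive-noise approach proposed here, the naive input-perturbation baseline can be sketched in a few lines: average the $A_i$, add isotropic Gaussian noise for privacy, and extract the top eigenvector by power iteration. This is not the paper's algorithm, only the kind of reference point it improves on, and the noise scale `sigma` is left symbolic rather than calibrated to a DP budget.

```python
import random

def top_eigenvector(M, iters=200):
    """Power iteration for the leading eigenvector of a symmetric matrix."""
    d = len(M)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def noisy_top_eigvec(matrices, sigma, seed=0):
    """Input perturbation: noise the empirical mean of the A_i, then PCA."""
    rng = random.Random(seed)
    d = len(matrices[0])
    mean = [[sum(A[i][j] for A in matrices) / len(matrices) for j in range(d)]
            for i in range(d)]
    noisy = [[mean[i][j] + rng.gauss(0, sigma) for j in range(d)] for i in range(d)]
    # Symmetrize so power iteration targets a real eigenvector.
    sym = [[(noisy[i][j] + noisy[j][i]) / 2 for j in range(d)] for i in range(d)]
    return top_eigenvector(sym)
```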
Updated: 2025-08-14 17:48:45
Categories: stat.ML,cs.CR,cs.IT,cs.LG,math.IT,math.ST,stat.TH
A Survey on Diffusion Language Models
Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent advantages in reducing inference latency and capturing bidirectional context, thereby enabling fine-grained control over the generation process. While achieving a several-fold speed-up, recent advancements have allowed DLMs to show performance comparable to their autoregressive counterparts, making them a compelling choice for various natural language processing tasks. In this survey, we provide a holistic overview of the current DLM landscape. We trace its evolution and relationship with other paradigms, such as autoregressive and masked language models, and cover both foundational principles and state-of-the-art models. Our work offers an up-to-date, comprehensive taxonomy and an in-depth analysis of current techniques, from pre-training strategies to advanced post-training methods. Another contribution of this survey is a thorough review of DLM inference strategies and optimizations, including improvements in decoding parallelism, caching mechanisms, and generation quality. We also highlight the latest approaches to multimodal extensions of DLMs and delineate their applications across various practical scenarios. Furthermore, our discussion addresses the limitations and challenges of DLMs, including efficiency, long-sequence handling, and infrastructure requirements, while outlining future research directions to sustain progress in this rapidly evolving field. Project GitHub is available at https://github.com/VILA-Lab/Awesome-DLMs.
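The parallel, iterative denoising loop that distinguishes DLMs from left-to-right AR decoding can be caricatured in a few lines. Here `predict` is a stand-in for a trained diffusion language model that scores every masked position at once; the confidence-based unmasking schedule is one common choice, not the only one surveyed.

```python
MASK = "<m>"

def denoise(length, predict, steps):
    """Generate a sequence by iterative parallel unmasking.

    predict(seq) -> list of (confidence, token) pairs, one per position.
    At each step, the most confident masked positions are filled in parallel.
    """
    seq = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in seq:
        scored = [(conf, i, tok) for i, (conf, tok) in enumerate(predict(seq))
                  if seq[i] == MASK]
        for conf, i, tok in sorted(scored, reverse=True)[:per_step]:
            seq[i] = tok
    return seq
```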
Updated: 2025-08-14 17:47:22
Categories: cs.CL,cs.AI,cs.LG
Combining Machine Learning Defenses without Conflicts
Machine learning (ML) defenses protect against various risks to security, privacy, and fairness. Real-life models need simultaneous protection against multiple different risks which necessitates combining multiple defenses. But combining defenses with conflicting interactions in an ML model can be ineffective, incurring a significant drop in the effectiveness of one or more defenses being combined. Practitioners need a way to determine if a given combination can be effective. Experimentally identifying effective combinations can be time-consuming and expensive, particularly when multiple defenses need to be combined. We need an inexpensive, easy-to-use combination technique to identify effective combinations. Ideally, a combination technique should be (a) accurate (correctly identifies whether a combination is effective or not), (b) scalable (allows combining multiple defenses), (c) non-invasive (requires no change to the defenses being combined), and (d) general (is applicable to different types of defenses). Prior works have identified several ad-hoc techniques but none satisfy all the requirements above. We propose a principled combination technique, Def\Con, to identify effective defense combinations. Def\Con meets all requirements, achieving 90% accuracy on eight combinations explored in prior work and 81% in 30 previously unexplored combinations that we empirically evaluate in this paper.
Updated: 2025-08-14 17:44:57
Categories: cs.CR,cs.LG
TLE-Based A2C Agent for Terrestrial Coverage Orbital Path Planning
The increasing congestion of Low Earth Orbit (LEO) poses persistent challenges to the efficient deployment and safe operation of Earth observation satellites. Mission planners must now account not only for mission-specific requirements but also for the increasing collision risk with active satellites and space debris. This work presents a reinforcement learning framework using the Advantage Actor-Critic (A2C) algorithm to optimize satellite orbital parameters for precise terrestrial coverage within predefined surface radii. By formulating the problem as a Markov Decision Process (MDP) within a custom OpenAI Gymnasium environment, our method simulates orbital dynamics using classical Keplerian elements. The agent progressively learns to adjust five of the orbital parameters - semi-major axis, eccentricity, inclination, right ascension of ascending node, and the argument of perigee - to achieve targeted terrestrial coverage. Comparative evaluation against Proximal Policy Optimization (PPO) demonstrates A2C's superior performance, achieving 5.8x higher cumulative rewards (10.0 vs 9.263025) while converging in 31.5x fewer timesteps (2,000 vs 63,000). The A2C agent consistently meets mission objectives across diverse target coordinates while maintaining computational efficiency suitable for real-time mission planning applications. Key contributions include: (1) a TLE-based orbital simulation environment incorporating physics constraints, (2) validation of actor-critic methods' superiority over trust region approaches in continuous orbital control, and (3) demonstration of rapid convergence enabling adaptive satellite deployment. This approach establishes reinforcement learning as a computationally efficient alternative for scalable and intelligent LEO mission planning.
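A stripped-down sketch of such an environment's step logic. The real environment subclasses gymnasium.Env, propagates TLE-based dynamics, and uses a coverage-based reward; here the five Keplerian parameters are adjusted directly and the reward is just negative distance to a target configuration, purely to show the interface shape.

```python
class OrbitEnvSketch:
    """Toy stand-in for the TLE-based coverage environment."""

    # The five adjustable orbital elements named in the abstract.
    PARAMS = ["semi_major_axis", "eccentricity", "inclination",
              "raan", "arg_perigee"]

    def __init__(self, target):
        self.target = target          # desired 5-element configuration
        self.state = [0.0] * 5

    def step(self, action):
        # action: per-parameter adjustment deltas chosen by the agent
        self.state = [s + a for s, a in zip(self.state, action)]
        error = sum((s - t) ** 2
                    for s, t in zip(self.state, self.target)) ** 0.5
        reward = -error               # closer to target coverage -> higher reward
        done = error < 1e-3
        return self.state, reward, done
```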
Updated: 2025-08-14 17:44:51
Categories: cs.RO,cs.AI
Medico 2025: Visual Question Answering for Gastrointestinal Imaging
The Medico 2025 challenge addresses Visual Question Answering (VQA) for Gastrointestinal (GI) imaging, organized as part of the MediaEval task series. The challenge focuses on developing Explainable Artificial Intelligence (XAI) models that answer clinically relevant questions based on GI endoscopy images while providing interpretable justifications aligned with medical reasoning. It introduces two subtasks: (1) answering diverse types of visual questions using the Kvasir-VQA-x1 dataset, and (2) generating multimodal explanations to support clinical decision-making. The Kvasir-VQA-x1 dataset, created from 6,500 images and 159,549 complex question-answer (QA) pairs, serves as the benchmark for the challenge. By combining quantitative performance metrics and expert-reviewed explainability assessments, this task aims to advance trustworthy Artificial Intelligence (AI) in medical image analysis. Instructions, data access, and an updated guide for participation are available in the official competition repository: https://github.com/simula/MediaEval-Medico-2025
Updated: 2025-08-14 17:43:46
Categories: cs.CV,cs.AI,68T45, 92C55,I.2.10; I.4.9
Efficiently Verifiable Proofs of Data Attribution
Data attribution methods aim to answer useful counterfactual questions like "what would an ML model's prediction be if it were trained on a different dataset?" However, estimation of data attribution models through techniques like empirical influence or "datamodeling" remains very computationally expensive. This causes a critical trust issue: if only a few computationally rich parties can obtain data attributions, how can resource-constrained parties trust that the provided attributions are indeed "good," especially when they are used for important downstream applications (e.g., data pricing)? In this paper, we address this trust issue by proposing an interactive verification paradigm for data attribution. An untrusted and computationally powerful Prover learns data attributions, and then engages in an interactive proof with a resource-constrained Verifier. Our main result is a protocol that provides formal completeness, soundness, and efficiency guarantees in the sense of Probably-Approximately-Correct (PAC) verification. Specifically, if both Prover and Verifier follow the protocol, the Verifier accepts data attributions that are $\epsilon$-close to the optimal data attributions (in terms of the Mean Squared Error) with probability $1-\delta$. Conversely, if the Prover arbitrarily deviates from the protocol, even with infinite compute, then this is detected (or it still yields data attributions to the Verifier) except with probability $\delta$. Importantly, our protocol ensures the Verifier's workload, measured by the number of independent model retrainings it must perform, scales only as $O(1/\epsilon)$; i.e., independently of the dataset size. At a technical level, our results apply to efficiently verifying any linear function over the boolean hypercube computed by the Prover, making them broadly applicable to various attribution tasks.
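Schematically, the Verifier's side can be pictured as spot-checking the Prover's claimed attribution function on a number of freshly retrained random subsets that depends only on the target accuracy. This toy sketch omits the actual protocol's interaction, soundness machinery, and the role of $\delta$; all names are ours.

```python
import random

def verifier_accepts(claimed, retrain_and_eval, dataset_size, eps, seed=0):
    """Accept iff the claimed attribution function's mean squared error
    against freshly retrained models is at most eps. The number of
    retrainings scales as O(1/eps), independent of dataset_size."""
    rng = random.Random(seed)
    n_checks = max(1, round(1.0 / eps))
    sq_err = 0.0
    for _ in range(n_checks):
        # Draw a random training subset and compare claimed vs. actual outcome.
        subset = frozenset(i for i in range(dataset_size) if rng.random() < 0.5)
        sq_err += (claimed(subset) - retrain_and_eval(subset)) ** 2
    return sq_err / n_checks <= eps
```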
Updated: 2025-08-14 17:36:01
Categories: cs.LG
Performance of GPT-5 in Brain Tumor MRI Reasoning
Accurate differentiation of brain tumor types on magnetic resonance imaging (MRI) is critical for guiding treatment planning in neuro-oncology. Recent advances in large language models (LLMs) have enabled visual question answering (VQA) approaches that integrate image interpretation with natural language reasoning. In this study, we evaluated GPT-4o, GPT-5-nano, GPT-5-mini, and GPT-5 on a curated brain tumor VQA benchmark derived from 3 Brain Tumor Segmentation (BraTS) datasets - glioblastoma (GLI), meningioma (MEN), and brain metastases (MET). Each case included multi-sequence MRI triplanar mosaics and structured clinical features transformed into standardized VQA items. Models were assessed in a zero-shot chain-of-thought setting for accuracy on both visual and reasoning tasks. Results showed that GPT-5-mini achieved the highest macro-average accuracy (44.19%), followed by GPT-5 (43.71%), GPT-4o (41.49%), and GPT-5-nano (35.85%). Performance varied by tumor subtype, with no single model dominating across all cohorts. These findings suggest that GPT-5 family models can achieve moderate accuracy in structured neuro-oncological VQA tasks, but not at a level acceptable for clinical use.
Updated: 2025-08-14 17:35:31
Categories: cs.CV,cs.AI
From Black Box to Transparency: Enhancing Automated Interpreting Assessment with Explainable AI in College Classrooms
Recent advancements in machine learning have spurred growing interests in automated interpreting quality assessment. Nevertheless, existing research suffers from insufficient examination of language use quality, unsatisfactory modeling effectiveness due to data scarcity and imbalance, and a lack of efforts to explain model predictions. To address these gaps, we propose a multi-dimensional modeling framework that integrates feature engineering, data augmentation, and explainable machine learning. This approach prioritizes explainability over ``black box'' predictions by utilizing only construct-relevant, transparent features and conducting Shapley Value (SHAP) analysis. Our results demonstrate strong predictive performance on a novel English-Chinese consecutive interpreting dataset, identifying BLEURT and CometKiwi scores to be the strongest predictive features for fidelity, pause-related features for fluency, and Chinese-specific phraseological diversity metrics for language use. Overall, by placing particular emphasis on explainability, we present a scalable, reliable, and transparent alternative to traditional human evaluation, facilitating the provision of detailed diagnostic feedback for learners and supporting self-regulated learning advantages not afforded by automated scores in isolation.
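The SHAP analysis above rests on the Shapley value. For a tiny model it can be computed exactly by averaging each feature's marginal contribution over all orderings; this exponential-time definition is what practical SHAP implementations approximate. The feature names below are illustrative.

```python
from itertools import permutations

def shapley(features, value_fn):
    """Exact Shapley values: average marginal contribution of each
    feature over every ordering of the feature set."""
    attrib = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        included = set()
        for f in order:
            before = value_fn(frozenset(included))
            included.add(f)
            attrib[f] += value_fn(frozenset(included)) - before
    return {f: total / len(orders) for f, total in attrib.items()}
```

For an additive model, each feature's Shapley value equals its individual contribution, which makes the computation easy to check by hand.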
Updated: 2025-08-14 17:31:18
Categories: cs.CL,cs.AI
Privacy-Aware Detection of Fake Identity Documents: Methodology, Benchmark, and Improved Detection Methods (FakeIDet2)
Remote user verification in Internet-based applications is becoming increasingly important nowadays. A popular scenario for it consists of submitting a picture of the user's Identity Document (ID) to a service platform, authenticating its veracity, and then granting access to the requested digital service. An ID is well-suited to verify the identity of an individual, since it is government issued, unique, and nontransferable. However, with recent advances in Artificial Intelligence (AI), attackers can surpass security measures in IDs and create very realistic physical and synthetic fake IDs. Researchers are now trying to develop methods to detect an ever-growing number of these AI-based fakes that are almost indistinguishable from authentic (bona fide) IDs. In this counterattack effort, researchers are faced with an important challenge: the difficulty in using real data to train fake ID detectors. This real data scarcity for research and development is originated by the sensitive nature of these documents, which are usually kept private by the ID owners (the users) and the ID Holders (e.g., government, police, bank, etc.). The main contributions of our study are: 1) We propose and discuss a patch-based methodology to preserve privacy in fake ID detection research. 2) We provide a new public database, FakeIDet2-db, comprising over 900K real/fake ID patches extracted from 2,000 ID images, acquired using different smartphone sensors, illumination and height conditions, etc. In addition, three physical attacks are considered: print, screen, and composite. 3) We present a new privacy-aware fake ID detection method, FakeIDet2. 4) We release a standard reproducible benchmark that considers physical and synthetic attacks from popular databases in the literature.
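The patch-based idea in contribution 1 can be illustrated with a plain sliding-window crop: the detector sees small patches rather than the whole document, so no single training sample exposes a complete ID. Patch size and stride here are arbitrary choices for illustration.

```python
def extract_patches(image, patch, stride):
    """image: 2-D list of pixel rows; returns all patch x patch crops
    taken with the given stride."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            patches.append([row[left:left + patch]
                            for row in image[top:top + patch]])
    return patches
```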
Updated: 2025-08-14 17:30:36
Categories: cs.CR,cs.AI,cs.CV,eess.IV
CrossDenoise: Denoising Implicit Feedback via a Lightweight Entity-Aware Synergistic Framework
Recommender systems heavily rely on implicit feedback, which is inherently noisy due to false positives and negatives, severely degrading recommendation accuracy. Existing denoising strategies often overlook entity-aware modeling, suffer from high computational overhead, or demand excessive hyperparameter tuning, limiting their real-world applicability. We propose CrossDenoise, a novel and lightweight framework that addresses these challenges by disentangling noise estimation into user-, item-, and interaction-specific factors. Leveraging empirical observations that show significant heterogeneity in user and item noise propensities, CrossDenoise computes entity reputation factors (user/item reliability) via a rank-based linear mapping of average training losses. These are fused with interaction-level weights derived from an empirical cumulative distribution function (ECDF) of individual losses. This design is model-agnostic, computationally efficient, and requires only two intuitive hyperparameters. Extensive experiments on ML-1M, Yelp, and Amazon-book datasets, across GMF, NeuMF, and CDAE backbones, demonstrate that CrossDenoise consistently and significantly outperforms state-of-the-art baselines. For instance, it achieves up to 27.01% NDCG@50 gain on Yelp with NeuMF, while incurring negligible computational and memory overhead. Our analysis confirms that CrossDenoise effectively separates clean from noisy samples and remains robust under varied hyperparameter settings. It offers a practical and scalable solution for denoising implicit feedback.
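A minimal sketch of the two ingredients named above, with toy losses. The paper's exact mapping and fusion rule may differ; this only illustrates a rank-based linear map over per-entity average training losses and an ECDF-style interaction-level weight.

```python
def reputation(avg_losses):
    """Rank-based linear map: the entity with the lowest average training
    loss gets reputation 1.0, the highest gets 0.0."""
    order = sorted(range(len(avg_losses)), key=lambda i: avg_losses[i])
    n = len(avg_losses)
    rep = [1.0] * n
    for rank, i in enumerate(order):
        if n > 1:
            rep[i] = 1.0 - rank / (n - 1)
    return rep

def ecdf_weight(loss, all_losses):
    """Interaction-level weight from the empirical CDF: the share of
    observed losses at least this large (high loss -> low weight)."""
    return sum(l >= loss for l in all_losses) / len(all_losses)
```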
Updated: 2025-08-14 17:20:12
Categories: cs.IR,cs.LG
GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo
Traditional multi-view stereo (MVS) methods rely heavily on photometric and geometric consistency constraints, but newer machine learning-based MVS methods check geometric consistency across multiple source views only as a post-processing step. In this paper, we present a novel approach that explicitly encourages geometric consistency of reference view depth maps across multiple source views at different scales during learning (see Fig. 1). We find that adding this geometric consistency loss significantly accelerates learning by explicitly penalizing geometrically inconsistent pixels, reducing the training iteration requirements to nearly half that of other MVS methods. Our extensive experiments show that our approach achieves a new state-of-the-art on the DTU and BlendedMVS datasets, and competitive results on the Tanks and Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt to enforce multi-view, multi-scale geometric consistency during learning.
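The cross-view check being moved into the loss can be caricatured in one dimension: push a reference pixel into the source view using its depth, compare the depth found there, and mark the pixel consistent or not. Real MVS uses full camera intrinsics/extrinsics with forward-backward reprojection; the disparity model below is a deliberate simplification of ours, not the paper's formulation.

```python
def consistency_mask(ref_depth, src_depth, baseline, threshold):
    """1-D toy geometric consistency: a reference pixel shifts by
    baseline / depth when projected into the source view; it is
    consistent if the source depth there agrees within threshold."""
    mask = []
    for x, d in enumerate(ref_depth):
        x_src = round(x + baseline / d)
        if 0 <= x_src < len(src_depth):
            mask.append(abs(src_depth[x_src] - d) < threshold)
        else:
            mask.append(False)   # projects outside the source view
    return mask
```

A loss term would then penalize pixels whose mask entry is False, which is the spirit of the geometric consistency loss described above.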
Updated: 2025-08-14 17:14:25
Categories: cs.CV,cs.LG
iFairy: the First 2-bit Complex LLM with All Parameters in $\{\pm1, \pm i\}$
Quantization-Aware Training (QAT) integrates quantization into the training loop, enabling LLMs to learn robust low-bit representations, and is widely recognized as one of the most promising research directions. All current QAT research focuses on minimizing quantization error on full-precision models, where the full-precision accuracy acts as an upper bound (accuracy ceiling). No existing method has even attempted to surpass this ceiling. To break this ceiling, we propose a new paradigm: raising the ceiling (full-precision model), and then still quantizing it efficiently into 2 bits. We propose Fairy$\pm i$, the first 2-bit quantization framework for complex-valued LLMs. Specifically, our method leverages the representational advantages of the complex domain to boost full-precision accuracy. We map weights to the fourth roots of unity $\{\pm1, \pm i\}$, forming a perfectly symmetric and information-theoretically optimal 2-bit representation. Importantly, each quantized weight has either a zero real or imaginary part, enabling multiplication-free inference using only additions and element swaps. Experimental results show that Fairy$\pm i$ outperforms the ceiling of existing 2-bit quantization approaches in terms of both PPL and downstream tasks, while maintaining strict storage and compute efficiency. This work opens a new direction for building highly accurate and practical LLMs under extremely low-bit constraints.
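The multiplication-free claim is easy to verify: with every weight a fourth root of unity, multiplying a complex activation $(a + bi)$ by a weight degenerates into sign flips and a swap of the real and imaginary parts.

```python
def apply_weight(a, b, w):
    """Multiply the activation (a + b*i) by a weight w in {1, -1, 1j, -1j}
    using only negation and component swaps -- no multiplications."""
    if w == 1:
        return a, b
    if w == -1:
        return -a, -b
    if w == 1j:                  # (a + bi) * i  = -b + ai
        return -b, a
    if w == -1j:                 # (a + bi) * -i =  b - ai
        return b, -a
    raise ValueError("weight must be a fourth root of unity")
```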
Updated: 2025-08-14 17:13:26
Categories: cs.LG,cs.CL
TAR: Teacher-Aligned Representations via Contrastive Learning for Quadrupedal Locomotion
Quadrupedal locomotion via Reinforcement Learning (RL) is commonly addressed using the teacher-student paradigm, where a privileged teacher guides a proprioceptive student policy. However, key challenges, such as representation misalignment between the privileged teacher and the proprioceptive-only student, covariate shift due to behavioral cloning, and a lack of deployable adaptation, lead to poor generalization in real-world scenarios. We propose Teacher-Aligned Representations via Contrastive Learning (TAR), a framework that leverages privileged information with self-supervised contrastive learning to bridge this gap. By aligning representations to a privileged teacher in simulation via contrastive objectives, our student policy learns structured latent spaces and exhibits robust generalization to Out-of-Distribution (OOD) scenarios, surpassing the fully privileged "Teacher". Results showed 2x faster training than state-of-the-art baselines in reaching peak performance, and 40% better generalization on average in OOD scenarios compared to existing methods. Moreover, TAR transitions seamlessly into learning during deployment without requiring privileged states, setting a new benchmark in sample-efficient, adaptive locomotion and enabling continual fine-tuning in real-world scenarios. Open-source code and videos are available at https://amrmousa.com/TARLoco/.
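TAR's alignment objective is contrastive; a generic InfoNCE-style loss between student and teacher embeddings looks like the sketch below. The paper's exact objective, projection heads, and temperature are not reproduced here; this only illustrates pulling each student embedding toward its matching teacher embedding and away from the others in the batch.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(student, teacher, temperature=0.1):
    """Mean -log softmax score of each (student_i, teacher_i) positive
    pair against all teacher embeddings in the batch."""
    loss = 0.0
    for i, s in enumerate(student):
        logits = [dot(s, t) / temperature for t in teacher]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(student)
```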
Updated: 2025-08-14 17:09:59
Categories: cs.RO,cs.LG,cs.SY,eess.SY
Performance of universal machine-learned potentials with explicit long-range interactions in biomolecular simulations
Universal machine-learned potentials promise transferable accuracy across compositional and vibrational degrees of freedom, yet their application to biomolecular simulations remains underexplored. This work systematically evaluates equivariant message-passing architectures trained on the SPICE-v2 dataset with and without explicit long-range dispersion and electrostatics. We assess the impact of model size, training data composition, and electrostatic treatment across in- and out-of-distribution benchmark datasets, as well as molecular simulations of bulk liquid water, aqueous NaCl solutions, and biomolecules, including alanine tripeptide, the mini-protein Trp-cage, and Crambin. While larger models improve accuracy on benchmark datasets, this trend does not consistently extend to properties obtained from simulations. Predicted properties also depend on the composition of the training dataset. Long-range electrostatics show no systematic impact across systems. However, for Trp-cage, their inclusion yields increased conformational variability. Our results suggest that imbalanced datasets and immature evaluation practices currently challenge the applicability of universal machine-learned potentials to biomolecular simulations.
Updated: 2025-08-14 17:08:34
Categories: physics.chem-ph,cond-mat.soft,cs.LG,physics.comp-ph
Reinforced Language Models for Sequential Decision Making
Large Language Models (LLMs) show potential as sequential decision-making agents, but their application is often limited due to a reliance on large, computationally expensive models. This creates a need to improve smaller models, yet existing post-training methods are designed for single-turn interactions and cannot handle credit assignment in multi-step agentic tasks. To address this, we introduce Multi-Step Group-Relative Policy Optimization (MS-GRPO), a new algorithm for post-training LLM agents, grounded in formal Text-Mediated Stochastic Game (TSMG) and Language-Agent Policy (LAP) frameworks. For credit assignment, MS-GRPO attributes the entire cumulative episode reward to each individual episode step. We supplement this algorithm with a novel absolute-advantage-weighted episode sampling strategy that we show improves training performance. We evaluate our approach by post-training a 3-billion parameter model on Snake and Frozen Lake. Our experiments demonstrate that the method is effective in improving decision-making performance: our post-trained 3B parameter model outperforms a 72B parameter baseline by 50% on the Frozen Lake task. This work demonstrates that targeted post-training is a practical and efficient alternative to relying on model scale for creating sequential decision-making agents using LLMs.
Updated: 2025-08-14 17:05:44
Categories: cs.CL,cs.AI,cs.LG,I.2.7; I.2.8
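The credit-assignment rule MS-GRPO uses (attributing the entire cumulative episode reward to each individual step) can be sketched as follows. This is a minimal illustration with naming of our own choosing, not the authors' implementation; it assumes standard GRPO-style normalization by the group mean and standard deviation:

```python
def ms_grpo_advantages(episode_rewards, eps=1e-8):
    """Step-level advantages where every step of an episode inherits
    that episode's total reward, normalized across the sampled group.

    episode_rewards: one list of per-step rewards per episode in the group.
    """
    # cumulative reward of each episode in the group
    returns = [sum(r) for r in episode_rewards]
    mean = sum(returns) / len(returns)
    std = (sum((g - mean) ** 2 for g in returns) / len(returns)) ** 0.5
    # every step of an episode receives the same normalized return
    return [[(g - mean) / (std + eps)] * len(r)
            for g, r in zip(returns, episode_rewards)]
```

Each per-step advantage would then weight the policy-gradient term for the corresponding action, as in single-turn GRPO.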
Hypothesis Spaces for Deep Learning
This paper introduces a hypothesis space for deep learning based on deep neural networks (DNNs). By treating a DNN as a function of two variables - the input variable and the parameter variable - we consider the set of DNNs where the parameter variable belongs to a space of weight matrices and biases determined by a prescribed depth and layer widths. To construct a Banach space of functions of the input variable, we take the weak* closure of the linear span of this DNN set. We prove that the resulting Banach space is a reproducing kernel Banach space (RKBS) and explicitly construct its reproducing kernel. Furthermore, we investigate two learning models - regularized learning and the minimum norm interpolation (MNI) problem - within the RKBS framework by establishing representer theorems. These theorems reveal that the solutions to these learning problems can be expressed as a finite sum of kernel expansions based on training data.
Updated: 2025-08-14 17:04:50
Categories: stat.ML,cs.LG,math.FA
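As a hedged sketch of the form such representer theorems take (the precise RKBS statement involves the constructed reproducing kernel and the relevant duality pairings, which the abstract does not spell out), the solution of regularized learning or MNI is expressible as a finite kernel expansion over the training inputs $x_1, \dots, x_n$:

```latex
f^{\star}(x) \;=\; \sum_{i=1}^{n} c_i \, K(x_i, x), \qquad c_1, \dots, c_n \in \mathbb{R}.
```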
Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations
Modern high-throughput biological datasets with thousands of perturbations provide the opportunity for large-scale discovery of causal graphs that represent the regulatory interactions between genes. Differentiable causal graphical models have been proposed to infer a gene regulatory network (GRN) from large scale interventional datasets, capturing the causal gene regulatory relationships from genetic perturbations. However, existing models are limited in their expressivity and scalability while failing to address the dynamic nature of biological processes such as cellular differentiation. We propose PerturbODE, a novel framework that incorporates biologically informative neural ordinary differential equations (neural ODEs) to model cell state trajectories under perturbations and derive the causal GRN from the neural ODE's parameters. We demonstrate PerturbODE's efficacy in trajectory prediction and GRN inference across simulated and real over-expression datasets.
Updated: 2025-08-14 17:04:28
Categories: cs.LG,cs.AI,cs.CE,q-bio.MN,stat.ME
SoK: Data Minimization in Machine Learning
Data minimization (DM) describes the principle of collecting only the data strictly necessary for a given task. It is a foundational principle across major data protection regulations like GDPR and CPRA. Violations of this principle have substantial real-world consequences, with regulatory actions resulting in fines reaching hundreds of millions of dollars. Notably, the relevance of data minimization is particularly pronounced in machine learning (ML) applications, which typically rely on large datasets, resulting in an emerging research area known as Data Minimization in Machine Learning (DMML). At the same time, existing work on other ML privacy and security topics often addresses concerns relevant to DMML without explicitly acknowledging the connection. This disconnect leads to confusion among practitioners, complicating their efforts to implement DM principles and interpret the terminology, metrics, and evaluation criteria used across different research communities. To address this gap, our work introduces a comprehensive framework for DMML, including a unified data pipeline, adversaries, and points of minimization. This framework allows us to systematically review the literature on data minimization and \emph{DM-adjacent} methodologies, for the first time presenting a structured overview designed to help practitioners and researchers effectively apply DM principles. Our work facilitates a unified DM-centric understanding and broader adoption of data minimization strategies in AI/ML.
Updated: 2025-08-14 17:00:13
Categories: cs.LG,cs.CR
A Multimodal Neural Network for Recognizing Subjective Self-Disclosure Towards Social Robots
Subjective self-disclosure is an important feature of human social interaction. While much has been done in the social and behavioural literature to characterise the features and consequences of subjective self-disclosure, little work has been done thus far to develop computational systems that are able to accurately model it. Even less work has attempted to model specifically how human interactants self-disclose with robotic partners. This is becoming more pressing as we require social robots to work in conjunction with and establish relationships with humans in various social settings. In this paper, our aim is to develop a custom multimodal attention network based on models from the emotion recognition literature, to train this model on a large self-collected self-disclosure video corpus, and to construct a new loss function, the scale-preserving cross entropy loss, that improves upon both the classification and regression versions of this problem. Our results show that the best performing model, trained with our novel loss function, achieves an F1 score of 0.83, an improvement of 0.48 over the best baseline model. This result makes significant headway towards allowing social robots to pick up on an interaction partner's self-disclosures, an ability that will be essential in social robots with social cognition.
Updated: 2025-08-14 16:50:51
Categories: cs.RO,cs.AI
Accelerating exoplanet climate modelling: A machine learning approach to complement 3D GCM grid simulations
With the development of ever-improving telescopes capable of observing exoplanet atmospheres in greater detail and number, there is a growing demand for enhanced 3D climate models to support and help interpret observational data from space missions like CHEOPS, TESS, JWST, PLATO, and Ariel. However, the computationally intensive and time-consuming nature of general circulation models (GCMs) poses significant challenges in simulating a wide range of exoplanetary atmospheres. This study aims to determine whether machine learning (ML) algorithms can predict the 3D temperature and wind structure of arbitrary tidally-locked gaseous exoplanets across a range of planetary parameters. A new 3D GCM grid with 60 inflated hot Jupiters orbiting A, F, G, K, and M-type host stars, modelled with ExoRad, has been introduced. A dense neural network (DNN) and a decision tree algorithm (XGBoost) are trained on this grid to predict local gas temperatures along with horizontal and vertical winds. To ensure the reliability and quality of the ML model predictions, WASP-121 b, HATS-42 b, NGTS-17 b, WASP-23 b, and NGTS-1 b-like planets, which are all targets for PLATO observation, are selected and modelled with ExoRad and the two ML methods as test cases. The DNN predicts gas temperatures accurately enough that the calculated spectra agree within 32 ppm for all but one planet, for which only a single HCN feature reaches a 100 ppm difference. The developed ML emulators can reliably predict the complete 3D temperature field of an inflated warm to ultra-hot tidally locked Jupiter around A- to M-type host stars. They provide a fast tool to complement and extend traditional GCM grids for exoplanet ensemble studies. The quality of the predictions is such that no or minimal effects on the gas-phase chemistry, and hence on cloud formation and transmission spectra, are to be expected.
Updated: 2025-08-14 16:50:38
Categories: astro-ph.EP,cs.LG
Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Technical Solutions
Memory is fundamental to intelligence, enabling learning, reasoning, and adaptability across biological and artificial systems. While Transformer architectures excel at sequence modeling, they face critical limitations in long-range context retention, continual learning, and knowledge integration. This review presents a unified framework bridging neuroscience principles, including dynamic multi-timescale memory, selective attention, and consolidation, with engineering advances in Memory-Augmented Transformers. We organize recent progress through three taxonomic dimensions: functional objectives (context extension, reasoning, knowledge integration, adaptation), memory representations (parameter-encoded, state-based, explicit, hybrid), and integration mechanisms (attention fusion, gated control, associative retrieval). Our analysis of core memory operations (reading, writing, forgetting, and capacity management) reveals a shift from static caches toward adaptive, test-time learning systems. We identify persistent challenges in scalability and interference, alongside emerging solutions including hierarchical buffering and surprise-gated updates. This synthesis provides a roadmap toward cognitively-inspired, lifelong-learning Transformer architectures.
Updated: 2025-08-14 16:48:38
Categories: cs.LG,cs.CL
OpenCUA: Open Foundations for Computer-Use Agents
Vision-language models have demonstrated impressive capabilities as computer-use agents (CUAs) capable of automating diverse computer tasks. As their commercial potential grows, critical details of the most capable CUA systems remain closed. As these agents will increasingly mediate digital interactions and execute consequential decisions on our behalf, the research community needs access to open CUA frameworks to study their capabilities, limitations, and risks. To bridge this gap, we propose OpenCUA, a comprehensive open-source framework for scaling CUA data and foundation models. Our framework consists of: (1) an annotation infrastructure that seamlessly captures human computer-use demonstrations; (2) AgentNet, the first large-scale computer-use task dataset spanning 3 operating systems and 200+ applications and websites; (3) a scalable pipeline that transforms demonstrations into state-action pairs with reflective long Chain-of-Thought reasoning that sustain robust performance gains as data scales. Our end-to-end agent models demonstrate strong performance across CUA benchmarks. In particular, OpenCUA-32B achieves an average success rate of 34.8% on OSWorld-Verified, establishing a new state-of-the-art (SOTA) among open-source models and surpassing OpenAI CUA (GPT-4o). Further analysis confirms that our approach generalizes well across domains and benefits significantly from increased test-time computation. We release our annotation tool, datasets, code, and models to build open foundations for further CUA research.
Updated: 2025-08-14 16:47:06
Categories: cs.AI,cs.CV
Not There Yet: Evaluating Vision Language Models in Simulating the Visual Perception of People with Low Vision
Advances in vision language models (VLMs) have enabled the simulation of general human behavior through their reasoning and problem solving capabilities. However, prior research has not investigated such simulation capabilities in the accessibility domain. In this paper, we evaluate the extent to which VLMs can simulate the visual perception of low vision individuals when interpreting images. We first compile a benchmark dataset through a survey study with 40 low vision participants, collecting their brief and detailed vision information and both open-ended and multiple-choice image perception and recognition responses to up to 25 images. Using these responses, we construct prompts for VLMs (GPT-4o) to create simulated agents of each participant, varying the included vision information and example image responses. We evaluate the agreement between VLM-generated responses and participants' original answers. Our results indicate that VLMs tend to infer beyond the specified vision ability when given minimal prompts, resulting in low agreement (0.59). The agreement between the agents' and participants' responses remains low when only the vision information (0.59) or example image responses (0.59) are provided, whereas a combination of both significantly increases the agreement (0.70, p < 0.0001). Notably, a single example combining both open-ended and multiple-choice responses offers significant performance improvements over either alone (p < 0.0001), while additional examples provide minimal benefits (p > 0.05).
Updated: 2025-08-14 16:46:03
Categories: cs.CV,cs.AI,cs.HC
Mobile-Friendly Deep Learning for Plant Disease Detection: A Lightweight CNN Benchmark Across 101 Classes of 33 Crops
Plant diseases are a major threat to food security globally. It is important to develop early detection systems that can detect them accurately. Advances in computer vision techniques have the potential to solve this challenge. We have developed a mobile-friendly solution that can accurately classify 101 plant diseases across 33 crops. We built a comprehensive dataset by combining three datasets created for the same purpose: Plant Doc, PlantVillage, and PlantWild. We evaluated performance across several lightweight architectures - MobileNetV2, MobileNetV3, MobileNetV3-Large, and EfficientNet-B0 and B1 - specifically chosen for their efficiency on resource-constrained devices. The results were promising, with EfficientNet-B1 delivering our best performance at 94.7% classification accuracy. This architecture struck an optimal balance between accuracy and computational efficiency, making it well-suited for real-world deployment on mobile devices.
Updated: 2025-08-14 16:43:27
Categories: cs.CV,cs.LG
Rule2Text: A Framework for Generating and Evaluating Natural Language Explanations of Knowledge Graph Rules
Knowledge graphs (KGs) can be enhanced through rule mining; however, the resulting logical rules are often difficult for humans to interpret due to their inherent complexity and the idiosyncratic labeling conventions of individual KGs. This work presents Rule2Text, a comprehensive framework that leverages large language models (LLMs) to generate natural language explanations for mined logical rules, thereby improving KG accessibility and usability. We conduct extensive experiments using multiple datasets, including Freebase variants (FB-CVT-REV, FB+CVT-REV, and FB15k-237) as well as the ogbl-biokg dataset, with rules mined using AMIE 3.5.1. We systematically evaluate several LLMs across a comprehensive range of prompting strategies, including zero-shot, few-shot, variable type incorporation, and Chain-of-Thought reasoning. To systematically assess models' performance, we conduct a human evaluation of generated explanations on correctness and clarity. To address evaluation scalability, we develop and validate an LLM-as-a-judge framework that demonstrates strong agreement with human evaluators. Leveraging the best-performing model (Gemini 2.0 Flash), LLM judge, and human-in-the-loop feedback, we construct high-quality ground truth datasets, which we use to fine-tune the open-source Zephyr model. Our results demonstrate significant improvements in explanation quality after fine-tuning, with particularly strong gains in the domain-specific dataset. Additionally, we integrate a type inference module to support KGs lacking explicit type information. All code and data are publicly available at https://github.com/idirlab/KGRule2NL.
Updated: 2025-08-14 16:41:47
Categories: cs.CL,cs.AI
Comparison of Data Reduction Criteria for Online Gaussian Processes
Gaussian Processes (GPs) are widely used for regression and system identification due to their flexibility and ability to quantify uncertainty. However, their computational complexity limits their applicability to small datasets. Moreover in a streaming scenario, more and more datapoints accumulate which is intractable even for Sparse GPs. Online GPs aim to alleviate this problem by e.g. defining a maximum budget of datapoints and removing redundant datapoints. This work provides a unified comparison of several reduction criteria, analyzing both their computational complexity and reduction behavior. The criteria are evaluated on benchmark functions and real-world datasets, including dynamic system identification tasks. Additionally, acceptance criteria are proposed to further filter out redundant datapoints. This work yields practical guidelines for choosing a suitable criterion for an online GP algorithm.
Updated: 2025-08-14 16:41:18
Categories: cs.LG,stat.ML
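A minimal sketch of the budgeted online-GP idea discussed above, with one simple illustrative reduction criterion of our own choosing (drop the most redundant point, i.e. the one most similar in input space to another stored point); the class and function names are hypothetical, and the paper compares more principled criteria:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between 1-D input arrays."""
    return np.exp(-0.5 * np.subtract.outer(a, b) ** 2 / ls**2)

class BudgetedOnlineGP:
    def __init__(self, budget, noise=1e-2):
        self.budget, self.noise = budget, noise
        self.X, self.y = [], []

    def add(self, x, y):
        self.X.append(x)
        self.y.append(y)
        if len(self.X) > self.budget:
            # illustrative redundancy criterion: remove the datapoint
            # with the highest kernel similarity to any other stored point
            X = np.array(self.X)
            K = rbf(X, X)
            np.fill_diagonal(K, -np.inf)
            i = int(np.argmax(K.max(axis=1)))
            del self.X[i]
            del self.y[i]

    def predict(self, xs):
        """Posterior mean at the query inputs xs."""
        X, y = np.array(self.X), np.array(self.y)
        K = rbf(X, X) + self.noise * np.eye(len(X))
        return rbf(np.array(xs), X) @ np.linalg.solve(K, y)
```

Variance-based or posterior-divergence criteria from the comparison would replace the argmax line; the surrounding bookkeeping stays the same.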
Learning to Schedule in Parallel-Server Queues with Stochastic Bilinear Rewards
We consider the problem of scheduling in multi-class, parallel-server queueing systems with uncertain rewards from job-server assignments. In this scenario, jobs incur holding costs while awaiting completion, and job-server assignments yield observable stochastic rewards with unknown mean values. The mean rewards for job-server assignments are assumed to follow a bilinear model with respect to features that characterize jobs and servers. Our objective is to minimize regret by maximizing the cumulative reward of job-server assignments over a time horizon, while keeping the total job holding cost bounded to ensure the stability of the queueing system. This problem is motivated by applications requiring resource allocation in network systems. In this problem, it is essential to control the tradeoff between reward maximization and fair allocation for the stability of the underlying queueing system (i.e., maximizing network throughput). To address this problem, we propose a scheduling algorithm based on a weighted proportional fair criterion augmented with marginal costs for reward maximization, incorporating a bandit algorithm tailored for bilinear rewards. Our algorithm achieves a sub-linear regret bound and a sub-linear mean holding cost (and queue length bound) of $\tilde{O}(\sqrt{T})$, respectively, with respect to the time horizon $T$, thus guaranteeing queueing system stability. Additionally, we establish stability conditions for distributed iterative algorithms for computing allocations, which are relevant to large-scale system applications. Finally, we demonstrate the efficiency of our algorithm through numerical experiments.
Updated: 2025-08-14 16:35:11
Categories: cs.LG,cs.DS,math.OC
Quantitative Comparison of Fine-Tuning Techniques for Pretrained Latent Diffusion Models in the Generation of Unseen SAR Images
We present a framework for adapting a large pretrained latent diffusion model to high-resolution Synthetic Aperture Radar (SAR) image generation. The approach enables controllable synthesis and the creation of rare or out-of-distribution scenes beyond the training set. Rather than training a task-specific small model from scratch, we adapt an open-source text-to-image foundation model to the SAR modality, using its semantic prior to align prompts with SAR imaging physics (side-looking geometry, slant-range projection, and coherent speckle with heavy-tailed statistics). Using a 100k-image SAR dataset, we compare full fine-tuning and parameter-efficient Low-Rank Adaptation (LoRA) across the UNet diffusion backbone, the Variational Autoencoder (VAE), and the text encoders. Evaluation combines (i) statistical distances to real SAR amplitude distributions, (ii) textural similarity via Gray-Level Co-occurrence Matrix (GLCM) descriptors, and (iii) semantic alignment using a SAR-specialized CLIP model. Our results show that a hybrid strategy (full UNet tuning with LoRA on the text encoders and a learned token embedding) best preserves SAR geometry and texture while maintaining prompt fidelity. The framework supports text-based control and multimodal conditioning (e.g., segmentation maps, TerraSAR-X, or optical guidance), opening new paths for large-scale SAR scene data augmentation and unseen scenario simulation in Earth observation.
Updated: 2025-08-14 16:29:14
Categories: cs.CV,cs.AI
Parity Cross-Resonance: A Multiqubit Gate
We present a native three-qubit entangling gate that exploits engineered interactions to realize control-control-target and control-target-target operations in a single coherent step. Unlike conventional decompositions into multiple two-qubit gates, our hybrid optimization approach selectively amplifies desired interactions while suppressing unwanted couplings, yielding robust performance across the computational subspace and beyond. The new gate can be classified as a cross-resonance gate. We show it can be utilized in several ways, for example, in GHZ triplet state preparation, Toffoli-class logic demonstrations with many-body interactions, and in implementing a controlled-ZZ gate. The latter maps the parity of two data qubits directly onto a measurement qubit, enabling faster and higher-fidelity stabilizer measurements in surface-code quantum error correction. In all these examples, we show that the three-qubit gate performance remains robust across Hilbert space sizes, as confirmed by testing under increasing total excitation numbers. This work lays the foundation for co-designing circuit architectures and control protocols that leverage native multiqubit interactions as core elements of next-generation superconducting quantum processors.
Updated: 2025-08-14 16:26:32
Categories: quant-ph,cs.LG,math.OC
Who Benefits from AI Explanations? Towards Accessible and Interpretable Systems
As AI systems are increasingly deployed to support decision-making in critical domains, explainability has become a means to enhance the understandability of these outputs and enable users to make more informed and conscious choices. However, despite growing interest in the usability of eXplainable AI (XAI), the accessibility of these methods, particularly for users with vision impairments, remains underexplored. This paper investigates accessibility gaps in XAI through a two-pronged approach. First, a literature review of 79 studies reveals that evaluations of XAI techniques rarely include disabled users, with most explanations relying on inherently visual formats. Second, we present a four-part methodological proof of concept that operationalizes inclusive XAI design: (1) categorization of AI systems, (2) persona definition and contextualization, (3) prototype design and implementation, and (4) expert and user assessment of XAI techniques for accessibility. Preliminary findings suggest that simplified explanations are more comprehensible for non-visual users than detailed ones, and that multimodal presentation is required for more equitable interpretability.
Updated: 2025-08-14 16:26:09
Subjects: cs.AI
Non-Stationary Restless Multi-Armed Bandits with Provable Guarantee
Online restless multi-armed bandits (RMABs) typically assume that each arm follows a stationary Markov Decision Process (MDP) with fixed state transitions and rewards. However, in real-world applications like healthcare and recommendation systems, these assumptions often break due to non-stationary dynamics, posing significant challenges for traditional RMAB algorithms. In this work, we specifically consider $N$-armed RMABs with non-stationary transitions constrained by a bounded variation budget $B$. Our proposed \rmab\; algorithm integrates sliding window reinforcement learning (RL) with an upper confidence bound (UCB) mechanism to simultaneously learn transition dynamics and their variations. We further establish that \rmab\; achieves a $\widetilde{\mathcal{O}}(N^2 B^{\frac{1}{4}} T^{\frac{3}{4}})$ regret bound by leveraging a relaxed definition of regret, providing for the first time a foundational theoretical framework for non-stationary RMAB problems.
Updated: 2025-08-14 16:26:00
Subjects: cs.LG
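The sliding-window mechanism in the abstract can be illustrated on a single drifting arm: only the last $W$ rewards enter the mean, plus a Hoeffding-style exploration bonus. This is a generic sketch, not the paper's algorithm; window size, bonus constant, and drift scenario are invented:

```python
import numpy as np

def sliding_window_ucb(rewards, window, t, delta=0.05):
    """UCB index for one arm using only its last `window` observations.

    A generic sketch of the sliding-window idea: discarding old samples
    trades a little variance for robustness to drifting means.
    """
    recent = np.asarray(rewards, dtype=float)[max(0, t - window):t]
    if recent.size == 0:
        return float("inf")                       # force exploration of unseen arms
    bonus = np.sqrt(np.log(1.0 / delta) / (2.0 * recent.size))
    return float(recent.mean()) + bonus           # Hoeffding-style index

rng = np.random.default_rng(0)
# One arm whose mean reward drifts from 0.2 to 0.8 halfway through.
means = np.concatenate([np.full(500, 0.2), np.full(500, 0.8)])
obs = rng.binomial(1, means).astype(float)

full_avg = obs.mean()                             # blind to the drift
windowed = sliding_window_ucb(obs, window=100, t=1000)
print(windowed > full_avg)  # → True: the window tracks the recent regime
```

The full-history average stays near 0.5 while the windowed index follows the post-change mean, which is the behavior the variation budget $B$ formalizes.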
The SET Perceptual Factors Framework: Towards Assured Perception for Autonomous Systems
Future autonomous systems promise significant societal benefits, yet their deployment raises concerns about safety and trustworthiness. A key concern is assuring the reliability of robot perception, as perception seeds safe decision-making. Failures in perception are often due to complex yet common environmental factors and can lead to accidents that erode public trust. To address this concern, we introduce the SET (Self, Environment, and Target) Perceptual Factors Framework. We designed the framework to systematically analyze how factors such as weather, occlusion, or sensor limitations negatively impact perception. To achieve this, the framework employs SET State Trees to categorize where such factors originate and SET Factor Trees to model how these sources and factors impact perceptual tasks like object detection or pose estimation. Next, we develop Perceptual Factor Models using both trees to quantify the uncertainty for a given task. Our framework aims to promote rigorous safety assurances and cultivate greater public understanding and trust in autonomous systems by offering a transparent and standardized method for identifying, modeling, and communicating perceptual risks.
Updated: 2025-08-14 16:22:01
Subjects: cs.RO,cs.AI
Using machine learning to inform harvest control rule design in complex fishery settings
In fishery science, harvest management of size-structured stochastic populations is a long-standing and difficult problem. Rectilinear precautionary policies based on biomass and harvesting reference points have now become a standard approach to this problem. While these standard feedback policies are adapted from analytical or dynamic programming solutions assuming relatively simple ecological dynamics, they are often applied to more complicated ecological settings in the real world. In this paper we explore the problem of designing harvest control rules for partially observed, age-structured, spasmodic fish populations using tools from reinforcement learning (RL) and Bayesian optimization. Our focus is on the case of Walleye fisheries in Alberta, Canada, whose highly variable recruitment dynamics have perplexed managers and ecologists. We optimized and evaluated policies using several complementary performance metrics. The main questions we addressed were: 1. How do standard policies based on reference points perform relative to numerically optimized policies? 2. Can an observation of mean fish weight, in addition to stock biomass, aid policy decisions?
Updated: 2025-08-14 16:17:57
Subjects: q-bio.PE,cs.LG,q-bio.QM
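The "rectilinear precautionary policies" referenced above are commonly written as a hockey-stick harvest control rule: no fishing below a limit biomass, a linear ramp up to a trigger biomass, and a constant target rate above it. A sketch with illustrative parameter values (not the paper's Walleye settings):

```python
def hockey_stick_hcr(biomass, b_lim, b_trigger, f_max):
    """Standard rectilinear precautionary harvest control rule (sketch).

    Below b_lim fishing stops; above b_trigger the full target rate f_max
    applies; in between the rate ramps linearly. Parameter values are
    illustrative, not taken from the paper.
    """
    if biomass <= b_lim:
        return 0.0
    if biomass >= b_trigger:
        return f_max
    return f_max * (biomass - b_lim) / (b_trigger - b_lim)

# Example: limit 20 kt, trigger 80 kt, target harvest rate 0.3
print(hockey_stick_hcr(10, 20, 80, 0.3),   # → 0.0  (fishery closed)
      hockey_stick_hcr(50, 20, 80, 0.3),   # → 0.15 (on the ramp)
      hockey_stick_hcr(100, 20, 80, 0.3))  # → 0.3  (full target rate)
```

The RL and Bayesian-optimization policies in the paper are compared against feedback rules of exactly this shape, tuned via the reference points.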
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
We introduce UniOcc, a comprehensive, unified benchmark and toolkit for occupancy forecasting (i.e., predicting future occupancies based on historical information) and occupancy prediction (i.e., predicting current-frame occupancy from camera images). UniOcc unifies data from multiple real-world datasets (i.e., nuScenes, Waymo) and high-fidelity driving simulators (i.e., CARLA, OpenCOOD), providing 2D/3D occupancy labels and innovative per-voxel flow annotations. Unlike existing studies that rely on suboptimal pseudo labels for evaluation, UniOcc incorporates novel evaluation metrics that do not depend on ground-truth labels, enabling robust assessment of additional aspects of occupancy quality. Through extensive experiments on state-of-the-art models, we demonstrate that large-scale, diverse training data and explicit flow information significantly enhance occupancy prediction and forecasting performance. Our data and code are available at https://uniocc.github.io/.
Updated: 2025-08-14 16:13:36
Subjects: cs.CV,cs.AI,cs.LG,cs.MA,cs.RO
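As a concrete companion to the occupancy labels discussed above, the standard voxel-wise IoU metric (which UniOcc's label-free metrics complement rather than replace) can be computed as:

```python
import numpy as np

def occupancy_iou(pred, gt):
    """Voxel-wise intersection-over-union between boolean occupancy grids.

    This is the conventional label-based metric; UniOcc's ground-truth-free
    metrics are different and not reproduced here.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

pred = np.zeros((4, 4, 2), dtype=bool); pred[:2] = True   # 16 occupied voxels
gt   = np.zeros((4, 4, 2), dtype=bool); gt[1:3] = True    # 16 voxels, 8 shared
print(occupancy_iou(pred, gt))  # → 0.3333333333333333 (8 / 24)
```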
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
Large language models (LLMs) have been widely deployed with rapidly expanding context windows to support increasingly demanding applications. However, long contexts pose significant deployment challenges, primarily due to the KV cache whose size grows proportionally with context length. While KV cache compression methods are proposed to address this issue, KV dropping methods incur considerable accuracy loss, and KV retrieval methods suffer from significant efficiency bottlenecks. We propose FreeKV, an algorithm-system co-optimization framework to enhance KV retrieval efficiency while preserving accuracy. On the algorithm side, FreeKV introduces speculative retrieval to shift the KV selection and recall processes out of the critical path, combined with fine-grained correction to ensure accuracy. On the system side, FreeKV employs hybrid KV layouts across CPU and GPU memory to eliminate fragmented data transfers, and leverages double-buffered streamed recall to further improve efficiency. Experiments demonstrate that FreeKV achieves near-lossless accuracy across various scenarios and models, delivering up to 13$\times$ speedup compared to SOTA KV retrieval methods.
Updated: 2025-08-14 16:12:44
Subjects: cs.LG,cs.AI,cs.CL
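The retrieval step FreeKV accelerates can be reduced to a toy form: score each KV block against the query and keep only the top-k blocks. The sketch below shows just that selection step; speculative, double-buffered recall is a systems-level refinement not modeled here, and all sizes are illustrative:

```python
import numpy as np

def retrieve_blocks(q, keys, block_size, k_blocks):
    """Keep the k KV blocks whose keys score highest against the query.

    A minimal sketch of score-based KV retrieval: the per-block score is the
    best query-key dot product inside the block.
    """
    n_blocks = len(keys) // block_size
    blocks = keys[:n_blocks * block_size].reshape(n_blocks, block_size, -1)
    scores = (blocks @ q).max(axis=1)       # representative score per block
    return np.sort(np.argsort(scores)[-k_blocks:])

rng = np.random.default_rng(1)
keys = rng.normal(size=(64, 16))
keys[32:40] += 2.0                          # plant a "hot" block (index 4)
q = np.ones(16)

print(4 in retrieve_blocks(q, keys, block_size=8, k_blocks=2))  # → True
```

Attention is then computed only over the recalled blocks; FreeKV's contribution is making this selection and the CPU-to-GPU recall cheap enough to sit off the critical path.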
Enhancing Fairness in Autoencoders for Node-Level Graph Anomaly Detection
Graph anomaly detection (GAD) has become an increasingly important task across various domains. With the rapid development of graph neural networks (GNNs), GAD methods have achieved significant performance improvements. However, fairness considerations in GAD remain largely underexplored. Indeed, GNN-based GAD models can inherit and amplify biases present in training data, potentially leading to unfair outcomes. While existing efforts have focused on developing fair GNNs, most approaches target node classification tasks, where models often rely on simple layer architectures rather than autoencoder-based structures, which are the most widely used architectures for anomaly detection. To address fairness in autoencoder-based GAD models, we propose \textbf{D}is\textbf{E}ntangled \textbf{C}ounterfactual \textbf{A}dversarial \textbf{F}air (DECAF)-GAD, a framework that alleviates bias while preserving GAD performance. Specifically, we introduce a structural causal model (SCM) to disentangle sensitive attributes from learned representations. Based on this causal framework, we formulate a specialized autoencoder architecture along with a fairness-guided loss function. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that DECAF-GAD not only achieves competitive anomaly detection performance but also significantly enhances fairness metrics compared to baseline GAD methods. Our code is available at https://github.com/Tlhey/decaf_code.
Updated: 2025-08-14 16:12:15
Subjects: cs.LG,cs.AI,stat.ML
Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities
Missing modalities have recently emerged as a critical research direction in multimodal emotion recognition (MER). Conventional approaches typically address this issue through missing modality reconstruction. However, these methods fail to account for variations in reconstruction difficulty across different samples, consequently limiting the model's ability to handle hard samples effectively. To overcome this limitation, we propose a novel Hardness-Aware Dynamic Curriculum Learning framework, termed HARDY-MER. Our framework operates in two key stages: first, it estimates the hardness level of each sample, and second, it strategically emphasizes hard samples during training to enhance model performance on these challenging instances. Specifically, we first introduce a Multi-view Hardness Evaluation mechanism that quantifies reconstruction difficulty by considering both Direct Hardness (modality reconstruction errors) and Indirect Hardness (cross-modal mutual information). Meanwhile, we introduce a Retrieval-based Dynamic Curriculum Learning strategy that dynamically adjusts the training curriculum by retrieving samples with similar semantic information and balancing the learning focus between easy and hard instances. Extensive experiments on benchmark datasets demonstrate that HARDY-MER consistently outperforms existing methods in missing-modality scenarios. Our code will be made publicly available at https://github.com/HARDY-MER/HARDY-MER.
Updated: 2025-08-14 16:06:55
Subjects: cs.LG,cs.AI
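The easy-to-hard emphasis described above can be sketched as a sampling schedule that interpolates from uniform weights toward hardness-proportional weights. The schedule below is a generic illustration, not HARDY-MER's actual retrieval-based curriculum:

```python
import numpy as np

def curriculum_weights(hardness, progress):
    """Blend uniform sampling (early training) toward hardness-proportional
    sampling (late training).

    `progress` in [0, 1] is the fraction of training completed; the linear
    blend is an illustrative stand-in for a learned curriculum.
    """
    hardness = np.asarray(hardness, dtype=float)
    uniform = np.full_like(hardness, 1.0 / len(hardness))
    hard = hardness / hardness.sum()
    w = (1.0 - progress) * uniform + progress * hard
    return w / w.sum()

h = [0.1, 0.2, 0.7]                     # per-sample hardness estimates
print(curriculum_weights(h, 0.0).round(3), curriculum_weights(h, 1.0).round(3))
```

Early on every sample is equally likely; by the end the hardest sample dominates, mirroring the paper's shift of focus onto hard instances.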
Ultra-High-Definition Reference-Based Landmark Image Super-Resolution with Generative Diffusion Prior
Reference-based Image Super-Resolution (RefSR) aims to restore a low-resolution (LR) image by utilizing the semantic and texture information from an additional reference high-resolution (reference HR) image. Existing diffusion-based RefSR methods are typically built upon ControlNet, which struggles to effectively align the information between the LR image and the reference HR image. Moreover, current RefSR datasets suffer from limited resolution and poor image quality, resulting in the reference images lacking sufficient fine-grained details to support high-quality restoration. To overcome the limitations above, we propose TriFlowSR, a novel framework that explicitly achieves pattern matching between the LR image and the reference HR image. Meanwhile, we introduce Landmark-4K, the first RefSR dataset for Ultra-High-Definition (UHD) landmark scenarios. To handle UHD scenarios with real-world degradation, we design a Reference Matching Strategy in TriFlowSR that effectively matches the LR image with the reference HR image. Experimental results show that our approach can better utilize the semantic and texture information of the reference HR image compared to previous methods. To the best of our knowledge, we propose the first diffusion-based RefSR pipeline for ultra-high definition landmark scenarios under real-world degradation. Our code and model will be available at https://github.com/nkicsl/TriFlowSR.
Updated: 2025-08-14 16:04:39
Subjects: cs.CV,cs.AI
The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference
Large language models are often assumed to acquire increasingly structured, generalizable internal representations simply by scaling data and parameters. We interrogate this assumption by introducing a Clinical Trial Natural Language Inference benchmark comprising four reasoning families, Causal Attribution, Compositional Grounding, Epistemic Verification, and Risk State Abstraction. Each item is paired with a targeted Ground Knowledge and Meta-Level Reasoning Verification (GKMRV) probe, allowing us to dissociate failures of factual access from failures of inference. We evaluate six contemporary LLMs under both direct and chain of thought prompting. Models achieve near-ceiling GKMRV accuracy (mean accuracy 0.918) yet perform poorly on the main reasoning tasks (mean accuracy 0.25). Despite low accuracy, output inferences are highly consistent across samples (mean 0.87), indicating a systematic application of underlying heuristics and shortcuts. These results reveal fundamental structural and representational limitations: current LLMs often possess the relevant clinical knowledge but lack the structured, composable internal representations needed to deploy it reliably (e.g., integrating constraints, weighing evidence, or simulating counterfactuals). Decoupling knowledge from reasoning with GKMRV makes this dissociation explicit and measurable, providing an effective framework for probing the reliability of LLMs in high-stakes domains.
Updated: 2025-08-14 16:01:10
Subjects: cs.AI
Estimating Covariance for Global Minimum Variance Portfolio: A Decision-Focused Learning Approach
Portfolio optimization constitutes a cornerstone of risk management by quantifying the risk-return trade-off. Since it inherently depends on accurate parameter estimation under conditions of future uncertainty, the selection of appropriate input parameters is critical for effective portfolio construction. However, most conventional statistical estimators and machine learning algorithms determine these parameters by minimizing mean-squared error (MSE), a criterion that can yield suboptimal investment decisions. In this paper, we adopt decision-focused learning (DFL) - an approach that directly optimizes decision quality rather than prediction error such as MSE - to derive the global minimum-variance portfolio (GMVP). Specifically, we theoretically derive the gradient of decision loss using the analytic solution of GMVP and its properties regarding the principal components of itself. Through extensive empirical evaluation, we show that prediction-focused estimation methods may fail to produce optimal allocations in practice, whereas DFL-based methods consistently deliver superior decision performance. Furthermore, we provide a comprehensive analysis of DFL's mechanism in GMVP construction, focusing on its volatility reduction capability, decision-driving features, and estimation characteristics.
Updated: 2025-08-14 16:00:52
Subjects: q-fin.PM,cs.AI
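The analytic GMVP solution the authors differentiate through is the standard closed form $w^* = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$, the fully invested portfolio minimizing $w^\top \Sigma w$. A quick numpy check on a toy covariance matrix (illustrative numbers, not the paper's data):

```python
import numpy as np

def gmvp_weights(cov):
    """Closed-form global minimum-variance portfolio: w = Σ⁻¹1 / (1ᵀΣ⁻¹1)."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)    # Σ⁻¹1 without forming the inverse
    return raw / raw.sum()              # normalize so weights sum to 1

cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
w = gmvp_weights(cov)
var_gmvp = w @ cov @ w
var_equal = (np.ones(3) / 3) @ cov @ (np.ones(3) / 3)
print(abs(w.sum() - 1.0) < 1e-12, var_gmvp <= var_equal)  # → True True
```

Decision-focused learning backpropagates through exactly this map from the estimated covariance to the realized portfolio variance, rather than through an MSE on the covariance entries.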
IBEX: Information-Bottleneck-EXplored Coarse-to-Fine Molecular Generation under Limited Data
Three-dimensional generative models increasingly drive structure-based drug discovery, yet the field remains constrained by the scarcity of publicly available protein-ligand complexes. Under such data scarcity, almost all existing pipelines struggle to learn transferable geometric priors and consequently overfit to training-set biases. To address this, we present IBEX, an Information-Bottleneck-EXplored coarse-to-fine pipeline to tackle the chronic shortage of protein-ligand complex data in structure-based drug design. Specifically, we use PAC-Bayesian information-bottleneck theory to quantify the information density of each sample. This analysis reveals how different masking strategies affect generalization and indicates that, compared with conventional de novo generation, the constrained Scaffold Hopping task endows the model with greater effective capacity and improved transfer performance. IBEX retains the original TargetDiff architecture and hyperparameters for training to generate molecules compatible with the binding pocket; it then applies an L-BFGS optimization step to finely refine each conformation by optimizing five physics-based terms and adjusting six translational and rotational degrees of freedom in under one second. With only these modifications, IBEX raises the zero-shot docking success rate on the CrossDocked2020-based CBGBench from 53% to 64%, improves the mean Vina score from $-7.41 kcal mol^{-1}$ to $-8.07 kcal mol^{-1}$, and achieves the best median Vina energy in 57 of 100 pockets versus 3 for the original TargetDiff. IBEX also increases the QED by 25%, achieves state-of-the-art validity and diversity, and markedly reduces extrapolation error.
Updated: 2025-08-14 15:59:22
Subjects: cs.LG,q-bio.BM
Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
Diffusion transformers currently lead the field in high-quality video generation, but their slow iterative denoising process and prohibitive quadratic attention costs for long sequences create significant inference bottlenecks. While both step distillation and sparse attention mechanisms have shown promise as independent acceleration strategies, effectively combining these approaches presents critical challenges -- training-free integration yields suboptimal results, while separately training sparse attention after step distillation requires prohibitively expensive high-quality video data. To overcome these limitations, we propose BLADE, an innovative data-free joint training framework that introduces: (1) an Adaptive Block-Sparse Attention (ASA) mechanism for dynamically generating content-aware sparsity masks to focus computation on salient spatiotemporal features, and (2) a sparsity-aware step distillation paradigm built upon Trajectory Distribution Matching (TDM) that directly incorporates sparsity into the distillation process rather than treating it as a separate compression step, with fast convergence. We validate BLADE on text-to-video models like CogVideoX-5B and Wan2.1-1.3B. Our framework demonstrates remarkable efficiency gains across different scales. On Wan2.1-1.3B, BLADE achieves a 14.10x end-to-end inference acceleration over a 50-step baseline. Moreover, on models such as CogVideoX-5B with short video sequence lengths, our framework delivers a robust 8.89x speedup. Crucially, the acceleration is accompanied by a consistent quality improvement. On the VBench-2.0 benchmark, BLADE boosts the score of CogVideoX-5B to 0.569 (from 0.534) and Wan2.1-1.3B to 0.570 (from 0.563), results that are further corroborated by superior ratings in human evaluations. Our code and model weights are publicly available at: http://ziplab.co/BLADE-Homepage/.
Updated: 2025-08-14 15:58:59
Subjects: cs.CV,cs.AI,cs.LG
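The block-sparse idea behind ASA can be sketched in numpy: a boolean block mask blanks whole tiles of the score matrix before the softmax, confining attention to the selected blocks. This toy version only checks correctness, not speed, and all sizes and the mask layout are invented for illustration:

```python
import numpy as np

def block_sparse_attention(Q, K, V, block, mask):
    """Attention restricted to a boolean block mask (minimal sketch).

    mask[i, j] = True lets query block i attend to key block j; all other
    tiles of the score matrix are set to -inf before the softmax.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    tile = np.ones((block, block), dtype=int)
    dense_mask = np.kron(mask.astype(int), tile).astype(bool)
    scores = np.where(dense_mask[:n, :n], scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d, block = 8, 4, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
mask = np.array([[True, False],        # query block 0 attends only to itself
                 [True, True]])        # query block 1 attends everywhere
out = block_sparse_attention(Q, K, V, block, mask)
print(out.shape)  # → (8, 4)
```

In ASA the mask is generated per input from content, and BLADE's contribution is training the distilled few-step model jointly with this sparsity instead of bolting it on afterwards.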
AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences
Recent advances in AI-generated content have fueled the rise of highly realistic synthetic videos, posing severe risks to societal trust and digital integrity. Existing benchmarks for video authenticity detection typically suffer from limited realism, insufficient scale, and inadequate complexity, failing to effectively evaluate modern vision-language models against sophisticated forgeries. To address this critical gap, we introduce AEGIS, a novel large-scale benchmark explicitly targeting the detection of hyper-realistic and semantically nuanced AI-generated videos. AEGIS comprises over 10,000 rigorously curated real and synthetic videos generated by diverse, state-of-the-art generative models, including Stable Video Diffusion, CogVideoX-5B, KLing, and Sora, encompassing open-source and proprietary architectures. In particular, AEGIS features specially constructed challenging subsets enhanced with robustness evaluation. Furthermore, we provide multimodal annotations spanning Semantic-Authenticity Descriptions, Motion Features, and Low-level Visual Features, facilitating authenticity detection and supporting downstream tasks such as multimodal fusion and forgery localization. Extensive experiments using advanced vision-language models demonstrate limited detection capabilities on the most challenging subsets of AEGIS, highlighting the dataset's unique complexity and realism beyond the current generalization capabilities of existing models. In essence, AEGIS establishes an indispensable evaluation benchmark, fundamentally advancing research toward developing genuinely robust, reliable, broadly generalizable video authenticity detection methodologies capable of addressing real-world forgery threats. Our dataset is available on https://huggingface.co/datasets/Clarifiedfish/AEGIS.
Updated: 2025-08-14 15:55:49
Subjects: cs.CV,cs.AI
Modeling Human Responses to Multimodal AI Content
As AI-generated content becomes widespread, so does the risk of misinformation. While prior research has primarily focused on identifying whether content is authentic, much less is known about how such content influences human perception and behavior. In domains like trading or the stock market, predicting how people react (e.g., whether a news post will go viral), can be more critical than verifying its factual accuracy. To address this, we take a human-centered approach and introduce the MhAIM Dataset, which contains 154,552 online posts (111,153 of them AI-generated), enabling large-scale analysis of how people respond to AI-generated content. Our human study reveals that people are better at identifying AI content when posts include both text and visuals, particularly when inconsistencies exist between the two. We propose three new metrics: trustworthiness, impact, and openness, to quantify how users judge and engage with online content. We present T-Lens, an LLM-based agent system designed to answer user queries by incorporating predicted human responses to multimodal information. At its core is HR-MCP (Human Response Model Context Protocol), built on the standardized Model Context Protocol (MCP), enabling seamless integration with any LLM. This integration allows T-Lens to better align with human reactions, enhancing both interpretability and interaction capabilities. Our work provides empirical insights and practical tools to equip LLMs with human-awareness capabilities. By highlighting the complex interplay among AI, human cognition, and information reception, our findings suggest actionable strategies for mitigating the risks of AI-driven misinformation.
Updated: 2025-08-14 15:55:19
Subjects: cs.AI,cs.MM
Memorisation and forgetting in a learning Hopfield neural network: bifurcation mechanisms, attractors and basins
Despite explosive expansion of artificial intelligence based on artificial neural networks (ANNs), these are employed as "black boxes", as it is unclear how, during learning, they form memories or develop unwanted features, including spurious memories and catastrophic forgetting. Much research is available on isolated aspects of learning ANNs, but due to their high dimensionality and non-linearity, their comprehensive analysis remains a challenge. In ANNs, knowledge is thought to reside in connection weights or in attractor basins, but these two paradigms are not linked explicitly. Here we comprehensively analyse mechanisms of memory formation in an 81-neuron Hopfield network undergoing Hebbian learning by revealing bifurcations leading to formation and destruction of attractors and their basin boundaries. We show that, by affecting evolution of connection weights, the applied stimuli induce a pitchfork and then a cascade of saddle-node bifurcations creating new attractors with their basins that can code true or spurious memories, and an abrupt disappearance of old memories (catastrophic forgetting). With successful learning, new categories are represented by the basins of newly born point attractors, and their boundaries by the stable manifolds of new saddles. With this, memorisation and forgetting represent two manifestations of the same mechanism. Our strategy to analyse high-dimensional learning ANNs is universal and applicable to recurrent ANNs of any form. The demonstrated mechanisms of memory formation and of catastrophic forgetting shed light on the operation of a wider class of recurrent ANNs and could aid the development of approaches to mitigate their flaws.
Updated: 2025-08-14 15:48:39
Subjects: math.DS,cs.LG,nlin.AO,37N99 (primary) 68T07, 68T05 (secondary)
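The Hebbian storage and attractor recall analysed above follow the textbook Hopfield construction, which is easy to reproduce at the paper's 81-neuron scale. The sketch below uses random ±1 patterns rather than the paper's stimuli:

```python
import numpy as np

def train_hebbian(patterns):
    """Hebbian storage: W = (1/N) Σ_p x_p x_pᵀ, with zero self-coupling."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, steps=20):
    """Synchronous sign(Wx) updates until a fixed point is reached."""
    for _ in range(steps):
        nxt = np.sign(W @ state)
        nxt[nxt == 0] = 1.0
        if np.array_equal(nxt, state):
            break
        state = nxt
    return state

rng = np.random.default_rng(42)
patterns = rng.choice([-1.0, 1.0], size=(3, 81))    # three random memories
W = train_hebbian(patterns)

noisy = patterns[0].copy()
flipped = rng.choice(81, size=8, replace=False)
noisy[flipped] *= -1                                # corrupt ~10% of the bits
overlap = float(recall(W, noisy) @ patterns[0]) / 81
print(overlap > 0.9)  # → True: the corrupted state falls back into the basin
```

Each stored pattern is a point attractor of this map; the paper's contribution is tracking how such attractors, their basins, and spurious states appear and vanish through bifurcations as the weights evolve during learning.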
FROGENT: An End-to-End Full-process Drug Design Agent
Powerful AI tools for drug discovery reside in isolated web apps, desktop programs, and code libraries. Such fragmentation forces scientists to manage incompatible interfaces and specialized scripts, which can be a cumbersome and repetitive process. To address this issue, a Full-pROcess druG dEsign ageNT, named FROGENT, has been proposed. Specifically, FROGENT utilizes a Large Language Model and the Model Context Protocol to integrate multiple dynamic biochemical databases, extensible tool libraries, and task-specific AI models. This agentic framework allows FROGENT to execute complicated drug discovery workflows dynamically, including component tasks such as target identification, molecule generation and retrosynthetic planning. FROGENT has been evaluated on eight benchmarks that cover various aspects of drug discovery, such as knowledge retrieval, property prediction, virtual screening, mechanistic analysis, molecular design, and synthesis. It was compared against six increasingly advanced ReAct-style agents that support code execution and literature searches. Empirical results demonstrated that FROGENT triples the best baseline performance in hit-finding and doubles it in interaction profiling, significantly outperforming both the open-source model Qwen3-32B and the commercial model GPT-4o. In addition, real-world cases have been utilized to validate the practicability and generalization of FROGENT. This development suggests that streamlining the agentic drug discovery pipeline can significantly enhance researcher productivity.
Updated: 2025-08-14 15:45:53
Categories: q-bio.BM,cs.AI
Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis
Retrosynthesis prediction aims to infer the reactant molecule based on a given product molecule, which is a fundamental task in chemical synthesis. However, existing models rely on a static pattern-matching paradigm, which limits their ability to perform effective logical decision-making, leading to black-box decisions. Building on this, we propose Retro-Expert, an interpretable retrosynthesis framework that performs collaborative reasoning by combining the complementary reasoning strengths of Large Language Models and specialized models via reinforcement learning. It outputs natural language explanations grounded in chemical logic through three components: (1) specialized models perform shallow reasoning to construct a high-quality chemical decision space, (2) an LLM performs critical reasoning to generate predictions and the corresponding interpretable reasoning path, and (3) reinforcement learning optimizes the interpretable decision policy. Experiments show that Retro-Expert not only surpasses both LLM-based and specialized models across different metrics but also provides expert-aligned explanations that bridge the gap between AI predictions and actionable chemical insights.
Updated: 2025-08-14 15:41:25
Categories: cs.LG,cs.AI
Natively Trainable Sparse Attention for Hierarchical Point Cloud Datasets
Unlocking the potential of transformers on datasets of large physical systems depends on overcoming the quadratic scaling of the attention mechanism. This work explores combining the Erwin architecture with the Native Sparse Attention (NSA) mechanism to improve the efficiency and receptive field of transformer models for large-scale physical systems, addressing the challenge of quadratic attention complexity. We adapt the NSA mechanism for non-sequential data, implement the Erwin NSA model, and evaluate it on three datasets from the physical sciences -- cosmology simulations, molecular dynamics, and air pressure modeling -- achieving performance that matches or exceeds that of the original Erwin model. Additionally, we reproduce the experimental results from the Erwin paper to validate their implementation.
Updated: 2025-08-14 15:39:34
Categories: cs.LG,cs.AI
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache
The rise of long-context Large Language Models (LLMs) amplifies memory and bandwidth demands during autoregressive decoding, as the Key-Value (KV) cache grows with each generated token. Low-bit KV-cache quantization (e.g., 4-bit or 2-bit) can reduce memory footprint while preserving accuracy, but existing systems suffer from slow decoding due to their exclusive reliance on CUDA cores, neglecting Tensor Cores (the primary source of compute on modern GPUs). We present BitDecoding, a new long-context LLM inference system with a low-bit KV cache. BitDecoding enables efficient low-bit KV-cache decoding by cooperatively leveraging CUDA cores and Tensor Cores. It introduces methods for automatically inducing optimized layouts to exploit Tensor Cores, along with warp-level parallelization strategies for dequantization. For unified system support, BitDecoding includes a query transformation module supporting diverse attention variants, a quantization kernel that supports both tensor-wise and channel-wise scaling used in various quantization algorithms with high performance, and a dequantization kernel with a software-defined pipeline to coordinate CUDA and Tensor Cores execution for mixed-precision operations. Evaluated on RTX 4090, A100, and H100, BitDecoding accelerates decoding by up to 7.5x, 4.8x, and 8.9x, respectively, over FP16 FlashDecoding-v2, and surpasses the state-of-the-art low-bit system QServe by up to 4.3x. On LLaMA-3.1-8B with a 128K context, BitDecoding reduces single-batch decoding latency by 3x, showing substantial improvements for long-context generation. The code is available at https://github.com/DD-DuDa/BitDecoding.
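The paper's Tensor-Core kernels are not reproducible in a few lines, but the numerics they accelerate — a low-bit KV cache with per-channel asymmetric scaling — can be sketched in NumPy (bit width, shapes, and layout here are illustrative):

```python
import numpy as np

def quantize_kv(x, bits=4):
    """Per-channel asymmetric quantization: x ~= q * scale + zero."""
    qmax = 2 ** bits - 1
    zero = x.min(axis=0)
    scale = np.maximum(x.max(axis=0) - zero, 1e-8) / qmax
    q = np.round((x - zero) / scale).astype(np.uint8)   # 4-bit codes stored in uint8
    return q, scale, zero

def dequantize_kv(q, scale, zero):
    return q.astype(np.float32) * scale + zero

rng = np.random.default_rng(0)
k_cache = rng.standard_normal((128, 64)).astype(np.float32)  # (tokens, head_dim)
q, scale, zero = quantize_kv(k_cache)
err = np.abs(dequantize_kv(q, scale, zero) - k_cache).max()
print(err <= scale.max() / 2 + 1e-6)   # rounding error bounded by half a step
```

A real system like BitDecoding would additionally pack two 4-bit codes per byte and fuse dequantization into the attention kernel across CUDA and Tensor Cores; this sketch only shows the quantization arithmetic.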
Updated: 2025-08-14 15:37:43
Categories: cs.AR,cs.AI,cs.CL,cs.PF
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Reinforcement learning with verifiable rewards (RLVR), which typically adopts Pass@1 as the reward, has faced issues in balancing exploration and exploitation, causing policies to prefer conservative actions and converge to a local optimum. Identifying an appropriate reward metric is therefore crucial. In prior work, although Pass@k has been used in evaluation, its connection to LLM exploration ability in RLVR has remained largely overlooked. To investigate this, we first use Pass@k as the reward to train the policy model (i.e., $\textbf{Pass@k Training}$), and observe an improvement in its exploration ability. Next, we derive an analytical solution for the advantage of Pass@k Training, leading to an efficient and effective training process. Building on this, our analysis reveals that exploration and exploitation are not inherently conflicting objectives; they can mutually enhance each other. Moreover, Pass@k Training with the analytical derivation essentially amounts to directly designing the advantage function. Inspired by this, we preliminarily explore advantage design for RLVR, showing promising results and highlighting a potential future direction.
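As a concrete reference point, the standard unbiased Pass@k estimator from the code-generation literature — the quantity this entry adopts as a reward, computed over n sampled rollouts of which c pass the verifier — is a one-liner; the paper's analytic advantage derivation is not reproduced here:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of P(at least one of k samples is correct),
    given n rollouts of which c passed the verifier."""
    if n - c < k:          # too few failures to fill all k slots: certain success
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(8, 0, 4))             # 0.0  (no correct rollout)
print(pass_at_k(8, 8, 4))             # 1.0  (all rollouts correct)
print(round(pass_at_k(8, 2, 4), 4))   # 0.7857
```

Unlike Pass@1, this reward stays informative when only a minority of rollouts succeed, which is what lets the trained policy keep exploring.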
Updated: 2025-08-14 15:34:47
Categories: cs.LG,cs.AI,cs.CL
Optimistic critics can empower small actors
Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use symmetric architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of asymmetric setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest poor data collection, due to value underestimation, as one of the main causes for this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric actor-critic methods.
Updated: 2025-08-14 15:32:40
Categories: cs.LG,stat.ML
Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning
Generalized planning using deep reinforcement learning (RL) combined with graph neural networks (GNNs) has shown promising results in various symbolic planning domains described by PDDL. However, existing approaches typically represent planning states as fully connected graphs, leading to a combinatorial explosion in edge information and substantial sparsity as problem scales grow, especially evident in large grid-based environments. This dense representation results in diluted node-level information, exponentially increases memory requirements, and ultimately makes learning infeasible for larger-scale problems. To address these challenges, we propose a sparse, goal-aware GNN representation that selectively encodes relevant local relationships and explicitly integrates spatial features related to the goal. We validate our approach by designing novel drone mission scenarios based on PDDL within a grid world, effectively simulating realistic mission execution environments. Our experimental results demonstrate that our method scales effectively to larger grid sizes previously infeasible with dense graph representations and substantially improves policy generalization and success rates. Our findings provide a practical foundation for addressing realistic, large-scale generalized planning tasks.
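The scaling argument — a radius-limited graph grows linearly in nodes where a fully connected one grows quadratically — can be made concrete with a toy grid (this is only the sparsification idea, not the paper's encoder; in the paper, goal-related spatial features would additionally be attached to the nodes):

```python
import numpy as np

def sparse_grid_edges(positions, radius=1.5):
    """Directed edges only between nodes within `radius`, not all n*(n-1) pairs."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    src, dst = np.nonzero((d <= radius) & (d > 0))
    return np.stack([src, dst])

# 5x5 grid world: 8-neighbourhood connectivity vs. a fully connected graph.
pos = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
edges = sparse_grid_edges(pos)
print(edges.shape[1], pos.shape[0] * (pos.shape[0] - 1))  # 144 600
```

Already at 5x5 the sparse graph has roughly a quarter of the edges; the gap widens quadratically with grid size, which is exactly where dense representations became infeasible.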
Updated: 2025-08-14 15:30:28
Categories: cs.AI,cs.RO
Data and Context Matter: Towards Generalizing AI-based Software Vulnerability Detection
The performance of AI-based software vulnerability detection systems is often limited by their poor generalization to unknown codebases. In this research, we explore the impact of data quality and model architecture on the generalizability of vulnerability detection systems. By generalization we mean the ability to maintain high vulnerability detection performance across C/C++ software projects not seen during training. Through a series of experiments, we demonstrate that improvements in dataset diversity and quality substantially enhance detection performance. Additionally, we compare multiple encoder-only and decoder-only models, finding that encoder-based models outperform in terms of accuracy and generalization. Our model achieves a 6.8% improvement in recall on the benchmark BigVul[1] dataset and also outperforms on unseen projects, showing enhanced generalizability. These results highlight the role of data quality and model selection in the development of robust vulnerability detection systems. Our findings suggest a direction for future systems with high cross-project effectiveness.
Updated: 2025-08-14 15:30:22
Categories: cs.CR,cs.AI,cs.SE
IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection
Industrial anomaly detection is a critical component of modern manufacturing, yet the scarcity of defective samples restricts traditional detection methods to scenario-specific applications. Although Vision-Language Models (VLMs) demonstrate significant advantages in generalization capabilities, their performance in industrial anomaly detection remains limited. To address this challenge, we propose IAD-R1, a universal post-training framework applicable to VLMs of different architectures and parameter scales, which substantially enhances their anomaly detection capabilities. IAD-R1 employs a two-stage training strategy: the Perception Activation Supervised Fine-Tuning (PA-SFT) stage utilizes a meticulously constructed high-quality Chain-of-Thought dataset (Expert-AD) for training, enhancing anomaly perception capabilities and establishing reasoning-to-answer correlations; the Structured Control Group Relative Policy Optimization (SC-GRPO) stage employs carefully designed reward functions to achieve a capability leap from "Anomaly Perception" to "Anomaly Interpretation". Experimental results demonstrate that IAD-R1 achieves significant improvements across 7 VLMs; the largest improvement is on the DAGM dataset, where average accuracy is 43.3% higher than the 0.5B baseline. Notably, the 0.5B parameter model trained with IAD-R1 surpasses commercial models including GPT-4.1 and Claude-Sonnet-4 in zero-shot settings, demonstrating the effectiveness and superiority of IAD-R1. The dataset, code, and all model weights will be publicly available at https://github.com/Yanhui-Lee/IAD-R1.
Updated: 2025-08-14 15:30:10
Categories: cs.CV,cs.AI
Agentic Design Review System
Evaluating graphic designs involves assessing them across multiple facets such as alignment, composition, aesthetics and color choices. Evaluating designs holistically involves aggregating feedback from individual expert reviewers. Towards this, we propose an Agentic Design Review System (AgenticDRS), where multiple agents collaboratively analyze a design, orchestrated by a meta-agent. A novel in-context exemplar selection approach based on graph matching and a unique prompt expansion method play a central role in making each agent design-aware. To evaluate this framework, we propose the DRS-BENCH benchmark. Thorough experimental evaluation against state-of-the-art baselines adapted to the problem setup, backed up with critical ablation experiments, brings out the efficacy of Agentic-DRS in evaluating graphic designs and generating actionable feedback. We hope that this work will attract attention to this pragmatic, yet under-explored research direction.
Updated: 2025-08-14 15:29:24
Categories: cs.AI,cs.CV,cs.LG,cs.MA,cs.MM
APFL: Analytic Personalized Federated Learning via Dual-Stream Least Squares
Personalized Federated Learning (PFL) faces the significant challenge of delivering personalized models to individual clients through collaborative training. Existing PFL methods are often vulnerable to non-IID data, which severely hinders collective generalization and in turn compromises the subsequent personalization efforts. In this paper, to address this non-IID issue in PFL, we propose an Analytic Personalized Federated Learning (APFL) approach via dual-stream least squares. In our APFL, we use a foundation model as a frozen backbone for feature extraction. Subsequent to the feature extractor, we develop dual-stream analytic models to achieve both collective generalization and individual personalization. Specifically, our APFL incorporates a shared primary stream for global generalization across all clients, and a dedicated refinement stream for local personalization of each individual client. The analytical solutions of our APFL enable its ideal property of heterogeneity invariance, meaning theoretically that each personalized model remains identical regardless of how heterogeneously the data are distributed across all other clients. Empirical results across various datasets validate the superiority of our APFL over state-of-the-art baselines, with accuracy advantages ranging from 1.10% to 15.45%.
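A toy single-client sketch of the dual-stream idea — a closed-form least-squares head for generalization plus a second analytic head fitted to the residuals for refinement — on frozen backbone features (shapes and the ridge parameter are illustrative, not from the paper, and the federated aggregation across clients is omitted):

```python
import numpy as np

def analytic_head(X, Y, lam=1e-2):
    """Closed-form ridge solution: W = (X^T X + lam*I)^(-1) X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))             # frozen-backbone features
W_true = rng.standard_normal((16, 3))
Y = X @ W_true                                  # client labels
W_shared = analytic_head(X, Y)                  # primary stream: generalization
W_refine = analytic_head(X, Y - X @ W_shared)   # refinement stream on residuals
pred = X @ (W_shared + W_refine)
print(np.abs(pred - Y).max() < 1e-3)            # True: fits to numerical precision
```

Because both streams are solved analytically rather than by gradient descent, the result is deterministic for given features, which is the property underlying the paper's heterogeneity-invariance claim.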
Updated: 2025-08-14 15:12:50
Categories: cs.LG,cs.AI
Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction
Human perceptual systems excel at inducing and recognizing objects across both known and novel categories, a capability far beyond current machine learning frameworks. While generalized category discovery (GCD) aims to bridge this gap, existing methods predominantly focus on optimizing objective functions. We present an orthogonal solution, inspired by the human cognitive process for novel object understanding: decomposing objects into visual primitives and establishing cross-knowledge comparisons. We propose ConGCD, which establishes primitive-oriented representations through high-level semantic reconstruction, binding intra-class shared attributes via deconstruction. Mirroring human preference diversity in visual processing, where distinct individuals leverage dominant or contextual cues, we implement dominant and contextual consensus units to capture class-discriminative patterns and inherent distributional invariants, respectively. A consensus scheduler dynamically optimizes activation pathways, with final predictions emerging through multiplex consensus integration. Extensive evaluations across coarse- and fine-grained benchmarks demonstrate ConGCD's effectiveness as a consensus-aware paradigm. Code is available at github.com/lytang63/ConGCD.
Updated: 2025-08-14 15:11:22
Categories: cs.CV,cs.LG
EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering
Recent advances in Multimodal Large Language Models (MLLMs) have significantly pushed the frontier of egocentric video question answering (EgocentricQA). However, existing benchmarks and studies are mainly limited to common daily activities such as cooking and cleaning. In contrast, real-world deployment inevitably encounters domain shifts, where target domains differ substantially in both visual style and semantic content. To bridge this gap, we introduce \textbf{EgoCross}, a comprehensive benchmark designed to evaluate the cross-domain generalization of MLLMs in EgocentricQA. EgoCross covers four diverse and challenging domains, including surgery, industry, extreme sports, and animal perspective, representing realistic and high-impact application scenarios. It comprises approximately 1,000 QA pairs across 798 video clips, spanning four key QA tasks: prediction, recognition, localization, and counting. Each QA pair provides both OpenQA and CloseQA formats to support fine-grained evaluation. Extensive experiments show that most existing MLLMs, whether general-purpose or egocentric-specialized, struggle to generalize to domains beyond daily life, highlighting the limitations of current models. Furthermore, we conduct several pilot studies, \eg, fine-tuning and reinforcement learning, to explore potential improvements. We hope EgoCross and our accompanying analysis will serve as a foundation for advancing domain-adaptive, robust egocentric video understanding. Data and codes will be released at: \href{https://github.com/MyUniverse0726/EgoCross}{https://github.com/MyUniverse0726/EgoCross.}
Updated: 2025-08-14 15:11:20
Categories: cs.CV,cs.AI
Sample-efficient LLM Optimization with Reset Replay
Recent advancements in post-training Large Language Models (LLMs), particularly through Reinforcement Learning (RL) and preference optimization methods, are key drivers for enhancing their reasoning capabilities. However, these methods are often plagued by low sample efficiency and a susceptibility to primacy bias, where overfitting to initial experiences degrades policy quality and damages the learning process. To address these challenges, we introduce LLM optimization with Reset Replay (LoRR), a general and powerful plugin designed to enhance sample efficiency in any preference-based optimization framework. LoRR's core mechanism enables training at a high replay number, maximizing the utility of each collected data batch. To counteract the risk of overfitting inherent in high-replay training, LoRR incorporates a periodic reset strategy with reusing initial data, which preserves network plasticity. Furthermore, it leverages a hybrid optimization objective, combining supervised fine-tuning (SFT) and preference-based losses to further bolster data exploitation. Our extensive experiments demonstrate that LoRR significantly boosts the performance of various preference optimization methods on both mathematical and general reasoning benchmarks. Notably, an iterative DPO approach augmented with LoRR achieves comparable performance on challenging math tasks, outperforming some complex and computationally intensive RL-based algorithms. These findings highlight that LoRR offers a practical, sample-efficient, and highly effective paradigm for LLM finetuning, unlocking greater performance from limited data.
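The training schedule described — replay each batch many times, periodically reset the network and re-train on the initial data to restore plasticity — can be sketched with a toy supervised model in place of an LLM (all hyperparameters are illustrative; the hybrid SFT-plus-preference objective is not modeled):

```python
import numpy as np

def loss(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def train_reset_replay(batches, replays=4, reset_every=3, lr=0.1, seed=0):
    """Toy linear model trained with a high replay number and periodic resets
    that reuse the initial data, loosely mirroring the LoRR recipe."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(batches[0][0].shape[1]) * 0.01
    w_init = w.copy()
    for i, (X, y) in enumerate(batches):
        if i > 0 and i % reset_every == 0:
            w = w_init.copy()                      # periodic reset keeps plasticity
            X0, y0 = batches[0]                    # ...followed by initial-data replay
            for _ in range(replays):
                w -= lr * X0.T @ (X0 @ w - y0) / len(y0)
        for _ in range(replays):                   # high replay of the current batch
            w -= lr * X.T @ (X @ w - y) / len(y)
    return w, w_init

rng = np.random.default_rng(1)
w_true = rng.standard_normal(8)
batches = [(X := rng.standard_normal((64, 8)), X @ w_true) for _ in range(10)]
w, w0 = train_reset_replay(batches)
print(loss(w, *batches[0]) < loss(w0, *batches[0]))
```

The reset looks wasteful, but in the paper's setting it is what prevents the high replay number from entrenching primacy bias.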
Updated: 2025-08-14 14:59:51
Categories: cs.LG,cs.CL
MAP Estimation with Denoisers: Convergence Rates and Guarantees
Denoiser models have become powerful tools for inverse problems, enabling the use of pretrained networks to approximate the score of a smoothed prior distribution. These models are often used in heuristic iterative schemes aimed at solving Maximum a Posteriori (MAP) optimisation problems, where the proximal operator of the negative log-prior plays a central role. In practice, this operator is intractable, and practitioners plug in a pretrained denoiser as a surrogate, despite the lack of general theoretical justification for this substitution. In this work, we show that a simple algorithm, closely related to several used in practice, provably converges to the proximal operator under a log-concavity assumption on the prior $p$. We show that this algorithm can be interpreted as a gradient descent on smoothed proximal objectives. Our analysis thus provides a theoretical foundation for a class of empirically successful but previously heuristic methods.
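The scheme analysed can be mimicked in a toy setting where the exact answer is known: for a standard-Gaussian prior, both the MMSE denoiser and the proximal operator are closed-form, so we can check that gradient descent with the score replaced by the Tweedie surrogate (D(x) - x)/sigma^2 recovers the prox (step size, noise level, and iteration count are illustrative):

```python
import numpy as np

def prox_via_denoiser(y, denoiser, lam=1.0, sigma=0.05, eta=0.2, iters=500):
    """Gradient descent on F(x) = ||x - y||^2/2 - lam * log p_sigma(x),
    with the smoothed-prior score replaced by (D(x) - x) / sigma^2."""
    x = y.copy()
    for _ in range(iters):
        score = (denoiser(x, sigma) - x) / sigma**2
        x = x - eta * ((x - y) - lam * score)
    return x

# Toy MMSE denoiser for a standard-Gaussian prior: D(x) = x / (1 + sigma^2).
gauss_denoiser = lambda x, s: x / (1.0 + s**2)

y = np.array([2.0, -1.0, 0.5])
x_hat = prox_via_denoiser(y, gauss_denoiser)
# Exact prox of lam * ||x||^2/2 at y is y / (1 + lam) = y / 2 here.
print(np.allclose(x_hat, y / 2.0, atol=1e-2))   # True
```

The small gap to the exact prox shrinks with sigma, matching the intuition that the surrogate targets the smoothed prior rather than the prior itself.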
Updated: 2025-08-14 14:59:47
Categories: cs.LG,math.OC,stat.ML
Preacher: Paper-to-Video Agentic System
The paper-to-video task converts a research paper into a structured video abstract, distilling key concepts, methods, and conclusions into an accessible, well-organized format. While state-of-the-art video generation models demonstrate potential, they are constrained by limited context windows, rigid video duration constraints, limited stylistic diversity, and an inability to represent domain-specific knowledge. To address these limitations, we introduce Preacher, the first paper-to-video agentic system. Preacher employs a top-down approach to decompose, summarize, and reformulate the paper, followed by bottom-up video generation, synthesizing diverse video segments into a coherent abstract. To align cross-modal representations, we define key scenes and introduce a Progressive Chain of Thought (P-CoT) for granular, iterative planning. Preacher successfully generates high-quality video abstracts across five research fields, demonstrating expertise beyond current video generation models. Code will be released at: https://github.com/GenVerse/Paper2Video
Updated: 2025-08-14 14:59:23
Categories: cs.CV,cs.AI
Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning
In this paper, we explore mission assignment and task offloading in an Open Radio Access Network (Open RAN)-based intelligent transportation system (ITS), where autonomous vehicles leverage mobile edge computing for efficient processing. Existing studies often overlook the intricate interdependencies between missions and the costs associated with offloading tasks to edge servers, leading to suboptimal decision-making. To bridge this gap, we introduce Oranits, a novel system model that explicitly accounts for mission dependencies and offloading costs while optimizing performance through vehicle cooperation. To achieve this, we propose a twofold optimization approach. First, we develop a metaheuristic-based evolutionary computing algorithm, namely the Chaotic Gaussian-based Global ARO (CGG-ARO), serving as a baseline for one-slot optimization. Second, we design an enhanced reward-based deep reinforcement learning (DRL) framework, referred to as the Multi-agent Double Deep Q-Network (MA-DDQN), that integrates both multi-agent coordination and multi-action selection mechanisms, significantly reducing mission assignment time and improving adaptability over baseline methods. Extensive simulations reveal that CGG-ARO improves the number of completed missions and overall benefit by approximately 7.1% and 7.7%, respectively. Meanwhile, MA-DDQN achieves even greater improvements of 11.0% in terms of mission completions and 12.5% in terms of the overall benefit. These results highlight the effectiveness of Oranits in enabling faster, more adaptive, and more efficient task processing in dynamic ITS environments.
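The MA-DDQN component builds on the standard double-DQN target, which decouples action selection (online network) from action evaluation (target network); a minimal sketch of that target computation follows (the paper's multi-agent coordination and multi-action selection mechanisms are not shown):

```python
import numpy as np

def double_dqn_targets(r, q_next_online, q_next_target, gamma=0.99, done=None):
    """Double-DQN target: r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    a_star = np.argmax(q_next_online, axis=1)          # action picked by online net
    q_next = q_next_target[np.arange(len(r)), a_star]  # ...scored by target net
    if done is None:
        done = np.zeros_like(r)
    return r + gamma * (1.0 - done) * q_next

r = np.array([1.0, 0.0])                               # batch of two transitions
q_online = np.array([[0.2, 0.9], [0.5, 0.1]])
q_target = np.array([[0.3, 0.7], [0.6, 0.2]])
targets = double_dqn_targets(r, q_online, q_target)
print(targets)   # [1.693 0.594]
```

Decoupling selection from evaluation is what curbs the overestimation bias of vanilla DQN, which matters when mission values are coupled through dependencies as in Oranits.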
Updated: 2025-08-14 14:59:13
Categories: cs.DC,cs.AI,cs.GT,cs.LG,cs.NI
Symmetry-Constrained Multi-Scale Physics-Informed Neural Networks for Graphene Electronic Band Structure Prediction
Accurate prediction of electronic band structures in two-dimensional materials remains a fundamental challenge, with existing methods struggling to balance computational efficiency and physical accuracy. We present the Symmetry-Constrained Multi-Scale Physics-Informed Neural Network (SCMS-PINN) v35, which directly learns graphene band structures while rigorously enforcing crystallographic symmetries through a multi-head architecture. Our approach introduces three specialized ResNet-6 pathways -- K-head for Dirac physics, M-head for saddle points, and General head for smooth interpolation -- operating on 31 physics-informed features extracted from k-points. Progressive Dirac constraint scheduling systematically increases the weight parameter from 5.0 to 25.0, enabling hierarchical learning from global topology to local critical physics. Training on 10,000 k-points over 300 epochs achieves 99.99\% reduction in training loss (34.597 to 0.003) with validation loss of 0.0085. The model predicts Dirac point gaps within 30.3 $\mu$eV of theoretical zero and achieves average errors of 53.9 meV (valence) and 40.5 meV (conduction) across the Brillouin zone. All twelve C$_{6v}$ operations are enforced through systematic averaging, guaranteeing exact symmetry preservation. This framework establishes a foundation for extending physics-informed learning to broader two-dimensional materials for accelerated discovery.
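The paper enforces all twelve C$_{6v}$ operations through systematic averaging; the core trick — group-averaging a raw k-space model so its output is exactly symmetry-invariant — can be sketched with a toy scalar model in place of the ResNet-6 heads:

```python
import numpy as np

def c6v_operations():
    """The 12 point-group operations of C6v: six rotations and six mirrors."""
    ops = []
    for n in range(6):
        t = n * np.pi / 3
        R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        M = np.array([[1.0, 0.0], [0.0, -1.0]])   # mirror about the k_x axis
        ops += [R, R @ M]
    return ops

def symmetrize(f, k):
    """Group-average a raw scalar model over C6v -> exactly invariant output."""
    return float(np.mean([f(op @ k) for op in c6v_operations()]))

f = lambda k: k[0] ** 2 + 0.3 * k[0] * k[1]       # asymmetric raw prediction
k = np.array([0.4, 0.7])
rotated = c6v_operations()[2] @ k                  # any symmetry-related k-point
print(np.isclose(symmetrize(f, k), symmetrize(f, rotated)))  # True
```

Invariance follows from group closure: averaging over the full group maps every symmetry-related k-point to the same orbit mean, regardless of how asymmetric the underlying model is.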
Updated: 2025-08-14 14:59:10
Categories: cond-mat.mtrl-sci,cs.LG,physics.comp-ph
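The abstract above states that all twelve C$_{6v}$ operations are enforced through systematic averaging. A minimal sketch of that group-averaging idea in plain Python, where `raw_energy` is a deliberately asymmetric stand-in for the network's prediction (not the paper's actual model):

```python
import math

def c6v_operations():
    """The 12 operations of C6v: rotations by k*60 degrees plus 6 mirror reflections."""
    ops = []
    for k in range(6):
        a = math.pi * k / 3.0
        c, s = math.cos(a), math.sin(a)
        ops.append((c, -s, s, c))   # rotation matrix [[c, -s], [s, c]]
        ops.append((c, s, s, -c))   # rotation composed with x-axis reflection
    return ops

def raw_energy(kx, ky):
    # Stand-in for a network output: deliberately NOT symmetric.
    return kx * kx + 0.3 * ky + 0.1 * kx * ky

def symmetrized_energy(kx, ky, f=raw_energy):
    """Average f over all 12 C6v operations; the result is exactly C6v-invariant,
    because right-multiplying by any group element just permutes the sum."""
    total = 0.0
    for (a, b, c, d) in c6v_operations():
        total += f(a * kx + b * ky, c * kx + d * ky)
    return total / 12.0
```

Averaging a predictor over a finite symmetry group is the standard way to guarantee exact (not merely learned) invariance, at the cost of twelve forward evaluations per k-point.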
Electromagnetic Simulations of Antennas on GPUs for Machine Learning Applications
This study proposes an antenna simulation framework powered by graphics processing units (GPUs) based on an open-source electromagnetic (EM) simulation software (gprMax) for machine learning applications of antenna design and optimization. Furthermore, it compares the simulation results with those obtained through commercial EM software. The proposed software framework for machine learning and surrogate model applications will produce antenna data sets consisting of a large number of antenna simulation results using GPUs. Although machine learning methods can attain the optimum solutions for many problems, they are known to be data-hungry and require a great deal of samples for the training stage of the algorithms. However, producing a sufficient number of training samples in EM applications within a limited time is challenging due to the high computational complexity of EM simulations. Therefore, GPUs are utilized in this study to simulate a large number of antennas with predefined or random antenna shape parameters to produce data sets. Moreover, this study also compares various machine learning and deep learning models in terms of antenna parameter estimation performance. This study demonstrates that an entry-level GPU substantially outperforms a high-end CPU in terms of computational performance, while a high-end gaming GPU can achieve around 18 times more computational performance compared to a high-end CPU. Moreover, it is shown that the open-source EM simulation software can deliver similar results to those obtained via commercial software in the simulation of microstrip antennas when the spatial resolution of the simulations is sufficiently fine.
Updated: 2025-08-14 14:56:04
Categories: cs.LG,cs.AI
Advancing MAPF towards the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)
We present Scalable Multi-Agent Realistic Testbed (SMART), a realistic and efficient software tool for evaluating Multi-Agent Path Finding (MAPF) algorithms. MAPF focuses on planning collision-free paths for a group of agents. While state-of-the-art MAPF algorithms can plan paths for hundreds of robots in seconds, they often rely on simplified robot models, making their real-world performance unclear. Researchers typically lack access to hundreds of physical robots in laboratory settings to evaluate the algorithms. Meanwhile, industrial professionals who lack expertise in MAPF require an easy-to-use simulator to efficiently test and understand the performance of MAPF algorithms in their specific settings. SMART fills this gap with several advantages: (1) SMART uses physics-engine-based simulators to create realistic simulation environments, accounting for complex real-world factors such as robot kinodynamics and execution uncertainties, (2) SMART uses an execution monitor framework based on the Action Dependency Graph, facilitating seamless integration with various MAPF algorithms and robot models, and (3) SMART scales to thousands of robots. The code is publicly available at https://github.com/smart-mapf/smart.
Updated: 2025-08-14 14:55:27
Categories: cs.RO,cs.AI
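The Action Dependency Graph mentioned above gates each agent's next action on other agents' earlier actions (e.g. enter a cell only after its previous occupant has left). A toy sketch of dependency-gated execution ordering via Kahn's topological sort; the action ids and the tiny plan are illustrative, not SMART's API:

```python
from collections import defaultdict, deque

def adg_execution_order(actions, deps):
    """Return a feasible execution order for a MAPF plan under an ADG.

    actions: list of action ids, e.g. ("agent1", step_index)
    deps:    dict mapping an action to the set of actions that must finish first.
    Raises ValueError if the ADG contains a cycle (plan not executable).
    """
    indegree = {a: 0 for a in actions}
    dependents = defaultdict(list)
    for act, prereqs in deps.items():
        for p in prereqs:
            indegree[act] += 1
            dependents[p].append(act)
    ready = deque(a for a in actions if indegree[a] == 0)
    order = []
    while ready:
        a = ready.popleft()
        order.append(a)
        for b in dependents[a]:
            indegree[b] -= 1
            if indegree[b] == 0:
                ready.append(b)
    if len(order) != len(actions):
        raise ValueError("cycle in ADG: plan is not executable")
    return order
```

In a real execution monitor the same check runs online: an action is dispatched to its robot only once its in-edges have been confirmed complete, which is what makes delayed or uncertain execution safe.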
Lightweight CNNs for Embedded SAR Ship Target Detection and Classification
Synthetic Aperture Radar (SAR) data enables large-scale surveillance of maritime vessels. However, near-real-time monitoring is currently constrained by the need to downlink all raw data, perform image focusing, and subsequently analyze it on the ground. On-board processing to generate higher-level products could reduce the data volume that needs to be downlinked, alleviating bandwidth constraints and minimizing latency. However, traditional image focusing and processing algorithms face challenges due to the satellite's limited memory, processing power, and computational resources. This work proposes and evaluates neural networks designed for real-time inference on unfocused SAR data acquired in Stripmap and Interferometric Wide (IW) modes captured with Sentinel-1. Our results demonstrate the feasibility of using one of our models for on-board processing and deployment on an FPGA. Additionally, by investigating a binary classification task between ships and windmills, we demonstrate that target classification is possible.
Updated: 2025-08-14 14:55:19
Categories: cs.CV,cs.LG
Episodic Memory Verbalization using Hierarchical Representations of Life-Long Robot Experience
Verbalization of robot experience, i.e., summarization of and question answering about a robot's past, is a crucial ability for improving human-robot interaction. Previous works applied rule-based systems or fine-tuned deep models to verbalize short (several-minute-long) streams of episodic data, limiting generalization and transferability. In our work, we apply large pretrained models to tackle this task with zero or few examples, and specifically focus on verbalizing life-long experiences. For this, we derive a tree-like data structure from episodic memory (EM), with lower levels representing raw perception and proprioception data, and higher levels abstracting events to natural language concepts. Given such a hierarchical representation built from the experience stream, we apply a large language model as an agent to interactively search the EM given a user's query, dynamically expanding (initially collapsed) tree nodes to find the relevant information. The approach keeps computational costs low even when scaling to months of robot experience data. We evaluate our method on simulated household robot data, human egocentric videos, and real-world robot recordings, demonstrating its flexibility and scalability.
Updated: 2025-08-14 14:53:25
Categories: cs.RO,cs.AI
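The key mechanism above is a tree whose nodes start collapsed and are expanded only along query-relevant branches, keeping cost low over months of experience. A stripped-down sketch; in the paper an LLM agent decides what to expand, while here a plain predicate `is_relevant` stands in for it:

```python
class EMNode:
    """One node of a hierarchical episodic-memory tree: a natural-language
    summary at this level, with children holding finer-grained detail."""
    def __init__(self, summary, children=None):
        self.summary = summary
        self.children = children or []

def search_em(root, is_relevant, max_expansions=100):
    """Expand (initially collapsed) nodes top-down, descending only into
    branches whose summaries the relevance test accepts."""
    hits, frontier, expansions = [], [root], 0
    while frontier and expansions < max_expansions:
        node = frontier.pop()
        expansions += 1
        if not is_relevant(node.summary):
            continue                   # prune: never expand this subtree
        if not node.children:
            hits.append(node.summary)  # leaf: raw event reached
        else:
            frontier.extend(node.children)
    return hits
```

Because irrelevant subtrees are never expanded, the number of node visits grows with the answer's depth rather than with the total length of the experience stream.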
15,500 Seconds: Lean UAV Classification Using EfficientNet and Lightweight Fine-Tuning
As unmanned aerial vehicles (UAVs) become increasingly prevalent in both consumer and defense applications, the need for reliable, modality-specific classification systems grows in urgency. This paper addresses the challenge of data scarcity in UAV audio classification by expanding on prior work through the integration of pre-trained deep learning models, parameter-efficient fine-tuning (PEFT) strategies, and targeted data augmentation techniques. Using a custom dataset of 3,100 UAV audio clips (15,500 seconds) spanning 31 distinct drone types, we evaluate the performance of transformer-based and convolutional neural network (CNN) architectures under various fine-tuning configurations. Experiments were conducted with five-fold cross-validation, assessing accuracy, training efficiency, and robustness. Results show that full fine-tuning of the EfficientNet-B0 model with three augmentations achieved the highest validation accuracy (95.95), outperforming both the custom CNN and transformer-based models like AST. These findings suggest that combining lightweight architectures with PEFT and well-chosen augmentations provides an effective strategy for UAV audio classification on limited datasets. Future work will extend this framework to multimodal UAV classification using visual and radar telemetry.
Updated: 2025-08-14 14:50:40
Categories: cs.LG,cs.AI
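The abstract does not itemize its "three augmentations"; as a hedged sketch only, here is a common audio-augmentation trio (additive noise, circular time shift, random gain) applied to a raw waveform, illustrating the kind of pipeline such work typically uses rather than the paper's exact recipe:

```python
import random

def augment(wave, seed=0):
    """One pass of three simple audio augmentations over a raw waveform
    (list of float samples). Deterministic for a given seed."""
    rng = random.Random(seed)
    # 1) additive noise: small uniform perturbation per sample
    w = [s + rng.uniform(-0.01, 0.01) for s in wave]
    # 2) circular time shift: rotate the clip by a random offset
    n = rng.randrange(len(w))
    w = w[n:] + w[:n]
    # 3) random gain: scale overall amplitude
    g = rng.uniform(0.8, 1.2)
    return [g * s for s in w]
```

In practice each training clip would be re-augmented every epoch with a fresh seed, multiplying the effective size of a small dataset like the 3,100-clip corpus described above.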
GenOM: Ontology Matching with Description Generation and Large Language Model
Ontology matching (OM) plays an essential role in enabling semantic interoperability and integration across heterogeneous knowledge sources, particularly in the biomedical domain which contains numerous complex concepts related to diseases and pharmaceuticals. This paper introduces GenOM, a large language model (LLM)-based ontology alignment framework, which enriches the semantic representations of ontology concepts via generating textual definitions, retrieves alignment candidates with an embedding model, and incorporates exact matching-based tools to improve precision. Extensive experiments conducted on the OAEI Bio-ML track demonstrate that GenOM can often achieve competitive performance, surpassing many baselines including traditional OM systems and recent LLM-based methods. Further ablation studies confirm the effectiveness of semantic enrichment and few-shot prompting, highlighting the framework's robustness and adaptability.
Updated: 2025-08-14 14:48:09
Categories: cs.AI
REFN: A Reinforcement-Learning-From-Network Framework against 1-day/n-day Exploitations
The exploitation of 1-day or n-day vulnerabilities poses severe threats to networked devices due to massive deployment scales and delayed patching (average Mean Time To Patch exceeds 60 days). Existing defenses, including host-based patching and network-based filtering, are inadequate due to limited scalability across diverse devices, compatibility issues especially with embedded or legacy systems, and error-prone deployment processes (manual patch validation). To address these issues, we introduce REFN (Reinforcement Learning From Network), a novel framework that trains Large Language Models (LLMs) to autonomously generate network filters to prevent 1-day or n-day exploitations. REFN ensures scalability by uniquely employing Reinforcement Learning (RL) driven by online network rewards instead of traditional human feedback (RLHF). REFN guarantees compatibility via unified deployment on edge security gateways (Amazon Eero). REFN provides robustness via online validation using real network traffic. Crucially, REFN addresses three core challenges in training LLMs for exploit prevention: 1) expanding current LLMs' limited vulnerability-fixing expertise via Agentic RAG-based Knowledge Distillation, 2) bridging current LLMs' language-to-network gap through an RL From VNF Pipeline that translates language context (vulnerability description) into network enforcement, and 3) addressing LLM hallucination and non-determinism via Online Agentic Validation that penalizes erroneous outputs. Evaluated across 22 families of 1-day or n-day exploits, REFN demonstrates effectiveness (21.1 percent higher accuracy than alternatives), efficiency (Mean Time To Patch of 3.65 hours) and scalability (easily scaling to 10K devices). REFN serves as an initial step toward training LLMs to rapidly prevent massive-scale 1-day or n-day exploitations.
Updated: 2025-08-14 14:45:45
Categories: cs.LG,cs.AI
A Random-Key Optimizer for Combinatorial Optimization
This paper introduces the Random-Key Optimizer (RKO), a versatile and efficient stochastic local search method tailored for combinatorial optimization problems. Using the random-key concept, RKO encodes solutions as vectors of random keys that are subsequently decoded into feasible solutions via problem-specific decoders. The RKO framework is able to combine a plethora of classic metaheuristics, each capable of operating independently or in parallel, with solution sharing facilitated through an elite solution pool. This modular approach allows for the adaptation of various metaheuristics, including simulated annealing, iterated local search, and greedy randomized adaptive search procedures, among others. The efficacy of the RKO framework, implemented in C++ and publicly available (Github public repository: github.com/RKO-solver), is demonstrated through its application to three NP-hard combinatorial optimization problems: the alpha-neighborhood p-median problem, the tree of hubs location problem, and the node-capacitated graph partitioning problem. The results highlight the framework's ability to produce high-quality solutions across diverse problem domains, underscoring its potential as a robust tool for combinatorial optimization.
Updated: 2025-08-14 14:41:46
Categories: cs.AI,cond-mat.dis-nn,cs.NE,math.OC,90-02, 90B40, 90C27,G.1.6; G.2.1; I.2.8
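The random-key encoding above is easy to make concrete: any vector in [0,1)^n decodes to a feasible solution, here a permutation obtained by sorting the keys, so metaheuristics can perturb keys freely without ever producing an infeasible candidate. A minimal sketch (the toy cost function is our illustrative choice, not one of the paper's three benchmark problems):

```python
import random

def decode_permutation(keys):
    """Standard random-key decoder: sorting the keys induces a permutation."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

def local_search(cost, n, iters=2000, seed=0):
    """Toy stochastic local search over random keys; the RKO framework wraps
    full metaheuristics (SA, ILS, GRASP, ...) around this same loop, sharing
    elite solutions across them."""
    rng = random.Random(seed)
    keys = [rng.random() for _ in range(n)]
    best_keys = keys[:]
    best_cost = cost(decode_permutation(best_keys))
    for _ in range(iters):
        cand = best_keys[:]
        cand[rng.randrange(n)] = rng.random()   # perturb a single key
        c = cost(decode_permutation(cand))
        if c < best_cost:
            best_keys, best_cost = cand, c
    return decode_permutation(best_keys), best_cost
```

The appeal of the encoding is exactly this separation of concerns: the search operates on a continuous cube, while all problem-specific feasibility logic lives in the decoder.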
GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
We present GLM-4.1V-Thinking and GLM-4.5V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. Code, models and more information are released at https://github.com/zai-org/GLM-V.
Updated: 2025-08-14 14:37:03
Categories: cs.CV,cs.AI,cs.LG
Learning from Natural Language Feedback for Personalized Question Answering
Personalization is crucial for enhancing both the effectiveness and user satisfaction of language technologies, particularly in information-seeking tasks like question answering. Current approaches for personalizing large language models (LLMs) often rely on retrieval-augmented generation (RAG), followed by reinforcement learning with scalar reward signals to teach models how to use retrieved personal context. We believe that these scalar rewards sometimes provide weak, non-instructive feedback, limiting learning efficiency and personalization quality. We introduce VAC, a novel framework for personalized response generation that replaces scalar rewards with natural language feedback (NLF) that is generated conditioned on the user profiles and the question narratives. NLF serves as a rich and actionable supervision signal, allowing the policy model to iteratively refine its outputs and internalize effective personalization strategies. Training alternates between optimizing the feedback model and fine-tuning the policy model on the improved responses, resulting in a policy model that no longer requires feedback at inference. Evaluation on the LaMP-QA benchmark that consists of three diverse domains demonstrates consistent and significant improvements over the state-of-the-art results. Human evaluations further confirm the superior quality of the generated responses. These results demonstrate that NLF provides more effective signals for optimizing personalized question answering.
Updated: 2025-08-14 14:36:53
Categories: cs.CL,cs.AI,cs.IR
Continuous Bangla Sign Language Translation: Mitigating the Expense of Gloss Annotation with the Assistance of Graph
Millions of individuals worldwide are affected by deafness and hearing impairment. Sign language serves as a sophisticated means of communication for the deaf and hard of hearing. However, in societies that prioritize spoken languages, sign language often faces underestimation, leading to communication barriers and social exclusion. The Continuous Bangla Sign Language Translation project aims to address this gap by enhancing translation methods. While recent approaches leverage transformer architecture for state-of-the-art results, our method integrates graph-based methods with the transformer architecture. This fusion, combining transformer and STGCN-LSTM architectures, proves more effective in gloss-free translation. Our contributions include architectural fusion, exploring various fusion strategies, and achieving a new state-of-the-art performance on diverse sign language datasets, namely RWTH-PHOENIX-2014T, CSL-Daily, How2Sign, and BornilDB v1.0. Our approach demonstrates superior performance compared to current translation outcomes across all datasets, showcasing notable improvements of BLEU-4 scores of 4.01, 2.07, and 0.5, surpassing those of GASLT, GASLT and slt_how2sign in RWTH-PHOENIX-2014T, CSL-Daily, and How2Sign, respectively. Also, we introduce benchmarking on the BornilDB v1.0 dataset for the first time. Our method sets a benchmark for future research, emphasizing the importance of gloss-free translation to improve communication accessibility for the deaf and hard of hearing.
Updated: 2025-08-14 14:32:31
Categories: cs.CL,cs.AI
From Actions to Words: Towards Abstractive-Textual Policy Summarization in RL
Policies generated by Reinforcement Learning (RL) algorithms are difficult to explain to users, as they emerge from the interaction of complex reward structures and neural network representations. Consequently, analyzing and predicting agent behavior can be challenging, undermining user trust in real-world applications. To facilitate user understanding, current methods for global policy summarization typically rely on videos that demonstrate agent behavior in a subset of world states. However, users can only watch a limited number of demonstrations, constraining their understanding. Moreover, these methods place the burden of interpretation on users by presenting raw behaviors rather than synthesizing them into coherent patterns. To resolve these issues, we introduce SySLLM (Synthesized Summary using Large Language Models), advocating for a new paradigm of abstractive-textual policy explanations. By leveraging Large Language Models (LLMs)-which possess extensive world knowledge and pattern synthesis capabilities-SySLLM generates textual summaries that provide structured and comprehensible explanations of agent policies. SySLLM demonstrates that LLMs can interpret spatio-temporally structured descriptions of state-action trajectories from an RL agent and generate valuable policy insights in a zero-shot setting, without any prior knowledge or fine-tuning. Our evaluation shows that SySLLM captures key insights, such as goal preferences and exploration strategies, that were also identified by human experts. Furthermore, in a large-scale user study (with 200 participants), SySLLM summaries were preferred over demonstration-based summaries (HIGHLIGHTS) by a clear majority (75.5%) of participants.
Updated: 2025-08-14 14:31:22
Categories: cs.LG
MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function $\pi\propto\mathrm{e}^{-U}$ is known up to a normalizing constant, which is an important task in fields such as statistical physics, machine learning, combinatorial optimization, etc. To better address this challenging task when the state space has a large cardinality and the distribution is multi-modal, we propose $\textbf{M}$asked $\textbf{D}$iffusion $\textbf{N}$eural $\textbf{S}$ampler ($\textbf{MDNS}$), a novel framework for training discrete neural samplers by aligning two path measures through a family of learning objectives, theoretically grounded in the stochastic optimal control of the continuous-time Markov chains. We validate the efficiency and scalability of MDNS through extensive experiments on various distributions with distinct statistical properties, where MDNS learns to accurately sample from the target distributions despite the extremely high problem dimensions and outperforms other learning-based baselines by a large margin. A comprehensive study of ablations and extensions is also provided to demonstrate the efficacy and potential of the proposed framework.
Updated: 2025-08-14 14:27:16
Categories: cs.LG,stat.ML
Advancing Autonomous Incident Response: Leveraging LLMs and Cyber Threat Intelligence
Effective incident response (IR) is critical for mitigating cyber threats, yet security teams are overwhelmed by alert fatigue, high false-positive rates, and the vast volume of unstructured Cyber Threat Intelligence (CTI) documents. While CTI holds immense potential for enriching security operations, its extensive and fragmented nature makes manual analysis time-consuming and resource-intensive. To bridge this gap, we introduce a novel Retrieval-Augmented Generation (RAG)-based framework that leverages Large Language Models (LLMs) to automate and enhance IR by integrating dynamically retrieved CTI. Our approach introduces a hybrid retrieval mechanism that combines NLP-based similarity searches within a CTI vector database with standardized queries to external CTI platforms, facilitating context-aware enrichment of security alerts. The augmented intelligence is then leveraged by an LLM-powered response generation module, which formulates precise, actionable, and contextually relevant incident mitigation strategies. We propose a dual evaluation paradigm, wherein automated assessment using an auxiliary LLM is systematically cross-validated by cybersecurity experts. Empirical validation on real-world and simulated alerts demonstrates that our approach enhances the accuracy, contextualization, and efficiency of IR, alleviating analyst workload and reducing response latency. This work underscores the potential of LLM-driven CTI fusion in advancing autonomous security operations and establishing a foundation for intelligent, adaptive cybersecurity frameworks.
Updated: 2025-08-14 14:20:34
Categories: cs.CR,cs.LG
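The hybrid retrieval step above merges two channels before generation: NLP similarity search over a CTI vector store and exact-match lookups against external CTI sources. A toy sketch in which a bag-of-words "embedding" and an in-memory IOC table stand in for the paper's vector database and external CTI platforms:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; the real system would use a learned embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_retrieve(alert, cti_docs, ioc_index, k=2):
    """Merge similarity search over CTI documents with exact IOC lookups.
    Exact indicator matches are returned first, then the top-k similar docs."""
    q = embed(alert)
    ranked = sorted(cti_docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    exact = [ioc_index[tok] for tok in alert.split() if tok in ioc_index]
    return exact + ranked[:k]
```

The merged context would then be handed to the LLM-powered response module; keeping exact matches ahead of fuzzy ones reflects the precision-first role the abstract assigns to the exact-matching tools.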
Hybrid Generative Fusion for Efficient and Privacy-Preserving Face Recognition Dataset Generation
In this paper, we present our approach to the DataCV ICCV Challenge, which centers on building a high-quality face dataset to train a face recognition model. The constructed dataset must not contain identities overlapping with any existing public face datasets. To handle this challenge, we begin with a thorough cleaning of the baseline HSFace dataset, identifying and removing mislabeled or inconsistent identities through a Mixture-of-Experts (MoE) strategy combining face embedding clustering and GPT-4o-assisted verification. We retain the largest consistent identity cluster and apply data augmentation up to a fixed number of images per identity. To further diversify the dataset, we generate synthetic identities using Stable Diffusion with prompt engineering. As diffusion models are computationally intensive, we generate only one reference image per identity and efficiently expand it using Vec2Face, which rapidly produces 49 identity-consistent variants. This hybrid approach fuses GAN-based and diffusion-based samples, enabling efficient construction of a diverse and high-quality dataset. To address the high visual similarity among synthetic identities, we adopt a curriculum learning strategy by placing them early in the training schedule, allowing the model to progress from easier to harder samples. Our final dataset contains 50 images per identity, and all newly generated identities are checked with mainstream face datasets to ensure no identity leakage. Our method achieves \textbf{1st place} in the competition, and experimental results show that our dataset improves model performance across 10K, 20K, and 100K identity scales. Code is available at https://github.com/Ferry-Li/datacv_fr.
Updated: 2025-08-14 14:14:18
Categories: cs.CV,cs.AI
Knowledge-based Consistency Testing of Large Language Models
In this work, we systematically expose and measure the inconsistency and knowledge gaps of Large Language Models (LLMs). Specifically, we propose an automated testing framework (called KonTest) which leverages a knowledge graph to construct test cases. KonTest probes and measures the inconsistencies in the LLM's knowledge of the world via a combination of semantically-equivalent queries and test oracles (metamorphic or ontological oracle). KonTest further mitigates knowledge gaps via a weighted LLM model ensemble. Using four state-of-the-art LLMs (Falcon, Gemini, GPT3.5, and Llama2), we show that KonTest generates 19.2% error inducing inputs (1917 errors from 9979 test inputs). It also reveals a 16.5% knowledge gap across all tested LLMs. A mitigation method informed by KonTest's test suite reduces LLM knowledge gap by 32.48%. Our ablation study further shows that GPT3.5 is not suitable for knowledge-based consistency testing because it is only 60%-68% effective in knowledge construction.
Updated: 2025-08-14 14:12:12
Categories: cs.CL,cs.AI,cs.LG
STEP: Stepwise Curriculum Learning for Context-Knowledge Fusion in Conversational Recommendation
Conversational recommender systems (CRSs) aim to proactively capture user preferences through natural language dialogue and recommend high-quality items. To achieve this, a CRS gathers user preferences via a dialog module and builds user profiles through a recommendation module to generate appropriate recommendations. However, existing CRSs face challenges in capturing the deep semantics of user preferences and dialogue context. In particular, the efficient integration of external knowledge graph (KG) information into dialogue generation and recommendation remains a pressing issue. Traditional approaches typically combine KG information directly with dialogue content, which often struggles with complex semantic relationships, resulting in recommendations that may not align with user expectations. To address these challenges, we introduce STEP, a conversational recommender centered on pre-trained language models that combines curriculum-guided context-knowledge fusion with lightweight task-specific prompt tuning. At its heart, an F-Former progressively aligns the dialogue context with knowledge-graph entities through a three-stage curriculum, thus resolving fine-grained semantic mismatches. The fused representation is then injected into the frozen language model via two minimal yet adaptive prefix prompts: a conversation prefix that steers response generation toward user intent and a recommendation prefix that biases item ranking toward knowledge-consistent candidates. This dual-prompt scheme allows the model to share cross-task semantics while respecting the distinct objectives of dialogue and recommendation. Experimental results show that STEP outperforms mainstream methods in recommendation precision and dialogue quality on two public datasets.
Updated: 2025-08-14 14:08:21
Categories: cs.AI,cs.IR,H.3.3; I.2.7; H.2.8
AddressVLM: Cross-view Alignment Tuning for Image Address Localization using Large Vision-Language Models
Large visual language models (LVLMs) have demonstrated impressive performance in coarse-grained geo-localization at the country or city level, but they struggle with fine-grained street-level localization within urban areas. In this paper, we explore integrating city-wide address localization capabilities into LVLMs, facilitating flexible address-related question answering using street-view images. A key challenge is that the street-view visual question-and-answer (VQA) data provides only microscopic visual cues, leading to subpar performance in fine-tuned models. To tackle this issue, we incorporate perspective-invariant satellite images as macro cues and propose cross-view alignment tuning including a satellite-view and street-view image grafting mechanism, along with an automatic label generation mechanism. Then LVLM's global understanding of street distribution is enhanced through cross-view matching. Our proposed model, named AddressVLM, consists of two-stage training protocols: cross-view alignment tuning and address localization tuning. Furthermore, we have constructed two street-view VQA datasets based on image address localization datasets from Pittsburgh and San Francisco. Qualitative and quantitative evaluations demonstrate that AddressVLM outperforms counterpart LVLMs by over 9% and 12% in average address localization accuracy on these two datasets, respectively.
Updated: 2025-08-14 14:06:28
Categories: cs.CV,cs.AI
Deep Learning in Classical and Quantum Physics
Scientific progress is tightly coupled to the emergence of new research tools. Today, machine learning (ML), and especially deep learning (DL), has become a transformative instrument for quantum science and technology. Owing to the intrinsic complexity of quantum systems, DL enables efficient exploration of large parameter spaces, extraction of patterns from experimental data, and data-driven guidance for research directions. These capabilities already support tasks such as refining quantum control protocols and accelerating the discovery of materials with targeted quantum properties, making ML/DL literacy an essential skill for the next generation of quantum scientists. At the same time, DL's power brings risks: models can overfit noisy data, obscure causal structure, and yield results with limited physical interpretability. Recognizing these limitations and deploying mitigation strategies is crucial for scientific rigor. These lecture notes provide a comprehensive, graduate-level introduction to DL for quantum applications, combining conceptual exposition with hands-on examples. Organized as a progressive sequence, they aim to equip readers to decide when and how to apply DL effectively, to understand its practical constraints, and to adapt AI methods responsibly to problems across quantum physics, chemistry, and engineering.
Updated: 2025-08-14 14:05:12
Categories: quant-ph,cs.AI,cs.NE,physics.comp-ph
Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method
Facial expression recognition (FER) models are employed in many video-based affective computing applications, such as human-computer interaction and healthcare monitoring. However, deep FER models often struggle with subtle expressions and high inter-subject variability, limiting their performance in real-world applications. To improve their performance, source-free domain adaptation (SFDA) methods have been proposed to personalize a pretrained source model using only unlabeled target domain data, thereby avoiding data privacy, storage, and transmission constraints. This paper addresses a challenging scenario where source data is unavailable for adaptation, and only unlabeled target data consisting solely of neutral expressions is available. SFDA methods are not typically designed to adapt using target data from only a single class. Further, using models to generate facial images with non-neutral expressions can be unstable and computationally intensive. In this paper, personalized feature translation (PFT) is proposed for SFDA. Unlike current image translation methods for SFDA, our lightweight method operates in the latent space. We first pre-train the translator on the source domain data to transform the subject-specific style features from one source subject into another. Expression information is preserved by optimizing a combination of expression consistency and style-aware objectives. Then, the translator is adapted on neutral target data, without using source data or image synthesis. By translating in the latent space, PFT avoids the complexity and noise of face expression generation, producing discriminative embeddings optimized for classification. Using PFT eliminates the need for image synthesis, reduces computational overhead (using a lightweight translator), and only adapts part of the model, making the method efficient compared to image-based translation.
Updated: 2025-08-14 14:05:10
Categories: cs.CV,cs.AI
Vision Transformers in Precision Agriculture: A Comprehensive Survey
Detecting plant diseases is a crucial aspect of modern agriculture, as it plays a key role in maintaining crop health and increasing overall yield. Traditional approaches, though still valuable, often rely on manual inspection or conventional machine learning techniques, both of which face limitations in scalability and accuracy. Recently, Vision Transformers (ViTs) have emerged as a promising alternative, offering advantages such as improved handling of long-range dependencies and better scalability for visual tasks. This review explores the application of ViTs in precision agriculture, covering a range of tasks. We begin by introducing the foundational architecture of ViTs and discussing their transition from Natural Language Processing (NLP) to Computer Vision. The discussion includes the concept of inductive bias in traditional models like Convolutional Neural Networks (CNNs), and how ViTs mitigate these biases. We provide a comprehensive review of recent literature, focusing on key methodologies, datasets, and performance metrics. This study also includes a comparative analysis of CNNs and ViTs, along with a review of hybrid models and performance enhancements. Technical challenges such as data requirements, computational demands, and model interpretability are addressed, along with potential solutions. Finally, we outline future research directions and technological advancements that could further support the integration of ViTs in real-world agricultural settings. Our goal with this study is to offer practitioners and researchers a deeper understanding of how ViTs are poised to transform smart and precision agriculture.
Updated: 2025-08-14 13:55:53
Categories: cs.CV,cs.AI
Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking
Unifying multiple multi-modal visual object tracking (MMVOT) tasks draws increasing attention due to the complementary nature of different modalities in building robust tracking systems. Existing practices mix all data sensor types in a single training procedure, structuring a parallel paradigm from the data-centric perspective and aiming for a global optimum on the joint distribution of the involved tasks. However, the absence of a unified benchmark where all types of data coexist forces evaluations on separated benchmarks, causing inconsistency between training and testing, thus leading to performance degradation. To address these issues, this work advances in two aspects: (1) A unified benchmark, coined as UniBench300, is introduced to bridge the inconsistency by incorporating multiple task data, reducing inference passes from three to one and cutting time consumption by 27%. (2) The unification process is reformulated in a serial format, progressively integrating new tasks. In this way, the performance degradation can be specified as knowledge forgetting of previous tasks, which naturally aligns with the philosophy of continual learning (CL), motivating further exploration of injecting CL into the unification process. Extensive experiments conducted on two baselines and four benchmarks demonstrate the significance of UniBench300 and the superiority of CL in supporting a stable unification process. Moreover, dedicated analyses reveal that the performance degradation is negatively correlated with network capacity. Additionally, modality discrepancies contribute to varying degradation levels across tasks (RGBT > RGBD > RGBE in MMVOT), offering valuable insights for future multi-modal vision research. Source code and the proposed benchmark are available at https://github.com/Zhangyong-Tang/UniBench300.
Updated: 2025-08-14 13:54:04
Categories: cs.CV,cs.AI
A Novel Study on Intelligent Methods and Explainable AI for Dynamic Malware Analysis
Deep learning models, trained on extensive datasets, are a key security strategy and play a critical role in detecting and responding to malware threats by recognizing complex patterns in malicious code. However, the opaque nature of these models, often described as "black boxes", makes their decision-making processes difficult to understand, even for their creators. This research addresses these challenges by integrating Explainable AI (XAI) techniques to enhance the interpretability and trustworthiness of malware detection models. We consider the use of Multi-Layer Perceptrons (MLPs) for dynamic malware analysis, a less explored area, and their efficacy in detecting metamorphic malware, and we further evaluate the effectiveness and transparency of MLP, CNN, RNN, and CNN-LSTM models in malware classification through the lens of Explainable AI (XAI). This comprehensive approach aims to demystify the internal workings of deep learning models, promoting a better understanding of, and trust in, their predictive capabilities in cybersecurity contexts. To the best of our knowledge, such an in-depth analysis and implementation has not been done before.
Updated: 2025-08-14 13:49:29
Categories: cs.CR,cs.IT,math.IT
Graph Learning via Logic-Based Weisfeiler-Leman Variants and Tabularization
We present a novel approach for graph classification based on tabularizing graph data via variants of the Weisfeiler-Leman algorithm and then applying methods for tabular data. We investigate a comprehensive class of Weisfeiler-Leman variants obtained by modifying the underlying logical framework and establish a precise theoretical characterization of their expressive power. We then test two selected variants on twelve benchmark datasets that span a range of different domains. The experiments demonstrate that our approach matches the accuracy of state-of-the-art graph neural networks and graph kernels while being more time or memory efficient, depending on the dataset. We also briefly discuss directly extracting interpretable modal logic formulas from graph datasets.
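The tabularization idea is concrete enough to sketch: run Weisfeiler-Leman colour refinement on each graph and use the resulting colour histogram as a fixed-length feature row for a tabular learner. The sketch below implements plain 1-WL, not one of the paper's modified logic-based variants.

```python
# Generic 1-WL tabularization sketch (not the paper's logical variants):
# refine vertex colours, then histogram them into a fixed-length row.
from collections import Counter

def wl_colors(adj, rounds=2):
    """1-dimensional Weisfeiler-Leman colour refinement on an adjacency list."""
    colors = {v: 0 for v in adj}               # uniform initial colouring
    for _ in range(rounds):
        # New colour = (own colour, sorted multiset of neighbour colours),
        # compressed into consecutive integers.
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                      for v in adj}
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

def tabularize(adj, rounds=2, n_features=8):
    """Fixed-length colour histogram usable as one row of tabular data."""
    hist = Counter(wl_colors(adj, rounds).values())
    return [hist.get(c, 0) for c in range(n_features)]

# Triangle vs. 3-vertex path: the histograms separate them after one round.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
```

The rows `tabularize(triangle)` and `tabularize(path)` can then be fed to any tabular method (gradient-boosted trees, etc.), which is the overall pipeline the abstract describes.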
Updated: 2025-08-14 13:47:50
Categories: cs.LG,I.2.6; F.4.1; I.2.4; E.2
Unifying Self-Supervised Clustering and Energy-Based Models
Self-supervised learning excels at learning representations from large amounts of data. At the same time, generative models offer the complementary property of learning information about the underlying data generation process. In this study, we aim at establishing a principled connection between these two paradigms and highlight the benefits of their complementarity. In particular, we perform an analysis of self-supervised learning objectives, elucidating the underlying probabilistic graphical models and presenting a standardized methodology for their derivation from first principles. The analysis suggests a natural means of integrating self-supervised learning with likelihood-based generative models. We instantiate this concept within the realm of cluster-based self-supervised learning and energy models, introducing a lower bound proven to reliably penalize the most important failure modes and unlocking full unification. Our theoretical findings are substantiated through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100, demonstrating that our objective function allows to jointly train a backbone network in a discriminative and generative fashion, consequently outperforming existing self-supervised learning strategies in terms of clustering, generation and out-of-distribution detection performance by a wide margin. We also demonstrate that the solution can be integrated into a neuro-symbolic framework to tackle a simple yet non-trivial instantiation of the symbol grounding problem. The code is publicly available at https://github.com/emsansone/GEDI.
Updated: 2025-08-14 13:45:40
Categories: cs.LG,cs.CV
Geospatial Diffusion for Land Cover Imperviousness Change Forecasting
Land cover, both present and future, has a significant effect on several important Earth system processes. For example, impervious surfaces heat up and speed up surface water runoff and reduce groundwater infiltration, with concomitant effects on regional hydrology and flood risk. While regional Earth System models have increasing skill at forecasting hydrologic and atmospheric processes at high resolution in future climate scenarios, our ability to forecast land-use and land-cover change (LULC), a critical input to risk and consequences assessment for these scenarios, has lagged behind. In this paper, we propose a new paradigm exploiting Generative AI (GenAI) for land cover change forecasting by framing LULC forecasting as a data synthesis problem conditioned on historical and auxiliary data-sources. We discuss desirable properties of generative models that fundament our research premise, and demonstrate the feasibility of our methodology through experiments on imperviousness forecasting using historical data covering the entire conterminous United States. Specifically, we train a diffusion model for decadal forecasting of imperviousness and compare its performance to a baseline that assumes no change at all. Evaluation across 12 metropolitan areas for a year held-out during training indicate that for average resolutions $\geq 0.7\times0.7km^2$ our model yields MAE lower than such a baseline. This finding corroborates that such a generative model can capture spatiotemporal patterns from historical data that are significant for projecting future change. Finally, we discuss future research to incorporate auxiliary information on physical properties about the Earth, as well as supporting simulation of different scenarios by means of driver variables.
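The "no change at all" baseline the paper compares against is simple to make precise: forecast the past raster unchanged (persistence) and score both it and the model forecast by MAE against the future raster. The arrays below are toy imperviousness fractions, not the paper's data.

```python
# Persistence-baseline MAE comparison on toy imperviousness rasters.
import numpy as np

def mae(pred, target):
    """Mean absolute error between two rasters."""
    return float(np.mean(np.abs(pred - target)))

past = np.array([[0.10, 0.20],      # imperviousness fraction per cell, decade t
                 [0.00, 0.50]])
future = np.array([[0.15, 0.30],    # ground truth at decade t+1
                   [0.05, 0.50]])

persistence_mae = mae(past, future)            # "no change" baseline
model_pred = np.array([[0.14, 0.28],           # hypothetical model forecast
                       [0.04, 0.50]])
model_mae = mae(model_pred, future)            # beats persistence here
```

A learned forecaster is only useful where its MAE drops below the persistence value, which is exactly the comparison reported for resolutions of 0.7 x 0.7 km^2 and coarser.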
Updated: 2025-08-14 13:45:10
Categories: cs.LG,cs.CV
SPHENIC: Topology-Informed Multi-View Clustering for Spatial Transcriptomics
By incorporating spatial location information, spatial-transcriptomics clustering yields more comprehensive insights into cell subpopulation identification. Despite recent progress, existing methods have at least two limitations: (i) topological learning typically considers only representations of individual cells or their interaction graphs; however, spatial transcriptomic profiles are often noisy, making these approaches vulnerable to low-quality topological signals, and (ii) insufficient modeling of spatial neighborhood information leads to low-quality spatial embeddings. To address these limitations, we propose SPHENIC, a novel Spatial Persistent Homology Enhanced Neighborhood Integrative Clustering method. Specifically, SPHENIC incorporates invariant topological features into the clustering network to achieve stable representation learning. Additionally, to construct high-quality spatial embeddings that reflect the true cellular distribution, we design the Spatial Constraint and Distribution Optimization Module (SCDOM). This module increases the similarity between a cell's embedding and those of its spatial neighbors, decreases similarity with non-neighboring cells, and thereby produces clustering-friendly spatial embeddings. Extensive experiments on 14 benchmark spatial transcriptomic slices demonstrate that SPHENIC achieves superior performance on the spatial clustering task, outperforming the best existing state-of-the-art alternative by 3.31%-6.54%.
Updated: 2025-08-14 13:43:28
Categories: cs.LG,cs.AI
Conditional Information Bottleneck for Multimodal Fusion: Overcoming Shortcut Learning in Sarcasm Detection
Multimodal sarcasm detection is a complex task that requires distinguishing subtle complementary signals across modalities while filtering out irrelevant information. Many advanced methods rely on learning shortcuts from datasets rather than extracting intended sarcasm-related features. However, our experiments show that shortcut learning impairs the model's generalization in real-world scenarios. Furthermore, we reveal the weaknesses of current modality fusion strategies for multimodal sarcasm detection through systematic experiments, highlighting the necessity of focusing on effective modality fusion for complex emotion recognition. To address these challenges, we construct MUStARD++$^{R}$ by removing shortcut signals from MUStARD++. Then, a Multimodal Conditional Information Bottleneck (MCIB) model is introduced to enable efficient multimodal fusion for sarcasm detection. Experimental results show that the MCIB achieves the best performance without relying on shortcut learning.
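For orientation, the generic shape of an information bottleneck objective and its conditional variant is shown below; the paper's multimodal MCIB formulation will almost certainly differ in its exact terms and estimators, so this is only a sketch of the family of objectives involved.

```latex
% Generic (conditional) information bottleneck objectives; a sketch of the
% family of objectives, not the paper's exact MCIB formulation.
\mathcal{L}_{\mathrm{IB}} \;=\; I(X; Z) \;-\; \beta\, I(Z; Y),
\qquad
\mathcal{L}_{\mathrm{CIB}} \;=\; I(X; Z \mid C) \;-\; \beta\, I(Z; Y \mid C)
```

Minimizing the first term compresses the representation $Z$ (discarding shortcut signals among them), while the second term preserves task-relevant information about the label $Y$, conditioned here on some context variable $C$.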
Updated: 2025-08-14 13:39:03
Categories: cs.LG
WeChat-YATT: A Simple, Scalable and Balanced RLHF Trainer
Reinforcement Learning from Human Feedback (RLHF) has emerged as a prominent paradigm for training large language models and multimodal systems. Despite notable advances enabled by existing RLHF training frameworks, significant challenges remain in scaling to complex multimodal workflows and adapting to dynamic workloads. In particular, current systems often encounter limitations related to controller scalability when managing large models, as well as inefficiencies in orchestrating intricate RLHF pipelines, especially in scenarios that require dynamic sampling and resource allocation. In this paper, we introduce WeChat-YATT (Yet Another Transformer Trainer in WeChat), a simple, scalable, and balanced RLHF training framework specifically designed to address these challenges. WeChat-YATT features a parallel controller programming model that enables flexible and efficient orchestration of complex RLHF workflows, effectively mitigating the bottlenecks associated with centralized controller architectures and facilitating scalability in large-scale data scenarios. In addition, we propose a dynamic placement schema that adaptively partitions computational resources and schedules workloads, thereby significantly reducing hardware idle time and improving GPU utilization under variable training conditions. We evaluate WeChat-YATT across a range of experimental scenarios, demonstrating that it achieves substantial improvements in throughput compared to state-of-the-art RLHF training frameworks. Furthermore, WeChat-YATT has been successfully deployed to train models supporting WeChat product features for a large-scale user base, underscoring its effectiveness and robustness in real-world applications. We have open-sourced WeChat-YATT at https://www.github.com/tencent/WeChat-YATT.
Updated: 2025-08-14 13:38:55
Categories: cs.LG,cs.AI
MirGuard: Towards a Robust Provenance-based Intrusion Detection System Against Graph Manipulation Attacks
Learning-based Provenance-based Intrusion Detection Systems (PIDSes) have become essential tools for anomaly detection in host systems due to their ability to capture rich contextual and structural information, as well as their potential to detect unknown attacks. However, recent studies have shown that these systems are vulnerable to graph manipulation attacks, where attackers manipulate the graph structure to evade detection. While some previous approaches have discussed this type of attack, none have fully addressed it with a robust detection solution, limiting the practical applicability of PIDSes. To address this challenge, we propose MirGuard, a robust anomaly detection framework that combines logic-aware multi-view augmentation with contrastive representation learning. Rather than applying arbitrary structural perturbations, MirGuard introduces Logic-Aware Noise Injection (LNI) to generate semantically valid graph views, ensuring that all augmentations preserve the underlying causal semantics of the provenance data. These views are then used in a Logic-Preserving Contrastive Learning framework, which encourages the model to learn representations that are invariant to benign transformations but sensitive to adversarial inconsistencies. Comprehensive evaluations on multiple provenance datasets demonstrate that MirGuard significantly outperforms state-of-the-art detectors in robustness against various graph manipulation attacks without sacrificing detection performance and efficiency. Our work represents the first targeted study to enhance PIDS against such adversarial threats, providing a robust and effective solution to modern cybersecurity challenges.
Updated: 2025-08-14 13:35:51
Categories: cs.CR
A Transformer-Based Approach for DDoS Attack Detection in IoT Networks
DDoS attacks have become a major threat to the security of IoT devices and can cause severe damage to the network infrastructure. IoT devices suffer from the inherent problem of resource constraints and are therefore susceptible to such resource-exhausting attacks. Traditional methods for detecting DDoS attacks are not efficient enough to cope with the dynamic nature of IoT networks, the scale of the attacks, the high volume of traffic, variability in device behavior, and the diversity of protocols such as MQTT and CoAP, which makes it hard to implement security consistently across all of them. In this paper, we propose a novel approach: the use of Transformer models, which have shown remarkable performance in natural language processing tasks, for detecting DDoS attacks on IoT devices. The proposed model extracts features from network traffic data and processes them using a self-attention mechanism. Experiments conducted on a real-world dataset demonstrate that the proposed approach outperforms traditional machine learning techniques, which can be validated by comparing both approaches' accuracy, precision, recall, and F1-score. The results of this study show that Transformer models can be an effective solution for detecting DDoS attacks on IoT devices and have the potential to be deployed in real-world IoT environments.
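The self-attention step the abstract refers to can be sketched in NumPy as a single head attending over a flow's per-packet feature vectors. The feature layout, sizes, and mean-pooled score are illustrative assumptions, not the paper's architecture.

```python
# Single-head scaled dot-product self-attention over a toy traffic "flow";
# shapes and pooling are illustrative, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X, Wq, Wk, Wv):
    """Each packet representation attends to every other packet in the flow."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # scaled dot products
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ V

d = 4                                    # per-packet features (size, IAT, ...)
X = rng.normal(size=(5, d))              # one flow = 5 packets
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H = self_attention(X, Wq, Wk, Wv)        # contextualized packet representations
score = float(H.mean(axis=0) @ rng.normal(size=d))   # pooled attack logit
```

In a real detector the projection matrices are learned and the pooled representation feeds a classification head trained on labeled benign/attack flows.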
Updated: 2025-08-14 13:33:49
Categories: cs.CR,cs.IT,math.IT
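The abstract says the model "processes them using a self-attention mechanism" without further detail. As a minimal sketch of that mechanism alone (a single head with no learned projections, not the proposed detection architecture):

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention over a sequence
    of traffic-feature vectors x of shape (seq_len, d)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # pairwise similarities
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x                            # context-mixed features
```

Each output row is a weighted mixture of all input rows, which is how attention lets every packet/flow feature attend to every other one.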
Energy-Based Models for Predicting Mutational Effects on Proteins
Predicting changes in binding free energy ($\Delta\Delta G$) is a vital task in protein engineering and protein-protein interaction (PPI) engineering for drug discovery. Previous works have observed a high correlation between $\Delta\Delta G$ and entropy, using probabilities of biologically important objects such as side chain angles and residue identities to estimate $\Delta\Delta G$. However, estimating the full conformational distribution of a protein complex is generally considered intractable. In this work, we propose a new approach to $\Delta\Delta G$ prediction that avoids this issue by instead leveraging energy-based models for estimating the probability of a complex's conformation. Specifically, we introduce a novel decomposition of $\Delta\Delta G$ into a sequence-based component estimated by an inverse folding model and a structure-based component estimated by an energy model. This decomposition is made tractable by assuming equilibrium between the bound and unbound states, allowing us to simplify the estimation of degeneracies associated with each state. Unlike previous deep learning-based methods, our method incorporates an energy-based physical inductive bias by connecting the often-used sequence log-odds ratio-based approach to $\Delta\Delta G$ prediction with a new $\Delta\Delta E$ term grounded in statistical mechanics. We demonstrate superiority over existing state-of-the-art structure and sequence-based deep learning methods in $\Delta\Delta G$ prediction and antibody optimization against SARS-CoV-2.
Updated: 2025-08-14 13:30:19
Categories: cs.LG
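The decomposition described in the abstract can be written schematically as follows (this is a paraphrase of the stated idea, not the paper's exact equation; the symbols and constants here are illustrative):

```latex
\Delta\Delta G \;\approx\;
\underbrace{-k_B T \,\Delta\Delta \log p_\theta(\mathrm{seq}\mid\mathrm{struct})}_{\text{sequence term (inverse folding model)}}
\;+\;
\underbrace{\Delta\Delta E}_{\text{structure term (energy model)}}
```

The first term recovers the familiar sequence log-odds-ratio approach; the second is the statistical-mechanics-grounded $\Delta\Delta E$ correction the abstract mentions.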
Beyond Random Sampling: Instance Quality-Based Data Partitioning via Item Response Theory
Robust validation of Machine Learning (ML) models is essential, but traditional data partitioning approaches often ignore the intrinsic quality of each instance. This study proposes the use of Item Response Theory (IRT) parameters to characterize and guide the partitioning of datasets in the model validation stage. The impact of IRT-informed partitioning strategies on the performance of several ML models in four tabular datasets was evaluated. The results obtained demonstrate that IRT reveals an inherent heterogeneity of the instances and highlights the existence of informative subgroups of instances within the same dataset. Based on IRT, balanced partitions were created that consistently help to better understand the tradeoff between bias and variance of the models. In addition, the guessing parameter proved to be a determining factor: training with high-guessing instances can significantly impair model performance, resulting in cases with accuracy below 50%, while other partitions reached more than 70% on the same dataset.
Updated: 2025-08-14 13:29:40
Categories: cs.LG,I.2.6
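For readers unfamiliar with the guessing parameter the abstract highlights, here is the standard three-parameter logistic (3PL) item response function together with a naive partition of instances by guessing level (the threshold and the data layout are illustrative choices of ours, not the study's actual procedure):

```python
import numpy as np

def p_correct_3pl(theta, a, b, c):
    """3PL item response function: probability of a correct response
    given ability theta, discrimination a, difficulty b, and
    guessing parameter c (the floor reached by low-ability examinees)."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def split_by_guessing(items, threshold=0.25):
    """Partition instance indices into low- and high-guessing groups."""
    low = [i for i, it in enumerate(items) if it["c"] <= threshold]
    high = [i for i, it in enumerate(items) if it["c"] > threshold]
    return low, high
```

In the IRT-for-data view, each dataset instance plays the role of an "item"; the paper's finding is that training on the high-`c` group can hurt performance.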
Mathematical Computation and Reasoning Errors by Large Language Models
Large Language Models (LLMs) are increasingly utilized in AI-driven educational instruction and assessment, particularly within mathematics education. The capability of LLMs to generate accurate answers and detailed solutions for math problem-solving tasks is foundational for ensuring reliable and precise feedback and assessment in math education practices. Our study focuses on evaluating the accuracy of four LLMs (OpenAI GPT-4o and o1, DeepSeek-V3 and DeepSeek-R1) solving three categories of math tasks, including arithmetic, algebra, and number theory, and identifies step-level reasoning errors within their solutions. Instead of relying on standard benchmarks, we intentionally build math tasks (via item models) that are challenging for LLMs and prone to errors. The accuracy of final answers and the presence of errors in individual solution steps were systematically analyzed and coded. Both single-agent and dual-agent configurations were tested. It is observed that the reasoning-enhanced OpenAI o1 model consistently achieved higher or nearly perfect accuracy across all three math task categories. Analysis of errors revealed that procedural slips were the most frequent and significantly impacted overall performance, while conceptual misunderstandings were less frequent. Deploying dual-agent configurations substantially improved overall performance. These findings offer actionable insights into enhancing LLM performance and underscore effective strategies for integrating LLMs into mathematics education, thereby advancing AI-driven instructional practices and assessment precision.
Updated: 2025-08-14 13:25:18
Categories: cs.AI
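The abstract reports that dual-agent configurations substantially improved performance but does not describe the protocol. A toy solver/checker loop of the kind implied (the control flow, function signatures, and round budget are our guesses for illustration, not the study's actual setup) could look like:

```python
def dual_agent_solve(problem, solver, checker, max_rounds=3):
    """Toy dual-agent loop: a checker agent reviews each candidate
    solution and feeds criticism back to the solver until it accepts
    an answer or the round budget is exhausted."""
    answer, feedback = None, None
    for _ in range(max_rounds):
        answer = solver(problem, feedback)      # solver sees prior critique
        ok, feedback = checker(problem, answer)  # checker accepts or critiques
        if ok:
            break
    return answer
```

With `solver` and `checker` backed by two LLM calls, this is one simple way a second agent can catch the procedural slips the error analysis found to dominate.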
INSIGHT: Explainable Weakly-Supervised Medical Image Analysis
Due to their large sizes, volumetric scans and whole-slide pathology images (WSIs) are often processed by extracting embeddings from local regions and then an aggregator makes predictions from this set. However, current methods require post-hoc visualization techniques (e.g., Grad-CAM) and often fail to localize small yet clinically crucial details. To address these limitations, we introduce INSIGHT, a novel weakly-supervised aggregator that integrates heatmap generation as an inductive bias. Starting from pre-trained feature maps, INSIGHT employs a detection module with small convolutional kernels to capture fine details and a context module with a broader receptive field to suppress local false positives. The resulting internal heatmap highlights diagnostically relevant regions. On CT and WSI benchmarks, INSIGHT achieves state-of-the-art classification results and high weakly-labeled semantic segmentation performance. Project website and code are available at: https://zhangdylan83.github.io/ewsmia/
Updated: 2025-08-14 13:14:10
Categories: eess.IV,cs.AI,cs.CV
Fourier-Guided Attention Upsampling for Image Super-Resolution
We propose Frequency-Guided Attention (FGA), a lightweight upsampling module for single image super-resolution. Conventional upsamplers, such as Sub-Pixel Convolution, are efficient but frequently fail to reconstruct high-frequency details and introduce aliasing artifacts. FGA addresses these issues by integrating (1) a Fourier feature-based Multi-Layer Perceptron (MLP) for positional frequency encoding, (2) a cross-resolution Correlation Attention Layer for adaptive spatial alignment, and (3) a frequency-domain L1 loss for spectral fidelity supervision. Adding merely 0.3M parameters, FGA consistently enhances performance across five diverse super-resolution backbones in both lightweight and full-capacity scenarios. Experimental results demonstrate average PSNR gains of 0.12~0.14 dB and improved frequency-domain consistency by up to 29%, particularly evident on texture-rich datasets. Visual and spectral evaluations confirm FGA's effectiveness in reducing aliasing and preserving fine details, establishing it as a practical, scalable alternative to traditional upsampling methods.
Updated: 2025-08-14 13:13:17
Categories: cs.CV,cs.AI
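The first ingredient listed in the abstract is a Fourier-feature positional encoding fed to an MLP. A minimal version of the standard Fourier feature map is shown below (the frequency matrix `B` would normally be sampled or learned; this is the generic construction, not FGA's exact module):

```python
import numpy as np

def fourier_features(coords, b_matrix):
    """Map coordinates to Fourier features [sin(2*pi*Bv), cos(2*pi*Bv)].

    coords: (n, 2) normalized pixel coordinates.
    b_matrix: (2, m) frequency matrix (random or learned)."""
    proj = 2.0 * np.pi * coords @ b_matrix        # (n, m) projections
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)
```

Such encodings let a downstream MLP represent high-frequency positional detail that raw coordinates cannot, which is why they pair naturally with super-resolution upsampling.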
Goal-Oriented Time-Series Forecasting: Foundation Framework Design
Conventional time-series forecasting methods typically aim to minimize overall prediction error, without accounting for the varying importance of different forecast ranges in downstream applications. We propose a training methodology that enables forecasting models to adapt their focus to application-specific regions of interest at inference time, without retraining. The approach partitions the prediction space into fine-grained segments during training, which are dynamically reweighted and aggregated to emphasize the target range specified by the application. Unlike prior methods that predefine these ranges, our framework supports flexible, on-demand adjustments. Experiments on standard benchmarks and a newly collected wireless communication dataset demonstrate that our method not only improves forecast accuracy within regions of interest but also yields measurable gains in downstream task performance. These results highlight the potential for closer integration between predictive modeling and decision-making in real-world systems.
Updated: 2025-08-14 13:00:09
Categories: cs.LG,cs.AI
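The core idea of partitioning the prediction space into segments and reweighting them can be sketched as a loss function (segment edges and weights below are illustrative; the paper's dynamic aggregation is more involved than this static version):

```python
import numpy as np

def weighted_segment_loss(y_pred, y_true, edges, weights):
    """Squared error reweighted by the segment of the prediction space
    each true value falls into.

    edges: the len(weights)-1 interior segment boundaries.
    weights: per-segment emphasis, tunable at inference time."""
    seg = np.digitize(y_true, edges)              # segment index per sample
    return float(np.mean(weights[seg] * (y_pred - y_true) ** 2))
```

Raising a segment's weight focuses accuracy on that application-specific region of interest without retraining the backbone, which is the behavior the abstract describes.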
A Graph-Based Framework for Exploring Mathematical Patterns in Physics: A Proof of Concept
The vast corpus of physics equations forms an implicit network of mathematical relationships that traditional analysis cannot fully explore. This work introduces a graph-based framework combining neural networks with symbolic analysis to systematically discover and validate mathematical patterns across physics domains. Starting from 659 equations, we performed rigorous semantic disambiguation to resolve notational polysemy affecting 213 equations, then focused on 400 advanced physics equations by excluding elementary mechanics to emphasize inter-branch connections of modern physics. This corpus was represented as a weighted knowledge graph where a Graph Attention Network achieved 97.4% AUC in link prediction, significantly outperforming classical baselines. The framework's primary value emerges from its dual capability: generating hypotheses and auditing knowledge. First, it functions as a hypothesis generator, producing hundreds of candidate cross-domain connections, from blackbody radiation coupled with Navier-Stokes equations to radioactive decay linked with electromagnetic induction. Second, through symbolic analysis of 30 equation clusters, it serves as a computational auditor that verified established theory consistencies, synthesized the Magnetic Reynolds Number from electromagnetic-fluid coupling, and revealed how even parsing errors could potentially point toward legitimate research like analog gravity. This proof-of-concept intentionally over-generates candidates to ensure comprehensive exploration of mathematical possibility space. Even tautologies and errors serve scientific purposes: redundancy identification and knowledge base quality assessment. The system transforms the intractable combinatorial space into a filtered stream of mathematical patterns for human interpretation.
Updated: 2025-08-14 12:55:58
Categories: cs.LG,physics.data-an,68T07, 81-08, 05C90,I.2.6; G.2.2; I.5.1
Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction
Foundation models, including large language models (LLMs) and vision-language models (VLMs), have recently enabled novel approaches to robot autonomy and human-robot interfaces. In parallel, vision-language-action models (VLAs) or large behavior models (LBMs) are increasing the dexterity and capabilities of robotic systems. This survey paper focuses on those works advancing towards agentic applications and architectures. This includes initial efforts exploring GPT-style interfaces to tooling, as well as more complex system where AI agents are coordinators, planners, perception actors, or generalist interfaces. Such agentic architectures allow robots to reason over natural language instructions, invoke APIs, plan task sequences, or assist in operations and diagnostics. In addition to peer-reviewed research, due to the fast-evolving nature of the field, we highlight and include community-driven projects, ROS packages, and industrial frameworks that show emerging trends. We propose a taxonomy for classifying model integration approaches and present a comparative analysis of the role that agents play in different solutions in today's literature.
Updated: 2025-08-14 12:55:31
Categories: cs.RO,cs.AI,cs.LG
Continuous Parallel Relaxation for Finding Diverse Solutions in Combinatorial Optimization Problems
Finding the optimal solution is often the primary goal in combinatorial optimization (CO). However, real-world applications frequently require diverse solutions rather than a single optimum, particularly in two key scenarios. The first scenario occurs in real-world applications where strictly enforcing every constraint is neither necessary nor desirable. Allowing minor constraint violations can often lead to more cost-effective solutions. This is typically achieved by incorporating the constraints as penalty terms in the objective function, which requires careful tuning of penalty parameters. The second scenario involves cases where CO formulations tend to oversimplify complex real-world factors, such as domain knowledge, implicit trade-offs, or ethical considerations. To address these challenges, generating (i) penalty-diversified solutions by varying penalty intensities and (ii) variation-diversified solutions with distinct structural characteristics provides valuable insights, enabling practitioners to post-select the most suitable solution for their specific needs. However, efficiently discovering these diverse solutions is more challenging than finding a single optimal one. This study introduces Continual Parallel Relaxation Annealing (CPRA), a computationally efficient framework for unsupervised-learning (UL)-based CO solvers that generates diverse solutions within a single training run. CPRA leverages representation learning and parallelization to automatically discover shared representations, substantially accelerating the search for these diverse solutions. Numerical experiments demonstrate that CPRA outperforms existing UL-based solvers in generating these diverse solutions while significantly reducing computational costs.
Updated: 2025-08-14 12:55:12
Categories: stat.ML,cs.LG,stat.CO,stat.ME
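The penalty-diversified setting the abstract describes can be illustrated with a brute-force baseline: score candidate solutions under several penalty intensities and keep one winner per intensity (this sketches the problem setup for post-selection, not CPRA itself):

```python
def penalty_objective(x, cost, violation, lam):
    """Soft-constrained CO objective: cost(x) + lam * constraint violation."""
    return cost(x) + lam * violation(x)

def penalty_diversified(candidates, cost, violation, lams):
    """One best candidate per penalty intensity lam, yielding a
    penalty-diversified solution set for the practitioner to choose from."""
    return [min(candidates,
                key=lambda x: penalty_objective(x, cost, violation, lam))
            for lam in lams]
```

Small `lam` tolerates cheap constraint violations; large `lam` enforces feasibility, and CPRA's contribution is producing this whole family efficiently in a single training run.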
Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning
Multi-Objective Reinforcement Learning (MORL) is a generalization of traditional Reinforcement Learning (RL) that aims to optimize multiple, often conflicting objectives simultaneously rather than focusing on a single reward. This approach is crucial in complex decision-making scenarios where agents must balance trade-offs between various goals, such as maximizing performance while minimizing costs. We consider the problem of MORL where the objectives are combined using a non-linear scalarization function. Just like in standard RL, policy gradient methods (PGMs) are amongst the most effective for handling large and continuous state-action spaces in MORL. However, existing PGMs for MORL suffer from high sample inefficiency, requiring large amounts of data to be effective. Previous attempts to solve this problem rely on overly strict assumptions, losing PGMs' benefits in scalability to large state-action spaces. In this work, we address the issue of sample efficiency by implementing variance-reduction techniques to reduce the sample complexity of policy gradients while maintaining general assumptions.
Updated: 2025-08-14 12:52:57
Categories: cs.LG,cs.SY,eess.SY,math.OC,math.ST,stat.TH,68T05: 90C26: 90C40,I.2.0; G.3
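As a textbook illustration of the variance-reduction idea (not the paper's estimator), subtracting a baseline from returns leaves REINFORCE-style gradient samples unbiased while shrinking their variance:

```python
import numpy as np

def pg_samples(score_grads, returns, baseline=0.0):
    """Per-trajectory REINFORCE gradient samples g_i * (R_i - b).

    Subtracting a baseline b does not bias the mean (since the score
    function has zero expectation) but can sharply reduce variance."""
    return score_grads * (returns - baseline)[:, None]
```

MORL-specific variance reduction must additionally handle the non-linear scalarization of multiple objectives, which is the paper's actual contribution.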
Bistochastically private release of longitudinal data
Although the bulk of the research in privacy and statistical disclosure control is designed for cross-sectional data, i.e. data where individuals are observed at one single point in time, longitudinal data, i.e. individuals observed over multiple periods, are increasingly collected. Such data undoubtedly enhance the possibility of statistical analysis compared to cross-sectional data, but also come with one additional layer of information, individual trajectories, that must remain practically useful in a privacy-preserving way. Few extensions of popular privacy tools, essentially k-anonymity-based ones, have been proposed to deal with the challenges posed by longitudinal data, and these proposals are often complex. By considering randomized response, and specifically its recent bistochastic extension, in the context of longitudinal data, this paper proposes a simple approach for their anonymization. After establishing new results on bistochastic matrices, we show that a simple relationship exists between the protection of each data set released at each period and the protection of individuals' trajectories over time. In turn, this relationship can be tuned according to desired protection and information requirements. We illustrate the application of the proposed approach by an empirical example.
Updated: 2025-08-14 12:48:06
Categories: stat.ME,cs.CR
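As a concrete reading of the mechanism (the matrix below is an arbitrary example, not one derived from the paper's protection requirements), randomized response with a bistochastic transition matrix replaces each true category by a draw from the corresponding matrix row:

```python
import numpy as np

def randomize(category, p_matrix, rng):
    """Randomized response: report a category drawn from the row of the
    transition matrix indexed by the true category."""
    return rng.choice(len(p_matrix), p=p_matrix[category])

def is_bistochastic(p, tol=1e-9):
    """A matrix is bistochastic (doubly stochastic) when every row AND
    every column sums to one."""
    return (np.all(np.abs(p.sum(axis=0) - 1.0) < tol)
            and np.all(np.abs(p.sum(axis=1) - 1.0) < tol))
```

The bistochastic constraint is what the paper exploits to relate per-period protection to trajectory-level protection over time.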
On Understanding of the Dynamics of Model Capacity in Continual Learning
The stability-plasticity dilemma, closely related to a neural network's (NN) capacity, i.e., its ability to represent tasks, is a fundamental challenge in continual learning (CL). Within this context, we introduce CL's effective model capacity (CLEMC), which characterizes the dynamic behavior of the stability-plasticity balance point. We develop a difference equation to model the evolution of the interplay between the NN, task data, and optimization procedure. We then leverage CLEMC to demonstrate that the effective capacity, and by extension the stability-plasticity balance point, is inherently non-stationary. We show that regardless of the NN architecture or optimization method, a NN's ability to represent new tasks diminishes when incoming task distributions differ from previous ones. We conduct extensive experiments to support our theoretical findings, spanning a range of architectures, from small feedforward and convolutional networks to medium-sized graph neural networks and transformer-based large language models with millions of parameters.
Updated: 2025-08-14 12:42:10
Categories: cs.LG,cs.AI
MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
Activation steering offers a promising approach to controlling the behavior of Large Language Models by directly manipulating their internal activations. However, most existing methods struggle to jointly steer multiple attributes, often resulting in interference and undesirable trade-offs. To address this challenge, we propose Multi-Subspace Representation Steering (MSRS), a novel framework for effective multi-attribute steering via subspace representation fine-tuning. MSRS reduces inter-attribute interference by allocating orthogonal subspaces to each attribute, isolating their influence within the model's representation space. MSRS also incorporates a hybrid subspace composition strategy: it combines attribute-specific subspaces for unique steering directions with a shared subspace for common steering directions. A dynamic weighting function learns to efficiently integrate these components for precise control. During inference, MSRS introduces a token-level steering mechanism that dynamically identifies and intervenes on the most semantically relevant tokens, enabling fine-grained behavioral modulation. Experimental results show that MSRS significantly reduces attribute conflicts, surpasses existing methods across a range of attributes, and generalizes effectively to diverse downstream tasks.
Updated: 2025-08-14 12:40:19
Categories: cs.AI
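The orthogonal-subspace allocation the abstract describes can be sketched with a QR factorization; the dimensions and the additive steering step below are illustrative choices, not MSRS's trained procedure:

```python
import numpy as np

def orthogonal_subspaces(d, n_attrs, dims, rng):
    """Carve n_attrs mutually orthogonal steering subspaces (each with
    `dims` basis vectors) out of a d-dim representation space via QR."""
    q, _ = np.linalg.qr(rng.normal(size=(d, n_attrs * dims)))
    return [q[:, i * dims:(i + 1) * dims] for i in range(n_attrs)]

def steer(h, basis, delta):
    """Shift an activation h along one attribute's subspace: h + B @ delta."""
    return h + basis @ delta
```

Because the bases are orthogonal, steering one attribute moves the activation in directions that have zero projection onto the other attributes' subspaces, which is the interference-reduction mechanism the abstract names.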
Oops!... They Stole it Again: Attacks on Split Learning
Split Learning (SL) is a collaborative learning approach that improves privacy by keeping data on the client-side while sharing only the intermediate output with a server. However, the distributed nature of SL introduces new security challenges, necessitating a comprehensive exploration of potential attacks. This paper systematically reviews various attacks on SL, classifying them based on factors such as the attacker's role, the type of privacy risks, when data leaks occur, and where vulnerabilities exist. We also analyze existing defense methods, including cryptographic methods, data modification approaches, distributed techniques, and hybrid solutions. Our findings reveal security gaps, highlighting the effectiveness and limitations of existing defenses. By identifying open challenges and future directions, this work provides valuable information to improve SL privacy issues and guide further research.
Updated: 2025-08-14 12:39:28
Categories: cs.LG
On Spectral Properties of Gradient-based Explanation Methods
Understanding the behavior of deep networks is crucial to increase our confidence in their results. Despite an extensive body of work for explaining their predictions, researchers have faced reliability issues, which can be attributed to insufficient formalism. In our research, we adopt novel probabilistic and spectral perspectives to formally analyze explanation methods. Our study reveals a pervasive spectral bias stemming from the use of gradient, and sheds light on some common design choices that have been discovered experimentally, in particular, the use of squared gradient and input perturbation. We further characterize how the choice of perturbation hyperparameters in explanation methods, such as SmoothGrad, can lead to inconsistent explanations and introduce two remedies based on our proposed formalism: (i) a mechanism to determine a standard perturbation scale, and (ii) an aggregation method which we call SpectralLens. Finally, we substantiate our theoretical results through quantitative evaluations.
Updated: 2025-08-14 12:37:22
Categories: cs.LG,cs.AI,cs.CV
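SmoothGrad, whose perturbation scale the paper analyzes, simply averages gradients over noisy copies of the input. A minimal version (the gradient oracle `grad_fn` is supplied by the user; the paper's point is that `sigma` must be chosen carefully):

```python
import numpy as np

def smoothgrad(grad_fn, x, sigma, n, rng):
    """SmoothGrad: average grad_fn over n Gaussian-perturbed copies of x.

    sigma is the perturbation scale whose choice the paper shows can make
    explanations inconsistent."""
    return np.mean([grad_fn(x + rng.normal(scale=sigma, size=x.shape))
                    for _ in range(n)], axis=0)
```

Averaging acts as a low-pass filter on the gradient field, which connects directly to the spectral perspective the abstract adopts.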
FreeGAD: A Training-Free yet Effective Approach for Graph Anomaly Detection
Graph Anomaly Detection (GAD) aims to identify nodes that deviate from the majority within a graph, playing a crucial role in applications such as social networks and e-commerce. Despite the current advancements in deep learning-based GAD, existing approaches often suffer from high deployment costs and poor scalability due to their complex and resource-intensive training processes. Surprisingly, our empirical findings suggest that the training phase of deep GAD methods, commonly perceived as crucial, may actually contribute less to anomaly detection performance than expected. Inspired by this, we propose FreeGAD, a novel training-free yet effective GAD method. Specifically, it leverages an affinity-gated residual encoder to generate anomaly-aware representations. Meanwhile, FreeGAD identifies anchor nodes as pseudo-normal and anomalous guides, followed by calculating anomaly scores through anchor-guided statistical deviations. Extensive experiments demonstrate that FreeGAD achieves superior anomaly detection performance, efficiency, and scalability on multiple benchmark datasets from diverse domains, without any training or iterative optimization.
Updated: 2025-08-14 12:37:20
Categories: cs.LG,cs.AI
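One simple way to picture "anchor-guided statistical deviations" (a guess at the flavor of the computation for illustration, not FreeGAD's actual scoring rule):

```python
import numpy as np

def anchor_scores(z, normal_idx, anomaly_idx):
    """Score each node by its distance to the pseudo-normal anchor mean
    relative to its distance to the pseudo-anomalous anchor mean;
    larger scores indicate more anomalous nodes."""
    mu_normal = z[normal_idx].mean(axis=0)
    mu_anomaly = z[anomaly_idx].mean(axis=0)
    d_normal = np.linalg.norm(z - mu_normal, axis=1)
    d_anomaly = np.linalg.norm(z - mu_anomaly, axis=1)
    return d_normal - d_anomaly
```

Note everything here is inference-time arithmetic over fixed representations, consistent with the abstract's claim that no training or iterative optimization is needed.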
CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting
Recent advances in 3D reconstruction techniques and vision-language models have fueled significant progress in 3D semantic understanding, a capability critical to robotics, autonomous driving, and virtual/augmented reality. However, methods that rely on 2D priors are prone to a critical challenge: cross-view semantic inconsistencies induced by occlusion, image blur, and view-dependent variations. These inconsistencies, when propagated via projection supervision, deteriorate the quality of 3D Gaussian semantic fields and introduce artifacts in the rendered outputs. To mitigate this limitation, we propose CCL-LGS, a novel framework that enforces view-consistent semantic supervision by integrating multi-view semantic cues. Specifically, our approach first employs a zero-shot tracker to align a set of SAM-generated 2D masks and reliably identify their corresponding categories. Next, we utilize CLIP to extract robust semantic encodings across views. Finally, our Contrastive Codebook Learning (CCL) module distills discriminative semantic features by enforcing intra-class compactness and inter-class distinctiveness. In contrast to previous methods that directly apply CLIP to imperfect masks, our framework explicitly resolves semantic conflicts while preserving category discriminability. Extensive experiments demonstrate that CCL-LGS outperforms previous state-of-the-art methods. Our project page is available at https://epsilontl.github.io/CCL-LGS/.
Updated: 2025-08-14 12:29:24
Categories: cs.CV,cs.AI
Self-Supervised Temporal Super-Resolution of Energy Data using Generative Adversarial Transformer
To bridge the temporal granularity gap in energy network design and operation based on Energy System Models, resampling of time series is required. While conventional upsampling methods are computationally efficient, they often result in significant information loss or increased noise. Advanced models such as time series generation models, Super-Resolution models and imputation models show potential, but also face fundamental challenges. The goal of time series generative models is to learn the distribution of the original data to generate high-resolution series with similar statistical characteristics. This is not entirely consistent with the definition of upsampling. Time series Super-Resolution models or imputation models can degrade the accuracy of upsampling because the input low-resolution time series are sparse and may have insufficient context. Moreover, such models usually rely on supervised learning paradigms. This presents a fundamental application paradox: their training requires the high-resolution time series that is intrinsically absent in upsampling application scenarios. To address the mentioned upsampling issue, this paper introduces a new method utilizing Generative Adversarial Transformers (GATs), which can be trained without access to any ground-truth high-resolution data. Compared with conventional interpolation methods, the introduced method can reduce the root mean square error (RMSE) of upsampling tasks by 9%, and the accuracy of a model predictive control (MPC) application scenario is improved by 13%.
Updated: 2025-08-14 12:25:39
Categories: cs.LG,cs.NA,eess.SP,math.NA
Improved GUI Grounding via Iterative Narrowing
Graphical User Interface (GUI) grounding plays a crucial role in enhancing the capabilities of Vision-Language Model (VLM) agents. While general VLMs, such as GPT-4V, demonstrate strong performance across various tasks, their proficiency in GUI grounding remains suboptimal. Recent studies have focused on fine-tuning these models specifically for zero-shot GUI grounding, yielding significant improvements over baseline performance. We introduce a visual prompting framework that employs an iterative narrowing mechanism to further improve the performance of both general and fine-tuned models in GUI grounding. For evaluation, we tested our method on a comprehensive benchmark comprising various UI platforms and provided the code to reproduce our results.
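The iterative narrowing idea can be sketched as repeatedly re-centering a shrinking crop on the model's last guess (`toy_predict` is a hypothetical stand-in for a VLM grounding call; the shrink factor and step count are invented):

```python
def narrow(predict, width, height, steps=3, shrink=0.5):
    """Iteratively re-centre a shrinking crop on the model's last guess.
    `predict(region)` stands in for a VLM grounding call that returns an
    (x, y) prediction inside region = (x0, y0, x1, y1)."""
    region = (0.0, 0.0, float(width), float(height))
    for _ in range(steps):
        x, y = predict(region)
        x0, y0, x1, y1 = region
        w, h = (x1 - x0) * shrink, (y1 - y0) * shrink
        # Clamp the smaller window to the full image bounds.
        nx0 = min(max(x - w / 2, 0.0), width - w)
        ny0 = min(max(y - h / 2, 0.0), height - h)
        region = (nx0, ny0, nx0 + w, ny0 + h)
    return predict(region)

TARGET = (700.0, 300.0)

def toy_predict(region):
    # Pretend the model always points at TARGET, clamped into the crop.
    x0, y0, x1, y1 = region
    return (min(max(TARGET[0], x0), x1), min(max(TARGET[1], y0), y1))

result = narrow(toy_predict, 1024, 768)
print(result)
```

Each round the model sees a smaller, higher-effective-resolution view of the UI, which is what drives the reported gains for both general and fine-tuned models.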
Updated: 2025-08-14 12:23:09
Categories: cs.CV,cs.AI,cs.CL
GNN-based Unified Deep Learning
Deep learning models often struggle to maintain generalizability in medical imaging, particularly under domain-fracture scenarios where distribution shifts arise from varying imaging techniques, acquisition protocols, patient populations, demographics, and equipment. In practice, each hospital may need to train distinct models - differing in learning task, width, and depth - to match local data. For example, one hospital may use Euclidean architectures such as MLPs and CNNs for tabular or grid-like image data, while another may require non-Euclidean architectures such as graph neural networks (GNNs) for irregular data like brain connectomes. How to train such heterogeneous models coherently across datasets, while enhancing each model's generalizability, remains an open problem. We propose unified learning, a new paradigm that encodes each model into a graph representation, enabling unification in a shared graph learning space. A GNN then guides optimization of these unified models. By decoupling parameters of individual models and controlling them through a unified GNN (uGNN), our method supports parameter sharing and knowledge transfer across varying architectures (MLPs, CNNs, GNNs) and distributions, improving generalizability. Evaluations on MorphoMNIST and two MedMNIST benchmarks - PneumoniaMNIST and BreastMNIST - show that unified learning boosts performance when models are trained on unique distributions and tested on mixed ones, demonstrating strong robustness to unseen data with large distribution shifts. Code and benchmarks: https://github.com/basiralab/uGNN
Updated: 2025-08-14 12:22:32
Categories: cs.LG
DiRW: Path-Aware Digraph Learning for Heterophily
Recently, graph neural networks (GNNs) have emerged as powerful representation learning tools for graph-structured data. However, most approaches are tailored to undirected graphs, neglecting the abundant information carried by the edges of directed graphs (digraphs). In fact, digraphs are widely used in the real world and have been shown to help address heterophily challenges. Despite recent advancements, existing spatial- and spectral-based DiGNNs have limitations due to their complex learning mechanisms and reliance on high-quality topology, resulting in low efficiency and unstable performance. To address these issues, we propose Directed Random Walk (DiRW), a plug-and-play strategy for most spatial-based DiGNNs that is also an innovative model offering a new digraph learning paradigm. Specifically, it utilizes a direction-aware path sampler optimized in a weight-free manner with respect to walk probability, length, and number, taking node profiles and topology into account. Building upon this, DiRW incorporates a node-wise learnable path aggregator for generalized node representations. Extensive experiments on 9 datasets demonstrate that DiRW: (1) enhances most spatial-based methods as a plug-and-play strategy; and (2) achieves SOTA performance as a new digraph learning paradigm. The source code and data are available at https://github.com/dhsiuu/DiRW.
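A direction-aware walk can be loosely sketched as one that follows out-edges and falls back to in-edges at sink nodes (a rough illustration with an invented toy digraph; DiRW's real sampler also adapts walk probability, length, and number per node):

```python
import random

def direction_aware_walk(out_adj, in_adj, start, length, rng):
    """Weight-free walk that follows out-edges and falls back to in-edges
    at sink nodes, so paths do not stall. A rough sketch of a
    direction-aware sampler; DiRW's real one also adapts walk
    probability, length, and number per node."""
    path, node = [start], start
    for _ in range(length):
        nbrs = out_adj.get(node) or in_adj.get(node)
        if not nbrs:
            break                      # isolated node: stop the walk
        node = rng.choice(nbrs)
        path.append(node)
    return path

out_adj = {0: [1], 1: [2], 2: []}      # toy digraph: 0 -> 1 -> 2
in_adj = {1: [0], 2: [1]}
path = direction_aware_walk(out_adj, in_adj, 0, 3, random.Random(0))
print(path)  # at sink node 2 the walk backtracks along the in-edge
```

A purely forward walker would terminate at node 2 after two hops; the fallback keeps path statistics usable even on digraphs with many sinks.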
Updated: 2025-08-14 12:21:06
Categories: cs.LG,cs.AI
Technical Report: Facilitating the Adoption of Causal Inference Methods Through LLM-Empowered Co-Pilot
Estimating treatment effects (TE) from observational data is a critical yet complex task in many fields, from healthcare and economics to public policy. While recent advances in machine learning and causal inference have produced powerful estimation techniques, their adoption remains limited due to the need for deep expertise in causal assumptions, adjustment strategies, and model selection. In this paper, we introduce CATE-B, an open-source co-pilot system that uses large language models (LLMs) within an agentic framework to guide users through the end-to-end process of treatment effect estimation. CATE-B assists in (i) constructing a structural causal model via causal discovery and LLM-based edge orientation, (ii) identifying robust adjustment sets through a novel Minimal Uncertainty Adjustment Set criterion, and (iii) selecting appropriate regression methods tailored to the causal structure and dataset characteristics. To encourage reproducibility and evaluation, we release a suite of benchmark tasks spanning diverse domains and causal complexities. By combining causal inference with intelligent, interactive assistance, CATE-B lowers the barrier to rigorous causal analysis and lays the foundation for a new class of benchmarks in automated treatment effect estimation.
Updated: 2025-08-14 12:20:51
Categories: cs.LG
Delayed Feedback Modeling with Influence Functions
In online advertising under the cost-per-conversion (CPA) model, accurate conversion rate (CVR) prediction is crucial. A major challenge is delayed feedback, where conversions may occur long after user interactions, leading to incomplete recent data and biased model training. Existing solutions partially mitigate this issue but often rely on auxiliary models, making them computationally inefficient and less adaptive to user interest shifts. We propose IF-DFM, an \underline{I}nfluence \underline{F}unction-empowered framework for \underline{D}elayed \underline{F}eedback \underline{M}odeling, which estimates the impact of newly arrived and delayed conversions on model parameters, enabling efficient updates without full retraining. By reformulating the inverse Hessian-vector product as an optimization problem, IF-DFM achieves a favorable trade-off between scalability and effectiveness. Experiments on benchmark datasets show that IF-DFM outperforms prior methods in both accuracy and adaptability.
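The optimization reformulation of the inverse Hessian-vector product can be illustrated on a toy quadratic: instead of inverting H, minimize a quadratic whose minimizer is H^{-1}g (a dense 2x2 sketch with invented numbers, not IF-DFM's actual solver or model):

```python
def inverse_hvp(hess, grad, lr=0.1, steps=500):
    """Approximate H^{-1} g by minimizing 0.5 v'Hv - g'v with plain
    gradient descent, i.e. the optimization reformulation of the inverse
    Hessian-vector product. A toy dense 2x2 sketch; at scale one would
    use Hessian-vector products rather than an explicit matrix."""
    n = len(grad)
    v = [0.0] * n
    for _ in range(steps):
        # Gradient of the quadratic objective: r = H v - g.
        r = [sum(hess[i][j] * v[j] for j in range(n)) - grad[i]
             for i in range(n)]
        v = [v[i] - lr * r[i] for i in range(n)]
    return v

H = [[2.0, 0.0], [0.0, 4.0]]   # invented positive-definite Hessian
g = [1.0, 2.0]
v = inverse_hvp(H, g)
print([round(t, 4) for t in v])
```

For this H and g the exact answer is H^{-1}g = [0.5, 0.5]; the descent recovers it without ever forming an inverse, which is what makes the update cheap relative to full retraining.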
Updated: 2025-08-14 12:15:41
Categories: cs.LG,cs.AI,cs.IR
FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory
Letting AI agents interact in multi-agent applications adds a layer of complexity to the interpretability and prediction of AI outcomes, with profound implications for their trustworthy adoption in research and society. Game theory offers powerful models to capture and interpret strategic interaction among agents, but requires the support of reproducible, standardized and user-friendly IT frameworks to enable comparison and interpretation of results. To this end, we present FAIRGAME, a Framework for AI Agents Bias Recognition using Game Theory. We describe its implementation and usage, and we employ it to uncover biased outcomes in popular games among AI agents, depending on the employed Large Language Model (LLM) and used language, as well as on the personality trait or strategic knowledge of the agents. Overall, FAIRGAME allows users to reliably and easily simulate their desired games and scenarios and compare the results across simulation campaigns and with game-theoretic predictions, enabling the systematic discovery of biases, the anticipation of emerging behavior out of strategic interplays, and empowering further research into strategic decision-making using LLM agents.
Updated: 2025-08-14 12:12:53
Categories: cs.AI
Reproducible Physiological Features in Affective Computing: A Preliminary Analysis on Arousal Modeling
In Affective Computing, a key challenge lies in reliably linking subjective emotional experiences with objective physiological markers. This preliminary study addresses the issue of reproducibility by identifying physiological features from cardiovascular and electrodermal signals that are associated with continuous self-reports of arousal levels. Using the Continuously Annotated Signal of Emotion dataset, we analyzed 164 features extracted from cardiac and electrodermal signals of 30 participants exposed to short emotion-evoking videos. Feature selection was performed using the Terminating-Random Experiments (T-Rex) method, which performs variable selection systematically controlling a user-defined target False Discovery Rate. Remarkably, among all candidate features, only two electrodermal-derived features exhibited reproducible and statistically significant associations with arousal, achieving a 100\% confirmation rate. These results highlight the necessity of rigorous reproducibility assessments in physiological features selection, an aspect often overlooked in Affective Computing. Our approach is particularly promising for applications in safety-critical environments requiring trustworthy and reliable white box models, such as mental disorder recognition and human-robot interaction systems.
Updated: 2025-08-14 11:58:36
Categories: cs.HC,cs.LG,eess.SP
Information Science Principles of Machine Learning: A Causal Chain Meta-Framework Based on Formalized Information Mapping
[Objective] This study addresses key challenges in machine learning, namely the absence of a unified formal theoretical framework and the lack of foundational theories for model interpretability and ethical safety. [Methods] We first construct a formal information model, explicitly defining the ontological states and carrier mappings of typical machine learning stages using sets of well-formed formulas. By introducing learnable and processable predicates, as well as learning and processing functions, we analyze the causal chain logic and constraint laws governing machine learning processes. [Results] We establish the Machine Learning Theory Meta-Framework (MLT-MF), on which we further propose universal definitions for model interpretability and ethical safety. We prove and validate three key theorems: the relationship between model interpretability and information existence, ethical safety assurance, and the upper bound estimation of total variation distance (TVD). [Limitations] The current framework assumes ideal, noise-free information enabling mappings and focuses primarily on model learning and processing logic in static scenarios. It does not yet address information fusion and conflict resolution across ontological spaces in multimodal or multi-agent systems. [Conclusions] This work overcomes the limitations of fragmented research and provides a unified theoretical foundation for systematically addressing critical issues in contemporary machine learning.
Updated: 2025-08-14 11:58:29
Categories: cs.LO,cs.AI
Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform
The rapid advancement of speech generation technology has led to the widespread proliferation of deepfake speech across social media platforms. While deepfake audio countermeasures (CMs) achieve promising results on public datasets, their performance degrades significantly in cross-domain scenarios. To advance CMs for real-world deepfake detection, we first propose the Fake Speech Wild (FSW) dataset, which includes 254 hours of real and deepfake audio from four different media platforms, focusing on social media. For CMs, we establish a benchmark using public datasets and advanced self-supervised learning (SSL)-based CMs to evaluate current CMs in real-world scenarios. We also assess the effectiveness of data augmentation strategies in enhancing CM robustness for detecting deepfake speech on social media. Finally, by augmenting public datasets and incorporating the FSW training set, we significantly advanced real-world deepfake audio detection performance, achieving an average equal error rate (EER) of 3.54% across all evaluation sets.
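The reported metric, equal error rate, can be computed from a pair of score lists as follows (toy scores; this is an illustrative helper, not the paper's evaluation pipeline):

```python
def equal_error_rate(real_scores, fake_scores):
    """Sweep thresholds and return the point where the false-accept and
    false-reject rates cross (higher score = more likely real).
    Illustrative helper on toy scores, not the paper's pipeline."""
    best_gap, eer = None, None
    for t in sorted(real_scores + fake_scores):
        far = sum(s >= t for s in fake_scores) / len(fake_scores)
        frr = sum(s < t for s in real_scores) / len(real_scores)
        if best_gap is None or abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(eer)
```

At the crossing threshold, accepting a fake and rejecting a real are equally likely, which is why a single EER number summarizes the detector's operating curve.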
Updated: 2025-08-14 11:56:30
Categories: cs.SD,cs.AI
Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed
Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. Typically, the noise in the stochastic gradients is heavy-tailed for the latter. Gradient clipping provably helps to achieve good high-probability convergence for such noise. However, despite the similarity between AdaGrad/Adam and Clip-SGD, the current understanding of the high-probability convergence of AdaGrad/Adam-type methods is limited in this case. In this work, we prove that AdaGrad/Adam (and their delayed versions) can have provably bad high-probability convergence if the noise is heavy-tailed. We also show that gradient clipping fixes this issue: we derive new high-probability convergence bounds, with polylogarithmic dependence on the confidence level, for AdaGrad-Norm and Adam-Norm with clipping, with and without delay, for smooth convex/non-convex stochastic optimization with heavy-tailed noise. We extend our results to the case of AdaGrad/Adam with delayed stepsizes. Our empirical evaluations highlight the superiority of the clipped versions of AdaGrad/Adam in handling heavy-tailed noise.
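A one-dimensional sketch of the clipped AdaGrad-Norm update analyzed above (deterministic gradient and all constants chosen purely for illustration):

```python
import math

def clipped_adagrad_norm(grad_fn, x0, lr=1.0, clip=1.0, steps=100):
    """AdaGrad-Norm with gradient clipping, in one dimension: clip the
    stochastic gradient so a single heavy-tailed sample cannot blow up
    the step, then scale by the accumulated clipped-gradient norm.
    A sketch of the update being analyzed, not the full algorithm."""
    x, acc = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        if abs(g) > clip:                 # gradient clipping
            g *= clip / abs(g)
        acc += g * g                      # AdaGrad-Norm accumulator
        x -= lr * g / math.sqrt(acc)
    return x

# Minimize f(x) = 0.5 x^2 (gradient x); deterministic here for brevity.
x_final = clipped_adagrad_norm(lambda x: x, 5.0)
print(abs(x_final) < 0.1)
```

With clipping, an occasional enormous gradient sample contributes at most `clip` to the step and at most `clip**2` to the accumulator, which is the mechanism behind the polylogarithmic high-probability bounds.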
Updated: 2025-08-14 11:55:47
Categories: cs.LG,math.OC
PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception Tasks
Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches. However, PTQ often leads to unacceptable performance degradation in quantized models, while QAT imposes substantial GPU memory requirements and extended training time due to weight fine-tuning. In this paper, we propose PTQAT, a novel general hybrid quantization algorithm for the efficient deployment of 3D perception networks. To address the speed-accuracy trade-off between PTQ and QAT, our method selects critical layers for QAT fine-tuning and performs PTQ on the remaining layers. Contrary to intuition, fine-tuning the layers with smaller output discrepancies before and after quantization, rather than those with larger discrepancies, actually leads to greater improvements in the model's quantization accuracy. This means we better compensate for quantization errors during their propagation, rather than addressing them at the point where they occur. The proposed PTQAT achieves similar performance to QAT with more efficiency by freezing nearly 50% of quantifiable layers. Additionally, PTQAT is a universal quantization method that supports various quantization bit widths (4 bits) as well as different model architectures, including CNNs and Transformers. The experimental results on nuScenes across diverse 3D perception tasks, including object detection, semantic segmentation, and occupancy prediction, show that our method consistently outperforms QAT-only baselines. Notably, it achieves 0.2%-0.9% NDS and 0.3%-1.0% mAP gains in object detection, 0.3%-2.0% mIoU gains in semantic segmentation and occupancy prediction while fine-tuning fewer weights.
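The layer-selection rule can be sketched as ranking layers by pre/post-quantization output discrepancy and fine-tuning the *smallest* ones (layer names and discrepancy values below are invented for illustration):

```python
def select_qat_layers(discrepancies, budget):
    """Rank layers by pre/post-quantization output discrepancy and pick
    the *smallest* ones for QAT fine-tuning (the counter-intuitive
    criterion described above), freezing the rest for PTQ."""
    ranked = sorted(discrepancies, key=discrepancies.get)
    return ranked[:budget], ranked[budget:]

# Invented layer names and discrepancy measurements.
disc = {"conv1": 0.02, "conv2": 0.31, "attn": 0.05, "head": 0.44}
qat_layers, ptq_layers = select_qat_layers(disc, budget=2)
print(qat_layers)
```

Fine-tuning the low-discrepancy layers lets them absorb errors that propagate from the frozen high-discrepancy layers, rather than fighting the largest errors at their source.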
Updated: 2025-08-14 11:55:21
Categories: cs.CV,cs.AI
Retrieval-Augmented Prompt for OOD Detection
Out-of-Distribution (OOD) detection is crucial for the reliable deployment of machine learning models in-the-wild, enabling accurate identification of test samples that differ from the training data distribution. Existing methods rely on auxiliary outlier samples or in-distribution (ID) data to generate outlier information for training, but due to limited outliers and their mismatch with real test OOD samples, they often fail to provide sufficient semantic supervision, leading to suboptimal performance. To address this, we propose a novel OOD detection method called Retrieval-Augmented Prompt (RAP). RAP augments a pre-trained vision-language model's prompts by retrieving external knowledge, offering enhanced semantic supervision for OOD detection. During training, RAP retrieves descriptive words for outliers based on joint similarity with external textual knowledge and uses them to augment the model's OOD prompts. During testing, RAP dynamically updates OOD prompts in real-time based on the encountered OOD samples, enabling the model to rapidly adapt to the test environment. Our extensive experiments demonstrate that RAP achieves state-of-the-art performance on large-scale OOD detection benchmarks. For example, in 1-shot OOD detection on the ImageNet-1k dataset, RAP reduces the average FPR95 by 7.05% and improves the AUROC by 1.71% compared to previous methods. Additionally, comprehensive ablation studies validate the effectiveness of each module and the underlying motivations of our approach.
Updated: 2025-08-14 11:52:43
Categories: cs.CV,cs.AI
Tuning-Free Online Robust Principal Component Analysis through Implicit Regularization
The performance of the standard Online Robust Principal Component Analysis (OR-PCA) technique depends on optimal tuning of its explicit regularizers, and this tuning is dataset-sensitive. We aim to remove the dependency on these tuning parameters by using implicit regularization. We propose to use the implicit regularization effect of various modified gradient descents to make OR-PCA tuning-free. Our method incorporates three different versions of modified gradient descent that separately but naturally encourage sparsity and low-rank structure in the data. The proposed method performs comparably to or better than the tuned OR-PCA on both simulated and real-world datasets. Tuning-free OR-PCA is also more scalable to large datasets, since it does not require dataset-dependent parameter tuning.
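One well-known implicit-regularization mechanism of this kind is the Hadamard over-parameterization x = u ⊙ v, which plain gradient descent with a small initialization biases toward sparse solutions without any explicit l1 term. A toy sketch (the paper's three specific modified descents and the OR-PCA objective are not reproduced here):

```python
def hadamard_descent(target, lr=0.1, steps=2000, init=1e-3):
    """Gradient descent on the over-parameterization x_i = u_i * v_i.
    With a small initialization, plain gradient descent on the squared
    error implicitly biases x toward sparse solutions -- no explicit l1
    term needed. A toy sketch on a denoising-style problem."""
    n = len(target)
    u, v = [init] * n, [init] * n
    for _ in range(steps):
        for i in range(n):
            r = u[i] * v[i] - target[i]          # residual on entry i
            gu, gv = r * v[i], r * u[i]
            u[i] -= lr * gu
            v[i] -= lr * gv
    return [u[i] * v[i] for i in range(n)]

# One genuinely large entry vs. zero and near-zero "noise" entries.
x_sparse = hadamard_descent([1.0, 0.0, 0.002])
print([round(t, 3) for t in x_sparse])
```

Large entries grow quickly while tiny entries stay near the initialization scale, so the recovered vector is sparse by dynamics alone, with no regularization parameter to tune.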
Updated: 2025-08-14 11:52:12
Categories: cs.LG,cs.CV,stat.ML
Curse of High Dimensionality Issue in Transformer for Long-context Modeling
Transformer-based large language models (LLMs) excel in natural language processing tasks by capturing long-range dependencies through self-attention mechanisms. However, long-context modeling faces significant computational inefficiencies due to \textit{redundant} attention computations: while attention weights are often \textit{sparse}, all tokens consume \textit{equal} computational resources. In this paper, we reformulate traditional probabilistic sequence modeling as a \textit{supervised learning task}, enabling the separation of relevant and irrelevant tokens and providing a clearer understanding of redundancy. Based on this reformulation, we theoretically analyze attention sparsity, revealing that only a few tokens significantly contribute to predictions. Building on this, we formulate attention optimization as a linear coding problem and propose a \textit{group coding strategy}, theoretically showing its ability to improve robustness against random noise and enhance learning efficiency. Motivated by this, we propose \textit{Dynamic Group Attention} (DGA), which leverages the group coding to explicitly reduce redundancy by aggregating less important tokens during attention computation. Empirical results show that our DGA significantly reduces computational costs while maintaining competitive performance. Code is available at https://github.com/bolixinyu/DynamicGroupAttention.
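Aggregating less-important tokens before attention can be sketched as follows (toy 2-D token embeddings and importance scores are invented; DGA's scoring and group coding are more involved):

```python
def group_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens individually and collapse
    all remaining tokens into one mean "group" token, so attention runs
    over keep+1 items instead of len(tokens). A toy rendering of
    aggregating less-important tokens."""
    order = sorted(range(len(tokens)), key=lambda i: -scores[i])
    kept = [tokens[i] for i in sorted(order[:keep])]
    rest = [tokens[i] for i in order[keep:]]
    if rest:
        dim = len(rest[0])
        kept.append([sum(t[d] for t in rest) / len(rest) for d in range(dim)])
    return kept

# Invented 2-D token embeddings and per-token importance scores.
toks = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0], [2.0, 2.0]]
grouped = group_tokens(toks, [0.1, 0.2, 0.9, 0.8], keep=2)
print(grouped)
```

Attention cost scales with the number of items, so collapsing the long tail of low-contribution tokens attacks exactly the redundancy the sparsity analysis identifies.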
Updated: 2025-08-14 11:51:31
Categories: cs.CL,cs.LG,stat.ML
Physics-Informed Deep Contrast Source Inversion: A Unified Framework for Inverse Scattering Problems
Inverse scattering problems are critical in electromagnetic imaging and medical diagnostics but are challenged by their nonlinearity and diverse measurement scenarios. This paper proposes a physics-informed deep contrast source inversion framework (DeepCSI) for fast and accurate medium reconstruction across various measurement conditions. Inspired by contrast source inversion (CSI) and neural operator methods, a residual multilayer perceptron (ResMLP) is employed to model current distributions in the region of interest under different transmitter excitations, effectively linearizing the nonlinear inverse scattering problem and significantly reducing the computational cost of traditional full-waveform inversion. By modeling medium parameters as learnable tensors and utilizing a hybrid loss function that integrates state equation loss, data equation loss, and total variation regularization, DeepCSI establishes a fully differentiable framework for joint optimization of network parameters and medium properties. Compared with conventional methods, DeepCSI offers advantages in terms of simplicity and universal modeling capabilities for diverse measurement scenarios, including phase-less and multi-frequency observation. Simulations and experiments demonstrate that DeepCSI achieves high-precision, robust reconstruction under full-data, phaseless data, and multifrequency conditions, outperforming traditional CSI methods and providing an efficient and universal solution for complex inverse scattering problems.
Updated: 2025-08-14 11:50:16
Categories: physics.comp-ph,cs.CE,cs.LG
When Language Overrules: Revealing Text Dominance in Multimodal Large Language Models
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across a diverse range of multimodal tasks. However, these models suffer from a core problem known as text dominance: they depend heavily on text for their inference, while underutilizing other modalities. Prior work has acknowledged this phenomenon in vision-language tasks, often attributing it to data biases or model architectures. In this paper, we conduct the first systematic investigation of text dominance across diverse data modalities, including images, videos, audio, time-series, and graphs. To measure this imbalance, we propose two evaluation metrics: the Modality Dominance Index (MDI) and the Attention Efficiency Index (AEI). Our comprehensive analysis reveals that text dominance is both significant and pervasive across all tested modalities. Our in-depth analysis identifies three underlying causes: attention dilution from severe token redundancy in non-textual modalities, the influence of fusion architecture design, and task formulations that implicitly favor textual inputs. Furthermore, we propose a simple token compression method that effectively rebalances model attention. Applying this method to LLaVA-7B, for instance, drastically reduces its MDI from 10.23 to a well-balanced value of 0.86. Our analysis and methodological framework offer a foundation for the development of more equitable and comprehensive multimodal language models.
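The abstract does not spell out the MDI formula; the following is only an illustrative guess at the kind of per-token attention ratio such an index could measure, with all quantities invented:

```python
def modality_dominance_index(text_attn, other_attn, n_text, n_other):
    """Per-token attention mass on text divided by per-token mass on the
    other modality. NOTE: the actual MDI definition is not given in this
    abstract; this ratio is only a guess at the kind of imbalance such
    an index quantifies."""
    return (text_attn / n_text) / (other_attn / n_other)

# Hypothetically, 10 text tokens absorb 80% of attention vs. 40 image
# tokens sharing the remaining 20%.
mdi = modality_dominance_index(0.8, 0.2, 10, 40)
print(mdi)
```

A value far above 1 would indicate text dominance; a value near 1 would indicate balanced per-token attention across modalities.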
Updated: 2025-08-14 11:44:52
Categories: cs.CL,cs.AI
Okapi: Efficiently Safeguarding Speculative Data Accesses in Sandboxed Environments
This paper introduces Okapi, a new hardware/software cross-layer architecture designed to mitigate Transient Execution Side Channel attacks, including Spectre variants, in modern computing systems. Okapi provides a hardware basis for secure speculation in sandboxed environments and can replace expensive speculation barriers in software. At its core, it allows for speculative data accesses to a memory page only after the page has been accessed non-speculatively by the current trust domain. The granularity of the trust domains can be controlled in software to achieve different security and performance trade-offs. For environments with less stringent security needs, the features can be deactivated to remove all performance overhead. Without relying on any software modification, the Okapi hardware features provide full protection against TES breakout attacks, e.g., by Spectre-PHT or Spectre-BTB, at a thread-level granularity. This incurs an average performance overhead of only 3.17% for the SPEC CPU2017 benchmark suite. Okapi introduces the OkapiReset instruction for additional software-level security support. This instruction allows for fine-grained sandboxing with any custom size, resulting in 2.34% performance overhead in our WebAssembly runtime experiment. On top, Okapi provides the possibility to eliminate poisoning attacks. For the highest level of security, the OkapiLoad instruction prevents confidential data from being added to the trust domain after a sequential access, thereby enforcing weak speculative non-interference. In addition, we present a hardware extension that limits the exploitable code space for Spectre gadgets to well-defined sections of the program. Therefore, by ensuring the absence of gadgets in these sections, developers can tailor their software towards achieving beneficial trade-offs between the size of a trust domain and performance.
Updated: 2025-08-14 11:41:09
Categories: cs.CR,cs.AR
Stabilizing Long-term Multi-turn Reinforcement Learning with Gated Rewards
Reward sparsity in long-horizon reinforcement learning (RL) tasks remains a significant challenge, while existing outcome-based reward shaping struggles to define meaningful immediate rewards without introducing bias or requiring explicit task decomposition. Alternatively, verification-based reward shaping uses stepwise critics, but misalignment between immediate rewards and long-term objectives can lead to reward hacking and suboptimal policies. In this work, we address this problem in the context of software engineering (SWE) tasks, where multi-turn reasoning and rule-based verification are critical. We introduce the SWE-oriented RL Framework, a unified system supporting multi-turn interaction, docker-based execution, and customizable reward functions. Additionally, we propose Gated Reward Accumulation (G-RA), a novel method that accumulates immediate rewards only when high-level (long-term) rewards meet a predefined threshold, ensuring stable RL optimization. Experiments on SWE-bench Verified and kBench demonstrate that G-RA leads to an increase in completion rates (47.6\% → 93.8\% and 22.0\% → 86.0\%) and modification rates (19.6\% → 23.8\% and 12.0\% → 42.0\%), while avoiding policy degradation caused by reward misalignment. Our findings highlight the importance of balanced reward accumulation in long-horizon RL and provide a practical solution.
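The gating mechanism described above can be sketched in a few lines. This is a minimal toy, with hypothetical names and values, showing only the stated rule: step-level rewards count toward the return only when the episode's long-term reward clears a preset threshold.

```python
# Illustrative sketch of Gated Reward Accumulation (G-RA) as described in the
# abstract: immediate rewards are accumulated only when the high-level
# (long-term) reward meets a predefined threshold. Names and values are toy
# choices, not the paper's implementation.

def gated_return(immediate_rewards, outcome_reward, threshold):
    """Add step rewards to the return only if the long-term reward passes the gate."""
    gated = sum(immediate_rewards) if outcome_reward >= threshold else 0.0
    return outcome_reward + gated

# Episode whose outcome clears the gate: step rewards count.
assert gated_return([0.25, 0.25, 0.5], outcome_reward=2.0, threshold=0.5) == 3.0
# Episode below the gate: step rewards are withheld, discouraging reward hacking.
assert gated_return([0.25, 0.25, 0.5], outcome_reward=0.0, threshold=0.5) == 0.0
```

Withholding the immediate rewards below the gate is what keeps stepwise critics from being exploited when they disagree with the long-term objective.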
Updated: 2025-08-14 11:37:02
Categories: cs.LG,cs.AI,cs.CL
Driving Accurate Allergen Prediction with Protein Language Models and Generalization-Focused Evaluation
Allergens, typically proteins capable of triggering adverse immune responses, represent a significant public health challenge. To accurately identify allergen proteins, we introduce Applm (Allergen Prediction with Protein Language Models), a computational framework that leverages the 100-billion parameter xTrimoPGLM protein language model. We show that Applm consistently outperforms seven state-of-the-art methods in a diverse set of tasks that closely resemble difficult real-world scenarios. These include identifying novel allergens that lack similar examples in the training set, differentiating between allergens and non-allergens among homologs with high sequence similarity, and assessing functional consequences of mutations that create few changes to the protein sequences. Our analysis confirms that xTrimoPGLM, originally trained on one trillion tokens to capture general protein sequence characteristics, is crucial for Applm's performance by detecting important differences among protein sequences. In addition to providing Applm as open-source software, we also provide our carefully curated benchmark datasets to facilitate future research.
Updated: 2025-08-14 11:30:20
Categories: cs.LG
ORBIT: An Object Property Reasoning Benchmark for Visual Inference Tasks
While vision-language models (VLMs) have made remarkable progress on many popular visual question answering (VQA) benchmarks, it remains unclear whether they abstract and reason over depicted objects. Inspired by human object categorisation, object property reasoning involves identifying and recognising low-level details and higher-level abstractions. While current VQA benchmarks consider a limited set of object property attributes like size, they typically blend perception and reasoning, and lack representativeness in terms of reasoning and image categories. To this end, we introduce a systematic evaluation framework with images of three representative types, three reasoning levels of increasing complexity, and four object property dimensions driven by prior work on commonsense reasoning. We develop a procedure to instantiate this benchmark into ORBIT, a multi-level reasoning VQA benchmark for object properties comprising 360 images paired with a total of 1,080 count-based questions. Experiments with 12 state-of-the-art VLMs in zero-shot settings reveal significant limitations compared to humans, with the best-performing model only reaching 40\% accuracy. VLMs struggle particularly with realistic (photographic) images, counterfactual reasoning about physical and functional properties, and higher counts. ORBIT points to the need to develop methods for scalable benchmarking, generalize annotation guidelines, and explore additional reasoning VLMs. We make the ORBIT benchmark and the experimental code available to support such endeavors.
Updated: 2025-08-14 11:28:40
Categories: cs.CV,cs.AI
Improving Value-based Process Verifier via Low-Cost Variance Reduction
Large language models (LLMs) have achieved remarkable success in a wide range of tasks. However, their reasoning capabilities, particularly in complex domains like mathematics, remain a significant challenge. Value-based process verifiers, which estimate the probability of a partial reasoning chain leading to a correct solution, are a promising approach for improving reasoning. Nevertheless, their effectiveness is often hindered by estimation error in their training annotations, a consequence of the limited number of Monte Carlo (MC) samples feasible due to the high cost of LLM inference. In this paper, we identify that the estimation error primarily arises from high variance rather than bias, and the MC estimator is a Minimum Variance Unbiased Estimator (MVUE). To address the problem, we propose the \textsc{Com}pound \textsc{M}onte \textsc{C}arlo \textsc{S}ampling (ComMCS) method, which constructs an unbiased estimator by linearly combining the MC estimators from the current and subsequent steps. Theoretically, we show that our method leads to a predictable reduction in variance, while maintaining an unbiased estimation without additional LLM inference cost. We also perform empirical experiments on the MATH-500 and GSM8K benchmarks to demonstrate the effectiveness of our method. Notably, ComMCS outperforms regression-based optimization method by 2.8 points, the non-variance-reduced baseline by 2.2 points on MATH-500 on Best-of-32 sampling experiment.
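The combination idea can be illustrated with a toy numeric experiment. This is not the paper's ComMCS construction, only a simplified stand-in: assuming (as a toy model) that the current-step and next-step Monte Carlo estimators are both unbiased for the same value, any convex combination of them stays unbiased while the extra averaging lowers variance; in the paper the next-step samples come from rollouts that already exist, so no extra LLM inference is needed.

```python
# Toy numerical sketch of the idea behind ComMCS: linearly combine the
# current-step MC estimator with one from the subsequent step. Simplifying
# assumption (toy only): both estimators are unbiased for the same value v.
import random

random.seed(0)
v = 0.3  # toy "true" success probability of a partial reasoning chain

def mc_estimate(n_samples):
    """Plain Monte Carlo estimate of v from Bernoulli rollouts."""
    return sum(random.random() < v for _ in range(n_samples)) / n_samples

def compound_estimate(n_samples, alpha=0.5):
    """Convex combination of current-step and next-step MC estimates."""
    return alpha * mc_estimate(n_samples) + (1 - alpha) * mc_estimate(n_samples)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

compound = [compound_estimate(16) for _ in range(2000)]
plain = [mc_estimate(16) for _ in range(2000)]
assert abs(sum(compound) / len(compound) - v) < 0.02  # still unbiased
assert variance(compound) < variance(plain)           # variance reduced
```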
Updated: 2025-08-14 11:22:29
Categories: cs.AI,cs.CL
LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
Predictive manipulation has recently gained considerable attention in the Embodied AI community due to its potential to improve robot policy performance by leveraging predicted states. However, generating accurate future visual states of robot-object interactions from world models remains a well-known challenge, particularly in achieving high-quality pixel-level representations. To this end, we propose LaDi-WM, a world model that predicts the latent space of future states using diffusion modeling. Specifically, LaDi-WM leverages the well-established latent space aligned with pre-trained Visual Foundation Models (VFMs), which comprises both geometric features (DINO-based) and semantic features (CLIP-based). We find that predicting the evolution of the latent space is easier to learn and more generalizable than directly predicting pixel-level images. Building on LaDi-WM, we design a diffusion policy that iteratively refines output actions by incorporating forecasted states, thereby generating more consistent and accurate results. Extensive experiments on both synthetic and real-world benchmarks demonstrate that LaDi-WM significantly enhances policy performance by 27.9\% on the LIBERO-LONG benchmark and 20\% on the real-world scenario. Furthermore, our world model and policies achieve impressive generalizability in real-world experiments.
Updated: 2025-08-14 11:15:08
Categories: cs.RO,cs.AI,cs.LG
Mitigating Exponential Mixed Frequency Growth through Frequency Selection and Dimensional Separation in Quantum Machine Learning
To leverage the potential computational speedup of quantum computing (QC), research in quantum machine learning (QML) has gained increasing prominence. Angle encoding techniques in QML models have been shown to generate truncated Fourier series, offering asymptotically universal function approximation capabilities. By selecting efficient feature maps (FMs) within quantum circuits, one can leverage the exponential growth of Fourier frequencies for improved approximation. In multi-dimensional settings, additional input dimensions induce further exponential scaling via mixed frequencies. In practice, however, quantum models frequently fail at regression tasks. Through two white-box experiments, we show that such failures can occur even when the relevant frequencies are present, due to an insufficient number of trainable parameters. In order to mitigate the double-exponential parameter growth resulting from double-exponentially growing frequencies, we propose frequency selection and dimensional separation as techniques to constrain the number of parameters, thereby improving trainability. By restricting the QML model to essential frequencies and permitting mixed frequencies only among feature dimensions with known interdependence, we expand the set of tractable problems on current hardware. We demonstrate the reduced parameter requirements by fitting two white-box functions with known frequency spectrum and dimensional interdependencies that could not be fitted with the default methods. The reduced parameter requirements permit us to perform training on a noisy quantum simulator and to demonstrate inference on real quantum hardware.
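The benefit of frequency selection has a simple classical analogue. The sketch below is not a quantum circuit; it only illustrates the parameter-count argument: when the target's frequency spectrum is known and sparse, a truncated Fourier model restricted to those frequencies fits it with a handful of parameters instead of a dense (exponentially growing) frequency grid.

```python
# Classical toy illustrating frequency selection: fit a target with a known
# sparse spectrum using only the selected Fourier frequencies.
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)
target = 0.7 * np.sin(3 * x) + 0.2 * np.cos(5 * x)  # known sparse spectrum {3, 5}

def fourier_design(x, freqs):
    """Design matrix: a bias column plus sin(kx), cos(kx) for each selected k."""
    cols = [np.ones_like(x)]
    for k in freqs:
        cols.append(np.sin(k * x))
        cols.append(np.cos(k * x))
    return np.stack(cols, axis=1)

A = fourier_design(x, freqs=[3, 5])  # 5 parameters instead of a dense grid
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
residual = np.max(np.abs(A @ coef - target))
assert residual < 1e-8  # the selected frequencies suffice for an exact fit
```

Restricting mixed frequencies to feature dimensions with known interdependence plays the same role in higher dimensions: the design matrix grows with the selected cross-terms rather than with all of them.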
Updated: 2025-08-14 11:10:07
Categories: quant-ph,cs.LG
Projected Coupled Diffusion for Test-Time Constrained Joint Generation
Modifications to test-time sampling have emerged as an important extension to diffusion algorithms, with the goal of biasing the generative process to achieve a given objective without having to retrain the entire diffusion model. However, generating jointly correlated samples from multiple pre-trained diffusion models while simultaneously enforcing task-specific constraints without costly retraining has remained challenging. To this end, we propose Projected Coupled Diffusion (PCD), a novel test-time framework for constrained joint generation. PCD introduces a coupled guidance term into the generative dynamics to encourage coordination between diffusion models and incorporates a projection step at each diffusion step to enforce hard constraints. Empirically, we demonstrate the effectiveness of PCD in application scenarios of image-pair generation, object manipulation, and multi-robot motion planning. Our results show improved coupling effects and guaranteed constraint satisfaction without incurring excessive computational costs.
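The two ingredients the abstract attributes to PCD, a coupling term in the generative dynamics and a per-step projection enforcing hard constraints, can be shown schematically in a toy setting. All dynamics below are simplified stand-ins (simple drifts toward each model's mode rather than learned diffusion score updates), with a hypothetical constraint x + y = 1.

```python
# Schematic toy of Projected Coupled Diffusion's two ingredients: a coupled
# guidance term between two samplers plus a projection step after every update
# enforcing a hard constraint (here x + y = 1). Not the paper's algorithm.
import numpy as np

def project_sum_to_one(x, y):
    """Euclidean projection of (x, y) onto the line x + y = 1."""
    shift = (1.0 - (x + y)) / 2.0
    return x + shift, y + shift

def coupled_step(x, y, lam=0.1, step=0.1):
    # stand-in "denoising" drift toward each model's own mode (0 and 2)
    x = x - step * (x - 0.0) - lam * (x - y)   # coupling guidance term
    y = y - step * (y - 2.0) - lam * (y - x)
    return project_sum_to_one(x, y)            # hard-constraint projection

x, y = 3.0, -3.0
for _ in range(50):
    x, y = coupled_step(x, y)

assert abs((x + y) - 1.0) < 1e-9  # constraint satisfied exactly after projection
```

The point of projecting at every step, rather than once at the end, is that the constraint is guaranteed to hold throughout while the coupled dynamics keep the two samples coordinated.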
Updated: 2025-08-14 11:05:31
Categories: cs.LG
Diversity First, Quality Later: A Two-Stage Assumption for Language Model Alignment
The alignment of language models (LMs) with human preferences is critical for building reliable AI systems. The problem is typically framed as optimizing an LM policy to maximize the expected reward that reflects human preferences. Recently, Direct Preference Optimization (DPO) was proposed as an LM alignment method that directly optimizes the policy from static preference data, and was further improved by incorporating on-policy sampling (i.e., preference candidates generated during the training loop) for better LM alignment. However, we show that on-policy data is not always optimal, with systematic effectiveness differences emerging between static and on-policy preference candidates. For example, on-policy data can result in a 3$\times$ effectiveness compared with static data for Llama-3, and a 0.4$\times$ effectiveness for Zephyr. To explain the phenomenon, we propose the alignment stage assumption, which divides the alignment process into two distinct stages: the preference injection stage, which benefits from diverse data, and the preference fine-tuning stage, which favors high-quality data. Through theoretical and empirical analysis, we characterize these stages and propose an effective algorithm to identify the boundaries between them. We perform experiments on 5 models (Llama, Zephyr, Phi-2, Qwen, Pythia) and 2 alignment methods (DPO, SLiC-HF) to show the generalizability of the alignment stage assumption and boundary measurement.
Updated: 2025-08-14 11:05:18
Categories: cs.AI,cs.CL
Med-GLIP: Advancing Medical Language-Image Pre-training with Large-scale Grounded Dataset
Medical image grounding aims to align natural language phrases with specific regions in medical images, serving as a foundational task for intelligent diagnosis, visual question answering (VQA), and automated report generation (MRG). However, existing research is constrained by limited modality coverage, coarse-grained annotations, and the absence of a unified, generalizable grounding framework. To address these challenges, we construct a large-scale medical grounding dataset Med-GLIP-5M comprising over 5.3 million region-level annotations across seven imaging modalities, covering diverse anatomical structures and pathological findings. The dataset supports both segmentation and grounding tasks with hierarchical region labels, ranging from organ-level boundaries to fine-grained lesions. Based on this foundation, we propose Med-GLIP, a modality-aware grounding framework trained on Med-GLIP-5M. Rather than relying on explicitly designed expert modules, Med-GLIP implicitly acquires hierarchical semantic understanding from diverse training data -- enabling it to recognize multi-granularity structures, such as distinguishing lungs from pneumonia lesions. Extensive experiments demonstrate that Med-GLIP consistently outperforms state-of-the-art baselines across multiple grounding benchmarks. Furthermore, integrating its spatial outputs into downstream tasks, including medical VQA and report generation, leads to substantial performance gains. Our dataset will be released soon.
Updated: 2025-08-14 11:02:38
Categories: cs.CV,cs.AI
Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees
Two-stage Learning-to-Defer (L2D) enables optimal task delegation by assigning each input to either a fixed main model or one of several offline experts, supporting reliable decision-making in complex, multi-agent environments. However, existing L2D frameworks assume clean inputs and are vulnerable to adversarial perturbations that can manipulate query allocation--causing costly misrouting or expert overload. We present the first comprehensive study of adversarial robustness in two-stage L2D systems. We introduce two novel attack strategies--untargeted and targeted--which respectively disrupt optimal allocations or force queries to specific agents. To defend against such threats, we propose SARD, a convex learning algorithm built on a family of surrogate losses that are provably Bayes-consistent and $(\mathcal{R}, \mathcal{G})$-consistent. These guarantees hold across classification, regression, and multi-task settings. Empirical results demonstrate that SARD significantly improves robustness under adversarial attacks while maintaining strong clean performance, marking a critical step toward secure and trustworthy L2D deployment.
Updated: 2025-08-14 11:00:13
Categories: stat.ML,cs.LG
An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach
Phishing email is a serious cyber threat that tries to deceive users by sending false emails with the intention of stealing confidential information or causing financial harm. Attackers, often posing as trustworthy entities, exploit technological advancements and sophistication to make detection and prevention of phishing more challenging. Despite extensive academic research, phishing detection remains an ongoing and formidable challenge in the cybersecurity landscape. Large Language Models (LLMs) and Masked Language Models (MLMs) possess immense potential to offer innovative solutions to address long-standing challenges. In this research paper, we present an optimized, fine-tuned transformer-based DistilBERT model designed for the detection of phishing emails. In the detection process, we work with a phishing email dataset and utilize the preprocessing techniques to clean and solve the imbalance class issues. Through our experiments, we found that our model effectively achieves high accuracy, demonstrating its capability to perform well. Finally, we demonstrate our fine-tuned model using Explainable-AI (XAI) techniques such as Local Interpretable Model-Agnostic Explanations (LIME) and Transformer Interpret to explain how our model makes predictions in the context of text classification for phishing emails.
Updated: 2025-08-14 10:59:03
Categories: cs.LG,cs.AI,cs.CR
A Two-Stage Learning-to-Defer Approach for Multi-Task Learning
The Two-Stage Learning-to-Defer (L2D) framework has been extensively studied for classification and, more recently, regression tasks. However, many real-world applications require solving both tasks jointly in a multi-task setting. We introduce a novel Two-Stage L2D framework for multi-task learning that integrates classification and regression through a unified deferral mechanism. Our method leverages a two-stage surrogate loss family, which we prove to be both Bayes-consistent and $(\mathcal{G}, \mathcal{R})$-consistent, ensuring convergence to the Bayes-optimal rejector. We derive explicit consistency bounds tied to the cross-entropy surrogate and the $L_1$-norm of agent-specific costs, and extend minimizability gap analysis to the multi-expert two-stage regime. We also make explicit how shared representation learning -- commonly used in multi-task models -- affects these consistency guarantees. Experiments on object detection and electronic health record analysis demonstrate the effectiveness of our approach and highlight the limitations of existing L2D methods in multi-task scenarios.
Updated: 2025-08-14 10:55:42
Categories: stat.ML,cs.HC,cs.LG
VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models
Popular PEFT methods reduce trainable parameter count for fine-tuning by parameterizing new low-rank or sparse trainable weights in parallel to the frozen pre-trained weights $W$. However, these weights are trained from scratch, and there exists a performance gap between these methods and full fine-tuning, especially in low-budget settings. We introduce VectorFit, a new way of parameterization that efficiently utilizes the existing knowledge embedded in $W$ by adaptively training their singular vectors and biases. We show that utilizing the structural and transformational properties of $W$ in this way can lead to high-rank incremental weight matrices $\Delta W$, comparable to that of full fine-tuning. VectorFit delivers superior results with 9$\boldsymbol\times$ fewer trainable parameters than the leading PEFT methods. Through comprehensive experiments across 19 datasets covering a wide range of language and vision tasks such as natural language understanding and generation, question answering, image classification, and image generation, we demonstrate that VectorFit surpasses baselines in terms of performance as a function of parameter-efficiency.
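The claim that updating singular values of the frozen $W$ yields a high-rank $\Delta W$, in contrast to rank-$r$ LoRA-style products, is easy to check numerically. The sketch below is a toy verification of that rank argument only (sizes and perturbations are arbitrary), not VectorFit's training procedure.

```python
# Numerical illustration of the abstract's rank claim: perturbing the singular
# values of a frozen W gives a high-rank increment dW, while a LoRA-style
# product of thin matrices is capped at rank r. Toy sizes; not the method itself.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
U, s, Vt = np.linalg.svd(W)

# VectorFit-style update direction: a (toy, random) tweak of the singular values
s_new = s * (1.0 + 0.1 * rng.standard_normal(s.shape))
dW_vectorfit = (U * s_new) @ Vt - W  # equals U @ diag(s_new - s) @ Vt

# LoRA-style update: product of two thin matrices, rank at most 4
A, B = rng.standard_normal((64, 4)), rng.standard_normal((4, 64))
dW_lora = A @ B

assert np.linalg.matrix_rank(dW_vectorfit) > 32  # high-rank increment
assert np.linalg.matrix_rank(dW_lora) <= 4       # rank capped by the bottleneck
```

Since `dW_vectorfit` is $U\,\mathrm{diag}(s_{\text{new}}-s)\,V^\top$, its rank equals the number of singular values that moved, which is why tuning these vectors can reach increments comparable to full fine-tuning.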
Updated: 2025-08-14 10:49:01
Categories: cs.LG,cs.AI
Nonlocal Monte Carlo via Reinforcement Learning
Optimizing or sampling complex cost functions of combinatorial optimization problems is a longstanding challenge across disciplines and applications. When employing the family of conventional algorithms based on Markov Chain Monte Carlo (MCMC) such as simulated annealing or parallel tempering, one assumes homogeneous (equilibrium) temperature profiles across input. This instance-independent approach was shown to be ineffective for the hardest benchmarks near a computational phase transition when the so-called overlap-gap-property holds. In these regimes conventional MCMC struggles to unfreeze rigid variables, escape suboptimal basins of attraction, and sample high-quality and diverse solutions. In order to mitigate these challenges, Nonequilibrium Nonlocal Monte Carlo (NMC) algorithms were proposed that leverage inhomogeneous temperature profiles thereby accelerating exploration of the configuration space without compromising its exploitation. Here, we employ deep reinforcement learning (RL) to train the nonlocal transition policies of NMC which were previously designed phenomenologically. We demonstrate that the resulting solver can be trained solely by observing energy changes of the configuration space exploration as RL rewards and the local minimum energy landscape geometry as RL states. We further show that the trained policies improve upon the standard MCMC-based and nonlocal simulated annealing on hard uniform random and scale-free random 4-SAT benchmarks in terms of residual energy, time-to-solution, and diversity of solutions metrics.
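The distinction between homogeneous and inhomogeneous temperature profiles can be made concrete with a per-variable Metropolis sweep. This is a generic toy on a small Ising chain (all values hypothetical), not the paper's RL-trained policy: the only change from conventional MCMC is that each variable flip uses its own temperature, so a "hot spot" can be placed over frozen variables.

```python
# Toy contrast: a Metropolis sweep where temps[i] is a per-variable temperature,
# the inhomogeneous-profile idea behind NMC. A uniform temps list recovers
# conventional homogeneous MCMC. Purely illustrative values below.
import math
import random

def metropolis_sweep(spins, temps, energy_delta):
    """One sweep; the flip of spin i is accepted at temperature temps[i]."""
    for i in range(len(spins)):
        dE = energy_delta(spins, i)
        if dE <= 0 or random.random() < math.exp(-dE / temps[i]):
            spins[i] *= -1
    return spins

def delta_e(spins, i):
    """Energy cost of flipping spin i in a ferromagnetic open chain."""
    left = spins[i - 1] if i > 0 else 0
    right = spins[i + 1] if i < len(spins) - 1 else 0
    return 2 * spins[i] * (left + right)

random.seed(0)
spins = [1, 1, 1, 1, -1, -1, -1, -1]
# Hot spot over the rigid domain wall, cold elsewhere: the kind of
# inhomogeneous profile NMC uses to unfreeze stuck variables.
temps = [0.1, 0.1, 0.1, 2.0, 2.0, 0.1, 0.1, 0.1]
for _ in range(100):
    spins = metropolis_sweep(spins, temps, delta_e)
assert all(s in (-1, 1) for s in spins)  # states remain valid Ising spins
```

In the paper the profile is not hand-placed as here; the RL policy chooses where to apply nonlocal, high-temperature moves from the local energy-landscape geometry.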
Updated: 2025-08-14 10:45:44
Categories: cs.LG,cond-mat.dis-nn
Virtual Sensing for Solder Layer Degradation and Temperature Monitoring in IGBT Modules
Monitoring the degradation state of Insulated Gate Bipolar Transistor (IGBT) modules is essential for ensuring the reliability and longevity of power electronic systems, especially in safety-critical and high-performance applications. However, direct measurement of key degradation indicators - such as junction temperature, solder fatigue or delamination - remains challenging due to the physical inaccessibility of internal components and the harsh environment. In this context, machine learning-based virtual sensing offers a promising alternative by bridging the gap from feasible sensor placement to the relevant but inaccessible locations. This paper explores the feasibility of estimating the degradation state of solder layers, and the corresponding full temperature maps based on a limited number of physical sensors. Based on synthetic data of a specific degradation mode, we obtain a high accuracy in the estimation of the degraded solder area (1.17% mean absolute error), and are able to reproduce the surface temperature of the IGBT with a maximum relative error of 4.56% (corresponding to an average relative error of 0.37%).
Updated: 2025-08-14 10:40:46
Categories: physics.comp-ph,cs.CE,cs.LG,cs.SY,eess.SY
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
Current multimodal large language models (MLLMs) still face significant challenges in complex visual tasks (e.g., spatial understanding, fine-grained perception). Prior methods have tried to incorporate visual reasoning, however, they fail to leverage attention correction with spatial cues to iteratively refine their focus on prompt-relevant regions. In this paper, we introduce SIFThinker, a spatially-aware "think-with-images" framework that mimics human visual perception. Specifically, SIFThinker enables attention correcting and image region focusing by interleaving depth-enhanced bounding boxes and natural language. Our contributions are twofold: First, we introduce a reverse-expansion-forward-inference strategy that facilitates the generation of interleaved image-text chains of thought for process-level supervision, which in turn leads to the construction of the SIF-50K dataset. Besides, we propose GRPO-SIF, a reinforced training paradigm that integrates depth-informed visual grounding into a unified reasoning pipeline, teaching the model to dynamically correct and focus on prompt-relevant regions. Extensive experiments demonstrate that SIFThinker outperforms state-of-the-art methods in spatial understanding and fine-grained visual perception, while maintaining strong general capabilities, highlighting the effectiveness of our method. Code: https://github.com/zhangquanchen/SIFThinker.
Updated: 2025-08-14 10:34:22
Categories: cs.CV,cs.AI,I.2.10
Exploring the Application of Visual Question Answering (VQA) for Classroom Activity Monitoring
Classroom behavior monitoring is a critical aspect of educational research, with significant implications for student engagement and learning outcomes. Recent advancements in Visual Question Answering (VQA) models offer promising tools for automatically analyzing complex classroom interactions from video recordings. In this paper, we investigate the applicability of several state-of-the-art open-source VQA models, including LLaMA2, LLaMA3, QWEN3, and NVILA, in the context of classroom behavior analysis. To facilitate rigorous evaluation, we introduce our BAV-Classroom-VQA dataset derived from real-world classroom video recordings at the Banking Academy of Vietnam. We present the methodology for data collection, annotation, and benchmark the performance of the selected VQA models on this dataset. Our initial experimental results demonstrate that all four models achieve promising performance levels in answering behavior-related visual questions, showcasing their potential in future classroom analytics and intervention systems.
Updated: 2025-08-14 10:32:44
Fields: cs.CV,cs.AI
Yan: Foundational Interactive Video Generation
We present Yan, a foundational framework for interactive video generation, covering the entire pipeline from simulation and generation to editing. Specifically, Yan comprises three core modules. AAA-level Simulation: We design a highly-compressed, low-latency 3D-VAE coupled with a KV-cache-based shift-window denoising inference process, achieving real-time 1080P/60FPS interactive simulation. Multi-Modal Generation: We introduce a hierarchical autoregressive captioning method that injects game-specific knowledge into open-domain multi-modal video diffusion models (VDMs), and then transform the VDM into a frame-wise, action-controllable, real-time infinite interactive video generator. Notably, when the textual and visual prompts are sourced from different domains, the model demonstrates strong generalization, allowing it to blend and compose the style and mechanics across domains flexibly according to user prompts. Multi-Granularity Editing: We propose a hybrid model that explicitly disentangles interactive mechanics simulation from visual rendering, enabling multi-granularity video content editing during interaction through text. Collectively, Yan offers an integration of these modules, pushing interactive video generation beyond isolated capabilities toward a comprehensive AI-driven interactive creation paradigm, paving the way for the next generation of creative tools, media, and entertainment. The project page is: https://greatx3.github.io/Yan/.
Updated: 2025-08-14 10:26:51
Fields: cs.CV,cs.AI
Codes on any Cayley Graph have an Interactive Oracle Proof of Proximity
Interactive Oracle Proofs of Proximity (IOPPs) are at the heart of code-based SNARKs, a family of zero-knowledge protocols. The first and most famous one is the FRI protocol [BBHR18a], which efficiently tests proximity to Reed-Solomon codes. This paper generalizes the flowering IOPP introduced in [DMR25] for some specific (2, n)-regular Tanner codes to a much broader variety of codes: any code with symbols indexed on the edges of a Cayley graph. The flowering protocol of [DMR25] had a soundness parameter much lower than the FRI protocol [BCI+23], and complexity parameters that could compete with the FRI [BBHR18a]. The lower soundness and the absence of restriction on the base field may lead to other practical speedups; however, the codes considered in [DMR25] have an o(1) minimum distance. The generalization proposed in this paper preserves the soundness parameter with a slight decrease of the complexity parameters, while allowing the protocol to be applied to codes with constant rate and constant minimum distance, thanks to the good expansion properties of some families of Cayley graphs.
Updated: 2025-08-14 10:25:17
Fields: cs.CR
Multi-Sample Anti-Aliasing and Constrained Optimization for 3D Gaussian Splatting
Recent advances in 3D Gaussian splatting have significantly improved real-time novel view synthesis, yet insufficient geometric constraints during scene optimization often result in blurred reconstructions of fine-grained details, particularly in regions with high-frequency textures and sharp discontinuities. To address this, we propose a comprehensive optimization framework integrating multi-sample anti-aliasing (MSAA) with dual geometric constraints. Our system computes pixel colors through adaptive blending of quadruple subsamples, effectively reducing aliasing artifacts in high-frequency components. The framework introduces two constraints: (a) an adaptive weighting strategy that prioritizes under-reconstructed regions through dynamic gradient analysis, and (b) gradient differential constraints enforcing geometric regularization at object boundaries. This targeted optimization enables the model to allocate computational resources preferentially to critical regions requiring refinement while maintaining global consistency. Extensive experimental evaluations across multiple benchmarks demonstrate that our method achieves state-of-the-art performance in detail preservation, particularly in preserving high-frequency textures and sharp discontinuities, while maintaining real-time rendering efficiency. Quantitative metrics and perceptual studies confirm statistically significant improvements over baseline approaches in both structural similarity (SSIM) and perceptual quality (LPIPS).
Updated: 2025-08-14 10:14:36
Fields: cs.CV,cs.AI
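For readers unfamiliar with the underlying technique, here is a rough, hypothetical illustration of the multi-sample idea (not the authors' renderer): a pixel's color is computed by averaging four sub-pixel samples of the scene instead of one center sample, which softens aliasing at sharp edges. The scene function and 2x2 sample pattern below are illustrative assumptions.

```python
def render_pixel(scene, x, y, msaa=True):
    """Average four sub-pixel samples (a 2x2 grid) instead of one center sample."""
    if not msaa:
        return scene(x + 0.5, y + 0.5)
    offsets = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
    return sum(scene(x + dx, y + dy) for dx, dy in offsets) / 4.0

# Hypothetical scene: a hard vertical edge at x = 1.5 (white left, black right).
edge = lambda x, y: 1.0 if x < 1.5 else 0.0

# Single-sample rendering snaps the boundary pixel to one side...
print(render_pixel(edge, 1, 0, msaa=False))  # 0.0
# ...while 4x sampling blends it, suppressing the aliased hard edge.
print(render_pixel(edge, 1, 0, msaa=True))   # 0.5
```

The paper's contribution layers adaptive blending and geometric constraints on top of this basic supersampling principle.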
Advances in Logic-Based Entity Resolution: Enhancing ASPEN with Local Merges and Optimality Criteria
In this paper, we present ASPEN+, which extends an existing ASP-based system, ASPEN, for collective entity resolution with two important functionalities: support for local merges and new optimality criteria for preferred solutions. Indeed, ASPEN only supports so-called global merges of entity-referring constants (e.g. author ids), in which all occurrences of matched constants are treated as equivalent and merged accordingly. However, it has been argued that when resolving data values, local merges are often more appropriate, as e.g. some instances of 'J. Lee' may refer to 'Joy Lee', while others should be matched with 'Jake Lee'. In addition to allowing such local merges, ASPEN+ offers new optimality criteria for selecting solutions, such as minimizing rule violations or maximizing the number of rules supporting a merge. Our main contributions are thus (1) the formalisation and computational analysis of various notions of optimal solution, and (2) an extensive experimental evaluation on real-world datasets, demonstrating the effect of local merges and the new optimality criteria on both accuracy and runtime.
Updated: 2025-08-14 10:05:56
Fields: cs.DB,cs.AI
PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning
Existing tool-augmented agentic systems are limited in the real world by (i) black-box reasoning steps that undermine trust of decision-making and pose safety risks, (ii) poor multimodal integration, which is inherently critical for healthcare tasks, and (iii) rigid and computationally inefficient agentic pipelines. We introduce PASS (Probabilistic Agentic Supernet Sampling), the first multimodal framework to address these challenges in the context of Chest X-Ray (CXR) reasoning. PASS adaptively samples agentic workflows over a multi-tool graph, yielding decision paths annotated with interpretable probabilities. Given the complex CXR reasoning task with multimodal medical data, PASS leverages its learned task-conditioned distribution over the agentic supernet. Thus, it adaptively selects the most suitable tool at each supernet layer, offering probability-annotated trajectories for post-hoc audits and directly enhancing medical AI safety. PASS also continuously compresses salient findings into an evolving personalized memory, while dynamically deciding whether to deepen its reasoning path or invoke an early exit for efficiency. To optimize a Pareto frontier balancing performance and cost, we design a novel three-stage training procedure, including expert knowledge warm-up, contrastive path-ranking, and cost-aware reinforcement learning. To facilitate rigorous evaluation, we introduce CAB-E, a comprehensive benchmark for multi-step, safety-critical, free-form CXR reasoning. Experiments across various benchmarks validate that PASS significantly outperforms strong baselines in multiple metrics (e.g., accuracy, AUC, LLM-J.) while balancing computational costs, pushing a new paradigm shift towards interpretable, adaptive, and multimodal medical agentic systems.
Updated: 2025-08-14 10:03:47
Fields: cs.AI,cs.LG
MIRRAMS: Learning Robust Tabular Models under Unseen Missingness Shifts
The presence of missing values often reflects variations in data collection policies, which may shift across time or locations, even when the underlying feature distribution remains stable. Such shifts in the missingness distribution between training and test inputs pose a significant challenge to achieving robust predictive performance. In this study, we propose a novel deep learning framework designed to address this challenge, particularly in the common yet challenging scenario where the test-time dataset is unseen. We begin by introducing a set of mutual information-based conditions, called MI robustness conditions, which guide the prediction model to extract label-relevant information. This promotes robustness against distributional shifts in missingness at test-time. To enforce these conditions, we design simple yet effective loss terms that collectively define our final objective, called MIRRAMS. Importantly, our method does not rely on any specific missingness assumption such as MCAR, MAR, or MNAR, making it applicable to a broad range of scenarios. Furthermore, it can naturally extend to cases where labels are also missing in training data, by generalizing the framework to a semi-supervised learning setting. Extensive experiments across multiple benchmark tabular datasets demonstrate that MIRRAMS consistently outperforms existing state-of-the-art baselines and maintains stable performance under diverse missingness conditions. Moreover, it achieves superior performance even in fully observed settings, highlighting MIRRAMS as a powerful, off-the-shelf framework for general-purpose tabular learning.
Updated: 2025-08-14 09:57:08
Fields: stat.ML,cs.LG
A Unified Multi-Agent Framework for Universal Multimodal Understanding and Generation
Real-world multimodal applications often require any-to-any capabilities, enabling both understanding and generation across modalities including text, image, audio, and video. However, integrating the strengths of autoregressive language models (LLMs) for reasoning and diffusion models for high-fidelity generation remains challenging. Existing approaches rely on rigid pipelines or tightly coupled architectures, limiting flexibility and scalability. We propose MAGUS (Multi-Agent Guided Unified Multimodal System), a modular framework that unifies multimodal understanding and generation via two decoupled phases: Cognition and Deliberation. MAGUS enables symbolic multi-agent collaboration within a shared textual workspace. In the Cognition phase, three role-conditioned multimodal LLM agents - Perceiver, Planner, and Reflector - engage in collaborative dialogue to perform structured understanding and planning. The Deliberation phase incorporates a Growth-Aware Search mechanism that orchestrates LLM-based reasoning and diffusion-based generation in a mutually reinforcing manner. MAGUS supports plug-and-play extensibility, scalable any-to-any modality conversion, and semantic alignment - all without the need for joint training. Experiments across multiple benchmarks, including image, video, and audio generation, as well as cross-modal instruction following, demonstrate that MAGUS outperforms strong baselines and state-of-the-art systems. Notably, on the MME benchmark, MAGUS surpasses the powerful closed-source model GPT-4o.
Updated: 2025-08-14 09:52:51
Fields: cs.LG,cs.AI,cs.MA
AlDBaran: Towards Blazingly Fast State Commitments for Blockchains
The fundamental basis for maintaining integrity within contemporary blockchain systems is provided by authenticated databases. Our analysis indicates that a significant portion of the approaches applied in this domain fail to sufficiently meet the stringent requirements of systems processing transactions at rates of multi-million TPS. AlDBaran signifies a substantial advancement in authenticated databases. By eliminating disk I/O operations from the critical path, implementing prefetching strategies, and refining the update mechanism of the Merkle tree, we have engineered an authenticated data structure capable of handling state updates efficiently at a network throughput of 50 Gbps. This throughput capacity significantly surpasses any empirically documented blockchain throughput, guaranteeing the ability of even the most high-throughput blockchains to generate state commitments effectively. AlDBaran provides support for historical state proofs, which facilitates a wide array of novel applications. For instance, the deployment of AlDBaran could enable blockchains that do not currently support state commitments to offer functionalities for light clients and/or implement rollups. When benchmarked against alternative authenticated data structure projects, AlDBaran exhibits superior performance and simplicity. In particular, AlDBaran achieves speeds of approximately 48 million updates per second using an identical machine configuration. This characteristic renders AlDBaran an attractive solution for resource-limited environments, as its historical data capabilities can be modularly isolated (and deactivated), which further enhances performance. On consumer-level portable hardware, it achieves approximately 8 million updates/s in an in-memory setting and 5 million updates/s with snapshots at sub-second intervals, illustrating compelling and cost-effective scalability.
Updated: 2025-08-14 09:52:15
Fields: cs.CR,cs.GT
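As background for the abstract above, the core operation being optimized is incremental Merkle-tree state commitment: updating one leaf only rehashes the path to the root, O(log n) per state update. The toy structure below is a minimal sketch of that principle; AlDBaran's actual data structure, caching, and prefetching are far more involved.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class MerkleState:
    """Toy binary Merkle tree committing to a fixed, power-of-two-sized state array."""
    def __init__(self, values):
        n = len(values)
        assert n and n & (n - 1) == 0, "power-of-two number of leaves"
        self.levels = [[h(v) for v in values]]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])

    def root(self) -> bytes:
        return self.levels[-1][0]

    def update(self, index: int, value: bytes):
        # Only the leaf-to-root path is rehashed: O(log n) hashes per state update.
        self.levels[0][index] = h(value)
        for lvl in range(1, len(self.levels)):
            index //= 2
            left = self.levels[lvl - 1][2 * index]
            right = self.levels[lvl - 1][2 * index + 1]
            self.levels[lvl][index] = h(left + right)

state = MerkleState([b"a", b"b", b"c", b"d"])
old_root = state.root()
state.update(2, b"C")
assert state.root() != old_root   # the commitment reflects the new state
state.update(2, b"c")
assert state.root() == old_root   # and is deterministic
```

An incremental update must agree with rebuilding the tree from scratch; that is the invariant a high-throughput implementation has to preserve while batching and caching.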
Reverse Physician-AI Relationship: Full-process Clinical Diagnosis Driven by a Large Language Model
Full-process clinical diagnosis in the real world encompasses the entire diagnostic workflow that begins with only an ambiguous chief complaint. While artificial intelligence (AI), particularly large language models (LLMs), is transforming clinical diagnosis, its role remains largely that of an assistant to physicians. This AI-assisted working pattern means that AI can only answer specific medical questions at certain points within the diagnostic process, but lacks the ability to drive the entire diagnostic process starting from an ambiguous complaint, which still relies heavily on human physicians. This gap limits AI's ability to fully reduce physicians' workload and enhance diagnostic efficiency. To address this, we propose a paradigm shift that reverses the relationship between physicians and AI: repositioning AI as the primary director, with physicians serving as its assistants. We therefore present DxDirector-7B, an LLM endowed with advanced deep thinking capabilities, enabling it to drive the full-process diagnosis with minimal physician involvement. Furthermore, DxDirector-7B establishes a robust accountability framework for misdiagnoses, delineating responsibility between AI and human physicians. In evaluations across rare, complex, and real-world cases under the full-process diagnosis setting, DxDirector-7B not only achieves significantly superior diagnostic accuracy but also substantially reduces physician workload compared with state-of-the-art medical LLMs as well as general-purpose LLMs. Fine-grained analyses across multiple clinical departments and tasks validate its efficacy, with expert evaluations indicating its potential to serve as a viable substitute for medical specialists. These findings mark a new era where AI, traditionally a physicians' assistant, now drives the entire diagnostic process to drastically reduce physicians' workload, indicating an efficient and accurate diagnostic solution.
Updated: 2025-08-14 09:51:20
Fields: cs.AI,cs.CE,cs.CL
Contrastive ECOC: Learning Output Codes for Adversarial Defense
Although one-hot encoding is commonly used for multiclass classification, it is not always the most effective encoding mechanism. Error Correcting Output Codes (ECOC) address multiclass classification by mapping each class to a unique codeword used as a label. Traditional ECOC methods rely on manually designed or randomly generated codebooks, which are labor-intensive and may yield suboptimal, dataset-agnostic results. This paper introduces three models for automated codebook learning based on contrastive learning, allowing codebooks to be learned directly and adaptively from data. Across four datasets, our proposed models demonstrate superior robustness to adversarial attacks compared to two baselines. The source code is available at https://github.com/YuChou20/Automated-Codebook-Learning-with-Error-Correcting-Output-Code-Technique.
Updated: 2025-08-14 09:50:50
Fields: cs.LG,cs.AI,cs.IT,math.IT
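For readers unfamiliar with ECOC, the basic mechanism (independent of this paper's learned codebooks) looks like the sketch below: each class gets a binary codeword, per-bit classifiers predict the bits, and decoding picks the class with the nearest codeword in Hamming distance, so a few flipped bits can be corrected. The fixed codebook here is a hypothetical example; the paper instead learns the codebook contrastively from data.

```python
# Hypothetical hand-written codebook for 4 classes (minimum pairwise
# Hamming distance 3, so any single bit error is correctable).
CODEBOOK = {
    0: (0, 0, 0, 0, 0),
    1: (0, 1, 1, 0, 1),
    2: (1, 0, 1, 1, 0),
    3: (1, 1, 0, 1, 1),
}

def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(predicted_bits):
    """Map a possibly-corrupted bit vector to the class with the nearest codeword."""
    return min(CODEBOOK, key=lambda c: hamming(CODEBOOK[c], predicted_bits))

# One bit-classifier misfires on class 2's codeword (last bit flipped)...
assert decode((1, 0, 1, 1, 1)) == 2   # ...yet decoding still recovers class 2.
```

The redundancy in the codewords is exactly what the abstract ties to adversarial robustness: an attack must flip several bit-classifiers at once, not just one logit.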
On the Complexity-Faithfulness Trade-off of Gradient-Based Explanations
ReLU networks, while prevalent for visual data, have sharp transitions, sometimes relying on individual pixels for predictions, making vanilla gradient-based explanations noisy and difficult to interpret. Existing methods, such as GradCAM, smooth these explanations by producing surrogate models at the cost of faithfulness. We introduce a unifying spectral framework to systematically analyze and quantify smoothness, faithfulness, and their trade-off in explanations. Using this framework, we quantify and regularize the contribution of ReLU networks to high-frequency information, providing a principled approach to identifying this trade-off. Our analysis characterizes how surrogate-based smoothing distorts explanations, leading to an ``explanation gap'' that we formally define and measure for different post-hoc methods. Finally, we validate our theoretical findings across different design choices, datasets, and ablations.
Updated: 2025-08-14 09:49:07
Fields: cs.LG,cs.AI,cs.CV
Learning State-Space Models of Dynamic Systems from Arbitrary Data using Joint Embedding Predictive Architectures
With the advent of Joint Embedding Predictive Architectures (JEPAs), which appear to be more capable than reconstruction-based methods, this paper introduces a novel technique for creating world models using continuous-time dynamic systems from arbitrary observation data. The proposed method integrates sequence embeddings with neural ordinary differential equations (neural ODEs). It employs loss functions that enforce contractive embeddings and Lipschitz constants in state transitions to construct a well-organized latent state space. The approach's effectiveness is demonstrated through the generation of structured latent state-space models for a simple pendulum system using only image data. This opens up a new avenue for developing more general control algorithms and estimation techniques, with broad applications in robotics.
Updated: 2025-08-14 09:46:11
Fields: cs.LG
SEQ-GPT: LLM-assisted Spatial Query via Example
Contemporary spatial services such as online maps predominantly rely on user queries for location searches. However, the user experience is limited when performing complex tasks, such as searching for a group of locations simultaneously. In this study, we examine the extended scenario known as Spatial Exemplar Query (SEQ), where multiple relevant locations are jointly searched based on user-specified examples. We introduce SEQ-GPT, a spatial query system powered by Large Language Models (LLMs) towards more versatile SEQ search using natural language. The language capabilities of LLMs enable unique interactive operations in the SEQ process, including asking users to clarify query details and dynamically adjusting the search based on user feedback. We also propose a tailored LLM adaptation pipeline that aligns natural language with structured spatial data and queries through dialogue synthesis and multi-model cooperation. SEQ-GPT offers an end-to-end demonstration for broadening spatial search with realistic data and application scenarios.
Updated: 2025-08-14 09:41:55
Fields: cs.AI
Federated Cross-Training Learners for Robust Generalization under Data Heterogeneity
Federated learning benefits from cross-training strategies, which enable models to train on data from distinct sources to improve generalization capability. However, due to inherent differences in data distributions, the optimization goals of local models remain misaligned, and this mismatch continues to manifest as feature space heterogeneity even after cross-training. We argue that knowledge distillation from the personalized view preserves client-specific characteristics and expands the local knowledge base, while distillation from the global view provides consistent semantic anchors that facilitate feature alignment across clients. To achieve this goal, this paper presents a cross-training scheme, termed FedCT, which includes three main modules: the consistency-aware knowledge broadcasting module aims to optimize model assignment strategies, which enhances collaborative advantages between clients and achieves an efficient federated learning process. The multi-view knowledge-guided representation learning module leverages fused prototypical knowledge from both global and local views to enhance the preservation of local knowledge before and after model exchange, as well as to ensure consistency between local and global knowledge. The mixup-based feature augmentation module aggregates rich information to further increase the diversity of feature spaces, which enables the model to better discriminate complex samples. Extensive experiments were conducted on four datasets in terms of performance comparison, ablation study, in-depth analysis and case study. The results demonstrated that FedCT alleviates knowledge forgetting from both local and global views, which enables it to outperform state-of-the-art methods.
Updated: 2025-08-14 09:39:07
Fields: cs.AI
A Market for Accuracy: Classification under Competition
Machine learning models play a key role for service providers looking to gain market share in consumer markets. However, traditional learning approaches do not take into account the existence of additional providers, who compete with each other for consumers. Our work aims to study learning in this market setting, as it affects providers, consumers, and the market itself. We begin by analyzing such markets through the lens of the learning objective, and show that accuracy cannot be the only consideration. We then propose a method for classification under competition, so that a learner can maximize market share in the presence of competitors. We show that our approach benefits the providers as well as the consumers, and find that the timing of market entry and model updates can be crucial. We display the effectiveness of our approach across a range of domains, from simple distributions to noisy datasets, and show that the market as a whole remains stable by converging quickly to an equilibrium.
Updated: 2025-08-14 09:36:30
Fields: cs.LG,cs.GT
Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models
We study contextual dynamic pricing, where a decision maker posts personalized prices based on observable contexts and receives binary purchase feedback indicating whether the customer's valuation exceeds the price. Each valuation is modeled as an unknown latent function of the context, corrupted by independent and identically distributed market noise from an unknown distribution. Relying only on Lipschitz continuity of the noise distribution and bounded valuations, we propose a minimax-optimal algorithm. To accommodate the unknown distribution, our method discretizes the relevant noise range to form a finite set of candidate prices, then applies layered data partitioning to obtain confidence bounds substantially tighter than those derived via the elliptical-potential lemma. A key advantage is that estimation bias in the valuation function cancels when comparing upper confidence bounds, eliminating the need to know the Lipschitz constant. The framework extends beyond linear models to general function classes through offline regression oracles. Our regret analysis depends solely on the oracle's estimation error, typically governed by the statistical complexity of the class. These techniques yield a regret upper bound matching the minimax lower bound up to logarithmic factors. Furthermore, we refine these guarantees under additional structures -- e.g., linear valuation models, second-order smoothness, sparsity, and known noise distribution or observable valuations -- and compare our bounds and assumptions with prior dynamic-pricing methods. Finally, numerical experiments corroborate the theory and show clear improvements over benchmark methods.
Updated: 2025-08-14 09:34:22
Fields: cs.LG,stat.ML
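The discretize-then-learn idea in the abstract above can be sketched in a few lines: post prices from a finite candidate grid, observe only binary buy/no-buy feedback, and favor prices with optimistic revenue estimates. This is a plain UCB-style toy, not the paper's layered partitioning or its tighter confidence bounds, and the uniform demand model is a hypothetical stand-in.

```python
import math
import random

def ucb_pricing(valuation, prices, rounds, seed=0):
    """Learn a good posted price from binary purchase feedback over a price grid."""
    rng = random.Random(seed)
    counts = [0] * len(prices)      # times each candidate price was posted
    revenue = [0.0] * len(prices)   # revenue accumulated at each candidate price
    for t in range(1, rounds + 1):
        if 0 in counts:             # post every candidate price at least once
            i = counts.index(0)
        else:                       # then pick the best upper confidence bound
            i = max(range(len(prices)),
                    key=lambda j: revenue[j] / counts[j]
                    + prices[j] * math.sqrt(2 * math.log(t) / counts[j]))
        bought = valuation(rng) >= prices[i]   # binary feedback: did they buy?
        counts[i] += 1
        revenue[i] += prices[i] if bought else 0.0
    return prices[max(range(len(prices)), key=lambda j: counts[j])]

# Hypothetical market: valuations uniform on [0, 1]. Expected revenue p*(1-p)
# peaks at p = 0.5, so the learner should settle near that candidate price.
best_price = ucb_pricing(lambda rng: rng.random(),
                         prices=[0.1, 0.3, 0.5, 0.7, 0.9], rounds=20000)
print(best_price)
```

The paper's analysis shows how fine the grid must be and how to build confidence bounds whose bias cancels, achieving minimax-optimal regret without knowing the noise distribution.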
Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
We introduce an output layer for neural networks that ensures satisfaction of convex constraints. Our approach, $\Pi$net, leverages operator splitting for rapid and reliable projections in the forward pass, and the implicit function theorem for backpropagation. We deploy $\Pi$net as a feasible-by-design optimization proxy for parametric constrained optimization problems and obtain modest-accuracy solutions faster than traditional solvers when solving a single problem, and significantly faster for a batch of problems. We surpass state-of-the-art learning approaches in terms of training time, solution quality, and robustness to hyperparameter tuning, while maintaining similar inference times. Finally, we tackle multi-vehicle motion planning with non-convex trajectory preferences and provide $\Pi$net as a GPU-ready package implemented in JAX with effective tuning heuristics.
Updated: 2025-08-14 09:32:09
Categories: cs.LG,cs.AI,math.OC
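As a minimal stand-in for the kind of convex projection $\Pi$net performs in its output layer, here is the closed-form Euclidean projection onto the probability simplex, a classic convex constraint set for outputs. The paper uses operator splitting to handle general convex sets; this sketch covers only this one well-known special case.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}.

    Classic sort-based algorithm: find the threshold theta such that
    clipping v - theta at zero yields a vector summing to one.
    """
    u = np.sort(v)[::-1]                  # sort descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1 - css) / k > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)
```

A point already on the simplex is its own projection, which makes the layer an identity on feasible outputs.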
Confounding is a Pervasive Problem in Real World Recommender Systems
Unobserved confounding arises when an unmeasured feature influences both the treatment and the outcome, leading to biased causal effect estimates. This issue undermines observational studies in fields like economics, medicine, ecology, and epidemiology. Recommender systems leveraging fully observed data seem not to be vulnerable to this problem. However, many standard practices in recommender systems cause observed features to be ignored, effectively recreating the same problem. This paper shows that numerous common practices, such as feature engineering, A/B testing, and modularization, can in fact introduce confounding into recommendation systems and hamper their performance. Several illustrations of the phenomenon are provided, supported by simulation studies, with practical suggestions about how practitioners may reduce or avoid the effects of confounding in real systems.
Updated: 2025-08-14 09:31:35
Categories: cs.LG,cs.IR,stat.ML
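The omitted-variable bias described above can be reproduced in a few lines: a confounder drives both a "treatment" feature and the outcome, so a naive regression overstates the effect, while adjusting for the confounder (here via Frisch-Waugh-Lovell residualization) recovers it. The simulation below is our own illustration, not one of the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                      # unmeasured confounder
t = 0.8 * u + rng.normal(size=n)            # treatment driven by u
y = 1.0 * t + 2.0 * u + rng.normal(size=n)  # true treatment effect = 1.0

def ols_slope(x, z):
    """Slope of z ~ x (with intercept), via the covariance formula."""
    return np.cov(x, z)[0, 1] / np.var(x, ddof=1)

naive = ols_slope(t, y)            # u ignored -> biased upward (~1.98)
t_resid = t - ols_slope(u, t) * u  # residualize t on u (FWL step)
y_resid = y - ols_slope(u, y) * u  # residualize y on u
adjusted = ols_slope(t_resid, y_resid)  # recovers the true effect ~1.0
```

Dropping the confounder from the regression is exactly the failure mode that feature engineering or modularization can silently introduce.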
EDAPT: Towards Calibration-Free BCIs with Continual Online Adaptation
Brain-computer interfaces (BCIs) suffer from accuracy degradation as neural signals drift over time and vary across users, requiring frequent recalibration that limits practical deployment. We introduce EDAPT, a task- and model-agnostic framework that eliminates calibration through continual model adaptation. EDAPT first trains a baseline decoder using data from multiple users, then continually personalizes this model via supervised finetuning as the neural patterns evolve during use. We tested EDAPT across nine datasets covering three BCI tasks, and found that it consistently improved accuracy over conventional, static methods. These improvements primarily stem from combining population-level pretraining and online continual finetuning, with unsupervised domain adaptation providing further gains on some datasets. EDAPT runs efficiently, updating models within 200 milliseconds on consumer-grade hardware. Finally, decoding accuracy scales with total data budget rather than its allocation between subjects and trials. EDAPT provides a practical pathway toward calibration-free BCIs, reducing a major barrier to BCI deployment.
Updated: 2025-08-14 09:23:25
Categories: cs.LG,cs.HC,q-bio.NC
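A minimal sketch of the predict-then-update loop behind continual supervised finetuning: a logistic decoder, pretrained on a now-stale signal distribution, takes one SGD step per incoming labeled trial and so tracks the drift. This toy stands in for the paper's neural decoders; all names and constants are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def continual_finetune(w0, stream, lr=0.1):
    """Predict-then-update online finetuning of a logistic decoder.

    `w0` is a pretrained (population-level) weight vector; `stream`
    yields (x, y) pairs whose distribution may drift over time.
    """
    w = w0.astype(float).copy()
    correct = 0
    for x, y in stream:
        p = sigmoid(w @ x)
        correct += int((p > 0.5) == bool(y))  # evaluate before adapting
        w -= lr * (p - y) * x                 # logistic-loss gradient step
    return w, correct / len(stream)
```

On a stream whose decision boundary has rotated away from the pretrained one, the adaptive decoder quickly recovers accuracy that a static decoder never reaches.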
GraphFedMIG: Tackling Class Imbalance in Federated Graph Learning via Mutual Information-Guided Generation
Federated graph learning (FGL) enables multiple clients to collaboratively train powerful graph neural networks without sharing their private, decentralized graph data. Inherited from generic federated learning, FGL is critically challenged by statistical heterogeneity, where non-IID data distributions across clients can severely impair model performance. A particularly destructive form of this is class imbalance, which causes the global model to become biased towards majority classes and fail at identifying rare but critical events. This issue is exacerbated in FGL, as nodes from a minority class are often surrounded by biased neighborhood information, hindering the learning of expressive embeddings. To grapple with this challenge, we propose GraphFedMIG, a novel FGL framework that reframes the problem as a federated generative data augmentation task. GraphFedMIG employs a hierarchical generative adversarial network where each client trains a local generator to synthesize high-fidelity feature representations. To provide tailored supervision, clients are grouped into clusters, each sharing a dedicated discriminator. Crucially, the framework designs a mutual information-guided mechanism to steer the evolution of these client generators. By calculating each client's unique informational value, this mechanism corrects the local generator parameters, ensuring that subsequent rounds of mutual information-guided generation are focused on producing high-value, minority-class features. We conduct extensive experiments on four real-world datasets, and the results demonstrate the superiority of the proposed GraphFedMIG compared with other baselines.
Updated: 2025-08-14 09:16:56
Categories: cs.LG
Hummingbird: Fast, Flexible, and Fair Inter-Domain Bandwidth Reservations
To realize the long-standing vision of providing quality-of-service (QoS) guarantees on a public Internet, this paper introduces Hummingbird: a lightweight QoS-system that provides fine-grained inter-domain reservations for end hosts. Hummingbird enables flexible and composable reservations with end-to-end guarantees, and addresses an often overlooked, but crucial, aspect of bandwidth-reservation systems: incentivization of network providers. Hummingbird represents bandwidth reservations as tradable assets, allowing markets to emerge. These markets then ensure fair and efficient resource allocation and encourage deployment by remunerating providers. This incentivization is facilitated by decoupling reservations from network identities, which enables novel control-plane mechanisms and allows the design of a control plane based on smart contracts. Hummingbird also provides an efficient reservation data plane, which streamlines the processing on routers and thus simplifies the implementation, deployment, and traffic policing, while maintaining robust security properties. Our prototype implementation demonstrates the efficiency and scalability of Hummingbird's asset-based control plane, and our high-speed software implementation can fill a 160 Gbps link with Hummingbird packets on commodity hardware.
Updated: 2025-08-14 09:16:33
Categories: cs.NI,cs.CR,C.2.1; C.2.2; C.2.6
Enhanced Sparse Point Cloud Data Processing for Privacy-aware Human Action Recognition
Human Action Recognition (HAR) plays a crucial role in healthcare, fitness tracking, and ambient assisted living technologies. While traditional vision-based HAR systems are effective, they pose privacy concerns. mmWave radar sensors offer a privacy-preserving alternative but present challenges due to the sparse and noisy nature of their point cloud data. In the literature, three primary data processing methods have been widely used to improve the quality and continuity of radar data: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the Hungarian Algorithm, and Kalman Filtering. However, a comprehensive evaluation of these methods, both individually and in combination, remains lacking. This paper addresses that gap by conducting a detailed performance analysis of the three methods using the MiliPoint dataset. We evaluate each method individually, all possible pairwise combinations, and the combination of all three, assessing both recognition accuracy and computational cost. Furthermore, we propose targeted enhancements to the individual methods aimed at improving accuracy. Our results provide crucial insights into the strengths and trade-offs of each method and their integrations, guiding future work on mmWave-based HAR systems.
Updated: 2025-08-14 09:09:49
Categories: cs.CV,cs.AI
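Of the three methods compared, Kalman filtering is the easiest to sketch. Below is a constant-velocity filter over noisy 1-D position readings, a simplification of the multi-dimensional point-cloud setting; the noise parameters and time step are illustrative, not from the paper.

```python
import numpy as np

def kalman_filter(zs, dt=0.1, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter over noisy 1-D positions."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])             # we observe position only
    Q = q * np.eye(2)                      # process noise covariance
    R = np.array([[r]])                    # measurement noise covariance
    x = np.array([[zs[0]], [0.0]])         # initial state estimate
    P = np.eye(2)                          # initial state covariance
    out = []
    for z in zs:
        x = F @ x                          # predict step
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)  # update step
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0, 0])
    return np.array(out)
```

On a target moving at constant velocity, the filtered track has a substantially lower error against the ground truth than the raw noisy readings.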
FIRESPARQL: A LLM-based Framework for SPARQL Query Generation over Scholarly Knowledge Graphs
Question answering over Scholarly Knowledge Graphs (SKGs) remains a challenging task due to the complexity of scholarly content and the intricate structure of these graphs. Large Language Model (LLM) approaches can be used to translate natural language questions (NLQs) into SPARQL queries; however, these LLM-based approaches struggle with SPARQL query generation due to limited exposure to SKG-specific content and the underlying schema. We identified two main types of errors in the LLM-generated SPARQL queries: (i) structural inconsistencies, such as missing or redundant triples in the queries, and (ii) semantic inaccuracies, where incorrect entities or properties appear in the queries despite a correct query structure. To address these issues, we propose FIRESPARQL, a modular framework that supports fine-tuned LLMs as a core component, with optional context provided via retrieval-augmented generation (RAG) and a SPARQL query correction layer. We evaluate the framework on the SciQA Benchmark using various configurations (zero-shot, zero-shot with RAG, one-shot, fine-tuning, and fine-tuning with RAG) and compare the performance with baseline and state-of-the-art approaches. We measure query accuracy using BLEU and ROUGE metrics, and query result accuracy using relaxed exact match (RelaxedEM), with respect to gold standards containing the NLQs, SPARQL queries, and the results of the queries. Experimental results demonstrate that fine-tuning achieves the highest overall performance, reaching 0.90 ROUGE-L for query accuracy and 0.85 RelaxedEM for result accuracy on the test set.
Updated: 2025-08-14 09:08:50
Categories: cs.AI,cs.DL
Boosting Cross-problem Generalization in Diffusion-Based Neural Combinatorial Solver via Inference Time Adaptation
Diffusion-based Neural Combinatorial Optimization (NCO) has demonstrated effectiveness in solving NP-complete (NPC) problems by learning discrete diffusion models for solution generation, eliminating hand-crafted domain knowledge. Despite their success, existing NCO methods face significant challenges in both cross-scale and cross-problem generalization, and high training costs compared to traditional solvers. While recent studies on diffusion models have introduced training-free guidance approaches that leverage pre-defined guidance functions for conditional generation, such methodologies have not been extensively explored in combinatorial optimization. To bridge this gap, we propose a training-free inference time adaptation framework (DIFU-Ada) that enables both the zero-shot cross-problem transfer and cross-scale generalization capabilities of diffusion-based NCO solvers without requiring additional training. We provide theoretical analysis that helps understanding the cross-problem transfer capability. Our experimental results demonstrate that a diffusion solver, trained exclusively on the Traveling Salesman Problem (TSP), can achieve competitive zero-shot transfer performance across different problem scales on TSP variants, such as Prize Collecting TSP (PCTSP) and the Orienteering Problem (OP), through inference time adaptation.
Updated: 2025-08-14 09:07:37
Categories: cs.LG,cs.AI
SingleStrip: learning skull-stripping from a single labeled example
Deep learning segmentation relies heavily on labeled data, but manual labeling is laborious and time-consuming, especially for volumetric images such as brain magnetic resonance imaging (MRI). While recent domain-randomization techniques alleviate the dependency on labeled data by synthesizing diverse training images from label maps, they offer limited anatomical variability when very few label maps are available. Semi-supervised self-training addresses label scarcity by iteratively incorporating model predictions into the training set, enabling networks to learn from unlabeled data. In this work, we combine domain randomization with self-training to train three-dimensional skull-stripping networks using as little as a single labeled example. First, we automatically bin voxel intensities, yielding labels we use to synthesize images for training an initial skull-stripping model. Second, we train a convolutional autoencoder (AE) on the labeled example and use its reconstruction error to assess the quality of brain masks predicted for unlabeled data. Third, we select the top-ranking pseudo-labels to fine-tune the network, achieving skull-stripping performance on out-of-distribution data that approaches models trained with more labeled images. We compare AE-based ranking to consistency-based ranking under test-time augmentation, finding that the AE approach yields a stronger correlation with segmentation accuracy. Our results highlight the potential of combining domain randomization and AE-based quality control to enable effective semi-supervised segmentation from extremely limited labeled data. This strategy may ease the labeling burden that slows progress in studies involving new anatomical structures or emerging imaging techniques.
Updated: 2025-08-14 09:05:19
Categories: cs.CV,cs.LG
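The first step — automatically binning voxel intensities to obtain pseudo-labels that seed image synthesis — might look like the sketch below. Using quantile-based bin edges is our assumption; the abstract does not specify the binning rule.

```python
import numpy as np

def intensity_labels(volume, n_bins=4):
    """Bin voxel intensities into integer pseudo-labels.

    Interior quantiles of the intensity distribution serve as bin
    edges, so each label covers a roughly equal share of voxels.
    """
    edges = np.quantile(volume, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(volume, edges)
```

The resulting label map has the same shape as the input volume and one integer label per bin, ready to drive domain-randomized synthesis.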
ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs
The increasing scale and complexity of large language models (LLMs) pose significant inference latency challenges, primarily due to their autoregressive decoding paradigm characterized by the sequential nature of next-token prediction. By re-examining the outputs of autoregressive models, we observed that some segments exhibit parallelizable structures, which we term intrinsic parallelism. Decoding each parallelizable branch simultaneously (i.e. parallel decoding) can significantly improve the overall inference speed of LLMs. In this paper, we propose an Adaptive Serial-Parallel Decoding (ASPD), which addresses two core challenges: automated construction of parallelizable data and efficient parallel decoding mechanism. More specifically, we introduce a non-invasive pipeline that automatically extracts and validates parallelizable structures from the responses of autoregressive models. To empower efficient adaptive serial-parallel decoding, we implement a Hybrid Decoding Engine which enables seamless transitions between serial and parallel decoding modes while maintaining a reusable KV cache, maximizing computational efficiency. Extensive evaluations across General Tasks, Retrieval-Augmented Generation, Mathematical Reasoning, demonstrate that ASPD achieves unprecedented performance in both effectiveness and efficiency. Notably, on Vicuna Bench, our method achieves up to 3.19x speedup (1.85x on average) while maintaining response quality within 1% difference compared to autoregressive models, realizing significant acceleration without compromising generation quality. Our framework sets a groundbreaking benchmark for efficient LLM parallel inference, paving the way for its deployment in latency-sensitive applications such as AI-powered customer service bots and answer retrieval engines.
Updated: 2025-08-14 09:04:56
Categories: cs.CL,cs.AI
X-Node: Self-Explanation is All We Need
Graph neural networks (GNNs) have achieved state-of-the-art results in computer vision and medical image classification tasks by capturing structural dependencies across data instances. However, their decision-making remains largely opaque, limiting their trustworthiness in high-stakes clinical applications where interpretability is essential. Existing explainability techniques for GNNs are typically post-hoc and global, offering limited insight into individual node decisions or local reasoning. We introduce X-Node, a self-explaining GNN framework in which each node generates its own explanation as part of the prediction process. For every node, we construct a structured context vector encoding interpretable cues such as degree, centrality, clustering, feature saliency, and label agreement within its local topology. A lightweight Reasoner module maps this context into a compact explanation vector, which serves three purposes: (1) reconstructing the node's latent embedding via a decoder to enforce faithfulness, (2) generating a natural language explanation using a pre-trained LLM (e.g., Grok or Gemini), and (3) guiding the GNN itself via a "text-injection" mechanism that feeds explanations back into the message-passing pipeline. We evaluate X-Node on two graph datasets derived from MedMNIST and MorphoMNIST, integrating it with GCN, GAT, and GIN backbones. Our results show that X-Node maintains competitive classification accuracy while producing faithful, per-node explanations. Repository: https://github.com/basiralab/X-Node.
Updated: 2025-08-14 09:00:45
Categories: cs.LG,cs.AI
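A toy version of the structured context vector, computing degree, local clustering, and neighborhood label agreement for one node of a graph stored as adjacency sets. This sketches the cues named in the abstract, not the paper's implementation, and omits centrality and feature saliency.

```python
def node_context(adj, labels, v):
    """Structured context cues for node v.

    `adj` maps each node to the set of its neighbors; `labels` maps
    each node to its class label.
    """
    nbrs = adj[v]
    deg = len(nbrs)
    # local clustering: fraction of neighbor pairs that are linked
    links = sum(1 for i in nbrs for j in nbrs if i < j and j in adj[i])
    possible = deg * (deg - 1) / 2
    clustering = links / possible if possible else 0.0
    # label agreement: share of neighbors carrying v's own label
    agreement = (sum(labels[u] == labels[v] for u in nbrs) / deg) if deg else 0.0
    return {"degree": deg, "clustering": clustering, "agreement": agreement}
```

In the full framework, a Reasoner module would map such a context vector to a compact explanation vector per node.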
VERCATION: Precise Vulnerable Open-source Software Version Identification based on Static Analysis and LLM
Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. However, the adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities. Current methods of identifying vulnerable versions typically analyze and extract the code features involved in vulnerability patches using static analysis with pre-defined rules. They then use code clone detection to identify the vulnerable versions. These methods are hindered by imprecision due to (1) the exclusion of vulnerability-irrelevant code in the analysis and (2) the inadequacy of code clone detection. This paper presents VERCATION, an approach designed to identify vulnerable versions of OSS written in C/C++. VERCATION combines program slicing with a Large Language Model (LLM) to identify vulnerability-relevant code from vulnerability patches. It then backtracks historical commits to gather previous modifications of identified vulnerability-relevant code. We propose code clone detection based on expanded and normalized ASTs to compare the differences between pre-modification and post-modification code, thereby locating the vulnerability-introducing commit (vic) and enabling the identification of the vulnerable versions between the vulnerability-fixing commit and the vic. We curate a dataset linking 122 OSS vulnerabilities and 1,211 versions to evaluate VERCATION. On this dataset, our approach achieves an F1 score of 93.1%, outperforming current state-of-the-art methods. More importantly, VERCATION detected 202 incorrect vulnerable OSS versions in NVD reports.
Updated: 2025-08-14 09:00:40
Categories: cs.SE,cs.CR
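To illustrate normalization-based clone detection: identifiers can be canonicalized on the AST so that renamed variables still match, while a real semantic change does not. The paper targets C/C++ ASTs; this sketch uses Python's `ast` module purely as a stand-in.

```python
import ast

class Normalize(ast.NodeTransformer):
    """Rename identifiers to canonical placeholders (v0, v1, ...)
    so that variable renaming does not defeat clone detection."""
    def __init__(self):
        self.names = {}

    def visit_Name(self, node):
        canon = self.names.setdefault(node.id, f"v{len(self.names)}")
        return ast.copy_location(ast.Name(id=canon, ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = self.names.setdefault(node.arg, f"v{len(self.names)}")
        return node

def normalized_dump(src):
    """Canonical string form of the normalized AST."""
    return ast.dump(Normalize().visit(ast.parse(src)))

def is_clone(src_a, src_b):
    return normalized_dump(src_a) == normalized_dump(src_b)
```

Comparing normalized dumps flags a variable-renamed copy as a clone while rejecting a snippet whose operator differs.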
Efficient Methods for Accurate Sparse Trajectory Recovery and Map Matching
Real-world trajectories are often sparse with low-sampling rates (i.e., long intervals between consecutive GPS points) and misaligned with road networks, yet many applications demand high-quality data for optimal performance. To improve data quality with sparse trajectories as input, we systematically study two related research problems: trajectory recovery on road network, which aims to infer missing points to recover high-sampling trajectories, and map matching, which aims to map GPS points to road segments to determine underlying routes. In this paper, we present efficient methods TRMMA and MMA for accurate trajectory recovery and map matching, respectively, where MMA serves as the first step of TRMMA. In MMA, we carefully formulate a classification task to map a GPS point from sparse trajectories to a road segment over a small candidate segment set, rather than the entire road network. We develop techniques in MMA to generate effective embeddings that capture the patterns of GPS data, directional information, and road segments, to accurately align sparse trajectories to routes. For trajectory recovery, TRMMA focuses on the segments in the route returned by MMA to infer missing points with position ratios on road segments, producing high-sampling trajectories efficiently by avoiding evaluation of all road segments. Specifically, in TRMMA, we design a dual-transformer encoding process to cohesively capture latent patterns in trajectories and routes, and an effective decoding technique to sequentially predict the position ratios and road segments of missing points. We conduct extensive experiments to compare TRMMA and MMA with numerous existing methods for trajectory recovery and map matching, respectively, on 4 large real-world datasets. TRMMA and MMA consistently achieve the best result quality, often by a significant margin.
Updated: 2025-08-14 09:00:37
Categories: cs.DB,cs.LG
FedABC: Attention-Based Client Selection for Federated Learning with Long-Term View
Native AI support is a key objective in the evolution of 6G networks, with Federated Learning (FL) emerging as a promising paradigm. FL allows decentralized clients to collaboratively train an AI model without directly sharing their data, preserving privacy. Clients train local models on private data and share model updates, which a central server aggregates to refine the global model and redistribute it for the next iteration. However, client data heterogeneity slows convergence and reduces model accuracy, and frequent client participation imposes communication and computational burdens. To address these challenges, we propose FedABC, an innovative client selection algorithm designed to take a long-term view in managing data heterogeneity and optimizing client participation. Inspired by attention mechanisms, FedABC prioritizes informative clients by evaluating both model similarity and each model's unique contributions to the global model. Moreover, considering the evolving demands of the global model, we formulate an optimization problem to guide FedABC throughout the training process. Following the "later-is-better" principle, FedABC adaptively adjusts the client selection threshold, encouraging greater participation in later training stages. Extensive simulations on CIFAR-10 demonstrate that FedABC significantly outperforms existing approaches in model accuracy and client participation efficiency, achieving comparable performance with 32% fewer clients than the classical FL algorithm FedAvg, and 3.5% higher accuracy with 2% fewer clients than the state-of-the-art. This work marks a step toward deploying FL in heterogeneous, resource-constrained environments, thereby supporting native AI capabilities in 6G networks.
Updated: 2025-08-14 08:57:03
Categories: cs.NI,cs.LG
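One plausible reading of attention-style client scoring is sketched below: softmax weights derived from each client update's cosine similarity to the aggregated global update, with dissimilar (potentially more informative) clients receiving higher weight. The preference direction and the temperature parameter are our assumptions, not details from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two flat weight-update vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def score_clients(client_updates, global_update, temperature=1.0):
    """Attention-style client weights via a numerically stable softmax
    over negated similarities (dissimilar clients score higher)."""
    sims = [cosine(u, global_update) for u in client_updates]
    logits = [-s / temperature for s in sims]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

A server could keep the top-scoring clients above an adaptive threshold, raising participation in later rounds per the "later-is-better" principle.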
Multi-Label Plant Species Prediction with Metadata-Enhanced Multi-Head Vision Transformers
We present a multi-head vision transformer approach for multi-label plant species prediction in vegetation plot images, addressing the PlantCLEF 2025 challenge. The task involves training models on single-species plant images while testing on multi-species quadrat images, creating a drastic domain shift. Our methodology leverages a pre-trained DINOv2 Vision Transformer Base (ViT-B/14) backbone with multiple classification heads for species, genus, and family prediction, utilizing taxonomic hierarchies. Key contributions include multi-scale tiling to capture plants at different scales, dynamic threshold optimization based on mean prediction length, and ensemble strategies through bagging and Hydra model architectures. The approach incorporates various inference techniques including image cropping to remove non-plant artifacts, top-n filtering for prediction constraints, and logit thresholding strategies. Experiments were conducted on approximately 1.4 million training images covering 7,806 plant species. Results demonstrate strong performance, making our submission 3rd best on the private leaderboard. Our code is available at https://github.com/geranium12/plant-clef-2025/tree/v1.0.0.
Updated: 2025-08-14 08:56:58
Categories: cs.CV,cs.IR,cs.LG
RealAC: A Domain-Agnostic Framework for Realistic and Actionable Counterfactual Explanations
Counterfactual explanations provide human-understandable reasoning for AI-made decisions by describing minimal changes to input features that would alter a model's prediction. To be truly useful in practice, such explanations must be realistic and feasible -- they should respect both the underlying data distribution and user-defined feasibility constraints. Existing approaches often enforce inter-feature dependencies through rigid, hand-crafted constraints or domain-specific knowledge, which limits their generalizability and ability to capture complex, nonlinear relations inherent in data. Moreover, they rarely accommodate user-specified preferences and suggest explanations that are causally implausible or infeasible to act upon. We introduce RealAC, a domain-agnostic framework for generating realistic and actionable counterfactuals. RealAC automatically preserves complex inter-feature dependencies without relying on explicit domain knowledge -- by aligning the joint distributions of feature pairs between factual and counterfactual instances. The framework also allows end-users to "freeze" attributes they cannot or do not wish to change by suppressing change in frozen features during optimization. Evaluations on three synthetic and two real datasets demonstrate that RealAC balances realism with actionability. Our method outperforms state-of-the-art baselines and Large Language Model-based counterfactual generation techniques in causal edge score, dependency preservation score, and IM1 realism metric and offers a solution for causality-aware and user-centric counterfactual generation.
Updated: 2025-08-14 08:51:39
Categories: cs.LG,cs.AI
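The "freeze" mechanism described above can be sketched as a mask that suppresses change in frozen features during optimization. This is an illustrative toy (finite-difference ascent on a linear scorer), not RealAC itself; the model, step size, and stopping rule are all assumptions.

```python
import numpy as np

def find_counterfactual(x, score, frozen_idx, lr=0.5, steps=300):
    """Nudge x toward a positive prediction while leaving frozen features
    untouched, by masking their updates to zero."""
    mask = np.ones_like(x)
    mask[list(frozen_idx)] = 0.0          # frozen features never move
    cf = x.copy()
    eps = 1e-4
    for _ in range(steps):
        # Finite-difference gradient of the model score.
        grad = np.zeros_like(cf)
        for i in range(len(cf)):
            e = np.zeros_like(cf)
            e[i] = eps
            grad[i] = (score(cf + e) - score(cf - e)) / (2 * eps)
        cf += lr * mask * grad            # change in frozen features suppressed
        if score(cf) >= 0.5:              # crossed the decision boundary
            break
    return cf

# Toy scorer: sigmoid(w . x + b); feature 2 plays the role of an immutable
# attribute (e.g. age) that the user cannot act upon.
w, b = np.array([1.0, 2.0, -0.5]), -1.0
score = lambda z: 1.0 / (1.0 + np.exp(-(z @ w + b)))
x0 = np.array([0.0, 0.0, 0.0])            # initially predicted negative
cf = find_counterfactual(x0, score, frozen_idx=[2])
```

RealAC additionally aligns joint feature-pair distributions for realism; the sketch shows only the actionability half of the objective.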
SkeySpot: Automating Service Key Detection for Digital Electrical Layout Plans in the Construction Industry
Legacy floor plans, often preserved only as scanned documents, remain essential resources for architecture, urban planning, and facility management in the construction industry. However, the lack of machine-readable floor plans renders large-scale interpretation both time-consuming and error-prone. Automated symbol spotting offers a scalable solution by enabling the identification of service key symbols directly from floor plans, supporting workflows such as cost estimation, infrastructure maintenance, and regulatory compliance. This work introduces a labelled Digitised Electrical Layout Plans (DELP) dataset comprising 45 scanned electrical layout plans annotated with 2,450 instances across 34 distinct service key classes. A systematic evaluation framework is proposed using pretrained object detection models for the DELP dataset. Among the models benchmarked, YOLOv8 achieves the highest performance with a mean Average Precision (mAP) of 82.5\%. Using YOLOv8, we develop SkeySpot, a lightweight, open-source toolkit for real-time detection, classification, and quantification of electrical symbols. SkeySpot produces structured, standardised outputs that can be scaled up for interoperable building information workflows, ultimately enabling compatibility across downstream applications and regulatory platforms. By lowering dependency on proprietary CAD systems and reducing manual annotation effort, this approach makes the digitisation of electrical layouts more accessible to small and medium-sized enterprises (SMEs) in the construction industry, while supporting broader goals of standardisation, interoperability, and sustainability in the built environment.
Updated: 2025-08-14 08:36:11
Categories: cs.CV,cs.LG
Learning Classifiers That Induce Markets
When learning is used to inform decisions about humans, such as for loans, hiring, or admissions, this can incentivize users to strategically modify their features, at a cost, to obtain positive predictions. The common assumption is that the function governing costs is exogenous, fixed, and predetermined. We challenge this assumption, and assert that costs can emerge as a result of deploying a classifier. Our idea is simple: when users seek positive predictions, this creates demand for important features; and if features are available for purchase, then a market will form, and competition will give rise to prices. We extend the strategic classification framework to support this notion, and study learning in a setting where a classifier can induce a market for features. We present an analysis of the learning task, devise an algorithm for computing market prices, propose a differentiable learning framework, and conduct experiments to explore our novel setting and approach.
Updated: 2025-08-14 08:33:01
Categories: cs.LG
A Lightweight Transformer with Phase-Only Cross-Attention for Illumination-Invariant Biometric Authentication
Traditional biometric systems have encountered significant setbacks due to various unavoidable factors, for example, wearing of face masks in face recognition-based biometrics and hygiene concerns in fingerprint-based biometrics. This paper proposes a novel lightweight vision transformer with phase-only cross-attention (POC-ViT) using dual biometric traits of forehead and periocular portions of the face, capable of performing well even with face masks and without any physical touch, offering a promising alternative to traditional methods. The POC-ViT framework is designed to handle two biometric traits and to capture inter-dependencies in terms of relative structural patterns. Each channel consists of a Cross-Attention using phase-only correlation (POC) that captures both their individual and correlated structural patterns. The computation of cross-attention using POC extracts the phase correlation in the spatial features. Therefore, it is robust against variations in resolution and intensity, as well as illumination changes in the input images. The lightweight model is suitable for edge device deployment. The performance of the proposed framework was successfully demonstrated using the Forehead Subcutaneous Vein Pattern and Periocular Biometric Pattern (FSVP-PBP) database, having 350 subjects. The POC-ViT framework outperformed state-of-the-art methods with an outstanding classification accuracy of $98.8\%$ with the dual biometric traits.
Updated: 2025-08-14 08:27:24
Categories: cs.CV,cs.AI,cs.LG
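The phase-only correlation (POC) at the core of the cross-attention above has a compact closed form: normalize the cross-power spectrum to unit magnitude so only phase structure survives, which is what makes the result insensitive to intensity and illumination scaling. A minimal NumPy sketch (the standard POC formula, not the paper's attention module):

```python
import numpy as np

def phase_only_correlation(a, b):
    """POC surface of two equal-size 2-D arrays: cross-power spectrum
    normalized to unit magnitude, then inverse-transformed."""
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    r = cross / (np.abs(cross) + 1e-12)   # keep phase, discard magnitude
    return np.real(np.fft.ifft2(r))

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
shifted = np.roll(img, shift=(3, 5), axis=(0, 1))

poc = phase_only_correlation(shifted, img)
peak = np.unravel_index(np.argmax(poc), poc.shape)   # the (3, 5) shift

# Scaling intensities leaves the phase, and hence the peak, unchanged.
poc_scaled = phase_only_correlation(4.0 * shifted, img)
peak_scaled = np.unravel_index(np.argmax(poc_scaled), poc_scaled.shape)
```

The sharp delta-like peak at the relative displacement is what the POC-based cross-attention exploits to match structural patterns across the two traits.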
Alternating Approach-Putt Models for Multi-Stage Speech Enhancement
Speech enhancement using artificial neural networks aims to remove noise from noisy speech signals while preserving the speech content. However, speech enhancement networks often introduce distortions to the speech signal, referred to as artifacts, which can degrade audio quality. In this work, we propose a post-processing neural network designed to mitigate artifacts introduced by speech enhancement models. Inspired by the analogy of making a `Putt' after an `Approach' in golf, we name our model PuttNet. We demonstrate that alternating between a speech enhancement model and the proposed Putt model leads to improved speech quality, as measured by perceptual quality scores (PESQ), objective intelligibility (STOI), and background noise intrusiveness (CBAK) scores. Furthermore, we illustrate with graphical analysis why this alternating Approach outperforms repeated application of either model alone.
Updated: 2025-08-14 08:18:42
Categories: cs.SD,cs.AI,cs.LG,eess.AS
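The Approach/Putt alternation is, at bottom, a loop over two models. The toy below is a stand-in, assuming a moving-average "approach" (which denoises but leaves a small attenuation artifact) and a hypothetical fixed-gain "putt" correction; the actual models in the paper are neural networks.

```python
import numpy as np

def approach(y, window=5):
    """'Approach': coarse enhancement. Averages out noise, but slightly
    attenuates the signal -- the kind of artifact Putt is meant to fix."""
    return np.convolve(y, np.ones(window) / window, mode="same")

def putt(y, gain=1.01):
    """'Putt': a small corrective pass (here, a hypothetical fixed gain
    compensating the approach model's attenuation)."""
    return y * gain

clean = np.sin(np.linspace(0, 4 * np.pi, 400))
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(400)

y = noisy
for _ in range(2):            # alternate the two models, as in PuttNet
    y = putt(approach(y))

mse = lambda a, b: float(np.mean((a - b) ** 2))
```

The paper's point is precisely that this alternation beats repeatedly applying either model alone; the toy only shows the control flow.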
Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models
Sharpness-Aware Minimization (SAM) has been proven to be an effective optimization technique for improving generalization in overparameterized models. While prior works have explored the implicit regularization of SAM in simple two-core scale-invariant settings, its behavior in more general tensorized or scale-invariant models remains underexplored. In this work, we leverage scale-invariance to analyze the norm dynamics of SAM in general tensorized models. We introduce the notion of \emph{Norm Deviation} as a global measure of core norm imbalance, and derive its evolution under SAM using gradient flow analysis. We show that SAM's implicit control of Norm Deviation is governed by the covariance between core norms and their gradient magnitudes. Motivated by these findings, we propose a simple yet effective method, \emph{Deviation-Aware Scaling (DAS)}, which explicitly mimics this regularization behavior by scaling core norms in a data-adaptive manner. Our experiments across tensor completion, noisy training, model compression, and parameter-efficient fine-tuning confirm that DAS achieves competitive or improved performance over SAM, while offering reduced computational overhead.
Updated: 2025-08-14 08:17:34
Categories: cs.LG,cs.AI,stat.ML
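For readers unfamiliar with SAM itself: each update first perturbs the weights toward the local worst case, then descends using the gradient measured there. A minimal sketch on a toy quadratic (illustrative only; the paper's analysis concerns tensorized models and gradient flow, not this discrete toy):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization update: ascend by rho along the
    normalized gradient, then descend using the gradient at that point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # worst-case perturbation
    return w - lr * grad_fn(w + eps)

# Toy anisotropic quadratic L(w) = 0.5 * w^T A w.
A = np.diag([10.0, 1.0])
loss = lambda w: 0.5 * w @ A @ w
grad_fn = lambda w: A @ w

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, grad_fn)
# w settles near the minimum, offset by an O(rho)-sized SAM bias.
```

It is this extra perturbation term whose interaction with core norms (via the covariance the paper identifies) DAS mimics explicitly, at lower cost than the second gradient evaluation SAM requires.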
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this paper, we introduce We-Math 2.0, a unified system that integrates a structured mathematical knowledge system, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to comprehensively enhance the mathematical reasoning abilities of MLLMs. The key contributions of We-Math 2.0 are fourfold: (1) MathBook Knowledge System: We construct a five-level hierarchical system encompassing 491 knowledge points and 1,819 fundamental principles. (2) MathBook-Standard & Pro: We develop MathBook-Standard, a dataset that ensures broad conceptual coverage and flexibility through dual expansion. Additionally, we define a three-dimensional difficulty space and generate 7 progressive variants per problem to build MathBook-Pro, a challenging dataset for robust training. (3) MathBook-RL: We propose a two-stage RL framework comprising: (i) Cold-Start Fine-tuning, which aligns the model with knowledge-oriented chain-of-thought reasoning; and (ii) Progressive Alignment RL, leveraging average-reward learning and dynamic data scheduling to achieve progressive alignment across difficulty levels. (4) MathBookEval: We introduce a comprehensive benchmark covering all 491 knowledge points with diverse reasoning step distributions. Experimental results show that MathBook-RL performs competitively with existing baselines on four widely-used benchmarks and achieves strong results on MathBookEval, suggesting promising generalization in mathematical reasoning.
Updated: 2025-08-14 08:15:41
Categories: cs.AI,cs.CV,cs.LG
DeepWriter: A Fact-Grounded Multimodal Writing Assistant Based On Offline Knowledge Base
Large Language Models (LLMs) have demonstrated remarkable capabilities in various applications. However, their use as writing assistants in specialized domains like finance, medicine, and law is often hampered by a lack of deep domain-specific knowledge and a tendency to hallucinate. Existing solutions, such as Retrieval-Augmented Generation (RAG), can suffer from inconsistency across multiple retrieval steps, while online search-based methods often degrade quality due to unreliable web content. To address these challenges, we introduce DeepWriter, a customizable, multimodal, long-form writing assistant that operates on a curated, offline knowledge base. DeepWriter leverages a novel pipeline that involves task decomposition, outline generation, multimodal retrieval, and section-by-section composition with reflection. By deeply mining information from a structured corpus and incorporating both textual and visual elements, DeepWriter generates coherent, factually grounded, and professional-grade documents. We also propose a hierarchical knowledge representation to enhance retrieval efficiency and accuracy. Our experiments on financial report generation demonstrate that DeepWriter produces high-quality, verifiable articles that surpasses existing baselines in factual accuracy and generated content quality.
Updated: 2025-08-14 08:14:12
Categories: cs.CL,cs.AI
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
Large reasoning models have achieved remarkable performance through extended chain-of-thought sequences, yet this computational freedom leads to excessive token generation even for simple problems. We present Length-Adaptive Policy Optimization (LAPO), a novel framework that transforms reasoning length control from an external constraint into an intrinsic model capability. Unlike existing approaches that impose rigid limits or rely on post-hoc interventions, LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. In the first stage, models learn natural reasoning patterns by discovering the statistical distribution of successful solution lengths. The second stage leverages these patterns as meta-cognitive guidance, embedding them directly within the model's reasoning context to ensure inference-time flexibility. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%. Our analysis reveals that models trained with LAPO develop emergent abilities to allocate computational resources based on problem complexity, achieving efficient reasoning without sacrificing quality.
Updated: 2025-08-14 08:13:36
Categories: cs.AI,cs.CL
Neuronal correlations shape the scaling behavior of memory capacity and nonlinear computational capability of reservoir recurrent neural networks
Reservoir computing is a powerful framework for real-time information processing, characterized by its high computational ability and quick learning, with applications ranging from machine learning to biological systems. In this paper, we investigate how the computational ability of reservoir recurrent neural networks (RNNs) scales with an increasing number of readout neurons. First, we demonstrate that the memory capacity of a reservoir RNN scales sublinearly with the number of readout neurons. To elucidate this observation, we develop a theoretical framework for analytically deriving memory capacity that incorporates the effect of neuronal correlations, which have been ignored in prior theoretical work for analytical simplicity. Our theory successfully relates the sublinear scaling of memory capacity to the strength of neuronal correlations. Furthermore, we show this principle holds across diverse types of RNNs, even those beyond the direct applicability of our theory. Next, we numerically investigate the scaling behavior of nonlinear computational ability, which, alongside memory capacity, is crucial for overall computational performance. Our numerical simulations reveal that as memory capacity growth becomes sublinear, increasing the number of readout neurons successively enables nonlinear processing at progressively higher polynomial orders. Our theoretical framework suggests that neuronal correlations govern not only memory capacity but also the sequential growth of nonlinear computational capabilities. Our findings establish a foundation for designing scalable and cost-effective reservoir computing, providing novel insights into the interplay among neuronal correlations, linear memory, and nonlinear processing.
Updated: 2025-08-14 08:10:43
Categories: cond-mat.dis-nn,cs.LG,q-bio.NC
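Memory capacity, the quantity whose sublinear scaling the paper analyzes, is conventionally measured by training one linear readout per delay and summing the squared correlations between readout and delayed input. A self-contained echo-state sketch (illustrative parameters, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, washout, max_lag = 100, 4000, 200, 30

# Random reservoir scaled to spectral radius 0.9, plus random input weights.
W = rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.uniform(-0.5, 0.5, N)

u = rng.uniform(-1, 1, T)                 # i.i.d. input to be remembered
x, states = np.zeros(N), np.empty((T, N))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])
    states[t] = x

X = states[washout:]
G = X.T @ X + 1e-6 * np.eye(N)            # ridge-regularized Gram matrix
mc = 0.0
for k in range(1, max_lag + 1):
    target = u[washout - k:T - k]         # the input k steps in the past
    w = np.linalg.solve(G, X.T @ target)  # linear readout for delay k
    r = np.corrcoef(X @ w, target)[0, 1]
    mc += r ** 2                          # capacity contribution of delay k
```

Here the readouts tap all N reservoir neurons; the paper's scaling question is how `mc` grows as the number of readout neurons increases, and neuronal correlations are what bend that growth sublinear.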
Discrepancy-Aware Graph Mask Auto-Encoder
Masked Graph Auto-Encoder, a powerful graph self-supervised training paradigm, has recently shown superior performance in graph representation learning. Existing works typically rely on node contextual information to recover the masked information. However, they fail to generalize well to heterophilic graphs, where connected nodes may not be similar, because they focus only on capturing neighborhood information and ignore the discrepancy information between different nodes, resulting in indistinguishable node representations. In this paper, to address this issue, we propose a Discrepancy-Aware Graph Mask Auto-Encoder (DGMAE). It obtains more distinguishable node representations by reconstructing the discrepancy information of neighboring nodes during the masking process. We conduct extensive experiments on 17 widely-used benchmark datasets. The results show that our DGMAE can effectively preserve the discrepancies of nodes in low-dimensional space. Moreover, DGMAE significantly outperforms state-of-the-art graph self-supervised learning methods on three graph analytics tasks, including node classification, node clustering, and graph classification, demonstrating its remarkable superiority. The code of DGMAE is available at https://github.com/zhengziyu77/DGMAE.
Updated: 2025-08-14 08:05:43
Categories: cs.LG,cs.AI
Yet Another Mirage of Breaking MIRAGE: Debunking Occupancy-based Side-Channel Attacks on Fully Associative Randomized Caches
Recent work presented at USENIX Security 2025 claims that occupancy-based attacks can recover AES keys from the MIRAGE randomized cache. In this paper, we examine these claims and find that they arise from fundamental modeling flaws. Most critically, the authors' simulation of MIRAGE uses a constant seed to initialize the random number generator used for global evictions in MIRAGE, causing every AES encryption they trace to evict the same deterministic sequence of cache lines. This artificially creates a highly repeatable timing pattern that is not representative of a realistic implementation of MIRAGE, where eviction sequences vary randomly between encryptions. When we instead randomize the eviction seed for each run, reflecting realistic operation, the correlation between AES T-table accesses and attacker runtimes disappears, and the attack fails. These findings show that the reported leakage is an artifact of incorrect modeling, and not an actual vulnerability in MIRAGE.
Updated: 2025-08-14 08:04:15
Categories: cs.CR
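The modeling flaw is easy to reproduce in miniature: with a constant seed, the "random" global evictions repeat identically on every traced run, whereas a fresh seed per run gives the variation a real deployment would have. A toy sketch (a stand-in RNG model, not the MIRAGE simulator):

```python
import random
import secrets

def eviction_sequence(seed, n_evictions=8, n_lines=256):
    """Global evictions drawn from a seeded RNG, standing in for the
    randomized cache's eviction choices during one traced encryption."""
    rng = random.Random(seed)
    return [rng.randrange(n_lines) for _ in range(n_evictions)]

# Constant seed (the flawed simulation): every traced encryption evicts the
# same deterministic sequence, manufacturing a repeatable timing pattern.
run_a = eviction_sequence(42)
run_b = eviction_sequence(42)

# Fresh seed per run (realistic operation): sequences vary between runs,
# and the artificial correlation disappears.
real_a = eviction_sequence(secrets.randbits(64))
real_b = eviction_sequence(secrets.randbits(64))
```

The difference between the two pairs of runs is exactly the difference between the attack "succeeding" in the flawed model and failing under realistic randomization.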
A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality
Augmented Reality (AR) enriches perception by overlaying virtual elements on the physical world. Due to its growing popularity, cognitive attacks that alter AR content to manipulate users' semantic perception have received increasing attention. Existing detection methods often focus on visual changes, which are restricted to pixel- or image-level processing and lack semantic reasoning capabilities, or they rely on pre-trained vision-language models (VLMs), which function as black-box approaches with limited interpretability. In this paper, we present CADAR, a novel neurosymbolic approach for cognitive attack detection in AR. It fuses multimodal vision-language inputs using neural VLMs to obtain a symbolic perception-graph representation, incorporating prior knowledge, salience weighting, and temporal correlations. The model then enables particle-filter based statistical reasoning -- a sequential Monte Carlo method -- to detect cognitive attacks. Thus, CADAR inherits the adaptability of pre-trained VLM and the interpretability and reasoning rigor of particle filtering. Experiments on an extended AR cognitive attack dataset show accuracy improvements of up to 10.7% over strong baselines on challenging AR attack scenarios, underscoring the promise of neurosymbolic methods for effective and interpretable cognitive attack detection.
Updated: 2025-08-14 07:59:40
Categories: cs.CV,cs.AI
MM-Food-100K: A 100,000-Sample Multimodal Food Intelligence Dataset with Verifiable Provenance
We present MM-Food-100K, a public 100,000-sample multimodal food intelligence dataset with verifiable provenance. It is a curated open subset (approximately 10%) of an original 1.2 million, quality-accepted corpus of food images annotated with a wide range of information (such as dish name and region of creation). The corpus was collected over six weeks from over 87,000 contributors using the Codatta contribution model, which combines community sourcing with configurable AI-assisted quality checks; each submission is linked to a wallet address in a secure off-chain ledger for traceability, with a full on-chain protocol on the roadmap. We describe the schema, pipeline, and QA, and validate utility by fine-tuning large vision-language models (ChatGPT 5, ChatGPT OSS, Qwen-Max) on image-based nutrition prediction. Fine-tuning yields consistent gains over out-of-the-box baselines across standard metrics; we report results primarily on the MM-Food-100K subset. We release MM-Food-100K for free public access and retain approximately 90% for potential commercial access, with revenue sharing to contributors.
Updated: 2025-08-14 07:59:31
Categories: cs.AI,cs.CR,cs.CV,I.2.10; I.2.6
SC2Arena and StarEvolve: Benchmark and Self-Improvement Framework for LLMs in Complex Decision-Making Tasks
Evaluating large language models (LLMs) in complex decision-making is essential for advancing AI's ability for strategic planning and real-time adaptation. However, existing benchmarks for tasks like StarCraft II fail to capture the game's full complexity, such as its complete game context, diverse action spaces, and all playable races. To address this gap, we present SC2Arena, a benchmark that fully supports all playable races, low-level action spaces, and optimizes text-based observations to tackle spatial reasoning challenges. Complementing this, we introduce StarEvolve, a hierarchical framework that integrates strategic planning with tactical execution, featuring iterative self-correction and continuous improvement via fine-tuning on high-quality gameplay data. Its key components include a Planner-Executor-Verifier structure to break down gameplay, and a scoring system for selecting high-quality training samples. Comprehensive analysis using SC2Arena provides valuable insights into developing generalist agents that were not possible with previous benchmarks. Experimental results also demonstrate that our proposed StarEvolve achieves superior performance in strategic planning. Our code, environment, and algorithms are publicly available.
Updated: 2025-08-14 07:58:01
Categories: cs.LG
HiRef: Leveraging Hierarchical Ontology and Network Refinement for Robust Medication Recommendation
Medication recommendation is a crucial task for assisting physicians in making timely decisions from longitudinal patient medical records. However, real-world EHR data present significant challenges due to the presence of rarely observed medical entities and incomplete records that may not fully capture the clinical ground truth. While data-driven models trained on longitudinal Electronic Health Records often achieve strong empirical performance, they struggle to generalize under missing or novel conditions, largely due to their reliance on observed co-occurrence patterns. To address these issues, we propose Hierarchical Ontology and Network Refinement for Robust Medication Recommendation (HiRef), a unified framework that combines two complementary structures: (i) the hierarchical semantics encoded in curated medical ontologies, and (ii) refined co-occurrence patterns derived from real-world EHRs. We embed ontology entities in hyperbolic space, which naturally captures tree-like relationships and enables knowledge transfer through shared ancestors, thereby improving generalizability to unseen codes. To further improve robustness, we introduce a prior-guided sparse regularization scheme that refines the EHR co-occurrence graph by suppressing spurious edges while preserving clinically meaningful associations. Our model achieves strong performance on EHR benchmarks (MIMIC-III and MIMIC-IV) and maintains high accuracy under simulated unseen-code settings. Extensive experiments with comprehensive ablation studies demonstrate HiRef's resilience to unseen medical codes, supported by in-depth analyses of the learned sparsified graph structure and medical code embeddings.
Updated: 2025-08-14 07:55:03
Categories: cs.AI,cs.LG
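The hyperbolic embedding above typically refers to the Poincaré ball, where distance blows up near the boundary, giving a fixed-radius ball room for exponentially many leaves; that is why tree-like ontology hierarchies fit so naturally. The standard distance formula (HiRef's exact embedding model is not specified here):

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball:
    d(u, v) = arccosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))."""
    duv = np.dot(u - v, u - v)
    denom = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
    return np.arccosh(1.0 + 2.0 * duv / denom)

origin = np.zeros(2)
near = np.array([0.1, 0.0])    # close to the center
far = np.array([0.9, 0.0])     # close to the boundary
d_near = poincare_distance(origin, near)
d_far = poincare_distance(origin, far)   # much larger than d_near
```

Placing broad ancestor concepts near the center and rare leaf codes toward the boundary lets unseen codes inherit information through shared ancestors, which is the knowledge-transfer effect the abstract describes.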
MASH: Cooperative-Heterogeneous Multi-Agent Reinforcement Learning for Single Humanoid Robot Locomotion
This paper proposes a novel method to enhance locomotion for a single humanoid robot through cooperative-heterogeneous multi-agent deep reinforcement learning (MARL). While most existing methods typically employ single-agent reinforcement learning algorithms for a single humanoid robot or MARL algorithms for multi-robot system tasks, we propose a distinct paradigm: applying cooperative-heterogeneous MARL to optimize locomotion for a single humanoid robot. The proposed method, multi-agent reinforcement learning for single humanoid locomotion (MASH), treats each limb (legs and arms) as an independent agent that explores the robot's action space while sharing a global critic for cooperative learning. Experiments demonstrate that MASH accelerates training convergence and improves whole-body cooperation ability, outperforming conventional single-agent reinforcement learning methods. This work advances the integration of MARL into single-humanoid-robot control, offering new insights into efficient locomotion strategies.
Updated: 2025-08-14 07:54:31
Categories: cs.RO,cs.AI,cs.SY,eess.SY
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
Narrative comprehension on long stories and novels has been a challenging domain attributed to their intricate plotlines and entangled, often evolving relations among characters and entities. Given the LLM's diminished reasoning over extended context and high computational cost, retrieval-based approaches remain a pivotal role in practice. However, traditional RAG methods can fall short due to their stateless, single-step retrieval process, which often overlooks the dynamic nature of capturing interconnected relations within long-range context. In this work, we propose ComoRAG, holding the principle that narrative reasoning is not a one-shot process, but a dynamic, evolving interplay between new evidence acquisition and past knowledge consolidation, analogous to human cognition when reasoning with memory-related signals in the brain. Specifically, when encountering a reasoning impasse, ComoRAG undergoes iterative reasoning cycles while interacting with a dynamic memory workspace. In each cycle, it generates probing queries to devise new exploratory paths, then integrates the retrieved evidence of new aspects into a global memory pool, thereby supporting the emergence of a coherent context for the query resolution. Across four challenging long-context narrative benchmarks (200K+ tokens), ComoRAG outperforms strong RAG baselines with consistent relative gains up to 11% compared to the strongest baseline. Further analysis reveals that ComoRAG is particularly advantageous for complex queries requiring global comprehension, offering a principled, cognitively motivated paradigm for retrieval-based long context comprehension towards stateful reasoning. Our code is publicly released at https://github.com/EternityJune25/ComoRAG
Updated: 2025-08-14 07:52:09
Categories: cs.CL,cs.AI,cs.LG
MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning
Marine videos present significant challenges for video understanding due to the dynamics of marine objects and the surrounding environment, camera motion, and the complexity of underwater scenes. Existing video captioning datasets, typically focused on generic or human-centric domains, often fail to generalize to the complexities of the marine environment or to yield insights about marine life. To address these limitations, we propose a two-stage marine object-oriented video captioning pipeline. We introduce a comprehensive video understanding benchmark that leverages the triplets of video, text, and segmentation masks to facilitate visual grounding and captioning, leading to improved marine video understanding and analysis, and marine video generation. Additionally, we highlight the effectiveness of video splitting in order to detect salient object transitions in scene changes, which significantly enriches the semantics of captioning content. Our dataset and code have been released at https://msc.hkustvgd.com.
Updated: 2025-08-14 07:50:06
Categories: cs.CV,cs.AI,cs.MM
CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model
Existing vision-and-language navigation models often deviate from the correct trajectory when executing instructions. However, these models lack effective error correction capability, hindering their recovery from errors. To address this challenge, we propose Self-correction Flywheel, a novel post-training paradigm. Instead of considering the model's error trajectories on the training set as a drawback, our paradigm emphasizes their significance as a valuable data source. We have developed a method to identify deviations in these error trajectories and devised innovative techniques to automatically generate self-correction data for perception and action. These self-correction data serve as fuel to power the model's continued training. The brilliance of our paradigm is revealed when we re-evaluate the model on the training set, uncovering new error trajectories. At this time, the self-correction flywheel begins to spin. Through multiple flywheel iterations, we progressively enhance our monocular RGB-based VLA navigation model CorrectNav. Experiments on R2R-CE and RxR-CE benchmarks show CorrectNav achieves new state-of-the-art success rates of 65.1% and 69.3%, surpassing prior best VLA navigation models by 8.2% and 16.4%. Real robot tests in various indoor and outdoor environments demonstrate CorrectNav's superior capability of error correction, dynamic obstacle avoidance, and long instruction following.
Updated: 2025-08-14 07:39:26
Categories: cs.RO,cs.AI,cs.CL,cs.CV
MCP2OSC: Parametric Control by Natural Language
Text prompts enable intuitive content creation but may fall short in achieving high precision for intricate tasks; knob or slider controls offer precise adjustments at the cost of increased complexity. To address the gap between knobs and prompts, a new MCP (Model Context Protocol) server and a unique set of prompt design criteria are presented to enable exploring parametric OSC (OpenSoundControl) control by natural language prompts. Demonstrated by 14 practical QA examples with best practices and the generalized prompt templates, this study finds Claude integrated with the MCP2OSC server effective in generating OSC messages by natural language, interpreting, searching, and visualizing OSC messages, validating and debugging OSC messages, and managing OSC address patterns. MCP2OSC enhances human-machine collaboration by leveraging LLM (Large Language Model) to handle intricate OSC development tasks, and by empowering human creativity with an intuitive language interface featuring flexible precision controls: a prompt-based OSC tool. This study provides a novel perspective on the creative MCP application at the network protocol level by utilizing LLM's strength in directly processing and generating human-readable OSC messages. The results suggest its potential for an LLM-based universal control mechanism for multimedia devices.
Updated: 2025-08-14 07:38:01
Categories: cs.HC,cs.AI,cs.SD,eess.AS
Enhancing GraphQL Security by Detecting Malicious Queries Using Large Language Models, Sentence Transformers, and Convolutional Neural Networks
GraphQL's flexibility, while beneficial for efficient data fetching, introduces unique security vulnerabilities that traditional API security mechanisms often fail to address. Malicious GraphQL queries can exploit the language's dynamic nature, leading to denial-of-service attacks, data exfiltration through injection, and other exploits. Existing solutions, such as static analysis, rate limiting, and general-purpose Web Application Firewalls, offer limited protection against sophisticated, context-aware attacks. This paper presents a novel, AI-driven approach for real-time detection of malicious GraphQL queries. Our method combines static analysis with machine learning techniques, including Large Language Models (LLMs) for dynamic schema-based configuration, Sentence Transformers (SBERT and Doc2Vec) for contextual embedding of query payloads, and Convolutional Neural Networks (CNNs), Random Forests, and Multilayer Perceptrons for classification. We detail the system architecture, implementation strategies optimized for production environments (including ONNX Runtime optimization and parallel processing), and evaluate the performance of our detection models and the overall system under load. Results demonstrate high accuracy in detecting various threats, including SQL injection, OS command injection, and XSS exploits, alongside effective mitigation of DoS and SSRF attempts. This research contributes a robust and adaptable solution for enhancing GraphQL API security.
Updated: 2025-08-14 07:35:11
Categories: cs.CR,cs.AI,cs.LG
Visual SLAMMOT Considering Multiple Motion Models
Simultaneous Localization and Mapping (SLAM) and Multi-Object Tracking (MOT) are pivotal tasks in the realm of autonomous driving, attracting considerable research attention. While SLAM endeavors to generate real-time maps and determine the vehicle's pose in unfamiliar settings, MOT focuses on the real-time identification and tracking of multiple dynamic objects. Despite their importance, the prevalent approach treats SLAM and MOT as independent modules within an autonomous vehicle system, leading to inherent limitations. Classical SLAM methodologies often rely on a static environment assumption, suitable for indoor rather than dynamic outdoor scenarios. Conversely, conventional MOT techniques typically rely on the vehicle's known state, constraining the accuracy of object state estimations based on this prior. To address these challenges, previous efforts introduced the unified SLAMMOT paradigm, yet primarily focused on simplistic motion patterns. In our team's previous work, IMM-SLAMMOT, we presented a novel methodology that incorporates multiple motion models into SLAMMOT, i.e., tightly coupled SLAM and MOT, demonstrating its efficacy in LiDAR-based systems. This paper studies the feasibility and advantages of instantiating this methodology as visual SLAMMOT, bridging the gap between LiDAR and vision-based sensing mechanisms. Specifically, we propose a visual SLAMMOT solution that considers multiple motion models and validate the inherent advantages of IMM-SLAMMOT in the visual domain.
Updated: 2025-08-14 07:34:43
Categories: cs.RO,cs.AI,cs.CV
AnalogSeeker: An Open-source Foundation Language Model for Analog Circuit Design
In this paper, we propose AnalogSeeker, an effort toward an open-source foundation language model for analog circuit design, with the aim of integrating domain knowledge and giving design assistance. To overcome the scarcity of data in this field, we employ a corpus collection strategy based on the domain knowledge framework of analog circuits. High-quality, accessible textbooks across relevant subfields are systematically curated and cleaned into a textual domain corpus. To address the complexity of knowledge of analog circuits, we introduce a granular domain knowledge distillation method. Raw, unlabeled domain corpus is decomposed into typical, granular learning nodes, where a multi-agent framework distills implicit knowledge embedded in unstructured text into question-answer data pairs with detailed reasoning processes, yielding a fine-grained, learnable dataset for fine-tuning. To address the unexplored challenges in training analog circuit foundation models, we explore and share our training methods through both theoretical analysis and experimental validation. We finally establish a fine-tuning-centric training paradigm, customizing and implementing a neighborhood self-constrained supervised fine-tuning algorithm. This approach enhances training outcomes by constraining the perturbation magnitude between the model's output distributions before and after training. In practice, we train the Qwen2.5-32B-Instruct model to obtain AnalogSeeker, which achieves 85.04% accuracy on AMSBench-TQA, the analog circuit knowledge evaluation benchmark, with a 15.67% point improvement over the original model and is competitive with mainstream commercial models. Furthermore, AnalogSeeker also shows effectiveness in the downstream operational amplifier design task. AnalogSeeker is open-sourced at https://huggingface.co/analogllm/analogseeker for research use.
Updated: 2025-08-14 07:32:07
Categories: cs.AR,cs.AI
Online Distributional Regression
Large-scale streaming data are common in modern machine learning applications and have led to the development of online learning algorithms. Many fields, such as supply chain management, weather and meteorology, energy markets, and finance, have pivoted towards using probabilistic forecasts. This results in the need not only for accurate learning of the expected value but also for learning the conditional heteroskedasticity and conditional moments. Against this backdrop, we present a methodology for online estimation of regularized, linear distributional models. The proposed algorithm is based on a combination of recent developments for the online estimation of LASSO models and the well-known GAMLSS framework. We provide a case study on day-ahead electricity price forecasting, in which we show the competitive performance of the incremental estimation combined with strongly reduced computational effort. Our algorithms are implemented in a computationally efficient Python package ondil.
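The per-observation update of such an online distributional model can be sketched in a few lines. The sketch below is illustrative only and is not the ondil algorithm: it fits linear predictors for both the mean and the log-scale by SGD on the Gaussian negative log-likelihood, one observation at a time, with LASSO-style soft-thresholding; the learning rate, penalty, and gradient-norm clipping are assumptions added for stability.

```python
import numpy as np

def soft_threshold(w, lam):
    """LASSO-style shrinkage applied after each gradient step."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def clip_norm(g, c=5.0):
    """Bound the gradient norm so single observations cannot destabilize the stream."""
    n = np.linalg.norm(g)
    return g * (c / n) if n > c else g

class OnlineGaussianRegression:
    """Linear predictors for both the mean and the log-scale (GAMLSS-style),
    updated one observation at a time by SGD on the Gaussian negative
    log-likelihood, with L1 shrinkage on both coefficient vectors."""
    def __init__(self, d, lr=0.02, lam=1e-3):
        self.beta_mu = np.zeros(d)   # coefficients of the mean
        self.beta_ls = np.zeros(d)   # coefficients of the log-scale
        self.lr, self.lam = lr, lam

    def update(self, x, y):
        mu = x @ self.beta_mu
        sigma = np.exp(x @ self.beta_ls)
        z = (y - mu) / sigma
        g_mu = -(z / sigma) * x      # d NLL / d beta_mu
        g_ls = (1.0 - z * z) * x     # d NLL / d beta_ls
        self.beta_mu = soft_threshold(self.beta_mu - self.lr * clip_norm(g_mu), self.lam)
        self.beta_ls = soft_threshold(self.beta_ls - self.lr * clip_norm(g_ls), self.lam)

rng = np.random.default_rng(1)
model = OnlineGaussianRegression(d=2)
for _ in range(20000):
    x = np.array([1.0, rng.normal()])
    y = 2.0 * x[1] + rng.normal()    # true mean 2*x_1, true sd 1
    model.update(x, y)
print(model.beta_mu)                 # slope coefficient settles near 2
```

The point of modeling the log-scale alongside the mean is exactly the conditional heteroskedasticity the abstract mentions: the same stream of observations updates both the forecast and its uncertainty.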
Updated: 2025-08-14 07:26:03
Categories: stat.ML,cs.LG,econ.EM,stat.AP,stat.CO,stat.ME
Semantic-Enhanced Time-Series Forecasting via Large Language Models
Time series forecasting plays a significant role in finance, energy, meteorology, and IoT applications. Recent studies have leveraged the generalization capabilities of large language models (LLMs) to adapt to time series forecasting, achieving promising performance. However, existing studies focus on token-level modal alignment, instead of bridging the intrinsic modality gap between linguistic knowledge structures and time series data patterns, greatly limiting the semantic representation. To address this issue, we propose a novel Semantic-Enhanced LLM (SE-LLM) that explores the inherent periodicity and anomalous characteristics of time series to embed into the semantic space to enhance the token embedding. This process enhances the interpretability of tokens for LLMs, thereby activating the potential of LLMs for temporal sequence analysis. Moreover, existing Transformer-based LLMs excel at capturing long-range dependencies but are weak at modeling short-term anomalies in time-series data. Hence, we propose a plugin module embedded within self-attention that models long-term and short-term dependencies to effectively adapt LLMs to time-series analysis. Our approach freezes the LLM and reduces the sequence dimensionality of tokens, greatly reducing computational consumption. Experiments demonstrate the superior performance of our SE-LLM against the state-of-the-art (SOTA) methods.
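As an illustration of the kind of "inherent periodicity" signal such a method could extract before embedding it into the semantic space, the dominant period of a series can be read off the FFT magnitude spectrum. This is our illustrative reading, not necessarily SE-LLM's exact procedure:

```python
import numpy as np

def dominant_period(x):
    """Dominant period of a series from the FFT magnitude spectrum:
    find the strongest non-DC frequency bin and convert it back to a
    period in time steps."""
    spec = np.abs(np.fft.rfft(x - x.mean()))
    k = int(np.argmax(spec[1:])) + 1      # skip the DC bin
    return len(x) / k                     # period in time steps

t = np.arange(96)
series = np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(0).normal(size=96)
print(dominant_period(series))            # -> 24.0 (a daily cycle at hourly resolution)
```

A scalar like this (or the whole low-frequency spectrum) can then be projected into the token-embedding space as an extra conditioning signal.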
Updated: 2025-08-14 07:25:44
Categories: cs.LG,cs.CE
EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving
Autonomous driving faces significant challenges in achieving human-like iterative decision-making, which continuously generates, evaluates, and refines trajectory proposals. Current generation-evaluation frameworks isolate trajectory generation from quality assessment, preventing iterative refinement essential for planning, while reinforcement learning methods collapse multi-dimensional preferences into scalar rewards, obscuring critical trade-offs and yielding scalarization bias. To overcome these issues, we present EvaDrive, a novel multi-objective reinforcement learning framework that establishes genuine closed-loop co-evolution between trajectory generation and evaluation via adversarial optimization. EvaDrive frames trajectory planning as a multi-round adversarial game. In this game, a hierarchical generator continuously proposes candidate paths by combining autoregressive intent modeling for temporal causality with diffusion-based refinement for spatial flexibility. These proposals are then rigorously assessed by a trainable multi-objective critic that explicitly preserves diverse preference structures without collapsing them into a single scalarization bias. This adversarial interplay, guided by a Pareto frontier selection mechanism, enables iterative multi-round refinement, effectively escaping local optima while preserving trajectory diversity. Extensive experiments on NAVSIM and Bench2Drive benchmarks demonstrate SOTA performance, achieving 94.9 PDMS on NAVSIM v1 (surpassing DiffusionDrive by 6.8, DriveSuprim by 5.0, and TrajHF by 0.9) and 64.96 Driving Score on Bench2Drive. EvaDrive generates diverse driving styles via dynamic weighting without external preference data, introducing a closed-loop adversarial framework for human-like iterative decision-making, offering a novel scalarization-free trajectory optimization approach.
Updated: 2025-08-14 07:22:36
Categories: cs.LG,cs.AI
Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating scaling the computation spent at inference time. Existing inference-time scaling methods, usually with reward models, cast the task as a search problem, which tends to be vulnerable to reward hacking as a consequence of approximation errors in reward models. In this paper, we instead cast inference-time scaling as a probabilistic inference task and leverage sampling-based techniques to explore the typical set of the state distribution of a state-space model with an approximate likelihood, rather than optimize for its mode directly. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods to this task. Our empirical evaluation demonstrates that our methods achieve a 4-16x better scaling rate than our deterministic search counterparts on various challenging mathematical reasoning tasks. Using our approach, we show that Qwen2.5-Math-1.5B-Instruct can surpass GPT-4o accuracy in only 4 rollouts, while Qwen2.5-Math-7B-Instruct scales to o1 level accuracy in only 32 rollouts. Our work not only presents an effective method to inference-time scaling, but also connects the rich literature in probabilistic inference with inference-time scaling of LLMs to develop more robust algorithms in future work. Code, videos, and further information available at https://probabilistic-inference-scaling.github.io.
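The particle-filter skeleton behind such methods can be sketched generically. In the toy below, `step_fn` and `reward_fn` are stand-ins for an LLM rollout step and a reward model's approximate likelihood; the particle count, the multinomial resampling, and the digit-building task are all assumptions for illustration, not the paper's exact algorithm:

```python
import random

def particle_inference(step_fn, reward_fn, n_particles=8, n_steps=4, seed=0):
    """Particle-based inference-time scaling skeleton: propose, weight by an
    approximate likelihood, resample -- exploring the typical set of the
    posterior rather than greedily optimizing the reward's mode."""
    rng = random.Random(seed)
    particles = [() for _ in range(n_particles)]          # empty partial trajectories
    for _ in range(n_steps):
        particles = [step_fn(p, rng) for p in particles]  # one "reasoning step" each
        weights = [reward_fn(p) for p in particles]
        particles = rng.choices(particles, weights=weights, k=n_particles)  # resample
    return max(particles, key=reward_fn)

# Toy stand-ins: build the digit string "3141" one digit at a time.
TARGET = "3141"

def step(state, rng):
    return state + (rng.randint(0, 9),)

def reward(state):
    # Approximate likelihood: exponential in the number of digits matched so far.
    return 2.0 ** sum(1 for a, b in zip(state, TARGET) if str(a) == b)

best = particle_inference(step, reward)
print(best)   # a 4-digit tuple biased toward matching TARGET
```

Resampling in proportion to the weights, instead of keeping only the argmax as beam search would, is what makes the procedure less brittle to approximation error in the reward.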
Updated: 2025-08-14 07:21:40
Categories: cs.LG,cs.AI
Decentralized Weather Forecasting via Distributed Machine Learning and Blockchain-Based Model Validation
Weather forecasting plays a vital role in disaster preparedness, agriculture, and resource management, yet current centralized forecasting systems are increasingly strained by security vulnerabilities, limited scalability, and susceptibility to single points of failure. To address these challenges, we propose a decentralized weather forecasting framework that integrates Federated Learning (FL) with blockchain technology. FL enables collaborative model training without exposing sensitive local data; this approach enhances privacy and reduces data transfer overhead. Meanwhile, the Ethereum blockchain ensures transparent and dependable verification of model updates. To further enhance the system's security, we introduce a reputation-based voting mechanism that assesses the trustworthiness of submitted models while utilizing the Interplanetary File System (IPFS) for efficient off-chain storage. Experimental results demonstrate that our approach not only improves forecasting accuracy but also enhances system resilience and scalability, making it a viable candidate for deployment in real-world, security-critical environments.
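The FL side of such a framework typically reduces to federated averaging. A minimal sketch, with station counts and data sizes invented for illustration (the blockchain validation, IPFS storage, and reputation voting are out of scope here):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: aggregate locally trained model weights in
    proportion to each client's data size. Raw observations never leave
    the client, which is what keeps the scheme privacy-preserving."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three weather stations with different amounts of local data:
models = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 100, 200]
print(fedavg(models, sizes))   # -> [3.5 4.5]
```

In the proposed system, the aggregated update would then be hashed, stored off-chain via IPFS, and verified on Ethereum before the next round begins.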
Updated: 2025-08-14 07:18:06
Categories: cs.LG,cs.AI,cs.CR
Measuring Diversity in Synthetic Datasets
Large language models (LLMs) are widely adopted to generate synthetic datasets for various natural language processing (NLP) tasks, such as text classification and summarization. However, accurately measuring the diversity of these synthetic datasets, an aspect crucial for robust model performance, remains a significant challenge. In this paper, we introduce DCScore, a novel method for measuring synthetic dataset diversity from a classification perspective. Specifically, DCScore formulates diversity evaluation as a sample classification task, leveraging mutual relationships among samples. We further provide theoretical verification of the diversity-related axioms satisfied by DCScore, highlighting its role as a principled diversity evaluation method. Experimental results on synthetic datasets reveal that DCScore enjoys a stronger correlation with multiple diversity pseudo-truths of evaluated datasets, underscoring its effectiveness. Moreover, both empirical and theoretical evidence demonstrate that DCScore substantially reduces computational costs compared to existing methods. Code is available at: https://github.com/bluewhalelab/dcscore.
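The classification view of diversity can be made concrete with a toy. The sketch below is our reading of the idea, not the exact DCScore formula: each sample is "classified" against all n samples via a softmax over pairwise cosine similarities, and the score sums each sample's probability of being assigned to itself, so n duplicates drag the score toward 1 while n distinct samples push it toward n:

```python
import numpy as np

def dcscore_sketch(X, tau=1.0):
    """Diversity as sample classification: softmax over pairwise cosine
    similarities, then sum the probability of each sample being assigned
    to its own class (the diagonal of the 'classification' matrix)."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = X @ X.T / tau
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return float(np.trace(P))

rng = np.random.default_rng(0)
distinct = rng.normal(size=(8, 16))                        # 8 unrelated samples
duplicates = np.tile(rng.normal(size=(1, 16)), (8, 1))     # 8 copies of one sample
print(dcscore_sketch(distinct) > dcscore_sketch(duplicates))   # True
```

In practice the rows of `X` would be sentence embeddings of the synthetic samples; the single n-by-n pass is also what keeps the cost low relative to pairwise generation-quality judges.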
Updated: 2025-08-14 07:15:48
Categories: cs.CL,cs.AI
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Fine-tuning pre-trained Large Language Models (LLMs) for specialized tasks incurs substantial computational and data costs. While model merging offers a training-free solution to integrate multiple task-specific models, existing methods suffer from safety-utility conflicts where enhanced general capabilities degrade safety safeguards. We identify two root causes: $\textbf{neuron misidentification}$ due to simplistic parameter magnitude-based selection, and $\textbf{cross-task neuron interference}$ during merging. To address these challenges, we propose $\textbf{LED-Merging}$, a three-stage framework that $\textbf{L}$ocates task-specific neurons via gradient-based attribution, dynamically $\textbf{E}$lects critical neurons through multi-model importance fusion, and $\textbf{D}$isjoints conflicting updates through parameter isolation. Extensive experiments on Llama-3-8B, Mistral-7B, and Llama2-13B demonstrate that LED-Merging effectively reduces harmful response rates, showing a 31.4\% decrease on Llama-3-8B-Instruct on HarmBench, while simultaneously preserving 95\% of utility performance, such as achieving 52.39\% accuracy on GSM8K. LED-Merging resolves safety-utility conflicts and provides a lightweight, training-free paradigm for constructing reliable multi-task LLMs. Code is available at $\href{https://github.com/MqLeet/LED-Merging}{GitHub}$.
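The three stages can be sketched on flat parameter vectors. In this toy (not the paper's implementation), given importance scores stand in for gradient-based attribution, the top fraction per task is "elected", and a contested coordinate is awarded to the task with the higher importance, keeping the applied updates disjoint:

```python
import numpy as np

def led_merge(base, task_models, importances, top_frac=0.5):
    """Locate (importance scores), Elect (top fraction per task), Disjoint
    (a contested coordinate goes only to the task with higher importance)."""
    deltas = [m - base for m in task_models]
    masks = []
    for imp in importances:
        k = max(1, int(top_frac * imp.size))
        masks.append(imp >= np.sort(imp)[-k])              # elect the top-k coordinates
    winner = np.argmax(np.stack(importances), axis=0)      # resolve cross-task conflicts
    merged = base.copy()
    for t, (delta, mask) in enumerate(zip(deltas, masks)):
        merged += np.where(mask & (winner == t), delta, 0.0)
    return merged

base = np.zeros(4)
safety = np.array([1.0, 0.0, 0.0, 2.0])
imp_safety = np.array([0.9, 0.1, 0.0, 0.8])
math_t = np.array([0.0, 3.0, 0.0, -2.0])
imp_math = np.array([0.1, 0.9, 0.0, 0.2])
merged = led_merge(base, [safety, math_t], [imp_safety, imp_math])
print(merged)   # coordinate 3 is contested and awarded to the safety model
```

The disjointness is the safety mechanism: the math model's conflicting update on coordinate 3 never touches the merged weights, so the safety behavior carried by that coordinate survives the merge.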
Updated: 2025-08-14 07:15:21
Categories: cs.CL,cs.AI
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
With the rapid proliferation of Natural Language Processing (NLP), especially Large Language Models (LLMs), generating adversarial examples to jailbreak LLMs remains a key challenge for understanding model vulnerabilities and improving robustness. In this context, we propose a new black-box attack method that leverages the interpretability of large models. We introduce the Sparse Feature Perturbation Framework (SFPF), a novel approach for adversarial text generation that utilizes sparse autoencoders to identify and manipulate critical features in text. After using the SAE model to reconstruct hidden layer representations, we perform feature clustering on the successfully attacked texts to identify features with higher activations. These highly activated features are then perturbed to generate new adversarial texts. This selective perturbation preserves the malicious intent while amplifying safety signals, thereby increasing their potential to evade existing defenses. Our method enables a new red-teaming strategy that balances adversarial effectiveness with safety alignment. Experimental results demonstrate that adversarial texts generated by SFPF can bypass state-of-the-art defense mechanisms, revealing persistent vulnerabilities in current NLP systems. However, the method's effectiveness varies across prompts and layers, and its generalizability to other architectures and larger models remains to be validated.
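The selective-perturbation step can be sketched with a toy sparse autoencoder. The weights below are random stand-ins for a trained SAE, so this only illustrates the mechanics of amplifying the most activated features of a hidden state and decoding back:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_feat = 16, 64

# Random stand-ins for a trained sparse autoencoder's weights:
W_enc = rng.normal(size=(d_model, d_feat)) / np.sqrt(d_model)
W_dec = np.linalg.pinv(W_enc)            # decoder approximately inverting the encoder

def perturb_top_features(h, k=4, scale=1.5):
    """Encode a hidden state into sparse features, amplify only the k most
    activated ones, and decode back; the selective perturbation leaves the
    rest of the representation largely intact."""
    f = np.maximum(h @ W_enc, 0.0)       # ReLU feature activations
    top = np.argsort(f)[-k:]             # indices of the most active features
    f[top] *= scale
    return f @ W_dec

h = rng.normal(size=d_model)
h_adv = perturb_top_features(h)
print(h_adv.shape)                       # -> (16,)
```

In the actual attack setting, the perturbed representation guides the generation of a new adversarial text rather than being injected directly into the model.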
Updated: 2025-08-14 07:12:44
Categories: cs.CL,cs.AI
Compass-Thinker-7B Technical Report
Recent R1-Zero-like research further demonstrates that reasoning extension has given large language models (LLMs) unprecedented reasoning capabilities, and Reinforcement Learning is the core technology to elicit its complex reasoning. However, conducting RL experiments directly on hyperscale models involves high computational costs and resource demands, posing significant risks. We propose the Compass-Thinker-7B model, which aims to explore the potential of Reinforcement Learning with less computational resources and costs, and provides insights for further research into RL recipes for larger models. Compass-Thinker-7B is trained from an open source model through a specially designed Reinforcement Learning Pipeline. We curate a dataset of 30k verifiable mathematics problems for the Reinforcement Learning Pipeline. By configuring data and training settings with different difficulty distributions for different stages, the potential of the model is gradually released and the training efficiency is improved. Extensive evaluations show that Compass-Thinker-7B possesses exceptional reasoning potential, and achieves superior performance on mathematics compared to the same-sized RL model. Especially in the challenging AIME2024 evaluation, Compass-Thinker-7B achieves 40% accuracy.
Updated: 2025-08-14 07:12:38
Categories: cs.AI
Efficient Distributed Optimization under Heavy-Tailed Noise
Distributed optimization has become the default training paradigm in modern machine learning due to the growing scale of models and datasets. To mitigate communication overhead, local updates are often applied before global aggregation, resulting in a nested optimization approach with inner and outer steps. However, heavy-tailed stochastic gradient noise remains a significant challenge, particularly in attention-based models, hindering effective training. In this work, we propose TailOPT, an efficient framework designed to address heavy-tailed noise by leveraging adaptive optimization or clipping techniques. We establish convergence guarantees for the TailOPT framework under heavy-tailed noise with potentially unbounded gradient variance and local updates. Among its variants, we highlight a memory and communication efficient instantiation which we call $Bi^2Clip$, which performs coordinate-wise clipping at both the inner and outer optimizers, achieving adaptive-like performance (e.g., Adam) without the cost of maintaining or transmitting additional gradient statistics. Empirically, TailOPT, including $Bi^2Clip$, demonstrates superior performance on several language tasks and models, outperforming state-of-the-art methods.
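The $Bi^2Clip$ idea, coordinate-wise clipping at both the inner and the outer optimizer, can be sketched in a few lines; the learning rates and clipping thresholds below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def coord_clip(g, tau):
    """Coordinate-wise clipping: every entry is limited to [-tau, tau]."""
    return np.clip(g, -tau, tau)

def bi2clip_round(x, client_grads, inner_lr=0.1, outer_lr=1.0, tau_in=1.0, tau_out=0.5):
    """One communication round: clip inside each client's local steps AND
    again when the server aggregates the pseudo-gradients, taming
    heavy-tailed noise without Adam-style second-moment state."""
    deltas = []
    for grads in client_grads:            # each client holds a list of local gradients
        x_local = x.copy()
        for g in grads:
            x_local -= inner_lr * coord_clip(g, tau_in)    # inner clipping
        deltas.append(x_local - x)
    avg_delta = np.mean(deltas, axis=0)
    return x + outer_lr * coord_clip(avg_delta, tau_out)   # outer clipping

x = np.zeros(4)
# One client sees a heavy-tailed outlier in its last gradient coordinate:
grads = [[np.array([0.1, 0.1, 0.1, 50.0])], [np.array([0.1, 0.1, 0.1, 0.1])]]
x_new = bi2clip_round(x, grads)
print(x_new)   # the 50.0 outlier moves its coordinate by at most inner_lr * tau_in
```

Because only clip thresholds are kept, no per-coordinate second-moment statistics need to be stored or transmitted, which is the memory and communication saving the abstract highlights.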
Updated: 2025-08-14 07:03:00
Categories: cs.LG
Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free
Large Language Models (LLMs) face deployment challenges due to high computational costs, and while Post-Training Quantization (PTQ) offers a solution, existing rotation-based methods struggle at very low bit-widths like 2-bit. We introduce a novel, training-free approach to construct an improved rotation matrix, addressing the limitations of current methods. The key contributions include leveraging the Walsh-Hadamard transform with sequency ordering, which clusters similar frequency components to reduce quantization error compared to standard Hadamard matrices, significantly improving performance. Furthermore, we propose a Grouped Sequency-arranged Rotation (GSR) using block-diagonal matrices with smaller Walsh blocks, effectively isolating outlier impacts and achieving performance comparable to optimization-based methods without requiring any training. Our method demonstrates robust performance on reasoning tasks and in perplexity (PPL) on WikiText-2. Our method also enhances results even when applied on top of existing learned rotation techniques.
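A sequency-ordered Walsh matrix is the standard (Sylvester) Hadamard matrix with its rows sorted by number of sign changes, so that rows with similar "frequency" content sit next to each other. The sketch below builds such a matrix and a GSR-style block-diagonal rotation from small Walsh blocks; the block size and construction details are illustrative, not the paper's exact configuration:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def walsh_sequency(n):
    """Walsh matrix with rows sorted by sequency (number of sign changes),
    normalized so the result is orthogonal."""
    H = hadamard(n)
    changes = (np.diff(np.sign(H), axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes, kind="stable")] / np.sqrt(n)

def grouped_rotation(dim, block=4):
    """Block-diagonal rotation built from small Walsh blocks (GSR-style sketch)."""
    W = walsh_sequency(block)
    R = np.zeros((dim, dim))
    for i in range(dim // block):
        R[i * block:(i + 1) * block, i * block:(i + 1) * block] = W
    return R
```

Because each Walsh block is orthogonal, the full block-diagonal matrix is a valid rotation, and an outlier in one block cannot spread beyond its block after the transform.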
Updated: 2025-08-14 07:02:58
Categories: cs.LG,cs.AI,cs.CL
LinguaFluid: Language Guided Fluid Control via Semantic Rewards in Reinforcement Learning
In the domain of scientific machine learning, designing effective reward functions remains a challenge in reinforcement learning (RL), particularly in environments where task goals are difficult to specify numerically. Reward functions in existing work are predominantly based on heuristics, manual engineering, or task-specific tuning. In this work, we introduce a semantically aligned reinforcement learning method where rewards are computed by aligning the current state with a target semantic instruction using Sentence-BERT (SBERT). Instead of relying on manually defined reward functions, the policy receives feedback based on a reward computed as the cosine similarity between the textual description of the goal and the description of the state in the episode. We evaluated our approach in several environments and showed that semantic rewards can guide learning toward competitive control behavior, even in the absence of hand-crafted reward functions. Our study demonstrates a correlation between the language embedding space and the conventional Euclidean space. This framework opens new horizons for aligning agent behavior with natural language goals and lays the groundwork for a more seamless integration of large language models (LLMs) with fluid control applications.
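The reward structure described above can be sketched compactly. The paper uses SBERT sentence embeddings; the bag-of-words `embed` below is a deliberately simple stand-in so the sketch stays self-contained, and all function names are illustrative:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def embed(text, vocab):
    """Stand-in embedding (bag of words over a fixed vocabulary);
    the paper uses SBERT sentence vectors instead."""
    return np.array([text.lower().split().count(w) for w in vocab], dtype=float)

def semantic_reward(goal, state_desc, vocab):
    """Reward = cosine similarity between the goal description and the
    textual description of the current state."""
    return cosine(embed(goal, vocab), embed(state_desc, vocab))
```

The closer the state's description moves toward the goal instruction in embedding space, the larger the reward, with no hand-crafted numeric target needed.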
Updated: 2025-08-14 07:01:16
Categories: cs.LG,physics.flu-dyn
Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference
Artificial Intelligence (AI) conferences are essential for advancing research, sharing knowledge, and fostering academic community. However, their rapid expansion has rendered the centralized conference model increasingly unsustainable. This paper offers a data-driven diagnosis of a structural crisis that threatens the foundational goals of scientific dissemination, equity, and community well-being. We identify four key areas of strain: (1) scientifically, with per-author publication rates more than doubling over the past decade to over 4.5 papers annually; (2) environmentally, with the carbon footprint of a single conference exceeding the daily emissions of its host city; (3) psychologically, with 71% of online community discourse reflecting negative sentiment and 35% referencing mental health concerns; and (4) logistically, with attendance at top conferences such as NeurIPS 2024 beginning to outpace venue capacity. These pressures point to a system that is misaligned with its core mission. In response, we propose the Community-Federated Conference (CFC) model, which separates peer review, presentation, and networking into globally coordinated but locally organized components, offering a more sustainable, inclusive, and resilient path forward for AI research.
Updated: 2025-08-14 06:58:44
Categories: cs.CY,cs.AI,cs.CL
PQ-DAF: Pose-driven Quality-controlled Data Augmentation for Data-scarce Driver Distraction Detection
Driver distraction detection is essential for improving traffic safety and reducing road accidents. However, existing models often suffer from degraded generalization when deployed in real-world scenarios. This limitation primarily arises from the few-shot learning challenge caused by the high cost of data annotation in practical environments, as well as the substantial domain shift between training datasets and target deployment conditions. To address these issues, we propose a Pose-driven Quality-controlled Data Augmentation Framework (PQ-DAF) that leverages a vision-language model for sample filtering to cost-effectively expand training data and enhance cross-domain robustness. Specifically, we employ a Progressive Conditional Diffusion Model (PCDMs) to accurately capture key driver pose features and synthesize diverse training examples. A sample quality assessment module, built upon the CogVLM vision-language model, is then introduced to filter out low-quality synthetic samples based on a confidence threshold, ensuring the reliability of the augmented dataset. Extensive experiments demonstrate that PQ-DAF substantially improves performance in few-shot driver distraction detection, achieving significant gains in model generalization under data-scarce conditions.
Updated: 2025-08-14 06:54:28
Categories: cs.CV,cs.AI
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
Although LLM inference has emerged as a critical workload for many downstream applications, efficiently inferring LLMs is challenging due to the substantial memory footprint and bandwidth requirements. In parallel, compute capabilities have steadily outpaced both memory capacity and bandwidth over the last few decades, a trend that remains evident in modern GPU hardware and exacerbates the challenge of LLM inference. As such, new algorithms are emerging that trade increased computation for reduced memory operations. To that end, we present XQuant, which takes advantage of this trend, enabling an order-of-magnitude reduction in memory consumption through low-bit quantization with substantial accuracy benefits relative to state-of-the-art KV cache quantization methods. We accomplish this by quantizing and caching the layer input activations X, instead of using standard KV caching, and then rematerializing the Keys and Values on-the-fly during inference. This results in an immediate 2$\times$ memory savings compared to KV caching. By applying XQuant, we achieve up to $\sim 7.7\times$ memory savings with $<0.1$ perplexity degradation compared to the FP16 baseline. Furthermore, our approach leverages the fact that X values are similar across layers. Building on this observation, we introduce XQuant-CL, which exploits the cross-layer similarity in the X embeddings for extreme compression. Across different models, XQuant-CL attains up to 10$\times$ memory savings relative to the FP16 baseline with only 0.01 perplexity degradation, and 12.5$\times$ memory savings with only $0.1$ perplexity degradation. XQuant exploits the rapidly increasing compute capabilities of hardware platforms to eliminate the memory bottleneck, while surpassing state-of-the-art KV cache quantization methods and achieving near-FP16 accuracy across a wide range of models.
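The core XQuant trade described above — cache the quantized layer input X instead of K and V, then recompute K and V on the fly — can be sketched as follows. The symmetric per-tensor quantizer and the weight shapes are illustrative assumptions; the paper's actual quantization scheme may differ:

```python
import numpy as np

def quantize(x, bits=8):
    """Symmetric per-tensor quantization of activations X (sketch)."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1) + 1e-12
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def rematerialize_kv(q_x, scale, w_k, w_v):
    """Instead of caching K and V, cache quantized X and recompute K and V
    during inference. One cached tensor instead of two gives the immediate
    2x memory saving; the cost is two extra matmuls per layer."""
    x = dequantize(q_x, scale)
    return x @ w_k, x @ w_v
```

This is exactly the compute-for-memory trade the abstract motivates: as hardware compute outpaces memory bandwidth, recomputing K and V becomes cheaper than storing them.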
Updated: 2025-08-14 06:52:38
Categories: cs.LG
A Unified Evaluation Framework for Multi-Annotator Tendency Learning
Recent works have emerged in multi-annotator learning that shift focus from Consensus-oriented Learning (CoL), which aggregates multiple annotations into a single ground-truth prediction, to Individual Tendency Learning (ITL), which models annotator-specific labeling behavior patterns (i.e., tendency) to provide explanation analysis for understanding annotator decisions. However, no evaluation framework currently exists to assess whether ITL methods truly capture individual tendencies and provide meaningful behavioral explanations. To address this gap, we propose the first unified evaluation framework with two novel metrics: (1) Difference of Inter-annotator Consistency (DIC) quantifies how well models capture annotator tendencies by comparing predicted inter-annotator similarity structures with ground-truth; (2) Behavior Alignment Explainability (BAE) evaluates how well model explanations reflect annotator behavior and decision relevance by aligning explainability-derived with ground-truth labeling similarity structures via Multidimensional Scaling (MDS). Extensive experiments validate the effectiveness of our proposed evaluation framework.
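The DIC metric compares a predicted inter-annotator similarity structure with the ground-truth one. The sketch below uses a simple agreement-rate similarity and a mean absolute difference; the paper's exact similarity measure and aggregation are not given in the abstract, so these choices are assumptions:

```python
import numpy as np

def inter_annotator_similarity(label_matrix):
    """Rows = annotators, columns = items; entry = assigned label.
    Similarity between two annotators = fraction of items labeled identically."""
    A = np.asarray(label_matrix)
    n = A.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = np.mean(A[i] == A[j])
    return S

def dic(pred_labels, true_labels):
    """Difference of Inter-annotator Consistency (sketch): distance between
    the model's predicted similarity structure and the ground-truth one.
    Lower = tendencies better captured."""
    return float(np.abs(inter_annotator_similarity(pred_labels)
                        - inter_annotator_similarity(true_labels)).mean())
```

A model that collapses distinct annotators into one shared tendency inflates predicted pairwise similarity, which this score penalizes.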
Updated: 2025-08-14 06:50:20
Categories: cs.LG,cs.MM,Artificial intelligence
Rethinking Client-oriented Federated Graph Learning
As a new distributed graph learning paradigm, Federated Graph Learning (FGL) facilitates collaborative model training across local systems while preserving data privacy. We review existing FGL approaches and categorize their optimization mechanisms into: (1) Server-Client (S-C), where clients upload local model parameters for server-side aggregation and global updates; (2) Client-Client (C-C), which allows direct exchange of information between clients and customizing their local training process. We reveal that C-C shows superior potential due to its refined communication structure. However, existing C-C methods broadcast redundant node representations, incurring high communication costs and privacy risks at the node level. To this end, we propose FedC4, which combines graph Condensation with C-C Collaboration optimization. Specifically, FedC4 employs graph condensation technique to refine the knowledge of each client's graph into a few synthetic embeddings instead of transmitting node-level knowledge. Moreover, FedC4 introduces three novel modules that allow the source client to send distinct node representations tailored to the target client's graph properties. Experiments on eight public real-world datasets show that FedC4 outperforms state-of-the-art baselines in both task performance and communication cost. Our code is now available on https://github.com/Ereshkigal1/FedC4.
Updated: 2025-08-14 06:48:27
Categories: cs.LG
LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval
Retrieval-Augmented Generation (RAG) plays a crucial role in grounding Large Language Models by leveraging external knowledge, though its effectiveness is often compromised by the retrieval of contextually flawed or incomplete information. To address this, knowledge graph-based RAG methods have evolved towards hierarchical structures, organizing knowledge into multi-level summaries. However, these approaches still suffer from two critical, unaddressed challenges: high-level conceptual summaries exist as disconnected ``semantic islands'', lacking the explicit relations needed for cross-community reasoning; and the retrieval process itself remains structurally unaware, often degenerating into an inefficient flat search that fails to exploit the graph's rich topology. To overcome these limitations, we introduce LeanRAG, a framework with a deeply collaborative design that combines knowledge aggregation and retrieval strategies. LeanRAG first employs a novel semantic aggregation algorithm that forms entity clusters and constructs new explicit relations among aggregation-level summaries, creating a fully navigable semantic network. Then, a bottom-up, structure-guided retrieval strategy anchors queries to the most relevant fine-grained entities and systematically traverses the graph's semantic pathways to gather concise yet contextually comprehensive evidence sets. LeanRAG mitigates the substantial overhead associated with path retrieval on graphs and minimizes redundant information retrieval. Extensive experiments on four challenging QA benchmarks from different domains demonstrate that LeanRAG significantly outperforms existing methods in response quality while reducing retrieval redundancy by 46%. Code is available at: https://github.com/RaZzzyz/LeanRAG
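The bottom-up, structure-guided retrieval can be sketched on a toy aggregation hierarchy: anchor the query to fine-grained entities, then walk each entity's chain of parent summaries, collecting every summary at most once. The graph representation (parent pointers plus a per-node summary) and function names are illustrative assumptions:

```python
def bottom_up_retrieve(query_entities, parent, summary):
    """Structure-guided retrieval sketch: start from the fine-grained entities
    the query anchored to, then climb the aggregation levels, collecting each
    node's summary exactly once to avoid redundant evidence."""
    evidence, seen = [], set()
    for e in query_entities:
        node = e
        while node is not None:
            if node not in seen:
                seen.add(node)
                evidence.append(summary[node])
            node = parent.get(node)  # None at the top of the hierarchy
    return evidence
```

Deduplicating shared ancestors is what keeps the evidence set concise even when several anchored entities belong to the same cluster.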
Updated: 2025-08-14 06:47:18
Categories: cs.AI
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
Evaluating jailbreak attacks is challenging when prompts are not overtly harmful or fail to induce harmful outputs. Unfortunately, many existing red-teaming datasets contain such unsuitable prompts. To evaluate attacks accurately, these datasets need to be assessed and cleaned for maliciousness. However, existing malicious content detection methods rely on either manual annotation, which is labor-intensive, or large language models (LLMs), which have inconsistent accuracy across harmful types. To balance accuracy and efficiency, we propose a hybrid evaluation framework named MDH (Malicious content Detection based on LLMs with Human assistance) that combines LLM-based annotation with minimal human oversight, and apply it to dataset cleaning and detection of jailbroken responses. Furthermore, we find that well-crafted developer messages can significantly boost jailbreak success, leading us to propose two new strategies: D-Attack, which leverages context simulation, and DH-CoT, which incorporates hijacked chains of thought. The code, datasets, judgments, and detection results will be released in the GitHub repository: https://github.com/AlienZhang1996/DH-CoT.
Updated: 2025-08-14 06:46:56
Categories: cs.CL,cs.CR
Towards Efficient Prompt-based Continual Learning in Distributed Medical AI
Modern AI models achieve state-of-the-art performance with large-scale, high-quality datasets; however, ethical, social, and institutional constraints in the medical domain severely restrict data sharing, rendering centralized learning nearly impossible. Each institution must incrementally update models using only local data. Traditional training overfits new samples and suffers from catastrophic forgetting, losing previously acquired knowledge. Medical data distributions also shift due to varying diagnostic equipment and demographics. Although continual learning (CL) has advanced, most methods address natural images, leaving medical-domain-specific CL underexplored. We propose a prompt-based continual learning (PCL) approach featuring a unified prompt pool with a minimal expansion strategy: by expanding and freezing a subset of prompts, our method reduces computational overhead, and a novel regularization term balances retention and adaptation. Experiments on three diabetic retinopathy datasets (Aptos2019, LI2019, and Diabetic Retinopathy Detection) show our model improves final classification accuracy by at least 10% and F1-score by 9 points over state-of-the-art approaches while lowering inference cost. We anticipate this study will drive sustainable medical AI advances, enabling real-time diagnosis, patient monitoring, and telemedicine applications in distributed healthcare. Code will be released upon acceptance.
Updated: 2025-08-14 06:46:14
Categories: cs.LG,cs.AI
Improved Personalized Headline Generation via Denoising Fake Interests from Implicit Feedback
Accurate personalized headline generation hinges on precisely capturing user interests from historical behaviors. However, existing methods neglect personalized-irrelevant click noise in entire historical clickstreams, which may lead to hallucinated headlines that deviate from genuine user preferences. In this paper, we reveal the detrimental impact of click noise on personalized generation quality through rigorous analysis in both user and news dimensions. Based on these insights, we propose a novel Personalized Headline Generation framework via Denoising Fake Interests from Implicit Feedback (PHG-DIF). PHG-DIF first employs dual-stage filtering to effectively remove clickstream noise, identified by short dwell times and abnormal click bursts, and then leverages multi-level temporal fusion to dynamically model users' evolving and multi-faceted interests for precise profiling. Moreover, we release DT-PENS, a new benchmark dataset comprising the click behavior of 1,000 carefully curated users and nearly 10,000 annotated personalized headlines with historical dwell time annotations. Extensive experiments demonstrate that PHG-DIF substantially mitigates the adverse effects of click noise and significantly improves headline quality, achieving state-of-the-art (SOTA) results on DT-PENS. Our framework implementation and dataset are available at https://github.com/liukejin-up/PHG-DIF.
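The dual-stage filtering described above — drop clicks with short dwell times, then drop clicks inside abnormal bursts — can be sketched directly. The thresholds, the click-tuple layout, and the burst heuristic (number of neighboring clicks within a time window) are illustrative assumptions, not the paper's exact criteria:

```python
def filter_clicks(clicks, min_dwell=5.0, burst_window=10.0, burst_limit=3):
    """Two-stage click-noise filter (sketch).
    `clicks` = list of (timestamp, dwell_seconds, item_id) tuples."""
    # Stage 1: drop clicks whose dwell time is below the threshold.
    kept = [c for c in clicks if c[1] >= min_dwell]
    # Stage 2: drop clicks that sit inside an abnormal burst, i.e. too many
    # surviving clicks within a short time window around them.
    out = []
    for t, dwell, item in kept:
        neighbors = sum(1 for t2, _, _ in kept if abs(t2 - t) <= burst_window)
        if neighbors <= burst_limit:
            out.append((t, dwell, item))
    return out
```

Only the denoised clickstream then feeds the interest-profiling stage, which is what keeps hallucinated, noise-driven headlines out of the user profile.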
Updated: 2025-08-14 06:43:57
Categories: cs.CL,cs.AI
Unlocking Robust Semantic Segmentation Performance via Label-only Elastic Deformations against Implicit Label Noise
While previous studies on image segmentation focus on handling severe (or explicit) label noise, real-world datasets also exhibit subtle (or implicit) label imperfections. These arise from inherent challenges, such as ambiguous object boundaries and annotator variability. Although not explicitly present, such mild and latent noise can still impair model performance. Typical data augmentation methods, which apply identical transformations to the image and its label, risk amplifying these subtle imperfections and limiting the model's generalization capacity. In this paper, we introduce NSegment+, a novel augmentation framework that decouples image and label transformations to address such realistic noise for semantic segmentation. By introducing controlled elastic deformations only to segmentation labels while preserving the original images, our method encourages models to focus on learning robust representations of object structures despite minor label inconsistencies. Extensive experiments demonstrate that NSegment+ consistently improves performance, achieving mIoU gains of up to +2.29, +2.38, +1.75, and +3.39 on average on Vaihingen, LoveDA, Cityscapes, and PASCAL VOC, respectively, even without bells and whistles, highlighting the importance of addressing implicit label noise. These gains can be further amplified when combined with other training tricks, including CutMix and Label Smoothing.
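The key move — an elastic deformation applied to the label map only, with the image left untouched — can be sketched in pure NumPy. The smoothing method (repeated box blurs instead of a Gaussian filter), nearest-neighbor warping, and parameter values are simplifications for self-containedness; the paper's deformation details may differ:

```python
import numpy as np

def smooth_field(shape, alpha, rng, passes=4):
    """Random displacement field, smoothed with repeated box blurs."""
    f = rng.uniform(-1, 1, size=shape)
    for _ in range(passes):
        f = (f + np.roll(f, 1, 0) + np.roll(f, -1, 0)
               + np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 5.0
    return alpha * f

def deform_label_only(image, label, alpha=3.0, seed=0):
    """Elastic deformation of the label map only (nearest-neighbor warp);
    the image is returned unchanged -- the core of label-only augmentation."""
    rng = np.random.default_rng(seed)
    h, w = label.shape
    dy = smooth_field((h, w), alpha, rng)
    dx = smooth_field((h, w), alpha, rng)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.round(ys + dy).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + dx).astype(int), 0, w - 1)
    return image, label[src_y, src_x]
```

Training on slightly perturbed labels over unperturbed images forces the model to tolerate the minor boundary inconsistencies that real annotations contain.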
Updated: 2025-08-14 06:27:43
Categories: cs.CV,cs.AI
BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models
In recent years, diffusion models have achieved remarkable progress in the field of image generation. However, recent studies have shown that diffusion models are susceptible to backdoor attacks, in which attackers can manipulate the output by injecting covert triggers such as specific visual patterns or textual phrases into the training dataset. Fortunately, with the continuous advancement of defense techniques, defenders have become increasingly capable of identifying and mitigating most backdoor attacks using visual inspection and neural network-based detection methods. However, in this paper, we identify a novel type of backdoor threat that is more lightweight and covert than existing approaches, which we name BadBlocks. It requires only about 30% of the computational resources and 20% of the GPU time typically needed by previous backdoor attacks, yet it successfully injects backdoors and evades the most advanced defense frameworks. BadBlocks enables attackers to selectively contaminate specific blocks within the UNet architecture of diffusion models while maintaining normal functionality in the remaining components. Experimental results demonstrate that BadBlocks achieves a high attack success rate and low perceptual quality loss, even under extremely constrained computational resources and GPU time. Moreover, BadBlocks is able to bypass existing defense frameworks, especially the attention-based backdoor detection method, highlighting it as a novel and noteworthy threat. Ablation studies further demonstrate that effective backdoor injection does not require fine-tuning the entire network and highlight the pivotal role of certain neural network layers in backdoor mapping. Overall, BadBlocks significantly reduces the barrier to conducting backdoor attacks in all aspects. It enables attackers to inject backdoors into large-scale diffusion models even using consumer-grade GPUs.
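Selective contamination of specific UNet blocks amounts to fine-tuning only the parameters belonging to the targeted blocks and freezing everything else. A minimal sketch of that selection step follows; the parameter-name prefixes are hypothetical, and real training code would set `requires_grad` on framework tensors rather than return a flag dict:

```python
def select_trainable(param_names, target_blocks):
    """Return, for each named parameter, whether it should receive gradients.
    Only parameters whose names fall under a targeted block are trainable;
    the rest of the network stays frozen, preserving its normal behavior."""
    return {name: any(name.startswith(b) for b in target_blocks)
            for name in param_names}
```

Freezing most of the network is what makes the attack cheap: gradient computation and optimizer state are only needed for the small contaminated subset.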
Updated: 2025-08-14 06:27:25
Categories: cs.CR,cs.CV
Out of Distribution, Out of Luck: How Well Can LLMs Trained on Vulnerability Datasets Detect Top 25 CWE Weaknesses?
Automated vulnerability detection research has made substantial progress, yet its real-world impact remains limited. Current vulnerability datasets suffer from issues including label inaccuracy rates of 20-71%, extensive duplication, and poor coverage of critical CWE types. These issues create a significant "generalization gap" where models achieve misleading self-testing performance (measured on held-out data from the same dataset for training) by exploiting spurious correlations rather than learning true vulnerability patterns. Our analysis reveals that many models experience substantial performance drops of up to 33% when evaluated on independent data, with some performing close to random guessing. To address these limitations, we present a three-part solution. First, we introduce a manually curated test dataset, BenchVul, covering the MITRE Top 25 Most Dangerous CWEs. Second, we construct a high-quality training dataset, TitanVul, comprising 38,863 functions by aggregating seven public sources and applying deduplication and validation using a novel multi-agent LLM framework. Third, we propose a Realistic Vulnerability Generation (RVG) framework, which synthesizes context-aware vulnerability examples for underrepresented but critical CWE types through simulated development workflows. Our evaluation shows the strengths of each component in closing the generalization gap. First, BenchVul shows the limitations of self-testing: models trained on existing datasets, such as BigVul and CVEfixes, experience performance drops on BenchVul (from 0.776 to 0.519 and from 0.713 to 0.607). Second, training models on TitanVul demonstrates improved generalization, with model performance increasing from 0.584 when evaluated on the same dataset to 0.767 when tested on BenchVul. Third, supplementing TitanVul with RVG-generated data yields further gains, increasing model performance by 14.0% to 0.874.
Updated: 2025-08-14 06:16:52
Categories: cs.CR,cs.SE
Clicks Versus Conversion: Choosing a Recommender's Training Objective in E-Commerce
Ranking product recommendations to optimize for a high click-through rate (CTR) or for high conversion, such as add-to-cart rate (ACR) and Order-Submit-Rate (OSR, view-to-purchase conversion), are standard practices in e-commerce. Optimizing for CTR appears to be a straightforward choice: training data (i.e., click data) are simple to collect and often available in large quantities. Additionally, CTR is used far beyond e-commerce, making it a generalist, easily implemented option. ACR and OSR, on the other hand, are more directly linked to a shop's business goals, such as the Gross Merchandise Value (GMV). In this paper, we compare the effects of using either of these objectives using an online A/B test. Among our key findings, we demonstrate that in our shops, optimizing for OSR produces a GMV uplift more than five times larger than when optimizing for CTR, without sacrificing new product discovery. Our results also provide insights into the different feature importances for each of the objectives.
Updated: 2025-08-14 06:15:23
Categories: cs.IR,cs.LG
Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval
Determining which legal cases are relevant to a given query involves navigating lengthy texts and applying nuanced legal reasoning. Traditionally, this task has demanded significant time and domain expertise to identify key Legal Facts and reach sound juridical conclusions. In addition, existing data with legal case similarities often lack interpretability, making it difficult to understand the rationale behind relevance judgments. With the growing capabilities of large language models (LLMs), researchers have begun investigating their potential in this domain. Nonetheless, the method of employing a general large language model for reliable relevance judgments in legal case retrieval remains largely unexplored. To address this gap in research, we propose a novel few-shot approach where LLMs assist in generating expert-aligned interpretable relevance judgments. The proposed approach decomposes the judgment process into several stages, mimicking the workflow of human annotators and allowing for the flexible incorporation of expert reasoning to improve the accuracy of relevance judgments. Importantly, it also ensures interpretable data labeling, providing transparency and clarity in the relevance assessment process. Through a comparison of relevance judgments made by LLMs and human experts, we empirically demonstrate that the proposed approach can yield reliable and valid relevance assessments. Furthermore, we demonstrate that with minimal expert supervision, our approach enables a large language model to acquire case analysis expertise and subsequently transfers this ability to a smaller model via annotation-based knowledge distillation.
Updated: 2025-08-14 06:13:17
Categories: cs.AI,cs.IR
eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing
State Space Model (SSM)-based machine learning architectures have recently gained significant attention for processing sequential data. Mamba, a recent sequence-to-sequence SSM, offers competitive accuracy with superior computational efficiency compared to state-of-the-art transformer models. While this advantage makes Mamba particularly promising for resource-constrained edge devices, no hardware acceleration frameworks are currently optimized for deploying it in such environments. This paper presents eMamba, a comprehensive end-to-end hardware acceleration framework explicitly designed for deploying Mamba models on edge platforms. eMamba maximizes computational efficiency by replacing complex normalization layers with lightweight hardware-aware alternatives and approximating expensive operations, such as SiLU activation and exponentiation, considering the target applications. Then, it performs an approximation-aware neural architecture search (NAS) to tune the learnable parameters used during approximation. Evaluations with Fashion-MNIST, CIFAR-10, and MARS, an open-source human pose estimation dataset, show eMamba achieves comparable accuracy to state-of-the-art techniques using 1.63-19.9$\times$ fewer parameters. In addition, it generalizes well to large-scale natural language tasks, demonstrating stable perplexity across varying sequence lengths on the WikiText2 dataset. We also quantize and implement the entire eMamba pipeline on an AMD ZCU102 FPGA and ASIC using GlobalFoundries (GF) 22 nm technology. Experimental results show 4.95-5.62$\times$ lower latency and 2.22-9.95$\times$ higher throughput, with 4.77$\times$ smaller area, 9.84$\times$ lower power, and 48.6$\times$ lower energy consumption than baseline solutions while maintaining competitive accuracy.
Updated: 2025-08-14 06:08:05
Categories: cs.LG,cs.AI
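The eMamba abstract above mentions approximating expensive operations such as the SiLU activation with lightweight hardware-aware alternatives. As a hedged illustration only (the paper's actual approximation is tuned via NAS and may differ), a common hardware-friendly stand-in replaces the sigmoid inside SiLU with a clamped linear ramp, removing the exponential entirely:

```python
import numpy as np

def silu(x):
    # Exact SiLU: x * sigmoid(x) -- requires an exponential per element.
    return x / (1.0 + np.exp(-x))

def hard_silu(x):
    # Piecewise-linear approximation: x * clip((x + 3) / 6, 0, 1).
    # Uses only adds, multiplies, and clamps, which map cheaply to hardware.
    return x * np.clip((x + 3.0) / 6.0, 0.0, 1.0)

xs = np.linspace(-8, 8, 2001)
max_err = np.max(np.abs(silu(xs) - hard_silu(xs)))
print(f"max |SiLU - hard-SiLU| on [-8, 8]: {max_err:.3f}")
```

The worst-case error of this particular approximation stays below 0.15 and concentrates around |x| = 3; a NAS step like eMamba's would then tune such breakpoints as learnable parameters.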
What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles
We investigate the capacity of Large Language Models (LLMs) for imaginative reasoning--the proactive construction, testing, and revision of hypotheses in information-sparse environments. Existing benchmarks, often static or focused on social deduction, fail to capture the dynamic, exploratory nature of this reasoning process. To address this gap, we introduce a comprehensive research framework based on the classic "Turtle Soup" game, integrating a benchmark, an agent, and an evaluation protocol. We present TurtleSoup-Bench, the first large-scale, bilingual, interactive benchmark for imaginative reasoning, comprising 800 turtle soup puzzles sourced from both the Internet and expert authors. We also propose Mosaic-Agent, a novel agent designed to assess LLMs' performance in this setting. To evaluate reasoning quality, we develop a multi-dimensional protocol measuring logical consistency, detail completion, and conclusion alignment. Experiments with leading LLMs reveal clear capability limits, common failure patterns, and a significant performance gap compared to humans. Our work offers new insights into LLMs' imaginative reasoning and establishes a foundation for future research on exploratory agent behavior.
Updated: 2025-08-14 05:55:42
Categories: cs.AI
PromptTSS: A Prompting-Based Approach for Interactive Multi-Granularity Time Series Segmentation
Multivariate time series data, collected across various fields such as manufacturing and wearable technology, exhibit states at multiple levels of granularity, from coarse-grained system behaviors to fine-grained, detailed events. Effectively segmenting and integrating states across these different granularities is crucial for tasks like predictive maintenance and performance optimization. However, existing time series segmentation methods face two key challenges: (1) the inability to handle multiple levels of granularity within a unified model, and (2) limited adaptability to new, evolving patterns in dynamic environments. To address these challenges, we propose PromptTSS, a novel framework for time series segmentation with multi-granularity states. PromptTSS uses a unified model with a prompting mechanism that leverages label and boundary information to guide segmentation, capturing both coarse- and fine-grained patterns while adapting dynamically to unseen patterns. Experiments show PromptTSS improves accuracy by 24.49% in multi-granularity segmentation, 17.88% in single-granularity segmentation, and up to 599.24% in transfer learning, demonstrating its adaptability to hierarchical states and evolving time series dynamics. Our code is available at https://github.com/blacksnail789521/PromptTSS.
Updated: 2025-08-14 05:53:15
Categories: cs.LG,cs.AI
HGAurban: Heterogeneous Graph Autoencoding for Urban Spatial-Temporal Learning
Spatial-temporal graph representations play a crucial role in urban sensing applications, including traffic analysis, human mobility behavior modeling, and citywide crime prediction. However, a key challenge lies in the noisy and sparse nature of spatial-temporal data, which limits existing neural networks' ability to learn meaningful region representations in the spatial-temporal graph. To overcome these limitations, we propose HGAurban, a novel heterogeneous spatial-temporal graph masked autoencoder that leverages generative self-supervised learning for robust urban data representation. Our framework introduces a spatial-temporal heterogeneous graph encoder that extracts region-wise dependencies from multi-source data, enabling comprehensive modeling of diverse spatial relationships. Within our self-supervised learning paradigm, we implement a masked autoencoder that jointly processes node features and graph structure. This approach automatically learns heterogeneous spatial-temporal patterns across regions, significantly improving the representation of dynamic temporal correlations. Comprehensive experiments across multiple spatiotemporal mining tasks demonstrate that our framework outperforms state-of-the-art methods and robustly handles real-world urban data challenges, including noise and sparsity in both spatial and temporal dimensions.
Updated: 2025-08-14 05:43:10
Categories: cs.LG
Code Vulnerability Detection Across Different Programming Languages with AI Models
Security vulnerabilities in code written in diverse programming languages are among the most critical yet complicated aspects of source code to detect. Static analysis tools based on rule-based patterns usually perform poorly at detecting context-dependent bugs and lead to high false positive rates. Recent developments in artificial intelligence, specifically transformer-based models like CodeBERT and CodeLlama, shed light on this problem, as they show promise in detecting such flaws more effectively. This paper presents implementations of these models on various code vulnerability datasets, showing how off-the-shelf models can acquire predictive capacity through dynamic fine-tuning on vulnerable and safe code fragments. The methodology comprises dataset gathering, language normalization, model fine-tuning, and the incorporation of ensemble learning and explainable AI. Experiments show that a well-trained CodeBERT can match or exceed some existing static analyzers, with accuracy greater than 97%. Further study indicates that although language models can achieve close-to-perfect recall, precision can decrease; hybrid models and validation procedures address this by reducing false positives. According to the results, AI-based solutions generalize across programming languages and vulnerability classes. Nevertheless, robustness, interpretability, and deployment readiness are still maturing. The results illustrate the potential of AI to enhance the trustworthiness, usability, and scalability of machine-learning-based vulnerability detectors.
Updated: 2025-08-14 05:41:58
Categories: cs.CR,cs.AI,cs.CL
Is Quantum Optimization Ready? An Effort Towards Neural Network Compression using Adiabatic Quantum Computing
Quantum optimization is the most mature quantum computing technology to date, providing a promising approach towards efficiently solving complex combinatorial problems. Methods such as adiabatic quantum computing (AQC) have been employed in recent years on important optimization problems across various domains. In deep learning, deep neural networks (DNN) have reached immense sizes to support new predictive capabilities. Optimization of large-scale models is critical for sustainable deployment, but becomes increasingly challenging with ever-growing model sizes and complexity. While quantum optimization is suitable for solving complex problems, its application to DNN optimization is not straightforward, requiring thorough reformulation for compatibility with commercially available quantum devices. In this work, we explore the potential of adopting AQC for fine-grained pruning-quantization of convolutional neural networks. We rework established heuristics to formulate model compression as a quadratic unconstrained binary optimization (QUBO) problem, and assess the solution space offered by commercial quantum annealing devices. Through our exploratory efforts of reformulation, we demonstrate that AQC can achieve effective compression of practical DNN models. Experiments demonstrate that adiabatic quantum computing (AQC) not only outperforms classical algorithms like genetic algorithms and reinforcement learning in terms of time efficiency but also excels at identifying global optima.
Updated: 2025-08-14 05:38:23
Categories: quant-ph,cs.AI,cs.PF
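The abstract above formulates model compression as a quadratic unconstrained binary optimization (QUBO) problem, i.e., minimizing x^T Q x over binary keep/prune decisions x. A minimal sketch (with hypothetical importance scores and penalty weights, not the paper's actual formulation) builds a tiny QUBO for four weights and solves it exhaustively, standing in for the quantum annealer on an instance small enough to enumerate:

```python
import itertools
import numpy as np

# Toy QUBO for pruning 4 weights: keep (x_k = 1) or prune (x_k = 0).
# Diagonal terms reward keeping important weights (negative = lower energy);
# off-diagonal terms penalize keeping many weights at once (sparsity).
importance = np.array([0.9, 0.1, 0.7, 0.2])   # hypothetical saliency scores
sparsity_penalty = 0.5
Q = np.diag(-importance) + sparsity_penalty * (np.ones((4, 4)) - np.eye(4)) / 2

def energy(x, Q):
    # QUBO objective: x^T Q x for a binary vector x.
    return float(x @ Q @ x)

# Exhaustive search stands in for the annealer on this 2^4-state instance.
best = min((np.array(bits) for bits in itertools.product([0, 1], repeat=4)),
           key=lambda x: energy(x, Q))
print("mask:", best, "energy:", round(energy(best, Q), 3))
```

On this instance the minimum-energy mask keeps the two high-importance weights and prunes the rest; an adiabatic annealer would sample the same low-energy state for problem sizes where enumeration is infeasible.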
A New Query Expansion Approach via Agent-Mediated Dialogic Inquiry
Query expansion is widely used in Information Retrieval (IR) to improve search outcomes by supplementing initial queries with richer information. While recent Large Language Model (LLM) based methods generate pseudo-relevant content and expanded terms via multiple prompts, they often yield homogeneous, narrow expansions that lack the diverse context needed to retrieve relevant information. In this paper, we propose AMD: a new Agent-Mediated Dialogic Framework that engages in a dialogic inquiry involving three specialized roles: (1) a Socratic Questioning Agent reformulates the initial query into three sub-questions, with each question inspired by a specific Socratic questioning dimension, including clarification, assumption probing, and implication probing, (2) a Dialogic Answering Agent generates pseudo-answers, enriching the query representation with multiple perspectives aligned to the user's intent, and (3) a Reflective Feedback Agent evaluates and refines these pseudo-answers, ensuring that only the most relevant and informative content is retained. By leveraging a multi-agent process, AMD effectively crafts richer query representations through inquiry and feedback refinement. Extensive experiments on benchmarks including BEIR and TREC demonstrate that our framework outperforms previous methods, offering a robust solution for retrieval tasks.
Updated: 2025-08-14 05:37:14
Categories: cs.IR,cs.CL,cs.LG,cs.MA
Generative Active Adaptation for Drifting and Imbalanced Network Intrusion Detection
Machine learning has shown promise in network intrusion detection systems, yet its performance often degrades due to concept drift and imbalanced data. These challenges are compounded by the labor-intensive process of labeling network traffic, especially when dealing with evolving and rare attack types, which makes preparing the right data for adaptation difficult. To address these issues, we propose a generative active adaptation framework that minimizes labeling effort while enhancing model robustness. Our approach employs density-aware dataset prior selection to identify the most informative samples for annotation, and leverages deep generative models to conditionally synthesize diverse samples, thereby augmenting the training set and mitigating the effects of concept drift. We evaluate our end-to-end framework NetGuard on both simulated IDS data and a real-world ISP dataset, demonstrating significant improvements in intrusion detection performance. Our method boosts the overall F1-score from 0.60 (without adaptation) to 0.86. Rare attacks such as Infiltration, Web Attack, and FTP-BruteForce, which originally achieved F1 scores of 0.001, 0.04, and 0.00, improve to 0.30, 0.50, and 0.71, respectively, with generative active adaptation in the CIC-IDS 2018 dataset. Our framework effectively enhances rare attack detection while reducing labeling costs, making it a scalable and practical solution for intrusion detection.
Updated: 2025-08-14 05:27:49
Categories: cs.NI,cs.CR,cs.LG
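The NetGuard abstract above selects the most informative samples for annotation via density-aware selection. A minimal stand-in (the paper's actual criterion likely combines density with other priors) scores each unlabeled point by its local density, estimated from k-nearest-neighbor distances, so that annotation effort goes to representative dense regions rather than isolated outliers:

```python
import numpy as np

rng = np.random.default_rng(0)
# 30 points in a tight cluster plus one far-away outlier (index 30).
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)),
               np.array([[10.0, 10.0]])])

def density_scores(X, k=5):
    # Local density = inverse of the mean distance to the k nearest neighbors.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # ignore self-distances
    knn = np.sort(d, axis=1)[:, :k]
    return 1.0 / knn.mean(axis=1)

scores = density_scores(X)
selected = np.argsort(scores)[::-1][:10]  # annotate the 10 densest points
print("outlier selected:", 30 in selected)
```

The outlier's large neighbor distances give it the lowest density score, so the labeling budget is spent on points that represent the bulk of the traffic distribution.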
A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set
The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either chosen heuristically, without compelling theoretical justification, or optimally in view of restrictive distributional assumptions. In this paper, we propose a principled approach to construct covariance estimators without imposing restrictive assumptions. That is, we study distributionally robust covariance estimation problems that minimize the worst-case Frobenius error with respect to all data distributions close to a nominal distribution, where the proximity of distributions is measured via a divergence on the space of covariance matrices. We identify mild conditions on this divergence under which the resulting minimizers represent shrinkage estimators. We show that the corresponding shrinkage transformations are intimately related to the geometrical properties of the underlying divergence. We also prove that our robust estimators are efficiently computable and asymptotically consistent and that they enjoy finite-sample performance guarantees. We exemplify our general methodology by synthesizing explicit estimators induced by the Kullback-Leibler, Fisher-Rao, and Wasserstein divergences. Numerical experiments based on synthetic and real data show that our robust estimators are competitive with state-of-the-art estimators.
Updated: 2025-08-14 05:17:33
Categories: stat.ML,cs.LG,math.OC
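The abstract above concerns shrinking the eigenvalues of the sample covariance matrix toward a data-insensitive target. As a baseline illustration of the shrinkage idea (classic linear shrinkage toward a scaled identity, not the paper's divergence-induced estimators), the sketch below shows that convexly combining the sample covariance with (tr(S)/d) I pulls every eigenvalue toward their common mean:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 20, 25                      # high-dimensional regime: n barely exceeds d
X = rng.normal(size=(n, d))        # true covariance is the identity
S = np.cov(X, rowvar=False)        # noisy sample covariance

def linear_shrinkage(S, alpha):
    # Pull S toward the scaled identity (tr(S)/d) * I. Each eigenvalue
    # lambda_i of S maps to (1 - alpha) * lambda_i + alpha * tr(S)/d,
    # i.e., the whole spectrum contracts toward its mean.
    d = S.shape[0]
    target = (np.trace(S) / d) * np.eye(d)
    return (1 - alpha) * S + alpha * target

ev_raw = np.linalg.eigvalsh(S)
ev_shr = np.linalg.eigvalsh(linear_shrinkage(S, alpha=0.5))
print(f"eigenvalue spread raw: {ev_raw.max() - ev_raw.min():.2f}, "
      f"shrunk: {ev_shr.max() - ev_shr.min():.2f}")
```

With alpha = 0.5 the eigenvalue spread is exactly halved; the paper's contribution is to derive such transformations from the geometry of the chosen divergence rather than picking alpha heuristically.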
Semantic Communication with Distribution Learning through Sequential Observations
Semantic communication aims to convey meaning rather than bit-perfect reproduction, representing a paradigm shift from traditional communication. This paper investigates distribution learning in semantic communication where receivers must infer the underlying meaning distribution through sequential observations. While semantic communication traditionally optimizes individual meaning transmission, we establish fundamental conditions for learning source statistics when priors are unknown. We prove that learnability requires full rank of the effective transmission matrix, characterize the convergence rate of distribution estimation, and quantify how estimation errors translate to semantic distortion. Our analysis reveals a fundamental trade-off: encoding schemes optimized for immediate semantic performance often sacrifice long-term learnability. Experiments on CIFAR-10 validate our theoretical framework, demonstrating that system conditioning critically impacts both learning rate and achievable performance. These results provide the first rigorous characterization of statistical learning in semantic communication and offer design principles for systems that balance immediate performance with adaptation capability.
Updated: 2025-08-14 05:15:05
Categories: cs.LG,cs.NI
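The abstract above proves that learnability of the source statistics requires full rank of the effective transmission matrix. A toy sketch (with a hypothetical 3-meaning channel, not the paper's setup) makes the condition concrete: if the observed symbol distribution is q = M p, a full-rank M lets the receiver invert the channel and recover the meaning distribution p, while a rank-deficient M collapses distinct meanings:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])            # true meaning distribution

# Full-rank column-stochastic transmission matrix: column k gives the
# distribution of observed symbols when meaning k is transmitted.
M_good = np.array([[0.8, 0.1, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8]])

# Rank-deficient matrix: meanings 1 and 2 induce identical observations,
# so no amount of data can tell their probabilities apart.
M_bad = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.9, 0.9],
                  [0.0, 0.0, 0.0]])

q = M_good @ p                           # observed symbol distribution
p_hat = np.linalg.solve(M_good, q)       # receiver inverts the channel
print("recovered:", np.round(p_hat, 3))
print("rank(M_good) =", np.linalg.matrix_rank(M_good),
      " rank(M_bad) =", np.linalg.matrix_rank(M_bad))
```

In practice q must be estimated from finite sequential observations, which is where the paper's convergence-rate and conditioning analysis comes in: a poorly conditioned M amplifies estimation error into semantic distortion.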
Flexible Personalized Split Federated Learning for On-Device Fine-Tuning of Foundation Models
Fine-tuning foundation models is critical for superior performance on personalized downstream tasks, compared to using pre-trained models. Collaborative learning can leverage local clients' datasets for fine-tuning, but limited client data and heterogeneous data distributions hinder effective collaboration. To address the challenge, we propose a flexible personalized federated learning paradigm that enables clients to engage in collaborative learning while maintaining personalized objectives. Given the limited and heterogeneous computational resources available on clients, we introduce flexible personalized split federated learning (FlexP-SFL). Based on split learning, FlexP-SFL allows each client to train a portion of the model locally while offloading the rest to a server, according to resource constraints. Additionally, we propose an alignment strategy to improve personalized model performance on global data. Experimental results show that FlexP-SFL outperforms baseline models in personalized fine-tuning efficiency and final accuracy.
Updated: 2025-08-14 05:14:00
Categories: cs.DC,cs.LG
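The split-learning idea in the FlexP-SFL abstract — each client trains a prefix of the model locally and offloads the rest to a server — can be sketched with a tiny numpy MLP. This is a hedged illustration of the forward-pass split only (weights and the cut point are made up; real split federated learning also exchanges gradients for the backward pass):

```python
import numpy as np

rng = np.random.default_rng(2)
# A tiny 3-layer MLP; the weights stand in for a pre-trained foundation model.
W = [rng.normal(size=(8, 16)),
     rng.normal(size=(16, 16)),
     rng.normal(size=(16, 4))]

def forward(x, layers):
    for Wi in layers:
        x = np.maximum(x @ Wi, 0.0)      # linear layer + ReLU
    return x

x = rng.normal(size=(1, 8))
cut = 1                                  # client can only afford the first layer

# Client-side computation, then activations are offloaded to the server,
# which runs the remaining layers.
activations = forward(x, W[:cut])
server_out = forward(activations, W[cut:])

assert np.allclose(server_out, forward(x, W))  # identical to running it whole
print("split output matches monolithic output")
```

The cut index is exactly the per-client flexibility knob: a resource-rich client raises `cut` to keep more layers local, a constrained one lowers it, and the composed computation is unchanged either way.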
A Hierarchical IDS for Zero-Day Attack Detection in Internet of Medical Things Networks
The Internet of Medical Things (IoMT) is driving a healthcare revolution but remains vulnerable to cyberattacks such as denial of service, ransomware, data hijacking, and spoofing. These networks comprise resource-constrained, heterogeneous devices (e.g., wearable sensors, smart pills, implantables), making traditional centralized Intrusion Detection Systems (IDSs) unsuitable due to response delays, privacy risks, and added vulnerabilities. Centralized IDSs require all sensors to transmit data to a central server, causing delays or network disruptions in dense environments. Running IDSs locally on IoMT devices is often infeasible due to limited computation, and even lightweight IDS components remain at risk if model updates are delayed, leaving them exposed to zero-day attacks that threaten patient health and data security. We propose a multi-level IoMT IDS framework capable of detecting zero-day attacks and distinguishing between known and unknown threats. The first layer (near edge) filters traffic at a coarse level (attack or not) using meta-learning or One-Class Classification (OCC) with the usfAD algorithm. Subsequent layers (far edge, cloud) identify the attack type and its novelty. Experiments on the CICIoMT2024 dataset show 99.77% accuracy and a 97.8% F1-score. The first layer detects zero-day attacks with high accuracy without needing new datasets, ensuring strong applicability in IoMT environments. Additionally, the meta-learning approach achieves high accuracy.
Updated: 2025-08-14 05:08:37
Categories: cs.LG,cs.NI
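The hierarchical design in the abstract above — a coarse attack/benign filter near the edge, finer attack typing further up — can be sketched with toy stand-ins. The paper uses usfAD and meta-learning; here a centroid-distance one-class filter and a nearest-centroid type classifier are purely illustrative substitutes, with made-up traffic features:

```python
import numpy as np

rng = np.random.default_rng(3)
benign_train = rng.normal(0.0, 1.0, size=(200, 4))       # benign traffic only

# Layer 1 (near edge): one-class filter trained on benign data alone,
# so it can flag zero-day attacks it has never seen.
centroid = benign_train.mean(axis=0)
threshold = np.percentile(np.linalg.norm(benign_train - centroid, axis=1), 99)

def layer1_is_attack(x):
    return np.linalg.norm(x - centroid) > threshold

# Layer 2 (far edge / cloud): classify flagged traffic by known attack type.
attack_centroids = {"dos": np.full(4, 6.0), "spoofing": np.full(4, -6.0)}

def layer2_attack_type(x):
    return min(attack_centroids,
               key=lambda t: np.linalg.norm(x - attack_centroids[t]))

sample = np.full(4, 6.2)                                 # anomalous traffic
if layer1_is_attack(sample):
    print("attack detected, type:", layer2_attack_type(sample))
```

The key property mirrored here is that layer 1 needs only benign data, so novel attacks trip the coarse filter without retraining; only the deeper layers need labeled attack examples.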
Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
In this work, we demonstrate that certain machine unlearning methods may fail under straightforward prompt attacks. We systematically evaluate eight unlearning techniques across three model families using output-based, logit-based, and probe analysis to assess the extent to which supposedly unlearned knowledge can be retrieved. While methods like RMU and TAR exhibit robust unlearning, ELM remains vulnerable to specific prompt attacks (e.g., prepending Hindi filler text to the original prompt recovers 57.3% accuracy). Our logit analysis further indicates that unlearned models are unlikely to hide knowledge through changes in answer formatting, given the strong correlation between output and logit accuracy. These findings challenge prevailing assumptions about unlearning effectiveness and highlight the need for evaluation frameworks that can reliably distinguish between genuine knowledge removal and superficial output suppression. To facilitate further research, we publicly release our evaluation framework to easily evaluate prompting techniques to retrieve unlearned knowledge.
Updated: 2025-08-14 05:03:53
Categories: cs.CR,cs.AI,cs.CL,cs.CY,cs.LG
Welfare-Centric Clustering
Fair clustering has traditionally focused on ensuring equitable group representation or equalizing group-specific clustering costs. However, Dickerson et al. (2025) recently showed that these fairness notions may yield undesirable or unintuitive clustering outcomes and advocated for a welfare-centric clustering approach that models the utilities of the groups. In this work, we model group utilities based on both distances and proportional representation and formalize two optimization objectives based on welfare-centric clustering: the Rawlsian (Egalitarian) objective and the Utilitarian objective. We introduce novel algorithms for both objectives and prove theoretical guarantees for them. Empirical evaluations on multiple real-world datasets demonstrate that our methods significantly outperform existing fair clustering baselines.
Updated: 2025-08-14 05:02:32
Categories: cs.LG,cs.AI,cs.CY,cs.DS
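The two objectives named in the abstract above can be made concrete on a fixed clustering. In this hedged sketch (toy points, groups, and centers invented for illustration; the paper's utility model also involves proportional representation), each group's cost is its members' average squared distance to the nearest center; the Rawlsian objective looks at the worst-off group, the Utilitarian objective at the total:

```python
import numpy as np

# Toy instance: 6 one-dimensional points, 2 demographic groups, 2 centers.
points  = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [9.0]])
groups  = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([[0.2], [5.1]])

# Each point's cost is the squared distance to its nearest center.
cost = np.min(np.linalg.norm(points[:, None] - centers[None], axis=-1) ** 2,
              axis=1)

group_costs = np.array([cost[groups == g].mean() for g in (0, 1)])
rawlsian    = group_costs.max()   # egalitarian: cost of the worst-off group
utilitarian = group_costs.sum()   # utilitarian: total group-averaged cost
print(f"group costs: {np.round(group_costs, 3)}, "
      f"Rawlsian: {rawlsian:.3f}, Utilitarian: {utilitarian:.3f}")
```

Here group 1 is badly served because of its far-flung member at 9.0, so the Rawlsian objective would push the optimizer to move a center toward that group even if total cost rises slightly, whereas the Utilitarian objective would not.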
Multi-Agent Trust Region Policy Optimisation: A Joint Constraint Approach
Multi-agent reinforcement learning (MARL) requires coordinated and stable policy updates among interacting agents. Heterogeneous-Agent Trust Region Policy Optimization (HATRPO) enforces per-agent trust region constraints using Kullback-Leibler (KL) divergence to stabilize training. However, assigning each agent the same KL threshold can lead to slow and locally optimal updates, especially in heterogeneous settings. To address this limitation, we propose two approaches for allocating the KL divergence threshold across agents: HATRPO-W, a Karush-Kuhn-Tucker-based (KKT-based) method that optimizes threshold assignment under global KL constraints, and HATRPO-G, a greedy algorithm that prioritizes agents based on improvement-to-divergence ratio. By connecting sequential policy optimization with constrained threshold scheduling, our approach enables more flexible and effective learning in heterogeneous-agent settings. Experimental results demonstrate that our methods significantly boost the performance of HATRPO, achieving faster convergence and higher final rewards across diverse MARL benchmarks. Specifically, HATRPO-W and HATRPO-G achieve comparable improvements in final performance, each exceeding 22.5%. Notably, HATRPO-W also demonstrates more stable learning dynamics, as reflected by its lower variance.
Updated: 2025-08-14 04:48:46
Categories: cs.AI
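The greedy variant described in the abstract above (HATRPO-G) prioritizes agents by improvement-to-divergence ratio under a shared KL budget. A minimal sketch with hypothetical per-agent numbers (the real algorithm operates on trust-region policy updates, not precomputed scalars) shows the allocation logic:

```python
# Hypothetical candidate updates: (agent, expected improvement, KL cost).
candidates = [("a1", 0.30, 0.10), ("a2", 0.50, 0.40),
              ("a3", 0.20, 0.05), ("a4", 0.25, 0.30)]
kl_budget = 0.50

# Greedy: accept updates in decreasing order of improvement-to-divergence
# ratio, as long as they still fit within the remaining global KL budget.
chosen, spent = [], 0.0
for agent, gain, kl in sorted(candidates, key=lambda c: c[1] / c[2],
                              reverse=True):
    if spent + kl <= kl_budget:
        chosen.append(agent)
        spent += kl
print("chosen:", chosen, "KL spent:", round(spent, 2))
```

Note that agent a2 has the largest raw improvement but a poor ratio, so the greedy rule skips it in favor of cheaper updates; this is exactly the heterogeneity the paper exploits over assigning every agent the same KL threshold.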
Concepts or Skills? Rethinking Instruction Selection for Multi-modal Models
Vision-language instruction tuning achieves two main purposes: learning visual concepts and learning visual skills. In this paper, we found that vision-language benchmarks fall into the dichotomy of mainly benefiting from training on instructions with similar skills or visual concepts. Inspired by the discovery, we designed a simple targeted training data selection method to optimize the performance of a given benchmark. We first extract the concepts/skills from the benchmark, determine whether the benchmark predominantly benefits from similar concepts or skills, and finally select instructions with the most matching concepts/skills. Experiments on 10+ benchmarks validate the effectiveness of our targeted data selection method, showing +0.9% over the best existing baseline averaged over all benchmarks and +1.5% on the skill-focused subset. Our findings underscore the importance of recognizing the inherent trade-off within instruction selection, which requires balancing the acquisition of conceptual knowledge against visual skill.
Updated: 2025-08-14 04:48:38
Categories: cs.CV,cs.LG
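The selection step described above — extract concepts/skills from the benchmark, then pick the instructions that match best — can be sketched with set overlap. All tags and instruction names below are hypothetical, and Jaccard similarity is just one plausible matching score, not necessarily the paper's:

```python
# Hypothetical concept tags for a benchmark and a pool of instructions.
benchmark_concepts = {"chart", "bar", "count"}
instruction_pool = {
    "inst_a": {"chart", "count", "color"},
    "inst_b": {"poem", "rhyme"},
    "inst_c": {"chart", "bar", "count", "axis"},
    "inst_d": {"ocr", "sign"},
}

def overlap(tags):
    # Jaccard similarity between instruction tags and benchmark concepts.
    return len(tags & benchmark_concepts) / len(tags | benchmark_concepts)

ranked = sorted(instruction_pool, key=lambda k: overlap(instruction_pool[k]),
                reverse=True)
print("selected (top-2):", ranked[:2])
```

A skill-focused benchmark would swap the concept tags for skill tags in the same pipeline, which is precisely the dichotomy the paper's method decides between before selecting.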
Privacy-preserving Blockchain-enabled Parametric Insurance via Remote Sensing and IoT
Traditional Insurance, a popular approach of financial risk management, has suffered from the issues of high operational costs, opaqueness, inefficiency and a lack of trust. Recently, blockchain-enabled "parametric insurance" through authorized data sources (e.g., remote sensing and IoT) aims to overcome these issues by automating the underwriting and claim processes of insurance policies on a blockchain. However, the openness of blockchain platforms raises a concern of user privacy, as the private user data in insurance claims on a blockchain may be exposed to outsiders. In this paper, we propose a privacy-preserving parametric insurance framework based on succinct zero-knowledge proofs (zk-SNARKs), whereby an insuree submits a zero-knowledge proof (without revealing any private data) for the validity of an insurance claim and the authenticity of its data sources to a blockchain for transparent verification. Moreover, we extend the recent zk-SNARKs to support robust privacy protection for multiple heterogeneous data sources and improve its efficiency to cut the incurred gas cost by 80%. As a proof-of-concept, we implemented a working prototype of bushfire parametric insurance on real-world blockchain platform Ethereum, and present extensive empirical evaluations.
Updated: 2025-08-14 04:41:28
Categories: cs.CR,cs.NI
Collaborative Mean Estimation Among Heterogeneous Strategic Agents: Individual Rationality, Fairness, and Truthful Contribution
We study a collaborative learning problem where $m$ agents aim to estimate a vector $\mu =(\mu_1,\ldots,\mu_d)\in \mathbb{R}^d$ by sampling from associated univariate normal distributions $\{\mathcal{N}(\mu_k, \sigma^2)\}_{k\in[d]}$. Agent $i$ incurs a cost $c_{i,k}$ to sample from $\mathcal{N}(\mu_k, \sigma^2)$. Instead of working independently, agents can exchange data, collecting cheaper samples and sharing them in return for costly data, thereby reducing both costs and estimation error. We design a mechanism to facilitate such collaboration, while addressing two key challenges: ensuring individually rational (IR) and fair outcomes so all agents benefit, and preventing strategic behavior (e.g., non-collection or data fabrication) that leads to socially undesirable outcomes. We design a mechanism and an associated Nash equilibrium (NE) which minimizes the social penalty (the sum of agents' estimation errors and collection costs) while being IR for all agents. We achieve a $\mathcal{O}(\sqrt{m})$-approximation to the minimum social penalty in the worst case and an $\mathcal{O}(1)$-approximation under favorable conditions. Additionally, we establish three hardness results: no nontrivial mechanism (i) guarantees a dominant-strategy equilibrium in which agents report truthfully, (ii) is IR under every strategy profile of the other agents, or (iii) avoids a worst-case $\Omega(\sqrt{m})$ price of stability in any NE. Finally, by integrating concepts from axiomatic bargaining, we demonstrate that our mechanism supports fairer outcomes than a mechanism that merely minimizes the social penalty.
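For intuition, the social penalty can be computed as follows under one simplifying assumption not stated in the abstract: all collected samples are pooled, so every agent's estimation error on coordinate $k$ is $\sigma^2$ divided by the total number of samples of $k$ (the variance of the pooled sample mean). The actual mechanism's error model may differ:

```python
import numpy as np

def social_penalty(costs, counts, sigma2=1.0):
    """Social penalty = total collection cost + sum of estimation errors.

    costs:  (m, d) array, costs[i, k] = agent i's cost to sample N(mu_k, sigma^2)
    counts: (m, d) array, counts[i, k] = samples agent i collects of coordinate k
    Assumes all samples are pooled, so each agent's error on coordinate k is
    sigma^2 / (total samples of k)."""
    m, d = costs.shape
    collection_cost = float((costs * counts).sum())
    pooled = counts.sum(axis=0)  # total samples per coordinate
    # all m agents incur the pooled estimation error on every coordinate
    est_error = m * float((sigma2 / np.maximum(pooled, 1e-12)).sum())
    return collection_cost + est_error
```

With two agents each cheaply covering one coordinate and sharing, the penalty combines 8 units of collection cost with a pooled error of 1.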
Updated: 2025-08-14 04:41:26
Categories: cs.GT,cs.LG
Echoes of Automation: The Increasing Use of LLMs in Newsmaking
The rapid rise of Generative AI (GenAI), particularly LLMs, raises concerns about journalistic integrity and authorship. This study examines AI-generated content across over 40,000 news articles from major, local, and college news media, in various media formats. Using three advanced AI-text detectors (Binoculars, Fast-DetectGPT, and GPTZero), we find a substantial increase in GenAI use in recent years, especially in local and college news. Sentence-level analysis reveals that LLMs are often used in the introductions of news articles, while conclusions are usually written manually. Linguistic analysis shows that GenAI boosts word richness and readability but lowers formality, leading to more uniform writing styles, particularly in local media.
Updated: 2025-08-14 04:40:50
Categories: cs.CL,cs.AI
A Curriculum Learning Approach to Reinforcement Learning: Leveraging RAG for Multimodal Question Answering
This paper describes the solutions of the Dianping-Trust-Safety team for the META CRAG-MM challenge. The challenge requires building a comprehensive retrieval-augmented generation system capable of multi-modal, multi-turn question answering. The competition consists of three tasks: (1) answering questions using structured data retrieved from an image-based mock knowledge graph, (2) synthesizing information from both knowledge graphs and web search results, and (3) handling multi-turn conversations that require context understanding and information aggregation from multiple sources. For Task 1, our solution is based on a vision large language model, enhanced by supervised fine-tuning with knowledge distilled from GPT-4.1. We further applied curriculum learning strategies to guide reinforcement learning, resulting in improved answer accuracy and reduced hallucination. For Tasks 2 and 3, we additionally leveraged web search APIs to incorporate external knowledge, enabling the system to better handle complex queries and multi-turn conversations. Our approach achieved 1st place in Task 1 with a significant lead of 52.38\%, and 3rd place in Task 3, demonstrating the effectiveness of integrating curriculum learning with reinforcement learning in our training pipeline.
Updated: 2025-08-14 04:37:56
Categories: cs.AI,cs.LG
Rhythmic sharing: A bio-inspired paradigm for zero-shot adaptive learning in neural networks
The brain rapidly adapts to new contexts and learns from limited data, a coveted characteristic that artificial intelligence (AI) algorithms struggle to mimic. Inspired by the mechanical oscillatory rhythms of neural cells, we developed a learning paradigm utilizing link strength oscillations, where learning is associated with the coordination of these oscillations. Link oscillations can rapidly change coordination, allowing the network to sense and adapt to subtle contextual changes without supervision. The network becomes a generalist AI architecture, capable of predicting dynamics of multiple contexts including unseen ones. These results make our paradigm a powerful starting point for novel models of cognition. Because our paradigm is agnostic to specifics of the neural network, our study opens doors for introducing rapid adaptive learning into leading AI models.
Updated: 2025-08-14 04:28:36
Categories: cs.LG,cs.AI,math.DS,nlin.AO,physics.bio-ph
Evaluation of Speech Foundation Models for ASR on Child-Adult Conversations in Autism Diagnostic Sessions
Reliable transcription of child-adult conversations in clinical settings is crucial for diagnosing developmental disorders such as autism. Recent advances in deep learning and the availability of large-scale transcribed data have led to the development of speech foundation models that show dramatic improvements in ASR performance. However, their performance on conversational child-adult interactions remains underexplored. In this work, we provide a comprehensive evaluation of ASR performance on a dataset containing child-adult interactions from autism diagnostic sessions, using Whisper, Wav2Vec2, HuBERT, and WavLM. We find that speech foundation models show a noticeable performance drop (15-20% absolute WER) for child speech compared to adult speech in the conversational setting. We then fine-tune the best-performing zero-shot model (Whisper-large) using LoRA in a low-resource setting, yielding 8% and 13% absolute WER improvements for child and adult speech, respectively.
Updated: 2025-08-14 04:22:42
Categories: eess.AS,cs.LG,cs.SD
Interpretable Reward Model via Sparse Autoencoder
Large language models (LLMs) have been widely deployed across numerous fields. Reinforcement Learning from Human Feedback (RLHF) leverages reward models (RMs) as proxies for human preferences to align LLM behaviors with human values, making the accuracy, reliability, and interpretability of RMs critical for effective alignment. However, traditional RMs lack interpretability, offer limited insight into the reasoning behind reward assignments, and are inflexible toward user preference shifts. While recent multidimensional RMs aim for improved interpretability, they often fail to provide feature-level attribution and require costly annotations. To overcome these limitations, we introduce the Sparse Autoencoder-enhanced Reward Model (SARM), a novel architecture that integrates a pretrained Sparse Autoencoder (SAE) into a reward model. SARM maps the hidden activations of LLM-based RM into an interpretable, sparse, and monosemantic feature space, from which a scalar head aggregates feature activations to produce transparent and conceptually meaningful reward scores. Empirical evaluations demonstrate that SARM facilitates direct feature-level attribution of reward assignments, allows dynamic adjustment to preference shifts, and achieves superior alignment performance compared to conventional reward models. Our code is available at https://github.com/schrieffer-z/sarm.
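The core computation can be sketched in a few lines. The matrix shapes and function names below are hypothetical, not SARM's actual architecture; the point is that a reward built as a linear head over sparse, non-negative features decomposes into per-feature attributions:

```python
import numpy as np

def sae_encode(h, W_enc, b_enc):
    """SAE encoder sketch: ReLU(W_enc @ h + b_enc) yields a sparse,
    non-negative feature vector; a negative bias keeps most features off."""
    return np.maximum(W_enc @ h + b_enc, 0.0)

def sarm_reward(h, W_enc, b_enc, w_head):
    """Scalar reward as a linear head over SAE features. Each term
    w_head[j] * f[j] is a feature-level attribution of the score."""
    f = sae_encode(h, W_enc, b_enc)
    contributions = w_head * f
    return contributions.sum(), contributions
```

Because the reward is a sum of per-feature contributions, inspecting `contributions` shows which (ideally monosemantic) features drove a given score.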
Updated: 2025-08-14 04:21:50
Categories: cs.LG
Layer-Wise Analysis of Self-Supervised Representations for Age and Gender Classification in Children's Speech
Children's speech presents challenges for age and gender classification due to high variability in pitch, articulation, and developmental traits. While self-supervised learning (SSL) models perform well on adult speech tasks, their ability to encode speaker traits in children remains underexplored. This paper presents a detailed layer-wise analysis of four Wav2Vec2 variants using the PFSTAR and CMU Kids datasets. Results show that early layers (1-7) capture speaker-specific cues more effectively than deeper layers, which increasingly focus on linguistic information. Applying PCA further improves classification, reducing redundancy and highlighting the most informative components. The Wav2Vec2-large-lv60 model achieves 97.14% (age) and 98.20% (gender) on CMU Kids; base-100h and large-lv60 models reach 86.05% and 95.00% on PFSTAR. These results reveal how speaker traits are structured across SSL model depth and support more targeted, adaptive strategies for child-aware speech interfaces.
Updated: 2025-08-14 04:11:44
Categories: eess.AS,cs.AI,cs.HC,cs.LG,cs.SD
Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance
The rapid advancement of AI has expanded its capabilities across domains, yet introduced critical technical vulnerabilities, such as algorithmic bias and adversarial sensitivity, that pose significant societal risks, including misinformation, inequity, security breaches, physical harm, and eroded public trust. These challenges highlight the urgent need for robust AI governance. We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security (system reliability), Derivative Security (real-world harm mitigation), and Social Ethics (value alignment and accountability). Uniquely, our approach unifies technical methods, emerging evaluation benchmarks, and policy insights to promote transparency, accountability, and trust in AI systems. Through a systematic review of over 300 studies, we identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight. These shortcomings stem from treating governance as an afterthought, rather than a foundational design principle, resulting in reactive, siloed efforts that fail to address the interdependence of technical integrity and societal trust. To overcome this, we present an integrated research agenda that bridges technical rigor with social responsibility. Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy. The accompanying repository is available at https://github.com/ZTianle/Awesome-AI-SG.
Updated: 2025-08-14 04:06:14
Categories: cs.CR
BERTector: Intrusion Detection Based on Joint-Dataset Learning
Intrusion detection systems (IDS) are facing challenges in generalization and robustness due to the heterogeneity of network traffic and the diversity of attack patterns. To address this issue, we propose a new joint-dataset training paradigm for IDS and propose a scalable BERTector framework based on BERT. BERTector integrates three key components: NSS-Tokenizer for traffic-aware semantic tokenization, supervised fine-tuning with a hybrid dataset, and low-rank adaptation (LoRA) for efficient training. Extensive experiments show that BERTector achieves state-of-the-art detection accuracy, strong cross-dataset generalization capabilities, and excellent robustness to adversarial perturbations. This work establishes a unified and efficient solution for modern IDS in complex and dynamic network environments.
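The LoRA component follows the standard low-rank-update form; BERTector's specific rank, scaling, and placement are not given in the abstract, so the sketch below uses the generic formulation:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA sketch: the pretrained weight W stays frozen and is augmented
    with a trainable low-rank update B @ A (rank r much smaller than the
    weight dims), scaled by alpha / r. Only A and B train, which is what
    makes fine-tuning cheap.

    x: (n, d_in), W: (d_out, d_in), A: (r, d_in), B: (d_out, r)."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T
```

At initialization B is typically zero, so the adapted layer starts out exactly equal to the frozen model.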
Updated: 2025-08-14 04:05:01
Categories: cs.CR
Data Pruning by Information Maximization
In this paper, we present InfoMax, a novel data pruning method, also known as coreset selection, designed to maximize the information content of selected samples while minimizing redundancy. By doing so, InfoMax enhances the overall informativeness of the coreset. The information of individual samples is measured by importance scores, which capture their influence or difficulty in model learning. To quantify redundancy, we use pairwise sample similarities, based on the premise that similar samples contribute similarly to the learning process. We formalize the coreset selection problem as a discrete quadratic programming (DQP) task, with the objective of maximizing the total information content, represented as the sum of individual sample contributions minus the redundancies introduced by similar samples within the coreset. To ensure practical scalability, we introduce an efficient gradient-based solver, complemented by sparsification techniques applied to the similarity matrix and dataset partitioning strategies. This enables InfoMax to seamlessly scale to datasets with millions of samples. Extensive experiments demonstrate the superior performance of InfoMax in various data pruning tasks, including image classification, vision-language pre-training, and instruction tuning for large language models. Code is available at https://github.com/hrtan/InfoMax.
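The objective (total importance minus within-coreset redundancy) can be illustrated with a greedy sketch. Note this is an illustrative approximation only: InfoMax itself solves the DQP with a gradient-based solver plus similarity-matrix sparsification and dataset partitioning:

```python
import numpy as np

def infomax_greedy(scores, sim, k):
    """Greedy sketch of the InfoMax objective
        maximize  sum_{i in S} score_i  -  sum_{i<j in S} sim_ij
    over subsets S of size k: repeatedly take the sample whose importance
    score, minus its similarity to everything already selected, is largest."""
    scores = np.asarray(scores, dtype=float)
    sim = np.asarray(sim, dtype=float)
    selected, penalty = [], np.zeros(len(scores))
    for _ in range(k):
        gain = scores - penalty
        gain[selected] = -np.inf        # never re-pick a chosen sample
        i = int(np.argmax(gain))
        selected.append(i)
        penalty += sim[i]               # future picks pay for redundancy with i
    return selected
```

With two near-duplicate high-score samples, the second duplicate's redundancy penalty pushes the selection toward a lower-scoring but dissimilar sample.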
Updated: 2025-08-14 03:59:04
Categories: cs.CV,cs.AI
Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability
Generative adversarial attacks train a perturbation generator on a white-box surrogate model and subsequently apply the crafted perturbations to unseen black-box victim models. In contrast to iterative attacks, these methods deliver superior inference-time efficiency, scalability, and transferability; however, up until now, existing studies have not fully exploited the representational capacity of generative models to preserve and harness semantic information. Specifically, the intermediate activations of the generator encode rich semantic features--object boundaries and coarse shapes--that remain under-exploited, thereby limiting the alignment of perturbations with object-salient regions which are critical for adversarial transferability. To remedy this, we introduce a semantic structure-aware attack framework based on the Mean Teacher, which serves as a temporally smoothed feature reference. With this smoothed reference, we further direct semantic consistency between the early-layer activations in the student and those of the semantically rich teacher by feature distillation. By anchoring perturbation synthesis to the semantically salient early intermediate blocks within the generator based on empirical findings, our method guides progressive adversarial perturbation on regions that substantially enhance adversarial transferability. We conduct extensive experiments over diverse models, domains and tasks to demonstrate consistent improvements relative to state-of-the-art generative attacks, comprehensively evaluated using conventional metrics and our newly proposed Accidental Correction Rate (ACR).
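The Mean Teacher reference and the feature-distillation term can be sketched generically; the exact layers, decay, and loss weighting used in the paper are assumptions here:

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Mean Teacher update: teacher weights are an exponential moving average
    of the student's, yielding a temporally smoothed feature reference."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]

def feature_distill_loss(student_feats, teacher_feats):
    """Mean-squared error between early-layer activations of the student
    generator and the smoothed teacher, encouraging semantic consistency."""
    return float(np.mean((student_feats - teacher_feats) ** 2))
```

Each training step updates the teacher by EMA and adds the distillation loss on early-layer activations to the adversarial objective.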
Updated: 2025-08-14 03:51:59
Categories: cs.CV,cs.AI
Warehouse Spatial Question Answering with LLM Agent
Spatial understanding has been a challenging task for existing Multi-modal Large Language Models~(MLLMs). Previous methods leverage large-scale MLLM finetuning to enhance MLLM's spatial understanding ability. In this paper, we present a data-efficient approach. We propose a LLM agent system with strong and advanced spatial reasoning ability, which can be used to solve the challenging spatial question answering task in complex indoor warehouse scenarios. Our system integrates multiple tools that allow the LLM agent to conduct spatial reasoning and API tools interaction to answer the given complicated spatial question. Extensive evaluations on the 2025 AI City Challenge Physical AI Spatial Intelligence Warehouse dataset demonstrate that our system achieves high accuracy and efficiency in tasks such as object retrieval, counting, and distance estimation. The code is available at: https://github.com/hsiangwei0903/SpatialAgent
Updated: 2025-08-14 03:48:03
Categories: cs.CV,cs.AI
A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning
Existing backdoor defense methods in Federated Learning (FL) rely on the assumption of homogeneous client data distributions or the availability of a clean server dataset, which limits their practicality and effectiveness. Defending against backdoor attacks under heterogeneous client data distributions while preserving model performance remains a significant challenge. In this paper, we propose an FL backdoor defense framework named CLIP-Fed, which leverages the zero-shot learning capabilities of vision-language pre-training models. By integrating both pre-aggregation and post-aggregation defense strategies, CLIP-Fed overcomes the limitations that Non-IID data imposes on defense effectiveness. To address privacy concerns and enhance the coverage of the dataset against diverse triggers, we construct and augment the server dataset using a multimodal large language model and frequency analysis, without any client samples. To address class-prototype deviations caused by backdoor samples and eliminate the correlation between trigger patterns and target labels, CLIP-Fed aligns the knowledge of the global model and CLIP on the augmented dataset using a prototype contrastive loss and Kullback-Leibler divergence. Extensive experiments on representative datasets validate the effectiveness of CLIP-Fed. Compared to state-of-the-art methods, CLIP-Fed achieves an average reduction in ASR of 2.03\% on CIFAR-10 and 1.35\% on CIFAR-10-LT, while improving average MA by 7.92\% and 0.48\%, respectively.
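The KL alignment term can be sketched as below. The direction of the divergence and the use of per-sample class-probability vectors are assumptions; the abstract does not specify either:

```python
import numpy as np

def kl_alignment_loss(p_global, p_clip, eps=1e-12):
    """KL(p_clip || p_global) between class-probability vectors: pulls the
    global model's predictions on the augmented server dataset toward
    CLIP's zero-shot predictions, weakening trigger-label correlations."""
    p = np.asarray(p_clip) + eps
    q = np.asarray(p_global) + eps
    return float(np.sum(p * np.log(p / q)))
```

The loss is zero when the global model already agrees with CLIP and grows as its predictions drift, e.g. toward a backdoor target label.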
Updated: 2025-08-14 03:39:54
Categories: cs.LG,cs.AI
ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool Learning
Tool learning, which allows Large Language Models (LLMs) to leverage external tools for solving complex user tasks, has emerged as a promising avenue for extending model capabilities. However, existing approaches primarily focus on data synthesis for fine-tuning LLMs to invoke tools effectively, largely ignoring how to fully stimulate the potential of the model. In this paper, we propose ToolACE-R, a novel framework that includes both model-aware iterative training and adaptive refinement for tool learning. ToolACE-R features a model-aware iterative training procedure that progressively adjusts training samples based on the model's evolving capabilities to maximize its potential. Additionally, it incorporates a self-refinement training corpus which emphasizes the LLM's ability to iteratively refine its tool calls, optimizing performance without requiring external feedback. Furthermore, we introduce an adaptive self-refinement mechanism for efficient test-time scaling, where the trained model can autonomously determine when to stop the iterative self-refinement process. We conduct extensive experiments across several benchmark datasets, showing that ToolACE-R achieves competitive performance compared to advanced API-based models. The performance of tool invocation can be further improved efficiently through adaptive self-refinement. These results highlight the effectiveness and generalizability of ToolACE-R, offering a promising direction for more efficient and scalable tool learning.
Updated: 2025-08-14 03:37:54
Categories: cs.CL,cs.AI,cs.LG
Diversifying Policy Behaviors with Extrinsic Behavioral Curiosity
Imitation learning (IL) has shown promise in various applications (e.g. robot locomotion) but is often limited to learning a single expert policy, constraining behavior diversity and robustness in unpredictable real-world scenarios. To address this, we introduce Quality Diversity Inverse Reinforcement Learning (QD-IRL), a novel framework that integrates quality-diversity optimization with IRL methods, enabling agents to learn diverse behaviors from limited demonstrations. This work introduces Extrinsic Behavioral Curiosity (EBC), which allows agents to receive additional curiosity rewards from an external critic based on how novel the behaviors are with respect to a large behavioral archive. To validate the effectiveness of EBC in exploring diverse locomotion behaviors, we evaluate our method on multiple robot locomotion tasks. EBC improves the performance of QD-IRL instances with GAIL, VAIL, and DiffAIL across all included environments by up to 185%, 42%, and 150%, even surpassing expert performance by 20% in Humanoid. Furthermore, we demonstrate that EBC is applicable to Gradient-Arborescence-based Quality Diversity Reinforcement Learning (QD-RL) algorithms, where it substantially improves performance and provides a generic technique for learning behavioral-diverse policies. The source code of this work is provided at https://github.com/vanzll/EBC.
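A common way to score novelty against a behavioral archive, as in quality-diversity methods, is the mean distance to the k nearest archived behavior descriptors; EBC's external critic may compute its bonus differently, so treat this as a sketch of the general idea:

```python
import numpy as np

def curiosity_reward(behavior, archive, k=3):
    """Novelty-style curiosity sketch: score a rollout's behavior descriptor
    by its mean distance to the k nearest entries of a behavioral archive;
    the further it is from everything stored, the larger the bonus."""
    if len(archive) == 0:
        return 1.0                      # everything is novel at the start
    d = np.linalg.norm(np.asarray(archive, dtype=float) - behavior, axis=1)
    k = min(k, len(d))
    return float(np.sort(d)[:k].mean())
```

The bonus is added to the imitation-learning reward, so the agent is paid both for matching demonstrations and for discovering behaviors unlike those already archived.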
Updated: 2025-08-14 03:37:11
Categories: cs.LG,cs.AI
Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis
Knowledge syntheses (literature reviews) are essential to health professions education (HPE), consolidating findings to advance theory and practice. However, they are labor-intensive, especially during data extraction. Artificial Intelligence (AI)-assisted extraction promises efficiency but raises concerns about accuracy, making it critical to distinguish AI 'hallucinations' (fabricated content) from legitimate interpretive differences. We developed an extraction platform using large language models (LLMs) to automate data extraction and compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review. AI-human, human-human, and AI-AI consistencies were measured using interrater reliability (categorical) and thematic similarity ratings (open-ended). Errors were identified by comparing extracted responses to source publications. AI was highly consistent with humans for concrete, explicitly stated questions (e.g., title, aims) and lower for questions requiring subjective interpretation or absent in text (e.g., Kirkpatrick's outcomes, study rationale). Human-human consistency was not higher than AI-human and showed the same question-dependent variability. Discordant AI-human responses (769/3179 = 24.2%) were mostly due to interpretive differences (18.3%); AI inaccuracies were rare (1.51%), while humans were nearly three times more likely to state inaccuracies (4.37%). Findings suggest AI variability depends more on interpretability than hallucination. Repeating AI extraction can identify interpretive complexity or ambiguity, refining processes before human review. AI can be a transparent, trustworthy partner in knowledge synthesis, though caution is needed to preserve critical human insights.
Updated: 2025-08-14 03:36:46
Categories: cs.HC,cs.AI,cs.ET
Multi-objective Optimization in CPU Design Space Exploration: Attention is All You Need
Design Space Exploration (DSE) is essential to modern CPU design, yet current frameworks struggle to scale and generalize in high-dimensional architectural spaces. As the dimensionality of design spaces continues to grow, existing DSE frameworks face three fundamental challenges: (1) reduced accuracy and poor scalability of surrogate models in large design spaces; (2) inefficient acquisition guided by hand-crafted heuristics or exhaustive search; (3) limited interpretability, making it hard to pinpoint architectural bottlenecks. In this work, we present \textbf{AttentionDSE}, the first end-to-end DSE framework that \emph{natively integrates} performance prediction and design guidance through an attention-based neural architecture. Unlike traditional DSE workflows that separate surrogate modeling from acquisition and rely heavily on hand-crafted heuristics, AttentionDSE establishes a unified, learning-driven optimization loop, in which attention weights serve a dual role: enabling accurate performance estimation and simultaneously exposing the performance bottleneck. This paradigm shift elevates attention from a passive representation mechanism to an active, interpretable driver of design decision-making. Key innovations include: (1) a \textbf{Perception-Driven Attention} mechanism that exploits architectural hierarchy and locality, scaling attention complexity from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$ via sliding windows; (2) an \textbf{Attention-aware Bottleneck Analysis} that automatically surfaces critical parameters for targeted optimization, eliminating the need for domain-specific heuristics. Evaluated on high-dimensional CPU design space using the SPEC CPU2017 benchmark suite, AttentionDSE achieves up to \textbf{3.9\% higher Pareto Hypervolume} and over \textbf{80\% reduction in exploration time} compared to state-of-the-art baselines.
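The window-based complexity reduction can be shown in isolation. This sketch covers only the locality that turns $\mathcal{O}(n^2)$ into $\mathcal{O}(n \cdot w)$; the hierarchy-aware part of Perception-Driven Attention is not modeled here:

```python
import numpy as np

def sliding_window_attention(Q, K, V, w):
    """Local attention sketch: each query attends only to keys within a
    window of radius w, so the cost is O(n * w) instead of O(n^2)."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())   # stable row softmax
        weights /= weights.sum()
        out[i] = weights @ V[lo:hi]
    return out
```

When `w` covers the whole sequence the result coincides with full attention, which makes the approximation easy to sanity-check.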
Updated: 2025-08-14 03:32:45
Categories: cs.LG,cs.AI,cs.AR
Communication Cost Reduction for Subgraph Counting under Local Differential Privacy via Hash Functions
We suggest the use of hash functions to cut down the communication costs when counting subgraphs under edge local differential privacy. While various algorithms exist for computing graph statistics, including the count of subgraphs, under edge local differential privacy, many suffer from high communication costs, making them less efficient for large graphs. Though data compression is a typical approach in differential privacy, its application in local differential privacy requires a form of compression that every node can reproduce. In our study, we introduce linear congruence hashing. With a sampling rate of $s$, our method can cut communication costs by a factor of $s^2$, albeit at the cost of increasing variance in the published graph statistic by a factor of $s$. The experimental results indicate that, when matched for communication costs, our method achieves a reduction in the $\ell_2$-error for triangle counts by up to 1000 times compared to leading algorithms.
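A toy sketch of the reproducible-hash sampling idea (illustrative hash constants, no DP noise, and a vertex-sampling estimator that only stands in for the paper's actual scheme):

```python
# Illustrative constants; the paper's actual hash parameters are not given here.
A, B, P = 48_271, 11, 2_147_483_647

def lcg_hash(x):
    """Linear congruence hash that every node can recompute locally."""
    return (A * x + B) % P

def sampled(v, s):
    """Deterministically sample vertex v with rate 1/s via the shared hash."""
    return lcg_hash(v) % s == 0

def estimate_triangles(edges, s):
    """Count triangles restricted to sampled vertices and rescale.

    Each node only reports adjacency bits for sampled pairs, shrinking
    messages by roughly s^2; a triangle survives sampling with
    probability 1/s^3, hence the s**3 rescaling."""
    kept = {(u, v) for u, v in edges if sampled(u, s) and sampled(v, s)}
    adj = {}
    for u, v in kept:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    count = 0
    for u, v in kept:
        count += len(adj[u] & adj[v])  # a common neighbour closes a triangle
    return (count // 3) * s**3  # each triangle is counted once per edge

print(estimate_triangles([(1, 2), (2, 3), (1, 3), (3, 4)], 1))  # -> 1
```

With s = 1 nothing is sampled away and the exact count is recovered; larger s trades variance for smaller messages.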
Updated: 2025-08-14 03:29:34
Categories: cs.CR,cs.AI
A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models
Political Compass Test (PCT) or similar questionnaires have been used to quantify LLMs' political leanings. Building on a recent line of work that examines the validity of PCT tests, we demonstrate that variation in standard generation parameters does not significantly impact the models' PCT scores. However, external factors such as prompt variations and fine-tuning, both individually and in combination, do affect the scores. Finally, we demonstrate that when models are fine-tuned on text datasets with higher political content than others, the PCT scores are not differentially affected. This calls for a thorough investigation into the validity of PCT and similar tests, as well as the mechanism by which political leanings are encoded in LLMs.
Updated: 2025-08-14 03:29:22
Categories: cs.CY,cs.CL,cs.LG
ReviewRL: Towards Automated Scientific Review with RL
Peer review is essential for scientific progress but faces growing challenges due to increasing submission volumes and reviewer fatigue. Existing automated review approaches struggle with factual accuracy, rating consistency, and analytical depth, often generating superficial or generic feedback lacking the insights characteristic of high-quality human reviews. We introduce ReviewRL, a reinforcement learning framework for generating comprehensive and factually grounded scientific paper reviews. Our approach combines: (1) an ArXiv-MCP retrieval-augmented context generation pipeline that incorporates relevant scientific literature, (2) supervised fine-tuning that establishes foundational reviewing capabilities, and (3) a reinforcement learning procedure with a composite reward function that jointly enhances review quality and rating accuracy. Experiments on ICLR 2025 papers demonstrate that ReviewRL significantly outperforms existing methods across both rule-based metrics and model-based quality assessments. ReviewRL establishes a foundational framework for RL-driven automatic critique generation in scientific discovery, demonstrating promising potential for future development in this domain. The implementation of ReviewRL will be released at GitHub.
Updated: 2025-08-14 03:26:13
Categories: cs.CL,cs.AI
Yet another algorithmic bias: A Discursive Analysis of Large Language Models Reinforcing Dominant Discourses on Gender and Race
With the advance of Artificial Intelligence (AI), Large Language Models (LLMs) have gained prominence and been applied in diverse contexts. As they evolve into more sophisticated versions, it is essential to assess whether they reproduce biases, such as discrimination and racialization, while maintaining hegemonic discourses. Current bias detection approaches rely mostly on quantitative, automated methods, which often overlook the nuanced ways in which biases emerge in natural language. This study proposes a qualitative, discursive framework to complement such methods. Through manual analysis of LLM-generated short stories featuring Black and white women, we investigate gender and racial biases. We contend that qualitative methods such as the one proposed here are fundamental to help both developers and users identify the precise ways in which biases manifest in LLM outputs, thus enabling better conditions to mitigate them. Results show that Black women are portrayed as tied to ancestry and resistance, while white women appear in self-discovery processes. These patterns reflect how language models replicate crystalized discursive representations, reinforcing essentialization and a sense of social immobility. When prompted to correct biases, models offered superficial revisions that maintained problematic meanings, revealing limitations in fostering inclusive narratives. Our results demonstrate the ideological functioning of algorithms and have significant implications for the ethical use and development of AI. The study reinforces the need for critical, interdisciplinary approaches to AI design and deployment, addressing how LLM-generated discourses reflect and perpetuate inequalities.
Updated: 2025-08-14 03:22:02
Categories: cs.CL,cs.AI
CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization
Fine-tuning large language models (LLMs) using low-rank adaptation (LoRA) has become a highly efficient approach for downstream tasks, particularly in scenarios with limited computational resources. However, applying LoRA techniques to quantized LLMs poses unique challenges due to the reduced representational precision of quantized weights. In this paper, we introduce CLoQ (Calibrated LoRA initialization for Quantized LLMs), a simple initialization strategy designed to overcome these challenges. Our approach focuses on minimizing the layer-wise discrepancy between the original LLM and its quantized counterpart with LoRA components during initialization. By leveraging a small calibration dataset, CLoQ quantizes a pre-trained LLM and determines the optimal LoRA components for each layer, ensuring a strong foundation for subsequent fine-tuning. A key contribution of this work is a novel theoretical result that enables the accurate and closed-form construction of these optimal LoRA components. We validate the efficacy of CLoQ across multiple tasks such as language generation, arithmetic reasoning, and commonsense reasoning, demonstrating that it consistently outperforms existing LoRA fine-tuning methods for quantized LLMs, especially at ultra-low bit widths.
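Since the closed-form construction itself is not spelled out in the abstract, here is a simplified, data-free stand-in: by Eckart-Young, the best rank-r Frobenius-norm correction of the quantization residual W - W_q is its truncated SVD (CLoQ proper calibrates against layer outputs on real data, which effectively weights this residual by activations):

```python
import numpy as np

def lora_init_from_residual(W, W_q, rank):
    """Data-free sketch: the optimal rank-`rank` Frobenius-norm
    approximation of the quantization residual W - W_q is its truncated
    SVD, giving LoRA factors B (out x rank) and A (rank x in)."""
    U, S, Vt = np.linalg.svd(W - W_q, full_matrices=False)
    B = U[:, :rank] * S[:rank]   # absorb singular values into B
    A = Vt[:rank]
    return B, A
```

Initializing LoRA at this SVD means the adapted weights W_q + B @ A start as close as rank allows to the original W, instead of starting from the usual zero correction.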
Updated: 2025-08-14 03:12:42
Categories: cs.LG,cs.AI
Improving Learning of New Diseases through Knowledge-Enhanced Initialization for Federated Adapter Tuning
In healthcare, federated learning (FL) is a widely adopted framework that enables privacy-preserving collaboration among medical institutions. With large foundation models (FMs) demonstrating impressive capabilities, using FMs in FL through cost-efficient adapter tuning has become a popular approach. Given the rapidly evolving healthcare environment, it is crucial for individual clients to quickly adapt to new tasks or diseases by tuning adapters while drawing upon past experiences. In this work, we introduce Federated Knowledge-Enhanced Initialization (FedKEI), a novel framework that leverages cross-client and cross-task transfer from past knowledge to generate informed initializations for learning new tasks with adapters. FedKEI begins with a global clustering process at the server to generalize knowledge across tasks, followed by the optimization of aggregation weights across clusters (inter-cluster weights) and within each cluster (intra-cluster weights) to personalize knowledge transfer for each new task. To facilitate more effective learning of the inter- and intra-cluster weights, we adopt a bi-level optimization scheme that collaboratively learns the global intra-cluster weights across clients and optimizes the local inter-cluster weights toward each client's task objective. Extensive experiments on three benchmark datasets of different modalities, including dermatology, chest X-rays, and retinal OCT, demonstrate FedKEI's advantage in adapting to new diseases compared to state-of-the-art methods.
Updated: 2025-08-14 03:02:48
Categories: cs.LG,cs.CV
SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning
Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. This visual-to-neural mapping is inherently a one-to-many relationship, as identical visual inputs reliably evoke variable hemodynamic responses across trials, contexts, and subjects. However, existing deterministic methods struggle to simultaneously model this biological variability while capturing the underlying functional consistency that encodes stimulus information. To address these limitations, we propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. SynBrain introduces two key components: (i) BrainVAE models neural representations as continuous probability distributions via probabilistic learning while maintaining functional consistency through visual semantic constraints; (ii) A Semantic-to-Neural Mapper acts as a semantic transmission pathway, projecting visual semantics into the neural response manifold to facilitate high-fidelity fMRI synthesis. Experimental results demonstrate that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance. Furthermore, SynBrain adapts efficiently to new subjects with few-shot data and synthesizes high-quality fMRI signals that are effective in improving data-limited fMRI-to-image decoding performance. Beyond that, SynBrain reveals functional consistency across trials and subjects, with synthesized signals capturing interpretable patterns shaped by biological neural variability. The code will be made publicly available.
Updated: 2025-08-14 03:01:05
Categories: cs.LG,cs.CV,eess.IV
Efficient Homomorphically Encrypted Convolutional Neural Network Without Rotation
Privacy-preserving neural network (NN) inference can be achieved by utilizing homomorphic encryption (HE), which allows computations to be directly carried out over ciphertexts. Popular HE schemes are built over large polynomial rings. To allow simultaneous multiplications in the convolutional (Conv) and fully-connected (FC) layers, multiple input data are mapped to coefficients in the same polynomial, so are the weights of NNs. However, ciphertext rotations are necessary to compute the sums of products and/or incorporate the outputs of different channels into the same polynomials. Ciphertext rotations have much higher complexity than ciphertext multiplications and contribute to the majority of the latency of HE-evaluated Conv and FC layers. This paper proposes a novel reformulated server-client joint computation procedure and a new filter coefficient packing scheme to eliminate ciphertext rotations without affecting the security of the HE scheme. Our proposed scheme also leads to substantial reductions on the number of coefficient multiplications needed and the communication cost between the server and client. For various plain-20 classifiers over the CIFAR-10/100 datasets, our design reduces the running time of the Conv and FC layers by 15.5% and the communication cost between client and server by more than 50%, compared to the best prior design.
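The rotation-free idea can be previewed in plaintext: if one operand is packed in increasing coefficient order and the other in reversed order, the inner product lands on a single coefficient of the polynomial product, so no rotations are needed to gather partial sums (a generic packing trick, not the paper's exact scheme):

```python
import numpy as np

def coeff_pack_dot(x, w):
    """Coefficient-packing sketch: encode x as x_0 + x_1*X + ... and w
    reversed as w_{n-1} + ... + w_0*X^(n-1); the degree-(n-1)
    coefficient of the product is then exactly <x, w>, with no
    rotations needed to sum the partial products."""
    n = len(x)
    px = np.array(x, dtype=float)        # increasing powers
    pw = np.array(w[::-1], dtype=float)  # reversed order
    prod = np.convolve(px, pw)           # plaintext stand-in for ct-pt mult
    return prod[n - 1]
```

In an actual HE evaluation the convolution is the single ciphertext-plaintext polynomial multiplication, so the sum of products falls out of one multiplication instead of a multiply-and-rotate tree.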
Updated: 2025-08-14 03:00:32
Categories: cs.CR
Promoting Efficient Reasoning with Verifiable Stepwise Reward
Large reasoning models (LRMs) have recently achieved significant progress in complex reasoning tasks, aided by reinforcement learning with verifiable rewards. However, LRMs often suffer from overthinking, expending excessive computation on simple problems and reducing efficiency. Existing efficient reasoning methods typically require accurate task assessment to preset token budgets or select reasoning modes, which limits their flexibility and reliability. In this work, we revisit the essence of overthinking and identify that encouraging effective steps while penalizing ineffective ones is key to its solution. To this end, we propose a novel rule-based verifiable stepwise reward mechanism (VSRM), which assigns rewards based on the performance of intermediate states in the reasoning trajectory. This approach is intuitive and naturally fits the step-by-step nature of reasoning tasks. We conduct extensive experiments on standard mathematical reasoning benchmarks, including AIME24 and AIME25, by integrating VSRM with PPO and Reinforce++. Results show that our method achieves substantial output length reduction while maintaining original reasoning performance, striking an optimal balance between efficiency and accuracy. Further analysis of overthinking frequency and pass@k score before and after training demonstrates that our approach indeed effectively suppresses ineffective steps and encourages effective reasoning, fundamentally alleviating the overthinking problem. All code will be released upon acceptance.
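A rule-based stepwise reward of this flavor can be sketched as follows; the improvement rule and reward values are hypothetical stand-ins, since the abstract does not specify the exact scoring function:

```python
def stepwise_rewards(step_scores, effective_bonus=1.0, ineffective_penalty=-1.0):
    """Hypothetical verifiable stepwise reward: a step earns a bonus when
    the verifier score of its intermediate state improves on the best
    score seen so far along the trajectory, and a penalty otherwise.
    `step_scores` are verifier evaluations of each partial trajectory."""
    rewards, best = [], float("-inf")
    for s in step_scores:
        if s > best:
            rewards.append(effective_bonus)  # effective step
            best = s
        else:
            rewards.append(ineffective_penalty)  # no progress: discourage
    return rewards
```

Under such a rule, padding a solution with redundant steps accumulates penalties, which is one way length reduction can emerge without a preset token budget.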
Updated: 2025-08-14 02:43:53
Categories: cs.AI
LLM-Driven Adaptive 6G-Ready Wireless Body Area Networks: Survey and Framework
Wireless Body Area Networks (WBANs) enable continuous monitoring of physiological signals for applications ranging from chronic disease management to emergency response. Recent advances in 6G communications, post-quantum cryptography, and energy harvesting have the potential to enhance WBAN performance. However, integrating these technologies into a unified, adaptive system remains a challenge. This paper surveys some of the most well-known Wireless Body Area Network (WBAN) architectures, routing strategies, and security mechanisms, identifying key gaps in adaptability, energy efficiency, and quantum-resistant security. We propose a novel Large Language Model-driven adaptive WBAN framework in which a Large Language Model acts as a cognitive control plane, coordinating routing, physical layer selection, micro-energy harvesting, and post-quantum security in real time. Our review highlights the limitations of current heuristic-based designs and outlines a research agenda for resource-constrained, 6G-ready medical systems. This approach aims to enable ultra-reliable, secure, and self-optimizing WBANs for next-generation mobile health applications.
Updated: 2025-08-14 02:38:22
Categories: cs.NI,cs.AI,C.2.1; C.2.2; E.3; I.2.7
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Category-agnostic pose estimation (CAPE) has traditionally relied on support images with annotated keypoints, a process that is often cumbersome and may fail to fully capture the necessary correspondences across diverse object categories. Recent efforts have explored the use of text queries, leveraging their enhanced stability and generalization capabilities. However, existing approaches often remain constrained by their reliance on support queries, their failure to fully utilize the rich priors embedded in pre-trained large language models, and the limitations imposed by their parametric distribution assumptions. To address these challenges, we introduce CapeLLM, the first multimodal large language model (MLLM) designed for CAPE. Our method only employs a query image and detailed text descriptions as input to estimate category-agnostic keypoints. Our method encompasses effective training strategies and carefully designed instructions for applying the MLLM to CAPE. Moreover, we propose an inference mechanism that further enhances the reasoning process for unseen keypoints while flexibly modeling their underlying spatial distribution and uncertainty, allowing for adaptive refinement based on contextual cues. We conducted extensive experiments to apply the MLLM to CAPE effectively, focusing not only on the model architecture and prompt design but also on ensuring robustness across input variations. Our approach sets a new state-of-the-art on the MP-100 benchmark in the 1-shot and even 5-shot setting, marking a significant advancement in the field of category-agnostic pose estimation. Code is available at https://github.com/Junhojuno/CapeLLM.
Updated: 2025-08-14 02:34:01
Categories: cs.CV,cs.LG
Federated Time Series Generation on Feature and Temporally Misaligned Data
Distributed time series data presents a challenge for federated learning, as clients often possess different feature sets and have misaligned time steps. Existing federated time series models are limited by the assumption of perfect temporal or feature alignment across clients. In this paper, we propose FedTDD, a novel federated time series diffusion model that jointly learns a synthesizer across clients. At the core of FedTDD is a novel data distillation and aggregation framework that reconciles the differences between clients by imputing the misaligned timesteps and features. In contrast to traditional federated learning, FedTDD learns the correlation across clients' time series through the exchange of local synthetic outputs instead of model parameters. A coordinator iteratively improves a global distiller network by leveraging shared knowledge from clients through the exchange of synthetic data. As the distiller becomes more refined over time, it subsequently enhances the quality of the clients' local feature estimates, allowing each client to then improve its local imputations for missing data using the latest, more accurate distiller. Experimental results on five datasets demonstrate FedTDD's effectiveness compared to centralized training, and the effectiveness of sharing synthetic outputs to transfer knowledge of local time series. Notably, FedTDD achieves 79.4% and 62.8% improvement over local training in Context-FID and Correlational scores.
Updated: 2025-08-14 02:30:09
Categories: cs.LG,cs.DC
Biased AI improves human decision-making but reduces trust
Current AI systems minimize risk by enforcing ideological neutrality, yet this may introduce automation bias by suppressing cognitive engagement in human decision-making. We conducted randomized trials with 2,500 participants to test whether culturally biased AI enhances human decision-making. Participants interacted with politically diverse GPT-4o variants on information evaluation tasks. Partisan AI assistants enhanced human performance, increased engagement, and reduced evaluative bias compared to non-biased counterparts, with amplified benefits when participants encountered opposing views. These gains carried a trust penalty: participants underappreciated biased AI and overcredited neutral systems. Exposing participants to two AIs whose biases flanked human perspectives closed the perception-performance gap. These findings complicate conventional wisdom about AI neutrality, suggesting that strategic integration of diverse cultural biases may foster improved and resilient human decision-making.
Updated: 2025-08-14 02:23:09
Categories: cs.HC,cs.AI,cs.CY
Uncertainty-Aware Prediction of Parkinson's Disease Medication Needs: A Two-Stage Conformal Prediction Approach
Parkinson's Disease (PD) medication management presents unique challenges due to heterogeneous disease progression and treatment response. Neurologists must balance symptom control with optimal dopaminergic dosing based on functional disability while minimizing side effects. This balance is crucial as inadequate or abrupt changes can cause levodopa-induced dyskinesia, wearing off, and neuropsychiatric effects, significantly reducing quality of life. Current approaches rely on trial-and-error decisions without systematic predictive methods. Despite machine learning advances, clinical adoption remains limited due to reliance on point predictions that do not account for prediction uncertainty, undermining clinical trust and utility. Clinicians require not only predictions of future medication needs but also reliable confidence measures. Without quantified uncertainty, adjustments risk premature escalation to maximum doses or prolonged inadequate symptom control. We developed a conformal prediction framework anticipating medication needs up to two years in advance with reliable prediction intervals and statistical guarantees. Our approach addresses zero-inflation in PD inpatient data, where patients maintain stable medication regimens between visits. Using electronic health records from 631 inpatient admissions at University of Florida Health (2011-2021), our two-stage approach identifies patients likely to need medication changes, then predicts required levodopa equivalent daily dose adjustments. Our framework achieved marginal coverage while reducing prediction interval lengths compared to traditional approaches, providing precise predictions for short-term planning and wider ranges for long-term forecasting. By quantifying uncertainty, our approach enables evidence-based decisions about levodopa dosing, optimizing symptom control while minimizing side effects and improving life quality.
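The kind of interval that underlies such marginal-coverage guarantees can be sketched with standard split conformal prediction (the paper's two-stage design additionally gates this behind a change/no-change classifier to handle zero-inflation):

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split conformal sketch: the (1 - alpha) empirical quantile of
    held-out calibration residuals |y - y_hat|, with the standard
    finite-sample correction, widens a point prediction into an
    interval with marginal coverage >= 1 - alpha."""
    n = len(cal_residuals)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(np.asarray(cal_residuals), q_level, method="higher")
    return y_pred - q, y_pred + q
```

Applied to dose adjustments, `y_pred` would be the predicted levodopa-equivalent daily dose change and the residuals would come from a held-out calibration set of past admissions.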
Updated: 2025-08-14 02:22:41
Categories: cs.LG,stat.ME,stat.ML
The Conditional Regret-Capacity Theorem for Batch Universal Prediction
We derive a conditional version of the classical regret-capacity theorem. This result can be used in universal prediction to find lower bounds on the minimal batch regret, which is a recently introduced generalization of the average regret, when batches of training data are available to the predictor. As an example, we apply this result to the class of binary memoryless sources. Finally, we generalize the theorem to R\'enyi information measures, revealing a deep connection between the conditional R\'enyi divergence and the conditional Sibson's mutual information.
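For orientation, the classical (unconditional) regret-capacity theorem that the paper conditions can be stated as follows, in generic notation (this is background, not the paper's new result):

```latex
% Classical regret-capacity (redundancy-capacity) theorem: the minimax
% regret over a parametric class {p_theta} equals the capacity of the
% "channel" from the parameter theta to the observation X.
\min_{q} \, \max_{\theta} \, D\!\left(p_\theta \,\middle\|\, q\right)
  \;=\; \max_{w} \, I_w(\theta; X)
  \;=\; C
```

Here the maximum on the right runs over priors $w$ on $\theta$, and $I_w(\theta; X)$ is the mutual information induced by $w$. The conditional version replaces these quantities with conditional counterparts, which in the R\'enyi generalization become the conditional R\'enyi divergence and the conditional Sibson mutual information.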
Updated: 2025-08-14 02:17:10
Categories: cs.IT,cs.LG,math.IT,stat.ML
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
We introduce PRELUDE, a benchmark for evaluating long-context understanding through the task of determining whether a character's prequel story is consistent with the canonical narrative of the original book. Our task poses a stronger demand for global comprehension and deep reasoning than existing benchmarks -- as the prequels are not part of the original story, assessing their plausibility typically requires searching and integrating information that is only indirectly related. Empirically, 88% of instances require evidence from multiple parts of the narrative. Experimental results highlight the challenge of our task: in-context learning, RAG and in-domain training with state-of-the-art LLMs, and commercial DeepResearch services, lag behind humans by >15%. A further human study reveals that models often produce correct answers with flawed reasoning, leading to an over 30% gap in reasoning accuracy compared to humans. These findings underscore the substantial room for improvement in long-context understanding and reasoning.
Updated: 2025-08-14 02:08:15
Categories: cs.CL,cs.AI
Explainable Sentiment Analysis with DeepSeek-R1: Performance, Efficiency, and Few-Shot Learning
Large language models (LLMs) have transformed sentiment analysis, yet balancing accuracy, efficiency, and explainability remains a critical challenge. This study presents the first comprehensive evaluation of DeepSeek-R1--an open-source reasoning model--against OpenAI's GPT-4o and GPT-4o-mini. We test the full 671B model and its distilled variants, systematically documenting few-shot learning curves. Our experiments show DeepSeek-R1 achieves a 91.39\% F1 score on 5-class sentiment and 99.31\% accuracy on binary tasks with just 5 shots, an eightfold improvement in few-shot efficiency over GPT-4o. Architecture-specific distillation effects emerge, where a 32B Qwen2.5-based model outperforms the 70B Llama-based variant by 6.69 percentage points. While its reasoning process reduces throughput, DeepSeek-R1 offers superior explainability via transparent, step-by-step traces, establishing it as a powerful, interpretable open-source alternative.
Updated: 2025-08-14 02:03:06
Categories: cs.CL,cs.AI
EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision
In digital pathology, whole-slide images (WSIs) are often difficult to handle due to their gigapixel scale, so most approaches train patch encoders via self-supervised learning (SSL) and then aggregate the patch-level embeddings via multiple instance learning (MIL) or slide encoders for downstream tasks. However, patch-level SSL may overlook complex domain-specific features that are essential for biomarker prediction, such as mutation status and molecular characteristics, because SSL methods rely only on basic augmentations designed for natural-image domains, applied over small patch-level areas. Moreover, SSL methods remain less data efficient than fully supervised approaches, requiring extensive computational resources and datasets to achieve competitive performance. To address these limitations, we present EXAONE Path 2.0, a pathology foundation model that learns patch-level representations under direct slide-level supervision. Using only 37k WSIs for training, EXAONE Path 2.0 achieves state-of-the-art average performance across 10 biomarker prediction tasks, demonstrating remarkable data efficiency.
Updated: 2025-08-14 02:00:46
Categories: cs.CV,cs.AI,cs.LG
Shuffle-R1: Efficient RL Framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
Reinforcement learning (RL) has emerged as an effective post-training paradigm for enhancing the reasoning capabilities of multimodal large language model (MLLM). However, current RL pipelines often suffer from training inefficiencies caused by two underexplored issues: Advantage Collapsing, where most advantages in a batch concentrate near zero, and Rollout Silencing, where the proportion of rollouts contributing non-zero gradients diminishes over time. These issues lead to suboptimal gradient updates and hinder long-term learning efficiency. To address these issues, we propose Shuffle-R1, a simple yet principled framework that improves RL fine-tuning efficiency by dynamically restructuring trajectory sampling and batch composition. It introduces (1) Pairwise Trajectory Sampling, which selects high-contrast trajectories with large advantages to improve gradient signal quality, and (2) Advantage-based Trajectory Shuffle, which increases exposure of valuable rollouts through informed batch reshuffling. Experiments across multiple reasoning benchmarks show that our framework consistently outperforms strong RL baselines with minimal overhead. These results highlight the importance of data-centric adaptations for more efficient RL training in MLLM.
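Reconstructed from the abstract alone (the function names, the `(id, advantage)` tuple format, and the exact selection rule are our assumptions, not the authors' code), the two data-centric mechanisms can be sketched as:

```python
# Hypothetical sketch of (1) Pairwise Trajectory Sampling and
# (2) Advantage-based Trajectory Shuffle, as described in the abstract.

def pairwise_sample(trajectories, k):
    """Select up to k high-contrast pairs of trajectories.

    trajectories: list of (traj_id, advantage) tuples.
    Returns pairs (best, worst), largest advantage gap first.
    """
    ranked = sorted(trajectories, key=lambda t: t[1])
    pairs = []
    lo, hi = 0, len(ranked) - 1
    while lo < hi and len(pairs) < k:
        pairs.append((ranked[hi], ranked[lo]))  # pair best against worst
        lo += 1
        hi -= 1
    return pairs

def advantage_shuffle(batch):
    """Reorder a batch so rollouts with large |advantage| are exposed first,
    countering the 'Rollout Silencing' of near-zero-gradient samples."""
    return sorted(batch, key=lambda t: abs(t[1]), reverse=True)
```

Pairing the strongest against the weakest rollout preserves advantage contrast, and hence gradient signal, within every pair.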
Updated: 2025-08-14 02:00:27
Categories: cs.LG,cs.AI
Bi-Sparse Unsupervised Feature Selection
To deal with high-dimensional unlabeled datasets in many areas, principal component analysis (PCA) has become a rising technique for unsupervised feature selection (UFS). However, most existing PCA-based methods only consider the structure of datasets by embedding a single sparse regularization or constraint on the transformation matrix. In this paper, we introduce a novel bi-sparse method called BSUFS to improve the performance of UFS. The core idea of BSUFS is to incorporate $\ell_{2,p}$-norm and $\ell_q$-norm into the classical PCA, which enables our method to select relevant features and filter out irrelevant noises, thereby obtaining discriminative features. Here, the parameters $p$ and $q$ are within the range of $[0, 1)$. Therefore, BSUFS not only constructs a unified framework for bi-sparse optimization, but also includes some existing works as special cases. To solve the resulting non-convex model, we propose an efficient proximal alternating minimization (PAM) algorithm using Stiefel manifold optimization and sparse optimization techniques. In addition, the computational complexity analysis is presented. Extensive numerical experiments on synthetic and real-world datasets demonstrate the effectiveness of our proposed BSUFS. The results reveal the advantages of bi-sparse optimization in feature selection and show its potential for other fields in image processing. Our code is available at https://github.com/xianchaoxiu/BSUFS.
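For intuition, the two sparsity measures named above can be written down directly. The snippet below illustrates only the norms, not the paper's PAM solver or PCA objective, and treats the transformation matrix $W$ as a list of rows; function names are our own:

```python
import math

# Illustrative computation of the two sparsity measures in the BSUFS
# objective, for a matrix W given as a list of rows.

def l2p_norm(W, p):
    """Row-wise l_{2,p} measure: sum_i ||W_i||_2^p. With 0 <= p < 1 this
    drives whole rows to zero, i.e. it selects features."""
    return sum(math.sqrt(sum(x * x for x in row)) ** p for row in W)

def lq_norm(W, q):
    """Element-wise l_q measure: sum_ij |W_ij|^q, pushing individual
    entries to zero to filter out irrelevant noise."""
    return sum(abs(x) ** q for row in W for x in row)
```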
Updated: 2025-08-14 01:54:33
Categories: math.OC,cs.LG
Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits
Graph representation learning on Analog-Mixed Signal (AMS) circuits is crucial for various downstream tasks, e.g., parasitic estimation. However, the scarcity of design data, the unbalanced distribution of labels, and the inherent diversity of circuit implementations pose significant challenges to learning robust and transferable circuit representations. To address these limitations, we propose CircuitGCL, a novel graph contrastive learning framework that integrates representation scattering and label rebalancing to enhance transferability across heterogeneous circuit graphs. CircuitGCL employs a self-supervised strategy to learn topology-invariant node embeddings through hyperspherical representation scattering, eliminating dependency on large-scale data. Simultaneously, balanced mean squared error (BMSE) and balanced softmax cross-entropy (BSCE) losses are introduced to mitigate label distribution disparities between circuits, enabling robust and transferable parasitic estimation. Evaluated on parasitic capacitance estimation (edge-level task) and ground capacitance classification (node-level task) across TSMC 28nm AMS designs, CircuitGCL outperforms all state-of-the-art (SOTA) methods, with the $R^2$ improvement of $33.64\% \sim 44.20\%$ for edge regression and F1-score gain of $0.9\times \sim 2.1\times$ for node classification. Our code is available at https://github.com/ShenShan123/CircuitGCL.
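As a rough illustration of what a balanced softmax cross-entropy can look like, the sketch below folds log class frequencies into the logits before the softmax, a common recipe for label rebalancing; the paper's exact BSCE (and its BMSE counterpart) may differ:

```python
import math

# Hedged sketch of a balanced softmax cross-entropy (BSCE) for one sample.
# Class priors are folded into the logits so rare classes are not drowned
# out by the head of the label distribution.

def bsce(logits, label, class_counts):
    """logits: raw scores per class; label: true class index;
    class_counts: training-set frequency of each class."""
    adjusted = [z + math.log(n) for z, n in zip(logits, class_counts)]
    m = max(adjusted)                      # shift for numerical stability
    log_z = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_z - adjusted[label]         # -log p(label)
```

With equal class counts this reduces to the ordinary cross-entropy; under a skewed count vector, a rare true class incurs a larger loss (and gradient) at the same logits.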
Updated: 2025-08-14 01:46:02
Categories: cs.LG,cs.SY,eess.SY
M3-Net: A Cost-Effective Graph-Free MLP-Based Model for Traffic Prediction
Achieving accurate traffic prediction is a fundamental and crucial task in the development of current intelligent transportation systems. Most of the mainstream methods that have made breakthroughs in traffic prediction rely on spatio-temporal graph neural networks, spatio-temporal attention mechanisms, etc. The main challenges of the existing deep learning approaches are that they either depend on a complete traffic network structure or require intricate model designs to capture complex spatio-temporal dependencies. These limitations pose significant challenges for the efficient deployment and operation of deep learning models on large-scale datasets. To address these challenges, we propose M3-Net, a cost-effective graph-free Multilayer Perceptron (MLP)-based model for traffic prediction. Our proposed model not only employs time series and spatio-temporal embeddings for efficient feature processing but also, for the first time, introduces an MLP-Mixer architecture with a mixture-of-experts (MoE) mechanism. Extensive experiments conducted on multiple real datasets demonstrate the superiority of the proposed model in terms of prediction performance and lightweight deployment.
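The mixture-of-experts idea can be shown in isolation. In the sketch below the experts are arbitrary callables and the gate scores are given as inputs; how M3-Net actually parameterizes the gate and experts inside its MLP-Mixer blocks is not specified by the abstract, so all names here are assumptions:

```python
import math

# Illustrative MoE combination step: expert outputs are blended with
# softmax gate weights for a single feature vector x.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe(x, experts, gate_scores):
    """Weighted combination of expert outputs for one feature vector."""
    weights = softmax(gate_scores)
    out = [0.0] * len(x)
    for w, expert in zip(weights, experts):
        y = expert(x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out
```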
Updated: 2025-08-14 01:45:48
Categories: cs.LG,cs.AI
MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models
Electronic health record (EHR) foundation models have been an area ripe for exploration, given their improved performance in various medical tasks. Despite the rapid advances, there exists a fundamental limitation: processing unseen, out-of-vocabulary medical codes. This problem limits the generalizability of EHR foundation models and the integration of models trained with different vocabularies. To alleviate this problem, we propose a set of novel medical concept representations (MedRep) for EHR foundation models based on the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). For concept representation learning, we enrich the information of each concept with a minimal definition through large language model (LLM) prompts and complement the text-based representations through the graph ontology of the OMOP vocabulary. Our approach outperforms the vanilla EHR foundation model and the model with a previously introduced medical code tokenizer in diverse prediction tasks. We also demonstrate the generalizability of MedRep through external validation.
Updated: 2025-08-14 01:34:08
Categories: cs.AI,cs.CL,cs.LG
Pose-Robust Calibration Strategy for Point-of-Gaze Estimation on Mobile Phones
Although appearance-based point-of-gaze (PoG) estimation has improved, the estimators still struggle to generalize across individuals due to personal differences. Therefore, person-specific calibration is required for accurate PoG estimation. However, calibrated PoG estimators are often sensitive to head pose variations. To address this, we investigate the key factors influencing calibrated estimators and explore pose-robust calibration strategies. Specifically, we first construct a benchmark, MobilePoG, which includes facial images from 32 individuals focusing on designated points under either fixed or continuously changing head poses. Using this benchmark, we systematically analyze how the diversity of calibration points and head poses influences estimation accuracy. Our experiments show that introducing a wider range of head poses during calibration improves the estimator's ability to handle pose variation. Building on this insight, we propose a dynamic calibration strategy in which users fixate on calibration points while moving their phones. This strategy naturally introduces head pose variation during a user-friendly and efficient calibration process, ultimately producing a better calibrated PoG estimator that is less sensitive to head pose variations than those using conventional calibration strategies. Codes and datasets are available at our project page.
Updated: 2025-08-14 01:28:30
Categories: cs.CV,cs.AI,cs.HC
GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes
Robust grasping in cluttered environments remains an open challenge in robotics. While benchmark datasets have significantly advanced deep learning methods, they mainly focus on simplistic scenes with light occlusion and insufficient diversity, limiting their applicability to practical scenarios. We present GraspClutter6D, a large-scale real-world grasping dataset featuring: (1) 1,000 highly cluttered scenes with dense arrangements (14.1 objects/scene, 62.6\% occlusion), (2) comprehensive coverage across 200 objects in 75 environment configurations (bins, shelves, and tables) captured using four RGB-D cameras from multiple viewpoints, and (3) rich annotations including 736K 6D object poses and 9.3B feasible robotic grasps for 52K RGB-D images. We benchmark state-of-the-art segmentation, object pose estimation, and grasp detection methods to provide key insights into challenges in cluttered environments. Additionally, we validate the dataset's effectiveness as a training resource, demonstrating that grasping networks trained on GraspClutter6D significantly outperform those trained on existing datasets in both simulation and real-world experiments. The dataset, toolkit, and annotation tools are publicly available on our project website: https://sites.google.com/view/graspclutter6d.
Updated: 2025-08-14 01:19:42
Categories: cs.RO,cs.AI,cs.CV
Why Cannot Large Language Models Ever Make True Correct Reasoning?
Recently, with the application progress of AIGC tools based on large language models (LLMs), led by ChatGPT, many AI experts and even more non-professionals are trumpeting the "understanding ability" and "reasoning ability" of LLMs. The present author considers the so-called "understanding ability" and "reasoning ability" of LLMs to be illusions held by people with vague concepts of these terms. In fact, LLMs can never have true understanding ability or true reasoning ability. This paper intends to explain that, because of the essential limitations of their working principle, LLMs can never have the ability of true correct reasoning.
Updated: 2025-08-14 01:18:18
Categories: cs.AI,cs.LO
MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs
Large Vision-Language Models (LVLMs) have shown strong performance across multimodal tasks. However, they often produce hallucinations -- text that is inconsistent with visual input, due to the limited ability to verify information in different regions of the image. To address this, we propose Multi-Region Fusion Decoding (MRFD), a training-free decoding method that improves factual grounding by modeling inter-region consistency. MRFD identifies salient regions using cross-attention, generates initial responses for each, and computes reliability weights based on Jensen-Shannon Divergence (JSD) among the responses. These weights guide a consistency-aware fusion of per-region predictions, using region-aware prompts inspired by Chain-of-Thought reasoning. Experiments across multiple LVLMs and benchmarks show that MRFD significantly reduces hallucinations and improves response factuality without requiring model updates.
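The consistency weighting can be sketched as follows, with every detail beyond "JSD among the responses" being our assumption: here each region's response is reduced to a probability distribution, and reliability weights come from a softmax over negative mean divergence from the other regions:

```python
import math

# Hedged sketch of JSD-based reliability weighting: regions whose
# response distributions agree with the others get higher weight.

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def reliability_weights(dists, tau=0.1):
    """Weight each region inversely to its mean divergence from the rest."""
    scores = []
    for i, p in enumerate(dists):
        others = [q for j, q in enumerate(dists) if j != i]
        scores.append(-sum(jsd(p, q) for q in others) / len(others))
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

An outlier region (e.g. one hallucinating content the other regions do not support) receives the smallest weight in the fusion.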
Updated: 2025-08-14 01:17:39
Categories: cs.CV,cs.AI
To Theoretically Understand Transformer-Based In-Context Learning for Optimizing CSMA
The binary exponential backoff scheme is widely used in WiFi 7 and still incurs poor throughput performance under dynamic channel environments. Recent model-based approaches (e.g., non-persistent and $p$-persistent CSMA) simply optimize backoff strategies under a known and fixed node density, still leading to a large throughput loss due to inaccurate node density estimation. This paper is the first to propose LLM transformer-based in-context learning (ICL) theory for optimizing channel access. We design a transformer-based ICL optimizer to pre-collect collision-threshold data examples and a query collision case. They are constructed as a prompt as the input for the transformer to learn the pattern, which then generates a predicted contention window threshold (CWT). To train the transformer for effective ICL, we develop an efficient algorithm and guarantee a near-optimal CWT prediction within limited training steps. As it may be hard to gather perfect data examples for ICL in practice, we further extend to allow erroneous data input in the prompt. We prove that our optimizer maintains minimal prediction and throughput deviations from the optimal values. Experimental results on NS-3 further demonstrate our approach's fast convergence and near-optimal throughput over existing model-based and DRL-based approaches under unknown node densities.
Updated: 2025-08-14 01:13:56
Categories: cs.LG,cs.AI,cs.NI
Identifying Causal Direction via Variational Bayesian Compression
Telling apart the cause and effect between two random variables with purely observational data is a challenging problem that finds applications in various scientific disciplines. A key principle utilized in this task is the algorithmic Markov condition, which postulates that the joint distribution, when factorized according to the causal direction, yields a more succinct codelength compared to the anti-causal direction. Previous approaches approximate these codelengths by relying on simple functions or Gaussian processes (GPs) with easily evaluable complexity, compromising between model fitness and computational complexity. To address these limitations, we propose leveraging the variational Bayesian learning of neural networks as an interpretation of the codelengths. This allows the improvement of model fitness, while maintaining the succinctness of the codelengths, and the avoidance of the significant computational complexity of the GP-based approaches. Extensive experiments on both synthetic and real-world benchmarks in cause-effect identification demonstrate the effectiveness of our proposed method, showing promising performance enhancements on several datasets in comparison to most related methods.
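As a toy stand-in for the codelength comparison (the paper scores codelengths with variational Bayesian neural networks; here a quadratic least-squares fit plays the role of the model class, and all names are ours), the decision rule "prefer the direction with the shorter marginal-plus-residual description" looks like:

```python
import math
import random

# Toy two-part codelength comparison in the spirit of the algorithmic
# Markov condition: codelength(direction) ~ bits for the marginal of the
# cause plus bits for the residuals of effect-given-cause.

def _solve3(A, b):
    """Gauss-Jordan elimination with partial pivoting for a 3x3 system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        piv = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][3] / M[i][i] for i in range(3)]

def _resid_var(x, y):
    """Residual variance of the least-squares fit y ~ 1 + x + x^2."""
    feats = [[1.0, xi, xi * xi] for xi in x]
    A = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    b = [sum(f[i] * yi for f, yi in zip(feats, y)) for i in range(3)]
    w = _solve3(A, b)
    res = [yi - sum(wi * fi for wi, fi in zip(w, f)) for f, yi in zip(feats, y)]
    return sum(r * r for r in res) / len(res)

def codelength(cause, effect):
    n = len(cause)
    mean_c = sum(cause) / n
    var_c = sum((c - mean_c) ** 2 for c in cause) / n
    marginal_bits = 0.5 * n * math.log2(var_c + 1e-6)
    conditional_bits = 0.5 * n * math.log2(_resid_var(cause, effect) + 1e-6)
    return marginal_bits + conditional_bits

def infer_direction(x, y):
    return "x->y" if codelength(x, y) < codelength(y, x) else "y->x"
```

For data generated as y = x^2 + noise, the causal direction yields tiny residuals (a cheap conditional code), while the anti-causal fit cannot resolve the two branches of the inverse, so its codelength is larger.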
Updated: 2025-08-14 01:13:43
Categories: cs.LG,stat.ML
DINOMotion: advanced robust tissue motion tracking with DINOv2 in 2D-Cine MRI-guided radiotherapy
Accurate tissue motion tracking is critical to ensure treatment outcome and safety in 2D-Cine MRI-guided radiotherapy. This is typically achieved by registration of sequential images, but existing methods often face challenges with large misalignments and lack of interpretability. In this paper, we introduce DINOMotion, a novel deep learning framework based on DINOv2 with Low-Rank Adaptation (LoRA) layers for robust, efficient, and interpretable motion tracking. DINOMotion automatically detects corresponding landmarks to derive optimal image registration, enhancing interpretability by providing explicit visual correspondences between sequential images. The integration of LoRA layers reduces trainable parameters, improving training efficiency, while DINOv2's powerful feature representations offer robustness against large misalignments. Unlike iterative optimization-based methods, DINOMotion directly computes image registration at test time. Our experiments on volunteer and patient datasets demonstrate its effectiveness in estimating both linear and nonlinear transformations, achieving Dice scores of 92.07% for the kidney, 90.90% for the liver, and 95.23% for the lung, with corresponding Hausdorff distances of 5.47 mm, 8.31 mm, and 6.72 mm, respectively. DINOMotion processes each scan in approximately 30ms and consistently outperforms state-of-the-art methods, particularly in handling large misalignments. These results highlight its potential as a robust and interpretable solution for real-time motion tracking in 2D-Cine MRI-guided radiotherapy.
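The LoRA component can be isolated in a few lines. This is the generic LoRA forward pass (a frozen weight plus a scaled low-rank update), not necessarily the rank, scaling, or layer placement used in DINOMotion:

```python
# Generic LoRA linear layer: y = W x + (alpha / r) * B (A x), where the
# base weight W is frozen and only the low-rank factors A (r x d_in) and
# B (d_out x r) are trained -- r * (d_in + d_out) parameters in total.

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """Forward pass through a LoRA-adapted linear layer (rank r = len(A))."""
    r = len(A)
    base = matvec(W, x)                      # frozen path
    update = matvec(B, matvec(A, x))         # trainable low-rank path
    return [b + (alpha / r) * u for b, u in zip(base, update)]
```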
Updated: 2025-08-14 01:02:26
Categories: eess.IV,cs.AI,cs.CV
Fast Convergence for High-Order ODE Solvers in Diffusion Probabilistic Models
Diffusion probabilistic models generate samples by learning to reverse a noise-injection process that transforms data into noise. A key development is the reformulation of the reverse sampling process as a deterministic probability flow ordinary differential equation (ODE), which allows for efficient sampling using high-order numerical solvers. Unlike traditional time integrator analysis, the accuracy of this sampling procedure depends not only on numerical integration errors but also on the approximation quality and regularity of the learned score function, as well as their interaction. In this work, we present a rigorous convergence analysis of deterministic samplers derived from probability flow ODEs for general forward processes with arbitrary variance schedules. Specifically, we develop and analyze $p$-th order (exponential) Runge-Kutta schemes, under the practical assumption that the first and second derivatives of the learned score function are bounded. We prove that the total variation distance between the generated and target distributions can be bounded as \begin{align*} O\bigl(d^{\frac{7}{4}}\varepsilon_{\text{score}}^{\frac{1}{2}} +d(dH_{\max})^p\bigr), \end{align*} where $\varepsilon^2_{\text{score}}$ denotes the $L^2$ error in the score function approximation, $d$ is the data dimension, and $H_{\max}$ represents the maximum solver step size. Numerical experiments on benchmark datasets further confirm that the derivatives of the learned score function are bounded in practice.
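A second-order (p = 2) solver of the kind the analysis covers is Heun's method. The sketch below is a generic integrator; in an actual sampler, f(t, x) would be the probability-flow velocity assembled from the learned score function, which is not reproduced here:

```python
# Heun's method (2nd-order explicit Runge-Kutta) for dx/dt = f(t, x).
# Applied to the probability flow ODE, f would combine the drift and the
# learned score; here f is a generic callable so the O(H^p) behavior can
# be checked against a problem with a known solution.

def heun_integrate(f, x0, t0, t1, n_steps):
    """Integrate dx/dt = f(t, x) from t0 to t1 with n_steps Heun steps."""
    h = (t1 - t0) / n_steps
    t, x = t0, x0
    for _ in range(n_steps):
        k1 = f(t, x)
        k2 = f(t + h, x + h * k1)       # Euler predictor
        x = x + 0.5 * h * (k1 + k2)     # trapezoidal corrector
        t += h
    return x
```

Halving the step size H_max should cut the global error by roughly 2^p = 4, which matches the O((d H_max)^p) discretization term in the bound above.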
Updated: 2025-08-14 01:01:23
Categories: cs.LG,cs.NA,math.CA,math.NA
Source Component Shift Adaptation via Offline Decomposition and Online Mixing Approach
This paper addresses source component shift adaptation, which aims to update predictions for incoming data streams so that they adapt to source component shifts, based on past training data. Existing online learning methods often fail to utilize recurring shifts effectively, while model-pool-based methods struggle to capture individual source components, leading to poor adaptation. In this paper, we propose a source component shift adaptation method via an offline decomposition and online mixing approach. We theoretically identify that the problem can be divided into two subproblems: offline source component decomposition and online mixing weight adaptation. Based on this, our method first determines prediction models, each of which learns a source component offline from past training data through the EM algorithm. It then updates the mixing weights of the prediction models for precise prediction through online convex optimization. Thanks to our theoretical derivation, our method fully leverages the characteristics of the shifts, achieving superior adaptation performance over existing methods. Experiments conducted on various real-world regression datasets demonstrate that our method outperforms baselines, reducing the cumulative test loss by up to 67.4%.
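The online half of the approach can be illustrated with an exponentiated-gradient step on the simplex of mixing weights (our choice of online convex optimizer; the abstract does not name one). The component predictors are assumed fixed, as they would be after the offline EM decomposition:

```python
import math

# Sketch of online mixing-weight adaptation: component predictions are
# given, and one exponentiated-gradient step on the squared loss moves
# the mixture weights (kept on the probability simplex) toward the
# component that currently explains the stream.

def eg_update(weights, preds, target, lr=0.5):
    """One exponentiated-gradient step for the mixture weights."""
    y_hat = sum(w * p for w, p in zip(weights, preds))
    grads = [2.0 * (y_hat - target) * p for p in preds]  # d/dw (y_hat - y)^2
    new = [w * math.exp(-lr * g) for w, g in zip(weights, grads)]
    z = sum(new)
    return [w / z for w in new]                          # renormalize
```

Streaming data from one component concentrates the weight on the predictor that matches it, without retraining the components themselves.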
Updated: 2025-08-14 00:51:36
Categories: cs.LG
Federated Anomaly Detection for Multi-Tenant Cloud Platforms with Personalized Modeling
This paper proposes an anomaly detection method based on federated learning to address key challenges in multi-tenant cloud environments, including data privacy leakage, heterogeneous resource behavior, and the limitations of centralized modeling. The method establishes a federated training framework involving multiple tenants. Each tenant trains the model locally using private resource usage data. Through parameter aggregation, a global model is optimized, enabling cross-tenant collaborative anomaly detection while preserving data privacy. To improve adaptability to diverse resource usage patterns, a personalized parameter adjustment mechanism is introduced. This allows the model to retain tenant-specific feature representations while sharing global knowledge. In the model output stage, the Mahalanobis distance is used to compute anomaly scores. This enhances both the accuracy and stability of anomaly detection. The experiments use real telemetry data from a cloud platform to construct a simulated multi-tenant environment. The study evaluates the model's performance under varying participation rates and noise injection levels. These comparisons demonstrate the proposed method's robustness and detection accuracy. Experimental results show that the proposed method outperforms existing mainstream models across key metrics such as Precision, Recall, and F1-Score. It also maintains stable performance in various complex scenarios. These findings highlight the method's practical potential for intelligent resource monitoring and anomaly diagnosis in cloud computing environments.
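The Mahalanobis scoring step is standard and easy to show. For brevity the sketch estimates a diagonal covariance from reference (normal) samples; the platform's actual estimator may well use a full covariance matrix:

```python
import math

# Sketch of Mahalanobis-distance anomaly scoring with a diagonal
# covariance estimated from reference (normal-behavior) samples.

def fit_stats(samples):
    """Per-dimension mean and variance of the reference samples."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    var = [sum((s[j] - mean[j]) ** 2 for s in samples) / n for j in range(d)]
    return mean, var

def mahalanobis(x, mean, var, eps=1e-9):
    """Distance of x from the reference distribution; larger = more anomalous."""
    return math.sqrt(sum((xi - mi) ** 2 / (vi + eps)
                         for xi, mi, vi in zip(x, mean, var)))
```

Unlike plain Euclidean distance, this normalizes each resource dimension by its observed variability, so a deviation in a normally stable metric scores higher than the same deviation in a noisy one.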
Updated: 2025-08-14 00:46:24
Categories: cs.LG
Multi-Agent Reinforcement Learning for Adaptive Resource Orchestration in Cloud-Native Clusters
This paper addresses the challenges of high resource dynamism and scheduling complexity in cloud-native database systems. It proposes an adaptive resource orchestration method based on multi-agent reinforcement learning. The method introduces a heterogeneous role-based agent modeling mechanism. This allows different resource entities, such as compute nodes, storage nodes, and schedulers, to adopt distinct policy representations. These agents are better able to reflect diverse functional responsibilities and local environmental characteristics within the system. A reward-shaping mechanism is designed to integrate local observations with global feedback. This helps mitigate policy learning bias caused by incomplete state observations. By combining real-time local performance signals with global system value estimation, the mechanism improves coordination among agents and enhances policy convergence stability. A unified multi-agent training framework is developed and evaluated on a representative production scheduling dataset. Experimental results show that the proposed method outperforms traditional approaches across multiple key metrics. These include resource utilization, scheduling latency, policy convergence speed, system stability, and fairness. The results demonstrate strong generalization and practical utility. Across various experimental scenarios, the method proves effective in handling orchestration tasks with high concurrency, high-dimensional state spaces, and complex dependency relationships. This confirms its advantages in real-world, large-scale scheduling environments.
Updated: 2025-08-14 00:43:20
Field: cs.LG
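The reward-shaping idea above, blending a local performance signal with a global system-value estimate, can be sketched as follows. The weighting scheme and names are assumptions, not the paper's actual formulation:

```python
def shaped_reward(local_signal, global_value, alpha=0.7):
    """Blend an agent's local performance signal with a shared global
    system-value estimate. `alpha` weights local feedback; the rest
    comes from the global estimate, mitigating bias from incomplete
    local observations (the general idea behind the reward shaping
    described above).
    """
    return alpha * local_signal + (1.0 - alpha) * global_value

# Three roles (compute node, storage node, scheduler) observe
# different local signals but share one global value estimate.
global_v = 0.6
local = {"compute": 0.9, "storage": 0.4, "scheduler": 0.7}
rewards = {role: shaped_reward(s, global_v) for role, s in local.items()}
```

Because every role's shaped reward contains the same global term, agents with weak local signals are still pulled toward system-level objectives.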
Blockchain-Enabled Federated Learning
Blockchain-enabled federated learning (BCFL) addresses fundamental challenges of trust, privacy, and coordination in collaborative AI systems. This chapter provides comprehensive architectural analysis of BCFL systems through a systematic four-dimensional taxonomy examining coordination structures, consensus mechanisms, storage architectures, and trust models. We analyze design patterns from blockchain-verified centralized coordination to fully decentralized peer-to-peer networks, evaluating trade-offs in scalability, security, and performance. Through detailed examination of consensus mechanisms designed for federated learning contexts, including Proof of Quality and Proof of Federated Learning, we demonstrate how computational work can be repurposed from arbitrary cryptographic puzzles to productive machine learning tasks. The chapter addresses critical storage challenges by examining multi-tier architectures that balance blockchain's transaction constraints with neural networks' large parameter requirements while maintaining cryptographic integrity. A technical case study of the TrustMesh framework illustrates practical implementation considerations in BCFL systems through distributed image classification training, demonstrating effective collaborative learning across IoT devices with highly non-IID data distributions while maintaining complete transparency and fault tolerance. Analysis of real-world deployments across healthcare consortiums, financial services, and IoT security applications validates the practical viability of BCFL systems, achieving performance comparable to centralized approaches while providing enhanced security guarantees and enabling new models of trustless collaborative intelligence.
Updated: 2025-08-14 00:40:52
Field: cs.DC,cs.LG
Facilitating Longitudinal Interaction Studies of AI Systems
UIST researchers develop tools to address user challenges. However, user interactions with AI evolve over time through learning, adaptation, and repurposing, making one-time evaluations insufficient. Capturing these dynamics requires longer-term studies, but challenges in deployment, evaluation design, and data collection have made such longitudinal research difficult to implement. Our workshop aims to tackle these challenges and equip researchers with practical strategies for longitudinal studies. The workshop includes a keynote, panel discussions, and interactive breakout groups for discussion, hands-on protocol design, and tool prototyping sessions. We seek to foster a community around longitudinal system research and promote it as a more widely embraced method for designing, building, and evaluating UIST tools.
Updated: 2025-08-14 00:38:23
Field: cs.HC,cs.AI,cs.CY
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Large Language Models (LLMs) hold promise in automating data analysis tasks, yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data analysis capabilities of open-source LLMs. By curating a seed dataset of diverse, realistic scenarios, we evaluate model behavior across three core dimensions: data understanding, code generation, and strategic planning. Our analysis reveals three key findings: (1) Strategic planning quality serves as the primary determinant of model performance; (2) Interaction design and task complexity significantly influence reasoning capabilities; (3) Data quality demonstrates a greater impact than diversity in achieving optimal performance. We leverage these insights to develop a data synthesis methodology, demonstrating significant improvements in open-source LLMs' analytical reasoning capabilities. Code is available at https://github.com/zjunlp/DataMind.
Updated: 2025-08-14 00:35:54
Field: cs.CL,cs.AI,cs.IR,cs.LG,cs.MA
Convergence Analysis of Max-Min Exponential Neural Network Operators in Orlicz Space
In this work, we propose a Max-Min approach for approximating functions using exponential neural network operators. We extend this framework to develop Max-Min Kantorovich-type exponential neural network operators and investigate their approximation properties. We study both pointwise and uniform convergence for univariate functions. To analyze the order of convergence, we use the logarithmic modulus of continuity and estimate the corresponding rate of convergence. Furthermore, we examine the convergence behavior of the Max-Min Kantorovich-type exponential neural network operators in the Orlicz space setting. We provide graphical representations to illustrate the approximation error of the function through suitable kernel and sigmoidal activation functions.
Updated: 2025-08-14 00:30:56
Field: cs.LG,math.FA
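The max-min idea, replacing the usual sum over kernel terms with a maximum of pairwise minima, can be sketched generically as below. This uses a simple sigmoid-built kernel as a stand-in; the paper's exponential sampling scheme and Kantorovich variant are not reproduced here, so treat every detail as an assumption:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def kernel(t):
    # A centered bell built from two sigmoids; the paper works with
    # exponential-type kernels, which this merely stands in for.
    return sigmoid(t + 1.0) - sigmoid(t - 1.0)

def max_min_operator(f, n, x):
    """Generic max-min operator on [0, 1]: a max of minima between
    sampled function values f(k/n) and kernel activations, instead of
    the classical weighted sum."""
    return max(min(f(k / n), kernel(n * x - k)) for k in range(n + 1))

f = lambda x: x * x
approx = max_min_operator(f, 200, 0.5)
# For large n, approx loosely tracks f(0.5) = 0.25 (provided the
# kernel's peak dominates the sampled function values).
```

As n grows, only indices k with k/n near x contribute a non-negligible kernel value, which is the intuition behind the pointwise convergence results studied in the paper.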
Responsible Machine Learning via Mixed-Integer Optimization
In the last few decades, Machine Learning (ML) has achieved significant success across domains ranging from healthcare, sustainability, and the social sciences, to criminal justice and finance. But its deployment in increasingly sophisticated, critical, and sensitive areas affecting individuals, the groups they belong to, and society as a whole raises critical concerns around fairness, transparency and robustness, among others. As the complexity and scale of ML systems and of the settings in which they are deployed grow, so does the need for responsible ML methods that address these challenges while providing guaranteed performance in deployment. Mixed-integer optimization (MIO) offers a powerful framework for embedding responsible ML considerations directly into the learning process while maintaining performance. For example, it enables learning of inherently transparent models that can conveniently incorporate fairness or other domain specific constraints. This tutorial paper provides an accessible and comprehensive introduction to this topic discussing both theoretical and practical aspects. It outlines some of the core principles of responsible ML, their importance in applications, and the practical utility of MIO for building ML models that align with these principles. Through examples and mathematical formulations, it illustrates practical strategies and available tools for efficiently solving MIO problems for responsible ML. It concludes with a discussion on current limitations and open research questions, providing suggestions for future work.
Updated: 2025-08-14 00:28:10
Field: cs.LG,math.OC,stat.ML
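The core idea, encoding fairness as a hard constraint on binary decision variables, can be illustrated with a toy 0/1 program. A real formulation would hand this to an MIO solver via branch-and-bound; exhaustive enumeration stands in for the solver at this scale, and all data and thresholds below are invented for illustration:

```python
from itertools import product

# Toy setup: choose binary decisions d_i (approve/deny) for six
# individuals to maximize agreement with noisy labels, subject to a
# demographic-parity constraint between groups A and B.
labels = [1, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "B", "B", "B"]
max_gap = 0.34  # allowed difference in positive rates between groups

best, best_score = None, -1
for d in product([0, 1], repeat=6):
    rate = {g: sum(x for x, gg in zip(d, groups) if gg == g) / 3
            for g in "AB"}
    if abs(rate["A"] - rate["B"]) > max_gap:
        continue  # fairness constraint violated: prune this solution
    score = sum(int(x == y) for x, y in zip(d, labels))
    if score > best_score:
        best, best_score = d, score

# `best` maximizes label agreement among fairness-feasible decisions.
```

Because the fairness requirement is a constraint rather than a penalty, every solution the search returns is feasible by construction, which is exactly the guarantee MIO-based responsible ML methods offer.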
Pruning and Malicious Injection: A Retraining-Free Backdoor Attack on Transformer Models
Transformer models have demonstrated exceptional performance and have become indispensable in computer vision (CV) and natural language processing (NLP) tasks. However, recent studies reveal that transformers are susceptible to backdoor attacks. Prior backdoor attack methods typically rely on retraining with clean data or altering the model architecture, both of which can be resource-intensive and intrusive. In this paper, we propose Head-wise Pruning and Malicious Injection (HPMI), a novel retraining-free backdoor attack on transformers that does not alter the model's architecture. Our approach requires only a small subset of the original data and basic knowledge of the model architecture, eliminating the need for retraining the target transformer. Technically, HPMI works by pruning the least important head and injecting a pre-trained malicious head to establish the backdoor. We provide a rigorous theoretical justification demonstrating that the implanted backdoor resists detection and removal by state-of-the-art defense techniques, under reasonable assumptions. Experimental evaluations across multiple datasets further validate the effectiveness of HPMI, showing that it 1) incurs negligible clean accuracy loss, 2) achieves at least 99.55% attack success rate, and 3) bypasses four advanced defense mechanisms. Additionally, relative to state-of-the-art retraining-dependent attacks, HPMI achieves greater concealment and robustness against diverse defense strategies, while maintaining minimal impact on clean accuracy.
Updated: 2025-08-14 00:13:22
Field: cs.LG
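The prune-and-inject step can be sketched as below: score each attention head, then overwrite the least important head's slice of the output projection. The Frobenius-norm importance proxy and the weight layout are plausible assumptions for illustration, not HPMI's actual metric or procedure:

```python
import numpy as np

def least_important_head(w_o, n_heads):
    """Score each attention head by the Frobenius norm of its slice of
    the output projection `w_o` (shape d_model x d_model, heads
    stacked along the rows); a small norm suggests the head contributes
    little to the layer's output.
    """
    d_head = w_o.shape[0] // n_heads
    norms = [np.linalg.norm(w_o[h * d_head:(h + 1) * d_head])
             for h in range(n_heads)]
    return int(np.argmin(norms))

rng = np.random.default_rng(1)
n_heads, d_head = 4, 8
w_o = rng.normal(size=(n_heads * d_head, n_heads * d_head))
w_o[2 * d_head:3 * d_head] *= 0.01  # make head 2 nearly inert

victim = least_important_head(w_o, n_heads)
# The attacker overwrites only this head's weights with a pre-trained
# malicious head, leaving the rest of the model untouched -- hence no
# retraining and no architecture change.
malicious = rng.normal(size=(d_head, n_heads * d_head))
w_o[victim * d_head:(victim + 1) * d_head] = malicious
```

Because the swapped head was nearly inert, clean-accuracy loss stays small, which matches the paper's stated design goal.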
nodeWSNsec: A hybrid metaheuristic approach for reliable security and node deployment in WSNs
Efficient and reliable node deployment in Wireless Sensor Networks (WSNs) is crucial for optimizing area coverage, connectivity among nodes, and energy efficiency. This paper proposes a hybrid metaheuristic approach combining a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) to address the challenges of energy-efficient and reliable node deployment. The GA-PSO hybrid leverages GA's strong exploration capabilities and PSO's rapid convergence, achieving an optimal balance between coverage and energy consumption. The performance of the proposed approach is evaluated against GA and PSO alone, and against the metaheuristic-based Competitive Multi-Objective Marine Predators Algorithm (CMOMPA), across varying sensing ranges. Simulation results demonstrate that GA-PSO requires 15% to 25% fewer sensor nodes and maintains 95% or more area coverage while preserving connectivity, compared to the standalone GA and PSO algorithms. The proposed algorithm also dominates CMOMPA at long sensing and communication ranges in terms of higher coverage, improved connectivity, and reduced deployment time, while requiring fewer sensor nodes. This study also explores key trade-offs in WSN deployment and highlights future research directions, including heterogeneous node deployment, mobile WSNs, and enhanced multi-objective optimization techniques. The findings underscore the effectiveness of hybrid metaheuristics in improving WSN performance, offering a promising approach for real-world applications such as environmental monitoring, smart cities, smart agriculture, disaster response, and IIoT.
Updated: 2025-08-14 00:06:04
Field: cs.CR
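The area-coverage objective that GA, PSO, and the hybrid all optimize can be sketched as a grid-based fitness term. This is a simplified illustration of one term in such a fitness function, not the paper's exact objective; the unit-square setting and grid resolution are assumptions:

```python
import numpy as np

def coverage_fraction(nodes, sensing_range, grid_n=50):
    """Fraction of points in the unit square covered by at least one
    sensor. A GA-PSO hybrid would evaluate a fitness combining this
    coverage term with connectivity and node-count penalties.
    """
    xs = np.linspace(0.0, 1.0, grid_n)
    gx, gy = np.meshgrid(xs, xs)
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)  # (grid_n^2, 2)
    # Distance from every grid point to every candidate node position.
    d = np.linalg.norm(pts[:, None, :] - nodes[None, :, :], axis=2)
    covered = (d <= sensing_range).any(axis=1)
    return covered.mean()

# Four sensors at the quarter points of the unit square.
nodes = np.array([[0.25, 0.25], [0.75, 0.25],
                  [0.25, 0.75], [0.75, 0.75]])
cov = coverage_fraction(nodes, sensing_range=0.35)
# Coverage is high but incomplete at this range; enlarging the range
# to 0.4 covers the whole square for this layout.
```

Within GA-PSO, each candidate deployment (a flat vector of node coordinates) would be scored this way, with crossover/mutation exploring layouts and the swarm update accelerating convergence toward high-coverage configurations.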
Continual Learning for Multiple Modalities
Continual learning aims to learn knowledge of tasks observed in sequential time steps while mitigating the forgetting of previously learned knowledge. Existing methods were designed to learn a single modality (e.g., image) over time, which limits their applicability in scenarios involving multiple modalities. In this work, we propose a novel continual learning framework that accommodates multiple modalities (image, video, audio, depth, and text). We train a model to align the various modalities with text, leveraging its rich semantic information. However, this increases the risk of forgetting previously learned knowledge, exacerbated by the differing input traits across tasks. To alleviate the overwriting of previous modality knowledge, we propose a framework that consolidates intra-modal knowledge while incorporating relevant inter-modal information. This is achieved by self-regulating shifts in the learned representations to gradually integrate novel knowledge into the information retained across modalities. Simultaneously, the framework mitigates inter-modal interference by selectively integrating knowledge from previously encountered modalities based on their mutual relevance. Furthermore, we introduce a strategy to re-align modality embeddings, effectively addressing biased alignment between modalities. We evaluate the proposed method in a wide range of continual learning scenarios using multiple datasets with different modalities. Extensive experiments demonstrate that our method outperforms existing ones in these scenarios, regardless of whether the identity of the modality is given.
Updated: 2025-08-14 00:03:13
Field: cs.CV,cs.AI