Argumentatively Coherent Judgmental Forecasting
Judgmental forecasting employs human opinions to make predictions about future events, rather than exclusively historical data as in quantitative forecasting. When these opinions form an argumentative structure around forecasts, it is useful to study the properties of the forecasts from an argumentative perspective. In this paper, we advocate and formally define a property of argumentative coherence, which, in essence, requires that a forecaster's reasoning is coherent with their forecast. We then conduct three evaluations with our notion of coherence. First, we assess the impact of enforcing coherence on human forecasters as well as on Large Language Model (LLM)-based forecasters, given that the latter have recently been shown to be competitive with human forecasters. In both cases, we show that filtering out incoherent predictions consistently improves forecasting accuracy, supporting the practical value of coherence in both human and LLM-based forecasting. Then, via crowd-sourced user experiments, we show that, despite its apparent intuitiveness and usefulness, users do not generally align with this coherence property. This points to the need to integrate, within argumentation-based judgmental forecasting, mechanisms to filter out incoherent opinions before obtaining group forecasts.
Updated: 2025-07-30 23:58:37
Categories: cs.AI,I.2.7
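The filtering step the abstract describes can be sketched very compactly. This is a minimal illustration with our own simplified data layout and coherence test (stance of the supporting argument must agree with the side of the probability), not the paper's formal definition of argumentative coherence.

```python
# Sketch of coherence filtering before aggregation. The data layout and
# the coherence test are our simplification, not the paper's definition.

def is_coherent(forecast):
    """A forecast counts as coherent here when the stance of its
    supporting argument agrees with the side of its probability."""
    leans_yes = forecast["probability"] > 0.5
    return forecast["argument_stance"] == ("pro" if leans_yes else "con")

def group_forecast(forecasts, require_coherence=True):
    """Aggregate individual probabilities, optionally dropping
    incoherent forecasts first."""
    pool = [f for f in forecasts if not require_coherence or is_coherent(f)]
    if not pool:
        return None
    return sum(f["probability"] for f in pool) / len(pool)

forecasts = [
    {"probability": 0.9, "argument_stance": "pro"},  # coherent
    {"probability": 0.8, "argument_stance": "con"},  # incoherent: argues "no"
    {"probability": 0.2, "argument_stance": "con"},  # coherent
]
coherent_only = group_forecast(forecasts)
everyone = group_forecast(forecasts, require_coherence=False)
```

Dropping the incoherent forecast shifts the group prediction toward the forecasts whose reasoning and numbers agree, which is the mechanism the paper evaluates.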
TokenBlowUp: Resolving Representational Singularities in LLM Token Spaces via Monoidal Transformations
Recent work has provided compelling evidence challenging the foundational manifold hypothesis for the token embedding spaces of Large Language Models (LLMs). These findings reveal the presence of geometric singularities around polysemous tokens, which can lead to representational instability. Existing methodologies, which presuppose a smooth data manifold, are ill-equipped to address such intrinsic structural flaws. In this paper, we formalize this problem in the language of scheme theory and propose a rigorous resolution by applying the scheme-theoretic blow-up at each singular point. This procedure replaces a singular point in the ambient affine scheme with its exceptional divisor, which we identify as a canonical geometric space -- a projective space of directions -- that houses the disambiguated semantic meanings of the token. This process of ``representational desingularization'' constructs a new geometric landscape for embeddings. We prove a formal theorem guaranteeing the geometric regularization of this new space, showing that the original pathologies are resolved. Finally, we outline the architectural implications of our framework, arguing for a paradigm shift from static look-ups to dynamic, geometrically-grounded computation.
Updated: 2025-07-30 23:48:07
Categories: math.AG,cs.LG
Extended Factorization Machine Annealing for Rapid Discovery of Transparent Conducting Materials
The development of novel transparent conducting materials (TCMs) is essential for enhancing the performance and reducing the cost of next-generation devices such as solar cells and displays. In this research, we focus on the (Al$_x$Ga$_y$In$_z$)$_2$O$_3$ system and extend the FMA framework, which combines a Factorization Machine (FM) and annealing, to search for optimal compositions and crystal structures with high accuracy and low cost. The proposed method introduces (i) the binarization of continuous variables, (ii) the utilization of good solutions using a Hopfield network, (iii) the activation of global search through adaptive random flips, and (iv) fine-tuning via a bit-string local search. Validation using the (Al$_x$Ga$_y$In$_z$)$_2$O$_3$ data from the Kaggle "Nomad2018 Predicting Transparent Conductors" competition demonstrated that our method achieves faster and more accurate searches than Bayesian optimization and genetic algorithms. Furthermore, its application to multi-objective optimization showed its capability in designing materials by simultaneously considering both the band gap and formation energy. These results suggest that applying our method to larger, more complex search problems and diverse material designs that reflect realistic experimental conditions is expected to contribute to the further advancement of materials informatics.
Updated: 2025-07-30 23:43:40
Categories: cond-mat.mtrl-sci,cs.LG
AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction
Machine learning interpretation (MLI) has primarily been leveraged to build clinician trust and uncover actionable insights in EHRs. However, the intrinsic complexity and heterogeneity of EHR data limit its effectiveness in guiding subgroup-specific modeling. We propose AdaptHetero, a novel MLI-driven framework that transforms interpretability insights into actionable guidance for tailoring model training and evaluation across subpopulations within individual hospital systems. Evaluated on three large-scale EHR datasets: GOSSIS-1-eICU, WiDS, and MIMIC-IV, AdaptHetero consistently identifies heterogeneous model behaviors in predicting ICU mortality, in-hospital death, and hidden hypoxemia. By integrating SHAP-based interpretation and unsupervised clustering, the framework enhances the identification of clinically meaningful subgroup-specific characteristics, leading to improved predictive performance and optimized clinical deployment.
Updated: 2025-07-30 23:42:06
Categories: cs.LG,stat.ML
Decision by Supervised Learning with Deep Ensembles: A Practical Framework for Robust Portfolio Optimization
We propose Decision by Supervised Learning (DSL), a practical framework for robust portfolio optimization. DSL reframes portfolio construction as a supervised learning problem: models are trained to predict optimal portfolio weights, using cross-entropy loss and portfolios constructed by maximizing the Sharpe or Sortino ratio. To further enhance stability and reliability, DSL employs Deep Ensemble methods, substantially reducing variance in portfolio allocations. Through comprehensive backtesting across diverse market universes and neural architectures, DSL shows superior performance compared to both traditional strategies and leading machine learning-based methods, including Prediction-Focused Learning and End-to-End Learning. We show that increasing the ensemble size leads to higher median returns and more stable risk-adjusted performance. The code is available at https://github.com/DSLwDE/DSLwDE.
Updated: 2025-07-30 23:25:16
Categories: cs.LG,q-fin.CP,q-fin.PM
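A minimal sketch of the labeling step as we read it (the candidate set, synthetic data, and function names are illustrative, not the paper's exact recipe): each training example's target class is whichever candidate portfolio maximizes the Sharpe ratio over a return window, and a classifier is then trained on those class labels with cross-entropy.

```python
import math
import random

def sharpe(weights, returns):
    """Sharpe ratio of a fixed-weight portfolio over a return history
    (risk-free rate 0; an illustrative simplification)."""
    series = [sum(w * r for w, r in zip(weights, day)) for day in returns]
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    return mean / math.sqrt(var) if var > 0 else float("-inf")

def best_portfolio_label(candidates, returns):
    """Index of the Sharpe-maximizing candidate: the class label a
    DSL-style model would be trained to predict with cross-entropy."""
    scores = [sharpe(w, returns) for w in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

# Synthetic 250-day return window for 3 assets with increasing drift.
random.seed(0)
returns = [[random.gauss(0.001 * (a + 1), 0.01) for a in range(3)]
           for _ in range(250)]
candidates = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1 / 3, 1 / 3, 1 / 3)]
label = best_portfolio_label(candidates, returns)
```

In a full pipeline the deep ensemble would average the predicted class distributions of several independently trained classifiers, which is where the reported variance reduction comes from.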
Can one size fit all?: Measuring Failure in Multi-Document Summarization Domain Transfer
Abstractive multi-document summarization (MDS) is the task of automatically summarizing information in multiple documents, from news articles to conversations with multiple speakers. The training approaches for current MDS models can be grouped into four categories: end-to-end with special pre-training ("direct"), chunk-then-summarize, extract-then-summarize, and inference with GPT-style models. In this work, we evaluate MDS models across training approaches, domains, and dimensions (reference similarity, quality, and factuality), to analyze how and why models trained on one domain can fail to summarize documents from another (News, Science, and Conversation) in the zero-shot domain transfer setting. We define domain-transfer "failure" as a decrease in factuality, higher deviation from the target, and a general decrease in summary quality. In addition to exploring domain transfer for MDS models, we examine potential issues with applying popular summarization metrics out-of-the-box.
Updated: 2025-07-30 23:19:16
Categories: cs.CL,cs.AI
On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization
In this paper, we study the problem of solving a simple bilevel optimization problem, where the upper-level objective is minimized over the solution set of the lower-level problem. We focus on the general setting in which both the upper- and lower-level objectives are smooth but potentially nonconvex. Due to the absence of additional structural assumptions on the lower-level objective, such as convexity or the Polyak-{\L}ojasiewicz (PL) condition, guaranteeing global optimality is generally intractable. Instead, we introduce a suitable notion of stationarity for this class of problems and aim to design a first-order algorithm that finds such stationary points in polynomial time. Intuitively, stationarity in this setting means the upper-level objective cannot be substantially improved locally without causing a larger deterioration in the lower-level objective. To this end, we show that a simple and implementable variant of the dynamic barrier gradient descent (DBGD) framework can effectively solve the considered nonconvex simple bilevel problems up to stationarity. Specifically, to reach an $(\epsilon_f, \epsilon_g)$-stationary point, where $\epsilon_f$ and $\epsilon_g$ denote the target stationarity accuracies for the upper- and lower-level objectives, respectively, the considered method achieves a complexity of $\mathcal{O}\left(\max\left(\epsilon_f^{-\frac{3+p}{1+p}}, \epsilon_g^{-\frac{3+p}{2}}\right)\right)$, where $p \geq 0$ is an arbitrary constant balancing the terms. To the best of our knowledge, this is the first complexity result for a discrete-time algorithm that guarantees joint stationarity for both levels in general nonconvex simple bilevel problems.
Updated: 2025-07-30 23:10:29
Categories: math.OC,cs.LG
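Since the abstract does not reproduce the algorithm, the following is a minimal sketch of one dynamic-barrier gradient descent step, assuming the common barrier choice $\phi = \alpha\, g(x)$ from the DBGD literature; the function names and the toy bilevel instance are illustrative, not the paper's exact method.

```python
def dbgd_step(x, grad_f, grad_g, g_val, lr=0.05, alpha=1.0):
    """One simplified DBGD step: descend the upper-level gradient
    grad_f, plus lam * grad_g so the lower-level objective g keeps
    decreasing. Barrier phi = alpha * g_val (an assumed choice)."""
    gf, gg = grad_f(x), grad_g(x)
    gg_sq = sum(c * c for c in gg)
    inner = sum(a * b for a, b in zip(gf, gg))
    lam = max((alpha * g_val - inner) / gg_sq, 0.0) if gg_sq > 0 else 0.0
    return [xi - lr * (a + lam * b) for xi, a, b in zip(x, gf, gg)]

# Toy simple bilevel problem: minimize f(x) = x0^2 over the minimizers
# of g(x) = (x0 + x1 - 1)^2, whose solution set is the line x0 + x1 = 1.
f = lambda x: x[0] ** 2
g = lambda x: (x[0] + x[1] - 1.0) ** 2
grad_f = lambda x: [2.0 * x[0], 0.0]
grad_g = lambda x: [2.0 * (x[0] + x[1] - 1.0)] * 2

x = [1.0, -0.5]
for _ in range(2000):
    x = dbgd_step(x, grad_f, grad_g, g(x))
# x is driven toward (0, 1): both objectives reach (near-)stationarity.
```

The `lam` multiplier is exactly the "dynamic barrier": it grows whenever the upper-level descent direction would raise the lower-level objective above the barrier, matching the intuition in the abstract that the upper level may not improve at the cost of a larger lower-level deterioration.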
FuseTen: A Generative Model for Daily 10 m Land Surface Temperature Estimation from Spatio-Temporal Satellite Observations
Urban heatwaves, droughts, and land degradation are pressing and growing challenges in the context of climate change. A valuable approach to studying them requires accurate spatio-temporal information on land surface conditions. One of the most important variables for assessing and understanding these phenomena is Land Surface Temperature (LST), which is derived from satellites and provides essential information about the thermal state of the Earth's surface. However, satellite platforms inherently face a trade-off between spatial and temporal resolutions. To bridge this gap, we propose FuseTen, a novel generative framework that produces daily LST observations at a fine 10 m spatial resolution by fusing spatio-temporal observations derived from Sentinel-2, Landsat 8, and Terra MODIS. FuseTen employs a generative architecture trained using an averaging-based supervision strategy grounded in physical principles. It incorporates attention and normalization modules within the fusion process and uses a PatchGAN discriminator to enforce realism. Experiments across multiple dates show that FuseTen outperforms linear baselines, with an average 32.06% improvement in quantitative metrics and 31.42% in visual fidelity. To the best of our knowledge, this is the first non-linear method to generate daily LST estimates at such fine spatial resolution.
Updated: 2025-07-30 23:04:16
Categories: cs.LG,cs.AI,cs.CV
AI paradigm for solving differential equations: first-principles data generation and scale-dilation operator AI solver
Many problems are governed by differential equations (DEs). Artificial intelligence (AI) is a new path for solving DEs. However, data is very scarce and existing AI solvers struggle with the approximation of high-frequency components (AHFC). We propose an AI paradigm for solving diverse DEs, including a DE-ruled first-principles data generation methodology and a scale-dilation operator (SDO) AI solver. Using either prior knowledge or random fields, we generate solutions and then substitute them into the DEs to derive the sources and initial/boundary conditions by balancing the DEs, thus producing arbitrarily vast amounts of first-principles-consistent training data at extremely low computational cost. We introduce a reversible SDO that leverages the Fourier transform of the multiscale solutions to fix AHFC, and design a spatiotemporally coupled, attention-based Transformer AI solver of DEs with SDO. An upper bound on the Hessian condition number of the loss function is proven to be proportional to the squared 2-norm of the solution gradient, revealing that SDO yields a smoother loss landscape and consequently fixes AHFC with efficient training. Extensive tests on diverse DEs demonstrate that our AI paradigm achieves consistently superior accuracy over state-of-the-art methods. This work makes AI solvers of DEs truly usable in broad natural-science and engineering fields.
Updated: 2025-07-30 22:45:11
Categories: cs.LG,physics.comp-ph
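The first-principles data-generation step has a very compact illustration on a 1-D Poisson problem $-u'' = f$: pick any solution, substitute it into the DE, and read off the source term, so each training pair is exact by construction. The sketch below (helper names are ours, not the paper's code) also checks the generated pair with finite differences.

```python
import math

def make_training_pair(k, xs):
    """First-principles data generation (sketch): choose the solution
    u(x) = sin(k x), substitute it into the PDE  -u'' = f  and read
    off the source  f(x) = k^2 sin(k x).  The pair (f, u) is then an
    exact training example at the cost of evaluating two formulas."""
    u = [math.sin(k * x) for x in xs]
    f = [k ** 2 * math.sin(k * x) for x in xs]
    return u, f

def residual(u, f, h):
    """Max |-u'' - f| via central differences, as a consistency check."""
    r = 0.0
    for i in range(1, len(u) - 1):
        upp = (u[i - 1] - 2 * u[i] + u[i + 1]) / h ** 2
        r = max(r, abs(-upp - f[i]))
    return r

h = 1e-3
xs = [i * h for i in range(1001)]
u, f = make_training_pair(4.0, xs)
```

The same substitution trick works for any DE whose operator can be applied analytically (or with automatic differentiation) to a chosen solution field, which is why the cost of generating data this way is negligible compared to running a numerical solver.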
Observational Multiplicity
Many prediction tasks can admit multiple models that perform almost equally well. This phenomenon can undermine interpretability and safety when competing models assign conflicting predictions to individuals. In this work, we study how arbitrariness can arise in probabilistic classification tasks as a result of an effect that we call \emph{observational multiplicity}. We discuss how this effect arises in a broad class of practical applications where we learn a classifier to predict probabilities $p_i \in [0,1]$ but are given a dataset of observations $y_i \in \{0,1\}$. We propose to evaluate the arbitrariness of individual probability predictions through the lens of \emph{regret}. We introduce a measure of regret for probabilistic classification tasks, which measures how much the predictions of a model could change as a result of changes in the training labels. We present a general-purpose method to estimate the regret in a probabilistic classification task. We use our measure to show that regret is higher for certain groups in the dataset and discuss potential applications of regret. We demonstrate how estimating regret promotes safety in real-world applications through abstention and data collection.
Updated: 2025-07-30 22:30:56
Categories: cs.LG
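As a toy illustration of observational multiplicity, one can redraw the $y_i \in \{0,1\}$ observations from the same underlying probabilities, refit, and record how far each prediction can swing. The sketch below uses a deliberately trivial per-group "model" and names of our own; it realizes one plausible reading of the regret measure, not the paper's estimator.

```python
import random

def refit(labels_by_group):
    """Trivial 'model': predicted probability = empirical rate per group."""
    return {g: sum(ys) / len(ys) for g, ys in labels_by_group.items()}

def regret(groups, p_true, draws=200, seed=0):
    """Regret sketch: redraw the 0/1 observations from the same
    underlying probabilities, refit, and record how much each group's
    prediction can swing across equally-plausible datasets."""
    rng = random.Random(seed)
    lo = {g: 1.0 for g in groups}
    hi = {g: 0.0 for g in groups}
    for _ in range(draws):
        labels = {g: [1 if rng.random() < p_true[g] else 0 for _ in range(n)]
                  for g, n in groups.items()}
        preds = refit(labels)
        for g, pr in preds.items():
            lo[g], hi[g] = min(lo[g], pr), max(hi[g], pr)
    return {g: hi[g] - lo[g] for g in groups}

# A 10-person group's prediction swings far more than a 500-person
# group's under the same true probability, i.e. it has higher regret.
r = regret({"large": 500, "small": 10}, {"large": 0.3, "small": 0.3})
```

High-regret individuals or groups are exactly the cases where abstention or further data collection is warranted, which matches the safety applications named in the abstract.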
Evaluating and Improving the Robustness of Speech Command Recognition Models to Noise and Distribution Shifts
Although prior work in computer vision has shown strong correlations between in-distribution (ID) and out-of-distribution (OOD) accuracies, such relationships remain underexplored in audio-based models. In this study, we investigate how training conditions and input features affect the robustness and generalization abilities of spoken keyword classifiers under OOD conditions. We benchmark several neural architectures across a variety of evaluation sets. To quantify the impact of noise on generalization, we make use of two metrics: Fairness (F), which measures overall accuracy gains compared to a baseline model, and Robustness (R), which assesses the convergence between ID and OOD performance. Our results suggest that noise-aware training improves robustness in some configurations. These findings shed new light on the benefits and limitations of noise-based augmentation for generalization in speech models.
Updated: 2025-07-30 22:14:16
Categories: cs.LG
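The abstract names the two metrics but not their formulas, so the following are assumed formalizations sketched only to make the definitions concrete: F as mean accuracy gain over a baseline across evaluation sets, and R as the OOD-to-ID accuracy ratio.

```python
def fairness(acc_model, acc_baseline):
    """F (one plausible formalization; the abstract does not give the
    formula): mean accuracy gain over the baseline model across all
    evaluation sets."""
    gains = [m - b for m, b in zip(acc_model, acc_baseline)]
    return sum(gains) / len(gains)

def robustness(acc_id, acc_ood):
    """R (again an assumed formalization): OOD-to-ID accuracy ratio,
    with 1.0 meaning ID and OOD performance have fully converged."""
    return acc_ood / acc_id

# Hypothetical accuracies of a noise-aware model vs. a clean-trained
# baseline on three evaluation sets.
noisy = [0.92, 0.81, 0.78]
clean = [0.90, 0.74, 0.70]
f_score = fairness(noisy, clean)
```

Under these readings, F > 0 means noise-aware training helps on average, and R close to 1 means the model generalizes to the OOD conditions about as well as it performs in-distribution.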
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, current methods under-utilize shared information between tasks and modalities. To overcome this challenge, we propose ModalTune, a novel fine-tuning framework which introduces the Modal Adapter to integrate new modalities without modifying SLFM weights. Additionally, we use large-language models (LLMs) to encode labels as text, capturing semantic relationships across multiple tasks and cancer types in a single training recipe. ModalTune achieves state-of-the-art (SOTA) results against both uni-modal and multi-modal models across four cancer types, jointly improving survival and cancer subtype prediction while remaining competitive in pan-cancer settings. Additionally, we show ModalTune is generalizable to two out-of-distribution (OOD) datasets. To our knowledge, this is the first unified fine-tuning framework for multi-modal, multi-task, and pan-cancer modeling in digital pathology.
Updated: 2025-07-30 22:10:58
Categories: eess.IV,cs.CV,cs.LG
MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
Agents based on large language models (LLMs) for machine learning engineering (MLE) can automatically implement ML models via code generation. However, existing approaches to build such agents often rely heavily on inherent LLM knowledge and employ coarse exploration strategies that modify the entire code structure at once. This limits their ability to select effective task-specific models and perform deep exploration within specific components, such as experimenting extensively with feature engineering options. To overcome these, we propose MLE-STAR, a novel approach to build MLE agents. MLE-STAR first leverages external knowledge by using a search engine to retrieve effective models from the web, forming an initial solution, then iteratively refines it by exploring various strategies targeting specific ML components. This exploration is guided by ablation studies analyzing the impact of individual code blocks. Furthermore, we introduce a novel ensembling method using an effective strategy suggested by MLE-STAR. Our experimental results show that MLE-STAR achieves medals in 64% of the Kaggle competitions on the MLE-bench Lite, significantly outperforming the best alternative.
Updated: 2025-07-30 21:58:41
Categories: cs.LG
Uncovering the Fragility of Trustworthy LLMs through Chinese Textual Ambiguity
In this work, we study a critical research problem regarding the trustworthiness of large language models (LLMs): how LLMs behave when encountering ambiguous narrative text, with a particular focus on Chinese textual ambiguity. We created a benchmark dataset by collecting and generating ambiguous sentences with context and their corresponding disambiguated pairs, representing multiple possible interpretations. These annotated examples are systematically categorized into 3 main categories and 9 subcategories. Through experiments, we discovered significant fragility in LLMs when handling ambiguity, revealing behavior that differs substantially from humans. Specifically, LLMs cannot reliably distinguish ambiguous text from unambiguous text, show overconfidence in interpreting ambiguous text as having a single meaning rather than multiple meanings, and exhibit overthinking when attempting to understand the various possible meanings. Our findings highlight a fundamental limitation in current LLMs that has significant implications for their deployment in real-world applications where linguistic ambiguity is common, calling for improved approaches to handle uncertainty in language understanding. The dataset and code are publicly available at this GitHub repository: https://github.com/ictup/LLM-Chinese-Textual-Disambiguation.
Updated: 2025-07-30 21:50:19
Categories: cs.CL,cs.AI
Controlling diverse robots by inferring Jacobian fields with deep networks
Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have greatly expanded the feasible hardware, but using these systems requires control software to translate the desired motions into actuator commands. Conventional robots can easily be modeled as rigid links connected by joints, but it remains an open challenge to model and control biologically inspired robots that are often soft or made of several materials, lack sensing capabilities, and may change their material properties with use. Here, we introduce a method that uses deep neural networks to map a video stream of a robot to its visuomotor Jacobian field (the sensitivity of all 3D points to the robot's actuators). Our method enables the control of robots from only a single camera, makes no assumptions about the robots' materials, actuation, or sensing, and is trained without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators that vary in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. Because it enables robot control using a generic camera as the only sensor, we anticipate that our work will broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.
Updated: 2025-07-30 21:44:23
Categories: cs.RO,cs.CV,cs.LG
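The Jacobian field itself is just "how much does each 3-D point move per unit of each actuator command". A finite-difference stand-in for a toy two-joint planar arm (invented here for illustration; the paper instead learns this map from a single video stream with a deep network) makes the object concrete:

```python
import math

def numeric_jacobian(points_of, q, eps=1e-5):
    """Finite-difference stand-in for a visuomotor Jacobian field:
    sensitivity of every tracked 3-D point to every actuator.
    points_of(q) -> list of (x, y, z) for actuator command vector q."""
    base = points_of(q)
    J = [[[0.0] * len(q) for _ in range(3)] for _ in base]
    for a in range(len(q)):
        qp = list(q)
        qp[a] += eps
        pert = points_of(qp)
        for i, (p0, p1) in enumerate(zip(base, pert)):
            for d in range(3):
                J[i][d][a] = (p1[d] - p0[d]) / eps
    return J

def arm_points(q):
    """End-effector of a planar arm with two unit links (toy robot)."""
    x = math.cos(q[0]) + math.cos(q[0] + q[1])
    y = math.sin(q[0]) + math.sin(q[0] + q[1])
    return [(x, y, 0.0)]

# At q = (0, 0) the end-effector sits at (2, 0, 0); joint 0 moves it
# twice as fast in y as joint 1 does, and neither moves it in z.
J = numeric_jacobian(arm_points, [0.0, 0.0])
```

Given such a field, closed-loop control reduces to inverting `J` (e.g. by least squares) to find the actuator command that produces a desired point motion; the paper's contribution is predicting this field from vision alone, without a kinematic model.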
Insights into resource utilization of code small language models serving with runtime engines and execution providers
The rapid growth of language models, particularly in code generation, requires substantial computational resources, raising concerns about energy consumption and environmental impact. Optimizing language model inference resource utilization is crucial, and Small Language Models (SLMs) offer a promising solution to reduce resource demands. Our goal is to analyze the impact of deep learning serving configurations, defined as combinations of runtime engines and execution providers, on resource utilization, in terms of energy consumption, execution time, and computing-resource utilization, from the point of view of software engineers conducting inference in the context of code generation SLMs. We conducted a technology-oriented, multi-stage experimental pipeline using twelve code generation SLMs to investigate energy consumption, execution time, and computing-resource utilization across the configurations. Significant differences emerged across configurations. CUDA execution provider configurations outperformed CPU execution provider configurations in both energy consumption and execution time. Among the configurations, TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings from 37.99% up to 89.16% compared to other serving configurations. Similarly, optimized runtime engines like ONNX with the CPU execution provider achieved from 8.98% up to 72.04% energy savings within CPU-based configurations. TORCH paired with CUDA also exhibited efficient computing-resource utilization. Serving configuration choice significantly impacts resource utilization. While further research is needed, we recommend the above configurations as best suited to software engineers' requirements for enhancing serving resource utilization efficiency.
Updated: 2025-07-30 21:44:18
Categories: cs.SE,cs.AI,cs.LG
FLOSS: Federated Learning with Opt-Out and Straggler Support
Previous work on data privacy in federated learning systems focuses on privacy-preserving operations for data from users who have agreed to share their data for training. However, modern data privacy agreements also empower users to use the system while opting out of sharing their data as desired. When combined with stragglers that arise from heterogeneous device capabilities, the result is missing data from a variety of sources that introduces bias and degrades model performance. In this paper, we present FLOSS, a system that mitigates the impacts of such missing data on federated learning in the presence of stragglers and user opt-out, and empirically demonstrate its performance in simulations.
Updated: 2025-07-30 21:34:56
Categories: cs.LG,cs.AI
Scalable Generative Modeling of Weighted Graphs
Weighted graphs are ubiquitous throughout biology, chemistry, and the social sciences, motivating the development of generative models for abstract weighted graph data using deep neural networks. However, most current deep generative models are either designed for unweighted graphs and are not easily extended to weighted topologies or incorporate edge weights without consideration of a joint distribution with topology. Furthermore, learning a distribution over weighted graphs must account for complex nonlocal dependencies between both the edges of the graph and corresponding weights of each edge. We develop an autoregressive model BiGG-E, a nontrivial extension of the BiGG model, that learns a joint distribution over weighted graphs while still exploiting sparsity to generate a weighted graph with $n$ nodes and $m$ edges in $O((n + m)\log n)$ time. Simulation studies and experiments on a variety of benchmark datasets demonstrate that BiGG-E best captures distributions over weighted graphs while remaining scalable and computationally efficient.
Updated: 2025-07-30 21:28:28
Categories: cs.LG
Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs
Graph Neural Networks (GNNs) excel in many graph machine learning tasks but face challenges when scaling to large networks. GNN transferability allows training on smaller graphs and applying the model to larger ones, but existing methods often rely on random subsampling, leading to disconnected subgraphs and reduced model expressivity. We propose a novel graph sampling algorithm that leverages feature homophily to preserve graph structure. By minimizing the trace of the data correlation matrix, our method better preserves the graph Laplacian trace -- a proxy for the graph connectivity -- than random sampling, while achieving lower complexity than spectral methods. Experiments on citation networks show improved performance in preserving Laplacian trace and GNN transferability compared to random sampling.
Updated: 2025-07-30 21:23:43
Categories: eess.SP,cs.AI,cs.LG
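The Laplacian trace used above as a connectivity proxy is concrete: for an induced subgraph it equals the sum of within-sample degrees. A simplified greedy sketch built directly on that proxy (the paper instead minimizes the trace of a data correlation matrix using feature homophily; this toy version only makes the proxy tangible):

```python
# Toy sketch: greedily grow a node sample to retain Laplacian trace.
# Not the paper's correlation-matrix method, just the proxy it targets.

def laplacian_trace(edges, nodes):
    """Trace of the Laplacian of the subgraph induced by `nodes`
    (each retained edge contributes 2 to the total degree)."""
    s = set(nodes)
    return sum(2 for u, v in edges if u in s and v in s)

def greedy_sample(edges, all_nodes, k):
    """Add, at each step, the node that most increases retained trace."""
    chosen = []
    for _ in range(k):
        best = max((n for n in all_nodes if n not in chosen),
                   key=lambda n: laplacian_trace(edges, chosen + [n]))
        chosen.append(best)
    return chosen

# A triangle (0,1,2) plus a pendant node 3: a size-3 sample keeps the triangle.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(sorted(greedy_sample(edges, [0, 1, 2, 3], 3)))  # [0, 1, 2]
```

A uniformly random size-3 sample would pick a disconnected set like {0, 1, 3} half the time, illustrating why random subsampling degrades connectivity.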
Coarse Graining with Neural Operators for Simulating Chaotic Systems
Accurately predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling. However, achieving such predictions typically requires iterative computations over a dense spatiotemporal grid to account for the unstable nature of chaotic systems, which is expensive and impractical in many real-world situations. An alternative approach to such a fully-resolved simulation is using a coarse grid and then correcting its errors through a \textit{closure model}, which approximates the overall information from fine scales not captured in the coarse-grid simulation. Recently, ML approaches have been used for closure modeling, but they typically require a large number of training samples from expensive fully-resolved simulations (FRS). In this work, we prove an even more fundamental limitation, i.e., the standard approach to learning closure models suffers from a large approximation error for generic problems, no matter how large the model is, and it stems from the non-uniqueness of the mapping. We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation by not using a closure model or a coarse-grid solver. We first train the PINO model on data from a coarse-grid solver and then fine-tune it with (a small amount of) FRS and physics-based losses on a fine grid. The discretization-free nature of neural operators means that they do not suffer from the restriction of a coarse grid that closure models face, and they can provably approximate the long-term statistics of chaotic systems. In our experiments, our PINO model achieves a 330x speedup compared to FRS with a relative error $\sim 10\%$. In contrast, the closure model coupled with a coarse-grid solver is $60$x slower than PINO while having a much higher error $\sim186\%$ when the closure model is trained on the same FRS dataset.
Updated: 2025-07-30 21:18:51
Categories: cs.LG
RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL
Despite advances in large language model (LLM)-based natural language interfaces for databases, scaling to enterprise-level data catalogs remains an under-explored challenge. Prior works addressing this challenge rely on domain-specific fine-tuning - complicating deployment - and fail to leverage important semantic context contained within database metadata. To address these limitations, we introduce a component-based retrieval architecture that decomposes database schemas and metadata into discrete semantic units, each separately indexed for targeted retrieval. Our approach prioritizes effective table identification while leveraging column-level information, ensuring the total number of retrieved tables remains within a manageable context budget. Experiments demonstrate that our method maintains high recall and accuracy, with our system outperforming baselines over massive databases with varying structure and available metadata. Our solution enables practical text-to-SQL systems deployable across diverse enterprise settings without specialized fine-tuning, addressing a critical scalability gap in natural language database interfaces.
Updated: 2025-07-30 21:09:47
Categories: cs.CL,cs.AI,cs.LG
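The component-based idea (index tables and columns as separate semantic units, then keep only the top-scoring tables within a context budget) can be sketched with a toy token-overlap scorer standing in for a real retriever. All names and the scoring rule here are illustrative, not the paper's system:

```python
# Toy sketch of component-based schema retrieval: a table's score pools its
# own match and its best column match; only the top `budget` tables survive.
# Token overlap stands in for a real dense/sparse retriever.

def overlap(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t)

def retrieve_tables(query, schema, budget):
    """schema: {table_name: {"desc": str, "columns": {col_name: col_desc}}}"""
    scores = {}
    for table, meta in schema.items():
        col_scores = [overlap(query, f"{c} {d}") for c, d in meta["columns"].items()]
        scores[table] = overlap(query, f"{table} {meta['desc']}") + max(col_scores, default=0)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]  # enforce the context budget on retrieved tables

schema = {
    "orders": {"desc": "customer purchase orders",
               "columns": {"order_date": "date the order was placed",
                           "total": "order total amount"}},
    "employees": {"desc": "staff records",
                  "columns": {"hire_date": "date employee was hired"}},
}
print(retrieve_tables("total amount of orders by date", schema, budget=1))  # ['orders']
```

Scoring columns separately is what lets column-level metadata contribute without every column costing context budget on its own.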
SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity
We present SMART-Editor, a framework for compositional layout and content editing across structured (posters, websites) and unstructured (natural images) domains. Unlike prior models that perform local edits, SMART-Editor preserves global coherence through two strategies: Reward-Refine, an inference-time reward-guided refinement method, and RewardDPO, a training-time preference optimization approach using reward-aligned layout pairs. To evaluate model performance, we introduce SMARTEdit-Bench, a benchmark covering multi-domain, cascading edit scenarios. SMART-Editor outperforms strong baselines like InstructPix2Pix and HIVE, with RewardDPO achieving up to 15% gains in structured settings and Reward-Refine showing advantages on natural images. Automatic and human evaluations confirm the value of reward-guided planning in producing semantically consistent and visually aligned edits.
Updated: 2025-07-30 20:52:34
Categories: cs.CL,cs.AI
On the Sustainability of AI Inferences in the Edge
The proliferation of the Internet of Things (IoT) and its cutting-edge AI-enabled applications (e.g., autonomous vehicles and smart industries) combine two paradigms: data-driven systems and their deployment on the edge. Usually, edge devices perform inferences to support latency-critical applications. In addition to the performance of these resource-constrained edge devices, their energy usage is a critical factor in adopting and deploying edge applications. Examples of such devices include Raspberry Pi (RPi), Intel Neural Compute Stick (INCS), NVIDIA Jetson nano (NJn), and Google Coral USB (GCU). Despite their adoption in edge deployment for AI inferences, there is no study on their performance and energy usage for informed decision-making on the device and model selection to meet the demands of applications. This study fills the gap by rigorously characterizing the performance of traditional models, neural networks, and large language models on the above edge devices. Specifically, we analyze trade-offs among model F1 score, inference time, inference power, and memory usage. Hardware and framework optimization, along with external parameter tuning of AI models, can balance between model performance and resource usage to realize practical edge AI deployments.
Updated: 2025-07-30 20:47:22
Categories: cs.LG,cs.AI,cs.PF
Accenture-NVS1: A Novel View Synthesis Dataset
This paper introduces ACC-NVS1, a specialized dataset designed for research on Novel View Synthesis specifically for airborne and ground imagery. Data for ACC-NVS1 was collected in Austin, TX and Pittsburgh, PA in 2023 and 2024. The collection encompasses six diverse real-world scenes captured from both airborne and ground cameras, resulting in a total of 148,000 images. ACC-NVS1 addresses challenges such as varying altitudes and transient objects. This dataset is intended to supplement existing datasets, providing additional resources for comprehensive research, rather than serving as a benchmark.
Updated: 2025-07-30 20:46:33
Categories: cs.CV,cs.LG
Moravec's Paradox: Towards an Auditory Turing Test
This research work demonstrates that current AI systems fail catastrophically on auditory tasks that humans perform effortlessly. Drawing inspiration from Moravec's paradox (i.e., tasks simple for humans often prove difficult for machines, and vice versa), we introduce an auditory Turing test comprising 917 challenges across seven categories: overlapping speech, speech in noise, temporal distortion, spatial audio, coffee-shop noise, phone distortion, and perceptual illusions. Our evaluation of state-of-the-art audio models including GPT-4's audio capabilities and OpenAI's Whisper reveals a striking failure rate exceeding 93%, with even the best-performing model achieving only 6.9% accuracy on tasks that humans solved with a 7.5 times higher success rate (52%). These results expose focusing failures in how AI systems process complex auditory scenes, particularly in selective attention, noise robustness, and contextual adaptation. Our benchmark not only quantifies the human-machine auditory gap but also provides insights into why these failures occur, suggesting that current architectures lack fundamental mechanisms for human-like auditory scene analysis. The traditional design of audio CAPTCHAs highlights common filters that humans evolved but machines fail to select in multimodal language models. This work establishes a diagnostic framework for measuring progress toward human-level machine listening and highlights the need for novel approaches integrating selective attention, physics-based audio understanding, and context-aware perception into multimodal AI systems.
Updated: 2025-07-30 20:45:13
Categories: cs.AI,cs.SD,eess.AS
Beyond Rigid AI: Towards Natural Human-Machine Symbiosis for Interoperative Surgical Assistance
Emerging surgical data science and robotics solutions, especially those designed to provide assistance in situ, require natural human-machine interfaces to fully unlock their potential in providing adaptive and intuitive aid. Contemporary AI-driven solutions remain inherently rigid, offering limited flexibility and restricting natural human-machine interaction in dynamic surgical environments. These solutions rely heavily on extensive task-specific pre-training, fixed object categories, and explicit manual-prompting. This work introduces a novel Perception Agent that leverages speech-integrated prompt-engineered large language models (LLMs), segment anything model (SAM), and any-point tracking foundation models to enable a more natural human-machine interaction in real-time intraoperative surgical assistance. Incorporating a memory repository and two novel mechanisms for segmenting unseen elements, Perception Agent offers the flexibility to segment both known and unseen elements in the surgical scene through intuitive interaction. Incorporating the ability to memorize novel elements for use in future surgeries, this work takes a marked step towards human-machine symbiosis in surgical procedures. Through quantitative analysis on a public dataset, we show that the performance of our agent is on par with considerably more labor-intensive manual-prompting strategies. Qualitatively, we show the flexibility of our agent in segmenting novel elements (instruments, phantom grafts, and gauze) in a custom-curated dataset. By offering natural human-machine interaction and overcoming rigidity, our Perception Agent potentially brings AI-based real-time assistance in dynamic surgical environments closer to reality.
Updated: 2025-07-30 20:42:24
Categories: cs.RO,cs.AI,cs.HC
On LLM-Assisted Generation of Smart Contracts from Business Processes
Large language models (LLMs) have changed the reality of how software is produced. Within the wider software engineering community, among many other purposes, they are explored for code generation use cases from different types of input. In this work, we present an exploratory study to investigate the use of LLMs for generating smart contract code from business process descriptions, an idea that has emerged in recent literature to overcome the limitations of traditional rule-based code generation approaches. However, current LLM-based work evaluates generated code on small samples, relying on manual inspection, or testing whether code compiles but ignoring correct execution. With this work, we introduce an automated evaluation framework and provide empirical data from larger data sets of process models. We test LLMs of different types and sizes in their capabilities of achieving important properties of process execution, including enforcing process flow, resource allocation, and data-based conditions. Our results show that LLM performance falls short of the perfect reliability required for smart contract development. We suggest future work to explore responsible LLM integrations in existing tools for code generation to ensure more reliable output. Our benchmarking framework can serve as a foundation for developing and evaluating such integrations.
Updated: 2025-07-30 20:39:45
Categories: cs.SE,cs.AI
AutoIndexer: A Reinforcement Learning-Enhanced Index Advisor Towards Scaling Workloads
Efficiently selecting indexes is fundamental to database performance optimization, particularly for systems handling large-scale analytical workloads. While deep reinforcement learning (DRL) has shown promise in automating index selection through its ability to learn from experience, few works address how these RL-based index advisors can adapt to scaling workloads due to exponentially growing action spaces and heavy trial and error. To address these challenges, we introduce AutoIndexer, a framework that combines workload compression, query optimization, and specialized RL models to scale index selection effectively. By operating on compressed workloads, AutoIndexer substantially lowers search complexity without sacrificing much index quality. Extensive evaluations show that it reduces end-to-end query execution time by up to 95% versus non-indexed baselines. On average, it outperforms state-of-the-art RL-based index advisors by approximately 20% in workload cost savings while cutting tuning time by over 50%. These results affirm AutoIndexer's practicality for large and diverse workloads.
Updated: 2025-07-30 20:38:13
Categories: cs.DB,cs.AI
Learning dynamically inspired invariant subspaces for Koopman and transfer operator approximation
Transfer and Koopman operator methods offer a framework for representing complex, nonlinear dynamical systems via linear transformations, enabling a deeper understanding of the underlying dynamics. The spectra of these operators provide important insights into system predictability and emergent behaviour, although efficiently estimating them from data can be challenging. We approach this issue through the lens of general operator and representational learning, in which we approximate these linear operators using efficient finite-dimensional representations. Specifically, we machine-learn orthonormal basis functions that are dynamically tailored to the system. This learned basis provides a particularly accurate approximation of the operator's action as well as a nearly invariant finite-dimensional subspace. We illustrate our approach with examples that showcase the retrieval of spectral properties from the estimated operator, and emphasise the dynamically adaptive quality of the machine-learned basis.
Updated: 2025-07-30 20:30:51
Categories: math.DS,cs.LG,cs.NA,math.NA,47A15, 37C30, 47A58, 68T07
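For contrast with the learned, dynamically tailored basis, the classical fixed-dictionary baseline (EDMD) fits the finite-dimensional operator by least squares so that $\Psi(x_{t+1}) \approx K\,\Psi(x_t)$; with linear dynamics and a linear dictionary, the system matrix is recovered exactly. A minimal sketch on synthetic data (the data and dictionary are illustrative assumptions, not from the paper):

```python
import numpy as np

# Fixed-dictionary EDMD baseline: fit the Koopman matrix K by least squares
# from snapshot pairs. With linear dynamics and Psi(x) = x, K recovers A.

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])                    # true (stable) linear dynamics
X = rng.standard_normal((2, 200))             # snapshots x_t
Y = A @ X                                     # successor snapshots x_{t+1}

psi = lambda Z: Z                             # linear dictionary Psi(x) = x
K = psi(Y) @ np.linalg.pinv(psi(X))           # least-squares Koopman matrix

print(np.allclose(K, A, atol=1e-8))           # True
print(sorted(np.linalg.eigvals(K).real))      # spectrum ~ [0.8, 0.9]
```

For nonlinear systems a richer dictionary is needed, and choosing it well is exactly the problem the machine-learned basis above addresses.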
A Foundation Model for Material Fracture Prediction
Accurately predicting when and how materials fail is critical to designing safe, reliable structures, mechanical systems, and engineered components that operate under stress. Yet, fracture behavior remains difficult to model across the diversity of materials, geometries, and loading conditions in real-world applications. While machine learning (ML) methods show promise, most models are trained on narrow datasets, lack robustness, and struggle to generalize. Meanwhile, physics-based simulators offer high-fidelity predictions but are fragmented across specialized methods and require substantial high-performance computing resources to explore the input space. To address these limitations, we present a data-driven foundation model for fracture prediction, a transformer-based architecture that operates across simulators, a wide range of materials (including plastic-bonded explosives, steel, aluminum, shale, and tungsten), and diverse loading conditions. The model supports both structured and unstructured meshes, combining them with large language model embeddings of textual input decks specifying material properties, boundary conditions, and solver settings. This multimodal input design enables flexible adaptation across simulation scenarios without changes to the model architecture. The trained model can be fine-tuned with minimal data on diverse downstream tasks, including time-to-failure estimation, modeling fracture evolution, and adapting to combined finite-discrete element method simulations. It also generalizes to unseen materials such as titanium and concrete, requiring as few as a single sample, dramatically reducing data needs compared to standard ML. Our results show that fracture prediction can be unified under a single model architecture, offering a scalable, extensible alternative to simulator-specific workflows.
Updated: 2025-07-30 20:23:36
Categories: cs.LG,cond-mat.mtrl-sci,physics.geo-ph
Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks
Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language models, especially the transformer-based CLIP model, offer promising capabilities for generalizing action recognition from video data. In this work, we evaluate CLIP on the UCF-101 dataset and systematically analyze its performance under three masking strategies: (1) percentage-based and shape-based black masking at 10%, 30%, and 50%, (2) feature-specific masking to suppress bias-inducing elements, and (3) isolation masking that retains only class-specific regions. Our results reveal that CLIP exhibits inconsistent behavior and frequent misclassifications, particularly when essential visual cues are obscured. To overcome these limitations, we propose incorporating class-specific noise, learned via a custom loss function, to reinforce attention to class-defining features. This enhancement improves classification accuracy and model confidence while reducing bias. We conclude with a discussion on the challenges of applying such models in clinical domains and outline directions for future work to improve generalizability across domain-independent healthcare scenarios.
Updated: 2025-07-30 20:14:41
Categories: cs.CV,cs.LG
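The first masking strategy (percentage-based black masking) reduces to zeroing a chosen fraction of pixels. A minimal sketch, assuming grayscale images stored as nested lists; the exact mask shapes and placement used in the paper are not reproduced here:

```python
import random

# Illustrative percentage-based black masking: zero out a random fraction
# of pixels. A sketch of the idea only, not the paper's masking protocol.

def black_mask(image, fraction, seed=0):
    """image: 2D list of pixel values; returns a copy with `fraction` zeroed."""
    h, w = len(image), len(image[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    rng = random.Random(seed)                      # seeded for reproducibility
    masked = set(rng.sample(coords, round(fraction * h * w)))
    return [[0 if (r, c) in masked else image[r][c] for c in range(w)]
            for r in range(h)]

img = [[255] * 10 for _ in range(10)]              # toy 10x10 white frame
out = black_mask(img, 0.30)
print(sum(v == 0 for row in out for v in row))     # 30
```

Sweeping `fraction` over 0.10, 0.30, and 0.50 per frame reproduces the abstract's masking levels for probing model robustness.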
Locally Differentially Private Thresholding Bandits
This work investigates the impact of ensuring local differential privacy in the thresholding bandit problem. We consider both the fixed budget and fixed confidence settings. We propose methods that utilize private responses, obtained through a Bernoulli-based differentially private mechanism, to identify arms with expected rewards exceeding a predefined threshold. We show that this procedure provides strong privacy guarantees and derive theoretical performance bounds on the proposed algorithms. Additionally, we present general lower bounds that characterize the additional loss incurred by any differentially private mechanism, and show that the presented algorithms match these lower bounds up to poly-logarithmic factors. Our results provide valuable insights into privacy-preserving decision-making frameworks in bandit problems.
Updated: 2025-07-30 20:08:30
Categories: cs.LG
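The Bernoulli-based locally private mechanism described above is in the spirit of randomized response: each binary reward is flipped with a probability set by the privacy parameter, and the server debiases the noisy mean before comparing it to the threshold. A hedged sketch (parameter choices and the naive fixed-budget loop are illustrative, not the paper's algorithm):

```python
import math
import random

# Sketch of a Bernoulli/randomized-response local-DP mechanism for the
# thresholding bandit: privatize each reward, debias the mean, threshold.

def privatize(bit, eps, rng):
    """Report the true bit with probability e^eps / (1 + e^eps)."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < p else 1 - bit

def debiased_mean(reports, eps):
    """Invert the flipping bias to get an unbiased estimate of the true mean."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    raw = sum(reports) / len(reports)
    return (raw - (1 - p)) / (2 * p - 1)

rng = random.Random(0)
eps, n, threshold = 1.0, 20000, 0.5
means = [0.2, 0.8]                         # true (hidden) Bernoulli arm means
estimates = [debiased_mean([privatize(int(rng.random() < m), eps, rng)
                            for _ in range(n)], eps)
             for m in means]
print([i for i, e in enumerate(estimates) if e > threshold])  # [1]
```

The debiasing step inflates variance by a factor growing as eps shrinks, which is the "additional loss" the paper's lower bounds characterize.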
Affect Models Have Weak Generalizability to Atypical Speech
Speech and voice conditions can alter the acoustic properties of speech, which could impact the performance of paralinguistic models for affect for people with atypical speech. We evaluate publicly available models for recognizing categorical and dimensional affect from speech on a dataset of atypical speech, comparing results to datasets of typical speech. We investigate three dimensions of speech atypicality: intelligibility, which is related to pronunciation; monopitch, which is related to prosody; and harshness, which is related to voice quality. We look at (1) distributional trends of categorical affect predictions within the dataset, (2) distributional comparisons of categorical affect predictions to similar datasets of typical speech, and (3) correlation strengths between text and speech predictions for spontaneous speech for valence and arousal. We find that the output of affect models is significantly impacted by the presence and degree of speech atypicalities. For instance, the percentage of speech predicted as sad is significantly higher for all types and grades of atypical speech when compared to similar typical speech datasets. In a preliminary investigation on improving robustness for atypical speech, we find that fine-tuning models on pseudo-labeled atypical speech data improves performance on atypical speech without impacting performance on typical speech. Our results emphasize the need for broader training and evaluation datasets for speech emotion models, and for modeling approaches that are robust to voice and speech differences.
Updated: 2025-07-30 20:02:12
Categories: cs.LG
FairReason: Balancing Reasoning and Social Bias in MLLMs
Multimodal Large Language Models (MLLMs) already achieve state-of-the-art results across a wide range of tasks and modalities. To push their reasoning ability further, recent studies explore advanced prompting schemes and post-training fine-tuning. Although these techniques improve logical accuracy, they frequently leave the models' outputs burdened with pronounced social biases. Clarifying how reasoning gains interact with bias mitigation, and whether the two objectives inherently trade off, therefore remains an open and pressing research problem. Our study begins by benchmarking three bias-mitigation strategies, supervised fine-tuning (SFT), knowledge distillation (KD), and rule-based reinforcement learning (RL), under identical conditions, establishing their baseline strengths and weaknesses. Building on these results, we vary the proportion of debias-focused and reasoning-centric samples within each paradigm to chart the reasoning-versus-bias trade-off. Our sweeps reveal a consistent sweet spot: a roughly 1:4 mix trained with reinforcement learning cuts stereotype scores by 10% while retaining 88% of the model's original reasoning accuracy, offering concrete guidance for balancing fairness and capability in MLLMs.
Updated: 2025-07-30 19:57:22
Categories: cs.AI
Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints
Autonomous cars need geometric accuracy and semantic understanding to navigate complex environments, yet most stacks handle them separately. We present XYZ-Drive, a single vision-language model that reads a front-camera frame, a 25m $\times$ 25m overhead map, and the next waypoint, then outputs steering and speed. A lightweight goal-centered cross-attention layer lets waypoint tokens highlight relevant image and map patches, supporting both action and textual explanations, before the fused tokens enter a partially fine-tuned LLaMA-3.2 11B model. On the MD-NEX Outdoor-Driving benchmark XYZ-Drive attains 95% success and 0.80 Success weighted by Path Length (SPL), surpassing PhysNav-DG by 15% and halving collisions, all while significantly improving efficiency by using only a single branch. Sixteen ablations explain the gains. Removing any modality (vision, waypoint, map) drops success by up to 11%, confirming their complementary roles and rich connections. Replacing goal-centered attention with simple concatenation cuts 3% in performance, showing query-based fusion injects map knowledge more effectively. Keeping the transformer frozen loses 5%, showing the importance of fine-tuning when applying VLMs for specific tasks such as autonomous driving. Coarsening map resolution from 10 cm to 40 cm blurs lane edges and raises crash rate. Overall, these results demonstrate that early, token-level fusion of intent and map layout enables accurate, transparent, real-time driving.
Updated: 2025-07-30 19:51:23
Categories: cs.CV,cs.AI,cs.LG,cs.RO,I.4.8; I.2.10; I.2.6; C.3.3; I.4.9
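The goal-centered fusion step above can be sketched as ordinary query-based cross-attention, with waypoint tokens acting as queries over the concatenated image and map patch tokens. This is a minimal NumPy illustration, not the paper's implementation: the random projection matrices stand in for learned weights, and all dimensions are made up.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def goal_centered_cross_attention(waypoint_tokens, image_tokens, map_tokens, rng):
    """Waypoint tokens (queries) attend over concatenated image+map patch
    tokens (keys/values), so the goal highlights relevant patches."""
    d = waypoint_tokens.shape[-1]
    context = np.concatenate([image_tokens, map_tokens], axis=0)  # (P, d)
    # Illustrative random projections standing in for learned weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = waypoint_tokens @ Wq, context @ Wk, context @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))  # (n_waypoints, P) attention map
    return attn @ V, attn                 # fused tokens, attention weights

rng = np.random.default_rng(0)
wp = rng.standard_normal((2, 32))     # 2 waypoint tokens
img = rng.standard_normal((16, 32))   # 16 image patches
hdmap = rng.standard_normal((8, 32))  # 8 map patches
fused, attn = goal_centered_cross_attention(wp, img, hdmap, rng)
print(fused.shape, attn.shape)  # (2, 32) (2, 24)
```

In the paper the fused tokens then feed the partially fine-tuned LLaMA-3.2 backbone; here the point is only the query-based fusion shape.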
Lattice Protein Folding with Variational Annealing
Understanding the principles of protein folding is a cornerstone of computational biology, with implications for drug design, bioengineering, and the understanding of fundamental biological processes. Lattice protein folding models offer a simplified yet powerful framework for studying the complexities of protein folding, enabling the exploration of energetically optimal folds under constrained conditions. However, finding these optimal folds is a computationally challenging combinatorial optimization problem. In this work, we introduce a novel upper-bound training scheme that employs masking to identify the lowest-energy folds in two-dimensional Hydrophobic-Polar (HP) lattice protein folding. By leveraging Dilated Recurrent Neural Networks (RNNs) integrated with an annealing process driven by temperature-like fluctuations, our method accurately predicts optimal folds for benchmark systems of up to 60 beads. Our approach also effectively masks invalid folds from being sampled without compromising the autoregressive sampling properties of RNNs. This scheme is generalizable to three spatial dimensions and can be extended to lattice protein models with larger alphabets. Our findings emphasize the potential of advanced machine learning techniques in tackling complex protein folding problems and a broader class of constrained combinatorial optimization challenges.
Updated: 2025-07-30 19:46:48
Categories: cond-mat.dis-nn,cs.AI,cs.LG,q-bio.BM
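The optimization target in 2D HP lattice folding is easy to state: each pair of hydrophobic (H) beads that are lattice neighbors but not chain neighbors contributes -1 to the energy, and the fold must be a self-avoiding walk. A minimal sketch of that energy function (the function name and representation are mine):

```python
def hp_energy(sequence, fold):
    """Energy of a 2D HP lattice fold: -1 for each hydrophobic (H) pair that
    is adjacent on the lattice but not consecutive along the chain.
    `fold` is a list of (x, y) lattice coordinates, one per bead."""
    assert len(sequence) == len(fold)
    assert len(set(fold)) == len(fold), "self-avoiding walk violated"
    coords = {pos: i for i, pos in enumerate(fold)}
    energy = 0
    for i, (x, y) in enumerate(fold):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x, y + 1)):  # each lattice edge counted once
            j = coords.get(nb)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy

# A 2x2 square fold of "HHHH": one non-consecutive H-H contact (beads 0 and 3).
print(hp_energy("HHHH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1
```

The paper's contribution is how to sample low-energy folds (masked autoregressive RNNs with annealing); this block only shows the objective being minimized.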
Reference-Guided Diffusion Inpainting For Multimodal Counterfactual Generation
Safety-critical applications, such as autonomous driving and medical image analysis, require extensive multimodal data for rigorous testing. Synthetic data methods are gaining prominence due to the cost and complexity of gathering real-world data, but they demand a high degree of realism and controllability to be useful. This work introduces two novel methods for synthetic data generation in autonomous driving and medical image analysis, namely MObI and AnydoorMed, respectively. MObI is a first-of-its-kind framework for Multimodal Object Inpainting that leverages a diffusion model to produce realistic and controllable object inpaintings across perceptual modalities, demonstrated simultaneously for camera and lidar. Given a single reference RGB image, MObI enables seamless object insertion into existing multimodal scenes at a specified 3D location, guided by a bounding box, while maintaining semantic consistency and multimodal coherence. Unlike traditional inpainting methods that rely solely on edit masks, this approach uses 3D bounding box conditioning to ensure accurate spatial positioning and realistic scaling. AnydoorMed extends this paradigm to the medical imaging domain, focusing on reference-guided inpainting for mammography scans. It leverages a diffusion-based model to inpaint anomalies with impressive detail preservation, maintaining the reference anomaly's structural integrity while semantically blending it with the surrounding tissue. Together, these methods demonstrate that foundation models for reference-guided inpainting in natural images can be readily adapted to diverse perceptual modalities, paving the way for the next generation of systems capable of constructing highly realistic, controllable and multimodal counterfactual scenarios.
Updated: 2025-07-30 19:43:47
Categories: cs.CV,cs.AI
Prediction of Significant Creatinine Elevation in First ICU Stays with Vancomycin Use: A retrospective study through Catboost
Background: Vancomycin, a key antibiotic for severe Gram-positive infections in ICUs, poses a high nephrotoxicity risk. Early prediction of kidney injury in critically ill patients is challenging. This study aimed to develop a machine learning model to predict vancomycin-related creatinine elevation using routine ICU data. Methods: We analyzed 10,288 ICU patients (aged 18-80) from the MIMIC-IV database who received vancomycin. Kidney injury was defined by KDIGO criteria (creatinine rise >=0.3 mg/dL within 48h or >=50% within 7d). Features were selected via SelectKBest (top 30) and Random Forest ranking (final 15). Six algorithms were tested with 5-fold cross-validation. Interpretability was evaluated using SHAP, Accumulated Local Effects (ALE), and Bayesian posterior sampling. Results: Of 10,288 patients, 2,903 (28.2%) developed creatinine elevation. CatBoost performed best (AUROC 0.818 [95% CI: 0.801-0.834], sensitivity 0.800, specificity 0.681, negative predictive value 0.900). Key predictors were phosphate, total bilirubin, magnesium, Charlson index, and APSIII. SHAP confirmed phosphate as a major risk factor. ALE showed dose-response patterns. Bayesian analysis estimated mean risk 60.5% (95% credible interval: 16.8-89.4%) in high-risk cases. Conclusions: This machine learning model predicts vancomycin-associated creatinine elevation from routine ICU data with strong accuracy and interpretability, enabling early risk detection and supporting timely interventions in critical care.
Updated: 2025-07-30 19:15:37
Categories: cs.LG
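The outcome label described above (KDIGO-style creatinine elevation) can be sketched directly from the stated thresholds. A hedged illustration assuming time-sorted creatinine measurements; the study's exact baseline handling may differ:

```python
from datetime import datetime, timedelta

def kdigo_creatinine_elevation(measurements):
    """Label used in the study: a creatinine rise >= 0.3 mg/dL within 48 h,
    or a >= 50% relative rise within 7 days, over any earlier measurement.
    `measurements` is a time-sorted list of (datetime, creatinine_mg_dl)."""
    for i, (t0, c0) in enumerate(measurements):
        for t1, c1 in measurements[i + 1:]:
            dt = t1 - t0
            if dt <= timedelta(hours=48) and c1 - c0 >= 0.3:
                return True
            if dt <= timedelta(days=7) and c0 > 0 and (c1 - c0) / c0 >= 0.5:
                return True
    return False

t = datetime(2024, 1, 1)
print(kdigo_creatinine_elevation([(t, 1.0), (t + timedelta(hours=24), 1.4)]))  # True
print(kdigo_creatinine_elevation([(t, 1.0), (t + timedelta(days=6), 1.2)]))    # False
```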
Early Goal-Guided Multi-Scale Fusion for Real-Time Vision-Language Driving
Autonomous vehicles must react in milliseconds while reasoning about road geometry and traffic intent to navigate complex situations. We introduce NovaDrive, a single-branch vision-language architecture that processes front-camera images, HD-map tiles, LiDAR depth, and textual waypoints together. A lightweight, two-stage cross-attention block first aligns waypoint tokens with the HD map, then refines attention over fine-grained image and depth patches. Coupled with a novel smoothness loss that discourages abrupt steering and speed changes, this design eliminates the need for recurrent memory. We fine-tune the top 15 layers of an 11B LLaMA-3.2 vision-language backbone, enabling real-time inference. On the nuScenes / Waymo subset of the MD-NEX Outdoor benchmark, NovaDrive raises success rate to 84% (+4%), boosts path-efficiency (SPL) to 0.66 (+0.11), and reduces collision frequency from 2.6% to 1.2% (-1.4%) relative to the previous state-of-the-art. Our ablations confirm that waypoint tokens, partial VLM fine-tuning, and cross-attention fusion each contribute substantially to these gains. Beyond safety, NovaDrive's shorter routes (resulting from the novel smoothness loss) translate to lower fuel or battery usage, pointing toward leaner, more easily updated driving stacks. NovaDrive can be extended to other embodied-AI domains as well.
Updated: 2025-07-30 19:12:42
Categories: cs.CV,cs.AI,cs.LG,cs.MM,cs.RO,I.2.6; I.2.9; I.2.10; C.3.3
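The abstract does not give the exact form of the smoothness loss, but a common choice matching its description is to penalize squared first differences of the predicted steering and speed sequences. A sketch under that assumption:

```python
import numpy as np

def smoothness_loss(steering, speed, w_steer=1.0, w_speed=1.0):
    """One plausible form of the loss described in the abstract: the mean
    squared first difference of predicted steering and speed, which grows
    with abrupt changes and vanishes for constant commands."""
    return (w_steer * np.mean(np.diff(steering) ** 2)
            + w_speed * np.mean(np.diff(speed) ** 2))

smooth = smoothness_loss(np.array([0.0, 0.1, 0.2, 0.3]),
                         np.array([10.0, 10.5, 11.0, 11.5]))
jerky = smoothness_loss(np.array([0.0, 0.5, -0.4, 0.6]),
                        np.array([10.0, 14.0, 9.0, 15.0]))
print(smooth < jerky)  # True: jerky trajectories are penalized more
```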
Two-dimensional Parallel Tempering for Constrained Optimization
Sampling Boltzmann probability distributions plays a key role in machine learning and optimization, motivating the design of hardware accelerators such as Ising machines. While the Ising model can in principle encode arbitrary optimization problems, practical implementations are often hindered by soft constraints that either slow down mixing when too strong, or fail to enforce feasibility when too weak. We introduce a two-dimensional extension of the powerful parallel tempering algorithm (PT) that addresses this challenge by adding a second dimension of replicas interpolating the penalty strengths. This scheme ensures constraint satisfaction in the final replicas, analogous to low-energy states at low temperature. The resulting two-dimensional parallel tempering algorithm (2D-PT) improves mixing in heavily constrained replicas and eliminates the need to explicitly tune the penalty strength. In a representative example of graph sparsification with copy constraints, 2D-PT achieves near-ideal mixing, with Kullback-Leibler divergence decaying as O(1/t). When applied to sparsified Wishart instances, 2D-PT yields orders of magnitude speedup over conventional PT with the same number of replicas. The method applies broadly to constrained Ising problems and can be deployed on existing Ising machines.
Updated: 2025-07-30 19:09:52
Categories: cs.LG,cond-mat.stat-mech,math.OC,stat.ML
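The replica-exchange step generalizes naturally to replicas that differ in both inverse temperature and penalty strength: the standard Metropolis swap criterion is applied with each replica's own Hamiltonian E(x; lambda) = E0(x) + lambda * penalty(x). A minimal sketch of that acceptance probability (the paper's schedules and move set are not reproduced here):

```python
import numpy as np

def swap_probability(beta_a, lam_a, x_a, beta_b, lam_b, x_b, e0, penalty):
    """Metropolis acceptance for exchanging configurations between two
    replicas that differ in inverse temperature (beta) and/or penalty
    strength (lam). Reduces to standard parallel tempering when lam_a == lam_b."""
    def E(x, lam):
        return e0(x) + lam * penalty(x)
    delta = (beta_a * (E(x_a, lam_a) - E(x_b, lam_a))
             + beta_b * (E(x_b, lam_b) - E(x_a, lam_b)))
    return min(1.0, float(np.exp(delta)))

# Toy problem: prefer many spins set, softly constrained to exactly one set.
e0 = lambda x: -sum(x)
penalty = lambda x: abs(sum(x) - 1)
p = swap_probability(1.0, 0.0, (1, 1, 0), 2.0, 5.0, (0, 1, 0), e0, penalty)
print(0.0 < p <= 1.0)  # True
```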
Prompt Engineering Techniques for Mitigating Cultural Bias Against Arabs and Muslims in Large Language Models: A Systematic Review
Large language models have demonstrated remarkable capabilities across various domains, yet concerns about cultural bias - particularly towards Arabs and Muslims - pose significant ethical challenges by perpetuating harmful stereotypes and marginalization. Despite growing recognition of bias in LLMs, prompt engineering strategies specifically addressing Arab and Muslim representation remain understudied. This mixed-methods systematic review examines such techniques, offering evidence-based guidance for researchers and practitioners. Following PRISMA guidelines and Kitchenham's systematic review methodology, we analyzed 8 empirical studies published between 2021-2024 investigating bias mitigation strategies. Our findings reveal five primary prompt engineering approaches: cultural prompting, affective priming, self-debiasing techniques, structured multi-step pipelines, and parameter-optimized continuous prompts. Although all approaches show potential for reducing bias, effectiveness varied substantially across studies and bias types. Evidence suggests that certain bias types may be more resistant to prompt-based mitigation than others. Structured multi-step pipelines demonstrated the highest overall effectiveness, achieving up to 87.7% reduction in bias, though they require greater technical expertise. Cultural prompting offers broader accessibility with substantial effectiveness. These results underscore the accessibility of prompt engineering for mitigating cultural bias without requiring access to model parameters. The limited number of studies identified highlights a significant research gap in this critical area. Future research should focus on developing culturally adaptive prompting techniques, creating Arab and Muslim-specific evaluation resources, and integrating prompt engineering with complementary debiasing methods to address deeper stereotypes while maintaining model utility.
Updated: 2025-07-30 19:07:18
Categories: cs.CL,cs.AI,cs.CY,cs.HC
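Of the five approaches, cultural prompting is the simplest to illustrate: prepend an instruction asking the model to reason from the target culture's perspective without stereotyping. The wording below is my own illustrative template, not taken from any of the reviewed studies:

```python
def cultural_prompt(task, culture="Arab"):
    """Illustrative cultural-prompting prefix (wording is hypothetical,
    not quoted from any reviewed study)."""
    return (
        f"You are answering for a {culture} audience. Avoid stereotypes, "
        f"rely on verified cultural knowledge, and represent {culture} "
        f"people with the same nuance you would any other group.\n\n{task}"
    )

print(cultural_prompt("Describe a typical family dinner."))
```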
Linking Actor Behavior to Process Performance Over Time
Understanding how actor behavior influences process outcomes is a critical aspect of process mining. Traditional approaches often use aggregate and static process data, overlooking the temporal and causal dynamics that arise from individual actor behavior. This limits the ability to accurately capture the complexity of real-world processes, where individual actor behavior and interactions between actors significantly shape performance. In this work, we address this gap by integrating actor behavior analysis with Granger causality to identify correlating links in time series data. We apply this approach to real-world event logs, constructing time series for actor interactions (continuation, interruption, and handovers) and for process outcomes. Using Group Lasso for lag selection, we identify a small but consistently influential set of lags that capture the majority of causal influence, revealing that actor behavior has direct and measurable impacts on process performance, particularly throughput time. These findings demonstrate the potential of actor-centric, time series-based methods for uncovering the temporal dependencies that drive process outcomes, offering a more nuanced understanding of how individual behaviors impact overall process efficiency.
Updated: 2025-07-30 19:04:07
Categories: cs.LG
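The core test, regressing an outcome series on its own lags with and without the actor-behavior lags, can be sketched with plain least squares (the paper additionally uses Group Lasso to select a sparse set of lags, which is omitted here):

```python
import numpy as np

def granger_gain(x, y, lags=2):
    """Fractional reduction in residual sum of squares when lags of x are
    added to an autoregression of y -- a simple stand-in for a Granger test."""
    n = len(y)
    Y = y[lags:]
    own = np.column_stack([y[lags - k : n - k] for k in range(1, lags + 1)])
    cross = np.column_stack([x[lags - k : n - k] for k in range(1, lags + 1)])
    def rss(X):
        X = np.column_stack([np.ones(len(Y)), X])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return r @ r
    return 1.0 - rss(np.column_stack([own, cross])) / rss(own)

# Synthetic check: y is strongly driven by the first lag of x.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
print(granger_gain(x, y) > 0.5)  # True: x's lags explain most of y
```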
KLLM: Fast LLM Inference with K-Means Quantization
Large language model (LLM) inference poses significant challenges due to its intensive memory and computation demands. Weight and activation quantization (WAQ) offers a promising solution by reducing both memory footprint and arithmetic complexity. However, two key challenges remain in the existing WAQ designs. (1) Traditional WAQ designs rely on uniform integer-based quantization for hardware efficiency, but this often results in significant accuracy degradation at low precision. K-Means-based quantization, a non-uniform quantization technique, achieves higher accuracy by matching the Gaussian-like distributions of weights and activations in LLMs. However, its non-uniform nature prevents direct execution on low-precision compute units, requiring dequantization and floating-point matrix multiplications (MatMuls) during inference. (2) Activation outliers further hinder effective low-precision WAQ. Offline thresholding methods for outlier detection can lead to significant model performance degradation, while existing online detection techniques introduce substantial runtime overhead. To address the aforementioned challenges and fully unleash the potential of WAQ with K-Means quantization for LLM inference, in this paper, we propose KLLM, a hardware-software co-design framework. KLLM features an index-based computation scheme for efficient execution of MatMuls and nonlinear operations on K-Means-quantized data, which avoids most of the dequantization and full-precision computations. Moreover, KLLM incorporates a novel outlier detection engine, Orizuru, which efficiently identifies the top-$k$ largest and smallest elements in the activation data stream during online inference. Extensive experiments show that, on average, KLLM achieves speedups of 9.67x and 7.03x and energy-efficiency improvements of 229.50x and 150.21x over the A100 GPU and Atom, respectively.
Updated: 2025-07-30 19:01:25
Categories: cs.LG,cs.AR
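The two ingredients, non-uniform K-Means quantization of Gaussian-like weights and index-based computation over the small codebook, can be sketched in NumPy. This shows only the arithmetic flavor of the scheme, not KLLM's hardware design; all sizes are illustrative:

```python
import numpy as np

def kmeans_1d(values, k=16, iters=25, seed=0):
    """Lloyd's algorithm on flattened weights: non-uniform codebook + indices."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False)
    for _ in range(iters):
        idx = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = values[idx == j].mean()
    idx = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, idx

def index_matmul(centroids, idx, x):
    """Per output row, bucket-sum x by centroid id (np.bincount), then one
    small dot product with the codebook -- the index-based flavor the paper
    describes, avoiding a full-precision multiply per weight."""
    out = np.empty(idx.shape[0])
    for i, row in enumerate(idx):
        out[i] = centroids @ np.bincount(row, weights=x, minlength=len(centroids))
    return out

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32))   # Gaussian-like weights
x = rng.standard_normal(32)
centroids, flat_idx = kmeans_1d(W.ravel(), k=16)
idx = flat_idx.reshape(W.shape)
y = index_matmul(centroids, idx, x)
print(np.abs(centroids[idx] - W).mean() < 0.2)  # 16 non-uniform levels fit well
print(np.allclose(y, centroids[idx] @ x))       # matches the dequantized matmul
```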
Recursive Learning-Based Virtual Buffering for Analytical Global Placement
Due to the skewed scaling of interconnect versus cell delay in modern technology nodes, placement with buffer porosity (i.e., cell density) awareness is essential for timing closure in physical synthesis flows. However, existing approaches face two key challenges: (i) traditional van Ginneken-Lillis-style buffering approaches are computationally expensive during global placement; and (ii) machine learning-based approaches, such as BufFormer, lack a thorough consideration of Electrical Rule Check (ERC) violations and fail to "close the loop" back into the physical design flow. In this work, we propose MLBuf-RePlAce, the first open-source learning-driven virtual buffering-aware analytical global placement framework, built on top of the OpenROAD infrastructure. MLBuf-RePlAce adopts an efficient recursive learning-based generative buffering approach to predict buffer types and locations, addressing ERC violations during global placement. We compare MLBuf-RePlAce against the default virtual buffering-based timing-driven global placer in OpenROAD, using open-source testcases from the TILOS MacroPlacement and OpenROAD-flow-scripts repositories. Without degradation of post-route power, MLBuf-RePlAce achieves (maximum, average) improvements of (56%, 31%) in total negative slack (TNS) within the open-source OpenROAD flow. When evaluated by completion in a commercial flow, MLBuf-RePlAce achieves (maximum, average) improvements of (53%, 28%) in TNS with an average of 0.2% improvement in post-route power.
Updated: 2025-07-30 18:51:25
Categories: cs.LG,cs.AI
Recovering Diagnostic Value: Super-Resolution-Aided Echocardiographic Classification in Resource-Constrained Imaging
Automated cardiac interpretation in resource-constrained settings (RCS) is often hindered by poor-quality echocardiographic imaging, limiting the effectiveness of downstream diagnostic models. While super-resolution (SR) techniques have shown promise in enhancing magnetic resonance imaging (MRI) and computed tomography (CT) scans, their application to echocardiography (a widely accessible but noise-prone modality) remains underexplored. In this work, we investigate the potential of deep learning-based SR to improve classification accuracy on low-quality 2D echocardiograms. Using the publicly available CAMUS dataset, we stratify samples by image quality and evaluate two clinically relevant tasks of varying complexity: a relatively simple Two-Chamber vs. Four-Chamber (2CH vs. 4CH) view classification and a more complex End-Diastole vs. End-Systole (ED vs. ES) phase classification. We apply two widely used SR models, Super-Resolution Generative Adversarial Network (SRGAN) and Super-Resolution Residual Network (SRResNet), to enhance poor-quality images and observe significant gains in performance metrics, particularly with SRResNet, which also offers computational efficiency. Our findings demonstrate that SR can effectively recover diagnostic value in degraded echo scans, making it a viable tool for AI-assisted care in RCS, achieving more with less.
Updated: 2025-07-30 18:45:31
Categories: cs.CV,cs.AI
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
Predicting human gaze scanpaths is crucial for understanding visual attention, with applications in human-computer interaction, autonomous systems, and cognitive robotics. While deep learning models have advanced scanpath prediction, most existing approaches generate averaged behaviors, failing to capture the variability of human visual exploration. In this work, we present ScanDiff, a novel architecture that combines diffusion models with Vision Transformers to generate diverse and realistic scanpaths. Our method explicitly models scanpath variability by leveraging the stochastic nature of diffusion models, producing a wide range of plausible gaze trajectories. Additionally, we introduce textual conditioning to enable task-driven scanpath generation, allowing the model to adapt to different visual search objectives. Experiments on benchmark datasets show that ScanDiff surpasses state-of-the-art methods in both free-viewing and task-driven scenarios, producing more diverse and accurate scanpaths. These results highlight its ability to better capture the complexity of human visual behavior, pushing forward gaze prediction research. Source code and models are publicly available at https://aimagelab.github.io/ScanDiff.
Updated: 2025-07-30 18:36:09
Categories: cs.CV,cs.AI
Data Readiness for Scientific AI at Scale
This paper examines how Data Readiness for AI (DRAI) principles apply to leadership-scale scientific datasets used to train foundation models. We analyze archetypal workflows across four representative domains - climate, nuclear fusion, bio/health, and materials - to identify common preprocessing patterns and domain-specific constraints. We introduce a two-dimensional readiness framework composed of Data Readiness Levels (raw to AI-ready) and Data Processing Stages (ingest to shard), both tailored to high performance computing (HPC) environments. This framework outlines key challenges in transforming scientific data for scalable AI training, emphasizing transformer-based generative models. Together, these dimensions form a conceptual maturity matrix that characterizes scientific data readiness and guides infrastructure development toward standardized, cross-domain support for scalable and reproducible AI for science.
Updated: 2025-07-30 18:30:37
Categories: cs.AI,cs.CE,cs.DC,cs.LG,I.2.6
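The two-dimensional framework can be represented as a simple grid over (readiness level, processing stage). The level and stage names below are illustrative placeholders, not the paper's exact taxonomy:

```python
from itertools import product

# Hypothetical axis labels -- the paper defines the two dimensions
# (Data Readiness Levels, raw to AI-ready; Data Processing Stages,
# ingest to shard) but these specific names are illustrative.
READINESS_LEVELS = ["raw", "cleaned", "structured", "ai_ready"]
PROCESSING_STAGES = ["ingest", "preprocess", "transform", "shard"]

def maturity_matrix(assessments):
    """Place each dataset in the 2D (readiness level, processing stage) grid."""
    grid = {cell: [] for cell in product(READINESS_LEVELS, PROCESSING_STAGES)}
    for name, level, stage in assessments:
        grid[(level, stage)].append(name)
    return grid

grid = maturity_matrix([("climate_sim", "cleaned", "preprocess"),
                        ("fusion_shots", "ai_ready", "shard")])
print(grid[("ai_ready", "shard")])  # ['fusion_shots']
```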
Deciphering interventional dynamical causality from non-intervention complex systems
Detecting and quantifying causality is a focal topic in the fields of science, engineering, and interdisciplinary studies. However, causal studies on non-intervention systems attract much attention but remain extremely challenging. The delay-embedding technique provides a promising approach. In this study, we propose a framework named Interventional Dynamical Causality (IntDC), in contrast to the traditional Constructive Dynamical Causality (ConDC). ConDC, including Granger causality, transfer entropy, and convergence of cross-mapping, measures causality by constructing a dynamical model without considering interventions. A computational criterion, Interventional Embedding Entropy (IEE), is proposed to measure causal strengths in an interventional manner. IEE is an intervened causal information flow, computed in the delay-embedding space. Further, IEE theoretically and numerically enables deciphering IntDC solely from observational (non-interventional) time-series data, without requiring any knowledge of dynamical models or real interventions in the considered system. In particular, IEE can be applied to rank causal effects according to their importance and construct causal networks from data. We conducted numerical experiments to demonstrate that IEE can find causal edges accurately, eliminate effects of confounding, and quantify causal strength more robustly than traditional indices. We also applied IEE to real-world tasks, where it proved to be an accurate and robust tool for causal analysis based solely on observational data. The IntDC framework and IEE algorithm provide an efficient approach to the study of causality from time series in diverse non-intervention complex systems.
Updated: 2025-07-30 18:26:45
Categories: cs.LG,q-bio.QM,stat.ME,stat.ML
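The delay-embedding technique underlying IntDC maps a scalar series into vectors of lagged values (Takens embedding). A minimal sketch:

```python
import numpy as np

def delay_embed(x, dim=3, tau=1):
    """Takens delay embedding: map a scalar series x into points
    (x_t, x_{t-tau}, ..., x_{t-(dim-1)*tau})."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[(dim - 1 - k) * tau : (dim - 1 - k) * tau + n]
                            for k in range(dim)])

x = np.arange(10.0)
E = delay_embed(x, dim=3, tau=2)
print(E.shape)  # (6, 3)
print(E[0])     # [4. 2. 0.]
```

IEE itself (the intervened information flow computed in this space) involves more machinery than shown here; this block only illustrates the embedding the framework is built on.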
A Smoothing Newton Method for Rank-one Matrix Recovery
We consider the phase retrieval problem, which involves recovering a rank-one positive semidefinite matrix from rank-one measurements. A recently proposed algorithm based on Bures-Wasserstein gradient descent (BWGD) exhibits superlinear convergence, but it is unstable, and existing theory can only prove local linear convergence for higher rank matrix recovery. We resolve this gap by revealing that BWGD implements Newton's method with a nonsmooth and nonconvex objective. We develop a smoothing framework that regularizes the objective, enabling a stable method with rigorous superlinear convergence guarantees. Experiments on synthetic data demonstrate this superior stability while maintaining fast convergence.
Updated: 2025-07-30 18:25:42
Categories: stat.ML,cs.LG,math.OC
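The underlying problem, recovering x from rank-one measurements y_i = (a_i^T x)^2, can be sketched with plain gradient descent plus backtracking on the quartic least-squares objective. This illustrates the problem setup only, not the paper's BWGD or smoothing Newton method:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 50
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = (A @ x_true) ** 2  # phaseless, rank-one measurements y_i = (a_i^T x)^2

def loss(z):
    return np.mean(((A @ z) ** 2 - y) ** 2) / 4

def grad(z):
    r = (A @ z) ** 2 - y
    return A.T @ (r * (A @ z)) / m

z = x_true + 0.5 * rng.standard_normal(n)  # warm start near the truth
initial = loss(z)
for _ in range(100):
    g, step = grad(z), 0.1
    while loss(z - step * g) >= loss(z) and step > 1e-12:
        step /= 2  # backtracking keeps the descent monotone
    z = z - step * g
print(loss(z) < initial)  # True: the objective decreases (up to sign, z ~ x)
```

Note the global sign ambiguity: z and -z give identical measurements, which is one reason first-order methods need care here.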
Hypergraph Neural Sheaf Diffusion: A Symmetric Simplicial Set Framework for Higher-Order Learning
The absence of intrinsic adjacency relations and orientation systems in hypergraphs creates fundamental challenges for constructing sheaf Laplacians of arbitrary degrees. We resolve these limitations through symmetric simplicial sets derived directly from hypergraphs, called symmetric simplicial lifting, which encode all possible oriented subrelations within each hyperedge as ordered tuples. This construction canonically defines adjacency via facet maps while inherently preserving hyperedge provenance. We establish that the normalized degree zero sheaf Laplacian on our symmetric simplicial lifting reduces exactly to the traditional graph normalized sheaf Laplacian when restricted to graphs, validating its mathematical consistency with prior graph-based sheaf theory. Furthermore, the induced structure preserves all structural information from the original hypergraph, ensuring that every multi-way relational detail is faithfully retained. Leveraging this framework, we introduce Hypergraph Neural Sheaf Diffusion (HNSD), the first principled extension of neural sheaf diffusion to hypergraphs. HNSD operates via the normalized degree zero sheaf Laplacian over the symmetric simplicial lifting, resolving the orientation ambiguity and adjacency sparsity inherent to hypergraph learning. Experimental evaluations demonstrate HNSD's competitive performance across established benchmarks.
Updated: 2025-07-30 18:24:32
Categories: cs.LG,math.AT,05C65, 55U10, 68T07
Learning to Prune Branches in Modern Tree-Fruit Orchards
Dormant tree pruning is labor-intensive but essential to maintaining modern highly-productive fruit orchards. In this work we present a closed-loop visuomotor controller for robotic pruning. The controller guides the cutter through a cluttered tree environment to reach a specified cut point and ensures the cutters are perpendicular to the branch. We train the controller using a novel orchard simulation that captures the geometric distribution of branches in a target apple orchard configuration. Unlike traditional methods requiring full 3D reconstruction, our controller uses just optical flow images from a wrist-mounted camera. We deploy our learned policy in simulation and the real world for an example V-Trellis Envy tree with zero-shot transfer, achieving a 30% success rate -- approximately half the performance of an oracle planner.
Updated: 2025-07-30 18:24:20
领域: cs.RO,cs.LG
Investigating the Invertibility of Multimodal Latent Spaces: Limitations of Optimization-Based Methods
This paper investigates the inverse capabilities and broader utility of multimodal latent spaces within task-specific AI (Artificial Intelligence) models. While these models excel at their designed forward tasks (e.g., text-to-image generation, audio-to-text transcription), their potential for inverse mappings remains largely unexplored. We propose an optimization-based framework to infer input characteristics from desired outputs, applying it bidirectionally across Text-Image (BLIP, Flux.1-dev) and Text-Audio (Whisper-Large-V3, Chatterbox-TTS) modalities. Our central hypothesis posits that while optimization can guide models towards inverse tasks, their multimodal latent spaces will not consistently support semantically meaningful and perceptually coherent inverse mappings. Experimental results consistently validate this hypothesis. We demonstrate that while optimization can force models to produce outputs that align textually with targets (e.g., a text-to-image model generating an image that an image captioning model describes correctly, or an ASR model transcribing optimized audio accurately), the perceptual quality of these inversions is chaotic and incoherent. Furthermore, when attempting to infer the original semantic input from generative models, the reconstructed latent space embeddings frequently lack semantic interpretability, aligning with nonsensical vocabulary tokens. These findings highlight a critical limitation: multimodal latent spaces, primarily optimized for specific forward tasks, do not inherently possess the structure required for robust and interpretable inverse mappings. Our work underscores the need for further research into developing truly semantically rich and invertible multimodal latent spaces.
Updated: 2025-07-30 18:19:11
Subjects: cs.LG,cs.AI,cs.CV,cs.SD,eess.AS
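The failure mode described above can be reproduced in miniature with a linear "forward model": optimization matches the target output exactly, yet the recovered input is only the minimum-norm preimage rather than the original. This is our own toy sketch, not the paper's framework:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))   # toy non-invertible forward model: 8-dim input -> 2-dim output
x_true = rng.normal(size=8)
y_target = W @ x_true         # the desired output we optimize toward

# Gradient descent on the input alone, with the forward model frozen,
# mirroring the optimization-based inversion setup.
x = np.zeros(8)
for _ in range(2000):
    x -= 0.01 * 2 * W.T @ (W @ x - y_target)

# x now reproduces the target output, but the 6-dimensional null space
# of W is left completely unconstrained, so x differs from x_true.
```

This is the toy analogue of an ASR model transcribing optimized audio correctly even though the audio itself is perceptually incoherent: the output matches, the input does not.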
Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead
Large Language Models (LLMs) have achieved remarkable results on a range of standardized tests originally designed to assess human cognitive and psychological traits, such as intelligence and personality. While these results are often interpreted as strong evidence of human-like characteristics in LLMs, this paper argues that such interpretations constitute an ontological error. Human psychological and educational tests are theory-driven measurement instruments, calibrated to a specific human population. Applying these tests to non-human subjects without empirical validation risks mischaracterizing what is being measured. Furthermore, a growing trend frames AI performance on benchmarks as measurements of traits such as "intelligence", despite known issues with validity, data contamination, cultural bias and sensitivity to superficial prompt changes. We argue that interpreting benchmark performance as measurements of human-like traits lacks sufficient theoretical and empirical justification. This leads to our position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead. We call for the development of principled, AI-specific evaluation frameworks tailored to AI systems. Such frameworks might build on existing frameworks for constructing and validating psychometrics tests, or could be created entirely from scratch to fit the unique context of AI.
Updated: 2025-07-30 18:14:35
Subjects: cs.LG,cs.AI,91E45,I.2
Noise-Coded Illumination for Forensic and Photometric Video Analysis
The proliferation of advanced tools for manipulating video has led to an arms race, pitting those who wish to sow disinformation against those who want to detect and expose it. Unfortunately, time favors the ill-intentioned in this race, with fake videos growing increasingly difficult to distinguish from real ones. At the root of this trend is a fundamental advantage held by those manipulating media: equal access to a distribution of what we consider authentic (i.e., "natural") video. In this paper, we show how coding very subtle, noise-like modulations into the illumination of a scene can help combat this advantage by creating an information asymmetry that favors verification. Our approach effectively adds a temporal watermark to any video recorded under coded illumination. However, rather than encoding a specific message, this watermark encodes an image of the unmanipulated scene as it would appear lit only by the coded illumination. We show that even when an adversary knows that our technique is being used, creating a plausible coded fake video amounts to solving a second, more difficult version of the original adversarial content creation problem at an information disadvantage. This is a promising avenue for protecting high-stakes settings like public events and interviews, where the content on display is a likely target for manipulation, and while the illumination can be controlled, the cameras capturing video cannot.
Updated: 2025-07-30 18:08:34
Subjects: cs.GR,cs.CR,cs.CV
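A one-pixel toy version of the scheme (our sketch; the paper operates on full video) shows the core demodulation step: a zero-mean, noise-like illumination code is invisible in any single frame, but correlating the frame sequence against the known code recovers the scene as lit by the coded source alone:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 400                                                  # video frames
code = rng.permutation(np.repeat([-1.0, 1.0], T // 2))   # zero-mean noise-like code

# Toy one-pixel scene: static ambient light plus a subtly code-modulated source.
ambient = 5.0
coded_scene = 2.0    # scene intensity as lit by the coded source alone
eps = 0.05           # small modulation depth: the flicker stays "very subtle"
frames = ambient + eps * coded_scene * code + rng.normal(0.0, 0.01, T)

# Correlating against the known code demodulates the hidden image; the
# ambient term cancels exactly because the code is zero-mean, and any
# manipulation that ignores the code leaves an inconsistent residue.
recovered = (frames * code).mean() / eps
```

A forger who edits the frames without knowing the code cannot reproduce a consistent `recovered` image, which is the information asymmetry the paper exploits.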
Planning for Cooler Cities: A Multimodal AI Framework for Predicting and Mitigating Urban Heat Stress through Urban Landscape Transformation
As extreme heat events intensify due to climate change and urbanization, cities face increasing challenges in mitigating outdoor heat stress. While traditional physical models such as SOLWEIG and ENVI-met provide detailed assessments of human-perceived heat exposure, their computational demands limit scalability for city-wide planning. In this study, we propose GSM-UTCI, a multimodal deep learning framework designed to predict daytime average Universal Thermal Climate Index (UTCI) at 1-meter hyperlocal resolution. The model fuses surface morphology (nDSM), high-resolution land cover data, and hourly meteorological conditions using a feature-wise linear modulation (FiLM) architecture that dynamically conditions spatial features on atmospheric context. Trained on SOLWEIG-derived UTCI maps, GSM-UTCI achieves near-physical accuracy, with an R2 of 0.9151 and a mean absolute error (MAE) of 0.41°C, while reducing inference time from hours to under five minutes for an entire city. To demonstrate its planning relevance, we apply GSM-UTCI to simulate systematic landscape transformation scenarios in Philadelphia, replacing bare earth, grass, and impervious surfaces with tree canopy. Results show spatially heterogeneous but consistently strong cooling effects, with impervious-to-tree conversion producing the highest aggregated benefit (-4.18°C average change in UTCI across 270.7 km2). Tract-level bivariate analysis further reveals strong alignment between thermal reduction potential and land cover proportions. These findings underscore the utility of GSM-UTCI as a scalable, fine-grained decision support tool for urban climate adaptation, enabling scenario-based evaluation of greening strategies across diverse urban environments.
Updated: 2025-07-30 18:05:43
Subjects: cs.LG,cs.CV
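The FiLM conditioning mentioned above has a simple form: a small network maps the conditioning vector to per-channel scale and shift coefficients that modulate the spatial features. A hedged numpy sketch, with toy linear heads standing in for the real FiLM generator:

```python
import numpy as np

def film(features, gamma, beta):
    # Feature-wise linear modulation: each channel of the spatial feature
    # map is scaled and shifted by coefficients predicted from the
    # conditioning input (here, an hourly-weather vector).
    return gamma[:, None, None] * features + beta[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))    # C x H x W spatial features (morphology, land cover)
weather = rng.normal(size=3)         # toy meteorological conditioning vector
Wg = rng.normal(size=(4, 3)) * 0.1   # toy linear heads; the real generator is a learned network
Wb = rng.normal(size=(4, 3)) * 0.1
gamma, beta = 1.0 + Wg @ weather, Wb @ weather
out = film(feat, gamma, beta)
```

Because gamma and beta depend on the weather vector, the same spatial map is modulated differently under different atmospheric contexts, which is the "dynamic conditioning" the abstract refers to.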
Learning Simulatable Models of Cloth with Spatially-varying Constitutive Properties
Materials used in real clothing exhibit remarkable complexity and spatial variation due to common processes such as stitching, hemming, dyeing, printing, padding, and bonding. Simulating these materials, for instance using finite element methods, is often computationally demanding and slow. Worse, such methods can suffer from numerical artifacts called "membrane locking" that makes cloth appear artificially stiff. Here we propose a general framework, called Mass-Spring Net, for learning a simple yet efficient surrogate model that captures the effects of these complex materials using only motion observations. The cloth is discretized into a mass-spring network with unknown material parameters that are learned directly from the motion data, using a novel force-and-impulse loss function. Our approach demonstrates the ability to accurately model spatially varying material properties from a variety of data sources, and immunity to membrane locking which plagues FEM-based simulations. Compared to graph-based networks and neural ODE-based architectures, our method achieves significantly faster training times, higher reconstruction accuracy, and improved generalization to novel dynamic scenarios.
Updated: 2025-07-30 18:05:08
Subjects: cs.GR,cs.AI
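As a concrete reference point for the mass-spring surrogate, here is a minimal Hookean force computation over such a network. The per-spring stiffness and rest length are the quantities the paper learns from motion data; the code itself is our illustrative sketch, not the paper's implementation:

```python
import numpy as np

def spring_forces(pos, springs, k, rest):
    # Hookean forces on each node of a mass-spring network. Spatially
    # varying material properties correspond to per-spring stiffness k[s]
    # and rest length rest[s], which are fit to observed motion.
    F = np.zeros_like(pos)
    for s, (i, j) in enumerate(springs):
        d = pos[j] - pos[i]
        length = np.linalg.norm(d)
        f = k[s] * (length - rest[s]) * d / length  # pulls nodes together when stretched
        F[i] += f
        F[j] -= f
    return F

# One spring stretched to twice its rest length:
pos = np.array([[0.0, 0.0], [2.0, 0.0]])
F = spring_forces(pos, [(0, 1)], k=np.array([10.0]), rest=np.array([1.0]))
```

Integrating these forces with any standard time stepper yields predicted motion, which a force-and-impulse style loss can compare against observations to recover k and rest.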
Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
In-context learning (ICL) is a critical emerging capability of large language models (LLMs), enabling few-shot learning during inference by including a few demonstrations (demos) in the prompt. However, it has been found that ICL's performance can be sensitive to the choices of demos and their order. This paper investigates an unexplored new positional bias of ICL for the first time: we observe that the predictions and accuracy can drift drastically when the positions of demos, the system prompt, and the user message in LLM input are varied. We refer to this bias as DEMOS' POSITION IN PROMPT (DPP) bias. We design a systematic evaluation pipeline to study this type of positional bias across classification, question answering, summarization, and reasoning tasks. We introduce two metrics, ACCURACY-CHANGE and PREDICTION-CHANGE, to quantify net gains and output volatility induced by changes in the demos' position. Extensive experiments on ten LLMs from four open-source model families (QWEN, LLAMA3, MISTRAL, COHERE) verify that the bias significantly affects their accuracy and predictions: placing demos at the start of the prompt yields the most stable and accurate outputs with gains of up to +6 points. In contrast, placing demos at the end of the user message flips over 30% of predictions without improving correctness on QA tasks. Smaller models are most affected by this sensitivity, though even large models remain marginally affected on complex tasks.
Updated: 2025-07-30 17:59:46
Subjects: cs.CL,cs.AI
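The two metrics can be implemented in a few lines; the definitions below are our reading of the abstract (net accuracy difference, and fraction of flipped predictions) rather than the paper's exact formulas:

```python
def accuracy_change(before, after, gold):
    # Net accuracy gain when demos move from one position to another.
    acc = lambda preds: sum(p == g for p, g in zip(preds, gold)) / len(gold)
    return acc(after) - acc(before)

def prediction_change(before, after):
    # Fraction of examples whose prediction flips, regardless of correctness.
    return sum(b != a for b, a in zip(before, after)) / len(before)

# Hypothetical predictions for the same four questions under two demo positions:
gold  = ["A", "B", "A", "C"]
start = ["A", "B", "C", "C"]   # demos at the start of the prompt
end   = ["A", "C", "A", "B"]   # demos at the end of the user message
```

On this toy data, moving demos to the end flips three of four predictions while lowering accuracy, the same qualitative pattern the paper reports for QA tasks.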
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Large Language Models (LLMs) have demonstrated strong capabilities but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledge domains, or dynamic interaction contexts. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck, necessitating agents that can adaptively reason, act, and evolve in real time. This paradigm shift -- from scaling static models to developing self-evolving agents -- has sparked growing interest in architectures and methods enabling continual learning and adaptation from data, interactions, and experiences. This survey provides the first systematic and comprehensive review of self-evolving agents, organized around three foundational dimensions -- what to evolve, when to evolve, and how to evolve. We examine evolutionary mechanisms across agent components (e.g., models, memory, tools, architecture), categorize adaptation methods by stages (e.g., intra-test-time, inter-test-time), and analyze the algorithmic and architectural designs that guide evolutionary adaptation (e.g., scalar rewards, textual feedback, single-agent and multi-agent systems). Additionally, we analyze evaluation metrics and benchmarks tailored for self-evolving agents, highlight applications in domains such as coding, education, and healthcare, and identify critical challenges and research directions in safety, scalability, and co-evolutionary dynamics. By providing a structured framework for understanding and designing self-evolving agents, this survey establishes a roadmap for advancing adaptive agentic systems in both research and real-world deployments, ultimately shedding light on the path toward the realization of Artificial Super Intelligence (ASI), where agents evolve autonomously, performing at or beyond human-level intelligence across a wide array of tasks.
Updated: 2025-07-30 17:59:37
Subjects: cs.AI
C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
Spoken Dialogue Models (SDMs) have recently attracted significant attention for their ability to generate voice responses directly to users' spoken queries. Despite their increasing popularity, there exists a gap in research focused on comprehensively understanding their practical effectiveness in comprehending and emulating human conversations. This is especially true compared to text-based Large Language Models (LLMs), which benefit from extensive benchmarking. Human voice interactions are inherently more complex than text due to characteristics unique to spoken dialogue. Ambiguity poses one challenge, stemming from semantic factors like polysemy, as well as phonological aspects such as heterographs, heteronyms, and stress patterns. Additionally, context-dependency, like omission, coreference, and multi-turn interaction, adds further complexity to human conversational dynamics. To illuminate the current state of SDM development and to address these challenges, we present a benchmark dataset in this paper, which comprises 1,079 instances in English and Chinese. Accompanied by an LLM-based evaluation method that closely aligns with human judgment, this dataset facilitates a comprehensive exploration of the performance of SDMs in tackling these practical challenges.
Updated: 2025-07-30 17:56:23
Subjects: cs.CL,cs.AI
Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics
Machine and deep learning have grown in popularity and use in biological research over the last decade but still present challenges in interpretability of the fitted model. The development and use of metrics to determine features driving predictions and increase model interpretability continues to be an open area of research. We investigate the use of Shapley Additive Explanations (SHAP) on a multi-view deep learning model applied to multi-omics data for the purposes of identifying biomolecules of interest. Rankings of features via these attribution methods are compared across various architectures to evaluate consistency of the method. We perform multiple computational experiments to assess the robustness of SHAP and investigate modeling approaches and diagnostics to increase and measure the reliability of the identification of important features. The accuracy of a random-forest model fit on the subsets of features selected as most influential, together with the clustering quality obtained using only these features, serves as a measure of the attribution method's effectiveness. Our findings indicate that the rankings of features resulting from SHAP are sensitive to the choice of architecture as well as different random initializations of weights, suggesting caution when using attribution methods on multi-view deep learning models applied to multi-omics data. We present an alternative, simple method to assess the robustness of identification of important biomolecules.
Updated: 2025-07-30 17:53:42
Subjects: stat.ML,cs.LG
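A simple way to quantify the (in)consistency discussed above is a rank correlation between attribution vectors from two runs. The sketch below uses Spearman's rho over absolute attributions and is our illustration, not the paper's protocol:

```python
import numpy as np

def rank_agreement(attr_a, attr_b):
    # Spearman rank correlation between two feature-attribution vectors:
    # 1.0 means identical importance rankings, lower values mean the two
    # runs disagree about which features drive the predictions.
    ra = np.argsort(np.argsort(-np.abs(attr_a)))
    rb = np.argsort(np.argsort(-np.abs(attr_b)))
    n = len(ra)
    d = ra - rb
    return 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

# Hypothetical SHAP-style attributions from two random initializations:
run1 = np.array([0.90, 0.50, 0.10, 0.05])
run2 = np.array([0.85, 0.55, 0.07, 0.12])
rho = rank_agreement(run1, run2)
```

Here the two runs swap the ranks of the two weakest features, so agreement drops below 1.0 even though the top features coincide, the kind of sensitivity the paper reports across architectures and seeds.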
Automatically discovering heuristics in a complex SAT solver with large language models
Satisfiability problem (SAT) is a cornerstone of computational complexity with broad industrial applications, and it remains challenging to optimize modern SAT solvers in real-world settings due to their intricate architectures. While automatic configuration frameworks have been developed, they rely on manually constrained search spaces and yield limited performance gains. This work introduces a novel paradigm which effectively optimizes complex SAT solvers via Large Language Models (LLMs), and a tool called AutoModSAT is developed. Three fundamental challenges are addressed in order to achieve superior performance: (1) LLM-friendly solver: Systematic guidelines are proposed for developing a modularized solver to meet LLMs' compatibility, emphasizing code simplification, information sharing and bug reduction; (2) Automatic prompt optimization: An unsupervised automatic prompt optimization method is introduced to advance the diversity of LLMs' output; (3) Efficient search strategy: We design a presearch strategy and an evolutionary algorithm (EA) for the efficient and effective discovery of heuristics. Extensive experiments across a wide range of datasets demonstrate that AutoModSAT achieves a 50% performance improvement over the baseline solver and a 30% advantage over state-of-the-art (SOTA) solvers. Moreover, AutoModSAT attains a 20% speedup on average compared to parameter-tuned alternatives of the SOTA solvers, showcasing the enhanced capability in handling complex problem instances. This work bridges the gap between AI-driven heuristics discovery and mission-critical system optimization, and provides both methodological advancements and empirically validated results for next-generation complex solver development.
Updated: 2025-07-30 17:52:25
Subjects: cs.AI,cs.LO
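To make the search component concrete, here is a toy elitist evolutionary loop over two hypothetical solver parameters; the synthetic fitness function stands in for actually benchmarking a SAT solver, and all names are our own, not AutoModSAT's actual knobs:

```python
import random

def fitness(params):
    # Stand-in for running the solver on a benchmark and scoring it;
    # a synthetic objective with its optimum at (0.3, 0.7). The parameter
    # names below are hypothetical.
    restart_bias, decay = params
    return -((restart_bias - 0.3) ** 2 + (decay - 0.7) ** 2)

def evolve(pop_size=20, gens=60, sigma=0.1, seed=0):
    rng = random.Random(seed)
    pop = [(rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 4]          # elitist truncation selection
        children = [
            tuple(p + rng.gauss(0.0, sigma) for p in rng.choice(parents))
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

In the paper's setting the "genome" would be LLM-proposed heuristic code rather than numeric parameters, but the select-mutate-evaluate loop has the same shape.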
LCS: An AI-based Low-Complexity Scaler for Power-Efficient Super-Resolution of Game Content
The increasing complexity of content rendering in modern games has led to a problematic growth in the workload of the GPU. In this paper, we propose an AI-based low-complexity scaler (LCS) inspired by state-of-the-art efficient super-resolution (ESR) models which could offload the workload on the GPU to a low-power device such as a neural processing unit (NPU). The LCS is trained on GameIR image pairs natively rendered at low and high resolution. We utilize adversarial training to encourage reconstruction of perceptually important details, and apply reparameterization and quantization techniques to reduce model complexity and size. In our comparative analysis we evaluate the LCS alongside the publicly available AMD hardware-based Edge Adaptive Scaling Function (EASF) and AMD FidelityFX Super Resolution 1 (FSR1) on five different metrics, and find that the LCS achieves better perceptual quality, demonstrating the potential of ESR models for upscaling on resource-constrained devices.
Updated: 2025-07-30 17:47:25
Categories: cs.CV,cs.LG
Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point
Recent work has shown that 8-bit floating point (FP8) can be used for efficiently training neural networks with reduced computational cost compared to training in FP32/FP16. In this work, we investigate the use of FP8 training in a federated learning context. This approach brings not only the usual benefits of FP8 which are desirable for on-device training at the edge, but also reduces client-server communication costs due to significant weight compression. We present a novel method for combining FP8 client training while maintaining a global FP32 server model and provide convergence analysis. Experiments with various machine learning models and datasets show that our method consistently yields communication reductions of at least 2.9x across a variety of tasks and models compared to an FP32 baseline to achieve the same trained model accuracy.
Updated: 2025-07-30 17:45:50
Categories: cs.LG,cs.DC
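The communication saving here comes from sending FP8 weights instead of FP32. A pure-Python sketch of FP8 E4M3 rounding (simulated in ordinary floats; real systems use hardware FP8 kernels, and this is not the paper's implementation) shows what a client would apply to each weight before upload:

```python
import math

def quantize_fp8_e4m3(x: float) -> float:
    """Round a float to the nearest representable FP8 E4M3 value
    (4 exponent bits, 3 mantissa bits, bias 7; saturates at +-448)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 448.0)        # saturate at the E4M3 max finite value
    m, e = math.frexp(mag)          # mag = m * 2**e with m in [0.5, 1)
    if e < -5:                      # subnormal range: fixed step 2**-9
        return sign * round(mag / 2 ** -9) * 2 ** -9
    step = 2 ** (e - 4)             # 3 mantissa bits -> spacing 2**(e-1-3)
    return sign * round(mag / step) * step
```

Rounding every weight this way (plus a per-tensor scale in practice) is what yields the roughly 4x per-round compression relative to an FP32 baseline.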
TextSAM-EUS: Text Prompt Learning for SAM to Accurately Segment Pancreatic Tumor in Endoscopic Ultrasound
Pancreatic cancer carries a poor prognosis and relies on endoscopic ultrasound (EUS) for targeted biopsy and radiotherapy. However, the speckle noise, low contrast, and unintuitive appearance of EUS make segmentation of pancreatic tumors with fully supervised deep learning (DL) models both error-prone and dependent on large, expert-curated annotation datasets. To address these challenges, we present TextSAM-EUS, a novel, lightweight, text-driven adaptation of the Segment Anything Model (SAM) that requires no manual geometric prompts at inference. Our approach leverages text prompt learning (context optimization) through the BiomedCLIP text encoder in conjunction with a LoRA-based adaptation of SAM's architecture to enable automatic pancreatic tumor segmentation in EUS, tuning only 0.86% of the total parameters. On the public Endoscopic Ultrasound Database of the Pancreas, TextSAM-EUS with automatic prompts attains 82.69% Dice and 85.28% normalized surface distance (NSD), and with manual geometric prompts reaches 83.10% Dice and 85.70% NSD, outperforming both existing state-of-the-art (SOTA) supervised DL models and foundation models (e.g., SAM and its variants). As the first attempt to incorporate prompt learning in SAM-based medical image segmentation, TextSAM-EUS offers a practical option for efficient and robust automatic EUS segmentation. Code is available at https://github.com/HealthX-Lab/TextSAM-EUS .
Updated: 2025-07-30 17:39:30
Categories: cs.CV,cs.AI
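The LoRA adaptation referenced above (tuning only 0.86% of parameters) replaces a frozen weight W with W + (alpha/r) * B * A, where only the small matrices A and B are trained. A generic numpy sketch of the idea (not TextSAM-EUS's actual code):

```python
import numpy as np

class LoRALinear:
    """Frozen pretrained weight W plus a trainable low-rank update
    (alpha/r) * B @ A, as in LoRA; only A and B would be optimized."""
    def __init__(self, W, r=4, alpha=4, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                        # frozen
        self.A = rng.standard_normal((r, d_in)) * 0.01    # trainable
        self.B = np.zeros((d_out, r))                     # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * self.B @ self.A).T

    def trainable_fraction(self):
        return (self.A.size + self.B.size) / (self.W.size + self.A.size + self.B.size)
```

Because B starts at zero, the adapted layer initially reproduces the frozen model exactly, and the trainable parameter count stays a small fraction of the total.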
Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning
We explore the capability of evolution strategies to train an agent with a transformer-based policy in a reinforcement learning setting. We performed experiments using OpenAI's highly parallelizable evolution strategy to train Decision Transformer in the MuJoCo Humanoid locomotion environment and in Atari game environments, testing the ability of this black-box optimization technique to train even such relatively large and complicated models (compared to those previously tested in the literature). The examined evolution strategy proved, in general, capable of achieving strong results and produced high-performing agents, showcasing evolution's ability to tackle the training of even such complex models.
Updated: 2025-07-30 17:37:43
Categories: cs.LG,cs.NE
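OpenAI's ES estimates an update purely from episode returns, which is what lets it train large policies without backpropagating through the environment. A minimal sketch of one update with antithetic sampling and rank-based fitness shaping (illustrative, not the paper's code; `fitness` stands in for an episode rollout):

```python
import numpy as np

def openai_es_step(theta, fitness, pop_size=50, sigma=0.1, lr=0.02, rng=None):
    """One OpenAI-ES update: sample antithetic Gaussian perturbations of the
    parameters, evaluate fitness, and ascend the rank-weighted average."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.standard_normal((pop_size, theta.size))
    eps = np.concatenate([eps, -eps])            # antithetic pairs
    rewards = np.array([fitness(theta + sigma * e) for e in eps])
    ranks = rewards.argsort().argsort()          # rank-based fitness shaping
    shaped = ranks / (len(ranks) - 1) - 0.5
    grad = (shaped[:, None] * eps).sum(0) / (len(eps) * sigma)
    return theta + lr * grad
```

Each perturbation evaluation is an independent rollout, which is the source of the method's near-perfect parallelizability.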
Mesh based segmentation for automated margin line generation on incisors receiving crown treatment
Dental crowns are essential dental treatments for restoring damaged or missing teeth of patients. Recent dental crown design is carried out using commercial dental design software. Once a scan of a preparation is uploaded to the software, a dental technician needs to manually define a precise margin line on the preparation surface, which constitutes a non-repeatable and inconsistent procedure. This work proposes a new framework to determine margin lines automatically and accurately using deep learning. A dataset of incisor teeth was provided by a collaborating dental laboratory to train a deep learning segmentation model. A mesh-based neural network was modified by changing its input channels and used to segment the prepared tooth into two regions such that the margin line is contained within the boundary faces separating the two regions. Next, k-fold cross-validation was used to train 5 models, and a voting classifier technique was used to combine their results to enhance the segmentation. After that, boundary smoothing and optimization using the graph cut method were applied to refine the segmentation results. Then, boundary faces separating the two regions were selected to represent the margin line faces. A spline was fitted through the centers of the boundary faces to predict the margin line. Our results show that an ensemble model combined with maximum probability predicted the highest number of successful test cases (7 out of 13) based on a maximum distance threshold of 200 µm (representing human error) between the predicted and ground truth point clouds. It was also demonstrated that the better the quality of the preparation, the smaller the divergence between the predicted and ground truth margin lines (Spearman's rank correlation coefficient of -0.683). We provide the train and test datasets for the community.
Updated: 2025-07-30 17:34:45
Categories: cs.CE,cs.CV,cs.LG
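Two steps of the pipeline above — combining the five models' per-face outputs and extracting the boundary faces that carry the margin line — reduce to a few lines over toy data structures (a sketch, not the paper's code; `adjacency` is an assumed list of neighboring-face pairs):

```python
import numpy as np

def ensemble_face_labels(prob_maps):
    """Combine per-face probability maps from k models (k x n_faces)
    by averaging, then threshold into two regions (0/1)."""
    return (np.mean(prob_maps, axis=0) > 0.5).astype(int)

def boundary_faces(labels, adjacency):
    """Faces with at least one neighbor in the other region; in the paper's
    formulation, the margin line is contained within these faces."""
    out = set()
    for f, g in adjacency:               # adjacency: (face_i, face_j) pairs
        if labels[f] != labels[g]:
            out.update((f, g))
    return sorted(out)
```

The remaining stages (graph-cut smoothing, spline fitting through the boundary-face centers) refine and trace this boundary set.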
Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers
Competitive Pokémon Singles (CPS) is a popular strategy game where players learn to exploit their opponent based on imperfect information in battles that can last more than one hundred stochastic turns. AI research in CPS has been led by heuristic tree search and online self-play, but the game may also create a platform to study adaptive policies trained offline on large datasets. We develop a pipeline to reconstruct the first-person perspective of an agent from logs saved from the third-person perspective of a spectator, thereby unlocking a dataset of real human battles spanning more than a decade that grows larger every day. This dataset enables a black-box approach where we train large sequence models to adapt to their opponent based solely on their input trajectory while selecting moves without explicit search of any kind. We study a progression from imitation learning to offline RL and offline fine-tuning on self-play data in the hardcore competitive setting of Pokémon's four oldest (and most partially observed) game generations. The resulting agents outperform a recent LLM Agent approach and a strong heuristic search engine. While playing anonymously in online battles against humans, our best agents climb to rankings inside the top 10% of active players. All agent checkpoints, training details, datasets, and baselines are available at https://metamon.tech.
Updated: 2025-07-30 17:33:22
Categories: cs.LG
Synchronization of mean-field models on the circle
This paper considers a mean-field model of $n$ interacting particles whose state space is the unit circle, a generalization of the classical Kuramoto model. Global synchronization is said to occur if after starting from almost any initial state, all particles coalesce to a common point on the circle. We propose a general synchronization criterion in terms of $L_1$-norm of the third derivative of the particle interaction function. As an application we resolve a conjecture for the so-called self-attention dynamics (stylized model of transformers), by showing synchronization for all $\beta \ge -0.16$, which significantly extends the previous bound of $0\le \beta \le 1$ from Criscitiello, Rebjock, McRae, and Boumal (2024). We also show that global synchronization does not occur when $\beta < -2/3$.
Updated: 2025-07-30 17:31:57
Categories: math.DS,cs.LG,math.AP,math.OC
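For intuition, the classical identical-frequency Kuramoto special case of this setting can be simulated directly; the order parameter $|r|$ reaching 1 is exactly the coalescence to a common point that global synchronization requires. A sketch with a plain Euler discretization (illustrative only; the paper's analysis covers general interaction functions, including the self-attention dynamics):

```python
import numpy as np

def kuramoto_step(theta, dt=0.05, K=1.0):
    """One Euler step of the identical-frequency Kuramoto model on the circle:
    dtheta_i/dt = (K/n) * sum_j sin(theta_j - theta_i)."""
    diffs = theta[None, :] - theta[:, None]      # diffs[i, j] = theta_j - theta_i
    return theta + dt * K * np.sin(diffs).mean(axis=1)

def order_parameter(theta):
    """|r| = |mean of e^{i theta}|; equals 1 exactly when all particles coincide."""
    return np.abs(np.exp(1j * theta).mean())
```

Starting from generic initial phases, iterating the step drives $|r|$ toward 1, the synchronized state.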
Federated Learning on Riemannian Manifolds: A Gradient-Free Projection-Based Approach
Federated learning (FL) has emerged as a powerful paradigm for collaborative model training across distributed clients while preserving data privacy. However, existing FL algorithms predominantly focus on unconstrained optimization problems with exact gradient information, limiting their applicability in scenarios where only noisy function evaluations are accessible or where model parameters are constrained. To address these challenges, we propose a novel zeroth-order projection-based algorithm on Riemannian manifolds for FL. By leveraging the projection operator, we introduce a computationally efficient zeroth-order Riemannian gradient estimator. Unlike existing estimators, ours requires only a simple Euclidean random perturbation, eliminating the need to sample random vectors in the tangent space, thus reducing computational cost. Theoretically, we first prove the approximation properties of the estimator and then establish the sublinear convergence of the proposed algorithm, matching the rate of its first-order counterpart. Numerically, we first assess the efficiency of our estimator using kernel principal component analysis. Furthermore, we apply the proposed algorithm to two real-world scenarios: zeroth-order attacks on deep neural networks and low-rank neural network training to validate the theoretical findings.
Updated: 2025-07-30 17:24:27
Categories: math.OC,cs.LG
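The estimator's key trick — a plain Euclidean Gaussian perturbation followed by the projection retraction, with no tangent-space sampling — can be sketched for the unit sphere (one illustrative manifold instance under our assumptions; the paper treats general manifolds with a projection operator):

```python
import numpy as np

def zo_riemannian_grad_sphere(f, x, mu=1e-4, n_samples=1000, rng=None):
    """Zeroth-order Riemannian gradient estimate on the unit sphere:
    perturb with plain Euclidean Gaussians u, retract via x -> x/||x||,
    average finite differences, then project onto the tangent space at x."""
    if rng is None:
        rng = np.random.default_rng(0)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.size)
        y = x + mu * u
        x_pert = y / np.linalg.norm(y)          # projection retraction
        g += (f(x_pert) - f(x)) / mu * u
    g /= n_samples
    return g - x * (x @ g)                      # tangent-space projection at x
```

Note that the samples `u` are drawn in the ambient Euclidean space; the projection handles the manifold constraint, which is exactly what removes the cost of sampling in the tangent space.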
A Bit of Freedom Goes a Long Way: Classical and Quantum Algorithms for Reinforcement Learning under a Generative Model
We propose novel classical and quantum online algorithms for learning finite-horizon and infinite-horizon average-reward Markov Decision Processes (MDPs). Our algorithms are based on a hybrid exploration-generative reinforcement learning (RL) model wherein the agent can, from time to time, freely interact with the environment in a generative sampling fashion, i.e., by having access to a "simulator". By employing known classical and new quantum algorithms for approximating optimal policies under a generative model within our learning algorithms, we show that it is possible to avoid several paradigms from RL like "optimism in the face of uncertainty" and "posterior sampling" and instead compute and use optimal policies directly, which yields better regret bounds compared to previous works. For finite-horizon MDPs, our quantum algorithms obtain regret bounds which only depend logarithmically on the number of time steps $T$, thus breaking the $O(\sqrt{T})$ classical barrier. This matches the time dependence of the prior quantum works of Ganguly et al. (arXiv'23) and Zhong et al. (ICML'24), but with improved dependence on other parameters like state space size $S$ and action space size $A$. For infinite-horizon MDPs, our classical and quantum bounds still maintain the $O(\sqrt{T})$ dependence but with better $S$ and $A$ factors. Nonetheless, we propose a novel measure of regret for infinite-horizon MDPs with respect to which our quantum algorithms have $\operatorname{poly}\log{T}$ regret, exponentially better compared to classical algorithms. Finally, we generalise all of our results to compact state spaces.
Updated: 2025-07-30 17:24:23
Categories: cs.LG,cs.AI,math.OC,quant-ph,stat.ML
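The "compute and use optimal policies directly" step is, in the classical case, just planning on the model estimated from generative samples — e.g. tabular value iteration on an estimated $\hat P$. A minimal sketch of that classical subroutine (the quantum speedups replace this and the sampling, not the overall structure):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Compute an (approximately) optimal policy for a known or estimated
    tabular MDP. P: (A, S, S) transition tensor, R: (S, A) rewards.
    Returns the value vector V and a greedy deterministic policy."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum("ast,t->sa", P, V)   # Q[s,a] backup
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

Under a generative model, each entry of $\hat P$ comes from simulator samples; the planner then produces the policy directly, with no optimism bonuses or posterior sampling.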
Repair-R1: Better Test Before Repair
APR (Automated Program Repair) aims to automatically locate program defects, generate patches, and validate the repairs. Existing APR techniques are often combined with LLMs (Large Language Models), leveraging the code-related knowledge of LLMs to improve repair effectiveness. Current LLM-based APR methods typically utilize test cases only during the inference stage, adopting an iterative approach that performs repair first and validates it through test execution afterward. This conventional paradigm neglects two important aspects: the potential contribution of test cases in the training phase, and the possibility of leveraging testing prior to repair. To address this, we propose Repair-R1, which introduces test cases into the model's training phase and shifts test generation to precede repair. The model is required to first generate discriminative test cases that can distinguish defective behaviors, and then perform repair based on these tests. This enables the model to better locate defects and understand their underlying causes, thereby improving repair effectiveness. We implement Repair-R1 with three different backbone models, using RL (reinforcement learning) to co-optimize test generation and bug repair. Experimental results on four widely adopted benchmarks demonstrate the superiority of Repair-R1. Specifically, compared to vanilla models, Repair-R1 improves repair success rate by 2.68% to 48.29%, test generation success rate by 16.38% to 53.28%, and test coverage by 0.78% to 53.96%. We publish the code and weights at https://github.com/Tomsawyerhu/APR-RL and https://huggingface.co/tomhu/Qwen3-4B-RL-5000-step.
Updated: 2025-07-30 17:24:05
Subjects: cs.SE,cs.AI
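Repair-R1's central notion of a discriminative test case can be made concrete: a test distinguishes defective behavior exactly when it fails on the buggy program but passes on the repaired one. A minimal sketch of such a check (illustrative only; `is_discriminative` is a hypothetical helper, not from the paper's released code):

```python
def is_discriminative(test_fn, buggy_fn, fixed_fn) -> bool:
    """A test case 'distinguishes defective behavior' (in Repair-R1's sense)
    iff it fails on the buggy program but passes on the repaired one."""
    def passes(fn):
        try:
            test_fn(fn)
            return True
        except AssertionError:
            return False
    return (not passes(buggy_fn)) and passes(fixed_fn)
```

A reward built on this predicate is what lets RL co-optimize test generation with repair: tests that pass on both versions carry no defect-localizing signal.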
Lightweight Online Adaption for Time Series Foundation Model Forecasts
Foundation models (FMs) have emerged as a promising approach for time series forecasting. While effective, FMs typically remain fixed during deployment due to the high computational costs of learning them online. Consequently, deployed FMs fail to adapt their forecasts to current data characteristics, despite the availability of online feedback from newly arriving data. This raises the question of whether FM performance can be enhanced by the efficient usage of this feedback. We propose ELF to answer this question. ELF is a lightweight mechanism for the online adaptation of FM forecasts in response to online feedback. ELF consists of two parts: a) the ELF-Forecaster, which is used to learn the current data distribution; and b) the ELF-Weighter, which is used to combine the forecasts of the FM and the ELF-Forecaster. We evaluate the performance of ELF in conjunction with several recent FMs across a suite of standard time series datasets. In all of our experiments we find that using ELF improves performance. This work demonstrates how efficient usage of online feedback can be used to improve FM forecasts.
Updated: 2025-07-30 17:23:56
Subjects: cs.LG,stat.ML
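The abstract does not spell out how the ELF-Weighter combines the two forecast sources, but the idea of blending a frozen FM with a lightweight online forecaster can be sketched with a hypothetical scheme that weights each source by the inverse of its recent squared error:

```python
class OnlineForecastWeighter:
    """Toy stand-in for an ELF-style weighter: blends a frozen foundation-model
    forecast with a lightweight online forecaster, weighting each source by
    the inverse of its exponentially decayed squared error. (Hypothetical
    scheme; the paper's actual ELF-Weighter may differ.)"""

    def __init__(self, decay: float = 0.9, eps: float = 1e-8):
        self.decay = decay      # exponential decay for the error trackers
        self.eps = eps          # avoids division by zero
        self.err_fm = 1.0       # running squared error of the FM
        self.err_lite = 1.0     # running squared error of the lightweight model

    def combine(self, fm_pred: float, lite_pred: float) -> float:
        # Inverse-error weighting: the recently more accurate source dominates.
        w_fm = 1.0 / (self.err_fm + self.eps)
        w_lite = 1.0 / (self.err_lite + self.eps)
        return (w_fm * fm_pred + w_lite * lite_pred) / (w_fm + w_lite)

    def update(self, fm_pred: float, lite_pred: float, actual: float) -> None:
        # Online feedback from newly arrived data updates both error trackers.
        self.err_fm = self.decay * self.err_fm + (1 - self.decay) * (fm_pred - actual) ** 2
        self.err_lite = self.decay * self.err_lite + (1 - self.decay) * (lite_pred - actual) ** 2
```

The appeal of this kind of mechanism is that only two scalars per source are updated per step, so the FM itself never needs to be retrained online.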
FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models
Hallucinations in large language models pose a critical challenge for applications requiring factual reliability, particularly in high-stakes domains such as finance. This work presents an effective approach for detecting and editing factually incorrect content in model-generated responses based on the provided context. Given a user-defined domain-specific error taxonomy, we construct a synthetic dataset by inserting tagged errors into financial question-answering corpora and then fine-tune four language models, Phi-4, Phi-4-mini, Qwen3-4B, and Qwen3-14B, to detect and edit these factual inaccuracies. Our best-performing model, fine-tuned Phi-4, achieves an 8% improvement in binary F1 score and a 30% gain in overall detection performance compared to OpenAI-o3. Notably, our fine-tuned Phi-4-mini model, despite having only 4 billion parameters, maintains competitive performance with just a 2% drop in binary detection and a 0.1% decline in overall detection compared to OpenAI-o3. Our work provides a practical solution for detecting and editing factual inconsistencies in financial text generation while introducing a generalizable framework that can enhance the trustworthiness and alignment of large language models across diverse applications beyond finance. Our code and data are available at https://github.com/pegasi-ai/shield.
Updated: 2025-07-30 17:19:41
Subjects: cs.CL,cs.AI,cs.LG
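The synthetic-data construction step can be illustrated with a toy error-insertion function. The actual pipeline uses a user-defined, domain-specific error taxonomy; `insert_tagged_error` below is a hypothetical stand-in that only knows how to corrupt numeric facts:

```python
import random
import re

def insert_tagged_error(answer: str, rng: random.Random) -> tuple[str, str]:
    """Toy version of the synthetic-data step: corrupt one numeric fact in a
    grounded answer and record the error tag. (Illustrative only; FRED uses a
    user-defined, domain-specific taxonomy with many more error types.)"""
    numbers = re.findall(r"\d+\.?\d*", answer)
    if not numbers:
        return answer, "no_error"
    target = rng.choice(numbers)
    # Scale the figure by 1.1x-2x so the corrupted value never equals the original.
    corrupted = str(round(float(target) * rng.uniform(1.1, 2.0), 2))
    return answer.replace(target, corrupted, 1), "numeric_error"
```

Pairing each corrupted answer with its tag yields supervised examples for fine-tuning a detector/editor model.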
Past Meets Present: Creating Historical Analogy with Large Language Models
Historical analogies, which compare known past events with contemporary but unfamiliar events, are important abilities that help people make decisions and understand the world. However, research in applied history suggests that people have difficulty finding appropriate analogies, and previous studies in the AI community have also overlooked historical analogies. To fill this gap, in this paper, we focus on the historical analogy acquisition task, which aims to acquire analogous historical events for a given event. We explore retrieval and generation methods for acquiring historical analogies based on different large language models (LLMs). Furthermore, we propose a self-reflection method to mitigate hallucinations and stereotypes when LLMs generate historical analogies. Through human evaluations and our specially designed automatic multi-dimensional assessment, we find that LLMs generally have good potential for historical analogies, and the performance of the models can be further improved by using our self-reflection method.
Updated: 2025-07-30 17:18:33
Subjects: cs.CL,cs.AI
Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving
Vision-language models (VLMs) have become a promising approach to enhancing perception and decision-making in autonomous driving. A gap remains, however, in applying VLMs to complex scenarios that involve interactions with pedestrians and in deploying them efficiently on vehicles. In this paper, we propose a knowledge distillation method that transfers knowledge from large-scale vision-language foundation models to efficient vision networks, and we apply it to pedestrian behavior prediction and scene understanding tasks, achieving promising results in generating more diverse and comprehensive semantic attributes. We also utilize multiple pre-trained models and ensemble techniques to boost the model's performance. We further examine the effectiveness of the model after knowledge distillation; the results show significant metric improvements in open-vocabulary perception and trajectory prediction tasks, which can potentially enhance the end-to-end performance of autonomous driving.
Updated: 2025-07-30 17:16:46
Subjects: cs.CV,cs.AI,cs.LG,cs.RO
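As a rough illustration of the distillation step, the standard soft-label objective (KL divergence between temperature-softened teacher and student distributions) can be written as follows. This is the generic formulation; the paper's exact objective for transferring VLM knowledge into efficient vision networks may differ:

```python
import numpy as np

def softmax(z, t=1.0):
    z = np.asarray(z, dtype=float) / t
    z -= z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label knowledge distillation term: KL(teacher || student) on
    temperature-softened distributions. The T^2 factor keeps gradient
    magnitudes comparable across temperatures (Hinton-style distillation)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return (temperature ** 2) * kl.mean()
```

In practice this term is mixed with the hard-label task loss, so the efficient student network learns both the ground truth and the foundation model's softer similarity structure over classes.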
Decentralized Differentially Private Power Method
We propose a novel Decentralized Differentially Private Power Method (D-DP-PM) for performing Principal Component Analysis (PCA) in networked multi-agent settings. Unlike conventional decentralized PCA approaches where each agent accesses the full n-dimensional sample space, we address the challenging scenario where each agent observes only a subset of dimensions through row-wise data partitioning. Our method ensures $(\epsilon,\delta)$-Differential Privacy (DP) while enabling collaborative estimation of global eigenvectors across the network without requiring a central aggregator. We achieve this by having agents share only local embeddings of the current eigenvector iterate, leveraging both the inherent privacy from random initialization and carefully calibrated Gaussian noise additions. We prove that our algorithm satisfies the prescribed $(\epsilon,\delta)$-DP guarantee and establish convergence rates that explicitly characterize the impact of the network topology. Our theoretical analysis, based on linear dynamics and high-dimensional probability theory, provides tight bounds on both privacy and utility. Experiments on real-world datasets demonstrate that D-DP-PM achieves superior privacy-utility tradeoffs compared to naive local DP approaches, with particularly strong performance in moderate privacy regimes ($\epsilon\in[2, 5]$). The method converges rapidly, allowing practitioners to trade iterations for enhanced privacy while maintaining competitive utility.
Updated: 2025-07-30 17:15:50
Subjects: cs.LG
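The core mechanism, power iteration with Gaussian noise added to each iterate, can be sketched in a centralized simplification. This omits the paper's decentralized row-wise partitioning and the formal $(\epsilon,\delta)$ calibration; `noise_scale` is a stand-in for the calibrated noise level:

```python
import numpy as np

def dp_power_method(A, iters=50, noise_scale=0.01, rng=None):
    """Centralized sketch of a differentially private power iteration: each
    update is perturbed with Gaussian noise before normalization. In D-DP-PM
    the noise scale is calibrated to a formal (eps, delta) guarantee and each
    agent only sees a row-partitioned slice; neither is modeled here."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    v = rng.standard_normal(n)              # random init adds inherent privacy
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = A @ v + noise_scale * rng.standard_normal(n)  # noisy power step
        v /= np.linalg.norm(v)
    return v
```

The sketch makes the privacy-utility tradeoff visible: more iterations sharpen convergence to the top eigenvector, while a larger `noise_scale` (stronger privacy) perturbs each step further from it.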
Curvature Dynamic Black-box Attack: revisiting adversarial robustness via dynamic curvature estimation
Adversarial attacks reveal the vulnerability of deep learning models. For about a decade, countless attack and defense methods have been proposed, leading to robustified classifiers and a better understanding of models. Among these methods, curvature-based approaches have attracted attention because it is assumed that high curvature may give rise to a rough decision boundary. However, the most commonly used \textit{curvature} is the curvature of the loss function, scores or other parameters from within the model, as opposed to decision boundary curvature, since the former can be relatively easily formed using second-order derivatives. In this paper, we propose a new query-efficient method, dynamic curvature estimation (DCE), to estimate the decision boundary curvature in a black-box setting. Our approach is based on CGBA, a black-box adversarial attack. By performing DCE on a wide range of classifiers, we discovered, statistically, a connection between decision boundary curvature and adversarial robustness. We also propose a new attack method, the curvature dynamic black-box attack (CDBA), with improved performance using the dynamically estimated curvature.
Updated: 2025-07-30 17:06:45
Subjects: cs.LG,cs.AI
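For intuition about what "decision boundary curvature" measures, here is a toy 2-D estimate: treat the decision function as an implicit curve f(x, y) = 0 and evaluate the standard level-set curvature formula with finite differences. DCE itself works in a black-box, query-limited setting without this kind of direct derivative access:

```python
def implicit_curvature(f, x, y, h=1e-4):
    """Curvature of the level set f(x, y) = 0 at (x, y), via the identity
    kappa = |fxx*fy^2 - 2*fx*fy*fxy + fyy*fx^2| / (fx^2 + fy^2)^(3/2),
    with all partial derivatives taken by central finite differences.
    A white-box toy; DCE estimates boundary curvature from queries only."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    num = abs(fxx * fy**2 - 2 * fx * fy * fxy + fyy * fx**2)
    return num / (fx**2 + fy**2) ** 1.5
```

On a circular boundary of radius r the estimate recovers the textbook value 1/r, which is the sanity check one would run before trusting any query-based estimator.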
The Incomplete Bridge: How AI Research (Mis)Engages with Psychology
Social sciences have accumulated a rich body of theories and methodologies for investigating the human mind and behaviors, while offering valuable insights into the design and understanding of Artificial Intelligence (AI) systems. Focusing on psychology as a prominent case, this study explores the interdisciplinary synergy between AI and the field by analyzing 1,006 LLM-related papers published in premier AI venues between 2023 and 2025, along with the 2,544 psychology publications they cite. Through our analysis, we identify key patterns of interdisciplinary integration, locate the psychology domains most frequently referenced, and highlight areas that remain underexplored. We further examine how psychology theories/frameworks are operationalized and interpreted, identify common types of misapplication, and offer guidance for more effective incorporation. Our work provides a comprehensive map of interdisciplinary engagement between AI and psychology, thereby facilitating deeper collaboration and advancing AI systems.
Updated: 2025-07-30 17:03:59
Subjects: cs.AI,cs.CL,cs.CY
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
The development of autonomous agents for complex, long-horizon tasks is a central goal in AI. However, dominant training paradigms face a critical limitation: reinforcement learning (RL) methods that optimize solely for final task success often reinforce flawed or inefficient reasoning paths, a problem we term inefficient exploration. This leads to agents that are brittle and fail to generalize, as they learn to find solutions without learning how to reason coherently. To address this, we introduce RLVMR, a novel framework that integrates dense, process-level supervision into end-to-end RL by rewarding verifiable, meta-reasoning behaviors. RLVMR equips an agent to explicitly tag its cognitive steps, such as planning, exploration, and reflection, and provides programmatic, rule-based rewards for actions that contribute to effective problem-solving. These process-centric rewards are combined with the final outcome signal and optimized using a critic-free policy gradient method. On the challenging ALFWorld and ScienceWorld benchmarks, RLVMR achieves new state-of-the-art results, with our 7B model reaching an 83.6% success rate on the most difficult unseen task split. Our analysis confirms these gains stem from improved reasoning quality, including significant reductions in redundant actions and enhanced error recovery, leading to more robust, efficient, and interpretable agents.
Updated: 2025-07-30 17:00:48
Subjects: cs.LG,cs.AI
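A toy version of the rule-based, process-centric reward makes the idea concrete: programmatic bonuses for explicitly tagged meta-reasoning steps, a penalty for redundant actions, combined with the final outcome signal. The rules and weights below are hypothetical, not the paper's:

```python
def meta_reasoning_reward(tagged_steps, outcome_reward,
                          step_bonus=0.1, redundancy_penalty=0.2):
    """Toy RLVMR-style shaping: each (tag, action) pair earns a small bonus
    when the tag names a verifiable meta-reasoning behavior, loses a penalty
    when the action repeats an earlier one, and the final outcome signal is
    added on top. (Illustrative weights; the paper's rules are richer.)"""
    reward = float(outcome_reward)
    seen_actions = set()
    for tag, action in tagged_steps:
        if tag in {"plan", "explore", "reflect"}:
            reward += step_bonus              # reward the cognitive step itself
        if action in seen_actions:
            reward -= redundancy_penalty      # discourage redundant actions
        seen_actions.add(action)
    return reward
```

Because every term is computed by a rule over the agent's own tags and actions, the signal is dense and verifiable, unlike a sparse end-of-episode success bit.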
Subgrid BoostCNN: Efficient Boosting of Convolutional Networks via Gradient-Guided Feature Selection
Convolutional Neural Networks (CNNs) have achieved remarkable success across a wide range of machine learning tasks by leveraging hierarchical feature learning through deep architectures. However, the large number of layers and millions of parameters often make CNNs computationally expensive to train, requiring extensive time and manual tuning to discover optimal architectures. In this paper, we introduce a novel framework for boosting CNN performance that integrates dynamic feature selection with the principles of BoostCNN. Our approach incorporates two key strategies: subgrid selection and importance sampling, to guide training toward informative regions of the feature space. We further develop a family of algorithms that embed boosting weights directly into the network training process using a least squares loss formulation. This integration not only alleviates the burden of manual architecture design but also enhances accuracy and efficiency. Experimental results across several fine-grained classification benchmarks demonstrate that our boosted CNN variants consistently outperform conventional CNNs in both predictive performance and training speed.
Updated: 2025-07-30 17:00:05
Domain: stat.ML,cs.LG,68T05, 68T45,I.2.6; I.5.1; I.2.10
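The two strategies named above, subgrid selection guided by gradients and boosting-weight importance sampling, might look like this numpy sketch; the concrete selection rule (top-k rows/columns by summed gradient magnitude) and the sampling scheme are assumptions for illustration, not the paper's exact procedure.

```python
# Illustrative sketch: pick an informative subgrid from a gradient map and
# sample training examples in proportion to their boosting weights.
import numpy as np

def select_subgrid(grad_map, k):
    """Keep the k rows and k columns with the largest gradient energy."""
    rows = np.sort(np.argsort(np.abs(grad_map).sum(axis=1))[-k:])
    cols = np.sort(np.argsort(np.abs(grad_map).sum(axis=0))[-k:])
    return rows, cols

def importance_sample(boost_weights, n, rng):
    """Draw example indices in proportion to their boosting weights."""
    p = boost_weights / boost_weights.sum()
    return rng.choice(len(boost_weights), size=n, p=p)

rng = np.random.default_rng(0)
grad_map = np.zeros((8, 8))
grad_map[2:5, 3:6] = 1.0          # gradients concentrate in one region
rows, cols = select_subgrid(grad_map, k=3)
idx = importance_sample(np.array([1.0, 1.0, 8.0]), 1000, rng)
```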
PAF-Net: Phase-Aligned Frequency Decoupling Network for Multi-Process Manufacturing Quality Prediction
Accurate quality prediction in multi-process manufacturing is critical for industrial efficiency but hindered by three core challenges: time-lagged process interactions, overlapping operations with mixed periodicity, and inter-process dependencies in shared frequency bands. To address these, we propose PAF-Net, a frequency decoupled time series prediction framework with three key innovations: (1) A phase-correlation alignment method guided by frequency domain energy to synchronize time-lagged quality series, resolving temporal misalignment. (2) A frequency independent patch attention mechanism paired with Discrete Cosine Transform (DCT) decomposition to capture heterogeneous operational features within individual series. (3) A frequency decoupled cross attention module that suppresses noise from irrelevant frequencies, focusing exclusively on meaningful dependencies within shared bands. Experiments on 4 real-world datasets demonstrate PAF-Net's superiority. It outperforms 10 well-acknowledged baselines by 7.06% lower MSE and 3.88% lower MAE. Our code is available at https://github.com/StevenLuan904/PAF-Net-Official.
Updated: 2025-07-30 16:56:42
Domain: cs.LG
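The phase-correlation idea behind innovation (1) can be illustrated with a minimal sketch: pick the frequency bin with the most spectral energy and convert its phase difference between two series into a sample lag. PAF-Net's actual alignment method is more involved; this only shows the underlying principle.

```python
# Minimal phase-based lag estimation guided by frequency-domain energy.
import numpy as np

def estimate_lag(x, y):
    """Lag of y relative to x, from the dominant-energy frequency of x."""
    X, Y = np.fft.rfft(x), np.fft.rfft(y)
    k = int(np.argmax(np.abs(X[1:]))) + 1     # dominant non-DC bin
    dphi = np.angle(Y[k]) - np.angle(X[k])
    return -dphi * len(x) / (2 * np.pi * k)   # phase shift -> samples

n = 128
t = np.arange(n)
x = np.sin(2 * np.pi * 4 * t / n)             # upstream quality series
y = np.sin(2 * np.pi * 4 * (t - 5) / n)       # downstream series lags by 5
lag = estimate_lag(x, y)
```

Once series are synchronized, each one would then be decomposed (the paper uses a DCT, e.g. `scipy.fft.dct`) before the per-frequency attention stages.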
Scaling RL to Long Videos
We introduce a full-stack framework that scales up reasoning in vision-language models (VLMs) to long videos, leveraging reinforcement learning. We address the unique challenges of long video reasoning by integrating three critical components: (1) a large-scale dataset, LongVideo-Reason, comprising 104K long video QA pairs with high-quality reasoning annotations across diverse domains such as sports, games, and vlogs; (2) a two-stage training pipeline that extends VLMs with chain-of-thought supervised fine-tuning (CoT-SFT) and reinforcement learning (RL); and (3) a training infrastructure for long video RL, named Multi-modal Reinforcement Sequence Parallelism (MR-SP), which incorporates sequence parallelism and a vLLM-based engine tailored for long video, using cached video embeddings for efficient rollout and prefilling. In our experiments, LongVILA-R1-7B achieves strong performance on video benchmarks, reaching 65.1% and 71.1% accuracy on VideoMME without and with subtitles, respectively, and consistently outperforming LongVILA-7B across multiple benchmarks. Moreover, LongVILA-R1-7B supports processing up to 8,192 video frames per video, and configurable FPS settings. Notably, our MR-SP system achieves up to 2.1x speedup on long video RL training. In addition, we release our training system for public availability that supports RL training on various modalities (video, text, and audio), various models (VILA and Qwen series), and even image and video generation models. On a single A100 node (8 GPUs), it supports RL training on hour-long videos (e.g., 3,600 frames).
Updated: 2025-07-30 16:55:33
Domain: cs.CV,cs.AI,cs.CL
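The cached-video-embedding idea used for efficient rollout can be sketched as follows: each video is encoded once and the embedding is reused across many RL rollouts. The encoder and cache policy here are stand-ins, not the MR-SP implementation.

```python
# Toy sketch: cache frozen-encoder outputs so repeated rollouts over the
# same video skip re-encoding its frames.
import numpy as np

class EmbeddingCache:
    def __init__(self, encoder):
        self.encoder = encoder
        self.store = {}
        self.misses = 0

    def get(self, video_id, frames):
        if video_id not in self.store:
            self.misses += 1                  # encode only on first access
            self.store[video_id] = self.encoder(frames)
        return self.store[video_id]

def toy_encoder(frames):
    return frames.mean(axis=0)                # stand-in for a vision tower

cache = EmbeddingCache(toy_encoder)
frames = np.ones((8, 4))                      # 8 frames, 4-dim features
for _ in range(16):                           # 16 rollouts over one video
    emb = cache.get("vid0", frames)
```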
Probing EFX via PMMS: (Non-)Existence Results in Discrete Fair Division
We study the fair division of indivisible items and provide new insights into the EFX problem, which is widely regarded as the central open question in fair division, and the PMMS problem, a strictly stronger variant of EFX. Our first result constructs a three-agent instance with two monotone valuations and one additive valuation in which no PMMS allocation exists. Since EFX allocations are known to exist under these assumptions, this establishes a formal separation between EFX and PMMS. We prove existence of fair allocations for three important special cases. We show that EFX allocations exist for personalized bivalued valuations, where for each agent $i$ there exist values $a_i > b_i$ such that agent $i$ assigns value $v_i(\{g\}) \in \{a_i, b_i\}$ to each good $g$. We establish an analogous existence result for PMMS allocations when $a_i$ is divisible by $b_i$. We also prove that PMMS allocations exist for binary-valued MMS-feasible valuations, where each bundle $S$ has value $v_i(S) \in \{0, 1\}$. Notably, this result holds even without assuming monotonicity of valuations and thus applies to the fair division of chores and mixed manna. Finally, we study a class of valuations called pair-demand valuations, which extend the well-studied unit-demand valuations to the case where each agent derives value from at most two items, and we show that PMMS allocations exist in this setting. Our proofs are constructive, and we provide polynomial-time algorithms for all three existence results.
Updated: 2025-07-30 16:51:28
Domain: cs.GT,cs.AI,cs.DS
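For additive valuations, the two fairness notions compared above can be checked directly from their definitions. EFX: no agent envies another after removing any single good from the other's bundle; PMMS: each agent's bundle is worth at least their best max-min two-way split of the goods held jointly with any other agent. The brute-force split enumeration is exponential and only meant for tiny illustrative instances.

```python
# Direct EFX and PMMS checkers for additive valuations (tiny instances only).
from itertools import combinations

def val(vals, bundle):
    return sum(vals[g] for g in bundle)

def is_efx(alloc, valuations):
    for i, Ai in enumerate(alloc):
        for j, Aj in enumerate(alloc):
            if i != j and any(
                val(valuations[i], Ai) < val(valuations[i], Aj - {g})
                for g in Aj
            ):
                return False
    return True

def pairwise_mms(vals, pool):
    """Best guaranteed min-half value when splitting pool into two bundles."""
    pool = list(pool)
    best = 0
    for r in range(len(pool) + 1):
        for left in combinations(pool, r):
            rest = set(pool) - set(left)
            best = max(best, min(val(vals, left), val(vals, rest)))
    return best

def is_pmms(alloc, valuations):
    for i, Ai in enumerate(alloc):
        for j, Aj in enumerate(alloc):
            if i != j and val(valuations[i], Ai) < pairwise_mms(valuations[i], Ai | Aj):
                return False
    return True

vals = [{0: 2, 1: 1, 2: 1}, {0: 2, 1: 1, 2: 1}]
good = [{0}, {1, 2}]       # balanced split
bad = [set(), {0, 1, 2}]   # everything to one agent
```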
Tapping into the Black Box: Uncovering Aligned Representations in Pretrained Neural Networks
In this paper we argue that ReLU networks learn an implicit linear model we can actually tap into. We describe that alleged model formally and show that we can approximately pull its decision boundary back to the input space with a certain simple modification to the backward pass. The resulting gradients (called excitation pullbacks) reveal high-resolution input- and target-specific features of remarkable perceptual alignment on a number of popular ImageNet-pretrained deep architectures. This strongly suggests that neural networks do, in fact, rely on learned interpretable patterns that can be recovered after training. Thus, our findings may have profound implications for knowledge discovery and the development of dependable artificial systems.
Updated: 2025-07-30 16:47:42
Domain: cs.LG,cs.CV,cs.NE,I.2.6; I.4.10
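The abstract does not spell out the exact backward-pass modification, so the following is only a guided-backprop-style illustration on a one-hidden-layer ReLU net: the pullback replaces the hard 0/1 ReLU mask with a soft gate on the pre-activations. The gate choice and shapes are assumptions.

```python
# Illustrative "modified backward pass": swap the hard ReLU mask for a soft
# excitation gate when pulling the output back to input space.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(16, 8))    # input -> hidden
W2 = rng.normal(size=(16,))      # hidden -> scalar output

def preacts(x):
    return W1 @ x

def standard_pullback(x):
    z = preacts(x)
    return W1.T @ ((z > 0).astype(float) * W2)   # ordinary ReLU gradient

def excitation_pullback(x, beta=2.0):
    z = preacts(x)
    gate = 1.0 / (1.0 + np.exp(-beta * z))       # soft "excitation" gate
    return W1.T @ (gate * W2)

x = rng.normal(size=8)
g_std = standard_pullback(x)
g_exc = excitation_pullback(x)
```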
CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models
As Vision-Language Models (VLMs) are increasingly deployed in split-DNN configurations--with visual encoders (e.g., ResNet, ViT) operating on user devices and sending intermediate features to the cloud--there is a growing privacy risk from semantic information leakage. Existing approaches to reconstructing images from these intermediate features often result in blurry, semantically ambiguous images. To directly address semantic leakage, we propose CapRecover, a cross-modality inversion framework that recovers high-level semantic content, such as labels or captions, directly from intermediate features without image reconstruction. We evaluate CapRecover on multiple datasets and victim models, demonstrating strong performance in semantic recovery. Specifically, CapRecover achieves up to 92.71% Top-1 label accuracy on CIFAR-10 and generates fluent captions from ResNet50 features on COCO2017 with ROUGE-L scores up to 0.52. Our analysis further reveals that deeper convolutional layers encode significantly more semantic information compared to shallow layers. To mitigate semantic leakage, we introduce a simple yet effective protection method: adding random noise to intermediate features at each layer and removing the noise in the next layer. Experimental results show that this approach prevents semantic leakage without additional training costs.
Updated: 2025-07-30 16:42:02
Domain: cs.CV,cs.AI
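The noise-based protection described at the end can be sketched minimally: the device adds random noise to the intermediate features before transmission, and the next stage regenerates and subtracts the same noise. How the two sides coordinate on the noise is not specified in the abstract; a shared PRNG seed is assumed here purely for illustration.

```python
# Sketch of the add-noise / remove-noise defense on intermediate features.
import numpy as np

def protect(features, seed, scale=1.0):
    noise = np.random.default_rng(seed).normal(scale=scale, size=features.shape)
    return features + noise              # what an eavesdropper would see

def recover(noisy, seed, scale=1.0):
    noise = np.random.default_rng(seed).normal(scale=scale, size=noisy.shape)
    return noisy - noise                 # next layer removes the noise

feat = np.arange(6.0).reshape(2, 3)      # stand-in intermediate features
noisy = protect(feat, seed=42)
clean = recover(noisy, seed=42)
```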
The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation for Healthcare QA
Deploying Large Language Models (LLMs) for healthcare question answering requires robust methods to ensure accuracy and reliability. This work introduces Query-Based Retrieval Augmented Generation (QB-RAG), a framework for enhancing Retrieval-Augmented Generation (RAG) systems in healthcare question-answering by pre-aligning user queries with a database of curated, answerable questions derived from healthcare content. A key component of QB-RAG is an LLM-based filtering mechanism that ensures that only relevant and answerable questions are included in the database, enabling reliable reference query generation at scale. We provide theoretical motivation for QB-RAG, conduct a comparative analysis of existing retrieval enhancement techniques, and introduce a generalizable, comprehensive evaluation framework that assesses both the retrieval effectiveness and the quality of the generated response based on faithfulness, relevance, and adherence to the guideline. Our empirical evaluation on a healthcare data set demonstrates the superior performance of QB-RAG compared to existing retrieval methods, highlighting its practical value in building trustworthy digital health applications for health question-answering.
Updated: 2025-07-30 16:28:54
Domain: cs.LG
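The pre-alignment step can be sketched as nearest-neighbor retrieval over the curated question database: map the user query and each reference question into a vector space and return the most similar answerable question. The bag-of-words embedding below is a toy stand-in for a real sentence encoder, and the example questions are hypothetical.

```python
# Toy sketch of query pre-alignment against curated answerable questions.
import numpy as np

def embed(text, vocab):
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def align_query(query, reference_questions, vocab):
    q = embed(query, vocab)
    sims = [float(q @ embed(r, vocab)) for r in reference_questions]
    return reference_questions[int(np.argmax(sims))], max(sims)

refs = [
    "what are common side effects of statins",
    "how is blood pressure measured at home",
]
vocab = {w: i for i, w in enumerate(sorted(set(" ".join(refs).split())))}
best, score = align_query("side effects of statins", refs, vocab)
```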
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation
The pursuit of a generalizable stereo matching model, capable of performing well across varying resolutions and disparity ranges without dataset-specific fine-tuning, has revealed a fundamental trade-off. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism inherently limits the global consistency required for true generalization. However, global matching architectures, while theoretically more robust, have historically been rendered infeasible by prohibitive computational and memory costs. We resolve this dilemma with $S^2M^2$: a global matching architecture that achieves state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks. Our design integrates a multi-resolution transformer for robust long-range correspondence, trained with a novel loss function that concentrates probability on feasible matches. This approach enables a more robust joint estimation of disparity, occlusion, and confidence. $S^2M^2$ establishes a new state of the art on Middlebury v3 and ETH3D benchmarks, significantly outperforming prior methods in most metrics while reconstructing high-quality details with competitive efficiency.
Updated: 2025-07-30 16:27:21
Domain: cs.CV,cs.AI,cs.RO
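One plausible reading of "concentrates probability on feasible matches", shown here for illustration only: penalize the negative log of the probability mass the matching distribution places on disparities near the ground truth. The window and formulation are assumptions, not the paper's actual loss.

```python
# Illustrative loss rewarding probability mass on feasible disparities.
import numpy as np

def feasible_match_loss(logits, gt_disp, window=1):
    p = np.exp(logits - logits.max())
    p /= p.sum()                              # distribution over disparities
    lo = max(0, gt_disp - window)
    hi = min(len(p), gt_disp + window + 1)
    return float(-np.log(p[lo:hi].sum() + 1e-12))

confident = np.zeros(8)
confident[3] = 4.0                            # sharp peak at the true match
good = feasible_match_loss(confident, gt_disp=3)
bad = feasible_match_loss(np.zeros(8), gt_disp=3)  # uniform prediction
```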
Repetition Makes Perfect: Recurrent Graph Neural Networks Match Message Passing Limit
We precisely characterize the expressivity of computable Recurrent Graph Neural Networks (recurrent GNNs). We prove that recurrent GNNs with finite-precision parameters, sum aggregation, and ReLU activation, can compute any graph algorithm that respects the natural message-passing invariance induced by the Color Refinement (or Weisfeiler-Leman) algorithm. While it is well known that the expressive power of GNNs is limited by this invariance [Morris et al., AAAI 2019; Xu et al., ICLR 2019], we establish that recurrent GNNs can actually match this limit. This is in contrast to non-recurrent GNNs, which have the power of Weisfeiler-Leman only in a very weak, "non-uniform", sense where each graph size requires a different GNN to compute with. Our construction introduces only a polynomial overhead in both time and space. Furthermore, we show that by incorporating random initialization, for connected graphs recurrent GNNs can express all graph algorithms. In particular, any polynomial-time graph algorithm can be emulated on connected graphs in polynomial time by a recurrent GNN with random initialization.
Updated: 2025-07-30 16:27:11
Domain: cs.LG,68T05, 68T07,I.2.6
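The Color Refinement (1-WL) procedure that induces the invariance discussed above is short enough to state in code: iteratively recolor each node by its own color together with the multiset of its neighbors' colors, until the partition stops refining.

```python
# Standard Color Refinement (1-dimensional Weisfeiler-Leman).

def color_refinement(adj):
    colors = {v: 0 for v in adj}
    while True:
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
               for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: palette[sig[v]] for v in adj}
        if len(set(new.values())) == len(set(colors.values())):
            return new                        # partition no longer refines
        colors = new

# Path on 4 nodes: endpoints and interior nodes end up in different classes.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
stable = color_refinement(path)
```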
On the algebraic degree stability of vectorial Boolean functions when restricted to affine subspaces
We study the behaviour of the algebraic degree of vectorial Boolean functions when their inputs are restricted to an affine subspace of their domain. Functions which maintain their degree on all subspaces of as high a codimension as possible are particularly interesting for cryptographic applications. For functions which are power functions $x^d$ in their univariate representation, we fully characterize the exponents $d$ for which the algebraic degree of the function stays unchanged when the input is restricted to spaces of codimension 1 or 2. For codimensions $k\ge 3$, we give a sufficient condition for the algebraic degree to stay unchanged. We apply these results to the multiplicative inverse function, as well as to the Kasami functions. We define an optimality notion regarding the stability of the degree on subspaces, and determine a number of optimal functions, including the multiplicative inverse function and the quadratic APN functions. We also give an explicit formula for counting the functions that keep their algebraic degree unchanged when restricted to hyperplanes.
Updated: 2025-07-30 16:19:31
Subjects: math.AC, cs.CR, 06E30, 94D10
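For intuition, the algebraic degree of a single-output Boolean function can be read off its algebraic normal form, obtained from the truth table by the binary Möbius transform; a small sketch:

```python
def algebraic_degree(truth_table):
    """Algebraic degree of a Boolean function given as a truth table of
    length 2^n: the binary Moebius transform yields the ANF, and the degree
    is the maximum Hamming weight of an index with a nonzero coefficient."""
    anf = list(truth_table)
    n = len(anf).bit_length() - 1
    for i in range(n):
        step = 1 << i
        for j in range(len(anf)):
            if j & step:
                anf[j] ^= anf[j ^ step]  # in-place Moebius transform over GF(2)
    return max((bin(i).count("1") for i, c in enumerate(anf) if c), default=0)
```

Restricting the input to an affine subspace substitutes affine forms for some variables, which can only keep or lower this degree; the paper characterizes when it stays unchanged.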
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
Vision large language models (VLLMs) focus primarily on handling complex and fine-grained visual information by incorporating advanced vision encoders and scaling up visual models. However, these approaches face high training and inference costs, as well as challenges in extracting visual details and effectively bridging across modalities. In this work, we propose a novel visual framework, MoCHA, to address these issues. Our framework integrates four vision backbones (i.e., CLIP, SigLIP, DINOv2 and ConvNeXt) to extract complementary visual features and is equipped with a sparse Mixture of Experts Connectors (MoECs) module to dynamically select experts tailored to different visual dimensions. To mitigate redundant or insufficient use of the visual information encoded by the MoECs module, we further design a Hierarchical Group Attention (HGA) with intra- and inter-group operations and an adaptive gating strategy for encoded visual features. We train MoCHA on two mainstream LLMs (e.g., Phi2-2.7B and Vicuna-7B) and evaluate its performance across various benchmarks. Notably, MoCHA outperforms state-of-the-art open-weight models on various tasks. For example, compared to CuMo (Mistral-7B), our MoCHA (Phi2-2.7B) shows an outstanding ability to mitigate hallucination, improving POPE by 3.25%, and to follow visual instructions, raising the MME score by 153 points. Finally, ablation studies further confirm the effectiveness and robustness of the proposed MoECs and HGA in improving the overall performance of MoCHA.
Updated: 2025-07-30 16:15:22
Subjects: cs.CV, cs.AI
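The sparse MoECs routing the abstract describes can be pictured with a toy top-k router; the shapes and the softmax-over-selected-experts gating below are illustrative assumptions, not MoCHA's actual implementation:

```python
import numpy as np

def sparse_moe_route(x, router_w, expert_ws, top_k=2):
    """Toy sparse Mixture-of-Experts connector: score all experts, keep the
    top-k, and mix only those experts' outputs with softmax gates."""
    logits = x @ router_w              # one routing score per expert
    top = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()               # softmax over the selected experts only
    # Sparse dispatch: the unselected experts are never evaluated.
    return sum(g * (x @ expert_ws[i]) for g, i in zip(gates, top))
```

The point of the sparsity is that computation scales with k, not with the total number of experts.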
Advancing Fetal Ultrasound Image Quality Assessment in Low-Resource Settings
Accurate fetal biometric measurements, such as abdominal circumference, play a vital role in prenatal care. However, obtaining high-quality ultrasound images for these measurements heavily depends on the expertise of sonographers, posing a significant challenge in low-income countries due to the scarcity of trained personnel. To address this issue, we leverage FetalCLIP, a vision-language model pretrained on a curated dataset of over 210,000 fetal ultrasound image-caption pairs, to perform automated fetal ultrasound image quality assessment (IQA) on blind-sweep ultrasound data. We introduce FetalCLIP$_{CLS}$, an IQA model adapted from FetalCLIP using Low-Rank Adaptation (LoRA), and evaluate it on the ACOUSLIC-AI dataset against six CNN and Transformer baselines. FetalCLIP$_{CLS}$ achieves the highest F1 score of 0.757. Moreover, we show that an adapted segmentation model, when repurposed for classification, further improves performance, achieving an F1 score of 0.771. Our work demonstrates how parameter-efficient fine-tuning of fetal ultrasound foundation models can enable task-specific adaptations, advancing prenatal care in resource-limited settings. The experimental code is available at: https://github.com/donglihe-hub/FetalCLIP-IQA.
Updated: 2025-07-30 16:09:29
Subjects: cs.CV, cs.AI
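Low-Rank Adaptation, used here to derive FetalCLIP$_{CLS}$, freezes a pretrained weight matrix $W$ and trains only a low-rank update $BA$; a generic sketch (not the authors' code):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass with a LoRA update: W stays frozen, and only the
    low-rank factors A (r x d_in) and B (d_out x r) are trained, so the
    effective weight is W + (alpha / r) * B @ A."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)
```

With B initialized to zero, the adapted model starts out identical to the pretrained one, which is what makes the fine-tuning parameter-efficient and stable.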
Mitigating loss of variance in ensemble data assimilation: machine learning-based and distance-free localization
We propose two new methods, based on and inspired by machine learning for tabular data and distance-free localization, to enhance covariance estimation in ensemble data assimilation. The main goal is to improve data assimilation results by mitigating the loss of variance due to sampling errors. We also analyze the suitability of several machine learning models and the balance between accuracy and computational cost of the covariance estimations. We introduce two distance-free localization techniques leveraging machine learning methods specifically tailored for tabular data. The methods are integrated into the Ensemble Smoother with Multiple Data Assimilation (ES-MDA) framework. The results show that the proposed localizations improve covariance accuracy and enhance data assimilation and uncertainty quantification results. We observe reduced variance loss for the input variables using the proposed methods. Furthermore, we compare several machine learning models, assessing their suitability for the problem in terms of computational cost and quality of the covariance estimation and data match. The influence of ensemble size is also investigated, providing insights into balancing accuracy and computational efficiency. Our findings demonstrate that certain machine learning models are more suitable for this problem. This study introduces two novel methods that mitigate variance loss for model parameters in ensemble-based data assimilation, offering practical solutions that are easy to implement and do not require any additional numerical simulation or hyperparameter tuning.
Updated: 2025-07-30 16:08:55
Subjects: cs.LG, cs.AI, math.ST, physics.comp-ph, stat.TH
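For context, a single ES-MDA analysis step updates the parameter ensemble through sample cross-covariances, and it is exactly these sample covariances that localization corrects; a bare-bones sketch of one step without localization:

```python
import numpy as np

def es_mda_update(M, D, d_obs, Cd, alpha):
    """One ES-MDA analysis step: update the parameter ensemble M (n_m x n_e)
    from predicted data D (n_d x n_e) with inflated observation noise
    alpha * Cd. The sample covariances Cmd and Cdd are where sampling error
    causes the variance loss that localization mitigates."""
    n_e = M.shape[1]
    dM = M - M.mean(axis=1, keepdims=True)
    dD = D - D.mean(axis=1, keepdims=True)
    Cmd = dM @ dD.T / (n_e - 1)                    # parameter-data covariance
    Cdd = dD @ dD.T / (n_e - 1)                    # data-data covariance
    K = Cmd @ np.linalg.inv(Cdd + alpha * Cd)      # Kalman-like gain
    noise = np.random.multivariate_normal(
        np.zeros(len(d_obs)), alpha * Cd, n_e).T   # perturbed observations
    return M + K @ (d_obs[:, None] + noise - D)
```

A distance-free localization would taper `Cmd` element-wise before forming the gain, using weights estimated from the ensemble itself rather than from physical distances.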
Neutral Residues: Revisiting Adapters for Model Extension
We address the problem of extending a pretrained large language model to a new domain that was not seen during training. Standard techniques, such as finetuning or low-rank adaptation (LoRA), are successful at domain adaptation but do not formally add capacity to the model. This often leads to a trade-off between performing well on the new domain and preserving performance on the original domain. Here, we revisit and improve adapters to extend LLMs from three angles: data, architecture and training procedure, which are advantageously considered jointly. The resulting method, called neutral residues, modifies adapters in a way that leads each new residual block to output near-zeros on the original domain. This solution leads to strong results when adapting a state-of-the-art model originally trained on English to a new language. Neutral residues significantly outperform competing approaches such as finetuning, LoRA or vanilla adapters in terms of the trade-off between learning the new language and not forgetting English.
Updated: 2025-07-30 16:07:24
Subjects: cs.CL, cs.AI, cs.LG
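The "near-zero output on the original domain" behaviour can be pictured as a gated residual adapter; the scalar sigmoid gate below is an illustrative assumption, not the paper's exact architecture:

```python
import numpy as np

def neutral_adapter_block(h, W_down, W_up, gate_w):
    """Illustrative gated residual adapter: a scalar sigmoid gate scales the
    adapter branch, so training can drive it toward zero (a 'neutral
    residue') on original-domain inputs while leaving it open on the
    new domain."""
    z = np.maximum(h @ W_down, 0.0)            # down-projection + ReLU
    g = 1.0 / (1.0 + np.exp(-(h @ gate_w)))    # scalar gate in (0, 1)
    return h + g * (z @ W_up)                  # near-identity when g is near 0
```

When the gate closes, the block reduces to the residual identity, so the extended model behaves exactly like the frozen backbone on the original domain.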
Quantifying surprise in clinical care: Detecting highly informative events in electronic health records with foundation models
We present a foundation model-derived method to identify highly informative tokens and events in electronic health records. Our approach considers incoming data in the entire context of a patient's hospitalization and so can flag anomalous events that rule-based approaches would consider within a normal range. We demonstrate that the events our model flags are significant for predicting downstream patient outcomes and that a fraction of events identified as carrying little information can safely be dropped. Additionally, we show how informativeness can help interpret the predictions of prognostic models trained on foundation model-derived representations.
Updated: 2025-07-30 16:01:18
Subjects: cs.LG
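"Surprise" here is naturally measured as surprisal, $-\log_2 p$, of each incoming token under the model's full-context prediction; a minimal sketch of the flagging idea:

```python
import math

def surprisal_bits(p):
    """Self-information, in bits, of an event the model assigns probability p."""
    return -math.log2(p)

def flag_informative(token_probs, threshold_bits=6.0):
    """Indices of tokens whose surprisal under the full-context model exceeds
    a threshold; the complementary low-surprisal tokens are the ones the
    abstract says can safely be dropped. The threshold is a placeholder."""
    return [i for i, p in enumerate(token_probs)
            if surprisal_bits(p) > threshold_bits]
```

The key difference from rule-based alerting is that `token_probs` comes from a model conditioned on the whole hospitalization, so an in-range value can still be highly surprising in context.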
Towards the Law of Capacity Gap in Distilling Language Models
Language model (LM) distillation aims at distilling the knowledge in a large teacher LM to a small student one. As a critical issue facing LM distillation, a superior student often arises from a teacher of a relatively small scale instead of a larger one, especially in the presence of a substantial capacity gap between the teacher and student. This issue, often referred to as the \textit{curse of capacity gap}, suggests that there is likely an optimal teacher yielding the best-performing student along the scaling course of the teacher. Consequently, distillation trials on teachers of a wide range of scales are called for to determine the optimal teacher, which becomes computationally intensive in the context of large LMs (LLMs). This paper addresses this critical bottleneck by providing the \textit{law of capacity gap}, derived from a preliminary study on distilling a broad range of small-scale (<3B) LMs, where the optimal teacher consistently scales linearly with the student scale across different model and data scales. By extending the law to LLM distillation on a larger scale (7B), we succeed in obtaining versatile LLMs that outperform a wide array of competitors.
Updated: 2025-07-30 16:00:53
Subjects: cs.CL, cs.AI, cs.LG
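The objective underlying such distillation trials is the token-level KL divergence from teacher to student; a minimal sketch:

```python
import math

def distill_kl(teacher_probs, student_probs):
    """Token-level KL(teacher || student), the standard distillation
    objective; how well the student can drive this toward zero is what the
    teacher-student capacity gap governs."""
    return sum(p * math.log(p / q)
               for p, q in zip(teacher_probs, student_probs) if p > 0)
```

The law of capacity gap then replaces a sweep of distillation runs over teacher scales with a single linear rule for picking the teacher given the student size.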
G-Core: A Simple, Scalable and Balanced RLHF Trainer
Reinforcement Learning from Human Feedback (RLHF) has become an increasingly popular paradigm for training large language models (LLMs) and diffusion models. While existing RLHF training systems have enabled significant progress, they often face challenges in scaling to multi-modal and diffusion workflows and adapting to dynamic workloads. In particular, current approaches may encounter limitations in controller scalability, flexible resource placement, and efficient orchestration when handling complex RLHF pipelines, especially in scenarios involving dynamic sampling or generative reward modeling. In this paper, we present \textbf{G-Core}, a simple, scalable, and balanced RLHF training framework designed to address these challenges. G-Core introduces a parallel controller programming model, enabling flexible and efficient orchestration of complex RLHF workflows without the bottlenecks of a single centralized controller. Furthermore, we propose a dynamic placement schema that adaptively partitions resources and schedules workloads, significantly reducing hardware idle time and improving utilization, even under highly variable training conditions. G-Core has successfully trained models that support WeChat product features serving a large-scale user base, demonstrating its effectiveness and robustness in real-world scenarios. Our results show that G-Core advances the state of the art in RLHF training, providing a solid foundation for future research and deployment of large-scale, human-aligned models.
Updated: 2025-07-30 15:55:08
Subjects: cs.LG, cs.AI
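The dynamic placement idea can be illustrated with a toy greedy scheduler that always assigns the next-largest task to the least-loaded worker; this is a stand-in sketch, not G-Core's actual placement schema:

```python
def place_workloads(tasks, workers):
    """Toy greedy (longest-processing-time-first) placement: repeatedly
    assign the largest remaining task to the least-loaded worker,
    approximating the balanced utilization an RLHF pipeline needs across
    rollout, reward, and training stages."""
    loads = {w: 0.0 for w in workers}
    plan = {}
    for name, cost in sorted(tasks.items(), key=lambda kv: -kv[1]):
        w = min(loads, key=loads.get)  # least-loaded worker so far
        plan[name] = w
        loads[w] += cost
    return plan, loads
```

An adaptive schema would additionally re-run this placement as workload costs drift, which is what keeps hardware idle time low under variable training conditions.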
Amorphous Solid Model of Vectorial Hopfield Neural Networks
We present a vectorial extension of the Hopfield associative memory model inspired by the theory of amorphous solids, where binary neural states are replaced by unit vectors $\mathbf{s}_i \in \mathbb{R}^3$ on the sphere $S^2$. The generalized Hebbian learning rule creates a block-structured weight matrix through outer products of stored pattern vectors, analogous to the Hessian matrix structure in amorphous solids. We demonstrate that this model exhibits quantifiable structural properties characteristic of disordered materials: energy landscapes with deep minima for stored patterns versus random configurations (energy gaps $\sim 7$ units), strongly anisotropic correlations encoded in the weight matrix (anisotropy ratios $\sim 10^2$), and order-disorder transitions controlled by the pattern density $\gamma = P/(N \cdot d)$. The enhanced memory capacity ($\gamma_c \approx 0.55$ for a fully-connected network) compared to binary networks ($\gamma_c \approx 0.138$) and the emergence of orientational correlations establish connections between associative memory mechanisms and amorphous solid physics, particularly in systems with continuous orientational degrees of freedom. We also unveil the scaling with the coordination number $Z$ of the memory capacity: $\gamma_c \sim (Z-6)$ from the isostatic point $Z_c =6$ of the 3D elastic network, which closely mirrors the scaling of the shear modulus $G \sim (Z-6)$ in 3D central-force spring networks.
Updated: 2025-07-30 15:51:54
Subjects: cond-mat.dis-nn, cond-mat.soft, cond-mat.stat-mech, cs.LG, cs.NE
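The generalized Hebbian rule and the energy it induces can be sketched directly from the abstract's definitions (unit vectors $\mathbf{s}_i \in \mathbb{R}^3$, block weights from outer products of stored patterns); normalization conventions here are illustrative:

```python
import numpy as np

def hebbian_weights(patterns):
    """Block-structured weights from outer products of stored patterns
    (shape P x N x 3, unit vectors on S^2), with self-coupling removed,
    analogous to the Hessian of an amorphous solid."""
    P, N, d = patterns.shape
    W = np.einsum('pia,pjb->iajb', patterns, patterns) / N
    for i in range(N):
        W[i, :, i, :] = 0.0  # zero the diagonal blocks
    return W

def energy(W, s):
    """Hopfield energy of a configuration s (N x 3 unit vectors)."""
    return -0.5 * np.einsum('ia,iajb,jb->', s, W, s)
```

Stored patterns sit in deep minima of this landscape, which is the "energy gap" property the abstract quantifies against random configurations.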
DO-EM: Density Operator Expectation Maximization
Density operators, quantum generalizations of probability distributions, are gaining prominence in machine learning due to their foundational role in quantum computing. Generative modeling based on density operator models (\textbf{DOMs}) is an emerging field, but existing training algorithms -- such as those for the Quantum Boltzmann Machine -- do not scale to real-world data, such as the MNIST dataset. The Expectation-Maximization algorithm has played a fundamental role in enabling scalable training of probabilistic latent variable models on real-world datasets. \textit{In this paper, we develop an Expectation-Maximization framework to learn latent variable models defined through \textbf{DOMs} on classical hardware, with resources comparable to those used for probabilistic models, while scaling to real-world data.} However, designing such an algorithm is nontrivial due to the absence of a well-defined quantum analogue to conditional probability, which complicates the Expectation step. To overcome this, we reformulate the Expectation step as a quantum information projection (QIP) problem and show that the Petz Recovery Map provides a solution under sufficient conditions. Using this formulation, we introduce the Density Operator Expectation Maximization (DO-EM) algorithm -- an iterative Minorant-Maximization procedure that optimizes a quantum evidence lower bound. We show that the \textbf{DO-EM} algorithm ensures non-decreasing log-likelihood across iterations for a broad class of models. Finally, we present Quantum Interleaved Deep Boltzmann Machines (\textbf{QiDBMs}), a \textbf{DOM} that can be trained with the same resources as a DBM. When trained with \textbf{DO-EM} under Contrastive Divergence, a \textbf{QiDBM} outperforms larger classical DBMs in image generation on the MNIST dataset, achieving a 40--60\% reduction in the Fr\'echet Inception Distance.
Updated: 2025-07-30 15:51:20
Subjects: cs.LG,quant-ph
GATEAU: Selecting Influential Samples for Long Context Alignment
Aligning large language models to handle instructions with extremely long contexts has yet to be fully investigated. Previous studies have attempted to scale up the available data volume by synthesizing long instruction-following samples, as constructing such a dataset tends to be challenging for annotators. However, a lack of a well-defined strategy for ensuring data quality may introduce low-quality samples and restrict the model's performance. Thus, we propose GATEAU, a novel framework to address the unique challenge of long context alignment by identifying the influential samples enriched with long-range dependency relations. Specifically, GATEAU measures the long-range dependencies from two essential aspects: the difficulty of generating target responses due to the long-range dependencies, and the difficulty of understanding long inputs due to such dependencies. Comprehensive experiments indicate that GATEAU effectively identifies influential samples, and the model trained on these selected samples exhibits better instruction-following and long-context understanding capabilities.
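GATEAU's two difficulty measures are computed with a trained LLM and are not reproduced here. Purely to illustrate the selection idea, the toy sketch below scores each sample by a single hypothetical proxy, the per-sample loss drop when the long context is visible, and keeps the top-k; both the proxy and the function names are assumptions, not the paper's actual measures.

```python
import numpy as np

def long_dependency_scores(loss_without_ctx, loss_with_ctx):
    """Hypothetical stand-in for GATEAU-style influence scoring:
    the larger the loss reduction when the long context is provided,
    the stronger the sample's long-range dependency is assumed to be."""
    return np.asarray(loss_without_ctx) - np.asarray(loss_with_ctx)

def select_influential(losses_no_ctx, losses_ctx, k):
    """Keep the indices of the k samples with the largest
    context-dependent loss drop (descending by score)."""
    scores = long_dependency_scores(losses_no_ctx, losses_ctx)
    return np.argsort(scores)[::-1][:k]
```

For example, a sample whose loss falls from 5.0 to 2.0 once the long input is visible would outrank one whose loss barely changes.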
Updated: 2025-07-30 15:50:58
Subjects: cs.CL,cs.AI
Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies
This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme incorporating multi-headed attention mechanisms in both the actor and critic. This design facilitates dynamic, inter-agent communication, allowing agents to explicitly query teammates, thereby efficiently managing the exponential growth of joint-action spaces while ensuring a high degree of collaboration. We further introduce a penalized loss function which promotes diverse yet complementary roles among agents. We evaluate TAAC in a simulated soccer environment against benchmark algorithms representing other multi-agent paradigms, including Proximal Policy Optimization and Multi-Agent Actor-Attention-Critic. We find that TAAC exhibits superior performance and enhanced collaborative behaviors across a variety of metrics (win rates, goal differentials, Elo ratings, inter-agent connectivity, balanced spatial distributions, and frequent tactical interactions such as ball possession swaps).
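The abstract's key ingredient, agents explicitly querying teammates, can be illustrated with scaled dot-product attention over per-agent features. The sketch below is single-head for brevity (TAAC uses multi-headed attention in both actor and critic), and all shapes and names are illustrative assumptions rather than the paper's architecture.

```python
import numpy as np

def agent_attention(obs, wq, wk, wv):
    """Single-head inter-agent attention (simplified stand-in for the
    multi-headed mechanism TAAC embeds in its actor and critic).

    obs: (n_agents, d) per-agent features. Each agent forms a query,
    attends over every teammate's key, and receives a convex
    combination of teammates' value vectors as a 'message'."""
    q, k, v = obs @ wq, obs @ wk, obs @ wv
    logits = q @ k.T / np.sqrt(k.shape[-1])        # (n_agents, n_agents)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # rows sum to 1
    return weights @ v, weights
```

The attention weights expose which teammates each agent is listening to, which is one way inter-agent connectivity metrics like those in the paper's evaluation could be derived.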
Updated: 2025-07-30 15:48:38
Subjects: cs.AI,cs.LG,I.2.0; I.2.8
Effective Non-Random Extreme Learning Machine
The Extreme Learning Machine (ELM) is a growing statistical technique widely applied to regression problems. In essence, ELMs are single-layer neural networks where the hidden layer weights are randomly sampled from a specific distribution, while the output layer weights are learned from the data. Two of the key challenges with this approach are the architecture design, specifically determining the optimal number of neurons in the hidden layer, and the method's sensitivity to the random initialization of hidden layer weights. This paper introduces a new and enhanced learning algorithm for regression tasks, the Effective Non-Random ELM (ENR-ELM), which simplifies the architecture design and eliminates the need for random hidden layer weight selection. The proposed method incorporates concepts from signal processing, such as basis functions and projections, into the ELM framework. We introduce two versions of the ENR-ELM: the approximated ENR-ELM and the incremental ENR-ELM. Experimental results on both synthetic and real datasets demonstrate that our method overcomes the problems of traditional ELM while maintaining comparable predictive performance.
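The contrast between a standard random-weight ELM and a deterministic-basis variant can be sketched as follows. The cosine basis below is an illustrative stand-in for the paper's signal-processing basis/projection construction, not the actual approximated or incremental ENR-ELM; in both versions only the output weights are solved by least squares, which is the defining ELM trait.

```python
import numpy as np

def elm_fit(x, y, hidden, rng=None):
    """Standard ELM: randomly sampled hidden weights, least-squares
    output layer. Predictions depend on the random draw."""
    rng = rng if rng is not None else np.random.default_rng(0)
    w = rng.normal(size=(x.shape[1], hidden))
    b = rng.normal(size=hidden)
    h = np.tanh(x @ w + b)
    beta, *_ = np.linalg.lstsq(h, y, rcond=None)
    return lambda xn: np.tanh(xn @ w + b) @ beta

def deterministic_elm_fit(x, y, hidden):
    """Non-random variant in the spirit of ENR-ELM: the hidden layer is
    a fixed, data-independent basis (cosine features here, an assumed
    stand-in for the paper's basis/projection construction)."""
    freqs = np.arange(1, hidden + 1)
    phi = lambda xn: np.cos((xn @ np.ones((xn.shape[1], 1))) * freqs)
    beta, *_ = np.linalg.lstsq(phi(x), y, rcond=None)
    return lambda xn: phi(xn) @ beta
```

Because the deterministic basis is fixed, two training runs give identical models, removing the sensitivity to hidden-weight initialization that the abstract identifies as a key ELM weakness.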
Updated: 2025-07-30 15:42:47
Subjects: stat.ML,cs.LG
Label-free estimation of clinically relevant performance metrics under distribution shifts
Performance monitoring is essential for safe clinical deployment of image classification models. However, because ground-truth labels are typically unavailable in the target dataset, direct assessment of real-world model performance is infeasible. State-of-the-art performance estimation methods address this by leveraging confidence scores to estimate the target accuracy. Despite being a promising direction, the established methods mainly estimate the model's accuracy and are rarely evaluated in a clinical domain, where strong class imbalances and dataset shifts are common. Our contributions are twofold: First, we introduce generalisations of existing performance prediction methods that directly estimate the full confusion matrix. Then, we benchmark their performance on chest x-ray data in real-world distribution shifts as well as simulated covariate and prevalence shifts. The proposed confusion matrix estimation methods reliably predicted clinically relevant counting metrics on medical images under distribution shifts. However, our simulated shift scenarios exposed important failure modes of current performance estimation techniques, calling for a better understanding of real-world deployment contexts when implementing these performance monitoring techniques for postmarket surveillance of medical AI models.
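One simple label-free estimator of the full confusion matrix accumulates each sample's predicted class probabilities into the column of its predicted class, so that row i estimates the count of samples whose unknown true class is i. This is a minimal sketch in the spirit of confidence-based estimation; the paper benchmarks generalizations of several established methods, not necessarily this exact estimator.

```python
import numpy as np

def expected_confusion_matrix(probs):
    """Label-free 'expected' confusion matrix: entry [i, j] estimates
    the count of samples with true class i and predicted class j, using
    the model's own probabilities in place of unavailable labels.

    probs: (n, k) predicted class probabilities."""
    probs = np.asarray(probs)
    n, k = probs.shape
    pred = probs.argmax(axis=1)
    cm = np.zeros((k, k))
    for j in range(k):
        cm[:, j] = probs[pred == j].sum(axis=0)
    return cm

def counting_metrics(cm):
    """Clinically relevant counts for class 1 ('positive') derived
    from an estimated 2x2 confusion matrix."""
    tn, fp = cm[0, 0], cm[0, 1]
    fn, tp = cm[1, 0], cm[1, 1]
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}
```

Under strong class imbalance, such count-level estimates carry more clinical information than a single accuracy number, which motivates the paper's move from accuracy estimation to the full matrix.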
Updated: 2025-07-30 15:37:58
Subjects: cs.LG
ASP-FZN: A Translation-based Constraint Answer Set Solver
We present the solver asp-fzn for Constraint Answer Set Programming (CASP), which extends ASP with linear constraints. Our approach is based on translating CASP programs into the solver-independent FlatZinc language that supports several Constraint Programming and Integer Programming backend solvers. Our solver supports a rich language of linear constraints, including some common global constraints. As for evaluation, we show that asp-fzn is competitive with state-of-the-art ASP solvers on benchmarks taken from past ASP competitions. Furthermore, we evaluate it on several CASP problems from the literature and compare its performance with clingcon, which is a prominent CASP solver that supports most of the asp-fzn language. The performance of asp-fzn is very promising as it is already competitive on plain ASP and even outperforms clingcon on some CASP benchmarks.
Updated: 2025-07-30 15:36:40
Subjects: cs.AI
Empirical Evaluation of Concept Drift in ML-Based Android Malware Detection
Despite outstanding results, machine learning-based Android malware detection models struggle with concept drift, where rapidly evolving malware characteristics degrade model effectiveness. This study examines the impact of concept drift on Android malware detection, evaluating two datasets and nine machine learning and deep learning algorithms, as well as Large Language Models (LLMs). Various feature types--static, dynamic, hybrid, semantic, and image-based--were considered. The results showed that concept drift is widespread and significantly affects model performance. Factors influencing the drift include feature types, data environments, and detection methods. Balancing algorithms helped with class imbalance but did not fully address concept drift, which primarily stems from the dynamic nature of the malware landscape. No strong link was found between the type of algorithm used and concept drift, the impact was relatively minor compared to other variables since hyperparameters were not fine-tuned, and the default algorithm configurations were used. While LLMs using few-shot learning demonstrated promising detection performance, they did not fully mitigate concept drift, highlighting the need for further investigation.
Updated: 2025-07-30 15:35:51
Subjects: cs.CR,cs.AI,cs.LG
The Effect of Stochasticity in Score-Based Diffusion Sampling: a KL Divergence Analysis
Sampling in score-based diffusion models can be performed by solving either a reverse-time stochastic differential equation (SDE) parameterized by an arbitrary time-dependent stochasticity parameter or a probability flow ODE, corresponding to the stochasticity parameter set to zero. In this work, we study the effect of this stochasticity on the generation process through bounds on the Kullback-Leibler (KL) divergence, complementing the analysis with numerical and analytical examples. Our main results apply to linear forward SDEs with additive noise and Lipschitz-continuous score functions, and quantify how errors from the prior distribution and score approximation propagate under different choices of the stochasticity parameter. The theoretical bounds are derived using log-Sobolev inequalities for the marginals of the forward process, which enable a more effective control of the KL divergence decay along sampling. For exact score functions, we find that stochasticity acts as an error-correcting mechanism, decreasing KL divergence along the sampling trajectory. For an approximate score function, there is a trade-off between error correction and score error amplification, so that stochasticity can either improve or worsen the performance, depending on the structure of the score error. Numerical experiments on simple datasets and a fully analytical example are included to illustrate and enlighten the theoretical results.
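The family of samplers the abstract describes can be made concrete in a toy setting where the score is exact: a VP-type forward SDE applied to Gaussian data, so every marginal is Gaussian and the score is available in closed form. The sketch below integrates the marginal-preserving reverse dynamics dx = [f - (1 + λ²)/2 · g² · score] dt + λ·g·dW̄, where λ = 0 recovers the probability flow ODE and λ = 1 the standard reverse SDE. This is an illustrative setup under stated assumptions, not the paper's general analysis.

```python
import numpy as np

def reverse_sample(n, lam, m0=2.0, s0=0.5, beta=4.0, steps=400, seed=0):
    """Toy reverse-time sampler for the forward OU/VP-type SDE
    dx = -0.5*beta*x dt + sqrt(beta) dW with data ~ N(m0, s0^2),
    so the exact score of every marginal is known in closed form.

    lam interpolates between the probability flow ODE (lam=0) and the
    full reverse SDE (lam=1); all members share the same marginals."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                   # prior ~ N(0, 1)
    ts = np.linspace(1.0, 1e-3, steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = t1 - t0                         # negative: backward in time
        a = np.exp(-0.5 * beta * t0)         # forward mean decay
        mean_t = m0 * a
        var_t = s0**2 * a**2 + (1.0 - a**2)
        score = -(x - mean_t) / var_t        # exact Gaussian score
        f, g2 = -0.5 * beta * x, beta
        x = x + (f - 0.5 * (1 + lam**2) * g2 * score) * dt
        if lam > 0:
            x = x + lam * np.sqrt(g2 * abs(dt)) * rng.normal(size=n)
    return x
```

With an exact score, both endpoints recover the data distribution; the paper's point is that under prior or score errors the choice of λ matters, with stochasticity acting as an error-correcting mechanism for the former.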
Updated: 2025-07-30 15:34:07
Subjects: cs.LG
Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization
Distilling large neural networks into simple, human-readable symbolic formulas is a promising path toward trustworthy and interpretable AI. However, this process is often brittle, as the complex functions learned by standard networks are poor targets for symbolic discovery, resulting in low-fidelity student models. In this work, we propose a novel training paradigm to address this challenge. Instead of passively distilling a pre-trained network, we introduce a \textbf{Jacobian-based regularizer} that actively encourages the ``teacher'' network to learn functions that are not only accurate but also inherently smoother and more amenable to distillation. We demonstrate through extensive experiments on a suite of real-world regression benchmarks that our method is highly effective. By optimizing the regularization strength for each problem, we improve the $R^2$ score of the final distilled symbolic model by an average of \textbf{120\% (relative)} compared to the standard distillation pipeline, all while maintaining the teacher's predictive accuracy. Our work presents a practical and principled method for significantly improving the fidelity of interpretable models extracted from complex neural networks.
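The regularizer itself is easy to sketch: a penalty on the squared norm of the network's input Jacobian, estimated here by central differences. In the paper it is added to the teacher's training loss and minimized jointly with the data loss via autodiff, which is omitted; the toy functions below only show that smoother functions incur a lower penalty.

```python
import numpy as np

def jacobian_penalty(f, x, eps=1e-4):
    """Mean squared Jacobian-norm penalty via central differences
    (a minimal sketch of the quantity added to the teacher's loss;
    a real implementation would use autodiff, not finite differences).

    f: maps (n, d) -> (n,);  x: (n, d) evaluation points."""
    n, d = x.shape
    total = 0.0
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        total += np.sum(((f(x + e) - f(x - e)) / (2 * eps)) ** 2)
    return total / n

# Two toy 'teachers' with the same output range but different smoothness:
smooth = lambda x: x[:, 0]                  # gradient 1 everywhere
wiggly = lambda x: np.sin(20 * x[:, 0])    # rapid oscillation
```

A teacher trained with this penalty is pushed toward the `smooth` regime, which is precisely what makes it a better target for symbolic regression in the distillation step.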
Updated: 2025-07-30 15:32:18
Subjects: cs.LG,cs.AI
Bayesian Optimization of Process Parameters of a Sensor-Based Sorting System using Gaussian Processes as Surrogate Models
Sensor-based sorting systems enable the physical separation of a material stream into two fractions. The sorting decision is based on the image data evaluation of the sensors used and is carried out using actuators. Various process parameters must be set depending on the properties of the material stream, the dimensioning of the system, and the required sorting accuracy. However, continuous verification and re-adjustment are necessary due to changing requirements and material stream compositions. In this paper, we introduce an approach for optimizing, recurrently monitoring and adjusting the process parameters of a sensor-based sorting system. Based on Bayesian Optimization, Gaussian process regression models are used as surrogate models to achieve specific requirements for system behavior with the uncertainties contained therein. This method minimizes the number of necessary experiments while simultaneously considering two possible optimization targets based on the requirements for both material output streams. In addition, uncertainties are considered during determining sorting accuracies in the model calculation. We evaluated the method with three example process parameters.
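The core loop the abstract describes, a Gaussian process surrogate proposing the next experiment, can be sketched for a single process parameter on a 1-D grid. The RBF kernel, its length-scale, and the expected-improvement acquisition below are conventional choices assumed for illustration; the paper's two-target optimization and its treatment of uncertainty in the sorting accuracies are not reproduced.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.15):
    """Squared-exponential kernel on 1-D inputs (length-scale assumed)."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-6):
    """GP surrogate: posterior mean and std on a grid of candidate
    process-parameter values."""
    k = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    ks = rbf(x_grid, x_obs)
    mu = ks @ np.linalg.solve(k, y_obs)
    cov = rbf(x_grid, x_grid) - ks @ np.linalg.solve(k, ks.T)
    return mu, np.sqrt(np.clip(np.diag(cov), 1e-12, None))

def expected_improvement(mu, sigma, best):
    """EI acquisition for maximizing e.g. a sorting-accuracy objective."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1 + np.vectorize(erf)(z / sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (mu - best) * cdf + sigma * pdf

def propose_next(x_obs, y_obs, x_grid):
    """Pick the candidate parameter value with the highest EI."""
    mu, sd = gp_posterior(np.asarray(x_obs), np.asarray(y_obs), x_grid)
    return x_grid[np.argmax(expected_improvement(mu, sd, max(y_obs)))]
```

Each proposed setting would be evaluated on the sorting system and appended to the observations, so the surrogate needs far fewer physical experiments than a grid search, which is the efficiency argument the abstract makes.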
Updated: 2025-07-30 15:31:39
Categories: cs.LG,cs.AI,cs.SY,eess.SY
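A minimal sketch of the surrogate-based loop: fit a Gaussian process to the experiments run so far, then pick the next parameter setting by an acquisition rule. An RBF kernel and a lower-confidence-bound acquisition are assumed here for illustration; the paper's actual kernel, acquisition, and two-target handling are not specified in the abstract:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # squared-exponential kernel between two sets of points
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6, ls=1.0):
    # standard GP regression posterior mean and variance at test points Xs
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    Ks = rbf(X, Xs, ls)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(Xs, Xs, ls)) - (v ** 2).sum(0)
    return mu, np.maximum(var, 0.0)  # clip tiny negative variances

def next_setting(X, y, candidates, kappa=2.0):
    # lower-confidence-bound acquisition: minimize mu - kappa * sigma,
    # trading off predicted loss against surrogate uncertainty
    mu, var = gp_posterior(X, y, candidates)
    return candidates[np.argmin(mu - kappa * np.sqrt(var))]
```

Each call to `next_setting` proposes one new experiment, which is how the number of necessary experiments stays small.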
Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite Precision
As neural networks (NNs) become increasingly prevalent in safety-critical neural network-controlled cyber-physical systems (NNCSs), formally guaranteeing their safety becomes crucial. For these systems, safety must be ensured throughout their entire operation, necessitating infinite-time horizon verification. To verify the infinite-time horizon safety of NNCSs, recent approaches leverage Differential Dynamic Logic (dL). However, these dL-based guarantees rely on idealized, real-valued NN semantics and fail to account for roundoff errors introduced by finite-precision implementations. This paper bridges the gap between theoretical guarantees and real-world implementations by incorporating robustness under finite-precision perturbations -- in sensing, actuation, and computation -- into the safety verification. We model the problem as a hybrid game between a good Demon, responsible for control actions, and a bad Angel, introducing perturbations. This formulation enables formal proofs of robustness w.r.t. a given (bounded) perturbation. Leveraging this bound, we employ state-of-the-art mixed-precision fixed-point tuners to synthesize sound and efficient implementations, thus providing a complete end-to-end solution. We evaluate our approach on case studies from the automotive and aeronautics domains, producing efficient NN implementations with rigorous infinite-time horizon safety guarantees.
Updated: 2025-07-30 15:21:22
Categories: eess.SY,cs.AI,cs.LG,cs.LO,cs.SY
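The finite-precision perturbations at issue can be pictured with a toy round-to-nearest fixed-point model. This is only a sketch of the roundoff bound that the safety argument must absorb, not the paper's dL-based game machinery; all function names are illustrative:

```python
import numpy as np

def to_fixed(x, frac_bits):
    # round-to-nearest fixed-point with resolution 2^-frac_bits
    scale = 2.0 ** frac_bits
    return np.round(np.asarray(x) * scale) / scale

def roundoff_bound(frac_bits):
    # worst-case per-value error of round-to-nearest quantization
    return 2.0 ** (-frac_bits - 1)

def linear_output_bound(frac_bits, x_l1):
    # |(W_q - W) @ x| <= (max elementwise weight error) * ||x||_1,
    # a simple bound on how much a quantized linear layer can deviate
    return roundoff_bound(frac_bits) * x_l1
```

The bad Angel of the hybrid game may inject any perturbation up to such a bound; the controller must remain safe regardless.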
MASCA: LLM based-Multi Agents System for Credit Assessment
Recent advancements in financial problem-solving have leveraged LLMs and agent-based systems, with a primary focus on trading and financial modeling. However, credit assessment remains an underexplored challenge, traditionally dependent on rule-based methods and statistical models. In this paper, we introduce MASCA, an LLM-driven multi-agent system designed to enhance credit evaluation by mirroring real-world decision-making processes. The framework employs a layered architecture where specialized LLM-based agents collaboratively tackle sub-tasks. Additionally, we integrate contrastive learning for risk and reward assessment to optimize decision-making. We further present a signaling game theory perspective on hierarchical multi-agent systems, offering theoretical insights into their structure and interactions. Our paper also includes a detailed bias analysis in credit assessment, addressing fairness concerns. Experimental results demonstrate that MASCA outperforms baseline approaches, highlighting the effectiveness of hierarchical LLM-based multi-agent systems in financial applications, particularly in credit scoring.
Updated: 2025-07-30 15:19:38
Categories: cs.CL,cs.CE,cs.LG
MASCA: LLM based-Multi Agents System for Credit Assessment
Recent advancements in financial problem-solving have leveraged LLMs and agent-based systems, with a primary focus on trading and financial modeling. However, credit assessment remains an underexplored challenge, traditionally dependent on rule-based methods and statistical models. In this paper, we introduce MASCA, an LLM-driven multi-agent system designed to enhance credit evaluation by mirroring real-world decision-making processes. The framework employs a layered architecture where specialized LLM-based agents collaboratively tackle sub-tasks. Additionally, we integrate contrastive learning for risk and reward assessment to optimize decision-making. We further present a signaling game theory perspective on hierarchical multi-agent systems, offering theoretical insights into their structure and interactions. Our paper also includes a detailed bias analysis in credit assessment, addressing fairness concerns. Experimental results demonstrate that MASCA outperforms baseline approaches, highlighting the effectiveness of hierarchical LLM-based multi-agent systems in financial applications, particularly in credit scoring.
Updated: 2025-07-30 15:19:38
标题: MASCA: LLM基于的多智能体信用评估系统
摘要: 近年来,金融问题解决方面的最新进展已经利用了LLM和基于代理的系统,主要关注交易和金融建模。然而,信用评估仍然是一个未被充分探讨的挑战,传统上依赖于基于规则的方法和统计模型。在本文中,我们介绍了MASCA,这是一个由LLM驱动的多代理系统,旨在通过模拟真实世界的决策过程来增强信用评估。该框架采用分层架构,专门的基于LLM的代理共同处理子任务。此外,我们还整合了对比学习用于风险和奖励评估以优化决策。我们进一步从信号博弈理论的角度提出了关于分层多代理系统的结构和互动的理论见解。我们的论文还包括对信用评估中的偏见进行了详细分析,解决了公平性问题。实验结果表明,MASCA优于基线方法,突显了分层LLM基础的多代理系统在金融应用中特别是在信用评分方面的有效性。
更新时间: 2025-07-30 15:19:38
领域: cs.CL,cs.CE,cs.LG
Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
Recent advances in sparse voxel representations have significantly improved the quality of 3D content generation, enabling high-resolution modeling with fine-grained geometry. However, existing frameworks suffer from severe computational inefficiencies due to the quadratic complexity of attention mechanisms in their two-stage diffusion pipelines. In this work, we propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality. Our method leverages the compact VecSet representation to efficiently generate a coarse object layout in the first stage, reducing token count and accelerating voxel coordinate prediction. To refine per-voxel latent features in the second stage, we introduce Part Attention, a geometry-aware localized attention mechanism that restricts attention computation within semantically consistent part regions. This design preserves structural continuity while avoiding unnecessary global attention, achieving up to 6.7x speed-up in latent generation. To support this mechanism, we construct a scalable part annotation pipeline that converts raw meshes into part-labeled sparse voxels. Extensive experiments demonstrate that Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.
Updated: 2025-07-30 15:17:22
Categories: cs.CV,cs.AI
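A minimal single-head sketch of the localized attention idea, assuming each token carries a part label; the real method operates on sparse voxel latents with a learned part annotation pipeline, which this toy version does not model:

```python
import numpy as np

def part_attention(Q, K, V, part_ids):
    """Attention where token i attends only to tokens in the same part region.

    Q, K : (n, d) query/key matrices; V : (n, dv) values
    part_ids : (n,) integer part label per token
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # mask out cross-part pairs: this is what avoids global quadratic attention
    mask = part_ids[:, None] == part_ids[None, :]
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ V
```

Because every token attends to at least itself, each softmax row is well defined even for singleton parts.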
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Vector Quantization (VQ) is essential for discretizing continuous representations in unsupervised learning but suffers from representation collapse, causing low codebook utilization and limiting scalability. Existing solutions often rely on complex optimizations or reduce latent dimensionality, which compromises model capacity and fails to fully solve the problem. We identify the root cause as disjoint codebook optimization, where only a few code vectors are updated via gradient descent. To fix this, we propose \textbf{Sim}ple\textbf{VQ}, which reparameterizes code vectors through a learnable linear transformation layer over a latent basis, optimizing the \textit{entire linear space} rather than nearest \textit{individual code vectors}. Although the multiplication of two linear matrices is equivalent to applying a single linear layer, this simple approach effectively prevents collapse. Extensive experiments on image and audio tasks demonstrate that SimVQ improves codebook usage, is easy to implement, and generalizes well across modalities and architectures.
Updated: 2025-07-30 15:05:10
Categories: cs.LG,cs.CV,cs.SD,eess.AS
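The reparameterization can be sketched as follows, assuming a frozen latent basis and a learnable matrix `W` (names are illustrative). The key point is that a gradient on any selected code vector flows through `W` and therefore moves the entire linear space, rather than only the nearest individual codes:

```python
import numpy as np

def simvq_codebook(basis, W):
    # Code vectors are a learnable linear map of a frozen latent basis.
    # In training, only W receives gradients, so every code updates jointly.
    return basis @ W  # (K, r) @ (r, D) -> (K, D)

def vq_assign(z, codebook):
    # standard nearest-neighbour vector quantization of latents z: (n, D)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx
```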
Reducing Hallucinations in Summarization via Reinforcement Learning with Entity Hallucination Index
Reducing hallucinations in abstractive summarization remains a critical challenge for deploying language models (LMs) in real-world settings. In this work, we introduce a reward-driven fine-tuning framework that explicitly optimizes for Entity Hallucination Index (EHI), a metric designed to quantify the presence, correctness, and grounding of named entities in generated summaries. Given a corpus of meeting transcripts, we first generate baseline summaries using a pre-trained LM and compute EHI scores via automatic entity extraction and matching. We then apply reinforcement learning to fine-tune the model parameters, using EHI as a reward signal to bias generation toward entity-faithful outputs. Our approach does not rely on human-written factuality annotations, enabling scalable fine-tuning. Experiments demonstrate consistent improvements in EHI across datasets, with qualitative analysis revealing a significant reduction in entity-level hallucinations without degradation in fluency or informativeness. We release a reproducible Colab pipeline, facilitating further research on hallucination-aware model fine-tuning using lightweight hallucination metrics like EHI.
Updated: 2025-07-30 15:00:00
Categories: cs.CL,cs.AI,68T50,I.2.7
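The abstract does not give EHI's exact formula, so the following is a simplified, hypothetical grounding score in its spirit, with entity extraction assumed to have happened upstream:

```python
def entity_grounding_score(summary_entities, source_entities):
    """Simplified EHI-style score: the fraction of named entities in the
    summary that also appear in the source (1.0 means fully grounded)."""
    if not summary_entities:
        return 1.0  # nothing asserted, nothing hallucinated
    source = {e.lower() for e in source_entities}
    grounded = sum(e.lower() in source for e in summary_entities)
    return grounded / len(summary_entities)

def reward(summary_entities, source_entities):
    # in the paper's setup, the metric is used directly as the RL reward,
    # biasing generation toward entity-faithful outputs
    return entity_grounding_score(summary_entities, source_entities)
```

Because the score needs only automatic extraction and matching, no human factuality annotations are required.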
Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning
Synthetic verification techniques such as generating test cases and reward modelling are common ways to enhance the coding capabilities of large language models (LLM) beyond predefined tests. Additionally, code verification has recently found great success as a critical component in improving reasoning capability of LLMs via reinforcement learning. In this paper, we propose an approach which can transform existing coding benchmarks into scoring and ranking datasets to evaluate the effectiveness of synthetic verifiers. We also propose multiple metrics to measure different aspects of the synthetic verifiers with the proposed benchmarks. By employing the proposed approach, we release four new benchmarks (HE-R, HE-R+, MBPP-R, and MBPP-R+), and analyze synthetic verification methods with standard, reasoning-based, and reward-based LLMs. Our experiments show that reasoning can significantly improve test case generation and that scaling the number of test cases enhances the verification accuracy.
Updated: 2025-07-30 14:58:42
Categories: cs.AI,cs.CL,cs.LG,cs.SE
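One way to picture scoring with synthetic verifiers, as a hedged sketch: score each candidate solution by the fraction of generated test cases it passes, then rank candidates by that score. Function names are illustrative and not taken from the benchmarks:

```python
def verifier_score(candidate_fn, test_cases):
    """Fraction of synthetic (args, expected) test cases the candidate passes."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails that case
    return passed / len(test_cases)

def rank_candidates(candidates, test_cases):
    # higher score first; a good verifier ranks the correct solution on top,
    # which is what scoring/ranking benchmarks can measure
    return sorted(candidates, key=lambda f: -verifier_score(f, test_cases))
```

Scaling up `test_cases` sharpens the score, consistent with the finding that more test cases enhance verification accuracy.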
RocketStack: Level-aware deep recursive ensemble learning framework with adaptive feature fusion and model pruning dynamics
Ensemble learning remains a cornerstone of machine learning, with stacking used to integrate predictions from multiple base learners through a meta-model. However, deep stacking remains rare, as most designs prioritize horizontal diversity over recursive depth due to model complexity, feature redundancy, and computational burden. To address these challenges, RocketStack, a level-aware recursive ensemble framework, is introduced and explored up to ten stacking levels, extending beyond prior architectures. The framework incrementally prunes weaker learners at each level, enabling deeper stacking without excessive complexity. To mitigate early performance saturation, mild Gaussian noise is added to out-of-fold (OOF) scores before pruning, and compared against strict OOF pruning. Further, both per-level and periodic feature compressions are explored using attention-based selection, the Simple, Fast, Efficient (SFE) filter, and autoencoders. Across 33 datasets (23 binary, 10 multi-class), linear-trend tests confirmed rising accuracy with depth in most variants, and the top performing meta-model at each level increasingly outperformed the strongest standalone ensemble. In the binary subset, periodic SFE with mild OOF-score randomization reached 97.08% at level 10, 5.14% above the strict-pruning configuration, and cut runtime by 10.5% relative to no compression. In the multi-class subset, periodic attention selection reached 98.60% at level 10, exceeding the strongest baseline by 6.11%, while reducing runtime by 56.1% and feature dimensionality by 74% compared to no compression. These findings highlight mild randomization as an effective regularizer and periodic compression as a stabilizer. Echoing the design of multistage rockets in aerospace (prune, compress, propel), RocketStack achieves deep recursive ensembling with tractable complexity.
Updated: 2025-07-30 14:53:10
Categories: cs.LG,stat.ML
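The per-level pruning step, with the optional mild OOF-score randomization, can be sketched as follows. This is a simplified stand-in for the full recursive pipeline; parameter names are illustrative:

```python
import numpy as np

def prune_learners(oof_scores, keep_frac=0.5, noise_sd=0.0, rng=None):
    """Indices of base learners kept at one stacking level, ranked by OOF score.

    noise_sd > 0 adds the mild Gaussian perturbation used to avoid early
    performance saturation; noise_sd = 0 corresponds to strict OOF pruning.
    """
    rng = rng or np.random.default_rng(0)
    scores = np.asarray(oof_scores, dtype=float)
    if noise_sd > 0:
        scores = scores + rng.normal(0.0, noise_sd, scores.shape)
    k = max(1, int(round(len(scores) * keep_frac)))
    return np.argsort(scores)[::-1][:k]  # best-first indices of survivors
```

At each level the survivors' predictions become meta-features for the next level, so depth grows while width shrinks, like stages of a rocket being dropped.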
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
In this paper, we challenge the conventional practice in Open-Vocabulary Semantic Segmentation (OVSS) of using averaged class-wise text embeddings, which are typically obtained by encoding each class name with multiple templates (e.g., a photo of <class>, a sketch of a <class>). We investigate the impact of templates for OVSS, and find that for each class, there exist single-template classifiers--which we refer to as class-experts--that significantly outperform the conventional averaged classifier. First, to identify these class-experts, we introduce a novel approach that estimates them without any labeled data or training. By leveraging the class-wise prediction entropy of single-template classifiers, we select those yielding the lowest entropy as the most reliable class-experts. Second, we combine the outputs of class-experts in a new fusion process. Our plug-and-play method, coined FLOSS, is orthogonal and complementary to existing OVSS methods, offering an improvement without the need for additional labels or training. Extensive experiments show that FLOSS consistently enhances state-of-the-art OVSS models, generalizes well across datasets with different distribution shifts, and delivers substantial improvements in low-data scenarios where only a few unlabeled images are available. Our code is available at https://github.com/yasserben/FLOSS .
Updated: 2025-07-30 14:39:53
Categories: cs.CV,cs.LG
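The entropy-based selection of class-experts can be sketched as follows, assuming per-template class-probability maps are already available; the tensor shape is illustrative, and the actual fusion of expert outputs is not shown:

```python
import numpy as np

def select_class_expert(probs):
    """probs: (T, N, C) class probabilities from T single-template classifiers
    over N pixels. Return the index of the template whose predictions have the
    lowest mean entropy -- the most confident, hence most reliable, expert."""
    p = np.clip(probs, 1e-8, 1.0)  # guard log(0)
    entropy = -(p * np.log(p)).sum(axis=-1).mean(axis=-1)  # (T,)
    return int(np.argmin(entropy))
```

Note that this selection needs neither labels nor training, which is what makes the method a "free lunch".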
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
The growing use of large language models has raised environmental and economic concerns about their intensity of resource usage during inference. Serving these models to each user requires substantial energy and water for cooling. Model compression techniques like quantization can shrink large language models and make them more resource efficient at the cost of potential performance degradation. Quantization methods compress model size through replacing their high-precision parameters by quantized values of lower precision. Among existing methods, the ApiQ method achieves superior accuracy preservation at minimal memory and time overhead. We investigate two ideas to extend performance in ultra-low-bit quantization beyond ApiQ's level. First, we look into combining existing quantization-aware training techniques with ApiQ's partial training. We show that this does not outperform the baseline ApiQ method with limited training data and frozen weights. This leads to two key insights: (1) The substantial representational capacity that is gained through full retraining is unlikely to be feasible through partial training. (2) This gain may depend on using a large and diverse dataset in quantization-aware training. Second, through a novel approach informed by the two insights, we propose an ultra-low-bit quantization method that builds upon ApiQ and extends its performance without the need for full retraining. This publicly available method relies on a saliency-aware regularization term that prioritizes preserving the most impactful parameters during quantization. Our experiments on LLaMA 7B and 13B benchmarks demonstrate that our method reduces ApiQ's accuracy degradation by 10.85% and 7.54% respectively. A Python implementation of the proposed quantization method is publicly available on GitHub https://github.com/TokuyuSou/ULB-SAPR.
Updated: 2025-07-30 14:37:56
Subjects: cs.LG,cs.CL,68T50, 68T07, 68T09, 68U15,I.2.7; I.2.6; I.2.4
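A hedged sketch of what a saliency-aware quantization penalty can look like. The abstract does not give the paper's exact saliency measure or regularizer, so the normalized importance weighting below (with saliency playing the role of, e.g., a squared-gradient proxy) is an assumption:

```python
import numpy as np

def uniform_quantize(w, bits=2):
    # symmetric uniform quantizer: snap weights onto 2**bits evenly spaced levels
    levels = 2 ** bits - 1
    scale = (w.max() - w.min()) / levels if levels else 1.0
    return np.round((w - w.min()) / scale) * scale + w.min()

def saliency_regularizer(w, w_q, saliency, lam=1.0):
    """Penalize quantization error more where parameters matter more.

    saliency: nonnegative per-parameter importance scores (a hypothetical
    squared-gradient / Fisher-style proxy; the paper's measure may differ).
    """
    s = saliency / (saliency.sum() + 1e-12)   # normalize to a distribution
    return lam * float((s * (w - w_q) ** 2).sum())
```

The effect is that distorting a high-saliency parameter costs more than distorting a low-saliency one, steering the quantizer toward preserving the most impactful weights.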
Towards interactive evaluations for interaction harms in human-AI systems
Current AI evaluation methods, which rely on static, model-only tests, fail to account for harms that emerge through sustained human-AI interaction. As AI systems proliferate and are increasingly integrated into real-world applications, this disconnect between evaluation approaches and actual usage becomes more significant. In this paper, we propose a shift towards evaluation based on \textit{interactional ethics}, which focuses on \textit{interaction harms} - issues like inappropriate parasocial relationships, social manipulation, and cognitive overreliance that develop over time through repeated interaction, rather than through isolated outputs. First, we discuss the limitations of current evaluation methods, which (1) are static, (2) assume a universal user experience, and (3) have limited construct validity. Drawing on research from human-computer interaction, natural language processing, and the social sciences, we present practical principles for designing interactive evaluations. These include ecologically valid interaction scenarios, human impact metrics, and diverse human participation approaches. Finally, we explore implementation challenges and open research questions for researchers, practitioners, and regulators aiming to integrate interactive evaluations into AI governance frameworks. This work lays the groundwork for developing more effective evaluation methods that better capture the complex dynamics between humans and AI systems.
Updated: 2025-07-30 14:35:05
Subjects: cs.CY,cs.AI,cs.HC
OFCnetLLM: Large Language Model for Network Monitoring and Alertness
The rapid evolution of network infrastructure is bringing new challenges and opportunities for efficient network management, optimization, and security. With very large monitoring databases becoming expensive to explore, the use of AI and Generative AI can help reduce costs of managing these datasets. This paper explores the use of Large Language Models (LLMs) to revolutionize network monitoring management by addressing the limitations of query finding and pattern analysis. We leverage LLMs to enhance anomaly detection, automate root-cause analysis, and automate incident analysis to build a well-monitored network management team using AI. Through a real-world example of developing our own OFCNetLLM, based on the open-source LLM model, we demonstrate practical applications of OFCnetLLM in the OFC conference network. Our model is developed as a multi-agent approach and is still evolving, and we present early results here.
Updated: 2025-07-30 14:22:42
Subjects: cs.NI,cs.AI
Enhanced Prediction of CAR T-Cell Cytotoxicity with Quantum-Kernel Methods
Chimeric antigen receptor (CAR) T-cells are T-cells engineered to recognize and kill specific tumor cells. Through their extracellular domains, CAR T-cells bind tumor cell antigens which triggers CAR T activation and proliferation. These processes are regulated by co-stimulatory domains present in the intracellular region of the CAR T-cell. Through integrating novel signaling components into the co-stimulatory domains, it is possible to modify CAR T-cell phenotype. Identifying and experimentally testing new CAR constructs based on libraries of co-stimulatory domains is nontrivial given the vast combinatorial space defined by such libraries. This leads to a highly data constrained, poorly explored combinatorial problem, where the experiments undersample all possible combinations. We propose a quantum approach using a Projected Quantum Kernel (PQK) to address this challenge. PQK operates by embedding classical data into a high dimensional Hilbert space and employs a kernel method to measure sample similarity. Using 61 qubits on a gate-based quantum computer, we demonstrate the largest PQK application to date and an enhancement in the classification performance over purely classical machine learning methods for CAR T cytotoxicity prediction. Importantly, we show improved learning for specific signaling domains and domain positions, particularly where there was lower information highlighting the potential for quantum computing in data-constrained problems.
Updated: 2025-07-30 14:21:32
Subjects: cs.LG,q-bio.QM,quant-ph
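A toy classical simulation in the spirit of projected quantum kernels: encode features as single-qubit rotations, "project" the quantum state down to per-qubit Pauli-Z expectations, and apply an RBF kernel on the projected vectors. The paper's 61-qubit hardware circuit is far richer than this product-state sketch, so treat everything here as an illustrative assumption:

```python
import numpy as np

def pauli_z_projection(x):
    """Encode features as single-qubit RY(x_i) rotations on |0> and return
    the per-qubit <Z> expectations: for RY(theta)|0>, <Z> = cos(theta)."""
    return np.cos(x)

def projected_quantum_kernel(X, gamma=1.0):
    """k(x, y) = exp(-gamma * ||proj(x) - proj(y)||^2) on the projections."""
    P = np.array([pauli_z_projection(x) for x in X])
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)
```

The resulting Gram matrix can be fed to any kernel classifier (e.g., an SVM); the "projection" step is what keeps the kernel well behaved in high-dimensional Hilbert spaces.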
Unsupervised Learning in Echo State Networks for Input Reconstruction
Echo state networks (ESNs) are a class of recurrent neural networks in which only the readout layer is trainable, while the recurrent and input layers are fixed. This architectural constraint enables computationally efficient processing of time-series data. Traditionally, the readout layer in ESNs is trained using supervised learning with target outputs. In this study, we focus on input reconstruction (IR), where the readout layer is trained to reconstruct the input time series fed into the ESN. We show that IR can be achieved through unsupervised learning (UL), without access to supervised targets, provided that the ESN parameters are known a priori and satisfy invertibility conditions. This formulation allows applications relying on IR, such as dynamical system replication and noise filtering, to be reformulated within the UL framework via straightforward integration with existing algorithms. Our results suggest that prior knowledge of ESN parameters can reduce reliance on supervision, thereby establishing a new principle: not only by fixing part of the network parameters but also by exploiting their specific values. Furthermore, our UL-based algorithms for input reconstruction and related tasks are suitable for autonomous processing, offering insights into how analogous computational mechanisms might operate in the brain in principle. These findings contribute to a deeper understanding of the mathematical foundations of ESNs and their relevance to models in computational neuroscience.
Updated: 2025-07-30 14:09:44
Subjects: cs.LG,cs.AI,eess.SP,nlin.CD,q-bio.NC
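The supervised input-reconstruction setup the abstract starts from can be sketched concretely: fix a random reservoir, drive it with an input series, and ridge-regress the readout onto the input itself. This minimal sketch shows only that baseline; the paper's contribution, achieving reconstruction without supervised targets by exploiting known, invertibility-satisfying ESN parameters, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 500                                   # reservoir size, length
W_in = rng.normal(scale=0.5, size=(N, 1))         # fixed input weights
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

u = np.sin(0.1 * np.arange(T))[:, None]           # input time series
x = np.zeros(N)
states = []
for t in range(T):
    x = np.tanh(W @ x + W_in @ u[t])              # fixed recurrent dynamics
    states.append(x)
X = np.array(states)

# Supervised IR baseline: ridge-regress the (only trainable) readout
# onto the input fed into the reservoir.
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ u)
u_hat = X @ W_out
err = np.abs(u_hat[50:] - u[50:]).max()           # skip washout transient
```

Because only `W_out` is trained, the whole fit is a single linear solve, which is the computational appeal of the ESN architecture.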
Inferring biological processes with intrinsic noise from cross-sectional data
Inferring dynamical models from data continues to be a significant challenge in computational biology, especially given the stochastic nature of many biological processes. We explore a common scenario in omics, where statistically independent cross-sectional samples are available at a few time points, and the goal is to infer the underlying diffusion process that generated the data. Existing inference approaches often simplify or ignore noise intrinsic to the system, compromising accuracy for the sake of optimization ease. We circumvent this compromise by inferring the phase-space probability flow that shares the same time-dependent marginal distributions as the underlying stochastic process. Our approach, probability flow inference (PFI), disentangles force from intrinsic stochasticity while retaining the algorithmic ease of ODE inference. Analytically, we prove that for Ornstein-Uhlenbeck processes the regularized PFI formalism yields a unique solution in the limit of well-sampled distributions. In practical applications, we show that PFI enables accurate parameter and force estimation in high-dimensional stochastic reaction networks, and that it allows inference of cell differentiation dynamics with molecular noise, outperforming state-of-the-art approaches.
Updated: 2025-07-30 14:06:57
Subjects: cs.LG,physics.bio-ph,q-bio.QM
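The marginal-matching construction in the abstract rests on a standard identity from probability-flow methods (the notation below is assumed, since the abstract gives no formulas): for an Itô diffusion with drift $f$ and diffusion coefficient $D$,

```latex
\mathrm{d}X_t = f(X_t, t)\,\mathrm{d}t + \sqrt{2D}\,\mathrm{d}W_t
\qquad\Longrightarrow\qquad
\dot{x} = v(x, t) = f(x, t) - D\,\nabla_x \log p_t(x)
```

Both dynamics solve the same Fokker--Planck equation $\partial_t p_t = -\nabla\cdot(f\,p_t) + D\,\Delta p_t$, so the deterministic flow shares the time-dependent marginals $p_t$ while being noise-free; fitting $v$ from cross-sectional snapshots is what lets PFI separate the force $f$ from the intrinsic-noise (score) term.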
Unsupervised Learning: Comparative Analysis of Clustering Techniques on High-Dimensional Data
This paper presents a comprehensive comparative analysis of prominent clustering algorithms K-means, DBSCAN, and Spectral Clustering on high-dimensional datasets. We introduce a novel evaluation framework that assesses clustering performance across multiple dimensionality reduction techniques (PCA, t-SNE, and UMAP) using diverse quantitative metrics. Experiments conducted on MNIST, Fashion-MNIST, and UCI HAR datasets reveal that preprocessing with UMAP consistently improves clustering quality across all algorithms, with Spectral Clustering demonstrating superior performance on complex manifold structures. Our findings show that algorithm selection should be guided by data characteristics, with Kmeans excelling in computational efficiency, DBSCAN in handling irregular clusters, and Spectral Clustering in capturing complex relationships. This research contributes a systematic approach for evaluating and selecting clustering techniques for high dimensional data applications.
Updated: 2025-07-30 13:58:12
Subjects: cs.LG,stat.ML
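The paper's central pipeline (dimensionality reduction as preprocessing for clustering) can be illustrated with a self-contained numpy sketch, here using PCA via SVD plus plain Lloyd's k-means on synthetic high-dimensional blobs rather than the paper's datasets and UMAP:

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k=2, iters=50, seed=0):
    # plain Lloyd's algorithm with random-point initialization
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def pca(X, d=2):
    # project onto the top-d principal directions via SVD
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

# two well-separated Gaussian blobs in 50 dimensions
n, dim = 100, 50
A = rng.normal(size=(n, dim)) + 4.0
B = rng.normal(size=(n, dim)) - 4.0
X = np.vstack([A, B])
truth = np.array([0] * n + [1] * n)

labels = kmeans(pca(X, 2))
# cluster ids are arbitrary, so score both possible assignments
acc = max((labels == truth).mean(), (labels != truth).mean())
```

On data this cleanly separated any pipeline succeeds; the paper's point is that on real high-dimensional data the choice of reduction technique and clustering algorithm interact, which is what its evaluation framework measures.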
Equivariant Flow Matching for Point Cloud Assembly
The goal of point cloud assembly is to reconstruct a complete 3D shape by aligning multiple point cloud pieces. This work presents a novel equivariant solver for assembly tasks based on flow matching models. We first theoretically show that the key to learning equivariant distributions via flow matching is to learn related vector fields. Based on this result, we propose an assembly model, called equivariant diffusion assembly (Eda), which learns related vector fields conditioned on the input pieces. We further construct an equivariant path for Eda, which guarantees high data efficiency of the training process. Our numerical results show that Eda is highly competitive on practical datasets, and it can even handle the challenging situation where the input pieces are non-overlapped.
Updated: 2025-07-30 13:55:45
Subjects: cs.CV,cs.AI
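The abstract's claim that learning equivariant distributions via flow matching reduces to learning suitably related vector fields echoes a standard fact about equivariant flows (the group action notation below is an assumption, not taken from the abstract): a vector field that commutes with the group action induces an equivariant flow map,

```latex
v_\theta(g \cdot x, t) = g \cdot v_\theta(x, t) \;\; \forall g
\qquad\Longrightarrow\qquad
\phi_t(g \cdot x_0) = g \cdot \phi_t(x_0),
```

where $\phi_t$ is the flow of $\dot{x} = v_\theta(x, t)$; pushing a $g$-invariant prior through such a flow then yields a model distribution with the desired equivariance.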
Resource-Efficient Automatic Software Vulnerability Assessment via Knowledge Distillation and Particle Swarm Optimization
The increasing complexity of software systems has led to a surge in cybersecurity vulnerabilities, necessitating efficient and scalable solutions for vulnerability assessment. However, the deployment of large pre-trained models in real-world scenarios is hindered by their substantial computational and storage demands. To address this challenge, we propose a novel resource-efficient framework that integrates knowledge distillation and particle swarm optimization to enable automated vulnerability assessment. Our framework employs a two-stage approach: First, particle swarm optimization is utilized to optimize the architecture of a compact student model, balancing computational efficiency and model capacity. Second, knowledge distillation is applied to transfer critical vulnerability assessment knowledge from a large teacher model to the optimized student model. This process significantly reduces the model size while maintaining high performance. Experimental results on an enhanced MegaVul dataset, comprising 12,071 CVSS (Common Vulnerability Scoring System) v3 annotated vulnerabilities, demonstrate the effectiveness of our approach. Our approach achieves a 99.4% reduction in model size while retaining 89.3% of the original model's accuracy. Furthermore, it outperforms state-of-the-art baselines by 1.7% in accuracy with 60% fewer parameters. The framework also reduces training time by 72.1% and architecture search time by 34.88% compared to traditional genetic algorithms.
Updated: 2025-07-30 13:55:28
领域: cs.LG,cs.CR
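The second stage above (transferring knowledge from the large teacher to the PSO-optimized student) is commonly implemented as a blended loss over softened teacher logits and hard labels. The sketch below shows that standard distillation loss; the temperature and blending weight are illustrative defaults, not the paper's actual hyperparameters.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis (numerically stabilized).
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of the soft KL term (teacher -> student, scaled by T^2) and the
    hard cross-entropy against ground-truth labels."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * hard))
```

A student whose logits match the teacher's incurs only the hard-label term, so the loss drops as the student tracks the teacher.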
Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging
Mixture of expert (MoE) models are a promising approach to increasing model capacity without increasing inference cost, and are core components of many state-of-the-art language models. However, current MoE models typically use only a few experts due to prohibitive training and inference cost. We propose Test-Time Model Merging (TTMM) which scales the MoE paradigm to an order of magnitude more experts and uses model merging to avoid almost any test-time overhead. We show that TTMM is an approximation of test-time training (TTT), which fine-tunes an expert model for each prediction task, i.e., prompt. TTT has recently been shown to significantly improve language models, but is computationally expensive. We find that performance of TTMM improves with more experts and approaches the performance of TTT. Moreover, we find that with a 1B parameter base model, TTMM is more than 100x faster than TTT at test-time by amortizing the cost of TTT at train-time. Thus, TTMM offers a promising cost-effective approach to scale test-time training.
Updated: 2025-07-30 13:53:32
Categories: cs.LG,cs.AI
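The core merging idea can be sketched as follows: score each expert against the prompt embedding, pick the top-k, and average their parameters with softmax weights, so only one merged model runs at test time. All names here (centroids, the cosine scoring, k) are illustrative assumptions; the paper's actual selection and merging details may differ.

```python
import numpy as np

def merge_experts(expert_params, expert_centroids, prompt_emb, k=2):
    """Pick the k experts whose centroids are most cosine-similar to the
    prompt embedding and return a softmax-weighted average of their
    parameters (a dict of arrays per expert). Test-time cost is a single
    forward pass through the merged model."""
    sims = np.array([c @ prompt_emb / (np.linalg.norm(c) * np.linalg.norm(prompt_emb))
                     for c in expert_centroids])
    top = np.argsort(sims)[-k:]               # indices of the k best experts
    w = np.exp(sims[top]); w /= w.sum()       # softmax over selected scores
    merged = {name: sum(wi * expert_params[i][name] for wi, i in zip(w, top))
              for name in expert_params[top[0]]}
    return merged, dict(zip(top.tolist(), w.tolist()))
```

With k=1 this degenerates to routing to a single expert; larger k interpolates between experts, which is what amortizes the cost of per-prompt fine-tuning.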
Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations
Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.
Updated: 2025-07-30 13:52:22
Categories: cs.RO,cs.CL,cs.IT,cs.LG,cs.SY,eess.SY,math.IT
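The fusion step above can be illustrated with a toy grid map: a grounded likelihood for a spatial phrase (here a hand-designed Gaussian kernel standing in for FP-LGN's learned estimator) is multiplied with the prior and any robot-sensor likelihoods, then renormalized into a posterior. The kernel and grid are assumptions for the sketch, not the paper's model.

```python
import numpy as np

def near_likelihood(grid_xy, landmark, sigma=1.0):
    # Likelihood of "near <landmark>" per grid cell. FP-LGN learns this
    # from map image features; a distance kernel is the hand-crafted stand-in.
    d2 = np.sum((grid_xy - landmark) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fuse(prior, *likelihoods):
    """Bayesian fusion: multiply the prior over grid cells by each
    observation likelihood (human language or robot sensor) and
    renormalize to a posterior."""
    post = prior.copy()
    for lik in likelihoods:
        post *= lik
    return post / post.sum()
```

Because fusion is just an elementwise product, heterogeneous observations (language and sensors) combine as long as each is expressed as a likelihood over the same grid.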
Bifröst: Spatial Networking with Bigraphs
Modern networked environments increasingly rely on spatial reasoning, but lack a coherent representation for coordinating physical space. Consequently, tasks such as enforcing spatial access policies remain fragile and manual. We first propose a unifying representation based on bigraphs, capturing spatial, social, and communication relationships within a single formalism, with user-facing tools to generate bigraphs from physical environments. Second, we present a hierarchical agent architecture for distributed spatial reasoning, with runtimes for agentic processes to interact with the spatial representation, and a context-aware execution model that scopes reasoning to the smallest viable subspace. Together, these enable private, reliable, and low-latency spatial networking that can safely interact with agentic workflows.
Updated: 2025-07-30 13:49:12
Categories: cs.NI,cs.AI,cs.LO,cs.MA
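A bigraph pairs a place graph (nesting of entities in physical space) with a link graph (communication relationships), and a spatial access policy is then a predicate over both. The minimal sketch below uses plain dicts and sets; the entities and the containment policy are invented for illustration, not Bifröst's actual model or API.

```python
# Place graph: child -> parent nesting (building > room > device/person).
PLACE = {"sensor": "lab", "alice": "lab", "bob": "lobby",
         "lab": "building", "lobby": "building"}
# Link graph: undirected communication links between entities.
LINKS = {frozenset({"sensor", "alice"}), frozenset({"sensor", "bob"})}

def ancestors(node):
    # Walk up the place graph collecting enclosing spaces.
    out = []
    while node in PLACE:
        node = PLACE[node]
        out.append(node)
    return out

def allowed(a, b, scope="lab"):
    """Spatial access policy: a link is permitted only when both endpoints
    are nested (directly or transitively) inside the given scope."""
    in_scope = lambda n: n == scope or scope in ancestors(n)
    return frozenset({a, b}) in LINKS and in_scope(a) and in_scope(b)
```

Scoping reasoning to a subspace then amounts to restricting both graphs to the subtree rooted at that space before evaluating the policy.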
Hydra-Bench: A Benchmark for Multi-Modal Leaf Wetness Sensing
Leaf wetness detection is a crucial task in agricultural monitoring, as it directly impacts the prediction and protection of plant diseases. However, existing sensing systems suffer from limitations in robustness, accuracy, and environmental resilience when applied to natural leaves under dynamic real-world conditions. To address these challenges, we introduce a new multi-modal dataset specifically designed for evaluating and advancing machine learning algorithms in leaf wetness detection. Our dataset comprises synchronized mmWave raw data, Synthetic Aperture Radar (SAR) images, and RGB images collected over six months from five diverse plant species in both controlled and outdoor field environments. We provide detailed benchmarks using the Hydra model, including comparisons against single modality baselines and multiple fusion strategies, as well as performance under varying scan distances. Additionally, our dataset can serve as a benchmark for future SAR imaging algorithm optimization, enabling a systematic evaluation of detection accuracy under diverse conditions.
Updated: 2025-07-30 13:47:56
Categories: cs.CV,cs.AI
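One of the fusion strategies such a benchmark can compare is simple late fusion: each modality branch (mmWave, SAR, RGB) emits wet/dry probabilities and the fused prediction is their weighted average. This sketch is a generic baseline under assumed names, not the Hydra model's architecture.

```python
import numpy as np

def late_fusion(prob_by_modality, weights=None):
    """Average per-modality class probabilities (one row per modality,
    one column per class) and return the fused distribution plus the
    argmax class. Optional weights let an ablation down-weight an
    unreliable modality, e.g. RGB at long scan distances."""
    P = np.asarray(prob_by_modality, dtype=float)
    w = np.ones(len(P)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    fused = (w[:, None] * P).sum(axis=0)
    return fused, int(fused.argmax())
```

Early-fusion variants instead concatenate features before the classifier; a multi-modal dataset makes both kinds of comparison possible.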
Cryptanalysis of LC-MUME: A Lightweight Certificateless Multi-User Matchmaking Encryption for Mobile Devices
Yang et al. proposed a lightweight certificateless multiuser matchmaking encryption (LC-MUME) scheme for mobile devices, published in IEEE Transactions on Information Forensics and Security (TIFS) (DOI: 10.1109/TIFS.2023.3321961). Their construction aims to reduce computational and communication overhead within a one-to-many certificateless cryptographic framework. The authors claim that their scheme satisfies existential unforgeability under chosen-message attacks (EUF-CMA) in the random oracle model. However, our cryptanalytic study demonstrates that the scheme fails to meet this critical security requirement. In particular, we show that a Type-I adversary can successfully forge a valid ciphertext without possessing the complete private key of the sender. Both theoretical analysis and practical implementation confirm that this attack can be mounted with minimal computational cost. To address these weaknesses, we propose a modification strategy to strengthen the security of matchmaking encryption schemes in mobile computing environments.
Updated: 2025-07-30 13:36:52
Categories: cs.CR
Designing for Self-Regulation in Informal Programming Learning: Insights from a Storytelling-Centric Approach
Many people learn programming independently from online resources and often report struggles in achieving their personal learning goals. Learners frequently describe their experiences as isolating and frustrating, challenged by abundant uncertainties, information overload, and distraction, compounded by limited guidance. At the same time, social media serves as a personal space where many engage in diverse self-regulation practices, including help-seeking, using external memory aids (e.g., self-notes), self-reflection, emotion regulation, and self-motivation. For instance, learners often mark achievements and set milestones through their posts. In response, we developed a system consisting of a web platform and browser extensions to support self-regulation online. The design aims to add learner-defined structure to otherwise unstructured experiences and bring meaning to curation and reflection activities by translating them into learning stories with AI-generated feedback. We position storytelling as an integrative approach to design that connects resource curation, reflective and sensemaking practice, and narrative practices learners already use across social platforms. We recruited 15 informal programming learners who are regular social media users to engage with the system in a self-paced manner; participation concluded upon submitting a learning story and survey. We used three quantitative scales and a qualitative survey to examine users' characteristics and perceptions of the system's support for their self-regulation. User feedback suggests the system's viability as a self-regulation aid. Learners particularly valued in-situ reflection, automated story feedback, and video annotation, while other features received mixed views. We highlight perceived benefits, friction points, and design opportunities for future AI-augmented self-regulation tools.
Updated: 2025-07-30 13:30:04
Categories: cs.HC,cs.AI,cs.CY,cs.SE,H.5.2; H.5.4
Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis
As ever-larger clinical datasets become available, they have the potential to unlock unprecedented opportunities for medical research. Foremost among them is the Medical Information Mart for Intensive Care (MIMIC-IV), the world's largest open-source EHR database. However, the inherent complexity of these datasets, particularly the need for sophisticated querying skills and the need to understand the underlying clinical settings, often presents a significant barrier to their effective use. M3 lowers the technical barrier to understanding and querying MIMIC-IV data. With a single command it retrieves MIMIC-IV from PhysioNet, launches a local SQLite instance (or hooks into the hosted BigQuery), and, via the Model Context Protocol (MCP), lets researchers converse with the database in plain English. Ask a clinical question in natural language; M3 uses a language model to translate it into SQL, executes the query against the MIMIC-IV dataset, and returns structured results alongside the underlying query for verifiability and reproducibility. Demonstrations show that minutes of dialogue with M3 yield the kind of nuanced cohort analyses that once demanded hours of handcrafted SQL and relied on understanding the complexities of clinical workflows. By simplifying access, M3 invites the broader research community to mine clinical critical-care data and accelerates the translation of raw records into actionable insight.
Updated: 2025-07-30 13:27:00
Categories: cs.IR,cs.AI,cs.DB,68T50, 68P15,H.2.3; I.2.7; J.3
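The question-to-answer loop described above (translate to SQL, execute, return rows plus the SQL for verifiability) can be sketched against a local SQLite database. The `translate` stub below stands in for the LLM step M3 performs via MCP, and the table name is a hypothetical placeholder, not M3's real interface.

```python
import sqlite3

def translate(question):
    # Stand-in for the LLM translation step; M3 uses a language model here.
    canned = {"how many icu stays are there?": "SELECT COUNT(*) FROM icustays"}
    return canned[question.lower()]

def ask(conn, question):
    """Translate a natural-language question to SQL, execute it, and return
    both the rows and the generated SQL so the user can verify and
    reproduce the result, mirroring M3's contract."""
    sql = translate(question)
    rows = conn.execute(sql).fetchall()
    return {"sql": sql, "rows": rows}
```

Returning the SQL alongside the rows is the key design choice: the answer is never opaque, so a reviewer can re-run or audit the query.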
Cluster-Based Random Forest Visualization and Interpretation
Random forests are a machine learning method used to automatically classify datasets and consist of a multitude of decision trees. While these random forests often have higher performance and generalize better than a single decision tree, they are also harder to interpret. This paper presents a visualization method and system to increase interpretability of random forests. We cluster similar trees which enables users to interpret how the model performs in general without needing to analyze each individual decision tree in detail, or interpret an oversimplified summary of the full forest. To meaningfully cluster the decision trees, we introduce a new distance metric that takes into account both the decision rules as well as the predictions of a pair of decision trees. We also propose two new visualization methods that visualize both clustered and individual decision trees: (1) The Feature Plot, which visualizes the topological position of features in the decision trees, and (2) the Rule Plot, which visualizes the decision rules of the decision trees. We demonstrate the efficacy of our approach through a case study on the "Glass" dataset, which is a relatively complex standard machine learning dataset, as well as a small user study.
Updated: 2025-07-30 13:22:28
Categories: cs.LG,cs.HC
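A distance that accounts for both decision rules and predictions can be sketched as a convex blend: prediction disagreement on a sample set plus Jaccard distance between rule sets. The blend weight `alpha` and the rule-set representation are assumptions for illustration; the paper's metric is more refined.

```python
def tree_distance(tree_a, tree_b, samples, rules_a, rules_b, alpha=0.5):
    """Illustrative distance between two decision trees: a convex blend of
    (1) the fraction of samples the trees classify differently and
    (2) the Jaccard distance between their decision-rule sets.
    Trees are modeled as callables, rules as sets of strings."""
    disagree = sum(tree_a(x) != tree_b(x) for x in samples) / len(samples)
    union = rules_a | rules_b
    jaccard = 1 - len(rules_a & rules_b) / len(union) if union else 0.0
    return alpha * disagree + (1 - alpha) * jaccard
```

Identical trees get distance 0, so any standard clustering algorithm (hierarchical, k-medoids) can run directly on the pairwise distance matrix to group similar trees for visualization.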
RobEthiChor: Automated Context-aware Ethics-based Negotiation for Autonomous Robots
The presence of autonomous systems is growing at a fast pace and it is impacting many aspects of our lives. Designed to learn and act independently, these systems operate and perform decision-making without human intervention. However, they lack the ability to incorporate users' ethical preferences, which are unique for each individual in society and are required to personalize the decision-making processes. This reduces user trust and prevents autonomous systems from behaving according to the moral beliefs of their end-users. When multiple systems interact with differing ethical preferences, they must negotiate to reach an agreement that satisfies the ethical beliefs of all the parties involved and adjust their behavior consequently. To address this challenge, this paper proposes RobEthiChor, an approach that enables autonomous systems to incorporate user ethical preferences and contextual factors into their decision-making through ethics-based negotiation. RobEthiChor features a domain-agnostic reference architecture for designing autonomous systems capable of ethic-based negotiating. The paper also presents RobEthiChor-Ros, an implementation of RobEthiChor within the Robot Operating System (ROS), which can be deployed on robots to provide them with ethics-based negotiation capabilities. To evaluate our approach, we deployed RobEthiChor-Ros on real robots and ran scenarios where a pair of robots negotiate upon resource contention. Experimental results demonstrate the feasibility and effectiveness of the system in realizing ethics-based negotiation. RobEthiChor allowed robots to reach an agreement in more than 73% of the scenarios with an acceptable negotiation time (0.67s on average). Experiments also demonstrate that the negotiation approach implemented in RobEthiChor is scalable.
Updated: 2025-07-30 13:21:38
Categories: cs.SE,cs.AI
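The resource-contention negotiation described above can be sketched as an alternating-offers protocol: each agent scores outcomes with an ethics-derived utility, proposes the outcome best for its counterpart among those still acceptable to itself, and lowers its aspiration level each round until an offer clears both thresholds. The concession schedule and utilities below are assumptions for the sketch, not RobEthiChor's actual protocol.

```python
def negotiate(outcomes, util_a, util_b, start=0.9, decay=0.2, max_rounds=10):
    """Toy ethics-based negotiation. Each round, the proposer considers only
    outcomes whose own (ethics-derived) utility meets a decaying aspiration
    level, offers the one best for the responder, and the responder accepts
    at the same level. Returns (agreed outcome, round) or (None, max_rounds)."""
    for rnd in range(max_rounds):
        thr = start - decay * rnd
        proposer, responder = (util_a, util_b) if rnd % 2 == 0 else (util_b, util_a)
        acceptable = [o for o in outcomes if proposer(o) >= thr]
        if not acceptable:
            continue
        offer = max(acceptable, key=responder)   # concede toward the other party
        if responder(offer) >= thr:
            return offer, rnd
    return None, max_rounds
```

With opposed preferences the agents converge on a compromise outcome once the aspiration level falls to its shared utility, rather than either side's selfish optimum.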
A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models
The increasing adoption of Large Language Models (LLMs) in software engineering has sparked interest in their use for software vulnerability detection. However, the rapid development of this field has resulted in a fragmented research landscape, with diverse studies that are difficult to compare due to differences in, e.g., system designs and dataset usage. This fragmentation makes it difficult to obtain a clear overview of the state-of-the-art or compare and categorize studies meaningfully. In this work, we present a comprehensive systematic literature review (SLR) of LLM-based software vulnerability detection. We analyze 227 studies published between January 2020 and June 2025, categorizing them by task formulation, input representation, system architecture, and adaptation techniques. Further, we analyze the datasets used, including their characteristics, vulnerability coverage, and diversity. We present a fine-grained taxonomy of vulnerability detection approaches, identify key limitations, and outline actionable future research opportunities. By providing a structured overview of the field, this review improves transparency and serves as a practical guide for researchers and practitioners aiming to conduct more comparable and reproducible research. We publicly release all artifacts and maintain a living repository of LLM-based software vulnerability detection studies.
Updated: 2025-07-30 13:17:16
Categories: cs.SE,cs.AI
Don't Lag, RAG: Training-Free Adversarial Detection Using RAG
Adversarial patch attacks pose a major threat to vision systems by embedding localized perturbations that mislead deep models. Traditional defense methods often require retraining or fine-tuning, making them impractical for real-world deployment. We propose a training-free Visual Retrieval-Augmented Generation (VRAG) framework that integrates Vision-Language Models (VLMs) for adversarial patch detection. By retrieving visually similar patches and images that resemble stored attacks in a continuously expanding database, VRAG performs generative reasoning to identify diverse attack types, all without additional training or fine-tuning. We extensively evaluate open-source large-scale VLMs, including Qwen-VL-Plus, Qwen2.5-VL-72B, and UI-TARS-72B-DPO, alongside Gemini-2.0, a closed-source model. Notably, the open-source UI-TARS-72B-DPO model achieves up to 95 percent classification accuracy, setting a new state-of-the-art for open-source adversarial patch detection. Gemini-2.0 attains the highest overall accuracy, 98 percent, but remains closed-source. Experimental results demonstrate VRAG's effectiveness in identifying a variety of adversarial patches with minimal human annotation, paving the way for robust, practical defenses against evolving adversarial patch attacks.
Updated: 2025-07-30 13:13:40
Categories: cs.AI,cs.LG
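The retrieval step of a framework like VRAG — matching a suspect patch embedding against a continuously growing store of known attacks before handing the matches to a VLM for generative reasoning — can be sketched as plain cosine-similarity nearest-neighbour search. This is an illustrative sketch, not the paper's implementation: the `PatchDatabase` class, its 8-dimensional embeddings, and the attack labels are hypothetical stand-ins, and the VLM reasoning step is omitted.

```python
import numpy as np

class PatchDatabase:
    """Minimal nearest-neighbour store for patch embeddings.
    Retrieved matches would serve as in-context evidence for a VLM;
    only the retrieval step is shown here."""
    def __init__(self):
        self.vecs = []    # L2-normalised embeddings of known attacks
        self.labels = []  # e.g. the attack family each patch belongs to

    def add(self, vec, label):
        self.vecs.append(vec / np.linalg.norm(vec))
        self.labels.append(label)

    def query(self, vec, k=3):
        # Cosine similarity reduces to a dot product after normalisation.
        sims = np.stack(self.vecs) @ (vec / np.linalg.norm(vec))
        top = np.argsort(sims)[::-1][:k]
        return [(self.labels[i], float(sims[i])) for i in top]

rng = np.random.default_rng(0)
db = PatchDatabase()
proto = rng.normal(size=8)                       # a hypothetical patch embedding
db.add(proto + 0.1 * rng.normal(size=8), "sticker-attack")
db.add(rng.normal(size=8), "texture-attack")

hits = db.query(proto, k=1)                      # nearest stored attack wins
```

A new attack confirmed by the VLM would simply be `add`-ed back, giving the "continuously expanding database" behaviour without any retraining.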
GPT-4.1 Sets the Standard in Automated Experiment Design Using Novel Python Libraries
Large Language Models (LLMs) have advanced rapidly as tools for automating code generation in scientific research, yet their ability to interpret and use unfamiliar Python APIs for complex computational experiments remains poorly characterized. This study systematically benchmarks a selection of state-of-the-art LLMs in generating functional Python code for two increasingly challenging scenarios: conversational data analysis with the \textit{ParShift} library, and synthetic data generation and clustering using \textit{pyclugen} and \textit{scikit-learn}. Both experiments use structured, zero-shot prompts specifying detailed requirements but omitting in-context examples. Model outputs are evaluated quantitatively for functional correctness and prompt compliance over multiple runs, and qualitatively by analyzing the errors produced when code execution fails. Results show that only a small subset of models consistently generate correct, executable code, with GPT-4.1 standing out as the only model to always succeed in both tasks. In addition to benchmarking LLM performance, this approach helps identify shortcomings in third-party libraries, such as unclear documentation or obscure implementation bugs. Overall, these findings highlight current limitations of LLMs for end-to-end scientific automation and emphasize the need for careful prompt design, comprehensive library documentation, and continued advances in language model capabilities.
Updated: 2025-07-30 13:11:29
Categories: cs.SE,cs.AI,cs.CL,68T50,I.2.2; I.2.7; D.2.3
Transductive Model Selection under Prior Probability Shift
Transductive learning is a supervised machine learning task in which, unlike in traditional inductive learning, the unlabelled data that require labelling are a finite set and are available at training time. Similarly to inductive learning contexts, transductive learning contexts may be affected by dataset shift, i.e., may be such that the IID assumption does not hold. We here propose a method, tailored to transductive classification contexts, for performing model selection (i.e., hyperparameter optimisation) when the data exhibit prior probability shift, an important type of dataset shift typical of anti-causal learning problems. In our proposed method the hyperparameters can be optimised directly on the unlabelled data to which the trained classifier must be applied; this is unlike traditional model selection methods, that are based on performing cross-validation on the labelled training data. We provide experimental results that show the benefits brought about by our method.
Updated: 2025-07-30 13:03:24
Categories: cs.LG
Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction
Offline reinforcement learning (offline RL) offers a promising framework for developing control strategies in chemical process systems using historical data, without the risks or costs of online experimentation. This work investigates the application of offline RL to the safe and efficient control of an exothermic polymerisation continuous stirred-tank reactor. We introduce a Gymnasium-compatible simulation environment that captures the reactor's nonlinear dynamics, including reaction kinetics, energy balances, and operational constraints. The environment supports three industrially relevant scenarios: startup, grade change down, and grade change up. It also includes reproducible offline datasets generated from proportional-integral controllers with randomised tunings, providing a benchmark for evaluating offline RL algorithms in realistic process control tasks. We assess behaviour cloning and implicit Q-learning as baseline algorithms, highlighting the challenges offline agents face, including steady-state offsets and degraded performance near setpoints. To address these issues, we propose a novel deployment-time safety layer that performs gradient-based action correction using input convex neural networks (PICNNs) as learned cost models. The PICNN enables real-time, differentiable correction of policy actions by descending a convex, state-conditioned cost surface, without requiring retraining or environment interaction. Experimental results show that offline RL, particularly when combined with convex action correction, can outperform traditional control approaches and maintain stability across all scenarios. These findings demonstrate the feasibility of integrating offline RL with interpretable and safety-aware corrections for high-stakes chemical process control, and lay the groundwork for more reliable data-driven automation in industrial systems.
Updated: 2025-07-30 12:58:02
Categories: eess.SY,cs.AI,cs.LG,cs.SY,stat.ML
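The deployment-time correction described above can be illustrated with a toy cost network that is fully input-convex in the action; the paper's PICNNs additionally condition on the state, which is omitted in this sketch. All weights, dimensions, and step sizes below are hypothetical. Convexity in the action follows from keeping the output weights non-negative and using a convex, non-decreasing activation (softplus), so projected gradient descent on the action descends a convex surface.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvexCost:
    """Toy input-convex cost: c(a) = w2 . softplus(W1 a + b1), with
    w2 >= 0, which makes c convex in the action a."""
    def __init__(self, rng, action_dim=2, hidden=16):
        self.W1 = rng.normal(size=(hidden, action_dim))
        self.b1 = rng.normal(size=hidden)
        self.w2 = np.abs(rng.normal(size=hidden))  # non-negativity => convexity

    def value(self, a):
        return self.w2 @ softplus(self.W1 @ a + self.b1)

    def grad(self, a):
        # d/da [w2 . softplus(W1 a + b1)] = W1^T (w2 * sigmoid(W1 a + b1))
        return self.W1.T @ (self.w2 * sigmoid(self.W1 @ a + self.b1))

def correct_action(cost, a_policy, lr=0.05, steps=200, lo=-1.0, hi=1.0):
    """Descend the convex cost surface from the policy's proposed action,
    projecting back into actuator bounds and keeping the best iterate."""
    a = a_policy.copy()
    best, best_val = a.copy(), cost.value(a)
    for _ in range(steps):
        a = np.clip(a - lr * cost.grad(a), lo, hi)
        v = cost.value(a)
        if v < best_val:
            best, best_val = a.copy(), v
    return best

rng = np.random.default_rng(1)
cost = ConvexCost(rng)
a0 = rng.uniform(-1.0, 1.0, size=2)   # raw policy action
a_safe = correct_action(cost, a0)     # corrected action, never costlier than a0
```

Because the correction only needs forward and gradient evaluations of a fixed network, it runs at deployment time without retraining or further environment interaction, which is the property the paper exploits.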
trAIce3D: A Prompt-Driven Transformer Based U-Net for Semantic Segmentation of Microglial Cells from Large-Scale 3D Microscopy Images
The shape of a cell contains essential information about its function within the biological system. Segmenting these structures from large-scale 3D microscopy images is challenging, limiting clinical insights especially for microglia, immune-associated cells involved in neurodegenerative diseases. Existing segmentation methods mainly focus on cell bodies, struggle with overlapping structures, perform poorly on noisy images, require hyperparameter tuning for each new dataset, or rely on tedious semi-automated approaches. We introduce trAIce3D, a deep-learning architecture designed for precise microglia segmentation, capturing both somas and branches. It employs a two-stage approach: first, a 3D U-Net with vision transformers in the encoder detects somas using a sliding-window technique to cover the entire image. Then, the same architecture, enhanced with cross-attention blocks in skip connections, refines each soma and its branches by using soma coordinates as a prompt and a 3D window around the target cell as input. Training occurs in two phases: self-supervised Soma Segmentation, followed by prompt-based Branch Segmentation, leveraging pre-trained weights from the first phase. Trained and evaluated on a dataset of 41,230 microglial cells, trAIce3D significantly improves segmentation accuracy and generalization, enabling scalable analysis of complex cellular morphologies. While optimized for microglia, its architecture can extend to other intricate cell types, such as neurons and astrocytes, broadening its impact on neurobiological research.
Updated: 2025-07-30 12:54:53
Categories: eess.IV,cs.CV,cs.LG
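The sliding-window soma-detection pass can be sketched as a tiling of the 3D volume in which the last window along each axis is clamped to the border, so every voxel is covered exactly as the abstract requires. This is a generic sketch of the tiling idea, not the authors' code; the window and stride sizes are hypothetical.

```python
import numpy as np

def sliding_windows(shape, win, stride):
    """Yield the origin of every 3D window needed to tile a volume of
    `shape`, clamping the final window along each axis to the border
    so the full image is covered."""
    starts = []
    for dim, w, s in zip(shape, win, stride):
        axis = list(range(0, max(dim - w, 0) + 1, s))
        if dim > w and axis[-1] != dim - w:
            axis.append(dim - w)   # clamp last window to the border
        starts.append(axis)
    for z in starts[0]:
        for y in starts[1]:
            for x in starts[2]:
                yield (z, y, x)

volume = np.zeros((70, 100, 100), dtype=np.float32)  # a toy image stack
win, stride = (32, 32, 32), (16, 16, 16)

covered = np.zeros_like(volume, dtype=bool)
for z, y, x in sliding_windows(volume.shape, win, stride):
    # In the real pipeline each crop would be fed to the 3D U-Net here.
    covered[z:z + win[0], y:y + win[1], x:x + win[2]] = True
```

The second, prompt-based stage would then re-crop a window around each detected soma coordinate and refine it, which reuses exactly this cropping logic with the soma as the window centre.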
H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity
Different from existing federated fine-tuning (FFT) methods for foundation models, hybrid heterogeneous federated fine-tuning (HHFFT) is an under-explored scenario where clients exhibit double heterogeneity in model architectures and downstream tasks. This hybrid heterogeneity introduces two significant challenges: 1) heterogeneous matrix aggregation, where clients adopt different large-scale foundation models based on their task requirements and resource limitations, leading to dimensional mismatches during LoRA parameter aggregation; and 2) multi-task knowledge interference, where local shared parameters, trained with both task-shared and task-specific knowledge, cannot ensure only task-shared knowledge is transferred between clients. To address these challenges, we propose H2Tune, a federated foundation model fine-tuning with hybrid heterogeneity. Our framework H2Tune consists of three key components: (i) sparsified triple matrix decomposition to align hidden dimensions across clients through constructing rank-consistent middle matrices, with adaptive sparsification based on client resources; (ii) relation-guided matrix layer alignment to handle heterogeneous layer structures and representation capabilities; and (iii) alternating task-knowledge disentanglement mechanism to decouple shared and specific knowledge of local model parameters through alternating optimization. Theoretical analysis proves a convergence rate of O(1/\sqrt{T}). Extensive experiments show our method achieves up to 15.4% accuracy improvement compared to state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/H2Tune-1407.
Updated: 2025-07-30 12:53:18
Categories: cs.LG,cs.AI
A Unified Analysis of Generalization and Sample Complexity for Semi-Supervised Domain Adaptation
Domain adaptation seeks to leverage the abundant label information in a source domain to improve classification performance in a target domain with limited labels. While the field has seen extensive methodological development, its theoretical foundations remain relatively underexplored. Most existing theoretical analyses focus on simplified settings where the source and target domains share the same input space and relate target-domain performance to measures of domain discrepancy. Although insightful, these analyses may not fully capture the behavior of modern approaches that align domains into a shared space via feature transformations. In this paper, we present a comprehensive theoretical study of domain adaptation algorithms based on domain alignment. We consider the joint learning of domain-aligning feature transformations and a shared classifier in a semi-supervised setting. We first derive generalization bounds in a broad setting, in terms of covering numbers of the relevant function classes. We then extend our analysis to characterize the sample complexity of domain-adaptive neural networks employing maximum mean discrepancy (MMD) or adversarial objectives. Our results rely on a rigorous analysis of the covering numbers of these architectures. We show that, for both MMD-based and adversarial models, the sample complexity admits an upper bound that scales quadratically with network depth and width. Furthermore, our analysis suggests that in semi-supervised settings, robustness to limited labeled target data can be achieved by scaling the target loss proportionally to the square root of the number of labeled target samples. Experimental evaluation in both shallow and deep settings lends support to our theoretical findings.
Updated: 2025-07-30 12:53:08
Categories: stat.ML,cs.LG
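For the MMD-based models analysed above, the alignment objective is the squared maximum mean discrepancy between source and target feature distributions. A minimal NumPy sketch of the standard unbiased estimator with an RBF kernel (the bandwidth `gamma` and the toy data are arbitrary choices for illustration, not values from the paper):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def mmd2_unbiased(X, Y, gamma=1.0):
    """Unbiased estimate of squared MMD between samples X and Y."""
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    # Drop diagonal terms for the unbiased within-sample averages.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 2))       # "source" features
tgt_near = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution
tgt_far = rng.normal(3.0, 1.0, size=(200, 2))   # shifted distribution
```

In a domain-adaptation network this quantity, computed on the outputs of the shared feature transformation, is added to the classification loss; the covering-number analysis in the paper bounds the complexity of exactly such composed objectives.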
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Fashion design is a complex creative process that blends visual and textual expressions. Designers convey ideas through sketches, which define spatial structure and design elements, and textual descriptions, capturing material, texture, and stylistic details. In this paper, we present LOcalized Text and Sketch for fashion image generation (LOTS), an approach for compositional sketch-text based generation of complete fashion outlooks. LOTS leverages a global description with paired localized sketch + text information for conditioning and introduces a novel step-based merging strategy for diffusion adaptation. First, a Modularized Pair-Centric representation encodes sketches and text into a shared latent space while preserving independent localized features; then, a Diffusion Pair Guidance phase integrates both local and global conditioning via attention-based guidance within the diffusion model's multi-step denoising process. To validate our method, we build on Fashionpedia to release Sketchy, the first fashion dataset where multiple text-sketch pairs are provided per image. Quantitative results show LOTS achieves state-of-the-art image generation performance on both global and localized metrics, while qualitative examples and a human evaluation study highlight its unprecedented level of design customization.
Updated: 2025-07-30 12:48:29
Subjects: cs.CV,cs.AI
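A rough sketch of what attention-based pair guidance during denoising might look like (NumPy; the function names and the linear local/global blending schedule are illustrative assumptions, not the actual LOTS architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pair_guidance(query, pair_embs, global_emb, step, total_steps):
    """Attend from denoiser query tokens to localized (sketch + text) pair
    embeddings, then blend in the global description. The blend shifts toward
    local detail as denoising progresses (illustrative schedule)."""
    attn = softmax(query @ pair_embs.T / np.sqrt(query.shape[-1]))
    local = attn @ pair_embs                 # per-token localized conditioning
    w = step / total_steps                   # later steps weight local pairs more
    return (1 - w) * global_emb + w * local
```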
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Existing large vision-language models (LVLMs) are largely limited to processing short, seconds-long videos and struggle with generating coherent descriptions for extended videos spanning minutes or more. Long video description introduces new challenges, such as consistent character identification and plot-level descriptions incorporating both visual and audio information. To address these, we identify audio-visual character identification, i.e., matching character names to each line of dialogue, as a key factor. We propose StoryTeller, a system for generating dense descriptions of long videos, incorporating both low-level visual concepts and high-level plot information. StoryTeller uses a multimodal large language model that integrates visual, audio, and text modalities to perform audio-visual character identification on minute-long video clips. The results are then fed into an LVLM to enhance the consistency of video descriptions. We validate our approach on movie description tasks and introduce MovieStory101, a dataset with dense descriptions for three-minute movie clips. To evaluate long video descriptions, we create StoryQA, a large set of multiple-choice questions for the MovieStory101 test set. We assess descriptions by inputting them into GPT-4 to answer these questions, using accuracy as an automatic evaluation metric. Experiments show that StoryTeller outperforms all open and closed-source baselines on StoryQA, achieving 9.5% higher accuracy than the strongest baseline, Gemini-1.5-pro, and demonstrating a +15.56% advantage in human side-by-side evaluations. Additionally, incorporating audio-visual character identification from StoryTeller improves the performance of all video description models, with Gemini-1.5-pro and GPT-4o showing relative improvements of 5.5% and 13.0%, respectively, in accuracy on StoryQA.
Updated: 2025-07-30 12:47:35
Subjects: cs.CV,cs.AI
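Audio-visual character identification can be pictured, at its simplest, as matching each dialogue line's embedding to a bank of known character embeddings (a hypothetical stand-in for the paper's multimodal identification module):

```python
import numpy as np

def identify_speakers(dialogue_embs, char_bank):
    """Match each dialogue line's audio-visual embedding to the closest
    character embedding in a bank {name: vector}, by cosine similarity.
    Illustrative; the real module fuses audio, video, and text evidence."""
    names = list(char_bank)
    M = np.stack([char_bank[n] for n in names])
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    D = dialogue_embs / np.linalg.norm(dialogue_embs, axis=1, keepdims=True)
    return [names[i] for i in (D @ M.T).argmax(axis=1)]
```

The resulting name-per-line assignments are what a downstream LVLM could consume to keep character references consistent across a long description.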
Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs
Knowledge graphs offer a structured representation of real-world entities and their relationships, enabling a wide range of applications from information retrieval to automated reasoning. In this paper, we conduct a systematic comparison between traditional rule-based approaches and modern deep learning methods for link prediction. We focus on KBGAT, a graph neural network model that leverages multi-head attention to jointly encode both entity and relation features within local neighborhood structures. To advance this line of research, we introduce \textbf{GCAT} (Graph Collaborative Attention Network), a refined model that enhances context aggregation and interaction between heterogeneous nodes. Experimental results on four widely-used benchmark datasets demonstrate that GCAT not only consistently outperforms rule-based methods but also achieves competitive or superior performance compared to existing neural embedding models. Our findings highlight the advantages of attention-based architectures in capturing complex relational patterns for knowledge graph completion tasks.
Updated: 2025-07-30 12:47:25
Subjects: cs.LG
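The neighborhood attention at the core of KBGAT-style models can be sketched in a few lines (single head, NumPy; `W` and `a` play the roles of the learnable projection matrix and attention vector, randomly initialized here purely for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_neighborhood(h_e, neighbors, W, a):
    """Aggregate (relation, entity) neighbor features around an entity with
    attention, in the spirit of KBGAT/GCAT. `neighbors` is a list of
    (relation_emb, entity_emb) pairs; one head, illustrative only."""
    msgs = np.stack([W @ np.concatenate([h_e, h_r, h_n])
                     for h_r, h_n in neighbors])      # one message per neighbor
    scores = softmax(msgs @ a)                        # attention over neighbors
    return np.tanh((scores[:, None] * msgs).sum(0))   # attended representation
```

Jointly encoding the relation embedding `h_r` alongside both entity embeddings is what distinguishes this family from plain graph attention over entities alone.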
Enhancing Manufacturing Knowledge Access with LLMs and Context-aware Prompting
Knowledge graphs (KGs) have transformed data management within the manufacturing industry, offering effective means for integrating disparate data sources through shared and structured conceptual schemas. However, harnessing the power of KGs can be daunting for non-experts, as it often requires formulating complex SPARQL queries to retrieve specific information. With the advent of Large Language Models (LLMs), there is a growing potential to automatically translate natural language queries into the SPARQL format, thus bridging the gap between user-friendly interfaces and the sophisticated architecture of KGs. The challenge remains in adequately informing LLMs about the relevant context and structure of domain-specific KGs, e.g., in manufacturing, to improve the accuracy of generated queries. In this paper, we evaluate multiple strategies that use LLMs as mediators to facilitate information retrieval from KGs. We focus on the manufacturing domain, particularly on the Bosch Line Information System KG and the I40 Core Information Model. In our evaluation, we compare various approaches for feeding relevant context from the KG to the LLM and analyze their proficiency in transforming real-world questions into SPARQL queries. Our findings show that LLMs can significantly improve their performance on generating correct and complete queries when provided only the adequate context of the KG schema. Such context-aware prompting techniques help LLMs to focus on the relevant parts of the ontology and reduce the risk of hallucination. We anticipate that the proposed techniques help LLMs to democratize access to complex data repositories and empower informed decision-making in manufacturing settings.
Updated: 2025-07-30 12:39:01
Subjects: cs.AI
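A minimal sketch of context-aware prompting: select the schema snippets most relevant to the question and prepend them to the NL-to-SPARQL prompt. Word-overlap ranking is a crude, hypothetical stand-in for the selection strategies the paper actually evaluates:

```python
def build_sparql_prompt(question, schema_snippets, k=3):
    """Rank schema snippets by word overlap with the question, keep the
    top k, and build a context-aware NL-to-SPARQL prompt (illustrative)."""
    qw = set(question.lower().split())
    ranked = sorted(schema_snippets,
                    key=lambda s: len(qw & set(s.lower().split())),
                    reverse=True)
    context = "\n".join(ranked[:k])
    return (f"Given this ontology context:\n{context}\n"
            f"Write a SPARQL query answering: {question}")
```

Restricting the prompt to the relevant slice of the ontology is exactly what lets the LLM "focus on the relevant parts" and reduces hallucinated classes and properties.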
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Recent advances in text-to-image diffusion models have enabled the creation of a new form of digital art: optical illusions--visual tricks that create different perceptions of reality. However, adversaries may misuse such techniques to generate hateful illusions, which embed specific hate messages into harmless scenes and disseminate them across web communities. In this work, we take the first step toward investigating the risks of scalable hateful illusion generation and the potential for bypassing current content moderation models. Specifically, we generate 1,860 optical illusions using Stable Diffusion and ControlNet, conditioned on 62 hate messages. Of these, 1,571 are hateful illusions that successfully embed hate messages, either overtly or subtly, forming the Hateful Illusion dataset. Using this dataset, we evaluate the performance of six moderation classifiers and nine vision language models (VLMs) in identifying hateful illusions. Experimental results reveal significant vulnerabilities in existing moderation models: the detection accuracy falls below 0.245 for moderation classifiers and below 0.102 for VLMs. We further identify a critical limitation in their vision encoders, which mainly focus on surface-level image details while overlooking the secondary layer of information, i.e., hidden messages. To address this risk, we explore preliminary mitigation measures and identify the most effective approaches from the perspectives of image transformations and training-level strategies.
Updated: 2025-07-30 12:37:29
Subjects: cs.CR,cs.CV
Adaptive Duration Model for Text Speech Alignment
Speech-to-text alignment is a critical component of neural text-to-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments online. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeated words. Most non-autoregressive end-to-end TTS models rely on durations extracted from external sources, using additional duration models for alignment. In this paper, we propose a novel duration prediction framework that can produce a phoneme-level duration distribution for a given text. In our experiments, the proposed duration model shows more precise prediction and better condition-adaptation ability than previous baseline models. Numerically, it yields roughly an 11.3% improvement in alignment accuracy and makes zero-shot TTS models more robust to the mismatch between prompt audio and input audio.
Updated: 2025-07-30 12:31:11
Subjects: cs.SD,cs.AI,eess.AS
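The length-regulation step that consumes a duration model's per-phoneme predictions can be sketched as follows (the round-and-clamp policy is an illustrative assumption; real systems may sample from the predicted distribution instead):

```python
def expand_phonemes(phonemes, durations):
    """Repeat each phoneme by its predicted frame count, rounded and
    clamped to at least one frame, yielding a frame-level alignment."""
    frames = []
    for p, d in zip(phonemes, durations):
        frames.extend([p] * max(1, int(round(d))))
    return frames
```

Predicting a distribution rather than a point estimate is what allows the alignment to adapt when the prompt audio's speaking rate differs from the input.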
DoS Attacks and Defense Technologies in Blockchain Systems: A Hierarchical Analysis
Blockchain technology is widely used in various fields due to its ability to provide decentralization and trustless security. This is a fundamental belief held by many advocates, but it is often misunderstood, leading participants to overlook the limits of the security that blockchain can actually provide. Among current network attacks, Denial-of-Service (DoS) attacks pose a significant threat due to their ease of execution and destructive potential. Based on the blockchain architecture hierarchy, this paper categorizes and organizes existing DoS attacks, focusing on the principles and methods of contract-layer and consensus-layer DoS attacks. Furthermore, it comprehensively analyzes and compares commonly used detection methods and defense technologies, which will help strengthen the security and stability of blockchain systems and promote their further innovation and application.
Updated: 2025-07-30 12:29:34
Subjects: cs.CR
Metamorphic Testing of Deep Code Models: A Systematic Literature Review
Large language models and deep learning models designed for code intelligence have revolutionized the software engineering field due to their ability to perform various code-related tasks. These models can process source code and software artifacts with high accuracy in tasks such as code completion, defect detection, and code summarization; therefore, they can potentially become an integral part of modern software engineering practices. Despite these capabilities, robustness remains a critical quality attribute for deep-code models as they may produce different results under varied and adversarial conditions (e.g., variable renaming). Metamorphic testing has become a widely used approach to evaluate models' robustness by applying semantic-preserving transformations to input programs and analyzing the stability of model outputs. While prior research has explored testing deep learning models, this systematic literature review focuses specifically on metamorphic testing for deep code models. By studying 45 primary papers, we analyze the transformations, techniques, and evaluation methods used to assess robustness. Our review summarizes the current landscape, identifying frequently evaluated models, programming tasks, datasets, target languages, and evaluation metrics, and highlights key challenges and future directions for advancing the field.
Updated: 2025-07-30 12:25:30
Subjects: cs.SE,cs.AI
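The core metamorphic-testing loop described above, a semantic-preserving transformation (here, variable renaming) plus an output-stability check, can be sketched as:

```python
import re

def rename_variable(code, old, new):
    """Semantic-preserving transformation: rename an identifier,
    matching whole words only so substrings are left untouched."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def metamorphic_check(model, code, old, new):
    """A model passes this metamorphic relation if its output is
    unchanged when the input program is renamed."""
    return model(code) == model(rename_variable(code, old, new))
```

Real studies apply richer transformations (dead-code insertion, statement reordering) and softer stability criteria, but the relation-based structure is the same.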
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Reinforcement learning has proven its effectiveness in enhancing the reasoning capabilities of large language models. Recent research efforts have progressively extended this paradigm to multimodal reasoning tasks. Due to the inherent complexity and diversity of multimodal tasks, especially in semantic content and problem formulations, existing models often exhibit unstable performance across various domains and difficulty levels. To address these limitations, we propose VL-Cogito, an advanced multimodal reasoning model trained via a novel multi-stage Progressive Curriculum Reinforcement Learning (PCuRL) framework. PCuRL systematically guides the model through tasks of gradually increasing difficulty, substantially improving its reasoning abilities across diverse multimodal contexts. The framework introduces two key innovations: (1) an online difficulty soft weighting mechanism, dynamically adjusting training difficulty across successive RL training stages; and (2) a dynamic length reward mechanism, which encourages the model to adaptively regulate its reasoning path length according to task complexity, thus balancing reasoning efficiency with correctness. Experimental evaluations demonstrate that VL-Cogito consistently matches or surpasses existing reasoning-oriented models across mainstream multimodal benchmarks spanning mathematics, science, logic, and general understanding, validating the effectiveness of our approach.
Updated: 2025-07-30 12:23:21
Subjects: cs.CV,cs.AI,cs.CL
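One way to picture a dynamic length reward: score the reasoning length against a difficulty-dependent target. The Gaussian-over-log-length shaping below is an illustrative assumption, not the paper's actual formula:

```python
import math

def length_reward(n_tokens, difficulty, base=64, scale=2.0, sigma=0.5):
    """Reward peaks when the reasoning length matches a target that grows
    with task difficulty, discouraging both truncated and bloated chains.
    All constants here are hypothetical."""
    target = base * scale ** difficulty
    dev = math.log(n_tokens) - math.log(target)
    return math.exp(-dev * dev / (2 * sigma ** 2))
```

A reward shaped like this pushes the policy to spend few tokens on easy items and longer chains on hard ones, which is the efficiency/correctness balance the abstract describes.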
MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines
Large Language Models (LLMs) have demonstrated the ability to solve a wide range of practical tasks within multi-agent systems. However, existing human-designed multi-agent frameworks are typically limited to a small set of pre-defined scenarios, while current automated design methods suffer from several limitations, such as the lack of tool integration, dependence on external training data, and rigid communication structures. In this paper, we propose MetaAgent, a finite state machine based framework that can automatically generate a multi-agent system. Given a task description, MetaAgent will design a multi-agent system and polish it through an optimization algorithm. When the multi-agent system is deployed, the finite state machine will control the agent's actions and the state transitions. To evaluate our framework, we conduct experiments on both text-based tasks and practical tasks. The results indicate that the generated multi-agent system surpasses other auto-designed methods and can achieve a comparable performance with the human-designed multi-agent system, which is optimized for those specific tasks.
Updated: 2025-07-30 12:22:30
Subjects: cs.AI
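A minimal sketch of a finite state machine driving agent actions, as the abstract describes for deployed systems (the state and handler layout is hypothetical; MetaAgent's generated machines would be task-specific):

```python
class AgentFSM:
    """Each state maps to an agent handler; a handler consumes the task
    and returns (next_state, output). The machine halts at the end state."""
    def __init__(self, transitions, start, end="done"):
        self.transitions, self.state, self.end = transitions, start, end

    def run(self, task, max_steps=10):
        trace = []
        for _ in range(max_steps):
            if self.state == self.end:
                break
            handler = self.transitions[self.state]
            self.state, out = handler(task)   # handler picks the next state
            trace.append(out)
        return trace
```

Because control flow lives in the transition table rather than in free-form agent chatter, an optimizer can rewrite the table without retraining any agent, which is the flexibility the framework exploits.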
MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines
Large Language Models (LLMs) have demonstrated the ability to solve a wide range of practical tasks within multi-agent systems. However, existing human-designed multi-agent frameworks are typically limited to a small set of pre-defined scenarios, while current automated design methods suffer from several limitations, such as the lack of tool integration, dependence on external training data, and rigid communication structures. In this paper, we propose MetaAgent, a finite state machine based framework that can automatically generate a multi-agent system. Given a task description, MetaAgent will design a multi-agent system and polish it through an optimization algorithm. When the multi-agent system is deployed, the finite state machine will control the agent's actions and the state transitions. To evaluate our framework, we conduct experiments on both text-based tasks and practical tasks. The results indicate that the generated multi-agent system surpasses other auto-designed methods and can achieve a comparable performance with the human-designed multi-agent system, which is optimized for those specific tasks.
Updated: 2025-07-30 12:22:30
标题: 元代理:基于有限状态机自动构建多智能体系统
摘要: 大型语言模型(LLMs)已经展示出在多智能体系统内解决各种实际任务的能力。然而,现有的人类设计的多智能体框架通常仅限于一小部分预定义场景,而当前的自动设计方法存在一些限制,例如缺乏工具集成、依赖外部训练数据和刚性通信结构。在本文中,我们提出了MetaAgent,这是一个基于有限状态机的框架,可以自动生成多智能体系统。给定一个任务描述,MetaAgent将设计一个多智能体系统,并通过优化算法对其进行优化。当多智能体系统部署时,有限状态机将控制智能体的行动和状态转换。为了评估我们的框架,我们进行了基于文本的任务和实际任务的实验。结果表明,生成的多智能体系统超过了其他自动生成的方法,并且能够达到与为这些特定任务优化的人类设计的多智能体系统相媲美的性能。
更新时间: 2025-07-30 12:22:30
领域: cs.AI
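The finite-state-machine control loop described in this abstract can be illustrated with a minimal sketch. The state names, transition table, and `Agent` interface below are illustrative assumptions, not MetaAgent's actual design, which is generated automatically per task:

```python
# Minimal sketch of an FSM-driven multi-agent controller.
# State names, transitions, and agent behaviors are illustrative
# assumptions; MetaAgent generates its own per task.

class Agent:
    def __init__(self, name, action):
        self.name = name
        self.action = action  # callable: task -> (new_task, event)

class FSMController:
    def __init__(self, transitions, start, accept):
        self.transitions = transitions  # (state, event) -> next state
        self.start = start
        self.accept = accept

    def run(self, agents, task, max_steps=10):
        state, trace = self.start, []
        for _ in range(max_steps):
            if state == self.accept:
                break
            agent = agents[state]          # agent bound to current state
            task, event = agent.action(task)
            trace.append((state, agent.name, event))
            state = self.transitions[(state, event)]
        return task, trace

# Toy usage: a planner hands off to an executor, which finishes the task.
agents = {
    "PLAN": Agent("planner", lambda t: (t + " [planned]", "ok")),
    "EXEC": Agent("executor", lambda t: (t + " [done]", "ok")),
}
fsm = FSMController(
    transitions={("PLAN", "ok"): "EXEC", ("EXEC", "ok"): "DONE"},
    start="PLAN", accept="DONE",
)
result, trace = fsm.run(agents, "task")
```

The transition table makes the agents' hand-offs explicit and inspectable, which is the property the abstract attributes to FSM-based control.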
UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding
The emergence of Multimodal Large Language Models (MLLMs) has driven significant advances in Graphical User Interface (GUI) agent capabilities. Nevertheless, existing GUI agent training and inference techniques still suffer from dilemmas in reasoning design, ineffective rewards, and visual noise. To address these issues, we introduce UI-AGILE, a comprehensive framework enhancing GUI agents at both the training and inference stages. For training, we propose a suite of improvements to the Supervised Fine-Tuning (SFT) process: 1) a Continuous Reward function to incentivize high-precision grounding; 2) a "Simple Thinking" reward to balance planning with speed and grounding accuracy; and 3) a Cropping-based Resampling strategy to mitigate the sparse reward problem and improve learning on complex tasks. For inference, we present Decomposed Grounding with Selection, a novel method that dramatically improves grounding accuracy on high-resolution displays by breaking the image into smaller, manageable parts. Experiments show that UI-AGILE achieves state-of-the-art performance on two benchmarks, ScreenSpot-Pro and ScreenSpot-v2. For instance, using both our proposed training and inference enhancements brings a 23% grounding-accuracy improvement over the best baseline on ScreenSpot-Pro.
Updated: 2025-07-30 12:17:53
Domains: cs.AI,cs.CL,cs.CV
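The "Continuous Reward function to incentivize high-precision grounding" can be sketched as a reward that decays smoothly with the distance between the predicted click point and the target element, rather than a binary hit/miss signal. The exponential form, the scale constant, and the function names below are assumptions for illustration, not UI-AGILE's exact formulation:

```python
import math

# Hedged sketch of a continuous grounding reward: reward decays
# smoothly with the distance from the predicted click point to the
# target box center. The exponential shape and scale are illustrative
# assumptions, not UI-AGILE's published function.

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def continuous_reward(pred, box, scale=50.0):
    cx, cy = center(box)
    dist = math.hypot(pred[0] - cx, pred[1] - cy)
    return math.exp(-dist / scale)  # 1.0 at the center, -> 0 far away

box = (100, 100, 200, 140)                   # target element, center (150, 120)
r_hit = continuous_reward((150, 120), box)   # exact center
r_near = continuous_reward((160, 125), box)  # slightly off
r_far = continuous_reward((400, 400), box)   # wrong element
```

Unlike a binary reward, a near-miss still receives a gradient signal, which is the stated motivation for mitigating sparse rewards during SFT.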
BALSAM: A Platform for Benchmarking Arabic Large Language Models
The impressive advancement of Large Language Models (LLMs) in English has not been matched across all languages. In particular, LLM performance in Arabic lags behind, due to data scarcity, linguistic diversity of Arabic and its dialects, morphological complexity, etc. Progress is further hindered by the quality of Arabic benchmarks, which typically rely on static, publicly available data, lack comprehensive task coverage, or do not provide dedicated platforms with blind test sets. This makes it challenging to measure actual progress and to mitigate data contamination. Here, we aim to bridge these gaps. In particular, we introduce BALSAM, a comprehensive, community-driven benchmark aimed at advancing Arabic LLM development and evaluation. It includes 78 NLP tasks from 14 broad categories, with 52K examples divided into 37K test and 15K development examples, and a centralized, transparent platform for blind evaluation. We envision BALSAM as a unifying platform that sets standards and promotes collaborative research to advance Arabic LLM capabilities.
Updated: 2025-07-30 12:16:39
Domains: cs.CL,cs.AI
The Cooperative Network Architecture: Learning Structured Networks as Representation of Sensory Patterns
We introduce the Cooperative Network Architecture (CNA), a model that represents sensory signals using structured, recurrently connected networks of neurons, termed "nets." Nets are dynamically assembled from overlapping net fragments, which are learned based on statistical regularities in sensory input. This architecture offers robustness to noise, deformation, and out-of-distribution data, addressing challenges in current vision systems from a novel perspective. We demonstrate that net fragments can be learned without supervision and flexibly recombined to encode novel patterns, enabling figure completion and resilience to noise. Our findings establish CNA as a promising paradigm for developing neural representations that integrate local feature processing with global structure formation, providing a foundation for future research on invariant object recognition.
Updated: 2025-07-30 12:14:16
Domains: cs.CV,cs.AI,cs.LG,cs.NE
Privacy-Preserving Federated Learning Scheme with Mitigating Model Poisoning Attacks: Vulnerabilities and Countermeasures
The privacy-preserving federated learning schemes based on the setting of two honest-but-curious and non-colluding servers offer promising solutions in terms of security and efficiency. However, our investigation reveals that these schemes still suffer from privacy leakage when considering model poisoning attacks from malicious users. Specifically, we demonstrate that the privacy-preserving computation process for defending against model poisoning attacks inadvertently leaks privacy to one of the honest-but-curious servers, enabling it to access users' gradients in plaintext. To address both privacy leakage and model poisoning attacks, we propose an enhanced privacy-preserving and Byzantine-robust federated learning (PBFL) scheme, comprising three components: (1) a two-trapdoor fully homomorphic encryption (FHE) scheme to bolster users' privacy protection; (2) a novel secure normalization judgment method to preemptively thwart gradient poisoning; and (3) an innovative secure cosine similarity measurement method for detecting model poisoning attacks without compromising data privacy. Our scheme guarantees privacy preservation and resilience against model poisoning attacks, even in scenarios with heterogeneous, non-IID (Independently and Identically Distributed) datasets. Theoretical analyses substantiate the security and efficiency of our scheme, and extensive experiments corroborate the efficacy of our privacy attacks. Furthermore, the experimental results demonstrate that our scheme accelerates training speed while reducing communication overhead compared to the state-of-the-art PBFL schemes.
Updated: 2025-07-30 12:10:41
Domains: cs.CR
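The cosine-similarity detection component runs under homomorphic encryption in the scheme itself; the plaintext logic it protects can be sketched as comparing each user's gradient against a reference and discarding low-similarity updates. The coordinate-wise mean reference and the zero threshold below are illustrative assumptions:

```python
import math

# Plaintext sketch of cosine-similarity-based poisoning detection.
# The PBFL scheme performs this comparison under FHE; only the
# underlying logic is shown here. The reference gradient
# (coordinate-wise mean) and the 0.0 threshold are assumptions.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def filter_gradients(grads, threshold=0.0):
    dim = len(grads[0])
    ref = [sum(g[i] for g in grads) / len(grads) for i in range(dim)]
    return [g for g in grads if cosine(g, ref) > threshold]

grads = [
    [1.0, 2.0, 1.0],     # honest update
    [1.1, 1.9, 0.9],     # honest update
    [-1.0, -2.0, -1.0],  # sign-flipped (poisoned) update
]
kept = filter_gradients(grads)
```

A sign-flipped gradient points away from the reference, so its cosine similarity is negative and it is filtered out before aggregation.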
Skull-stripping induces shortcut learning in MRI-based Alzheimer's disease classification
Objectives: High classification accuracy of Alzheimer's disease (AD) from structural MRI has been achieved using deep neural networks, yet the specific image features contributing to these decisions remain unclear. In this study, the contributions of T1-weighted (T1w) gray-white matter texture, volumetric information, and preprocessing -- particularly skull-stripping -- were systematically assessed. Methods: A dataset of 990 matched T1w MRIs from AD patients and cognitively normal controls from the ADNI database were used. Preprocessing was varied through skull-stripping and intensity binarization to isolate texture and shape contributions. A 3D convolutional neural network was trained on each configuration, and classification performance was compared using exact McNemar tests with discrete Bonferroni-Holm correction. Feature relevance was analyzed using Layer-wise Relevance Propagation, image similarity metrics, and spectral clustering of relevance maps. Results: Despite substantial differences in image content, classification accuracy, sensitivity, and specificity remained stable across preprocessing conditions. Models trained on binarized images preserved performance, indicating minimal reliance on gray-white matter texture. Instead, volumetric features -- particularly brain contours introduced through skull-stripping -- were consistently used by the models. Conclusions: This behavior reflects a shortcut learning phenomenon, where preprocessing artifacts act as potentially unintended cues. The resulting Clever Hans effect emphasizes the critical importance of interpretability tools to reveal hidden biases and to ensure robust and trustworthy deep learning in medical imaging.
Updated: 2025-07-30 12:00:44
Domains: eess.IV,cs.CV,cs.LG
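The intensity-binarization step that isolates shape from texture can be illustrated on a toy 2D slice: mapping every nonzero voxel to 1 destroys gray-white matter texture while preserving the brain/background contour. The exact thresholding protocol in the study may differ; this is a minimal sketch:

```python
# Toy sketch of the intensity-binarization preprocessing used to
# separate texture from shape. Mapping any nonzero (brain) voxel to 1
# removes texture but keeps the contour; the study's exact protocol
# may differ.

def binarize(image):
    return [[1 if v > 0 else 0 for v in row] for row in image]

slice_ = [
    [0, 0, 0, 0],
    [0, 3, 7, 0],  # varying intensities = "texture"
    [0, 5, 2, 0],
    [0, 0, 0, 0],
]
mask = binarize(slice_)
```

If a classifier performs equally well on `mask` as on `slice_`, it is evidently not relying on the texture that binarization removed, which is the study's diagnostic logic.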
TempRe: Template generation for single and direct multi-step retrosynthesis
Retrosynthesis planning remains a central challenge in molecular discovery due to the vast and complex chemical reaction space. While traditional template-based methods offer tractability, they suffer from poor scalability and limited generalization, and template-free generative approaches risk generating invalid reactions. In this work, we propose TempRe, a generative framework that reformulates template-based approaches as sequence generation, enabling scalable, flexible, and chemically plausible retrosynthesis. We evaluated TempRe across single-step and multi-step retrosynthesis tasks, demonstrating its superiority over both template classification and SMILES-based generation methods. On the PaRoutes multi-step benchmark, TempRe achieves strong top-k route accuracy. Furthermore, we extend TempRe to direct multi-step synthesis route generation, providing a lightweight and efficient alternative to conventional single-step and search-based approaches. These results highlight the potential of template generative modeling as a powerful paradigm in computer-aided synthesis planning.
Updated: 2025-07-30 11:59:42
Domains: cs.LG
Compression Method for Deep Diagonal State Space Model Based on $H^2$ Optimal Reduction
Deep learning models incorporating linear state space models (SSMs) have gained attention for capturing long-range dependencies in sequential data. However, their large parameter sizes pose challenges for deployment on resource-constrained devices. In this study, we propose an efficient parameter-reduction method for these models by applying $H^{2}$ model order reduction techniques from control theory to their linear SSM components. In experiments on the LRA benchmark, model compression based on our proposed method outperforms an existing method based on Balanced Truncation, while successfully reducing the number of parameters in the SSMs to $1/32$ without sacrificing the performance of the original models.
Updated: 2025-07-30 11:57:54
Domains: cs.LG,cs.SY,eess.SY
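As background for the $H^{2}$ reduction above: for a diagonal SISO model $\dot{x} = \mathrm{diag}(a)\,x + b\,u$, $y = c\,x$ with stable real poles $a_i < 0$, each first-order mode $c_i b_i/(s - a_i)$ has squared $H^{2}$ norm $(c_i b_i)^2 / (2|a_i|)$. The sketch below keeps the top-$k$ modes by this per-mode energy; note this is a simple proxy for, not the paper's $H^{2}$-optimal, reduction, and the function names are illustrative:

```python
# Hedged sketch: per-mode H2 energy of a diagonal SISO state space
# model x' = diag(a) x + b u, y = c x, with stable real poles a_i < 0.
# Keeping the top-k modes by energy is a simple (sub-optimal) proxy
# for the H2-optimal reduction used in the paper.

def mode_energies(a, b, c):
    # squared H2 norm of each first-order mode c_i * b_i / (s - a_i)
    return [(ci * bi) ** 2 / (2.0 * abs(ai)) for ai, bi, ci in zip(a, b, c)]

def truncate(a, b, c, k):
    e = mode_energies(a, b, c)
    keep = sorted(sorted(range(len(a)), key=lambda i: e[i], reverse=True)[:k])
    return ([a[i] for i in keep], [b[i] for i in keep], [c[i] for i in keep])

a = [-1.0, -10.0, -100.0]   # slow poles carry the most H2 energy here
b = [1.0, 1.0, 1.0]
c = [1.0, 1.0, 1.0]
energies = mode_energies(a, b, c)
ra, rb, rc = truncate(a, b, c, k=2)  # drop the lowest-energy mode
```

The diagonal structure is what makes this cheap: each mode's contribution is independent, so no full-order Gramian computation is needed.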
AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence
Despite major advances in machine learning, current artificial intelligence systems continue to fall short of human-like general intelligence. While large language and reasoning models can generate fluent and coherent outputs, they lack the deep understanding and adaptive reasoning that characterize truly general intelligence. Existing evaluation frameworks, which are centered on broad language or perception tasks, fail to capture generality at its core and offer no guidance. The artificial general intelligence testbed (AGITB) is a novel and freely available benchmarking suite comprising twelve fully automatable tests designed to evaluate low-level cognitive precursors through binary signal prediction. AGITB requires models to forecast temporal sequences without pretraining, symbolic manipulation, or semantic grounding. The framework isolates core computational invariants - such as determinism, sensitivity, and generalization - that align with principles of biological information processing. Engineered to resist brute-force and memorization-based approaches, AGITB presumes no prior knowledge and demands learning from first principles. While humans pass all tests, no current AI system has met the full AGITB criteria, underscoring its potential as a rigorous, interpretable, and actionable benchmark for guiding and evaluating progress toward artificial general intelligence. A reference implementation of AGITB is available on GitHub.
Updated: 2025-07-30 11:42:12
Domains: cs.AI,I.2; D.2.8; I.2.6; I.5
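One of the computational invariants named above, determinism, can be sketched as a concrete check: a model fed the same binary input history twice must emit identical predictions. The predictor interface and the majority-vote stub below are assumptions for illustration, not AGITB's actual API:

```python
# Toy sketch of an AGITB-style determinism check on binary signal
# prediction: the same input history must yield the same predictions.
# The predictor interface and majority-vote stub are assumptions.

def run(model, sequence):
    preds = []
    for bit in sequence:
        preds.append(model.predict())  # predict next bit before seeing it
        model.observe(bit)
    return preds

def check_determinism(make_model, sequence):
    # Two fresh models, same history -> predictions must match exactly.
    return run(make_model(), sequence) == run(make_model(), sequence)

class MajorityPredictor:
    # Stub learner: predicts the majority bit seen so far (ties -> 0).
    def __init__(self):
        self.ones = 0
        self.total = 0
    def predict(self):
        return 1 if 2 * self.ones > self.total else 0
    def observe(self, bit):
        self.ones += bit
        self.total += 1

seq = [1, 1, 0, 1, 0, 1]
ok = check_determinism(MajorityPredictor, seq)
```

Because the harness only consumes raw bits, it presumes no pretraining or semantic grounding, matching the benchmark's signal-level framing.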
Deep learning of geometrical cell division rules
The positioning of new cellular walls during cell division plays a key role in shaping plant tissue organization. The influence of cell geometry on the positioning of division planes has been previously captured into various geometrical rules. Accordingly, linking cell shape to division orientation has relied on the comparison between observed division patterns and predictions under specific rules. The need to define a priori the tested rules is a fundamental limitation of this hypothesis-driven approach. As an alternative, we introduce a data-based approach to investigate the relation between cell geometry and division plane positioning, exploiting the ability of deep neural networks to learn complex relationships across multidimensional spaces. Adopting an image-based cell representation, we show how division patterns can be learned and predicted from mother cell geometry using a UNet architecture modified to operate on cell masks. Using synthetic data and A. thaliana embryo cells, we evaluate the model's performance on a wide range of diverse cell shapes and division patterns. We find that the trained model accounted for embryo division patterns that were previously irreconcilable under existing geometrical rules. Our work shows the potential of deep networks to understand cell division patterns and to generate new hypotheses on the control of cell division positioning.
Updated: 2025-07-30 11:41:42
Domains: cs.LG,q-bio.CB,q-bio.QM,I.2.6; I.6; J.3
MultiEditor: Controllable Multimodal Object Editing for Driving Scenarios Using 3D Gaussian Splatting Priors
Autonomous driving systems rely heavily on multimodal perception data to understand complex environments. However, the long-tailed distribution of real-world data hinders generalization, especially for rare but safety-critical vehicle categories. To address this challenge, we propose MultiEditor, a dual-branch latent diffusion framework designed to edit images and LiDAR point clouds in driving scenarios jointly. At the core of our approach is the introduction of 3D Gaussian Splatting (3DGS) as a structural and appearance prior for target objects. Leveraging this prior, we design a multi-level appearance control mechanism--comprising pixel-level pasting, semantic-level guidance, and multi-branch refinement--to achieve high-fidelity reconstruction across modalities. We further propose a depth-guided deformable cross-modality condition module that adaptively enables mutual guidance between modalities using 3DGS-rendered depth, significantly enhancing cross-modality consistency. Extensive experiments demonstrate that MultiEditor achieves superior performance in visual and geometric fidelity, editing controllability, and cross-modality consistency. Furthermore, generating rare-category vehicle data with MultiEditor substantially enhances the detection accuracy of perception models on underrepresented classes.
Updated: 2025-07-30 11:40:20
标题: 多编辑器:使用3D高斯喷溅先验进行驾驶场景的可控多模态对象编辑
摘要: 自动驾驶系统在理解复杂环境时严重依赖多模态感知数据。然而,真实世界数据的长尾分布阻碍了泛化能力,尤其是对于罕见但安全关键的车辆类别。为了解决这一挑战,我们提出了MultiEditor,这是一个双分支潜在扩散框架,旨在联合编辑驾驶场景中的图像和LiDAR点云。我们方法的核心是引入3D高斯喷溅(3DGS)作为目标对象的结构和外观先验。利用这一先验,我们设计了一个多层外观控制机制,包括像素级粘贴、语义级引导和多分支细化,以实现跨模态的高保真重建。我们进一步提出了一个深度引导的可变形跨模态条件模块,通过使用3DGS渲染的深度自适应地实现模态间的相互引导,显著增强了跨模态一致性。大量实验证明,MultiEditor在视觉和几何保真度、编辑可控性和跨模态一致性方面表现出优越性能。此外,使用MultiEditor生成罕见类别的车辆数据显著增强了感知模型在不常见类别上的检测准确性。
更新时间: 2025-07-30 11:40:20
领域: cs.AI
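The pixel-level pasting stage of the appearance-control mechanism amounts to a masked composite of 3DGS-rendered object pixels into the scene image; a minimal sketch (the semantic-level guidance and multi-branch refinement stages are not shown):

```python
import numpy as np

def paste_object(background, rendered, mask):
    """Pixel-level pasting: copy rendered object pixels into the scene
    wherever the object mask is set; everything else is untouched."""
    out = background.copy()
    out[mask] = rendered[mask]
    return out

# toy 4x4 RGB scene, a white "rendered object", and a 2x2 object mask
bg = np.zeros((4, 4, 3), dtype=np.uint8)
obj = np.full((4, 4, 3), 255, dtype=np.uint8)
m = np.zeros((4, 4), dtype=bool)
m[1:3, 1:3] = True
composite = paste_object(bg, obj, m)
```

In the full pipeline this coarse composite is only a starting point that the diffusion branches refine into a photometrically consistent edit.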
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this constraint, researchers have endeavored to integrate visual capabilities with LLMs, resulting in the emergence of Vision-Language Models (VLMs). These advanced models are instrumental in tackling more intricate tasks such as image captioning and visual question answering. In our comprehensive survey paper, we delve into the key advancements within the realm of VLMs. Our classification organizes VLMs into three distinct categories: models dedicated to vision-language understanding, models that process multimodal inputs to generate unimodal (textual) outputs, and models that both accept and produce multimodal inputs and outputs. This classification is based on their respective capabilities and functionalities in processing and generating various modalities of data. We meticulously dissect each model, offering an extensive analysis of its foundational architecture, training data sources, as well as its strengths and limitations wherever possible, providing readers with a comprehensive understanding of its essential components. We also analyze the performance of VLMs on various benchmark datasets. By doing so, we aim to offer a nuanced understanding of the diverse landscape of VLMs. Additionally, we underscore potential avenues for future research in this dynamic domain, anticipating further breakthroughs and advancements.
Updated: 2025-07-30 11:37:10
标题: 探索视觉-语言模型的前沿:当前方法论和未来方向的调查
摘要: 大型语言模型(LLMs)的出现显著改变了人工智能革命的轨迹。然而,这些LLMs表现出一个显著的局限性,即它们主要擅长处理文本信息。为了解决这一限制,研究人员致力于将视觉能力与LLMs整合,从而产生了视觉-语言模型(VLMs)。这些先进模型对于处理诸如图像描述和视觉问答等更复杂的任务至关重要。在我们的综合调查论文中,我们深入探讨了VLMs领域的关键进展。我们的分类将VLMs分为三个不同的类别:专注于视觉-语言理解的模型、处理多模态输入以生成单模态(文本)输出的模型以及既接受又产生多模态输入和输出的模型。这种分类基于它们在处理和生成各种数据模态方面的能力和功能。我们对每个模型进行了细致的解剖,对其基本架构、训练数据来源以及尽可能的优缺点进行了广泛分析,为读者提供了对其基本组成部分的全面理解。我们还分析了VLMs在各种基准数据集上的性能。通过这样做,我们旨在提供对VLMs多样化景观的细致理解。此外,我们强调了未来研究在这一充满活力领域中的潜在途径,期待进一步的突破和进展。
更新时间: 2025-07-30 11:37:10
领域: cs.CV,cs.AI,cs.CL
Unveiling the Influence of Amplifying Language-Specific Neurons
Language-specific neurons in LLMs that strongly correlate with individual languages have been shown to influence model behavior by deactivating them. However, their role in amplification remains underexplored. This work investigates the effect of amplifying language-specific neurons through interventions across 18 languages, including low-resource ones, using three models primarily trained in different languages. We compare amplification factors by their effectiveness in steering to the target language using a proposed Language Steering Shift (LSS) evaluation score, then evaluate it on downstream tasks: commonsense reasoning (XCOPA, XWinograd), knowledge (Include), and translation (FLORES). The optimal amplification factors effectively steer output toward nearly all tested languages. Intervention using this factor on downstream tasks improves self-language performance in some cases but generally degrades cross-language results. These findings highlight the effect of language-specific neurons in multilingual behavior, where amplification can be beneficial especially for low-resource languages, but provides limited advantage for cross-lingual transfer.
Updated: 2025-07-30 11:23:30
标题: 揭示放大语言特定神经元的影响
摘要: 在LLMs中与个别语言强相关的语言特定神经元已被证明可以通过将其停用来影响模型行为。然而,它们在放大方面的作用尚未得到充分探讨。本研究使用三个主要以不同语言训练的模型,通过对18种语言(包括低资源语言)进行干预,研究放大语言特定神经元的效果。我们通过提出的Language Steering Shift(LSS)评估分数,比较各放大因子在将输出引导到目标语言方面的有效性,然后在下游任务上进行评估:常识推理(XCOPA、XWinograd)、知识(Include)和翻译(FLORES)。最佳放大因子能有效地将输出引导到几乎所有测试过的语言。在某些情况下,使用该因子对下游任务进行干预可以改善自身语言的表现,但通常会降低跨语言结果。这些发现突显了语言特定神经元在多语言行为中的作用:放大对低资源语言尤其有益,但对跨语言迁移的优势有限。
更新时间: 2025-07-30 11:23:30
领域: cs.CL,cs.LG
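Conceptually, the amplification intervention reduces to scaling a fixed set of hidden units by a factor; a minimal sketch (the unit indices and factor below are made up, and the real intervention acts on identified neurons inside specific transformer layers):

```python
import numpy as np

def amplify_neurons(hidden, neuron_idx, alpha):
    """Scale the activations of selected language-specific neurons by a
    factor alpha, leaving all other units unchanged. alpha > 1 amplifies,
    alpha = 0 reproduces the deactivation setting from prior work."""
    out = np.array(hidden, dtype=float, copy=True)
    out[..., neuron_idx] *= alpha
    return out

# toy hidden state with 4 units; suppose units 1 and 3 are "target-language"
# neurons identified beforehand
h = np.array([[0.25, -1.0, 0.5, 0.25]])
h_amp = amplify_neurons(h, neuron_idx=[1, 3], alpha=2.0)
```

Sweeping `alpha` and scoring the generated text with an LSS-style metric is then how the optimal amplification factor would be selected.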
RePaCA: Leveraging Reasoning Large Language Models for Static Automated Patch Correctness Assessment
Automated Program Repair (APR) seeks to automatically correct software bugs without requiring human intervention. However, existing tools tend to generate patches that satisfy test cases without fixing the underlying bug, those are known as overfitting patches. To address this issue, Automated Patch Correctness Assessment (APCA) attempts to identify overfitting patches generated by APR tools. It can be solved as a static approach, meaning that no additional information is needed beyond the original and fixed code snippets. Current static techniques often struggle with reliability, flexibility and transparency. To address these issues, we introduce RePaCA, a novel static APCA technique that leverages Large Language Models (LLMs) specialized in thinking tasks. Our model is prompted with both buggy and fixed code snippets and guided to generate a Chain of Thought that analyses code differences, reasons about how the patch addresses the root cause, and ultimately provides a binary classification: correct or overfitting. To enhance these reasoning capabilities for the APCA task specifically, the LLM is finetuned using Reinforcement Learning with the Group Relative Policy Optimization algorithm. When evaluated on a standard Defects4J-derived test, our approach achieves state-of-the-art performance, with 83.1% accuracy and an 84.8% F1-score. Furthermore, our model demonstrates superior generalization capabilities when trained on different datasets, outperforming the leading technique. This reasoning capability also provides enhanced explainability for the patch assessment. These findings underscore the considerable promise of finetuned, reasoning LLMs to advance static APCA by enhancing accuracy, generalization, and explainability.
Updated: 2025-07-30 11:21:09
标题: RePaCA:利用推理大型语言模型进行静态自动补丁正确性评估
摘要: 自动程序修复(APR)旨在在无需人工干预的情况下自动修复软件错误。然而,现有工具往往生成满足测试用例但未修复潜在错误的补丁,这些被称为过拟合补丁。为了解决这个问题,自动补丁正确性评估(APCA)试图识别由APR工具生成的过拟合补丁。它可以作为一种静态方法来解决,这意味着除了原始和修复的代码片段之外不需要额外的信息。当前的静态技术通常在可靠性、灵活性和透明度方面存在困难。为了解决这些问题,我们介绍了一种新颖的静态APCA技术RePaCA,它利用专门用于思考任务的大型语言模型(LLMs)。我们的模型同时提示有错误和修复的代码片段,并引导其生成一串思考链,分析代码差异,推断补丁如何解决根本原因,并最终提供二元分类:正确或过拟合。为了增强这种推理能力,LLM专门针对APCA任务进行了微调,使用了群体相对策略优化算法进行强化学习。在标准的Defects4J衍生测试中进行评估时,我们的方法实现了最先进的性能,准确率为83.1%,F1分数为84.8%。此外,我们的模型在训练不同数据集时展现了卓越的泛化能力,优于主流技术。这种推理能力还为补丁评估提供了增强的可解释性。这些发现强调了微调、推理LLMs在提高静态APCA准确性、泛化性和可解释性方面的巨大潜力。
更新时间: 2025-07-30 11:21:09
领域: cs.SE,cs.AI
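The reinforcement learning step can be sketched via the group-relative advantage at the heart of Group Relative Policy Optimization (a minimal illustration; the 0/1 reward for a correct correct-vs-overfitting label is our assumption about the reward design):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO: each sampled response's
    reward is standardised against its own group's mean and std,
    avoiding the need for a separately learned value baseline."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# a group of 4 sampled patch assessments for one (buggy, fixed) pair:
# reward 1 if the final label matched the ground truth, else 0
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses with above-group-average rewards get positive advantages and are reinforced; the rest are pushed down.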
A Mean-Field Theory of $Θ$-Expectations
The canonical theory of sublinear expectations, a foundation of stochastic calculus under ambiguity, is insensitive to the non-convex geometry of primitive uncertainty models. This paper develops a new stochastic calculus for a structured class of such non-convex models. We introduce a class of fully coupled Mean-Field Forward-Backward Stochastic Differential Equations where the BSDE driver is defined by a pointwise maximization over a law-dependent, non-convex set. Mathematical tractability is achieved via a uniform strong concavity assumption on the driver with respect to the control variable, which ensures the optimization admits a unique and stable solution. A central contribution is to establish the Lipschitz stability of this optimizer from primitive geometric and regularity conditions, which underpins the entire well-posedness theory. We prove local and global well-posedness theorems for the FBSDE system. The resulting valuation functional, the $\Theta$-Expectation, is shown to be dynamically consistent and, most critically, to violate the axiom of sub-additivity. This, along with its failure to be translation invariant, demonstrates its fundamental departure from the convex paradigm. This work provides a rigorous foundation for stochastic calculus under a class of non-convex, endogenous ambiguity.
Updated: 2025-07-30 11:08:56
标题: 一个$Θ$-期望的平均场理论
摘要: 次线性期望的规范理论是模糊性下随机微积分的基础,但它对原始不确定性模型的非凸几何结构并不敏感。本文为一类结构化的非凸模型发展了一种新的随机微积分。我们引入了一类完全耦合的平均场正倒向随机微分方程,其中BSDE的驱动项由在一个依赖于分布律的非凸集合上的逐点最大化来定义。通过对驱动项关于控制变量的一致强凹性假设,实现了数学上的可处理性,这保证了优化问题具有唯一且稳定的解。一个核心贡献是基于原始的几何与正则性条件建立该优化器的利普希茨稳定性,这为整个适定性理论奠定了基础。我们证明了FBSDE系统的局部与全局适定性定理。所得的估值泛函,即$\Theta$-期望,被证明是动态一致的,而且最关键的是,它违反了次可加性公理。这一点,连同其不满足平移不变性,表明它与凸范式有根本性的不同。这项工作为一类非凸、内生模糊性下的随机微积分提供了严格的基础。
更新时间: 2025-07-30 11:08:56
领域: math.PR,cs.AI,cs.LG
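Schematically, the objects described can be written as follows (illustrative notation, not the paper's exact equations): the BSDE driver is a pointwise maximization over a law-dependent, non-convex set, and the resulting $\Theta$-Expectation fails the sub-additivity axiom that sublinear (convex-paradigm) expectations satisfy.

```latex
% Backward equation with driver defined by pointwise maximisation
% over a law-dependent, non-convex set \Theta(\mu_t):
-\,dY_t \;=\; \max_{\theta \in \Theta(\mu_t)}
      g\bigl(t, Y_t, Z_t, \theta\bigr)\,dt \;-\; Z_t\,dW_t,
\qquad Y_T = \xi .
% Sub-additivity, the axiom the \Theta-Expectation violates:
\mathcal{E}(X + Y) \;\le\; \mathcal{E}(X) + \mathcal{E}(Y).
```

The uniform strong concavity of $g$ in $\theta$ is what makes the maximizer unique and Lipschitz-stable, despite $\Theta(\mu_t)$ being non-convex.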
MLMC-based Resource Adequacy Assessment with Active Learning Trained Surrogate Models
Multilevel Monte Carlo (MLMC) is a flexible and effective variance reduction technique for accelerating reliability assessments of complex power systems. Recently, data-driven surrogate models have been proposed as lower-level models in the MLMC framework due to their high correlation and negligible execution time once trained. However, in resource adequacy assessments, pre-labeled datasets are typically unavailable. For large-scale systems, the efficiency gains from surrogate models are often offset by the substantial time required for labeling training data. Therefore, this paper introduces a speed metric that accounts for training time in evaluating MLMC efficiency. Considering that the total time budget is limited, a vote-by-committee active learning approach is proposed to reduce the number of required labeling calls. A case study demonstrates that, within a given computational budget, active learning in combination with MLMC can result in a substantial reduction in variance.
Updated: 2025-07-30 11:07:49
标题: 基于MLMC的资源充足性评估与主动学习训练的替代模型
摘要: 多层蒙特卡洛(MLMC)是一种灵活而有效的方差缩减技术,用于加速复杂电力系统的可靠性评估。最近,数据驱动的代理模型被提出作为MLMC框架中的低层级模型,因为它们具有很高的相关性,并且一旦训练完成,执行时间可以忽略不计。然而,在资源充足性评估中,通常无法获得预标记的数据集。对于大规模系统,代理模型带来的效率提升通常会被标记训练数据所需的大量时间所抵消。因此,本文引入了一个考虑训练时间的速度指标,用于评估MLMC的效率。考虑到总时间预算有限,提出了一种委员会投票主动学习方法,以减少所需的标记调用次数。一个案例研究表明,在给定的计算预算内,主动学习与MLMC相结合可以显著减少方差。
更新时间: 2025-07-30 11:07:49
领域: cs.LG
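With a trained surrogate as the lower level, MLMC reduces to a two-level estimator: sample the cheap surrogate heavily, then correct its bias with a small number of coupled expensive samples. A minimal sketch (the payoff and surrogate below are toy stand-ins, not the paper's power-system models):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlmc_two_level(f, g, n_coarse, n_fine, sampler):
    """Two-level MLMC estimator of E[f]: E[f] = E[g] + E[f - g].
    g (the surrogate) is sampled n_coarse times cheaply; the coupled
    correction f - g needs only n_fine expensive evaluations because
    a well-trained surrogate makes its variance small."""
    xs_c = sampler(n_coarse)
    xs_f = sampler(n_fine)
    return g(xs_c).mean() + (f(xs_f) - g(xs_f)).mean()

# toy "exact model" and a highly correlated trained surrogate
f = lambda x: np.maximum(x - 1.0, 0.0)        # expensive ground truth
g = lambda x: 0.9 * np.maximum(x - 1.0, 0.0)  # cheap surrogate
est = mlmc_two_level(f, g, n_coarse=100_000, n_fine=500,
                     sampler=lambda n: rng.normal(size=n))
```

The paper's speed metric additionally charges the surrogate's training (labeling) time against this variance gain, which is what the active learning step then tries to minimize.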
Hyperbolic Graph Learning: A Comprehensive Review
Graph representation learning in Euclidean space, despite its widespread adoption and proven utility in many domains, often struggles to effectively capture the inherent hierarchical and complex relational structures prevalent in real-world data, particularly for datasets exhibiting a highly non-Euclidean latent anatomy or power-law distributions. Hyperbolic geometry, with its constant negative curvature and exponential growth property, naturally accommodates such structures, offering a promising alternative for learning rich graph representations. This survey paper provides a comprehensive review of the rapidly evolving field of Hyperbolic Graph Learning (HGL). We systematically categorize and analyze existing methods broadly dividing them into (1) hyperbolic graph embedding-based techniques, (2) graph neural network-based hyperbolic models, and (3) emerging paradigms. Beyond methodologies, we extensively discuss diverse applications of HGL across multiple domains, including recommender systems, knowledge graphs, bioinformatics, and other relevant scenarios, demonstrating the broad applicability and effectiveness of hyperbolic geometry in real-world graph learning tasks. Most importantly, we identify several key challenges that serve as directions for advancing HGL, including handling complex data structures, developing geometry-aware learning objectives, ensuring trustworthy and scalable implementations, and integrating with foundation models, e.g., large language models. We highlight promising research opportunities in this exciting interdisciplinary area. A comprehensive repository can be found at https://github.com/digailab/awesome-hyperbolic-graph-learning.
Updated: 2025-07-30 11:05:02
标题: 双曲图学习:综述
摘要: 在欧几里得空间中的图表示学习,尽管被广泛采用并在许多领域证明了其效用,但往往难以有效地捕捉现实世界数据中普遍存在的层次化和复杂关系结构,特别是对于展现高度非欧几里得潜在解剖学或幂律分布的数据集。双曲几何具有恒定的负曲率和指数增长特性,自然地适应这些结构,为学习丰富的图表示提供了一个有前途的替代方案。本调查论文综述了快速发展的双曲图学习(HGL)领域。我们系统地分类和分析现有方法,广泛将它们分为(1)基于双曲图嵌入的技术,(2)基于图神经网络的双曲模型,以及(3)新兴范式。除了方法论,我们广泛讨论了HGL在多个领域的各种应用,包括推荐系统、知识图谱、生物信息学和其他相关场景,展示了双曲几何在现实世界图学习任务中的广泛适用性和有效性。最重要的是,我们确定了几个作为推动HGL发展方向的关键挑战,包括处理复杂数据结构、开发几何感知的学习目标、确保可信赖和可扩展的实现,以及与基础模型(例如大型语言模型)整合。我们强调了这一激动人心的跨学科领域中的有前景的研究机会。详尽的存储库可在https://github.com/digailab/awesome-hyperbolic-graph-learning找到。
更新时间: 2025-07-30 11:05:02
领域: cs.LG
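Many of the embedding methods surveyed rely on the geodesic distance of the Poincaré ball, whose exponential growth near the boundary is what gives hyperbolic space room for tree-like hierarchies. A minimal sketch of the standard formula (not specific to any one surveyed method):

```python
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare
    ball: d(u, v) = arcosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    su = 1.0 - np.dot(u, u)
    sv = 1.0 - np.dot(v, v)
    d2 = np.dot(u - v, u - v)
    return np.arccosh(1.0 + 2.0 * d2 / (su * sv + eps))

d = poincare_dist([0.0, 0.0], [0.5, 0.0])
print(d)  # ~1.0986 (= log 3): already farther than the Euclidean 0.5
```

Moving the second point toward the boundary makes the denominator vanish, so distances blow up exponentially where a tree's leaves are placed.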
COOkeD: Ensemble-based OOD detection in the era of zero-shot CLIP
Out-of-distribution (OOD) detection is an important building block in trustworthy image recognition systems as unknown classes may arise at test-time. OOD detection methods typically revolve around a single classifier, leading to a split in the research field between the classical supervised setting (e.g. ResNet18 classifier trained on CIFAR100) vs. the zero-shot setting (class names fed as prompts to CLIP). In both cases, an overarching challenge is that the OOD detection performance is implicitly constrained by the classifier's capabilities on in-distribution (ID) data. In this work, we show that given a little open-mindedness from both ends, remarkable OOD detection can be achieved by instead creating a heterogeneous ensemble - COOkeD combines the predictions of a closed-world classifier trained end-to-end on a specific dataset, a zero-shot CLIP classifier, and a linear probe classifier trained on CLIP image features. While bulky at first sight, this approach is modular, post-hoc and leverages the availability of pre-trained VLMs, thus introduces little overhead compared to training a single standard classifier. We evaluate COOkeD on popular CIFAR100 and ImageNet benchmarks, but also consider more challenging, realistic settings ranging from training-time label noise, to test-time covariate shift, to zero-shot shift which has been previously overlooked. Despite its simplicity, COOkeD achieves state-of-the-art performance and greater robustness compared to both classical and CLIP-based OOD detection methods. Code is available at https://github.com/glhr/COOkeD
Updated: 2025-07-30 11:02:38
标题: COOkeD:零样本CLIP时代的基于集成的OOD检测
摘要: Out-of-distribution(OOD)检测是可信图像识别系统中的重要组成部分,因为未知类别可能在测试时出现。OOD检测方法通常围绕单个分类器展开,导致研究领域在经典监督设置(例如在CIFAR100上训练的ResNet18分类器)与零样本设置(将类别名称作为提示输入到CLIP中)之间分裂。在这两种情况下,一个普遍的挑战是OOD检测性能隐含地受到分类器在分布内(ID)数据上的能力的限制。在这项工作中,我们展示了如果双方都保持开放态度,可以通过创建异构集成实现出色的OOD检测 - COOkeD结合了端到端在特定数据集上训练的封闭世界分类器的预测、零样本CLIP分类器以及在CLIP图像特征上训练的线性探针分类器。尽管乍一看庞大,这种方法是模块化的、事后的,并利用了预训练的VLMs的可用性,因此与训练单个标准分类器相比,引入的开销很小。我们在流行的CIFAR100和ImageNet基准测试上评估了COOkeD,但也考虑了更具挑战性、更现实的设置,从训练时标签噪声到测试时协变量偏移,再到之前被忽视的零样本偏移。尽管方法简单,COOkeD实现了最新技术水平的性能,并且与传统和基于CLIP的OOD检测方法相比具有更强的鲁棒性。代码可在https://github.com/glhr/COOkeD上找到。
更新时间: 2025-07-30 11:02:38
领域: cs.CV,cs.AI,cs.LG
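The heterogeneous-ensemble idea can be sketched by combining the three classifiers' class probabilities and scoring ID-ness by the maximum of the averaged distribution (a simplification: plain averaging and max-probability scoring are our assumptions about the combination rule):

```python
import numpy as np

def cooked_score(p_closed, p_zeroshot, p_probe):
    """COOkeD-style heterogeneous ensemble: average the class
    probabilities of a closed-world classifier, a zero-shot CLIP
    classifier, and a CLIP-feature linear probe, then use the maximum
    averaged probability as an ID-ness score (low => likely OOD)."""
    p = (p_closed + p_zeroshot + p_probe) / 3.0
    return p.max(axis=-1)

# two samples, three classes: the ensemble agrees confidently on the
# first sample, and is uncertain about the second
id_like  = cooked_score(np.array([0.90, 0.05, 0.05]),
                        np.array([0.80, 0.10, 0.10]),
                        np.array([0.85, 0.10, 0.05]))
ood_like = cooked_score(np.array([0.40, 0.30, 0.30]),
                        np.array([0.34, 0.33, 0.33]),
                        np.array([0.50, 0.25, 0.25]))
print(id_like > ood_like)  # True
```

Because each member fails differently under label noise, covariate shift, and zero-shot shift, the averaged score is more robust than any single classifier's confidence.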
Explaining Deep Network Classification of Matrices: A Case Study on Monotonicity
This work demonstrates a methodology for using deep learning to discover simple, practical criteria for classifying matrices based on abstract algebraic properties. By combining a high-performance neural network with explainable AI (XAI) techniques, we can distill a model's learned strategy into human-interpretable rules. We apply this approach to the challenging case of monotone matrices, defined by the condition that their inverses are entrywise nonnegative. Despite their simple definition, an easy characterization in terms of the matrix elements or the derived parameters is not known. Here, we present, to the best of our knowledge, the first systematic machine-learning approach for deriving a practical criterion that distinguishes monotone from non-monotone matrices. After establishing a labelled dataset by randomly generated monotone and non-monotone matrices uniformly on $(-1,1)$, we employ deep neural network algorithms for classifying the matrices as monotone or non-monotone, using both their entries and a comprehensive set of matrix features. By saliency methods, such as integrated gradients, we identify among all features, two matrix parameters which alone provide sufficient information for the matrix classification, with $95\%$ accuracy, namely the absolute values of the two lowest-order coefficients, $c_0$ and $c_1$ of the matrix's characteristic polynomial. A data-driven study of 18,000 random $7\times7$ matrices shows that the monotone class obeys $\lvert c_{0}/c_{1}\rvert\le0.18$ with probability $>99.98\%$; because $\lvert c_{0}/c_{1}\rvert = 1/\mathrm{tr}(A^{-1})$ for monotone $A$, this is equivalent to the simple bound $\mathrm{tr}(A^{-1})\ge5.7$.
Updated: 2025-07-30 10:55:44
标题: 解释矩阵的深度网络分类:单调性案例研究
摘要: 这项工作展示了一种利用深度学习发现简单、实用标准来分类矩阵的方法,这些标准基于抽象代数性质。通过将高性能神经网络与可解释人工智能(XAI)技术结合,我们可以将模型学到的策略提炼成人类可以解释的规则。我们将这种方法应用于具有挑战性的单调矩阵的情况,这些矩阵的定义是它们的逆矩阵逐元素非负。尽管它们的定义简单,但以矩阵元素或派生参数的形式进行简单刻画是未知的。在这里,据我们所知,我们提出了第一个系统的机器学习方法,用于推导出一个将单调矩阵与非单调矩阵区分开来的实际标准。通过在$(-1,1)$上均匀随机生成单调和非单调矩阵来建立一个标记数据集,我们利用深度神经网络算法,基于矩阵的元素和一套全面的矩阵特征,对这些矩阵进行单调或非单调分类。通过积分梯度等显著性方法,我们确定出在所有特征中,单独使用两个矩阵参数就足以提供矩阵分类所需的信息,准确率为95%,即矩阵特征多项式的两个最低阶系数$c_0$和$c_1$的绝对值。对18,000个随机生成的$7\times7$矩阵进行的数据驱动研究表明,单调类矩阵满足$\lvert c_{0}/c_{1}\rvert\le0.18$的概率超过99.98%;因为对于单调矩阵$A$,$\lvert c_{0}/c_{1}\rvert = 1/\mathrm{tr}(A^{-1})$,这等价于简单的界限$\mathrm{tr}(A^{-1})\ge5.7$。
更新时间: 2025-07-30 10:55:44
领域: cs.LG,cs.AI,cs.NA,math.NA,15B48, 68T07, 15A18, 62G32,I.2.6; I.5.2; G.1.3
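The identity $\lvert c_0/c_1\rvert = 1/\mathrm{tr}(A^{-1})$ quoted in the abstract follows from $c_0 = (-1)^n\det(A)$ and $c_1 = (-1)^{n-1}\det(A)\,\mathrm{tr}(A^{-1})$ for invertible $A$. The editor-added snippet below checks it numerically on a small monotone matrix; it is a verification sketch, not the paper's code.

```python
import numpy as np

# Editor's illustrative check (not the paper's code) of the reported identity
# |c0/c1| = 1/tr(A^{-1}) for a monotone matrix A, where c0, c1 are the two
# lowest-order coefficients of the characteristic polynomial.
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])           # inverse is (1/3)[[2,1],[1,2]] >= 0, so A is monotone
assert np.all(np.linalg.inv(A) >= 0)  # monotonicity: entrywise nonnegative inverse

coeffs = np.poly(A)                   # characteristic polynomial, highest degree first
c1, c0 = coeffs[-2], coeffs[-1]       # two lowest-order coefficients
lhs = abs(c0 / c1)
rhs = 1.0 / np.trace(np.linalg.inv(A))
print(lhs, rhs)                       # both 0.75 for this A
```

For this $2\times2$ example the characteristic polynomial is $\lambda^2 - 4\lambda + 3$, so $\lvert c_0/c_1\rvert = 3/4$, matching $1/\mathrm{tr}(A^{-1}) = 1/(4/3)$.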
Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
The tension between data privacy and model utility has become the defining bottleneck for the practical deployment of large language models (LLMs) trained on sensitive corpora including healthcare. Differentially private stochastic gradient descent (DP-SGD) guarantees formal privacy, yet it does so at a pronounced cost: gradients are forcibly clipped and perturbed with noise, degrading sample efficiency and final accuracy. Numerous variants have been proposed to soften this trade-off, but they all share a handicap: their control knobs are hard-coded, global, and oblivious to the evolving optimization landscape. Consequently, practitioners are forced either to over-spend privacy budget in pursuit of utility, or to accept mediocre models in order to stay within privacy constraints. We present RLDP, the first framework to cast DP optimization itself as a closed-loop control problem amenable to modern deep reinforcement learning (RL). RLDP continuously senses rich statistics of the learning dynamics and acts by selecting fine-grained per-parameter gradient-clipping thresholds as well as the magnitude of injected Gaussian noise. A soft actor-critic (SAC) hyper-policy is trained online during language model fine-tuning; it learns, from scratch, how to allocate the privacy budget where it matters and when it matters. Across more than 1,600 ablation experiments on GPT2-small, Llama-1B, Llama-3B, and Mistral-7B, RLDP delivers perplexity reductions of 1.3-30.5% (mean 5.4%) and an average 5.6% downstream utility gain. RLDP reaches each baseline's final utility after only 13-43% of the gradient-update budget (mean speed-up 71%), all while honoring the same ($\epsilon$, $\delta$)-DP contract and exhibiting equal or lower susceptibility to membership-inference and canary-extraction attacks.
Updated: 2025-07-30 10:46:53
标题: 通过强化学习实现高效的差分隐私微调LLMs
摘要: 数据隐私与模型效用之间的紧张关系已经成为在包括医疗保健在内的敏感语料库上训练的大型语言模型(LLMs)实际部署的关键瓶颈。差分隐私随机梯度下降(DP-SGD)保证了形式化的隐私,但代价显著:梯度被强制裁剪并加入噪声扰动,降低了样本效率和最终准确性。已经提出了许多变体来缓解这种权衡,但它们都有一个共同缺陷:它们的控制旋钮是硬编码的、全局的,并且对不断演化的优化景观毫不知情。因此,从业者被迫要么为追求效用而过度消耗隐私预算,要么为留在隐私约束之内而接受平庸的模型。我们提出了RLDP,这是第一个将DP优化本身视为适合现代深度强化学习(RL)的闭环控制问题的框架。RLDP不断感知学习动态的丰富统计量,并通过选择细粒度的逐参数梯度裁剪阈值以及注入的高斯噪声幅度来采取行动。在语言模型微调过程中在线训练一个软演员-评论家(SAC)超策略;它从头开始学习在何处以及何时分配隐私预算最为重要。在GPT2-small、Llama-1B、Llama-3B和Mistral-7B上进行的超过1600次消融实验中,RLDP实现了1.3-30.5%(平均5.4%)的困惑度降低和平均5.6%的下游效用增益。RLDP在仅使用梯度更新预算的13-43%(平均加速71%)后即达到各基线的最终效用,同时遵守相同的(ε,δ)-DP约定,并对成员推断和金丝雀提取攻击表现出相同或更低的易受性。
更新时间: 2025-07-30 10:46:53
领域: cs.LG,cs.AI,cs.CL
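For background on the knobs RLDP controls, the editor-added sketch below shows the core DP-SGD privatization step the abstract refers to: clip a gradient to an L2 threshold, then add Gaussian noise. The fixed values here are placeholders; RLDP's contribution is choosing these quantities adaptively per parameter group, which this sketch does not reproduce.

```python
import numpy as np

# Minimal DP-SGD-style step (background for the abstract, not RLDP itself):
# a gradient is clipped to an L2-norm threshold and perturbed with Gaussian
# noise. RLDP adaptively selects the threshold and noise scale; the fixed
# values below are illustrative only.
rng = np.random.default_rng(0)

def dp_privatize(grad, clip_threshold, noise_std):
    """Clip `grad` to L2 norm <= clip_threshold, then add Gaussian noise."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_threshold / max(norm, 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=grad.shape)

g = np.array([3.0, 4.0])              # L2 norm 5
private_g = dp_privatize(g, clip_threshold=1.0, noise_std=0.1)
print(private_g)
```

With `noise_std=0` the output is exactly the rescaled gradient of norm at most `clip_threshold`, which is the invariant the privacy accounting relies on.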
Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
Large Language Models (LLMs) demonstrate impressive capabilities across a wide range of tasks, yet their safety mechanisms remain susceptible to adversarial attacks that exploit cognitive biases -- systematic deviations from rational judgment. Unlike prior jailbreaking approaches focused on prompt engineering or algorithmic manipulation, this work highlights the overlooked power of multi-bias interactions in undermining LLM safeguards. We propose CognitiveAttack, a novel red-teaming framework that systematically leverages both individual and combined cognitive biases. By integrating supervised fine-tuning and reinforcement learning, CognitiveAttack generates prompts that embed optimized bias combinations, effectively bypassing safety protocols while maintaining high attack success rates. Experimental results reveal significant vulnerabilities across 30 diverse LLMs, particularly in open-source models. CognitiveAttack achieves a substantially higher attack success rate compared to the SOTA black-box method PAP (60.1% vs. 31.6%), exposing critical limitations in current defense mechanisms. These findings highlight multi-bias interactions as a powerful yet underexplored attack vector. This work introduces a novel interdisciplinary perspective by bridging cognitive science and LLM safety, paving the way for more robust and human-aligned AI systems.
Updated: 2025-07-30 10:40:53
标题: 利用协同认知偏差绕过LLMs中的安全措施
摘要: 大型语言模型(LLMs)在广泛任务上展示了令人印象深刻的能力,然而它们的安全机制仍然容易受到利用认知偏见的对抗攻击的影响 -- 认知偏见是对理性判断的系统性偏离。与之前专注于提示工程或算法操纵的越狱方法不同,这项工作突出了多种偏见相互作用在破坏LLM安全措施中被忽视的力量。我们提出了CognitiveAttack,这是一个新颖的红队框架,系统地利用个体和组合认知偏见。通过整合监督微调和强化学习,CognitiveAttack生成嵌入优化偏见组合的提示,有效地绕过安全协议,同时保持高攻击成功率。实验结果揭示了30个不同LLMs中的显著漏洞,特别是在开源模型中。与SOTA黑盒方法PAP相比,CognitiveAttack取得了显著更高的攻击成功率(60.1% vs. 31.6%),暴露了当前防御机制的关键局限。这些发现突出了多种偏见相互作用作为一种强大但尚未充分探索的攻击向量。这项工作通过搭建认知科学和LLM安全之间的桥梁引入了一种新颖的跨学科视角,为更稳健和与人类对齐的人工智能系统铺平了道路。
更新时间: 2025-07-30 10:40:53
领域: cs.CL,cs.AI,cs.LG
Rationale-guided Prompting for Knowledge-based Visual Question Answering
Recently, Large Language Models (LLMs) have been used for knowledge-based Visual Question Answering (VQA). Despite the encouraging results of previous studies, prior methods prompt LLMs to predict answers directly, neglecting intermediate thought processes. We argue that prior methods do not sufficiently activate the capacities of LLMs. We propose a framework called PLRH that Prompts LLMs with Rationale Heuristics for knowledge-based VQA. The PLRH prompts LLMs with Chain of Thought (CoT) to generate rationale heuristics, i.e., intermediate thought processes, and then leverages the rationale heuristics to inspire LLMs to predict answers. Experiments show that our approach outperforms the existing baselines by more than 2.2 and 2.1 on OK-VQA and A-OKVQA, respectively.
Updated: 2025-07-30 10:40:50
标题: 用于基于知识的视觉问答的理由引导提示
摘要: 最近,大型语言模型(LLMs)已被用于基于知识的视觉问答(VQA)。尽管先前研究取得了令人鼓舞的结果,但先前的方法促使LLMs直接预测答案,忽略了中间的思考过程。我们认为先前的方法并未充分激活LLMs的能力。我们提出了一个名为PLRH的框架,通过理由启发(Rationale Heuristics)提示LLMs进行基于知识的VQA。PLRH使用思维链(CoT)提示LLMs生成理由启发,即中间的思考过程,然后利用理由启发激发LLMs预测答案。实验表明,我们的方法在OK-VQA和A-OKVQA上分别超过现有基线2.2和2.1以上。
更新时间: 2025-07-30 10:40:50
领域: cs.CL,cs.AI
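The PLRH abstract describes a two-stage prompting pattern: elicit a rationale with a Chain-of-Thought prompt, then condition the answer on that rationale. The editor-added sketch below illustrates this shape only; `call_llm` is a placeholder, and both prompt templates are assumptions, not the paper's actual prompts.

```python
# Editor's sketch of the two-stage prompting pattern the abstract describes:
# first elicit a rationale with a Chain-of-Thought prompt, then feed that
# rationale back when asking for the final answer. `call_llm` is a stub, and
# the wording of both templates is assumed, not taken from the paper.
def plrh_answer(question, caption, call_llm):
    rationale_prompt = (
        f"Image: {caption}\nQuestion: {question}\n"
        "Let's think step by step about what knowledge is needed.")
    rationale = call_llm(rationale_prompt)       # stage 1: rationale heuristic
    answer_prompt = (
        f"Image: {caption}\nQuestion: {question}\n"
        f"Rationale: {rationale}\nAnswer:")
    return call_llm(answer_prompt)               # stage 2: rationale-conditioned answer

# Stub LLM for demonstration only.
reply = plrh_answer("What season is it?", "a snowy street",
                    lambda p: "winter" if "Rationale" in p else "snow implies winter")
print(reply)  # winter
```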
aLLoyM: A large language model for alloy phase diagram prediction
Large Language Models (LLMs) are general-purpose tools with wide-ranging applications, including in materials science. In this work, we introduce aLLoyM, a fine-tuned LLM specifically trained on alloy compositions, temperatures, and their corresponding phase information. To develop aLLoyM, we curated question-and-answer (Q&A) pairs for binary and ternary phase diagrams using the open-source Computational Phase Diagram Database (CPDDB) and assessments based on CALPHAD (CALculation of PHAse Diagrams). We fine-tuned Mistral, an open-source pre-trained LLM, for two distinct Q&A formats: multiple-choice and short-answer. Benchmark evaluations demonstrate that fine-tuning substantially enhances performance on multiple-choice phase diagram questions. Moreover, the short-answer model of aLLoyM exhibits the ability to generate novel phase diagrams from its components alone, underscoring its potential to accelerate the discovery of previously unexplored materials systems. To promote further research and adoption, we have publicly released the short-answer fine-tuned version of aLLoyM, along with the complete benchmarking Q&A dataset, on Hugging Face.
Updated: 2025-07-30 10:32:39
标题: aLLoyM:一种用于合金相图预测的大型语言模型
摘要: 大型语言模型(LLMs)是具有广泛应用的通用工具,包括在材料科学中的应用。在这项工作中,我们介绍了aLLoyM,这是一个经过微调的LLM,专门针对合金成分、温度及其对应的相信息进行训练。为了开发aLLoyM,我们利用开源计算相图数据库(CPDDB)和基于CALPHAD(CALculation of PHAse Diagrams)的评估,构建了用于二元和三元相图的问答(Q&A)对。我们针对两种不同的Q&A格式(多项选择和简答)对开源预训练LLM Mistral进行了微调。基准评估表明,微调显著提高了多项选择相图问题上的性能。此外,aLLoyM的简答模型表现出仅凭组分即可生成新相图的能力,凸显了它加速发现以前未探索的材料体系的潜力。为了促进进一步的研究和采用,我们已经在Hugging Face上公开发布了aLLoyM的简答微调版本,以及完整的基准测试Q&A数据集。
更新时间: 2025-07-30 10:32:39
领域: cond-mat.mtrl-sci,cs.AI
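To make the abstract's "multiple-choice Q&A pair" concrete, here is a hypothetical example of such a record. The field names, phrasing, and the chemistry of the example are editor assumptions for illustration; they are not taken from the released dataset's schema.

```python
import json

# Hypothetical multiple-choice Q&A pair of the kind the abstract describes.
# The schema ("question"/"choices"/"answer") and the alloy example are the
# editor's assumptions, not the actual aLLoyM dataset format.
qa_pair = {
    "question": "Which phases are present in a Cu-30 at.% Ni alloy at 1200 K?",
    "choices": ["A. Liquid", "B. FCC solid solution",
                "C. Liquid + FCC", "D. Two FCC phases"],
    "answer": "B",
}
serialized = json.dumps(qa_pair)          # one JSON record per training example
print(json.loads(serialized)["answer"])   # B
```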
VAR: Visual Analysis for Rashomon Set of Machine Learning Models' Performance
Evaluating the performance of closely matched machine learning (ML) models under specific conditions has long been a focus of researchers in the field of machine learning. The Rashomon set is a collection of closely matched ML models, encompassing a wide range of models with similar accuracies but different structures. Traditionally, the analysis of these sets has focused on vertical structural analysis, which involves comparing the corresponding features at various levels within the ML models. However, there has been a lack of effective visualization methods for horizontally comparing multiple models with specific features. We propose the VAR visualization solution. VAR uses visualization to perform comparisons of ML models within the Rashomon set. This solution combines heatmaps and scatter plots to facilitate the comparison. With the help of VAR, ML model developers can identify the optimal model under specific conditions and better understand the Rashomon set's overall characteristics.
Updated: 2025-07-30 10:31:20
标题: VAR:机器学习模型性能罗生门集的可视化分析
摘要: 评估在特定条件下性能相近的机器学习(ML)模型长期以来一直是机器学习领域研究人员的关注焦点。罗生门集(Rashomon set)是一组性能相近的ML模型,包括一系列准确度相似但结构不同的模型。传统上,对这些集合的分析主要集中在垂直结构分析上,即比较ML模型内部不同层次上对应的特征。然而,目前缺乏有效的可视化方法来水平比较具有特定特征的多个模型。我们提出了VAR可视化解决方案。VAR利用可视化来比较罗生门集内的ML模型。该解决方案结合了热图和散点图,以便于比较。借助VAR,ML模型开发人员可以在特定条件下确定最佳模型,并更好地了解罗生门集的整体特性。
更新时间: 2025-07-30 10:31:20
领域: cs.LG
DeepC4: Deep Conditional Census-Constrained Clustering for Large-scale Multitask Spatial Disaggregation of Urban Morphology
To understand our global progress for sustainable development and disaster risk reduction in many developing economies, two recent major initiatives - the Uniform African Exposure Dataset of the Global Earthquake Model (GEM) Foundation and the Modelling Exposure through Earth Observation Routines (METEOR) Project - implemented classical spatial disaggregation techniques to generate large-scale mapping of urban morphology using the information from various satellite imagery and its derivatives, geospatial datasets of the built environment, and subnational census statistics. However, the local discrepancy with well-validated census statistics and the propagated model uncertainties remain a challenge in such coarse-to-fine-grained mapping problems, specifically constrained by weak and conditional label supervision. Therefore, we present Deep Conditional Census-Constrained Clustering (DeepC4), a novel deep learning-based spatial disaggregation approach that incorporates local census statistics as cluster-level constraints while considering multiple conditional label relationships in a joint multitask learning of the patterns of satellite imagery. To demonstrate, compared to GEM and METEOR, we enhanced the quality of Rwandan maps of urban morphology, specifically building exposure and physical vulnerability, at the third-level administrative unit from the 2022 census. As the world approaches the conclusion of our global frameworks in 2030, our work has offered a new deep learning-based mapping technique towards a spatial auditing of our existing coarse-grained derived information at large scales.
Updated: 2025-07-30 10:25:39
标题: DeepC4:用于大规模多任务空间细分城市形态的深层条件普查约束聚类
摘要: 为了了解许多发展中经济体在可持续发展和减灾方面的全球进展,最近的两项重要倡议 - 全球地震模型(GEM)基金会的统一非洲暴露数据集和通过地球观测例程建模暴露(METEOR)项目 - 实施了经典的空间细分技术,利用各种卫星影像及其衍生产品、建成环境的地理空间数据集以及次国家级普查统计数据,生成了大规模的城市形态制图。然而,在这类从粗粒度到细粒度的制图问题中,与经过充分验证的普查统计数据之间的局部差异以及传播的模型不确定性仍然是一个挑战,特别是受到弱监督和条件标签监督的限制。因此,我们提出了深度条件普查约束聚类(DeepC4),这是一种新颖的基于深度学习的空间细分方法,它将本地普查统计数据作为聚类级约束,同时在对卫星影像模式的联合多任务学习中考虑多重条件标签关系。作为示范,与GEM和METEOR相比,我们基于2022年普查,在第三级行政单位上提升了卢旺达城市形态地图(特别是建筑暴露和物理脆弱性)的质量。随着世界在2030年临近全球框架的收官,我们的工作提供了一种新的基于深度学习的制图技术,可对现有的大规模粗粒度衍生信息进行空间审计。
更新时间: 2025-07-30 10:25:39
领域: cs.LG
RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning
Prompt-based continual learning provides a rehearsal-free solution by tuning small sets of parameters while keeping pre-trained models frozen. To meet the complex demands of sequential tasks, it is crucial to integrate task-specific knowledge within prompts effectively. However, existing works rely on either fixed learned prompts (i.e., prompts whose representations remain unchanged during new task learning) or on prompts generated from an entangled task-shared space, limiting the representational diversity of the integrated prompt. To address this issue, we propose a novel prompt-evolving mechanism to adaptively aggregate base prompts (i.e., task-specific prompts) into a unified prompt while ensuring diversity. By transforming and aligning base prompts, both previously learned and newly introduced, our approach continuously evolves accumulated knowledge to facilitate learning new tasks. We further introduce a learnable probabilistic gate that adaptively determines which layers to activate during the evolution process. We validate our method on image classification and video action recognition tasks in class-incremental learning, achieving average gains of 9.07% and 7.40% over existing methods across all scenarios.
Updated: 2025-07-30 10:25:28
标题: RainbowPrompt:多样性增强的提示进化,用于持续学习
摘要: 基于提示的持续学习通过在保持预训练模型冻结的同时仅调节少量参数,提供了一种无需复习的解决方案。为了满足顺序任务的复杂需求,有效地将任务特定知识整合到提示中至关重要。然而,现有工作要么依赖于固定的已学习提示(即,在新任务学习过程中其表示保持不变的提示),要么依赖于从纠缠的任务共享空间生成的提示,限制了整合提示的表示多样性。为了解决这个问题,我们提出了一种新颖的提示演化机制,将基本提示(即任务特定提示)自适应地聚合成一个统一的提示,同时确保多样性。通过转换和对齐基本提示,包括先前学习的和新引入的提示,我们的方法不断演化积累的知识,以促进新任务的学习。我们进一步引入了一个可学习的概率门,自适应地确定在演化过程中激活哪些层。我们在类增量学习的图像分类和视频动作识别任务上验证了我们的方法,在所有场景中相对现有方法分别取得了9.07%和7.40%的平均增益。
更新时间: 2025-07-30 10:25:28
领域: cs.CV,cs.AI,cs.LG
Thermodynamics-Inspired Computing with Oscillatory Neural Networks for Inverse Matrix Computation
We describe a thermodynamics-inspired computing paradigm based on oscillatory neural networks (ONNs). While ONNs have been widely studied as Ising machines for tackling complex combinatorial optimization problems, this work investigates their feasibility in solving linear algebra problems, specifically matrix inversion. Grounded in thermodynamic principles, we analytically demonstrate that the linear approximation of the coupled Kuramoto oscillator model leads to the inverse matrix solution. Numerical simulations validate the theoretical framework, and we examine the parameter regimes in which the computation achieves the highest accuracy.
Updated: 2025-07-30 10:16:55
标题: 受热力学启发的振荡神经网络在逆矩阵计算中的应用
摘要: 我们描述了一种基于振荡神经网络(ONNs)的受热力学启发的计算范式。虽然ONNs作为解决复杂组合优化问题的伊辛机已被广泛研究,但这项工作考察了它们求解线性代数问题(特别是逆矩阵)的可行性。基于热力学原理,我们解析地证明了耦合库拉莫托(Kuramoto)振荡器模型的线性近似可得到逆矩阵解。数值模拟验证了该理论框架,我们还考察了计算精度最高的参数范围。
更新时间: 2025-07-30 10:16:55
领域: cs.LG,cs.ET
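One way to read the abstract's claim is that a linearized coupled-oscillator relaxation $\dot{x} = b - Ax$ settles at the fixed point $x^* = A^{-1}b$ (for positive-definite $A$), so driving the system with each basis vector recovers the columns of $A^{-1}$. The editor-added sketch below demonstrates that reading with a plain explicit-Euler integration; the paper's actual Kuramoto-based scheme and parameter mapping may differ.

```python
import numpy as np

# Editor's sketch: relax the linearized dynamics dx/dt = b - A x to its fixed
# point x* = A^{-1} b for a positive-definite A; one basis vector per run
# yields one column of A^{-1}. Explicit Euler is illustrative only.
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
dt, steps = 0.01, 5000

inv_cols = []
for i in range(2):
    b = np.eye(2)[i]                   # drive with a standard basis vector
    x = np.zeros(2)
    for _ in range(steps):
        x += dt * (b - A @ x)          # relax toward A^{-1} b
    inv_cols.append(x)

approx_inv = np.column_stack(inv_cols)
print(np.round(approx_inv, 3))         # approx [[0.667, 0.333], [0.333, 0.667]]
```

Stability of the relaxation requires `dt` below $2/\lambda_{\max}(A)$; here the eigenvalues are 1 and 3, so `dt = 0.01` converges comfortably.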
Pre-trained Models Perform the Best When Token Distributions Follow Zipf's Law
Tokenization is a fundamental step in natural language processing (NLP) and other sequence modeling domains, where the choice of vocabulary size significantly impacts model performance. Despite its importance, selecting an optimal vocabulary size remains underexplored, typically relying on heuristics or dataset-specific choices. In this work, we propose a principled method for determining the vocabulary size by analyzing token frequency distributions through Zipf's law. We show that downstream task performance correlates with how closely token distributions follow power-law behavior, and that aligning with Zipfian scaling improves both model efficiency and effectiveness. Extensive experiments across NLP, genomics, and chemistry demonstrate that models consistently achieve peak performance when the token distribution closely adheres to Zipf's law, establishing Zipfian alignment as a robust and generalizable criterion for vocabulary size selection.
Updated: 2025-07-30 10:16:23
Categories: cs.LG,cs.CL,I.2.6; I.2.7
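A simple proxy for the "Zipfian alignment" the abstract describes (the paper's exact criterion may differ) is the R² of a linear fit of log-frequency against log-rank; a distribution that follows Zipf's law is a straight line of slope near -1 in log-log space:

```python
import numpy as np

def zipf_alignment(frequencies):
    """Slope and R^2 of a log-log linear fit of token frequency vs. rank.
    R^2 near 1.0 indicates a distribution close to a power law."""
    freqs = np.sort(np.asarray(frequencies, dtype=float))[::-1]
    ranks = np.arange(1, len(freqs) + 1)
    x, y = np.log(ranks), np.log(freqs)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - residuals.var() / y.var()
    return slope, r2

# An ideal Zipfian vocabulary: frequency proportional to 1/rank.
slope, r2 = zipf_alignment([1000 / r for r in range(1, 501)])
print(round(slope, 2), round(r2, 3))   # -1.0 1.0
```

Under the paper's criterion, one would compute such a statistic over the token counts produced by each candidate vocabulary size and prefer the size whose distribution fits best.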
Training language models to be warm and empathetic makes them less reliable and more sycophantic
Artificial intelligence (AI) developers are increasingly building language models with warm and empathetic personas that millions of people now use for advice, therapy, and companionship. Here, we show how this creates a significant trade-off: optimizing language models for warmth undermines their reliability, especially when users express vulnerability. We conducted controlled experiments on five language models of varying sizes and architectures, training them to produce warmer, more empathetic responses, then evaluating them on safety-critical tasks. Warm models showed substantially higher error rates (+10 to +30 percentage points) than their original counterparts, promoting conspiracy theories, providing incorrect factual information, and offering problematic medical advice. They were also significantly more likely to validate incorrect user beliefs, particularly when user messages expressed sadness. Importantly, these effects were consistent across different model architectures, and occurred despite preserved performance on standard benchmarks, revealing systematic risks that current evaluation practices may fail to detect. As human-like AI systems are deployed at an unprecedented scale, our findings indicate a need to rethink how we develop and oversee these systems that are reshaping human relationships and social interaction.
Updated: 2025-07-30 10:11:59
Categories: cs.CL,cs.AI,cs.CY
A surrogate model for topology optimisation of elastic structures via parametric autoencoders
A surrogate-based topology optimisation algorithm for linear elastic structures under parametric loads and boundary conditions is proposed. Instead of learning the parametric solution of the state (and adjoint) problems or the optimisation trajectory as a function of the iterations, the proposed approach devises a surrogate version of the entire optimisation pipeline. First, the method predicts a quasi-optimal topology for a given problem configuration as a surrogate model of high-fidelity topologies optimised with the homogenisation method. This is achieved by means of a feed-forward net learning the mapping between the input parameters characterising the system setup and a latent space determined by encoder/decoder blocks reducing the dimensionality of the parametric topology optimisation problem and reconstructing a high-dimensional representation of the topology. Then, the predicted topology is used as an educated initial guess for a computationally efficient algorithm penalising the intermediate values of the design variable, while enforcing the governing equations of the system. This step allows the method to correct potential errors introduced by the surrogate model, eliminate artifacts, and refine the design in order to produce topologies consistent with the underlying physics. Different architectures are proposed and the approximation and generalisation capabilities of the resulting models are numerically evaluated. The quasi-optimal topologies allow the method to outperform the high-fidelity optimiser by reducing the average number of optimisation iterations by $53\%$ while achieving discrepancies below $4\%$ in the optimal value of the objective functional, even in the challenging scenario of testing the model to extrapolate beyond the training and validation domain.
Updated: 2025-07-30 10:07:42
Categories: math.NA,cs.AI,cs.CE,cs.LG,cs.NA,math.OC,49M41, 74P05, 74P15, 74S05, 65M60, 65M30
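The surrogate's first stage can be pictured as two composed maps: a feed-forward net from problem parameters into the decoder's latent space, and a decoder from latent space to a density field. The following is a shapes-only sketch with untrained random weights; the layer sizes, input parameters, and 32x32 grid are all made up for illustration and are not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, sizes):
    """Tiny untrained feed-forward net: random weights, tanh activations."""
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) * 0.1
        x = np.tanh(x @ W)
    return x

# Stage 1: map problem parameters (e.g. load angle, magnitude, support
# position -- hypothetical inputs) into a 16-d latent space.
params = np.array([[0.3, 1.0, 0.5]])         # one problem configuration
latent = mlp(params, [3, 32, 16])

# Stage 2: decode the latent vector into a 32x32 material-density field,
# squashed to (0, 1) as the design variable of the topology.
field = mlp(latent, [16, 128, 32 * 32]).reshape(32, 32)
topology = 1.0 / (1.0 + np.exp(-field))      # sigmoid -> density in (0, 1)

print(topology.shape)   # (32, 32)
```

In the actual pipeline this predicted field would seed the physics-constrained refinement step that penalises intermediate densities while enforcing the governing equations.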
Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards
Various (text) prompt filters and (image) safety checkers have been implemented to mitigate the misuse of Text-to-Image (T2I) models in creating Not-Safe-For-Work (NSFW) content. In order to expose potential security vulnerabilities of such safeguards, multimodal jailbreaks have been studied. However, existing jailbreaks are limited to prompt-specific and image-specific perturbations, which suffer from poor scalability and time-consuming optimization. To address these limitations, we propose Universally Unfiltered and Unseen (U3)-Attack, a multimodal jailbreak attack method against T2I safeguards. Specifically, U3-Attack optimizes an adversarial patch on the image background to universally bypass safety checkers and optimizes a safe paraphrase set from a sensitive word to universally bypass prompt filters while eliminating redundant computations. Extensive experimental results demonstrate the superiority of our U3-Attack on both open-source and commercial T2I models. For example, on the commercial Runway-inpainting model with both prompt filter and safety checker, our U3-Attack achieves $\sim 4\times$ higher success rates than the state-of-the-art multimodal jailbreak attack, MMA-Diffusion. Content Warning: This paper includes examples of NSFW content.
Updated: 2025-07-30 10:06:38
Categories: cs.CR,cs.CV,cs.MM
Scalable and (quantum-accessible) adaptive pseudorandom quantum states and pseudorandom function-like quantum state generators
Pseudorandom quantum states (PRS) and pseudorandom function-like quantum state (PRFS) generators are quantum analogues of pseudorandom generators and pseudorandom functions. It is known that PRS (and PRFS) can exist even if BQP = QMA (relative to a quantum oracle) or if P = NP (relative to a classical oracle), which does not allow for the existence of one-way functions (relative to these oracles). Hence, these are potentially weaker objects than quantum-secure one-way functions, which can be used to do quantum cryptography. A desirable property of PRS and PRFS constructions is scalability, which ensures that the security parameter $\lambda$ (which determines indistinguishability from their Haar-random counterparts) is much larger than $n$ (the number of qubits of the output states). This may be important in some applications where PRS and PRFS primitives are used. We present an isometric procedure to prepare quantum states that can be arbitrarily random (i.e., the trace distance from the Haar-random state can be arbitrarily small for the true random case, or the distinguishing advantage can be arbitrarily small for the pseudorandom case). Our procedure provides a new method for scalable PRS that introduces no entanglement or correlations with the environment. This naturally gives the first construction for scalable and (quantum-accessible) adaptive PRFS assuming quantum-secure one-way functions. Our PRFS construction implies various primitives, including long-input PRFS, short-input PRFS, short-output PRFS, non-adaptive PRFS, and classical-accessible adaptive PRFS. This new construction may be helpful in some simplification of the microcrypt zoo.
Updated: 2025-07-30 10:02:45
Categories: quant-ph,cs.CR
CliCARE: Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records
Large Language Models (LLMs) hold significant promise for improving clinical decision support and reducing physician burnout by synthesizing complex, longitudinal cancer Electronic Health Records (EHRs). However, their implementation in this critical field faces three primary challenges: the inability to effectively process the extensive length and multilingual nature of patient records for accurate temporal analysis; a heightened risk of clinical hallucination, as conventional grounding techniques such as Retrieval-Augmented Generation (RAG) do not adequately incorporate process-oriented clinical guidelines; and unreliable evaluation metrics that hinder the validation of AI systems in oncology. To address these issues, we propose CliCARE, a framework for Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records. The framework operates by transforming unstructured, longitudinal EHRs into patient-specific Temporal Knowledge Graphs (TKGs) to capture long-range dependencies, and then grounding the decision support process by aligning these real-world patient trajectories with a normative guideline knowledge graph. This approach provides oncologists with evidence-grounded decision support by generating a high-fidelity clinical summary and an actionable recommendation. We validated our framework using large-scale, longitudinal data from a private Chinese cancer dataset and the public English MIMIC-IV dataset. In these diverse settings, CliCARE significantly outperforms strong baselines, including leading long-context LLMs and Knowledge Graph-enhanced RAG methods. The clinical validity of our results is supported by a robust evaluation protocol, which demonstrates a high correlation with assessments made by expert oncologists.
Updated: 2025-07-30 10:02:16
Categories: cs.CL,cs.AI
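A minimal stand-in for the trajectory-to-guideline alignment described above (the paper aligns temporal knowledge graphs; the event names and the subsequence check here are purely hypothetical) is to test whether a guideline's ordered steps occur, in order, within a patient's time-ordered event sequence:

```python
def is_guideline_consistent(trajectory, guideline):
    """Check whether the guideline's ordered steps appear as a
    subsequence of the patient's time-ordered event trajectory."""
    events = [e for _, e in sorted(trajectory)]   # sort events by timestamp
    it = iter(events)
    # 'step in it' advances the iterator, so matches must occur in order.
    return all(step in it for step in guideline)

# Hypothetical longitudinal record: (day, event) pairs from an EHR.
patient = [(0, "diagnosis"), (7, "staging"), (14, "chemotherapy"),
           (90, "imaging"), (120, "surgery")]
# Hypothetical normative guideline: staging must precede treatment.
guideline = ["diagnosis", "staging", "chemotherapy", "surgery"]

print(is_guideline_consistent(patient, guideline))               # True
print(is_guideline_consistent(patient, ["surgery", "staging"]))  # False
```

The graph-based version additionally carries relations and attributes on nodes and edges, which is what lets the framework ground a generated summary in both the patient trajectory and the guideline.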
HRVVS: A High-resolution Video Vasculature Segmentation Network via Hierarchical Autoregressive Residual Priors
The segmentation of the hepatic vasculature in surgical videos holds substantial clinical significance in the context of hepatectomy procedures. However, owing to the dearth of an appropriate dataset and the inherently complex task characteristics, little research has been reported in this domain. To address this issue, we first introduce a high-quality, frame-by-frame annotated hepatic vasculature dataset containing 35 long hepatectomy videos and 11442 high-resolution frames. On this basis, we propose a novel high-resolution video vasculature segmentation network, dubbed HRVVS. We innovatively embed a pretrained visual autoregressive modeling (VAR) model into different layers of the hierarchical encoder as prior information to reduce the information degradation generated during the downsampling process. In addition, we design a dynamic memory decoder on a multi-view segmentation network to minimize the transmission of redundant information while preserving more details between frames. Extensive experiments on surgical video datasets demonstrate that our proposed HRVVS significantly outperforms the state-of-the-art methods. The source code and dataset will be publicly available at \href{https://github.com/scott-yjyang/xx}{https://github.com/scott-yjyang/HRVVS}.
Updated: 2025-07-30 09:57:38
Categories: cs.CV,cs.AI
Accident-Driven Congestion Prediction and Simulation: An Explainable Framework Using Advanced Clustering and Bayesian Networks
Traffic congestion due to uncertainties, such as accidents, is a significant issue in urban areas, as the ripple effect of accidents causes longer delays, increased emissions, and safety concerns. To address this issue, we propose a robust framework for predicting the impact of accidents on congestion. We implement Automated Machine Learning (AutoML)-enhanced Deep Embedding Clustering (DEC) to assign congestion labels to accident data and predict congestion probability using a Bayesian Network (BN). The Simulation of Urban Mobility (SUMO) simulation is utilized to evaluate the correctness of BN predictions using evidence-based scenarios. Results demonstrate that the AutoML-enhanced DEC has outperformed traditional clustering approaches. The performance of the proposed BN model achieved an overall accuracy of 95.6%, indicating its ability to understand the complex relationship of accidents causing congestion. Validation in SUMO with evidence-based scenarios demonstrated that the BN model's prediction of congestion states closely matches those of SUMO, indicating the high reliability of the proposed BN model in ensuring smooth urban mobility.
Updated: 2025-07-30 09:57:08
Categories: cs.LG,cs.AI
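To make the Bayesian-network step concrete, here is a toy two-node network (Accident -> Congestion) with made-up conditional probabilities, queried by explicit enumeration; the paper's network is learned from clustered accident data and has many more variables:

```python
# Toy CPTs (hypothetical numbers, for illustration only).
p_accident = {True: 0.1, False: 0.9}
p_congestion_given = {True: 0.8, False: 0.2}   # P(congestion | accident)

def p_congestion(evidence=None):
    """Probability of congestion, marginalising over the accident
    variable, or conditioning on it when evidence is supplied."""
    states = [evidence] if evidence is not None else [True, False]
    num = sum(p_accident[a] * p_congestion_given[a] for a in states)
    den = sum(p_accident[a] for a in states)
    return num / den

print(round(p_congestion(), 2))        # prior: 0.1*0.8 + 0.9*0.2 = 0.26
print(round(p_congestion(True), 2))    # accident observed: 0.8
```

The SUMO validation step then amounts to checking that such evidence-conditioned predictions match the congestion states observed when the corresponding scenario is simulated.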
FGFP: A Fractional Gaussian Filter and Pruning for Deep Neural Networks Compression
Network compression techniques have become increasingly important in recent years because the loads of Deep Neural Networks (DNNs) are heavy for edge devices in real-world applications. While many methods compress neural network parameters, deploying these models on edge devices remains challenging. To address this, we propose the fractional Gaussian filter and pruning (FGFP) framework, which integrates fractional-order differential calculus and Gaussian function to construct fractional Gaussian filters (FGFs). To reduce the computational complexity of fractional-order differential operations, we introduce Gr\"unwald-Letnikov fractional derivatives to approximate the fractional-order differential equation. The number of parameters for each kernel in FGF is minimized to only seven. Beyond the architecture of Fractional Gaussian Filters, our FGFP framework also incorporates Adaptive Unstructured Pruning (AUP) to achieve higher compression ratios. Experiments on various architectures and benchmarks show that our FGFP framework outperforms recent methods in accuracy and compression. On CIFAR-10, ResNet-20 achieves only a 1.52% drop in accuracy while reducing the model size by 85.2%. On ImageNet2012, ResNet-50 achieves only a 1.63% drop in accuracy while reducing the model size by 69.1%.
Updated: 2025-07-30 09:56:18
Categories: cs.LG,cs.CV
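The Grünwald-Letnikov approximation mentioned above replaces the fractional derivative with a weighted sum over past samples, with weights given by the recurrence w_0 = 1, w_k = w_{k-1}(1 - (alpha+1)/k). A quick sanity check (this sketch is the discretisation itself, not the paper's filter construction): at alpha = 1 the weights collapse to (1, -1, 0, ...), i.e. an ordinary first difference:

```python
import numpy as np

def gl_fractional_diff(signal, alpha):
    """Grunwald-Letnikov fractional difference of order alpha.
    Weights: w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k)."""
    n = len(signal)
    w = np.ones(n)
    for k in range(1, n):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    out = np.zeros(n)
    for i in range(n):
        out[i] = np.dot(w[: i + 1], signal[i::-1])  # sum_k w_k * f[i-k]
    return out

x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
print(gl_fractional_diff(x, alpha=1.0))   # order 1 -> first differences
```

Intermediate alpha values interpolate smoothly between the identity (alpha = 0) and the first difference (alpha = 1), which is what gives the fractional Gaussian kernels their tunable shape with few parameters.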
HGCN(O): A Self-Tuning GCN HyperModel Toolkit for Outcome Prediction in Event-Sequence Data
We propose HGCN(O), a self-tuning toolkit using Graph Convolutional Network (GCN) models for event sequence prediction. Featuring four GCN architectures (O-GCN, T-GCN, TP-GCN, TE-GCN) across the GCNConv and GraphConv layers, our toolkit integrates multiple graph representations of event sequences with different choices of node- and graph-level attributes, and encodes temporal dependencies via edge weights, optimising prediction accuracy and stability for balanced and unbalanced datasets. Extensive experiments show that GCNConv models excel on unbalanced data, while all models perform consistently on balanced data. Experiments also confirm the superior performance of HGCN(O) over traditional approaches. Applications include Predictive Business Process Monitoring (PBPM), which predicts future events or states of a business process based on event logs.
Updated: 2025-07-30 09:50:43
Domains: cs.LG,I.2.6
Recognizing Actions from Robotic View for Natural Human-Robot Interaction
Natural Human-Robot Interaction (N-HRI) requires robots to recognize human actions at varying distances and states, regardless of whether the robot itself is in motion or stationary. This setup is more flexible and practical than conventional human action recognition tasks. However, existing benchmarks designed for traditional action recognition fail to address the unique complexities in N-HRI due to limited data, modalities, task categories, and diversity of subjects and environments. To address these challenges, we introduce ACTIVE (Action from Robotic View), a large-scale dataset tailored specifically for perception-centric robotic views prevalent in mobile service robots. ACTIVE comprises 30 composite action categories, 80 participants, and 46,868 annotated video instances, covering both RGB and point cloud modalities. Participants performed various human actions in diverse environments at distances ranging from 3m to 50m, while the camera platform was also mobile, simulating real-world scenarios of robot perception with varying camera heights due to uneven ground. This comprehensive and challenging benchmark aims to advance action and attribute recognition research in N-HRI. Furthermore, we propose ACTIVE-PC, a method that accurately perceives human actions at long distances using Multilevel Neighborhood Sampling, Layered Recognizers, Elastic Ellipse Query, and precise decoupling of kinematic interference from human actions. Experimental results demonstrate the effectiveness of ACTIVE-PC. Our code is available at: https://github.com/wangzy01/ACTIVE-Action-from-Robotic-View.
Updated: 2025-07-30 09:48:34
Domains: cs.CV,cs.AI,cs.RO
Rethinking Individual Fairness in Deepfake Detection
Generative AI models have substantially improved the realism of synthetic media, yet their misuse through sophisticated DeepFakes poses significant risks. Despite recent advances in deepfake detection, fairness remains inadequately addressed, enabling deepfake makers to exploit biases against specific populations. While previous studies have emphasized group-level fairness, individual fairness (i.e., ensuring similar predictions for similar individuals) remains largely unexplored. In this work, we identify for the first time that the original principle of individual fairness fundamentally fails in the context of deepfake detection, revealing a critical gap previously unexplored in the literature. To mitigate it, we propose the first generalizable framework that can be integrated into existing deepfake detectors to enhance individual fairness and generalization. Extensive experiments conducted on leading deepfake datasets demonstrate that our approach significantly improves individual fairness while maintaining robust detection performance, outperforming state-of-the-art methods. The code is available at https://github.com/Purdue-M2/Individual-Fairness-Deepfake-Detection.
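For context, the "original principle of individual fairness" the abstract refers to is commonly stated as a Lipschitz condition: similar inputs should receive similar predictions. A toy checker of that classic formulation (our own illustrative code and names, not the paper's framework) might look like this:

```python
import numpy as np

def individual_fairness_violations(preds, feats, lipschitz=1.0):
    """Count pairs (i, j) violating |f(x_i) - f(x_j)| <= L * d(x_i, x_j),
    the classic Lipschitz formulation of individual fairness."""
    preds = np.asarray(preds, dtype=float)
    feats = np.asarray(feats, dtype=float)
    violations = 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            d = np.linalg.norm(feats[i] - feats[j])  # similarity metric d
            if abs(preds[i] - preds[j]) > lipschitz * d + 1e-12:
                violations += 1
    return violations
```

A degenerate case shows why this definition is delicate for deepfake detection: two visually near-identical faces (d close to 0), one real and one fake, legitimately deserve very different scores, yet the Lipschitz condition flags that pair as unfair.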
Updated: 2025-07-30 09:45:55
Domains: cs.LG,cs.CY
SmilesT5: Domain-specific pretraining for molecular language models
Molecular property prediction is an increasingly critical task within drug discovery and development. Typically, neural networks can learn molecular properties using graph-based, language-based or feature-based methods. Recent advances in natural language processing have highlighted the capabilities of neural networks to learn complex human language using masked language modelling. These approaches to training large transformer-based deep learning models have also been used to learn the language of molecules, as represented by simplified molecular-input line-entry system (SMILES) strings. Here, we present novel domain-specific text-to-text pretraining tasks that yield improved performance in six classification-based molecular property prediction benchmarks, relative to both traditional likelihood-based training and previously proposed fine-tuning tasks. Through ablation studies, we show that data and computational efficiency can be improved by using these domain-specific pretraining tasks. Finally, the pretrained embeddings from the model can be used as fixed inputs into a downstream machine learning classifier and yield comparable performance to finetuning but with much lower computational overhead.
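As a hedged illustration of the masked-language-modelling idea this abstract builds on (the paper's actual domain-specific pretraining tasks are not reproduced here; the character-level tokenisation and all names below are ours), a minimal corruption step for a tokenised SMILES string could be sketched as:

```python
import random

def mask_smiles_tokens(tokens, mask_rate=0.15, mask_token="<mask>", seed=0):
    """Toy MLM-style corruption for a tokenised SMILES string.
    Returns the corrupted sequence and the per-position targets
    (the original token where masked, None elsewhere)."""
    rng = random.Random(seed)  # seeded for reproducibility
    corrupted, targets = [], []
    for t in tokens:
        if rng.random() < mask_rate:
            corrupted.append(mask_token)
            targets.append(t)
        else:
            corrupted.append(t)
            targets.append(None)
    return corrupted, targets
```

Usage: `mask_smiles_tokens(list("CC(=O)O"), mask_rate=0.5)` corrupts roughly half the characters of acetic acid's SMILES; a text-to-text model is then trained to recover the masked tokens.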
Updated: 2025-07-30 09:36:18
Domains: cs.LG
AlphaDent: A dataset for automated tooth pathology detection
In this article, we present a new unique dataset for dental research - AlphaDent. This dataset is based on the DSLR camera photographs of the teeth of 295 patients and contains over 1200 images. The dataset is labeled for solving the instance segmentation problem and is divided into 9 classes. The article provides a detailed description of the dataset and the labeling format. The article also provides the details of the experiment on neural network training for the Instance Segmentation problem using this dataset. The results obtained show high quality of predictions. The dataset is published under an open license; and the training/inference code and model weights are also available under open licenses.
Updated: 2025-07-30 09:34:43
Domains: cs.CV,cs.LG
Collaborative Medical Triage under Uncertainty: A Multi-Agent Dynamic Matching Approach
The post-pandemic surge in healthcare demand, coupled with critical nursing shortages, has placed unprecedented pressure on emergency department triage systems, necessitating innovative AI-driven solutions. We present a multi-agent interactive intelligent system for medical triage that addresses three fundamental challenges in current AI-based triage systems: insufficient medical specialization leading to hallucination-induced misclassifications, heterogeneous department structures across healthcare institutions, and inefficient detail-oriented questioning that impedes rapid triage decisions. Our system employs three specialized agents - RecipientAgent, InquirerAgent, and DepartmentAgent - that collaborate through structured inquiry mechanisms and department-specific guidance rules to transform unstructured patient symptoms into accurate department recommendations. To ensure robust evaluation, we constructed a comprehensive Chinese medical triage dataset from a medical website, comprising 3,360 real-world cases spanning 9 primary departments and 62 secondary departments. Through systematic data imputation using large language models, we address the prevalent issue of incomplete medical records in real-world data. Experimental results demonstrate that our multi-agent system achieves 89.2% accuracy in primary department classification and 73.9% accuracy in secondary department classification after four rounds of patient interaction. The system's pattern-matching-based guidance mechanisms enable efficient adaptation to diverse hospital configurations while maintaining high triage accuracy. Our work provides a scalable framework for deploying AI-assisted triage systems that can accommodate the organizational heterogeneity of healthcare institutions while ensuring clinically sound decision-making.
Updated: 2025-07-30 09:21:59
Domains: cs.AI
Geometry of nonlinear forecast reconciliation
Forecast reconciliation, an ex-post technique applied to forecasts that must satisfy constraints, has been a prominent topic in the forecasting literature over the past two decades. Recently, several efforts have sought to extend reconciliation methods to probabilistic settings. Nevertheless, formal theorems demonstrating error reduction in nonlinear contexts, analogous to those presented in Panagiotelis et al. (2021), are still lacking. This paper addresses that gap by establishing such theorems for various classes of nonlinear hypersurfaces and vector-valued functions. Specifically, we derive an exact analog of Theorem 3.1 from Panagiotelis et al. (2021) for hypersurfaces with constant-sign curvature. Additionally, we provide probabilistic guarantees for the broader case of hypersurfaces with non-constant-sign curvature and for general vector-valued functions. To support reproducibility and practical adoption, we release a JAX-based Python package (to be released upon publication) implementing the presented theorems and reconciliation procedures.
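The nonlinear theorems discussed above generalize the well-known linear case, where reconciliation is an orthogonal projection of the base forecasts onto the constraint subspace. A minimal sketch of that linear baseline (our own illustration, not the paper's JAX package) is:

```python
import numpy as np

def reconcile_linear(y_hat, A):
    """Orthogonally project base forecasts y_hat onto the subspace {y : A y = 0},
    i.e. apply P = I - A^T (A A^T)^{-1} A without forming P explicitly."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    y_hat = np.asarray(y_hat, dtype=float)
    correction = A.T @ np.linalg.solve(A @ A.T, A @ y_hat)
    return y_hat - correction
```

For a two-level hierarchy with constraint y_total = y_a + y_b, take A = [[1, -1, -1]]; the projection enforces the constraint exactly while moving the base forecasts as little as possible in the Euclidean sense, which is the error-reduction property the paper extends to nonlinear hypersurfaces.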
Updated: 2025-07-30 09:14:51
Domains: cs.LG,cs.CG
LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning
Recent generative models face significant risks of producing harmful content, which has underscored the importance of machine unlearning (MU) as a critical technique for eliminating the influence of undesired data. However, existing MU methods typically assign the same weight to all data to be forgotten, which makes it difficult to effectively forget certain data that is harder to unlearn than others. In this paper, we empirically demonstrate that the loss of data itself can implicitly reflect its varying difficulty. Building on this insight, we introduce Loss-based Reweighting Unlearning (LoReUn), a simple yet effective plug-and-play strategy that dynamically reweights data during the unlearning process with minimal additional computational overhead. Our approach significantly reduces the gap between existing MU methods and exact unlearning in both image classification and generation tasks, effectively enhancing the prevention of harmful content generation in text-to-image diffusion models.
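The abstract does not spell out the exact weighting function, so the following is only a plausible sketch of the idea (an assumed form with our own names, not LoReUn's actual scheme): give more weight to forget-set samples whose loss is still low, i.e. those the model remembers best and that are presumably hardest to unlearn.

```python
import numpy as np

def loss_based_weights(losses, temperature=1.0):
    """Hypothetical loss-based reweighting: lower per-sample loss
    (better remembered, harder to forget) -> larger weight."""
    scores = np.exp(-np.asarray(losses, dtype=float) / temperature)
    return scores / scores.sum()

def reweighted_forget_loss(losses, temperature=1.0):
    """Weighted average of per-sample unlearning losses."""
    losses = np.asarray(losses, dtype=float)
    return float(np.dot(loss_based_weights(losses, temperature), losses))
```

The plug-and-play nature claimed in the abstract corresponds to swapping a uniform mean over the forget set for a weighted mean like `reweighted_forget_loss`, leaving the rest of the unlearning objective unchanged.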
Updated: 2025-07-30 09:12:25
Domains: cs.LG,cs.AI
Reconstructing Historical Climate Fields With Deep Learning
Historical records of climate fields are often sparse due to missing measurements, especially before the introduction of large-scale satellite missions. Several statistical and model-based methods have been introduced to fill gaps and reconstruct historical records. Here, we employ a recently introduced deep-learning approach based on Fourier convolutions, trained on numerical climate model output, to reconstruct historical climate fields. Using this approach we are able to realistically reconstruct large and irregular areas of missing data, as well as reconstruct known historical events such as strong El Niño and La Niña with very little given information. Our method outperforms the widely used statistical kriging method as well as other recent machine learning approaches. The model generalizes to higher resolutions than the ones it was trained on and can be used on a variety of climate fields. Moreover, it allows inpainting of masks never seen before during the model training.
Updated: 2025-07-30 09:09:40
Domains: physics.geo-ph,cs.LG
Robust Adverse Weather Removal via Spectral-based Spatial Grouping
Adverse weather conditions cause diverse and complex degradation patterns, driving the development of All-in-One (AiO) models. However, recent AiO solutions still struggle to capture diverse degradations, since global filtering methods like direct operations on the frequency domain fail to handle highly variable and localized distortions. To address these issues, we propose the Spectral-based Spatial Grouping Transformer (SSGformer), a novel approach that leverages spectral decomposition and group-wise attention for multi-weather image restoration. SSGformer decomposes images into high-frequency edge features using conventional edge detection and low-frequency information via Singular Value Decomposition. We utilize multi-head linear attention to effectively model the relationship between these features. The fused features are integrated with the input to generate a grouping-mask that clusters regions based on spatial similarity and image texture. To fully leverage this mask, we introduce a group-wise attention mechanism, enabling robust adverse weather removal and ensuring consistent performance across diverse weather conditions. We also propose a Spatial Grouping Transformer Block that uses both channel attention and spatial attention, effectively balancing feature-wise relationships and spatial dependencies. Extensive experiments show the superiority of our approach, validating its effectiveness in handling the varied and intricate adverse weather degradations.
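The SVD-based extraction of low-frequency content mentioned above can be sketched in a few lines (an illustrative decomposition with our own rank choice, not SSGformer's implementation): keep a low-rank reconstruction from the top singular values as the smooth component and treat the residual as the high-frequency part.

```python
import numpy as np

def svd_low_high(img, rank=8):
    """Split a 2-D image (single channel) into a low-rank "low-frequency"
    part built from the top `rank` singular values and the residual
    "high-frequency" part."""
    U, s, Vt = np.linalg.svd(np.asarray(img, dtype=float), full_matrices=False)
    low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return low, img - low
```

By construction the two parts sum back to the input, so the decomposition loses nothing; a small `rank` captures the smooth global structure while localized distortions (rain streaks, snow flecks) land mostly in the residual.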
Updated: 2025-07-30 09:08:34
Domains: cs.CV,cs.AI
Emergence of Quantised Representations Isolated to Anisotropic Functions
This paper presents a novel methodology for determining representational alignment, which builds upon the existing Spotlight Resonance method. Particularly, this new tool is used to gain insight into how discrete representations can emerge and organise in autoencoder models, through a controlled ablation study in which only the activation function is altered. Using this technique, the validity of whether function-driven symmetries can act as implicit inductive biases on representations is determined. Representations are found to tend to discretise when the activation functions are defined through a discrete algebraic permutation-equivariant symmetry. In contrast, they remain continuous under a continuous algebraic orthogonal-equivariant definition. This confirms the hypothesis: algebraic symmetries of network primitives can carry unintended inductive biases which produce task-independent artefactual structures in representations. The discrete symmetry of contemporary forms is shown to be a strong predictor for the induction of discrete representations transformed from otherwise continuous structures -- a quantisation effect. This motivates further reassessment of functional forms in common usage. Moreover, this supports a general causal model for one mode in which discrete representations may form, and could constitute a prerequisite for downstream interpretability phenomena, including grandmother neurons, discrete coding schemes, general linear features and possibly Superposition. Hence, this tool and proposed mechanism for the influence of functional form on representations may provide insights into emergent interpretability research. Finally, preliminary results indicate that quantisation of representations appears to correlate with a measurable increase in reconstruction error, reinforcing previous conjectures that this collapse can be detrimental.
Updated: 2025-07-30 09:07:28
标题: 量化表示的出现仅限于各向异性函数
摘要: 这篇论文提出了一种新的方法来确定代表性对齐,该方法建立在现有的Spotlight Resonance方法之上。特别地,这个新工具被用来揭示离散表示如何在自编码器模型中出现和组织,通过一个控制的消融研究,仅改变激活函数。利用这种技术,确定了函数驱动对称性是否可以作为表示的隐性归纳偏见。当激活函数通过离散代数置换等价对称性定义时,表示被发现倾向于离散化。相反,在连续代数正交等价定义下,它们保持连续。这证实了假设:网络基元的代数对称性可能携带意想不到的归纳偏见,产生表示中的任务无关的人为结构。当代形式的离散对称性被证明是从否则连续结构转换为离散表示的诱导的强预测因子 -- 一个量化效应。这促使进一步重新评估常用的功能形式。此外,这支持一个一般因果模型,其中离散表示可能形成,并可能构成下游可解释性现象的先决条件,包括祖母神经元、离散编码方案、一般线性特征,可能还有叠加。因此,这个工具和提出的功能形式对表示的影响机制可能为新兴的可解释性研究提供见解。最后,初步结果表明,表示的量化似乎与可衡量的重建误差的增加相关,进一步加强了之前的猜测,即这种崩溃可能是有害的。
更新时间: 2025-07-30 09:07:28
领域: cs.LG,I.5.1; F.1.1; I.2.6
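The symmetry distinction this abstract relies on, that elementwise activations carry a discrete permutation-equivariant symmetry while radial ("isotropic") activations are orthogonal-equivariant, can be verified numerically. The functions below are minimal stand-ins, not the paper's own code:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)

# An elementwise activation (tanh) commutes with permutation matrices:
# a discrete algebraic permutation-equivariant symmetry.
P = np.eye(5)[rng.permutation(5)]                  # random permutation matrix
assert np.allclose(np.tanh(P @ x), P @ np.tanh(x))

# ...but not with a generic rotation.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthogonal matrix
assert not np.allclose(np.tanh(Q @ x), Q @ np.tanh(x))

# A radial activation f(v) = v * g(||v||) commutes with any orthogonal map,
# since ||Qv|| = ||v||: a continuous orthogonal-equivariant symmetry.
def radial(v):
    return v * np.tanh(np.linalg.norm(v))

assert np.allclose(radial(Q @ x), Q @ radial(x))
```

The abstract's claim is that the first kind of symmetry tends to induce discretised representations while the second leaves them continuous.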
Ownership Verification of DNN Models Using White-Box Adversarial Attacks with Specified Probability Manipulation
In this paper, we propose a novel framework for ownership verification of deep neural network (DNN) models for image classification tasks. It allows verification of model identity by both the rightful owner and a third party without presenting the original model. We assume a gray-box scenario in which an unauthorized user owns a model illegally copied from the original model, provides services in a cloud environment, and clients submit images and receive the classification results as a probability distribution over output classes. The framework applies a white-box adversarial attack to align the output probability of a specific class with a designated value; knowledge of the original model enables the owner to generate such adversarial examples. We propose a simple but effective adversarial attack method based on the iterative Fast Gradient Sign Method (FGSM) with added control parameters. Experimental results confirm the effectiveness of identifying DNN models using this adversarial attack.
Updated: 2025-07-30 09:06:26
标题: 使用指定概率操纵的白盒对抗攻击验证DNN模型的所有权
摘要: 在这篇论文中,我们提出了一个新颖的框架,用于验证深度神经网络(DNN)模型在图像分类任务中的所有权。它允许合法所有者和第三方验证模型身份,而无需提供原始模型。我们假设一个灰盒情景,未经授权的用户拥有一个从原始模型非法复制而来的模型,在云环境中提供服务,用户输入图像并以输出类别的概率分布接收分类结果。该框架应用白盒对抗攻击来将特定类别的输出概率对齐到指定值。由于对原始模型的了解,它使所有者能够生成这样的对抗性示例。我们提出了一种基于迭代快速梯度符号方法(FGSM)的简单而有效的对抗攻击方法,引入控制参数。实验结果证实了使用对抗攻击来识别DNN模型的有效性。
更新时间: 2025-07-30 09:06:26
领域: cs.LG
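The attack described, an iterative FGSM variant that steers a chosen class probability toward a designated value, can be sketched on a toy linear softmax model. The model, the squared-error loss, and the step size below are illustrative assumptions; the paper's specific control parameters are not reproduced here:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fgsm_align_probability(x, W, b, cls, target_p, eps=0.01, steps=300):
    """Iteratively nudge input x so the model's output probability for `cls`
    approaches `target_p`, stepping by the sign of the analytic gradient of
    L = (p_cls - target_p)^2 for a linear softmax classifier."""
    x = x.copy()
    for _ in range(steps):
        p = softmax(W @ x + b)
        # dp_cls/dz_j = p_cls * (1{j=cls} - p_j)   (row of the softmax Jacobian)
        dp_dz = p[cls] * ((np.arange(len(p)) == cls) - p)
        grad = 2 * (p[cls] - target_p) * (W.T @ dp_dz)   # dL/dx via chain rule
        x -= eps * np.sign(grad)                          # FGSM-style signed step
    return x

rng = np.random.default_rng(2)
W, b = rng.standard_normal((3, 8)), rng.standard_normal(3)
x0 = rng.standard_normal(8)
x_adv = fgsm_align_probability(x0, W, b, cls=0, target_p=0.7)
p_adv = softmax(W @ x_adv + b)[0]
```

For verification, the owner would check that the deployed model also returns a probability near the designated value on such crafted inputs.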
Probing Information Distribution in Transformer Architectures through Entropy Analysis
This work explores entropy analysis as a tool for probing information distribution within Transformer-based architectures. By quantifying token-level uncertainty and examining entropy patterns across different stages of processing, we aim to investigate how information is managed and transformed within these models. As a case study, we apply the methodology to a GPT-based large language model, illustrating its potential to reveal insights into model behavior and internal representations. This approach may thus contribute to the development of interpretability and evaluation frameworks for Transformer-based models.
Updated: 2025-07-30 09:00:40
标题: 通过熵分析探究变压器架构中的信息分布
摘要: 这项工作探讨了熵分析作为探究基于Transformer的架构中信息分布的工具。通过量化令牌级别的不确定性,并在不同处理阶段检查熵模式,我们旨在研究信息在这些模型中是如何被管理和转换的。作为一个案例研究,我们将这一方法应用于基于GPT的大型语言模型,展示其揭示模型行为和内部表示的潜力。这种方法可能为模型行为提供见解,并有助于为基于Transformer的模型开发可解释性和评估框架。
更新时间: 2025-07-30 09:00:40
领域: cs.CL,cs.LG
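In its simplest form, the token-level uncertainty quantification described above is the Shannon entropy of each position's next-token distribution. A minimal sketch (the logit shapes and vocabulary size are assumed, not taken from the paper):

```python
import numpy as np

def token_entropy(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of the next-token distribution at each
    position, from a (seq_len, vocab) array of logits."""
    z = logits - logits.max(axis=-1, keepdims=True)   # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(np.clip(p, 1e-12, None))).sum(axis=-1)

rng = np.random.default_rng(3)
logits = rng.standard_normal((4, 100))
H = token_entropy(logits)

# Uniform logits give the maximum possible entropy, log(vocab).
H_uniform = token_entropy(np.zeros((1, 100)))[0]
```

Comparing such per-token entropies across layers or processing stages is one concrete way to trace how the model concentrates or spreads probability mass.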
LVM-GP: Uncertainty-Aware PDE Solver via coupling latent variable model and Gaussian process
We propose a novel probabilistic framework, termed LVM-GP, for uncertainty quantification in solving forward and inverse partial differential equations (PDEs) with noisy data. The core idea is to construct a stochastic mapping from the input to a high-dimensional latent representation, enabling uncertainty-aware prediction of the solution. Specifically, the architecture consists of a confidence-aware encoder and a probabilistic decoder. The encoder implements a high-dimensional latent variable model based on a Gaussian process (LVM-GP), where the latent representation is constructed by interpolating between a learnable deterministic feature and a Gaussian process prior, with the interpolation strength adaptively controlled by a confidence function learned from data. The decoder defines a conditional Gaussian distribution over the solution field, where the mean is predicted by a neural operator applied to the latent representation, allowing the model to learn flexible function-to-function mapping. Moreover, physical laws are enforced as soft constraints in the loss function to ensure consistency with the underlying PDE structure. Compared to existing approaches such as Bayesian physics-informed neural networks (B-PINNs) and deep ensembles, the proposed framework can efficiently capture functional dependencies via merging a latent Gaussian process and neural operator, resulting in competitive predictive accuracy and robust uncertainty quantification. Numerical experiments demonstrate the effectiveness and reliability of the method.
Updated: 2025-07-30 09:00:39
标题: LVM-GP:通过耦合潜变量模型和高斯过程实现的不确定性感知PDE求解器
摘要: 我们提出了一种新颖的概率框架,称为LVM-GP,用于在解决具有噪声数据的前向和反向偏微分方程(PDEs)中进行不确定性量化。核心思想是构建从输入到高维潜在表示的随机映射,从而实现对解的不确定性感知预测。具体而言,该架构包括一个置信度感知编码器和一个概率解码器。编码器实现了基于高斯过程(LVM-GP)的高维潜在变量模型,其中潜在表示是通过在可学习的确定性特征和高斯过程先验之间插值构建的,插值强度由从数据中学习的置信度函数自适应控制。解码器在解场上定义了一个条件高斯分布,其中均值由应用于潜在表示的神经算子预测,使模型能够学习灵活的函数到函数的映射。此外,物理定律被强加为损失函数中的软约束,以确保与基础PDE结构的一致性。与现有方法(如贝叶斯物理信息神经网络(B-PINNs)和深度集合)相比,所提出的框架通过合并潜在高斯过程和神经算子,可以有效地捕获功能依赖性,从而实现具有竞争力的预测准确性和稳健的不确定性量化。数值实验证明了该方法的有效性和可靠性。
更新时间: 2025-07-30 09:00:39
领域: stat.ML,cs.AI,cs.LG
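The encoder's interpolation between a learnable deterministic feature and a Gaussian process prior, gated by a confidence function, can be sketched in one dimension. The feature, the confidence function, and the RBF kernel below are placeholder choices, not the learned quantities from the paper:

```python
import numpy as np

def rbf_kernel(x, y, ell=0.2):
    """Squared-exponential (RBF) kernel on 1-D inputs."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 50)

# Hypothetical stand-ins for the learned pieces:
feature = np.sin(2 * np.pi * x)                        # deterministic feature
confidence = 1.0 / (1.0 + np.exp(-10 * (x - 0.5)))     # confidence c(x) in [0, 1]

# One sample from a zero-mean GP prior (jitter keeps the Cholesky stable).
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))
gp_sample = np.linalg.cholesky(K) @ rng.standard_normal(len(x))

# Confidence-controlled interpolation, as in the encoder described above:
# high confidence trusts the deterministic feature, low confidence the prior.
z = confidence * feature + (1.0 - confidence) * gp_sample
```

In the full framework this latent representation feeds a neural-operator decoder; the sketch only shows how the interpolation strength is modulated.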
SHoM: A Mental-Synthesis Trust Management Model for Mitigating Botnet-Driven DDoS Attacks in the Internet of Things
The advantages of IoT in strengthening commercial, industrial, and social ecosystems have led to its widespread expansion. Nevertheless, because endpoint devices have limited computation, storage, and communication capabilities, IoT infrastructure is vulnerable to several cyber threats. As a result, DDoS attacks pose a severe risk to IoT security: by exploiting these weaknesses, attackers can quickly recruit IoT devices into botnets to execute DDoS attacks. A critical development is that ever-larger botnets are being constructed from IoT devices. We offer a model for dealing with botnet-driven DDoS attacks in the Internet of Things via trust management. In this model, we attempt to consider all security aspects concerning trust factors, in order to design a reliable and flexible defense against DDoS attacks on the Internet of Things. In our initial studies, roughly 40-50 security models related to the subject were examined through review articles.
Updated: 2025-07-30 08:57:59
标题: SHoM: 一种用于缓解物联网中基于僵尸网络的DDoS攻击的心理综合信任管理模型
摘要: 物联网在加强商业、工业和社会生态系统方面的优势导致其广泛扩张。然而,由于终端设备的计算、存储和通信能力有限,物联网基础设施容易受到多种网络威胁。因此,DDoS攻击对物联网的安全构成严重风险:攻击者利用这些弱点,可以迅速将物联网设备纳入僵尸网络以执行DDoS攻击。一个关键的发展是,越来越多的僵尸网络正在由物联网设备构建而成。我们提供了一个通过信任管理来应对物联网中由僵尸网络驱动的DDoS攻击的模型。在这个模型中,我们试图考虑与信任因素相关的所有安全方面,以设计一个可靠且灵活的模型来抵御针对物联网的DDoS攻击。在初步研究中,我们通过综述文章研究了约40-50个与该主题相关的安全模型。
更新时间: 2025-07-30 08:57:59
领域: cs.CR
Proto-EVFL: Enhanced Vertical Federated Learning via Dual Prototype with Extremely Unaligned Data
In vertical federated learning (VFL), multiple enterprises address aligned-sample scarcity by leveraging massive locally unaligned samples to facilitate collaborative learning. However, unaligned samples across different parties in VFL can be extremely class-imbalanced, leading to insufficient feature representation and a limited model prediction space. Specifically, class-imbalance problems consist of intra-party and inter-party class imbalance, which can further cause local model bias and feature-contribution inconsistency, respectively. To address these challenges, we propose Proto-EVFL, an enhanced VFL framework via dual prototypes. We first introduce class prototypes for each party to learn relationships between classes in the latent space, allowing the active party to predict unseen classes. We further design a probabilistic dual prototype learning scheme to dynamically select unaligned samples by a conditional optimal transport cost with class prior probability. Moreover, a mixed prior guided module guides this selection process by combining local and global class prior probabilities. Finally, we adopt an adaptive gated feature aggregation strategy to mitigate feature-contribution inconsistency by dynamically weighting and aggregating local features across different parties. We prove that Proto-EVFL, the first bi-level optimization framework in VFL, has a convergence rate of $1/\sqrt{T}$. Extensive experiments on various datasets validate the superiority of Proto-EVFL; even in a zero-shot scenario with one unseen class, it outperforms baselines by at least 6.97%.
Updated: 2025-07-30 08:48:33
标题: Proto-EVFL:通过双原型在极不对齐数据上增强的纵向联邦学习
摘要: 在纵向联邦学习(VFL)中,多个企业通过利用大量本地不对齐的样本来促进协作学习,从而解决对齐样本稀缺的问题。然而,VFL中不同参与方的不对齐样本可能极度类不平衡,导致特征表示不足和模型预测空间有限。具体而言,类不平衡问题包括参与方内部的类不平衡和参与方之间的类不平衡,分别可能导致本地模型偏差和特征贡献不一致的问题。为了解决上述挑战,我们提出了Proto-EVFL,一个通过双原型增强的VFL框架。我们首先为每个参与方引入类原型,以学习潜在空间中类之间的关系,使主动方能够预测未见类。我们进一步设计了一个概率双原型学习方案,通过带类先验概率的条件最优传输成本动态选择不对齐样本。此外,一个混合先验引导模块通过结合本地和全局类先验概率来指导这个选择过程。最后,我们采用一种自适应门控特征聚合策略,通过动态加权和聚合不同参与方的本地特征来缓解特征贡献不一致。我们证明了Proto-EVFL作为VFL中的第一个双层优化框架,其收敛速度为$1/\sqrt{T}$。在各种数据集上的广泛实验验证了Proto-EVFL的优越性;即使在有一个未见类的零样本场景中,它也比基线至少高出6.97%。
更新时间: 2025-07-30 08:48:33
领域: cs.LG,cs.AI
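The class-prototype idea, mean per-class embeddings that let parties reason about classes in the latent space, can be illustrated with a generic nearest-prototype classifier. This is a sketch of the basic mechanism only, not Proto-EVFL's dual-prototype scheme or its optimal-transport sample selection:

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Mean embedding per class -- the 'class prototype'."""
    classes = np.unique(labels)
    return classes, np.stack([embeddings[labels == c].mean(axis=0) for c in classes])

def predict_by_prototype(z, prototypes):
    """Assign a sample to its nearest prototype by cosine similarity."""
    sims = (prototypes @ z) / (np.linalg.norm(prototypes, axis=1) * np.linalg.norm(z) + 1e-12)
    return int(np.argmax(sims))

rng = np.random.default_rng(5)
# Two well-separated classes in a toy 4-D latent space.
emb = np.vstack([rng.normal(0, 0.1, (20, 4)) + [1, 0, 0, 0],
                 rng.normal(0, 0.1, (20, 4)) + [0, 1, 0, 0]])
labels = np.array([0] * 20 + [1] * 20)
classes, protos = class_prototypes(emb, labels)
pred = predict_by_prototype(np.array([0.0, 1.0, 0.0, 0.0]), protos)
```

Prototypes for classes unseen by the active party are what enable the zero-shot prediction the abstract reports.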
Policy-Driven AI in Dataspaces: Taxonomy, Explainability, and Pathways for Compliant Innovation
As AI-driven dataspaces become integral to data sharing and collaborative analytics, ensuring privacy, performance, and policy compliance presents significant challenges. This paper provides a comprehensive review of privacy-preserving and policy-aware AI techniques, including Federated Learning, Differential Privacy, Trusted Execution Environments, Homomorphic Encryption, and Secure Multi-Party Computation, alongside strategies for aligning AI with regulatory frameworks such as GDPR and the EU AI Act. We propose a novel taxonomy to classify these techniques based on privacy levels, performance impacts, and compliance complexity, offering a clear framework for practitioners and researchers to navigate trade-offs. Key performance metrics -- latency, throughput, cost overhead, model utility, fairness, and explainability -- are analyzed to highlight the multi-dimensional optimization required in dataspaces. The paper identifies critical research gaps, including the lack of standardized privacy-performance KPIs, challenges in explainable AI for federated ecosystems, and semantic policy enforcement amidst regulatory fragmentation. Future directions are outlined, proposing a conceptual framework for policy-driven alignment, automated compliance validation, standardized benchmarking, and integration with European initiatives like GAIA-X, IDS, and Eclipse EDC. By synthesizing technical, ethical, and regulatory perspectives, this work lays the groundwork for developing trustworthy, efficient, and compliant AI systems in dataspaces, fostering innovation in secure and responsible data-driven ecosystems.
Updated: 2025-07-30 08:46:55
标题: 数据空间中的政策驱动人工智能:分类、可解释性和合规创新路径
摘要: 随着人工智能驱动的数据空间成为数据共享和协作分析的重要组成部分,确保隐私、性能和政策合规性面临重大挑战。本文综述了保护隐私和具备政策意识的人工智能技术,包括联邦学习、差分隐私、可信执行环境、同态加密和安全多方计算,以及使人工智能与GDPR和欧盟AI法案等监管框架保持一致的策略。我们提出了一个新颖的分类法,根据隐私级别、性能影响和合规复杂性对这些技术进行分类,为从业者和研究人员提供了一个权衡取舍的清晰框架。我们分析了关键性能指标(延迟、吞吐量、成本开销、模型效用、公平性和可解释性),以突出数据空间中所需的多维优化。本文指出了关键的研究空白,包括缺乏标准化的隐私-性能关键绩效指标、联邦生态系统中可解释人工智能的挑战,以及监管碎片化背景下的语义政策执行。本文还概述了未来方向,提出了一个政策驱动对齐的概念框架、自动化合规验证、标准化基准测试,以及与GAIA-X、IDS和Eclipse EDC等欧洲倡议的整合。通过综合技术、伦理和监管视角,这项工作为在数据空间中开发可信赖、高效和合规的人工智能系统奠定了基础,促进了安全且负责任的数据驱动生态系统中的创新。
更新时间: 2025-07-30 08:46:55
领域: cs.CR,cs.AI
Physics-constrained generative machine learning-based high-resolution downscaling of Greenland's surface mass balance and surface temperature
Accurate, high-resolution projections of the Greenland ice sheet's surface mass balance (SMB) and surface temperature are essential for understanding future sea-level rise, yet current approaches are either computationally demanding or limited to coarse spatial scales. Here, we introduce a novel physics-constrained generative modeling framework based on a consistency model (CM) to downscale low-resolution SMB and surface temperature fields by a factor of up to 32 (from 160 km to 5 km grid spacing) in a few sampling steps. The CM is trained on monthly outputs of the regional climate model MARv3.12 and conditioned on ice-sheet topography and insolation. By enforcing a hard conservation constraint during inference, we ensure approximate preservation of SMB and temperature sums on the coarse spatial scale as well as robust generalization to extreme climate states without retraining. On the test set, our constrained CM achieves a continuous ranked probability score of 6.31 mmWE for the SMB and 0.1 K for the surface temperature, outperforming interpolation-based downscaling. Together with spatial power-spectral analysis, we demonstrate that the CM faithfully reproduces variability across spatial scales. We further apply bias-corrected outputs of the NorESM2 Earth System Model as inputs to our CM, to demonstrate the potential of our model to directly downscale ESM fields. Our approach delivers realistic, high-resolution climate forcing for ice-sheet simulations with fast inference and can be readily integrated into Earth-system and ice-sheet model workflows to improve projections of the future contribution to sea-level rise from Greenland and potentially other ice sheets and glaciers too.
Updated: 2025-07-30 08:43:48
标题: 物理约束的生成机器学习方法用于格陵兰岛地表质量平衡和地表温度的高分辨率降尺度处理
摘要: 格陵兰冰盖表面质量平衡(SMB)和表面温度的准确、高分辨率预测对于理解未来海平面上升至关重要,然而目前的方法要么计算量大,要么仅限于粗糙的空间尺度。在这里,我们介绍了一种基于一致性模型(CM)的新型物理约束生成建模框架,只需少量采样步骤即可将低分辨率的SMB和表面温度场降尺度至多32倍(从160公里到5公里的网格间距)。CM在区域气候模型MARv3.12的月度输出上进行训练,并以冰盖地形和日照为条件。通过在推断过程中强制执行硬守恒约束,我们确保在粗糙空间尺度上近似保持SMB和温度总和,并且能够在不重新训练的情况下对极端气候状态进行稳健的泛化。在测试集上,我们的受约束CM在SMB上取得了6.31 mmWE、在表面温度上取得了0.1 K的连续分级概率评分(CRPS),优于基于插值的降尺度方法。结合空间功率谱分析,我们展示了CM忠实地再现了不同空间尺度上的变异性。我们进一步将NorESM2地球系统模型的偏差校正输出作为CM的输入,以展示我们的模型直接对ESM场进行降尺度的潜力。我们的方法为冰盖模拟提供了逼真的高分辨率气候强迫,具有快速推断的特点,并可以方便地集成到地球系统和冰盖模型工作流程中,以改进对格陵兰以及潜在的其他冰盖和冰川未来对海平面上升贡献的预测。
更新时间: 2025-07-30 08:43:48
领域: physics.geo-ph,cs.AI
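One way a hard conservation constraint at inference time can be realized is an additive correction that forces each coarse cell's mean over the fine grid to match the coarse-scale value. The factor-8 grids below are illustrative, and whether the paper constrains means or sums, additively or otherwise, is not specified here:

```python
import numpy as np

def enforce_block_conservation(fine, coarse, factor):
    """Additively correct a fine-grid field so that the mean over each
    coarse cell equals the corresponding coarse value (a hard
    conservation constraint applied after generation)."""
    n = coarse.shape[0]
    blocks = fine.reshape(n, factor, n, factor)
    block_means = blocks.mean(axis=(1, 3))
    # Broadcast each coarse-cell deficit back onto its factor x factor block.
    correction = np.repeat(np.repeat(coarse - block_means, factor, 0), factor, 1)
    return fine + correction

rng = np.random.default_rng(6)
coarse = rng.standard_normal((4, 4))    # e.g. a coarse-resolution SMB field
fine = rng.standard_normal((32, 32))    # raw high-resolution generator output
fixed = enforce_block_conservation(fine, coarse, factor=8)
```

After the correction, aggregating the downscaled field back to the coarse grid exactly recovers the input field, which is the property the abstract's "hard conservation constraint" guarantees approximately.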
Convergence Properties of Natural Gradient Descent for Minimizing KL Divergence
The Kullback-Leibler (KL) divergence plays a central role in probabilistic machine learning, where it commonly serves as the canonical loss function. Optimization in such settings is often performed over the probability simplex, where the choice of parameterization significantly impacts convergence. In this work, we study the problem of minimizing the KL divergence and analyze the behavior of gradient-based optimization algorithms under two dual coordinate systems within the framework of information geometry$-$ the exponential family ($\theta$ coordinates) and the mixture family ($\eta$ coordinates). We compare Euclidean gradient descent (GD) in these coordinates with the coordinate-invariant natural gradient descent (NGD), where the natural gradient is a Riemannian gradient that incorporates the intrinsic geometry of the underlying statistical model. In continuous time, we prove that the convergence rates of GD in the $\theta$ and $\eta$ coordinates provide lower and upper bounds, respectively, on the convergence rate of NGD. Moreover, under affine reparameterizations of the dual coordinates, the convergence rates of GD in $\eta$ and $\theta$ coordinates can be scaled to $2c$ and $\frac{2}{c}$, respectively, for any $c>0$, while NGD maintains a fixed convergence rate of $2$, remaining invariant to such transformations and sandwiched between them. Although this suggests that NGD may not exhibit uniformly superior convergence in continuous time, we demonstrate that its advantages become pronounced in discrete time, where it achieves faster convergence and greater robustness to noise, outperforming GD. Our analysis hinges on bounding the spectrum and condition number of the Hessian of the KL divergence at the optimum, which coincides with the Fisher information matrix.
Updated: 2025-07-30 08:43:29
Subjects: cs.LG,math.OC
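The GD-vs-NGD comparison above can be sketched numerically for a categorical (softmax) family. The following is a minimal illustration, not the paper's experimental setup: it minimizes $KL(p^* \| p_\theta)$ with Euclidean GD in the natural $\theta$ coordinates, and with NGD by preconditioning the same gradient with the pseudo-inverse of the Fisher matrix, which for this family is $F(\theta) = \mathrm{diag}(p) - pp^\top$ (the target distribution and step size are arbitrary choices).

```python
import numpy as np

def softmax(t):
    e = np.exp(t - t.max())
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

p_star = np.array([0.7, 0.2, 0.1])
lr, steps = 0.1, 50

# Euclidean GD in theta coordinates: the gradient of KL(p* || p_theta)
# with respect to the logits is simply p_theta - p*.
theta = np.zeros(3)
for _ in range(steps):
    theta -= lr * (softmax(theta) - p_star)
kl_gd = kl(p_star, softmax(theta))

# NGD: precondition the same gradient with the Fisher pseudo-inverse,
# F(theta) = diag(p) - p p^T for the categorical/softmax family.
theta = np.zeros(3)
for _ in range(steps):
    p = softmax(theta)
    F = np.diag(p) - np.outer(p, p)
    theta -= lr * np.linalg.pinv(F) @ (p - p_star)
kl_ngd = kl(p_star, softmax(theta))
```

With the same step size, the Fisher preconditioning makes the mean parameters contract toward $p^*$ at a rate independent of the parameterization, so `kl_ngd` ends up well below `kl_gd` — consistent with the discrete-time advantage the abstract describes.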
Towards Blind Bitstream-corrupted Video Recovery via a Visual Foundation Model-driven Framework
Video signals are vulnerable in multimedia communication and storage systems, as even slight bitstream-domain corruption can lead to significant pixel-domain degradation. To recover faithful spatio-temporal content from corrupted inputs, bitstream-corrupted video recovery has recently emerged as a challenging and understudied task. However, existing methods require time-consuming and labor-intensive annotation of corrupted regions for each corrupted video frame, resulting in a large workload in practice. In addition, high-quality recovery remains difficult as part of the local residual information in corrupted frames may mislead feature completion and successive content recovery. In this paper, we propose the first blind bitstream-corrupted video recovery framework that integrates visual foundation models with a recovery model, which is adapted to different types of corruption and bitstream-level prompts. Within the framework, the proposed Detect Any Corruption (DAC) model leverages the rich priors of the visual foundation model while incorporating bitstream and corruption knowledge to enhance corruption localization and blind recovery. Additionally, we introduce a novel Corruption-aware Feature Completion (CFC) module, which adaptively processes residual contributions based on high-level corruption understanding. With VFM-guided hierarchical feature augmentation and high-level coordination in a mixture-of-residual-experts (MoRE) structure, our method suppresses artifacts and enhances informative residuals. Comprehensive evaluations show that the proposed method achieves outstanding performance in bitstream-corrupted video recovery without requiring a manually labeled mask sequence. The demonstrated effectiveness will help to realize improved user experience, wider application scenarios, and more reliable multimedia communication and storage systems.
Updated: 2025-07-30 08:31:54
Subjects: eess.IV,cs.AI,cs.CV,cs.MM
LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks
Achieving pixel-level segmentation with low computational cost using multimodal data remains a key challenge in crack segmentation tasks. Existing methods lack the capability for adaptive perception and efficient interactive fusion of cross-modal features. To address these challenges, we propose a Lightweight Adaptive Cue-Aware Vision Mamba network (LIDAR), which efficiently perceives and integrates morphological and textural cues from different modalities under multimodal crack scenarios, generating clear pixel-level crack segmentation maps. Specifically, LIDAR is composed of a Lightweight Adaptive Cue-Aware Visual State Space module (LacaVSS) and a Lightweight Dual Domain Dynamic Collaborative Fusion module (LD3CF). LacaVSS adaptively models crack cues through the proposed mask-guided Efficient Dynamic Guided Scanning Strategy (EDG-SS), while LD3CF leverages an Adaptive Frequency Domain Perceptron (AFDP) and a dual-pooling fusion strategy to effectively capture spatial and frequency-domain cues across modalities. Moreover, we design a Lightweight Dynamically Modulated Multi-Kernel convolution (LDMK) to perceive complex morphological structures with minimal computational overhead, replacing most convolutional operations in LIDAR. Experiments on three datasets demonstrate that our method outperforms other state-of-the-art (SOTA) methods. On the light-field depth dataset, our method achieves 0.8204 in F1 and 0.8465 in mIoU with only 5.35M parameters. Code and datasets are available at https://github.com/Karl1109/LIDAR-Mamba.
Updated: 2025-07-30 08:28:20
Subjects: cs.CV,cs.AI
The Ball-Proximal (="Broximal") Point Method: a New Algorithm, Convergence Theory, and Applications
Non-smooth and non-convex global optimization poses significant challenges across various applications, where standard gradient-based methods often struggle. We propose the Ball-Proximal Point Method, Broximal Point Method, or Ball Point Method (BPM) for short - a novel algorithmic framework inspired by the classical Proximal Point Method (PPM) (Rockafellar, 1976), which, as we show, sheds new light on several foundational optimization paradigms and phenomena, including non-convex and non-smooth optimization, acceleration, smoothing, adaptive stepsize selection, and trust-region methods. At the core of BPM lies the ball-proximal ("broximal") operator, which arises from the classical proximal operator by replacing the quadratic distance penalty by a ball constraint. Surprisingly, and in sharp contrast with the sublinear rate of PPM in the nonsmooth convex regime, we prove that BPM converges linearly and in a finite number of steps in the same regime. Furthermore, by introducing the concept of ball-convexity, we prove that BPM retains the same global convergence guarantees under weaker assumptions, making it a powerful tool for a broader class of potentially non-convex optimization problems. Just like PPM plays the role of a conceptual method inspiring the development of practically efficient algorithms and algorithmic elements, e.g., gradient descent, adaptive step sizes, acceleration (Ahn & Sra, 2020), and "W" in AdamW (Zhuang et al., 2022), we believe that BPM should be understood in the same manner: as a blueprint and inspiration for further development.
Updated: 2025-07-30 08:25:58
Subjects: math.OC,cs.LG,stat.ML
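The ball-proximal ("broximal") step replaces the quadratic penalty of the classical proximal operator with a hard ball constraint: $x_{k+1} = \arg\min_{\|x - x_k\| \le r} f(x)$. A minimal one-dimensional sketch (an illustration only, not the paper's general algorithm): for a convex $f$ with a known global minimizer, the minimizer over the interval $[x_k - r,\, x_k + r]$ is just the global minimizer clipped to that interval, so BPM reaches the optimum of a non-smooth $f(x) = |x - a|$ exactly, in finitely many steps, mirroring the finite-step convergence the abstract proves.

```python
def broximal_step(x, x_star, r):
    # Exact ball-proximal step for a 1-D convex f with global minimizer
    # x_star: the argmin over the ball [x - r, x + r] is x_star clipped
    # to that interval.
    return min(max(x_star, x - r), x + r)

def bpm(x0, x_star, r, max_steps=100):
    x, steps = x0, 0
    while x != x_star and steps < max_steps:
        x = broximal_step(x, x_star, r)
        steps += 1
    return x, steps

# f(x) = |x - 3|, starting at 0 with radius 1: each step moves a full
# radius toward the minimizer, landing on it exactly after 3 steps.
x, steps = bpm(0.0, 3.0, 1.0)
```

Contrast this with subgradient descent on $|x - a|$, which only converges sublinearly and oscillates around the kink; the ball constraint sidesteps that pathology entirely.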
Agentic Privacy-Preserving Machine Learning
Privacy-preserving machine learning (PPML) is critical to ensure data privacy in AI. Over the past few years, the community has proposed a wide range of provably secure PPML schemes that rely on various cryptography primitives. However, when it comes to large language models (LLMs) with billions of parameters, the efficiency of PPML is everything but acceptable. For instance, the state-of-the-art solution for confidential LLM inference represents at least 10,000-fold slower performance compared to plaintext inference. The performance gap is even larger when the context length increases. In this position paper, we propose a novel framework named Agentic-PPML to make PPML in LLMs practical. Our key insight is to employ a general-purpose LLM for intent understanding and delegate cryptographically secure inference to specialized models trained on vertical domains. By modularly separating language intent parsing - which typically involves little or no sensitive information - from privacy-critical computation, Agentic-PPML completely eliminates the need for the LLMs to process the encrypted prompts, enabling practical deployment of privacy-preserving LLM-centric services.
Updated: 2025-07-30 08:20:45
Subjects: cs.CR,cs.LG
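The modular split described above — a general-purpose model that only parses non-sensitive intent, delegating privacy-critical inference to a domain specialist — can be sketched as a toy routing pipeline. All names here are hypothetical stand-ins: the real framework would use an actual LLM for intent parsing and a cryptographically secure protocol (e.g., encrypted inference) in place of the specialist stub.

```python
def parse_intent(public_prompt):
    # Stand-in for the general-purpose LLM: it sees only the public,
    # non-sensitive portion of the request and returns a domain label.
    if "diagnos" in public_prompt.lower():
        return "medical"
    return "general"

def secure_specialist(domain, encrypted_payload):
    # Stand-in for cryptographically secure inference on a small
    # vertical-domain model; it never sees plaintext user data here.
    return f"[{domain}] result computed over {len(encrypted_payload)} encrypted bytes"

def agentic_ppml(public_prompt, encrypted_payload):
    # The key property: the large model handles routing only, so the
    # expensive encrypted computation runs on a much smaller model.
    domain = parse_intent(public_prompt)
    return secure_specialist(domain, encrypted_payload)
```

The design point the abstract makes is visible in the structure: the encrypted payload never flows through the large model, which is what removes the 10,000-fold overhead of running a full LLM under encryption.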
Visual Language Models as Zero-Shot Deepfake Detectors
The contemporary phenomenon of deepfakes, utilizing GAN or diffusion models for face swapping, presents a substantial and evolving threat in digital media, identity verification, and a multitude of other systems. The majority of existing methods for detecting deepfakes rely on training specialized classifiers to distinguish between genuine and manipulated images, focusing only on the image domain without incorporating any auxiliary tasks that could enhance robustness. In this paper, inspired by the zero-shot capabilities of Vision Language Models, we propose a novel VLM-based approach to image classification and then evaluate it for deepfake detection. Specifically, we utilize a new high-quality deepfake dataset comprising 60,000 images, on which our zero-shot models demonstrate superior performance to almost all existing methods. Subsequently, we compare the performance of the best-performing architecture, InstructBLIP, on the popular deepfake dataset DFDC-P against traditional methods in two scenarios: zero-shot and in-domain fine-tuning. Our results demonstrate the superiority of VLMs over traditional classifiers.
Updated: 2025-07-30 08:20:02
Subjects: cs.CV,cs.AI,cs.LG
Towards Simulating Social Influence Dynamics with LLM-based Multi-agents
Recent advancements in Large Language Models offer promising capabilities to simulate complex human social interactions. We investigate whether LLM-based multi-agent simulations can reproduce core human social dynamics observed in online forums. We evaluate conformity dynamics, group polarization, and fragmentation across different model scales and reasoning capabilities using a structured simulation framework. Our findings indicate that smaller models exhibit higher conformity rates, whereas models optimized for reasoning are more resistant to social influence.
Updated: 2025-07-30 08:14:40
Subjects: cs.MA,cs.AI,cs.CY
Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation
Unsupervised Video Object Segmentation (UVOS) aims to predict pixel-level masks for the most salient objects in videos without any prior annotations. While memory mechanisms have been proven critical in various video segmentation paradigms, their application in UVOS yield only marginal performance gains despite sophisticated design. Our analysis reveals a simple but fundamental flaw in existing methods: over-reliance on memorizing high-level semantic features. UVOS inherently suffers from the deficiency of lacking fine-grained information due to the absence of pixel-level prior knowledge. Consequently, memory design relying solely on high-level features, which predominantly capture abstract semantic cues, is insufficient to generate precise predictions. To resolve this fundamental issue, we propose a novel hierarchical memory architecture to incorporate both shallow- and high-level features for memory, which leverages the complementary benefits of pixel and semantic information. Furthermore, to balance the simultaneous utilization of the pixel and semantic memory features, we propose a heterogeneous interaction mechanism to perform pixel-semantic mutual interactions, which explicitly considers their inherent feature discrepancies. Through the design of Pixel-guided Local Alignment Module (PLAM) and Semantic-guided Global Integration Module (SGIM), we achieve delicate integration of the fine-grained details in shallow-level memory and the semantic representations in high-level memory. Our Hierarchical Memory with Heterogeneous Interaction Network (HMHI-Net) consistently achieves state-of-the-art performance across all UVOS and video saliency detection benchmarks. Moreover, HMHI-Net consistently exhibits high performance across different backbones, further demonstrating its superiority and robustness. Project page: https://github.com/ZhengxyFlow/HMHI-Net .
Updated: 2025-07-30 08:11:18
Subjects: cs.CV,cs.AI
Towards Interpretable Renal Health Decline Forecasting via Multi-LMM Collaborative Reasoning Framework
Accurate and interpretable prediction of estimated glomerular filtration rate (eGFR) is essential for managing chronic kidney disease (CKD) and supporting clinical decisions. Recent advances in Large Multimodal Models (LMMs) have shown strong potential in clinical prediction tasks due to their ability to process visual and textual information. However, challenges related to deployment cost, data privacy, and model reliability hinder their adoption. In this study, we propose a collaborative framework that enhances the performance of open-source LMMs for eGFR forecasting while generating clinically meaningful explanations. The framework incorporates visual knowledge transfer, abductive reasoning, and a short-term memory mechanism to enhance prediction accuracy and interpretability. Experimental results show that the proposed framework achieves predictive performance and interpretability comparable to proprietary models. It also provides plausible clinical reasoning processes behind each prediction. Our method sheds new light on building AI systems for healthcare that combine predictive accuracy with clinically grounded interpretability.
Updated: 2025-07-30 08:11:06
Subjects: cs.LG,cs.AI,cs.MA,stat.AP
SDBA: A Stealthy and Long-Lasting Durable Backdoor Attack in Federated Learning
Federated learning is a promising approach for training machine learning models while preserving data privacy. However, its distributed nature makes it vulnerable to backdoor attacks, particularly in NLP tasks, where related research remains limited. This paper introduces SDBA, a novel backdoor attack mechanism designed for NLP tasks in federated learning environments. Through a systematic analysis across LSTM and GPT-2 models, we identify the most vulnerable layers for backdoor injection and achieve both stealth and long-lasting durability by applying layer-wise gradient masking and top-k% gradient masking. Also, to evaluate the task generalizability of SDBA, we additionally conduct experiments on the T5 model. Experiments on next-token prediction, sentiment analysis, and question answering tasks show that SDBA outperforms existing backdoors in terms of durability and effectively bypasses representative defense mechanisms, demonstrating notable performance in transformer-based models such as GPT-2. These results highlight the urgent need for robust defense strategies in NLP-based federated learning systems.
Updated: 2025-07-30 08:09:17
Field: cs.LG,cs.CR
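The top-k% gradient-masking idea at the heart of SDBA can be sketched in a few lines. The function below is an illustrative reconstruction, not the authors' code: it zeroes all but the k% largest-magnitude gradient coordinates, so a backdoor update stays concentrated where it is hardest to separate from benign updates.

```python
import numpy as np

def topk_gradient_mask(grad, k_percent):
    """Keep only the top-k% largest-magnitude entries of a gradient
    tensor and zero the rest (illustrative masking step)."""
    flat = grad.flatten()
    k = max(1, int(np.ceil(flat.size * k_percent / 100.0)))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # k largest magnitudes
    mask = np.zeros_like(flat)
    mask[idx] = 1.0
    return (flat * mask).reshape(grad.shape)

g = np.array([[0.1, -2.0], [0.5, 0.05]])
masked = topk_gradient_mask(g, 25.0)  # keeps 1 of the 4 entries
```

In the paper's setting the same masking would be applied layer-wise, restricted to the layers identified as most vulnerable to injection.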
Trajectory First: A Curriculum for Discovering Diverse Policies
Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has emerged as a powerful reinforcement learning (RL) framework to train a diverse set of agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robotic manipulation, leading to a lack of policy diversity. To improve diversity optimization in RL, we therefore propose a curriculum that first explores at the trajectory level before learning step-based policies. In our empirical evaluation, we provide novel insights into the shortcomings of skill-based diversity optimization, and demonstrate empirically that our curriculum improves the diversity of the learned skills.
Updated: 2025-07-30 08:07:33
Field: cs.LG,cs.RO
Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization
In-context learning (ICL) enables Large Vision-Language Models (LVLMs) to adapt to new tasks without parameter updates, using a few demonstrations from a large support set. However, selecting informative demonstrations leads to high computational and memory costs. While some methods explore selecting a small and representative coreset in text classification, evaluating all support set samples remains costly, and discarded samples lead to unnecessary information loss. These methods may also be less effective for image classification due to differences in feature spaces. Given these limitations, we propose Key-based Coreset Optimization (KeCO), a novel framework that leverages untapped data to construct a compact and informative coreset. We introduce visual features as keys within the coreset, which serve as the anchor for identifying samples to be updated through different selection strategies. By leveraging untapped samples from the support set, we update the keys of selected coreset samples, enabling the randomly initialized coreset to evolve into a more informative coreset at low computational cost. Through extensive experiments on coarse-grained and fine-grained image classification benchmarks, we demonstrate that KeCO effectively enhances ICL performance for the image classification task, achieving an average improvement of more than 20%. Notably, we evaluate KeCO under a simulated online scenario, and the strong performance in this scenario highlights the practical value of our framework for resource-constrained real-world scenarios.
Updated: 2025-07-30 08:05:05
Field: cs.CV,cs.AI
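The key-update step can be sketched as follows. The nearest-key selection and the moving-average rule here are illustrative assumptions; the paper studies several selection strategies whose exact form may differ.

```python
import numpy as np

def update_coreset_keys(keys, labels, feats, feat_labels, lr=0.2):
    """Each untapped support sample pulls the key of the nearest
    same-class coreset entry toward its own visual feature."""
    keys = keys.copy()
    for f, y in zip(feats, feat_labels):
        same = np.where(labels == y)[0]          # same-class coreset entries
        if same.size == 0:
            continue
        d = np.linalg.norm(keys[same] - f, axis=1)
        j = same[np.argmin(d)]                   # anchor: nearest key
        keys[j] = (1 - lr) * keys[j] + lr * f    # moving-average update
    return keys

keys = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1])
new_keys = update_coreset_keys(keys, labels,
                               np.array([[0.5, 0.0]]), np.array([0]))
```

Because only keys are rewritten, the randomly initialized coreset improves without re-evaluating the whole support set.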
What is an "Abstract Reasoner"? Revisiting Experiments and Arguments about Large Language Models
Recent work has argued that large language models (LLMs) are not "abstract reasoners", citing their poor zero-shot performance on a variety of challenging tasks as evidence. We revisit these experiments in order to add nuance to the claim. First, we show that while LLMs indeed perform poorly in a zero-shot setting, even tuning a small subset of parameters for input encoding can enable near-perfect performance. However, we also show that this finetuning does not necessarily transfer across datasets. We take this collection of empirical results as an invitation to (re-)open the discussion of what it means to be an "abstract reasoner", and why it matters whether LLMs fit the bill.
Updated: 2025-07-30 08:04:19
Field: cs.CL,cs.AI
The wall confronting large language models
We show that the scaling laws which determine the performance of large language models (LLMs) severely limit their ability to improve the uncertainty of their predictions. As a result, raising their reliability to meet the standards of scientific inquiry is intractable by any reasonable measure. We argue that the very mechanism which fuels much of the learning power of LLMs, namely the ability to generate non-Gaussian output distributions from Gaussian input ones, might well be at the roots of their propensity to produce error pileup, ensuing information catastrophes and degenerative AI behaviour. This tension between learning and accuracy is a likely candidate mechanism underlying the observed low values of the scaling components. It is substantially compounded by the deluge of spurious correlations pointed out by Calude and Longo which rapidly increase in any data set merely as a function of its size, regardless of its nature. The fact that a degenerative AI pathway is a very probable feature of the LLM landscape does not mean that it must inevitably arise in all future AI research. Its avoidance, which we also discuss in this paper, necessitates putting a much higher premium on insight and understanding of the structural characteristics of the problems being investigated.
Updated: 2025-07-30 07:58:56
Field: cs.AI
DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework
Multivariate Time Series Forecasting (MTSF) plays a key role in many applications. Recent works have explored using Large Language Models for MTSF to take advantage of their reasoning abilities. However, many methods treat LLMs as end-to-end forecasters, which often leads to a loss of numerical precision and forces LLMs to handle patterns beyond their intended design. Alternatively, methods that attempt to align textual and time series modalities within latent space frequently encounter alignment difficulty. In this paper, we propose to treat LLMs not as standalone forecasters, but as semantic guidance modules within a dual-stream framework. We propose DualSG, a dual-stream framework that provides explicit semantic guidance, where LLMs act as Semantic Guides to refine rather than replace traditional predictions. As part of DualSG, we introduce Time Series Caption, an explicit prompt format that summarizes trend patterns in natural language and provides interpretable context for LLMs, rather than relying on implicit alignment between text and time series in the latent space. We also design a caption-guided fusion module that explicitly models inter-variable relationships while reducing noise and computation. Experiments on real-world datasets from diverse domains show that DualSG consistently outperforms 15 state-of-the-art baselines, demonstrating the value of explicitly combining numerical forecasting with semantic guidance.
Updated: 2025-07-30 07:57:31
Field: cs.AI
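A Time Series Caption of the kind described can be approximated by a rule-based sketch; the template and the flatness threshold below are invented for illustration and are not the paper's prompt format.

```python
def caption_trend(series):
    """Summarize a numeric window as a short natural-language caption."""
    start, end = series[0], series[-1]
    change = end - start
    if abs(change) < 0.05 * max(abs(start), 1e-8):
        trend = "roughly flat"
    elif change > 0:
        trend = "rising"
    else:
        trend = "falling"
    lo, hi = min(series), max(series)
    return (f"The series is {trend}, moving from {start:.2f} to {end:.2f} "
            f"within a range of [{lo:.2f}, {hi:.2f}].")

cap = caption_trend([1.0, 1.2, 1.5, 1.9])
```

A caption like this gives the LLM interpretable context without asking it to emit numeric forecasts itself.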
Strategic Integration of Artificial Intelligence in the C-Suite: The Role of the Chief AI Officer
The integration of Artificial Intelligence (AI) into corporate strategy has become critical for organizations seeking to maintain a competitive advantage in the digital age. As AI transforms business models, operations, and decision-making, the need for dedicated executive leadership to guide, govern, and orchestrate this transformation becomes increasingly evident. This paper examines emerging future scenarios across three domains: the AI Economy, the AI Organization, and Competition in the Age of AI. These domains reveal environmental, structural, and strategic tensions that existing C-suite roles struggle to resolve. In response, the paper develops a theory-informed framework for the Chief AI Officer (CAIO), outlining the distinct functions and capabilities required to guide and govern AI at scale. Drawing on illustrative cases and emerging practice, this conceptualization clarifies the CAIO's unique role within the executive landscape and presents a forward-looking research agenda. This paper advances the discourse on AI leadership by offering a theory-driven rationale for the strategic integration of AI at the executive level and by positioning the Chief AI Officer as a distinct and necessary role within modern organizations.
Updated: 2025-07-30 07:54:27
Field: cs.CY,cs.AI,cs.LG,econ.GN,q-fin.EC
A case for data valuation transparency via DValCards
Following the rise in popularity of data-centric machine learning (ML), various data valuation methods have been proposed to quantify the contribution of each datapoint to desired ML model performance metrics (e.g., accuracy). Beyond the technical applications of data valuation methods (e.g., data cleaning, data acquisition, etc.), it has been suggested that within the context of data markets, data buyers might utilize such methods to fairly compensate data owners. Here we demonstrate that data valuation metrics are inherently biased and unstable under simple algorithmic design choices, resulting in both technical and ethical implications. By analyzing 9 tabular classification datasets and 6 data valuation methods, we illustrate how (1) common and inexpensive data pre-processing techniques can drastically alter estimated data values; (2) subsampling via data valuation metrics may increase class imbalance; and (3) data valuation metrics may undervalue underrepresented group data. Consequently, we argue in favor of increased transparency associated with data valuation in-the-wild and introduce the novel Data Valuation Cards (DValCards) framework towards this aim. The proliferation of DValCards will reduce misuse of data valuation metrics, including in data pricing, and build trust in responsible ML systems.
Updated: 2025-07-30 07:49:04
Field: cs.LG
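The kind of per-datapoint valuation being audited can be illustrated with a toy leave-one-out scheme over a 1-nearest-neighbour proxy model. This is a generic sketch of data valuation, not one of the six methods analyzed in the paper.

```python
import numpy as np

def loo_values(X, y, Xv, yv):
    """value(i) = validation accuracy with the full training set
    minus accuracy with datapoint i removed."""
    def acc(Xtr, ytr):
        preds = [ytr[np.argmin(np.linalg.norm(Xtr - xv, axis=1))]
                 for xv in Xv]                      # 1-NN prediction
        return np.mean(np.array(preds) == yv)
    base = acc(X, y)
    return np.array([base - acc(X[np.arange(len(X)) != i],
                                y[np.arange(len(X)) != i])
                     for i in range(len(X))])

X = np.array([[0.0], [1.0], [10.0]]); y = np.array([0, 0, 1])
Xv = np.array([[0.2], [9.0]]);        yv = np.array([0, 1])
vals = loo_values(X, y, Xv, yv)  # only the lone class-1 point has value
```

Even in a toy like this, preprocessing that changes the feature geometry also changes the nearest-neighbour structure and hence the estimated values, which is the instability the paper documents.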
Breaking Obfuscation: Cluster-Aware Graph with LLM-Aided Recovery for Malicious JavaScript Detection
With the rapid expansion of web-based applications and cloud services, malicious JavaScript code continues to pose significant threats to user privacy, system integrity, and enterprise security. However, detecting such threats remains challenging due to sophisticated code obfuscation techniques and JavaScript's inherent language characteristics, particularly its nested closure structures and syntactic flexibility. In this work, we propose DeCoda, a hybrid defense framework that combines large language model (LLM)-based deobfuscation with code graph learning: (1) We first construct a sophisticated prompt-learning pipeline with multi-stage refinement, where the LLM progressively reconstructs the original code structure from obfuscated inputs and then generates normalized Abstract Syntax Tree (AST) representations; (2) In JavaScript ASTs, dynamic typing scatters semantically similar nodes while deeply nested functions fracture scope capturing, introducing structural noise and semantic ambiguity. To address these challenges, we then propose to learn hierarchical code graph representations via a Cluster-wise Graph that synergistically integrates a graph transformer network, node clustering, and node-to-cluster attention to simultaneously capture both local node-level semantics and global cluster-induced structural relationships from the AST graph. Experimental results demonstrate that our method achieves F1-scores of 94.64% and 97.71% on two benchmark datasets, demonstrating absolute improvements of 10.74% and 13.85% over state-of-the-art baselines. In false-positive control evaluation at fixed FPR levels (0.0001, 0.001, 0.01), our approach delivers 4.82, 5.91, and 2.53 higher TPR respectively compared to the best-performing baseline. These results highlight the effectiveness of LLM-based deobfuscation and underscore the importance of modeling cluster-level relationships in detecting malicious code.
Updated: 2025-07-30 07:46:49
Field: cs.CR,cs.LG
RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function
Despite their widespread success, deep neural networks remain critically vulnerable to adversarial attacks, posing significant risks in safety-sensitive applications. This paper investigates activation functions as a crucial yet underexplored component for enhancing model robustness. We propose a Rademacher Complexity Reduction Activation Function (RCR-AF), a novel activation function designed to improve both generalization and adversarial resilience. RCR-AF uniquely combines the advantages of GELU (including smoothness, gradient stability, and negative information retention) with ReLU's desirable monotonicity, while simultaneously controlling both model sparsity and capacity through built-in clipping mechanisms governed by two hyperparameters, α and γ. Our theoretical analysis, grounded in Rademacher complexity, demonstrates that these parameters directly modulate the model's Rademacher complexity, offering a principled approach to enhance robustness. Comprehensive empirical evaluations show that RCR-AF consistently outperforms widely-used alternatives (ReLU, GELU, and Swish) in both clean accuracy under standard training and in adversarial robustness within adversarial training paradigms.
Updated: 2025-07-30 07:45:03
Field: cs.LG,cs.AI,cs.CV
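The abstract does not give RCR-AF's closed form, so the following is only a guessed illustration of how built-in clipping with two hyperparameters might combine GELU-like smoothness with sparsity (alpha) and capacity (gamma) control. It is not the paper's definition.

```python
import math

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

def rcr_af(x, alpha=0.1, gamma=6.0):
    """Hypothetical clipped GELU-like activation: outputs below the
    sparsity threshold alpha are zeroed, and outputs are capped at
    gamma to bound capacity."""
    y = gelu(x)
    if y < alpha:
        return 0.0          # built-in clipping: promotes sparsity
    return min(y, gamma)    # built-in clipping: bounds capacity

ys = [rcr_af(x) for x in (-1.0, 0.05, 2.0, 10.0)]
```

The sketch only conveys how two clipping knobs could jointly shrink the function class, which is the mechanism the Rademacher-complexity analysis formalizes.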
AI-generated stories favour stability over change: homogeneity and cultural stereotyping in narratives generated by gpt-4o-mini
Can a language model trained largely on Anglo-American texts generate stories that are culturally relevant to other nationalities? To find out, we generated 11,800 stories - 50 for each of 236 countries - by sending the prompt "Write a 1500 word potential {demonym} story" to OpenAI's model gpt-4o-mini. Although the stories do include surface-level national symbols and themes, they overwhelmingly conform to a single narrative plot structure across countries: a protagonist lives in or returns home to a small town and resolves a minor conflict by reconnecting with tradition and organising community events. Real-world conflicts are sanitised, romance is almost absent, and narrative tension is downplayed in favour of nostalgia and reconciliation. The result is a narrative homogenisation: an AI-generated synthetic imaginary that prioritises stability above change and tradition above growth. We argue that the structural homogeneity of AI-generated narratives constitutes a distinct form of AI bias, a narrative standardisation that should be acknowledged alongside the more familiar representational bias. These findings are relevant to literary studies, narratology, critical AI studies, NLP research, and efforts to improve the cultural alignment of generative AI.
Updated: 2025-07-30 07:44:28
Field: cs.CL,cs.AI,H.1.2; I.2.4; I.2.0; I.2.7
Anti-Inpainting: A Proactive Defense Approach against Malicious Diffusion-based Inpainters under Unknown Conditions
With the increasing prevalence of diffusion-based malicious image manipulation, existing proactive defense methods struggle to safeguard images against tampering under unknown conditions. To address this, we propose Anti-Inpainting, a proactive defense approach built on three novel modules. First, we introduce a multi-level deep feature extractor to obtain intricate features from the diffusion denoising process, enhancing protective effectiveness. Second, we design a multi-scale, semantic-preserving data augmentation technique to enhance the transferability of adversarial perturbations across unknown conditions. Finally, we propose a selection-based distribution deviation optimization strategy to bolster protection against manipulations guided by diverse random seeds. Extensive experiments on InpaintGuardBench and CelebA-HQ demonstrate that Anti-Inpainting effectively defends against diffusion-based inpainters under unknown conditions. Additionally, our approach demonstrates robustness against various image purification methods and transferability across different diffusion model versions.
Updated: 2025-07-30 07:40:12
Field: cs.CV,cs.AI,cs.MM
Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline
Pain is a complex condition affecting a large portion of the population. Accurate and consistent evaluation is essential for individuals experiencing pain, and it supports the development of effective and advanced management strategies. Automatic pain assessment systems provide continuous monitoring and support clinical decision-making, aiming to reduce distress and prevent functional decline. This study has been submitted to the Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed method introduces a pipeline that leverages respiration as the input signal and incorporates a highly efficient cross-attention transformer alongside a multi-windowing strategy. Extensive experiments demonstrate that respiration is a valuable physiological modality for pain assessment. Moreover, experiments revealed that compact and efficient models, when properly optimized, can achieve strong performance, often surpassing larger counterparts. The proposed multi-window approach effectively captures both short-term and long-term features, as well as global characteristics, thereby enhancing the model's representational capacity.
Updated: 2025-07-30 07:37:25
Field: cs.AI,cs.LG,eess.SP
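The multi-window idea can be sketched as below: the same signal is summarized at several temporal scales and concatenated with global statistics. The window sizes and per-window statistics are placeholder choices, not the paper's configuration.

```python
import numpy as np

def multi_window_features(signal, window_sizes=(64, 128, 256)):
    """Per-window mean/std at several scales plus global statistics."""
    feats = []
    for w in window_sizes:
        for i in range(len(signal) // w):
            seg = signal[i * w:(i + 1) * w]
            feats.extend([seg.mean(), seg.std()])
    feats.extend([signal.mean(), signal.std()])  # global characteristics
    return np.array(feats)

sig = np.sin(np.linspace(0, 8 * np.pi, 512))  # stand-in respiration trace
f = multi_window_features(sig)  # 30 features across three scales
```

In the paper the windows feed a cross-attention transformer rather than a flat feature vector; the sketch shows only the windowing itself.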
Multi-Representation Diagrams for Pain Recognition: Integrating Various Electrodermal Activity Signals into a Single Image
Pain is a multifaceted phenomenon that affects a substantial portion of the population. Reliable and consistent evaluation benefits those experiencing pain and underpins the development of effective and advanced management strategies. Automatic pain-assessment systems deliver continuous monitoring, inform clinical decision-making, and aim to reduce distress while preventing functional decline. By incorporating physiological signals, these systems provide objective, accurate insights into an individual's condition. This study has been submitted to the Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed method introduces a pipeline that leverages electrodermal activity signals as input modality. Multiple representations of the signal are created and visualized as waveforms, and they are jointly visualized within a single multi-representation diagram. Extensive experiments incorporating various processing and filtering techniques, along with multiple representation combinations, demonstrate the effectiveness of the proposed approach. It consistently yields comparable, and in several cases superior, results to traditional fusion methods, establishing it as a robust alternative for integrating different signal representations or modalities.
Updated: 2025-07-30 07:34:18
Field: cs.AI
ETrace: Event-Driven Vulnerability Detection in Smart Contracts via LLM-Based Trace Analysis
With the advancing application of blockchain technology in various fields, ensuring the security and stability of smart contracts has emerged as a critical challenge. Current security analysis methodologies for vulnerability detection can be categorized into static analysis and dynamic analysis methods. However, these existing traditional vulnerability detection methods predominantly rely on analyzing original contract code, yet not all smart contracts provide accessible code. We present ETrace, a novel event-driven vulnerability detection framework for smart contracts, which uniquely identifies potential vulnerabilities through LLM-powered trace analysis without requiring source code access. By extracting fine-grained event sequences from transaction logs, the framework leverages Large Language Models (LLMs) as adaptive semantic interpreters to reconstruct event analysis through chain-of-thought reasoning. ETrace implements pattern matching to establish causal links between transaction behavior patterns and known attack behaviors. Furthermore, we validate the effectiveness of ETrace through preliminary experimental results.
Updated: 2025-07-30 07:32:19
Field: cs.CR,cs.SE,D.2.5
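The pattern-matching step, linking extracted event sequences to known attack behaviors, can be illustrated with ordered-subsequence matching. The event names below are hypothetical placeholders, not real contract events.

```python
def is_subsequence(pattern, events):
    """True if `pattern` occurs, in order, within the event sequence."""
    it = iter(events)
    return all(p in it for p in pattern)  # each `in` consumes the iterator

# Hypothetical fine-grained event sequence extracted from transaction logs.
trace = ["Deposit", "Call", "Withdraw", "Call", "Withdraw"]
reentrancy_like = ["Withdraw", "Call", "Withdraw"]
flagged = is_subsequence(reentrancy_like, trace)
```

ETrace's actual matching is LLM-assisted and causal rather than purely syntactic; the sketch shows only the basic ordered-sequence test.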
Nearest-Better Network for Visualizing and Analyzing Combinatorial Optimization Problems: A Unified Tool
The Nearest-Better Network (NBN) is a powerful method to visualize sampled data for continuous optimization problems while preserving multiple landscape features. However, the calculation of NBN is very time-consuming, and the extension of the method to combinatorial optimization problems is challenging but very important for analyzing the algorithm's behavior. This paper provides a straightforward theoretical derivation showing that the NBN network essentially functions as the maximum probability transition network for algorithms. This paper also presents an efficient NBN computation method with log-linear time complexity to address the time-consuming issue. By applying this efficient NBN algorithm to the OneMax problem and the Traveling Salesman Problem (TSP), we have made several remarkable discoveries for the first time: The fitness landscape of OneMax exhibits neutrality, ruggedness, and modality features. The primary challenges of TSP problems are ruggedness, modality, and deception. Two state-of-the-art TSP algorithms (i.e., EAX and LKH) have limitations when addressing challenges related to modality and deception, respectively. LKH, based on local search operators, fails when there are deceptive solutions near global optima. EAX, which is based on a single population, can efficiently maintain diversity. However, when multiple attraction basins exist, EAX retains individuals within multiple basins simultaneously, reducing inter-basin interaction efficiency and leading to the algorithm's stagnation.
Updated: 2025-07-30 07:31:58
标题: 最近较好网络:用于可视化和分析组合优化问题的统一工具
摘要: 最近较好网络(NBN)是一种强大的方法,用于可视化连续优化问题的采样数据,同时保留多个景观特征。然而,NBN的计算非常耗时,将该方法扩展到组合优化问题具有挑战性,但对于分析算法行为非常重要。本文提供了一个直观的理论推导,显示NBN网络本质上是算法的最大概率转移网络。本文还提出了一种具有对数线性时间复杂度的高效NBN计算方法,以解决时间消耗的问题。通过将这种高效的NBN算法应用于OneMax问题和旅行商问题(TSP),我们首次做出了几项重要发现:OneMax的适应度景观表现出中立性、崎岖性和多模态特征。TSP问题的主要挑战是崎岖性、多模态性和欺骗性。两种最先进的TSP算法(即EAX和LKH)在应对多模态和欺骗性挑战时存在局限性。基于局部搜索算子的LKH在全局最优解附近存在欺骗性解时失败。基于单一种群的EAX能够有效地保持多样性。然而,当存在多个吸引盆时,EAX同时保留个体在多个盆中,降低了盆间交互效率,导致算法停滞不前。
更新时间: 2025-07-30 07:31:58
领域: cs.AI,cs.NE
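The nearest-better relation this abstract builds on can be sketched directly. Below is a naive O(n²) construction for illustration only (not the paper's log-linear algorithm); the OneMax fitness and the fully enumerated sample set are stand-ins:

```python
from itertools import product

def onemax(bits):
    # OneMax fitness: count of 1-bits, maximized at the all-ones string.
    return sum(bits)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def nearest_better_network(samples, fitness):
    # Link each sample to its nearest sample with strictly better fitness;
    # the global best has no outgoing edge.
    edges = {}
    for s in samples:
        better = [t for t in samples if fitness(t) > fitness(s)]
        if better:
            edges[s] = min(better, key=lambda t: hamming(s, t))
    return edges

samples = list(product((0, 1), repeat=3))       # full enumeration for n = 3
nbn = nearest_better_network(samples, onemax)
# 7 of the 8 nodes get an outgoing edge; only the optimum (1, 1, 1) has none.
```

Landscape features such as modality can then be read off this network, e.g. nodes whose nearest-better neighbor is unusually far away hint at separate attraction basins.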
Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis
Pain is a complex and pervasive condition that affects a significant portion of the population. Accurate and consistent assessment is essential for individuals suffering from pain, as well as for developing effective management strategies in a healthcare system. Automatic pain assessment systems enable continuous monitoring, support clinical decision-making, and help minimize patient distress while mitigating the risk of functional deterioration. Leveraging physiological signals offers objective and precise insights into a person's state, and their integration in a multimodal framework can further enhance system performance. This study has been submitted to the Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed approach introduces Tiny-BioMoE, a lightweight pretrained embedding model for biosignal analysis. Trained on 4.4 million biosignal image representations and consisting of only 7.3 million parameters, it serves as an effective tool for extracting high-quality embeddings for downstream tasks. Extensive experiments involving electrodermal activity, blood volume pulse, respiratory signals, peripheral oxygen saturation, and their combinations highlight the model's effectiveness across diverse modalities in automatic pain recognition tasks. The model's architecture (code) and weights are available at https://github.com/GkikasStefanos/Tiny-BioMoE.
Updated: 2025-07-30 07:31:29
标题: Tiny-BioMoE:一种用于生物信号分析的轻量级嵌入模型
摘要: 疼痛是一个复杂而普遍存在的症状,影响着大部分人口。对于患有疼痛的个体以及在医疗系统中开发有效管理策略而言,准确而一致的评估至关重要。自动疼痛评估系统能够实现持续监测,支持临床决策,并帮助减轻患者的痛苦,同时减少功能恶化的风险。利用生理信号提供客观而精确的洞察力,将其整合到多模态框架中可以进一步提高系统性能。本研究已提交至“第二届下一代疼痛评估多模态感知大挑战(AI4PAIN)”。提出的方法引入了“Tiny-BioMoE”,一个用于生物信号分析的轻量级预训练嵌入模型。该模型在440万个生物信号图像表示上进行了训练,仅包含730万个参数,可作为提取高质量嵌入用于下游任务的有效工具。涉及皮肤电活动、血容量脉搏、呼吸信号、外周氧饱和度及其组合的大量实验突显了该模型在自动疼痛识别任务中跨多种模态的有效性。该模型的架构(代码)和权重可在 https://github.com/GkikasStefanos/Tiny-BioMoE 上找到。
更新时间: 2025-07-30 07:31:29
领域: cs.AI
RANA: Robust Active Learning for Noisy Network Alignment
Network alignment has attracted widespread attention in various fields. However, most existing works mainly focus on the problem of label sparsity, while overlooking the issue of noise in network alignment, which can substantially undermine model performance. Such noise mainly includes structural noise from noisy edges and labeling noise caused by human-induced and process-driven errors. To address these problems, we propose RANA, a Robust Active learning framework for noisy Network Alignment. RANA effectively tackles both structure noise and label noise while addressing the sparsity of anchor link annotations, which can improve the robustness of network alignment models. Specifically, RANA introduces the proposed Noise-aware Selection Module and the Label Denoising Module to address structural noise and labeling noise, respectively. In the first module, we design a noise-aware maximization objective to select node pairs, incorporating a cleanliness score to address structural noise. In the second module, we propose a novel multi-source fusion denoising strategy that leverages model and twin node pairs labeling to provide more accurate labels for node pairs. Empirical results on three real-world datasets demonstrate that RANA outperforms state-of-the-art active learning-based methods in alignment accuracy. Our code is available at https://github.com/YXNan0110/RANA.
Updated: 2025-07-30 07:26:40
标题: RANA:嘈杂网络对齐的稳健主动学习
摘要: 网络对齐在各个领域吸引了广泛的关注。然而,大多数现有工作主要集中在标签稀疏性问题上,而忽视了网络对齐中的噪声问题,这可能会严重影响模型性能。这种噪声主要包括来自嘈杂边缘的结构噪声和由人为引起的标签噪声以及过程驱动错误引起的标签噪声。为了解决这些问题,我们提出了RANA,一个用于嘈杂网络对齐的鲁棒主动学习框架。RANA有效地处理结构噪声和标签噪声,同时解决了锚点链接注释的稀疏性问题,从而提高了网络对齐模型的鲁棒性。具体而言,RANA引入了提出的噪声感知选择模块和标签去噪模块,分别解决结构噪声和标签噪声问题。在第一个模块中,我们设计了一个噪声感知最大化目标来选择节点对,结合了一个清洁度评分来解决结构噪声问题。在第二个模块中,我们提出了一种新颖的多源融合去噪策略,利用模型和双节点对标签来为节点对提供更准确的标签。在三个真实世界数据集上的实证结果表明,RANA在对齐准确性方面优于最先进的基于主动学习的方法。我们的代码可在https://github.com/YXNan0110/RANA找到。
更新时间: 2025-07-30 07:26:40
领域: cs.LG
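The noise-aware selection idea above can be illustrated with a toy ranking rule. The informativeness and cleanliness scores below are hypothetical stand-ins, not RANA's actual objective:

```python
def select_pairs(candidates, k=2):
    # candidates: (node_pair, informativeness, cleanliness) triples.
    # Discounting by cleanliness demotes pairs touched by structural noise.
    ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
    return [pair for pair, _, _ in ranked[:k]]

candidates = [
    (("u1", "v1"), 0.9, 0.2),   # informative but structurally noisy
    (("u2", "v2"), 0.7, 0.9),
    (("u3", "v3"), 0.6, 0.8),
    (("u4", "v4"), 0.3, 1.0),
]
chosen = select_pairs(candidates)
# → [("u2", "v2"), ("u3", "v3")]: the noisy high-score pair is demoted.
```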
Cross-Border Legal Adaptation of Autonomous Vehicle Design based on Logic and Non-monotonic Reasoning
This paper focuses on the legal compliance challenges of autonomous vehicles in a transnational context. We choose the perspective of designers and try to provide supporting legal reasoning in the design process. Based on argumentation theory, we introduce a logic to represent the basic properties of argument-based practical (normative) reasoning, combined with partial order sets of natural numbers to express priority. Finally, through case analysis of legal texts, we show how the reasoning system we provide can help designers to adapt their design solutions more flexibly in the cross-border application of autonomous vehicles and to more easily understand the legal implications of their decisions.
Updated: 2025-07-30 07:24:15
标题: 基于逻辑和非单调推理的自动驾驶车辆设计的跨境法律适应
摘要: 本文关注自动驾驶车辆在跨国背景下的法律合规挑战。我们选择设计师的视角,并尝试在设计过程中提供支持性的法律推理。基于论证理论,我们引入了一种逻辑来表示基于论点的实践(规范)推理的基本属性,结合自然数的偏序集来表达优先级。最后,通过对法律文本的案例分析,我们展示了我们提供的推理系统如何帮助设计师在自动驾驶车辆的跨境应用中更灵活地调整他们的设计解决方案,并更容易地理解他们决策的法律影响。
更新时间: 2025-07-30 07:24:15
领域: cs.AI
ChemDFM-R: A Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge
While large language models (LLMs) have achieved impressive progress, their application in scientific domains such as chemistry remains hindered by shallow domain understanding and limited reasoning capabilities. In this work, we focus on the specific field of chemistry and develop a Chemical Reasoner LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized knowledge points to enhance the model's understanding of the fundamental principles and logical structure of chemistry. Then, we propose a mix-sourced distillation strategy that integrates expert-curated knowledge with general-domain reasoning skills, followed by domain-specific reinforcement learning to enhance chemical reasoning. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves cutting-edge performance while providing interpretable, rationale-driven outputs. Further case studies illustrate how explicit reasoning chains significantly improve the reliability, transparency, and practical utility of the model in real-world human-AI collaboration scenarios.
Updated: 2025-07-30 07:23:58
标题: ChemDFM-R:一种融合原子化化学知识的化学推理LLM
摘要: 尽管大型语言模型(LLMs)取得了令人瞩目的进展,但它们在化学等科学领域的应用仍受到领域理解的限制和推理能力有限的影响。在这项工作中,我们专注于化学领域,开发了一个化学推理LLM,ChemDFM-R。我们首先构建了一个全面的原子化知识数据集,以增强模型对化学基本原理和逻辑结构的理解。然后,我们提出了一种混合源蒸馏策略,将专家策划的知识与通用领域推理技能相结合,然后通过领域特定的强化学习来增强化学推理能力。在各种化学基准测试中的实验证明,ChemDFM-R实现了尖端性能,同时提供了可解释的、基于理性的输出。进一步的案例研究说明了明确的推理链如何显着提高模型在现实世界人工智能协作场景中的可靠性、透明度和实用性。
更新时间: 2025-07-30 07:23:58
领域: cs.CE,cs.AI
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
In this work, we propose CleanMel, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance. The proposed network takes as input the noisy and reverberant microphone recording and predicts the corresponding clean Mel-spectrogram. The enhanced Mel-spectrogram can be either transformed to the speech waveform with a neural vocoder or directly used for ASR. The proposed network is composed of interleaved cross-band and narrow-band processing in the Mel-frequency domain, for learning the full-band spectral pattern and the narrow-band properties of signals, respectively. Compared to linear-frequency domain or time-domain speech enhancement, the key advantage of Mel-spectrogram enhancement is that Mel-frequency presents speech in a more compact way and thus is easier to learn, which will benefit both speech quality and ASR. Experimental results on five English datasets and one Chinese dataset demonstrate a significant improvement in both speech quality and ASR performance achieved by the proposed model. Code and audio examples of our model are available online.
Updated: 2025-07-30 07:22:13
标题: CleanMel:用于提高语音质量和ASR的Mel频谱图增强
摘要: 在这项工作中,我们提出了CleanMel,一种用于改善语音质量和自动语音识别(ASR)性能的单通道Mel频谱降噪和去混响网络。所提出的网络以噪声和混响的麦克风录音作为输入,并预测相应的清晰Mel频谱图。增强的Mel频谱图可以通过神经声码器转换为语音波形,也可以直接用于ASR。所提出的网络在Mel频率域中由交叉带和窄带处理交织而成,用于分别学习全频段的谱图模式和信号的窄带属性。与线性频率域或时域语音增强相比,Mel频谱图增强的关键优势在于Mel频率以更紧凑的方式呈现语音,因此更容易学习,这将有利于语音质量和ASR。对五个英文和一个中文数据集的实验结果表明,所提出的模型在语音质量和ASR性能方面取得了显著改善。我们的模型的代码和音频示例可在线获取。
更新时间: 2025-07-30 07:22:13
领域: eess.AS,cs.AI,cs.SD
Comparing Normalizing Flows with Kernel Density Estimation in Estimating Risk of Automated Driving Systems
The development of safety validation methods is essential for the safe deployment and operation of Automated Driving Systems (ADSs). One of the goals of safety validation is to prospectively evaluate the risk of an ADS dealing with real-world traffic. Scenario-based assessment is a widely-used approach, where test cases are derived from real-world driving data. To allow for a quantitative analysis of the system performance, the exposure of the scenarios must be accurately estimated. The exposure of scenarios at parameter level is expressed using a Probability Density Function (PDF). However, assumptions about the PDF, such as parameter independence, can introduce errors, while avoiding assumptions often leads to oversimplified models with limited parameters to mitigate the curse of dimensionality. This paper considers the use of Normalizing Flows (NF) for estimating the PDF of the parameters. NF are a class of generative models that transform a simple base distribution into a complex one using a sequence of invertible and differentiable mappings, enabling flexible, high-dimensional density estimation without restrictive assumptions on the PDF's shape. We demonstrate the effectiveness of NF in quantifying risk and risk uncertainty of an ADS, comparing its performance with Kernel Density Estimation (KDE), a traditional method for non-parametric PDF estimation. While NF require more computational resources compared to KDE, NF is less sensitive to the curse of dimensionality. As a result, NF can improve risk uncertainty estimation, offering a more precise assessment of an ADS's safety. This work illustrates the potential of NF in scenario-based safety. Future work involves experimenting more with using NF for scenario generation and optimizing the NF architecture, transformation types, and training hyperparameters to further enhance their applicability.
Updated: 2025-07-30 07:16:59
标题: 比较正规化流与核密度估计在估计自动驾驶系统风险中的应用
摘要: 安全验证方法的发展对于自动驾驶系统(ADS)的安全部署和运行至关重要。安全验证的一个目标是前瞻性评估ADS应对真实道路交通风险的能力。基于场景的评估是一种广泛应用的方法,其中测试案例源自真实驾驶数据。为了允许对系统性能进行定量分析,必须准确估计场景的曝光。在参数级别上,场景的曝光使用概率密度函数(PDF)来表达。然而,关于PDF的假设,如参数独立性,可能引入误差,而避免假设通常会导致过于简化的模型,参数有限以减轻维度诅咒。 本文考虑使用归一化流(NF)来估计参数的PDF。NF是一类生成模型,通过一系列可逆和可微分的映射将简单的基本分布转化为复杂的分布,实现灵活、高维度的密度估计,而不对PDF的形状做出限制性假设。我们展示了NF在量化ADS的风险和风险不确定性方面的有效性,将其性能与核密度估计(KDE)进行了比较,后者是一种传统的非参数PDF估计方法。虽然与KDE相比,NF需要更多的计算资源,但NF对维度诅咒不太敏感。因此,NF可以改善风险不确定性估计,提供对ADS安全性的更精确评估。 这项工作展示了NF在基于场景的安全性方面的潜力。未来工作涉及更多地尝试使用NF进行场景生成,并优化NF的架构、转换类型和培训超参数,以进一步提升其适用性。
更新时间: 2025-07-30 07:16:59
领域: cs.RO,cs.LG
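The KDE side of the comparison above is easy to sketch from scratch (the NF side needs a trained flow, so it is omitted). The ego-speed parameter and bandwidth below are hypothetical:

```python
import numpy as np

def gaussian_kde_1d(samples, bandwidth):
    # Returns a function evaluating the Gaussian kernel density estimate at x.
    def pdf(x):
        x = np.atleast_1d(x).astype(float)
        z = (x[:, None] - samples[None, :]) / bandwidth
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (
            len(samples) * bandwidth * np.sqrt(2 * np.pi))
    return pdf

rng = np.random.default_rng(0)
speeds = rng.normal(20.0, 2.0, size=1000)   # hypothetical scenario parameter
pdf = gaussian_kde_1d(speeds, bandwidth=0.5)
# Exposure density is highest near typical speeds and decays in the tails.
```

The curse of dimensionality the abstract mentions shows up here because this kernel sum generalizes poorly as the parameter dimension grows, which is where the flow-based estimator is claimed to help.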
Theoretical Analysis of Relative Errors in Gradient Computations for Adversarial Attacks with CE Loss
Gradient-based adversarial attacks using the Cross-Entropy (CE) loss often suffer from overestimation due to relative errors in gradient computation induced by floating-point arithmetic. This paper provides a rigorous theoretical analysis of these errors, conducting the first comprehensive study of floating-point computation errors in gradient-based attacks across four distinct scenarios: (i) unsuccessful untargeted attacks, (ii) successful untargeted attacks, (iii) unsuccessful targeted attacks, and (iv) successful targeted attacks. We establish theoretical foundations characterizing the behavior of relative numerical errors under different attack conditions, revealing previously unknown patterns in gradient computation instability, and identify floating-point underflow and rounding as key contributors. Building on this insight, we propose the Theoretical MIFPE (T-MIFPE) loss function, which incorporates an optimal scaling factor $T = t^*$ to minimize the impact of floating-point errors, thereby enhancing the accuracy of gradient computation in adversarial attacks. Extensive experiments on the MNIST, CIFAR-10, and CIFAR-100 datasets demonstrate that T-MIFPE outperforms existing loss functions, including CE, C\&W, DLR, and MIFPE, in terms of attack potency and robustness evaluation accuracy.
Updated: 2025-07-30 07:14:59
标题: 对CE损失函数下对抗攻击中梯度计算相对误差的理论分析
摘要: 基于梯度的对抗攻击通常使用交叉熵(CE)损失,由于浮点运算引起的梯度计算中的相对误差,往往存在高估问题。本文提供了对这些错误的严格理论分析,首次对基于梯度攻击中的浮点计算错误进行了全面研究,涵盖了四种不同情景:(i)未成功的无目标攻击,(ii)成功的无目标攻击,(iii)未成功的有目标攻击,以及(iv)成功的有目标攻击。我们建立了在不同攻击条件下相对数值误差行为的理论基础,揭示了梯度计算不稳定性中以前未知的模式,并确定了浮点下溢和舍入作为关键因素。基于这一洞察,我们提出了理论MIFPE(T-MIFPE)损失函数,该函数结合了一个最优缩放因子$T = t^*$,以最小化浮点错误的影响,从而提高对抗攻击中梯度计算的准确性。在MNIST、CIFAR-10和CIFAR-100数据集上进行的大量实验表明,T-MIFPE在攻击效果和稳健性评估准确性方面优于现有的损失函数,包括CE、C\&W、DLR和MIFPE。
更新时间: 2025-07-30 07:14:59
领域: cs.LG,cs.AI,cs.CV
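The kind of numerical failure analyzed above can be reproduced in a few lines. This is only an illustration of CE-gradient underflow in the confident-logit regime, not the paper's T-MIFPE loss:

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # the standard max-shift stabilization
    e = np.exp(z)
    return e / e.sum()

# Confident logits, as in a "successful attack" regime.
logits = np.array([30.0, 0.0, 0.0], dtype=np.float32)

p32 = softmax(logits)
grad32 = p32[0] - 1.0            # dCE/dz_true for a one-hot label on class 0
# In float32 the true-class probability rounds to exactly 1.0, so the
# gradient underflows to 0.0 and gradient-based attack updates stall.

p64 = softmax(logits.astype(np.float64))
grad64 = p64[0] - 1.0            # float64 still resolves a tiny negative value
```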
Multimodal Late Fusion Model for Problem-Solving Strategy Classification in a Machine Learning Game
Machine learning models are widely used to support stealth assessment in digital learning environments. Existing approaches typically rely on abstracted gameplay log data, which may overlook subtle behavioral cues linked to learners' cognitive strategies. This paper proposes a multimodal late fusion model that integrates screencast-based visual data and structured in-game action sequences to classify students' problem-solving strategies. In a pilot study with secondary school students (N=149) playing a multitouch educational game, the fusion model outperformed unimodal baseline models, increasing classification accuracy by over 15%. Results highlight the potential of multimodal ML for strategy-sensitive assessment and adaptive support in interactive learning contexts.
Updated: 2025-07-30 07:12:06
标题: 多模式晚期融合模型用于机器学习游戏中的问题解决策略分类
摘要: 机器学习模型广泛用于支持数字学习环境中的隐形评估。现有方法通常依赖于抽象的游戏日志数据,可能忽略与学习者认知策略相关的微妙行为线索。本文提出了一种多模态后期融合模型,该模型整合了基于屏幕录像的视觉数据和结构化的游戏内动作序列,用于分类学生的问题解决策略。在一项与中学生(N=149)一起玩多点触摸教育游戏的试点研究中,融合模型的表现优于单模态基线模型,将分类准确度提高了超过15%。结果突显了多模态机器学习在策略敏感评估和交互式学习环境中自适应支持方面的潜力。
更新时间: 2025-07-30 07:12:06
领域: cs.LG
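Late fusion itself is simple to sketch. The two probability vectors below are hypothetical stand-ins for the outputs of the screencast and action-sequence branches:

```python
import numpy as np

def late_fusion(prob_per_modality, weights=None):
    # Average per-modality class probabilities (equal weights by default).
    probs = np.stack(prob_per_modality)          # (n_modalities, n_classes)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)
    fused = np.average(probs, axis=0, weights=weights)
    return fused, int(np.argmax(fused))

visual_probs = np.array([0.2, 0.7, 0.1])   # screencast branch (stand-in)
action_probs = np.array([0.4, 0.5, 0.1])   # in-game action branch (stand-in)
fused, label = late_fusion([visual_probs, action_probs])
# fused = [0.3, 0.6, 0.1]; predicted strategy class = 1
```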
Bridging Privacy and Robustness for Trustworthy Machine Learning
The widespread adoption of machine learning necessitates robust privacy protection alongside algorithmic resilience. While Local Differential Privacy (LDP) provides foundational guarantees, sophisticated adversaries with prior knowledge demand more nuanced Bayesian privacy notions, such as Maximum Bayesian Privacy (MBP) and Average Bayesian Privacy (ABP), first introduced by \cite{zhang2022no}. Concurrently, machine learning systems require inherent robustness against data perturbations and adversarial manipulations. This paper systematically investigates the intricate theoretical relationships among LDP, MBP, and ABP. Crucially, we bridge these privacy concepts with algorithmic robustness, particularly within the Probably Approximately Correct (PAC) learning framework. Our work demonstrates that privacy-preserving mechanisms inherently confer PAC robustness. We present key theoretical results, including the formalization of the established LDP-MBP relationship, novel bounds between MBP and ABP, and a proof demonstrating PAC robustness from MBP. Furthermore, we establish a novel theoretical relationship quantifying how privacy leakage directly influences an algorithm's input robustness. These results provide a unified theoretical framework for understanding and optimizing the privacy-robustness trade-off, paving the way for the development of more secure, trustworthy, and resilient machine learning systems.
Updated: 2025-07-30 07:10:49
标题: 桥接隐私和鲁棒性以构建可信的机器学习
摘要: 机器学习的广泛应用需要在算法弹性之外提供强大的隐私保护。虽然本地差分隐私(LDP)提供了基本保证,但具有先验知识的复杂对手需要更加细致的贝叶斯隐私概念,例如最大贝叶斯隐私(MBP)和平均贝叶斯隐私(ABP),这是由张等人在2022年首次提出的。与此同时,机器学习系统需要对数据扰动和对抗性操纵具有固有的鲁棒性。本文系统地研究了LDP、MBP和ABP之间的复杂理论关系。重要的是,我们将这些隐私概念与算法鲁棒性联系起来,特别是在"大概近似正确"(PAC)学习框架内。我们的工作表明,保护隐私的机制本质上赋予了PAC鲁棒性。我们提出了关键的理论结果,包括对已有LDP-MBP关系的形式化、MBP与ABP之间的新界,以及由MBP推出PAC鲁棒性的证明。此外,我们建立了一个新的理论关系,量化隐私泄露如何直接影响算法的输入鲁棒性。这些结果为理解和优化隐私-鲁棒性权衡提供了一个统一的理论框架,为开发更安全、可信和有韧性的机器学习系统铺平了道路。
更新时间: 2025-07-30 07:10:49
领域: cs.LG,cs.AI,cs.CR
Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance
Vision-Language-Action (VLA) models have made substantial progress by leveraging the robust capabilities of Visual Language Models (VLMs). However, VLMs' significant parameter size and autoregressive (AR) decoding nature impose considerable computational demands on VLA models. While Speculative Decoding (SD) has shown efficacy in accelerating Large Language Models (LLMs) by incorporating efficient drafting and parallel verification, allowing multiple tokens to be generated in one forward pass, its application to VLA models remains unexplored. This work introduces Spec-VLA, an SD framework designed to accelerate VLA models. Due to the difficulty of the action prediction task and the greedy decoding mechanism of the VLA models, the direct application of the advanced SD framework to the VLA prediction task yields a minor speed improvement. To boost the generation speed, we propose an effective mechanism to relax acceptance utilizing the relative distances represented by the action tokens of the VLA model. Empirical results across diverse test scenarios affirm the effectiveness of the Spec-VLA framework, and further analysis substantiates the impact of our proposed strategies, which enhance the acceptance length by 44%, achieving 1.42 times speedup compared with the OpenVLA baseline, without compromising the success rate. The success of the Spec-VLA framework highlights the potential for broader application of speculative execution in VLA prediction scenarios.
Updated: 2025-07-30 07:04:09
标题: Spec-VLA:具有放松接受标准的视觉-语言-动作模型的推测解码
摘要: Vision-Language-Action(VLA)模型通过利用视觉语言模型(VLMs)的强大能力取得了实质性进展。然而,VLMs的显著参数规模和自回归(AR)解码特性对VLA模型施加了相当大的计算需求。虽然推测解码(SD)通过结合高效草拟和并行验证、允许一次前向传递生成多个标记,已被证明能有效加速大型语言模型(LLMs),但其在VLA模型中的应用尚未被探索。本文介绍了Spec-VLA,一个旨在加速VLA模型的SD框架。由于动作预测任务的困难性和VLA模型的贪婪解码机制,将先进的SD框架直接应用于VLA预测任务仅带来轻微的速度改善。为提高生成速度,我们提出了一种有效的机制,利用VLA模型动作标记所表示的相对距离来放宽接受标准。在各种测试场景中的实证结果证实了Spec-VLA框架的有效性,进一步的分析也证实了我们所提策略的影响:接受长度提升了44%,与OpenVLA基线相比实现了1.42倍的加速,且不影响成功率。Spec-VLA框架的成功突显了推测执行在VLA预测场景中更广泛应用的潜力。
更新时间: 2025-07-30 07:04:09
领域: cs.LG,cs.AI
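The relaxed-acceptance idea described above can be sketched as a distance test on discretized action tokens. The bin width, tolerance, and token IDs below are hypothetical, not OpenVLA's actual tokenization:

```python
def accept_relaxed(draft_token, target_token, bin_width=0.01, tol=0.02):
    # Action tokens index uniform bins of a continuous action dimension,
    # so the ID gap times the bin width approximates the action distance.
    return abs(draft_token - target_token) * bin_width <= tol

drafts = [100, 103, 120]
targets = [101, 104, 97]
accepted = [accept_relaxed(d, t) for d, t in zip(drafts, targets)]
# Strict equality would reject all three drafts; the relaxed test keeps
# the first two (action distance 0.01) and rejects only the third (0.23).
```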
On the Definition of Intelligence
To engineer AGI, we should first capture the essence of intelligence in a species-agnostic form that can be evaluated, while being sufficiently general to encompass diverse paradigms of intelligent behavior, including reinforcement learning, generative models, classification, analogical reasoning, and goal-directed decision-making. We propose a general criterion based on sample fidelity: intelligence is the ability, given sample(s) from a category, to generate sample(s) from the same category. We formalise this intuition as {\epsilon}-category intelligence: it is {\epsilon}-intelligent with respect to a category if no chosen admissible distinguisher can separate generated from original samples beyond tolerance {\epsilon}. We present the formal framework, outline empirical protocols, and discuss implications for evaluation, safety, and generalization.
Updated: 2025-07-30 07:04:00
标题: 关于智力定义的研究
摘要: 为了设计通用人工智能(AGI),我们首先应该以一种与物种无关、可被评估的形式捕捉智能的本质,同时该形式要足够通用,以涵盖智能行为的各种范式,包括强化学习、生成模型、分类、类比推理和目标导向决策。我们提出了一个基于样本忠实度的通用标准:智能是指在给定某一类别的样本后,生成同一类别样本的能力。我们将这种直觉形式化为{\epsilon}-类别智能:若任何选定的可接受区分器都无法以超过容差{\epsilon}的程度区分生成样本与原始样本,则该系统相对于该类别是{\epsilon}-智能的。我们提出了形式化框架,概述了实证协议,并讨论了其对评估、安全性和泛化的影响。
更新时间: 2025-07-30 07:04:00
领域: cs.AI
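The ε-criterion can be made concrete with a toy distinguisher test. The category, generator, and distinguisher below are all hypothetical stand-ins for the paper's formal objects:

```python
import random

random.seed(0)
category = lambda: random.gauss(0.0, 1.0)     # samples from the category
generator = lambda: random.gauss(0.05, 1.0)   # near-faithful generated samples
distinguisher = lambda x: x > 0.5             # one admissible distinguisher

def advantage(n=20000):
    # How much better than chance this test separates the two sources.
    flagged_fake = sum(distinguisher(generator()) for _ in range(n))
    flagged_real = sum(distinguisher(category()) for _ in range(n))
    return abs(flagged_fake - flagged_real) / n

eps = 0.05
is_eps_intelligent = advantage() <= eps       # passes this one distinguisher
```

Note that the definition quantifies over all admissible distinguishers, so a real evaluation would take the worst case over a distinguisher family rather than a single test.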
Neural Networks as Universal Finite-State Machines: A Constructive ReLU Simulation Framework for NFAs
We present a formal and constructive simulation framework for nondeterministic finite automata (NFAs) using standard feedforward ReLU neural networks. Unlike prior approaches that rely on recurrent architectures or post hoc extraction methods, our formulation symbolically encodes automaton states as binary vectors, transitions as sparse linear transformations, and nondeterministic branching - including {\epsilon}-closures - as compositions of shared ReLU layers. We prove that every regular language can be recognized exactly by a depth-unrolled ReLU network with shared parameters, independent of input length. Our construction yields not only formal equivalence between NFAs and ReLU networks, but also practical trainability: we demonstrate that the networks can learn NFA acceptance behavior through gradient descent using standard supervised data. Extensive experiments validate all theoretical results, achieving perfect or near-perfect agreement on acceptance, state propagation, and closure dynamics. This work establishes a new bridge between symbolic automata theory and modern neural architectures, showing that feedforward networks can perform precise, interpretable, and trainable symbolic computation.
Updated: 2025-07-30 06:52:13
标题: 神经网络作为通用有限状态机:一种用于NFAs的构造ReLU模拟框架
摘要: 我们提出了一个形式化和构造性的模拟框架,用于使用标准前馈ReLU神经网络模拟非确定有限自动机(NFAs)。与依赖递归架构或事后提取方法的先前方法不同,我们的公式将自动机状态符号编码为二进制向量,将转换编码为稀疏线性变换,并将非确定性分支(包括ε-闭包)编码为共享ReLU层的组合。我们证明每个正则语言都可以通过具有共享参数的深度展开的ReLU网络精确识别,而与输入长度无关。我们的构建不仅在NFAs和ReLU网络之间产生了形式上的等价,而且在实践中也具有可训练性:我们证明网络可以通过使用标准监督数据的梯度下降学习NFA接受行为。广泛的实验验证了所有理论结果,在接受、状态传播和闭包动态方面实现了完美或接近完美的一致性。这项工作建立了符号自动机理论和现代神经结构之间的新桥梁,表明前馈网络可以执行精确、可解释和可训练的符号计算。
更新时间: 2025-07-30 06:52:13
领域: cs.LG,cs.FL
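The construction can be sketched in a few lines with a minimal stand-in NFA (the paper's ε-closure handling and depth-unrolled weight sharing are not reproduced here):

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

def step(state, trans):
    # One NFA step: a 0/1 reachability matrix multiply followed by a
    # ReLU-expressible clamp, since min(x, 1) = 1 - relu(1 - x).
    return 1.0 - relu(1.0 - state @ trans)

# NFA over {a, b} accepting strings that contain "ab" (states q0, q1, q2):
# q0 loops on a/b and branches to q1 on a; q1 reaches q2 on b; q2 loops.
T = {
    "a": np.array([[1, 1, 0], [0, 0, 0], [0, 0, 1]], dtype=float),
    "b": np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1]], dtype=float),
}

state = np.array([1.0, 0.0, 0.0])    # start in q0
for sym in "aab":
    state = step(state, T[sym])
accepted = bool(state[2])            # q2 is the accepting state
```

The state vector stays binary throughout, so acceptance is read off a single output unit, mirroring the exact recognition result the abstract claims.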
Efficient Spatial-Temporal Modeling for Real-Time Video Analysis: A Unified Framework for Action Recognition and Object Tracking
Real-time video analysis remains a challenging problem in computer vision, requiring efficient processing of both spatial and temporal information while maintaining computational efficiency. Existing approaches often struggle to balance accuracy and speed, particularly in resource-constrained environments. In this work, we present a unified framework that leverages advanced spatial-temporal modeling techniques for simultaneous action recognition and object tracking. Our approach builds upon recent advances in parallel sequence modeling and introduces a novel hierarchical attention mechanism that adaptively focuses on relevant spatial regions across temporal sequences. We demonstrate that our method achieves state-of-the-art performance on standard benchmarks while maintaining real-time inference speeds. Extensive experiments on UCF-101, HMDB-51, and MOT17 datasets show improvements of 3.2% in action recognition accuracy and 2.8% in tracking precision compared to existing methods, with 40% faster inference time.
Updated: 2025-07-30 06:49:11
标题: 高效的时空建模用于实时视频分析:动作识别和目标追踪的统一框架
摘要: 实时视频分析在计算机视觉中仍然是一个具有挑战性的问题,需要同时处理空间和时间信息并保持计算效率。现有方法经常难以在资源受限环境中平衡准确性和速度。在这项工作中,我们提出了一个统一的框架,利用先进的空间-时间建模技术来实现动作识别和物体跟踪。我们的方法建立在最近在并行序列建模方面取得的进展基础上,并引入了一种新颖的分层注意机制,自适应地关注时间序列中相关的空间区域。我们展示了我们的方法在标准基准测试中达到了最先进的性能,同时保持实时推理速度。对UCF-101、HMDB-51和MOT17数据集的广泛实验显示,与现有方法相比,我们的方法在动作识别准确性上提高了3.2%,在跟踪精度上提高了2.8%,推理时间快了40%。
更新时间: 2025-07-30 06:49:11
领域: cs.CV,cs.AI
Systematic Evaluation of Knowledge Graph Repair with Large Language Models
We present a systematic approach for evaluating the quality of knowledge graph repairs with respect to constraint violations defined in shapes constraint language (SHACL). Current evaluation methods rely on \emph{ad hoc} datasets, which limits the rigorous analysis of repair systems in more general settings. Our method addresses this gap by systematically generating violations using a novel mechanism, termed violation-inducing operations (VIOs). We use the proposed evaluation framework to assess a range of repair systems which we build using large language models. We analyze the performance of these systems across different prompting strategies. Results indicate that concise prompts containing both the relevant violated SHACL constraints and key contextual information from the knowledge graph yield the best performance.
Updated: 2025-07-30 06:46:30
标题: 大语言模型在知识图修复中的系统评估
摘要: 我们提出了一种系统评估知识图修复质量的方法,该方法针对shapes约束语言(SHACL)中定义的约束违规进行评估。当前的评估方法依赖于\emph{特定}数据集,这限制了在更一般环境中对修复系统进行严格分析。我们的方法通过系统地生成违规,使用一种新颖的机制,称为违规诱导操作(VIOs),来填补这一差距。我们使用提出的评估框架来评估一系列使用大型语言模型构建的修复系统。我们分析了这些系统在不同提示策略下的表现。结果表明,简洁的提示包含有关违反SHACL约束的相关信息和知识图中的关键上下文信息可以获得最佳性能。
更新时间: 2025-07-30 06:46:30
领域: cs.DB,cs.AI
Aleatoric Uncertainty Medical Image Segmentation Estimation via Flow Matching
Quantifying aleatoric uncertainty in medical image segmentation is critical since it reflects the natural variability observed among expert annotators. A conventional approach is to model the segmentation distribution with a generative model, but current methods limit the expressive power of such models. While diffusion-based approaches have demonstrated impressive performance in approximating the data distribution, their inherent stochastic sampling process and inability to model exact densities limit their effectiveness in accurately capturing uncertainty. In contrast, our proposed method leverages conditional flow matching, a simulation-free flow-based generative model that learns an exact density, to produce highly accurate segmentation results. By conditioning the flow model on the input image and sampling multiple data points, our approach synthesizes segmentation samples whose pixel-wise variance reliably reflects the underlying data distribution. This sampling strategy captures uncertainty in regions with ambiguous boundaries, offering robust quantification that mirrors inter-annotator differences. Experimental results demonstrate that our method not only achieves competitive segmentation accuracy but also generates uncertainty maps that provide deeper insight into the reliability of the segmentation outcomes. The code for this paper is freely available at https://github.com/huynhspm/Data-Uncertainty
Updated: 2025-07-30 06:45:32
标题: 随机不确定性医学图像分割估计通过流匹配
摘要: 在医学图像分割中量化随机不确定性是至关重要的,因为它反映了专家标注者之间观察到的自然变异性。传统方法是使用生成模型来建模分割分布,但当前方法限制了生成模型的表达能力。虽然当前基于扩散的方法已经展示出在逼近数据分布方面的出色性能,但它们固有的随机采样过程和无法建模精确密度的限制使它们在准确捕捉不确定性方面效果有限。相反,我们提出的方法利用条件流匹配,这是一种基于流的生成模型,学习精确密度,以产生高度精确的分割结果。通过在输入图像上引导流模型和采样多个数据点,我们的方法合成分割样本,其像素级方差可靠地反映基础数据分布。这种采样策略捕捉了具有模糊边界的区域的不确定性,提供了反映标注者之间差异的强大量化。实验结果表明,我们的方法不仅实现了竞争性分割准确性,还生成了提供更深入洞察分割结果可靠性的不确定性地图。本文的代码可以在https://github.com/huynhspm/Data-Uncertainty免费获取。
更新时间: 2025-07-30 06:45:32
领域: cs.CV,cs.AI
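The sampling idea above can be illustrated in miniature: integrate a per-sample conditional flow dx/dt = (x1 - x)/(1 - t) that transports noise x0 toward an annotation x1, then read the pixel-wise variance over many sampled "segmentations" as the aleatoric uncertainty map. The three-pixel masks, the hand-written velocity field, and the Euler integrator are all invented for illustration; the paper learns the field from data:

```python
import random

# Two expert masks that agree on pixel 0 and disagree on pixels 1 and 2.
ANNOTATIONS = [
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

def sample_mask(rng, dt=0.01):
    x1 = rng.choice(ANNOTATIONS)            # conditioning picks an annotation mode
    x = [rng.random() for _ in x1]          # noise x0
    t = 0.0
    while t < 1.0 - dt:                     # Euler steps, stopping before t = 1
        x = [xi + dt * (ti - xi) / (1.0 - t) for xi, ti in zip(x, x1)]
        t += dt
    return x

rng = random.Random(0)
samples = [sample_mask(rng) for _ in range(200)]

def pixel_variance(samples, i):
    vals = [s[i] for s in samples]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

uncertainty = [pixel_variance(samples, i) for i in range(3)]
print([round(u, 4) for u in uncertainty])
```

Variance stays near zero where annotators agree (pixel 0) and is large where they disagree, which is exactly the inter-annotator ambiguity the uncertainty map is meant to expose.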
NeedleChain: Measuring Intact Long-Context Reasoning Capability of Large Language Models
The Needle-in-a-Haystack (NIAH) benchmark is widely used to evaluate Large Language Models' (LLMs) ability to understand long contexts (LC). It evaluates the capability to identify query-relevant context within extensive query-irrelevant passages. Although this method serves as a widely accepted standard for evaluating long-context understanding, our findings suggest it may overestimate the true LC capability of LLMs. We demonstrate that even state-of-the-art models such as GPT-4o struggle to fully incorporate given contexts made up solely of ten query-relevant sentences. In response, we introduce a novel benchmark, \textbf{NeedleChain}, where the context consists entirely of query-relevant information, requiring the LLM to fully grasp the input to answer correctly. Our benchmark allows for flexible context length and reasoning order, offering a more comprehensive analysis of LLM performance. Additionally, we propose an extremely simple yet compelling strategy to improve the LC understanding capability of LLMs: ROPE Contraction. Our experiments with various advanced LLMs reveal a notable disparity between their ability to process large contexts and their capacity to fully understand them. Source code and datasets are available at https://github.com/hyeonseokk/NeedleChain
Updated: 2025-07-30 06:29:50
标题: NeedleChain: 测量大型语言模型完整长上下文推理能力
摘要: The Needle-in-a-Haystack (NIAH) benchmark被广泛用于评估大型语言模型(LLMs)理解长上下文(LC)的能力。它评估了在大量与查询无关的段落中识别与查询相关上下文的能力。尽管这种方法被广泛接受为评估长上下文理解的标准,但我们的研究表明它可能高估了LLMs的真实LC能力。我们展示了即使是GPT-4o等最先进的模型也难以完整地整合仅由查询相关的十个句子组成的给定上下文。为此,我们引入了一个新的基准,\textbf{NeedleChain},其中上下文完全由查询相关信息组成,需要LLM完全掌握输入才能正确回答。我们的基准允许灵活的上下文长度和推理顺序,提供了对LLM性能更全面的分析。此外,我们提出了一个非常简单但引人注目的策略来提高LLM的LC理解能力:ROPE Contraction。我们对各种先进的LLMs进行的实验显示,它们在处理大上下文的能力和完全理解它们的能力之间存在显著差异。源代码和数据集可在https://github.com/hyeonseokk/NeedleChain 上找到。
更新时间: 2025-07-30 06:29:50
领域: cs.CL,cs.AI
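A generator in the spirit of the benchmark can be sketched as follows: every sentence is query-relevant and the answer requires integrating the whole chain, so no single "needle" suffices. The coin-counting template, names, and parameters are invented for illustration, not the dataset's actual construction:

```python
import random

def build_needle_chain(n_steps, seed=0):
    """Chain of relative facts; answering needs every sentence."""
    rng = random.Random(seed)
    names = [f"Person{i}" for i in range(n_steps + 1)]
    start = rng.randint(1, 10)
    sentences = [f"{names[0]} has {start} coins."]
    total = start
    for prev, cur in zip(names, names[1:]):
        delta = rng.randint(-3, 5)
        verb = "more" if delta >= 0 else "fewer"
        sentences.append(f"{cur} has {abs(delta)} {verb} coins than {prev}.")
        total += delta
    question = f"How many coins does {names[-1]} have?"
    return " ".join(sentences), question, total

context, question, answer = build_needle_chain(10)
print(context)
print(question, "->", answer)
```

Because the chain length is a free parameter, the same template scales the context without ever adding query-irrelevant filler, which is the key difference from NIAH-style haystacks.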
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
While frontier large language models (LLMs) continue to push capability boundaries, their deployment remains confined to GPU-powered cloud infrastructure. We challenge this paradigm with SmallThinker, a family of LLMs natively designed - not adapted - for the unique constraints of local devices: weak computational power, limited memory, and slow storage. Unlike traditional approaches that mainly compress existing models built for clouds, we architect SmallThinker from the ground up to thrive within these limitations. Our innovation lies in a deployment-aware architecture that transforms constraints into design principles. First, we introduce a two-level sparse structure combining fine-grained Mixture-of-Experts (MoE) with sparse feed-forward networks, drastically reducing computational demands without sacrificing model capacity. Second, to conquer the I/O bottleneck of slow storage, we design a pre-attention router that enables our co-designed inference engine to prefetch expert parameters from storage while computing attention, effectively hiding storage latency that would otherwise cripple on-device inference. Third, for memory efficiency, we utilize a NoPE-RoPE hybrid sparse attention mechanism to slash KV cache requirements. We release SmallThinker-4B-A0.6B and SmallThinker-21B-A3B, which achieve state-of-the-art performance scores and even outperform larger LLMs. Remarkably, our co-designed system mostly eliminates the need for expensive GPU hardware: with Q4_0 quantization, both models exceed 20 tokens/s on ordinary consumer CPUs, while consuming only 1GB and 8GB of memory respectively. SmallThinker is publicly available at hf.co/PowerInfer/SmallThinker-4BA0.6B-Instruct and hf.co/PowerInfer/SmallThinker-21BA3B-Instruct.
Updated: 2025-07-30 06:29:40
标题: SmallThinker:一系列本地部署的高效大型语言模型
摘要: 尽管前沿的大型语言模型(LLMs)继续推动能力边界,但它们的部署仍然局限于GPU加速的云基础设施。我们挑战这一范式,推出SmallThinker,这是一系列专为本地设备的独特限制而设计的LLMs,而不是适应云环境。这些限制包括计算能力不足、内存有限和存储缓慢。与传统方法主要是对云端构建的现有模型进行压缩不同,我们从头开始构建SmallThinker以在这些限制内蓬勃发展。我们的创新在于部署感知架构,将约束转化为设计原则。首先,我们引入了一个两级稀疏结构,将细粒度的专家混合(MoE)与稀疏前馈网络结合,大大降低了计算需求而不牺牲模型容量。其次,为了克服存储缓慢的I/O瓶颈,我们设计了一个预注意路由器,使我们共同设计的推理引擎能够在计算注意力的同时从存储中预取专家参数,有效地隐藏存储延迟,否则会使设备上的推理受到影响。第三,为了提高内存效率,我们利用NoPE-RoPE混合稀疏注意机制来减少KV缓存需求。我们发布了SmallThinker-4B-A0.6B和SmallThinker-21B-A3B,它们实现了最先进的性能得分,甚至胜过更大的LLMs。值得注意的是,我们共同设计的系统基本上消除了对昂贵的GPU硬件的需求:通过Q4_0量化,这两个模型在普通消费者CPU上都超过了20个令牌/秒,分别仅消耗1GB和8GB内存。SmallThinker现已公开发布,网址为hf.co/PowerInfer/SmallThinker-4BA0.6B-Instruct和hf.co/PowerInfer/SmallThinker-21BA3B-Instruct。
更新时间: 2025-07-30 06:29:40
领域: cs.LG,cs.AI
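The pre-attention-router idea, routing before attention so the expert load overlaps the attention computation, can be mimicked with a thread pool; the toy router, the sleep-based "storage", and all function names are invented stand-ins for the real engine:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Pretend storage: loading an expert's weights is slow (I/O bound).
EXPERT_STORE = {i: [float(i)] * 4 for i in range(8)}

def load_expert(idx):
    time.sleep(0.05)               # simulated slow flash/SSD read
    return EXPERT_STORE[idx]

def pre_attention_router(hidden):  # toy router: pick the expert before attention
    return int(sum(hidden)) % len(EXPERT_STORE)

def attention(hidden):             # stand-in for the attention computation
    time.sleep(0.05)
    return [h * 0.5 for h in hidden]

def layer_forward(hidden, pool):
    idx = pre_attention_router(hidden)
    fut = pool.submit(load_expert, idx)   # prefetch while attention runs
    attn_out = attention(hidden)
    expert_w = fut.result()               # usually ready by now: latency hidden
    return [a + w for a, w in zip(attn_out, expert_w)], idx

with ThreadPoolExecutor(max_workers=1) as pool:
    out, chosen = layer_forward([1.0, 2.0, 3.0, 4.0], pool)
print(chosen, out)
```

Because routing happens before attention rather than after it, the two 50 ms latencies overlap instead of adding up, which is the whole point of the co-design.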
Question Generation for Assessing Early Literacy Reading Comprehension
Assessment of reading comprehension through content-based interactions plays an important role in the reading acquisition process. In this paper, we propose a novel approach for generating comprehension questions geared to K-2 English learners. Our method ensures complete coverage of the underlying material and adaptation to the learner's specific proficiencies, and can generate a large diversity of question types at various difficulty levels to ensure a thorough evaluation. We evaluate the performance of various language models in this framework using the FairytaleQA dataset as the source material. Eventually, the proposed approach has the potential to become an important part of autonomous AI-driven English instructors.
Updated: 2025-07-30 06:27:02
标题: 早期识字阅读理解评估的问题生成
摘要: 通过基于内容的互动评估阅读理解在阅读习得过程中扮演着重要角色。在本文中,我们提出了一种针对K-2年级英语学习者生成理解问题的新方法。我们的方法确保对基础材料进行完整覆盖,并根据学习者特定的能力进行调整,可以生成多种题型,难度级别不同,以确保全面评估。我们使用FairytaleQA数据集作为来源材料,在该框架中评估了各种语言模型的性能。最终,提出的方法有潜力成为自主AI驱动的英语教师的重要组成部分。
更新时间: 2025-07-30 06:27:02
领域: cs.CL,cs.AI
Decision Transformer-Based Drone Trajectory Planning with Dynamic Safety-Efficiency Trade-Offs
A drone trajectory planner should be able to dynamically adjust the safety-efficiency trade-off according to varying mission requirements in unknown environments. Although traditional polynomial-based planners offer computational efficiency and smooth trajectory generation, they require expert knowledge to tune multiple parameters to adjust this trade-off. Moreover, even with careful tuning, the resulting adjustment may fail to achieve the desired trade-off. Similarly, although reinforcement learning-based planners are adaptable in unknown environments, they do not explicitly address the safety-efficiency trade-off. To overcome this limitation, we introduce a Decision Transformer-based trajectory planner that leverages a single parameter, Return-to-Go (RTG), as a \emph{temperature parameter} to dynamically adjust the safety-efficiency trade-off. In our framework, since RTG intuitively measures the safety and efficiency of a trajectory, RTG tuning does not require expert knowledge. We validate our approach using Gazebo simulations in both structured grid and unstructured random environments. The experimental results demonstrate that our planner can dynamically adjust the safety-efficiency trade-off by simply tuning the RTG parameter. Furthermore, our planner outperforms existing baseline methods across various RTG settings, generating safer trajectories when tuned for safety and more efficient trajectories when tuned for efficiency. Real-world experiments further confirm the reliability and practicality of our proposed planner.
Updated: 2025-07-30 06:25:45
标题: 基于决策Transformer的无人机轨迹规划与动态安全效率权衡
摘要: 一个无人机轨迹规划器应该能够根据不同的任务需求在未知环境中动态调整安全性和效率之间的权衡。虽然传统的基于多项式的规划器提供了计算效率和平滑的轨迹生成,但它们需要专家知识来调整多个参数以调整这种权衡。此外,即使经过仔细调整,结果可能无法实现期望的权衡。同样,虽然基于强化学习的规划器在未知环境中具有适应性,但它们并未明确解决安全性和效率之间的权衡。为了克服这一限制,我们引入了一种基于决策转换器的轨迹规划器,利用单个参数Return-to-Go(RTG)作为动态调整安全性和效率之间的权衡的“温度参数”。在我们的框架中,由于RTG直观地衡量了轨迹的安全性和效率,RTG的调整不需要专业知识。我们使用Gazebo模拟验证了我们的方法,包括结构化网格和非结构化随机环境。实验结果表明,我们的规划器可以通过简单调整RTG参数来动态调整安全性和效率的权衡。此外,我们的规划器在各种RTG设置下表现优于现有的基准方法,在安全性调整时生成更安全的轨迹,在效率调整时生成更高效的轨迹。现实世界的实验进一步证实了我们提出的规划器的可靠性和实用性。
更新时间: 2025-07-30 06:25:45
领域: cs.RO,cs.AI
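The single-knob trade-off can be illustrated without any learning: treat RTG as a scalar in [0, 1] that weights safety against efficiency when scoring candidate trajectories. The candidate names and scores are invented; the real planner conditions a Decision Transformer on a Return-to-Go token rather than scoring a fixed list:

```python
# Toy candidate trajectories scored by (safety, efficiency), both in [0, 1].
CANDIDATES = {
    "hug_walls":     (0.95, 0.40),
    "balanced":      (0.75, 0.70),
    "straight_line": (0.50, 0.95),
}

def plan_with_rtg(rtg):
    """rtg near 1.0 asks for max safety, near 0.0 for max efficiency."""
    return max(CANDIDATES,
               key=lambda k: rtg * CANDIDATES[k][0] + (1 - rtg) * CANDIDATES[k][1])

print(plan_with_rtg(0.9))   # safety-tuned
print(plan_with_rtg(0.1))   # efficiency-tuned
```

The appeal of the RTG interface is exactly this: one interpretable scalar moves the planner along the safety-efficiency frontier, instead of an expert re-tuning several polynomial-planner weights.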
Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis
Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst-case cost over an ambiguity set of probability distributions, has been widely applied in diverse applications, e.g., network behavior analysis and risk management. However, existing DRO techniques face three key challenges: 1) how to deal with asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE), with the itErative Active SEt method (EASE), to tackle the federated distributionally robust optimization (FDRO) problem. Furthermore, a new uncertainty set, i.e., the constrained D-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis shows that the proposed algorithm is guaranteed to converge, and its iteration complexity is also analyzed. Extensive empirical studies on real-world datasets demonstrate that the proposed method not only achieves fast convergence and remains robust against data heterogeneity as well as malicious attacks, but can also trade off robustness with performance.
Updated: 2025-07-30 06:22:46
标题: 具有非凸目标的联邦分布鲁棒优化:算法和分析
摘要: 分布鲁棒优化(DRO)旨在找到一种最优决策,以在概率分布的模糊集中最小化最坏情况成本,已广泛应用于各种应用领域,例如网络行为分析,风险管理等。然而,现有的DRO技术面临三个关键挑战:1)如何处理分布环境中的异步更新;2)如何有效利用先验分布;3)如何根据不同场景适当调整鲁棒性程度。为此,我们提出了一种名为Asynchronous Single-looP alternatIve gRadient projEction(ASPIRE)算法的异步分布式算法,并采用迭代主动集(EASE)方法来解决联邦分布鲁棒优化(FDRO)问题。此外,开发了一种新的不确定性集,即受约束的D-范数不确定性集,以有效利用先验分布并灵活控制鲁棒性程度。最后,我们的理论分析阐明了所提出的算法保证收敛,并对迭代复杂性进行了分析。对真实数据集的广泛实证研究表明,所提出的方法不仅可以实现快速收敛,并且能够保持对数据异质性和恶意攻击的鲁棒性,同时还能在鲁棒性和性能之间取得平衡。
更新时间: 2025-07-30 06:22:46
领域: math.OC,cs.AI,cs.LG
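The flavor of a budgeted D-norm uncertainty set can be shown with the classic Bertsimas-Sim worst-case computation: each coefficient may deviate by at most d_i, and a budget Gamma caps how many deviate (one partially). This is the textbook budget intuition, not the paper's constrained variant, and all numbers are invented:

```python
def worst_case_cost(nominal, deviations, gamma):
    """Worst-case sum when at most gamma coefficients hit their deviation."""
    base = sum(nominal)
    devs = sorted(deviations, reverse=True)
    full, frac = int(gamma), gamma - int(gamma)
    extra = sum(devs[:full])             # the gamma largest deviations fire
    if full < len(devs):
        extra += frac * devs[full]       # one coefficient deviates partially
    return base + extra

nominal = [3.0, 1.0, 2.0]
deviations = [1.0, 0.5, 2.0]
print(worst_case_cost(nominal, deviations, 0))    # → 6.0 (no robustness)
print(worst_case_cost(nominal, deviations, 1.5))  # → 8.5 (partial budget)
print(worst_case_cost(nominal, deviations, 3))    # → 9.5 (fully robust)
```

Sliding gamma from 0 to the number of coefficients is precisely the "adjust the degree of robustness" knob the abstract refers to: gamma = 0 recovers the nominal problem, and the full budget recovers the most conservative solution.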
FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization
Cardiovascular diseases (CVD) cause over 17 million deaths annually worldwide, highlighting the urgent need for privacy-preserving predictive systems. We introduce FedCVD++, an enhanced federated learning (FL) framework that integrates both parametric models (logistic regression, SVM, neural networks) and non-parametric models (Random Forest, XGBoost) for coronary heart disease risk prediction. To address key FL challenges, we propose: (1) tree-subset sampling that reduces Random Forest communication overhead by 70%, (2) XGBoost-based feature extraction enabling lightweight federated ensembles, and (3) federated SMOTE synchronization for resolving cross-institutional class imbalance. Evaluated on the Framingham dataset (4,238 records), FedCVD++ achieves state-of-the-art results: federated XGBoost (F1 = 0.80) surpasses its centralized counterpart (F1 = 0.78), and federated Random Forest (F1 = 0.81) matches non-federated performance. Additionally, our communication-efficient strategies reduce bandwidth consumption by 3.2X while preserving 95% accuracy. Compared to existing FL frameworks, FedCVD++ delivers up to 15% higher F1-scores and superior scalability for multi-institutional deployment. This work represents the first practical integration of non-parametric models into federated healthcare systems, providing a privacy-preserving solution validated under real-world clinical constraints.
Updated: 2025-07-30 06:17:33
标题: FedCVD++:通信效率高的联邦学习,用于心血管风险预测,并进行参数化和非参数化模型优化
摘要: 心血管疾病(CVD)每年导致全球超过1700万人死亡,突出了对保护隐私的预测系统的迫切需求。我们引入了FedCVD ++,这是一个增强的联邦学习(FL)框架,集成了参数模型(逻辑回归、支持向量机、神经网络)和非参数模型(随机森林、XGBoost)用于冠心病风险预测。为了解决关键的FL挑战,我们提出:(1)树子集抽样可将随机森林通信开销降低70%,(2)基于XGBoost的特征提取实现轻量级联邦集成,以及(3)用于解决跨机构类不平衡的联邦SMOTE同步。 在Framingham数据集(4,238条记录)上评估,FedCVD ++取得了最先进的结果:联邦XGBoost(F1 = 0.80)超过了其集中对应物(F1 = 0.78),联邦随机森林(F1 = 0.81)与非联邦表现相匹配。此外,我们的通信高效策略将带宽消耗降低了3.2倍,同时保持了95%的准确性。 与现有的FL框架相比,FedCVD ++在F1得分上提供高达15%的优势,并提供了更好的多机构部署可扩展性。这项工作代表了非参数模型首次实际整合到联邦医疗系统中,提供了在真实临床约束条件下验证的保护隐私解决方案。
更新时间: 2025-07-30 06:17:33
领域: cs.LG,q-bio.OT
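Tree-subset sampling reduces communication simply by having each client upload only a fraction of its trees; a stdlib-only sketch follows, with stump-like toy "trees" and made-up names standing in for real Random Forests:

```python
import random
from collections import Counter

# Toy "tree": a decision stump over one feature with a random threshold.
def make_tree(threshold, feature):
    return lambda x: int(x[feature] > threshold)

def client_forest(rng, n_trees=10):
    return [make_tree(rng.random(), rng.randrange(2)) for _ in range(n_trees)]

# Tree-subset sampling: each client uploads only a fraction of its trees,
# cutting communication roughly by that fraction.
def sample_subset(forest, keep_fraction, rng):
    k = max(1, int(len(forest) * keep_fraction))
    return rng.sample(forest, k)

def predict(ensemble, x):
    votes = Counter(t(x) for t in ensemble)
    return votes.most_common(1)[0][0]

rng = random.Random(42)
clients = [client_forest(rng) for _ in range(5)]
global_ensemble = [t for f in clients for t in sample_subset(f, 0.3, rng)]

print(len(global_ensemble), "trees uploaded instead of",
      sum(len(f) for f in clients))
print(predict(global_ensemble, [0.9, 0.9]))
```

With keep_fraction = 0.3, the server receives 15 trees instead of 50, a 70% reduction in the spirit of the abstract's figure, while majority voting over the surviving trees still yields a usable global ensemble.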
MINR: Implicit Neural Representations with Masked Image Modelling
Self-supervised learning methods like masked autoencoders (MAE) have shown significant promise in learning robust feature representations, particularly in image reconstruction-based pretraining tasks. However, their performance is often strongly dependent on the masking strategies used during training and can degrade when applied to out-of-distribution data. To address these limitations, we introduce the masked implicit neural representations (MINR) framework, which synergizes implicit neural representations with masked image modeling. MINR learns a continuous function to represent images, enabling more robust and generalizable reconstructions irrespective of masking strategies. Our experiments demonstrate that MINR outperforms MAE not only in in-domain scenarios but also in out-of-distribution settings, while reducing model complexity. The versatility of MINR extends to various self-supervised learning applications, confirming its utility as a robust and efficient alternative to existing frameworks.
Updated: 2025-07-30 06:12:57
标题: MINR:具有遮挡图像建模的隐式神经表示
摘要: 自监督学习方法,如掩码自动编码器(MAE),已经显示出在学习稳健特征表示方面具有显著的潜力,特别是在基于图像重建的预训练任务中。然而,它们的性能往往严重依赖于训练过程中使用的掩码策略,并且在应用于超出分布数据时可能会降级。为了解决这些限制,我们引入了掩码隐式神经表示(MINR)框架,将隐式神经表示与掩码图像建模相结合。MINR学习一个连续函数来表示图像,可以实现更稳健和可泛化的重建,无论掩码策略如何。我们的实验证明,MINR不仅在域内场景中优于MAE,而且在超出分布设置中也表现出色,同时降低了模型复杂性。MINR的多功能性扩展到各种自监督学习应用中,确认其作为现有框架的稳健和高效替代品的实用性。
更新时间: 2025-07-30 06:12:57
领域: cs.CV,cs.AI,cs.LG
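The core move, fitting a continuous function only on unmasked coordinates and then evaluating it at masked ones, survives a drastic simplification: a 1-D "image" and a linear function f(x) = a*x + b trained by gradient descent. The signal, mask, and learning rate are all invented; MINR uses a neural network, not a line:

```python
# 1-D toy of masked implicit-representation training.
signal = [0.5 * x + 1.0 for x in range(10)]   # ground-truth "image"
masked = {2, 3, 7}                            # coordinates hidden in training

a, b = 0.0, 0.0
lr = 0.01
for _ in range(5000):
    ga = gb = 0.0
    for x, y in enumerate(signal):
        if x in masked:
            continue                          # train on unmasked pixels only
        err = (a * x + b) - y
        ga += 2 * err * x
        gb += 2 * err
    a -= lr * ga / len(signal)
    b -= lr * gb / len(signal)

# Reconstruct the masked coordinates by evaluating the continuous function.
recon = [a * x + b for x in sorted(masked)]
print([round(v, 2) for v in recon])           # → [2.0, 2.5, 4.5]
```

Because the representation is a function of coordinates rather than a grid, the reconstruction at masked positions is defined for free, which is why the approach is less sensitive to the particular masking strategy.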
OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing
The advancement of remote sensing, including satellite systems, facilitates the continuous acquisition of remote sensing imagery globally, introducing novel challenges for achieving open-world tasks. Deployed models need to continuously adjust to a constant influx of new data, which frequently exhibits diverse shifts from the data encountered during the training phase. To effectively handle the new data, models are required to detect semantic shifts, adapt to covariate shifts, and continuously update their parameters without forgetting learned knowledge, as has been considered in works on a variety of open-world tasks. However, existing studies are typically conducted within a single dataset to simulate realistic conditions, with a lack of large-scale benchmarks capable of evaluating multiple open-world tasks. In this paper, we introduce \textbf{OpenEarthSensing (OES)}, a large-scale fine-grained benchmark for open-world remote sensing. OES includes 189 scene and object categories, covering the vast majority of potential semantic shifts that may occur in the real world. Additionally, to provide a more comprehensive testbed for evaluating the generalization performance, OES encompasses five data domains with significant covariate shifts, including two RGB satellite domains, one RGB aerial domain, one multispectral RGB domain, and one infrared domain. We evaluate the baselines and existing methods for diverse tasks on OES, demonstrating that it serves as a meaningful and challenging benchmark for open-world remote sensing. The proposed dataset OES is available at https://haiv-lab.github.io/OES.
Updated: 2025-07-30 06:04:46
标题: OpenEarthSensing: 开放世界遥感的大规模细粒度基准测试
摘要: 遥感技术的进步,包括卫星系统的发展,促进了全球遥感图像的持续获取,为实现开放世界任务引入了新挑战。部署的模型需要不断调整以适应持续涌入的新数据,这些数据往往与训练阶段遇到的数据之间存在多样化的转变。为了有效处理新数据,模型需要检测语义转变,适应协变量转变,并在不忘记已学知识的情况下持续更新其参数,这一点已在各种开放世界任务的研究中得到考虑。然而,现有研究通常在单个数据集内进行,以模拟现实条件,缺乏能够评估多个开放世界任务的大规模基准。本文介绍了OpenEarthSensing(OES),一个用于开放世界遥感的大规模细粒度基准。OES包括189个场景和对象类别,涵盖了可能在现实世界中发生的绝大部分语义转变。此外,为了提供一个更全面的测试平台来评估泛化性能,OES包括五个具有显著协变量转变的数据领域,包括两个RGB卫星领域,一个RGB航空领域,一个多光谱RGB领域和一个红外领域。我们在OES上评估了基线和现有方法在不同任务上的表现,证明它作为一个有意义且具有挑战性的开放世界遥感基准。提出的数据集OES可在https://haiv-lab.github.io/OES上获取。
更新时间: 2025-07-30 06:04:46
领域: cs.CV,cs.AI,cs.LG,eess.IV
R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs
Recent studies have combined Large Language Models (LLMs) with Knowledge Graphs (KGs) to enhance reasoning, improving inference accuracy without additional training while mitigating hallucination. However, existing frameworks still suffer two practical drawbacks: they must be re-tuned whenever the KG or reasoning task changes, and they depend on a single, high-capacity LLM for reliable (i.e., trustworthy) reasoning. To address this, we introduce R2-KG, a plug-and-play, dual-agent framework that separates reasoning into two roles: an Operator (a low-capacity LLM) that gathers evidence and a Supervisor (a high-capacity LLM) that makes final judgments. This design is cost-efficient for LLM inference while still maintaining strong reasoning accuracy. Additionally, R2-KG employs an Abstention mechanism, generating answers only when sufficient evidence is collected from KG, which significantly enhances reliability. Experiments across five diverse benchmarks show that R2-KG consistently outperforms baselines in both accuracy and reliability, regardless of the inherent capability of LLMs used as the Operator. Further experiments reveal that the single-agent version of R2-KG, equipped with a strict self-consistency strategy, achieves significantly higher-than-baseline reliability with reduced inference cost but increased abstention rate in complex KGs. Our findings establish R2-KG as a flexible and cost-effective solution for KG-based reasoning, reducing reliance on high-capacity LLMs while ensuring trustworthy inference. The code is available at https://github.com/ekrxjwh2009/R2-KG/.
Updated: 2025-07-30 06:04:25
标题: R2-KG:用于知识图谱可靠推理的通用双代理框架
摘要: 最近的研究将大型语言模型(LLMs)与知识图谱(KGs)结合起来,以增强推理,提高推理准确性而不需要额外的训练,同时减轻幻觉。然而,现有框架仍然存在两个实际缺点:它们必须在KG或推理任务发生更改时进行重新调整,并且它们依赖于一个单一的、高容量的LLM来进行可靠(即可信赖的)推理。为了解决这个问题,我们引入了R2-KG,这是一个即插即用的双代理框架,将推理分为两个角色:一个是操作员(低容量的LLM),用于收集证据,另一个是监督员(高容量的LLM),用于做出最终判断。这种设计在LLM推理方面具有成本效益,同时仍保持较强的推理准确性。此外,R2-KG采用了弃权机制,只有在从知识图谱中收集到足够的证据时才生成答案,这显著提高了可靠性。在五个不同的基准测试中进行的实验表明,R2-KG在准确性和可靠性方面始终优于基线,无论LLMs作为操作员的固有能力如何。进一步的实验显示,配备严格的自洽策略的R2-KG的单代理版本在复杂的知识图谱中实现了明显高于基线的可靠性,同时减少了推理成本但增加了弃权率。我们的研究结果将R2-KG确立为一种灵活且具有成本效益的解决方案,用于基于知识图谱的推理,减少对高容量LLMs的依赖,同时确保可信的推理。代码可在https://github.com/ekrxjwh2009/R2-KG/上找到。
更新时间: 2025-07-30 06:04:25
领域: cs.CL,cs.AI
RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection
4D millimeter-wave radar has emerged as a promising sensor for autonomous driving, but effective 3D object detection from both 4D radar and monocular images remains a challenge. Existing fusion approaches typically rely on either instance-based proposals or dense BEV grids, which either lack holistic scene understanding or are limited by rigid grid structures. To address these, we propose RaGS, the first framework to leverage 3D Gaussian Splatting (GS) as the representation for fusing 4D radar and monocular cues in 3D object detection. 3D GS naturally suits 3D object detection by modeling the scene as a field of Gaussians, dynamically allocating resources to foreground objects and providing a flexible, resource-efficient solution. RaGS uses a cascaded pipeline to construct and refine the Gaussian field. It starts with the Frustum-based Localization Initiation (FLI), which unprojects foreground pixels to initialize coarse 3D Gaussian positions. Then, the Iterative Multimodal Aggregation (IMA) fuses semantics and geometry, refining the limited Gaussians to the regions of interest. Finally, the Multi-level Gaussian Fusion (MGF) renders the Gaussians into multi-level BEV features for 3D object detection. By dynamically focusing on sparse objects within scenes, RaGS enables concentration on objects while offering comprehensive scene perception. Extensive experiments on the View-of-Delft, TJ4DRadSet, and OmniHD-Scenes benchmarks demonstrate its state-of-the-art performance. Code will be released.
Updated: 2025-07-30 05:32:55
标题: RaGS: 从4D雷达和单目线索释放3D高斯飞溅以用于3D物体检测
摘要: 4D毫米波雷达已经成为自动驾驶的一种有前途的传感器,但是从4D雷达和单目图像中有效地进行3D物体检测仍然是一个挑战。现有的融合方法通常依赖于基于实例的提议或密集的BEV网格,这两种方法要么缺乏整体场景理解,要么受到刚性网格结构的限制。为了解决这些问题,我们提出了RaGS,这是第一个利用3D高斯飞溅(GS)作为在3D物体检测中融合4D雷达和单目线索的表示的框架。3D GS自然适合于3D物体检测,通过将场景建模为高斯场,动态地分配资源给前景对象,并提供一种灵活、资源高效的解决方案。RaGS使用级联管道来构建和细化高斯场。它以基于锥体的定位初始化(FLI)开始,将前景像素反投影以初始化粗糙的3D高斯位置。然后,迭代多模态聚合(IMA)融合语义和几何信息,将有限的高斯精化到感兴趣的区域。最后,多级高斯融合(MGF)将高斯渲染成多级BEV特征,用于3D物体检测。通过动态聚焦于场景中的稀疏物体,RaGS实现了物体聚焦,同时提供了全面的场景感知。在View-of-Delft、TJ4DRadSet和OmniHD-Scenes基准测试上进行的大量实验表明其具有最先进的性能。代码将发布。
更新时间: 2025-07-30 05:32:55
领域: cs.CV,cs.AI
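The FLI step's unprojection is ordinary pinhole geometry and can be shown in a few lines; the intrinsics and pixel below are assumed toy values, and a real pipeline would read the calibration matrix K and a depth estimate from the sensors:

```python
# Frustum-based Localization Initiation, in miniature: unproject a foreground
# pixel (u, v) at an assumed depth through pinhole intrinsics to seed a
# coarse 3D Gaussian mean.
def unproject(u, v, depth, fx, fy, cx, cy):
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)

# Assumed toy intrinsics (focal lengths and principal point).
fx = fy = 500.0
cx, cy = 320.0, 240.0

gaussian_mean = unproject(420.0, 240.0, 10.0, fx, fy, cx, cy)
print(gaussian_mean)   # → (2.0, 0.0, 10.0)
```

Each foreground pixel seeds one coarse Gaussian this way; the later IMA and MGF stages then refine and render the field rather than committing to a rigid BEV grid up front.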
Gems: Group Emotion Profiling Through Multimodal Situational Understanding
Understanding individual, group and event level emotions along with contextual information is crucial for analyzing a multi-person social situation. To achieve this, we frame emotion comprehension as the task of predicting emotion from the fine-grained individual level to the coarse-grained group and event levels. We introduce GEMS, which leverages a multimodal swin-transformer and S3Attention based architecture that processes an input scene, group members, and context information to generate joint predictions. Existing multi-person emotion related benchmarks mainly focus on atomic interactions, primarily based on emotion perception over time and at the group level. To this end, we extend and propose VGAF-GEMS to provide more fine-grained and holistic analysis on top of the existing group level annotation of the VGAF dataset. GEMS aims to predict basic discrete and continuous emotions (including valence and arousal) as well as individual, group and event level perceived emotions. Our benchmarking effort links individual, group and situational emotional responses holistically. Quantitative and qualitative comparisons with adapted state-of-the-art models demonstrate the effectiveness of the GEMS framework on VGAF-GEMS benchmarking. We believe that it will pave the way for further research. The code and data are available at: https://github.com/katariaak579/GEMS
Updated: 2025-07-30 05:28:25
标题: 宝石:通过多模态情境理解进行群体情感剖析
摘要: 理解个人、团体和事件级别的情绪以及上下文信息对于分析多人社交情境至关重要。为了实现这一目标,我们将情绪理解框架为从预测细粒度个人情绪到粗粒度团体和事件级别情绪的任务。我们介绍了GEMS,它利用了多模态swin-transformer和基于S3Attention的架构,处理输入场景、团体成员和上下文信息以生成联合预测。现有的多人情绪相关基准主要关注基于时间和团体级别的情绪感知的原子互动。为此,我们扩展并提出了VGAF-GEMS,以在现有VGAF数据集的团体级别注释之上提供更细粒度和全面的分析。GEMS旨在预测基本的离散和连续情绪(包括愉悦和唤起),以及个人、团体和事件级别的感知情绪。我们的基准努力在整体上连接了个人、团体和情境情绪反应。与最先进模型进行的定量和定性比较展示了GEMS框架在VGAF-GEMS基准测试上的有效性。我们相信这将为进一步研究铺平道路。代码和数据可在以下网址获取:https://github.com/katariaak579/GEMS
更新时间: 2025-07-30 05:28:25
领域: cs.CV,cs.LG
A Survey on Large Language Model Acceleration based on KV Cache Management
Large Language Models (LLMs) have revolutionized a wide range of domains such as natural language processing, computer vision, and multi-modal tasks due to their ability to comprehend context and perform logical reasoning. However, the computational and memory demands of LLMs, particularly during inference, pose significant challenges when scaling them to real-world, long-context, and real-time applications. Key-Value (KV) cache management has emerged as a critical optimization technique for accelerating LLM inference by reducing redundant computations and improving memory utilization. This survey provides a comprehensive overview of KV cache management strategies for LLM acceleration, categorizing them into token-level, model-level, and system-level optimizations. Token-level strategies include KV cache selection, budget allocation, merging, quantization, and low-rank decomposition, while model-level optimizations focus on architectural innovations and attention mechanisms to enhance KV reuse. System-level approaches address memory management, scheduling, and hardware-aware designs to improve efficiency across diverse computing environments. Additionally, the survey provides an overview of both text and multimodal datasets and benchmarks used to evaluate these strategies. By presenting detailed taxonomies and comparative analyses, this work aims to offer useful insights for researchers and practitioners to support the development of efficient and scalable KV cache management techniques, contributing to the practical deployment of LLMs in real-world applications. The curated paper list for KV cache management is in: \href{https://github.com/TreeAI-Lab/Awesome-KV-Cache-Management}{https://github.com/TreeAI-Lab/Awesome-KV-Cache-Management}.
Updated: 2025-07-30 05:24:46
标题: 基于KV缓存管理的大型语言模型加速调查
摘要: 大型语言模型(LLMs)已经在自然语言处理、计算机视觉和多模态任务等广泛领域引起了革命,这是因为它们能够理解上下文并进行逻辑推理。然而,LLMs在推理过程中的计算和存储需求,尤其在将它们扩展到现实世界、长上下文和实时应用时,会带来重大挑战。关键-值(KV)缓存管理已经成为一种关键的优化技术,通过减少冗余计算和提高内存利用率来加速LLM的推理。本调查提供了KV缓存管理策略的全面概述,将其分类为令牌级别、模型级别和系统级别优化。令牌级别策略包括KV缓存选择、预算分配、合并、量化和低秩分解,而模型级别的优化则专注于架构创新和注意机制以增强KV的重用。系统级方法涉及内存管理、调度和硬件感知设计,以提高在不同计算环境中的效率。此外,该调查提供了用于评估这些策略的文本和多模态数据集以及基准。通过提供详细的分类法和比较分析,本工作旨在为研究人员和从业者提供有用的见解,以支持高效和可扩展的KV缓存管理技术的发展,促进LLMs在实际应用中的实际部署。关于KV缓存管理的策略列表可在以下链接找到:https://github.com/TreeAI-Lab/Awesome-KV-Cache-Management。
更新时间: 2025-07-30 05:24:46
领域: cs.AI,cs.DC
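A token-level budget-allocation strategy of the kind the survey catalogs can be sketched in a few lines: cap the cache size, always protect the most recent tokens, and evict the old entry with the lowest accumulated attention score (in the spirit of heavy-hitter plus recency policies). The class and its fields are invented for illustration:

```python
class BudgetedKVCache:
    """Fixed-budget KV cache: recency window protected, low-score evicted."""
    def __init__(self, budget, recent_window):
        self.budget = budget
        self.recent_window = recent_window
        self.entries = []   # each entry: [position, key, value, score]

    def append(self, pos, key, value, score):
        self.entries.append([pos, key, value, score])
        if len(self.entries) > self.budget:
            protected = self.entries[-self.recent_window:]
            evictable = self.entries[:-self.recent_window]
            victim = min(evictable, key=lambda e: e[3])  # lowest attention score
            evictable.remove(victim)
            self.entries = evictable + protected

    def positions(self):
        return [e[0] for e in self.entries]

cache = BudgetedKVCache(budget=4, recent_window=2)
for pos, s in enumerate([0.9, 0.1, 0.8, 0.2, 0.7]):
    cache.append(pos, f"k{pos}", f"v{pos}", s)
print(cache.positions())   # → [0, 2, 3, 4]
```

Memory stays O(budget) regardless of sequence length, and the token judged least useful (position 1, score 0.1) is the one dropped, which is the basic bet behind KV cache selection.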
Outcome-based Reinforcement Learning to Predict the Future
Reinforcement Learning with Verifiable Rewards (RLVR) has been an effective approach for improving Large Language Models' reasoning in domains such as coding and mathematics. Here, we apply RLVR methods towards forecasting future real-world events - a challenging task for RL due to the very noisy (and delayed) outcomes involved. Using a novel dataset of recent questions from a prediction market, and accompanying relevant news headlines, we show that a compact (14B) reasoning model can be trained to match or surpass the predictive accuracy of frontier models like o1, while greatly improving probabilistic calibration. The model's performance is also practically meaningful: in a Polymarket trading simulation, we estimate that its bets would have yielded a return on investment of over 10% across all questions in the test set. We detail and compare approaches used in training our model, including augmenting our training-data with synthetic prediction questions, guardrails for learning stability, and median prediction sampling at inference-time.
Updated: 2025-07-30 05:18:39
标题: 基于结果的强化学习用于预测未来
摘要: 通过可验证奖励的强化学习(RLVR)已被证明是改善大型语言模型在编码和数学等领域推理能力的有效方法。在这里,我们将RLVR方法应用于预测未来的现实世界事件 - 这对RL来说是一项具有挑战性的任务,因为涉及的结果非常嘈杂(并且延迟)。使用最近的预测市场问题的新数据集和相关的新闻标题,我们展示了一个紧凑的(14B)推理模型可以被训练来匹配或超越类似o1的前沿模型的预测准确性,同时大大提高概率校准。该模型的性能也具有实际意义:在Polymarket交易模拟中,我们估计其下注将会在测试集中的所有问题中获得超过10%的投资回报。我们详细描述和比较了训练模型时使用的方法,包括用合成预测问题增强训练数据,用于学习稳定性的防护栏,以及在推理时进行中位预测抽样。
更新时间: 2025-07-30 05:18:39
领域: cs.LG,cs.AI
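Median prediction sampling at inference time is simple to state concretely: draw several stochastic probability forecasts for the same question, aggregate with the median, and score with the Brier score against the realized outcome. The sample values and outcome below are invented for illustration:

```python
from statistics import median

def brier(prob, outcome):
    return (prob - outcome) ** 2

# Five stochastic forecasts for one question; one is an outlier.
samples = [0.62, 0.70, 0.55, 0.95, 0.60]
p_median = median(samples)
p_mean = sum(samples) / len(samples)

outcome = 0   # the event did not happen; the outlier dragged the mean up
print(round(p_median, 3), round(p_mean, 3))
print(round(brier(p_median, outcome), 4), round(brier(p_mean, outcome), 4))
```

The median ignores the outlier draw that the mean is pulled by, so its Brier score is lower here; that robustness to occasional wild samples is the usual motivation for median over mean aggregation.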
PATENTWRITER: A Benchmarking Study for Patent Drafting with LLMs
Large language models (LLMs) have emerged as transformative approaches in several important fields. This paper aims for a paradigm shift for patent writing by leveraging LLMs to overcome the tedious patent-filing process. In this work, we present PATENTWRITER, the first unified benchmarking framework for evaluating LLMs in patent abstract generation. Given the first claim of a patent, we evaluate six leading LLMs -- including GPT-4 and LLaMA-3 -- under a consistent setup spanning zero-shot, few-shot, and chain-of-thought prompting strategies to generate the abstract of the patent. Our benchmark PATENTWRITER goes beyond surface-level evaluation: we systematically assess the output quality using a comprehensive suite of metrics -- standard NLP measures (e.g., BLEU, ROUGE, BERTScore), robustness under three types of input perturbations, and applicability in two downstream patent classification and retrieval tasks. We also conduct stylistic analysis to assess length, readability, and tone. Experimental results show that modern LLMs can generate high-fidelity and stylistically appropriate patent abstracts, often surpassing domain-specific baselines. Our code and dataset are open-sourced to support reproducibility and future research.
Updated: 2025-07-30 05:17:35
标题: 专利撰写者:一项有关使用LLMs进行专利起草的基准研究
摘要: 大型语言模型(LLMs)已经成为几个重要领域中的革命性方法。本文旨在通过利用LLMs克服繁琐的专利申请流程,实现专利撰写的范式转变。在这项工作中,我们提出了PATENTWRITER,这是第一个统一的基准评估框架,用于评估LLMs在专利摘要生成方面的性能。给定专利的第一项权利要求,我们评估了六种领先的LLMs,包括GPT-4和LLaMA-3,在一致的设置下,采用零样本、少样本和思维链提示策略生成专利摘要。我们的基准PATENTWRITER超越表面级评估:我们系统地使用一套综合的度量标准评估输出质量--标准的自然语言处理度量标准(如BLEU、ROUGE、BERTScore)、对三种类型的输入扰动的鲁棒性,以及在两个下游专利分类和检索任务中的适用性。我们还进行风格分析,评估长度、可读性和语气。实验结果表明,现代LLMs可以生成高保真度且风格得当的专利摘要,通常超过领域特定的基线。我们的代码和数据集是开源的,以支持可复现性和未来研究。
更新时间: 2025-07-30 05:17:35
领域: cs.CL,cs.LG
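As a rough illustration of the surface-level metrics named in the abstract above, here is a minimal unigram-overlap (ROUGE-1) F1 in plain Python. A real evaluation would use an established implementation with stemming and multiple ROUGE variants; this sketch only shows what the number measures.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated abstract and the reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("a method for drafting patent abstracts",
                "a framework for drafting patent claims"))
```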
Multi-Hazard Early Warning Systems for Agriculture with Featural-Temporal Explanations
Climate extremes present escalating risks to agriculture, intensifying the need for reliable multi-hazard early warning systems (EWS). The situation is evolving due to climate change, and hence such systems should have the intelligence to continue learning from recent climate behaviours. However, traditional single-hazard forecasting methods fall short in capturing complex interactions among concurrent climatic events. To address this deficiency, in this paper we combine sequential deep learning models and advanced Explainable Artificial Intelligence (XAI) techniques to introduce a multi-hazard forecasting framework for agriculture. In our experiments, we utilize meteorological data from four prominent agricultural regions in the United States (between 2010 and 2023) to validate the predictive accuracy of our framework on multiple severe event types -- extreme cold, floods, frost, hail, heatwaves, and heavy rainfall -- with tailored models for each area. The framework uniquely integrates attention mechanisms with TimeSHAP (a recurrent XAI explainer for time series) to provide comprehensive temporal explanations, revealing not only which climatic features are influential but precisely when their impacts occur. Our results demonstrate strong predictive accuracy, particularly with the BiLSTM architecture, and highlight the system's capacity to inform nuanced, proactive risk management strategies. This research significantly advances the explainability and applicability of multi-hazard EWS, fostering interdisciplinary trust and effective decision-making for climate risk management in the agricultural industry.
Updated: 2025-07-30 05:16:35
标题: 农业多灾害预警系统与特征-时间解释
摘要: 气候极端事件对农业造成不断升级的风险,加剧了对可靠多灾害预警系统(EWS)的需求。由于气候变化,形势不断演变,因此这些系统应具备持续从近期气候行为中学习的智能。然而,传统的单一灾害预测方法在捕捉同时发生的气候事件之间的复杂交互方面表现不佳。为了解决这一不足,在本文中,我们结合了顺序深度学习模型和先进的可解释人工智能(XAI)技术,引入了一个针对农业的多灾害预测框架。在我们的实验中,我们利用了美国四个著名农业地区的气象数据(2010年至2023年)来验证我们的框架在多种严重事件类型上的预测准确性,这些事件类型包括极端寒冷、洪水、霜冻、冰雹、热浪和暴雨,并为每个地区定制了模型。该框架独特地将注意力机制与TimeSHAP(一种适用于时间序列的循环XAI解释工具)相结合,提供全面的时间解释,不仅揭示了哪些气候特征具有影响力,而且准确地指出它们的影响发生的时间。我们的结果表明,尤其是在BiLSTM架构下,该框架具有强大的预测准确性,并突出了该系统在提供细致、主动的风险管理策略方面的能力。这项研究显著推进了多灾害EWS的可解释性和适用性,促进了跨学科信任以及农业行业气候风险管理的有效决策过程。
更新时间: 2025-07-30 05:16:35
领域: cs.LG
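TimeSHAP itself computes Shapley values over a recurrent model; as a much simpler stand-in in the same spirit, an occlusion probe can show *when* a timestep matters by replacing it with a baseline and measuring the output change. The toy model below is an assumption for illustration, not the paper's BiLSTM.

```python
def temporal_importance(predict, series, baseline=0.0):
    """Occlusion probe: replace each timestep with `baseline` and record
    how much the model output moves. Large scores mark influential times."""
    base_out = predict(series)
    scores = []
    for t in range(len(series)):
        occluded = list(series)
        occluded[t] = baseline
        scores.append(abs(base_out - predict(occluded)))
    return scores

# toy stand-in for a trained forecaster, weighting the last timestep most
model = lambda s: 0.1 * s[0] + 0.2 * s[1] + 0.7 * s[2]
scores = temporal_importance(model, [1.0, 1.0, 1.0])   # approx [0.1, 0.2, 0.7]
```

Shapley-based explainers refine this idea by averaging over coalitions of occluded timesteps rather than one at a time.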
OWLViz: An Open-World Benchmark for Visual Question Answering
We present a challenging benchmark for the Open WorLd VISual question answering (OWLViz) task. OWLViz presents concise, unambiguous queries that require integrating multiple capabilities, including visual understanding, web exploration, and specialized tool usage. While humans achieve 69.2% accuracy on these intuitive tasks, even state-of-the-art VLMs struggle, with the best model, Gemini 2.0, achieving only 26.6% accuracy. Current agentic VLMs, which rely on limited vision and vision-language models as tools, perform even worse. This performance gap reveals significant limitations in multimodal systems' ability to select appropriate tools and execute complex reasoning sequences, establishing new directions for advancing practical AI research.
Updated: 2025-07-30 05:15:43
标题: OWLViz:一个用于视觉问答的开放世界基准
摘要: 我们提出了一个具有挑战性的基准测试,用于开放世界视觉问答(OWLViz)任务。OWLViz提出了简洁、明确的查询,需要整合多种能力,包括视觉理解、网页探索和专门工具的使用。尽管人类在这些直观任务上达到了69.2%的准确率,但即使是最先进的视觉-语言模型(VLMs)也很难应对,最好的模型Gemini 2.0只达到了26.6%的准确率。当前依赖于有限视觉和视觉-语言模型作为工具的代理VLMs表现得更差。这种性能差距揭示了多模态系统在选择合适工具和执行复杂推理序列方面存在重大局限,为推动实用人工智能研究开辟了新的方向。
更新时间: 2025-07-30 05:15:43
领域: cs.LG,cs.CL
Set Invariance with Probability One for Controlled Diffusion: Score-based Approach
Given a controlled diffusion and a connected, bounded, Lipschitz set, when is it possible to guarantee controlled set invariance with probability one? In this work, we answer this question by deriving the necessary and sufficient conditions for the same in terms of gradients of certain log-likelihoods -- a.k.a. score vector fields -- for two cases: a given finite time horizon and an infinite time horizon. The deduced conditions comprise a score-based test that provably certifies or falsifies the existence of Markovian controllers for given controlled set invariance problem data. Our results are constructive in the sense that when the problem data passes the proposed test, we characterize all controllers guaranteeing the desired set invariance. When the problem data fails the proposed test, there does not exist a controller that can accomplish the desired set invariance with probability one. The computations in the proposed tests involve solving certain Dirichlet boundary value problems and, in the finite horizon case, can also account for the additional constraint of hitting a target subset at the terminal time. We illustrate the results using several semi-analytical and numerical examples.
Updated: 2025-07-30 05:13:31
标题: 受控扩散的以概率一成立的集合不变性:基于得分的方法
摘要: 给定一个受控扩散和一个连通、有界、Lipschitz的集合,什么时候可以保证以概率一成立的受控集合不变性?在这项工作中,我们针对有限时间跨度和无限时间跨度两种情况,通过推导以某些对数似然的梯度(即得分向量场)表述的充要条件来回答这个问题。推导出的条件构成一个基于得分的测试,可以严格证明或证伪给定受控集合不变性问题数据下马尔可夫控制器的存在性。我们的结果是建设性的:当问题数据通过所提出的测试时,我们刻画了所有保证所需集合不变性的控制器;当问题数据未通过所提出的测试时,不存在能以概率一实现所需集合不变性的控制器。所提出测试中的计算涉及求解某些Dirichlet边值问题,在有限时间跨度情形下,还可以考虑在终端时刻到达目标子集的附加约束。我们使用几个半解析和数值示例说明了这些结果。
更新时间: 2025-07-30 05:13:31
领域: math.OC,cs.LG,cs.SY,eess.SY,math.PR,stat.ME
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Despite recent advances in text-to-speech (TTS) models, audio-visual-to-audio-visual (AV2AV) translation still faces a critical challenge: maintaining speaker consistency between the original and translated vocal and facial features. To address this issue, we propose a conditional flow matching (CFM) zero-shot audio-visual renderer that utilizes strong dual guidance from both audio and visual modalities. By leveraging multimodal guidance with CFM, our model robustly preserves speaker-specific characteristics and enhances zero-shot AV2AV translation abilities. For the audio modality, we enhance the CFM process by integrating robust speaker embeddings with x-vectors, which serve to bolster speaker consistency. Additionally, we convey emotional nuances to the face rendering module. The guidance provided by both audio and visual cues remains independent of semantic or linguistic content, allowing our renderer to effectively handle zero-shot translation tasks for monolingual speakers in different languages. We empirically demonstrate that the inclusion of high-quality mel-spectrograms conditioned on facial information not only enhances the quality of the synthesized speech but also positively influences facial generation, leading to overall performance improvements in LSE and FID score. Our code is available at https://github.com/Peter-SungwooCho/MAVFlow.
Updated: 2025-07-30 05:08:14
标题: MAVFlow:使用条件流匹配保留副语言元素,实现零样本AV2AV多语言翻译
摘要: 尽管近年来在文本转语音(TTS)模型方面取得了进展,但音频-视觉到音频-视觉(AV2AV)翻译仍面临一个关键挑战:在原始和翻译的语音和面部特征之间保持说话人一致性。为了解决这个问题,我们提出了一种条件流匹配(CFM)零样本音频-视觉渲染器,利用音频和视觉两种模态的强双重指导。通过利用CFM的多模态指导,我们的模型能够稳健地保留说话人特定特征并增强零样本AV2AV翻译能力。对于音频模态,我们通过将强健的说话人嵌入与x-向量集成,以增强CFM过程,从而巩固说话人一致性。此外,我们向面部渲染模块传达情感细微差别。音频和视觉线索提供的指导与语义或语言内容无关,使我们的渲染器能够有效处理不同语言的单语言说话人的零样本翻译任务。我们在实证上证明,结合基于面部信息的高质量mel-频谱图不仅提高了合成语音的质量,还积极影响了面部生成,从而提高了LSE和FID分数的整体性能。我们的代码可在https://github.com/Peter-SungwooCho/MAVFlow 上找到。
更新时间: 2025-07-30 05:08:14
领域: eess.AS,cs.CV,cs.LG,cs.MM
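The conditional flow matching (CFM) objective underlying the renderer above is, in its standard form, a regression onto the velocity of a linear interpolation path between noise and data. A minimal sketch of that objective, independent of the audio-visual conditioning described in the paper:

```python
import random

def cfm_training_pair(x0, x1, t):
    """On the linear path x_t = (1 - t) * x0 + t * x1, the regression
    target for the learned velocity field is simply x1 - x0."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target

def cfm_loss(v_pred, target):
    """Mean squared error between predicted and target velocities."""
    return sum((v - u) ** 2 for v, u in zip(v_pred, target)) / len(target)

x0 = [0.0, 0.0]            # noise sample
x1 = [1.0, 2.0]            # data sample (stand-in for a spectrogram frame)
x_t, target = cfm_training_pair(x0, x1, random.random())
```

In the paper's setting, the velocity network would additionally be conditioned on speaker embeddings (x-vectors) and visual features; the regression target is unchanged.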
ChartM$^3$: Benchmarking Chart Editing with Multimodal Instructions
Charts are a fundamental visualization format widely used in data analysis across research and industry. While enabling users to edit charts based on high-level intentions is of great practical value, existing methods primarily rely on natural language instructions, which are often too ambiguous to support fine-grained editing. In this work, we introduce a novel paradigm for multimodal chart editing, where user intent is expressed through a combination of natural language and visual indicators that explicitly highlight the elements to be modified. To support this paradigm, we present Chart$\text{M}^3$, a new benchmark for Multimodal chart editing with Multi-level complexity and Multi-perspective evaluation. Chart$\text{M}^3$ contains 1,000 samples spanning four levels of editing difficulty. Each sample includes triplets in the form of (chart, code, multimodal instructions). To comprehensively evaluate chart editing models, Chart$\text{M}^3$ provides metrics that assess both visual appearance and code correctness. Our benchmark reveals significant limitations in current multimodal large language models (MLLMs), including GPT-4o, particularly in their ability to interpret and act on visual indicators. To address this, we construct Chart$\text{M}^3$-Train, a large-scale training set with 24,000 multimodal chart editing samples. Fine-tuning MLLMs on this dataset leads to substantial improvements, demonstrating the importance of multimodal supervision in building practical chart editing systems. Our datasets, codes, and evaluation tools are available at https://github.com/MLrollIT/ChartM3.
Updated: 2025-07-30 05:05:10
标题: ChartM$^3$:使用多模态指令对图表编辑进行基准测试
摘要: 图表是一种在研究和工业领域广泛用于数据分析的基本可视化格式。虽然让用户能够基于高层意图编辑图表具有很大的实际价值,但现有方法主要依赖自然语言指令,这些指令往往过于含糊,难以支持细粒度编辑。在这项工作中,我们引入了一种新的多模态图表编辑范式,用户意图通过自然语言和视觉指示的组合来表达,明确突出需要修改的元素。为了支持这种范式,我们提出了Chart$\text{M}^3$,一个具有多级复杂度和多角度评估的多模态图表编辑新基准。Chart$\text{M}^3$包含1,000个样本,涵盖四个编辑难度级别。每个样本包括(图表,代码,多模态指令)形式的三元组。为了全面评估图表编辑模型,Chart$\text{M}^3$提供了同时评估视觉外观和代码正确性的指标。我们的基准测试揭示了当前多模态大语言模型(MLLMs)的显著局限性,包括GPT-4o,尤其是在解释和执行视觉指示的能力方面。为了解决这个问题,我们构建了Chart$\text{M}^3$-Train,一个包含24,000个多模态图表编辑样本的大规模训练集。在这个数据集上微调MLLMs带来了显著改进,证明了多模态监督在构建实用图表编辑系统中的重要性。我们的数据集、代码和评估工具可在https://github.com/MLrollIT/ChartM3上找到。
更新时间: 2025-07-30 05:05:10
领域: cs.CV,cs.AI
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
We introduce LLaVA-Reward, an efficient reward model designed to automatically evaluate text-to-image (T2I) generations across multiple perspectives, leveraging pretrained multimodal large language models (MLLMs). Existing MLLM-based approaches require instruction-following data for supervised fine-tuning and evaluate generation quality on analyzing text response, which is time-consuming and difficult to train. To address this problem, we propose LLaVA-Reward, which directly utilizes the hidden states of MLLMs given text-image pairs. To enhance the bidirectional interaction between visual and textual representations in decoder-only MLLMs, we further propose adding a Skip-connection Cross Attention (SkipCA) module. This design enhances text-image correlation reasoning by connecting early-layer visual features with later-layer hidden representations. In addition, LLaVA-Reward supports different types of preference data for efficient fine-tuning, including paired preference data and unpaired data. We train LLaVA-Reward on four evaluation perspectives: text-image alignment, fidelity/artifact, safety, and overall ranking. Empirical results demonstrate that LLaVA-Reward outperforms conventional and MLLM-based methods in generating human-aligned scores for automatic evaluations and inference-time scaling in text-to-image generations.
Updated: 2025-07-30 04:49:38
标题: 多模态LLMs作为文本到图像生成的定制奖励模型
摘要: 我们介绍了LLaVA-Reward,这是一个高效的奖励模型,利用预训练的多模态大型语言模型(MLLMs),从多个角度自动评估文本到图像(T2I)生成。现有的基于MLLM的方法需要遵循指令的数据进行监督微调,并通过分析文本响应来评估生成质量,既耗时又难以训练。为了解决这个问题,我们提出了LLaVA-Reward,它直接利用MLLMs在给定文本-图像对时的隐藏状态。为了增强仅解码器MLLMs中视觉与文本表示之间的双向交互,我们进一步提出添加Skip-connection Cross Attention(SkipCA)模块。这种设计通过连接早期层的视觉特征与后期层的隐藏表示来增强文本-图像相关性推理。此外,LLaVA-Reward支持不同类型的偏好数据进行高效微调,包括成对的偏好数据和非成对数据。我们在四个评估角度上对LLaVA-Reward进行训练:文本-图像对齐、保真度/伪影、安全性和总体排名。实证结果表明,在为自动评估生成与人类一致的分数以及文本到图像生成的推理时扩展方面,LLaVA-Reward优于传统方法和基于MLLM的方法。
更新时间: 2025-07-30 04:49:38
领域: cs.CV,cs.AI,cs.CL
Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention
We propose Reverse Contrast Attention (RCA), a plug-in method that enhances object localization in vision-language transformers without retraining. RCA reweights final-layer attention by suppressing extremes and amplifying mid-level activations to let semantically relevant but subdued tokens guide predictions. We evaluate it on Open Vocabulary Referring Object Detection (OV-RefOD), introducing FitAP, a confidence-free average precision metric based on IoU and box area. RCA improves FitAP in 11 out of 15 open-source VLMs, with gains up to $+26.6\%$. Effectiveness aligns with attention sharpness and fusion timing; while late-fusion models benefit consistently, models like $\texttt{DeepSeek-VL2}$ also improve, pointing to capacity and disentanglement as key factors. RCA offers both interpretability and performance gains for multimodal transformers. Codes and dataset are available from https://github.com/earl-juanico/rca
Updated: 2025-07-30 04:47:07
标题: 具有可解释性的开放词汇指代物体检测与反向对比注意力
摘要: 我们提出了Reverse Contrast Attention (RCA),这是一种插件方法,可以在无需重新训练的情况下增强视觉-语言转换器中的对象定位。RCA通过抑制极端值并放大中层激活来重新加权最终层的注意力,从而让语义相关但被压制的标记引导预测。我们在Open Vocabulary Referring Object Detection (OV-RefOD)上对其进行评估,并引入了FitAP,一种基于IoU和框面积的无置信度平均精度度量。在15个开源VLM中,RCA在11个模型上提高了FitAP,增益高达+26.6%。其有效性与注意力的锐度和融合时机相关;后期融合模型持续受益,而像DeepSeek-VL2这样的模型也有所改善,表明容量和解耦是关键因素。RCA为多模态转换器同时带来了可解释性和性能增益。代码和数据集可在https://github.com/earl-juanico/rca上找到。
更新时间: 2025-07-30 04:47:07
领域: cs.CV,cs.AI
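The exact reweighting used by RCA is defined in the paper; as a hypothetical sketch of the stated idea -- suppress extreme attention values so mid-level, semantically relevant tokens gain relative mass -- one can clip and renormalise a final-layer attention row. The clipping bounds below are invented for illustration.

```python
def reverse_contrast(weights, lo=0.05, hi=0.60):
    """Hypothetical RCA-style reweighting: clip extreme attention values,
    then renormalise so mid-level tokens gain relative mass."""
    clipped = [min(max(w, lo), hi) for w in weights]
    total = sum(clipped)
    return [w / total for w in clipped]

attn = [0.85, 0.10, 0.03, 0.02]   # one token dominates the attention row
print(reverse_contrast(attn))     # dominant weight shrinks, mid-level grows
```

After reweighting, the dominant token no longer monopolises the row, which is the effect the paper attributes to improved localization of subdued but relevant tokens.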
Improving Generalization Ability of Robotic Imitation Learning by Resolving Causal Confusion in Observations
Recent developments in imitation learning have considerably advanced robotic manipulation. However, current techniques in imitation learning can suffer from poor generalization, limiting performance even under relatively minor domain shifts. In this work, we aim to enhance the generalization capabilities of complex imitation learning algorithms to handle unpredictable changes from the training environments to deployment environments. To avoid confusion caused by observations that are not relevant to the target task, we propose to explicitly learn the causal relationship between observation components and expert actions, employing a framework similar to [6], where a causal structural function is learned by intervention on the imitation learning policy. Since disentangling the feature representation from the image input as in [6] is hard to satisfy in the complex imitation learning processes of robotic manipulation, we theoretically clarify that this requirement is not necessary for causal relationship learning. Therefore, we propose a simple causal structure learning framework that can be easily embedded in recent imitation learning architectures, such as the Action Chunking Transformer [31]. We demonstrate our approach using a simulation of the ALOHA [31] bimanual robot arms in Mujoco, and show that the method considerably mitigates the generalization problem of existing complex imitation learning algorithms.
Updated: 2025-07-30 04:46:48
标题: 通过解决观察中的因果混淆来提高机器人模仿学习的泛化能力
摘要: 最近在模仿学习领域的发展极大地推进了机器人操作。然而,当前的模仿学习技术可能存在泛化能力较差的问题,即使在相对较小的领域偏移下性能也会受限。在这项工作中,我们旨在增强复杂模仿学习算法的泛化能力,以处理从训练环境到部署环境的不可预测变化。为了避免由与目标任务无关的观察引起的混淆,我们建议明确学习观察组件和专家行为之间的因果关系,采用类似于[6]的框架,通过对模仿学习策略进行干预来学习因果结构函数。像[6]那样从图像输入中解耦特征表示在机器人操作的复杂模仿学习过程中很难满足,我们在理论上澄清了这一要求在因果关系学习中并非必要。因此,我们提出了一个简单的因果结构学习框架,可以轻松嵌入到最近的模仿学习架构中,例如动作分块变换器(Action Chunking Transformer)[31]。我们使用Mujoco中ALOHA[31]双臂机器人的模拟展示了我们的方法,并表明该方法能够大大缓解现有复杂模仿学习算法的泛化问题。
更新时间: 2025-07-30 04:46:48
领域: cs.RO,cs.LG
Can GPT-4o mini and Gemini 2.0 Flash Predict Fine-Grained Fashion Product Attributes? A Zero-Shot Analysis
The fashion retail business is centered around the capacity to comprehend products. Product attribution helps in comprehending products depending on the business process. Quality attribution improves the customer experience as they navigate through millions of products offered by a retail website. It leads to well-organized product catalogs. In the end, product attribution directly impacts the 'discovery experience' of the customer. Although large language models (LLMs) have shown remarkable capabilities in understanding multimodal data, their performance on fine-grained fashion attribute recognition remains under-explored. This paper presents a zero-shot evaluation of state-of-the-art LLMs that balance performance with speed and cost efficiency, mainly GPT-4o-mini and Gemini 2.0 Flash. We have used the dataset DeepFashion-MultiModal (https://github.com/yumingj/DeepFashion-MultiModal) to evaluate these models in the attribution tasks of fashion products. Our study evaluates these models across 18 categories of fashion attributes, offering insight into where these models excel. We only use images as the sole input for product information to create a constrained environment. Our analysis shows that Gemini 2.0 Flash demonstrates the strongest overall performance with a macro F1 score of 56.79% across all attributes, while GPT-4o-mini scored a macro F1 score of 43.28%. Through detailed error analysis, our findings provide practical insights for deploying these LLMs in production e-commerce product attribution-related tasks and highlight the need for domain-specific fine-tuning approaches. This work also lays the groundwork for future research in fashion AI and multimodal attribute extraction.
Updated: 2025-07-30 04:37:06
标题: GPT-4o mini和Gemini 2.0 Flash能预测细粒度时尚产品属性吗?零样本分析
摘要: 时尚零售业的核心在于理解产品。产品属性有助于根据业务流程理解产品。质量属性提高了客户体验,因为他们在零售网站提供的数百万产品中进行导航。它导致了组织良好的产品目录。最终,产品属性直接影响客户的“发现体验”。尽管大型语言模型(LLMs)在理解多模态数据方面表现出了卓越的能力,但它们在细粒度时尚属性识别方面的表现仍未得到充分探讨。本文提出了一种零样本评估最先进的LLMs的方法,以平衡性能与速度和成本效率,主要是GPT-4o-mini和Gemini 2.0 Flash。我们使用数据集DeepFashion-MultiModal(https://github.com/yumingj/DeepFashion-MultiModal)来评估这些模型在时尚产品的属性任务中的表现。我们的研究评估了这些模型在18个时尚属性类别中的表现,为这些模型擅长的领域提供了见解。我们仅使用图像作为产品信息的唯一输入,以创造一个受限制的环境。我们的分析显示,Gemini 2.0 Flash在所有属性上的宏F1得分为56.79%,表现最佳,而GPT-4o-mini的宏F1得分为43.28%。通过详细的错误分析,我们的研究结果为在生产电子商务产品属性相关任务中部署这些LLMs提供了实用见解,并强调了需要领域特定的微调方法。这项工作还为未来时尚人工智能和多模态属性提取的研究奠定了基础。
更新时间: 2025-07-30 04:37:06
领域: cs.CV,cs.AI
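The macro F1 reported above is the unweighted mean of per-attribute F1 scores, so each of the 18 attribute categories counts equally regardless of how many products carry it. A minimal sketch with hypothetical per-attribute (TP, FP, FN) counts:

```python
def f1(tp, fp, fn):
    """Per-category F1 from true-positive, false-positive, false-negative counts."""
    if tp == 0:
        return 0.0
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

def macro_f1(per_attribute_counts):
    """Unweighted mean of per-attribute F1 scores."""
    scores = [f1(*c) for c in per_attribute_counts]
    return sum(scores) / len(scores)

# hypothetical counts for three attribute categories
print(macro_f1([(8, 2, 2), (5, 5, 5), (0, 3, 4)]))
```

Because rare attribute categories weigh as much as common ones, macro F1 exposes weaknesses that a micro-averaged score would hide.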
Year-over-Year Developments in Financial Fraud Detection via Deep Learning: A Systematic Literature Review
This paper systematically reviews advancements in deep learning (DL) techniques for financial fraud detection, a critical issue in the financial sector. Using the Kitchenham systematic literature review approach, 57 studies published between 2019 and 2024 were analyzed. The review highlights the effectiveness of various deep learning models such as Convolutional Neural Networks, Long Short-Term Memory, and transformers across domains such as credit card transactions, insurance claims, and financial statement audits. Performance metrics such as precision, recall, F1-score, and AUC-ROC were evaluated. Key themes explored include the impact of data privacy frameworks and advancements in feature engineering and data preprocessing. The study emphasizes challenges such as imbalanced datasets, model interpretability, and ethical considerations, alongside opportunities for automation and privacy-preserving techniques such as blockchain integration and Principal Component Analysis. By examining trends over the past five years, this review identifies critical gaps and promising directions for advancing DL applications in financial fraud detection, offering actionable insights for researchers and practitioners.
Updated: 2025-07-30 04:32:58
标题: 深度学习在财务欺诈检测中的逐年发展:系统文献综述
摘要: 本文系统性地回顾了深度学习(DL)技术在金融欺诈检测中的进展,这是金融领域中的一个关键问题。利用Kitchenham系统文献综述方法,分析了2019年至2024年间发表的57篇研究。综述突出了各种深度学习模型的有效性,如卷积神经网络、长短期记忆和变换器,在信用卡交易、保险理赔和财务报表审计等领域的应用。评估了精确度、召回率、F1分数和AUC-ROC等性能指标。探讨的关键主题包括数据隐私框架的影响以及特征工程和数据预处理方面的进展。该研究强调了不平衡数据集、模型可解释性和伦理考虑等挑战,同时提出了自动化和隐私保护技术的机会,如区块链集成和主成分分析。通过对过去五年的趋势进行审查,本综述确定了金融欺诈检测中深度学习应用的关键空白和有前途的方向,为研究人员和从业者提供可操作的见解。
更新时间: 2025-07-30 04:32:58
领域: cs.LG,cs.AI,q-fin.ST
Learning Neural Strategy-Proof Matching Mechanism from Examples
Designing two-sided matching mechanisms is challenging when practical demands for matching outcomes are difficult to formalize and the designed mechanism must satisfy theoretical conditions. To address this, prior work has proposed a framework that learns a matching mechanism from examples, using a parameterized family that satisfies properties such as stability. However, despite its usefulness, this framework does not guarantee strategy-proofness (SP), and cannot handle varying numbers of agents or incorporate publicly available contextual information about agents, both of which are crucial in real-world applications. In this paper, we propose a new parametrized family of matching mechanisms that always satisfy strategy-proofness, are applicable for an arbitrary number of agents, and deal with public contextual information of agents, based on the serial dictatorship (SD). This family is represented by NeuralSD, a novel neural network architecture based on SD, where agent rankings in SD are treated as learnable parameters computed from agents' contexts using an attention-based sub-network. To enable learning, we introduce tensor serial dictatorship (TSD), a differentiable relaxation of SD using tensor operations. This allows NeuralSD to be trained end-to-end from example matchings while satisfying SP. We conducted experiments to learn a matching mechanism from matching examples while satisfying SP. We demonstrated that our method outperformed baselines in predicting matchings and on several metrics for goodness of matching outcomes.
Updated: 2025-07-30 04:30:43
标题: 从示例中学习满足防策略性的神经匹配机制
摘要: 当匹配结果的实际需求难以形式化、且所设计的机制又必须满足理论条件时,设计双边匹配机制是具有挑战性的。为了解决这个问题,先前的工作提出了一个从示例中学习匹配机制的框架,使用一个满足稳定性等性质的参数化机制族。然而,尽管该框架很有用,它并不保证防策略性(SP),也不能处理数量可变的代理或整合关于代理的公开可得的上下文信息,而这两者在现实应用中都至关重要。在本文中,我们基于串行独裁(SD)提出了一个新的参数化匹配机制族,它始终满足防策略性,适用于任意数量的代理,并能处理代理的公开上下文信息。这个机制族由NeuralSD表示,这是一个基于SD的新颖神经网络架构,其中SD中的代理排序被视为可学习参数,通过一个基于注意力的子网络从代理的上下文中计算得到。为了实现学习,我们引入了张量串行独裁(TSD),这是利用张量运算对SD进行的可微分松弛。这使得NeuralSD能够在满足SP的同时,从示例匹配中进行端到端训练。我们进行了实验,在满足SP的前提下从匹配示例中学习匹配机制。我们证明了我们的方法在预测匹配以及若干衡量匹配结果优劣的指标上优于基线。
更新时间: 2025-07-30 04:30:43
领域: cs.AI
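The differentiable TSD relaxation is the paper's contribution; the underlying (non-differentiable) serial dictatorship it relaxes is simple to state: agents pick in a fixed order, each taking their most-preferred item still available, which is what makes the mechanism strategy-proof. A sketch with invented agents and items:

```python
def serial_dictatorship(ranking, preferences, items):
    """Agents choose in `ranking` order; each takes their most-preferred
    item still available. Truthful reporting is always optimal for every
    agent, so the mechanism is strategy-proof."""
    available = set(items)
    match = {}
    for agent in ranking:
        for item in preferences[agent]:
            if item in available:
                match[agent] = item
                available.remove(item)
                break
    return match

# invented agents/items for illustration
print(serial_dictatorship(["a", "b"],
                          {"a": ["x", "y"], "b": ["x", "y"]},
                          ["x", "y"]))   # {'a': 'x', 'b': 'y'}
```

NeuralSD's idea, per the abstract, is to make the `ranking` itself a learnable function of agent contexts while keeping this strategy-proof backbone.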
SAEL: Leveraging Large Language Models with Adaptive Mixture-of-Experts for Smart Contract Vulnerability Detection
With the increasing security issues in blockchain, smart contract vulnerability detection has become a research focus. Existing vulnerability detection methods have their limitations: 1) Static analysis methods struggle with complex scenarios. 2) Methods based on specialized pre-trained models perform well on specific datasets but have limited generalization capabilities. In contrast, general-purpose Large Language Models (LLMs) demonstrate impressive ability in adapting to new vulnerability patterns. However, they often underperform on specific vulnerability types compared to methods based on specialized pre-trained models. We also observe that explanations generated by general-purpose LLMs can provide fine-grained code understanding information, contributing to improved detection performance. Inspired by these observations, we propose SAEL, an LLM-based framework for smart contract vulnerability detection. We first design targeted prompts to guide LLMs in identifying vulnerabilities and generating explanations, which serve as prediction features. Next, we apply prompt-tuning on CodeT5 and T5 to process contract code and explanations, enhancing task-specific performance. To combine the strengths of each approach, we introduce an Adaptive Mixture-of-Experts architecture. This dynamically adjusts feature weights via a Gating Network, which selects relevant features using TopK filtering and Softmax normalization, and incorporates a Multi-Head Self-Attention mechanism to enhance cross-feature relationships. This design enables effective integration of LLM predictions, explanation features, and code features through gradient optimization. The loss function jointly considers both independent feature performance and overall weighted predictions. Experiments show that SAEL outperforms existing methods across various vulnerabilities.
Updated: 2025-07-30 04:28:00
Subjects: cs.CR,cs.AI,cs.SE
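The Adaptive Mixture-of-Experts step described above (TopK filtering followed by Softmax normalization over feature weights) can be sketched as follows; the shapes, scores, and two-expert cutoff are illustrative assumptions, and the paper's Multi-Head Self-Attention stage is omitted for brevity.

```python
import numpy as np

def gate_features(features, scores, k=2):
    """Sketch of a TopK + Softmax gating step: keep the k highest-scoring
    feature vectors and combine them with softmax-normalized weights.
    `features` is (n_experts, d); `scores` is (n_experts,)."""
    top = np.argsort(scores)[-k:]                # indices of the k best experts
    w = np.exp(scores[top] - scores[top].max())  # numerically stable softmax
    w /= w.sum()
    return w @ features[top]                     # weighted fusion, shape (d,)

# Toy fusion of three feature sources (e.g., LLM prediction, explanation, code).
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = np.array([0.1, 2.0, 2.0])
fused = gate_features(feats, scores, k=2)        # -> [0.5, 1.0]
```

In the paper the gate weights come from a learned Gating Network trained by gradient optimization; here they are fixed scores purely to show the TopK-then-Softmax mechanics.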
NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT
Inertial tracking is vital for robotic IoT and has gained popularity thanks to the ubiquity of low-cost inertial measurement units and deep learning-powered tracking algorithms. Existing works, however, have not fully utilized IMU measurements, particularly magnetometers, nor have they maximized the potential of deep learning to achieve the desired accuracy. To address these limitations, we introduce NeurIT, which elevates tracking accuracy to a new level. NeurIT employs a Time-Frequency Block-recurrent Transformer (TF-BRT) at its core, combining both RNN and Transformer to learn representative features in both time and frequency domains. To fully utilize IMU information, we strategically employ body-frame differentiation of magnetometers, considerably reducing the tracking error. We implement NeurIT on a customized robotic platform and conduct evaluation in various indoor environments. Experimental results demonstrate that NeurIT achieves a mere 1-meter tracking error over a 300-meter distance. Notably, it significantly outperforms state-of-the-art baselines by 48.21% on unseen data. Moreover, NeurIT demonstrates robustness in large urban complexes and performs comparably to the visual-inertial approach (Tango Phone) in vision-favored conditions while surpassing it in feature-sparse settings. We believe NeurIT takes an important step forward toward practical neural inertial tracking for ubiquitous and scalable tracking of robotic things. NeurIT is open-sourced here: https://github.com/aiot-lab/NeurIT.
Updated: 2025-07-30 04:26:50
Subjects: cs.RO,cs.AI,cs.HC
Prediction of acoustic field in 1-D uniform duct with varying mean flow and temperature using neural networks
Neural networks constrained by physical laws have emerged as an alternative numerical tool. In this paper, the governing equation that represents the propagation of sound inside a one-dimensional duct carrying a heterogeneous medium is derived. The problem is converted into an unconstrained optimization problem and solved using neural networks. Both acoustic state variables, acoustic pressure and particle velocity, are predicted and validated against a traditional Runge-Kutta solver. The effect of the temperature gradient on the acoustic field is studied. The utility of machine learning techniques such as transfer learning and automatic differentiation for acoustic applications is demonstrated.
Updated: 2025-07-30 04:26:36
Subjects: cs.LG,cs.SD,eess.AS,34A06,G.1.6; I.6.4; J.2
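As a hedged illustration of the physics-residual idea behind such networks, the sketch below evaluates the finite-difference residual of the much simpler homogeneous, no-flow case, $p'' + k^2 p = 0$, which is what a physics-informed model would drive toward zero at collocation points; the paper's actual governing equation additionally carries mean-flow and temperature-gradient terms.

```python
import numpy as np

def helmholtz_residual(p, x, k):
    """Finite-difference residual of p'' + k^2 p = 0 on a uniform grid.
    A physics-informed network minimizes the mean square of this residual
    (via automatic differentiation) instead of fitting labeled data."""
    h = x[1] - x[0]
    d2p = (p[2:] - 2.0 * p[1:-1] + p[:-2]) / h**2   # central second difference
    return d2p + k**2 * p[1:-1]

k = 2.0 * np.pi                 # wavenumber of a plane wave
x = np.linspace(0.0, 1.0, 2001)
p = np.cos(k * x)               # exact solution of the simplified equation
res = helmholtz_residual(p, x, k)   # near-zero up to discretization error
```

The exact plane-wave solution makes the residual vanish up to $O(h^2)$ truncation error, mirroring how a converged network solution is judged against the governing equation.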
Exploring the Application of Visual Question Answering (VQA) for Classroom Activity Monitoring
Classroom behavior monitoring is a critical aspect of educational research, with significant implications for student engagement and learning outcomes. Recent advancements in Visual Question Answering (VQA) models offer promising tools for automatically analyzing complex classroom interactions from video recordings. In this paper, we investigate the applicability of several state-of-the-art open-source VQA models, including LLaMA2, LLaMA3, QWEN3, and NVILA, in the context of classroom behavior analysis. To facilitate rigorous evaluation, we introduce our BAV-Classroom-VQA dataset derived from real-world classroom video recordings at the Banking Academy of Vietnam. We present the methodology for data collection, annotation, and benchmark the performance of the selected VQA models on this dataset. Our initial experimental results demonstrate that all four models achieve promising performance levels in answering behavior-related visual questions, showcasing their potential in future classroom analytics and intervention systems.
Updated: 2025-07-30 04:25:14
Subjects: cs.CV,cs.AI
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
To alleviate the computational burden of large language models (LLMs), architectures with activation sparsity, represented by mixture-of-experts (MoE), have attracted increasing attention. However, the non-differentiable and inflexible routing of vanilla MoE hurts model performance. Moreover, while each token activates only a few parameters, these sparsely-activated architectures exhibit low chunk-level sparsity, indicating that the union of multiple consecutive tokens activates a large ratio of parameters. Such a sparsity pattern is unfriendly for acceleration under low-resource conditions (e.g., end-side devices) and incompatible with mainstream acceleration techniques (e.g., speculative decoding). To address these challenges, we introduce a novel MoE architecture, BlockFFN, as well as its efficient training and deployment techniques. Specifically, we use a router integrating ReLU activation and RMSNorm for differentiable and flexible routing. Next, to promote both token-level sparsity (TLS) and chunk-level sparsity (CLS), CLS-aware training objectives are designed, making BlockFFN more acceleration-friendly. Finally, we implement efficient acceleration kernels, combining activation sparsity and speculative decoding for the first time. The experimental results demonstrate the superior performance of BlockFFN over other MoE baselines, achieving over 80% TLS and 70% 8-token CLS. Our kernels achieve up to 3.67$\times$ speedup on real end-side devices than dense models. All codes and checkpoints are available publicly (https://github.com/thunlp/BlockFFN).
Updated: 2025-07-30 04:14:15
Subjects: cs.LG,cs.CL
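The distinction between token-level and chunk-level sparsity can be made concrete with a small sketch. The definitions below are our reading of the abstract (TLS as the per-token inactive fraction, CLS as the inactive fraction of the union over consecutive tokens), not the paper's exact formulas.

```python
import numpy as np

def token_level_sparsity(act):
    """act: (n_tokens, n_units) boolean activation mask.
    TLS = average fraction of units a single token leaves inactive."""
    return 1.0 - act.mean()

def chunk_level_sparsity(act, chunk=8):
    """CLS = fraction of units left inactive by the UNION of each group of
    `chunk` consecutive tokens (what batched decoding actually has to load)."""
    n = (act.shape[0] // chunk) * chunk
    groups = act[:n].reshape(-1, chunk, act.shape[1]).any(axis=1)
    return 1.0 - groups.mean()

rng = np.random.default_rng(0)
act = rng.random((64, 256)) < 0.05   # each token activates ~5% of units
tls = token_level_sparsity(act)       # high: each token is very sparse
cls_ = chunk_level_sparsity(act, 8)   # lower: the union activates far more
```

With independent 5% activations, the 8-token union activates roughly $1 - 0.95^8 \approx 34\%$ of units, which is exactly the acceleration-unfriendly gap between TLS and CLS that BlockFFN's CLS-aware objectives target.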
Hybrid Quantum Classical Surrogate for Real Time Inverse Finite Element Modeling in Digital Twins
Large-scale civil structures, such as bridges, pipelines, and offshore platforms, are vital to modern infrastructure, where unexpected failures can cause significant economic and safety repercussions. Although finite element (FE) modeling is widely used for real-time structural health monitoring (SHM), its high computational cost and the complexity of inverse FE analysis, where low-dimensional sensor data must be mapped onto high-dimensional displacement or stress fields, pose ongoing challenges. Here, we propose a hybrid quantum-classical multilayer perceptron (QMLP) framework to tackle these issues and facilitate swift updates to digital twins across a range of structural applications. Our approach embeds sensor data using symmetric positive definite (SPD) matrices and polynomial features, yielding a representation well suited to quantum processing. A parameterized quantum circuit (PQC) transforms these features, and the resultant quantum outputs feed into a classical neural network for final inference. By fusing quantum capabilities with classical modeling, the QMLP handles large-scale inverse FE mapping while preserving computational viability. Through extensive experiments on a bridge, we demonstrate that the QMLP achieves a mean squared error (MSE) of $3.16 \times 10^{-11}$, outperforming purely classical baselines by a large margin. These findings confirm the potential of quantum-enhanced methods for real-time SHM, establishing a pathway toward more efficient, scalable digital twins that can robustly monitor and diagnose structural integrity in near real time.
Updated: 2025-07-30 04:09:49
Subjects: quant-ph,cs.LG
Beyond Accuracy: How AI Metacognitive Sensitivity improves AI-assisted Decision Making
In settings where human decision-making relies on AI input, both the predictive accuracy of the AI system and the reliability of its confidence estimates influence decision quality. We highlight the role of AI metacognitive sensitivity -- its ability to assign confidence scores that accurately distinguish correct from incorrect predictions -- and introduce a theoretical framework for assessing the joint impact of AI's predictive accuracy and metacognitive sensitivity in hybrid decision-making settings. Our analysis identifies conditions under which an AI with lower predictive accuracy but higher metacognitive sensitivity can enhance the overall accuracy of human decision making. Finally, a behavioral experiment confirms that greater AI metacognitive sensitivity improves human decision performance. Together, these findings underscore the importance of evaluating AI assistance not only by accuracy but also by metacognitive sensitivity, and of optimizing both to achieve superior decision outcomes.
Updated: 2025-07-30 04:05:50
Subjects: cs.AI,cs.HC
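One standard way to quantify metacognitive sensitivity, used here as an illustrative assumption rather than the paper's exact measure, is the AUC of confidence scores at separating correct from incorrect predictions:

```python
def meta_auc(confidences, correct):
    """AUC of confidence at discriminating correct from incorrect predictions:
    the probability that a randomly chosen correct prediction receives higher
    confidence than a randomly chosen incorrect one (ties count 1/2)."""
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

conf = [0.9, 0.8, 0.6, 0.4]
ok = [True, True, False, False]   # confidence perfectly tracks correctness
auc = meta_auc(conf, ok)          # -> 1.0
```

Two AI systems with identical accuracy can differ sharply on this score, which is precisely the axis along which the paper argues assistance quality should also be evaluated.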
ST-GDance: Long-Term and Collision-Free Group Choreography from Music
Group dance generation from music has broad applications in film, gaming, and animation production. However, it requires synchronizing multiple dancers while maintaining spatial coordination. As the number of dancers and sequence length increase, this task faces higher computational complexity and a greater risk of motion collisions. Existing methods often struggle to model dense spatial-temporal interactions, leading to scalability issues and multi-dancer collisions. To address these challenges, we propose ST-GDance, a novel framework that decouples spatial and temporal dependencies to optimize long-term and collision-free group choreography. We employ lightweight graph convolutions for distance-aware spatial modeling and accelerated sparse attention for efficient temporal modeling. This design significantly reduces computational costs while ensuring smooth and collision-free interactions. Experiments on the AIOZ-GDance dataset demonstrate that ST-GDance outperforms state-of-the-art baselines, particularly in generating long and coherent group dance sequences. Project page: https://yilliajing.github.io/ST-GDance-Website/.
Updated: 2025-07-30 03:57:47
Subjects: cs.AI
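A minimal sketch of distance-aware spatial modeling, assuming a Gaussian kernel over pairwise dancer distances with row-normalized edge weights (the paper's exact graph construction may differ):

```python
import numpy as np

def distance_aware_adjacency(pos, sigma=1.0):
    """pos: (n_dancers, 2) floor positions. Nearby dancers get larger edge
    weights via a Gaussian kernel; rows are normalized to sum to 1 so one
    matrix product performs a message-passing (graph convolution) step."""
    diff = pos[:, None, :] - pos[None, :, :]
    dist2 = (diff ** 2).sum(-1)
    A = np.exp(-dist2 / (2.0 * sigma ** 2))
    return A / A.sum(axis=1, keepdims=True)

pos = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
A = distance_aware_adjacency(pos)
feats = np.eye(3)        # one-hot feature per dancer
mixed = A @ feats        # one lightweight graph-convolution step
```

Because edge weight decays with distance, close pairs exchange most information, which is how such a layer can penalize or smooth out imminent collisions cheaply.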
Object Recognition Datasets and Challenges: A Review
Object recognition is among the fundamental tasks in the computer vision applications, paving the path for all other image understanding operations. In every stage of progress in object recognition research, efforts have been made to collect and annotate new datasets to match the capacity of the state-of-the-art algorithms. In recent years, the importance of the size and quality of datasets has been intensified as the utility of the emerging deep network techniques heavily relies on training data. Furthermore, datasets lay a fair benchmarking means for competitions and have proved instrumental to the advancements of object recognition research by providing quantifiable benchmarks for the developed models. Taking a closer look at the characteristics of commonly-used public datasets seems to be an important first step for data-driven and machine learning researchers. In this survey, we provide a detailed analysis of datasets in the highly investigated object recognition areas. More than 160 datasets have been scrutinized through statistics and descriptions. Additionally, we present an overview of the prominent object recognition benchmarks and competitions, along with a description of the metrics widely adopted for evaluation purposes in the computer vision community. All introduced datasets and challenges can be found online at github.com/AbtinDjavadifar/ORDC.
Updated: 2025-07-30 03:56:37
Subjects: cs.CV,cs.AI
GVD: Guiding Video Diffusion Model for Scalable Video Distillation
To address the larger computation and storage requirements associated with large video datasets, video dataset distillation aims to capture spatial and temporal information in a significantly smaller dataset, such that training on the distilled data has comparable performance to training on all of the data. We propose GVD: Guiding Video Diffusion, the first diffusion-based video distillation method. GVD jointly distills spatial and temporal features, ensuring high-fidelity video generation across diverse actions while capturing essential motion information. Our method's diverse yet representative distillations significantly outperform previous state-of-the-art approaches on the MiniUCF and HMDB51 datasets across 5, 10, and 20 Instances Per Class (IPC). Specifically, our method achieves 78.29 percent of the original dataset's performance using only 1.98 percent of the total number of frames in MiniUCF. Additionally, it reaches 73.83 percent of the performance with just 3.30 percent of the frames in HMDB51. Experimental results across benchmark video datasets demonstrate that GVD not only achieves state-of-the-art performance but can also generate higher resolution videos and higher IPC without significantly increasing computational cost.
Updated: 2025-07-30 03:51:35
Subjects: cs.CV,cs.AI
LLM-Crowdsourced: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
Although large language models (LLMs) demonstrate remarkable capabilities across various tasks, evaluating their capabilities remains a challenging task. Existing evaluation methods suffer from issues such as data contamination, black-box operation, and subjective preference. These issues make it difficult to evaluate the LLMs' true capabilities comprehensively. To tackle these challenges, we propose a novel benchmark-free evaluation paradigm, LLM-Crowdsourced. It utilizes LLMs to generate questions, answer independently, and evaluate mutually. This method integrates four key evaluation criteria: dynamic, transparent, objective, and professional, which existing evaluation methods cannot satisfy simultaneously. Experiments on eight mainstream LLMs across mathematics and programming verify the advantages of our method in distinguishing LLM performance. Furthermore, our study reveals several novel findings that are difficult for traditional methods to detect, including but not limited to: (1) Gemini demonstrates the highest original and professional question-design capabilities among others; (2) Some LLMs exhibit ''memorization-based answering'' by misrecognizing questions as familiar ones with a similar structure; (3) LLM evaluation results demonstrate high consistency (robustness).
Updated: 2025-07-30 03:50:46
Subjects: cs.AI,cs.CL
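The generate-answer-evaluate protocol can be sketched as an orchestration loop; the stub callables below stand in for real LLM API calls, and all names are hypothetical.

```python
def crowdsource_round(models, judge_fn):
    """Each model poses a question, every model answers every question,
    and each answer is scored by all *other* models (no self-judging).
    `models` maps name -> answer_fn(question); `judge_fn(judge, q, a)`
    returns a numeric score. Returns the mean peer score per model."""
    questions = {name: f"question-from-{name}" for name in models}
    scores = {name: [] for name in models}
    for q in questions.values():
        for answerer, answer_fn in models.items():
            a = answer_fn(q)
            for judge in models:
                if judge != answerer:        # exclude self-evaluation
                    scores[answerer].append(judge_fn(judge, q, a))
    return {m: sum(s) / len(s) for m, s in scores.items()}

# Stub "LLMs": answer quality is encoded in the answer's length.
models = {"m1": lambda q: "long detailed answer", "m2": lambda q: "short"}
ranking = crowdsource_round(models, judge_fn=lambda j, q, a: len(a))
```

Excluding self-judging is the key design point: it keeps the mutual evaluation from rewarding a model for rating its own answers highly.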
FastTrackTr: Towards Fast Multi-Object Tracking with Transformers
Transformer-based multi-object tracking (MOT) methods have captured the attention of many researchers in recent years. However, these models often suffer from slow inference speeds due to their structure or other issues. To address this problem, we revisited the Joint Detection and Tracking (JDT) method by looking back at past approaches. By integrating the original JDT approach with some advanced theories, this paper employs an efficient method of information transfer between frames on the DETR, constructing a fast and novel JDT-type MOT framework: FastTrackTr. Thanks to the superiority of this information transfer method, our approach not only reduces the number of queries required during tracking but also avoids the excessive introduction of network structures, ensuring model simplicity. Experimental results indicate that our method has the potential to achieve real-time tracking and exhibits competitive tracking accuracy across multiple datasets.
Updated: 2025-07-30 03:49:48
Subjects: cs.CV,cs.AI
Magentic-UI: Towards Human-in-the-loop Agentic Systems
AI agents powered by large language models are increasingly capable of autonomously completing complex, multi-step tasks using external tools. Yet, they still fall short of human-level performance in most domains including computer use, software development, and research. Their growing autonomy and ability to interact with the outside world, also introduces safety and security risks including potentially misaligned actions and adversarial manipulation. We argue that human-in-the-loop agentic systems offer a promising path forward, combining human oversight and control with AI efficiency to unlock productivity from imperfect systems. We introduce Magentic-UI, an open-source web interface for developing and studying human-agent interaction. Built on a flexible multi-agent architecture, Magentic-UI supports web browsing, code execution, and file manipulation, and can be extended with diverse tools via Model Context Protocol (MCP). Moreover, Magentic-UI presents six interaction mechanisms for enabling effective, low-cost human involvement: co-planning, co-tasking, multi-tasking, action guards, and long-term memory. We evaluate Magentic-UI across four dimensions: autonomous task completion on agentic benchmarks, simulated user testing of its interaction capabilities, qualitative studies with real users, and targeted safety assessments. Our findings highlight Magentic-UI's potential to advance safe and efficient human-agent collaboration.
Updated: 2025-07-30 03:49:14
Subjects: cs.AI,cs.HC
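As a hedged sketch of the "action guards" mechanism, the wrapper below holds any action flagged as irreversible until a human approves it; the function names are hypothetical illustrations, not Magentic-UI's actual API.

```python
def guarded_execute(action, irreversible, approve_fn):
    """Run `action` (a zero-arg callable) immediately if it is reversible;
    otherwise ask the human via `approve_fn(description)` first and block
    on rejection."""
    if irreversible and not approve_fn(getattr(action, "__name__", "action")):
        return "blocked"
    return action()

log = []
def delete_files():          # hypothetical irreversible agent action
    log.append("deleted")
    return "done"

# Simulated human rejects: the irreversible action never runs.
result = guarded_execute(delete_files, irreversible=True,
                         approve_fn=lambda desc: False)
```

The guard makes human involvement low-cost by interrupting only at irreversible steps, rather than asking for confirmation on every action.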
($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$)-Parametric Multi-Task Optimization: Joint Search in Solution and Infinite Task Spaces
Multi-task optimization is typically characterized by a fixed and finite set of tasks. The present paper relaxes this condition by considering a non-fixed and potentially infinite set of optimization tasks defined in a parameterized, continuous and bounded task space. We refer to this unique problem setting as parametric multi-task optimization (PMTO). Assuming the bounds of the task parameters to be ($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$), a novel ($\boldsymbol{\theta}_l$, $\boldsymbol{\theta}_u$)-PMTO algorithm is crafted to operate in two complementary modes. In an offline optimization mode, a joint search over solution and task spaces is carried out with the creation of two approximation models: (1) for mapping points in a unified solution space to the objective spaces of all tasks, which provably accelerates convergence by acting as a conduit for inter-task knowledge transfers, and (2) for probabilistically mapping tasks to their corresponding solutions, which facilitates evolutionary exploration of under-explored regions of the task space. In the online mode, the derived models enable direct optimization of any task within the bounds without the need to search from scratch. This outcome is validated on both synthetic test problems and practical case studies, with the significant real-world applicability of PMTO shown towards fast reconfiguration of robot controllers under changing task conditions. The potential of PMTO to vastly speedup the search for solutions to minimax optimization problems is also demonstrated through an example in robust engineering design.
Updated: 2025-07-30 03:39:06
Subjects: cs.NE,cs.LG
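The online mode can be illustrated with a toy surrogate: after offline sampling, a task-to-solution model predicts a warm start for any in-bounds task parameter without searching from scratch. A linear least-squares fit stands in for the paper's probabilistic task-to-solution mapping here, and the optimum $x^*(\theta) = 2\theta + 1$ is a made-up ground truth.

```python
import numpy as np

# Offline phase (stub): suppose the joint search found optimum
# x*(theta) = 2*theta + 1 at a few sampled task parameters in bounds.
thetas = np.array([[0.0], [0.5], [1.0]])
solutions = 2.0 * thetas + 1.0

# Fit a task -> solution surrogate (linear least squares as a stand-in
# for the paper's probabilistic mapping model).
X = np.hstack([thetas, np.ones((len(thetas), 1))])
coef, *_ = np.linalg.lstsq(X, solutions, rcond=None)

def warm_start(theta):
    """Online mode: predict a near-optimal solution for a new in-bounds task."""
    return np.array([theta, 1.0]) @ coef

x0 = warm_start(0.25)        # new task inside (theta_l, theta_u)
```

Because the offline samples lie exactly on a linear law here, the surrogate recovers the new task's optimum; in practice the prediction would only seed further local refinement.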
Benchmarking Android Malware Detection: Traditional vs. Deep Learning Models
Android malware detection has been extensively studied using both traditional machine learning (ML) and deep learning (DL) approaches. While many state-of-the-art detection models, particularly those based on DL, claim superior performance, they often rely on limited comparisons, lacking comprehensive benchmarking against traditional ML models across diverse datasets. This raises concerns about the robustness of DL-based approaches' performance and the potential oversight of simpler, more efficient ML models. In this paper, we conduct a systematic evaluation of Android malware detection models across four datasets: three recently published, publicly available datasets and a large-scale dataset we systematically collected. We implement a range of traditional ML models, including Random Forests (RF) and CatBoost, alongside advanced DL models such as Capsule Graph Neural Networks (CapsGNN), BERT-based models, and ExcelFormer based models. Our results reveal that in many cases simpler and more computationally efficient ML models achieve comparable or even superior performance compared with DL models. These findings highlight the need for rigorous benchmarking in Android malware detection research. We encourage future studies to conduct more comprehensive benchmarking comparisons between traditional and advanced models to ensure a more accurate assessment of detection capabilities. To facilitate further research, we provide access to our dataset, including app IDs, hash values, and labels.
Updated: 2025-07-30 03:35:25
Subjects: cs.CR
A ChatGPT-based approach for questions generation in higher education
Large language models have been widely applied in many aspects of real life, bringing significant efficiency to businesses and offering distinctive user experiences. In this paper, we focus on exploring the application of ChatGPT, a chatbot based on a large language model, to support higher-education instructors in generating quiz questions and assessing learners. Specifically, we explore interactive prompting patterns to design an optimal AI-powered question bank creation process. The generated questions are evaluated through a "Blind test" survey sent to various stakeholders including lecturers and learners. Initial results at the Banking Academy of Vietnam are relatively promising, suggesting a potential direction to streamline the time and effort involved in assessing learners at higher education institutes.
Updated: 2025-07-30 03:29:41
Subjects: cs.CY,cs.AI
Scalable Spectrum Availability Prediction using a Markov Chain Framework and ITU-R Propagation Models
Spectrum resources are often underutilized across time and space, motivating dynamic spectrum access strategies that allow secondary users to exploit unused frequencies. A key challenge is predicting when and where spectrum will be available (i.e., unused by primary licensed users) in order to enable proactive and interference-free access. This paper proposes a scalable framework for spectrum availability prediction that combines a two-state Markov chain model of primary user activity with high-fidelity propagation models from the ITU-R (specifically Recommendations P.528 and P.2108). The Markov chain captures temporal occupancy patterns, while the propagation models incorporate path loss and clutter effects to determine if primary signals exceed interference thresholds at secondary user locations. By integrating these components, the proposed method can predict spectrum opportunities both in time and space with improved accuracy. We develop the system model and algorithm for the approach, analyze its scalability and computational efficiency, and discuss assumptions, limitations, and potential applications. The framework is flexible and can be adapted to various frequency bands and scenarios. The results and analysis show that the proposed approach can effectively identify available spectrum with low computational cost, making it suitable for real-time spectrum management in cognitive radio networks and other dynamic spectrum sharing systems.
Updated: 2025-07-30 03:22:55
Subjects: cs.NI,cs.AI,cs.CL,cs.NA,math.NA
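As a rough sketch of the two-state occupancy model described in this abstract (the transition probabilities and decision threshold below are illustrative assumptions, not values from the paper, and the ITU-R propagation/interference checks are omitted):

```python
# Two-state Markov chain for primary-user channel occupancy.
# States: 0 = idle (spectrum available), 1 = busy (primary user active).

def stationary_idle_prob(p_busy_to_idle: float, p_idle_to_busy: float) -> float:
    """Long-run probability that the channel is idle.

    For a 2-state Markov chain with transition probabilities p01 (idle->busy)
    and p10 (busy->idle), the stationary idle probability is p10 / (p01 + p10).
    """
    return p_busy_to_idle / (p_idle_to_busy + p_busy_to_idle)

def channel_available(p_busy_to_idle: float, p_idle_to_busy: float,
                      threshold: float = 0.9) -> bool:
    """Declare spectrum available when the stationary idle probability
    exceeds a target threshold (path-loss and clutter checks would be
    layered on top of this in the full framework)."""
    return stationary_idle_prob(p_busy_to_idle, p_idle_to_busy) >= threshold
```

In the full framework, a secondary user would combine this temporal estimate with the propagation-model check that the primary signal stays below the interference threshold at its location.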
MSQ: Memory-Efficient Bit Sparsification Quantization
As deep neural networks (DNNs) see increased deployment on mobile and edge devices, optimizing model efficiency has become crucial. Mixed-precision quantization is widely favored, as it offers a superior balance between efficiency and accuracy compared to uniform quantization. However, finding the optimal precision for each layer is challenging. Recent studies utilizing bit-level sparsity have shown promise, yet they often introduce substantial training complexity and high GPU memory requirements. In this paper, we propose Memory-Efficient Bit Sparsification Quantization (MSQ), a novel approach that addresses these limitations. MSQ applies a round-clamp quantizer to enable differentiable computation of the least significant bits (LSBs) from model weights. It further employs regularization to induce sparsity in these LSBs, enabling effective precision reduction without explicit bit-level parameter splitting. Additionally, MSQ incorporates Hessian information, allowing the simultaneous pruning of multiple LSBs to further enhance training efficiency. Experimental results show that MSQ achieves up to 8.00x reduction in trainable parameters and up to 86% reduction in training time compared to previous bit-level quantization, while maintaining competitive accuracy and compression rates. This makes it a practical solution for training efficient DNNs on resource-constrained devices.
Updated: 2025-07-30 03:21:29
Subjects: cs.LG
Quantum Semi-Random Forests for Qubit-Efficient Recommender Systems
Modern recommenders describe each item with hundreds of sparse semantic tags, yet most quantum pipelines still map one qubit per tag, demanding well beyond one hundred qubits, far out of reach for current noisy-intermediate-scale quantum (NISQ) devices and prone to deep, error-amplifying circuits. We close this gap with a three-stage hybrid machine learning algorithm that compresses tag profiles, optimizes feature selection under a fixed qubit budget via QAOA, and scores recommendations with a Quantum semi-Random Forest (QsRF) built on just five qubits, while performing similarly to the state-of-the-art methods. Leveraging SVD sketching and k-means, we learn a 1000-atom dictionary ($>$97 \% variance), then solve a $20\times20$ QUBO via depth-3 QAOA to select 5 atoms. A 100-tree QsRF trained on these codes matches full-feature baselines on ICM-150/500.
Updated: 2025-07-30 03:20:44
Subjects: quant-ph,cs.LG
Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning
Solving Bengali Math Word Problems (MWPs) remains a major challenge in natural language processing (NLP) due to the language's low-resource status and the multi-step reasoning required. Existing models struggle with complex Bengali MWPs, largely because no human-annotated Bengali dataset has previously addressed this task. This gap has limited progress in Bengali mathematical reasoning. To address this, we created SOMADHAN, a dataset of 8792 complex Bengali MWPs with manually written, step-by-step solutions. We designed this dataset to support reasoning-focused evaluation and model development in a linguistically underrepresented context. Using SOMADHAN, we evaluated a range of large language models (LLMs) - including GPT-4o, GPT-3.5 Turbo, LLaMA series models, Deepseek, and Qwen - through both zero-shot and few-shot prompting with and without Chain of Thought (CoT) reasoning. CoT prompting consistently improved performance over standard prompting, especially in tasks requiring multi-step logic. LLaMA-3.3 70B achieved the highest accuracy of 88% with few-shot CoT prompting. We also applied Low-Rank Adaptation (LoRA) to fine-tune models efficiently, enabling them to adapt to Bengali MWPs with minimal computational cost. Our work fills a critical gap in Bengali NLP by providing a high-quality reasoning dataset and a scalable framework for solving complex MWPs. We aim to advance equitable research in low-resource languages and enhance reasoning capabilities in educational and language technologies.
Updated: 2025-07-30 03:20:16
Subjects: cs.CL,cs.LG
Benchmarking Fraud Detectors on Private Graph Data
We introduce the novel problem of benchmarking fraud detectors on private graph-structured data. Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs. We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties (e.g., vendors or researchers). The third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results. We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results. In simulations of a privacy-sensitive benchmark for facial recognition algorithms by the National Institute of Standards and Technology (NIST), our attack achieves near perfect accuracy in identifying whether individuals' data is present in a private dataset, with a True Positive Rate of 0.98 at a False Positive Rate of 0.00. We then study how to benchmark algorithms while satisfying a formal differential privacy (DP) guarantee. We empirically evaluate two classes of solutions: subsample-and-aggregate and DP synthetic graph data. We demonstrate through extensive experiments that current approaches do not provide utility when guaranteeing DP. Our results indicate that the error arising from DP trades off between bias from distorting graph structure and variance from adding random noise. Current methods lie on different points along this bias-variance trade-off, but more complex methods tend to require high-variance noise addition, undermining utility.
Updated: 2025-07-30 03:20:15
Subjects: cs.CR
Koopman-Based Generalization of Deep Reinforcement Learning With Application to Wireless Communications
Deep Reinforcement Learning (DRL) is a key machine learning technology driving progress across various scientific and engineering fields, including wireless communication. However, its limited interpretability and generalizability remain major challenges. In supervised learning, generalizability is commonly evaluated through the generalization error using information-theoretic methods. In DRL, the training data is sequential and not independent and identically distributed (i.i.d.), rendering traditional information-theoretic methods unsuitable for generalizability analysis. To address this challenge, this paper proposes a novel analytical method for evaluating the generalizability of DRL. Specifically, we first model the evolution of states and actions in trained DRL algorithms as unknown discrete, stochastic, and nonlinear dynamical functions. Then, we employ a data-driven identification method, the Koopman operator, to approximate these functions, and propose two interpretable representations. Based on these interpretable representations, we develop a rigorous mathematical approach to evaluate the generalizability of DRL algorithms. This approach is formulated using the spectral feature analysis of the Koopman operator, leveraging the H_\infty norm. Finally, we apply this generalization analysis to compare the soft actor-critic method, widely recognized as a robust DRL approach, against the proximal policy optimization algorithm for an unmanned aerial vehicle-assisted mmWave wireless communication scenario.
Updated: 2025-07-30 03:19:42
Subjects: cs.LG,cs.IT,math.IT
Robust Filtering and Learning in State-Space Models: Skewness and Heavy Tails Via Asymmetric Laplace Distribution
State-space models are pivotal for dynamic system analysis but often struggle with outlier data that deviates from Gaussian distributions, frequently exhibiting skewness and heavy tails. This paper introduces a robust extension utilizing the asymmetric Laplace distribution, specifically tailored to capture these complex characteristics. We propose an efficient variational Bayes algorithm and a novel single-loop parameter estimation strategy, significantly enhancing the efficiency of the filtering, smoothing, and parameter estimation processes. Our comprehensive experiments demonstrate that our methods provide consistently robust performance across various noise settings without the need for manual hyperparameter adjustments. In stark contrast, existing models generally rely on specific noise conditions and necessitate extensive manual tuning. Moreover, our approach uses far fewer computational resources, thereby validating the model's effectiveness and underscoring its potential for practical applications in fields such as robust control and financial modeling.
Updated: 2025-07-30 03:06:27
Subjects: eess.SP,cs.LG
A Semi-Supervised Federated Learning Framework with Hierarchical Clustering Aggregation for Heterogeneous Satellite Networks
Low Earth Orbit (LEO) satellites are emerging as key components of 6G networks, with many already deployed to support large-scale Earth observation and sensing related tasks. Federated Learning (FL) presents a promising paradigm for enabling distributed intelligence in these resource-constrained and dynamic environments. However, achieving reliable convergence, while minimizing both processing time and energy consumption, remains a substantial challenge, particularly in heterogeneous and partially unlabeled satellite networks. To address this challenge, we propose a novel semi-supervised federated learning framework tailored for LEO satellite networks with hierarchical clustering aggregation. To further reduce communication overhead, we integrate sparsification and adaptive weight quantization techniques. In addition, we divide the FL clustering into two stages: satellite cluster aggregation stage and Ground Stations (GSs) aggregation stage. The supervised learning at GSs guides selected Parameter Server (PS) satellites, which in turn support fully unlabeled satellites during the federated training process. Extensive experiments conducted on a satellite network testbed demonstrate that our proposal can significantly reduce processing time (up to 3x) and energy consumption (up to 4x) compared to other comparative methods while maintaining model accuracy.
Updated: 2025-07-30 02:47:14
Subjects: cs.DC,cs.LG
Parametrized Multi-Agent Routing via Deep Attention Models
We propose a scalable deep learning framework for parametrized sequential decision-making (ParaSDM), where multiple agents jointly optimize discrete action policies and shared continuous parameters. A key subclass of this setting arises in Facility-Location and Path Optimization (FLPO), where multi-agent systems must simultaneously determine optimal routes and facility locations, aiming to minimize the cumulative transportation cost within the network. FLPO problems are NP-hard due to their mixed discrete-continuous structure and highly non-convex objective. To address this, we integrate the Maximum Entropy Principle (MEP) with a neural policy model called the Shortest Path Network (SPN)-a permutation-invariant encoder-decoder that approximates the MEP solution while enabling efficient gradient-based optimization over shared parameters. The SPN achieves up to 100$\times$ speedup in policy inference and gradient computation compared to MEP baselines, with an average optimality gap of approximately 6% across a wide range of problem sizes. Our FLPO approach yields over 10$\times$ lower cost than metaheuristic baselines while running significantly faster, and matches Gurobi's optimal cost with annealing at a 1500$\times$ speedup-establishing a new state of the art for ParaSDM problems. These results highlight the power of structured deep models for solving large-scale mixed-integer optimization tasks.
Updated: 2025-07-30 02:46:45
Subjects: cs.LG
HypKG: Hypergraph-based Knowledge Graph Contextualization for Precision Healthcare
Knowledge graphs (KGs) are important products of the semantic web, which are widely used in various application domains. Healthcare is one of such domains where KGs are intensively used, due to the high requirement for knowledge accuracy and interconnected nature of healthcare data. However, KGs storing general factual information often lack the ability to account for important contexts of the knowledge such as the status of specific patients, which are crucial in precision healthcare. Meanwhile, electronic health records (EHRs) provide rich personal data, including various diagnoses and medications, which provide natural contexts for general KGs. In this paper, we propose HypKG, a framework that integrates patient information from EHRs into KGs to generate contextualized knowledge representations for accurate healthcare predictions. Using advanced entity-linking techniques, we connect relevant knowledge from general KGs with patient information from EHRs, and then utilize a hypergraph model to "contextualize" the knowledge with the patient information. Finally, we employ hypergraph transformers guided by downstream prediction tasks to jointly learn proper contextualized representations for both KGs and patients, fully leveraging existing knowledge in KGs and patient contexts in EHRs. In experiments using a large biomedical KG and two real-world EHR datasets, HypKG demonstrates significant improvements in healthcare prediction tasks across multiple evaluation metrics. Additionally, by integrating external contexts, HypKG can learn to adjust the representations of entities and relations in KG, potentially improving the quality and real-world utility of knowledge.
Updated: 2025-07-30 02:32:04
标题: HypKG:基于超图的知识图上下文化,用于精准健康医疗
摘要: 知识图谱(KGs)是语义网络的重要产品,在各种应用领域广泛使用。医疗保健是其中之一,在这个领域,由于对知识准确性和医疗数据的互联性的高要求,KGs被大量使用。然而,存储通用事实信息的KGs通常缺乏考虑知识的重要上下文(如特定患者的状态)的能力,这在精准医疗中至关重要。同时,电子健康记录(EHRs)提供丰富的个人数据,包括各种诊断和药物,为通用KGs提供自然上下文。在本文中,我们提出了HypKG,这是一个将EHRs中的患者信息整合到KGs中,以生成用于准确医疗预测的具有上下文的知识表示的框架。通过使用先进的实体链接技术,我们将通用KGs中的相关知识与EHRs中的患者信息连接起来,然后利用超图模型将知识与患者信息“上下文化”。最后,我们利用下游预测任务引导的超图变换器共同学习适当的上下文化表示,充分利用KGs中的现有知识和EHRs中的患者上下文。在使用大型生物医学KG和两个真实世界EHR数据集进行实验时,HypKG在多个评估指标上显示出对医疗预测任务的显著改进。此外,通过整合外部上下文,HypKG可以学习调整KG中实体和关系的表示,从而潜在地提高知识的质量和实际应用效果。
更新时间: 2025-07-30 02:32:04
领域: cs.AI,cs.LG
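As a toy illustration of the "patients as context for KG entities" idea (our reading of the abstract, not HypKG's actual hypergraph transformer), one round of entity-to-hyperedge-to-entity mean aggregation already mixes patient context into entity representations:

```python
# Toy hypergraph: each patient record is a hyperedge over the KG entities it mentions.
patients = {
    "p1": ["diabetes", "metformin"],
    "p2": ["diabetes", "hypertension", "lisinopril"],
}
entities = sorted({e for ents in patients.values() for e in ents})
# One-hot initial features for each entity.
feat = {e: [float(i == entities.index(e)) for i in range(len(entities))] for e in entities}

def hypergraph_step(patients, feat):
    """One round of entity -> hyperedge -> entity mean aggregation."""
    edge_feat = {}
    for p, ents in patients.items():
        cols = zip(*(feat[e] for e in ents))
        edge_feat[p] = [sum(c) / len(ents) for c in cols]
    new_feat = {}
    for e in feat:
        edges = [p for p, ents in patients.items() if e in ents]
        cols = zip(*(edge_feat[p] for p in edges))
        new_feat[e] = [sum(c) / len(edges) for c in cols]
    return edge_feat, new_feat

edge_feat, ctx = hypergraph_step(patients, feat)
```

After one step, the representation of "metformin" carries weight on "diabetes" because the two co-occur in a patient hyperedge — the contextualization effect the abstract describes, in miniature.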
Hypernetworks for Model-Heterogeneous Personalized Federated Learning
Recent advances in personalized federated learning have focused on addressing client model heterogeneity. However, most existing methods still require external data, rely on model decoupling, or adopt partial learning strategies, which can limit their practicality and scalability. In this paper, we revisit hypernetwork-based methods and leverage their strong generalization capabilities to design a simple yet effective framework for heterogeneous personalized federated learning. Specifically, we propose MH-pFedHN, which employs a server-side hypernetwork that takes client-specific embedding vectors as input and outputs personalized parameters tailored to each client's heterogeneous model. To promote knowledge sharing and reduce computation, we introduce a multi-head structure within the hypernetwork, allowing clients with similar model sizes to share heads. Furthermore, we propose MH-pFedHNGD, which integrates an optional lightweight global model to improve generalization. Our framework does not rely on external datasets and does not require disclosure of client model architectures, thereby offering enhanced privacy and flexibility. Extensive experiments on multiple benchmarks and model settings demonstrate that our approach achieves competitive accuracy, strong generalization, and serves as a robust baseline for future research in model-heterogeneous personalized federated learning.
Updated: 2025-07-30 02:24:26
标题: 超网络用于模型异构个性化联邦学习
摘要: 最近关于个性化联邦学习的研究集中在解决客户端模型的异质性。然而,大多数现有方法仍然需要外部数据,依赖于模型解耦,或采用部分学习策略,这可能限制其实用性和可扩展性。在本文中,我们重新审视了基于超网络的方法,并利用其强大的泛化能力来设计一个简单而有效的框架,用于异构个性化联邦学习。具体而言,我们提出了MH-pFedHN,利用一个服务器端的超网络,该超网络以客户端特定的嵌入向量作为输入,并输出针对每个客户端异构模型定制的个性化参数。为了促进知识共享和减少计算量,我们在超网络中引入了一个多头结构,允许具有相似模型大小的客户端共享头部。此外,我们进一步提出了MH-pFedHNGD,该方法整合了一个可选的轻量级全局模型以提高泛化能力。我们的框架不依赖于外部数据集,并且不需要披露客户端模型架构,从而提供了增强的隐私性和灵活性。在多个基准和模型设置上进行的大量实验表明,我们的方法实现了竞争性的准确性,强大的泛化能力,并作为未来模型异构个性化联邦学习研究的坚实基础。
更新时间: 2025-07-30 02:24:26
领域: cs.LG,cs.DC
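A minimal sketch of the server-side idea, under our own assumptions (random linear heads, toy dimensions — not the paper's architecture): clients that declare the same parameter count share one hypernetwork head, while their distinct embeddings still yield personalized parameters:

```python
import random

random.seed(0)

def make_head(embed_dim, out_dim):
    """A linear head: maps a client embedding to a flat parameter vector."""
    w = [[random.gauss(0, 0.1) for _ in range(embed_dim)] for _ in range(out_dim)]
    return lambda z: [sum(wi * zi for wi, zi in zip(row, z)) for row in w]

# Clients only declare a parameter count; equal sizes share a head (multi-head structure),
# so the server never needs the client model architecture itself.
client_sizes = {"a": 6, "b": 6, "c": 10}
heads = {}
for c, n in client_sizes.items():
    if n not in heads:
        heads[n] = make_head(4, n)

embeddings = {c: [random.gauss(0, 1) for _ in range(4)] for c in client_sizes}
params = {c: heads[client_sizes[c]](embeddings[c]) for c in client_sizes}
```

Clients "a" and "b" share a head yet receive different parameters, because personalization lives in the embedding, not the head.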
FAST: An Optimization Framework for Fast Additive Segmentation in Transparent ML
We present FAST, an optimization framework for fast additive segmentation. FAST segments piecewise constant shape functions for each feature in a dataset to produce transparent additive models. The framework leverages a novel optimization procedure to fit these models $\sim$2 orders of magnitude faster than existing state-of-the-art methods, such as explainable boosting machines \citep{nori2019interpretml}. We also develop new feature selection algorithms in the FAST framework to fit parsimonious models that perform well. Through experiments and case studies, we show that FAST improves the computational efficiency and interpretability of additive models.
Updated: 2025-07-30 02:11:56
标题: FAST:透明机器学习中快速加性分割的优化框架
摘要: 我们提出了FAST,这是一个用于快速加性分割的优化框架。FAST为数据集中的每个特征拟合分段常数形状函数,以生成透明的加性模型。该框架利用一种新颖的优化过程来拟合这些模型,比现有的最先进方法(如可解释提升机\citep{nori2019interpretml})快大约2个数量级。我们还在FAST框架中开发了新的特征选择算法,以拟合表现良好的简约模型。通过实验和案例研究,我们展示了FAST提高了加性模型的计算效率和可解释性。
更新时间: 2025-07-30 02:11:56
领域: stat.ML,cs.LG
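FAST's actual optimization procedure is not reproduced in this digest; the brute-force sketch below only illustrates what a piecewise-constant shape function looks like for one feature — a single breakpoint and two segment levels chosen to minimize squared error:

```python
def best_split(xs, ys):
    """Best single breakpoint for a piecewise-constant fit of y on x (min SSE)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    xs = [xs[i] for i in order]
    ys = [ys[i] for i in order]
    best = None
    for k in range(1, len(xs)):  # try every split position
        left, right = ys[:k], ys[k:]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, xs[k - 1], ml, mr)
    return best  # (sse, threshold, left_level, right_level)

sse, thr, lo_level, hi_level = best_split([0, 1, 2, 3, 4, 5], [1, 1, 1, 5, 5, 5])
```

FAST's contribution is doing this kind of segmentation for all features orders of magnitude faster than the naive enumeration shown here.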
Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression
Currently, Automatic Speech Recognition (ASR) models are deployed in an extensive range of applications. However, recent studies have demonstrated the possibility of adversarial attacks on these models which could potentially suppress or disrupt model output. We investigate and verify the robustness of these attacks and explore if it is possible to increase their imperceptibility. We additionally find that by relaxing the optimisation objective from complete suppression to partial suppression, we can further increase the imperceptibility of the attack. We also explore possible defences against these attacks and show that a low-pass filter could potentially serve as an effective defence.
Updated: 2025-07-30 02:02:58
标题: Whisper Smarter, not Harder: 针对部分抑制的对抗攻击
摘要: 目前,自动语音识别(ASR)模型被部署在广泛的应用中。然而,最近的研究表明这些模型可能遭受对抗性攻击,可能会抑制或干扰模型输出。我们调查并验证了这些攻击的鲁棒性,并探讨是否可能提高它们的难以察觉性。我们还发现,通过将优化目标从完全抑制放宽到部分抑制,我们可以进一步提高攻击的难以察觉性。我们还探讨了可能的防御措施,并展示低通滤波器可能作为一种有效的防御手段。
更新时间: 2025-07-30 02:02:58
领域: cs.SD,cs.CR,cs.LG,eess.AS
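The paper's filter parameters are not given in this digest; as a generic illustration of why a low-pass filter can defend against such perturbations, a simple moving-average filter (window size is our assumption) attenuates a high-frequency additive perturbation far more than the low-frequency content of the signal:

```python
import math

def low_pass(signal, k=5):
    """Moving-average low-pass filter: a crude stand-in for a real audio filter."""
    half = k // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

n = 200
clean = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]          # low-frequency content
attack = [0.3 * math.sin(2 * math.pi * 60 * t / n) for t in range(n)]  # high-frequency perturbation
noisy = [c + a for c, a in zip(clean, attack)]
filtered = low_pass(noisy)

def rms_err(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

The filtered signal is much closer to the clean one than the attacked signal is, which is the intuition behind the low-pass defence.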
An Explainable Emotion Alignment Framework for LLM-Empowered Agent in Metaverse Service Ecosystem
Metaverse service is a product of the convergence between Metaverse and service systems, designed to address service-related challenges concerning digital avatars, digital twins, and digital natives within Metaverse. With the rise of large language models (LLMs), agents now play a pivotal role in Metaverse service ecosystem, serving dual functions: as digital avatars representing users in the virtual realm and as service assistants (or NPCs) providing personalized support. However, during the modeling of Metaverse service ecosystems, existing LLM-based agents face significant challenges in bridging virtual-world services with real-world services, particularly regarding issues such as character data fusion, character knowledge association, and ethical safety concerns. This paper proposes an explainable emotion alignment framework for LLM-based agents in Metaverse Service Ecosystem. It aims to integrate factual factors into the decision-making loop of LLM-based agents, systematically demonstrating how to achieve more relational fact alignment for these agents. Finally, a simulation experiment in the Offline-to-Offline food delivery scenario is conducted to evaluate the effectiveness of this framework, obtaining more realistic social emergence.
Updated: 2025-07-30 02:00:26
标题: 元宇宙服务生态系统中LLM赋能代理的可解释情感对齐框架
摘要: 元宇宙服务是元宇宙与服务系统融合的产物,旨在解决元宇宙中有关数字化头像、数字孪生体和数字原住民的服务相关挑战。随着大型语言模型(LLMs)的崛起,代理现在在元宇宙服务生态系统中发挥关键作用,承担双重功能:作为在虚拟领域中代表用户的数字化头像,以及作为提供个性化支持的服务助手(或NPC)。然而,在建模元宇宙服务生态系统时,现有基于LLM的代理在连接虚拟世界服务与现实世界服务方面面临重大挑战,特别是角色数据融合、角色知识关联和伦理安全等问题。本文提出了一个用于元宇宙服务生态系统中基于LLM的代理的可解释情感对齐框架,旨在将事实因素整合到基于LLM的代理的决策循环中,系统地展示如何为这些代理实现更具关系性的事实对齐。最后,在线下到线下(Offline-to-Offline)食品配送场景中进行了模拟实验,以评估该框架的有效性,获得了更加逼真的社会涌现。
更新时间: 2025-07-30 02:00:26
领域: cs.AI
From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications
Maintaining software packages imposes significant costs due to dependency management, bug fixes, and versioning. We show that rich method descriptions in scientific publications can serve as standalone specifications for modern large language models (LLMs), enabling on-demand code generation that could supplant human-maintained libraries. We benchmark state-of-the-art models (GPT-o4-mini-high, Gemini Pro 2.5, Claude Sonnet 4) by tasking them with implementing a diverse set of core algorithms drawn from original publications. Our results demonstrate that current LLMs can reliably reproduce package functionality with performance indistinguishable from conventional libraries. These findings foreshadow a paradigm shift toward flexible, on-demand code generation and away from static, human-maintained packages, which will result in reduced maintenance overhead by leveraging published articles as sufficient context for the automated implementation of analytical workflows.
Updated: 2025-07-30 01:52:01
标题: 从文章到代码:按需生成科学出版物中的核心算法
摘要: 维护软件包会带来重大成本,包括依赖管理、bug修复和版本管理。我们展示了科学出版物中丰富的方法描述可以作为现代大型语言模型(LLMs)的独立规范,实现按需代码生成,从而取代人工维护的库。我们通过让最先进的模型(GPT-o4-mini-high、Gemini Pro 2.5、Claude Sonnet 4)实现从原始出版物中提取的多样化核心算法来对其进行基准测试。我们的结果表明,当前的LLMs可以可靠地复制软件包功能,性能与传统库无法区分。这些发现预示着向灵活的按需代码生成的范式转变,远离静态的人工维护软件包,通过利用已发布文章作为自动实现分析工作流程的足够上下文,降低维护开销。
更新时间: 2025-07-30 01:52:01
领域: cs.SE,cs.AI
Provable Low-Frequency Bias of In-Context Learning of Representations
In-context learning (ICL) enables large language models (LLMs) to acquire new behaviors from the input sequence alone, without any parameter updates. Recent studies have shown that ICL can surpass the original meaning learned in the pretraining stage by internalizing the structure of the prompt's data-generating process (DGP) into the hidden representations. However, the mechanisms by which LLMs achieve this ability remain open. In this paper, we present the first rigorous explanation of such phenomena by introducing a unified framework of double convergence, where hidden representations converge both over context and across layers. This double convergence process leads to an implicit bias towards smooth (low-frequency) representations, which we prove analytically and verify empirically. Our theory explains several open empirical observations, including why learned representations exhibit globally structured but locally distorted geometry, and why their total energy decays without vanishing. Moreover, our theory predicts that ICL has an intrinsic robustness towards high-frequency noise, which we empirically confirm. These results provide new insights into the underlying mechanisms of ICL, and a theoretical foundation for studying it that hopefully extends to more general data distributions and settings.
Updated: 2025-07-30 01:51:41
标题: 可证明的上下文学习表示的低频偏差
摘要: 在上下文学习(ICL)中,大型语言模型(LLMs)能够仅通过输入序列而无需任何参数更新就能获得新行为。最近的研究表明,ICL通过将prompt的数据生成过程(DGP)的结构内部化到隐藏表示中,可以超越在预训练阶段学习的原始含义。然而,LLMs实现这种能力的机制尚未被揭示。在本文中,我们通过引入双重收敛的统一框架,首次对这种现象进行了严格解释,其中隐藏表示在上下文和层之间都收敛。这种双重收敛过程导致对平滑(低频)表示的隐性偏见,我们通过理论证明和经验证实了这一点。我们的理论解释了几个开放的经验观察,包括为什么学习到的表示展现出全局结构化但局部扭曲的几何形状,以及为什么它们的总能量在不消失的情况下衰减。此外,我们的理论预测ICL对高频噪声具有固有的鲁棒性,我们通过实验证实了这一点。这些结果为我们提供了对ICL潜在机制的新见解,以及研究它的理论基础,希望能够扩展到更一般的数据分布和设置。
更新时间: 2025-07-30 01:51:41
领域: cs.LG
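The double-convergence mechanism itself is beyond a digest entry, but the predicted low-frequency bias can be mimicked with a toy smoothing operator on a ring (the operator and sizes are our own assumptions): repeated averaging preserves low-frequency modes while high-frequency modes decay geometrically:

```python
import math

n = 64

def smooth(v):
    """One neighbor-averaging step on a ring — a toy stand-in for the paper's
    double-convergence dynamics (averaging over context and across layers)."""
    return [(v[i - 1] + v[i] + v[(i + 1) % len(v)]) / 3 for i in range(len(v))]

def mode(freq):
    """A cosine Fourier mode on the ring (an eigenvector of the averaging operator)."""
    return [math.cos(2 * math.pi * freq * i / n) for i in range(n)]

def energy(v):
    return sum(x * x for x in v)

low, high = mode(1), mode(16)
for _ in range(10):
    low, high = smooth(low), smooth(high)

low_ratio = energy(low) / energy(mode(1))     # low frequency: barely damped
high_ratio = energy(high) / energy(mode(16))  # high frequency: crushed
```

The eigenvalue of the averaging operator at frequency f is (1 + 2 cos(2πf/n))/3, so mode 16 on a ring of 64 decays by a factor of 1/3 per step while mode 1 is nearly preserved — the same qualitative spectral bias the theory attributes to ICL representations.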
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges
Natural Language Processing (NLP) is revolutionising the way both professionals and laypersons operate in the legal field. The considerable potential for NLP in the legal sector, especially in developing computational assistance tools for various legal processes, has captured the interest of researchers for years. This survey follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework, reviewing 154 studies, with a final selection of 131 after manual filtering. It explores foundational concepts related to NLP in the legal domain, illustrating the unique aspects and challenges of processing legal texts, such as extensive document lengths, complex language, and limited open legal datasets. We provide an overview of NLP tasks specific to legal text, such as Document Summarisation, Named Entity Recognition, Question Answering, Argument Mining, Text Classification, and Judgement Prediction. Furthermore, we analyse both developed legal-oriented language models, and approaches for adapting general-purpose language models to the legal domain. Additionally, we identify sixteen open research challenges, including the detection and mitigation of bias in artificial intelligence applications, the need for more robust and interpretable models, and improving explainability to handle the complexities of legal language and reasoning.
Updated: 2025-07-30 01:39:42
标题: 法律领域的自然语言处理:任务、数据集、模型和挑战的调查
摘要: 自然语言处理(NLP)正在彻底改变专业人士和普通人在法律领域的工作方式。NLP在法律领域的巨大潜力,特别是在为各种法律流程开发计算辅助工具方面,多年来一直吸引着研究人员的兴趣。本调查遵循系统评价和荟萃分析首选报告项目(PRISMA)框架,审查了154项研究,经过手动筛选后最终选择了131项。它探讨了与法律领域NLP相关的基本概念,说明了处理法律文本的独特方面和挑战,如文档篇幅长、语言复杂以及开放法律数据集有限等。我们概述了特定于法律文本的NLP任务,如文档摘要、命名实体识别、问答、论证挖掘、文本分类和判决预测。此外,我们分析了已开发的面向法律的语言模型,以及将通用语言模型适配到法律领域的方法。此外,我们确定了十六个开放研究挑战,包括检测和减轻人工智能应用中的偏见、对更稳健和可解释模型的需求,以及改进可解释性以处理法律语言和推理的复杂性。
更新时间: 2025-07-30 01:39:42
领域: cs.CL,cs.AI,A.1; I.2.7; J.1
Learning from Heterogeneous Structural MRI via Collaborative Domain Adaptation for Late-Life Depression Assessment
Accurate identification of late-life depression (LLD) using structural brain MRI is essential for monitoring disease progression and facilitating timely intervention. However, existing learning-based approaches for LLD detection are often constrained by limited sample sizes (e.g., tens), which poses significant challenges for reliable model training and generalization. Although incorporating auxiliary datasets can expand the training set, substantial domain heterogeneity, such as differences in imaging protocols, scanner hardware, and population demographics, often undermines cross-domain transferability. To address this issue, we propose a Collaborative Domain Adaptation (CDA) framework for LLD detection using T1-weighted MRIs. The CDA leverages a Vision Transformer (ViT) to capture global anatomical context and a Convolutional Neural Network (CNN) to extract local structural features, with each branch comprising an encoder and a classifier. The CDA framework consists of three stages: (a) supervised training on labeled source data, (b) self-supervised target feature adaptation and (c) collaborative training on unlabeled target data. We first train ViT and CNN on source data, followed by self-supervised target feature adaptation by minimizing the discrepancy between classifier outputs from two branches to make the categorical boundary clearer. The collaborative training stage employs pseudo-labeled and augmented target-domain MRIs, enforcing prediction consistency under strong and weak augmentation to enhance domain robustness and generalization. Extensive experiments conducted on multi-site T1-weighted MRI data demonstrate that the CDA consistently outperforms state-of-the-art unsupervised domain adaptation methods.
Updated: 2025-07-30 01:38:32
标题: 通过协作领域适应从异质结构MRI中学习以评估晚发性抑郁症
摘要: 使用结构性脑MRI准确识别晚发性抑郁症(LLD)对于监测疾病进展和促进及时干预至关重要。然而,现有的基于学习的LLD检测方法通常受限于样本规模有限(例如,几十个),这给可靠的模型训练和泛化带来了重大挑战。尽管整合辅助数据集可以扩大训练集,但巨大的领域异质性,如成像协议、扫描仪硬件和人口统计学的差异,经常削弱跨领域的可迁移性。为解决这一问题,我们提出了一种使用T1加权MRI进行LLD检测的协作领域适应(CDA)框架。CDA利用Vision Transformer(ViT)捕获全局解剖上下文,并利用卷积神经网络(CNN)提取局部结构特征,每个分支均包括一个编码器和一个分类器。CDA框架包括三个阶段:(a)在有标注的源数据上进行监督训练,(b)自监督的目标特征适应,以及(c)在无标注的目标数据上进行协作训练。我们首先在源数据上训练ViT和CNN,然后通过最小化两个分支分类器输出之间的差异来进行自监督目标特征适应,使类别边界更清晰。协作训练阶段利用伪标注和增强后的目标域MRI,在强增强和弱增强下强制预测一致性,以增强领域鲁棒性和泛化能力。在多站点T1加权MRI数据上进行的大量实验表明,CDA始终优于最先进的无监督领域适应方法。
更新时间: 2025-07-30 01:38:32
领域: cs.CV,cs.AI
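The collaborative stage's pseudo-labeling rule (prediction consistency under weak and strong augmentation) resembles FixMatch-style filtering; a minimal sketch of just the target-selection logic, with a made-up confidence threshold:

```python
def consistency_targets(weak_probs, threshold=0.9):
    """Keep a pseudo-label only when the weakly-augmented view is confident enough;
    the strongly-augmented view is then trained to match it (sketch, not CDA itself)."""
    targets = []
    for probs in weak_probs:
        conf = max(probs)
        label = probs.index(conf)
        targets.append(label if conf >= threshold else None)  # None = excluded
    return targets

# Class probabilities predicted on weakly-augmented target-domain scans (toy values).
weak_probs = [[0.95, 0.05], [0.55, 0.45], [0.08, 0.92]]
targets = consistency_targets(weak_probs)
```

Only confident weak-view predictions become training targets for the strong view, which is what makes the consistency objective robust to noisy pseudo-labels.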
Floating-Point Neural Networks Are Provably Robust Universal Approximators
The classical universal approximation (UA) theorem for neural networks establishes mild conditions under which a feedforward neural network can approximate a continuous function $f$ with arbitrary accuracy. A recent result shows that neural networks also enjoy a more general interval universal approximation (IUA) theorem, in the sense that the abstract interpretation semantics of the network using the interval domain can approximate the direct image map of $f$ (i.e., the result of applying $f$ to a set of inputs) with arbitrary accuracy. These theorems, however, rest on the unrealistic assumption that the neural network computes over infinitely precise real numbers, whereas their software implementations in practice compute over finite-precision floating-point numbers. An open question is whether the IUA theorem still holds in the floating-point setting. This paper introduces the first IUA theorem for floating-point neural networks that proves their remarkable ability to perfectly capture the direct image map of any rounded target function $f$, showing no limits exist on their expressiveness. Our IUA theorem in the floating-point setting exhibits material differences from the real-valued setting, which reflects the fundamental distinctions between these two computational models. This theorem also implies surprising corollaries, which include (i) the existence of provably robust floating-point neural networks; and (ii) the computational completeness of the class of straight-line programs that use only floating-point additions and multiplications for the class of all floating-point programs that halt.
Updated: 2025-07-30 01:31:24
标题: 浮点神经网络被证明是稳健的通用逼近器
摘要: 神经网络的经典通用逼近(UA)定理建立了温和的条件,在这些条件下,前馈神经网络可以以任意精度逼近连续函数$f$。最近的一个结果表明,神经网络还享有更一般的区间通用逼近(IUA)定理,即使用区间域的网络的抽象解释语义可以以任意精度逼近$f$的直接像映射(即将$f$应用于一组输入的结果)。然而,这些定理都基于一个不现实的假设,即神经网络在无限精确的实数上进行计算,而实际的软件实现是在有限精度浮点数上进行计算的。一个未解决的问题是IUA定理在浮点设置下是否仍然成立。 本文介绍了第一个适用于浮点神经网络的IUA定理,证明了它们能够完美捕捉任何舍入目标函数$f$的直接像映射,表明其表达能力没有限制。我们在浮点设置下的IUA定理与实值设置存在实质性差异,反映了这两种计算模型之间的基本区别。该定理还蕴含一些令人惊讶的推论,包括(i)存在可证明鲁棒的浮点神经网络;以及(ii)仅使用浮点加法和乘法的直线程序类,相对于所有停机的浮点程序类而言,是计算完备的。
更新时间: 2025-07-30 01:31:24
领域: cs.LG,cs.LO,cs.PL
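The gap between real-valued and floating-point interval semantics comes down to rounding; a standard way to keep interval arithmetic sound under floating point (not the paper's construction, just the background technique) is to round results outward by one ulp, here via `math.nextafter` (available since Python 3.9):

```python
import math

def interval_add(a, b):
    """Floating-point interval addition with outward rounding: the true real sum
    of any x in a and y in b is guaranteed to lie inside the returned interval."""
    lo = a[0] + b[0]
    hi = a[1] + b[1]
    # Round-to-nearest may shrink the interval; widen one ulp outward to stay sound.
    return (math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

x = (0.1, 0.2)
y = (0.3, 0.4)
lo, hi = interval_add(x, y)
```

The widened endpoints strictly enclose the rounded sums, which is the soundness invariant that interval abstract interpretation of a network relies on.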
Scientific Machine Learning with Kolmogorov-Arnold Networks
The field of scientific machine learning, which originally utilized multilayer perceptrons (MLPs), is increasingly adopting Kolmogorov-Arnold Networks (KANs) for data encoding. This shift is driven by the limitations of MLPs, including poor interpretability, fixed activation functions, and difficulty capturing localized or high-frequency features. KANs address these issues with enhanced interpretability and flexibility, enabling more efficient modeling of complex nonlinear interactions and effectively overcoming the constraints associated with conventional MLP architectures. This review categorizes recent progress in KAN-based models across three distinct perspectives: (i) data-driven learning, (ii) physics-informed modeling, and (iii) deep operator learning. Each perspective is examined through the lens of architectural design, training strategies, application efficacy, and comparative evaluation against MLP-based counterparts. By benchmarking KANs against MLPs, we highlight consistent improvements in accuracy, convergence, and spectral representation, clarifying KANs' advantages in capturing complex dynamics while learning more effectively. Finally, this review identifies critical challenges and open research questions in KAN development, particularly regarding computational efficiency, theoretical guarantees, hyperparameter tuning, and algorithm complexity. We also outline future research directions aimed at improving the robustness, scalability, and physical consistency of KAN-based frameworks.
Updated: 2025-07-30 01:26:44
标题: 科尔莫戈洛夫-阿诺德网络在科学机器学习中的应用
摘要: 科学机器学习领域最初使用多层感知器(MLPs),如今越来越多地采用科尔莫哥洛夫-阿诺德网络(KANs)进行数据编码。这种转变源于MLPs的局限性,包括可解释性差、激活函数固定,以及难以捕捉局部或高频特征。KANs以增强的可解释性和灵活性解决了这些问题,使得能够更高效地对复杂非线性相互作用建模,并有效克服了传统MLP架构的相关约束。这篇综述从三个不同视角对基于KAN的模型的最新进展进行分类:(i)数据驱动学习,(ii)物理信息建模,以及(iii)深度算子学习。每个视角均从架构设计、训练策略、应用效果以及与基于MLP的对应方法的比较评估等方面进行考察。通过将KANs与MLPs进行基准对比,我们突出了其在精度、收敛性和频谱表示方面的一致改进,阐明了KANs在更有效学习的同时捕捉复杂动态的优势。最后,这篇综述指出了KAN发展中的关键挑战和开放研究问题,特别是关于计算效率、理论保证、超参数调整和算法复杂性。我们还概述了旨在提高基于KAN框架的稳健性、可扩展性和物理一致性的未来研究方向。
更新时间: 2025-07-30 01:26:44
领域: cs.LG,cs.CE,math-ph,math.MP
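KANs place learnable univariate functions on edges and plain sums at nodes; the schematic below uses piecewise-linear stand-ins for the usual spline edge functions (our simplification, not a real KAN implementation):

```python
def pwl(grid, vals, x):
    """Piecewise-linear univariate function on a fixed grid — a stand-in for the
    learnable spline edge functions used in KANs."""
    if x <= grid[0]:
        return vals[0]
    if x >= grid[-1]:
        return vals[-1]
    for g0, g1, v0, v1 in zip(grid, grid[1:], vals, vals[1:]):
        if g0 <= x <= g1:
            t = (x - g0) / (g1 - g0)
            return (1 - t) * v0 + t * v1

def kan_layer(edge_funcs, x):
    """One KAN layer: output_j = sum_i phi_{j,i}(x_i) — no inner linear weights,
    all expressiveness lives in the univariate edge functions."""
    return [sum(f(xi) for f, xi in zip(row, x)) for row in edge_funcs]

grid = [-1.0, 0.0, 1.0]
square = lambda x: pwl(grid, [1.0, 0.0, 1.0], x)  # crude |x|-shaped approx of x^2
ident = lambda x: pwl(grid, [-1.0, 0.0, 1.0], x)  # identity on [-1, 1]
out = kan_layer([[square, ident]], [0.5, -1.0])
```

Interpretability follows from this structure: each `phi` can be plotted as a 1-D curve, unlike the entangled weights of an MLP.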
AdapSCA-PSO: An Adaptive Localization Algorithm with AI-Based Hybrid SCA-PSO for IoT WSNs
The accurate localization of sensor nodes is a fundamental requirement for the practical application of the Internet of Things (IoT). To enable robust localization across diverse environments, this paper proposes a hybrid meta-heuristic localization algorithm. Specifically, the algorithm integrates the Sine Cosine Algorithm (SCA), which is effective in global search, with Particle Swarm Optimization (PSO), which excels at local search. An adaptive switching module is introduced to dynamically select between the two algorithms. Furthermore, the initialization, fitness evaluation, and parameter settings of the algorithm have been specifically redesigned and optimized to address the characteristics of the node localization problem. Simulation results across varying numbers of sensor nodes demonstrate that, compared to standalone PSO and the unoptimized SCAPSO algorithm, the proposed method significantly reduces the number of required iterations and achieves an average localization error reduction of 84.97%.
Updated: 2025-07-30 01:18:54
标题: AdapSCA-PSO:一种基于人工智能混合SCA-PSO的自适应定位算法,用于物联网无线传感器网络
摘要: 传感器节点的准确定位是物联网(IoT)实际应用的基本要求。为了实现在不同环境下的稳健定位,本文提出了一种混合元启发式定位算法。具体而言,该算法将在全局搜索方面效果显著的正弦余弦算法(SCA)与在局部搜索方面表现出色的粒子群优化(PSO)相结合。引入了一个自适应切换模块,动态选择这两种算法之间的应用。此外,算法的初始化、适应度评估和参数设置已经被重新设计和优化,以应对节点定位问题的特点。通过对不同数量的传感器节点进行模拟实验,结果表明,与独立的PSO和未优化的SCAPSO算法相比,提出的方法显著减少了所需迭代次数,并实现了平均定位误差降低84.97%。
更新时间: 2025-07-30 01:18:54
领域: cs.NI,cs.AI
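The abstract does not specify the switching rule in detail; the sketch below is our own toy rendition (stagnation-triggered switching between SCA-style and PSO-style updates on a 2-D sphere function), not the paper's tuned localization algorithm:

```python
import math
import random

random.seed(1)

def sphere(x):
    """Toy objective standing in for the node-localization error."""
    return sum(v * v for v in x)

def optimize(dim=2, pop=20, iters=200, stall_limit=10):
    X = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop)]
    V = [[0.0] * dim for _ in range(pop)]
    pbest = [x[:] for x in X]
    gbest = min(X, key=sphere)[:]
    stall, mode = 0, "SCA"
    for t in range(iters):
        r1 = 2 * (1 - t / iters)  # SCA amplitude decays over time
        improved = False
        for i in range(pop):
            for d in range(dim):
                if mode == "SCA":  # global exploration around the best position
                    r2 = random.uniform(0, 2 * math.pi)
                    r3 = random.uniform(0, 2)
                    X[i][d] += r1 * math.sin(r2) * abs(r3 * gbest[d] - X[i][d])
                else:              # PSO: local refinement with velocity memory
                    V[i][d] = (0.7 * V[i][d]
                               + 1.5 * random.random() * (pbest[i][d] - X[i][d])
                               + 1.5 * random.random() * (gbest[d] - X[i][d]))
                    X[i][d] += V[i][d]
            if sphere(X[i]) < sphere(pbest[i]):
                pbest[i] = X[i][:]
            if sphere(X[i]) < sphere(gbest):
                gbest, improved = X[i][:], True
        stall = 0 if improved else stall + 1
        if stall >= stall_limit:  # adaptive switch when the global best stagnates
            mode, stall = ("PSO" if mode == "SCA" else "SCA"), 0
    return gbest

best = optimize()
```

The hybrid pairs SCA's wide exploration with PSO's fast local convergence, which is the trade-off the paper's adaptive switching module is designed to exploit.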
The challenge of hidden gifts in multi-agent reinforcement learning
Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These "hidden gifts" represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. As well, if all the agents unlock their door the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus the act of dropping the key for others is a "hidden gift". We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of "hidden gifts", and demonstrate that learning awareness in independent agents can benefit these settings.
Updated: 2025-07-30 01:18:05
标题: 多智体强化学习中隐藏奖励的挑战
摘要: 有时候,我们会受益于他人所采取的行动,即使我们并不知道他们采取了这些行动。例如,如果你不在家时,邻居选择不占据你家前面的停车位,你也会受益,即使你并不知道他们采取了这个行动。这些“隐藏的礼物”对多智能体强化学习(MARL)构成了一个有趣的挑战,因为当他人的有益行动被隐藏时,进行信用分配并非易事。在这里,我们通过一个非常简单的MARL任务研究了隐藏礼物的影响。在这个任务中,网格世界环境中的智能体需要解锁各自的门以获得个体奖励。此外,如果所有智能体都解锁了自己的门,群体将获得更大的集体奖励。然而,所有门只有一把钥匙,因此只有当智能体在使用钥匙后将其放下供他人使用时,才能获得集体奖励。值得注意的是,没有任何信息向某个智能体表明其他智能体已经放下了钥匙,因此为他人放下钥匙是一份“隐藏的礼物”。我们发现,包括MARL算法在内的几种不同的最新RL算法,都无法在这个简单任务中学会如何获得集体奖励。有趣的是,我们发现,当我们向独立的无模型策略梯度智能体提供其自身行动历史的信息时,它们可以解决该任务,但MARL智能体即使有行动历史仍无法解决该任务。最后,我们受学习感知方法的启发,为这些独立智能体推导出一个修正项,它减少了学习中的方差,并帮助它们更可靠地收敛到集体成功。这些结果表明,在存在“隐藏的礼物”的情况下,多智能体环境中的信用分配可能特别具有挑战性,并证明独立智能体中的学习感知可以使这些环境受益。
更新时间: 2025-07-30 01:18:05
领域: cs.LG,cs.AI,cs.MA
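A minimal rendition of the task's reward structure (our own simplification: a shared key, per-agent doors, and a collective bonus, with the grid removed) shows why dropping the key is a "hidden gift" — it changes the other agent's achievable return without appearing anywhere in that agent's observations:

```python
def run_episode(actions_a, actions_b):
    """One shared key, one door per agent. The collective bonus is only reachable
    if the first key-holder drops the key after using it (sketch of the paper's task)."""
    key_holder, unlocked = "a", set()
    for act_a, act_b in zip(actions_a, actions_b):
        for agent, act in (("a", act_a), ("b", act_b)):
            if act == "unlock" and key_holder == agent:
                unlocked.add(agent)
            elif act == "drop" and key_holder == agent:
                key_holder = None
            elif act == "pickup" and key_holder is None:
                key_holder = agent
    individual = len(unlocked)
    collective = 10 if unlocked == {"a", "b"} else 0
    return individual + collective

# Agent b only succeeds because a silently dropped the key; b never observes the drop.
reward_gift = run_episode(["unlock", "drop", "noop", "noop"],
                          ["noop", "noop", "pickup", "unlock"])
reward_no_gift = run_episode(["unlock", "noop", "noop", "noop"],
                             ["noop", "noop", "pickup", "unlock"])
```

The gap between the two returns is entirely attributable to an action agent b cannot see, which is exactly the credit-assignment difficulty the paper studies.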
BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems
Novelty search (NS) refers to a class of exploration algorithms that seek to uncover diverse system behaviors through simulations or experiments. Such diversity is central to many AI-driven discovery and design tasks, including material and drug development, neural architecture search, and reinforcement learning. However, existing NS methods typically rely on evolutionary strategies and other meta-heuristics that require dense sampling of the input space, making them impractical for expensive black-box systems. In this work, we introduce BEACON, a sample-efficient, Bayesian optimization-inspired approach to NS that is tailored for settings where the input-to-behavior relationship is opaque and costly to evaluate. BEACON models this mapping using multi-output Gaussian processes (MOGPs) and selects new inputs by maximizing a novelty metric computed from posterior samples of the MOGP, effectively balancing the exploration-exploitation trade-off. By leveraging recent advances in posterior sampling and high-dimensional GP modeling, our method remains scalable to large input spaces and datasets. We evaluate BEACON across ten synthetic benchmarks and eight real-world tasks, including the design of diverse materials for clean energy applications. Our results show that BEACON significantly outperforms existing NS baselines, consistently discovering a broader set of behaviors under tight evaluation budgets.
Updated: 2025-07-30 01:09:18
标题: BEACON:昂贵黑盒系统中新颖搜索的贝叶斯优化策略
摘要: 新颖性搜索(NS)是一类探索算法,旨在通过模拟或实验揭示多样化的系统行为。这种多样性对许多基于人工智能的发现和设计任务至关重要,包括材料和药物开发、神经结构搜索和强化学习。然而,现有的NS方法通常依赖于进化策略和其他需要对输入空间进行密集采样的元启发式方法,这使得它们对于昂贵的黑盒系统来说是不切实际的。在这项工作中,我们介绍了BEACON,这是一种样本高效、受贝叶斯优化启发的NS方法,专为输入与行为关系不透明且评估成本高昂的情境而设计。BEACON使用多输出高斯过程(MOGP)来建模这种映射,并通过最大化MOGP后验样本计算的新颖性度量来选择新的输入,有效平衡了探索与开发的权衡。通过利用后验采样和高维GP建模的最新进展,我们的方法仍然可扩展到大型输入空间和数据集。我们在十个合成基准测试和八个真实任务中评估了BEACON,包括为清洁能源应用设计多样化材料。我们的结果表明,BEACON明显优于现有的NS基线,在严格的评估预算下持续发现更广泛的行为集。
更新时间: 2025-07-30 01:09:18
领域: stat.ML,cs.LG
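A minimal sketch of the acquisition step described above, under simplifying assumptions: a single-output GP stands in for the MOGP, the kernel is a plain RBF, the behaviour space is 1-D, and novelty is Thompson-style, i.e. one posterior sample of the behaviour at each candidate input, scored by its distance to the nearest archived behaviour:

```python
import numpy as np

def rbf(X1, X2, ls=0.3):
    # squared-exponential kernel between two sets of inputs
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # exact GP posterior mean/covariance at test inputs Xs
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    cov = rbf(Xs, Xs) - Ks.T @ Kinv @ Ks
    return mu, cov

def beacon_step(X, y, candidates, rng):
    """Thompson-style novelty acquisition: draw one posterior sample of the
    behaviour at every candidate input, then pick the candidate whose sampled
    behaviour lies farthest from the archive of observed behaviours y."""
    mu, cov = gp_posterior(X, y, candidates)
    sample = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(mu)))
    novelty = np.abs(sample[:, None] - y[None, :]).min(axis=1)
    return candidates[np.argmax(novelty)]
```

In the paper's setting the GP is multi-output and the behaviour space multi-dimensional; this 1-D version only illustrates the sample-then-maximize-novelty pattern.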
Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training
Large language models (LLMs) have shown impressive performance across a range of natural language processing tasks. However, their vast number of parameters introduces significant memory challenges during training, particularly when using memory-intensive optimizers like Adam. Existing memory-efficient algorithms often rely on techniques such as singular value decomposition projection or weight freezing. While these approaches help alleviate memory constraints, they generally produce suboptimal results compared to full-rank updates. In this paper, we investigate the memory-efficient method beyond low-rank training, proposing a novel solution called Gradient Wavelet Transform (GWT), which applies wavelet transforms to gradients in order to significantly reduce the memory requirements for maintaining optimizer states. We demonstrate that GWT can be seamlessly integrated with memory-intensive optimizers, enabling efficient training without sacrificing performance. Through extensive experiments on both pre-training and fine-tuning tasks, we show that GWT achieves state-of-the-art performance compared with advanced memory-efficient optimizers and full-rank approaches in terms of both memory usage and training performance.
Updated: 2025-07-30 01:07:39
标题: 小波遇见Adam:压缩梯度以实现内存高效训练
摘要: 大型语言模型(LLMs)在各种自然语言处理任务中展现出令人印象深刻的性能。然而,它们庞大的参数数量在训练过程中引入了显著的内存挑战,特别是在使用像Adam这样的内存密集型优化器时。现有的内存高效算法通常依赖于诸如奇异值分解投影或权重冻结等技术。尽管这些方法有助于缓解内存约束,但通常与完整秩更新相比产生次优结果。在本文中,我们研究了低秩训练之外的内存高效方法,提出了一种称为梯度小波变换(GWT)的新颖解决方案,该方法将小波变换应用于梯度,以显著减少维护优化器状态所需的内存。我们证明了GWT可以与内存密集型优化器无缝集成,实现高效训练而不降低性能。通过对预训练和微调任务的广泛实验,我们展示了GWT在内存使用和训练性能方面相比先进的内存高效优化器和完整秩方法取得了最先进的性能。
更新时间: 2025-07-30 01:07:39
领域: cs.LG,cs.AI
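A hedged sketch of the idea: compress each gradient with a one-level Haar transform, keep Adam's moment buffers only for the half-size approximation coefficients, and map the update back with the inverse transform. The single level, the dropping of detail coefficients, and all hyperparameters are illustrative simplifications, not the paper's exact GWT:

```python
import numpy as np

def haar_fwd(g):
    # one level of the Haar transform: approximation and detail coefficients
    a = (g[0::2] + g[1::2]) / np.sqrt(2)
    d = (g[0::2] - g[1::2]) / np.sqrt(2)
    return a, d

def haar_inv(a, d):
    # inverse one-level Haar transform
    g = np.empty(2 * len(a))
    g[0::2] = (a + d) / np.sqrt(2)
    g[1::2] = (a - d) / np.sqrt(2)
    return g

class WaveletAdam:
    """Adam whose moment buffers live in the half-size Haar approximation
    space; detail coefficients are dropped, halving optimizer-state memory."""
    def __init__(self, n, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
        self.m = np.zeros(n // 2)
        self.v = np.zeros(n // 2)
        self.t, self.lr, self.b1, self.b2, self.eps = 0, lr, b1, b2, eps

    def step(self, w, g):
        a, _ = haar_fwd(g)                      # compress the gradient
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * a
        self.v = self.b2 * self.v + (1 - self.b2) * a**2
        mh = self.m / (1 - self.b1**self.t)     # bias correction
        vh = self.v / (1 - self.b2**self.t)
        upd = haar_inv(mh / (np.sqrt(vh) + self.eps), np.zeros_like(mh))
        return w - self.lr * upd                # decompress the update
```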
Decoding Neural Signatures of Semantic Evaluations in Depression and Suicidality
Depression and suicidality profoundly impact cognition and emotion, yet objective neurophysiological biomarkers remain elusive. We investigated the spatiotemporal neural dynamics underlying affective semantic processing in individuals with varying levels of clinical severity of depression and suicidality using multivariate decoding of electroencephalography (EEG) data. Participants (N=137) completed a sentence evaluation task involving emotionally charged self-referential statements while EEG was recorded. We identified robust, neural signatures of semantic processing, with peak decoding accuracy between 300-600 ms -- a window associated with automatic semantic evaluation and conflict monitoring. Compared to healthy controls, individuals with depression and suicidality showed earlier onset, longer duration, and greater amplitude decoding responses, along with broader cross-temporal generalization and increased activation of frontocentral and parietotemporal components. These findings suggest altered sensitivity and impaired disengagement from emotionally salient content in the clinical groups, advancing our understanding of the neurocognitive basis of mental health and providing a principled basis for developing reliable EEG-based biomarkers of depression and suicidality.
Updated: 2025-07-30 00:58:51
标题: 解码抑郁和自杀倾向中语义评价的神经标志
摘要: 抑郁和自杀倾向深刻影响认知和情绪,然而客观的神经生理生物标志仍然难以捉摸。我们通过对脑电图(EEG)数据进行多变量解码,研究了不同临床严重程度的抑郁和自杀倾向个体情感语义加工的时空神经动态。参与者(N=137)在记录EEG的同时完成了一个涉及带有强烈情绪色彩的自我参照陈述的句子评估任务。我们发现了语义加工的稳健神经标志,解码准确度在300-600毫秒之间达到峰值——这一时窗与自动语义评估和冲突监控相关。与健康对照组相比,抑郁和自杀倾向个体显示出更早的起始、更长的持续时间和更大幅度的解码响应,以及更广泛的跨时间泛化和额中央与顶颞成分的激活增加。这些发现表明,临床群体对情绪显著内容的敏感性发生了改变,且难以从这些内容中脱离,这推进了我们对精神健康的神经认知基础的理解,并为开发可靠的基于EEG的抑郁和自杀倾向生物标志提供了有原则的依据。
更新时间: 2025-07-30 00:58:51
领域: q-bio.NC,cs.LG,eess.SP
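The per-timepoint multivariate decoding described above can be sketched as follows, with a nearest class-centroid rule standing in for whatever classifier the study actually used; the data layout (trials, channels, times) and single train/test split are illustrative assumptions:

```python
import numpy as np

def sliding_decoding(X, y, rng):
    """Per-timepoint decoding: at each time sample, fit class centroids on a
    training half and score test trials by which centroid is nearer; X has
    shape (trials, channels, times), y holds binary labels."""
    idx = rng.permutation(len(y))
    tr, te = idx[: len(y) // 2], idx[len(y) // 2 :]
    accs = []
    for t in range(X.shape[2]):
        c0 = X[tr][y[tr] == 0, :, t].mean(0)
        c1 = X[tr][y[tr] == 1, :, t].mean(0)
        d0 = np.linalg.norm(X[te, :, t] - c0, axis=1)
        d1 = np.linalg.norm(X[te, :, t] - c1, axis=1)
        accs.append(np.mean((d1 < d0) == (y[te] == 1)))
    return np.array(accs)
```

On synthetic data where the classes separate only in a late window, the accuracy curve peaks there, mirroring the 300-600 ms effect reported above.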
An Asynchronous Decentralised Optimisation Algorithm for Nonconvex Problems
In this paper, we consider nonconvex decentralised optimisation and learning over a network of distributed agents. We develop an ADMM algorithm based on the Randomised Block Coordinate Douglas-Rachford splitting method which enables agents in the network to distributedly and asynchronously compute a set of first-order stationary solutions of the problem. To the best of our knowledge, this is the first decentralised and asynchronous algorithm for solving nonconvex optimisation problems with convergence proof. The numerical examples demonstrate the efficiency of the proposed algorithm for distributed Phase Retrieval and sparse Principal Component Analysis problems.
Updated: 2025-07-30 00:55:17
标题: 一个用于非凸问题的异步分散优化算法
摘要: 在这篇论文中,我们考虑了在分布式代理网络上进行的非凸分散优化和学习。我们基于随机块坐标Douglas-Rachford分裂方法开发了一种ADMM算法,使网络中的代理能够分布式和异步地计算问题的一组一阶稳态解。据我们所知,这是第一个用于解决非凸优化问题的分散和异步算法,并具有收敛证明。数值实例展示了所提算法在分布式相位恢复和稀疏主成分分析问题中的效率。
更新时间: 2025-07-30 00:55:17
领域: math.OC,cs.LG
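A toy illustration of the asynchronous activation pattern, on a convex consensus problem: each agent holds a value a_i, and the network seeks the minimiser of sum_i (x - a_i)^2 / 2, which is the mean. The paper's algorithm is decentralised and handles nonconvex objectives with a Douglas-Rachford-based ADMM; this centrally-coordinated quadratic sketch only shows how one randomly activated agent at a time refreshes its primal and dual blocks:

```python
import numpy as np

def async_admm(a, rho=1.0, iters=4000, seed=1):
    """Asynchronous consensus ADMM sketch: at each tick one randomly chosen
    agent updates its local variable x[i] (a prox step on its own objective)
    and its scaled dual u[i] against the coordination variable z."""
    rng = np.random.default_rng(seed)
    n = len(a)
    x = np.array(a, dtype=float)
    u = np.zeros(n)          # scaled dual variables, one per agent
    z = x.mean()             # coordination variable
    for _ in range(iters):
        i = rng.integers(n)                            # random activation
        x[i] = (a[i] + rho * (z - u[i])) / (1 + rho)   # local prox step
        z = (x + u).mean()
        u[i] += x[i] - z                               # dual update, agent i
    return x, z
```

At the fixed point every x[i] equals z = mean(a), the unique first-order stationary (here: optimal) solution of this toy problem.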
Cycles Protocol: A Peer-to-Peer Electronic Clearing System
For centuries, financial institutions have responded to liquidity challenges by forming closed, centralized clearing clubs with strict rules and membership that allow them to collaborate on using the least money to discharge the most debt. As closed clubs, much of the general public has been excluded from participation. But the vast majority of private sector actors consists of micro or small firms that are vulnerable to late payments and generally ineligible for bank loans. This low liquidity environment often results in gridlock and leads to insolvency, and it disproportionately impacts small enterprises and communities. On the other hand, blockchain communities have developed open, decentralized settlement systems, along with a proliferation of store of value assets and new lending protocols, allowing anyone to permissionlessly transact and access credit. However, these protocols remain used primarily for speculative purposes, and so far have fallen short of the large-scale positive impact on the real economy prophesied by their promoters. We address these challenges by introducing Cycles, an open, decentralized clearing, settlement, and issuance protocol. Cycles is designed to enable firms to overcome payment inefficiencies, to reduce their working capital costs, and to leverage diverse assets and liquidity sources, including cryptocurrencies, stablecoins, and lending protocols, in service of clearing more debt with less money. Cycles solves real world liquidity challenges through a privacy-preserving multilateral settlement platform based on a graph optimization algorithm. The design is based on a core insight: liquidity resides within cycles in the payment network's structure and can be accessed via settlement flows optimized to reduce debt.
Updated: 2025-07-30 00:48:50
标题: Cycles协议:一种点对点电子清算系统
摘要: 几个世纪以来,金融机构一直通过形成封闭、集中的清算俱乐部来应对流动性挑战,制定严格的规则和会员资格,使它们能够协作使用最少的资金来偿还最多的债务。作为封闭俱乐部,许多普通公众被排除在外。但私营部门的绝大多数参与者是微型或小型企业,容易受到逾期付款的影响,通常不符合银行贷款的资格。这种低流动性环境经常导致僵局,进而引发破产,并且不成比例地影响小企业和社区。 另一方面,区块链社区已经发展出开放、去中心化的结算系统,以及大量的价值储存资产和新的借贷协议,使任何人都可以无需许可地进行交易并获取信贷。然而,这些协议目前主要被用于投机目的,迄今为止尚未实现其推动者所预言的对实体经济的大规模积极影响。 我们通过引入Cycles来解决这些挑战,这是一个开放、去中心化的清算、结算和发行协议。Cycles旨在使企业能够克服支付效率低下的问题,减少其营运资本成本,并利用各种资产和流动性来源,包括加密货币、稳定币和借贷协议,以更少的资金清偿更多的债务。Cycles通过基于图优化算法的隐私保护多边结算平台解决了现实世界的流动性挑战。设计基于一个核心洞察:流动性存在于支付网络结构中的循环中,并且可以通过为减少债务而优化的结算流来加以利用。
更新时间: 2025-07-30 00:48:50
领域: cs.CE,cs.CR,econ.TH
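The core insight above, that liquidity resides in cycles of the obligation graph, can be illustrated with a minimal netting step: every edge on a cycle is reduced by the cycle's minimum weight, discharging debt without any money changing hands. The dict-of-edges representation is an assumption for illustration, not the protocol's data model:

```python
def settle_cycle(obligations, cycle):
    """Net out one cycle of debts: reduce every edge on the cycle by the
    cycle's minimum edge weight, returning the total debt discharged.
    obligations maps (debtor, creditor) -> amount; cycle lists the agents
    in order, with an implied edge from the last back to the first."""
    n = len(cycle)
    m = min(obligations[(cycle[k], cycle[(k + 1) % n])] for k in range(n))
    cleared = 0.0
    for k in range(n):
        e = (cycle[k], cycle[(k + 1) % n])
        obligations[e] -= m
        cleared += m
        if obligations[e] == 0:
            del obligations[e]  # fully discharged edge
    return cleared
```

For example, with A owing B 10, B owing C 7, and C owing A 5, netting the cycle discharges 15 units of debt with zero payments, leaving only A owing B 5 and B owing C 2.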
SleepWalk: Exploiting Context Switching and Residual Power for Physical Side-Channel Attacks
Context switching is utilized by operating systems to change the execution context between application programs. It involves saving and restoring the states of multiple registers and performing a pipeline flush to remove any pre-fetched instructions, leading to a higher instantaneous power consumption compared to typical program execution. In this paper, we introduce a physical power side-channel leakage source that exploits the power spike observed during a context switch, triggered by the inbuilt sleep function of the system kernel. We observed that this power spike directly correlates with both the power consumption during context switching and the residual power consumption of the previously executed program. Notably, the persistence of residual power signatures from previous workloads extends the scope of this side-channel beyond extracting the data in registers during the context switch. Unlike traditional approaches that require analyzing full power traces, applying complex preprocessing, or relying on external synchronization triggers, this novel technique leverages only the amplitude of a single power spike, significantly simplifying the attack. We developed a power model to illustrate the feasibility of mounting end-to-end side-channel attacks using the sleep-induced power spikes. Experimental evaluation demonstrates that our framework can successfully perform cryptographic key recovery for both AES and SIKE implementations on Broadcom BCM2711.
Updated: 2025-07-30 00:39:27
标题: SleepWalk:利用上下文切换和剩余功率进行物理侧信道攻击
摘要: 上下文切换被操作系统用于在应用程序之间改变执行上下文。它涉及保存和恢复多个寄存器的状态,并执行流水线刷新以删除任何预取指令,导致瞬时功耗比典型程序执行更高。在本文中,我们介绍了一种利用在系统内核的内置睡眠函数触发的上下文切换期间观察到的功率峰值的物理功率侧信道泄漏源。我们观察到,这个功率峰值与上下文切换期间的功耗以及先前执行程序的残余功耗均直接相关。值得注意的是,来自先前工作负载的残余功率特征的持久性扩展了这个侧信道的范围,超出了在上下文切换期间提取寄存器中的数据的范围。与传统方法不同,传统方法需要分析完整的功耗跟踪、应用复杂的预处理,或依赖外部同步触发器,这种新颖技术仅利用单个功率峰值的幅度,大大简化了攻击过程。我们开发了一个功率模型,以说明使用睡眠诱导功率峰值进行端到端侧信道攻击的可行性。实验评估表明,我们的框架可以成功地对Broadcom BCM2711上的AES和SIKE实现进行加密密钥恢复。
更新时间: 2025-07-30 00:39:27
领域: cs.CR
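A sketch of the kind of single-amplitude key recovery the paper describes, reduced to a toy leakage model: assume the spike height is proportional to the Hamming weight of plaintext XOR key (no S-box, no real traces). The key candidate whose predicted leakage correlates best with the measured spike amplitudes wins:

```python
import numpy as np

def hw(x):
    # Hamming weight of a byte
    return bin(int(x)).count("1")

def recover_key_byte(plaintexts, spikes):
    """Correlation attack on sleep-induced spike amplitudes: for each of the
    256 key candidates, predict the Hamming-weight leakage per plaintext and
    keep the candidate with the highest (signed) Pearson correlation."""
    best_k, best_r = None, -np.inf
    for k in range(256):
        pred = np.array([hw(p ^ k) for p in plaintexts], float)
        r = np.corrcoef(pred, spikes)[0, 1]
        if r > best_r:
            best_k, best_r = k, r
    return best_k
```

With a few hundred noisy observations this toy model recovers the key byte reliably; real attacks on AES or SIKE replace the leakage function with the implementation's actual intermediate values.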
Invisible Injections: Exploiting Vision-Language Models Through Steganographic Prompt Embedding
Vision-language models (VLMs) have revolutionized multimodal AI applications but introduce novel security vulnerabilities that remain largely unexplored. We present the first comprehensive study of steganographic prompt injection attacks against VLMs, where malicious instructions are invisibly embedded within images using advanced steganographic techniques. Our approach demonstrates that current VLM architectures can inadvertently extract and execute hidden prompts during normal image processing, leading to covert behavioral manipulation. We develop a multi-domain embedding framework combining spatial, frequency, and neural steganographic methods, achieving an overall attack success rate of 24.3% (plus or minus 3.2%, 95% CI) across leading VLMs including GPT-4V, Claude, and LLaVA, with neural steganography methods reaching up to 31.8%, while maintaining reasonable visual imperceptibility (PSNR greater than 38 dB, SSIM greater than 0.94). Through systematic evaluation on 12 diverse datasets and 8 state-of-the-art models, we reveal moderate but meaningful vulnerabilities in current VLM architectures and propose effective countermeasures. Our findings have significant implications for VLM deployment in security-critical applications and highlight the need for proportionate multimodal AI security frameworks.
Updated: 2025-07-30 00:34:20
标题: 隐形注入:通过隐写式提示嵌入利用视觉-语言模型
摘要: 视觉语言模型(VLMs)已经彻底改变了多模态人工智能应用,但引入了新型安全漏洞,这些漏洞目前大多未被探索。我们提出了对VLMs进行隐写术提示注入攻击的第一次全面研究,其中恶意指令通过先进的隐写术技术隐匿地嵌入在图像中。我们的方法表明,当前的VLM架构在正常图像处理过程中可能无意中提取和执行隐藏的提示,导致隐蔽的行为操纵。我们开发了一个多域嵌入框架,结合空间、频率和神经隐写术方法,实现了在领先的VLMs(包括GPT-4V、Claude和LLaVA)上的整体攻击成功率为24.3%(加减3.2%,95% CI),神经隐写术方法达到31.8%,同时保持合理的视觉不可察觉性(PSNR大于38 dB,SSIM大于0.94)。通过在12个不同数据集和8个最先进模型上的系统评估,我们揭示了当前VLM架构中存在的适度但有意义的漏洞,并提出了有效的对策。我们的研究结果对于VLM在安全关键应用中的部署具有重要意义,并强调了需要相应的多模态人工智能安全框架。
更新时间: 2025-07-30 00:34:20
领域: cs.CR
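Of the embedding domains mentioned above, the spatial one is easiest to illustrate. A minimal LSB scheme hides a textual prompt in an image while changing each pixel by at most 1, which is why the distortion metrics stay high; this is a generic stego sketch of the attack surface, not the paper's multi-domain framework:

```python
import numpy as np

def embed_lsb(img, message):
    """Hide the message's bits (LSB-first per byte) in the least-significant
    bits of the first pixels of a uint8 image; returns a modified copy."""
    bits = [(b >> i) & 1 for b in message.encode() for i in range(8)]
    flat = img.flatten()  # flatten() copies, so img itself is untouched
    assert len(bits) <= flat.size, "image too small for the message"
    flat[: len(bits)] = (flat[: len(bits)] & 0xFE) | np.array(bits, np.uint8)
    return flat.reshape(img.shape)

def extract_lsb(img, n_chars):
    """Recover n_chars bytes from the least-significant bits."""
    bits = img.flatten()[: n_chars * 8] & 1
    data = bytes(int("".join(map(str, bits[i:i + 8][::-1])), 2)
                 for i in range(0, len(bits), 8))
    return data.decode()
```

Because each pixel changes by at most one grey level, the PSNR of the stego image against the cover stays far above the 38 dB threshold reported above.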
High-Resolution Live Fuel Moisture Content (LFMC) Maps for Wildfire Risk from Multimodal Earth Observation Data
Wildfires are increasing in intensity and severity at an alarming rate. Recent advances in AI and publicly available satellite data enable monitoring critical wildfire risk factors globally, at high resolution and low latency. Live Fuel Moisture Content (LFMC) is a critical wildfire risk factor and is valuable for both wildfire research and operational response. However, ground-based LFMC samples are both labor intensive and costly to acquire, resulting in sparse and infrequent updates. In this work, we explore the use of a pretrained, highly-multimodal earth-observation model for generating large-scale spatially complete (wall-to-wall) LFMC maps. Our approach achieves significant improvements over previous methods using randomly initialized models (20% reduction in RMSE). We provide an automated pipeline that enables rapid generation of these LFMC maps across the United States, and demonstrate its effectiveness in two regions recently impacted by wildfire (Eaton and Palisades).
Updated: 2025-07-30 00:31:48
标题: 基于多模态地球观测数据的高分辨率活体燃料水分含量(LFMC)地图用于野火风险评估
摘要: 野火的强度和严重程度正以令人担忧的速度增加。人工智能的最新进展和公开可获取的卫星数据使得全球关键野火风险因素能够以高分辨率和低延迟进行监测。活体燃料水分含量(LFMC)是一个关键的野火风险因素,对野火研究和操作响应都非常有价值。然而,基于地面的LFMC样本获取劳动密集且成本高昂,导致更新稀疏且不经常。在这项工作中,我们探讨了使用预先训练的高度多模态地球观测模型生成大规模空间完整的LFMC地图。我们的方法相比使用随机初始化模型的先前方法取得了显著的改进(均方根误差减少20%)。我们提供了一个自动化流程,可以快速在美国各地生成这些LFMC地图,并展示了其在最近受野火影响的两个地区(Eaton和Palisades)的有效性。
更新时间: 2025-07-30 00:31:48
领域: cs.LG
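The "pretrained model plus sparse ground labels" recipe can be sketched as a linear probe: assume the earth-observation model supplies a frozen embedding per location, and fit a ridge-regression head to the scarce LFMC samples. The embedding matrix E below is synthetic, and the linear head and regulariser are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def fit_linear_probe(E, y, lam=1e-3):
    """Ridge-regression head on frozen embeddings E of shape (n, d), fit to
    sparse LFMC labels y via the closed-form normal equations."""
    d = E.shape[1]
    w = np.linalg.solve(E.T @ E + lam * np.eye(d), E.T @ y)
    return w

def predict(E, w):
    # wall-to-wall prediction: apply the head to embeddings of every pixel
    return E @ w
```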
CS-SHRED: Enhancing SHRED for Robust Recovery of Spatiotemporal Dynamics
We present $\textbf{CS-SHRED}$, a novel deep learning architecture that integrates Compressed Sensing (CS) into a Shallow Recurrent Decoder ($\textbf{SHRED}$) to reconstruct spatiotemporal dynamics from incomplete, compressed, or corrupted data. Our approach introduces two key innovations. First, by incorporating CS techniques into the $\textbf{SHRED}$ architecture, our method leverages a batch-based forward framework with $\ell_1$ regularization to robustly recover signals even in scenarios with sparse sensor placements, noisy measurements, and incomplete sensor acquisitions. Second, an adaptive loss function dynamically combines Mean Squared Error (MSE) and Mean Absolute Error (MAE) terms with a piecewise Signal-to-Noise Ratio (SNR) regularization, which suppresses noise and outliers in low-SNR regions while preserving fine-scale features in high-SNR regions. We validate $\textbf{CS-SHRED}$ on challenging problems including viscoelastic fluid flows, maximum specific humidity fields, sea surface temperature distributions, and rotating turbulent flows. Compared to the traditional $\textbf{SHRED}$ approach, $\textbf{CS-SHRED}$ achieves significantly higher reconstruction fidelity - as demonstrated by improved SSIM and PSNR values, lower normalized errors, and enhanced LPIPS scores-thereby providing superior preservation of small-scale structures and increased robustness against noise and outliers. Our results underscore the advantages of the jointly trained CS and SHRED design architecture which includes an LSTM sequence model for characterizing the temporal evolution with a shallow decoder network (SDN) for modeling the high-dimensional state space. The SNR-guided adaptive loss function for the spatiotemporal data recovery establishes $\textbf{CS-SHRED}$ as a promising tool for a wide range of applications in environmental, climatic, and scientific data analyses.
Updated: 2025-07-30 00:27:18
标题: CS-SHRED:增强SHRED以实现时空动态的稳健恢复
摘要: 我们提出了一种新颖的深度学习架构$\textbf{CS-SHRED}$,将压缩感知(CS)集成到浅层循环解码器($\textbf{SHRED}$)中,用于从不完整、压缩或损坏的数据中重建时空动态。我们的方法引入了两个关键创新。首先,通过将CS技术整合到$\textbf{SHRED}$架构中,我们的方法利用基于批处理的前向框架和$\ell_1$正则化,即使在传感器放置稀疏、测量噪声大、传感器采集不完整的情况下也能强健地恢复信号。其次,自适应损失函数动态地结合了均方误差(MSE)和平均绝对误差(MAE)项,以及分段信噪比(SNR)正则化,可以在低SNR区域抑制噪声和异常值,同时保留高SNR区域的细节特征。 我们在具有挑战性的问题上验证了$\textbf{CS-SHRED}$,包括粘弹性流动、最大比湿场、海表温度分布和旋转湍流流动。与传统的$\textbf{SHRED}$方法相比,$\textbf{CS-SHRED}$实现了显著更高的重建保真度-通过改善SSIM和PSNR值、降低标准化误差和增强LPIPS分数,从而提供了对小尺度结构的优越保留并增强了对噪声和异常值的鲁棒性。 我们的结果强调了联合训练的CS和SHRED设计架构的优势,其中包括用于表征时间演变的LSTM序列模型和用于建模高维状态空间的浅层解码器网络(SDN)。SNR引导的自适应损失函数用于时空数据恢复,将$\textbf{CS-SHRED}$确立为环境、气候和科学数据分析中广泛应用的工具。
更新时间: 2025-07-30 00:27:18
领域: cs.LG,68T07, 35Q35, 94A12,I.2.6; I.5.4; I.6.3; J.2
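The SNR-guided adaptive loss can be sketched directly. The threshold and mixing weights below are illustrative stand-ins for the paper's piecewise regularisation, but they show the intended behaviour: MAE-dominated, outlier-robust penalties in low-SNR regions, and MSE-dominated, detail-preserving penalties in high-SNR regions:

```python
import numpy as np

def adaptive_loss(pred, target, snr_db, thresh=10.0, w_low=0.9):
    """Piecewise SNR-guided loss: where local SNR is below thresh (in dB),
    weight the absolute error heavily to suppress noise and outliers; where
    SNR is high, weight the squared error to preserve fine-scale features.
    thresh and w_low are hypothetical values for illustration."""
    err = pred - target
    mae, mse = np.abs(err), err ** 2
    w = np.where(snr_db < thresh, w_low, 1.0 - w_low)
    return np.mean(w * mae + (1.0 - w) * mse)
```

A single large residual sitting in a low-SNR region is penalised far less than under a pure MSE loss, which is exactly the outlier-suppression behaviour described above.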
Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation
Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data subsets (folds) that do not adequately represent the diversity of the original dataset, which can lead to biased performance estimates. The objective of this work is to deepen the investigation of cluster-based cross-validation strategies by analyzing the performance of different clustering algorithms through experimental comparison. Additionally, a new cross-validation technique that combines Mini Batch K-Means with class stratification is proposed. Experiments were conducted on 20 datasets (both balanced and imbalanced) using four supervised learning algorithms, comparing cross-validation strategies in terms of bias, variance, and computational cost. The technique that uses Mini Batch K-Means with class stratification outperformed others in terms of bias and variance on balanced datasets, though it did not significantly reduce computational cost. On imbalanced datasets, traditional stratified cross-validation consistently performed better, showing lower bias, variance, and computational cost, making it a safe choice for performance evaluation in scenarios with class imbalance. In the comparison of different clustering algorithms, no single algorithm consistently stood out as superior. Overall, this work contributes to improving predictive model evaluation strategies by providing a deeper understanding of the potential of cluster-based data splitting techniques and reaffirming the effectiveness of well-established strategies like stratified cross-validation. Moreover, it highlights perspectives for increasing the robustness and reliability of model evaluations, especially in datasets with clustering characteristics.
Updated: 2025-07-30 00:13:52
标题: 比较基于聚类的交叉验证策略用于机器学习模型评估
摘要: 交叉验证在机器学习中发挥着基础作用,能够对模型性能进行稳健评估,并防止在训练和验证数据上高估。然而,其缺点之一是可能会创建不足以充分代表原始数据集多样性的数据子集(fold),这可能导致性能估计存在偏差。本文旨在通过实验比较分析不同聚类算法的性能,深入研究基于簇的交叉验证策略。此外,提出了一种结合Mini Batch K-Means和类别分层的新交叉验证技术。实验使用四种监督学习算法在20个数据集上进行(包括平衡和不平衡的数据集),并在偏差、方差和计算成本方面对各交叉验证策略进行了比较。在平衡数据集上,使用Mini Batch K-Means与类别分层的技术在偏差和方差方面表现优于其他技术,尽管在计算成本上并未显著降低。在不平衡数据集上,传统分层交叉验证一直表现更好,显示出更低的偏差、方差和计算成本,使其成为在存在类别不平衡情况下性能评估的安全选择。在不同聚类算法的比较中,并没有单一算法表现出持续优势。总体而言,本研究通过提供对基于簇数据分割技术潜力的深入理解,以及对已建立策略如分层交叉验证有效性的确认,有助于改进预测模型评估策略。此外,本研究强调了增加模型评估的稳健性和可靠性的展望,特别是在具有聚类特征的数据集中。
更新时间: 2025-07-30 00:13:52
领域: cs.LG
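The proposed fold construction can be sketched as follows: cluster the inputs, then deal the indices of every (cluster, class) cell round-robin across the folds, so that each fold covers every region of input space and every class. Plain full-batch k-means stands in here for Mini Batch K-Means, and k, the fold count, and the seed are illustrative:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # minimal full-batch k-means (stand-in for Mini Batch K-Means)
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(lab == j):
                C[j] = X[lab == j].mean(0)
    return lab

def cluster_stratified_folds(X, y, k=3, n_folds=5, seed=0):
    """Cluster-based stratified folds: within each (cluster, class) cell,
    assign samples to folds round-robin, so every fold samples every cluster
    and every class whenever the cell is large enough."""
    lab = kmeans(X, k, seed=seed)
    folds = np.empty(len(y), dtype=int)
    for c in np.unique(lab):
        for cls in np.unique(y):
            idx = np.where((lab == c) & (y == cls))[0]
            folds[idx] = np.arange(len(idx)) % n_folds
    return folds
```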
Ontological Foundations of State Sovereignty
This short paper is a primer on the nature of state sovereignty and the importance of claims about it. It also aims to reveal (merely reveal) a strategy for working with vague or contradictory data about which states, in fact, are sovereign. These goals together are intended to set the stage for applied work in ontology about international affairs.
Updated: 2025-07-30 00:02:27
标题: 国家主权的本体论基础
摘要: 这篇简短的论文是关于国家主权之性质以及有关主权的主张为何重要的入门介绍。它还旨在揭示(仅仅是揭示)一种策略,用以处理关于哪些国家事实上拥有主权的模糊或相互矛盾的数据。这些目标共同为国际事务本体论的应用研究奠定基础。
更新时间: 2025-07-30 00:02:27
领域: cs.AI